Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 14 Aug 2007 18:53:26 GMT
From:      Douglas Rudoff <joseph.blough@yahoo.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/115524: [rpc][patch] Lockd fails to set RPC authentication for granted lock callback
Message-ID:  <200708141853.l7EIrQxB064992@www.freebsd.org>
Resent-Message-ID: <200708141900.l7EJ07C5078185@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         115524
>Category:       kern
>Synopsis:       [rpc][patch] Lockd fails to set RPC authentication for granted lock callback
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Aug 14 19:00:07 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Douglas Rudoff
>Release:        6.1
>Organization:
Isilon
>Environment:
System: FreeBSD drudoffbsd.isilon.com 6.1-RELEASE-p7 FreeBSD 6.1-RELEASE-p7 #3: Tue Sep 26 14:10:13 PDT 2006 root@:/usr/obj/usr/src/sys/SM865DESKTOP i386

>Description:
Scenario:
Linux Client1 gets exclusive lock on NFS-mounted file foo, sleeps a bit, then releases the lock. Linux Client2 attempts to lock foo while Client1 is holding the lock. Client2's request is blocked and Client2 is waiting for the lock to be released. When Client1 releases the lock, instead of the lock being granted immediately to Client2, the lock is obtained 30 seconds after Client2 initially requested the lock.

I previously submitted PR kern/10755 (http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/107555) about this problem. More detailed description can be found there including test code to run on clients.

What's happening:
When lockd attempts to send the NLM_GRANTED callback to Client 2, the RPC call fails due to an authentication error. Lockd is sending AUTH_NULL as its credentials and Linux doesn't accept that as a valid credential. Lockd doesn't bother to check if the RPC call failed and assigns the lock to Client2, but Client2 never received the NLM_GRANTED message so it is unaware it has the lock. Client2 waits 30 seconds after its initial request for the lock and asks again. Since lockd is already holding the lock for Client2, this second lock request succeeds and Client2 is now aware that it has the lock.

(This problem doesn't happen with FreeBSD clients. FreeBSD isn't as picky as Linux about the authentication credentials in the NLM_GRANTED messages sent from lockd.)
>How-To-Repeat:
Run rpc.lockd with a debug level 3 or above. Follow the directions that are in PR kern/10755 on how to run the Linux clients. If you feel like it, run tcpdump on the FreeBSD server during the test.

Afterwards in lockd's log you'll see "rpc.lockd: clnt_call returns 7(RPC: Authentication error) for granted".

Analyze the pcap in Wireshark and see that the Linux client rejects the NLM_GRANTED call with an AUTH_ERROR (Note that there's a bug in NLM filtering in Wireshark. To see the AUTH_ERROR reply you have to filter for both NLM and RPC).
>Fix:
The patch does this:

1) send_granted() gives RPC messages to the client an authentication with authunix_create_default(). The Linux client will now accept the NLM_GRANTED message and locks the file when the lock is first available. No more 30 second wait to receive the lock.

2) Previously, the status of the RPC call in send_granted() was ignored. In the case of failure, lockd held the lock for a client that was unaware it held the lock. The fix is if send_granted() fails, lockd does not hold on to the lock for the client _and_ the client's blocked lock is removed from the blocked list, requiring the client to ask again for the lock. Another alternative would be to leave the lock on the blocked list if send_granted() fails, but if it's impossible to send a message to clients, the lock requests would accumulate on the blocked list and never be removed. It seemed requiring the client to ask again is the better solution (and effectively that is what is done now when send_granted() fails).

3) Fixed typo in get_lock_matching_unlock() debug message.

Patch attached with submission follows:

Index: lockd_lock.c
===================================================================
--- lockd_lock.c	(revision 79264)
+++ lockd_lock.c	(working copy)
@@ -146,21 +146,21 @@ enum partialfilelock_status { PFL_GRANTE
 			      PFL_NFSDENIED, PFL_NFSBLOCKED, PFL_NFSDENIED_NOLOCK, PFL_NFSRESERR,
 			      PFL_HWDENIED,  PFL_HWBLOCKED,  PFL_HWDENIED_NOLOCK, PFL_HWRESERR};
 
 enum LFLAGS {LEDGE_LEFT, LEDGE_LBOUNDARY, LEDGE_INSIDE, LEDGE_RBOUNDARY, LEDGE_RIGHT};
 enum RFLAGS {REDGE_LEFT, REDGE_LBOUNDARY, REDGE_INSIDE, REDGE_RBOUNDARY, REDGE_RIGHT};
 /* XXX: WARNING! I HAVE OVERLOADED THIS STATUS ENUM!  SPLIT IT APART INTO TWO */
 enum split_status {SPL_DISJOINT=0, SPL_LOCK1=1, SPL_LOCK2=2, SPL_CONTAINED=4, SPL_RESERR=8};
 
 enum partialfilelock_status lock_partialfilelock(struct file_lock *fl);
 
-void send_granted(struct file_lock *fl, int opcode);
+int send_granted(struct file_lock *fl, int opcode);
 void siglock(void);
 void sigunlock(void);
 void monitor_lock_host(const char *hostname);
 void unmonitor_lock_host(char *hostname);
 
 void	copy_nlm4_lock_to_nlm4_holder(const struct nlm4_lock *src,
     const bool_t exclusive, struct nlm4_holder *dest);
 struct file_lock *	allocate_file_lock(const netobj *lockowner,
 					   const netobj *matchcookie,
 					   const struct sockaddr *addr,
@@ -760,21 +760,21 @@ get_lock_matching_unlock(const struct fi
 		    ifl->client.l_offset,ifl->client.l_len);
 
 		/* Regions overlap, check the identity */
 		if (!same_filelock_identity(fl,ifl))
 			continue;
 
 		debuglog("get_lock_matching_unlock: Duplicate lock id.  Granting\n");
 		return (ifl);
 	}
 
-	debuglog("Exiting bet_lock_matching_unlock\n");
+	debuglog("Exiting get_lock_matching_unlock\n");
 
 	return (NULL);
 }
 
 /*
  * test_nfslock: check for NFS lock in lock list
  *
  * This routine makes the following assumptions:
  *    1) Nothing will adjust the lock list during a lookup
  *
@@ -1309,21 +1309,31 @@ retry_blockingfilelocklist(void)
 		 * before being allowed to participate in the new list
 		 * which will automatically add it in if necessary.
 		 */
 
 		LIST_REMOVE(ifl, nfslocklist);
 		pflstatus = lock_partialfilelock(ifl);
 
 		if (pflstatus == PFL_GRANTED || pflstatus == PFL_GRANTED_DUPLICATE) {
 			debuglog("Granted blocked lock\n");
 			/* lock granted and is now being used */
-			send_granted(ifl,0);
+                       if (send_granted(ifl,0) == 0) {
+                               /* lock granted and is now being used */
+                               debuglog("Sent granted message to client\n");
+                       }
+                       else {
+                               syslog(LOG_NOTICE, "Failed to send lock granted "
+				   "message to client. Client must ask again "
+				   "for lock");
+			       /* Previous lock is no longer valid */
+			       unlock_partialfilelock(ifl);
+                       }
 		} else {
 			/* Reinsert lock back into blocked list */
 			debuglog("Replacing blocked lock\n");
 			LIST_INSERT_HEAD(&blockedlocklist_head, ifl, nfslocklist);
 		}
 	}
 
 	debuglog("Exiting retry_blockingfilelocklist\n");
 }
 
@@ -2177,21 +2187,21 @@ notify(const char *hostname, const int s
 {
 	debuglog("notify from %s, new state %d", hostname, state);
 
 	siglock();
 	do_clear(hostname);
 	sigunlock();
 
 	debuglog("Leaving notify\n");
 }
 
-void
+int
 send_granted(fl, opcode)
 	struct file_lock *fl;
 	int opcode __unused;
 {
 	CLIENT *cli;
 	static char dummy;
 	struct timeval timeo;
 	int success;
 	static struct nlm_res retval;
 	static struct nlm4_res retval4;
@@ -2201,25 +2211,27 @@ send_granted(fl, opcode)
 	cli = get_client(fl->addr,
 	    (fl->flags & LOCK_V4) ? NLM_VERS4 : NLM_VERS);
 	if (cli == NULL) {
 		syslog(LOG_NOTICE, "failed to get CLIENT for %s",
 		    fl->client_name);
 		/*
 		 * We fail to notify remote that the lock has been granted.
 		 * The client will timeout and retry, the lock will be
 		 * granted at this time.
 		 */
-		return;
+		return -1;
 	}
 	timeo.tv_sec = 0;
 	timeo.tv_usec = (fl->flags & LOCK_ASYNC) ? 0 : 500000; /* 0.5s */
 
+	cli->cl_auth = authunix_create_default();
+
 	if (fl->flags & LOCK_V4) {
 		static nlm4_testargs res;
 		res.cookie = fl->client_cookie;
 		res.exclusive = fl->client.exclusive;
 		res.alock.caller_name = fl->client_name;
 		res.alock.fh.n_len = sizeof(fhandle_t);
 		res.alock.fh.n_bytes = (char*)&fl->filehandle;
 		res.alock.oh = fl->client.oh;
 		res.alock.svid = fl->client.svid;
 		res.alock.l_offset = fl->client.l_offset;
@@ -2253,23 +2265,24 @@ send_granted(fl, opcode)
 			success = clnt_call(cli, NLM_GRANTED_MSG,
 			    (xdrproc_t)xdr_nlm_testargs, &res,
 			    (xdrproc_t)xdr_void, &dummy, timeo);
 		} else {
 			success = clnt_call(cli, NLM_GRANTED,
 			    (xdrproc_t)xdr_nlm_testargs, &res,
 			    (xdrproc_t)xdr_nlm_res, &retval, timeo);
 		}
 	}
 	if (debug_level > 2)
-		debuglog("clnt_call returns %d(%s) for granted",
-			 success, clnt_sperrno(success));
-
+		debuglog("clnt_call returns %d for granted: %s",
+			 success, clnt_sperror(cli, ""));
+	auth_destroy(cli->cl_auth);
+	return success;
 }
 
 /*
  * Routines below here have not been modified in the overhaul
  */
 
 /*
  * Are these two routines still required since lockd is not spawning off
  * children to service locks anymore?  Presumably they were originally
  * put in place to prevent a one child from changing the lock list out


>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200708141853.l7EIrQxB064992>