Date:      Thu, 4 Jan 2007 18:11:14 GMT
From:      Douglas Rudoff<joseph.blough@yahoo.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   misc/107530: [rpc][patch]NFS locking with certain Linux clients causes rpc.lockd to crash
Message-ID:  <200701041811.l04IBE4M077445@www.freebsd.org>
Resent-Message-ID: <200701041820.l04IKFdw053416@freefall.freebsd.org>


>Number:         107530
>Category:       misc
>Synopsis:       [rpc][patch]NFS locking with certain Linux clients causes rpc.lockd to crash
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jan 04 18:20:14 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Douglas Rudoff
>Release:        6.1
>Organization:
Isilon
>Environment:
FreeBSD drudoffbsd.isilon.com 6.1-RELEASE-p7 FreeBSD 6.1-RELEASE-p7 #3: Tue Sep 26 14:10:13 PDT 2006     root@:/usr/obj/usr/src/sys/SM865DESKTOP  i386
>Description:
Some releases of Linux have a bug in client-side NFS file locking.
When a Linux client requests a blocking (wait) lock on a file that is
already locked on the FreeBSD NFS server, the client does not wait for
the callback granting the lock. Instead it floods rpc.lockd with
NLM_LOCK request messages, sent again and again.

The end result is that the FreeBSD rpc.lockd takes up the majority of
the CPU time and eventually aborts.

Granted, this is caused by a bug on the Linux side, but FreeBSD
doesn't handle it gracefully. I've run the same tests with a Sun NFS
server, and it ignored the flood of NLM requests from the Linux client
and performed NFS locking as expected. (I used tcpdump and Wireshark
to examine the NLM exchanges.)

I have seen other postings regarding problems with NFS file locking
with FreeBSD, but with no real resolution. Perhaps this will help
solve those problems.

The version of Linux with the excess lock requests is RedHat "Fedora
Core release 3 (Heidelberg)" with uname -rvsip printing "Linux 2.6.9-1.667smp #1 SMP Tue Nov 2 14:59:52 EST 2004 i686 i686 i386 GNU/Linux"

Both the FreeBSD 5.4 and 6.1 versions of rpc.lockd react the same way
to the buggy Linux client.

I have customers unwilling to upgrade their Linux release so I need to
have a fix for this.

What is happening is that each NLM_LOCK message from the client is
added to the list of blocked lock requests and this list keeps on
expanding until either rpc.lockd runs out of memory or the requested
file is unlocked and made available. Each NLM_LOCK message is
identical, including the cookie.

If the file becomes unlocked, rpc.lockd sends an NLM_GRANTED callback
for each blocked request. However, the client replies with an
NLM_DENIED (not sure why), so the client never accepts a lock and thus
remains in the waiting state. Also, within rpc.lockd (in
lockd_lock.c::test_nfslock()), each granted lock request is considered
a duplicate of the first lock granted to the client and is added to
the list of existing locks. Since the lock is never accepted by the
client, the list of existing locks grows without limit until rpc.lockd
is out of memory.

RFC 1813 Appendix II, where the NLM protocol is defined, makes no
mention of what to do if identical blocking lock requests are made.

I haven't had luck finding Solaris' lockd source on opensolaris.org
to see how Solaris manages to handle the excess lock requests
gracefully.
>How-To-Repeat:
I have a test program (below) and this is the procedure I use to
trigger the rpc.lockd failure:

1) Have a FreeBSD NFS server mounted on the Linux box with the release
mentioned above.

2) Have two shells on the Linux box.

3) In one shell run the test program to lock a file on the FreeBSD box
and sleep. Something like "./lock_test /mnt/bsd/test/a 10 &" (I run
the tests in the background because otherwise you'll never get back to
the shell while the program is waiting for a lock/unlock using the
existing rpc.lockd on FreeBSD).

4) While the test program has locked the file and is sleeping, run the
test program in the other shell "./lock_test /mnt/bsd/test/a &"

The expected result is the first test locking the file for 10 seconds
before unlocking the file and the second test waiting for the first to
unlock the file before it locks the file.

On the problem Linux client this does not happen. The test programs
never exit (the first is stuck in the unlock, the second is stuck in
the lock).

Be ready to kill rpc.lockd on the FreeBSD system because it will
consume most of the CPU.

The test procedure works as expected after updating rpc.lockd with
the patch.

Here's the test source code:

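#include <stdio.h>
#include <stdlib.h>	/* exit(), atoi() */
#include <string.h>	/* strerror() */
#include <unistd.h>	/* sleep() */
#include <fcntl.h>
#include <time.h>
#include <errno.h>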

int main(int argc, char **argv)
{
	int fd;
	char *filename;
	struct flock fl;
	int sleeptime = 0;
	time_t start;

	if (argc < 2) {
		printf("Missing file name arg\n");
		exit(1);
	}
	filename = argv[1];
	fd = open(filename, O_RDWR|O_CREAT,0644);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	if (argc == 3) {
		sleeptime = atoi(argv[2]);
	}

	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;
	printf("Requesting fcntl lock of %s...\n", filename);
	start = time(0);
	if( fcntl (fd, F_SETLKW, &fl) < 0 ) {
		printf("Error: '%s' (lock request elapsed time %ld secs)\n",
		    strerror(errno), (long)(time(0) - start));
		exit(1);
	}
	printf("fcntl lock %s : OK (time to lock %ld secs)\n", filename,
	    (long)(time(0) - start));
	printf("Sleeping for %d secs...\n", sleeptime);
	sleep(sleeptime);

	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;
	printf("Unlocking...\n");
	start = time(0);
	if( fcntl (fd, F_SETLK, &fl) < 0 ) {
		printf("Error (unlock elapsed time %ld secs): ",
		    (long)(time(0) - start));
		perror("fcntl unlock");
		exit(1);
	}
	printf("Unlocked  (time to unlock %ld secs)\n", (long)(time(0) - start));
	return 0;
}

>Fix:
I created a patch that seems to resolve the problem. When a request to
add a blocking lock is made, only unique requests are added to the
list of blocked locks.
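
The idea can be sketched outside rpc.lockd roughly as follows. This is
only a minimal illustration under simplifying assumptions, not the
patch itself: struct blocked_req, same_request(), add_blocked() and
blocked_count() are hypothetical names, and the fields are simplified
stand-ins for what rpc.lockd keeps per lock request.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one blocked lock request. */
struct blocked_req {
	char fh[32];			/* file handle bytes */
	char client[64];		/* client host name, NUL-terminated */
	int svid;			/* lock owner id on the client */
	int exclusive;			/* 1 = write lock, 0 = read lock */
	unsigned long long offset;	/* start of the locked range */
	unsigned long long len;		/* length of the locked range */
	struct blocked_req *next;
};

/* Two requests match if file, range, exclusivity and owner all match. */
static int
same_request(const struct blocked_req *a, const struct blocked_req *b)
{
	return memcmp(a->fh, b->fh, sizeof(a->fh)) == 0 &&
	    strcmp(a->client, b->client) == 0 &&
	    a->svid == b->svid &&
	    a->exclusive == b->exclusive &&
	    a->offset == b->offset &&
	    a->len == b->len;
}

/*
 * Add a copy of req to the blocked list unless an identical request
 * is already queued. Returns 1 if added, 0 if it was a duplicate,
 * -1 if memory could not be allocated.
 */
static int
add_blocked(struct blocked_req **head, const struct blocked_req *req)
{
	struct blocked_req *it, *copy;

	assert(head != NULL && req != NULL);
	for (it = *head; it != NULL; it = it->next)
		if (same_request(it, req))
			return 0;

	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return -1;
	*copy = *req;
	copy->next = *head;
	*head = copy;
	return 1;
}

/* Number of requests currently on the blocked list. */
static int
blocked_count(const struct blocked_req *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

With a check like this, a client that repeats the same request a
thousand times still leaves exactly one entry on the list, whereas
without it the list would hold one entry per message.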

Now, I don't see how a client that is already blocked should _ever_ be making another lock request until the first one is resolved given that the client should be in a wait state.

My function that tests for a duplicate block may be overkill, as it's
probably only necessary to consider a blocking lock request a
duplicate if the request comes from an already blocked client.
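
A narrower test along those lines might look like the sketch below. It
rests on an assumption about what "already blocked client" would mean
here (same host name and same lock owner id), and struct waiter and
client_already_blocked() are hypothetical names, not rpc.lockd code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one client currently waiting on a lock. */
struct waiter {
	const char *client_name;	/* client host name */
	int svid;			/* lock owner id on the client */
	struct waiter *next;
};

/*
 * Return 1 if the given client/owner pair already has a blocked
 * request queued, 0 otherwise. The file handle and byte range are
 * deliberately ignored in this narrower variant.
 */
static int
client_already_blocked(const struct waiter *head, const char *client_name,
    int svid)
{
	assert(client_name != NULL);
	for (; head != NULL; head = head->next)
		if (head->svid == svid &&
		    strcmp(head->client_name, client_name) == 0)
			return 1;
	return 0;
}
```

This is cheaper than comparing file handles and ranges, but it treats
any second blocking request from the same owner as a duplicate, which
is only correct if a blocked owner really cannot issue another request.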

Patch attached with submission follows:

--- /usr/src/usr.sbin/rpc.lockd/lockd_lock.c	Fri May 20 06:01:47 2005
+++ lockd_lock.c	Thu Jan  4 09:20:22 2007
@@ -1195,13 +1195,54 @@
  * if at all possible
  */
 
+int
+duplicate_block(struct file_lock *fl)
+{
+	struct file_lock *ifl,*nfl;
+	int retval = 0;
+
+	debuglog("Entering duplicate_block\n");
+
+	/*
+	 * Is this lock request already on the blocking list?
+	 * Consider it a dupe if the file handles, offset, length,
+	 * exclusivity and client match.
+	 */
+	LIST_FOREACH(ifl, &blockedlocklist_head, nfslocklist) {
+		if (!bcmp(&fl->filehandle, &ifl->filehandle,
+			sizeof(fhandle_t)) &&
+		    fl->client.exclusive == ifl->client.exclusive &&
+		    fl->client.l_offset == ifl->client.l_offset &&
+		    fl->client.l_len == ifl->client.l_len &&
+		    same_filelock_identity(fl, ifl)) {
+			retval = 1;
+			break;
+		}
+	}
+
+	debuglog("Exiting duplicate_block: %s\n", retval ? "already blocked"
+	    : "not already blocked");
+	return retval;
+}
+
 void
 add_blockingfilelock(struct file_lock *fl)
 {
-
 	debuglog("Entering add_blockingfilelock\n");
 
 	/*
+	 * A blocking lock request _should_ never be duplicated as a client
+	 * that is already blocked shouldn't be able to request another
+	 * lock. Alas, there are some buggy clients that do request the same
+	 * lock repeatedly. Make sure only unique locks are on the blocked
+	 * lock list.
+	 */
+	if (duplicate_block(fl)) {
+		debuglog("Exiting add_blockingfilelock: already blocked\n");
+		return;
+	}
+
+	/*
 	 * Clear the blocking flag so that it can be reused without
 	 * adding it to the blocking queue a second time
 	 */
@@ -1209,7 +1250,7 @@
 	fl->blocking = 0;
 	LIST_INSERT_HEAD(&blockedlocklist_head, fl, nfslocklist);
 
-	debuglog("Exiting add_blockingfilelock\n");
+	debuglog("Exiting add_blockingfilelock: added blocked lock\n");
 }
 
 void

>Release-Note:
>Audit-Trail:
>Unformatted: