Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 5 Jan 2007 01:28:13 GMT
From:      Douglas Rudoff<joseph.blough@yahoo.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   misc/107555: [rpc] 30 second delay in NFS lock of file after waiting for lock
Message-ID:  <200701050128.l051SDx6050921@www.freebsd.org>
Resent-Message-ID: <200701050130.l051UUo3010497@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         107555
>Category:       misc
>Synopsis:       [rpc] 30 second delay in NFS lock of file after waiting for lock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 05 01:30:30 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Douglas Rudoff
>Release:        6.1
>Organization:
Isilon
>Environment:
FreeBSD drudoffbsd.isilon.com 6.1-RELEASE-p7 FreeBSD 6.1-RELEASE-p7 #3: Tue Sep 26 14:10:13 PDT 2006 root@:/usr/obj/usr/src/sys/SM865DESKTOP i386
>Description:
With a certain Linux release if a process places a waiting NFS lock on
a file on a FreeBSD NFS server, when the lock on the file is available
it takes a multiple of 30 seconds for the client's lock request to complete
even if the lock is availble after waiting only a few seconds.

The Linux release is "CentOS release 4.1 (Final)" with uname printing
2.6.9-11.ELsmp #1 SMP Wed Jun 8 17:54:20 CDT 2005 i686 i386.

This problem does not occur with a Solaris NFS server

This problem does not occur if the Linux client is RedHat "Fedora Core
release 3 (Heidelberg)" with uname "2.6.9-1.667smp #1 SMP Tue Nov 2
14:59:52 EST 2004" after the patch in
http://www.freebsd.org/cgi/query-pr.cgi?pr=misc/107530 is made to
FreeBSD's rpc.lockd.

I have my test program below, but I've included the procedure here to
explain what I'm seeing.

1) Have a FreeBSD NFS server mounted on the Linux box with the release
mentioned above.

2) Have two shells on the Linux box.

3) In one shell run the test program (A) to lock a file on the FreeBSD
box and sleep. Something like "./lock_test /mnt/bsd/test/a 10" to sleep
for 10 seconds.

4) While the test program (A) has locked the file and is sleeping, run
the test program (B) in the other shell "./lock_test /mnt/bsd/test/a"

After A finishes its sleep and unlocks the file, B should get the lock
immediately. Instead it will take exactly 30 seconds for to execute
B's call to fcntl() for the waiting lock request.

If A is given a sleep greater than 30 seconds, B will take exactly 60
seconds to complete its lock request. If A is given a sleep greater
than 60 seconds, B will take 90 seconds to complete its lock request.

I discovered a difference in the order of NLM messags sent from the
lockd between FreeBSD and Sun.

Here are the NLM messages with a FreeBSD NFS server (with A and first
process requesting the lock and B the second process):

Lock Request (from A)
Lock Reply GRANTED (to A)
Lock Request (from B)
Lock Reply BLOCKED (to B)
[After A is finished sleeping while locking the file]
Unlock Request (from A)
Lock GRANTED callback (to B)
Unlock Reply GRANTED (to A)
[Now nothing happens until 30 seconds after A's initial lock request]
Lock Request (from B)
Lock Reply GRANTED (to B)
Unlock Request (from B)
Unlock Reply GRANTED (to B)

Notes:
1) rpc.lockd grants B's waiting request for lock before it informs A
   that it's been unlocked.
2) B isn't acknowledging the granted lock callback.
2) B requests a lock again after it receives its granted callback even
   though it should be necessary.

Trace of NLM calls using a Solaris NFS server (with same Linux client):

Lock Request (from A)
Lock Reply GRANTED (to A)
Lock Request (from B)
Lock Reply BLOCKED (to B)
[After A is finished sleeping while locking the file]
Unlock Request (from A)
Unlock Reply GRANTED (to A)
Lock GRANTED callback (to B)
Lock GRANTED Reply GRANTED (from B)
Unlock Request (from B)
Unlock Request (from A)
Unlock Reply GRANTED (to B)
Unlock Reply GRANTED (to A)

Notes:
1) Solaris' lockd replies to A's unlock request before granting the
   lock to B.
2) B accepts the lock callback (as expected)
3) A does an unneeded lock request at the end right before it exits

It doesn't matter if A and B are running on the same machine, as long
as B is on the problem Linux release the 30 second wait occurs. (This
was a test to see if it the wait was internal to the Linux kernel if
it was aware of two locks on the same file with different processes).

Running FreeBSD's rpc.lockd in debug mode showed a near 30 second gap
in log messages after A unlocked the file until B's lock was
processed (it's not a 30 second gap as it doesn't include the wait
between starting A and B):

Jan  4 14:57:05 <16.7> anadoug-1(id1) rpc.lockd: Exiting unlock...
Jan  4 14:57:27 <16.7> anadoug-1(id1) rpc.lockd: nlm4_lock from 10.54.141.75


Analysis:
1) Solaris's and FreeBSD's lockd process requests in not quite
   the same order.
2) The Solaris version works as expected with this Linux
   client.
3) The FreeBSD version does work but has those mysterious 30
   second wait added to the locking process.
4) The multiple of 30 second wait is very bizarre. It's not clear
   whether it's happening on the server or client

>How-To-Repeat:
See test procedure above using this source code:

#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <errno.h>

int main(int argc, char **argv)
{
	int fd;
	char *filename;
	struct flock fl;
	int sleeptime = 0;
	time_t start;

	if (argc < 2) {
		printf("Missing file name arg\n");
		exit(1);
	}
	filename = argv[1];
	fd = open(filename, O_RDWR|O_CREAT,0644);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	if (argc == 3) {
		sleeptime = atoi(argv[2]);
	}

	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;
	printf("Requesting fcntl lock of %s...\n", filename);
	start = time(0);
	if( fcntl (fd, F_SETLKW, &fl) < 0 ) {
		printf("Error: '%s' (lock request elapsed time %d secs)\n",
		    strerror(errno), time(0) - start);
		exit(1);
	}
	printf("fcntl lock %s : OK (time to lock %d secs)\n", filename, time(0) - start);
	printf("Sleeping for %d secs...\n", sleeptime);
	sleep(sleeptime);

	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;
	printf("Unlocking...\n");
	start = time(0);
	if( fcntl (fd, F_SETLK, &fl) < 0 ) {
		printf("Error (unlock elapsed time %d secs): ", time(0) - start);
		perror("fcntl unlock");
		exit(1);
	}
	printf("Unlocked  (time to unlock %d secs)\n", time(0) - start);
}

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200701050128.l051SDx6050921>