Date:      Wed, 15 Dec 1999 11:42:44 -0500 (EST)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        freebsd-current@freebsd.org
Cc:        dillon@apollo.backplane.com
Subject:   Serious server-side NFS problem
Message-ID:  <14423.46117.353932.473968@grasshopper.cs.duke.edu>

I have a few "scratch" servers which are running -current from early
July.  They serve large, fast scratch filesystems striped over 4 large
IDE drives.   With the recent improvements to the NFS code & the ATA
code, I was hoping to get them running a more recent -current.

However, I'm seeing a showstopping problem when running newer kernels:
When writing a large file via TCP, a Solaris 2.7 client pauses when
closing the file, and appears to become stuck in an infinite loop.
E.g.:

dd if=/dev/zero of=zot bs=64k count=8192
8192+0 records in
8192+0 records out
^C	<------------- wedge

The process does not exit, and there is a flurry of activity between
the client & server:

solaris -> freebsd ETHER Type=0800 (IP), size = 1514 bytes
solaris -> freebsd IP  D=152.3.X.Z S=152.3.X.Y LEN=1500, ID=16922
solaris -> freebsd TCP D=2049 S=843     Ack=94978376 Seq=1906025252 Len=1460 Win=8760
solaris -> freebsd RPC C XID=299504169 PROG=100003 (NFS) VERS=3 PROC=7
solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)
________________________________
<....>
freebsd -> solaris ETHER Type=0800 (IP), size = 218 bytes
freebsd -> solaris IP  D=152.3.X.Y S=152.3.X.Z LEN=204, ID=34565
freebsd -> solaris TCP D=843 S=2049     Ack=1906313520 Seq=94979688 Len=164 Win=33176
freebsd -> solaris RPC R (#5146) XID=299504169 Success
freebsd -> solaris NFS R WRITE3 OK 32768 (ASYNC)
________________________________
<....>

solaris -> freebsd ETHER Type=0800 (IP), size = 1514 bytes
solaris -> freebsd IP  D=152.3.X.Z S=152.3.X.Y LEN=1500, ID=49895
solaris -> freebsd TCP D=2049 S=843     Ack=96156528 Seq=2140928624 Len=1460 Win=8760
solaris -> freebsd RPC C XID=299511401 PROG=100003 (NFS) VERS=3 PROC=7
solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)
<...>
________________________________
freebsd -> solaris ETHER Type=0800 (IP), size = 218 bytes
freebsd -> solaris IP  D=152.3.X.Y S=152.3.X.Z LEN=204, ID=51011
freebsd -> solaris TCP D=843 S=2049     Ack=2140968864 Seq=96156856 Len=164 Win=33176
freebsd -> solaris RPC R (#18995) XID=299511401 Success
freebsd -> solaris NFS R WRITE3 OK 32768 (A

As you can see, the client seems to write the same block (offset
369655808 in both WRITE3 calls above) multiple times.
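
For anyone who wants to check the repeated writes in a longer trace, here
is a minimal sketch in Python. It assumes the trace has already been
dumped as text summary lines in the format shown above; the function names
and the regular expression are illustrative and not part of any existing
tool.

```python
import re
import sys
from collections import Counter

# Matches call lines such as:
#   solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)
# Reply lines ("NFS R WRITE3 OK ...") are deliberately not matched.
WRITE3_CALL = re.compile(r"NFS C WRITE3 FH=(\S+) at (\d+) for (\d+)")


def count_write3_calls(lines):
    """Count WRITE3 calls per (file handle, offset, length)."""
    counts = Counter()
    for line in lines:
        match = WRITE3_CALL.search(line)
        if match:
            fh, offset, length = match.groups()
            counts[(fh, int(offset), int(length))] += 1
    return counts


def repeated_writes(counts):
    """Keep only the blocks that were written more than once."""
    return {block: n for block, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Reads the text trace on stdin and lists every block written twice or more.
    repeats = repeated_writes(count_write3_calls(sys.stdin))
    for (fh, offset, length), n in sorted(repeats.items()):
        print(f"FH={fh} offset={offset} len={length}: {n} WRITE3 calls")
```

A block that shows up with a count well above 1 is one the client keeps
resending.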

I would think that this is not our fault, except things work just dandy
with a kernel from July.  In fact, the only way out of this situation
is to reboot the FreeBSD NFS server into an older kernel.

In the trace (about 30 seconds or so of the activity after dd
finished, but before it exited) there are ~21,000 packets.  There is a
grand total of:

NFS C WRITE3:		11024
NFS R WRITE3 OK:	10499
NFS C COMMIT3:		    1
NFS R COMMIT3 OK:	    1

In case more details are needed, I've left the complete trace in
~gallatin/nfs-trace.gz on freefall.


Also, while read performance has improved by 44%, write performance
has degraded by 50-70% (FreeBSD clients)!  Here are some
quick benchmarks.  Note that the file size of 512MB is larger than
memory on both the server and client.  Also note that the disk array
on the server will read at 50MB/sec and write at 40MB/sec, so we are
not disk bound ;-)


- UDP NFS write performance from a FreeBSD client:

July's kernel:	
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 52.780773 secs (10171714 bytes/sec)

Today's kernel:
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 141.593458 secs (3791636 bytes/sec)


- TCP NFS write performance from a FreeBSD client:

July's kernel:
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 69.935044 secs (7676708 bytes/sec)

Today's kernel:
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 162.074402 secs (3312497 bytes/sec)


- UDP NFS read performance has gotten better:

July's kernel:
% dd if=zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 84.621477 secs (6344381 bytes/sec)

Today's kernel:
% dd if=zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 58.544409 secs (9170319 bytes/sec)
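
The percentages quoted earlier can be rechecked from the bytes/sec
figures that dd printed. A small Python sketch follows; the rates are
copied from the runs above, while the function and dictionary names are
made up for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (July's kernel, today's kernel) throughput in bytes/sec, from the dd output.
RATES = {
    "UDP write": (10171714, 3791636),
    "TCP write": (7676708, 3312497),
    "UDP read": (6344381, 9170319),
}


def summarize(rates):
    """Map each benchmark name to its percent change between kernels."""
    return {name: percent_change(old, new) for name, (old, new) in rates.items()}


if __name__ == "__main__":
    for name, change in summarize(RATES).items():
        print(f"{name}: {change:+.1f}%")
```

Both write figures come out negative and inside the 50-70% range, and
the read figure comes out positive at a bit over 44%.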

Cheers,

Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590






