Date: Sun, 27 Jun 2010 22:14:01 -0500
From: "Rick C. Petty"
To: Rick Macklem
Cc: freebsd-stable@freebsd.org
Subject: Re: Why is NFSv4 so slow?
Message-ID: <20100628031401.GA45282@kay.kiwi-computer.com>
References: <20100627221607.GA31646@kay.kiwi-computer.com>
Reply-To: rick-freebsd2009@kiwi-computer.com

On Sun, Jun 27, 2010 at 08:04:28PM -0400, Rick Macklem wrote:
>
> Weird, I don't see that here. The only thing I can think of is that the
> experimental client/server will try to do I/O at the size of MAXBSIZE
> by default, which might be causing a burst of traffic your net interface
> can't keep up with. (This can be turned down to 32K via the
> rsize=32768,wsize=32768 mount options. I found this necessary to avoid
> abysmal performance on some Macs for the Mac OS X port.)

Hmm. When I mounted the same filesystem with nfs3 from a different client,
everything started working at almost normal speed (still a little slower
though).

Now on that same host I saw a file get corrupted.
On the server, I see the following:

% hd testfile | tail -4
00677fd0  2a 24 cc 43 03 90 ad e2  9a 4a 01 d9 c4 6a f7 14  |*$.C.....J...j..|
00677fe0  3f ba 01 77 28 4f 0f 58  1a 21 67 c5 73 1e 4f 54  |?..w(O.X.!g.s.OT|
00677ff0  bf 75 59 05 52 54 07 6f  db 62 d6 4a 78 e8 3e 2b  |.uY.RT.o.b.Jx.>+|
00678000

But on the client I see this:

% hd testfile | tail -4
00011ff0  1e af dc 8e d6 73 67 a2  cd 93 fe cb 7e a4 dd 83  |.....sg.....~...|
00012000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00678000

The only thing I could do to fix it was to copy the file on the server,
delete the original file on the client, and move the copied file back.

Not only is it affecting random file reads, but it has also started
breaking src and ports builds in random places. In one situation,
portmaster failed because of a port checksum. It then tried to refetch and
failed with the same checksum problem. I manually deleted the file, tried
again and it built just fine. The ports tree and distfiles are nfs4
mounted.

> The other thing that can really slow it down is if the uid<->login-name
> (and/or gid<->group-name) is messed up, but this would normally only
> show up for things like "ls -l". (Beware having multiple password database
> entries for the same uid, such as "root" and "toor".)

I use the same UIDs/GIDs on all my boxes, so that can't be it. But thanks
for the idea.

> I don't recommend the use of "intr or soft" for NFSv4 mounts, but they
> wouldn't affect performance for trivial tests. You might want to try:
> "nfsv4,rsize=32768,wsize=32768" and see how that works.

I'm trying that right now (with rdirplus also) on one host. If I start to
see the delays again, I'll compare between hosts.

> When you did the nfs3 mount did you specify "newnfs" or "nfs" for the
> file system type? (I'm wondering if you still saw the problem with the
> regular "nfs" client against the server? Others have had good luck using
> the server for NFSv3 mounts.)

I used "nfs" for FStype.
So I should be using "newnfs"? This wasn't very clear in the man pages.
In fact "newnfs" wasn't mentioned in "man mount_newnfs".

> When I see abysmal NFS perf. it is usually an issue with the underlying
> transport. Looking at things like "netstat -i" or "netstat -s" might
> give you a hint?

I suspected it might be transport-related. I didn't see anything out of
the ordinary from netstat, but then again I don't know what's "ordinary"
with NFS. =)

~~

One other thing I noticed (unrelated to the delays or corruption), though
I'm not sure whether it's a bug or expected behavior: I have the following
filesystems on the server:

	/vol/a
	/vol/a/b
	/vol/a/c

I export all three volumes and set my NFS V4 root to "/". On the client,
I'll "mount ... server:vol /vol" and the "b" and "c" directories show up,
but when I try "ls /vol/a/b /vol/a/c", they show up empty. In dmesg I see:

	kernel: nfsv4 client/server protocol prob err=10020

After unmounting /vol, I discovered that my client already had /vol/a/b and
/vol/a/c directories (because pre-NFSv4, I had to mount each filesystem
separately). Once I removed those empty dirs and remounted, the problem
went away. But it did drive me crazy for a few hours.

-- 
Rick C. Petty
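
For anyone wanting to check for the kind of corruption shown in the hex
dumps above (the client copy turning into zeros from offset 0x12000 onward
while the server copy holds real data), here is a minimal sketch in Python.
It is not part of the original report. The function names, the block size,
and the synthetic test data are all assumptions made for illustration. It
reports the first byte offset at which two copies of a file differ and how
many zero bytes follow that offset in the suspect copy.

```python
import os
import tempfile

BLOCK = 4096  # comparison granularity in bytes (an arbitrary choice)


def first_difference(path_a, path_b, block=BLOCK):
    """Return the byte offset where the two files first differ, or None."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if a != b:
                # Narrow down to the exact byte within this block.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the common part: one file is shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files reached EOF with identical content
            offset += len(a)


def zero_run_length(path, start, block=BLOCK):
    """Count consecutive zero bytes in the file beginning at offset start."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while True:
            chunk = f.read(block)
            if not chunk:
                return count  # zeros ran all the way to EOF
            stripped = chunk.lstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                return count  # hit a non-zero byte


if __name__ == "__main__":
    # Synthetic stand-ins for the server and client copies (hypothetical data).
    good = bytes((i % 255) + 1 for i in range(0x3000))  # contains no zero bytes
    bad = good[:0x1200] + bytes(0x3000 - 0x1200)        # tail replaced by zeros
    with tempfile.TemporaryDirectory() as d:
        server_copy = os.path.join(d, "server_copy")
        client_copy = os.path.join(d, "client_copy")
        with open(server_copy, "wb") as f:
            f.write(good)
        with open(client_copy, "wb") as f:
            f.write(bad)
        off = first_difference(server_copy, client_copy)
        print(hex(off), hex(zero_run_length(client_copy, off)))
```

To use it on a real case, one would run first_difference() against a copy
read directly on the server and a copy read through the NFS mount. A long
zero run starting at a block-aligned offset would match the pattern in the
dumps above.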