Date:      Mon, 28 Jun 2010 00:30:30 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Rick C. Petty" <rick-freebsd2009@kiwi-computer.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Why is NFSv4 so slow?
Message-ID:  <Pine.GSO.4.63.1006280017190.2680@muncher.cs.uoguelph.ca>
In-Reply-To: <20100628031401.GA45282@kay.kiwi-computer.com>
References:  <20100627221607.GA31646@kay.kiwi-computer.com> <Pine.GSO.4.63.1006271949220.3233@muncher.cs.uoguelph.ca> <20100628031401.GA45282@kay.kiwi-computer.com>



On Sun, 27 Jun 2010, Rick C. Petty wrote:

>
> Hmm.  When I mounted the same filesystem with nfs3 from a different client,
> everything started working at almost normal speed (still a little slower
> though).
>
> Now on that same host I saw a file get corrupted.  On the server, I see
> the following:
>
> % hd testfile | tail -4
> 00677fd0  2a 24 cc 43 03 90 ad e2  9a 4a 01 d9 c4 6a f7 14  |*$.C.....J...j..|
> 00677fe0  3f ba 01 77 28 4f 0f 58  1a 21 67 c5 73 1e 4f 54  |?..w(O.X.!g.s.OT|
> 00677ff0  bf 75 59 05 52 54 07 6f  db 62 d6 4a 78 e8 3e 2b  |.uY.RT.o.b.Jx.>+|
> 00678000
>
> But on the client I see this:
>
> % hd testfile | tail -4
> 00011ff0  1e af dc 8e d6 73 67 a2  cd 93 fe cb 7e a4 dd 83  |.....sg.....~...|
> 00012000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00678000
>
> The only thing I could do to fix it was to copy the file on the server,
> delete the original file on the client, and move the copied file back.
>
> Not only is it affecting random file reads, but it also started breaking src
> and ports builds in random places.  In one situation, portmaster failed
> because of a port checksum mismatch.  It then tried to refetch and failed
> with the same checksum problem.  I manually deleted the file, tried again and
> it built just fine.  The ports tree and distfiles are nfs4 mounted.
>

I can't explain the corruption, beyond the fact that "soft,intr" mounts can
cause all sorts of grief. If mounts without "soft,intr" still show
corruption problems, try disabling delegations (either kill off the
nfscbd daemons on the client or set vfs.newnfs.issue_delegations=0
on the server). Issuing delegations is disabled by default because it is
the "greenest" (least mature) part of the subsystem.
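When chasing corruption like this, it can help to pin down exactly where the client's copy diverges from the server's instead of eyeballing hd output. Below is a minimal Python sketch, assuming both copies can be read as local files (e.g. one read through the NFS mount and one copied off the server by other means); the function name first_divergence is hypothetical, not part of any FreeBSD tool.

```python
def first_divergence(server_copy, client_copy, chunk=65536):
    """Return None if the two files are byte-identical. Otherwise return
    (offset, zero_tail): offset is the first byte where they differ, and
    zero_tail is True when client_copy holds only zero bytes from there on
    (vacuously True if client_copy simply ends at that offset)."""
    offset = 0
    with open(server_copy, "rb") as a, open(client_copy, "rb") as b:
        while True:
            block_a, block_b = a.read(chunk), b.read(chunk)
            if block_a == block_b:
                if not block_a:          # both files ended together: identical
                    return None
                offset += len(block_a)
                continue
            # Find the first differing byte within this pair of chunks;
            # if one chunk is a prefix of the other, they differ at its end.
            n = min(len(block_a), len(block_b))
            offset += next((k for k in range(n) if block_a[k] != block_b[k]), n)
            break
        # Re-read client_copy from the divergence point and test for zeros.
        b.seek(offset)
        while True:
            data = b.read(chunk)
            if not data:
                return offset, True
            if any(data):                # any() is True if some byte is non-zero
                return offset, False
```

For the file in the hd output above, one would expect the reported offset to be 0x12000 with a zero-filled tail, which distinguishes "data never arrived" from "wrong data arrived".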

>> The other thing that can really slow it down is if the uid<->login-name
>> (and/or gid<->group-name) is messed up, but this would normally only
>> show up for things like "ls -l". (Beware having multiple password database
>> entries for the same uid, such as "root" and "toor".)
>
> I use the same UIDs/GIDs on all my boxes, so that can't be it.  But thanks
> for the idea.
>

Make sure you don't have multiple entries for the same uid, such as "root"
and "toor" both for uid 0 in your /etc/passwd. (i.e., get rid of one of
them if you have both.)
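NFSv4 carries file owners and groups as names rather than numeric ids, so the uid<->name mapping has to be unambiguous. A quick way to spot duplicates is a small Python sketch like the one below, assuming passwd(5)-style input where the uid is the third colon-separated field; the function name duplicate_uids is hypothetical.

```python
from collections import defaultdict

def duplicate_uids(passwd_text):
    """Map each uid that appears on more than one passwd line to its login names."""
    by_uid = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blank lines and comments
            continue
        fields = line.split(":")
        if len(fields) < 3:                    # malformed entry; ignore it
            continue
        by_uid[fields[2]].append(fields[0])    # uid -> login names
    return {uid: names for uid, names in by_uid.items() if len(names) > 1}
```

Usage: `with open("/etc/passwd") as f: print(duplicate_uids(f.read()))`. A default FreeBSD install would be expected to report uid 0 for both "root" and "toor".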

>
>> When you did the nfs3 mount did you specify "newnfs" or "nfs" for the
>> file system type? (I'm wondering if you still saw the problem with the
>> regular "nfs" client against the server? Others have had good luck using
>> the server for NFSv3 mounts.)
>
> I used "nfs" for FStype.  So I should be using "newnfs"?  This wasn't very
> clear in the man pages.  In fact "newnfs" wasn't mentioned in
> "man mount_newnfs".
>

When you specify "nfs" as the FStype for an NFSv3 mount, you get the
regular client. When you specify "newnfs" for an NFSv3 mount, you get the
experimental client. When you specify the "nfsv4" option, you always get
the experimental NFS client, regardless of which FStype you've specified.
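The rule above can be restated as a small decision table. This is only a Python sketch of the rule as described in this message, not code taken from mount(8) or mount_nfs(8); the function name nfs_client_for and the comma-separated option string are illustrative assumptions.

```python
def nfs_client_for(fstype, options=""):
    """Return which NFS client a mount would use, per the rule described
    above. options is a comma-separated mount option string, e.g. "rw,nfsv4"."""
    opts = {o.strip() for o in options.split(",") if o.strip()}
    if "nfsv4" in opts:
        return "experimental"   # nfsv4 always selects the experimental client
    if fstype == "newnfs":
        return "experimental"   # NFSv3 via the experimental client
    if fstype == "nfs":
        return "regular"        # NFSv3 via the regular client
    raise ValueError(f"not an NFS fstype: {fstype!r}")
```

So a plain "nfs" NFSv3 mount, like the one used for the comparison in this thread, exercises the regular client, not the experimental one.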

>
> One other thing I noticed but I'm not sure if it's a bug or expected
> behavior (unrelated to the delays or corruption), is I have the following
> filesystems on the server:
>
> /vol/a
> /vol/a/b
> /vol/a/c
>
> I export all three volumes and set my NFS V4 root to "/".  On the client,
> I'll "mount ... server:vol /vol" and the "b" and "c" directories show up
> but when I try "ls /vol/a/b /vol/a/c", they show up empty.  In dmesg I see:
>

If you are using UFS/FFS on the server, this should work, and I don't know
why the empty directories under /vol on the client confused it. If your
server is using ZFS, everything from / down to and including /vol must be
exported.
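For the ZFS case, one way to read the requirement is that every directory on the path from / to the mounted directory must appear in the exports. The Python sketch below checks a list of exported paths against that reading; the function name missing_ancestor_exports is hypothetical, and parsing the real exports(5) file is left out.

```python
from pathlib import PurePosixPath

def missing_ancestor_exports(target, exported):
    """Return the directories from "/" down to target (inclusive) that are
    not in the exported list, ordered from the root downward."""
    exported = {str(PurePosixPath(p)) for p in exported}   # normalize paths
    path = PurePosixPath(target)
    chain = list(path.parents)[::-1] + [path]              # "/", ..., target
    return [str(p) for p in chain if str(p) not in exported]
```

For example, exporting only /vol/a and /vol/a/b would leave "/" and "/vol" reported as missing for a target of /vol/a/b.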

> 	kernel: nfsv4 client/server protocol prob err=10020
>

This error (10020 is NFS4ERR_NOFILEHANDLE) indicates that there wasn't a
valid file handle (FH) for the server. I suspect that the mount failed.
(During the mount, the kernel does a loop of Lookups starting at "/", and
it somehow got confused partway through.)
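To make the "loop of Lookups" concrete, here is a toy Python model of walking a path one component at a time from "/" and stopping with NFS4ERR_NOFILEHANDLE when no handle is available to continue from. It is an illustration under stated assumptions only: the nested-dict server tree and the function name mount_lookup are hypothetical and do not reflect the FreeBSD kernel code.

```python
NFS4ERR_NOFILEHANDLE = 10020   # the err=10020 reported in dmesg

def mount_lookup(root, path):
    """Walk path component by component from the root node, returning the
    final file handle. Each node is {"fh": handle, "children": {name: node}}.
    Raises LookupError(NFS4ERR_NOFILEHANDLE, msg) if the walk breaks off."""
    node, walked = root, "/"
    for name in (c for c in path.split("/") if c):
        child = node["children"].get(name)
        if child is None:
            # Nothing to continue from, so no valid FH for the final lookup.
            raise LookupError(NFS4ERR_NOFILEHANDLE,
                              f"lookup of {name!r} under {walked!r} failed")
        node = child
        walked = walked.rstrip("/") + "/" + name
    return node["fh"]
```

A failure at any intermediate component leaves the mount without a usable FH, which matches the suspicion that the mount failed partway through.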

> After unmounting /vol, I discovered that my client already had /vol/a/b and
> /vol/a/c directories (because pre-NFSv4, I had to mount each filesystem
> separately).  Once I removed those empty dirs and remounted, the problem
> went away.  But it did drive me crazy for a few hours.
>
I don't know why these empty dirs would confuse it. I'll try a test
here, but I suspect the real problem was that the mount failed and
then happened to succeed after you deleted the empty dirs.
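Whether or not the leftover directories were the cause, it is cheap to see what already sits in a mount point's underlying directory before mounting over it. A minimal Python sketch, assuming the mount point is not currently mounted; the function name stale_mountpoint_entries is hypothetical.

```python
import os

def stale_mountpoint_entries(mount_point):
    """Return the names in mount_point's underlying directory, which a mount
    would hide. Returns [] if mount_point is already a mount, since listing
    it would then show the mounted file system instead."""
    if os.path.ismount(mount_point):
        return []
    return sorted(os.listdir(mount_point))
```

Run before "mount ... server:vol /vol", a non-empty result such as ["a"] would flag directories left over from the pre-NFSv4 per-filesystem mounts.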

It still smells like some sort of transport/net interface/... issue
is at the bottom of this. (see response to your next post)

rick



