Date:      Thu, 10 Mar 2016 09:29:25 -0500
From:      Paul Mather <paul@gromit.dlib.vt.edu>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Ronald Klop <ronald-lists@klop.ws>, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject:   Re: Unstable NFS on recent CURRENT
Message-ID:  <BF9757C7-654D-4FAC-97E4-7E8B36C6E4A7@gromit.dlib.vt.edu>
In-Reply-To: <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca>
References:  <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <op.ydylazgukndu52@ronaldradial.radialsg.local> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca>

On Mar 9, 2016, at 8:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Paul Mather wrote:
>> On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Paul Mather wrote:
>>>> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>>> Paul Mather (forwarded by Ronald Klop) wrote:
>>>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather
>>>>>> <paul@gromit.dlib.vt.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
>>>>>>> having trouble with NFS.  I have been doing a buildworld and buildkernel
>>>>>>> with /usr/src and /usr/obj mounted via NFS.  Recently, this process has
>>>>>>> resulted in the buildworld failing at some point, with a variety of
>>>>>>> errors (Segmentation fault; Permission denied; etc.).  Even a "ls -alR"
>>>>>>> of /usr/src doesn't manage to complete.  It errors out thus:
>>>>>>>
>>>>>>> =====
>>>>>>> [[...]]
>>>>>>> total 0
>>>>>>> ls: ./.svn/pristine/fe: Permission denied
>>>>>>>
>>>>>>> ./.svn/pristine/ff:
>>>>>>> total 0
>>>>>>> ls: ./.svn/pristine/ff: Permission denied
>>>>>>> ls: fts_read: Permission denied
>>>>>>> =====
>>>>>>>
>>>>>>> On the console, I get the following:
>>>>>>>
>>>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
>>>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
>>>>>>> MIDDLEWARE)
>>>>>>>
>>> Oh, I had forgotten this. Here's the comment related to this error.
>>> (about line#445 in sys/fs/nfsclient/nfs_clport.c):
>>> 446     * BROKEN NFS SERVER OR MIDDLEWARE
>>> 447     *
>>> 448     * Certain NFS servers (certain old proprietary filers ca.
>>> 449     * 2006) or broken middleboxes (e.g. WAN accelerator products)
>>> 450     * will respond to GETATTR requests with results for a
>>> 451     * different fileid.
>>> 452     *
>>> 453     * The WAN accelerator we've observed not only serves stale
>>> 454     * cache results for a given file, it also occasionally serves
>>> 455     * results for wholly different files.  This causes surprising
>>> 456     * problems; for example the cached size attribute of a file
>>> 457     * may truncate down and then back up, resulting in zero
>>> 458     * regions in file contents read by applications.  We observed
>>> 459     * this reliably with Clang and .c files during parallel build.
>>> 460     * A pcap revealed packet fragmentation and GETATTR RPC
>>> 461     * responses with wholly wrong fileids.
>>>
>>> If you can connect the client->server with a simple switch (or just an RJ45
>>> cable), it might be worth testing that way. (I don't recall the name of the
>>> middleware product, but I think it was shipped by one of the major switch
>>> vendors. I also don't know if the product supports NFSv4?)
>>>
>>> rick
>>
>>
>> Currently, the client is connected to the server via a dumb gigabit switch,
>> so it is already fairly direct.
>>
>> As for the above error, it appeared on the console only once.  (Sorry if I
>> made it sound like it appears every time.)
>>
>> I just tried another buildworld attempt via NFS and it failed again.  This
>> time, I get this on the BeagleBone Black console:
>>
>> 	nfs_getpages: error 13
>> 	vm_fault: pager read error, pid 5401 (install)
>>
> 13 is EACCES and could be caused by what I mention below. (Any mount of a file
> system on the server unless "-S" is specified as a flag for mountd.)
>
>>
>> The other thing I have noticed is that if I induce heavy load on the NFS
>> server---e.g., by starting a Poudriere bulk build---then that provokes the
>> client to crash much more readily.  For example, I started a NFS buildworld
>> on the BeagleBone Black, and it seemed to be chugging along nicely.  The
>> moment I kicked off a Poudriere build update of my packages on the NFS
>> server, it crashed the buildworld on the NFS client.
>>
> Try adding "-S" to mountd_flags on the server. Any time file systems are mounted
> (and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to
> reload /etc/exports. When /etc/exports are being reloaded, there will be access
> errors for mounts (that are temporarily not exported) unless you specify "-S"
> (which makes mountd suspend the nfsd threads during the reload of /etc/exports).
>
> rick


Bingo!  I think we may have a winner.  I added that flag to mountd_flags
on the server and the "instability" appears to have gone away.
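
For anyone wanting to try the same change, a minimal sketch of the
server-side setting follows.  It assumes mountd_flags previously held
only "-r"; that value is an assumption, so keep whatever flags are
already set and simply append "-S".

```shell
# /etc/rc.conf on the NFS server (sketch; the existing "-r" is an assumption,
# keep whatever flags are already configured and append -S).
# -S makes mountd suspend the nfsd threads while /etc/exports is reloaded,
# so clients should not see transient access errors during the reload.
mountd_flags="-r -S"
```

mountd has to be restarted (for example with "service mountd restart")
before the new flag takes effect.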

It may be that all along the NFS problems on the client just coincided
with Poudriere runs on the server.  I build custom packages for my local
machines using Poudriere so I use it quite a lot.  Maybe the Poudriere
port should come with a warning at install to those using NFS that it
may provoke disruption and suggest the addition of "-S"?
(Alternatively, maybe "-S" could become a default for mountd_flags?  Is
there a downside from using it that means making it a default option is
unsuitable?)

Anyway, many, many thanks for all the help, Rick.  I'll keep monitoring
my BeagleBone Black, but it looks for now that this has solved the NFS
"instability."

Cheers,

Paul.



