Date:      Wed, 9 Mar 2016 11:12:36 -0500
From:      Paul Mather <paul@gromit.dlib.vt.edu>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Ronald Klop <ronald-lists@klop.ws>, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject:   Re: Unstable NFS on recent CURRENT
Message-ID:  <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu>
In-Reply-To: <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>
References:  <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <op.ydylazgukndu52@ronaldradial.radialsg.local> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>

On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Paul Mather wrote:
>> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Paul Mather (forwarded by Ronald Klop) wrote:
>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather <paul@gromit.dlib.vt.edu>
>>>> wrote:
>>>>
>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
>>>>> having trouble with NFS.  I have been doing a buildworld and buildkernel
>>>>> with /usr/src and /usr/obj mounted via NFS.  Recently, this process has
>>>>> resulted in the buildworld failing at some point, with a variety of
>>>>> errors (Segmentation fault; Permission denied; etc.).  Even a "ls -alR"
>>>>> of /usr/src doesn't manage to complete.  It errors out thus:
>>>>>
>>>>> =====
>>>>> [[...]]
>>>>> total 0
>>>>> ls: ./.svn/pristine/fe: Permission denied
>>>>>
>>>>> ./.svn/pristine/ff:
>>>>> total 0
>>>>> ls: ./.svn/pristine/ff: Permission denied
>>>>> ls: fts_read: Permission denied
>>>>> =====
>>>>>
>>>>> On the console, I get the following:
>>>>>
>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
>>>>> MIDDLEWARE)
>>>>>
> Oh, I had forgotten this. Here's the comment related to this error.
> (about line#445 in sys/fs/nfsclient/nfs_clport.c):
> 446                      * BROKEN NFS SERVER OR MIDDLEWARE
> 447                      *
> 448                      * Certain NFS servers (certain old proprietary filers ca.
> 449                      * 2006) or broken middleboxes (e.g. WAN accelerator products)
> 450                      * will respond to GETATTR requests with results for a
> 451                      * different fileid.
> 452                      *
> 453                      * The WAN accelerator we've observed not only serves stale
> 454                      * cache results for a given file, it also occasionally serves
> 455                      * results for wholly different files.  This causes surprising
> 456                      * problems; for example the cached size attribute of a file
> 457                      * may truncate down and then back up, resulting in zero
> 458                      * regions in file contents read by applications.  We observed
> 459                      * this reliably with Clang and .c files during parallel build.
> 460                      * A pcap revealed packet fragmentation and GETATTR RPC
> 461                      * responses with wholly wrong fileids.
>
> If you can connect the client->server with a simple switch (or just an RJ45 cable), it
> might be worth testing that way. (I don't recall the name of the middleware product, but
> I think it was shipped by one of the major switch vendors. I also don't know if the product
> supports NFSv4?)
>
> rick


Currently, the client is connected to the server via a dumb gigabit
switch, so it is already fairly direct.

As for the above error, it appeared on the console only once.  (Sorry if
I made it sound like it appears every time.)

I just tried another buildworld attempt via NFS and it failed again.
This time, I get this on the BeagleBone Black console:

	nfs_getpages: error 13
	vm_fault: pager read error, pid 5401 (install)
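
(For reference, error 13 should be EACCES, i.e., the same "Permission denied" that ls reported earlier.  A quick way to confirm the mapping is sketched below; it assumes the machine running it uses the same errno numbering as the client, which holds for FreeBSD and Linux as far as I know.)

```python
import errno
import os

# Raw errno value taken from the "nfs_getpages: error 13" console line.
code = 13

name = errno.errorcode[code]  # symbolic name for the number
text = os.strerror(code)      # human-readable message for the number

print(f"{code} = {name}: {text}")
```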


The other thing I have noticed is that if I induce heavy load on the NFS
server---e.g., by starting a Poudriere bulk build---then that provokes
the client to crash much more readily.  For example, I started an NFS
buildworld on the BeagleBone Black, and it seemed to be chugging along
nicely.  The moment I kicked off a Poudriere build update of my packages
on the NFS server, it crashed the buildworld on the NFS client.

I have had problems with swap on FreeBSD/arm before.  Swapping to a file
does not appear to work for me.  As a result, I switched to swapping to
a partition on the SD card.  Maybe this is unreliable, too?

Cheers,

Paul.



