Date:      Tue, 16 Dec 2014 09:29:48 +0100
From:      Gerrit Kühn <gerrit.kuehn@aei.mpg.de>
To:        freebsd-net@freebsd.org
Subject:   Re: compiling on nfs directories
Message-ID:  <20141216092948.605dc8e2e0fec3fa4a5f8ec1@aei.mpg.de>
In-Reply-To: <2048229686.13136235.1418677169130.JavaMail.root@uoguelph.ca>
References:  <CAOgwaMs%2BYLUoLSHDsu6BOYgwr_oi09xNk9yOnSNYjjXqaiDCQQ@mail.gmail.com> <2048229686.13136235.1418677169130.JavaMail.root@uoguelph.ca>

On Mon, 15 Dec 2014 15:59:29 -0500 (EST) Rick Macklem
<rmacklem@uoguelph.ca> wrote about Re: compiling on nfs directories:


RM> Also, note that he didn't see the problem with FreeBSD8.3, which would
RM> have been following the same rules on the server as 10.1.
RM> 
RM> What I suspect might cause this is one of two things:
RM> 1 - The modify time of the file is now changing at a time the Linux
RM>     client doesn't expect, due to changes in ZFS or maybe TOD clock
RM>     resolution. (At one time, the TOD clock was only at a resolution
RM>     of 1sec, so the client wouldn't see the modify time change often.
RM>     I think it is now at a much higher resolution, but would have to
RM>     look at the code/test to be sure.)
RM> 2 - I think you mention this one later in your message, in that the
RM>     build might be depending on file locking. If this is the case,
RM>     trying NFSv4, which does better file locking, might fix the
RM>     problem.

Meanwhile I have googled around a bit more. One of the few causes other
people report for the error messages I am seeing is a broken clock that
makes "make" recompile things during the installation stage. As I was
already wondering why compilation took longer than I had expected, I may
be seeing something similar (I still need to look into that). My clock
is fine, but the timestamps on the NFS share might be off somehow, as
you mention above under "1".

RM> Gerrit, I would suggest that you do "nfsstat -m" on the Linux client,
RM> to see what the mount options are. The Linux client might be using
RM> NFSv4 already.

This is what it says about my nfs-root:

---
pt-nds ~ # nfsstat -m
/ from 192.168.32.253:/tank/diskless/nds
 Flags: rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.32.253,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.32.253
---

This is what I set up for pxe-booting:

---
label gentoo-cs2
  menu label linux-3.8.13-gentoo-2
  kernel bzImage-3.8.13-gentoo-2
  append ip=dhcp root=/dev/nfs rw nfsroot=192.168.32.253:/tank/diskless/nds,nolock,tcp,v3 rootdelay=15
---


So I definitely run "nfsv3" and "nolock". I remember trying to use nfsv4 on the diskless machines some years ago, but back then it was not ready for prime time.
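
For checking this on more than one diskless client, the "Flags:" line
from "nfsstat -m" can be split into options mechanically. This is just
a small sketch; parse_mount_flags is a hypothetical helper, and the
flags string is the one pasted above.

```python
def parse_mount_flags(flags):
    """Turn a comma-separated mount option string into a dict.

    Options of the form key=value map to the value as a string;
    bare options such as "nolock" map to True.
    """
    opts = {}
    for item in flags.split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


flags = ("rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,"
         "nolock,proto=tcp,timeo=600,retrans=2,sec=sys,"
         "mountaddr=192.168.32.253,mountvers=3,mountproto=tcp,"
         "local_lock=all,addr=192.168.32.253")
opts = parse_mount_flags(flags)

# NFS version, whether locking is disabled, and the local_lock mode
print(opts["vers"], opts.get("nolock", False), opts.get("local_lock"))
# prints: 3 True all
```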

RM> Also, avoid "soft, intr" especially if you are using NFSv4, since these
RM> can cause slow server response to result in a failure of a read/write
RM> when it shouldn't fail, due to timeout or interruption by a signal.

"hard" is in there as a default option. However, I might try turning on locking (so far I have regarded it as superfluous, since only one client uses the filesystem).

RM> If you could find out more about what causes the specific build failure
RM> on the Linux side, that might help.

As I said above, I have some hints that indicate something might be wrong with timestamps, but I still need to dig deeper into that.

RM> If you can reproduce a build failure quickly/easily, you can capture
RM> packets via "tcpdump -s 0 -w <file> host <client-hostname>" on the
RM> server and then look at it in wireshark to see what the server is
RM> replying when the build failure occurs. (I don't mind looking at a
RM> packet trace if it is relatively small, if you email it to me as an
RM> attachment.)

I can reproduce it 100% of the time, but it only happens during the installation stage, after everything has been compiled. So I don't know if I will be able to produce a dump of reasonable size that contains the issue, but I'll try.
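
To keep the dump small even though the failure comes late, one option
might be to let tcpdump write a ring of fixed-size files and stop it
right after the failure, so only the last few files matter. The sketch
below only builds the command line. It assumes tcpdump's -C (file size
in millions of bytes) and -W (number of files to rotate through)
options; the helper name, the output file name and the restriction to
port 2049 are my own choices and not part of Rick's suggestion.

```python
def ring_capture_argv(client, outfile, file_mb=50, files=4):
    """Build a tcpdump command line that captures into a ring of files.

    Hypothetical helper: with -C and -W together, tcpdump overwrites the
    oldest file once `files` files of about `file_mb` MB each exist.
    """
    return [
        "tcpdump",
        "-s", "0",             # capture full packets, as suggested
        "-C", str(file_mb),    # start a new file after this many MB
        "-W", str(files),      # keep at most this many files
        "-w", outfile,
        "host", client, "and", "port", "2049",  # assumed NFS port filter
    ]


# "<client-hostname>" is a placeholder, "nfs-trace.pcap" a made-up name
print(" ".join(ring_capture_argv("<client-hostname>", "nfs-trace.pcap")))
```

Restricting the capture to port 2049 would miss mount and lock
traffic, so the port filter should be dropped if those turn out to be
relevant.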

RM> ps: I am not familiar with the Linux mount options, but if it has
RM>     stuff like "nocto", you could try those.

The manpage has the following:

---
       cto / nocto    Selects  whether  to  use  close-to-open cache coherence
                      semantics.  If neither option is specified (or if cto is
                      specified),  the  client uses close-to-open cache coher-
                      ence semantics. If the nocto option  is  specified,  the
                      client  uses  a non-standard heuristic to determine when
                      files on the server have changed.

                      Using the nocto option may improve performance for read-
                      only  mounts, but should be used only if the data on the
                      server changes only occasionally.  The DATA AND METADATA
                      COHERENCE  section discusses the behavior of this option
                      in more detail.
---


So "cto" appears to be the default and is probably what is used right now. I'll put "nocto" on my list of things to try (although the description is not really that incouraging... :-).


cu
  Gerrit


