Date:      Tue, 5 Oct 2004 08:51:02 -0400
From:      Bill Moran <wmoran@potentialtech.com>
To:        Alex de Kruijff <freebsd@akruijff.dds.nl>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: nfs server not responding / is alive again
Message-ID:  <20041005085102.376a7e95.wmoran@potentialtech.com>
In-Reply-To: <20041005052249.GC917@alex.lan>
References:  <20041004001747.J10913@ganymede.hub.org> <20041005052249.GC917@alex.lan>

Alex de Kruijff <freebsd@akruijff.dds.nl> wrote:
> On Mon, Oct 04, 2004 at 12:22:30AM -0300, Marc G. Fournier wrote:
> > 
> > I'm using an nfs mount to get at the underlying file system on a system 
> > that uses unionfs mounts ... instead of using nullfs, which, last time I 
> > used it over a year ago, caused the server to crash to no end ...
> > 
> > But, as soon as there is any 'load', I'm getting a whack of:
> > 
> > Oct  3 22:46:16 neptune /kernel: nfs server neptune.hub.org:/vm: not responding
> > Oct  3 22:46:16 neptune /kernel: nfs server neptune.hub.org:/vm: is alive again
> > Oct  3 22:48:30 neptune /kernel: nfs server neptune.hub.org:/vm: not responding
> > Oct  3 22:48:30 neptune /kernel: nfs server neptune.hub.org:/vm: is alive again

In my experience, this is caused by the server's response time varying
unpredictably.

Someone smarter than me may correct me, but I believe the NFS client keeps
track of how quickly the NFS server responds and uses that to judge whether
the server is still working.  Any time a response takes much longer than
the client has come to expect, the client assumes the server is down; if
the server is not actually down, you'll see the "is alive again" message
immediately afterwards.  Basically, during normal usage the server
responds very quickly, so the client assumes it will always respond that
fast.  Then, under heavy load, the slower responses make the client a
little paranoid.
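The behaviour described above can be modelled with the classic adaptive
retransmit-timeout scheme.  What follows is a minimal Python sketch, not
FreeBSD's NFS client code: it assumes a Jacobson/Karels-style estimator
(smoothed RTT plus four times its mean deviation, using the RFC 6298 gains
of 1/8 and 1/4), doubles the timeout on each retransmit, and logs "not
responding" once a hypothetical `deadthresh` number of timeouts pile up on
one request.  `RttEstimator`, `simulate` and `deadthresh` are illustrative
names only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALPHA = 1 / 8  # smoothing gain for SRTT (RFC 6298 value, assumed here)
BETA = 1 / 4   # smoothing gain for RTTVAR (RFC 6298 value, assumed here)


@dataclass
class RttEstimator:
    """Smoothed round-trip time and its deviation, Jacobson/Karels style."""
    initial_rto: float = 1000.0  # ms, used before any sample exists
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def sample(self, rtt: float) -> None:
        """Fold one measured round-trip time (ms) into the estimate."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt

    def rto(self) -> float:
        """Retransmit timeout: smoothed RTT plus four deviations."""
        if self.srtt is None:
            return self.initial_rto
        return self.srtt + 4 * self.rttvar


def simulate(reply_times: List[float], deadthresh: int) -> List[Tuple[int, str]]:
    """Replay requests whose replies take reply_times[i] ms; return log lines."""
    est = RttEstimator()
    log = []
    for i, reply in enumerate(reply_times):
        timeout, elapsed, timeouts, declared = est.rto(), 0.0, 0, False
        # Each time the timer fires before the reply arrives, retransmit
        # and double the timeout (exponential backoff).
        while elapsed + timeout < reply:
            elapsed += timeout
            timeouts += 1
            timeout *= 2
            if timeouts == deadthresh:
                log.append((i, "not responding"))
                declared = True
        # The original reply finally shows up.
        if declared:
            log.append((i, "is alive again"))
        if timeouts == 0:
            # Karn's rule: only time requests that were never retransmitted.
            est.sample(reply)
    return log


steady = [10.0] * 20  # quiet period: every reply takes 10 ms
print(simulate(steady + [25.0], deadthresh=2))   # []
print(simulate(steady + [200.0], deadthresh=2))  # [(20, 'not responding'), (20, 'is alive again')]
```

After twenty quick 10 ms replies the learned timeout is barely above
10 ms, so one 200 ms reply under load produces the same "not responding"
/ "is alive again" pair, with one timestamp, seen in the log; a 25 ms
reply trips only a single timeout and stays below the threshold.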

I've seen this when running NFS over WiFi, where the ping times are
usually not consistent.

One option is to simply ignore the messages and accept them as a natural
side effect of high load.  Another is to use TCP mounts instead of UDP
mounts; TCP mounts don't have this trouble.

What kind of network topology is between the two machines?  Do you notice
a high load on the hub/switch/routers during these activities?  Improving
the network between the two machines may reduce the problem as well.

> > 
> > in /var/log/messages ...
> > 
> > I'm running nfsd with the standard flags:
> > 
> > 	nfs_server_flags="-u -t -n 4"
> > 
> > Is there something that I can do to reduce this problem?  increase number 
> > of nfsd processes?  force a tcp connection?
> 
> You could try giving the nfsd processes more priority as root with
> rtprio. If the file /var/run/nfsd.pid exists, you could try something
> like: rtprio 10 -`cat /var/run/nfsd.pid`.
> 
> You could also try giving the other processes less priority, e.g.
> nice -n 2 rsync. But I'm not sure how this works at the other end.
> 
> > The issue is more prevalent when I have >4 processes trying to read from 
> > the nfs mounts ... should there be one mount per process?  the process(es) 
> > in question are rsync, if that helps ... they tend to be a bit more 'disk 
> > intensive' than most processes, which is why I thought of increasing -n 
> > ...

Might help.  I would look at the network before I looked at disk usage:
are there dropped packets and the like?  But it could be either.
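On the question of raising -n, a rough back-of-the-envelope model shows
why it might help.  This Python sketch assumes each nfsd process serves one
request at a time, every request takes the same (hypothetical) service
time, and all clients' requests arrive together; `worst_case_wait` is an
illustrative helper, not anything nfsd provides.

```python
import math


def worst_case_wait(clients: int, servers: int, service_ms: float) -> float:
    """Reply time (ms) seen by the last of `clients` simultaneous requests
    when `servers` nfsd processes each handle one request at a time."""
    if clients < 0 or servers < 1:
        raise ValueError("need clients >= 0 and servers >= 1")
    # Requests are served in "rounds" of up to `servers` at once.
    rounds = math.ceil(clients / servers)
    return rounds * service_ms


# Six rsync processes against nfsd -n 4 versus -n 8 (20 ms is made up).
print(worst_case_wait(6, 4, 20.0))  # 40.0
print(worst_case_wait(6, 8, 20.0))  # 20.0
```

With six busy rsync processes and an invented 20 ms service time, -n 4
leaves two requests waiting a full extra round, so the slowest reply takes
twice as long; a sudden jump like that is the kind of slowdown a client
timeout learned during quiet periods can mistake for a dead server.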

<snip>


-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com


