Date:      Tue, 4 May 2010 10:34:02 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Cheng-Lin Yang <yuwen@exodus.cs.ccu.edu.tw>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, lab <lab@cs.ccu.edu.tw>
Subject:   Re: Struggling on NFS problem
Message-ID:  <Pine.GSO.4.63.1005041019180.14588@muncher.cs.uoguelph.ca>
In-Reply-To: <1272960060.34062.yuwen@exodus.cs.ccu.edu.tw>
References:  <1272960060.34062.yuwen@exodus.cs.ccu.edu.tw>


On Tue, 4 May 2010, Cheng-Lin Yang wrote:

> Dear all,
> Currently, we have an NFS server running FreeBSD8 with ZFS and a few workstations as NFS clients (2 * FreeBSD8 amd64 + 1 * FreeBSD7.2 i386 + 2 * Fedora + Debian). We noticed that NFS behaves strangely on the FreeBSD clients, which significantly slows down system response. The only fix we have found is to reboot the clients (the Linux clients run smoothly). So we ran "nfsstat -c" on a FreeBSD client to dig into the problem and found a strange result (http://pastebin.com/K71qpEDG):
> csie0[~]# nfsstat -c
[stuff snipped]
>
> As you can see, the value of "BioW Hits" is a negative number. Shouldn't it be greater than or equal to zero? We have no idea what causes this. Please help us investigate the problem. Any suggestions are very welcome. Thank you.
>
I suspect that the negative value is just a wraparound (assuming you're
on a 32-bit arch) and just means lots of hits. If that is the case, it
suggests a fairly heavy write load, which can be an issue for servers
using ZFS (as others have already posted about).
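
To illustrate the wraparound suspected above, here is a minimal sketch. It
assumes the counter is held and printed as a signed 32-bit integer and has
wrapped only once; the function names are hypothetical and are not part of
nfsstat.

```python
def as_signed32(count: int) -> int:
    """Interpret the low 32 bits of count as a signed two's-complement value."""
    low = count & 0xFFFFFFFF
    return low - 0x100000000 if low >= 0x80000000 else low


def recover_unsigned32(displayed: int) -> int:
    """Undo a single wrap: map a displayed signed value back to 0..2**32-1."""
    return displayed & 0xFFFFFFFF


# Assumed example: a counter that has gone 1000 past the signed 32-bit limit.
true_hits = 2**31 + 1000
displayed = as_signed32(true_hits)

print(displayed)                      # negative, although the real count is large
print(recover_unsigned32(displayed))  # the real count, if it wrapped only once
```

If the counter has wrapped more than once, the true value cannot be recovered
from the displayed number alone.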

There are a number of patches for FreeBSD8.0 related to NFS (one specifically
w.r.t. the server using ZFS) at:
 	http://people.freebsd.org/~rmacklem

If you are using FreeBSD8.0 for the server, it would be worth trying
these patches (they are all independent, in that any of them can be
applied, in any order). (If you are using a recent stable/8, then
you should already have the patches.)

In particular, one of them fixes a case where FreeBSD clients will
get stuck looping trying to access a file after it has been deleted
on the server, because the server reported EIO instead of ESTALE for
this case.
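
The effect of the wrong error code can be sketched with a toy model. This is
not the FreeBSD client code; it only assumes that a client treats EIO as
something worth retrying and ESTALE as a definite "the file is gone". The
function name and the attempt cap are hypothetical (the cap is there only so
the example terminates).

```python
import errno


def access_file(server_reply, max_attempts=5):
    """Toy client loop: retry on transient errors, stop on ESTALE or success."""
    for attempt in range(1, max_attempts + 1):
        err = server_reply()
        if err == 0:
            return "ok", attempt
        if err == errno.ESTALE:
            # Stale file handle: the file no longer exists, so give up now.
            return "stale", attempt
        # Any other error (e.g. EIO) is treated as transient and retried.
    return "still retrying", max_attempts


# Server that reports EIO for a deleted file: the client keeps trying.
buggy = access_file(lambda: errno.EIO)
# Server that reports ESTALE for a deleted file: the client stops at once.
fixed = access_file(lambda: errno.ESTALE)

print(buggy)
print(fixed)
```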

If the patches don't help, please try to collect more information
from both the slow clients and the server. "ps axl" on all of them can be
useful. Also, you can use "tcpdump -s 0 -w <file> host <clienthost>"
to capture traffic between the slow client and the server, which can then
be looked at in wireshark. (tcpdump doesn't decode NFS traffic well,
but a binary capture from tcpdump loads into wireshark fine, and wireshark
does understand NFS traffic.) If you get to this point, you can email me
the "<file>" as an attachment and I can take a look at it. If you
look at it yourself, one scenario of interest is where the client
just keeps retrying the same NFS RPC.
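
One way to spot the "same RPC over and over" pattern, once the calls have
been read out of the capture, is to count identical calls. The sketch below
assumes the calls have already been extracted by hand or by some other tool
into (procedure, file handle) pairs; the record layout, the function name,
the sample data and the threshold are all hypothetical.

```python
from collections import Counter


def repeated_calls(calls, threshold=10):
    """Return ((procedure, file_handle), count) pairs seen at least threshold times."""
    counts = Counter(calls)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Made-up sample: one LOOKUP repeated many times among ordinary traffic.
sample = [("GETATTR", "fh-a"), ("READ", "fh-b")]
sample += [("LOOKUP", "fh-c")] * 25
sample += [("WRITE", "fh-b")] * 3

suspects = repeated_calls(sample)
print(suspects)
```

A high count is only a hint; a busy client can legitimately issue the same
call many times, so the timing and the server's replies still need a look.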

Good luck with it and let us know how it goes, rick



