Date:      Mon, 22 May 2006 17:43:32 -0400
From:      "Rong-en Fan" <grafan@gmail.com>
To:        "Howard Leadmon" <howard@leadmon.net>, "Kris Kennaway" <kris@obsecurity.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Trouble with NFSd under 6.1-Stable, any ideas?
Message-ID:  <6eb82e0605221443m5cc3c93bwaf9126ff2fb59667@mail.gmail.com>
In-Reply-To: <20060515024958.GA99002@xor.obsecurity.org>
References:  <017301c67784$45377a90$071872cf@Leadmon.local> <20060515024958.GA99002@xor.obsecurity.org>

On 5/14/06, Kris Kennaway <kris@obsecurity.org> wrote:
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >
> >    Hello All,
> >
> >  I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with.   I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> >
> >  This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> >
> >  Now after doing this, something is wrong with NFS.   It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding.  A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> >
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >
> >
> >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
>
> Use tcpdump and related tools to find out what traffic is being sent.
>
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so it is
> unclear why an update would cause the problem to suddenly occur.
>
> Kris

Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching
the source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit in that window is to kern/vfs_lookup.c, an MFC of
revs 1.90 and 1.91. With the 04.30 03.57 source plus a manually
applied vfs_lookup.c rev 1.90, the same problem occurs.
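
The binary search over checkout dates mentioned above can be sketched
roughly as follows. This is only an illustration of the idea, not the
procedure that was actually run: the candidate dates other than the two
04.30 timestamps are made up, and is_bad() is a hypothetical stand-in
for "check out RELENG_6_1 at that date, rebuild, reboot, run du over
NFS and watch nfsd".

```python
from datetime import datetime


def parse(stamp):
    """Parse a checkout date written like 2006.04.30.03.57."""
    return datetime.strptime(stamp, "%Y.%m.%d.%H.%M")


def first_bad(timestamps, is_bad):
    """Return the earliest timestamp for which is_bad() is true.

    Assumes timestamps are sorted ascending, the first one is known
    good, the last one is known bad, and the problem never goes away
    again once it has appeared.
    """
    lo, hi = 0, len(timestamps) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(timestamps[mid]):
            hi = mid
        else:
            lo = mid
    return timestamps[hi]


# Hypothetical candidates; only the two 04.30 values come from the message.
candidates = [parse(s) for s in (
    "2006.04.01.00.00",
    "2006.04.15.00.00",
    "2006.04.30.03.57",
    "2006.04.30.04.00",
    "2006.05.07.00.00",
)]


def is_bad(ts):
    # Stand-in for the manual rebuild-and-test step (assumption).
    return ts >= parse("2006.04.30.04.00")


result = first_bad(candidates, is_bad)
print(result.strftime("%Y.%m.%d.%H.%M"))  # 2006.04.30.04.00
```

Each round halves the remaining window, so even a range of several
weeks needs only a handful of rebuilds before it narrows to a single
commit.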

Let me recap the problems I'm seeing:

1. a client (either Linux 2.6.16 or FreeBSD 6.1) runs du on
   an NFS directory
2. on the server side, nfsd starts to eat lots of CPU
3. the du finishes
4. on the server side, nfsd still eats lots of CPU, but there is no
   NFS traffic. Even after waiting 5 minutes, nfsd is still
   "running" and eating lots of CPU.

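For anyone who wants to log the behaviour in step 4 instead of watching
top by hand, here is a minimal sketch that pulls the nfsd WCPU figure
out of top output. It assumes the column layout shown in Howard's
snippet; the function name nfsd_wcpu is made up for illustration, and
how the output is captured (for example top in batch mode) is left out.

```python
def nfsd_wcpu(top_output, command="nfsd"):
    """Return the WCPU percentages of all rows whose COMMAND matches."""
    lines = top_output.splitlines()
    # Locate the header row so column positions are not hard-coded.
    header = next(l.split() for l in lines
                  if l.split()[:2] == ["PID", "USERNAME"])
    wcpu_col = header.index("WCPU")
    cmd_col = header.index("COMMAND")
    usage = []
    for line in lines:
        fields = line.split()
        # Skip the header itself and anything that is not a process row.
        if len(fields) != len(header) or fields == header:
            continue
        if fields[cmd_col] == command:
            usage.append(float(fields[wcpu_col].rstrip("%")))
    return usage


# Sample taken from the top snippet quoted earlier in this message.
sample = """\
  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
"""

print(nfsd_wcpu(sample))  # [188.48]
```

Sampling this every few seconds after the du has finished would show
whether the CPU usage ever drops back once the NFS traffic has stopped.
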
On the FreeBSD 6.1R client, it uses a UDP mount and the fstab options are
"rw,-L,nosuid,bg,nodev". On the Linux client, it uses a UDP mount and the
fstab options are "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
The server's kernel conf is at

http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports
  /export/dir1 host1 host2...
  /export/dir2 host1 host2...

/etc/rc.conf
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

/etc/fstab:
/dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server also uses amd to mount some backup directories
from another NFS server. The amd.conf is:

[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes

[/backup]
map_options = type:=direct
map_name = /etc/amd.direct

/etc/amd.direct:
/defaults
opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup          type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}


If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what happens
when nfsd eats lots of CPU, but in vain. They do not return anything.

Regards,
Rong-En Fan


