Date:      Tue, 23 May 2006 08:56:54 -0400
From:      "Rong-en Fan" <grafan@gmail.com>
To:        "Konstantin Belousov" <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org, Howard Leadmon <howard@leadmon.net>, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: Trouble with NFSd under 6.1-Stable, any ideas?
Message-ID:  <6eb82e0605230556n31b86e55y1b07a2ef6ad9ca14@mail.gmail.com>
In-Reply-To: <20060523081041.GL54541@deviant.kiev.zoral.com.ua>
References:  <017301c67784$45377a90$071872cf@Leadmon.local> <20060515024958.GA99002@xor.obsecurity.org> <6eb82e0605221443m5cc3c93bwaf9126ff2fb59667@mail.gmail.com> <20060523081041.GL54541@deviant.kiev.zoral.com.ua>

On 5/23/06, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> > On 5/14/06, Kris Kennaway <kris@obsecurity.org> wrote:
> > >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> > >>
> > >>    Hello All,
> > >>
> > >>  I have been running FBSD a long while, and actually running since the 5.x
> > >> releases on the server I am having troubles with.   I basically have a small
> > >> network and just use NIS/NFS to link my various FBSD and Solaris machines
> > >> together.
> > >>
> > >>  This has all been running fine up till a few days ago, when all of a sudden
> > >> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > >> When I had 6.1-RC running all seemed well, then came the announcement for the
> > >> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > >> mergemaster to get everything up to the 6.1 stable version.
> > >>
> > >>  Now after doing this, something is wrong with NFS.   It works, it will return
> > >> information and open files, just it's very very slow, and while performing a
> > >> request the CPU spike is astounding.  A simple du of my home directory can
> > >> take minutes, and machine all but locks up if the request is done over NFS.
> > >> Here is top snip:
> > >>
> > >>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> > >>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> > >>
> > >>
> > >>  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > >> disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > >> am not really sure what info would be useful in debugging, so won't post tons
> > >> of misc junk in this eMail, but if anyone has any ideas as to how best to
> > >> figure out and resolve this issue it would sure be appreciated...
> > >
> > >Use tcpdump and related tools to find out what traffic is being sent.
> > >
> > >Also verify that you did not change your system configuration in any
> > >way: there have been no changes to NFS since the release, so it is
> > >unclear why an update would cause the problem to suddenly occur.
> > >
> > >Kris
> >
> > Hi Kris and Howard,
> >
> > As I posted a few days ago, I have problems similar to Howard's
> > (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> > and nfsd eats lots of cpu" on stable@). After binary searching
> > the source tree, I found that
> >
> > RELENG_6_1, 2006.04.30.03.57 ok
> > RELENG_6_1, 2006.04.30.04.00 bad
> >
> > The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> > With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> > the same problem occurs.
> >
> > Let me restate the problems I'm seeing:
> >
> > 1. a client (whether Linux 2.6.16 or FreeBSD 6.1) runs du on
> >   an NFS directory
> > 2. on the server side, nfsd starts to eat lots of CPU
> > 3. the du finishes
> > 4. on the server side, nfsd still eats lots of CPU, but there is no
> >   NFS traffic. After waiting 5 minutes, you can still see that nfsd is
> >   "running" and eating lots of CPU.
> >
> > The FreeBSD 6.1R client uses a UDP mount, and its fstab options are like
> > "rw,-L,nosuid,bg,nodev". The Linux client uses a UDP mount, and its
> > fstab options are like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
> > The server's kernel conf is at
> >
> > http://www.rafan.org/FreeBSD/nfs/KERNEL
> >
> > Some related configuration files:
> >
> > /etc/exports
> >  /export/dir1 host1 host2...
> >  /export/dir2 host1 host2...
> >
> > /etc/rc.conf
> > nfs_server_enable="YES"
> > nfs_server_flags="-u -t -n 16"
> > mountd_enable="YES"
> > mountd_flags="-r -l -n"
> > rpc_lockd_enable="YES"
> > rpc_statd_enable="YES"
> > rpcbind_enable="YES"
> >
> > /etc/fstab:
> > /dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
> > /dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
> >
> > The NFS server is also using amd to mount some backup directories
> > from another NFS server. The amd.conf is
> >
> > [global]
> > browsable_dirs = yes
> > map_type = file
> > mount_type = nfs
> > auto_dir = /nfs
> > fully_qualified_hosts = no
> > log_file = syslog
> > nfs_proto = udp
> > nfs_allow_insecure_port = no
> > nfs_vers = 3
> > # plock = yes
> > selectors_on_default = yes
> > restart_mounts = yes
> >
> > [/backup]
> > map_options = type:=direct
> > map_name = /etc/amd.direct
> >
> > /etc/amd.direct:
> > /defaults
> > opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> > backup          type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
> >
> >
> > If there is anything I can provide to help track this down, please
> > let me know. By the way, I tried truss/kdump to see what happens
> > when nfsd eats lots of CPU, but in vain. They do not return anything.
> >
> I tried your recipe on 7-CURRENT with locally exported fs, remounted
> over nfs. I did not get the behaviour you described.

As noted in my previous thread, I have another 6.1-RELEASE nfs server,
which does not have this problem.

> Could you please provide the backtrace for the nfsd that
> eats the CPU (from ddb)? I think it would be helpful to get several
> backtraces (i.e., bt <nfsd pid>, cont, bt <nfsd pid> ...) to
> see where it is running.

I'm afraid that I cannot do that. The last time I tried breaking into ddb (on 5.x),
it hung my serial console, and the server is miles away :-( . Perhaps we
can ask Howard to do that?
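
If someone does manage to capture the repeated backtraces Konstantin asks for, tallying the innermost frames across the samples is a cheap way to see where nfsd spends its time. The following is only a sketch in Python, not something from this thread: it assumes each sample was saved as text with frames shaped like `name(args) at name+0xoffset`, and the helper names `frames` and `tally` are made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: each frame line looks like "name(args) at name+0x1a4".
# Adjust the pattern if the captured backtraces are formatted differently.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-fA-F]+")


def frames(backtrace_text):
    """Return the function names found in one backtrace, innermost first."""
    return FRAME_RE.findall(backtrace_text)


def tally(backtraces, depth=3):
    """Count in how many samples each function appears among the top `depth` frames."""
    counts = Counter()
    for bt in backtraces:
        # set() so a recursive function is counted once per sample
        counts.update(set(frames(bt)[:depth]))
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical samples; func_a, func_b and func_c are placeholder names.
    samples = [
        "func_a(1,2) at func_a+0x10\nfunc_b() at func_b+0x20\n",
        "func_a(3) at func_a+0x14\nfunc_c() at func_c+0x8\n",
    ]
    for name, hits in tally(samples, depth=1):
        print(f"{name}: seen in {hits} of {len(samples)} samples")
```

A function that shows up near the top of most samples is the likely place where the process is spinning.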

> Also, just in case, does the filesystem that is exported and shows the problem
> have quotas enabled? One line of your fstab has userquota, the other does not.

No.

Regards,
Rong-En Fan


