Date: Tue, 23 May 2006 11:10:41 +0300
From: Konstantin Belousov
To: Rong-en Fan
Cc: freebsd-stable@freebsd.org, Howard Leadmon, Kris Kennaway
Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
Message-ID: <20060523081041.GL54541@deviant.kiev.zoral.com.ua>
In-Reply-To: <6eb82e0605221443m5cc3c93bwaf9126ff2fb59667@mail.gmail.com>

On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> On 5/14/06, Kris Kennaway wrote:
> >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >>
> >> Hello All,
> >>
> >> I have been running FBSD a long while, and actually running since
> >> the 5.x releases on the server I am having troubles with. I
> >> basically have a small network and just use NIS/NFS to link my
> >> various FBSD and Solaris machines together.
> >>
> >> This has all been running fine up till a few days ago, when all of
> >> a sudden NFS came to a crawl, and CPU usage so high the box appears
> >> to freeze almost. When I had 6.1-RC running all seemed well, then
> >> came the announcement for the official 6.1 release, so I did the
> >> cvs updates, made world, kernel, and ran mergemaster to get
> >> everything up to the 6.1 stable version.
> >>
> >> Now after doing this, something is wrong with NFS. It works, it
> >> will return information and open files, just it's very very slow,
> >> and while performing a request the CPU spike is astounding. A
> >> simple du of my home directory can take minutes, and machine all
> >> but locks up if the request is done over NFS.
> >> Here is top snip:
> >>
> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> >>   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >>
> >>
> >> This is a nice IBM eServer with dual P4-XEON's and a couple GB of
> >> RAM on a disk array, and locally it screams, heck NFS used to
> >> scream till I updated. I am not really sure what info would be
> >> useful in debugging, so won't post tons of misc junk in this eMail,
> >> but if anyone has any ideas as to how best to figure out and
> >> resolve this issue it would sure be appreciated...
> >
> >Use tcpdump and related tools to find out what traffic is being sent.
> >
> >Also verify that you did not change your system configuration in any
> >way: there have been no changes to NFS since the release, so it is
> >unclear why an update would cause the problem to suddenly occur.
> >
> >Kris
>
> Hi Kris and Howard,
>
> As I posted a few days ago, I have problems similar to Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.
>
> Let me refresh what problems I'm seeing
>
> 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on
>    an nfs directory
> 2. on server-side, nfsd starts to eat lots of CPU
> 3. the du finishes
> 4. on server-side, nfsd still eats lots of CPU, but there is no
>    nfs traffic. Wait for 5 minutes, you can still see that nfsd is
>    "running" and eats lots of CPU.
>
> On FreeBSD 6.1R client, it uses UDP mount and fstab is like
> "rw,-L,nosuid,bg,nodev". On Linux client, it uses UDP mount and
> fstab is like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
> The server's kernel conf is at
>
> http://www.rafan.org/FreeBSD/nfs/KERNEL
>
> Some related configuration files:
>
> /etc/exports
> /export/dir1 host1 host2...
> /export/dir2 host1 host2...
>
> /etc/rc.conf
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 16"
> mountd_enable="YES"
> mountd_flags="-r -l -n"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
>
> /etc/fstab:
> /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
> /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
>
> The NFS server is also using amd to mount some backup directories
> from another NFS server. the amd.conf is
>
> [global]
> browsable_dirs = yes
> map_type = file
> mount_type = nfs
> auto_dir = /nfs
> fully_qualified_hosts = no
> log_file = syslog
> nfs_proto = udp
> nfs_allow_insecure_port = no
> nfs_vers = 3
> # plock = yes
> selectors_on_default = yes
> restart_mounts = yes
>
> [/backup]
> map_options = type:=direct
> map_name = /etc/amd.direct
>
> /etc/amd.direct:
> /defaults
> opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
>
>
> If there is anything I can provide to help track this down, please
> let me know. By the way, I tried with truss/kdump to see what happens
> when nfsd eats lots of CPU, but in vain. They do not return anything.
>

I tried your recipe on 7-CURRENT with a locally exported fs, remounted
over NFS. I did not get the behaviour you described. Could you please
provide a backtrace for the nfsd that eats the CPU (from ddb)? I think
it would be helpful to get several backtraces (i.e., bt , cont, bt ...)
to see where it is running.

Also, just in case, does the exported filesystem that shows the problem
have quotas enabled? One line of your fstab has userquota, the other
does not.
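
For the quota question, a quick check of what fstab requests for each
export might look like the sketch below. It is only an illustration:
the function name quota_mounts and the SAMPLE text are made up here
(SAMPLE copies the two fstab lines quoted above, with the device names
still elided), and it assumes the FreeBSD fstab option names userquota
and groupquota. It reports what fstab asks for, not whether quotas are
actually turned on in the running system.

```python
# Hypothetical helper: list the quota-related options of each fstab entry.
QUOTA_OPTIONS = ("userquota", "groupquota")


def quota_mounts(fstab_text):
    """Map each mount point to the quota options found in its fstab entry."""
    result = {}
    for line in fstab_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point = fields[1]
        options = fields[3].split(",")
        # An option may carry a value, e.g. userquota=/path/to/file.
        result[mount_point] = [
            opt for opt in options if opt.split("=", 1)[0] in QUOTA_OPTIONS
        ]
    return result


# The two entries from the quoted /etc/fstab; device names stay elided.
SAMPLE = """\
/dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
"""

if __name__ == "__main__":
    for mount_point, quotas in quota_mounts(SAMPLE).items():
        print(mount_point, ",".join(quotas) or "no quota options")
```

If the problem only shows up on the export that lists userquota, that
would narrow things down considerably.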