From: "Rong-en Fan"
To: "Howard Leadmon", "Kris Kennaway"
Cc: freebsd-stable@freebsd.org
Date: Mon, 22 May 2006 17:43:32 -0400
Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
Message-ID: <6eb82e0605221443m5cc3c93bwaf9126ff2fb59667@mail.gmail.com>
In-Reply-To: <20060515024958.GA99002@xor.obsecurity.org>
References: <017301c67784$45377a90$071872cf@Leadmon.local> <20060515024958.GA99002@xor.obsecurity.org>

On 5/14/06, Kris Kennaway wrote:
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >
> >    Hello All,
> >
> >  I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with.  I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> >
> >  This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> >
> >  Now after doing this, something is wrong with NFS.  It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding.  A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> >   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >
> >
> >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
>
> Use tcpdump and related tools to find out what traffic is being sent.
>
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so it is
> unclear why an update would cause the problem to suddenly occur.
>
> Kris

Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details are in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching the
source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit between the two is to kern/vfs_lookup.c, an MFC of
rev 1.90 and 1.91. With 04.30 03.57's source + manually patched
vfs_lookup.c rev 1.90, the same problem occurs.

Let me restate the problems I'm seeing:

1. a client (whether Linux 2.6.16 or FreeBSD 6.1) runs du on an NFS
   directory
2. on the server side, nfsd starts to eat a lot of CPU
3. the du finishes
4. on the server side, nfsd still eats a lot of CPU, but there is no
   NFS traffic. Even after waiting 5 minutes, nfsd is still "running"
   and eating a lot of CPU.

The FreeBSD 6.1R client uses a UDP mount, with fstab options like
"rw,-L,nosuid,bg,nodev". The Linux client also uses a UDP mount, with
fstab options like
"defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".

The server's kernel conf is at
http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports:
/export/dir1 host1 host2...
/export/dir2 host1 host2...

/etc/rc.conf:
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

/etc/fstab:
/dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server is also using amd to mount some backup directories
from another NFS server.
The amd.conf is

[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes

[/backup]
map_options = type:=direct
map_name = /etc/amd.direct

/etc/amd.direct:

/defaults opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}

If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what happens
when nfsd eats a lot of CPU, but in vain: they do not return anything.

Regards,
Rong-en Fan
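
P.S. For anyone who wants to repeat the date-based binary search over the source tree mentioned above, here is a minimal sketch of the idea in Python. It is not the procedure I actually ran, only an illustration. The names bisect_by_date, is_bad and ASSUMED_REGRESSION are hypothetical. In real use is_bad would check out the tree as of the given timestamp (for example with cvs's -D date option), rebuild, and run the du test by hand; here it is replaced by a stand-in predicate with an assumed commit time, so the loop can be run as is.

```python
from datetime import datetime, timedelta


def bisect_by_date(good, bad, is_bad, resolution=timedelta(minutes=1)):
    """Narrow down (last known good, first known bad) source timestamps.

    Assumes is_bad(good) is False and is_bad(bad) is True, and that the
    tree stays bad once the regression has been committed.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2  # midpoint of the current window
        if is_bad(mid):
            bad = mid    # regression is already present at mid
        else:
            good = mid   # regression was committed after mid
    return good, bad


# Stand-in for "check out, build, run du over NFS, watch nfsd".
# ASSUMED_REGRESSION is an assumption for this demo only; the message
# above only establishes that the change lies between 03:57 and 04:00.
ASSUMED_REGRESSION = datetime(2006, 4, 30, 3, 58)


def fake_is_bad(timestamp):
    return timestamp >= ASSUMED_REGRESSION


last_good, first_bad = bisect_by_date(
    datetime(2006, 4, 1), datetime(2006, 5, 22), fake_is_bad
)
print("last good:", last_good)
print("first bad:", first_bad)
```

Each step halves the window, so going from several weeks down to a few minutes takes only around fifteen build-and-test rounds, which is why this is practical even when each round means a full kernel rebuild.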