Date: Wed, 25 Apr 2001 15:08:52 +0100
From: Oliver Cook
To: freebsd-hackers@freebsd.org
Subject: open (vfs_syscalls.c:994) && NFS
Message-ID: <20010425150852.B37512@mutare.noc.clara.net>
X-Operating-System: FreeBSD 4.2-RELEASE i386

A little bit of background: these systems are FreeBSD 3.x and 4.x
installations running Apache 1.3.x, serving web pages stored on a
NetApp filer over NFS.

One folder has a corrupt directory entry:

/clara/htdocs/clara.net/k/o/m/komunikation/webspace/

Trying to 'cat', 'cp', etc. any file in this directory leaves the
process stuck in "D" (disk wait) state. After about a week there are
hundreds of stuck httpd processes in exactly this state.

It is not possible to attach to them, but information can be gleaned
from a kernel backtrace:

hera[/]# ps aux|grep httpd|grep " D"|head 1
claranet 82569  0.0  0.0  2464   68  ??  D     6:47AM   0:00.01 /usr/local/apache/bin/httpd
Broken pipe

hera[/]# gdb -k /sys/compile/HERA/kernel.debug /dev/mem
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 3080192
initial pcb at be17000
panic messages:
---
---
#0  mi_switch () at ../../kern/kern_synch.c:859
859             if (switchtime.tv_sec == 0)
(kgdb) proc 82569
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:859
#1  0xc01467e9 in tsleep (ident=0xe00a3aca, priority=18,
    wmesg=0xc024a79b "nfsvinval", timo=0) at ../../kern/kern_synch.c:468
#2  0xc01ad14f in nfs_vinvalbuf (vp=0xe0097b80, flags=1, cred=0xc691e800,
    p=0xe27c8220, intrflg=1) at ../../nfs/nfs_bio.c:1170
#3  0xc01d02a6 in nfs_open (ap=0xe2878e10) at ../../nfs/nfs_vnops.c:506
#4  0xc01736af in vn_open (ndp=0xe2878edc, fmode=1, cmode=420)
    at vnode_if.h:189
#5  0xc016f6a1 in open (p=0xe27c8220, uap=0xe2878f80)
    at ../../kern/vfs_syscalls.c:994
#6  0xc02238e6 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 4, tf_esi = 672559256, tf_ebp = -1077937648,
      tf_isp = -494432300, tf_ebx = 672502180, tf_edx = 672559256,
      tf_ecx = 15, tf_eax = 5, tf_trapno = 12, tf_err = 2,
      tf_eip = 672418516, tf_cs = 31, tf_eflags = 659,
      tf_esp = -1077937692, tf_ss = 47}) at ../../i386/i386/trap.c:1073
#7  0xc0218be6 in Xint0x80_syscall ()
#8  0x8062fe0 in ?? ()
#9  0x806ccdd in ?? ()
#10 0x806618c in ?? ()
#11 0x80797f4 in ?? ()
#12 0x807985e in ?? ()
#13 0x8071027 in ?? ()
#14 0x80712ac in ?? ()
#15 0x807162c in ?? ()
#16 0x8071b41 in ?? ()
#17 0x8072144 in ?? ()
#18 0x804a159 in ?? ()
(kgdb) fr 5
#5  0xc016f6a1 in open (p=0xe27c8220, uap=0xe2878f80)
    at ../../kern/vfs_syscalls.c:994
994             error = vn_open(&nd, flags, cmode);
(kgdb) print nd
$1 = {
  ni_dirp = 0x80e6a64 "/clara/htdocs/clara.net/k/o/m/komunikation/webspace/mabel.xls",
  ni_segflg = UIO_USERSPACE, ni_startdir = 0x0, ni_rootdir = 0xdd196ec0,
  ni_topdir = 0x0, ni_vp = 0xe0097b80, ni_dvp = 0xe0097c20, ni_pathlen = 1,
  ni_next = 0xe0424036 "htm", ni_loopcnt = 1, ni_cnd = {cn_nameiop = 0,
    cn_flags = 49220, cn_proc = 0xe27c8220, cn_cred = 0xc691e800,
    cn_pnbuf = 0xe0424000 "", cn_nameptr = 0xe042402d "ce/n1nhs.htm",
    cn_namelen = 9, cn_consume = 0}}
(kgdb) print nd->ni_cnd->cn_nameptr
$2 = 0xe042402d "ce/n1nhs.htm"
(kgdb) print nd->ni_cnd->cn_nameptr
$3 = 0xe042402d "ce/n1nhs.htm"

The pointer ni_dirp refers to a file in the directory with the corrupt
entry. This is true for ALL the processes that are stuck in 'D'. What
does change is the string at cn_nameptr, which is different for every
web request.

I would have thought that httpd would have alloc'ed memory for the
open(), so I am at a loss as to why the ni_dirp pointer refers to the
Excel spreadsheet in the directory with the corrupt entry. Why does
this not change from request to request as more files are opened and
closed over NFS?

Can anybody explain what is going on with open()?

Thanks.

Ollie

-- 
Oliver Cook    Systems Administrator, ClaraNET
ollie@uk.clara.net               020 7903 3000 ext. 291
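
The `ps aux|grep httpd|grep " D"` filter used earlier matches " D" anywhere on the line, not only in the STAT column. A minimal sketch of a stricter check follows, assuming the usual BSD `ps aux` column order (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND) and a STARTED field without embedded spaces. The function name `stuck_pids` is hypothetical, and the second process row in `sample` is invented for illustration; only the first row comes from the session above.

```python
def stuck_pids(ps_output, command_substring="httpd"):
    """Return PIDs whose STAT column starts with 'D' (disk wait).

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its own spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        stat, command = fields[7], fields[10]
        if stat.startswith("D") and command_substring in command:
            pids.append(int(fields[1]))
    return pids


# First process row is taken from the session above; the second row
# (PID 82570, state S) is made up to show a process that is filtered out.
sample = (
    "USER       PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND\n"
    "claranet 82569  0.0  0.0  2464   68  ??  D     6:47AM   0:00.01 /usr/local/apache/bin/httpd\n"
    "claranet 82570  0.0  0.1  2464  900  ??  S     6:48AM   0:00.02 /usr/local/apache/bin/httpd\n"
)

if __name__ == "__main__":
    print(stuck_pids(sample))
```

Each PID returned this way could then be passed to `proc` in the kernel debugger, as was done by hand for 82569.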