From owner-freebsd-current Mon Jun 28 2: 2:20 1999
Delivered-To: freebsd-current@freebsd.org
Received: from allegro.lemis.com (allegro.lemis.com [192.109.197.134])
	by hub.freebsd.org (Postfix) with ESMTP id 6814D14BDA
	for ; Mon, 28 Jun 1999 02:02:12 -0700 (PDT)
	(envelope-from grog@freebie.lemis.com)
Received: from freebie.lemis.com (freebie.lemis.com [192.109.197.137])
	by allegro.lemis.com (8.9.1/8.9.0) with ESMTP id SAA17555;
	Mon, 28 Jun 1999 18:32:08 +0930 (CST)
Received: (from grog@localhost)
	by freebie.lemis.com (8.9.3/8.9.0) id SAA46890;
	Mon, 28 Jun 1999 18:32:06 +0930 (CST)
Date: Mon, 28 Jun 1999 18:32:06 +0930
From: Greg Lehey
To: Peter Wemm
Cc: Kirk McKusick, Matthew Dillon, Alan Cox, Julian Elischer,
	Mike Smith, "John S. Dyson", dg@root.com, dyson@iquest.net,
	current@freebsd.org
Subject: Re: Found the startup panic - ccd ( patch included )
Message-ID: <19990628183206.T43194@freebie.lemis.com>
References: <199906280347.UAA01061@flamingo.McKusick.COM>
	<19990628083631.B54E582@overcee.netplex.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <19990628083631.B54E582@overcee.netplex.com.au>; from Peter Wemm on Mon, Jun 28, 1999 at 04:36:31PM +0800
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Monday, 28 June 1999 at 16:36:31 +0800, Peter Wemm wrote:
> Kirk McKusick wrote:
> [..]
> Re: kern_lock.c - looks like a reasonable fix.  There isn't much point
> panicking for a poll.
>
>> I do not see the problem that you are pointing out with missing
>> BUF_KERNPROC in cluster_callback, but it is well past midnight,
>> so I may not be thinking clearly.
>
> No, you are correct.
> I had experimented with moving the cluster_head
> reassignment to LK_KERNPROC out of BUF_KERNPROC() and into the point
> where it was added into the tailq, but in the end I realized it wasn't
> much different and suspected that there could be problems if the cluster
> build was abandoned at some point, so I left it the way it is.
>
>> Greg Lehey has sent me a panic with the buffer locking in the NFS code.
>> I am too tired to attack it tonight, but will look at it in the morning.
>
> I might have a look if I get a chance..

I've been collecting them :-)  The first one looks like this:

Debugger (msg=0xc025e7fb "panic") at ../../i386/i386/db_interface.c:326
326     }
#0  Debugger (msg=0xc025e7fb "panic") at ../../i386/i386/db_interface.c:326
#1  0xc0153474 in panic (fmt=0xc02676a0 "nfs_strategy: buffer %p not locked")
    at ../../kern/kern_shutdown.c:450
#2  0xc01e41b6 in nfs_strategy (ap=0xc6df2d84) at ../../nfs/nfs_vnops.c:2650
#3  0xc01e4be1 in nfs_writebp (bp=0xc236d7d8, force=0x0, procp=0x0)
    at vnode_if.h:891
#4  0xc01b1f93 in nfs_write (ap=0xc6df2eb4) at ../../nfs/nfs_bio.c:975
#5  0xc01807d2 in vn_write (fp=0xc0f55000, uio=0xc6df2efc, cred=0xc0f52300,
    flags=0x0) at vnode_if.h:331
#6  0xc015f4c8 in dofilewrite (p=0xc68aa260, fp=0xc0f55000, fd=0x1,
    buf=0x80a4000, nbyte=0x400, offset=0xffffffffffffffff, flags=0x0)
    at ../../kern/sys_generic.c:363
#7  0xc015f3d7 in write (p=0xc68aa260, uap=0xc6df2f80)
    at ../../kern/sys_generic.c:298
#8  0xc022aee6 in syscall (frame={tf_fs = 0x2f, tf_es = 0x2f, tf_ds = 0x2f,
      tf_edi = 0x80a4000, tf_esi = 0x400, tf_ebp = 0xbfbfa6c4,
      tf_isp = 0xc6df2fd4, tf_ebx = 0x0, tf_edx = 0x1, tf_ecx = 0xa,
      tf_eax = 0x4, tf_trapno = 0x0, tf_err = 0x2, tf_eip = 0x806a7f8,
      tf_cs = 0x1f, tf_eflags = 0x246, tf_esp = 0xbfbfa6a8, tf_ss = 0x2f})
    at ../../i386/i386/trap.c:1056
#9  0xc021c960 in Xint0x80_syscall ()
#10 0x8052c3c in ?? ()
#11 0x8052bea in ?? ()
#12 0x804a170 in ?? ()
#13 0x804b45c in ?? ()
#14 0x804a675 in ?? ()
#15 0x8051073 in ?? ()
#16 0x8050fef in ?? ()
#17 0x80480e9 in ?? ()

It happens when I do just about any NFS write; I've been reproducing
it with 'make depend' in the NFS-mounted kernel build directory.  I'll
try to get a dump and send you both a message about where you can find
it.

The other one is even simpler to reproduce:

$ dd if=/dev/da0d of=/dev/null bs=120b

Debugger (msg=0xc025e7fb "panic") at ../../i386/i386/db_interface.c:326
326     }
#0  Debugger (msg=0xc025e7fb "panic") at ../../i386/i386/db_interface.c:326
#1  0xc0153474 in panic (fmt=0xc025d9a0 "lockmgr: locking against myself")
    at ../../kern/kern_shutdown.c:450
#2  0xc014eafb in debuglockmgr (lkp=0xc2368c4c, flags=0x10022,
    interlkp=0xc02c4564, p=0xc68aa3c0, name=0xc0261452 "lockmgr",
    file=0xc026145a "../../sys/buf.h", line=0x11b)
    at ../../kern/kern_lock.c:341
#3  0xc0173fc2 in getblk (vp=0xc6dcd780, blkno=0x20, size=0x2000,
    slpflag=0x0, slptimeo=0x0) at ../../sys/buf.h:283
#4  0xc0172531 in breadn (vp=0xc6dcd780, blkno=0x20, size=0x2000,
    rablkno=0xc6deee44, rabsize=0xc6deee48, cnt=0x1, cred=0x0,
    bpp=0xc6deee4c) at ../../kern/vfs_bio.c:433
#5  0xc01864d5 in spec_read (ap=0xc6deeeb4)
    at ../../miscfs/specfs/spec_vnops.c:308
#6  0xc01f8378 in ufsspec_read (ap=0xc6deeeb4)
    at ../../ufs/ufs/ufs_vnops.c:1826
#7  0xc01f8931 in ufs_vnoperatespec (ap=0xc6deeeb4)
    at ../../ufs/ufs/ufs_vnops.c:2327
#8  0xc0180684 in vn_read (fp=0xc0f56d80, uio=0xc6deeefc, cred=0xc0f52200,
    flags=0x0) at vnode_if.h:303
#9  0xc015f157 in dofileread (p=0xc68aa3c0, fp=0xc0f56d80, fd=0x3,
    buf=0x805d000, nbyte=0xf000, offset=0xffffffffffffffff, flags=0x0)
    at ../../kern/sys_generic.c:179
#10 0xc015f067 in read (p=0xc68aa3c0, uap=0xc6deef80)
    at ../../kern/sys_generic.c:111
#11 0xc022aee6 in syscall (frame={tf_fs = 0x2f, tf_es = 0x2f, tf_ds = 0x2f,
      tf_edi = 0xbfbfcfdc, tf_esi = 0xbfbfcfc8, tf_ebp = 0xbfbfcf90,
      tf_isp = 0xc6deefd4, tf_ebx = 0xbfbfcfc8, tf_edx = 0x4,
      tf_ecx = 0x80580a0, tf_eax = 0x3, tf_trapno = 0x16, tf_err = 0x2,
      tf_eip = 0x8049eec, tf_cs = 0x1f, tf_eflags = 0x246,
      tf_esp = 0xbfbfcf80, tf_ss = 0x2f})
    at ../../i386/i386/trap.c:1056
#12 0xc021c960 in Xint0x80_syscall ()
#13 0x8048c6d in ?? ()
#14 0x80480e9 in ?? ()

For some reason, breadn() doesn't seem to work any more.  It works
fine with bread(), but I haven't localized how the code differs.  I've
applied Kirk's patch to lockmgr, which seemed to relate to the
breadn() problem, but it then found a different set of flags to pass
to lockmgr (LK_SLEEPFAIL instead of LK_NOWAIT) and still caused the
panic.

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key