From owner-freebsd-current Thu Aug 16 13:36:27 2001
Delivered-To: freebsd-current@freebsd.org
Received: from blackhelicopters.org (geburah.blackhelicopters.org [209.69.178.18])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8C47337B401; Thu, 16 Aug 2001 13:36:22 -0700 (PDT)
	(envelope-from mwlucas@blackhelicopters.org)
Received: (from mwlucas@localhost)
	by blackhelicopters.org (8.9.3/8.9.3) id QAA52395;
	Thu, 16 Aug 2001 16:36:18 -0400 (EDT)
	(envelope-from mwlucas)
Date: Thu, 16 Aug 2001 16:36:18 -0400
From: Michael Lucas
To: John Baldwin
Cc: current@FreeBSD.org
Subject: Re: devfs and Vinum (was: any -current && vinum problems?)
Message-ID: <20010816163617.A52310@blackhelicopters.org>
References: <20010815181100.A48748@blackhelicopters.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: ; from jhb@FreeBSD.org on Wed, Aug 15, 2001 at 04:42:21PM -0700
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

[cc's trimmed]

John,

Thanks for the suggestion, I appreciate it.

I did as you suggested (diff below).  It panicked again, but this time
savecore said "dump time is unreasonable."  The short panic message
was:

panicstr: bremfree: bp 0xcc2a1ae4 not locked

Looks like the same thing to me, sorry.  Any other suggestions?

magpire/sys/kern;diff subr_witness.c subr_witness.c-dist
392a393
> 	mtx_lock(&all_mtx);
395d395
< 	mtx_lock(&all_mtx);
magpire/sys/kern;diff -c subr_witness.c subr_witness.c-dist
*** subr_witness.c	Thu Aug 16 16:16:06 2001
--- subr_witness.c-dist	Thu Aug 16 16:15:20 2001
***************
*** 390,398 ****
  		mtx_unlock_spin(&w_mtx);
  	}

  	lock_cur_cnt--;
  	STAILQ_REMOVE(&all_locks, lock, lock_object, lo_list);
- 	mtx_lock(&all_mtx);
  	lock->lo_flags &= ~LO_INITIALIZED;
  	mtx_unlock(&all_mtx);
  }
--- 390,398 ----
  		mtx_unlock_spin(&w_mtx);
  	}

+ 	mtx_lock(&all_mtx);
  	lock_cur_cnt--;
  	STAILQ_REMOVE(&all_locks, lock, lock_object, lo_list);
  	lock->lo_flags &= ~LO_INITIALIZED;
  	mtx_unlock(&all_mtx);
  }
magpire/sys/kern;

On Wed, Aug 15, 2001 at 04:42:21PM -0700, John Baldwin wrote:
>
> On 15-Aug-01 Michael Lucas wrote:
> > On Wed, Aug 15, 2001 at 10:21:39AM +0930, Greg Lehey wrote:
> >> To help localize this problem, could you please try this same thing on
> >> a kernel without devfs?  The dump you sent me did not look like a
> >> Vinum bug, as I said in my reply.
> >
> > Sorry, it happens on a non-devfs kernel as well.  Since it doesn't
> > appear to be a Vinum bug, I'm taking the liberty of sending the whole
> > thing to -current.  (I sent my first dump to Greg in particular, since
> > a Vinum command triggered whatever this is.)
> >
>
> > Script started on Wed Aug 15 17:57:48 2001
> > magpire/var/crash;file /boot/kernel/vinum.ko
> > /boot/kernel/vinum.ko: ELF 32-bit LSB shared object, Intel 80386, version 1
> > (FreeBSD), not stripped
> > magpire/var/crash;file kernel.debug.nodevfs
> > kernel.debug.nodevfs: ELF 32-bit LSB executable, Intel 80386, version 1
> > (FreeBSD), dynamically linked (uses shared libs), not stripped
> > magpire/var/crash;gdb -k kernel.debug.nodevfs vmcore.3
> > GNU gdb 4.18
> > Copyright 1998 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "i386-unknown-freebsd"...
> > IdlePTD 4284416
> > initial pcb at 34b860
> > panicstr: bremfree: bp 0xcc2a1ae4 not locked
>
> Unfortunately this is the panic message from later on during the syncing disks
> stage, not the real panic. :(
>
> >#15 0xc01f0783 in witness_destroy (lock=0xc1ec4e68) at
> >#../../../kern/subr_witness.c:395
>
> This is the real problem:
>
>         mtx_lock(&all_mtx);
>         lock_cur_cnt--;
>         STAILQ_REMOVE(&all_locks, lock, lock_object, lo_list);
>         lock->lo_flags &= ~LO_INITIALIZED;
>         mtx_unlock(&all_mtx);
>
> It panics in the STAILQ_REMOVE().  I've seen this a couple of times but have no
> idea how that list pointer is getting corrupted.  My guess is that a mutex is
> being destroyed twice or something dumb like that; however, I'm not sure how.
> The LO_INITIALIZED flags and checks are supposed to catch that case.  I suppose
> there is a chance we could preempt in between the LO_INITIALIZED check and the
> actual removal and then free it and get in trouble that way.  Hmm.  Try moving
> the mtx_lock of &all_mtx before the check for LO_INITIALIZED and see if you can
> get a different panic.  It may be a bug in the ucred stuff.  (At least several
> other panics of this type have been the result of crfree's.)
>
> --
>
> John Baldwin -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

--
Michael Lucas
mwlucas@blackhelicopters.org
http://www.blackhelicopters.org/~mwlucas/
Big Scary Daemons: http://www.oreillynet.com/pub/q/Big_Scary_Daemons

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
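
The reordering John suggests in the quoted text (take all_mtx before the
LO_INITIALIZED check, so that the check and the list removal happen under
one lock) can be sketched in userland C.  This is only an illustration
under stated assumptions: demo_lock, demo_register, demo_destroy and
demo_count are hypothetical names, a pthread mutex stands in for the
kernel's all_mtx, and none of this is the actual subr_witness.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the kernel's lock_object; not the real layout. */
struct demo_lock {
	bool initialized;		/* plays the role of LO_INITIALIZED */
	STAILQ_ENTRY(demo_lock) link;	/* plays the role of lo_list */
};

static STAILQ_HEAD(, demo_lock) all_locks = STAILQ_HEAD_INITIALIZER(all_locks);
static pthread_mutex_t all_mtx = PTHREAD_MUTEX_INITIALIZER;
static int lock_cur_cnt;

/* Add a lock to the global list and mark it initialized, all under all_mtx. */
void
demo_register(struct demo_lock *l)
{
	pthread_mutex_lock(&all_mtx);
	l->initialized = true;
	STAILQ_INSERT_TAIL(&all_locks, l, link);
	lock_cur_cnt++;
	pthread_mutex_unlock(&all_mtx);
}

/*
 * Remove a lock from the global list.  The mutex is taken BEFORE the
 * initialized check, so no other thread can remove the same entry between
 * the check and STAILQ_REMOVE().  Returns false for a lock that is not
 * (or no longer) on the list, e.g. a second destroy of the same lock.
 */
bool
demo_destroy(struct demo_lock *l)
{
	bool removed = false;

	pthread_mutex_lock(&all_mtx);
	if (l->initialized) {
		lock_cur_cnt--;
		STAILQ_REMOVE(&all_locks, l, demo_lock, link);
		l->initialized = false;
		removed = true;
	}
	pthread_mutex_unlock(&all_mtx);
	return (removed);
}

/* Read the current count under the same mutex. */
int
demo_count(void)
{
	int n;

	pthread_mutex_lock(&all_mtx);
	n = lock_cur_cnt;
	pthread_mutex_unlock(&all_mtx);
	return (n);
}
```

In this sketch a double destroy is reported to the caller instead of
walking a list the entry is no longer on.  Whether the panic in this
thread is really caused by such a window is exactly what the suggested
test was meant to find out.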