From owner-freebsd-current@FreeBSD.ORG Tue Oct 19 21:13:17 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 49DE316A530 for ; Tue, 19 Oct 2004 21:13:17 +0000 (GMT) Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201]) by mx1.FreeBSD.org (Postfix) with ESMTP id 05FA343D46 for ; Tue, 19 Oct 2004 21:13:17 +0000 (GMT) (envelope-from john@baldwin.cx) Received: (qmail 20957 invoked from network); 19 Oct 2004 21:13:16 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 19 Oct 2004 21:13:16 -0000 Received: from zion.baldwin.cx (zion.baldwin.cx [192.168.0.7]) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9JLDC2j050321; Tue, 19 Oct 2004 17:13:12 -0400 (EDT) (envelope-from john@baldwin.cx) Received: from zion.baldwin.cx (localhost [127.0.0.1]) by zion.baldwin.cx (8.12.10/8.12.10) with ESMTP id i9JLDC8h005714; Tue, 19 Oct 2004 17:13:12 -0400 (EDT) (envelope-from john@zion.baldwin.cx) Received: from localhost (localhost [[UNIX: localhost]]) by zion.baldwin.cx (8.12.10/8.12.10/Submit) id i9JLDCRr005713; Tue, 19 Oct 2004 17:13:12 -0400 (EDT) (envelope-from john) From: John Baldwin To: freebsd-current@freebsd.org Date: Tue, 19 Oct 2004 17:03:52 -0400 User-Agent: KMail/1.6 References: In-Reply-To: MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410191703.52737.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Robert Watson cc: current@freebsd.org Subject: Re: Preemption-related bug in propagate_priority(): it's OK to hold Giant but not be running X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Oct 2004 21:13:17 -0000 On Sunday 17 October 2004 03:17 pm, Robert Watson wrote: > Ran into an interesting problem relating to WITNESS and a > multiply-destroyed mutex today, but I didn't get the real panic message > because of what looks like one or two other bugs. Here's the high-level: > > sodealloc() tries to destroy the mutex on a socket more than once, causing > WITNESS to panic. The soft clock ithread preempts the serial output in > panic() when a critical section is released, resulting in a context switch > to that ithread. That ithread then attempts to lock a mutex, and is > horrified to find out that Giant is held by the thread it preempted, but > that that thread isn't blocked: > > tiger-2# ppanic: process 15955(tcpconnect):2 holds Giant but isn't blocked > on a lock > > % rwatson -- note that double p is from the initiating panic > > cpuid = 0 > KDB: enter: panic > [thread 100071] > Stopped at kdb_enter+0x2b: nop > db> trace > kdb_enter(c07fc101) at kdb_enter+0x2b > panic(c07ff1ec,3e53,c5f90588,2,c080c957) at panic+0x127 > propagate_priority(c547b180,c08c47b8,c5f91780,c08bdbc0,0) at > propagate_priority+ > 0x139 > turnstile_wait(0,c08bdbc0,c5f91780,c08bdbc0,2,c07fb4a5,217) at > turnstile_wait+0x > 2de > _mtx_lock_sleep(c08bdbc0,c547b180,0,c07fd124,f7) at _mtx_lock_sleep+0x15f > _mtx_lock_flags(c08bdbc0,0,c07fd124,f7,0) at _mtx_lock_flags+0x85 > softclock(0) at softclock+0x16a > ithread_loop(c5436e80,e942fd48,c5436e80,c05f7c80,0) at ithread_loop+0x124 > fork_exit(c05f7c80,c5436e80,e942fd48) at fork_exit+0xa4 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip = 0, esp = 0xe942fd7c, ebp = 0 --- > db> show locks > db> show pcpu > cpuid = 0 > curthread = 0xc547b180: pid 87 "swi4: clock sio" > curpcb = 0xe942fda0 > fpcurthread = none > idlethread = 0xc53e4a80: pid 14 "idle: cpu0" > APIC ID = 0 > currentldt = 0x28 > spin locks held: > db> ps > pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd > 15955 c5f90400 ebbb1000 0 525 15955 0004002 [Can run] tcpconnect > ... > 87 c548b400 e946e000 0 0 0 000020c [LOCK Giant c5432d00] > swi4: clock sio > db> trace 15955 > sched_switch(c5f91780,c5451d80,6) at sched_switch+0x16f > mi_switch(6,c5451d80,c5451ed0,c5451d80,ebba492c) at mi_switch+0x264 > maybe_preempt(c5451d80) at maybe_preempt+0x156 > sched_add(c5451d80,4,c53e2d80,c5451d80,c5450800) at sched_add+0x153 > setrunqueue(c5451d80,4) at setrunqueue+0xab > ithread_schedule(c53e2d80,1f,c5f91780,c5f91780,2580) at > ithread_schedule+0xb3 > intr_execute_handlers(c53dc0d8,ebba49a8,1f,ebba49f0,c0780ad3) at > intr_execute_ha > ndlers+0xf5 > lapic_handle_intr(4f) at lapic_handle_intr+0x2e > Xapic_isr2() at Xapic_isr2+0x33 > --- interrupt, eip = 0xc0619446, esp = 0xebba49ec, ebp = 0xebba49f0 --- > critical_exit(3f8,ebba4a30,c076f731,c08fa000,0) at critical_exit+0x92 > _mtx_unlock_spin_flags(c08fa000,0,c08174fc,be4) at > _mtx_unlock_spin_flags+0x8d > siocnputc(c08987c0,70) at siocnputc+0xb9 > cnputc(70) at cnputc+0x4d > putchar(70,ebba4b44) at putchar+0x52 > kvprintf(c07fc2ed,c06219b8,ebba4b44,a,ebba4b64) at kvprintf+0x77 > printf(c07fc2ed,c08bf060,c08bf060,100,c07ff91e,ebba4b98) at printf+0x43 > panic(c07ff91e,c07b69b1,c0813aa8,c07ff6af,c5a23ebc) at panic+0xcb > witness_destroy(c5a23ebc,c5a23dec,ebba4bd0,c063f15f,c5a23ebc) at > witness_destroy > +0x40 > mtx_destroy(c5a23ebc,c5e94e00,c5a23ebc,ebba4be8,c063f4fd) at > mtx_destroy+0x7a > sodealloc(c5a23dec,c5a23eb0,c5832bb0,c5a23dec,ebba4c00) at sodealloc+0xf3 > sofree(c5a23dec) at sofree+0x281 > soclose(c5a23dec,c5832bb0,0,ebba4c2c,c05efad0) at soclose+0x24d > soo_close(c5832bb0,c5f91780) at soo_close+0x4b > fdrop_locked(c5832bb0,c5f91780,c53ccf40,0,c07f90b1) at fdrop_locked+0x84 > fdrop(c5832bb0,c5f91780,ebba4c78,c0628c5c,c07f90ba) at fdrop+0x24 > closef(c5832bb0,c5f91780,c5ebc628,0,c07f90b1) at closef+0x1db > close(c5f91780,ebba4d14,1,14b,292) at close+0xe6 > syscall(2f,bfbf002f,bfbf002f,3,bfbfedf7) at syscall+0x213 > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (6, FreeBSD ELF32, close), eip = 0x280c154b, esp = 0xbfbfec0c, > ebp = > 0xbfbfec48 --- > db> show locks 15955 > exclusive sleep mutex Giant r = 0 (0xc08bdbc0) locked @ > kern/kern_descrip.c:967 > > So on face value, it looks like propagate_priority() is unhappy about the > tcpconnect thread (that was preempted) still holding Giant. However, > given that preemption is supposed to preempt but leave a thread runnable, > this would seem not to be a panic-worthy condition. Yeah, pp doesn't like a thread in TD_CAN_RUN(). That state can only happen if you have already panic'd though. I can fix it to allow TD_CAN_RUN if the kernel has already panic'd though. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/