From owner-freebsd-current Wed Jul 3 0:26:50 2002 Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 446B137B400 for ; Wed, 3 Jul 2002 00:26:44 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5636343E3B for ; Wed, 3 Jul 2002 00:26:43 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA11552; Wed, 3 Jul 2002 17:26:33 +1000 Date: Wed, 3 Jul 2002 17:32:29 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Julian Elischer Cc: Andrew Gallatin , Subject: Re: KSE signal problems still In-Reply-To: Message-ID: <20020703165856.Q15258-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 2 Jul 2002, Julian Elischer wrote: > On Tue, 2 Jul 2002, Andrew Gallatin wrote: > > > > > An easy way to induce a panic w/a post KSE -current is to ^C gdb as it > > starts on an SMP machine: > > A possibly related breakage is: > > type ^Z while doing "make buiildworld" (or something similar). > > when you type 'fg' there is a high change the build will abort.. > > > > # gdb -k /var/crash/kernel.1 /var/crash/vmcore.1 > > GNU gdb 5.2.0 (FreeBSD) 20020627 > > Copyright 2002 Free Software Foundation, Inc. > > GDB is free software, covered by the GNU General Public License, and > > you are > > welcome to change it and/or distribute copies of it under certain > > conditions. > > Type "show copying" to see the conditions. > > There is absolutely no warranty for GDB. Type "show warranty" for > > details. > > This GDB was configured as "i386-undermydesk-freebsd"... > > ^C > > > > panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126 > > cpuid = 1; lapic.id = 01000000 > > Debugger("panic") > > Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0 > > db> where > > No such command > > db> tr > > Debugger(c02dbf5a) at Debugger+0x46 > > panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6 > > _mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8 > > forward_signal(c4445540) at forward_signal+0x1a > > tdsignal(c4445540,2,2) at tdsignal+0x182 > > psignal(c443d558,2) at psignal+0x3c8 > > pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63 > > ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316 > > ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f > > spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a > > spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13 > > vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8 > > dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf > > write(c409e780,d7d5ed14,3,b,282) at write+0x39 > > syscall(2f,2f,2f,1,8073410) at syscall+0x23c > > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b > > --- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp = > > 0xbfbff37c, ebp = 0xbfbff3e8 --- > > > > > > hummmmm > > so, the question is: > where should we get the sched lock? Maybe just remove the foot-shooting that releases it? % Index: kern_sig.c % =================================================================== % RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v % retrieving revision 1.170 % retrieving revision 1.171 % diff -u -1 -r1.170 -r1.171 % --- kern_sig.c 29 Jun 2002 02:00:01 -0000 1.170 % +++ kern_sig.c 29 Jun 2002 17:26:18 -0000 1.171 % @@ -1486,15 +1540,9 @@ % */ % - if (p->p_stat == SRUN) { % + mtx_unlock_spin(&sched_lock); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ shoot foot % + if (td->td_state == TDS_RUNQ || % + td->td_state == TDS_RUNNING) { I think sched_lock is needed for checking td_state too (strictly to use the result of the check, so the lock is not critical if the use doesn't do anything harmful), but there is no lock indication for td_state in proc.h like there used to be for p_stat. % + signotify(td->td_proc); Holding sched_lock when calling signotify() used to be an error, but that was changed in rev.1.155. This signotify() call seems to be bogus anyway. signotify() should only be called after the signal mask is changed. The call to signotify() here was removed in rev.1.154 when the semantics of signotify() was changed a little. Bogus calls to signotify() just waste time. % #ifdef SMP % - struct kse *ke; % - struct thread *td = curthread; % -/* we should only deliver to one thread.. but which one? */ % - FOREACH_KSEGRP_IN_PROC(p, kg) { % - FOREACH_KSE_IN_GROUP(kg, ke) { % - if (ke->ke_thread == td) { % - continue; % - } % - forward_signal(ke->ke_thread); % - } % - } % + if (td->td_state == TDS_RUNNING && td != curthread) % + forward_signal(td); % #endif forward_signal() was called with sched_lock held in rev.1.170, and forward_signal() still requires it to be held. I think sched_lock is needed for checking td_state too, as above. Here it is fairly clear that calling forward_signal() bogusly after losing a race is harmless. It just wakes up td to look for a signal that isn't there or can't be handled yet. Since this only happens if we lose a race, it may be more efficient to let it happen (rarely) than to lock (always) to prevent it happening. But we already held the lock so the locking was free except for latency issues. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message