Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 3 Jul 2002 17:32:29 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Julian Elischer <julian@elischer.org>
Cc:        Andrew Gallatin <gallatin@cs.duke.edu>, <freebsd-current@FreeBSD.ORG>
Subject:   Re: KSE signal problems still
Message-ID:  <20020703165856.Q15258-100000@gamplex.bde.org>
In-Reply-To: <Pine.BSF.4.21.0207021535240.97650-100000@InterJet.elischer.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 2 Jul 2002, Julian Elischer wrote:

> On Tue, 2 Jul 2002, Andrew Gallatin wrote:
>
> >
> > An easy way to induce a panic w/a post KSE -current is to ^C gdb as it
> > starts on an SMP machine:
>
> A possibly related breakage is:
>
> type ^Z while doing "make buiildworld" (or something similar).
>
> when you type 'fg' there is a high change the build will abort..
> >
> > # gdb -k /var/crash/kernel.1  /var/crash/vmcore.1
> > GNU gdb 5.2.0 (FreeBSD) 20020627
> > Copyright 2002 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and
> > you are
> > welcome to change it and/or distribute copies of it under certain
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for
> > details.
> > This GDB was configured as "i386-undermydesk-freebsd"...
> > ^C
> >
> > panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126
> > cpuid = 1; lapic.id = 01000000
> > Debugger("panic")
> > Stopped at      Debugger+0x46:  xchgl   %ebx,in_Debugger.0
> > db> where
> > No such command
> > db> tr
> > Debugger(c02dbf5a) at Debugger+0x46
> > panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6
> > _mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8
> > forward_signal(c4445540) at forward_signal+0x1a
> > tdsignal(c4445540,2,2) at tdsignal+0x182
> > psignal(c443d558,2) at psignal+0x3c8
> > pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63
> > ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316
> > ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f
> > spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a
> > spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13
> > vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8
> > dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf
> > write(c409e780,d7d5ed14,3,b,282) at write+0x39
> > syscall(2f,2f,2f,1,8073410) at syscall+0x23c
> > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> > --- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp =
> > 0xbfbff37c, ebp = 0xbfbff3e8 ---
> >
> >
>
> hummmmm
>
> so, the question is:
> where should we get the sched lock?

Maybe just remove the foot-shooting that releases it?

% Index: kern_sig.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
% retrieving revision 1.170
% retrieving revision 1.171
% diff -u -1 -r1.170 -r1.171
% --- kern_sig.c	29 Jun 2002 02:00:01 -0000	1.170
% +++ kern_sig.c	29 Jun 2002 17:26:18 -0000	1.171
% @@ -1486,15 +1540,9 @@
%  		 */
% -		if (p->p_stat == SRUN) {
% +		mtx_unlock_spin(&sched_lock);
   		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ shoot foot
% +		if (td->td_state == TDS_RUNQ ||
% +		    td->td_state == TDS_RUNNING) {

I think sched_lock is needed for checking td_state too (strictly to use
the result of the check, so the lock is not critical if the use doesn't
do anything harmful), but there is no lock indication for td_state
in proc.h like there used to be for p_stat.

% +			signotify(td->td_proc);

Holding sched_lock when calling signotify() used to be an error, but that
was changed in rev.1.155.  This signotify() call seems to be bogus anyway.
signotify() should only be called after the signal mask is changed.  The
call to signotify() here was removed in rev.1.154 when the semantics of
signotify() was changed a little.  Bogus calls to signotify() just waste
time.

%  #ifdef SMP
% -			struct kse *ke;
% -			struct thread *td = curthread;
% -/* we should only deliver to one thread.. but which one? */
% -			FOREACH_KSEGRP_IN_PROC(p, kg) {
% -				FOREACH_KSE_IN_GROUP(kg, ke) {
% -					if (ke->ke_thread == td) {
% -						continue;
% -					}
% -					forward_signal(ke->ke_thread);
% -				}
% -			}
% +			if (td->td_state == TDS_RUNNING && td != curthread)
% +				forward_signal(td);
%  #endif

forward_signal() was called with sched_lock held in rev.1.170, and
forward_signal() still requires it to be held.  I think sched_lock is
needed for checking td_state too, as above.  Here it is fairly clear
that calling forward_signal() bogusly after losing a race is harmless.
It just wakes up td to look for a signal that isn't there or can't
be handled yet.  Since this only happens if we lose a race, it may be
more efficient to let it happen (rarely) than to lock (always) to prevent
it happening.  But we already held the lock so the locking was free
except for latency issues.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020703165856.Q15258-100000>