From owner-freebsd-current  Wed Jul  3  0:26:50 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 446B137B400
	for <freebsd-current@FreeBSD.ORG>; Wed,  3 Jul 2002 00:26:44 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5636343E3B
	for <freebsd-current@FreeBSD.ORG>; Wed,  3 Jul 2002 00:26:43 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA11552;
	Wed, 3 Jul 2002 17:26:33 +1000
Date: Wed, 3 Jul 2002 17:32:29 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Julian Elischer <julian@elischer.org>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>,
	<freebsd-current@FreeBSD.ORG>
Subject: Re: KSE signal problems still
In-Reply-To: <Pine.BSF.4.21.0207021535240.97650-100000@InterJet.elischer.org>
Message-ID: <20020703165856.Q15258-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Tue, 2 Jul 2002, Julian Elischer wrote:

> On Tue, 2 Jul 2002, Andrew Gallatin wrote:
>
> >
> > An easy way to induce a panic w/a post KSE -current is to ^C gdb as it
> > starts on an SMP machine:
>
> A possibly related breakage is:
>
> type ^Z while doing "make buiildworld" (or something similar).
>
> when you type 'fg' there is a high change the build will abort..
> >
> > # gdb -k /var/crash/kernel.1  /var/crash/vmcore.1
> > GNU gdb 5.2.0 (FreeBSD) 20020627
> > Copyright 2002 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and
> > you are
> > welcome to change it and/or distribute copies of it under certain
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for
> > details.
> > This GDB was configured as "i386-undermydesk-freebsd"...
> > ^C
> >
> > panic: mutex sched lock not owned at ../../../kern/subr_smp.c:126
> > cpuid = 1; lapic.id = 01000000
> > Debugger("panic")
> > Stopped at      Debugger+0x46:  xchgl   %ebx,in_Debugger.0
> > db> where
> > No such command
> > db> tr
> > Debugger(c02dbf5a) at Debugger+0x46
> > panic(c02db1a8,c02db318,c02df736,7e,c4445540) at panic+0xd6
> > _mtx_assert(c0315440,1,c02df736,7e) at _mtx_assert+0xa8
> > forward_signal(c4445540) at forward_signal+0x1a
> > tdsignal(c4445540,2,2) at tdsignal+0x182
> > psignal(c443d558,2) at psignal+0x3c8
> > pgsignal(c441ad00,2,1,c441ad1c,0) at pgsignal+0x63
> > ttyinput(3,c41e8e30,c41e8e00,0,c0347903) at ttyinput+0x316
> > ptcwrite(c4307a00,d7d5ec88,7f0011,1,d7d5ebc4) at ptcwrite+0x17f
> > spec_write(d7d5ebf0,d7d5ec3c,c0204cc8,d7d5ebf0,7f0011) at spec_write+0x5a
> > spec_vnoperate(d7d5ebf0) at spec_vnoperate+0x13
> > vn_write(c41ded5c,d7d5ec88,c440cd80,0,c409e780) at vn_write+0x1c8
> > dofilewrite(c409e780,c41ded5c,5,8088000,1) at dofilewrite+0xaf
> > write(c409e780,d7d5ed14,3,b,282) at write+0x39
> > syscall(2f,2f,2f,1,8073410) at syscall+0x23c
> > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> > --- syscall (4, FreeBSD ELF, write), eip = 0x281fb3a3, esp =
> > 0xbfbff37c, ebp = 0xbfbff3e8 ---
> >
> >
>
> hummmmm
>
> so, the question is:
> where should we get the sched lock?

Maybe just remove the foot-shooting that releases it?

% Index: kern_sig.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
% retrieving revision 1.170
% retrieving revision 1.171
% diff -u -1 -r1.170 -r1.171
% --- kern_sig.c	29 Jun 2002 02:00:01 -0000	1.170
% +++ kern_sig.c	29 Jun 2002 17:26:18 -0000	1.171
% @@ -1486,15 +1540,9 @@
%  		 */
% -		if (p->p_stat == SRUN) {
% +		mtx_unlock_spin(&sched_lock);
   		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ shoot foot
% +		if (td->td_state == TDS_RUNQ ||
% +		    td->td_state == TDS_RUNNING) {

I think sched_lock is needed for checking td_state too (strictly to use
the result of the check, so the lock is not critical if the use doesn't
do anything harmful), but there is no lock indication for td_state
in proc.h like there used to be for p_stat.

% +			signotify(td->td_proc);

Holding sched_lock when calling signotify() used to be an error, but that
was changed in rev.1.155.  This signotify() call seems to be bogus anyway.
signotify() should only be called after the signal mask is changed.  The
call to signotify() here was removed in rev.1.154 when the semantics of
signotify() was changed a little.  Bogus calls to signotify() just waste
time.

%  #ifdef SMP
% -			struct kse *ke;
% -			struct thread *td = curthread;
% -/* we should only deliver to one thread.. but which one? */
% -			FOREACH_KSEGRP_IN_PROC(p, kg) {
% -				FOREACH_KSE_IN_GROUP(kg, ke) {
% -					if (ke->ke_thread == td) {
% -						continue;
% -					}
% -					forward_signal(ke->ke_thread);
% -				}
% -			}
% +			if (td->td_state == TDS_RUNNING && td != curthread)
% +				forward_signal(td);
%  #endif

forward_signal() was called with sched_lock held in rev.1.170, and
forward_signal() still requires it to be held.  I think sched_lock is
needed for checking td_state too, as above.  Here it is fairly clear
that calling forward_signal() bogusly after losing a race is harmless.
It just wakes up td to look for a signal that isn't there or can't
be handled yet.  Since this only happens if we lose a race, it may be
more efficient to let it happen (rarely) than to lock (always) to prevent
it happening.  But we already held the lock so the locking was free
except for latency issues.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message