From owner-freebsd-current Sun Jun 30 19:19:12 2002
Message-ID: <002801c220a4$4637c2a0$ef01a8c0@davidwnt>
From: "David Xu"
To: "Matthew Dillon"
Cc: "Julian Elischer"
Subject: Re: KSE / interrupt panic (patch)
Date: Mon, 1 Jul 2002 10:09:03 +0800
Sender: owner-freebsd-current@FreeBSD.ORG

Now let me describe where the race is:

Thread A:                                               | Thread B:
cv_timedwait()                                          | softclock()
                                                        |   cv_timedwait_end()
  mtx_lock_spin(&sched_lock);                           |     mtx_lock_spin(&sched_lock); /* suppose blocked!!! */
  ...                                                   |
  callout_stop(&td->td_slpcallout)                      |
  td->td_flags |= TDF_TIMEOUT;                          |
  td->td_proc->p_stats->p_ru.ru_nivcsw++;               |
  // problem, does not set thread state to TDS_SLP!!!   |
  mi_switch();                                          |
  // run at once again                                  |
  ...                                                   |
  mtx_unlock_spin(&sched_lock);                         |     mtx_lock_spin() returned !!!
  thread is still running                               |     if (td->td_flags & TDF_TIMEOUT) {
  ...                                                   |       td->td_flags &= ~TDF_TIMEOUT;
  some place calls mi_switch() and now on runqueue      |       setrunqueue(td); // crash
                                                        |     }

----------------------------------------------------------
Here is the patch:

--- /sys/kern/kern_condvar.c.old	Mon Jul  1 09:06:01 2002
+++ /sys/kern/kern_condvar.c	Mon Jul  1 09:32:50 2002
@@ -396,6 +396,7 @@
 		 * between msleep and endtsleep.
 		 */
 		td->td_flags |= TDF_TIMEOUT;
+		td->td_state = TDS_SLP;
 		td->td_proc->p_stats->p_ru.ru_nivcsw++;
 		mi_switch();
 	}
@@ -472,6 +473,7 @@
 		 * between msleep and endtsleep.
 		 */
 		td->td_flags |= TDF_TIMEOUT;
+		td->td_state = TDS_SLP;
 		td->td_proc->p_stats->p_ru.ru_nivcsw++;
 		mi_switch();
 	}
-------------------------------------------------------------------

The bug occurs because cv_timedwait() detects that the timeout callout is
running but does not correctly wait for the callout to complete, hence the
panic. BTW, the same bug also seems to exist in msleep() and endtsleep();
please fix it!

-David Xu

--- David Xu wrote:
> The setrunqueue() call can simply be removed from cv_timedwait_end(),
> because there is a race between softclock() and callout_stop(). When
> cv_timedwait_end() loses the race, it means that the thread is already
> running (woken up by another thread), so when you setrunqueue() it, of
> course it will panic.
> In cv_timedwait_end(), the statement "if (td->td_flags & TDF_TIMEOUT) {...}"
> is there to check for this race condition.
>
> -David Xu
>
> ----- Original Message -----
> From: "Matthew Dillon"
> To: "Julian Elischer"
> Cc: "Steve Kargl"; "walt"
> Sent: Monday, July 01, 2002 4:43 AM
> Subject: KSE / interrupt panic
>
>
> > Got another one.  Different panic, same place.
> >
> > panic: setrunqueue: bad thread state
> > cpuid = 0; lapic.id = 01000000
> > Debugger("panic")
> > Stopped at      Debugger+0x46:  xchgl   %ebx,in_Debugger.0
> > db> trace
> > Debugger(c02ec2ba) at Debugger+0x46
> > panic(c02ec8a9,c6461d80,c6461d80,c6461d80,c01afa30) at panic+0xd6
> > setrunqueue(c6461d80) at setrunqueue+0x1dd
> > cv_timedwait_end(c6461d80) at cv_timedwait_end+0x36
> > softclock(0) at softclock+0x159
> > ithread_loop(c229c700,df3eed48,c22aec00,c01b9c6c,0) at ithread_loop+0x12c
> > fork_exit(c01b9c6c,c229c700,df3eed48) at fork_exit+0xa8
> > fork_trampoline() at fork_trampoline+0x37
> > db> gdb
> > ...
> >
> > #0  Debugger (msg=0xc02ec2ba "panic")
> >     at /FreeBSD/FreeBSD-current/src/sys/i386/i386/db_interface.c:324
> > #1  0xc01c878a in panic (fmt=0xc02ec8a9 "setrunqueue: bad thread state")
> >     at /FreeBSD/FreeBSD-current/src/sys/kern/kern_shutdown.c:482
> > #2  0xc01cc6cd in setrunqueue (td=0xc6461d80)
> >     at /FreeBSD/FreeBSD-current/src/sys/kern/kern_switch.c:396
> > #3  0xc01afa66 in cv_timedwait_end (arg=0xc6461d80)
> >     at /FreeBSD/FreeBSD-current/src/sys/kern/kern_condvar.c:608
> > #4  0xc01d22c9 in softclock (dummy=0x0)
> >     at /FreeBSD/FreeBSD-current/src/sys/kern/kern_timeout.c:187
> > #5  0xc01b9d98 in ithread_loop (arg=0xc229c700)
> >     at /FreeBSD/FreeBSD-current/src/sys/kern/kern_intr.c:535
> > #6  0xc01b923c in fork_exit (callout=0xc01b9c6c,
> >     arg=0xc229c700, frame=0xdf3eed48)
> >     at /FreeBSD/FreeBSD-current/src/sys/kern/kern_fork.c:863
> >
> >     I'm not sure why the panic was 'bad thread state' when gdb seems to
> >     show it being stuck on 'unexpected ke present'.  Maybe it was an
> >     optimization and gdb is confused.  The panic is definitely
> >     'bad thread state'.
> >
> > (gdb) print td->td_state
> > $2 = TDS_RUNQ
> >
> >     setrunqueue() is being called on a thread which is already on the run
> >     queue.
> >
> > 						-Matt
> >
> >
> > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > with "unsubscribe freebsd-current" in the body of the message
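
As a rough illustration of the interleaving described at the top of this message, here is a small user-space C sketch. It is not kernel code: every `toy_` name, the `TOY_` constants, and the simplified rule that setrunqueue() fails only for a thread already on the run queue are assumptions made for the example (the rule is taken from the panic and the TDS_RUNQ state shown in the trace). It replays the steps in a fixed order, once without and once with the `td_state = TDS_SLP` assignment from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's thread states and timeout flag. */
enum toy_state { TOY_TDS_RUNNING, TOY_TDS_RUNQ, TOY_TDS_SLP };
#define TOY_TDF_TIMEOUT 0x1

struct toy_thread {
    enum toy_state state;
    int flags;
};

/*
 * Model of setrunqueue(): returns false where the real kernel would
 * panic with "setrunqueue: bad thread state" (thread already queued).
 */
static bool toy_setrunqueue(struct toy_thread *td)
{
    assert(td != NULL);
    if (td->state == TOY_TDS_RUNQ)
        return false;
    td->state = TOY_TDS_RUNQ;
    return true;
}

/*
 * Thread A: cv_timedwait() found the callout already running, so it
 * sets the timeout flag and calls mi_switch(). In this model mi_switch()
 * is a no-op: a thread still marked running "runs at once again",
 * while a thread marked asleep simply stays asleep.
 */
static void toy_waiter_lost_race(struct toy_thread *td, bool apply_patch)
{
    assert(td != NULL);
    td->flags |= TOY_TDF_TIMEOUT;
    if (apply_patch)
        td->state = TOY_TDS_SLP;    /* the line added by the patch */
}

/* Thread A later switches out somewhere else and lands on the run queue. */
static void toy_later_switch(struct toy_thread *td)
{
    if (td->state == TOY_TDS_RUNNING)
        td->state = TOY_TDS_RUNQ;
}

/* Thread B: cv_timedwait_end() finally acquires the lock and runs. */
static bool toy_timeout_handler(struct toy_thread *td)
{
    if ((td->flags & TOY_TDF_TIMEOUT) == 0)
        return true;                /* nothing to do in this model */
    td->flags &= ~TOY_TDF_TIMEOUT;
    return toy_setrunqueue(td);
}

/* Replays the interleaving; true means no (modelled) panic occurred. */
bool toy_simulate_race(bool apply_patch)
{
    struct toy_thread td = { TOY_TDS_RUNNING, 0 };

    toy_waiter_lost_race(&td, apply_patch);
    toy_later_switch(&td);
    return toy_timeout_handler(&td);
}
```

In this model `toy_simulate_race(false)` returns false (the handler finds the thread already on the run queue), while `toy_simulate_race(true)` returns true, because the thread is marked asleep before the handler runs and can be queued exactly once.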