Date:      Sun, 3 Oct 2004 03:53:03 -0400
From:      Brian Fundakowski Feldman <green@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        jeff@FreeBSD.org
Subject:   Re: panic: APIC: Previous IPI is stuck
Message-ID:  <20041003075303.GG1034@green.homeunix.org>
In-Reply-To: <20041002060201.GB1034@green.homeunix.org>
References:  <20040924230425.GB1164@green.homeunix.org> <20040925101021.A78979@bpgate.speednet.com.au> <200409271635.44017.jhb@FreeBSD.org> <20041002060201.GB1034@green.homeunix.org>

On Sat, Oct 02, 2004 at 02:02:01AM -0400, Brian Fundakowski Feldman wrote:
> On Mon, Sep 27, 2004 at 04:35:44PM -0400, John Baldwin wrote:
> > On Friday 24 September 2004 08:24 pm, Andy Farkas wrote:
> > > I have been having this problem for a few weeks now. Glad I'm not the only
> > > one. My box is a 4xPPro running 5.3-BETA5. It panics with either ULE
> > > or 4BSD.
> > >
> > > My theory is that a physical IPI gets lost somewhere and the kernel spins
> > > waiting for it. But that's just a stab in the dark because nobody cares to
> > > explain why IPIs would be stuck.
> > 
> > The panic means that a previous IPI sent from the same CPU has not
> > finished being sent.  I've yet to determine why this happens.  You can try editing 
> > sys/i386/i386/local_apic.c and turning on 'DETECT_DEADLOCK' (I think it is 
> > just commented out) and seeing if that improves stability.  I also see this 
> > on a 4xPIIXeon system I use for testing.
> > 
> > > -andyf
> > >
> > > On Fri, 24 Sep 2004, Brian Fundakowski Feldman wrote:
> > > > This is on a 2xAthlon with the SCHED_ULE, HZ=1000, SW_WATCHDOG, and
> > > > nothing really special in development.
> > > >
> > > > FreeBSD green.homeunix.org 6.0-CURRENT FreeBSD 6.0-CURRENT #110: Wed Sep
> > > > 22 11:28:27 EDT 2004    
> > > > root@green.homeunix.org:/usr/src/sys/i386/compile/GREEN  i386
> > > >
> > > > panic: APIC: Previous IPI is stuck
> > > > cpuid = 1
> > > > KDB: stack backtrace:
> > > > kdb_backtrace(c063cae7,1,c063c5e7,d4411b28,c1da2000) at kdb_backtrace+0x2e
> > > > panic(c063c5e7,1,f3,1,2) at panic+0x128
> > > > lapic_ipi_vectored(f3,1,c1da2494,1,c0675910) at 64) at sched_add_internal+0x21e
> > > > kseq_assign(c0675910,1,c0625a07,5e0,c1da1540) at kseq_assign+0x4a
> > > > sched_clock(c1da2000,2,c0621165,17e,d4411c54) at sched_clock+0x74
> > > > statclock(d4411c54,c1ecc840,d4411c3c,c05edc8b,d4411c54) at statclock+0xf8
> > > > rtcintr(d4411c54,c0487af4,c06733a0,2,8) at rtcintr+0x4f
> > > > intr_execute_handlers(c1dca8f0,d4411c54,d4411cb4,c05ea0e3,38) at intr_execute_handlers+0xab
> > > > lapic_handle_intr(38) at lapic_handle_intr+0x3a
> > > > Xapic_isr1() at Xapic_isr1+0x33
> > > > --- interrupt, eip = 0xc04a640a, esp = 0xd4411c98, ebp = 0xd4411cb4 ---
> > > > _mtx_lock_sleep(c06733e0,c1da2000,0,c06220e8,222) at _mtx_lock_sleep+0x13a
> > > > _mtx_lock_flags(c06733e0,0,c06220e8,222,0) at _mtx_lock_flags+0xc0
> > > > ithread_loop(c1da6200,d4411d48,c0621edb,31f,c1da6200) at ithread_loop+0x15a
> > > > fork_exit(c0499660,c1da6200,d4411d48) at fork_exit+0xc6
> > > > fork_trampoline() at fork_trampoline+0x8
> > > > --- trap 0x1, eip = 0, esp = 0xd4411d7c, ebp = 0 ---
> > > > KDB: enter: panic
> > > > panic: APIC: Previous IPI is stuck
> > > > cpuid = 1
> > > > boot() called on cpu#1
> > > > Uptime: 2d0h16m55s
> > > > ^^ full hang instead of reset
> 
> Okay, I just got another one of these, exactly the same as that one but
> for the fact that the softclock() interrupt was specifically locking
> Giant instead of the interrupt thread loop.  So the other CPU owned
> Giant at the time, and the scheduling CPU was trying to acquire it when
> it was interrupted to run statclock().
> 
> This is way too coincidental to ignore.
> 
> SCHED_ULE is far too complex for me to understand much of right now;
> what prevents sched_clock() from calling kseq_assign() multiple times
> per CPU?  Are we _absolutely_100%_certain_ that functionality works
> correctly?

Ping... adding Jeff. I really wish I understood SCHED_ULE, because it
seems entirely plausible that it's trying to send two IPIs: the first
would get blocked waiting for the held sched_lock, and the second would
never have its interrupt serviced, because the first, blocked on
sched_lock, would have interrupts disabled and would remain unable to
respond to an IPI...
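
To make the failure mode concrete, here is a minimal user-space sketch of the
kind of check that could produce "Previous IPI is stuck". It assumes the
sender polls a delivery-status (pending) bit in the local APIC's interrupt
command register for a bounded number of iterations before issuing the next
IPI. It is not the code from local_apic.c: struct fake_lapic,
fake_read_icr(), ipi_wait() and ipi_send() are hypothetical names, and the
"register" is simulated in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 12 of the ICR low word reports "send pending". */
#define ICR_DELSTAT_PEND (1u << 12)

/* Hypothetical stand-in for the local APIC; nothing here touches hardware. */
struct fake_lapic {
    uint32_t icr_lo;           /* simulated ICR low word */
    int      polls_until_idle; /* reads before pending clears; <0 = never */
};

/* Simulated register read: the pending bit clears after N polls, or never. */
static uint32_t fake_read_icr(struct fake_lapic *l)
{
    if (l->polls_until_idle == 0)
        l->icr_lo &= ~ICR_DELSTAT_PEND;
    else if (l->polls_until_idle > 0)
        l->polls_until_idle--;
    return l->icr_lo;
}

/* Bounded spin: returns 1 once the previous IPI has left, 0 on timeout. */
static int ipi_wait(struct fake_lapic *l, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++)
        if ((fake_read_icr(l) & ICR_DELSTAT_PEND) == 0)
            return 1;
    return 0;
}

/*
 * Send an IPI only if the previous one is no longer pending.
 * Returns 0 on success and -1 where a kernel would panic with
 * "Previous IPI is stuck".
 */
static int ipi_send(struct fake_lapic *l, uint32_t vector, int max_polls)
{
    if (!ipi_wait(l, max_polls))
        return -1;
    l->icr_lo = vector | ICR_DELSTAT_PEND; /* new IPI is now pending */
    return 0;
}
```

With polls_until_idle set to a negative value the pending bit never clears,
so a second ipi_send() from the same CPU times out. That models the case
where the first IPI is never accepted and the next send gives up.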

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\


