Date:      Thu, 16 Oct 2003 08:48:57 -0700
From:      Luigi Rizzo <rizzo@icir.org>
To:        Michael Marchetti <mmarchetti@sandvine.com>
Cc:        "'hackers@freebsd.org'" <hackers@freebsd.org>
Subject:   Re: hardclock interrupt deadlock
Message-ID:  <20031016084857.A26357@xorpc.icir.org>
In-Reply-To: <FE045D4D9F7AED4CBFF1B3B813C8533701ED5EC9@mail.sandvine.com>; from mmarchetti@sandvine.com on Thu, Oct 16, 2003 at 11:17:50AM -0400
References:  <FE045D4D9F7AED4CBFF1B3B813C8533701ED5EC9@mail.sandvine.com>

On Thu, Oct 16, 2003 at 11:17:50AM -0400, Michael Marchetti wrote:
> Hi,
> 
> We have encountered a problem where the system hangs.  We are running a 4.7
> SMP kernel using kernel polling on a Dual Xeon with hyperthreading enabled

I'm puzzled by what you mean by "kernel polling". DEVICE_POLLING,
if that is what you mean, cannot work with SMP; it should not even
build unless you manually disabled the check.

	luigi

> (essentially a 4 processor system).  As a result, the only HW interrupts in
> the system are hardclock (8254), the rtc, serial console and scsi.  The
> synchronous interrupts are the 8254 and rtc.  When the system is hung, I have
> found that the ipending and iactive bits for the 8254 and rtc are set
> (meaning the interrupt is pending and active) although the giant lock is not
> held and all processors are idle (and halted).  This led me to believe that
> somehow the ipending bit was set "just before" the last interrupt returned.
> The only way the system would be able to run that interrupt again is if
> another interrupt would run and it would notice that ipending is set, and it
> would run (an interrupt delay would be seen).  In a non-polling system, I
> imagine the ethernet interrupts would wake it up.  I believe I found a
> potential hole where this could happen.
> 
> In i386/isa/ipl.s:
> 
> #ifdef SMP
> 	cli				/* early to prevent INT deadlock */
> doreti_next2:
> #endif
> 	movl	%eax,%ecx
> 	notl	%ecx			/* set bit = unmasked level */
> #ifndef SMP
> 	cli
> #endif
> 	andl	_ipending,%ecx		/* set bit = unmasked pending INT */
> 	jne	doreti_unpend
> 	movl	%eax,_cpl
> 
> I'm concerned about the case where ipending is checked and found to be not
> set, but another interrupt occurs just afterwards, causing ipending to be set.
> Because CPL is not yet unmasked, that interrupt is not forwarded.  In
> particular, in i386/isa/apic_vector.s:
> 
> 3: ; 			/* other cpu has isr lock */			\
> 	APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ; \
> 	lock ;								\
> 	orl	$IRQ_BIT(irq_num), _ipending ;				\
> 	testl	$IRQ_BIT(irq_num), _cpl ;				\
> 	jne	4f ;				/* this INT masked */	\
> 	call	forward_irq ;	 /* forward irq to lock holder */	\
> 	POP_FRAME ;	 			/* and return */	\
> 	iret ;								\
> 	ALIGN_TEXT ;							\
> 
> The check of _cpl occurs right after ipending is set, which opens a potential
> race between checking and modifying the cpl.
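
The interleaving described in the quoted text can be replayed in a small
user-space C model. This is only a sketch under simplifying assumptions:
the variables borrow the kernel names (ipending, cpl) but are plain globals,
irq_on_other_cpu() and simulate_doreti() are hypothetical helpers, and the
"forwarded" counter merely stands in for the call to forward_irq. It is not
FreeBSD code and does not model the APIC, the ISR lock, or real concurrency.

```c
#define IRQ_BIT(n) (1u << (n))

static unsigned ipending;   /* bit set = interrupt latched but not yet run */
static unsigned cpl;        /* bit set = that interrupt level is masked */
static int forwarded;       /* stand-in for calls to forward_irq */

/* Models the "other cpu has isr lock" path quoted from apic_vector.s. */
void irq_on_other_cpu(int irq)
{
    ipending |= IRQ_BIT(irq);           /* lock ; orl $IRQ_BIT, _ipending */
    if ((cpl & IRQ_BIT(irq)) == 0)      /* testl $IRQ_BIT, _cpl */
        forwarded++;                    /* call forward_irq */
    /* otherwise the level is masked and the IRQ just stays pending */
}

/*
 * Replays the doreti sequence quoted from ipl.s with one IRQ arriving
 * either inside the window (after the ipending test, before the cpl store)
 * or after the store.  Returns 1 if the IRQ ends up pending with nobody
 * told to run it, 0 otherwise.
 */
int simulate_doreti(int irq_in_window)
{
    const int irq = 0;              /* illustrative IRQ number */
    const unsigned new_cpl = 0;     /* value held in %eax: everything unmasked */
    unsigned unpend;

    ipending = 0;
    cpl = IRQ_BIT(irq);             /* level still masked while in the handler */
    forwarded = 0;

    unpend = ~new_cpl & ipending;   /* notl %ecx ; andl _ipending,%ecx */
    if (irq_in_window)
        irq_on_other_cpu(irq);      /* arrives while cpl is still masked */
    if (unpend == 0)
        cpl = new_cpl;              /* movl %eax,_cpl */
    if (!irq_in_window)
        irq_on_other_cpu(irq);      /* arrives after the unmask */

    return unpend == 0 && forwarded == 0 && (ipending & IRQ_BIT(irq)) != 0;
}
```

In this model simulate_doreti(1) leaves the bit set in ipending with no
forward, which matches the hang described above, while simulate_doreti(0)
takes the forwarding path.
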
> 
> One quick solution that I thought might correct this would be, in ipl.s,
> to recheck ipending right after modifying the cpl to see if it changed,
> such as:
> 
> 
> #ifdef SMP
> 	cli				/* early to prevent INT deadlock */
> doreti_next2:
> #endif
> 	movl	%eax,%ecx
> 	notl	%ecx			/* set bit = unmasked level */
> #ifndef SMP
> 	cli
> #endif
> 	andl	_ipending,%ecx		/* set bit = unmasked pending INT */
> 	jne	doreti_unpend
> 	movl	%eax,_cpl
> 	movl	%eax,%ecx		/* reload mask: %ecx is 0 after the test above */
> 	notl	%ecx
> 	andl	_ipending,%ecx		/* recheck: set bit = unmasked pending INT */
> 	jne	doreti_unpend
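
The recheck idea can be tried in the same kind of user-space C model. Again
this is a hedged sketch, not kernel code: doreti_recheck() and its
parameters are hypothetical, and the racing interrupt is injected by hand
at the point between the first test and the cpl store. One detail the model
makes visible is that the mask has to be recomputed before the second test,
because %ecx already holds zero when the first jne falls through.

```c
#define IRQ_BIT(n) (1u << (n))

static unsigned ipending;   /* bit set = interrupt latched but not yet run */
static unsigned cpl;        /* bit set = that interrupt level is masked */

/*
 * Models doreti with a second ipending test after the cpl store.
 * late_irq is latched between the first test and the store.
 * reload_mask selects whether the mask register is recomputed before
 * the second test.  Returns the nonzero pending mask if the code would
 * branch to doreti_unpend, or 0 if it would fall through.
 */
unsigned doreti_recheck(unsigned new_cpl, int late_irq, int reload_mask)
{
    unsigned ecx = ~new_cpl;            /* movl %eax,%ecx ; notl %ecx */

    ecx &= ipending;                    /* andl _ipending,%ecx */
    if (ecx != 0)
        return ecx;                     /* jne doreti_unpend */

    ipending |= IRQ_BIT(late_irq);      /* racing IRQ, masked by the old cpl */
    cpl = new_cpl;                      /* movl %eax,_cpl */

    if (reload_mask)
        ecx = ~new_cpl;                 /* without this, ecx is still 0 */
    ecx &= ipending;                    /* second andl _ipending,%ecx */
    return ecx;                         /* nonzero: jne doreti_unpend */
}
```

With ipending cleared and cpl set to IRQ_BIT(0) beforehand, calling
doreti_recheck(0, 0, 1) reports the late interrupt, whereas
doreti_recheck(0, 0, 0) returns 0 and misses it just as the unpatched
sequence does.
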
> 
> 
> Any opinions/insight?
> 
> thanks.


