Date:      Thu, 16 Oct 2003 11:17:50 -0400
From:      Michael Marchetti <mmarchetti@sandvine.com>
To:        "'hackers@freebsd.org'" <hackers@freebsd.org>, "'stable@freebsd.org'" <stable@freebsd.org>
Subject:   hardclock interrupt deadlock
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C8533701ED5EC9@mail.sandvine.com>

Hi,

We have encountered a problem where the system hangs.  We are running a 4.7
SMP kernel with kernel polling on a dual Xeon with hyperthreading enabled
(essentially a 4-processor system).  As a result, the only HW interrupts in
the system are hardclock (8254), the rtc, the serial console and scsi.  The
synchronous interrupts are the 8254 and the rtc.  When the system is hung, I
have found that the ipending and iactive bits for the 8254 and rtc are set
(meaning the interrupt is pending and active) although the giant lock is not
held and all processors are idle (and halted).  This led me to believe that
somehow the ipending bit was set "just before" the last interrupt returned.
The only way the system could then run that interrupt is if another
interrupt arrived, noticed that ipending was set, and ran it (an interrupt
delay would be seen).  In a non-polling system, I imagine the ethernet
interrupts would wake it up.  I believe I found a potential hole where this
could happen.

In i386/isa/ipl.s:

#ifdef SMP
	cli				/* early to prevent INT deadlock */
doreti_next2:
#endif
	movl	%eax,%ecx
	notl	%ecx			/* set bit = unmasked level */
#ifndef SMP
	cli
#endif
	andl	_ipending,%ecx		/* set bit = unmasked pending INT */
	jne	doreti_unpend
	movl	%eax,_cpl

I'm concerned about the case where ipending is checked and found to be
clear, but another interrupt arrives just afterwards and sets ipending.
Because cpl has not yet been unmasked, that interrupt is not forwarded.  In
particular, in i386/isa/apic_vector.s:

3: ; 			/* other cpu has isr lock */			\
	APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ;\
	lock ;								\
	orl	$IRQ_BIT(irq_num), _ipending ;				\
	testl	$IRQ_BIT(irq_num), _cpl ;				\
	jne	4f ;				/* this INT masked */	\
	call	forward_irq ;	 /* forward irq to lock holder */	\
	POP_FRAME ;	 			/* and return */	\
	iret ;								\
	ALIGN_TEXT ;							\

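The test of _cpl comes right after ipending is set, so it can race with the
check of ipending and the store to _cpl in doreti: the handler may still see
the old, masked cpl after doreti has already found ipending clear.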

One quick fix that I thought might correct this would be, in ipl.s, to
recheck ipending right after modifying the cpl to see whether it changed,
such as:


#ifdef SMP
	cli				/* early to prevent INT deadlock */
doreti_next2:
#endif
	movl	%eax,%ecx
	notl	%ecx			/* set bit = unmasked level */
#ifndef SMP
	cli
#endif
	andl	_ipending,%ecx		/* set bit = unmasked pending INT */
	jne	doreti_unpend
	movl	%eax,_cpl
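	movl	%eax,%ecx		/* %ecx is 0 after the first test; */
	notl	%ecx			/* rebuild the unmasked-level mask */
	andl	_ipending,%ecx		/* recheck for a newly pended INT */
	jne	doreti_unpend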


Any opinions/insight?

thanks.


