Date:      Sun, 20 Jan 2013 04:30:01 GMT
From:      John Baldwin <jhb@FreeBSD.org>
To:        freebsd-net@FreeBSD.org
Subject:   Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Message-ID:  <201301200430.r0K4U1A4093891@freefall.freebsd.org>

The following reply was made to PR kern/172113; it has been noted by GNATS.

From: John Baldwin <jhb@FreeBSD.org>
To: bug-followup@FreeBSD.org, egrosbein@rdtc.ru
Cc: jfv@FreeBSD.org, George Neville-Neil <gnn@FreeBSD.org>
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in
 igb(4): m_getjcl: invalid cluster type
Date: Sat, 19 Jan 2013 23:26:17 -0500

 I was finally able to reproduce this panic today.  It seems to require
 a server configured for PXE that receives no DHCP reply (and
 possibly the requisite SuperMicro X8 board).  I was able to
 prevent the panic with a subset of the referenced patch: adding only
 the 'if_drv_flags & IFF_DRV_RUNNING' check to the start of
 igb_msix_que().  The rest of the patch was unnecessary.  I also added
 some debugging to print out the ICR, EICR, IMS, and EIMS registers in
 this case.  It does look like the hardware is sending an interrupt that
 is not enabled in the interrupt mask (specifically LSC).  In fact, the
 82576 datasheet specifically mentions masking LSC until initialization
 is complete to avoid spurious interrupts during boot.  AFAICT igb(4)
 does this: e1000_reset_hw() clears the interrupt mask via writes
 to IMC, and interrupts are not re-enabled until igb_init_locked() is
 invoked via 'ifconfig up'.  Here is my debug output:
 
 SMP: AP CPU #6 Launched!
 SMP: AP CPU #4 Launched!
 stray irq0
 igb0: interrupt on que 0: icr 0x1000004 eicr 0
      ims 0 eims 0x80000000
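 
 For readers decoding the register dump above, here is a minimal
 user-space sketch (not driver code) that checks which cause bits are
 set without being enabled in the matching mask.  The helper names are
 hypothetical.  The LSC bit value is assumed to match E1000_ICR_LSC in
 the e1000 shared code and should be verified against e1000_defines.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match E1000_ICR_LSC (Link Status Change); verify locally. */
#define ICR_LSC	0x00000004u

/* Hypothetical helper: cause bits that are set but not enabled in the mask. */
static uint32_t
unmasked_causes(uint32_t cause, uint32_t mask)
{
	return (cause & ~mask);
}

/* Hypothetical helper: true if LSC is asserted while masked off in IMS. */
static int
lsc_fired_while_masked(uint32_t icr, uint32_t ims)
{
	return ((unmasked_causes(icr, ims) & ICR_LSC) != 0);
}

/* Hypothetical helper: print one cause/mask pair with its unmasked bits. */
static void
report(const char *name, uint32_t cause, uint32_t mask)
{
	assert(name != NULL);
	printf("%s: cause 0x%08x mask 0x%08x unmasked 0x%08x\n", name,
	    (unsigned)cause, (unsigned)mask,
	    (unsigned)unmasked_causes(cause, mask));
}
```

 Calling lsc_fired_while_masked(0x1000004, 0) with the ICR and IMS
 values from the dump returns nonzero, which matches the reading that
 LSC was signalled although IMS had it disabled.  The other bit set in
 ICR is deliberately left undecoded here.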
 
 Hmmm.  Nothing clears EIMS.  After some more debugging, I determined
 that e1000_reset_hw() always turns on this bit (0x80000000) in EIMS,
 even if it is off before e1000_reset_hw() is called(!).  I added
 explicit calls to igb_disable_intr() to clear EIMS after each call to
 e1000_reset_hw().  This removes the 'stray irq0', but I still get a
 spurious interrupt during boot (albeit with eims 0).  I can use the
 IFF_DRV_RUNNING hack for now, but I think the real fix is something
 else.
 
 -- 
 John Baldwin
