Date:      Fri, 14 Jan 2011 16:47:31 -0500
From:      Charles Owens <cowens@greatbaysoftware.com>
To:        Jack Vogel <jfvogel@gmail.com>
Cc:        freebsd-net <freebsd-net@freebsd.org>
Subject:   Re: igb watchdog timeouts
Message-ID:  <4D30C473.7060900@greatbaysoftware.com>
In-Reply-To: <AANLkTimetnbGRLArCUT%2BoHM94A-BQZnizGVAN%2BQE8Pqz@mail.gmail.com>
References:  <20100729215649.GB2615@icir.org>	<20110103210209.GA13091@icir.org>	<4D2E66C4.5090607@greatbaysoftware.com>	<AANLkTinxDryptLu%2B7NRnLPLE7716BHw=CZ==jYOb_Q%2BY@mail.gmail.com>	<4D2F20BB.5080204@greatbaysoftware.com>	<AANLkTimK8VEQLd-m-zPsw8-%2BoBi-oJ5pc5eScmFXmujy@mail.gmail.com>	<4D2F71BE.2080801@greatbaysoftware.com> <AANLkTimetnbGRLArCUT%2BoHM94A-BQZnizGVAN%2BQE8Pqz@mail.gmail.com>

Thanks for all the feedback on polling, Jack and others.  Very helpful.

We are working to merge the latest RELENG_8 em/igb driver into our 
custom build that's based on RELENG_8_1.  I've been able to create a 
patch using the following command:

cvs di -N -up -jRELENG_8_1 -jRELENG_8 sys/dev/e1000 sys/dev/ixgb \
    sys/dev/ixgbe sys/conf/files > /tmp/e1000.diff

... followed by hand-trimming the sys/conf/files changes down to only
the relevant bits.  It compiled and seems to be functioning, but I
wouldn't mind a sanity check on my methodology.  In particular:

    * Some of the changes overlapped with sys/dev/ixgb and
      sys/dev/ixgbe, so I included them.  Should I have?
    * Is there anything else I should have included?
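
One way to double-check the scope of such a patch is to list the files
it touches and flag anything outside the directories you meant to
merge.  The following is only a minimal sketch: the function names
diff_files and diff_outside are made up for illustration, and it
assumes the diff is in unified format with a "+++ <path>" header for
each file and no whitespace in the paths.

```shell
#!/bin/sh
# Hypothetical helpers for reviewing a unified diff before applying it.

# List the distinct files a unified diff touches, one per line.
# Assumes each file section carries a "+++ <path>" header line.
# An added line whose own text starts with "++ " would be miscounted.
diff_files() {
    awk '/^\+\+\+ / { print $2 }' "$1" | sort -u
}

# Print the touched files that fall outside the given directories.
# Usage: diff_outside <diff-file> <dir> [<dir> ...]
diff_outside() {
    diff=$1
    shift
    diff_files "$diff" | while read -r f; do
        inside=0
        for dir in "$@"; do
            # Match only paths below "$dir/", so sys/dev/ixgb does not
            # also claim files under sys/dev/ixgbe.
            case $f in
                "$dir"/*) inside=1 ;;
            esac
        done
        if [ "$inside" -eq 0 ]; then
            echo "$f"
        fi
    done
    return 0
}
```

With the patch above, running
"diff_outside /tmp/e1000.diff sys/dev/e1000 sys/dev/ixgb sys/dev/ixgbe"
should print only sys/conf/files if the diff is scoped as intended.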


Thanks very much,

Charles


On 1/13/11 4:49 PM, Jack Vogel wrote:
> Polling has seemed to me to be a way around other problems, problems
> that these days no longer exist. I remember back in the FreeBSD 6 days
> having interrupt problems, which of course also led to watchdogs.
> Polling got rid of that. But now there are dedicated MULTIPLE
> interrupts by using MSIX, so that reason for polling is gone.
>
> Of course there can still be advantages, reducing interrupts and hence
> context switches, which is why the Linux approach does what it does.
>
> I have not spent time with that issue; it's good to know that there
> could be problems lurking with it. But if you can simply go with MSIX
> I would do that for now.
>
> Jack
>
>
> On Thu, Jan 13, 2011 at 1:42 PM, Charles Owens
> <cowens@greatbaysoftware.com> wrote:
>
>     So we went back to basics (stock 8.1-RELEASE) and found no
>     issue!    We then added in our kernel mods one by one and
>     ultimately discovered that device-polling is the culprit (the
>     kernel config was simply GENERIC + PAE + polling).
>
>     Immediately upon running "ifconfig igb0 polling" the symptoms appear.
>
>     This is very good news overall, in that we can certainly disable
>     polling for igb.  This raises the question, though, of whether
>     polling is recommended these days at all for em/igb NICs... or
>     even in general.  From other conversations we've seen there seems
>     to be some general debate about this.  In testing we've done in
>     the past (circa 7.0) there certainly seemed to be benefit to using
>     this feature.  What are your thoughts about this?
>
>     For our product releases we'd like to stay with RELENG_8_1.  Would
>     you recommend the driver in 8.2 as being preferable?
>
>     In case it's of interest:
>
>     igb0@pci0:1:0:0:        class=0x020000 card=0x34de8086 chip=0x10a78086 rev=0x02 hdr=0x00
>         vendor     = 'Intel Corporation'
>         device     = '82575EB Gigabit Network Connection'
>         class      = network
>         subclass   = ethernet
>
>
>
>     Thanks,
>     Charles
>
>
>
>     On 1/13/11 1:27 PM, Jack Vogel wrote:
>>     The 8.2 latest does have the latest igb, so using that should be
>>     indicative...
>>
>>     Jack
>>
>>
>>     On Thu, Jan 13, 2011 at 7:56 AM, Charles Owens
>>     <cowens@greatbaysoftware.com> wrote:
>>
>>         Ok... I got my wires crossed:  our first time testing 8.1 on
>>         this particular platform was with a kernel that had ichwd
>>         enabled (a new thing for us) and so when igb started
>>         complaining about "watchdog" we thought it was related.
>>
>>         We've tested again and clearly the real story is that we're
>>         simply seeing igb issues, symptoms similar to those described.
>>
>>         Does 8.2-RC1 have sufficiently "latest" code, or should I be
>>         looking to load up something else?  (8-stable, maybe?)
>>
>>         Thanks,
>>         Charles
>>
>>
>>
>>         On 1/13/11 12:07 AM, Jack Vogel wrote:
>>>         The problem that Robin saw was due to having MSIX interrupts
>>>         disabled on the system; I doubt that is going to be the
>>>         "issue" for others.
>>>
>>>         Get the latest version of the igb code and see if that helps
>>>         you as a first step.
>>>
>>>         Jack
>>>
>>>
>>>         On Wed, Jan 12, 2011 at 6:43 PM, Charles Owens
>>>         <cowens@greatbaysoftware.com> wrote:
>>>
>>>             I'd like to report that we're running into this issue
>>>             also, in our case on systems that are based on the Intel
>>>             S5520UR Server Board, running 8.1-RELEASE.  If the ichwd
>>>             driver is loaded we see the same messages, and network
>>>             communication via the igb nics is non-functional.
>>>
>>>             Have you had any luck?
>>>
>>>             Thanks,
>>>             Charles
>>>
>>>              Charles Owens
>>>              Great Bay Software, Inc.
>>>
>>>
>>>
>>>
>>>             On 1/3/11 4:02 PM, Robin Sommer wrote:
>>>
>>>                 Hello all,
>>>
>>>                 quite a while ago I asked about the problem below.
>>>                 Unfortunately, I haven't found a solution yet and I'm
>>>                 actually still seeing these timeouts after just
>>>                 upgrading to 8.2-RC1. Any further ideas on what could
>>>                 be triggering them, or how I could track down the
>>>                 cause?
>>>
>>>                 Thanks,
>>>
>>>                 Robin
>>>
>>>                 On Thu, Jul 29, 2010 at 14:56 -0700, I wrote:
>>>
>>>                     Since upgrading from 8.0 to 8.1-RELEASE, I'm seeing
>>>                     lots of messages like those below on all my
>>>                     SuperMicro SBI-7425C-T3 blades. There's almost no
>>>                     traffic on those interfaces.
>>>
>>>                     Any idea?
>>>
>>>                     Thanks,
>>>
>>>                     Robin
>>>
>>>                     Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>                     Jul 29 13:01:18 blade0 kernel: igb1: Queue(0) tdh = 256, hw tdt = 266
>>>                     Jul 29 13:01:18 blade0 kernel: igb1: TX(0) desc avail = 1013,Next TX to Clean = 255
>>>                     Jul 29 13:01:18 blade0 kernel: igb1: link state changed to DOWN
>>>                     Jul 29 13:01:18 blade0 kernel: igb1: link state changed to UP
>>>                     Jul 29 13:01:29 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>                     Jul 29 13:01:29 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
>>>                     Jul 29 13:01:29 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
>>>                     Jul 29 13:01:29 blade0 kernel: igb1: link state changed to DOWN
>>>                     Jul 29 13:01:29 blade0 kernel: igb1: link state changed to UP
>>>                     Jul 29 13:01:46 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>                     Jul 29 13:01:46 blade0 kernel: igb1: Queue(0) tdh = 32, hw tdt = 33
>>>                     Jul 29 13:01:46 blade0 kernel: igb1: TX(0) desc avail = 1022,Next TX to Clean = 31
>>>                     Jul 29 13:01:46 blade0 kernel: igb1: link state changed to DOWN
>>>                     Jul 29 13:01:46 blade0 kernel: igb1: link state changed to UP
>>>                     Jul 29 13:01:57 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>                     Jul 29 13:01:57 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
>>>                     Jul 29 13:01:57 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
>>>                     Jul 29 13:01:57 blade0 kernel: igb1: link state changed to DOWN
>>>                     Jul 29 13:01:58 blade0 kernel: igb1: link state changed to UP
>>>                     Jul 29 13:02:13 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>
>>>                         grep igb /var/run/dmesg.boot
>>>
>>>                     igb0: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0x2000-0x201f mem 0xfc940000-0xfc95ffff,0xfc920000-0xfc93ffff,0xfc900000-0xfc903fff irq 16 at device 0.0 on pci4
>>>                     igb0: [FILTER]
>>>                     igb0: Ethernet address: 00:30:48:9e:22:00
>>>                     igb1: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0x2020-0x203f mem 0xfc980000-0xfc99ffff,0xfc960000-0xfc97ffff,0xfc904000-0xfc907fff irq 17 at device 0.1 on pci4
>>>                     igb1: [FILTER]
>>>                     igb1: Ethernet address: 00:30:48:9e:22:01
>>>
>>>                         pciconf -lv
>>>
>>>                     [...]
>>>                     igb0@pci0:4:0:0:        class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
>>>                         vendor     = 'Intel Corporation'
>>>                         device     = '82575EB Gigabit Backplane Connection'
>>>                         class      = network
>>>                         subclass   = ethernet
>>>                     igb1@pci0:4:0:1:        class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
>>>                         vendor     = 'Intel Corporation'
>>>                         device     = '82575EB Gigabit Backplane Connection'
>>>                         class      = network
>>>                         subclass   = ethernet
>>>                     [...]
>>>
>>>
>>>             _______________________________________________
>>>             freebsd-net@freebsd.org mailing list
>>>             http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>             To unsubscribe, send any mail to
>>>             "freebsd-net-unsubscribe@freebsd.org"
>>>
>>>
>>
>


