From: Jack Vogel
To: Charles Owens
Cc: freebsd-net, Robin Sommer
Date: Thu, 13 Jan 2011 13:49:57 -0800
Subject: Re: igb watchdog timeouts
In-Reply-To: <4D2F71BE.2080801@greatbaysoftware.com>

Polling has seemed to me to be a way around other problems, problems that
these days no longer exist. I remember back in the FreeBSD 6 days having
interrupt problems, which of course also led to watchdogs. Polling got rid
of that. But now there are dedicated MULTIPLE interrupts by using MSIX, so
that reason for polling is gone.

Of course there can still be advantages, reducing interrupts and hence
context switches, which is why the Linux approach does what it does. I have
not spent time with that issue; it's good to know that there could be
problems lurking with it. But if you can simply go with MSIX I would do
that for now.

Jack

On Thu, Jan 13, 2011 at 1:42 PM, Charles Owens wrote:

> So we went back to basics (stock 8.1-RELEASE) and found no issue! We
> then added in our kernel mods one by one and ultimately discovered that
> device-polling is the culprit (the kernel config was simply GENERIC + PAE +
> polling).
>
> Immediately upon running "ifconfig igb0 polling" the symptoms appear.
>
> This is very good news overall, in that we can certainly disable polling
> for igb. This begs the question, though, as to whether polling is
> recommended these days at all for em/igb NICs... or even in general. From
> other conversations we've seen there seems to be some general debate about
> this. In testing we've done in the past (circa 7.0) there certainly seemed
> to be benefit to using this feature. What are your thoughts about this?
>
> For our product releases we'd like to stay with RELENG_8_1. Would you
> recommend the driver in 8.2 as being preferable?
>
> In case it's of interest:
>
> igb0@pci0:1:0:0: class=0x020000 card=0x34de8086 chip=0x10a78086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82575EB Gigabit Network Connection'
>     class = network
>     subclass = ethernet
>
> Thanks,
> Charles
>
> On 1/13/11 1:27 PM, Jack Vogel wrote:
>
> The 8.2 latest does have the latest igb, so using that should be
> indicative...
>
> Jack
>
> On Thu, Jan 13, 2011 at 7:56 AM, Charles Owens <cowens@greatbaysoftware.com> wrote:
>
>> Ok... I got my wires crossed: our first time testing 8.1 on this
>> particular platform was with a kernel that had ichwd enabled (a new thing
>> for us) and so when igb started complaining about "watchdog" we thought it
>> was related.
>>
>> We've tested again and clearly the real story is that we're simply seeing
>> igb issues, symptoms similar to those described.
>>
>> Does 8.2-RC1 have sufficiently "latest" code, or should I be looking to
>> load up something else? (8-stable, maybe?)
>>
>> Thanks,
>> Charles
>>
>> On 1/13/11 12:07 AM, Jack Vogel wrote:
>>
>> The problem that Robin saw was due to having MSIX interrupts disabled on
>> the system, I doubt that is going to be the "issue" for others.
>>
>> Get the latest version of the igb code and see if that helps you as a
>> first step.
>>
>> Jack
>>
>> On Wed, Jan 12, 2011 at 6:43 PM, Charles Owens <cowens@greatbaysoftware.com> wrote:
>>
>>> I'd like to report that we're running into this issue also, in our case
>>> on systems that are based on the Intel S5520UR Server Board, running
>>> 8.1-RELEASE. If the ichwd driver is loaded we see the same messages, and
>>> network communication via the igb nics is non-functional.
>>>
>>> Have you had any luck?
>>>
>>> Thanks,
>>> Charles
>>>
>>> Charles Owens
>>> Great Bay Software, Inc.
>>>
>>> On 1/3/11 4:02 PM, Robin Sommer wrote:
>>>
>>>> Hello all,
>>>>
>>>> quite a while ago I asked about the problem below. Unfortunately, I
>>>> haven't found a solution yet and I'm actually still seeing these
>>>> timeouts after just upgrading to 8.2-RC1. Any further ideas on what
>>>> could be triggering them, or how I could track down the cause?
>>>>
>>>> Thanks,
>>>>
>>>> Robin
>>>>
>>>> On Thu, Jul 29, 2010 at 14:56 -0700, I wrote:
>>>>
>>>>> Since upgrading from 8.0 to 8.1-RELEASE, I'm seeing lots of messages
>>>>> like those below on all my SuperMicro SBI-7425C-T3 blades. There's
>>>>> almost no traffic on those interfaces.
>>>>>
>>>>> Any idea?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Robin
>>>>>
>>>>> Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>>> Jul 29 13:01:18 blade0 kernel: igb1: Queue(0) tdh = 256, hw tdt = 266
>>>>> Jul 29 13:01:18 blade0 kernel: igb1: TX(0) desc avail = 1013,Next TX to Clean = 255
>>>>> Jul 29 13:01:18 blade0 kernel: igb1: link state changed to DOWN
>>>>> Jul 29 13:01:18 blade0 kernel: igb1: link state changed to UP
>>>>> Jul 29 13:01:29 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>>> Jul 29 13:01:29 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
>>>>> Jul 29 13:01:29 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
>>>>> Jul 29 13:01:29 blade0 kernel: igb1: link state changed to DOWN
>>>>> Jul 29 13:01:29 blade0 kernel: igb1: link state changed to UP
>>>>> Jul 29 13:01:46 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>>> Jul 29 13:01:46 blade0 kernel: igb1: Queue(0) tdh = 32, hw tdt = 33
>>>>> Jul 29 13:01:46 blade0 kernel: igb1: TX(0) desc avail = 1022,Next TX to Clean = 31
>>>>> Jul 29 13:01:46 blade0 kernel: igb1: link state changed to DOWN
>>>>> Jul 29 13:01:46 blade0 kernel: igb1: link state changed to UP
>>>>> Jul 29 13:01:57 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>>> Jul 29 13:01:57 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
>>>>> Jul 29 13:01:57 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
>>>>> Jul 29 13:01:57 blade0 kernel: igb1: link state changed to DOWN
>>>>> Jul 29 13:01:58 blade0 kernel: igb1: link state changed to UP
>>>>> Jul 29 13:02:13 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>>>
>>>>> grep igb /var/run/dmesg.boot
>>>>>
>>>>> igb0: port 0x2000-0x201f mem 0xfc940000-0xfc95ffff,0xfc920000-0xfc93ffff,0xfc900000-0xfc903fff irq 16 at device 0.0 on pci4
>>>>> igb0: [FILTER]
>>>>> igb0: Ethernet address: 00:30:48:9e:22:00
>>>>> igb1: port 0x2020-0x203f mem 0xfc980000-0xfc99ffff,0xfc960000-0xfc97ffff,0xfc904000-0xfc907fff irq 17 at device 0.1 on pci4
>>>>> igb1: [FILTER]
>>>>> igb1: Ethernet address: 00:30:48:9e:22:01
>>>>>
>>>>> pciconf -lv
>>>>>
>>>>> [...]
>>>>> igb0@pci0:4:0:0: class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
>>>>>     vendor = 'Intel Corporation'
>>>>>     device = '82575EB Gigabit Backplane Connection'
>>>>>     class = network
>>>>>     subclass = ethernet
>>>>> igb1@pci0:4:0:1: class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
>>>>>     vendor = 'Intel Corporation'
>>>>>     device = '82575EB Gigabit Backplane Connection'
>>>>>     class = network
>>>>>     subclass = ethernet
>>>>> [...]
>>>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
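
For anyone trying to see how often these resets occur, here is a minimal Python sketch (not from the thread) that counts the "Watchdog timeout" lines per interface in the log excerpt Robin posted and reports the gap between tdh and tdt from each Queue line. The function name `summarize` is hypothetical, and reading `tdt - tdh` as the number of descriptors still queued for the hardware is an assumption for illustration; the subtraction also ignores ring wrap-around.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt quoted in this thread.
LOG = """\
Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting
Jul 29 13:01:18 blade0 kernel: igb1: Queue(0) tdh = 256, hw tdt = 266
Jul 29 13:01:29 blade0 kernel: igb1: Watchdog timeout -- resetting
Jul 29 13:01:29 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
Jul 29 13:01:46 blade0 kernel: igb1: Watchdog timeout -- resetting
Jul 29 13:01:46 blade0 kernel: igb1: Queue(0) tdh = 32, hw tdt = 33
"""

# Patterns match the two message formats that appear in the excerpt.
WATCHDOG = re.compile(r"kernel: (\w+): Watchdog timeout -- resetting")
QUEUE = re.compile(r"kernel: (\w+): Queue\((\d+)\) tdh = (\d+), hw tdt = (\d+)")


def summarize(text):
    """Return (resets per interface, list of (interface, queue, tdt - tdh))."""
    resets = Counter()
    pending = []
    for line in text.splitlines():
        m = WATCHDOG.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = QUEUE.search(line)
        if m:
            nic, queue = m.group(1), int(m.group(2))
            tdh, tdt = int(m.group(3)), int(m.group(4))
            # Assumption: tail minus head approximates descriptors not yet
            # consumed by the hardware; wrap-around is not handled here.
            pending.append((nic, queue, tdt - tdh))
    return resets, pending


resets, pending = summarize(LOG)
for nic, count in resets.items():
    print(f"{nic}: {count} watchdog resets")
for nic, queue, gap in pending:
    print(f"{nic} queue {queue}: tdt - tdh = {gap}")
```

To run it against a real system log, the text of /var/log/messages could be passed to `summarize` in place of the embedded sample.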