Date:      Fri, 07 Oct 2011 15:24:04 -0400
From:      Mike Tancsa <mike@sentex.net>
To:        Jason Wolfe <nitroboost@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
Message-ID:  <4E8F51D4.1060509@sentex.net>
In-Reply-To: <CAAAm0r2JH43Rct7UxQK2duH1p43Nepnj5mpb6bXo==DPayhJLg@mail.gmail.com>
References:  <CAAAm0r0RXEJo4UiKS=Ui0e5OQTg6sg-xcYf3mYB5%2Bvk8i8557w@mail.gmail.com>	<4E8F157A.40702@sentex.net> <CAAAm0r2JH43Rct7UxQK2duH1p43Nepnj5mpb6bXo==DPayhJLg@mail.gmail.com>

On 10/7/2011 2:59 PM, Jason Wolfe wrote:
> Mike,
> 
> I had a large pool of servers running 7.2.3 with MSI-X enabled during my
> testing, but it didn't resolve the issue. I just pulled back the
> sys/dev/e1000 directory from 8-STABLE and ran it on 8-RELEASE-p2 though, so
> if there were changes made outside of the actual driver code that helped I
> may have not seen the benefit. It's possible the lagg is adding some
> complication, but when one of the interfaces wedge the lagg continues to
> operate over the other link (though half of the traffic simply fails). It
> appears the interface just runs out of one of its buffers, and is helpless
> to resolve it without a bounce.
> 
> I do recall coming across the ASPM threads, but my Supermicro boards didn't
> have the option and many people claimed it didn't resolve it, so I didn't
> follow through. I'll do a bit more digging there, thanks.
> 
> Disabling MSI-X has without a doubt completely resolved my problem though. I
> would receive about 30 reports/failures a day from my servers when I was
> running with it, since disabling it I haven't received a single one in ~40
> days.  The servers are currently running with the 7.2.3 driver also, so if
> nothing jumps out from my original email I'm happy to re enable it on a
> handful of servers and collect some fresh reports.



Hi Jason,
	This sounds like a real drag :(  You certainly have WAY more servers to
sample from than I do/did (a couple). The problem on my boxes was not
very frequent to start with, so it would take a while to show up. But the symptoms
were very similar in that I would see queue overruns in the stats when
things were wedged.  I have other em nics (non 82574) that get the odd
overrun when they are busy, but they seem to recover from the situation
just fine. The 82574 did not.

When you say you disabled MSI-X, do you mean via hw.pci.enable_msix=0
across the board, or did you disable multi-queue for the NIC, so that it
uses just one interrupt rather than separate ones for xmit and recv?

Also, what is the purpose of
hw.pci.do_power_nodriver=3 vs 0? (3 means put absolutely everything
in D3 state.)

net.link.ifqmaxlen 1024 vs 50 (does anything else need to be adjusted if
this value is increased?)

hw.em.rxd="2048"
hw.em.txd="2048"

Have you tried leaving these two at the default on 7.2.3 ?
if_em.h implies 1024 for each.
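
To keep track of which of these knobs are away from their defaults on a
given box, something like the rough sketch below could help. It is only an
illustration: the helper names (parse_tunables, non_default) are made up,
the sample text just repeats the four settings discussed above, and the
"defaults" are the values mentioned in this message, which should be
checked against your own source tree.

```python
# Defaults as mentioned in this thread (assumption: verify against your tree).
THREAD_DEFAULTS = {
    "hw.pci.do_power_nodriver": "0",
    "net.link.ifqmaxlen": "50",
    "hw.em.rxd": "1024",
    "hw.em.txd": "1024",
}


def parse_tunables(text):
    """Parse loader.conf-style 'name=value' lines into a dict of strings."""
    tunables = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Values may or may not be wrapped in double quotes.
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


def non_default(tunables, defaults=THREAD_DEFAULTS):
    """Return {name: (configured, default)} for known tunables that differ."""
    return {
        name: (value, defaults[name])
        for name, value in tunables.items()
        if name in defaults and value != defaults[name]
    }


# Hypothetical sample input: the four settings questioned in this message.
SAMPLE = """
hw.pci.do_power_nodriver=3
net.link.ifqmaxlen=1024
hw.em.rxd="2048"
hw.em.txd="2048"
"""

if __name__ == "__main__":
    changed = non_default(parse_tunables(SAMPLE))
    for name, (configured, default) in sorted(changed.items()):
        print(f"{name}: {configured} (default per this thread: {default})")
```

Reverting the listed entries one at a time would be one way to see whether
any of them has a bearing on the wedging.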

	---Mike




-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


