Date:      Fri, 7 Oct 2011 13:46:36 -0700
From:      Jason Wolfe <nitroboost@gmail.com>
To:        Arnaud Lacombe <lacombar@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
Message-ID:  <CAAAm0r1ed1U=LzYhHfCfqr2N9MmC6OR1qSk-wZVCkiaPQ0uvaw@mail.gmail.com>
In-Reply-To: <CACqU3MVUsO4eRS8rZannCO%2BKRf0T55aaYSXrkavi07LMofwreQ@mail.gmail.com>
References:  <CAAAm0r0RXEJo4UiKS=Ui0e5OQTg6sg-xcYf3mYB5%2Bvk8i8557w@mail.gmail.com> <CAFOYbc=6J%2B43sqMEO3gYGQXdOaOw2V4sV=xdgr3feKcc3qPM3A@mail.gmail.com> <CAAAm0r2MoFrGE_ubB_CX8jrV9cxobR-rsrZbRiRqBiLO_-Py4w@mail.gmail.com> <CACqU3MVUsO4eRS8rZannCO%2BKRf0T55aaYSXrkavi07LMofwreQ@mail.gmail.com>

On Fri, Oct 7, 2011 at 12:24 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:

> Hi,
>
> On Fri, Oct 7, 2011 at 2:57 PM, Jason Wolfe <nitroboost@gmail.com> wrote:
> > Jack,
> >
> > Entirely possible there are multiple moving pieces here, the only bit I know
> > for certain is it's related to the different operation when running with MSI
> > vs MSI-X. Here is also my loader.conf for reference. I'm currently running
> > the modular congestion control stuff with cubic in use, but these issues
> > predate those changes also. Just to give you a scope of it though, it was
> > somewhat 'rare' for them to wedge. Out of a pool of ~2000 servers running
> > with the 82574L doing ~800Mb/s average, there were ~220 reports in a week.
> > So with some fuzzy math to put it in the same terms you were talking in, a
> > server in particular would hang about once every 9 weeks.
> >
> Just two questions that come to mind:
>
> Are the failing servers evenly distributed, or are the same ones always
> failing?
>
> Did you collect the uptime and the kernel msgbuf of the server when
> the issue triggered?
>
> Thanks,
>  - Arnaud
>

Arnaud,

The failures were fairly random, though a handful of servers did fail a
couple of times.  They didn't seem attributable to a particular batch or
physical location.

The uptime was not collected, but most were in the ballpark of 30-90 days.
I was tailing /var/log/messages, but no, I didn't save kern.msgbuf.  I've
added both to the collections, and I've pulled a couple of servers that
failed more than once; I will be re-enabling MSI-X on them later today.

Jason


