Date:      Wed, 09 Nov 2011 12:21:46 +0330
From:      Hooman Fazaeli <hoomanfazaeli@gmail.com>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        pyunyh@gmail.com, freebsd-net@freebsd.org, Emil Muratov <gpm@hotplug.ru>, Jack Vogel <jfvogel@gmail.com>, Jason Wolfe <nitroboost@gmail.com>
Subject:   Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
Message-ID:  <4EBA3F22.2060204@gmail.com>
In-Reply-To: <CAJ-Vmomf-wxb8dY7YF7qT_FGK5d-YLPU3BkPOeHnOtKZ%2BUrYeQ@mail.gmail.com>
References:  <CAAAm0r0RXEJo4UiKS=Ui0e5OQTg6sg-xcYf3mYB5%2Bvk8i8557w@mail.gmail.com>	<4E8F51D4.1060509@sentex.net>	<CACqU3MVwLaepFymZJkaVk6p=SpykGhqs=VYFjLh9fP9S=AxDhg@mail.gmail.com>	<CAAAm0r1DKvoL9=Ket9up=4%2B5xiCzTTZJK99FhF9jcCA28B0M%2BA@mail.gmail.com>	<CAAAm0r3XdsMHZh%2BP_NF-txZasdExzwZ8ymmGQgGhJQds0fOiBQ@mail.gmail.com>	<CAAAm0r1iS3z-7CBJ=xYDf%2BJOA1Q2nU0O54Twbyb7FjvgWHjKVw@mail.gmail.com>	<4EA7E203.3020306@sepehrs.com>	<CAAAm0r3Nr2t8cCetPkFnLQ-3KwqHw_0SpqbtvYPRUkSP=9n8CA@mail.gmail.com>	<4EA80818.3030504@sentex.net>	<4EA80F88.4000400@hotplug.ru>	<4EA82715.2000404@gmail.com>	<4EA8FA40.7010504@hotplug.ru>	<4EA91836.2040508@gmail.com>	<4EA959EE.2070806@hotplug.ru>	<4EAD116A.8090006@gmail.com>	<CAAAm0r3qm=nQQuAmZDD4k4X8K-xW6_kM9TukRT=1GoG9dYR3zw@mail.gmail.com>	<4EAE58A2.9040803@gmail.com>	<CAAAm0r0uoPPEQbq5rHkFr6ZLp-WJ4YVjDVvxxV6y%2BUh4eEKDEA@mail.gmail.com>	<4EB96511.50701@gmail.com> <CAJ-Vmomf-wxb8dY7YF7qT_FGK5d-YLPU3BkPOeHnOtKZ%2BUrYeQ@mail.gmail.com>

On 11/8/2011 11:00 PM, Adrian Chadd wrote:
> On 8 November 2011 09:21, Hooman Fazaeli<hoomanfazaeli@gmail.com>  wrote:
>
>> With MSIX enabled, the link task (em_handle_link) does _not_ triggers
>> _start when the link changes state from inactive to active (which it
>> should).
>> If if_snd quickly fills up during a temporary link loss, transmission is
>> stopped forever and the driver never recovers from that state.
>>
>> The last patch should have reduced the frequency of the problem
>> but it assumes every IFQ_ENQUEUE is followed by a if_start which
>> is not a true assumption.
>
> FWIW, I saw something very similar with the if_arge code port from
> Linux. If the TX queue filled up and wasn't serviced before it hit
> completely full, it was never drained.
>
> It may be worthwhile auditing some of the other NIC drivers to ensure
> this kind of situation isn't occuring. Especially if they came from
> Linux. :-)
>
> That's a great catch, I hope it finally fixes the if_em issues with MSIX. :-)
>
>
> Adrian

Just for the record, igb, ixgb and ixgbe have the
same issue. I have not checked other drivers.

There is another subtle problem with all these drivers: if transmit (xxx_xmit)
fails because of a temporary memory shortage (i.e., a DMA failure with ENOMEM), the driver
may enter the OACTIVE state and _never_ recover! The scenario is similar to the
previous one:

- if_start is executed.
- xxx_xmit fails with ENOMEM.
- xxx_start_locked sets OACTIVE. Note that this is different from a low TX descriptor
   condition, which also sets OACTIVE.
- The stack enqueues packets in if_snd but does not call if_start, since the driver is OACTIVE.
- The stack enqueues more packets until if_snd fills up and packets start to drop.
- Since nothing in the driver's code retries transmission when memory becomes
   available again (xxx_local_timer is a candidate place for that), the driver remains
   OACTIVE until it is re-initialized.
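
To make the steps above concrete, here is a small user-space C model of the stall. It is only a sketch and not driver code: every sim_* name, the tiny queue limit and the retry in sim_local_timer are invented for illustration, and the retry is just one possible shape of a fix, not the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SIM_IFQ_MAXLEN 4        /* tiny if_snd limit, for demonstration only */

struct sim_if {
    int  snd_len;               /* packets waiting in the modelled if_snd */
    int  drops;                 /* packets dropped because if_snd was full */
    int  sent;                  /* packets handed to the "hardware" */
    bool oactive;               /* models the OACTIVE flag */
    bool nomem;                 /* models a temporary memory shortage */
};

/* Models xxx_xmit: fails with ENOMEM while memory is short. */
static int sim_xmit(struct sim_if *ifp)
{
    if (ifp->nomem)
        return ENOMEM;
    ifp->sent++;
    return 0;
}

/* Models xxx_start_locked: drain if_snd, set OACTIVE on failure. */
static void sim_start(struct sim_if *ifp)
{
    while (ifp->snd_len > 0) {
        if (sim_xmit(ifp) != 0) {
            ifp->oactive = true;    /* nothing will ever clear this */
            return;
        }
        ifp->snd_len--;
    }
}

/* Models the stack: enqueue, and call if_start only if not OACTIVE. */
static void sim_output(struct sim_if *ifp)
{
    if (ifp->snd_len >= SIM_IFQ_MAXLEN) {
        ifp->drops++;
        return;
    }
    ifp->snd_len++;
    assert(ifp->snd_len <= SIM_IFQ_MAXLEN);
    if (!ifp->oactive)
        sim_start(ifp);
}

/* Hypothetical fix: a periodic timer retries a stalled transmit. */
static void sim_local_timer(struct sim_if *ifp)
{
    if (ifp->oactive && ifp->snd_len > 0) {
        ifp->oactive = false;
        sim_start(ifp);
    }
}

/* Runs the scenario once, with or without the timer retry. */
static struct sim_if sim_run(bool timer_retry)
{
    struct sim_if ifp = {0};

    ifp.nomem = true;
    sim_output(&ifp);           /* xmit fails, OACTIVE gets set */
    ifp.nomem = false;          /* memory is available again */
    for (int i = 0; i < 6; i++)
        sim_output(&ifp);       /* queue fills up, then packets drop */
    if (timer_retry)
        sim_local_timer(&ifp);
    return ifp;
}
```

Without the retry, sim_run(false) ends with the interface still OACTIVE, a full queue and nothing sent, although the memory shortage is long over. With the retry, sim_run(true) drains the queue on the next timer tick.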

I am working on patches for em/igb/ixgb/ixgbe to fix these issues and would be
happy to share them with anyone who is interested.

Since these are really severe problems, I hope the gurus apply official fixes ASAP.










