Date:      Thu, 14 May 2009 17:17:01 -0400
From:      Alexander Sack <pisymbol@gmail.com>
To:        freebsd-current@freebsd.org
Subject:   Re: Broadcom bge(4) panics while shutting down
Message-ID:  <3c0b01820905141417h76e9104fl2800524e364d62b6@mail.gmail.com>
In-Reply-To: <3c0b01820905141301h1b08fc0ay1e6a1676b5a149d4@mail.gmail.com>
References:  <3c0b01820905141202w113966dp4bfbab73d84d585@mail.gmail.com> <4A0C7544.6010304@delphij.net> <3c0b01820905141301h1b08fc0ay1e6a1676b5a149d4@mail.gmail.com>

On Thu, May 14, 2009 at 4:01 PM, Alexander Sack <pisymbol@gmail.com> wrote:
> On Thu, May 14, 2009 at 3:47 PM, Xin LI <delphij@delphij.net> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi, Alexander,
>>
>> Alexander Sack wrote:
>>> Hello:
>>>
>>> Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
>>> running BGE CURRENT driver I see panics on shutdown.  The reason is
>>> because bge_rxeof() while processing its RX ring of BD's drops the
>>> softc lock when it hands it off to its input function.  If bge_stop()
>>> is waiting for it, it will then proceed to acquire lock and then
>>> quiesce the hardware (resetting the card, clearing out BDs etc.).  Once
>>> bge_stop() releases the softc lock, then bge_rxeof() under an
>>> interrupt context (no polling here) will reacquire and continue to
>>> process the ring which is a bad idea.  It should check to see if the
>>> card is still running before continuing processing BDs (i.e. once
>>> IFF_DRV_RUNNING has been reset by bge_stop(), bge_rxeof() is done, bail
>>> out).
>>>
>>> Here is my first go around with this patch:
>>>
>>>
>>> --- if_bge.c.CURRENT   2009-05-14 14:39:39.000000000 -0400
>>> +++ if_bge.c  2009-05-14 14:39:24.000000000 -0400
>>> @@ -3081,6 +3081,10 @@
>>>                uint16_t                vlan_tag = 0;
>>>                int                     have_tag = 0;
>>>
>>> +              if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
>>> +                      return;
>>> +              }
>>> +
>>>  #ifdef DEVICE_POLLING
>>>                if (ifp->if_capenable & IFCAP_POLLING) {
>>>                        if (sc->rxcycles <= 0)
>>>
>>>
>>> This prevents any panics during shutdown under heavy load and AS IT
>>> TURNS out (I feel stupid for not looking) that em(4) already had this
>>> check in its em_rxeof() function (right at the top of the loop).  I'm
>>> more than happy changing it to the em style but above seems reasonable
>>> to me though I have to verify there isn't anything missing off the
>>> loop from a hardware standpoint (I don't think so because bge_stop()
>>> did all the dirty work so I believe touching any registers after that
>>> from bge_rxeof() is a bad idea).
>>>
>>> Preliminary testing shows no more panics when starting and stopping
>>> ports under heavy load (panics were almost immediate otherwise).
>>>
>>> Thoughts?
>>
>> I think this would solve the problem but I'm not sure whether this would
>> increase some overhead on the RX path.  It seems that there is a race
>> between bge_release_resources() and bge_intr(), I mean, it might be a
>> good idea to "drain" bge_intr() instead?
>
> Are you talking about detach time?  Because bge_stop() gets called
> before bge_release_resources() and stops host interrupts so where is
> the race again?  I mean at this point no more interrupts should be
> delivered to bge_intr() (I can confirm from spec since BGE has
> released it in the wild).  So why would you "drain" it at this
> point....(the hardware is down including the firmware).
>
> I agree it adds a little overhead to the standard bge_rxeof() path
> which I agree is very sensitive to change.  However, I think the check
> at top is tolerable since the other recourse is crash.  I mean it's
> very easy to reproduce.  Flood a Broadcom card with traffic then stop
> the card and let the race begin...it will go down in bge_rxeof() after
> bge_stop releases the lock.
>
> I actually did not look at changing anything structurally to perhaps
> make this whole predicament better but minimally there should be a
> shield against this no?
>
> -aps
>

http://www.freebsd.org/cgi/query-pr.cgi?pr=134548

To track...with patch (though spacing got killed, my apologies, I
moved the check into the while logic a la em).  I've tested this with
zero issue so far.
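
For anyone following along, here is a rough userspace sketch of what
"moving the check into the while logic" amounts to.  It is not the
patch attached to the PR and not the real driver code: struct
fake_softc, fake_input() and fake_rxeof() are made-up stand-ins, the
"running" field only models IFF_DRV_RUNNING, and the stop_after hook
simulates bge_stop() getting the lock while the RX path has dropped it
to call the input function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the driver softc; NOT the real struct bge_softc. */
struct fake_softc {
	bool	running;	/* models IFF_DRV_RUNNING in if_drv_flags */
	size_t	cons;		/* consumer index into the RX ring */
	size_t	prod;		/* producer index, as written by "hardware" */
	size_t	stop_after;	/* test hook: "stop" fires during this input call */
	size_t	delivered;	/* packets handed up so far */
};

/*
 * Models handing a packet to the stack.  In the driver the softc lock is
 * dropped around this call, which is the window where a stop can run.
 */
static void
fake_input(struct fake_softc *sc)
{
	sc->delivered++;
	if (sc->delivered == sc->stop_after)
		sc->running = false;	/* what a stop routine would clear */
}

/* RX loop with the running check folded into the loop condition. */
static size_t
fake_rxeof(struct fake_softc *sc)
{
	size_t handled = 0;

	assert(sc != NULL);
	while (sc->running && sc->cons != sc->prod) {
		sc->cons = (sc->cons + 1) % RING_SIZE;
		fake_input(sc);
		handled++;
	}
	return (handled);
}
```

With five descriptors pending and the simulated stop landing during the
second input call, the loop bails out after two descriptors instead of
walking the rest of a ring that has already been torn down.  Since the
flag is re-read on every pass, the cost is one extra test per
descriptor.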

-aps


