Date:      Mon, 7 Dec 2009 23:17:11 -0500
From:      Alexander Sack <pisymbol@gmail.com>
To:        Scott Long <scottl@samsco.org>
Cc:        scottl@freebsd.org, freebsd-current@freebsd.org, emaste@freebsd.org, Jung-uk Kim <jkim@freebsd.org>
Subject:   Re: aac(4) resource FIB starvation on BUS scan revisited
Message-ID:  <3c0b01820912072017x7d85c9e3t875692d7264bc05@mail.gmail.com>
In-Reply-To: <0FFC216C-E938-48E4-B0E4-351077C6088A@samsco.org>
References:  <3c0b01820912071342u1c722b2clf9c8413e40097279@mail.gmail.com> <200912071931.46002.jkim@FreeBSD.org> <D7DDDA30-44B2-4E84-9F52-42DD2C43DC62@samsco.org> <200912072005.02662.jkim@FreeBSD.org> <3A549504-2AFE-4133-A8EF-642D53BC9F73@samsco.org> <3c0b01820912072000l7ad1a67ek3514dfccb96417be@mail.gmail.com> <0FFC216C-E938-48E4-B0E4-351077C6088A@samsco.org>

On Mon, Dec 7, 2009 at 11:04 PM, Scott Long <scottl@samsco.org> wrote:
>
> On Dec 7, 2009, at 9:00 PM, Alexander Sack wrote:
>
>> On Mon, Dec 7, 2009 at 8:14 PM, Scott Long <scottl@samsco.org> wrote:
>>>
>>> On Dec 7, 2009, at 6:05 PM, Jung-uk Kim wrote:
>>>>
>>>> On Monday 07 December 2009 07:47 pm, Scott Long wrote:
>>>>>
>>>>> On Dec 7, 2009, at 5:31 PM, Jung-uk Kim wrote:
>>>>>>
>>>>>> On Monday 07 December 2009 05:30 pm, Alexander Sack wrote:
>>>>>>>
>>>>>>> On Mon, Dec 7, 2009 at 4:42 PM, Alexander Sack <pisymbol@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Folks:
>>>>>>>>
>>>>>>>> I posted a similar thread on freebsd-scsi only to realize that
>>>>>>>> scottl had fixed my first issue during some MP CAM cleanup with
>>>>>>>> respect to a race during resource allocation issues on a later
>>>>>>>> version of the driver we are using (I believe we did the same
>>>>>>>> thing to resolve a lock issue on bootup).
>>>>>>>>
>>>>>>>> However on my RELENG_8 box with (2) Adaptec 5085s connected to
>>>>>>>> some JBODs (9TB each) I still have a FIB starvation issue
>>>>>>>> during the LUN scan:
>>>>>>>>
>>>>>>>> The number of FIBs allocated to this card is 512 (older cards
>>>>>>>> are 256).  The max_target per bus is 287.  On a six channel
>>>>>>>> controller with a BUS scan done in parallel I see a lot of
>>>>>>>> this:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> (probe501:aacp1:0:214:0): Request Requeued
>>>>>>>> (probe501:aacp1:0:214:0): Retrying Command
>>>>>>>> (probe520:aacp1:0:233:0): Request Requeued
>>>>>>>> (probe520:aacp1:0:233:0): Retrying Command
>>>>>>>> (probe528:aacp1:0:241:0): Request Requeued
>>>>>>>> (probe528:aacp1:0:241:0): Retrying Command
>>>>>>>> (probe540:aacp1:0:253:0): Request Requeued
>>>>>>>> (probe540:aacp1:0:253:0): Retrying Command
>>>>>>>> (probe541:aacp1:0:254:0): Request Requeued
>>>>>>>> (probe541:aacp1:0:254:0): Retrying Command
>>>>>>>> ....
>>>>>>>>
>>>>>>>> I think the driver is much happier with the following attached
>>>>>>>> patch (with dmesg).
>>>>>>>
>>>>>>> Patch again but this time not base-64 encoded:
>>>>>>
>>>>>> [SNIP!]
>>>>>>
>>>>>> I want it to be little conservative here, i.e., pre-allocating
>>>>>> half of max_fibs.  Will the attached patch work for you?
>>>>>
>>>>> The FIB allocation scheme was written when it was common for
>>>>> machines to only have 64MB of RAM and proportionally less KVA, so
>>>>> 256KB or 512KB was a lot of RAM to wire down.  Those days have
>>>>> probably passed.
>>>>
>>>> So, what would do if you were hypothetically rewriting it today? :-)
>>>>
>>>
>>> Most hardware have mechanisms for probing their command queue depth.
>>> What I typically do these days is allocate a minimum number of commands
>>> so that this probing can be done, then do a single slab allocation based
>>> on the results.  AAC doesn't have this capability, but the 256/512 size
>>> is pretty well understood.  The page-by-page allocation of aac works, but
>>> adds extra bookkeeping and complication to the driver.
>>>
>>
>> Right Scott, that is what JK and I discussed this evening.  I figured
>> the 128 macro was just historical cruft and your email confirms it.
>> So are we ALL okay with the original patch as it stands for now?  JK I
>> am fine with the divide 2 change but I think raising it to 256 is
>> really the way to go at this point!  :D
>
>
> If you're going to increase it, why not simply increase it to the max
> amount that is appropriate for each card?

Totally right!  I thought, though, that the max fibs variable was set up
by reading firmware bits.  Am I off?

        /* Check for broken hardware that does a lower number of commands */
        sc->aac_max_fibs = (sc->flags & AAC_FLAGS_256FIBS ? 256:512);

So checking against sc->aac_max_fibs would yield 512 up front on
modern controllers.
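
For reference, here is a rough sketch of the three pre-allocation options floated in this thread (the fixed 128, half of max_fibs, and the full per-card maximum).  It is not driver code: every EX_/ex_ name is made up for illustration, and EX_FLAGS_256FIBS uses a placeholder bit value that merely stands in for the driver's AAC_FLAGS_256FIBS.

```c
#include <assert.h>

/* Hypothetical stand-in for AAC_FLAGS_256FIBS; the bit value is a placeholder. */
#define EX_FLAGS_256FIBS 0x1u

/* Pre-allocation policies discussed in the thread (names are illustrative). */
enum ex_prealloc_policy {
	EX_PREALLOC_FIXED_128,	/* the fixed count behind the "128 macro" */
	EX_PREALLOC_HALF,	/* half of max_fibs, the conservative option */
	EX_PREALLOC_ALL		/* the maximum appropriate for the card */
};

/* Mirrors the quoted ternary: flagged cards get 256 FIBs, all others 512. */
static unsigned int
ex_max_fibs(unsigned int flags)
{
	return ((flags & EX_FLAGS_256FIBS) ? 256u : 512u);
}

/* Number of FIBs to allocate up front under a given policy. */
static unsigned int
ex_prealloc_count(unsigned int flags, enum ex_prealloc_policy policy)
{
	unsigned int max_fibs = ex_max_fibs(flags);

	switch (policy) {
	case EX_PREALLOC_FIXED_128:
		return (128u);
	case EX_PREALLOC_HALF:
		return (max_fibs / 2u);
	case EX_PREALLOC_ALL:
		return (max_fibs);
	}
	assert(0 && "unknown pre-allocation policy");
	return (0u);
}
```

Under these assumptions, a card without the 256-FIB flag would pre-allocate 256 FIBs with the "half" policy and 512 with the "all" policy.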

> One other thing I forgot to mention was contiguous memory.  The
> page-by-page allocation in aac has another benefit, and that's to not tax
> contigmalloc with finding 256KB of contiguous memory.  That's not a big
> deal at boot, but is a problem if you load the driver after the system has
> been running for a while.  It's immensely useful during development, but
> it's never been clear to me how useful it is in real life.

True.  I can't imagine that, even today, loading it late would be THAT
much of an issue (besides, it's a RAID controller; do you really think
you are going to load it so late in the game?).
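
To put numbers on the trade-off, here is a small sketch that compares page-by-page allocation with a single contiguous slab.  The sizes are assumptions, not values taken from the driver: 1 KB per FIB is inferred from the "256KB or 512KB" figures quoted above for 256/512 FIBs, and 4 KB is a typical page size.  All EX_/ex_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes (see the note above); neither comes from the driver source. */
#define EX_FIB_SIZE	1024u
#define EX_PAGE_SIZE	4096u

struct ex_fib_layout {
	size_t fibs_per_page;	/* FIBs that fit in one page-sized chunk */
	size_t pages_needed;	/* number of separate one-page allocations */
	size_t slab_bytes;	/* size of a single contiguous allocation */
};

/* Compare the two strategies for nfibs FIBs of fib_size bytes each. */
static struct ex_fib_layout
ex_compute_layout(size_t nfibs, size_t fib_size, size_t page_size)
{
	struct ex_fib_layout l;

	assert(fib_size > 0 && fib_size <= page_size);
	l.fibs_per_page = page_size / fib_size;
	/* Round up so that a partially filled last page is still counted. */
	l.pages_needed = (nfibs + l.fibs_per_page - 1) / l.fibs_per_page;
	l.slab_bytes = nfibs * fib_size;
	return (l);
}
```

With those assumed sizes, 512 FIBs work out to either 128 independent one-page allocations or one 512 KB contiguous region, and it is the latter that gets harder to find on a system that has been up for a while.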

I am filing a PR as we speak just to track this!

-aps


