Date:      Tue, 12 Apr 2005 22:21:11 -0600
From:      Scott Long <scottl@samsco.org>
To:        David Sze <dsze@alumni.uwaterloo.ca>
Cc:        mb@imp.ch
Subject:   Re: [PATCH] Stability fixes for IPS driver for 4.x
Message-ID:  <425C9E37.2010105@samsco.org>
In-Reply-To: <6.2.1.2.2.20050412234622.05a6daf8@mail.distrust.net>
References:  <4257F20C.70004@samsco.org> <6.2.1.2.2.20050411005214.065dc018@mail.distrust.net> <425A0BB2.10704@samsco.org> <6.2.1.2.2.20050411234713.069afb28@mail.distrust.net> <425C12E3.5050205@samsco.org> <6.2.1.2.2.20050412234622.05a6daf8@mail.distrust.net>

David Sze wrote:
> At 12:26 PM 12/04/2005 -0600, Scott Long wrote this to All:
> 
>> David Sze wrote:
>>
>>> At 11:31 PM 10/04/2005 -0600, Scott Long wrote this to All:
>>>
>>>> Making a driver PAE-ified means either teaching it to do 64-bit
>>>> scatter-gather (assuming that the peripheral hardware can do this
>>>> and that it's documented), or teaching the driver to correctly handle
>>>> EINPROGRESS from bus_dmamap_load() along with using the proper busdma
>>>> tag limits.  The strategy I took with 6.x/5.x was the second one since
>>>> I didn't have good IPS docs in front of me and I wanted it to follow the
>>>> APIs correctly.  I did test it with 8GB of memory and it performed
>>>> correctly under load.  I haven't taken a close enough look at your
>>>> MFC patch to say for sure if it's correct or not.  I'm not sure if
>>>> I'll have time to take another look in the next few days, 
>>>> unfortunately.
>>>> Is there any chance you could test 5.x/6.0 under load with PAE just to
>>>> validate the assertion that it works correctly there?
>>>
>>>
>>> I had a chance to test 5.4-RC1 (i386) today with GENERIC, SMP, PAE, 
>>> and SMP-PAE kernels (the last one is just PAE with "options SMP").
>>> To recap, the hardware is an IBM xSeries 346, Dual Xeon 3GHz 
>>> (non-EM64T), ServeRAID-7K.
>>> GENERIC and SMP survived "make buildkernel", but PAE and SMP-PAE 
>>> panicked reproducibly doing the same.  The DDB stack trace doesn't 
>>> appear to be anywhere near the IPS driver though, so I'm way out of 
>>> my league.
>>
>>
>> Darnit, hard to say if this is an existing bug in 5.4 or if it's a 
>> bug/corruption in ips.  Can you re-run with PAE disabled?
> 
> 
> Works fine with PAE disabled (or at least I couldn't get it to panic), 
> both UP and SMP kernels.
> 
> 
>> Would you be
>> willing to put the Giant lock back on top of the driver?  This would
>> mean modifying the call to bus_intr_config(), adding the D_GIANTNEEDED
>> flag to the disk structure in disk_create(), and switching the mutex
>> argument in bus_dma_tag_create() for the sg_dmatag tag.
> 
> 
> I put Giant back in as you described (patch attached), but it still 
> panicked with PAE enabled, both UP and SMP kernels.  The stack trace was 
> very similar; the fault address (0x24) and the top three stack frames 
> were the same as without Giant:
> 
>         propagate_priority
>         turnstile_wait
>         _mtx_lock_sleep
> 
> At this point I no longer have access to the hardware; the customer 
> wanted his servers back.  They're going into the datacenter with 
> RELENG_4 (w/IPS stability patch), without PAE (so the top ~900MB of his 
> 4GB RAM is lost to PCI-X address space).
> 
> 

Crumbs, I see a potential problem.  I won't have time until this weekend
to sort it out, though.  Sorry this has become such a drawn-out affair,
I hope that your customer isn't too upset.

Scott


