Date:      Wed, 19 Jul 2006 09:26:41 -0600
From:      Scott Long <scottl@samsco.org>
To:        User Freebsd <freebsd@hub.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, Achim_Leubner@adaptec.com, Robert Watson <rwatson@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: file system deadlock - the whole story?
Message-ID:  <44BE4F31.9020606@samsco.org>
In-Reply-To: <20060719115948.M1799@ganymede.hub.org>
References:  <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il>	<20060705100403.Y80381@fledge.watson.org>	<cone.1152136419.991036.72616.1000@zoraida.natserv.net>	<20060705234514.I70011@fledge.watson.org>	<20060715000351.U1799@ganymede.hub.org>	<20060715035308.GJ32624@deviant.kiev.zoral.com.ua>	<20060718074804.W1799@ganymede.hub.org>	<20060719112424.GK1464@deviant.kiev.zoral.com.ua>	<20060719082627.H1799@ganymede.hub.org>	<20060719151327.H5132@fledge.watson.org>	<20060719112208.Y1799@ganymede.hub.org>	<20060719154447.K5132@fledge.watson.org> <20060719115948.M1799@ganymede.hub.org>

User Freebsd wrote:
> On Wed, 19 Jul 2006, Robert Watson wrote:
> 
>> On Wed, 19 Jul 2006, User Freebsd wrote:
>>
>>>> Yes, this was going to be my next question -- if you're seeing 
>>>> wedges under load and there's a common controller in use, maybe 
>>>> we're looking at a driver bug.  Bugs of that sort typically look a 
>>>> lot like what you describe: an I/O is "lost" and so everything that 
>>>> depends on the I/O wedges waiting for it, leading to a lot of 
>>>> processes hanging around waiting for vnode locks, etc.
>>>
>>>
>>> 'k, but how do we debug *that*? :( If it was one, I'd suspect 
>>> hardware ... but *three*, and only acting up *after* upgrading to 
>>> FreeBSD 6.x, and only acting up under load ...
>>
>>
>> There are two normal approaches:
>>
>> (1) Switch controllers and see if the problem goes away, then blame the
>>    controller that was replaced. :-)
>>
>> (2) Debug the driver when the system is in the wedged state.  When Scott
>>    Long helped me out with an identical problem with the 3ware driver a
>>    few years ago, he basically added debugging output for the driver in
>>    the debugger to list the state of outstanding I/Os, count the number
>>    of in-bound, out-bound I/Os, etc, to try and find where the missing
>>    one was leaked. My impression is that once he had confirmed the
>>    presence of the problem, it was fairly easy to fix, but that
>>    confirming it required quite a bit of paperwork.
> 
> 
> 'k, first question is: will the core file provide any insight into this? 
> ie. provide further confirmation that it looks like the driver vs file 
> system?
> 
> second question, who is currently maintaining the iir driver?  I've CC'd 
> Achim in this, as he's listed in the man page as being the maintainer ...
> 
> Now, uranus has all the various kernel debugging enabled right now, and 
> a serial console, so we're good for the debugging side of things ... and 
> I believe that I can fairly easily "recreate" the issue by just moving a 
> whack of vServers onto that machine to give it the load that seems to 
> kill it ... *and* uranus is one of my newer machines, so the card that 
> is in it is fairly new ... but, since I have a full BIOS serial console 
> working on it, I should be able to get full model # and firmware 
> version, which I take it will help some?
> 

What exact version of FreeBSD are you dealing with?

Scott



