Date:      Wed, 27 Sep 2006 15:20:02 -0600
From:      Scott Long <scottl@samsco.org>
To:        David G Lawrence <dg@dglawrence.com>
Cc:        Peter Jeremy <peterjeremy@optushome.com.au>, freebsd-stable@freebsd.org, Oliver Brandmueller <ob@e-Gitt.NET>, John Baldwin <jhb@freebsd.org>
Subject:   Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Message-ID:  <451AEB02.2090806@samsco.org>
In-Reply-To: <20060927210349.GG14975@tnn.dglawrence.com>
References:  <451A1375.5080202@gneto.com> <20060927071538.GF22229@e-Gitt.NET> <451A4189.5020906@samsco.org> <20060927152824.GJ22229@e-Gitt.NET> <20060927155553.GB14563@icarus.home.lan> <20060927155904.GM22229@e-Gitt.NET> <451AA7B1.5080202@samsco.org> <20060927191402.GB932@turion.vk2pj.dyndns.org> <20060927210349.GG14975@tnn.dglawrence.com>

David G Lawrence wrote:
>>In the past (RELENG_5) I've had major problems with syncer delaying
>>interrupt threads for long periods (I've seen 8msec).  See
>>http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
>>I'm not sure if this is still a problem (but I am still having some
>>problems which may be caused by excessive interrupt and will be doing
>>some debugging as I get time).
> 
> ...
> 
>>tool and then post-process the file looking for oddities.  In my case,
>>there was a _very_ high correlation between long latencies and syncer.
>>If anyone's interested in this approach, I can provide the relevant
>>code diffs.
> 
> 
>    I've seen this problem as well - results in around 9-10ms of occasional
> scheduling delay for a real-time streaming application that I'm developing.
> Shutting off softupdates on all of the mounted filesystems helps.
>    Note that the watchdog timeout for the network drivers is usually 8000ms
> (8 seconds), so this is unlikely to be related to that problem.
> 

Well, I kinda danced around the issue before, but I'll say it now.  I,
as well as a few others, have seen instances of Giant being held by the
syncer for 5 or more seconds at a time.  I can't explain why, and I've
never been able to catch it in the act in a meaningful way.  But it is
known to happen.  My best wild guess is that the syncer is doing a lot
of work (there is no question about that), keeps getting preempted, and
as part of this blocks without dropping its locks.  Actually, this is
most likely exactly what is going on.  The syncer is sending out I/O and
is getting interrupted and preempted by the SATA controller and its
driver, so it winds up making very slow progress while never actually
releasing Giant.

An easy way to test this would be to turn off preemption.  Could someone
with this problem remove the 'options PREEMPTION' line from their kernel
config, then recompile and retest?  If this is in fact the root cause,
then it indeed has nothing to do with the em driver's INTR_FAST changes.
The easiest fix then becomes the ichsmb and USB driver shims that I
talked about.  The longer-term fix is to continue progress on making the
syncer run without Giant and also do less work.  I think that there
should also be some discussion of the locking consequences of preemption.
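
For anyone trying the preemption test, here is a minimal sketch of the
config change.  It assumes a custom kernel config copied from GENERIC;
the name MYKERNEL and the i386 path are only placeholders, so use your
own config name and architecture directory.

```
# /usr/src/sys/i386/conf/MYKERNEL   (hypothetical name and arch)
# Comment out the preemption option for the test kernel:
#options 	PREEMPTION		# Enable kernel thread preemption
```

Then rebuild and install from /usr/src in the usual way (make buildkernel
KERNCONF=MYKERNEL, then make installkernel KERNCONF=MYKERNEL) and reboot
before retesting.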

Scott


