Date:      Fri, 11 Sep 2009 22:47:12 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Linda Messerschmidt <linda.messerschmidt@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Intermittent system hangs on 7.2-RELEASE-p1
Message-ID:  <4AAB35E0.3000908@elischer.org>
In-Reply-To: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
References:  <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>	 <200909111102.14503.jhb@freebsd.org>	 <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>	 <200909111506.47309.jhb@freebsd.org>	 <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>	 <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>	 <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>

Linda Messerschmidt wrote:
> On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer <julian@elischer.org> wrote:
>> does the system have a serial console? how about a normal console/keyboard?
> 
> It has an IP KVM.
> 
>> how often does it hang? and for how long?
> 
> Well, this is interesting.  I got really frustrated with the other
> approach, so I thought I'd thin a machine down absolutely as far as I
> could, eliminate every possible source of delay, and see what happens.
>  I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything.
> The Apache and its exerciser are now the only things running on the
> machine, and the Apache is only touching an md0 swap device mounted on
> /mnt.  I *still* get the hangs.

OK, now we need to describe the hang. If you can predictably get a
hang every 7 seconds, does this mean that it doesn't respond to the
keyboard for a moment every 7 seconds? Or that it doesn't accept
packets every 7 seconds? If you lean on the A key, do you see the echo
stop for a moment every 7 seconds?

Or is it just the Apache process that hangs?

Does the watching process that you refer to below also hang?
Would it hang if it tried to access the disk?
If the watching process is on the same machine, does it only trigger
AFTER the request has taken a long time, or could it time out with a
select DURING the delayed response? (Another way of asking "how hung
is 'hung'?")
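
To make the "how hung is hung" question concrete, here is a minimal
Python sketch of a watcher that waits on a socket with select and a
timeout. It is not taken from the actual test program, and
wait_for_reply is a made-up name. If the watcher keeps running during
a stall, select should return at about the timeout with nothing
readable. If the watcher itself is frozen, the measured elapsed time
comes out well beyond the timeout it asked for.

```python
import select
import socket
import time


def wait_for_reply(sock, timeout_s):
    """Wait up to timeout_s seconds for sock to become readable.

    Returns (ready, elapsed_s). wait_for_reply is a hypothetical helper
    for illustration only.
    """
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [], timeout_s)
    elapsed = time.monotonic() - start
    return bool(readable), elapsed


if __name__ == "__main__":
    # A local socket pair stands in for the connection to the server.
    watcher, server = socket.socketpair()
    ready, elapsed = wait_for_reply(watcher, 0.1)
    # ready is False here because nothing was sent; an elapsed time far
    # above 0.1 s would suggest the watcher was stalled as well.
    print(ready, round(elapsed, 3))
    watcher.close()
    server.close()
```

Comparing elapsed against the requested timeout is the interesting
part: it separates "only the request was slow" from "the watching
process stopped running too".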



> 
> It hangs for all sorts of different periods, but the duration of the
> stall is approximately inversely proportional to the chance of seeing
> it.  To get a short delay, you need wait only a little bit.  If you
> want a 2-3 second delay, you may have to wait 15-20 minutes.
> 
> *However* in order to answer your question, I changed up the test
> program, which up til now has been cycling requests every 50 ms until
> it gets one >2s, at which point it sysctls to stop ktr and aborts.
> 
> Now it prints the timestamp of all "too long" requests.  But I also
> dropped the threshold for "too long" from 2s to 100ms, since with
> everything on RAM disk, there's no longer any reason to expect a
> request to take more than 1-2ms in the worst case.
> 
> The results are pretty profound:
> 
> 1252729876: request 82 131ms
> 1252729883: request 210 388ms
> 1252729890: request 338 380ms
> 1252729897: request 466 388ms
> 1252729904: request 594 404ms
> 1252729919: request 849 810ms
> 1252729926: request 977 386ms
> 1252729933: request 1105 370ms
> 1252729940: request 1233 366ms
> 1252729947: request 1361 400ms
> 1252729961: request 1617 746ms
> 1252729968: request 1744 477ms
> 1252729975: request 1872 388ms
> 1252729982: request 2000 380ms
> 1252729989: request 2128 384ms
> 1252729996: request 2256 395ms
> 
> It goes on and on like this, I get a 380-400ms stall every seven
> seconds.  I have had a few come back higher, in the 750-850ms range,
> usually after missing a beat:
> 
> 1252729897: request 466 388ms
> 1252729904: request 594 404ms
> 1252729919: request 849 810ms
> 1252729926: request 977 386ms
> 
> 1252730010: request 2512 416ms
> 1252730017: request 2640 390ms
> 1252730031: request 2896 774ms
> 1252730038: request 3023 431ms
> 
> 1252730454: request 10568 378ms
> 1252730461: request 10696 397ms
> 1252730475: request 10952 733ms
> 1252730482: request 11080 366ms
> 
> So far, nothing over 1s.
> 
> So what happens every seven seconds??
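
The spacing in the log above can be checked mechanically. A small
Python sketch, using the first seven lines of the log as input (the
helper names parse and gaps are mine, not part of the test program),
that prints the gap in seconds and in request count between
consecutive slow requests:

```python
# First seven lines of the quoted log, copied verbatim.
LOG = """\
1252729876: request 82 131ms
1252729883: request 210 388ms
1252729890: request 338 380ms
1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms
"""


def parse(log):
    """Turn 'STAMP: request SEQ DURms' lines into (stamp, seq, ms) tuples."""
    rows = []
    for line in log.splitlines():
        if not line.strip():
            continue
        stamp, _, seq, dur = line.split()
        # Drop the trailing ':' from the stamp and 'ms' from the duration.
        rows.append((int(stamp[:-1]), int(seq), int(dur[:-2])))
    return rows


def gaps(rows):
    """Return (seconds, requests) between consecutive slow requests."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(rows, rows[1:])]


if __name__ == "__main__":
    for seconds, requests in gaps(parse(LOG)):
        print(f"{seconds:3d} s  {requests:4d} requests")
```

On these lines the gaps come out at 7 seconds and 128 requests, except
across the missed beat, where they are 15 seconds and 255 requests.
Since the exerciser runs at a steady rate, this alone does not say
whether the trigger is elapsed time or request count; changing the
request interval might help separate the two.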



