Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Sep 2009 00:47:31 -0400
From:      Linda Messerschmidt <linda.messerschmidt@gmail.com>
To:        Julian Elischer <julian@elischer.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Intermittent system hangs on 7.2-RELEASE-p1
Message-ID:  <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
In-Reply-To: <4AAB1E34.2060908@elischer.org>
References:  <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer <julian@elischer.org> wro=
te:
> does the system have a serial console? how about a normal console /keyboa=
rd?

It has an IP KVM.

> how often deos it hang? and for =A0how long?

Well, this is interesting.  I got really frustrated with the other
approach, so I thought I'd thin a machine down absolutely as far as I
could, eliminate every possible source of delay, and see what happens.
 I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything.
The Apache and its exerciser are now the only things running on the
machine, and the Apache is only touching an md0 swap device mounted on
/mnt.  I *still* get the hangs.

It hangs for all sorts of different periods, but the duration of the
stall is approximately inversely proportional to the chance of seeing
it.  To get a short delay, you need wait only a little bit.  If you
want a 2-3 second delay, you may have to wait 15-20 minutes.

*However* in order to answer your question, I changed up the test
program, which up til now has been cycling requests every 50 ms until
it gets one >2s, at which point it sysctls to stop ktr and aborts.

Now it prints the timestamp of all "too long" requests.  But I also
dropped the threshold for "too long" from 2s to 100ms, since with
everything on RAM disk, there's no longer any reason to expect a
request to take more than 1-2ms in the worst case.

The results are pretty profound:

1252729876: request 82 131ms
1252729883: request 210 388ms
1252729890: request 338 380ms
1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms
1252729933: request 1105 370ms
1252729940: request 1233 366ms
1252729947: request 1361 400ms
1252729961: request 1617 746ms
1252729968: request 1744 477ms
1252729975: request 1872 388ms
1252729982: request 2000 380ms
1252729989: request 2128 384ms
1252729996: request 2256 395ms

It goes on and on like this, I get a 380-400ms stall every seven
seconds.  I have had a few come back higher, in the 750-850ms range,
usually after missing a beat:

1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms

1252730010: request 2512 416ms
1252730017: request 2640 390ms
1252730031: request 2896 774ms
1252730038: request 3023 431ms

1252730454: request 10568 378ms
1252730461: request 10696 397ms
1252730475: request 10952 733ms
1252730482: request 11080 366ms

So far, nothing over 1s.

So what happens every seven seconds??



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?237c27100909112147h64f71585p2a97f2b48a510985>