Date:      Sat, 12 Sep 2009 03:52:22 -0400
From:      Linda Messerschmidt <linda.messerschmidt@gmail.com>
To:        Julian Elischer <julian@elischer.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Intermittent system hangs on 7.2-RELEASE-p1
Message-ID:  <237c27100909120052k1db7e029xcf36e075865d29d8@mail.gmail.com>
In-Reply-To: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>
References:  <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> <4AAB35E0.3000908@elischer.org> <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>

OK, first, I figured out the seven second thing.  I actually had
already found that particular issue earlier in the troubleshooting
process, but forgot all about it when I pulled in a second machine to
test with.  It was simply a case of setting Apache's
MaxRequestsPerChild to a very low value (128) in combination with only
allowing 1 access at a time.  128 requests * (50ms sleep + 2ms request
+ overhead) ~= 7s.  So that was just noise masking the real problem,
which is less frequent and less predictable.  Sorry for the red
herring. :(
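
The arithmetic above can be checked with a short sketch. The 128-request limit, 50ms sleep, and 2ms request time come from this message; the per-request overhead is not stated, so the 3ms figure below is purely an assumed value chosen for illustration.

```python
# Rough time for one Apache child to use up MaxRequestsPerChild
# when a single client issues requests back to back.
MAX_REQUESTS_PER_CHILD = 128   # from the message
SLEEP_MS = 50                  # client sleep between requests
REQUEST_MS = 2                 # typical request time


def restart_period_s(overhead_ms=0.0):
    """Seconds until the child has served MAX_REQUESTS_PER_CHILD requests."""
    per_request_ms = SLEEP_MS + REQUEST_MS + overhead_ms
    return MAX_REQUESTS_PER_CHILD * per_request_ms / 1000.0


print(restart_period_s())      # lower bound with no overhead: about 6.66 s
print(restart_period_s(3.0))   # with an assumed 3 ms overhead: about 7.04 s
```

So any overhead of a few milliseconds per request puts the child restart interval at roughly seven seconds.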

On Sat, Sep 12, 2009 at 2:52 AM, Linda Messerschmidt
<linda.messerschmidt@gmail.com> wrote:
> If you're asking could the check script be modified to time out after,
> say, 1 second, and if so, would it return during the hang or after it?
> I don't know.  My guess based on the earlier ktrace output is that it
> would time out, but not return until the hang ended.  I'll see if
> the curl lib exposes a configurable timeout and try it.

This proved to be quite easy to do.  I ran the script twice, once with
the timeout and once without.
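
For readers who want to reproduce the test, here is a minimal sketch of a check loop with an optional per-request timeout. It is not the original script (which used a curl library); it uses Python's standard urllib instead, and the URL, the function names, the request count, and the 100ms logging threshold are all assumptions made for illustration.

```python
import time
import urllib.request

# Hypothetical target; the real URL is not given in the thread.
URL = "http://127.0.0.1/check"


def timed_request(url, timeout_s=None):
    """Fetch url once and return (elapsed_ms, ok).

    timeout_s=None waits indefinitely, like the run without a timeout.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
        ok = True
    except OSError:
        # Socket timeouts and urllib.error.URLError are both OSError subclasses.
        ok = False
    elapsed_ms = int((time.monotonic() - start) * 1000)
    return elapsed_ms, ok


def run(url=URL, timeout_s=1.0, count=3000, slow_ms=100):
    """Issue count requests, logging only the slow or failed ones."""
    for n in range(1, count + 1):
        elapsed_ms, ok = timed_request(url, timeout_s)
        if elapsed_ms >= slow_ms or not ok:
            note = "" if ok else " (<--- timeout or error)"
            print(f"{int(time.time())}: request {n} {elapsed_ms}ms{note}")
        time.sleep(0.05)  # 50ms sleep between requests, as in the thread
```

Calling run(timeout_s=None) corresponds to the run without a timeout, and run(timeout_s=1.0) to the run with the 1s timeout.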

Without timeout:
1252741492: request 910 101ms
1252741567: request 2133 1429ms
1252741603: request 2722 146ms

With 1s timeout:
1252741492: request 1078 106ms
1252741567: request 2302 1010ms (<--- Timeout)
1252741567: request 2303 273ms   (<--- after 50ms sleep, goes back to end of stall)
1252741603: request 2892 136ms

As you can see, the two scripts experience stalls in pretty much
lockstep, but the script itself does not appear affected, so it's just
on the Apache side.
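
The lockstep claim can be checked mechanically against the two log excerpts above. The sketch below parses both and compares the epoch seconds at which slow requests were logged; the 100ms threshold and the helper name are assumptions, and the log lines are copied from this message with the annotations removed.

```python
import re

WITHOUT_TIMEOUT = """\
1252741492: request 910 101ms
1252741567: request 2133 1429ms
1252741603: request 2722 146ms
"""

WITH_TIMEOUT = """\
1252741492: request 1078 106ms
1252741567: request 2302 1010ms
1252741567: request 2303 273ms
1252741603: request 2892 136ms
"""

LINE = re.compile(r"^(\d+): request (\d+) (\d+)ms")


def slow_seconds(log, threshold_ms=100):
    """Return the set of epoch seconds with a logged request >= threshold_ms."""
    seconds = set()
    for line in log.splitlines():
        m = LINE.match(line)
        if m and int(m.group(3)) >= threshold_ms:
            seconds.add(int(m.group(1)))
    return seconds


a = slow_seconds(WITHOUT_TIMEOUT)
b = slow_seconds(WITH_TIMEOUT)
print(sorted(a & b))   # [1252741492, 1252741567, 1252741603]
print(a == b)          # True
```

Both runs log slow requests in exactly the same three seconds. In the run with the timeout, the timed-out request, the sleep, and the following request add up to 1010 + 50 + 273 = 1333ms, which is in the same range as the single 1429ms stall seen without the timeout.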


