Date:      Thu, 10 Sep 2009 15:07:35 -0400
From:      Linda Messerschmidt <linda.messerschmidt@gmail.com>
To:        Julian Elischer <julian@elischer.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Intermittent system hangs on 7.2-RELEASE-p1
Message-ID:  <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>
In-Reply-To: <4AA94995.6030700@elischer.org>
References:  <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> <bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com> <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com> <4AA94995.6030700@elischer.org>

On Thu, Sep 10, 2009 at 2:46 PM, Julian Elischer <julian@elischer.org> wrote:
> I've noticed that schedgraph tends to show the idle threads slightly
> skewed one way or the other.  I think there is a cumulative rounding
> error in the way they are drawn due to the fact that they are run so
> often.  Check the raw data and I think you will find that you just
> need to imagine the idle threads slightly to the left or right a bit.

No, there's no period anywhere in the trace where either idle thread
didn't run for an entire second.
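
For anyone who wants to check the raw data rather than the picture, this
is roughly the test I mean. It is only a sketch: it assumes the trace has
already been reduced to (timestamp in seconds, thread name) pairs, one
per time a thread was switched in. That tuple layout, the thread names,
and the sample values are made up for illustration and are not the real
KTR record format or my actual data.

```python
from collections import defaultdict

def longest_gaps(events):
    """events: iterable of (timestamp_seconds, thread_name) run records.

    Returns {thread_name: longest interval between consecutive records}.
    Threads with only one record are omitted, since they have no interval.
    """
    last_seen = {}
    longest = defaultdict(float)
    for ts, name in sorted(events):
        if name in last_seen:
            longest[name] = max(longest[name], ts - last_seen[name])
        last_seen[name] = ts
    return dict(longest)

# Hypothetical sample: cpu1's idle thread goes 1.5 s without running.
sample = [
    (0.00, "idle: cpu0"), (0.25, "idle: cpu0"), (0.75, "idle: cpu0"),
    (0.00, "idle: cpu1"), (0.50, "idle: cpu1"), (2.00, "idle: cpu1"),
]
gaps = longest_gaps(sample)
# Threads that went a full second or more without being switched in.
stalled = sorted(name for name, gap in gaps.items() if gap >= 1.0)
```

If an idle thread really had been starved for the length of a stall, it
would show up in `stalled`; a small drawing skew in schedgraph would not.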

I'm pretty sure schedgraph is throwing in some nonsense results.  I
did capture a second, larger, dataset after a 2.1s stall, and
schedgraph includes an httpd process that supposedly spent 58 seconds
on the run queue.  I don't know if it's a dropped record or a parsing
error or what.

I do think on this second graph I can kind of see the *end* of the
stall, because all of a sudden a ton of processes... everything from
sshd to httpd to gmond to sh to vnlru to bufdaemon to fdc0... comes
off of whatever it's waiting on and hits the run queue.  The combined
run queues for both processors spike up to 32 tasks at one point and
then rapidly tail off as things return to normal.
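
The spike itself can be pulled out of the data without relying on the
graph. Again a sketch under assumptions: the input is a list of
(timestamp in seconds, delta) pairs covering both CPUs, where delta is
+1 when a thread is put on a run queue and -1 when one is taken off.
That reduction and the sample numbers are hypothetical, not what
schedgraph actually consumes.

```python
def peak_runqueue(events):
    """events: iterable of (timestamp_seconds, delta) across all CPUs,
    delta = +1 for an enqueue, -1 for a dequeue.

    Returns (peak_depth, timestamp_of_peak) for the combined run queue.
    """
    depth = 0
    peak = (0, None)
    for ts, delta in sorted(events):
        depth += delta
        if depth > peak[0]:
            peak = (depth, ts)
    return peak

# Hypothetical sample: quiet, then three threads wake together at 2.1 s.
burst = [
    (0.0, +1), (0.1, -1),
    (2.1, +1), (2.1, +1), (2.1, +1),
    (2.2, -1), (2.3, -1), (2.4, -1),
]
peak = peak_runqueue(burst)
```

A sharp peak right after a long quiet period is the signature I am
describing: nothing runnable during the stall, then everything at once.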

That pretty much matches the behavior shown by ktrace in my initial
post, where everything goes to sleep on something-or-other in the
kernel, and then at the end of the stall, everything wakes up at the
same time.

I think this means the problem is somehow related to locking, rather
than scheduling.


