Date:      Thu, 7 Jan 2016 14:04:43 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>, Ian Lepore <ian@freebsd.org>
Subject:   Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Message-ID:  <B7E8D0FD-B3A9-40DF-B0ED-9D3041F8B2A2@dsl-only.net>
In-Reply-To: <568ED92C.9070602@selasky.org>
References:  <E0379BE9-308A-4219-A8AE-A5FFE828BA93@dsl-only.net> <1452183170.1215.4.camel@freebsd.org> <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net> <1452196099.1215.12.camel@freebsd.org> <568EC4D8.7010106@selasky.org> <8B728C93-9C90-4821-A607-5D157F028812@dsl-only.net> <568ED810.8010309@selasky.org> <568ED92C.9070602@selasky.org>


On 2016-Jan-7, at 1:31 PM, Hans Petter Selasky <hps at selasky.org> wrote:
>
> On 01/07/16 22:26, Hans Petter Selasky wrote:
>> On 01/07/16 21:20, Mark Millard wrote:
>>>
>>> On 2016-Jan-7, at 12:04 PM, Hans Petter Selasky <hps at selasky.org>
>>> wrote:
>>>>
>>>> On 01/07/16 20:48, Ian Lepore wrote:
>>>>> If the filesystems and swap space are on a usb drive, then maybe it's
>>>>> the usb subsystem that's hanging.  The wait states you showed for those
>>>>> processes are consistent with what I've seen when all buffers get
>>>>> backed up in a queue on one non-responsive or slow device.  It may be
>>>>> that there's a way to get the system deadlocked when it's low on
>>>>> buffers and there is memory pressure causing the swap to be used (I
>>>>> generally run arm systems without any swap configured).
>>>>>
>>>>> Running gstat in another window while this is going on may give you
>>>>> some insight into the situation.  Beyond that I don't know what to look
>>>>> at, especially since you generally can't launch any new tools once the
>>>>> system gets into this kind of state.
>>>>>
>>>>> -- Ian
>>>>
>>>> Hi,
>>>>
>>>> All USB transfers towards disk devices have timeouts, so if something
>>>> is hanging at USB level, you'll get a printout eventually.
>>>
>>> What sort of timescale after deadlock/live-lock is observed to
>>> apparently have started does one have to wait in order to conclude
>>> that the timeouts would have happened and so they do not apply to the
>>> deadlock/live-lock?
>>>
>>>> The USB kernel processes needed for doing I/O transfers are not
>>>> pinned to RAM. Can it happen if a USB process is swapped to disk,
>>>> that the system cannot wakeup a swapped out process to get more swap?
>>>>
>>>> --HPS
>>>
>>
>> Hi,
>>
>>> Wow. Could I use ddb to somehow check on the "USB kernel processes"
>>> swap status when the overall context is deadlocked/live-locked?
>>
>> Are you able to run something like:
>>
>> ps auxwwH | grep usb
>>
>>> If yes, how? Otherwise something in top or some such display that I'd
>>> left running over the serial console would have to present useful
>>> information on the subject. Is there anything that would?
>>
>
> Are you able to SSH into the box or ping it?
>
> --HPS

Once the live-lock condition is reached no new processes can be created
as far as I can tell: the attempt will hang any process that attempts
the creation.

I'd need "ps auxwwH" to be internally repeating to even get that much:
I'd have to start it before the live-lock happened and it would have to
still be running when the hang occurs, with no on-going process
creations involved.
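
A minimal sketch of the "start it before the live-lock" idea, assuming a
plain sh loop left running on the serial console. The helper name
snapshot_loop and its arguments are hypothetical and do not come from
this thread. Each pass still forks new processes (date, the command
itself, sleep), so the loop would be expected to stall once process
creation hangs; at best the last snapshot printed shows the state
shortly before the hang.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): rerun a command at a fixed
# interval and label each snapshot with a counter and timestamp.
# Usage: snapshot_loop COUNT INTERVAL_SECONDS COMMAND [ARGS...]
# COUNT=0 means repeat until interrupted.
snapshot_loop() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        echo "--- snapshot $i: $(date) ---"
        # Every pass creates new processes, so this cannot report
        # anything once process creation itself hangs.
        "$@"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For the command suggested above this could be started before the
portmaster run as: snapshot_loop 0 5 sh -c 'ps auxwwH | grep usb'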

I'm not so sure that two communicating processes (ps and grep over a
pipe) would work, but I cannot get to even one new process so far.

ssh sessions also hang: input and output stop for them fairly generally.
(Sometimes the context is such that ^t still works but shows no progress
in what it reports.) No new ssh connections are possible: "Operation
timed out".

ping does respond normally: it is more of a live-lock status than a true
deadlock one overall.

The serial console still shows output from whatever it was already
running, as long as that process does nothing that locks up. Changing
what it is doing generally locks it up too.

Doing something like unplugging a usb keyboard or mouse or plugging one
in does show the expected messages via the console: it is more of a
live-lock status than a true deadlock one overall.

I can get to ddb after the hang. But I do not know what I'd do with it
to find any useful information.


As noted in another message: I used gstat instead of top on the serial
console:

> gstat shows everything zero during a hang, even L(q) column. (Length
> of queue?)
>
> I used:
>
> gstat -cod
>
> and had it running over the serial console port during the attempted
> portmaster activity.


===
Mark Millard
markmi at dsl-only.net