Date:      Thu, 7 Jan 2016 15:24:17 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Warner Losh <imp@bsdimp.com>, Hans Petter Selasky <hps@selasky.org>
Cc:        Ian Lepore <ian@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Message-ID:  <97E0840E-987C-4893-9E63-EA51741CFC75@dsl-only.net>
In-Reply-To: <CANCZdfqGUJ19Gbu=ermSGh1LJ5N9OPEyRYH9kPEAoaUmTuObdw@mail.gmail.com>
References:  <E0379BE9-308A-4219-A8AE-A5FFE828BA93@dsl-only.net> <1452183170.1215.4.camel@freebsd.org> <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net> <1452196099.1215.12.camel@freebsd.org> <568EC4D8.7010106@selasky.org> <8B728C93-9C90-4821-A607-5D157F028812@dsl-only.net> <568ED810.8010309@selasky.org> <568ED92C.9070602@selasky.org> <B7E8D0FD-B3A9-40DF-B0ED-9D3041F8B2A2@dsl-only.net> <D44C4EF3-0976-45E7-944A-A8F23D3D89BF@dsl-only.net> <CANCZdfqGUJ19Gbu=ermSGh1LJ5N9OPEyRYH9kPEAoaUmTuObdw@mail.gmail.com>

On 2016-Jan-7, at 2:28 PM, Warner Losh <imp at bsdimp.com> wrote:
>
> 4 page requests shouldn't hang the whole system. That should be more like hundreds or thousands depending on the tuning you've done.
>
> Warner
>

FYI: I do not remember doing any explicit tuning. Other than having an SSD for the root file system (via fstab content) and using cortex-a7 related compile options, things are default, with ssh and little else enabled as I remember. I'm even currently running KERNCONF=RPI2 instead of my RPI2-NODBG variant.

For my note about L(q)==4 for md0: "SWAP/swap/md0" showed 0. The only "name" showing a non-zero value was "md0" --and only for L(q).
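
Since the interesting L(q) value only showed up after a long wait, a filter that prints just the non-zero rows might make such a change easier to spot. The following is a minimal sketch, not something used in this thread: the function name nonzero_lq is hypothetical, and it assumes that gstat's batch output starts each refresh with a "dT:" line, puts L(q) in the first column, and puts the provider name in the last column. It also assumes the installed gstat supports -B (endless batch mode). The pipeline is two processes, so both would have to be started before the hang, and whether a pipe keeps working during the live-lock is untested.

```shell
#!/bin/sh
# Sketch: keep only gstat rows whose queue length L(q) is non-zero.
# Assumptions (not confirmed in this thread): each refresh begins with a
# "dT:" line, the first field of a data row is L(q), and the last field
# is the provider name.
nonzero_lq() {
    awk '
        $1 == "dT:"                { sample++; next }   # start of a new refresh
        $1 ~ /^[0-9]+$/ && $1 > 0  { print "sample " sample ": L(q)=" $1 " " $NF }
    '
}

# Hypothetical usage, started on the serial console before the hang:
#   gstat -B -cod | nonzero_lq
```

The header row is skipped automatically because its first field is not a number, and rows with L(q) of 0 print nothing, so the console stays quiet until a queue actually backs up.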



It does look like the latest hang finally produced some messages: 3 copies of

smsc0: warning: failed to create new mbuf

but these messages do not normally appear.



> On Thu, Jan 7, 2016 at 3:16 PM, Mark Millard <markmi@dsl-only.net> wrote:
> I'm top posting this change of information about the hang status seen via gstat:
>
> After a long time the gstat -cod is showing a non-zero value in one place:
>
> L(q) for md0 is showing 4 now.
>
> (I've no clue when it changed. I do not expect that I missed the 4 before.)
>
> md0 is for the file-system based page file. That file is on the SSD, not the sdcard.
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
> On 2016-Jan-7, at 2:04 PM, Mark Millard <markmi@dsl-only.net> wrote:
>
> >
> > On 2016-Jan-7, at 1:31 PM, Hans Petter Selasky <hps at selasky.org> wrote:
> >>
> >> On 01/07/16 22:26, Hans Petter Selasky wrote:
> >>> On 01/07/16 21:20, Mark Millard wrote:
> >>>>
> >>>> On 2016-Jan-7, at 12:04 PM, Hans Petter Selasky <hps at selasky.org> wrote:
> >>>>>
> >>>>> On 01/07/16 20:48, Ian Lepore wrote:
> >>>>>> If the filesystems and swap space are on a usb drive, then maybe it's
> >>>>>> the usb subsystem that's hanging.  The wait states you showed for those
> >>>>>> processes are consistent with what I've seen when all buffers get
> >>>>>> backed up in a queue on one non-responsive or slow device.  It may be
> >>>>>> that there's a way to get the system deadlocked when it's low on
> >>>>>> buffers and there is memory pressure causing the swap to be used (I
> >>>>>> generally run arm systems without any swap configured).
> >>>>>>
> >>>>>> Running gstat in another window while this is going on may give you
> >>>>>> some insight into the situation.  Beyond that I don't know what to look
> >>>>>> at, especially since you generally can't launch any new tools once the
> >>>>>> system gets into this kind of state.
> >>>>>>
> >>>>>> -- Ian
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> All USB transfers towards disk devices have timeouts, so if something
> >>>>> is hanging at USB level, you'll get a printout eventually.
> >>>>
> >>>> What sort of timescale after deadlock/live-lock is observed to
> >>>> apparently have started does one have to wait in order to conclude
> >>>> that the timeouts would have happened and so they do not apply to the
> >>>> deadlock/live-lock?
> >>>>
> >>>>> The USB kernel processes needed for doing I/O transfers are not
> >>>>> pinned to RAM. Can it happen if a USB process is swapped to disk,
> >>>>> that the system cannot wakeup a swapped out process to get more swap?
> >>>>>
> >>>>> --HPS
> >>>>
> >>>
> >>> Hi,
> >>>
> >>>> Wow. Could I use ddb to somehow check on the "USB kernel processes"
> >>>> swap status when the overall context is deadlocked/live-locked?
> >>>
> >>> Are you able to run something like:
> >>>
> >>> ps auxwwH | grep usb
> >>>
> >>>> If yes, how? Otherwise something in top or some such display that I'd
> >>>> left running over the serial console would have to present useful
> >>>> information on the subject. Is there anything that would?
> >>>
> >>
> >> Are you able to SSH into the box or ping it?
> >>
> >> --HPS
> >
> > Once the live-lock condition is reached no new processes can be created as far as I can tell: the attempt will hang any process that attempts the creation.
> >
> > I'd need "ps auxwwH" to be internally repeating to even get that much: I'd have to start it before the live-lock happened and it would have to be still running when the hang occurs, no on-going process creations involved.
> >
> > I'm not so sure that two communicating processes (ps and grep over a pipe) would work but I can not get to even one new process so far.
> >
> > ssh sessions also hang, input and output stop for them fairly generally. (Sometimes the context is such that ^t still works but shows no progress in what it reports.) No new ssh connections are possible: "Operation timed out".
> >
> > ping does respond normally: it is more of a live-lock status than a true deadlock one overall.
> >
> > The serial console still outputs what it was already running if that process does nothing that locks up. Changing what it is doing generally locks it up too.
> >
> > Doing something like unplugging a usb keyboard or mouse or plugging one in does show the expected messages via the console: it is more of a live-lock status than a true deadlock one overall.
> >
> > I can get to ddb after the hang. But I do not know what I'd do with it to find any useful information.
> >
> >
> > As noted in another message: I used gstat instead of top on the serial console:
> >
> >> gstat shows everything zero during a hang, even L(q) column. (Length of queue?)
> >>
> >> I used:
> >>
> >> gstat -cod
> >>
> >> and had it running over the serial console port during the attempted portmaster activity.
> >
> >
> ===
> Mark Millard
> markmi at dsl-only.net

===
Mark Millard
markmi at dsl-only.net
