Date:      Thu, 7 Jan 2016 11:24:09 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Message-ID:  <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net>
In-Reply-To: <1452183170.1215.4.camel@freebsd.org>
References:  <E0379BE9-308A-4219-A8AE-A5FFE828BA93@dsl-only.net> <1452183170.1215.4.camel@freebsd.org>


On 2016-Jan-7, at 8:12 AM, Ian Lepore <ian@freebsd.org> wrote:
>
> On Thu, 2016-01-07 at 02:19 -0800, Mark Millard wrote:
>> I've had various hangs when the rpi2 was busy over longish periods,
>> both debug buildkernel/buildworld builds of the arm and non-debug
>> variants. No log files or console messages produced.
>>
>> I've not had any analogous issues with powerpc64 (PowerMac G5) or
>> with amd64 (Virtual Box used on Mac OS X).
>>
>> I've finally discovered that if I have, say, top running on the rpi2
>> serial console that top continues to update its display so long as I
>> leave it alone during the hang. (Otherwise it hangs too.) So I
>> finally have a little window for seeing some of what is happening.
>>
>> An example top display showed after the hang:
>>
>> Mem: 764M Active 12M Inact 141M Wired 98M Buf 8k free
>> Swap: 2048M Total 29M Used 2019 Free 1% in use
>>
>> (Yep: Just 8K free Mem.)
>>
>
> That's not a problem.
>
>> The unusual STATEs for processes seemed to be (for the specific
>> hang):
>>
>> STATE   COMMANDs
>> pfault  [ld] [ld] /usr/sbin/syslogd
>> vmwait  [ld] [md0] [kernel]
>> wswbuf  [pagedaemon]
>>
>> Those same 3 states seem to always be involved. Some of the processes
>> vary from one hang to the next: the prior hang had build/genautoma ,
>> /usr/sbin/moused , and /usr/sbin/ntpd instead of 3 [ld]'s.
>>
>> /usr/sbin/syslogd, [md0], [kernel], and [pagedaemon] and their states
>> do not seem to vary (so far).
>>
>
> Everything is backed up waiting for slow sdcard IO.  You can get an
> amd64 system with many cores and gigabytes of ram into the same state
> with an sdcard (or any other storage device that takes literally
> seconds for any individual IO to complete).  All the available buffers
> get queued up to the one slow device, then you can't do anything that
> requires IO (even launch tools to try to figure out what's going on).
>
> -- Ian

This is not an sdcard: the root file system is on a fast, 400GB+ SSD
that is USB 3.0 capable (not that the rpi2 uses it that way). Note the
"da0", the size, and such below (everything other than /boot/msdos):

ugen0.5: <Other World Computing> at usbus0
umass0: <Other World Computing Envoy Pro, class 0/0, rev 2.10/1.00, addr 5> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <ASMT 2105 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number XXXXXXXXXXXX
Release APs
da0: 40.000MB/s transfers
da0: 457862MB (937703088 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
. . .
Starting file system checks:
/dev/ufs/RPI2rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/RPI2rootfs: clean, 109711666 free (14002 frags, 13712208 blocks, 0.0% fragmentation)
Mounting local file systems:.
. . .

> Filesystem          1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI2rootfs    443473 16791 391203     4%    /
> devfs                       0     0      0   100%    /dev
> /dev/mmcsd0s1              49     7     42    15%    /boot/msdos
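
As a quick cross-check that the da0 size line above really describes a
400GB+ device, here is a minimal sketch (Python, run on any machine, not
something taken from the rpi2). The sector count and sector size are
copied from the da0 line; the function name size_in_mib is just made up
for illustration, and I am assuming the "MB" in that line means 2**20
bytes.

```python
# Sanity check of the da0 size line from the boot messages above.
# The two constants are copied from that line; the helper name is
# hypothetical and only exists for this illustration.
SECTOR_COUNT = 937703088   # sectors reported for da0
SECTOR_BYTES = 512         # bytes per sector reported for da0


def size_in_mib(sectors: int, sector_bytes: int) -> int:
    """Return the capacity in whole units of 2**20 bytes, rounded down."""
    return (sectors * sector_bytes) // (1024 * 1024)


if __name__ == "__main__":
    print(size_in_mib(SECTOR_COUNT, SECTOR_BYTES))
```

Under that assumption the result is 457862, the same figure the da0 line
reports.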


In USB 3.0 contexts I have never observed an IO taking seconds on this
type of SSD, and I use them that way extensively. Nor have I seen it in
USB 2.0 use, though that is a less common context for me. Nor have I had
any problems with the type of USB 3.0 capable hub messing up IO.

I use this type of SSD to hold the Virtual Box virtual machine(s) in
which I run amd64 FreeBSD on Mac OS X. No problems there. But it is true
that I've never directly booted amd64 FreeBSD from one of these SSDs in
a non-virtual amd64 context.

Ignoring that for a moment: so this is acceptable/expected FreeBSD
behavior when a "disk" device is slow? Interesting. I've let it sit for
hours and the hang does not clear: it is effectively deadlocked for
overall usage. The rpi2 will never be able to do buildworld,
buildkernel, ports builds, etc. reliably if this is the sort of behavior
that results.

Back to this context: Is there a way for me to confirm the queuing of
buffers to the SSD? Or at least get some detail about its buffer usage?
Can I get some information from ddb that would confirm/deny/provide
insight?






===
Mark Millard
markmi at dsl-only.net



