Date: Thu, 7 Jan 2016 11:24:09 -0800
From: Mark Millard <markmi@dsl-only.net>
To: Ian Lepore <ian@freebsd.org>
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Message-ID: <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net>
In-Reply-To: <1452183170.1215.4.camel@freebsd.org>
References: <E0379BE9-308A-4219-A8AE-A5FFE828BA93@dsl-only.net> <1452183170.1215.4.camel@freebsd.org>

On 2016-Jan-7, at 8:12 AM, Ian Lepore <ian@freebsd.org> wrote:

> On Thu, 2016-01-07 at 02:19 -0800, Mark Millard wrote:
>> I've had various hangs when the rpi2 was busy over longish periods,
>> both debug buildkernel/buildworld builds of the arm and non-debug
>> variants. No log files or console messages produced.
>>
>> I've not had any analogous issues with powerpc64 (PowerMac G5) or
>> with amd64 (Virtual Box used on Mac OS X).
>>
>> I've finally discovered that if I have, say, top running on the rpi2
>> serial console that top continues to update its display so long as I
>> leave it alone during the hang. (Otherwise it hangs too.) So I
>> finally have a little window for seeing some of what is happening.
>>
>> An example top display showed after the hang:
>>
>> Mem: 764M Active 12M Inact 141M Wired 98M Buf 8k free
>> Swap: 2048M Total 29M Used 2019 Free 1% in use
>>
>> (Yep: Just 8K free Mem.)
>>
>
> That's not a problem.
>
>> The unusual STATEs for processes seemed to be (for the specific
>> hang):
>>
>> STATE   COMMANDs
>> pfault  [ld] [ld] /usr/sbin/syslogd
>> vmwait  [ld] [md0] [kernel]
>> wswbuf  [pagedaemon]
>>
>> Those same 3 states seem to always be involved. Some of the processes
>> vary from one hang to the next: the prior hang had build/genautoma ,
>> /usr/sbin/moused , and /usr/sbin/ntpd instead of 3 [ld]'s.
>>
>> /usr/sbin/syslogd, [md0], [kernel], and [pagedaemon] and their states
>> do not seem to vary (so far).
>>
>>
>
> Everything is backed up waiting for slow sdcard IO. You can get an
> amd64 system with many cores and gigabytes of ram into the same state
> with an sdcard (or any other storage device that takes literally
> seconds for any individual IO to complete). All the available buffers
> get queued up to the one slow device, then you can't do anything that
> requires IO (even launch tools to try to figure out what's going on).
>
> -- Ian

This is not the (or a) sdcard for the root file system; it is a fast,
400GB+ SSD, USB 3.0 capable (not that rpi2 uses it that way). Note below
the "da0" and the size and such (other than /boot/msdos):

ugen0.5: <Other World Computing> at usbus0
umass0: <Other World Computing Envoy Pro, class 0/0, rev 2.10/1.00, addr 5> on usbus0
umass0: SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <ASMT 2105 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number XXXXXXXXXXXX
Release APs
da0: 40.000MB/s transfers
da0: 457862MB (937703088 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
. . .
Starting file system checks:
/dev/ufs/RPI2rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/RPI2rootfs: clean, 109711666 free (14002 frags, 13712208 blocks, 0.0% fragmentation)
Mounting local file systems:.
. . .

> Filesystem          1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI2rootfs    443473 16791 391203     4%    /
> devfs                       0     0      0   100%    /dev
> /dev/mmcsd0s1              49     7     42    15%    /boot/msdos

In USB 3.0 contexts I have never observed an IO taking seconds for these
types of SSDs, and I use them that way extensively. Nor for USB 2.0 uses,
though that is not as common a context for me. Nor have I had any
problems with this type of USB 3.0 capable hub messing up IO.

I use this type of SSD to hold my Virtual Box virtual machine(s) that I
run amd64 FreeBSD in on Mac OS X. No problems there. But it is true that
I've never directly booted amd64 FreeBSD from one of these SSDs in a
non-virtual amd64 context.

Ignoring that for a moment: so this is an acceptable/expected FreeBSD
behavior when a "disk" device is slow? Interesting. I've let it sit for
hours and the hangup does not clear: it is effectively deadlocked for
overall usage. The rpi2 never will be able to buildworld, buildkernel,
ports, etc. reliably if this is the sort of behavior that results.

Back to this context: Is there a way for me to confirm the queuing of
buffers to the SSD? Or at least some detail about its buffer usage? Can
I get some information from ddb that would confirm/deny/provide insight?

===
Mark Millard
markmi at dsl-only.net