From: Mark Millard
Date: Thu, 7 Jan 2016 14:04:43 -0800
Cc: freebsd-arm, Ian Lepore
Subject: Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
In-Reply-To: <568ED92C.9070602@selasky.org>
References: <1452183170.1215.4.camel@freebsd.org> <1452196099.1215.12.camel@freebsd.org> <568EC4D8.7010106@selasky.org>
 <8B728C93-9C90-4821-A607-5D157F028812@dsl-only.net> <568ED810.8010309@selasky.org> <568ED92C.9070602@selasky.org>
To: Hans Petter Selasky

On 2016-Jan-7, at 1:31 PM, Hans Petter Selasky wrote:

>
> On 01/07/16 22:26, Hans Petter Selasky wrote:
>> On 01/07/16 21:20, Mark Millard wrote:
>>>
>>> On 2016-Jan-7, at 12:04 PM, Hans Petter Selasky
>>> wrote:
>>>>
>>>> On 01/07/16 20:48, Ian Lepore wrote:
>>>>> If the filesystems and swap space are on a usb drive, then maybe it's
>>>>> the usb subsystem that's hanging. The wait states you showed for those
>>>>> processes are consistent with what I've seen when all buffers get
>>>>> backed up in a queue on one non-responsive or slow device. It may be
>>>>> that there's a way to get the system deadlocked when it's low on
>>>>> buffers and there is memory pressure causing the swap to be used (I
>>>>> generally run arm systems without any swap configured).
>>>>>
>>>>> Running gstat in another window while this is going on may give you
>>>>> some insight into the situation. Beyond that I don't know what to look
>>>>> at, especially since you generally can't launch any new tools once the
>>>>> system gets into this kind of state.
>>>>>
>>>>> -- Ian
>>>>
>>>> Hi,
>>>>
>>>> All USB transfers towards disk devices have timeouts, so if something
>>>> is hanging at USB level, you'll get a printout eventually.
>>>
>>> What sort of timescale after deadlock/live-lock is observed to
>>> apparently have started does one have to wait in order to conclude
>>> that the timeouts would have happened and so they do not apply to the
>>> deadlock/live-lock?
>>>
>>>> The USB kernel processes needed for doing I/O transfers are not
>>>> pinned to RAM. Can it happen if a USB process is swapped to disk,
>>>> that the system cannot wakeup a swapped out process to get more swap?
>>>>
>>>> --HPS
>>>
>>
>> Hi,
>>
>>> Wow. Could I use ddb to somehow check on the "USB kernel processes"
>>> swap status when the overall context is deadlocked/live-locked?
>>
>> Are you able to run something like:
>>
>> ps auxwwH | grep usb
>>
>>> If yes, how? Otherwise something in top or some such display that I'd
>>> left running over the serial console would have to present useful
>>> information on the subject. Is there anything that would?
>>
>
> Are you able to SSH into the box or ping it?
>
> --HPS

Once the live-lock condition is reached, no new processes can be created
as far as I can tell: the attempt hangs any process that attempts the
creation.

I'd need "ps auxwwH" to be internally repeating to even get that much:
I'd have to start it before the live-lock happened and it would have to
still be running when the hang occurs, with no on-going process
creations involved.

I'm not so sure that two communicating processes (ps and grep over a
pipe) would work, but so far I cannot get even one new process started.

ssh sessions also hang: input and output stop for them fairly generally.
(Sometimes the context is such that ^t still works but shows no progress
in what it reports.) No new ssh connections are possible: "Operation
timed out".

ping does respond normally: it is more of a live-lock status than a true
deadlock one overall.

The serial console still outputs what it was already running if that
process does nothing that locks up. Changing what it is doing generally
locks it up too.

Doing something like unplugging a usb keyboard or mouse or plugging one
in does show the expected messages via the console: it is more of a
live-lock status than a true deadlock one overall.

I can get to ddb after the hang. But I do not know what I'd do with it
to find any useful information.

As noted in another message: I used gstat instead of top on the serial
console:

> gstat shows everything zero during a hang, even L(q) column. (Length
> of queue?)
>
> I used:
>
> gstat -cod
>
> and had it running over the serial console port during the attempted
> portmaster activity.

===
Mark Millard
markmi at dsl-only.net