From owner-freebsd-current@FreeBSD.ORG Fri Dec 20 19:03:04 2013
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5364C5FA;
 Fri, 20 Dec 2013 19:03:04 +0000 (UTC)
Received: from outpost1.zedat.fu-berlin.de (outpost1.zedat.fu-berlin.de [130.133.4.66])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0C3EF1109;
 Fri, 20 Dec 2013 19:03:03 +0000 (UTC)
Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69])
 by outpost.zedat.fu-berlin.de (Exim 4.82) with esmtp
 (envelope-from ) id <1Vu5Li-004BLk-CI>;
 Fri, 20 Dec 2013 20:03:02 +0100
Received: from telesto.geoinf.fu-berlin.de ([130.133.86.198] helo=telesto)
 by inpost2.zedat.fu-berlin.de (Exim 4.82) with esmtpsa
 (envelope-from ) id <1Vu5Li-001QGx-8j>;
 Fri, 20 Dec 2013 20:03:02 +0100
Date: Fri, 20 Dec 2013 20:02:55 +0100
From: "O. Hartmann"
To: Alan Somers
Subject: Re: ZFS/zpool command blocks ...
 locking up all terminals
Message-ID: <20131220200255.22d2f1b7@telesto>
In-Reply-To:
References: <20131220115534.52e79a76@telesto>
Organization: FU Berlin
X-Mailer: Claws Mail 3.9.3 (GTK+ 2.24.22; amd64-portbld-freebsd11.0)
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 boundary="Sig_/8L//lek1G0Ig4P0jzt7g=C9"; protocol="application/pgp-signature"
X-Originating-IP: 130.133.86.198
X-ZEDAT-Hint: A
Cc: FreeBSD CURRENT
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 20 Dec 2013 19:03:04 -0000

--Sig_/8L//lek1G0Ig4P0jzt7g=C9
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

On Fri, 20 Dec 2013 11:23:25 -0700
Alan Somers wrote:

> On Fri, Dec 20, 2013 at 3:55 AM, O. Hartmann
> wrote:
> >
> > I have a faulty pool with an ambiguous label and I tried to resolve
> > that problem. ZFS is at the moment highly active copying data from
> > several volumes to another.
> >
> > Operating system:
> >
> > 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r259522: Tue Dec 17 19:02:10
> > CET 2013 amd64
> >
> > In one terminal I exported the pool in question and tried to list it
> > via "zpool import". But this command sequence locks up the
> > terminal for an hour up!
> >
> > In another terminal I tried to issue the command "zpool status" to
> > watch the status of the pools (I have several). But this terminal
> > is also locked up right now!
> >
> > What is wrong here? I had such an issue in 10.0-CURRENT as well. It
> > seems ZFS is locking everything up and can only be brought back by a
> > hard reset! What is going on? Why is zpool locking up in trying to
> > display a label-scrambled pool while the zpool status is then also
> > locked up, but the latter is supposed to show the status of the other,
> > healthy pools?
> > This reminds me of single-threaded tools which lock up
> > every operation consecutively issued after the blocking command.
> >
> > How is this to be solved?
>
> Sounds like a deadlock. Did the "zpool export" complete successfully?

No, it didn't; it has now been stuck for ~8 hours, as has "zpool status".

> Did the pool become suspended at any point? Can you get to the

The pools that were not exported (two further pools) are under heavy load
at the moment. I can't check the status of the exported pool, since the
command is blocking.

> kernel debugger? Most importantly, can you reproduce it? If you can,
> you'll probably need a WITNESS enabled kernel to get any useful info.

I regret that I have no debugging kernel on this machine. Whether the
problem is reproducible remains unanswered, since I have no chance at
the moment to repeat the procedure under the very same conditions. I
noticed the same behaviour in 10.0-CURRENT three months ago. I do not
recall the exact conditions. What I do recall is that, after all
operations on all pools had finished, the "deadlock" was released. At the
moment I am copying ~4 TB of data from a pool (RAIDZ-0) to an external
drive (via USB 3.0, also a ZFS pool). That takes hours, and I suspect the
deadlock will last until the copying has finished. But it is scary that
a single faulty command can block all further ZFS/zpool operations,
even on different pools.

> When I find a deadlock, I usually go into the kernel debugger and
> issue the following commands. It results in about a megabyte of
> output, so use screen or tmux or something to capture the output
>
> x/s version
> show msginfo
> ps
> alltrace
> show alllocks # You need witness for this one

I will try this later, after the backup has gone through. Thank you very
much.

Oliver

>
> -Alan
>
> >
> > Oliver
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to
> "freebsd-current-unsubscribe@freebsd.org"

--Sig_/8L//lek1G0Ig4P0jzt7g=C9
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQEcBAEBAgAGBQJStJRkAAoJEOgBcD7A/5N8z4QIAK66BVzuvm7mdQOqKipxTpPD
rBhMQt544VsK75o0+u7iA57e6B7A88phny174Nq8zFXvQ6kprZviZ7nuSWoWFVzj
1hbfvb5K4KL9bX6bteyFCgvSe4pR/e9qe9cc3wUBoesXxf+cVY69+DcDpU3SiGMF
5Xk/r7f6ABAMILupyTLYrSZ51ZWr4WdJPvC1CIaGkgzn5FkteDRalzkODFRXtuv3
cOZjMP4k/cMwRT3uelXOYfHG4inwwdmZLqRM2pW2PVgB51DmPvXJO9sdbBCZtpzU
4xaKg0CR3z+iA3iq/j7VN0Yej+218DZVhQZo9OLYvCDYR3LEYEEyEeHPh2B9wnI=
=k0rW
-----END PGP SIGNATURE-----

--Sig_/8L//lek1G0Ig4P0jzt7g=C9--