From owner-freebsd-current@FreeBSD.ORG Sun Sep 7 17:00:41 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7799399B for ; Sun, 7 Sep 2014 17:00:41 +0000 (UTC) Received: from smtprelay05.ispgateway.de (smtprelay05.ispgateway.de [80.67.31.99]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 06CBD1DD7 for ; Sun, 7 Sep 2014 17:00:40 +0000 (UTC) Received: from [87.79.252.204] (helo=fabiankeil.de) by smtprelay05.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1XQfpI-0000Of-Cf for freebsd-current@freebsd.org; Sun, 07 Sep 2014 19:00:32 +0200 Date: Sun, 7 Sep 2014 19:00:33 +0200 From: Fabian Keil To: Subject: Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock Message-ID: <69f90aa2.5d2989ae@fabiankeil.de> In-Reply-To: <540C8039.7010309@delphij.net> References: <492dbacb.5942cc9b@fabiankeil.de> <540C66AC.8070809@delphij.net> <4fa875ba.3cc970d7@fabiankeil.de> <540C8039.7010309@delphij.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/A5pQFSMRW5iUxcWMFYlZxJV"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Sep 2014 17:00:41 -0000 --Sig_/A5pQFSMRW5iUxcWMFYlZxJV Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Xin Li wrote: > On 9/7/14 11:23 PM, Fabian Keil wrote: > > Xin Li wrote: > >=20 > >> On 9/7/14 9:02 PM, Fabian Keil wrote: > >>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got > >>> the following panic yesterday: > >>>=20 > >>> [...] Unread portion of the kernel message buffer: [6880] > >>> panic: deadlkres: possible deadlock detected for > >>> 0xfffff80015289490, blocked for 1800503 ticks > >>=20 > >> Any chance to get all backtraces (e.g. thread apply all bt full > >> 16)? I think a different thread that held the lock have been > >> blocked, probably related to your disconnected vdev. > >=20 > > Output of "thread apply all bt full 16" is available at:=20 > > http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-deadlo= ck.txt > > > > A lot of the backtraces prematurely end with "Cannot access memory > > at address", therefore I also added "thread apply all bt" output. > >=20 > > Apparently there are at least two additional threads blocking below > > spa_get_stats(): [...] > Yes, thread 1182 owned the lock and is waiting for the zio be done. > Other threads that wanted the lock would have to wait. >=20 > I don't have much clue why the system entered this state, however, as > the operations should have errored out (the GELI device is gone on > 21:44:56 based on your log, which suggests all references were closed) > instead of waiting. It occurred to me that I have relevant auth.log entries as well: Sep 6 20:54:31 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/spaceloop Sep 6 20:54:44 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/takems Sep 6 20:54:51 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool import -d /dev/label takems Sep 6 20:55:30 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs send -i @2014-08-13_23:10 tank/home/fk@2014-0= 9-06_19:56 Sep 6 20:55:30 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/h= ome/fk Sep 6 20:55:46 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs send -i @2014-08-13_23:10 tank/home/fk@2014-0= 9-06_19:56 Sep 6 20:55:46 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/h= ome/fk [...] Sep 6 21:28:47 r500 sudo: fk : TTY=3Dpts/6 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool status spaceloop Sep 6 21:29:43 r500 sudo: fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool export takems Sep 6 21:29:46 r500 sudo: fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli detach label/takems.eli Sep 6 21:30:08 r500 sudo: fk : TTY=3Dpts/10 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool clear spaceloop Sep 6 21:44:16 r500 sudo: fk : TTY=3Dpts/11 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/usr/sbin/usbconfig Sep 6 21:44:56 r500 sudo: fk : TTY=3Dpts/11 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/usr/sbin/usbconfig -d 1.3 reset Sep 6 21:46:26 r500 sudo: fk : TTY=3Dpts/1 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/usr/sbin/usbconfig Sep 6 22:03:33 r500 login: login on ttyv0 as fk IIRC, I tried the USB reset because the "zfs receive ... spaceloop/*", "zpool status spaceloop" and "zpool clear spaceloop" processes (and some that weren't called with sudo and thus weren't logged) got stuck and while it caused the kernel to close label/spaceloop.eli as intended, it did not noticeable affect the deadlocked processes. I don't remember exactly why the same ZFS stream was sent twice, but the mo= st likely explanation seems to be that I aborted the operation before it was d= one and it's conceivable that this left the system in a state that caused the s= econd attempt to get stuck. Fabian --Sig_/A5pQFSMRW5iUxcWMFYlZxJV Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlQMjzEACgkQBYqIVf93VJ3JhQCfYxpbckkuu4k0LhEBrdFEz+oe qVEAoJ+WeRJfPjfnVreyC/aMiUAzKNZg =tiZ7 -----END PGP SIGNATURE----- --Sig_/A5pQFSMRW5iUxcWMFYlZxJV--