Date: Sun, 7 Sep 2014 19:00:33 +0200 From: Fabian Keil <freebsd-listen@fabiankeil.de> To: <freebsd-current@freebsd.org> Subject: Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock Message-ID: <69f90aa2.5d2989ae@fabiankeil.de> In-Reply-To: <540C8039.7010309@delphij.net> References: <492dbacb.5942cc9b@fabiankeil.de> <540C66AC.8070809@delphij.net> <4fa875ba.3cc970d7@fabiankeil.de> <540C8039.7010309@delphij.net>
next in thread | previous in thread | raw e-mail | index | archive | help
--Sig_/A5pQFSMRW5iUxcWMFYlZxJV Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Xin Li <delphij@delphij.net> wrote: > On 9/7/14 11:23 PM, Fabian Keil wrote: > > Xin Li <delphij@delphij.net> wrote: > >=20 > >> On 9/7/14 9:02 PM, Fabian Keil wrote: > >>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got > >>> the following panic yesterday: > >>>=20 > >>> [...] Unread portion of the kernel message buffer: [6880] > >>> panic: deadlkres: possible deadlock detected for > >>> 0xfffff80015289490, blocked for 1800503 ticks > >>=20 > >> Any chance to get all backtraces (e.g. thread apply all bt full > >> 16)? I think a different thread that held the lock have been > >> blocked, probably related to your disconnected vdev. > >=20 > > Output of "thread apply all bt full 16" is available at:=20 > > http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-deadlo= ck.txt > > > > A lot of the backtraces prematurely end with "Cannot access memory > > at address", therefore I also added "thread apply all bt" output. > >=20 > > Apparently there are at least two additional threads blocking below > > spa_get_stats(): [...] > Yes, thread 1182 owned the lock and is waiting for the zio be done. > Other threads that wanted the lock would have to wait. >=20 > I don't have much clue why the system entered this state, however, as > the operations should have errored out (the GELI device is gone on > 21:44:56 based on your log, which suggests all references were closed) > instead of waiting. It occurred to me that I have relevant auth.log entries as well: Sep 6 20:54:31 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/spaceloop Sep 6 20:54:44 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/takems Sep 6 20:54:51 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool import -d /dev/label takems Sep 6 20:55:30 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs send -i @2014-08-13_23:10 tank/home/fk@2014-0= 9-06_19:56 Sep 6 20:55:30 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/h= ome/fk Sep 6 20:55:46 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs send -i @2014-08-13_23:10 tank/home/fk@2014-0= 9-06_19:56 Sep 6 20:55:46 r500 sudo: fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/h= ome/fk [...] Sep 6 21:28:47 r500 sudo: fk : TTY=3Dpts/6 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool status spaceloop Sep 6 21:29:43 r500 sudo: fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool export takems Sep 6 21:29:46 r500 sudo: fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli detach label/takems.eli Sep 6 21:30:08 r500 sudo: fk : TTY=3Dpts/10 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool clear spaceloop Sep 6 21:44:16 r500 sudo: fk : TTY=3Dpts/11 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/usr/sbin/usbconfig Sep 6 21:44:56 r500 sudo: fk : TTY=3Dpts/11 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/usr/sbin/usbconfig -d 1.3 reset Sep 6 21:46:26 r500 sudo: fk : TTY=3Dpts/1 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/usr/sbin/usbconfig Sep 6 22:03:33 r500 login: login on ttyv0 as fk IIRC, I tried the USB reset because the "zfs receive ... spaceloop/*", "zpool status spaceloop" and "zpool clear spaceloop" processes (and some that weren't called with sudo and thus weren't logged) got stuck and while it caused the kernel to close label/spaceloop.eli as intended, it did not noticeable affect the deadlocked processes. I don't remember exactly why the same ZFS stream was sent twice, but the mo= st likely explanation seems to be that I aborted the operation before it was d= one and it's conceivable that this left the system in a state that caused the s= econd attempt to get stuck. Fabian --Sig_/A5pQFSMRW5iUxcWMFYlZxJV Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlQMjzEACgkQBYqIVf93VJ3JhQCfYxpbckkuu4k0LhEBrdFEz+oe qVEAoJ+WeRJfPjfnVreyC/aMiUAzKNZg =tiZ7 -----END PGP SIGNATURE----- --Sig_/A5pQFSMRW5iUxcWMFYlZxJV--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?69f90aa2.5d2989ae>