Date: Tue, 9 Mar 2010 13:58:15 +0100
From: Pawel Jakub Dawidek
To: Borja Marcos
Cc: FreeBSD Stable, Stefan Bethke
Subject: Re: Many processes stuck in zfs
Message-ID: <20100309125815.GF3155@garage.freebsd.pl>
References: <864468D4-DCE9-493B-9280-00E5FAB2A05C@lassitu.de> <20100309122954.GE3155@garage.freebsd.pl>
List-Id: Production branch of FreeBSD source code

On Tue, Mar 09, 2010 at 01:57:07PM +0100, Borja Marcos wrote:
>
> On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote:
>
> > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> >> Over the past couple of months, I've more or less regularly observed
> >> machines having more and more processes stuck in the zfs wchan. The
> >> processes never recover from that, and trying to reboot only gets the
> >> entire system stuck, without any console messages. I can enter the
> >> debugger, and I have saved a couple of dumps.
> >>
> >> The situation seems to be triggered by zfs receive'ing snapshots from
> >> the sister machine (both synchronize their active ZFS filesystems to
> >> each other, using zfs send and zfs receive). It appears it's the
> >> receiving that causes the trouble.
> >>
> >> Both machines run 8-stable from mid-February, with a single-disk ZFS
> >> pool, the ARC limited to 512M, and prefetch and ZIL disabled via
> >> loader.conf.
> >>
> >> What should I be looking at to further diagnose?
> >
> > What kind of hardware do you have there? There is a 3-way deadlock I
> > have a fix for, which would be hard to trigger on single- or dual-core
> > machines.
> >
> > Feel free to try the fix:
> >
> > http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch
>
> Maybe related to the deadlock I reported when I was receiving an
> incremental snapshot while the target dataset was being read?

Could be. In general, this deadlock is related to the zfs recv
functionality.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
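For anyone seeing the same symptom, one quick way to list the processes sitting in the zfs wait channel is to filter ps(1) output on its WCHAN column. This is a minimal sketch that is not from the thread: it assumes a BSD-style ps that accepts `-axo pid,wchan,comm`, and the helper names `ps_snapshot` and `stuck_in_wchan` are hypothetical. It only identifies candidate PIDs whose kernel state is worth examining; it does not diagnose the deadlock itself.

```python
import subprocess
from typing import List, Tuple


def ps_snapshot() -> str:
    """Capture PID, wait channel and command name for every process.

    Assumes a BSD-style ps(1), as on FreeBSD, that accepts these -o keywords.
    """
    result = subprocess.run(
        ["ps", "-axo", "pid,wchan,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def stuck_in_wchan(ps_output: str, wchan: str = "zfs") -> List[Tuple[int, str]]:
    """Return (pid, command) pairs whose WCHAN column equals `wchan`.

    Expects `ps -axo pid,wchan,comm` output: one header line, then
    whitespace-separated PID, WCHAN and COMMAND columns.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # COMMAND may itself contain spaces
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        pid, chan, comm = fields
        if chan == wchan:
            matches.append((int(pid), comm))
    return matches
```

On an affected machine, `print(stuck_in_wchan(ps_snapshot()))` would print the stuck PIDs with their command names. The parsing is kept separate from the ps call so it can be checked against captured output.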
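A 3-way deadlock, as mentioned above, is a cycle of length three in a wait-for graph. Thread T1 waits on a lock held by T2, T2 waits on one held by T3, and T3 waits on one held by T1, so none of them can proceed. The sketch below finds such a cycle, assuming each blocked thread waits on exactly one other thread. The thread names are hypothetical, and the sketch does not model the actual ZFS locks addressed by the patch.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    `waits_for` maps each blocked thread to the thread holding the lock it
    is waiting on. Assumes every thread waits on at most one other thread.
    """
    for start in waits_for:
        path: List[str] = []
        seen = set()
        node = start
        # Follow wait-for edges until reaching a thread that is not blocked
        # (no outgoing edge) or revisiting a thread already on this walk.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # The walk came back to `node`: the cycle is the part of the
            # path starting at its first occurrence.
            return path[path.index(node):]
    return None


# Hypothetical 3-way deadlock: each thread waits on the next one's lock.
example = {"T1": "T2", "T2": "T3", "T3": "T1"}
print(find_cycle(example))
```

With the mapping above it returns `["T1", "T2", "T3"]`. If the `"T3": "T1"` edge is removed, T3 is no longer blocked and the function returns `None`.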