Date: Tue, 9 Mar 2010 13:29:54 +0100
From: Pawel Jakub Dawidek
To: Stefan Bethke
Cc: FreeBSD Stable
Subject: Re: Many processes stuck in zfs
Message-ID: <20100309122954.GE3155@garage.freebsd.pl>
In-Reply-To: <864468D4-DCE9-493B-9280-00E5FAB2A05C@lassitu.de>

On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> Over the past couple of months, I've more or less regularly observed machines having more and more processes stuck in the zfs wchan. The processes never recover from that, and trying to reboot only gets the entire system stuck, without any console messages. I can enter the debugger, and I have saved a couple of dumps.
>
> The situation seems to be triggered by zfs receive'ing snapshots from the sister machine (both synchronize their active ZFS filesystems to each other, using zfs send and zfs receive). It appears it's the receiving causing trouble.
>
> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>
> What should I be looking at to further diagnose?

What kind of hardware do you have there? There is a 3-way deadlock I have a fix for, which would be hard to trigger on single- or dual-core machines.

Feel free to try the fix:

	http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
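
For anyone who wants to check whether a machine shows the symptom from the quoted report (a growing number of processes parked on the zfs wait channel), here is a minimal Python sketch that tallies processes per wait channel. It assumes text in the shape printed by `ps -ax -o pid,mwchan,command` on FreeBSD. The function name `count_wait_channels` and the `SAMPLE` listing are made up for illustration and are not taken from the thread or from any real machine. It only counts. It says nothing about why the processes are stuck.

```python
from collections import Counter


def count_wait_channels(ps_output: str) -> Counter:
    """Tally processes per wait channel.

    Expects text shaped like the output of
    `ps -ax -o pid,mwchan,command`: one header row, then one
    row per process with the wait channel in the second column.
    """
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Split into at most three fields so that spaces inside
        # the command column do not disturb the first two columns.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # ignore malformed or empty rows
        counts[fields[1]] += 1
    return counts


# Hypothetical sample listing, invented for this example.
SAMPLE = """\
  PID MWCHAN COMMAND
    1 wait   /sbin/init --
  812 zfs    zfs receive tank/backup
  934 zfs    /usr/sbin/cron -s
 1020 select /usr/sbin/sshd
 1101 zfs    ls /tank
"""

if __name__ == "__main__":
    # Print wait channels from most to least common.
    for chan, n in count_wait_channels(SAMPLE).most_common():
        print(f"{chan:10s} {n}")
```

Running the same tally a few minutes apart and comparing the count for `zfs` would show whether the number of stuck processes keeps growing, which is the pattern described above.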