From: Kostik Belousov
To: Ulrich Spoerlein
Cc: stable@freebsd.org
Date: Fri, 16 Mar 2007 12:32:45 +0200
Subject: Re: Snapshot deadlock while dumping
Message-ID: <20070316103245.GI80993@deviant.kiev.zoral.com.ua>
In-Reply-To: <7ad7ddd90703160121u6e5b208fqcbc4221a0cbdd03f@mail.gmail.com>
List-Id: Production branch of FreeBSD source code

On Fri, Mar 16, 2007 at 09:21:13AM +0100, Ulrich Spoerlein wrote:
> Hi,
>
> One of our fileservers deadlocked, again. It is running RELENG_6 from
> 2006-11-14 and was running dump(8) -L on a 11% filled 400GB UFS2
> volume. It is hanging for 3h hours now, and there is no disk activity.
>
> # ps axl | grep snap
>     0    46     0   1  -4  0     0     8 snaplk DL    ??   98:58.88 [bufdaemon]
>     0    48     0   0  -4  0     0     8 snaplk DL    ??   68:22.58 [syncer]
>     0 15179 11192   5   8  0  1708  1044 wait   I+    p1    0:00.00 sh -c /sbin/mksnap_ffs /export/
>     0 18738 15179   0  -8  0  2776  1756 getbuf D+    p1    0:04.07 /sbin/mksnap_ffs /export/homes
>
> Quotas are enabled in the server, but the filesystems are currently
> mounted without quota support (they were once mounted with userquota,
> though).
>
> Thanks,
> Uli

And what is the question? You know what is needed to debug the hang.
In addition to DDB, "options DEBUG_LOCKS" and "options DEBUG_VFS_LOCKS"
would be very helpful.

From the wait channel for proc 18738, I suspect that the problem might
be the LOR (lock order reversal) between the cg buffer lock and snaplk.
The fix was committed to CURRENT some time ago, and I'm waiting for the
re@ decision on whether the change can be MFCed.

In the meantime, if you can systematically reproduce the problem, I
would recommend that you, in addition to providing a proper deadlock
report, try the following patch (it was heavily reviewed and tested
before being committed to CURRENT):
http://people.freebsd.org/~kib/misc/bdwrite.8.patch
(just ignore the xfs chunk).

> PS: I can't break to DDB, as it is not configured for this server.
> What are the recommended DDB settings for _production_ servers? I want
> them to reboot on panic, but be able to grab the panic string via
> serial console. Is something like this gonna do the trick? Is there
> some kind of performance impact?
>
> options KDB
> options DDB
> options KDB_UNATTENDED
> options ALT_BREAK_TO_DEBUGGER
>
> It should *NOT* enter the debugger, if I plug/pull an RS232 cable. I
> read somewhere, that some controllers do send a break if the cable
> gets pulled, IIRC.

It seems to be a reasonable set of options (see above for
DEBUG_VFS_LOCKS, which would have some impact on performance).
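
For reference, the options named in this thread can be collected into a
single kernel configuration fragment. This is only a sketch: every
option name is taken from the messages above, while the comments are a
brief gloss that should be checked against the NOTES files of the
release in use before relying on them.

```
# Debugger support, as proposed in the original report.
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger backend
options 	KDB_UNATTENDED		# do not stop in the debugger on panic
options 	ALT_BREAK_TO_DEBUGGER	# enter the debugger via an alternate
					# console key sequence, not a serial BREAK

# Extra lock debugging suggested in the reply, for diagnosing the hang.
options 	DEBUG_LOCKS		# additional lockmgr lock debugging data
options 	DEBUG_VFS_LOCKS		# additional vnode lock checks
```

As the reply notes, DEBUG_VFS_LOCKS has some performance impact, so on
a production server it may make sense to keep the last two options only
while the deadlock is being investigated.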