From owner-freebsd-fs@FreeBSD.ORG Thu May 1 17:09:52 2014
Date: Thu, 1 May 2014 10:09:51 -0700
From: David Wolfskill
To: Kirk McKusick
Subject: Re: SU+J: 185 processes in state "suspfs" for >8 hrs. ... not good, right?
Message-ID: <20140501170951.GI1120@albert.catwhisker.org>
References: <20140501161856.GH1120@albert.catwhisker.org> <201405011651.s41GphgX089174@chez.mckusick.com>
In-Reply-To: <201405011651.s41GphgX089174@chez.mckusick.com>
Cc: fs@freebsd.org

On Thu, May 01, 2014 at 09:51:43AM -0700, Kirk McKusick wrote:
>>...
>
> The following fix for related problems was made to head and MFC'ed
> to stable/10 but not stable/9.
>
> *** stable/9/sys/ufs/ffs/ffs_vnops.c 2014-03-05 08:51:48.000000000 -0800
> --- stable/9/sys/ufs/ffsffs_vnops.c 2014-05-01 09:41:35.000000000 -0700
> ***************
> *** 258,266 ****
>                         continue;
>                 if (bp->b_lblkno > lbn)
>                         panic("ffs_syncvnode: syncing truncated data.");
> !               if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL))
>                         continue;
> -               BO_UNLOCK(bo);
>                 if ((bp->b_flags & B_DELWRI) == 0)
>                         panic("ffs_fsync: not dirty");
>                 /*
> --- 258,274 ----
>                         continue;
>                 if (bp->b_lblkno > lbn)
>                         panic("ffs_syncvnode: syncing truncated data.");
> !               if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) {
> !                       BO_UNLOCK(bo);
> !               } else if (wait != 0) {
> !                       if (BUF_LOCK(bp,
> !                           LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK,
> !                           BO_LOCKPTR(bo)) != 0) {
> !                               bp->b_vflags &= ~BV_SCANNED;
> !                               goto next;
> !                       }
> !               } else
>                         continue;
>                 if ((bp->b_flags & B_DELWRI) == 0)
>                         panic("ffs_fsync: not dirty");
>                 /*
>
> The associated comment is:
>
> If we fail to do a non-blocking acquire of a buf lock while doing a
> waiting sync pass we need to do a blocking acquire and restart.
> Another thread, typically the buf daemon, may have this buf locked and
> if we don't wait we can fail to sync the file. This lead to a great
> variety of softdep panics and deadlocks because we rely on all
> dependencies being flushed before proceeding in several cases.

Cool -- thanks!

> Let me know if it helps your problem. If it does, I will MFC it to 9.
> There have been several other fixes made to SU+J that are more likely
> to be the cause of your problem, but they are not easily back-ported
> to stable/9. So if this does not fix your problem my only suggestions
> are to turn off journaling or move to running on stable/10.
>
> Kirk McKusick

Roger that. And yes, stable/10 is a goal -- but I *just* finally
managed to get the machines migrated from 8.2-ish to 9.2. :-)

(Note: I do not have direct control -- merely a measure of influence. :-})

Peace,
david
-- 
David H. Wolfskill david@catwhisker.org
Taliban: Evil cowards with guns afraid of truth from a 14-year old girl.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
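
The control flow that the quoted commit comment describes (try a
non-blocking acquire; on a waiting pass fall back to a blocking acquire
and restart the scan; on a non-waiting pass skip the busy buf) can be
sketched in user space roughly as follows. This is only a
single-threaded simulation, not kernel code: struct toy_buf, the toy_*
functions, and the held_ticks counter standing in for "another thread,
typically the buf daemon" are all hypothetical names invented for
illustration, and none of them are FreeBSD APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel buf; not a FreeBSD structure. */
struct toy_buf {
    bool dirty;      /* still needs to be written */
    bool scanned;    /* already considered on this pass (cf. BV_SCANNED) */
    bool ours;       /* lock currently held by the syncing code */
    int  held_ticks; /* > 0: simulated other holder keeps the lock this long */
};

/* Non-blocking acquire (cf. LK_NOWAIT): 0 on success, nonzero if busy. */
static int toy_trylock(struct toy_buf *bp)
{
    if (bp->held_ticks > 0)
        return 1;
    bp->ours = true;
    return 0;
}

static void toy_unlock(struct toy_buf *bp)
{
    assert(bp->ours);
    bp->ours = false;
}

/*
 * Blocking acquire that reports failure if it had to wait
 * (cf. LK_SLEEPFAIL).  On failure the lock is NOT held.
 */
static int toy_lock_sleepfail(struct toy_buf *bp)
{
    if (toy_trylock(bp) == 0)
        return 0;               /* acquired without waiting */
    while (bp->held_ticks > 0)
        bp->held_ticks--;       /* simulate sleeping until release */
    return 1;                   /* we waited: caller must rescan */
}

/* One sync pass over the bufs; returns how many were flushed. */
static int toy_sync(struct toy_buf *bufs, size_t n, int wait)
{
    int flushed = 0;
    size_t i;

    for (i = 0; i < n; i++)
        bufs[i].scanned = false;
restart:
    for (i = 0; i < n; i++) {
        struct toy_buf *bp = &bufs[i];

        if (bp->scanned || !bp->dirty)
            continue;
        bp->scanned = true;
        if (toy_trylock(bp) == 0) {
            /* Fast path: lock acquired without waiting. */
        } else if (wait != 0) {
            if (toy_lock_sleepfail(bp) != 0) {
                bp->scanned = false;  /* reconsider this buf */
                goto restart;
            }
        } else {
            continue;                 /* non-waiting pass: skip busy buf */
        }
        bp->dirty = false;            /* "write" the buffer */
        toy_unlock(bp);
        flushed++;
    }
    return flushed;
}
```

In this model a pass with wait == 0 leaves a busy buf dirty, which is
the pre-patch behaviour for every pass, while a pass with wait != 0
flushes it after one restart.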