Date: Wed, 3 Dec 2008 04:45:07 -0800
From: David Wolfskill
To: Danny Braniss
Cc: hackers@freebsd.org
Subject: Re: NFS (& amd?) dysfunction descending a hierarchy
Message-ID: <20081203124507.GE96383@bunrab.catwhisker.org>
References: <20081203001538.GC96383@bunrab.catwhisker.org>

On Wed, Dec 03, 2008 at 02:20:32PM +0200, Danny Braniss wrote:
> ...
> i'll try to check it here soon, but in the meantime, could you try the same
> but mounting directly, not via amd, to remove one item from the equation?
> (I don't know how much amd is involved here, but if you are running on a
> 64bit host, amd could be swapped out, in which case it tends to realy screw
> things up, which is not your case, but ...)

Sorry; I should have mentioned that the NFS client was running RELENG_7_1
as of Monday morning, i386 arch. The amd.conf file specifies "plock" for
amd(8).

Note that merely telling amd(8) to kick the interval of attempted
unmounts from 2 minutes to 12 hours appears to avoid the observed
symptoms, so I'm fairly confident that bypassing amd(8) altogether would
do so as well.

In looking at the output from ktrace against amd(8), I recall having
seen that shortly before an observed failure, the (master) amd process
forks a child to attempt the unmount; the child issues an unmount, the
return for which is EBUSY (IIRC -- I'm not in a good position to check
just at the moment), so the child terminates with an "interrupted system
call".

I'd have thought that since the attempted unmount failed, it wouldn't
make any difference, but it's right around that point that rm(1) is told
that a directory entry it found earlier doesn't exist, which rather
snowballs into the previously-described symptoms.

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
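
For reference, the two amd(8) settings discussed above might look roughly like this in amd.conf. This is only a sketch under stated assumptions: the message does not say which option was used to lengthen the unmount interval, so the use of dismount_interval (the amd.conf counterpart of amd's -w flag, default 120 seconds) is an assumption, as is placing both settings in the [ global ] section.

```
# Sketch of an amd.conf fragment; option choice and placement are assumptions.
[ global ]
# Lock the amd(8) process in memory so it cannot be swapped out.
plock = yes
# Attempt unmounts every 12 hours (43200 seconds) instead of the
# default 2 minutes (120 seconds).
dismount_interval = 43200
```

With a 12-hour interval, a long-running rm(1) over the automounted tree would most likely finish before amd(8) forks a child to attempt an unmount, which is consistent with the symptoms disappearing but does not explain why a failed unmount disturbs the client in the first place.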