From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
Organization: OmniLAN
Date: Mon, 29 Oct 2012 20:38:23 +0100
To: stable@FreeBSD.org
Cc: daichi@FreeBSD.org, Pavel Polyakov
Subject: Re: lock violation in unionfs (9.0-STABLE r230270)
Message-ID: <508EDB2F.3010608@omnilan.de>

Attilio Rao wrote on 27.10.2012 23:07 (localtime):
> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao wrote:
>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao wrote:
>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer
>>> wrote:
>>>> Attilio Rao wrote on 09.08.2012 20:26 (localtime):
>>>>> On 8/8/12, Harald Schmalzbauer wrote:
>>>>>> Pavel Polyakov wrote on 06.03.2012 11:20 (localtime):
>>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>>
>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>>> exclusive locked but should be
>>>>>>>>> KDB: enter: lock violation
>>>>>>>> Pavel,
>>>>>>>> can you give this patch a spin?
>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>>
>>>>>>>> I think that the unlocking is due at that point, as the vnode lock can
>>>>>>>> be switched later on.
>>>>>>>>
>>>>>>>> Let me know what you think about it and what the test does.
>>>>>>> Thanks!
>>>>>>> This patch fixes the problem with the lock violation. Sorry I tested it
>>>>>>> so late.
>>>>>> Hello,
>>>>>>
>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>>> for the issue, or was it just never submitted as a PR and thus forgotten?
>>>>> Can you and Pavel try the attached patch? Unfortunately I had no time
>>>>> to test it; I just made it in 5 free minutes on a non-FreeBSD workstation,
>>>> Sorry, I couldn't test earlier, but now I did:
>>>> With this patch applied, the machine hangs with a non-debug kernel, and
>>>> the debug kernel gives the following panic:
>>>> System call nmount returning with the following locks held:
>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @
>>>> src/sys/fs/unionfs/union_vnops.c:1938
>>>> panic: witness_warn
>>>> cpuid = 0
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at
>>>> db_trace_self_wrapper+0x26
>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>> syscall(d1de3d08) at syscall+0x415
>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f, esp =
>>>> 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>>> KDB: enter: panic
>>>> [ thread pid 86 tid 100054 ]
>>>> Stopped at kdb_enter+0x3a: movl $0,kdb_why
>>>> db> bt
>>>> Tracing pid 86 tid 100054 td 0xc541b000
>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panic+0x190
>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>> syscall(d1de3d08) at syscall+0x415
>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>
>>>> Hmm, I guess I forgot to install the kernel debug symbols...
>>>> I'll come back when I have more.
>>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>>> It basically expects the vnode to return locked in the same way
>>> requested by the preceding namei() (when that happens), but when you do
>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>> Hello,
>> the following patch should work out the issues around unionfs_nodeget() a bit:
>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>>
>> Unfortunately the unionfs code is rather messy in the lookup path regarding
>> locking requirements, so following what needs to be done there is a bit
>> difficult.
>> I have no way to test this patch, so it is only test-compiled at the
>> moment, but I would need you to also test the lookup path (directory
>> "ls", find(1) on the whole unionfs volume, etc.) to validate it
>> somehow.
> On second thought, I think that locking in lookup (and also in other
> operations) is so fragile and difficult to follow that it makes all
> vnops real locking landmines.
> I think that the following patch fixes the insmntque insertion and
> follows the old approach well enough to be committed separately:
> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch
>
Unfortunately I have no idea about all those locking strategies and
implementations.

Applying unionfs_nodeget3.patch results in:

sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
sys/fs/unionfs/union_subr.c:332: error: expected statement before ')' token
*** [union_subr.o] Error code 1

I guess there is a typo in this chunk:

@@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
                vref(vp);
        } else
                *vpp = vp;
-
-unionfs_nodeget_out:
-       if (lkflags & LK_TYPE_MASK)
-               vn_lock(vp, lkflags | LK_RETRY);
-
+       if (lkflags & LK_TYPE_MASK) {
+               if (lkflags == LK_SHARED))
---------------------------------------- ^
+                       vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
+       } else
+               VOP_UNLOCK(vp, LK_RELEASE);
        return (0);
 }

After removing the second closing parenthesis, the kernel compiles, but it
still crashes:

panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
cpuid = 1
KDB: stack backtrace:
...

If the backtrace info is of use to you, I'll transcribe it; no serial
console is available :-(

Am I right that I should apply only _one_ unionfs-patchX.patch
(unionfs_nodeget3.patch in this case)?

Thanks,
-Harry