Date:      Sun, 13 May 2012 00:49:38 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Peter Holm <peter@holm.cc>
Cc:        Doug Barton <dougb@freebsd.org>, Sergey Kandaurov <pluknet@gmail.com>, freebsd-current <freebsd-current@freebsd.org>, mckusick@freebsd.org
Subject:   Re: panic, seems related to r234386
Message-ID:  <20120512224938.GA1322@dft-labs.eu>
In-Reply-To: <20120510103900.GA77554@x2.osted.lan>
References:  <4FA6F324.4080107@FreeBSD.org> <CAE-mSOJBHPP4E_2Hme5nwf0fGfckyRBWeAe9=kodHMmS6eQy%2Bg@mail.gmail.com> <4FA82269.6080406@FreeBSD.org> <20120507201153.GA19942@dft-labs.eu> <20120508194514.GA10688@x2.osted.lan> <20120510102118.GA26472@dft-labs.eu> <20120510103900.GA77554@x2.osted.lan>

On Thu, May 10, 2012 at 12:39:00PM +0200, Peter Holm wrote:
> On Thu, May 10, 2012 at 12:21:18PM +0200, Mateusz Guzik wrote:
> > On Tue, May 08, 2012 at 09:45:14PM +0200, Peter Holm wrote:
> > > On Mon, May 07, 2012 at 10:11:53PM +0200, Mateusz Guzik wrote:
> > > > On Mon, May 07, 2012 at 12:28:41PM -0700, Doug Barton wrote:
> > > > > On 05/06/2012 15:19, Sergey Kandaurov wrote:
> > > > > > On 7 May 2012 01:54, Doug Barton <dougb@freebsd.org> wrote:
> > > > > >> I got this with today's current, previous (working) kernel is r232719.
> > > > > >>
> > > > > >> panic: _mtx_lock_sleep: recursed on non-recursive mutex struct mount mtx
> > > > > >> @ /frontier/svn/head/sys/kern/vfs_subr.c:4595
> > > > > 
> > > > > ...
> > > > > 
> > > > > > Please try this patch.
> > > > > > 
> > > > > > Index: fs/ext2fs/ext2_vfsops.c
> > > > > > ===================================================================
> > > > > > --- fs/ext2fs/ext2_vfsops.c     (revision 235108)
> > > > > > +++ fs/ext2fs/ext2_vfsops.c     (working copy)
> > > > > > @@ -830,7 +830,6 @@
> > > > > >         /*
> > > > > >          * Write back each (modified) inode.
> > > > > >          */
> > > > > > -       MNT_ILOCK(mp);
> > > > > >  loop:
> > > > > >         MNT_VNODE_FOREACH_ALL(vp, mp, mvp) {
> > > > > >                 if (vp->v_type == VNON) {
> > > > > > 
> > > > > 
> > > > > Didn't help, sorry. I put 234385 through some pretty heavy load
> > > > > yesterday, and everything was fine. As soon as I move up to 234386, the
> > > > > panic triggered again. So I cleaned everything up, applied your patch,
> > > > > built a kernel from scratch, and rebooted. It was Ok for a few seconds
> > > > > after boot, then panic'ed again, I think in a different place, but I'm
> > > > > not sure because subsequent attempts to fsck the file systems caused new
> > > > > panics which overwrote the old ones before they could be saved.
> > > > > 
> > > > 
> > > > Another MNT_ILOCK was hiding a few lines below, try this patch:
> > > > 
> > > > http://student.agh.edu.pl/~mjguzik/patches/ext2fs-ilock.patch
> > > > 
> > > > I've tested this a bit and I believe this fixes your problem.
> > > > 
> > > 
> > > Gave this a spin and found what looks like a deadlock:
> > > 
> > > http://people.freebsd.org/~pho/stress/log/ext2fs.txt
> > > 
> > > Not a new problem, it would seem. Same issue with 8.3-PRERELEASE r232656M.
> > > 
> > 
> > pid 2680 (fts) holds lock for vnode cb4be414 and tries to lock cc0ac15c
> > pid 2581 (openat) holds lock for vnode cc0ac15c and tries to lock cb4be414
> > 
> > openat calls rmdir foo/bar, and ext2_rmdir unlocks foo's vnode and then
> > tries to lock it again.
> > 
> > This is fairly easily reproducible by concurrently running the mkdir and
> > fts testcase programs that are provided by stress2.
> > 
> > I'll try to come up with a patch by the end of the week.
> > 
> 

An easier way to reproduce: run mkdir from stress2 and "while true; do find
/mnt > /dev/null; done" in another terminal.

Assuming a foo/bar directory tree, the deadlock happens when bar is being
removed while .. is simultaneously looked up in bar.

Proposed trivial patch:
http://student.agh.edu.pl/~mjguzik/patches/ext2fs_rmdir-deadlock.patch

If the lock cannot be acquired immediately, the patch unlocks the 'bar'
vnode and then locks both vnodes in order.
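
The back-off idea described above can be sketched in userland with POSIX
mutexes. This is only an illustration of the general technique, not the
content of the patch: the kernel uses vnode locks rather than pthread
mutexes, and struct node, lock_parent_of_locked_child() and the relocked
flag are hypothetical names made up for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a vnode; only the lock matters here. */
struct node {
	pthread_mutex_t lock;
};

/*
 * Called with child->lock held; returns with both locks held.
 *
 * Assumed lock order: parent before child.  Taking the parent while
 * holding the child violates that order, so only a non-blocking attempt
 * is made.  If it fails, the child is released and both locks are taken
 * in order.
 *
 * *relocked is set to 1 when the child lock was dropped on the way.  The
 * caller must then recheck whatever it learned about the child earlier,
 * because another thread may have changed it in the meantime.
 */
static void
lock_parent_of_locked_child(struct node *parent, struct node *child,
    int *relocked)
{
	int rc;

	*relocked = 0;
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return;			/* got it without sleeping */

	/* Back off: drop the child, then lock parent and child in order. */
	rc = pthread_mutex_unlock(&child->lock);
	assert(rc == 0);
	rc = pthread_mutex_lock(&parent->lock);
	assert(rc == 0);
	rc = pthread_mutex_lock(&child->lock);
	assert(rc == 0);
	(void)rc;
	*relocked = 1;
}
```

A caller would lock the child, call the helper, and then revalidate its
assumptions about the child whenever relocked comes back as 1.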

After patching this I ran into another issue: "wrong vnode type" panics
from cache_enter_time after calls from ext2_lookup. (This takes some time
to reproduce; the testcase is the same as before.)

It looks like ext2_lookup is actually an adapted version of ufs_lookup and
lacks some bugfixes present in the current ufs_lookup. I believe those
bugfixes address this bug.

Here is my attempt to fix the problem (based on ufs_lookup changes):
http://student.agh.edu.pl/~mjguzik/patches/ext2fs_lookup-relookup.patch

-- 
Mateusz Guzik <mjguzik gmail.com>


