Date:      Tue, 2 Mar 1999 13:14:51 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Matthew Jacob <mjacob@feral.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Panic in FFS/4.0 as of yesterday - update
Message-ID:  <199903022114.NAA57990@apollo.backplane.com>
References:  <Pine.LNX.4.04.9902260950020.985-100000@feral-gw>    

    Ok, we're making progress.  I found a major bug (Julian is committing
    the fix now).  Kirk has already committed fixes to ffs_fsync() for
    softupdates/NFS combinations and has some alternative code in softupdates
    for a BMSAFEMAP-related issue.

    The bugfix is also in the queue to be committed into -3.x and hopefully
    also 2.2.x after we resolve a minor issue that John has brought up.  

    It's a very serious bug, though fortunately it does not happen very often.
    Basically, the getblk() code in kern/vfs_bio.c has this section:

                /*
                 * This code is used to make sure that a buffer is not
                 * created while the getnewbuf routine is blocked.
                 * Normally the vnode is locked so this isn't a problem.
                 * VBLK type I/O requests, however, don't lock the vnode.
                 */
                if (VOP_ISLOCKED(vp) != LK_EXCLUSIVE && gbincore(vp, blkno)) {
                        bp->b_flags |= B_INVAL;
                        brelse(bp);
                        goto loop;
                }

    Which really should be:

                if (gbincore(vp, blkno)) {
                        bp->b_flags |= B_INVAL;
                        brelse(bp);
                        goto loop;
                }

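    The original comment says that getblk() may be called without the vnode
    locked, and that does, in fact, happen.  Ok... but that doesn't mean you
    can skip the gbincore() check just because you happen to find the vnode
    locked!  getnewbuf() can block, and while it is blocked another request
    that does not lock the vnode (such as VBLK I/O) can create a buffer for
    the same block.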

    Thus, the bogus VOP_ISLOCKED check can result in the system creating 
    duplicate buffers for the same block.

    Needless to say, this can result in the complete destruction of 
    directories, bitmaps, and file data, as well as duplicate allocation
    of blocks and other bad things.  I believe this bug to be responsible
    for the 5 or 6 times (over 4.5 years and 40+ machines) that BEST has
    experienced severe filesystem corruption after a crash.

    --

    The new VFS BIO and NFS fixes are still in the commit queue and under
    review, but I expect to get them committed into -4.x soon.  Specifically,
    the new getnewbuf() code solves a low-memory lockup problem and a more
    serious supervisor stack overflow problem (on machines with deep VFS
    call stacks, such as when you use the VN device).  Fixes to NFS's
    B_DONE handling are part of this mess too.

						-Matt


