Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 12 Feb 2002 12:09:35 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Matteo <drummino@yahoo.com>, <freebsd-stable@FreeBSD.ORG>, <freebsd-bugs@FreeBSD.ORG>, <aiuto@gufi.org>
Subject:   Re: Crash System with MSDOS file-system driver!
Message-ID:  <200202122009.g1CK9ZW64977@apollo.backplane.com>
References:   <20020212225510.F4096-100000@gamplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

:
:On Mon, 11 Feb 2002, Matthew Dillon wrote:
:
:>     Well, I think you did find a bug in VFS/BIO.  Even UFS will unnecessarily
:>     break if a fully valid buffer is written and the write fails, due to
:>     b_resid being left non-zero (B_ERROR is cleared in that case and the
:>     write is re-queued).
:
:I think that failure is fairly safe.  The read will just terminate early
:for the bad block.

    Oh no, definitely not safe.  It's similar to the problem we had when
    write-failures caused the dirty block to be removed from the buffer
    cache.  You wind up with an inconsistent filesystem - for example, with
    inconsistent meta-data, as if the system just decided to randomly drop
    a dirty buffer.  This can result in unrecoverable filesystem corruption.
    So instead of having a simple write error and a failure on that directory
    or file, one winds up with a write error plus additional unrelated
    filesystem corruption.  Bitmap and inode blocks are especially
    vulnerable.

:>     I also don't think it's a good idea to just munge breadn() and leave
:>     bread() alone.
:
:bread() just calls breadn().

    As a general case, the situation should apply to anyone locking a buffer,
    thus the placement in getblk().  If it weren't for NFS we could do it
    across the board.

    In anycase, once I get the rest of the MFCs done I'll do a quick workup
    and start testing a fix.

:It is just a sanity check.  Device drivers return a short read for EOF at
:least.  The EOF case shouldn't happen because filesystems are built of
:whole blocks.  Except I forgot the nfs case...  I knew that short reads
:were non-errors for block devices and thought that removing block devices
:made them errors in all cases.

    Remember, the buffer cache is a logical block cache, not a physical block
    cache (though it does that too).  Blocks are primarily relative to their
    logical files.  NFS has no concept of physical backing store, and even
    UFS differentiates between a fragment-backed block and full-block-backed
    block.  The UFS strategy code actually figures out the correct block
    size and so does not depend on b_resid to detect a short read.  NFS does
    not have this luxury in the case of a directory or softlink but hacks
    around the directory problem by keeping track of the directory's ending
    offset.  Physio depends on b_resid to detect an EOF condition.

:>     The API, in general, does not explicitly
:>     disallow EOF occuring in the first buffer so unless you believe an API
:>     change is in order and clearly document it in breadn()'s comments, you
:>     should not commit this patch.  (For example, NFS doesn't use breadn()
:>     but if it did it would likely assume that EOF could occur anywhere, even
:>     in the first buffer).
:
:It's not a good API, so some changes to it wouldn't hurt.  Currently,
:callers of bread() are apparently required to detect short reads by
:comparing bp->b_bcount - bp->b_resid with their requested size.  Most
:callers don't bother.  physio() is the main exception.
:
:Bruce

    Short reads can only be implemented by the underlying physio.  Filesystems
    such as UFS and MSDOSFS know how large the underlying physical device is
    (and for files know how large the underlying file is) and so simply
    ensure that they do not attempt to read past the underlying device's EOF
    or past the file EOF.  This is why they can make the assumption that a
    short read will not occur.  A read error will return a real B_ERROR.

    NFS depends on short reads for directory and symlink operations and also
    depends on short reads for regular file reads if the server chooses to
    optimize a block of zeros.  Fortunately NFS only assumes persistent
    state for (non-VMIO) symlink reads.  Physio depends on short reads for EOF 
    detection but since buffers are not cached it is a transitory indicator
    (and, in anycase, also non-VMIO).

    The buffer cache API supports short reads.  It is not a bug or a mistake.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200202122009.g1CK9ZW64977>