Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Feb 2002 18:33:57 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Matteo <drummino@yahoo.com>, <freebsd-stable@FreeBSD.ORG>, <freebsd-bugs@FreeBSD.ORG>, <aiuto@gufi.org>
Subject:   Re: Crash System with MSDOS file-system driver!
Message-ID:  <20020213180814.R6948-100000@gamplex.bde.org>
In-Reply-To: <200202122009.g1CK9ZW64977@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 12 Feb 2002, Matthew Dillon wrote:

> :On Mon, 11 Feb 2002, Matthew Dillon wrote:
> :
> :>     Well, I think you did find a bug in VFS/BIO.  Even UFS will unnecessarily
> :>     break if a fully valid buffer is written and the write fails, due to
> :>     b_resid being left non-zero (B_ERROR is cleared in that case and the
> :>     write is re-queued).
> :
> :I think that failure is fairly safe.  The read will just terminate early
> :for the bad block.
>
>     Oh no, definitely not safe.  It's similar to the problem we had when
>     write-failures caused the dirty block to be removed from the buffer
>     cache.  You wind up with an inconsistent filesystem - for example, with
>     inconsistent meta-data, as if the system just decided to randomly drop
>     a dirty buffer.  This can result in unrecoverable filesystem corruption.
>     So instead of having a simple write error and a failure on that directory
>     or file, one winds up with a write error plus additional unrelated
>     filesystem corruption.  Bitmap and inode blocks are especially
>     vulnerable.

Er, the current problem doesn't affect metadata, since b_resid is not
checked for metadata blocks.  Only ffs_read() in ffs, and msdosfs_read()
and msdosfs_readdir() in msdosfs check b_resid (msdosfs for some reason
doesn't use VOP_READ() for readdir like ffs).  This makes the failure
fairly safe: ffs_read() terminates prematurely, but msdosfs_readdir()
spins on at least reads of null blocks.  I haven't checked what
msdos_read() does.

> :>     The API, in general, does not explicitly
> :>     disallow EOF occuring in the first buffer so unless you believe an API
> :>     change is in order and clearly document it in breadn()'s comments, you
> :>     should not commit this patch.  (For example, NFS doesn't use breadn()
> :>     but if it did it would likely assume that EOF could occur anywhere, even
> :>     in the first buffer).
> :
> :It's not a good API, so some changes to it wouldn't hurt.  Currently,
> :callers of bread() are apparently required to detect short reads by
> :comparing bp->b_bcount - bp->b_resid with their requested size.  Most
> :callers don't bother.  physio() is the main exception.
>
>     Short reads can only be implemented by the underlying physio.  Filesystems
>     such as UFS and MSDOSFS know how large the underlying physical device is
>     (and for files know how large the underlying file is) and so simply
>     ensure that they do not attempt to read past the underlying device's EOF
>     or past the file EOF.  This is why they can make the assumption that a
>     short read will not occur.  A read error will return a real B_ERROR.

Perhaps the correct fix for them is to actually make that assumption
everywhere and no checking for short reads for read and readdir.
However, I'm unhappy with the assumption not being checked for anywhere.

>     NFS depends on short reads for directory and symlink operations and also
>     depends on short reads for regular file reads if the server chooses to
>     optimize a block of zeros.  Fortunately NFS only assumes persistent
>     state for (non-VMIO) symlink reads.  Physio depends on short reads for EOF
>     detection but since buffers are not cached it is a transitory indicator
>     (and, in anycase, also non-VMIO).
>
>     The buffer cache API supports short reads.  It is not a bug or a mistake.

I disagree.  Critical callers like ffs_balloc() never checked for short
reads, so it must be bread()'s responsibility not to return them.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020213180814.R6948-100000>