From owner-freebsd-bugs@FreeBSD.ORG  Wed Aug 18 00:32:50 2004
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1F01216A4CE; Wed, 18 Aug 2004 00:32:50 +0000 (GMT)
Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6BD8B43D41; Wed, 18 Aug 2004 00:32:49 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])i7I0Wlje015342;	Wed, 18 Aug 2004 10:32:47 +1000
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	i7I0WhBX017417;	Wed, 18 Aug 2004 10:32:44 +1000
Date: Wed, 18 Aug 2004 10:32:42 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: Amit Khivesara <amit.khivesara@utstar.com>
In-Reply-To: <200408062032.i76KWelQ051149@www.freebsd.org>
Message-ID: <20040818081847.J51701@delplex.bde.org>
References: <200408062032.i76KWelQ051149@www.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-bugs@FreeBSD.org
cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/70096: Full msdos file system causes corruption
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 18 Aug 2004 00:32:50 -0000

On Fri, 6 Aug 2004, Amit Khivesara wrote:

> >Description:
>  When the file system gets full, then it stores a incorrect
>  value of nxtfree into the disk fsinfo.
>
> This will cause the assert in :
> /*
>      * Check and validate (or perhaps invalidate?) the fsinfo structure?
>      */
>     if (pmp->pm_fsinfo && pmp->pm_nxtfree > pmp->pm_maxcluster) {
>         printf(
>         "Next free cluster in FSInfo (%lu) exceeds maxcluster (%lu)\n",
>             pmp->pm_nxtfree, pmp->pm_maxcluster);
>         error = EINVAL;
>         goto error_exit;
>     }
>
>
> to be fired.

I think failure of this check shouldn't cause a mount failure, since
pmp->pm_nxtfree is only advisory and quite often doesn't give a cluster
that is actually free.  IIRC, Windows (95,XP) doesn't care what it is
and doesn't fix it in scandisk or chkdsk after FreeBSD has messed it up.

The check seems to have been fixed incorrectly in revs.1.92 and 1.117
of msdosfs_vfsops.c.  Apart from the bug reported in this PR, there
only seems to be a problem with the magic value of 0xffffffff which
is sometimes stored in the fsinfo corresponding to pm_nxtfree.  Old
versions didn't do anything here.  NextBSD (at least in rev.1.14 of
msdosfs_vfsops.c) just silently converts the (wrong on 64-bit machines?)
value of (u_long)-1 to the slightly wrong value of 0 (cluster can never
be free).  Rev.1.92 in the FreeBSD version added the above error
handling.  It catches the magic value as a special case of a too-large
value.  That was wrong, and the problem was worked around in rev.1.117
by converting 0xffffffff to the slightly less wrong value of CLUST_FIRST
(CLUST_FIRST can be free but rarely is).


> >Fix:
> --- msdosfs_fat.c.orig  Wed Jun 16 11:16:37 2004
> +++ msdosfs_fat.c       Wed Jun 16 11:15:41 2004
> @@ -434,8 +434,9 @@
>                         for (cn = 0; cn < pmp->pm_maxcluster; cn += N_INUSEBITS)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
>                                 if (pmp->pm_inusemap[cn / N_INUSEBITS] != (u_int)-1)
>                                         break;
> -                       pmp->pm_nxtfree = cn
> -                + ffs(pmp->pm_inusemap[cn / N_INUSEBITS]^(u_int)-1) - 1;
> +                       pmp->pm_nxtfree = (cn < pmp->pm_maxcluster)?
                                             ^^^^^^^^^^^^^^^^^^^^^^^

Er, this condition is always satisfied, so the patch has no effect.

> +                       (cn + ffs(pmp->pm_inusemap[cn / N_INUSEBITS]^(u_int)-1) - 1)
> +                       :0;
>                 }
>                 if (bread(pmp->pm_devvp, pmp->pm_fsinfo, fsi_size(pmp),
>                     NOCRED, &bpn) != 0) {
>

cn+0 and even cn+1 are within bounds, but cn+ffs(...) can apparently
be larger.  There should be no problem, since pm_inusemap[] is supposed
to have 1 bits corresponding to clusters after the last one (if the
last one is before the end of the bitmap).

I can't see any bug here.  I can see a related one that should be harmless
due to other bugs: RELENG_4 (and thus FreeBSD-4.9) has a serious off-by-1
error in the allocation of the bitmap.  From msdosfsmountfs() in RELENG_4:

% 	pmp->pm_inusemap = malloc(((pmp->pm_maxcluster + N_INUSEBITS - 1)
% 				   / N_INUSEBITS)
% 				  * sizeof(*pmp->pm_inusemap),
% 				  M_MSDOSFSFAT, M_WAITOK);

pm_maxcluster actually is the maximum cluster number, so 1 must be added
to it to give the size (in bits) of a minimal bitmap.  Not doing this
gives a too-small allocation (off by 1 32-bit word) iff pm->pm_maxcluster
is a multiple of 32.  Allocation sizes are rounded up to an allocation
or page boundary, so the bug is only serious of pm_maxcluster is a
multiple of PAGE_SIZE * NBBY or something like that.  It rarely happens.
When it happens, the word after the end of the malloced region is
clobbered, starting with initialization of the bitmap in fillinusemap():

% % 	/*
% 	 * Mark all clusters in use, we mark the free ones in the fat scan
% 	 * loop further down.
% 	 */
% 	for (cn = 0; cn < (pmp->pm_maxcluster + N_INUSEBITS) / N_INUSEBITS; cn++)
% 		pmp->pm_inusemap[cn] = (u_int)-1;

This is missing the off-by-1 error, so it initializes the last word in the
bitmap correctly, but this word may be beyond the end of the allocated
region so writes to it may clobber unrelated variables or be undone by
writes to unrelated variables.

The allocation bug is fixed in rev.1.118 of msdsofs_vfsops.c in -current,
but I forgot to merge the fix.

Nearby (old) bugs:

% 	/*
% 	 * If we have an FSInfo block, update it.
% 	 */
% 	if (pmp->pm_fsinfo) {
% 		u_long cn = pmp->pm_nxtfree;
%
% 		if (pmp->pm_freeclustercount
  		    ^^^^^^^^^^^^^^^^^^^^^^^^ (2)
% 		    && (pmp->pm_inusemap[cn / N_INUSEBITS]
% 			& (1 << (cn % N_INUSEBITS)))) {
% 			/*
% 			 * The cluster indicated in FSInfo isn't free
% 			 * any longer.  Got get a new free one.
% 			 */
% 			for (cn = 0; cn < pmp->pm_maxcluster; cn += N_INUSEBITS)
  			             ^^^^^^^^^^^^^^^^^^^^^^^ (1)
% 				if (pmp->pm_inusemap[cn / N_INUSEBITS] != (u_int)-1)
% 					break;
% 			pmp->pm_nxtfree = cn
% 				+ ffs(pmp->pm_inusemap[cn / N_INUSEBITS]
% 				      ^ (u_int)-1) - 1;
% 		}

(1) Off-by-1 error.  pm_maxcluster actually is the maximum cluster number,
    so the loop terminates 1 too early and if pm_maxcluster is a multiple
    of N_INUSEBITS.  This is actually a feature -- it compensates for
    the off-by-1 error for allocation, so garbage in the unallocated last
    word cannot be the direct cause of the probem in the PR.
(2) Bogus checking of pmp->pm_freeclustercount.  That variable being
    zero doesn't mean that pmp->pm_nxtfree is free; it just means that
    the loop is not worth doing because it would not find a free cluster.
    If we actually cared about the exported version of pmp->pm_nxtfree
    being free, then we should set pmp->pm_nxtfree to the magic value of
    0xffffffff or the nominal value of CLUST_FIRST if
    pmp->pm_freeclustercount is 0.
(3) General bogusness.  The exported version of pmp->pm_nxtfree is supposed
    to give a hint about a good place to start searching for a free
    cluster.  The loop always starts searching at the bad place 0.
    This is inefficent (FATs and their bitmaps can be very large, and
    by starting at cluster 0 we increase the chance of having to search
    through a large number of allocated clusters).  If we want
    inefficiency (but not so much), then we can just export the magic
    or nominal value.  That way, the the FAT only needs to be searched
    starting from near cluster 0 once when a new file system instance
    starts up (or performs an allocation) instead of on potentially many
    FAT updates.  We actually implement much larger inefficiencies by
    using random allocation for the first cluster of each file.  This
    mainly ensures that average seeks are large, but as a side effect
    it makes the hint provided by pmp->pm_nxtfree useless.  We just
    ignore the hint except for updating it when we update the FAT, but
    this results in pmp->pm_nxtfree being out of date when the FAT is
    updated, so it is very likely to be unfree when it should be
    very likely to be free, so the the slow search is very likely to
    be needed.

Summary: there is a lot of brokenness here; a good quick fix for RELENG_4
would be to remove all traces of pm_nxtfree and except to write a harmless
constant value (0xffffffff?) when exporting it.

Bruce