From owner-freebsd-bugs@FreeBSD.ORG Wed Aug 18 00:32:50 2004 Return-Path: Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1F01216A4CE; Wed, 18 Aug 2004 00:32:50 +0000 (GMT) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6BD8B43D41; Wed, 18 Aug 2004 00:32:49 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87])i7I0Wlje015342; Wed, 18 Aug 2004 10:32:47 +1000 Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) i7I0WhBX017417; Wed, 18 Aug 2004 10:32:44 +1000 Date: Wed, 18 Aug 2004 10:32:42 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Amit Khivesara In-Reply-To: <200408062032.i76KWelQ051149@www.freebsd.org> Message-ID: <20040818081847.J51701@delplex.bde.org> References: <200408062032.i76KWelQ051149@www.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-bugs@FreeBSD.org cc: freebsd-gnats-submit@FreeBSD.org Subject: Re: misc/70096: Full msdos file system causes corruption X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Aug 2004 00:32:50 -0000 On Fri, 6 Aug 2004, Amit Khivesara wrote: > >Description: > When the file system gets full, then it stores a incorrect > value of nxtfree into the disk fsinfo. > > This will cause the assert in : > /* > * Check and validate (or perhaps invalidate?) the fsinfo structure? > */ > if (pmp->pm_fsinfo && pmp->pm_nxtfree > pmp->pm_maxcluster) { > printf( > "Next free cluster in FSInfo (%lu) exceeds maxcluster (%lu)\n", > pmp->pm_nxtfree, pmp->pm_maxcluster); > error = EINVAL; > goto error_exit; > } > > > to be fired. I think failure of this check shouldn't cause a mount failure, since pmp->pm_nxtfree is only advisory and quite often doesn't give a cluster that is actually free. IIRC, Windows (95,XP) doesn't care what it is and doesn't fix it in scandisk or chkdsk after FreeBSD has messed it up. The check seems to have been fixed incorrectly in revs.1.92 and 1.117 of msdosfs_vfsops.c. Apart from the bug reported in this PR, there only seems to be a problem with the magic value of 0xffffffff which is sometimes stored in the fsinfo corresponding to pm_nxtfree. Old versions didn't do anything here. NextBSD (at least in rev.1.14 of msdosfs_vfsops.c) just silently converts the (wrong on 64-bit machines?) value of (u_long)-1 to the slightly wrong value of 0 (cluster can never be free). Rev.1.92 in the FreeBSD version added the above error handling. It catches the magic value as a special case of a too-large value. That was wrong, and the problem was worked around in rev.1.117 by converting 0xffffffff to the slightly less wrong value of CLUST_FIRST (CLUST_FIRST can be free but rarely is). > >Fix: > --- msdosfs_fat.c.orig Wed Jun 16 11:16:37 2004 > +++ msdosfs_fat.c Wed Jun 16 11:15:41 2004 > @@ -434,8 +434,9 @@ > for (cn = 0; cn < pmp->pm_maxcluster; cn += N_INUSEBITS) ^^^^^^^^^^^^^^^^^^^^^^^ > if (pmp->pm_inusemap[cn / N_INUSEBITS] != (u_int)-1) > break; > - pmp->pm_nxtfree = cn > - + ffs(pmp->pm_inusemap[cn / N_INUSEBITS]^(u_int)-1) - 1; > + pmp->pm_nxtfree = (cn < pmp->pm_maxcluster)? ^^^^^^^^^^^^^^^^^^^^^^^ Er, this condition is always satisfied, so the patch has no effect. > + (cn + ffs(pmp->pm_inusemap[cn / N_INUSEBITS]^(u_int)-1) - 1) > + :0; > } > if (bread(pmp->pm_devvp, pmp->pm_fsinfo, fsi_size(pmp), > NOCRED, &bpn) != 0) { > cn+0 and even cn+1 are within bounds, but cn+ffs(...) can apparently be larger. There should be no problem, since pm_inusemap[] is supposed to have 1 bits corresponding to clusters after the last one (if the last one is before the end of the bitmap). I can't see any bug here. I can see a related one that should be harmless due to other bugs: RELENG_4 (and thus FreeBSD-4.9) has a serious off-by-1 error in the allocation of the bitmap. From msdosfsmountfs() in RELENG_4: % pmp->pm_inusemap = malloc(((pmp->pm_maxcluster + N_INUSEBITS - 1) % / N_INUSEBITS) % * sizeof(*pmp->pm_inusemap), % M_MSDOSFSFAT, M_WAITOK); pm_maxcluster actually is the maximum cluster number, so 1 must be added to it to give the size (in bits) of a minimal bitmap. Not doing this gives a too-small allocation (off by 1 32-bit word) iff pm->pm_maxcluster is a multiple of 32. Allocation sizes are rounded up to an allocation or page boundary, so the bug is only serious of pm_maxcluster is a multiple of PAGE_SIZE * NBBY or something like that. It rarely happens. When it happens, the word after the end of the malloced region is clobbered, starting with initialization of the bitmap in fillinusemap(): % % /* % * Mark all clusters in use, we mark the free ones in the fat scan % * loop further down. % */ % for (cn = 0; cn < (pmp->pm_maxcluster + N_INUSEBITS) / N_INUSEBITS; cn++) % pmp->pm_inusemap[cn] = (u_int)-1; This is missing the off-by-1 error, so it initializes the last word in the bitmap correctly, but this word may be beyond the end of the allocated region so writes to it may clobber unrelated variables or be undone by writes to unrelated variables. The allocation bug is fixed in rev.1.118 of msdsofs_vfsops.c in -current, but I forgot to merge the fix. Nearby (old) bugs: % /* % * If we have an FSInfo block, update it. % */ % if (pmp->pm_fsinfo) { % u_long cn = pmp->pm_nxtfree; % % if (pmp->pm_freeclustercount ^^^^^^^^^^^^^^^^^^^^^^^^ (2) % && (pmp->pm_inusemap[cn / N_INUSEBITS] % & (1 << (cn % N_INUSEBITS)))) { % /* % * The cluster indicated in FSInfo isn't free % * any longer. Got get a new free one. % */ % for (cn = 0; cn < pmp->pm_maxcluster; cn += N_INUSEBITS) ^^^^^^^^^^^^^^^^^^^^^^^ (1) % if (pmp->pm_inusemap[cn / N_INUSEBITS] != (u_int)-1) % break; % pmp->pm_nxtfree = cn % + ffs(pmp->pm_inusemap[cn / N_INUSEBITS] % ^ (u_int)-1) - 1; % } (1) Off-by-1 error. pm_maxcluster actually is the maximum cluster number, so the loop terminates 1 too early and if pm_maxcluster is a multiple of N_INUSEBITS. This is actually a feature -- it compensates for the off-by-1 error for allocation, so garbage in the unallocated last word cannot be the direct cause of the probem in the PR. (2) Bogus checking of pmp->pm_freeclustercount. That variable being zero doesn't mean that pmp->pm_nxtfree is free; it just means that the loop is not worth doing because it would not find a free cluster. If we actually cared about the exported version of pmp->pm_nxtfree being free, then we should set pmp->pm_nxtfree to the magic value of 0xffffffff or the nominal value of CLUST_FIRST if pmp->pm_freeclustercount is 0. (3) General bogusness. The exported version of pmp->pm_nxtfree is supposed to give a hint about a good place to start searching for a free cluster. The loop always starts searching at the bad place 0. This is inefficent (FATs and their bitmaps can be very large, and by starting at cluster 0 we increase the chance of having to search through a large number of allocated clusters). If we want inefficiency (but not so much), then we can just export the magic or nominal value. That way, the the FAT only needs to be searched starting from near cluster 0 once when a new file system instance starts up (or performs an allocation) instead of on potentially many FAT updates. We actually implement much larger inefficiencies by using random allocation for the first cluster of each file. This mainly ensures that average seeks are large, but as a side effect it makes the hint provided by pmp->pm_nxtfree useless. We just ignore the hint except for updating it when we update the FAT, but this results in pmp->pm_nxtfree being out of date when the FAT is updated, so it is very likely to be unfree when it should be very likely to be free, so the the slow search is very likely to be needed. Summary: there is a lot of brokenness here; a good quick fix for RELENG_4 would be to remove all traces of pm_nxtfree and except to write a harmless constant value (0xffffffff?) when exporting it. Bruce