Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 23 Aug 2001 06:36:18 -0700
From:      Kirk McKusick <mckusick@mckusick.com>
To:        Ian Dowse <iedowse@maths.tcd.ie>
Cc:        freebsd-current@freebsd.org, dwmalone@maths.tcd.ie, Ollivier Robert <roberto@keltia.freenix.fr>, Mikhail Teterin <mi@aldan.algebra.com>
Subject:   Re: fsck setting d_ino == 0 (was Re: filesystem errors) 
Message-ID:  <200108231336.f7NDaJR05765@beastie.mckusick.com>
In-Reply-To: Your message of "Wed, 22 Aug 2001 01:21:03 BST." <200108220121.aa40822@salmon.maths.tcd.ie> 

next in thread | previous in thread | raw e-mail | index | archive | help
	To: Kirk McKusick <mckusick@mckusick.com>
	cc: freebsd-current@freebsd.org, dwmalone@maths.tcd.ie,
	   Ollivier Robert <roberto@keltia.freenix.fr>,
	   Mikhail Teterin <mi@aldan.algebra.com>
	Subject: fsck setting d_ino == 0 (was Re: filesystem errors) 
	In-Reply-To: Your message of "Sat, 28 Jul 2001 12:48:54 PDT."
		     <200107281948.f6SJmso06543@beastie.mckusick.com> 
	Date: Wed, 22 Aug 2001 01:21:03 +0100
	From: Ian Dowse <iedowse@maths.tcd.ie>

	In message <200107281948.f6SJmso06543@beastie.mckusick.com>,
	    Kirk McKusick writes:
	>FFS will never set a directory ino == 0 at a location other
	>than the first entry in a directory, but fsck will do so to
	>get rid of an unwanted entry. The readdir routines know to
	>skip over an ino == 0 entry no matter where in the directory
	>it is found, so applications will never see such entries.
	>It would be a fair amount of work to change fsck to `do the
	>right thing', as the checking code is given only the current
	>entry with which to work. I am of the opinion that you
	>should simply accept that mid-directory block ino == 0 is
	>acceptable rather than trying to `fix' the problem.

	Bleh, well I guess not too surprisingly, there is a case in
	ufs_direnter() (ufs_lookup.c) where the kernel does the wrong thing
	when a mid-block entry has d_ino == 0. The result can be serious
	directory corruption, and the bug has been there since the Lite/2
	merge:

		# fetch http://www.maths.tcd.ie/~iedowse/FreeBSD/dirbug_img.gz
		Receiving dirbug_img.gz (6745 bytes): 100%
		6745 bytes transferred in 0.0 seconds (4.69 MBps)
		# gunzip dirbug_img.gz
		# mdconfig -a -t vnode -f dirbug_img
		md0
		# fsck_ffs /dev/md0
		** /dev/md0
		** Last Mounted on /mnt
		** Phase 1 - Check Blocks and Sizes
		** Phase 2 - Check Pathnames
		** Phase 3 - Check Connectivity
		** Phase 4 - Check Reference Counts
		** Phase 5 - Check Cyl groups
		20 files, 1 used, 2638 free (14 frags, 328 blocks, 0.5% fragmentation)
		# mount /dev/md0 /mnt
		# touch /mnt/ffffffffff12
		# umount /mnt
		# fsck_ffs /dev/md0
		** /dev/md0
		** Last Mounted on /mnt
		** Phase 1 - Check Blocks and Sizes
		** Phase 2 - Check Pathnames
		DIRECTORY CORRUPTED  I=2  OWNER=root MODE=40755
		SIZE=512 MTIME=Aug 21 22:28 2001 
		DIR=/

		SALVAGE? [yn]

	The bug is that when compressing directory blocks, the code trusts
	the DIRSIZ() macro to calculate the amount of data to be bcopy'd
	when moving a directory entry. If d_ino is zero, DIRSIZ() cannot
	be trusted, so random bytes in unused portions of the directory
	determine how much gets copied. I think it is very unlikely in
	practice for the value returned by DIRSIZ() to be harmful, but fsck
	certainly doesn't check it so this bug can be triggered after other
	types of corruption have been repaired by fsck.

	I just found this while looking for a dirhash bug - the dirhash
	code didn't check for d_ino == 0 when compressing directories,
	so it would freak when it couldn't find the entry to move. The
	patch below should fix both these issues, and it makes it clearer
	that DIRSIZ() is not used when d_ino == 0.

	Any comments welcome. The patch is a bit larger than it needs to
	be, but that directory compression code is so hard to understand
	that I think it is worth clarifying it slightly :-)

	Ian

The compaction code started out deeply nested and highly
conditional. I was very happy to get it down to one for loop
with single nested conditionals. That being said, it is still
pretty hard to follow. Anyway, I agree with your change. It is
amazing to me that that bug has been present since the day the
code was written (1983) and has not been noticed until now.

	Kirk


Index: ufs_lookup.c
===================================================================
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/ufs/ufs/ufs_lookup.c,v
retrieving revision 1.52
diff -u -r1.52 ufs_lookup.c
--- ufs_lookup.c	2001/08/18 03:08:48	1.52
+++ ufs_lookup.c	2001/08/21 23:59:09
@@ -869,26 +869,38 @@
	 * dp->i_offset + dp->i_count would yield the space.
	 */
	ep = (struct direct *)dirbuf;
-	dsize = DIRSIZ(OFSFMT(dvp), ep);
+	dsize = ep->d_ino ? DIRSIZ(OFSFMT(dvp), ep) : 0;
	spacefree = ep->d_reclen - dsize;
	for (loc = ep->d_reclen; loc < dp->i_count; ) {
		nep = (struct direct *)(dirbuf + loc);
-		if (ep->d_ino) {
-			/* trim the existing slot */
-			ep->d_reclen = dsize;
-			ep = (struct direct *)((char *)ep + dsize);
-		} else {
-			/* overwrite; nothing there; header is ours */
-			spacefree += dsize;
+
+		/* Trim the existing slot (NB: dsize may be zero). */
+		ep->d_reclen = dsize;
+		ep = (struct direct *)((char *)ep + dsize);
+
+		loc += nep->d_reclen;
+		if (nep->d_ino == 0) {
+			/*
+			 * A mid-block unused entry. Such entries are
+			 * never created by the kernel, but fsck_ffs
+			 * can create them (and it doesn't fix them).
+			 *
+			 * Add up the free space, and initialise the
+			 * relocated entry since we don't bcopy it.
+			 */
+			spacefree += nep->d_reclen;
+			ep->d_ino = 0;
+			dsize = 0;
+			continue;
		}
		dsize = DIRSIZ(OFSFMT(dvp), nep);
		spacefree += nep->d_reclen - dsize;
 #ifdef UFS_DIRHASH
		if (dp->i_dirhash != NULL)
-			ufsdirhash_move(dp, nep, dp->i_offset + loc,
+			ufsdirhash_move(dp, nep,
+			    dp->i_offset + ((char *)nep - dirbuf),
			    dp->i_offset + ((char *)ep - dirbuf));
 #endif
-		loc += nep->d_reclen;
		if (DOINGSOFTDEP(dvp))
			softdep_change_directoryentry_offset(dp, dirbuf,
			    (caddr_t)nep, (caddr_t)ep, dsize); 
@@ -896,6 +908,11 @@
			bcopy((caddr_t)nep, (caddr_t)ep, dsize);
	}
	/*
+	 * Here, `ep' points to a directory entry containing `dsize' in-use
+	 * bytes followed by `spacefree' unused bytes. If ep->d_ino == 0,
+	 * then the entry is completely unused (dsize == 0). The value
+	 * of ep->d_reclen is indeterminate.
+	 *
	 * Update the pointer fields in the previous entry (if any),
	 * copy in the new entry, and write out the block.
	 */

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200108231336.f7NDaJR05765>