Date:      Mon, 17 Feb 1997 19:44:08 -0500 (EST)
From:      Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To:        ponds!freefall.cdrom.com!freebsd-hackers, ponds!uriah.heep.sax.de!joerg_wunsch, ponds!McKusick.COM!mckusick, ponds!lakes.water.net!rivers
Cc:        ponds!root.com!dg
Subject:   Re: dup alloc panic
Message-ID:  <199702180044.TAA24862@lakes.water.net>


Well -

 Here's the latest info on what I've discovered.  It's very strange,
and quite suspect.  I'm sending this to solicit ideas...

 First, I've discovered that if I zero out the data on the disk
where this troublesome inode occurs - everything works just fine.
I did this by newfs'ing the file system; then using clri to zero
out the inode in question.  

 Also, once the inode is zero'd out, you have to corrupt it again to
reproduce the problem.  That is, it doesn't appear that newfs is the
culprit in getting the bad data there to begin with... it could just
be "stuff" left on your disk.

 So, I've written a trashing version of clri that fills the inode 
with 0xff instead of 0x00.  Also, I've had it print out block offsets 
so I can add printf's to newfs and ensure it is writing zero to that block.
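
 As a rough illustration of the kind of change described above (not
the actual modified clri), the sketch below fills one inode-sized
region with a chosen byte and prints where it wrote.  The name
fill_inode, the 128-byte INODE_SIZE, and the caller-supplied byte
offset are all assumptions made for the example; a real tool would
derive the offset from the superblock.

```c
#define _POSIX_C_SOURCE 200809L	/* for pwrite() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define INODE_SIZE 128	/* assumed on-disk inode size; not stated in the message */

/*
 * Hypothetical helper: overwrite one inode-sized region at byte offset
 * `off` with `pattern` (0xff to trash, 0x00 to clear) and report where
 * the write landed.  Returns 0 on success, -1 on failure.
 */
int
fill_inode(int fd, off_t off, unsigned char pattern)
{
	unsigned char buf[INODE_SIZE];

	assert(off >= 0);	/* a negative offset is a caller bug */
	memset(buf, pattern, sizeof(buf));
	printf("filling %d bytes at byte offset %lld with 0x%02x\n",
	    INODE_SIZE, (long long)off, pattern);
	if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
		return (-1);
	return (0);
}
```

 The printed offset is what lets the same location be checked later
from inside another program.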



 Here comes the weird part.



 If I trash the inode; and run newfs - I can check again (either
using my version of clri that prints values, or just using fsck to
check the new file system) and the file system is still hosed.
Newfs believes it wrote 0x0's at that block (I've verified that it
lseek'd to that location and wrote 8192 zero bytes there, and that
the return value of the lseek() and write() calls indicated success.)
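
 The check described above (seek, write 8192 zero bytes, confirm the
return values) can be sketched as follows, with a read-back added.
The function name zero_and_verify and the read-back step are
assumptions for illustration, not what newfs does.  Note that a
read-back may be satisfied from a cache rather than from the disk, so
a clean result here would not by itself prove the data reached the
platter.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSIZE 8192	/* the write size reported in the message */

/*
 * Hypothetical check: write BLKSIZE zero bytes at `off`, then read the
 * same range back.  Returns the number of non-zero bytes read back
 * (0 means the block reads as clean), or -1 if any call fails.
 */
long
zero_and_verify(int fd, off_t off)
{
	static unsigned char buf[BLKSIZE];
	long bad = 0;
	size_t i;

	assert(off >= 0);	/* a negative offset is a caller bug */
	memset(buf, 0, sizeof(buf));
	if (lseek(fd, off, SEEK_SET) != off)
		return (-1);
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		return (-1);

	/* Poison the buffer so a read that returns nothing is visible. */
	memset(buf, 0xaa, sizeof(buf));
	if (pread(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
		return (-1);
	for (i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0)
			bad++;
	fprintf(stderr, "offset %lld: %ld non-zero bytes after zeroing\n",
	    (long long)off, bad);
	return (bad);
}
```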

 *But* if I trash the inode and run newfs two times in a row;
the inode is correctly zero'd out.  I was stunned by this discovery,
and have verified it 3 times; just to ensure I wasn't mistaken.
It doesn't occur this way _every_ time; just sometimes.  This is
by no means consistent...

 Also, very often when I run my newfs (built optimized with some
printf's in it) the inode will be properly cleared; whereas the
one on the boot and fixit floppies will not clear it.  Then, I'll
perform the test again (with no reboot, nothing, just running
the commands) and my newfs won't clear the inode... although it
indicates it has.  It is anything but regular (and most frustrating.)

 This implies that something more low-level (maybe VM,
maybe device driver, except I've seen it on IDE and SCSI hardware)
is going on here; and that the blocks simply aren't, sometimes, making it 
to the disk.  This is a *highly suspect* hypothesis - especially since 
every other block in-the-world seems to make it on the disk... (see, I
told you it was weird.)  I've been playing with it for two days
now, and can't figure it out.  Some steps I've taken include adding
fsync() calls to wtfs() in mkfs.c, running sync, etc...
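
 For reference, the shape of the fsync() experiment is roughly the
following.  This is a stand-in under assumed names (write_block_synced
and its parameters are hypothetical), not the real wtfs() from mkfs.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a block-write routine: seek, write, then
 * fsync() so the caller learns about a failed flush immediately.
 * Returns 0 on success, -1 on failure.
 */
int
write_block_synced(int fd, off_t off, const void *buf, size_t size)
{
	assert(off >= 0 && buf != NULL);	/* caller bugs, not I/O errors */
	if (lseek(fd, off, SEEK_SET) != off) {
		perror("lseek");
		return (-1);
	}
	if (write(fd, buf, size) != (ssize_t)size) {
		perror("write");
		return (-1);
	}
	if (fsync(fd) == -1) {
		perror("fsync");
		return (-1);
	}
	return (0);
}
```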

 Here are the steps I go through:

	1) Trash inode # 7680.
	2) newfs the file system
	3) fsck the new file system; fsck reports the bad inode and
	    clears it out
	4) Trash inode # 7680
	5) newfs the file system
	6) newfs the file system again
	7) fsck the new file system - no problems reported.

[Or, replace steps 5/6 with "run the boot/fixit floppy newfs" ]

 With that in mind, I can say that the only machines that
have experienced this problem are 386s.  I don't have the wherewithal
to put together a 486/586 that I can trash in this manner.

 Also, this doesn't seem to jibe with Joerg's similar problem when
newfs'ing MFS file systems. [Joerg - was that even a 386 machine?]

 At this point; I'm up for suggestions...  

	- Dave Rivers -


