From owner-freebsd-hackers  Sun Mar  2 18:20:43 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id SAA21593
          for hackers-outgoing; Sun, 2 Mar 1997 18:20:43 -0800 (PST)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id SAA21585
          for <freebsd-hackers@freefall.FreeBSD.org>; Sun, 2 Mar 1997 18:20:38 -0800 (PST)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02)
	id AA14812; Sun, 2 Mar 1997 21:20:05 -0500
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Sun,  2 Mar 1997 21:20 EST
Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.8.3/8.7.3) with ESMTP id UAA21548; Sun, 2 Mar 1997 20:30:48 -0500 (EST)
Received: (from rivers@localhost) by lakes.water.net (8.8.3/8.6.9) id UAA18920; Sun, 2 Mar 1997 20:36:24 -0500 (EST)
Date: Sun, 2 Mar 1997 20:36:24 -0500 (EST)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
Message-Id: <199703030136.UAA18920@lakes.water.net>
To: ponds!root.com!dg, ponds!freefall.cdrom.com!freebsd-hackers
Subject: current status of "dup alloc" problem.
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Recall, this is a 2.1.6.1 machine that reliably demonstrates
the "dup alloc" problem (a 386dx-33 with 12 megs, i387, aha1542B).
However, I've determined the problem also exists with 2.2-GAMMA, and I
am also seeing it on an IDE machine (with 8 megs).  The problem has also
been reported on Pentiums and 486s, and there are indications that it
can happen when an MFS is used.

Here's a recap of everything I've examined thus far:

	1) Initially, I thought it was a problem in the ffs_valloc()
	   routines; something to do with the inode allocation
	   bit-map.  However, investigation here revealed that was
	   not the case.

	2) Kirk McKusick pointed the way to something "fishy" in newfs.
	   I examined newfs to ensure that empty inodes were, in fact,
	   being zeroed out.  newfs is, indeed, performing the
	   correct write() call to fill the particular inode in question
	   with 0's.

	3) At this point, I wrote a version of clri that doesn't clear
	   the inode, but instead prints the current values and fills
	   it with 0xff's.  This way, I can now "trash" the inode;
	   and run newfs to see if it gets properly filled in with 0s.

	4) Then, I began investigating why the zeros don't appear on
	   the disk, when the write() call in newfs indicates success.

	5) Initially, I thought perhaps something was funny in disksort(),
	   where it was "losing" a "struct buf *".  My investigations
	   indicate there's nothing wrong there (disksort() is so fundamental
	   and straightforward that I was reassured to be wrong.)

	6) However, I found that by adding printf()s to disksort(), I could
	   cause the problem to be masked.  This points the way to some
	   critical timing problem.

	7) I've been looking into "struct buf" management, particularly
	   when the buf is removed from the device's queue.  This all appears
	   to be in order.  I've now got a kernel with a printf() in
	   sdstart() to indicate when the buf in question (the one whose
	   physical block number, b->b_pblkno, matches the problematic
	   blocks) is removed from the device queue, indicating
	   that the SCSI write has started; and another in scsi_done()
	   just before the biodone() call (again, testing b_pblkno.)

	   This would indicate that the buf is properly removed from the
	   device queue; the I/O is started; and by the order of my
	   printf()s, the I/O completes and biodone() is called.  All 
	   of that is exactly the proper sequence of events...
	     *however*
	   my disk block still contains 0xff's and not 0x00's  :-(

	8) I now have another guess...  I have determined that
	   the device queue never grows beyond 1 I/O operation (recall,
	   I'm doing this using the fixit floppy; nothing else is
	   touching the disk, except my newfs operation.)  I've also
	   determined that the same buffer is used over and over, simply
	   being added to the device queue, removed, added, etc...

	   Could it be that this same buffer is inappropriately re-used
	   before biodone() has been called, and, since the low-level
	   SCSI routines use the data from the buffer, the operation is
	   actually going to the wrong spot on the disk?

	   This seems unlikely, as the check I have in scsi_done()
	   examines the b_pblkno field to determine if the debugging
	   printf() should be performed.  Since that printf() appropriately
	   appears on my console, I know that at least that field hasn't
	   been inappropriately re-used.  Perhaps the number of bytes
	   to transfer is getting reduced, or something like that?

	   The problem with that side-theory is that it assumes an issue
	   in the SCSI device driver; and this problem is evident on
	   everything from MFS to IDE...
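
For anyone who wants to repeat the test in items 3 and 4, here is a
minimal user-space sketch of the "trash, zero, verify" sequence.  It is
not the modified clri or newfs; the function names, REGION_SIZE and the
use of an ordinary file in place of the disk device are all assumptions
made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION_SIZE 512  /* hypothetical size of the region under test */

/* Overwrite REGION_SIZE bytes at offset `off` with `pattern`, then fsync. */
static int fill_region(int fd, off_t off, unsigned char pattern)
{
    unsigned char buf[REGION_SIZE];

    memset(buf, pattern, sizeof(buf));
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        return -1;
    return fsync(fd);
}

/* Count the bytes at offset `off` that differ from `expect`; -1 on error. */
static int count_mismatches(int fd, off_t off, unsigned char expect)
{
    unsigned char buf[REGION_SIZE];
    int bad = 0;
    size_t i;

    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        return -1;
    for (i = 0; i < sizeof(buf); i++)
        if (buf[i] != expect)
            bad++;
    return bad;
}

/*
 * Trash the region with 0xff (the modified clri's job), zero it again
 * (newfs's job), and report how many bytes are still non-zero.
 * Returns 0 when the zeros arrived, a positive count when they did
 * not, and -1 on an I/O error.
 */
static int trash_zero_verify(const char *path, off_t off)
{
    int fd, bad;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (fill_region(fd, off, 0xff) != 0 || fill_region(fd, off, 0x00) != 0) {
        close(fd);
        return -1;
    }
    bad = count_mismatches(fd, off, 0x00);
    close(fd);
    return bad;
}
```

Note that a read-back through the buffer cache, as done here, may be
satisfied from memory and so would not necessarily show what actually
reached the disk; the sketch only illustrates the sequence of steps.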

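One way to test the guess in item 8 beyond b_pblkno would be to save a
copy of the interesting buf fields when the I/O is started and compare
them at completion.  The following is only a sketch: struct buf_sketch
is a stand-in holding three fields, not the kernel's struct buf, and
the function names and CHANGED_* flags are hypothetical.  It returns a
bitmask rather than calling printf(), since item 6 suggests printf()s
can mask the problem.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the few buf fields of interest (not the kernel definition). */
struct buf_sketch {
    long  b_pblkno;  /* physical block number */
    long  b_bcount;  /* number of bytes to transfer */
    char *b_data;    /* address of the data */
};

/* The fields as they were when the I/O was started. */
struct buf_snapshot {
    const struct buf_sketch *bp;     /* the live buf being watched */
    struct buf_sketch        saved;  /* copy taken at start time */
};

#define CHANGED_PBLKNO 0x1
#define CHANGED_BCOUNT 0x2
#define CHANGED_DATA   0x4

/* Call where the buf leaves the device queue (sdstart() in the text above). */
static void snapshot_at_start(struct buf_snapshot *snap,
                              const struct buf_sketch *bp)
{
    assert(snap != NULL && bp != NULL);
    snap->bp = bp;
    snap->saved = *bp;
}

/*
 * Call just before completion (before biodone() in the text above).
 * Returns a bitmask of the fields that changed while the I/O was in
 * flight; 0 means the buf was not modified underneath the driver.
 */
static int compare_at_done(const struct buf_snapshot *snap)
{
    const struct buf_sketch *bp = snap->bp;
    int changed = 0;

    if (bp->b_pblkno != snap->saved.b_pblkno)
        changed |= CHANGED_PBLKNO;
    if (bp->b_bcount != snap->saved.b_bcount)
        changed |= CHANGED_BCOUNT;
    if (bp->b_data != snap->saved.b_data)
        changed |= CHANGED_DATA;
    return changed;
}
```

A non-zero result with CHANGED_BCOUNT set would correspond to the
"number of bytes to transfer is getting reduced" possibility.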

 Ideas/Opinions/Just-about-anything is welcome.

	- Dave Rivers -