From owner-freebsd-hackers Sun Mar 2 18:20:43 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5)
    id SAA21593 for hackers-outgoing; Sun, 2 Mar 1997 18:20:43 -0800 (PST)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2])
    by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id SAA21585 for ;
    Sun, 2 Mar 1997 18:20:38 -0800 (PST)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02) id AA14812;
    Sun, 2 Mar 1997 21:20:05 -0500
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Sun, 2 Mar 1997 21:20 EST
Received: from lakes.water.net (lakes [10.0.0.3])
    by ponds.water.net (8.8.3/8.7.3) with ESMTP id UAA21548;
    Sun, 2 Mar 1997 20:30:48 -0500 (EST)
Received: (from rivers@localhost) by lakes.water.net (8.8.3/8.6.9)
    id UAA18920; Sun, 2 Mar 1997 20:36:24 -0500 (EST)
Date: Sun, 2 Mar 1997 20:36:24 -0500 (EST)
From: Thomas David Rivers
Message-Id: <199703030136.UAA18920@lakes.water.net>
To: ponds!root.com!dg, ponds!freefall.cdrom.com!freebsd-hackers
Subject: current status of "dup alloc" problem.
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Recall, this is a 2.1.6.1 machine that reliably demonstrates the
"dup alloc" problem (a 386dx-33 with 12megs, i387, aha1542B).

However, I've determined the problem exists with 2.2-GAMMA, and I am
also seeing it on an IDE machine (with 8 megs). The problem has also
been reported on Pentiums and 486s, and there are indications that it
can happen when an MFS is used.

Here's a recap of everything I've examined thus far:

 1) Initially, I thought it was a problem in the ffs_valloc()
    routines; something to do with the inode allocation bit-map.
    However, investigation here revealed that was not the case.

 2) Kirk McKusick pointed the way to something "fishy" in newfs.
    I examined newfs to ensure that, in fact, empty inodes were
    being zeroed out. newfs is, indeed, performing the correct
    write() call to fill the particular inode in question with 0's.

 3) At this point, I wrote a version of clri that doesn't clear the
    inode, but instead prints the current values and fills it with
    0xff's. This way, I can now "trash" the inode and run newfs to
    see if it gets properly filled in with 0's.

 4) Then, I began investigating why the zeros don't appear on the
    disk when the write() call in newfs indicates success.

 5) Initially, I thought perhaps something was funny in disksort(),
    where it was "losing" a "struct buf *". My investigations
    indicate there's nothing wrong there (disksort() is so
    fundamental, and straightforward, I was reassured to be wrong..)

 6) However, I found that by adding printf()s to disksort(), I could
    cause the problem to be masked. This points the way to some
    critical timing problem.

 7) I've been looking into "struct buf" management, particularly
    when the buf is removed from the device's queue. This all
    appears to be in order.

    I've now got a kernel with printf()s in sdstart() to indicate
    when the buf in question (the one with the physical blockno
    (b->b_pblkno) that matches the problematic blocks) is removed
    from the device queue (indicating that the SCSI write has
    started), and another in scsi_done() just before the biodone()
    call (again, testing the b_pblkno).

    This would indicate that the buf is properly removed from the
    device queue; the I/O is started; and, by the order of my
    printf()s, the I/O completes and biodone() is called.

    All of that is exactly the proper sequence of events...
    *however* my disk block contains 0xffffff and not 0x00 :-(

 8) I now have another guess...

    I have determined that the device queue never grows beyond 1 I/O
    operation (recall, I'm doing this using the fixit floppy; nothing
    else is touching the disk except my newfs operation).

    I've also determined that the same buffer is used over and over,
    simply being added to the device queue, removed, added, etc...

    Could it be that this same buffer is inappropriately re-used
    before biodone() has been called?
    And since the low-level SCSI routines use the data from the
    buffer, is the operation actually going to the wrong spot on the
    disk?

    This seems unlikely, as the check I have in scsi_done() examines
    the b_pblkno field to determine if the debugging printf() should
    be performed. Since that printf() appropriately appears on my
    console, I know that at least that field hasn't been
    inappropriately re-used.

    Perhaps the number of bytes to transfer is getting reduced, or
    something like that?

    The problem with that side-theory is that it assumes an issue in
    the SCSI device driver, and this problem is evident on everything
    from MFS to IDE...

Ideas/Opinions/Just-about-anything is welcome.

	- Dave Rivers -
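
[Illustration, not part of the original message.] The trash-then-verify
check described in steps 3 and 4 can be sketched in user space roughly
as follows. This is only a minimal sketch under stated assumptions: it
operates on an in-memory image rather than a real disk, and SLOT_SIZE,
trash_slot(), zero_slot() and first_dirty_byte() are hypothetical names
made up for the example. None of them come from clri, newfs or the
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed size of one on-disk inode slot; chosen for illustration only. */
#define SLOT_SIZE 128

/* Overwrite one slot with 0xff so that a later zero-fill is detectable. */
void trash_slot(unsigned char *image, size_t nslots, size_t slot)
{
    assert(slot < nslots);
    memset(image + slot * SLOT_SIZE, 0xff, SLOT_SIZE);
}

/* Stand-in for the zero-filling write that is expected to reach the slot. */
void zero_slot(unsigned char *image, size_t nslots, size_t slot)
{
    assert(slot < nslots);
    memset(image + slot * SLOT_SIZE, 0x00, SLOT_SIZE);
}

/*
 * Return the offset (within the slot) of the first non-zero byte,
 * or -1 if the whole slot is zero.
 */
long first_dirty_byte(const unsigned char *image, size_t nslots, size_t slot)
{
    const unsigned char *p;
    size_t i;

    assert(slot < nslots);
    p = image + slot * SLOT_SIZE;
    for (i = 0; i < SLOT_SIZE; i++) {
        if (p[i] != 0)
            return (long)i;
    }
    return -1;
}
```

In this sketch, calling first_dirty_byte() after trash_slot() and again
after zero_slot() distinguishes "the zeros arrived" from "the slot still
holds 0xff". On a real disk the same comparison would have to read the
block back from the raw device, which the sketch does not attempt.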