From: Kern Sibbald
To: freebsd-scsi@freebsd.org
Date: 01 Jun 2003 19:54:41 +0200
Subject: SCSI tape data loss
Message-Id: <1054490081.1582.1685.camel@rufus>

Hello,

I'm the author of a GPL'ed network backup program called Bacula
(www.bacula.org). For the last three years, it has been working
flawlessly on Solaris and Linux systems. When users attempted to use
it recently on FreeBSD, it did not work. I subsequently modified
Bacula so that it would work on FreeBSD -- basically, I had to
program around some important differences in the way FreeBSD handles
EOFs compared to Solaris and Linux. At some point in the future, I
would like to discuss the problems I had in detail, if that interests
you.

However, more recently Dan Langille did some extensive testing
writing a 6GB file to six tapes.
This brought out additional problems of the driver "freezing" the
tape, which I believe I have also programmed around, but worse (and
the main reason for this email), Dan discovered that Bacula did not
correctly read back the data that was "supposedly" written to the
tape.

We've now worked on this problem for several weeks, and I believe we
have isolated the problem (data loss) to the point where the end of
medium is reached. We have confirmed that Bacula correctly wrote to
the tape, but when it was read back, 13 blocks of 64512 bytes were
missing.

Below, I have listed in pseudo-language what Bacula was doing. Each
write, with the exception of the first block on the second tape, is
64512 bytes:

  first tape mounted
  write(block 1)
  ...
  write(block 1554);
  write(block 1555);     <=== block lost
  ...                    <=== blocks lost
  write(block 1567);     <=== block lost
  write(block 1568)      failed because of EOM detected
  ioctl(MTIOCERRSTAT);
  ioctl(MTWEOF);
  ioctl(MTWEOF);
  ioctl(MTBSF);
  ioctl(MTBSF);
  ioctl(MTBSR);
  read()                 returned 0 bytes.
  ioctl(MTREW);
  close()
  new tape mounted.
  write(block 1);        Tape pre-label
  write(block 1 again);
  ioctl(MTREW);
  read(block1);
  ioctl(MTREW);
  write(block 1);        Tape label
  write(block 1568);     block not written to previous tape.

I have verified that Bacula did successfully write 1567 blocks to the
first tape, but in reading back the tape, blocks 1555-1567 are not on
the tape.

Now, the big question is: what caused the loss of those blocks?

The most likely causes I can think of are:

1. Bacula is doing something (e.g. MTIOCERRSTAT, or the MTBSF) to
   cause the data to be lost. If this is the case, it is something
   specific to FreeBSD, since this sequence of commands works on both
   Solaris and Linux (except that MTIOCERRSTAT is MTIOCLRERR on those
   systems).

2. The SCSI driver is doing asynchronous writes (very bad) and the
   End of Medium is not sent to Bacula until many writes after the
   end of the tape.

3. The SCSI driver has some sort of bug that causes buffers to be
   lost.
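To make cause 2 concrete, here is a minimal sketch in C of a toy
counting model. It is not FreeBSD driver code and says nothing about
what the real driver does. The names toy_tape, toy_write and toy_run
are hypothetical, and the capacity (1554 blocks) and queue depth
(13 blocks) used in the note below are assumptions picked only so the
model reproduces the numbers in the trace above. The model assumes
the driver acknowledges each write as soon as it is queued and
reports EOM only when a later commit to the medium fails.

```c
#include <assert.h>

#define BLOCK_SIZE 64512L   /* block size quoted in the report */

/* Hypothetical toy tape: NOT the FreeBSD driver, just a counting model. */
struct toy_tape {
    long capacity;  /* blocks that physically fit on the tape (assumed) */
    long depth;     /* acknowledged blocks the driver may still hold (assumed) */
    long on_tape;   /* blocks actually committed to the medium */
    long queued;    /* blocks accepted but not yet committed */
    int  eom;       /* set once end of medium has been hit */
};

/* Accept one block. Returns 0 if the write is acknowledged, -1 on EOM. */
int toy_write(struct toy_tape *t)
{
    if (t->eom)
        return -1;
    t->queued++;
    /* Commit the oldest blocks until the backlog fits in the queue again. */
    while (t->queued > t->depth) {
        if (t->on_tape == t->capacity) {
            t->eom = 1;       /* EOM surfaces only now ... */
            t->queued = 0;    /* ... and the whole backlog is dropped */
            return -1;
        }
        t->on_tape++;
        t->queued--;
    }
    return 0;
}

struct toy_result {
    long acked;      /* writes that returned success */
    long on_tape;    /* blocks really on the medium */
    long failed_at;  /* 1-based number of the first failed write, 0 if none */
};

/* Issue up to `attempts` writes and stop at the first failure. */
struct toy_result toy_run(long capacity, long depth, long attempts)
{
    struct toy_tape t = { capacity, depth, 0, 0, 0 };
    struct toy_result r = { 0, 0, 0 };
    long i;

    assert(capacity >= 0 && depth >= 0 && attempts >= 0);
    for (i = 1; i <= attempts; i++) {
        if (toy_write(&t) != 0) {
            r.failed_at = i;
            break;
        }
        r.acked++;
    }
    r.on_tape = t.on_tape;
    return r;
}
```

Under those assumptions, toy_run(1554, 13, 1568) acknowledges 1567
writes, leaves 1554 blocks on the medium, and fails on write 1568, so
13 acknowledged blocks (13 * 64512 = 838656 bytes) never reach the
tape. With a depth of 0 (fully synchronous writes) the same call
fails on write 1555 and no acknowledged block is lost.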
There may be other possible reasons that I am unaware of at this
moment.

Can you shed any light on this problem?

If you have any questions concerning the hardware, Dan
(dan@langille.com) will be able to provide the answers.

Best regards,

Kern

PS: I am not subscribed to the list so please copy me directly.