From: Kern Sibbald
To: "Justin T. Gibbs"
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
Date: 01 Jun 2003 23:44:53 +0200

Hello again,

I just re-read the Linux mt pages, and I see that they
have a setting both for async-writes and buffer-writes, so
I'm now confused about what the distinction really is. I had
assumed that if you are buffering then the writes must be
asynchronous, otherwise why would you buffer?

Best regards,

Kern

On Sun, 2003-06-01 at 22:08, Justin T. Gibbs wrote:
> > Hello,
> >
> > I'm the author of a GPL'ed network backup program called
> > Bacula (www.bacula.org). For the last three years, it
> > has been working flawlessly on Solaris and Linux systems.
> > When users attempted to use it recently on FreeBSD,
> > it did not work.
> > I subsequently modified Bacula so that
> > it would work on FreeBSD -- basically, I had to program
> > around some important differences in the way FreeBSD
> > handles EOFs compared to Solaris and Linux. At some point
> > in the future, I would like to discuss the problems
> > I had in detail, if that interests you.
>
> I would be interested as I'm sure would other readers of this
> list.
>
> > We've now worked on this problem for several weeks, and
> > I believe we have now isolated the problem (data loss) to occur
> > when the end of medium is reached.
> >
> > We have now confirmed that Bacula correctly wrote
> > to the tape, but when it was read back 13 blocks
> > of 64512 bytes were missing.
> >
> > Below, I have listed in pseudo-language what
> > Bacula was doing. Each write with the exception
> > of the first block on the second tape is 64512
> > bytes:
> >
> > first tape mounted
> > write(block 1)
> > ...
> > write(block 1554);
> > write(block 1555); <=== block lost
> > ... <=== blocks lost
> > write(block 1567); <=== block lost
> > write(block 1568) failed because of EOM detected
> > ioctl(MTIOCERRSTAT);
>
> What was the residual reported by MTIOCERRSTAT? If the
> device is in buffered mode, that residual can be larger than
> the last transaction that was failed. My guess is that either
> MTIOCERRSTAT is not properly pulling the residual out of the
> info field, or you are not backing up far enough in the data
> stream when the EOM occurs.
>
> > I have verified that Bacula did successfully write 1567 blocks to the
> > first tape, but in reading back the tape, blocks 1555-1567 are not
> > on the tape.
> >
> > Now, the big question is: what caused the loss of those blocks?
> > The most likely causes I can think of are:
> >
> > 1. Bacula is doing something (e.g. MTIOCERRSTAT, or the MTBSF)
> >    to cause the data to be lost.
> >    If this is the case, it is
> >    something specific to FreeBSD since this sequence of commands
> >    works on both Solaris and Linux (except that MTIOCERRSTAT is
> >    MTIOCLRERR on those systems).
>
> Perhaps both Linux and Solaris force the tape drives to run in
> unbuffered mode?
>
> > 2. The SCSI driver is doing asynchronous writes (very bad) and
> >    the End of Medium is not sent to Bacula until many writes after
> >    the end of the tape.
>
> Disabling the tape drive's write buffer kills performance. All
> of the information required to handle buffered writes should be
> available to you.
>
> Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so
> that userland apps can control this. It's not clear if this is
> exactly what they were created for, but it may be better to use
> these than to add some other opcodes.
>
> > 3. The SCSI driver has some sort of bug that causes buffers to be
> >    lost.
>
> I doubt that this would occur only at EOM.
>
> --
> Justin
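
Justin's point about "backing up far enough in the data stream" can be illustrated with a little arithmetic. The sketch below is not taken from Bacula or from the FreeBSD driver. It assumes the residual is a byte count covering the failed write plus any earlier writes still in the drive's buffer, and that every block has the same size. The function names are hypothetical, and the residual value is made up for illustration, since the thread never reports the real one.

```python
def blocks_in_residual(residual_bytes: int, block_size: int) -> int:
    """Return how many blocks the residual covers (assumed to be in bytes)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if residual_bytes < 0:
        raise ValueError("residual_bytes must not be negative")
    # Round up: a partially written block has to be rewritten in full.
    return -(-residual_bytes // block_size)


def first_block_to_rewrite(failed_block: int, residual_bytes: int,
                           block_size: int) -> int:
    """Return the number of the earliest block that must go to the next tape.

    failed_block is the block whose write() reported EOM. The residual is
    assumed to include that block as well as buffered earlier blocks.
    """
    unwritten = blocks_in_residual(residual_bytes, block_size)
    # Never back up past the first block of the tape.
    return max(1, failed_block - unwritten + 1)


if __name__ == "__main__":
    BLOCK_SIZE = 64512               # block size quoted in the thread
    FAILED_BLOCK = 1568              # write that failed with EOM
    RESIDUAL = 14 * BLOCK_SIZE       # hypothetical: 13 buffered blocks + failed one
    print(first_block_to_rewrite(FAILED_BLOCK, RESIDUAL, BLOCK_SIZE))
```

Under those assumptions the result is block 1555, which matches the first block reported missing. If the application backed up by only the one failed write, it would resume at block 1568 and blocks 1555-1567 would never be written to either tape.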