From: Kern Sibbald
To: mjacob@feral.com
Cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
Date: 02 Jun 2003 10:28:38 +0200
Message-Id: <1054542517.1578.1770.camel@rufus>
In-Reply-To: <20030601163730.T97138@beppo>

Hello,

Yes, I've seen both your name and Justin Gibbs in the FreeBSD
documentation (partial answer to question d below).

On Mon, 2003-06-02 at 02:13, Matthew Jacob wrote:
> Hello, I'm the author of the SA driver. This specific case is something
> I have indeed tried to handle correctly, but could have missed something
> on. In particular I've been wary of devices in fixed block mode.
>
> The executive summary: I need more info. I need to know:
>
> a) was the tape device in fixed or variable block mode

The tape drive was in variable block mode. However at the time of the
error all the blocks Bacula was writing were the same size i.e. 64512
bytes.
>
> b) you claim to have lost blocks 1555..1567, and that
>    1568 was the signifier to change tapes. Are these tape
>    blocks reflective of single 'write' requests? Or are
>    these multiple tape records issued in one write?

All Bacula writes are single non-buffered writes of what I call a block
(a single tape record). By the way, these block numbers are Bacula block
numbers, counting from 1 beginning at the last EOF that Bacula wrote or
the beginning of the tape. Each block contains among other things the
block number -- this allowed us to identify which blocks were missing
with certainty. In addition, Bacula increments the block number at only
one place in the code -- after a "successful" write. At the end of the
tape Bacula writes the final block number to its database -- we found
the correct block number (the last of the missing blocks) in the Bacula
database, thus "proving" that Bacula actually wrote the blocks.

>
> c) What was the signifier you got that indicated that it
>    was time to change tapes (viz block 1568)? -1 and an errno
>    set? A residual that indicated that some data that you
>    had requested to be written had not been written.

Bacula stops writing and changes tapes under a single condition: the
return status from write() does not equal the number of bytes requested
to be written. Then a bit of analysis is done and the reason is reported
(write error, end of medium, ...). The effect is the same whether it was
an end of tape or an I/O error.

In this particular case I cannot say with 100% assurance what happened,
but I believe that Bacula received a -1 status and errno was ENOSPC. If
this point is critical, we can re-run the test with debug code inserted
to give us the exact status that was returned. It is a bit of work for
us both, but if it is important, say so and we will do it.
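
The stop condition described in the answer to (c) can be sketched in C
roughly as follows. This is an illustrative sketch only, not Bacula's
actual code: the names write_block, enum block_status and the block_no
counter parameter are hypothetical. It assumes one write() call per tape
record and counts a block only after a write that returned the full
length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes; not taken from Bacula. */
enum block_status {
    BLOCK_OK,            /* whole record accepted */
    BLOCK_END_OF_MEDIUM, /* write() returned -1 with errno == ENOSPC */
    BLOCK_IO_ERROR,      /* write() returned -1 with any other errno */
    BLOCK_SHORT_WRITE    /* fewer bytes accepted than requested */
};

/*
 * Write one record with a single write() call. The block counter is
 * incremented in exactly one place: after a write that returned the
 * full requested length. Any other result means "stop and change
 * tapes"; the classification is only used for reporting.
 */
enum block_status
write_block(int fd, const void *buf, size_t len, unsigned long *block_no)
{
    ssize_t n;

    assert(buf != NULL && block_no != NULL);

    n = write(fd, buf, len);
    if (n >= 0 && (size_t)n == len) {
        (*block_no)++;
        return BLOCK_OK;
    }
    if (n < 0)
        return errno == ENOSPC ? BLOCK_END_OF_MEDIUM : BLOCK_IO_ERROR;
    return BLOCK_SHORT_WRITE;
}
```

With this shape, a driver that reports success for a record that never
reached the tape is indistinguishable from a good write, which is
consistent with the missing blocks still having been counted.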
>
>
> d) Other general info about whether you were indeed using
>    the 'no-rewind' device, whether you'd changed the default
>    EOT model (from 'dual filemark' to 'single filemark'- you
>    *have* read the man pages, yes? :-))

Yes, we were using the no-rewind device (though this makes absolutely no
difference to Bacula). The EOT model was 2 EOFs, I am sure, because I
questioned Dan on that point and he proved it was 2. Yes, I have read
the man pages several times in detail as well as the man pages for Linux
and Solaris.

>
>
>
> There is one case I'm also worried about. This is from sa.c:saerror:
>
>	if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
>		if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
>			csio->resid = resid;
>			error = ENOSPC;
>		} else if (sense->flags & SSD_EOM) {
>			softc->flags |= SA_FLAG_EOM_PENDING;
>			/*
>			 * Grotesque as it seems, the few times
>			 * I've actually seen a non-zero resid,
>			 * the tape drive actually lied and had
>			 * written all the data!.
>			 */
>			csio->resid = 0;
>		}
>
> This is saying: if we were writing, and we got SSD_KEY_VOLUME_OVERFLOW,
> we're at hard EOT- we have to assume we didn't write *any* data
> for this last operation, and we return an errno.
>
> Otherwise, if early warning was spotted, mark EOM pending, but *don't*
> believe the residual field.

In both Solaris and Linux, they immediately notify Bacula with an
errno=ENOSPC (or at least a -1 status) when the early warning is hit.
When this happens, the write was not successful. I then immediately
clear the error status and write two EOF marks (one would be sufficient)
and change tapes.

>
> Every tape drive I'd tested with (and this was around 7 or 8) had all,
> when presenting a non-zero residual, had lied about what they actually
> had put on the tape.
>
> What I'm obviously worried about here is whether or not your tape drive
> was correct in reporting a residual. This would indeed fit your data.
>
> I'm pretty sure I also tested my EOT test program with an Archive
> autoloader- but I don't remember for sure.

As long as the full record is not successfully written to the tape,
Bacula will be 100% data correct because any "short" block that Bacula
reads is discarded.

Logically what you should do is this: if a partial or full record is
written to tape and an early EOM mark is detected, you should return the
bytes written and set a flag. On the next write, you should report no
data written and errno=ENOSPC. That will ensure that every program knows
exactly where it is.

>
>
>
> Other points:
>
> > However, more recently Dan Langille did some extensive
> > testing writing a 6GB file to six tapes. This brought
> > out additional problems of the driver "freezing" the tape,
>
> If the tape is 'freezing' it means that tape position was lost.
> Under what circumstances did this occur?

Yes, the tape is "freezing". I do not believe that it is freezing during
the writing, but it apparently freezes during Bacula's check to see
whether or not the last block was correctly written. This check fails on
FreeBSD probably for two reasons:

1. you freeze the tape.
2. your handling of EOF marks does not correspond to what
   Solaris/Linux does.

Point 1, freezing of the tape:
At EOM (or I/O error) Bacula writes two EOF marks, backspaces over them,
backspaces a record, then rereads the record and compares it to the last
block successfully written. This works perfectly on Solaris/Linux, but
does not work on FreeBSD. One reason is that I believe you freeze the
tape on the backspace record.

Point 2, handling of EOF marks:
FreeBSD's handling of EOF marks is quite different from Solaris/Linux in
the sense that Solaris/Linux is "transparent" -- the program writer
never sees the extra EOF marks. In FreeBSD the EOF marks that the driver
adds are visible to the program (causing Bacula great problems, most of
which I have programmed around).
Basically, as best I can determine, after Bacula writes its two EOF
marks, FreeBSD adds another one, but leaves the tape positioned after
the EOF mark it wrote rather than before it. When Solaris/Linux add an
EOF mark in the driver, they always backspace over it and leave you
positioned "correctly".

As an example, if I write:

   write()
   EOF
   EOF
   ioctl(MTBSF)
   ioctl(MTBSF)
   ioctl(MTBSR)

on Linux/Solaris I end up positioned just before the last write. On
FreeBSD, I seem to always end up positioned *after* the last write,
which I claim in BSD tape mode is "incorrect" (i.e. not expected).

Actually, if Linux/Solaris are intelligent, they will not write a third
EOF mark. If I had only written one EOF mark, they would have added a
second one, but backspaced over it before returning control to me.
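
The positioning argument can be checked with a toy model of the tape in
C. This is a sketch under stated assumptions, not driver code and not a
reproduction of what FreeBSD actually does: the tape is just an array of
records ('R') and filemarks ('F'), all toy_* names are hypothetical, and
a backspace-record that meets a filemark is assumed to step over it and
fail. The driver_adds_eof flag models the behaviour suspected above (an
extra filemark written by the driver, with the head left after it).

```c
#include <assert.h>
#include <stddef.h>

enum { TAPE_MAX = 64 };

struct toy_tape {
    char   item[TAPE_MAX]; /* 'R' = data record, 'F' = filemark */
    size_t len;            /* number of items currently on tape */
    size_t pos;            /* head sits just before item[pos] */
};

static void tape_put(struct toy_tape *t, char kind)
{
    assert(t->pos < TAPE_MAX);
    t->item[t->pos++] = kind;
    t->len = t->pos;       /* writing discards anything past the head */
}

void toy_write(struct toy_tape *t) { tape_put(t, 'R'); }
void toy_weof(struct toy_tape *t)  { tape_put(t, 'F'); }

/* Backspace over one filemark; -1 if beginning of tape is reached. */
int toy_bsf(struct toy_tape *t)
{
    while (t->pos > 0) {
        if (t->item[--t->pos] == 'F')
            return 0;
    }
    return -1;
}

/* Backspace one record; fails if the item stepped over is a filemark. */
int toy_bsr(struct toy_tape *t)
{
    if (t->pos == 0)
        return -1;
    t->pos--;
    return t->item[t->pos] == 'R' ? 0 : -1;
}

/*
 * Run the sequence from the example: write, EOF, EOF, BSF, BSF, BSR.
 * Returns the final head position; 0 means "just before the record".
 */
size_t toy_verify_position(int driver_adds_eof)
{
    struct toy_tape t = { {0}, 0, 0 };

    toy_write(&t);
    toy_weof(&t);
    toy_weof(&t);
    if (driver_adds_eof)
        toy_weof(&t);      /* hypothesised extra mark, head left after it */
    toy_bsf(&t);
    toy_bsf(&t);
    toy_bsr(&t);
    return t.pos;
}
```

In this model toy_verify_position(0) leaves the head at position 0, just
before the last record, while toy_verify_position(1) leaves it at
position 1, after the record, which matches the symptom described for
FreeBSD.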