Date: Sun, 1 Jun 2003 17:13:45 -0700 (PDT)
From: Matthew Jacob
To: Kern Sibbald
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
In-Reply-To: <20030601124620.S18592@root.org>
Message-ID: <20030601163730.T97138@beppo>
References: <20030601124620.S18592@root.org>
Reply-To: mjacob@feral.com

Hello,

I'm the author of the SA driver. This specific case is something I have
indeed tried to handle correctly, but could have missed something on. In
particular I've been wary of devices in fixed block mode.

The executive summary: I need more info.

I need to know:

a) was the tape device in fixed or variable block mode

b) you claim to have lost blocks 1555..1567, and that 1568 was the
signifier to change tapes. Are these tape blocks reflective of single
'write' requests? Or are these multiple tape records issued in one write?

c) What was the signifier you got that indicated that it was time to
change tapes (viz block 1568)? -1 and an errno set?
Or a residual indicating that some of the data you had asked to be
written had not been written?

d) Other general info about whether you were indeed using the 'no-rewind'
device, whether you'd changed the default EOT model (from 'dual filemark'
to 'single filemark'- you *have* read the man pages, yes? :-))

There is one case I'm also worried about. This is from sa.c:saerror:

    if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
            if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
                    csio->resid = resid;
                    error = ENOSPC;
            } else if (sense->flags & SSD_EOM) {
                    softc->flags |= SA_FLAG_EOM_PENDING;
                    /*
                     * Grotesque as it seems, the few times
                     * I've actually seen a non-zero resid,
                     * the tape drive actually lied and had
                     * written all the data!.
                     */
                    csio->resid = 0;
            }

This is saying: if we were writing, and we got SSD_KEY_VOLUME_OVERFLOW,
we're at hard EOT- we have to assume we didn't write *any* data for this
last operation, and we return an errno. Otherwise, if early warning was
spotted, mark EOM pending, but *don't* believe the residual field. Every
tape drive I'd tested with (around 7 or 8 of them) had, when presenting a
non-zero residual, lied about what it had actually put on the tape.

What I'm obviously worried about here is whether or not your tape drive
was correct in reporting a residual. If it was, the driver would have
thrown away a truthful residual, and that would indeed fit your data. I'm
pretty sure I also tested my EOT test program with an Archive autoloader-
but I don't remember for sure.

Other points:

> However, more recently Dan Langille did some extensive
> testing writing a 6GB file to six tapes. This brought
> out additional problems of the driver "freezing" the tape,

If the tape is 'freezing' it means that tape position was lost. Under what
circumstances did this occur?

-matt
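
The write-path decision quoted from saerror above can be modeled outside the kernel to make the worrying case concrete. What follows is only a minimal sketch, not the driver code: the `MODEL_*` constants, `struct model_write_result`, and `model_write_error()` are hypothetical stand-ins invented for illustration, and their values are not the real SCSI definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's constants; values are illustrative only. */
enum { MODEL_KEY_NONE = 0, MODEL_KEY_VOLUME_OVERFLOW = 1 };
#define MODEL_SENSE_EOM        0x40u
#define MODEL_FLAG_EOM_PENDING 0x01u

/* What the caller sees after a write that came back with sense data. */
struct model_write_result {
    int      error;  /* errno-style value, 0 if none */
    size_t   resid;  /* residual reported upward */
    unsigned flags;  /* softc-like flags after handling */
};

/*
 * Mirrors the two branches of the quoted excerpt:
 *   - volume overflow: hard EOT, keep the residual and return ENOSPC;
 *   - early warning (EOM bit): mark EOM pending and discard the residual.
 */
static struct model_write_result
model_write_error(int sense_key, unsigned sense_flags,
                  size_t drive_resid, unsigned softc_flags)
{
    struct model_write_result r = { 0, drive_resid, softc_flags };

    if (sense_key == MODEL_KEY_VOLUME_OVERFLOW) {
        r.resid = drive_resid;
        r.error = ENOSPC;
    } else if (sense_flags & MODEL_SENSE_EOM) {
        r.flags |= MODEL_FLAG_EOM_PENDING;
        r.resid = 0;  /* residual from the drive is not believed */
    }
    return r;
}
```

In this model, a drive that truthfully reports a non-zero residual at early warning has that residual zeroed, so the caller is told the whole write succeeded. That is the scenario the question about the drive's residual is meant to rule in or out.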