From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 17:13:47 2003
Date: Sun, 1 Jun 2003 17:13:45 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <20030601124620.S18592@root.org>
Message-ID: <20030601163730.T97138@beppo>
References: <20030601124620.S18592@root.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss


Hello, I'm the author of the SA driver. This is a case I have tried to
handle correctly, but I may have missed something. In particular, I've
been wary of devices in fixed block mode.

The executive summary: I need more info. I need to know:

	a) Was the tape device in fixed or variable block mode?

	b) You claim to have lost blocks 1555..1567, and that
	1568 was the signifier to change tapes. Does each of these
	tape blocks correspond to a single 'write' request, or
	were multiple tape records issued in one write?

	c) What was the signifier you got that indicated that it
	was time to change tapes (viz. block 1568)? Was it a -1
	return with errno set? Or a residual indicating that some
	of the data you had asked to be written had not been
	written?

	d) Other general info: were you indeed using the
	'no-rewind' device, and had you changed the default EOT
	model (from 'dual filemark' to 'single filemark')? You
	*have* read the man pages, yes? :-)
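
To make question (c) concrete, here is a minimal sketch of how an
application could classify what a single write(2) call handed back.
The names classify_write and enum write_outcome are made up for
illustration. Which of these outcomes a given drive and driver
actually produce at EOT is exactly what is being asked, so this is a
way to record the answer, not a statement of driver behavior.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical labels for the possible results of one write(2) call. */
enum write_outcome {
	WRITE_FULL,	/* everything requested was accepted */
	WRITE_SHORT,	/* 0 < returned < requested: a residual was reported */
	WRITE_ZERO,	/* 0 returned: nothing accepted, no errno */
	WRITE_ENOSPC,	/* -1 returned with errno == ENOSPC */
	WRITE_ERROR	/* -1 returned with any other errno */
};

/*
 * ret is the value write(2) returned, requested is the byte count that
 * was passed in, and err is the errno value saved right after the call.
 */
static enum write_outcome
classify_write(ssize_t ret, size_t requested, int err)
{
	if (ret < 0)
		return (err == ENOSPC) ? WRITE_ENOSPC : WRITE_ERROR;
	if (ret == 0)
		return WRITE_ZERO;
	if ((size_t)ret < requested)
		return WRITE_SHORT;
	return WRITE_FULL;
}
```

An application would call it as classify_write(n, len, errno) right
after n = write(fd, buf, len), and log the outcome together with the
block number.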


There is one case I'm also worried about. This is from sa.c:saerror:

        if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
                if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
                        csio->resid = resid;
                        error = ENOSPC;
                } else if (sense->flags & SSD_EOM) {
                        softc->flags |= SA_FLAG_EOM_PENDING;
                        /*
                         * Grotesque as it seems, the few times
                         * I've actually seen a non-zero resid,
                         * the tape drive actually lied and had
                         * written all the data!
                         */
                        csio->resid = 0;
                }

This is saying: if we were writing, and we got SSD_KEY_VOLUME_OVERFLOW,
we're at hard EOT. We have to assume we didn't write *any* data
for this last operation, and we return an errno (ENOSPC).

Otherwise, if early warning was spotted, mark EOM pending, but *don't*
believe the residual field.
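
The two branches just described can be modelled outside the kernel as
a small sketch. Everything here (struct write_status, struct
write_result, model_write_eot) is a hypothetical stand-in, not the CAM
or sa.c definitions, and the "neither condition" path simply passes
the drive's residual through, which is an assumption since the excerpt
doesn't show that path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the drive reported for a failed write. */
struct write_status {
	bool		volume_overflow;	/* sense key was VOLUME OVERFLOW */
	bool		eom_bit;		/* EOM flag set in the sense data */
	uint32_t	drive_resid;		/* residual the drive claimed */
};

/* Hypothetical summary of what the caller would then see. */
struct write_result {
	int		error;		/* errno to return, 0 if none */
	uint32_t	resid;		/* residual passed back to the caller */
	bool		eom_pending;	/* early warning remembered for later */
};

static struct write_result
model_write_eot(struct write_status st)
{
	/* Assumed default: no error, residual passed through unchanged. */
	struct write_result r = { 0, st.drive_resid, false };

	if (st.volume_overflow) {
		/* Hard EOT: keep the residual and fail with ENOSPC. */
		r.error = ENOSPC;
	} else if (st.eom_bit) {
		/* Early warning: note it, but distrust the residual. */
		r.eom_pending = true;
		r.resid = 0;
	}
	return r;
}
```

The second branch is the interesting one: a drive that sets EOM and
reports an honest non-zero residual would have that residual zeroed,
so the caller would believe the whole write landed on tape.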

Every tape drive I tested (around 7 or 8 of them) had, when presenting
a non-zero residual, lied about what it had actually put on the tape.

What I'm obviously worried about here is whether your tape drive was
in fact correct in reporting a residual. If so, the driver would have
thrown away a truthful residual, and that would indeed fit your data.

I'm pretty sure I also tested my EOT test program with an Archive
autoloader, but I don't remember for sure.


Other points:

> However, more recently Dan Langille did some extensive
> testing writing a 6GB file to six tapes. This brought
> out additional problems of the driver "freezing" the tape,

If the tape is 'freezing' it means that the tape position was lost.
Under what circumstances did this occur?


-matt