From owner-freebsd-scsi@FreeBSD.ORG  Wed Jun 16 23:31:48 2010
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1466A1065674
	for <freebsd-scsi@freebsd.org>; Wed, 16 Jun 2010 23:31:48 +0000 (UTC)
	(envelope-from mj@feral.com)
Received: from ns1.feral.com (ns1.feral.com [192.67.166.1])
	by mx1.freebsd.org (Postfix) with ESMTP id D59758FC0A
	for <freebsd-scsi@freebsd.org>; Wed, 16 Jun 2010 23:31:47 +0000 (UTC)
Received: from [192.168.0.102] (m206-63.dsl.tsoft.com [198.144.206.63])
	by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5GNVlDG060455
	for <freebsd-scsi@freebsd.org>; Wed, 16 Jun 2010 16:31:47 -0700 (PDT)
	(envelope-from mj@feral.com)
Message-ID: <4C195EE6.1050207@feral.com>
Date: Wed, 16 Jun 2010 16:31:50 -0700
From: Matthew Jacob <mj@feral.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
References: <AANLkTikULvhu5TRVDNAY59UvKII-BuBYBvDe83jQFLXR@mail.gmail.com>
In-Reply-To: <AANLkTikULvhu5TRVDNAY59UvKII-BuBYBvDe83jQFLXR@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Greylist: Default is to whitelist mail, not delayed by milter-greylist-4.2.3
	(ns1.feral.com [192.67.166.1]);
	Wed, 16 Jun 2010 16:31:47 -0700 (PDT)
Subject: Re: sa: write returns 0 = LEOM?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Jun 2010 23:31:48 -0000

On 6/16/2010 3:52 PM, Dustin J. Mitchell wrote:
> I'm investigating a user bug report in Amanda:
>    http://forums.zmanda.com/showthread.php?t=2832
>
> The problem boils down to a write(2) call for a SCSI tape device
> (/dev/nsa0) returning 0 after quite a bit of data and a number of
> filemarks have been written.  Jean-Louis suspected that this was an
> early warning EOM indication, and that a subsequent write() would
> succeed, with Amanda having been duly warned that a physical EOM is
> coming up.

That is, I believe, a specific feature of Solaris (EOM detection 
triggers a zero write, but allows for trailer records).  I seem to 
recall helping architect this back in 1996.


>   But looking at scsi_sa.c, this doesn't seem to be the
> case.  It looks like an early warning would result in a successful
> write instead, because resid is set to zero.
>
> cam/scsi/scsi_sa.c:
> 2418         /*
> 2419          * Handle filemark, end of tape, mismatched record sizes....
> 2420          * From this point out, we're only handling read/write cases.
> 2421          * Handle writes && reads differently.
> 2422          */
> 2423
> 2424         if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
> 2425                 if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
> 2426                         csio->resid = resid;
> 2427                         error = ENOSPC;
> 2428                 } else if (sense->flags & SSD_EOM) {
> 2429                         softc->flags |= SA_FLAG_EOM_PENDING;
> 2430                         /*
> 2431                          * Grotesque as it seems, the few times
> 2432                          * I've actually seen a non-zero resid,
> 2433                          * the tape drive actually lied and had
> 2434                          * written all the data!.
> 2435                          */
> 2436                         csio->resid = 0;
> 2437                 }
>
>    

Yes, I remember this code. I remember from doing test readbacks that the 
reported residual was in fact incorrect: the data had actually been 
written. But this was a long while back (at least 8 years ago).


> That said, I don't know my way around the kernel source, so I'm
> probably missing something obvious.  So:
>
> 1. What could cause a write syscall to return 0?
>    

I'll try to look into this.

Do you happen to know whether the device you experienced this on was set 
in fixed block or variable block mode?

> 2. Since we will be using early warning in the next version of Amanda,
> hints as to the best way to handle early warning from userspace would
> be appreciated.
>
>    

Urrr....

I used to have opinions about this. Now I'm not so sure. Expecting 
consistent behaviour from platform to platform is tough.

Can't you write until you get a hard failure, back up one record (which, 
of course, you've hung onto), write a trailer label and then ask for a 
new tape?
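
To make that suggestion concrete, here is a rough sketch of the loop in C. 
It is only an illustration under stated assumptions, not tested against a 
real drive: struct tape_ops, write_until_full() and the in-memory 
fake_tape are hypothetical names invented for this example. On a real 
sa(4) device, write_rec would presumably wrap write(2) and backspace_rec 
would issue an MTIOCTOP ioctl with mt_op set to MTBSR. The sketch also 
assumes that "back up one record" means backing up over the last record 
that was written successfully, so that the trailer replaces it and that 
record is rewritten at the start of the next tape.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical abstraction over a tape device (not a real kernel API). */
struct tape_ops {
    ssize_t (*write_rec)(void *ctx, const void *buf, size_t len);
    int (*backspace_rec)(void *ctx);   /* back up one record */
    void *ctx;
};

/*
 * Write records until a hard failure. On failure, back up over the last
 * good record, overwrite it with the trailer, and return the index of the
 * first record that must be rewritten on the next tape. Returns nrecs if
 * everything fit, or -1 if the tape could not be closed out cleanly.
 * The caller must still hold the earlier records ("hung onto" them).
 */
static long
write_until_full(struct tape_ops *t, const char *const *recs, size_t nrecs,
    const char *trailer)
{
    size_t tlen = strlen(trailer);

    assert(t != NULL && recs != NULL);
    for (size_t i = 0; i < nrecs; i++) {
        size_t len = strlen(recs[i]);

        if (t->write_rec(t->ctx, recs[i], len) == (ssize_t)len)
            continue;
        /* Hard failure on record i. */
        if (i == 0)
            return -1;              /* no earlier record to sacrifice */
        if (t->backspace_rec(t->ctx) != 0)
            return -1;
        if (t->write_rec(t->ctx, trailer, tlen) != (ssize_t)tlen)
            return -1;
        return (long)(i - 1);       /* record i-1 was overwritten */
    }
    return (long)nrecs;
}

/* In-memory stand-in for a tape, for illustration only. */
struct fake_tape {
    size_t capacity;    /* number of records that fit */
    size_t pos;         /* current position, in records */
    char last[64];      /* contents of the most recently written record */
};

static ssize_t
fake_write(void *ctx, const void *buf, size_t len)
{
    struct fake_tape *ft = ctx;
    size_t n = len < sizeof(ft->last) - 1 ? len : sizeof(ft->last) - 1;

    if (ft->pos >= ft->capacity) {
        errno = ENOSPC;             /* simulated hard end of medium */
        return -1;
    }
    memcpy(ft->last, buf, n);
    ft->last[n] = '\0';
    ft->pos++;
    return (ssize_t)len;
}

static int
fake_backspace(void *ctx)
{
    struct fake_tape *ft = ctx;

    if (ft->pos == 0) {
        errno = EIO;
        return -1;
    }
    ft->pos--;
    return 0;
}
```

With a fake tape that holds three records and five records to write, the 
fourth write fails, the third record is replaced by the trailer, and the 
caller resumes from the third record on the next tape. Whether a real 
drive still has room for the trailer after a hard failure is exactly the 
platform-dependent part, so treat this as a starting point only.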