From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 04:53:50 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D097937B401; Sun,  1 Jun 2003 04:53:50 -0700 (PDT)
Received: from silver.he.iki.fi (silver.he.iki.fi [193.64.42.241])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 5F91143FAF; Sun,  1 Jun 2003 04:51:09 -0700 (PDT)
	(envelope-from pete@he.iki.fi)
Received: from he.iki.fi (localhost.he.iki.fi [127.0.0.1])
	by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h51Bp7k8003853;
	Sun, 1 Jun 2003 14:51:08 +0300 (EEST)
	(envelope-from pete@he.iki.fi)
Message-ID: <3ED9E8AB.5060106@he.iki.fi>
Date: Sun, 01 Jun 2003 14:51:07 +0300
From: Petri Helenius <pete@he.iki.fi>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030501
X-Accept-Language: English [en],Finnish [fi]
MIME-Version: 1.0
To: freebsd-current@freebsd.org, freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: raidframe
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 11:53:51 -0000


Is anyone actually using raidframe successfully and, if so, on what kind
of hardware?

Same question goes for any recent SCSI RAID controllers supported
by FreeBSD.

I admit I have not tried all combinations, but it seems that using anything
other than simple ahc SCSI hardware results in a kernel panic with 5.x.

Pete
 

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 05:25:10 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7984337B401; Sun,  1 Jun 2003 05:25:10 -0700 (PDT)
Received: from grogged.dyndns.org (c-66-41-94-114.mn.client2.attbi.com
	[66.41.94.114])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id B685A43FE1; Sun,  1 Jun 2003 05:22:29 -0700 (PDT)
	(envelope-from matt@grogged.dyndns.org)
Received: by grogged.dyndns.org (Postfix, from userid 1001)
	id 060B216809; Sun,  1 Jun 2003 07:13:00 -0500 (CDT)
Received: from localhost (localhost [127.0.0.1])
	by grogged.dyndns.org (Postfix) with ESMTP
	id EB3C3D23C; Sun,  1 Jun 2003 07:13:00 -0500 (CDT)
Date: Sun, 1 Jun 2003 07:13:00 -0500 (CDT)
From: matt <matt@grogged.dyndns.org>
To: Petri Helenius <pete@he.iki.fi>
In-Reply-To: <3ED9E8AB.5060106@he.iki.fi>
Message-ID: <20030601071231.G76837-100000@grogged.dyndns.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
cc: freebsd-current@freebsd.org
Subject: Re: raidframe
X-List-Received-Date: Sun, 01 Jun 2003 12:25:11 -0000


I'm using an AMI MegaRAID 1500 in 5.x without any issues.

-m

On Sun, 1 Jun 2003, Petri Helenius wrote:

>
> Is anyone actually using raidframe successfully and, if so, on what kind
> of hardware?
>
> Same question goes for any recent SCSI RAID controllers supported
> by FreeBSD.
>
> I admit I have not tried all combinations, but it seems that using anything
> other than simple ahc SCSI hardware results in a kernel panic with 5.x.
>
> Pete
>

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 09:27:23 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 564BF37B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 09:27:23 -0700 (PDT)
Received: from hub.org (hub.org [64.117.225.220])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CED2743FA3
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 09:27:22 -0700 (PDT)
	(envelope-from scrappy@hub.org)
Received: from hub.org (unknown [64.117.225.220])
	by hub.org (Postfix) with ESMTP
	id 1B6AA6BA75E; Sun,  1 Jun 2003 13:27:21 -0300 (ADT)
Date: Sun, 1 Jun 2003 13:27:21 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: freebsd-scsi@freebsd.org
Message-ID: <20030601131404.P6572@hub.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Scott Long <scott_long@btc.adaptec.com>
Subject: Critical bug in Adaptec(aac) driver ...
X-List-Received-Date: Sun, 01 Jun 2003 16:27:23 -0000


As those on this list will have seen over the past few months, I have a
server that had (past tense) an Adaptec 2120s controller in her that was
giving a lot of grief ... about 3 weeks ago, the server it was in *really*
blew up ... one drive was reported as down (in a RAID5 array), and when we
tried to bring it back up, a second drive started to "fail" ... I got the
techs to shut her down, and literally rushed to the remote location to see
if there was anything that I could do to at least recover the data ...

When I got there to bring it back up, the server reported that a 3rd drive
had failed ... and within a few hours, a 4th drive failed ... the result
being that we lost all of the data on that server, which turned out to be
quite painful to recover ...

While down there, we replaced the Adaptec controller with an Intel one,
reformatted the exact same drives, in the exact same chassis, and she's
been running fine since ...

On my trip back, I had a chat with a friend that does development work in
the Linux world, and who had had that server previous to myself, and
apparently there is a "known bug" in Linux that he says sounds exactly
like what I experienced (they hit it right in the middle of developing on
that box) and that there are apparently two Linux kernel patches that they
had to apply (after rebuilding from scratch) to correct the problem ...

The way he explained the problem to me, he made it sound like the kernel
driver was interacting with the BIOS and causing some corruption ... not
sure at what level, but since trying to swap in a new controller didn't
restore things, I'm suspecting at the hard drive level ... ?

Scott, while down there, I tried just about everything I could think of
... we replaced the SCSI cable, put the drives/controller into a second
identical chassis, swapped the host controller cards themselves (I had brought
spares) ... and that server, as I mentioned, is currently running quite
happily with an Intel host controller in it :(  So, unless the same
"failure" was hitting two host controllers, hardware failure doesn't seem
to have been the cause ...


From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 10:38:45 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 407BA37B404
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 10:38:45 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 75D0E43F93
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 10:38:44 -0700 (PDT)
	(envelope-from scott_long@btc.adaptec.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h51HXmZ28839;
	Sun, 1 Jun 2003 10:33:48 -0700
Received: from btc.adaptec.com (hollin.btc.adaptec.com [10.100.253.56])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id KAA22298;
	Sun, 1 Jun 2003 10:38:42 -0700 (PDT)
Message-ID: <3EDA3982.5040202@btc.adaptec.com>
Date: Sun, 01 Jun 2003 11:36:02 -0600
From: Scott Long <scott_long@btc.adaptec.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030414
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "Marc G. Fournier" <scrappy@hub.org>
References: <20030601131404.P6572@hub.org>
In-Reply-To: <20030601131404.P6572@hub.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: Critical bug in Adaptec(aac) driver ...
X-List-Received-Date: Sun, 01 Jun 2003 17:38:45 -0000

Marc G. Fournier wrote:
> As those on this list will have seen over the past few months, I have a
> server that had (past tense) an Adaptec 2120s controller in her that was
> giving a lot of grief ... about 3 weeks ago, the server it was in *really*
> blew up ... one drive was reported as down (in a RAID5 array), and when we
> tried to bring it back up, a second drive started to "fail" ... I got the
> techs to shut her down, and literally rushed to the remote location to see
> if there was anything that I could do to at least recover the data ...
> 
> When I got there to bring it back up, the server reported that a 3rd drive
> had failed ... and within a few hours, a 4th drive failed ... the result
> being that we lost all of the data on that server, which turned out to be
> quite painful to recover ...
> 
> While down there, we replaced the Adaptec controller with an Intel one,
> reformatted the exact same drives, in the exact same chassis, and she's
> been running fine since ...
> 
> On my trip back, I had a chat with a friend that does development work in
> the Linux world, and who had had that server previous to myself, and
> apparently there is a "known bug" in Linux that he says sounds exactly
> like what I experienced (they hit it right in the middle of developing on
> that box) and that there are apparently two Linux kernel patches that they
> had to apply (after rebuilding from scratch) to correct the problem ...
> 
> The way he explained the problem to me, he made it sound like the kernel
> driver was interacting with the BIOS and causing some corruption ... not
> sure at what level, but since trying to swap in a new controller didn't
> restore things, I'm suspecting at the hard drive level ... ?
> 
> Scott, while down there, I tried just about everything I could think of
> ... we replaced the SCSI cable, put the drives/controller into a second
> identical chassis, swapped the host controller cards themselves (I had brought
> spares) ... and that server, as I mentioned, is currently running quite
> happily with an Intel host controller in it :(  So, unless the same
> "failure" was hitting two host controllers, hardware failure doesn't seem
> to have been the cause ...
> 

I understand your frustration and wish there was more I could do to 
help.  Please send me whatever information you have.

Scott

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 10:45:24 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8684037B401
	for <freebsd-scsi@FreeBSD.org>; Sun,  1 Jun 2003 10:45:24 -0700 (PDT)
Received: from srv1.cosmo-project.de (srv1.cosmo-project.de [213.83.6.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0876143F75
	for <freebsd-scsi@FreeBSD.org>; Sun,  1 Jun 2003 10:45:23 -0700 (PDT)
	(envelope-from andreas@klemm.apsfilter.org)
Received: from srv1.cosmo-project.de (localhost [IPv6:::1])
	by srv1.cosmo-project.de (8.12.9/8.12.9) with ESMTP id h51HjJrN055503
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <freebsd-scsi@FreeBSD.org>; Sun, 1 Jun 2003 19:45:21 +0200 (CEST)
	(envelope-from andreas@klemm.apsfilter.org)
Received: (from uucp@localhost)h51HjIOf055502
	for freebsd-scsi@FreeBSD.org; Sun, 1 Jun 2003 19:45:18 +0200 (CEST)
	(envelope-from andreas@klemm.apsfilter.org)
Received: from titan.klemm.apsfilter.org (localhost.klemm.apsfilter.org
	[127.0.0.1])
	by klemm.apsfilter.org (8.12.9/8.12.9) with ESMTP id h51HiIJE039756
	for <freebsd-scsi@FreeBSD.org>; Sun, 1 Jun 2003 19:44:23 +0200 (CEST)
	(envelope-from andreas@titan.klemm.apsfilter.org)
Received: (from andreas@localhost)
	by titan.klemm.apsfilter.org (8.12.9/8.12.9/Submit) id h51HiI1l039755
	for freebsd-scsi@FreeBSD.org; Sun, 1 Jun 2003 19:44:18 +0200 (CEST)
Date: Sun, 1 Jun 2003 19:44:18 +0200
From: Andreas Klemm <andreas@klemm.apsfilter.org>
To: freebsd-scsi@FreeBSD.org
Message-ID: <20030601174418.GA39708@titan.klemm.apsfilter.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
X-Operating-System: FreeBSD 5.1-RC
X-Disclaimer: A free society is one where it is safe to be unpopular
User-Agent: Mutt/1.5.4i
Subject: Supported Controller under FreeBSD for Sun Storedge A5000, FC-AL ?
X-List-Received-Date: Sun, 01 Jun 2003 17:45:24 -0000

Hi,

Q: Is there a PCI controller available for FreeBSD 4.x and 5.x
that is capable of connecting to a Sun StorEdge A5000?

Which FreeBSD driver would support this card?

I found this article:
	http://www.sunhelp.org/pipermail/rescue/2002-June/058045.html

But I have never heard of an Interphase 5526 PCI controller.
Which FreeBSD driver would support it?

Here is an example StorEdge offer on eBay:
http://cgi.ebay.de/ws/eBayISAPI.dll?ViewItem&category=8074&item=3026979541&rd=1

Best regards

	Andreas ///

-- 
Andreas Klemm
http://www.64bits.de
http://www.apsfilter.org/
http://people.FreeBSD.ORG/~andreas

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 10:54:46 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1DF5B37B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 10:54:46 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 621F043F85
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 10:54:44 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h51Hsfv01938
	for <freebsd-scsi@freebsd.org>; Sun, 1 Jun 2003 19:54:42 +0200
From: Kern Sibbald <kern@sibbald.com>
To: freebsd-scsi@freebsd.org
Content-Type: text/plain
Organization: 
Message-Id: <1054490081.1582.1685.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 01 Jun 2003 19:54:41 +0200
Content-Transfer-Encoding: 7bit
Subject: SCSI tape data loss
X-List-Received-Date: Sun, 01 Jun 2003 17:54:46 -0000

Hello,

I'm the author of a GPL'ed network backup program called
Bacula (www.bacula.org). For the last three years, it
has been working flawlessly on Solaris and Linux systems.
When users attempted to use it recently on FreeBSD,
it did not work. I subsequently modified Bacula so that
it would work on FreeBSD -- basically, I had to program
around some important differences in the way FreeBSD 
handles EOFs compared to Solaris and Linux.  At some point
in the future, I would like to discuss the problems
I had in detail, if that interests you.

However, more recently Dan Langille did some extensive
testing writing a 6GB file to six tapes. This brought
out additional problems of the driver "freezing" the tape,
which I believe I have also programmed around, but worse
(and the main reason for this email), Dan discovered
that Bacula did not correctly read back the data that
was "supposedly" written to the tape.

We've now worked on this problem for several weeks, and
I believe we have isolated it: the data loss occurs
when the end of medium is reached.

We have now confirmed that Bacula correctly wrote
to the tape, but when it was read back 13 blocks
of 64512 bytes were missing.

Below, I have listed in pseudo-language what
Bacula was doing. Each write with the exception
of the first block on the second tape is 64512
bytes:

  first tape mounted
  write(block 1)
  ...
  write(block 1554);
  write(block 1555);   <=== block lost
  ...                  <=== blocks lost
  write(block 1567);   <=== block lost
  write(block 1568) failed because of EOM detected
  ioctl(MTIOCERRSTAT);
  ioctl(MTWEOF);
  ioctl(MTWEOF);

  ioctl(MTBSF);
  ioctl(MTBSF);
  ioctl(MTBSR);

  read() returned 0 bytes.
  ioctl(MTREW);
  close()
 
  new tape mounted.
  write(block 1); Tape pre-label
  write(block 1 again);
  ioctl(MTREW);
  read(block1);
  ioctl(MTREW);
  write(block 1);  Tape label
  write(block 1568);  block not written to previous tape.

I have verified that Bacula did successfully write 1567 blocks to the
first tape, but in reading back the tape, blocks 1555-1567 are not
on the tape.
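
As a rough illustration of the read-back check described above (a sketch
only, not Bacula's actual code): given the block numbers recovered from the
tape, the missing range and its size follow directly. The function name
find_missing_blocks and the constant BLOCK_SIZE are hypothetical; the block
size and block numbers are the ones reported above.

```python
BLOCK_SIZE = 64512  # bytes per block, as stated in the report


def find_missing_blocks(written, read_back):
    """Return the block numbers that were written but not read back."""
    seen = set(read_back)
    return [b for b in written if b not in seen]


# Blocks 1..1567 were written to the first tape; only 1..1554 came back.
written = range(1, 1568)
read_back = range(1, 1555)

missing = find_missing_blocks(written, read_back)
print(missing[0], missing[-1], len(missing))  # 1555 1567 13
print(len(missing) * BLOCK_SIZE)              # 838656 bytes not on tape
```

So the 13 missing blocks amount to 838656 bytes of data that were reported
as written but never reached the first tape.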

Now, the big question is: what caused the loss of those blocks?
The most likely causes I can think of are:

1. Bacula is doing something (e.g. MTIOCERRSTAT, or the MTBSF)
   to cause the data to be lost.  If this is the case, it is
   something specific to FreeBSD since this sequence of commands
   works on both Solaris and Linux (except that MTIOCERRSTAT is
   MTIOCLRERR on those systems).

2. The SCSI driver is doing asynchronous writes (very bad) and
   the End of Medium is not sent to Bacula until many writes after
   the end of the tape.

3. The SCSI driver has some sort of bug that causes buffers to be
   lost.
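
To make cause 2 concrete, here is a toy model (an assumption for
illustration, not the behaviour of the FreeBSD sa driver or of any real
drive). Writes are acknowledged as soon as they enter a write-behind buffer,
and End of Medium is reported only when a buffered block can no longer be
moved to the tape. The class name ToyTape, the capacity of 1554 blocks, and
the buffer depth of 13 are all hypothetical values chosen to match the
numbers above.

```python
from collections import deque


class ToyTape:
    """Toy tape with a write-behind buffer (hypothetical model only)."""

    def __init__(self, capacity_blocks, buffer_depth):
        self.capacity = capacity_blocks  # blocks that physically fit
        self.depth = buffer_depth        # acknowledged blocks held in memory
        self.medium = []                 # blocks actually on the tape
        self.pending = deque()           # acknowledged, not yet on the tape

    def write(self, block):
        """Acknowledge a write; return False once EOM is finally reported."""
        if len(self.pending) == self.depth:
            # Buffer full: the oldest block must reach the medium first.
            if len(self.medium) == self.capacity:
                return False             # EOM surfaces only now
            self.medium.append(self.pending.popleft())
        self.pending.append(block)
        return True


tape = ToyTape(capacity_blocks=1554, buffer_depth=13)
block = 1
while tape.write(block):
    block += 1

print(block)                                # 1568: first write that fails
print(len(tape.medium))                     # 1554 blocks really on the tape
print(tape.pending[0], tape.pending[-1])    # 1555 1567: acknowledged, lost
```

In this model the application sees 1567 successful writes and a failure on
block 1568, yet blocks 1555-1567 exist only in the buffer. Whether anything
like this actually happens in the driver or the drive is exactly the open
question.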

There may be other possible reasons that I am unaware of at this moment.

Can you shed any light on this problem?

If you have any questions concerning the hardware, Dan 
(dan@langille.org) will be able to provide the answers.

Best regards,

Kern

PS: I am not subscribed to the list so please copy me directly.

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 11:32:38 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 03CF737B418
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 11:32:38 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D025C43F93
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 11:32:36 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 2CBA33F4F; Sun,  1 Jun 2003 14:32:36 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: freebsd-scsi@freebsd.org
Date: Sun, 01 Jun 2003 14:32:36 -0400
MIME-Version: 1.0
Message-ID: <3EDA0E84.15066.C424E3B1@localhost>
Priority: normal
In-reply-to: <1054490081.1582.1685.camel@rufus>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-List-Received-Date: Sun, 01 Jun 2003 18:32:38 -0000

On 1 Jun 2003 at 19:54, Kern Sibbald wrote:

> If you have any questions concerning the hardware, Dan 
> (dan@langille.org) will be able to provide the answers.

The box is running FreeBSD 4.8-RC (FreeBSD 4.8-RC #9: Fri Apr  4 09:15:39 EST
2003), although it may be from a cvsup done much earlier than Apr 4.

The tape drive is an Archive Python p4586np:
http://www.seagate.com/support/tape/specs/dds/p4586np.html

The SCSI card is an Adaptec 2940.

$ dmesg
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 
1994
        The Regents of the University of California. All rights 
reserved.
FreeBSD 4.8-RC #9: Fri Apr  4 09:15:39 EST 2003
    root@undef.example.org:/usr/obj/usr/src/sys/UNDEF
Timecounter "i8254"  frequency 1193182 Hz
CPU: AMD Duron(tm) Processor (901.60-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x631  Stepping = 1
  
Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA
,CMOV,PAT,PSE36,MMX,FXSR>
  AMD Features=0xc0440000<RSVD,AMIE,DSP,3DNow!>
real memory  = 402587648 (393152K bytes)
config> di sn0
config> di lnc0
config> di ie0
config> di fe0
config> di ed0
config> di cs0
config> di bt0
config> di aic0
config> di aha0
config> di adv0
config> q
avail memory = 386621440 (377560K bytes)
Preloaded elf kernel "kernel" at 0xc04b3000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc04b309c.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 5 entries at 0xc00fdf20
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib2: <VIA 8363 (Apollo KT133) PCI-PCI (AGP) bridge> at device 1.0 
on pci0
pci1: <PCI bus> on pcib2
pci1: <ATI model 5046 graphics accelerator> at 0.0 irq 11
isab0: <VIA 82C686 PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686 ATA66 controller> port 0xd000-0xd00f at device 
7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 5 at device 
7.2 on pci0
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xd800-0xd81f irq 5 at device 
7.3 on pci0
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: <unknown card> (vendor=0x1106, dev=0x3057) at 7.4
ahc0: <Adaptec 2940 SCSI adapter> port 0xe000-0xe0ff mem 0xdb000000-
0xdb000fff irq 11 at device 9.0 on pci0
aic7870: Single Channel A, SCSI Id=7, 16/253 SCBs
rl0: <RealTek 8139 10/100BaseTX> port 0xe400-0xe4ff mem 0xdb001000-
0xdb0010ff irq 10 at device 10.0 on pci0
rl0: Ethernet address: 00:50:fc:50:56:88
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib1: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib1
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xcc000-0xce7ff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on 
isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on 
isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ad0: 38172MB <MAXTOR 6L040J2> [77557/16/63] at ata0-master UDMA66
acd0: CDROM <MATSHITA CR-589> at ata1-master PIO4
Waiting 15 seconds for SCSI devices to settle
sa0 at ahc0 bus 0 target 4 lun 0
sa0: <ARCHIVE 4586XX 28887-XXX 4BGD> Removable Sequential Access SCSI-
2 device
sa0: 5.000MB/s transfers (5.000MHz, offset 15)
pass1 at ahc0 bus 0 target 4 lun 1
pass1: <ARCHIVE 4586XX 28887-XXX 4BGD> Removable Changer SCSI-2 
device
pass1: 5.000MB/s transfers (5.000MHz, offset 15)
Mounting root from ufs:/dev/ad0s1a
(sa0:ahc0:0:4:0): REWIND. CDB: 1 0 0 0 0 0
(sa0:ahc0:0:4:0): NOT READY asc:3a,0
(sa0:ahc0:0:4:0): Medium not present
(sa0:ahc0:0:4:0): REWIND. CDB: 1 0 0 0 0 0
(sa0:ahc0:0:4:0): NOT READY asc:3a,0
(sa0:ahc0:0:4:0): Medium not present
(sa0:ahc0:0:4:0): REWIND. CDB: 1 0 0 0 0 0
(sa0:ahc0:0:4:0): NOT READY asc:3a,0
(sa0:ahc0:0:4:0): Medium not present
(sa0:ahc0:0:4:0): REWIND. CDB: 1 0 0 0 0 0
(sa0:ahc0:0:4:0): NOT READY asc:3a,0
(sa0:ahc0:0:4:0): Medium not present
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 13:08:45 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E2BFD37B404
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 13:08:45 -0700 (PDT)
Received: from aslan.scsiguy.com (mail.scsiguy.com [63.229.232.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5868A43FA3
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 13:08:44 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106])
	by aslan.scsiguy.com (8.12.8/8.12.8) with ESMTP id h51K8YIh013032;
	Sun, 1 Jun 2003 14:08:35 -0600 (MDT)
	(envelope-from gibbs@scsiguy.com)
Date: Sun, 01 Jun 2003 14:08:34 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>, freebsd-scsi@freebsd.org
Message-ID: <2846020000.1054498114@aslan.scsiguy.com>
In-Reply-To: <1054490081.1582.1685.camel@rufus>
References: <1054490081.1582.1685.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 20:08:46 -0000

> Hello,
> 
> I'm the author of a GPL'ed network backup program called
> Bacula (www.bacula.org). For the last three years, it
> has been working flawlessly on Solaris and Linux systems.
> When users attempted to use it recently on FreeBSD,
> it did not work. I subsequently modified Bacula so that
> it would work on FreeBSD -- basically, I had to program
> around some important differences in the way FreeBSD 
> handles EOFs compared to Solaris and Linux.  At some point
> in the future, I would like to discuss the problems
> I had in detail, if that interests you.

I would be interested as I'm sure would other readers of this
list.

> We've now worked on this problem for several weeks, and
> I believe we have now isolated the problem (data loss) to occur
> when the end of medium is reached.
> 
> We have now confirmed that Bacula correctly wrote
> to the tape, but when it was read back 13 blocks
> of 64512 bytes were missing.
> 
> Below, I have listed in pseudo-language what
> Bacula was doing. Each write with the exception
> of the first block on the second tape is 64512
> bytes:
> 
>   first tape mounted
>   write(block 1)
>   ...
>   write(block 1554);
>   write(block 1555);   <=== block lost
>   ...                  <=== blocks lost
>   write(block 1567);   <=== block lost
>   write(block 1568) failed because of EOM detected
>   ioctl(MTIOCERRSTAT);

What was the residual reported by MTIOCERRSTAT?  If the
device is in buffered mode, that residual can be larger than
the last transaction that was failed.  My guess is that either
MTIOCERRSTAT is not properly pulling the residual out of the
info field, or you are not backing up far enough in the data
stream when the EOM occurs.
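
To make the "backing up far enough" point concrete, here is a minimal
sketch of the arithmetic.  It assumes the residual is a byte count that
covers the failed write plus everything still held in the drive's
buffer, which this thread does not confirm.  The helper names and the
14-block residual are hypothetical; the block size and block numbers
are the ones from the report.

```python
def blocks_to_rewrite(residual_bytes, block_size):
    """Number of whole blocks that never reached the medium.

    Assumption: the residual is a byte count covering the failed write
    and any earlier writes still held in the drive's buffer.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if residual_bytes < 0:
        raise ValueError("residual_bytes must not be negative")
    # Round up: a partially written block has to be rewritten in full.
    return -(-residual_bytes // block_size)


def first_block_to_rewrite(failed_block, residual_bytes, block_size):
    """Block number at which writing should resume on the next tape."""
    lost = blocks_to_rewrite(residual_bytes, block_size)
    # Even with no residual reported, the failed block must be rewritten.
    return failed_block - max(lost, 1) + 1


# Figures from the report: 64512-byte blocks, the write of block 1568
# failed, blocks 1555-1567 were missing.  The residual is hypothetical.
BLOCK = 64512
print(blocks_to_rewrite(14 * BLOCK, BLOCK))             # 14
print(first_block_to_rewrite(1568, 14 * BLOCK, BLOCK))  # 1555
```

Under that assumption, an application that rewrites only the single
failed block would skip exactly the 13 blocks reported missing.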

> I have verified that Bacula did successfully write 1567 blocks to the
> first tape, but in reading back the tape, blocks 1555-1567 are not
> on the tape.
> 
> Now, the big question is: what caused the loss of those blocks?
> The most likely causes I can think of are:
> 
> 1. Bacula is doing something (e.g. MTIOCERRSTAT, or the MTBSF)
>    to cause the data to be lost.  If this is the case, it is
>    something specific to FreeBSD since this sequence of commands
>    works on both Solaris and Linux (except that MTIOCERRSTAT is
>    MTIOCLRERR on those systems).

Perhaps both Linux and Solaris force the tape drives to run in
unbuffered mode?

> 2. The SCSI driver is doing asynchronous writes (very bad) and
>    the End of Medium is not sent to Bacula until many writes after
>    the end of the tape.

Disabling the tape drive's write buffer kills performance.  All
of the information required to handle buffered writes should be
available to you.

Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so
that userland apps can control this.  It's not clear if this is
exactly what they were created for, but it may be better to use
these than to add some other opcodes.

> 3. The SCSI driver has some sort of bug that causes buffers to be
>    lost.

I doubt that this would occur only at EOM.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 14:41:09 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D695B37B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 14:41:09 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4E1A543F75
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 14:41:08 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h51Lf6v02454
	for <freebsd-scsi@freebsd.org>; Sun, 1 Jun 2003 23:41:07 +0200
From: Kern Sibbald <kern@sibbald.com>
To: freebsd-scsi@freebsd.org
Content-Type: text/plain
Organization: 
Message-Id: <1054503666.1582.1718.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 01 Jun 2003 23:41:06 +0200
Content-Transfer-Encoding: 7bit
Subject: [Fwd: Re: SCSI tape data loss]
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 21:41:10 -0000

Oops. Sorry, I didn't include the list.

-----Forwarded Message-----

From: Kern Sibbald <kern@sibbald.com>
To: Justin T. Gibbs <gibbs@scsiguy.com>
Subject: Re: SCSI tape data loss
Date: 01 Jun 2003 23:37:09 +0200

On Sun, 2003-06-01 at 22:08, Justin T. Gibbs wrote:
> > Hello,
> > 
> > I'm the author of a GPL'ed network backup program called
> > Bacula (www.bacula.org). For the last three years, it
> > has been working flawlessly on Solaris and Linux systems.
> > When users attempted to use it recently on FreeBSD,
> > it did not work. I subsequently modified Bacula so that
> > it would work on FreeBSD -- basically, I had to program
> > around some important differences in the way FreeBSD 
> > handles EOFs compared to Solaris and Linux.  At some point
> > in the future, I would like to discuss the problems
> > I had in detail, if that interests you.
> 
> I would be interested as I'm sure would other readers of this
> list.

OK, in the next few days, I will document the differences
between Solaris/Linux and FreeBSD that I have run into.

> 
> > We've now worked on this problem for several weeks, and
> > I believe we have now isolated the problem (data loss) to occur
> > when the end of medium is reached.
> > 
> > We have now confirmed that Bacula correctly wrote
> > to the tape, but when it was read back 13 blocks
> > of 64512 bytes were missing.
> > 
> > Below, I have listed in pseudo-language what
> > Bacula was doing. Each write with the exception
> > of the first block on the second tape is 64512
> > bytes:
> > 
> >   first tape mounted
> >   write(block 1)
> >   ...
> >   write(block 1554);
> >   write(block 1555);   <=== block lost
> >   ...                  <=== blocks lost
> >   write(block 1567);   <=== block lost
> >   write(block 1568) failed because of EOM detected
> >   ioctl(MTIOCERRSTAT);
> 
> What was the residual reported by MTIOCERRSTAT?  If the
> device is in buffered mode, that residual can be larger than
> the last transaction that was failed.  My guess is that either
> MTIOCERRSTAT is not properly pulling the residual out of the
> info field, or you are not backing up far enough in the data
> stream when the EOM occurs.
> 
> > I have verified that Bacula did successfully write 1567 blocks to the
> > first tape, but in reading back the tape, blocks 1555-1567 are not
> > on the tape.
> > 
> > Now, the big question is: what caused the loss of those blocks?
> > The most likely causes I can think of are:
> > 
> > 1. Bacula is doing something (e.g. MTIOCERRSTAT, or the MTBSF)
> >    to cause the data to be lost.  If this is the case, it is
> >    something specific to FreeBSD since this sequence of commands
> >    works on both Solaris and Linux (except that MTIOCERRSTAT is
> >    MTIOCLRERR on those systems).
> 
> Perhaps both Linux and Solaris force the tape drives to run in
> unbuffered mode?

Both of these systems run in synchronous write (unbuffered)
mode by default. It is possible to run with asynchronous
writes (buffered mode), but I am not aware of any 
program that does so.  The mt program can be used to set
synchronous/asynchronous writes, or other modes such
as Sys V compatibility rather than BSD style.


> 
> > 2. The SCSI driver is doing asynchronous writes (very bad) and
> >    the End of Medium is not sent to Bacula until many writes after
> >    the end of the tape.
> 
> Disabling the tape drive's write buffer kills performance.  All
> of the information required to handle buffered writes should be
> available to you.

My personal preference is for data security before performance.

If you are in fact doing asynchronous writes (buffered mode), then
Bacula will not support FreeBSD without essentially duplicating the
driver's buffering code inside Bacula -- something I don't plan
to do in the near future, if for no other reason than doing so
would mean a different driver for every operating system.

I'm not convinced that there is really much loss in performance,
and even if I am wrong (quite possibly), it can easily be
compensated for by having Bacula buffer the data itself and hand it
to a separate thread dedicated to writing, which uses synchronous
(non-buffered) writes in the OS driver.
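
The application-side buffering idea above could look roughly like the
sketch below.  It is not Bacula code: `SynchronousWriter`, its
attributes and the queue depth are hypothetical, and an in-memory
`io.BytesIO` stands in for the tape device.

```python
import io
import queue
import threading


class SynchronousWriter:
    """Hand blocks to a dedicated thread that writes them one at a time."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, device, max_queued=16):
        self._device = device
        self._queue = queue.Queue(maxsize=max_queued)
        self.error = None          # first exception raised by a write
        self.blocks_written = 0    # blocks the device has accepted
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            block = self._queue.get()
            if block is self._STOP:
                return
            if self.error is not None:
                continue  # after a failure, drain the queue without writing
            try:
                self._device.write(block)   # blocking, one block at a time
                self.blocks_written += 1
            except OSError as exc:
                self.error = exc

    def put(self, block):
        """Queue a block; blocks the caller when the queue is full."""
        if self.error is not None:
            raise self.error
        self._queue.put(block)

    def close(self):
        """Wait for queued blocks to be written and re-raise any error."""
        self._queue.put(self._STOP)
        self._thread.join()
        if self.error is not None:
            raise self.error


# Usage with an in-memory stand-in for the tape device.
tape = io.BytesIO()
writer = SynchronousWriter(tape)
for n in range(4):
    writer.put(bytes([n]) * 8)
writer.close()
print(writer.blocks_written, len(tape.getvalue()))
```

Because the writer issues one blocking write at a time, `blocks_written`
tells the application which block failed, provided the device itself is
not acknowledging writes early.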
 

How do you support tar?  Tar knows nothing about buffering --
at least not GNU tar to the best of my knowledge.

> 
> Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so
> that userland apps can control this.  It's not clear if this is
> exactly what they were created for, but it may be better to use
> these than to add some other opcodes.

>From my experience with Solaris/Linux (absolutely no problems in
3 years), I'd recommend implementing a non-buffered mode (your
MTNOCACHE I assume), and it should be the default.  In fact,
though it is certainly possible and possibly worth the effort,
I've never heard of any standard Unix program handling a 
buffered tape drive.  If you know one, I would certainly like to
know about it.

Exactly what ioctl() does what is not critical for me as I can
always code it -- what counts is that it is well documented.
Of course, the more things are standard across systems, the
easier it is to program.

Maybe I missed it, but I didn't see anything that indicated that
FreeBSD does asynchronous writes.

> 
> > 3. The SCSI driver has some sort of bug that causes buffers to be
> >    lost.
> 
> I doubt that this would occur only at EOM.

Well, if the drive is running in asynchronous write mode, then
data loss will occur in every Unix program that I know of at EOM
and any time there is a tape write error.

Could someone confirm whether or not the driver is doing 
asynchronous writes, and whether or not I can turn it off?
(I think this is the case from your email, but am not
100% sure).

Best regards,

Kern

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 14:45:05 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EF02237B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 14:45:04 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4A7FD43F75
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 14:45:03 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h51Lirv02464;
	Sun, 1 Jun 2003 23:44:53 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <2846020000.1054498114@aslan.scsiguy.com>
References: <1054490081.1582.1685.camel@rufus>
	 <2846020000.1054498114@aslan.scsiguy.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054503893.1578.1723.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 01 Jun 2003 23:44:53 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 21:45:05 -0000

Hello again,

I just re-read the Linux mt pages, and I see that
they have a setting both for async-writes and buffer-writes,
so I'm now confused about what the distinction really
is. I had assumed that if you are buffering then
the writes must be asynchronous, otherwise why would you
buffer?

Best regards,

Kern


From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 14:49:41 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2282037B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 14:49:41 -0700 (PDT)
Received: from devil.stderror.at (at00d01-adsl-194-118-044-149.nextranet.at
	[194.118.44.149])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D0E4E43F75
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 14:49:39 -0700 (PDT)
	(envelope-from pinhead@stderror.at)
Received: by devil.stderror.at (Postfix, from userid 1000)
	id 3C5D415332; Sun,  1 Jun 2003 23:49:38 +0200 (CEST)
Date: Sun, 1 Jun 2003 23:49:38 +0200
From: Toni Schmidbauer <toni@stderror.at>
To: freebsd-scsi@freebsd.org
Message-ID: <20030601214937.GB22187@devil.stderror.at>
References: <20030601174418.GA39708@titan.klemm.apsfilter.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="NDin8bjvE/0mNLFQ"
Content-Disposition: inline
In-Reply-To: <20030601174418.GA39708@titan.klemm.apsfilter.org>
User-Agent: Mutt/1.4.1i
Subject: Re: Supported Controller under FreeBSD for Sun Storedge A5000,
	FC-AL ?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: toni@stderror.at
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 21:49:41 -0000


--NDin8bjvE/0mNLFQ
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Jun 01, 2003 at 07:44:18PM +0200, Andreas Klemm wrote:
> q: is there a PCI controller available for FreeBSD 4.x and 5.x
> which is capable to connect to a Sun Storedge A5000 ?

i haven't tried that under freebsd but:

we are using the qlogic qlc2200f controller under linux and
solaris.
works like a charm, and there's a driver for 5.x and 4.8:

http://www.freebsd.org/releases/4.8R/hardware-i386.html
http://www.freebsd.org/releases/5.0R/hardware-i386.html

under linux our attached storage is an emc^2 and under sun it's
an a5000.

as far as i know, the qlc2200f understands fcal + fabric and for
the a5000 you will need fcal support, or you put it on a fc
switch port, which does fcal to fabric conversion.

> Which FreeBSD driver would it be, that supports this card ?

isp(4)

not exactly the information you asked for, but i hope it's
nevertheless useful

toni
-- 
Behandle die Menschen, als wären sie, was sie sein | toni at stderror dot at
sollten, und du wirst ihnen helfen, zu werden, was | Toni Schmidbauer
sie sein können.  - Johann Wolfgang von Goethe     |

--NDin8bjvE/0mNLFQ
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+2nTwu/mjSj7RMocRAgYuAJ9qS8sfOa9QgGAPHYBGvbMek+h9CACfVteM
0oQUD0IKaMJcJ1fiIARviiY=
=2wU5
-----END PGP SIGNATURE-----

--NDin8bjvE/0mNLFQ--

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 15:17:46 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EA1A937B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 15:17:46 -0700 (PDT)
Received: from misery.sdf.com (misery.sdf.com [207.200.153.226])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7045843F75
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 15:17:45 -0700 (PDT)
	(envelope-from tom@sdf.com)
Received: from tom (helo=localhost)
	by misery.sdf.com with local-esmtp (Exim 2.12 #1)
	id 19MZWg-0002yv-00; Sun, 1 Jun 2003 13:33:58 -0700
Date: Sun, 1 Jun 2003 13:33:47 -0700 (PDT)
From: Tom Samplonius <tom@sdf.com>
To: Andreas Klemm <andreas@klemm.apsfilter.org>
In-Reply-To: <20030601174418.GA39708@titan.klemm.apsfilter.org>
Message-ID: <Pine.BSF.4.05.10306011326560.10694-100000@misery.sdf.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@FreeBSD.org
Subject: Re: Supported Controller under FreeBSD for Sun Storedge A5000, FC-AL
 ?
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 22:17:47 -0000


  Well, FreeBSD supports Qlogic FC-AL cards (man isp).

  You have to have the matching media type though.  I have a bunch of Qlogic
ISP-2100 64bit PCI cards with a copper interface.  But rather than the
common DB9 connector, it is an RJ-like clip connector.  I'll sell them for
$50/ea.


Tom


From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 15:39:23 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9E9E437B401
	for <scsi@FreeBSD.org>; Sun,  1 Jun 2003 15:39:23 -0700 (PDT)
Received: from aslan.scsiguy.com (mail.scsiguy.com [63.229.232.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8A7E543F3F
	for <scsi@FreeBSD.org>; Sun,  1 Jun 2003 15:39:22 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106])
	by aslan.scsiguy.com (8.12.8/8.12.8) with ESMTP id h51MdMIh025446;
	Sun, 1 Jun 2003 16:39:22 -0600 (MDT)
	(envelope-from gibbs@scsiguy.com)
Date: Sun, 01 Jun 2003 16:39:22 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>
Message-ID: <2897610000.1054507162@aslan.scsiguy.com>
In-Reply-To: <1054503429.1578.1715.camel@rufus>
References: <1054490081.1582.1685.camel@rufus>
	<2846020000.1054498114@aslan.scsiguy.com> <1054503429.1578.1715.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: scsi@FreeBSD.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jun 2003 22:39:24 -0000

>> Perhaps both Linux and Solaris force the tape drives to run in
>> unbuffered mode?
> 
> Both of these systems run in synchronous write (unbuffered)
> mode by default. It is possible to run with asynchronous
> writes (buffered mode), but I am not aware of any 
> program that does so.  The mt program can be used to set
> synchronous/asynchronous writes, or other modes such
> as Sys V compatibility rather than BSD style.

Does Solaris have the drvbuffer command that is in Linux?

>> > 2. The SCSI driver is doing asynchronous writes (very bad) and
>> >    the End of Medium is not sent to Bacula until many writes after
>> >    the end of the tape.
>> 
>> Disabling the tape drive's write buffer kills performance.  All
>> of the information required to handle buffered writes should be
>> available to you.
> 
> My personal preference is for data security before performance.

There is no potential for lost data if you handle the status that
is presented to you.

> If you are in fact doing asynchronous writes (buffered mode), then
> Bacula will not support FreeBSD without essentially duplicating the
> driver's buffering code inside Bacula -- something I don't plan
> to do in the near future, if for no other reason than doing so
> would mean a different driver for every operating system.

The tape driver doesn't have any buffering code (unlike Linux which
does).  The tape drive has a buffer.  We are just enabling the use
of that buffer.  If you really want to do this simply, just do a
write filemarks of 0 marks every time you are about to switch input
files.  The write marks flushes the device's buffer and guarantees
that any residual will be within the fd that you are currently using.
This would imply that you only need to explicitly buffer if you support
backups from stdin.
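
The effect of flushing at file boundaries can be illustrated with a toy
model.  This is a simulation under stated assumptions, not the sa
driver or any real drive: `SimulatedTape`, `backup` and the capacity
and buffer sizes are all hypothetical, and the model simply assumes the
drive acknowledges a write as soon as it is buffered.

```python
class SimulatedTape:
    """Toy model of a drive that acknowledges writes before they hit tape."""

    def __init__(self, capacity_blocks, buffer_blocks):
        self.capacity = capacity_blocks    # blocks that fit on the medium
        self.buffer_limit = buffer_blocks  # blocks the drive may hold back
        self.on_medium = []                # blocks actually on tape
        self.buffered = []                 # acknowledged, not yet on tape

    def _drain(self, keep):
        """Move buffered blocks to tape until only `keep` remain.

        Returns the blocks that did not fit (the residual); in this
        model they are dropped from the buffer once reported.
        """
        while len(self.buffered) > keep:
            if len(self.on_medium) >= self.capacity:
                residual, self.buffered = self.buffered, []
                return residual
            self.on_medium.append(self.buffered.pop(0))
        return []

    def write(self, block):
        """Accept a block; return residual blocks if EOM was hit."""
        self.buffered.append(block)
        return self._drain(keep=self.buffer_limit)

    def write_filemarks_zero(self):
        """Model of writing 0 filemarks: force the buffer out to tape."""
        return self._drain(keep=0)


def backup(tape, files, flush_between_files):
    """Write files block by block.

    Returns (name_of_file_being_written, residual) on EOM, else None.
    """
    for name, blocks in files:
        for block in blocks:
            residual = tape.write((name, block))
            if residual:
                return name, residual
        if flush_between_files:
            residual = tape.write_filemarks_zero()
            if residual:
                return name, residual
    residual = tape.write_filemarks_zero()
    return (files[-1][0], residual) if residual else None


files = [("a", range(5)), ("b", range(5))]
for flush in (True, False):
    tape = SimulatedTape(capacity_blocks=4, buffer_blocks=3)
    print(flush, backup(tape, files, flush_between_files=flush))
```

With flushing, the only residual block in this model belongs to file
"a" and is reported while "a" is still the current file.  Without it,
EOM is reported while writing "b" and the residual also contains a
block of "a", which the application would have to have kept around.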

> I'm not convinced that there is really much loss in performance,
> and even if I am wrong (quite possibly) 
> it can be easily compensated by having Bacula
> buffer itself and using a separate thread dedicated to writing
> and using synchronous (non-buffered) writes in the OS driver.

You can never recover the round trip time on the SCSI bus unless
you either have a device that allows you to queue more than one
command at a time or that buffers.  I believe that only FC tape
devices support queuing more than one command at a time, but few
programs support this anyway (unless you lie and say that a previous
write has completed).
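
A back-of-the-envelope model of the round-trip cost, as a sketch only.
The streaming rate and the per-command latency below are hypothetical
numbers, not measurements; only the 64512-byte block size comes from
this thread.  The model also ignores the extra repositioning time a
drive may need when it stops streaming.

```python
def effective_rate(block_bytes, media_rate, round_trip_s, overlapped):
    """Sustained bytes/second for back-to-back writes of one block size.

    media_rate   -- bytes/second the drive can stream to tape
    round_trip_s -- per-command latency on the bus, in seconds
    overlapped   -- True if the drive buffers (or queues) so the next
                    command's round trip overlaps with the tape write
    """
    write_s = block_bytes / media_rate
    if overlapped:
        per_block = max(write_s, round_trip_s)
    else:
        per_block = write_s + round_trip_s
    return block_bytes / per_block


BLOCK = 64512           # block size used in this thread
MEDIA_RATE = 1_000_000  # hypothetical streaming rate, bytes/second
ROUND_TRIP = 0.005      # hypothetical per-command latency, seconds

unbuffered = effective_rate(BLOCK, MEDIA_RATE, ROUND_TRIP, overlapped=False)
buffered = effective_rate(BLOCK, MEDIA_RATE, ROUND_TRIP, overlapped=True)
print(f"unbuffered: {unbuffered:.0f} B/s, buffered: {buffered:.0f} B/s")
```

In the unbuffered case every block pays the full round trip on top of
the write itself, no matter how much the application buffers ahead.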

> How do you support tar?  Tar knows nothing about buffering --
> at least not GNU tar to the best of my knowledge.

I think few people use tar for multi-volume backups unless they
specify a specific tape length, but I really don't know.

>> Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so
>> that userland apps can control this.  It's not clear if this is
>> exactly what they were created for, but it may be better to use
>> these than to add some other opcodes.
> 
> From my experience with Solaris/Linux (absolutely no problems in
> 3 years), I'd recommend implementing a non-buffered mode (your
> MTNOCACHE I assume), and it should be the default.  In fact,
> though it is certainly possible and possibly worth the effort,
> I've never heard of any standard Unix program handling a 
> buffered tape drive.  If you know one, I would certainly like to
> know about it.

Standard program?  I don't know about that, but the commercial
apps have always supported buffered mode.

> Exactly what ioctl() does what is not critical for me as I can
> always code it -- what counts is that it is well documented.
> Of course, the more things are standard across systems, the
> easier it is to program.

It's not clear to me that there is a standard.

> Maybe I missed it, but I didn't see anything that indicated that
> the FreeBSD does asynchronous writes.

>From looking at the sa driver, it appears that it always tries to
do buffered writes unless there is a device quirk indicating that
mode select doesn't work.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 17:00:24 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3933537B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 17:00:24 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 767AD43FAF
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 17:00:23 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h5200Iqw097301;
	Sun, 1 Jun 2003 17:00:22 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Sun, 1 Jun 2003 17:00:18 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054503893.1578.1723.camel@rufus>
Message-ID: <20030601165751.H97138@beppo>
References: <1054490081.1582.1685.camel@rufus>
	<2846020000.1054498114@aslan.scsiguy.com>
	<1054503893.1578.1723.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 00:00:24 -0000

Of course linux has async && buffered. Linux has to copy the data from
user space to kernel buffers and *then* write them. This leads to an
obvious desire to overlap such writes. The same feature was available in
Solaris 2.5 as well.

'Buffering' as we talk about here typically means the device buffers
themselves. You don't want to turn this off. You don't want to turn this
off. You don't want to turn this off. The only device that I know of
that really *has* to have this off is the old M4 1/2" reel drive because
it would discard buffered data when it saw the early warning marker.

I have a longer answer to the previous mail about to go out.

-matt

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 17:13:47 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6A70F37B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 17:13:47 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 623FA43F3F
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 17:13:46 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h520Djqw097367;
	Sun, 1 Jun 2003 17:13:45 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Sun, 1 Jun 2003 17:13:45 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <20030601124620.S18592@root.org>
Message-ID: <20030601163730.T97138@beppo>
References: <20030601124620.S18592@root.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 00:13:47 -0000


Hello, I'm the author of the SA driver. This specific case is something
I have indeed tried to handle correctly, but I could have missed
something. In particular I've been wary of devices in fixed block mode.

The executive summary: I need more info. I need to know:

	a) Was the tape device in fixed or variable block mode?

	b) you claim to have lost blocks 1555..1567, and that
	1568 was the signifier to change tapes. Are these tape
	blocks reflective of single 'write' requests? Or are
	these multiple tape records issued in one write?

	c) What was the signifier you got that indicated that it
	was time to change tapes (viz block 1568)? -1 and an errno
	set? A residual that indicated that some data that you
	had requested to be written had not been written?


	d) Other general info about whether you were indeed using
	the 'no-rewind' device, whether you'd changed the default
	EOT model (from 'dual filemark' to 'single filemark'- you
	*have* read the man pages, yes? :-))


There is one case I'm also worried about. This is from sa.c:saerror:

        if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
                if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
                        csio->resid = resid;
                        error = ENOSPC;
                } else if (sense->flags & SSD_EOM) {
                        softc->flags |= SA_FLAG_EOM_PENDING;
                        /*
                         * Grotesque as it seems, the few times
                         * I've actually seen a non-zero resid,
                         * the tape drive actually lied and had
                         * written all the data!
                         */
                        csio->resid = 0;
                }

This is saying: if we were writing, and we got SSD_KEY_VOLUME_OVERFLOW,
we're at hard EOT- we have to assume we didn't write *any* data
for this last operation, and we return an errno.

Otherwise, if early warning was spotted, mark EOM pending, but *don't*
believe the residual field.

Every tape drive I tested with (around 7 or 8 of them) had, when
presenting a non-zero residual, lied about what it had actually put
on the tape.

What I'm obviously worried about here is whether or not your tape drive
was correct in reporting a residual. This would indeed fit your data.
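
To make the concern concrete, here is a minimal stand-alone model of the
branch quoted above. It is a sketch, not driver code: classify_write(),
struct write_result and the KEY_*/FLAG_EOM constants are hypothetical
stand-ins for the CAM definitions, and their numeric values are assumed
rather than copied from the headers. It shows that a drive which
truthfully reports a residual at early warning has that residual zeroed,
so the caller sees a complete write.

```c
#include <errno.h>

/* Stand-ins for the CAM sense definitions; the values are assumptions. */
enum { KEY_NO_SENSE = 0x00, KEY_VOLUME_OVERFLOW = 0x0d };
enum { FLAG_EOM = 0x40 };

struct write_result {
    int  error;       /* errno-style value handed back to the caller */
    long resid;       /* bytes reported to the caller as not written */
    int  eom_pending; /* set once early warning has been seen */
};

/* Mirrors only the two write branches of the excerpt above. */
static struct write_result
classify_write(int sense_key, int sense_flags, long drive_resid)
{
    struct write_result r = { 0, drive_resid, 0 };

    if (sense_key == KEY_VOLUME_OVERFLOW) {
        /* Hard EOT: believe the residual and fail the write. */
        r.error = ENOSPC;
    } else if (sense_flags & FLAG_EOM) {
        /* Early warning: remember it, but discard the residual. */
        r.eom_pending = 1;
        r.resid = 0;
    }
    return r;
}
```

With the EOM flag set and a drive residual of 64512, the model returns
error 0 and resid 0. If the drive was telling the truth, that is exactly
the case where a record would be counted as written but never reach tape.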

I'm pretty sure I also tested my EOT test program with an Archive
autoloader- but I don't remember for sure.


Other points:

> However, more recently Dan Langille did some extensive
> testing writing a 6GB file to six tapes. This brought
> out additional problems of the driver "freezing" the tape,

If the tape is 'freezing' it means that tape position was lost.
Under what circumstances did this occur?


-matt

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 18:58:45 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D59F737B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 18:58:45 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1B50443F3F
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 18:58:44 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 286DC3F4F; Sun,  1 Jun 2003 21:58:43 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Sun, 01 Jun 2003 21:58:43 -0400
MIME-Version: 1.0
Message-ID: <3EDA7713.25862.C5BD5952@localhost>
Priority: normal
References: <20030601124620.S18592@root.org>
In-reply-to: <20030601163730.T97138@beppo>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 01:58:46 -0000

On 1 Jun 2003 at 17:13, Matthew Jacob wrote:

> I'm pretty sure I also tested my EOT test program with an Archive
> autoloader- but I don't remember for sure.

Would it help if I tested your EOT test program with this drive?
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 19:03:25 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 712EC37B401
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 19:03:25 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3F26D43F93
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 19:03:24 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h5223Nqw097842;
	Sun, 1 Jun 2003 19:03:23 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Sun, 1 Jun 2003 19:03:23 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Dan Langille <dan@langille.org>
In-Reply-To: <3EDA7713.25862.C5BD5952@localhost>
Message-ID: <20030601190003.H49295@wonky.in0.lcl>
References: <20030601124620.S18592@root.org>
	<3EDA7713.25862.C5BD5952@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 02:03:25 -0000


Absolutely. I gave the BitKeeper URL of my toolkit, but here's a URL to
pull just the tape pattern test from:

http://people.freebsd.org/~mjacob/tape_pattern_tester.c

I *would* like to know what the output of 'mt status' on that drive is
too.

-matt


On Sun, 1 Jun 2003, Dan Langille wrote:

> On 1 Jun 2003 at 17:13, Matthew Jacob wrote:
>
> > I'm pretty sure I also tested my EOT test program with an Archive
> > autoloader- but I don't remember for sure.
>
> Would it help if I tested your EOT test program with this drive?
> --
> Dan Langille : http://www.langille.org/
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Sun Jun  1 21:43:33 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6FB3437B404
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 21:43:33 -0700 (PDT)
Received: from bunyip.cc.uq.edu.au (bunyip.cc.uq.edu.au [130.102.2.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 45F7A43F93
	for <freebsd-scsi@freebsd.org>; Sun,  1 Jun 2003 21:43:32 -0700 (PDT)
	(envelope-from dougg@torque.net)
Received: from torque.net (d-242-41.stlucia.uq.net.au [203.101.242.41])
	by bunyip.cc.uq.edu.au (8.12.9/8.12.9) with ESMTP id h524hRhp023908;
	Mon, 2 Jun 2003 14:43:30 +1000 (GMT+1000)
Message-ID: <3EDAD5B6.5040308@torque.net>
Date: Mon, 02 Jun 2003 14:42:30 +1000
From: Douglas Gilbert <dougg@torque.net>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: ballen@gravity.phys.uwm.edu
Subject: smartmontools port
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: dougg@torque.net
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 04:43:33 -0000

"The smartmontools package contains two utility
programs (smartctl and smartd) to control and
monitor storage systems using the Self-Monitoring,
Analysis and Reporting Technology System (S.M.A.R.T.)
built into most modern ATA and SCSI hard disks."
See http://smartmontools.sourceforge.net for more
details.

Currently it only supports Linux but the maintainer,
Bruce Allen <ballen@gravity.phys.uwm.edu>, has
received patches for a FreeBSD port for ATA disks.
[Those patches are not in the project's CVS yet.]

As yet no-one has proposed or offered any patches for
a FreeBSD port of the SCSI code. For SCSI specific
information about smartmontools together with examples
see this url:
http://smartmontools.sourceforge.net/smartmontools_scsi.html

I have rewritten the SCSI command handling code and
Kai Makisara has added code to support the TapeAlert
mechanism. The Linux SCSI command handling details
are hidden behind a CAM like structure.
This should facilitate a clean port of this code.
Other broader issues would need addressing (e.g. the
assumptions made at higher levels about device names
being SCSI or ATA devices).

If anyone wishes to volunteer or look at this please
contact me or Bruce. We would also be interested if
FreeBSD has any other utilities that provide SMART
facilities.

Doug Gilbert

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 01:28:54 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D8BBE37B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 01:28:52 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 81EDF43FA3
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 01:28:50 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h528Scv04594;
	Mon, 2 Jun 2003 10:28:39 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030601163730.T97138@beppo>
References: <20030601124620.S18592@root.org>  <20030601163730.T97138@beppo>
Content-Type: text/plain
Organization: 
Message-Id: <1054542517.1578.1770.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 02 Jun 2003 10:28:38 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 08:28:55 -0000

Hello,

Yes, I've seen both your name and Justin Gibbs's in the FreeBSD
documentation (partial answer to question d below).


On Mon, 2003-06-02 at 02:13, Matthew Jacob wrote:
> Hello, I'm the author of the SA driver. This specific case is something
> I have indeed tried to handle correctly, but I could have missed
> something. In particular I've been wary of devices in fixed block mode.
> 
> The executive summary: I need more info. I need to know:
> 
> 	a) was the tape device in fixed or variable block mode

The tape drive was in variable block mode.  However at the time of
the error all the blocks Bacula was writing were the same size
i.e. 64512 bytes.

> 
> 	b) you claim to have lost blocks 1555..1567, and that
> 	1568 was the signifier to change tapes. Are these tape
> 	blocks reflective of single 'write' requests? Or are
> 	these multiple tape records issued in one write?

All Bacula writes are single non-buffered writes of what I call a
block (a single tape record).  By the way, these block numbers are
Bacula block numbers, counting from 1 and starting at the last EOF that
Bacula wrote or at the beginning of the tape. Each block contains
among other things the block number -- this allowed us to identify
which blocks were missing with certainty.  In addition, Bacula
increments the block number at only one place in the code -- after
a "successful" write.  At the end of the tape Bacula writes the final
block number to its database -- we found the correct block number
(the last of the missing blocks) in the Bacula database thus "proving"
that Bacula actually wrote the blocks.

> 
> 	c) What was the signifier you got that indicated that it
> 	was time to change tapes (viz block 1568)? -1 and an errno
> 	set? A residual that indicated that some data that you
> 	had requested to be written had not been written.

Bacula stops writing and changes tapes under a single condition:
the return status from write() does not equal the number of bytes
requested to be written.  

Then a bit of analysis is done and the reason is reported (write error,
end of medium, ...). The effect is the same whether it was an
end of tape or an I/O error.  In this particular
case I cannot say with 100% assurance what happened, but I believe
that Bacula received a -1 status and errno was ENOSPC.

If this point is critical, we can re-run the test with debug code
inserted to give us the exact status that was returned.  It is a
bit of work for us both, but if it is important, say so and we
will do it.
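
As a rough illustration of the kind of classification such debug code
could log (a sketch only, not Bacula's actual code; the function
classify_write_status() and the enum names are made up for this example):

```c
#include <errno.h>
#include <sys/types.h>

enum write_outcome {
    WRITE_OK,            /* status equals the number of bytes requested */
    WRITE_END_OF_MEDIUM, /* -1 with errno == ENOSPC */
    WRITE_SHORT,         /* fewer bytes accepted than requested */
    WRITE_IO_ERROR       /* -1 with any other errno */
};

/*
 * requested: byte count passed to write()
 * status:    value write() returned
 * err:       errno saved immediately after the call
 */
static enum write_outcome
classify_write_status(size_t requested, ssize_t status, int err)
{
    if (status >= 0 && (size_t)status == requested)
        return WRITE_OK;
    if (status < 0 && err == ENOSPC)
        return WRITE_END_OF_MEDIUM;
    if (status >= 0)
        return WRITE_SHORT;
    return WRITE_IO_ERROR;
}
```

Anything other than WRITE_OK leads to the same action (stop writing and
change tapes); the distinction only matters for the report.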

> 
> 
> 	d) Other general info about whether you were indeed using
> 	the 'no-rewind' device, whether you'd changed the default
> 	EOT model (from 'dual filemark' to 'single filemark'- you
> 	*have* read the man pages, yes? :-))

Yes, we were using the no rewind device (though this makes absolutely
no difference to Bacula).  The EOT model was 2 EOF's I am sure because
I questioned Dan on that point and he proved it was 2.

Yes, I have read the man pages several times in detail as well as the
man pages for Linux and Solaris. 


> 
> 
> 
> There is one case I'm also worried about. This is from sa.c:saerror:
> 
>        if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
>                 if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
>                         csio->resid = resid;
>                         error = ENOSPC;
>                 } else if (sense->flags & SSD_EOM) {
>                         softc->flags |= SA_FLAG_EOM_PENDING;
>                         /*
>                          * Grotesque as it seems, the few times
>                          * I've actually seen a non-zero resid,
>                          * the tape drive actually lied and had
>                          * written all the data!
>                          */
>                         csio->resid = 0;
>                 }
> 
> This is saying: if we were writing, and we got SSD_KEY_VOLUME_OVERFLOW,
> we're at hard EOT- we have to assume we didn't write *any* data
> for this last operation, and we return an errno.
> 
> Otherwise, if early warning was spotted, mark EOM pending, but *don't*
> believe the residual field.

In both Solaris and Linux, they immediately notify Bacula with an
errno=ENOSPC (or at least a -1 status) when the early warning is hit.
When this happens, the write was not successful. I then immediately
clear the error status and write two EOF marks (one would be sufficient)
and change tapes.

> 
> Every tape drive I tested with (around 7 or 8 of them) had, when
> presenting a non-zero residual, lied about what it had actually put
> on the tape.
> 
> What I'm obviously worried about here is whether or not your tape drive
> was correct in reporting a residual. This would indeed fit your data.
> 
> I'm pretty sure I also tested my EOT test program with an Archive
> autoloader- but I don't remember for sure.

As long as the full record is not successfully written to the tape,
Bacula will be 100% data correct because any "short" block that Bacula
reads is discarded. 

Logically, if a partial or full record is written to tape and an early
EOM mark is detected, you should return the bytes written and set a
flag. On the next write, you should report no data written and
errno=ENOSPC.

That will ensure that every program knows exactly where it is.
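
A small model of that two-step reporting, as a sketch under stated
assumptions: struct tape_model and model_write() are hypothetical names,
the "tape" is just a byte counter, and "no data written and errno=ENOSPC"
is modelled as a -1 return with the error code set, which is one possible
reading of the proposal.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct tape_model {
    size_t early_warning; /* byte offset of the early EOM mark (assumed) */
    size_t written;       /* bytes put on tape so far */
    int    eom_pending;   /* flag set when the mark has been passed */
};

/*
 * Accept the record that crosses the early warning in full and only
 * set a flag; refuse the following write with ENOSPC.  Clearing the
 * flag (e.g. to write trailing EOF marks) is outside this model.
 */
static ssize_t
model_write(struct tape_model *t, size_t len, int *err)
{
    *err = 0;
    if (t->eom_pending) {
        *err = ENOSPC;
        return -1;              /* nothing written this time */
    }
    t->written += len;          /* the record does reach the tape */
    if (t->written >= t->early_warning)
        t->eom_pending = 1;     /* report it on the next write */
    return (ssize_t)len;
}
```

With an early warning at 100 bytes and 60-byte records, the first two
writes return 60 and the third returns -1 with ENOSPC, so the caller
knows that exactly two records are on the tape.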

> 
> 
> 
> Other points:
> 
> > However, more recently Dan Langille did some extensive
> > testing writing a 6GB file to six tapes. This brought
> > out additional problems of the driver "freezing" the tape,
> 
> If the tape is 'freezing' it means that tape position was lost.
> Under what circumstances did this occur?

Yes, the tape is "freezing".  I do not believe that it is freezing
during the writing, but it apparently freezes during Bacula's check to
see whether or not the last block was correctly written.  This check
fails on FreeBSD probably for two reasons: 1. you freeze the tape. 
2. your handling of EOF marks does not correspond to what Solaris/Linux
does.

Point 1 freezing of the tape:
At EOM (or I/O error) Bacula writes two EOF marks, backspaces over them,
backspaces a record then rereads the record and compares it to the last
block successfully written. This works perfectly on Solaris/Linux, but
does not work on FreeBSD. One reason is that I believe you freeze the
tape on the backspace record.

Point 2 handling of EOF marks:
FreeBSD's handling of EOF marks is quite different from Solaris/Linux
in the sense that Solaris/Linux is "transparent" -- the program writer
never sees the extra EOF marks. In FreeBSD the EOF marks that the driver
adds are visible to the program (causing Bacula great problems, most of
which I have programmed around).

Basically the best I can determine after Bacula writes its two EOF
marks, FreeBSD adds another one, but leaves the tape positioned after
the EOF mark it wrote rather than before it.  When Solaris/Linux add
an EOF mark in the driver, they always backspace over it and leave
you positioned "correctly".

As an example: if I write:

  write()
  EOF
  EOF
  ioctl(MTBSF)
  ioctl(MTBSF)
  ioctl(MTBSR)

On Linux/Solaris I end up positioned just before the last
write. On FreeBSD, I seem to always end up positioned *after* the
last write, which I claim in BSD tape mode is "incorrect" (i.e. not
expected).
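
The difference can be reproduced with a toy model of the tape. This is
only a sketch of the behaviour described above, not a statement about
what the sa driver really does: the tape is an array of records and
filemarks, and the extra_driver_marks parameter stands for the
hypothesis that the driver appends one more EOF mark and stays
positioned after it. All names here are invented for the example.

```c
#include <stddef.h>

enum item { REC, FM };          /* a data record or a filemark */
enum { TAPE_MAX = 16 };

struct tape {
    enum item items[TAPE_MAX];
    size_t len;                 /* number of items on the tape */
    size_t pos;                 /* head sits just before items[pos] */
};

/* Write one item at the head; anything after it is lost. */
static void put(struct tape *t, enum item it)
{
    if (t->pos >= TAPE_MAX)
        return;
    t->items[t->pos++] = it;
    t->len = t->pos;
}

/* Backspace one filemark: stop on the beginning-of-tape side of it. */
static void bsf(struct tape *t)
{
    while (t->pos > 0) {
        t->pos--;
        if (t->items[t->pos] == FM)
            return;
    }
}

/*
 * Backspace one record.  If the item behind the head is a filemark,
 * movement stops on the beginning-of-tape side of that mark instead,
 * so in this model the head moves back over exactly one item.
 */
static void bsr(struct tape *t)
{
    if (t->pos > 0)
        t->pos--;
}

/* write, EOF, EOF, [extra marks], BSF, BSF, BSR; returns the head index. */
static size_t final_position(int extra_driver_marks)
{
    struct tape t = { { REC }, 0, 0 };

    put(&t, REC);
    put(&t, FM);
    put(&t, FM);
    for (int i = 0; i < extra_driver_marks; i++)
        put(&t, FM);
    bsf(&t);
    bsf(&t);
    bsr(&t);
    return t.pos;
}
```

In this model final_position(0) is 0 (just before the record) and
final_position(1) is 1 (just after it), which matches the two outcomes
described above.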

Actually, if Linux/Solaris are intelligent, they will not write a third
EOF mark.  If I had only written one EOF mark, they would have added a
second one, but backspaced over it before returning control to me.


From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 01:29:58 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EDE8E37B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 01:29:57 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DCDCB43F85
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 01:29:55 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h528Tlv04599;
	Mon, 2 Jun 2003 10:29:47 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030601165751.H97138@beppo>
References: <1054490081.1582.1685.camel@rufus>
	 <2846020000.1054498114@aslan.scsiguy.com>
	 <1054503893.1578.1723.camel@rufus>  <20030601165751.H97138@beppo>
Content-Type: text/plain
Organization: 
Message-Id: <1054542587.1578.1772.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 02 Jun 2003 10:29:47 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 08:29:58 -0000

Yes, after a bit more thought, I realized this was the
case, thanks.

In any case, both buffered and async writes are turned off by
default.

On Mon, 2003-06-02 at 02:00, Matthew Jacob wrote:
> Of course linux has async && buffered. Linux has to copy the data from
> user space to kernel buffers and *then* write them. This leads to an
> obvious desire to overlap such writes. The same feature was available in
> Solaris 2.5 as well.
> 
> 'Buffering' as we talk about here typically means the device buffers
> themselves. You don't want to turn this off. You don't want to turn this
> off. You don't want to turn this off. The only device that I know of
> that really *has* to have this off is the old M4 1/2" reel drive because
> it would discard buffered data when it saw the early warning marker.
> 
> I have a longer answer to the previous mail about to go out.
> 
> -matt

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 01:57:53 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1769E37B404
	for <scsi@FreeBSD.org>; Mon,  2 Jun 2003 01:57:53 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4A62543F85
	for <scsi@FreeBSD.org>; Mon,  2 Jun 2003 01:57:51 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h528vgv04658;
	Mon, 2 Jun 2003 10:57:42 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <2897610000.1054507162@aslan.scsiguy.com>
References: <1054490081.1582.1685.camel@rufus>
	 <2846020000.1054498114@aslan.scsiguy.com>
	 <1054503429.1578.1715.camel@rufus>
	 <2897610000.1054507162@aslan.scsiguy.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054544261.1578.1801.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 02 Jun 2003 10:57:42 +0200
Content-Transfer-Encoding: 7bit
cc: scsi@FreeBSD.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 08:57:53 -0000

On Mon, 2003-06-02 at 00:39, Justin T. Gibbs wrote:
> >> Perhaps both Linux and Solaris force the tape drives to run in
> >> unbuffered mode?
> > 
> > Both of these systems run in synchronous write (unbuffered)
> > mode by default. It is possible to run with asynchronous
> > writes (buffered mode), but I am not aware of any 
> > program that does so.  The mt program can be used to set
> > synchronous/asynchronous writes, or other modes such
> > as Sys V compatibility rather than BSD style.
> 
> Does Solaris have the drvbuffer command that is in Linux?

I'm not 100% sure -- they have just about everything, and their
documentation is very good.  All their documentation is online
at http://docs.sun.com  -- their AnswerBook.  However, if you have
not read their mt documentation, I recommend it -- that is the
definition of what I consider the "correct" driver behavior.

See for example: http://docs.sun.com/db/doc/802-5747-07/6i9g1cn4u?a=view

For me, it is the bible. Unfortunately, not all Unices behave
like that.

> 
> >> > 2. The SCSI driver is doing asynchronous writes (very bad) and
> >> >    the End of Medium is not sent to Bacula until many writes after
> >> >    the end of the tape.
> >> 
> >> Disabling the tape drive's write buffer kills performance.  All
> >> of the information required to handle buffered writes should be
> >> available to you.
> > 
> > My personal preference is for data security before performance.
> 
> There is no potential for lost data if you handle the status that
> is presented to you.

Could you explain that in more detail?  If you mean digging into the
OS/driver-specific details of an MTIOCERRSTAT packet, that *shouldn't*
be necessary -- at least it is not necessary on Solaris/Linux to
guarantee data integrity.

> 
> > If you are in fact doing asynchronous writes (buffered mode), then
> > Bacula will not support FreeBSD without essentially duplicating the
> > driver's buffering code inside Bacula -- something I don't plan
> > to do in the near future, if for no other reason than doing so
> > would mean a different driver for every operating system.
> 
> The tape driver doesn't have any buffering code (unlike Linux which
> does).  The tape drive has a buffer.  We are just enabling the use
> of that buffer.  If you really want to do this simply, just do a
> write filemarks of 0 marks every time you are about to switch input
> files.  The write marks flushes the device's buffer and guarantees
> that any residual will be within the fd that you are currently using.
> This would imply that you only need to explicitly buffer if you support
> backups from stdin.

I don't mind if the tape drive buffers data as long as it writes
*all* of that data to the tape and informs me on the next write
that the LEOM (logical EOM in Solaris parlance, or early EOM)
has been hit.

If the drive cannot write *all* the data it has accepted to the
tape because of the EOM or whatever (I/O error), then I *much*
prefer to turn that mode off and write a block at a time.
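
For reference, the zero-count filemark write suggested in the quoted
text might look like the following from user land. It is a minimal
sketch: flush_drive_buffer() is a hypothetical helper, and it assumes
the driver passes an MTWEOF count of 0 through to the drive as a
WRITE FILEMARKS command rather than rejecting it.

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/*
 * Ask the drive to commit whatever it has buffered by writing zero
 * filemarks.  Returns 0 on success, or -1 with errno set, in which
 * case some buffered data may not have reached the tape.
 */
static int flush_drive_buffer(int fd)
{
    struct mtop op;

    op.mt_op = MTWEOF;   /* write filemarks ... */
    op.mt_count = 0;     /* ... but none: flush only (assumed) */
    return ioctl(fd, MTIOCTOP, &op);
}
```

Calling this before switching input files would keep any residual
within the file currently being written, as described in the quoted
text.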

In such a single-write, non-buffered mode Bacula is faster
than Networker, which for the moment is good enough for me. I
think that I can get even more speed by internally buffering and
possibly using asynchronous writes -- but that is for the pretty
far future and will undoubtedly be OS dependent since there seems
to be no standard interface for enabling/disabling such modes.

> 
> > I'm not convinced that there is really much loss in performance,
> > and even if I am wrong (quite possibly) 
> > it can be easily compensated by having Bacula
> > buffer itself and using a separate thread dedicated to writing
> > and using synchronous (non-buffered) writes in the OS driver.
> 
> You can never recover the round trip time on the SCSI bus unless
> you either have a device that allows you to queue more than one
> command at a time or that buffers.  I believe that only FC tape
> devices support queuing more than one command at a time, but few
> programs support this anyway (unless you lie and say that a previous
> write has completed).

I can see that performance concerns you because you wrote the
driver, but for me (and most users I believe) what counts is
data integrity first and then performance.  In addition for
me as a systems applications writer, I look for the common
denominator so that my program will work on the largest number
of machines.  Writing to a specific machine is very difficult
for me since I only have access to Linux and at times
Solaris machines with tape drives.

> 
> > How do you support tar?  Tar knows nothing about buffering --
> > at least not GNU tar to the best of my knowledge.
> 
> I think few people use tar for multi-volume backups unless they
> specify a specific tape length, but I really don't know.

I'm beginning to understand why Amanda doesn't handle multi-volume
backups.  I guess I can tell FreeBSD users that they can use the
tape drive *if* they specify a tape length, but that seems a pity.

> 
> >> Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so
> >> that userland apps can control this.  It's not clear if this is
> >> exactly what they were created for, but it may be better to use
> >> these than to add some other opcodes.
> > 
> > From my experience with Solaris/Linux (absolutely no problems in
> > 3 years), I'd recommend implementing a non-buffered mode (your
> > MTNOCACHE I assume), and it should be the default.  In fact,
> > though it is certainly possible and possibly worth the effort,
> > I've never heard of any standard Unix program handling a 
> > buffered tape drive.  If you know one, I would certainly like to
> > know about it.
> 
> Standard program?  I don't know about that, but the commercial
> apps have always supported buffered mode.

Well, in the case of Networker on Solaris, that hasn't helped them
much -- in any case, I *will* support buffered mode someday 
even if it is my own buffering.

> 
> > Exactly what ioctl() does what is not critical for me as I can
> > always code it -- what counts is that it is well documented.
> > Of course, the more things are standard across systems, the
> > easier it is to program.
> 
> It's not clear to me that there is a standard.

Yes, it is a pity, isn't it?  I'm certainly not blaming anyone,
least of all you.

> 
> > Maybe I missed it, but I didn't see anything that indicated that
> > FreeBSD does asynchronous writes.
> 
> From looking at the sa driver, it appears that it always tries to
> do buffered writes unless there is a device quirk indicating that
> mode select doesn't work.

Hmmm. Well short term, it looks like the user must specify the
size -- something almost impossible to do with any precision given
hardware compression on drives these days.  In the longer run, I
hope you will consider either turning off buffering by default or
at least letting me (in user land) do so.

Best regards,

Kern

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 03:45:38 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 49EC037B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 03:45:38 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A91A643FA3
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 03:45:36 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h52AjQv04974;
	Mon, 2 Jun 2003 12:45:26 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <2846020000.1054498114@aslan.scsiguy.com>
References: <1054490081.1582.1685.camel@rufus>
	 <2846020000.1054498114@aslan.scsiguy.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054550725.1582.1859.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 02 Jun 2003 12:45:26 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Differences between Solaris/Linux and FreeBSD
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 10:45:38 -0000

On Sun, 2003-06-01 at 22:08, Justin T. Gibbs wrote:
> > Hello,
> > 
> > I'm the author of a GPL'ed network backup program called
> > Bacula (www.bacula.org). For the last three years, it
> > has been working flawlessly on Solaris and Linux systems.
> > When users attempted to use it recently on FreeBSD,
> > it did not work. I subsequently modified Bacula so that
> > it would work on FreeBSD -- basically, I had to program
> > around some important differences in the way FreeBSD 
> > handles EOFs compared to Solaris and Linux.  At some point
> > in the future, I would like to discuss the problems
> > I had in detail, if that interests you.
> 
> I would be interested as I'm sure would other readers of this
> list.

...

As promised, in this email, I will try my best to describe
the differences I found between Solaris/Linux and FreeBSD
concerning tape handling. There were five separate areas
where I noticed differences:

1. On Solaris/Linux, the default behavior for ioctl(MTEOM)
   is to run in what they call slow mode. In this mode, the
   tape is positioned to the end of the data, and the driver
   returns the correct file number in the MTIOCGET packet.
   It is possible to enable fast-EOM, but no one uses it to
   my knowledge.

   On FreeBSD, you apparently always use the fast-EOM so that
   the tape position is unknown after the ioctl().

   Bacula always knows how many files are on a tape, and when
   appending to a tape that is already written and newly opened,
   it MUST know where it is on the tape. As a consequence, on
   FreeBSD, I must explicitly use MTFSF with read()s in between
   to position to the end of the tape -- a fairly slow affair.

   Note, on FreeBSD, the user must explicitly tell Bacula not
   to use the MTEOM function with a special configuration
   statement.

2. Your handling of EOM differs from Solaris/Linux.  On both of
   those systems, when Bacula reads the first EOF, the driver
   returns 0 bytes read. On reading the second EOF, the driver
   returns 0 bytes read, but before returning it backspaces over
   the EOF, leaving you positioned correctly for appending to the
   tape and having told you that you are at the end of the tape by
   giving two consecutive 0-byte reads.  Any further read()
   request returns an I/O error.

   On FreeBSD, reading the first EOF returns 0 bytes, reading
   the second EOF also returns 0 bytes (sometimes, I apparently
   get "Illegal operation"). However, the tape is left positioned
   after the second EOF, so appending from that point effectively
   "loses" the data. 

   To handle this correctly the FreeBSD user must add a configuration
   statement to Bacula telling it to backspace one file at EOM.

3. I have previously described this but will do so again for
   completeness here. On Solaris/Linux when Bacula does:

    write();
    ioctl(MTEOF);
    ioctl(MTEOF);
    ioctl(MTBSF);
    ioctl(MTBSF);
    ioctl(MTBSR);
    read();

   the read() re-reads the last write.  On FreeBSD, the read returns
   0 bytes (there is also a problem of freezing the tape wrapped into
   this example if I am not mistaken). Apparently the 0 bytes read is
   because FreeBSD adds an additional EOF mark (not necessary) and
   leaves the drive positioned *after* the mark thus re-reading the
   last record fails when it logically should not.

4. Tape freezing: On Solaris/Linux, the tape never "freezes". On 
   FreeBSD it does freeze. As best I can determine, you freeze the
   drive when you lose track of where you are. Typically, this 
   occurs when I do a MTBSR to re-read the last record. On Solaris/Linux
   the tape is never frozen, but when they don't know the position,
   they simply return -1 in the MTIOCGET packet, which is fine with
   me because Bacula only uses that info when initially reading a
   tape to append to it.

   Freezing the tape causes all sorts of problems because it generates
   a flood of unexpected errors. Within a large complicated program like
   Bacula, when a low level routine re-reads a record during writing and
   the tape freezes, it cannot simply rewind the drive as this could
   cause chaos and possible overwriting of the beginning of the tape.

   I've attempted to overcome tape freezing by providing the user a
   means to turn off MTBSR (but they don't always do so), and by issuing
   ioctl(MTIOCERRSTAT) after every return of -1 from any I/O request.

   I recommend that you do away with freezing the drive -- it seems to
   me that it only causes more problems.  In saying that, I have to
   admit that I really do not understand tape freezing or why you do it since
   I found no documentation on it, and everything I write above I have
   deduced from what Dan has reported back to me.

5. I am quite fuzzy on this point because I forget exactly what happened
   and what I did about it. 

   It seems to me that on Linux, if I read a block but specify a number
   of bytes less than the number actually in the block on the tape, the
   driver returns the data anyway.  I then check if the block is 
   internally complete and if not, increase my record size to the size
   indicated in the data received, backspace one record, and re-read it.

   If I am not mistaken, on FreeBSD, the first read returns an error,
   and Bacula just immediately gives up.  Your documentation specifies
   that one can never read a partial record from a tape, but it does not
   specify what error code is generated. As a consequence, rather than
   recovering and re-reading the record, Bacula has to assume it was
   a fatal error.  
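
   The Linux-side recovery described in point 5 could be sketched
   roughly as follows. This is only an illustration under stated
   assumptions: the 4-byte big-endian length prefix and the name
   block_size_needed() are hypothetical and are not Bacula's actual
   block format. The backspace (MTBSR) and the second read() are
   left out; the function only decides whether a retry with a larger
   buffer is called for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block layout: the first 4 bytes hold the total block
 * length (header included) as a big-endian integer. */
#define HDR_LEN 4

/*
 * Inspect the `got` bytes that a short read() returned.
 * Returns the buffer size to use for a retry when the header says the
 * block is longer than what was received.  Returns 0 when no retry
 * size can be derived: either the block is already complete, or too
 * few bytes arrived to read the header at all.
 */
size_t block_size_needed(const unsigned char *buf, size_t got)
{
    uint32_t declared;

    if (got < HDR_LEN)
        return 0;

    declared = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
               ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

    return declared > got ? (size_t)declared : 0;
}
```

   A caller would enlarge its record buffer to the returned size,
   backspace one record, and read again. This only works on a driver
   that hands back the partial data instead of an error, which is
   exactly the difference described above.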

I hope these points are clear. If not please don't hesitate to ask.
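
For point 1, the MTFSF-plus-read() workaround might look something
like the sketch below. It is not Bacula's code: the function names
are made up for illustration, and it assumes a <sys/mtio.h> that
provides struct mtop, MTIOCTOP and MTFSF (as Linux and FreeBSD both
do). It also assumes the buffer is at least as large as the largest
record on the tape, and that an immediate 0-byte read() after a
filemark means end of data.

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

/* Issue one MTIOCTOP operation; returns ioctl()'s result. */
static int tape_op(int fd, short op, int count)
{
    struct mtop mt;

    mt.mt_op = op;
    mt.mt_count = count;
    return ioctl(fd, MTIOCTOP, &mt);
}

/*
 * Walk forward from the current position, one file at a time, until a
 * read() returns 0 bytes right after a filemark.  Returns the number
 * of data files skipped, or -1 if read() or the ioctl fails.
 *
 * Where the tape is left after the final 0-byte read depends on the
 * driver (see point 2), so the caller may still need an MTBSF before
 * appending.
 */
int tape_seek_to_end_of_data(int fd, char *buf, size_t buflen)
{
    int files = 0;

    for (;;) {
        ssize_t n = read(fd, buf, buflen);

        if (n < 0)
            return -1;
        if (n == 0)               /* filemark with no data before it */
            return files;
        if (tape_op(fd, MTFSF, 1) < 0)
            return -1;
        files++;
    }
}
```

The returned count is what would otherwise come from the file number
in the MTIOCGET packet after a slow-mode MTEOM.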

Best regards,

Kern

   
From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 04:28:20 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 43D8737B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 04:28:20 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A021A43F85
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 04:28:19 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 1B0D93F4F; Mon,  2 Jun 2003 07:28:19 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Mon, 02 Jun 2003 07:28:18 -0400
MIME-Version: 1.0
Message-ID: <3EDAFC92.7886.C7C6DEAE@localhost>
Priority: normal
References: <3EDA7713.25862.C5BD5952@localhost>
In-reply-to: <20030601190003.H49295@wonky.in0.lcl>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 11:28:20 -0000

On 1 Jun 2003 at 19:03, Matthew Jacob wrote:

> 
> Absolutely. I gave the BitKeeper URL of my toolkit, but here's a URL to
> pull just the tape pattern test from:
> 
> http://people.freebsd.org/~mjacob/tape_pattern_tester.c

# ./tpt -v -b 512 -r 100 -n 10 -f /dev/nrsa0
.......Rewind Tape
........Write Pass
WEOT at File 9 Record 100 Offset 512 (512000 total bytes written)
Elapsed Seconds: 118; Data Rate: 0MB/s
.......Rewind Tape
.........Read Pass
REOT at File 10 Record 0 Offset 0 (512000 total bytes read)
Elapsed Seconds: 5: Data Rate: 0MB/s

> I *would* like to know what the output of 'mt status' on that drive is
> too.

# mt -f /dev/nrsa0 status
Mode      Density              Blocksize      bpi      Compression
Current:  0x13:X3B5/88-185A    variable       61000    DCLZ
---------available modes---------
0:        0x13:X3B5/88-185A    variable       61000    DCLZ
1:        0x13:X3B5/88-185A    variable       61000    DCLZ
2:        0x13:X3B5/88-185A    variable       61000    DCLZ
3:        0x13:X3B5/88-185A    variable       61000    DCLZ
---------------------------------
Current Driver State: at rest.
---------------------------------
File Number: 11 Record Number: 0        Residual Count 0


-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 04:49:32 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AF86237B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 04:49:32 -0700 (PDT)
Received: from dirac.phys.uwm.edu (dirac.phys.uwm.edu [129.89.57.19])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CAB9643F93
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 04:49:31 -0700 (PDT)
	(envelope-from ballen@gravity.phys.uwm.edu)
Received: from localhost (ballen@localhost)
	by dirac.phys.uwm.edu (8.11.6+Sun/8.11.6) with ESMTP id h52BnRJ09551;
	Mon, 2 Jun 2003 06:49:27 -0500 (CDT)
X-Authentication-Warning: dirac.phys.uwm.edu: ballen owned process doing -bs
Date: Mon, 2 Jun 2003 06:49:26 -0500 (CDT)
From: Bruce Allen <ballen@gravity.phys.uwm.edu>
X-Sender: ballen@dirac.phys.uwm.edu
To: Douglas Gilbert <dougg@torque.net>
In-Reply-To: <3EDAD5B6.5040308@torque.net>
Message-ID: <Pine.GSO.4.21.0306020647430.9509-100000@dirac.phys.uwm.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: ben@scum.org
cc: freebsd-scsi@freebsd.org
Subject: Re: smartmontools port
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 11:49:33 -0000

I wanted to point out that the person who did the FreeBSD port, Ben Gras
<ben@scum.org> is now a smartmontools developer, and has write access to
the smartmontools CVS archive.

Cheers,
	Bruce

On Mon, 2 Jun 2003, Douglas Gilbert wrote:

> "The smartmontools package contains two utility
> programs (smartctl and smartd) to control and
> monitor storage systems using the Self-Monitoring,
> Analysis and Reporting Technology System (S.M.A.R.T.)
> built into most modern ATA and SCSI hard disks."
> See http://smartmontools.sourceforge.net for more
> details.
> 
> Currently it only supports Linux but the maintainer,
> Bruce Allen <ballen@gravity.phys.uwm.edu>, has
> received patches for a FreeBSD port for ATA disks.
> [Those patches are not in the project's CVS yet.]
> 
> As yet no-one has proposed or offered any patches for
> a FreeBSD port of the SCSI code. For SCSI specific
> information about smartmontools together with examples
> see this url:
> http://smartmontools.sourceforge.net/smartmontools_scsi.html
> 
> I have rewritten the SCSI command handling code and
> Kai Makisara has added code to support the TapeAlert
> mechanism. The Linux SCSI command handling details
> are hidden behind a CAM like structure.
> This should facilitate a clean port of this code.
> Other broader issues would need addressing (e.g. the
> assumptions made at higher levels about device names
> being SCSI or ATA devices).
> 
> If anyone wishes to volunteer or look at this please
> contact me or Bruce. We would also be interested if
> FreeBSD has any other utilities that provide SMART
> facilities.
> 
> Doug Gilbert
> 
> 

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 08:06:18 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3363937B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 08:06:18 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8F67843F3F
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 08:06:17 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52F6Fqw069696;
	Mon, 2 Jun 2003 08:06:15 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 08:06:15 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Dan Langille <dan@langille.org>
In-Reply-To: <3EDAFC92.7886.C7C6DEAE@localhost>
Message-ID: <20030602080535.K69681@beppo>
References: <3EDA7713.25862.C5BD5952@localhost>
	<3EDAFC92.7886.C7C6DEAE@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 15:06:18 -0000


Err, umm,  you need to run it to actual physical EOT so we can see what
happens with early warning...


On Mon, 2 Jun 2003, Dan Langille wrote:

> On 1 Jun 2003 at 19:03, Matthew Jacob wrote:
>
> >
> > Absolutely. I gave the BitKeeper URL of my toolkit, but here's a URL to
> > pull just the tape pattern test from:
> >
> > http://people.freebsd.org/~mjacob/tape_pattern_tester.c
>
> # ./tpt -v -b 512 -r 100 -n 10 -f /dev/nrsa0
> .......Rewind Tape
> ........Write Pass
> WEOT at File 9 Record 100 Offset 512 (512000 total bytes written)
> Elapsed Seconds: 118; Data Rate: 0MB/s
> .......Rewind Tape
> .........Read Pass
> REOT at File 10 Record 0 Offset 0 (512000 total bytes read)
> Elapsed Seconds: 5: Data Rate: 0MB/s
>
> > I *would* like to know what the output of 'mt status' on that drive is
> > too.
>
> # mt -f /dev/nrsa0 status
> Mode      Density              Blocksize      bpi      Compression
> Current:  0x13:X3B5/88-185A    variable       61000    DCLZ
> ---------available modes---------
> 0:        0x13:X3B5/88-185A    variable       61000    DCLZ
> 1:        0x13:X3B5/88-185A    variable       61000    DCLZ
> 2:        0x13:X3B5/88-185A    variable       61000    DCLZ
> 3:        0x13:X3B5/88-185A    variable       61000    DCLZ
> ---------------------------------
> Current Driver State: at rest.
> ---------------------------------
> File Number: 11 Record Number: 0        Residual Count 0
>
>
> --
> Dan Langille : http://www.langille.org/
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 08:10:59 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5969937B404
	for <scsi@FreeBSD.org>; Mon,  2 Jun 2003 08:10:59 -0700 (PDT)
Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 467B743F3F
	for <scsi@FreeBSD.org>; Mon,  2 Jun 2003 08:10:58 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106])
	by aslan.scsiguy.com (8.12.8/8.12.8) with ESMTP id h52FAvIh028540;
	Mon, 2 Jun 2003 09:10:57 -0600 (MDT)
	(envelope-from gibbs@scsiguy.com)
Date: Mon, 02 Jun 2003 09:10:57 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>
Message-ID: <3177210000.1054566657@aslan.scsiguy.com>
In-Reply-To: <1054544261.1578.1801.camel@rufus>
References: <1054490081.1582.1685.camel@rufus>
	<2846020000.1054498114@aslan.scsiguy.com>	 <1054503429.1578.1715.camel@rufus>	
	<1054544261.1578.1801.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: scsi@FreeBSD.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 15:10:59 -0000

>> > My personal preference is for data security before performance.
>> 
>> There is no potential for lost data if you handle the status that
>> is presented to you.
> 
> Could you explain that more in detail?  If you mean dig into the
> OS/driver specific details of an MTIOCERRSTAT packet. That *shouldn't*
> be necessary -- at least it is not necessary on Solaris/Linux to
> guarantee data integrity.  

If you properly honor the residual provided in MTIOCERRSTAT, then
you will know what data needs to be rewritten.  In other words,
all the information required to behave correctly is there.
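
As a rough illustration of what honoring the residual means for an
application, here is a sketch of the bookkeeping only. It does not
call MTIOCERRSTAT; the function names are hypothetical, and it
assumes the residual is a byte count of data that did not reach the
tape, counted back from the end of what was submitted since the last
flush point.

```c
#include <stddef.h>

/*
 * bytes_since_flush: bytes handed to write() since the last point
 *                    known to be safely on tape.
 * residual:          bytes reported as not transferred (assumed to be
 *                    in bytes; negative or zero means nothing lost).
 *
 * Returns how many bytes, taken from the tail of the submitted data,
 * have to be written again.  The value is clamped so that it never
 * exceeds what was actually submitted.
 */
size_t bytes_to_rewrite(size_t bytes_since_flush, long residual)
{
    if (residual <= 0)
        return 0;
    if ((size_t)residual > bytes_since_flush)
        return bytes_since_flush;
    return (size_t)residual;
}

/* Offset within the submitted data where the rewrite has to start. */
size_t rewrite_offset(size_t bytes_since_flush, long residual)
{
    return bytes_since_flush - bytes_to_rewrite(bytes_since_flush, residual);
}
```

For this to work the application has to keep the submitted data
around until the next flush point, which is the cost being debated
in this thread.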

>> The tape driver doesn't have any buffering code (unlike Linux which
>> does).  The tape drive has a buffer.  We are just enabling the use
>> of that buffer.  If you really want to do this simply, just do a
>> write filemarks of 0 marks every time you are about to switch input
>> files.  The write marks flushes the device's buffer and guarantees
>> that any residual will be within the fd that you are currently using.
>> This would imply that you only need to explicitly buffer if you support
>> backups from stdin.
> 
> I don't mind if the tape drive buffers data as long as it writes
> *all* of that data to the tape and informs me on the next write
> that the LEOM (logical EOM in Solaris parlance, or early EOM)
> has been hit.

FreeBSD does start to fail writes at LEOM, but depending on the tape
type and the amount of buffer, etc. you may or may not get all data
from the buffer to the tape.  That is why a residual is provided.
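
The "write filemarks of 0 marks" flush mentioned above can be
sketched with the generic mtio interface. This is a minimal sketch,
not taken from any existing program: the helper names are made up,
and it assumes a <sys/mtio.h> that defines struct mtop, MTIOCTOP and
MTWEOF. Whether a count of zero really flushes the drive's buffer
depends on the driver and the drive.

```c
#include <sys/ioctl.h>
#include <sys/mtio.h>

/*
 * Ask the driver to write `count` filemarks.  Returns 0 on success,
 * or -1 with errno set, exactly as ioctl() does.
 */
int tape_write_filemarks(int fd, int count)
{
    struct mtop op;

    op.mt_op = MTWEOF;
    op.mt_count = count;
    return ioctl(fd, MTIOCTOP, &op);
}

/* A count of zero writes no mark and is used purely as a flush request. */
int tape_flush(int fd)
{
    return tape_write_filemarks(fd, 0);
}
```

Calling tape_flush() before switching input files would keep any
residual confined to the file currently being written, at the price
of one synchronous round trip per file.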

>> You can never recover the round trip time on the SCSI bus unless
>> you either have a device that allows you to queue more than one
>> command at a time or that buffers.  I believe that only FC tape
>> devices support queuing more than one command at a time, but few
>> programs support this anyway (unless you lie and say that a previous
>> write has completed).
> 
> I can see that performance concerns you because you wrote the
> driver,

I care about performance because performance matters.  I didn't
write this driver.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 08:14:53 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 34ED837B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 08:14:53 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8457543F3F
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 08:14:52 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id D90AB3F4F; Mon,  2 Jun 2003 11:14:51 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Mon, 02 Jun 2003 11:14:51 -0400
MIME-Version: 1.0
Message-ID: <3EDB31AB.16420.C8964B7D@localhost>
Priority: normal
References: <3EDAFC92.7886.C7C6DEAE@localhost>
In-reply-to: <20030602080535.K69681@beppo>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 15:14:53 -0000

On 2 Jun 2003 at 8:06, Matthew Jacob wrote:

> Err, umm,  you need to run it to actual physical EOT so we can see what
> happens with early warning...

OK, this will take a while:

# ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
.......Rewind Tape
........Write Pass
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 08:21:41 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5280937B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 08:21:41 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 20D5E43F93
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 08:21:40 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52FLdqw069749;
	Mon, 2 Jun 2003 08:21:39 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 08:21:39 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Dan Langille <dan@langille.org>
In-Reply-To: <3EDB31AB.16420.C8964B7D@localhost>
Message-ID: <20030602082134.L69681@beppo>
References: <3EDAFC92.7886.C7C6DEAE@localhost>
	<3EDB31AB.16420.C8964B7D@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 15:21:41 -0000


Yeah :-;


On Mon, 2 Jun 2003, Dan Langille wrote:

> On 2 Jun 2003 at 8:06, Matthew Jacob wrote:
>
> > Err, umm,  you need to run it to actual physical EOT so we can see what
> > happens with early warning...
>
> OK, this will take a while:
>
> # ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
> .......Rewind Tape
> ........Write Pass
> --
> Dan Langille : http://www.langille.org/
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 08:27:31 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2AD8537B404
	for <scsi@FreeBSD.org>; Mon,  2 Jun 2003 08:27:31 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 14B6643FE3
	for <scsi@FreeBSD.org>; Mon,  2 Jun 2003 08:27:28 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h52FRKv05738;
	Mon, 2 Jun 2003 17:27:20 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <3177210000.1054566657@aslan.scsiguy.com>
References: <1054490081.1582.1685.camel@rufus>
	 <2846020000.1054498114@aslan.scsiguy.com>
	 <1054503429.1578.1715.camel@rufus>
	 <2897610000.1054507162@aslan.scsiguy.com>
	 <1054544261.1578.1801.camel@rufus>
	 <3177210000.1054566657@aslan.scsiguy.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054567640.1578.1912.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 02 Jun 2003 17:27:20 +0200
Content-Transfer-Encoding: 7bit
cc: scsi@FreeBSD.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 15:27:31 -0000

Thanks for your response.

On Mon, 2003-06-02 at 17:10, Justin T. Gibbs wrote:
> >> > My personal preference is for data security before performance.
> >> 
> >> There is no potential for lost data if you handle the status that
> >> is presented to you.
> > 
> > Could you explain that more in detail?  If you mean dig into the
> > OS/driver specific details of an MTIOCERRSTAT packet. That *shouldn't*
> > be necessary -- at least it is not necessary on Solaris/Linux to
> > guarantee data integrity.  
> 
> If you properly honor the residual provided in MTIOCERRSTAT, then
> you will know what data needs to be rewritten.  In other words,
> all the information required to behave correctly is there.

I'll take a look at what the residual provides and see if it
corresponds to our data loss.

> 
> >> The tape driver doesn't have any buffering code (unlike Linux which
> >> does).  The tape drive has a buffer.  We are just enabling the use
> >> of that buffer.  If you really want to do this simply, just do a
> >> write filemarks of 0 marks every time you are about to switch input
> >> files.  The write marks flushes the device's buffer and guarantees
> >> that any residual will be within the fd that you are currently using.
> >> This would imply that you only need to explicitly buffer if you support
> >> backups from stdin.
> > 
> > I don't mind if the tape drive buffers data as long as it writes
> > *all* of that data to the tape and informs me on the next write
> > that the LEOM (logical EOM in Solaris parlance, or early EOM)
> > has been hit.
> 
> FreeBSD does start to fail writes at LEOM, but depending on the tape
> type and the amount of buffer, etc. you may or may not get all data
> from the buffer to the tape.  That is why a residual is provided.

Too bad that FreeBSD doesn't start failing writes at LEOM. That would
completely remove the need for a residual and hence machine specific
programming, and the cost or price for doing so is nothing.

> 
> >> You can never recover the round trip time on the SCSI bus unless
> >> you either have a device that allows you to queue more than one
> >> command at a time or that buffers.  I believe that only FC tape
> >> devices support queuing more than one command at a time, but few
> >> programs support this anyway (unless you lie and say that a previous
> >> write has completed).
> > 
> > I can see that performance concerns you because you wrote the
> > driver,
> 
> I care about performance because performance matters.  I didn't
> write this driver.

OK. Yes, performance matters but not when you are losing data. :-)


From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 09:47:04 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BA59537B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 09:47:04 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B896743F85
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 09:47:03 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id E51A83F4F; Mon,  2 Jun 2003 12:47:02 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Mon, 02 Jun 2003 12:47:02 -0400
MIME-Version: 1.0
Message-ID: <3EDB4746.25.C8EAB2B1@localhost>
Priority: normal
References: <3EDB31AB.16420.C8964B7D@localhost>
In-reply-to: <20030602082134.L69681@beppo>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 16:47:05 -0000

On 2 Jun 2003 at 8:21, Matthew Jacob wrote:

> On Mon, 2 Jun 2003, Dan Langille wrote:
> 
> > On 2 Jun 2003 at 8:06, Matthew Jacob wrote:
> >
> > > Err, umm,  you need to run it to actual physical EOT so we can see what
> > > happens with early warning...
> >
> > OK, this will take a while:
> >
> > # ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
> > .......Rewind Tape
> > ........Write Pass
> 
> Yeah :-;

We have a bit of progress:

# ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
.......Rewind Tape
........Write Pass
WEOT at File 0 Record 694983 Offset 0 (3558312960 total bytes written)
Elapsed Seconds: 5467; Data Rate: 0.620633MB/s
.......Rewind Tape
.........Read Pass

And the wait continues...
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 11:02:31 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 43A8D37B401
	for <scsi@freebsd.org>; Mon,  2 Jun 2003 11:02:31 -0700 (PDT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D034243F3F
	for <scsi@freebsd.org>; Mon,  2 Jun 2003 11:02:29 -0700 (PDT)
	(envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
	by freefall.freebsd.org (8.12.9/8.12.9) with ESMTP id h52I2TUp081743
	for <scsi@freebsd.org>; Mon, 2 Jun 2003 11:02:29 -0700 (PDT)
	(envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
	by freefall.freebsd.org (8.12.9/8.12.9/Submit) id h52I2TxG081738
	for scsi@freebsd.org; Mon, 2 Jun 2003 11:02:29 -0700 (PDT)
Date: Mon, 2 Jun 2003 11:02:29 -0700 (PDT)
Message-Id: <200306021802.h52I2TxG081738@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to
	owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: scsi@FreeBSD.org
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 18:02:32 -0000

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S  Submitted   Tracker     Resp.       Description
-------------------------------------------------------------------------------
f [1999/12/21] kern/15608  scsi        acd0 / cd0 give inconsistent errors on em

1 problem total.

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 11:05:29 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7543537B407
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 11:05:29 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F1DC643F3F
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 11:05:24 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 586353F4F; Mon,  2 Jun 2003 14:05:24 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Mon, 02 Jun 2003 14:05:24 -0400
MIME-Version: 1.0
Message-ID: <3EDB59A4.27599.C93270FB@localhost>
Priority: normal
References: <3EDB31AB.16420.C8964B7D@localhost>
In-reply-to: <20030602082134.L69681@beppo>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 18:05:29 -0000

On 2 Jun 2003 at 8:21, Matthew Jacob wrote:

> On Mon, 2 Jun 2003, Dan Langille wrote:
> 
> > On 2 Jun 2003 at 8:06, Matthew Jacob wrote:
> >
> > > Err, umm,  you need to run it to actual physical EOT so we can see what
> > > happens with early warning...
> >
> > OK, this will take a while:
> >
> > # ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
> > .......Rewind Tape
> > ........Write Pass
> 
> Yeah :-;
> 

And we have finish:

./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
.......Rewind Tape
........Write Pass
WEOT at File 0 Record 694983 Offset 0 (3558312960 total bytes written)
Elapsed Seconds: 5467; Data Rate: 0.620633MB/s
.......Rewind Tape
.........Read Pass
REOT at File 1 Record 0 Offset 0 (3558312960 total bytes read)
Elapsed Seconds: 4422: Data Rate: 0.7673MB/s
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 11:11:42 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 70F1A37B404
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 11:11:42 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C8BCA43F93
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 11:11:41 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52IBeqw071574;
	Mon, 2 Jun 2003 11:11:40 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 11:11:40 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Dan Langille <dan@langille.org>
In-Reply-To: <3EDB59A4.27599.C93270FB@localhost>
Message-ID: <20030602110836.H71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 18:11:42 -0000


> And we have finish:
>
> ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
> .......Rewind Tape
> ........Write Pass
> WEOT at File 0 Record 694983 Offset 0 (3558312960 total bytes written)
> Elapsed Seconds: 5467; Data Rate: 0.620633MB/s
> .......Rewind Tape
> .........Read Pass
> REOT at File 1 Record 0 Offset 0 (3558312960 total bytes read)
> Elapsed Seconds: 4422: Data Rate: 0.7673MB/s
> --

Now, tape_pattern_tester actually does some somewhat simplistic data
integrity checking, so I tend to believe that the bytes read do match
the bytes written.
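
The byte counts in the output above can also be cross-checked by hand.
A minimal sketch in Python, using only the figures quoted in this
message; the assumption that tpt computes its data rate from whole
MiB divided by elapsed seconds is inferred from the printed numbers,
not taken from the tpt source:

```python
MIB = 1024 * 1024

# Figures copied from the tpt output quoted above.
BLOCK_SIZE = 5120
RECORDS_AT_WEOT = 694983
BYTES_WRITTEN = 3558312960
BYTES_READ = 3558312960
WRITE_SECS = 5467
READ_SECS = 4422


def data_rate(total_bytes, elapsed_secs):
    """Rate in MB/s, assuming (not confirmed) whole-MiB truncation."""
    return (total_bytes // MIB) / elapsed_secs


# Records written times the block size should equal the reported total.
write_ok = BLOCK_SIZE * RECORDS_AT_WEOT == BYTES_WRITTEN
# The read pass should return exactly as many bytes as were written.
read_ok = BYTES_READ == BYTES_WRITTEN

write_rate = data_rate(BYTES_WRITTEN, WRITE_SECS)
read_rate = data_rate(BYTES_READ, READ_SECS)

print(write_ok, read_ok)
print(f"{write_rate:.6f} {read_rate:.4f}")
```

This only confirms that the totals are self-consistent; it says nothing
about the contents of the records.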

This now raises the question as to why Bacula lost some records.

I'm going to ponder this a bit today while I'm at one of my paying gigs.
I'll hook up a tape drive to a FreeBSD box @ Feral tomorrow and see if I
can analyze Bacula a bit more closely.

-matt

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 11:50:12 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 79B7D37B43F
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 11:50:12 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3CD4243FB1
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 11:50:10 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h52Ij1Z28703;
	Mon, 2 Jun 2003 11:45:01 -0700
Received: from [10.100.253.70] (aslan.btc.adaptec.com [10.100.253.70])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id LAA18582;
	Mon, 2 Jun 2003 11:50:00 -0700 (PDT)
Date: Mon, 02 Jun 2003 12:50:40 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: mjacob@feral.com, Dan Langille <dan@langille.org>
Message-ID: <577540000.1054579840@aslan.btc.adaptec.com>
In-Reply-To: <20030602110836.H71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Justin T. Gibbs" <gibbs@scsiguy.com>
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 18:50:12 -0000

>> And we have finish:
>> 
>> ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0

Shouldn't the test be run with the 64k record size that Bacula
uses?

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 12:06:56 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6222037B408
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 12:06:56 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6B25043F85
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 12:06:55 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 9BF543F4F; Mon,  2 Jun 2003 15:06:54 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Date: Mon, 02 Jun 2003 15:06:54 -0400
MIME-Version: 1.0
Message-ID: <3EDB680E.28213.C96AC110@localhost>
Priority: normal
In-reply-to: <577540000.1054579840@aslan.btc.adaptec.com>
References: <20030602110836.H71034@beppo>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 19:06:56 -0000

On 2 Jun 2003 at 12:50, Justin T. Gibbs wrote:

> >> And we have finish:
> >> 
> >> ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
> 
> Shouldn't the test be run with the 64k record size that Bacula
> uses?

I'm happy to run this again.  Would that be this command?

   ./tpt -v -b 65536 -r 10000000000 -n 10 -f /dev/nrsa0

-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 12:10:58 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B088637B404
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 12:10:58 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1CA2543FA3
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 12:10:58 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h52J5qZ25982;
	Mon, 2 Jun 2003 12:05:52 -0700
Received: from [10.100.253.70] (aslan.btc.adaptec.com [10.100.253.70])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id MAA25394;
	Mon, 2 Jun 2003 12:10:50 -0700 (PDT)
Date: Mon, 02 Jun 2003 13:11:34 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Dan Langille <dan@langille.org>
Message-ID: <585530000.1054581094@aslan.btc.adaptec.com>
In-Reply-To: <3EDB680E.28213.C96AC110@localhost>
References: <20030602110836.H71034@beppo> <3EDB680E.28213.C96AC110@localhost>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Justin T. Gibbs" <gibbs@scsiguy.com>
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 19:10:59 -0000

> I'm happy to run this again.  Would that be this command?
> 
>    ./tpt -v -b 65536 -r 10000000000 -n 10 -f /dev/nrsa0

I believe so, but I'm not familiar with the utility.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 12:24:25 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7FB7237B405
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 12:24:25 -0700 (PDT)
Received: from rootlabs.com (root.org [67.118.192.226])
	by mx1.FreeBSD.org (Postfix) with SMTP id E919143FB1
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 12:24:24 -0700 (PDT)
	(envelope-from nate@rootlabs.com)
Received: (qmail 21009 invoked by uid 1000); 2 Jun 2003 19:24:25 -0000
Date: Mon, 2 Jun 2003 12:24:25 -0700 (PDT)
From: Nate Lawson <nate@root.org>
To: Douglas Gilbert <dougg@torque.net>
In-Reply-To: <3EDAD5B6.5040308@torque.net>
Message-ID: <20030602121526.W20895@root.org>
References: <3EDAD5B6.5040308@torque.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: ballen@gravity.phys.uwm.edu
cc: freebsd-scsi@freebsd.org
Subject: Re: smartmontools port
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 19:24:25 -0000

On Mon, 2 Jun 2003, Douglas Gilbert wrote:
> "The smartmontools package contains two utility
> programs (smartctl and smartd) to control and
> monitor storage systems using the Self-Monitoring,
> Analysis and Reporting Technology System (S.M.A.R.T.)
> built into most modern ATA and SCSI hard disks."
> See http://smartmontools.sourceforge.net for more
> details.
>
> Currently it only supports Linux but the maintainer,
> Bruce Allen <ballen@gravity.phys.uwm.edu>, has
> received patches for a FreeBSD port for ATA disks.
> [Those patches are not in the project's CVS yet.]

I assume you've submitted the ATA patches to sos@?

> I have rewritten the SCSI command handling code and
> Kai Makisara has added code to support the TapeAlert
> mechanism. The Linux SCSI command handling details
> are hidden behind a CAM like structure.
> This should facilitate a clean port of this code.

I would be interested in reviewing any patches you have.  camcontrol(8)
code has good examples for the usermode libcam interface.

> Other broader issues would need addressing (e.g. the
> assumptions made at higher levels about device names
> being SCSI or ATA devices).

This could be done with a functional interface instead of data (i.e.
IS_SCSI_DEV() being platform-specific).
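
As a rough illustration of what such a functional interface could look
like (sketched in Python for brevity; smartmontools itself is not
written this way, and the name is_scsi_dev, the pattern table, and the
device-name rules below are all assumptions made for the example):

```python
import re
import sys

# Hypothetical per-platform rules. The FreeBSD entry covers the da, sa
# and cd drivers; the Linux entry covers sd, st, sr and sg. Tape nodes
# with extra prefixes (such as nrsa0) are deliberately not handled here.
_SCSI_NAME_PATTERNS = {
    "freebsd": re.compile(r"(?:da|sa|cd)\d"),
    "linux": re.compile(r"(?:sd[a-z]|st\d|sr\d|sg\d)"),
}


def is_scsi_dev(path, platform=None):
    """Return True if the device node name looks like a SCSI device."""
    platform = sys.platform if platform is None else platform
    for key, pattern in _SCSI_NAME_PATTERNS.items():
        if platform.startswith(key):
            name = path.rsplit("/", 1)[-1]
            return pattern.match(name) is not None
    raise NotImplementedError(f"no device-name rules for {platform!r}")


print(is_scsi_dev("/dev/da0", "freebsd"))   # SCSI disk
print(is_scsi_dev("/dev/acd0", "freebsd"))  # ATAPI CD, not SCSI
print(is_scsi_dev("/dev/sda1", "linux"))    # SCSI disk partition
```

Higher-level code would then call the function and never look at the
device name itself, so each platform can keep its own rules.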

> If anyone wishes to volunteer or look at this please
> contact me or Bruce. We would also be interested if
> FreeBSD has any other utilities that provide SMART
> facilities.

You can implement this ad-hoc with the "camcontrol cmd" command.  Hope
this helps.

-Nate

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 13:14:42 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 767B437B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 13:14:42 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D23E743F75
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 13:14:41 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52KEdqw072059;
	Mon, 2 Jun 2003 13:14:39 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 13:14:38 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <577540000.1054579840@aslan.btc.adaptec.com>
Message-ID: <20030602131225.F71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 20:14:42 -0000


Probably. Actually, it was 63k.

But I sorta doubt that this was the issue.

A buddy of mine at Mirapoint did just remind me that physio can silently
break up xfers that are even less than 64k if the buffer isn't
page-aligned. I'd forgotten about that. But I'm not sure that this is what is
occurring.
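
To make the alignment point concrete, here is a small sketch that
counts how many pages a 63k buffer touches at different starting
offsets. The 4096-byte page size and the 16-page (64k) per-transfer
limit are assumptions chosen for illustration; this is not physio's
actual logic:

```python
PAGE_SIZE = 4096          # assumption: i386 page size
MAX_PAGES_PER_XFER = 16   # assumption: 64k limit, for illustration only


def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Number of pages touched by the byte range [addr, addr + length)."""
    first = addr // page_size
    last = (addr + length - 1) // page_size
    return last - first + 1


length = 63 * 1024  # the 63k transfer size mentioned above
for offset in (0, 8, 1024, 1032, 2048):
    n = pages_spanned(offset, length)
    verdict = "fits" if n <= MAX_PAGES_PER_XFER else "may be split"
    print(f"offset {offset:5d}: {n} pages, {verdict}")
```

Under these assumptions a 63k buffer stays within 16 pages only if it
starts within the first 1024 bytes of a page; any later start spills
onto a 17th page.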

I need to think about this some more, but it may be that the actions
that are being taken after EOM detection may be overwriting data. But
don't take that to the bank at all.

-matt


On Mon, 2 Jun 2003, Justin T. Gibbs wrote:

> >> And we have finish:
> >>
> >> ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
>
> Shouldn't the test be run with the 64k record size that Bacula
> uses?
>
> --
> Justin
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 14:16:34 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C7B7737B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:16:34 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2D22143F3F
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:16:34 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 641663F4F; Mon,  2 Jun 2003 17:16:33 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Mon, 02 Jun 2003 17:16:33 -0400
MIME-Version: 1.0
Message-ID: <3EDB8671.23600.C9E174E6@localhost>
Priority: normal
References: <577540000.1054579840@aslan.btc.adaptec.com>
In-reply-to: <20030602131225.F71034@beppo>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 21:16:35 -0000

On 2 Jun 2003 at 13:14, Matthew Jacob wrote:

> Probably. Actually, it was 63k.
> 
> But I sorta doubt that this was the issue.

Well, I started it, and it's done now, so in case it's ever useful:

 # ./tpt -v -b 65536 -r 10000000000 -n 10 -f /dev/nrsa0
.......Rewind Tape
........Write Pass
WEOT at File 0 Record 52374 Offset 0 (3432382464 total bytes written)
Elapsed Seconds: 3264; Data Rate: 1.00276MB/s
.......Rewind Tape
.........Read Pass
REOT at File 1 Record 0 Offset 0 (3432382464 total bytes read)
Elapsed Seconds: 3256: Data Rate: 1.00522MB/s
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 14:24:22 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7736037B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:24:22 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DAC5A43F85
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:24:21 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52LOKqw072502;
	Mon, 2 Jun 2003 14:24:21 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 14:24:20 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Dan Langille <dan@langille.org>
In-Reply-To: <3EDB8671.23600.C9E174E6@localhost>
Message-ID: <20030602142401.H71034@beppo>
References: <577540000.1054579840@aslan.btc.adaptec.com>
	<3EDB8671.23600.C9E174E6@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
cc: Kern Sibbald <kern@sibbald.com>
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 21:24:22 -0000


More data points are indeed always useful. Thanks for doing that!

On Mon, 2 Jun 2003, Dan Langille wrote:

> On 2 Jun 2003 at 13:14, Matthew Jacob wrote:
>
> > Probably. Actually, it was 63k.
> >
> > But I sorta doubt that this was the issue.
>
> Well, I started it, and it's done now, so in case it's ever useful:
>
>  # ./tpt -v -b 65536 -r 10000000000 -n 10 -f /dev/nrsa0
> .......Rewind Tape
> ........Write Pass
> WEOT at File 0 Record 52374 Offset 0 (3432382464 total bytes written)
> Elapsed Seconds: 3264; Data Rate: 1.00276MB/s
> .......Rewind Tape
> .........Read Pass
> REOT at File 1 Record 0 Offset 0 (3432382464 total bytes read)
> Elapsed Seconds: 3256: Data Rate: 1.00522MB/s
> --
> Dan Langille : http://www.langille.org/
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 14:46:53 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8326237B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:46:53 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9D43F43FAF
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:46:51 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h52Lg0v06734;
	Mon, 2 Jun 2003 23:42:00 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030602131225.F71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
Content-Type: text/plain
Organization: 
Message-Id: <1054590119.13606.8.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 02 Jun 2003 23:41:59 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 21:46:53 -0000

On Mon, 2003-06-02 at 22:14, Matthew Jacob wrote:
> Probably. Actually, it was 63k.

Most of Bacula's writes are 64512 bytes, and all the
data that was lost consisted of blocks of 64512 bytes.

> 
> But I sorta doubt that this was the issue.
> 
> A buddy of mine at Mirapoint did just remind me that physio can silently
> break up xfers that are even less than 64k if the buffer isn't page
> aligned- I'd forgotten about that. But I'm not sure that this is what is
> occurring.

The buffers are 64-bit aligned but not page-aligned.

> 
> I need to think about this some more, but it may be that the actions
> that are being taken after EOM detection may be overwriting data. But
> don't take that to the bank at all.

Dan and I have been working on this for some time, so I'm
sure there is data loss and that it is related to the EOM.

I suspect that the problem is something very simple such as
the drive buffering data then hitting the physical EOM and
of course any buffered data goes down the bit bucket.
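
A toy model of that suspicion, to show when it would and would not lose
data. Everything here is hypothetical: the ToyDrive class, the block
counts, and the rule that the drive only notices early warning when it
commits a full buffer. Real drives and the sa driver are more subtle:

```python
class ToyDrive:
    """Hypothetical drive that buffers blocks before committing them."""

    def __init__(self, capacity, early_warning, buffer_size):
        self.capacity = capacity            # blocks that physically fit
        self.early_warning = early_warning  # position where writes start failing
        self.buffer_size = buffer_size      # blocks held before a commit
        self.on_tape = 0
        self.buffered = 0
        self.lost = 0

    def commit(self):
        """Move buffered blocks to tape; anything past the end is lost."""
        written = min(self.buffered, self.capacity - self.on_tape)
        self.on_tape += written
        self.lost += self.buffered - written
        self.buffered = 0

    def write(self):
        """Accept one block; return False once early warning is passed."""
        if self.on_tape >= self.early_warning:
            return False
        self.buffered += 1
        if self.buffered == self.buffer_size:
            self.commit()
        return True


def fill(drive):
    """Write until the drive refuses, then flush; return the counts."""
    accepted = 0
    while drive.write():
        accepted += 1
    drive.commit()
    return accepted, drive.on_tape, drive.lost


print(fill(ToyDrive(capacity=100, early_warning=95, buffer_size=4)))
print(fill(ToyDrive(capacity=100, early_warning=95, buffer_size=30)))
```

In this model nothing is lost while the buffer is smaller than the room
left after early warning; with the larger buffer the host has 120
writes accepted but only 100 blocks reach the tape.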

> 
> -matt
> 
> 
> 
> On Mon, 2 Jun 2003, Justin T. Gibbs wrote:
> 
> > >> And we have finish:
> > >>
> > >> ./tpt -v -b 5120 -r 10000000000 -n 10 -f /dev/nrsa0
> >
> > Shouldn't the test be run with the 64k record size that Bacula
> > uses?
> >
> > --
> > Justin
> >
> >

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 14:55:47 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 560AF37B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:55:47 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 26A7143F93
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 14:55:46 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52Ltjqw072613;
	Mon, 2 Jun 2003 14:55:45 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 14:55:45 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054590119.13606.8.camel@rufus>
Message-ID: <20030602145421.D71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo> <1054590119.13606.8.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 21:55:47 -0000


> I suspect that the problem is something very simple such as
> the drive buffering data then hitting the physical EOM and
> of course any buffered data goes down the bit bucket.

A question to ask then is why tape_pattern_tester stopped at LEOT but
Bacula didn't and kept going to PEOT.

-matt

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 15:31:58 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A4B5A37B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 15:31:57 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2DF3143FA3
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 15:31:56 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h52MVFv06875;
	Tue, 3 Jun 2003 00:31:15 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030602145421.D71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
	<1054590119.13606.8.camel@rufus>  <20030602145421.D71034@beppo>
Content-Type: text/plain
Organization: 
Message-Id: <1054593075.13606.28.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 00:31:15 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 22:31:58 -0000

On Mon, 2003-06-02 at 23:55, Matthew Jacob wrote:
> > I suspect that the problem is something very simple such as
> > the drive buffering data then hitting the physical EOM and
> > of course any buffered data goes down the bit bucket.
> 
> A question to ask then is why tape_pattern_tester stopped at LEOT but
> Bacula didn't and kept going to PEOT.
> 
> -matt

This was just a thought, because you or Justin said that 
the driver does not fail writes at the LEOF, which means
that unless you are doing something special in your
tpt, it is not stopping at the LEOF.

One thought that I had is: the fact that Bacula backs
up at the EOM to re-read the last record could cause
some problems.  I've asked Dan if he will re-run the
Bacula backup/restore test but with the re-read disabled.
As someone said, this will give one more data point.

Another interesting test would be to see if the same
data loss occurs in a situation where a tape size is
specified such that Bacula stops writing before the
EOM on the first tape.

Best regards,

Kern

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 15:42:45 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A706237B401
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 15:42:45 -0700 (PDT)
Received: from sift.mirapoint.com (sift.mirapoint.com [63.107.133.19])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DD0D543F3F
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 15:42:42 -0700 (PDT)
	(envelope-from cer@mirapoint.com)
Received: from alpo.mirapoint.com (alpo.mirapoint.com [63.107.133.20])
	by sift.mirapoint.com (Mirapoint Messaging Server MOS 3.3.5-GR)
	with ESMTP id ABU95230;
	Mon, 2 Jun 2003 15:42:35 -0700 (PDT)
Received: from 192.168.10.83
	by alpo.mirapoint.com (Mirapoint Messaging Server MOS 3.3.5-GR)
	with HTTPS/1.1;
	Mon, 2 Jun 2003 15:42:35 -0700
Date: Mon, 2 Jun 2003 15:42:35 -0700
From: Carl Reisinger <cer@mirapoint.com>
To: Kern Sibbald <kern@sibbald.com>
X-Mailer: Webmail Mirapoint Direct 3.3.5-GR
MIME-Version: 1.0
Message-Id: <ca22a600.8ddeac19.8172000@alpo.mirapoint.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: cer@mirapoint.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 22:42:46 -0000

While your current problem may not be related to the
tendency of physio to silently break up writes (and reads), I
believe you should revisit your code and make sure all buffers
are on a page boundary, especially since your writes are over
61440 bytes in size. And the max buffer size should be limited
to 65536 (otherwise all writes will be split).

Doing this now will prevent surprises in the future.

Carl Reisinger
Mirapoint

>
>Most of Bacula writes are 64512 bytes, and all the
>data that was lost consisted of blocks of 64512 bytes.
>
>> 
>> But I sorta doubt that this was the issue.
>> 
>> A buddy of mine at Mirapoint did just remind me that physio can silently
>> break up xfers that are even less than 64k if the buffer isn't page
>> aligned- I'd forgotten about that. But I'm not sure that this is what is
>> occurring.
>
>The buffers are 64 bit aligned but not page aligned.
>

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 15:45:15 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3DF0A37B404
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 15:45:15 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DEF7A43FAF
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 15:45:13 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h52MjCqw072758;
	Mon, 2 Jun 2003 15:45:13 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Mon, 2 Jun 2003 15:45:12 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054593075.13606.28.camel@rufus>
Message-ID: <20030602154021.T71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo>  <1054590119.13606.8.camel@rufus> 
	<20030602145421.D71034@beppo> <1054593075.13606.28.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jun 2003 22:45:15 -0000


On Mon, 3 Jun 2003, Kern Sibbald wrote:

> On Mon, 2003-06-02 at 23:55, Matthew Jacob wrote:
> > > I suspect that the problem is something very simple such as
> > > the drive buffering data then hitting the physical EOM and
> > > of course any buffered data goes down the bit bucket.
> >
> > A question to ask then is why tape_pattern_tester stopped at LEOT but
> > Bacula didn't and kept going to PEOT.
> >
> > -matt
>
> This was just a thought, because you or Justin said that
> the driver does not fail writes at the LEOF, which means
> that unless you are doing something special in your
> tpt, it is not stopping at the LEOF.

Yes, it does provide a signifier. At the end of one operation that has
the check condition that indicates early warning:

                } else if (sense->flags & SSD_EOM) {
                        softc->flags |= SA_FLAG_EOM_PENDING;

and

        SA_FLAG_ERR_PENDING     = (SA_FLAG_EOM_PENDING|SA_FLAG_EIO_PENDING|
                                   SA_FLAG_EOF_PENDING),

and at the start of an I/O:

                } else if ((softc->flags & SA_FLAG_ERR_PENDING) != 0) {
		....
                        bp->b_resid = bp->b_bcount;
			...
                        if ((softc->flags & SA_FLAG_EOM_PENDING) != 0) {
                                /*
                                 * We now just clear errors in this case
                                 * and let the residual be the notifier.
                                 */
                                bp->b_error = 0;

The signifier here back to the user application is a write returning
less than the requested amount.
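
For illustration only, here is a minimal userland sketch of how an
application might act on that signifier. The names classify_write,
write_record and the tape_write_status enum are hypothetical and are
not taken from the sa(4) driver or from Bacula. Treating -1 with
errno == ENOSPC as end of medium is an assumption based on the
behaviour reported elsewhere in this thread.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch only. */
enum tape_write_status { TAPE_WRITE_OK, TAPE_WRITE_EOM, TAPE_WRITE_ERROR };

/*
 * Classify the result of one write(2) call.
 * n   - value returned by write()
 * len - number of bytes that were requested
 * err - errno captured right after the call (only used when n < 0)
 */
static enum tape_write_status
classify_write(ssize_t n, size_t len, int err)
{
    if (n < 0) {
        /* Assumption: ENOSPC is how a hard end of medium shows up. */
        return (err == ENOSPC) ? TAPE_WRITE_EOM : TAPE_WRITE_ERROR;
    }
    if ((size_t)n < len) {
        /* Residual left over: the early-warning notifier. */
        return TAPE_WRITE_EOM;
    }
    return TAPE_WRITE_OK;
}

/* Write one record and report what the return value means. */
static enum tape_write_status
write_record(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    return classify_write(n, len, errno);
}
```

A caller would stop writing to the current volume as soon as
write_record() returns TAPE_WRITE_EOM, rather than continuing until a
hard error.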


>
> One thought that I had is: the fact that Bacula backs
> up at the EOM to re-read the last record could cause
> some problems.  I've asked Dan if he will re-run the
> Bacula backup/restore test but with the re-read disabled.
> As someone said, this will give one more data point.

Yes.


>
> Another interesting test would be to see if the same
> data loss occurs in a situation where a tape size is
> specified such that Bacula stops writing before the
> EOM on the first tape.

That too.

-matt

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 18:37:42 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2994637B427
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 18:37:42 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8DA9343FA3
	for <freebsd-scsi@freebsd.org>; Mon,  2 Jun 2003 18:37:41 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 79ECF3F4F; Mon,  2 Jun 2003 21:37:40 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Kern Sibbald <kern@sibbald.com>
Date: Mon, 02 Jun 2003 21:37:40 -0400
MIME-Version: 1.0
Message-ID: <3EDBC3A4.174.CAD088EC@localhost>
Priority: normal
References: <20030602145421.D71034@beppo>
In-reply-to: <1054593075.13606.28.camel@rufus>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 01:37:42 -0000

On 3 Jun 2003 at 0:31, Kern Sibbald wrote:

> One thought that I had is: the fact that Bacula backs
> up at the EOM to re-read the last record could cause
> some problems.  I've asked Dan if he will re-run the
> Bacula backup/restore test but with the re-read disabled.
> As someone said, this will give one more data point.

The test results will be available by noon EST on Tuesday.
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Mon Jun  2 23:56:39 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 32E1537B401
	for <freebsd-scsi@FreeBSD.org>; Mon,  2 Jun 2003 23:56:39 -0700 (PDT)
Received: from rootlabs.com (root.org [67.118.192.226])
	by mx1.FreeBSD.org (Postfix) with SMTP id BEA7843F3F
	for <freebsd-scsi@FreeBSD.org>; Mon,  2 Jun 2003 23:56:38 -0700 (PDT)
	(envelope-from nate@rootlabs.com)
Received: (qmail 22077 invoked by uid 1000); 3 Jun 2003 06:56:41 -0000
Date: Mon, 2 Jun 2003 23:56:41 -0700 (PDT)
From: Nate Lawson <nate@root.org>
To: Aniruddha Bohra <bohra@cs.rutgers.edu>
In-Reply-To: <3ED3CCFF.4080507@cs.rutgers.edu>
Message-ID: <20030602235514.J22029@root.org>
References: <3ED3CCFF.4080507@cs.rutgers.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@FreeBSD.org
Subject: Re: Emulating a SCSI device
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 06:56:39 -0000

On Tue, 27 May 2003, Aniruddha Bohra wrote:
>     I am trying to write a SIM module for FreeBSD which basically
> emulates a SCSI controller with a disk attached at target 0 lun 0.

What is your hardware?  Are you using sys/cam/scsi/scsi_target and
src/share/examples/scsi_target?

>     I go as far as the action function of the controller getting called
> with a XPT_PATH_INQ - where I fill in the fake data.
>
>     Nothing happens after that. I have looked for documentation
> of how to get the pseudo disk attached to the da driver but did
> not make much headway.

You have to call xpt_done() on the CCB to send it back to the caller.

>     My question is : How do I get the da or any SCSI peripheral
> driver attach to the emulated disk.
>
>     I would appreciate any help or pointers.

More information is needed (see questions above).

-Nate

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 00:28:49 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5DD8937B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 00:28:49 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CE29B43FBD
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 00:28:47 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h537Sbv08582;
	Tue, 3 Jun 2003 09:28:38 +0200
From: Kern Sibbald <kern@sibbald.com>
To: cer@mirapoint.com
In-Reply-To: <ca22a600.8ddeac19.8172000@alpo.mirapoint.com>
References: <ca22a600.8ddeac19.8172000@alpo.mirapoint.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054625317.13630.68.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 09:28:37 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 07:28:49 -0000

Concerning the maximum buffer size: I have chosen
the default maximum buffer size to be 64512 bytes so
that it is smaller than 65536. In fact, 64512 bytes is
the size (126 blocks) that I used for tar in 1982 
and never had any problems.  

>From what I understand the 65536 point at which 
buffers are always split only applies to devices in
fixed block mode, and probably older devices at that.

Though Bacula can run in fixed block mode, the
default is variable block, so I don't see that as
an issue here -- unless I am missing something?

Can you explain why you mention 61440 bytes, and
why it might be a better choice than 64512?

On aligning the buffers on a page boundary: interesting
idea, I'll look into it, but I'm not too keen on the
idea. 

Best regards,

Kern

On Tue, 2003-06-03 at 00:42, Carl Reisinger wrote:
> While your current problem may not be related to the
> tendency of physio to silently break up writes (and reads), I
> believe you should revisit your code and make sure all buffers
> are on a page boundary, especially since your writes are over
> 61440 bytes in size. And the max buffer size should be limited
> to 65536 (otherwise all writes will be split).
> 
> Doing this now will prevent surprises in the future.
> 
> Carl Reisinger
> Mirapoint
> 
> >
> >Most of Bacula writes are 64512 bytes, and all the
> >data that was lost consisted of blocks of 64512 bytes.
> >
> >> 
> >> But I sorta doubt that this was the issue.
> >> 
> >> A buddy of mine at Mirapoint did just remind me that physio can silently
> >> break up xfers that are even less than 64k if the buffer isn't page
> >> aligned- I'd forgotten about that. But I'm not sure that this is what is
> >> occurring.
> >
> >The buffers are 64 bit aligned but not page aligned.
> >

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 06:07:40 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6E31F37B404
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 06:07:40 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1CC9B43FBF
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 06:07:38 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53D6uv09487;
	Tue, 3 Jun 2003 15:06:57 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030602131225.F71034@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
Content-Type: text/plain
Organization: 
Message-Id: <1054645616.13630.161.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 15:06:56 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 13:07:40 -0000

Hello,

Dan has now re-run our test of writing to two tapes. In
this test, he told Bacula not to attempt to re-read the
last block written, so Bacula wrote until -1 with errno=ENOSPC
was returned, wrote two EOF marks then put up
the next volume.

The results were the same (more or less): 12 blocks of
data were lost, which corresponds to the smaller size
of the restored file that was split across two tapes.

These 12 blocks were also at the end of the tape.  

During the restore, Bacula reported the following:

03-Jun-2003 05:01 undef-sd: RestoreFiles.2003-06-03_04.36.59 Error:
Invalid block number. Expected 6060, got 6072

and in Bacula's database, Bacula indicates that blocks
0 to 6072 were written to the first tape. In fact, only
blocks 0 to 6071 were written to the first tape -- I
see that Bacula has included the failed block in its
count, which is wrong, but this doesn't change the results
at all.

Bottom line: 

Even when we eliminate the code that backs
up and re-reads the last block, we still see
the last 12 or 13 blocks being lost. They were
written by the program but are not physically 
on the tape.

Next step: 

Dan is now running a test where Bacula will stop
writing on the first tape before the EOM is reached.

Best regards,

Kern


From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 06:19:40 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6EC1337B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 06:19:40 -0700 (PDT)
Received: from sift.mirapoint.com (sift.mirapoint.com [63.107.133.19])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D141843FCB
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 06:19:39 -0700 (PDT)
	(envelope-from cer@mirapoint.com)
Received: from alpo.mirapoint.com (alpo.mirapoint.com [63.107.133.20])
	by sift.mirapoint.com (Mirapoint Messaging Server MOS 3.3.5-GR)
	with ESMTP id ABV00096;
	Tue, 3 Jun 2003 06:19:35 -0700 (PDT)
Received: from 207.135.76.118
	by alpo.mirapoint.com (Mirapoint Messaging Server MOS 3.3.5-GR)
	with HTTPS/1.1;
	Tue, 3 Jun 2003 06:19:35 -0700
Date: Tue, 3 Jun 2003 06:19:35 -0700
From: Carl Reisinger <cer@mirapoint.com>
To: Kern Sibbald <kern@sibbald.com>
X-Mailer: Webmail Mirapoint Direct 3.3.5-GR
MIME-Version: 1.0
Message-Id: <ace3a3c.8e2ef894.8172000@alpo.mirapoint.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: cer@mirapoint.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 13:19:40 -0000

>Concerning the maximum buffer size: I have chosen
>the default maximum buffer size to be 64512 bytes so
>that it is smaller than 65536. In fact, 64512 bytes is
>the size (126 blocks) that I used for tar in 1982 
>and never had any problems.  

Try using the FreeBSD tar with the multi-volume flag (-M) and
your record size.

Without the flag the writes are page aligned, with the flag
the writes are offset some, either 512 or 1536 bytes (I forget
which), and the writes will be split by the kernel physio
function into a 60K and 3K write. (This is with the tar
shipped with FreeBSD up to at least 4.2. Later ones may also
do this, I have not tried them)

>
>>From what I understand the 65536 point at which 
>buffers are always split only applies to devices in
>fixed block mode, and probably older devices at that.

This magic number has nothing to do with the device. I've only
used variable block mode and newer technologies, SDLT, LTO.

>
>Though Bacula can run in fixed block mode, the
>default is variable block, so I don't see that as
>an issue here -- unless I am missing something?
>
>Can you explain why you mention 61440 bytes?  and
>why it might be a better choice than 64512?
>

61440 was mentioned since that is the largest write that can
be done without the physio function doing some surprising and
annoying things to your write. 61440 is the size that, no
matter its address alignment, can always be mapped with one
page register.

If you are careful to page align all writes then you can write
up to 65536 and have one record sent to the tape device.

(Actually, with a minor change to scsi_sa.c and limiting
oneself to newer SCSI HBAs you can go as high as 128KB for
read/write)

An example:

Write 64512 bytes with a starting address of 4096. Physio will
take this, see that the address is page aligned, check that
it can be mapped with one page register and perform one write.

Now let's write 64512 bytes but with an address of 5632. In
this case physio will notice it is not page aligned and
adjust the starting address to be 4096. Now 66048 bytes need
to be mapped which exceeds the default size of 65536. In this
case physio will map the first 60K (64K to him because of the
starting address change), write that and then map and write
the remainder.

Now when one goes back to read 64512 bytes, the first read
returns 61440 bytes and the second 3072 instead of just one
read returning 64512.
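
The arithmetic in this example can be checked with a small sketch.
This is only a model of the behaviour described above, not the
kernel's physio code. PAGE_SZ, MAX_MAP and first_chunk are
hypothetical names, and the 4096-byte page and 65536-byte mapping
limit are the values assumed in the example.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u    /* assumed page size */
#define MAX_MAP 65536u   /* assumed default mapping limit */

/*
 * Model of the split described above: return how many bytes go out
 * in the first write for a transfer of len bytes starting at addr.
 */
static size_t
first_chunk(uintptr_t addr, size_t len)
{
    /* Distance from the start of the page to the buffer. */
    size_t offset = (size_t)(addr % PAGE_SZ);

    /* The whole transfer fits in one mapping: no split. */
    if (offset + len <= MAX_MAP)
        return len;

    /*
     * Too big for one mapping. An unaligned buffer loses one page
     * of the mapping, so the first chunk is 60K instead of 64K.
     */
    return (offset != 0) ? (size_t)(MAX_MAP - PAGE_SZ) : (size_t)MAX_MAP;
}
```

With these assumptions first_chunk(4096, 64512) is 64512 (one
write), while first_chunk(5632, 64512) is 61440, leaving 3072 bytes
for the second write. A 61440-byte transfer is never split, because
even at the worst-case offset of 4095 it needs only 65535 bytes of
mapping.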

>On aligning the buffers on a page boundary: interesting
>idea, I'll look into it, but I'm not too keen on the
>idea. 
>

If your software has no problem with short reads and records
being split into two, then don't bother page aligning.
But, if you want to read exactly what you know you wrote then
alignment is a must.
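
One possible way to get a page-aligned buffer from userland is
sketched below, assuming a system that provides POSIX sysconf() and
posix_memalign(). The function name alloc_tape_buffer is
hypothetical. The page size is queried at run time rather than
hard-coded, since it differs between platforms.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate len bytes starting on a page boundary.
 * Returns NULL on failure; release the buffer with free().
 */
static void *
alloc_tape_buffer(size_t len)
{
    void *buf = NULL;
    long page = sysconf(_SC_PAGESIZE);  /* page size in bytes */

    if (page <= 0)
        return NULL;

    /* posix_memalign() returns 0 on success, an error number otherwise. */
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;

    return buf;
}
```

A 64512-byte record buffer obtained this way starts at offset 0
within its page, so the buffer plus its offset stays under 65536.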

Carl

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 06:37:46 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5F9A837B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 06:37:46 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8E06543F3F
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 06:37:44 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53Dbev09582;
	Tue, 3 Jun 2003 15:37:40 +0200
From: Kern Sibbald <kern@sibbald.com>
To: cer@mirapoint.com
In-Reply-To: <ace3a3c.8e2ef894.8172000@alpo.mirapoint.com>
References: <ace3a3c.8e2ef894.8172000@alpo.mirapoint.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054647459.13630.189.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 15:37:40 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 13:37:46 -0000

Thanks for the lesson in how blocks are written to
a tape -- especially the example.
I'm now leaning strongly toward aligning
my buffers.  However a couple more questions please.

- When using tar (or say Bacula), how do you know
  your writes are split by the kernel?  In the case
  of Bacula, with the buffer size I use, it ALWAYS
  gets back exactly what it wrote. From my userland
  perspective I see no double writes.

- What is the "page" that you are referring to?
  Paged memory?  If I am not mistaken the page
  size can be radically different depending on the
  OS and hardware. I.e. 1024 to 4096 or even more.

- How does one determine what a page size is,
  preferably in a system independent way?  

Thanks,

Kern

On Tue, 2003-06-03 at 15:19, Carl Reisinger wrote:
> >Concerning the maximum buffer size: I have chosen
> >the default maximum buffer size to be 64512 bytes so
> >that it is smaller than 65536. In fact, 64512 bytes is
> >the size (126 blocks) that I used for tar in 1982 
> >and never had any problems.  
> 
> Try using the FreeBSD tar with the multi-volume flag (-M) and
> your record size.
> 
> Without the flag the writes are page aligned, with the flag
> the writes are offset some, either 512 or 1536 bytes (I forget
> which), and the writes will be split by the kernel physio
> function into a 60K and 3K write. (This is with the tar
> shipped with FreeBSD up to at least 4.2. Later ones may also
> do this, I have not tried them)
> 
> >
> >>From what I understand the 65536 point at which 
> >buffers are always split only applies to devices in
> >fixed block mode, and probably older devices at that.
> 
> This magic number has nothing to do with the device. I've only
> used variable block mode and newer technologies, SDLT, LTO.
> 
> >
> >Though Bacula can run in fixed block mode, the
> >default is variable block, so I don't see that as
> >an issue here -- unless I am missing something?
> >
> >Can you explain why you mention 61440 bytes?  and
> >why it might be a better choice than 64512?
> >
> 
> 61440 was mentioned since that is the largest write that can
> be done without the physio function doing some surprising and
> annoying things to your write. 61440 is the size that, no
> matter its address alignment, can always be mapped with one
> page register.
> 
> If you are careful to page align all writes then you can write
> up to 65536 and have one record sent to the tape device.
> 
> (Actually, with a minor change to scsi_sa.c and limiting one
> self to newer SCSI HBAs you can go as high as 128KB for
> read/write)
> 
> An example:
> 
> Write 64512 bytes with a starting address of 4096. Physio will
> take this, see that the address is page aligned, check that
> it can be mapped with one page register and perform one write.
> 
> Now let's write 64512 bytes but with an address of 5632. In
> this case physio will notice it is not page aligned and
> adjust the starting address to be 4096. Now 66048 bytes need
> to be mapped which exceeds the default size of 65536. In this
> case physio will map the first 60K (64K to him because of the
> starting address change), write that and then map and write
> the remainder.
> 
> Now when one goes back to read 64512 bytes, the first read
> returns 61440 bytes and the second 3072 instead of just one
> read returning 64512.
> 
> >On aligning the buffers on a page boundary: interesting
> >idea, I'll look into it, but I'm not too keen on the
> >idea. 
> >
> 
> If your software has no problem with short reads and records
> being split into two, then don't bother page aligning.
> But, if you want to read exactly what you know you wrote then
> alignment is a must.
> 
> Carl
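
The split Carl describes can be modelled in a few lines of C. This is
only a sketch of the behaviour as stated in this thread (4096-byte
pages, a 65536-byte mapping limit, one page given up when the buffer
is not page aligned); it is not the kernel's physio code, and the
names MODEL_PAGE_SIZE, MODEL_MAX_MAP, first_chunk and count_chunks
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* x86 page size quoted in the thread */
#define MODEL_MAX_MAP   65536u  /* default mapping limit quoted in the thread */

/* Size of the first transfer issued for `len` bytes at user address
 * `addr`, under the model above (an assumption, not kernel source). */
static size_t first_chunk(uintptr_t addr, size_t len)
{
    size_t offset = (size_t)(addr % MODEL_PAGE_SIZE); /* bytes into the page */

    assert(len > 0);
    if (offset + len <= MODEL_MAX_MAP)
        return len;                 /* fits in one mapping: one record */
    /* Does not fit: an unaligned buffer loses one page of the limit. */
    return offset != 0 ? MODEL_MAX_MAP - MODEL_PAGE_SIZE : MODEL_MAX_MAP;
}

/* Number of separate transfers (tape records) the write turns into. */
static unsigned count_chunks(uintptr_t addr, size_t len)
{
    unsigned n = 0;

    while (len > 0) {
        size_t c = first_chunk(addr, len);
        addr += c;
        len -= c;
        n++;
    }
    return n;
}
```

Under these assumptions first_chunk(5632, 64512) is 61440, leaving
3072 bytes for a second record, while the same length at address 4096
goes out as a single record. A 61440-byte write stays in one piece at
any alignment, which is why that size was suggested.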

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 07:01:58 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C58B137B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 07:01:58 -0700 (PDT)
Received: from sift.mirapoint.com (sift.mirapoint.com [63.107.133.19])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0146043F93
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 07:01:58 -0700 (PDT)
	(envelope-from cer@mirapoint.com)
Received: from alpo.mirapoint.com (alpo.mirapoint.com [63.107.133.20])
	by sift.mirapoint.com (Mirapoint Messaging Server MOS 3.3.5-GR)
	with ESMTP id ABV00329;
	Tue, 3 Jun 2003 07:01:53 -0700 (PDT)
Received: from 207.135.76.118
	by alpo.mirapoint.com (Mirapoint Messaging Server MOS 3.3.5-GR)
	with HTTPS/1.1;
	Tue, 3 Jun 2003 07:01:53 -0700
Date: Tue, 3 Jun 2003 07:01:53 -0700
From: Carl Reisinger <cer@mirapoint.com>
To: Kern Sibbald <kern@sibbald.com>
X-Mailer: Webmail Mirapoint Direct 3.3.5-GR
MIME-Version: 1.0
Message-Id: <a2258404.8e32d868.8285000@alpo.mirapoint.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: cer@mirapoint.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 14:01:59 -0000

>
>- When using tar (or say Bacula), how do you know
>  your writes are split by the kernel?

One of my pet peeves, you don't know. Physio just blindly does
this, which I believe is fine for a disk but can be quite
annoying when applied to a tape device. Makes me wish for a
physio that works like the USG and Research V7 versions which
would just fail the write if it could not perform it in one
shot (showing my age here).

  In the case
>  of Bacula, with the buffer size I use, it ALWAYS
>  gets back exactly what it wrote. From my userland
>  perspective I see no double writes.

Then, so far, your buffers have been correctly aligned. If you
use malloc they will be (check the malloc man page, this is
mentioned in the first paragraph).

>
>- What is the "page" that you are referring to?
>  Paged memory?  If I am not mistaken the page
>  size can be radically different depending on the
>  OS and hardware. I.e. 1024 to 4096 or even more.
>

Here's where I fall down. The page register I keep mentioning
is the x86 MMU registers, I really do not know their proper
name in the x86 world. One register can map up to 64K but must be
page (4096 byte) aligned.

>- How does one determine what a page size is,
>  preferably in a system independent way?  
>

I don't know how widespread this is, but check out
getpagesize(3).

If you count system include headers as system independent then
PAGE_SIZE can be used (it's in param.h).

If you start messing around with the max write size then
DFLTPHYS can be used as the limit.

Carl
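
For reference, a minimal sketch of querying the page size at run time
and obtaining a page-aligned buffer. It assumes a current POSIX
system: sysconf(_SC_PAGESIZE) is used here in place of the
getpagesize(3) call mentioned above, and posix_memalign() may not be
available on older releases. The helper names page_size,
alloc_page_aligned and is_page_aligned are illustrative only.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Ask the running system for its page size instead of hard-coding 4096. */
static size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);

    assert(ps > 0);
    return (size_t)ps;
}

/* Allocate `len` bytes starting on a page boundary.
 * Returns NULL on failure; release the buffer with free(). */
static void *alloc_page_aligned(size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, page_size(), len) != 0)
        return NULL;
    return p;
}

/* Non-zero when `p` sits exactly on a page boundary. */
static int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % page_size()) == 0;
}
```

A tape program could allocate its record buffer once with
alloc_page_aligned(64512) and check is_page_aligned() on the pointer
it actually hands to write(), since an offset into an aligned buffer
is no longer aligned.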

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 07:34:52 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BD96A37B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 07:34:52 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 99D7843FA3
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 07:34:51 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h53EYoqw044921;
	Tue, 3 Jun 2003 07:34:50 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Tue, 3 Jun 2003 07:34:49 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054645616.13630.161.camel@rufus>
Message-ID: <20030603072944.U44880@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo> <1054645616.13630.161.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 14:34:53 -0000


The fact that you're getting ENOSPC means that you're getting to PEOT-
past LEOT. I guess I need to see the Bacula source to see why LEOT is
being missed. If you can build a kernel with CAMDEBUG and run

	camcontrol debug -I b:t:l

(bus:target:lun for the tape) and rerun the test, you'll get boatloads
of output, but an audit trail of what sastart and saerror are doing
around the PEOT timeframe.

There's other stuff here that I need to collect my thoughts on to mail
about. This will happen later today.

On Tue, 3 Jun 2003, Kern Sibbald wrote:

> Hello,
>
> Dan has now re-run our test of writing to two tapes. In
> this test, he told Bacula not to attempt to re-read the
> last block written, so Bacula wrote until -1 with errno=ENOSPC
> was returned, wrote two EOF marks then put up
> the next volume.
>
> The results were the same (more or less) 12 blocks of
> data were lost, which corresponds to the smaller size
> of the restored file that was split across two tapes.
>
> These 12 blocks were also at the end of the tape.
>
> During the restore, Bacula reported the following:
>
> 03-Jun-2003 05:01 undef-sd: RestoreFiles.2003-06-03_04.36.59 Error:
> Invalid block number. Expected 6060, got 6072
>
> and in Bacula's database, Bacula indicates that blocks
> 0 to 6072 were written to the first tape. In fact, only
> blocks 0 to 6071 were written to the first tape -- I
> see that Bacula has included the failed block in its
> count, which is wrong, but this doesn't change the results
> at all though.
>
> Bottom line:
>
> Even when we eliminate the code that backs
> up and re-reads the last block, we still see
> the last 12 or 13 blocks being lost. They were
> written by the program but are not physically
> on the tape.
>
> Next step:
>
> Dan is now running a test where Bacula will stop
> writing on the first tape before the EOM is reached.
>
> Best regards,
>
> Kern
>
>
>
>
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 07:52:17 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6771437B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 07:52:17 -0700 (PDT)
Received: from aslan.scsiguy.com (mail.scsiguy.com [63.229.232.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5159243F85
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 07:52:16 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106])
	by aslan.scsiguy.com (8.12.8/8.12.8) with ESMTP id h53EpxIh033249;
	Tue, 3 Jun 2003 08:51:59 -0600 (MDT)
	(envelope-from gibbs@scsiguy.com)
Date: Tue, 03 Jun 2003 08:51:59 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>, mjacob@feral.com
Message-ID: <3490610000.1054651919@aslan.scsiguy.com>
In-Reply-To: <1054645616.13630.161.camel@rufus>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>	
	<20030602131225.F71034@beppo> <1054645616.13630.161.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 14:52:17 -0000

> Hello,
> 
> Dan has now re-run our test of writing to two tapes. In
> this test, he told Bacula not to attempt to re-read the
> last block written, so Bacula wrote until -1 with errno=ENOSPC
> was returned, wrote two EOF marks then put up
> the next volume.

Bacula is supposed to start the process of a tape change as soon
as the amount written is less than what you intended to write.
Ignoring the short write and waiting until you hit ENOSPC guarantees
you will hit PEOM, since the LEOM is only reported once.  The tape
driver expects that you know what you are doing if you go on writing.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 08:05:19 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 69FC137B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 08:05:19 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 955FE43FA3
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 08:05:17 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53F4cv09992;
	Tue, 3 Jun 2003 17:04:38 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030603072944.U44880@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
Content-Type: text/plain
Organization: 
Message-Id: <1054652678.13630.209.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 17:04:38 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 15:05:19 -0000

What is clear from the output is that the write()
is returning a -1 status. errno could possibly be 0,
in which case I set it to ENOSPC, if it is not 0
then it is ENOSPC judging by the error message that
is printed "Write error on device ...".

You may want to see more, but here is the basic code
that does the write:
   if ((uint32_t)(stat=write(dev->fd, block->buf, (size_t)wlen)) != wlen) {
      /* We should check for errno == ENOSPC, BUT many 
       * devices simply report EIO when it is full.
       * with a little more thought we may be able to check
       * capacity and distinguish real errors and EOT
       * conditions.  In any case, we probably want to
       * simulate an End of Medium.
       */
      clrerror_dev(dev, -1);

      if (dev->dev_errno == 0) {
         dev->dev_errno = ENOSPC;        /* out of space */
      }

      Dmsg4(10, "=== Write error. size=%u rtn=%d  errno=%d: ERR=%s\n", 
         wlen, stat, dev->dev_errno, strerror(dev->dev_errno));

      if (stat == -1) {
         Jmsg(jcr, M_ERROR, 0, _("Write error on device %s. ERR=%s.\n"),
            dev->dev_name, strerror(dev->dev_errno));
      } else {
         Jmsg3(jcr, M_INFO, 0, _("End of medium on device %s. Write of
%u bytes got %d
            dev->dev_name, wlen, stat);
      }  
      block->write_failed = true;
      dev->EndBlock = dev->block_num;
      dev->EndFile  = dev->file;
      weof_dev(dev, 1);               /* end the tape */
      weof_dev(dev, 1);               /* write second eof */
      dev->state |= (ST_EOF | ST_EOT | ST_WEOT);


=======
clrerror_dev() does:
void
clrerror_dev(DEVICE *dev, int func)
{
   char *msg = NULL;

   dev->dev_errno = errno;         /* save errno */
   if (errno == EIO) {
      dev->VolCatInfo.VolCatErrors++;
   }

   if (!(dev->state & ST_TAPE)) {
      return;
   }
   if (errno == ENOTTY || errno == ENOSYS) { /* Function not implemented */
   switch (func) {
      case -1:
         Emsg0(M_ABORT, 0, "Got ENOTTY on read/write!\n");
         break;
      case MTWEOF:
         msg = "WTWEOF";
         dev->capabilities &= ~CAP_EOF; /* turn off feature */
         break;
#ifdef MTEOM
      case MTEOM:
         msg = "WTEOM";
         dev->capabilities &= ~CAP_EOM; /* turn off feature */
         break;
#endif 
      case MTFSF:
         msg = "MTFSF";
         dev->capabilities &= ~CAP_FSF; /* turn off feature */
         break;
      case MTBSF:
         msg = "MTBSF";
         dev->capabilities &= ~CAP_BSF; /* turn off feature */
         break;
      case MTFSR:
         msg = "MTFSR";
         dev->capabilities &= ~CAP_FSR; /* turn off feature */
         break;
      case MTBSR:
         msg = "MTBSR";
         dev->capabilities &= ~CAP_BSR; /* turn off feature */
         break;
      default:
         msg = "Unknown";
         break;
      }
      if (msg != NULL) {
         dev->dev_errno = ENOSYS;
         Mmsg1(&dev->errmsg, _("This device does not support %s.\n"), msg);
         Emsg0(M_ERROR, 0, dev->errmsg);
      }
   }
/* Found on Linux */
#ifdef MTIOCLRERR
{
   struct mtop mt_com;
   mt_com.mt_op = MTIOCLRERR;
   mt_com.mt_count = 1;
   /* Clear any error condition on the tape */
   ioctl(dev->fd, MTIOCTOP, (char *)&mt_com);
   Dmsg0(200, "Did MTIOCLRERR\n");
}
#endif

/* Typically on FreeBSD */
#ifdef MTIOCERRSTAT
{
   /* Read and clear SCSI error status */
   union mterrstat mt_errstat;
   Pmsg2(000, "Doing MTIOCERRSTAT errno=%d ERR=%s\n", dev->dev_errno,
      strerror(dev->dev_errno));
   ioctl(dev->fd, MTIOCERRSTAT, (char *)&mt_errstat);
}
#endif
}

==== 
 

On Tue, 2003-06-03 at 16:34, Matthew Jacob wrote:
> The fact that you're getting ENOSPC means that you're getting to PEOT-
> past LEOT. I guess I need to see the Bacula source to see why LEOT is
> being missed. If you can build a kernel with CAMDEBUG and run
> 
> 	camcontrol debug -I b:t:l
> 
> (bus:target:lun for the tape) and rerun the test, you'll get boatloads
> of output, but an audit trail of what sastart and saerror are doing
> around the PEOT timeframe.
> 
> There's other stuff here that I need to collect my thoughts on to mail
> about. This will happen later today.
> 
> On Tue, 3 Jun 2003, Kern Sibbald wrote:
> 
> > Hello,
> >
> > Dan has now re-run our test of writing to two tapes. In
> > this test, he told Bacula not to attempt to re-read the
> > last block written, so Bacula wrote until -1 with errno=ENOSPC
> > was returned, wrote two EOF marks then put up
> > the next volume.
> >
> > The results were the same (more or less) 12 blocks of
> > data were lost, which corresponds to the smaller size
> > of the restored file that was split across two tapes.
> >
> > These 12 blocks were also at the end of the tape.
> >
> > During the restore, Bacula reported the following:
> >
> > 03-Jun-2003 05:01 undef-sd: RestoreFiles.2003-06-03_04.36.59 Error:
> > Invalid block number. Expected 6060, got 6072
> >
> > and in Bacula's database, Bacula indicates that blocks
> > 0 to 6072 were written to the first tape. In fact, only
> > blocks 0 to 6071 were written to the first tape -- I
> > see that Bacula has included the failed block in its
> > count, which is wrong, but this doesn't change the results
> > at all though.
> >
> > Bottom line:
> >
> > Even when we eliminate the code that backs
> > up and re-reads the last block, we still see
> > the last 12 or 13 blocks being lost. They were
> > written by the program but are not physically
> > on the tape.
> >
> > Next step:
> >
> > Dan is now running a test where Bacula will stop
> > writing on the first tape before the EOM is reached.
> >
> > Best regards,
> >
> > Kern
> >
> >
> >
> >
> >
> >

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 08:12:16 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6080137B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 08:12:16 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id BCE5C43FA3
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 08:12:14 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53FBlv10009;
	Tue, 3 Jun 2003 17:11:47 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <3490610000.1054651919@aslan.scsiguy.com>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>	 <1054645616.13630.161.camel@rufus>
	<3490610000.1054651919@aslan.scsiguy.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054653106.13606.217.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 17:11:47 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 15:12:16 -0000

On Tue, 2003-06-03 at 16:51, Justin T. Gibbs wrote:
> > Hello,
> > 
> > Dan has now re-run our test of writing to two tapes. In
> > this test, he told Bacula not to attempt to re-read the
> > last block written, so Bacula wrote until -1 with errno=ENOSPC
> > was returned, wrote two EOF marks then put up
> > the next volume.
> 
> Bacula is supposed to start the process of a tape change as soon
> as the amount written is less than what you intended to write.


This is exactly what it does. *Every* time the requested write 
size does not agree with the returned value, Bacula gives 
up on the tape.  My last email has the code that does that.

My email above was not very clear because I was telling you what
happened in the particular case of loss of data (the -1 and errno=0
or errno=ENOSPC I don't know which). As noted here, Bacula *will*
stop writing if the driver returns a short block (assuming my
code isn't broken), but I have never seen that case on FreeBSD.

> Ignoring the short write and waiting until you hit ENOSPC guarantees
> you will hit PEOM, since the LEOM is only reported once.  The tape
> driver expects that you know what you are doing if you go on writing.

The only additional writing Bacula does (unless I am missing something)
is the two EOF marks.


From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 09:03:39 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5E39D37B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:03:39 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 71EA543F3F
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:03:38 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h53G3aqw046246;
	Tue, 3 Jun 2003 09:03:36 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Tue, 3 Jun 2003 09:03:36 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054653106.13606.217.camel@rufus>
Message-ID: <20030603084701.U24586@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo>  <1054645616.13630.161.camel@rufus> 
	<1054653106.13606.217.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 16:03:39 -0000

>
> This is exactly what it does. *Every* time the requested write
> size does not agree with the returned value, Bacula gives
> up on the tape.  My last email has the code that does that.
>
> My email above was not very clear because I was telling you what
> happened in the particular case of loss of data (the -1 and errno=0
> or errno=ENOSPC I don't know which). As noted here, Bacula *will*
> stop writing if the driver returns a short block (assuming my
> code isn't broken), but I have never seen that case on FreeBSD.

That's really weird. I have to look at this closer. I've had some
drives not report LEOT at all, but since tape_pattern_tester didn't
complain on the same drive you were using, I know tape_pattern_tester is
in fact stopping at LEOT.

write(2) isn't necessarily returning -1. It may be returning 0- which
means that no data moved.

I think the ENOSPC as you report is a red herring because you're setting
this value- unless you actually *did* see -1 returned from write(2) and
ENOSPC set in errno.

In any case, even if you hit PEOT instead of LEOT, you shouldn't *lose*
data. If you hit PEOT, we have to return -1/ENOSPC. Because this is Unix
or Linux or Solaris instead of a reasonable and modern OS, like RSX, VMS
or NT, which allow you to give realistic details to failures in I/O
requests, this means you have no way of telling the user application how
much was *actually* written when you hit *PEOT* (not LEOT, note!). As
far as the user application is concerned, *no* data was written at all
for this last write.

But there may in fact be data on the tape media. What is particularly
annoying in the PEOT case is that your application probably asked for
the next tape and rewrote all the blocks from the failed write. This is
fine, but you have to make damned sure then on rereading the data later
that you can handle duplicate blocks because you may read blocks NOPQR
on tapeA and then switch to tapeB and read blocks OPQR again on tapeB.

I don't think this is your problem here, but I thought I'd have a
pre-coffee diatribe about it. Grump.


>
> > Ignoring the short write and waiting until you hit ENOSPC guarantees
> > you will hit PEOM, since the LEOM is only reported once.  The tape
> > driver expects that you know what you are doing if you go on writing.
>
> The only additional writing Bacula does (unless I am missing something)
> is the two EOF marks.

This is one of the things that's bothering me. You shouldn't be writing
extra marks if you actually close the device. I'd like to look over all
the current Bacula source, but sourceforge is offline at the moment.


-matt
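
The duplicate-block case above (NOPQR on tapeA, then OPQR again on
tapeB) can be handled on the read side roughly as follows. This is a
sketch that assumes every block carries a strictly increasing block
number, which the thread suggests Bacula has but does not spell out;
drop_duplicate_blocks is a hypothetical helper, not Bacula code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy block numbers from `in` to `out`, skipping any block whose
 * number is not greater than the last one accepted. `out` must have
 * room for `n` entries. Returns how many blocks were kept. */
static size_t drop_duplicate_blocks(const uint32_t *in, size_t n,
                                    uint32_t *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Accept the first block, then only blocks that move forward. */
        if (kept == 0 || in[i] > out[kept - 1])
            out[kept++] = in[i];
    }
    return kept;
}
```

Fed the numbers 14 15 16 17 18 from the first volume followed by
15 16 17 18 19 from the second, it keeps 14 through 19 once each. A
real reader would compare the number in each block header as it
arrives rather than work on an array.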

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 09:10:56 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DAE4337B404
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:10:56 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DA3BF43F3F
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:10:55 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 06FC23F4F; Tue,  3 Jun 2003 12:10:55 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Tue, 03 Jun 2003 12:10:54 -0400
MIME-Version: 1.0
Message-ID: <3EDC904E.3272.CDF011AF@localhost>
Priority: normal
References: <1054653106.13606.217.camel@rufus>
In-reply-to: <20030603084701.U24586@wonky.in0.lcl>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 16:10:57 -0000

On 3 Jun 2003 at 9:03, Matthew Jacob wrote:

> This is one of the things that's bothering me. You shouldn't be writing
> extra marks if you actually close the device. I'd like to look over all
> the current Bacula source, but sourceforge is offline at the moment.

The code I'm working with is at:

http://www.freebsddiary.org/tmp/bacula-1.31cvs.tar.gz

I'm just modifying the existing FreeBSD port skeleton 
(sysutils/bacula) to use it:

# diff Makefile ../bacula/Makefile
5c5
< # $FreeBSD: ports/sysutils/bacula/Makefile,v 1.2 2003/05/08 15:52:16 demon Exp $
---
> # $FreeBSD: ports/sysutils/bacula/Makefile,v 1.3 2003/05/13 14:30:40 demon Exp $
9c9
< PORTVERSION=  1.31cvs
---
> PORTVERSION=  1.30a

and a make makesum will remove any distfile mismatch errors.
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 09:24:51 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E10C837B404
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:24:51 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4927943F85
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:24:51 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h53GJmZ24864;
	Tue, 3 Jun 2003 09:19:48 -0700
Received: from [10.100.253.70] (aslan.btc.adaptec.com [10.100.253.70])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id JAA07505;
	Tue, 3 Jun 2003 09:24:50 -0700 (PDT)
Date: Tue, 03 Jun 2003 10:25:30 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>, mjacob@feral.com
Message-ID: <882210000.1054657530@aslan.btc.adaptec.com>
In-Reply-To: <1054652678.13630.209.camel@rufus>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<577540000.1054579840@aslan.btc.adaptec.com> <20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
	<1054652678.13630.209.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Justin T. Gibbs" <gibbs@scsiguy.com>
X-List-Received-Date: Tue, 03 Jun 2003 16:24:52 -0000

> What is clear from the output is that the write()
> is returning a -1 status. errno could possibly be 0,
> in which case I set it to ENOSPC, if it is not 0
> then it is ENOSPC judging by the error message that
> is printed "Write error on device ...".
> 
> You may want to see more, but here is the basic code
> that does the write:
>    if ((uint32_t)(stat=write(dev->fd, block->buf, (size_t)wlen)) !=
> wlen) {
>       /* We should check for errno == ENOSPC, BUT many 
>        * devices simply report EIO when it is full.
>        * with a little more thought we may be able to check
>        * capacity and distinguish real errors and EOT
>        * conditions.  In any case, we probably want to
>        * simulate an End of Medium.
>        */
>       clrerror_dev(dev, -1);

Apart from the funny casting, the only obvious bug is that you
are expecting errno to be set on every syscall.  Errno is only
valid if stat == -1 or you explicitly clear it prior to the
syscall (or after the last time it was set).  You don't seem
to be doing that here.

See the errno man page for details.
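
A minimal sketch of the rule described above, assuming a hypothetical
helper named checked_write() (it is not part of Bacula, and the enum
names are illustrative only). errno is read only when write() returns
-1; a short write is reported separately, without consulting errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of one write attempt; the names are illustrative only. */
enum write_result { WRITE_OK, WRITE_SHORT, WRITE_FAILED };

/*
 * Hypothetical helper: write len bytes and classify the result.
 * *saved_errno is set only when write() returned -1, the one case
 * in which errno is meaningful after the call.
 */
static enum write_result
checked_write(int fd, const void *buf, size_t len, int *saved_errno)
{
        ssize_t n;

        assert(buf != NULL && saved_errno != NULL);
        *saved_errno = 0;

        n = write(fd, buf, len);
        if (n == -1) {
                *saved_errno = errno;   /* valid: the call failed */
                return (WRITE_FAILED);
        }
        if ((size_t)n != len)
                return (WRITE_SHORT);   /* errno is stale here; ignore it */
        return (WRITE_OK);
}
```

A caller would invoke its error-handling routine only in the
WRITE_FAILED branch, passing saved_errno rather than the global errno.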

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 09:41:09 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1537837B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:41:09 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5E5E343F85
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 09:41:07 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53GeWv10269;
	Tue, 3 Jun 2003 18:40:32 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <882210000.1054657530@aslan.btc.adaptec.com>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
	<1054652678.13630.209.camel@rufus>
	<882210000.1054657530@aslan.btc.adaptec.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054658432.13630.252.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 18:40:32 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Tue, 03 Jun 2003 16:41:09 -0000

Yes, I probably should move the clrerror() and the
check/set of errno inside the check for "stat == -1".
However, the code, though odd, is correct, since
I do not use errno unless the status is -1.

Our most recent tests are even more interesting.
We are getting the same data loss any time
Bacula switches tapes.  This means the data loss
does not have anything in particular to do with
the LEOM or PEOM status.

By the way, the funny casting is mandatory in C++,
because ssize_t as returned by the write is not the 
same as size_t (what is written).

More after I look at the most recent test results.

Best regards,

Kern

On Tue, 2003-06-03 at 18:25, Justin T. Gibbs wrote:
> > What is clear from the output is that the write()
> > is returning a -1 status. errno could possibly be 0,
> > in which case I set it to ENOSPC, if it is not 0
> > then it is ENOSPC judging by the error message that
> > is printed "Write error on device ...".
> > 
> > You may want to see more, but here is the basic code
> > that does the write:
> >    if ((uint32_t)(stat=write(dev->fd, block->buf, (size_t)wlen)) !=
> > wlen) {
> >       /* We should check for errno == ENOSPC, BUT many 
> >        * devices simply report EIO when it is full.
> >        * with a little more thought we may be able to check
> >        * capacity and distinguish real errors and EOT
> >        * conditions.  In any case, we probably want to
> >        * simulate an End of Medium.
> >        */
> >       clrerror_dev(dev, -1);
> 
> Apart from the funny casting, the only obvious bug is that you
> are expecting errno to be set on every syscall.  Errno is only
> valid if stat == -1 or you explicitly clear it prior to the
> syscall (or after the last time it was set).  You don't seem
> to be doing that here.
> 
> See the errno man page for details
> 
> --
> Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 10:03:43 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2E33237B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 10:03:43 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8C94E43F85
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 10:03:42 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h53GweZ24008;
	Tue, 3 Jun 2003 09:58:40 -0700
Received: from [10.100.253.70] (aslan.btc.adaptec.com [10.100.253.70])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id KAA19573;
	Tue, 3 Jun 2003 10:03:40 -0700 (PDT)
Date: Tue, 03 Jun 2003 11:04:20 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>
Message-ID: <900070000.1054659860@aslan.btc.adaptec.com>
In-Reply-To: <1054658432.13630.252.camel@rufus>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<577540000.1054579840@aslan.btc.adaptec.com> <20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
	<1054652678.13630.209.camel@rufus>
	<882210000.1054657530@aslan.btc.adaptec.com>
	<1054658432.13630.252.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Justin T. Gibbs" <gibbs@scsiguy.com>
X-List-Received-Date: Tue, 03 Jun 2003 17:03:43 -0000


> Yes, I probably should move the clrerror() and the
> check/set of errno inside the check for "stat == -1".
> However, the code, though odd, is correct, since
> I do not use errno unless the status is -1.

No, the code is not correct.  clrerror() has side effects
in many cases when errno is non-zero.

> By the way, the funny casting is mandatory in C++,
> because ssize_t as returned by the write is not the 
> same as size_t (what is written).

Integer type conversions are still valid in C++:

#include <stdio.h>
#include <inttypes.h>

int
subroutine(uint8_t small_type)
{
        printf("small_type is %d\n", small_type);

        return (0);
}

int
main(int argc, char *argv[])
{
        uint8_t  foo;
        uint32_t bigger_foo;

        foo = 100;
        bigger_foo = argc;

        if (foo != bigger_foo) {
                printf("Foos differ\n");
        }

        subroutine(bigger_foo);

        return (0);
}

% g++ -Wall -pedantic foo.cc

Produces no output.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 10:19:59 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1CB8B37B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 10:19:59 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3160143FA3
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 10:19:57 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53HJOv10449;
	Tue, 3 Jun 2003 19:19:24 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <900070000.1054659860@aslan.btc.adaptec.com>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
	<1054652678.13630.209.camel@rufus>
	<882210000.1054657530@aslan.btc.adaptec.com>
	<1054658432.13630.252.camel@rufus>
	<900070000.1054659860@aslan.btc.adaptec.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054660763.13630.279.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 19:19:24 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Tue, 03 Jun 2003 17:19:59 -0000

On Tue, 2003-06-03 at 19:04, Justin T. Gibbs wrote:
> > Yes, I probably should move the clrerror() and the
> > check/set of errno inside the check for "stat == -1".
> > However, the code, though odd, is correct, since
> > I do not use errno unless the status is -1.
> 
> No, the code is not correct.  clrerror() has side effects
> in many cases when errno is non-zero.

Yes, you are right. I'll fix it. However, this is not
the problem because the only side effect that can
occur is that Bacula would abort. The -1 second argument
guarantees that.

> 
> > By the way, the funny casting is mandatory in C++,
> > because ssize_t as returned by the write is not the 
> > same as size_t (what is written).

If I remove the (uint32_t) cast, I get an error message:

c++   -c   -I. -I..  -g -O2 -Wall  block.c
block.c: In function `int write_block_to_dev (JCR *, DEVICE *, 
DEV_BLOCK *)':
block.c:381: warning: comparison between signed and unsigned integer 
expressions

Line 381 reads:

   if ((stat=write(dev->fd, block->buf, (size_t)wlen)) != wlen) {

so I will stick with my funny casting.
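
For comparison, a minimal sketch of one way to avoid both the warning
and the cast applied before the error check. The helper name
wrote_all() is hypothetical. The sign is tested first, and only a
non-negative count is converted to an unsigned type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: true only if write() reported exactly wlen
 * bytes.  stat is the raw return value of write().
 */
static bool
wrote_all(ssize_t stat, uint32_t wlen)
{
        assert(stat >= -1);             /* write() returns -1 or a count */
        if (stat < 0)
                return (false);         /* error: never compare as unsigned */
        /* Both sides are now non-negative; widen before comparing. */
        return ((uint64_t)stat == (uint64_t)wlen);
}
```

With the (uint32_t) cast, a return value of -1 becomes 0xFFFFFFFF,
which still differs from any realistic block length, so the two forms
agree in practice. The explicit sign test only makes the intent
visible.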


> Integer type conversions are still valid in C++:
> 
> #include <stdio.h>
> #include <inttypes.h>
> 
> int
> subroutine(uint8_t small_type)
> {
>         printf("small_type is %d\n", small_type);
> 
>         return (0);
> }
> 
> int
> main(int argc, char *argv[])
> {
>         uint8_t  foo;
>         uint32_t bigger_foo;
> 
>         foo = 100;
>         bigger_foo = argc;
> 
>         if (foo != bigger_foo) {
>                 printf("Foos differ\n");
>         }
> 
>         subroutine(bigger_foo);
> 
>         return (0);
> }
> 
> % g++ -Wall -pedantic foo.cc
> 
> Produces no output.
> 
> --
> Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 10:35:04 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id ED0E237B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 10:35:03 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 32B0D43F75
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 10:35:02 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53HYSv10479;
	Tue, 3 Jun 2003 19:34:28 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030603084701.U24586@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>	 <1054645616.13630.161.camel@rufus>
	<3490610000.1054651919@aslan.scsiguy.com>
	<20030603084701.U24586@wonky.in0.lcl>
Content-Type: text/plain
Organization: 
Message-Id: <1054661668.13606.292.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 19:34:28 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Tue, 03 Jun 2003 17:35:04 -0000

There are now a lot more things making sense
because I've had other FreeBSD users report 
"unbelievable" output from a simple test program
I have. I'll respond below, but with the latest
test results, the problem seems to be generated
from the simple sequence:

  write()
  ...
  write()
  ioctl(MTEOF)
  ioctl(MTEOF)
  ioctl(MTREW)

Is there any reason why writing two end-of-file marks
followed by a rewind after a series of writes should
cause data loss?

On Tue, 2003-06-03 at 18:03, Matthew Jacob wrote:
> >
> > This is exactly what it does. *Every* time the requested write
> > size does not agree with the returned value, Bacula gives
> > up on the tape.  My last email has the code that does that.
> >
> > My email above was not very clear because I was telling you what
> > happened in the particular case of loss of data (the -1 and errno=0
> > or errno=ENOSPC I don't know which). As noted here, Bacula *will*
> > stop writing if the driver returns a short block (assuming my
> > code isn't broken), but I have never seen that case on FreeBSD.
> 
> That's really weird. I have to look at this closer. I've had some
> drives not report LEOT at all, but since tape_pattern_tester didn't
> complain on the same drive you were using, I know tape_pattern_tester is
> in fact stopping at LEOT.

I'm now sure this is not related to LEOT, so please don't waste your
time on that.

> 
> write(2) isn't necessarily returning -1. It may be returning 0- which
> means that no data moved.

In this case I am sure a -1 was returned because I print different
error messages.

> 
> I think the ENOSPC as you report is a red herring because you're setting
> this value- unless you actually *did* see -1 returned from write(2) and
> ENOSPC set in errno.

I don't think this is the case because of what I say above.

> 
> In any case, even if you hit PEOT instead of LEOT, you shouldn't *lose*
> data. If you hit PEOT, we have to return -1/ENOSPC. Because this is Unix
> or Linux or Solaris instead of a reasonable and modern OS, like RSX, VMS
> or NT, which allow you to give realistic details to failures in I/O
> requests, this means you have no way of telling the user application how
> much was *actually* written when you hit *PEOT* (not LEOT, note!). As
> far as the user application is concerned, *no* data was written at all
> for this last write.

I've been screaming and tearing my hair out for the last 3 years for
exactly the reasons you write, so I am pleased that someone else feels
the same about it.

> 
> But there may in fact be data on the tape media. 

If it is there, it is hiding because we read the tape back and
printed all the block numbers. That way, we "verified" that the 
blocks were really missing.

> What is particularly
> annoying in the PEOT case is that your application probably asked for
> the next tape and rewrote all the blocks from the failed write. 

I don't think so because we would have clearly seen this in the 
listing we did.  The dump of the two tapes was done separately
reading only one at a time -- no possibility of getting confused.

> This is
> fine, but you have to make damned sure then on rereading the data later
> that you can handle duplicate blocks because you may read blocks NOPQR
> on tapeA and then switch to tapeB and read blocks OPQR again on tapeB.

I can handle duplicate blocks because I put the block number in each
block -- however, I have not programmed it because I have so many
things to do and I have never run into any duplicate blocks -- one
reason is that Bacula always stops writing if the write count is
not correct.

> 
> I don't think this is your problem here, but I thought I'd have a
> pre-coffee diatribe about it. Grump.
> 
> 
> >
> > > Ignoring the short write and waiting until you hit ENOSPC guarantees
> > > you will hit PEOM, since the LEOM is only reported once.  The tape
> > > driver expects that you know what you are doing if you go on writing.
> >
> > The only additional writing Bacula does (unless I am missing something)
> > is the two EOF marks.
> 
> This is one of the things that's bothering me. You shouldn't be writing
> extra marks if you actually close the device. I'd like to look over all
> the current Bacula source, but sourceforge is offline at the moment.
> 
> 
> -matt

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 11:00:37 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5630837B404
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 11:00:37 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id BE35C44005
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 11:00:33 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h53I0Mqw047146;
	Tue, 3 Jun 2003 11:00:22 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Tue, 3 Jun 2003 11:00:21 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054661668.13606.292.camel@rufus>
Message-ID: <20030603103611.R24586@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo>  <1054645616.13630.161.camel@rufus> 
	<1054653106.13606.217.camel@rufus><1054661668.13606.292.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
X-List-Received-Date: Tue, 03 Jun 2003 18:00:37 -0000

> There are now a lot more things making sense
> because I've had other FreeBSD users report
> "unbelievable" output from a simple test program
> I have. I'll respond below, but with the latest
> test results, the problem seems to be generated
> from the simple sequence:
>
>   write()
>   ...
>   write()
>   ioctl(MTEOF)
>   ioctl(MTEOF)
>   ioctl(MTREW)
>
> Is there any reason why writing two end-of-file marks
> followed by a rewind after a series of writes should
> cause data loss?

No. That, in fact, would flush data to the tape. Additional filemarks
should not be written even after you close here because the rewind would
clear SA_FLAG_TAPE_WRITTEN so a subsequent close won't write more.

I'm playing around with this some as we speak.

-matt

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 11:17:03 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0395337B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 11:17:03 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 42D9E43F93
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 11:17:02 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h53IH0qw047210;
	Tue, 3 Jun 2003 11:17:00 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Tue, 3 Jun 2003 11:17:00 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <20030603103611.R24586@wonky.in0.lcl>
Message-ID: <20030603111000.V24586@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo>  <1054645616.13630.161.camel@rufus> 
	<1054653106.13606.217.camel@rufus><1054661668.13606.292.camel@rufus>
	<20030603103611.R24586@wonky.in0.lcl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
X-List-Received-Date: Tue, 03 Jun 2003 18:17:03 -0000


> >   write()
> >   ...
> >   write()
> >   ioctl(MTEOF)
> >   ioctl(MTEOF)
> >   ioctl(MTREW)
> >

I'm going to have to look at the actual full bacula source (the URL that
Dan sent me I can't get at).

I modified tape_pattern_tester to take another argument ('-E N'), which
specifies the number of filemarks to use to close off a tape at the end,
and just rewind (rather than use close(2) and let the tape driver write
the 'correct' number of marks), and start reading.

tape_pattern_tester -v -b 64512 -E 2 -n 1 -f /dev/nrsa0
.......Rewind Tape
........Write Pass
...Writing 2 FMKs
WEOT at File 0 Record 1000 Offset 64512 (64512000 total bytes written)
Elapsed Seconds: 52; Data Rate: 1.17308MB/s
.......Rewind Tape
.........Read Pass
REOT at File 1 Record 0 Offset 0 (64512000 total bytes read)
Elapsed Seconds: 57: Data Rate: 1.07018MB/s


From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 11:39:30 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 541CC37B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 11:39:30 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 60BD543F93
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 11:39:29 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h53IdSqw047285;
	Tue, 3 Jun 2003 11:39:28 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Tue, 3 Jun 2003 11:39:28 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054550725.1582.1859.camel@rufus>
Message-ID: <20030603111738.X24586@wonky.in0.lcl>
References: <1054490081.1582.1685.camel@rufus>
	<2846020000.1054498114@aslan.scsiguy.com>
	<1054550725.1582.1859.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: Differences between Solaris/Linux and FreeBSD
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
X-List-Received-Date: Tue, 03 Jun 2003 18:39:30 -0000


> As promised, in this email, I will try my best to describe
> the differences I found between Solaris/Linux and FreeBSD
> concerning tape handling. There were five separate areas
> where I noticed differences:
>
> 1. On Solaris/Linux, the default behavior for ioctl(MTEOM)
>    is to run in what they call slow mode. In this mode, the
>    tape is positioned to the end of the data, and the driver
>    returns the correct file number in the MTIOCGET packet.
>    It is possible to enable fast-EOM, but no one uses it to
>    my knowledge.
>
>    On FreeBSD, you apparently always use the fast-EOM so that
>    the tape position is unknown after the ioctl().

You *could* read block position. Particularly for h/w blocks this works
very fast when you need to locate.

NB: SCSI-3 changed the layout for h/w block position stuff and I haven't
updated the FreeBSD driver to handle this yet.

>    Bacula always knows how many files are on a tape, and when
>    appending to a tape that is already written and newly opened,
>    it MUST know where it is on the tape. As a consequence, on
>    FreeBSD, I must explicitly use MTFSF with read()s in between
>    to position to the end of the tape -- a fairly slow affair.

Uh, this is how 'slow' EOM works. It's not really faster to do it in the
kernel as opposed to in the driver.

I must point out that you cannot, and should not, depend absolutely on
reported position. For tape you can ensure BOT or end of recorded media,
but otherwise you really must use self-referential data on the tape if
tape location is important.

> 2. Your handling of EOM differs from Solaris/Linux.  On both of
>    those systems, when Bacula reads the first EOF, the driver
>    returns 0 bytes read. On reading the second EOF, the driver
>    returns 0 bytes read, but before returning backspaces over
>    the EOF, leaving you positioned correctly for appending to the
>    tape and having told you you are at the end of the tape by
>    giving two consecutive 0-byte reads.  Any further read()
>    request returns an I/O error.
>
>    On FreeBSD, reading the first EOF returns 0 bytes, reading
>    the second EOF also returns 0 bytes (sometimes, I apparently
>    get "Illegal operation"). However, the tape is left positioned
>    after the second EOF, so appending from that point effectively
>    "loses" the data.
>
>    To handle this correctly the FreeBSD user must add a configuration
>    statement to Bacula telling it to backspace a file at EOM.

Yes. This is a problem.

But part of the problem here is that dual-filemark at EOM is only one
tape convention- and a poorly thought out one at best- it exists
*solely* because a *few* (ancient) tape drives would unwind off the feed
reel if you kept advancing them. For QIC drives, you *cannot* write dual
filemarks (really).

Note that there is a setting that can change the model to single EOM. If
I could have gotten away with it, I would have made this the default.

I think, though, I'd accept that the FreeBSD behaviour is a bug that
should be fixed. If we have a dual fmk EOT model and are advancing along
and hit two in a row, we *probably* should say we're at logical EOT and
backspace over one of them. After all, this is what we do when we're
*writing* to tape and close the no-rewind device.

I also would agree that this situation is exacerbated by the 'space to
end of recorded data' model for the MTEOM command. This now leaves us
with a legacy of tapes with spurious dual filemarks in the middle.

Oops. This means that I really can't fix things the way you'd like :-(.

>
> 3. I have previously described this but will do so again for
>    completeness here. On Solaris/Linux when Bacula does:
>
>     write();
>     ioctl(MTEOF);
>     ioctl(MTEOF)
>     ioctl(MTBSF);
>     ioctl(MTBSF);
>     ioctl(MTBSR);
>     read();
>
>    the read() re-reads the last write.  On FreeBSD, the read returns
>    0 bytes (there is also a problem of freezing the tape wrapped into
>    this example if I am not mistaken). Apparently the 0 bytes read is
>    because FreeBSD adds an additional EOF mark (not necessary) and
>    leaves the drive positioned *after* the mark thus re-reading the
>    last record fails when it logically should not.

I don't believe that FreeBSD adds an additional filemark here, but I
should add this as a test case. I have another tester program that I use
for testing block locate, but I haven't really validated it or finished
it yet.

Why, btw, are you issuing two MTEOFs? The mtop has a count field y'know
:-).
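
A minimal sketch of the count field in use, assuming a hypothetical
helper named write_filemarks(). The thread's shorthand "ioctl(MTEOF)"
corresponds to the MTWEOF operation in <sys/mtio.h>, issued through
MTIOCTOP.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/*
 * Hypothetical helper: write `count` filemarks with a single ioctl
 * instead of issuing the operation `count` times.
 * Returns 0 on success, -1 on failure (errno set by ioctl).
 */
static int
write_filemarks(int fd, int count)
{
        struct mtop op;

        assert(count > 0);
        op.mt_op = MTWEOF;      /* write end-of-file (filemark) records */
        op.mt_count = count;    /* how many to write */
        return (ioctl(fd, MTIOCTOP, &op));
}
```

Calling write_filemarks(fd, 2) on an open tape descriptor would then
stand in for the two separate MTEOF calls in the sequence quoted
below.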

>
> 4. Tape freezing: On Solaris/Linux, the tape never "freezes". On
>    FreeBSD it does freeze. As best I can determine, you freeze the
>    drive when you lose track of where you are. Typically, this
>    occurs when I do a MTBSR to re-read the last record. On Solaris/Linux
>    the tape is never frozen, but when they don't know the position,
>    they simply return -1 in the MTIOCGET packet, which is fine with
>    me because Bacula only uses that info when initially reading a
>    tape to append to it.
>
>    Freezing the tape causes all sorts of problems because it generates
>    a flood of unexpected errors. Within a large complicated program like
>    Bacula, when a low level routine re-reads a record during writing and
>    the tape freezes, it cannot simply rewind the drive as this could
>    cause chaos and possible overwriting of the beginning of the drive.
>
>    I've attempted to overcome tape freezing by providing the user a
>    means to turn off MTBSR (but they don't always do so), and by issuing
>    ioctl(MTIOCERRSTAT) after every return of -1 from any I/O request.
>
>    I recommend that you do away with freezing the drive -- it seems to
>    me that it only causes more problems.  In saying that, I have to
>    admit that I really do not understand tape freezing or why you do it,
>    since I found no documentation on it, and everything I write above I
>    have deduced from what Dan has reported back to me.

Freezing the drive is precisely what Solaris and Linux *should* do. If
you've lost position, you have to take some action to bring the tape to
a known position. The unaware application should not be allowed to
overwrite in random spots on the tape. If your low level read/write
routines get any kind of error, you have to move to a "what do I have in
my tape drive now?" state anyway.

You know, I was pretty sure I'd documented the freeze option, but I
cannot find it in the man page (sa(4)) now at all.


>
> 5. I am quite fuzzy on this point because I forget exactly what happened
>    and what I did about it.
>
>    It seems to me that on Linux, if I read a block but specify a number
>    of bytes less than the number actually in the block on the tape, the
>    driver returns the data anyway.  I then check if the block is
>    internally complete and if not, increase my record size to the size
>    indicated in the data received, backspace one record, and re-read it.
>
>    If I am not mistaken, on FreeBSD, the first read returns an error,
>    and Bacula just immediately gives up.  Your documentation specifies
>    that one can never read a partial record from a tape, but it does not
>    specify what error code is generated. As a consequence, rather than
>    recovering and re-reading the record, Bacula has to assume it was
>    a fatal error.

The reason linux 'succeeds' here is because linux internally reads all
tape data to an oversized buffer in kernel memory anyway. This means
that it doesn't suffer an 'overrun' condition which is what you are
doing if you attempt to read *less* than a tape record size. Solaris
will fail the same way, btw, as FreeBSD.

What you should always do is start out by reading the largest possible
record size (a pathetic 64KB for FreeBSD) and adjust *downward* (if
desired and you are just autosizing to find a tape record size).
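
The "start large" approach can be sketched as follows, assuming a
variable-block tape where each read() returns exactly one record.
MAX_RECORD and probe_record_size are illustrative names only, not taken
from Bacula or the sa driver.

```c
#include <stdlib.h>
#include <unistd.h>

#define MAX_RECORD (64 * 1024)   /* the 64KB upper limit mentioned above */

/*
 * Read one record using the largest buffer and report its actual size.
 * Returns the record length, 0 at a filemark/EOF, or -1 on error.
 */
static ssize_t probe_record_size(int fd)
{
    char *buf = malloc(MAX_RECORD);
    ssize_t n;

    if (buf == NULL)
        return -1;
    n = read(fd, buf, MAX_RECORD);   /* never asks for less than a record */
    free(buf);
    return n;
}
```

The caller can then backspace one record and re-read it with a buffer of
the returned size, which avoids the overrun case entirely.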


Thanks for doing the critique. There's definitely food for thought here
and some changes that *should* be made.

-matt

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 12:12:41 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 159F237B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 12:12:41 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 270A143F85
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 12:12:40 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h53J7aZ28320;
	Tue, 3 Jun 2003 12:07:36 -0700
Received: from [10.100.253.70] (aslan.btc.adaptec.com [10.100.253.70])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id MAA27735;
	Tue, 3 Jun 2003 12:12:37 -0700 (PDT)
Date: Tue, 03 Jun 2003 13:13:21 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kern Sibbald <kern@sibbald.com>
Message-ID: <955950000.1054667601@aslan.btc.adaptec.com>
In-Reply-To: <1054660763.13630.279.camel@rufus>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<577540000.1054579840@aslan.btc.adaptec.com> <20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
	<1054652678.13630.209.camel@rufus>
	<882210000.1054657530@aslan.btc.adaptec.com>
	<1054658432.13630.252.camel@rufus>
	<900070000.1054659860@aslan.btc.adaptec.com>
	<1054660763.13630.279.camel@rufus>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Justin T. Gibbs" <gibbs@scsiguy.com>
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 19:12:41 -0000

>> > By the way, the funny casting is mandatory in C++,
>> > because ssize_t as returned by the write is not the 
>> > same as size_t (what is written).
> 
> If I remove the (uint32_t) cast, I get an error message:
> 
> c++   -c   -I. -I..  -g -O2 -Wall  block.c
> block.c: In function `int write_block_to_dev (JCR *, DEVICE *, 
> DEV_BLOCK *)':
> block.c:381: warning: comparison between signed and unsigned integer 
> expressions
> 
> Line 381 reads:
> 
>    if ((stat=write(dev->fd, block->buf, (size_t)wlen)) != wlen) {
> 
> so I will stick with my funny casting.
> 

This has nothing to do with type size or the fact that you are
using C++.  The same warning would occur when your code is
compiled as C.  wlen should be a signed type.  Since wlen
by definition cannot be larger than the largest positive integer
reportable by the signed return value of write, using an unsigned
type buys you nothing.  Conversion from ssize_t to size_t will
occur without error if you happen to choose to make wlen an ssize_t.

I guess it matters little.  My own philosophy is that casts should
be used as a last resort rather than deployed indiscriminately to
cover up compile warnings.  The above casts are easily avoidable
which is why I mentioned them at all.
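
A sketch of the cast-free form, under the assumption that the length can
be carried as an ssize_t throughout. write_block is a hypothetical
stand-in for the quoted write_block_to_dev line, not the actual Bacula
code.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly `wlen` bytes. wlen is signed, so it compares directly
 * with write()'s ssize_t return value and no cast is needed.
 * Returns 0 on a full write, -1 on error or a short write.
 */
static int write_block(int fd, const void *buf, ssize_t wlen)
{
    ssize_t stat;

    if (wlen < 0)
        return -1;
    stat = write(fd, buf, wlen);   /* ssize_t converts to size_t implicitly */
    if (stat != wlen)              /* signed vs. signed comparison */
        return -1;
    return 0;
}
```

Both operands of the comparison are signed here, so the warning quoted
above should not be triggered.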

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 12:44:03 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CA28637B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 12:44:03 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3EC7043F3F
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 12:44:03 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP
	id 25E8F3F4F; Tue,  3 Jun 2003 15:44:02 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: Matthew Jacob <mjacob@feral.com>
Date: Tue, 03 Jun 2003 15:44:02 -0400
MIME-Version: 1.0
Message-ID: <3EDCC242.10472.CEB33309@localhost>
Priority: normal
References: <20030603103611.R24586@wonky.in0.lcl>
In-reply-to: <20030603111000.V24586@wonky.in0.lcl>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 19:44:04 -0000

On 3 Jun 2003 at 11:17, Matthew Jacob wrote:

> 
> > >   write()
> > >   ...
> > >   write()
> > >   ioctl(MTEOF)
> > >   ioctl(MTEOF)
> > >   ioctl(MTREW)
> > >
> 
> I'm going to have to look at the actual full bacula source (the URL that
> Dan sent me I can't get at).

http://www.freebsddiary.org/tmp/bacula-1.31cvs.tar.gz will work now.  
Sorry about that.
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 12:46:34 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 509E837B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 12:46:34 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 79C7F43F93
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 12:46:32 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h53JkKqw047563;
	Tue, 3 Jun 2003 12:46:20 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Tue, 3 Jun 2003 12:46:20 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Dan Langille <dan@langille.org>
In-Reply-To: <3EDCC242.10472.CEB33309@localhost>
Message-ID: <20030603124616.O24586@wonky.in0.lcl>
References: <20030603103611.R24586@wonky.in0.lcl>
	<3EDCC242.10472.CEB33309@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 19:46:34 -0000


Working now, thanks


On Tue, 3 Jun 2003, Dan Langille wrote:

> On 3 Jun 2003 at 11:17, Matthew Jacob wrote:
>
> >
> > > >   write()
> > > >   ...
> > > >   write()
> > > >   ioctl(MTEOF)
> > > >   ioctl(MTEOF)
> > > >   ioctl(MTREW)
> > > >
> >
> > I'm going to have to look at the actual full bacula source (the URL that
> > Dan sent me I can't get at).
>
> http://www.freebsddiary.org/tmp/bacula-1.31cvs.tar.gz will work now.
> Sorry about that.
> --
> Dan Langille : http://www.langille.org/
>
>

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 13:05:35 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9B21E37B401
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 13:05:35 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D108143F85
	for <freebsd-scsi@freebsd.org>; Tue,  3 Jun 2003 13:05:32 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h53K4vv10899;
	Tue, 3 Jun 2003 22:04:57 +0200
From: Kern Sibbald <kern@sibbald.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
In-Reply-To: <955950000.1054667601@aslan.btc.adaptec.com>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>
	<1054645616.13630.161.camel@rufus>  <20030603072944.U44880@beppo>
	<1054652678.13630.209.camel@rufus>
	<882210000.1054657530@aslan.btc.adaptec.com>
	<1054658432.13630.252.camel@rufus>
	<900070000.1054659860@aslan.btc.adaptec.com>
	<1054660763.13630.279.camel@rufus>
	<955950000.1054667601@aslan.btc.adaptec.com>
Content-Type: text/plain
Organization: 
Message-Id: <1054670696.13606.302.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 03 Jun 2003 22:04:56 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
cc: mjacob@feral.com
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 20:05:35 -0000

I cannot argue with what you write other than to
say that I generally use uint32_t for variables that
are positive integers, so I have them everywhere and
to change wlen to signed would lead to a flood of
changes of other variables. If I remember right one
thing that pushed me in this direction is the fact
that sizeof() is unsigned.

By the way, I've just completed a much simpler test
that I ask Dan to try to see if we can reproduce the
problem in a simpler case.

Best regards,

Kern

On Tue, 2003-06-03 at 21:13, Justin T. Gibbs wrote:
> >> > By the way, the funny casting is mandatory in C++,
> >> > because ssize_t as returned by the write is not the 
> >> > same as size_t (what is written).
> > 
> > If I remove the (uint32_t) cast, I get an error message:
> > 
> > c++   -c   -I. -I..  -g -O2 -Wall  block.c
> > block.c: In function `int write_block_to_dev (JCR *, DEVICE *, 
> > DEV_BLOCK *)':
> > block.c:381: warning: comparison between signed and unsigned integer 
> > expressions
> > 
> > Line 381 reads:
> > 
> >    if ((stat=write(dev->fd, block->buf, (size_t)wlen)) != wlen) {
> > 
> > so I will stick with my funny casting.
> > 
> 
> This has nothing to do with type size or the fact that you are
> using C++.  The same warning would occur when your code is
> compiled as C.  wlen should be a signed type.  Since wlen
> by definition cannot be larger than the largest positive integer
> reportable by the signed return value of write, using an unsigned
> type buys you nothing.  Conversion from ssize_t to size_t will
> occur without error if you happen to choose to make wlen an ssize_t.
> 
> I guess it matters little.  My own philosophy is that casts should
> be used as a last resort rather than deployed indiscriminately to
> cover up compile warnings.  The above casts are easily avoidable
> which is why I mentioned them at all.
> 
> --
> Justin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 14:22:07 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5496D37B401
	for <freebsd-scsi@FreeBSD.org>; Tue,  3 Jun 2003 14:22:07 -0700 (PDT)
Received: from cybernetics.com (cyborg.cybernetics.com [206.246.200.18])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5BAFA43F3F
	for <freebsd-scsi@FreeBSD.org>; Tue,  3 Jun 2003 14:22:06 -0700 (PDT)
	(envelope-from )
Received: from martin1 ([137.157.1.143]) by cyborg.cybernetics.com with ESMTP
	id <119057>; Tue, 3 Jun 2003 17:20:54 -0400
Message-Id: <5.1.0.14.2.20030603170956.03195380@wheresmymailserver.com>
X-Sender: (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 03 Jun 2003 17:20:24 -0400
From: PostMaster General <mail@cybernetics.com>
To: freebsd-scsi@FreeBSD.org
Illegal-Object: S
	From:	Martin <>
			^-expected word
Mime-Version: 1.0
Content-Type: multipart/mixed; x-avg-checked=avg-ok-52865B7E;
	boundary="=======18275876======="
X-Content-Filtered-By: Mailman/MimeDel 2.1.1
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 21:22:07 -0000

--=======18275876=======
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Content-Type: text/plain; x-avg-checked="avg-ok-52865B7E"; charset="us-ascii"


   It looks like your code is failing to detect the check condition/no
   sense combination thrown when you hit EOT, and in variable block mode
   you would not see any residual on the block that crossed LEOT. I don't
   know enough about the FreeBSD tape driver to know where the fault
   lies.

   Even using that as a theory, it did not make any sense why you were
   losing so many blocks of data. I pulled up a DDS manual and checked
   the EOT behavior and got a surprise. Most modern tape drives
   automatically go into unbuffered mode past LEOT. This would ensure
   that you would lose only one block when you hit PEOT. But the
   description that I have added below only states that it will throw a
   check condition/no sense for each write past LEOT. Thus it would seem
   that you have several data blocks in the buffer when PEOT is reached.
   I wonder if the error you are detecting as EOT is in fact a deferred
   error from several blocks back?

   Manual excerpt:

   The drive calculates the logical Early Warning. The Early Warning
   point is calculated as greater than ten megabytes before the EOT.
   This ensures that when Early Warning is encountered, enough space
   remains to successfully write any unwritten blocks up to ten
   megabytes.

   At Early Warning, the drive completes the current block transfer and
   terminates the command with a Check Condition, EOM bit set, and Sense
   Key equal to 0. If the SEW bit (in MODE SELECT Device Configuration
   Page) is set, the data in the buffer is then written to tape.
   Subsequent WRITE commands complete with a Check Condition and the EOM
   bit set.

   If writing the buffer to tape is unsuccessful because of EOT, a
   Volume Overflow is reported. The Residual count field in the Request
   Sense data reports the amount of data not transferred. Writing can
   continue in the Early Warning region until EOT is encountered. Any
   WRITE command issued within Early Warning and successfully completed,
   finishes with a Check Condition and the EOM bit set.

   If an error is encountered while writing, the Write Retry Count (in
   MODE SELECT Read/Write Error Recovery Page) specifies the maximum
   number of attempts to rewrite the data. If none of the rewrites are
   successful, the error is considered unrecoverable and reported as
   such. This situation may occur if the tape has severe damage. In this
   case, the green LED flashes rapidly.

   Hope this helps.
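
The excerpt's early-warning condition (Check Condition, EOM bit set,
Sense Key 0) can be sketched as a check on fixed-format sense data. This
is only an illustration: it assumes the standard SCSI fixed sense layout
(response code 70h for current errors, 71h for deferred ones), and the
struct and function names are hypothetical, not taken from the FreeBSD
driver or from Bacula.

```c
#include <stdint.h>

struct sense_summary {
    int valid;       /* VALID bit: byte 0, bit 7 */
    int deferred;    /* response code 71h means a deferred error */
    int eom;         /* EOM bit: byte 2, bit 6 */
    int sense_key;   /* low nibble of byte 2 */
    uint32_t info;   /* INFORMATION field, bytes 3..6, big-endian */
};

/* Decode the fields of interest; `s` must hold at least 7 bytes. */
static struct sense_summary decode_fixed_sense(const uint8_t *s)
{
    struct sense_summary r;

    r.valid = (s[0] & 0x80) != 0;
    r.deferred = (s[0] & 0x7f) == 0x71;
    r.eom = (s[2] & 0x40) != 0;
    r.sense_key = s[2] & 0x0f;
    r.info = ((uint32_t)s[3] << 24) | ((uint32_t)s[4] << 16) |
             ((uint32_t)s[5] << 8) | (uint32_t)s[6];
    return r;
}

/* Early warning as described in the excerpt: EOM set, sense key 0. */
static int is_early_warning(const uint8_t *s)
{
    struct sense_summary r = decode_fixed_sense(s);

    return r.eom && r.sense_key == 0;
}
```

A Volume Overflow would show up with sense key 0Dh instead, and the
deferred flag is what would distinguish an error reported for an earlier
block from one for the current write.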

   Martin

--=======18275876=======--

From owner-freebsd-scsi@FreeBSD.ORG  Tue Jun  3 14:33:57 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CE0EF37B401
	for <freebsd-scsi@FreeBSD.org>; Tue,  3 Jun 2003 14:33:57 -0700 (PDT)
Received: from aramis.rutgers.edu (aramis.rutgers.edu [128.6.4.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1164043FA3
	for <freebsd-scsi@FreeBSD.org>; Tue,  3 Jun 2003 14:33:57 -0700 (PDT)
	(envelope-from bohra@cs.rutgers.edu)
Received: from cs.rutgers.edu (sirtaki.rutgers.edu [128.6.171.146])
	by aramis.rutgers.edu (8.11.7+Sun/8.8.8) with ESMTP id h53LXtk29390;
	Tue, 3 Jun 2003 17:33:55 -0400 (EDT)
Message-ID: <3EDD1302.6090102@cs.rutgers.edu>
Date: Tue, 03 Jun 2003 17:28:34 -0400
From: Aniruddha Bohra <bohra@cs.rutgers.edu>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Nate Lawson <nate@root.org>
References: <3ED3CCFF.4080507@cs.rutgers.edu> <20030602235514.J22029@root.org>
In-Reply-To: <20030602235514.J22029@root.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@FreeBSD.org
Subject: Re: Emulating a SCSI device
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Jun 2003 21:33:58 -0000


Nate Lawson wrote:

>On Tue, 27 May 2003, Aniruddha Bohra wrote:
>  
>
>>    I am trying to write a SIM module for FreeBSD which basically
>>emulates a SCSI controller with a disk attached at target 0 lun 0.
>>    
>>
>
>What is your hardware?  Are you using sys/cam/scsi/scsi_target and
>src/share/examples/scsi_target?
>  
>
There is an LSI Logic (mpt) SCSI controller in the system. However, that
is immaterial, as I am trying to get a large memory buffer to act as a
disk. I would emulate a SCSI controller, so I will handle the low-level
SCSI commands myself by accessing the memory.

I will look at the above.

>>    I go as far as the action function of the controller getting called
>>with a XPT_PATH_INQ - where I fill in the fake data.
>>
>>    Nothing happens after that. I have looked for documentation
>>of how to get the pseudo disk attached to the da driver but did
>>not make much headway.
>>    
>>
>
>You have to call xpt_done() on the CCB to send it back to the caller.
>  
>
I do that. Is there something special that I need to write in the CCB
header so that the caller identifies the "device" as a disk?

Thanks for the response.

Aniruddha

From owner-freebsd-scsi@FreeBSD.ORG  Wed Jun  4 00:21:04 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7355137B401
	for <freebsd-scsi@freebsd.org>; Wed,  4 Jun 2003 00:21:04 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CC15A43F3F
	for <freebsd-scsi@freebsd.org>; Wed,  4 Jun 2003 00:21:02 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h547KVv13248;
	Wed, 4 Jun 2003 09:20:32 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030603103611.R24586@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>	 <1054645616.13630.161.camel@rufus>
	<3490610000.1054651919@aslan.scsiguy.com>
	<20030603084701.U24586@wonky.in0.lcl>
	<20030603103611.R24586@wonky.in0.lcl>
Content-Type: text/plain
Organization: 
Message-Id: <1054711231.13606.396.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 04 Jun 2003 09:20:31 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Jun 2003 07:21:04 -0000

Hello,

The latest tests indicate that the sequence that I
gave you below does work. I'm going to look more 
carefully at the Bacula code (the code path is 
fairly complicated) because perhaps it simply
releases the drive without doing a rewind.

I'm going to also work on simplifying the case
that we know fails (and it seems to be perfectly
reproducible).

Best regards,

Kern

On Tue, 2003-06-03 at 20:00, Matthew Jacob wrote:
> > There are now a lot more things making sense
> > because I've had other FreeBSD users report
> > "unbelievable" output from a simple test program
> > I have. I'll respond below, but with the latest
> > test results, the problem seems to be generated
> > from the simple sequence:
> >
> >   write()
> >   ...
> >   write()
> >   ioctl(MTEOF)
> >   ioctl(MTEOF)
> >   ioctl(MTREW)
> >
> > is there any reason why writing two end of file marks
> > followed by a rewind after a series of writes should
> > create data loss?
> 
> No. That, in fact, would flush data to the tape. Additional filemarks
> should not be written even after you close here because the rewind would
> clear SA_FLAG_TAPE_WRITTEN so a subsequent close won't write more.
> 
> I'm playing around with this some as we speak.
> 
> -matt

From owner-freebsd-scsi@FreeBSD.ORG  Wed Jun  4 07:51:42 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6D60637B401
	for <freebsd-scsi@freebsd.org>; Wed,  4 Jun 2003 07:51:42 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AEBBA43F85
	for <freebsd-scsi@freebsd.org>; Wed,  4 Jun 2003 07:51:41 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from wonky.in0.lcl (wonky.in0.lcl [172.16.166.7])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h54EpZqw020494;
	Wed, 4 Jun 2003 07:51:39 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Wed, 4 Jun 2003 07:51:35 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@wonky.in0.lcl
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054711231.13606.396.camel@rufus>
Message-ID: <20030604074943.E98367@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo>  <1054645616.13630.161.camel@rufus> 
	<1054653106.13606.217.camel@rufus><1054661668.13606.292.camel@rufus> 
	<1054711231.13606.396.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Jun 2003 14:51:42 -0000


Yes, well, I have not been able to reproduce it myself.

Some of the issues you brought up in the other mail have been worth all
this ruckus in any case.

I'll be OOT from thu-wed so I may not respond quickly or get to looking
at bacula myself. I did leave a couple tape drives hooked up to one of
my 4.8 boxes so I can fool around with stuff remotely if I get a chance.

> Hello,
>
> The latest tests indicate that the sequence that I
> gave you below does work. I'm going to look more
> carefully at the Bacula code (the code path is
> fairly complicated) because perhaps it simply
> releases the drive without doing a rewind.
>
> I'm going to also work on simplifying the case
> that we know fails (and it seems to be perfectly
> reproducible).
>

From owner-freebsd-scsi@FreeBSD.ORG  Wed Jun  4 09:51:46 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0AE7637B401
	for <freebsd-scsi@freebsd.org>; Wed,  4 Jun 2003 09:51:46 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 45B9B43FA3
	for <freebsd-scsi@freebsd.org>; Wed,  4 Jun 2003 09:51:44 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h54GoOv15165;
	Wed, 4 Jun 2003 18:50:25 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030604074943.E98367@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>	 <1054645616.13630.161.camel@rufus>
	<3490610000.1054651919@aslan.scsiguy.com>
	<20030603084701.U24586@wonky.in0.lcl>
	<20030603103611.R24586@wonky.in0.lcl>
	<20030604074943.E98367@wonky.in0.lcl>
Content-Type: text/plain
Organization: 
Message-Id: <1054745424.13606.524.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 04 Jun 2003 18:50:24 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Jun 2003 16:51:46 -0000

Sorry, I didn't mean to create a ruckus -- I just
don't like data loss -- especially in "Bacula". :-)

What I wrote in the other mail was more or less a
core dump. When things calm down (and hopefully I get
to the bottom of this), I'm going to try to point out
the things that could *possibly* be changed without
creating chaos with existing programs.  Some things
like not specifying a large enough buffer and getting
an error (on FreeBSD) are preferable to Linux just
returning part of the buffer -- but it is a difference
worth knowing about.


On Wed, 2003-06-04 at 16:51, Matthew Jacob wrote:
> Yes, well, I have not been able to reproduce it myself.
> 
> Some of the issues you brought up in the other mail have been worth all
> this ruckus in any case.
> 
> I'll be OOT from thu-wed so I may not respond quickly or get to looking
> at bacula myself. I did leave a couple tape drives hooked up to one of
> my 4.8 boxes so I can fool around with stuff remotely if I get a chance.
> 
> > Hello,
> >
> > The latest tests indicate that the sequence that I
> > gave you below does work. I'm going to look more
> > carefully at the Bacula code (the code path is
> > fairly complicated) because perhaps it simply
> > releases the drive without doing a rewind.
> >
> > I'm going to also work on simplifying the case
> > that we know fails (and it seems to be perfectly
> > reproducible).
> >

From owner-freebsd-scsi@FreeBSD.ORG  Wed Jun  4 23:41:58 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 01D3437B401
	for <freebsd-scsi@FreeBSD.org>; Wed,  4 Jun 2003 23:41:57 -0700 (PDT)
Received: from rootlabs.com (root.org [67.118.192.226])
	by mx1.FreeBSD.org (Postfix) with SMTP id 0FB6F43F75
	for <freebsd-scsi@FreeBSD.org>; Wed,  4 Jun 2003 23:41:57 -0700 (PDT)
	(envelope-from nate@rootlabs.com)
Received: (qmail 26656 invoked by uid 1000); 5 Jun 2003 06:41:57 -0000
Date: Wed, 4 Jun 2003 23:41:57 -0700 (PDT)
From: Nate Lawson <nate@root.org>
To: Aniruddha Bohra <bohra@cs.rutgers.edu>
In-Reply-To: <3EDD1302.6090102@cs.rutgers.edu>
Message-ID: <20030604233659.V26654@root.org>
References: <3ED3CCFF.4080507@cs.rutgers.edu> <20030602235514.J22029@root.org>
 <3EDD1302.6090102@cs.rutgers.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@FreeBSD.org
Subject: Re: Emulating a SCSI device
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jun 2003 06:41:58 -0000

On Tue, 3 Jun 2003, Aniruddha Bohra wrote:
> Nate Lawson wrote:
> >On Tue, 27 May 2003, Aniruddha Bohra wrote:
> >>    I am trying to write a SIM module for FreeBSD which basically
> >>emulates a SCSI controller with a disk attached at target 0 lun 0.
> >
> >What is your hardware?  Are you using sys/cam/scsi/scsi_target and
> >src/share/examples/scsi_target?
>
> There is a LSI logic(mpt) SCSI controller in the system. However that is
> immaterial as
> I am trying to get a large memory buffer to act as a disk.
> I would emulate a SCSI controller, so will handle the low level SCSI
> commands myself by accessing the
> memory.
>
> I will look at the above.

I still am unclear about what you are trying to do.  Are you trying to
emulate an HBA?  In that case, just take isp(4) or ahc(4) and strip things
down to *_attach() and *_action().

> >>    I go as far as the action function of the controller getting called
> >>with a XPT_PATH_INQ - where I fill in the fake data.

Which controller do you mean?  You said above the controller is not
relevant.

> >>    Nothing happens after that. I have looked for documentation
> >>of how to get the pseudo disk attached to the da driver but did
> >>not make much headway.
> >
> >You have to call xpt_done() on the CCB to send it back to the caller.
>
> I do that - Is there something special that I need to write in the ccb
> header so that the caller
> identifies the "device" as a disk ?

Let's start over with you explaining what you are trying to do and I can
give you an answer.

-Nate

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jun  5 02:15:49 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id ADA7E37B401; Thu,  5 Jun 2003 02:15:49 -0700 (PDT)
Received: from vmx1.skoleetaten.oslo.no (vmx1.skoleetaten.oslo.no
	[193.156.192.31])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 14DE143F93; Thu,  5 Jun 2003 02:15:48 -0700 (PDT)
	(envelope-from shamz@nevada.skoleetaten.oslo.no)
Received: from smtp.skoleetaten.oslo.no (localhost [127.0.0.1])
	by vmx1.skoleetaten.oslo.no (Clean Mail System) with SMTP
	id B69EA7D4C5; Thu,  5 Jun 2003 11:15:43 +0200 (CEST)
Received: from nevada.skoleetaten.oslo.no (nevada.skoleetaten.oslo.no
	[193.156.192.131])
	by smtp.skoleetaten.oslo.no (Clean Mail System) with ESMTP
	id 836587D470; Thu,  5 Jun 2003 11:15:43 +0200 (CEST)
Received: from nevada.skoleetaten.oslo.no (localhost [127.0.0.1])
	h559FcOU055116;	Thu, 5 Jun 2003 11:15:38 +0200 (CEST)
	(envelope-from shamz@nevada.skoleetaten.oslo.no)
Received: (from shamz@localhost)h559FWe1055115;
	Thu, 5 Jun 2003 11:15:32 +0200 (CEST)
Date: Thu, 5 Jun 2003 11:15:32 +0200
From: Shaun Jurrens <shaun.jurrens@skoleetaten.oslo.no>
To: Palle Girgensohn <girgen@pingpong.net>
Message-ID: <20030605091532.GO98443@nevada.skoleetaten.oslo.no>
References: <20030603152123.GM98443@nevada.skoleetaten.oslo.no>
	<46490000.1054744366@rambutan.pingpong.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="djJN5oi3zFpblwUd"
Content-Disposition: inline
In-Reply-To: <46490000.1054744366@rambutan.pingpong.net>
User-Agent: Mutt/1.4.1i
X-Operating-System: FreeBSD 4.8-RELEASE
cc: freebsd-net@freebsd.org
cc: freebsd-scsi@freebsd.org
Subject: Re: fxp0: device timeout | SCB already complete (me too)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jun 2003 09:15:50 -0000


--djJN5oi3zFpblwUd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jun 04, 2003 at 06:32:46PM +0200, Palle Girgensohn wrote:
#> Hi Shaun,
#>=20
#> Thanks for the input! Glad to hear I'm not the only one
#>=20
#> In my case, both the SCSI and NIC are integrated on the motherboard, so =
I=20
#> cannot really move them around... :)
#>=20
#> Also, as I mentioned, I tried a de0 (PCI card, not onboard, and it=20
#> literally stopped the machine). Is the de0 driver also a problem?
#>=20
#> /Palle

	I'm beginning to think it's a scsi problem of sorts as well so
	I clipped -hardware and Cc'd -scsi on this. I just happened to=20
	(unfortunately) run into this on another box yesterday after
	four months of relative quiet. I happened to be moving an=20
	interface over from some crap Nortel switch to a nice Cisco
	switch and promptly a different interface began to do its dance.
	It's the same interface each time (and I've changed cards...)
	Anyway, for the record, a little from messages:

Jun  2 18:48:43 nol33n0x /kernel: fxp0: Microcode loaded, int_delay: 1000 u=
sec =20
bundle_max: 6
Jun  4 16:57:50 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x0
Jun  4 16:57:51 nol33n0x last message repeated 4 times
Jun  4 16:57:51 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x400
Jun  4 16:57:58 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x0
Jun  4 16:57:58 nol33n0x last message repeated 3 times
Jun  4 16:57:58 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x400
Jun  4 16:57:58 nol33n0x last message repeated 110 times
Jun  4 16:58:17 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x90 0x400
Jun  4 16:58:20 nol33n0x last message repeated 17 times
Jun  4 17:09:04 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x90 0x400
Jun  4 17:09:09 nol33n0x last message repeated 2 times
Jun  4 17:09:09 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x90 0x0
Jun  4 17:09:12 nol33n0x last message repeated 3 times
Jun  4 17:09:12 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x90 0x400
Jun  4 17:09:39 nol33n0x last message repeated 22 times
Jun  4 17:18:18 nol33n0x login: ROOT LOGIN (root) ON ttyv0
Jun  4 17:19:21 nol33n0x /kernel: fxp1: DMA timeout
Jun  4 17:19:21 nol33n0x /kernel: fxp1: Microcode loaded, int_delay: 1000 u=
sec =20
bundle_max: 6
Jun  4 17:19:21 nol33n0x /kernel: fxp1: DMA timeout
Jun  4 17:19:21 nol33n0x /kernel: fxp1: SCB timeout: 0x10 0x0 0x80 0x0
Jun  4 17:19:21 nol33n0x /kernel: fxp1: DMA timeout
Jun  4 17:19:21 nol33n0x /kernel: fxp1: SCB timeout: 0x10 0x0 0x80 0x0
Jun  4 17:19:21 nol33n0x /kernel: fxp1: DMA timeout
Jun  4 17:19:21 nol33n0x /kernel: fxp1: SCB timeout: 0x10 0x0 0x80 0x0
Jun  4 17:19:21 nol33n0x /kernel: fxp1: SCB timeout: 0x10 0x0 0x80 0x0
Jun  4 17:19:37 nol33n0x /kernel: fxp1: command queue timeout
Jun  4 17:19:46 nol33n0x /kernel: fxp1: SCB timeout: 0x1 0x0 0x80 0x400
Jun  4 17:19:46 nol33n0x /kernel: fxp1: SCB timeout: 0x81 0x0 0x80 0x400
Jun  4 17:19:58 nol33n0x last message repeated 37 times

=2E..

Jun  4 17:24:21 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x90 0x0
Jun  4 17:24:21 nol33n0x last message repeated 8 times
Jun  4 17:24:21 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x90 0x400
Jun  4 17:24:37 nol33n0x last message repeated 115 times

	After that the box didn't find 3 of the 5 fxp nic's until a new
	boot and a cleared ESCD. Not sure why an fxp card should bitch=20
	about SCB's anyway.  I'd be grateful for any pointers here.=20
	FreeBSD is on its way out on firewalls here otherwise because=20
	I'm catching a good deal of heat about it.  More info is available=20
	on request.

#>=20
#>=20
#>=20
#> --On tisdag, juni 03, 2003 17.21.23 +0200 Shaun Jurrens=20
#> <shaun.jurrens@skoleetaten.oslo.no> wrote:
#>=20
#> >I hate to say it, but I've had these for months starting at 4.6-stable
#> >and continuing up to at least the latest 4.7-RELEASE-p* . I have one
#> >dual -current box that has exhibited the same behaviour as well.
#> >
#> >The boxes work just fine with the xl0 driver. Lots of different
#> >motherboards and processors (all PIII) and a number of different Intel
#> >card revisions. I can't run my squid boxes on fxp cards _at all_ for
#> >example, the fxp driver will take the box down with it. On my firewalls
#> >it's locked up the  interfaces numerous times.
#> >
#> >The only suggestion I can offer at the moment is to try various card
#> >placements over your PCI slots. I've found stability using one of the
#> >first two slots for my Adaptec controller (2940U[2]W, 29160[N]) and the
#> >rest for the Intel nics.  This happens both with or without POLLING
#> >enabled. I've tried a number of combinations of POLLING enabled/disable=
d,
#> >not  compiled in and different HZ settings. Obviously no POLLING on my
#> >SMP  boxes.
#> >
#> >I know one or two others that have had problems with this too, but
#> >haven't  had the time or equipment at hand to work with any developers =
on
#> >getting this fixed. I guess I got the equipment now (various PIII UP/SMP
#> >boards from Gigabyte, Asus) and a little time if anyone wants to bite.
#> >
#> >My guess is that the POLLING commits broke something, but that's just a
#> >guess. I don't have any dc cards here, and no one has ever complained
#> >about either them or the rl cards timing out.  There also seems to be
#> >a definite correlation between the fxp problem and the ahc driver.
#> >
#> >Ok, the rest of the "me too's" should now chime in with a bit of time
#> >and energy. There's also a PR open on this: kern/45568 .
#> >
#> >
#> >
#> >--
#>=20

--=20
Med vennlig hilsen/Sincerely,

Shaun D. Jurrens
Drift og Sikkerhetskonsulent
IKT-Avdeling
Oslo Skoleetaten

gpg key fingerprint: 007A B6BD 8B1B BAB9 C583  2D19 3A7F 4A3E F83E 84AE

--djJN5oi3zFpblwUd
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+3wo0On9KPvg+hK4RAgPDAJwJiZvozhTU/NxI1Q8f0wGb3rQZZgCdHXrJ
EhsABUwk5AhmLrZ5vCITwjw=
=G1EC
-----END PGP SIGNATURE-----

--djJN5oi3zFpblwUd--

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jun  5 03:37:59 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E616737B401
	for <freebsd-scsi@freebsd.org>; Thu,  5 Jun 2003 03:37:59 -0700 (PDT)
Received: from mars.oxyd.fr (mars.oxyd.fr [212.43.245.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CC32F43F3F
	for <freebsd-scsi@freebsd.org>; Thu,  5 Jun 2003 03:37:58 -0700 (PDT)
	(envelope-from pcasidy@casidy.com)
Received: from gueway.home (du-253-140.nat.adsl.claranet.fr [212.43.253.140])
	by mars.oxyd.fr (8.12.6/8.12.3) with ESMTP id h55Abu2v093455
	for <freebsd-scsi@freebsd.org>; Thu, 5 Jun 2003 12:37:56 +0200 (CEST)
	(envelope-from pcasidy@casidy.com)
Received: from casidy.com (littleoak.home [192.168.1.3])
	by gueway.home (8.12.9/8.12.9) with ESMTP id h55Acpvg092080
	for <freebsd-scsi@freebsd.org>; Thu, 5 Jun 2003 12:38:56 +0200 (CEST)
	(envelope-from pcasidy@casidy.com)
Message-Id: <200306051038.h55Acpvg092080@gueway.home>
Date: Thu, 5 Jun 2003 12:35:41 +0200 (CEST)
From: pcasidy@casidy.com
To: freebsd-scsi@freebsd.org
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Subject: Cant write filemarks on DDS-4 Tapes
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jun 2003 10:38:00 -0000

Hello

I currently face a strange problem.

I use several types of tapes, from 1 GB to 20 GB.
I can read and write all of them except DDS-4 (20 GB)!

I can read DDS-4 tapes but cannot write to them.

When I try to write on a DDS-4 I encounter an error:
[gueway] ~# tar c /
tar: Removing leading `/' from member names
tar: /dev/sa0: Wrote only 0 of 10240 bytes
tar: Error is not recoverable: exiting now

and /var/log/messages says:
Jun  5 12:28:09 gueway /kernel: (sa0:sym0:0:5:0): WRITE FILEMARKS. CDB: 10 0 0 0 1 0 
Jun  5 12:28:09 gueway /kernel: (sa0:sym0:0:5:0): MEDIUM ERROR info:1 asc:c,0
Jun  5 12:28:09 gueway /kernel: (sa0:sym0:0:5:0): Write error
Jun  5 12:28:09 gueway /kernel: (sa0:sym0:0:5:0): failed to write terminating filemark(s)

Every time a filemark has to be written, I get the same error.
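
For readers decoding the kernel messages above, here is a minimal sketch
that unpacks the CDB bytes and the asc/ascq pair. The field layout is the
standard 6-byte WRITE FILEMARKS command (opcode 0x10, filemark count in
bytes 2-4); ASC 0x0C with ASCQ 0x00 is the generic SCSI "write error"
code. The helper names are made up for illustration and are not part of
any FreeBSD tool.

```python
def decode_write_filemarks(cdb_text):
    """Parse a hex CDB string as printed by the kernel, e.g. '10 0 0 0 1 0'."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 6 or cdb[0] != 0x10:
        raise ValueError("not a 6-byte WRITE FILEMARKS CDB")
    return {
        "opcode": cdb[0],
        # IMMED bit: drive may return status before the mark is on tape
        "immediate": bool(cdb[1] & 0x01),
        # 24-bit big-endian number of filemarks to write
        "count": (cdb[2] << 16) | (cdb[3] << 8) | cdb[4],
        "control": cdb[5],
    }


def decode_asc(asc_text):
    """Parse the 'asc:c,0' payload ('c,0') into (ASC, ASCQ) integers."""
    asc, ascq = (int(part, 16) for part in asc_text.split(","))
    return asc, ascq


fields = decode_write_filemarks("10 0 0 0 1 0")
sense = decode_asc("c,0")
print(fields)
print("ASC=0x%02X ASCQ=0x%02X" % sense)
```

Read this way, the failing command asks for a single filemark,
non-immediate, and the drive itself reports the medium error.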

For information, mt status gives:
[gueway] ~# mt status
Mode      Density              Blocksize      bpi      Compression
Current:  0x26:DDS-4           variable       97000    DCLZ
---------available modes---------
0:        0x26:DDS-4           variable       97000    DCLZ
1:        0x26:DDS-4           variable       97000    DCLZ
2:        0x26:DDS-4           variable       97000    DCLZ
3:        0x26:DDS-4           variable       97000    DCLZ
---------------------------------
Current Driver State: at rest.
---------------------------------
File Number: 0  Record Number: 0        Residual Count 0

Setting compression to off, or the block size to 1024, does not help.

I do not understand why I can write to DDS-2 or DDS-3 but not to DDS-4
(I used to be able to several months ago).

I need your help ;)

Thanks

Phil.

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jun  5 09:53:37 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 11ED637B401; Thu,  5 Jun 2003 09:53:37 -0700 (PDT)
Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7F82643F75; Thu,  5 Jun 2003 09:53:36 -0700 (PDT)
	(envelope-from gibbs@scsiguy.com)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h55GmPP20473;
	Thu, 5 Jun 2003 09:48:25 -0700
Received: from [10.100.253.70] (aslan.btc.adaptec.com [10.100.253.70])
	by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id JAA25099;
	Thu, 5 Jun 2003 09:53:33 -0700 (PDT)
Date: Thu, 05 Jun 2003 10:54:14 -0600
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Shaun Jurrens <shaun.jurrens@skoleetaten.oslo.no>,
	Palle Girgensohn <girgen@pingpong.net>
Message-ID: <1607630000.1054832053@aslan.btc.adaptec.com>
In-Reply-To: <20030605091532.GO98443@nevada.skoleetaten.oslo.no>
References: <20030603152123.GM98443@nevada.skoleetaten.oslo.no>
	<46490000.1054744366@rambutan.pingpong.net>
	<20030605091532.GO98443@nevada.skoleetaten.oslo.no>
X-Mailer: Mulberry/3.0.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
cc: freebsd-net@freebsd.org
cc: freebsd-scsi@freebsd.org
Subject: Re: fxp0: device timeout | SCB already complete (me too)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Justin T. Gibbs" <gibbs@scsiguy.com>
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jun 2003 16:53:37 -0000

> 	After that the box didn't find 3 of the 5 fxp nic's until a new
> 	boot and a cleared ESCD. Not sure why an fxp card should bitch 
> 	about SCB's anyway.

Perhaps because fxp devices have SCBs too?  Not the same SCBs that
the Adaptec SCSI controllers have, but a different data structure
that happens to have the same acronym.

--
Justin

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jun  5 09:56:28 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2711B37B405
	for <scsi@freebsd.org>; Thu,  5 Jun 2003 09:56:28 -0700 (PDT)
Received: from rootlabs.com (root.org [67.118.192.226])
	by mx1.FreeBSD.org (Postfix) with SMTP id 036C243F75
	for <scsi@freebsd.org>; Thu,  5 Jun 2003 09:56:27 -0700 (PDT)
	(envelope-from nate@rootlabs.com)
Received: (qmail 27713 invoked by uid 1000); 5 Jun 2003 16:56:27 -0000
Date: Thu, 5 Jun 2003 09:56:27 -0700 (PDT)
From: Nate Lawson <nate@root.org>
To: Shaun Jurrens <shaun.jurrens@skoleetaten.oslo.no>
In-Reply-To: <20030605091532.GO98443@nevada.skoleetaten.oslo.no>
Message-ID: <20030605095126.B27684@root.org>
References: <20030603152123.GM98443@nevada.skoleetaten.oslo.no>
	<20030605091532.GO98443@nevada.skoleetaten.oslo.no>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Mailman-Approved-At: Thu, 05 Jun 2003 10:08:02 -0700
cc: stable@freebsd.org
cc: Palle Girgensohn <girgen@pingpong.net>
Subject: Re: fxp0: device timeout | SCB already complete (me too)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jun 2003 16:56:28 -0000

On Thu, 5 Jun 2003, Shaun Jurrens wrote:
> On Wed, Jun 04, 2003 at 06:32:46PM +0200, Palle Girgensohn wrote:
> #> Hi Shaun,
> #>
> #> Thanks for the input! Glad to hear I'm not the only one
> #>
> #> In my case, both the SCSI and NIC are integrated on the motherboard, so I
> #> cannot really move them around... :)
> #>
> #> Also, as I mentioned, I tried a de0 (PCI card, not onboard, and it
> #> literally stopped the machine). Is the de0 driver also a problem?
>
> 	I'm beginning to think it's a scsi problem of sorts as well so
> 	I clipped -hardware and Cc'd -scsi on this. I just happened to
> 	(unfortunately) run into this on another box yesterday after
> 	four months of relative quiet. I happened to be moving an
> 	interface over from some crap Nortel switch to a nice Cisco
> 	switch and promptly a different interface began to do its dance.
> 	It's the same interface each time (and I've changed cards...)
> 	Anyway, for the record, a little from messages:
>
> Jun  2 18:48:43 nol33n0x /kernel: fxp0: Microcode loaded, int_delay: 1000 usec
> bundle_max: 6
> Jun  4 16:57:50 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x0
> Jun  4 16:57:51 nol33n0x last message repeated 4 times
> Jun  4 16:57:51 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x400
> Jun  4 16:57:58 nol33n0x /kernel: fxp1: SCB timeout: 0x80 0xe0 0x50 0x0
> Jun  4 16:57:58 nol33n0x last message repeated 3 times

This doesn't mention SCSI anywhere.  Your problem is almost certainly a
PCI/interrupt problem.  I'm redirecting this thread to -stable.

> #> >The only suggestion I can offer at the moment is to try various card
> #> >placements over your PCI slots. I've found stability using one of the
> #> >first two slots for my Adaptec controller (2940U[2]W, 29160[N]) and the
> #> >rest for the Intel nics.

I got panics on boot with my BP6 (SMP) when I had an ahc controller in a
PCI slot that didn't support bus mastering.  I suggest you do what the
above message says and try different combinations of cards in slots (i.e.
keep removing one until you no longer get the messages and move around
which slot is free).  This will help people track down the problem.  Also
get your mobo manual and check if any slots force interrupt sharing or
don't support bus mastering.

-Nate

From owner-freebsd-scsi@FreeBSD.ORG  Thu Jun  5 13:38:07 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 474A937B401; Thu,  5 Jun 2003 13:38:07 -0700 (PDT)
Received: from vmx1.skoleetaten.oslo.no (vmx1.skoleetaten.oslo.no
	[193.156.192.31])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 3485743F93; Thu,  5 Jun 2003 13:38:06 -0700 (PDT)
	(envelope-from shamz@nevada.skoleetaten.oslo.no)
Received: from smtp.skoleetaten.oslo.no (localhost [127.0.0.1])
	by vmx1.skoleetaten.oslo.no (Clean Mail System) with SMTP
	id 881047D554; Thu,  5 Jun 2003 22:35:45 +0200 (CEST)
Received: from nevada.skoleetaten.oslo.no (nevada.skoleetaten.oslo.no
	[193.156.192.131])
	by smtp.skoleetaten.oslo.no (Clean Mail System) with ESMTP
	id 55E067D379; Thu,  5 Jun 2003 22:35:45 +0200 (CEST)
Received: from nevada.skoleetaten.oslo.no (localhost [127.0.0.1])
	h55KZjOU055966;	Thu, 5 Jun 2003 22:35:45 +0200 (CEST)
	(envelope-from shamz@nevada.skoleetaten.oslo.no)
Received: (from shamz@localhost)h55KZZJh055965;
	Thu, 5 Jun 2003 22:35:35 +0200 (CEST)
Date: Thu, 5 Jun 2003 22:35:35 +0200
From: Shaun Jurrens <shaun.jurrens@skoleetaten.oslo.no>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Message-ID: <20030605203535.GP98443@nevada.skoleetaten.oslo.no>
References: <20030603152123.GM98443@nevada.skoleetaten.oslo.no>
	<46490000.1054744366@rambutan.pingpong.net>
	<20030605091532.GO98443@nevada.skoleetaten.oslo.no>
	<1607630000.1054832053@aslan.btc.adaptec.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="7PAM/4G1BR2SfWzg"
Content-Disposition: inline
In-Reply-To: <1607630000.1054832053@aslan.btc.adaptec.com>
User-Agent: Mutt/1.4.1i
X-Operating-System: FreeBSD 4.8-RELEASE
cc: freebsd-net@freebsd.org
cc: Palle Girgensohn <girgen@pingpong.net>
cc: freebsd-scsi@freebsd.org
Subject: Re: fxp0: device timeout | SCB already complete (me too)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jun 2003 20:38:07 -0000


--7PAM/4G1BR2SfWzg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jun 05, 2003 at 10:54:14AM -0600, Justin T. Gibbs wrote:
#>  	After that the box didn't find 3 of the 5 fxp nic's until a new
#> > 	boot and a cleared ESCD. Not sure why an fxp card should bitch=20
#> > 	about SCB's anyway.
#>=20
#> Perhaps because fxp devices have SCBs too?  Not the same SCBs that
#> the Adaptec SCSI controllers have, but a different data structure
#> that happens to have the same acronym.

	A bit confusing, I'll admit, and not documented. A more careful
	grep would have found it in /usr/src/sys/dev/fxp/if_fxp.c (and=20
	related files...)

	Thanx for the clue bat anyway...
#>=20
#> --
#> Justin

--=20
Med vennlig hilsen/Sincerely,

Shaun D. Jurrens
Drift og Sikkerhetskonsulent
IKT-Avdeling
Oslo Skoleetaten

gpg key fingerprint: 007A B6BD 8B1B BAB9 C583  2D19 3A7F 4A3E F83E 84AE

--7PAM/4G1BR2SfWzg
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+36mXOn9KPvg+hK4RAqtPAJ9Z4vmerpivSlSO3Wv7jFqWXmAc+gCfZHCr
TkFhfsZZ9qJkU9Zjnk+sYZc=
=5j1v
-----END PGP SIGNATURE-----

--7PAM/4G1BR2SfWzg--

From owner-freebsd-scsi@FreeBSD.ORG  Fri Jun  6 07:38:41 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3C26B37B401
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 07:38:41 -0700 (PDT)
Received: from matou.sibbald.com (matou.sibbald.com [195.202.201.48])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F082143F93
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 07:38:34 -0700 (PDT)
	(envelope-from kern@sibbald.com)
Received: from [192.168.68.112] (rufus [192.168.68.112])
	by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id h56EUpv23575;
	Fri, 6 Jun 2003 16:30:52 +0200
From: Kern Sibbald <kern@sibbald.com>
To: mjacob@feral.com
In-Reply-To: <20030604074943.E98367@wonky.in0.lcl>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost> <20030602110836.H71034@beppo>
	<20030602131225.F71034@beppo>	 <1054645616.13630.161.camel@rufus>
	<3490610000.1054651919@aslan.scsiguy.com>
	<20030603084701.U24586@wonky.in0.lcl>
	<20030603103611.R24586@wonky.in0.lcl>
	<20030604074943.E98367@wonky.in0.lcl>
Content-Type: text/plain
Organization: 
Message-Id: <1054909851.13630.967.camel@rufus>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4 
Date: 06 Jun 2003 16:30:51 +0200
Content-Transfer-Encoding: 7bit
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Jun 2003 14:38:41 -0000

Hello,

I have now completed a fairly extensive series of tests
on my Linux machine with a DDS-4 drive and on Dan's FreeBSD
machine with a DDS-1 drive.

Bottom line: There is a significant data loss (500KB to 2MB) 
at the EOM on Dan's drive.  There is no data loss on my drive.

The variation in the data loss seems to be inversely dependent
on how compressible the data is (i.e the more the data can be
compressed to fit in a fixed size driver buffer, the more user
data is lost).

I ran three different kinds of tests and several variations of some
of those tests:

Tests:
1. Bacula saving a 1GB file containing random data.
2. Simulation of Bacula writing easily compressible, non-random data.
3. Raw write() of random data (same data each write except for
   first 32 bits).

Variations:
1. Bacula stops writing before EOM is reached.
2. Test 2 above without drive hardware compression
3. Test 3 above without writing EOF but simply rewinding
4. Tests with and without using ioctl(MTIOCLRERROR).
5. Various tests with block size at 64,512 bytes, others with
   block size at 61,440 bytes.

Results:
1. All tests on my machine succeeded.
2. All tests (Test 1 Variation 1) not writing to EOM succeeded
   on both machines. (Previously we indicated that there
   was a loss when not writing to the EOM. I could not
   produce this and believe we had a misunderstanding 
   somewhere).
3. All tests of all variations writing to EOM failed 
   on Dan's machine.
4. The number of buffers lost was quite consistent (1-2 buffer
   difference) for any given variation.
5. There was not much difference in the number of buffers
   lost with/without hardware compression when the data was
   random.
6. The number of buffers lost was 4 times greater with
   non-random data and drive compression enabled than
   with random data or with no drive compression.

Conclusions:
1. On Dan's machine, data is always lost at EOM.
2. The amount of data lost appears to be closely
   related to what is in the drive buffer (more buffers 
   are lost if the data is easily compressed).
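
As a rough illustration of conclusion 2: if a fixed amount of already
compressed data is sitting in the drive buffer when the physical end of
tape is hit, the user data lost scales with the compression ratio. The
512 KB buffer size and the 4:1 ratio below are assumptions picked only to
match the reported 500KB to 2MB range and the factor of 4 in result 6;
they are not measured values for either drive.

```python
BLOCK_SIZE = 64512  # one of the block sizes used in the tests above


def user_data_lost(drive_buffer_bytes, compression_ratio):
    """User bytes lost if a full drive buffer of compressed data is dropped."""
    return drive_buffer_bytes * compression_ratio


assumed_buffer = 512 * 1024  # hypothetical drive buffer size in bytes

for ratio in (1, 4):  # 1 = random data (incompressible), 4 = assumed 4:1
    lost = user_data_lost(assumed_buffer, ratio)
    print("ratio %d:1 -> %d bytes, about %d blocks"
          % (ratio, lost, lost // BLOCK_SIZE))
```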

Possible causes:
1. The hardware does not have an LEOM
2. The driver is not signaling to the program when an LEOM
   occurs, thus the buffered data is lost at the PEOM.  The
   ONLY write() status I got in all the tests was -1 with 
   errno=ENOSPC (a return of zero bytes written was never seen).
3. Some miscommunication between the hardware and the driver.

What next:
- Time for the SCSI guys to look at this.  The problem is easily 
  repeatable on Dan's machine -- just do a whole bunch
  of write()s, nothing else, and it is guaranteed
  to happen.
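
A minimal sketch of that "whole bunch of write()s" loop, for anyone who
wants to repeat the test. It counts the blocks the driver accepted and
reports how the loop ended: a zero-byte return, a short write, or an
errno such as ENOSPC. The fake_tape() helper is a stand-in invented so
the sketch runs without a drive; it is not a model of the sa(4) driver.

```python
import errno
import os


def write_until_error(fd, block, write=os.write):
    """Write `block` repeatedly until write() fails, returns 0, or is short.

    Returns (full_blocks_written, reason); reason is 'zero', 'short',
    or the errno value of the failing write.
    """
    blocks = 0
    while True:
        try:
            n = write(fd, block)
        except OSError as exc:
            return blocks, exc.errno
        if n == 0:
            return blocks, "zero"
        if n < len(block):
            return blocks, "short"
        blocks += 1


def fake_tape(capacity_blocks):
    """Hypothetical device: accepts N blocks, then fails with ENOSPC."""
    state = {"left": capacity_blocks}

    def write(fd, data):
        if state["left"] == 0:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        state["left"] -= 1
        return len(data)

    return write


blocks, reason = write_until_error(-1, b"\0" * 64512, write=fake_tape(5))
print(blocks, reason == errno.ENOSPC)
```

Against a real drive, one would open the tape device with os.open() and
call write_until_error() with the default os.write, then rewind, read the
tape back, and compare the number of blocks read with the count returned
here.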

Perhaps all the above is not clear enough, in which case,
please ask, but if I write it out with all the reasoning, it will
be a monster essay, so I've tried to give the important test
results so that you can draw your own conclusions and then
compare them to mine.

Best regards,

Kern

From owner-freebsd-scsi@FreeBSD.ORG  Fri Jun  6 09:00:17 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 04BE737B401
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 09:00:17 -0700 (PDT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6FBE543FA3
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 09:00:16 -0700 (PDT)
	(envelope-from dan@langille.org)
Received: from wocker (wocker.unixathome.org [192.168.0.99])
	by bast.unixathome.org (Postfix) with ESMTP id 77E7B3D29
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 12:00:15 -0400 (EDT)
From: "Dan Langille" <dan@langille.org>
To: freebsd-scsi@freebsd.org
Date: Fri, 06 Jun 2003 12:00:15 -0400
MIME-Version: 1.0
Message-ID: <3EE0824F.8075.DD59AAD5@localhost>
Priority: normal
References: <20030604074943.E98367@wonky.in0.lcl>
In-reply-to: <1054909851.13630.967.camel@rufus>
X-mailer: Pegasus Mail for Windows (v4.02a)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Jun 2003 16:00:17 -0000

On 6 Jun 2003 at 16:30, Kern Sibbald wrote:

> What next:
> - Time for the SCSI guys to look at this.  The problem is easily 
>   repeatable on Dan's machine -- just do a whole bunch
>   of write()s, nothing else, and it is guaranteed
>   to happen.

Access to the box in question can be arranged for those who
want or need it.
-- 
Dan Langille : http://www.langille.org/

From owner-freebsd-scsi@FreeBSD.ORG  Fri Jun  6 11:50:35 2003
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8E75C37B401
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 11:50:35 -0700 (PDT)
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D2E9443FAF
	for <freebsd-scsi@freebsd.org>; Fri,  6 Jun 2003 11:50:34 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Received: from mailhost.feral.com (mjacob@mailhost.feral.com [192.67.166.1])
	by beppo.feral.com (8.12.9/8.12.9) with ESMTP id h56IoWqw071754;
	Fri, 6 Jun 2003 11:50:33 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Fri, 6 Jun 2003 11:50:32 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
X-X-Sender: mjacob@beppo
To: Kern Sibbald <kern@sibbald.com>
In-Reply-To: <1054909851.13630.967.camel@rufus>
Message-ID: <20030606114718.U71629@beppo>
References: <3EDB31AB.16420.C8964B7D@localhost>
	<3EDB59A4.27599.C93270FB@localhost>
	<577540000.1054579840@aslan.btc.adaptec.com>
	<20030602131225.F71034@beppo>  <1054645616.13630.161.camel@rufus> 
	<1054653106.13606.217.camel@rufus><1054661668.13606.292.camel@rufus> 
	<1054711231.13606.396.camel@rufus> <1054909851.13630.967.camel@rufus>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: mjacob@feral.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Jun 2003 18:50:35 -0000

I'm OOT right now.

My guess is that if this is related to DCLZ format on an old Python,
it's not correctly signalling early warning. Note that the driver
doesn't give a flying frick about what happens with data if it's being
compressed on the drive; it's the drive's responsibility to signal early
warning appropriately.

I have an old DDS1 changer drive around somewhere but I won't be able to
connect it until I get back next week.