From owner-freebsd-bugs  Fri May 19 18:31:38 1995
Return-Path: bugs-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id SAA11970
          for bugs-outgoing; Fri, 19 May 1995 18:31:38 -0700
Received: from ns1.win.net (NS1.WIN.NET [204.215.209.3])
          by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id SAA11964
          for <bugs@freebsd.org>; Fri, 19 May 1995 18:31:36 -0700
Received: (from bugs@localhost) by ns1.win.net (8.6.11/8.6.9) id VAA07349 for bugs@freebsd.org; Fri, 19 May 1995 21:34:07 -0400
From: Mark Hittinger <bugs@ns1.win.net>
Message-Id: <199505200134.VAA07349@ns1.win.net>
Subject: re: kern/430: bug in tape drivers
To: bugs@FreeBSD.org
Date: Fri, 19 May 1995 21:34:07 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 2030      
Sender: bugs-owner@FreeBSD.org
Precedence: bulk

>Number:         430
>Category:       kern
>Synopsis:       SCSI Tape dont work
>Originator:     Charles Henrich (MSU)
>Release:        FreeBSD 2.1.0-Development i386
>
>	ALR Dual Pentium, BT747 SCSI-2, Connor DDS-2 Dat, 3 Seagate Hawk 2gig
                          ^^^^^
>	drives.
>
>Description:
>
>	90% of the time you access the dat drive via dump, FreeBSD goes off
>	and scrambles the other disks in the system.  This sucks, and has
>	happened to me several times.
>

I have seen the same problem since 2.0R.  I have a WangDAT3400DX.  When a
process closes the tape drive I get "bt0a: try to abort".  I believe this
is due to the lengthy rewind, although recently I noted that there was a
problem with SCSI commands that contain no data.  In any event I still
see the problem in -current.  I will try a 2940 controller this weekend
and see if the problem exists there.

After a few "bt0a try to abort" messages I get a "bt0a abort timed out".
It is at this point that horrible things happen.  The driver corrupts the
ccb chain and bit sprays your disks.  If the rewind finishes before the
"bt0a abort timed out" message appears, then no badness happens to your
disks.

As a short term kludge/workaround for myself I use the following patch
to sys/i386/isa/bt742a.c:

1600c1600
< 	int	count = xs->timeout;
---
> 	unsigned long	count = xs->timeout;
1631d1630
< 		untimeout(bt_timeout, (caddr_t)ccb);
1710c1709
< 		timeout(bt_timeout, (caddr_t)ccb, 2 * hz);
---
> 		timeout(bt_timeout, (caddr_t)ccb, 10 * hz);

There appears (at least to me) to be a redundant call to untimeout, which
I remove.  I also increased the timeout value from 2 * hz to 10 * hz and
made a (probably) cosmetic type change to count, from int to unsigned
long.
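
For anyone wondering what the timeout change amounts to, here is a
minimal sketch of the arithmetic.  The third argument to timeout() is a
count of clock ticks, and hz is the number of ticks per second.  The
helper name seconds_to_ticks is made up for illustration and is not in
the driver, and the hz value of 100 in the comment is an assumption
about a typical i386 kernel, not something I checked.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of bt742a.c: convert a timeout given
 * in whole seconds into clock ticks, the unit timeout() expects.
 * "hz" is the kernel tick rate; 100 is assumed in the examples below.
 */
int seconds_to_ticks(int seconds, int hz)
{
	assert(seconds >= 0 && hz > 0);
	return seconds * hz;
}

/*
 * With hz assumed to be 100:
 *   original driver: 2 * hz  ->  seconds_to_ticks(2, 100)
 *   patched driver: 10 * hz  ->  seconds_to_ticks(10, 100)
 * i.e. the patched timeout is five times longer.
 */
```

The ratio between the two values is 5 whatever hz actually is; only the
absolute tick counts depend on the assumed tick rate.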

With the patch in place, after each day's backup I still get several bt0a
aborts when the tape drive is closed, until the drive finishes rewinding.
Then everything is OK.  I never reach the "bt0a abort timed out" step
where the nasty corruption occurs.

In any event this allows me to do good backups so I can use my time on
bigger brushfires :-).

Regards,

Mark Hittinger
bugs@win.net