Date:      Sun, 21 May 1995 08:28:49 -0400 (EDT)
From:      Peter Dufault <dufault@hda.com>
To:        bugs@ns1.win.net (Mark Hittinger)
Cc:        hackers@FreeBSD.org, julian@FreeBSD.org
Subject:   Re: kern/430: bug in tape drivers
Message-ID:  <199505211228.IAA17587@hda.com>
In-Reply-To: <199505200134.VAA07349@ns1.win.net> from "Mark Hittinger" at May 19, 95 09:34:07 pm

Mark Hittinger writes:
> 
> >Number:         430
> >Category:       kern
> >Synopsis:       SCSI Tape dont work
> >Originator:     Charles Henrich (MSU)
> >Release:        FreeBSD 2.1.0-Development i386
> >
> >	ALR Dual Pentium, BT747 SCSI-2, Connor DDS-2 Dat, 3 Seagate Hawk 2gig
>                           ^^^^^
> >	drives.
> >
> >Description:
> >
> >	90% of the time you access the dat drive via dump, FreeBSD goes off
> >	and scrambles the other disks in the system.  This sucks, and has
> >	happened to me several times.
> >

I think that the tape drive is tying up the SCSI bus (and
maybe therefore the host adapter?) for some reason.

> I have seen the same problem since 2.0R.  I have a WangDAT3400DX.  When a
> process closes the tape drive I get "bt0a: try to abort".  I believe this
> is due to the lengthy rewind, although recently I noted that there was a
> problem with scsi commands that contained no data.   In any event I
> still see the problem in -current.  I will try a 2940 controller this
> weekend and see if the problem exists there.

As I mentioned, zero length commands aren't an issue.

> After a few "bt0a try to abort" I get a "bt0a abort timed out".  It is
> at this point that horrible things happen.  The driver corrupts the ccb
> chain and bit sprays your disks.  If the rewind finishes before the
> "bt0a abort timed out" then no badness happens to your disks.

You get more than one "bt0: Try to abort" message?  That
is probably the SCSI system aborting the ongoing disk transfers that aren't
completing due to the problem with the tape drive, since you will
only get one "Try to abort" message per aborted transaction.

I'm not sure what your workaround does:  you end up stretching out
the "Try to abort" time until the drive finishes and "unlocks"
the host adapter.  So you've tried to abort a few transfers.  Did they
abort?  I don't know.  Do you wind up getting a disk retry per
abort message after this?

Anyway, if the "abort timed out" happens we toss that active CCB back
onto the freelist, and the next SCSI transaction will get that same
CCB.  This is probably a mistake: we should instead let the CCBs leak
off into the bit bucket, potentially hanging the system,
but tossing them back so that they wind up being reused may be what
is trashing the disk.
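
A minimal sketch of the "let them leak" idea, under stated assumptions:
this is not the bt driver's code, and every name in it (ccb_pool,
ccb_get, ccb_done, ccb_abort_timed_out, CCB_LOST) is hypothetical.  It
only shows a freelist where a CCB whose abort timed out is quarantined
instead of being handed to the next transaction, at the cost of the
pool eventually running dry.

```c
#include <assert.h>
#include <stddef.h>

#define NCCB 4                      /* illustrative pool size, not the driver's */

enum ccb_state { CCB_FREE, CCB_ACTIVE, CCB_LOST };

struct ccb {
    struct ccb *next;               /* freelist link */
    enum ccb_state state;
    int owner;                      /* hypothetical transaction id */
};

struct ccb_pool {
    struct ccb ccbs[NCCB];
    struct ccb *free_head;
    int leaked;                     /* CCBs deliberately never reused */
};

/* Put every CCB on the freelist, lowest index at the head. */
void pool_init(struct ccb_pool *p)
{
    p->free_head = NULL;
    p->leaked = 0;
    for (int i = NCCB - 1; i >= 0; i--) {
        p->ccbs[i].state = CCB_FREE;
        p->ccbs[i].owner = -1;
        p->ccbs[i].next = p->free_head;
        p->free_head = &p->ccbs[i];
    }
}

/* Take a CCB for a new transaction; NULL when the pool is exhausted. */
struct ccb *ccb_get(struct ccb_pool *p, int owner)
{
    struct ccb *c = p->free_head;
    if (c == NULL)
        return NULL;
    p->free_head = c->next;
    c->next = NULL;
    c->state = CCB_ACTIVE;
    c->owner = owner;
    return c;
}

/* Normal completion: the adapter is finished with the CCB, so reuse is safe. */
void ccb_done(struct ccb_pool *p, struct ccb *c)
{
    assert(c->state == CCB_ACTIVE);
    c->state = CCB_FREE;
    c->owner = -1;
    c->next = p->free_head;
    p->free_head = c;
}

/*
 * Abort timed out: the adapter may still be using this CCB, so it is
 * marked lost and kept off the freelist rather than recycled.
 */
void ccb_abort_timed_out(struct ccb_pool *p, struct ccb *c)
{
    assert(c->state == CCB_ACTIVE);
    c->state = CCB_LOST;
    p->leaked++;
}
```

With this arrangement a disk transaction that calls ccb_get() after a
tape abort has timed out can never receive the CCB the adapter might
still be writing through.  The trade-off is the one noted above: each
timed-out abort shrinks the pool, and once every CCB is lost ccb_get()
returns NULL and the caller has to wait or fail.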

Peter
-- 
Peter Dufault               Real Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax:   508 433 5267