From owner-freebsd-bugs Fri May 19 18:31:38 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
        id SAA11970 for bugs-outgoing; Fri, 19 May 1995 18:31:38 -0700
Received: from ns1.win.net (NS1.WIN.NET [204.215.209.3])
        by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id SAA11964
        for ; Fri, 19 May 1995 18:31:36 -0700
Received: (from bugs@localhost) by ns1.win.net (8.6.11/8.6.9)
        id VAA07349 for bugs@freebsd.org; Fri, 19 May 1995 21:34:07 -0400
From: Mark Hittinger
Message-Id: <199505200134.VAA07349@ns1.win.net>
Subject: re: kern/430: bug in tape drivers
To: bugs@FreeBSD.org
Date: Fri, 19 May 1995 21:34:07 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 2030
Sender: bugs-owner@FreeBSD.org
Precedence: bulk

>Number: 430
>Category: kern
>Synopsis: SCSI Tape dont work
>Originator: Charles Henrich (MSU)
>Release: FreeBSD 2.1.0-Development i386
>
> ALR Dual Pentium, BT747 SCSI-2, Connor DDS-2 Dat, 3 Seagate Hawk 2gig
                    ^^^^^
> drives.
>
>Description:
>
> 90% of the time you access the dat drive via dump, FreeBSD goes off
> and scrambles the other disks in the system. This sucks, and has
> happened to me several times.
>

I have seen the same problem since 2.0R. I have a WangDAT3400DX. When a
process closes the tape drive I get "bt0a: try to abort". I believe this
is due to the lengthy rewind, although recently I noted that there was a
problem with scsi commands that contained no data. In any event I still
see the problem in -current. I will try a 2940 controller this weekend
and see if the problem exists there.

After a few "bt0a try to abort" I get a "bt0a abort timed out". It is at
this point that horrible things happen. The driver corrupts the ccb chain
and bit sprays your disks. If the rewind finishes before the "bt0a abort
timed out" then no badness happens to your disks.

As a short term kludge/workaround for myself I use the following patch
to sys/i386/isa/bt742a.c:

1600c1600
<       int count = xs->timeout;
---
>       unsigned long count = xs->timeout;
1631d1630
<       untimeout(bt_timeout, (caddr_t)ccb);
1710c1709
<       timeout(bt_timeout, (caddr_t)ccb, 2 * hz);
---
>       timeout(bt_timeout, (caddr_t)ccb, 10 * hz);

There appears (at least to me) to be a redundant call to untimeout which
I remove. I increased the timeout value. Also a (probably) cosmetic type
change on count.

After running a backup each day, when the tape drive is closed I will get
several bt0a aborts until the tape drive completes rewinding. Then
everything is ok. I never reach the bt0a abort timeout step where the
nasty corruption occurs.

In any event this allows me to do good backups so I can use my time on
bigger brushfires :-).

Regards,

Mark Hittinger
bugs@win.net