From owner-freebsd-hackers  Sat Jan 20 20:02:39 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id UAA12899
          for hackers-outgoing; Sat, 20 Jan 1996 20:02:39 -0800 (PST)
Received: from jhome.DIALix.COM (root@jhome.DIALix.COM [192.203.228.69])
          by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id UAA12893
          for <hackers@freebsd.org>; Sat, 20 Jan 1996 20:02:34 -0800 (PST)
Received: (from peter@localhost) by jhome.DIALix.COM (8.7.3/8.7.3) id MAA06529; Sun, 21 Jan 1996 12:02:24 +0800 (WST)
Date: Sun, 21 Jan 1996 12:02:24 +0800 (WST)
From: Peter Wemm <peter@jhome.DIALix.COM>
To: FreeBSD hackers <hackers@freebsd.org>
Subject: Whew!!!!!!! (MAJOR sigh of relief!)
Message-ID: <Pine.BSF.3.91.960121113701.1465B-100000@jhome.DIALix.COM>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
Precedence: bulk

Ever had that unpleasant message:
st0(bt0:5:0): MEDIUM ERROR info:2800 asc:11,0 Unrecovered read error

I got this while restoring some rather critical information (from my work
machine) off a backup (the only useful backup, as it turned out)..

I've been discovering all about the 'scsi' command, and how !&@#%!&# big
the SCSI-2 spec is..  Well.. (by luck, I think :-) I managed to tell the
DAT drive to seek one block past the bad spot with the SCSI "READ
POSITION" and "LOCATE" commands, and tar was quite happy to continue
reading!  (Considering I knew *nothing* about low-level SCSI, this was
all the more against the odds... :-)
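For anyone wanting to try the same trick, here is a minimal sketch (in Python, purely for illustration) of the bytes involved. It assumes the SCSI-2 sequential-access layouts as I read them: READ POSITION is opcode 0x34 and its short-form 20-byte reply carries the "first block location" in bytes 4-7, and LOCATE(10) is opcode 0x2B with the block address in bytes 3-6. The helper names are hypothetical, it only builds and parses bytes (actually sending them to the drive, e.g. with the 'scsi' command, is left out), and it assumes the position the drive reports after the failed read is that of the bad block itself, which may vary from drive to drive.

```python
import struct

READ_POSITION = 0x34  # assumed SCSI-2 opcode for READ POSITION
LOCATE_10 = 0x2B      # assumed SCSI-2 opcode for LOCATE (10-byte CDB)


def read_position_cdb() -> bytes:
    """Hypothetical helper: 10-byte READ POSITION CDB, short form, no flags."""
    return bytes([READ_POSITION] + [0] * 9)


def parse_read_position(data: bytes) -> int:
    """Hypothetical helper: return the first block location from a short-form reply."""
    if len(data) < 20:
        raise ValueError("short-form READ POSITION data should be 20 bytes")
    if data[0] & 0x04:  # BPU bit: drive says the block position is unknown
        raise ValueError("drive reports block position unknown")
    (first_block,) = struct.unpack_from(">I", data, 4)  # bytes 4-7, big-endian
    return first_block


def locate_cdb(block: int, immed: bool = False) -> bytes:
    """Hypothetical helper: 10-byte LOCATE CDB for a block in the current partition."""
    if not 0 <= block <= 0xFFFFFFFF:
        raise ValueError("block address must fit in 32 bits")
    flags = 0x01 if immed else 0x00  # IMMED bit only; CP and BT left clear
    # opcode, flags, reserved, 4-byte block address, reserved, partition, control
    return struct.pack(">BBBIBBB", LOCATE_10, flags, 0, block, 0, 0, 0)


def skip_past_bad_block(position_reply: bytes) -> bytes:
    """Build a LOCATE to the block after the one READ POSITION reported."""
    bad = parse_read_position(position_reply)
    return locate_cdb(bad + 1)
```

Usage would be: issue read_position_cdb() after the MEDIUM ERROR, feed the 20-byte reply to skip_past_bad_block(), send the resulting LOCATE, then restart tar from there.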

WHEW!!!  ( understatement++++ )

By pure coincidence (yes, they do happen :-), the few files that I lost
happened to be saved mail.. and I think one of my large old "pending reply"
folders was the worst hit... :-/  This was from my work machine (which is
currently loading FreeBSD-2.1R), so it doesn't affect my backlog of
FreeBSD mail...

Several morals to the story:
1: do backups more often.. :-)
2: if you know you are going to need it (like you've spent 4 days recovering
the data off a scrambled disk and you are dumping it to tape before wiping
the rest of the disk and repartitioning it), make sure you verify it more
than once...
3: in situation #2, make a second copy of the tape in case it goes bad...
4: don't do backups when the room temperature is above the environmental
limit of the media in question (45C or 113F)

Along the way, I discovered that the BT driver handles timeouts and
aborts very badly...  I also discovered that the timeout for the SCSI "READ"
command is too short: the command timed out before the drive had given
up trying to recover the data..

I also discovered that we need an "mt getpos" that tells us what block
number the tape is on (from SCSI "READ POSITION"), and an "mt setpos" to do a
SCSI "LOCATE".. (the command names don't matter, but it'd be nice to have
them without having to look up the mammoth SCSI-2 spec each time..)

-Peter