From owner-freebsd-hackers Sat Jan 20 20:02:39 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id UAA12899 for hackers-outgoing; Sat, 20 Jan 1996 20:02:39 -0800 (PST)
Received: from jhome.DIALix.COM (root@jhome.DIALix.COM [192.203.228.69]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id UAA12893 for ; Sat, 20 Jan 1996 20:02:34 -0800 (PST)
Received: (from peter@localhost) by jhome.DIALix.COM (8.7.3/8.7.3) id MAA06529; Sun, 21 Jan 1996 12:02:24 +0800 (WST)
Date: Sun, 21 Jan 1996 12:02:24 +0800 (WST)
From: Peter Wemm
To: FreeBSD hackers
Subject: Whew!!!!!!! (MAJOR sigh of relief!)
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
Precedence: bulk

Ever had that unpleasant message:

st0(bt0:5:0): MEDIUM ERROR info:2800 asc:11,0 Unrecovered read error

I got this while restoring from a backup (the only useful backup, it turned out) of some rather critical information from my work machine.

I've been discovering all about the 'scsi' command, and how !&@#%!&# big the SCSI-2 spec is.. Well.. I (by luck, I think :-) managed to tell the DAT drive to seek one block past the bad spot with the SCSI "READ POSITION" and "LOCATE" commands, and tar was quite happy to continue reading! (Considering I knew *nothing* about low-level SCSI, this was even more against the odds... :-)

WHEW!!! ( understatement++++ )

Just out of genuine pure coincidence (yes, they do happen :-), the few files that I lost happened to be saved mail, and I think one of my large old "pending reply" folders was the worst hit... :-/ (This was from my work machine, which is currently loading FreeBSD-2.1R, so it doesn't affect my backlog of FreeBSD mail...)

Several morals to the story:

1: do backups more often.. :-)
2: if you know you are going to need it (like you've spent 4 days recovering the data off a scrambled disk and you are dumping it to tape before wiping the rest of the disk and repartitioning it), make sure you verify it MORE than once...
3: in situation #2, make a second copy of the tape in case it goes bad...
4: don't do backups in a room whose temperature is higher than the environmental limits of the media in question (45C or 113F).

Along the way, I discovered that the BT driver handles timeouts and aborts very badly...

I also discovered that the timeout for the SCSI "READ" command is too short: the command timed out before the drive had given up on trying to recover the data..

I also discovered that we need an "mt getpos" that tells us what block number the tape is on (from SCSI "READ POSITION"), and an "mt setpos" to do a SCSI "LOCATE".. (The command names don't matter, but it'd be nice to have them without having to look up the mammoth SCSI-2 spec each time..)

-Peter
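The "seek one block past the bad spot" trick, and the proposed "mt getpos"/"mt setpos", come down to two SCSI-2 sequential-access commands. The C sketch below shows only the byte layout involved: building the 10-byte READ POSITION (opcode 34h) and LOCATE (opcode 2Bh) command descriptor blocks, and pulling the "first block location" out of the 20-byte READ POSITION reply so a LOCATE can be aimed one block further on. It is a sketch under stated assumptions: the helper names are hypothetical, it assumes logical block addressing (BT = 0) in the current partition on LUN 0, and it assumes the reported position is the unreadable block. Actually delivering the CDB to the drive (for example through the 'scsi' command) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SCSI-2 sequential-access opcodes. */
#define SCSI_READ_POSITION 0x34
#define SCSI_LOCATE        0x2B

/* Hypothetical helper: 10-byte READ POSITION CDB.
 * Byte 1 (LUN bits and BT) left at 0: LUN 0, logical block addresses. */
static void build_read_position(uint8_t cdb[10])
{
    memset(cdb, 0, 10);
    cdb[0] = SCSI_READ_POSITION;
}

/* Hypothetical helper: 10-byte LOCATE CDB for a logical block in the
 * current partition (BT = 0, CP = 0, Immed = 0 in byte 1). */
static void build_locate(uint8_t cdb[10], uint32_t block)
{
    memset(cdb, 0, 10);
    cdb[0] = SCSI_LOCATE;
    cdb[3] = (uint8_t)(block >> 24);   /* bytes 3..6: block address, big-endian */
    cdb[4] = (uint8_t)(block >> 16);
    cdb[5] = (uint8_t)(block >> 8);
    cdb[6] = (uint8_t)block;
}

/* Read the "first block location" (bytes 4..7) from a 20-byte
 * READ POSITION reply. Returns -1 if BPU (byte 0, bit 2: block
 * position unknown) is set, 0 otherwise. */
static int parse_read_position(const uint8_t data[20], uint32_t *first_block)
{
    if (data[0] & 0x04)
        return -1;
    *first_block = ((uint32_t)data[4] << 24) | ((uint32_t)data[5] << 16) |
                   ((uint32_t)data[6] << 8)  |  (uint32_t)data[7];
    return 0;
}

/* Hypothetical helper: given the READ POSITION reply taken at the bad
 * block, build a LOCATE CDB for the block after it. Returns -1 if the
 * position is unknown or already at the last addressable block. */
static int skip_past_bad_block(const uint8_t reply[20], uint8_t cdb[10])
{
    uint32_t pos;

    if (parse_read_position(reply, &pos) != 0 || pos == UINT32_MAX)
        return -1;
    build_locate(cdb, pos + 1);
    return 0;
}
```

For example, a reply reporting block 0x1234 yields a LOCATE CDB whose address bytes 3..6 are 00 00 12 35; keeping CDB construction separate from the transport makes the byte layout easy to check against the spec on its own.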