Date:      Sat, 14 Aug 1999 11:20:11 +0200 (CEST)
From:      Wilko Bulte <wilko@yedi.iaf.nl>
To:        karl@Denninger.Net (Karl Denninger)
Cc:        randy@psg.com, freebsd-scsi@FreeBSD.ORG
Subject:   Re: dump to dlt gets write error
Message-ID:  <199908140920.LAA53941@yedi.iaf.nl>
In-Reply-To: <19990813191646.A57450@Denninger.Net> from Karl Denninger at "Aug 13, 1999  7:16:46 pm"

As Karl Denninger wrote ...

> I've seen this kind of stupidity before and you're not going to like the
> problem or solution.
> 
> Put the DLT on a different SCSI bus (different host adapter) from the disks.
> 
> Specifically, separate the fast/wide and narrow SCSI devices.
> 
> I've seen both DLTs and other "non-wide" devices have kittens with disks
> running fast/wide on the same SCSI bus.  It usually manifests itself as 
> an I/O error on the narrow device - which is exactly what you're getting.

I've been doing this for years and it works just fine:
	
FreeBSD 3.2-STABLE #5: Sun Aug  8 17:13:28 CEST 1999
    root@yedi.iaf.nl:/usr/freebsd-stable-src/src/sys/compile/YEDI

[....]

da0: <DEC RZ1EF-CB (C) DEC 0371> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 17365MB (35565080 512 byte sectors: 255H 63S/T 2213C)
changing root device to da0s2a
cd0 at ahc0 bus 0 target 4 lun 0
cd0: <TOSHIBA CD-ROM XM-5701TA 0557> Removable CD-ROM SCSI-2 device 
cd0: 10.000MB/s transfers (10.000MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da1 at ahc0 bus 0 target 1 lun 0
da1: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
cd1: <PHILIPS CDD3600 CD-R/RW 2.00> Removable CD-ROM SCSI-2 device 
cd1: 10.000MB/s transfers (10.000MHz, offset 15)
cd1: Attempt to query device size failed: NOT READY, Medium not present
sa3 at ahc0 bus 0 target 2 lun 0
sa3: <ARCHIVE 4326XX 27871-XXX 0322> Removable Sequential Access SCSI-2 device 
sa3: 5.000MB/s transfers (5.000MHz, offset 15)
sa2 at ahc0 bus 0 target 5 lun 0
sa2: <DEC TZ88     (C) DEC D473> Removable Sequential Access SCSI-2 device 
sa2: 10.000MB/s transfers (10.000MHz, offset 15)
sa0 at ahc0 bus 0 target 6 lun 0
sa0: <TANDBERG TDC 4200 00A1> Removable Sequential Access SCSI-2 device 
sa0: 3.300MB/s transfers
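
Karl's theory is about mixing wide (16-bit) and narrow devices on one bus, and the probe messages above show exactly that mix on ahc0 bus 0. A minimal sketch for pulling the same summary out of your own boot messages follows. It assumes the probe lines look like the ones above (device name, then "NN.NNNMB/s transfers", with "16bit" present only on wide devices); bus_summary is a hypothetical helper, not part of FreeBSD.

```sh
#!/bin/sh
# bus_summary: read dmesg-style probe lines on stdin and print one line
# per device: name, negotiated transfer rate, and wide/narrow.
# Assumption: wide devices are the ones whose line mentions "16bit".
bus_summary() {
	awk '
	/: [0-9.]+MB\/s transfers/ {
		dev = $1
		sub(/:$/, "", dev)              # "da0:" -> "da0"
		width = ($0 ~ /16bit/) ? "wide" : "narrow"
		print dev, $2, width
	}'
}
```

Usage: dmesg | bus_summary. Lines such as "da0: 17365MB (...)" or "cd0: Attempt to query device size failed" do not match the pattern and are skipped.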

The TZ88 is a DLT4000, btw. I have also used a TZ87, which is a DLT2000. All
my tape drives sit in a StorageWorks shelf.

> My guess is that the hardware on the narrow (and not-so-fast) device gets
> mightily confused by the shorter signal times (even though they're not
> aimed at that target) and randomly "freaks out" enough to botch an
> operation.

A DLT4000 is a Fast SCSI device too, and my DLT2000, which runs at 5 MB/s,
worked just fine as well.

If I had to guess, I'd say this is a bad interconnect of some kind, or lousy
termination.

> Do you get any kind of DMESG log when the write *fails* (check it) or a
> console log of the actual error?
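
One way to answer that question is to filter the kernel messages for the tape unit. This sketch assumes that CAM prefixes its error messages with the peripheral path in parentheses, e.g. "(sa2:ahc0:0:5:0): ", while probe messages start with "sa2:" or "sa2 at"; tape_msgs is a hypothetical helper name.

```sh
#!/bin/sh
# tape_msgs UNIT: print the kernel messages that belong to saUNIT.
# Matches "sa2 at ...", "sa2: ..." and "(sa2:ahc0:0:5:0): ...",
# but not a different unit such as sa20.
tape_msgs() {
	grep -E "^\(?sa$1[: ]"
}
```

Usage: dmesg | tape_msgs 0, or tape_msgs 0 < /var/log/messages after a failed dump.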

You can also pull the error logs from within the DLT drive itself. Try
the script below:


#!/bin/sh

# dltinfo: get more information out of your DLT tape drive.
#
# (C) 1996, Wilko Bulte, wilko@freebsd.org
#
# Warning: This script has only been tested on a DEC TZ87 & TZ88 DLT
#
# You need the DLT drive's OEM manual (or similar) to make
# sense out of some of the data reported.

# Please send any constructive comments by email to wilko@freebsd.org

Unit=2

## camcontrol(8) setup
Verbose="-v"
Timeout="-t 3"	# referenced as $Timeout in the functions below

get_write_error_log() {

	RetVal=`camcontrol cmd -n sa -u $Unit \
	     $Verbose \
	     $Timeout \
	     -c "4d 0 42 0 0 0 0 0 3f 0" \
	     -i 63 \
		"{skip} *i4 \
		 {skip} *i4 \
		 {Corrected errors without substantial delay} i4 \
		 {skip} *i4 \
		 {Corrected errors with possible delay      } i4 \
		 {skip} *i4 \
		 {Total errors                              } i4 \
		 {skip} *i4 \
		 {Total errors corrected                    } i4 \
		 {skip} *i4 \
		 {Total times correction algorithm processed} i4 \
		 {skip} *i4 \
		 {Total bytes processed                     } i8 \
		 {skip} *i4 \
		 {Total uncorrected errors                  } i4"
	`
	set -- $RetVal	# "--": an empty result must not make set list all variables
	echo "--- write errors ---"
	printf "Corrected errors without substantial delay = %d\n" $1
	printf "Corrected errors with possible delay = %d\n" $2
	printf "Total errors = %d\n" $3
	printf "Total errors corrected = %d\n" $4
	printf "Total times correction algorithm processed = %d\n" $5
	printf "Total bytes processed  = %d\n" $6
	printf "Total uncorrected errors = %d\n" $7
}

get_read_error_log() {

	RetVal=`camcontrol cmd -n sa -u $Unit \
	     $Verbose \
	     $Timeout \
	     -c "4d 0 43 0 0 0 0 0 3f 0" \
	     -i 63 \
		"{skip} *i4 \
		 {skip} *i4 \
		 {Corrected errors without substantial delay} i4 \
		 {skip} *i4 \
		 {Corrected errors with possible delay      } i4 \
		 {skip} *i4 \
		 {Total errors                              } i4 \
		 {skip} *i4 \
		 {Total errors corrected                    } i4 \
		 {skip} *i4 \
		 {Total times correction algorithm processed} i4 \
		 {skip} *i4 \
		 {Total bytes processed                     } i8 \
		 {skip} *i4 \
		 {Total uncorrected errors                  } i4"
	`
	set -- $RetVal	# "--": an empty result must not make set list all variables
	echo "--- read errors ---"
        printf "Corrected errors without substantial delay = %d\n" $1
        printf "Corrected errors with possible delay = %d\n" $2
        printf "Total errors = %d\n" $3
        printf "Total errors corrected = %d\n" $4
        printf "Total times correction algorithm processed = %d\n" $5
        printf "Total bytes processed  = %d\n" $6
        printf "Total uncorrected errors = %d\n" $7 
	
}

get_compression_log() {

# Assumption: from the results observed in testing it looks
#             like the residual counts are in kBytes (and not
#             in Mbytes as the TZ87 manual tells us).

	RetVal=`camcontrol cmd -n sa -u $Unit \
	     $Verbose \
	     $Timeout \
	     -c "4d 0 72 0 0 0 0 0 4c 0" \
	     -i 76 \
		"{skip} *i4 \
                 {skip } *i4 \
		 {Read compression ratio (* 100 %)       } i2 \
		 {skip } *i4 \
		 {Write compression ratio (* 100 %)      } i2 \
		 {skip } *i4 \
		 {Total host Mbytes reads                } i4 \
		 {skip } *i4 \
		 {Total host kbytes read residual        } i4 \
		 {skip } *i4 \
		 {On tape Mbytes read                    } i4 \
		 {skip} *i4 \
		 {On tape kbytes read residual           } i4 \
		 {skip} *i4 \
		 {Host requested Mbytes written          } i4 \
		 {skip} *i4 \
		 {Host requested kbytes written residual } i4 \
		 {skip} *i4 \
		 {On tape Mbytes written                 } i4 \
		 {skip} *i4 \
		 {On tape kbytes written residual        } i4 "
	`
	set -- $RetVal	# "--": an empty result must not make set list all variables
	echo "--- compression statistics ---"
        printf "Read compression ratio = %d %%\n" $1
        printf "Write compression ratio = %d %%\n" $2
        printf "Total host Mbytes read = %d\n" $3
        printf "Total host kbytes read residual = %d\n" $4
        printf "On tape Mbytes read = %d\n" $5
        printf "On tape kbytes read residual  = %d\n" $6
        printf "Host requested Mbytes written = %d\n" $7 
        printf "Host requested kbytes written residual = %d\n" $8
        printf "On tape Mbytes written = %d\n" $9
        printf "On tape kbytes written residual = %d\n" ${10}	# plain $10 would be ${1}0

}

get_read_error_log
echo
get_write_error_log
echo
get_compression_log
echo
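
The -c strings passed to camcontrol above are 10-byte SCSI LOG SENSE CDBs. A small sketch that builds them, assuming PC = 01b (current cumulative values), page 02h (write error counter), page 03h (read error counter) and the vendor-specific page 32h the script reads for compression; log_sense_cdb is a hypothetical helper name.

```sh
#!/bin/sh
# log_sense_cdb PAGE LEN: print a LOG SENSE CDB in camcontrol's hex notation.
#   byte 0    : 0x4d, the LOG SENSE opcode
#   byte 2    : PC (bits 7-6) | page code (bits 5-0); PC = 01b gives 0x40 | page
#   bytes 7-8 : allocation length, most significant byte first
log_sense_cdb() {
	page=$1      # e.g. 2 = write error counter page, 3 = read error counter page
	len=$2       # allocation length in bytes, same value as the -i length
	printf '4d 0 %x 0 0 0 0 %x %x 0\n' \
	    $(( (1 << 6) | page )) $(( (len >> 8) & 0xff )) $(( len & 0xff ))
}
```

For example, log_sense_cdb 2 63 prints "4d 0 42 0 0 0 0 0 3f 0", the string used in get_write_error_log, and log_sense_cdb $((0x32)) 76 gives the compression-log CDB. Splitting the length over bytes 7 and 8 keeps it correct for allocation lengths above 255.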

The script is a quick hack, but it works for me. For example:


Mon Aug  9 00:32:07 CEST 1999
  DUMP: Date of this level 0 dump: Mon Aug  9 00:32:07 1999
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rda1c (/local2) to /dev/nrsa2
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 2685410 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 12.18% done, finished in 0:36
  DUMP: 27.84% done, finished in 0:25
  DUMP: 43.49% done, finished in 0:19
  DUMP: 59.02% done, finished in 0:13
  DUMP: 73.04% done, finished in 0:09
  DUMP: 86.80% done, finished in 0:04
  DUMP: 99.85% done, finished in 0:00
  DUMP: DUMP: 2686684 tape blocks on 1 volumes(s)
  DUMP: finished in 2104 seconds, throughput 1276 KBytes/sec
  DUMP: level 0 dump on Mon Aug  9 00:32:07 1999
  DUMP: Closing /dev/nrsa2
  DUMP: DUMP IS DONE
Mon Aug  9 01:07:41 CEST 1999

--- read errors ---
Corrected errors without substantial delay = 0
Corrected errors with possible delay = 0
Total errors = 0
Total errors corrected = 0
Total times correction algorithm processed = 0
Total bytes processed  = 0
Total uncorrected errors = 0

--- write errors ---
Corrected errors without substantial delay = 0
Corrected errors with possible delay = 0
Total errors = 105
Total errors corrected = 105
Total times correction algorithm processed = 0
Total bytes processed  = 0
Total uncorrected errors = 0

--- compression statistics ---
Read compression ratio = 0 %
Write compression ratio = 100 %
Total host Mbytes read = 0
Total host kbytes read residual = 0
On tape Mbytes read = 0
On tape kbytes read residual  = 0
Host requested Mbytes written = 9196
Host requested kbytes written residual = 196608
On tape Mbytes written = 9196
On tape kbytes written residual = 0

The most interesting part is the "Total errors" counter. I've seen it
skyrocket with bad media, or with DLT drives that have a bad head.
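
To keep an eye on that counter, the script's output can be checked mechanically, e.g. from a nightly job. A minimal sketch, assuming the output format shown above; check_dltinfo and its default limit of 1000 are hypothetical, so pick a limit that reflects what is normal for your drives and media.

```sh
#!/bin/sh
# check_dltinfo [LIMIT]: read dltinfo output on stdin, print each section's
# "Total errors" and "Total uncorrected errors", and return non-zero if any
# uncorrected errors were logged or a section's total exceeds LIMIT.
check_dltinfo() {
	limit=${1:-1000}   # hypothetical default threshold
	awk -v limit="$limit" '
	/^--- .* errors ---$/          { section = $2; next }
	/^Total errors = /             { print section, "total=" $4
	                                 if ($4 + 0 > limit + 0) bad = 1 }
	/^Total uncorrected errors = / { print section, "uncorrected=" $5
	                                 if ($5 + 0 > 0) bad = 1 }
	END { exit (bad ? 1 : 0) }'
}
```

Usage: ./dltinfo | check_dltinfo 500 || echo "check the drive and media". With the output above, a limit of 100 would flag the 105 write errors, while 1000 would not.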

IMHO this kind of error logging would be cool to have in some standard
form, like VMS has, or, in quite a different form, DEC Unix (er, Tru64).
It is really useful when you have hardware problems.

> Karl Denninger (karl@denninger.net)  Web: childrens-justice.org
> 
> 
> On Fri, Aug 13, 1999 at 05:05:17PM -0700, Randy Bush wrote:
> > asus p2b-ds 2x350mhz, 128mb
> > two barracudas
> > quantum dlt2000
> > 4.0-current of 99.04.03
> > 
> >     rip.psg.com:/# /do-dump
> >       ...
> >       DUMP: Date of this level 0 dump: Fri Aug 13 16:07:11 1999
> >       DUMP: Date of last level 0 dump: the epoch
> >       DUMP: Dumping /dev/rccd5c (/usr) to /dev/nrsa0
> >       DUMP: mapping (Pass I) [regular files]
> >       DUMP: mapping (Pass II) [directories]
> >       DUMP: estimated 2951716 tape blocks.
> >       DUMP: dumping (Pass III) [directories]
> >       DUMP: dumping (Pass IV) [regular files]
> >       DUMP: 11.37% done, finished in 0:38
> >       DUMP: 25.73% done, finished in 0:28
> >       DUMP: 39.41% done, finished in 0:23
> >       DUMP: 51.77% done, finished in 0:18
> >       DUMP: 64.66% done, finished in 0:13
> >       DUMP: 76.19% done, finished in 0:09
> >       DUMP: 88.92% done, finished in 0:04
> >       DUMP: write error 2700020 blocks into volume 1
> >       DUMP: Do you want to restart?: ("yes" or "no") 
> > 
> > usually a LOT more fits on a tape, like four machines more.
> > 
> > i ran the cleaning tape.  i tried different tapes from different batches,

Don't ever run cleaning tapes on a DLT drive unless the 'Use cleaning tape'
LED comes on. Cleaning tapes are really bad news for DLT heads if they
are run on a regular basis. A cleaning tape is more or less a normal data
tape that did not get its final polishing steps in manufacturing. It is
quite abrasive and does bad things to heads that are not dirty.

> > including one that worked in the past.  it breaks at different places, but
> > always much of the way through that partition.
> > 
> > clues solicited.
> > 
> > randy
> > 
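
For scale: dump(8) counts in 1 KB tape blocks (TP_BSIZE is 1024 bytes), so the point where Randy's dump failed and dump's own estimate can be converted to megabytes. A small arithmetic sketch; dump_size is a hypothetical helper name.

```sh
#!/bin/sh
# dump_size FAILED ESTIMATE: convert dump's 1 KB tape-block counts to
# MB (1024 KB) and show how far into the dump the failure happened.
dump_size() {
	failed=$1      # block count from "write error N blocks into volume 1"
	estimate=$2    # block count from "estimated N tape blocks"
	printf '%d MB of %d MB (%d%%)\n' \
	    $(( failed / 1024 )) $(( estimate / 1024 )) $(( failed * 100 / estimate ))
}
```

Usage: dump_size 2700020 2951716 prints "2636 MB of 2882 MB (91%)", i.e. the drive gave up after roughly 2.6 GB, well short of what Randy says normally fits on one tape.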

-- 
|   / o / /  _  	 Arnhem, The Netherlands	- Powered by FreeBSD -
|/|/ / / /( (_) Bulte 	 WWW  : http://www.tcja.nl 	http://www.freebsd.org

