From owner-freebsd-scsi  Mon Jan 18 20:31:23 1999
From: "Kenneth D. Merry" <ken@plutotech.com>
Message-Id: <199901190431.VAA05784@panzer.plutotech.com>
Subject: Re: Fireball woes (continued)
In-Reply-To: <xzpd84c53n7.fsf@flood.ping.uio.no> from Dag-Erling Smorgrav at "Jan 19, 99 04:43:08 am"
To: des@flood.ping.uio.no (Dag-Erling Smorgrav)
Date: Mon, 18 Jan 1999 21:31:01 -0700 (MST)
Cc: scsi@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28s (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Dag-Erling Smorgrav wrote...
> Some of you may remember that I had a nearly brand-new Fireball act up
> on me last fall... well, it's acting up again:
> 
> Jan 19 04:12:59 niobe /kernel: (da1:ahc0:0:4:0): READ(06). CDB: 8 2 57 0 10 0
> Jan 19 04:12:59 niobe /kernel: (da1:ahc0:0:4:0): MEDIUM ERROR info:2570d asc:11,0
> Jan 19 04:12:59 niobe /kernel: (da1:ahc0:0:4:0): Unrecovered read error
> Jan 19 04:12:59 niobe /kernel: /: got error 5 while accessing filesystem
> Jan 19 04:12:59 niobe /kernel: Lost type inodedep
> Jan 19 04:12:59 niobe last message repeated 12 times
> Jan 19 04:12:59 niobe /kernel: /: got error 5 while accessing filesystem
> Jan 19 04:12:59 niobe /kernel: Lost type inodedep
> Jan 19 04:12:59 niobe last message repeated 4 times
> Jan 19 04:12:59 niobe /kernel: /: got error 5 while accessing filesystem
> Jan 19 04:12:59 niobe /kernel: Lost type inodedep

Hmm, medium error.  That's not good at all.  The info field is most likely
the block that it had trouble with: the read starts at block 0x25700 for
0x10 blocks, and 0x2570d is inside that range, so the drive was part-way
into the read request when it blew up.
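
To check whether the info value really falls inside the request, here is a
minimal Python sketch (not part of the original message).  It assumes the
kernel prints the CDB bytes and the info field in hex, and the function
names are made up for illustration.

```python
def decode_read6(cdb_text):
    """Decode a READ(6) CDB given as space-separated hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # The LBA is 21 bits: low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 means 256 blocks.
    length = cdb[4] or 256
    return lba, length


def blocks_into_request(cdb_text, info_hex):
    """Return how far into the request the failing block is, or None
    if the info value lies outside the requested range."""
    lba, length = decode_read6(cdb_text)
    offset = int(info_hex, 16) - lba
    return offset if 0 <= offset < length else None


# Values taken from the log quoted above.
print(blocks_into_request("8 2 57 0 10 0", "2570d"))
```

For the log above this prints 13: the request covers 16 blocks starting at
0x25700, and the failing block is 13 blocks past the start.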

> At that point, the system froze for about a minute, maybe less, then
> rebooted (it *may* have dropped into DDB; I tried typing "panic" and
> "continue" blind since I was in X at the time, and it rebooted at
> about the time I finished typing "continue" and hit enter)
> 
> A short time after reboot, I get:
> 
> Jan 19 04:31:19 niobe /kernel: (da1:ahc0:0:4:0): READ(06). CDB: 8 3 68 60 48 0
> Jan 19 04:31:19 niobe /kernel: (da1:ahc0:0:4:0): MEDIUM ERROR info:36866 asc:11,1
> Jan 19 04:31:19 niobe /kernel: (da1:ahc0:0:4:0): Read retries exhausted

Yep, another medium error, different block.

> ===> gnu/usr.bin/cc/cc1
> install -c -s -o root -g wheel -m 555   cc1 /usr/release/usr/libexec
> install: /usr/release/usr/libexec/cc1: Bad address
> *** Error code 71
> 
> Stop.
> 
> (which coincides with the message from vm_fault).


Yeah, if the disk your swap partition is on goes south, you'll have trouble.

> "camcontrol defects -n da -u 1 -f block -G" gives me 15 defects, and
> the following log message as a bonus:
> 
> Jan 19 04:39:33 niobe /kernel: (pass1:ahc0:0:4:0): READ DEFECT DATA(10). CDB: 37 0 8 0 0 0 0 fd e8 0
> Jan 19 04:39:33 niobe /kernel: (pass1:ahc0:0:4:0): RECOVERED ERROR asc:1c,0
> Jan 19 04:39:33 niobe /kernel: (pass1:ahc0:0:4:0): Defect list not found

That's normal for some drives.  If they don't support the defect list
format that you ask for, they'll send an error back (here a RECOVERED ERROR
with asc 1c,0).  Sometimes they'll return the defect list in a different
format, sometimes they won't return it at all.  camcontrol will handle the
case where they return the defect list in a different format.

Most every drive that I've seen will give you the defect list in 'phys'
format.
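
To see what was actually requested, the READ DEFECT DATA(10) CDB from the
log can be decoded by hand.  The following Python sketch is an illustration
only; the field layout follows my reading of SCSI-2 (PLIST and GLIST bits
and a 3-bit format code in byte 2, allocation length in bytes 7-8), and the
function name is hypothetical.

```python
# Defect list format codes from byte 2, bits 2-0 (SCSI-2).
FORMATS = {
    0b000: "block",
    0b100: "bytes from index",
    0b101: "physical sector",
}


def decode_read_defect_data10(cdb_text):
    """Decode a READ DEFECT DATA(10) CDB given as hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x37:
        raise ValueError("not a READ DEFECT DATA(10) CDB")
    flags = cdb[2]
    return {
        "plist": bool(flags & 0x10),   # primary (factory) list requested
        "glist": bool(flags & 0x08),   # grown list requested
        "format": FORMATS.get(flags & 0x07, "reserved or vendor specific"),
        "alloc_len": (cdb[7] << 8) | cdb[8],
    }


print(decode_read_defect_data10("37 0 8 0 0 0 0 fd e8 0"))
```

For the logged CDB this shows a request for the grown list only, in block
format, which matches the "-f block -G" arguments given to camcontrol.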

> The permanent defect list has >600 entries.

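Don't worry about the permanent defect list.  Those are defects that were
mapped out at the factory; the grown list (-G) is the one that shows new
damage.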

> The disk is a six-months-old Quantum Fireball:
> 
> Jan 19 04:18:58 niobe /kernel: da1 at ahc0 bus 0 target 4 lun 0
> Jan 19 04:18:58 niobe /kernel: da1: <QUANTUM FIREBALL ST6.4S 0F0C> Fixed Direct Access SCSI-2 device
> Jan 19 04:18:58 niobe /kernel: da1: 20.0MB/s transfers (20.0MHz, offset 15), Tagged Queueing Enabled
> Jan 19 04:18:58 niobe /kernel: da1: 6180MB (12657717 512 byte sectors: 255H 63S/T 787C)
> 
> Can anybody tell me what error 5 means, and how serious the disk's
> condition is?

Error 5 is EIO (from sys/errno.h):

#define EIO             5               /* Input/output error */
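
If you don't have the header handy, the same lookup can be done from a
script.  This is just a small Python sketch using the standard errno and os
modules; the message text comes from the C library, so its exact wording
may differ between systems.

```python
import errno
import os

code = 5
print(errno.errorcode[code])  # symbolic name for the number
print(os.strerror(code))      # human-readable message from libc
```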

Do you have automatic read and write reallocation (the ARRE and AWRE bits)
turned on for that disk?  If not, edit mode page 1 and turn both on.  Then,
back up whatever you can get off the disk.
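
For reference, the two reallocation settings are bits in byte 2 of mode
page 1 (the Read-Write Error Recovery page).  The Python sketch below only
illustrates which bits are meant, assuming you are looking at the raw page
bytes; the helper names are hypothetical and it does not talk to the drive.

```python
# Bit positions in byte 2 of mode page 0x01 (SCSI-2).
AWRE = 0x80  # automatic write reallocation enabled
ARRE = 0x40  # automatic read reallocation enabled


def reallocation_status(byte2):
    """Report whether each reallocation bit is set."""
    return {"AWRE": bool(byte2 & AWRE), "ARRE": bool(byte2 & ARRE)}


def enable_reallocation(byte2):
    """Return byte 2 with both reallocation bits set, others unchanged."""
    return byte2 | AWRE | ARRE


print(reallocation_status(0x04))
print(hex(enable_reallocation(0x04)))
```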

Then try writing to the entire disk, to try to force it to remap any bad
blocks it has.  Then read from every block on the disk and see if you get
any errors.  (dd is probably the best tool for both.)
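
If you want the read pass to keep going after an error and list every
failing offset, something like the following Python sketch could stand in
for dd.  It is an illustration, not a tested disk tool: read_scan is a
made-up name, and it assumes that seeking to the end of the device reports
its size, which holds for regular files but may not for every device node.

```python
import os


def read_scan(path, chunk_size=64 * 1024):
    """Read every chunk of `path`; return (bytes_read, failures)."""
    failures = []  # (byte offset, errno) for each chunk that failed
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the total size.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError as exc:
                failures.append((offset, exc.errno))
                offset += want  # skip the failed chunk and carry on
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, failures
```

Call it with the device node for the disk; an empty failures list means
every chunk was readable.  A failed chunk can be rescanned with a smaller
chunk_size to narrow down the bad blocks.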

If you still have trouble, you can try formatting the disk.  The following
command (a FORMAT UNIT with a two-hour timeout) will probably do the trick:

camcontrol cmd -v -t 7200 -n da -u 1 -c "4 0 0 0 0 0"

You might need a longer timeout.  I've heard rumors, though, that Quantum
drives don't actually do anything for the FORMAT UNIT command
(specifically, that they just wait 5 minutes and then complete the
command).  I haven't tried to format a Quantum disk in a long, long time,
though, so I can't say whether that's true or not.

I do know, however, that the 0F0C firmware for the Fireball ST is buggy,
and you're likely to see it hang up in certain situations under high load.
You'll probably be happier if you can get the 0F0J firmware (the 0FS1
firmware may also work okay).  Of course, you shouldn't bother if the disk
is going south.

Ken
-- 
Kenneth Merry
ken@plutotech.com
