From owner-freebsd-scsi Mon Jan 18 20:31:23 1999
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id UAA09590
          for freebsd-scsi-outgoing; Mon, 18 Jan 1999 20:31:23 -0800 (PST)
          (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from panzer.plutotech.com (panzer.plutotech.com [206.168.67.125])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA09584
          for ; Mon, 18 Jan 1999 20:31:21 -0800 (PST)
          (envelope-from ken@panzer.plutotech.com)
Received: (from ken@localhost)
          by panzer.plutotech.com (8.9.1/8.8.5) id VAA05784;
          Mon, 18 Jan 1999 21:31:01 -0700 (MST)
From: "Kenneth D. Merry"
Message-Id: <199901190431.VAA05784@panzer.plutotech.com>
Subject: Re: Fireball woes (continued)
In-Reply-To: from Dag-Erling Smorgrav at "Jan 19, 99 04:43:08 am"
To: des@flood.ping.uio.no (Dag-Erling Smorgrav)
Date: Mon, 18 Jan 1999 21:31:01 -0700 (MST)
Cc: scsi@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28s (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Dag-Erling Smorgrav wrote...
> Some of you may remember that I had a nearly brand-new Fireball act up
> on me last fall... well, it's acting up again:
>
> Jan 19 04:12:59 niobe /kernel: (da1:ahc0:0:4:0): READ(06). CDB: 8 2 57 0 10 0
> Jan 19 04:12:59 niobe /kernel: (da1:ahc0:0:4:0): MEDIUM ERROR info:2570d asc:11,0
> Jan 19 04:12:59 niobe /kernel: (da1:ahc0:0:4:0): Unrecovered read error
> Jan 19 04:12:59 niobe /kernel: /: got error 5 while accessing filesystem
> Jan 19 04:12:59 niobe /kernel: Lost type inodedep
> Jan 19 04:12:59 niobe last message repeated 12 times
> Jan 19 04:12:59 niobe /kernel: /: got error 5 while accessing filesystem
> Jan 19 04:12:59 niobe /kernel: Lost type inodedep
> Jan 19 04:12:59 niobe last message repeated 4 times
> Jan 19 04:12:59 niobe /kernel: /: got error 5 while accessing filesystem
> Jan 19 04:12:59 niobe /kernel: Lost type inodedep

Hmm, medium error.  That's not good at all.  My guess is that the info
field is probably the block that it had trouble with.  Yeah, looks like
it was probably part-way into the read request when it blew up.

> At that point, the system froze for about a minute, maybe less, then
> rebooted (it *may* have dropped into DDB; I tried typing "panic" and
> "continue" blind since I was in X at the time, and it rebooted at
> about the time I finished typing "continue" and hit enter)
>
> A short time after reboot, I get:
>
> Jan 19 04:31:19 niobe /kernel: (da1:ahc0:0:4:0): READ(06). CDB: 8 3 68 60 48 0
> Jan 19 04:31:19 niobe /kernel: (da1:ahc0:0:4:0): MEDIUM ERROR info:36866 asc:11,1
> Jan 19 04:31:19 niobe /kernel: (da1:ahc0:0:4:0): Read retries exhausted

Yep, another medium error, different block.

> ===> gnu/usr.bin/cc/cc1
> install -c -s -o root -g wheel -m 555 cc1 /usr/release/usr/libexec
> install: /usr/release/usr/libexec/cc1: Bad address
> *** Error code 71
>
> Stop.
>
> (which coincides with the message from vm_fault).

Yeah, if the disk your swap partition is on goes south, you'll have
trouble.

> "camcontrol defects -n da -u 1 -f block -G" gives me 15 defects, and
> the following log message as a bonus:
>
> Jan 19 04:39:33 niobe /kernel: (pass1:ahc0:0:4:0): READ DEFECT DATA(10). CDB: 37 0 8 0 0 0 0 fd e8 0
> Jan 19 04:39:33 niobe /kernel: (pass1:ahc0:0:4:0): RECOVERED ERROR asc:1c,0
> Jan 19 04:39:33 niobe /kernel: (pass1:ahc0:0:4:0): Defect list not found

That's normal for some drives.  If they don't support the defect list
format that you ask for, they'll spew a non-standard error back.
Sometimes they'll return the defect list in a different format,
sometimes they won't return it at all.  camcontrol will handle the case
where they return the defect list in a different format.

Most every drive that I've seen will give you the defect list in 'phys'
format.

> The permanent defect list has >600 entries.

Don't worry about the permanent defect list.

> The disk is a six-months-old Quantum Fireball:
>
> Jan 19 04:18:58 niobe /kernel: da1 at ahc0 bus 0 target 4 lun 0
> Jan 19 04:18:58 niobe /kernel: da1: Fixed Direct Access SCSI-2 device
> Jan 19 04:18:58 niobe /kernel: da1: 20.0MB/s transfers (20.0MHz, offset 15), Tagged Queueing Enabled
> Jan 19 04:18:58 niobe /kernel: da1: 6180MB (12657717 512 byte sectors: 255H 63S/T 787C)
>
> Can anybody tell me what error 5 means, and how serious the disk's
> condition is?

Error 5 is EIO (from sys/errno.h):

    #define EIO 5 /* Input/output error */

Do you have read/write reallocation turned on for that disk?  If not,
edit mode page 1 and turn them on.  Then, back up whatever you can get
off the disk.  Then try writing to the entire disk, to try to force it
to remap any bad blocks it has on the disk.  Then read from every block
on the disk and see if you get any errors.  (dd is probably the best for
both)

If you still have trouble, you can try formatting the disk.  The
following command will probably do the trick:

    camcontrol cmd -v -t 7200 -n da -u 1 -c "4 0 0 0 0 0"

You might need a longer timeout.  I've heard rumors, though, that
Quantum doesn't do anything for the format unit command.  (specifically,
that they just wait 5 minutes and return the command)  I haven't tried
to format a Quantum disk in a long, long time, though, so I can't say
whether that's true or not.

I do know, however, that the 0F0C firmware for the Fireball ST is buggy,
and you're likely to see it hang up in certain situations under high
load.  You'll probably be happier if you can get the 0F0J firmware (the
0FS1 firmware may also work okay).  Of course, you shouldn't bother if
the disk is going south.

Ken
-- 
Kenneth Merry
ken@plutotech.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
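
Addendum: the guess above that the info field is the failing block, and that the error hit part-way into the read, can be checked by hand from the quoted kernel messages. The following is only a rough sketch, assuming the CDB bytes and the info value are printed in hexadecimal and that the info field holds a logical block address; the helper names `decode_read6` and `locate` are made up for illustration and are not part of any FreeBSD tool.

```python
def decode_read6(cdb_hex):
    """Decode a READ(6) CDB given as space-separated hex bytes.

    Returns (starting LBA, transfer length in blocks).
    """
    b = [int(x, 16) for x in cdb_hex.split()]
    if len(b) != 6 or b[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # The low 5 bits of byte 1 are the top bits of the 21-bit LBA;
    # the upper 3 bits of byte 1 are the LUN field in SCSI-2.
    lba = ((b[1] & 0x1F) << 16) | (b[2] << 8) | b[3]
    # In READ(6), a transfer length of 0 means 256 blocks.
    length = b[4] or 256
    return lba, length


def locate(cdb_hex, info_hex):
    """Return (lba, length, offset, inside) for an info value.

    ASSUMPTION: info is the LBA of the block that failed.
    """
    lba, length = decode_read6(cdb_hex)
    offset = int(info_hex, 16) - lba
    return lba, length, offset, 0 <= offset < length


if __name__ == "__main__":
    # CDB and info values copied from the two quoted error reports.
    for cdb, info in (("8 2 57 0 10 0", "2570d"),
                      ("8 3 68 60 48 0", "36866")):
        lba, length, offset, inside = locate(cdb, info)
        print("read %d blocks at 0x%x, info is block +%d, inside=%s"
              % (length, lba, offset, inside))
```

Under those assumptions, the first request reads 16 blocks starting at 0x25700 and the info value 0x2570d is 13 blocks in; the second reads 72 blocks starting at 0x36860 and 0x36866 is 6 blocks in. Both fall inside their requests, which fits the "part-way into the read" reading.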