Date:      Wed, 21 Jul 1999 09:35:16 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        asami@cs.berkeley.edu (Satoshi Asami)
Cc:        scsi@FreeBSD.ORG
Subject:   Re: error logs
Message-ID:  <199907211535.JAA82716@panzer.kdm.org>
In-Reply-To: <199907210937.CAA96811@silvia.hip.berkeley.edu> from Satoshi Asami at "Jul 21, 1999 02:37:32 am"

Satoshi Asami wrote...
> Hi,
> 
> I have a question.  I just saw some errors on the package building
> machine.  Part of it looks like this:
> 
> ===
>  :
> Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0 
> Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): MEDIUM ERROR info:3cf816 asc:11,0
> Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): Unrecovered read error sks:80,9
> Jul 21 02:25:40 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0 
> Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): RECOVERED ERROR info:3cf817 asc:17,2
> Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): Recovered data with positive head offset sks:80,2
>  :
> ===
> 
> I assume the stuff after "CDB:" is the entire SCSI command (10-byte
> commands?),

Yes, that's the SCSI command.

> does this mean that the kernel got a medium error from the
> disk, retried the exact same read command and succeeded the second
> time, even though the disk had to do some internal fiddling ("positive
> head offset")?

Not quite.  The command was the same in both instances, but the two errors
refer to two different blocks on the disk.  (See the info field on the
first line of each error report; it tells you which block caused the
problem.)  My guess is that the command was retried by some upper-level
code, since CAM will generally only print one error for a command, and
then only after the retry count (4 in this case) has been exhausted.
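
As a rough illustration (not anything CAM itself runs), the CDB from the
log can be decoded by hand.  This sketch assumes the standard READ(10)
layout: opcode 0x28 in byte 0, a big-endian starting block address in
bytes 2-5, and a big-endian transfer length in bytes 7-8.  The helper name
decode_read10 is made up for this example.

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # Bytes 2-5: starting logical block address, big-endian.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    # Bytes 7-8: number of blocks to transfer, big-endian.
    count = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, count


# CDB copied from the kernel messages quoted above.
lba, count = decode_read10("28 0 0 3c f8 16 0 0 2 0")
blocks = range(lba, lba + count)
print(hex(lba), count)

# The info fields reported with the two errors.
for info in (0x3CF816, 0x3CF817):
    print(hex(info), info in blocks)
```

The command asks for two blocks starting at 0x3cf816, so both info values
(3cf816 and 3cf817) fall inside the same request.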

So it looks like you've got one bad block, and one block that got
recovered.

> I also see a bunch of recovered error messages with no associated
> medium error messages.  This probably means the disk is dying, right?

It could indeed mean the disk is dying.

Make sure you have read and write reallocation turned on for the disk, and
keep track of the grown defect list.  (see the camcontrol man page for how
to do that)

Ken
-- 
Kenneth Merry
ken@plutotech.com

