Date:      Wed, 21 Jul 1999 12:07:28 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        mike@smith.net.au (Mike Smith)
Cc:        asami@cs.berkeley.edu (Satoshi Asami), scsi@FreeBSD.ORG
Subject:   Re: error logs
Message-ID:  <199907211807.MAA83601@panzer.kdm.org>
In-Reply-To: <199907211630.JAA00715@dingo.cdrom.com> from Mike Smith at "Jul 21, 1999 09:30:25 am"

Mike Smith wrote...
> > Hi,
> > 
> > I have a question.  I just saw some errors on the package building
> > machine.  Part of it looks like this:
> > 
> > ===
> >  :
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0 
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): MEDIUM ERROR info:3cf816 asc:11,0
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): Unrecovered read error sks:80,9
> 
> This is a fatal read error.  The kernel will retry it.

If it gets retried, it gets retried above the CAM layer.  When CAM prints
out an error message, it is almost always after all retries have been
exhausted.  Read and write commands from the da driver have a retry count
of 4.
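
To illustrate the point about messages appearing only once the retries are
used up, here is a minimal Python sketch.  It is not CAM code:
issue_with_retries and send_command are hypothetical names, and it assumes
that a retry count of 4 means one initial attempt plus up to four retries.

```python
def issue_with_retries(send_command, retry_count=4):
    """Hypothetical helper, not CAM code: retry quietly, report once at the end."""
    attempts = 0
    while True:
        attempts += 1
        error = send_command()  # assumed to return None on success, else an error string
        if error is None:
            return True, attempts, None
        if retry_count == 0:
            # Only at this point would an error message reach the console.
            return False, attempts, error
        retry_count -= 1


# A command that always fails is attempted 1 + 4 times before the error surfaces.
ok, attempts, error = issue_with_retries(lambda: "MEDIUM ERROR")
print(ok, attempts, error)
```

Under that assumption, a message in the log means the command had already
failed on every attempt the peripheral driver allowed.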

> > Jul 21 02:25:40 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0 
> > Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): RECOVERED ERROR info:3cf817 asc:17,2
> > Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): Recovered data with positive head offset sks:80,2
> >  :
> 
> This is the kernel-instigated retry, note that the read10 command is 
> the same.  The drive reports that it was able to recover the data but 
> needed to adjust the head position in order to do so.

The read command is the same, but the block referred to in this error
message is different from the one above.  See the info field: 3cf816 in
the first error, 3cf817 here.  The read CDB above is two blocks in length,
so it covers both.
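
The CDB from the log can be decoded by hand, or with a short Python sketch
like the one below.  The function name decode_read10 is made up for
illustration; the byte layout assumed is the standard READ(10) one (opcode
0x28 in byte 0, big-endian logical block address in bytes 2-5, big-endian
transfer length in bytes 7-8).

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes, as in the log."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # starting block address
    length = int.from_bytes(bytes(cdb[7:9]), "big")  # number of blocks to read
    return lba, length


lba, length = decode_read10("28 0 0 3c f8 16 0 0 2 0")
blocks = [format(lba + i, "x") for i in range(length)]
print(blocks)  # ['3cf816', '3cf817']
```

The two block numbers match the info fields of the two messages: the medium
error is on the first block of the transfer, the recovered error on the
second.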

> > ===
> > 
> > I assume the stuff after "CDB:" is the entire SCSI command (10-byte
> > commands?), does this mean that the kernel got a medium error from the
> > disk, retried the exact same read command and succeeded the second
> > time, even though the disk had to do some internal fiddling ("positive
> > head offset")?
> > 
> > I also see a bunch of recovered error messages with no associated
> > medium error messages.  This probably means the disk is dying, right?
> 
> It at least means that it's grown some defects.  What I'm not seeing 
> are any additions to the grown defects list, despite ARRE being set.  8(

Read reallocation only works if the disk managed to salvage the data.  If
it can't salvage the data, it can't reallocate it.  Write reallocation,
IMO, should be successful much more often, because the kernel has the good
data already.
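
For reference, ARRE and its write-side counterpart AWRE are bits in the SCSI
Read-Write Error Recovery mode page (page code 0x01).  The sketch below
assumes the usual layout, with AWRE in bit 7 and ARRE in bit 6 of byte 2 of
that page.  The function name and the sample byte value 0xC0 are made up
for illustration and are not taken from the disk in question.

```python
AWRE = 0x80  # bit 7: automatic write reallocation enabled
ARRE = 0x40  # bit 6: automatic read reallocation enabled


def reallocation_flags(page_byte2):
    """Report which automatic reallocation bits are set in byte 2 of mode page 0x01."""
    return {"AWRE": bool(page_byte2 & AWRE), "ARRE": bool(page_byte2 & ARRE)}


# Hypothetical byte value with both bits set.
print(reallocation_flags(0xC0))
```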

Ken
-- 
Kenneth Merry
ken@plutotech.com

