Date:      Tue, 09 Aug 2005 10:23:50 +0200
From:      "O. Hartmann" <ohartman@mail.uni-mainz.de>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
Subject:   Re: ad10: WARNING - READ_DMA UDMA ICRC error (retrying  request) LBA=11441599
Message-ID:  <42F86816.6070706@mail.uni-mainz.de>
In-Reply-To: <6.2.1.2.0.20050808232304.03deb4b8@64.7.153.2>
References:  <42F7F7E8.1020507@mail.uni-mainz.de> <6.2.1.2.0.20050808232304.03deb4b8@64.7.153.2>

Mike Tancsa wrote:
> At 08:25 PM 08/08/2005, O. Hartmann wrote:
> 
>> Hello.
>>
>> My box is a FreeBSD 6.0-BETA2 driven ASUS A8N-SLI Deluxe based AMD64 
>> box (see dmesg).
>> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during 
>> operation (and also showed them under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this an operating system bug, or is this message evidence of 
>> defective hardware?
> 
> 
> You can probably confirm a hardware issue with the smartmon tools.  
> (/usr/ports/sysutils/smartmontools).
> 
> It was quite handy the other day for us to narrow down a problem between 
> a drive tray and the actual drive.  We started to see
> 
> Aug  3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=391423
> Aug  3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=2304319
> Aug  3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=2312927
> Aug  3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=2308639
> Aug  3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=2309855
> Aug  3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=2348359
> Aug  4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=1528639
> Aug  4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 
> retries left) LBA=1530031
> Aug  4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 
> retry left) LBA=1528639
> Aug  4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
> Aug  4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: 
> (error=5) bp 0xd630b4fc vp 0xc2640d68
> 
> Yet when we read the actual error info off the drive via smartctl -a 
> ad0, it was clean.  So it pointed to the drive tray which we swapped and 
> all was well.  In other situations, however, the SMART info will often 
> tell you if the drive is starting to fail.  It's not 100% reliable, but 
> since we started using it, it has generally given us some sort of heads-up 
> as to whether or not a drive is in trouble.
> 
> 
>         ---Mike

Dear Mike.
Thanks a lot for this info.
I will use this tool and report what I find.

I also use trays for my drives (like I did with SCSI and SCA2 on our 
servers at the lab). Maybe this could be an issue.

Oliver
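
To see at a glance which device is logging which kind of error, the kernel
messages quoted above can be tallied per device. What follows is only a
minimal sketch: the regular expression is an assumption derived from the two
message shapes shown in this thread (not from the ata driver source), the
function name summarize is hypothetical, and lines wrapped by the mailer must
be rejoined before parsing. ICRC is commonly read as an interface CRC error on
the link, so it gets its own bucket here.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log lines quoted in this thread:
#   adN: WARNING|TIMEOUT|FAILURE - COMMAND <free text> [LBA=n]
LINE_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<level>WARNING|TIMEOUT|FAILURE) - [A-Z_]+(?P<rest>.*)$"
)


def summarize(lines):
    """Count messages per (device, kind); ICRC warnings get their own kind."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an ata message of the assumed shape
        kind = "ICRC" if "ICRC" in m.group("rest") else m.group("level")
        counts[(m.group("dev"), kind)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599",
        "Aug  3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423",
        "Aug  4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out",
    ]
    for (dev, kind), n in sorted(summarize(sample).items()):
        print(dev, kind, n)
```

A tally like this does not replace smartctl; it only shows where the errors
cluster, which can then be compared against what the drive itself reports.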


