Date:      Sat, 25 Jun 2011 15:58:33 -0700
From:      perryh@pluto.rain.com
To:        freebsd-drivers@freebsd.org
Subject:   fatal ata WRITE_DMA48 UDMA ICRC errors
Message-ID:  <4e066819.DRprHaL0TvBGL6Jl%perryh@pluto.rain.com>

Once in a while, on a recently-installed 8.1-RELEASE, I get a
sequence like this (reformatted):

Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA
                ICRC error (retrying request) LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48
                status=51<READY,DSC,ERROR> error=4<ABORTED>
                LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5).
                ad8s2a[WRITE(offset=315265765888, length=78336)]
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider
                ad8s2a disconnected.

The sequence is consistent:  a retried WRITE_DMA48 UDMA ICRC error
on ad8, a WRITE_DMA48 "FAILURE" on the same LBA with status=51 and
error=4, a gmirror "Request failed (error=5)", and a disconnect.
The LBA, offset, and length vary from one instance to another.
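
For reference, here is a minimal sketch of how the two register values in the FAILURE line break down into bits. It assumes the values are printed in hex, which is consistent with 51 decoding to READY, DSC and ERROR as the log shows. The bit positions follow the standard ATA status and error register layout. Only the labels READY, DSC, ERROR, ABORTED and ICRC are taken from the log; the remaining names are shorthand chosen for this example.

```python
# Bit tables for the ATA status and error registers.
# Labels that do not appear in the log above are illustrative shorthand.
STATUS_BITS = {
    0x80: "BUSY",
    0x40: "READY",
    0x20: "DEVICE_FAULT",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORRECTABLE",
    0x02: "INDEX",
    0x01: "ERROR",
}

ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNCORRECTABLE",
    0x20: "MEDIA_CHANGED",
    0x10: "ID_NOT_FOUND",
    0x08: "MEDIA_CHANGE_REQUEST",
    0x04: "ABORTED",
    0x02: "NO_MEDIA",
    0x01: "ILLEGAL_LENGTH",
}


def decode(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


print("status=51:", decode(0x51, STATUS_BITS))
print("error=4: ", decode(0x04, ERROR_BITS))
```

Decoded this way, the FAILURE line carries only the ABORTED bit in the error register; the ICRC bit (0x80) is not set in it.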

The retry seems to succeed most of the time -- the "WARNING -
WRITE_DMA48 UDMA ICRC error" message most often is not closely
followed by anything else -- but it is immediately followed by
a failure with status=51 and error=4 frequently enough to be a
significant problem (since it breaks the mirror).
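
To put a number on "most of the time", something like the following sketch could count how many ICRC warnings were recovered and how many were followed by a FAILURE for the same device, command and LBA. It assumes each kernel message sits on one physical line, as in /var/log/messages (the excerpt above was reformatted), and that the message wording matches the excerpt exactly. The function name and the first sample line (LBA=1000) are made up for illustration.

```python
import re

# Patterns modeled on the two message forms quoted above.
ICRC_RE = re.compile(r"(ad\d+): WARNING - (\w+) UDMA ICRC error .*LBA=(\d+)")
FAIL_RE = re.compile(r"(ad\d+): FAILURE - (\w+) .*LBA=(\d+)")


def classify_icrc_events(lines):
    """Return (recovered, fatal) counts of ICRC warnings in lines."""
    recovered = fatal = 0
    pending = None  # (device, command, lba) of the last unresolved warning
    for line in lines:
        m = ICRC_RE.search(line)
        if m:
            if pending is not None:
                recovered += 1  # previous warning was never followed by a FAILURE
            pending = m.groups()
            continue
        m = FAIL_RE.search(line)
        if m and pending == m.groups():
            fatal += 1
            pending = None
    if pending is not None:
        recovered += 1
    return recovered, fatal


sample = [
    # Illustrative line, not from the real log:
    "Jun 25 15:40:02 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1000",
    # The two lines from the excerpt, joined back onto single lines:
    "Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=615769530",
    "Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=615769530",
]
print(classify_icrc_events(sample))
```

On a real system the input would be the lines of the log file, for example `classify_icrc_events(open("/var/log/messages"))`.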

The cable between the controller and the drive has been a factor --
the errors became much more frequent the first time I replaced it --
but I'm still getting occasional errors even with a brand-new cable.

I doubt there is anything wrong with the (nearly new) drive, because
I am not having any trouble at all with an identical drive connected
to the onboard ATA controller as ad0, but I wonder if there may be
known issues with the VIA-based PCI card that provides two SATA
ports along with the ad8 ATA port.  (Nothing is connected as ad9,
and I haven't yet tried to use either of the SATA devices.)

I've asked on geom@ about the possibility of making gmirror more
robust to this sort of event, but the better solution would be to
improve the handling at the hardware or ata driver level.  What
would cause the ad8 driver to sometimes return a FAILURE indication
after a single retryable error?  Would it make sense to treat this
indication (with status=51 and error=4) as retryable?
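
To make that last question concrete, here is a toy model of the policy being asked about. It is not the ata(4) driver's code or its actual decision logic; the function name, the retry budget and the constant names are all assumptions made for illustration. The only change from an "ICRC only" policy is the second rule, which also retries an ABORTED result when it arrives on the retry of a request that previously failed with ICRC.

```python
# Error register bits (standard ATA layout); constant names are illustrative.
E_ABORTED = 0x04
E_ICRC = 0x80


def should_retry(error, prev_error, retries_left):
    """Hypothetical policy: decide whether a failed DMA request is retried.

    error        -- error register value of the attempt that just failed
    prev_error   -- error register value of the previous attempt, or None
    retries_left -- remaining retry budget for this request
    """
    if retries_left <= 0:
        return False
    if error & E_ICRC:
        return True  # transfer CRC error: retry
    if (error & E_ABORTED and prev_error is not None
            and prev_error & E_ICRC):
        return True  # proposed: ABORTED right after an ICRC retry
    return False


# The sequence from the log: ICRC on the first attempt, ABORTED on the retry.
print(should_retry(0x80, None, 2))   # first failure
print(should_retry(0x04, 0x80, 1))   # failure of the retried request
```

Under this sketch an ABORTED with no preceding ICRC error is still treated as fatal, so the change would be limited to the case described above.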

Relevant parts of dmesg:

pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <VIA 6421 SATA150 controller>
 port 0xdc70-0xdc7f,0xdc50-0xdc5f,0xdc30-0xdc3f,
 0xdc10-0xdc1f,0xd8e0-0xd8ff,0xd400-0xd4ff
 irq 19 at device 11.0 on pci2
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci0
ata3: [ITHREAD]
ata4: <ATA channel 2> on atapci0
ata4: [ITHREAD]
pcib3: <PCI-PCI bridge> at device 14.0 on pci2
pci3: <PCI bus> on pcib3
atapci1: <Intel ICH UDMA66 controller>
 port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf
 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci1
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci1
ata1: [ITHREAD]
ad0: 305245MB <Hitachi HDT725032VLAT80 V54OA4NA> at ata0-master
 UDMA66 
ad1: 32253MB <MAXTOR 6L040L2 A93.0500> at ata0-slave UDMA66 
acd0: <Lite-On LTN483S 48x Max/PD02> CDROM drive at ata1 as slave
ad4: 61136MB <PATRIOT MEMORY 64GB SSD 02.10104> at ata2-master
 UDMA100 SATA 1.5Gb/s
acd1: <PIONEER DVD-RW DVR-212D/1.24> DVDR drive at ata3 as master
ad8: 305245MB <Hitachi HDT725032VLAT80 V54OA4NA> at ata4-master
 UDMA133 


