Date:      Thu, 18 Jul 2013 08:29:14 +0100
From:      Dr Josef Karthauser <joe@karthauser.co.uk>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Cc:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
Message-ID:  <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk>
References:  <20130716225013.1C63B23A@babel.karthauser.co.uk>

Hi there,

I'm scratching my head. I've just migrated to a Supermicro chassis and
at the same time gone from FreeBSD 9.0 to 9.1-RELEASE.

The machine in question is running a ZFS mirror configuration on two ada
devices (with an 8 GB gmirror carved out for swap).

Since doing so I've been having strange dropouts on the drives; they
just disappear from the bus like so:

(ada2:ahcich2:0:0:0): removing device entry
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
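For anyone reading the log above, here is a minimal Python sketch of how
the two hex bytes map onto the flag names printed next to them. The bit
tables are the standard ATA status and error register layouts using the
names the kernel prints; the function names are hypothetical, chosen for
illustration only, and are not part of any FreeBSD tool.

```python
# Hypothetical helpers: decode the ATA status/error bytes from the log.
# Bit names follow the labels shown in the kernel messages above.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "SERV"),  # service
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x01, "ERR"),   # error; details are in the error register
]

ATA_ERROR_BITS = [
    (0x80, "ICRC"),  # interface CRC error
    (0x40, "UNC"),   # uncorrectable data
    (0x20, "MC"),    # media changed
    (0x10, "IDNF"),  # ID not found
    (0x08, "MCR"),   # media change request
    (0x04, "ABRT"),  # command aborted
    (0x02, "NM"),    # no media
    (0x01, "ILI"),   # illegal length indication
]


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_ata_status(status):
    return decode_bits(status, ATA_STATUS_BITS)


def decode_ata_error(error):
    return decode_bits(error, ATA_ERROR_BITS)


if __name__ == "__main__":
    print("status d1:", " ".join(decode_ata_status(0xD1)))
    print("error  04:", " ".join(decode_ata_error(0x04)))
```

Decoding d1 and 04 this way gives the same flags as the log lines: the
device reports busy and error at once, and the NOP command was aborted.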


At first I thought it was a failing drive: one of the drives did this,
and I limped along on a single drive for a week until I could get someone
up to the rack to plug a third drive in. We resilvered the zpool onto the
new device and ran with the failed drive still plugged in (but not
responding to a reset on the ada bus with camcontrol) for a week or so.

Then the new drive dropped out in exactly the same way, followed in
short order by the remaining original drive!

After rebooting the machine and observing all three drives probing and
available, I resilvered the gmirror and zpool again onto the two devices
that I thought were reliable, but before the resilvering had completed
the new drive dropped out again.

I'm scratching my head now. I can't imagine that it's a wiring problem,
as they are all on individual SATA buses and individually cabled.

SMART isn't reporting any drive issues either... :/

So I'm wondering: is it a driver issue with 9.1-RELEASE? If I upgrade
to 9-RELENG, would I expect that to resolve the problem? (Have there
been any ada bus issues reported since last December?)

The hardware in question is:

ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD1000FYPS-01ZKB0 02.01B01> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD1000FYPS-01ZKB0 02.01B01> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD1000FYPS-01ZKB0 02.01B01> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8


Any ideas would be greatly welcomed.

Thanks,
Joe



