Date:      Tue, 25 Sep 2007 18:16:40 -0400
From:      Yarema <yds@CoolRat.org>
To:        Søren Schmidt <sos@deepcore.dk>
Cc:        NYCBUG Talk <talk@lists.nycbug.org>, freebsd-stable@freebsd.org
Subject:   Re: FreeBSD PseudoRAID RAID0 array broken on atapci1: <Intel ICH5 SATA150 controller>
Message-ID:  <5AB37930CB158A943D586344@[192.168.1.72]>
In-Reply-To: <46F8AF60.4020709@deepcore.dk>
References:  <866CEC2FB789142D3C0AAFCB@[192.168.1.72]> <46F8AF60.4020709@deepcore.dk>

--On Tuesday, September 25, 2007 8:49 AM +0200 Søren Schmidt
<sos@deepcore.dk> wrote:

> Yarema wrote:
>> Hi, I need some help recovering from this.  First some back story.
>> Running 6.2-STABLE i386 from Sep 17, 2007.  My /home slice is mounted
>> from /dev/ar0s1e where the relevant kernel messages look like so when
>> all is good:
>>
>> atapci1: <Intel ICH5 SATA150 controller>
>> ata2: <ATA channel 0> on atapci1
>> ata3: <ATA channel 1> on atapci1
>> ad4: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata2-master SATA150
>> ad6: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata3-master SATA150
>> ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY
>> ar0: disk0 READY using ad4 at ata2-master
>> ar0: disk1 READY using ad6 at ata3-master
>>
>> Today this server crashed with the following logged:
>>
>> ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=144888320
>> ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=143390319
>> ad4: FAILURE - device detached
>> ar0: FAILURE - RAID0 array broken
>> subdisk4: detached
>> ad4: detached
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=146831867904, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147024330752, length=16384)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
>>
>> Now the kernel messages read:
>>
>> ar0: FAILURE - RAID0 array broken
>> ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
>> ar0: disk0 READY using ad4 at ata2-master
>> ar0: disk1 DOWN no device found for this subdisk
>> ar1: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
>> ar1: disk0 DOWN no device found for this subdisk
>> ar1: disk1 READY using ad6 at ata3-master
>>
>> For some reason the second disk in the array shows up as ar1 instead
>> of being part of ar0.  I suspect there's gotta be some way to force
>> the two drives to show up as part of the same array by perhaps editing
>> the PseudoRAID metadata on disk without putting any of the UFS2 data
>> in "jeopardy".  Any pointers on where to start poking around for the
>> relevant metadata structures on disk or what to search for?  I figure
>> if I can dd the metadata off the disks, tweak a field or two and then
>> dd the whole mess back I stand a chance of either hosing the array
>> irrevocably or getting it all back. ;)  Or maybe atacontrol could be
>> used to re-create the metadata without destroying the UFS2 on the
>> array?  I have a coredump of the kernel from this crash if that helps
>> analyze things any.
>>
>
> The solution to getting the array back is to "atacontrol delete ar0"
> "atacontrol delete ar1" "atacontrol create stripe 512 ad4 ad6" and
> the array is reborn.
>  However your filesystems might be just a bunch of bits depending
> on how much of the failed write that made it in there, you get the
> (missing) protection you asked for using RAID0....

Søren,

Thank you for your prompt and helpful reply.  I'm running into a new
situation with atacontrol:

% atacontrol create RAID0 512 ad4 ad6
ar0: 763108MB <Intel MatrixRAID RAID0 (stripe 128 KB)> status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Note that the original RAID0 which broke was
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY

Now atacontrol will not create FreeBSD PseudoRAID metadata with a 256KB
stripe, but insists on creating Intel MatrixRAID metadata with a 128KB
stripe.  This is on a non-R version of the ICH5 southbridge.  So there's no
way to enable/disable the Intel MatrixRAID from the BIOS.  Nor is there any
way to change the stripe size in the BIOS since there is no Intel
MatrixRAID BIOS on this motherboard.  The computer in question is a Dell
SC400 with an Intel OEM motherboard which has a very limited BIOS Setup
interface typical of Intel/Dell.
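
To illustrate why the stripe size matters for getting the data back, here
is a minimal sketch of how a generic two-disk RAID0 maps a logical sector to
a member disk.  It assumes 512-byte sectors and a plain round-robin layout
starting at sector 0 of each disk; the function name raid0_locate is
hypothetical, and the real ata(4) layout may reserve space for metadata or
differ in other details.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def raid0_locate(lba, stripe_kb, ndisks=2):
    """Map a logical sector to (disk index, sector on that disk).

    Generic round-robin RAID0 layout; not taken from the ata(4) driver.
    """
    stripe_sectors = stripe_kb * 1024 // SECTOR_BYTES
    # Which stripe the sector falls in, and its offset inside that stripe.
    stripe_no, within = divmod(lba, stripe_sectors)
    # Stripes are dealt out to the disks in turn.
    disk = stripe_no % ndisks
    disk_lba = (stripe_no // ndisks) * stripe_sectors + within
    return disk, disk_lba


if __name__ == "__main__":
    lba = 256  # logical sector at byte offset 128 KB
    print("256 KB stripe:", raid0_locate(lba, 256))
    print("128 KB stripe:", raid0_locate(lba, 128))
```

Under these assumptions the same logical sector lands on a different disk
depending on the stripe size, so an array re-created with a 128KB stripe
would present the existing UFS2 data in the wrong order even though nothing
on the disks was overwritten.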

Is there any way to force atacontrol to create FreeBSD PseudoRAID metadata?
Perhaps using an older FreeSBIE release based on FreeBSD 6.0 since IIRC I
created this RAID0 back when 6.0 was CURRENT.

-- 
Yarema


