From: Yarema
To: Søren Schmidt
Cc: NYCBUG Talk, freebsd-stable@freebsd.org
Date: Tue, 25 Sep 2007 18:16:40 -0400
Subject: Re: FreeBSD PseudoRAID RAID0 array broken on atapci1:

--On Tuesday, September 25, 2007 8:49 AM +0200 Søren Schmidt wrote:

> Yarema wrote:
>> Hi, I need some help recovering from this. First some back story.
>> Running 6.2-STABLE i386 from Sep 17, 2007.
>> My /home slice is mounted from /dev/ar0s1e, where the relevant kernel
>> messages look like so when all is good:
>>
>> atapci1:
>> ata2: on atapci1
>> ata3: on atapci1
>> ad4: 381554MB at ata2-master SATA150
>> ad6: 381554MB at ata3-master SATA150
>> ar0: 763108MB status: READY
>> ar0: disk0 READY using ad4 at ata2-master
>> ar0: disk1 READY using ad6 at ata3-master
>>
>> Today this server crashed with the following logged:
>>
>> ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=144888320
>> ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=143390319
>> ad4: FAILURE - device detached
>> ar0: FAILURE - RAID0 array broken
>> subdisk4: detached
>> ad4: detached
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=146831867904, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147024330752, length=16384)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
>>
>> Now the kernel messages read:
>>
>> ar0: FAILURE - RAID0 array broken
>> ar0: 763108MB status: BROKEN
>> ar0: disk0 READY using ad4 at ata2-master
>> ar0: disk1 DOWN no device found for this subdisk
>> ar1: 763108MB status: BROKEN
>> ar1: disk0 DOWN no device found for this subdisk
>> ar1: disk1 READY using ad6 at ata3-master
>>
>> For some reason the second disk in the array shows up as ar1 instead
>> of being part of ar0. I suspect there's gotta be some way to force
>> the two drives to show up as part of the same array, perhaps by editing
>> the PseudoRAID metadata on disk without putting any of the UFS2 data
>> in "jeopardy". Any pointers on where to start poking around for the
>> relevant metadata structures on disk, or what to search for? I figure
>> if I can dd the metadata off the disks, tweak a field or two and then
>> dd the whole mess back, I stand a chance of either hosing the array
>> irrevocably or getting it all back. ;) Or maybe atacontrol could be
>> used to re-create the metadata without destroying the UFS2 on the
>> array? I have a coredump of the kernel from this crash if that helps
>> analyze things any.
>>
>
> The solution to getting the array back is to "atacontrol delete ar0"
> "atacontrol delete ar1" "atacontrol create stripe 512 ad4 ad6" and
> the array is reborn.
> However your filesystems might be just a bunch of bits depending
> on how much of the failed write that made it in there, you get the
> (missing) protection you asked for using RAID0....

Søren,

Thank you for your prompt and helpful reply. I'm running into a new
situation with atacontrol:

% atacontrol create RAID0 512 ad4 ad6
ar0: 763108MB status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Note that the original RAID0 which broke was

ar0: 763108MB status: READY

Now atacontrol will not create FreeBSD PseudoRAID metadata with a 256KB
stripe, but insists on creating Intel MatrixRAID metadata with a 128KB
stripe. This is on a non-R version of the ICH5 southbridge, so there's no
way to enable or disable the Intel MatrixRAID from the BIOS. Nor is there
any way to change the stripe size in the BIOS, since there is no Intel
MatrixRAID BIOS on this motherboard. The computer in question is a Dell
SC400 with an Intel OEM motherboard, which has the very limited BIOS Setup
interface typical of Intel/Dell.

Is there any way to force atacontrol to create FreeBSD PseudoRAID metadata?
Perhaps by using an older FreeSBIE release based on FreeBSD 6.0, since IIRC
I created this RAID0 back when 6.0 was CURRENT?

-- 
Yarema
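Why the 256KB vs 128KB stripe matters: in a RAID0, a byte offset o in the array falls in stripe s = o / S (S being the stripe size in bytes), and that stripe lives on disk s mod N at disk offset (s / N) * S + (o mod S), with N member disks. Re-creating the array with a different S therefore presents the same disk sectors in a different order. The POSIX sh sketch below illustrates this under stated assumptions: a plain two-disk stripe whose data starts at offset 0 on each disk (any space the real PseudoRAID or MatrixRAID formats reserve for metadata or alignment is ignored), and the 512 passed to atacontrol being an interleave counted in 512-byte sectors, which gives the 256KB stripe mentioned above. The function name raid0_map is hypothetical, not part of atacontrol or FreeBSD.

```shell
#!/bin/sh
# Hypothetical helper: map a byte offset in a RAID0 array to
# "disk-index disk-offset". Assumes data starts at offset 0 on every
# member disk and ignores any on-disk metadata reservation.
# Usage: raid0_map OFFSET STRIPE_BYTES NDISKS
raid0_map() {
    r0_off=$1
    r0_stripe=$2
    r0_ndisks=$3
    r0_idx=$((r0_off / r0_stripe))     # which stripe the offset falls in
    r0_disk=$((r0_idx % r0_ndisks))    # stripes rotate across the disks
    # full stripe rows already on this disk, plus the offset inside the stripe
    r0_doff=$(( (r0_idx / r0_ndisks) * r0_stripe + r0_off % r0_stripe ))
    echo "$r0_disk $r0_doff"
}

# Assumption: the interleave given to atacontrol counts 512-byte sectors.
sectors=512
stripe_256k=$((sectors * 512))   # 262144 bytes = 256KB
stripe_128k=131072               # 128KB, the stripe atacontrol produced here

# The same array offsets land in different places under the two layouts.
for off in 131072 262144; do
    echo "offset $off: 256KB -> $(raid0_map "$off" "$stripe_256k" 2), 128KB -> $(raid0_map "$off" "$stripe_128k" 2)"
done
```

Under these assumptions, the second 128KB of the array sits on the first disk (disk 0, offset 131072) with a 256KB stripe, but is read from the start of the second disk (disk 1, offset 0) with a 128KB stripe, so UFS2 would see its blocks shuffled unless the new metadata reproduces the original stripe size.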