Date:      Mon, 25 May 2009 09:19:21 -0700
From:      Freddie Cash <fjwcash@gmail.com>
To:        freebsd-current@freebsd.org
Subject:   Re: ZFS panic under extreme circumstances (2/3 disks corrupted)
Message-ID:  <b269bc570905250919t5bf37b5cv6037f22eaf925154@mail.gmail.com>
In-Reply-To: <D817D098-9C36-4B72-9DCB-027CE8A7C564@exscape.org>
References:  <4E6E325D-BB18-4478-BCFD-633D6F4CFD88@exscape.org> <D98FEABB-8B8A-48E6-B021-B05816B4C699@exscape.org> <b269bc570905250839r54a0f58fo5474e9e219a222ca@mail.gmail.com> <D817D098-9C36-4B72-9DCB-027CE8A7C564@exscape.org>

On Mon, May 25, 2009 at 9:12 AM, Thomas Backman <serenity@exscape.org> wrote:
> On May 25, 2009, at 05:39 PM, Freddie Cash wrote:
>> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman <serenity@exscape.org>
>> wrote:
>>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>>
>>>> So, I was playing around with RAID-Z and self-healing...
>>>
>>> Yet another follow-up to this.
>>> It appears that all traces of errors vanish after a reboot. So, say you
>>> have a dying disk; ZFS repairs the data for you, and you don't notice
>>> (unless you check zpool status). Then you reboot, and there's NO (easy?)
>>> way that I can tell to find out that something is wrong with your hardware!
>>
>> On our storage server that was initially configured using 1 large
>> 24-drive raidz2 vdev (don't do that, by the way), we had 1 drive go
>> south.  "zpool status" was full of errors.  And the error counts
>> survived reboots.  Either that, or the drive was so bad that the error
>> counts started increasing right away after a boot.  After a week of
>> fighting with it to get the new drive to resilver and get added to the
>> vdev, we nuked it and re-created it using 3 raidz2 vdevs, each
>> comprised of 8 drives.
>>
>> (Un)fortunately, that was the only failure we've had so far, so can't
>> really confirm/deny the "error counts reset after reboot".
>
> Was this on FreeBSD?

64-bit FreeBSD 7.1 using ZFS v6.  SATA drives connected to 3Ware RAID
controllers, but configured as "Single Drive" arrays, not using
hardware RAID in any way.
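
For reference, a pool laid out the way we ended up with (3 raidz2 vdevs of
8 drives each) is created roughly like this; the device names below are
hypothetical, just to show the grouping:

    # zpool create tank \
        raidz2 da0  da1  da2  da3  da4  da5  da6  da7  \
        raidz2 da8  da9  da10 da11 da12 da13 da14 da15 \
        raidz2 da16 da17 da18 da19 da20 da21 da22 da23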

> I have another unfortunate thing to note regarding this: after a reboot,
> it's even impossible to tell *which disk* has gone bad, even if the pool
> is "uncleared" but otherwise "healed". It simply says that a device has
> failed, with no clue as to which one, since they're all "ONLINE"!

Even when using -v?  zpool status -v
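
If the drive is still throwing errors, a scrub should make them reappear in
the per-device READ/WRITE/CKSUM columns, e.g. (pool name is only an example):

    # zpool scrub tank
    # zpool status -v tank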

-- 
Freddie Cash
fjwcash@gmail.com


