Date:      Wed, 25 May 2011 10:55:26 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        "Vladislav V. Prodan" <universite@ukr.net>
Cc:        fs@freebsd.org
Subject:   Re: how to import raidz2, if only one disk is missing?
Message-ID:  <20110525175526.GA45398@icarus.home.lan>
In-Reply-To: <4DDD0516.4060000@ukr.net>
References:  <4DDC0D13.3030401@ukr.net> <20110524201118.GF2415@garage.freebsd.pl> <4DDC128F.80203@ukr.net> <20110525025831.GA2363@DataIX.net> <4DDD0516.4060000@ukr.net>

On Wed, May 25, 2011 at 04:33:10PM +0300, Vladislav V. Prodan wrote:
> 25.05.2011 5:58, Jason Hellenthal wrote:
> >
> >Vladislav,
> >
> >Hi. Just a heads up on this instead of waiting for the MFC to happen you
> >may want to boot mfsBSD from Martin Matuska [1][2] to check your disks
> >ahead of time and diagnose whether it is worthwhile waiting for the MFC.
> 
> Thank you for reminding me about this rescue-CD.
> 
> I received a broken pool raidz2.
> 
> 1) Ran smartctl on all 6 disks. It showed that three of the disks
> have a problem with this counter:
> 184 End-to-End_Error 0x0033 001 001 099 Pre-fail Always FAILING_NOW 149

Without knowing the exact device model of the disk ("Device Model:"),
and whether or not the disk is in smartmontools' internal drive database
("Device is:"), this attribute may or may not actually be
End-to-End_Error.  It would be helpful if you could provide both.  Are
these HP drives, by any chance?
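
For readers who want to pull those two fields out of saved "smartctl -i"
output, here is a minimal sketch.  The function names and the sample
text are hypothetical (the model string is a placeholder, not the
poster's drive); only the "Device Model:" and "Device is:" labels come
from smartctl itself.

```python
def identity_fields(info_text):
    """Return the 'Device Model' and 'Device is' fields from smartctl -i text."""
    wanted = ("Device Model", "Device is")
    found = {}
    for line in info_text.splitlines():
        # Split on the first colon only; the value may contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found


def in_drive_database(fields):
    """True if smartctl reported the drive as present in its drive database."""
    return fields.get("Device is", "").startswith("In smartctl database")


# Hypothetical sample; real output has many more lines.
SAMPLE_INFO = """\
Device Model:     EXAMPLE MODEL 123
Device is:        Not in smartctl database [for details use: -P showall]
"""

if __name__ == "__main__":
    fields = identity_fields(SAMPLE_INFO)
    print(fields.get("Device Model"), in_drive_database(fields))
```

If the drive is not in the database, smartctl falls back to generic
attribute names, which is why the label on attribute 184 may not be
trustworthy.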

Assuming these are HP drives: end-to-end error indicates, more or less,
bad cache on the drive itself.  HP implemented a parity check on every
512 bytes read/written from/to the drive's cache.  There's no error
correction used (to my knowledge), and failures are reported back to the
(host) controller in some manner.  HP does document that "in some
situations" (on reads) the drive can re-read the block and re-write it
into the cache, in hopes that a subsequent read will work.  In that
situation I imagine the attribute would be incremented but no hard
failure (ATA error, etc.) would be reported.
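
As an aside on reading the quoted attribute row itself: the columns in
"smartctl -A" output are ID, name, flag, normalized value, worst,
threshold, type, updated, when-failed and raw value.  The sketch below
splits the row from the quoted message into those fields; the helper
names (SmartAttr, parse_attr, is_failing) are made up for illustration.
It assumes the usual smartmontools rule that an attribute has failed
when its normalized value is at or below the threshold.

```python
from collections import namedtuple

SmartAttr = namedtuple(
    "SmartAttr",
    "id name flag value worst thresh type updated when_failed raw",
)


def parse_attr(line):
    """Split one 'smartctl -A' attribute row into its ten columns."""
    parts = line.split(None, 9)  # raw value is last and may contain spaces
    if len(parts) != 10:
        raise ValueError("not a SMART attribute row: %r" % line)
    return SmartAttr(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9],
    )


def is_failing(attr):
    """Normalized value at or below the threshold means the attribute failed."""
    return attr.value <= attr.thresh


# The row quoted in the original message.
ROW = "184 End-to-End_Error 0x0033 001 001 099 Pre-fail Always FAILING_NOW 149"

if __name__ == "__main__":
    attr = parse_attr(ROW)
    print(attr.name, attr.value, attr.thresh, attr.raw, is_failing(attr))
```

For the quoted row, the normalized value (1) is far below the threshold
(99), which is consistent with the FAILING_NOW flag; the raw value (149)
is vendor-specific and only meaningful if the attribute really is
End-to-End_Error on that drive model.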

Are you absolutely certain you haven't seen a single error on your
FreeBSD console (or in /var/log/messages, etc.) since these drives were
put into use?  Were these brand new drives or previously used?

(Footnote for readers: this SMART attribute shouldn't be confused with
attribute 199 (CRC errors), which indicates communication failures
between the two controllers (the controller in the host, and the
controller on the drive PCB) and is often an indicator of bad cabling, a
bad hot-swap backplane, a dusty/dirty SATA port, etc.)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |