From owner-freebsd-fs@FreeBSD.ORG  Thu Nov  1 09:29:01 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 8D1EC60B
 for <freebsd-fs@freebsd.org>; Thu,  1 Nov 2012 09:29:01 +0000 (UTC)
 (envelope-from paul-freebsd@fletchermoorland.co.uk)
Received: from hercules.mthelicon.com (hercules.mthelicon.com [66.90.118.40])
 by mx1.freebsd.org (Postfix) with ESMTP id 510BF8FC0A
 for <freebsd-fs@freebsd.org>; Thu,  1 Nov 2012 09:29:00 +0000 (UTC)
Received: from demophon.fletchermoorland.co.uk (hydra.fletchermoorland.co.uk
 [78.33.209.59] (may be forged)) (authenticated bits=0)
 by hercules.mthelicon.com (8.14.5/8.14.5) with ESMTP id qA19SqYX011127;
 Thu, 1 Nov 2012 09:28:53 GMT
 (envelope-from paul-freebsd@fletchermoorland.co.uk)
Message-ID: <509240D3.7070607@fletchermoorland.co.uk>
Date: Thu, 01 Nov 2012 09:28:51 +0000
From: Paul Wootton <paul-freebsd@fletchermoorland.co.uk>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:12.0) Gecko/20120530 Thunderbird/12.0.1
MIME-Version: 1.0
To: Zaphod Beeblebrox <zbeeble@gmail.com>
Subject: Re: ZFS RaidZ-2 problems
References: <508F98F9.3040604@fletchermoorland.co.uk>
 <1351598684.88435.19.camel@btw.pki2.com>
 <508FE643.4090107@fletchermoorland.co.uk>
 <op.wmz1vtrd8527sy@ronaldradial.versatec.local>
 <5090010A.4050109@fletchermoorland.co.uk>
 <op.wm1axoqv8527sy@ronaldradial.versatec.local>
 <CACpH0MeJpSg3ti-QUgT=XwaC0jkEo5JeBAfRGPTFfUE6eLJFJg@mail.gmail.com>
In-Reply-To: <CACpH0MeJpSg3ti-QUgT=XwaC0jkEo5JeBAfRGPTFfUE6eLJFJg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Nov 2012 09:29:01 -0000

On 10/31/12 17:58, Zaphod Beeblebrox wrote:
> I'd start off by saying "smart is your friend."  Install smartmontools
> and study the somewhat opaque "smartctl -a /dev/mydisk" output
> carefully.  Try running a short and/or long test, too.  Many times the
> disk can tell you what the problem is.  If too many blocks are being
> replaced, your drive is dying.  If the drive sees errors in commands
> it receives, the cable or the controller are at fault.   ZFS itself
> does _exceptionally_ well at trying to use what it has.

I already run smartmontools regularly. I do have one drive that is 
starting to go bad.
The drive that keeps disconnecting actually looks OK on SMART (when it's 
connected).

I normally also run a periodic scrub every few days (I've been caught out 
a few times before).

> I'll also say that bad power supplies make for bad disks.  Replacing a
> power supply has often been the solution to bad disk problems I've
> had.  Disks are sensitive to under voltage problems.  Brown-outs can
> exacerbate this problem.  My parents live out where power is very
> flaky.  Cheap UPSs didn't help much ... but a good power supply can
> make all the difference.
Maybe... I will not rule out a bad power supply.
> But I've also had bad controllers of late, too.  My most recent
> problem had my 9-disk raidZ1 array loose a disk.  Smartctl said that
> it was loosing blocks fast, so I RMA'd the disk.  When the new disk
> came, the array just wouldn't heal... it kept loosing the disks
> attached to a certain controller.  Now it's possible the controller
> was bad before the disk had died ... or that it died during the first
> attempt at resilver ... or that FreeBSD drivers don't like it anymore
> ... I don't know.
>
> My solution was to get two more 4 drive "pro box" SATA enclosures.
> They use a 1-to-4 SATA breakout and the 6 motherboard ports I have are
> a revision of the ICH11 intel chipset that supports SATA port
> replication (I already had two of these boxes).  In this manner I
> could remove the defective controller and put all disks onto the
> motherboard ICH11 (it actually also allowed me to later expand the
> array... but that's not part of this story).
Again maybe... It might be a controller or cable. It could actually be 
the drive.


I am not worried about the hardware side. I can replace the disks, 
cables, controllers and power supply without any problems.

As I said before, the issue is that I have a 9-disk RAIDZ-2 pack with 
only 1 disk showing as offline, yet the pack is showing as faulted.
If the power supply were bouncing and a drive were giving bad data, I 
would expect ZFS to report 2 drives as faulted (1 offline and 1 
corrupt).
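
To spell out the expectation, here is a rough sketch in Python. It is 
only a simplified model of the rule that a RAIDZ-N vdev should survive 
up to N unavailable disks; the function name and the three states are 
chosen for illustration, and this is not how ZFS actually decides vdev 
state.

```python
# Simplified model (an assumption for illustration, not ZFS's real logic):
# a RAIDZ-N vdev stays usable while no more than N member disks are
# unavailable.

def expected_vdev_state(parity: int, unavailable: int) -> str:
    """Return the state one would expect for a RAIDZ vdev."""
    if unavailable < 0 or parity < 1:
        raise ValueError("parity must be >= 1 and unavailable >= 0")
    if unavailable == 0:
        return "ONLINE"
    if unavailable <= parity:
        return "DEGRADED"
    return "FAULTED"

# The case in this thread: RAIDZ-2 (parity 2) with 1 disk offline.
print(expected_vdev_state(parity=2, unavailable=1))
```

Under this model, the pack should only be faulted if a third disk were 
also unavailable or returning bad data, which is not what is being 
reported.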

Is there a way with ZDB to see why the pool is showing as faulted? Can 
it tell me which drives it thinks are bad, or which have bad data?


Paul