From: Paul Wootton
Date: Thu, 01 Nov 2012 09:28:51 +0000
To: Zaphod Beeblebrox
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RaidZ-2 problems
Message-ID: <509240D3.7070607@fletchermoorland.co.uk>

On 10/31/12 17:58, Zaphod Beeblebrox wrote:
> I'd start off by saying "smart is your friend." Install smartmontools
> and study the somewhat opaque "smartctl -a /dev/mydisk" output
> carefully. Try running a short and/or long test, too. Many times the
> disk can tell you what the problem is. If too many blocks are being
> replaced, your drive is dying.
> If the drive sees errors in commands
> it receives, the cable or the controller is at fault. ZFS itself
> does _exceptionally_ well at trying to use what it has.

I already run smartmontools regularly. I do have one drive that is starting to go bad. The drive that keeps disconnecting actually looks OK on SMART (when it's connected). I also normally run a periodic scrub every few days (I've been caught out a few times before).

> I'll also say that bad power supplies make for bad disks. Replacing a
> power supply has often been the solution to bad disk problems I've
> had. Disks are sensitive to under voltage problems. Brown-outs can
> exacerbate this problem. My parents live out where power is very
> flaky. Cheap UPSs didn't help much ... but a good power supply can
> make all the difference.

Maybe... I will not rule out a bad power supply.

> But I've also had bad controllers of late, too. My most recent
> problem had my 9-disk raidZ1 array lose a disk. Smartctl said that
> it was losing blocks fast, so I RMA'd the disk. When the new disk
> came, the array just wouldn't heal... it kept losing the disks
> attached to a certain controller. Now it's possible the controller
> was bad before the disk had died ... or that it died during the first
> attempt at resilver ... or that FreeBSD drivers don't like it anymore
> ... I don't know.
>
> My solution was to get two more 4 drive "pro box" SATA enclosures.
> They use a 1-to-4 SATA breakout and the 6 motherboard ports I have are
> a revision of the ICH11 Intel chipset that supports SATA port
> replication (I already had two of these boxes). In this manner I
> could remove the defective controller and put all disks onto the
> motherboard ICH11 (it actually also allowed me to later expand the
> array... but that's not part of this story).

Again maybe... It might be a controller or cable. It could actually be the drive. I am not worried about the hardware side.
I can replace the disks, cables, controllers and power supply without any problems.

As I said before, the issue I have is that I have a 9-disk RAIDZ-2 pack with only 1 disk showing as offline, yet the pack is showing as faulted. If the power supply was bouncing and a drive was giving bad data, I would expect ZFS to report that 2 drives were faulted (1 offline and 1 corrupt).

Is there a way with ZDB that I can see why the pool is showing as faulted? Can it tell me which drives it thinks are bad, or have bad data?

Paul
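
To make the expectation above concrete, here is a minimal Python sketch of the redundancy arithmetic. It is a simplified model and not how ZFS actually computes vdev state: the function name raidz_vdev_state and its parameters are hypothetical, and the only assumption is that a RAIDZ-N vdev tolerates up to N unavailable member disks.

```python
def raidz_vdev_state(parity: int, unavailable: int) -> str:
    """Return the expected state of a RAIDZ vdev under a simplified model.

    Assumption (not taken from the ZFS source): the vdev stays usable as
    long as the number of unavailable member disks does not exceed its
    parity level (1 for RAIDZ-1, 2 for RAIDZ-2, 3 for RAIDZ-3).
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity must be 1, 2 or 3")
    if unavailable < 0:
        raise ValueError("unavailable disk count cannot be negative")

    if unavailable == 0:
        return "ONLINE"      # every member disk is present
    if unavailable <= parity:
        return "DEGRADED"    # data is still reconstructible from parity
    return "FAULTED"         # more disks lost than parity can cover


if __name__ == "__main__":
    # The situation described in this mail: RAIDZ-2 with one disk offline.
    print(raidz_vdev_state(parity=2, unavailable=1))
```

Under this model a RAIDZ-2 vdev with a single offline disk should only be degraded, and more than two disks would have to be unavailable or returning bad data before it faulted. That mismatch with the reported state is what the ZDB question is trying to explain.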