Date:      Fri, 12 Sep 2008 08:33:27 -0700
From:      Freddie Cash <fjwcash@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Message-ID:  <200809120833.28233.fjwcash@gmail.com>
In-Reply-To: <C984A6E7B1C6657CD8C4F79E@Slim64.dmpriest.net.uk>
References:  <C984A6E7B1C6657CD8C4F79E@Slim64.dmpriest.net.uk>

On September 12, 2008 02:45 am Karl Pielorz wrote:
> Recently, a ZFS pool on my FreeBSD box started showing lots of errors
> on one drive in a mirrored pair.
>
> The pool consists of around 14 drives (as 7 mirrored pairs), hung off
> of a couple of SuperMicro 8 port SATA controllers (1 drive of each pair
> is on each controller).
>
> One of the drives started picking up a lot of errors (by the end of
> things it was returning errors pretty much for any reads/writes issued)
> - and taking ages to complete the I/O's.
>
> However, ZFS kept trying to use the drive - e.g. as I attached another
> drive to the remaining 'good' drive in the mirrored pair, ZFS was still
> trying to read data off the failed drive (and remaining good one) in
> order to complete its re-silver to the newly attached drive.

The one time I've had a drive fail, and the three times I've swapped
drives for larger ones, I used this process:

  zpool offline <pool> <old device>
  <remove old device>
  <insert new device>
  zpool replace <pool> <old device> <new device>

One machine had to be shut down after the offline step, as it doesn't have
hot-swappable drive bays.  On the other machine, everything was done while
it was online and running.

In other words, once it was offlined, the old device never had a chance to
interfere with anything.  This is the same process we've used with hardware
RAID setups in the past.
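
As a rough illustration only, the sequence above could be wrapped in a small
sh function.  The pool and device names in the usage line (tank, da3, da9),
the function names and the DRY_RUN switch are all made up for this sketch;
by default it only prints the commands it would run, and it assumes
hot-swappable bays, since it cannot survive a power-down between the two
zpool steps.

```shell
#!/bin/sh
# Dry-run sketch of the offline/replace sequence.
# Nothing touches a real pool unless DRY_RUN=0 is set explicitly.

DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_drive() {
    pool="$1"; old_dev="$2"; new_dev="$3"

    # Stop ZFS issuing I/O to the old device before touching the hardware.
    run zpool offline "$pool" "$old_dev"

    # The physical swap happens between the two zpool commands.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ (swap hardware: remove $old_dev, insert $new_dev)"
    else
        printf 'Swap %s for %s, then press Enter: ' "$old_dev" "$new_dev"
        read -r answer
    fi

    # Resilver onto the new device, then show the pool state.
    run zpool replace "$pool" "$old_dev" "$new_dev"
    run zpool status "$pool"
}
```

Calling "replace_drive tank da3 da9" with the default DRY_RUN=1 just lists
the four steps in order, which is a cheap way to check the device names
before doing it for real.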

> Is there anything similar to this on FreeBSD yet? - i.e. Does/can
> anything on the system tell ZFS "This drive's experiencing failures"
> rather than ZFS just seeing lots of timed out I/O 'errors'? (as appears
> to be the case).

Beyond the periodic script that checks for problems like this and e-mails
root, I haven't seen anything.
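
For anyone who wants to roll their own, a check of that kind might look
roughly like the sketch below.  It is not the stock script; the function
names, the MAILTO variable and the subject line are invented here.  It
relies on "zpool status -x" printing "all pools are healthy" when nothing
is wrong.  Note that it only reports what ZFS has already noticed; it does
not tell ZFS that a drive is failing the way FMA does.

```shell
#!/bin/sh
# Sketch of a pool health check meant to be run from cron or periodic.
# MAILTO and the mail subject are illustrative choices.

MAILTO="${MAILTO:-root}"

# Returns 0 (true) when the given "zpool status -x" output reports a
# problem, i.e. when it is anything other than the all-healthy message.
pools_need_attention() {
    case "$1" in
        "all pools are healthy") return 1 ;;
        *) return 0 ;;
    esac
}

check_pools() {
    # -x limits the output to pools with errors or unavailable devices.
    status="$(zpool status -x 2>&1)"
    if pools_need_attention "$status"; then
        printf '%s\n' "$status" |
            mail -s "ZFS pool problem on $(hostname)" "$MAILTO"
    fi
}
```

Put a call to check_pools at the end of the script and run it as often as
you like from cron.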

-- 
Freddie Cash
fjwcash@gmail.com