Date:      Tue, 09 Mar 2010 13:33:27 -0500
From:      Steve Polyack <korvus@comcast.net>
To:        Ivan Voras <ivoras@freebsd.org>
Cc:        freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: ZFS hot spares
Message-ID:  <4B969477.70706@comcast.net>
In-Reply-To: <hn56sl$kor$1@dough.gmane.org>
References:  <4B953C92.5080606@comcast.net> <hn56sl$kor$1@dough.gmane.org>

On 03/09/10 05:11, Ivan Voras wrote:
> On 03/08/10 19:06, Steve Polyack wrote:
>> ZFS in FreeBSD lacks at least one major feature from the Solaris
>> version: hot spares. There is a PR open at
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=134491, but there hasn't been
>> any motion/thoughts posted on it since its creation almost one year ago.
>>
>> I'm aware that on Solaris, hot spare replacement is handled by a few
>> Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug
>> into the Solaris FMA (Fault Management Architecture). Have there been
>> any thoughts on porting these over or getting something similar running
>> within FreeBSD? With all of the recent SATA/SAS CAM hotplug work now
>> committed, it would be nice to have a hot spare take over automatically
>> for a failed drive, which could then be hot-replaced later.
>>
>> On the other hand, I'd be interested in hearing if anyone has had
>> success in rolling their own scripted solution: i.e. something which
>> polls 'zpool status' looking for failed drives and performs hot-spare
>> replacements automatically.
>
> You don't exactly have to poll it. See /etc/devd.conf:
>
> # Sample ZFS problem reports handling.
> notify 10 {
>         match "system"          "ZFS";
>         match "type"            "zpool";
>         action "logger -p kern.err 'ZFS: failed to load zpool $pool'";
> };
>
> notify 10 {
>         match "system"          "ZFS";
>         match "type"            "vdev";
>         action "logger -p kern.err 'ZFS: vdev failure, zpool=$pool type=$type'";
> };
>
> notify 10 {
>         match "system"          "ZFS";
>         match "type"            "data";
>         action "logger -p kern.warn 'ZFS: zpool I/O failure, zpool=$pool error=$zio_err'";
> };
>
> notify 10 {
>         match "system"          "ZFS";
>         match "type"            "io";
>         action "logger -p kern.warn 'ZFS: vdev I/O failure, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size error=$zio_err'";
> };
>
> notify 10 {
>         match "system"          "ZFS";
>         match "type"            "checksum";
>         action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
> };
>
> I don't really know if these notifications actually work since I don't 
> have hot-plug test machines, but if they do, this looks like a decent 
> starting point.
>

Thanks for the suggestions.  I received a similar one from someone 
else.  If I get time to build a ZFS lab machine then I will certainly 
try these out and provide feedback on how they work.
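
For reference, here is a rough, untested-on-real-hardware sketch of the
scripted approach from my original question: parse 'zpool status' text,
find failed leaf devices and AVAIL spares, and build the matching
'zpool replace' commands. The function name, the set of states treated as
failed, and the assumed layout of the status output are my own assumptions,
not anything taken from the PR or from Solaris. It only builds the commands;
it does not run them.

```python
# Assumption: these states mean "this disk should be replaced".
FAILED_STATES = {"FAULTED", "UNAVAIL", "REMOVED"}
# Assumption: names starting with these are vdev groups, not leaf disks.
GROUP_PREFIXES = ("mirror", "raidz", "spare", "replacing")

# Hand-written sample that mimics typical 'zpool status' output.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      mirror    DEGRADED     0     0     0
        da0     ONLINE       0     0     0
        da1     UNAVAIL      0     0     0  cannot open
    spares
      da2       AVAIL

errors: No known data errors
"""


def plan_replacements(status_text):
    """Return one ['zpool', 'replace', pool, failed, spare] list per failed
    disk for which an AVAIL spare exists in the same pool."""
    pool = None
    in_config = False
    section = "data"      # "data" until a bare section line such as "spares"
    failed = {}           # pool name -> failed leaf devices
    spares = {}           # pool name -> available spares

    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            pool = fields[1]
            in_config = False
            failed.setdefault(pool, [])
            spares.setdefault(pool, [])
        elif fields[0] == "config:":
            in_config = True
            section = "data"
        elif fields[0] == "errors:":
            in_config = False
        elif in_config and pool is not None:
            if len(fields) == 1:
                # Bare word inside the config table: "spares", "logs", ...
                section = fields[0]
                continue
            name, state = fields[0], fields[1]
            if section == "spares":
                if state == "AVAIL":
                    spares[pool].append(name)
            elif section == "data":
                if name == pool or name.startswith(GROUP_PREFIXES):
                    continue
                if state in FAILED_STATES:
                    failed[pool].append(name)

    commands = []
    for name, bad_disks in failed.items():
        # Pair each failed disk with the next unused spare of that pool.
        for bad, spare in zip(bad_disks, spares[name]):
            commands.append(["zpool", "replace", name, bad, spare])
    return commands
```

Calling plan_replacements(SAMPLE_STATUS) gives the argument list for a
single replacement of da1 by da2 in pool tank. On a live system the input
would come from running 'zpool status' (for example via subprocess), and a
devd action like the ones quoted above could trigger the script instead of
a polling loop, assuming those notifications actually fire.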



