Date:      Mon, 28 Sep 2009 11:29:19 -0600
From:      Kurt Touet <ktouet@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS - Unable to offline drive in raidz1 based pool
Message-ID:  <2a5e326f0909281029p17334ceeoff4bb3e7adeb5cef@mail.gmail.com>
In-Reply-To: <2a5e326f0909201500w1513aeb5ra644f1c748e22f34@mail.gmail.com>
References:  <2a5e326f0909201500w1513aeb5ra644f1c748e22f34@mail.gmail.com>

I've run into a similar situation again: my zfs raidz1 array reports
itself as healthy when it isn't.  This, again, followed some drive
spin_retry_count errors (and a power cycle when shutdown -h would not
complete).  The pattern goes as follows:

1) A hard drive in the zfs array (for whatever reason) repeatedly
times out, in this case generating spin_retry_count errors in the
SMART status.
2) The box is semi-frozen because it cannot deal with activity on the
zfs array, so shutdown -h now never completes gracefully.
3) The box is power cycled.
4) Everything spins up fine on the box, and the array is accessible again.
5) zpool status shows the array as ONLINE with no degraded status.
6) zpool scrub finds the drives out of sync and resilvers a couple of them.
7) Presumably, everything is fine afterwards.

monolith# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
        spares
          ad22      AVAIL

errors: No known data errors
monolith# zpool scrub storage
monolith# zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 28 11:17:05 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.17M resilvered
            ad6     ONLINE       0     0     0  1.50K resilvered
            ad12    ONLINE       0     0     0  2K resilvered
            ad4     ONLINE       0     0     0  2K resilvered
        spares
          ad22      AVAIL

errors: No known data errors
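
For anyone wanting to catch this automatically after a scrub, here is a
minimal sketch (not an official tool) that pulls the pool state and the
per-device "resilvered" notes out of zpool status text.  The function
names resilvered_devices and pool_state are my own, and the regular
expression assumes the column layout shown in the output above; other
zpool versions may format it differently.

```python
import re

# Assumed layout (taken from the transcript above):
#   NAME  STATE  READ  WRITE  CKSUM  [<amount> resilvered]
# Only device lines that end in "<amount> resilvered" match.
RESILVER_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)"
    r"\s+\S+\s+\S+\s+\S+"              # READ, WRITE, CKSUM counters
    r"\s+(?P<amount>\S+)\s+resilvered\s*$"
)


def resilvered_devices(status_text):
    """Return {device: amount} for each device reporting resilvered data."""
    found = {}
    for line in status_text.splitlines():
        match = RESILVER_LINE.match(line)
        if match:
            found[match.group("name")] = match.group("amount")
    return found


def pool_state(status_text):
    """Return the value of the first 'state:' line, or None if absent."""
    match = re.search(r"^\s*state:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


# The post-scrub output from above, pasted in as sample input.
SAMPLE = """\
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 28 11:17:05 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.17M resilvered
            ad6     ONLINE       0     0     0  1.50K resilvered
            ad12    ONLINE       0     0     0  2K resilvered
            ad4     ONLINE       0     0     0  2K resilvered
        spares
          ad22      AVAIL

errors: No known data errors
"""

if __name__ == "__main__":
    print("state:", pool_state(SAMPLE))
    for device, amount in resilvered_devices(SAMPLE).items():
        print(device, amount)
```

A pool that says ONLINE while this returns a non-empty result right
after a scrub is exactly the case I'm describing: the state line alone
doesn't tell you the drives were out of sync.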


So my question still stands: how does zfs, upon scrubbing, instantly
know that the drives need to be resilvered (it completes in a few
seconds), when it previously declared the array to be fine with no
known data errors?

Cheers,
-kurt
