Date:      Wed, 23 Dec 2009 16:36:00 -0800
From:      Steven Schlansker <stevenschlansker@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: Can't repair raidz2 (Cannot replace a replacing device)
Message-ID:  <9CEE3EE5-2CF7-440E-B5F4-D2BD796EA55C@gmail.com>
In-Reply-To: <5da0588e0912221741r48395defnd11e34728d2b7b97@mail.gmail.com>
References:  <048AF210-8B9A-40EF-B970-E8794EC66B2F@gmail.com> <4B315320.5050504@quip.cz> <5da0588e0912221741r48395defnd11e34728d2b7b97@mail.gmail.com>


On Dec 22, 2009, at 5:41 PM, Rich wrote:

> http://kerneltrap.org/mailarchive/freebsd-fs/2009/9/30/6457763 may be
> useful to you - it's what we did when we got stuck in a resilver loop.
> I recall being in the same state you're in right now at one point, and
> getting out of it from there.
>
> I think if you apply that patch, you'll be able to cancel the
> resilver, and then resilver again with the device you'd like to
> resilver with.
>

Thanks for the suggestion, but the problem isn't that it's stuck
in a resilver loop (which is what the patch seems to try to avoid)
but that I can't detach a drive.

Now I got clever and fudged a label onto the new drive (copied the first
50MB of one of the dying drives; a rough sketch of that copy follows the
status output below), ran a scrub, and ended up with this layout:

  pool: universe
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 20h58m with 0 errors on Wed Dec 23 11:36:43 2009
config:

        NAME                       STATE     READ WRITE CKSUM
        universe                   DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            ad16                   ONLINE       0     0     0
            replacing              DEGRADED     0     0 40.7M
              ad26                 ONLINE       0     0     0  506G repaired
              6170688083648327969  UNAVAIL      0 88.7M     0  was /dev/ad12
            ad8                    ONLINE       0     0     0
            concat/back2           ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            concat/ad4ex           ONLINE       0     0     0
            ad24                   ONLINE       0     0     0
            concat/ad6ex           ONLINE      48     0     0  28.5K repaired
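
(For the curious: the "fudged label" was nothing more exotic than a raw
copy of the first 50MB of the old disk onto the new one, along these
lines; the device names here are illustrative only, adX being one of the
dying drives and adY the new drive:

    dd if=/dev/adX of=/dev/adY bs=1m count=50

That was apparently enough for the pool to accept the new disk as a
member, which is presumably where the "506G repaired" on ad26 above
comes from.)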

Why has the replacing vdev not gone away?  I still can't detach:
[steven@universe:~]% sudo zpool detach universe 6170688083648327969
cannot detach 6170688083648327969: no valid replicas
even though there now actually is a valid replica (ad26).

Additionally, running zpool clear hangs permanently and in fact freezes
all I/O to the pool.  Since I've mounted /usr from the pool, this is
effectively death to the system.  Any other ZFS command seems to work
okay (zpool scrub, zfs mount, etc.); just clear is insta-death.  I can't
help but suspect that this is caused by the now-nonsensical vdev
configuration (a replacing vdev with one good drive and one nonexistent
one)...

Any further thoughts?  Thanks,
Steven


> - Rich
>
> On Tue, Dec 22, 2009 at 6:15 PM, Miroslav Lachman <000.fbsd@quip.cz> wrote:
>> Steven Schlansker wrote:
>>>
>>> As a corollary, you may notice some funky concat business going on.
>>> This is because I have drives which are very slightly different in
>>> size (< 1MB) and whenever one of them goes down and I bring the pool
>>> up, it helpfully (?) expands the pool by a whole megabyte then won't
>>> let the drive back in.  This is extremely frustrating... is there any
>>> way to fix that?  I'm eventually going to keep expanding each of my
>>> drives one megabyte at a time using gconcat and space on another
>>> drive!  Very frustrating...
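
[Aside: the concat/* devices in the status output above are exactly that
trick; each was built with gconcat, roughly as follows, with the provider
names here made up for illustration:

    gconcat label -v back2 ad22 ad30s1
    # creates /dev/concat/back2, which then stands in for the bare disk
    # as the pool member

i.e. glue a small chunk of another disk onto the too-short drive so the
resulting concat provider is big enough for the pool again.]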
>>
>> You can avoid it by partitioning the drives to a well-known 'minimal'
>> size (the size of the smallest disk) and using the partitions instead
>> of the raw disks, for example ad12s1 instead of ad12 (if you create
>> slices with fdisk) or ad12p1 (if you create partitions with gpart).
>>
>> You can also use labels instead of device names.
>>
>> Miroslav Lachman
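
[For concreteness, the partition-based approach Miroslav describes would
look roughly like this per disk; the partition size and label below are
illustrative, the point being that every pool member becomes a fixed-size
partition rather than a whole disk:

    gpart create -s gpt ad12
    gpart add -t freebsd-zfs -s 930G -l disk12 ad12
    # the pool member is then gpt/disk12 (or ad12p1) instead of ad12

A replacement disk that is a megabyte bigger or smaller still fits,
because the pool only ever sees the fixed-size partition.]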
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>
>
>
> --
>
> If you are over 80 years old and accompanied by your parents, we will
> cash your check.



