Date:      Sat, 9 Jan 2010 15:35:03 -0800
From:      Steven Schlansker <stevenschlansker@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: Can't repair raidz2 (Cannot replace a replacing device)
Message-ID:  <CA577E79-C936-4EBE-81BA-E0C2940011E2@gmail.com>
In-Reply-To: <alpine.BSF.2.00.0912272247410.64051@ibyngvyr>
References:  <048AF210-8B9A-40EF-B970-E8794EC66B2F@gmail.com> <4B315320.5050504@quip.cz> <5da0588e0912221741r48395defnd11e34728d2b7b97@mail.gmail.com> <9CEE3EE5-2CF7-440E-B5F4-D2BD796EA55C@gmail.com> <alpine.BSF.2.00.0912240708020.1450@ibyngvyr> <5565955F-482A-4628-A528-117C58046B1F@gmail.com> <alpine.BSF.2.00.0912272247410.64051@ibyngvyr>


On Dec 27, 2009, at 8:59 PM, Wes Morgan wrote:

> On Sun, 27 Dec 2009, Steven Schlansker wrote:
>
>>
>> On Dec 24, 2009, at 5:17 AM, Wes Morgan wrote:
>>
>>> On Wed, 23 Dec 2009, Steven Schlansker wrote:
>>>>
>>>> Why has the replacing vdev not gone away?  I still can't detach -
>>>> [steven@universe:~]% sudo zpool detach universe 6170688083648327969
>>>> cannot detach 6170688083648327969: no valid replicas
>>>> even though now there actually is a valid replica (ad26)
>>>
>>> Try detaching ad26. If it lets you do that it will abort the
>>> replacement and then you just do another replacement with the real
>>> device. If it won't let you do that, you may be stuck having to do some
>>> metadata tricks.
>>>
>>
>> errors: No known data errors
>> [steven@universe:~]% sudo zpool detach universe ad26
>> cannot detach ad26: no valid replicas
>> [steven@universe:~]% sudo zpool offline -t universe ad26
>> cannot offline ad26: no valid replicas
>>
>
> I just tried to re-create this scenario with some sparse files and I
> was able to detach it completely (below). There is one difference,
> however. Your array is returning checksum errors for the ad26 device.
> Perhaps this is making the system think that there is no sibling device
> in the replacement node that has all the data, so it denies the detach.
> Even though logically the data will be recovered by a scrub later..
> Interesting. If you can determine where the detach is failing, that will
> help paint the complete picture.
>

Interestingly enough, I found a solution!  Somewhat roundabout, but what
I did was replace a different device and let it resilver completely.
Then the array looked like this:

        NAME                       STATE     READ WRITE CKSUM
        universe                   DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            ad16                   ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              ad26                 ONLINE       0     0     0
              6170688083648327969  UNAVAIL      0 1.13M     0  was /dev/ad12
            ad8                    ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            concat/ad4ex           ONLINE       0     0     0
            ad24                   ONLINE       0     0     0
            concat/ad6ex           ONLINE       0     0     0
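
A lingering "replacing" vdev like the one above can be picked out of the status text with a small filter. This is only a sketch: `replacing_children` is a hypothetical helper name, and it assumes the indented layout that `zpool status` prints in this thread (children indented deeper than their parent).

```shell
#!/bin/sh
# Reads `zpool status` text on stdin and prints the name and state of
# every child of a "replacing" vdev. Hypothetical helper, not part of ZFS.
replacing_children() {
    awk '
        # Indentation = count of leading blanks (spaces or tabs).
        { match($0, /^[ \t]*/); indent = RLENGTH }
        # Remember the depth of the "replacing" node itself.
        $1 == "replacing" { in_repl = 1; base = indent; next }
        # Anything indented deeper than that node is one of its children.
        in_repl && NF > 0 && indent > base { print $1, $2; next }
        # First line at the same or shallower depth ends the node.
        { in_repl = 0 }
    '
}

# Typical use (pool name taken from this thread):
#   zpool status universe | replacing_children
```

An empty result would suggest the replacement has completed and the "replacing" node is gone.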

Just for kicks, I then tried to detach -

[steven@universe:~]% sudo zpool detach universe 6170688083648327969
[steven@universe:~]% sudo zpool status
  pool: universe
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        universe          ONLINE       0     0     0
          raidz2          ONLINE       0     0     0
            ad16          ONLINE       0     0     0
            ad26          ONLINE       0     0     0
            ad8           ONLINE       0     0     0
            da0           ONLINE       0     0     0
            ad10          ONLINE       0     0     0
            concat/ad4ex  ONLINE       0     0     0
            ad24          ONLINE       0     0     0
            concat/ad6ex  ONLINE       0     0     0

Ta-da!  I have no idea why this helped, or how it fixed it, but if
anyone has this problem in the future, try replacing a different device,
letting it resilver completely, and then detaching the original
problematic device.
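
The sequence can be sketched as a dry run that only prints the commands and executes nothing. The names `adX` and `adY` are placeholders (which other device was replaced is not stated here), and `workaround_cmds` is a hypothetical helper; the pool name and GUID are the ones from the status output above.

```shell
#!/bin/sh
# Dry-run sketch of the workaround: prints zpool commands, runs none of them.
# Arguments: pool, other device to replace, its replacement, stuck vdev GUID.
workaround_cmds() {
    pool=$1; other_old=$2; other_new=$3; stuck=$4
    # 1. Replace some other device in the same raidz2 vdev.
    echo "zpool replace $pool $other_old $other_new"
    # 2. Re-check status until that resilver has completed.
    echo "zpool status $pool"
    # 3. Detach the stale half of the stuck "replacing" vdev.
    echo "zpool detach $pool $stuck"
}

# adX / adY are placeholders, not devices from this pool.
workaround_cmds universe adX adY 6170688083648327969
```

Whether step 3 succeeds on another system is not guaranteed; it simply mirrors what worked on this pool.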



