Date: Sat, 9 Jan 2010 15:35:03 -0800
From: Steven Schlansker <stevenschlansker@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS: Can't repair raidz2 (Cannot replace a replacing device)
Message-ID: <CA577E79-C936-4EBE-81BA-E0C2940011E2@gmail.com>
In-Reply-To: <alpine.BSF.2.00.0912272247410.64051@ibyngvyr>
References: <048AF210-8B9A-40EF-B970-E8794EC66B2F@gmail.com>
 <4B315320.5050504@quip.cz>
 <5da0588e0912221741r48395defnd11e34728d2b7b97@mail.gmail.com>
 <9CEE3EE5-2CF7-440E-B5F4-D2BD796EA55C@gmail.com>
 <alpine.BSF.2.00.0912240708020.1450@ibyngvyr>
 <5565955F-482A-4628-A528-117C58046B1F@gmail.com>
 <alpine.BSF.2.00.0912272247410.64051@ibyngvyr>
On Dec 27, 2009, at 8:59 PM, Wes Morgan wrote:

> On Sun, 27 Dec 2009, Steven Schlansker wrote:
>
>> On Dec 24, 2009, at 5:17 AM, Wes Morgan wrote:
>>
>>> On Wed, 23 Dec 2009, Steven Schlansker wrote:
>>>>
>>>> Why has the replacing vdev not gone away?  I still can't detach -
>>>> [steven@universe:~]% sudo zpool detach universe 6170688083648327969
>>>> cannot detach 6170688083648327969: no valid replicas
>>>> even though now there actually is a valid replica (ad26)
>>>
>>> Try detaching ad26. If it lets you do that, it will abort the replacement and then you just do another replacement with the real device. If it won't let you do that, you may be stuck having to do some metadata tricks.
>>
>> errors: No known data errors
>> [steven@universe:~]% sudo zpool detach universe ad26
>> cannot detach ad26: no valid replicas
>> [steven@universe:~]% sudo zpool offline -t universe ad26
>> cannot offline ad26: no valid replicas
>
> I just tried to re-create this scenario with some sparse files and I was able to detach it completely (below). There is one difference, however: your array is returning checksum errors for the ad26 device. Perhaps this is making the system think that there is no sibling device in the replacement node that has all the data, so it denies the detach, even though logically the data would be recovered by a scrub later. Interesting. If you can determine where the detach is failing, that will help paint the complete picture.

Interestingly enough, I found a solution! It is somewhat roundabout, but what I did was replace a different device and let it resilver completely. The array then looked like this:

	NAME                       STATE     READ WRITE CKSUM
	universe                   DEGRADED     0     0     0
	  raidz2                   DEGRADED     0     0     0
	    ad16                   ONLINE       0     0     0
	    replacing              DEGRADED     0     0     0
	      ad26                 ONLINE       0     0     0
	      6170688083648327969  UNAVAIL      0 1.13M     0  was /dev/ad12
	    ad8                    ONLINE       0     0     0
	    da0                    ONLINE       0     0     0
	    ad10                   ONLINE       0     0     0
	    concat/ad4ex           ONLINE       0     0     0
	    ad24                   ONLINE       0     0     0
	    concat/ad6ex           ONLINE       0     0     0

Just for kicks, I then tried to detach:

[steven@universe:~]% sudo zpool detach universe 6170688083648327969
[steven@universe:~]% sudo zpool status
  pool: universe
 state: ONLINE
 scrub: none requested
config:

	NAME              STATE     READ WRITE CKSUM
	universe          ONLINE       0     0     0
	  raidz2          ONLINE       0     0     0
	    ad16          ONLINE       0     0     0
	    ad26          ONLINE       0     0     0
	    ad8           ONLINE       0     0     0
	    da0           ONLINE       0     0     0
	    ad10          ONLINE       0     0     0
	    concat/ad4ex  ONLINE       0     0     0
	    ad24          ONLINE       0     0     0
	    concat/ad6ex  ONLINE       0     0     0

Ta-da! I have no idea why this helped or how it fixed the problem, but if anyone runs into this in the future: try replacing a different device, letting it resilver, and then detaching the original problematic device.
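For anyone who wants to try the same workaround, the rough command sequence is sketched below. Treat it as a sketch rather than a transcript: adNN and adXX are placeholder names I'm using here (some other healthy member of the raidz2, and the disk you swap in for it); only the pool name and the stale GUID from my status output above are real.

  # Hedged sketch of the workaround -- adNN / adXX are placeholders,
  # not devices from my pool.

  # 1. Replace some *other* healthy member of the raidz2 and wait for
  #    the resilver to finish (watch "zpool status" until it completes).
  sudo zpool replace universe adNN adXX
  sudo zpool status universe

  # 2. Once that resilver is done, the stuck "replacing" vdev could be
  #    detached by the numeric GUID of its missing half.
  sudo zpool detach universe 6170688083648327969

  # 3. Optionally scrub afterwards to clear any lingering checksum errors.
  sudo zpool scrub universe

Again, no promises that this is intended behavior; it just happened to clear the "no valid replicas" error on my pool.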