From: Michelle Sullivan <michelle@sorbs.net>
Date: Sun, 10 Mar 2019 13:34:55 +1100
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
To: Ben RUBSON, freebsd-fs@freebsd.org, Stefan Esser

Turns out the cause of the fire is now known...
https://www.southcoastregister.com.au/story/5945663/homes-left-without-power-after-electrical-pole-destroyed-in-sanctuary-point-accident/

UPSs couldn't deal with 11kV down the 240V line... (guess I'm lucky no one was killed..)

Anyhow.. any clues on how to get the pool back would be greatly appreciated.. the "cannot open" disk was the faulted disk that mfid13 was replacing... being raidz2, all the data should still be there.

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 10 Mar 2019, at 12:29, Michelle Sullivan wrote:
>
> Michelle Sullivan wrote:
>> Ben RUBSON wrote:
>>>> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>>>>
>>>> Ben RUBSON wrote:
>>>>
>>>>> So disks died because of the carrier, as I assume the second unscathed server was OK...
>>>>
>>>> Pretty much.
>>>>
>>>>> Heads must have scratched the platters, but they should have been parked, so... Really strange.
>>>>
>>>> You'd have thought... though 2 of the drives look like it was wear and tear issues (the 2 not showing red lights), just not picked up by the periodic scrub.... Could be that the recovery showed that one up... you know - how you can have an array working fine, but one disk dies, then others fail during the rebuild because of the extra workload.
>>>
>>> Yes... To try to mitigate this, when I add a new vdev to a pool, I spread the new disks I have among the existing vdevs, and construct the new vdev with the remaining new disk(s) + other disks retrieved from the other vdevs. Thus, when possible, avoiding vdevs where all the disks have the same runtime.
>>> However, I only use mirrors; applying this with raid-Z could be a little bit more tricky...
>>>
>> Believe it or not...
>>
>> # zpool status -v
>>   pool: VirtualDisks
>>  state: ONLINE
>> status: One or more devices are configured to use a non-native block size.
>>         Expect reduced performance.
>> action: Replace affected devices with devices that support the
>>         configured block size, or migrate data to a properly configured
>>         pool.
>>   scan: none requested
>> config:
>>
>>         NAME                       STATE     READ WRITE CKSUM
>>         VirtualDisks               ONLINE       0     0     0
>>           zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native
>>
>> errors: No known data errors
>>
>>   pool: sorbs
>>  state: ONLINE
>>   scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on Sat Aug 26 09:26:53 2017
>> config:
>>
>>         NAME                    STATE     READ WRITE CKSUM
>>         sorbs                   ONLINE       0     0     0
>>           raidz2-0              ONLINE       0     0     0
>>             mfid0               ONLINE       0     0     0
>>             mfid1               ONLINE       0     0     0
>>             mfid7               ONLINE       0     0     0
>>             mfid8               ONLINE       0     0     0
>>             mfid12              ONLINE       0     0     0
>>             mfid10              ONLINE       0     0     0
>>             mfid14              ONLINE       0     0     0
>>             mfid11              ONLINE       0     0     0
>>             mfid6               ONLINE       0     0     0
>>             mfid15              ONLINE       0     0     0
>>             mfid2               ONLINE       0     0     0
>>             mfid3               ONLINE       0     0     0
>>             spare-12            ONLINE       0     0     3
>>               mfid13            ONLINE       0     0     0
>>               mfid9             ONLINE       0     0     0
>>             mfid4               ONLINE       0     0     0
>>             mfid5               ONLINE       0     0     0
>>         spares
>>           185579620420611382    INUSE     was /dev/mfid9
>>
>> errors: No known data errors
>>
>>
>> It would appear that when I replaced the damaged drives it picked one of them up as being rebuilt from back in August (before it was packed up to go), and that was why it saw it as 'corrupted metadata' and spent the last 3 weeks importing it; it rebuilt it as it was importing.. no data loss that I can determine. (Literally just finished in the middle of the night here.)
>>
>
> And back to this little nutmeg...
>
> We had a fire last night ... and it (the same pool) was resilvering again...
> Corrupted the metadata.. import -fFX worked and it started rebuilding, then during the early hours, when the pool was at 50%(ish) rebuilt/resilvered (one vdev), there was at least one more issue on the powerline... UPSs went out after multiple hits, and now I can't get it imported - the server was in single user mode - on a FBSD-12 USB stick ... so it was only resilvering... "zdb -AAA -L -uhdi -FX -e storage" returns sanely...
>
> Anyone any thoughts how I might get the data back / the pool to import? (zpool import -fFX storage spends a long time working and eventually comes back with unable to import as one or more of the vdevs are unavailable - however they are all there as far as I can tell.)
>
> Thanks,
>
> --
> Michelle Sullivan
> http://www.mhix.org/
>
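For the import question above, a gentler sequence than repeated -fFX runs is usually worth trying first. The following is only a sketch: it assumes the pool is named "storage" (as in the quoted zdb command), that the member disks are visible under /dev, and that a read-only rewind import is acceptable while data is copied off; mfid0 is just an example member device.

    # List pools ZFS can see by scanning /dev for labels (no changes made):
    zpool import -d /dev

    # Dump the ZFS labels from one member disk to confirm the pool GUID
    # and most recent txg are still readable (example device):
    zdb -l /dev/mfid0

    # Examine the uberblocks of the still-exported pool:
    zdb -e -u storage

    # Forced, read-only, no-mount import with rewind to an earlier txg;
    # readonly=on keeps ZFS from writing anything further to the damaged
    # pool while datasets are mounted by hand or sent elsewhere:
    zpool import -o readonly=on -N -f -F storage

If the read-only import succeeds, the datasets can be copied off with zfs send/recv or plain file copies before any attempt to repair the pool in place.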
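On the spare-12 entry in the quoted status output: once a resilver that pulled in a hot spare has completed, the spare is normally either returned to the spares list or made a permanent member with zpool detach. A sketch, assuming (per that output) that mfid9 is the in-use spare and mfid13 the disk it was covering for:

    # Release the hot spare back to the spares list, keeping mfid13:
    zpool detach sorbs mfid9

    # ...or keep the spare as a permanent member by detaching the
    # replaced disk instead:
    zpool detach sorbs mfid13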
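And on the disk-spreading idea quoted from Ben further up: with mirrors, the usual way to mix a new disk into an existing vdev before building the new one is a replace followed by re-using the freed disk. A purely hypothetical sketch with made-up pool and device names (tank, da2, da10, da11), not the layout discussed in this thread:

    # Swap a new disk (da10) into an existing mirror, freeing the old
    # disk da2 once the resilver completes:
    zpool replace tank da2 da10

    # Then build the new mirror from the freed old disk plus the
    # remaining new one, so neither vdev is all-new or all-old:
    zpool add tank mirror da2 da11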