From: Michelle Sullivan <michelle@sorbs.net>
Date: Sun, 10 Mar 2019 13:34:55 +1100
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
To: Ben RUBSON, freebsd-fs@freebsd.org, Stefan Esser

Turns out the cause of the fire is now known...
https://www.southcoastregister.com.au/story/5945663/homes-left-without-power-after-electrical-pole-destroyed-in-sanctuary-point-accident/

UPSs couldn't deal with 11kV down the 240V line... (guess I'm lucky no one was killed..)

Anyhow.. any clues on how to get the pool back would be greatly appreciated.. the "cannot open" disk was the faulted disk that mfid13 was replacing... being raidz2, all the data should still be there.

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 10 Mar 2019, at 12:29, Michelle Sullivan wrote:
>
> Michelle Sullivan wrote:
>> Ben RUBSON wrote:
>>>> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>>>>
>>>> Ben RUBSON wrote:
>>>>
>>>>> So disks died because of the carrier, as I assume the second unscathed server was OK...
>>>>
>>>> Pretty much.
>>>>
>>>>> Heads must have scratched the platters, but they should have been parked, so... Really strange.
>>>>
>>>> You'd have thought... though 2 of the drives look like it was wear and tear issues (the 2 not showing red lights), just not picked up by the periodic scrub.... Could be that the recovery showed that one up... you know - how you can have an array working fine, but one disk dies, then others fail during the rebuild because of the extra workload.
>>>
>>> Yes... To try to mitigate this, when I add a new vdev to a pool, I spread the new disks I have among the existing vdevs, and construct the new vdev with the remaining new disk(s) + other disks retrieved from the other vdevs. Thus, when possible, avoiding vdevs where all the disks have the same runtime.
>>> However, I only use mirrors; applying this with raid-Z could be a little bit more tricky...
>>>
>> Believe it or not...
>>
>> # zpool status -v
>>   pool: VirtualDisks
>>  state: ONLINE
>> status: One or more devices are configured to use a non-native block size.
>>         Expect reduced performance.
>> action: Replace affected devices with devices that support the
>>         configured block size, or migrate data to a properly configured
>>         pool.
>>   scan: none requested
>> config:
>>
>>         NAME                       STATE     READ WRITE CKSUM
>>         VirtualDisks               ONLINE       0     0     0
>>           zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native
>>
>> errors: No known data errors
>>
>>   pool: sorbs
>>  state: ONLINE
>>   scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on Sat Aug 26 09:26:53 2017
>> config:
>>
>>         NAME                    STATE     READ WRITE CKSUM
>>         sorbs                   ONLINE       0     0     0
>>           raidz2-0              ONLINE       0     0     0
>>             mfid0               ONLINE       0     0     0
>>             mfid1               ONLINE       0     0     0
>>             mfid7               ONLINE       0     0     0
>>             mfid8               ONLINE       0     0     0
>>             mfid12              ONLINE       0     0     0
>>             mfid10              ONLINE       0     0     0
>>             mfid14              ONLINE       0     0     0
>>             mfid11              ONLINE       0     0     0
>>             mfid6               ONLINE       0     0     0
>>             mfid15              ONLINE       0     0     0
>>             mfid2               ONLINE       0     0     0
>>             mfid3               ONLINE       0     0     0
>>             spare-12            ONLINE       0     0     3
>>               mfid13            ONLINE       0     0     0
>>               mfid9             ONLINE       0     0     0
>>             mfid4               ONLINE       0     0     0
>>             mfid5               ONLINE       0     0     0
>>         spares
>>           185579620420611382    INUSE     was /dev/mfid9
>>
>> errors: No known data errors
>>
>>
>> It would appear that when I replaced the damaged drives it picked one of them up as being rebuilt from back in August (before it was packed up to go), and that was why it saw it as 'corrupted metadata' and spent the last 3 weeks importing it; it rebuilt it as it was importing.. no data loss that I can determine. (Literally just finished in the middle of the night here.)
>>
>
> And back to this little nutmeg...
>
> We had a fire last night ... and it (the same pool) was resilvering again...
> Corrupted the metadata.. import -fFX worked and it started rebuilding, then during the early hours, when the pool was at 50%(ish) rebuilt/resilvered (one vdev), there was at least one more issue on the powerline... UPSs went out after multiple hits, and now I can't get it imported - the server was in single user mode - on a FBSD-12 USB stick ... so it was only resilvering... "zdb -AAA -L -uhdi -FX -e storage" returns sanely...
>
> Anyone any thoughts how I might get the data back / the pool to import? (zpool import -fFX storage spends a long time working and eventually comes back with unable to import as one or more of the vdevs are unavailable - however they are all there as far as I can tell.)
>
> Thanks,
>
> --
> Michelle Sullivan
> http://www.mhix.org/
>
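For the import question above, a gentler sequence than repeated -fFX runs is usually worth trying first. The following is only a sketch: it assumes the pool is named "storage" (as in the quoted zdb command), that the member disks are visible under /dev, and that a read-only rewind import is acceptable while data is copied off; mfid0 is just an example member device.

    # List pools ZFS can see by scanning /dev for labels (no changes made):
    zpool import -d /dev

    # Dump the ZFS labels from one member disk to confirm the pool GUID
    # and most recent txg are still readable (example device):
    zdb -l /dev/mfid0

    # Examine the uberblocks of the still-exported pool:
    zdb -e -u storage

    # Forced, read-only, no-mount import with rewind to an earlier txg;
    # readonly=on keeps ZFS from writing anything further to the damaged
    # pool while datasets are mounted by hand or sent elsewhere:
    zpool import -o readonly=on -N -f -F storage

If the read-only import succeeds, the datasets can be copied off with zfs send/recv or plain file copies before any attempt to repair the pool in place.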
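On the spare-12 entry in the quoted status output: once a resilver that pulled in a hot spare has completed, the spare is normally either returned to the spares list or made a permanent member with zpool detach. A sketch, assuming (per that output) that mfid9 is the in-use spare and mfid13 the disk it was covering for:

    # Release the hot spare back to the spares list, keeping mfid13:
    zpool detach sorbs mfid9

    # ...or keep the spare as a permanent member by detaching the
    # replaced disk instead:
    zpool detach sorbs mfid13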
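And on the disk-spreading idea quoted from Ben further up: with mirrors, the usual way to mix a new disk into an existing vdev before building the new one is a replace followed by re-using the freed disk. A purely hypothetical sketch with made-up pool and device names (tank, da2, da10, da11), not the layout discussed in this thread:

    # Swap a new disk (da10) into an existing mirror, freeing the old
    # disk da2 once the resilver completes:
    zpool replace tank da2 da10

    # Then build the new mirror from the freed old disk plus the
    # remaining new one, so neither vdev is all-new or all-old:
    zpool add tank mirror da2 da11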