Subject: Re: ZFS...
From: Michelle Sullivan <michelle@sorbs.net>
To: Steven Hartland
Cc: Paul Mather, freebsd-stable
Date: Thu, 02 May 2019 09:46:22 +1000

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 02 May 2019, at 03:39, Steven Hartland wrote:
>
>> On 01/05/2019 15:53, Michelle Sullivan wrote:
>> Paul Mather wrote:
>>>> On Apr 30, 2019, at 11:17 PM, Michelle Sullivan wrote:
>>>>
>>>> Been there, done that, though with ext2 rather than UFS... still got all my data back, even though it was a nightmare.
>>>
>>> Is that an implication that had all your data been on UFS (or ext2) this time around you would have got it all back? (I've got that impression through this thread from things you've written.) That sort of makes it sound like UFS is bulletproof to me.
>>
>> It's definitely not bulletproof (far from it); however, when the data on disk is not corrupt I have managed to recover it, even if it has been a nightmare: no structure, all files in lost+found, etc., or even resorting to R-Studio in the event of lost RAID information.
>
> Yes, but you seem to have done this with ZFS too, just not in this particularly bad case.
>
There is no R-Studio for ZFS, or I would have turned to it as soon as this issue hit.

> If you imagine that the in-memory update for the metadata was corrupted and then written out to disk, which is what you seem to have experienced with your ZFS pool, then you'd be in much the same position.
>>
>> This case, from what my limited knowledge has managed to fathom, is a spacemap that has become corrupt due to a partial write during the hard power failure. This was the second hard outage during the resilver process following a drive platter failure (on a RAIDZ2, so a single platter failure should be completely recoverable in all cases, barring HBA failure or other corruption, which does not appear to be the case). The spacemap fails its checksum (no surprise there, given it was part-written), yet it cannot be repaired, for whatever reason.
>>
>> Now, I get that this is an interesting case: one cannot just assume anything about the corrupt spacemap. It could be complete with only the checksum wrong, or it could be completely corrupt and ignorable. But as I understand ZFS (and please, watchers, chime in if I'm wrong), the spacemap is just the free-space map. If it is corrupt or missing, one cannot simply 'fix it', because there is a very good chance the fix would corrupt something that is actually allocated; the safest "fix" would be to treat the region it covers as 100% full and therefore 'dead space'. ZFS doesn't do that (probably a good thing), and the result is that a drive that is supposed to be good (and on which zdb reports some 36M+ objects) becomes completely unreadable.
>>
>> My thought (desire/want) for a 'walk' tool would be a last-resort tool that could walk the datasets and send them elsewhere (like zfs send), so that I could create a new pool elsewhere, send the data it knows about to that pool, and then blow away the original. If there are corruptions or data missing, that's my problem; it's a last resort. But in the case where the critical structures become corrupt, it gives you a local recovery option: if the data is all there and the corruption is just a spacemap, you can transfer the entire drive's data to a new pool whilst the original host is rebuilt. This would *significantly* help people with large pools who currently have to blow them away and re-create them because of errors/corruption, and with the addition of rsync-style per-file checksumming it would be trivial to 'fix' just the corrupted or missing data from a mirror host rather than transferring the entire pool from (possibly) offsite.
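(To make the 'walk the datasets and send them elsewhere' idea concrete: when a damaged pool will still import, a crude dataset-by-dataset copy can already be scripted around zfs send. The names "oldpool", "newpool", "@rescue" and the log path below are made up for illustration; the point of the proposed tool is precisely the case where this sketch falls over, i.e. when the pool's own structures are too damaged for snapshots or send to work.)

    # Rough sketch only: assumes the damaged pool imported (datasets need not be
    # mountable, since zfs send reads from snapshots, not mounts) and that a
    # healthy destination pool exists.
    zfs snapshot -r oldpool@rescue
    # Copy dataset by dataset, skipping the pool root, and keep going past
    # failures instead of aborting the whole transfer.
    zfs list -H -o name -r oldpool | tail -n +2 | while read -r ds; do
        zfs send "${ds}@rescue" | zfs receive -u -d newpool \
            || echo "send failed for ${ds}" >> /tmp/rescue-failures.txt
    done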
> From what I've read that's not a partial write issue, as in that case the pool would have just rolled back. It sounds more like the write was successful but the data in that write was trashed due to your power incident, and that was replicated across ALL drives.
>
I think this might be where the problem started. It was already rolling back from the first power issue (it did exactly what was expected and programmed: it rolled back 5 seconds, which, as no-one had had write access to it since the start of the resilver, I really didn't care about; the only changes were the resilver itself). Your assertion/musing may be correct and all drives got trashed data; I think not, but unless we get into it and examine it, we won't know. What I do know is that in the second round -FfX wouldn't work, so I used zdb to locate a "LOADED" MOS and used -t to import. The txg number was 7 or 8 back from current, so just outside the -X limit (going off memory here, so it could have been more, but I remember it was just past the switch's limit).
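(For anyone landing on this thread later, the zdb/txg exercise above looks roughly like the following. The device path, pool name and txg number are placeholders, -X is the undocumented "extreme rewind" flag already mentioned in this thread, and exact behaviour varies between ZFS versions.)

    # List the uberblocks recorded in each vdev label, with their txg numbers,
    # to find a candidate transaction group to roll back to:
    zdb -lu /dev/da0p3

    # Ask zdb to open the un-imported pool using no uberblock newer than the
    # chosen txg, and see whether the MOS and other metadata load cleanly:
    zdb -e -t 1234567 storage

    # A forced, read-only, no-mount import attempt with rewind (-X must be
    # combined with -F):
    zpool import -o readonly=on -N -f -FX storage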
> To be clear, this may or may not be what you're seeing, as you don't seem to have covered any of the details of the issues you're seeing, nor the detailed steps you have tried to recover with?
There have been many steps over the last month, and some of them may have taken this from very difficult to recover to non-recoverable by now... though the only writes are whatever the kernel does on import, as I have not had it (the dataset) mounted at any time, even though the pool has imported.
>
> I'm not saying this is the case, but all may not be lost depending on the exact nature of the corruption.
>
> For more information on space maps see:
> https://www.delphix.com/blog/delphix-engineering/openzfs-code-walk-metaslabs-and-space-maps
This is something I read a month ago, along with multiple other articles on the same blog, including https://www.delphix.com/blog/openzfs-pool-import-recovery which, I might add, got me from non-importable to importable, but not mountable.

I have *not* attempted to bypass the checksum check on the space map load to date, as I see that as a possible way to make the problem worse.
> https://sdimitro.github.io/post/zfs-lsm-flushing/
Not read this one.
>
> A similar behavior turned out to be a bug:
> https://www.reddit.com/r/zfs/comments/97czae/zfs_zdb_space_map_errors_on_unmountable_zpool/
>
Or this... will go there after "pressing send". :)
> Regards
> Steve
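P.S. For anyone who wants to look at the damaged space maps without risking further writes: zdb only ever reads the pool, so it can be pointed at an un-imported pool to dump per-metaslab space map state (pool name again a placeholder):

    # -e: operate on a pool that is not currently imported
    # -m: show each vdev's metaslabs and their space maps; repeat the flag
    #     (-mm) for extra detail
    zdb -e -mm storage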