Subject: Re: ZFS...
From: Michelle Sullivan <michelle@sorbs.net>
Date: Tue, 30 Apr 2019 20:14:19 +1000
To: Xin LI
Cc: rainer@ultra-secure.de, owner-freebsd-stable@freebsd.org, freebsd-stable, Andrea Venturoli

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 30 Apr 2019, at 19:50, Xin LI wrote:
>
>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote:
>> but in my recent experience 2 issues colliding at the same time results in disaster
>
> Do we know exactly what kind of corruption happened to your pool? If you see it twice in a row, it might suggest a software bug that should be investigated.

All I know is it’s a checksum error on a metaslab (122), and from what I can gather it’s the space map that is corrupt... but I am no expert.
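(For anyone who wants to poke at the same structures: zdb can dump metaslab and space map state without a writable import. This is roughly what I have been running; it assumes the pool (storage) is exported, and exact flags vary between FreeBSD/ZFS versions, so treat it as a sketch rather than gospel:

    # summary of each metaslab's offset, space map and free space,
    # read from the exported pool
    zdb -e -m storage

    # each additional m prints progressively more space map detail
    zdb -e -mm storage

For me it is metaslab 122 that zdb trips over while loading, as below.)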
I don’t believe it’s a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive. ...and after the first outage a second occurred (same as the first, but more damaging to the power hardware)... the host itself was not damaged, nor were the drives or controller.

> Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with several faulty drives; a single hard drive pool bad enough to crash almost monthly and damage my data from time to time),

This was a top-end consumer grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it weren’t for patching.

> I've never seen a corruption this bad and I was always able to recover the pool.

So far, same.

> At a previous employer, the only case where we had the pool corrupted badly enough that mount was not allowed was because two host nodes happened to import the pool at the same time, which is a situation that can be avoided with SCSI reservations; their hardware was of much better quality, though.
>
> Speaking of a tool like 'fsck': I think I'm mostly convinced that it's not necessary, because at the point ZFS says the metadata is corrupted, it means that the metadata was really corrupted beyond repair (all replicas were corrupted; otherwise it would recover by finding the right block and rewriting the bad ones).

I see this message all the time and mostly agree... actually I do agree, with possibly a minor exception, but it's so minor it’s probably not worth it. However, as I suggested in my original post: the pool says the files are there, so a tool that would send them (a la zfs send) while ignoring errors in space maps etc. would be really useful (to me).

> An interactive tool may be useful (e.g. "I saw data structure versions 1, 2, 3 available, and all with bad checksums; choose which one you would want to try"), but I think they wouldn't be very practical for use with large data pools -- unlike traditional filesystems, ZFS uses copy-on-write and heavily depends on the metadata to find where the data is, and a regular "scan" is not really useful.

zdb -AAA showed (and still shows) 36M files, which suggests the data is intact, but the mount aborts with an I/O error because it says the metadata has three errors: two ‘metadata’ and one “storage” (storage being the pool name). The pool does import, and it attempts to resilver, but reports the resilver finishes at some 780M (ish)... export, import, and it does it all again... zdb without -AAA aborts loading metaslab 122.

> I'd agree that you need a full backup anyway, regardless of what storage system is used, though.

Yeah... unlike UFS, which has to get really, really hosed before you're restoring from backup with nothing recoverable, it seems ZFS can get hosed when issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable).

Michelle
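P.S. For anyone who finds this in the archives: the nearest existing thing to the "zfs send that ignores errors" tool I am wishing for above is a read-only import followed by copying whatever is readable off with userland tools. A sketch only, assuming the pool is named storage; note that -F rewinds can discard recent transactions, and flags vary by version:

    # dry run: report whether discarding the last few transactions
    # would make the pool importable, without actually doing it
    zpool import -F -n storage

    # import read-only so nothing (including a resilver) writes to the pool
    zpool import -o readonly=on -f storage

    # then copy out what is readable with tar/rsync/cp instead of zfs send
    tar -C /storage -cf - . | tar -C /backup -xf -

Whether that gets past a corrupt space map in any given case is, of course, another question.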