From owner-freebsd-stable@freebsd.org  Tue Apr 30 14:12:36 2019
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 86DE41592B91
 for <freebsd-stable@mailman.ysv.freebsd.org>;
 Tue, 30 Apr 2019 14:12:36 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lf1-f68.google.com (mail-lf1-f68.google.com
 [209.85.167.68])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id B76DE7306F
 for <freebsd-stable@freebsd.org>; Tue, 30 Apr 2019 14:12:35 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lf1-f68.google.com with SMTP id j11so11023110lfm.0
 for <freebsd-stable@freebsd.org>; Tue, 30 Apr 2019 07:12:35 -0700 (PDT)
X-Received: by 2002:a05:6512:c8:: with SMTP id
 c8mr35851962lfp.138.1556633554055; 
 Tue, 30 Apr 2019 07:12:34 -0700 (PDT)
MIME-Version: 1.0
References: <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net>
 <CAOtMX2gf3AZr1-QOX_6yYQoqE-H+8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com>
 <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net>
 <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it>
 <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net>
 <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de>
 <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net>
 <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com>
 <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net>
 <d0118f7e-7cfc-8bf1-308c-823bce088039@denninger.net>
 <2e4941bf-999a-7f16-f4fe-1a520f2187c0@sorbs.net>
 <CAOtMX2gOwwZuGft2vPpR-LmTpMVRy6hM_dYy9cNiw+g1kDYpXg@mail.gmail.com>
 <34539589-162B-4891-A68F-88F879B59650@sorbs.net>
In-Reply-To: <34539589-162B-4891-A68F-88F879B59650@sorbs.net>
From: Alan Somers <asomers@freebsd.org>
Date: Tue, 30 Apr 2019 08:12:22 -0600
Message-ID: <CAOtMX2iB7xJszO8nT_KU+rFuSkTyiraMHddz1fVooe23bEZguA@mail.gmail.com>
Subject: Re: ZFS...
To: Michelle Sullivan <michelle@sorbs.net>
Cc: Karl Denninger <karl@denninger.net>, FreeBSD <freebsd-stable@freebsd.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: B76DE7306F
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of asomers@gmail.com designates
 209.85.167.68 as permitted sender) smtp.mailfrom=asomers@gmail.com
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Apr 2019 14:12:36 -0000

On Tue, Apr 30, 2019 at 8:05 AM Michelle Sullivan <michelle@sorbs.net> wrote:
>
>
>
> Michelle Sullivan
> http://www.mhix.org/
> Sent from my iPad
>
> > On 01 May 2019, at 00:01, Alan Somers <asomers@freebsd.org> wrote:
> >
> >> On Tue, Apr 30, 2019 at 7:30 AM Michelle Sullivan <michelle@sorbs.net> wrote:
> >>
> >> Karl Denninger wrote:
> >>> On 4/30/2019 05:14, Michelle Sullivan wrote:
> >>>>>> On 30 Apr 2019, at 19:50, Xin LI <delphij@gmail.com> wrote:
> >>>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <michelle@sorbs.net> wrote:
> >>>>>> but in my recent experience 2 issues colliding at the same time results in disaster
> >>>>> Do we know exactly what kind of corruption happen to your pool?  If you see it twice in a row, it might suggest a software bug that should be investigated.
> >>>>>
> >>>>> All I know is it’s a checksum error on a meta slab (122) and from what I can gather it’s the spacemap that is corrupt... but I am no expert.  I don’t believe it’s a software fault as such, because this was cause by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive.  ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged nor were the drives or controller.
> >>> .....
> >>>>> Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with several faulty, single hard drive pool: bad enough to crash almost monthly and damages my data from time to time),
> >>>> This was a top end consumer grade mb with non ecc ram that had been running for 8+ years without fault (except for hard drive platter failures.). Uptime would have been years if it wasn’t for patching.
> >>> Yuck.
> >>>
> >>> I'm sorry, but that may well be what nailed you.
> >>>
> >>> ECC is not just about the random cosmic ray.  It also saves your bacon
> >>> when there are power glitches.
> >>
> >> No. Sorry no.  If the data is only half to disk, ECC isn't going to save
> >> you at all... it's all about power on the drives to complete the write.
> >
> > ECC RAM isn't about saving the last few seconds' worth of data from
> > before a power crash.  It's about not corrupting the data that gets
> > written long before a crash.  If you have non-ECC RAM, then a cosmic
> > ray/alpha ray/row hammer attack/bad luck can corrupt data after it's
> > been checksummed but before it gets DMAed to disk.  Then disk will
> > contain corrupt data and you won't know it until you try to read it
> > back.
>
> I know this... unless I misread Karl’s message he implied the ECC would have saved the corruption in the crash... which is patently false... I think you’ll agree..

I don't think that's what Karl meant.  I think he meant that the
non-ECC RAM could've caused latent corruption that was only detected
when the crash forced a reboot and resilver.
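
As a rough illustration of that failure mode, here is a toy Python model
(not ZFS code).  The dict standing in for a disk, the use of SHA-256 as
the checksum, and the names write_block, read_block and ram_fault_bit are
all made up for this example.  A bit that flips in RAM after the checksum
is computed goes to disk silently, and only shows up as a checksum error
when the block is read back, for instance during a resilver.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # SHA-256 is a stand-in for the pool's block checksum (assumption).
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of block with one bit inverted, simulating a RAM error."""
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def write_block(disk: dict, addr: int, block: bytes, ram_fault_bit=None) -> None:
    """Checksum the block, then copy it to the toy disk.

    If ram_fault_bit is given, the in-memory buffer is corrupted AFTER the
    checksum was computed but BEFORE the copy, which is the window that
    non-ECC RAM leaves open.  The write itself reports no error.
    """
    stored_sum = checksum(block)
    if ram_fault_bit is not None:
        block = flip_bit(block, ram_fault_bit)
    disk[addr] = (stored_sum, block)


def read_block(disk: dict, addr: int) -> bytes:
    """Read a block back and verify it against the stored checksum."""
    stored_sum, block = disk[addr]
    if checksum(block) != stored_sum:
        raise OSError(f"checksum error at block {addr}")
    return block


disk = {}
write_block(disk, 0, b"good data")                   # healthy write
write_block(disk, 1, b"good data", ram_fault_bit=3)  # latent corruption
```

Reading block 0 with read_block(disk, 0) returns the original bytes, while
read_block(disk, 1) raises the checksum error even though the write
appeared to succeed at the time.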

>
> Michelle
>
>
> >
> > -Alan
> >
> >>>
> >>> Unfortunately however there is also cache memory on most modern hard
> >>> drives, most of the time (unless you explicitly shut it off) it's on for
> >>> write caching, and it'll nail you too.  Oh, and it's never, in my
> >>> experience, ECC.
> >
> > Fortunately, ZFS never sends non-checksummed data to the hard drive.
> > So an error in the hard drive's cache ram will usually get detected by
> > the ZFS checksum.
> >
> >>
> >> No comment on that - you're right in the first part, I can't comment if
> >> there are drives with ECC.
> >>
> >>>
> >>> In addition, however, and this is something I learned a LONG time ago
> >>> (think Z-80 processors!) is that as in so many very important things
> >>> "two is one and one is none."
> >>>
> >>> In other words without a backup you WILL lose data eventually, and it
> >>> WILL be important.
> >>>
> >>> Raidz2 is very nice, but as the name implies it you have two
> >>> redundancies.  If you take three errors, or if, God forbid, you *write*
> >>> a block that has a bad checksum in it because it got scrambled while in
> >>> RAM, you're dead if that happens in the wrong place.
> >>
> >> Or in my case you write part data therefore invalidating the checksum...
> >>>
> >>>> Yeah.. unlike UFS that has to get really really hosed to restore from backup with nothing recoverable it seems ZFS can get hosed where issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable.)
> >>>>
> >>>> Michelle
> >>> Oh that is definitely NOT true.... again, from hard experience,
> >>> including (but not limited to) on FreeBSD.
> >>>
> >>> My experience is that ZFS is materially more-resilient but there is no
> >>> such thing as "can never be corrupted by any set of events."
> >>
> >> The latter part is true - and my blog and my current situation is not
> >> limited to or aimed at FreeBSD specifically,  FreeBSD is my experience.
> >> The former part... it has been very resilient, but I think (based on
> >> this certain set of events) it is easily corruptible and I have just
> >> been lucky.  You just have to hit a certain write to activate the issue,
> >> and whilst that write and issue might be very very difficult (read: hit
> >> and miss) to hit in normal every day scenarios it can and will
> >> eventually happen.
> >>
> >>>   Backup
> >>> strategies for moderately large (e.g. many Terabytes) to very large
> >>> (e.g. Petabytes and beyond) get quite complex but they're also very
> >>> necessary.
> >>>
> >> and there in lies the problem.  If you don't have a many 10's of
> >> thousands of dollars backup solutions, you're either:
> >>
> >> 1/ down for a looooong time.
> >> 2/ losing all data and starting again...
> >>
> >> ..and that's the problem... ufs you can recover most (in most
> >> situations) and providing the *data* is there uncorrupted by the fault
> >> you can get it all off with various tools even if it is a complete
> >> mess....  here I am with the data that is apparently ok, but the
> >> metadata is corrupt (and note: as I had stopped writing to the drive
> >> when it started resilvering the data - all of it - should be intact...
> >> even if a mess.)
> >>
> >> Michelle
> >>
> >> --
> >> Michelle Sullivan
> >> http://www.mhix.org/
> >>
> >> _______________________________________________
> >> freebsd-stable@freebsd.org mailing list
> >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"