Date:      Tue, 30 Apr 2019 20:14:19 +1000
From:      Michelle Sullivan <michelle@sorbs.net>
To:        Xin LI <delphij@gmail.com>
Cc:        rainer@ultra-secure.de, owner-freebsd-stable@freebsd.org, freebsd-stable <freebsd-stable@freebsd.org>, Andrea Venturoli <ml@netfence.it>
Subject:   Re: ZFS...
Message-ID:  <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net>
In-Reply-To: <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com>
References:  <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net> <CAOtMX2gf3AZr1-QOX_6yYQoqE-H+8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com> <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net> <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it> <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net> <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de> <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net> <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com>



Michelle Sullivan
http://www.mhix.org/
Sent from my iPad

> On 30 Apr 2019, at 19:50, Xin LI <delphij@gmail.com> wrote:
>
>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <michelle@sorbs.net> wrote:
>> but in my recent experience 2 issues colliding at the same time results in disaster
>
> Do we know exactly what kind of corruption happened to your pool?  If you see it twice in a row, it might suggest a software bug that should be investigated.

All I know is it's a checksum error on a metaslab (122), and from what I can gather it's the spacemap that is corrupt... but I am no expert.  I don't believe it's a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive.  ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged, nor were the drives or controller.
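
In case anyone wants to poke at the same structures, this is roughly how I've been looking at it.  A sketch from memory, so double-check the flags against zpool(8) and zdb(8); "storage" is the pool name:

  # import read-only and without mounting, so nothing gets rewritten
  zpool import -o readonly=on -N -f storage
  # show per-vdev metaslab layout; each extra -m adds space map detail
  zdb -mmm storage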

>
> Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer-grade crappy hardware (non-ECC RAM, with several faulty, single hard drive pools: bad enough to crash almost monthly and damage my data from time to time),

This was a top-end consumer-grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures).  Uptime would have been years if it weren't for patching.

> I've never seen a corruption this bad and I was always able to recover the pool.

So far, same.

> At a previous employer, the only case where we had a pool corrupted to the point that mounting was not allowed was when two host nodes happened to import the pool at the same time, which is a situation that can be avoided with SCSI reservations; their hardware was of much better quality, though.
>
> Speaking of a tool like 'fsck': I think I'm mostly convinced that it's not necessary, because at the point ZFS says the metadata is corrupted, it means that the metadata was really corrupted beyond repair (all replicas were corrupted; otherwise it would recover by finding the right block and rewriting the bad ones).

I see this message all the time and mostly agree... actually I do agree, with possibly a minor exception, but it's so minor it's probably not worth it.  However, as I suggested in my original post: the pool says the files are there, so a tool that would send them (a la zfs send) while ignoring errors in the spacemaps etc. would be really useful (to me).
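
The closest I can see to that with existing knobs would be something like the following sketch.  Hedging heavily here: vfs.zfs.recover and vfs.zfs.send_corrupt_data are my guesses at the FreeBSD sysctl names for the zfs_recover / zfs_send_corrupt_data tunables (I haven't verified them on this release), and storage@last-good is a hypothetical pre-existing snapshot:

  # sketch only -- tunable names unverified on this box
  sysctl vfs.zfs.recover=1              # demote some fatal metadata errors to warnings
  sysctl vfs.zfs.send_corrupt_data=1    # stream unreadable blocks as a fill pattern
  zpool import -o readonly=on -N -f storage
  zfs send -R storage@last-good | zfs receive -F backup/storage
  # (a read-only import means only snapshots that already exist can be sent)

Whether the import even gets that far with this pool is another question; that gap is exactly what such a tool would need to cover.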

>
> An interactive tool may be useful (e.g. "I saw data structure versions 1, 2, 3 available, all with bad checksums; choose which one you want to try"), but I think it wouldn't be very practical for use with large data pools -- unlike traditional filesystems, ZFS uses copy-on-write and depends heavily on the metadata to find where the data is, so a regular "scan" is not really useful.

zdb -AAA showed (shows) 36M files, which suggests the data is intact, but the mount aborts with an I/O error because it says the metadata has three errors: two 'metadata' and one "<storage:0x0>" (storage being the pool name).  The pool does import, and it attempts to resilver, but reports the resilver finished at some 780M (ish); export, import, and it does it all again...  zdb without -AAA aborts while loading metaslab 122.
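
The invocations were along these lines (again from memory, so treat as a sketch; per zdb(8), -A ignores assertions, -AA enables panic recovery, -AAA does both, and -e operates on a pool that isn't imported):

  zdb -e -AAA -d storage    # walks the datasets and counts the files
  zdb -e -d storage         # without -AAA: aborts while loading metaslab 122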

>
> I'd agree that you need a full backup anyway, regardless of what storage system is used, though.

Yeah... unlike UFS, which has to get really, really hosed before you're restoring from backup with nothing recoverable, it seems ZFS can get hosed when issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable).

Michelle


