Subject: Re: ZFS...
From: Michelle Sullivan <michelle@sorbs.net>
Date: Tue, 30 Apr 2019 20:14:19 +1000
To: Xin LI
Cc: rainer@ultra-secure.de, owner-freebsd-stable@freebsd.org, freebsd-stable, Andrea Venturoli

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 30 Apr 2019, at 19:50, Xin LI wrote:
>
>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote:
>> but in my recent experience 2 issues colliding at the same time results in disaster
>
> Do we know exactly what kind of corruption happened to your pool? If you see it twice in a row, it might suggest a software bug that should be investigated.

All I know is it’s a checksum error on a metaslab (122), and from what I can gather it’s the space map that is corrupt... but I am no expert.
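(For anyone who wants to poke at the same structures: zdb can dump metaslab and space map state without a writable import. This is roughly what I have been running; it assumes the pool (storage) is exported, and exact flags vary between FreeBSD/ZFS versions, so treat it as a sketch rather than gospel:

    # summary of each metaslab's offset, space map and free space,
    # read from the exported pool
    zdb -e -m storage

    # each additional m prints progressively more space map detail
    zdb -e -mm storage

For me it is metaslab 122 that zdb trips over while loading, as below.)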
I don’t believe it’s a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive. ...and after the first outage a second occurred (same as the first, but more damaging to the power hardware)... the host itself was not damaged, nor were the drives or controller.

> Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with several faulty drives; a single hard drive pool bad enough to crash almost monthly and damage my data from time to time),

This was a top-end consumer grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it weren’t for patching.

> I've never seen a corruption this bad and I was always able to recover the pool.

So far, same.

> At a previous employer, the only case where we had the pool corrupted badly enough that mount was not allowed was because two host nodes happened to import the pool at the same time, which is a situation that can be avoided with SCSI reservations; their hardware was of much better quality, though.
>
> Speaking of a tool like 'fsck': I think I'm mostly convinced that it's not necessary, because at the point ZFS says the metadata is corrupted, it means that the metadata was really corrupted beyond repair (all replicas were corrupted; otherwise it would recover by finding the right block and rewriting the bad ones).

I see this message all the time and mostly agree... actually I do agree, with possibly a minor exception, but it's so minor it’s probably not worth it. However, as I suggested in my original post: the pool says the files are there, so a tool that would send them (a la zfs send) while ignoring errors in space maps etc. would be really useful (to me).

> An interactive tool may be useful (e.g. "I saw data structure versions 1, 2, 3 available, and all with bad checksums; choose which one you would want to try"), but I think they wouldn't be very practical for use with large data pools -- unlike traditional filesystems, ZFS uses copy-on-write and heavily depends on the metadata to find where the data is, and a regular "scan" is not really useful.

zdb -AAA showed (and still shows) 36M files, which suggests the data is intact, but the mount aborts with an I/O error because it says the metadata has three errors: two ‘metadata’ and one “storage” (storage being the pool name). The pool does import, and it attempts to resilver, but reports the resilver finishes at some 780M (ish)... export, import, and it does it all again... zdb without -AAA aborts loading metaslab 122.

> I'd agree that you need a full backup anyway, regardless of what storage system is used, though.

Yeah... unlike UFS, which has to get really, really hosed before you're restoring from backup with nothing recoverable, it seems ZFS can get hosed when issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable).

Michelle
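P.S. For anyone who finds this in the archives: the nearest existing thing to the "zfs send that ignores errors" tool I am wishing for above is a read-only import followed by copying whatever is readable off with userland tools. A sketch only, assuming the pool is named storage; note that -F rewinds can discard recent transactions, and flags vary by version:

    # dry run: report whether discarding the last few transactions
    # would make the pool importable, without actually doing it
    zpool import -F -n storage

    # import read-only so nothing (including a resilver) writes to the pool
    zpool import -o readonly=on -f storage

    # then copy out what is readable with tar/rsync/cp instead of zfs send
    tar -C /storage -cf - . | tar -C /backup -xf -

Whether that gets past a corrupt space map in any given case is, of course, another question.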