Subject: Re: ZFS...
From: Michelle Sullivan <michelle@sorbs.net>
To: Steven Hartland
Cc: Paul Mather, freebsd-stable
Date: Thu, 02 May 2019 09:46:22 +1000

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 02 May 2019, at 03:39, Steven Hartland wrote:
>
>> On 01/05/2019 15:53, Michelle Sullivan wrote:
>> Paul Mather wrote:
>>>> On Apr 30, 2019, at 11:17 PM, Michelle Sullivan wrote:
>>>>
>>>> Been there, done that, though with ext2 rather than UFS... still got all my data back, even though it was a nightmare.
>>>
>>> Is that an implication that had all your data been on UFS (or ext2) this time around you would have got it all back? (I've got that impression through this thread from things you've written.) That sort of makes it sound like UFS is bulletproof to me.
>>
>> It's definitely not bulletproof (far from it); however, when the data on disk is not corrupt I have managed to recover it, even if it has been a nightmare: no structure, all files in lost+found, etc., or even resorting to R-Studio in the event of lost RAID information.
>
> Yes, but you seem to have done this with ZFS too, just not in this particularly bad case.
>
There is no R-Studio for ZFS, or I would have turned to it as soon as this issue hit.

> If you imagine that the in-memory update for the metadata was corrupted and then written out to disk, which is what you seem to have experienced with your ZFS pool, then you'd be in much the same position.
>>
>> This case, from what my limited knowledge has managed to fathom, is a spacemap that has become corrupt due to a partial write during the hard power failure. This was the second hard outage during the resilver process following a drive platter failure (on a RAIDZ2, so a single platter failure should be completely recoverable in all cases, barring HBA failure or other corruption, which does not appear to be the case). The spacemap fails its checksum (no surprise there, given it was part-written), yet it cannot be repaired, for whatever reason.
>>
>> Now, I get that this is an interesting case: one cannot just assume anything about the corrupt spacemap. It could be complete with only the checksum wrong, or it could be completely corrupt and ignorable. But as I understand ZFS (and please, watchers, chime in if I'm wrong), the spacemap is just the free-space map. If it is corrupt or missing, one cannot simply 'fix it', because there is a very good chance the fix would corrupt something that is actually allocated; the safest "fix" would be to treat the region it covers as 100% full and therefore 'dead space'. ZFS doesn't do that (probably a good thing), and the result is that a drive that is supposed to be good (and on which zdb reports some 36M+ objects) becomes completely unreadable.
>>
>> My thought (desire/want) for a 'walk' tool would be a last-resort tool that could walk the datasets and send them elsewhere (like zfs send), so that I could create a new pool elsewhere, send the data it knows about to that pool, and then blow away the original. If there are corruptions or data missing, that's my problem; it's a last resort. But in the case where the critical structures become corrupt, it gives you a local recovery option: if the data is all there and the corruption is just a spacemap, you can transfer the entire drive's data to a new pool whilst the original host is rebuilt. This would *significantly* help people with large pools who currently have to blow them away and re-create them because of errors/corruption, and with the addition of rsync-style per-file checksumming it would be trivial to 'fix' just the corrupted or missing data from a mirror host rather than transferring the entire pool from (possibly) offsite.
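(To make the 'walk the datasets and send them elsewhere' idea concrete: when a damaged pool will still import, a crude dataset-by-dataset copy can already be scripted around zfs send. The names "oldpool", "newpool", "@rescue" and the log path below are made up for illustration; the point of the proposed tool is precisely the case where this sketch falls over, i.e. when the pool's own structures are too damaged for snapshots or send to work.)

    # Rough sketch only: assumes the damaged pool imported (datasets need not be
    # mountable, since zfs send reads from snapshots, not mounts) and that a
    # healthy destination pool exists.
    zfs snapshot -r oldpool@rescue
    # Copy dataset by dataset, skipping the pool root, and keep going past
    # failures instead of aborting the whole transfer.
    zfs list -H -o name -r oldpool | tail -n +2 | while read -r ds; do
        zfs send "${ds}@rescue" | zfs receive -u -d newpool \
            || echo "send failed for ${ds}" >> /tmp/rescue-failures.txt
    done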
> From what I've read that's not a partial write issue, as in that case the pool would have just rolled back. It sounds more like the write was successful but the data in that write was trashed due to your power incident, and that was replicated across ALL drives.
>
I think this might be where the problem started. It was already rolling back from the first power issue (it did exactly what was expected and programmed: it rolled back 5 seconds, which, as no-one had had write access to it since the start of the resilver, I really didn't care about; the only changes were the resilver itself). Your assertion/musing may be correct and all drives got trashed data; I think not, but unless we get into it and examine it, we won't know. What I do know is that in the second round -FfX wouldn't work, so I used zdb to locate a "LOADED" MOS and used -t to import. The txg number was 7 or 8 back from current, so just outside the -X limit (going off memory here, so it could have been more, but I remember it was just past the switch's limit).
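(For anyone landing on this thread later, the zdb/txg exercise above looks roughly like the following. The device path, pool name and txg number are placeholders, -X is the undocumented "extreme rewind" flag already mentioned in this thread, and exact behaviour varies between ZFS versions.)

    # List the uberblocks recorded in each vdev label, with their txg numbers,
    # to find a candidate transaction group to roll back to:
    zdb -lu /dev/da0p3

    # Ask zdb to open the un-imported pool using no uberblock newer than the
    # chosen txg, and see whether the MOS and other metadata load cleanly:
    zdb -e -t 1234567 storage

    # A forced, read-only, no-mount import attempt with rewind (-X must be
    # combined with -F):
    zpool import -o readonly=on -N -f -FX storage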
> To be clear, this may or may not be what you're seeing, as you don't seem to have covered any of the details of the issues you're seeing, nor the detailed steps you have tried to recover with?
There have been many steps over the last month, and some of them may have taken this from very difficult to recover to non-recoverable by now... though the only writes are whatever the kernel does on import, as I have not had it (the dataset) mounted at any time, even though the pool has imported.
>
> I'm not saying this is the case, but all may not be lost depending on the exact nature of the corruption.
>
> For more information on space maps see:
> https://www.delphix.com/blog/delphix-engineering/openzfs-code-walk-metaslabs-and-space-maps
This is something I read a month ago, along with multiple other articles on the same blog, including https://www.delphix.com/blog/openzfs-pool-import-recovery which, I might add, got me from non-importable to importable, but not mountable.

I have *not* attempted to bypass the checksum check on the space map load to date, as I see that as a possible way to make the problem worse.
> https://sdimitro.github.io/post/zfs-lsm-flushing/
Not read this one.
>
> A similar behavior turned out to be a bug:
> https://www.reddit.com/r/zfs/comments/97czae/zfs_zdb_space_map_errors_on_unmountable_zpool/
>
Or this... will go there after "pressing send". :)
> Regards
> Steve
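P.S. For anyone who wants to look at the damaged space maps without risking further writes: zdb only ever reads the pool, so it can be pointed at an un-imported pool to dump per-metaslab space map state (pool name again a placeholder):

    # -e: operate on a pool that is not currently imported
    # -m: show each vdev's metaslabs and their space maps; repeat the flag
    #     (-mm) for extra detail
    zdb -e -mm storage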