Date: Mon, 1 Feb 2021 15:58:42 +0100
From: Polytropon
Organization: EDVAX
To: "Matt Emmerton"
Subject: Re: Help recovering damaged drive - fsck segfaults, read-only mount looks ok
X-BeenThere: freebsd-questions@freebsd.org

On Sun, 31 Jan 2021 17:12:40 -0500, Matt Emmerton wrote:
> Hi,
>
> I have a FreeBSD-11 machine that I recently upgraded to FreeBSD-12. It has
> a Sii RAID-1 pair of 1TB drives.
> A week ago this system got unexpectedly powered off and when it came back
> up, mount refuses to mount my RAID-1 FS because it is dirty.
> fsck runs, but segfaults. It's clear that the corruption is confusing fsck
> and causing the trap.

First of all: This sounds a lot like the problem that initially brought
me to the FreeBSD mailing list, so maybe the archives and my memory can
help you. I'm not sure if it is really _the same_ kind of problem, but
at least it could provide some inspiration for further experiments.

> If I force a mount in readonly mode, I can inspect the drive and at first
> glance, everything seems valid. Since this machine is used for backups, I
> have lots of other metadata (eg, checksums) and I'm slowly working through to
> see if anything important is damaged.

At this point: STOP. If your data is important to you, get a copy of
it NOW. A forced r/o mount is a good chance to read your data. Copy
everything you are interested in, because in the worst case, you might
have to re-initialize the whole filesystem (newfs), which implies data
loss. Make sure you're prepared for such an event. In all honesty: I
wasn't, and I regret it. Always remember the purpose of backups: You
don't need them until it's too late. :-)

> From some of the stuff that fsck is finding, it's clear that the corruption
> is in a rather large-and-deep directory tree that was recently deleted.
> It's possible that the 'rm -rf' for this was running in the background when
> the system lost power.

In that case, files that were deleted (or only "scheduled for deletion")
may still be visible in the r/o mount. This "delay" once helped me
recover accidentally deleted files (stupid wildcard + fat fingers +
brain already asleep): I turned off the power, booted into single-user
mode (SUM), mounted the filesystem read-only, copied the files (still
there!), ran fsck (files were gone), and then copied the files back
into place. As if nothing happened... :-)

> Is there any way to have fsck be more "selective" in what it checks/repairs?
> It's been a long time since I've done low-level filesystem surgery, but it
> seems to me that if I can prevent it from going off into the weeds (and
> trying to repair inode entries that are no longer relevant), all will be
> well.

Yes. There is a "preen mode" (fsck -p), in which fsck only repairs a
small set of expected inconsistencies and stops if it finds anything
more serious, and a forced mode (fsck -f), which checks the filesystem
even if it is marked clean. Be careful with -y (answer "yes" to every
question): it does not always do what you want it to do, and data loss
might happen. See "man fsck" for details.

> Any advice? I have thought about doing some inspection with "ls -i" and
> then being very selective in the inodes I get fsck to repair, but that seems
> challenging to get right.

And _that_ is how I finally got my files back (after my initial "severe
data loss" problem more than 10 years ago): With ls -i, I determined
the inode of an offending directory, then used fsdb (which I found out
about by reading a reference manual for a GDR UNIX system) to remove
it, and _then_ (!) fsck was able, after two runs, to bring the
filesystem back to a consistent state. The offending directory was
.snap at the root of the filesystem. Once it was gone, fsck worked as
expected.

Also note that fsck _might_ have problems (or require a second run)
when dealing with soft updates and the UFS journal.

If fsck encounters an inode that is still allocated but no longer
referenced by any directory entry, it will place it in the lost+found/
directory at the root of the filesystem. It is possible that the whole
deleted tree appears there, so check this location once the system has
come up properly. You can then delete its content if you wanted to
delete those files anyway.

Up to that point, I had already read McKusick's UFS paper, the code of
fsck_ffs (UFS fsck) and many other resources about how things worked;
I had modified the fsck program, debugged it, and examined dumps; I had
learned data recovery tools (such as TSK and "UFS Explorer"), forensic
strategies and "What you should have done", yet I still couldn't find
out why fsck had "hiccups" and could not proceed.
None of them mentioned that a directory entry (for a feature that I
had never used!) could be the problem.

At least in my case, I got all (!) my data back; just a few hundred
filenames were missing (the file data was still there, only the names
were gone), but from the content it was no problem to restore the
names of those files that mattered.

However, it's possible that you're facing an entirely different
problem where fsck won't be able to get the filesystem back into a
consistent state, and backing up, running newfs, and restoring is your
only option.

All the best, and I hope you can solve that problem. It's one of those
rare cases that teach you a lot about how the UFS filesystem works. :-)

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...