Date:      Sat, 10 Mar 2018 10:08:23 -0800
From:      Cy Schubert <Cy.Schubert@cschubert.com>
To:        rgrimes@freebsd.org
Cc:        Warner Losh <imp@bsdimp.com>, Ian Lepore <ian@freebsd.org>, Mark Johnston <markj@freebsd.org>, David Bright <dab@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r328013 - head/sbin/fsck_ffs
Message-ID:  <201803101808.w2AI8Ntn038591@slippy.cwsent.com>
In-Reply-To: Message from "Rodney W. Grimes" <freebsd@pdx.rh.CN85.dnsmgr.net> of "Sat, 10 Mar 2018 09:51:43 -0800." <201803101751.w2AHphph070578@pdx.rh.CN85.dnsmgr.net>

In message <201803101751.w2AHphph070578@pdx.rh.CN85.dnsmgr.net>, 
"Rodney W. Grimes" writes:
> > On Sat, Mar 10, 2018 at 10:26 AM, Ian Lepore <ian@freebsd.org> wrote:
> > 
> > > On Sat, 2018-03-10 at 09:02 -0800, Rodney W. Grimes wrote:
> > > > >
> > > > > On Sat, 2018-03-10 at 08:44 -0800, Rodney W. Grimes wrote:
> > > > > >
> > > [...]
> > > > > > > add "-T ffs:-R" to the initial fsck invocation in rc.d/fsck.
> > > > > > Please do not do that.  If fsck -p fails YOU may optionally
> > > > > > wish to continue, or do retries, but please do not make this
> > > > > > a hardcoded situation.  At most, make it a controllable knob
> > > > > > that defaults to the old behavior, please.
> > > > > >
> > > > > > Thank you,
> > > > > This whole situation with fsck retries is just very strange.  How
> > > > > many other tools in the base system exhibit this behavior:
> > > > >
> > > > >     I didn't do everything you asked, even though I am completely
> > > > >     capable of doing so.  If you'd like to actually do the thing
> > > > >     you asked for, please run this program again.
> > > > >
> > > > > If there is some reason why fsck should do less than a complete job
> > > > > under some circumstances, isn't THAT the exceptional situation that
> > > > > should need a special flag to make it happen?
> > > > The job is "make sure my data is ok, keep my data at all costs, do
> > > > not, however, do something that may damage my data".
> > > >
> > > > The job is NOT "do everything you can to bring the file system to
> > > > a consistent state, even if you have to screw my data all up".
> > > >
> > >
> > > I'm not sure why you think the -R flag is some sort of "ruin my data"
> > > request.  Maybe because all of this stuff is so scantily documented in
> > > the manpage?
> > >
> > >     -R Instruct fsck_ffs to restart itself if it encounters certain
> > >      errors that warrant another run.
> > >
> > > Who knows what "certain errors" means?
> > >
> > 
> > There are some classes of errors that, once fsck corrects them, require it
> > to recompute a large amount of state to make sure it is consistent. Rather
> > than doing that, it exits with a message saying to re-run fsck to make sure
> > that there aren't more errors that were hidden by the now-corrected errors
> > from the previous pass.
> > 
> > 
> > > Looking at the code, it appears -R has no effect if you're in preen
> > > mode.  Hmmm, what's "preen mode" again?  Don't bother looking in the
> > > man page, you'll just find a bunch of mentions of the word preen that
> > > say "see the -p flag" and then, surrealistically, when you look at the
> > > -p flag it says "Preen file systems (see above)".  Of course, what was
> > > above was all the places that told you to see -p.
> > >
> > 
> > The man page could use some improvement. Preen mode means 'fix all the
> > stupid inconsistencies that crop up that never result in data loss'.
> > Non-preen mode means to do that, and also to ask if you want to correct
> > other errors that usually don't cause data loss but might, where some
> > modicum of human intelligence is required to tell the two apart. E.g., I
> > usually give up hitting 'y' after a dozen or so times in fsck unless I have
> > a specific reason to keep going. fsck -y has no such nuance.
>
> I do not believe that normal mode has any intelligence as to whether data
> loss will or will not occur.  It will gladly ask you if you want to
> clear an inode that is the root of a rather large tree, and you end
> up with either data loss or a huge lost+found, sometimes even
> overflowing the size of lost+found (though that may have been fixed in ufs2).
>
> It simply runs along, and if it finds an error it asks if you want
> to correct it or not.  Y is not always the correct answer, but
> most people are oblivious to what the questions imply with respect
> to the file system, and hence answer Y.  fsck does do things in
> a sequence that tries to make Y the correct answer, but as you
> say, human intelligence may do better.
>
> Sometimes, if you had answered N at the right question, you would not
> have gotten all of the other 11 questions that led you to giving up;
> sometimes the N answer may be hundreds of Y's in, often to a clear
> inode question.
>
> When I get a preen failure my usual next step is to run a logged
> fsck -n to see what that says so I can evaluate the extent of fs
> damage, especially if this is a critical file system containing
> very valuable data.  
>
> > Warner
> > 
> > 
> > > So, I guess I'll just keep using fsck_y_enable=YES and relying on the
> > > fact that by default that now includes the -R option.
>
> And if you're running ufs2 with soft updates you're in a
> pretty safe place.  I would not recommend doing this on ufs1
> or without soft updates enabled.
>
> One must try to remember that fsck -p during /etc/rc processing can
> run into many different file systems, some more resilient to running
> things like fsck -R -y, some not.

Having been in this situation with FreeBSD, Solaris, Linux, and many 
other operating systems, I've found that if you have more than a reasonable 
number of inodes that need to be cleared and you are time constrained, as 
one usually is in a system-down situation, you're better off simply 
recovering from backup. In those situations data loss is usually 
unrecoverable. In my experience it comes down to: do I bite the bullet 
now or do I continue to waste precious time?

Having said that, if you have the time and recovery from backup is too 
expensive, you can use a binary editor to avoid data loss. A co-worker and 
I once spent 28 hours on five mainframe filesystems. Management chose this 
approach because, even though there were backups, the customer would have 
lost 24 hours of transactions. This was partially successful: all data 
except for one customer database was recovered.




-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org

	The need of the many outweighs the greed of the few.




