Date:      Tue, 12 Dec 2017 22:06:35 -0700
From:      Gary Aitken <freebsd@dreamchaser.org>
To:        Polytropon <freebsd@edvax.de>, Adam Vande More <amvandemore@gmail.com>
Cc:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Subject: Thunderbird causing system crash, need guidance
Message-ID:  <603b487e-d1b7-eb98-6bcd-f2c2c6d3b843@dreamchaser.org>
In-Reply-To: <20171212200126.3ddf75e5.freebsd@edvax.de>
References:  <201712110045.vBB0jCTQ078476@nightmare.dreamchaser.org> <CA+tpaK0sG31TckxL8orNmAD0ZXSz7rJzEotjsCEtASw9u2COZg@mail.gmail.com> <38e2ef70-fa1b-25bf-4447-752006418d0a@dreamchaser.org> <20171211135803.d1aff6c8.freebsd@edvax.de> <5fbcd05c-ce12-b1a4-a9e9-79276dad7183@dreamchaser.org> <20171212200126.3ddf75e5.freebsd@edvax.de>

On 12/12/17 12:01, Polytropon wrote:
> On Tue, 12 Dec 2017 11:30:26 -0700, Gary Aitken wrote:
>> On 12/11/17 05:58, Polytropon wrote:
>>> On Sun, 10 Dec 2017 21:56:16 -0700, Gary Aitken wrote:
>>>> On 12/10/17 19:02, Adam Vande More wrote:
>>>>> On Sun, Dec 10, 2017 at 6:45 PM, Gary Aitken wrote:
>> <snip>
>>>> However, I'm confused. Upon reboot, the system checks to see if
>>>> file systems were properly dismounted and is supposed to do an
>>>> fsck.  Since those don't show up in messages, I can't verify
>>>> this, but I'm pretty certain it must have thought it was clean,
>>>> which it wasn't.  (One reason I'm pretty certain is the time
>>>> involved when run manually as you suggested).
>>> 
>>> This is the primary reason for setting
>>> 
>>> background_fsck="NO"
>> 
>> Already had that set for just that reason.
>> 
>>> in /etc/rc.conf - if you can afford a little downtime. The
>>> background fsck doesn't have all the repair capabilities a forced
>>> foreground check has, so it _might_ leave the file system in an
>>> inconsistent state, and the system runs with that unclean
>>> partition.
>>> 
>>>> The file system in question was mounted below "/". Does the
>>>> system only auto-check file systems mounted at "/"?
>>> 
>>> Yes, / is the first file system it checks. The two last fields in
>>> /etc/fstab control what fsck will check, and /etc/rc.conf allows
>>> additional flags for those automatic checks.
>> 
>> The ordering part I understand; what I don't understand is why it
>> (as I recall) rebooted successfully with no warnings in spite of
>> the background_fsck="NO" being set and when one of the disks
>> apparently didn't fsck properly.  I thought it should have halted
>> in single-user mode and waited for me to do a full fsck manually.
>> Unfortunately, the fsck output is not printed to the log, and I
>> logged in as root on the vt0 device, so it had scrolled off by the
>> time I went to look for it.  A good reason never to log into the
>> vt0 device.  Is there any way to get the "transient" boot-time fsck
>> and other messages recorded in the log?
> 
> There is an easy explanation:
> 
> The foreground fsck at boot time can only handle a subset of damages.
> In some cases, you are required to perform a second run of fsck in
> order to fix problems. This is where a forced full fsck is very
> useful (usually in single-user mode).
> 
> You can specify additional flags for boot-time fsck via /etc/rc.conf,
> which are:
> 
> fsck_y_enable="NO"      # Set to YES to do fsck -y if the initial preen fails.
> fsck_y_flags=""         # Additional flags for fsck -y
> background_fsck="YES"   # Attempt to run fsck in the background where possible.
> background_fsck_delay="60" # Time to wait (seconds) before starting the fsck.
> 
> For example, fsck_y_flags="-f" would be such an addition. As you can
> see, an initial preen ("limited fsck") can fail, and the filesystem
> might be in an inconsistent state. This is probably what you've been
> experiencing.
> 
> See "man fsck" for details. :-)

My language skills must be degenerating along with everything else... :-(

I have already read and reread the fsck man page, but thanks for the
fsck_y_flags example.  I'm fairly sure a normal boot fsck should
not have succeeded.

According to the handbook, 12.2.4, if an fsck fails it should drop into
single user.  But it didn't; it simply rebooted with the disk in a messed
up state.  The disk in question is a non-system disk; 3 file systems
mounted at nodes under /hd2 and below, all at pass 4, after all sys disks.
It did this repeatedly; I could consistently repeat the failure and crash
the system, and it would immediately autoboot into multi-user mode,
mounting the problem partition normally rw.  However, when I manually
unmounted the file system, ran "fsck -f" (which corrected numerous errors),
then rebooted, everything was (not surprisingly) once again functioning
properly.
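
To make the pass-number arrangement above concrete, here is a minimal sketch that lists the entries of an fstab-format file that have a non-zero pass number, in pass order. The function name fsck_order is hypothetical, and the sketch only reads the sixth field; it does not reproduce how fsck itself decides whether a file system is clean.

```sh
#!/bin/sh
# Sketch only: list fstab entries with a non-zero pass number, lowest first.
# fsck_order is a hypothetical helper name, not a FreeBSD tool.
# Output columns: pass number, device, mount point.
fsck_order() {
    # Skip comment lines and lines that lack the sixth (pass) field,
    # keep entries whose pass number is greater than zero.
    awk '!/^[[:space:]]*#/ && NF >= 6 && $6 > 0 { print $6, $1, $2 }' "$1" |
        sort -n -k1,1
}
```

Running "fsck_order /etc/fstab" shows at a glance whether a partition is at pass 0 (never checked automatically) or at a later pass such as 4.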

So I'm looking for, if possible:

1. An explanation for the above behavior, which seems inconsistent with
the documented and expected behavior.  The only fsck flag set in rc.conf
is 'background_fsck="NO"'.  Is there some state a disk (or something else,
such as a normal shutdown flag) can be in, however theoretically, in which
a corrupt file system that would fail the normal boot-time fsck in preen
mode is never checked in the first place, even though another disk, the
one containing all the system files, is checked?

2. A way to get the output of boot-time fsck commands recorded in the
system log, so that after a reboot one can check what the heck went on
in terms of the fsck sequence?
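
One possible approach to item 2, offered as a sketch rather than a confirmed answer: the stock FreeBSD /etc/syslog.conf ships with a commented-out "console.info" line that sends writes to /dev/console to /var/log/console.log. The helper below (enable_console_log is a hypothetical name) only shows the edit itself, applied to whatever file is passed in and printed to stdout. Whether early-boot fsck output ends up in that log is an assumption to verify on the system in question.

```sh
#!/bin/sh
# Sketch only: uncomment the "console.info" line in a syslog.conf-style file.
# enable_console_log is a hypothetical helper; it prints the edited file to
# stdout and never modifies its input.
# Assumption: the file has a line like "#console.info   /var/log/console.log".
enable_console_log() {
    conf="$1"
    # Remove the leading '#' (and any blanks after it) from that line only.
    sed 's/^#[[:space:]]*\(console\.info[[:space:]]\)/\1/' "$conf"
}
```

On a real system the log file also has to exist before syslogd will write to it (the comment in the stock syslog.conf says to touch /var/log/console.log and chmod it to mode 600), and syslogd has to be restarted or reloaded afterwards.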

On 12/12/17 11:54, Adam Vande More wrote:
> On Tue, Dec 12, 2017 at 12:30 PM, Gary Aitken
> <freebsd@dreamchaser.org> wrote:
> 
>> The ordering part I understand; what I don't understand is why it (as
>> I recall) rebooted successfully with no warnings in spite of the 
>> background_fsck="NO" being set and when one of the disks apparently 
>> didn't fsck properly.  I thought it should have halted in
>> single-user mode and waited for me to do a full fsck manually.
> 
> That happens if the preen fails.  See the man page I pointed you to.
> There are cases where it can miss things. 

Are you saying if the preen fails, it mounts the file system normally
and continues to boot into multi-user?  According to the man page for
fsck, a failure in preen mode will exit with failure, and the rc.d/fsck
script (by my incompetent reading) will do stop_boot in that case.
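
The reading of rc.d/fsck described above can be sketched as a small dispatch on the exit status of "fsck -p". This is a simplification under stated assumptions, not the actual script: preen_action is a hypothetical name, the numeric codes (0, 4, 8) are taken from one reading of /etc/rc.d/fsck and should be checked against the installed copy, and every other status is lumped together as stop_boot.

```sh
#!/bin/sh
# Sketch only: one reading of how /etc/rc.d/fsck reacts to "fsck -p".
# preen_action is a hypothetical name; the status codes are assumptions.
preen_action() {
    status="$1"         # exit status returned by "fsck -p"
    fsck_y_enable="$2"  # YES or NO, as set in rc.conf
    case "$status" in
    0)  echo "continue" ;;            # preen succeeded, go multi-user
    4)  echo "reboot" ;;              # root file system was modified
    8)  if [ "$fsck_y_enable" = "YES" ]; then
            echo "retry-with-fsck-y"  # second attempt with fsck -y
        else
            echo "stop_boot"          # drop to single-user
        fi ;;
    *)  echo "stop_boot" ;;           # simplification: all other codes
    esac
}
```

Under this reading, "preen_action 8 NO" yields stop_boot, so a boot that goes straight to multi-user would mean "fsck -p" itself exited 0 for that pass, either because it repaired what it saw or because it never examined the file system.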

> You can set:
> 
> fsck_y_enable="YES"
> 
> If you want it to handle a failed preen automatically.

I don't want that; I want it to drop into single-user and let me do a
manual fsck.

> I don't think background fsck presence or lack of had much to do with
> this.  It is a popular whipping boy.
> 
> A regular foreground fsck may not be a bad idea if your system is
> prone to disk errors or hard crashes.  Perhaps sysutils/diskcheckd or a
> periodic entry, but under *normal* circumstances such things are not
> needed.

It's been doing foreground fsck all along, as I had a problem with
background fsck some year(s) ago.  Thanks for the diskcheckd pointer.

Thanks for any help; sorry for the dogged persistence...

Gary


