Date:      Wed, 17 Apr 2013 17:35:44 +0200
From:      Polytropon <freebsd@edvax.de>
To:        nightrecon@hotmail.com
Cc:        freebsd-questions@freebsd.org
Subject:   Re: pwd.db/spwd.db file corupption when having unsafe system poweroff
Message-ID:  <20130417173544.25266cd6.freebsd@edvax.de>
In-Reply-To: <kkkda5$vm9$1@ger.gmane.org>
References:  <CAHHq%2BVwcazbVXDDsZqH1AXxVOu0mfGjT_5Tcj3OoHJroe8Kgdg@mail.gmail.com> <kkkda5$vm9$1@ger.gmane.org>

Allow me a few additions:

On Tue, 16 Apr 2013 16:45:59 -0400, Michael Powell wrote:
> Pressing the power button for 4 seconds as described is invoking the ACPI 
> layer to stimulate call(s) down to the system BIOS.

No. In most (but of course not all) default settings, the
"long press" will forcibly (and with _no_ message to the OS)
turn off the system's power.

The "short press" will emit the ACPI signal to the OS to
deal with the power-off sequence itself.

Still, the button can be programmed differently. For example,
it seems to be common to have this button trigger "ACPI sleep",
"ACPI hibernate" or an "ACPI power save" mode on "short press",
and (as you mentioned) "ACPI power down" on "long press".

But as I said: _What_ the button actually does is defined
in the CMOS setup.

Have a look at this page to find out more about the various
possible signals (power states):

http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Power_states



> Whatever is set in the 
> BIOS wrt power control and various power-savings modes are passed through 
> the ACPI layer. The problem with this is the acpi module in FreeBSD may, or 
> may not, be a perfect implementation for every possible piece of hardware in 
> existence.

This especially applies to laptops, where closing the lid can
also trigger a specific signal, and opening the device again
sends another one. Vendors don't agree on how to do this
"properly", so there are many different ACPI implementations.

% ls /boot/kernel/acpi*
/boot/kernel/acpi.ko*           /boot/kernel/acpi_ibm.ko*
/boot/kernel/acpi_aiboost.ko*   /boot/kernel/acpi_panasonic.ko*
/boot/kernel/acpi_asus.ko*      /boot/kernel/acpi_sony.ko*
/boot/kernel/acpi_dock.ko*      /boot/kernel/acpi_toshiba.ko*
/boot/kernel/acpi_fujitsu.ko*   /boot/kernel/acpi_video.ko*
/boot/kernel/acpi_hp.ko*        /boot/kernel/acpi_wmi.ko*

As this listing shows, FreeBSD only has vendor-specific ACPI
modules for a subset of the hardware out there. Of course there
is a lot of common ground ("fields of compatibility"), but
specific hardware may still not work properly, mostly laptops
and their accessories (like docking stations).



> The piece of that which really concerns me are individual 
> manufacturer BIOS quirks can be just enough 'off' so as to misbehave even when 
> the FreeBSD acpi implementation is basically sound.

Even though I have not experienced that myself, it is certainly
possible. A sloppy ACPI implementation can be the source of many
kinds of trouble, even with such "simple" devices as a power
button.



> The gist of this is (IMHO 
> here - YMMV) is I consider it a bad procedure to turn off a server as you've 
> described.

Definitely. :-)



> Use the shutdown command properly instead. I would never do what 
> your coworker did to any of my servers.

A mechanical guard (e.g., a cover over the power button) could
prevent that.



> Caveat being sometimes you have no 
> other choice but to do a hard power-down. A hard power-down is done by using 
> the switch on the power supply, and not using the ACPI/BIOS from pressing 
> the power switch on the front.

This is also possible. Both this _and_ the default "forced power
off" (the "long press" in many defaults) are equivalent to
pulling the power cord.



> When you do have an 'uh-oh' like this, FreeBSD normally boots back into an 
> unclean file system with corresponding whinings and complaints about how the 
> file system(s) were not properly dismounted.

This is intended behaviour. To prevent further damage and to
make data recovery possible (worst case), the system does not
try to "boot by all means" just to make the (clueless) user
happy. :-)



> Normally a background fsck 
> ensues after 60 seconds of idle.

This _can_ be dangerous, because by then the system has already
booted into a "somehow working" state. Ask yourself: Is my data
important enough that it _needs_ to be consistent, so that I can
invest the time to have _no_ background fsck (i. e., a foreground
fsck, which may ask before doing anything "heavy")? If so, put
background_fsck="NO" in /etc/rc.conf, and wait.
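For reference, the corresponding /etc/rc.conf excerpt is a single line. This is only a sketch of the setting named above; check rc.conf(5) on your release for the exact defaults:

```sh
# /etc/rc.conf excerpt
# After an unclean shutdown, check file systems in the foreground at boot
# instead of booting first and running fsck in the background later.
background_fsck="NO"
```

The default is "YES". While it stays enabled, background_fsck_delay (default 60 seconds) controls how long the system waits after booting before the background check starts.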

When using UFS, there _may_ be file system damage so severe that
fsck will _not_ correct it automatically, and a human has to
decide what to do. Without the proper _user decision_, important
data that could have been saved is often lost. That decision can
only be made in "interactive mode" at system startup.
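Related to this: rc.conf has an fsck_y_enable knob that makes the boot process rerun fsck with -y (answering "yes" to every repair question) if the initial preen fails. A minimal sketch that keeps it at its default, so the decision stays with you:

```sh
# /etc/rc.conf excerpt
# Keep fsck interactive: do NOT automatically answer "yes" to every
# repair question when the initial preen at boot fails.
fsck_y_enable="NO"
```

Since "NO" is already the default, this line mostly documents intent; the point is not to switch it to "YES" on a machine holding important data.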



> In your case whatever files were left open 
> and not properly closed this background fsck, had it been allowed to run and 
> complete, would have cleaned this up.

Maybe, maybe not. It highly depends on what actually happened,
and that is nearly impossible to find out, especially when you
have no control over what the background fsck does (while the
system is already happily running and humming).



> The problem starts when someone 
> presses the power off button again, and again, before this process completes. 
> Using the power button ACPI/BIOS only compounds this situation.

Correct. That's why it is worth investing the time to let fsck
do its work in the foreground, at least after such an abrupt
power-off.



> I would recommend you do NOT use the power button as you described above. 
> Period.

In the case of _servers_, this button is commonly considered an
"emergency button" anyway, and is therefore hardly ever used. :-)



> In any event pay particular attention to that very first boot after 
> an 'uh-oh' power off event. Look at top and watch for the background fsck to 
> kick off and complete, returning the machine to quiescent state BEFORE you do 
> ANYTHING else to it. This includes pressing the button on the front.

Doing "anything else" is exactly the problem with a background
fsck. Let's say the server starts its services, which begin
accessing the partitions that fsck is currently checking. Yes,
I know, snapshots and all that stuff. Sometimes it works.
Sometimes it doesn't. My additional advice would be: Do not use
a background fsck. If you had a power failure (for whatever
reason), take the time to make sure your system boots into a
verified state (NOT: boots into a questionable state, tries to
verify it during normal operations, and pretends "everything
is fine").





-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


