Date:      Tue, 16 Apr 2013 16:45:59 -0400
From:      Michael Powell <nightrecon@hotmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: pwd.db/spwd.db file corupption when having unsafe system poweroff
Message-ID:  <kkkda5$vm9$1@ger.gmane.org>
References:  <CAHHq%2BVwcazbVXDDsZqH1AXxVOu0mfGjT_5Tcj3OoHJroe8Kgdg@mail.gmail.com>

Tak Tak wrote:

> hi everyone,
> 
> i wanna know what exactly happens for freebsd files and processes,
> when we shutdown system via pressing hardware power key for 3 seconds?
> 
> here's what has happened to me, recently:
> i've faced a strange problem.. on one of my bsd servers, one of my
> coworkers had defined and edited some system users, and then, instead
> of safe shutdown, he kept pressing power-button for 3 seconds!..
> after next startup, we couldn't login anymore! we had to replace
> pwd.db and spwd.db files, via bootable-freebsd Fixit mode, and then,
> everything was fine!
> 
> we know that we are, for sure, better to use safe shutdown, but i
> can't guarantee it always happens. what if sudden power off makes same
> problem??so i can't leave my servers in such situations..
> 
> My questins are:
> what has happened exactly? just in-used corrupted files ??
> is there any way to prevent this situation? (instead of  having a
> read-only FS.. i can't apply it on this server for now..).
> 
> i'm sorry if my question seems dummish! i'm trying to increase my bsd
> knowledge, but i'm just on my way..
> 
> for sure, i appreciate any ideas or answers :)

At the risk of illustrating what I'm fuzzy on: those with more in-depth
knowledge can perhaps fill in the blanks or tidy this up with more accurate
and complete details.

Pressing the power button for several seconds as described invokes the ACPI
layer, which makes call(s) down to the system BIOS. Whatever is set in the
BIOS with respect to power control and the various power-saving modes is
passed through the ACPI layer. The problem is that the acpi module in FreeBSD
may or may not be a perfect implementation for every possible piece of
hardware in existence. What really concerns me is that individual
manufacturers' BIOS quirks can be just enough 'off' to cause misbehavior even
when the FreeBSD acpi implementation is basically sound. The gist (IMHO -
YMMV) is that I consider it bad procedure to turn off a server as you've
described. Use the shutdown command properly instead. I would never do what
your coworker did to any of my servers. The caveat is that sometimes you have
no other choice but to do a hard power-down. A hard power-down is done with
the switch on the power supply, not through ACPI/BIOS by pressing the power
switch on the front.
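
For reference, a minimal sketch of what I mean by using shutdown properly.
The -p flag (halt and power off) is from FreeBSD's shutdown(8). The
clean_poweroff wrapper and the DO_IT variable are names I made up for this
example, so that pasting it into a shell does not take the box down by
accident.

```shell
#!/bin/sh
# clean_poweroff and DO_IT are hypothetical names used only for this sketch.
clean_poweroff() {
    # shutdown -p now: run the normal shutdown sequence, then power off
    cmd="shutdown -p now"
    if [ "${DO_IT:-no}" = "yes" ]; then
        $cmd
    else
        # Default: dry run, only show what would be executed
        echo "would run: $cmd"
    fi
}
```

Calling clean_poweroff on its own just prints the command. Running it as root
with DO_IT=yes set performs the real shutdown.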

When you do have an 'uh-oh' like this, FreeBSD normally boots back into an
unclean file system, with corresponding whining and complaints that the
file system(s) were not properly dismounted. Normally a background fsck
ensues after a 60-second delay. In your case this background fsck, had it
been allowed to run to completion, would have cleaned up whatever files were
left open and not properly closed. The problem starts when someone presses
the power button again, and again, before this process completes. Using the
power button via ACPI/BIOS only compounds the situation.
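
The fsck behaviour at boot is controlled from /etc/rc.conf. Below is a sketch
of the relevant knobs as I understand them; check rc.conf(5) on your release.
Forcing a foreground check is my assumption about what might suit a box that
gets powered off carelessly, not something taken from the original poster's
setup.

```shell
# /etc/rc.conf fragment (values are illustrative)
background_fsck="NO"        # check file systems in the foreground during boot
background_fsck_delay="60"  # seconds before a background fsck starts (used only when background_fsck="YES")
fsck_y_enable="YES"         # if the initial preen fails, rerun fsck with -y
```

With background_fsck="NO" the boot takes longer after an unclean shutdown,
but the machine does not come up for use until the file systems have been
checked.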

At one time or another I have had power failures that occurred almost back
to back, with only a few minutes in between. On the first boot after power
came back, the power went down again right in the middle of the background
fsck. Two more of these and my file system(s) were in pretty not-so-good
shape. Luckily I was running gmirror and one of the drives was consistent.
So the mirror got rebuilt automagically from the drive with the consistent
file system (takes a while), then the system continued to boot, and then the
background fsck finally kicked in. Gmirror saved my bacon here. Journaling is
also supposed to provide similar error recovery features. I've had this
happen twice, on two different boxen. Needless to say, two broken UPS units
were scrapped and replaced as a result.

I would recommend you do NOT use the power button as you described above.
Period. In any event, pay particular attention to that very first boot after
an 'uh-oh' power-off event. Look at top and watch for the background fsck to
kick off and complete, returning the machine to a quiescent state, BEFORE you
do ANYTHING else to it. This includes pressing the button on the front.
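
If you would rather not sit staring at top, a small polling loop can do the
watching. This is only a sketch: fsck_running and wait_for_fsck are
hypothetical helper names, and it assumes the checker shows up as a process
with "fsck" in its name (fsck, fsck_ufs, fsck_ffs). The background fsck does
not start until its delay has passed, so start this once you have seen fsck
appear; otherwise it returns immediately.

```shell
#!/bin/sh
# fsck_running and wait_for_fsck are hypothetical helper names for this sketch.

# Succeeds while any process whose name contains "fsck" exists.
fsck_running() {
    pgrep fsck >/dev/null 2>&1
}

# Poll until no fsck process is left.
# $1 = seconds to sleep between polls (default 10).
wait_for_fsck() {
    interval="${1:-10}"
    while fsck_running; do
        sleep "$interval"
    done
    echo "no fsck running"
}
```

Something like "wait_for_fsck 30" will then block until the check is done,
after which it should be safe to carry on working on the machine.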

Just my $.02 - but I've had a couple of experiences like this and survived 
them successfully by doing things my way.

-Mike