To: freebsd-questions@freebsd.org
From: Michael Powell
Subject: Re: pwd.db/spwd.db file corupption when having unsafe system poweroff
Date: Tue, 16 Apr 2013 16:45:59 -0400

Tak Tak wrote:

> hi everyone,
>
> i wanna know what exactly happens for freebsd files and processes,
> when we shutdown system via pressing hardware power key for 3 seconds?
>
> here's what has happened to me, recently:
> i've faced a strange problem..
> on one of my bsd servers, one of my
> coworkers had defined and edited some system users, and then, instead
> of safe shutdown, he kept pressing power-button for 3 seconds!..
> after next startup, we couldn't login anymore! we had to replace
> pwd.db and spwd.db files, via bootable-freebsd Fixit mode, and then,
> everything was fine!
>
> we know that we are, for sure, better to use safe shutdown, but i
> can't guarantee it always happens. what if sudden power off makes same
> problem??so i can't leave my servers in such situations..
>
> My questins are:
> what has happened exactly? just in-used corrupted files ??
> is there any way to prevent this situation? (instead of having a
> read-only FS.. i can't apply it on this server for now..).
>
> i'm sorry if my question seems dummish! i'm trying to increase my bsd
> knowledge, but i'm just on my way..
>
> for sure, i appreciate any ideas or answers :)

At the risk of illustrating what I'm fuzzy on, possibly those with more
in-depth skill can fill in the blanks or tidy up with more accurate and
complete details.

Pressing the power button for 3 seconds as described invokes the ACPI
layer to trigger call(s) down to the system BIOS. Whatever is set in
the BIOS wrt power control and the various power-saving modes is passed
through the ACPI layer. The problem with this is that the acpi module in
FreeBSD may, or may not, be a perfect implementation for every possible
piece of hardware in existence. The part that really concerns me is
that individual manufacturer BIOS quirks can be just enough 'off' to
misbehave even when the FreeBSD acpi implementation is basically sound.

The gist of this (IMHO here - YMMV) is that I consider it bad procedure
to turn off a server as you've described. Use the shutdown command
properly instead. I would never do what your coworker did to any of my
servers. The caveat is that sometimes you have no other choice but to do
a hard power-down.
A hard power-down is done with the switch on the power supply, not by
going through the ACPI/BIOS via the power button on the front.

When you do have an 'uh-oh' like this, FreeBSD normally boots back into
an unclean file system, with corresponding whinings and complaints about
how the file system(s) were not properly dismounted. Normally a
background fsck ensues after 60 seconds of idle. In your case this
background fsck, had it been allowed to run and complete, would have
cleaned up whatever files were left open and not properly closed. The
problem starts when someone presses the power button again, and again,
before this process completes. Using the power button ACPI/BIOS only
compounds the situation.

I have had, at one time or another, power failures that occurred almost
back to back, with only a few minutes in between. What happened was
that on the first boot after power came back, the power went down again
right in the middle of this background fsck. Two more of these and my
file system(s) were in pretty not-so-good shape. Luckily I was running
gmirror and one of the drives was consistent. So the mirror got rebuilt
automagically from the drive with the consistent file system (takes a
while), then the system continued to boot, and then the background fsck
finally kicked in. Gmirror saved my bacon here. Journaling is also
supposed to provide similar error recovery features. I've had this
happen twice, on 2 different boxen. Needless to say, 2 broken UPS units
were scrapped and replaced as a result.

I would recommend you do NOT use the power button as you described
above. Period. In any event, pay particular attention to that very first
boot after an 'uh-oh' power-off event. Look at top and watch for the
background fsck to kick off and complete, returning the machine to a
quiescent state, BEFORE you do ANYTHING else to it. This includes
pressing the button on the front.
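
For anyone who would rather script the "watch for the background fsck"
step than stare at top, here is a rough Python sketch. It is only an
illustration under assumptions: it assumes the checker shows up in `ps`
under a command name that begins with "fsck" (for example fsck_ufs), and
the function names are made up for this example. It only detects an fsck
that is already running, so it will not notice the idle delay before the
background fsck starts.

```python
import subprocess
import time


def fsck_running(ps_text):
    """Return True if any command name in the ps output starts with 'fsck'.

    Assumption: the background checker appears under a name such as
    fsck, fsck_ufs or fsck_ffs.
    """
    for line in ps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Drop any leading directory so /sbin/fsck_ufs also matches.
        name = fields[0].rsplit("/", 1)[-1]
        if name.startswith("fsck"):
            return True
    return False


def list_commands():
    """Return the command name of every process, one per line, via ps."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_fsck(poll_seconds=10, lister=list_commands):
    """Block until no fsck process is listed; return how many polls it took."""
    polls = 1
    while fsck_running(lister()):
        time.sleep(poll_seconds)
        polls += 1
    return polls
```

Calling wait_for_fsck() a few minutes after that first boot would block
until no fsck process is listed any more. The lister argument exists
only so the polling logic can be exercised without a real process table.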

Just my $.02 - but I've had a couple of experiences like this and
survived them successfully by doing things my way.

-Mike