From: Polytropon
To: freebsd@dreamchaser.org
Cc: FreeBSD Mailing List
Date: Sat, 18 Aug 2012 03:05:37 +0200
Subject: Re: fsck recoveries, configuration
Message-Id: <20120818030537.4d5bf55b.freebsd@edvax.de>
In-Reply-To: <502EA73B.6000008@dreamchaser.org>

On Fri, 17 Aug 2012 14:19:07 -0600, Gary Aitken wrote:
> 1. It appears to me that the file system (ufs) is not writing
> stuff out when things are idle. If I do a sync manually and
> leave the machine idle and it crashes later, it comes up clean.
> If I don't do a sync manually and it crashes later, it often
> comes up needing fsck.
> Is there a way to configure the filesystem
> to cache but still write cached stuff at low priority?

Note that even if the OS orders a data write, it's up to the
disk driver to actually tell the disk to do it. And the disk
then _has_ to do it. There is no real "connection" (in time)
between those components of the "task line", even though one
would assume that they happen immediately.

On a somewhat idle system, you could keep a process (e. g. top -S)
running to check system processes that could be responsible for
writes (or missing writes).

> 2. When my machine hung (could not rlogin or ping), I powered
> off and rebooted.

Does the machine have a "soft power button", and is it configured
to issue a "shutdown -p now" (which is quite common)? When you
have access to the machine, try that. Even if the machine does
not accept network logins, this mechanism might still work.

> Reboot did a deferred fsck.

Is this intended? Personally, I'd rather wait some time to boot
into a fully checked file system environment than deal with
the uncertain situation of snapshots and background FS check
activity. In the worst case, I want to be prompted by fsck if a
major defect has been found that requires administrator attention.

Put

	background_fsck="NO"

into /etc/rc.conf to get this behaviour. Note that as long as
fsck is running, you can't enter any interactive commands, and
it will happen _prior_ to allowing any network connections. Also
note that this is in single user mode, so you can't switch VTs.

> After it booted I logged in, and also logged in on another system.
> On the remote system I could do a ping but rlogin returned
> "connection reset by peer", even though I could log in locally.

Does rlogin work when you "give the system some time to recover"?

> I presume that is because the background fscks were not complete?

Possible. Background fsck is uncertain per se, so for diagnostics
it is better to leave it aside and use the maybe "less comfortable"
method.
This is easy when you have local access to the machine in question.

> I then did a
>   ps ax | grep fsck
> and saw only the "logger" process for the deferred fsck's.
> I did a
>   man logger
> which appeared to hang -- no output. I'm guessing because it needed
> the filesystems which hadn't yet fsck'd.

Just a guess: Maybe you're experiencing a file system defect and
fsck, even though running in background, needs an input? I'm not
really sure about this, because I'm _intentionally_ not using fsck
that way.

> I then attempted to switch consoles using
>   fn
> but could not.

That would imply you're still stuck in single user mode (SUM). A
strange constellation, given that it appears that you have fsck
running in background.

> I then attempted to kill the man logger process using ^C with no success.

Waiting / hanging process?

> Can someone shed light on the above sequence of events? It's highly
> likely some of them occurred before the 60 second delay for fsck
> timed out, but I'd like to understand what the heck is going on.

Try to construct a more _defined_ situation for further diagnostics.
Also, you could boot the system up in SUM (use "boot -s") and then
perform fsck manually, just to make sure your disks are fine.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
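
P.S. To see which fsck mode a machine will use at the next boot, it
may help to look up the background_fsck setting mentioned above. The
following is only a minimal sketch: the function name bgfsck_setting
is made up for illustration, it reads just the one file it is given
(not /etc/rc.conf.local or other overrides), and the fallback to
"YES" when nothing is set is an assumption about the system default.

```shell
#!/bin/sh
# Sketch: print the background_fsck value found in an rc.conf-style file.
# bgfsck_setting is a hypothetical helper, not a FreeBSD tool.

bgfsck_setting() {
    conf="${1:-/etc/rc.conf}"
    value=""
    if [ -r "$conf" ]; then
        # Keep the last uncommented assignment, drop a trailing
        # comment, then strip the surrounding quotes.
        value=$(sed -n 's/^[[:space:]]*background_fsck=//p' "$conf" \
            | tail -n 1 \
            | sed 's/[[:space:]]*#.*$//' \
            | tr -d "\"'")
    fi
    # Assumption: background fsck is enabled when the file says nothing.
    echo "${value:-YES}"
}
```

Called as "bgfsck_setting /etc/rc.conf", it prints NO once the line
background_fsck="NO" has been added, and YES when the line is absent
or commented out.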