From owner-freebsd-stable@FreeBSD.ORG Sat Jun 18 17:52:19 2011
Date: Sat, 18 Jun 2011 10:52:15 -0700
From: Jeremy Chadwick
To: Stefan Bethke
Cc: freebsd-stable@freebsd.org, Christian Baer
Subject: Re: Crashes with Promise controller
Message-ID: <20110618175215.GA18645@icarus.home.lan>
References: <52F39CE0-EEC7-4180-8186-BF8696AF279D@lassitu.de>
In-Reply-To: <52F39CE0-EEC7-4180-8186-BF8696AF279D@lassitu.de>
List-Id: Production branch of FreeBSD source code

On Sat, Jun 18, 2011 at 06:49:41PM +0200, Stefan Bethke wrote:
> On 13.06.2011 at 16:22, Christian Baer wrote:
>
> > I have to slightly explain the word "crash" here: I don't actually have
> > to hard reset the system myself. My box just does a reboot by itself.
> > No filesystem is unmounted cleanly and because the machine isn't really new
> > and powerful fsck takes pretty long.
>
> I can't help you with your controllers, but anyone in a position to
> help will likely want to know if the box simply resets, or if the
> kernel panics.  And if there are going to be any patches, you most
> certainly will want to get familiar with the debugger to help try
> stuff out.  The handbook has information on how to enable crash dumps
> and getting the kernel debugger going.  If you haven't done so
> already, try and get a serial console going, it helps tremendously to
> be able to cut&paste debugger info instead of trying to hand
> transcribe it.

It may be that the kernel is panic'ing and auto-rebooting before he can
see the message in question.  I would advocate he put the following
directives in his kernel configuration and rebuild/reinstall kernel and
wait for it to happen again.

# Debugging options
options         KDB             # Enable kernel debugger support
options         KDB_TRACE       # Print stack trace automatically on panic
options         DDB             # Support DDB
options         GDB             # Support remote GDB

If after doing this the machine literally reboots rather than panics,
then that would indicate a mainboard having issues, or power-related
stuff (keep reading).

As for the behaviour he describes -- this sort of problem can sometimes
turn out to be PSU-load-related (too many drives on a PSU that can't
handle it on a single rail), bad/improper voltages (difficult to track
down given the state of hardware monitoring on mainboards and on
FreeBSD), or "dirty power" / excessive ripple.  Power-related problems
on computers almost always appear as random/abrupt situations that can
usually be exacerbated by heavy system utilisation.  I have no proof
this is Christian's problem, but it's worth considering anyway.
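
Back on the panic side for a moment: the kernel options above only help
if the panic output survives the reboot.  The following is a minimal
sketch of the crash dump settings the handbook describes, assuming a swap
device large enough to hold the dump is already configured; the values
shown are the commonly used ones, not something taken from Christian's
system.

```shell
# /etc/rc.conf additions (sketch, not from the original poster's system)
dumpdev="AUTO"          # let the system pick a configured swap device for kernel dumps
dumpdir="/var/crash"    # directory where savecore(8) stores the dump on the next boot
```

If the box panics with these in place, a dump should show up under
/var/crash after the reboot; if nothing is ever written there, that
points more towards a hard reset than a panic.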

One might be able to detect ("log") potential power loss by looking at
SMART attribute 12 on mechanical HDDs in the system; if the RAW_VALUE
increases after it happens, then power is being lost to the drives.  If
not, then it may be a soft reset.  I use the word "may" because
sometimes a very quick brown-out won't cause the drives to actually
"power down" fully (e.g. the attribute never gets incremented) but the
loss of power can be just enough to cause them to start freaking out.

Computers + power issues = expect random chaos.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
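
Addendum: a minimal sketch of how the SMART attribute 12 check described
above could be logged, assuming smartmontools is installed and that
`smartctl -A` prints the usual ATA attribute table with RAW_VALUE in the
tenth column.  The function names, device name and log path are
hypothetical and only there for illustration.

```shell
#!/bin/sh
# Sketch: record SMART attribute 12 (Power_Cycle_Count) so the value
# can be compared before and after an unexpected reboot.

# Read `smartctl -A` output on stdin and print the RAW_VALUE column
# (field 10, assuming the usual ATA attribute table layout) of the
# row whose attribute ID is 12.
power_cycle_count() {
    awk '$1 == 12 { print $10 }'
}

# Append "timestamp device raw_value" to a log file.
# $1 = disk device, $2 = log file; both are hypothetical examples.
log_power_cycles() {
    dev="$1"
    log="$2"
    count=$(smartctl -A "$dev" | power_cycle_count)
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$dev" "$count" >> "$log"
}
```

Calling something like `log_power_cycles /dev/ada0 /var/log/power_cycle.log`
at boot or from cron gives a history to look at: if the logged value
jumps across one of the spontaneous reboots, the drive saw a real power
cycle; if it stays the same, a soft reset or a short brown-out remains
possible.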