From owner-freebsd-stable@FreeBSD.ORG Thu Mar 10 07:46:21 2005
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 3D64416A4CE
    for ; Thu, 10 Mar 2005 07:46:21 +0000 (GMT)
Received: from carver.gumbysoft.com (carver.gumbysoft.com [66.220.23.50])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 14B2543D62
    for ; Thu, 10 Mar 2005 07:46:21 +0000 (GMT)
    (envelope-from dwhite@gumbysoft.com)
Received: by carver.gumbysoft.com (Postfix, from userid 1000)
    id 0A51472DD4; Wed, 9 Mar 2005 23:46:21 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
    by carver.gumbysoft.com (Postfix) with ESMTP
    id 04B2272DCB; Wed, 9 Mar 2005 23:46:21 -0800 (PST)
Date: Wed, 9 Mar 2005 23:46:20 -0800 (PST)
From: Doug White
To: Tony Arcieri
In-Reply-To: <20050309184838.GA64546@flash.atmos.colostate.edu>
Message-ID: <20050309234350.W53915@carver.gumbysoft.com>
References: <20050309184838.GA64546@flash.atmos.colostate.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-stable@freebsd.org
Subject: Re: Continued instability with 5.3-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 10 Mar 2005 07:46:21 -0000

On Wed, 9 Mar 2005, Tony Arcieri wrote:

> I have a dual Opteron upon which seems to only stay up approximately two
> weeks at a time then spontaneously reboots. It's colocated so I can't ever
> see panic messages, and I don't have another system colocated at the same
> place I can use to gather debugging info.

You may want to consider finding a small system with a free serial port to
serve as a temporary serial console. Without output from the crash, it's
impossible to tell what went wrong.

> I've never managed to get the system to generate a crash dump either.
> It has a 1GB swap partition and 2GB of physical RAM but through the last
> few reboots I've been setting hw.physmem to 896M as the only custom parameter
> in loader.conf. The swap partition is labeled as follows:
>
> twed0s1b swap 1024MB SWAP
>
> And dumpdev is set in rc.conf as follows:
>
> dumpdev="/dev/twed0s1b"
>
> /var/crash/minfree is set to 2048
>
> Lately I built a kernel from GENERIC using the latest RELENG_5 sources and
> without SMP support and experienced a reboot after approximately 16 days uptime,
> roughly equivalent to how long it took the system to crash with SMP enabled.
> No core file was generated.
>
> The kernel was built using source checked out from RELENG_5 on February 18th.
> I'm not sure if any Opteron specific fixes have been applied to the branch
> since then.

Make sure you're actually running this kernel, since crashdump support for
twe was added on 2/12, in rev 1.22.2.1 of src/sys/dev/twe/twe.c.

> Are there any other means of gathering debugging data that would work in
> my situation? As is I'm still unsure if my problems are hardware or
> software related as I've still never seen a panic message from the
> system (hardware is a Tyan K8S motherboard in a Tyan Transport system)

You really, really want a serial console.

> Should I look into using KTR ALQ to log KTR data to the swap partition, and
> if it fills up will it wrap over to the beginning? I've never used that
> feature before...

If you don't have a serial console to manipulate ddb from, or crashdumps,
then there is no way to retrieve the ktr data.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
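
Editorial sketch, not part of the original message: the quoted hw.physmem setting only makes sense as size arithmetic, so here is a minimal illustration of it. It assumes that a full crash dump on this branch needs a dump device strictly larger than the memory being dumped (to leave room for a dump header). The function name dump_fits is hypothetical and is not a FreeBSD tool; the sizes are the ones quoted in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD utility): report whether a full memory
# dump of physmem_mb megabytes would fit on a dump device of dumpdev_mb
# megabytes. Assumption: the device must be strictly larger than the memory
# image, leaving some room for a dump header.
dump_fits() {
    physmem_mb=$1
    dumpdev_mb=$2
    if [ "$physmem_mb" -lt "$dumpdev_mb" ]; then
        echo "fits"
    else
        echo "too small"
    fi
}

dump_fits 896 1024     # hw.physmem=896M against the 1024MB swap partition
dump_fits 2048 1024    # the full 2GB of RAM against the same partition
```

Under that assumption the 896M limit fits the 1024MB partition while the full 2GB would not, which matches the poster's reason for capping hw.physmem.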