From: Jeremy Chadwick
To: Mark Saad
Cc: stable@freebsd.org
Date: Mon, 10 Jan 2011 18:13:16 -0800
Subject: Re: Enabling DDB prevent kernel from panicing
Message-ID: <20110111021316.GA84376@icarus.home.lan>
List-Id: Production branch of FreeBSD source code

On Mon, Jan 10, 2011 at 07:42:21PM -0500, Mark Saad wrote:
> On Mon, Jan 10, 2011 at 6:59 PM, wrote:
> > Hello, Mark
> >
> > 2011/1/11 Mark Saad:
> >> All
> >> This was originally posted to hackers@
> >>
> >> I have a good question that I can't find an answer for.
> >> I believe I found a kernel bug in 7.3-RELEASE that prevents me from
> >> booting 64-bit kernels on HP's DL360 G4p. The kernel dies with
> >> "Fatal trap 12: page fault while in kernel mode". The hardware works
> >> fine in 7.2-RELEASE amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
> >>
> >> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
> >> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if
> >> this issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a
> >> GENERIC kernel using patched sources and tried to boot, and I got the
> >> same crash.
> >>
> >> Next I rebuilt the kernel with KDB and DDB to see if I could get a
> >> core dump of the system. I also set loader.conf to
> >>
> >> kernel="kernel.DEBUG"
> >> kern.dumpdev="/dev/da0s1b"
> >>
> >> Next I pxebooted the box and the system does not crash on boot up; it
> >> will easily load an NFS root and work fine. So I copied my debug
> >> kernel and loader.conf to the local disk and rebooted, and it boots
> >> fine from the local disk.
> >
> > Looks like a race condition.
> > Well, you don't need to compile KDB and DDB, just add
> >
> > makeoptions DEBUG=-g
> >
> > into your kernel config file and rebuild the kernel.
> >
> > Then after you get a crash dump you can easily debug it (see the
> > FreeBSD Developers' Handbook):
> > http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
> >
> >
> > wbr,
> > Nickolas
> >
>
> Sorry, let me clarify the issue. When you install a generic
> 7.3-RELEASE amd64 on some of the HP servers I use, the kernel panics
> during boot up when it probes the sio driver.
> Here is a part of my dmesg.boot file:
>
> atkbd0: [ITHREAD]
> psm0: irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: [ITHREAD]
> psm0: model Generic PS/2 mouse, device ID 0
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: port 0x3f8-0x3ff irq 4 on acpi0
> sio0: type 16550A
> sio0: [FILTER]
>
> Say about here in the boot up is where the box crashes with the
> above-noted error.
>
> If I then boot the same box off a 7.1-RELEASE amd64 netboot server,
> mount the local disks of the 7.3-RELEASE install, and edit
> /boot/device.hints to comment out the sio hints like this
>
> hint.vga.0.at="isa"
> hint.sc.0.at="isa"
> hint.sc.0.flags="0x100"
> #hint.sio.0.at="isa"
> #hint.sio.0.port="0x3F8"
> #hint.sio.0.flags="0x10"
> #hint.sio.0.irq="4"
> #hint.sio.1.at="isa"
> #hint.sio.1.port="0x2F8"
> #hint.sio.1.irq="3"
> #hint.sio.2.at="isa"
> #hint.sio.2.disabled="1"
> #hint.sio.2.port="0x3E8"
> #hint.sio.2.irq="5"
> #hint.sio.3.at="isa"
> #hint.sio.3.disabled="1"
> #hint.sio.3.port="0x2E8"
> #hint.sio.3.irq="9"
> hint.ppc.0.at="isa"
> hint.ppc.0.irq="7"
>
> then boot the server off the local disks, the server boots correctly.
>
> The odd thing was, I rebuilt a debug 7.3-RELEASE amd64 kernel on
> another working server, installed it on the broken server, and
> booted it off the local disks without any changes to the hints file.
> The server booted correctly and I was able to manually break out
> into the debugger, but nothing looked wrong.

The sio(4) driver has been deprecated in RELENG_8, which uses uart(4).
uart(4) is better in a lot of regards, and should also be available for
use on RELENG_7, but you'll need to adjust /etc/ttys to refer to the new
device names (ttyuX vs. ttydX), plus add the uart entries to
/boot/device.hints. I'm mentioning this as a workaround.
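
For reference, a rough sketch of what the uart(4) route could look like.
The values below simply mirror the stock sio hints quoted earlier and the
ttyuX naming; they are assumptions, not something tested on a DL360 G4p.
They also assume the kernel is built with "device uart" and that sio(4)
is not claiming the same ports.

```
# /boot/device.hints (assumed: same ports/IRQs as the old sio entries)
hint.uart.0.at="isa"
hint.uart.0.port="0x3F8"
hint.uart.0.flags="0x10"
hint.uart.0.irq="4"
hint.uart.1.at="isa"
hint.uart.1.port="0x2F8"
hint.uart.1.irq="3"

# /etc/ttys (only relevant if you run a getty on the serial port;
# shown "off" here, change to "on" to enable a login on ttyu0)
ttyu0	"/usr/libexec/getty std.9600"	dialup	off secure
```

Any existing ttydX lines in /etc/ttys would need the same rename.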

Also worth considering is that the sio(4) ISA probe may be touching
something Bad(tm) as a result, so you might try adding the following
lines to your loader.conf (not a typo) to disable sio(4) entries
entirely:

hint.sio.0.disabled="1"
hint.sio.1.disabled="1"

And see if that improves things. If it does, remove the sio.1.disabled
entry and see if that suffices.

> So to sum this up, there is something broken in 7.3-RELEASE but I can't
> figure out what. This server works with a generic install of
> 7.1-RELEASE, 7.2-RELEASE, 6.1-RELEASE, 6.2-RELEASE and 6.4-RELEASE in
> both amd64 and i386, but not 7.3-RELEASE in amd64. It also worked in
> 7.4-RC1.
>
> avg recommended I see what changed from r212964 to r212994; I am
> currently looking into this. Has anyone seen this before?

If the server works fine with 7.4-PRERELEASE/RC1, why are you caring
about 7.3? Upgrade. :-)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |