From: Mark Saad
To: Jeremy Chadwick
Cc: stable@freebsd.org
Date: Mon, 10 Jan 2011 22:29:11 -0500
Subject: Re: Enabling DDB prevent kernel from panicing
In-Reply-To: <20110111021316.GA84376@icarus.home.lan>

On Mon, Jan 10, 2011 at 9:13 PM, Jeremy Chadwick wrote:
> On Mon, Jan 10, 2011 at 07:42:21PM -0500, Mark Saad wrote:
>> On Mon, Jan 10, 2011 at 6:59 PM, wrote:
>> > Hello, Mark
>> >
>> > 2011/1/11 Mark Saad :
>> >> All
>> >> This was originally posted to hackers@
>> >>
>> >> I have a good question that I can't find an answer for. I believe I
>> >> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
>> >> kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
>> >> fault while in kernel mode".
>> >> The hardware works fine in 7.2-RELEASE
>> >> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
>> >>
>> >> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
>> >> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
>> >> issue was somehow fixed in 7.3-RELEASE-p4 amd64 I rebuilt a GENERIC
>> >> kernel using patched sources and tried to boot and I got the same
>> >> crash.
>> >>
>> >> Next I rebuilt the kernel with KDB and DDB to see if I could get a
>> >> core-dump of the system. I also set loader.conf to
>> >>
>> >> kernel="kernel.DEBUG"
>> >> kern.dumpdev="/dev/da0s1b"
>> >>
>> >> Next I pxebooted the box and the system does not crash on boot up; it
>> >> will easily load an NFS root and work fine. So I copied my debug
>> >> kernel and loader.conf to the local disk and rebooted, and it boots
>> >> fine from the local disk.
>> >
>> > Looks like a race condition.
>> > Well, you don't need to compile KDB and DDB, just add
>> >
>> > makeoptions DEBUG=-g
>> >
>> > into your kernel config file and rebuild the kernel.
>> >
>> > Then after you get a crash dump you can easily debug it (see the FreeBSD
>> > Developers Handbook):
>> > http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>> >
>> >
>> > wbr,
>> > Nickolas
>> >
>>
>> Sorry, let me clarify the issue. When you install a generic
>> 7.3-RELEASE amd64 on some of the HP servers I use, the kernel panics
>> in boot up when it probes the sio driver.
>> Here is a part of my dmesg.boot file
>>
>> atkbd0: [ITHREAD]
>> psm0: irq 12 on atkbdc0
>> psm0: [GIANT-LOCKED]
>> psm0: [ITHREAD]
>> psm0: model Generic PS/2 mouse, device ID 0
>> sio0: configured irq 4 not in bitmap of probed irqs 0
>> sio0: port may not be enabled
>> sio0: configured irq 4 not in bitmap of probed irqs 0
>> sio0: port may not be enabled
>> sio0: port 0x3f8-0x3ff irq 4 on acpi0
>> sio0: type 16550A
>> sio0: [FILTER]
>> Say about here in the boot up is where the box crashes with the
>> above noted error.
>>
>> If I then boot the same box off a 7.1-RELEASE amd64 netboot server,
>> mount the local disks of the 7.3-RELEASE install and edit the
>> /boot/device.hints and comment out the sio hints like this
>>
>> hint.vga.0.at="isa"
>> hint.sc.0.at="isa"
>> hint.sc.0.flags="0x100"
>> #hint.sio.0.at="isa"
>> #hint.sio.0.port="0x3F8"
>> #hint.sio.0.flags="0x10"
>> #hint.sio.0.irq="4"
>> #hint.sio.1.at="isa"
>> #hint.sio.1.port="0x2F8"
>> #hint.sio.1.irq="3"
>> #hint.sio.2.at="isa"
>> #hint.sio.2.disabled="1"
>> #hint.sio.2.port="0x3E8"
>> #hint.sio.2.irq="5"
>> #hint.sio.3.at="isa"
>> #hint.sio.3.disabled="1"
>> #hint.sio.3.port="0x2E8"
>> #hint.sio.3.irq="9"
>> hint.ppc.0.at="isa"
>> hint.ppc.0.irq="7"
>>
>> then boot the server off the local disks, the server boots correctly.
>>
>> The odd thing was, I rebuilt a debug 7.3-RELEASE amd64 kernel on
>> another working server, and installed it on the broken server and
>> booted it off the local disks, without any changes to the hints file,
>> and the server booted correctly and I was able to manually break out
>> into the debugger, but nothing looked wrong.
>
> The sio(4) driver has been deprecated in RELENG_8, which uses uart(4).
> uart(4) is better in a lot of regards, and should also be available for
> use on RELENG_7 but you'll need to adjust /etc/ttys to refer to the new
> device names (ttyuX vs. ttydX), plus add the uart entries to
> /boot/device.hints.
>

I found that too, and I was thinking about the change, but it's going to
require a source build of the kernel to fix that, along with a bunch of
manual work on my side that I would rather not do.

> I'm mentioning this as a workaround.
>
> Also worth considering is that the sio(4) ISA probe may be touching
> something Bad(tm) as a result, so you might try adding the following
> lines to your loader.conf (not a typo) to disable sio(4) entries
> entirely:
>
> hint.sio.0.disabled="1"
> hint.sio.1.disabled="1"
>
> And see if that improves things.  If it does, remove the sio.1.disabled
> entry and see if that suffices.

I'll try the hint disabling, but how is that different from removing the
hint outright?

>
>> So to sum this up there is something broken in 7.3-RELEASE but I can't
>> figure out what. This server works with a generic install of
>> 7.1-RELEASE, 7.2-RELEASE, 6.1-RELEASE, 6.2-RELEASE and 6.4-RELEASE in
>> both amd64 and i386, but not 7.3-RELEASE in amd64. It also worked in
>> 7.4-RC1.
>>
>> avg recommended I see what changed from r212964 to r212994. I am
>> currently looking into this. Has anyone seen this before?
>
> If the server works fine with 7.4-PRERELEASE/RC1, why are you caring
> about 7.3?  Upgrade.  :-)
>

Can't just upgrade; we did a bunch of work on 7.3-RELEASE and we are
going to stay on 7.3-RELEASE until 2012 for various reasons.

> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
>
>

So, does anyone want to take a stab at flying without a device.hints?
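
For reference, a rough sketch of what the uart(4) switch discussed above
would involve. It assumes the uart hints simply mirror the sio.0 and sio.1
port, flags and IRQ values from the device.hints quoted earlier, that the
kernel actually has uart(4) available, and that /etc/ttys follows the
stock getty layout. The ttyu0 line in particular is an assumption and is
not taken from this machine.

```
# /boot/device.hints: uart entries mirroring the sio.0/sio.1 values above
# (assumed to carry over unchanged; only the driver name differs)
hint.uart.0.at="isa"
hint.uart.0.port="0x3F8"
hint.uart.0.flags="0x10"
hint.uart.0.irq="4"
hint.uart.1.at="isa"
hint.uart.1.port="0x2F8"
hint.uart.1.irq="3"

# /etc/ttys: serial terminals are named ttyuX under uart(4), not ttydX
# (assumed stock entry; copy the getty type and on/off state from the
# existing ttyd0 line rather than trusting this one)
ttyu0	"/usr/libexec/getty std.9600"	dialup	off secure
```

The existing sio hints would stay commented out or disabled alongside
this, so that only one driver claims the ports.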
-- 
mark saad | nonesuch@longcount.org