Date: Mon, 24 Nov 2003 10:30:21 -0800
From: "Kevin Oberman"
To: freebsd-stable@freebsd.org
Subject: Re: Hang on boot with 4.9-STABLE
Message-Id: <20031124183021.800105D07@ptavv.es.net>
In-Reply-To: Message from Peter Radcliffe of "Fri, 21 Nov 2003 14:24:24 EST."
    <20031121192423.GA20052@pir.net>

> Date: Fri, 21 Nov 2003 14:24:24 -0500
> From: Peter Radcliffe
> Sender: owner-freebsd-stable@freebsd.org
>
> Doug White probably said:
> > Well, until you can spend some time on them to provoke crashdumps or find
> > the date things go bad or whatever, there isn't much we can do.
>
> I installed the spare remote controlled power strip today and managed
> to persuade the owner of the systems to give me some time to move the
> power to it and using one to debug for a bit.
>
> Switching to acpi (or simply removing apm from the config) does seem
> to work around the problem, but experimental acpi in 4.x doesn't give
> me warm fuzzy feelings.
>
> Don't seem to be able to provoke a crashdump, even with loader.conf
> setting where to dump to, but with RELENG_4 source from
> date=2003.10.15.05.00.00 I do get a kernel trap;
>
> apm0: on motherboard
> kernel trap 12 with interrupts disabled
>
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 00000006; cpuid = 0; lapic.id = 00000000
> fault virtual address = 0x36
> fault code = supervisor write, page not present
> instruction pointer = 0x8:0xc020d05a
> stack pointer = 0x10:0xc03fad66
> frame pointer = 0x10:0xc03fae06
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (swapper)
> interrupt mask = net tty bio cam <- SMP: XXX
> kernel: type 12 trap, code=0
> Stopped at vm_fault+0x132: lock addw %ax,0x36(%edx)
> db> trace
> vm_fault(c0354e2c,c0000000,2,0,c) at vm_fault+0x132
> trap_pfault(c03fae82,0,c00004d8,ffffffff,0) at trap_pfault+0xda
> trap(18,70000010,60,1c,0) at trap+0x377
> calltrap() at calltrap+0x17
> --- trap 0xc, eip = 0x6096, esp = 0xc03faec2, ebp = 0xc03faec8 ---
> gd_idlestack(aedc0058,0,530e0102,80202,5061aa) at 0x6096
>
> I've limited the time of the problem being introduced to between
> date=2003.10.10.05.00.00 and date=2003.10.15.05.00.00 and am working
> on finding a more exact range.

Peter,

I've been traveling, so I'm just catching up.

The crash is the result of a long-standing bug in the apm code that
Peter Wemm fixed on 10/16. Make sure that your locore.s is at revision
1.132.2.13. This should not be happening on 4.9 unless there has been
a regression.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net Phone: +1 510 486-8634
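Narrowing the breakage to a window between two RELENG_4 checkout dates, as Peter is doing, amounts to a binary search over dates. The sketch below is a minimal Python illustration, not a tool from this thread. `is_bad` is a hypothetical callback standing in for "check out the tree at that date, build and boot the kernel, and report whether it traps"; the one-hour default resolution is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple


def bisect_dates(good: datetime, bad: datetime,
                 is_bad: Callable[[datetime], bool],
                 resolution: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Shrink the window (good, bad] until it is no wider than `resolution`.

    `good` must be a date whose tree boots and `bad` one whose tree traps.
    `is_bad` is hypothetical: it should build and boot the tree as of the
    given date and return True if the kernel hangs or traps.
    """
    if bad <= good:
        raise ValueError("bad date must be later than good date")
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_bad(mid):
            bad = mid   # the change that broke things is at or before mid
        else:
            good = mid  # the change that broke things is after mid
    return good, bad


def cvs_date_tag(d: datetime) -> str:
    """Format a date in the same date=YYYY.MM.DD.hh.mm.ss style as the report."""
    return d.strftime("date=%Y.%m.%d.%H.%M.%S")
```

Each iteration halves the window, so going from the five-day window in the report down to one hour takes about seven build-and-boot cycles.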