To: freebsd-stable@freebsd.org
In-Reply-To: Message from Peter Radcliffe <pir@pir.net> 
   of "Fri, 21 Nov 2003 14:24:24 EST." <20031121192423.GA20052@pir.net> 
Date: Mon, 24 Nov 2003 10:30:21 -0800
From: "Kevin Oberman" <oberman@es.net>
Message-Id: <20031124183021.800105D07@ptavv.es.net>
Subject: Re: Hang on boot with 4.9-STABLE 

> Date: Fri, 21 Nov 2003 14:24:24 -0500
> From: Peter Radcliffe <pir@pir.net>
> Sender: owner-freebsd-stable@freebsd.org
> 
> Doug White <dwhite@gumbysoft.com> probably said:
> > Well, until you can spend some time on them to provoke crashdumps or find
> > the date things go bad or whatever, there isn't much we can do.
> 
> I installed the spare remote controlled power strip today and managed
> to persuade the owner of the systems to give me some time to move the
> power to it and using one to debug for a bit.
> 
> 
> Switching to acpi (or simply removing apm from the config) does seem
> to work around the problem, but experimental acpi in 4.x doesn't give
> me warm fuzzy feelings.
> 
> Don't seem to be able to provoke a crashdump, even with loader.conf
> setting where to dump to, but with RELENG_4 source from
> date=2003.10.15.05.00.00 I do get a kernel trap;
> 
>   apm0: <APM BIOS> on motherboard
>   kernel trap 12 with interrupts disabled
> 
>   Fatal trap 12: page fault while in kernel mode
>   mp_lock = 00000006; cpuid = 0; lapic.id = 00000000
>   fault virtual address   = 0x36
>   fault code              = supervisor write, page not present
>   instruction pointer     = 0x8:0xc020d05a
>   stack pointer           = 0x10:0xc03fad66
>   frame pointer           = 0x10:0xc03fae06
>   code segment            = base 0x0, limit 0xfffff, type 0x1b
>                           = DPL 0, pres 1, def32 1, gran 1
>   processor eflags        = interrupt enabled, resume, IOPL = 0
>   current process         = 0 (swapper)
>   interrupt mask          = net tty bio cam  <- SMP: XXX
>   kernel: type 12 trap, code=0
>   Stopped at      vm_fault+0x132: lock addw       %ax,0x36(%edx)
>   db> trace 
>   vm_fault(c0354e2c,c0000000,2,0,c) at vm_fault+0x132
>   trap_pfault(c03fae82,0,c00004d8,ffffffff,0) at trap_pfault+0xda
>   trap(18,70000010,60,1c,0) at trap+0x377
>   calltrap() at calltrap+0x17
>   --- trap 0xc, eip = 0x6096, esp = 0xc03faec2, ebp = 0xc03faec8 ---
>   gd_idlestack(aedc0058,0,530e0102,80202,5061aa) at 0x6096
> 
> I've limited the time of the problem being introduced to between
> date=2003.10.10.05.00.00 and date=2003.10.15.05.00.00 and am working
> on finding a more exact range.

Peter,

I've been traveling, so I'm just catching up.

The crash is the result of a long-standing bug in the apm code that
Peter Wemm fixed on October 16. Make sure that your locore.s is at
revision 1.132.2.13. This should not be happening on 4.9 unless there
has been a regression.
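
If it helps, here is a rough sketch of one way to check the revision
from the $FreeBSD$ ident tag in the file. The function names are made
up for illustration, the path in the comment is only where I would
expect locore.s to live in a RELENG_4 tree, and treating any later
1.132.2.x revision as also carrying the fix is my assumption.

```python
import re

# Revision named above as carrying the apm fix.
FIXED_REV = (1, 132, 2, 13)


def freebsd_revision(text):
    """Return the revision in the first $FreeBSD$ tag as a tuple of ints, or None."""
    m = re.search(r"\$FreeBSD: \S+,v (\d+(?:\.\d+)*) ", text)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def has_apm_fix(text):
    """True if the tag is on the 1.132.2.x branch at .13 or later (assumption)."""
    rev = freebsd_revision(text)
    return (
        rev is not None
        and len(rev) == 4
        and rev[:3] == FIXED_REV[:3]
        and rev[3] >= FIXED_REV[3]
    )


# Usage, assuming the source tree is under /usr/src (hypothetical path):
#   with open("/usr/src/sys/i386/i386/locore.s") as f:
#       print(has_apm_fix(f.read()))
```

A tuple comparison is used rather than a string comparison so that a
revision such as .9 does not sort after .13.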
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634