From owner-freebsd-current@FreeBSD.ORG  Tue Sep 14 17:00:49 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9CEBC16A4DC
	for <freebsd-current@freebsd.org>;
	Tue, 14 Sep 2004 17:00:49 +0000 (GMT)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 326F343D4C
	for <freebsd-current@freebsd.org>;
	Tue, 14 Sep 2004 17:00:49 +0000 (GMT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.13.1/8.13.1) with ESMTP id i8EH0Vd6067964;
	Tue, 14 Sep 2004 13:00:31 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)i8EH0V2b067961;
	Tue, 14 Sep 2004 13:00:31 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Tue, 14 Sep 2004 13:00:31 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: Volker <volker@vwsoft.com>
In-Reply-To: <41471DD8.2050006@vwsoft.com>
Message-ID: <Pine.NEB.3.96L.1040914125449.63543C-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-current@freebsd.org
Subject: Re: fatal trap 12
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Sep 2004 17:00:49 -0000

On Tue, 14 Sep 2004, Volker wrote:

> After the reboot, the system is panicing 3 to 8 times a day. To see the
> panic messages, I've set the PANIC_REBOOT_WAIT_TIME to -1 and this let
> me see a message like (not copied and pasted): 

If possible, I'd suggest setting up a serial console for the box so that
you can copy and paste debugger output.  You'll probably be asked for quite
a bit of output from the debugger, and life is a lot easier if you can do
that :-).  It also reduces the chance of typographical errors.

> fatal trap 12: page fault
> fault virtual address: 0xc
> fault code: supervisor read, page not present
> instr. ptr: 0x8:0xc0586e60
> stack ptr: 0x10:0xcee2cac8
> frame ptr: 0x10:0xcee2caf0
> cs: base 0x0 limit 0xffff type 0x1b DPL 0 pres 1 def32 1 gran 1
> cpu eflags: interrupt enabled, resume, IOPL=0
> process: 33767 imapd
> trap 12

This is a kernel NULL pointer dereference: the fault address 0xc is a small
offset from NULL, which usually means a structure field was read through a
NULL pointer.  To debug this, it would be helpful if you could determine
what line in the kernel source code 0xc0586e60 refers to.  Running
addr2line on the kernel.debug from your kernel build is a good place to
start.  It would also be very helpful to have a stack trace.  When you drop
to DDB due to the panic (assuming DDB is compiled in), you can type "trace"
to generate one.  The names of the functions plus offsets would be very
helpful.  The arguments are good to have too, but a lot more pain for you
to transcribe without a serial console :-).
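
To make the reasoning concrete, here is a minimal Python sketch of why a
fault address of 0xc points at a NULL dereference, and of how the addr2line
invocation could be assembled.  ExampleStruct is a hypothetical layout made
up for illustration, not a real kernel structure; the helper names and the
4096-byte page size are likewise assumptions.  The -f and -e flags are
standard addr2line options.

```python
import ctypes

PAGE_SIZE = 4096  # assumed i386 page size; the page at address 0 is normally unmapped


class ExampleStruct(ctypes.Structure):
    """Hypothetical layout for illustration only, not a real kernel structure."""
    _fields_ = [
        ("first", ctypes.c_uint32),   # offset 0x0
        ("second", ctypes.c_uint32),  # offset 0x4
        ("third", ctypes.c_uint32),   # offset 0x8
        ("fourth", ctypes.c_uint32),  # offset 0xc
    ]


def looks_like_null_deref(fault_addr):
    """True if the fault address lies in the first page, i.e. NULL plus a small offset."""
    return 0 <= fault_addr < PAGE_SIZE


def addr2line_command(kernel_debug_path, instr_ptr):
    """Build (but do not run) an addr2line invocation for a faulting instruction pointer."""
    return ["addr2line", "-f", "-e", kernel_debug_path, hex(instr_ptr)]


if __name__ == "__main__":
    # Reading ptr->fourth with ptr == NULL would fault at this address.
    print(hex(ExampleStruct.fourth.offset))
    print(looks_like_null_deref(0xC))
    print(" ".join(addr2line_command("kernel.debug", 0xC0586E60)))
```

The instruction pointer from the panic message, not the fault address, is
what goes to addr2line; the fault address only hints at which field offset
was being read.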

> While trying to get the system stable, I've tried a 6-current Kernel
> (+world) but the system still panics (only the current process and the
> pointer addresses are changing, the system mostly panics with a trap
> 12). 
> 
> Another time the system panic'ed with: 'panic: sbappendaddr_locked'

A stack trace here would be invaluable.  This panic occurs as a result of
a violation of calling convention, in which a non-header mbuf (or maybe a
freed mbuf) is appended to a socket incorrectly.  A stack trace will tell
us what calling code might be at fault.
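
As a rough model of the calling convention in question, the Python sketch
below mimics the kind of check that leads to this panic: the first mbuf
handed to the append routine must carry a packet header.  This is a
simplified stand-in, not the kernel code; the Mbuf class, the exception, and
check_appendaddr_arg are hypothetical names, and the M_PKTHDR value is
assumed to match sys/mbuf.h.

```python
from dataclasses import dataclass
from typing import Optional

M_PKTHDR = 0x0002  # assumed flag value, mirroring FreeBSD's sys/mbuf.h


@dataclass
class Mbuf:
    """Toy stand-in for struct mbuf; only the fields this check needs."""
    m_flags: int = 0
    m_next: Optional["Mbuf"] = None


class CallingConventionViolation(Exception):
    """Stands in for the kernel panic in this model."""


def check_appendaddr_arg(m0):
    """Raise if a non-header mbuf is passed where a packet header is required."""
    if m0 is not None and (m0.m_flags & M_PKTHDR) == 0:
        raise CallingConventionViolation("sbappendaddr_locked")


if __name__ == "__main__":
    check_appendaddr_arg(Mbuf(m_flags=M_PKTHDR))  # header mbuf: accepted
    try:
        check_appendaddr_arg(Mbuf())              # no M_PKTHDR: rejected
    except CallingConventionViolation as exc:
        print("panic:", exc)
```

A freed mbuf can fail the same check simply because its flags no longer say
what the caller expects, which is why the trace of the caller matters more
than the check itself.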

> On 2004-09-13 I've cvsup'ed current and releng_5 sources and recompiled 
> (releng_5) world + kernel. The system kept panicing.
> 
> Well, since having boot problems using that mainboard (Slot-1, P-III 
> 600, FIC VB-601V, which caused the BTX loader sometimes to a fatal 
> exit... strange thing), I've plugged in another board which has been 
> working stable over the last few weeks (Epox 51-MVP3G with AMD K6-2 500).
> 
> This system is now up using that socket-7 board but has paniced a few 
> minutes ago the second time:
> 
> fatal trap 12: page fault
> fatal virtual address: 0x40
> trap 12: page fault while in kernel mode
> ip: 0x8:0xc05488ed
> sp: 0x10:0xca3f4c20
> fp: 0x10:0xca3f4c20
> process: 34 (swi6: task queue)
> 
> A few minutes before it paniced with:
> 
> in_cksum_skip: out of data by 184

A couple of bugs relating to this error were introduced and then fixed.
In particular, could you confirm that you have at least revision 1.165 of
udp_usrreq.c on HEAD, or 1.162.2.2 on RELENG_5?  The merge to RELENG_5
happened on August 30, so you should have it, but it's worth confirming.

A stack trace here would also be extremely helpful, but this failure could
be explained by whatever causes the sbappendaddr_locked failure as well.
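
For intuition about the "out of data by N" message, here is a small Python
sketch of the length accounting only: the checksum code is asked to cover
more bytes than the mbuf chain actually holds.  The function name and the
exact meaning of its parameters (bytes to sum after the skipped prefix) are
assumptions made for illustration and may not match the kernel routine's
argument convention.

```python
def checksum_shortfall(chain_lengths, length, skip):
    """Return how many of the `length` bytes requested after `skip` are missing.

    chain_lengths: data length of each mbuf in the chain, in order.
    length: number of bytes the caller claims should be checksummed.
    skip: number of leading bytes to step over before summing.
    """
    available = max(0, sum(chain_lengths) - skip)
    return max(0, length - available)


if __name__ == "__main__":
    # A chain holding 20 + 100 bytes, asked to sum 284 bytes after a 20-byte skip.
    print("out of data by", checksum_shortfall([20, 100], 284, 20))
```

A nonzero shortfall means the length recorded for the packet disagrees with
the data in the chain, which fits with an mbuf being truncated, freed, or
appended incorrectly elsewhere.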

> Any additional tests you want me to drive? 

Could you try booting and running the system with debug.mpsafenet=0 in
loader.conf?  Is this an SMP box?  Could you try compiling and running
without the PREEMPTION kernel option?  Probably the most valuable
information would be the stack traces as indicated above, however.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research