From owner-freebsd-current@FreeBSD.ORG Tue Sep 14 17:00:49 2004
Date: Tue, 14 Sep 2004 13:00:31 -0400 (EDT)
From: Robert Watson
X-Sender: robert@fledge.watson.org
To: Volker
cc: freebsd-current@freebsd.org
Subject: Re: fatal trap 12
In-Reply-To: <41471DD8.2050006@vwsoft.com>
List-Id: Discussions about the use of FreeBSD-current

On Tue, 14 Sep 2004, Volker wrote:

> After the reboot, the system is panicing 3 to 8 times a day. To see the
> panic messages, I've set the PANIC_REBOOT_WAIT_TIME to -1 and this let
> me see a message like (not copied and pasted):

If I might suggest, and if possible, you might want to set up a serial
console for the box so that you can copy and paste debugger output.
You'll probably be asked for quite a bit of output from the debugger and
life is a lot easier if you can do that :-). It also reduces the chances
of typographical errors.

> fatal trap 12: page fault
> fault virtual address: 0xc
> fault code: supervisor read, page not present
> instr. ptr: 0x8:0xc0586e60
> stack ptr: 0x10:0xcee2cac8
> frame ptr: 0x10:0xcee2caf0
> cs: base 0x0 limit 0xffff type 0x1b DPL 0 pres 1 def32 1 gran 1
> cpu eflags: interrupt enabled, resume, IOPL=0
> process: 33767 imapd
> trap 12

This is a kernel NULL pointer dereference: the fault address 0xc is a
small offset from zero, which is what you see when a structure member is
read through a NULL pointer. To debug this, it would be helpful if you
could determine what line in the kernel source code 0xc0586e60 refers
to. addr2line on the kernel.debug from your kernel build is a good place
to start. It would also be very helpful to have a stack trace. When you
drop to DDB due to the panic (assuming DDB is compiled in), you can type
in "trace" to generate the trace. Having the names of the functions plus
offsets would be very helpful. Also having the arguments is good, but a
lot more pain for you without a serial console :-).

> While trying to get the system stable, I've tried a 6-current Kernel
> (+world) but the system still panics (only the current process and the
> pointer addresses are changing, the system mostly panics with a trap
> 12).
>
> Another time the system panic'ed with: 'panic: sbappendaddr_locked'

A stack trace here would be invaluable. This panic occurs as a result of
a violation of calling convention, in which a non-header mbuf (or maybe
a freed mbuf) is appended to a socket incorrectly. A stack trace will
tell us what calling code might be at fault.

> On 2004-09-13 I've cvsup'ed current and releng_5 sources and recompiled
> (releng_5) world + kernel. The system kept panicing.
>
> Well, since having boot problems using that mainboard (Slot-1, P-III
> 600, FIC VB-601V, which caused the BTX loader sometimes to a fatal
> exit... strange thing), I've plugged in another board which has been
> working stable over the last few weeks (Epox 51-MVP3G with AMD K6-2 500).
>
> This system is now up using that socket-7 board but has paniced a few
> minutes ago the second time:
>
> fatal trap 12: page fault
> fatal virtual address: 0x40
> trap 12: page fault while in kernel mode
> ip: 0x8:0xc05488ed
> sp: 0x10:0xca3f4c20
> fp: 0x10:0xca3f4c20
> process: 34 (swi6: task queue)
>
> A few minutes before it paniced with:
>
> in_cksum_skip: out of data by 184

A couple of bugs relating to this error were introduced and then fixed.
In particular, could you confirm that you have at least revision 1.165,
or 1.162.2.2, of udp_usrreq.c? The merge to RELENG_5 happened on 8/30 so
you should have it, but it's worth confirming. A stack trace here would
also be extremely helpful, but this failure could be explained by
whatever causes the sbappendaddr_locked failure as well.

> Any additional tests you want me to drive?

Could you try booting and running the system with debug.mpsafenet=0 in
loader.conf? Is this an SMP box? Could you try compiling and running
without the PREEMPTION kernel option? Probably the most valuable
information would be the stack traces as indicated above, however.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
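
For the addr2line step suggested above, here is a minimal sketch of how the hand-copied panic text could be turned into addr2line invocations. The helper names (instruction_pointers, addr2line_command) and the label spellings the pattern accepts are assumptions made for illustration. The default "kernel.debug" path is also an assumption: it has to point at the debug kernel from the same build as the kernel that panicked. The sketch only builds the command line; it does not run addr2line.

```python
import re

# Matches a "segment:offset" pair such as 0x8:0xc0586e60 on a line whose
# label names the instruction pointer. The accepted labels are a guess based
# on the spellings seen in this thread plus the usual trap message wording.
IP_LINE = re.compile(
    r"^[> \t]*(?:instr\.\s*ptr|instruction pointer|ip)\s*[:=]\s*"
    r"0x[0-9a-f]+:(0x[0-9a-f]+)",
    re.IGNORECASE | re.MULTILINE,
)


def instruction_pointers(panic_text):
    """Return the offset part of every instruction-pointer line found."""
    return IP_LINE.findall(panic_text)


def addr2line_command(address, kernel="kernel.debug"):
    """Build an addr2line argument list; -f also prints the function name."""
    return ["addr2line", "-f", "-e", kernel, address]


if __name__ == "__main__":
    sample = (
        "> instr. ptr: 0x8:0xc0586e60\n"
        "> stack ptr: 0x10:0xcee2cac8\n"
        "> ip: 0x8:0xc05488ed\n"
        "> sp: 0x10:0xca3f4c20\n"
    )
    for addr in instruction_pointers(sample):
        print(" ".join(addr2line_command(addr)))
```

Only the instruction-pointer lines are picked up; the stack and frame pointer lines are ignored because they refer to data, not code.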