Date:      Thu, 29 Nov 2007 14:41:03 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Juergen Lock <nox@jelal.kn-bremen.de>
Cc:        freebsd-hackers@freebsd.org, freebsd-emulation@freebsd.org
Subject:   Re: double panic, and whats apic_cmd? (kqemu crash...)
Message-ID:  <200711291441.04134.jhb@freebsd.org>
In-Reply-To: <20071128235042.GA40147@saturn.kn-bremen.de>
References:  <20071118020533.GA57425@saturn.kn-bremen.de> <200711270824.55839.jhb@freebsd.org> <20071128235042.GA40147@saturn.kn-bremen.de>

On Wednesday 28 November 2007 06:50:42 pm Juergen Lock wrote:
> On Tue, Nov 27, 2007 at 08:24:55AM -0500, John Baldwin wrote:
> > On Sunday 18 November 2007 05:43:45 pm Juergen Lock wrote:
> > > On Sun, Nov 18, 2007 at 03:05:33AM +0100, Juergen Lock wrote:
> > > > Ok I finally have an amd64 smp box here that i can play with, and tried
> > > > to reproduce http://www.freebsd.org/cgi/query-pr.cgi?pr=113430 - and I got
> > > > the following crash:
> > > >[...]
> > > 
> > > Ok, the crashes seem to be pretty random, I got a few more:
> > > (btw I disabled -DSMP in the kqemu build since it doesn't seem to help,
> > > and it doesn't seem to be used anywhere else.  Also I forgot to say
> > > I also have KDB_TRACE and KDB_UNATTENDED in the kernel config.  Oh and
> > > I had a few hangs too, and never could get into ddb in those cases...)
> > > 
> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd".
> > > 
> > > Unread portion of the kernel message buffer:
> > > kernel trap 12 with interrupts disabled
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 01
> > > fault virtual address	= 0x246
> > > fault code		= supervisor read instruction, page not present
> > > instruction pointer	= 0x8:0x246
> > > stack pointer	        = 0x10:0xffffffff9fae4b50
> > > frame pointer	        = 0x10:0xffffffff9fae4b80
> > > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > > 			= DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags	= resume, IOPL = 0
> > > current process		= 11 (idle: cpu1)
> > > trap number		= 12
> > > <0>
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 01
> > > fault virtual address	= 0xc011dbfb
> > > fault code		= supervisor read instruction, page not present
> > > instruction pointer	= 0x8:0xc011dbfb
> > > stack pointer	        = 0x10:0xffffffff9fae47d0
> > > frame pointer	        = 0x10:0x801de4000
> > > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > > 			= DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags	= trace trap, interrupt enabled, nested task, IOPL = 3
> > > current process		= 11 (idle: cpu1)
> > > trap number		= 12
> > > panic: page fault
> > > cpuid = 1
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > trap_fatal() at trap_fatal+0x29f
> > > trap_pfault() at trap_pfault+0x294
> > > trap() at trap+0x2ea
> > > sendsig() at sendsig+0x2aa
> > > sched_choose() at sched_choose+0x8c
> > > choosethread() at choosethread+0x2b
> > > sched_switch() at sched_switch+0x184
> > > mi_switch() at mi_switch+0x189
> > > ast() at ast+0x1e8
> > > doreti_ast() at doreti_ast+0x1f
> > > Uptime: 37m8s
> > > Physical memory: 986 MB
> > > Dumping 152 MB: 137 121 105 89 73 57 41 25 9
> > > 
> > > #0  doadump () at pcpu.h:194
> > > 194		__asm __volatile("movq %%gs:0,%0" : "=r" (td));
> > > (kgdb) bt
> > > #0  doadump () at pcpu.h:194
> > > #1  0xffffffff80484b18 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
> > > #2  0xffffffff80484f77 in panic (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/kern_shutdown.c:563
> > > #3  0xffffffff8070de6f in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> > > )
> > >     at ../../../amd64/amd64/trap.c:697
> > > #4  0xffffffff8070e254 in trap_pfault (frame=0xffffffff9fae4720, usermode=0)
> > >     at ../../../amd64/amd64/trap.c:614
> > > #5  0xffffffff8070ec0a in trap (frame=0xffffffff9fae4720)
> > >     at ../../../amd64/amd64/trap.c:383
> > > #6  0xffffffff806fcd4a in sendsig (catcher=0x405460, ksi=Variable "ksi" is not available.
> > > )
> > >     at ../../../amd64/amd64/machdep.c:326
> > > #7  0xffffffff804a16ec in sched_choose () at ../../../kern/sched_4bsd.c:1256
> > > #8  0xffffffff804a174b in choosethread () at kern_switch.c:137
> > > #9  0xffffffff804a2984 in sched_switch (td=0xffffff000209b680, 
> > >     newtd=0xffffff00021a18c0, flags=13) at ../../../kern/sched_4bsd.c:907
> > > #10 0xffffffff8048cc99 in mi_switch (flags=2, newtd=0x0)
> > >     at ../../../kern/kern_synch.c:442
> > > #11 0xffffffff804b7068 in ast (framep=0xffffffff9fae4c70)
> > >     at ../../../kern/subr_trap.c:239
> > > #12 0xffffffff806f4999 in doreti_ast () at ../../../amd64/amd64/exception.S:468
> > > #13 0x0000000811d87d74 in ?? ()
> > > #14 0x0000000000000005 in ?? ()
> > > #15 0x00000000000010e0 in ?? ()
> > > ---Type <return> to continue, or q <return> to quit---
> > > #16 0x0000000811d87d8c in ?? ()
> > > #17 0x0000000801de4000 in ?? ()
> > > #18 0x0000000741e00000 in ?? ()
> > > #19 0x000000000215dd30 in ?? ()
> > > #20 0x0000000000d49160 in ?? ()
> > > #21 0x00000000c016fdf0 in ?? ()
> > > #22 0x0000000000000000 in ?? ()
> > > #23 0x0000000801de84d0 in ?? ()
> > > #24 0xffffffffbfffffff in ?? ()
> > > #25 0x0000000000063fff in ?? ()
> > > #26 0x0000000801de4000 in ?? ()
> > > #27 0x0000000000063fff in ?? ()
> > > #28 0x0000000000000016 in ?? ()
> > > #29 0x0000000000000000 in ?? ()
> > > #30 0x0000000000000000 in ?? ()
> > > #31 0x0000000000000000 in ?? ()
> > > #32 0x000000000215dd0c in ?? ()
> > > #33 0x000000000000002b in ?? ()
> > > #34 0x0000000000000286 in ?? ()
> > > #35 0x00007fffffffb608 in ?? ()
> > > #36 0x0000000000000023 in ?? ()
> > > #37 0x0000000000000000 in ?? ()
> > > #38 0x0000000000000000 in ?? ()
> > > ---Type <return> to continue, or q <return> to quit---
> > > #39 0x0000000000c9f000 in ?? ()
> > > #40 0x00000000fffffffd in ?? ()
> > > #41 0xffffff0001080460 in ?? ()
> > > #42 0xffffff000209b680 in ?? ()
> > > #43 0x0000000000000001 in ?? ()
> > > #44 0xffffffff9fae4bb0 in ?? ()
> > > #45 0xffffffff9fae4b68 in ?? ()
> > > #46 0xffffff00010819c0 in ?? ()
> > > #47 0xffffffff804a2984 in sched_switch (td=0xd49160, newtd=0x63fff, 
> > >     flags=409599) at ../../../kern/sched_4bsd.c:907
> > > Previous frame inner to this frame (corrupt stack?)
> > > (kgdb) q
> > > iapetus# exit
> > > 
> > >  and
> > > 
> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd".
> > > 
> > > Unread portion of the kernel message buffer:
> > > kernel trap 12 with interrupts disabled
> > > 
> > > 
> > > Fatal trap 0:  while in kernel mode
> > > cpuid = 0; apic id = 00
> > > instruction pointer	= 0x4300:0xffffffff9fae41c0
> > > stack pointer	        = 0x10:0xffffffff9fae4190
> > > frame pointer	        = 0x10:0x5
> > > code segment		= base 0x0, limit 0x0, type 0x0
> > > 			= DPL 0, pres 0, long 0, def32 0, gran 0
> > > processor eflags	= resume, IOPL = 0
> > > current process		= 904 (qemu-system-x86_64)
> > > trap number		= kernel trap 12 with interrupts disabled
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address	= 0x46
> > > fault code		= supervisor read data, page not present
> > > instruction pointer	= 0x8:0xffffffff804aff9d
> > > stack pointer	        = 0x10:0xffffffff9fae3d20
> > > frame pointer	        = 0x10:0xffffffff9fae3e80
> > > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > > 			= DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags	= resume, IOPL = 0
> > > current process		= 904 (qemu-system-x86_64)
> > > trap number		= 12
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > trap_fatal() at trap_fatal+0x29f
> > > trap() at trap+0x242
> > > calltrap() at calltrap+0x8
> > > --- trap 0xc, rip = 0xffffffff804aff9d, rsp = 0xffffffff9fae3d20, rbp = 0xffffffff9fae3e80 ---
> > > kvprintf() at kvprintf+0x11ed
> > > printf() at printf+0xa4
> > > uart_z8530_class() at 0x3386
> > > swapb.6687() at swapb.6687+0x13f
> > > Uptime: 19m14s
> > > Physical memory: 986 MB
> > > Dumping 113 MB: (CTRL-C to abort)  98 82 66 (CTRL-C to abort)  50 34 18 2
> > > 
> > > #0  doadump () at pcpu.h:194
> > > 194		__asm __volatile("movq %%gs:0,%0" : "=r" (td));
> > > (kgdb) bt
> > > #0  doadump () at pcpu.h:194
> > > #1  0xffffffff80484b18 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
> > > #2  0xffffffff80484f77 in panic (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/kern_shutdown.c:563
> > > #3  0xffffffff8070de6f in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> > > )
> > >     at ../../../amd64/amd64/trap.c:697
> > > #4  0xffffffff8070eb62 in trap (frame=0xffffffff9fae3c70)
> > >     at ../../../amd64/amd64/trap.c:248
> > > #5  0xffffffff806f3e0e in calltrap () at ../../../amd64/amd64/exception.S:169
> > > #6  0xffffffff804aff9d in kvprintf (fmt=0xffffffff807febff "\n", 
> > >     func=0xffffffff804b07d0 <putchar>, arg=0xffffffff9fae3e90, radix=10, 
> > >     ap=0xffffffff9fae3ec0) at ../../../kern/subr_prf.c:819
> > > #7  0xffffffff804b0284 in printf (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/subr_prf.c:314
> > > #8  0x0000000000003386 in ?? ()
> > > #9  0xffffffff9fae4090 in ?? ()
> > > #10 0xffffffff806f4667 in Xtimerint () at apic_vector.S:103
> > > Previous frame identical to this frame (corrupt stack?)
> > > (kgdb) q
> > > iapetus# exit
> > > 
> > > Script done on Sun Nov 18 19:11:41 2007
> > > 
> > >  and:
> > > 
> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd".
> > > 
> > > Unread portion of the kernel message buffer:
> > > kernel trap 12 with interrupts disabled
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address	= 0xd
> > > fault code		= supervisor read data, page not present
> > > instruction pointer	= 0x8:0xffffffff8073d743
> > > stack pointer	        = 0x10:0xffffffff9fae4610
> > > frame pointer	        = 0x10:0x0
> > > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > > 			= DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags	= resume, IOPL = 0
> > > current process		= 948 (qemu-system-x86_64)
> > > trap number		= 12
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > trap_fatal() at trap_fatal+0x29f
> > > dmapbase() at 0xffffff0001080460
> > > dmapbase() at 0xffffff00010819c0
> > > Uptime: 23m57s
> > > Physical memory: 986 MB
> > > Dumping 152 MB: 137 121 105 89 73 57 41 25 9
> > > 
> > > #0  doadump () at pcpu.h:194
> > > 194		__asm __volatile("movq %%gs:0,%0" : "=r" (td));
> > > (kgdb) bt
> > > #0  doadump () at pcpu.h:194
> > > #1  0xffffffff80484b18 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
> > > #2  0xffffffff80484f77 in panic (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/kern_shutdown.c:563
> > > #3  0xffffffff8070de6f in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> > > )
> > >     at ../../../amd64/amd64/trap.c:697
> > > #4  0xffffff0001080460 in ?? ()
> > > #5  0xffffffff80a4d8a0 in lapics ()
> > > #6  0xffffff00010819c0 in ?? ()
> > > #7  0x0000000000000000 in ?? ()
> > > #8  0xffffff0001055600 in ?? ()
> > > #9  0xffffffff9fae44e0 in ?? ()
> > > #10 0xffffffff8044ffed in hardclock_cpu (usermode=Variable "usermode" is not available.
> > > )
> > >     at ../../../kern/kern_clock.c:224
> > > #11 0xffffff00010819c0 in ?? ()
> > > #12 0x0000000000000000 in ?? ()
> > > #13 0xffffff000215b000 in ?? ()
> > > #14 0xffffffff9fae4610 in ?? ()
> > > #15 0xffffff000215b000 in ?? ()
> > > #16 0x0000000000000000 in ?? ()
> > > #17 0xffffffff80a26430 in main_console ()
> > > #18 0x00000000000213bf in ?? ()
> > > #19 0xffffff00010819c0 in ?? ()
> > > #20 0x0000000000000000 in ?? ()
> > > ---Type <return> to continue, or q <return> to quit---
> > > #21 0x0000000000000000 in ?? ()
> > > #22 0xffffffff80a2fd78 in runq ()
> > > #23 0xffffff000215b000 in ?? ()
> > > #24 0x0000000000000001 in ?? ()
> > > #25 0xffffffff8047953c in _mtx_lock_spin (m=0xffffffff80a26430, tid=136126, 
> > >     opts=Variable "opts" is not available.
> > > ) at cpufunc.h:343
> > > Previous frame inner to this frame (corrupt stack?)
> > > (kgdb) q
> > > iapetus# exit
> > > 
> > >  kgdb still seems to be kind of confused tho, afaict runq is a variable
> > > not a function...  Anyone can make head or tail of these crashes?
> > 
> > I would check your hardware for bad RAM, etc.
> 
> Well, I doubt it's that...  It works when running a UP kernel, and it works
> on a 6.3beta2 i386 install on the same box with smp.  Also I haven't
> seen any crashes on that box yet other than from this amd64 kqemu on the
> smp kernel (it also survived building a world and kernel with -j4),
> actually I haven't received reports of kqemu/amd64/smp actually working
> for anyone.  (do you want to try? :)  I _suspect_ kqemu/amd64 is either
> doing things differently than on i386, or differences between the
> i386 and amd64 kernels trigger the problem.
> 
>  Fwiw, I have a report of kqemu/amd64 crashing the host on a linux smp host
> too, tho there only with a windows guest; linux guests (which I was testing)
> seem to work there.
> 
>  Oh and I left memtest86 running on that box overnight and it found nothing...

Well, it could be a kqemu bug, I guess, but your panics look like random
memory corruption: you have stack traces where functions are calling other
functions that they don't actually call in the source code.
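
As a rough illustration of that kind of check, here is a minimal Python
sketch that walks a KDB-style backtrace and flags caller/callee pairs that
are missing from a call table. The table ASSUMED_CALLS is a hypothetical,
hand-written subset used only for this example; it was not extracted from
the FreeBSD source, and a real check would have to be done against the
actual kernel tree. The sample trace is the scheduler portion of the first
panic quoted above.

```python
import re

# ASSUMPTION: hand-written, incomplete table of "caller -> functions it calls
# directly". It is illustrative only and was not generated from kernel source.
ASSUMED_CALLS = {
    "doreti_ast": {"ast"},
    "ast": {"mi_switch"},
    "mi_switch": {"sched_switch"},
    "sched_switch": {"choosethread"},
    "choosethread": {"sched_choose"},
    "sched_choose": {"runq_choose", "runq_remove"},
}

# Matches KDB backtrace lines such as "sendsig() at sendsig+0x2aa".
FRAME_RE = re.compile(r"^([\w.]+)\(\) at ")


def parse_frames(text):
    """Return function names from a KDB backtrace, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def suspicious_pairs(frames, calls=ASSUMED_CALLS):
    """Return (caller, callee) pairs that the call table does not allow.

    Callers that are absent from the table are skipped, not flagged.
    """
    bad = []
    for callee, caller in zip(frames, frames[1:]):
        known = calls.get(caller)
        if known is not None and callee not in known:
            bad.append((caller, callee))
    return bad


TRACE = """\
sendsig() at sendsig+0x2aa
sched_choose() at sched_choose+0x8c
choosethread() at choosethread+0x2b
sched_switch() at sched_switch+0x184
mi_switch() at mi_switch+0x189
ast() at ast+0x1e8
doreti_ast() at doreti_ast+0x1f
"""

for caller, callee in suspicious_pairs(parse_frames(TRACE)):
    print(f"{caller} -> {callee}: not in the assumed call table")
```

With this table, the only pair flagged is sched_choose -> sendsig. A flagged
pair means only that the frame deserves a look in the source; it does not
prove corruption, since the table is incomplete by construction.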

-- 
John Baldwin


