From owner-freebsd-hackers Wed Mar 10 23:33: 4 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
    by hub.freebsd.org (Postfix) with ESMTP id CB7E2150F9
    for ; Wed, 10 Mar 1999 23:33:02 -0800 (PST)
    (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
    by apollo.backplane.com (8.9.3/8.9.1) id XAA61853;
    Wed, 10 Mar 1999 23:32:43 -0800 (PST)
    (envelope-from dillon)
Date: Wed, 10 Mar 1999 23:32:43 -0800 (PST)
From: Matthew Dillon
Message-Id: <199903110732.XAA61853@apollo.backplane.com>
To: Greg Rowe
Cc: David Greenman, freebsd-hackers@FreeBSD.ORG
Subject: Re: SMP Woes
References:
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:Hi David,
:
:   I built a kernel with debugging and had one of the guys here take a look
:through the crash dump. First, is the new "Fatal Trap" DDB output and then his
:comments on what he saw. Anything else we should try ??
:
:Greg
:
:On 10-Mar-99 David Greenman wrote:
:>    There are at least two things that are strange in the following. First,
:> there is no call to bzero() from zalloci() (or in zlock(), _zalloc(), and
:> zunlock(), which are inlined). Second, the parameters to generic_bzero()
:> indicate that 0 bytes are to be zeroed. It's also strange that the address

Well, zalloci() can call _zget(), which can call bzero(). Maybe the
underscore in the _zget() is preventing DDB from listing it.

The call offset in zalloci() in the trace below is zalloci+0x29. If you
disassemble zalloci, you will note that this is the call-return point
for _zget:

    0xf020b59f :    pushl  %ebx
    0xf020b5a0 :    call   0xf020b5f8 <_zget>
    0xf020b5a5 :    movl   %eax,%ebx

The generic_bzero() call arguments are either bogus, or the stack length
argument has been modified by generic_bzero(). The fault virtual address
is 0, but vm_page_alloc() seems to properly test for m == NULL so this
should not be possible.
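
The call-return reasoning above can be checked with a few lines of C. This is only a sketch: the helper names (call_return_addr, symbol_base, is_return_point) are made up for illustration and are not kernel or debugger functions, and the 5-byte call length is read off the disassembly (the call sits at 0xf020b5a0 and the next instruction at 0xf020b5a5).

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of an i386 near call with a 32-bit relative
 * displacement (opcode 0xE8 followed by rel32). */
#define CALL_REL32_LEN 5u

/* Return address pushed by a call instruction located at call_addr. */
static uint32_t call_return_addr(uint32_t call_addr)
{
    return call_addr + CALL_REL32_LEN;
}

/* Start of a function, given an address inside it and the offset the
 * debugger printed for that address (as in "zalloci+0x29"). */
static uint32_t symbol_base(uint32_t addr, uint32_t offset)
{
    assert(offset <= addr);
    return addr - offset;
}

/* Nonzero if the backtrace entry "sym_base+offset" is the return point
 * of the call instruction located at call_addr. */
static int is_return_point(uint32_t sym_base, uint32_t offset,
                           uint32_t call_addr)
{
    return sym_base + offset == call_return_addr(call_addr);
}
```

With the addresses from the disassembly, call_return_addr(0xf020b5a0) gives 0xf020b5a5, and symbol_base(0xf020b5a5, 0x29) gives 0xf020b57c as the start of zalloci. That start address is derived here from the trace offset; it does not appear in the trace itself.
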

It would be useful to print out the contents of *m from the _zget frame,
and also the *z structure.

--

If this machine has a large amount of memory, it may have overrun its
KVA allocation. This can also happen if you have a large 'maxusers' in
the kernel config. If so, try reducing maxusers to 128 or less.

                                        -Matt
                                        Matthew Dillon

:Fatal trap 12: page fault while in kernel mode
:mp_lock = 03000002; cpuid = 3; lapic.id = 02000000
:fault virtual address   = 0x0
:fault code              = supervisor write, page not present
:instruction pointer     = 0x8:0xf020ec9f
:stack pointer           = 0x10:0xfe5d2c34
:frame pointer           = 0x10:0xfe5d2c58
:code segment            = base 0x0, limit 0xfffff, type 0x1b
:                        = DPL 0, pres 1, def32 1, gran 1
:processor eflags        = interrupt enabled, resume, IOPL = 0
:current process         = 243 (cpio)
:interrupt mask          = net tty bio cam <- SMP: XXX
:kernel: type 12 trap, code=0
:Stopped at generic_bzero+0xf: repe stosl %es:(%edi)
:db> trace
:generic_bzero(f3283f80,0,f47f7000,fe5d2c90,fe5d2c98) at generic_bzero+0xf
:zalloci(f3283f80,f4880100,f47f7000,6f7f9,fe540ec0) at zalloci+0x29
:getnewvnode(1,f33f0400,f3266200,fe5d2cfc,100) at getnewvnode+0x2f8
:...
:....snip....gdb) frame 11
:#11 0xf01f1869 in zalloci (z=0xf3283f80) at ../../vm/vm_zone.h:85
:85 return _zget(z);
:(kgdb) print _zget
:$6 = {void *(struct vm_zone *)} 0xf01f18d4 <_zget>
:(kgdb) print generic_bzero
:$7 = {} 0xf020ec90
:
:So, it doesn't look like a 1-bit off error....
:
:Because zget/zalloc is an inline function, I can't seem to get gdb to
:print simple_lock or simple_unlock.
:
:-Chris
:
:
:Greg Rowe US WEST - Internet Service Operations
:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message