From owner-freebsd-hackers Wed Oct 6 17:23:52 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from dt011n66.san.rr.com (dt011n66.san.rr.com [204.210.13.102]) by hub.freebsd.org (Postfix) with ESMTP id BFE7414CA8 for ; Wed, 6 Oct 1999 17:23:34 -0700 (PDT) (envelope-from Doug@gorean.org)
Received: from localhost (doug@localhost) by dt011n66.san.rr.com (8.9.3/8.8.8) with ESMTP id RAA57809; Wed, 6 Oct 1999 17:23:32 -0700 (PDT) (envelope-from Doug@gorean.org)
Date: Wed, 6 Oct 1999 17:23:32 -0700 (PDT)
From: Doug
X-Sender: doug@dt011n66.san.rr.com
To: freebsd-hackers@FreeBSD.ORG
Cc: Luoqi Chen
Subject: New crash, NFS/multiple frees related. (Was: zalloci/pv_entry problem)
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Same machine crashed, this time at a different place. I'm starting to
wonder if there may be a hardware issue with this machine, but the errors
I'm seeing in the logs are similar enough to the show-stopping errors I had
when all the machines were running the same newer -current, so maybe it's
just bad luck. In any case, here is the latest data. Any input would be
appreciated. I can resend the pertinent details to anyone who needs them.
Thanks,

Doug

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000005; cpuid = 0; lapic.id = 01000000
fault virtual address   = 0x4800c040
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01e2c40
stack pointer           = 0x10:0xdc838a40
frame pointer           = 0x10:0xdc838a44
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 97652 (miva)
interrupt mask          = net tty bio cam  <- SMP: XXX
kernel: type 12 trap, code=0
panic: free: multiple frees
mp_lock = 00000001; cpuid = 0; lapic.id = 01000000
Debugger("panic")
Stopped at      Debugger+0x37:  movl    $0,in_Debugger

db> trace
Debugger(c0216d32) at Debugger+0x37
panic(c02162ff,c20e2440,dcef4bc0,dc838c94,4800c040) at panic+0xa8
free(c20e2440,c023b580,dd1c6e40,dc838c70,c0154ee7) at free+0xd3
cache_zap(c20e2440) at cache_zap+0xb3
cache_purge(dd1c6e40,20,dd1e2c60,c1c7a040,dc838c94) at cache_purge+0x37
getnewvnode(2,c1d3ce00,c1c58d00,dc838cc8,20) at getnewvnode+0x27e
nfs_nget(c1d3ce00,c111d860,20,dc838d64,dc838e1c) at nfs_nget+0x107
nfs_create(dc838e1c,0,dc838f80,fffffffb,da07aec0) at nfs_create+0x1673
vn_open(dc838eec,20a,180,da07aec0,c0239d4c) at vn_open+0xfa
open(da07aec0,dc838f80,1,80d5d80,bfbfd5f0) at open+0xbb
syscall(2f,2f,2f,bfbfd5f0,80d5d80) at syscall+0x19e
Xint0x80_syscall() at Xint0x80_syscall+0x31

I actually had better luck with x/s than I did with x/l, so here goes:

db> x/s 0xc02162ff
__set_sysuninit_set_sym_M_FREE_uninit_sys_uninit+0x7b: free: multiple frees
0xc20e2440: @$\016\302\264Y\302\301@\026\372\301H\335"\302
db> x/s 0xc023b580
M_CACHE: \340\265#\300\200'*
db> x/s 0xdd1c6e40
0xdd1c6e40:
db> x/l 0xdd1c6e40
0xdc838c70: \234\214\203\334\202\203\025\300@n\034\335
db> x/s 0xc0154ee7
cache_purge+0x37: \203\304\004\203\273\210
0xc20e2440: @$\016\302\264Y\302\301@\026\372\301H\335"\302
db> x/s 0xdcef4bc0
0xdcef4bc0:
db> x/l 0xdcef4bc0
db> x/s 0xdc838c94
0xdc838c94:
db> x/l 0xdc838c94
0xdd1e2c60: \300\346\343\335`3\001\335
0xc1c7a040: \3617\335\300\346\203\335@\233g\335`\014\237\335@+\361\334@k%\335@\242B\335\300V\303\334@[\322\334\2405\032\335\300\266\305\334
db> x/s 0xc1d3ce00
0xc1d3ce00:
db> x/l 0xc1d3ce00
db> x/s 0xc1c58d00
0xc1c58d00: \300f\025\300\250f\025\300 \320\032\300\024\323\031\300h\327\031\300\370\327\031\300\300\272\027\300\250f\025\300xh\025\300\334\327\032\300\250\004\032\300\334\273\027\300\334\333\031\300\250f\025\300 \245\027\300\250f\025\300\364\274\027\300\260\327\032\300@\317\032\300\250f\025\300\3745\032\300\014g\032\300\204\362\031\300P\210\032\300\3105\032\300\024\320\032\300T\325\031\300\250f\025\300 h\025\300X\247\032\300\320\004\032\300\004Q\032\300\250X\032\300x\240\032\300\224\317\032\300\220q\032\300\250f\025\300\360\224\027\300H\227\027\300\250f\025\300 \330\032\300\240f\025\300\250f\025\300\250f\025\300\250f\025\300
0xdc838cc8: \200\344\032\301\364\215\203\334oL\032\300

On Tue, 5 Oct 1999, Doug wrote:

> On Mon, 4 Oct 1999, Luoqi Chen wrote:
>
> > If you have a crash dump, could you look at the 4 longwords starting
> > at address 0xc02698c0? It seemed to be an accounting problem. Do you
> > by any chance use any kld module? zalloc() calls from within a module
> > do not lock the vm_zone data structure, which is fine for UP but
> > dangerous for SMP.
>
> Well the same machine crashed in the same place, so I can look at
> the current crash for you:
>
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> fault virtual address   = 0x800018
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01d1107
> stack pointer           = 0x10:0xdc98fe28
> frame pointer           = 0x10:0xdc98fe34
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 41226 (httpd)
> interrupt mask          = net tty bio cam  <- SMP: XXX
> kernel: type 12 trap, code=0
> Stopped at      zalloci+0x33:   movl    0(%edx),%eax
>
> db> trace
> zalloci(c02698c0,dc98fe58,c01f24c3,da07e7a4,91bb000) at zalloci+0x33
> get_pv_entry(da07e7a4,91bb000,ffc246ec,0,dc98fe90) at get_pv_entry+0x4a
> pmap_insert_entry(da07e7a4,91bb000,c0572b60,8206000) at pmap_insert_entry+0x1f
> pmap_copy(da07e7a4,dc8eeb64,80c6000,1f85000,80c6000) at pmap_copy+0x1a0
> vm_map_copy_entry(dc8eeb00,da07e740,dc8327a8,dc861ed8) at vm_map_copy_entry+0xdf
> vmspace_fork(dc8eeb00,dc8e9ce0,dc8e9ce0,bfbfddbc,dc98ff30) at vmspace_fork+0x1d3
> vm_fork(dc8ea940,dc8e9ce0,14) at vm_fork+0x2f
> fork1(dc8ea940,14,dc98ff48,dc8ea940,9) at fork1+0x621
> fork(dc8ea940,dc98ff80,805b36c,30,bfbfddbc) at fork+0x16
> syscall(c01e002f,2f,2f,bfbfddbc,30) at syscall+0x19e
> Xint0x80_syscall() at Xint0x80_syscall+0x31
>
> One other thing that I thought of related to the SMP issue is that
> these are Intel N440BX motherboards, and the BIOS has an option to set the
> "Multi-Processor Specification" or some such that was set to 1.4, with the
> other option being 1.1. Would it be better to set it to 1.1, or was my
> assumption that FreeBSD would ignore that setting anyways correct?
>
> I put more DDB stuff from this crash at
> http://doug.simplenet.com/DDB2.txt, let me know if there is anything else
> I need to do. Remote GDB is an option here if you think that'd be a better
> tool. I'll check to see if I'm getting dumps when the machine comes back.
>
> Thanks,
>
> Doug
>
--
"Stop it, I'm gettin' misty."

- Mel Gibson as Porter, "Payback"

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message