From: "Jan L. Peterson"
To: Holt Grendal
Cc: stable@freebsd.org
Subject: Re: Further Page Fault Details
Date: Fri, 15 Mar 2002 16:46:54 -0700
Message-Id: <20020315234655.0A74C422E2@mail.flipdog.com>
In-reply-to: Your message of "Fri, 15 Mar 2002 15:03:39 PST." <20020315230339.53208.qmail@web11603.mail.yahoo.com>

By chance, do you have an xl ethernet adapter in that box? Can you send a dmesg and debugger trace? Also, are you doing heavy NFS on that box? Do you have softupdates enabled? These are all problems that have been seen/mentioned on the list lately.

My own system is crashing again. It only happens under heavy NFS usage. I've been going the rounds with this box for several weeks, and have swapped entire machines (it's a laptop) as well as RAM and hard drive. I have a kernel.debug and vmcore.0 file from the most recent crash.

gdb says:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc0f8222e
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0205a7b
stack pointer           = 0x10:0xce1e1b04
frame pointer           = 0x10:0xce1e1b18
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13452 (ld)
interrupt mask          = net tty

(I'm leaving off the traceback stuff that's in the kernel debugger, in case you're wondering why the trace starts at frame 11.)

(kgdb) where
[...]
#11 0xc0205a7b in nfs_realign (pm=0xc0e90800, hsiz=20)
    at ../../nfs/nfs_socket.c:1733
#12 0xc0203dbd in nfs_receive (rep=0xc18c0380, aname=0xce1e1ba0,
    mp=0xce1e1ba4) at ../../nfs/nfs_socket.c:746
#13 0xc0203e0d in nfs_reply (myrep=0xc18c0380)
    at ../../nfs/nfs_socket.c:792
#14 0xc0204502 in nfs_request (vp=0xcdfc8600, mrest=0xc0e49200, procnum=7,
    procp=0xce082080, cred=0xc18cb900, mrp=0xce1e1c94, mdp=0xce1e1c98,
    dposp=0xce1e1c9c) at ../../nfs/nfs_socket.c:1080
#15 0xc02123f9 in nfs_writerpc (vp=0xcdfc8600, uiop=0xce1e1d00,
    cred=0xc18cb900, iomode=0xce1e1cf0, must_commit=0xce1e1cec)
    at ../../nfs/nfs_vnops.c:1197
#16 0xc01eb7d0 in nfs_doio (bp=0xc68f5e58, cr=0xc18cb900, p=0xce082080)
    at ../../nfs/nfs_bio.c:1530
#17 0xc021e3c0 in nfs_strategy (ap=0xce1e1d60)
    at ../../nfs/nfs_vnops.c:2752
#18 0xc021ecf4 in nfs_writebp (bp=0xc68f5e58, force=1, procp=0xce082080)
    at vnode_if.h:944
#19 0xc021ec16 in nfs_bwrite (ap=0xce1e1dd8)
    at ../../nfs/nfs_vnops.c:3117
#20 0xc01eacc4 in nfs_write (ap=0xce1e1e64) at vnode_if.h:1193
#21 0xc01a0cea in vn_write (fp=0xc18c0140, uio=0xce1e1ed4, cred=0xc18cb900,
    flags=0, p=0xce082080) at vnode_if.h:363
#22 0xc017b5f1 in dofilewrite (p=0xce082080, fp=0xc18c0140, fd=7,
    buf=0x80c4200, nbyte=20, offset=-1, flags=0) at ../../sys/file.h:162
#23 0xc017b4aa in write (p=0xce082080, uap=0xce1e1f80)
    at ../../kern/sys_generic.c:329
#24 0xc02be441 in syscall2 (frame={tf_fs = 47, tf_es = 47,
      tf_ds = 47, tf_edi = 134984096, tf_esi = 135021056,
      tf_ebp = -1077940628, tf_isp = -836886572, tf_ebx = 134984096,
      tf_edx = 134984096, tf_ecx = 134984096, tf_eax = 4, tf_trapno = 7,
      tf_err = 2, tf_eip = 134815572, tf_cs = 31, tf_eflags = 663,
      tf_esp = -1077940672, tf_ss = 47}) at ../../i386/i386/trap.c:1167
#25 0xc02af515 in Xint0x80_syscall ()
(kgdb) up 11
#11 0xc0205a7b in nfs_realign (pm=0xc0e90800, hsiz=20)
    at ../../nfs/nfs_socket.c:1733
1733                            MCLGET(n, M_WAIT);
(kgdb) list
1728
1729            while ((m = *pm) != NULL) {
1730                    if ((m->m_len & 0x3) || (mtod(m, intptr_t) & 0x3)) {
1731                            MGET(n, M_WAIT, MT_DATA);
1732                            if (m->m_len >= MINCLSIZE) {
1733                                    MCLGET(n, M_WAIT);
1734                            }
1735                            n->m_len = 0;
1736                            break;
1737                    }
(kgdb) p *n
$1 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0,
    mh_data = 0xc0e94114 "\020\002\b\001\nE\002\024", mh_len = 16,
    mh_type = 1, mh_flags = 0}, M_dat = {MH = {MH_pkthdr = {
        rcvif = 0x1080210, len = 335693066, header = 0x0, csum_flags = 0,
        csum_data = 335693066, aux = 0x5002450a}, MH_dat = {MH_ext = {
          ext_buf = 0x3194635
, ext_free = 0, ext_size = 33554432, ext_ref = 0xa3860100},
[... ton of other hud removed ...]

Notice that ext_buf "Address out of bounds" up there? I'd bet a nickel that's the problem. What is causing that bad buffer to get on the list, I don't know.

Any ideas from kernel developers? Anyone want more info out of this vmcore?

-jan-
-- 
Jan L. Peterson
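
For anyone reading the listing without the source handy: the test on line 1730 is what decides whether nfs_realign bothers to allocate a replacement mbuf at all. It fires when either the mbuf's length or its data address is not a multiple of 4. Below is a minimal userland sketch of that same test, assuming a plain buffer in place of an mbuf. The name needs_realign is made up for illustration and is not a kernel function; the real code uses m->m_len and mtod() as shown in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the FreeBSD kernel): applies the same
 * condition as line 1730 of the listing to an ordinary buffer.
 * Returns 1 when the length or the start address is not a multiple of 4,
 * 0 when both are 4-byte aligned.
 */
static int
needs_realign(const void *data, size_t len)
{
	/* Low two bits set in the length: not a multiple of 4. */
	if ((len & 0x3) != 0)
		return 1;
	/* Low two bits set in the address: data is not 4-byte aligned. */
	if (((uintptr_t)data & 0x3) != 0)
		return 1;
	return 0;
}
```

With a 4-byte-aligned buffer, a length of 16 or 20 passes the check, while a length of 18, or the same buffer offset by one byte, does not. This only illustrates the condition; it says nothing about why the allocation on line 1733 faults.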