From owner-freebsd-stable  Fri Mar 15 15:47: 8 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from mail.flipdog.com (12-254-245-65.client.attbi.com [12.254.245.65])
	by hub.freebsd.org (Postfix) with ESMTP id 5EA1637B402
	for <stable@freebsd.org>; Fri, 15 Mar 2002 15:47:01 -0800 (PST)
Received: from aurora (localhost [127.0.0.1])
	by mail.flipdog.com (Postfix) with ESMTP
	id 0A74C422E2; Fri, 15 Mar 2002 16:46:54 -0700 (MST)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
From: "Jan L. Peterson" <jlp@softhome.net>
To: Holt Grendal <holtor@yahoo.com>
Cc: stable@freebsd.org
Subject: Re: Further Page Fault Details 
X-face: p=61=y<.Il$z+k*y~"j>%c[8R~8{j3WTnaSd-'RyC>t.Ub>AAm\zYA#5JF
 +W=G?EI+|EI);]=fs_MOfKN0n9`OlmB[1^0;L^64K5][nOb&gv/n}p@mm06|J|WNa
 asp7mMEw0w)e_6T~7v-\]yHKvI^1}[2k)]
References: <20020315230339.53208.qmail@web11603.mail.yahoo.com>
In-reply-to: Your message of "Fri, 15 Mar 2002 15:03:39 PST."
             <20020315230339.53208.qmail@web11603.mail.yahoo.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 15 Mar 2002 16:46:54 -0700
Message-Id: <20020315234655.0A74C422E2@mail.flipdog.com>
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

By chance, do you have an xl Ethernet adapter in that box?  Can you 
send a dmesg and a debugger trace?  Also, are you doing heavy NFS on that 
box?  Do you have softupdates enabled?  Each of these has come up in 
crash reports on the list lately.

My own system is crashing again, and only under heavy NFS usage.
I've been going round and round with this box for several weeks and have
swapped entire machines (it's a laptop) as well as the RAM and hard drive.
I have a kernel.debug and a vmcore.0 from the most recent crash.

gdb says:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc0f8222e
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0205a7b
stack pointer           = 0x10:0xce1e1b04
frame pointer           = 0x10:0xce1e1b18
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13452 (ld)
interrupt mask          = net tty 

(In case you're wondering why the trace starts at frame 11, I've left off
the traceback frames that are inside the kernel debugger.)

(kgdb) where
[...]
#11 0xc0205a7b in nfs_realign (pm=0xc0e90800, hsiz=20)
    at ../../nfs/nfs_socket.c:1733
#12 0xc0203dbd in nfs_receive (rep=0xc18c0380, aname=0xce1e1ba0, mp=0xce1e1ba4)
    at ../../nfs/nfs_socket.c:746
#13 0xc0203e0d in nfs_reply (myrep=0xc18c0380) at ../../nfs/nfs_socket.c:792
#14 0xc0204502 in nfs_request (vp=0xcdfc8600, mrest=0xc0e49200, procnum=7, 
    procp=0xce082080, cred=0xc18cb900, mrp=0xce1e1c94, mdp=0xce1e1c98, 
    dposp=0xce1e1c9c) at ../../nfs/nfs_socket.c:1080
#15 0xc02123f9 in nfs_writerpc (vp=0xcdfc8600, uiop=0xce1e1d00, 
    cred=0xc18cb900, iomode=0xce1e1cf0, must_commit=0xce1e1cec)
    at ../../nfs/nfs_vnops.c:1197
#16 0xc01eb7d0 in nfs_doio (bp=0xc68f5e58, cr=0xc18cb900, p=0xce082080)
    at ../../nfs/nfs_bio.c:1530
#17 0xc021e3c0 in nfs_strategy (ap=0xce1e1d60) at ../../nfs/nfs_vnops.c:2752
#18 0xc021ecf4 in nfs_writebp (bp=0xc68f5e58, force=1, procp=0xce082080)
    at vnode_if.h:944
#19 0xc021ec16 in nfs_bwrite (ap=0xce1e1dd8) at ../../nfs/nfs_vnops.c:3117
#20 0xc01eacc4 in nfs_write (ap=0xce1e1e64) at vnode_if.h:1193
#21 0xc01a0cea in vn_write (fp=0xc18c0140, uio=0xce1e1ed4, cred=0xc18cb900, 
    flags=0, p=0xce082080) at vnode_if.h:363
#22 0xc017b5f1 in dofilewrite (p=0xce082080, fp=0xc18c0140, fd=7, 
    buf=0x80c4200, nbyte=20, offset=-1, flags=0) at ../../sys/file.h:162
#23 0xc017b4aa in write (p=0xce082080, uap=0xce1e1f80)
    at ../../kern/sys_generic.c:329
#24 0xc02be441 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
      tf_edi = 134984096, tf_esi = 135021056, tf_ebp = -1077940628, 
      tf_isp = -836886572, tf_ebx = 134984096, tf_edx = 134984096, 
      tf_ecx = 134984096, tf_eax = 4, tf_trapno = 7, tf_err = 2, 
      tf_eip = 134815572, tf_cs = 31, tf_eflags = 663, tf_esp = -1077940672, 
      tf_ss = 47}) at ../../i386/i386/trap.c:1167
#25 0xc02af515 in Xint0x80_syscall ()

(kgdb) up 11
#11 0xc0205a7b in nfs_realign (pm=0xc0e90800, hsiz=20)
    at ../../nfs/nfs_socket.c:1733
1733                                    MCLGET(n, M_WAIT);

(kgdb) list
1728
1729            while ((m = *pm) != NULL) {
1730                    if ((m->m_len & 0x3) || (mtod(m, intptr_t) & 0x3)) {
1731                            MGET(n, M_WAIT, MT_DATA);
1732                            if (m->m_len >= MINCLSIZE) {
1733                                    MCLGET(n, M_WAIT);
1734                            }
1735                            n->m_len = 0;
1736                            break;
1737                    }

(kgdb) p *n
$1 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, 
    mh_data = 0xc0e94114 "\020\002\b\001\nE\002\024", mh_len = 16, 
    mh_type = 1, mh_flags = 0}, M_dat = {MH = {MH_pkthdr = {rcvif = 0x1080210, 
        len = 335693066, header = 0x0, csum_flags = 0, csum_data = 335693066, 
        aux = 0x5002450a}, MH_dat = {MH_ext = {
          ext_buf = 0x3194635 <Address 0x3194635 out of bounds>, ext_free = 0, 
          ext_size = 33554432, ext_ref = 0xa3860100}, 
[... tons of other crud removed ...]

Notice the "Address out of bounds" on ext_buf up there?  I'd bet a nickel 
that's the problem.  What I don't know is what's putting that bad buffer on 
the list.  Any ideas from kernel developers?  Does anyone want more info 
out of this vmcore?  

	-jan-
-- 
Jan L. Peterson
<jlp@softhome.net>