From: Frank Mayhar
To: hackers@freebsd.org
Cc: FreeBSD-Current
Organization: Exit Consulting
Date: Sun, 15 Jan 2006 12:40:02 -0800
Message-Id: <1137357602.1362.23.camel@realtime.exit.com>
Subject: Panic in nfs_putpages() on 6-stable.
Reply-To: frank@exit.com

I've run into this panic a couple of times over the last few days while trying to rebuild ports from an NFS-mounted /usr/ports filesystem. It happened again today, and this time I had a chance to look at the dump.

The problem is a null pointer dereference in nfs_putpages() when it reads np->n_size. It turns out that vp->v_data, which is where np comes from, is already NULL on entry to this routine. The stack trace shows why:

#6 0xc0674e4a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc05eb030 in nfs_putpages (ap=0xe81c6a14) at /usr/src/sys/nfsclient/nfs_bio.c:301
#8 0xc0691148 in VOP_PUTPAGES_APV (vop=0x1000, a=0xe81c6a14) at vnode_if.c:2164
#9 0xc064fd8e in vnode_pager_putpages (object=0xcafaa840, m=0x1000, count=0x1000, sync=0x5, rtvals=0x1000) at vnode_if.h:1119
During symbol reading, Attribute value is not a constant (DW_FORM_ref4).
#10 0xc064b99e in vm_pageout_flush (mc=0xe81c6ab0, count=0x1, flags=0x5) at vm_pager.h:147
#11 0xc0647d0c in vm_object_page_collect_flush (object=0xcafaa840, p=0xc19e5218, curgeneration=0x0, pagerflags=0x5) at /usr/src/sys/vm/vm_object.c:950
#12 0xc0647800 in vm_object_page_clean (object=0xcafaa840, start=0x0, end=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/vm/vm_object.c:753
#13 0xc0647525 in vm_object_terminate (object=0xcafaa840) at /usr/src/sys/vm/vm_object.c:608
#14 0xc064e5ad in vnode_destroy_vobject (vp=0xcb58c110) at /usr/src/sys/vm/vnode_pager.c:166
#15 0xc05ee075 in nfs_reclaim (ap=0x1000) at /usr/src/sys/nfsclient/nfs_node.c:247
#16 0xc069095e in VOP_RECLAIM_APV (vop=0x1000, a=0xe81c6c90) at vnode_if.c:1589
#17 0xc0587aa5 in vgonel (vp=0xcb58c110) at vnode_if.h:818
#18 0xc0584ac2 in vlrureclaim (mp=0xc9b2e400) at /usr/src/sys/kern/vfs_subr.c:612
#19 0xc0584e8b in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:725
#20 0xc052034c in fork_exit (callout=0xc0584d00 , arg=0x0, frame=0xe81c6d38) at /usr/src/sys/kern/kern_fork.c:789
#21 0xc0674eac in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208

In nfs_reclaim(), just before calling vnode_destroy_vobject(), the code zfrees the nfsnode and clears vp->v_data. Later, down in the guts of vm_object.c, it tries to flush the object's pages through vm_object_terminate(), vm_object_page_clean() and the pager, ending up in nfs_putpages(). By then v_data is already NULL, so it goes boom.

Now, why does it do the zfree/clear before vnode_destroy_vobject()? Is it assuming that no pages associated with this vnode still need to be flushed? Should there be any at that point? I looked at some other file systems, and they do the same thing.

The obvious fix is to move the zfree/clear to after the vnode_destroy_vobject() call. But if there should be no pages left to flush on the vnode at this point, that would just hide the real problem.

I can keep looking at the code to answer my question, but I thought I would ask here first in case someone knows the answer right away.

Thanks.
-- 
Frank Mayhar frank@exit.com	http://www.exit.com/
Exit Consulting			http://www.gpsclock.com/
				http://www.exit.com/blog/frank/