From owner-freebsd-bugs Thu Feb 12 17:00:06 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id RAA13444 for freebsd-bugs-outgoing; Thu, 12 Feb 1998 17:00:06 -0800 (PST) (envelope-from owner-freebsd-bugs@FreeBSD.ORG)
Received: (from gnats@localhost) by hub.freebsd.org (8.8.8/8.8.8) id RAA13420; Thu, 12 Feb 1998 17:00:04 -0800 (PST) (envelope-from gnats)
Received: (from nobody@localhost) by hub.freebsd.org (8.8.8/8.8.8) id QAA13298; Thu, 12 Feb 1998 16:59:28 -0800 (PST) (envelope-from nobody)
Message-Id: <199802130059.QAA13298@hub.freebsd.org>
Date: Thu, 12 Feb 1998 16:59:28 -0800 (PST)
From: gallatin@cs.duke.edu
To: freebsd-gnats-submit@FreeBSD.ORG
X-Send-Pr-Version: www-1.0
Subject: kern/5731: executables wedge on "vmopar" when built in fs mounted via NFSv3 from DU4.0B
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number: 5731
>Category: kern
>Synopsis: executables wedge on "vmopar" when built in fs mounted via NFSv3 from DU4.0B
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Feb 12 17:00:01 PST 1998
>Last-Modified:
>Originator: Andrew Gallatin
>Organization:
Duke University, Department of Computer Science
>Release: 2.2.5-STABLE
>Environment:
FreeBSD rain.cs.duke.edu 2.2.5-STABLE FreeBSD 2.2.5-STABLE #11: Thu Feb 12 18:51:05 EST 1998 gallatin@treefrog.cs.duke.edu:/usr/project/ari_scratch2/gallatin/freebsd-compiles/compile/TPZ i386
>Description:
If you run certain executables immediately after writing them to a
partition mounted via NFSv3 from a Digital UNIX (4.0B) NFS server,
they sleep infinitely on "vmopar". Typically this occurs when one
executes a large program immediately after linking it.

Here is a stack trace of a wedged job:

# gdb -k kernel /dev/mem
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd), Copyright 1996 Free Software Foundation, Inc...
IdlePTD 295000
current pcb at 7257000
#0  mi_switch () at ../../kern/kern_synch.c:628
628             microtime(&runtime);
(kgdb) proc pidhashtbl[220]->lh_first
current pcb at f5a41000
(kgdb) where
#0  mi_switch () at ../../kern/kern_synch.c:628
#1  0xf011f3b5 in tsleep (ident=0xf041cb50, priority=4, wmesg=0xf01b98d4 "vmopar", timo=0) at ../../kern/kern_synch.c:391
#2  0xf01b9a9c in vm_object_page_remove (object=0xf1903680, start=0, end=1540, clean_only=1) at ../../vm/vm_object.c:1261
#3  0xf013a090 in vinvalbuf (vp=0xf1903700, flags=1, cred=0xf18f6d00, p=0xf18b2200, slpflag=0, slptimeo=0) at ../../kern/vfs_subr.c:540
#4  0xf015e278 in nfs_vinvalbuf (vp=0xf1903700, flags=1, cred=0xf18f6d00, p=0xf18b2200, intrflg=1) at ../../nfs/nfs_bio.c:799
#5  0xf015cd60 in nfs_bioread (vp=0xf1903700, uio=0xefbffe48, ioflag=8, cred=0xf18f6d00, getpages=1) at ../../nfs/nfs_bio.c:213
#6  0xf015ca98 in nfs_getpages (ap=0xefbffe84) at ../../nfs/nfs_bio.c:130
#7  0xf01beaa8 in vnode_pager_getpages (object=0xf1903680, m=0xefbfff3c, count=2, reqpage=0) at vnode_if.h:1063
#8  0xf01bd657 in vm_pager_get_pages (object=0xf1903680, m=0xefbfff3c, count=2, reqpage=0) at ../../vm/vm_pager.c:188
#9  0xf01b32f6 in vm_fault (map=0xf18fe900, vaddr=6303744, fault_type=3 '\003', change_wiring=0) at ../../vm/vm_fault.c:426
#10 0xf01ccdcc in trap_pfault (frame=0xefbfffbc, usermode=1) at ../../i386/i386/trap.c:633
#11 0xf01cc95b in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = 0, tf_esi = -272640436, tf_ebp = -272640440, tf_isp = -272629788, tf_ebx = -272640432, tf_edx = -272640424, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 6, tf_eip = 4168, tf_cs = 31, tf_eflags = 66054, tf_esp = -272640452, tf_ss = 39}) at ../../i386/i386/trap.c:239
#12 0x1048 in ?? ()

The page in question has had its state set by nfs_getpages() at
frame #6 (p->busy++ and p->flags &= ~PG_BUSY). This state causes
vm_object_page_remove() to sleep, producing a deadlock: the sleep
happens underneath nfs_getpages(), so nfs_getpages() cannot clear the
state until the call returns.

This path is taken in nfs_bioread() because the nfsnode's n_mtime is
not equal to vattr.va_mtime.tv_sec. I suspect that what's happening
is that a write is in progress (the file was just closed by the
linker), and the nfsnode's n_mtime hasn't yet been updated. It
appears Digital UNIX is replying to the read's getattr() before the
write's setattr(), so the nfsnode's n_mtime differs from the value
returned by the getattr().

There is a tcpdump of the transactions (started immediately after the
link, and before the execution) between the server ("storm") and the
client ("rain") at ftp://ftp.cs.duke.edu/pub/gallatin/nfs-bug/log.gz
>How-To-Repeat:
To repeat the problem, compile and link the example program at
ftp://ftp.cs.duke.edu/pub/gallatin/nfs-bug/example.tar.gz
in a partition NFSv3 mounted from a DU4.0B server.
>Fix:
I don't know enough about the NFSv3 spec to really fix this, but a
workaround which appears to work here is to be less aggressive, and
force buffers to be committed on close:

*** /usr/project/spider1/FreeBSD-2.2-STABLE/src/sys/nfs/nfs_vnops.c Wed May 28 14:26:45 1997
--- nfs/nfs_vnops.c Thu Feb 12 18:50:01 1998
***************
*** 595,601 ****
      if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NQNFS) == 0 &&
          (np->n_flag & NMODIFIED)) {
          if (NFS_ISV3(vp)) {
!             error = nfs_flush(vp, ap->a_cred, MNT_WAIT, ap->a_p, 0);
              np->n_flag &= ~NMODIFIED;
          } else
              error = nfs_vinvalbuf(vp, V_SAVE, ap->a_cred, ap->a_p, 1);
--- 595,601 ----
      if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NQNFS) == 0 &&
          (np->n_flag & NMODIFIED)) {
          if (NFS_ISV3(vp)) {
!             error = nfs_flush(vp, ap->a_cred, MNT_WAIT, ap->a_p, 1);
              np->n_flag &= ~NMODIFIED;
          } else
              error = nfs_vinvalbuf(vp, V_SAVE, ap->a_cred, ap->a_p, 1);
>Audit-Trail:
>Unformatted:
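
For readers who want to see the shape of the self-wait described in the Description without reading kernel source, here is a minimal user-space sketch. It is not FreeBSD code: toy_page, toy_node, toy_getpages() and toy_remove_would_sleep() are hypothetical names invented for illustration. It assumes only what the report states, namely that the pagein path marks the page busy before the mtime comparison, and that removing a busy page sleeps. Where the kernel would sleep on "vmopar", the sketch returns a result code instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; these are NOT the FreeBSD vm_page or nfsnode types. */
struct toy_page {
    int busy;              /* models the p->busy count held during a pagein */
};

struct toy_node {
    long cached_mtime;     /* models the nfsnode's n_mtime */
    struct toy_page page;
};

enum toy_result { TOY_READ_OK, TOY_SELF_WAIT };

/* Models the condition under which page removal sleeps: the page is busy. */
static bool toy_remove_would_sleep(const struct toy_page *p)
{
    return p->busy > 0;
}

/* Models the pagein path entering the read path with the page already busy. */
static enum toy_result toy_getpages(struct toy_node *np, long server_mtime)
{
    enum toy_result res = TOY_READ_OK;

    np->page.busy++;                       /* pagein marks the page busy */

    if (np->cached_mtime != server_mtime) {
        /* Cached mtime looks stale, so the read path invalidates buffers. */
        if (toy_remove_would_sleep(&np->page)) {
            /* The kernel would sleep here; the toy reports it instead. */
            res = TOY_SELF_WAIT;
        }
    }

    assert(np->page.busy > 0);
    np->page.busy--;                       /* reached only because the toy never sleeps */
    return res;
}
```

Calling toy_getpages() with a server mtime equal to cached_mtime returns TOY_READ_OK; calling it with a different value returns TOY_SELF_WAIT, which corresponds to the wedged state in the stack trace. The sketch says nothing about why the two mtimes differ; that part remains the suspicion stated in the report.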