Date:      Tue, 24 Aug 1999 15:17:38 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        David Malone <dwmalone@maths.tcd.ie>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: vm_fault: pager read error on NFS filesystems. 
Message-ID:  <199908242217.PAA18836@apollo.backplane.com>
References:   <199908241955.aa23162@salmon.maths.tcd.ie>

:I don't see how 2 could break the API - all a process in this
:state can do is spin trying to serve SIGBUSs. I think HPUX may KILL
:processes in this state. Yep, on HPUX 10.10 when I run my test
:program (included at the bottom of this mail) I get:
:
:Pid 2717 was killed due to failure in writing the signal context.
:
:Solaris 2.6 gives a bus error. AIX and Digital Unix have the same
:behavior as FreeBSD. Linux either does what FreeBSD does or core
:dumps depending on the address of the SIGBUS handler I give it.
:I'd like to see what NT's POSIX subsystem does...
:
:BTW - don't run the test program on a -CURRENT machine unless you've
:recompiled the kernel today. There was a bug which allowed processes
:to catch SIGKILL, and it results in the process being unkillable.

    Oh, sure, during stacking.  I was thinking of processes which
    catch SIGBUS and then do something inside their signal handler
    which may SIGBUS again.  That sort of recursion is legal.

:Init got stuck in vmopar while doing a wait on a 3.2-STABLE machine.
:We haven't managed to do this reproducibly, but the first stuck
:zombie was a process which had suffered from the "vm_fault: pager
:read error" problem which someone had kill -9'ed.
:
:We do have a trace from a 4.0 machine with a normal process stuck
:in vmopar. I've tacked that on the bottom too. The problems may
:be similar. I don't understand why waiting on a zombie would
:require the text of the program at all - but I haven't looked at
:the code so.

    The vmopar lockup is a known bug.  If the underlying vnode changes
    out from under a vm_fault, a client can get into trouble.

> (earlier email written by Matt in May 99)
>     The problem is a same-process deadlock.  A VM fault occurs accessing an
>     NFS-backed page.  The fault locks (PG_BUSY's) the page in question then
>     calls vnode_pager_getpages() to bring the page in.  This filters down
>     into an nfs_getpages() call which then calls nfs_readrpc().
> 
>     nfs_readrpc() normally (and properly) tries to keep the vnode 
>     synchronized to the NFS state returned by the RPC.  The problem is that
>     if the state indicates that the server has truncated the file,
>     vnode_pager_setsize() will be called and will attempt to remove all
>     the pages beyond the truncation point from the VM object.
> 
>     Unfortunately, at least one of those pages has been locked by the same
>     process.  Bewm.  Deadlock.

    Did I never fix this problem?  May was a bad month.  This is still an 
    open problem.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

:	David.

:... (from 4.0 box panic)
:    wmesg=0xc0250b91 "vmopar", timo=0) at ../../kern/kern_synch.c:443
:#2  0xc01e2caf in vm_object_page_remove (object=0xc87baf3c, start=0, end=978, 
:    clean_only=0) at ../../vm/vm_page.h:536
:#3  0xc01e7449 in vnode_pager_setsize (vp=0xc8760980, nsize=0)
:    at ../../vm/vnode_pager.c:285
:#4  0xc01ad117 in nfs_loadattrcache (vpp=0xc86aedbc, mdp=0xc86aedc8, 
:    dposp=0xc86aedcc, vaper=0x0) at ../../nfs/nfs_subs.c:1383
:#5  0xc01b5f80 in nfs_readrpc (vp=0xc8760980, uiop=0xc86aee30, cred=0xc0ac6580)
:    at ../../nfs/nfs_vnops.c:1086
:#6  0xc018def5 in nfs_getpages (ap=0xc86aee6c) at ../../nfs/nfs_bio.c:154
:#7  0xc01e79fe in vnode_pager_getpages (object=0xc87baf3c, m=0xc86aef00, 
:    count=1, reqpage=0) at vnode_if.h:1067
:#8  0xc01dc158 in vm_fault (map=0xc7c5ff80, vaddr=134512640, 
:    fault_type=1 '\001', fault_flags=0) at ../../vm/vm_pager.h:130
:#9  0xc0214a14 in trap_pfault (frame=0xc86aefa8, usermode=1, eva=134513640)
:    at ../../i386/i386/trap.c:781
:#10 0xc02145a3 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
:      tf_edi = -1077945600, tf_esi = -1077945608, tf_ebp = -1077945652, 
:      tf_isp = -932515884, tf_ebx = 1, tf_edx = 1, tf_ecx = 0, tf_eax = 10, 
:      tf_trapno = 12, tf_err = 4, tf_eip = 134513640, tf_cs = 31, 
:      tf_eflags = 66118, tf_esp = -1077945652, tf_ss = 47})
:    at ../../i386/i386/trap.c:349
:#11 0x80483e8 in ?? ()
:#12 0x804837d in ?? ()
: