Date:      Sat, 5 Mar 2016 20:33:39 +0700
From:      Eugene Grosbein <eugen@grosbein.net>
To:        Dmitry Sivachenko <trtrmitya@gmail.com>
Cc:        FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: nfs_getpages: error 4
Message-ID:  <56DAE033.9020304@grosbein.net>
In-Reply-To: <550ADE4F-9F60-44FB-BF07-A1384A6B7B1A@gmail.com>
References:  <A2A32332-4D9D-40DF-9DEC-EE9000879416@gmail.com> <56DACD4E.3070905@grosbein.net> <550ADE4F-9F60-44FB-BF07-A1384A6B7B1A@gmail.com>

05.03.2016 19:32, Dmitry Sivachenko wrote:

>>> I am running a number of machines with /home mounted via nfs (FreeBSD 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>>>
>>> Sometimes I get the following messages in syslog:
>>>
>>> nfs_getpages: error 4
>>> vm_fault: pager read error, pid NNN (myprog)
>>>
>>> After that I see a lot of processes stuck in the "pfault" state (these are computational processes that use some files from the NFS mount); they use 0% of CPU from then on.
>>>
>>> On the NFS server machine I see nothing strange in the logs.  procstat -kk for such stuck processes shows:
>>>   PID    TID COMM             TDNAME           KSTACK
>>> 85274 102056 myprog           -                mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>>>
>>>
>>> What could be the cause of this?
>>
>> For example, if processes running on the NFS server box modify files "in-place"
>> while those files are open in processes running on an NFS client, that could be the cause.
>> If so, change this so that processes updating such files first write a new temporary copy
>> and then rename it atomically over the original.
>>
>
> This should not be the case: users work only on the NFS clients.
> Moreover, the computations are such that each process uses its own set of files.
>
> (Forgot to mention in my previous e-mail that these processes can't be stopped even with kill -9)

Make sure you use TCP mounts and that TSO is disabled. Try switching between NFSv3 and NFSv4,
both to work around this bug and to find out which version is broken. Also, please show the
full mount command and option set.
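
A rough sketch of one such test run follows. The interface name (em0), the server path (nfsserver:/home) and the helper functions are placeholders for illustration, not values taken from this thread. The script only prints the commands so they can be reviewed before being run as root, and it assumes /home has been unmounted before the mount step.

```shell
#!/bin/sh
# Sketch only: IFACE, SERVER and MNT are hypothetical placeholders.
IFACE=${IFACE:-em0}
SERVER=${SERVER:-nfsserver:/home}
MNT=${MNT:-/home}

# Build the option string for one test run: the original rw,bg,intr,soft
# plus an explicit TCP transport and protocol version (nfsv3 or nfsv4).
nfs_opts() {
    echo "rw,bg,intr,soft,tcp,$1"
}

# Print (do not execute) the commands for one test run.
plan() {
    echo "ifconfig $IFACE -tso"          # disable TSO on the client NIC
    echo "mount -t nfs -o $(nfs_opts "$1") $SERVER $MNT"
    echo "nfsstat -m"                    # show the options actually in effect
}

plan nfsv3
```

Running `plan nfsv4` instead gives the same steps for the other protocol version; the `nfsstat -m` output is the option set worth posting to the list.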
