Date:      Fri, 19 Mar 2010 09:23:53 -0400
From:      Steve Polyack <korvus@comcast.net>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-fs@freebsd.org, User Questions <freebsd-questions@freebsd.org>, bseklecki@noc.cfi.pgh.pa.us
Subject:   Re: FreeBSD NFS client goes into infinite retry loop
Message-ID:  <4BA37AE9.4060806@comcast.net>
In-Reply-To: <201003190831.00950.jhb@freebsd.org>
References:  <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org>

On 03/19/10 08:31, John Baldwin wrote:
> On Friday 19 March 2010 7:34:23 am Steve Polyack wrote:
>    
>> Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an
>> NFS server to provide user home directories which get mounted across a
>> few machines (all 6.3-RELEASE).  For the past few weeks we have been
>> running into problems where one particular client will go into an
>> infinite loop where it is repeatedly trying to write data which causes
>> the NFS server to return "reply ok 40 write ERROR: Input/output error
>> PRE: POST:".  This retry loop can cause between 20 Mbps and 500 Mbps of
>> constant traffic on our network, depending on the size of the data
>> associated with the failed write.
>>
>> We spent some time on the issue and determined that something on one of
>> the clients is deleting a file as it is being written to by another NFS
>> client.  We were able to enable the NFS lockmgr and use lockf(1) to fix
>> most of these conditions, and the frequency of this problem has dropped
>> from once a night to once a week.  However, it's still a problem and we
>> can't necessarily force all of our users to "play nice" and use lockf/flock.
>>
>> Has anyone seen this before?  No errors are being logged on the NFS
>> server itself, but the "Server Ret-Failed" counter begins to increase
>> rapidly whenever a client gets stuck in this infinite retry loop:
>> Server Ret-Failed
>>           224768961
>>
>> I have a feeling that using NFS in such a manner may simply be prone to
>> such problems, but what confuses me is why the NFS client system is
>> infinitely retrying the write operation and causing itself so much grief.
>>      
> Yes, your feeling is correct.  This sort of race is inherent to NFS if you do
> not use some sort of locking protocol to resolve the race.  The infinite
> retries sound like a client-side issue.  Have you been able to try a newer OS
> version on a client to see if it still causes the same behavior?
>
>    
I can't try a newer FreeBSD version on the client where we are seeing the
problems, but I can recreate the problem fairly easily, so perhaps I'll
try it with an 8.0 client.  If I remember correctly, one of the strange
things is that the retry loop doesn't seem to hit "critical mass" until a
few hours after the write first fails.  I may be wrong, but I'll
double-check that when I test against 8.0-RELEASE.
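
For anyone who wants to try reproducing this, here is a minimal sketch of
the writer side of the race, under some assumptions.  The function name
write_across_delete and the delete_hook parameter are hypothetical and not
taken from our actual setup.  The idea is that the writer holds the file
open on one NFS client while delete_hook arranges for the file to be
removed from a *different* client.  Unlinking from the same client would
most likely not trigger the race, because an NFS client normally renames a
file that is still open instead of removing it.  On a local filesystem the
function should simply return None.

```python
import os
import tempfile


def write_across_delete(path, delete_hook, chunk=b"x" * 65536, rounds=4):
    """Write to `path`, have it deleted mid-stream, then keep writing.

    `delete_hook(path)` is a hypothetical callback.  For a real test it
    would remove the file from a second NFS client (or wait while someone
    does so by hand).

    Returns None if every call succeeded, otherwise the errno of the
    first OSError raised (including one raised by delete_hook itself).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(rounds):
            if i == rounds // 2:
                # Stand-in for the other client deleting the file.
                delete_hook(path)
            os.write(fd, chunk)
            # Push the data to the server now so that a failure shows up
            # here rather than later in the background.
            os.fsync(fd)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        try:
            os.close(fd)
        except OSError:
            pass  # close() can report the same failure again; ignore it


if __name__ == "__main__":
    # Local dry run only: os.unlink stands in for the remote delete.
    target = os.path.join(tempfile.mkdtemp(), "victim")
    print(write_across_delete(target, os.unlink))
```

While this runs against an NFS mount, watching the "Server Ret-Failed"
counter on the server and the traffic from the client should show whether
the retry loop has started.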

I forgot to mention this in my first post: these are all NFSv3 mounts over TCP.

Thanks for the response.



