Date:      Tue, 30 Nov 2010 09:33:18 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Adam McDougall <mcdouga9@egr.msu.edu>
Subject:   Re: Stale NFS file handles on 8.x amd64
Message-ID:  <201011300933.18505.jhb@freebsd.org>
In-Reply-To: <4CF44E2E.4070700@egr.msu.edu>
References:  <4CF44E2E.4070700@egr.msu.edu>

On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote:
> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare 
> minimum of NFS problems, but it got worse with 8.x.  I have 2-4 servers 
> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd. 
> Delivery is via procmail which doesn't touch the dovecot metadata and 
> webmail uses imapd.  Client connections to imapd go to random servers 
> and I don't yet have solid means to keep certain users on certain 
> servers.  I upgraded some of the servers to 8.x and dovecot 1.2 and ran 
> into Stale NFS file handles causing index/uidlist corruption causing 
> inboxes to appear as empty when they were not.  In some situations their 
> corrupt index had to be deleted manually.  I first suspected dovecot 1.2 
> since it was upgraded at the same time but I downgraded to 1.1 and it's 
> doing the same thing.  I don't really have a wealth of details to go on 
> yet and I usually stay quiet until I do, and half the time it is 
> difficult to reproduce myself so I've had to put it in production to get 
> a feel for progress.  This only happens a dozen or so times per weekday 
> but I feel the need to start taking bigger steps.  I'll probably do what 
> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x 
> on the remaining servers.  A binary search is within possibility if I 
> can reproduce the symptoms often enough even if I have to put a test 
> server in production for a few hours.

There were some changes to allow more concurrency in the NFS client in 8.x
(and 7.2+) that caused ESTALE errors to occur on open(2) more frequently.  As
a workaround, you can try setting 'vfs.lookup_shared=0' to disable the extra
concurrency, though at a performance cost.  The most recent 7.x and 8.x have
some changes to open(2) to minimize ESTALE errors that I think get it back to
the same level as when lookup_shared is set to 0.
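
For reference, a minimal sketch of applying that workaround on one of the
affected clients.  It assumes the sysctl is writable at runtime on your
release and that /etc/sysctl.conf is where you keep persistent settings;
check the current value first so you can revert.

```shell
# Show the current value (assumed to be nonzero on an affected 8.x client).
sysctl vfs.lookup_shared

# Disable shared lookups at runtime; expect some loss of lookup concurrency.
sysctl vfs.lookup_shared=0

# Assumption: persist the setting across reboots via /etc/sysctl.conf.
echo 'vfs.lookup_shared=0' >> /etc/sysctl.conf
```

If the ESTALE rate does not change with this set, the extra lookup
concurrency is probably not the cause and the setting can be reverted.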

-- 
John Baldwin


