Date:      Thu, 7 Jun 2012 16:20:10 +0200
From:      Peter Holm <peter@holm.cc>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-fs@freebsd.org, Konstantin Belousov <kib@freebsd.org>
Subject:   Re: close() of an flock'd file is not atomic
Message-ID:  <20120607142010.GA83575@x2.osted.lan>
In-Reply-To: <201206060817.54684.jhb@freebsd.org>
References:  <201203071318.08241.jhb@freebsd.org> <201203091059.29342.jhb@freebsd.org> <201203161406.27549.jhb@freebsd.org> <201206060817.54684.jhb@freebsd.org>

On Wed, Jun 06, 2012 at 08:17:54AM -0400, John Baldwin wrote:
> On Friday, March 16, 2012 2:06:27 pm John Baldwin wrote:
> > On Friday, March 09, 2012 10:59:29 am John Baldwin wrote:
> > > On Thursday, March 08, 2012 5:39:19 pm Konstantin Belousov wrote:
> > > > On Thu, Mar 08, 2012 at 03:39:07PM -0500, John Baldwin wrote:
> > > > > On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote:
> > > > > > So I ran into this problem at work.  Suppose you have a process that opens a 
> > > > > > read-write file descriptor with O_EXLOCK (so it has an flock()).  It then 
> > > > > > writes out a binary into that file.  Another process wants to execve() the 
> > > > > > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and 
> > > > > > will call execve() once it has locked the file.  In theory, what should happen 
> > > > > > is that the second process should wait until the first process has finished 
> > > > > > and called close().  In practice what happens is that I occasionally see the 
> > > > > > > second process fail with ETXTBSY.
> > > > > > 
> > > > > > > The bug is that vn_closefile() does the VOP_ADVLOCK() to unlock the file 
> > > > > > separately from the call to vn_close() which drops the writecount.  Thus, the 
> > > > > > second process can do an open() and flock() of the file and subsequently call
> > > > > > execve() after the first process has done the VOP_ADVLOCK(), but before it 
> > > > > > calls into vn_close().  In fact, since vn_close() requires a write lock on the 
> > > > > > vnode, this turns out to not be too hard to reproduce at all.  Below is a 
> > > > > > simple test program that reproduces this constantly.  To use, copy /bin/test 
> > > > > > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run 
> > > > > > ./flock_close_race /tmp/foo.
> > > > > > 
> > > > > > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock 
> > > > > > until after vn_close() executes.  However, even with that fix applied, my test
> > > > > > case still fails.  Now it is because open() with a given lock flag is
> > > > > > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount
> > > > > > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child' 
> > > > > > process has the fd locked, the writecount can still be bumped.  One gross hack
> > > > > > would be to defer the bump of the writecount to the caller of vn_open() if the
> > > > > > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus
> > > > > > it doesn't actually work.  I ended up moving acquiring the lock into 
> > > > > > vn_open_cred().  The current patch I'm testing has both of these approaches,
> > > > > > but the first one is #if 0'd out, and the second is #if 1'd.
> > > > > > 
> > > > > > http://www.freebsd.org/~jhb/patches/flock_open_close.patch
> > > > > 
> > > > > Based on some feedback from Konstantin, I've fixed some issues in the failure
> > > > > path handling for VOP_ADVLOCK().  I've also removed the #if 0'd code mentioned
> > > > > above, so the patch is now the actual change that I'm testing.  So far it
> > > > > handles both my workload at work and my test program without any issues.
> > > > 
> > > > I think a comment is needed to explain why vn_writechk() is called a second time.
> > > 
> > > Fixed.
> > > 
> > > > Could you please point me to where FHASLOCK is set for the O_EXLOCK | O_SHLOCK
> > > > case in the patched kernel?
> > > 
> > > It wasn't. :(  I wonder how this was even working since close shouldn't have
> > > been unlocking.  I'll need to do some more testing.  BTW, I ran into fhopen()
> > > and found that I would need to put all this same logic into that, so I've split
> > > the common code from fhopen() and vn_open_cred() into a new vn_open_vnode().
> > > I think in general it improves both sets of code.
> > > 
> > > I'll update the patch once I've done some more testing.
> 
> Based on feedback from Konstantin, I have split the vn_open_vnode() changes
> out into a separate patch.  Once that patch is in the tree I will revisit
> this and update the actual bug-fix patch.
> 
> The vn_open_vnode() patch is at
> http://www.freebsd.org/~jhb/patches/vn_open_vnode.patch
> 
> I tested it by doing a buildworld -j 32 in a loop while NFS exporting the
> /usr/obj tree to another machine that did a continual find | xargs md5 loop
> over the /usr/obj tree.  This survived overnight.
> 

I've tested this patch for 22 hours without finding any problems.

- Peter
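
For readers following the thread, the lock handoff that the quoted report
expects can be sketched in userland roughly as below.  This is only an
illustration under stated assumptions, not the flock_close_race test program
from the thread and not part of either patch.  The function name
flock_handoff() and its return convention are hypothetical.  It uses a
separate flock() call instead of O_EXLOCK/O_SHLOCK so that it also builds on
systems without those open() flags, and the reader checks the file size at
the point where the thread's second process would call execve().  It shows
the intended ordering only and does not by itself reproduce the ETXTBSY race.

```c
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread).  The writer takes an exclusive
 * flock(), forks a reader that blocks on a shared flock() of the same file,
 * then writes the payload and closes.  Returns 0 if the reader saw the
 * complete payload once it held the lock, 1 if it saw a short file, and -1
 * on any setup error.
 */
int
flock_handoff(const char *path, const char *payload)
{
	size_t len = strlen(payload);
	struct stat sb;
	pid_t pid;
	int fd, rfd, status, ok;

	/* Writer: open and lock before the reader exists. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0755);
	if (fd < 0)
		return (-1);
	if (flock(fd, LOCK_EX) != 0) {
		close(fd);
		return (-1);
	}

	pid = fork();
	if (pid < 0) {
		close(fd);
		return (-1);
	}
	if (pid == 0) {
		/*
		 * Reader: drop the inherited descriptor so that the lock is
		 * held only through the writer's descriptor, then open the
		 * file again and wait for the writer to close.
		 */
		close(fd);
		rfd = open(path, O_RDONLY);
		if (rfd < 0 || flock(rfd, LOCK_SH) != 0)
			_exit(2);
		if (fstat(rfd, &sb) != 0)
			_exit(2);
		/* The second process in the thread would execve() here. */
		_exit((size_t)sb.st_size == len ? 0 : 1);
	}

	/* Writer: write the payload; close() releases the lock. */
	ok = (write(fd, payload, len) == (ssize_t)len);
	close(fd);

	if (waitpid(pid, &status, 0) != pid || !ok)
		return (-1);
	if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
		return (-1);
	return (WEXITSTATUS(status));
}
```

Calling flock_handoff("/tmp/foo", "some bytes") should return 0 when the
reader only proceeds after the writer has closed the file.  The bug discussed
above lies in what the kernel does between releasing the lock and dropping
the write count inside close(), which this sketch cannot observe directly.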


