Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 2 Mar 2012 17:16:43 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Peter Holm <peter@holm.cc>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r226967 - head/sys/ufs/ufs
Message-ID:  <201203021716.43954.jhb@freebsd.org>
In-Reply-To: <20120302204223.GA51629@x2.osted.lan>
References:  <201110311501.p9VF1lrf020688@svn.freebsd.org> <201203021153.06614.jhb@freebsd.org> <20120302204223.GA51629@x2.osted.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, March 02, 2012 3:42:23 pm Peter Holm wrote:
> On Fri, Mar 02, 2012 at 11:53:06AM -0500, John Baldwin wrote:
> > On Friday, March 02, 2012 8:29:21 am Peter Holm wrote:
> > > On Thu, Mar 01, 2012 at 04:47:41PM -0500, John Baldwin wrote:
> > > > On Monday, October 31, 2011 11:01:47 am Peter Holm wrote:
> > > > > Author: pho
> > > > > Date: Mon Oct 31 15:01:47 2011
> > > > > New Revision: 226967
> > > > > URL: http://svn.freebsd.org/changeset/base/226967
> > > > > 
> > > > > Log:
> > > > >   The kern_renameat() looks up the fvp using the DELETE flag, which 
causes
> > > > >   the removal of the name cache entry for fvp.
> > > > >   
> > > > >   Reported by:	Anton Yuzhaninov <citrin citrin ru>
> > > > >   In collaboration with:	kib
> > > > >   MFC after:	1 week
> > > > > 
> > > > > Modified:
> > > > >   head/sys/ufs/ufs/ufs_vnops.c
> > > > 
> > > > So I ran into this at work recently, and even this fix applied I was 
still 
> > > > seeing rename()'s that were seemingly not taking effect.  After 
getting some 
> > > > extra KTR traces, I figured out that the same purge needs to be 
applied to the 
> > > > destination vnode.  Specifically, the issue I ran into was that was 
renaming 
> > > > 'foo' to 'bar', but lookups for 'bar' were still returning the old 
file.  The 
> > > > reason was that a lookup after the namei(RENAME) of the destination 
while 
> > > > ufs_rename() had its locks dropped was readding the name cache entry 
for 
> > > > 'bar', and then a cache_lookup() of 'bar' would return the old vnode 
as long 
> > > > as that vnode was valid (e.g. if it had a link in another location, or 
other 
> > > > processes had an open file descriptor for it).  I'm currently testing 
the 
> > > > patch below:
> > > > 
> > > 
> > > I now have a scenario that fails, but not quite the same way you
> > > describe.
> > > 
> > > It looks like this:
> > > 
> > > touch file1
> > > echo xxx > file2
> > > rename(file1, file2)
> > > 
> > > A different process performs stat() on both files in a tight loop.
> > > 
> > > Once in a while I observe that a stat() of file2, after the rename,
> > > returns a link count of zero. Size is zero as expected, but the inode
> > > number of file2 is unchanged.
> > 
> > Hmm, that is surprising.  I would not expect inconsistent stat info.  I
> > have no explanation for why that would happen.  I do not have a simplified
> > test program, just a specific workload at work.  In this case it's 
workflow
> > is more like this:
> > 
> > 	fd = flopen(file1, O_CREAT);
> > 	fstat(fd);
> > 	if (st.st_size == 0) {
> > 		fd2 = open(file1.temp, O_CREAT | O_EXLOCK);
> > 		fd3 = open(someotherfile);
> > 		copy_data(fd3, fdf2);
> > 		close(fd3);
> > 		rename(file1.temp, file1);
> > 		close(fd);
> > 		fd = fd2;
> > 	}
> > 	link(file1, uniquedir/file1);
> > 	close(fd);
> > 
> > 	/* Use uniquedir/file1, and unlink it when done. */
> > 
> > What I observed was that sometimes uniquedir/file1 would end up 
referencing
> > the empty file created by flopen() after the rename() rather than linking
> > to the file created when file1.temp was created.
> > 
> > > I've been running the same test with your patch and not observed this
> > > problem. This on UFS2 with SU enabled.
> > 
> > Hmm, I wish I could explain explain your odd result above in terms of this
> > bug, but the results from stat() should always be consistent 
(VOP_GETATTR()
> > can't switch vnodes mid-stream as it were).
> > 
> > BTW, note that in my case where I had multiple processes all doing the 
same
> > loop, in the edge case, another process always had the file open (and was
> > blocked in flock() in flopen()) when the rename() happened, so that 
prevented
> > the vnode from going away.  This is important as otherwise the use count 
would
> > drop to zero and marked inactive which removes all references to it from 
the
> > name cache.  In my case the flock() in flopen() and the fact that the 
"first"
> > process held the flock until after the rename and call to link() made the
> > race more likely to trigger.
> > 
> > Hmm, perhaps one way to do this would be:
> > 
> > 	touch file.always (save its i-node)
> > 	fork worker process that just continually does a 'stat file1' in a 
loop
> > 	main process:
> > 		ln file.always file1
> > 		touch file2
> > 		rename file2 file1
> > 		stat file1
> > 		complain if file1 has the saved i-node
> > 
> > Hmm, I just whipped up something to do this and it fails early and often 
on
> > an unpatched kernel, but does not with my patch.
> > 
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <err.h>
> > #include <errno.h>
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > 
> > static char *always, *file1, *file2;
> > static ino_t always_ino;
> > 
> > static void
> > usage(void)
> > {
> > 	fprintf(stderr, "Usage: rename_race <dir>\n");
> > 	exit(1);
> > }
> > 
> > static void
> > child(void)
> > {
> > 	struct stat sb;
> > 
> > 	/* Exit as soon as our parent exits. */
> > 	while (getppid() != 1) {
> > 		stat(file1, &sb);
> > 	}
> > 	exit(0);
> > }
> > 
> > static void
> > create_file(const char *path)
> > {
> > 	int fd;
> > 
> > 	fd = open(path, O_CREAT, 0666);
> > 	if (fd < 0)
> > 		err(1, "open(%s)", path);
> > 	close(fd);
> > }
> > 
> > int
> > main(int ac, char **av)
> > {
> > 	struct stat sb, sb2;
> > 	pid_t pid;
> > 
> > 	if (ac != 2)
> > 		usage();
> > 	if (stat(av[1], &sb) != 0)
> > 		err(1, "stat(%s)", av[1]);
> > 	if (!S_ISDIR(sb.st_mode))
> > 		errx(1, "%s not a directory", av[1]);
> > 
> > 	asprintf(&always, "%s/file.always", av[1]);
> > 	asprintf(&file1, "%s/file1", av[1]);
> > 	asprintf(&file2, "%s/file2", av[1]);
> > 
> > 	create_file(always);
> > 	if (stat(always, &sb) != 0)
> > 		err(1, "stat(%s)", always);
> > 	always_ino = sb.st_ino;
> > 
> > 	pid = fork();
> > 	if (pid < 0)
> > 		err(1, "fork");
> > 	if (pid == 0)
> > 		child();
> > 	for (;;) {
> > 		if (unlink(file1) < 0 && errno != ENOENT)
> > 			err(1, "unlink(%s)", file1);
> > 		if (link(always, file1) < 0)
> > 			err(1, "link(%s, %s)", always, file1);
> > 		create_file(file2);
> > 		if (stat(file2, &sb2) < 0)
> > 			err(1, "stat(%s)", file2);
> > 		if (rename(file2, file1) < 0)
> > 			err(1, "rename(%s, %s)", file2, file1);
> > 		if (stat(file1, &sb) < 0)
> > 			err(1, "stat(%s)", file1);
> > 		if (sb.st_ino != sb2.st_ino ||
> > 		    sb.st_ino == always_ino)
> > 			printf("Bad stat: always: %d file1: %d (should be %d)\n",
> > 			    always_ino, sb.st_ino, sb2.st_ino);
> > 	}
> > 	return (0);
> > }
> > 
> 
> Excellent test scenario!
> 
> For the tmpfs case I do not see any problems using both our
> scenarios on a pristine HEAD.

Ah, that may be.  I have only observed this on UFS.  If tmpfs does not have 
the stuff to drop locks and do a restart then it may not be subject to the 
'tvp' race.  I do wonder if all filesystems are subject to the 'fvp' race 
though given that kern_renameat() doesn't keep 'fvp' locked while it does the 
namei() lookup of the target (or am I just wrong)?

For example, I wonder if NFS is subject to the same problem.  Hmm, a quick 
test of NFS shows no problems.  Does tmpfs support LOOKUP_SHARED btw?  If not, 
then that might explain it not being as vulnerable.  Ah, it doesn't, so that
might explain it.

(Also, I committed this patch to HEAD already, so you might need to revert it
for testing tmpfs.)

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201203021716.43954.jhb>