Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 18 Jul 2014 01:55:38 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        freebsd-arch@freebsd.org
Subject:   current fd allocation idiom
Message-ID:  <20140717235538.GA15714@dft-labs.eu>

next in thread | raw e-mail | index | archive | help
The kernel has to deal with concurrent attempts to open, close, dup2 and
use the same file descriptor.

I start with stating a rather esoteric bug affecting this area, then I
follow with a short overview of what is happening in general and a
proposal how to change to get rid of the bug and get some additional
enchancements. Interestingly enough turns out Linux is doing pretty much
the same thing.

============================
THE BUG:
/*
 * Initialize the file pointer with the specified properties.
 *
 * The ops are set with release semantics to be certain that the flags, type,
 * and data are visible when ops is.  This is to prevent ops methods from being
 * called with bad data.
 */
void
finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
{
        fp->f_data = data;
        fp->f_flag = flag;
        fp->f_type = type;
        atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
}

This itself is fine, but it assumes all code obtaining fp from fdtable places
a read memory barrier after reading f_ops and before reading anything else.
As you could guess no code does that and I don't believe placing rmb's
in several new places is the way to go.

============================
GENERAL OVERVIEW OF CURRENT STATE:

fps are obtained and installed as follows:

struct file *fp;
int fd;

if (error = falloc(&fp, &fd))
	return (error);
if (error = something(....)) {
	fdclose(fp, fd);
	fdrop(fp);
	return (error);
}

finit(fp, ....);
fdrop(fp);
return (0);

After falloc fp is installed in fdtable, it has refcount 2 and ops set to
'badfileops'.

if something() failed:
fdclose() checks if it has anything to do. if so, it cleares fd and fdrops fp
fdrop() clears the second reference, everything is cleared up

if something() succeeded:
finit() finishes initialization of fp
fdrop() cleares the second reference. fp now has expected refcount of 1.

Now a little complication:
parallel close() execution:
fd is recognizes as used. it is cleared and fdrop(fp) is called.

if something() succeeded after close:
fdrop() kills fp

if something() failed after close:
fdclose() concludes nothing to do
fdrop() kill fp

Same story with dup2.

What readers need to do:
- rmb() after reading fp_ops
- check fp_ops for badfileops

============================
PROPOSAL:

struct file *fp;
int fd;

if (error = falloc(&fp, &fd))
	return (error);
if (error = something(....)) {
	fdclose(fp, fd);
	return (error);
}

finit(fp, ....);
factivate(fd, fp);
return (0);

After falloc fd is only marked as used, fp is NOT installed.
fp is returned with refcount of 1 and is invisible to anyone but
curthread.

if something() failed:
fdclose() marks fd as unused and kills fp

if something() succeeded:
finit() finishes initialization of fp
factivate() sets fp to non-null with a barrier

Now a little complication:
parallel close() execution:
since fp is null, fd is recognized as unused. EBADF is returned.

The only problem is with dup2 and I believe it is actually a step
forward.

Let's assume fd was marked as used by falloc, but fp was not installed yet.
dup2(n, fd) will see that fd is used. With current code there is no
problem since there is fp to fdrop and it can proceed. With the proposal
however, there is nothing to fdrop. Linux returns EBADF in this case
which deals with the problem and does not seem to provide any drawbacks
for behaving processes.

So, differences to current approach:
1. fewer barriers and atomic operations
2. no need to check for f_ops type
3. new case when dup2 can return an error

Note that 3 should not be a problem since Linux is doing this already.

Also note current approach is not implemented correctly at the moment as
it misses rmbs, although I'm unsure how much this matters in practice.

Thoughts?
-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140717235538.GA15714>