Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 18 Jul 2014 02:56:08 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        freebsd-arch@freebsd.org
Subject:   Re: current fd allocation idiom
Message-ID:  <20140718005608.GB15714@dft-labs.eu>
In-Reply-To: <20140717235538.GA15714@dft-labs.eu>
References:  <20140717235538.GA15714@dft-labs.eu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> The kernel has to deal with concurrent attempts to open, close, dup2 and
> use the same file descriptor.
> 
> I start with stating a rather esoteric bug affecting this area, then I
> follow with a short overview of what is happening in general and a
> proposal how to change to get rid of the bug and get some additional
> enchancements. Interestingly enough turns out Linux is doing pretty much
> the same thing.
> 
> ============================
> THE BUG:
> /*
>  * Initialize the file pointer with the specified properties.
>  *
>  * The ops are set with release semantics to be certain that the flags, type,
>  * and data are visible when ops is.  This is to prevent ops methods from being
>  * called with bad data.
>  */
> void
> finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> {
>         fp->f_data = data;
>         fp->f_flag = flag;
>         fp->f_type = type;
>         atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> }
> 
> This itself is fine, but it assumes all code obtaining fp from fdtable places
> a read memory barrier after reading f_ops and before reading anything else.
> As you could guess no code does that and I don't believe placing rmb's
> in several new places is the way to go.
> 
> ============================
> GENERAL OVERVIEW OF CURRENT STATE:
> 
> fps are obtained and installed as follows:
> 
> struct file *fp;
> int fd;
> 
> if (error = falloc(&fp, &fd))
> 	return (error);
> if (error = something(....)) {
> 	fdclose(fp, fd);
> 	fdrop(fp);
> 	return (error);
> }
> 
> finit(fp, ....);
> fdrop(fp);
> return (0);
> 
> After falloc fp is installed in fdtable, it has refcount 2 and ops set to
> 'badfileops'.
> 
> if something() failed:
> fdclose() checks if it has anything to do. if so, it cleares fd and fdrops fp
> fdrop() clears the second reference, everything is cleared up
> 
> if something() succeeded:
> finit() finishes initialization of fp
> fdrop() cleares the second reference. fp now has expected refcount of 1.
> 
> Now a little complication:
> parallel close() execution:
> fd is recognizes as used. it is cleared and fdrop(fp) is called.
> 
> if something() succeeded after close:
> fdrop() kills fp
> 
> if something() failed after close:
> fdclose() concludes nothing to do
> fdrop() kill fp
> 
> Same story with dup2.
> 
> What readers need to do:
> - rmb() after reading fp_ops
> - check fp_ops for badfileops
> 
> ============================
> PROPOSAL:
> 
> struct file *fp;
> int fd;
> 
> if (error = falloc(&fp, &fd))
> 	return (error);
> if (error = something(....)) {
> 	fdclose(fp, fd);
> 	return (error);
> }
> 
> finit(fp, ....);
> factivate(fd, fp);
> return (0);
> 
> After falloc fd is only marked as used, fp is NOT installed.
> fp is returned with refcount of 1 and is invisible to anyone but
> curthread.
> 
> if something() failed:
> fdclose() marks fd as unused and kills fp
> 
> if something() succeeded:
> finit() finishes initialization of fp
> factivate() sets fp to non-null with a barrier
> 
> Now a little complication:
> parallel close() execution:
> since fp is null, fd is recognized as unused. EBADF is returned.
> 
> The only problem is with dup2 and I believe it is actually a step
> forward.
> 
> Let's assume fd was marked as used by falloc, but fp was not installed yet.
> dup2(n, fd) will see that fd is used. With current code there is no
> problem since there is fp to fdrop and it can proceed. With the proposal
> however, there is nothing to fdrop. Linux returns EBADF in this case
> which deals with the problem and does not seem to provide any drawbacks
> for behaving processes.
> 
> So, differences to current approach:
> 1. fewer barriers and atomic operations
> 2. no need to check for f_ops type
> 3. new case when dup2 can return an error
> 

One has to note that fdtable can be reallocated at any time and
factivate would have to make sure it updated the current table. The
simplest thing would be to take filedesc lock, which diminishes
advantages to some extent.

Maybe switching sx lock to other kind would remedy this a little.

> Note that 3 should not be a problem since Linux is doing this already.
> 
> Also note current approach is not implemented correctly at the moment as
> it misses rmbs, although I'm unsure how much this matters in practice.
> 
> Thoughts?
> -- 
> Mateusz Guzik <mjguzik gmail.com>

-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140718005608.GB15714>