Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 18 Jul 2014 16:06:29 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: current fd allocation idiom
Message-ID:  <20140718130629.GJ93733@kib.kiev.ua>
In-Reply-To: <20140717235538.GA15714@dft-labs.eu>
References:  <20140717235538.GA15714@dft-labs.eu>

next in thread | previous in thread | raw e-mail | index | archive | help

--fkEewJHTsXZZ0w48
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> The kernel has to deal with concurrent attempts to open, close, dup2 and
> use the same file descriptor.
>=20
> I start with stating a rather esoteric bug affecting this area, then I
> follow with a short overview of what is happening in general and a
> proposal how to change to get rid of the bug and get some additional
> enchancements. Interestingly enough turns out Linux is doing pretty much
> the same thing.
>=20
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
> THE BUG:
> /*
>  * Initialize the file pointer with the specified properties.
>  *
>  * The ops are set with release semantics to be certain that the flags, t=
ype,
>  * and data are visible when ops is.  This is to prevent ops methods from=
 being
>  * called with bad data.
>  */
> void
> finit(struct file *fp, u_int flag, short type, void *data, struct fileops=
 *ops)
> {
>         fp->f_data =3D data;
>         fp->f_flag =3D flag;
>         fp->f_type =3D type;
>         atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t=
)ops);
> }
>=20
> This itself is fine, but it assumes all code obtaining fp from fdtable pl=
aces
> a read memory barrier after reading f_ops and before reading anything els=
e.
> As you could guess no code does that and I don't believe placing rmb's
> in several new places is the way to go.
I think your analysis is correct for all cases except kern_openat().

For kern_openat(), we install a file into fdtable only after the struct
file is fully initialized, see kern_openat(). The file is allocated
with falloc_noinstall(), then file is initialized, then finstall() does
FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file
accessors do fget_unlocked(), which does acquire (of fp->f_count, but
this does not matter).

The only critical constraint is that other accessors must not see
f_ops !=3D badfileops while struct file is not fully visible.  IMO,
the full barrier in finstall() and acquire in fget*() guarantee
that we see badfileops until writes to other members are visible.

For falloc(), indeed the write to f_ops could become visible too early
(but not on x86).

>=20
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
> GENERAL OVERVIEW OF CURRENT STATE:
>=20
> fps are obtained and installed as follows:
>=20
> struct file *fp;
> int fd;
>=20
> if (error =3D falloc(&fp, &fd))
> 	return (error);
> if (error =3D something(....)) {
> 	fdclose(fp, fd);
> 	fdrop(fp);
> 	return (error);
> }
>=20
> finit(fp, ....);
> fdrop(fp);
> return (0);
>=20
> After falloc fp is installed in fdtable, it has refcount 2 and ops set to
> 'badfileops'.
>=20
> if something() failed:
> fdclose() checks if it has anything to do. if so, it cleares fd and fdrop=
s fp
> fdrop() clears the second reference, everything is cleared up
>=20
> if something() succeeded:
> finit() finishes initialization of fp
> fdrop() cleares the second reference. fp now has expected refcount of 1.
>=20
> Now a little complication:
> parallel close() execution:
> fd is recognizes as used. it is cleared and fdrop(fp) is called.
>=20
> if something() succeeded after close:
> fdrop() kills fp
>=20
> if something() failed after close:
> fdclose() concludes nothing to do
> fdrop() kill fp
>=20
> Same story with dup2.
>=20
> What readers need to do:
> - rmb() after reading fp_ops
> - check fp_ops for badfileops
How can readers see badfileops ?

>=20
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
> PROPOSAL:
>=20
> struct file *fp;
> int fd;
>=20
> if (error =3D falloc(&fp, &fd))
Use falloc_noinstall() there.

> 	return (error);
> if (error =3D something(....)) {
> 	fdclose(fp, fd);
> 	return (error);
> }
>=20
> finit(fp, ....);
> factivate(fd, fp);
This function is spelled finstall().

It seems that all what is needed is conversion of places using
falloc() to falloc_noinstall()/finstall().

> return (0);
>=20
> After falloc fd is only marked as used, fp is NOT installed.
> fp is returned with refcount of 1 and is invisible to anyone but
> curthread.
>=20
> if something() failed:
> fdclose() marks fd as unused and kills fp
>=20
> if something() succeeded:
> finit() finishes initialization of fp
> factivate() sets fp to non-null with a barrier
>=20
> Now a little complication:
> parallel close() execution:
> since fp is null, fd is recognized as unused. EBADF is returned.
>=20
> The only problem is with dup2 and I believe it is actually a step
> forward.
>=20
> Let's assume fd was marked as used by falloc, but fp was not installed ye=
t.
> dup2(n, fd) will see that fd is used. With current code there is no
> problem since there is fp to fdrop and it can proceed. With the proposal
> however, there is nothing to fdrop. Linux returns EBADF in this case
> which deals with the problem and does not seem to provide any drawbacks
> for behaving processes.
>=20
> So, differences to current approach:
> 1. fewer barriers and atomic operations
> 2. no need to check for f_ops type
> 3. new case when dup2 can return an error
>=20
> Note that 3 should not be a problem since Linux is doing this already.
>=20
> Also note current approach is not implemented correctly at the moment as
> it misses rmbs, although I'm unsure how much this matters in practice.
>=20
> Thoughts?
> --=20
> Mateusz Guzik <mjguzik gmail.com>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

--fkEewJHTsXZZ0w48
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJTyRvVAAoJEJDCuSvBvK1BZ0oP/jcXsr1uN2vcD85ED6NlFOvs
I/PCpLyndYxOdMEGhUPiHw7nMvmpcHknGjxyPrzr64fMCk1WUiySA7gY5kqNxV/w
UZB61O9O57A9m3c+3uIHMWQdhZYPk7WC0d1H59rCcwlnMrFAZXgy+EcFvd4MNAgN
gTR4Ss2mb8sqjVpl34Sal4MRV2AuprGsWJfViomcGYOp02pu8pNbvq2hySmoTu/Y
HGsx9Pxv9Tq5P4pWBbb6RG6nEB2wiggv2qQa3MWy4YSf+e5GI5b0aeu9mZO9RD2/
t5iL8skUp4TTt/TZIM7KOpbvzqfKU30nFfB+b8ZNB2bKMkXgvWU4KS6vqhIcCX4z
DWq0XbEoQ63myLqcdzv8FSxuWhA2Djv8BoslpeIZeGltM+aMAhBw1htI5kP44aWo
IvDmW//O5QrUay1cct1Nnn170sRF8pjzeazQECiP3c8GVHYdUXqy4HWuLLs1lqGg
LVgruvQMcuzxYuB4gQtttvKYl7M5gS12H0zj/ALn+YoW9i4VHvjk31GvvZ4NAVw2
ahHohWU7KJSLukmgrYt5Cb33CyCd5obNpm4YLR/iQo6rjhxN4m9Lg5wTek8Rwz03
pjqP0ZKJJPX4lSX0CzKPStcO5Oz/gIVgoUGNhdRWHa1WxRFE0nBHLGfeYizU2Bsq
nfX+H5kv4pa7+pR0+SJz
=YyP8
-----END PGP SIGNATURE-----

--fkEewJHTsXZZ0w48--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140718130629.GJ93733>