Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 25 Aug 2014 10:34:04 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mateusz Guzik <mjguzik@gmail.com>, freebsd-hackers@freebsd.org
Subject:   Re: atomic_load_acq_int in sequential_heuristic
Message-ID:  <20140825073404.GZ2737@kib.kiev.ua>
In-Reply-To: <20140825005659.GA14344@dft-labs.eu>
References:  <20140824115729.GC2045@dft-labs.eu> <20140824162331.GW2737@kib.kiev.ua> <20140824164236.GX2737@kib.kiev.ua> <20140825005659.GA14344@dft-labs.eu>

next in thread | previous in thread | raw e-mail | index | archive | help

--uepJbk9oh0IF4lih
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Aug 25, 2014 at 02:57:00AM +0200, Mateusz Guzik wrote:
> On Sun, Aug 24, 2014 at 07:42:36PM +0300, Konstantin Belousov wrote:
> > On Sun, Aug 24, 2014 at 07:23:31PM +0300, Konstantin Belousov wrote:
> > > On Sun, Aug 24, 2014 at 01:57:29PM +0200, Mateusz Guzik wrote:
> > > > Writer side is:
> > > > fp->f_seqcount =3D (arg + bsize - 1) / bsize;
> > > > do {
> > > > 	new =3D old =3D fp->f_flag;
> > > > 	new |=3D FRDAHEAD;
> > > > } while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
> > > >=20
> > > > Reader side is:
> > > > if (atomic_load_acq_int(&(fp->f_flag)) & FRDAHEAD)
> > > > 	return (fp->f_seqcount << IO_SEQSHIFT);
> > > >=20
> > > > We can easily get the following despite load_acq:
> > > > CPU0				CPU1
> > > > 				load_acq fp->f_flag
> > > > fp->f_seqcount =3D ...
> > > > store_rel fp->f_flag
> > > > 				read fp->f_seqcount
> > > > 			=09
> > > > So the barrier does not seem to serve any purpose.
> > > It does.
> > >=20
> > > Consider initial situation, when the flag is not set yet. There, we
> > > do not want to allow the reader to interpret automatically calculated
> > > f_seqcount as the user-supplied constant.  Without barriers, we might
> > > read the flag as set, while user-provided value for f_seqcount is sti=
ll
> > > not visible to processor doing read.
> > That said, I think now that there is a real bug.
> >=20
> > If we did not read the FRDAHEAD in sequential_heuristic(), we may
> > override user-supplied value for f_seqcount.  I do not see other
> > solution than start to use locking.
>=20
> Right.
>=20
> How about abusing vnode lock for this purpose? All callers of
> sequential_heuristic have the vnode at least shared locked.
>=20
> FRDAHEAD setting code locks it shared. We can change that to exclusive,
> which will close the race and should not be problematic given that it is
> rather rare.
>=20
> diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c
> index 7abdca0..643920b 100644
> --- a/sys/kern/kern_descrip.c
> +++ b/sys/kern/kern_descrip.c
> @@ -762,18 +762,18 @@ kern_fcntl(struct thread *td, int fd, int cmd, intp=
tr_t arg)
>  		}
>  		if (arg >=3D 0) {
>  			vp =3D fp->f_vnode;
> -			error =3D vn_lock(vp, LK_SHARED);
> +			error =3D vn_lock(vp, LK_EXCLUSIVE);
>  			if (error !=3D 0) {
>  				fdrop(fp, td);
>  				break;
>  			}
>  			bsize =3D fp->f_vnode->v_mount->mnt_stat.f_iosize;
> -			VOP_UNLOCK(vp, 0);
>  			fp->f_seqcount =3D (arg + bsize - 1) / bsize;
>  			do {
>  				new =3D old =3D fp->f_flag;
>  				new |=3D FRDAHEAD;
>  			} while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
Do we still need rel there ?

> +			VOP_UNLOCK(vp, 0);
>  		} else {
>  			do {
>  				new =3D old =3D fp->f_flag;
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index f1d19ac..98823f3 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -438,7 +438,8 @@ static int
>  sequential_heuristic(struct uio *uio, struct file *fp)
>  {
> =20
> -	if (atomic_load_acq_int(&(fp->f_flag)) & FRDAHEAD)
> +	ASSERT_VOP_LOCKED(fp->f_vnode, __func__);
> +	if (fp->f_flag & FRDAHEAD)
>  		return (fp->f_seqcount << IO_SEQSHIFT);
> =20
>  	/*

I believe the patch is correct.

Two notes.  First, please add a comment explaining which other part
of the code is locked against in F_READAHEAD switch case.  Second,
should the vnode lock cover the FRDAHEAD reset case too, at least
for consistency ?

--uepJbk9oh0IF4lih
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJT+ubsAAoJEJDCuSvBvK1B7XcP/2wadiRcw75ae4/uR6C1RYpn
WFAzjNx+KL4fQV5n4aWPaXaqDrIeEfdKYZADj2J6ceVN6sAcWWKWU7Fi3jyUJb8e
1Q5sSgx9TI0Y1ME/nQrs6TVqDMfi1jQZQqkjV6EkZPdXIMG3ZDxt/JlR+ASilxD0
/WBTDoWtBDFEJYAnYeTwzZ4vVDRJ6cWsUoNtTZCl0EgUuAOtIdY1c7xHfRFBp0D8
nypou9Ty0yaO8YkaCRe/6apMmV9++2PPW4TsoN8OlMgOs/etf7l8RrcH5BYq22s1
K2Cz8d54n3fs3zxyPYwfBsnqD6HwEzbwv7B7mF7cHTFwzQiLs1E+rCbVg3hLyWbn
XJkU/fiBsz0BbREwaW7yZY9oIOLgsnyZR10Ep9BTk9hy4KOZI74vdI22MSeP/G/u
3ush46DsSGkVWFlH2e3U7stHXN5Dy6pQeOJKzQJ9aZHKi3zm7F0HPL6Jd6BVPE3O
nBM1btqxxFkZX7Q48JM35eghD4ouEw01tE0umTnH5d+1lI4Bx6Km1k0VxP0n19YC
rVVnT48ypLI+7BkaoiIUV6bfcS6rv1yY08PfRB40ckeyjAilDntjbKpzlayb5BDi
mKv04InB+S/1ppZUtiBkP26Qd5QLa2cXz35UJCt9bW/Fwv1SFTI/FjEoUc2UTy3/
O6X8JDdkqSG47kS/EyFR
=AARQ
-----END PGP SIGNATURE-----

--uepJbk9oh0IF4lih--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140825073404.GZ2737>