Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 9 Aug 2010 01:21:03 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, Jaakko Heinonen <jh@freebsd.org>
Subject:   Re: syncer vnode leak because of nmount() race
Message-ID:  <AANLkTi=uDQQPVa80zsVRXpu3JcR0nvfC=XxUqFnKkBep@mail.gmail.com>
In-Reply-To: <20100613144240.GV13238@deviant.kiev.zoral.com.ua>
References:  <20100603143501.GA3176@a91-153-117-195.elisa-laajakaista.fi> <AANLkTimuJJcZ0D1TMDvTHjpb3advHetSB0aJw2IevSC1@mail.gmail.com> <20100613144240.GV13238@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
I think that the patch looks good.
Did you test it?
Otherwise you could ask to gianni@ to test it out.

Thanks,
Attilio


2010/6/13 Kostik Belousov <kostikbel@gmail.com>:
> On Fri, Jun 04, 2010 at 05:00:49PM +0200, Attilio Rao wrote:
>> 2010/6/3 Jaakko Heinonen <jh@freebsd.org>:
>> >
>> > Hi,
>> >
>> > I have been playing with devfs and stress2 recently and I was able to
>> > trigger a syncer vnode leak with devfs.sh. As far as I can tell it is =
a
>> > nmount(2) (initial) mount race against update mount. The code in
>> > vfs_domount() is protected with vfs_busy()/vfs_unbusy() but it doesn't
>> > protect against update mounts. The protection by Giant is defeated
>> > because devfs_root() may sleep due to sx_xlock(). vfs_domount() calls
>> > VFS_ROOT() before allocating the syncer vnode. VI_MOUNT vnode flag
>> > doesn't provide sufficient protection against update mounts either.
>>
>> Thanks for this bug report.
>> I think that, luckilly, it is not a very common condition to have the
>> mount still in flight and get updates... :)
>> However, I think that the real breakage here is that the check on
>> mnt->mnt_syncer is done lockless and it is unsafe. It might really be
>> protected by the mount interlock but that's tricky because
>> vfs_allocate_syncvnode() sleeps. Also re-checking the condition (after
>> the allocation takes place) is not too simple here.
> Is it =C2=A0? I think that the patch below would fix the issue, by
> syncronizing mnt_syncer by syncer mutex.
>
> On the other hand, I prefer the jh' patch, because it seemingly disallows
> parallel updates of the mount point. I believe that mp->mnt_vnodecovered
> should be stable after the call to vfs_busy() succeeded.
>
>
>> I found also this bug when rewriting the syncer and I resolved that by
>> using a separate flag for that (in my case it was simpler and more
>> beneficial actually for some other reasons, but you may do the same
>> thing with a mnt_kern_flag entry).
>> If you have no time I can work actively on the patch, even if I'm
>> confident, luckilly, this problem is very hard to happen in the real
>> life.
>>
>> Additively, note that vfs_busy() here is not intended to protect
>> against such situations but against unmount.
>> Also, I'm very unsure about what Giant protection might bring here.
>> IMHO we might probabilly remove it at all as long as all the sleeping
>> point present make it completely unuseful if anything (and I don't see
>> a reason why it may be needed).
>>
>> > PS. vfs_unbusy(9) manual page is out of date after r184554 and IMO
>> > =C2=A0 =C2=A0vfs_busy(9) manual page is misleading because it talks ab=
out
>> > =C2=A0 =C2=A0synchronizing access to a mount point.
>>
>> May you be more precise on what is misleading please?
>
> diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
> index 088d939..053bd68 100644
> --- a/sys/kern/vfs_mount.c
> +++ b/sys/kern/vfs_mount.c
> @@ -1034,14 +1034,10 @@ vfs_domount(
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0else
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0mp->mnt_kern_flag &=3D ~MNTK_ASYNC;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0MNT_IUNLOCK(mp);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((mp->mnt_flag & MN=
T_RDONLY) =3D=3D 0) {
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 if (mp->mnt_syncer =3D=3D NULL)
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 error =3D vfs_allocate_syncvnode(mp);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } else {
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 if (mp->mnt_syncer !=3D NULL)
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vrele(mp->mnt_syncer);
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 mp->mnt_syncer =3D NULL;
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ((mp->mnt_flag & MN=
T_RDONLY) =3D=3D 0)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 error =3D vfs_allocate_syncvnode(mp);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 vfs_deallocate_syncvnode(mp);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vfs_unbusy(mp);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0VI_LOCK(vp);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vp->v_iflag &=3D ~=
VI_MOUNT;
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index ff6744a..6e4bb12 100644
> --- a/sys/kern/vfs_subr.c
> +++ b/sys/kern/vfs_subr.c
> @@ -3400,12 +3400,31 @@ vfs_allocate_syncvnode(struct mount *mp)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0/* XXX - vn_syncer_add_to_worklist() also grab=
s and drops sync_mtx. */
> =C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_lock(&sync_mtx);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0sync_vnode_count++;
> + =C2=A0 =C2=A0 =C2=A0 if (mp->mnt_syncer =3D=3D NULL) {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_syncer =3D vp;
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vp =3D NULL;
> + =C2=A0 =C2=A0 =C2=A0 }
> =C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_unlock(&sync_mtx);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0BO_UNLOCK(bo);
> - =C2=A0 =C2=A0 =C2=A0 mp->mnt_syncer =3D vp;
> + =C2=A0 =C2=A0 =C2=A0 if (vp !=3D NULL)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vgone(vp);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0return (0);
> =C2=A0}
>
> +void
> +vfs_deallocate_syncvnode(struct mount *mp)
> +{
> + =C2=A0 =C2=A0 =C2=A0 struct vnode *vp;
> +
> + =C2=A0 =C2=A0 =C2=A0 mtx_lock(&sync_mtx);
> + =C2=A0 =C2=A0 =C2=A0 vp =3D mp->mnt_syncer;
> + =C2=A0 =C2=A0 =C2=A0 if (vp !=3D NULL)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mp->mnt_syncer =3D NUL=
L;
> + =C2=A0 =C2=A0 =C2=A0 mtx_unlock(&sync_mtx);
> + =C2=A0 =C2=A0 =C2=A0 if (vp !=3D NULL)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vrele(vp);
> +}
> +
> =C2=A0/*
> =C2=A0* Do a lazy sync of the filesystem.
> =C2=A0*/
> @@ -3484,15 +3503,16 @@ sync_reclaim(struct vop_reclaim_args *ap)
>
> =C2=A0 =C2=A0 =C2=A0 =C2=A0bo =3D &vp->v_bufobj;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0BO_LOCK(bo);
> - =C2=A0 =C2=A0 =C2=A0 vp->v_mount->mnt_syncer =3D NULL;
> + =C2=A0 =C2=A0 =C2=A0 mtx_lock(&sync_mtx);
> + =C2=A0 =C2=A0 =C2=A0 if (vp->v_mount->mnt_syncer =3D=3D vp)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vp->v_mount->mnt_synce=
r =3D NULL;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0if (bo->bo_flag & BO_ONWORKLST) {
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_lock(&sync_mtx);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LIST_REMOVE(bo, bo=
_synclist);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0syncer_worklist_le=
n--;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sync_vnode_count--=
;
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_unlock(&sync_mtx);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0bo->bo_flag &=3D ~=
BO_ONWORKLST;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0}
> + =C2=A0 =C2=A0 =C2=A0 mtx_unlock(&sync_mtx);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0BO_UNLOCK(bo);
>
> =C2=A0 =C2=A0 =C2=A0 =C2=A0return (0);
> diff --git a/sys/sys/mount.h b/sys/sys/mount.h
> index 20dcf64..dcff206 100644
> --- a/sys/sys/mount.h
> +++ b/sys/sys/mount.h
> @@ -731,6 +731,7 @@ int vfs_busy(struct mount *, int);
> =C2=A0int =C2=A0 =C2=A0vfs_export =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* process mount export info */
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(struct mount *, struct export_a=
rgs *);
> =C2=A0int =C2=A0 =C2=A0vfs_allocate_syncvnode(struct mount *);
> +void =C2=A0 vfs_deallocate_syncvnode(struct mount *);
> =C2=A0int =C2=A0 =C2=A0vfs_donmount(struct thread *td, int fsflags, struc=
t uio *fsoptions);
> =C2=A0void =C2=A0 vfs_getnewfsid(struct mount *);
> =C2=A0struct cdev *vfs_getrootfsid(struct mount *);
>



--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AANLkTi=uDQQPVa80zsVRXpu3JcR0nvfC=XxUqFnKkBep>