Date: Fri, 20 May 2016 18:36:49 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Bruce Evans <brde@optusnet.com.au> Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160520153649.GX89104@kib.kiev.ua> In-Reply-To: <20160520142654.GW89104@kib.kiev.ua> References: <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520092348.GV89104@kib.kiev.ua> <20160520194427.W1170@besplex.bde.org> <20160520142654.GW89104@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, May 20, 2016 at 05:26:54PM +0300, Konstantin Belousov wrote: > On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote: > > On Fri, 20 May 2016, Konstantin Belousov wrote: > > > > > On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote: > > >>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > > >>> index 712fc21..21425f5 100644 > > >>> --- a/sys/ufs/ffs/ffs_vfsops.c > > >>> +++ b/sys/ufs/ffs/ffs_vfsops.c > > >>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > > >>> cred = td ? td->td_ucred : NOCRED; > > >>> ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > > >>> > > >>> + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > > >>> dev = devvp->v_rdev; > > >>> dev_ref(dev); > > >>> + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) { > > >>> + dev_rel(dev); > > >>> + VOP_UNLOCK(devvp, 0); > > >>> + return (EBUSY); > > >>> + } > > >> > > >> This is cleaner and safer than my version. > > >> > > >>> DROP_GIANT(); > > >>> g_topology_lock(); > > >>> error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > >> > > >> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless > > >> it fails. This clobbered our setting in the buggy multiple-mount case. > > >> But with multiple mounts not allowed, this cleans up any garbage in > > >> v_bufobj. > > > Yes, and this orders things. g_vfs_open() shoudl have devvp locked, > > > both fo bo manipulations and for vnode_create_vobject() call. > > > We can only assign to bo_ops after g_vfs_open() was done successfully. > > > > The atomic cmpset now orders things too. Is that enough? It ensures > > that an old mount cannot be active. I don't know if v_bufobj is used > > for non-mounts. > v_bufobj is logically protected against modifications by the vnode lock. > > > > > Except, for zfs there is no g_vfs_open() to order things, and for all > > other file systems there is no atomic cmpset yet. > > > > >> g_vfs_open() has 2 failures for non-exclusive access. It starts by > > >> checking v_bufobj.bo_private == devvp (this is after translating its > > >> pointers to the ones passed here). This is avg's fix for the multiple- > > >> mounts problem (r206130). It doesn't work in all cases. I think this > > >> is unecessary now. > > > At least it weeds out other devfs mounts. > > > > Yes, we need it until everything is converted. > > > > >> ... > > >> I now see another cleanup: don't goto out when g_vfs_open() fails. This > > >> depends on it setting cp to NULL and leaving nothing to clean when it > > >> fails. It has no man page and this detail is documented in its source > > >> code. > > > Then I would need to add another NULL assignment, VOP_UNLOCK etc. > > > > g_vfs_open() already sets cp to NULL when it fails, and the cleanup > > depends on that now, but it is just as good to depend on no cleanup > > being needed on failure. You do need another dev_rel(). > > > > I thought about moving the dev_ref() later to simplify the early returns. > > I thought that this didn't quite work. Now I think it does work, for > > obvious reasons: > > - the device is attached to a vnode, so it is referenced to prevent it > > going away unless the device is revoked. It seems to be referenced > > at least 3 times in FreeBSD-9. > > - the vnode is locked, so the reference count remains > 0 until we unlock. > > So we just need a dev_ref() before the unlock in the non-error case, to > > keep the device from going away if it is revoked. > Yes, and this is how the current patched code is structured. > > > > > > Updated patch to add acq/rel. > > > > > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > > > index 712fc21..670bb15 100644 > > > --- a/sys/ufs/ffs/ffs_vfsops.c > > > +++ b/sys/ufs/ffs/ffs_vfsops.c > > > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > > > cred = td ? td->td_ucred : NOCRED; > > > ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > > > > > > + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > > > > Hrmph. > I want this, it would remove amount of obvious questions. > > > > > > dev = devvp->v_rdev; > > > dev_ref(dev); > > > > Move later... > > > > > + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) { > > > > I changed the first 0 to NULL, and this works on i386, but now I remember > > that i386 has bogus casts which break detection of type mismatches -- > > the atomic ptr functions take a [u]intptr_t, not a pointer type, so > > NULL won't work if it is ((void *)0). At least amd64 is still missing > > this bug. > cmpset_<bar>_ptr() on i386 has cast for old and new parameters to u_int. > store_rel_ptr() on i386 does not cast value to u_int. As result, NULL > is acceptable for cmpset, but not for store. I spelled it 0 in all cases. > > Hm, I also should add uintptr_t cast for cmpset, otherwise, I suspect, > some arch might be broken. Even more casts are needed, updated patch is below. > > > > > > + dev_rel(dev); > > > > ...then this dev_rel() is not needed. > > > > > + VOP_UNLOCK(devvp, 0); > > > + return (EBUSY); > > > + } > > > DROP_GIANT(); > > > g_topology_lock(); > > > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > > g_topology_unlock(); > > > PICKUP_GIANT(); > > > - VOP_UNLOCK(devvp, 0); > > > - if (error) > > > + if (error != 0) { > > > + VOP_UNLOCK(devvp, 0); > > > goto out; > > > > This becomes: > > > > if (error != 0) { > > VOP_UNLOCK(devvp, 0); > > return (EBUSY); > > } > > > > Then assign v_bufobj. > > > > Then dev_ref(), just in time for unlocking. > > > > Then unlock. > Ok. > > > > > > - if (devvp->v_rdev->si_iosize_max != 0) > > > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; > > > + } > > > + if (dev->si_iosize_max != 0) > > > + mp->mnt_iosize_max = dev->si_iosize_max; > > > if (mp->mnt_iosize_max > MAXPHYS) > > > mp->mnt_iosize_max = MAXPHYS; > > > - > > > devvp->v_bufobj.bo_ops = &ffs_ops; > > > - if (devvp->v_type == VCHR) > > > - devvp->v_rdev->si_mountpt = mp; > > > + VOP_UNLOCK(devvp, 0); > > > > This belongs earlier. > > > > > > > > fs = NULL; > > > sblockloc = 0; > > > ... > > > > We need this in a central function. g_vfs_open/close() can do it for > > all cases except zfs. This looks like: > I might look at this later. > > > > > DROP_GIANT(); > > g_topology_lock(); > > // atomic_cmpset and its error = EBUSY moved to top of g_vfs_open() > > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > g_topology_unlock(); > > PICKUP_GIANT(); > > if (error != 0) { > > VOP_UNLOCK(devvp, 0); > > return (error); > > } > > devvp->v_bufobj.bo_ops = &ffs_ops; > > dev_ref(dev); > > VOP_UNLOCK(devvp, 0); > > if (dev->si_iosize_max != 0) > > mp->mnt_iosize_max = dev->si_iosize_max; > > if (mp->mnt_iosize_max > MAXPHYS) > > mp->mnt_iosize_max = MAXPHYS; > > > > where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to > > be cleaned up. > > > > Resetting si_mountpt in g_vfs_close() is even simpler. Oops, it also has > > to be reset in g_vfs_open() on a later failure there. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..0487c2f 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -764,25 +764,31 @@ ffs_mountfs(devvp, mp, td) cred = td ? td->td_ucred : NOCRED; ronly = (mp->mnt_flag & MNT_RDONLY) != 0; + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); dev = devvp->v_rdev; - dev_ref(dev); + if (atomic_cmpset_acq_ptr((uintptr_t *)&dev->si_mountpt, 0, + (uintptr_t)mp) != 0) { + VOP_UNLOCK(devvp, 0); + return (EBUSY); + } DROP_GIANT(); g_topology_lock(); error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); + if (error != 0) { + VOP_UNLOCK(devvp, 0); + atomic_store_rel_ptr((uintptr_t *)&dev->si_mountpt, 0); + return (error); + } + dev_ref(dev); + devvp->v_bufobj.bo_ops = &ffs_ops; VOP_UNLOCK(devvp, 0); - if (error) - goto out; - if (devvp->v_rdev->si_iosize_max != 0) - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + if (dev->si_iosize_max != 0) + mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; - if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; - fs = NULL; sblockloc = 0; /* @@ -1083,8 +1089,6 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1102,6 +1106,7 @@ out: free(ump, M_UFSMNT); mp->mnt_data = NULL; } + atomic_store_rel_ptr((uintptr_t *)&dev->si_mountpt, 0); dev_rel(dev); return (error); } @@ -1287,8 +1292,7 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; + atomic_store_rel_ptr((uintptr_t *)&ump->um_dev->si_mountpt, 0); vrele(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump));
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160520153649.GX89104>