Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 May 2016 14:08:34 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        fs@freebsd.org
Subject:   Re: fix for per-mount i/o counting in ffs
Message-ID:  <20160518110834.GJ89104@kib.kiev.ua>
In-Reply-To: <20160518084931.T6534@besplex.bde.org>
References:  <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, May 18, 2016 at 10:00:09AM +1000, Bruce Evans wrote:
> On Wed, 18 May 2016, Konstantin Belousov wrote:
> > VCHR check ensures that the devvp vnode is not reclaimed. I do not want
> > to remove the check and rely on the caller of ffs_mountfs() to always do
> > the right thing for it without unlocking devvp, this is too subtle.
> 
> Surely the caller must lock devvp?  Otherwise none of the uses of devvp
> can be trusted, and there are several others.
It must lock, but the interface of ffs_mountfs() would then require
that there is no relock between vn_isdisk() check and call.

I think I know how to make a good compromise there.  I converted the
check for VCHR into the assert.

> There is also ump->um_devvvp, but this seems to be unusable since it
> might go away.
Go away as in being reclaimed, yes.  The vnode itself is there, since
we keep a reference.

> 
> So using the devvp->v_rdev instead of the dev variable is not just a
> style bug.
Might be.


> > 	devvp->v_bufobj.bo_ops = &ffs_ops;
> > 	if (devvp->v_type == VCHR)
> 
> devvp must be still VCHR since this is now under the vnode lock, and we
> depend on dev remaining a character device for the disk described by
> devvp at the time of the vn_isdisk() check.
Unless relocked.

> 
> > -		devvp->v_rdev->si_mountpt = mp;
> > +		dev->si_mountpt = mp;
> > +	VOP_UNLOCK(devvp, 0);
> 
> The unlocking could be a little earlier since dev is still for a disk even
> if devvp went away and you changed this to not used devvp->v_rdev.
> 
> >
> > 	fs = NULL;
> > 	sblockloc = 0;
> 
> Unlocking and then using devvp sure looks like a race.
> 
> You only needed to move the unlocking to fix.  devvp->v_bufobj.  How does
> that work?  The write is now locked, but if devvp goes away, then don't
> we lose its bufobj?
The buffer queues are flushed, and BO_DEAD flag is set.  But the flag
does very little.


> How does any use of ump->um_devvp work?
> 
> I tried revoke(2) on the devvp of a mounted file system.  This worked
> to give v_type = VBAD and v_rdev = NULL, but didn't crash.  ffs_unmount()
> checked for the bad vnode, unlike most places, and failed to clear
> si_mountpt.
> 
> Normal use doesn't have revokes, but if the vnode is reclaimed instead
> of just becoming bad, then worse things probably happen.  I think vnode
> cache resizing gives very unstable storage so the pointer becomes very
> invalid.  But even revoke followed by setting kern.numvnodes to 1 didn't
> crash (15 vnodes remained).  So devvp must be referenced throughout.
> It seems to have reference count 2, since umounting reduced kern.numvnodes
> from 15 to 13.  (It is surprising how much works with kern.maxvnodes=1.
> I was able to run revoke, sysctl and umount.)  It is still a mystery
> that the VBAD vnode doesn't crash soon.
> 
I believe that bo_ops assignment is the reason why UFS mounts survive the
reclamation of the devvp vnode.  Take a look at the ffs_geom_strategy(),
which is the place where UFS io is tunneled directly into geom.  It does
not pass io requests through devfs.  As result, revocation does not
change much except doing unneccessary buf queue flush.

It might be telling to try the same experiment, as conducted in your
next message, on msdosfs instead of UFS.

Below is the simplified patch.

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 712fc21..412b000 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -764,6 +764,7 @@ ffs_mountfs(devvp, mp, td)
 	cred = td ? td->td_ucred : NOCRED;
 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
 
+	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
 	dev = devvp->v_rdev;
 	dev_ref(dev);
 	DROP_GIANT();
@@ -771,17 +772,17 @@ ffs_mountfs(devvp, mp, td)
 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
 	g_topology_unlock();
 	PICKUP_GIANT();
-	VOP_UNLOCK(devvp, 0);
-	if (error)
+	if (error != 0) {
+		VOP_UNLOCK(devvp, 0);
 		goto out;
-	if (devvp->v_rdev->si_iosize_max != 0)
-		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+	}
+	if (dev->si_iosize_max != 0)
+		mp->mnt_iosize_max = dev->si_iosize_max;
 	if (mp->mnt_iosize_max > MAXPHYS)
 		mp->mnt_iosize_max = MAXPHYS;
-
 	devvp->v_bufobj.bo_ops = &ffs_ops;
-	if (devvp->v_type == VCHR)
-		devvp->v_rdev->si_mountpt = mp;
+	dev->si_mountpt = mp;
+	VOP_UNLOCK(devvp, 0);
 
 	fs = NULL;
 	sblockloc = 0;
@@ -1083,8 +1084,7 @@ ffs_mountfs(devvp, mp, td)
 out:
 	if (bp)
 		brelse(bp);
-	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
-		devvp->v_rdev->si_mountpt = NULL;
+	dev->si_mountpt = NULL;
 	if (cp != NULL) {
 		DROP_GIANT();
 		g_topology_lock();
@@ -1287,8 +1287,7 @@ ffs_unmount(mp, mntflags)
 	g_vfs_close(ump->um_cp);
 	g_topology_unlock();
 	PICKUP_GIANT();
-	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
-		ump->um_devvp->v_rdev->si_mountpt = NULL;
+	ump->um_dev->si_mountpt = NULL;
 	vrele(ump->um_devvp);
 	dev_rel(ump->um_dev);
 	mtx_destroy(UFS_MTX(ump));




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160518110834.GJ89104>