Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 10 Jul 2011 13:30:05 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Kirk McKusick <mckusick@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r223900 - in head/sys: geom ufs/ffs
Message-ID:  <20110710114730.T1039@besplex.bde.org>
In-Reply-To: <201107100041.p6A0fVea022978@svn.freebsd.org>
References:  <201107100041.p6A0fVea022978@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 10 Jul 2011, Kirk McKusick wrote:

> Log:
>  Allow disk partitions associated with UFS read-only mounted
>  filesystems to be opened for writing. This functionality used to
>  be special-cased for just the root filesystem, but with this change
>  is now available for all UFS filesystems. This change is needed for
>  journaled soft updates recovery.
>
>  Discussed with: Jeff Roberson

This seems too unsafe to me, but it may help fix some other bugs:

1. It was broken even for the root file systems due to some bugs in
    frobbing the E flag.  It worked initially, but stopped working
    after the root file sys was mounted read-write -- changing it
    back to read-only didn't work, so in particular the root file
    system remained un-fsckable after shutting down to single user
    mode and mounting it read-only in order to fsck it.  I have
    some patches that might fix this (enclosed).  I'm not sure if
    this is fixed in -current or if moving the E flag frobbing in
    this commit fixes it.

2. tunefs should now work on file systems mounted read-only.  I
    think this once sort of worked, but the E flag broke this too
    (except initially for root), but tunefs has poor error handling
    when it turns out that that the file system is mounted read-write.
    I have some patches that might fix this (not all enclosed).

% Index: ffs_vfsops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
% retrieving revision 1.345
% diff -u -2 -r1.345 ffs_vfsops.c
% --- ffs_vfsops.c	10 Aug 2008 12:15:36 -0000	1.345
% +++ ffs_vfsops.c	14 Aug 2008 15:51:01 -0000
% @@ -229,9 +229,18 @@
%  			}
%  			vn_finished_write(mp);
% +
% +			/*
% +			 * Decrement the W count, and decrement the E count
% +			 * if we are remounting root (fsck magic).
% +			 */

Improve comments.  Might be OBE.

%  			DROP_GIANT();
%  			g_topology_lock();
% -			g_access(ump->um_cp, 0, -1, 0);
% +			error = g_access(ump->um_cp, 0, -1,
% +			    (mp->mnt_flag & MNT_ROOTFS) ? -1 : 0);

This might be the fix.

%  			g_topology_unlock();
%  			PICKUP_GIANT();
% +			if (error)
% +				return (error);
% +

Add missing error handling.  g_access() can easily fail, although it
rarely fails in the above since the above asks for less access than
has already been granted.

%  			fs->fs_ronly = 1;
%  			MNT_ILOCK(mp);
% @@ -273,18 +282,18 @@
%  				}
%  			}
% -			DROP_GIANT();
% -			g_topology_lock();
% +
%  			/*
% -			 * If we're the root device, we may not have an E count
% -			 * yet, get it now.
% +			 * Increment the W count, and increment the E count
% +			 * if we are remounting root (fsck magic).
%  			 */
% -			if (ump->um_cp->ace == 0)
% -				error = g_access(ump->um_cp, 0, 1, 1);
% -			else
% -				error = g_access(ump->um_cp, 0, 1, 0);
% +			DROP_GIANT();
% +			g_topology_lock();
% +			error = g_access(ump->um_cp, 0, 1,
% +			    (mp->mnt_flag & MNT_ROOTFS) ? 1 : 0);

Improve comments and style.  Don't be so verbose, except in the comments.

%  			g_topology_unlock();
%  			PICKUP_GIANT();
%  			if (error)
%  				return (error);
% +
%  			if ((error = vn_start_write(NULL, &mp, V_WAIT)) != 0)
%  				return (error);
% @@ -596,13 +605,13 @@
%  	dev = devvp->v_rdev;
%  	cred = td ? td->td_ucred : NOCRED;
% -
%  	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
% +
%  	DROP_GIANT();
%  	g_topology_lock();
%  	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
% -
%  	/*
% -	 * If we are a root mount, drop the E flag so fsck can do its magic.
% -	 * We will pick it up again when we remount R/W.
% +	 * If we are mounting root ro, then decrement the E count (which
% +	 * g_access() just incremented) so that fsck can do its magic.
% +	 * We will increment it again if we remount rw.
%  	 */

Probably OBE.

%  	if (error == 0 && ronly && (mp->mnt_flag & MNT_ROOTFS))
% @@ -610,4 +619,5 @@
%  	g_topology_unlock();
%  	PICKUP_GIANT();
% +
%  	VOP_UNLOCK(devvp, 0);
%  	if (error)
% @@ -620,4 +630,5 @@
%  	devvp->v_bufobj.bo_private = cp;
%  	devvp->v_bufobj.bo_ops = &ffs_ops;
% +	devvp->v_bufobj.bo_flag |= BO_MPSAFE;

Unrelated.

% 
%  	bp = NULL;

Part of a patch for tunefs:

% Index: tunefs.c
% ===================================================================
% RCS file: /home/ncvs/src/sbin/tunefs/tunefs.c,v
% retrieving revision 1.42
% diff -u -2 -r1.42 tunefs.c
% --- tunefs.c	9 Apr 2004 19:58:40 -0000	1.42
% +++ tunefs.c	10 Apr 2004 02:10:11 -0000
% ...
% @@ -224,12 +224,14 @@
%  		usage();
% 
% -	on = special = argv[0];
% +	special = argv[0];
%  	if (ufs_disk_fillout(&disk, special) == -1)
%  		goto err;
% -	if (disk.d_name != special) {
% +	if (disk.d_name != special)
%  		special = disk.d_name;
% -		if (statfs(special, &stfs) == 0 &&
% -		    strcmp(special, stfs.f_mntonname) == 0)
% -			active = 1;
% +	stfsp = getmntpt(special);
% +	if (stfsp != NULL) {
% +		if (pflag == 0 && (stfsp->f_flags & MNT_RDONLY) == 0)
% +			errx(1,
% +			    "cannot work on read-write mounted file system");
%  	}
%

getmntpt() is essentially a copy of the function of the same name in
fsck_ffs.  It is used for similar things.  Here it is used to check
more carefully if the fs is already mounted, and to bail out early if
it is already mounted r/w.

newfs_foofs should use a similar function, which should be in libc near
the fstab access functions.  I got ps@ to do add getmntpt() in newfs_msdos
and started from there in tunefs.  Better access control in the kernel
might make this function less needed in applications, but it is still
good for error handling.

I forget what replaced the hotroot stuff in fsck_ffs, or what the unpatched
version of tunefs does instead of bailing out -- it apparently just
continues until sbwrite() fails.

This was written before the E flag existed.  I think exclusive access
just breaks read-only mounted file systems.

% ...
% @@ -370,12 +368,13 @@
%  		goto err;
%  	ufs_disk_close(&disk);
% -	if (active) {
% +	if (stfsp != NULL) {
%  		bzero(&args, sizeof(args));
% -		if (mount("ufs", on,
% -		    stfs.f_flags | MNT_UPDATE | MNT_RELOAD, &args) < 0)
% +		if (mount("ufs", stfsp->f_mntonname,
% +		    stfsp->f_flags | MNT_UPDATE | MNT_RELOAD, &args) < 0)
%  			err(9, "%s: reload", special);
%  		warnx("file system reloaded");
%  	}
%  	exit(0);
% +
%  err:
%  	if (disk.d_error != NULL)

'active' is no longer needed, since we are more careful in finding the
mounted file system if there is one.

Exclusivivity should also fix races in the above, since we should open the
device r/w and thus get exclusive access for the whole operation, but
this seems to be broken with the help of libufs.  I don't like libufs
at all.  Here its brokenness is that it obfuscates the open modes and thus
helps confuse tunefs into using wrong ones.  tunefs should use O_RDONLY for
the -p case and O_RDWR otherwise.  What it actually does is:
- O_RDONLY via ufs_disk_fullout().  Use this for most of the program.
- O_RDWR via sbwrite().  sbwrite() calls ufs_disk_write() which changes
   the open mode to O_RDWR extremely non-atomically.  It just
   does `close(disk->f_fd); disk->d_fd = open(disk->d_name, O_RDWR)'.  So
   even if you had exclusive access by opening the device O_RDWR, every
   single ufs_disk_write() drops it.

These bugs don't quite duplicate old ones in tunefs.  tunefs used to
use getsb() instead of ufs_disk_fillout(), and putsb() instead of
ufs_disk_write().  So it too used O_RDONLY for most of the program.
But it was more careful in putsb().  It held the O_RDONLY fd open while
opening a (different) O_RDWR descriptor and documented this as being
"an interlock to prevent the device from being mounted while we are
switching mode".  However, I think O_RDONLY never gave any useful
interlock.

libufs's bugs also affect newfs.  The main one that I knew about was
that the error returned by sbwrite() is always ignored.  This worked
before libufs by using wtfs() for superblock writes too.
     (wtfs() has unclean but working error handling.  It exits on error.
     Libraries shouldn't exit on error, so libufs tries to preserve
     error codes and messages from possibly-deeply-nested routines.
     Applications, faced with the complexity of this, too often don't
     even know about it and don't even check for errors.  newfs should
     probably use a wrapper function to recover its old error handling
     for superblock writes.  Using the library would still require
     several more lines of code and complications than just doing it.)
Add the close/open race to this.  wtfs() (local in newfs) uses bwrite()
in libufs, and bwrite() uses ufs_disk_write(), giving the close/open
race.

fsck_ffs mercifully doesn't use libufs, so it is missing these bugs.
It uses write(), but otherwise has more specialized error handling
than a library can reasonably do.  This is why I don't like libraries
like libufs.  They make simple things more complicated and buggy by
wrapping layers around them, and can't help for not so simple things.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110710114730.T1039>