From owner-svn-src-all@FreeBSD.ORG Sun Jul 10 03:30:14 2011 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AE288106564A; Sun, 10 Jul 2011 03:30:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 217EA8FC12; Sun, 10 Jul 2011 03:30:13 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p6A3U5an018624 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 10 Jul 2011 13:30:06 +1000 Date: Sun, 10 Jul 2011 13:30:05 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Kirk McKusick In-Reply-To: <201107100041.p6A0fVea022978@svn.freebsd.org> Message-ID: <20110710114730.T1039@besplex.bde.org> References: <201107100041.p6A0fVea022978@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r223900 - in head/sys: geom ufs/ffs X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Jul 2011 03:30:14 -0000 On Sun, 10 Jul 2011, Kirk McKusick wrote: > Log: > Allow disk partitions associated with UFS read-only mounted > filesystems to be opened for writing. This functionality used to > be special-cased for just the root filesystem, but with this change > is now available for all UFS filesystems. This change is needed for > journaled soft updates recovery. > > Discussed with: Jeff Roberson This seems too unsafe to me, but it may help fix some other bugs: 1. It was broken even for the root file systems due to some bugs in frobbing the E flag. It worked initially, but stopped working after the root file sys was mounted read-write -- changing it back to read-only didn't work, so in particular the root file system remained un-fsckable after shutting down to single user mode and mounting it read-only in order to fsck it. I have some patches that might fix this (enclosed). I'm not sure if this is fixed in -current or if moving the E flag frobbing in this commit fixes it. 2. tunefs should now work on file systems mounted read-only. I think this once sort of worked, but the E flag broke this too (except initially for root), but tunefs has poor error handling when it turns out that that the file system is mounted read-write. I have some patches that might fix this (not all enclosed). % Index: ffs_vfsops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v % retrieving revision 1.345 % diff -u -2 -r1.345 ffs_vfsops.c % --- ffs_vfsops.c 10 Aug 2008 12:15:36 -0000 1.345 % +++ ffs_vfsops.c 14 Aug 2008 15:51:01 -0000 % @@ -229,9 +229,18 @@ % } % vn_finished_write(mp); % + % + /* % + * Decrement the W count, and decrement the E count % + * if we are remounting root (fsck magic). % + */ Improve comments. Might be OBE. % DROP_GIANT(); % g_topology_lock(); % - g_access(ump->um_cp, 0, -1, 0); % + error = g_access(ump->um_cp, 0, -1, % + (mp->mnt_flag & MNT_ROOTFS) ? -1 : 0); This might be the fix. % g_topology_unlock(); % PICKUP_GIANT(); % + if (error) % + return (error); % + Add missing error handling. g_access() can easily fail, although it rarely fails in the above since the above asks for less access than has already been granted. % fs->fs_ronly = 1; % MNT_ILOCK(mp); % @@ -273,18 +282,18 @@ % } % } % - DROP_GIANT(); % - g_topology_lock(); % + % /* % - * If we're the root device, we may not have an E count % - * yet, get it now. % + * Increment the W count, and increment the E count % + * if we are remounting root (fsck magic). % */ % - if (ump->um_cp->ace == 0) % - error = g_access(ump->um_cp, 0, 1, 1); % - else % - error = g_access(ump->um_cp, 0, 1, 0); % + DROP_GIANT(); % + g_topology_lock(); % + error = g_access(ump->um_cp, 0, 1, % + (mp->mnt_flag & MNT_ROOTFS) ? 1 : 0); Improve comments and style. Don't be so verbose, except in the comments. % g_topology_unlock(); % PICKUP_GIANT(); % if (error) % return (error); % + % if ((error = vn_start_write(NULL, &mp, V_WAIT)) != 0) % return (error); % @@ -596,13 +605,13 @@ % dev = devvp->v_rdev; % cred = td ? td->td_ucred : NOCRED; % - % ronly = (mp->mnt_flag & MNT_RDONLY) != 0; % + % DROP_GIANT(); % g_topology_lock(); % error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); % - % /* % - * If we are a root mount, drop the E flag so fsck can do its magic. % - * We will pick it up again when we remount R/W. % + * If we are mounting root ro, then decrement the E count (which % + * g_access() just incremented) so that fsck can do its magic. % + * We will increment it again if we remount rw. % */ Probably OBE. % if (error == 0 && ronly && (mp->mnt_flag & MNT_ROOTFS)) % @@ -610,4 +619,5 @@ % g_topology_unlock(); % PICKUP_GIANT(); % + % VOP_UNLOCK(devvp, 0); % if (error) % @@ -620,4 +630,5 @@ % devvp->v_bufobj.bo_private = cp; % devvp->v_bufobj.bo_ops = &ffs_ops; % + devvp->v_bufobj.bo_flag |= BO_MPSAFE; Unrelated. % % bp = NULL; Part of a patch for tunefs: % Index: tunefs.c % =================================================================== % RCS file: /home/ncvs/src/sbin/tunefs/tunefs.c,v % retrieving revision 1.42 % diff -u -2 -r1.42 tunefs.c % --- tunefs.c 9 Apr 2004 19:58:40 -0000 1.42 % +++ tunefs.c 10 Apr 2004 02:10:11 -0000 % ... % @@ -224,12 +224,14 @@ % usage(); % % - on = special = argv[0]; % + special = argv[0]; % if (ufs_disk_fillout(&disk, special) == -1) % goto err; % - if (disk.d_name != special) { % + if (disk.d_name != special) % special = disk.d_name; % - if (statfs(special, &stfs) == 0 && % - strcmp(special, stfs.f_mntonname) == 0) % - active = 1; % + stfsp = getmntpt(special); % + if (stfsp != NULL) { % + if (pflag == 0 && (stfsp->f_flags & MNT_RDONLY) == 0) % + errx(1, % + "cannot work on read-write mounted file system"); % } % getmntpt() is essentially a copy of the function of the same name in fsck_ffs. It is used for similar things. Here it is used to check more carefully if the fs is already mounted, and to bail out early if it is already mounted r/w. newfs_foofs should use a similar function, which should be in libc near the fstab access functions. I got ps@ to do add getmntpt() in newfs_msdos and started from there in tunefs. Better access control in the kernel might make this function less needed in applications, but it is still good for error handling. I forget what replaced the hotroot stuff in fsck_ffs, or what the unpatched version of tunefs does instead of bailing out -- it apparently just continues until sbwrite() fails. This was written before the E flag existed. I think exclusive access just breaks read-only mounted file systems. % ... % @@ -370,12 +368,13 @@ % goto err; % ufs_disk_close(&disk); % - if (active) { % + if (stfsp != NULL) { % bzero(&args, sizeof(args)); % - if (mount("ufs", on, % - stfs.f_flags | MNT_UPDATE | MNT_RELOAD, &args) < 0) % + if (mount("ufs", stfsp->f_mntonname, % + stfsp->f_flags | MNT_UPDATE | MNT_RELOAD, &args) < 0) % err(9, "%s: reload", special); % warnx("file system reloaded"); % } % exit(0); % + % err: % if (disk.d_error != NULL) 'active' is no longer needed, since we are more careful in finding the mounted file system if there is one. Exclusivivity should also fix races in the above, since we should open the device r/w and thus get exclusive access for the whole operation, but this seems to be broken with the help of libufs. I don't like libufs at all. Here its brokenness is that it obfuscates the open modes and thus helps confuse tunefs into using wrong ones. tunefs should use O_RDONLY for the -p case and O_RDWR otherwise. What it actually does is: - O_RDONLY via ufs_disk_fullout(). Use this for most of the program. - O_RDWR via sbwrite(). sbwrite() calls ufs_disk_write() which changes the open mode to O_RDWR extremely non-atomically. It just does `close(disk->f_fd); disk->d_fd = open(disk->d_name, O_RDWR)'. So even if you had exclusive access by opening the device O_RDWR, every single ufs_disk_write() drops it. These bugs don't quite duplicate old ones in tunefs. tunefs used to use getsb() instead of ufs_disk_fillout(), and putsb() instead of ufs_disk_write(). So it too used O_RDONLY for most of the program. But it was more careful in putsb(). It held the O_RDONLY fd open while opening a (different) O_RDWR descriptor and documented this as being "an interlock to prevent the device from being mounted while we are switching mode". However, I think O_RDONLY never gave any useful interlock. libufs's bugs also affect newfs. The main one that I knew about was that the error returned by sbwrite() is always ignored. This worked before libufs by using wtfs() for superblock writes too. (wtfs() has unclean but working error handling. It exits on error. Libraries shouldn't exit on error, so libufs tries to preserve error codes and messages from possibly-deeply-nested routines. Applications, faced with the complexity of this, too often don't even know about it and don't even check for errors. newfs should probably use a wrapper function to recover its old error handling for superblock writes. Using the library would still require several more lines of code and complications than just doing it.) Add the close/open race to this. wtfs() (local in newfs) uses bwrite() in libufs, and bwrite() uses ufs_disk_write(), giving the close/open race. fsck_ffs mercifully doesn't use libufs, so it is missing these bugs. It uses write(), but otherwise has more specialized error handling than a library can reasonably do. This is why I don't like libraries like libufs. They make simple things more complicated and buggy by wrapping layers around them, and can't help for not so simple things. Bruce