From owner-freebsd-stable@FreeBSD.ORG Tue Dec 12 12:49:47 2006 Return-Path: X-Original-To: freebsd-stable@FreeBSD.org Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2126016A709; Tue, 12 Dec 2006 12:49:47 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1-3.pacific.net.au [61.8.2.210]) by mx1.FreeBSD.org (Postfix) with ESMTP id E06FD43CC5; Tue, 12 Dec 2006 12:48:23 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.2.163]) by mailout1.pacific.net.au (Postfix) with ESMTP id 1373B5A3EFF; Tue, 12 Dec 2006 23:49:45 +1100 (EST) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy2.pacific.net.au (Postfix) with ESMTP id 06F1727408; Tue, 12 Dec 2006 23:49:42 +1100 (EST) Date: Tue, 12 Dec 2006 23:49:42 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Kostik Belousov In-Reply-To: <20061212101903.GF311@deviant.kiev.zoral.com.ua> Message-ID: <20061212220705.F57430@delplex.bde.org> References: <456950AF.3090308@sh.cvut.cz> <20061127092146.GA69556@deviant.kiev.zoral.com.ua> <457E6C06.1020405@FreeBSD.org> <20061212101903.GF311@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: V??clav Haisman , freebsd-stable@FreeBSD.org, Suleiman Souhlal , freebsd-current@FreeBSD.org, bde@FreeBSD.org, tegge@FreeBSD.org Subject: Re: kqueue LOR X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Dec 2006 12:49:47 -0000 On Tue, 12 Dec 2006, Kostik Belousov wrote: > On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: >> Is the mount lock really required, if all we're doing is a single read of a >> single word (mnt_kern_flags) (v_mount should be read-only for the whole >> lifetime of the vnode, I believe)? After all, reads of a single word are >> atomic on all our supported architectures. >> The only situation I see where there MIGHT be problems are forced unmounts, >> but I think there are bigger issues with those. >> Sorry for noticing this email only now. > > The problem is real with snapshotting. Ignoring > MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of > mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode > inactivation time. This was the big trouble with nfsd and snapshots. As > such, I think that precise value of mmnt_kern_flag is critical there, > and mount interlock is needed. Locking for just read is almost always bogus, but here (as in most cases) there is also a write based on the contents of the flag, and the lock is held across the write. > Practically speaking, I agree with claim that reading of m_k_f is > surrounded by enough locked operations that would make sure that > the read value is not stale. But there is no such guarantee on > future/non-i386 arches, isn't it ? I think not-very-staleness is implied by acquire/release semantics which are part of the API for most atomic operations. This behaviour doesn't seem to be documented for mutexes, but I don't see how mutexes could work without it (they have to synchronize all memory accesses, not just the memory accessed by the lock). > As a side note, mount interlock scope could be reduced there. > > Index: ufs/ufs/ufs_vnops.c > =================================================================== > RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v > retrieving revision 1.283 > diff -u -r1.283 ufs_vnops.c > --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -0000 1.283 > +++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 -0000 > @@ -133,19 +134,19 @@ > { > struct inode *ip; > struct timespec ts; > - int mnt_locked; > > ip = VTOI(vp); > - mnt_locked = 0; > if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) { > VI_LOCK(vp); > goto out; > } > MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */ > - mnt_locked = 1; > VI_LOCK(vp); > - if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) > - goto out_unl; > + if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) { > + MNT_IUNLOCK(vp->v_mount); > + VI_UNLOCK(vp); > + return; > + } The version that depends on not-very-staleness would test the flags without acquiring the lock(s) and return immediately in the usual case where none of the flags are set. It would have to acquire the locks and repeat the test to make changes (and the test is already repeated one flag at a time). I think this would be correct enough, but still inefficient and/or even messier. The current organization is usually: acquire vnode interlock in caller release vnode interlock in caller to avoid messes here (inefficient) call acquire mount interlock acquire vnode interlock test the flags; goto cleanup code if none set (usual case) do the work release vnode interlock release mount interlock return acquire vnode interlock (if needed) release vnode interlock (if needed) and it might become: acquire vnode interlock in caller call test the flags; return if none set (usual case) release vnode interlock // check that callers are aware of this acquire mount interlock acquire vnode interlock do the work // Assume no LOR problem for release, as below. // Otherwise need another relese+acquire of vnode interlock. release mount interlock return release vnode interlock > > if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp)) > ip->i_flag |= IN_LAZYMOD; > @@ -155,6 +156,7 @@ > ip->i_flag |= IN_MODIFIED; > else if (ip->i_flag & IN_ACCESS) > ip->i_flag |= IN_LAZYACCESS; > + MNT_IUNLOCK(vp->v_mount); > vfs_timestamp(&ts); > if (ip->i_flag & IN_ACCESS) { > DIP_SET(ip, i_atime, ts.tv_sec); Is there no LOR problem for release? As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being converted to IN_MODIFED, so after this conversion is done the lock is not needed. Is this correct? > @@ -172,10 +174,7 @@ > > out: > ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); > - out_unl: > VI_UNLOCK(vp); > - if (mnt_locked) > - MNT_IUNLOCK(vp->v_mount); > } > > /* > BTW, vfs.lookup_shared defaults to 0 and decides shared access for all operations including read, so I wonder if there are [m]any bugs preventing shared accesses being the default for the most important cases where they can be (lookup and reads). (As you know, not acquiring the vnode interlock in old versions of the above was one, and since vfs.lookup_shared is global and not all file systems have the fix, it still is one.) Bruce