Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 12 Dec 2006 12:19:03 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Suleiman Souhlal <ssouhlal@freebsd.org>
Cc:        freebsd-stable@freebsd.org, V??clav Haisman <V.Haisman@sh.cvut.cz>, tegge@freebsd.org, bde@freebsd.org, freebsd-current@freebsd.org
Subject:   Re: kqueue LOR
Message-ID:  <20061212101903.GF311@deviant.kiev.zoral.com.ua>
In-Reply-To: <457E6C06.1020405@FreeBSD.org>
References:  <456950AF.3090308@sh.cvut.cz> <20061127092146.GA69556@deviant.kiev.zoral.com.ua> <457E6C06.1020405@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--J4XPiPrVK1ev6Sgr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
> Kostik Belousov wrote:
> >On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
> >
> >>Hi,
> >>the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6=
.1
> >>with relatively recent kernel, from last week or so.
> >>
> >>--
> >>VH
> >
> >
> >>+lock order reversal:
> >>+ 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
> >>+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @=20
> >>/usr/src/sys/ufs/ufs/ufs_vnops.c:138
> >>+KDB: stack backtrace:
> >>+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at=20
> >>kdb_backtrace+0x2f
> >>+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at=20
> >>witness_checkorder+0x5fe
> >>+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at=20
> >>_mtx_lock_flags+0x32
> >>+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at=20
> >>ufs_itimes+0x6c
> >>+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at=20
> >>ufs_getattr+0x20
> >>+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at=
=20
> >>VOP_GETATTR_APV+0x3a
> >>+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
> >>+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
> >>+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at=20
> >>VOP_WRITE_APV+0x148
> >>+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
> >>+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at=20
> >>dofilewrite+0x84
> >>+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
> >>+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
> >>+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
> >>+Xint0x80_syscall() at Xint0x80_syscall+0x1f
> >>+--- syscall (4, FreeBSD ELF32, write), eip =3D 0x2831d727, esp =3D=20
> >>0xbfbfea1c, ebp =3D 0xbfbfea48 ---
> >
> >
> >Thank you for the report. The LOR is caused by my commit into
> >sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
>=20
> Is the mount lock really required, if all we're doing is a single read of=
 a=20
> single word (mnt_kern_flags) (v_mount should be read-only for the whole=
=20
> lifetime of the vnode, I believe)? After all, reads of a single word are=
=20
> atomic on all our supported architectures.
> The only situation I see where there MIGHT be problems are forced unmount=
s,=20
> but I think there are bigger issues with those.
> Sorry for noticing this email only now.

The problem is real with snapshotting. Ignoring
MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
inactivation time. This was the big trouble with nfsd and snapshots. As
such, I think that precise value of mmnt_kern_flag is critical there,
and mount interlock is needed.

Practically speaking, I agree with claim that reading of m_k_f is
surrounded by enough locked operations that would make sure that
the read value is not stale. But there is no such guarantee on
future/non-i386 arches, isn't it ?

As a side note, mount interlock scope could be reduced there.

Index: ufs/ufs/ufs_vnops.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.283
diff -u -r1.283 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c	6 Nov 2006 13:42:09 -0000	1.283
+++ ufs/ufs/ufs_vnops.c	12 Dec 2006 10:18:04 -0000
@@ -133,19 +134,19 @@
 {
 	struct inode *ip;
 	struct timespec ts;
-	int mnt_locked;
=20
 	ip =3D VTOI(vp);
-	mnt_locked =3D 0;
 	if ((vp->v_mount->mnt_flag & MNT_RDONLY) !=3D 0) {
 		VI_LOCK(vp);
 		goto out;
 	}
 	MNT_ILOCK(vp->v_mount);		/* For reading of mnt_kern_flags. */
-	mnt_locked =3D 1;
 	VI_LOCK(vp);
-	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) =3D=3D 0)
-		goto out_unl;
+	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) =3D=3D 0) {
+		MNT_IUNLOCK(vp->v_mount);
+		VI_UNLOCK(vp);
+		return;
+	}
=20
 	if ((vp->v_type =3D=3D VBLK || vp->v_type =3D=3D VCHR) && !DOINGSOFTDEP(v=
p))
 		ip->i_flag |=3D IN_LAZYMOD;
@@ -155,6 +156,7 @@
 		ip->i_flag |=3D IN_MODIFIED;
 	else if (ip->i_flag & IN_ACCESS)
 		ip->i_flag |=3D IN_LAZYACCESS;
+	MNT_IUNLOCK(vp->v_mount);
 	vfs_timestamp(&ts);
 	if (ip->i_flag & IN_ACCESS) {
 		DIP_SET(ip, i_atime, ts.tv_sec);
@@ -172,10 +174,7 @@
=20
  out:
 	ip->i_flag &=3D ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
- out_unl:
 	VI_UNLOCK(vp);
-	if (mnt_locked)
-		MNT_IUNLOCK(vp->v_mount);
 }
=20
 /*

--J4XPiPrVK1ev6Sgr
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (FreeBSD)

iD8DBQFFfoIWC3+MBN1Mb4gRAmA8AKCrbt+mT1HCNj3VnFFQVu+fb2JzZQCcCr0L
n4DVdlGDbEkZAwaaBZhEFo8=
=7yaQ
-----END PGP SIGNATURE-----

--J4XPiPrVK1ev6Sgr--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061212101903.GF311>