Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 22 Dec 2007 22:16:13 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        "Freebsd-Net@Freebsd. Org" <freebsd-net@freebsd.org>, David Schwartz <davids@webmaster.com>, freebsd-stable@freebsd.org
Subject:   Re: Packet loss every 30.999 seconds
Message-ID:  <20071222201613.GX57756@deviant.kiev.zoral.com.ua>
In-Reply-To: <20071223032944.G48303@delplex.bde.org>
References:  <20071221234347.GS25053@tnn.dglawrence.com> <MDEHLPKNGKAHNMBLJOLKMEKLJAAC.davids@webmaster.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--0k4Rxg87Lb8yV0u3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Dec 23, 2007 at 04:08:09AM +1100, Bruce Evans wrote:
> On Sat, 22 Dec 2007, Kostik Belousov wrote:
> >Yes, rewriting the syncer is the right solution. It probably cannot be d=
one
> >quickly enough. If the yield workaround provide mitigation for now, it
> >shall go in.
>=20
> I don't think rewriting the syncer just for this is the right solution.
> Rewriting the syncer so that it schedules actual i/o more efficiently
> might involve a solution.  Better scheduling would probably take more
> CPU and increase the problem.
I think that we can easily predict what vnode(s) become dirty at the
places where we do vn_start_write().

>=20
> Note that MNT_VNODE_FOREACH() is used 17 times, so the yielding fix is
> needed in 17 places if it isn't done internally in MNT_VNODE_FOREACH().
> There are 4 places in vfs and 13 places in 6 file systems:
>=20
> % ./ufs/ffs/ffs_snapshot.c:	MNT_VNODE_FOREACH(xvp, mp, mvp) {
> % ./ufs/ffs/ffs_snapshot.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./ufs/ffs/ffs_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./ufs/ffs/ffs_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./ufs/ufs/ufs_quota.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./ufs/ufs/ufs_quota.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./ufs/ufs/ufs_quota.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./fs/msdosfs/msdosfs_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, nvp) {
> % ./fs/coda/coda_subr.c:	MNT_VNODE_FOREACH(vp, mp, nvp) {
> % ./gnu/fs/ext2fs/ext2_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./gnu/fs/ext2fs/ext2_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./kern/vfs_default.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./kern/vfs_subr.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./kern/vfs_subr.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./nfs4client/nfs4_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
> % ./nfsclient/nfs_subs.c:	MNT_VNODE_FOREACH(vp, mp, nvp) {
> % ./nfsclient/nfs_vfsops.c:	MNT_VNODE_FOREACH(vp, mp, mvp) {
>=20
> Only file systems that support writing need it (for VOP_SYNC() and for
> MNT_RELOAD), else there would be many more places.  There would also
> be more places if MNT_RELOAD support were not missing for some file
> systems.

Ok, since you talked about this first :). I already made the following
patch, but did not published it since I still did not inspected all
callers of MNT_VNODE_FOREACH() for safety of dropping mount interlock.
It shall be safe, but better to check. Also, I postponed the check
until it was reported that yielding does solve the original problem.

diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index 14acc5b..046af82 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -1994,6 +1994,12 @@ __mnt_vnode_next(struct vnode **mvp, struct mount *m=
p)
 	mtx_assert(MNT_MTX(mp), MA_OWNED);
=20
 	KASSERT((*mvp)->v_mount =3D=3D mp, ("marker vnode mount list mismatch"));
+	if ((*mvp)->v_yield++ =3D=3D 500) {
+		MNT_IUNLOCK(mp);
+		(*mvp)->v_yield =3D 0;
+		uio_yield();
+		MNT_ILOCK(mp);
+	}
 	vp =3D TAILQ_NEXT(*mvp, v_nmntvnodes);
 	while (vp !=3D NULL && vp->v_type =3D=3D VMARKER)
 		vp =3D TAILQ_NEXT(vp, v_nmntvnodes);
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index dc70417..6e3119b 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -131,6 +131,7 @@ struct vnode {
 		struct socket	*vu_socket;	/* v unix domain net (VSOCK) */
 		struct cdev	*vu_cdev; 	/* v device (VCHR, VBLK) */
 		struct fifoinfo	*vu_fifoinfo;	/* v fifo (VFIFO) */
+		int		vu_yield;	/*   yield count (VMARKER) */
 	} v_un;
=20
 	/*
@@ -185,6 +186,7 @@ struct vnode {
 #define	v_socket	v_un.vu_socket
 #define	v_rdev		v_un.vu_cdev
 #define	v_fifoinfo	v_un.vu_fifoinfo
+#define	v_yield		v_un.vu_yield
=20
 /* XXX: These are temporary to avoid a source sweep at this time */
 #define v_object	v_bufobj.bo_object

--0k4Rxg87Lb8yV0u3
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFHbXCNC3+MBN1Mb4gRAl8eAJ9DvGYrFBcvBUeaesQfI8K8NZa8CwCfabpZ
P1ojIQjyRhEbd8gCeutenLM=
=t5ni
-----END PGP SIGNATURE-----

--0k4Rxg87Lb8yV0u3--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071222201613.GX57756>