Date:      Tue, 1 Jan 2013 17:58:06 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Dominic Fandrey <kamikaze@bsdforen.de>
Cc:        FreeBSD <freebsd-stable@freebsd.org>, Chris Rees <utisoft@gmail.com>
Subject:   Re: Post 9.1 stable file system problems
Message-ID:  <20130101155806.GU82219@kib.kiev.ua>
In-Reply-To: <50E2E720.3040803@bsdforen.de>
References:  <50E225DF.3090004@bsdforen.de> <CADLo838mUdr96zQw2bTPUFWwUNoF=Zb4akEL6FfasQDOW5tN8A@mail.gmail.com> <50E23283.8010407@bsdforen.de> <50E23647.6000309@bsdforen.de> <20130101065145.GT82219@kib.kiev.ua> <50E2E720.3040803@bsdforen.de>

On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> On 01/01/2013 07:51, Konstantin Belousov wrote:
> > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> >>> On 01/01/2013 01:29, Chris Rees wrote:
> >>>> On 1 Jan 2013 00:01, "Dominic Fandrey" <kamikaze@bsdforen.de> wrote:
> >>>>>
> >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>>>> Following the update build times for packages have increased by a
> >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> >>>>> 5 minutes and now take an hour.
> >>>>>
> >>>>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>>>> time were counted that moment). The majority of the time most of the CPU
> >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>>>> tools that access a lot of files.
> >>>>>
> >>>>> The file system on which packages are built is nullfs mounted from
> >>>>> an async mounted UFS. I turned async off, to no avail.
> >>>>>
> >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>>>> think this is where the problem originates. I might hack the tinderbox to
> >>>>> use 'ln -s' or set it up for NFS to verify this.
> >>>>
> >>>> Is your kernel newer than the Jail?  The converse causes problems.
> >>>
> >>> I ran makeJail for all jails after updating.
Did you rebuild your modules together with the new kernel?

> >>>
> >>> I also seem to have similar problems when building in the host-system.
> >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> >>> On my notebook it takes less than 10 seconds.
> >>
> >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> >> also pointed to a nullfs mount.
> >>
> >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> >> nullfs involvement). The entire make extract took 20s.
> >>
> >> So still faster by at least factor 30 than running it on a nullfs mount
> >> (I eventually SIGINTed so I don't know how long it would've run).
> >
> > Start providing some useful debugging information ?
>
> That one might be interesting. It's all system time:
>
> # time -lh make extract
> ===>  License GPLv2 accepted by the user
> ===>  Found saved configuration for openjdk-7.9.05_1
> ===>  Extracting for openjdk-7.9.05_2
> => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> ===>   openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> ^Ctime: command terminated abnormally
>         4m29.30s real           3.03s user              4m22.55s sys
>       5008  maximum resident set size
>        135  average shared memory size
>       2932  average unshared data size
>        127  average unshared stack size
>       7772  page reclaims
>          0  page faults
>          0  swaps
>         19  block input operations
>        101  block output operations
>          0  messages sent
>          0  messages received
>         41  signals received
>       1597  voluntary context switches
>      16590  involuntary context switches

OK, going by your mount -v output: are those three nullfs mounts the only
nullfs mounts ever used?

Is it only unzip that demonstrates the silly behaviour, or does it happen
with any program? E.g., are ls(1) or sha1 also slow on the nullfs mount?

Could you try some low-tech profiling on the slow program? For instance,
you could run ktrace/kdump -R to see which syscalls are slow.
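
As one possible low-tech check alongside ktrace, a small C program can time
a metadata-heavy loop directly, once on the nullfs mount and once on the
underlying UFS directory. The following is only a sketch: time_lstat_dir()
is a hypothetical helper, not an existing tool, and it covers nothing more
than lstat(2) on a single directory level.

```c
#define _POSIX_C_SOURCE 200809L	/* for lstat(2) and clock_gettime(2) */

#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: lstat(2) every entry of `dir` (not recursive).
 * Returns the elapsed wall-clock time in seconds, or -1.0 if the
 * directory cannot be opened.  The number of entries that were
 * successfully stat'ed is stored in *count.
 */
double
time_lstat_dir(const char *dir, long *count)
{
	char path[4096];
	struct timespec t0, t1;
	struct dirent *de;
	struct stat sb;
	DIR *d;
	int n;

	assert(dir != NULL && count != NULL);
	*count = 0;
	d = opendir(dir);
	if (d == NULL)
		return (-1.0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((de = readdir(d)) != NULL) {
		n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* skip names that do not fit */
		if (lstat(path, &sb) == 0)
			(*count)++;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	closedir(d);

	return ((double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9);
}
```

Calling it once with a directory below the nullfs mount point and once with
the same directory on the lower file system, and comparing the time per
entry, should show whether plain lookups are already slow or whether the
problem only appears with heavier operations.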

The most puzzling part of your report, for me, is that I also use
nullfs-backed jails on both HEAD and stable/9, at a larger scale, and I do
not see the issue. I just ran
pooma32% time unzip -q /usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
unzip -q   3.25s user 23.77s system 78% cpu 34.482 total
over a nullfs mount of
/usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).

Please try the following patch, which changes nullfs behaviour to be
non-cached by default. You can turn caching back on by mounting with
'mount -t nullfs -o cache from to'. I am interested in whether using or
omitting -o cache makes a difference for you.

diff --git a/sbin/mount_nullfs/mount_nullfs.c b/sbin/mount_nullfs/mount_nullfs.c
index c88db3d..aaf66e5 100644
--- a/sbin/mount_nullfs/mount_nullfs.c
+++ b/sbin/mount_nullfs/mount_nullfs.c
@@ -57,27 +57,35 @@ static const char rcsid[] =
 
 #include "mntopts.h"
 
-static struct mntopt mopts[] = {
-	MOPT_STDOPTS,
-	MOPT_END
-};
-
 int	subdir(const char *, const char *);
 static void	usage(void) __dead2;
 
 int
 main(int argc, char *argv[])
 {
-	struct iovec iov[6];
-	int ch, mntflags;
+	struct iovec *iov;
+	char *p, *val;
 	char source[MAXPATHLEN];
 	char target[MAXPATHLEN];
+	char errmsg[255];
+	int ch, mntflags, iovlen;
+	char nullfs[] = "nullfs";
 
+	iov = NULL;
+	iovlen = 0;
 	mntflags = 0;
+	errmsg[0] = '\0';
 	while ((ch = getopt(argc, argv, "o:")) != -1)
 		switch(ch) {
 		case 'o':
-			getmntopts(optarg, mopts, &mntflags, 0);
+			val = strdup("");
+			p = strchr(optarg, '=');
+			if (p != NULL) {
+				free(val);
+				*p = '\0';
+				val = p + 1;
+			}
+			build_iovec(&iov, &iovlen, optarg, val, (size_t)-1);
 			break;
 		case '?':
 		default:
@@ -99,21 +107,16 @@ main(int argc, char *argv[])
 		errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
 		    argv[0], target, argv[1]);
 
-	iov[0].iov_base = strdup("fstype");
-	iov[0].iov_len = sizeof("fstype");
-	iov[1].iov_base = strdup("nullfs");
-	iov[1].iov_len = strlen(iov[1].iov_base) + 1;
-	iov[2].iov_base = strdup("fspath");
-	iov[2].iov_len = sizeof("fspath");
-	iov[3].iov_base = source;
-	iov[3].iov_len = strlen(source) + 1;
-	iov[4].iov_base = strdup("target");
-	iov[4].iov_len = sizeof("target");
-	iov[5].iov_base = target;
-	iov[5].iov_len = strlen(target) + 1;
-
-	if (nmount(iov, 6, mntflags))
-		err(1, NULL);
+	build_iovec(&iov, &iovlen, "fstype", nullfs, (size_t)-1);
+	build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
+	build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
+	build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg));
+	if (nmount(iov, iovlen, mntflags) < 0) {
+		if (errmsg[0] != 0)
+			err(1, "%s: %s", source, errmsg);
+		else
+			err(1, "%s", source);
+	}
 	exit(0);
 }
 
diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 0878e55..4f37020 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -34,9 +34,15 @@
  * $FreeBSD$
  */
 
+#ifndef	FS_NULL_H
+#define	FS_NULL_H
+
+#define	NULLM_CACHE	0x0001
+
 struct null_mount {
 	struct mount	*nullm_vfs;
 	struct vnode	*nullm_rootvp;	/* Reference to root null_node */
+	uint64_t	nullm_flags;
 };
 
 #ifdef _KERNEL
@@ -80,3 +86,5 @@ MALLOC_DECLARE(M_NULLFSNODE);
 #endif /* NULLFS_DEBUG */
 
 #endif /* _KERNEL */
+
+#endif
diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
index b2c7a75..f82d738 100644
--- a/sys/fs/nullfs/null_subr.c
+++ b/sys/fs/nullfs/null_subr.c
@@ -224,6 +224,9 @@ null_nodeget(mp, lowervp, vpp)
 	 * provide ready to use vnode.
 	 */
 	if (VOP_ISLOCKED(lowervp) != LK_EXCLUSIVE) {
+		KASSERT((MOUNTTONULLMOUNT(mp)->nullm_flags & NULLM_CACHE) == 0,
+		    ("lowervp %p is not excl locked and cache is disabled",
+		    lowervp));
 		vn_lock(lowervp, LK_UPGRADE | LK_RETRY);
 		if ((lowervp->v_iflag & VI_DOOMED) != 0) {
 			vput(lowervp);
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 7d84d51..8a5f1b9 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -67,6 +67,13 @@ static vfs_vget_t	nullfs_vget;
 static vfs_extattrctl_t	nullfs_extattrctl;
 static vfs_reclaim_lowervp_t nullfs_reclaim_lowervp;
 
+/* Mount options that we support. */
+static const char *nullfs_opts[] = {
+	"target",
+	"cache",
+	NULL
+};
+
 /*
  * Mount null layer
  */
@@ -86,9 +93,11 @@ nullfs_mount(struct mount *mp)
 
 	if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_NULLFS))
 		return (EPERM);
-
 	if (mp->mnt_flag & MNT_ROOTFS)
 		return (EOPNOTSUPP);
+	if (vfs_filteropt(mp->mnt_optnew, nullfs_opts))
+		return (EINVAL);
+
 	/*
 	 * Update is a no-op
 	 */
@@ -149,7 +158,7 @@ nullfs_mount(struct mount *mp)
 	}
 
 	xmp = (struct null_mount *) malloc(sizeof(struct null_mount),
-	    M_NULLFSMNT, M_WAITOK);
+	    M_NULLFSMNT, M_WAITOK | M_ZERO);
 
 	/*
 	 * Save reference to underlying FS
@@ -187,16 +196,25 @@ nullfs_mount(struct mount *mp)
 		mp->mnt_flag |= MNT_LOCAL;
 		MNT_IUNLOCK(mp);
 	}
+
+	vfs_flagopt(mp->mnt_optnew, "cache", &xmp->nullm_flags, NULLM_CACHE);
+
 	MNT_ILOCK(mp);
-	mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
-	    (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED);
+	if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+		mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
+		    (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
+		    MNTK_EXTENDED_SHARED);
+	}
 	mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
 	MNT_IUNLOCK(mp);
 	mp->mnt_data = xmp;
 	vfs_getnewfsid(mp);
-	MNT_ILOCK(xmp->nullm_vfs);
-	TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp, mnt_upper_link);
-	MNT_IUNLOCK(xmp->nullm_vfs);
+	if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+		MNT_ILOCK(xmp->nullm_vfs);
+		TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp,
+		    mnt_upper_link);
+		MNT_IUNLOCK(xmp->nullm_vfs);
+	}
 
 	vfs_mountedfrom(mp, target);
 
@@ -234,13 +252,15 @@ nullfs_unmount(mp, mntflags)
 	 */
 	mntdata = mp->mnt_data;
 	ump = mntdata->nullm_vfs;
-	MNT_ILOCK(ump);
-	while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
-		ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
-		msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+	if ((mntdata->nullm_flags & NULLM_CACHE) != 0) {
+		MNT_ILOCK(ump);
+		while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
+			ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
+			msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+		}
+		TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
+		MNT_IUNLOCK(ump);
 	}
-	TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
-	MNT_IUNLOCK(ump);
 	mp->mnt_data = NULL;
 	free(mntdata, M_NULLFSMNT);
 	return (0);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index f530ed2..cc35d81 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -692,7 +692,22 @@ null_unlock(struct vop_unlock_args *ap)
 static int
 null_inactive(struct vop_inactive_args *ap __unused)
 {
+	struct vnode *vp;
+	struct mount *mp;
+	struct null_mount *xmp;
 
+	vp = ap->a_vp;
+	mp = vp->v_mount;
+	xmp = MOUNTTONULLMOUNT(mp);
+	if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
+		/*
+		 * If this is the last reference and caching of the
+		 * nullfs vnodes is not enabled, then free up the
+		 * vnode so as not to tie up the lower vnodes.
+		 */
+		vp->v_object = NULL;
+		vrecycle(vp);
+	}
 	return (0);
 }
 
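
For anyone reading the mount_nullfs.c part of the patch above: the 'o' case
splits each option at the first '=' and passes the name and the value to
build_iovec(). A minimal standalone sketch of just that split is shown
here; split_mount_opt() is a hypothetical name used for illustration only
and does not appear in the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the -o handling in the patch: split
 * "name=value" in place at the first '='.  Returns a pointer to the
 * value, or to an empty string when the option has no '=' (for
 * example the boolean option "cache").  `opt` is modified so that it
 * holds only the option name afterwards.
 */
const char *
split_mount_opt(char *opt)
{
	char *p;

	assert(opt != NULL);
	p = strchr(opt, '=');
	if (p == NULL)
		return ("");
	*p = '\0';
	return (p + 1);
}
```

With this split, '-o cache' produces the name "cache" with an empty value,
so the kernel side only has to test whether the option is present.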



