Date:      Thu, 12 Sep 2013 19:36:04 +0000 (UTC)
From:      Kirk McKusick <mckusick@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-9@freebsd.org
Subject:   svn commit: r255494 - stable/9/sys/ufs/ffs
Message-ID:  <201309121936.r8CJa4Q6060099@svn.freebsd.org>

Author: mckusick
Date: Thu Sep 12 19:36:04 2013
New Revision: 255494
URL: http://svnweb.freebsd.org/changeset/base/255494

Log:
  MFC of 254995:
  
  A performance problem was reported in PR kern/181226:
  
      I have a 25TB Dell PERC 6 RAID5 array. When it becomes almost
      full (10-20GB free), processes which write data to it start
      eating 100% CPU and write speed drops below 1MB/sec (normally
      it gives 400MB/sec). The revision at which it first became
      apparent was http://svnweb.freebsd.org/changeset/base/249782.
  
  The offending change reserved an area in each cylinder group to
  store metadata. The new algorithm attempts to save this area for
  metadata and allows its use for non-metadata only after all the
  data areas have been exhausted. The size of the reserved area
  defaults to half of minfree, so the filesystem reports full before
  the data area can completely fill. However, in this report, the
  filesystem has had minfree reduced to 1% thus forcing the metadata
  area to be used for data. As the filesystem approached full, it
  had only metadata areas left to allocate. The result was that
  every block allocation had to scan summary data for 30,000 cylinder
  groups before falling back to searching up to 30,000 metadata areas.
  
  The fix is to give up on saving the metadata areas once the free
  space reserve drops below 2%. The effect of this change is to use
  the old algorithm of just accepting the first available block that
  we find. Since most filesystems use the default 5% minfree, this
  will have no effect on their operation. Those who want to push
  to the limit will get their crappy block placements quickly.
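
The 2% cutoff described above can be modeled with a small sketch. It
mirrors the shape of the `freespace(fs, 2) < 0` test added in the diff,
but the struct, its field names, and both helper functions are
hypothetical simplifications for illustration, not the kernel's actual
superblock layout or macro.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the superblock summary counts. */
struct fs_model {
	int64_t dsize;   /* total data fragments in the filesystem */
	int64_t nbfree;  /* free full blocks */
	int64_t nffree;  /* free fragments outside of free blocks */
	int64_t frag;    /* fragments per block */
};

/*
 * Free fragments remaining above a reserve of pct percent.
 * A negative result means the filesystem has dipped into the reserve.
 */
static int64_t
freespace_model(const struct fs_model *fs, int pct)
{
	return (fs->nbfree * fs->frag + fs->nffree - fs->dsize * pct / 100);
}

/*
 * Mirrors the new guard: attempt clustering only when cluster tracking
 * is enabled and at least 2% of the data area is still free.
 */
static int
should_cluster(const struct fs_model *fs, int contigsumsize)
{
	return (contigsumsize > 0 && freespace_model(fs, 2) >= 0);
}
```

Under this model, a filesystem running with the default 5% minfree
reports full well before the 2% test can fail, which is why the change
only affects filesystems whose minfree has been lowered.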
  
  Submitted by:  Dmitry Sivachenko
  Fix Tested by: Dmitry Sivachenko
  PR:            kern/181226
  
  MFC of 254996:
  
  While looking at block layouts as part of fixing filesystem block
  allocations under low free-space conditions (-r254995), it was
  determined that the old block-preference search order used before
  -r249782 worked a bit better. This change reverts to that
  block-preference search order.
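
The restored search order can be illustrated with a short sketch. It
follows the two loops visible in the ffs_dirpref() hunks below: scan
forward from the preferred cylinder group to the last one, then wrap
around and scan forward from group 0 up to the preferred group. The
helper name and the output array are hypothetical and exist only to
make the visiting order observable.

```c
#include <stddef.h>

/*
 * Hypothetical helper: records the order in which cylinder groups are
 * visited under the restored search order. The caller supplies an
 * array with room for ncg entries. Returns the number of groups visited.
 */
static size_t
dirpref_scan_order(int prefcg, int ncg, int *order)
{
	size_t n = 0;
	int cg;

	/* First pass: preferred group through the last group. */
	for (cg = prefcg; cg < ncg; cg++)
		order[n++] = cg;
	/* Second pass: wrap around and scan upward from group 0. */
	for (cg = 0; cg < prefcg; cg++)
		order[n++] = cg;
	return (n);
}
```

With five cylinder groups and a preferred group of 2, this visits
groups 2, 3, 4, 0, 1. The order being reverted would have visited
2, 3, 4, 1, 0.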

Modified:
  stable/9/sys/ufs/ffs/ffs_alloc.c
Directory Properties:
  stable/9/sys/   (props changed)

Modified: stable/9/sys/ufs/ffs/ffs_alloc.c
==============================================================================
--- stable/9/sys/ufs/ffs/ffs_alloc.c	Thu Sep 12 18:08:25 2013	(r255493)
+++ stable/9/sys/ufs/ffs/ffs_alloc.c	Thu Sep 12 19:36:04 2013	(r255494)
@@ -516,7 +516,13 @@ ffs_reallocblks_ufs1(ap)
 	ip = VTOI(vp);
 	fs = ip->i_fs;
 	ump = ip->i_ump;
-	if (fs->fs_contigsumsize <= 0)
+	/*
+	 * If we are not tracking block clusters or if we have less than 2%
+	 * free blocks left, then do not attempt to cluster. Running with
+	 * less than 5% free block reserve is not recommended and those that
+	 * choose to do so do not expect to have good file layout.
+	 */
+	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
 		return (ENOSPC);
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
@@ -736,7 +742,13 @@ ffs_reallocblks_ufs2(ap)
 	ip = VTOI(vp);
 	fs = ip->i_fs;
 	ump = ip->i_ump;
-	if (fs->fs_contigsumsize <= 0)
+	/*
+	 * If we are not tracking block clusters or if we have less than 2%
+	 * free blocks left, then do not attempt to cluster. Running with
+	 * less than 5% free block reserve is not recommended and those that
+	 * choose to do so do not expect to have good file layout.
+	 */
+	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
 		return (ENOSPC);
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
@@ -1173,7 +1185,7 @@ ffs_dirpref(pip)
 			if (fs->fs_contigdirs[cg] < maxcontigdirs)
 				return ((ino_t)(fs->fs_ipg * cg));
 		}
-	for (cg = prefcg - 1; cg >= 0; cg--)
+	for (cg = 0; cg < prefcg; cg++)
 		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
 		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
 	    	    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
@@ -1186,7 +1198,7 @@ ffs_dirpref(pip)
 	for (cg = prefcg; cg < fs->fs_ncg; cg++)
 		if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
 			return ((ino_t)(fs->fs_ipg * cg));
-	for (cg = prefcg - 1; cg >= 0; cg--)
+	for (cg = 0; cg < prefcg; cg++)
 		if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
 			break;
 	return ((ino_t)(fs->fs_ipg * cg));


