From owner-freebsd-fs@FreeBSD.ORG Tue Dec 18 03:47:17 2012
Date: Tue, 18 Dec 2012 14:47:00 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Kirk McKusick
Cc: Greg 'groggy' Lehey, freebsd-fs@FreeBSD.org, Dieter BSD
Subject: Re: FFS - Still using rotational delay with modern disks?
In-Reply-To: <201212180012.qBI0CW0c098207@chez.mckusick.com>
References: <201212180012.qBI0CW0c098207@chez.mckusick.com>
Message-ID: <20121218142218.A1023@besplex.bde.org>
List-Id: Filesystems

On Mon, 17 Dec 2012, Kirk McKusick wrote:

>> Date: Tue, 18 Dec 2012 02:56:49 +0300
>> Subject: Re: FFS - Still using rotational delay with modern disks?
>> From: Sergey Kandaurov
>> To: "Greg 'groggy' Lehey"
>> Cc: freebsd-fs@freebsd.org, freebsd-performance@freebsd.org,
>>     Dieter BSD
>>
>> On 18 December 2012 03:23, Greg 'groggy' Lehey wrote:
>>> On Monday, 17 December 2012 at 16:44:11 -0500, Dieter BSD wrote:
>>>> The newfs man page says:
>>>>
>>>> -a maxcontig
>>>>         Specify the maximum number of contiguous blocks that will be
>>>>         laid out before forcing a rotational delay.  The default value
>>>>         is 16.  See tunefs(8) for more details on how to set this
>>>>         option.
>>>>
>>>> Is this still a good idea with modern drives where the number of
>>>> sectors per track varies, and no one but the manufacturer knows how
>>>> many sectors a particular track has?
>>>
>>> No.
>>>
>>> It looks as if this, and also a number of comments in sys/ufs/ffs/fs.h
>>> and sys/ufs/ffs/ffs_alloc.c, are leftovers from the Olden Days.  The
>>> value isn't used anywhere that I can see.  Unless somebody can show
>>> that I'm wrong, I'd suggest that this is a documentation issue that I
>>> can take a look at.

My version (only) fixes the comments about it in ffs_alloc.c.

>> [performance@ list trimmed]
>>
>> I'm not sure about this.
>> In UFS, fs_maxcontig controls fs_contigsumsize during newfs; both are
>> saved in the superblock, and fs_contigsumsize is widely used throughout
>> FFS.
>> ...

Yes, fs_maxcontig is used a lot in newfs for initializing
fs_contigsumsize, which is used a lot in ffs.

> Maxcontig is still in the filesystem.  However, it is generally
> overridden (i.e., ignored) when clustering and dynamic block
> reallocation are in effect, which they are by default and have been
> since some time early in the 1990's.

fs_maxcontig is always ignored except in newfs (and in fsck_ffs,
presumably for reconstructing fs_contigsumsize), and in dumpfs and
growfs for printing it.

Here is my fix for ffs_alloc.c.  It removes the only reference to it in
the kernel outside of ffs/fs.h.
FS_MAXCONTIG there is also only used in newfs and fsck_ffs, and the
larger comment attached to it is even more wrong, because it gives more
details about things that aren't done any more.

@ Index: ffs_alloc.c
@ ===================================================================
@ RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_alloc.c,v
@ retrieving revision 1.121
@ diff -u -2 -r1.121 ffs_alloc.c
@ --- ffs_alloc.c	16 Jun 2004 09:47:25 -0000	1.121
@ +++ ffs_alloc.c	2 Jan 2012 05:52:39 -0000
@ @@ -1010,5 +1062,6 @@
@   * Select the desired position for the next block in a file. The file is
@   * logically divided into sections. The first section is composed of the
@ - * direct blocks. Each additional section contains fs_maxbpg blocks.
@ + * direct blocks and the next fs_maxbpg blocks. Each additional section
@ + * contains fs_maxbpg blocks.
@   *
@   * If no blocks have been allocated in the first section, the policy is to
@ @@ -1022,12 +1075,11 @@
@   * continues until a cylinder group with greater than the average number
@   * of free blocks is found. If the allocation is for the first block in an
@ - * indirect block, the information on the previous allocation is unavailable;
@ + * indirect block or the previous block is a hole, then the information on
@ + * the previous allocation is unavailable;
@   * here a best guess is made based upon the logical block number being
@   * allocated.
@   *
@   * If a section is already partially allocated, the policy is to
@ - * contiguously allocate fs_maxcontig blocks. The end of one of these
@ - * contiguous blocks and the beginning of the next is laid out
@ - * contiguously if possible.
@ + * allocate blocks contiguously within the section if possible.
@   */
@  ufs2_daddr_t

Some of the above might only apply with the following unrelated changes
to the code.  These changes are:

- something about getting more contiguity across cylinder groups.  IIRC,
  the old version unnecessarily skips some blocks when starting a new cg.
  This gives an unnecessarily large seek from the previous cg, and later
  more discontiguity than necessary when the hole is filled in.  Not
  very important.  To test this, write a very large file on a new file
  system and check that all of its blocks are as contiguous as possible.
  This extends my previous changes, which give contiguity across the
  first indirect block.  Maximal contiguity across cg's is relatively
  unimportant since the cg rarely changes, but it is easier to ensure
  than maximal contiguity across further indirect blocks.

- use the cgdmin() macro and not its expansion.

@ @@ -1039,12 +1091,18 @@
@  {
@  	struct fs *fs;
@ -	int cg;
@ -	int avgbfree, startcg;
@ +	int avgbfree, cg, firstsection, newsection, startcg;
@  
@  	fs = ip->i_fs;
@ -	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
@ -		if (lbn < NDADDR + NINDIR(fs)) {
@ +	if (lbn < NDADDR + fs->fs_maxbpg) {
@ +		firstsection = 1;
@ +		newsection = 0;
@ +	} else {
@ +		firstsection = 0;
@ +		newsection = ((lbn - NDADDR) % fs->fs_maxbpg == 0);
@ +	}
@ +	if (indx == 0 || bap[indx - 1] == 0 || newsection) {
@ +		if (firstsection) {
@  			cg = ino_to_cg(fs, ip->i_number);
@ -			return (fs->fs_fpg * cg + fs->fs_frag);
@ +			return (cgdmin(fs, cg));
@  		}
@  		/*
@ @@ -1052,8 +1110,17 @@
@  		 * unused data blocks.
@  		 */
@ -		if (indx == 0 || bap[indx - 1] == 0)
@ -			startcg =
@ -			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;
@ -		else
@ +		if (indx == 0 || bap[indx - 1] == 0) {
@ +			cg = ino_to_cg(fs, ip->i_number) +
@ +			    (lbn - NDADDR) / fs->fs_maxbpg;
@ +			if (!newsection) {
@ +				/*
@ +				 * Actually, use our best guess if the
@ +				 * section is not new.
@ +				 */
@ +				cg %= fs->fs_ncg;
@ +				return (cgdmin(fs, cg));
@ +			}
@ +			startcg = cg;
@ +		} else
@  			startcg = dtog(fs, bap[indx - 1]) + 1;
@  		startcg %= fs->fs_ncg;
@ @@ -1062,16 +1129,13 @@
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		for (cg = 0; cg <= startcg; cg++)
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		return (0);
@  	}
@ -	/*
@ -	 * We just always try to lay things out contiguously.
@ -	 */
@  	return (bap[indx - 1] + fs->fs_frag);
@  }
@ @@ -1095,5 +1159,5 @@
@  	if (lbn < NDADDR + NINDIR(fs)) {
@  		cg = ino_to_cg(fs, ip->i_number);
@ -		return (fs->fs_fpg * cg + fs->fs_frag);
@ +		return (cgdmin(fs, cg));
@  	}
@  	/*
@ @@ -1111,10 +1175,10 @@
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		for (cg = 0; cg <= startcg; cg++)
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		return (0);

Bruce