Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 3 Jul 2013 12:24:15 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Kirk McKusick <mckusick@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r252527 - head/sys/ufs/ffs
Message-ID:  <20130703113444.S1064@besplex.bde.org>
In-Reply-To: <201307022107.r62L786A087838@svn.freebsd.org>
References:  <201307022107.r62L786A087838@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 2 Jul 2013, Kirk McKusick wrote:

> Log:
>  Make better use of metadata area by avoiding using it for data blocks
>  that no should no longer immediately follow their indirect blocks.

I use the following changes for allocation at the start of cylinder group.
They seem to be related.

% diff -u2 ffs_alloc.c~ ffs_alloc.c
% --- ffs_alloc.c~	Sun Jun 20 12:53:30 2004
% +++ ffs_alloc.c	Mon Jan  2 16:52:39 2012
% @@ -1010,5 +1062,6 @@
%   * Select the desired position for the next block in a file.  The file is
%   * logically divided into sections. The first section is composed of the
% - * direct blocks. Each additional section contains fs_maxbpg blocks.
% + * direct blocks and the next fs_maxbpg blocks. Each additional section
% + * contains fs_maxbpg blocks.
%   *
%   * If no blocks have been allocated in the first section, the policy is to
% @@ -1022,12 +1075,11 @@
%   * continues until a cylinder group with greater than the average number
%   * of free blocks is found. If the allocation is for the first block in an
% - * indirect block, the information on the previous allocation is unavailable;
% + * indirect block or the previous block is a hole, then the information on
% + * the previous allocation is unavailable;
%   * here a best guess is made based upon the logical block number being
%   * allocated.
%   *
%   * If a section is already partially allocated, the policy is to
% - * contiguously allocate fs_maxcontig blocks. The end of one of these
% - * contiguous blocks and the beginning of the next is laid out
% - * contiguously if possible.
% + * allocate blocks contiguously within the section if possible.
%   */
%  ufs2_daddr_t

IIRC, the comments were out of date before this change.  Especially the
final paragraph -- fs_maxcontig is not used for anything.

% @@ -1039,12 +1091,18 @@
%  {
%  	struct fs *fs;
% -	int cg;
% -	int avgbfree, startcg;
% +	int avgbfree, cg, firstsection, newsection, startcg;

BTW, I don't like changing the type of these variables to u_int in
-current.  This asks for new sign extension bugs to break the warnings
about old ones.  Even 16-bit signed int is almost large enough for cg,
so it is far from necessary to change it from signed to unsigned to
double its range.

% 
%  	fs = ip->i_fs;
% -	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
% -		if (lbn < NDADDR + NINDIR(fs)) {
% +	if (lbn < NDADDR + fs->fs_maxbpg) {
% +		firstsection = 1;
% +		newsection = 0;
% +	} else {
% +		firstsection = 0;
% +		newsection = ((lbn - NDADDR) % fs->fs_maxbpg == 0);
% +	}
% +	if (indx == 0 || bap[indx - 1] == 0 || newsection) {
% +		if (firstsection) {
%  			cg = ino_to_cg(fs, ip->i_number);
% -			return (fs->fs_fpg * cg + fs->fs_frag);
% +			return (cgdmin(fs, cg));
%  		}
%  		/*
% @@ -1052,8 +1110,17 @@
%  		 * unused data blocks.
%  		 */
% -		if (indx == 0 || bap[indx - 1] == 0)
% -			startcg =
% -			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;
% -		else
% +		if (indx == 0 || bap[indx - 1] == 0) {
% +			cg = ino_to_cg(fs, ip->i_number) +
% +			    (lbn - NDADDR) / fs->fs_maxbpg;
% +			if (!newsection) {
% +				/*
% +				 * Actually, use our best guess if the
% +				 * section is not new.
% +				 */
% +				cg %= fs->fs_ncg;
% +				return (cgdmin(fs, cg));
% +			}
% +			startcg = cg;
% +		} else
%  			startcg = dtog(fs, bap[indx - 1]) + 1;
%  		startcg %= fs->fs_ncg;
% @@ -1062,16 +1129,13 @@
%  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
%  				fs->fs_cgrotor = cg;
% -				return (fs->fs_fpg * cg + fs->fs_frag);
% +				return (cgdmin(fs, cg));
%  			}
%  		for (cg = 0; cg <= startcg; cg++)
%  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
%  				fs->fs_cgrotor = cg;
% -				return (fs->fs_fpg * cg + fs->fs_frag);
% +				return (cgdmin(fs, cg));
%  			}
%  		return (0);
%  	}
% -	/*
% -	 * We just always try to lay things out contiguously.
% -	 */
%  	return (bap[indx - 1] + fs->fs_frag);
%  }
% @@ -1095,5 +1159,5 @@
%  		if (lbn < NDADDR + NINDIR(fs)) {
%  			cg = ino_to_cg(fs, ip->i_number);
% -			return (fs->fs_fpg * cg + fs->fs_frag);
% +			return (cgdmin(fs, cg));
%  		}
%  		/*
% @@ -1111,10 +1175,10 @@
%  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
%  				fs->fs_cgrotor = cg;
% -				return (fs->fs_fpg * cg + fs->fs_frag);
% +				return (cgdmin(fs, cg));
%  			}
%  		for (cg = 0; cg <= startcg; cg++)
%  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
%  				fs->fs_cgrotor = cg;
% -				return (fs->fs_fpg * cg + fs->fs_frag);
% +				return (cgdmin(fs, cg));
%  			}
%  		return (0);

I remembered the changes to use cgdmin() as just style fixes, but
looking at the macro expansions now, this is far from clear.  The
macros are deeply nested and depend on the ffs version.  The inline
expressions don't depend on the ffs version, and are also missing
addition of fs_dblkno.

I now remember that all the changes are related to correctly skipping
over the early blocks whose count is given by fs_dblkno.  The direct
expression gives a block number that is too small to be for a data
block.  Blocks early in the cg are already allocated, so the preference
for using them given by returning the too-small value is ignored and
the caller skips over the allocated blocks.  However, IIRC this results
in a tiny cluster of data blocks being allocated early in the cg,
probably because all the blocks before fs_dblkno are not allocated for
metadata.  This showed up in tests for contiguous allocation.  Large
files on new file systems should be allocated perfectly contiguously
on each cg after the first (accept for indirect blocks), but instead
they have a gap at the start of each cg.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130703113444.S1064>