From owner-freebsd-fs@FreeBSD.ORG Tue Dec 18 03:47:17 2012
Date: Tue, 18 Dec 2012 14:47:00 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Kirk McKusick
Cc: Greg 'groggy' Lehey, freebsd-fs@FreeBSD.org, Dieter BSD
Subject: Re: FFS - Still using rotational delay with modern disks?
In-Reply-To: <201212180012.qBI0CW0c098207@chez.mckusick.com>
References: <201212180012.qBI0CW0c098207@chez.mckusick.com>
Message-ID: <20121218142218.A1023@besplex.bde.org>
List-Id: Filesystems

On Mon, 17 Dec 2012, Kirk McKusick wrote:

>> Date: Tue, 18 Dec 2012 02:56:49 +0300
>> Subject: Re: FFS - Still using rotational delay with modern disks?
>> From: Sergey Kandaurov
>> To: "Greg 'groggy' Lehey"
>> Cc: freebsd-fs@freebsd.org, freebsd-performance@freebsd.org,
>>     Dieter BSD
>>
>> On 18 December 2012 03:23, Greg 'groggy' Lehey wrote:
>>> On Monday, 17 December 2012 at 16:44:11 -0500, Dieter BSD wrote:
>>>> The newfs man page says:
>>>>
>>>> -a maxcontig
>>>>         Specify the maximum number of contiguous blocks that will be
>>>>         laid out before forcing a rotational delay.  The default value
>>>>         is 16.  See tunefs(8) for more details on how to set this
>>>>         option.
>>>>
>>>> Is this still a good idea with modern drives where the number of
>>>> sectors per track varies, and no one but the manufacturer knows how
>>>> many sectors a particular track has?
>>>
>>> No.
>>>
>>> It looks as if this, and also a number of comments in sys/ufs/ffs/fs.h
>>> and sys/ufs/ffs/ffs_alloc.c, are leftovers from the Olden Days.  The
>>> value isn't used anywhere that I can see.  Unless somebody can show
>>> that I'm wrong, I'd suggest that this is a documentation issue that I
>>> can take a look at.

My version (only) fixes the comments about it in ffs_alloc.c.

>> [performance@ list trimmed]
>>
>> I'm not sure about this.
>> In UFS, fs_maxcontig controls fs_contigsumsize during newfs; both are
>> saved in the superblock, and fs_contigsumsize is widely used throughout
>> FFS.
>> ...

Yes, fs_maxcontig is used a lot in newfs for initializing
fs_contigsumsize, which is used a lot in ffs.

> Maxcontig is still in the filesystem.  However, it is generally
> overridden (i.e., ignored) when clustering and dynamic block
> reallocation are in effect, which they are by default and have been
> since some time early in the 1990's.

fs_maxcontig is always ignored except in newfs (and in fsck_ffs,
presumably for reconstructing fs_contigsumsize), and in dumpfs and
growfs for printing it.

Here is my fix for ffs_alloc.c.  It removes the only reference to it in
the kernel outside of ffs/fs.h.
FS_MAXCONTIG there is also only used in newfs and fsck_ffs, and the
larger comment attached to it is even more wrong, because it gives more
details about things that aren't done any more.

@ Index: ffs_alloc.c
@ ===================================================================
@ RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_alloc.c,v
@ retrieving revision 1.121
@ diff -u -2 -r1.121 ffs_alloc.c
@ --- ffs_alloc.c	16 Jun 2004 09:47:25 -0000	1.121
@ +++ ffs_alloc.c	2 Jan 2012 05:52:39 -0000
@ @@ -1010,5 +1062,6 @@
@   * Select the desired position for the next block in a file. The file is
@   * logically divided into sections. The first section is composed of the
@ - * direct blocks. Each additional section contains fs_maxbpg blocks.
@ + * direct blocks and the next fs_maxbpg blocks. Each additional section
@ + * contains fs_maxbpg blocks.
@   *
@   * If no blocks have been allocated in the first section, the policy is to
@ @@ -1022,12 +1075,11 @@
@   * continues until a cylinder group with greater than the average number
@   * of free blocks is found. If the allocation is for the first block in an
@ - * indirect block, the information on the previous allocation is unavailable;
@ + * indirect block or the previous block is a hole, then the information on
@ + * the previous allocation is unavailable;
@   * here a best guess is made based upon the logical block number being
@   * allocated.
@   *
@   * If a section is already partially allocated, the policy is to
@ - * contiguously allocate fs_maxcontig blocks. The end of one of these
@ - * contiguous blocks and the beginning of the next is laid out
@ - * contiguously if possible.
@ + * allocate blocks contiguously within the section if possible.
@   */
@  ufs2_daddr_t

Some of the above might only apply with the following unrelated changes
to the code.  These changes are:

- something about getting more contiguity across cylinder groups.  IIRC,
  the old version unnecessarily skips some blocks when starting a new cg.
  This gives an unnecessarily large seek from the previous cg, and later
  more discontiguity than necessary when the hole is filled in.  Not
  very important.  To test this, write a very large file on a new file
  system and check that all of its blocks are as contiguous as possible.
  This extends my previous changes, which give contiguity across the
  first indirect block.  Maximal contiguity across cg's is relatively
  unimportant since the cg rarely changes, but it is easier to ensure
  than maximal contiguity across further indirect blocks.

- use the cgdmin() macro and not its expansion.

@ @@ -1039,12 +1091,18 @@
@  {
@  	struct fs *fs;
@ -	int cg;
@ -	int avgbfree, startcg;
@ +	int avgbfree, cg, firstsection, newsection, startcg;
@  
@  	fs = ip->i_fs;
@ -	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
@ -		if (lbn < NDADDR + NINDIR(fs)) {
@ +	if (lbn < NDADDR + fs->fs_maxbpg) {
@ +		firstsection = 1;
@ +		newsection = 0;
@ +	} else {
@ +		firstsection = 0;
@ +		newsection = ((lbn - NDADDR) % fs->fs_maxbpg == 0);
@ +	}
@ +	if (indx == 0 || bap[indx - 1] == 0 || newsection) {
@ +		if (firstsection) {
@  			cg = ino_to_cg(fs, ip->i_number);
@ -			return (fs->fs_fpg * cg + fs->fs_frag);
@ +			return (cgdmin(fs, cg));
@  		}
@  		/*
@ @@ -1052,8 +1110,17 @@
@  		 * unused data blocks.
@  		 */
@ -		if (indx == 0 || bap[indx - 1] == 0)
@ -			startcg =
@ -			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;
@ -		else
@ +		if (indx == 0 || bap[indx - 1] == 0) {
@ +			cg = ino_to_cg(fs, ip->i_number) +
@ +			    (lbn - NDADDR) / fs->fs_maxbpg;
@ +			if (!newsection) {
@ +				/*
@ +				 * Actually, use our best guess if the
@ +				 * section is not new.
@ +				 */
@ +				cg %= fs->fs_ncg;
@ +				return (cgdmin(fs, cg));
@ +			}
@ +			startcg = cg;
@ +		} else
@  			startcg = dtog(fs, bap[indx - 1]) + 1;
@  		startcg %= fs->fs_ncg;
@ @@ -1062,16 +1129,13 @@
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		for (cg = 0; cg <= startcg; cg++)
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		return (0);
@  	}
@ -	/*
@ -	 * We just always try to lay things out contiguously.
@ -	 */
@  	return (bap[indx - 1] + fs->fs_frag);
@  }
@ @@ -1095,5 +1159,5 @@
@  	if (lbn < NDADDR + NINDIR(fs)) {
@  		cg = ino_to_cg(fs, ip->i_number);
@ -		return (fs->fs_fpg * cg + fs->fs_frag);
@ +		return (cgdmin(fs, cg));
@  	}
@  	/*
@ @@ -1111,10 +1175,10 @@
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		for (cg = 0; cg <= startcg; cg++)
@  			if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
@  				fs->fs_cgrotor = cg;
@ -				return (fs->fs_fpg * cg + fs->fs_frag);
@ +				return (cgdmin(fs, cg));
@  			}
@  		return (0);

Bruce