Date: Sat, 31 May 1997 06:16:07 +1000
From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, dfr@nlsystems.com
Cc: current@freebsd.org
Subject: Re: disk cache challenged by small block sizes
Message-ID: <199705302016.GAA19539@godzilla.zeta.org.au>
>> It seems to fix all the speed problems (except ufs is still slower with
>> a larger fs blocksize) and the block leak in ext2fs.
>
>If you roll vfs_bio.c back to rev 1.115, does it affect the speed of ufs
>with 8k blocksize?  I am not sure whether my changes to vfs_bio would
>affect that.

That wasn't it.  The slowdown was caused by ffs deciding to allocate all
the blocks, starting with the first indirect block, on a slower part of
the disk.  It attempts to pessimize all cases, but is confused by fuzzy
rounding :-).  Details:

1.  The file system has size 96MB (exactly).
2.  The defaults for a block size of 4K give 10 cylinder groups (cg's),
    with 9 of size 10MB and one smaller one (slightly less than 6MB
    because of special blocks before the first cg).  The average size is
    about 9.6MB.
3.  The defaults for a block size of 8K give 3 cg's, with 2 of size 32MB
    and one slightly smaller one.  The average size is about 32MB.
4.  I ran iozone on a new file system, so there was just one directory
    and one file.
5.  The inode for the file was allocated in cg #0 in both cases.
6.  The direct blocks were allocated in the same cg as the inode in both
    cases.
7.  The first indirect block and subsequent data blocks are allocated in
    a cg with >= the average number of free blocks.  (The comments before
    ffs_blkpref() about using a rotor are wrong.  fs->fs_cgrotor is never
    used.)
8.  In case (2), cg #0 is chosen because it has almost 10MB-metadata free
    and the average is about 9.6MB-metadata.
9.  In case (3), cg #1 is chosen since cg #0 has significantly less than
    32MB-metadata free and the average is about 32MB-metadata.
10. In case (3), cg #1 starts a full 1/3 of the way towards the slowest
    parts of the disk, and the speed is significantly slower there.

I think the combination of algorithms behind (6) and (7) is often wrong.
It's silly to put the direct blocks in a different cg than the indirect
blocks immediately following them.  The silliest case is for a new file
system with all cg's of the same size.
Then exact calculation of the average number of free blocks would result
in the indirect blocks always starting in cg #1, despite cg #0 being
almost empty when the first indirect block is allocated.

I added a bias towards using the same cg as the inode for the first
indirect block.  This is probably too strong.

Bruce

diff -c2 ffs_alloc.c~ ffs_alloc.c
*** ffs_alloc.c~	Mon Mar 24 14:21:27 1997
--- ffs_alloc.c	Sat May 31 03:08:56 1997
***************
*** 689,692 ****
--- 686,700 ----
  		startcg %= fs->fs_ncg;
  	avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
+ 	/*
+ 	 * Prefer the same cg as the inode if this allocation
+ 	 * is for the first block in an indirect block.
+ 	 */
+ 	if (lbn == NDADDR) {
+ 		cg = ino_to_cg(fs, ip->i_number);
+ 		if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree / 2) {
+ 			fs->fs_cgrotor = cg;
+ 			return (fs->fs_fpg * cg + fs->fs_frag);
+ 		}
+ 	}
  	for (cg = startcg; cg < fs->fs_ncg; cg++)
  		if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
***************
*** 694,698 ****
  			return (fs->fs_fpg * cg + fs->fs_frag);
  		}
! 	for (cg = 0; cg <= startcg; cg++)
  		if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
  			fs->fs_cgrotor = cg;
--- 702,706 ----
  			return (fs->fs_fpg * cg + fs->fs_frag);
  		}
! 	for (cg = 0; cg < startcg; cg++)
  		if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
  			fs->fs_cgrotor = cg;