Message-Id: <200310300641.h9U6fWeF031328@gw.catspoiler.org>
Date: Wed, 29 Oct 2003 22:41:32 -0800 (PST)
From: Don Lewis
To: kmarx@vicor.com
In-Reply-To: <3FA06772.10409@vicor.com>
cc: freebsd-fs@FreeBSD.org
cc: gluk@ptci.ru
cc: julian@elischer.org
cc: mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem

On 29 Oct, Ken Marx wrote:
> Don Lewis wrote:
>> I think the real problem is the following code in ffs_dirpref():
>>
>> 	avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
>> 	avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
>> 	avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
>> [snip]
>> 	maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
>> 	minifree = avgifree - fs->fs_ipg / 4;
>> 	if (minifree < 0)
>> 		minifree = 0;
>> 	minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
>> 	if (minbfree < 0)
>> 		minbfree = 0;
>> [snip]
>> 	prefcg = ino_to_cg(fs, pip->i_number);
>> 	for (cg = prefcg; cg < fs->fs_ncg; cg++)
>> 		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>> 		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>> 		    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>> 			if (fs->fs_contigdirs[cg] < maxcontigdirs)
>> 				return ((ino_t)(fs->fs_ipg * cg));
>> 		}
>> 	for (cg = 0; cg < prefcg; cg++)
>> 		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>> 		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>> 		    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>> 			if (fs->fs_contigdirs[cg] < maxcontigdirs)
>> 				return ((ino_t)(fs->fs_ipg * cg));
>> 		}
>>
>> If the file system is more than 75% full, minbfree will be zero, which
>> will allow new directories to be created in cylinder groups that have no
>> free blocks for either the directory itself or for any files created in
>> that directory.  If this happens, allocating the blocks for the
>> directory and its files will require ffs_alloc() to do an expensive
>> search across the cylinder groups for each block.  It looks to me like
>> minbfree needs to be equal to, or at least a lot closer to, avgbfree.

Actually, I think the expensive search will only happen for the first
block in each file (and the other blocks will be allocated in the same
cylinder group), but if you are creating tons of files that are only one
block long ...

>> A similar situation exists with minifree.  Please note that the fallback
>> algorithm uses the condition:
>> 	fs->fs_cs(fs, cg).cs_nifree >= avgifree
>>
>
> Interesting. We (Vicor) will defer to experts here, but are very willing to
> test anything you come up with.

You might try the lightly tested patch below.  It tweaks the dirpref
algorithm so that cylinder groups with free space >= 75% of the average
free space and free inodes >= 75% of the average number of free inodes
are candidates for allocating the directory.  It will not choose a
cylinder group that does not have at least one free block and one free
inode.

It also decreases maxcontigdirs as the free space decreases so that a
cluster of directories is less likely to cause the cylinder group to
overflow.
I think it would be better to tune maxcontigdirs individually for each
cylinder group, based on the free space in that cylinder group, but that
is more complex ...

Index: sys/ufs/ffs/ffs_alloc.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.64.2.2
diff -u -r1.64.2.2 ffs_alloc.c
--- sys/ufs/ffs/ffs_alloc.c	21 Sep 2001 19:15:21 -0000	1.64.2.2
+++ sys/ufs/ffs/ffs_alloc.c	30 Oct 2003 06:01:38 -0000
@@ -696,18 +696,18 @@
 	 * optimal allocation of a directory inode.
 	 */
 	maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
-	minifree = avgifree - fs->fs_ipg / 4;
-	if (minifree < 0)
-		minifree = 0;
-	minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
-	if (minbfree < 0)
-		minbfree = 0;
+	minifree = avgifree - avgifree / 4;
+	if (minifree < 1)
+		minifree = 1;
+	minbfree = avgbfree - avgbfree / 4;
+	if (minbfree < 1)
+		minbfree = 1;
 	cgsize = fs->fs_fsize * fs->fs_fpg;
 	dirsize = fs->fs_avgfilesize * fs->fs_avgfpdir;
 	curdirsize = avgndir ? (cgsize - avgbfree * fs->fs_bsize) / avgndir : 0;
 	if (dirsize < curdirsize)
 		dirsize = curdirsize;
-	maxcontigdirs = min(cgsize / dirsize, 255);
+	maxcontigdirs = min((avgbfree * fs->fs_bsize) / dirsize, 255);
 	if (fs->fs_avgfpdir > 0)
 		maxcontigdirs = min(maxcontigdirs,
 		    fs->fs_ipg / fs->fs_avgfpdir);
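
To make the effect of the minifree/minbfree change easier to see, here is
a small userland sketch that computes both sets of thresholds side by
side.  It is not kernel code: struct cg_params, struct thresholds,
old_thresholds() and new_thresholds() are hypothetical names invented for
this illustration, and only the arithmetic is taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the few struct fs fields the formulas use. */
struct cg_params {
	long ipg;	/* inodes per cylinder group (fs_ipg) */
	long fpg;	/* fragments per cylinder group (fs_fpg) */
	long frag;	/* fragments per block (fs_frag) */
};

/* Hypothetical container for the two lower bounds. */
struct thresholds {
	long minifree;	/* minimum free inodes for a candidate group */
	long minbfree;	/* minimum free blocks for a candidate group */
};

/* Thresholds as computed before the patch: average minus 25% of capacity. */
struct thresholds
old_thresholds(const struct cg_params *p, long avgifree, long avgbfree)
{
	struct thresholds t;

	assert(p->frag > 0);	/* guard the division below */
	t.minifree = avgifree - p->ipg / 4;
	if (t.minifree < 0)
		t.minifree = 0;
	t.minbfree = avgbfree - p->fpg / p->frag / 4;
	if (t.minbfree < 0)
		t.minbfree = 0;
	return (t);
}

/* Thresholds as computed by the patch: 75% of the average, never below 1. */
struct thresholds
new_thresholds(long avgifree, long avgbfree)
{
	struct thresholds t;

	t.minifree = avgifree - avgifree / 4;
	if (t.minifree < 1)
		t.minifree = 1;
	t.minbfree = avgbfree - avgbfree / 4;
	if (t.minbfree < 1)
		t.minbfree = 1;
	return (t);
}
```

As a made-up example, take ipg = 20000, fpg = 80000 and frag = 8 (10000
blocks per group), with averages of 4000 free inodes and 2000 free blocks
per group, i.e. a file system that is 80% full.  The old formulas clamp
both thresholds to 0, so a completely full cylinder group still
qualifies; the patched formulas give minifree = 3000 and minbfree = 1500.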