From: Ken Marx
Date: Wed, 29 Oct 2003 17:20:50 -0800
To: Don Lewis
Cc: freebsd-fs@FreeBSD.org, julian@elischer.org, mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem
Message-ID: <3FA06772.10409@vicor.com>
In-Reply-To: <200310290859.h9T8xWeF028514@gw.catspoiler.org>

Don Lewis wrote:
> On 28 Oct, Ken Marx wrote:
>
>>Kirk McKusick wrote:
>
>>>It does look like the hash function is having some trouble.
>>>It has been completely revamped in 5.0, but is still using
>>>a "power-of-2" hashing scheme in 4.X. I highly recommend
>>>trying a scheme with non-power-of-2 base. Perhaps something
>>>as simple as changing the hashing to use modulo rather than
>>>logical & (e.g., in bufhash change from & bufhashmask to
>>>% bufhashmask).
>>>
>>>	Kirk McKusick
>>>
>
>>We have a sample 'fix' for the hashtable in vfs_bio.c
>>that uses all the blkno bits. It's in the diff link above.
>>Use as you see fit. However, it too doesn't really address
>>our symptoms significantly. Darn.
>>Bogging down to 1Mb/sec and > 90% system seen.
>
> A Fibonacci hash, like I implemented in the kern/kern_mtxpool.c 1.8,
> might be a good choice here, since it tends to distribute the keys
> fairly uniformly.  I think this is a secondary issue, though.
>
> I think the real problem is the following code in ffs_dirpref():
>
>         avgifree = fs->fs_cstotal.cs_nifree / fs->fs_ncg;
>         avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
>         avgndir = fs->fs_cstotal.cs_ndir / fs->fs_ncg;
> [snip]
>         maxndir = min(avgndir + fs->fs_ipg / 16, fs->fs_ipg);
>         minifree = avgifree - fs->fs_ipg / 4;
>         if (minifree < 0)
>                 minifree = 0;
>         minbfree = avgbfree - fs->fs_fpg / fs->fs_frag / 4;
>         if (minbfree < 0)
>                 minbfree = 0;
> [snip]
>         prefcg = ino_to_cg(fs, pip->i_number);
>         for (cg = prefcg; cg < fs->fs_ncg; cg++)
>                 if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>                     fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>                     fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>                         if (fs->fs_contigdirs[cg] < maxcontigdirs)
>                                 return ((ino_t)(fs->fs_ipg * cg));
>                 }
>         for (cg = 0; cg < prefcg; cg++)
>                 if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
>                     fs->fs_cs(fs, cg).cs_nifree >= minifree &&
>                     fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
>                         if (fs->fs_contigdirs[cg] < maxcontigdirs)
>                                 return ((ino_t)(fs->fs_ipg * cg));
>                 }
>
> If the file system is more than 75% full, minbfree will be zero, which
> will allow new directories to be created in cylinder groups that have no
> free blocks for either the directory itself, or for any files created in
> that directory.  If this happens, allocating the blocks for the
> directory and its files will require ffs_alloc() to do an expensive
> search across the cylinder groups for each block.
> It looks to me like
> minbfree needs to equal, or at least be a lot closer to, avgbfree.
>
> A similar situation exists with minifree.  Please note that the fallback
> algorithm uses the condition:
>         fs->fs_cs(fs, cg).cs_nifree >= avgifree
>

Interesting. We (Vicor) will defer to experts here, but are very
willing to test anything you come up with.

thanks,
k
-- 
Ken Marx, kmarx@vicor-nb.com
I insist that we do the right thing and be accountable for the
realistic goals.
- http://www.bigshed.com/cgi-bin/speak.cgi
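
To make the 75% figure in the quoted analysis concrete, here is a minimal
sketch of just the minbfree arithmetic from the ffs_dirpref() excerpt. It is
not kernel code: struct fs_sketch, minbfree_quoted() and percent_full() are
hypothetical names made up for illustration, and the struct carries only
stand-ins for the four fields the excerpt reads.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct fs fields used by the excerpt. */
struct fs_sketch {
	long ncg;	/* cylinder groups          (fs_ncg) */
	long fpg;	/* fragments per group      (fs_fpg) */
	long frag;	/* fragments per block      (fs_frag) */
	long nbfree;	/* total free blocks        (fs_cstotal.cs_nbfree) */
};

/*
 * Same arithmetic as the quoted excerpt: the threshold is the average
 * number of free blocks per group minus a quarter of the blocks in a
 * group, clamped at zero.
 */
long
minbfree_quoted(const struct fs_sketch *fs)
{
	long avgbfree, minbfree;

	assert(fs->ncg > 0 && fs->frag > 0);
	avgbfree = fs->nbfree / fs->ncg;
	minbfree = avgbfree - fs->fpg / fs->frag / 4;
	if (minbfree < 0)
		minbfree = 0;
	return (minbfree);
}

/* Blocks in use as a whole percentage of all blocks (illustration only). */
long
percent_full(const struct fs_sketch *fs)
{
	long total;

	assert(fs->ncg > 0 && fs->frag > 0);
	total = fs->ncg * (fs->fpg / fs->frag);
	assert(total > 0);
	return (100 * (total - fs->nbfree) / total);
}
```

Assuming 100 groups of 1000 blocks each (fpg = 8000, frag = 8), the threshold
is 250 blocks at 50% full and clamps to 0 once 75% or more of the blocks are
in use, so a cylinder group with no free blocks at all still passes the
cs_nbfree >= minbfree test.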