From: George Sanders <gosand1982@yahoo.com>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Date: Tue, 28 Jun 2011 16:14:00 -0700 (PDT)
Subject: Re: Improving old-fashioned UFS2 performance with lots of inodes...

Hello Jeremy,

> > with over 100 million inodes on the filesystem, things go slow. Overall
> > throughput is fine, and I have no complaints there, but doing any kind
> > of operations with the files is quite slow. Building a file list with
> > rsync, or doing a cp, or a ln -s of a big dir tree, etc.
> >
> > Let's assume that the architecture is not changing ... it's going to be
> > FreeBSD 8.x, using UFS2, and raid6 on actual spinning (7200rpm) disks.
> >
> > What can I do to speed things up ?
> >
> > Right now I have these in my loader.conf:
> >
> > kern.maxdsiz="4096000000"        # for fsck
> > vm.kmem_size="1610612736"        # for big rsyncs
> > vm.kmem_size_max="1610612736"    # for big rsyncs
>
> On what exact OS version? Please don't say "8.2"; I need to know whether
> that means 8.2-RELEASE, -STABLE, or what. You said "8.x" above, which is
> too vague. If 8.2-STABLE you should not be tuning vm.kmem_size_max at
> all, and you probably don't need to tune vm.kmem_size either.

Ok, right now we are on 6.4-RELEASE, but it is our intention to move to
8.2-RELEASE. If the kmem loader.conf options are no longer relevant in
8.2-STABLE, should I assume that will also be the case when 8.3-RELEASE
comes along ?

> I also do not understand how vm.kmem_size would affect rsync, since
> rsync is a userland application. I imagine you'd want to adjust
> kern.maxdsiz and kern.dfldsiz (default dsiz).

Well, a huge rsync with 20+ million files dies with memory-related
errors, and continued to do so until we upped the kmem values that high.
We don't know why, but we know it "fixed it".
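That said, if kern.maxdsiz and kern.dfldsiz really are the right knobs
for a big userland rsync, I assume the loader.conf change would look
something like the below ? (The sizes here are guesses on my part, not
values we have tested:)

    # /boot/loader.conf (untested sketch; sizes are guesses)
    kern.maxdsiz="8589934592"    # max per-process data segment (8 GB)
    kern.dfldsiz="8589934592"    # default per-process data segment size

... with the vm.kmem_size* lines dropped entirely once we are on 8.2.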
> > and I also set:
> >
> > vfs.ufs.dirhash_maxmem=64000000
>
> This tunable uses memory for a single directory that has a huge number
> of files in it; AFAIK it does not apply to "large directory structures"
> (as in directories within directories within directories). It's obvious
> you're just tinkering with random sysctls hoping to gain performance
> without really understanding what the sysctls do. :-) To see if you
> even need to increase that, try "sysctl -a | grep vfs.ufs.dirhash" and
> look at dirhash_mem vs. dirhash_maxmem, as well as dirhash_lowmemcount.

No, we actually ALSO have huge directories, and we do indeed need this
value. This is the one setting that we actually understand and have
empirically measured.
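For anyone following along, the check is just something like:

    # compare current dirhash usage against the cap; a climbing
    # lowmemcount means dirhash has been asked to shed memory
    sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem
    sysctl vfs.ufs.dirhash_lowmemcount

... and in our case those numbers are what justified raising
vfs.ufs.dirhash_maxmem in the first place.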
> The only thing I can think of on short notice is to have multiple
> filesystems (volumes) instead of one large 12TB one. This is pretty
> common in the commercial filer world.

Ok, that is interesting - are you saying create multiple, smaller UFS
filesystems on the single large 12TB raid6 array ? Or are you saying
create a handful of smaller arrays ? We have to burn two disks for every
raid6 array we make, as I am sure you know, so we really can't split it
up into multiple arrays.

We could, however, split the single raid6 array into multiple, separately
formatted UFS2 filesystems, but I don't understand how that would help
our performance. Certainly fsck time would be much shorter, and we could
bring up each filesystem as it finished fsck and then move on to the
next one ... but in terms of live performance, how does splitting the
array into multiple filesystems help ? The nature of a raid array (as I
understand it) would have us beating all 12 disks regardless of which
UFS filesystem was being used. Can you elaborate ? (See my P.S. below
for what I assume the mechanics of the split would look like.)

> Regarding system RAM and UFS2: I have no idea, Kirk might have to
> comment on that.
>
> You could "make use" of system RAM for cache (ZFS ARC) if you were
> using ZFS instead of native UFS2. However, if the system has 64GB of
> RAM, you need to ask yourself why the system has that amount of RAM in
> the first place. For example, if the machine runs mysqld and is tuned
> to use a large amount of memory, you really don't "have" 64GB of RAM to
> play with, and thus wouldn't want mysqld and some filesystem caching
> model fighting over memory (e.g. paging/swapping).

Actually, the system RAM is there for the purpose of someday using ZFS -
and for no other reason. However, that is realistically a few years away
on our timeline, unfortunately, so for now we will use UFS2, and as I
said ... it seems a shame that UFS2 cannot use system RAM for any good
purpose... Or can it ? Anyone ?
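P.S. On the multiple-filesystems idea: I assume the mechanics are
nothing more exotic than carving the one array into several partitions
and running newfs on each ? Something like the below, where the device
name, sizes and labels are all just guesses on my part:

    # untested sketch: assumes the raid6 array shows up as da0
    gpart create -s gpt da0
    gpart add -t freebsd-ufs -s 3T -l vol0 da0
    gpart add -t freebsd-ufs -s 3T -l vol1 da0
    gpart add -t freebsd-ufs -s 3T -l vol2 da0
    gpart add -t freebsd-ufs -s 3T -l vol3 da0
    newfs -U /dev/gpt/vol0    # -U for soft updates; repeat for each

It's not the mechanics I'm stuck on, it's why the split helps live
performance when all twelve spindles back every filesystem anyway.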