From: Miroslav Lachman <000.fbsd@quip.cz>
Date: Fri, 21 Oct 2011 21:11:08 +0200
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org, Ivan Voras
Subject: Re: dirhash and dynamic memory allocation
Message-ID: <4EA1C3CC.3090500@quip.cz>
In-Reply-To: <20111021162025.GA89885@icarus.home.lan>
References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> <20111021162025.GA89885@icarus.home.lan>

Jeremy Chadwick wrote:
> On Fri, Oct 21, 2011 at 05:38:43PM +0200, Miroslav Lachman wrote:
>> Hi, I am back on this topic...
>>
>> Ivan Voras wrote:
>>> On 14/10/2011 11:20, Miroslav Lachman wrote:
>>>> Hi all,
>>>>
>>>> I tried some tuning of dirhash on our servers and after googling a bit, I
>>>> found an old GSoC project wiki page about Dynamic Memory Allocation for
>>>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
>>>> Is there any reason not to use it / not commit it to HEAD?
>>>
>>> AFAIK it's sort-of already present. In 8-stable and recent kernels you
>>> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
>>> (but except in really large edge cases I don't think you *need* more
>>> than 32 MB), and the kernel will scale down or free the memory if not
>>> needed.
>>>
>>> In effect, vfs.ufs.dirhash_maxmem is the upper limit: the kernel will
>>> use less and will free the allocated memory in low memory situations
>>> (which I've tried and it works).
>>
>> So the current behavior is that on 7.3+ and 8.x we have a smaller
>> average dirhash buffer (by default) than we had initially 10 years
>> ago, because it started as a 2MB fixed size and now we have 2MB max,
>> which is lowered in low mem situations... and sometimes it is set to
>> 0MB!
>>
>> I caught this 2 days ago:
>>
>> root@rip ~/# sysctl vfs.ufs
>> vfs.ufs.dirhash_reclaimage: 5
>> vfs.ufs.dirhash_lowmemcount: 36953
>> vfs.ufs.dirhash_docheck: 0
>> vfs.ufs.dirhash_mem: 0
>> vfs.ufs.dirhash_maxmem: 8388608
>> vfs.ufs.dirhash_minsize: 2560
>>
>> I set maxmem to 8MB in sysctl.conf to increase performance, and
>> dirhash_mem 0 is a really bad surprise!
>
> Actually, the "bad surprise" is dirhash_lowmemcount of 36953. Your
> increasing dirhash_maxmem is fine; what you're seeing is that your
> machine keeps running out of memory, or that your directories are filled
> with so many files that you're exhausting the dirhash repeatedly.
>
> I'm going to be blunt and just ask it: why does that happen? Or do you
> have a filesystem that has an absurdly high number of files in a single
> directory?
> If the former, ignore the next paragraph.

There is not an absurdly high number of files in any single directory,
because I know about this potential problem and I am fighting against it
with the webapp developers. But I see a similar lowmemcount on almost all
UFS-based servers. Most of them are for webhosting (running open source
or proprietary CMSes, so most of the content is in MySQL). Many of our
servers have long uptimes (about a year or more), so the lowmemcount
numbers are higher on them. Our webservers host about 100-150 websites.

Examples from 4 of our servers:

vfs.ufs.dirhash_lowmemcount: 45295    up 39 days
vfs.ufs.dirhash_lowmemcount: 164782   up 419 days
vfs.ufs.dirhash_lowmemcount: 391452   up 102 days
vfs.ufs.dirhash_lowmemcount: 633202   up 417 days

Only a few of our servers have a lowmemcount lower than 1000 (but still
higher than 500). One example is a server with jails, where UFS is used
only for the host system and the jails are on ZFS. This server has 4GB of
RAM and 362MB of swap in use:

vfs.ufs.dirhash_lowmemcount: 936      up 284 days

> I've harped on this before on the mailing list: one of the first things
> I learned as a system administrator was that you Do Not(tm) fill
> directories with tens of thousands of files. Split them up into
> subdirs. Even caching daemons (squid, etc.) support this kind of thing;
> filename "aj1j11hsfkqXaj21" should really be aj/1j/11hsfkqXaj21. You
> get the idea. DNS/BIND administrators of systems which have tens of
> thousands of domains are quite familiar with this scenario too.
>
>> I am worried about bad performance in situations where dirhash is
>> emptied while the server is already running at full capacity (some
>> memory-hungry process is running, the system can start swapping to
>> disk, and dirhash is effectively disabled).
>>
>> I found a PR kern/145246
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
>>
>> Is it possible to add a dirhash_minmem limit so that not all of the
>> dirhash memory is cleared?
>> So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then
>> dirhash_mem will always be between these two limits?
>
> dirhash shouldn't be "disabled", it's that memory pressure from other
> things has priority over the dirhash, but I understand what you mean.
> This is quite evident from dirhash_lowmemcount being so high.
>
> I understand what you want, and maybe there is a way to get what you
> want (with little effort), but I am strongly inclined to say you need to
> figure out on your system what is causing such memory pressure and solve
> that. Honest: try to solve the real problem rather than dancing around
> it. If you have a process that skyrockets in RSS/RES usage due to a
> memory leak or out-of-control design (such as a daemonised perl script
> which blindly uses .= to append data to a scalar, or blindly keeps
> appending data to the end of a list), then fix that problem.

As the servers are running 3rd-party apps (customers' websites), fixing
issues in PHP CMSes etc. is out of my control. So the low memory fix "is
easy": buy and add more RAM.

> Basically I'm trying to say that it shouldn't be the responsibility of
> dirhash to "work around" other problems happening on the system that
> diminish or exhaust available memory. You end up with a kernel design
> that has tons of one-offs in it and that does nothing but bite you in
> the butt down the road. (Linux has been through this many times over.)

You are partially right. But the dirhash lowmem hook seems too sensitive
to me. I see high lowmemcount numbers on systems with almost empty swap
(a few kB in swap, not MBs). That's why I am looking for dirhash_minmem.

Miroslav Lachman
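The dirhash_minmem idea can be illustrated with a small user-space model in C. This is only a sketch under simplifying assumptions and not FreeBSD's ufs_dirhash.c: the struct cache type, the cache_* functions, the fixed-size entry table and the evict-oldest-first policy are all hypothetical. It shows the invariant asked for above: the cache never grows past maxmem, and a low-memory event may shrink it but never below minmem.

```c
#include <stddef.h>

/* Hypothetical model of a dirhash-like cache; NOT the FreeBSD kernel code. */
#define MAX_ENTRIES 64

struct cache {
    size_t entry_size[MAX_ENTRIES]; /* bytes per cached directory hash, oldest at index 0 */
    int nentries;
    size_t mem;    /* analogous to vfs.ufs.dirhash_mem */
    size_t minmem; /* the proposed dirhash_minmem floor (hypothetical) */
    size_t maxmem; /* analogous to vfs.ufs.dirhash_maxmem */
};

/* Add an entry only if it fits under maxmem; returns 1 on success, 0 otherwise. */
static int cache_add(struct cache *c, size_t size)
{
    if (c->nentries == MAX_ENTRIES || c->mem + size > c->maxmem)
        return 0;
    c->entry_size[c->nentries++] = size;
    c->mem += size;
    return 1;
}

/* Drop the oldest entry (index 0) and shift the remaining entries down. */
static void cache_evict_oldest(struct cache *c)
{
    c->mem -= c->entry_size[0];
    for (int i = 1; i < c->nentries; i++)
        c->entry_size[i - 1] = c->entry_size[i];
    c->nentries--;
}

/*
 * Low-memory hook with a floor: keep evicting the oldest entry as long as
 * doing so leaves at least minmem bytes cached. With minmem == 0 the loop
 * empties the cache completely, like the dirhash_mem: 0 reading above.
 */
static void cache_lowmem(struct cache *c)
{
    while (c->nentries > 0 && c->mem - c->entry_size[0] >= c->minmem)
        cache_evict_oldest(c);
}
```

For example, with minmem set to 2MB and maxmem to 16MB, filling the model with 1MB entries stops at 16MB, and a single cache_lowmem() call trims it back to 2MB instead of 0. The cost of such a floor is that those bytes stay pinned even under memory pressure, which is the trade-off debated in this thread.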
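Jeremy's fan-out suggestion quoted above ("aj1j11hsfkqXaj21" becoming aj/1j/11hsfkqXaj21) can be sketched in C as a helper that maps a flat name to a two-level path. The function name fanout_path and the fixed 2+2 character split are illustrative assumptions based only on that one example; the directories themselves still have to be created (for example with mkdir(2)) before files are stored in them, which is omitted here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a flat file name to a two-level fan-out path,
 * following the single example in the thread:
 *   "aj1j11hsfkqXaj21" -> "aj/1j/11hsfkqXaj21"
 * The first two characters pick the top-level directory, the next two the
 * second-level directory, and the remainder becomes the leaf file name.
 * Returns 0 on success, or -1 if the name is shorter than 5 characters
 * (the leaf would be empty) or the output buffer is too small.
 */
static int fanout_path(const char *name, char *out, size_t outlen)
{
    if (strlen(name) < 5)
        return -1;
    /* "%.2s" copies at most two characters from its argument. */
    int n = snprintf(out, outlen, "%.2s/%.2s/%s", name, name + 2, name + 4);
    if (n < 0 || (size_t)n >= outlen)
        return -1; /* snprintf returned the length it needed: output was truncated */
    return 0;
}
```

If names use only [A-Za-z0-9], each directory level holds at most 62 * 62 = 3844 subdirectories, so every directory stays far smaller than a single flat directory holding all the files would be.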