From owner-freebsd-performance@FreeBSD.ORG Mon Apr 19 23:17:12 2004
Message-ID: <4084C064.5080506@geminix.org>
Date: Tue, 20 Apr 2004 08:17:08 +0200
From: Uwe Doering <gemini@geminix.org>
Organization: Private UNIX Site
To: freebsd-performance@freebsd.org
Subject: Re: How does disk caching work?

Igor Shmukler wrote:
>>> Sorry, I shouldn't have been lazy and actually looked up the settings.
>>> Yes, those are the settings I was referring to.  Someone else had
>>> cranked them up so that the machine was maintaining about 1.7G in
>>> cache; he said that he'd noticed a reduction in disk IO when he did
>>> that.  I haven't been able to see any difference in disk IO, though it
>>> seems logical that setting cache too high would hurt write caching and
>>> actually increase disk IO.  It's currently set to whatever the kernel
>>> thought best, so I'll just leave it there.
>>
>> Well, I'm afraid your colleague must have been imagining things.  The
>> cache queue ('Cache' column in 'top') is just a phase in the laundering
>> procedure (VM page recycling) between the inactive queue ('Inact' in
>> 'top') and the free queue ('Free' in 'top').  So these variables have
>> nothing to do with disk i/o performance.
>
> I am not sure you are correct here.  I understand things very
> differently.  While it is a fact that the number of pages in the cache
> queue does not affect IO throughput, changing VM settings such as
> vm.stats.vm.v_cache_min, vm.stats.vm.v_cache_max,
> vm.stats.vm.v_free_target and vm.stats.vm.v_free_min should have an
> effect on disk IO.
>
> The very reason JD came up with cache pages is to minimize IO traffic.
> If we require a larger number of free pages we cause the OS to remove
> references at an earlier point.  This should cause the kernel to
> re-read some of the pages that would otherwise just be requeued to the
> active queue.
>
> Having a larger cache queue would require the VM to start cleaning
> dirty pages earlier, which results in some additional write traffic as
> well.  However, this is not that bad, because here it is a zero-sum
> game.  If pages are to become free, they would have to be written out
> regardless of the cache queue size, just at a later point.  There is,
> however, a benefit to a larger cache queue.  The upside is that if the
> machine often experiences bursts in memory demand (as pretty much any
> real-world server does), you are able to accommodate the changing load
> without blocking.

Well, I didn't claim that the cache queue was useless.  It does have
its merits.  And there is a certain default amount configured by the
kernel's auto-scaling code already.  What I was trying to point out is
that these variables don't necessarily do what their names suggest.

Take 'vm.v_cache_max', for example.  When you crank that up, instead of
increasing the size of the cache queue it is actually the inactive
queue that grows in size.  This is because the kernel steals pages from
the inactive queue when it temporarily runs out of pages in the cache
queue, without having to block for i/o as long as there are clean (not
written to, or already laundered) pages in the inactive queue.  When it
finds dirty pages during this scan it schedules them for background
synchronization with the disk, but again without blocking in the
foreground.
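If you want to watch this behaviour for yourself, the queue sizes and
the thresholds discussed above can be read from userland through
sysctl(3).  Here is a minimal sketch, assuming the vm.stats.vm.* OID
names quoted earlier in this thread (they may differ between FreeBSD
versions):

/*
 * Illustration only, not taken from the kernel sources: dump the
 * page queue counters and the pageout thresholds via sysctl(3).
 * Compile with:  cc -o vmqueues vmqueues.c
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

static void
show(const char *oid)
{
        u_int val;
        size_t len = sizeof(val);

        if (sysctlbyname(oid, &val, &len, NULL, 0) == -1) {
                printf("%-32s (not available on this version)\n", oid);
                return;
        }
        printf("%-32s %u pages\n", oid, val);
}

int
main(void)
{
        /* current queue sizes, matching the columns in top(1) */
        show("vm.stats.vm.v_active_count");
        show("vm.stats.vm.v_inactive_count");
        show("vm.stats.vm.v_cache_count");
        show("vm.stats.vm.v_free_count");

        /* the thresholds mentioned in this thread */
        show("vm.stats.vm.v_cache_min");
        show("vm.stats.vm.v_cache_max");
        show("vm.stats.vm.v_free_min");
        show("vm.stats.vm.v_free_target");
        return (0);
}

Run it before and after raising 'vm.v_cache_max' and you should see the
inactive count ('Inact' in 'top') grow rather than the cache count,
which is the effect described above.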
The reason for this algorithm is that it is better to keep pages in the
inactive queue for as long as possible, rather than moving them over to
the cache queue prematurely.  Pages in the inactive queue can still be
mapped into the memory space of processes, while pages in the cache
queue have lost this association.  So, quite naturally, when the VM
system has to reactivate a page (put it back into the active queue),
this operation tends to be less expensive while the page is still in
the inactive queue.

So, for reasons like these, I keep recommending that you either study
the kernel sources before you try to tune the VM system, or else leave
these variables alone.

   Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net