From: Uwe Doering
Date: Sat, 17 Apr 2004 09:41:19 +0200
To: freebsd-performance@freebsd.org
Subject: Re: How does disk caching work?
Message-ID: <4080DF9F.3040302@geminix.org>
References: <20040416163845.GG87362@nasby.net> <20040416221211.GM87362@nasby.net>
In-Reply-To: <20040416221211.GM87362@nasby.net>

Jim C. Nasby wrote:
> On Sat, Apr 17, 2004 at 01:56:55AM +0400, "Igor Shmukler" wrote:
>
>>>Is there a document anywhere that describes in detail how FreeBSD
>>>handles disk caching? I've read Matt Dillon's description of the VM
>>>system, but it deals mostly with programs, other than vague statements
>>>such as 'FreeBSD uses all available memory for disk caching'.
>>
>>Well, the statement is not vague. FreeBSD has a unified buffer cache. This means that ALL AVAILABLE
>>MEMORY IS A BUFFER CACHE for all device IO.
>>
>>>I think I know how caching memory mapped IO works for the most part,
>>>since it should be treated just like program data, but what about files
>>>that aren't memory mapped? What impact is there as pages move from
>>>active to inactive to cache to free? What role do wired and buffer pages
>>>play?
>>
>>If file is not memory mapped it is not in memory, is it? Where do you cache it? Maybe I am missing
>>something? Do you maybe want to know about node caching?
>
> What if the file isn't memory mapped? You can access a file without
> mapping it into memory, right?

In FreeBSD, file and directory data always exist as VM objects, that
is, collections of virtual memory pages. Those pages that have been
accessed exist in physical memory (if not recycled due to inactivity);
the rest are just reservations. That's why it is called "virtual
memory".

Whether these objects get accessed by read()/write() or mmap() depends
on your application. These system calls are just different userland
interfaces to the same kernel resource.

>>When pages are rotated from active to inactive and then to cache buckets they still retain vnode
>>references. Once it is in free queue, there is no way to put it back to cache. Association is lost.
>>
>>Wired pages are to pin memory. So that we do not get situation when fault handling code is paged out.
>>
>>I am not FreeBSD guru so I never heard of BUFFER pages. Is there such a concept?
>
> I'm referring to the 'Buf' column at the top of top. I remember reading
> something about that being used to cache file descriptors before the
> files are mapped into memory, but I'm not very clear on what is actually
> happening.

The disk i/o buffers you refer to (the 'Buf' column in 'top') are the
actual interface between the VM system and the disk device drivers.
For file and directory data, sets of VM pages get referenced by and
assigned to disk i/o buffers.
There they are dealt with by a kernel daemon process that does the
actual synchronization between VM and disks. That's where the soft
updates algorithm is implemented, for instance.

In the case of file and directory data, once the data has been written
out to disk (if the memory pages were "dirty"), the respective disk i/o
buffer gets released immediately and can be recycled for other
purposes, since it merely referred to memory pages that continue to
exist within the VM system.

Meta data (inodes etc.) is a different matter, though. There is no VM
representation for it, so for disk i/o it has to be cached in extra
memory allocated for this purpose. A disk i/o buffer then refers to
this memory range and tries to keep it around for as long as possible.
A classical cache algorithm like LRU eventually recycles these buffers
and memory allocations.

As usual, the actual implementation is even more complex, but I think
you get the picture of how it works.

   Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net
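
The remark above that read()/write() and mmap() are just different
userland interfaces to the same kernel resource can be observed from
an ordinary program. What follows is only a minimal Python sketch
under stated assumptions: the helper names (read_via_read,
read_via_mmap, write_via_mmap, demo) and the temporary file are made
up for this example, and it shows behaviour visible from userland
only, not how the kernel arranges its pages internally.

```python
import mmap
import os
import tempfile


def read_via_read(path):
    """Return the file's contents using plain read() calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:  # empty result means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def read_via_mmap(path):
    """Return the file's contents through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]


def write_via_mmap(path, offset, data):
    """Overwrite len(data) bytes at offset through a shared mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data
            m.flush()  # ask the OS to write the modified pages back


def demo():
    # Hypothetical scratch file; any small regular file would do.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello page cache\n" * 4)
        path = tmp.name
    try:
        # Both interfaces should return identical bytes.
        same_before = read_via_read(path) == read_via_mmap(path)
        # A store through the mapping should be visible to read().
        write_via_mmap(path, 0, b"HELLO")
        head_after = read_via_read(path)[:5]
        return same_before, head_after
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Both access paths return the same bytes, and a change made through the
mapping shows up in a subsequent read(), which is what one would
expect if both go through the same cached pages of the file.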