From owner-freebsd-performance@FreeBSD.ORG  Sat Apr 17 00:41:24 2004
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 70F6F16A4CE
	for <freebsd-performance@freebsd.org>;
	Sat, 17 Apr 2004 00:41:24 -0700 (PDT)
Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net
	[213.73.91.129])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9D3E243D48
	for <freebsd-performance@freebsd.org>;
	Sat, 17 Apr 2004 00:41:23 -0700 (PDT)
	(envelope-from gemini@geminix.org)
Message-ID: <4080DF9F.3040302@geminix.org>
Date: Sat, 17 Apr 2004 09:41:19 +0200
From: Uwe Doering <gemini@geminix.org>
Organization: Private UNIX Site
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040119
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-performance@freebsd.org
References: <20040416163845.GG87362@nasby.net>
	<E1BEbKR-000ISM-00.shmukler-mail-ru@f7.mail.ru>
	<20040416221211.GM87362@nasby.net>
In-Reply-To: <20040416221211.GM87362@nasby.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Received: from gemini by geminix.org with asmtp (TLSv1:AES256-SHA:256)
	(Exim 3.36 #1)
	id 1BEkS1-0003F7-00; Sat, 17 Apr 2004 09:41:22 +0200
Subject: Re: How does disk caching work?
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 17 Apr 2004 07:41:24 -0000

Jim C. Nasby wrote:
> On Sat, Apr 17, 2004 at 01:56:55AM +0400, "Igor Shmukler"  wrote:
> 
>>>Is there a document anywhere that describes in detail how FreeBSD
>>>handles disk caching? I've read Matt Dillon's description of the VM
>>>system, but it deals mostly with programs, other than vague statements
>>>such as 'FreeBSD uses all available memory for disk caching'.
>>
>>Well, the statement is not vague. FreeBSD has a unified buffer cache. This means that ALL AVAILABLE 
>>MEMORY IS A BUFFER CACHE for all device IO.
>>
>>>I think I know how caching memory mapped IO works for the most part,
>>>since it should be treated just like program data, but what about files
>>>that aren't memory mapped? What impact is there as pages move from
>>>active to inactive to cache to free? What role do wired and buffer pages
>>>play?
>>
>>If a file is not memory mapped, it is not in memory, is it? Where would you cache it? Maybe I am missing 
>>something? Do you maybe want to know about vnode caching?
> 
> What if the file isn't memory mapped? You can access a file without
> mapping it into memory, right?

In FreeBSD, file and directory data always exists as VM objects, that 
is, collections of virtual memory pages.  Pages that have been accessed 
are resident in physical memory (unless they have been recycled due to 
inactivity); the rest are just reservations.  That's why it is called 
"virtual memory".  Whether these objects are accessed through 
read()/write() or through mmap() depends on your application.  These 
system calls are just different userland interfaces to the same kernel 
resource.
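From userland you can at least see that both interfaces deliver the same 
bytes.  Here is a minimal Python sketch (the language and the helper names 
read_via_read() and read_via_mmap() are just my choice for illustration) 
that reads one file once through plain read() calls and once through 
mmap().  It cannot show the kernel side, i.e. that both paths are served 
from the same VM object; it only shows that they are equivalent views of 
the same file.

```python
import mmap
import os
import tempfile


def read_via_read(path: str) -> bytes:
    """Read the whole file with plain read() system calls."""
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, 65536)  # one read() call per chunk
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


def read_via_mmap(path: str) -> bytes:
    """Map the file into the address space and copy its contents out."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # mapping a zero-length file is an error
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]  # page faults pull the file's pages in on demand


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"same bytes, two interfaces\n" * 1000)
        path = tmp.name
    try:
        a = read_via_read(path)
        b = read_via_mmap(path)
        print(a == b, len(a))  # prints: True 27000
    finally:
        os.unlink(path)
```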

>>When pages are rotated from active to inactive and then to the cache queue, they still retain their vnode 
>>references. Once a page is in the free queue, there is no way to put it back into the cache. The association is lost.
>>
>>Wired pages are used to pin memory, so that we do not get into a situation where fault-handling code is paged out.
>>
>>I am not a FreeBSD guru, so I have never heard of BUFFER pages. Is there such a concept?
> 
> I'm referring to the 'Buf' column at the top of top. I remember reading
> something about that being used to cache file descriptors before the
> files are mapped into memory, but I'm not very clear on what is actually
> happening.

The disk i/o buffers you refer to (the 'Buf' column in 'top') are the 
actual interface between the VM system and the disk device drivers.  For 
file and directory data, sets of VM pages are referenced by, and assigned 
to, disk i/o buffers.  There they are handled by a kernel daemon process 
that performs the actual synchronization between the VM system and the 
disks.  That is also the layer where the soft updates algorithm is 
implemented, for instance.

In the case of file and directory data, once the data has been written 
out to disk (if the memory pages were "dirty"), the respective disk i/o 
buffer is released immediately and can be recycled for other purposes, 
since it merely referred to memory pages that continue to exist within 
the VM system.
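A toy model may help with the buffer/page relationship described in the 
last two paragraphs.  This is not kernel code: the classes Page, VMObject 
and Buf and the functions write_out() and release() are hypothetical 
names, and the "disk" is just a dict.  It only shows the ownership 
pattern: the buffer refers to pages, writes out the dirty ones, and is 
then recycled while the pages stay cached in the VM object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """A resident VM page (toy model)."""
    data: bytes
    dirty: bool = False


@dataclass
class VMObject:
    """Toy stand-in for a file's VM object: page index -> resident page."""
    pages: Dict[int, Page] = field(default_factory=dict)

    def write(self, index: int, data: bytes) -> None:
        # A write modifies the page in the VM object and marks it dirty.
        self.pages[index] = Page(data, dirty=True)


@dataclass
class Buf:
    """Toy disk i/o buffer: it only refers to pages, it owns no data."""
    obj: Optional[VMObject]
    indices: List[int]


def write_out(buf: Buf, disk: Dict[int, bytes]) -> None:
    """Copy the buffer's dirty pages to the 'disk' and mark them clean."""
    if buf.obj is None:
        raise ValueError("buffer is not attached to a VM object")
    for i in buf.indices:
        page = buf.obj.pages[i]
        if page.dirty:
            disk[i] = page.data
            page.dirty = False


def release(buf: Buf, free_bufs: List[Buf]) -> None:
    """Put the buffer back on a free list; the pages stay in the VM object."""
    buf.obj = None
    buf.indices = []
    free_bufs.append(buf)


if __name__ == "__main__":
    obj = VMObject()
    obj.write(0, b"hello")
    obj.write(1, b"world")
    disk: Dict[int, bytes] = {}
    free_bufs: List[Buf] = []
    buf = Buf(obj, [0, 1])
    write_out(buf, disk)
    release(buf, free_bufs)
    # The data is on "disk", the buffer is free, and the pages are still cached.
    print(disk, obj.pages[0].data, obj.pages[0].dirty, len(free_bufs))
```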

Metadata (inodes etc.) is a different matter, though.  It has no VM 
representation, so for disk i/o it has to be cached in extra memory 
allocated for this purpose.  A disk i/o buffer then refers to this 
memory range, and the system tries to keep it around for as long as 
possible.  A classical cache algorithm such as LRU eventually recycles 
these buffers and their memory allocations.
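The metadata side can be sketched the same way.  Assuming a plain LRU 
policy (a stand-in only; the real buffer cache is more elaborate than 
this), a cache of separately allocated metadata buffers keyed by block 
number could look like the following.  MetadataBufCache and read_block 
are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class MetadataBufCache:
    """Toy LRU cache of metadata buffers keyed by disk block number."""

    def __init__(self, capacity: int, read_block: Callable[[int], bytes]):
        self.capacity = capacity
        self.read_block = read_block  # hypothetical: fetches a block from disk
        self.bufs: "OrderedDict[int, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, blkno: int) -> bytes:
        if blkno in self.bufs:
            self.bufs.move_to_end(blkno)  # mark as most recently used
            return self.bufs[blkno]
        self.misses += 1
        data = self.read_block(blkno)  # a separate copy, not VM object pages
        self.bufs[blkno] = data
        if len(self.bufs) > self.capacity:
            self.bufs.popitem(last=False)  # recycle the least recently used
        return data


if __name__ == "__main__":
    cache = MetadataBufCache(2, lambda blkno: b"inode block %d" % blkno)
    for blkno in (1, 2, 1, 3):
        cache.get(blkno)
    print(list(cache.bufs), cache.misses)  # prints: [1, 3] 3
```

Block 2 is evicted because block 1 was touched again just before block 3 
arrived, which is the whole point of LRU.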

As usual, the actual implementation is even more complex, but I think 
this gives you a picture of how it works.

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net