From: Matthew Dillon <dillon@apollo.backplane.com>
Date: Mon, 28 Oct 2002 00:54:57 -0800 (PST)
To: Jeff Roberson
Cc: Seigo Tanimura, Bruce Evans
Subject: Re: Dynamic growth of the buffer and buffer page reclaim

:I was going to comment on fragmentation issues, but that seems to have
:been very well covered.  I would like to point out that removing the
:buffer_map not only contributes to kernel map fragmentation, but also
:contention for the kernel map.  It might also prevent us from removing
:giant from the kernel map because it would add another interrupt time
:consumer.

    Yes.  Whatever the case, any sort of temporary KVA mapping management
    system would need its own submap.  It would be insane to use the
    kernel_map or kmem_map for this.

    In regards to Seigo's patch:  the scalability issue is entirely
    related to the KVA mapping portion of the buffer cache.

    Only I/O *WRITE* performance is specifically limited by the size of
    the buffer_map, due to the limited number of dirty buffers allowed in
    the map.  This in turn is a restriction required by filesystems, which
    must keep track of 'dirty' buffers in order to sequence out writes.

    Currently the only way around this limitation is to use
    mmap/MAP_NOSYNC.  In other words, we support dirty VM pages that are
    not associated with the buffer cache, but most of the filesystem
    algorithms are still based around the assumption that dirty pages will
    be mapped into dirty buffers.
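    For illustration only, a minimal userland sketch of that MAP_NOSYNC
    approach might look like the following.  The file name and sizes are
    made up here, not anything from the patch under discussion:

        /*
         * Hypothetical example: dirty a file through a MAP_NOSYNC mapping
         * so the dirty pages live in the VM page cache instead of tying
         * up dirty buffers in the buffer_map.
         */
        #include <sys/mman.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        int
        main(void)
        {
            size_t len = 64 * 1024 * 1024;      /* 64MB scratch file */
            int fd = open("scratch", O_RDWR | O_CREAT, 0644);
            char *p;

            if (fd < 0 || ftruncate(fd, len) < 0) {
                perror("scratch");
                exit(1);
            }
            p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NOSYNC, fd, 0);
            if (p == MAP_FAILED) {
                perror("mmap");
                exit(1);
            }
            memset(p, 0x5a, len);       /* dirty the pages via the mapping */
            msync(p, len, MS_SYNC);     /* flush happens when we ask for it */
            munmap(p, len);
            close(fd);
            return (0);
        }

    Without MAP_NOSYNC the pages dirtied by the memset would eventually be
    pushed out through dirty buffers by the syncer; with it, they simply
    sit dirty in the VM page cache until the msync (or the pager) forces
    them out.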
    I/O *READ* caching is limited only by the VM page cache.  The reason
    you got slightly better numbers with your patch has nothing to do with
    I/O performance; it is simply related to the cost of the buffer
    instantiations and teardowns that occur in the limited buffer_map
    space, mapping pages out of the VM page cache.  Since you could have
    more buffers, there were fewer instantiations and teardowns.  It's
    that simple.

    Unfortunately, this performance gain is *DIRECTLY* tied to the number
    of pages wired into the buffer cache.  It is precisely the wired-pages
    portion of the instantiation and teardown that eats the extra cpu.  So
    the moment you regulate the number of wired pages in the system, you
    will blow the performance you are getting.

    I can demonstrate the issue with a simple test.  Create a large file
    with dd, larger than physical memory:

        dd if=/dev/zero of=test bs=1m count=4096    # create a 4G file

    Then dd (read) portions of the file and observe the performance.  Do
    this several times to get stable numbers:

        dd if=test of=/dev/null bs=1m count=16      # repeat several times
        dd if=test of=/dev/null bs=1m count=32      # etc...

    You will find that read performance will drop in two significant
    places:

    (1) When the data no longer fits in the buffer cache and the buffer
        cache is forced to tear down wirings and rewire other pages from
        the VM page cache.  Still, no physical I/O is being done.

    (2) When the data no longer fits in the VM page cache and the system
        is forced to perform physical I/O.

    It's case (1) that you are manipulating with your patch, and as you
    can see it is entirely dependent on the number of wired pages that the
    system is able to maintain in the buffer cache.

                                                -Matt
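    P.S.  If you want to see which of the two knees you are sitting on, a
    rough way to check is to ask the VM system how much of the test file
    is still resident.  The program below is only a sketch I am making up
    here (nothing from the patch): it mmap()s the file read-only and uses
    mincore(2) to count resident pages.  On a 32-bit box you would map a
    window of the file rather than the whole 4G.

        /*
         * Hypothetical helper: report how many pages of a file are
         * resident in the VM page cache.  If most pages are resident and
         * the dd reads are still slow, you are in case (1); once the
         * resident count collapses you are in case (2).
         */
        #include <sys/types.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int
        main(int argc, char **argv)
        {
            long pagesize = sysconf(_SC_PAGESIZE);
            struct stat st;
            size_t npages, resident, i;
            char *vec;
            void *base;
            int fd;

            if (argc != 2) {
                fprintf(stderr, "usage: %s file\n", argv[0]);
                exit(1);
            }
            if ((fd = open(argv[1], O_RDONLY)) < 0 || fstat(fd, &st) < 0) {
                perror(argv[1]);
                exit(1);
            }
            base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
                fd, 0);
            if (base == MAP_FAILED) {
                perror("mmap");
                exit(1);
            }
            npages = (st.st_size + pagesize - 1) / pagesize;
            if ((vec = malloc(npages)) == NULL ||
                mincore(base, (size_t)st.st_size, vec) < 0) {
                perror("mincore");
                exit(1);
            }
            resident = 0;
            for (i = 0; i < npages; i++)
                if (vec[i] & MINCORE_INCORE)
                    resident++;
            printf("%lu of %lu pages resident\n", (u_long)resident,
                (u_long)npages);
            return (0);
        }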