From: Matthew Dillon <dillon@apollo.backplane.com>
Date: Mon, 28 Oct 2002 00:54:57 -0800 (PST)
To: Jeff Roberson
Cc: Seigo Tanimura, Bruce Evans
Subject: Re: Dynamic growth of the buffer and buffer page reclaim

:I was going to comment on fragmentation issues, but that seems to have
:been very well covered.  I would like to point out that removing the
:buffer_map not only contributes to kernel map fragmentation, but also
:contention for the kernel map.  It might also prevent us from removing
:giant from the kernel map because it would add another interrupt time
:consumer.

    Yes.  Whatever the case, any sort of temporary KVA mapping management
    system would need its own submap.  It would be insane to use the
    kernel_map or kmem_map for this.

    In regards to Seigo's patch:  the scalability issue is entirely
    related to the KVA mapping portion of the buffer cache.

    Only I/O *WRITE* performance is specifically limited by the size of
    the buffer_map, due to the limited number of dirty buffers allowed in
    the map.  This in turn is a restriction required by filesystems, which
    must keep track of 'dirty' buffers in order to sequence out writes.

    Currently the only way around this limitation is to use
    mmap/MAP_NOSYNC.  In other words, we support dirty VM pages that are
    not associated with the buffer cache, but most of the filesystem
    algorithms are still based around the assumption that dirty pages will
    be mapped into dirty buffers.
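    For illustration only, a minimal userland sketch of that MAP_NOSYNC
    approach might look like the following.  The file name and sizes are
    made up here, not anything from the patch under discussion:

        /*
         * Hypothetical example: dirty a file through a MAP_NOSYNC mapping
         * so the dirty pages live in the VM page cache instead of tying
         * up dirty buffers in the buffer_map.
         */
        #include <sys/mman.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        int
        main(void)
        {
            size_t len = 64 * 1024 * 1024;      /* 64MB scratch file */
            int fd = open("scratch", O_RDWR | O_CREAT, 0644);
            char *p;

            if (fd < 0 || ftruncate(fd, len) < 0) {
                perror("scratch");
                exit(1);
            }
            p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NOSYNC, fd, 0);
            if (p == MAP_FAILED) {
                perror("mmap");
                exit(1);
            }
            memset(p, 0x5a, len);       /* dirty the pages via the mapping */
            msync(p, len, MS_SYNC);     /* flush happens when we ask for it */
            munmap(p, len);
            close(fd);
            return (0);
        }

    Without MAP_NOSYNC the pages dirtied by the memset would eventually be
    pushed out through dirty buffers by the syncer; with it, they simply
    sit dirty in the VM page cache until the msync (or the pager) forces
    them out.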
    I/O *READ* caching is limited only by the VM page cache.  The reason
    you got slightly better numbers with your patch has nothing to do with
    I/O performance; it is simply related to the cost of the buffer
    instantiations and teardowns that occur in the limited buffer_map
    space, mapping pages out of the VM page cache.  Since you could have
    more buffers, there were fewer instantiations and teardowns.  It's
    that simple.

    Unfortunately, this performance gain is *DIRECTLY* tied to the number
    of pages wired into the buffer cache.  It is precisely the wired-pages
    portion of the instantiation and teardown that eats the extra cpu.  So
    the moment you regulate the number of wired pages in the system, you
    will blow the performance you are getting.

    I can demonstrate the issue with a simple test.  Create a large file
    with dd, larger than physical memory:

        dd if=/dev/zero of=test bs=1m count=4096    # create a 4G file

    Then dd (read) portions of the file and observe the performance.  Do
    this several times to get stable numbers:

        dd if=test of=/dev/null bs=1m count=16      # repeat several times
        dd if=test of=/dev/null bs=1m count=32      # etc...

    You will find that read performance will drop in two significant
    places:

    (1) When the data no longer fits in the buffer cache and the buffer
        cache is forced to tear down wirings and rewire other pages from
        the VM page cache.  Still, no physical I/O is being done.

    (2) When the data no longer fits in the VM page cache and the system
        is forced to perform physical I/O.

    It's case (1) that you are manipulating with your patch, and as you
    can see it is entirely dependent on the number of wired pages that the
    system is able to maintain in the buffer cache.

                                                -Matt
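    P.S.  If you want to see which of the two knees you are sitting on, a
    rough way to check is to ask the VM system how much of the test file
    is still resident.  The program below is only a sketch I am making up
    here (nothing from the patch): it mmap()s the file read-only and uses
    mincore(2) to count resident pages.  On a 32-bit box you would map a
    window of the file rather than the whole 4G.

        /*
         * Hypothetical helper: report how many pages of a file are
         * resident in the VM page cache.  If most pages are resident and
         * the dd reads are still slow, you are in case (1); once the
         * resident count collapses you are in case (2).
         */
        #include <sys/types.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int
        main(int argc, char **argv)
        {
            long pagesize = sysconf(_SC_PAGESIZE);
            struct stat st;
            size_t npages, resident, i;
            char *vec;
            void *base;
            int fd;

            if (argc != 2) {
                fprintf(stderr, "usage: %s file\n", argv[0]);
                exit(1);
            }
            if ((fd = open(argv[1], O_RDONLY)) < 0 || fstat(fd, &st) < 0) {
                perror(argv[1]);
                exit(1);
            }
            base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
                fd, 0);
            if (base == MAP_FAILED) {
                perror("mmap");
                exit(1);
            }
            npages = (st.st_size + pagesize - 1) / pagesize;
            if ((vec = malloc(npages)) == NULL ||
                mincore(base, (size_t)st.st_size, vec) < 0) {
                perror("mincore");
                exit(1);
            }
            resident = 0;
            for (i = 0; i < npages; i++)
                if (vec[i] & MINCORE_INCORE)
                    resident++;
            printf("%lu of %lu pages resident\n", (u_long)resident,
                (u_long)npages);
            return (0);
        }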