From owner-freebsd-hackers  Fri Feb 22 12:43:39 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 5FA2F37B404
	for <hackers@FreeBSD.ORG>; Fri, 22 Feb 2002 12:43:33 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.11.6/8.9.1) id g1MKg4u22700;
	Fri, 22 Feb 2002 12:42:04 -0800 (PST)
	(envelope-from dillon)
Date: Fri, 22 Feb 2002 12:42:04 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200202222042.g1MKg4u22700@apollo.backplane.com>
To: Andrew Mobbs <andrewm@chiark.greenend.org.uk>
Cc: hackers@FreeBSD.ORG
Subject: Re2: msync performance
References:  <15478.31998.459219.178549@chiark.greenend.org.uk>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG


:I recently raised PR 35195
:
:Details are in the PR, but in summary: performing a large amount of
:random IO to a large file through mmap, on a machine with a fair amount
:of free RAM, will cause a following msync to take a significant amount
:of time.
:
:I believe this is because msync walks the dirty buffer list by age,
:and therefore will write blocks out in an order liable to cause a lot
:of disk seeks.
:
:My suggestion for a solution would be before starting the IO, to sort
:the dirty buffer list by location on logical disk, and coalesce
:adjacent blocks where possible.
:
:Before I volunteer to implement something like this, please could
:somebody check I'm correct in my analysis, and comment on the
:feasibility of my suggested solution.
:
:Thanks,
:
:-- 
:Andrew Mobbs - http://www.chiark.greenend.org.uk/~andrewm/

    I've looked at this some more.  I can fairly trivially improve
    sequential write efficiency if msync() is called on a range
    of dirty pages, and I can use the same code when msync() is
    called on a complete file *IF* the file is fairly small
    (no more than a hundred pages or so).
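
    To illustrate the range case from the caller's side, here is a
    minimal user-space sketch, not the kernel change itself.  The
    helper names page_span() and sync_range() are hypothetical.  It
    assumes a power-of-two page size and relies only on the documented
    requirement that msync() be given a page-aligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: widen the byte range [off, off + len) so that it
 * starts on a page boundary.  pgsz must be a power of two.
 */
void
page_span(size_t off, size_t len, size_t pgsz, size_t *start, size_t *span)
{
	assert(pgsz != 0 && (pgsz & (pgsz - 1)) == 0);
	*start = off & ~(pgsz - 1);	/* round the start down to a page */
	*span = len + (off - *start);	/* grow the length by the same amount */
}

/*
 * Hypothetical helper: flush only the pages covering [off, off + len) of
 * a mapping that begins at base.  Returns 0 on success, -1 on error with
 * errno set by msync().
 */
int
sync_range(void *base, size_t off, size_t len)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	size_t start, span;

	assert(pgsz > 0);
	page_span(off, len, (size_t)pgsz, &start, &span);
	return (msync((char *)base + start, span, MS_SYNC));
}
```

    A caller that has just modified 100 bytes at offset 5000 of a
    mapping would call sync_range(base, 5000, 100), which asks the
    kernel to flush a single page rather than the whole file.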

    But we have a serious problem when msync() is called on a
    very large file that may only contain a few dirty pages.
    For example, if you have a 20GB file and you are mmap()ing
    portions of it, we can't iterate through the file offsets
    sequentially without eating an enormous amount of CPU
    (as in several seconds' worth of CPU, or even several minutes).
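
    To put a number on the sequential walk, the sketch below counts how
    many page-sized offsets would have to be probed.  The function name
    offsets_to_probe() is hypothetical, and the 4096-byte page size used
    in the example is an assumption; the message does not state one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of page-sized offsets a sequential walk
 * over a file of file_size bytes has to probe, rounding up for a
 * partial last page.
 */
uint64_t
offsets_to_probe(uint64_t file_size, uint64_t page_size)
{
	assert(page_size != 0);
	return ((file_size + page_size - 1) / page_size);
}
```

    With 4096-byte pages, offsets_to_probe(20ULL << 30, 4096) is
    5242880, so a 20GB file needs over five million lookups even if
    only a handful of its pages are dirty.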

    In this case we have to scan the object page list, which is
    not sorted.  Even so, the existing msync() code *DOES*
    cluster pages together into 64K chunks (though I notice that it
    does not appear to cluster the raw I/O).

    So, this falls back to your suggested solution.... sort
    object->memq (it's the actual page queue that is the problem,
    not the object queue).  Looking at it some more I believe
    this may be a viable solution.  I am going to work something
    up.
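
    The sort-then-coalesce idea can be sketched in user space as
    follows.  This is only an illustration under stated assumptions,
    not the kernel implementation: it works on a plain array of page
    indices rather than object->memq, struct page_run and
    coalesce_dirty() are hypothetical names, and the run limit is left
    to the caller (16 pages would match 64K chunks if pages are 4K,
    which is an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one contiguous write: pages [first, first + count). */
struct page_run {
	uint64_t first;
	size_t count;
};

/* qsort() comparator: ascending page index, written to avoid overflow. */
static int
cmp_pindex(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Sort the dirty page indices in place, then merge neighbours into runs
 * of at most max_run pages.  runs[] must have room for n entries.
 * Returns the number of runs written.
 */
size_t
coalesce_dirty(uint64_t *pindex, size_t n, size_t max_run,
    struct page_run *runs)
{
	size_t i, nruns = 0;

	assert(max_run > 0);
	qsort(pindex, n, sizeof(pindex[0]), cmp_pindex);
	for (i = 0; i < n; i++) {
		if (nruns > 0) {
			struct page_run *r = &runs[nruns - 1];

			if (pindex[i] < r->first + r->count)
				continue;	/* duplicate index, already covered */
			if (pindex[i] == r->first + r->count &&
			    r->count < max_run) {
				r->count++;	/* adjacent page: extend the run */
				continue;
			}
		}
		/* Gap, or the current run is full: start a new run. */
		runs[nruns].first = pindex[i];
		runs[nruns].count = 1;
		nruns++;
	}
	return (nruns);
}
```

    Given dirty pages 7, 3, 4, 100, 5, 101 and a limit of 16, this
    yields three runs: pages 3-5, page 7, and pages 100-101, issued in
    ascending order instead of in the order the pages were found.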

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
