Date:      Mon, 20 Mar 2000 11:15:45 -0800
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Poul-Henning Kamp <phk@critter.freebsd.dk>, current@FreeBSD.ORG
Subject:   Re: patches for test / review
Message-ID:  <20000320111544.A14789@fw.wintelcom.net>
In-Reply-To: <200003201736.JAA70124@apollo.backplane.com>; from dillon@apollo.backplane.com on Mon, Mar 20, 2000 at 09:36:22AM -0800
References:  <18039.953549289@critter.freebsd.dk> <200003201736.JAA70124@apollo.backplane.com>

* Matthew Dillon <dillon@apollo.backplane.com> [000320 10:01] wrote:
> 
> :
> :
> :>    Kirk and I have already mapped out a plan to drastically update
> :>    the buffer cache API which will encapsulate much of the state within
> :>    the buffer cache module.
> :
> :Sounds good.  Combined with my stackable BIO plans that sounds like
> :a really great win for FreeBSD.
> :
> :--
> :Poul-Henning Kamp             FreeBSD coreteam member
> :phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
> 
>     I think so.  I can give -current a quick synopsis of the plan but I've
>     probably forgotten some of the bits (note: the points below are not
>     in any particular order):

....

>     * Cleanup the buffer cache API (bread(), BUF_STRATEGY(), and so forth).
>       Specifically, split out the call functionality such that the buffer
>       cache can determine whether a buffer being obtained is going to be
>       used for reading or writing.  At the moment we don't know if the system
>       is going to dirty a buffer until after the fact and this has caused a
>       lot of pain in regards to dealing with low-memory situations.
> 
>       getblk() -> getblk_sh() and getblk_ex()
> 
> 	Obtain bp without issuing I/O, getting either a shared or exclusive
> 	lock on the bp.  With a shared lock you are allowed to issue READ
> 	I/O but you are not allowed to modify the contents of the buffer.
> 	With an exclusive lock you are allowed to issue both READ and WRITE
> 	I/O and you can modify the contents of the buffer.
> 
>       bread()  -> bread_sh() and bread_ex()
> 
> 	Obtain and validate (issue read I/O as appropriate) a bp.  bread_sh()
> 	allows a buffer to be accessed but not modified or rewritten.
> 	bread_ex() allows a buffer to be modified and written.

This seems to let callers express their intent to write to a buffer,
which would be an excellent place to do copy-on-write (COW) of the
pages 'in software', rather than OpenBSD's way of using COW'd pages to
accomplish the same thing.

I'm not sure if you remember what I brought up at BAFUG, but I'd
like to see something along the lines of the BX_BKGRDWRITE that Kirk
is using for the bitmap blocks in softupdates enabled on a
system-wide basis.  That way rewriting data that has already been sent
to the driver isn't blocked, and at the same time we don't need to page
protect during every strategy call.

I may have misunderstood your intent, but using page protections
on each IO would seem to introduce a lot of performance issues that
the rest of these points are all trying to get rid of.
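
To make the background-write idea concrete, here is a minimal
user-space sketch.  It assumes the gist of the approach is "hand the
driver a private copy at write time so the live buffer stays
writable".  Everything in it (struct xbuf, xbuf_start_write(),
xbuf_modify(), xbuf_write_done()) is a hypothetical name made up for
illustration, not the kernel's struct buf or the softupdates code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define XBUF_SIZE 16

/* Hypothetical stand-in for a buffer; NOT the kernel's struct buf. */
struct xbuf {
    char  data[XBUF_SIZE];  /* live contents the filesystem keeps modifying */
    char *inflight;         /* private copy handed to the "driver", or NULL */
};

/*
 * Start a write: copy the live contents so the driver sees a stable image.
 * Returns the copy, or NULL if a write is already in flight or malloc fails.
 */
const char *xbuf_start_write(struct xbuf *bp)
{
    assert(bp != NULL);
    if (bp->inflight != NULL)
        return NULL;        /* this sketch allows one write in flight */
    bp->inflight = malloc(XBUF_SIZE);
    if (bp->inflight == NULL)
        return NULL;
    memcpy(bp->inflight, bp->data, XBUF_SIZE);
    return bp->inflight;
}

/* Modify the live contents; never waits for the in-flight write. */
void xbuf_modify(struct xbuf *bp, size_t off, char c)
{
    assert(bp != NULL);
    if (off < XBUF_SIZE)
        bp->data[off] = c;
}

/* Write completion: the private copy is no longer needed. */
void xbuf_write_done(struct xbuf *bp)
{
    assert(bp != NULL);
    free(bp->inflight);
    bp->inflight = NULL;
}
```

The cost is one copy per write instead of a page-protection change per
strategy call.  Whether that trade is a win system-wide is exactly the
open question; the sketch only shows that the writer is never blocked.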

>       The idea for the buffer cache is to shift its functionality to one that
>       is solely used to issue device I/O and to keep track of dirty areas for
>       proper sequencing of I/O (e.g. softupdate's use of the buffer cache 
>       to placemark I/O will not change).  The core buffer cache code would
>       no longer map things to KVM with b_data, that functionality would be
>       shifted to the VM Object vm_pager_*() API.  The buffer cache would
>       continue to use the b_pages[] array mechanism to collect pages for I/O,
>       for clustering, and so forth.

Keeping the current cluster code is a bad idea.  If the drivers were
taught how to traverse the linked list in the buf struct rather
than just notice "a big buffer", we could avoid a lot of page
twiddling and also allow for massive IO clustering ( > 64k ), because
the size of the b_pages[] array would no longer be the upper bound
on the number of buffers we can effectively issue a scatter/gather
on (the drivers must VTOPHYS them anyway).
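
A rough user-space sketch of what "traverse the linked list" could
look like from the driver side: walk a chain of buffers and emit a
scatter/gather list, merging neighbours that happen to be physically
contiguous.  The types and the function (struct cbuf, struct sg_seg,
cbuf_build_sg()) are hypothetical, and plain integers stand in for
the physical addresses a real driver would get from VTOPHYS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chained buffer; NOT the kernel's struct buf. */
struct cbuf {
    uintptr_t    addr;  /* stand-in for the buffer's physical address */
    size_t       len;   /* length in bytes */
    struct cbuf *next;  /* next buffer in the same I/O, or NULL */
};

/* One scatter/gather segment as a driver might program it. */
struct sg_seg {
    uintptr_t addr;
    size_t    len;
};

/*
 * Walk the chain and fill segs[].  Adjacent buffers that are contiguous
 * are merged into one segment.  Returns the number of segments used,
 * or -1 if more than maxsegs would be needed.
 */
int cbuf_build_sg(const struct cbuf *head, struct sg_seg *segs, int maxsegs)
{
    int n = 0;

    assert(maxsegs >= 0);
    for (const struct cbuf *bp = head; bp != NULL; bp = bp->next) {
        if (bp->len == 0)
            continue;               /* nothing to transfer */
        if (n > 0 && segs[n - 1].addr + segs[n - 1].len == bp->addr) {
            segs[n - 1].len += bp->len;   /* extend previous segment */
            continue;
        }
        if (n == maxsegs)
            return -1;              /* hardware segment limit exceeded */
        segs[n].addr = bp->addr;
        segs[n].len = bp->len;
        n++;
    }
    return n;
}
```

In this sketch the only limit on the size of the I/O is the number of
segments the hardware accepts (maxsegs), not the size of a fixed page
array, which is the point being argued above.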

To realize my "nfs super commit" stuff all we'd need to do is make
the max cluster size something like 0-1 and instantly get an almost
unbounded IO burst.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]

