From owner-freebsd-arch Mon Feb 5 13:54:17 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
    by hub.freebsd.org (Postfix) with ESMTP
    id 8F99237B401; Mon, 5 Feb 2001 13:53:56 -0800 (PST)
Received: (from bright@localhost)
    by fw.wintelcom.net (8.10.0/8.10.0) id f15LpnV12266;
    Mon, 5 Feb 2001 13:51:49 -0800 (PST)
Date: Mon, 5 Feb 2001 13:51:49 -0800
From: Alfred Perlstein
To: Poul-Henning Kamp
Cc: "Justin T. Gibbs", Randell Jesup, Matt Dillon, Matthew Jacob,
    Mike Smith, Dag-Erling Smorgrav, Dan Nelson, Seigo Tanimura,
    arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205135149.G26076@fw.wintelcom.net>
References: <20010205132152.E26076@fw.wintelcom.net> <28962.981408816@critter>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <28962.981408816@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:33:36PM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Poul-Henning Kamp [010205 13:33] wrote:
>
> >You're right, it's non-trivial, however the difference between
> >memory and disk speed is also non-trivial, almost every reasonable
> >algorithm should be considered to reduce/optimize disk traffic.
> >
> >A simple call into the VFS should be able to accomplish this; afaik,
> >when a VFS has a disk/physical backing it also hashes/sorts bufs
> >based on physical backing location. Although I may be remembering
> >stuff from 4.3BSD or 4.4BSD instead of the current code...
>
> It's not "a simple call".
>
> By the time you can make the call, you have passed through the
> target FS, through specfs and the disklabel/slice code, possibly
> through a layer like vinum and ccd (which may have their own ideas
> about clustering) and only then do you arrive at a place where you
> know the actual sector address of the request.
>
> We can quickly dismiss the ccd/vinum case by saying that they
> have to cater for the needs of the lower devices, and they
> specify the clustering policy "like any other disk".
>
> But you still have to contend with the diskslice/label code, and
> specfs, so even if you do an "upcall" and find more stuff you can
> read/write, you need to pass this bit of the request down through
> the specfs (for softupdates rollback/forward) and diskslice/label
> code (because you want boundary checking).
>
> And having tried that, I can say with 100% conviction: that is not
> a sane option, and if you do it anyway you will certainly not
> gain any performance by the time you have resolved all the locking
> issues.

Well, my impression was that all locking operations (except mutexes)
should be resolved by doing try_lockfoo(), and if the try_lock fails
then don't cluster that object/buf/vnode (as the current code does).

You are right though, I guess we don't need callbacks into the VFS;
this can be resolved with just the buffer system via flags and locks.

> Giving some kind of abstract hint from the driver/device and making
> the clustering optional for the driver is the only path which does
> not lead straight down to layering insanity.

I'm not sure I understand what you mean; my vision of the current
code is:

Kernel IO request triggered via FS/bufdaemon/etc
   | 1 buf
cluster_foo
   | 1-N bufs (in a pbuf)
device
   |
write

What I'd like to see (considering we don't need to really involve VFS)
is:

Kernel IO request triggered via FS/bufdaemon/etc
   | 1 buf
device ---------> cluster routine (A)
   |                      /
device <----------------/
   | 1-N bufs (linked list, no pbuf)
write

This way the device can call into any number of generic clustering
routines if it wants to support them.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
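
The try_lockfoo() idea and the "device calls a generic cluster routine"
flow from the message above can be sketched roughly as follows. This is
a minimal user-space illustration under stated assumptions, not kernel
code: struct xbuf, xbuf_trylock() and cluster_gather() are hypothetical
names invented for the example and do not correspond to the real
struct buf or to any existing FreeBSD clustering function. It only
gathers forward from the starting buf, and it assumes the candidates
are already sorted by physical block number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel buffer; NOT the real struct buf. */
struct xbuf {
    long         blkno;    /* starting physical block */
    long         nblks;    /* length in blocks */
    bool         locked;   /* toy lock: true while someone holds the buf */
    struct xbuf *cl_next;  /* next buf in the cluster chain (no pbuf) */
};

/* Non-blocking lock attempt: returns true only if we took the lock. */
static bool
xbuf_trylock(struct xbuf *bp)
{
    if (bp->locked)
        return false;
    bp->locked = true;
    return true;
}

/*
 * Generic clustering routine (hypothetical). The driver passes the one
 * buf it was handed (already locked by the caller) and a list of
 * candidate bufs sorted by blkno. Every candidate that is physically
 * contiguous with the run so far, and whose trylock succeeds, is linked
 * onto the chain. A busy or non-adjacent buf ends the run; we never
 * sleep waiting for a lock. Returns the number of bufs in the chain.
 */
static int
cluster_gather(struct xbuf *start, struct xbuf **cand, size_t ncand,
    int maxbufs)
{
    struct xbuf *tail = start;
    int n = 1;

    assert(start != NULL && start->locked);
    start->cl_next = NULL;

    for (size_t i = 0; i < ncand && n < maxbufs; i++) {
        struct xbuf *bp = cand[i];
        long want = tail->blkno + tail->nblks;

        if (bp->blkno < want)
            continue;   /* lies before the end of the run; skip it */
        if (bp->blkno > want)
            break;      /* gap: the contiguous run has ended */
        if (!xbuf_trylock(bp))
            break;      /* busy: do not wait, just stop clustering */

        bp->cl_next = NULL;
        tail->cl_next = bp;
        tail = bp;
        n++;
    }
    return n;
}
```

A driver that wants clustering would call cluster_gather() on the buf
it receives and then walk the cl_next chain to issue one larger
transfer; a driver that does not want it simply never makes the call.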