From owner-freebsd-arch  Mon Feb  5 13:54:17 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8F99237B401; Mon,  5 Feb 2001 13:53:56 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15LpnV12266;
	Mon, 5 Feb 2001 13:51:49 -0800 (PST)
Date: Mon, 5 Feb 2001 13:51:49 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205135149.G26076@fw.wintelcom.net>
References: <20010205132152.E26076@fw.wintelcom.net> <28962.981408816@critter>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <28962.981408816@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:33:36PM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Poul-Henning Kamp <phk@critter.freebsd.dk> [010205 13:33] wrote:
> 
> >You're right, it's non-trivial, however the difference between
> >memory and disk speed is also non-trivial, almost every reasonable
> >algorithm should be considered to reduce/optimize disk traffic.
> >
> >A simple call into the VFS should be able to accomplish this; afaik when
> >a VFS has a disk/physical backing it also hashes/sorts bufs based
> >on physical backing location.  Although I may be remembering stuff
> >from 4.3BSD or 4.4BSD instead of the current code...
> 
> It's not "a simple call".
> 
> By the time you can make the call, you have passed through the
> target FS, through specfs and the disklabel/slice code, possibly
> through a layer like vinum and ccd (which may have their own ideas
> about clustering) and only then do you arrive at a place where you
> know the actual sector address of the request.
> 
> We can quickly dismiss the ccd/vinum case by saying that they
> have to cater for the needs of the lower devices, and they
> specify the clustering policy "like any other disk".
> 
> But you still have to contend with the diskslice/label code, and
> specfs, so even if you do an "upcall" and find more stuff you can
> read/write, you need to pass this bit of the request down through
> the specfs (for softupdates rollback/forward) and diskslice/label
> code (because you want boundary checking).
> 
> And having tried that, I can say with 100% conviction: that is not
> a sane option, and if you do it anyway you will certainly not
> gain any performance by the time you have resolved all the locking
> issues.

Well, my impression was that all locking operations (except mutexes)
should be resolved by doing try_lockfoo(), and if the try_lock fails,
don't cluster that object/buf/vnode (as the current code does).
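
A minimal userland sketch of that try-lock idea, to make it concrete.
Everything here is hypothetical: struct xbuf, xbuf_trylock() and
cluster_collect() are stand-ins invented for illustration, not the real
struct buf or any existing kernel routine. The caller is assumed to
already hold the lock on bufs[0], and bufs[] is assumed sorted by block
number.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel buf; NOT the real struct buf. */
struct xbuf {
    atomic_flag lock;   /* set while somebody owns the buf */
    long        blkno;  /* physical block number */
    bool        dirty;  /* needs to be written */
};

static void xbuf_init(struct xbuf *bp, long blkno, bool dirty)
{
    atomic_flag_clear(&bp->lock);
    bp->blkno = blkno;
    bp->dirty = dirty;
}

/* Non-blocking lock attempt: true means we now own the buf. */
static bool xbuf_trylock(struct xbuf *bp)
{
    return !atomic_flag_test_and_set(&bp->lock);
}

static void xbuf_unlock(struct xbuf *bp)
{
    atomic_flag_clear(&bp->lock);
}

/*
 * Grow a cluster starting at bufs[0], which the caller has already
 * locked.  Stop at the first neighbour that is not physically adjacent,
 * cannot be locked without sleeping, or has nothing to write.
 * Returns the number of bufs in the cluster (always >= 1); every buf
 * counted is left locked for the caller.
 */
static size_t cluster_collect(struct xbuf *bufs, size_t n, size_t max)
{
    size_t count = 1;

    assert(n >= 1 && max >= 1);
    while (count < n && count < max) {
        struct xbuf *next = &bufs[count];

        if (next->blkno != bufs[count - 1].blkno + 1)
            break;              /* not contiguous on disk */
        if (!xbuf_trylock(next))
            break;              /* busy: just don't cluster it */
        if (!next->dirty) {
            xbuf_unlock(next);  /* nothing to write, give it back */
            break;
        }
        count++;
    }
    return count;
}
```

The point is only that a failed try-lock shortens the cluster instead of
blocking, so no lock ordering between bufs has to be worked out.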

You are right, though: I guess we don't need callbacks into the VFS;
this can be resolved within the buffer system alone, via flags and locks.

> Giving some kind of abstract hint from the driver/device and making
> the clustering optional for the driver is the only path which does
> not lead straight down to layering insanity.

I'm not sure I understand what you mean.  My picture of the current
code is:

  Kernel IO request triggered via FS/bufdaemon/etc
      | 1 buf
  cluster_foo
      | 1-N bufs (in a pbuf)
    device
      |
     write

What I'd like to see (given that we don't really need to involve
the VFS) is:

  Kernel IO request triggered via FS/bufdaemon/etc
      | 1 buf
     device  ---------> cluster routine (A)
      |                        /
     device  <----------------/
      |          1-N bufs (linked list, no pbuf)
     write

This way the device can call into any number of generic clustering
routines if it wants to support them.
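
Roughly, in code, the second diagram could look like the sketch below.
This is a userland illustration under loud assumptions: struct xbuf,
struct xdev, cluster_adjacent() and xdev_strategy() are hypothetical
names made up for this mail, not existing kernel interfaces, and the
"pending" list stands in for whatever queue of writable bufs the buffer
system would expose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buf: a block number plus a link used for chaining. */
struct xbuf {
    long         blkno;
    struct xbuf *next;
};

/* Signature of a generic clustering routine (box "A" in the diagram). */
typedef size_t (*cluster_fn)(struct xbuf *first, struct xbuf **pending,
                             size_t max);

/*
 * One possible generic routine: pull bufs that directly follow the
 * current tail on disk off the pending list and chain them behind
 * 'first'.  Returns the length of the resulting chain.
 */
size_t cluster_adjacent(struct xbuf *first, struct xbuf **pending,
                        size_t max)
{
    struct xbuf *tail = first;
    size_t count = 1;

    tail->next = NULL;
    while (count < max) {
        struct xbuf **pp = pending;

        /* Look for the buf whose block number follows the tail. */
        while (*pp != NULL && (*pp)->blkno != tail->blkno + 1)
            pp = &(*pp)->next;
        if (*pp == NULL)
            break;

        struct xbuf *found = *pp;
        *pp = found->next;      /* unlink from the pending list */
        found->next = NULL;
        tail->next = found;     /* append to the cluster chain */
        tail = found;
        count++;
    }
    return count;
}

/* Hypothetical per-device state: clustering is opt-in. */
struct xdev {
    cluster_fn cluster;         /* NULL means "don't cluster for me" */
    size_t     max_cluster;     /* most bufs the device wants at once */
};

/*
 * Device entry point: receives one buf, optionally asks a clustering
 * routine for more, and would then issue a single transfer for the
 * whole chain.  Returns how many bufs that transfer would cover.
 */
size_t xdev_strategy(struct xdev *dev, struct xbuf *bp,
                     struct xbuf **pending)
{
    size_t n = 1;

    bp->next = NULL;
    if (dev->cluster != NULL)
        n = dev->cluster(bp, pending, dev->max_cluster);
    /* A real driver would start the write for bp..tail here. */
    return n;
}
```

A driver that wants nothing to do with clustering leaves the pointer
NULL and sees exactly one buf per request, same as if the cluster layer
did not exist.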

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
