Date:      Mon, 5 Feb 2001 13:51:49 -0800
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc:        "Justin T. Gibbs" <gibbs@scsiguy.com>, Randell Jesup <rjesup@wgate.com>, Matt Dillon <dillon@earth.backplane.com>, Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>, Dan Nelson <dnelson@emsphone.com>, Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject:   Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID:  <20010205135149.G26076@fw.wintelcom.net>
In-Reply-To: <28962.981408816@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:33:36PM +0100
References:  <20010205132152.E26076@fw.wintelcom.net> <28962.981408816@critter>

* Poul-Henning Kamp <phk@critter.freebsd.dk> [010205 13:33] wrote:
> 
> >You're right, it's non-trivial, however the difference between
> >memory and disk speed is also non-trivial, almost every reasonable
> >algorithm should be considered to reduce/optimize disk traffic.
> >
> >A simple call into the VFS should be able to accomplish this; afaik,
> >when a VFS has a disk/physical backing it also hashes/sorts bufs
> >based on physical backing location.  Although I may be remembering
> >stuff from 4.3BSD or 4.4BSD instead of the current code...
> 
> It's not "a simple call".
> 
> By the time you can make the call, you have passed through the
> target FS, through specfs and the disklabel/slice code, possibly
> through a layer like vinum and ccd (which may have their own ideas
> about clustering) and only then do you arrive at a place where you
> know the actual sector address of the request.
> 
> We can quickly dismiss the ccd/vinum case by saying that they
> have to cater for the needs of the lower devices, and they
> specify the clustering policy "like any other disk".
> 
> But you still have to contend with the diskslice/label code, and
> specfs, so even if you do an "upcall" and find more stuff you can
> read/write, you need to pass this bit of the request down through
> the specfs (for softupdates rollback/forward) and diskslice/label
> code (because you want boundary checking).
> 
> And having tried that, I can say with 100% conviction: that is not
> a sane option, and if you do it anyway you will certainly not
> gain any performance by the time you have resolved all the locking
> issues.

Well, my impression was that all locking operations (except mutexes)
should be resolved by doing try_lockfoo() and if try_lock fails then
don't cluster that object/buf/vnode (as the current code does).
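
To make the try-lock idea concrete, here is a rough userland sketch,
not kernel code.  Everything in it is hypothetical: struct xbuf,
xbuf_trylock() and cluster_gather() are made-up names, and a C11
atomic_flag stands in for whatever sleepable lock the real buf uses.
It also assumes a cluster must stay physically contiguous, so the run
simply ends at the first buf that is busy or not adjacent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel buf; not the real struct buf. */
struct xbuf {
    atomic_flag busy;   /* set while somebody owns this buf */
    long        blkno;  /* physical block number */
};

/* Non-blocking lock attempt: returns true if we now own the buf. */
static bool xbuf_trylock(struct xbuf *bp)
{
    return !atomic_flag_test_and_set(&bp->busy);
}

static void xbuf_unlock(struct xbuf *bp)
{
    atomic_flag_clear(&bp->busy);
}

/*
 * Collect a contiguous run starting at cand[0].  We never sleep on a
 * lock: the run ends at the first buf that is not adjacent to the
 * previous one or that cannot be locked right now.
 * Returns the number of bufs placed in out[], all locked by the caller.
 */
static size_t cluster_gather(struct xbuf *cand, size_t n, struct xbuf **out)
{
    size_t got = 0;

    assert(cand != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (got > 0 && cand[i].blkno != out[got - 1]->blkno + 1)
            break;              /* not adjacent, cluster ends here */
        if (!xbuf_trylock(&cand[i]))
            break;              /* busy elsewhere, don't cluster it */
        out[got++] = &cand[i];
    }
    return got;
}
```

The caller issues the I/O for out[0..got-1] and then calls
xbuf_unlock() on each; a buf that was skipped just goes out later on
its own.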

You are right though, I guess we don't need callbacks into the VFS,
this can be resolved with just the buffer system via flags and locks.

> Giving some kind of abstract hint from the driver/device and making
> the clustering optional for the driver is the only path which does
> not lead straight down to layering insanity.

I'm not sure I understand what you mean.  My picture of the current
code is:

  Kernel IO request triggered via FS/bufdaemon/etc
      | 1 buf
  cluster_foo
      | 1-N bufs (in a pbuf)
    device
      |
     write

What I'd like to see (considering we don't need to really involve
VFS) is:

  Kernel IO request triggered via FS/bufdaemon/etc
      | 1 buf
     device  ---------> cluster routine (A)
      |                        /
     device  <----------------/
      |          1-N bufs (linked list, no pbuf)
     write

This way the device can call into any number of generic clustering
routines if it wants to support them.
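
A rough sketch of that second picture, again in userland C with
made-up names (struct cbuf, struct cdev, cluster_adjacent() and
cdev_strategy() are all hypothetical, not existing kernel
interfaces).  The device gets one buf, optionally calls a generic
clustering routine, and gets back a plain linked list of bufs rather
than a pbuf.  The "write" here only counts blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buf with a chain link; not the real struct buf. */
struct cbuf {
    long         blkno;  /* physical block number */
    struct cbuf *next;   /* next buf in the cluster, or NULL */
};

/* Signature of a generic clustering routine, (A) in the diagram. */
typedef size_t (*cluster_fn)(struct cbuf *head, struct cbuf *pending,
                             size_t npending);

/* One possible routine: chain pending bufs that directly follow head. */
static size_t cluster_adjacent(struct cbuf *head, struct cbuf *pending,
                               size_t npending)
{
    struct cbuf *tail = head;
    size_t count = 1;

    head->next = NULL;
    for (size_t i = 0; i < npending; i++) {
        if (pending[i].blkno != tail->blkno + 1)
            break;                  /* gap, stop extending */
        pending[i].next = NULL;
        tail->next = &pending[i];
        tail = &pending[i];
        count++;
    }
    return count;
}

/* Hypothetical device: clustering is opt-in via the function pointer. */
struct cdev {
    cluster_fn cluster;         /* NULL means "no clustering please" */
    long       blocks_written;  /* stands in for the actual write */
};

/* Device entry point: takes 1 buf, writes 1-N bufs. */
static size_t cdev_strategy(struct cdev *dev, struct cbuf *bp,
                            struct cbuf *pending, size_t npending)
{
    size_t count = 1;

    assert(dev != NULL && bp != NULL);
    bp->next = NULL;
    if (dev->cluster != NULL)
        count = dev->cluster(bp, pending, npending);
    for (struct cbuf *p = bp; p != NULL; p = p->next)
        dev->blocks_written++;      /* "write" each buf in the chain */
    return count;
}
```

A driver that doesn't want clustering leaves the pointer NULL and
behaves exactly as before; nothing above the device has to know
either way.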

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
