From owner-freebsd-arch  Mon Feb  5 12:47:44 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8327237B491; Mon,  5 Feb 2001 12:47:25 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15Kl7S09686;
	Mon, 5 Feb 2001 12:47:07 -0800 (PST)
Date: Mon, 5 Feb 2001 12:47:07 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205124707.Y26076@fw.wintelcom.net>
References: <ybuelxdnik5.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net> <200102052006.f15K6bO49659@aslan.scsiguy.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200102052006.f15K6bO49659@aslan.scsiguy.com>; from gibbs@scsiguy.com on Mon, Feb 05, 2001 at 01:06:37PM -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Justin T. Gibbs <gibbs@scsiguy.com> [010205 12:08] wrote:
> >>    (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
> >>	to an array.  Include the original static array under another name
> >>	for compatibility purposes and have the init code default to 
> >>	assigning b_pages to the original embedded static array.
> >>
> >>	Then the physio code could be adjusted to dynamically MALLOC the
> >>	necessary pages array if the static one in the supplied buffer is
> >>	insufficient.
> >
> >        So, how reasonable is this?  It seems like a pretty good solution,
> >but I'm far from up-to-speed on the internals here.
> 
> I'd rather allow bufs (or bios) to be chained and let the block devices
> decide how to break them up.  This simplifies the clustering code too
> as you avoid all of the VM operations to combine bufs into a single cluster
> buf.

One of Poul-Henning's suggestions was to have the device specify
its optimal clustering strategy, including the bounds and sizes it
wants.

For instance, an NFS commit request could be megabytes in size,
while an NFS write may not want any clustering at all.

A RAID request might want to ask for a megabyte of data, but have
it confined to a particular range at the device level.
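
A rough sketch of what such a per-device hint might look like, in C.
Everything here is hypothetical: `struct cluster_hint`, its fields and
`cluster_clamp()` are made-up names, not existing kernel interfaces, and
treating the RAID "range" as an alignment boundary the cluster must not
cross is my assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-device clustering hint.  Nothing here is an
 * existing FreeBSD interface.
 */
struct cluster_hint {
	size_t	max_size;	/* largest cluster wanted, in bytes; 0 = none */
	size_t	align;		/* boundary a cluster must not cross; 0 = any */
};

/*
 * Return how many bytes of the proposed transfer [offset, offset + len)
 * should go out as one cluster under the given hint.
 */
size_t
cluster_clamp(const struct cluster_hint *h, size_t offset, size_t len,
    size_t blksize)
{
	size_t limit, to_boundary;

	assert(h != NULL && blksize > 0);

	/* No clustering wanted: issue at most a single block. */
	if (h->max_size == 0)
		return (len < blksize ? len : blksize);

	limit = h->max_size;
	if (h->align != 0) {
		/* Bytes left before the next alignment boundary. */
		to_boundary = h->align - (offset % h->align);
		if (to_boundary < limit)
			limit = to_boundary;
	}
	return (len < limit ? len : limit);
}
```

With this, an NFS commit could advertise a multi-megabyte max_size, an
NFS write a max_size of 0, and a RAID device a one megabyte max_size
with a matching align.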

Currently (I think) we only cluster based on logical file offsets.
It would be interesting to allow drivers to do callbacks into the
FS to ask for blocks physically adjacent to the blocks being written.

This matters because the blocks behind a 64k range of any file may
actually be scattered anywhere on the disk.  Even though UFS tries to
reduce fragmentation, in the worst case we do the VM ops to cluster
blocks that are not physically contiguous.
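
To make the logical vs. physical distinction concrete, here is a small
sketch in C.  It assumes a made-up array `phys[]` giving the device block
number behind each logical block of a file; `phys_run_length()` is a
hypothetical helper, not an existing routine.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: phys[i] is the device block number backing
 * logical block i of a file.  Return how many logical blocks, starting
 * at 'start', are also adjacent on the device.
 */
size_t
phys_run_length(const long *phys, size_t nblocks, size_t start)
{
	size_t n;

	assert(phys != NULL);

	if (start >= nblocks)
		return (0);

	/* Extend the run while each block follows its predecessor on disk. */
	n = 1;
	while (start + n < nblocks && phys[start + n] == phys[start] + (long)n)
		n++;
	return (n);
}
```

Eight logically contiguous blocks that map to device blocks 100-102,
500-501 and 900-902 are three physical runs, so clustering all eight
into one buffer buys nothing at the device level.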

I think the simplest way to do this would be to rip out the current
clustering code and provide helper routines for the devices to get
adjacent blocks, either logically via VOP or physically via some VFS
mechanism.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

