From owner-freebsd-arch@FreeBSD.ORG Sun Jan 4 03:11:59 2004
Date: Sun, 4 Jan 2004 22:11:51 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Scott Long
cc: arch@freebsd.org
Subject: Re: Simple patch: Make DFLTPHYS an option
Message-ID: <20040104211704.O582@gamplex.bde.org>
In-Reply-To: <3FF7BD89.4080406@freebsd.org>
References: <20040103.153644.107852018.imp@bsdimp.com>
 <3FF7967A.1090401@freebsd.org> <3FF7BD89.4080406@freebsd.org>

On Sun, 4 Jan 2004, Scott Long wrote:

> Bruce Evans wrote:
> > On Sat, 3 Jan 2004, Scott Long wrote:
> >
> >> The key, though, is to ensure that the block system is actually
> >> honoring the per-device disk.d_maxsize variable.  I'm not sure if
> >> it is right now.
> >
> > It at least used to work (up to MAXPHYS).  The ad driver used a max
> > i/o size of 128K until recently.  This has rotted back to 64K for
> > some reason (64K is encoded as DFLTPHYS in the non-dma case and as
> > 64 * 1024 in the dma case).
>
> I've seen evidence lately that this might be broken, but I need to
> track it down further.

Do you mean sizes other than DFLTPHYS, or the ad driver?  For ad, I
remember seeing the commit that reduced the size, but I couldn't find
it easily.  It seems to have been just the big ATAng commit.

I don't know of any problems with i/o size maxes different from the
defaults, except for the one in spec_getpages().  I/O sizes of up to
(VM_INITIAL_PAGEIN * PAGE_SIZE) bytes must work for disk devices,
since spec_getpages() doesn't honor dev->si_iosize_max.  This value
accidentally defaults to the same value as DFLTPHYS on machines with
4K pages and to the same value as MAXPHYS on machines with 8K pages.
Thus the "maximum" given by dev->si_iosize_max cannot actually be the
maximum on any machine if it is < DFLTPHYS, and the usual default of
DFLTPHYS is never the actual maximum on non-broken machines with 8K
pages.  Most disk drivers handle this by splitting up large i/o's
into smaller ones internally.

physio() does similar splitting (based on si_iosize_max), so
si_iosize_max is not very useful for disks.  physio() would do better
just to split up based on MAXPHYS (since large sizes only occur if
the user requests them).  Clustering may benefit from using a smaller
size (since a smaller size may actually be better, and users can't
control it).  physio() needs si_iosize_max mainly to avoid wrong
splitting for non-disk devices (mainly tapes).
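To make the splitting concrete, here is a minimal userland sketch of
the loop (only a sketch: the real splitting lives in physio() and the
drivers, and "max_io" and "do_transfer" are made-up names standing in
for si_iosize_max and the actual i/o path):

#include <stddef.h>
#include <stdio.h>

/*
 * Sketch of splitting one large transfer into chunks no larger than
 * the device's advertised maximum (si_iosize_max in the kernel;
 * "max_io" here is a stand-in).
 */
static void
do_transfer(size_t offset, size_t resid, size_t max_io)
{
        size_t chunk;

        while (resid > 0) {
                chunk = resid < max_io ? resid : max_io;
                /* The kernel would issue one i/o of "chunk" bytes here. */
                printf("i/o at offset %zu, size %zu\n", offset, chunk);
                offset += chunk;
                resid -= chunk;
        }
}

int
main(void)
{
        /* A 1M raw read against a device claiming a 64K (DFLTPHYS) max. */
        do_transfer(0, 1024 * 1024, 64 * 1024);
        return (0);
}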
> >> Also, increasing MAXPHYS will lead to your KVA being chewed up
> >> quite quickly, which in turn will lead to unpleasant panics.  A
> >> lot of work needs to go in to fixing this; increasing the value
> >> here has little value even to people who shun seatbelts.
> >
> > Not all that quickly.  MAXPHYS affects mainly pbufs, and there are
> > a limited number of them (256 max?), and their kva is statically
> > allocated.  256 times the current MAXPHYS gives 16M.  This could
> > easily be increased by a factor of up to about 8 without
> > necessarily breaking things, e.g., by stealing 112MB from buffer
> > kva using VM_BCACHE_SIZE if the default normal-buffer kva size is
> > large.  (If it is small, then there should be kva to spare anyway;
> > otherwise there would be no space to spare on systems with more
> > RAM, where the buffer kva size is larger.)
>
> VFS, softupdates, UFS_DIRHASH, etc, all contribute to KVA being
> eaten faster than it used to be.

Don't use them then :-).  (I mostly don't.)

> Even with smarter tuning of common culprits like maxvnodes, KVA is
> still under a lot of pressure.

This depends on the memory size.  I use VM_BCACHE_SIZE = 512M and
have no problems fitting everything else in the remaining 512M, on a
machine with 1GB.  With more physical memory, it becomes harder to
fit everything in without kludges.  (The default BKVASIZE and
VM_BCACHE_SIZE are already kludged to take 1/4 as much space as they
should, although this is not necessary on machines with little
physical memory or with more KVA than i386's have.)

Bruce
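PS: for concreteness, the pbuf arithmetic above as a sketch.  The
constants are just the assumptions that reproduce the 16M figure (256
pbufs at 64K of kva each); substitute the real pbuf count and MAXPHYS
for the tree at hand.

#include <stdio.h>

/* Assumed constants, chosen to reproduce the 16M figure above. */
#define NPBUF           256             /* assumed cap on pbufs */
#define PBUF_KVA        (64 * 1024)     /* assumed kva per pbuf */

int
main(void)
{
        printf("static pbuf kva now:  %dM\n",
            NPBUF * PBUF_KVA / (1024 * 1024));
        /* 128M total, i.e. 112M more to steal from buffer kva. */
        printf("after an 8x increase: %dM\n",
            NPBUF * 8 * PBUF_KVA / (1024 * 1024));
        return (0);
}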