From owner-freebsd-commit  Thu Oct 26 14:33:33 1995
Return-Path: owner-commit
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id OAA27692
          for freebsd-commit-outgoing; Thu, 26 Oct 1995 14:33:33 -0700
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id OAA27674
          for cvs-all-outgoing; Thu, 26 Oct 1995 14:33:28 -0700
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id OAA27663
          for cvs-sys-outgoing; Thu, 26 Oct 1995 14:33:24 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id OAA27633
          ; Thu, 26 Oct 1995 14:32:54 -0700
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA21688; Thu, 26 Oct 1995 14:24:07 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199510262124.OAA21688@phaeton.artisoft.com>
Subject: Re: SYSCALL IDEAS [Was: cvs commit: src/sys/kern sysv_msg.c sysv_sem.c sysv_shm.c]
To: dyson@freefall.freebsd.org (John Dyson)
Date: Thu, 26 Oct 1995 14:24:07 -0700 (MST)
Cc: terry@lambert.org, bde@zeta.org.au, CVS-commiters@freefall.freebsd.org,
        bde@freefall.freebsd.org, cvs-sys@freefall.freebsd.org,
        hackers@freebsd.org, swallace@ece.uci.edu
In-Reply-To: <199510261356.GAA06844@freefall.freebsd.org> from "John Dyson" at Oct 26, 95 06:56:19 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 4035      
Sender: owner-commit@freebsd.org
Precedence: bulk

> The VM limitations are due to extreme performance hits if we have to
> go to a (long long) type representation of a page.  I propose that  we
> (in fact we will) move from page offsets to page indexes.  For a 32 bit
> machine, it gives us at least 8TB (and probably more with sign extension
> fixes) for file/filesystem sizes.  For a 64 bit machine it would give us
> even more (by a factor of 4,000,000).  As far as the API is concerned,
> we can use whatever we want -- long, long long, or whatever.  We will
> just have that terrible, limiting capability of supporting only 8TB files on
> 32 bit machines with 4K pagesizes.

I think we have to work in the scope of device block addressing, which is
2^31 * 512 or 1TB, as opposed to page addressing at 4TB.  The 8TB figure,
I think, represents the value without the use of the "improved" UFS
indirect block handling that came with 4.4.
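
To make the arithmetic concrete, here is a minimal sketch in C.  It
assumes a signed 32-bit index, so 2^31 usable values, counting fixed-size
units (512-byte device blocks or 4K pages).  The helper names
reach_signed32() and page_index() are made up for illustration and are
not from any kernel source; other assumptions about index width give
other figures.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes reachable when a signed 32-bit index (2^31 usable values) counts
 * units of unit_size bytes.  Hypothetical helper, for illustration only.
 */
static uint64_t
reach_signed32(uint64_t unit_size)
{
	assert(unit_size > 0);
	return ((uint64_t)1 << 31) * unit_size;
}

/* Index of the page that holds a given 64-bit byte offset. */
static uint64_t
page_index(uint64_t byte_off, uint64_t page_size)
{
	assert(page_size > 0);
	return byte_off / page_size;
}
```

Under these assumptions reach_signed32(512) is 2^40 bytes (1TB) and
reach_signed32(4096) is 2^43 bytes (the 8TB quoted above).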

There are a number of ideas (like page-anonymity-based page protection)
which can't be implemented without a large statistical range for the
hash -- typically larger than 32 bits, with machine memory sizes running
into the hundreds of MB.

For the Alpha, at least (a nominally 64-bit machine), the address range
for real memory is restricted to less than 64 bits.

The problem arising in all these cases is that the buffer cache mapping
of file data for large ranges requires a linear relationship throughout
the file, instead of a smaller linear "window" onto the file.

Fixing this would require a domain/range-based offset + length mapping,
allowing multiple mapping windows per file, to address the issue
completely satisfactorily for 64-bit block and file offsets on 32-bit
Intel architectures.

Such an approach would not have the "long long" drawbacks that would
otherwise be introduced, though there would be *some* (lesser) overhead
involved.
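
One way such an offset + length window could look is sketched below in
C.  This is an illustration under assumptions, not code from any tree:
struct map_window and window_lookup() are hypothetical names, each window
is assumed to cover less than 4G of the file, and a linear search stands
in for whatever lookup structure a real implementation would use.  The
point is that only the window base is 64 bits wide; everything inside a
window is addressed with a 32-bit relative offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping window: a contiguous extent of the file (hypothetical). */
struct map_window {
	uint64_t file_base;	/* 64-bit byte offset of window start */
	uint32_t length;	/* window length in bytes, under 4G */
};

/*
 * Find the window covering file_off.  On success, return the window's
 * index and store the 32-bit offset relative to the window start in
 * *rel.  Return -1 if no window covers the offset.
 */
static int
window_lookup(const struct map_window *w, size_t n, uint64_t file_off,
    uint32_t *rel)
{
	size_t i;

	assert(w != NULL && rel != NULL);
	for (i = 0; i < n; i++) {
		if (file_off >= w[i].file_base &&
		    file_off - w[i].file_base < w[i].length) {
			*rel = (uint32_t)(file_off - w[i].file_base);
			return (int)i;
		}
	}
	return -1;
}
```

The extra lookup per access is the kind of (lesser) overhead meant
above; the 64-bit compare happens once per window lookup rather than on
every page operation.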

> We haven't worked on the sign extension problems, because simply we do not
> support files > 4GB (or is it 2GB???) period right now.  I don't believe that
> there is a problem with block devices (we do NOT use vmio for those.)  But
> additionally we do not support mmaping them right now.

It's 2G of file, 1T of file system at present, with a single 32-bit
sector offset for a max of 8G based on the DOS partitioning and disklabel
issues.  I.e., a very big partition is allowable, but it must start in
the 0-8G range, and disklabel limits the length.

I'd like to address the 2G file size problem.  The 2G limit currently
makes the use of quad off_t's in our internal interfaces a laughable
and gratuitous barrier to source compatibility with legacy code (the
only kind we have, unless you know about a commercial venture that I
don't).

> The changes have been so vast that there has been significant ugliness
> added to the code.  That is being worked on, and I suggest that if there
> are some architectural problems that you see -- 'corrected' code would be
> helpful.  Note also some sort of performance analysis and architectural
> impact review is desirable.  All I can say is that I spent nearly a year
> working on the most horrible OS code that I ever saw -- SVR4, and I don't
> want us to go down the low performance path that they did.  They got both
> the hackery and low performance.   At least we are working on cleaning up
> the hackery aspects, including that which was inherited from Mach (because
> of the differences in the philosophy -- Mach VM and the original BSD port
> was certainly interesting.)

I agree with all of this.

It seems that the most interesting places to work are the boundaries,
and right now the 2G file size limit is one of those, at least for
me.

At the very least, I'd like to see the system limits on VM mappable
space go away as part of the necessary changes.

Has any consideration been given to pulling in the NetBSD non-vmio based
changes, or to making the vmio/non-vmio switch a bit smoother and less
intrusive?


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.