FreeBSD Mail Archives

Date:      Mon, 22 Apr 2002 02:15:00 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        "Marc G. Fournier" <scrappy@hub.org>
Cc:        freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Message-ID:  <3CC3D494.649C2A8E@mindspring.com>
References:  <20020421191440.J1721-100000@mail1.hub.org>

"Marc G. Fournier" wrote:
> On Sun, 21 Apr 2002, Terry Lambert wrote:
> > No, there's no stats collected on this stuff, because it's a pretty
> > obvious and straight-forward thing: you have to have a KVA space large
> > enough that, once you subtract out 4K for each 4M of physical memory and
> > swap (max 4G total for both), you end up with memory left over for the
> > kernel to use, and your limits are such that you don't run out of
> > PTEs before you run out of mbufs (or whatever you plan on allocating).
> 
> God, I'm glad it's straightforward :)
> 
> Okay, first off, you say "(max 4G total for both)" ... do you max *total*
> between the two, or phy can be 4g *plus* swap can be 4g for a total of 8g?

You aren't going to be able to exceed 4G, no matter what you do,
because that's the limit of your address space.

If you want more, then you need to use a 64 bit processor (or use a
processor that supports bank selection, and hack up FreeBSD to do
bank swapping on 2G at a time, just like Linux has been hacked up,
and expect that it won't be very useful).

If you are swapping, you are demand paging.

The way demand paging works is that you reference a page that has
been swapped out, or for which physical memory backing store has
not been assigned.

When you make this reference, you get a page not present fault (a
trap 12).  The trap handler puts the faulting process to sleep,
and then starts the process of pulling the page in from backing
store (if it's not a create-on-reference), which, among other
things, locates a physical page to contain the copy of the data
pulled in from the backing store (or zero'ed out of physical memory,
if it's an unbacked page, e.g. non-swappable, or swappable, but for
which swap has not yet been allocated, because it's the first use).
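
As a rough illustration of the fault path just described, the following
is a toy model in Python, not FreeBSD's VM code.  The names (ToyPager,
touch, swap_out) are made up for the example, and it leaves out putting
the faulting process to sleep and picking a victim page when no frame is
free.

```python
PAGE_SIZE = 4096  # 4K pages, as on i386


class ToyPager:
    """Toy demand pager: hypothetical names, illustration only."""

    def __init__(self, nframes):
        self.free_frames = list(range(nframes))  # unused physical frames
        self.resident = {}  # virtual page number -> physical frame
        self.memory = {}    # physical frame -> page contents
        self.swap = {}      # virtual page number -> swapped-out contents
        self.faults = 0     # count of "page not present" faults

    def swap_out(self, vpn):
        """Move a resident page to backing store and free its frame."""
        frame = self.resident.pop(vpn)
        self.swap[vpn] = self.memory.pop(frame)
        self.free_frames.append(frame)

    def touch(self, vpn):
        """Reference a page, faulting it in if it is not present."""
        if vpn in self.resident:
            return self.memory[self.resident[vpn]]
        self.faults += 1
        if not self.free_frames:
            raise MemoryError("no free frame; a real pager would evict one")
        frame = self.free_frames.pop()
        if vpn in self.swap:
            # Page was swapped out: pull the copy in from backing store.
            self.memory[frame] = self.swap.pop(vpn)
        else:
            # First use (create-on-reference): hand out a zeroed page.
            self.memory[frame] = bytearray(PAGE_SIZE)
        self.resident[vpn] = frame
        return self.memory[frame]


if __name__ == "__main__":
    pager = ToyPager(nframes=2)
    page = pager.touch(7)   # first reference: zero-filled page
    page[0] = 0x42
    pager.swap_out(7)       # page goes to backing store
    print(pager.touch(7)[0] == 0x42, pager.faults)
```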

Only certain types of kernel memory are swappable -- mostly kernel
memory that's allocated on a per process basis.  Kernel swapping
really does you no good, if you have a fully populated physical
memory in the virtual address space, since there's only one kernel
virtual address space (SMP reserves a little bit of per processor
memory, but the amount is tiny: one page descriptor's worth: 4M);
after a certain point, your KVA is committed, and it's a mistake to
have it compete in the same LRU domain as processes.  You can't
really avoid that, for the most part, since there's a shared TLB
cache that you really don't have opportunity to manage, other than
by separating 4M vs. 4K pages (and 2M, etc., for the Pentium Pro,
though variable page granularity is not supported in FreeBSD, since
it's not common to most hardware people actually have).

> For instance, right now, I have 3Gig of physical and ~3gig of swap
> allocated ...

Each process maintains its own virtual address space.  Almost all
of a process's virtual address space is swappable.  So if you are
swapping, it's going to be process address space: UVA, not KVA.

If you increase the KVA, then you will decrease the UVA available to
user processes.  The total of the two cannot exceed 4G.
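
The trade-off is plain subtraction.  A minimal sketch, assuming a 32-bit
address space shared between kernel and user; uva_size is a hypothetical
helper name, not anything in FreeBSD:

```python
GIB = 1 << 30
ADDRESS_SPACE = 1 << 32  # 32-bit virtual address space: 4G


def uva_size(kva_bytes):
    """Return the user address space left after reserving kva_bytes of KVA."""
    if not 0 <= kva_bytes <= ADDRESS_SPACE:
        raise ValueError("KVA must fit inside the 4G address space")
    return ADDRESS_SPACE - kva_bytes


if __name__ == "__main__":
    for kva in (1 * GIB, 2 * GIB, 5 * GIB // 2, 3 * GIB):
        print(f"KVA {kva / GIB:.1f}G -> UVA {uva_size(kva) / GIB:.1f}G")
```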

With 4G of physical memory, 3G of KVA is practically a
requirement, particularly if you intend to use the additional memory
for kernel data (you will have to, for PDE's: you have no choice).
For 3G, it's ~2.5G KVA minimally required.  Personally, I'd just
put it at 3G, and live with it, so you can throw in RAM to your limit
later, when you decide you need to throw RAM at some problem or other.
If you can't afford for the UVA to be as small as 1G, then you are
going to have to make some hard decisions on the amount of physical
RAM you put in the machine.

It's not really that bad: for 3G of KVA, you need 3M for PDE's.  The
problem comes when they are exhausted because of the amount of PDE's
you have lying around to describe UVA pages that are swapped out for
various processes, and for kernel memory requirements that go way up
when you crank up the kernel's ability to handle load (e.g. for network
equipment, I generally take half of physical memory for mbufs, mostly
because that's around the limit of what I can take, and have anything
left over).
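
The 3M figure follows from the 4K-per-4M rule: on non-PAE i386, one 4K
page table holds 1024 entries, each mapping a 4K page, so one table
covers 4M.  A small sketch of that arithmetic (page_table_bytes is a
hypothetical helper; it counts only the page tables themselves, not the
page directory or anything else the kernel allocates):

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30

PAGE_TABLE_BYTES = 4 * KIB  # one page table occupies one 4K page
PAGE_TABLE_SPAN = 4 * MIB   # 1024 entries x 4K pages mapped per table


def page_table_bytes(mapped_bytes):
    """Bytes of page tables needed to map mapped_bytes of address space."""
    tables = -(-mapped_bytes // PAGE_TABLE_SPAN)  # ceiling division
    return tables * PAGE_TABLE_BYTES


if __name__ == "__main__":
    for size in (1 * GIB, 3 * GIB, 4 * GIB):
        print(f"{size // GIB}G mapped -> {page_table_bytes(size) // MIB}M of page tables")
```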

That you are using System V shared memory segments is *REALLY* going to
hurt you; each of these shared memory segments comes out of the KVA, so
using shared memory segments with the shm*() calls, rather than using
mmap()'ed files as backing store, can eat huge chunks of KVA, as well
as fragmenting the KVA, particularly over time.
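
For comparison, a file-backed shared mapping looks roughly like the
sketch below.  It uses Python's standard mmap module, which wraps
mmap(2) (the flags and prot keywords are Unix-only).  The function name
and file name are invented for the example, and two mappings inside one
process stand in for two cooperating processes.  It shows only the
mechanics of the alternative, not a measurement of KVA use.

```python
import mmap
import os
import tempfile


def shared_file_roundtrip(path, size=4096):
    """Write through one MAP_SHARED mapping, read back through another."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)  # size the backing file before mapping it
        writer = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE)
        reader = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mappings keep their own reference to the file
    try:
        writer[:5] = b"hello"
        return bytes(reader[:5])
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shared_file_roundtrip(os.path.join(tmp, "segment")))
```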

For more details on paged memory management on x86, see:

	Protected Mode Software Architecture

and:

	The Indispensable PC Hardware Book

You might also want to find a book on bootstrapping protected mode
operating systems (actually, I have yet to find a very good one,
so post about it, if you find one).

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3CC3D494.649C2A8E>
