From: Terry Lambert
Date: Mon, 22 Apr 2002 02:15:00 -0700
To: "Marc G. Fournier"
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Message-ID: <3CC3D494.649C2A8E@mindspring.com>
References: <20020421191440.J1721-100000@mail1.hub.org>

"Marc G. Fournier" wrote:
> On Sun, 21 Apr 2002, Terry Lambert wrote:
> > No, there's no stats collected on this stuff, because it's a pretty
> > obvious and straight-forward thing: you have to have a KVA space large
> > enough that, once you subtract out 4K for each 4M of physical memory and
> > swap (max 4G total for both), you end up with memory left over for the
> > kernel to use, and your limits are such that the you don't run out of
> > PTEs before you run out of mbufs (or whatever you plan on allocating).
>
> God, I'm glad its straightforwards :)
>
> Okay, first off, you say "(max 4G total for both)" ...
> do you max *total* between the two, or phy can be 4g *plus* swap can be 4g
> for a total of 8g?

You aren't going to be able to exceed 4G, no matter what you do,
because that's the limit of your address space.

If you want more, then you need to use a 64 bit processor (or use a
processor that supports bank selection, and hack up FreeBSD to do
bank swapping on 2G at a time, just like Linux has been hacked up,
and expect that it won't be very useful).

If you are swapping, you are demand paging.

The way demand paging works is that you reference a page that has
been swapped out, or for which physical memory backing store has
not been assigned.

When you make this reference, you get a page not present fault (a
trap 12). The trap handler puts the faulting process to sleep, and
then starts the process of pulling the page in from backing store
(if it's not a create-on-reference), which, among other things,
locates a physical page to contain the copy of the data pulled in
from the backing store (or zeroed out of physical memory, if it's
an unbacked page, e.g. non-swappable, or swappable, but for which
swap has not yet been allocated, because it's the first use).

Only certain types of kernel memory are swappable -- mostly kernel
memory that's allocated on a per process basis.

Kernel swapping really does you no good, if you have a fully
populated physical memory in the virtual address space, since
there's only one kernel virtual address space (SMP reserves a
little bit of per processor memory, but the amount is tiny: one
page descriptor's worth: 4M); after a certain point, your KVA is
committed, and it's a mistake to have it compete in the same LRU
domain as processes.

You can't really avoid that, for the most part, since there's a
shared TLB cache that you really don't have the opportunity to
manage, other than by separating 4M vs.
4K pages (and 2M, etc., for the Pentium Pro, though variable page
granularity is not supported in FreeBSD, since it's not common to
most hardware people actually have).

> For instance, right now, I have 3Gig of physical and ~3gig of swap
> allocated ...

Each process maintains its own virtual address space. Almost all
of a process's virtual address space is swappable. So if you are
swapping, it's going to be process address space: UVA, not KVA.

If you increase the KVA, then you will decrease the UVA available
to user processes. The total of the two cannot exceed 4G.

With 4G of physical memory, 3G of KVA is practically a requirement,
particularly if you intend to use the additional memory for kernel
data (you will have to, for PDE's: you have no choice).

For 3G of physical memory, ~2.5G of KVA is the minimum required.
Personally, I'd just put it at 3G, and live with it, so you can
throw in RAM to your limit later, when you decide you need to
throw RAM at some problem or other.

If you can't afford for the UVA to be as small as 1G, then you are
going to have to make some hard decisions on the amount of physical
RAM you put in the machine.

It's not really that bad: for 3G of KVA, you need 3M for PDE's.

The problem comes when they are exhausted because of the number of
PDE's you have lying around to describe UVA pages that are swapped
out for various processes, and for kernel memory requirements that
go way up when you crank up the kernel's ability to handle load
(e.g. for network equipment, I generally take half of physical
memory for mbufs, mostly because that's around the limit of what I
can take, and have anything left over).

That you are using System V shared memory segments is *REALLY*
going to hurt you; each of these shared memory segments comes out
of the KVA, so using shared memory segments with the shm*() calls,
rather than using mmap()'ed files as backing store, can eat huge
chunks of KVA, as well as fragmenting the KVA, particularly over
time.
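The "4K for each 4M" and "3M for 3G of KVA" figures above can be checked
with a few lines of arithmetic. This is only a rough sketch in Python, not
FreeBSD code: it assumes plain 32-bit x86 paging (4K pages, 1024 4-byte
entries per 4K page table), and the names page_table_bytes() and uva_left()
are made up for the illustration.

```python
# Rough 32-bit x86 (non-PAE) page-table arithmetic.
# Assumptions: 4K pages, 1024 four-byte entries per 4K page table.
# The function names below are hypothetical, not kernel interfaces.
PAGE_SIZE = 4 * 1024                          # one 4K page
PTES_PER_TABLE = 1024                         # entries in one page table
SPAN_PER_TABLE = PAGE_SIZE * PTES_PER_TABLE   # 4M mapped per page table
ADDRESS_SPACE = 4 * 1024**3                   # 32-bit address space: 4G


def page_table_bytes(mapped_bytes):
    """Bytes of page tables needed to map `mapped_bytes` with 4K pages."""
    tables = -(-mapped_bytes // SPAN_PER_TABLE)   # ceiling division
    return tables * PAGE_SIZE


def uva_left(kva_bytes):
    """User virtual address space left once the KVA is carved out of 4G."""
    return ADDRESS_SPACE - kva_bytes


if __name__ == "__main__":
    gig, meg = 1024**3, 1024**2
    for kva in (1 * gig, 2 * gig, 3 * gig):
        print(f"KVA {kva // gig}G: page tables {page_table_bytes(kva) // meg}M, "
              f"UVA {uva_left(kva) // gig}G")
```

Under those assumptions, 3G of KVA works out to 768 page tables of 4K
each, i.e. 3M, and leaves 1G of UVA, which is the trade-off described
above.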
For more details on paged memory management on x86, see:

	Protected Mode Software Architecture

and:

	The Indispensable PC Hardware Book

You might also want to find a book on bootstrapping protected mode
operating systems (actually, I have yet to find a very good one, so
post about it, if you find one).

-- Terry