Message-ID: <3D18CDB2.151978F3@mindspring.com>
Date: Tue, 25 Jun 2002 13:08:18 -0700
From: Terry Lambert
To: Alfred Perlstein
Cc: Patrick Thomas, freebsd-hackers@freebsd.org
Subject: Re: tunings for many httpds...
References: <20020624151650.I68572-100000@utility.clubscholarship.com>
    <3D17D27A.11E82B2B@mindspring.com>
    <20020625022238.GH53232@elvis.mu.org>
    <3D17DBC1.351A8A35@mindspring.com>
    <20020625072509.GJ53232@elvis.mu.org>

Alfred Perlstein wrote:
> > > You keep saying this but the backing object allocated for sysvshm
> > > is taken from either an OBJT_PHYS or OBJT_SWAP object.
> >
> > Uh, it's only ever an OBJT_SWAP; see line 532 of kern/sysv_shm.c.
>
> Your sources seem to be really out of date...

Yes.  I was referring to FreeBSD 4.4; looking at FreeBSD 4.5 shows
that OBJT_PHYS is supported there, though it's off by default.

> > > At what point does it eat KVA that is other than for the backing
> > > data structures?
> >
> > It eats address space, not RAM.
> > And even if the mappings are not
> > active (which they usually are, because of LRU and processes
> > accessing them shared), the pages containing the page table entries
> > for each process are themselves not swappable; anything with a
> > large VSZ is going to eat 1/4k pages in KVA there, too.
> >
> > Ask yourself where a shared memory segment lives when it's not in
> > attached to one process address space, prior to you ipcrm'ing it.
> > It has to remain referenced so it isn't reclaimed.
>
> Yes, but not mapped into the kernel's address space right? right???

Not for OBJT_PHYS, it seems:

 * Note: PG_UNMANAGED (used by OBJT_PHYS) indicates that the page is
 * not under PV management but otherwise should be treated as a
 * normal page.  Pages not under PV management cannot be paged out
 * via the object/vm_page_t because there is no knowledge of their
 * pte mappings, nor can they be removed from their objects via
 * the object, and such pages are also not on any PQ queue.

Looks like it just eats physical memory.

If you look at the commit message on version 1.48, you'll see that
without this option specified by the user, it eats KVA space, since
it eats KVM.  OBJT_PHYS was added specifically to support not eating
KVA space (by Peter, for Oracle, according to the comment).

I guess he could try:

	sysctl -w kern.ipc.shm_use_phys=1

That switches the shared memory behaviour away from the default, so
that postgres leaves that 64M of KVA alone.  If the seizing problem
is a result of running out of KVA space for mappings, rather than out
of physical RAM, this could recover some for him.

[ God, if it takes this long to arrive at all the tunables, life is
  really going to suck... 8-) ]

The disadvantage seems to be that it eats real memory, and still takes
mappings on a per process basis out of KVA space, but it's a factor of
1024 below the overhead without the option.
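
One way to put numbers on that factor of 1024 is a minimal sketch of
the page-table arithmetic.  It assumes i386 without PAE, i.e. 4K pages
and 4-byte page table entries; neither figure is stated in this
message, and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: i386 without PAE; these values are not from the message. */
#define PAGE_SIZE_BYTES 4096u   /* bytes per page */
#define PTE_SIZE_BYTES  4u      /* bytes per page table entry */

/* Number of pages needed to hold `bytes`, rounding up. */
uint64_t pages_for(uint64_t bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Bytes of page table entries needed to map `bytes` in ONE process. */
uint64_t pte_bytes_for(uint64_t bytes)
{
    return pages_for(bytes) * PTE_SIZE_BYTES;
}

/* Whole page-table pages those entries occupy (alignment ignored). */
uint64_t pt_pages_for(uint64_t bytes)
{
    return pages_for(pte_bytes_for(bytes));
}

/* Ratio of the mapped size to the size of the entries that map it. */
uint64_t overhead_ratio(uint64_t bytes)
{
    assert(bytes > 0);  /* a zero-length mapping has no ratio */
    return bytes / pte_bytes_for(bytes);
}
```

Under those assumptions, pt_pages_for(64ULL << 20) for the 64M segment
comes to 16 page-table pages (64K) per attached process, and
overhead_ratio(64ULL << 20) comes to 1024.  Whether that is exactly
the overhead meant above is a guess; the sketch only shows where a
ratio of 1024 can come from.
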
The fact that it's soaking up real memory in a non-pageable way seems
to mean that it should not be on by default (it isn't), but it's an
interesting optimization for large databases on machines with a lot
of physical RAM.

It's tempting to precreate mappings for all of KVA space.  That would
really change the dynamics of some of the recent problems; among other
things, it should make interrupt allocations easier, no matter what
the allocator (the zone interrupt allocation works by preassigning
the KVA space to the zone; the fundamental thing it does is establish
mappings for it... if you had preexisting mappings for all of KVA,
then you would not have to dedicate the space to a particular zone at
boot time, you could reallocate it at runtime, which would mean that
a maxfiles change could really increase the number of network
connections, rather than just pretending to do so).

-- Terry