Date: Wed, 26 Jun 2002 00:58:25 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200206260758.g5Q7wPgZ019100@apollo.backplane.com>
To: Terry Lambert
Cc: Peter Wemm, Alfred Perlstein, Patrick Thomas, freebsd-hackers@FreeBSD.ORG
Subject: Re: tunings for many httpds...
References: <20020625222632.B7C7D3811@overcee.wemm.org> <3D1970E7.697D4A49@mindspring.com>

    Hmm.  I'm fairly sure that Linux does not quite do it that way.  I
    believe the 2-level page tables are copy-on-write, but that only
    gives you shareability across a fork() and then only for a little
    while.  I'm fairly certain that Linux cannot share page tables for
    post-fork modifications (like when you mmap() or get a SysV shared
    segment).

    The rmap patches are roughly equivalent to our i386 pmap code and
    allow Rik to implement page queues and proper page aging.

                                        -Matt
                                        Matthew Dillon

:Peter Wemm wrote:
:> > Even more importantly it would be nice if we could share compatible
:> > pmap pages, then we would have no need for 4MB pages... 50 mappings
:> > of the same shared memory segment would wind up using the same pmap
:> > pages as if only one mapping had been made.
:> > Such a feature would work for SysV shared memory and for mmap()s.
:> > I've looked at doing this off and on for two years but do not have
:> > a sufficient chunk of time available yet.
:>
:> SVR4/Solaris/Digital Unix^H^H^H^H^H^HTru64 do this by having an additional
:> layer between VM and pmap.  The equivalent of our pmap is just another one
:> of the address space handlers.  The SHM stuff etc is often implemented such
:> that it grabbed blocks of 4MB address space to manage in a way that it
:> likes.  This means it constructs its own page tables etc in such a way that
:> they are suitable for common use.  *If* I recall correctly, in SunOS/SVR4/
:> Solaris parlance this is the segment layer.  Naturally there is quite a bit
:> of variation.  It has been a long long time since I looked at this.
:
:Linux 2.4 has this with their "rmap" patch.  Alan Cox compares the
:VM system performance as "similar to what I see with FreeBSD".
:
:They are coming from a perspective of sharing all page mappings
:by pointing them at the same entries, without a reverse lookup
:mechanism (this is what the "rmap" patches add, the reverse lookup;
:Linux has always shared equivalent page mappings).
:
:The reverse lookup maintains a linked list (for some reason, this
:is 12 bytes -- don't know why yet) that is a list of the PTE references
:to the mapping.  So the reverse means going backwards and doing a
:linear list traversal if the pages are shared (they usually are, for
:code pages for any program that's running more than one instance).
:
:For page waits, they use a shared hash, and then wake up processes
:unnecessarily, but they expect the contention to be minimal (they
:estimate 4-8% overhead under extreme load with quantum at 100ms).
:
:Doing this in FreeBSD would probably confuse the heck out of the
:existing page discard code's LRU determination (among other things),
:but it's probably worth it, for the cases you've mentioned.
:I think the extra overhead in the unloaded case is in the noise, and
:in the loaded case, well worth the trade.
:
:I don't know how the PAE code you were rumored to be doing stands;
:if there were plans to put the PTEs for the process in the bank
:with the program pages that were running there, then doing this
:might prevent that from working very well, if those entries had to
:be shared with entries in another bank.
:
:-- Terry
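
The reverse lookup described in the quoted text (a per-page linked list of
PTE references, walked linearly when a page is shared) can be sketched
roughly as follows.  This is only an illustration under stated assumptions:
the types pte_t, struct pte_chain and struct page, the flag values, and the
function names are all hypothetical and are not taken from the Linux rmap
patches or from the FreeBSD pmap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names and bit values below are hypothetical, for illustration only. */
typedef unsigned long pte_t;

#define PTE_VALID    0x01UL   /* assumed "mapping present" bit */
#define PTE_ACCESSED 0x20UL   /* assumed "referenced" bit */

/* One node per PTE that currently maps the page. */
struct pte_chain {
    struct pte_chain *next;
    pte_t *ptep;
};

/* Per-physical-page state: head of the reverse-mapping list. */
struct page {
    struct pte_chain *chain;
};

/* Record that *ptep maps pg.  Returns 0 on success, -1 if out of memory. */
int page_add_rmap(struct page *pg, pte_t *ptep)
{
    struct pte_chain *pc;

    assert(pg != NULL && ptep != NULL);
    pc = malloc(sizeof(*pc));
    if (pc == NULL)
        return -1;
    pc->ptep = ptep;
    pc->next = pg->chain;
    pg->chain = pc;
    return 0;
}

/*
 * Page aging: walk the list linearly, count the mappings whose accessed
 * bit is set, and clear that bit so the next scan sees fresh information.
 */
int page_referenced(struct page *pg)
{
    struct pte_chain *pc;
    int n = 0;

    for (pc = pg->chain; pc != NULL; pc = pc->next) {
        if (*pc->ptep & PTE_ACCESSED) {
            *pc->ptep &= ~PTE_ACCESSED;
            n++;
        }
    }
    return n;
}

/* Reclaim: clear every PTE that maps pg and free the list.  Returns count. */
int page_unmap_all(struct page *pg)
{
    int n = 0;

    while (pg->chain != NULL) {
        struct pte_chain *pc = pg->chain;

        pg->chain = pc->next;
        *pc->ptep = 0;
        free(pc);
        n++;
    }
    return n;
}
```

Both walks cost time proportional to the number of mappings of the page,
which is the linear list traversal mentioned in the quoted text.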