From owner-freebsd-performance@FreeBSD.ORG Sat Jun 21 23:45:09 2003
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id ECECE37B401
    for ; Sat, 21 Jun 2003 23:45:09 -0700 (PDT)
Received: from pop016.verizon.net (pop016pub.verizon.net [206.46.170.173])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 1F25E43F3F
    for ; Sat, 21 Jun 2003 23:45:09 -0700 (PDT)
    (envelope-from cswiger@mac.com)
Received: from mac.com ([141.149.47.46]) by pop016.verizon.net
    (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP
    id <20030622064508.GPVF3199.pop016.verizon.net@mac.com>;
    Sun, 22 Jun 2003 01:45:08 -0500
Message-ID: <3EF55072.30104@mac.com>
Date: Sun, 22 Jun 2003 02:45:06 -0400
From: Chuck Swiger
Organization: The Courts of Chaos
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030612
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-performance@freebsd.org
References: <20030621185821.30070.qmail@cr.yp.to>
In-Reply-To: <20030621185821.30070.qmail@cr.yp.to>
X-Enigmail-Version: 0.76.0.0
X-Enigmail-Supports: pgp-inline, pgp-mime
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Authentication-Info: Submitted using SMTP AUTH at pop016.verizon.net
    from [141.149.47.46] at Sun, 22 Jun 2003 01:45:07 -0500
cc: "D. J. Bernstein"
Subject: Re: ten thousand small processes
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 22 Jun 2003 06:45:10 -0000

D. J. Bernstein wrote:
[ ... ]
> I don't care much about the 3-page text segment. But I do care about the
> 39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle
> to having a small program fit into _one_ page per process; two or three
> can be excused, but 39 is absurd.
> (Yes, I know that Solaris is worse.)

Indeed -- Solaris insists that all programs be dynamically linked; Sun claims
that statically linked programs violate the SPARC ABI.

[ ... ]
> Why doesn't the compiler automatically merge some bss into data when
> that saves a page?

Most executable object formats seem to prefer to page-align different
sections, in VM if not on disk. I don't know ELF well enough to know whether
there is an equivalent to the following from /usr/include/mach-o/loader.h,
used by NEXTSTEP and MacOS X:

 * The file type MH_OBJECT is a compact format intended as output of the
 * assembler and input (and possibly output) of the link editor (the .o
 * format). All sections are in one unnamed segment with no segment padding.
 * This format is used as an executable format when the file is so small the
 * segment padding greatly increases its size.

...but this did (or used to do) the right thing with regard to merging data
and bss onto a single page.

> Why can't I omit exit(), manually or automatically, when it's unreachable?

There's a Usenet thread around the following message-ID:

http://www.google.com/groups?selm=aqbgk8%242v4j%241%40shot.codefab.com

...which might be interesting:

]In comp.sys.sun.misc Casper H.S. Dik wrote:
]> "Chuck Swiger" writes:
]>> Under Solaris, /bin/true and /bin/false are shell scripts rather than binary
]>> executables, but a minimal assembly implementation of these two programs
]>> would not need to perform any system calls or invoke any library routines at
]>> all, no?
]>
]> _exit()?
]
]A point.
]
]On the other hand, valid one-line C implementations-- int main() { return 0;
]}-- do not explicitly call exit() or _exit(). They simply return from
]subroutine to a system-provided stub, once called /lib/crt0.o, which was the
]thing that passed control to _main() in the first place.
]
]If Solaris will insist that one dynamically links crt0.o to have the standard
]system _exit() symbol available, okay.
]However, nothing in the code above
]requires dynamic linking on systems which provide a static version of _exit.

> Furthermore, malloc() appears to chew up a whole new page of DRAM for
> each allocation, plus another page---is this counted in VSZ?---for an
> anonymous mmap. Would it really be that difficult to fit 1076 bytes of
> requested memory into the 3000-odd bytes available at the end of bss?

Of course not, at least if you knew at compile time that one could use the
equivalent of static arrays, rather than getting memory from a malloc()
implementation. For that matter, if you insist upon using dynamic allocation
and getting memory at runtime, I suspect that using alloca() would be a win.
Hmm, yes:

chuck  9426  0.0  0.0   168    44  p0  S+    2:28AM   0:00.00 foo

...versus...

chuck  9430  0.0  0.0   144    28  p0  S+    2:28AM   0:00.00 foo2

> I sure hope that there's some better explanation for the remaining 32
> pages than ``Well, we decided to allocate 131072 bytes of memory for the
> stack,'' especially when I'm hard-limiting the stack to 4K before exec.

The malloc manpage has some information that may be relevant:

     <       Reduce the size of the cache by a factor of two.  The default
             cache size is 16 pages.  This option can be specified multiple
             times.
[ ... ]
EXAMPLES
     To set a systemwide reduction of cache size, and to dump core whenever
     a problem occurs:

           ln -s 'A<' /etc/malloc.conf

...only, setting this option doesn't seem to make any difference to the
memory behavior:

183-sec% truss foo
readlink("/etc/malloc.conf","<<",63)             = 2 (0x2)
mmap(0x0,4096,0x3,0x1002,-1,0x0)                 = 671395840 (0x2804b000)
break(0x804d000)                                 = 0 (0x0)
break(0x804e000)                                 = 0 (0x0)
break(0x804f000)                                 = 0 (0x0)
break(0x8050000)                                 = 0 (0x0)
break(0x8051000)                                 = 0 (0x0)
nanosleep(0xbfbffa68,0xbfbffa60)                 = 0 (0x0)
exit(0x0)
process exit, rval = 0

...nor does setting the malloc flags to ">>" increase the VM usage.
But a truss of foo2, which uses alloca() instead of malloc(), shows that we
don't get a PAGEZERO section mmap'ed in (!), and we don't invoke brk() to grow
the malloc arena:

189-sec% truss foo2
nanosleep(0xbfbff618,0xbfbff610)                 = 0 (0x0)
exit(0x0)
process exit, rval = 0

-- 
-Chuck
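
For reference: the sources of foo and foo2 are not included in this message.
What follows is only a minimal sketch of what such a pair of test routines
might look like, assuming each one obtains the 1076 bytes mentioned in the
quoted question, touches them, and then sleeps so that ps and truss have
something to look at. The names heap_variant(), stack_variant(), sum_bytes()
and REQUEST are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#if defined(__linux__)
#include <alloca.h>   /* glibc declares alloca() here; FreeBSD uses <stdlib.h> */
#endif

#define REQUEST 1076  /* assumed size, taken from the quoted question */

/* Add up the bytes so the compiler cannot discard the buffer as unused. */
static unsigned sum_bytes(const char *buf, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned char)buf[i];
    return sum;
}

/* Stand-in for "foo": the buffer comes from the malloc() arena. */
unsigned heap_variant(unsigned seconds)
{
    char *buf = malloc(REQUEST);
    if (buf == NULL)
        return 0;
    memset(buf, 1, REQUEST);      /* touch every byte of the allocation */
    sleep(seconds);               /* leave time to inspect the process */
    unsigned sum = sum_bytes(buf, REQUEST);
    free(buf);
    return sum;
}

/* Stand-in for "foo2": the buffer lives in this function's stack frame. */
unsigned stack_variant(unsigned seconds)
{
    char *buf = alloca(REQUEST);  /* released automatically on return */
    memset(buf, 1, REQUEST);
    sleep(seconds);
    return sum_bytes(buf, REQUEST);
}
```

A small main() that calls one of the two routines with a delay of a few
seconds would give a process that can be compared under ps and truss in the
way shown above. Whether the numbers match the ones in this message depends
on the malloc implementation and the compiler, so treat any difference as
something to measure rather than assume.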