From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 00:50:35 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 90B8437B401 for ; Sun, 22 Jun 2003 00:50:35 -0700 (PDT) Received: from svaha.com (svaha.com [64.46.156.67]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8817943FA3 for ; Sun, 22 Jun 2003 00:50:34 -0700 (PDT) (envelope-from meconlen@obfuscated.net) Received: from presa (24161248hfc18.tampabay.rr.com [24.161.248.18]) (AUTH: LOGIN meconlen) by svaha.com with esmtp; Sun, 22 Jun 2003 03:50:31 -0400 From: "Michael E. Conlen" To: "D. J. Bernstein" , freebsd-performance@freebsd.org Date: Sun, 22 Jun 2003 03:50:32 -0400 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <20030621185821.30070.qmail@cr.yp.to> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal Subject: RE: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Jun 2003 07:50:35 -0000 If you're going to get this serious about your memory management, couldn't you just call brk() and manage it yourself? You seem to know exactly what you're looking for and expect a specific result. I wouldn't recommend it to most, but you seem to know what you're doing. -- Michael Conlen -----Original Message----- From: owner-freebsd-performance@freebsd.org [mailto:owner-freebsd-performance@freebsd.org]On Behalf Of D. J. Bernstein Sent: Saturday, June 21, 2003 2:58 PM To: freebsd-performance@freebsd.org Subject: ten thousand small processes FreeBSD 4.8.
Test program: malloc(360); malloc(80); malloc(180); malloc(16); malloc(440); sleep(10); _exit(0). Compile statically. The program ends up with 44KB RSS. Where is all that DRAM going? The program also ends up with 168KB VSZ. Where is all that VM going? I don't care much about the 3-page text segment. But I do care about the 39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle to having a small program fit into _one_ page per process; two or three can be excused, but 39 is absurd. (Yes, I know that Solaris is worse.) At least 2 pages appear to be wasted by exit(), because it brings in a chunk of stdio, which uses 84 bytes of data and 316 bytes of bss. The libc implementors clearly don't care about 316 bytes of memory, so why don't they make those 316 bytes static? Why doesn't the compiler automatically merge some bss into data when that saves a page? Why can't I omit exit(), manually or automatically, when it's unreachable? Furthermore, malloc() appears to chew up a whole new page of DRAM for each allocation, plus another page---is this counted in VSZ?---for an anonymous mmap. Would it really be that difficult to fit 1076 bytes of requested memory into the 3000-odd bytes available at the end of bss? I sure hope that there's some better explanation for the remaining 32 pages than ``Well, we decided to allocate 131072 bytes of memory for the stack,'' especially when I'm hard-limiting the stack to 4K before exec. ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago _______________________________________________ freebsd-performance@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-performance To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 02:20:25 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C7A6637B401 for ; Sun, 22 Jun 2003 02:20:25 -0700 (PDT) Received: from silver.he.iki.fi (silver.he.iki.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id C7E7A43F75 for ; Sun, 22 Jun 2003 02:20:23 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from PETEX31 (gprs-prointernet-3e47d663.mobile.inet.fi [62.71.214.99]) by silver.he.iki.fi (8.12.9/8.11.4) with SMTP id h5M9K9sL056425; Sun, 22 Jun 2003 12:20:13 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <003001c3389f$7bb6a9f0$63d6473e@PETEX31> From: "Petri Helenius" To: "D. J. Bernstein" , References: <20030621185821.30070.qmail@cr.yp.to> Date: Sun, 22 Jun 2003 11:14:44 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Jun 2003 09:20:26 -0000 Have you looked at the malloc manpage and tuned the cache size? < Reduce the size of the cache by a factor of two. The default cache size is 16 pages. This option can be specified multiple times.
> Double the size of the cache by a factor of two. The default cache size is 16 pages. This option can be specified multiple times. Pete ----- Original Message ----- From: "D. J. Bernstein" To: Sent: Saturday, June 21, 2003 8:58 PM Subject: ten thousand small processes > FreeBSD 4.8. Test program: malloc(360); malloc(80); malloc(180); > malloc(16); malloc(440); sleep(10); _exit(0). Compile statically. > > The program ends up with 44KB RSS. Where is all that DRAM going? The > program also ends up with 168KB VSZ. Where is all that VM going? > > I don't care much about the 3-page text segment. But I do care about the > 39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle > to having a small program fit into _one_ page per process; two or three > can be excused, but 39 is absurd. (Yes, I know that Solaris is worse.) > > At least 2 pages appear to be wasted by exit(), because it brings in a > chunk of stdio, which uses 84 bytes of data and 316 bytes of bss. The > libc implementors clearly don't care about 316 bytes of memory, so why > don't they make those 316 bytes static? Why doesn't the compiler > automatically merge some bss into data when that saves a page? Why can't > I omit exit(), manually or automatically, when it's unreachable? > > Furthermore, malloc() appears to chew up a whole new page of DRAM for > each allocation, plus another page---is this counted in VSZ?---for an > anonymous mmap. Would it really be that difficult to fit 1076 bytes of > requested memory into the 3000-odd bytes available at the end of bss? > > I sure hope that there's some better explanation for the remaining 32 > pages than ``Well, we decided to allocate 131072 bytes of memory for the > stack,'' especially when I'm hard-limiting the stack to 4K before exec. > > ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 04:47:32 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3581037B401 for ; Sun, 22 Jun 2003 04:47:31 -0700 (PDT) Received: from mta2.adelphia.net (mta2.adelphia.net [64.8.50.178]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6753043F75 for ; Sun, 22 Jun 2003 04:47:30 -0700 (PDT) (envelope-from Support@Netflag.Net) Received: from nfn2.Netflag.Net ([68.69.240.78]) by mta2.adelphia.net (InterMail vM.5.01.05.32 201-253-122-126-132-20030307) with ESMTP id <20030622114715.UJWJ1359.mta2.adelphia.net@nfn2.Netflag.Net> for ; Sun, 22 Jun 2003 07:47:15 -0400 Message-Id: <5.2.0.9.2.20030622044421.0225be30@pop.dc3.adelphia.net> X-Sender: pedramn@pop.dc3.adelphia.net X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Sun, 22 Jun 2003 04:45:39 -0700 To: freebsd-performance@freebsd.org From: Pedram Nimreezi In-Reply-To: References: <20030621185821.30070.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Subject: RE: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Jun 2003 11:47:32 -0000 I'm sure Professor Bernstein wouldn't have a problem doing so. However, I think he's implying a gross oversight... At 03:50 AM 6/22/2003 -0400, Michael E.
Conlen wrote: >If you're going to get this serious about your memory management, couldn't you >just call brk() and manage it yourself? You seem to know exactly what >you're looking for and expect a specific result. I wouldn't recommend it to >most, but you seem to know what you're doing. > >-- >Michael Conlen > >-----Original Message----- >From: owner-freebsd-performance@freebsd.org >[mailto:owner-freebsd-performance@freebsd.org]On Behalf Of D. J. >Bernstein >Sent: Saturday, June 21, 2003 2:58 PM >To: freebsd-performance@freebsd.org >Subject: ten thousand small processes > > >FreeBSD 4.8. Test program: malloc(360); malloc(80); malloc(180); >malloc(16); malloc(440); sleep(10); _exit(0). Compile statically. > >The program ends up with 44KB RSS. Where is all that DRAM going? The >program also ends up with 168KB VSZ. Where is all that VM going? > >I don't care much about the 3-page text segment. But I do care about the >39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle >to having a small program fit into _one_ page per process; two or three >can be excused, but 39 is absurd. (Yes, I know that Solaris is worse.) > >At least 2 pages appear to be wasted by exit(), because it brings in a >chunk of stdio, which uses 84 bytes of data and 316 bytes of bss. The >libc implementors clearly don't care about 316 bytes of memory, so why >don't they make those 316 bytes static? Why doesn't the compiler >automatically merge some bss into data when that saves a page? Why can't >I omit exit(), manually or automatically, when it's unreachable? > >Furthermore, malloc() appears to chew up a whole new page of DRAM for >each allocation, plus another page---is this counted in VSZ?---for an >anonymous mmap. Would it really be that difficult to fit 1076 bytes of >requested memory into the 3000-odd bytes available at the end of bss?
> >I sure hope that there's some better explanation for the remaining 32 >pages than ``Well, we decided to allocate 131072 bytes of memory for the >stack,'' especially when I'm hard-limiting the stack to 4K before exec. > >---D. J. Bernstein, Associate Professor, Department of Mathematics, >Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Sat Jun 21 13:02:27 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E0BB737B401 for ; Sat, 21 Jun 2003 13:02:27 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id 40E7A43F85 for ; Sat, 21 Jun 2003 13:02:27 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.9/8.12.9) with ESMTP id h5LK27Oc081531; Sat, 21 Jun 2003 13:02:11 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.9/8.12.9/Submit) id h5LK27ps081530; Sat, 21 Jun 2003 13:02:07 -0700 (PDT) Date: Sat, 21 Jun 2003 13:02:07 -0700 From: "David O'Brien" To: "D. J.
Bernstein" Message-ID: <20030621200207.GA76101@dragon.nuxi.com> References: <20030621185821.30070.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030621185821.30070.qmail@cr.yp.to> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.1-BETA Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 X-Mailman-Approved-At: Sun, 22 Jun 2003 08:20:25 -0700 cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: freebsd-performance@freebsd.org List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Jun 2003 20:02:28 -0000 On Sat, Jun 21, 2003 at 06:58:21PM -0000, D. J. Bernstein wrote: > Why doesn't the compiler > automatically merge some bss into data when that saves a page? That is a question for gcc@gcc.gnu.org. We use a 99% stock GCC 3.2.2. 
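Short of a change in GCC or libc, the manage-it-yourself approach Michael Conlen suggests can be done without calling brk() or sbrk() at all, which avoids the sbrk() conflict Bernstein describes elsewhere in the thread: reserve a fixed, zero-initialized array (which normally lands in bss) and hand out slices of it. The following is a minimal sketch of such a bump allocator, not a FreeBSD or libc API; `arena_alloc`, `arena_bytes_used`, `ARENA_SIZE`, and `ARENA_ALIGN` are hypothetical names. It assumes C11 and that 16-byte alignment is enough for the caller's data, and it never frees individual blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: one 4 KB page of storage, 16-byte alignment. */
#define ARENA_SIZE  4096u
#define ARENA_ALIGN 16u

static_assert((ARENA_ALIGN & (ARENA_ALIGN - 1u)) == 0, "alignment must be a power of two");
static_assert(ARENA_SIZE % ARENA_ALIGN == 0, "arena size must be a multiple of the alignment");

/* Zero-initialized static storage, which compilers normally place in bss. */
static _Alignas(ARENA_ALIGN) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Bump allocator: returns the next aligned slice of the arena, or NULL
   when the request does not fit. Individual blocks are never freed. */
void *arena_alloc(size_t n)
{
    /* Round the current offset up to the next ARENA_ALIGN boundary. */
    size_t off = (arena_used + (ARENA_ALIGN - 1u)) & ~(size_t)(ARENA_ALIGN - 1u);
    if (off > ARENA_SIZE || n > ARENA_SIZE - off)
        return NULL;
    arena_used = off + n;
    return arena + off;
}

/* Bytes consumed so far, including alignment padding. */
size_t arena_bytes_used(void)
{
    return arena_used;
}
```

Serving Bernstein's five requests (360, 80, 180, 16 and 440 bytes, 1076 bytes in total) from this arena uses 1096 bytes once alignment padding is counted, well within one 4 KB page. Whether that actually lowers RSS depends on where the linker places bss relative to existing page boundaries, which this sketch does not control.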
From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 10:17:03 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 935D637B401 for ; Sun, 22 Jun 2003 10:17:03 -0700 (PDT) Received: from mailout11.sul.t-online.com (mailout11.sul.t-online.com [194.25.134.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id BEB0143F3F for ; Sun, 22 Jun 2003 10:17:02 -0700 (PDT) (envelope-from Alexander@Leidinger.net) Received: from fwd11.aul.t-online.de by mailout11.sul.t-online.com with smtp id 19U8SS-0005Rg-09; Sun, 22 Jun 2003 19:16:52 +0200 Received: from Andro-Beta.Leidinger.net (EG4M2OZQQeqY+eq23bwbZFXspcebhf5Fcxu9gebtdm0ZEV-sduc-EV@[80.131.120.204]) by fmrl11.sul.t-online.com with esmtp id 19U8SK-07Pllw0; Sun, 22 Jun 2003 19:16:44 +0200 Received: from Magelan.Leidinger.net (Magelan [192.168.1.1]) h5MHGhI8022752; Sun, 22 Jun 2003 19:16:43 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from Magelan.Leidinger.net (netchild@localhost [127.0.0.1]) by Magelan.Leidinger.net (8.12.9/8.12.9) with SMTP id h5MHGglm055981; Sun, 22 Jun 2003 19:16:42 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Date: Sun, 22 Jun 2003 19:16:42 +0200 From: Alexander Leidinger To: Eric Anderson Message-Id: <20030622191642.3f3532bd.Alexander@Leidinger.net> In-Reply-To: <3EF1AE85.2080505@centtech.com> References: <741132145421.20030619172208@nojabrsk.ru> <3EF1AE85.2080505@centtech.com> X-Mailer: Sylpheed version 0.8.10claws (GTK+ 1.2.10; i386-portbld-freebsd5.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Seen: false X-ID: EG4M2OZQQeqY+eq23bwbZFXspcebhf5Fcxu9gebtdm0ZEV-sduc-EV@t-dialin.net cc: s_porotnikov@nojabrsk.ru cc: freebsd-performance@freebsd.org Subject: Re: freebsd-performance Digest, Vol 4, Issue 9 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Jun 2003 17:17:03 -0000 On Thu, 19 Jun 2003 07:37:25 -0500 Eric Anderson wrote: > Interestingly enough, I found FreeBSD 4.8 to be faster at local disk > writes than FreeBSD 5.1. About 50% faster. 5.1 isn't tuned yet to deliver high performance, so it isn't a surprise to see lower performance than 4.x. But 50% is a lot... are you sure you did the test under the same conditions (e.g. are you sure the write cache was off (or on) in both cases, and that no debugging switches were activated in the kernel/userland)? Bye, Alexander. -- Yes, I've heard of "decaf." What's your point? http://www.Leidinger.net Alexander @ Leidinger.net GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7 From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 11:01:02 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7F90537B401 for ; Sun, 22 Jun 2003 11:01:02 -0700 (PDT) Received: from meitner.wh.uni-dortmund.de (meitner.wh.uni-dortmund.de [129.217.129.133]) by mx1.FreeBSD.org (Postfix) with ESMTP id C3C0943FCB for ; Sun, 22 Jun 2003 11:01:01 -0700 (PDT) (envelope-from michaelnottebrock@gmx.net) Received: from lofi.dyndns.org (pc2-105.intern.meitner [10.3.12.105]) by meitner.wh.uni-dortmund.de (Postfix) with ESMTP id E6CCB167627; Sun, 22 Jun 2003 20:01:00 +0200 (CEST) Received: from kiste.my.domain (kiste.my.domain [192.168.8.4]) (authenticated bits=0) by lofi.dyndns.org (8.12.9/8.12.9) with ESMTP id h5MI0xVc061271 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Sun, 22 Jun 2003 20:01:00 +0200 (CEST) (envelope-from michaelnottebrock@gmx.net) From: Michael Nottebrock To: freebsd-performance@freebsd.org, "D. J.
Bernstein" Date: Sun, 22 Jun 2003 20:00:59 +0200 User-Agent: KMail/1.5.2 References: <20030621185821.30070.qmail@cr.yp.to> <20030621200207.GA76101@dragon.nuxi.com> In-Reply-To: <20030621200207.GA76101@dragon.nuxi.com> MIME-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_b7e9+wbj8OpRrfX"; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200306222000.59352.michaelnottebrock@gmx.net> X-Virus-Scanned: by amavisd-new cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Jun 2003 18:01:02 -0000 On Saturday 21 June 2003 22:02, David O'Brien wrote: > That is a question for gcc@gcc.gnu.org. We use a 99% stock GCC 3.2.2. On 4.8?
:) -- Michael Nottebrock \ KDE on FreeBSD \ ,ww \ --- \ ,wWWCybaWW_) \ http://freebsd.kde.org \ `WSheepW' free \ II II node From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 19:24:20 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 37B0737B401 for ; Sun, 22 Jun 2003 19:24:20 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9AA6C43FBF for ; Sun, 22 Jun 2003 19:24:19 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.9/8.12.9) with ESMTP id h5N2OIaQ053056; Sun, 22 Jun 2003 19:24:18 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.9/8.12.9/Submit) id h5N2OHEa053055; Sun, 22 Jun 2003 19:24:17 -0700 (PDT) Date: Sun, 22 Jun 2003 19:24:17 -0700 From: "David O'Brien" To: Michael Nottebrock Message-ID: <20030623022417.GC52619@dragon.nuxi.com> References: <20030621185821.30070.qmail@cr.yp.to> <20030621200207.GA76101@dragon.nuxi.com> <200306222000.59352.michaelnottebrock@gmx.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200306222000.59352.michaelnottebrock@gmx.net> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.1-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 X-Mailman-Approved-At: Sun, 22 Jun 2003 19:40:33 -0700 cc: freebsd-performance@freebsd.org Subject:
Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: freebsd-performance@freebsd.org List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jun 2003 02:24:20 -0000 On Sun, Jun 22, 2003 at 08:00:59PM +0200, Michael Nottebrock wrote: > On Saturday 21 June 2003 22:02, David O'Brien wrote: > > > That is a question for gcc@gcc.gnu.org. We use a 99% stock GCC 3.2.2. > > On 4.8? :) We use a 97% stock GCC 2.95.3. So it is still a question for gcc@gcc.gnu.org. -- -- David (obrien@FreeBSD.org) From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 19:25:53 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9174237B401 for ; Sun, 22 Jun 2003 19:25:53 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0437E43F75 for ; Sun, 22 Jun 2003 19:25:53 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.9/8.12.9) with ESMTP id h5N2PqaQ053073; Sun, 22 Jun 2003 19:25:52 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.9/8.12.9/Submit) id h5N2PqtE053072; Sun, 22 Jun 2003 19:25:52 -0700 (PDT) Date: Sun, 22 Jun 2003 19:25:52 -0700 From: "David O'Brien" To: Chuck Swiger Message-ID: <20030623022552.GD52619@dragon.nuxi.com> References: <20030621185821.30070.qmail@cr.yp.to> <3EF55072.30104@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3EF55072.30104@mac.com> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.1-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A
X-Pgp-Rsa-Keyid: 1024/34F9F9D5 X-Mailman-Approved-At: Sun, 22 Jun 2003 19:40:33 -0700 cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: freebsd-performance@freebsd.org List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jun 2003 02:25:53 -0000 On Sun, Jun 22, 2003 at 02:45:06AM -0400, Chuck Swiger wrote: > D. J. Bernstein wrote: > [ ... ] > >I don't care much about the 3-page text segment. But I do care about the > >39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle > >to having a small program fit into _one_ page per process; two or three > >can be excused, but 39 is absurd. (Yes, I know that Solaris is worse.) > > Indeed-- Solaris insists that all programs be dynamically linked; Sun claims > that > statically linked programs violate the SPARC ABI. No, statically linked programs violate the general ELF (System V) ABI (also known as gABI). -- -- David (obrien@FreeBSD.org) From owner-freebsd-performance@FreeBSD.ORG Sun Jun 22 19:59:32 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C329437B401 for ; Sun, 22 Jun 2003 19:59:32 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id D354D43F3F for ; Sun, 22 Jun 2003 19:59:31 -0700 (PDT) (envelope-from djb-dsn-1056337204.40078@cr.yp.to) Received: (qmail 40079 invoked by uid 1017); 23 Jun 2003 03:00:04 -0000 Date: 23 Jun 2003 03:00:04 -0000 Message-ID: <20030623030004.40078.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J.
Bernstein" To: freebsd-performance@freebsd.org References: <20030621185821.30070.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jun 2003 02:59:33 -0000 alloca() is inadequate. Memory allocation doesn't always follow a stack pattern. I certainly can't predict memory use at the top of main(). sbrk() is dangerous. Suppose a system call that I'm using turns into a library routine that calls malloc()---this has happened to poll() and to socket(), so it could happen again. Suppose the malloc() implementor assumes that the memory he gets from sbrk() is contiguous. Kaboom. (If the sbrk() manual page said that this was the malloc() implementor's fault, maybe I'd risk it. But the manual page clearly says I shouldn't. ``Duh, why would anyone want to use more than one allocator library?'') Even if I get malloc() under control somehow, I'll still have 30-odd unexplained VM pages going down the tubes. Is this, in fact, the stack? Why doesn't it start at 4K? ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Mon Jun 23 05:58:53 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F232D37B401 for ; Mon, 23 Jun 2003 05:58:52 -0700 (PDT) Received: from otter3.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id F0DC543FE0 for ; Mon, 23 Jun 2003 05:58:51 -0700 (PDT) (envelope-from anderson@centtech.com) Received: from centtech.com (electron.centtech.com [204.177.173.173]) by otter3.centtech.com (8.12.3/8.12.3) with ESMTP id h5NCwn56040304; Mon, 23 Jun 2003 07:58:49 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <3EF6F979.3020103@centtech.com> Date: Mon, 23 Jun 2003 07:58:33 -0500 From: Eric Anderson User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Alexander Leidinger References: <741132145421.20030619172208@nojabrsk.ru> <3EF1AE85.2080505@centtech.com> <20030622191642.3f3532bd.Alexander@Leidinger.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: s_porotnikov@nojabrsk.ru cc: freebsd-performance@freebsd.org Subject: Re: freebsd-performance Digest, Vol 4, Issue 9 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jun 2003 12:58:53 -0000 Alexander Leidinger wrote: > On Thu, 19 Jun 2003 07:37:25 -0500 > Eric Anderson wrote: > > >>Interestingly enough, I found FreeBSD 4.8 to be faster at local disk >>writes than FreeBSD 5.1. About 50% faster. 
> > > 5.1 isn't tuned yet to deliver high performance, so it isn't a surprise > to see a lower performance than 4.x. But 50% is much... are you sure you > did the test under the same conditions (e.g. are you sure the write > cache was off (or on) in both cases, no debugging switches activated in > the kernel/userland)? I attempted to create the same environment, but I didn't force anything to be the same. In other words, what I did was swap the boot drive from 4.8 to 5.1 and back again, each time testing the differences in speed. The same RAIDed partitions were tested (so the tests ran on UFS), with the same tests, and the results were consistent. I didn't turn write caching on or off on either, so if the defaults differ between them, that could be the problem. Eric -- ------------------------------------------------------------------ Eric Anderson Systems Administrator Centaur Technology Attitudes are contagious, is yours worth catching? ------------------------------------------------------------------ From owner-freebsd-performance@FreeBSD.ORG Mon Jun 23 16:27:24 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D23A837B401 for ; Mon, 23 Jun 2003 16:27:24 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id D798943F3F for ; Mon, 23 Jun 2003 16:27:23 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h5NNRM505840; Mon, 23 Jun 2003 19:27:22 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Mon, 23 Jun 2003 19:27:22 -0400 (EDT) From: Jeff Roberson To: "D. J.
Bernstein" In-Reply-To: <20030621185821.30070.qmail@cr.yp.to> Message-ID: <20030623190019.G36168-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jun 2003 23:27:25 -0000 On 21 Jun 2003, D. J. Bernstein wrote: > FreeBSD 4.8. Test program: malloc(360); malloc(80); malloc(180); > malloc(16); malloc(440); sleep(10); _exit(0). Compile statically. > > The program ends up with 44KB RSS. Where is all that DRAM going? The > program also ends up with 168KB VSZ. Where is all that VM going? > > I don't care much about the 3-page text segment. But I do care about the > 39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle > to having a small program fit into _one_ page per process; two or three > can be excused, but 39 is absurd. (Yes, I know that Solaris is worse.) Even small programs need page tables. On x86 unix you need at least pages for page tables for any process, if I'm counting correctly. One for the page directory, one page table for text, data, bss, and the guard page, and one page table page for stack. 32 of those 'pages of VM' are your initial stack size. They don't really consume any resources other than preventing anyone else from allocating overlapping pages. It's just the initial upper limit on the stack map which is allowed to grow. I haven't looked closely enough to find out what the other 7 might be. There is an obstacle to having a small program fit into one page. Actually, a significant one. First of all, you need protections on different sections of the actual executable image. Text must be read-only since it is shared. Data is read-write and bss is read-write.
BSS is a pseudo section and not actually mapped from the file. Text and data both can be paged in from the binary in a demand-paged system such as FreeBSD. Data cannot be written out to its backing object and neither can text. Text can be shared while data changes are private and so the two must be placed in separate pages. This topic is explored quite well in any modern operating systems book. I suggest you pick up "The Design and Implementation of the 4.4BSD Operating System". It is a little outdated but provides a good introduction to these topics. I really didn't do them justice with this paragraph. If demand paging, shared libraries, and the like are not suited for your problem, perhaps you should look at an embedded operating system? Or DOS even. Cheers, Jeff > > At least 2 pages appear to be wasted by exit(), because it brings in a > chunk of stdio, which uses 84 bytes of data and 316 bytes of bss. The > libc implementors clearly don't care about 316 bytes of memory, so why > don't they make those 316 bytes static? Why doesn't the compiler > automatically merge some bss into data when that saves a page? Why can't > I omit exit(), manually or automatically, when it's unreachable? > > Furthermore, malloc() appears to chew up a whole new page of DRAM for > each allocation, plus another page---is this counted in VSZ?---for an > anonymous mmap. Would it really be that difficult to fit 1076 bytes of > requested memory into the 3000-odd bytes available at the end of bss? > > I sure hope that there's some better explanation for the remaining 32 > pages than ``Well, we decided to allocate 131072 bytes of memory for the > stack,'' especially when I'm hard-limiting the stack to 4K before exec. > > ---D. J.
Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago > _______________________________________________ > freebsd-performance@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-performance > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" > From owner-freebsd-performance@FreeBSD.ORG Mon Jun 23 16:29:14 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2C17337B401 for ; Mon, 23 Jun 2003 16:29:13 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6E90D43FA3 for ; Mon, 23 Jun 2003 16:29:12 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h5NNTBm07045; Mon, 23 Jun 2003 19:29:11 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Mon, 23 Jun 2003 19:29:11 -0400 (EDT) From: Jeff Roberson To: "D. J. Bernstein" In-Reply-To: <20030623190019.G36168-100000@mail.chesapeake.net> Message-ID: <20030623192818.A36168-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jun 2003 23:29:14 -0000 On Mon, 23 Jun 2003, Jeff Roberson wrote: > On 21 Jun 2003, D. J. Bernstein wrote: > > > FreeBSD 4.8. Test program: malloc(360); malloc(80); malloc(180); > > malloc(16); malloc(440); sleep(10); _exit(0). Compile statically. > > > > The program ends up with 44KB RSS. Where is all that DRAM going? 
The > > program also ends up with 168KB VSZ. Where is all that VM going? > > > > I don't care much about the 3-page text segment. But I do care about the > > 39 extra pages of VM, and the 8 extra pages of DRAM. There's no obstacle > > to having a small program fit into _one_ page per process; two or three > > can be excused, but 39 is absurd. (Yes, I know that Solaris is worse.) > The following should read "you need at least three pages" > Even small programs need page tables. On x86 unix you need at least > pages for page tables for any process, if I'm counting correctly. One for > the page directory, one page table for text, data, bss, and the guard > page, and one page table page for stack. > > 32 of those 'pages of VM' are your initial stack size. They don't really > consume any resources other than preventing anyone else from allocating > overlapping pages. It's just the initial upper limit on the stack map > which is allowed to grow. I haven't looked closely enough to find out > what the other 7 might be. > > There is an obstacle to having a small program fit into one page. > Actually, a significant one. First of all, you need protections on > different sections of the actual executable image. Text must be read only > since it is shared. Data is read write and bss is read-write. BSS is a > pseudo section and not actually mapped from the file. Text and data both > can be paged in from the binary in a demand paged system such as freebsd. > Data can not be written out to its backing object and neither can text. > Text can be shared while data changes are private and so the two must be > placed in seperate pages. This topic is explored quite well in any modern > operating systems book. I suggest you pick up "The Design and > Implementation of the 4.4BSD Operating System". It is a little out dated > but provides a good introduction to these topics. I really didn't do them > justice with this paragraph. 
> > If demand paging, shared libraries, and the like are not suited for your > problem perhaps you should look at an embedded operating system? Or DOS > even. > > Cheers, > Jeff > > > > > At least 2 pages appear to be wasted by exit(), because it brings in a > > chunk of stdio, which uses 84 bytes of data and 316 bytes of bss. The > > libc implementors clearly don't care about 316 bytes of memory, so why > > don't they make those 316 bytes static? Why doesn't the compiler > > automatically merge some bss into data when that saves a page? Why can't > > I omit exit(), manually or automatically, when it's unreachable? > > > > Furthermore, malloc() appears to chew up a whole new page of DRAM for > > each allocation, plus another page---is this counted in VSZ?---for an > > anonymous mmap. Would it really be that difficult to fit 1076 bytes of > > requested memory into the 3000-odd bytes available at the end of bss? > > > > I sure hope that there's some better explanation for the remaining 32 > > pages than ``Well, we decided to allocate 131072 bytes of memory for the > > stack,'' especially when I'm hard-limiting the stack to 4K before exec. > > > > ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, > > Statistics, and Computer Science, University of Illinois at Chicago > > _______________________________________________ > > freebsd-performance@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-performance > > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" > > > > _______________________________________________ > freebsd-performance@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-performance > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" > From owner-freebsd-performance@FreeBSD.ORG Tue Jun 24 17:45:27 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3DA4B37B401 for ; Tue, 24 Jun 2003 17:45:27 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 682D643FE0 for ; Tue, 24 Jun 2003 17:45:26 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h5P0jO138436; Tue, 24 Jun 2003 20:45:24 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Tue, 24 Jun 2003 20:45:24 -0400 (EDT) From: Jeff Roberson To: "D. J. Bernstein" In-Reply-To: <20030623030004.40078.qmail@cr.yp.to> Message-ID: <20030624203536.D17881-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 00:45:27 -0000 I understand you have some history with the project. 
As such, I believe you deserve a more thorough answer. More replies inline. On 23 Jun 2003, D. J. Bernstein wrote: > alloca() is inadequate. Memory allocation doesn't always follow a stack > pattern. I certainly can't predict memory use at the top of main(). > > sbrk() is dangerous. Suppose a system call that I'm using turns into a > library routine that calls malloc()---this has happened to poll() and to > socket(), so it could happen again. Suppose the malloc() implementor > assumes that the memory he gets from sbrk() is contiguous. Kaboom. poll() does not call malloc() in libc or libc_r. socket() only calls malloc() in libc_r. libthr does not call malloc() in either case. I'm not sure about libkse. Could you clarify this point? I agree, I wouldn't use sbrk() either. I'd use malloc(). If you have a small number of fixed allocations, declaring them in the bss for maximum compaction is an alternative as well. No fragmentation, no allocator overhead, etc. > (If the sbrk() manual page said that this was the malloc() implementor's > fault, maybe I'd risk it. But the manual page clearly says I shouldn't. > ``Duh, why would anyone want to use more than one allocator library?'') I'm not really sure why they would. It seems like it would lead to increased inefficiencies and fragmentation. In the kernel we keep moving towards fewer allocators, not more, for this exact reason. > Even if I get malloc() under control somehow, I'll still have 30-odd > unexplained VM pages going down the tubes. Is this, in fact, the stack? > Why doesn't it start at 4K? > It does start at 4K of allocated pages. I think I discussed this in my other email. With regard to the other pages that have been allocated to this process: I remembered this today while thinking about another problem. FreeBSD prefaults, mapping pages that are already resident in memory into an address space before they are touched. This significantly reduces the number of faults in many applications due to spatial locality.
This probably accounts for the memory in use that has not actually been referenced. FreeBSD and ELF just aren't well suited to extremely tiny binaries. I hope this answers your questions. > ---D. J. Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago > _______________________________________________ > freebsd-performance@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-performance > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" > From owner-freebsd-performance@FreeBSD.ORG Tue Jun 24 23:05:58 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 092C537B401 for ; Tue, 24 Jun 2003 23:05:58 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 5F1EB43F85 for ; Tue, 24 Jun 2003 23:05:57 -0700 (PDT) (envelope-from djb-dsn-1056521189.51087@cr.yp.to) Received: (qmail 51088 invoked by uid 1017); 25 Jun 2003 06:06:29 -0000 Date: 25 Jun 2003 06:06:29 -0000 Message-ID: <20030625060629.51087.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030623030004.40078.qmail@cr.yp.to> <20030624203536.D17881-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 06:05:58 -0000 As I said, I don't particularly care about the text segment. I'm not talking about ten thousand separate programs.
Why does the memory manager keep the stack separate from data? Suppose a program has 1000 bytes of data+bss. You could organize VM as follows:

            0x7fffac18  0x7fffb000           0x80000000
    <---- stack    data+bss    text, say 5 pages    heap ---->

As long as the stack doesn't chew up more than 3096 bytes and the heap isn't used, there's just one page per process.

As for page tables: Instead of allocating space for a bunch of nearly identical page tables, why not overlap page tables, with the changes copied on a process switch?

As for 39 pages of VM, mostly stack: Can the system actually allocate 390000 pages of VM? I'm only mildly concerned with the memory-management time; what bothers me is the loss of valuable address space. I hope that this 128-kilobyte stack carelessness doesn't reflect a general policy of dishonest VM allocation (``overcommitment''); I need to be able to preallocate memory with proper error detection, so that I can guarantee the success of subsequent operations.

As for malloc()'s careless use of memory: Is it really asking so much that a single malloc(1) not be expanded by a factor of 16384?

Here's a really easy way to improve malloc(). Apparently, right now, there's no use of the space between the initial brk and the next page boundary. Okay: allocate that space in the simplest possible way---

    static wherewenormallystart = 0;
    static freebie;

    malloc(n)
    {
      if (!wherewenormallystart) {
        wherewenormallystart = rounduptopage(sbrk(0));
        freebie = wherewenormallystart - rounduptoalign(sbrk(0));
      }
      n = rounduptoalign(n);
      if (n < freebie) {
        freebie -= n;
        if (sbrk(0) <= wherewenormallystart)
          brk(wherewenormallystart - freebie);
        return wherewenormallystart - freebie - n;
      }
      do what we normally do;
    }

    free(x)
    {
      if (x < wherewenormallystart) return;
      do what we normally do;
    }

---with no waste of space and practically no waste of time.
Maybe add 8192 to wherewenormallystart; this is lots of room for people who know how to write small programs, and the cost is unnoticeable for people who don't. (Quite a few of my programs simulate this effect by checking for space in a bss array, typically 2K. But setting aside the right amount of space would mean compiling, inspecting the brk alignment, and recompiling. I also feel bad chewing up space on systems where malloc() actually knows what it's doing.) As for the safety of writing code that makes malloc() fail horribly: After the Solaris treatment of BSD sockets, and the ``look, Ma, I can make an only-slightly-broken imitation of poll() using select()!'' epidemic, I don't trust OS distributors to reserve syscall names for actual syscalls. I encounter more than enough portability problems without going out of my way to look for them. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 00:06:57 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2400337B401 for ; Wed, 25 Jun 2003 00:06:57 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3E59743FFD for ; Wed, 25 Jun 2003 00:06:56 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h5P76sG43107; Wed, 25 Jun 2003 03:06:54 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Wed, 25 Jun 2003 03:06:54 -0400 (EDT) From: Jeff Roberson To: "D. J. 
Bernstein" In-Reply-To: <20030625060629.51087.qmail@cr.yp.to> Message-ID: <20030625023621.N17881-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 07:06:57 -0000 On 25 Jun 2003, D. J. Bernstein wrote: > As I said, I don't particularly care about the text segment. I'm not > talking about ten thousand separate programs. > > Why does the memory manager keep the stack separate from data? Suppose a > program has 1000 bytes of data+bss. You could organize VM as follows:
>
>         0x7fffac18  0x7fffb000           0x80000000
> <---- stack    data+bss    text, say 5 pages    heap ---->
This is a layout that is chosen by some 64-bit architectures, Alpha for example. The problem is that on Alpha you have a LOT of address space and so it has many options for placing shared libraries. On x86, if you place them roughly in the middle, you take space away from heap and stack equally. Furthermore, text is typically linked to run relative to address 0. This isn't up to the operating system. This is up to the tool chain and object format. In some cases it is up to the ABI. The other problem with this arrangement is that it restricts the heap size. On FreeBSD this would leave you with 1GB of heap and nearly 2GB of stack. Perhaps you use your stack differently than I do, but that does not sound so appealing. > > As long as the stack doesn't chew up more than 3096 bytes and the heap > isn't used, there's just one page per process. Except that the operating system needs a stack too. That's several pages. And the uarea adds another page. And the proc structure, and the vm space, and the file descriptor table, and the thread structures now that FreeBSD is multithreaded.
That's probably another 20KB or so on x86. The minor savings in user space are far outweighed by the kernel usage. Amdahl would have something to say about that. Furthermore, the VM treats stack pages and data pages differently. It also treats bss pages differently. Sure, you could fit them all in if you wrote special-case code to handle this situation, but how often does it really occur? I'm guessing just about never for almost all applications that FreeBSD is used for. This is a general-purpose operating system that needs to work for normal cases. > As for page tables: Instead of allocating space for a bunch of nearly > identical page tables, why not overlap page tables, with the changes > copied on a process switch? They aren't nearly identical. They point at different pages. You can't overlap them unless you have 4MB of aligned mapped pages that are identical across two processes, as is the case with large shared memory segments. Again, I think you would do well to read up on MMUs and paging hardware. If I gave two processes the same page directory and page tables, they would overwrite each other's memory! > As for 39 pages of VM, mostly stack: Can the system actually allocate > 390000 pages of VM? I'm only mildly concerned with the memory-management There is no special allocation for virtual address space that is contiguous with another region. It is simply the upper bound on an address. The system can allocate more VM than the system has swap and physical memory. The system can allocate more VM than available disk space if you ask for the right thing in the right number of processes. 390000 pages is only 1.5 gigs. You could allocate that many pages in one process on x86. > time; what bothers me is the loss of valuable address space.
I hope that > this 128-kilobyte stack carelessness doesn't reflect a general policy of > dishonest VM allocation (``overcommitment''); I need to be able to > preallocate memory with proper error detection, so that I can guarantee > the success of subsequent operations. You need to look at the situation realistically. FreeBSD is not being developed for your mythical one-page process. It's developed for real applications that use up stack space. That limit is set so that in the common case we don't have to do an expensive operation to grow the stack's map. Make the common case fast, right? I don't appreciate your tone here, especially coming from someone who obviously is not familiar with VMs. > > As for malloc()'s careless use of memory: Is it really asking so much > that a single malloc(1) not be expanded by a factor of 16384? Yes, when in the common case that extra allocation will be used later. The size of the allocation from the back end dramatically impacts the performance of malloc and the VM system. It also affects fragmentation. > Here's a really easy way to improve malloc(). Apparently, right now, > there's no use of the space between the initial brk and the next page > boundary. Okay: allocate that space in the simplest possible way--- This is fairly extreme hackery to save half a page of memory on average and take a branch mispredict the rest of the time. [code removed] > > ---with no waste of space and practically no waste of time. Maybe add Except for the most important time: developers' time. This is an absurd suggestion. > 8192 to wherewenormallystart; this is lots of room for people who know > how to write small programs, and the cost is unnoticeable for people who > don't. People who know how to write really small programs would know not to use the standard libc, or at least not the standard malloc implementation. It is designed for average programs for real systems.
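The alternative Jeff mentions earlier (declaring a small number of fixed allocations in the bss) and the 2K bss array DJB says his programs check before calling malloc() can be sketched as a tiny bump allocator. This is a minimal sketch under stated assumptions, not phkmalloc and not the patch DJB proposed: `small_alloc`, `small_free`, `in_arena`, and the 2048-byte `ARENA_SIZE` are hypothetical names and sizes. Requests are carved from a static array and rounded to `max_align_t` alignment; anything that does not fit falls back to the system malloc(), and arena space is never reused.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena size, matching the "typically 2K" bss array mentioned in the thread. */
#define ARENA_SIZE 2048
/* Round every request up to the strictest fundamental alignment. */
#define ARENA_ALIGN _Alignof(max_align_t)

/* Lives in bss: zero-filled at exec, occupies no space in the file. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Nonzero if p points into the static arena. */
int in_arena(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)arena;
    return a >= lo && a < lo + ARENA_SIZE;
}

/* Bump-allocate from the arena; fall back to malloc() when it is full. */
void *small_alloc(size_t n)
{
    if (n == 0)
        n = 1; /* give every request a distinct address */
    size_t rounded = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    /* rounded < n means the rounding overflowed; let malloc() reject it. */
    if (rounded >= n && rounded <= ARENA_SIZE - arena_used) {
        void *p = arena + arena_used;
        arena_used += rounded;
        return p;
    }
    return malloc(n);
}

/* Arena blocks are never reclaimed; everything else goes back to free(). */
void small_free(void *p)
{
    if (p != NULL && !in_arena(p))
        free(p);
}
```

DJB's five test allocations (360 + 80 + 180 + 16 + 440 = 1076 bytes) fit in the arena even after alignment padding, so they come out of the program's bss rather than out of separately mapped heap pages. Because freed arena blocks are never reclaimed, this only suits small, fixed allocation patterns like the one in this thread.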
> (Quite a few of my programs simulate this effect by checking for space > in a bss array, typically 2K. But setting aside the right amount of > space would mean compiling, inspecting the brk alignment, and > recompiling. I also feel bad chewing up space on systems where malloc() > actually knows what it's doing.) I'm sure your programs are very small. Our userland malloc is actually quite good. We have phk to thank for that. I'm sure he'd love to hear your critiques and suggestions. > As for the safety of writing code that makes malloc() fail horribly: > After the Solaris treatment of BSD sockets, and the ``look, Ma, I can > make an only-slightly-broken imitation of poll() using select()!'' > epidemic, I don't trust OS distributors to reserve syscall names for > actual syscalls. I encounter more than enough portability problems > without going out of my way to look for them. The man pages specifically warn against using brk and sbrk yourself if you're going to use malloc() and free(). You get what you deserve if you do that. > ---D. J. Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago > As I said before, it sounds like your application is better suited for DOS. I'm sure you'll find that you have much more control over the address layout of your system. 
Cheers, Jeff From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 02:27:31 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C6B4E37B401 for ; Wed, 25 Jun 2003 02:27:31 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id DB52E43FDD for ; Wed, 25 Jun 2003 02:27:30 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0fm.dialup.mindspring.com ([209.86.1.246] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19V6Yn-0000yh-00; Wed, 25 Jun 2003 02:27:26 -0700 Message-ID: <3EF96AAA.31BDB7B6@mindspring.com> Date: Wed, 25 Jun 2003 02:26:02 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "D. J. Bernstein" References: <20030623030004.40078.qmail@cr.yp.to> <20030625060629.51087.qmail@cr.yp.to> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4839089531cf3da11e8084b71cc9c1445350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 09:27:32 -0000 "D. J. Bernstein" wrote: > As I said, I don't particularly care about the text segment. I'm not > talking about ten thousand separate programs. > > Why does the memory manager keep the stack separate from data? Stack needs to be executable for the current signal trampoline code. 
FreeBSD might be able to adopt the OpenBSD "non-executable stack" code to let it do this, too, but that would really depend on the interaction with the various threads implementations, all of which have signal code that's likely to be unhappy with it. > Suppose a > program has 1000 bytes of data+bss. You could organize VM as follows:
>
>         0x7fffac18  0x7fffb000           0x80000000
> <---- stack    data+bss    text, say 5 pages    heap ---->
>
> As long as the stack doesn't chew up more than 3096 bytes and the heap > isn't used, there's just one page per process. The pages have to be on page boundaries. The absolute minimum number of pages you could have is 2, assuming you take your approach to putting the load address up high. In general, however, realize that the total virtual address space consists of (KVA + UVA = 4G). The model you propose, of throwing the heap out at 0x80000000, implies that the KVA and UVA are separate. While this is possible in theory, in practice it's a bad idea, since the uiomove/copyin/copyout code depends on the ability to run in system space and map the user address at its expected location (in fact, it is mapped there by default on kernel entry, to avoid having to establish mappings for data copies that cross protection domains). Breaking this would have some serious performance consequences, in terms of needing to create wired mappings for user pages in kernel space when copying data in and out -- and having to do explicit address translations, which would significantly damage a number of aspects of performance. So instead, the stack and heap grow towards each other, and the stack does not live in a data page. In general, the use of malloc() that is causing your primary page-count problem is tunable, via the malloc symlink flags hack, to get the total count down.
However, the count is probably never going to go below 4 pages, if there is any heap memory in use at all, no matter what you do, and 5 or more, if you count page table entries, etc., against the process. FWIW, most of these pages are phantoms -- that is, they exist as page mappings with no backing pages, and backing pages are allocated on fault at reference time. > As for page tables: Instead of allocating space for a bunch of nearly > identical page tables, why not overlap page tables, with the changes > copied on a process switch? Each page table descriptor is capable of handling 4M of memory, either directly, as a 4M page mapping (not used for user processes, except for mmap'ed device memory with proper size and alignment, if it goes out of its way to establish the mapping), or, in the typical case, by pointing to a 4K page containing PTEs. For the most part, overlap doesn't buy you anything in this case, because most programs are linked shared, which means they almost instantaneously have a delta on pages between the two processes, even if they were forked identically, as a result of the page (set) that contains the glue for the dynamic object references. Even if you were talking about a statically linked program, you would still need to worry about data pages that get written, and therefore copied under copy-on-write, as soon as you start doing stack and heap accesses, so the page mappings are not shareable. The closest you will come to shareable page mappings is going to be rfork() (where they *are* shared, but are not COW, so it does you no good in your suggested case), or the case that the data space is very, very much larger than 4M, at which point it *may* make sense to make them COW.
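Jeff's count of three page-table pages and Terry's point that each page directory entry covers 4M both follow from classic (non-PAE) i386 two-level paging: a 4K page-table page holds 1024 entries, each mapping a 4K page, so one page-table page spans 1024 * 4K = 4M. The following is a minimal sketch of that arithmetic. It assumes non-PAE i386 paging, the conventional i386 ELF text address 0x08048000, and a FreeBSD/i386 user stack top of 0xbfc00000; both addresses are assumptions for illustration, not values taken from the thread. The helper names `page_base`, `pde_index`, and `pagetable_pages` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic (non-PAE) i386 paging: 4 KB pages, 1024 entries per table. */
#define PAGE_SHIFT 12u /* 4096-byte pages */
#define PDE_SHIFT  22u /* one page-directory entry spans 1024 * 4 KB = 4 MB */

/* Start of the 4 KB page containing va. */
uint32_t page_base(uint32_t va)
{
    return va & ~((UINT32_C(1) << PAGE_SHIFT) - 1);
}

/* Which page-directory slot (and thus which page-table page) maps va. */
uint32_t pde_index(uint32_t va)
{
    return va >> PDE_SHIFT;
}

/* Pages of paging structures needed to map the given user addresses:
   one page directory plus one page-table page per distinct 4 MB slot.
   Kernel mappings shared by every process are ignored here. */
unsigned pagetable_pages(const uint32_t *va, size_t n)
{
    unsigned tables = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (pde_index(va[j]) == pde_index(va[i])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            tables++;
    }
    return 1 + tables;
}
```

With those assumed addresses, text falls in directory slot 32 and the top of the stack in slot 766, so one directory page plus two page-table pages gives the three pages Jeff describes. DJB's proposed layout (stack, then data+bss from 0x7fffac18 to 0x7fffb000, then five pages of text ending at 0x80000000) sits entirely in slot 511, so it would need two pages of paging structures as long as the heap above 0x80000000 is never touched. The same arithmetic reproduces his 3096 bytes of stack room below 0x7fffac18.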
I'm pretty sure this would not work at all for the 386, which does not support taking write faults on pages that are not marked writable when the processor is in system mode -- this is a bug in the 386 page protection implementation, which has to be worked around for security reasons by hacking the page mapping and taking a page-not-present fault, and then fixing it up based on a priori knowledge of the page mapping; failure to do this extra work would mean that someone could read or write arbitrary kernel addresses with arbitrary data, thus escalating privileges by overwriting credentials. Even throwing out the 386, I'm pretty sure that the fault on a page that contains page mappings is not defined for some circumstances, at least from my reading of the IA32 architecture manual. So to deal with this, you would probably need to move to explicit TLB shootdowns for most cases, and, again, you are screwed in the shared mappings department. For the most part, this won't work anyway, since most x86 UNIX systems don't use a separate GDT entry per process, but instead have one for the recursive mapping, one for the kernel, one for text, and one for data. Occasionally, they also support one per VM86() instance. Changing this would have the negative effect of limiting the total number of processes you could run simultaneously (see the Linux mailing list archives for why they switched away from using the TSS in order to do context switching: all TSS descriptors must reside in the GDT, which has only 8192 entries). It's not worth accepting this limit in order to obtain page table sharing for a minority of tasks, since you seem to be interested in small tasks which could not benefit from doing this anyway. I really recommend:

    Protected Mode Software Architecture
    Tom Shanley
    MindShare, Inc.
    Addison-Wesley Publishing Company
    ISBN: 0-201-55447-X

Specifically:

    Chapter 8: Code Segments
    Chapter 9: Data and Stack Segments
    Chapter 11: Mechanics of a Task Switch
    Chapter 13: Virtual Paging

> As for 39 pages of VM, mostly stack: Can the system actually allocate > 390000 pages of VM? I'm only mildly concerned with the memory-management > time; what bothers me is the loss of valuable address space. I hope that > this 128-kilobyte stack carelessness doesn't reflect a general policy of > dishonest VM allocation (``overcommitment''); I need to be able to > preallocate memory with proper error detection, so that I can guarantee > the success of subsequent operations. It is, in fact, memory overcommit. It would take some work to disable this. BSD systems have used memory overcommit ever since they adopted the Mach VM architecture. > As for malloc()'s careless use of memory: Is it really asking so much > that a single malloc(1) not be expanded by a factor of 16384? You need to ask malloc() for this behaviour explicitly; it's not implicit in the default system configuration. The primary reason for this is that most programmers who have been trained in the last two decades believe that "All the world's a VAX"; e.g.: "memory is free" and "Intel Byte order is right and network byte order is wrong", and other untruths. Given this reality, most programs these days are what a lot of us who were trained up when 32K was an *amazing* amount of memory would politely call "bloated pigs". It makes sense, in that case, to be aggressive in allocation of memory, in the expectation that programs will tend to be memory-unconscious enough that they will tend to use everything you preallocate on their behalf. [ ... default malloc behaviour complaints that are controllable via flags settings ... ] > (Quite a few of my programs simulate this effect by checking for space > in a bss array, typically 2K.
But setting aside the right amount of > space would mean compiling, inspecting the brk alignment, and > recompiling. I also feel bad chewing up space on systems where malloc() > actually knows what it's doing.) I defy you to name one modern UNIX variant that doesn't overcommit memory. I doubt there is a malloc() extant on these systems that "actually knows what it's doing". BTW: I would class your suggestion of COW page table sharing as a type of memory overcommit, as well. If you can find a counterexample, the logical thing to do is to port it and carry it around with your code, if you need that type of behaviour, since most libc malloc() implementations are defined to use weak symbols, to permit you to override their malloc/free implementation completely. So long as your implementation conforms to POSIX, everything else (strsave(3), et al.) should "just work" with the replacement. > As for the safety of writing code that makes malloc() fail horribly: > After the Solaris treatment of BSD sockets, and the ``look, Ma, I can > make an only-slightly-broken imitation of poll() using select()!'' > epidemic, I don't trust OS distributors to reserve syscall names for > actual syscalls. I encounter more than enough portability problems > without going out of my way to look for them. Malloc isn't, and never has been, a system call on POSIX/UNIX systems. It's *always* been in section 3, for as long as it's been an accepted part of the OS. Only things whose manual pages are in section 2 are system calls, and that's not a guarantee that they will stay there: basically, it's legal to implement all of POSIX on top of asynchronous system calls with a wait primitive, and provide all of the "expected" POSIX semantics in nothing more than a user space library.
-- Terry From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 02:42:29 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6DDC137B401 for ; Wed, 25 Jun 2003 02:42:29 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id B48C543FDF for ; Wed, 25 Jun 2003 02:42:28 -0700 (PDT) (envelope-from djb-dsn-1056534181.56349@cr.yp.to) Received: (qmail 56350 invoked by uid 1017); 25 Jun 2003 09:43:01 -0000 Date: 25 Jun 2003 09:43:01 -0000 Message-ID: <20030625094301.56349.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 09:42:29 -0000 In case my subject line isn't clear enough: I'm not talking about browsers. I'm not talking about programs that ``use up stack space,'' or that would feel even slightly ``restricted'' by a 1GB limit on the heap, or that use threads, or that allocate many file descriptors per process. I'm talking about _small_ processes. I'm talking about programs that might have quite a bit of code, and might read and write quite a lot of data, but that don't use much memory per process. The problem is that I'm talking about ten thousand of these processes running at once. In this situation, if malloc() fritters away 24K of RAM in one process, it's actually frittering away more than 234 _megabytes_ of RAM. 
This memory is _not_ going to be ``used later.'' The behavior of malloc() here is not ``quite good.'' For the program that prompted this thread, malloc() achieved 4.3% fill, 95.7% external fragmentation. In this situation, squeezing variables into a single page---rather than spreading them among the first few bytes of several separate pages--- often produces huge improvements in CPU cache effectiveness. Yes, I know that doesn't make a difference for your browser; I'm not talking about your browser. In this situation, if the kernel is maintaining pages of memory that look like process 1: 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... process 2: 4 5 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... process 3: 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... then it's wasting an incredible amount of space. As I said before: Instead of allocating space for a bunch of nearly identical page tables (or anything else showing this type of waste), why not overlap page tables, with the changes copied on a process switch? ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 03:29:31 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 45BD037B401 for ; Wed, 25 Jun 2003 03:29:31 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9A64043FDD for ; Wed, 25 Jun 2003 03:29:30 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id 7F18F2ED40E; Wed, 25 Jun 2003 03:29:30 -0700 (PDT) Date: Wed, 25 Jun 2003 03:29:30 -0700 From: Jon Mini To: "D. J. 
Bernstein" Message-ID: <20030625102930.GI55678@elvis.mu.org> References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> <20030625094301.56349.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030625094301.56349.qmail@cr.yp.to> User-Agent: Mutt/1.4.1i cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 10:29:31 -0000 D. J. Bernstein [djb@cr.yp.to] wrote : > In case my subject line isn't clear enough: I'm not talking about > browsers. I'm not talking about programs that ``use up stack space,'' or > that would feel even slightly ``restricted'' by a 1GB limit on the heap, > or that use threads, or that allocate many file descriptors per process. However, you *are* talking about modifying the behaviour of a system that normally runs processes like this. This is a general-purpose operating system. We have to consider the impact of any changes we make on all types of loads, but primarily the most common ones. Especially the most common performance-critical ones. That means processes which do not behave at all like what you are talking about. If FreeBSD primarily ran very small processes that only consumed a few kilobytes of state, then it would be designed very differently, and the problem you are describing would not exist. However, that is not the case. > I'm talking about _small_ processes. I'm talking about programs that > might have quite a bit of code, and might read and write quite a lot of > data, but that don't use much memory per process. The problem is that > I'm talking about ten thousand of these processes running at once. 
You are going to run into a far larger list of problems when trying to run on the order of tens of thousands of processes on FreeBSD. The fact of the matter is not that FreeBSD might have poorly designed implementation details, but rather that a large-scale architectural decision was made among the UNIX community at large to, simply put, not support that. The fact of the matter is that a process is a heavy-weight entity. The resource consumption you are describing in userland is the least of your worries. In the kernel many more resources are consumed on a per-process basis and, unfortunately, there are a handful of algorithms that are going to have poor growth characteristics with that many processes active on the system. The impact on the scheduling subsystem alone is probably worth several months of research. You are not going to find much of a positive response here for your use case. What you suggest would take many hours of development time in order to provide gain only on use cases that are (a) not common, and (b) not helpful. Simply put, we don't want users to make tens of thousands of small processes. This, among other things, is what threads are for. > In this situation, if malloc() fritters away 24K of RAM in one process, > it's actually frittering away more than 234 _megabytes_ of RAM. This > memory is _not_ going to be ``used later.'' The behavior of malloc() > here is not ``quite good.'' For the program that prompted this thread, > malloc() achieved 4.3% fill, 95.7% external fragmentation. Congratulations! You've successfully shown that there is no "One True Memory Allocator" that allocates perfectly in all cases, and that memory fragmentation can become terribly wasteful! Our malloc(3) exists in order to solve a very different sort of problem than the one you are trying to solve: the common case for FreeBSD, which is large processes that perform many allocations of wildly varying sizes.
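The scaling argument in the quoted text is plain multiplication: per-process waste times the number of processes. Below is a minimal helper. It assumes "24K" means 24 × 1024 bytes and "megabytes" means 2^20 bytes. `total_waste_mib` is a hypothetical name.

```c
#include <stdint.h>

/* RAM wasted across all processes, in MiB (2^20 bytes), when each of
   nprocs processes wastes per_proc_bytes. */
double total_waste_mib(uint64_t per_proc_bytes, uint64_t nprocs)
{
    return (double)(per_proc_bytes * nprocs) / (1024.0 * 1024.0);
}
```

`total_waste_mib(24 * 1024, 10000)` evaluates to 234.375, which matches the "more than 234 megabytes" figure.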
I suggest you read Poul-Henning Kamp's paper on his malloc(3) implementation and the problems he was trying to solve. You can find a copy in /usr/share/doc/papers/malloc.ascii.gz. > In this situation, squeezing variables into a single page---rather than > spreading them among the first few bytes of several separate pages--- > often produces huge improvements in CPU cache effectiveness. Yes, I know > that doesn't make a difference for your browser; I'm not talking about > your browser. You have an economy of scale problem here. One moment, you argue that you want to create a system where you have tens of thousands of processes, and that wasting a few pages per process is unacceptable because there are so many of them. Then, in the next moment, you say that this is also bad because of cache behaviour? I'm sorry, but you are way off here. First of all, caches are *much larger* than the size of the processes you are talking about. Second of all, every time you perform a context switch between processes, you must flush your cache. Given that you are talking about "tens of thousands" of processes, one would imagine that you would be context-switching constantly. Cache behaviour would definitely be a problem in that system, but it's not because of malloc(3). It sounds very much like you are trying to use the wrong solution here, and that what you want to do will never work well. Perhaps a better approach would be to try using (in order of weight): finite automata, co-routines, a cooperative user-land thread scheduler, or kernel-assisted threading (KSE or thr)? On x86 specifically, you could hand-roll yourself a more efficient protection mechanism using segments, but even there you are going to run into problems with the number of segment descriptors available. > In this situation, if the kernel is maintaining pages of memory that > look like > > process 1: 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > process 2: 4 5 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
> process 3: 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... > > then it's wasting an incredible amount of space. As I said before: > Instead of allocating space for a bunch of nearly identical page tables > (or anything else showing this type of waste), why not overlap page > tables, with the changes copied on a process switch? I am trying to understand what you are saying here, but it is hard, because the picture you drew does not match the way that page tables actually work. So, to paraphrase: Problem: The majority of entries in these page directories are zero. That is to say, they are very sparse. Solution: Dynamically create a page table on every context switch from a compressed state description of the mappings (or perhaps from the vm object list). Is this what you are proposing? > ---D. J. Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago -- Jonathan Mini http://www.freebsd.org/ From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 05:41:44 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4812837B401 for ; Wed, 25 Jun 2003 05:41:44 -0700 (PDT) Received: from silver.he.iki.fi (silver.he.iki.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id E1FC243FBD for ; Wed, 25 Jun 2003 05:41:42 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from PETEX31 (gprs-prointernet-3e47d055.mobile.inet.fi [62.71.208.85]) by silver.he.iki.fi (8.12.9/8.11.4) with SMTP id h5PCfTsL083042; Wed, 25 Jun 2003 15:41:33 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID:
<009901c33b17$1a5090c0$10d4473e@PETEX31> From: "Petri Helenius" To: "D. J. Bernstein" , References: <20030625060629.51087.qmail@cr.yp.to><20030625023621.N17881-100000@mail.chesapeake.net> <20030625094301.56349.qmail@cr.yp.to> Date: Wed, 25 Jun 2003 14:39:52 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 12:41:44 -0000 > or that use threads, or that allocate many file descriptors per process. > > I'm talking about _small_ processes. I'm talking about programs that > might have quite a bit of code, and might read and write quite a lot of > data, but that don't use much memory per process. The problem is that > I'm talking about ten thousand of these processes running at once. > Why you want processes and not use threads? Running 10000 processes "at once" requires 10000 execution units, I don´t think FreeBSD will run on anything with more than 16 CPUs at this time. 
Pete From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 15:52:36 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D5BC937B401 for ; Wed, 25 Jun 2003 15:52:36 -0700 (PDT) Received: from possum.cryptohill.net (cambist.cryptohill.net [24.244.145.35]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD19643FE5 for ; Wed, 25 Jun 2003 15:52:35 -0700 (PDT) (envelope-from jeroen@vangelderen.org) Received: from vangelderen.org (grolsch.cryptohill.net [24.244.145.13]) by possum.cryptohill.net (Postfix) with ESMTP id D24BCAE0E3; Wed, 25 Jun 2003 18:52:33 -0400 (EDT) Date: Wed, 25 Jun 2003 18:52:31 -0400 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Mime-Version: 1.0 (Apple Message framework v552) To: "Petri Helenius" From: "Jeroen C. van Gelderen" In-Reply-To: <009901c33b17$1a5090c0$10d4473e@PETEX31> Message-Id: Content-Transfer-Encoding: quoted-printable X-Mailer: Apple Mail (2.552) cc: freebsd-performance@freebsd.org cc: "D. J. Bernstein" Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 22:52:37 -0000 On Wednesday, Jun 25, 2003, at 08:39 US/Eastern, Petri Helenius wrote: >> or that use threads, or that allocate many file descriptors per >> process. >> >> I'm talking about _small_ processes. I'm talking about programs that >> might have quite a bit of code, and might read and write quite a lot >> of >> data, but that don't use much memory per process. The problem is that >> I'm talking about ten thousand of these processes running at once. >> > > Why you want processes and not use threads?
Running 10000 processes > "at once" requires 10000 execution units, I don´t think FreeBSD will > run on > anything with more than 16 CPUs at this time. Fault isolation? -J -- Jeroen C. van Gelderen - jeroen@vangelderen.org A single glass of beer was passed, from which I was the last one to sip - a ritual signifying that I was not to be poisoned. From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 15:57:59 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 26EC737B401 for ; Wed, 25 Jun 2003 15:57:59 -0700 (PDT) Received: from pop018.verizon.net (pop018pub.verizon.net [206.46.170.212]) by mx1.FreeBSD.org (Postfix) with ESMTP id CCDD04402A for ; Wed, 25 Jun 2003 15:57:56 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([141.149.47.46]) by pop018.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030625225755.JIFM11703.pop018.verizon.net@mac.com>; Wed, 25 Jun 2003 17:57:55 -0500 Message-ID: <3EFA28EF.9050400@mac.com> Date: Wed, 25 Jun 2003 18:57:51 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030612 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "D. J.
Bernstein" References: <20030623030004.40078.qmail@cr.yp.to> <20030624203536.D17881-100000@mail.chesapeake.net> <20030625060629.51087.qmail@cr.yp.to> In-Reply-To: <20030625060629.51087.qmail@cr.yp.to> X-Enigmail-Version: 0.76.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at pop018.verizon.net from [141.149.47.46] at Wed, 25 Jun 2003 17:57:54 -0500 cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 22:57:59 -0000 D. J. Bernstein wrote: [ ... ] > Why does the memory manager keep the stack separate from data? Suppose a > program has 1000 bytes of data+bss. You could organize VM as follows: > > 0x7fffac18 0x7fffb000 0x80000000 > <---- stack data+bss text, say 5 pages heap ----> > > As long as the stack doesn't chew up more than 3096 bytes and the heap > isn't used, there's just one page per process. Remember that VMM hardware requires page-alignment: TEXT should be on pages marked X (or RX if the local architecture needs it), DATA+BSS should be RW, and I think FreeBSD needs the stack to be RWX. We need to consider the kernel's address space-- 32-bit systems generally reserve the top 2GB, or sometimes less, exclusively for the kernel. 
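Chuck's page-alignment point and DJB's proposed layout both come down to rounding each region up to whole pages. In the quoted layout, 1000 bytes of data+bss share a page with up to 3096 bytes of stack. The sketch below assumes 4096-byte pages and ignores text, guard pages and the heap. The function names are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumed i386 page size */

/* Number of whole pages needed to hold n bytes. */
static size_t pages_for(size_t n)
{
    return (n + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}

/* Pages used when the stack and data+bss each get their own
   page-aligned region (the conventional layout). */
size_t pages_separate(size_t stack_bytes, size_t data_bss_bytes)
{
    return pages_for(stack_bytes) + pages_for(data_bss_bytes);
}

/* Pages used when data+bss ends at a page boundary and the stack grows
   down from just below it, so the two share pages (and protection bits). */
size_t pages_shared(size_t stack_bytes, size_t data_bss_bytes)
{
    return pages_for(stack_bytes + data_bss_bytes);
}
```

With the quoted numbers, `pages_separate(3096, 1000)` is 2 and `pages_shared(3096, 1000)` is 1. A shared page, however, has a single set of protection bits. Under this layout, data+bss would inherit whatever the stack needs (RW, or RWX if the stack must be executable).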
Besides, most programs are probably not built as PIC and could not have their starting address relocated arbitrarily, although perhaps it would be interesting to consider the following process address map:

VM Address        Usage
0x0               PAGEZERO
0x4000            XXX bytes reserved per the hard limit to process stack size
XXX (+ 0x4000)    TEXT segment
YYY               DATA + BSS
ZZZ               heap
0x80000000        KVA

> As for page tables: Instead of allocating space for a bunch of nearly > identical page tables, why not overlap page tables, with the changes > copied on a process switch? Mach uses copy-on-write for VMO's associated with the virtual address space used by processes, which are similar abstractions to the page table entries used under classic BSD. From "man vmmap": The share mode describes whether pages are shared between processes, and what happens when pages are modified. Private pages (PRV) are pages only visible to this process. They are allocated as they are written to, and can be paged out to disk. Copy-on-write (COW) pages are shared by multiple processes (or shared by a single process in multiple locations). When the page is modified, the writing process then receives its own copy of the page. Empty (NUL) sharing implies that the page does not really exist in physical memory. Aliased (ALI) and shared (SHM) memory is shared between processes. The share mode typically describes the general mode controlling the region. For example, as copy-on-write pages are modified, they become private to the application. Even with the private pages, the region is still COW until all pages become private. Once all pages are private, then the share mode would change to private. The far left column names the purpose of the memory: text segment, data segment, allocated via malloc, stack, etc. For regions loaded from binaries, the far right shows the library loaded into the memory. Some lines in vmmap's output describe submaps.
A submap is a shared set of virtual memory page descriptions that the operating system can reuse between multiple processes. The memory between 0x70000000 and 0x80000000, for example, is a submap containing the most common dynamic libraries. Submaps minimize the operating system's memory usage by representing the virtual memory regions only once. Submaps can either be shared by all processes (machine-wide) or local to the process (process-only). If the contents of a machine-wide submap are changed -- for example, the debugger makes a section of memory for a dylib writable so it can insert debugging traps -- then the submap becomes local, and the kernel will allocate memory to store the extra copy.

8-cube# vmmap 252
==== Non-writable regions for process 252
__PAGEZERO                          0 [   4K] ---/--- SM=NUL  syslogd
__TEXT                           1000 [  20K] r-x/rwx SM=COW  syslogd
__LINKEDIT                       7000 [   4K] r--/rwx SM=COW  syslogd
Submap               90000000-9fffffff        r--/r-- machine-wide submap
__TEXT                       90000000 [ 932K] r-x/r-x SM=COW  ...System.B.dylib
__LINKEDIT                   900e9000 [ 260K] r--/r-- SM=COW  ...System.B.dylib
__TEXT                       93a40000 [  20K] r-x/r-x SM=COW  ...Common.A.dylib
__LINKEDIT                   93a45000 [   4K] r--/r-- SM=COW  ...Common.A.dylib
Submap               a000b000-a3a3ffff        r--/r-- process-only submap
Submap               a3a41000-afffffff        r--/r-- process-only submap
                             aff80000 [ 512K] r--/r-- SM=SHM
==== Writable regions for process 252
__DATA                           6000 [   4K] rw-/rwx SM=PRV  syslogd
MALLOC_USED(DefaultMallocZone_   8000 [  20K] rw-/rwx SM=COW
MALLOC_USED(DefaultMallocZone_   d000 [   4K] rw-/rwx SM=ZER
MALLOC_USED(DefaultMallocZone_   e000 [   4K] rw-/rwx SM=COW
MALLOC_FREE(DefaultMallocZone_   f000 [ 228K] rw-/rwx SM=ZER
__TEXT                       8fe00000 [ 288K] rw-/rwx SM=COW  /usr/lib/dyld
__DATA                       8fe48000 [   8K] rw-/rwx SM=COW  /usr/lib/dyld
__DATA                       8fe4a000 [   4K] rw-/rwx SM=COW  /usr/lib/dyld
__DATA                       8fe4b000 [   4K] rw-/rwx SM=ZER  /usr/lib/dyld
__DATA                       8fe4c000 [  12K] rw-/rwx SM=COW  /usr/lib/dyld
__DATA                       8fe4f000 [ 144K] rw-/rwx SM=ZER  /usr/lib/dyld
__LOCK                       8fe73000 [   4K] rw-/rwx SM=NUL  /usr/lib/dyld
__LINKEDIT                   8fe74000 [  44K] rw-/rwx SM=COW  /usr/lib/dyld
Submap               90000000-9fffffff        r--/r-- machine-wide submap
__DATA                       a0000000 [   4K] rw-/rw- SM=ZER  ...System.B.dylib
__DATA                       a0001000 [   4K] rw-/rw- SM=COW  ...System.B.dylib
__DATA                       a0002000 [  20K] rw-/rw- SM=COW  ...System.B.dylib
__DATA                       a0007000 [  16K] rw-/rw- SM=PRV  ...System.B.dylib
Submap               a000b000-a3a3ffff        r--/r-- process-only submap
__DATA                       a3a40000 [   4K] rw-/rw- SM=COW  ...Common.A.dylib
Submap               a3a41000-afffffff        r--/r-- process-only submap
STACK[0]                     bff80000 [ 508K] rw-/rwx SM=PRV
                             bffff000 [   4K] rw-/rwx SM=PRV

==== Legend
SM=sharing mode:
        COW=copy_on_write PRV=private NUL=empty ALI=aliased
        SHM=shared ZER=zero_filled S/A=shared_alias

==== Summary for process 252
ReadOnly portion of Libraries: Total=1572KB resident=1444KB(92%) swapped_out_or_unallocated=128KB(8%)
Writable regions: Total=968KB written=40KB(4%) resident=88KB(9%) swapped_out=0KB(0%) unallocated=880KB(91%)

> As for 39 pages of VM, mostly stack: Can the system actually allocate > 390000 pages of VM? I believe 390000 4K pages is 1523 MB: if you've got the datasize resource limit set high enough and you've got the RAM or swap space available, the answer to your question should be yes. > I'm only mildly concerned with the memory-management > time; what bothers me is the loss of valuable address space. I hope that > this 128-kilobyte stack carelessness doesn't reflect a general policy of > dishonest VM allocation (``overcommitment''); I need to be able to > preallocate memory with proper error detection, so that I can guarantee > the success of subsequent operations. Preallocate at compile time, or preallocate at process run time? > As for malloc()'s careless use of memory: Is it really asking so much > that a single malloc(1) not be expanded by a factor of 16384? > > Here's a really easy way to improve malloc(). Apparently, right now, > there's no use of the space between the initial brk and the next page > boundary.
Okay: allocate that space in the simplest possible way--- It's easy to write a memory allocator that performs a specific case well; writing a general purpose malloc is significantly more complicated, and FreeBSD's malloc is tuned for programs which are much larger than your example. If you know of a malloc() implementation that does better than FreeBSD's, and is suitable for SMP systems, let us know. -Chuck From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 19:04:25 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C79AB37B401 for ; Wed, 25 Jun 2003 19:04:25 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 07F1543FDD for ; Wed, 25 Jun 2003 19:04:23 -0700 (PDT) (envelope-from djb-dsn-1056593095.51862@cr.yp.to) Received: (qmail 51863 invoked by uid 1017); 26 Jun 2003 02:04:55 -0000 Date: 26 Jun 2003 02:04:55 -0000 Message-ID: <20030626020455.51862.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030623030004.40078.qmail@cr.yp.to> <20030624203536.D17881-100000@mail.chesapeake.net> <20030625060629.51087.qmail@cr.yp.to> <3EFA28EF.9050400@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 02:04:26 -0000 Chuck Swiger writes: > Remember that VMM hardware requires page-alignment When I ask why the stack and data aren't put on the same page, and you say ``They aren't on the same page,'' you aren't answering the question. 
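DJB's suggested malloc() improvement uses the space between the initial break and the next page boundary. How much space that is equals the distance from the break to the next multiple of the page size. The sketch below uses hypothetical names. In a real program the break could be read with `sbrk(0)` and the page size with `sysconf(_SC_PAGESIZE)`. Here both are parameters, so the arithmetic can be checked on its own. The sketch assumes a power-of-two page size and a break that is already 16-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define SLACK_ALIGN 16   /* assumed alignment for each carved-out object */

/* Bytes from brk up to the next page boundary; 0 if brk is page-aligned.
   page_size must be a power of two. */
size_t page_slack(uintptr_t brk, size_t page_size)
{
    uintptr_t mask = (uintptr_t)page_size - 1;
    return (size_t)((page_size - (brk & mask)) & mask);
}

/* Nonzero if all requests, each rounded up to SLACK_ALIGN bytes, fit
   into slack bytes (assumes the break itself is SLACK_ALIGN-aligned). */
int fits_in_slack(const size_t *sizes, size_t count, size_t slack)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        if (sizes[i] > slack)
            return 0;
        size_t need = (sizes[i] + SLACK_ALIGN - 1) & ~(size_t)(SLACK_ALIGN - 1);
        if (need > slack - used)
            return 0;
        used += need;
    }
    return 1;
}
```

Take a break that sits 1000 bytes past a page boundary on 4096-byte pages. For it, `page_slack` returns 3096 bytes. That is enough for the five small requests in the original test program (360, 80, 180, 16 and 440 bytes, 1104 bytes after rounding).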
(As for adding an x bit to data: This obviously won't break anything.) Here's an alternative layout that doesn't move the text. Subtract the data+bss (or at least data) amount from the stack starting position, and put the data+bss (or data; but not the heap, obviously) into that space. This saves 78 megabytes of memory in the situation I'm talking about. > Mach uses copy-on-write I'm not talking about copy-on-write. I'm not talking about shared pages. I'm talking about RAM being frittered away for memory-management tables that, in this situation, could trivially be compressed by two orders of magnitude. This is not rocket science. Jon's ``dynamic page-table creation'' terminology is pretty good. Of course, for processes with many pages of process-specific memory, the page table should be cached rather than being shared among processes; I'm not suggesting any change in how browser memory is handled. > It's easy to write a memory allocator that performs a specific case well; > writing a general purpose malloc is significantly more complicated, I'm not talking about replacing malloc() with a special-purpose allocator. I'm talking about adding a tiny bit of code to malloc() to magically take advantage of space that is being ignored right now. The savings in this situation go beyond those dozens of megabytes of magically reacquired RAM. There's a nasty spike in memory usage as soon as malloc() starts extending the heap; when a program's allocations fit into the magically reacquired RAM, the program also avoids the spike. ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 19:26:50 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 398DF37B401 for ; Wed, 25 Jun 2003 19:26:50 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 7263644005 for ; Wed, 25 Jun 2003 19:26:49 -0700 (PDT) (envelope-from djb-dsn-1056594442.62942@cr.yp.to) Received: (qmail 62943 invoked by uid 1017); 26 Jun 2003 02:27:22 -0000 Date: 26 Jun 2003 02:27:22 -0000 Message-ID: <20030626022722.62942.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <009901c33b17$1a5090c0$10d4473e@PETEX31> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 02:26:50 -0000 I want separate processes for the memory protection. Each process is chrooted under its own uid, so it can't write to disk except through supplied file descriptors, and it can't hit other processes. (If I had a portable way to cut off other communication channels, such as creating new sockets, I'd do that too.) I'm willing to sacrifice one page per process for the sake of memory protection; I realize that it's hard to do better than that. But I'm not willing to casually piss away large fractions of a gigabyte of RAM. Not this decade, anyway. The lack of memory protection is exactly why I can't use threads. 
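DJB confines each process by chrooting it under its own uid. This is usually done after fork() with a short sequence of calls whose order matters. The sketch below assumes the process starts as root and that `dir`, `uid` and `gid` are dedicated to this one process. `confine` is a hypothetical name. As DJB notes, this does not stop the process from creating new sockets.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose chroot() and setgroups() */
#include <grp.h>            /* setgroups() on Linux */
#include <sys/types.h>
#include <unistd.h>         /* chdir(), chroot(), setgid(), setuid() */

/* Confine the calling process: chroot into dir, then drop to uid/gid.
   Returns 0 on success, -1 (with errno set) on the first failure. */
int confine(const char *dir, uid_t uid, gid_t gid)
{
    if (chdir(dir) == -1)
        return -1;
    if (chroot(".") == -1)            /* cwd is already inside the new root */
        return -1;
    if (setgroups(1, &gid) == -1)     /* drop inherited supplementary groups */
        return -1;
    if (setgid(gid) == -1)
        return -1;
    if (setuid(uid) == -1)            /* last: afterwards root privileges are gone */
        return -1;
    return 0;
}
```

setuid() comes last because, once root is given up, the chroot(), setgroups() and setgid() calls would fail.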
It's also why I'm not surprised to hear that processes are _slightly_ less efficient than threads. But something is seriously wrong if processes are _much_ less efficient than threads. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 19:49:58 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2B64037B401 for ; Wed, 25 Jun 2003 19:49:58 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 709D44400D for ; Wed, 25 Jun 2003 19:49:57 -0700 (PDT) (envelope-from djb-dsn-1056595829.71392@cr.yp.to) Received: (qmail 71393 invoked by uid 1017); 26 Jun 2003 02:50:29 -0000 Date: 26 Jun 2003 02:50:29 -0000 Message-ID: <20030626025029.71392.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> <20030625094301.56349.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 02:49:58 -0000 Jon Mini writes: > I'm sorry, but you are way off here. First of all, caches are *much > larger* than the size of the processes you are talking about. I'm sorry, but you are being misled by a naive model of CPU performance. 
On a typical Pentium in our department, the following program becomes three times faster when SPACING is changed from 4096 to 128:

#define SPACING 4096

/* volatile keeps a modern optimizing compiler from collapsing the loop */
volatile char data[8 * SPACING];

int main(void)
{
  int i;
  for (i = 0;i < 10000000;++i) {
    data[0] = data[SPACING];
    data[2 * SPACING] = data[3 * SPACING];
    data[4 * SPACING] = data[5 * SPACING];
    data[6 * SPACING] = data[7 * SPACING];
  }
  return 0;
}

From an asm programmer's perspective, when FreeBSD decides to spread a small program's variables between

  * the beginning of a data page,
  * the beginning of a bss page,
  * the beginning of a malloc mmap page,
  * the beginning of a heap page,
  * the beginning of the next heap page,
  * the beginning of yet another heap page, et cetera,

it is actively trying (with varying degrees of success) to damage cache performance in exactly the same way that this program does. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 20:41:07 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0EB4237B401 for ; Wed, 25 Jun 2003 20:41:07 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 6F2FF43FE3 for ; Wed, 25 Jun 2003 20:41:06 -0700 (PDT) (envelope-from djb-dsn-1056598898.88365@cr.yp.to) Received: (qmail 88366 invoked by uid 1017); 26 Jun 2003 03:41:38 -0000 Date: 26 Jun 2003 03:41:38 -0000 Message-ID: <20030626034138.88365.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J.
Bernstein" To: freebsd-performance@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: The dangers of replacing malloc() X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 03:41:07 -0000 Suppose that, taking the advice of inexperienced programmers who trumpet weak linking, I use sbrk() to write my own malloc(), free(), etc. Here's what can go horribly wrong. Suppose the OS distributor doesn't go to the effort of dealing with competition for sbrk(). This is the normal situation; it has been quite explicitly tolerated by the sbrk() documentation for several centuries. Suppose the OS distributor decides to write somedorkyosfunction() using some funky new allocation function that I haven't replaced because I haven't heard of it. Yesterday it was valloc(); tomorrow xyzalloc(). This happens all the time: look at the FreeBSD reallocf(), for example. Suppose the OS distributor decides that valloc() or xyzalloc() should do its own thing, rather than calling malloc(). This happens too: I tried the sample program shown below under Linux, and somedorkyosfunction() ended up calling brk() rather than my own malloc(). Finally, suppose the OS distributor decides that some syscall I use should be replaced by a library routine that uses somedorkyosfunction(). This happens too. Note for the reading-impaired: I'm not saying that the name malloc() has ever been used for a syscall; I'm saying that poll(), socket(), et al. have been used for allocating library routines. Result: My program innocently calls that library routine, which calls somedorkyosfunction(), which calls valloc() or xyzalloc(), which incorrectly assumes that its sbrk() results are contiguous, destroying the data allocated by my own malloc(). 
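The failure described above depends on an allocator assuming that successive sbrk() calls return adjacent memory. An allocator can at least avoid making that assumption itself: sbrk() returns the previous break, so the allocator can compare it with the end of the region it obtained last time and start a fresh region when the two differ. What follows is a minimal sketch under stated assumptions: a single-threaded process on a system that still provides sbrk() (for example glibc-based Linux; POSIX does not specify sbrk()), with hypothetical names sbrk_alloc, SB_ALIGN and SB_CHUNK. It keeps this allocator from clobbering memory that another sbrk() caller obtained in between. It cannot stop a different routine that does make the contiguity assumption from clobbering this allocator's blocks, which is the case in the message.

```c
#define _DEFAULT_SOURCE /* glibc: expose sbrk() even in strict C modes */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define SB_ALIGN 16   /* assumed alignment for returned blocks */
#define SB_CHUNK 4096 /* assumed growth granularity (one 4K page) */

static char *sb_cur; /* next free byte in the region we own */
static char *sb_end; /* one past the last byte we own; NULL before first use */

static size_t sb_round(size_t n, size_t a) { return (n + a - 1) / a * a; }

/* Hypothetical sbrk()-based bump allocator that does NOT assume the break
   is still where it left it: another routine may have called sbrk(). */
void *sbrk_alloc(size_t n)
{
    n = sb_round(n ? n : 1, SB_ALIGN);
    if (sb_end != NULL && (size_t)(sb_end - sb_cur) >= n) {
        void *p = sb_cur;          /* fits in the current region */
        sb_cur += n;
        return p;
    }
    /* Reserve SB_ALIGN extra bytes so a misaligned new region still fits n. */
    size_t grow = sb_round(n + SB_ALIGN, SB_CHUNK);
    char *old = sbrk((intptr_t)grow); /* returns the previous break */
    if (old == (char *)-1)
        return NULL;
    if (sb_end != NULL && old == sb_end) {
        sb_end += grow;            /* break had not moved: extend our region */
    } else {
        /* First call, or someone else moved the break: abandon the old
           tail and start a fresh, aligned region at the old break. */
        uintptr_t pad = (SB_ALIGN - (uintptr_t)old % SB_ALIGN) % SB_ALIGN;
        sb_cur = old + pad;
        sb_end = old + grow;
    }
    void *p = sb_cur;
    sb_cur += n;
    return p;
}
```

The check costs nothing when the break has not moved; when it has, the unused tail of the previous region is simply abandoned rather than reused.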
As I said before, I encounter more than enough portability problems without going out of my way to look for them. I wish OS distributors would put a little more thought into the needs of people who _don't_ spend their entire lives working with a single platform. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago

    #include <stdlib.h>
    #include <unistd.h>

    /* Stands in for an OS library routine that allocates via valloc(). */
    void somedorkyosfunction(void) { valloc(1); }

    /* Replacement allocator: each function only reports that it was called. */
    void *malloc(size_t n) { write(1,"malloc\n",7); return 0; }
    void *calloc(size_t n,size_t m) { write(1,"calloc\n",7); return 0; }
    void *realloc(void *x,size_t n) { write(1,"realloc\n",8); return 0; }
    void free(void *x) { write(1,"free\n",5); }

    int main(void)
    {
      malloc(1);
      realloc(0,1);
      somedorkyosfunction();
      return 0;
    }

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 21:39:06 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 12E4A37B401 for ; Wed, 25 Jun 2003 21:39:06 -0700 (PDT) Received: from ns1.xcllnt.net (209-128-86-226.BAYAREA.NET [209.128.86.226]) by mx1.FreeBSD.org (Postfix) with ESMTP id 28A3D43FFD for ; Wed, 25 Jun 2003 21:39:05 -0700 (PDT) (envelope-from marcel@xcllnt.net) Received: from dhcp01.pn.xcllnt.net (dhcp01.pn.xcllnt.net [192.168.4.201]) by ns1.xcllnt.net (8.12.9/8.12.9) with ESMTP id h5Q4ccDZ014706; Wed, 25 Jun 2003 21:38:38 -0700 (PDT) (envelope-from marcel@piii.pn.xcllnt.net) Received: from dhcp01.pn.xcllnt.net (localhost [127.0.0.1]) by dhcp01.pn.xcllnt.net (8.12.9/8.12.9) with ESMTP id h5Q4cYhn033751; Wed, 25 Jun 2003 21:38:34 -0700 (PDT) (envelope-from marcel@dhcp01.pn.xcllnt.net) Received: (from marcel@localhost) by dhcp01.pn.xcllnt.net (8.12.9/8.12.9/Submit) id h5Q4cUZe033750; Wed, 25 Jun 2003 21:38:30 -0700 (PDT) (envelope-from marcel) Date: Wed, 25 Jun 2003 21:38:30 -0700 From: Marcel Moolenaar To: "D. J.
Bernstein" Message-ID: <20030626043830.GA33650@dhcp01.pn.xcllnt.net> References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> <20030625094301.56349.qmail@cr.yp.to> <20030626025029.71392.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030626025029.71392.qmail@cr.yp.to> User-Agent: Mutt/1.5.4i cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 04:39:06 -0000

On Thu, Jun 26, 2003 at 02:50:29AM -0000, D. J. Bernstein wrote:
> Jon Mini writes:
> > I'm sorry, but you are way off here. First of all, caches are *much
> > larger* than the size of the processes you are talking about.
>
> I'm sorry, but you are being misled by a naive model of CPU performance.
> On a typical Pentium in our department, the following program becomes
> three times faster when SPACING is changed from 4096 to 128:

*snip*

> From an asm programmer's perspective, when FreeBSD decides to spread a
> small program's variables between
>
>   * the beginning of a data page,
>   * the beginning of a bss page,
>   * the beginning of a malloc mmap page,
>   * the beginning of a heap page,
>   * the beginning of the next heap page,
>   * the beginning of yet another heap page,
>
> et cetera, it is actively trying (with varying degrees of success) to
> damage cache performance in exactly the same way that this program does.

Just curious: do you happen to know if the performance hit is caused by the second order effect of having the spacing be a multiple of the cache associativity, thereby resulting in thrashing of a few cache lines, and that compacting the code results in a more uniform cache placement?
In other words: is it the spacing per se that counts or the interaction of a particular "distance" with cache placement? -- Marcel Moolenaar USPA: A-39004 marcel@xcllnt.net From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 22:15:45 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8749E37B401 for ; Wed, 25 Jun 2003 22:15:45 -0700 (PDT) Received: from bitblocks.com (bitblocks.com [209.204.185.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id F150343FEC for ; Wed, 25 Jun 2003 22:15:44 -0700 (PDT) (envelope-from bakul@bitblocks.com) Received: from bitblocks.com (localhost [127.0.0.1]) by bitblocks.com (8.12.9/8.12.9) with ESMTP id h5Q5FhPF020045; Wed, 25 Jun 2003 22:15:43 -0700 (PDT) (envelope-from bakul@bitblocks.com) Message-Id: <200306260515.h5Q5FhPF020045@bitblocks.com> To: "D. J. Bernstein" In-reply-to: Your message of "26 Jun 2003 02:50:29 -0000." <20030626025029.71392.qmail@cr.yp.to> Date: Wed, 25 Jun 2003 22:15:43 -0700 From: Bakul Shah cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 05:15:45 -0000

> On a typical Pentium in our department, the following program becomes
> three times faster when SPACING is changed from 4096 to 128:
>
> #define SPACING 4096
> char data[8 * SPACING];
> main()
> {
>   int i;
>   for (i = 0;i < 10000000;++i) {
>     data[0] = data[SPACING];
>     data[2 * SPACING] = data[3 * SPACING];
>     data[4 * SPACING] = data[5 * SPACING];
>     data[6 * SPACING] = data[7 * SPACING];
>   }
> }

When SPACING is changed from 4096 to 128, this program runs ten times faster on a Pentium Pro. It runs two times _slower_ on an Athlon XP+. It runs at about the same speed on a PIII.
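Marcel's question can be made concrete with the usual model of a set-associative cache: an address maps to set (address / line size) mod (number of sets), and each set holds only as many lines as the cache has ways. The sketch below counts how many of the benchmark's eight addresses land in the same set as the first one. The geometries are assumptions chosen for illustration, not measurements of the machines mentioned here, and the names cache_geom, cache_sets, cache_set_index and same_set_count are hypothetical. With an assumed 8 KB, 2-way cache with 32-byte lines (128 sets, so addresses 4096 bytes apart always share a set), SPACING 4096 puts all eight addresses into one 2-way set, while SPACING 128 spreads them over eight different sets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry; the values used with it are assumptions. */
struct cache_geom {
    size_t size; /* total capacity in bytes */
    size_t ways; /* associativity */
    size_t line; /* line size in bytes */
};

/* Number of sets = capacity / (ways * line size). */
size_t cache_sets(struct cache_geom g)
{
    return g.size / (g.ways * g.line);
}

/* A set-associative cache places addr in set (addr / line) mod sets. */
size_t cache_set_index(struct cache_geom g, uintptr_t addr)
{
    return (size_t)((addr / g.line) % cache_sets(g));
}

/* Count how many of base, base+spacing, ..., base+(k-1)*spacing fall into
   the same set as base. If the count exceeds g.ways, those lines keep
   evicting each other on every pass through a loop like the benchmark's. */
size_t same_set_count(struct cache_geom g, uintptr_t base,
                      size_t spacing, size_t k)
{
    size_t target = cache_set_index(g, base);
    size_t count = 0;
    for (size_t i = 0; i < k; i++)
        if (cache_set_index(g, base + i * spacing) == target)
            count++;
    return count;
}
```

With an assumed 64 KB, 2-way, 64-byte-line geometry (512 sets), the same eight addresses at SPACING 4096 already fall into eight different sets, which is one plausible way the effect of a given spacing could differ from one processor to another. The model ignores other cache levels, the TLB, and write buffering, so it is not an explanation of the timings reported in this thread.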
All one can say is performance is sensitive to data spacing + processor architecture and you have to tune your programs for a given machine. Instead of complaining about wasting 78 megabytes and arguing about why various proposed solutions fall short and why your way is the best, why don't you come up with a patch that saves space for small programs? If *you* are not willing to `casually piss away large fractions of a gigabyte of RAM' it's up to *you* to solve the problem! If you expect others to do this for you, you are going about it the wrong way. From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 23:24:14 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BD0A837B401 for ; Wed, 25 Jun 2003 23:24:14 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 27F2543FFD for ; Wed, 25 Jun 2003 23:24:14 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfirs.dialup.mindspring.com ([165.247.203.124] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19VQB2-0001Aq-00; Wed, 25 Jun 2003 23:24:12 -0700 Message-ID: <3EFA9147.6435E822@mindspring.com> Date: Wed, 25 Jun 2003 23:23:03 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "D. J. 
Bernstein" References: <20030623030004.40078.qmail@cr.yp.to> <20030624203536.D17881-100000@mail.chesapeake.net> <20030626020455.51862.qmail@cr.yp.to> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a415d3f081818e80c68e04e070ffe91d00350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 06:24:15 -0000 "D. J. Bernstein" wrote: > Chuck Swiger writes: > > Remember that VMM hardware requires page-alignment > > When I ask why the stack and data aren't put on the same page, and you > say ``They aren't on the same page,'' you aren't answering the question. > (As for adding an x bit to data: This obviously won't break anything.) I don't know if you just ignored my posting where I answered this, but I will answer it yet again: to prevent data from being made executable. The stack, at present, *must* be executable, since it must support the signal trampoline code, which is code that executes on the stack. > Here's an alternative layout that doesn't move the text. Subtract the > data+bss (or at least data) amount from the stack starting position, and > put the data+bss (or data; but not the heap, obviously) into that space. > This saves 78 megabytes of memory in the situation I'm talking about. The stack and data cannot share the same page, while leaving the stack executable and the data not. I am also not sure how it is we are supposed to judge the maximum stack size at compile/link time. > > Mach uses copy-on-write > > I'm not talking about copy-on-write. I'm not talking about shared pages. 
> I'm talking about RAM being frittered away for memory-management tables > that, in this situation, could trivially be compressed by two orders of > magnitude. This is not rocket science. I don't believe this is explicitly possible, due to architectural constraints. You either get ~1,810 processes less than you want, because you use a GDT entry per process, or you don't get write fault notification for the purposes of implementing COW for the page tables. -- Terry From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 00:12:58 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 81CAD37B49D for ; Thu, 26 Jun 2003 00:12:58 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9CF654401A for ; Thu, 26 Jun 2003 00:12:55 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfirs.dialup.mindspring.com ([165.247.203.124] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19VQw7-0004CN-00; Thu, 26 Jun 2003 00:12:52 -0700 Message-ID: <3EFA9CA3.E616B54C@mindspring.com> Date: Thu, 26 Jun 2003 00:11:31 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "D. J. 
Bernstein" References: <20030626034138.88365.qmail@cr.yp.to> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a46dd6d62f5fcbd04d162560863da2533ca473d225a0f487350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@freebsd.org Subject: Re: The dangers of replacing malloc() X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 07:12:59 -0000 "D. J. Bernstein" wrote: > Suppose that, taking the advice of inexperienced programmers who trumpet > weak linking, I use sbrk() to write my own malloc(), free(), etc. Here's > what can go horribly wrong. If you are referring to me, I have 24 years of professional software engineering experience and 20 years of UNIX kernel experience. 8-). But let's address your issues... > Suppose the OS distributor doesn't go to the effort of dealing with > competition for sbrk(). This is the normal situation; it has been quite > explicitly tolerated by the sbrk() documentation for several centuries. The sbrk(2) system call is an implementation detail for specific systems. It has nothing to do with malloc(3), other than that it is used on some systems to allocate private, anonymous memory. Other implementations use mmap(2) of /dev/zero to obtain pages of private, anonymous memory. In fact, mmap(2) *is* specified by IEEE 1003.1, while sbrk(2) is not. In other words, the implementation details of sbrk(2) are unimportant to the validity of your argument. > Suppose the OS distributor decides to write somedorkyosfunction() using > some funky new allocation function that I haven't replaced because I > haven't heard of it. Yesterday it was valloc(); tomorrow xyzalloc(). > This happens all the time: look at the FreeBSD reallocf(), for example.
The reallocf() function is a replacement-safe wrapper function; it is a red herring. Memory allocation in the C library is *defined by standards*, all of which require the ability to replace memory allocation and freeing functions with equivalents, *as a unit*. The POSIX corrigenda is very clear on this matter. For example, less of a red herring is "strdup()", as it's also defined by POSIX. But internally, it's required to use the externally visible allocation method "malloc()". > Suppose the OS distributor decides that valloc() or xyzalloc() should do > its own thing, rather than calling malloc(). Then that OS distributor's OS no longer complies with standards. Realize that programs are not written to interfaces; they are written to standards which are implemented by interfaces. The distinction is subtle, but very, very important: it means that if you write a program to a standard, and a system implements that same standard, then you are guaranteed that your program will compile and run on that system. > This happens too: I tried the sample program shown below under > Linux, and somedorkyosfunction() ended up calling brk() rather > than my own malloc(). You should complain on one of the Linux lists that Linux is in non-compliance with "IEEE POSIX 1003.1-2003 Issue 6", and ask them to please fix this function, since it is broken. The Linux people are generally reasonable and generally cognizant of standards; I'm sure if they are notified reasonably of any non-compliance, they'll very quickly hurry to fix it. > Finally, suppose the OS distributor decides that some syscall I use > should be replaced by a library routine that uses somedorkyosfunction(). > This happens too. Note for the reading-impaired: I'm not saying that the > name malloc() has ever been used for a syscall; I'm saying that poll(), > socket(), et al. have been used for allocating library routines. Link dynamically instead of statically.
Nothing that's currently a system call is guaranteed to remain a system call, so an OS has no contract which prohibits it. Technically, you are in non-compliance with the IABI ELF specification if you link your program statically. If you are in non-compliance with the Intel Application Binary Interface specification, you should expect to *minimally* be required to relink, recompile, or have to modify your program source code, each time the OS major version number changes, for *any* IABI ELF compliant OS. The only contract it has with you is to not make symbols required by the standards with which it complies go away on you, without bumping the compliance level at the same time. > Result: My program innocently calls that library routine, which calls > somedorkyosfunction(), which calls valloc() or xyzalloc(), which > incorrectly assumes that its sbrk() results are contiguous, destroying > the data allocated by my own malloc(). Practically, only very bad programmers write libraries this promiscuous. Do you really care if your software runs on OS's written by very bad programmers? If so, where do you draw the line? Am I permitted to, for example, add a parameter to all system calls to permit the optional passing of either a "0" or a mailbox address to deal with AST notification, thus making all system calls asynchronous? At some point, you have to pick a standard to which you will code, and then expect that the functions that you call that are defined by that standard will not change out from underneath you at some point, until the platform on which you are developing willfully decides to no longer comply with the standard. I know that you use "#include" in your programs; this works because there is a language standard called ANSI C which was defined by the ANSI X3J11 committee, and it includes standardization of things like preprocessor directives, and so on.
At some level, you have to pick a standard that you are going to trust to be present on the platform(s) on which you wish your code to run, and then code to that standard as if it were set in concrete. > As I said before, I encounter more than enough portability problems > without going out of my way to look for them. I wish OS distributors > would put a little more thought into the needs of people who _don't_ > spend their entire lives working with a single platform. In general, we do: we attempt to comply with IEEE POSIX 1003.1; if you write to the 1998 version of that standard, we guarantee that your code will run on our platform, within the limits of the machine architecture (e.g. you can't map 400TB of disk file into your process address space on a 386 machine), or we will fix whatever it is that's preventing your program from running correctly. This includes fixing all instances of "somedorkyosfunction()" that you are able to find on our platforms. The Linux people are the same way; your example function, which you didn't name for us so we could get them to fix it, is something they would almost certainly immediately take care of, were you to bring it to their attention as a standards compliance issue. 
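Terry's argument above rests on wrappers such as reallocf() and strdup() obtaining memory only through the public allocation functions, so that replacing malloc(), realloc() and free() as a unit also covers them. A minimal sketch of two such wrappers follows, under hypothetical names (my_reallocf, my_strdup) so they do not collide with the real functions. It follows the documented behavior of FreeBSD's reallocf() (identical to realloc(), except that it frees the passed pointer when the requested memory cannot be allocated) and of strdup(); it is not copied from any C library.

```c
#include <stdlib.h>
#include <string.h>

/* Like reallocf(): behaves as realloc(), but frees ptr if the reallocation
   fails, so the caller cannot leak the old block. The size != 0 guard avoids
   a double free where realloc(ptr, 0) has already released ptr. */
void *my_reallocf(void *ptr, size_t size)
{
    void *nptr = realloc(ptr, size);
    if (nptr == NULL && ptr != NULL && size != 0)
        free(ptr);
    return nptr;
}

/* Like strdup(): obtains its memory through the public malloc(). */
char *my_strdup(const char *s)
{
    size_t len = strlen(s) + 1;  /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Neither function touches sbrk() or any other allocator directly, so a program that replaces malloc(), realloc() and free() also controls every byte these wrappers obtain. The Linux valloc() behavior Bernstein reports is the opposite case: a library function that does not route through the replaceable allocator.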
-- Terry From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 12:04:44 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B1EE037B401 for ; Thu, 26 Jun 2003 12:04:44 -0700 (PDT) Received: from pop018.verizon.net (pop018pub.verizon.net [206.46.170.212]) by mx1.FreeBSD.org (Postfix) with ESMTP id C70C943FF2 for ; Thu, 26 Jun 2003 12:04:43 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([141.149.47.46]) by pop018.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030626190443.RITK11703.pop018.verizon.net@mac.com> for ; Thu, 26 Jun 2003 14:04:43 -0500 Message-ID: <3EFB43C9.2050106@mac.com> Date: Thu, 26 Jun 2003 15:04:41 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030612 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org References: <20030623030004.40078.qmail@cr.yp.to> <20030624203536.D17881-100000@mail.chesapeake.net> <20030625060629.51087.qmail@cr.yp.to> <3EFA28EF.9050400@mac.com> <20030626020455.51862.qmail@cr.yp.to> In-Reply-To: <20030626020455.51862.qmail@cr.yp.to> X-Enigmail-Version: 0.76.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at pop018.verizon.net from [141.149.47.46] at Thu, 26 Jun 2003 14:04:42 -0500 Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 19:04:45 -0000 D. J. 
Bernstein wrote: >> Remember that VMM hardware requires page-alignment > > When I ask why the stack and data aren't put on the same page, and you > say ``They aren't on the same page,'' you aren't answering the question. True. If A implies B, yet A is false, what have you shown about B? I didn't write the words "They aren't on the same page". What I wrote was "stack and data have different VM protections and, since VMM hardware requires page alignment, the stack and data need to be put on different pages." If you don't think the latter paraphrase is a valid answer to the question you asked, fine, say so: quoting a single phrase from the three paragraphs I wrote, and then adapting it to the point where you can complain that what I didn't say didn't answer your question has more to do with rhetorical strawmen. Thanks. > (As for adding an x bit to data: This obviously won't break anything.) Using VMM protection to forbid code execution within the DATA, BSS, heap, and stack (if one can) mitigates against a common class of problems-- "buffer overflows"-- which have led to a vast number of security vulnerabilities. Well-written programs do not require VMM hardware to protect them from such issues, but that doesn't mean that a general-purpose OS like FreeBSD should assume that all programs are well-written. [ ... ] >> It's easy to write a memory allocator that performs a specific case well; >> writing a general purpose malloc is significantly more complicated, > > I'm not talking about replacing malloc() with a special-purpose > allocator. I'm talking about adding a tiny bit of code to malloc() to > magically take advantage of space that is being ignored right now. > > The savings in this situation go beyond those dozens of megabytes of > magically reacquired RAM. There's a nasty spike in memory usage as soon > as malloc() starts extending the heap; when a program's allocations fit > into the magically reacquired RAM, the program also avoids the spike.
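Bernstein's proposal as quoted above (serve a small program's first allocations from memory that is already mapped but otherwise unused, such as the tail of the last bss page, before malloc() extends the heap) can be approximated by a bump allocator over a fixed static buffer that falls back to malloc() once the buffer runs out. The sketch is hypothetical: the names small_alloc and small_alloc_owns and the 3000-byte ARENA_SIZE are illustrative assumptions, and a real change would have to locate the actual slack at the end of bss rather than reserve a new array. It assumes C11 for alignas and max_align_t, and rounds every request up to the alignment malloc() must guarantee for any object type.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the unused tail of the last bss page. */
#define ARENA_SIZE 3000

static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used; /* bytes already handed out from arena */

/* Round n up to the alignment malloc() must provide for any object type. */
static size_t round_to_max_align(size_t n)
{
    size_t a = alignof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Linear (bump) allocation from the static arena, never reusing space;
   falls back to malloc() when the arena cannot satisfy the request. */
void *small_alloc(size_t n)
{
    size_t need = round_to_max_align(n ? n : 1);
    if (need <= ARENA_SIZE - arena_used) {
        void *p = arena + arena_used;
        arena_used += need;
        return p;
    }
    return malloc(n);
}

/* Nonzero if p came from the static arena; such pointers must not be
   passed to free(). */
int small_alloc_owns(const void *p)
{
    uintptr_t u = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)arena;
    return u >= lo && u < lo + ARENA_SIZE;
}
```

The five allocations from the original test program (360, 80, 180, 16 and 440 bytes, 1076 in total) fit in this arena even after rounding. As the reply that follows points out, this is linear allocation with no reuse, so free() may only be called on pointers for which small_alloc_owns() returns zero.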
Calling malloc(1) for the first time causes a 16K spike in memory usage under FreeBSD 4.8; that's a factor of four larger than the minimum possible allocation of one 4K page. [ It's not clear whether all four VM pages are actually allocated, or whether one is and three are waiting for a write-fault before the system allocates resources for them. It's also worth noting that malloc() has to return memory properly aligned to whatever the local hardware requires for arbitrary data types: malloc(1), malloc(4), and even malloc(16) might all reserve the same amount of space. ] With regard to your magic suggestion, frankly, what you are doing is solving a much easier problem domain than the general purpose malloc is expected to be capable of solving. Linear memory allocation and no provision to handle internal fragmentation (because you'll never release or reuse in this buffer of memory, yes?) are wonderful decisions for a memory allocator tuned for a very small program. So long as the program doesn't repeatedly allocate and free this memory, anyway. -- -Chuck From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 14:04:36 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9AD0437B401 for ; Thu, 26 Jun 2003 14:04:36 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id F1CCE43FF2 for ; Thu, 26 Jun 2003 14:04:35 -0700 (PDT) (envelope-from djb-dsn-1056661507.36261@cr.yp.to) Received: (qmail 36262 invoked by uid 1017); 26 Jun 2003 21:05:07 -0000 Date: 26 Jun 2003 21:05:07 -0000 Message-ID: <20030626210507.36261.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. 
Bernstein" To: freebsd-performance@freebsd.org References: <20030626034138.88365.qmail@cr.yp.to> <3EFA9CA3.E616B54C@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: The dangers of replacing malloc() X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 21:04:36 -0000 Terry proposed that I obtain the desired sequence of sbrk() calls by writing my own version of malloc(). I explained how this can destroy data; it isn't portable. Terry now makes the rather strange claim that POSIX guarantees the portability of the resulting programs---even though POSIX doesn't specify sbrk() or various other functions that I use. Terry Lambert writes: > Memeory allocation in the C library is *defined by standards*, > all of which require the ability to replace memory allocation and > freeing functions with equivalents, *as a unit*. You obviously aren't claiming that POSIX requires _all_ system functions to use a replacement malloc()/realloc()/free() library for allocation. That would prohibit sbrk() itself, for example. Are you trying to say that POSIX requires all _POSIX functions_ to do their memory allocation via malloc()/realloc()/free()? This has no relevance to anything I said. We aren't talking about programs that restrict themselves to the POSIX functions. > > Suppose the OS distributor decides that valloc() or xyzalloc() should do > > its own thing, rather than calling malloc(). > Then that OS distributor's OS no longer complies with standards. I already showed you code demonstrating that the Linux valloc() works this way. If, as you claim, there's a ``standard'' prohibiting the Linux behavior, then that ``standard'' is not useful for people who care about real-world portability. Anyway, I see no evidence supporting your claim. 
> If you are in non-compliance with the Intel Application Binary Interface > specification, you should expect to *minimally* be required to relink, > recompile, or have to modify your program source code, each time the > OS major version number changes If you link statically, upgrades can break your _source code_? Wow. I never realized that the -static option had such power. Will it also cause hair to grow on your palms? ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 14:26:27 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C606537B404 for ; Thu, 26 Jun 2003 14:26:27 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id E5B2F4400E for ; Thu, 26 Jun 2003 14:26:26 -0700 (PDT) (envelope-from djb-dsn-1056662819.51367@cr.yp.to) Received: (qmail 51368 invoked by uid 1017); 26 Jun 2003 21:26:59 -0000 Date: 26 Jun 2003 21:26:59 -0000 Message-ID: <20030626212659.51367.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. 
Bernstein" To: freebsd-performance@freebsd.org References: <20030626025029.71392.qmail@cr.yp.to> <200306260515.h5Q5FhPF020045@bitblocks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 21:26:28 -0000 Bakul Shah writes: > Instead of complaining about wasting 78 megabytes and arguing > about why various proposed solutions fall short and why your > way is the best, why don't you come up with a patch that > saves space for small programs? Funny. Seems to me that I keep making concrete suggestions---including a detailed proposal for giving more space to malloc()---and the answer is consistently ``We really don't care about per-process overhead.'' What's the benefit of a patch for people who don't even see the problem? ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 14:51:54 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0F33237B401 for ; Thu, 26 Jun 2003 14:51:54 -0700 (PDT) Received: from silver.he.iki.fi (silver.he.iki.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 944A34400F for ; Thu, 26 Jun 2003 14:51:52 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from PETEX31 (gprs-prointernet-3e47d46b.mobile.inet.fi [62.71.212.107]) by silver.he.iki.fi (8.12.9/8.11.4) with SMTP id h5QLpcsL095250; Fri, 27 Jun 2003 00:51:42 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <005801c33c2d$1f0e94b0$6bd4473e@PETEX31> From: "Petri Helenius" To: "D. J. 
Bernstein" , References: <20030626025029.71392.qmail@cr.yp.to><200306260515.h5Q5FhPF020045@bitblocks.com> <20030626212659.51367.qmail@cr.yp.to> Date: Thu, 26 Jun 2003 23:47:12 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 21:51:54 -0000 > > Funny. Seems to me that I keep making concrete suggestions---including a > detailed proposal for giving more space to malloc()---and the answer is > consistently ``We really don't care about per-process overhead.'' What's > the benefit of a patch for people who don't even see the problem? > Many programmers read and write C more fluently than they do English. Code can also be run through the common case benchmarks, proving that the improvements you suggest will not degrade performance for the 99.9%++ of users who don't have 10000 processes. In general, not to dismiss the requirements, I think the design is broken if you require permission separation and memory separation between 10000 processes which run identical code, since that implies that either the code is badly designed, horribly broken or you expect it to be either or both. The only viable option would be that part of the executable actually comes from somewhere else, which kind of dismisses the optimization parameters because then the size would be unknown. Memory is cheap and FreeBSD supports 64G of it.
Pete From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 14:54:02 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1BCF437B401 for ; Thu, 26 Jun 2003 14:54:02 -0700 (PDT) Received: from pop016.verizon.net (pop016pub.verizon.net [206.46.170.173]) by mx1.FreeBSD.org (Postfix) with ESMTP id EADA043FEC for ; Thu, 26 Jun 2003 14:54:00 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([141.149.47.46]) by pop016.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030626215359.LZA3199.pop016.verizon.net@mac.com>; Thu, 26 Jun 2003 16:53:59 -0500 Message-ID: <3EFB6B75.3000705@mac.com> Date: Thu, 26 Jun 2003 17:53:57 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030612 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org References: <20030626025029.71392.qmail@cr.yp.to> <200306260515.h5Q5FhPF020045@bitblocks.com> <20030626212659.51367.qmail@cr.yp.to> In-Reply-To: <20030626212659.51367.qmail@cr.yp.to> X-Enigmail-Version: 0.76.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at pop016.verizon.net from [141.149.47.46] at Thu, 26 Jun 2003 16:53:59 -0500 cc: "D. J. Bernstein" Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 21:54:02 -0000 D. J. Bernstein wrote: [ ... ] > Funny. 
Seems to me that I keep making concrete suggestions---including a > detailed proposal for giving more space to malloc()---and the answer is > consistently ``We really don't care about per-process overhead.'' What's > the benefit of a patch for people who don't even see the problem? Speaking for myself (rather than for others), I care about per-process overhead. The source code to FreeBSD's implementation of malloc is available at: /usr/src/lib/libc/stdlib/malloc.c If you'd like to implement your suggested changes, generate a patch (preferably via 'diff -duw'), you may either submit it as a PR via the 'send-pr' command, or you can post it to this list. It would be nice if you performed some regression testing to confirm that your change works and is beneficial not just for your specific circumstances, but for the general case as well. If you were to do these things, and then people said "We really don't care...", at that time you'd have justification for the position taken prematurely above. -- -Chuck From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 15:09:14 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3961D37B401 for ; Thu, 26 Jun 2003 15:09:14 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 8CD8343FCB for ; Thu, 26 Jun 2003 15:09:13 -0700 (PDT) (envelope-from djb-dsn-1056665385.75399@cr.yp.to) Received: (qmail 75400 invoked by uid 1017); 26 Jun 2003 22:09:45 -0000 Date: 26 Jun 2003 22:09:45 -0000 Message-ID: <20030626220945.75399.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. 
Bernstein" To: freebsd-performance@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: sacrificing performance for confusion X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2003 22:09:14 -0000 > Using VMM protection to forbid code execution within the DATA, BSS, heap, > and stack (if one can) mitigates against a common class of problems-- I don't believe you. Show me a real program that's (1) vulnerable if data/bss/heap/stack are executable and (2) invulnerable otherwise. Yes, attacks are often written to take advantage of executable stacks; but, in every case I've investigated, the programs would still have been vulnerable with non-executable stacks. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 18:04:27 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ABFE037B401 for ; Thu, 26 Jun 2003 18:04:27 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id AA38F43F3F for ; Thu, 26 Jun 2003 18:04:26 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h5R14Og40661; Thu, 26 Jun 2003 21:04:24 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Thu, 26 Jun 2003 21:04:24 -0400 (EDT) From: Jeff Roberson To: "D. J. 
Bernstein" In-Reply-To: <20030626220945.75399.qmail@cr.yp.to> Message-ID: <20030626210127.O17881-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-performance@freebsd.org Subject: Re: sacrificing performance for confusion X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 01:04:28 -0000 On 26 Jun 2003, D. J. Bernstein wrote: > > Using VMM protection to forbid code execution within the DATA, BSS, heap, > > and stack (if one can) mitigates against a common class of problems-- > > I don't believe you. Show me a real program that's (1) vulnerable if > data/bss/heap/stack are executable and (2) invulnerable otherwise. > > Yes, attacks are often written to take advantage of executable stacks; > but, in every case I've investigated, the programs would still have been > vulnerable with non-executable stacks. They would be vulnerable to a denial of service but not to any privilege-gaining exploit. Please go spread FUD somewhere else. We're not going to put all of your sections in the same page. Nobody cares.
Cheers, Jeff From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 18:23:35 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8A51B37B401 for ; Thu, 26 Jun 2003 18:23:35 -0700 (PDT) Received: from out002.verizon.net (out002pub.verizon.net [206.46.170.141]) by mx1.FreeBSD.org (Postfix) with ESMTP id AAB9E43FF3 for ; Thu, 26 Jun 2003 18:23:34 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([141.149.47.46]) by out002.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030627012333.YYO13328.out002.verizon.net@mac.com>; Thu, 26 Jun 2003 20:23:33 -0500 Message-ID: <3EFB9C92.4010807@mac.com> Date: Thu, 26 Jun 2003 21:23:30 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030612 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "D. J. Bernstein" References: <20030626220945.75399.qmail@cr.yp.to> In-Reply-To: <20030626220945.75399.qmail@cr.yp.to> X-Enigmail-Version: 0.76.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at out002.verizon.net from [141.149.47.46] at Thu, 26 Jun 2003 20:23:33 -0500 cc: freebsd-performance@freebsd.org Subject: Re: sacrificing performance for confusion X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 01:23:35 -0000 D. J. Bernstein wrote: >>Using VMM protection to forbid code execution within the DATA, BSS, heap, >>and stack (if one can) mitigates against a common class of problems-- > > I don't believe you. 
Show me a real program that's (1) vulnerable if > data/bss/heap/stack are executable and (2) invulnerable otherwise. The easy case of buffer overflows involves someone putting a string on the stack which then gets passed to exec()/CreateProcess()/system()/popen()/etc.: such cases fail criterion (2). The more-difficult-to-exploit forms of buffer overflow that depend on VMM execute permissions for the data/stack/etc. tend to result from overwriting the return address or stored PC register with an address that points to memory controlled by the intruder. In particular, a buffer overflow in signal handling code or setjmp/longjmp, where the data being overwritten is the signal handler trampoline or the saved processor environment (env), can be very nasty. Anyway, the exploited buffer contains code which opens a socket and ties it to a /bin/sh, cmd.exe, or whatever. If the VMM protection permits this code to run, that is. But you asked for code. I didn't write the following, but let it stand for the many examples of Unix or Windows shellcode: http://www.pkix.net/~chuck/win32_cmd.s ...I'm sure you could find or write the equivalent for Unix/C. -- Take a look at the following exception handler delegate (in Objective C): http://www.pkix.net/~chuck/CrashCatcher/Examples/Advanced/ControllerCC.m ...and note the following chunk of code:

    // The string returned by this method will be included in each crash report
    static char signature[40];

    - (const char *)CC_applicationSignature
    {
        if (hasCrashed) exit(1); // quit if generating second report
        strcpy(signature, "Tailor ");
        strcat(signature, [self versionString]);
    #ifdef DEBUG
        strcat(signature, " (built for DEBUG)");
    #endif
        return signature;
    }

This code gets called after an exception has been handled, such as a Unix signal. If the buffer was on the stack (automatic variable, dynamically allocated by alloca(), whatever), and versionString was too long, someone could overflow the stack and write over the signal handler trampoline.
Or the stored return address of the caller's stack frame; the details don't matter: the point is having the PC switch to an address on the stack/heap/DATA and run the exploit code. [ See http://www.pkix.net/~chuck/CrashCatcher/libCrashCatcher/CCTypes.h and the CC_SignalTrigger.h and .m files for a type of exception applicable to plain C, namely a Unix signal, rather than the more esoteric flavors. ] -- Furthermore, at http://www.pkix.net/~chuck/CrashCatcher/Examples/CrashDummy there is a demo app containing a number of different exception types:

    TriggerMapVector triggers = {
        // non-fatal report triggers
        { "dump", "forceADumpBacktrace" },
        { "usr1", "sigusr1" },
        // fatal report triggers
        { "free", "triggerMessageToFreedObject" },
        { "subclass", "triggerSubclassResponsibility" },
        { "segfault", "triggerSegmentFault" },
        { "buserr", "triggerBusError" },
        { "usr2", "sigusr2" },
        // type decoding demonstration [indirect] triggers
        { "object", "objectAndSelectorDemo" },
        { "string", "stringDemo" },
        { "basic", "assortedTypeDemo" },
        { NULL, NULL }
    };

However, I will also acknowledge that it may be the case that it may be possible for code to work around a non-executable stack as per http://secinf.net/info/unix/stack.txt, which concludes as: ]Hopefully, these exploits demonstrate that it is important to make sure that ]programs that run at an elevated privilege are free of buffer overflow bugs. ]The stack protection will certainly help protect you from the majority of ]intruders, but moderately competent intruders will probably be able to bypass ]it. ] ]I believe that these techniques could be adopted for use in a remote exploit. ]Assuming we go with the strcpy technique, the attacker would need to do ]several things. First of all, the attacker would need to put the fake stack ]frame somewhere in the buffer that was overflowed. Then the attacker would ]have to make educated guesses at a few things.
These would be: the location ]that strcpy() is mapped at, a safe location to store the shellcode, and the ]location of the fake stack frame. You could make pretty educated guesses at all ]of these, so it might only require a small number of tries. Of course, the ]added time and interaction that this would involve certainly makes the stack ]protection useful. -- This being said, I didn't claim that a non-executable stack rendered a system _invulnerable_, simply that it increased the difficulty of an exploit. Let me also acknowledge that there are other ways of protecting the stack besides VMM hardware, such as: http://www.trl.ibm.com/projects/security/ssp/main.html -- -Chuck From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 20:06:02 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0553B37B401 for ; Thu, 26 Jun 2003 20:06:02 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 534C043FEC for ; Thu, 26 Jun 2003 20:06:01 -0700 (PDT) (envelope-from djb-dsn-1056683193.18391@cr.yp.to) Received: (qmail 18392 invoked by uid 1017); 27 Jun 2003 03:06:33 -0000 Date: 27 Jun 2003 03:06:33 -0000 Message-ID: <20030627030633.18391.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J.
Bernstein" To: freebsd-performance@freebsd.org References: <20030626220945.75399.qmail@cr.yp.to> <3EFB9C92.4010807@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: sacrificing performance for confusion X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 03:06:02 -0000 Chuck Swiger writes: > However, I will also acknowledge that it may be the case that it may be > possible for code to work around a non-executable stack In every case that I've investigated, not only is it definitely possible to seize control of the process with limited exec, it's actually fairly easy. Maybe there are counterexamples, but you obviously don't know any. (Note to certain people making fools of themselves: that's ``seize control,'' not ``kill.'') If disabling x bits becomes popular, attackers will start working around it, and we'll be back to where we are today. We need to stop the buffer overflows (and other problems) from occurring in the first place. Anyway, it seems unlikely that you believe that stack-x data-non-x makes life any more difficult for the attacker than stack-x data-x; and you obviously think that stack-non-x data-non-x would be the best situation. So why do you object to merging the stack and data segments? ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 20:49:45 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7399237B401 for ; Thu, 26 Jun 2003 20:49:45 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 087EE43F93 for ; Thu, 26 Jun 2003 20:49:45 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id C96482ED433; Thu, 26 Jun 2003 20:49:44 -0700 (PDT) Date: Thu, 26 Jun 2003 20:49:44 -0700 From: Jon Mini To: "D. J. Bernstein" Message-ID: <20030627034944.GP55678@elvis.mu.org> References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> <20030625094301.56349.qmail@cr.yp.to> <20030626025029.71392.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030626025029.71392.qmail@cr.yp.to> User-Agent: Mutt/1.4.1i cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 03:49:45 -0000 I am aware of what you say, but it doesn't change the inaccuracy of your original statement. D. J. Bernstein [djb@cr.yp.to] wrote : > Jon Mini writes: > > I'm sorry, but you are way off here. First of all, caches are *much > > larger* than the size of the processes you are talking about. > > I'm sorry, but you are being misled by a naive model of CPU performance.
> On a typical Pentium in our department, the following program becomes > three times faster when SPACING is changed from 4096 to 128:
>
> #define SPACING 4096
> char data[8 * SPACING];
> main()
> {
>   int i;
>   for (i = 0;i < 10000000;++i) {
>     data[0] = data[SPACING];
>     data[2 * SPACING] = data[3 * SPACING];
>     data[4 * SPACING] = data[5 * SPACING];
>     data[6 * SPACING] = data[7 * SPACING];
>   }
> }
>
> From an asm programmer's perspective, when FreeBSD decides to spread a > small program's variables between
>
>   * the beginning of a data page,
>   * the beginning of a bss page,
>   * the beginning of a malloc mmap page,
>   * the beginning of a heap page,
>   * the beginning of the next heap page,
>   * the beginning of yet another heap page,
>
> et cetera, it is actively trying (with varying degrees of success) to > damage cache performance in exactly the same way that this program does. > > ---D. J. Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago > _______________________________________________ > freebsd-performance@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-performance > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" -- Jonathan Mini http://www.freebsd.org/ From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 20:51:33 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3432637B401 for ; Thu, 26 Jun 2003 20:51:33 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id CA3FB44003 for ; Thu, 26 Jun 2003 20:51:32 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id B492B2ED446; Thu, 26 Jun 2003 20:51:32 -0700 (PDT) Date: Thu, 26 Jun 2003 20:51:32 -0700 From: Jon Mini To: "D. J.
Bernstein" Message-ID: <20030627035132.GQ55678@elvis.mu.org> References: <009901c33b17$1a5090c0$10d4473e@PETEX31> <20030626022722.62942.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030626022722.62942.qmail@cr.yp.to> User-Agent: Mutt/1.4.1i cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 03:51:33 -0000 D. J. Bernstein [djb@cr.yp.to] wrote : > I want separate processes for the memory protection. Each process is > chrooted under its own uid, so it can't write to disk except through > supplied file descriptors, and it can't hit other processes. (If I had a > portable way to cut off other communication channels, such as creating > new sockets, I'd do that too.) Have you looked into our jail(8) mechanism? Considering your resource consumption needs, it is also too heavyweight. > I'm willing to sacrifice one page per process for the sake of memory > protection; I realize that it's hard to do better than that. But I'm not > willing to casually piss away large fractions of a gigabyte of RAM. Not > this decade, anyway. Unfortunately, FreeBSD is the wrong operating system for you. > The lack of memory protection is exactly why I can't use threads. It's > also why I'm not surprised to hear that processes are _slightly_ less > efficient than threads. But something is seriously wrong if processes > are _much_ less efficient than threads. There are many other contributing factors that have been mentioned but that you are choosing to ignore. I'm afraid I can't communicate with you effectively if you are going to ignore the facts.
My sincerest apologies, -- Jonathan Mini http://www.freebsd.org/ From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 21:45:28 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2FCE037B401 for ; Thu, 26 Jun 2003 21:45:28 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 97EF643F93 for ; Thu, 26 Jun 2003 21:45:27 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id 7CF032ED43B; Thu, 26 Jun 2003 21:45:27 -0700 (PDT) Date: Thu, 26 Jun 2003 21:45:27 -0700 From: Jon Mini To: "D. J. Bernstein" Message-ID: <20030627044527.GT55678@elvis.mu.org> References: <20030626220945.75399.qmail@cr.yp.to> <3EFB9C92.4010807@mac.com> <20030627030633.18391.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030627030633.18391.qmail@cr.yp.to> User-Agent: Mutt/1.4.1i cc: freebsd-performance@freebsd.org Subject: Sounding important (was re: sacrificing performance for confusion) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 04:45:28 -0000 Unfortunately, security is not an all-or-nothing proposition. Vulnerability analysis does not yield a yes or no answer. Instead, it yields a probability which is generally represented in terms of the cost it is likely to take in order to compromise the system. Witness cryptography and message hashes. Making a system more secure is not a nonsensical statement. What you are arguing is that because some exploits exist other than the class described here, you should not bother ruling out this class of exploits. Frankly, this is idiotic.
I imagine that OpenBSD doesn't like your ideas either. Please stop making broad sweeping statements about things you don't understand. Not only are you making yourself the fool, but you are misleading other people who might be ignorant and are reading these lists looking to learn. Also, making semantic arguments with people here is adolescent. Obviously, because you have earned a doctorate in some field, you have experience sounding intelligent in front of other people. We have no wish to see that here. Please understand that this is one of the most diverse forums of discussion on the areas of computer science as it applies to operating system development. It follows that many of the people who post here are going to speak using different words than you are used to simply because their background also includes slightly different nomenclature. Many of them will not be fluent English speakers and will have a hard time understanding your text without a dictionary. Also, the technical language of the computer industry is as much a slang as in the legal industry. Furthermore, realise that this is an industry, not an area of academic investigation, and that we have real deliverable goals that must be met and deal with a wide population of people who speak a less precise technical language. There are many thousands, if not hundreds of thousands, of people on the various FreeBSD lists at any given point in time who fall into this category, and many, many more who find our postings in the archives during research using a search engine. Attempting to say that we are inexperienced or unknowledgeable because we do not use the correct, large, specific words is completely contrary to what this forum is all about. Here, we try to speak clearly and in the simplest language possible because many of the people who read this forum are not as well-versed in these topics as we are, and many of them would like to learn more. In short, we do not care how learned you sound.
We only care about whether or not you are accurate, and you are not. When you are wrong, we will be very helpful in correcting you. This is not an attack; it is one of the strongest and most valuable parts of the FreeBSD community. It saddens me that you do not wish to participate. While you are wrong on these issues, I am sure that you have a wealth of knowledge that you could contribute to our intellectual pool. I am sorry that your usage case is not of interest to the project. There are many projects. I do not know of one off the top of my head that fits your needs, but there probably is one. If not, you are welcome to start your own. I have spoken enough to you on this subject. Good luck in your endeavours. D. J. Bernstein [djb@cr.yp.to] wrote : > Chuck Swiger writes: > > However, I will also acknowledge that it may be the case that it may be > > possible for code to work around a non-executable stack > > In every case that I've investigated, not only is it definitely possible > to seize control of the process with limited exec, it's actually fairly > easy. Maybe there are counterexamples, but you obviously don't know any. > (Note to certain people making fools of themselves: that's ``seize > control,'' not ``kill.'') > > If disabling x bits becomes popular, attackers will start working around > it, and we'll be back to where we are today. We need to stop the buffer > overflows (and other problems) from occurring in the first place. > > Anyway, it seems unlikely that you believe that stack-x data-non-x makes > life any more difficult for the attacker than stack-x data-x; and you > obviously think that stack-non-x data-non-x would be the best situation. > So why do you object to merging the stack and data segments? > > ---D. J.
Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago -- Jonathan Mini http://www.freebsd.org/ From owner-freebsd-performance@FreeBSD.ORG Thu Jun 26 21:58:33 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E8CE137B401 for ; Thu, 26 Jun 2003 21:58:33 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 669E043FE1 for ; Thu, 26 Jun 2003 21:58:33 -0700 (PDT) (envelope-from djb-dsn-1056689945.61410@cr.yp.to) Received: (qmail 61411 invoked by uid 1017); 27 Jun 2003 04:59:05 -0000 Date: 27 Jun 2003 04:59:05 -0000 Message-ID: <20030627045905.61410.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> <20030625094301.56349.qmail@cr.yp.to> <20030626025029.71392.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 04:58:34 -0000 Jon, you're simply wrong. The original Pentium and Pentium Pro have 8K L1 dcache; the Pentium MMX and Pentium II/III have 16K L1 dcache. See http://cr.yp.to/hardware/x86.html for pointers to more information. ---D. J.
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Fri Jun 27 03:25:16 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6862837B401 for ; Fri, 27 Jun 2003 03:25:16 -0700 (PDT) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id BDEB343FE0 for ; Fri, 27 Jun 2003 03:25:15 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc155.dialup.mindspring.com ([209.86.4.165] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19VqPo-0002Tg-00; Fri, 27 Jun 2003 03:25:13 -0700 Message-ID: <3EFC1B2A.344674DD@mindspring.com> Date: Fri, 27 Jun 2003 03:23:38 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "D. J. Bernstein" References: <20030626034138.88365.qmail@cr.yp.to> <20030626210507.36261.qmail@cr.yp.to> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4e3a69767a611f1857f3eac942f95e2a3350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@freebsd.org Subject: Re: The dangers of replacing malloc() X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 10:25:16 -0000 "D. J. Bernstein" wrote: > You obviously aren't claiming that POSIX requires _all_ system functions > to use a replacement malloc()/realloc()/free() library for allocation. > That would prohibit sbrk() itself, for example. No. 
I'm claiming that it requires all _POSIX_ functions to do so. The function sbrk() (actually, brk() is the function of interest, on FreeBSD) is an implementation detail on systems that use it. > Are you trying to say that POSIX requires all _POSIX functions_ to do > their memory allocation via malloc()/realloc()/free()? This has no > relevance to anything I said. We aren't talking about programs that > restrict themselves to the POSIX functions. Then we are talking about programs that have, and will continue to have, the problems that you are complaining about. > > > Suppose the OS distributor decides that valloc() or xyzalloc() should do > > > its own thing, rather than calling malloc(). > > Then that OS distributor's OS no longer complies with standards. > > I already showed you code demonstrating that the Linux valloc() works > this way. You showed me code that called valloc(); you didn't show me code *implementing valloc() on Linux*. I chose not to make this point because it may have been a strawman, and I didn't need it to support my argument. > If, as you claim, there's a ``standard'' prohibiting the Linux > behavior, then that ``standard'' is not useful for people who care about > real-world portability. Anyway, I see no evidence supporting your claim. I think you are thinking of SunOS, which does have valloc(), and not Linux (neither SuSE nor RedHat have it, according to their manual pages). FreeBSD doesn't have it, and POSIX doesn't specify it. My argument in this case is that the valloc() interface is not portable, and you should not use it. If you are not interested in portability, then your argument about the non-portability of carrying around your own library functions for memory management fails. Either way, your argument fails.
> > If you are in non-compliance with the Intel Application Binary Interface > > specification, you should expect to *minimally* be required to relink, > > recompile, or have to modify your program source code, each time the > > OS major version number changes > > If you link statically, upgrades can break your _source code_? > > Wow. I never realized that the -static option had such power. Will it > also cause hair to grow on your palms? That's because you never tried to run a SunOS 4.4u2 binary which used the select(2) system call on SunOS 5.0, which returns ENOSYS if you attempt to call that entry point. If you had, you'd realize that upgrades can break your compiled code, and that you should therefore expect to be required to relink, recompile, or have to modify your program source code. I'll note here that it's strange that you are complaining about a library using the brk(2) system call behind your back, but at the same time you appear unconcerned about libraries using the select(2) system call behind your back, both of which can result in identical classes of breakage. I understand why you want what you want, but I'm going to tell you here and now that you are not going to find it in a general purpose OS unless you are willing to carry around your own code for things which may be implemented differently at the OS vendor's discretion, while not preventing them from complying with standards. As a final note on portability, I'll state the obvious: the portability of any program is inversely proportional to the number of system interfaces and system interface behaviours upon which it depends. I personally worked on the first shrink-wrap product ever sold by a third party vendor for UNIX systems, and it was, during the heyday of the UNIX incompatibility wars, ported to over 140 different UNIX variants.
Its hard system interface requirements were limited to 6 encapsulation functions, for which there were a grand total of three implementations, which fanned out to about 18, which breaks down to not using about 85% of the available system calls. Most ports of this code took about 4 hours, and 3 of those hours were running a validation suite and reading the tape with the source code and writing several copies of the tape with the binary distribution and dirty build tree. PS: The software didn't use valloc(), but it did use malloc()/free(). -- Terry From owner-freebsd-performance@FreeBSD.ORG Fri Jun 27 04:41:54 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A5C8137B401 for ; Fri, 27 Jun 2003 04:41:54 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 10F7E43FFD for ; Fri, 27 Jun 2003 04:41:54 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc155.dialup.mindspring.com ([209.86.4.165] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Vrc0-0000JL-00; Fri, 27 Jun 2003 04:41:52 -0700 Message-ID: <3EFC2CDE.8AE25B2C@mindspring.com> Date: Fri, 27 Jun 2003 04:39:10 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "D. J. 
Bernstein" References: <20030626025029.71392.qmail@cr.yp.to> <20030626212659.51367.qmail@cr.yp.to> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a439570a2132878fd7e55a25471eb70422350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 11:41:54 -0000 "D. J. Bernstein" wrote: > Bakul Shah writes: > > Instead of complaining about wasting 78 megabytes and arguing > > about why various proposed solutions fall short and why your > > way is the best, why don't you come up with a patch that > > saves space for small programs? > > Funny. Seems to me that I keep making concrete suggestions---including a > detailed proposal for giving more space to malloc()---and the answer is > consistently ``We really don't care about per-process overhead.'' What's > the benefit of a patch for people who don't even see the problem? Your "concrete proposal for malloc" made a number of bad assumptions: o "UVA == 4G", rather than "UVA + KVA == 4G" Costs to move to "UVA == 4G": o pages must be mapped before each copy of data across a protection domain o pages must be unmapped afterwards o If we had access to another 2G of UVA, we'd have no better use for it than to make malloc more space efficient My answer, at least, was "too computationally expensive for a general purpose OS when making the size/computational overhead tradeoff". 
-- Terry From owner-freebsd-performance@FreeBSD.ORG Fri Jun 27 09:12:20 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2180C37B401 for ; Fri, 27 Jun 2003 09:12:20 -0700 (PDT) Received: from bitblocks.com (bitblocks.com [209.204.185.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 49F3243FB1 for ; Fri, 27 Jun 2003 09:12:19 -0700 (PDT) (envelope-from bakul@bitblocks.com) Received: from bitblocks.com (localhost [127.0.0.1]) by bitblocks.com (8.12.9/8.12.9) with ESMTP id h5RGCHPF029635; Fri, 27 Jun 2003 09:12:17 -0700 (PDT) (envelope-from bakul@bitblocks.com) Message-Id: <200306271612.h5RGCHPF029635@bitblocks.com> To: "D. J. Bernstein" In-reply-to: Your message of "26 Jun 2003 21:26:59 -0000." <20030626212659.51367.qmail@cr.yp.to> Date: Fri, 27 Jun 2003 09:12:17 -0700 From: Bakul Shah cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 16:12:21 -0000 > > Instead of complaining about wasting 78 megabytes and arguing > > about why various proposed solutions fall short and why your > > way is the best, why don't you come up with a patch that > > saves space for small programs? > > Funny. Seems to me that I keep making concrete suggestions---including a > detailed proposal for giving more space to malloc()---and the answer is > consistently ``We really don't care about per-process overhead.'' What's > the benefit of a patch for people who don't even see the problem? If after repeated suggestions people are not "getting it", the reason is usually *not* apathy. Either you are not explaining well or your starting assumptions are different. But show me the code! If I like it I'll use it.
"Build it and they will come" -- you should be familiar with that! If enough people like it, maybe it will get incorporated in some form. Maybe. From owner-freebsd-performance@FreeBSD.ORG Fri Jun 27 09:32:52 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1370B37B401 for ; Fri, 27 Jun 2003 09:32:52 -0700 (PDT) Received: from mta6.adelphia.net (mta6.adelphia.net [64.8.50.190]) by mx1.FreeBSD.org (Postfix) with ESMTP id 420BE43FD7 for ; Fri, 27 Jun 2003 09:32:51 -0700 (PDT) (envelope-from Support@Netflag.Net) Received: from nfn2.Netflag.Net ([68.69.240.35]) by mta6.adelphia.net (InterMail vM.5.01.05.32 201-253-122-126-132-20030307) with ESMTP id <20030627163250.JAXW10267.mta6.adelphia.net@nfn2.Netflag.Net>; Fri, 27 Jun 2003 12:32:50 -0400 Message-Id: <5.2.0.9.2.20030627092330.020dc020@pop.dc3.adelphia.net> X-Sender: pedramn@pop.dc3.adelphia.net X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Fri, 27 Jun 2003 09:32:13 -0700 To: Bakul Shah , "D. J. Bernstein" From: Pedram Nimreezi In-Reply-To: <200306271612.h5RGCHPF029635@bitblocks.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jun 2003 16:32:52 -0000 Far be it from me to disregard anything Professor Bernstein suggests, but Bakul does have an extremely good point, and I feel that just as you proved BIND wrong in many ways with the advent of DJBDNS, so you should prove this point, which seems to have been the cause of an almost outrageous malloc and memory argument. I would also agree with Mr.
Mini (I love saying that) that maybe the profoundness of your concepts is mitigated by this university-like language, which, you fail to realize, some of the best coders in the world do not understand, like the Portuguese, for instance. I have a friend named Manuel (who is rather famous in the PHP community and whose last name I won't divulge) whose programming is very robust, but whose error messages show that his English is rather poor. I feel that even though this is an English-speaking mailing list, it does give me much joy to watch the elites of computer science bicker to no end. The rest of the world should be able to enjoy this free spectacle as well. I think I lost my point just there... but here's a simple, if not rhetorical, suggestion that will undoubtedly cause me more boredom... and that is: more consensus, less cursing. At 09:12 AM 6/27/2003 -0700, Bakul Shah wrote: > > > Instead of complaining about wasting 78 megabytes and arguing > > > about why various proposed solutions fall short and why your > > > way is the best, why don't you come up with a patch that > > > saves space for small programs? > > > > Funny. Seems to me that I keep making concrete suggestions---including a > > detailed proposal for giving more space to malloc()---and the answer is > > consistently ``We really don't care about per-process overhead.'' What's > > the benefit of a patch for people who don't even see the problem? > >If after repeated suggestions people are not "getting it", >the reason is usually *not* apathy. Either you are not >explaining well or your starting assumptions are different. >But show me the code! If I like it I'll use it. "Build it >and they will come" -- you should be familiar with that! If >enough people like it, may be it will get incorporated in >some form. May be.
>_______________________________________________ >freebsd-performance@freebsd.org mailing list >http://lists.freebsd.org/mailman/listinfo/freebsd-performance >To unsubscribe, send any mail to >"freebsd-performance-unsubscribe@freebsd.org" From owner-freebsd-performance@FreeBSD.ORG Fri Jun 27 23:11:02 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E164D37B401 for ; Fri, 27 Jun 2003 23:11:02 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id EBEA343FE3 for ; Fri, 27 Jun 2003 23:11:01 -0700 (PDT) (envelope-from djb-dsn-1056780693.64427@cr.yp.to) Received: (qmail 64428 invoked by uid 1017); 28 Jun 2003 06:11:33 -0000 Date: 28 Jun 2003 06:11:33 -0000 Message-ID: <20030628061133.64427.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030626034138.88365.qmail@cr.yp.to> <3EFA9CA3.E616B54C@mindspring.com> <20030626210507.36261.qmail@cr.yp.to> <3EFC1B2A.344674DD@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: The dangers of replacing malloc() X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 06:11:03 -0000 Terry Lambert writes: > My argument in this case is that the valloc() interface is not > portable, and you should not use it. kqueue isn't portable, so you're saying nobody should use that, right? Or do you admit that it's actually a good idea for people to use kqueue, with a compile-time fallback to poll or select? Great. 
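A minimal sketch of the "kqueue, with a compile-time fallback to poll or select" pattern being argued about here follows. HAVE_KQUEUE is a hypothetical macro assumed to be defined by the build on systems that provide kqueue(); everywhere else only the poll() branch is compiled, and that branch is the part that is actually portable. This illustrates the pattern only; it is not code from either correspondent.

```c
#include <poll.h>
#include <unistd.h>
#ifdef HAVE_KQUEUE              /* hypothetical: defined by the build on kqueue systems */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/* Waits up to timeout_ms milliseconds (must be >= 0) for fd to become
 * readable. Returns 1 if it is readable (or at EOF/hangup), 0 on
 * timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
#ifdef HAVE_KQUEUE
    struct kevent ev;
    struct timespec ts;
    int kq, n;

    kq = kqueue();
    if (kq == -1)
        return -1;
    EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
    /* The same struct serves as both changelist and eventlist. */
    n = kevent(kq, &ev, 1, &ev, 1, &ts);
    close(kq);
    if (n > 0 && (ev.flags & EV_ERROR))
        return -1;              /* registering the filter failed */
    return n < 0 ? -1 : (n > 0);
#else
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n > 0 && (pfd.revents & POLLNVAL))
        return -1;              /* fd was not open */
    return n < 0 ? -1 : (n > 0);
#endif
}
```

Creating and closing a kqueue on every call keeps the sketch short; a real event loop would create the kqueue once and reuse it.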
Now, what happens when some other system decides to emulate kqueue (not a huge surprise), using valloc (or some future xyzalloc), which in turn uses sbrk directly (as valloc already does on a huge number of systems, and as xyzalloc will probably do), rather than calling malloc? That's right: it obliterates the data that I obtained from sbrk in my malloc replacement. Kaboom. This is exactly the failure mode I explained before. This is why the weak linking of the system's malloc is useless for experienced programmers who care about portability. (Since you asked: My valloc-uses-sbrk-directly demonstration was under Linux, exactly as I said; specifically, Debian. Are you really so naive as to think that all the Linux functions are listed in the manual?) ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Fri Jun 27 23:56:10 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0ECE637B401 for ; Fri, 27 Jun 2003 23:56:10 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id 58B2743FDF for ; Fri, 27 Jun 2003 23:56:09 -0700 (PDT) (envelope-from djb-dsn-1056783400.79231@cr.yp.to) Received: (qmail 79232 invoked by uid 1017); 28 Jun 2003 06:56:40 -0000 Date: 28 Jun 2003 06:56:40 -0000 Message-ID: <20030628065640.79231.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. 
Bernstein" To: freebsd-performance@freebsd.org References: <20030626025029.71392.qmail@cr.yp.to> <200306260515.h5Q5FhPF020045@bitblocks.com> <20030626212659.51367.qmail@cr.yp.to> <3EFB6B75.3000705@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 06:56:10 -0000 Chuck Swiger writes: > If you'd like to implement your suggested changes, generate a patch > (preferably via 'diff -duw'), you may either submit it as a PR via the > 'send-pr' command, or you can post it to this list. It would be nice if > you performed some regression testing to confirm that your change works and > is beneficial not just for your specific circumstances, but for the general > case as well. Does FreeBSD have a suite of regression tests and speed tests, or would I have to make them up as I go along? ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 01:29:25 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 569BF37B401 for ; Sat, 28 Jun 2003 01:29:25 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id AD15043FAF for ; Sat, 28 Jun 2003 01:29:24 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjii.dialup.mindspring.com ([165.247.206.82] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19WB5G-0000sm-00; Sat, 28 Jun 2003 01:29:22 -0700 Message-ID: <3EFD5162.F1FF3947@mindspring.com> Date: Sat, 28 Jun 2003 01:27:14 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "D. J. Bernstein" References: <20030626034138.88365.qmail@cr.yp.to> <20030626210507.36261.qmail@cr.yp.to> <20030628061133.64427.qmail@cr.yp.to> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4d8c2238a353dbffb5d0c5947669af193350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@freebsd.org Subject: Re: The dangers of replacing malloc() X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 08:29:25 -0000 "D. J. Bernstein" wrote: > Terry Lambert writes: > > My argument in this case is that the valloc() interface is not > > portable, and you should not use it. > > kqueue isn't portable, so you're saying nobody should use that, right? 
Not in portable code. > Or do you admit that it's actually a good idea for people to use kqueue, > with a compile-time fallback to poll or select? Great. Only the fallback code is actually portable. > Now, what happens when some other system decides to emulate kqueue (not > a huge surprise), using valloc (or some future xyzalloc), which in turn > uses sbrk directly (as valloc already does on a huge number of systems, > and as xyzalloc will probably do), rather than calling malloc? Look. You keep describing an implementation error on one system, as if implementation errors aren't correctable, or you have to support all systems with your software, even broken ones with interfaces you should avoid anyway. The "GNU answer" would be to carry around glibc2 with you wherever you go, and not rely on the vendor's libc programmers having got it right. I can tell you right now I could fix the implementation error on the system in question through judicious use of "ar" and "ranlib" and the C compiler, to write a replacement for valloc() and *MAKE* that system work. In the limit, the programmer is always more able than all vendor-supplied user space code and most kernel space code, even on closed source systems. On Open Source systems, the programmer is God, and all problems, if they exist, belong to him to fix ("With great power comes great responsibility"). > That's right: it obliterates the data that I obtained from sbrk in my > malloc replacement. Kaboom. This is exactly the failure mode I explained > before. This is why the weak linking of the system's malloc is useless > for experienced programmers who care about portability.
cc -c -o valloc.o -x c - <<EOF
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

void *
valloc(size_t size)
{
    uintptr_t mem;
    size_t sz = size;
    long pg_sz = sysconf(_SC_PAGESIZE);

    sz += pg_sz;                        /* slack for rounding up */
    mem = (uintptr_t)malloc(sz);
    if (mem == 0)
        return(NULL);
    mem += (pg_sz - 1);                 /* round up to the next page boundary */
    mem &= ~(uintptr_t)(pg_sz - 1);
    return((void *)mem);
}
EOF
cp /usr/lib/libc.a .
ar -r libc.a valloc.o
ranlib libc.a
# ready to use local libc.a in place of the system libc.a
Yes, you will have to add sizeof(void *) to your allocation and keep the real allocation address there, and free it using a negative index offset dereference as the real free address, if your malloc/free does not do other internal housekeeping, but that's the problem of the malloc implementor. > (Since you asked: My valloc-uses-sbrk-directly demonstration was under > Linux, exactly as I said; specifically, Debian. Are you really so naive > as to think that all the Linux functions are listed in the manual?) No... the manual only lists the functions you should consider OK to use... now, given that it's not in the manual... In any case, you have the source code and can fix yourself a local copy, if it was Debian instead of Solaris. -- Terry From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 03:35:44 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2859E37B401 for ; Sat, 28 Jun 2003 03:35:44 -0700 (PDT) Received: from craig.afraid.org (h24-69-213-234.cc.shawcable.net [24.69.213.234]) by mx1.FreeBSD.org (Postfix) with ESMTP id 886AA43FD7 for ; Sat, 28 Jun 2003 03:35:43 -0700 (PDT) (envelope-from craig@craig.afraid.org) Received: from fireball.internal.lan ([10.0.0.2] helo=fireball) by craig.afraid.org with smtp (Exim 4.20) id 19WD3X-0000aZ-VB for freebsd-performance@freebsd.org; Sat, 28 Jun 2003 03:35:43 -0700 Message-ID: <001f01c33d61$070388a0$0200000a@fireball> From: "Craig Reyenga" To: Date: Sat, 28 Jun 2003 03:35:43 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Subject: Tuning Ggiabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence:
list Reply-To: Craig Reyenga List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 10:35:44 -0000 (This message may show up three times, I apologise if so.) Hi, I've been trying to get my gigabit link to run as fast as possible, but I have run out of ideas. I've documented (though not very well) what I have done thus far: http://craig.afraid.org/a/b?page=computer/gigabit If there is any more I can do besides what I have already done, I would love to hear it. Please avoid suggestions that require spending money. Thanks in advance, -Craig From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 04:43:48 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9E5CF37B401 for ; Sat, 28 Jun 2003 04:43:48 -0700 (PDT) Received: from arthur.nitro.dk (port324.ds1-khk.adsl.cybercity.dk [212.242.113.79]) by mx1.FreeBSD.org (Postfix) with ESMTP id F17AC44001 for ; Sat, 28 Jun 2003 04:43:47 -0700 (PDT) (envelope-from simon@arthur.nitro.dk) Received: by arthur.nitro.dk (Postfix, from userid 1000) id 6289110BFA8; Sat, 28 Jun 2003 13:43:46 +0200 (CEST) Date: Sat, 28 Jun 2003 13:43:46 +0200 From: "Simon L. 
Nielsen" To: Craig Reyenga Message-ID: <20030628114342.GB401@nitro.dk> References: <001f01c33d61$070388a0$0200000a@fireball> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="gatW/ieO32f1wygP" Content-Disposition: inline In-Reply-To: <001f01c33d61$070388a0$0200000a@fireball> User-Agent: Mutt/1.5.4i cc: freebsd-performance@freebsd.org Subject: Re: Tuning Ggiabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 11:43:48 -0000 On 2003.06.28 03:35:43 -0700, Craig Reyenga wrote: > (This message may show up three times, I apologise if so.) > > Hi, > > I've been trying to get my gigabit link to run as fast as possible, but I > have run out of ideas. I've documented (though not very well) what I have > done thus far: > > http://craig.afraid.org/a/b?page=computer/gigabit > > If there is any more I can do besides what I have already done, I would love > to hear it. Please avoid suggestions that require spending money. > > Thanks in advance, When using the em driver under FreeBSD, it can be useful to try to adjust some tunables in sys/dev/em/if_em.h, specifically EM_RDTR and EM_TADV. With the hardware in the FreeBSD box I don't think you should expect it to go that fast, since the bus/CPU will be the bottleneck. I would suggest looking at vmstat and similar tools. It can be very useful to find what the bottleneck really is. -- Simon L. Nielsen From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 11:13:34 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0DDD337B401 for ; Sat, 28 Jun 2003 11:13:34 -0700 (PDT) Received: from out001.verizon.net (out001pub.verizon.net [206.46.170.140]) by mx1.FreeBSD.org (Postfix) with ESMTP id 21A5A44017 for ; Sat, 28 Jun 2003 11:13:33 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([141.149.47.46]) by out001.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030628181332.XOZR12592.out001.verizon.net@mac.com>; Sat, 28 Jun 2003 13:13:32 -0500 Message-ID: <3EFDDACB.3080805@mac.com> Date: Sat, 28 Jun 2003 14:13:31 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030612 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "D. J.
Bernstein" References: <20030626025029.71392.qmail@cr.yp.to> <200306260515.h5Q5FhPF020045@bitblocks.com> <20030626212659.51367.qmail@cr.yp.to> <3EFB6B75.3000705@mac.com> <20030628065640.79231.qmail@cr.yp.to> In-Reply-To: <20030628065640.79231.qmail@cr.yp.to> X-Enigmail-Version: 0.76.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at out001.verizon.net from [141.149.47.46] at Sat, 28 Jun 2003 13:13:31 -0500 cc: freebsd-performance@freebsd.org Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 18:13:34 -0000 D. J. Bernstein wrote: > Chuck Swiger writes: >> If you'd like to implement your suggested changes, generate a patch >> (preferably via 'diff -duw'), you may either submit it as a PR via the >> 'send-pr' command, or you can post it to this list. It would be nice if >> you performed some regression testing to confirm that your change works and >> is beneficial not just for your specific circumstances, but for the general >> case as well. > > Does FreeBSD have a suite of regression tests and speed tests, or would > I have to make them up as I go along? The FreeBSD makefile system will pass "make test" or "make check" to software components which include such regression tests: that would apply to most of the GNU stuff in /usr/src/gnu/*/* and anything in the ports collection which you're familiar with. Timing the regression tests for gawk, flex, perl, python, etc would make a good starting point. A significantly more expensive regression test would be to follow the instructions in "man release" against RELENG_4_8_0_RELEASE with MAKE_ISOS enabled, and see whether the md5 checksums match. 
It should be yes; this is just a preliminary step to verify that you can run the system-wide self-hosted build and reproduce a known-valid result. And time this, if you please, say via "nohup time make release RELEASETAG=RELENG_4_8_0_RELEASE ... " Lather, rinse, repeat, only this time point "make release" against a CVS repository containing your change and see whether your modified version of malloc works in the sense that the build creates a working version of FreeBSD that you can burn to CD and install from, presumably with all file checksums except /usr/lib/libc.* identical? (The ISO MD5 checksums won't match because of that.) Also see whether the build goes faster, which would indicate that the change to malloc() had a net positive effect over the set of small command-line utilities like sh, make, cc, awk, sed, etc.: things which more closely resemble the Subject header of this thread than, say, Mozilla, emacs, X window managers, etc. -- -Chuck PS: "make release" will involve probably 20-40 hours of CPU time and probably 10 GB of free space, but a change to something as critical as malloc() means that someone is going to perform the full regression. Or so I would expect...
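Before spending 20-40 CPU hours on a full make release, a much cheaper first check is a microbenchmark of the small-program allocation pattern that opened this thread (malloc(360), malloc(80), malloc(180), malloc(16), malloc(440)). The following is only a rough sketch: time_malloc_pattern() is a hypothetical helper, it measures processor time with ISO C clock(), and a timing loop says nothing about the RSS and VSZ figures the original complaint was about.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Request sizes from the test program at the start of this thread. */
static const size_t pattern[] = { 360, 80, 180, 16, 440 };
#define NPATTERN (sizeof pattern / sizeof pattern[0])

/* Each pointer is folded into a volatile so the compiler cannot
 * discard the malloc()/free() pairs as dead code. */
static volatile uintptr_t sink;

/* Allocates the whole pattern and then frees it, `rounds` times.
 * Returns the processor time used in seconds, or -1.0 if clock()
 * or malloc() fails. */
double time_malloc_pattern(long rounds)
{
    void *p[NPATTERN];
    clock_t t0, t1;
    long r;
    size_t i, j;

    t0 = clock();
    if (t0 == (clock_t)-1)
        return -1.0;
    for (r = 0; r < rounds; r++) {
        for (i = 0; i < NPATTERN; i++) {
            p[i] = malloc(pattern[i]);
            if (p[i] == NULL) {
                for (j = 0; j < i; j++)     /* release this round's blocks */
                    free(p[j]);
                return -1.0;
            }
            sink ^= (uintptr_t)p[i];
        }
        for (i = 0; i < NPATTERN; i++)
            free(p[i]);
    }
    t1 = clock();
    if (t1 == (clock_t)-1)
        return -1.0;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Linking the same program once against the stock libc and once against a modified one gives a quick, noisy comparison; it is no substitute for the full regression run described above.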
From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 12:40:15 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 56EE237B401 for ; Sat, 28 Jun 2003 12:40:15 -0700 (PDT) Received: from mail.iskon.hr (mail.iskon.hr [213.191.128.4]) by mx1.FreeBSD.org (Postfix) with SMTP id C4C8344035 for ; Sat, 28 Jun 2003 12:40:13 -0700 (PDT) (envelope-from ivoras@fer.hr) Received: (qmail 7495 invoked from network); 28 Jun 2003 21:40:09 +0200 Received: from unknown (HELO flatline) (213.191.147.30) by mail.iskon.hr with SMTP; 28 Jun 2003 21:40:09 +0200 Message-ID: <000f01c33dad$1595a0f0$e602a8c0@flatline> From: "Ivan Voras" To: References: <20030628190036.0E06B37B405@hub.freebsd.org> Date: Sat, 28 Jun 2003 21:39:52 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Subject: Re: Tuning Gigabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 19:40:15 -0000 > Message: 5 > Date: Sat, 28 Jun 2003 13:43:46 +0200 > From: "Simon L. Nielsen" > Subject: Re: Tuning Ggiabit > Message-ID: <20030628114342.GB401@nitro.dk> > Content-Type: text/plain; charset="us-ascii" >> I've been trying to get my gigabit link to run as fast as possible, but I >> have run out of ideas. I've documented (though not very well) what I have >> done thus far: >> >> http://craig.afraid.org/a/b?page=computer/gigabit >> >> If there is any more I can do besides what I have already done, I would >> love >> to hear it. Please avoid suggestions that require spending money. 
This is probably an obsolete suggestion, but have you increased HZ when you turned on NIC polling? (the web page doesn't mention it) -- Logic is a systematic method of coming to the wrong conclusion with confidence. From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 16:35:55 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3609337B401 for ; Sat, 28 Jun 2003 16:35:55 -0700 (PDT) Received: from sabre.velocet.net (sabre.velocet.net [216.138.209.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2452F44001 for ; Sat, 28 Jun 2003 16:35:54 -0700 (PDT) (envelope-from dgilbert@velocet.ca) Received: from trooper.velocet.ca (trooper.velocet.net [216.138.242.2]) by sabre.velocet.net (Postfix) with ESMTP id 81696137F8B; Sat, 28 Jun 2003 19:35:47 -0400 (EDT) Received: by trooper.velocet.ca (Postfix, from userid 66) id 99C0674C27; Sat, 28 Jun 2003 19:35:46 -0400 (EDT) Received: by canoe.velocet.net (Postfix, from userid 101) id 0AD0C4AD1; Sat, 28 Jun 2003 19:35:42 -0400 (EDT) From: David Gilbert MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16126.9805.829406.368426@canoe.velocet.net> Date: Sat, 28 Jun 2003 19:35:41 -0400 To: "Ivan Voras" In-Reply-To: <000f01c33dad$1595a0f0$e602a8c0@flatline> References: <20030628190036.0E06B37B405@hub.freebsd.org> <000f01c33dad$1595a0f0$e602a8c0@flatline> X-Mailer: VM 7.14 under 21.4 (patch 12) "Portable Code" XEmacs Lucid cc: freebsd-performance@freebsd.org Subject: Re: Tuning Gigabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 23:35:55 -0000 >>>>> "Ivan" == Ivan Voras writes: >> Message: 5 Date: Sat, 28 Jun 2003 13:43:46 +0200 From: "Simon >> L. 
Nielsen" Subject: Re: Tuning Ggiabit >> Message-ID: <20030628114342.GB401@nitro.dk> Content-Type: >> text/plain; charset="us-ascii" >>> I've been trying to get my gigabit link to run as fast as >>> possible, but I have run out of ideas. I've documented (though not >>> very well) what I have done thus far: >>> >>> http://craig.afraid.org/a/b?page=computer/gigabit >>> >>> If there is any more I can do besides what I have already done, I >>> would love to hear it. Please avoid suggestions that require >>> spending money. Ivan> This is probably an obsolete suggestion, but have you increased Ivan> HZ when you turned on NIC polling? (the web page doesn't mention Ivan> it) Unfortunately, you may have a motherboard that maxes out at 300 megabit. Our recent testing of gigabit NICs found that more than 75% of the motherboards we tried could not pass more than 500 megabit, and 25% of them would not pass more than 100 megabit. 300 megabit is about where 32-bit 33 MHz PCI maxes out. Dave. -- ============================================================================ |David Gilbert, Velocet Communications. | Two things can only be | |Mail: dgilbert@velocet.net | equal if and only if they | |http://daveg.ca | are precisely opposite.
| =========================================================GLO================ From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 16:57:45 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5AB3137B401 for ; Sat, 28 Jun 2003 16:57:45 -0700 (PDT) Received: from craig.afraid.org (h24-69-213-234.cc.shawcable.net [24.69.213.234]) by mx1.FreeBSD.org (Postfix) with ESMTP id AC6014401A for ; Sat, 28 Jun 2003 16:57:44 -0700 (PDT) (envelope-from craig@craig.afraid.org) Received: from fireball.internal.lan ([10.0.0.2] helo=fireball) by craig.afraid.org with smtp (Exim 4.20) id 19WPZh-0005wG-0w; Sat, 28 Jun 2003 16:57:45 -0700 Message-ID: <000901c33dd1$12268780$0200000a@fireball> From: "Craig Reyenga" To: "David Gilbert" References: <20030628190036.0E06B37B405@hub.freebsd.org> <000f01c33dad$1595a0f0$e602a8c0@flatline> <16126.9805.829406.368426@canoe.velocet.net> Date: Sat, 28 Jun 2003 16:57:45 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 cc: freebsd-performance@freebsd.org Subject: Re: Tuning Gigabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Craig Reyenga List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Jun 2003 23:57:45 -0000 From: "David Gilbert" > >>>>> "Ivan" == Ivan Voras writes: > > >> Message: 5 Date: Sat, 28 Jun 2003 13:43:46 +0200 From: "Simon > >> L. Nielsen" Subject: Re: Tuning Ggiabit > >> Message-ID: <20030628114342.GB401@nitro.dk> Content-Type: > >> text/plain; charset="us-ascii" > > >>> I've been trying to get my gigabit link to run as fast as > >>> possible, but I have run out of ideas. 
I've documented (though not > >>> very well) what I have done thus far: > >>> > >>> http://craig.afraid.org/a/b?page=computer/gigabit > >>> > >>> If there is any more I can do besides what I have already done, I > >>> would love to hear it. Please avoid suggestions that require > >>> spending money. > > Ivan> This is probably an obsolete suggestion, but have you increased > Ivan> HZ when you turned on NIC polling? (the web page doesn't mention > Ivan> it) > > Unfortunately you may have a motherboard that maxes at 300 megabit. > Our recent testing of Gigabit NICs found more than 75% of the > motherboards we tried unable to pass more than 500 megabit and 25% of > the motherboards wouldn't pass more than 100 megabit. > > 300 megabit is about where 32bit 33Mhz PCI maxes out. > > Dave. > > -- Could you tell me a little more about your tests? What boards, and what configuration? -Craig > ============================================================================ > |David Gilbert, Velocet Communications. | Two things can only be | > |Mail: dgilbert@velocet.net | equal if and only if they | > |http://daveg.ca | are precisely opposite. 
| > =========================================================GLO================ From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 17:31:36 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 22B7A37B401 for ; Sat, 28 Jun 2003 17:31:36 -0700 (PDT) Received: from mail1.zer0.org (klapaucius.zer0.org [204.152.186.45]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6C12C43F93 for ; Sat, 28 Jun 2003 17:31:35 -0700 (PDT) (envelope-from gsutter@zer0.org) Received: by mail1.zer0.org (Postfix, from userid 1001) id 36341239A15; Sat, 28 Jun 2003 17:31:35 -0700 (PDT) Date: Sat, 28 Jun 2003 17:31:35 -0700 From: Gregory Sutter To: David Gilbert Message-ID: <20030629003134.GV71533@klapaucius.zer0.org> References: <20030628190036.0E06B37B405@hub.freebsd.org> <000f01c33dad$1595a0f0$e602a8c0@flatline> <16126.9805.829406.368426@canoe.velocet.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="dnBRaVjvtun1Q0Od" Content-Disposition: inline In-Reply-To: <16126.9805.829406.368426@canoe.velocet.net> Organization: Zer0 X-Purpose: For great justice! Mail-Copies-To: poster X-Message-Flag: Ditch this virus-ridden Outlook crap and get a real mailer! X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . 
User-Agent: Mutt/1.5.4i cc: freebsd-performance@freebsd.org Subject: Re: Tuning Gigabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jun 2003 00:31:36 -0000 --dnBRaVjvtun1Q0Od Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2003-06-28 19:35 -0400, David Gilbert wrote: > > Unfortunately you may have a motherboard that maxes at 300 megabit. > Our recent testing of Gigabit NICs found more than 75% of the > motherboards we tried unable to pass more than 500 megabit and 25% of > the motherboards wouldn't pass more than 100 megabit. > > 300 megabit is about where 32bit 33Mhz PCI maxes out. David, Will you please summarize the motherboard performance data so we know which boards to buy and which to skip? Thanks. Greg -- Gregory S. Sutter Brutalized, compromised, mailto:gsutter@zer0.org corrupted and debased.
http://www.zer0.org/~gsutter/ hkp://wwwkeys.pgp.net/0x845DFEDD --dnBRaVjvtun1Q0Od Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- iD4DBQE+/jNmIBUx1YRd/t0RAuw8AJds/684Qf8L0I5QrbBZ9O/JKgKwAJ9kL6lL DcT+8BE2Nd0XCXIziHS7xQ== =B3d2 -----END PGP SIGNATURE----- --dnBRaVjvtun1Q0Od-- From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 19:06:09 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0C55337B401 for ; Sat, 28 Jun 2003 19:06:09 -0700 (PDT) Received: from sabre.velocet.net (sabre.velocet.net [216.138.209.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3453943FF9 for ; Sat, 28 Jun 2003 19:06:08 -0700 (PDT) (envelope-from dgilbert@velocet.ca) Received: from trooper.velocet.ca (trooper.velocet.net [216.138.242.2]) by sabre.velocet.net (Postfix) with ESMTP id E7319138999; Sat, 28 Jun 2003 22:06:00 -0400 (EDT) Received: by trooper.velocet.ca (Postfix, from userid 66) id 9926874D7D; Sat, 28 Jun 2003 22:06:00 -0400 (EDT) Received: by canoe.velocet.net (Postfix, from userid 101) id 918F74AD1; Sat, 28 Jun 2003 22:05:56 -0400 (EDT) From: David Gilbert MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16126.18820.474512.227009@canoe.velocet.net> Date: Sat, 28 Jun 2003 22:05:56 -0400 To: Gregory Sutter In-Reply-To: <20030629003134.GV71533@klapaucius.zer0.org> References: <20030628190036.0E06B37B405@hub.freebsd.org> <000f01c33dad$1595a0f0$e602a8c0@flatline> <16126.9805.829406.368426@canoe.velocet.net> <20030629003134.GV71533@klapaucius.zer0.org> X-Mailer: VM 7.14 under 21.4 (patch 12) "Portable Code" XEmacs Lucid cc: David Gilbert cc: freebsd-performance@freebsd.org Subject: Re: Tuning Gigabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jun 2003 02:06:09 -0000 >>>>> "Gregory" == Gregory Sutter writes: Gregory> Will you please summarize the motherboard performance data so Gregory> we know which boards to buy and which to skip? Thanks. I've been working on such a summary. So far, the nVidia chipset boards have all tested badly: we couldn't coax them into passing more than 100 megabit of traffic by any means we could discern. The K7S5A has been our mainstay. Many of them are DOA, but the ones that survive a couple of weeks of cpuburn (see the port), on both CPU and memory tests, work amazingly well. These boards are limited to 300 megabit of total throughput because they have a 33 MHz 32-bit PCI bus. We've been testing mainly Athlon boards. We haven't seen good P4 boards, but most of the P4 boards we've had through have been workstation rather than server boards. The Tyan Tiger MPX is a dual-CPU board with 64-bit slots. I haven't had time to fully benchmark it because we use it as a primary database server, but it has generally performed at or near the top of the class. There is an ASUS dual board with 32-bit-only slots and the AMD 76x chipset (unfortunately it's far away and I can't look at it). Its 32-bit slots run at 66 MHz and have extraordinarily good throughput. AFAICT, it's currently out of production, but the dual board on the ASUS site looks very good. Dave. -- ============================================================================ |David Gilbert, Velocet Communications. | Two things can only be | |Mail: dgilbert@velocet.net | equal if and only if they | |http://daveg.ca | are precisely opposite.
| =========================================================GLO================ From owner-freebsd-performance@FreeBSD.ORG Sat Jun 28 19:23:45 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F3F5B37B401 for ; Sat, 28 Jun 2003 19:23:44 -0700 (PDT) Received: from sabre.velocet.net (sabre.velocet.net [216.138.209.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5267443FF5 for ; Sat, 28 Jun 2003 19:23:44 -0700 (PDT) (envelope-from dgilbert@velocet.ca) Received: from trooper.velocet.ca (trooper.velocet.net [216.138.242.2]) by sabre.velocet.net (Postfix) with ESMTP id 216D313875F; Sat, 28 Jun 2003 22:23:23 -0400 (EDT) Received: by trooper.velocet.ca (Postfix, from userid 66) id EEE2574C27; Sat, 28 Jun 2003 22:23:22 -0400 (EDT) Received: by canoe.velocet.net (Postfix, from userid 101) id E263C4AD1; Sat, 28 Jun 2003 22:23:17 -0400 (EDT) From: David Gilbert MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <16126.19861.842507.318997@canoe.velocet.net> Date: Sat, 28 Jun 2003 22:23:17 -0400 To: Craig Reyenga In-Reply-To: <000901c33dd1$12268780$0200000a@fireball> References: <20030628190036.0E06B37B405@hub.freebsd.org> <000f01c33dad$1595a0f0$e602a8c0@flatline> <16126.9805.829406.368426@canoe.velocet.net> <000901c33dd1$12268780$0200000a@fireball> X-Mailer: VM 7.14 under 21.4 (patch 12) "Portable Code" XEmacs Lucid cc: David Gilbert cc: freebsd-performance@freebsd.org Subject: Re: Tuning Gigabit X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jun 2003 02:23:45 -0000 >>>>> "Craig" == Craig Reyenga writes: >> 300 megabit is about where 32bit 33Mhz PCI maxes out. Craig> Could you tell me a little more about your tests? 
What boards, Craig> and what configuration? Well, first of all, a 33 MHz 32-bit PCI bus can transfer 32 bits per clock, 33 million times a second, which is just about 1 gigabit per second of _total_ PCI bus bandwidth. Consider that you're likely testing disk->RAM->NIC, so the same data crosses the shared bus more than once, and you end up with about 1/3 of that as throughput (minus bus overhead); 300 megabit is therefore a good number. There are many ways boards can get around this. Your IDE controller can be on a different bus, and so can your RAM. If all three are on different buses, you might get closer to your gigabit of throughput. You can also speed up the bus: conventional PCI can run at 33 or 66 MHz, and PCI-X can run at 66, 100 or 133 MHz. You can also make the bus wider; many new chipsets support 64-bit slots. Now, some boards I've tested (like the nVidia chipset ones) are strangely limited to 100 megabit. I can't explain this; it seems low no matter how you cut it. Our testing has been threefold: 1) Generating packets. We test the machine's ability to generate both large (1500, 3000 and 9000 byte) and small (64 byte) packets. Large-scale packet generation is necessary for the other tests. So far, some packet flood utilities from the Linux hacker camp are our most efficient small-packet generators; netcat on memory-cached objects or on /dev/zero generates our big packets. 2) Passing packets. Primarily, we're interested in routing. Our benchmarks are passing packets, passing packets with 100k routes, and passing packets with hundreds of ipf accounting rules. We look at both small- and large-packet performance. Packet-passing machines have at least two interfaces, but sometimes 3 or 4 are tested. Polling is a major win in the small-packet passing race. 3) Receiving packets. netcat is our friend again here. Receiving packets doesn't appear to be as challenging as generating or passing them. At any rate, we're clearly not testing file delivery, although we sometimes play with file delivery as a first test or for other testing reasons.
We've found several boards that corrupt packets when they pass more than 100 megabit of traffic. We haven't explained that one yet. Our tests centre on routing packets, because that's what we do with our high-performance FreeBSD boxes; all our other FreeBSD boxes "just work" at the level of performance they have. I would note, though, that we do have some strange data points where we've revisited old problems. One of the most peculiar is the 4-port cards based on the DEC Tulip chipset: on these cards we have only ever been able to pass 100 megabit _per card_, never per port. It would appear that the PCI bridge on these cards is imposing some form of limitation. We haven't tested under any OS other than FreeBSD, but the problem is definitely perplexing. Dave. -- ============================================================================ |David Gilbert, Velocet Communications. | Two things can only be | |Mail: dgilbert@velocet.net | equal if and only if they | |http://daveg.ca | are precisely opposite. | =========================================================GLO================