From: Kris Kennaway
To: Kris Kennaway
Cc: Julian Elischer, Jason Evans, Claus Guttesen, David Xu, current@freebsd.org
Date: Sun, 11 Dec 2005 23:30:23 -0500
Subject: Re: New libc malloc patch
Message-ID: <20051212043023.GA16678@xor.obsecurity.org>
In-Reply-To: <20051212012907.GA13640@xor.obsecurity.org>

On Sun, Dec 11, 2005 at 08:29:07PM -0500, Kris Kennaway wrote:

> I'll try to test this on a 4 CPU amd64 machine next.

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 5298176 adjusted timing: 4.173052 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 5299200 adjusted timing: 325.108643 seconds for 10000000 requests of 1024 bytes.
Thread 5298176 adjusted timing: 325.202485 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 5414912 adjusted timing: 1133.238459 seconds for 10000000 requests of 1024 bytes.
Thread 5299200 adjusted timing: 1134.525255 seconds for 10000000 requests of 1024 bytes.
Thread 5298176 adjusted timing: 1134.539555 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 1073760528 adjusted timing: 3.777175 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 1073760560 adjusted timing: 3.851702 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 3.887943 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 1073760528 adjusted timing: 3.866206 seconds for 10000000 requests of 1024 bytes.
Thread 1073761552 adjusted timing: 13.382795 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 14.407229 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 1073760528 adjusted timing: 3.782923 seconds for 10000000 requests of 1024 bytes.
Thread 1073763792 adjusted timing: 6.668655 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 14.346569 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 14.680211 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 1073760560 adjusted timing: 4.748248 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 9.898153 seconds for 10000000 requests of 1024 bytes.
Thread 1073764896 adjusted timing: 13.019884 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 15.326908 seconds for 10000000 requests of 1024 bytes.
Thread 1073763792 adjusted timing: 15.442164 seconds for 10000000 requests of 1024 bytes.

So it's 1.1 times faster for single-threaded, and 107 times faster
with 3 threads.

With libthr instead of libpthread:

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 5255680 adjusted timing: 2.357247 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 5256192 adjusted timing: 10.964918 seconds for 10000000 requests of 1024 bytes.
Thread 5255680 adjusted timing: 11.001288 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 5255680 adjusted timing: 17.467754 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 17.724583 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 17.913381 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 5255680 adjusted timing: 42.715420 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 43.481252 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 43.871452 seconds for 10000000 requests of 1024 bytes.
Thread 5257216 adjusted timing: 43.887820 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 5255680 adjusted timing: 139.316332 seconds for 10000000 requests of 1024 bytes.
Thread 5257216 adjusted timing: 140.117720 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 140.134057 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 140.855289 seconds for 10000000 requests of 1024 bytes.
Thread 5257728 adjusted timing: 140.865934 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 1073742416 adjusted timing: 1.366353 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 1073742416 adjusted timing: 1.429485 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.530733 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 1073742416 adjusted timing: 1.419813 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 1.432790 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.490218 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 1073743376 adjusted timing: 1.447554 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 1.503659 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 1.503937 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.504926 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 1073743376 adjusted timing: 1.595239 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.689753 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 1.750115 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 1.744271 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 1.890269 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 6
Starting test with 6 threads...
Thread 1073743856 adjusted timing: 1.847653 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 2.018481 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 2.059817 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 2.129204 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 2.223751 seconds for 10000000 requests of 1024 bytes.
Thread 1073744816 adjusted timing: 2.293809 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 20
Starting test with 20 threads...
Thread 1073744816 adjusted timing: 5.113769 seconds for 10000000 requests of 1024 bytes.
Thread 1073751136 adjusted timing: 4.973369 seconds for 10000000 requests of 1024 bytes.
Thread 1073750176 adjusted timing: 5.295912 seconds for 10000000 requests of 1024 bytes.
Thread 1073745296 adjusted timing: 5.502331 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 5.614890 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 5.608690 seconds for 10000000 requests of 1024 bytes.
Thread 1073752096 adjusted timing: 5.555465 seconds for 10000000 requests of 1024 bytes.
Thread 1073748736 adjusted timing: 5.650922 seconds for 10000000 requests of 1024 bytes.
Thread 1073748256 adjusted timing: 6.608054 seconds for 10000000 requests of 1024 bytes.
Thread 1073750656 adjusted timing: 7.144998 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 7.390905 seconds for 10000000 requests of 1024 bytes.
Thread 1073746256 adjusted timing: 7.364728 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 7.556064 seconds for 10000000 requests of 1024 bytes.
Thread 1073749216 adjusted timing: 7.357179 seconds for 10000000 requests of 1024 bytes.
Thread 1073752576 adjusted timing: 7.349483 seconds for 10000000 requests of 1024 bytes.
Thread 1073747776 adjusted timing: 7.375179 seconds for 10000000 requests of 1024 bytes.
Thread 1073751616 adjusted timing: 7.557854 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 7.915978 seconds for 10000000 requests of 1024 bytes.
Thread 1073749696 adjusted timing: 7.795219 seconds for 10000000 requests of 1024 bytes.
Thread 1073745776 adjusted timing: 8.007392 seconds for 10000000 requests of 1024 bytes.

So libthr is *much* faster than libpthread with both malloc
implementations, but jemalloc is still 1.7 times faster for 1 thread
and 80 times faster for 5 threads than phkmalloc.

Kris

P.S. Holy crap :)
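
The message does not say how the speedup figures (1.1x, 107x, 1.7x, 80x) were derived from the per-thread timings. As a rough sketch, assuming they compare the mean per-thread "adjusted timing" of phkmalloc against that of jemalloc for the same thread count, the arithmetic can be reproduced as follows. The `timings` table and the `speedup` helper are illustrative names, not part of malloc-test; the numbers are copied from the runs above.

```python
from statistics import mean

# Per-thread "adjusted timing" values in seconds, copied from the runs above.
# Key: (thread library, malloc implementation, thread count).
timings = {
    ("libpthread", "phkmalloc", 1): [4.173052],
    ("libpthread", "jemalloc", 1): [3.777175],
    ("libpthread", "phkmalloc", 3): [1133.238459, 1134.525255, 1134.539555],
    ("libpthread", "jemalloc", 3): [3.866206, 13.382795, 14.407229],
    ("libthr", "phkmalloc", 1): [2.357247],
    ("libthr", "jemalloc", 1): [1.366353],
    ("libthr", "phkmalloc", 5): [139.316332, 140.117720, 140.134057,
                                 140.855289, 140.865934],
    ("libthr", "jemalloc", 5): [1.595239, 1.689753, 1.750115,
                                1.744271, 1.890269],
}


def speedup(threadlib, nthreads):
    """Mean per-thread time of phkmalloc divided by that of jemalloc.

    Assumption: the post's ratios are based on the mean across threads.
    """
    phk = mean(timings[(threadlib, "phkmalloc", nthreads)])
    je = mean(timings[(threadlib, "jemalloc", nthreads)])
    return phk / je


for lib, n in [("libpthread", 1), ("libpthread", 3), ("libthr", 1), ("libthr", 5)]:
    print(f"{lib}, {n} thread(s): jemalloc is {speedup(lib, n):.1f}x faster")
```

Under that assumption the ratios land close to the figures quoted in the message. Comparing the slowest thread in each run instead would give a noticeably lower ratio for the 3-thread libpthread case, because the jemalloc per-thread times there are spread from under 4 seconds to over 14 seconds.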