From owner-cvs-all Mon Jun 25 08:00:53 2001
Date: Tue, 26 Jun 2001 00:58:43 +1000 (EST)
From: Bruce Evans
To: Matt Dillon
Cc: Mikhail Teterin, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG,
        cvs-all@FreeBSD.ORG
Subject: Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
In-Reply-To: <200106241549.f5OFn6J78347@earth.backplane.com>

On Sun, 24 Jun 2001, Matt Dillon wrote:

> :I benchmarked the following version using lmbench2:
> :
> :#define bzero(p, n) ({ \
> :        if (__builtin_constant_p(n) && (n) <= 16) \
> :                __builtin_memset((p), 0, (n)); \
> :        else \
> :                (bzero)((p), (n)); \
> :})
> :
> :The results were uninteresting: essentially no change.  lmbench2 is a
> :micro-benchmark, so it tends to show larger improvements for
> :micro-optimizations than can be expected in normal use.
>
>     I wouldn't expect lmbench to be useful here.

I would expect the opposite.  If the bzero()s in the networking code
don't show up in the network latency benchmarks, where would they show
up?  ISTR a Linux hacker who made lmbench1 go faster for Linux saying
that the bzero() at the start of the FreeBSD tcp_input() is a really
stupid thing to do.  But I think even completely eliminating it would
be just another micro-optimization, worth 1% in favourable cases, so
you would need 10 more like it to give a useful speedup.
> :One point that I noticed after writing my original reply: the gcc
> :builtins depend on misaligned accesses not trapping.  This is
> :reasonable on i386's, although it is broken if alignment checking is
> :enabled (but other things are broken, e.g., copying of structs
> :essentially uses the builtin memcpy and does misaligned copies for
> :some structs).
>
>     I added an alignment check to my bzerol() inline and it blew up...
>     it added 6ns to the loop, which is fine, but it blew up the
>     constant optimization and wound up adding a switch table and a
>     dozen instructions inline (hundreds of bytes!).

Yes, it's clear that alignment checking is not worth doing in the
kernel.  Userland is different -- the application might have turned on
alignment checking, or it might be poorly behaved and pass a lot of
unaligned buffers.  gcc is primarily a userland compiler, so it's a
little surprising that its builtins don't worry about alignment.

>     I added alignment checks to i586_bzero but it ate 20ns.  Also,
>     it should be noted that i586_bzero() as it currently stands does
>     not do any alignment checks either - it checks only the size
>     argument, it doesn't check the base pointer.

Neither does generic_bzero().  i586_bzero() just turns itself into
generic_bzero() for small sizes.  I'm fairly sure that I benchmarked
this, and came to the conclusion that there is nothing significantly
better than "rep stosl" when the size isn't known at compile time.  In
particular, lots of jumps as in i486_bzero are actively bad.  This may
be P5-specific (branch prediction is not very good on original
Pentiums).

Bruce

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message