From: Peter Wemm
To: Bruce Evans
Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org
Subject: Re: cvs commit: src/sys/i386/include cpufunc.h
In-reply-to: Your message of "Thu, 19 Aug 1999 17:27:33 +1000." <199908190727.RAA14796@godzilla.zeta.org.au>
Date: Thu, 19 Aug 1999 22:28:50 +0800
Message-Id: <19990819142850.C78D81C9F@overcee.netplex.com.au>

Bruce Evans wrote:
> >   Modified files:
> >     sys/i386/include     cpufunc.h
> >   Log:
> >   Try using the builtin ffs() for egcs, it (by random inspection)
> >   generates slightly better code and avoids the incl then subl when
> >   using ffs(foo) - 1.
>
> The inline asm version of ffs(x) should be implemented as
> (x == 0 ? 0 : bsfl(x) + 1).  The compiler can then perform all possible
> optimisations except ones that use the condition codes delivered by bsfl
> (these never seem to help).  This gives slightly better code than the
> builtin.

except the one where I want "bsfl(x)", not "bsfl(x) + 1".  With the
cpufunc.h inline, it works out as:

	testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1: subl $1,%eax

ie: it can't optimize out the +1 -1.

How about this instead:

static __inline int
__bsfl(int mask)
{
	int	result;

	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
	return result;
}

static __inline int
ffs(int mask)
{
	return mask == 0 ? mask : __bsfl(mask) + 1;
}

Then, with the following code:

extern int bar(int);
int foo(int j)
{
	int i;

	if (j)
		bar (ffs(j) - 1);
}

It gets optimized much better:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L6
#APP
	bsfl %eax,%eax
#NO_APP
	pushl %eax
	call bar
	addl $4,%esp
.L6:
	ret

Versus the original inline with your ffs macro:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L37
#APP
	testl %eax,%eax
	je 1f
	bsfl %eax,%eax
	incl %eax
1:
#NO_APP
	decl %eax
	pushl %eax
	call bar
	addl $4,%esp
.L37:
	ret

The redundant incl/decl pair isn't optimized out, and the code contains a
duplicate (never taken) test.

Using this same code with builtin_ffs() results in:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L37
	bsfl %eax,%eax
	pushl %eax
	call bar
	addl $4,%esp
.L37:
	ret

However, you're right.  builtin_ffs() sucks when the argument is not known
to be nonzero, ie: leave out the if (j), and it turns into:

foo:
	movl 4(%esp),%eax
	bsfl %eax,%edx
	jne .L37
	movl $-1,%edx
.L37:
	pushl %edx
	call bar
	addl $4,%esp
	ret

Which means bsfl is always executed, even for a zero arg.

Leaving out "if (j)" in my code above with my __bsfl() version results in:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L6
#APP
	bsfl %eax,%eax
#NO_APP
	incl %eax
.L6:
	decl %eax
	pushl %eax
	call bar
	addl $4,%esp
	ret

.. which is equivalent to your inline.

In all cases I've checked, the code is either equivalent or better
(ie: addl, subl optimized out).

Having ffs() be base 1 is a pest.  ffs0() (base 0) would be damn convenient
at times, considering the number of places 'ffs(foo) - 1' turns up.

> Bruce

Cheers,
-Peter
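For comparison, here is a portable GCC/Clang sketch of the split discussed in this thread: a raw "bit scan" that assumes a nonzero argument, ffs() built on top of it, and the base-0 ffs0() that is wished for at the end. This is not the cpufunc.h code. The names bsf_sketch(), ffs_sketch() and ffs0_sketch() are hypothetical, and the sketch assumes a compiler that provides __builtin_ctz(), whose result is undefined for 0 (much as bsfl leaves its destination undefined for a zero source).

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the __bsfl() inline above: index (base 0)
 * of the lowest set bit.  Like bsfl, it is only meaningful for a nonzero
 * argument, so that precondition is checked with assert().
 */
static inline int bsf_sketch(unsigned int mask)
{
    assert(mask != 0);
    return __builtin_ctz(mask);   /* GCC/Clang builtin, undefined for 0 */
}

/* ffs() in the shape proposed in the mail: base 1, and 0 when no bit is set. */
static inline int ffs_sketch(int mask)
{
    return mask == 0 ? 0 : bsf_sketch((unsigned int)mask) + 1;
}

/*
 * Hypothetical base-0 variant ("ffs0").  It returns -1 for a zero argument,
 * which is exactly what the existing 'ffs(foo) - 1' idiom yields.
 */
static inline int ffs0_sketch(int mask)
{
    return mask == 0 ? -1 : bsf_sketch((unsigned int)mask);
}
```

Returning -1 for zero keeps ffs0_sketch() a drop-in replacement for 'ffs(foo) - 1'. A caller that has already tested for zero, as foo() does with "if (j)", could call bsf_sketch() directly and skip the second test entirely.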