From: Bruce Evans
Date: Fri, 13 Jul 2007 14:38:10 +1000 (EST)
To: "Sean C. Farley"
Cc: freebsd-arch@freebsd.org
Subject: Re: Assembly string functions in i386 libc

On Thu, 12 Jul 2007, Sean C. Farley wrote:

> On Thu, 12 Jul 2007, Bruce Evans wrote:
>> Now I've looked at it. I think it is not testing strlen() at all,
>> except for the libc case, because __pure prevents more than 1 call to
>> strlen(). (The existence of __pure is also a bug. __pure was the
>> FreeBSD spelling of the __const__ attribute in gcc-1. It was removed
>> when special support for gcc-1 was dropped, and should not have been
>> recycled.) __pure is a syntax error in the old version of FreeBSD
>> that I tested on. I first tried __pure2, which is the FreeBSD
>> spelling of the __const__ attribute in gcc-2. I think it is weaker
>> than the __pure__ attribute in gcc-3.
>
> From what I could find, strlen() should not have the __const__ (__pure2)
> attribute since it is being passed a pointer, but __pure__ (__pure)
> should work. Are you saying that __pure used to mean __const__ in gcc-1
> but now it means __pure__ for gcc-2.96 and above? The redefinition of
> __pure is what you are saying is a bug. Yes?

Yes to most of this. __pure2 is actually stronger than __pure[>2.96].
__pure2 has the very large effect of removing all calls to strlen() from
the loop. This affected everything except libc strlen(), since everything
else was named xstrlen() and declared as __pure*, while libc strlen() was
declared in <string.h> without __pure*. OTOH, __pure[>2.96] has no effect
on this benchmark, at least with gcc-3.3.3. I don't understand why it has
no effect. It has no effect even when I change the arg to a literal. The
context is very simple, with no aliasing problems in sight, at least with
the literal arg (with the arg possibly being argv[2], maybe gcc has to
worry about the arg being modified by a signal handler). If
__pure[>2.96] doesn't work in this simple context, then it isn't clear
when it can work.

BTW, starting somewhere near gcc-3.4 for -O2 and gcc-4.2 for -O, simple
loops like this don't always work in benchmarks, because the compiler
removes the whole loop if it can see that it doesn't do anything. The
compiler can see this if it can see inside any function calls in the
loop (this currently requires the functions to be in the same source
file or #included there), or if the functions are declared as
sufficiently __pure.
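A minimal sketch of the benchmarking pitfall, assuming GCC or Clang. xstrlen() is declared with __attribute__((__pure__)), which FreeBSD's __pure expands to for gcc >= 2.96. Its body here is only a plain byte loop, not any of the versions benchmarked in this thread. Calling through a volatile function pointer keeps the optimizer from hoisting or merging the calls, and storing each result in a volatile sink keeps the loop from being deleted as dead code. The names strlen_fn, sink and time_strlen are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical benchmark target. __attribute__((__pure__)) (FreeBSD's
 * __pure on gcc >= 2.96) says the function has no side effects and its
 * result depends only on its arguments and on memory it may read.
 * strlen() must not be declared __attribute__((__const__)) (FreeBSD's
 * __pure2): that promises it reads no memory at all, which is false for
 * a function that dereferences its pointer argument.
 */
size_t xstrlen(const char *s) __attribute__((__pure__));

size_t
xstrlen(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/*
 * The pointer object itself is volatile, so it is reloaded on every
 * iteration and the compiler cannot assume which function (or which
 * attributes) it will call. No call can be hoisted or merged.
 */
static size_t (*volatile strlen_fn)(const char *) = xstrlen;

/* Every result is stored here, so the loop is not dead code. */
static volatile size_t sink;

/* Return the CPU time, in seconds, spent on `iterations` calls. */
double
time_strlen(const char *str, long iterations)
{
	clock_t start;

	assert(iterations >= 0);
	start = clock();
	for (long i = 0; i < iterations; i++)
		sink = strlen_fn(str);
	return ((double)(clock() - start) / CLOCKS_PER_SEC);
}
```

For example, time_strlen("hello", 10000000L) should make ten million real calls. If xstrlen() is called directly instead and the result is not kept, the compiler may hoist the pure call out of the loop or delete the loop entirely.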
When I used __pure2 with gcc-3.3.3 -O, gcc removed the function calls but
not the loop. gcc-4.2 would also remove the loop.

> I removed __pure from main.c and added -static -g.
>
> Athlon XP 2100 (1.72 GHz):
> libcstrlen: time spent executing strlen(string) = 64: 0.994755
> asmstrlen: time spent executing strlen(string) = 64: 0.989012
> basestrlen: time spent executing strlen(string) = 64: 0.879722
> strlen: time spent executing strlen(string) = 64: 0.626727
> strlen2: time spent executing strlen(string) = 64: 0.587162

That looks just like my results on A64 in 32-bit mode. (A64 is remarkably
similar to AXP in most CPU resources, including pipelines, so its
performance is remarkably similar even when its mode differs.)

> ...[asm version more than twice as slow on P3-P4]
> The Athlon XP did much better with the assembly version than either
> Intel CPU for me. For all three CPUs using various string lengths from
> 1 to 256, the C versions always beat the assembly version although it
> came somewhat close for the 9 to 32 byte lengths to basestrlen.

Intel CPUs are remarkably different from AXP :-). I'm surprised at the
sign of the difference here -- I would have expected them to be better
for the string instructions.

Bruce
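The sources of Sean's strlen and strlen2 are not shown in this message. The following is only a hedged illustration of one common way a C strlen() can outrun an assembly loop built on the x86 string instructions (e.g. repne scasb). It steps byte by byte until the pointer is word aligned, then tests a whole unsigned long per iteration with the well-known "has a zero byte" bit trick. The name wstrlen() is hypothetical. The word loop may read a few bytes past the terminator within the same aligned word. That cannot fault, because an aligned word never straddles a page boundary, but it is formally out of bounds in ISO C and memory checkers such as AddressSanitizer may report it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;

/* Hypothetical word-at-a-time strlen(); not the libc implementation. */
size_t
wstrlen(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	const word_t ones = (word_t)-1 / UCHAR_MAX;	/* 0x01 in every byte */
	const word_t highs = ones << (CHAR_BIT - 1);	/* 0x80 in every byte */
	word_t w;

	assert(s != NULL);

	/* Check single bytes until p is word aligned. */
	for (; (uintptr_t)p % sizeof(word_t) != 0; p++)
		if (*p == '\0')
			return ((size_t)(p - (const unsigned char *)s));

	/*
	 * (w - ones) & ~w & highs is nonzero iff some byte of w is zero.
	 * memcpy() avoids strict-aliasing problems; compilers typically
	 * turn it into a single aligned load.
	 */
	for (;; p += sizeof(word_t)) {
		memcpy(&w, p, sizeof(w));
		if (((w - ones) & ~w & highs) != 0)
			break;
	}

	/* The terminator is somewhere in this word: find it bytewise. */
	while (*p != '\0')
		p++;
	return ((size_t)(p - (const unsigned char *)s));
}
```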