Date:      Sun, 23 Mar 2003 19:24:05 +0100
From:      Till Riedel <till@f111.hadiko.de>
To:        freebsd-current@freebsd.org
Subject:   Re: libm problem
Message-ID:  <20030323182405.GA2135@f111.hadiko.de>
In-Reply-To: <200303231843.16545.michaelnottebrock@gmx.net>

On Sun, Mar 23, 2003 at 06:43:16PM +0100, Michael Nottebrock wrote:
> On Sunday 23 March 2003 18:02, Till Riedel wrote:
> > why not
> > +_CPUCFLAGS = -march=pentium4 -mno-sse2
> >
> > > choose, and in the case of pentium4 producing broken code the
> > > obvious fallback would be pentium3...
> >
> > As I see it, the above would in fact be the same, because only the SSE2
> > code differs from -march=pentium3, which in turn only adds SSE (and
> > probably generates slower code than pentiumpro). Code generation for
> > all x86 uses the same rules (i386.md), except that some rules only
> > apply if TARGET_SSE2 is defined.
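The argument that -march=pentium4 -mno-sse2 reduces to -march=pentium3 can be sketched as bit masking over per-CPU feature flags. This is a hypothetical C model loosely based on the PTA_* bits in gcc's i386.c; the names ISA_*, ISA_UNKNOWN, arch_table and effective_isa are made up for illustration, and the model covers only the instruction-set flags, not the tuning target.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature bits, loosely modelled on gcc's PTA_* flags. */
enum {
    ISA_MMX          = 1u << 0,
    ISA_SSE          = 1u << 1,
    ISA_SSE2         = 1u << 2,
    ISA_PREFETCH_SSE = 1u << 3
};

/* Returned for a -march name that is not in the table. */
#define ISA_UNKNOWN (~0u)

struct arch_entry {
    const char *name;   /* -march=<name> */
    unsigned    isa;    /* feature bits that -march=<name> enables */
};

static const struct arch_entry arch_table[] = {
    { "pentiumpro", 0 },
    { "pentium3",   ISA_MMX | ISA_SSE | ISA_PREFETCH_SSE },
    { "pentium4",   ISA_MMX | ISA_SSE | ISA_SSE2 | ISA_PREFETCH_SSE },
};

/* Feature bits for -march=<name> with the bits in `disabled` cleared,
   e.g. pass ISA_SSE2 to model -mno-sse2. */
static unsigned effective_isa(const char *name, unsigned disabled)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof arch_table / sizeof arch_table[0]; i++)
        if (strcmp(arch_table[i].name, name) == 0)
            return arch_table[i].isa & ~disabled;
    return ISA_UNKNOWN;
}
```

In this model effective_isa("pentium4", ISA_SSE2) equals effective_isa("pentium3", 0). The tuning target, which the patch below deals with, is a separate field and is not captured here.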
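I now know, at least to some extent, what makes -march=pentium4 slow:
someone at GCC hacked up a poor cost table for its operations. Mapping
pentium4 to PROCESSOR_PENTIUMPRO instead of PROCESSOR_PENTIUM4 makes it
fast again (note that the diff below lists the patched i386.c first and
the original i386.c.orig second):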
*** i386.c      Sun Mar 23 17:32:38 2003
--- i386.c.orig Sun Mar 23 17:45:35 2003
***************
*** 893,895 ****
{"pentium3", PROCESSOR_PENTIUMPRO, PTA_MMX | PTA_SSE | PTA_PREFETCH_SSE},
!       {"pentium4", PROCESSOR_PENTIUMPRO, PTA_SSE | PTA_SSE2 |
                     PTA_MMX | PTA_PREFETCH_SSE},
--- 893,895 ----
{"pentium3", PROCESSOR_PENTIUMPRO, PTA_MMX | PTA_SSE | PTA_PREFETCH_SSE},
!  {"pentium4", PROCESSOR_PENTIUM4, PTA_SSE | PTA_SSE2 | PTA_MMX | PTA_PREFETCH_SSE},
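Each -march entry in the patch carries two independent things: the ISA flags (PTA_*) and a processor id that selects the tuning, and with it the cost table. A hypothetical C model of that split follows; PROC_*, struct alias, before, after, cost_table and lookup are made-up names, and the cost tables are represented only by strings named after gcc's pentiumpro_cost and pentium4_cost.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for gcc's PROCESSOR_* values. */
enum processor { PROC_PENTIUMPRO, PROC_PENTIUM4 };

struct alias {
    const char    *name;   /* -march=<name> */
    enum processor proc;   /* tuning target: picks the cost table */
    int            sse2;   /* ISA flag: may SSE2 insns be emitted? */
};

/* pentium4 entry as in i386.c.orig ... */
static const struct alias before[] = {
    { "pentium3", PROC_PENTIUMPRO, 0 },
    { "pentium4", PROC_PENTIUM4,   1 },
};

/* ... and as in the patched i386.c: same ISA flag, PentiumPro tuning. */
static const struct alias after[] = {
    { "pentium3", PROC_PENTIUMPRO, 0 },
    { "pentium4", PROC_PENTIUMPRO, 1 },
};

/* Map the tuning target to the name of its cost table. */
static const char *cost_table(enum processor p)
{
    return p == PROC_PENTIUM4 ? "pentium4_cost" : "pentiumpro_cost";
}

/* Linear lookup of -march=<name> in a table of n entries;
   returns NULL if the name is not present. */
static const struct alias *lookup(const struct alias *t, size_t n,
                                  const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return &t[i];
    return NULL;
}
```

After the change, -march=pentium4 still enables SSE2 but is costed like a Pentium Pro, which is why -march=pentium4 -mno-sse2 should then match -march=pentium3.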

> 
> Just out of curiosity, have you tried using -mfpmath=sse? I remember someone
> on this list claiming that the SSE fpa-code works much better than the i387
> code which is used by default (even with -march=pentium4).
It seems to be equally fast in the Whetstone benchmark, but it makes the
SSE2 case slower, because most SSE2 rules depend on i387 math.
Here are some results with the cost patch above applied:

flags                           w/ math lib           w/o math lib
-march=pentiumpro               1.05 s,  954 MFLOPS   0.28 s, 3555 MFLOPS
-march=pentium3                 1.05 s,  954 MFLOPS   0.28 s, 3556 MFLOPS
-march=pentium3 -mfpmath=sse    1.05 s,  953 MFLOPS   0.28 s, 3555 MFLOPS
-march=pentium4                 1.06 s,  942 MFLOPS   0.29 s, 3393 MFLOPS
-march=pentium4 -mno-sse2 (*)   1.05 s,  954 MFLOPS   0.28 s, 3555 MFLOPS
-march=pentium4 -mfpmath=sse    1.14 s,  880 MFLOPS   0.36 s, 2768 MFLOPS

(*) after the patch, this should be the same as -march=pentium3
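To put the Whetstone numbers in perspective, the C sketch below computes each run's MFLOPS change relative to -march=pentiumpro. The figures are copied from the results above; choosing pentiumpro as the baseline, and the names runs, pct_change and print_relative, are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Whetstone MFLOPS figures copied from the results above. */
struct run {
    const char *flags;
    double with_libm;     /* MFLOPS, w/ math lib  */
    double without_libm;  /* MFLOPS, w/o math lib */
};

static const struct run runs[] = {
    { "-march=pentiumpro",            954.0, 3555.0 },
    { "-march=pentium3",              954.0, 3556.0 },
    { "-march=pentium3 -mfpmath=sse", 953.0, 3555.0 },
    { "-march=pentium4",              942.0, 3393.0 },
    { "-march=pentium4 -mno-sse2",    954.0, 3555.0 },
    { "-march=pentium4 -mfpmath=sse", 880.0, 2768.0 },
};

/* Percentage change of `value` relative to `baseline`
   (negative means fewer MFLOPS, i.e. slower). */
static double pct_change(double value, double baseline)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* Print every run relative to the first entry (-march=pentiumpro). */
static void print_relative(void)
{
    const struct run *base = &runs[0];
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-30s %+7.2f%% %+7.2f%%\n", runs[i].flags,
               pct_change(runs[i].with_libm, base->with_libm),
               pct_change(runs[i].without_libm, base->without_libm));
}
```

By this measure, -march=pentium4 -mfpmath=sse loses about 7.8% with the math library and about 22.1% without it, while plain -march=pentium4 loses about 1.3% and 4.6%.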

till
