Date:      Tue, 22 Feb 2005 15:18:10 -0500
From:      David Schultz <das@FreeBSD.ORG>
To:        Nate Lawson <nate@root.org>
Cc:        cvs-all@FreeBSD.ORG
Subject:   Re: cvs commit: src/lib/msun/i387 Makefile.inc e_atan2.S e_atan2f.S s_atan.S
Message-ID:  <20050222201810.GA37791@VARK.MIT.EDU>
In-Reply-To: <421B81E4.6080909@root.org>
References:  <200502211604.j1LG4NNx037623@repoman.freebsd.org> <421B24E2.7050800@portaone.com> <20050222135251.GB29054@VARK.MIT.EDU> <421B81E4.6080909@root.org>

On Tue, Feb 22, 2005, Nate Lawson wrote:
> David Schultz wrote:
> >By the way, here are some other results for the Pentium 4, all
> >without SSE.  SSE makes things a bit worse, probably because the
> >x87 and SSE registers are shared, and the Pentium 4 imposes a
> >large penalty for switching between the two sets.
> 
> I don't believe this is correct.  MMX and x87 use the same register 
> context (hence emms), however the XMM registers (SSE*) are separate. 
> It's possible gcc is generating MMX instructions though with your SSE 
> command line switch.

Yep, you're right, I was thinking of the MMX register set.  I
compared the code generated by gcc with and without SSE/SSE2, and
found that the only thing it uses SSE2 for is converting from
floating point->integer and back (e.g. CVTTSD2SI instead of i387
control word frobbing and FISTL).  There was also one place where
gcc just got confused and juggled around a bunch of registers on
the i387 stack, but I don't think that accounts for the
difference.  I wonder if CVTTSD2SI and friends are slower than an
OR/MOV/FLDCW/FISTL/FLDCW sequence on the Pentium 4 for some
bizarre reason, or if I missed something else significant while
scanning the diff.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050222201810.GA37791>