Date: Thu, 27 Jun 2013 12:25:38 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: enh <enh@google.com>
Cc: freebsd-numerics@freebsd.org
Subject: Re: sincos?
Message-ID: <20130627192538.GA41760@troutmask.apl.washington.edu>
In-Reply-To: <CAJgzZoqbF-bS6M8OYmVx7=eKfpNmavXXZXX0Zgvsxr07CUfC0w@mail.gmail.com>
References: <CAJgzZopTzfYXecu7zRKhVNEEBOCtz8Z2qK8ka74c5LKZxC8mEw@mail.gmail.com> <20130627013502.GA37295@troutmask.apl.washington.edu> <CAJgzZoqbF-bS6M8OYmVx7=eKfpNmavXXZXX0Zgvsxr07CUfC0w@mail.gmail.com>

On Thu, Jun 27, 2013 at 09:12:04AM -0700, enh wrote:
> well, that was Intel and the code's not been accepted, but yes --- that's
> another reason for me not to accept their patch!
>
> Intel claimed "The reason for this fix [beside workaround for O0 switch] -
> it helps to remove some sin[f]+cos[f] code duplication (which is the whole
> reason for introduction of such function at all), which results in
> 1.58-1.81x performance gain on intervals |x|<100." i've not seen their
> benchmark code, so i don't know what their distribution of values was, and
> i don't understand why they covered a range as large as +/- 100.
>

The code duplication that is removed is the argument reduction
for values |x| > pi/4 in sin and cos.  If you have

    void sincos(double x, double *s, double *c)
    {
        *s = sin(x);
        *c = cos(x);
    }

then both sin and cos call rem_pio2 (or whatever the function is
called) if |x| > pi/4.  The code in question removes one of the two
argument reduction calls, and so you get a speedup of 1.5x to 2x.

As Bruce noted, he would like to see some additional optimizations
for -2*pi < x < 2*pi (I may have the range wrong here) integrated
into sin, cos, sinl, and cosl before we worry about sincos[fl].
I'll hopefully get to those in August, but coshl, sinhl, and tanhl
are on my plate.

-- 
Steve