From owner-freebsd-numerics@freebsd.org Tue Sep  4 00:15:00 2018
Date: Mon, 3 Sep 2018 16:57:24 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180903235724.GA95333@troutmask.apl.washington.edu>

Anyone know where the approximations for j0 (and y0) come from?
msun/src/e_j0.c states

 * for x in (2,inf)
 * 	j0(x) = sqrt(2/(pi*x))*(p0(x)*cos(x0)-q0(x)*sin(x0))
 * 	where x0 = x-pi/4. It is better to compute sin(x0),cos(x0)
 *	as follow:
 *		cos(x0) = cos(x)cos(pi/4)+sin(x)sin(pi/4)
 *			= 1/sqrt(2) * (cos(x) + sin(x))
 *		sin(x0) = sin(x)cos(pi/4)-cos(x)sin(pi/4)
 *			= 1/sqrt(2) * (sin(x) - cos(x))
 * 	(To avoid cancellation, use
 *		sin(x) +- cos(x) = -cos(2x)/(sin(x) -+ cos(x))
 * 	to compute the worse one.)

p0(x) and q0(x) are divergent asymptotic series.  If I extract pzero()
and qzero() from e_j0.c and compare the results against summing truncated
versions of p0(x) and q0(x), there are no obvious connections.

Reading the documentation for the algorithms used in MPFR suggests that
x >= p/2*log(2), where p is the precision of x, is required for use of
the large argument approximation for j0(x).  In double precision, p = 53,
so we have x >= 18.368...

Consider x = 18.4 and sum up to N = 31 in the asymptotic series:

% ./pq 30 18.4
p  = 9.997932830701132e-01, q  = -6.781826311540553e-03   <-- series
pp = 9.997932830701132e-01, qq = -6.781826311540509e-03   <-- pzero,qzero
ulp(p, pp) = 0.000000e+00
ulp(q, qq) = 2.550000e+01

This is almost reasonable if 25.5 ULP is acceptable in q0(x).  Note the
series are computed in long double with 64 bits of precision.

Now, comparing x = 2 and summing N = 4 (best results).

% ./pq 4 2
p  = 9.894313812255859e-01, q  = -5.334472656250000e-02
pp = 9.862158212188928e-01, qq = -5.647769967932505e-02
ulp(p, pp) = 1.448159e+13
ulp(q, qq) = 2.257545e+14

For values of N > 4, the series start to diverge!

So, how does msun use the large argument approximation for j0(x)?

-- 
Steve
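For reference, a minimal sketch of this kind of check, in C.  It is not
the ./pq program used above (whose exact normalization is not shown); it
sums the standard Hankel asymptotic series for P(0,x) and Q(0,x)
(DLMF 10.17.3) in long double, using the nu = 0 coefficient recurrence
a_k = -a_{k-1}*(2k-1)^2/(8k).  Run it as, e.g., "./a.out 5 2"; with 5
coefficients at x = 2 it reproduces the p and q values quoted above for
"./pq 4 2".

#include <stdio.h>
#include <stdlib.h>

/*
 * Sum N coefficients of the Hankel asymptotic series for P(0,x) and
 * Q(0,x).  Even-index coefficients contribute to P, odd-index ones to
 * Q, with alternating signs.  Only a sketch for experimenting with
 * truncation.
 */
int
main(int argc, char **argv)
{
	long double a, p, q, x, xk;
	int k, N;

	if (argc < 3)
		return (1);
	N = atoi(argv[1]);
	x = strtold(argv[2], NULL);
	a = 1.0L;		/* a_0 */
	xk = 1.0L;		/* x^k */
	p = q = 0.0L;
	for (k = 0; k < N; k++) {
		if (k > 0) {
			a *= -(long double)((2 * k - 1) * (2 * k - 1)) /
			    (8.0L * k);
			xk *= x;
		}
		if (k % 2 == 0)
			p += (k % 4 == 0 ? a : -a) / xk;  /* P(0,x) term */
		else
			q += (k % 4 == 1 ? a : -a) / xk;  /* Q(0,x) term */
	}
	printf("p = %.16Le\nq = %.16Le\n", p, q);
	return (0);
}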
From owner-freebsd-numerics@freebsd.org Tue Sep  4 03:57:48 2018
Date: Tue, 4 Sep 2018 03:56:28 +0000
From: "Montgomery-Smith, Stephen" <stephen@missouri.edu>
To: freebsd-numerics@freebsd.org, sgk@troutmask.apl.washington.edu
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)

A quick google search turned up this

https://www.cl.cam.ac.uk/~jrh13/papers/bessel.pdf

which has the functions p0 and q0.  Maybe this was the basis of this code.
From owner-freebsd-numerics@freebsd.org Tue Sep  4 04:10:12 2018
Date: Mon, 3 Sep 2018 21:10:10 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: "Montgomery-Smith, Stephen" <stephen@missouri.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180904041010.GA96191@troutmask.apl.washington.edu>

On Tue, Sep 04, 2018 at 03:56:28AM +0000, Montgomery-Smith, Stephen wrote:
> A quick google search turned up this
>
> https://www.cl.cam.ac.uk/~jrh13/papers/bessel.pdf
>
> which has the functions p0 and q0.  Maybe this was the basis of this code.

I've read that paper.  It uses |x| > 45 for the cut over to the large
argument asymptotic expansion.  One of the primary results of that paper
is the development of new approximations that are robust near zeros of
Jn(x).  In the discussion of the results, the paper notes the use of a
double-double representation for intermediate results.

A&S claims that the remainder in truncating the series does not exceed
the magnitude of the first neglected term.  If you set x = 2 and compute
the terms in p0(x), one finds the smallest term is about |pk| = 1e-4.
-- 
Steve

From owner-freebsd-numerics@freebsd.org Tue Sep  4 21:22:03 2018
Date: Tue, 4 Sep 2018 14:22:01 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: "Montgomery-Smith, Stephen" <stephen@missouri.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180904212201.GA1752@troutmask.apl.washington.edu>

On Mon, Sep 03, 2018 at 09:10:10PM -0700, Steve Kargl wrote:
> A&S claims that the remainder in truncating the series does not exceed
> the magnitude of the first neglected term.  If you set x = 2 and compute
> the terms in p0(x), one finds the smallest term is about |pk| = 1e-4.

To follow up, here are the individual terms and the estimated value of
j0(x).  pk and qk are the terms, and p0(n,x) and q0(n,x) are the
accumulated sums.
In p0(n,x)
 k   pk                        p0(n,x)
 0    1.000000000000000e+00    1.000000000000000e+00
 1   -1.757812500000000e-02    9.824218750000000e-01
 2    2.574920654296875e-03    9.849967956542969e-01
 3   -1.026783138513565e-03    9.839700125157833e-01
 4    7.959574759297539e-04    9.847659699917131e-01
 5   -1.015778544477541e-03    9.837501914472355e-01
 6    1.931887598216261e-03    9.856820790454518e-01
 7   -5.124164754615886e-03    9.805579142908359e-01
 8    1.807719255473622e-02    9.986351068455721e-01

In q0(n,x)
 k   qk                        q0(n,x)
 0   -6.250000000000000e-02   -6.250000000000000e-02
 1    4.943847656250000e-03   -5.755615234375000e-02
 2   -1.341104507446289e-03   -5.889725685119629e-02
 3    7.861308404244483e-04   -5.811112601077184e-02
 4   -8.059069443788758e-04   -5.891703295515072e-02
 5    1.280304207101901e-03   -5.763672874804882e-02
 6   -2.915080393737037e-03   -6.055180914178585e-02
 7    9.007320857723237e-03   -5.154448828406261e-02
 8   -3.627992116888033e-02   -8.782440945294294e-02

 x          libm j0(x)               A&S
 2.000000   2.238907791412357e-01    2.429095124592851e-01

As k increases above k=8, one sees the divergence of the asymptotic
series.

-- 
Steve

From owner-freebsd-numerics@freebsd.org Wed Sep  5 12:06:41 2018
Date: Wed, 5 Sep 2018 22:06:29 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180905201540.D1142@besplex.bde.org>

On Mon, 3 Sep 2018, Steve Kargl wrote:

> Anyone know where the approximations for j0 (and y0) come from?

I think they are ordinary minimax rational approximations for related
functions.  As you noticed, the asymptotic expansion doesn't work below
about x = 8 (it is off by about 10% for j0(2).  But we want to use the
single formula given by the asymptotic expansion for all the subintervals:

XX /*
XX  * j0(x) = 1/sqrt(pi) * (P(0,x)*cc - Q(0,x)*ss) / sqrt(x)
XX  * y0(x) = 1/sqrt(pi) * (P(0,x)*ss + Q(0,x)*cc) / sqrt(x)
XX  */
XX 	if(ix>0x48000000) z = (invsqrtpi*cc)/sqrt(x);
XX 	else {
XX 		u = pzero(x); v = qzero(x);
XX 		z = invsqrtpi*(u*cc-v*ss)/sqrt(x);
XX 	}
XX 	return z;
XX }

where pzero(s) is nominally 1 - 9/128 s**2 + ... and qzero(s) is nominally
-1/8 *s + ....
To work, pzero and qzero must not actually be these nominal functions.

A non-tricky implementation would use:

	z = pzero(x) / qzero(x);

where pzero and qzero depend on the subinterval like now and have all the
other terms folded into them.  We already do this for |x| < 2 (except we
spell the rational function r/s and don't fold all of the terms directly
into r and s).  This might be the best implementation, and I would also
try using smaller intervals with polynomial approximations.

However, we use the asymptotic formula for all cases above x = 2.  It is
off by at least 10% unless pzero and qzero are adjusted.  But 10% is not
much, and adjusting only pzero and qzero apparently works to compensate
for this.  To use the same formula for all cases, the complicated factors
cc, ss and sqrt(x) must not be folded into pzero and qzero; instead these
factors supply some of the errors which must be compensated for.

To find the adjustments, I would try applying the same scale factor to
the nominal pzero and qzero.  E.g., for j0, the error factor is
invsqrtpi*(u*cc-v*ss)/sqrt(x) / j0(x).  Divide the nominal pzero and
qzero by this.  Apply normal minimax methods to the modified pzero and
qzero.

Bruce

From owner-freebsd-numerics@freebsd.org Wed Sep  5 12:19:11 2018
Date: Wed, 5 Sep 2018 22:19:06 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
Cc: Steve Kargl <sgk@troutmask.apl.washington.edu>, freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180905221116.X1704@besplex.bde.org>

On Wed, 5 Sep 2018, Bruce Evans wrote:

> On Mon, 3 Sep 2018, Steve Kargl wrote:
>
>> Anyone know where the approximations for j0 (and y0) come from?
>
> I think they are ordinary minimax rational approximations for related
> functions.  As you noticed, the asymptotic expansion doesn't work below
> about x = 8 (it is off by about 10% for j0(2).  But we want to use the

... j0(2)).
> single formula given by the asymptotic expansion for all the subintervals:
>
> XX /*
> XX  * j0(x) = 1/sqrt(pi) * (P(0,x)*cc - Q(0,x)*ss) / sqrt(x)
> XX  * y0(x) = 1/sqrt(pi) * (P(0,x)*ss + Q(0,x)*cc) / sqrt(x)
> XX  */
> XX 	if(ix>0x48000000) z = (invsqrtpi*cc)/sqrt(x);
> XX 	else {
> XX 		u = pzero(x); v = qzero(x);
> XX 		z = invsqrtpi*(u*cc-v*ss)/sqrt(x);
> XX 	}
> XX 	return z;
> XX }
>
> where pzero(s) is nominally 1 - 9/128 s**2 + ... and qzero(s) is nominally
> -1/8 *s + ....

These polynomials are actually only part of the numerators of pzero and
qzero (s = 1/x already gives a non-polynomial, and even more divisions
are used in pzero = 1 + R/S...).

> To work, pzero and qzero must not actually be these nominal functions.
> ...

Bruce

From owner-freebsd-numerics@freebsd.org Wed Sep  5 15:21:12 2018
Date: Wed, 5 Sep 2018 08:21:04 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180905152104.GA26453@troutmask.apl.washington.edu>

On Wed, Sep 05, 2018 at 10:06:29PM +1000, Bruce Evans wrote:
> On Mon, 3 Sep 2018, Steve Kargl wrote:
>
> > Anyone know where the approximations for j0 (and y0) come from?
>
> I think they are ordinary minimax rational approximations for related
> functions.  As you noticed, the asymptotic expansion doesn't work below
> about x = 8 (it is off by about 10% for j0(2).  But we want to use the
> single formula given by the asymptotic expansion for all the subintervals:

I've scoured the literature and web for methods of computing Bessel
functions.  These functions are important to my real work.  I have not
found any paper, webpage, documentation, etc. that describes what "the
related functions" are.
> To work, pzero and qzero must not actually be these nominal functions.

If |x| > p/2*log(2), with p the precision of x, P0(x) and Q0(x) can be
used directly.  It is the 2 <= |x| < p/2*log(2) range that is the issue,
but your last paragraph below may be what I have been missing.

> A non-tricky implementation would use:
>
> 	z = pzero(x) / qzero(x);
>
> where pzero and qzero depend on the subinterval like now and have all the
> other terms folded into them.  We already do this for |x| < 2 (except we
> spell the rational function r/s and don't fold all of the terms directly
> into r and s).  This might be the best implementation, and I would also
> try using smaller intervals with polynomial approximations.
>
> However, we use the asymptotic formula for all cases above x = 2.  It is
> off by at least 10% unless pzero and qzero are adjusted.  But 10% is not
> much, and adjusting only pzero and qzero apparently works to compensate
> for this.  To use the same formula for all cases, the complicated factors
> cc, ss and sqrt(x) must not be folded into pzero and qzero; instead these
> factors supply some of the errors which must be compensated for.
>
> To find the adjustments, I would try applying the same scale factor to
> the nominal pzero and qzero.  E.g., for j0, the error factor is
> invsqrtpi*(u*cc-v*ss)/sqrt(x) / j0(x).  Divide the nominal pzero and
> qzero by this.  Apply normal minimax methods to the modified pzero and
> qzero.

Obviously, I had not thought about scaling P0(x) and Q0(x) with a common
factor to force agreement between j0(x) and its approximation.
-- 
Steve

From owner-freebsd-numerics@freebsd.org Wed Sep  5 18:09:17 2018
Date: Thu, 6 Sep 2018 04:09:05 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180906034525.A2959@besplex.bde.org>

On Wed, 5 Sep 2018, Steve Kargl wrote:

> On Wed, Sep 05, 2018 at 10:06:29PM +1000, Bruce Evans wrote:
>> I think they are ordinary minimax rational approximations for related
>> functions.  As you noticed, the asymptotic expansion doesn't work below
>> about x = 8 (it is off by about 10% for j0(2).  But we want to use the
>> single formula given by the asymptotic expansion for all the subintervals:
>
> I've scoured the literature and web for methods of computing
> Bessel functions.  These functions are important to my real
> work.  I have not found any paper, webpage, documentation, etc.
> that describes what "the related functions" are.

They are just the functions in the asymptotic expansion with errors
corrected as I discussed.

>> ...
>> To find the adjustments, I would try applying the same scale factor to
>> the nominal pzero and qzero.  E.g., for j0, the error factor is
>> invsqrtpi*(u*cc-v*ss)/sqrt(x) / j0(x).  Divide the nominal pzero and
>> qzero by this.  Apply normal minimax methods to the modified pzero and
>> qzero.
>
> Obviously, I had not thought about scaling P0(x) and Q0(x) with a
> common factor to force agreement between j0(x) and its approximation.

I think I understand why we use the asymptotic formula as a base.  If we
choose a minimax function for each interval, this would need to have a
very high order, much the same order as the rational function part of
the asymptotic expansion.  But the latter has a clever grouping of terms
using cos and sin and in the internals of P0 and Q0.  This makes it
faster to calculate and also probably has better accuracy.

Bruce

From owner-freebsd-numerics@freebsd.org Wed Sep  5 19:02:25 2018
Date: Wed, 5 Sep 2018 12:02:23 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180905190223.GA27865@troutmask.apl.washington.edu>

On Thu, Sep 06, 2018 at 04:09:05AM +1000, Bruce Evans wrote:
> On Wed, 5 Sep 2018, Steve Kargl wrote:
>
> > I've scoured the literature and web for methods of computing
> > Bessel functions.  These functions are important to my real
> > work.  I have not found any paper, webpage, documentation, etc.
> > that describes what "the related functions" are.
>
> They are just the functions in the asymptotic expansion with errors
> corrected as I discussed.

And as I noted, there is no documentation stating that the approximations
pzero(x) and qzero(x) aren't approximations for the asymptotic series
P0(x) and Q0(x).  If you are correct, then pzero(x) and qzero(x) are
approximations to fudge*P0(x) and fudge*Q0(x).  What fudge is and how it
is determined is not documented.

-- 
Steve

From owner-freebsd-numerics@freebsd.org Thu Sep  6 16:02:32 2018
Date: Fri, 7 Sep 2018 02:02:20 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180906202658.K978@besplex.bde.org>

On Wed, 5 Sep 2018, Steve Kargl wrote:

> And as I noted, there is no documentation stating that the approximations
> pzero(x) and qzero(x) aren't approximations for the asymptotic series
> P0(x) and Q0(x).  If you are correct, then pzero(x) and qzero(x) are
> approximations to fudge*P0(x) and fudge*Q0(x).  What fudge is and how
> it is determined is not documented.

The documentation (comments in the source code) is indeed deficient.  It
is too verbose about routine general methods but says little about
non-routine things.

I now think that no single fudge factor works for both P0 and Q0.  P0
and Q0 have very different characteristics.  Merely choosing the
truncation point for the asymptotic expansion makes them very different.
One thing that can go wrong is that if you truncate to more than about
5 terms, a zero point for one or both of P0 and Q0 ends up in the
subinterval being handled.  The fudge factor would need to be nearly
infinite to compensate for the zero, and since P0 and Q0 won't have the
zero at the same place, the compensation for one would destroy the other.

The infinities show up in attempts to calculate the fudge factor near
the zero.

When P0 is truncated after the s^14 term, then on [2, 2.857] P0(1/x) is
strictly increasing from -5.61 to 0.95, with the zero in the middle
giving singularities, but pzero is strictly increasing from 1 - 0.013 to
1 - 0.007.  However, when P0 is truncated after the s^6 term, it is
similar to pzero (strictly increasing from 1 - 0.019 to 1 - 0.009).
pzero is apparently based on the latter P0.

To find different fudge factors, I would try starting with a common one
and then iteratively adjust to different ones.

j0 has a zero in the middle of at least the first subinterval, and the
relative error should be minimized for this.  I think the choice of
subintervals is related to this.  With only one zero per subinterval,
the relative error near it can be minimized without messing up the
relative error near other zeros.  The adjustment for this is absolutely
tiny, so it is easier to arrange that it doesn't mess up the absolute
error too much.

Bruce
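A small check of the endpoint behaviour described above, assuming that
P0 here means the Hankel series P(0,x) with the nu = 0 coefficients used
in the sketch earlier in the thread.  Truncating after the s^14 term
gives endpoints close to the -5.61 and 0.95 quoted for [2, 2.857], and
truncating after the s^6 term gives values close to 1 - 0.019 and
1 - 0.009.

#include <stdio.h>

/* Hankel P(0,x) including terms up to s^last, s = 1/x. */
static long double
p0_trunc(long double x, int last)
{
	long double a = 1.0L, sum = 1.0L, xk = 1.0L;
	int k;

	for (k = 1; k <= last; k++) {
		a *= -(long double)((2 * k - 1) * (2 * k - 1)) / (8.0L * k);
		xk *= x;
		if (k % 2 == 0)		/* only even k belongs to P(0,x) */
			sum += (k % 4 == 0 ? a : -a) / xk;
	}
	return (sum);
}

int
main(void)
{
	printf("s^6:  P0(2) = %.4Lf  P0(2.857) = %.4Lf\n",
	    p0_trunc(2.0L, 6), p0_trunc(2.857L, 6));
	printf("s^14: P0(2) = %.4Lf  P0(2.857) = %.4Lf\n",
	    p0_trunc(2.0L, 14), p0_trunc(2.857L, 14));
	return (0);
}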
From owner-freebsd-numerics@freebsd.org Thu Sep  6 22:30:35 2018
Date: Thu, 6 Sep 2018 15:30:26 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180906223026.GA43005@troutmask.apl.washington.edu>

On Fri, Sep 07, 2018 at 02:02:20AM +1000, Bruce Evans wrote:
> To find different fudge factors, I would try starting with a common one
> and then iteratively adjust to different ones.
>
> j0 has a zero in the middle of at least the first subinterval, and the
> relative error should be minimized for this.  I think the choice of
> subintervals is related to this.  With only one zero per subinterval,
> the relative error near it can be minimized without messing up the
> relative error near other zeros.  The adjustment for this is absolutely
> tiny, so it is easier to arrange that it doesn't mess up the absolute
> error too much.

I'm beginning to like my use of the product formula.  It allows one to
remove the zero from an interval.  I did some code spelunking and found

#ifndef lint
static char sccsid[] = "@(#)j0.c	4.2 (Berkeley) 8/21/85";
#endif not lint

Found at http://www.retro11.de/ouxr/43bsd/usr/src/usr.lib/libm/j0.c.html
The algorithm is completely different than what we have in libm.  I won't
have time to look at this until the weekend.
-- 
Steve

From owner-freebsd-numerics@freebsd.org Fri Sep  7 14:07:32 2018
Date: Sat, 8 Sep 2018 00:07:26 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180907230825.A933@besplex.bde.org>

On Thu, 6 Sep 2018, Steve Kargl wrote:

> On Fri, Sep 07, 2018 at 02:02:20AM +1000, Bruce Evans wrote:
>> I now think that no single fudge factor works for both P0 and Q0.  P0
>> and Q0 have very different characteristics.  Merely choosing the
>> ...
>> To find different fudge factors, I would try starting with a common one
>> and then iteratively adjust to different ones.
>>
>> j0 has a zero in the middle of at least the first subinterval, and the
>> relative error should be minimized for this.  I think the choice of
>> subintervals is related to this.  With only one zero per subinterval,
>> the relative error near it can be minimized without messing up the
>> relative error near other zeros.  The adjustment for this is absolutely
>> tiny, so it is easier to arrange that it doesn't mess up the absolute
>> error too much.

In fact, this must more or less work, and the difficulty is to make it
work more than less.  Start with any reasonable formula like (u*cc-v*ss).
(This is not specific to j0, except it gives an example of a nontrivial
formula.)  cc and ss are held fixed.  The best choice of u is not known,
but if we can get fairly close to it (perhaps even off by a factor of 2),
then we can throw away any previous approximation for v and solve for v
(in high precision), then approximate that.  There must be no
singularities for this to work in high precision.
We expect to avoid the singularities by some combination of starting with
u close enough, keeping the interval small, and special handling of zeros.

Here the formula for v is something divided by ss, so we want to avoid
zeros of ss.  ss is sin(x) - cos(x), which is close to sqrt(2) on the
entire interval [2,2.857], so there is no problem with pole singularities.
In the next interval, ss has a zero in the middle of the interval, so we
should switch the roles of u and v so as to divide by cc instead (it is
not so close to -sqrt(2)).

Some magic is needed to make the resulting v easy to approximate.
Hopefully not much more than a good u and a small interval.

Then there is the known zero in the middle of the interval.  v should be
chosen to at least have the correct signs on the FP values on each side
of the zero.

I noticed that this method cannot work for jn, since u and v depend on n,
so there is no way to generate them before runtime.  jn actually uses a
different method and I suspect that it is much more inaccurate.  My tests
don't cover jn since its n arg is an integer and I only have coverage for
2 real args or 1 complex arg.

> I'm beginning to like my use of the product formula.  It
> allows one to remove the zero from an interval.  I did some

I haven't seen that.  Is it for the final evaluation or for determining
the approximation?  I don't know of any better way of determining
approximations near zeros except to multiply by a linear factor to
remove the zero.

It is becoming clearer why a separate interval is needed for each zero.
Polynomials can only give good approximations for one zero at a time
except in special cases.

> code spelunking and found
>
> #ifndef lint
> static char sccsid[] = "@(#)j0.c	4.2 (Berkeley) 8/21/85";
> #endif not lint
>
> Found at http://www.retro11.de/ouxr/43bsd/usr/src/usr.lib/libm/j0.c.html
>
> The algorithm is completely different than what we have in libm.

The one in Net/2 (1992) is the same as now, all the way down to style
bugs like spelling qzero as pzero in 1 place in the comment about qzero.

Bruce
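A sketch of the "solve for v" step described above, under two
assumptions: the nominal u is taken to be the Hankel series P(0,x)
truncated after the s^6 term, and libm's j0() stands in for the
high-precision reference (MPFR or similar would be used in practice).
From j0(x) = invsqrtpi*(u*cc - v*ss)/sqrt(x) it follows that
v(x) = (u(x)*cc - sqrt(pi*x)*j0(x))/ss, which is tabulated on [2, 2.857]
as the function a minimax fit would then approximate.  Link with -lm.

#include <math.h>
#include <stdio.h>

/* Nominal u: Hankel P(0,x) truncated after the s^6 term, s = 1/x. */
static double
u_nominal(double x)
{
	double a = 1.0, sum = 1.0, xk = 1.0;
	int k;

	for (k = 1; k <= 6; k++) {
		a *= -(double)((2 * k - 1) * (2 * k - 1)) / (8.0 * k);
		xk *= x;
		if (k % 2 == 0)
			sum += (k % 4 == 0 ? a : -a) / xk;
	}
	return (sum);
}

int
main(void)
{
	double cc, ss, v, x;

	for (x = 2.0; x <= 2.857; x += 0.05) {
		cc = sin(x) + cos(x);
		ss = sin(x) - cos(x);	/* no zero on [2, 2.857] */
		v = (u_nominal(x) * cc - sqrt(M_PI * x) * j0(x)) / ss;
		printf("x = %.3f  v = %.16e\n", x, v);
	}
	return (0);
}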
From owner-freebsd-numerics@freebsd.org Fri Sep  7 17:54:09 2018
Date: Fri, 7 Sep 2018 10:54:06 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180907175406.GA49351@troutmask.apl.washington.edu>

On Sat, Sep 08, 2018 at 12:07:26AM +1000, Bruce Evans wrote:
> In fact, this must more or less work, and the difficulty is to make it
> work more than less.  Start with any reasonable formula like (u*cc-v*ss).
> (This is not specific to j0, except it gives an example of a nontrivial
> formula.)  cc and ss are held fixed.
> The best choice of u is not known, but if we can get fairly close to it
> (perhaps even off by a factor of 2), then we can throw away any previous
> approximation for v and solve for v (in high precision), then approximate
> that.  There must be no singularities for this to work in high precision.
> We expect to avoid the singularities by some combination of starting with
> u close enough, keeping the interval small, and special handling of zeros.
>
> Here the formula for v is something divided by ss, so we want to avoid
> zeros of ss.  ss is sin(x) - cos(x), which is close to sqrt(2) on the
> entire interval [2,2.857], so there is no problem with pole singularities.
> In the next interval, ss has a zero in the middle of the interval, so we
> should switch the roles of u and v so as to divide by cc instead (it is
> not so close to -sqrt(2)).
>
> Some magic is needed to make the resulting v easy to approximate.
> Hopefully not much more than a good u and a small interval.
>
> Then there is the known zero in the middle of the interval.  v should be
> chosen to at least have the correct signs on the FP values on each side
> of the zero.

I don't see how the above works.  We have

   j0(x) = sqrt(1/pi) * (cc*u - ss*v) / sqrt(x).

If we take u to be accurate (with some amount of error from the asymptotic
series) and assume v as an unknown, we can solve for v to high accuracy.
But, u is still inaccurate.  So, in the interval [2,2.857] we accept
whatever the truncated asymptotic series gives for u and adjust v to
compensate.  In the next interval, [2.857,4.547], we flip the roles of u
and v based on the behavior of cc and ss.  Eventually, we get to the
interval [8,inf).  The divergent asymptotic series in 8 <= x < p/2*log(2)
give u and v, which yield poor values for j0(x).  There are 4 zeros of
j0(x) in this range, and cc and ss have 4 and 3 zeros, respectively.

There are some subtleties in your approach that I don't understand, yet!

I'll probably try a different approach tomorrow.  j0(x), cc, and ss are
smoothly varying functions of x.  u and v appear to be slowly varying
(in small intervals), so we can have

   j0(x)   = sqrt(1/pi) * (cc(x)*u   - ss(x)*v)   / sqrt(x)
   j0(x+e) = sqrt(1/pi) * (cc(x+e)*u - ss(x+e)*v) / sqrt(x+e)

where e is some small perturbation.  Two equations and two unknowns are
easily solved.  Simply need to do this across the interval to generate
u(x) and v(x).

> I noticed that this method cannot work for jn, since u and v depend on n,
> so there is no way to generate them before runtime.  jn actually uses a
> different method and I suspect that it is much more inaccurate.  My tests
> don't cover jn since its n arg is an integer and I only have coverage for
> 2 real args or 1 complex arg.

jn with n > 1 only uses the large argument asymptotic expansion for the
case of very large x where p0(x) = 1 and q0(x) = 0.  This then gives
(from the comment in e_jn.c)

 * 	Jn(x) = cos(x-(2n+1)*pi/4)*sqrt(2/x*pi)

The work is in computing cos().  Elsewhere jn is computed via upward or
downward recursion.

> > I'm beginning to like my use of the product formula.  It
> > allows one to remove the zero from an interval.  I did some
>
> I haven't seen that.  Is it for the final evaluation or for determining
> the approximation?  I don't know of any better way of determining
> approximations near zeros except to multiply by a linear factor to
> remove the zero.
>
> It is becoming clearer why a separate interval is needed for each zero.
> Polynomials can only give good approximations for one zero at a time
> except in special cases.
I haven't sent the code to you as its a work in progress. I did send you some of my testing results for 2 <= x < 4 a couple weeks ago. For j0(x), the infinite product is (https://dlmf.nist.gov/10.21) j0(x) = prod(1 - (x/zj)**2), j = 0, 1, 2, ... with z0 the first zero of j0(x), etc. This can be rewritten and approximated by j0(x) / ((x-z0)*(x+z0)) = R(x) / S(x) While developing the rational approximation, on the LHS we use z0 = z0hi + z0lo. Both z0hi and z0lo have p-bits of precision. The final result is j0(x) = (x - z0hi - z0lo) * (x + z0) * R(x) / S(x) -- Steve From owner-freebsd-numerics@freebsd.org Sat Sep 8 04:49:39 2018 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B0930FE788B for ; Sat, 8 Sep 2018 04:49:39 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 356208592E for ; Sat, 8 Sep 2018 04:49:39 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id w884nV5A052904 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 7 Sep 2018 21:49:31 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id w884nVaO052903; Fri, 7 Sep 2018 21:49:31 -0700 (PDT) (envelope-from sgk) Date: Fri, 7 Sep 2018 21:49:31 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2) Message-ID: <20180908044931.GA52882@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20180903235724.GA95333@troutmask.apl.washington.edu> <20180905201540.D1142@besplex.bde.org> <20180905152104.GA26453@troutmask.apl.washington.edu> <20180906034525.A2959@besplex.bde.org> <20180905190223.GA27865@troutmask.apl.washington.edu> <20180906202658.K978@besplex.bde.org> <20180906223026.GA43005@troutmask.apl.washington.edu> <20180907230825.A933@besplex.bde.org> <20180907175406.GA49351@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180907175406.GA49351@troutmask.apl.washington.edu> User-Agent: Mutt/1.10.1 (2018-07-13) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Sep 2018 04:49:39 -0000 On Fri, Sep 07, 2018 at 10:54:06AM -0700, Steve Kargl wrote: > > I'll probably try a different approach tomorrow. j0(x), > cc, and ss are smoothly varying functions of x. u and v > appear to be slowly varying (in small intervals), so we can > have > > j0(x) = sqrt(1/pi) * (cc(x) *u-ss(x) *v) / sqrt(x). > j0(x+e) = sqrt(1/pi) * (cc(x+e)*u-ss(x+e)*v) / sqrt(x+e). > > where e is some small perturbution. Two equation and two > unknowns is easily solved. Simply need to do this across > the interval to generate u(x) and v(x). > I may have have a better approach! 
j0(x) = sqrt(1/pi) * (cc(x) * u - ss(x) * v) / sqrt(x) y0(x) = sqrt(1/pi) * (cc(x) * u + ss(x) * v) / sqrt(x) j0(x) + y0(x) = 2 * sqrt(1/pi) * cc(x) * u j0(x) - y0(x) = - 2 * sqrt(1/pi) * ss(x) * v So, we have u = (j0 + y0) / (2 * sqrt(1/pi) * cc) v = (y0 - j0) / (s * sqrt(1/pi) * ss) Thus, we can find u = r/s or u = 1 + r/s and v = r/s. -- Steve From owner-freebsd-numerics@freebsd.org Sat Sep 8 12:09:46 2018 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5910FF235D for ; Sat, 8 Sep 2018 12:09:46 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 18910922F4 for ; Sat, 8 Sep 2018 12:09:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 2490D427C94; Sat, 8 Sep 2018 22:09:36 +1000 (AEST) Date: Sat, 8 Sep 2018 22:09:36 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2) In-Reply-To: <20180907175406.GA49351@troutmask.apl.washington.edu> Message-ID: <20180908203432.X883@besplex.bde.org> References: <20180903235724.GA95333@troutmask.apl.washington.edu> <20180905201540.D1142@besplex.bde.org> <20180905152104.GA26453@troutmask.apl.washington.edu> <20180906034525.A2959@besplex.bde.org> <20180905190223.GA27865@troutmask.apl.washington.edu> <20180906202658.K978@besplex.bde.org> <20180906223026.GA43005@troutmask.apl.washington.edu> <20180907230825.A933@besplex.bde.org> <20180907175406.GA49351@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=DZtnkrlW c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=x7bEGLp0ZPQA:10 a=PYnjg3YJAAAA:8 a=PELLQPGwMwU0Yra-3LsA:9 a=JhKNuFBUfYnbQ4la:21 a=W0vPupI8WKlq7FNH:21 a=CjuIK1q_8ugA:10 a=96-UuAdfYG6OSYlHWuPe:22 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Sep 2018 12:09:47 -0000 On Fri, 7 Sep 2018, Steve Kargl wrote: > On Sat, Sep 08, 2018 at 12:07:26AM +1000, Bruce Evans wrote: >> ... >> In fact, this must more or less work, and the difficulty is to make it work >> more than less. Start with any reasonable formula like (u*cc-v*ss). (This >> is not specific to j0, except it gives an example of a nontrivial formula.) >> cc and ss are held fixed. The best choice of u is not known, but we if we >> can get fairly close to it (perhaps even off by a factor of 2), then we can >> throw away any previous approximation for v and solve for v (in high >> precision), then approximate that. There must be no singularities for this >> to work in high precision. We expect to avoid the singularities by some >> combination of starting with u close enough, keeping the interval small, >> and special handling of zeros. >> ... > I don't see how the above works. We have > > j0(x) = sqrt(1/pi) * (cc*u-ss*v) / sqrt(x). 
> > If we take u to be accurate (with some amount of error from the > asymptotic series) and assume v as an unknown, we can solve for > v to high accuracy. But, u is still inaccurate. So, in the u should have a small error, but suppose we replace it by 0. This also loses the ss factor. To make things even worse, replace (-ss) and the sqrt's by 1. Then the approximation reduces to v alone, and we know that we can do this fairly accurately. Actually doing this, I get an absolute accuracy of 53.652 bits for approximating plain j0 on [2, 2.857] using an even polynomial of degree 16. A rational function of order 16 (with degree 8 and 8 polynomials?) should work slightly better. Getting relative accuracy near the zero will lose some of the absolute accuracy, but I expect a degree/order of 2 higher would be enough to compensate. But we keep u and all the other terms in the expectation that they will allow better approximations with lower orders for u. They actually seem to require much lower orders of u to get only slightly worse approximations. pzero(x) = 1 + r(z) / s(z) where z = 1/(x*x) has degree 5 for p and degree 4 for q. This has order 2 * (5 + 4) = 18 in x. qzero(x) = (-0.125 + r(z) / s(z)) / x has degree 1 higher for s and thus order 20 in x for the r/s part, or 19 for qzero. Add another 12 or 13 for the order of cos or sin. Add further inefficiencies and inaccuracies for the divisions in this. I get a large unimprovement in the simple polynomial approximation by approximating j0(x) * sqrt(x) instead of j0(x). This compensates for keeping the 1/sqrt(x) factor from the asymptotic formula. It breaks the evenness of the function being approximated, so using an even polynomial of degree 18 gives an accuracy of only 26.754 bits. For odd polys, it takes degree 13 to get 53+ bits of accuracy. For even polys, it takes degree 38 to get 53+ bits of accuracy. It now seems clear that the asymptotic approximation is just garbage near x = 2. Especially the 1/sqrt(x) factor in it. This destroys the evenness of the function so it needs a much higher order of rational function to compensate. Probably similarly for x <= 8. It is only for x > 8 that we are forced to use the asymptotic expansion with its inconvenient 1/sqrt(x) factor. > interval [2,2.857] we accept whatever the truncated asymptotic > series gives for u and adjust v to compensate. In the next > interval, [2.857,4.547], we flip the roles of u and v based on > the behavior of cc and ss. Eventually, we get to the interval > [8,inf). The divergent asymptotic series in 8 <= x < p/2*log(2) > give u and v, which yield poor values for j0(x). There are > 4 zeros of j0(x) in this range, and cc and ss have 4 and 3 zeros, > respectfully. I didn't notice at first that you were stating my idea in different words here. I think you did see. > There are some subtleties in your approach that I don't > understand, yet! The only subtlety seems to be that we didn't understand the full awfulness of using the asymptotic formula where it doesn't apply. The subtleties mostly go away if we use the simple method of a different poly approx on each subinterval. I don't like rational function approximations (with a nontrivial denominator), and don't fully understand why they are often more accurate. Now I suspect that their main advantage is reducing the number of zeros in the approximating function, and that this advantage is especially small here since we use small intervals to avoid zeros in the function being approximated.
Higher degree polys have a higher number of zeros, and these zeros must all be outside of the interval except ones matching the zeros of the function being approximated. Using a rational function reduces the degree, by using poles instead of zeros. The poles must also be outside of the interval, but this is apparently easier to arrange. > I'll probably try a different approach tomorrow. j0(x), > cc, and ss are smoothly varying functions of x. u and v > appear to be slowly varying (in small intervals), so we can > have > > j0(x) = sqrt(1/pi) * (cc(x) *u-ss(x) *v) / sqrt(x). > j0(x+e) = sqrt(1/pi) * (cc(x+e)*u-ss(x+e)*v) / sqrt(x+e). > > where e is some small perturbution. Two equation and two > unknowns is easily solved. Simply need to do this across > the interval to generate u(x) and v(x). I will reply to your other mail about this. Better throw away the asymptotic expansion except for the interval at infinity, but that would require reorganizing almost everything. I would first try using the simple poly approx in the interval before the infinite one. The asymptotic formula should work better then (but pzero and qzero still have high order, perhaps because the destruction of parity by the 1/sqrt(x), u and v factors is almost as bad as for the interval starting at 2). >> I noticed that this method cannot work for jn, since u and v depend on n >> so there is no way to generate before runtime. jn actually uses a different >> method and I suspect that it is much more inaccurate. My tests don't >> cover jn since its n arg is an integer and I only have coverage for 2 >> real args or 1 complex arg. > > jn with n > 1 only uses the large argument asymptotic expansion for > the case of very large x where p0(x) = 1 and q0(x) = 0. This then > gives (from the comment in e_jn.c) > > * Jn(x) = cos(x-(2n+1)*pi/4)*sqrt(2/x*pi) > > The work is in computing cos(). Elsewhere jn is computed via > upward or downward recursion. I fear that the recursion is very inaccurate. It seems to do about n multiplications, so must have an inaccuracy of >= n/2 ulps. >>> I'm beginning to like my use of the product formula. It >>> allows one to remove the zero from an interval. I did some >> >> I haven't seen that. Is it for the final evaluation or for determining >> the approximation? I don't know of any better way determining approximations >> near zeros except to multiply by a linear factor to remove the zero. >> >> It is becoming clearer why a separate interval is needed for each zero. >> Polynomials can only give good approximations for one zero at a time >> except in special cases. > > I haven't sent the code to you as its a work in progress. I did > send you some of my testing results for 2 <= x < 4 a couple weeks > ago. For j0(x), the infinite product is (https://dlmf.nist.gov/10.21) > > j0(x) = prod(1 - (x/zj)**2), j = 0, 1, 2, ... > > with z0 the first zero of j0(x), etc. This can be rewritten and > approximated by > > j0(x) / ((x-z0)*(x+z0)) = R(x) / S(x) > > While developing the rational approximation, on the LHS we use > z0 = z0hi + z0lo. Both z0hi and z0lo have p-bits of precision. > The final result is > > j0(x) = (x - z0hi - z0lo) * (x + z0) * R(x) / S(x) I see. I think this can only work well near one zero, and reduces to my idea for [2, 2.857]. For simplicity, I used j0(x) = P(x) and omitted the details for handling the zero. Your formula gives a good way to handle the zero. I doubt that the rational function works much better than the polynomial.
It is difficult to evaluate the multiplications accurately and the division gives further inaccuracies. The expression should be written as: lower terms + (x - z0hi - z0lo) * (x + z0) * P0. This still has an inaccuracy of about 1.5 ulps. To do better, everything in this expression would have to be done in extra precision, starting with P0 = P0hi + P0lo, but this is too hard. It obviously can't work on an infinite interval, since there are an infinite number of zeros there, but only a finite number of zeros in R(x) or P(x), and poles in 1/S(x). It's surprising that the asymptotic expansion seems to find all the zeros of j0f with errors of less than 4 million ulps (according to consistency checks between j0f and j0). Bruce From owner-freebsd-numerics@freebsd.org Sat Sep 8 13:52:21 2018 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 75F2AFF474C for ; Sat, 8 Sep 2018 13:52:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id E87CA94F77 for ; Sat, 8 Sep 2018 13:52:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id A6A8A104B9F8; Sat, 8 Sep 2018 23:52:10 +1000 (AEST) Date: Sat, 8 Sep 2018 23:52:10 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2) In-Reply-To: <20180908044931.GA52882@troutmask.apl.washington.edu> Message-ID: <20180908232603.V883@besplex.bde.org> References: <20180903235724.GA95333@troutmask.apl.washington.edu> <20180905201540.D1142@besplex.bde.org> <20180905152104.GA26453@troutmask.apl.washington.edu> <20180906034525.A2959@besplex.bde.org> <20180905190223.GA27865@troutmask.apl.washington.edu> <20180906202658.K978@besplex.bde.org> <20180906223026.GA43005@troutmask.apl.washington.edu> <20180907230825.A933@besplex.bde.org> <20180907175406.GA49351@troutmask.apl.washington.edu> <20180908044931.GA52882@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=x7bEGLp0ZPQA:10 a=HajKcG0aLoVG3s6pNfgA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Sep 2018 13:52:21 -0000 On Fri, 7 Sep 2018, Steve Kargl wrote: > On Fri, Sep 07, 2018 at 10:54:06AM -0700, Steve Kargl wrote: >> >> I'll probably try a different approach tomorrow. j0(x), >> cc, and ss are smoothly varying functions of x. u and v >> appear to be slowly varying (in small intervals), so we can >> have >> >> j0(x) = sqrt(1/pi) * (cc(x) *u-ss(x) *v) / sqrt(x). >> j0(x+e) = sqrt(1/pi) * (cc(x+e)*u-ss(x+e)*v) / sqrt(x+e). >> >> where e is some small perturbution. Two equation and two >> unknowns is easily solved. Simply need to do this across >> the interval to generate u(x) and v(x). > > I may have have a better approach!
> > j0(x) = sqrt(1/pi) * (cc(x) * u - ss(x) * v) / sqrt(x) > y0(x) = sqrt(1/pi) * (cc(x) * u + ss(x) * v) / sqrt(x) > > j0(x) + y0(x) = 2 * sqrt(1/pi) * cc(x) * u > j0(x) - y0(x) = - 2 * sqrt(1/pi) * ss(x) * v Divided by sqrt(x) on the RHS. > So, we have > > u = (j0 + y0) / (2 * sqrt(1/pi) * cc) > v = (y0 - j0) / (s * sqrt(1/pi) * ss) Multiplied by sqrt(x) on the RHS. Also, change s back to 2. > Thus, we can find u = r/s or u = 1 + r/s and v = r/s. Good idea. I checked that the u and v calculated by this work right in pari (they give a precision almost as high as the precision used in pari when they are used to implement j0 using libm's formula in pari) (57 decimal digits with 100-digit precision in pari, even though I used the limit formula A&S 9.1.2 for y0 because I couldn't find y0 in pari (use j_nu and y_nu with nu = 1e-50 to get the 57 decimal digits of accuracy). (I just noticed that libm only supports integral nu, presumably because its recursion method only works for integers. Current versions of pari document not very good accuracy for small args and/or small nu, and I'm using an 11 year old version, but nu = 1e-50 works perfectly.) However, the u and v calculated by this have no resemblance to libm's pzero and qzero (with the latter checked by pari to give a precision of about 16 decimal digits when used to implement j0). pzero / u - 1 is between -0.5 and -3.1. qzero / v - 1 is between -1.2 and -1.0. So libm doesn't seem to be doing anything better than fixing a vague approximation to the correct u and then solving for p. Better use the simple product method. Bruce From owner-freebsd-numerics@freebsd.org Sat Sep 8 16:12:22 2018 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97251FF74B2 for ; Sat, 8 Sep 2018 16:12:22 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 35E9D7056A for ; Sat, 8 Sep 2018 16:12:22 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id w88GCK9X055612 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 8 Sep 2018 09:12:20 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id w88GCK7I055611; Sat, 8 Sep 2018 09:12:20 -0700 (PDT) (envelope-from sgk) Date: Sat, 8 Sep 2018 09:12:20 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2) Message-ID: <20180908161219.GA55313@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20180903235724.GA95333@troutmask.apl.washington.edu> <20180905201540.D1142@besplex.bde.org> <20180905152104.GA26453@troutmask.apl.washington.edu> <20180906034525.A2959@besplex.bde.org> <20180905190223.GA27865@troutmask.apl.washington.edu> <20180906202658.K978@besplex.bde.org> <20180906223026.GA43005@troutmask.apl.washington.edu> <20180907230825.A933@besplex.bde.org> <20180907175406.GA49351@troutmask.apl.washington.edu> <20180908203432.X883@besplex.bde.org> 
MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180908203432.X883@besplex.bde.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Sep 2018 16:12:22 -0000 On Sat, Sep 08, 2018 at 10:09:36PM +1000, Bruce Evans wrote: > On Fri, 7 Sep 2018, Steve Kargl wrote: > > On Sat, Sep 08, 2018 at 12:07:26AM +1000, Bruce Evans wrote: (a lot of detail removed) > > interval [2,2.857] we accept whatever the truncated asymptotic > > series gives for u and adjust v to compensate. In the next > > interval, [2.857,4.547], we flip the roles of u and v based on > > the behavior of cc and ss. Eventually, we get to the interval > > [8,inf). The divergent asymptotic series in 8 <= x < p/2*log(2) > > give u and v, which yield poor values for j0(x). There are > > 4 zeros of j0(x) in this range, and cc and ss have 4 and 3 zeros, > > respectfully. > > I didn't notice at first that you were stating my idea in different > words here. I think you did see. Maybe. Even after 15 or so years of exchanging emails about libm issues, it still takes me a bit to unravel your emails. > > There are some subtleties in your approach that I don't > > understand, yet! > > The only subtlety seems to be that we didn't understand the full > awfulness of using the asympotic formula where it doesn't apply. > The subtleties mostly go away if we use the simple method of a > different poly approx on each subinterval. > > I don't like rational function approximations (with a nontrivial > denominator), and don't fully understand why they are often more > accurate. Now I suspect that their main advantage is reducing the > number of zeros in the approximating function, and that this advantage > is especially small here since we use small intervals to avoid zeros > in the function being approximated. Higher degree polys have a higher > number of zeros, and these zeros must all be outside of the interval > except ones matching the zeros of the function being approximated. > Using a rational function reduces the degree, by using poles instead > of zeros. The poles must also be outside of the interval, but this is > apparently easier to arrange. You're not the only one who prefers polynomial approximations over rational approximations! I went looking for information about when one method should be used instead of the other. I've looked in standard books in numerical analysis (e.g., Ralston and Rabinowitz, Hildebrand, etc). Never found a simple rule. I've come to the conclusion: try to find a minimax polynomial first and fall back to rational approximation if the polynomial is bad. Just found https://dlmf.nist.gov/3.11 You might find the Example interesting. (discussion of j(x) and jx(x+e) removed for brevity). > >> I noticed that this method cannot work for jn, since u and v depend > >> on n so there is no way to generate before runtime. jn actually uses > >> a different method and I suspect that it is much more inaccurate. > >> My tests don't cover jn since its n arg is an integer and I only > >> have coverage for 2 real args or 1 complex arg. > > > > jn with n > 1 only uses the large argument asymptotic expansion for > > the case of very large x where p0(x) = 1 and q0(x) = 0. 
This then > > gives (from the comment in e_jn.c) > > > > * Jn(x) = cos(x-(2n+1)*pi/4)*sqrt(2/x*pi) > > > > The work is in computing cos(). Elsewhere jn is computed via > > upward or downward recursion. > > I fear that the recursion is very inaccurate. It seems to do about n > multiplications, so must have an inaccuracy of >= n/2 ulps. For the regions that I've used recursion (n and x < 20000), recursion works well. I have a paper by Olver that discussions the stability of forward and backwards recursion. You can also find info at https://dlmf.nist.gov/10.74#iv > >>> I'm beginning to like my use of the product formula. It > >>> allows one to remove the zero from an interval. I did some > >> > >> I haven't seen that. Is it for the final evaluation or for > >> determining the approximation? I don't know of any better way > >> determining approximations near zeros except to multiply by a > >> linear factor to remove the zero. > >> > >> It is becoming clearer why a separate interval is needed for each zero. > >> Polynomials can only give good approximations for one zero at a time > >> except in special cases. > > > > I haven't sent the code to you as its a work in progress. I did > > send you some of my testing results for 2 <= x < 4 a couple weeks > > ago. Here's exhaustive testing of j0f(x) on 2 <= x < 4 mobile:kargl[230] ./testf -m 2 -M 4 -E double -D Interval tested: [2.0000e+00,4.0000e+00] ulp <= 0.50: 38.034% ( 3190526) | 38.034% ( 3190526) 0.50 < ulp < 0.55: 3.435% ( 288112) | 41.469% ( 3478638) 0.55 <= ulp < 0.60: 3.343% ( 280402) | 44.811% ( 3759040) 0.60 <= ulp < 0.70: 6.358% ( 533378) | 51.170% ( 4292418) 0.70 <= ulp < 0.80: 5.903% ( 495183) | 57.073% ( 4787601) 0.80 <= ulp < 0.90: 5.460% ( 458035) | 62.533% ( 5245636) 0.90 <= ulp < 1.00: 4.960% ( 416092) | 67.493% ( 5661728) 1.00 <= ulp < 2.00: 26.941% ( 2259993) | 94.434% ( 7921721) 2.00 <= ulp < 3.00: 5.057% ( 424212) | 99.491% ( 8345933) 3.00 <= ulp : 0.509% ( 42675) | 100.000% ( 8388608) Max ulp: 5.246863 at 3.98081970e+00 > > For j0(x), the infinite product is (https://dlmf.nist.gov/10.21) > > > > j0(x) = prod(1 - (x/zj)**2), j = 0, 1, 2, ... > > > > with z0 the first zero of j0(x), etc. This can be rewritten and > > approximated by > > > > j0(x) / ((x-z0)*(x+z0)) = R(x) / S(x) > > > > While developing the rational approximation, on the LHS we use > > z0 = z0hi + z0lo. Both z0hi and z0lo have p-bits of precision. > > The final result is > > > > j0(x) = (x - z0hi - z0lo) * (x + z0) * R(x) / S(x) > > I see. I think this can only work well near one zero, and reduces to > my idea for [2, 2.857]. I tried to do 2 <= x < 8, where we have z0 and z1. I did not like the results. > For simplicity, I used j0(x) = P(x) and omitted > the details for handling the zero. Your formula gives a good way to handle > the zero. I doubt that the rational function works much better than the > polynomial. It is difficult to evaluate the multiplicatons accurately > and the division gives further inaccuracies. The expression should be > written as: > > lower terms + (x - z0hi - z0lo) * (x + z0) * P0. > > This still has an inaccuracy of about 1.5 ulps. To do better, everything > in this expression would have to be done in extra precision, starting with > P0 = P0hi + P0lo, but this is too hard. > > It obviously can't work on an infinite interval, since there are an infinite > number of zeros there, but only a finite number pf zeros in R(x) or P(x), and > poles in 1/S(x). 
It's surprising that the asymptotic expansion seems to > find all the zeros of j0f with errors of less than 4 million ulps (according > to consistency checks between j0f and j0). The problem with using an approximation for the infinite product of j0(x) is that there doesn't seem to be a similar product for y0(x). I haven't comtemplated what to do about y0(x). -- Steve From owner-freebsd-numerics@freebsd.org Sat Sep 8 18:10:09 2018 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0016EFF9E56 for ; Sat, 8 Sep 2018 18:10:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 46DAB73417 for ; Sat, 8 Sep 2018 18:10:07 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id C4051D6C52D; Sun, 9 Sep 2018 04:09:57 +1000 (AEST) Date: Sun, 9 Sep 2018 04:09:56 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2) In-Reply-To: <20180908161219.GA55313@troutmask.apl.washington.edu> Message-ID: <20180909031247.K3906@besplex.bde.org> References: <20180903235724.GA95333@troutmask.apl.washington.edu> <20180905201540.D1142@besplex.bde.org> <20180905152104.GA26453@troutmask.apl.washington.edu> <20180906034525.A2959@besplex.bde.org> <20180905190223.GA27865@troutmask.apl.washington.edu> <20180906202658.K978@besplex.bde.org> <20180906223026.GA43005@troutmask.apl.washington.edu> <20180907230825.A933@besplex.bde.org> <20180907175406.GA49351@troutmask.apl.washington.edu> <20180908203432.X883@besplex.bde.org> <20180908161219.GA55313@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=x7bEGLp0ZPQA:10 a=PYnjg3YJAAAA:8 a=7A0SVispkeHtQis5JpoA:9 a=vuP2INKny8eB9in_:21 a=me3H_tYmzkuvfRLF:21 a=CjuIK1q_8ugA:10 a=96-UuAdfYG6OSYlHWuPe:22 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Sep 2018 18:10:09 -0000 On Sat, 8 Sep 2018, Steve Kargl wrote: > On Sat, Sep 08, 2018 at 10:09:36PM +1000, Bruce Evans wrote: >> [even more details removed] >> I don't like rational function approximations (with a nontrivial >> denominator), and don't fully understand why they are often more >> accurate. Now I suspect that their main advantage is reducing the >> number of zeros in the approximating function, and that this advantage >> is especially small here since we use small intervals to avoid zeros >> in the function being approximated. Higher degree polys have a higher >> number of zeros, and these zeros must all be outside of the interval >> except ones matching the zeros of the function being approximated. >> Using a rational function reduces the degree, by using poles instead >> of zeros. The poles must also be outside of the interval, but this is >> apparently easier to arrange. 
> > You're not the only one who prefers polynomial approximations > over rational approximations! I went looking for information about > when one method should be used instead of the other. I've looked > in standard books in numerical analysis (e.g., Ralston and > Rabinowitz, Hildebrand, etc). Never found a simple rule. I've > come to the conclusion: try to find a minimax polynomial first > and fall back to rational approximation if the polynomial is bad. > > Just found https://dlmf.nist.gov/3.11 > > You might find the Example interesting. Will check. >> ... >> I fear that the recursion is very inaccurate. It seems to do about n >> multiplications, so must have an inaccuracy of >= n/2 ulps. > > For the regions that I've used recursion (n and x < 20000), recursion > works well. I have a paper by Olver that discussions the stability > of forward and backwards recursion. You can also find info at > > https://dlmf.nist.gov/10.74#iv Getting too much to check :-). > ... > Here's exhaustive testing of j0f(x) on 2 <= x < 4 > > mobile:kargl[230] ./testf -m 2 -M 4 -E double -D > Interval tested: [2.0000e+00,4.0000e+00] > ulp <= 0.50: 38.034% ( 3190526) | 38.034% ( 3190526) > 0.50 < ulp < 0.55: 3.435% ( 288112) | 41.469% ( 3478638) > 0.55 <= ulp < 0.60: 3.343% ( 280402) | 44.811% ( 3759040) > 0.60 <= ulp < 0.70: 6.358% ( 533378) | 51.170% ( 4292418) > 0.70 <= ulp < 0.80: 5.903% ( 495183) | 57.073% ( 4787601) > 0.80 <= ulp < 0.90: 5.460% ( 458035) | 62.533% ( 5245636) > 0.90 <= ulp < 1.00: 4.960% ( 416092) | 67.493% ( 5661728) > 1.00 <= ulp < 2.00: 26.941% ( 2259993) | 94.434% ( 7921721) > 2.00 <= ulp < 3.00: 5.057% ( 424212) | 99.491% ( 8345933) > 3.00 <= ulp : 0.509% ( 42675) | 100.000% ( 8388608) > Max ulp: 5.246863 at 3.98081970e+00 I plugged in my polynomomal approximations with your hi+lo decomposition in the second interval [2, 2.857]. This reduces the maximum error for j0f from 896571 ulps on amd64 to 3.8591 ulps on amd64 and 1.7455 ulps on i386. i386's native extra precision is defeated by C bindings and libm not having any special support for arches with extra precision. This mysteriously only increases efficiency by < 10%. The plugin uses a slow Horner's method evaluation but should still do less than 1/4 as many operations. XX diff -u2 e_j0.c~ e_j0.c XX --- e_j0.c~ Thu May 30 18:14:16 2013 XX +++ e_j0.c Sun Sep 9 03:09:25 2018 XX @@ -80,4 +82,25 @@ XX S04 = 1.16614003333790000205e-09; /* 0x3E1408BC, 0xF4745D8F */ XX XX +/* XX + * XXX: the compiler cannot be trusted to calculate z2hi and z2lo here. XX + * gcc on i386 and all compilers on systems with long double == double XX + * will round z2 to double precision, giving a useless decomposition with XX + * z2lo == 0. 
XX + */ XX +#define z2 2.404825557695772768621631879L XX +static const float XX +z2hi = z2, XX +z2lo = z2 - (float)z2, XX +PS2[] = { XX + -1.7291506899370193e-1, /* -0x162214bb2821dd.0p-55 */ XX + 1.3329146107965132e-2, /* 0x1b4c4fb4f04355.0p-59 */ XX + -3.9698769373352208e-4, /* -0x1a045929580f67.0p-64 */ XX + 6.4047723657482110e-6, /* 0x1add126c1f1b21.0p-70 */ XX + -6.5169501554451472e-8, /* -0x117e69feeaa2ee.0p-76 */ XX + 4.5704675158948688e-10, /* 0x1f68739484aa44.0p-84 */ XX + -2.3227382988447959e-12, /* -0x146e57779789e0.0p-91 */ XX + 7.9237009913278836e-15, /* 0x11d7b3dfda9418.0p-99 */ XX +}; XX + XX static const double zero = 0.0; XX XX @@ -106,4 +129,10 @@ XX * y0(x) = 1/sqrt(pi) * (P(0,x)*ss + Q(0,x)*cc) / sqrt(x) XX */ XX + if(ix<0x4006db6d) { XX + z = x*x; XX + return (x-z2hi-z2lo) * (x+z2hi) * XX + (PS2[0]+z*(PS2[1]+z*(PS2[2]+z*(PS2[3]+z*(PS2[4]+ XX + z*(PS2[5]+z*(PS2[6]+z*PS2[7]))))))); XX + } XX if(ix>0x48000000) z = (invsqrtpi*cc)/sqrt(x); XX else { XX diff -u2 e_j0f.c~ e_j0f.c XX --- e_j0f.c~ Thu May 30 18:14:16 2013 XX +++ e_j0f.c Sun Sep 9 03:02:36 2018 XX @@ -37,4 +43,16 @@ XX S04 = 1.1661400734e-09; /* 0x30a045e8 */ XX XX +#define z2 2.4048255576957728 XX +static const float XX +z2hi = z2, XX +z2lo = z2 - (float)z2, XX +PS2[] = { XX + -1.72912285e-1, /* -0xb10feb.0p-26 */ XX + 1.33266943e-2, /* 0xda5835.0p-30 */ XX + -3.96135118e-4, /* -0xcfb05b.0p-35 */ XX + 6.25792291e-6, /* 0xd1fb26.0p-41 */ XX + -5.25139328e-8, /* -0xe18bae.0p-48 */ XX +}; XX + XX static const float zero = 0.0; XX XX @@ -63,5 +81,10 @@ XX * y0(x) = 1/sqrt(pi) * (P(0,x)*ss + Q(0,x)*cc) / sqrt(x) XX */ XX - if(ix>0x80000000) z = (invsqrtpi*cc)/sqrtf(x); XX + if(ix<0x4036d917) { XX + z = x*x; XX + return (x-z2hi-z2lo) * (x+z2hi) * XX + (PS2[0]+z*(PS2[1]+z*(PS2[2]+z*(PS2[3]+ z*PS2[4])))); XX + } XX + if(ix>0x58000000) z = (invsqrtpi*cc)/sqrtf(x); /* |x|>2**49 */ XX else { XX u = pzerof(x); v = qzerof(x); >> ... >> It obviously can't work on an infinite interval, since there are an infinite >> number of zeros there, but only a finite number pf zeros in R(x) or P(x), and >> poles in 1/S(x). It's surprising that the asymptotic expansion seems to >> find all the zeros of j0f with errors of less than 4 million ulps (according >> to consistency checks between j0f and j0). > > The problem with using an approximation for the infinite product of > j0(x) is that there doesn't seem to be a similar product for y0(x). > I haven't comtemplated what to do about y0(x). y0 just needs different minimax approximations. Sharing pzero and qzero might be the main reason for using the asymptotic formula where it doesn't really apply. Oops, y2 is neither odd nor even, so it is hard to approximate. I just found a good formula for this in the source code (I couldn't find one in A&S or W&W). It is y0(x) = 2/pi*(j0(x)*ln(x/2)+even_func(x)). y0(x) is actually undefined for x < 0. You added a comment about this when improving the comments about special cases. We could use a single poly with odd and even terms to approximate y0 directly, and this is probably better than approximating j0, log and even_func specially, since all methods have errors of several ulps unless everything uses extra precision. The advantage of using separate approximations is that the log term requires a high degree poly and/or complications whether it is separate or not, and we already have good versions of it when it is separate (at least for logl and for log and logf on i386). libm does use this formula with its separate approximations for 0 < x < 2. Bruce
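As a footnote to the testing discussed above, an exhaustive float check in the spirit of the quoted testf output can be sketched in a few lines of C. This is only an illustration, not the testf program used in this thread: it compares j0f() against the double j0(), which is itself one of the suspect functions here (a serious test would use a higher-precision reference), and the nextafterf()-based ulp estimate is slightly off at powers of two.

#include <math.h>
#include <stdio.h>

/*
 * Sketch of an exhaustive ulp check of j0f on [2,4).  The double j0()
 * stands in for a high-precision reference; ulp(ref) is taken as the
 * spacing of floats near |ref|.
 */
static double
ulp_error(float got, double ref)
{
	float r = fabsf((float)ref);

	return (fabs((double)got - ref) /
	    ((double)nextafterf(r, INFINITY) - r));
}

int
main(void)
{
	double err, maxerr = 0;
	float worst = 2.0F, x;

	for (x = 2.0F; x < 4.0F; x = nextafterf(x, 4.0F)) {
		err = ulp_error(j0f(x), j0((double)x));
		if (err > maxerr) {
			maxerr = err;
			worst = x;
		}
	}
	printf("Max ulp: %f at %.8e\n", maxerr, (double)worst);
	return (0);
}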