Date:      Mon, 11 Mar 2013 08:37:22 +0000
From:      David Chisnall <theraven@FreeBSD.org>
To:        "freebsd-numerics@freebsd.org" <freebsd-numerics@FreeBSD.org>
Subject:   Fwd: [cfe-dev] More on atlas and clang
Message-ID:  <8652E076-8710-4766-8FD0-7774D82A1A0B@FreeBSD.org>
References:  <E49A1576-970A-4613-A09E-28BD3A818225@macports.org>

Recent benchmarks of Atlas built with clang, posted to the clang list,
are attached below.  Note that the -fvectorize and -fslp-vectorize flags
enable the new autovectorisation code in clang, which will be enabled by
default in 3.3.

David

Begin forwarded message:

> Hi there,
>
> I have recently undertaken another experimental build of Atlas
> (http://math-atlas.sourceforge.net; briefly, Atlas provides a highly
> complete BLAS/LAPACK implementation optimized for the native
> architecture of the computer on which it runs) on an AVX machine
> (MacMini 2011), using a snapshot of clang 3.3 (r173279) provided by
> MacPorts (http://macports.org), with the -O3, -fPIC, -fvectorize and
> -fslp-vectorize flags.
>
> I am pleased to say that:
>
> 1. The generated AVX code seems fine: a full test session run under an
>    Atlas-based SciPy didn't raise any errors;
> 2. The performance now seems on par with, or even (sometimes
>    surprisingly) better than, the 'reference GCC', whatever that means
>    (I was unable to get in touch with the Atlas developer at that
>    time), as evidenced by the table below:
>
> Reference clock rate=3292Mhz, new rate=2300Mhz
>  Refrenc : % of clock rate achieved by reference install
>  Present : % of clock rate achieved by present ATLAS install
>
>                   single precision                  double precision
>           ********************************   *******************************
>                 real           complex           real           complex
>           ---------------  ---------------  ---------------  ---------------
> Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
> =========   ======= =======  ======= =======  ======= =======  ======= =======
> kSelMM     1289.9  1407.4   1188.7  1229.8    686.7   826.8    647.4   682.1
> kGenMM      198.2   239.7    198.5   237.8    193.9   231.8    196.0   233.8
> kMM_NT      193.7   266.4    195.2   192.9    184.2   187.4    188.5   197.5
> kMM_TN      198.5   211.1    197.9   226.2    189.8   227.6    189.5   223.2
> BIG_MM     1213.8  1346.7   1241.3  1366.5    652.0   789.5    661.4   795.8
>  kMV_N      224.3   308.1    438.8   617.3    115.9   152.1    205.8   283.5
>  kMV_T      224.6   313.5    460.3   642.9    123.2   159.6    211.3   288.2
>   kGER      148.3   192.4    290.2   381.2     73.3    95.6    144.3   184.3
>
> This is in stark contrast with the previous test, where clang was
> lagging about 20% behind the 'reference implementation' based on GCC
> for lines 2, 3 and 4, where compiler performance matters most.
>
> So, to summarize in two words: kudos, folks!
>
> I will build another version on a Core2Duo machine tonight and see if
> the results are consistent.
>
> Cheers!
> Vincent
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
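
To make the "on par or better" claim in the forwarded message easier to
check, here is a minimal sketch that computes, for each cell of the
table, how far the Present column is above or below the Refrenc column.
The numbers are copied from the table as posted; the names TABLE,
COLUMNS, relative_change and summarize are hypothetical helpers
introduced only for this illustration and are not part of ATLAS or its
tooling.

```python
# (Refrenc, Present) pairs copied from the forwarded table, in the column
# order: single real, single complex, double real, double complex.
TABLE = {
    "kSelMM": [(1289.9, 1407.4), (1188.7, 1229.8), (686.7, 826.8), (647.4, 682.1)],
    "kGenMM": [(198.2, 239.7), (198.5, 237.8), (193.9, 231.8), (196.0, 233.8)],
    "kMM_NT": [(193.7, 266.4), (195.2, 192.9), (184.2, 187.4), (188.5, 197.5)],
    "kMM_TN": [(198.5, 211.1), (197.9, 226.2), (189.8, 227.6), (189.5, 223.2)],
    "BIG_MM": [(1213.8, 1346.7), (1241.3, 1366.5), (652.0, 789.5), (661.4, 795.8)],
    "kMV_N": [(224.3, 308.1), (438.8, 617.3), (115.9, 152.1), (205.8, 283.5)],
    "kMV_T": [(224.6, 313.5), (460.3, 642.9), (123.2, 159.6), (211.3, 288.2)],
    "kGER": [(148.3, 192.4), (290.2, 381.2), (73.3, 95.6), (144.3, 184.3)],
}
COLUMNS = ["single real", "single complex", "double real", "double complex"]


def relative_change(reference, present):
    """Percentage by which `present` is above (+) or below (-) `reference`."""
    return (present / reference - 1.0) * 100.0


def summarize(table):
    """Map each benchmark name to its four relative changes, in COLUMNS order."""
    return {
        name: [relative_change(ref, cur) for ref, cur in pairs]
        for name, pairs in table.items()
    }


if __name__ == "__main__":
    print(" " * 8 + "".join(f"{col:>16s}" for col in COLUMNS))
    for name, changes in summarize(TABLE).items():
        print(f"{name:8s}" + "".join(f"{c:+15.1f}%" for c in changes))
```

Both columns are stated as a percentage of each install's own clock
rate (3292 MHz for the reference, 2300 MHz for the new build), so these
ratios compare per-clock efficiency rather than absolute throughput.
With the posted numbers, the only cell where the clang build falls
below the reference is kMM_NT, single precision complex (192.9 against
195.2).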



