Date:      Thu, 30 Mar 2017 17:07:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 218203] Implement AVX2 accelerated Fletcher algorithms
Message-ID:  <bug-218203-8-3otwsGHPqA@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-218203-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-218203-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218203

--- Comment #1 from kungfujesus06@gmail.com ---
If desired, I can post my benchmark code.  It uses more instructions than
the zfsonlinux variant (I used SIMD intrinsics instead of inline assembly).
The extra instructions are mostly just shuffling values between registers.
After the intermediate sum loop completes, I aliased into the __m256i's
instead of doing a vmovdqu into memory for the constant multiplications.  I
suspect the compiler was able to shuffle registers around enough to avoid
some trips to memory.  That said, the Intel whitepaper isn't quite fair to
itself: I think they are comparing their SIMD variant against the best
possible performance without SIMD (which is not the original loop, but the
loop unrolled 4 times).
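
For reference, a minimal sketch of the two non-SIMD baselines mentioned
above, assuming the usual ZFS fletcher_4 definition (32-bit input words,
four 64-bit running sums).  The type and function names are hypothetical,
the unrolling is the plain "repeat the body four times" form, and this is
neither the benchmark code nor necessarily the exact loop the whitepaper
measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four running sums of a Fletcher-4 checksum. */
typedef struct {
    uint64_t a, b, c, d;
} fletcher4_sums;

/* Original-style loop: one 32-bit word per iteration. */
static fletcher4_sums
fletcher4_scalar(const uint32_t *ip, size_t nwords)
{
    fletcher4_sums s = {0, 0, 0, 0};

    for (size_t i = 0; i < nwords; i++) {
        s.a += ip[i];
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return s;
}

/* Same recurrence with the body repeated four times per iteration. */
static fletcher4_sums
fletcher4_unrolled4(const uint32_t *ip, size_t nwords)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;

    for (; i + 4 <= nwords; i += 4) {
        a += ip[i];     b += a; c += b; d += c;
        a += ip[i + 1]; b += a; c += b; d += c;
        a += ip[i + 2]; b += a; c += b; d += c;
        a += ip[i + 3]; b += a; c += b; d += c;
    }
    /* Tail: handle the remaining 0..3 words one at a time. */
    for (; i < nwords; i++) {
        a += ip[i]; b += a; c += b; d += c;
    }

    fletcher4_sums s = {a, b, c, d};
    return s;
}
```

Both functions compute identical sums; the unrolled form only changes how
much work is done per loop iteration, which is why it makes a stronger
baseline than the original loop when comparing against a SIMD variant.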

-- 
You are receiving this mail because:
You are the assignee for the bug.


