Date: Thu, 30 Mar 2017 17:07:42 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 218203] Implement AVX2 accelerated Fletcher algorithms
Message-ID: <bug-218203-8-3otwsGHPqA@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-218203-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-218203-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218203

--- Comment #1 from kungfujesus06@gmail.com ---
If desired, I can post my benchmark code. It uses more instructions than the
zfsonlinux variant (I used SIMD intrinsics instead of inline assembly). The
extra instructions are mostly just shuffling values between registers.

After the intermediate sum loop is completed, I aliased into the __m256i's
instead of doing vmovdqu into memory for the constant multiplications. I
suspect the compiler was able to shuffle registers around enough to avoid some
trips to memory.

That said, the Intel whitepaper isn't quite fair to itself: I think they are
comparing the best possible performance without SIMD (which is not the
original loop, but the loop unrolled 4 times) with their SIMD variant.

-- 
You are receiving this mail because:
You are the assignee for the bug.
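
For readers without the patch at hand, here is a minimal scalar sketch of the
structure being discussed. It is not the benchmark code mentioned above. It
assumes the ZFS-style Fletcher-4 (32-bit input words, four 64-bit running
sums); the names fletcher4_t, fletcher4_scalar and fletcher4_lanes4 are
hypothetical. The second function keeps four independent lanes the way one
256-bit register of 64-bit elements would, then recombines them after the
loop with the constant multiplications referred to in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the four Fletcher-4 running sums. */
typedef struct { uint64_t a, b, c, d; } fletcher4_t;

/* Reference loop: one word per iteration, sums wrap modulo 2^64. */
fletcher4_t fletcher4_scalar(const uint32_t *w, size_t nwords)
{
    fletcher4_t s = {0, 0, 0, 0};
    for (size_t i = 0; i < nwords; i++) {
        s.a += w[i];
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return s;
}

/*
 * Four-lane variant: lane j only sees words j, j+4, j+8, ...
 * Assumption: nwords is a multiple of 4 (no tail handling here).
 */
fletcher4_t fletcher4_lanes4(const uint32_t *w, size_t nwords)
{
    uint64_t a[4] = {0}, b[4] = {0}, c[4] = {0}, d[4] = {0};
    fletcher4_t s;

    assert(nwords % 4 == 0);

    /* Intermediate sum loop: each lane runs its own Fletcher-4. */
    for (size_t i = 0; i < nwords; i += 4) {
        for (size_t j = 0; j < 4; j++) {
            a[j] += w[i + j];
            b[j] += a[j];
            c[j] += b[j];
            d[j] += c[j];
        }
    }

    /* Recombine lanes into the sequential result (unsigned wraparound). */
    s.a = a[0] + a[1] + a[2] + a[3];
    s.b = 0 - a[1] - 2 * a[2] - 3 * a[3]
        + 4 * (b[0] + b[1] + b[2] + b[3]);
    s.c = a[2] + 3 * a[3]
        - 6 * b[0] - 10 * b[1] - 14 * b[2] - 18 * b[3]
        + 16 * (c[0] + c[1] + c[2] + c[3]);
    s.d = 0 - a[3]
        + 4 * b[0] + 10 * b[1] + 20 * b[2] + 34 * b[3]
        - 48 * c[0] - 64 * c[1] - 80 * c[2] - 96 * c[3]
        + 64 * (d[0] + d[1] + d[2] + d[3]);
    return s;
}
```

In an intrinsics version the inner j loop would become a handful of 64-bit
vector adds, and the recombination is the step where the lanes have to be
read out individually, either through memory or by aliasing the vector type.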