Date:      Sat, 20 Oct 2012 08:48:47 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        freebsd-arch@FreeBSD.org
Subject:   Re: using SSE2 in kernel C code (improving AES-NI module)
Message-ID:  <20121020054847.GB35915@deviant.kiev.zoral.com.ua>
In-Reply-To: <20121019233833.GS1967@funkthat.com>
References:  <20121019233833.GS1967@funkthat.com>


On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
> So, the AES-NI module already uses SSE2 instructions, but it does so
> only in assembly.  I have improved the performance of the AES-NI
> module's implementation, but this involves me using additional SSE2
> instructions.
>
> In order to keep my sanity, I did part of the new code in C using
> gcc native types and xmmintrin.h, but we do not support this header in
> the kernel..  This means we cannot simply add the new code to the
> kernel...
>
> Any good ideas on how to integrate this code into the kernel build?
>
> I have used the trick of producing assembly of the C file with gcc -S,
> and then compiling the assembly into the kernel, but I'm not sure if
> that's the best way, and even if it is the best, how I'd do the
> generation as part of the kernel build...  Or would it be ok to commit
> both, and require a regeneration each time the C file is updated?
>
> In my testing in userland w/o the opencrypto framework overhead, the old
> code would only get about ~250MB/sec..  With the new code I get
> ~2200MB/sec...
>
> Sample code:
> static inline __m128i
> xts_crank_lfsr(__m128i inp)
> {
> 	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
> 	__m128i xtweak, ret;
>
> 	/* set up xor mask */
> 	xtweak = _mm_shuffle_epi32(inp, 0x93);
> 	xtweak = _mm_srai_epi32(xtweak, 31);
> 	xtweak &= alphamask;
>
> 	/* next term */
> 	ret = _mm_slli_epi32(inp, 1);
> 	ret ^= xtweak;
>
> 	return ret;
> }
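
For readers following the quoted sample: below is a minimal userland
sketch, not part of the proposed patch, that checks the same tweak
update against a plain byte-wise version. It assumes AES_XTS_ALPHA is
0x87 (the usual XTS reduction constant; the quoted mail does not give
its value) and that the tweak is held in little-endian byte order. The
function names xts_crank_scalar, xts_crank_sse2 and xts_crank_selfcheck
are hypothetical. The SSE2 path uses _mm_and_si128/_mm_xor_si128 rather
than the gcc vector operators, and falls back to the scalar code when
SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>		/* SSE2 integer intrinsics */
#endif

#define	AES_XTS_ALPHA	0x87	/* assumed value, not stated in the mail */

/* Scalar reference: shift the 128-bit tweak left by one bit. */
static void
xts_crank_scalar(uint8_t t[16])
{
	unsigned carry = 0;

	for (int i = 0; i < 16; i++) {
		unsigned next = t[i] >> 7;	/* bit shifted out of this byte */

		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)				/* reduce on overflow of bit 127 */
		t[0] ^= AES_XTS_ALPHA;
}

#if defined(__SSE2__)
/* Same step, following the structure of the quoted xts_crank_lfsr(). */
static void
xts_crank_sse2(uint8_t t[16])
{
	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
	__m128i inp, xtweak, ret;

	inp = _mm_loadu_si128((const __m128i *)t);

	/* rotate lanes so each lane sees the sign bit of the lane below */
	xtweak = _mm_shuffle_epi32(inp, 0x93);
	xtweak = _mm_srai_epi32(xtweak, 31);
	xtweak = _mm_and_si128(xtweak, alphamask);

	/* shift every lane and fold the carries back in */
	ret = _mm_xor_si128(_mm_slli_epi32(inp, 1), xtweak);
	_mm_storeu_si128((__m128i *)t, ret);
}
#endif

/* Run both versions side by side and assert that they stay equal. */
static void
xts_crank_selfcheck(void)
{
	uint8_t a[16], b[16];

	for (int i = 0; i < 16; i++)
		a[i] = b[i] = (uint8_t)(i * 37 + 1);
	for (int round = 0; round < 1000; round++) {
		xts_crank_scalar(a);
#if defined(__SSE2__)
		xts_crank_sse2(b);
#else
		xts_crank_scalar(b);
#endif
		assert(memcmp(a, b, sizeof(a)) == 0);
	}
}
```

Calling xts_crank_selfcheck() from a small test program exercises the
carry between 32-bit lanes as well as the wrap-around from the top lane
into the bottom one.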

The current structure of the aes-ni driver is partly dictated by the
issue you noted. We cannot use SSE intrinsics in the kernel, and
huge inline assembler fragments are hard to write.
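
To illustrate the inline assembler point, here is a minimal userland
sketch, not taken from the aes-ni driver, that XORs one 16-byte block
into another with GCC-style extended asm. The helper name xor_block is
hypothetical. Even for three instructions the register choice and the
clobber list have to be spelled out by hand, which is what makes larger
fragments tedious. In the kernel the SSE register state would also have
to be saved and restored around such code, which this sketch does not
show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: dst ^= src over one 16-byte block. */
static void
xor_block(uint8_t dst[16], const uint8_t src[16])
{
	assert(dst && src);
#if defined(__GNUC__) && defined(__SSE2__)
	__asm__ __volatile__(
	    "movdqu (%0), %%xmm0\n\t"		/* load dst (unaligned) */
	    "movdqu (%1), %%xmm1\n\t"		/* load src (unaligned) */
	    "pxor   %%xmm1, %%xmm0\n\t"		/* xmm0 ^= xmm1 */
	    "movdqu %%xmm0, (%0)\n\t"		/* store result to dst */
	    :
	    : "r" (dst), "r" (src)
	    : "xmm0", "xmm1", "memory");
#else
	/* portable fallback when SSE2 or GCC-style asm is unavailable */
	for (int i = 0; i < 16; i++)
		dst[i] ^= src[i];
#endif
}
```

The same operation written with intrinsics would be a single
_mm_xor_si128() call between a load and a store, which is the
readability gap the thread is about.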

I prefer to have separate, hand-written .S files with the optimized
code. If needed, I can help you with the transition. To rewrite the
code I would need the full patch.



