Date:      Tue, 23 Oct 2012 00:04:17 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: using SSE2 in kernel C code (improving AES-NI module)
Message-ID:  <20121023070417.GD1563@funkthat.com>
In-Reply-To: <20121021061011.GG35915@deviant.kiev.zoral.com.ua>
References:  <20121019233833.GS1967@funkthat.com> <20121020054847.GB35915@deviant.kiev.zoral.com.ua> <20121020171124.GU1967@funkthat.com> <CAGE5yCoM92rU7Ca7C7_x=3vXW%2BqO9Zc0uQhPURuMbstPDvq9yg@mail.gmail.com> <20121021024726.GA1563@funkthat.com> <20121021061011.GG35915@deviant.kiev.zoral.com.ua>

Konstantin Belousov wrote this message on Sun, Oct 21, 2012 at 09:10 +0300:
> On Sat, Oct 20, 2012 at 07:47:26PM -0700, John-Mark Gurney wrote:
> > Peter Wemm wrote this message on Sat, Oct 20, 2012 at 11:10 -0700:
> > > Or, another option.. do something like genassym or the many other
> > > kernel build tools.  aicasm builds and runs a userland tool to
> > > generate something to build into the kernel.  With sufficient
> > > cross-contamination safeguards I wonder if something similar might be
> > > able to be done here.
> > 
> > Well, looks like I may have this working...  Turns out I can't name the file
> > .s, otherwise config puts it in SFILES, which causes all sorts of problems..
> > So, I went w/ .nos; does anyone else have any suggestions?
> > 
> > how does this look to people:
> > aesni_wrap2.nos                 optional aesni                             \
> >         dependency      "$S/crypto/aesni/aesni_wrap2.c"                    \
> >         compile-with    "${CC} -O3 -fPIC -S -o aesni_wrap2.nos $S/crypto/aesni/aesni_wrap2.c" \
> >         no-obj no-implicit-rule before-depend                              \
> >         clean           "aesni_wrap2.nos"
> > aesni_wrap2.o                   optional aesni                             \
> >         dependency      "aesni_wrap2.nos"                                  \
> >         compile-with    "${NORMAL_S} aesni_wrap2.nos"                      \
> >         no-implicit-rule                                                   \
> >         clean           "aesni_wrap2.o"
> > 
> > We'll have to do something similar in the module Makefile, but that is
> > easier...
> > 
> > Also, I thought we had a better way to note that some devices depend
> > upon others than just throwing a depend error...  If you include aesni
> > w/o crypto, you get an error about a missing cryptodev_if.h...
> > 
> Hm, if such a thing is possible, why do you need to compile through the
> .S at all? All you need is to specify the special compiling flags,
> including -msse and -msse2.

Thanks, I managed to get it down to one rule...

> Note, you should not need -fPIC, at least for amd64. I would suggest using
> -O2, as well as trying to honour the -g settings.

If I don't use -fpic, I get this error when linking the kernel:
aesni_wrap2.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_32 against `.text'

If you can explain to me how to get rid of this error, I'll drop -fpic.
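For context on that error: R_X86_64_32 is a zero-extended 32-bit absolute relocation, and the amd64 kernel is linked in the top 2 GB of the address space, so an absolute reference to kernel .text stored in that form cannot fit. The kernel's own CFLAGS select -mcmodel=kernel, whose sign-extended R_X86_64_32S form can reach that range, and -fpic presumably avoids the problem by making the .eh_frame references PC-relative. The C11 sketch below only illustrates the range check behind "relocation truncated to fit"; the macro names and the example address are hypothetical.

```c
#include <stdint.h>

/*
 * R_X86_64_32 stores the low 32 bits of an absolute address; the
 * linker reports "relocation truncated to fit" unless zero-extending
 * those 32 bits gives back the full 64-bit value.
 */
#define FITS_R_X86_64_32(v)   ((uint64_t)(v) <= UINT32_MAX)

/*
 * R_X86_64_32S is the sign-extended variant: it can only reach the
 * lowest 2 GB and the highest 2 GB of the 64-bit address space.
 */
#define FITS_R_X86_64_32S(v)  ((uint64_t)(v) <= UINT64_C(0x7fffffff) || \
                               (uint64_t)(v) >= UINT64_C(0xffffffff80000000))

/* Hypothetical kernel text address in the top 2 GB, for illustration. */
#define EXAMPLE_KERNEL_TEXT   UINT64_C(0xffffffff80200000)

_Static_assert(!FITS_R_X86_64_32(EXAMPLE_KERNEL_TEXT),
    "a zero-extended 32-bit field cannot hold a top-2GB address");
_Static_assert(FITS_R_X86_64_32S(EXAMPLE_KERNEL_TEXT),
    "a sign-extended 32-bit field can");
```

The checks are compile-time only; nothing here inspects a real object file.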

> Most likely, you can put the ${CFLAGS} on the command line, followed
> by -msse -msse2.

I can't use ${CFLAGS} because it removes access to the xmmintrin.h header
file...  One option seems to be:
-fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG}
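The :C/^-O2$/-O3/ modifier is applied by make to each whitespace-separated word of COPTFLAGS, and the ^ and $ anchors mean only a word that is exactly -O2 gets rewritten; a word such as -O2x or -pipe passes through untouched. A rough C model of that behaviour, using a hypothetical helper name and assuming words are re-joined with single spaces, looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modelling ${COPTFLAGS:C/^-O2$/-O3/}: copy the
 * blank-separated words of `in` into `out` (size `outlen`), rewriting
 * any word that is exactly "-O2" to "-O3" and re-joining the words
 * with single spaces.  Returns 0 on success, -1 if `out` is too small
 * (the contents of `out` are unspecified on failure).
 */
static int bump_o2_to_o3(const char *in, char *out, size_t outlen)
{
    const char *p = in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    for (;;) {
        /* Skip the blanks that separate words. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0')
            break;

        /* Find the end of the current word. */
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        size_t len = (size_t)(p - start);

        /* ^-O2$ is anchored, so only a whole word equal to -O2 matches. */
        const char *word = start;
        if (len == 3 && strncmp(start, "-O2", 3) == 0)
            word = "-O3";

        /* Need room for a separator, the word, and the final NUL. */
        size_t sep = used > 0 ? 1 : 0;
        if (used + sep + len + 1 > outlen)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, word, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For example, "-O2 -pipe" becomes "-O3 -pipe", while "-O2x -O2" becomes "-O2x -O3".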

In my testing, -O2 is significantly slower; -O3 gives roughly 28% more throughput, hence the bump to -O3:
x O2.txt
+ O3.txt
    N           Min           Max        Median           Avg        Stddev
x  20     1741.3491      1754.987     1752.9267     1751.5602     3.5616947
+  20      2223.217     2244.4501     2242.7028     2240.3183     5.7020691
Difference at 95.0% confidence
        488.758 +/- 3.04271
        27.9042% +/- 0.173715%
        (Student's t, pooled s = 4.75391)

Those are MB/sec...
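The summary lines can be recomputed from the table alone. The sketch below assumes the pooled-variance two-sample Student's t test that the output names: s_p = sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)), with a confidence half-width of t * s_p * sqrt(1/n1 + 1/n2). The struct and function names are illustrative, not ministat's own code, and the caller supplies t_crit from a t table (about 2.024 for 95% confidence and 20 + 20 - 2 = 38 degrees of freedom).

```c
#include <assert.h>

/* Summary of one ministat data set (throughput in MB/sec). */
struct sample {
    int n;          /* number of runs ("N" column) */
    double mean;    /* "Avg" column */
    double stddev;  /* "Stddev" column */
};

struct comparison {
    double pooled_s;   /* pooled standard deviation */
    double diff;       /* mean(b) - mean(a) */
    double half_width; /* +/- half-width of the confidence interval */
    double pct;        /* diff as a percentage of mean(a) */
    double pct_half;   /* half_width as a percentage of mean(a) */
};

/* Newton-Raphson square root, so the sketch links without -lm. */
static double nr_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;

    assert(x >= 0.0);
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/*
 * Pooled-variance Student's t comparison of b against a.  t_crit is
 * the two-sided critical value for a.n + b.n - 2 degrees of freedom.
 */
static struct comparison compare(struct sample a, struct sample b,
    double t_crit)
{
    struct comparison c;
    int df = a.n + b.n - 2;

    assert(a.n > 1 && b.n > 1);
    double var = ((a.n - 1) * a.stddev * a.stddev +
        (b.n - 1) * b.stddev * b.stddev) / df;
    c.pooled_s = nr_sqrt(var);
    c.diff = b.mean - a.mean;
    c.half_width = t_crit * c.pooled_s * nr_sqrt(1.0 / a.n + 1.0 / b.n);
    c.pct = 100.0 * c.diff / a.mean;
    c.pct_half = 100.0 * c.half_width / a.mean;
    return c;
}
```

Feeding it {20, 1751.5602, 3.5616947} for O2, {20, 2240.3183, 5.7020691} for O3 and t_crit = 2.024 gives a pooled s of about 4.75391, a difference of about 488.758 +/- 3.0427 MB/sec, and 27.904% +/- 0.1737%, matching the output above.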

Index: files.amd64
===================================================================
--- files.amd64	(revision 241041)
+++ files.amd64	(working copy)
@@ -137,6 +137,11 @@
 crypto/aesni/aeskeys_amd64.S	optional aesni
 crypto/aesni/aesni.c		optional aesni
 crypto/aesni/aesni_wrap.c	optional aesni
+aesni_wrap2.o			optional aesni				   \
+	dependency	"$S/crypto/aesni/aesni_wrap2.c"			   \
+	compile-with    "${CC} -c -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG} -o aesni_wrap2.o $S/crypto/aesni/aesni_wrap2.c" \
+	no-implicit-rule						   \
+	clean           "aesni_wrap2.o"
 crypto/blowfish/bf_enc.c	optional	crypto | ipsec 
 crypto/des/des_enc.c		optional	crypto | ipsec | netsmb
 crypto/via/padlock.c		optional	padlock


I still need to fix up i386, and will post a full patch covering both
arches for review before committing...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


