Date:      Fri, 28 Feb 2014 10:18:05 -0800
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        arch@freebsd.org
Subject:   Re: small kernel kernel option...
Message-ID:  <20140228181804.GQ47921@funkthat.com>
In-Reply-To: <20140228114224.GE2705@server.rulingia.com>
References:  <20140226214816.GB92037@funkthat.com> <20140228114224.GE2705@server.rulingia.com>

Peter Jeremy wrote this message on Fri, Feb 28, 2014 at 22:42 +1100:
> On 2014-Feb-26 13:48:16 -0800, John-Mark Gurney <jmg@funkthat.com> wrote:
> >I'm about to commit a change to sha256 to speed it up, but the cost
> >of that speed up is an increase in code/data size from just under 1k
> >to almost 9k (as measured on amd64)...  this increase is from unrolling
> >a loop..
> 
> Out of interest, how much of a speedup and what CPU/compiler
> combinations did you test your change on?  I ask because several years
> ago, I tried about 7 different SHA-256 implementations (basically, all
> the C ones I could easily find in FreeBSD and ports I had installed,
> as well as one I tweaked myself) across a range of CPUs and compilers.
> I found that not only was there a very wide variation in speed between
> implementations but that the best on one CPU often ran quite poorly on
> another and unrolling loops didn't necessarily help.

I did not do an exhaustive search...  I only benchmarked the two easy
ones, the one from libmd and the kernel one...  I ran my tests on
an A10-5700@3.4GHz, a Core i7@2GHz (though under MacOSX) and an
Opteron-4228 HE@2.8GHz...  All tests were on amd64...  A few other
people also ran the tests for me, but I don't remember what
processors they ran on...  In all cases we saw an improvement, mostly
around 20%, from using cperciva's libmd version rather than the kernel
version...  The differences were confirmed w/ ministat...

Part of the reason I didn't do an exhaustive search is that many
implementations (OpenSSL, NSS) are very difficult to extract w/o
major work...

If you'd like to run my test suite, you can download it from:
https://www.funkthat.com/~jmg/sha256.test.unr.tgz

Just run the gennumbers script and wait for a while...  It'll compile,
validate and run the tests...

The suite also includes tests of various degrees of loop unrolling,
which exposed some weird timing behavior...  Enough that the only
sensible options are fully unrolled or completely rolled, with no
partial unrolling...
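
For anyone not familiar with the trade-off, here is a minimal sketch of
the two extremes.  It is not the libmd or the kernel code: the names
(compress_rolled, compress_unrolled, sha256_short, RND, RND8) are made
up for illustration, and it only handles messages that fit in a single
block.  The rolled version runs one round body 64 times and shifts the
state each time; the unrolled version expands all 64 rounds in line and
rotates the register names instead, which is where the extra code size
comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define CH(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
#define BSIG0(x)     (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define BSIG1(x)     (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define SSIG0(x)     (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SSIG1(x)     (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))

/* Expand one 64-byte block into the 64-word message schedule. */
static void schedule(uint32_t W[64], const uint8_t block[64])
{
	for (int i = 0; i < 16; i++)
		W[i] = (uint32_t)block[4 * i] << 24 | (uint32_t)block[4 * i + 1] << 16 |
		    (uint32_t)block[4 * i + 2] << 8 | (uint32_t)block[4 * i + 3];
	for (int i = 16; i < 64; i++)
		W[i] = SSIG1(W[i - 2]) + W[i - 7] + SSIG0(W[i - 15]) + W[i - 16];
}

/* Rolled: one round body, executed 64 times, state shifted every round. */
static void compress_rolled(uint32_t H[8], const uint8_t block[64])
{
	uint32_t W[64], s[8];
	schedule(W, block);
	memcpy(s, H, sizeof(s));
	for (int i = 0; i < 64; i++) {
		uint32_t t1 = s[7] + BSIG1(s[4]) + CH(s[4], s[5], s[6]) + K[i] + W[i];
		uint32_t t2 = BSIG0(s[0]) + MAJ(s[0], s[1], s[2]);
		memmove(&s[1], &s[0], 7 * sizeof(s[0]));	/* h=g, g=f, ..., b=a */
		s[0] = t1 + t2;					/* new a */
		s[4] += t1;					/* new e = old d + t1 */
	}
	for (int i = 0; i < 8; i++)
		H[i] += s[i];
}

/* One round with the register names rotated by the caller, so no data moves. */
#define RND(a, b, c, d, e, f, g, h, i) do {				\
	uint32_t t1 = (h) + BSIG1(e) + CH(e, f, g) + K[i] + W[i];	\
	uint32_t t2 = BSIG0(a) + MAJ(a, b, c);				\
	(d) += t1; (h) = t1 + t2;					\
} while (0)
#define RND8(i) do {							\
	RND(s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], (i) + 0);	\
	RND(s[7], s[0], s[1], s[2], s[3], s[4], s[5], s[6], (i) + 1);	\
	RND(s[6], s[7], s[0], s[1], s[2], s[3], s[4], s[5], (i) + 2);	\
	RND(s[5], s[6], s[7], s[0], s[1], s[2], s[3], s[4], (i) + 3);	\
	RND(s[4], s[5], s[6], s[7], s[0], s[1], s[2], s[3], (i) + 4);	\
	RND(s[3], s[4], s[5], s[6], s[7], s[0], s[1], s[2], (i) + 5);	\
	RND(s[2], s[3], s[4], s[5], s[6], s[7], s[0], s[1], (i) + 6);	\
	RND(s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[0], (i) + 7);	\
} while (0)

/* Fully unrolled: all 64 rounds expanded in line, no loop at all. */
static void compress_unrolled(uint32_t H[8], const uint8_t block[64])
{
	uint32_t W[64], s[8];
	schedule(W, block);
	memcpy(s, H, sizeof(s));
	RND8(0); RND8(8); RND8(16); RND8(24); RND8(32); RND8(40); RND8(48); RND8(56);
	for (int i = 0; i < 8; i++)
		H[i] += s[i];
}

/* Hash a message shorter than 56 bytes (single padded block) with either variant. */
static void sha256_short(uint32_t H[8], const char *msg,
    void (*compress)(uint32_t *, const uint8_t *))
{
	static const uint32_t IV[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
	    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
	uint8_t block[64] = { 0 };
	size_t len = strlen(msg);
	assert(len < 56);			/* sketch only: one block */
	memcpy(block, msg, len);
	block[len] = 0x80;			/* padding marker */
	block[62] = (uint8_t)((len * 8) >> 8);	/* bit length, big-endian */
	block[63] = (uint8_t)(len * 8);
	memcpy(H, IV, sizeof(IV));
	compress(H, block);
}
```

Calling sha256_short() once with compress_rolled and once with
compress_unrolled on the same input should give identical state words;
only the generated code size and the timing differ, and the timing is
the part that has to be measured per CPU and compiler.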

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


