Date:      Wed, 13 May 2015 09:27:05 +0100
From:      David Chisnall <theraven@FreeBSD.org>
To:        John-Mark Gurney <jmg@funkthat.com>
Cc:        Poul-Henning Kamp <phk@phk.freebsd.dk>, Baptiste Daroussin <bapt@freebsd.org>, current@freebsd.org
Subject:   Re: Increase BUFSIZ to 8192
Message-ID:  <A1224018-7540-4C76-91EF-AEA2655E49A8@FreeBSD.org>
In-Reply-To: <20150513080342.GE37063@funkthat.com>
References:  <20150511230635.GA46991@ivaldir.etoilebsd.net> <20150512032307.GP37063@funkthat.com> <14994.1431412293@critter.freebsd.dk> <20150513080342.GE37063@funkthat.com>

On 13 May 2015, at 09:03, John-Mark Gurney <jmg@funkthat.com> wrote:
>
> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +0000:
>> --------
>> In message <20150512032307.GP37063@funkthat.com>, John-Mark Gurney writes:
>>
>>> Also, you'd probably see even better performance by increasing the
>>> size to 64k, [...]
>>
>> easy:
>> 	8K on 32bit
>> 	64k on 64bit
>
> Sounds good to me...  Just for people who care... I did a quick set of
> benchmarks on sha256.. This is using my preliminary patch to use sse4
> optimized sha256...  But this should be the same for others...
>
> The numbers in ministat output are the time in seconds it takes my
> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> numbers are better..  I've processed them into easier to read format:
> BUFSIZ:	145MB/sec
> 8k:	193MB/sec
> 16k:	198MB/sec
> 64k:	202MB/sec
> 128k:	202MB/sec
> -t:	211MB/sec

It looks like most of the benefit is gained at 16KB.  Did you try
running the benchmark with something else running at the same time to
see if there is any advantage in trashing the caches a bit less (simple
case, what happens if you run two instances of the same benchmark at
once)?

I suspect that you’re about right anyway.  I recently did some tests
while playing with JavaScript FFI generation, with a multithreaded
process JavaScript environment calling out to OpenSSL to do SHA
calculations.  Having each of 8 threads read in 128KB chunks gave the
fastest performance (Core i7, 4 cores + hyperthreading), with only a
negligible gain over 64KB.  In all cases, the JavaScript implementation
was significantly faster than the openssl tool, which used 8KB buffers.

David



