Date:      Mon, 12 Jan 2015 23:40:34 +0300
From:      rozhuk.im@gmail.com
To:        "'John-Mark Gurney'" <jmg@funkthat.com>
Cc:        freebsd-hackers@freebsd.org, freebsd-geom@freebsd.org
Subject:   RE: ChaCha8/12/20 and GEOM ELI tests
Message-ID:  <54b43144.2d08980a.437b.0f8f@mx.google.com>
In-Reply-To: <20150112072249.GM1949@funkthat.com>
References:  <54b33bfa.e31b980a.3e5d.ffffc823@mx.google.com> <20150112072249.GM1949@funkthat.com>

> > ChaCha patch:
> >
> http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch
> 
> What's the difference between CHACHA and XCHACHA?

The same as between Salsa20 and XSalsa20.

XChaCha20 uses the 256-bit key together with the first 128 bits of its
192-bit nonce to compute a subkey (HChaCha20). This subkey and the remaining
64 bits of the nonce are then the parameters of the ChaCha20 function that
actually generates the stream.
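A minimal C sketch of that two-step construction, following the XSalsa20 design in the paper referenced in this message and the original ChaCha layout with a 64-bit block counter and a 64-bit nonce. It is not the code from chacha.patch; all function names (chacha_rounds, hchacha20, xchacha20_block and so on) are hypothetical. HChaCha20 runs the 20 ChaCha rounds over the key and the first 16 nonce bytes, skips the final feed-forward addition, and outputs state words 0..3 and 12..15 as the subkey.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha quarter round on four 32-bit state words. */
#define QR(a, b, c, d) do {				\
	a += b; d ^= a; d = ROTL32(d, 16);		\
	c += d; b ^= c; b = ROTL32(b, 12);		\
	a += b; d ^= a; d = ROTL32(d, 8);		\
	c += d; b ^= c; b = ROTL32(b, 7);		\
} while (0)

static uint32_t load32_le(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store32_le(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Apply 'rounds' rounds (8, 12 or 20) in place: column + diagonal pairs. */
static void chacha_rounds(uint32_t x[16], int rounds)
{
	assert(rounds > 0 && rounds % 2 == 0);
	for (int i = 0; i < rounds; i += 2) {
		QR(x[0], x[4], x[8],  x[12]);
		QR(x[1], x[5], x[9],  x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8],  x[13]);
		QR(x[3], x[4], x[9],  x[14]);
	}
}

/* Block function: out = rounds(in) + in (feed-forward addition). */
static void chacha_block(uint32_t out[16], const uint32_t in[16], int rounds)
{
	memcpy(out, in, 16 * sizeof(uint32_t));
	chacha_rounds(out, rounds);
	for (int i = 0; i < 16; i++)
		out[i] += in[i];
}

/* Words 0..3: "expand 32-byte k" constants; words 4..11: the key. */
static void chacha_init(uint32_t s[16], const uint8_t key[32])
{
	s[0] = 0x61707865; s[1] = 0x3320646e;
	s[2] = 0x79622d32; s[3] = 0x6b206574;
	for (int i = 0; i < 8; i++)
		s[4 + i] = load32_le(key + 4 * i);
}

/* HChaCha20: subkey from key + first 16 nonce bytes, no feed-forward. */
static void hchacha20(uint8_t subkey[32], const uint8_t key[32],
    const uint8_t n16[16])
{
	uint32_t s[16];

	chacha_init(s, key);
	for (int i = 0; i < 4; i++)
		s[12 + i] = load32_le(n16 + 4 * i);
	chacha_rounds(s, 20);
	for (int i = 0; i < 4; i++) {
		store32_le(subkey + 4 * i, s[i]);
		store32_le(subkey + 16 + 4 * i, s[12 + i]);
	}
}

/* One 64-byte XChaCha20 keystream block: ChaCha20 keyed with the subkey,
 * words 12..13 = 64-bit block counter, words 14..15 = nonce bytes 16..23. */
static void xchacha20_block(uint8_t out[64], const uint8_t key[32],
    const uint8_t nonce[24], uint64_t counter)
{
	uint8_t subkey[32];
	uint32_t s[16], ks[16];

	hchacha20(subkey, key, nonce);
	chacha_init(s, subkey);
	s[12] = (uint32_t)counter;
	s[13] = (uint32_t)(counter >> 32);
	s[14] = load32_le(nonce + 16);
	s[15] = load32_le(nonce + 20);
	chacha_block(ks, s, 20);
	for (int i = 0; i < 16; i++)
		store32_le(out + 4 * i, ks[i]);
}
```

The quarter round and the 20-round block function can be checked against the test vectors in RFC 7539 (sections 2.1.1 and 2.3.2); ChaCha8 and ChaCha12 differ only in the rounds argument.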

With XChaCha20's longer nonce, it is safe to generate a random nonce (e.g.
with libsodium's randombytes_buf()) for every message encrypted with the same
key, without having to worry about collisions.
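The collision argument is the birthday bound. For n nonces drawn uniformly at random from b-bit values, a union bound over all pairs gives the following (n = 2^32 messages under one key is only an illustrative choice; 64 bits is the nonce size of plain ChaCha20, 192 bits that of XChaCha20):

```latex
\begin{aligned}
\Pr[\text{collision}] &\le \binom{n}{2}\,2^{-b} = \frac{n(n-1)}{2^{b+1}} < \frac{n^2}{2^{b+1}} \\
b = 64,\ n = 2^{32}:&\quad \Pr[\text{collision}] < 2^{64-65} = 2^{-1} \\
b = 192,\ n = 2^{32}:&\quad \Pr[\text{collision}] < 2^{64-193} = 2^{-129}
\end{aligned}
```

So random 64-bit nonces stop being safe after a few billion messages, while random 192-bit nonces leave a negligible margin.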

More details: http://cr.yp.to/snuffle/xsalsa-20081128.pdf



> Also, where are the man page diffs?  They might have explained the
> difference between the two, and explained why two versions of chacha
> are needed...

There are no man page diffs.
The man pages do not explain the difference between AES-CBC and AES-XTS either...


> Is there a reason you decided to write your own ChaCha implementation
> instead of using one of the standard ones?  Did you run performance
> tests between your implementation and others?

With the reference ChaCha code and the reference (stock FreeBSD) XTS code, 4 KB sector:
ChaCha8-XTS-256   = 199518722 bytes/sec
ChaCha12-XTS-256  = 179029849 bytes/sec
ChaCha20-XTS-256  = 149447317 bytes/sec
XChaCha8-XTS-256  = 195675728 bytes/sec
XChaCha12-XTS-256 = 175790196 bytes/sec
XChaCha20-XTS-256 = 147939263 bytes/sec


This is the reference version adapted for use in /dev/crypto.
chacha_block_unaligned() processes a data block the same way the reference
version does; macros are used for readability.
chacha_block_aligned() does the same work on aligned data: to increase speed,
it processes 4 or 8 bytes at a time instead of one byte at a time.
The data in the context is 8-byte aligned.
To increase security, all data, including temporaries, is kept in the context,
which is filled with zeros when the work is complete.
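A hedged C sketch of the two ideas above: XOR the keystream into the data 8 bytes per step with a byte-wise tail, and keep temporaries in one context that is wiped when the work is done. The struct stream_ctx and the function names are hypothetical, not taken from the patch. Instead of casting the data pointer to uint64_t * after an alignment check, it uses memcpy for the 64-bit loads and stores, which avoids strict-aliasing and alignment traps and is typically compiled to plain word accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context: keystream and scratch data live here, not on the
 * stack, so they can all be wiped in one place. */
struct stream_ctx {
	uint64_t keystream[8];	/* one 64-byte ChaCha block, 8-byte aligned */
	uint64_t tmp[8];	/* scratch words used while computing a block */
};

/* XOR up to 64 bytes of keystream into data: 8 bytes per step, then the tail. */
static void xor_keystream(uint8_t *data, const struct stream_ctx *ctx, size_t len)
{
	const uint8_t *ks = (const uint8_t *)ctx->keystream;
	size_t i = 0;

	assert(len <= sizeof(ctx->keystream));
	for (; i + 8 <= len; i += 8) {
		uint64_t w;

		memcpy(&w, data + i, sizeof(w));	/* works at any alignment */
		w ^= ctx->keystream[i / 8];
		memcpy(data + i, &w, sizeof(w));
	}
	for (; i < len; i++)	/* remaining 0..7 bytes */
		data[i] ^= ks[i];
}

/* Zero the whole context through a volatile pointer so the stores are not
 * optimized away (FreeBSD's explicit_bzero(3) serves the same purpose). */
static void ctx_wipe(struct stream_ctx *ctx)
{
	volatile uint8_t *p = (volatile uint8_t *)ctx;

	for (size_t i = 0; i < sizeof(*ctx); i++)
		p[i] = 0;
}
```

XORing two 64-bit words is the same as XORing their eight bytes pairwise, so the word loop gives identical results on little- and big-endian hosts.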


> > HW: Core 2 Duo E8500, 8 GB DDR2-800.
> > dd if=/dev/zero of=/dev/md0 bs=1m
> > 2148489421 bytes/sec
> >
> >
> > # sector = 512 bytes
> > 3DES-CBC-192      =  20773120 bytes/sec
> > AES-CBC-128       =  85276853 bytes/sec
> > AES-CBC-256       =  68893016 bytes/sec
> > AES-XTS-128       =  68194868 bytes/sec
> > AES-XTS-256       =  56611573 bytes/sec
> > Blowfish-CBC-128  =  11169657 bytes/sec
> > Blowfish-CBC-256  =  11185891 bytes/sec
> > Camellia-CBC-128  =  78077243 bytes/sec
> > Camellia-CBC-256  =  65732219 bytes/sec
> > ChaCha8-XTS-256   = 258042765 bytes/sec
> > ChaCha12-XTS-256  = 223616967 bytes/sec
> > ChaCha20-XTS-256  = 176005366 bytes/sec
> > XChaCha8-XTS-256  = 228292624 bytes/sec
> > XChaCha12-XTS-256 = 195577624 bytes/sec
> > XChaCha20-XTS-256 = 152247267 bytes/sec
> > XChaCha20-XTS-128 = 152717737 bytes/sec  ! a 128-bit key gives the same speed as 256-bit
> >
> >
> > # sector = 4 KB
> > 3DES-CBC-192      =  22018189 bytes/sec
> > AES-CBC-128       = 104097143 bytes/sec
> > AES-CBC-256       =  81983833 bytes/sec
> > AES-XTS-128       =  78559346 bytes/sec
> > AES-XTS-256       =  66047200 bytes/sec
> > Blowfish-CBC-128  =  38635464 bytes/sec
> > Blowfish-CBC-256  =  38810555 bytes/sec
> > Camellia-CBC-128  =  92814510 bytes/sec
> > Camellia-CBC-256  =  75949489 bytes/sec
> > ChaCha8-XTS-256   = 337336982 bytes/sec
> > ChaCha12-XTS-256  = 284740187 bytes/sec
> > ChaCha20-XTS-256  = 217326865 bytes/sec
> > XChaCha8-XTS-256  = 328424551 bytes/sec
> > XChaCha12-XTS-256 = 278579692 bytes/sec
> > XChaCha20-XTS-256 = 211660225 bytes/sec
> >
> > Optimized AES-XTS, speed close to AES-CBC:
> > AES-XTS-128       = 102841051 bytes/sec
> > AES-XTS-256       =  80813644 bytes/sec
> 
> Is this from a different patch or what?  Can you talk more about this?

There is no patch at the moment.
After optimizing ChaCha-XTS, I applied the same optimizations to AES-XTS
and got this result.
Only aes_xts_reinit() and aes_xts_crypt() were changed, plus a slight change
to the aes_xts_ctx structure.

aes_xts_ctx:
u_int8_t tweak[] -> uint64_t tweak[]

aes_xts_reinit() -> same as chacha_xts_reinit()

aes_xts_crypt() -> same as chacha_xts_crypt():
block[] temporary buffer removed;
XOR 8 bytes at once instead of 1 byte;
tweak << 1 (1-bit shift, i.e. multiply by x): 8 bytes at a time instead of 1 byte;
loops unrolled.
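To illustrate the "tweak << 1" item: a C sketch comparing a byte-at-a-time multiply-by-x of the 16-byte XTS tweak with the same step done on two 64-bit words. The names and the XTS_ALPHA/XTS_BLOCKSIZE macros are hypothetical. The tweak is treated as a 128-bit little-endian integer; load64_le()/store64_le() keep the sketch portable, and on a little-endian host they reduce to plain loads and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XTS_BLOCKSIZE	16
#define XTS_ALPHA	0x87	/* low byte of x^128 + x^7 + x^2 + x + 1 */

/* Byte-at-a-time: 16 shifts, carry passed from byte i to byte i + 1. */
static void tweak_mulx_bytes(uint8_t t[XTS_BLOCKSIZE])
{
	unsigned carry = 0;

	for (int i = 0; i < XTS_BLOCKSIZE; i++) {
		unsigned msb = t[i] >> 7;

		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = msb;
	}
	if (carry)
		t[0] ^= XTS_ALPHA;	/* reduce modulo the polynomial */
}

static uint64_t load64_le(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void store64_le(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++, v >>= 8)
		p[i] = (uint8_t)v;
}

/* Word-at-a-time: the same operation with two 64-bit shifts. */
static void tweak_mulx_words(uint8_t t[XTS_BLOCKSIZE])
{
	uint64_t lo = load64_le(t), hi = load64_le(t + 8);
	uint64_t carry = hi >> 63;		/* bit shifted out of x^127 */

	hi = (hi << 1) | (lo >> 63);		/* carry from low into high word */
	lo = (lo << 1) ^ (carry * XTS_ALPHA);	/* branch-free reduction */
	store64_le(t, lo);
	store64_le(t + 8, hi);
}
```

Both functions produce the same bytes for every input; the word version only replaces sixteen byte shifts with two word shifts.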

Final:

struct aes_xts_ctx {
	rijndael_ctx key1;
	rijndael_ctx key2;
	uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
};

void
aes_xts_reinit(caddr_t key, u_int8_t *iv)
{
	struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;

	/*
	 * Prepare tweak as E_k2(IV). IV is specified as LE representation
	 * of a 64-bit block number which we allow to be passed in directly.
	 */
	if (ALIGNED_POINTER(iv, uint64_t)) {
		ctx->tweak[0] = (*((uint64_t*)(void*)iv));
	} else {
		bcopy(iv, ctx->tweak, sizeof(uint64_t));
	}
	/* Convert to LE. */
	ctx->tweak[0] = htole64(ctx->tweak[0]);
	/* Last 64 bits of IV are always zero */
	ctx->tweak[1] = 0;

	rijndael_encrypt(&ctx->key2, (uint8_t*)ctx->tweak, (uint8_t*)ctx->tweak);
}

static void
aes_xts_crypt(struct aes_xts_ctx *ctx, u_int8_t *data, u_int do_encrypt)
{
	size_t i;
	uint64_t crr, tm;

	/* Pre-whitening: XOR the block with the tweak. */
	if (ALIGNED_POINTER(data, uint64_t)) {
		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
	} else {
		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
			data[i] ^= ((uint8_t*)ctx->tweak)[i];
	}

	if (do_encrypt)
		rijndael_encrypt(&ctx->key1, data, data);
	else
		rijndael_decrypt(&ctx->key1, data, data);

	/* Post-whitening: XOR with the same tweak again. */
	if (ALIGNED_POINTER(data, uint64_t)) {
		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
	} else {
		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
			data[i] ^= ((uint8_t*)ctx->tweak)[i];
	}

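	/*
	 * Multiply the tweak by x in GF(2^128) for the next block. The
	 * tweak bytes are little-endian, so convert the words to host order
	 * first and back afterwards (no-ops on little-endian hosts).
	 */
	ctx->tweak[0] = le64toh(ctx->tweak[0]);
	ctx->tweak[1] = le64toh(ctx->tweak[1]);

	crr = (ctx->tweak[0] >> ((sizeof(uint64_t) * 8) - 1));
	ctx->tweak[0] = (ctx->tweak[0] << 1);

	tm = ctx->tweak[1];
	ctx->tweak[1] = ((tm << 1) | crr);
	crr = (tm >> ((sizeof(uint64_t) * 8) - 1));

	if (crr)
		ctx->tweak[0] ^= 0x87; /* x^128 = x^7 + x^2 + x + 1 (reduction). */

	ctx->tweak[0] = htole64(ctx->tweak[0]);
	ctx->tweak[1] = htole64(ctx->tweak[1]);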
}





