Date:      Mon, 12 Jan 2015 19:40:13 -0800
From:      Alexey Ivanov <savetherbtz@gmail.com>
To:        rozhuk.im@gmail.com, John-Mark Gurney <jmg@funkthat.com>
Cc:        freebsd-hackers@freebsd.org, freebsd-geom@freebsd.org
Subject:   Re: ChaCha8/12/20 and GEOM ELI tests
Message-ID:  <7A712B22-1151-4A80-970A-36C0C2A63653@gmail.com>
In-Reply-To: <20150112233411.GP1949@funkthat.com>
References:  <54b33bfa.e31b980a.3e5d.ffffc823@mx.google.com> <20150112072249.GM1949@funkthat.com> <54b43144.2d08980a.437b.0f8f@mx.google.com> <20150112233411.GP1949@funkthat.com>

Just curious: why does a stream cipher use a mode of operation (e.g. XTS)?

> On Jan 12, 2015, at 3:34 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
>
> rozhuk.im@gmail.com wrote this message on Mon, Jan 12, 2015 at 23:40 +0300:
>>>> ChaCha patch:
>>>>
>>> http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch
>>>
>>> What's the difference between CHACHA and XCHACHA?
>>
>> Same as between SALSA and XSALSA.
>>
>> XChaCha20 uses a 256-bit key as well as the first 128 bits of the nonce in
>> order to compute a subkey. This subkey, as well as the remaining 64 bits of
>> the nonce, are the parameters of the ChaCha20 function used to actually
>> generate the stream.
>>
>> But with XChaCha20's longer nonce, it is safe to generate nonces using
>> randombytes_buf() for every message encrypted with the same key without
>> having to worry about a collision.
>>
>> More details: http://cr.yp.to/snuffle/xsalsa-20081128.pdf
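
For anyone following along, the subkey step described above could be sketched
roughly as below. This is plain userland C for illustration only, not code
from chacha.patch: the names hchacha20(), qr(), load_le32() and store_le32()
are made up here. As I understand the construction, the state is the usual
ChaCha constants, the 256-bit key, and the first 16 nonce bytes; after 20
rounds, words 0-3 and 12-15 form the subkey, with no final addition.

```c
#include <stdint.h>

/* All names below are hypothetical; none are taken from chacha.patch. */
static uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* ChaCha quarter round on four words of the state. */
void qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rotl32(*d, 16);
	*c += *d; *b ^= *c; *b = rotl32(*b, 12);
	*a += *b; *d ^= *a; *d = rotl32(*d, 8);
	*c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Derive a 32-byte subkey from a 32-byte key and the first 16 nonce bytes. */
void hchacha20(uint8_t subkey[32], const uint8_t key[32],
    const uint8_t nonce16[16])
{
	uint32_t x[16];
	int i;

	/* "expand 32-byte k" */
	x[0] = 0x61707865; x[1] = 0x3320646e;
	x[2] = 0x79622d32; x[3] = 0x6b206574;
	for (i = 0; i < 8; i++)
		x[4 + i] = load_le32(key + 4 * i);
	for (i = 0; i < 4; i++)
		x[12 + i] = load_le32(nonce16 + 4 * i);

	/* 20 rounds = 10 double rounds (column round, then diagonal round). */
	for (i = 0; i < 10; i++) {
		qr(&x[0], &x[4], &x[8],  &x[12]);
		qr(&x[1], &x[5], &x[9],  &x[13]);
		qr(&x[2], &x[6], &x[10], &x[14]);
		qr(&x[3], &x[7], &x[11], &x[15]);
		qr(&x[0], &x[5], &x[10], &x[15]);
		qr(&x[1], &x[6], &x[11], &x[12]);
		qr(&x[2], &x[7], &x[8],  &x[13]);
		qr(&x[3], &x[4], &x[9],  &x[14]);
	}

	/* Subkey = first and last rows of the state, little-endian. */
	for (i = 0; i < 4; i++) {
		store_le32(subkey + 4 * i, x[i]);
		store_le32(subkey + 16 + 4 * i, x[12 + i]);
	}
}
```

The remaining 8 nonce bytes would then go to the ordinary ChaCha20 setup
together with this subkey.
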
>
> Ahh, thanks..
>
>>> Also, where are the man page diffs?  They might have explained the
>>> difference between the two, and explained why two versions of chacha
>>> are needed...
>>
>> No man page diffs.
>
> You need to document the new defines in crypto(9), and document the
> various parameters in crypto(7)...  Yes, not all modes are documented
> in crypto(7), but going forward, at a minimum we need to document new
> additions...
>
> I'll admit I didn't document the other algorithms as I'm not as familiar
> w/ those as the ones that I worked on...
>
>> Man pages do not explain the difference between AES-CBC and AES-XTS...
>
> True, but CBC and XTS (which includes a reference to the standard) are
> a lot more searchable/common knowledge than xchacha..  google thinks you
> mean chacha, and xchacha just turns up a bunch of people on various
> networks... Not until you search on xchacha crypto do you get a relevant
> page...  Also, wikipedia doesn't have an entry for xchacha, nor does
> the chacha (cipher) page list it...  So, when documenting xchacha in
> crypto(7), include a link to the description/standard...
>
>>> Is there a reason you decided to write your own ChaCha implementation
>>> instead of using one of the standard ones?  Did you run performance
>>> tests between your implementation and others?
>>
>> Reference ChaCha and reference (FreeBSD) XTS (4k sector):
>> ChaCha8-XTS-256   = 199518722 bytes/sec
>> ChaCha12-XTS-256  = 179029849 bytes/sec
>> ChaCha20-XTS-256  = 149447317 bytes/sec
>> XChaCha8-XTS-256  = 195675728 bytes/sec
>> XChaCha12-XTS-256 = 175790196 bytes/sec
>> XChaCha20-XTS-256 = 147939263 bytes/sec
>
> So, you're seeing a 33%-50% improvement, good to hear...
>
> Also, do you publish this implementation somewhere?  If so, it'd be
> helpful to include a url to where up to date versions can be obtained...
> If you don't plan on publishing/maintaining it outside of FreeBSD, then
> we need to unifdef out the Windows parts of it for our tree...
>
>> This is the reference version adapted for use in /dev/crypto.
>> chacha_block_unaligneg() - the reference version of data block processing.
>> Macros are used for readability.
>> chacha_block_aligned() - the same, but works on aligned data.
>
> Please use the macro __NO_STRICT_ALIGNMENT to decide if special work
> is necessary to handle the alignment...
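
On the alignment point, one option that sidesteps both the pointer cast and
the per-byte loop is to go through memcpy() with a fixed size, which
compilers typically turn into plain loads and stores on platforms that allow
unaligned access. A sketch of that idea, not from the patch; xor_block16() is
a made-up name and this is userland C rather than kernel code:

```c
#include <stdint.h>
#include <string.h>

/*
 * XOR a 16-byte block with a two-word tweak without assuming anything
 * about the alignment of data. (Hypothetical helper, for illustration.)
 */
void xor_block16(uint8_t *data, const uint64_t tweak[2])
{
	uint64_t w[2];

	memcpy(w, data, sizeof(w));	/* safe for any alignment of data */
	w[0] ^= tweak[0];
	w[1] ^= tweak[1];
	memcpy(data, w, sizeof(w));
}
```

Whether this is as fast as the explicit aligned/unaligned split would need
measuring on the strict-alignment arches.
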
>
> What is the CHACHA_X64 macro for?  If that is to detect LP64 platforms,
> please use the macro __LP64__ to decide this...  Have you done
> performance evaluations on 32bit arches to make sure double rounds aren't
> a benefit there too?
>
> Use the byteorder(9) macros to encode/decode integers instead of rolling
> your own (U8TO32_LITTLE and U32TO8_LITTLE)...  Turns out compilers aren't
> good at optimizing this type of code, and platforms may have assembly
> optimized versions for these...
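
For readers outside the tree: on FreeBSD the relevant byteorder(9) calls here
would be le32dec() and le32enc() from <sys/endian.h>. A portable sketch of
what they compute is below; sketch_le32dec() and sketch_le32enc() are made-up
stand-ins so the example builds anywhere, not the real implementations.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from a byte buffer (any alignment). */
uint32_t sketch_le32dec(const void *buf)
{
	const uint8_t *p = buf;

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value into a byte buffer in little-endian order. */
void sketch_le32enc(void *buf, uint32_t v)
{
	uint8_t *p = buf;

	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}
```
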
>
>> To increase speed, data is processed 4 or 8 bytes at a time instead of
>> one byte at a time.
>> The data in the context is 8-byte aligned.
>> To increase security, all data, including temporaries, is kept in a
>> context that is filled with zeros when the work completes.
>
> Please use the function explicit_bzero that is available for all of
> these instead of creating your own..
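
The reason explicit_bzero() matters is that the compiler may drop a plain
memset() of a context that is never read again. Where explicit_bzero() is not
available, a common fallback is to call memset() through a volatile function
pointer. A sketch under that assumption; struct demo_ctx and wipe_ctx() are
made-up names, and in the tree explicit_bzero() itself is the right call:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical context holding key-dependent state and temporaries. */
struct demo_ctx {
	uint32_t state[16];
	uint8_t  buf[64];
};

/*
 * Calling memset through a volatile pointer keeps the compiler from
 * proving the store is dead and removing it.
 */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

void wipe_ctx(struct demo_ctx *ctx)
{
	memset_v(ctx, 0, sizeof(*ctx));
}
```
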
>
>>>> HW: Core Duo E8500, 8Gb DDR2-800.
>>>> dd if=/dev/zero of=/dev/md0 bs=1m
>>>> 2148489421 bytes/sec
>>>>
>>>>
>>>> # sector = 512b
>>>> 3DES-CBC-192      =  20773120 bytes/sec
>>>> AES-CBC-128       =  85276853 bytes/sec
>>>> AES-CBC-256       =  68893016 bytes/sec
>>>> AES-XTS-128       =  68194868 bytes/sec
>>>> AES-XTS-256       =  56611573 bytes/sec
>>>> Blowfish-CBC-128  =  11169657 bytes/sec
>>>> Blowfish-CBC-256  =  11185891 bytes/sec
>>>> Camellia-CBC-128  =  78077243 bytes/sec
>>>> Camellia-CBC-256  =  65732219 bytes/sec
>>>> ChaCha8-XTS-256   = 258042765 bytes/sec
>>>> ChaCha12-XTS-256  = 223616967 bytes/sec
>>>> ChaCha20-XTS-256  = 176005366 bytes/sec
>>>> XChaCha8-XTS-256  = 228292624 bytes/sec
>>>> XChaCha12-XTS-256 = 195577624 bytes/sec
>>>> XChaCha20-XTS-256 = 152247267 bytes/sec
>>>> XChaCha20-XTS-128 = 152717737 bytes/sec ! a 128-bit key has the same
>>>> speed as a 256-bit key
>>>>
>>>>
>>>> # sector = 4kb
>>>> 3DES-CBC-192      =  22018189 bytes/sec
>>>> AES-CBC-128       = 104097143 bytes/sec
>>>> AES-CBC-256       =  81983833 bytes/sec
>>>> AES-XTS-128       =  78559346 bytes/sec
>>>> AES-XTS-256       =  66047200 bytes/sec
>>>> Blowfish-CBC-128  =  38635464 bytes/sec
>>>> Blowfish-CBC-256  =  38810555 bytes/sec
>>>> Camellia-CBC-128  =  92814510 bytes/sec
>>>> Camellia-CBC-256  =  75949489 bytes/sec
>>>> ChaCha8-XTS-256   = 337336982 bytes/sec
>>>> ChaCha12-XTS-256  = 284740187 bytes/sec
>>>> ChaCha20-XTS-256  = 217326865 bytes/sec
>>>> XChaCha8-XTS-256  = 328424551 bytes/sec
>>>> XChaCha12-XTS-256 = 278579692 bytes/sec
>>>> XChaCha20-XTS-256 = 211660225 bytes/sec
>>>>
>>>> Optimized AES-XTS - speed like AES-CBC:
>>>> AES-XTS-128       = 102841051 bytes/sec
>>>> AES-XTS-256       =  80813644 bytes/sec
>>>
>>> Is this from a different patch or what?  Can you talk more about this?
>>
>> No patch at this moment.
>> After optimizing ChaCha-XTS, I applied the same optimizations to AES-XTS
>> and got this result.
>> The only changes were to aes_xts_reinit() and aes_xts_crypt(), plus a
>> slight change to the structure aes_xts_ctx.
>>
>> aes_xts_ctx:
>> u_int8_t tweak[] -> u_int64_t tweak[]
>>
>> aes_xts_reinit -> same as chacha_xts_reinit()
>>
>> aes_xts_crypt -> same as chacha_xts_crypt():
>> block[] - temp buf removed;
>> xor 1 byte -> xor 8 bytes at once;
>> tweak[i] << 1: rotl 1 bit: 1 byte -> 8 bytes;
>> unroll loops;
>
> Ahh, I thought I had done some similar optimizations, but I only did
> them to the aesni version of the routines...  You should use the macro
> above to decide if things are aligned or not...
>
>>
>> Final:
>>
>> struct aes_xts_ctx {
>> 	rijndael_ctx key1;
>> 	rijndael_ctx key2;
>> 	uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
>> };
>>
>> void
>> aes_xts_reinit(caddr_t key, u_int8_t *iv)
>> {
>> 	struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;
>>
>> 	/*
>> 	 * Prepare tweak as E_k2(IV). IV is specified as LE representation
>> 	 * of a 64-bit block number which we allow to be passed in directly.
>> 	 */
>> 	if (ALIGNED_POINTER(iv, uint64_t)) {
>> 		ctx->tweak[0] = (*((uint64_t*)(void*)iv));
>> 	} else {
>> 		bcopy(iv, ctx->tweak, sizeof(uint64_t));
>> 	}
>> 	/* Convert to LE. */
>> 	ctx->tweak[0] = htole64(ctx->tweak[0]);
>
> Hmm... this line bothers me.. I'll need to spend more time reading up
> to decide if it is buggy or not...  Is ctx->tweak in host order? or LE
> order?  I believe it's supposed to be LE order, as it gets passed
> directly to _encrypt..  I'm also not sure if the original code is BE
> clean, which is part of my problem...
>
>> 	/* Last 64 bits of IV are always zero */
>> 	ctx->tweak[1] = 0;
>>
>> 	rijndael_encrypt(&ctx->key2, (uint8_t*)ctx->tweak,
>> 	    (uint8_t*)ctx->tweak);
>> }
>>
>> static void
>> aes_xts_crypt(struct aes_xts_ctx *ctx, u_int8_t *data, u_int do_encrypt)
>> {
>> 	size_t i;
>> 	uint64_t crr, tm;
>>
>> 	if (ALIGNED_POINTER(data, uint64_t)) {
>> 		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>> 		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>> 	} else {
>> 		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>> 			data[i] ^= ((uint8_t*)ctx->tweak)[i];
>> 	}
>>
>> 	if (do_encrypt)
>> 		rijndael_encrypt(&ctx->key1, data, data);
>> 	else
>> 		rijndael_decrypt(&ctx->key1, data, data);
>>
>> 	if (ALIGNED_POINTER(data, uint64_t)) {
>> 		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>> 		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>> 	} else {
>> 		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>> 			data[i] ^= ((uint8_t*)ctx->tweak)[i];
>> 	}
>>
>> 	/* Exponentiate tweak */
>> 	crr = (ctx->tweak[0] >> ((sizeof(uint64_t) * 8) - 1));
>> 	ctx->tweak[0] = (ctx->tweak[0] << 1);
>>
>> 	tm = ctx->tweak[1];
>> 	ctx->tweak[1] = ((tm << 1) | crr);
>> 	crr = (tm >> ((sizeof(uint64_t) * 8) - 1));
>>
>> 	if (crr)
>> 		ctx->tweak[0] ^= 0x87; /* GF(2^128) generator polynomial. */
>
> Please use the AES_XTS_ALPHA define instead of hardcoding the value..
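
For what it's worth, a byte-wise version of the tweak update that does not
depend on host byte order could look like the sketch below. It treats the
tweak as a 128-bit little-endian value, as in the IV comment above.
xts_mul_alpha() is a made-up name, and the 0x87 constant is written out as a
local define (XTS_ALPHA_SKETCH) only so the example is self-contained; in the
tree it would be AES_XTS_ALPHA.

```c
#include <stdint.h>

#define XTS_BLOCKSIZE_SKETCH	16	/* hypothetical, mirrors AES_XTS_BLOCKSIZE */
#define XTS_ALPHA_SKETCH	0x87	/* hypothetical, mirrors AES_XTS_ALPHA */

/* Multiply the tweak by alpha (x) in GF(2^128), little-endian byte order. */
void xts_mul_alpha(uint8_t tweak[XTS_BLOCKSIZE_SKETCH])
{
	unsigned int carry = 0, next, i;

	for (i = 0; i < XTS_BLOCKSIZE_SKETCH; i++) {
		next = tweak[i] >> 7;		/* bit shifted out of this byte */
		tweak[i] = (uint8_t)((tweak[i] << 1) | carry);
		carry = next;
	}
	/* Bit 127 fell off the end: reduce by the field polynomial. */
	if (carry)
		tweak[0] ^= XTS_ALPHA_SKETCH;
}
```

This is slower than the 64-bit version, so it is more useful as a reference
to test the optimized code against on both LE and BE machines.
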
>
> Thanks.
>
> --
>  John-Mark Gurney				Voice: +1 415 225 5579
>
>     "All that I will do, has been done, All that I have, has not."




