Date:      Wed, 14 Jan 2015 10:21:24 +0300
From:      rozhuk.im@gmail.com
To:        "'John-Mark Gurney'" <jmg@funkthat.com>
Cc:        freebsd-hackers@freebsd.org, freebsd-geom@freebsd.org
Subject:   RE: ChaCha8/12/20 and GEOM ELI tests
Message-ID:  <54b618f6.43ac700a.3509.2eae@mx.google.com>
In-Reply-To: <20150112233411.GP1949@funkthat.com>
References:  <54b33bfa.e31b980a.3e5d.ffffc823@mx.google.com> <20150112072249.GM1949@funkthat.com> <54b43144.2d08980a.437b.0f8f@mx.google.com> <20150112233411.GP1949@funkthat.com>


I've updated the patch.
Removed the XTS mode; added ChaCha/XChaCha to GELI.
http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch


> > > Also, where are the man page diffs?  They might have explained the
> > > difference between the two, and explained why two versions of
> > > chacha are needed...
> >
> > No man page diffs.
> 
> You need to document the new defines in crypto(9), and document the
> various parameters in crypto(7)...  Yes, not all modes are documented
> in crypto(7), but going forward, at a minimum we need to document new
> additions...
> 
> I'll admit I didn't document the other algorithms as I'm not as familiar
> w/ those as the ones that I worked on...

Agreed.


 
> > The man pages do not explain the difference between AES-CBC and AES-XTS...
> 
> True, but CBC and XTS (which includes a reference to the standard) are
> a lot more searchable/common knowledge than xchacha..  google thinks you
> mean chacha, and xchacha just turns up a bunch of people on various
> networks... Not until you search on xchacha crypto do you get a
> relevant page...  Also, wikipedia doesn't have an entry for xchacha,
> nor does the chacha (cipher) page list it...  So, when documenting
> xchacha in crypto(7), include a link to the description/standard...

Agreed.


> > > Is there a reason you decided to write your own ChaCha
> > > implementation instead of using one of the standard ones?  Did you
> > > run performance tests between your implementation and others?
> >
> > Reference ChaCha and reference (FreeBSD) XTS (4k sector):
> > ChaCha8-XTS-256   = 199518722 bytes/sec
> > ChaCha12-XTS-256  = 179029849 bytes/sec
> > ChaCha20-XTS-256  = 149447317 bytes/sec
> > XChaCha8-XTS-256  = 195675728 bytes/sec
> > XChaCha12-XTS-256 = 175790196 bytes/sec
> > XChaCha20-XTS-256 = 147939263 bytes/sec
> 
> So, you're seeing a 33%-50% improvement, good to hear...
> 
> Also, do you publish this implementation somewhere?  If so, it'd be
> helpful to include a url to where up to date versions can be
> obtained...
> If you don't plan on publishing/maintaining it outside of FreeBSD, then
> we need to unifdef out the Windows parts of it for our tree...

It is on my own site (working copy):
http://www.netlab.linkpc.net/download/software/SDK/core/include/chacha.h
It is not specific to the FreeBSD kernel: I also test it under 32-bit
Windows and in FreeBSD user space.
geli (user space) also uses this code to encrypt/decrypt the password/metadata.



> > This is the reference version adapted for use in /dev/crypto.
> > chacha_block_unaligned() - processes a data block the way the
> > reference version does.  Macros are used for readability.
> > chacha_block_aligned() - the same, but works on aligned data.
> 
> Please use the macro __NO_STRICT_ALIGNMENT to decide if special work is
> necessary to handle the alignment...

I already use the ALIGNED_POINTER() macro.

 
> What is the CHACHA_X64 macro for?  If that is to detect LP64 platforms,
> please use the macro __LP64__ to decide this...  Have you done
> performance evaluations on 32bit arches to make sure double rounds
> aren't a benefit there too?

__LP64__ - done.
I ran the self-test on x32: all tests passed, with no speed degradation.


> Use the byteorder(9) macros to encode/decode integers instead of
> rolling your own (U8TO32_LITTLE and U32TO8_LITTLE)...  Turns out
> compilers aren't good at optimizing this type of code, and platforms
> may have assembly optimized versions for these...

1. U8TO32_LITTLE / U32TO8_LITTLE can read/write unaligned data. Can htonl()
handle unaligned input on ARM?
2. On LE systems no conversion is required.
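
To make point 1 concrete, here is a minimal sketch of byte-wise
little-endian load/store that works on any alignment and any host byte
order. The sketch_* names are hypothetical and are not the macros from the
patch. As far as I can tell, byteorder(9) describes le32dec()/le32enc() as
working on byte strings of any alignment, so they may cover the same case
in the kernel; htonl() only converts a value that has already been loaded.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit little-endian value from a byte
 * pointer.  Only single-byte loads are used, so no alignment is needed.
 */
static inline uint32_t
sketch_le32dec(const uint8_t *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/*
 * Hypothetical helper: store a 32-bit value as little-endian bytes,
 * again with single-byte stores only.
 */
static inline void
sketch_le32enc(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Both helpers give the same result on LE and BE hosts, at the cost of four
byte accesses where an aligned LE host could use a single word access.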


 
> > To increase speed, data is processed 4/8 bytes at a time instead of
> > one byte at a time.
> > The data in the context is 8-byte aligned.
> > To increase security, all data, including temporaries, is kept in a
> > context that is filled with zeros when the work completes.
> 
> Please use the function explicit_bzero that is available for all of
> these instead of creating your own..

explicit_bzero() is available only in FreeBSD kernel space.
I use bzero() in chacha_zerokey() / xchacha_zerokey(), like all the other
***_zerokey() functions in this file.
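
For builds outside the kernel, one common fallback is to wipe through a
volatile pointer, so the compiler cannot drop the stores as dead code when
the context is about to be freed. This is only a sketch of that idea: the
name sketch_secure_zero is hypothetical and it is not what the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): zero a buffer through a
 * volatile pointer so that each store must actually be performed.
 */
static void
sketch_secure_zero(void *buf, size_t len)
{
	volatile uint8_t *p = (volatile uint8_t *)buf;

	while (len-- > 0)
		*p++ = 0;
}
```

A ***_zerokey() function could call such a helper on the whole context in
place of bzero(); where explicit_bzero() exists, it serves the same purpose.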




> > Final:
> >
> > struct aes_xts_ctx {
> > 	rijndael_ctx key1;
> > 	rijndael_ctx key2;
> > 	uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
> > };
> >
> > void
> > aes_xts_reinit(caddr_t key, u_int8_t *iv)
> > {
> > 	struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;
> >
> > 	/*
> > 	 * Prepare tweak as E_k2(IV). IV is specified as LE representation
> > 	 * of a 64-bit block number which we allow to be passed in directly.
> > 	 */
> > 	if (ALIGNED_POINTER(iv, uint64_t)) {
> > 		ctx->tweak[0] = (*((uint64_t*)(void*)iv));
> > 	} else {
> > 		bcopy(iv, ctx->tweak, sizeof(uint64_t));
> > 	}
> > 	/* Convert to LE. */
> > 	ctx->tweak[0] = htole64(ctx->tweak[0]);
> 
> Hmm... this line bothers me.. I'll need to spend more time reading up
> to decide if it is buggy or not...  Is ctx->tweak in host order? or LE
> order?  I believe it's supposed to be LE order, as it gets passed
> directly to _encrypt..  I'm also not sure if the original code is BE
> clean, which is part of my problem...
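
One way to frame the question is a small sketch with hypothetical sketch_*
helpers (not code from the tree). A plain byte copy keeps the IV bytes in
their LE layout on any host, while decoding them byte by byte yields the
block number as a host-order integer. htole64() is a no-op on LE hosts and
a byte swap on BE hosts, so applying it after a byte copy changes the
in-memory bytes only on BE. Whether that swap is wanted depends on whether
the cipher call consumes the tweak as bytes or as an integer, which I have
not checked.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 little-endian bytes as a host-order integer. */
static uint64_t
sketch_le64dec(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return (v);
}

/*
 * Hypothetical helper: copy the 8 IV bytes into a uint64_t without
 * reordering them, as the bcopy() branch in the quoted code does.
 */
static uint64_t
sketch_copy_iv(const uint8_t *iv)
{
	uint64_t w;

	memcpy(&w, iv, sizeof(w));
	return (w);
}
```

On any host the bytes of the sketch_copy_iv() result equal the input bytes,
and sketch_le64dec() returns the numeric block number.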

I hope to see an optimized version soon to 10x :)





