Date:      Tue, 28 Feb 2017 13:27:52 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Conrad Meyer <cem@freebsd.org>
Cc:        Bruce Evans <brde@optusnet.com.au>,  Konstantin Belousov <kostikbel@gmail.com>,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r313006 - in head: sys/conf sys/libkern sys/libkern/x86 sys/sys tests/sys/kern
Message-ID:  <20170228121335.Q2733@besplex.bde.org>
In-Reply-To: <CAG6CVpV8fqMd82hjYoyDfO3f5P-x6%2B0OJDoQHtqXqY_tfWtZsA@mail.gmail.com>
References:  <201701310326.v0V3QW30024375@repo.freebsd.org> <20170202184819.GP2092@kib.kiev.ua> <20170203062806.A2690@besplex.bde.org> <CAG6CVpV8fqMd82hjYoyDfO3f5P-x6%2B0OJDoQHtqXqY_tfWtZsA@mail.gmail.com>


On Mon, 27 Feb 2017, Conrad Meyer wrote:

> On Thu, Feb 2, 2017 at 12:29 PM, Bruce Evans <brde@optusnet.com.au> wrote:
>> I've almost finished fixing and optimizing this.  I didn't manage to fix
>> all the compiler pessimizations, but the result is within 5% of optimal
>> for buffers larger than a few K.
>
> Did you ever get to a final patch that you are satisfied with?  It
> would be good to get this improvement into the tree.

I'm happy with this version (attached and partly enclosed).  You need to
test it in the kernel and commit it (I only did simple correctness tests
in userland).
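For reference, the sort of userland correctness check meant above can be
done against a plain bitwise CRC-32C.  This sketch is mine, not part of
the patch; the function name crc32c_ref is made up:

```c
#include <stddef.h>
#include <stdint.h>

#define POLY	0x82f63b78	/* CRC-32C (iSCSI) polynomial, reversed bit order */

/*
 * Slow but obviously correct CRC-32C, suitable as a userland reference
 * to compare the sse42 version against.  Processes one bit at a time.
 */
static uint32_t
crc32c_ref(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	int k;

	crc = ~crc;
	while (len-- != 0) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (POLY & -(crc & 1));
	}
	return (~crc);
}
```

The standard check value for CRC-32C is 0xe3069283 for the nine ASCII
bytes "123456789", which pins down the polynomial, reflection and the
pre/post-inversion all at once.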

X Index: conf/files.amd64
X ===================================================================
X --- conf/files.amd64	(revision 314363)
X +++ conf/files.amd64	(working copy)
X @@ -545,6 +545,9 @@
X  isa/vga_isa.c			optional	vga
X  kern/kern_clocksource.c		standard
X  kern/link_elf_obj.c		standard
X +libkern/x86/crc32_sse42.c	standard
X +libkern/memmove.c		standard
X +libkern/memset.c		standard

Also fix some nearby disorder.

X ...
X Index: libkern/x86/crc32_sse42.c
X ===================================================================
X --- libkern/x86/crc32_sse42.c	(revision 314363)
X +++ libkern/x86/crc32_sse42.c	(working copy)
X @@ -31,15 +31,41 @@
X   */
X  #ifdef USERSPACE_TESTING
X  #include <stdint.h>
X +#include <stdlib.h>
X  #else
X  #include <sys/param.h>
X +#include <sys/systm.h>
X  #include <sys/kernel.h>
X -#include <sys/libkern.h>
X -#include <sys/systm.h>
X  #endif

Also fix minor #include errors.

X 
X -#include <nmmintrin.h>
X +static __inline uint32_t
X +_mm_crc32_u8(uint32_t x, uint8_t y)
X +{
X +	/*
X +	 * clang (at least 3.9.[0-1]) pessimizes "rm" (y) and "m" (y)
X +	 * significantly and "r" (y) a lot by copying y to a different
X +	 * local variable (on the stack or in a register), so only use
X +	 * the latter.  This costs a register and an instruction but
X +	 * not a uop.
X +	 */
X +	__asm("crc32b %1,%0" : "+r" (x) : "r" (y));
X +	return (x);
X +}

Using intrinsics avoids the silly copying via the stack, and allows more
unrolling.  Old gcc does more unrolling with just asms.  Unrolling is
almost useless (some details below).

X @@ -47,12 +73,14 @@
X   * Block sizes for three-way parallel crc computation.  LONG and SHORT must
X   * both be powers of two.
X   */
X -#define LONG	8192
X -#define SHORT	256
X +#define LONG	128
X +#define SHORT	64

These are aggressively low.

Note that small buffers aren't handled very well.  SHORT = 64 means that
a buffer of size 3 * 64 = 192 is handled entirely by the "SHORT" loop.
192 is not very small, but for anything smaller the overheads for
adjustment at the end of the loop are too large for the "SHORT" loop
to be worth doing.  Almost any value of LONG larger than 128 works OK
now, but if LONG is large then it gives too much work to the "SHORT"
loop, since normal buffer sizes are not a multiple of 3.  E.g., with
the old LONG and SHORT, a buffer of size 128K was decomposed as 5 * 24K
(done almost optimally by the "LONG" loop) + 10 * 768 (done a bit less
optimally by the "SHORT" loop) + 64 * 8 (done pessimally).
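The decomposition arithmetic can be checked mechanically.  A small
sketch (mine, not from the patch; the names are made up) that splits a
byte count the same way the loop bounds do, without computing any crc:

```c
#include <stddef.h>

/*
 * Count how many iterations each stage of the crc routine would run for
 * a given buffer size, LONG/SHORT block sizes and native word size.
 * This mirrors only the loop bounds, not the crc computation itself.
 */
struct split {
	size_t	nlong;	/* iterations of the "LONG" loop, LONG*3 bytes each */
	size_t	nshort;	/* iterations of the "SHORT" loop, SHORT*3 bytes each */
	size_t	nword;	/* word-sized steps */
	size_t	nbyte;	/* trailing byte steps */
};

static struct split
decompose(size_t len, size_t lng, size_t shrt, size_t align)
{
	struct split s = { 0, 0, 0, 0 };

	for (; len >= lng * 3; len -= lng * 3)
		s.nlong++;
	for (; len >= shrt * 3; len -= shrt * 3)
		s.nshort++;
	s.nword = len / align;
	s.nbyte = len % align;
	return (s);
}
```

With the old LONG = 8192 and SHORT = 256 this reproduces the 128K
decomposition above, and with the new LONG = 128 and SHORT = 64 it shows
a 192-byte buffer going entirely to the "SHORT" loop.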

I didn't get around to ifdefing this for i386.  On i386, the loops take
twice as many crc32 instructions for a given byte count, so the timing
is satisfied by a byte count half as large; SHORT and LONG can therefore
be reduced by a factor of 2 to give faster handling for small buffers
without significantly affecting the speed for large buffers.

X 
X  /* Tables for hardware crc that shift a crc by LONG and SHORT zeros. */
X  static uint32_t crc32c_long[4][256];
X +static uint32_t crc32c_2long[4][256];
X  static uint32_t crc32c_short[4][256];
X +static uint32_t crc32c_2short[4][256];

I didn't get around to updating the comment.  2long shifts by 2*LONG zeros,
etc.

Shifts by 3N are done by adding shifts by 1N and 2N in parallel.  I couldn't
get the direct 3N shift to run any faster.
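Both properties relied on here -- that a 1N shift composed with a 2N
shift equals a 3N shift, and that shifted terms can be added (xored)
because the shift is linear over GF(2) -- can be checked with a slow
bitwise model of the zero-shift.  This sketch is mine and uses no
tables; it does per bit what the crc32c_long/crc32c_2long tables do in
four byte-indexed lookups:

```c
#include <stddef.h>
#include <stdint.h>

#define POLY	0x82f63b78	/* CRC-32C polynomial, reversed bit order */

/*
 * Advance a raw crc register past n zero bytes, one bit at a time.
 * Each bit step is multiplication by x modulo the crc polynomial in the
 * reflected representation, which is a linear map over GF(2).
 */
static uint32_t
shift_zeros(uint32_t crc, size_t n)
{
	int k;

	while (n-- != 0)
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (POLY & -(crc & 1));
	return (crc);
}
```

Linearity means shift_zeros(a ^ b, n) == shift_zeros(a, n) ^
shift_zeros(b, n), and composition means a 3N shift can be built from a
1N shift followed by a 2N shift.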

X @@ -190,7 +220,11 @@
X  	const size_t align = 4;
X  #endif
X  	const unsigned char *next, *end;
X -	uint64_t crc0, crc1, crc2;      /* need to be 64 bits for crc32q */
X +#ifdef __amd64__
X +	uint64_t crc0, crc1, crc2;
X +#else
X +	uint32_t crc0, crc1, crc2;
X +#endif
X 
X  	next = buf;
X  	crc0 = crc;

64 bits of course isn't needed for i386.  It isn't needed for amd64 either.
I think the crc32 instruction zeros the top 32 bits so they can be ignored.
However, when I modified the asm to return 32 bits to tell the compiler about
this (which the intrinsic wouldn't be able to do) and used 32 bits here,
this was just slightly slower.

For some intermediate crc calculations, only 32 bits are used, and the
compiler can see this.  clang on amd64 optimizes this better than gcc
when all the intermediate crc variables are declared as 64 bits.
gcc generates worse code when some of the intermediates are declared as
32 bits.  So keep using 64 bits on amd64 here.

X @@ -202,6 +236,7 @@
X  		len--;
X  	}
X 
X +#if LONG > SHORT
X  	/*
X  	 * Compute the crc on sets of LONG*3 bytes, executing three independent
X  	 * crc instructions, each on LONG bytes -- this is optimized for the

LONG = SHORT = 64 works OK on Haswell, but I suspect that slower CPUs
benefit from larger values and I want to keep SHORT as small as possible
for the fastest CPUs.

X @@ -209,6 +244,7 @@
X  	 * have a throughput of one crc per cycle, but a latency of three
X  	 * cycles.
X  	 */
X +	crc = 0;
X  	while (len >= LONG * 3) {
X  		crc1 = 0;
X  		crc2 = 0;
X @@ -229,16 +265,64 @@
X  #endif
X  			next += align;
X  		} while (next < end);
X -		crc0 = crc32c_shift(crc32c_long, crc0) ^ crc1;
X -		crc0 = crc32c_shift(crc32c_long, crc0) ^ crc2;
X +		/*-
X +		 * Update the crc.  Try to do it in parallel with the inner
X +		 * loop.  'crc' is used to accumulate crc0 and crc1
X +		 * produced by the inner loop so that the next iteration
X +		 * of the loop doesn't depend on anything except crc2.
X +		 *
X +		 * The full expression for the update is:
X +		 *     crc = S*S*S*crc + S*S*crc0 + S*crc1
X +		 * where the terms are polynomials modulo the CRC polynomial.
X +		 * We regroup this subtly as:
X +		 *     crc = S*S * (S*crc + crc0) + S*crc1.
X +		 * This has an extra dependency which reduces possible
X +		 * parallelism for the expression, but it turns out to be
X +		 * best to intentionally delay evaluation of this expression
X +		 * so that it competes less with the inner loop.
X +		 *
X +		 * We also intentionally reduce parallelism by feeding back
X +		 * crc2 to the inner loop as crc0 instead of accumulating
X +		 * it in crc.  This synchronizes the loop with crc update.
X +		 * CPU and/or compiler schedulers produced bad order without
X +		 * this.
X +		 *
X +		 * Shifts take about 12 cycles each, so 3 here with 2
X +		 * parallelizable take about 24 cycles and the crc update
X +		 * takes slightly longer.  8 dependent crc32 instructions
X +		 * can run in 24 cycles, so the 3-way blocking is worse
X +		 * than useless for sizes less than 8 * <word size> = 64
X +		 * on amd64.  In practice, SHORT = 32 confirms these
X +		 * timing calculations by giving a small improvement
X +		 * starting at size 96.  Then the inner loop takes about
X +		 * 12 cycles and the crc update about 24, but these are
X +		 * partly in parallel so the total time is less than the
X +		 * 36 cycles that 12 dependent crc32 instructions would
X +		 * take.
X +		 *
X +		 * To have a chance of completely hiding the overhead for
X +		 * the crc update, the inner loop must take considerably
X +		 * longer than 24 cycles.  LONG = 64 makes the inner loop
X +		 * take about 24 cycles, so is not quite large enough.
X +		 * LONG = 128 works OK.  Unhideable overheads are about
X +		 * 12 cycles per inner loop.  All assuming timing like
X +		 * Haswell.
X +		 */
X +		crc = crc32c_shift(crc32c_long, crc) ^ crc0;
X +		crc1 = crc32c_shift(crc32c_long, crc1);
X +		crc = crc32c_shift(crc32c_2long, crc) ^ crc1;
X +		crc0 = crc2;
X  		next += LONG * 2;
X  		len -= LONG * 3;
X  	}
X +	crc0 ^= crc;
X +#endif /* LONG > SHORT */
X 
X  	/*
X  	 * Do the same thing, but now on SHORT*3 blocks for the remaining data
X  	 * less than a LONG*3 block
X  	 */
X +	crc = 0;
X  	while (len >= SHORT * 3) {
X  		crc1 = 0;
X  		crc2 = 0;

See the comment.

X @@ -259,11 +343,14 @@
X  #endif
X  			next += align;

When SHORT is about what it is (64), on amd64 the "SHORT" loop has 24 crc32
instructions and compilers sometimes unroll them all.  This makes little
difference.

X  		} while (next < end);
X -		crc0 = crc32c_shift(crc32c_short, crc0) ^ crc1;
X -		crc0 = crc32c_shift(crc32c_short, crc0) ^ crc2;
X +		crc = crc32c_shift(crc32c_short, crc) ^ crc0;
X +		crc1 = crc32c_shift(crc32c_short, crc1);
X +		crc = crc32c_shift(crc32c_2short, crc) ^ crc1;
X +		crc0 = crc2;
X  		next += SHORT * 2;
X  		len -= SHORT * 3;
X  	}
X +	crc0 ^= crc;

The change is perhaps easier to understand without looking at the comment.
We accumulate changes in crc instead of into crc0, so that the next iteration
can start without waiting for accumulation.  This requires more shifting
steps, and we try to arrange these optimally.
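The regrouping in the comment, crc = S*S*S*crc + S*S*crc0 + S*crc1
rewritten as S*S * (S*crc + crc0) + S*crc1, can be checked numerically
with a slow bitwise model of the zero-shift.  This sketch is mine; the
helper names are made up and S below means shift_zeros(., LONG):

```c
#include <stddef.h>
#include <stdint.h>

#define POLY	0x82f63b78	/* CRC-32C polynomial, reversed bit order */
#define LONG	128

/* Slow bitwise model of shifting a raw crc past n zero bytes. */
static uint32_t
shift_zeros(uint32_t crc, size_t n)
{
	int k;

	while (n-- != 0)
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (POLY & -(crc & 1));
	return (crc);
}

/* Flat form of the update: S*S*S*crc + S*S*crc0 + S*crc1. */
static uint32_t
update_flat(uint32_t crc, uint32_t crc0, uint32_t crc1)
{
	return (shift_zeros(crc, 3 * LONG) ^ shift_zeros(crc0, 2 * LONG) ^
	    shift_zeros(crc1, LONG));
}

/* Regrouped form used by the patch: S*S * (S*crc + crc0) + S*crc1. */
static uint32_t
update_regrouped(uint32_t crc, uint32_t crc0, uint32_t crc1)
{
	return (shift_zeros(shift_zeros(crc, LONG) ^ crc0, 2 * LONG) ^
	    shift_zeros(crc1, LONG));
}
```

The two forms agree for all inputs because the shift is linear over
GF(2); the regrouped form merely trades some parallelism for a schedule
that competes less with the inner loop.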

X 
X  	/* Compute the crc on the remaining bytes at native word size. */
X  	end = next + (len - (len & (align - 1)));

The adjustments for alignment are slow if they are not null, and wasteful
if they are null, but have relatively little cost for the non-small buffers
that are handled well, so I didn't remove them.

Bruce
[base64-encoded attachment "crc32.dif" omitted; it is the full version of
the patch quoted in part above]


