Date:      Wed, 10 Jan 2007 00:00:34 GMT
From:      Mikhail Teterin <mi+kde@aldan.algebra.com>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Message-ID:  <200701100000.l0A00Y9S097998@freefall.freebsd.org>

The following reply was made to PR bin/106734; it has been noted by GNATS.

From: Mikhail Teterin <mi+kde@aldan.algebra.com>
To: Julian Seward <jseward@acm.org>
Cc: bug-followup@freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Tue, 9 Jan 2007 18:34:36 -0500

 On Sunday 07 January 2007 00:08, Julian Seward wrote:
 = >         /* Load the bytes: */
 = >         n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
 = >         n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
 
 = > read beyond the end of the defined area of block.  block is
 = > defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
 = > you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
 = > hence loading 15 bytes of garbage.
 
 I don't think that's quite right. Instead of processing 8 bytes at a time,
 as the non-SSE code does, I'm comparing 16 at a time, so it is possible
 for me to overshoot by exactly 8 bytes sometimes.
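 
 As a rough illustration of that arithmetic (the helper name bytes_past_end
 and the sizes used with it are hypothetical, not taken from the patch), the
 number of bytes a load reads past the end of a buffer can be computed like
 this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes does a `width`-byte load starting
 * at index `start` read past the end of a buffer holding `size` bytes?
 */
static size_t bytes_past_end(size_t start, size_t width, size_t size)
{
    size_t end;

    assert(start <= size);      /* the load must begin inside the buffer */
    end = start + width;        /* one past the last byte the load touches */
    return end > size ? end - size : 0;
}
```

 With a 100-byte buffer, an 8-byte load at index 92 stays in bounds, while a
 16-byte load at the same index reads 8 bytes too many.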
 
 Anyway, the problem stemmed from my bumping i1 and i2 by 16 instead of 8
 after the _initial check_ (which, in the quadrant-less case, should not need
 to be separate at all, actually). Sometimes _that_ would carry them past the
 end. I think the solution is either to bump BZ_N_OVERSHOOT up even further
 or to check and adjust i1 and i2:
 
 	if (i1 >= nblock)
 		i1 -= nblock;
 	if (i2 >= nblock)
 		i2 -= nblock;
 
 at the beginning, rather than the end, of the loop. Having done that, I no
 longer peek beyond the end of the block (according to gdb's conditional
 breakpoints, at least).
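 
 A minimal sketch of the check-first ordering, in plain C rather than SSE2.
 This is not the code from the patch: memcmp() stands in for the 16-byte SSE2
 compare, and the names rotation_gt, mirror_head, CHUNK and OVERSHOOT are
 hypothetical. It assumes the tail of the buffer repeats the first OVERSHOOT
 bytes of the block, and that nblock is at least CHUNK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 16, OVERSHOOT = 16 };  /* hypothetical sizes for this sketch */

/* Copy the first OVERSHOOT bytes of the block into its tail area. */
static void mirror_head(unsigned char *block, size_t nblock)
{
    assert(nblock >= OVERSHOOT);
    memcpy(block + nblock, block, OVERSHOOT);
}

/*
 * Return 1 if the rotation starting at i1 sorts after the one at i2, else 0.
 * block must hold nblock + OVERSHOOT bytes, prepared with mirror_head().
 */
static int rotation_gt(const unsigned char *block, size_t nblock,
                       size_t i1, size_t i2)
{
    size_t done;

    assert(nblock >= CHUNK && i1 < nblock + CHUNK && i2 < nblock + CHUNK);
    for (done = 0; done < nblock; done += CHUNK) {
        int c;

        /* Wrap first, so every load starts inside [0, nblock). */
        if (i1 >= nblock)
            i1 -= nblock;
        if (i2 >= nblock)
            i2 -= nblock;

        /* Reads block[i .. i + CHUNK - 1], at most index nblock + CHUNK - 2. */
        c = memcmp(block + i1, block + i2, CHUNK);
        if (c != 0)
            return c > 0;

        i1 += CHUNK;
        i2 += CHUNK;
    }
    return 0;  /* the two rotations are equal */
}
```

 Because the indices are wrapped before each load instead of after it, the
 highest index read is nblock + CHUNK - 2, so an overshoot area of CHUNK - 1
 bytes is enough in this sketch.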
 
 Please check the new patch: http://aldan.algebra.com/~mi/bz/blocksort-SSE2-patch-2
 
 Yours,
 
 	-mi
 
 P.S. The following gdb script is what I used. Run it as:
 
 	gdb -x x.txt bzip2
 
 x.txt:
 	break blocksort.c:516
 	cond 1 (i1 > nblock) || (i2 > nblock)
 	run -9 < /tmp/PLIST > /dev/null
 
 Adjust the compression level and the input's location, and be sure to have
 blocksort.o compiled with debug information, of course.


