Date: Thu, 14 Dec 2006 16:46:30 -0500 (EST)
From: "Mikhail T." <mi@aldan.algebra.com>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: bin/106734: SSE2 optimization for bzip2/libbz2
Message-ID: <200612142146.kBELkUxO024275@aldan.algebra.com>
Resent-Message-ID: <200612142150.kBELo6wg033558@freefall.freebsd.org>

>Number:         106734
>Category:       bin
>Synopsis:       SSE2 optimization for bzip2/libbz2
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Thu Dec 14 21:50:06 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Mikhail T.
>Release:        FreeBSD 6.2-PRERELEASE amd64
>Organization:
Virtual Estates, Inc.
>Environment:

	Intel and AMD chips with SSE2 instructions.

>Description:

	The patch below makes bzip2's blocksort routines use SSE2 registers
	to compare 16 bytes at a time.

	On both the i386 and AMD chips I tested, the performance improvement
	ranges from 5% for already compressed (.gz) files to 20% for
	highly compressible system logs.

	The compressed files are byte-for-byte identical to those produced
	by the original bzip2.

	The changes are guarded by #ifdef __SSE2__ and rely on the
	intrinsics available in the GNU, Intel, and Microsoft compilers.

	No changes to the Makefile(s) are necessary -- when targeting an
	SSE2-capable CPU (e.g. ``-march=opteron'' or ``-march=pentium4''),
	__SSE2__ is defined by the compiler.

>How-To-Repeat:
>Fix:

	The patch is available from

		http://aldan.algebra.com/~mi/bz/

	The patch is not FreeBSD-specific, but was developed, tested, and
	timed on FreeBSD-6.x on both i386 and amd64.

	Feedback welcome.

>Release-Note:
>Audit-Trail:
>Unformatted:
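For readers unfamiliar with the technique described above, here is a minimal sketch of comparing 16 bytes at a time with SSE2 intrinsics, guarded by __SSE2__ with a scalar fallback. This is NOT the code from the patch: the function names first_diff16 and compare_bytes are hypothetical, and bzip2's real blocksort comparison has additional logic that is not reproduced here. It only illustrates the idea under those assumptions.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics (GNU, Intel, Microsoft compilers) */
#endif

/*
 * Hypothetical helper (not from the patch): return the index of the first
 * byte at which the 16-byte chunks a and b differ, or 16 if they are equal.
 */
static int first_diff16(const unsigned char *a, const unsigned char *b)
{
#ifdef __SSE2__
    /* Unaligned loads, so the caller need not align its buffers. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Bit i of mask is set when byte i of a equals byte i of b. */
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    unsigned diff = ~mask & 0xFFFFu;
    int i = 0;

    if (diff == 0)
        return 16;
    /* Portable scan for the lowest set bit (no compiler builtins). */
    while ((diff & 1u) == 0) {
        diff >>= 1;
        i++;
    }
    return i;
#else
    /* Scalar fallback when the compiler does not define __SSE2__. */
    int i;
    for (i = 0; i < 16 && a[i] == b[i]; i++)
        ;
    return i;
#endif
}

/*
 * Hypothetical wrapper: compare n bytes as unsigned values, memcmp-style.
 * Returns <0, 0, or >0.
 */
static int compare_bytes(const unsigned char *a, const unsigned char *b,
                         size_t n)
{
    size_t off = 0;

    /* Whole 16-byte chunks go through the (possibly SSE2) helper. */
    while (n - off >= 16) {
        int i = first_diff16(a + off, b + off);
        if (i < 16)
            return (int)a[off + i] - (int)b[off + i];
        off += 16;
    }
    /* Remaining tail of fewer than 16 bytes, one byte at a time. */
    for (; off < n; off++)
        if (a[off] != b[off])
            return (int)a[off] - (int)b[off];
    return 0;
}
```

Because both code paths return the same answer, the ordering decisions do not depend on whether __SSE2__ is defined, which is consistent with the report's statement that the output is byte-for-byte identical. A caller would use compare_bytes(p, q, len) wherever a byte-wise comparison loop would otherwise appear.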