From owner-freebsd-hackers Tue Mar 20 17:48:47 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from cube.gelatinous.com (cube.gelatinous.com [207.82.194.150])
    by hub.freebsd.org (Postfix) with SMTP id 237A137B73D
    for ; Tue, 20 Mar 2001 17:48:44 -0800 (PST)
    (envelope-from aaron@mutex.org)
Received: (qmail 53174 invoked by uid 1000); 21 Mar 2001 01:46:30 -0000
Date: Tue, 20 Mar 2001 17:46:30 -0800
From: Aaron Smith
To: freebsd-hackers@freebsd.org
Cc: jon@csua.berkeley.edu, breadbox@muppetlabs.com
Subject: gzip's custom i386 asm should be disabled
Message-ID: <20010320174630.B82004@gelatinous.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="0rSojgWGcpz+ezC3"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

--0rSojgWGcpz+ezC3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

gzip's i386 assembly code, activated by default in the FreeBSD source
tree, performs poorly on an i686 core (PPro/P2/P3). This is due to the
'partial register stall' problem, explained in a URL recently brought up
on the list, http://www.emulators.com/pentium4.htm.

In the course of learning more about partial register stalls I came
across the following i686 and i586 assembly optimizations for gzip:
http://www.muppetlabs.com/~breadbox/software/assembly.html.

The optimized i686 asm avoids partial register stalls and is 20-40%
faster, with higher compression levels benefiting more from the patch.
The i586 patch is usually only 5% faster, but in some cases achieves a
25% speedup.

For completeness, I also ran some tests on a non-asm gcc 2.95.2 compile,
with and without -march=pentiumpro. Here are the results (three runs,
averaged, caches warmed with some throwaway runs) on a Pentium II 400,
linux-2.4.2.tar, --best.

[type]                           [user secs]  [time (as % of slowest)]
i386 asm:                        175          100%
no asm, -O:                      142          81.1%
no asm, -O2:                     139          79.4%
no asm, -O -march=pentiumpro:    136          77.7%
no asm, -O2 -march=pentiumpro:   140          80.0%
i686 asm:                        124          70.8%

I'm interested in other people's results/tests. In particular, I should
do some runs with -mcpu=pentiumpro as well.

An important part of the equation is to make sure this doesn't hurt i586
machines. I did several tests on a Pentium 200MMX; the i386 asm and the
gcc-emitted asm are not measurably different on that CPU.

Brian Raiter (breadbox@muppetlabs.com, author of the i586/i686 asm
patches) has contacted the gzip maintainers, but it's been years since a
release and there may not be another gzip release. I have seen a 1.2.4a
release which had his files in a contrib/ directory, but they were not
active in any way.

Since I would imagine a large percentage of FreeBSD users run on i686
cores, it'd be great to get this pretty significant speed increase into
our tree. The i686 patch is neat (30% faster than the i386 asm!) but its
improvement over gcc's emitted assembly is small. Disabling the old i386
assembly seems a good first step.

Attached is a patch that disables the custom asm. I'm interested in
hearing everyone's comments.

Aaron

--0rSojgWGcpz+ezC3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=gzip-noasm-patch

Index: Makefile
===================================================================
RCS file: /usr/cvs/src/gnu/usr.bin/gzip/Makefile,v
retrieving revision 1.21
diff -u -r1.21 Makefile
--- Makefile 1999/08/27 23:35:48 1.21
+++ Makefile 2001/03/20 23:59:48
@@ -8,11 +8,6 @@
 CFLAGS+=-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1
 GREP_LIBZ?= YES
 
-.if ${MACHINE_ARCH} == "i386"
-SRCS+= match.S
-CFLAGS+=-DASMV
-.endif
-
 MLINKS= gzip.1 gunzip.1 gzip.1 zcat.1 gzip.1 gzcat.1
 MLINKS+= zdiff.1 zcmp.1
 

--0rSojgWGcpz+ezC3--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
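
For anyone who wants to compare their own runs against the table in the message, here is a minimal Python sketch that recomputes the "% of slowest" column from the posted user-seconds figures and also shows how much time each build saves relative to a chosen baseline. The function names (percent_of_slowest, time_saved_vs) and the dictionary layout are made up for illustration; only the timings come from the post. The posted figures are averages of three runs, so recomputed percentages may differ from the table in the last digit.

```python
# User seconds copied from the table in the post
# (Pentium II 400, linux-2.4.2.tar, gzip --best).
USER_SECS = {
    "i386 asm": 175,
    "no asm, -O": 142,
    "no asm, -O2": 139,
    "no asm, -O -march=pentiumpro": 136,
    "no asm, -O2 -march=pentiumpro": 140,
    "i686 asm": 124,
}


def percent_of_slowest(timings):
    """Return each build's time as a percentage of the slowest build's time."""
    slowest = max(timings.values())
    return {name: 100.0 * secs / slowest for name, secs in timings.items()}


def time_saved_vs(timings, baseline):
    """Return the percentage of the baseline's time that each build saves."""
    base = timings[baseline]
    return {name: 100.0 * (base - secs) / base for name, secs in timings.items()}


if __name__ == "__main__":
    pct = percent_of_slowest(USER_SECS)
    # Hypothetical choice of baseline: the fastest gcc-only build in the table.
    saved = time_saved_vs(USER_SECS, "no asm, -O -march=pentiumpro")
    for name in USER_SECS:
        print(f"{name:32s} {USER_SECS[name]:4d}s  "
              f"{pct[name]:6.2f}% of slowest  "
              f"{saved[name]:+6.2f}% vs best gcc build")
```

Comparing against the best gcc-only build rather than against the i386 asm is what makes the point in the message visible: most of the gain comes from dropping the old asm, and the i686 asm adds a smaller improvement on top of that.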