Date:      Sat, 21 Jul 2007 12:35:57 +1000
From:      Norberto Meijome <freebsd@meijome.net>
To:        James Long <list@museum.rain.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: speed of bzip2 versus gzip
Message-ID:  <20070721123557.715b38f7@localhost>
In-Reply-To: <20070721012455.GA5012@ns.umpquanet.com>
References:  <20070720220337.GA87174@ns.umpquanet.com> <20070721103710.1e16a319@localhost> <2BF10D44-4FB5-4F07-B515-553BC705B900@mac.com> <20070721012455.GA5012@ns.umpquanet.com>

On Fri, 20 Jul 2007 18:24:55 -0700
James Long <list@museum.rain.com> wrote:

> On Fri, Jul 20, 2007 at 05:50:20PM -0700, Chuck Swiger wrote:
> > On Jul 20, 2007, at 5:37 PM, Norberto Meijome wrote:
> >>> Is it normal for bzip2 to be significantly slower than gzip?
> >>> If not, where can I look for things that might be causing
> >>> "bzip2 --fast" to take 50-60 times longer to compress a
> >>> (sendmail log) file than gzip?
> >> 
> >> I never measured it to see if it is 50-60 times slower, but yes, gzip
> >> blows bzip2 out of the water on speed. I wanted to use bzip2 to compress
> >> multi-GB weblog files, but gzip beat it by miles, and bzip2 wasn't THAT
> >> much better at compressing them to make it worth it.
> > 
> > Thanks for the feedback, Norberto.
> > 
> > Of course, it all depends on what your priorities are, too-- if what you 
> > want is a final tarball which is being mirrored and downloaded frequently, 
> > then your goal is to obtain the absolute best compression, and how much CPU 
> > --best takes isn't important.
> >
> > Comparing the default (-5 compression?) of gzip to bzip2 would probably be 
> > more reasonable if you care about reasonably timely compression.
> 
> If I read the man page correctly, bzip2 defaults to --best, which is why
> I compared gzip to bzip2 --fast.  With the 1.5G sendmail log, bzip2 --fast 
> compresses to just under 10M in about 55 minutes, give or take.  bzip2
> --best compresses 1.5G to 1.8M, but takes about 2.25 hours.  gzip
> compresses almost as well (within 3% or so) as --fast, but does it in 1 
> minute instead of 55 on a dual P-III 1.4GHz (but of course, using only
> one CPU).

I don't have the exact numbers at hand, but yes, they were definitely in that crazy range.
BTW, I always compared default bzip2 against gzip -9, because I was interested in making gzip work harder to achieve some more compression.

I ran some short tests. Neither system is doing much besides this simple test.
The comparison uses a 249 MB Apache web log file.

The first box is my laptop running FreeBSD, single CPU.
The second is a server with the same hardware I used to compress those multi-GB log files in 2005. This one is running 64-bit CentOS. I know, not FreeBSD, but I wanted to see if the OS makes a difference.
Both boxes have enough RAM to hold the whole file in memory.

The numbers are quite similar, even given the difference in hardware... which may speak very well of FreeBSD speeds ;)

Compression ratios are the same on Linux and FreeBSD, and for >THIS FILE< the bzip2 output is about half the size of the gzip -9 output (5.2M versus 11M).
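
For anyone who wants to repeat the comparison on their own logs, here is a minimal sketch of how it could be scripted. It is not what I ran for the numbers in this mail. bench_one is a helper name made up for this example, and it uses date +%s for timing, so it only has one-second resolution (the shell's time gives finer numbers). It writes the compressed data to stdout with -c, so the original file is left untouched.

```shell
#!/bin/sh
# Sketch only: time one compressor on one file without modifying the file.
# bench_one is a hypothetical helper, not part of gzip or bzip2.
# Usage:  bench_one FILE COMMAND [ARGS...]
# Prints: "<command>: <seconds>s, <compressed bytes> bytes"
bench_one() {
    file=$1
    shift
    start=$(date +%s)
    # -c sends the compressed stream to stdout; wc -c counts its bytes
    bytes=$("$@" -c "$file" | wc -c | tr -d ' ')
    end=$(date +%s)
    echo "$*: $((end - start))s, $bytes bytes"
}

# Example calls (commented out; substitute your own log file):
# bench_one 20070604-desktop.log gzip -9
# bench_one 20070604-desktop.log bzip2 --fast
# bench_one 20070604-desktop.log bzip2
```

Run each command at least twice if you can, so the file is already in the cache for the runs you compare.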

==================================================
Box 1 (laptop)


CPU: Intel(R) Pentium(R) M processor 2.00GHz (1995.02-MHz 686-class CPU)
1.5 GB RAM
$ uname -a
FreeBSD ayiin.octantis.com.au 6.2-STABLE FreeBSD 6.2-STABLE #12: Fri Jul 13 17:45:09 EST 2007     root@ayiin.octantis.com.au:/usr/obj/usr/src/sys/AYIIN  i386


$ time gzip -9 20070604-desktop.log 

real    0m13.373s
user    0m10.398s
sys     0m0.257s

[betom@ayiin] [Sat Jul 21 12:27:14 2007]
/usr/home/betom/Desktop
$ ls -lh 20070604-desktop.log.gz 
-rw-r--r--  1 betom  betom    11M Jul 21 12:17 20070604-desktop.log.gz

$ time gunzip ./20070604-desktop.log.gz 

real    0m13.926s
user    0m1.455s
sys     0m0.525s


$ time bzip2 20070604-desktop.log 

real    4m2.662s
user    3m21.184s
sys     0m0.321s

$ ls -lh 20070604-desktop.log.bz2 
-rw-r--r--  1 betom  betom   5.2M Jul 21 12:17 20070604-desktop.log.bz2


$ time bunzip2 20070604-desktop.log.bz2 

real    0m18.650s
user    0m13.922s
sys     0m0.794s


==================================================
Box 2 
# uname -a
Linux cerberus.octantis.com.au. 2.6.18-8.1.4.el5.centos.plus #1 SMP Sun May 20 10:53:21 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
CPU: 2 x AMD Opteron(tm) Processor 250 (stepping 10, 2400.000 MHz)
4 GB RAM

[root@cerberus] [Sat 21 Jul 2007 12:22:39 PM EST]
~
# time gzip -9 20070604-desktop.log

real    0m7.818s
user    0m7.343s
sys     0m0.332s

[root@cerberus] [Sat 21 Jul 2007 12:22:56 PM EST]
~
# ls -lh 20070604-desktop.log.gz 
-rw-r--r-- 1 numard numard 11M Jul 21 12:09 20070604-desktop.log.gz

# time gunzip 20070604-desktop.log.gz 

real    0m2.502s
user    0m1.049s
sys     0m1.044s

# time bzip2 20070604-desktop.log 

real    3m22.587s
user    3m17.566s
sys     0m1.741s

[root@cerberus] [Sat 21 Jul 2007 12:29:19 PM EST]
~
# ls -lh 20070604-desktop.log.bz2 
-rw-r--r-- 1 numard numard 5.2M Jul 21 12:09 20070604-desktop.log.bz2

# time bunzip2 20070604-desktop.log.bz2 

real    0m17.544s
user    0m15.261s
sys     0m1.435s
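
To put these timings in terms of the original question (how many times slower?), the ratios can be worked out from the "real" figures above. A small sketch follows. ratio is just a helper name made up for this example, and the inputs are the wall-clock seconds copied from the runs in this mail.

```shell
#!/bin/sh
# ratio A B: print A/B to one decimal place (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Wall-clock ("real") seconds: default bzip2 versus gzip -9 on the same file.
echo "FreeBSD laptop: bzip2 took $(ratio 242.662 13.373)x as long as gzip -9"
echo "CentOS server:  bzip2 took $(ratio 202.587 7.818)x as long as gzip -9"
```

On these numbers, default bzip2 comes out roughly 18x (laptop) and 26x (server) slower than gzip -9 for this file.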

_________________________
{Beto|Norberto|Numard} Meijome

"They redundantly repeated themselves over and over again incessantly without end ad infinitum"
   ibid.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.


