Date:      Fri, 2 May 2003 22:33:47 +0200
From:      Brad Knowles <brad.knowles@skynet.be>
To:        Matthias Buelow <mkb@mukappabeta.de>
Cc:        freebsd-current@freebsd.org
Subject:   Re: HEADS UP: bzip2(1) compression for manpages, Groff and Texinfo docs
Message-ID:  <a0521061bbad867d807d1@[10.0.1.2]>
In-Reply-To: <20030502174352.GE18677@moghedien.mukappabeta.net>
References:  <20030502171957.28624.qmail@laurel.tmseck.homedns.org> <3EB2AC00.7070307@tcoip.com.br> <20030502174352.GE18677@moghedien.mukappabeta.net>

At 7:43 PM +0200 2003/05/02, Matthias Buelow wrote:

>  The two programs, however, only do the same thing if you consider
>  that they're both compressors.  bzip2 eats much more resources than
>  gzip, both space and time.  And the algorithm is rather overkill for
>  small files anyways.

	Granted, the space savings are not that great.  I took 
/usr/share/man/man1 from a 4.6.2-RELEASE box, made three copies of 
it under /tmp/man, uncompressed all the files, and then re-compressed 
one copy each using `compress`, `gzip -9`, and `bzip2`.  Here are the 
results:

		% du * | sort -nr
		4646    compress
		3624    gzip
		3422    bzip2

	So, bzip2 is not that much of an improvement over gzip (the gzip 
copy is ~6% larger than the bzip2 copy), but it is a fair improvement 
over compress (the compress copy is ~35.8% larger).  This is just one 
section of the man pages, and does not include the cat pages, but I 
figure it's probably fairly representative.
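
	For anyone wanting to repeat the man1 comparison, here is a rough 
sketch of the procedure described above.  The helper name 
`recompress_copy` and the destination paths are hypothetical, not taken 
from the original message, and it assumes the source pages are 
gzip-compressed.  `du -sk` reports kilobytes, which may not match the 
block size behind the figures quoted above.

```shell
#!/bin/sh
# recompress_copy SRC DEST CMD [ARGS...]
# Hypothetical helper: copy SRC to DEST, undo any gzip compression,
# recompress every file with CMD, and print DEST's size in kilobytes.
recompress_copy() {
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    cp -R "$src"/. "$dest"/ || return 1
    # Uncompress existing .gz files; -f also handles hard-linked pages.
    find "$dest" -type f -name '*.gz' -exec gunzip -f {} +
    # Recompress each regular file with the requested command.
    find "$dest" -type f -exec "$@" {} +
    du -sk "$dest" | cut -f1
}

# Example invocations (commented out so that sourcing this file is harmless):
# recompress_copy /usr/share/man/man1 /tmp/man/compress compress
# recompress_copy /usr/share/man/man1 /tmp/man/gzip     gzip -9
# recompress_copy /usr/share/man/man1 /tmp/man/bzip2    bzip2
```

	Note that `compress` leaves a file untouched when compressing it 
would not make it smaller, so that copy may end up with a mix of 
compressed and uncompressed pages.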

	I haven't looked at the stuff under /usr/share/info or 
/usr/share/doc.  I'm not sure which of those files would be 
compressed and which ones wouldn't.  These three directories comprise 
~82MB of disk space, of which about 15MB is in /usr/share/man and 
about 64.6MB in /usr/share/doc.  At the moment, it doesn't appear 
that the files in /usr/share/doc are compressed at all, so there 
might be significant storage savings there.


	I built a tarball from the /usr/share/doc hierarchy, and tried 
the three different compression programs on it.  I know that 
compression on a tarball is going to be different from compression on 
individual files, but this should at least give us some idea.
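
	The tarball test can be sketched along these lines.  The helper 
name `compress_tarball` is hypothetical; the output file names follow 
the listing in this message.  It assumes a tar(1) that accepts `-C`, 
and it skips `bzip2` or `compress` if they are not installed.  The 
units printed by `ls -s` depend on the system and on BLOCKSIZE.

```shell
#!/bin/sh
# compress_tarball DIR PREFIX
# Hypothetical helper: archive DIR into PREFIX.tar, then write one
# compressed copy per tool, keeping the uncompressed tarball around.
compress_tarball() {
    dir=$1
    prefix=$2
    tar -cf "$prefix.tar" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # -c writes to stdout, so the original tarball is left in place.
    gzip -9 -c "$prefix.tar" > "$prefix-gzip.tar.gz"
    if command -v bzip2 >/dev/null 2>&1; then
        bzip2 -c "$prefix.tar" > "$prefix-bzip2.tar.bz2"
    fi
    if command -v compress >/dev/null 2>&1; then
        compress -c "$prefix.tar" > "$prefix-compress.tar.Z"
    fi
    # List sizes in blocks, largest first.
    ls -1s "$prefix"* | sort -nr
}

# Example invocation (commented out so that sourcing this file is harmless):
# compress_tarball /usr/share/doc /tmp/doc
```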

	Anyway, here are the results:

		% ls -1s doc* | sort -nr
		 64368 doc.tar
		 22896 doc-compress.tar.Z
		 16080 doc-gzip.tar.gz
		 12032 doc-bzip2.tar.bz2

	So, bzip2 results in a file about 18.7% of the size of the 
original, gzip gives about 25.0%, and compress only 35.6%.  
Relatively speaking, bzip2 produces a file that is about 74.8% of the 
size of the version produced by `gzip -9`.
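
	The ratios can be rechecked from the block counts in the listing 
with a small awk wrapper.  The function name `pct` is made up for 
illustration; it rounds to one decimal place, so the last digit can 
differ from a figure that was truncated instead.

```shell
#!/bin/sh
# pct PART WHOLE
# Hypothetical helper: print PART as a percentage of WHOLE, one decimal.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

pct 12032 64368   # doc-bzip2.tar.bz2 relative to doc.tar
pct 16080 64368   # doc-gzip.tar.gz relative to doc.tar
pct 22896 64368   # doc-compress.tar.Z relative to doc.tar
pct 12032 16080   # bzip2 output relative to gzip -9 output
```

	The same helper works for the man1 numbers, e.g. `pct 3624 3422` 
for the gzip copy relative to the bzip2 copy.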


	Seeing as /usr/share/doc and /usr/share/info are not currently 
compressed (in 4.6.2-RELEASE), any compression algorithm would be a 
significant improvement.

-- 
Brad Knowles, <brad.knowles@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)


