Date: Fri, 2 May 2003 22:33:47 +0200
From: Brad Knowles <brad.knowles@skynet.be>
To: Matthias Buelow <mkb@mukappabeta.de>
Cc: freebsd-current@freebsd.org
Subject: Re: HEADS UP: bzip2(1) compression for manpages, Groff and Texinfo docs
Message-ID: <a0521061bbad867d807d1@[10.0.1.2]>
In-Reply-To: <20030502174352.GE18677@moghedien.mukappabeta.net>
References: <20030502171957.28624.qmail@laurel.tmseck.homedns.org>
    <3EB2AC00.7070307@tcoip.com.br>
    <20030502174352.GE18677@moghedien.mukappabeta.net>
At 7:43 PM +0200 2003/05/02, Matthias Buelow wrote:

> The two programs, however, only do the same thing if you consider
> that they're both compressors. bzip2 eats much more resources than
> gzip, both space and time. And the algorithm is rather overkill for
> small files anyways.

Granted, the space savings are not that large. I took /usr/share/man/man1 from a 4.6.2-RELEASE box, made three copies of it under /tmp/man, uncompressed all the files, and then re-compressed them using `compress`, `gzip -9`, and `bzip2`. Here are the results:

% du * | sort -nr
4646    compress
3624    gzip
3422    bzip2

So, bzip2 is not much of an improvement over gzip (the gzip output is ~6% larger), but it is a fair improvement over compress (whose output is ~35.7% larger).

This is just one section of the man pages, and does not include the cat pages, but I figure it's probably fairly representative.

I haven't looked at the stuff under /usr/share/info or /usr/share/doc. I'm not sure which of those files would be compressed and which ones wouldn't. These three directories comprise ~82MB of disk space, of which about 15MB is in /usr/share/man and about 64.6MB in /usr/share/doc. At the moment, it doesn't appear that the files in /usr/share/doc are compressed at all, so there might be significant storage savings there.

I built a tarball from the /usr/share/doc hierarchy, and tried the three different compression programs on it. I know that compression on a tarball is going to be different from compression on individual files, but this should at least give us some idea. Anyway, here are the results:

% ls -1s doc* | sort -nr
64368 doc.tar
22896 doc-compress.tar.Z
16080 doc-gzip.tar.gz
12032 doc-bzip2.tar.bz2

So, bzip2 results in a file about 18.6% of the size of the original, gzip gets it down to about 24.9%, and compress only to 35.5%. Relatively speaking, the bzip2 file is about 74.8% of the size of the one produced by `gzip -9`.
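
For anyone who wants to re-check the percentages, the arithmetic can be sketched as below. The block counts are copied from the `du` and `ls -1s` listings in this message; the helper names `percent_of` and `percent_larger` are illustrative and not part of any tool mentioned here. The figures quoted in the message match these values truncated to one decimal place, and the ~6% and ~35.7% numbers for man1 correspond to how much larger the gzip and compress output is than the bzip2 output.

```python
# Block counts as reported in the message (from `du` and `ls -1s`).
MAN1 = {"compress": 4646, "gzip": 3624, "bzip2": 3422}
DOC_TAR = {"original": 64368, "compress": 22896, "gzip": 16080, "bzip2": 12032}


def percent_of(size, baseline):
    """Return `size` expressed as a percentage of `baseline`."""
    return 100.0 * size / baseline


def percent_larger(size, baseline):
    """Return how much larger `size` is than `baseline`, in percent."""
    return 100.0 * (size - baseline) / baseline


if __name__ == "__main__":
    # man1: how much larger is each tool's output than bzip2's?
    for tool in ("compress", "gzip"):
        extra = percent_larger(MAN1[tool], MAN1["bzip2"])
        print(f"man1: {tool} output is {extra:.2f}% larger than bzip2")

    # doc.tar: compressed size as a percentage of the uncompressed tarball.
    for tool in ("compress", "gzip", "bzip2"):
        ratio = percent_of(DOC_TAR[tool], DOC_TAR["original"])
        print(f"doc.tar: {tool} is {ratio:.2f}% of the original")

    # doc.tar: bzip2 relative to gzip -9.
    rel = percent_of(DOC_TAR["bzip2"], DOC_TAR["gzip"])
    print(f"doc.tar: bzip2 is {rel:.2f}% of the gzip -9 size")
```

Note that "X% larger" and "X% smaller" are not interchangeable: compress output being about 35.7% larger than bzip2 output means bzip2 output is roughly 26% smaller than compress output.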
Seeing as /usr/share/doc and /usr/share/info are not currently compressed (in 4.6.2-RELEASE), any compression algorithm would be a significant improvement.

-- 
Brad Knowles, <brad.knowles@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)