From: Brad Knowles
To: Matthias Buelow
Cc: freebsd-current@freebsd.org
Date: Fri, 2 May 2003 22:33:47 +0200
Subject: Re: HEADS UP: bzip2(1) compression for manpages, Groff and Texinfo docs

At 7:43 PM +0200 2003/05/02, Matthias Buelow wrote:

> The two programs, however, only do the same thing if you consider
> that they're both compressors. bzip2 eats much more resources than
> gzip, both space and time. And the algorithm is rather overkill for
> small files anyways.

Granted, the space savings is not that much.
I took /usr/share/man/man1 from a 4.6.2-RELEASE box, made three copies of it under /tmp/man, uncompressed all the files, and then re-compressed them with `compress`, `gzip -9`, and `bzip2`. Here are the results:

    % du * | sort -nr
    4646    compress
    3624    gzip
    3422    bzip2

So, bzip2 is not that much of an improvement over gzip (the gzip copy is ~6% larger), but it is a fair improvement over compress (the compress copy is ~35.7% larger). This is just one section of the man pages, and it does not include the cat pages, but I figure it's probably fairly representative.

I haven't looked at the stuff under /usr/share/info or /usr/share/doc, and I'm not sure which of those files would be compressed and which ones wouldn't. These three directories comprise ~82MB of disk space, of which about 15MB is in /usr/share/man and about 64.6MB is in /usr/share/doc. At the moment, the files in /usr/share/doc don't appear to be compressed at all, so there might be significant storage savings there.

I built a tarball from the /usr/share/doc hierarchy and tried the three compression programs on it. I know that compressing a tarball gives different results from compressing the individual files, but this should at least give us some idea. Here are the results:

    % ls -1s doc* | sort -nr
    64368 doc.tar
    22896 doc-compress.tar.Z
    16080 doc-gzip.tar.gz
    12032 doc-bzip2.tar.bz2

So, bzip2 results in a file about 18.6% of the size of the original, gzip about 24.9%, and compress only gets down to about 35.5%. Relatively speaking, bzip2 produces a file that is about 74.8% the size of the one produced by `gzip -9`.

Seeing as /usr/share/doc and /usr/share/info are not currently compressed (in 4.6.2-RELEASE), any compression algorithm would be a significant improvement.

-- 
Brad Knowles

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.
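For anyone who wants to double-check the percentages, here is a small Python sketch that recomputes them from the du and ls -s figures in the message. The names percent_of, MAN1 and DOC are just illustrative, and the block units don't matter since only ratios are used. Rounded to one decimal it gives 5.9% and 35.8% for how much larger the gzip and compress copies of man1 are than the bzip2 copy, and 18.7%, 25.0% and 35.6% for the doc tarballs; the figures in the text appear to be truncated rather than rounded, hence the 0.1 differences.

```python
# Block counts as printed by du(1) and ls -s in the message.
# Units cancel out, because only ratios are computed.
MAN1 = {"compress": 4646, "gzip": 3624, "bzip2": 3422}
DOC = {"tar": 64368, "compress": 22896, "gzip": 16080, "bzip2": 12032}


def percent_of(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# man1: how much larger each copy is than the bzip2 copy.
for method in ("gzip", "compress"):
    extra = percent_of(MAN1[method], MAN1["bzip2"]) - 100.0
    print(f"man1 {method:8s} {extra:5.1f}% larger than bzip2")

# doc: each compressed tarball as a percentage of the plain tar file.
for method in ("bzip2", "gzip", "compress"):
    print(f"doc  {method:8s} {percent_of(DOC[method], DOC['tar']):5.1f}% of doc.tar")

# doc: bzip2 output relative to the gzip -9 output.
print(f"doc  bzip2 is {percent_of(DOC['bzip2'], DOC['gzip']):.1f}% of gzip")
```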
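To repeat the per-file man1 measurement on another box without copying trees around, something like the following Python sketch compresses each file separately in memory with the standard gzip and bz2 modules (both at level 9) and totals the sizes. It is only a sketch under a few assumptions: there is no stdlib module for the LZW `compress` format, so that column is left out; it sums exact byte counts, whereas du counts allocated blocks (which penalises lots of small files) and counts hard-linked pages only once; and the compressed_totals name and default path are purely illustrative.

```python
import bz2
import gzip
import os
import sys


def compressed_totals(root: str) -> dict[str, int]:
    """Compress every regular file under root on its own and sum the sizes.

    Returns total bytes for the uncompressed data ("raw") and for the
    gzip level-9 and bzip2 level-9 output of each file.
    """
    totals = {"raw": 0, "gzip": 0, "bzip2": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that isn't a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                data = f.read()
            # Installed man pages are already gzipped; uncompress them
            # first, mirroring the "uncompressed all the files" step.
            if name.endswith(".gz"):
                data = gzip.decompress(data)
            totals["raw"] += len(data)
            totals["gzip"] += len(gzip.compress(data, compresslevel=9))
            totals["bzip2"] += len(bz2.compress(data, compresslevel=9))
    return totals


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/man/man1"
    # Largest first, like `sort -nr`.
    result = compressed_totals(root)
    for method, size in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size // 1024:8d} KB  {method}")
```

Pass the directory as the first argument; with no argument it walks /usr/share/man/man1.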
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)