Date:      Tue, 09 Jul 2002 15:37:23 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Artem Tepponen <temik@egartech.com>
Cc:        Garrett Wollman <wollman@lcs.mit.edu>, Mark Valentine <mark@thuvia.demon.co.uk>, arch@freebsd.org
Subject:   Re: Package system flaws?
Message-ID:  <3D2B65A3.ABB92114@mindspring.com>
References:  <5235EF9BAE6B7F4CB3735789EEF73B29074172@turtle.egar.egartech.com>

Artem Tepponen wrote:
> No, Terry. Dictionary locality works in a different way.
> A gzipped tar will almost always win over a tar of gzipped files,
> by 10-15% from memory. Just a quick check:
> 
> -rw-r--r--    1 temik    develops 29020160 Jul  9 12:36 gcc-3.0.1.gz.tar
> -rw-r--r--    1 temik    develops 13821669 Jul  9 12:41 gcc-3.0.1.tar.bz2
> -rw-r--r--    1 temik    develops 18054324 Sep 24  2001 gcc-3.0.1.tar.gz
> -rw-r--r--    1 temik    develops 22746511 Jul  9 12:52 gcc-3.0.1.zip
> 
> Oops. I was wrong. >35% is a big difference. And bzip adds another 24%.
> But for binaries, the difference between gzip and bzip2 will be smaller.
> 
> This is quite a simple check, but the picture will remain the same
> for pretty much any kind of data, and I hope that's enough to choose
> a single tar.somez + header.
>
> Whether the header is combined or kept in a separate file is another question.
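The comparison in the quoted listing (compress each file, then tar, versus tar first, then compress once) can be reproduced on synthetic data. A minimal sketch, assuming Python's standard gzip and tarfile modules; the file names and the generated "source" contents are hypothetical stand-ins, not the gcc tree. Part of the gap comes from lost cross-file matches and per-member gzip headers, and part from tar padding each compressed member to a 512-byte block.

```python
import gzip
import io
import random
import tarfile

def make_members(n=50, seed=1):
    """Hypothetical test data: n small, similar text files (stand-ins for a source tree)."""
    rng = random.Random(seed)
    words = ["static", "int", "return", "struct", "void", "char", "if", "else", "for", "while"]
    return {
        f"file{i:03d}.c": " ".join(rng.choice(words) for _ in range(400)).encode()
        for i in range(n)
    }

def tar_bytes(members):
    """Pack name -> bytes pairs into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()  # TarFile does not close a caller-supplied fileobj

members = make_members()
# "gz.tar": gzip each member on its own (fresh dictionary per file), then archive.
per_file = tar_bytes({name + ".gz": gzip.compress(data, 9) for name, data in members.items()})
# "tar.gz": archive first, then gzip the whole stream once (matches can span files).
solid = gzip.compress(tar_bytes(members), 9)

print(f"gzip each member, then tar: {len(per_file)} bytes")
print(f"tar, then gzip once:        {len(solid)} bytes")
```

Exact byte counts depend on the data and the zlib version; only the ordering is the point.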

1)	"Most compression", not "all compression".

2)	LZW resets the dictionary every 12K.  This is the patented
	process that Terry Welch of Unisys introduced.  So your
	argument is only valid for many small files with similar
	contents, each well under 12K in size.

3)	I believe gzip and bzip were both written to get out from
	under the Unisys patent, and therefore do not compress as
	well as they could compress, even though Unisys has granted
	blanket royalty free use for certain applications which fall
	into this category.

4)	Nothing in my statement precludes maintaining the dictionary
	as a spanning set over a number of small files, per #2, while
	at the same time leaving the index uncompressed.

5)	Yes, I would expect that an uncompressed index would take
	more room than a compressed index.

6)	For most modern communications media (including broadband,
	where a modulator/demodulator pair is used, e.g. a cable modem),
	the modems involved include their own compression, usually a
	form of trellis encoding.
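Point 4 does not name a mechanism. One way to share dictionary state across many small files while keeping an uncompressed index (and random access to each member) is zlib's preset-dictionary feature. This is a swapped-in technique (DEFLATE, not LZW) and not something the post proposes. A minimal Python sketch; SHARED_DICT, pack and extract are hypothetical names, and the JSON index layout is an assumption.

```python
import json
import zlib

# Hypothetical shared dictionary: byte strings expected to recur across many small files.
SHARED_DICT = b"#include <stdio.h>\n#include <stdlib.h>\nint main(int argc, char **argv)\n"

def pack(files, zdict):
    """Compress each file separately against one preset dictionary.

    Returns (index, blob). The index is plain JSON (left uncompressed) mapping
    each name to its (offset, length) inside blob, so one member can be located
    and inflated without decompressing the others.
    """
    index, chunks, offset = {}, [], 0
    for name, data in files.items():
        comp = zlib.compressobj(level=9, zdict=zdict)  # fresh stream, primed with zdict
        chunk = comp.compress(data) + comp.flush()
        index[name] = (offset, len(chunk))
        chunks.append(chunk)
        offset += len(chunk)
    return json.dumps(index).encode(), b"".join(chunks)

def extract(index_bytes, blob, name, zdict):
    """Random access: look the member up in the uncompressed index, inflate only it."""
    offset, length = json.loads(index_bytes)[name]
    decomp = zlib.decompressobj(zdict=zdict)  # the same dictionary is required to inflate
    return decomp.decompress(blob[offset:offset + length]) + decomp.flush()

files = {
    "a.c": SHARED_DICT + b"{\n\treturn 0;\n}\n",
    "b.c": SHARED_DICT + b"{\n\tputs(\"hello\");\n\treturn 1;\n}\n",
}
index_bytes, blob = pack(files, SHARED_DICT)
assert extract(index_bytes, blob, "b.c", SHARED_DICT) == files["b.c"]
```

The trade-off: each member still starts a new stream, so only redundancy captured in the shared dictionary is reused, but any single file can be pulled out via the index.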

As a side note: compressing already-compressed data is useless and, in
fact, usually counter-productive.  All of the format arguments I've been
making are predicated on non-CDROM distribution over some medium that
is two orders of magnitude or more slower than a local CDROM... and
which, by its very nature of having hardware compression, tends not to
benefit at all from compressing the payload anyway.  But even if your
argument were entirely valid, the compression you seek would come from
the link-level compression applied to the data in transit anyway.
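The "compression of compressed data" point is easy to check: compress some redundant text with zlib, then compress the output again. A minimal sketch; the synthetic input is an assumption chosen only so the first pass has redundancy to remove, and exact sizes will vary with the input and zlib version.

```python
import random
import zlib

# Hypothetical input: pseudo-random "words" over a small alphabet, so the text is redundant.
rng = random.Random(42)
text = " ".join(
    "".join(rng.choice("etaoinshrdlu") for _ in range(rng.randint(2, 8)))
    for _ in range(20000)
).encode()

once = zlib.compress(text, 9)   # first pass removes most of the redundancy
twice = zlib.compress(once, 9)  # second pass sees near-random bytes; little or nothing to gain

print(len(text), len(once), len(twice))
```

The same reasoning applies to a link that compresses on the fly: an already-compressed payload gives it nothing to work with.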

NB: The Unisys patent expires on Dec 10th of this year, in any case,
so the only reason bzip/gzip wouldn't support using it after that is
religious.

-- Terry
