Date:      Tue, 9 Jul 2002 23:53:45 -0700 (PDT)
From:      Don Lewis <dl-freebsd@catspoiler.org>
To:        tlambert2@mindspring.com
Cc:        temik@egartech.com, wollman@lcs.mit.edu, mark@thuvia.demon.co.uk, arch@FreeBSD.ORG
Subject:   Re: Package system flaws?
Message-ID:  <200207100653.g6A6rjwr006212@gw.catspoiler.org>
In-Reply-To: <3D2B65A3.ABB92114@mindspring.com>

On  9 Jul, Terry Lambert wrote:
> Artem Tepponen wrote:
>> No, Terry. Dictionary locality works in a different way.
>> gzipped tar will almost always win vs. tarred gzipped files.
>> 10-15% from memory. Just a quick check:
>> 
>> -rw-r--r--    1 temik    develops 29020160 Jul  9 12:36 gcc-3.0.1.gz.tar
>> -rw-r--r--    1 temik    develops 13821669 Jul  9 12:41 gcc-3.0.1.tar.bz2
>> -rw-r--r--    1 temik    develops 18054324 Sep 24  2001 gcc-3.0.1.tar.gz
>> -rw-r--r--    1 temik    develops 22746511 Jul  9 12:52 gcc-3.0.1.zip
>> 
>> Oops. I was wrong. >35% is a big difference. And bzip adds another 24%.
>> But for binaries difference between gzip vs. bzip2 will be smaller.
>> 
>> This is quite a simple check, but the picture will remain the same
>> for pretty much any kind of data, and I hope that's enough to choose
>> a single tar.somez + header.
>> 
>> Whether the header will be combined or in a different file is another
>> question.
> 
> 1)	"Most compression", not "all compression".
> 
> 2)	LZW resets the dictionary every 12K.  This is the patented
> 	process that Terry Welch of Unisys introduced.  So your
> 	argument is only valid for a lot of small files whose size
> 	is well under 12K, which have similar contents.
> 
> 3)	I believe gzip and bzip were both written to get out from
> 	under the Unisys patent, and therefore do not compress as
> 	well as they could compress, even though Unisys has granted
> 	blanket royalty free use for certain applications which fall
> 	into this category.

In the comparisons I've done between gzip(1) and compress(1), gzip has
always gotten better compression than compress, though gzip runs slower.
When I've tuned gzip's compression level knob so that the two produce
similar compression ratios, gzip runs faster than compress.
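
The effect of the level knob can be sketched with Python's zlib module,
which implements the same DEFLATE algorithm gzip uses.  This is only an
illustration under assumptions: the input is synthetic text made up for
the example, not data from the comparisons above, and it measures output
size only, not run time.

```python
import zlib

# Synthetic, text-like input (hypothetical data, not a real source tree).
data = b"".join(
    f"line {i}: the quick brown fox jumps over the lazy dog {i * i}\n".encode()
    for i in range(5000)
)

# Levels 1..9 trade speed for size; 6 is the usual default.
sizes = {level: len(zlib.compress(data, level)) for level in (1, 6, 9)}

for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(data):.1%} of input)")
```

Timing the same loop (for example with time.perf_counter) would show the
speed side of the trade-off; the exact numbers depend on the machine and
the zlib build.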

The algorithm.doc file in the gzip distribution seems to indicate that
gzip resets its dictionary when it decides it would be advantageous to
do so.
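
Dictionary sharing is also what the quoted tar.gz versus gz.tar numbers
turn on.  A rough sketch, again using Python's zlib as a stand-in for
gzip: the "files" are hypothetical generated snippets, and tar headers
and padding are ignored, so only the direction of the difference is
meaningful, not its size.

```python
import zlib

# Hypothetical stand-ins for many small, similar source files.
files = [
    (f"/* file {i} */\n#include <stdio.h>\n"
     f"int func_{i}(int x)\n{{\n    return x + {i};\n}}\n").encode()
    for i in range(200)
]

# "gz.tar" layout: every file compressed on its own, then stored together.
separately = sum(len(zlib.compress(f, 9)) for f in files)

# "tar.gz" layout: files concatenated first, then compressed as one stream,
# so later files can reuse matches found in earlier ones.
together = len(zlib.compress(b"".join(files), 9))

print(f"compressed separately: {separately} bytes")
print(f"compressed together:   {together} bytes")
```

With small files like these, each separate stream also pays its own
header and checksum overhead, which exaggerates the gap compared with
larger files.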

I read somewhere a long time ago that gzip's compression results would
be better if it used arithmetic coding on its output instead of
Huffman coding, but I believe that IBM holds the patent on arithmetic
coding.

> NB: The Unisys patent expires on Dec 10th of this year, in any case,
> so the only reason bzip/gzip wouldn't support using it after that is
> religious.

I was just thinking the other day that the patent expiration date should
be approaching.  I believe that the area most impacted by this patent in
recent years is the creation of .gif files.  At least that's what has
gotten all the press.

