Date:      Tue, 8 May 2007 22:01:12 -0700
From:      "Ted Mittelstaedt" <tedm@toybox.placo.com>
To:        "Gary Kline" <kline@tao.thought.org>, <usleepless@gmail.com>
Cc:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   RE: Another slightly OT q...
Message-ID:  <BMEDLGAENEKCJFGODFOCKEAOCAAA.tedm@toybox.placo.com>
In-Reply-To: <20070509021840.GA41793@thought.org>



> -----Original Message-----
> From: owner-freebsd-questions@freebsd.org
> [mailto:owner-freebsd-questions@freebsd.org]On Behalf Of Gary Kline
> Sent: Tuesday, May 08, 2007 7:19 PM
> To: usleepless@gmail.com
> Cc: Gary Kline; FreeBSD Mailing List
> Subject: Re: Another slightly OT q...
>
>
>
> 	So it *was* a hoax?  Rats.  Some weeks ago on Public
> 	Broadcasting, a few sentences were spoken on the potential of
> 	fractal geometry to achieve [I'm guessing] data-compression on
> 	the order of what Sloot was claiming.  So far, no one has figured
> 	it out.  It may be a dream... .
>

There's some cool math out there that explains all of this, but I never
liked math, and it isn't necessary to know the math to understand the
issue.  Just consider the problem for a while and you will realize that
the compression ratio of a specific data stream varies depending on the
amount of repetition in the input datastream.  A perfectly unrandom
datastream, like a constant series of logical 1's, carries no
information, but has a compression ratio that is infinite.  A perfectly
random datastream, on the other hand, also carries no useful
information, but has a compression ratio that is effectively zero - it
cannot be compressed at all.  I believe that a datastream that is 50%
of the way between either extreme carries the most information, and I
believe your typical datastream is much closer to the perfectly
unrandom side than the perfectly random side.  Compression is merely
the process of pushing the randomness of the stream closer to the
random side.

Thus, if the input datastream is very close to the perfectly unrandom
side - meaning it has a very high amount of repetition in it - you can
get some pretty spectacular compression ratios.  But the closer you
move to the unrandom side, the less data you actually carry.  So the
better applications emit datastreams that are less unrandom, and
compression therefore does not work as well on them.
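
To make this concrete, here is a quick sketch in Python using zlib.  It
is just an illustration - any general-purpose compressor will show the
same effect, and the exact numbers will vary with the compressor and
the input:

    # Compare how well a perfectly repetitive stream and a random
    # stream compress.  Both inputs are 1 MB.
    import os
    import zlib

    unrandom = b"\x01" * 1000000        # a constant run of bytes
    random_ish = os.urandom(1000000)    # effectively incompressible noise

    for name, data in [("unrandom", unrandom), ("random", random_ish)]:
        packed = zlib.compress(data, 9)
        print("%s: %d -> %d bytes (%.1f:1)"
              % (name, len(data), len(packed),
                 float(len(data)) / len(packed)))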

This of course completely ignores the other data issue: is the
application data efficient to begin with?  For example, a page of
information in plain ASCII consumes about 1K of data, while that same
page of information in an MS Word file consumes a hundred times that
amount of space - Word is therefore extremely inefficient with data.
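
Back-of-envelope, in Python, using the rough figures above (the 1K page
and the 100x Word overhead are ballpark numbers from this message, not
measurements):

    # Rough size of the same page of text in two formats.
    ascii_page = 1024                   # ~1K of plain ASCII text
    word_factor = 100                   # rough overhead factor for a .doc
    print("ASCII page:  %7d bytes" % ascii_page)
    print("Word page:  ~%7d bytes" % (ascii_page * word_factor))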

Probably the worst offenders here are the news websites like
www.cnn.com.  They insist on putting more and more news articles into
videos rather than just a couple of screens of text.  I just do not see
any benefit to the consumer in a video of an interview with someone
like George Bush when the video consists of 2 sentence fragments.  The
entire story could be written on a webpage, sans video.  Do they really
think the typical reader doesn't know what he looks like already?

I see this a lot with audio files, also.  For example, how many times
have you come across an .mp3 file that was speech only - perhaps a
professor's lecture - but had been recorded in CD-quality full stereo?
A .wav file recorded at the lowest sampling rate in mono, which is
perfectly acceptable for speech, would be smaller.
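
The raw numbers make the point.  Here is a Python sketch that ignores
whatever the .mp3 encoder squeezes out afterwards and just compares the
sampled data rates - 44.1 kHz/16-bit/stereo is CD quality, and
8 kHz/8-bit/mono is telephone-grade, which is plenty for speech:

    # Uncompressed size of a one-hour lecture at two sampling setups.
    seconds = 60 * 60

    cd_stereo  = 44100 * 2 * 2 * seconds  # 44.1 kHz, 2 bytes/sample, stereo
    phone_mono =  8000 * 1 * 1 * seconds  #  8 kHz,   1 byte/sample,  mono

    print("CD-quality stereo: %.0f MB" % (cd_stereo / 1048576.0))
    print("8 kHz mono speech: %.0f MB" % (phone_mono / 1048576.0))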

Ted



