Date:      Wed, 26 Jul 2000 03:50:26 -0700 (PDT)
From:      Kris Kennaway <kris@FreeBSD.org>
To:        Mark Murray <mark@grondar.za>
Cc:        arch@FreeBSD.org
Subject:   Re: Estimating entropy 
Message-ID:  <Pine.BSF.4.21.0007260333360.95417-100000@freefall.freebsd.org>
In-Reply-To: <200007261019.MAA00605@grimreaper.grondar.za>

On Wed, 26 Jul 2000, Mark Murray wrote:

> > 2) Keep a frequency table and calculate or estimate the shannon entropy
> > periodically. This may be feasible if we treat the samples as 8-bit
> > sources, as you only have to loop over 256 values and calculate a log_2 of
> > the probabilities (although lack of FP in the kernel would complicate
> > this)
> 
> I have been looking for articles on Shannon entropy; all I can find
> is a theorem that covers ergodic systems. Do you have any online
> references?

It's pretty simple: if you have an information source which outputs data
with a given probability distribution p(x) (technically speaking a
statistical ensemble of sources, in order to define the probability) then
the entropy content is

S = - \Int_x dx p(x) log_2 p(x)

where \Int_x dx denotes an integral over x.

If x is a discrete variable (as it is in most/all of the cases we'll be
looking at) then replace \Int_x dx with \Sigma_x (i.e. a sum over all
possible output values), giving S = - \Sigma_x p(x) log_2 p(x).
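
To make the discrete case concrete for 8-bit samples, here is a minimal
userland sketch (the function name and table layout are made up, and it
uses floating point and libm, so it only illustrates the formula; an
in-kernel version would need an integer/fixed-point log approximation):

/*
 * Shannon entropy, in bits per 8-bit sample, of a 256-entry frequency
 * table.  Userland-only sketch: uses FP and libm.
 */
#include <math.h>

double
shannon_entropy(const unsigned long freq[256])
{
	unsigned long total = 0;
	double entropy = 0.0, p;
	int i;

	for (i = 0; i < 256; i++)
		total += freq[i];
	if (total == 0)
		return (0.0);

	for (i = 0; i < 256; i++) {
		if (freq[i] == 0)
			continue;
		p = (double)freq[i] / (double)total;
		entropy -= p * log(p) / M_LN2;	/* -sum p(x) log_2 p(x) */
	}
	return (entropy);	/* between 0 and 8 bits per sample */
}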

Basically, (Shannon) entropy is a measure (in bits) of how much
information you learn by performing a measurement, given that some
outcomes are more likely than others and therefore come as less of a
"surprise" (you probably knew this, but for the benefit of anyone else who
might not...)

It's also the absolute minimum length a given piece of data can be
compressed to by a general-purpose, "perfect" compression algorithm, which
is why the size of gzip's output is a decent estimate of it.
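
If you want a quick gzip-style estimate in userland (this is not the code
from my earlier post, just an illustration; the function name is made up
and it assumes zlib is linked in), something like this returns a rough
bits-per-input-byte figure, treating the compressed size as an upper
bound:

#include <stdlib.h>
#include <zlib.h>

/*
 * Crude entropy estimate in bits per input byte: deflate the buffer at
 * maximum compression and compare output size to input size.  Framing
 * overhead makes this over-estimate small or highly redundant inputs,
 * so treat it as a rough upper bound only.
 */
double
compressed_bits_per_byte(const unsigned char *buf, unsigned long len)
{
	uLongf destlen = len + len / 1000 + 64;	/* zlib worst case + slack */
	Bytef *dest;
	double bpb;

	if (len == 0)
		return (0.0);
	dest = malloc(destlen);
	if (dest == NULL)
		return (-1.0);
	if (compress2(dest, &destlen, buf, len, Z_BEST_COMPRESSION) != Z_OK) {
		free(dest);
		return (-1.0);
	}
	bpb = 8.0 * (double)destlen / (double)len;
	free(dest);
	return (bpb);
}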

The requirement about ergodic systems basically means that the way a
single system behaves if you watch it for a long period of time is the
same as the way a large number of identical systems (in different states)
behave when observed at a single instant (the aforementioned statistical
ensemble assumption). There's also another implicit assumption of a
"stationary" distribution, which basically means it's not time-varying: if
you have a probability distribution which changes over time then Shannon
entropy will not be a true measure of the information content of the
source, because it doesn't take into account temporal patterns which may
decrease the actual information content.

See the code I posted a few days ago for a C implementation which
calculates the entropy of the PCM input; that should make the above
clearer.

Kris

--
In God we Trust -- all others must submit an X.509 certificate.
    -- Charles Forsythe <forsythe@alum.mit.edu>






