The Shannon entropy was introduced by Claude Shannon in 1948 in his landmark paper “A Mathematical Theory of Communication.” The entropy is a functional of the probability distribution function $p$, and is sometimes written as
$$H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log p_i,$$
where $p_i = \Pr(X = x_i)$ and, by convention, $0 \log 0 = 0$.
Note that the entropy of $X$ does not depend on the actual values $x_i$ that $X$ takes; it depends only on the distribution $p$. The definition of Shannon's entropy can be written as an expectation:
$$H(X) = -E[\log p(X)] = E\left[\log \frac{1}{p(X)}\right].$$
The quantity $\log \frac{1}{p(x)}$ is interpreted as the information content of the outcome $x$, and is also called the Hartley information of $x$. Hence Shannon's entropy is the average amount of information contained in the random variable $X$; equivalently, it is the uncertainty removed once the actual outcome of $X$ is revealed.
An event of probability zero does not contribute to the entropy, i.e. for any distribution $(p_1, \ldots, p_n)$,
$$H(p_1, \ldots, p_n, 0) = H(p_1, \ldots, p_n).$$
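As a quick numeric illustration, here is a minimal Python sketch (the helper name `shannon_entropy` is ours, not from the text); skipping zero terms implements the convention $0 \log 0 = 0$, so a zero-probability outcome drops out of the sum:

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i log2 p_i, in bits, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
print(shannon_entropy(p))            # 1.5 (bits)
# Appending a zero-probability outcome leaves the entropy unchanged.
assert shannon_entropy(p + [0.0]) == shannon_entropy(p)
```

Entropies here are in bits because the logarithm is base 2; any other base only rescales $H$ by a constant.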
Entropy is maximized when the probability distribution is uniform: for all distributions $(p_1, \ldots, p_n)$,
$$H(p_1, \ldots, p_n) \le H\!\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \log n,$$
with equality if and only if $p_i = 1/n$ for all $i$.
This follows from Jensen's inequality applied to the concave function $\log$:
$$H(X) = E\left[\log \frac{1}{p(X)}\right] \le \log E\left[\frac{1}{p(X)}\right] = \log n.$$
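The bound $H \le \log n$ can be checked numerically. The following Python sketch (helper names are ours) compares random distributions on $n = 8$ outcomes against the uniform one:

```python
import math
import random

def shannon_entropy(p):
    """H(p) = -sum p_i log2 p_i in bits, with 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

n = 8
bound = math.log2(n)                  # log n, the maximum possible entropy
# The uniform distribution attains the bound exactly (3 bits for n = 8).
assert abs(shannon_entropy([1 / n] * n) - bound) < 1e-12

# Randomly generated distributions never exceed log n.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]       # normalize weights into a distribution
    assert shannon_entropy(p) <= bound + 1e-12
```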
If we partition the $mn$ outcomes of the random experiment into $m$ groups, each group containing $n$ elements, we can perform the experiment in two steps: first determine the group to which the actual outcome belongs, and then find the outcome within that group. The probability of observing group $i$ is
$$q_i = p_{i1} + p_{i2} + \cdots + p_{in}.$$
The conditional probability distribution function given group $i$ is $p_{i1}/q_i, \ldots, p_{in}/q_i$. The entropy
$$H_i = H\!\left(\frac{p_{i1}}{q_i}, \frac{p_{i2}}{q_i}, \ldots, \frac{p_{in}}{q_i}\right)$$
is the entropy of the probability distribution conditioned on group $i$. Property 4 says that the total information is the sum of the information gained in the first step, $H(q_1, \ldots, q_m)$, and a weighted sum of the entropies conditioned on each group:
$$H(p_{11}, \ldots, p_{mn}) = H(q_1, \ldots, q_m) + \sum_{i=1}^{m} q_i H_i.$$
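The grouping identity can be verified on a small joint distribution. This Python sketch (with an arbitrary made-up pmf and our own helper name) checks it for $m = 2$ groups of $n = 3$ outcomes each:

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i log2 p_i in bits, with 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# m = 2 groups of n = 3 outcomes; p[i][j] is an arbitrary joint pmf.
p = [[0.10, 0.25, 0.05],
     [0.20, 0.30, 0.10]]
q = [sum(row) for row in p]                        # group probabilities q_i
H_total = shannon_entropy([pij for row in p for pij in row])
H_groups = shannon_entropy(q)                      # information from step 1
H_within = sum(qi * shannon_entropy([pij / qi for pij in row])
               for qi, row in zip(q, p))           # weighted conditional entropies
# Total information = group information + expected within-group information.
assert abs(H_total - (H_groups + H_within)) < 1e-12
```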
Entropy in the continuous case is called differential entropy (http://planetmath.org/DifferentialEntropy).
Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.
We wish to obtain a generally finite measure as the “bin size” goes to zero. In the discrete case, the bin size is the (implicit) width of each of the (finitely or infinitely many) bins/buckets/states whose probabilities are the $p_i$. As we generalize to the continuous domain, we must make this width explicit.
To do this, start with a continuous function $f$ discretized into bins of size $\Delta$, as shown in the figure:
As the figure indicates, by the mean-value theorem there exists a value $x_i$ in each bin such that
$$f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx,$$
and thus the integral of the function $f$ can be approximated (in the Riemannian sense) by
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\Delta \to 0} \sum_{i=-\infty}^{\infty} f(x_i)\Delta,$$
where this limit and “bin size goes to zero” are equivalent.
We will denote
$$H^{\Delta} := -\sum_{i=-\infty}^{\infty} f(x_i)\Delta \log\bigl(f(x_i)\Delta\bigr),$$
and expanding the logarithm we have
$$H^{\Delta} = -\sum_{i=-\infty}^{\infty} f(x_i)\Delta \log f(x_i) - \sum_{i=-\infty}^{\infty} f(x_i)\Delta \log \Delta.$$
As $\Delta \to 0$, we have
$$\sum_{i=-\infty}^{\infty} f(x_i)\Delta \to \int_{-\infty}^{\infty} f(x)\,dx = 1$$
and
$$\sum_{i=-\infty}^{\infty} f(x_i)\Delta \log f(x_i) \to \int_{-\infty}^{\infty} f(x) \log f(x)\,dx,$$
while the second sum equals $-\log\Delta \sum_{i} f(x_i)\Delta \to -\log\Delta$, which diverges since $\log\Delta \to -\infty$.
This leads us to our definition of the differential entropy (continuous entropy):
$$h[f] = \lim_{\Delta \to 0}\left(H^{\Delta} + \log\Delta\right) = -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx.$$
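The limiting argument can be checked numerically. The Python sketch below bins a standard Gaussian density with bin width $\Delta = 10^{-3}$, computes the discrete entropy $H^{\Delta}$ of the binned distribution, and confirms that $H^{\Delta} + \log\Delta$ matches the known closed form $\frac{1}{2}\log(2\pi e)$ for the Gaussian's differential entropy (all logarithms base 2, so values are in bits):

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

delta = 1e-3
xs = [i * delta for i in range(int(-10 / delta), int(10 / delta))]
# Discrete entropy H^Delta of the binned distribution, in bits.
H_delta = -sum(gaussian_pdf(x) * delta * math.log2(gaussian_pdf(x) * delta)
               for x in xs)
h = H_delta + math.log2(delta)     # subtract off the diverging -log(bin size)
h_exact = 0.5 * math.log2(2 * math.pi * math.e)   # about 2.047 bits
assert abs(h - h_exact) < 1e-3
```

Note that $H^{\Delta}$ itself grows without bound like $-\log\Delta$ as the bins shrink; only the combination $H^{\Delta} + \log\Delta$ converges, which is exactly why the discrete entropy has no finite limit in the continuous case.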
Date of creation: 2013-03-22 12:00:53
Last modified on: 2013-03-22 12:00:53
Last modified by: kshum (5987)