Shannon’s entropy
Definition (Discrete)
Let $X$ be a discrete random variable on a finite set $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$, with probability distribution function $p(x) = \Pr(X = x)$. The entropy $H_b(X)$ of $X$ is defined as

$$H_b(X) = -\sum_{x \in \mathcal{X}} p(x) \log_b p(x). \qquad (1)$$

The convention $0 \log_b 0 = 0$ is adopted in the definition. The logarithm is usually taken to base 2, in which case the entropy is measured in “bits,” or to base $e$, in which case it is measured in “nats.”
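For concreteness, here is a minimal Python sketch of equation (1); the function name `entropy` and the example distributions are illustrative choices, not anything fixed by the definition.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a finite distribution, using the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries one bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))          # 1.0 bit
print(entropy([0.9, 0.1]))          # about 0.469 bits
print(entropy([0.5, 0.5], math.e))  # about 0.693 nats, i.e. ln 2
```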
If $X$ and $Y$ are random variables on $\mathcal{X}$ and $\mathcal{Y}$ respectively, the joint entropy of $X$ and $Y$ is

$$H_b(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log_b p(x,y),$$

where $p(x,y)$ denotes the joint distribution of $X$ and $Y$.
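The joint entropy can be computed the same way, summing over pairs; in this short sketch the dictionary representation of $p(x,y)$ is an assumed convenience, not part of the definition.

```python
import math

def joint_entropy(joint, base=2):
    """Joint entropy H(X,Y) of a joint pmf given as {(x, y): probability}."""
    return -sum(p * math.log(p, base) for p in joint.values() if p > 0)

# Two independent fair coin flips: H(X,Y) = H(X) + H(Y) = 2 bits.
joint = {('H', 'H'): 0.25, ('H', 'T'): 0.25, ('T', 'H'): 0.25, ('T', 'T'): 0.25}
print(joint_entropy(joint))  # 2.0
```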
Discussion
The Shannon entropy was first introduced by Shannon in 1948 in his landmark paper “A Mathematical Theory of Communication.” The entropy is a functional of the probability distribution function $p(x)$, and is sometimes written as

$$H(p_1, p_2, \ldots, p_n).$$

Note that the entropy of $X$ does not depend on the actual values taken by $X$; it depends only on the probabilities $p(x)$. The definition of Shannon’s entropy can be written as an expectation:

$$H_b(X) = -E[\log_b p(X)] = E\!\left[\log_b \frac{1}{p(X)}\right].$$

The quantity $\log_b \frac{1}{p(x)}$ is interpreted as the information content of the outcome $x$, and is also called the Hartley information of $x$. Hence Shannon’s entropy is the average amount of information contained in the random variable $X$; equivalently, it is the uncertainty removed once the actual outcome of $X$ is revealed.
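The expectation form can be checked numerically; in the following sketch the three-point distribution is an arbitrary illustrative choice.

```python
import math

p = {'a': 0.5, 'b': 0.25, 'c': 0.25}

# Entropy from the definition (1), in bits.
H_direct = -sum(px * math.log2(px) for px in p.values())

# The same value obtained as the average Hartley information E[log2(1/p(X))].
H_expectation = sum(px * math.log2(1 / px) for px in p.values())

print(H_direct, H_expectation)  # both equal 1.5 bits
```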
Characterization
We write $H(X)$ as $H_n(p_1, p_2, \ldots, p_n)$. The Shannon entropy satisfies the following properties.

1. For any $n$, $H_n(p_1, p_2, \ldots, p_n)$ is a continuous and symmetric function of the variables $p_1, p_2, \ldots, p_n$.

2. An event of probability zero does not contribute to the entropy, i.e. for any $n$,
$$H_{n+1}(p_1, \ldots, p_n, 0) = H_n(p_1, \ldots, p_n).$$
3. Entropy is maximized when the probability distribution is uniform: for all $n$,
$$H_n(p_1, \ldots, p_n) \le H_n\!\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right).$$
This follows from Jensen’s inequality,
$$H(X) = E\!\left[\log_b \frac{1}{p(X)}\right] \le \log_b E\!\left[\frac{1}{p(X)}\right] = \log_b n.$$
4. If $p_{ij}$, $1 \le i \le m$, $1 \le j \le k$, are non-negative real numbers summing up to one, and $q_i = \sum_{j=1}^{k} p_{ij}$, then
$$H_{mk}(p_{11}, \ldots, p_{mk}) = H_m(q_1, \ldots, q_m) + \sum_{i=1}^{m} q_i H_k\!\left(\frac{p_{i1}}{q_i}, \ldots, \frac{p_{ik}}{q_i}\right).$$
If we partition the $mk$ outcomes of the random experiment into $m$ groups, each group containing $k$ elements, we can perform the experiment in two steps: first determine the group to which the actual outcome belongs, and then find the outcome within that group. The probability of observing group $i$ is $q_i$, and the conditional probability distribution function given group $i$ is $(p_{i1}/q_i, \ldots, p_{ik}/q_i)$. The entropy
$$H_k\!\left(\frac{p_{i1}}{q_i}, \ldots, \frac{p_{ik}}{q_i}\right)$$
is the entropy of the probability distribution conditioned on group $i$. Property 4 says that the total information is the sum of the information gained in the first step, $H_m(q_1, \ldots, q_m)$, and a weighted sum of the entropies conditioned on each group (a numerical check of this identity is sketched below).
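The grouping identity of property 4 can be verified numerically on a small example; the helper `H` and the particular values $p_{ij}$ below are illustrative assumptions.

```python
import math

def H(probs, base=2):
    """Shannon entropy with the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# m = 2 groups of k = 2 outcomes each; the p_ij sum to one.
p = [[0.4, 0.2], [0.3, 0.1]]
q = [sum(row) for row in p]  # group probabilities q_i

lhs = H([pij for row in p for pij in row])
rhs = H(q) + sum(qi * H([pij / qi for pij in row]) for row, qi in zip(p, q))

print(lhs, rhs)  # both approximately 1.846 bits
```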
Khinchin in 1957 showed that the only function satisfying the above properties is of the form

$$H_n(p_1, \ldots, p_n) = -k \sum_{i=1}^{n} p_i \log p_i,$$

where $k$ is a positive constant, essentially a choice of the unit of measure.
Definition (Continuous)
Entropy in the continuous case is called differential entropy (http://planetmath.org/DifferentialEntropy).
Discussion—Continuous Entropy
Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.
We wish to obtain a generally finite measure as the “bin size” goes to zero. In the discrete case, the bin size is the (implicit) width of each of the (finite or infinite) bins/buckets/states whose probabilities are the $p_n$. As we generalize to the continuous domain, we must make this width explicit.
To do this, start with a continuous function $f$ discretized into bins of size $\Delta$.
By the mean-value theorem there exists a value $x_i$ in each bin such that

$$f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx, \qquad (2)$$
and thus the integral of the function $f$ can be approximated (in the Riemannian sense) by

$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\Delta \to 0} \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta, \qquad (3)$$
where this limit and “bin size goes to zero” are equivalent.
We will denote

$$H^{\Delta} := -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log\bigl(f(x_i)\,\Delta\bigr), \qquad (4)$$
and expanding the logarithm we have

$$H^{\Delta} = -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log f(x_i) - \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log \Delta \qquad (5)$$

$$= -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log f(x_i) - \log \Delta \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta. \qquad (6)$$
As $\Delta \to 0$, we have

$$\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \longrightarrow \int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (7)$$

$$\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log f(x_i) \longrightarrow \int_{-\infty}^{\infty} f(x) \log f(x)\,dx, \qquad (8)$$

while the remaining term $-\log \Delta$ diverges; it is this divergent term that must be discarded, which is why continuous entropy is not literally the limit of discrete entropy.
This leads us to our definition of the differential entropy (continuous entropy):
$$h[f] = -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx. \qquad (9)$$
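A numerical sketch (with an illustrative density, not part of the derivation above) shows what happens in the limit: for $f(x) = 2x$ on $[0,1]$, the discretized entropy $H^{\Delta}$ of (4) grows like $-\log_2 \Delta$, while $H^{\Delta} + \log_2 \Delta$ approaches the differential entropy (9), which for this density is $(1/2 - \ln 2)/\ln 2 \approx -0.279$ bits.

```python
import math

f = lambda x: 2 * x                          # density on [0, 1]
h_true = (0.5 - math.log(2)) / math.log(2)   # differential entropy of f, in bits

for n_bins in (10, 100, 1000, 10000):
    delta = 1.0 / n_bins
    # Bin midpoints stand in for the mean-value points x_i of (2).
    xs = [(i + 0.5) * delta for i in range(n_bins)]
    # Discrete entropy H^Delta of the bin probabilities f(x_i)*Delta, as in (4).
    H_delta = -sum(f(x) * delta * math.log2(f(x) * delta) for x in xs)
    # H^Delta diverges like -log2(Delta); H^Delta + log2(Delta) tends to (9).
    print(n_bins, round(H_delta, 4), round(H_delta + math.log2(delta), 4))

print("differential entropy:", round(h_true, 4))
```

Note that the limit here is negative: unlike discrete entropy, differential entropy need not be non-negative.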
Title: Shannon’s entropy
Canonical name: ShannonsEntropy
Date of creation: 2013-03-22 12:00:53
Last modified on: 2013-03-22 12:00:53
Owner: kshum (5987)
Last modified by: kshum (5987)
Numerical id: 26
Author: kshum (5987)
Entry type: Definition
Classification: msc 94A17
Synonym: entropy
Synonym: Shannon entropy
Related topic: Logarithm
Related topic: ConditionalEntropy
Related topic: MutualInformation
Related topic: DifferentialEntropy
Related topic: HartleyFunction
Defines: joint entropy