Shannon's entropy
Definition (Discrete)
Let $X$ be a discrete random variable on a finite set $\mathcal{X}=\{x_1,\dots,x_n\}$, with probability distribution function $p(x)=\Pr(X=x)$. The entropy $H(X)$ of $X$ is defined as
$$H(X) = -\sum_{x\in\mathcal{X}} p(x)\log_b p(x). \qquad (1)$$
The convention $0\log 0=0$ is adopted in the definition. The logarithm is usually taken to the base 2, in which case the entropy is measured in "bits," or to the base $e$, in which case $H(X)$ is measured in "nats."
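For a concrete feel for (1), here is a minimal Python sketch that computes the entropy in bits of two example distributions (a fair and a biased coin, chosen purely for illustration):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution, using the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # about 0.469
```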
If $X$ and $Y$ are random variables on $\mathcal{X}$ and $\mathcal{Y}$ respectively, the joint entropy of $X$ and $Y$ is
$$H(X,Y) = -\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} p(x,y)\log_b p(x,y),$$
where $p(x,y)$ denotes the joint distribution of $X$ and $Y$.
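The joint entropy is simply the entropy of the joint distribution viewed as a single distribution over $\mathcal{X}\times\mathcal{Y}$; a small self-contained Python sketch (the joint pmf below is an arbitrary example):

```python
import math

# Joint pmf p(x, y) as a table; H(X, Y) is the entropy of the flattened distribution.
joint = [[0.25, 0.25],
         [0.40, 0.10]]
flat = [p for row in joint for p in row]
print(-sum(p * math.log2(p) for p in flat if p > 0))  # about 1.86 bits
```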
Discussion
The Shannon entropy was first introduced by Shannon in 1948 in his landmark paper "A Mathematical Theory of Communication." The entropy is a functional of the probability distribution function $p(x)$, and is sometimes written as
$$H(p(x_1), p(x_2), \dots, p(x_n)).$$
Note that the entropy of $X$ does not depend on the actual values taken by $X$; it depends only on $p(x)$. The definition of Shannon's entropy can be written as an expectation:
$$H(X) = -\mathbb{E}[\log_b p(X)].$$
The quantity $-\log_b p(x)$ is interpreted as the information content of the outcome $x\in\mathcal{X}$, and is also called the Hartley information of $x$. Hence Shannon's entropy is the average amount of information contained in the random variable $X$; equivalently, it is the uncertainty removed once the actual outcome of $X$ is revealed.
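The expectation form also suggests a simple Monte Carlo estimate of $H(X)$: sample outcomes of $X$ and average their information content $-\log_b p(X)$. A minimal sketch using only Python's standard library (the three-outcome distribution is an arbitrary example with entropy exactly 1.5 bits):

```python
import math, random

probs = {'a': 0.5, 'b': 0.25, 'c': 0.25}   # example distribution, H(X) = 1.5 bits

# Estimate H(X) = E[-log2 p(X)] by averaging the information content of samples.
samples = random.choices(list(probs), weights=list(probs.values()), k=100_000)
estimate = sum(-math.log2(probs[x]) for x in samples) / len(samples)
print(estimate)   # close to 1.5
```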
Characterization
We write $H(X)$ as $H_n(p_1,\dots,p_n)$. The Shannon entropy satisfies the following properties.
1. For any $n$, $H_n(p_1,\dots,p_n)$ is a continuous and symmetric function of the variables $p_1, p_2,\dots,p_n$.
2. An event of probability zero does not contribute to the entropy, i.e. for any $n$,
$$H_{n+1}(p_1,\dots,p_n,0)=H_n(p_1,\dots,p_n).$$
3. Entropy is maximized when the probability distribution is uniform: for all $n$,
$$H_n(p_1,\dots,p_n)\le H_n\left(\frac{1}{n},\dots,\frac{1}{n}\right).$$
This follows from Jensen's inequality:
$$H(X)=\mathbb{E}\left[\log_b\frac{1}{p(X)}\right]\le \log_b\left(\mathbb{E}\left[\frac{1}{p(X)}\right]\right)=\log_b(n).$$
4. If $p_{ij}$, $1\le i\le m$, $1\le j\le n$, are non-negative real numbers summing to one, and $q_i=\sum_{j=1}^{n} p_{ij}$, then
$$H_{mn}(p_{11},\dots,p_{mn})=H_m(q_1,\dots,q_m)+\sum_{i=1}^{m} q_i\, H_n\left(\frac{p_{i1}}{q_i},\dots,\frac{p_{in}}{q_i}\right).$$
Suppose we partition the $mn$ outcomes of the random experiment into $m$ groups of $n$ elements each. We can then carry out the experiment in two steps: first determine the group to which the actual outcome belongs, and then find the outcome within that group. The probability of observing group $i$ is $q_i$, and the conditional probability distribution given group $i$ is $(p_{i1}/q_i,\dots,p_{in}/q_i)$. The entropy $H_n(p_{i1}/q_i,\dots,p_{in}/q_i)$ is thus the entropy of the distribution conditioned on group $i$. Property 4 says that the total information is the sum of the information gained in the first step, $H_m(q_1,\dots,q_m)$, and the weighted sum of the entropies conditioned on each group.
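As a numerical sanity check of Property 4, the following sketch verifies the grouping identity for an arbitrary $2\times 3$ table of probabilities (the particular numbers are chosen only for illustration):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# p[i][j] for m = 2 groups of n = 3 outcomes each; entries sum to one.
p = [[0.10, 0.20, 0.10],
     [0.30, 0.15, 0.15]]
q = [sum(row) for row in p]                       # group probabilities q_i

lhs = entropy([x for row in p for x in row])      # H_{mn}(p_11, ..., p_mn)
rhs = entropy(q) + sum(qi * entropy([x / qi for x in row])
                       for qi, row in zip(q, p))  # grouping decomposition
print(abs(lhs - rhs) < 1e-12)                     # True
```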
Khinchin in 1957 showed that the only function satisfying the above assumptions is of the form
$$H_n(p_1,\dots,p_n)=-k\sum_{i=1}^{n} p_i\log p_i,$$
where $k$ is a positive constant, essentially a choice of unit of measure.
Definition (Continuous)
Entropy in the continuous case is called differential entropy (http://planetmath.org/DifferentialEntropy).
Discussion: Continuous Entropy
Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.
We wish to obtain a generally finite measure as the "bin size" goes to zero. In the discrete case, the bin size is the (implicit) width of each of the $n$ (finite or infinite) bins/buckets/states whose probabilities are the $p_n$. As we generalize to the continuous domain, we must make this width explicit.
To do this, start with a continuous function $f$ discretized into bins of width $\Delta$, as shown in the figure.
As the figure indicates, by the mean-value theorem there exists a value $x_i$ in each bin such that
$$f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx, \qquad (2)$$
and thus the integral of the function $f$ can be approximated (in the Riemannian sense) by
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\Delta\to 0}\sum_{i=-\infty}^{\infty} f(x_i)\Delta, \qquad (3)$$
where this limit and "bin size goes to zero" are equivalent.
We will denote by $H^{\Delta}$ the entropy of the discretized distribution,
$$H^{\Delta} = -\sum_{i=-\infty}^{\infty} f(x_i)\Delta\,\log_b\bigl(f(x_i)\Delta\bigr), \qquad (4)$$
and expanding the logarithm we have
$$H^{\Delta} = -\sum_{i=-\infty}^{\infty} f(x_i)\Delta\,\log_b f(x_i) - \sum_{i=-\infty}^{\infty} f(x_i)\Delta\,\log_b\Delta \qquad (5)$$
$$\phantom{H^{\Delta}} = -\sum_{i=-\infty}^{\infty} f(x_i)\Delta\,\log_b f(x_i) - \log_b\Delta\sum_{i=-\infty}^{\infty} f(x_i)\Delta. \qquad (6)$$
As $\Delta\to 0$, we have
$$\sum_{i=-\infty}^{\infty} f(x_i)\Delta \to \int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (7)$$
$$\sum_{i=-\infty}^{\infty} f(x_i)\Delta\,\log_b f(x_i) \to \int_{-\infty}^{\infty} f(x)\log_b f(x)\,dx, \qquad (8)$$
so the first term of (6) converges, while the remaining $-\log_b\Delta$ term diverges; this is precisely why continuous entropy is not a limiting case of discrete entropy. Discarding the divergent term leads us to our definition of the differential entropy (continuous entropy):
$$h[f] = -\int_{-\infty}^{\infty} f(x)\log_b f(x)\,dx. \qquad (9)$$
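To see both the definition (9) and the divergence of $H^{\Delta}$ numerically, the sketch below quantizes a standard Gaussian density with bin width $\Delta$ (the Gaussian is an arbitrary example). The quantized entropy behaves like $h[f]-\log_2\Delta$, so adding back $\log_2\Delta$ recovers the differential entropy $\tfrac{1}{2}\log_2(2\pi e)\approx 2.05$ bits:

```python
import math

def gaussian(x, sigma=1.0):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

for delta in (0.5, 0.1, 0.02):
    xs = [i * delta for i in range(-int(10 / delta), int(10 / delta) + 1)]
    p = [gaussian(x) * delta for x in xs]              # bin probabilities f(x_i) * delta
    H_delta = -sum(q * math.log2(q) for q in p if q > 0)
    # H_delta splits into the differential entropy plus the divergent -log2(delta) term.
    print(delta, H_delta, H_delta + math.log2(delta))  # last column -> about 2.047 bits
```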