You are here
Home βΊShannon's entropy
Primary tabs
Shannonβs entropy
Definition (Discrete)
Let be a discrete random variable on a finite set , with probability distribution function . The entropy of is defined as
| (1) |
The convention is adopted in the definition. The logarithm is usually taken to the base 2, in which case the entropy is measured in βbits,β or to the base e, in which case is measured in βnats.β
If and are random variables on and respectively, the joint entropy of and is
where denote the joint distribution of and .
Discussion
The Shannon entropy was first introduced by Shannon in 1948 in his landmark paper βA Mathematical Theory of Communication.β The entropy is a functional of the probability distribution function , and is sometime written as
It is noted that the entropy of does not depend on the actual values of , it only depends on . The definition of Shannonβs entropy can be written as an expectation
The quantity is interpreted as the information content of the outcome , and is also called the Hartley information of . Hence the Shannonβs entropy is the average amount of information contained in random variable , it is also the uncertainty removed after the actual outcome of is revealed.
Characterization
We write as . The Shannon entropy satisfies the following properties.
1. For any , is a continuous and symmetric function on variables , .
2. 3. Entropy is maximized when the probability distribution is uniform. For all ,
This follows from Jensen inequality,
4. If , , are non-negative real numbers summing up to one, and , then
If we partition the outcomes of the random experiment into groups, each group contains elements, we can do the experiment in two steps: first determine the group to which the actual outcome belongs to, and second find the outcome in this group. The probability that you will observe group is . The conditional probability distribution function given group is . The entropy
is the entropy of the probability distribution conditioned on group . Property 4 says that the total information is the sum of the information you gain in the first step, , and a weighted sum of the entropies conditioned on each group.
Khinchin in 1957 showed that the only function satisfying the above assumptions is of the form:
where is a positive constant, essentially a choice of unit of measure.
Definition (Continuous)
Entropy in the continuous case is called differential entropy.
DiscussionβContinuous Entropy
Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.
We wish to obtain a generally finite measure as the βbin sizeβ goes to zero. In the discrete case, the bin size is the (implicit) width of each of the (finite or infinite) bins/buckets/states whose probabilities are the . As we generalize to the continuous domain, we must make this width explicit.
To do this, start with a continuous function discretized as shown in the figure:
x_i
\psfragf(x)
\psfragx
\psfragdelta
As the figure indicates, by the mean-value theorem there exists a value in each bin such that
| (2) |
and thus the integral of the function can be approximated (in the Riemannian sense) by
| (3) |
where this limit and βbin size goes to zeroβ are equivalent.
We will denote
| (4) |
and expanding the we have
| (5) | ||||
| (6) |
As , we have
| (7) | ||||
| (8) |
This leads us to our definition of the differential entropy (continuous entropy):
| (9) |
Mathematics Subject Classification
94A17 Measures of information, entropy- Forums
- Planetary Bugs
- HS/Secondary
- University/Tertiary
- Graduate/Advanced
- Industry/Practice
- Research Topics
- LaTeX help
- Math Comptetitions
- Math History
- Math Humor
- PlanetMath Comments
- PlanetMath System Updates and News
- PlanetMath help
- PlanetMath.ORG
- Strategic Communications Development
- The Math Pub
- Testing messages (ignore)
- Other useful stuff
Recent Activity
new question: Sorry to steal a few minutes of your time for this question, but i honestly don't know what else to do. by Whrazithar
new question: equality of the determinants of submatrices of an orthogonal matrix by ismayli
Jun 11
new correction: Typo by suitangi
Jun 2
new question: Creating another set with same cardinality. by hkkass
Jun 1
new image: ProblemOneRevised by unlord
new Education: Chapter II by rspuzio
May 31
new collection: The Calculus by Davis and Brenke by rspuzio
new question: Proofs by weixifan
new question: Summation Integration Question by trevor.nickle
May 27
new correction: typo+finite measure hypothesis by Filipe
Corrections
suggestion by akrowne β
Explicit definition in the continuous case. by Dr_Absentius β
classification by yark β
missing word by Mathprof β



Comments
proof
I suppose I should post the proof as well...later.
Re: proof
before someone else does =]
-apk
Proof - external link
The proof can eventually be found at
http://www.cs.brandeis.edu/~cs170a/Handouts/axiomatics.pdf