PlanetMath (more info)
 Math for the people, by the people. Sponsor PlanetMath
Encyclopedia | Requests | Forums | Docs | Wiki | Random | RSS  
Login
create new user
name:
pass:
forget your password?
Main Menu
Owner confidence rating: Medium Entry average rating: No information on entry rating
Shannon's entropy (Definition)

Definition (Discrete)

Let $ X$ be a discrete random variable on a finite set $ \mathcal{X}=\{x_1,\ldots,x_n\}$, with probability distribution function $ p(x) = \Pr(X=x)$. The entropy $ H(X)$ of $ X$ is defined as

$\displaystyle H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_b p(x).$ (1)

The convention $ 0 \log 0 =0$ is adopted in the definition. The logarithm is usually taken to the base 2, in which case the entropy is measured in ``bits,'' or to the base e, in which case $ H(X)$ is measured in ``nats.''

If $ X$ and $ Y$ are random variables on $ \mathcal{X}$ and $ \mathcal{Y}$ respectively, the joint entropy of $ X$ and $ Y$ is

$\displaystyle H(X,Y) = -\sum_{(x,y)\in \mathcal{X}\times \mathcal{Y}} p(x,y) \log_b p(x,y), $
where $ p(x,y)$ denote the joint distribution of $ X$ and $ Y$.

Discussion

The Shannon entropy was first introduced by Shannon in 1948 in his landmark paper ``A Mathematical Theory of Communication.'' The entropy is a functional of the probability distribution function $ p(x)$, and is sometime written as

$\displaystyle H(p(x_1), p(x_2),\ldots, p(x_n)). $
It is noted that the entropy of $ X$ does not depend on the actual values of $ X$, it only depends on $ p(x)$. The definition of Shannon's entropy can be written as an expectation

$\displaystyle H(X) = -E[\log_b p(X)]. $
The quantity $ -\log_b p(x)$ is interpreted as the information content of the outcome $ x\in\mathcal{X}$, and is also called the Hartley information of $ x$. Hence the Shannon's entropy is the average amount of information contained in random variable $ X$, it is also the uncertainty removed after the actual outcome of $ X$ is revealed.

Characterization

We write $ H(X)$ as $ H_n(p_1,\ldots,p_n)$. The Shannon entropy satisfies the following properties.

  1. For any $ n$, $ H_n(p_1,\ldots,p_n)$ is a continuous and symmetric function on variables $ p_1$, $ p_2,\ldots, p_n$.
  2. Event of probability zero does not contribute to the entropy, i.e. for any $ n$,

    $\displaystyle H_{n+1}(p_1,\ldots,p_n,0) = H_n(p_1,\ldots,p_n). $
  3. Entropy is maximized when the probability distribution is uniform. For all $ n$,

    $\displaystyle H_n(p_1,\ldots,p_n) \leq H_n\Big(\frac{1}{n},\ldots,\frac{1}{n} \Big). $
    This follows from Jensen inequality,

    $\displaystyle H(X) = E\Big[\log_b \Big( \frac{1}{p(X)}\Big) \Big] \leq \log_b \Big( E\Big[ \frac{1}{p(X)} \Big] \Big) = \log_b(n). $
  4. If $ p_{ij}$, $ 1\leq i \leq m$, $ 1\leq j \leq n$ are non-negative real numbers summing up to one, and $ q_i = \sum_{j=1}^n p_{ij}$, then

    $\displaystyle H_{mn}(p_{11},\ldots, p_{mn}) = H_m(q_1,\ldots,q_m) + \sum_{i=1}^m q_i H_n\Big(\frac{p_{i1}}{q_i},\ldots, \frac{p_{in}}{q_i} \Big). $
    If we partition the $ mn$ outcomes of the random experiment into $ m$ groups, each group contains $ n$ elements, we can do the experiment in two steps: first determine the group to which the actual outcome belongs to, and second find the outcome in this group. The probability that you will observe group $ i$ is $ q_i$. The conditional probability distribution function given group $ i$ is $ (p_{i1}/q_i,\ldots,p_{in}/q_i)$. The entropy

    $\displaystyle H_n\Big(\frac{p_{i1}}{q_i},\ldots, \frac{p_{in}}{q_i} \Big) $
    is the entropy of the probability distribution conditioned on group $ i$. Property 4 says that the total information is the sum of the information you gain in the first step, $ H_m(q_1,\ldots, q_m)$, and a weighted sum of the entropies conditioned on each group.

Khinchin in 1957 showed that the only function satisfying the above assumptions is of the form:

$\displaystyle H_n(p_1,\ldots,p_n) = -k \sum_{i=1}^n p_i \log p_i $
where $ k$ is a positive constant, essentially a choice of unit of measure.

Definition (Continuous)

Entropy in the continuous case is called differential entropy.

Discussion--Continuous Entropy

Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.

We wish to obtain a generally finite measure as the ``bin size'' goes to zero. In the discrete case, the bin size is the (implicit) width of each of the $ n$ (finite or infinite) bins/buckets/states whose probabilities are the $ p_n$. As we generalize to the continuous domain, we must make this width explicit.

To do this, start with a continuous function $ f$ discretized as shown in the figure:

Figure: Discretizing the function $ f$ into bins of width $ \Delta$
\includegraphics[width=\textwidth]{function-with-bins.eps}
As the figure indicates, by the mean-value theorem there exists a value $ x_i$ in each bin such that
$\displaystyle f(x_i) \Delta = \int_{i\Delta}^{(i+1)\Delta} f(x) dx$ (2)

and thus the integral of the function $ f$ can be approximated (in the Riemannian sense) by
$\displaystyle \int_{-\infty}^{\infty} f(x) dx = \lim_{\Delta \to 0} \sum_{i = -\infty}^{\infty} f(x_i) \Delta$ (3)

where this limit and ``bin size goes to zero'' are equivalent.

We will denote

$\displaystyle H^{\Delta} :=- \sum_{i=-\infty}^{\infty} \Delta f(x_i) \log \Delta f(x_i)$ (4)

and expanding the $ \log$ we have
$\displaystyle H^{\Delta}$ $\displaystyle = - \sum_{i=-\infty}^{\infty} \Delta f(x_i) \log \Delta f(x_i)$ (5)
  $\displaystyle = - \sum_{i=-\infty}^{\infty} \Delta f(x_i) \log f(x_i) -\sum_{i=-\infty}^{\infty} f(x_i) \Delta \log \Delta.$ (6)

As $ \Delta \to 0$, we have
$\displaystyle \sum_{i=-\infty}^{\infty} f(x_i) \Delta$ $\displaystyle \to \int f(x) dx = 1$   and (7)
$\displaystyle \sum_{i=-\infty}^{\infty} \Delta f(x_i) \log f(x_i)$ $\displaystyle \to \int f(x) \log f(x) dx$ (8)

This leads us to our definition of the differential entropy (continuous entropy):
$\displaystyle h[f] = \lim_{\Delta \to 0} \left[H^{\Delta} + \log \Delta\right] = -\int_{-\infty}^{\infty} f(x) \log f(x) dx.$ (9)




"Shannon's entropy" is owned by kshum. [ full author list (2) | owner history (2) ]
(view preamble | get metadata)

View style:

See Also: logarithm, conditional entropy, mutual information, differential entropy, Hartley function

Other names:  entropy, Shannon entropy
Also defines:  joint entropy
Keywords:  entropy, shannon, uncertainty, information
Log in to rate this entry.
(view current ratings)

Cross-references: differential entropy, equivalent, limit, integral, mean-value theorem, domain, infinite, width, size, finite, discrete, positive, distribution function, conditional probability, summing, real numbers, Jensen inequality, event, variables, function, continuous, contained, average, Hartley information, expectation, functional, joint distribution, random variables, base, logarithm, probability distribution function, finite set, discrete random variable
There are 6 references to this entry.

This is version 21 of Shannon's entropy, born on 2001-11-19, modified 2006-06-28.
Object id is 968, canonical name is ShannonsTheoremEntropy.
Accessed 48103 times total.

Classification:
AMS MSC94A17 (Information and communication, circuits :: Communication, information :: Measures of information, entropy)

Pending Errata and Addenda
None.
[ View all 5 ]
Discussion
Style: Expand: Order:
forum policy
proof by drummond on 2001-11-19 23:11:17
I suppose I should post the proof as well...later.
[ reply | up ]

Interact
post | correct | update request | add derivation | add example | add (any)