PlanetMath
derivation of mutual information (Derivation)

The maximum likelihood estimator for mutual information is identical (up to a scale factor) to the generalized log-likelihood ratio statistic for multinomials, and is closely related to Pearson's $\chi^2$ test. Consequently, when $X$ and $Y$ are independent, the observed values of mutual information computed from maximum likelihood estimates of the probabilities are asymptotically $\chi^2$ distributed, apart from that scale factor.
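As a small numerical illustration of how closely the two statistics agree, the sketch below computes both the log-likelihood ratio statistic $G = 2 \sum O \ln(O/E)$ and Pearson's $X^2 = \sum (O-E)^2/E$ for a contingency table. Here $O$ are the observed counts and $E$ the counts expected under independence. It is a minimal Python sketch that assumes natural logarithms and all row and column sums positive; the 2×2 table and the function names are hypothetical, chosen only for illustration.

```python
import math

def expected_counts(table):
    """Expected counts E(x,y) = T(x,*) T(*,y) / N under independence.
    Assumes every row and column sum is positive."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[rows[i] * cols[j] / n for j in range(len(cols))]
            for i in range(len(rows))]

def g_and_pearson(table):
    """Return (G, X2) with G = 2 sum O ln(O/E) and X2 = sum (O - E)^2 / E."""
    exp = expected_counts(table)
    g = 0.0
    x2 = 0.0
    for obs_row, exp_row in zip(table, exp):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # 0 * log 0 is taken to be 0
                g += o * math.log(o / e)
            x2 += (o - e) ** 2 / e
    return 2.0 * g, x2

g, x2 = g_and_pearson([[10, 12], [11, 9]])
print(f"G = {g:.4f}, Pearson X2 = {x2:.4f}")  # prints: G = 0.3824, Pearson X2 = 0.3818
```

When the counts are close to their expected values the two statistics nearly coincide, because $X^2$ is the second-order Taylor approximation of $G$ around $O = E$. They diverge when some cells are far from expectation or contain small counts.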

In particular, suppose we sample $X$ and $Y$ jointly to obtain $N$ tuples from $X \times Y$. Define $T(x, y)$ to be the number of times the tuple $(x, y)$ was observed, $T(x, *)$ the number of tuples whose first element is $x$, and $T(*, y)$ the number of tuples whose second element is $y$. Clearly $T(*, *) = N$, the number of tuples in the sample. From the definition, the generalized log-likelihood ratio statistic for testing the independence of $X$ and $Y$ (based on the sample of tuples) is
\begin{equation*} -2 \log \lambda = 2 \sum_{xy} T(x,y) \log \frac{\pi_{x|y}}{\mu_x} \end{equation*}
where
\begin{equation*} \pi_{x|y} = \frac{T(x,y)}{\sum_x T(x,y)} = \frac{T(x,y)}{T(*,y)} \end{equation*}
and
\begin{equation*} \mu_x = \frac{T(x,*)}{T(*,*)}. \end{equation*}
Substituting these, the log-likelihood ratio can be expressed in terms of row and column sums:
\begin{equation*} -2 \log \lambda = 2 \sum_{xy} T(x,y) \log \frac{T(x,y)\, T(*,*)}{T(x,*)\, T(*,y)} \end{equation*}
Writing $\pi_{xy} = T(x,y)/N$, $\mu_{x*} = T(x,*)/N$ and $\mu_{*y} = T(*,y)/N$ for the maximum likelihood estimates of the cell, row and column probabilities, this becomes
\begin{equation*} -2 \log \lambda = 2 \sum_{xy} T(x,y) \log \frac{\pi_{xy}}{\mu_{x*}\, \mu_{*y}} \end{equation*}
Since $T(x,y) = N \pi_{xy}$, $\sum_y \pi_{xy} = \mu_{x*}$ and $\sum_x \pi_{xy} = \mu_{*y}$, this can be rearranged into
\begin{equation*} -2 \log \lambda = 2N \left[ \sum_{xy} \pi_{xy} \log \pi_{xy} - \sum_{x} \mu_{x*} \log \mu_{x*} - \sum_{y} \mu_{*y} \log \mu_{*y} \right] = 2N \hat I(X;Y) \end{equation*}
where the hat indicates the maximum likelihood estimate of $I(X;Y)$.
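The identity $-2 \log \lambda = 2N \hat I(X;Y)$ can be checked numerically. The following minimal Python sketch computes the statistic once from the row and column sums and once from the plug-in entropy terms, assuming natural logarithms and the convention $0 \log 0 = 0$ for empty cells. The function names and the example table are illustrative only, not part of the original entry.

```python
import math

def g_from_counts(table):
    """-2 log lambda = 2 sum_xy T(x,y) log(T(x,y) T(*,*) / (T(x,*) T(*,y)))."""
    n = sum(map(sum, table))                 # T(*, *) = N
    rows = [sum(r) for r in table]           # T(x, *)
    cols = [sum(c) for c in zip(*table)]     # T(*, y)
    g = 0.0
    for i, row in enumerate(table):
        for j, t in enumerate(row):
            if t > 0:                        # empty cells contribute 0
                g += t * math.log(t * n / (rows[i] * cols[j]))
    return 2.0 * g

def mi_hat(table):
    """Plug-in estimate in nats:
    sum pi_xy log pi_xy - sum mu_x* log mu_x* - sum mu_*y log mu_*y."""
    n = sum(map(sum, table))

    def sum_p_log_p(counts):
        # sum of p log p over the ML probabilities c / N, skipping zero counts
        return sum((c / n) * math.log(c / n) for c in counts if c > 0)

    cells = [t for row in table for t in row]  # T(x, y)
    rows = [sum(r) for r in table]             # T(x, *)
    cols = [sum(c) for c in zip(*table)]       # T(*, y)
    return sum_p_log_p(cells) - sum_p_log_p(rows) - sum_p_log_p(cols)

table = [[10, 12], [11, 9]]
n = sum(map(sum, table))
g = g_from_counts(table)
print(g, 2 * n * mi_hat(table))  # the two values agree up to rounding error
```

Computing both sides independently, rather than deriving one from the other in code, keeps the check meaningful: any sign error in the entropy form would show up as a mismatch.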

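This also gives the asymptotic distribution of $\hat I(X;Y)$. Under the null hypothesis that $X$ and $Y$ are independent, $2N \hat I(X;Y)$ (with $\hat I$ computed using natural logarithms) is asymptotically $\chi^2$ distributed with $(|X|-1)(|Y|-1)$ degrees of freedom, where $|X|$ and $|Y|$ are the numbers of distinct values $X$ and $Y$ can take. Thus $\hat I(X;Y)$ itself is asymptotically a $\chi^2$ deviate divided by $2N$.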
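The asymptotic claim can be probed by simulation. This hedged sketch draws many samples of $N$ tuples from a hypothetical pair of independent distributions (the marginals `px` and `py` below are arbitrary assumptions), computes $2N \hat I(X;Y)$ for each sample, and averages the results. Since a $\chi^2$ deviate with $k$ degrees of freedom has mean $k$, the average should be close to $(|X|-1)(|Y|-1)$ when $N$ is large. It uses natural logarithms and only the Python standard library; `simulate_mean_g` is a hypothetical helper name.

```python
import math
import random

def g_statistic(table):
    """2N * I_hat, computed as 2 sum T(x,y) log(T(x,y) N / (T(x,*) T(*,y)))."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    g = 0.0
    for i, r in enumerate(table):
        for j, t in enumerate(r):
            if t > 0:  # empty cells (and empty rows/columns) contribute 0
                g += t * math.log(t * n / (rows[i] * cols[j]))
    return 2.0 * g

def simulate_mean_g(px, py, n, trials, seed=0):
    """Average of 2N * I_hat over `trials` samples of n tuples,
    drawn with X ~ px and Y ~ py independent (a hypothetical setup)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(px)) for j in range(len(py))]
    weights = [px[i] * py[j] for i, j in cells]  # independence: p(x,y) = p(x) p(y)
    total = 0.0
    for _ in range(trials):
        table = [[0] * len(py) for _ in px]
        for i, j in rng.choices(cells, weights=weights, k=n):
            table[i][j] += 1
        total += g_statistic(table)
    return total / trials

px = [0.5, 0.3, 0.2]
py = [0.6, 0.4]
df = (len(px) - 1) * (len(py) - 1)  # (|X|-1)(|Y|-1) = 2
mean_g = simulate_mean_g(px, py, n=400, trials=1000)
print(f"mean of 2N*I_hat = {mean_g:.3f}, chi-square mean = {df}")
```

A fixed seed makes the run reproducible; comparing the full histogram of simulated values with the $\chi^2$ density would be a stronger check than comparing means alone.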




"derivation of mutual information" is owned by tdunning.

This is version 2 of derivation of mutual information, born on 2005-05-01, modified 2005-05-01.
Object id is 6994, canonical name is DerivationOfMutualInformation.

Classification:
AMS MSC 94A17 (Information and communication, circuits :: Communication, information :: Measures of information, entropy)
