derivation of mutual information
(Derivation)
The maximum likelihood estimator for mutual information is identical (except for a scale factor) to the generalized log-likelihood ratio for multinomials and closely related to Pearson's $\chi^2$ test. This implies that, when the variables are independent, observed values of mutual information computed using maximum likelihood estimates for probabilities are asymptotically $\chi^2$ distributed except for that scale factor.
In particular, suppose we sample each of $X$ and $Y$ and combine the samples to form $N$ tuples sampled from $X \times Y$. Define $T(x, y)$ to be the total number of times the tuple $(x, y)$ was observed. Further define $T(x, *)$ to be the number of times that a tuple starting with $x$ was observed and $T(*, y)$ to be the number of times that a tuple ending with $y$ was observed. Clearly, $T(*, *)$ is just $N$, the number of tuples in the sample.
From the definition, the generalized log-likelihood ratio test of independence for $X$ and $Y$ (based on the sample of tuples) is \begin{equation*} -2 \log \lambda = 2 \sum_{xy} T(x,y) \log \frac {\pi_{x|y}} {\mu_x} \end{equation*}where \begin{equation*} \pi_{x|y} = T(x,y) / \sum_x T(x,y) = T(x,y) / T(*, y) \end{equation*}and \begin{equation*} \mu_x = T(x,*) / T(*, *) \end{equation*}This allows the log-likelihood ratio to be expressed in terms of row and column sums, \begin{equation*} -2 \log \lambda = 2 \sum_{xy} T(x,y) \log {\frac {T(x,y) T(*, *)} {T(x, *) T(*, y) } } \end{equation*}Dividing the numerator and denominator inside the logarithm by $T(*, *)^2$ reduces this to the following expression in terms of the maximum likelihood estimates of cell, row and column probabilities, $\pi_{xy} = T(x,y)/T(*,*)$, $\mu_{x*} = T(x,*)/T(*,*)$ and $\mu_{*y} = T(*,y)/T(*,*)$, \begin{equation*} -2 \log \lambda = 2 \sum_{xy} T(x,y) \log {\frac {\pi_{xy}} { \mu_{*y} \mu_{x*} } } \end{equation*}Since $T(x,y) = N \pi_{xy}$, $\sum_y \pi_{xy} = \mu_{x*}$ and $\sum_x \pi_{xy} = \mu_{*y}$, this can be rearranged into \begin{equation*} -2 \log \lambda = 2 N \left[ \sum_{xy} \pi_{xy} \log \pi_{xy} - \sum_{x} \mu_{x*} \log \mu_{x*} - \sum_{y} \mu_{*y} \log \mu_{*y} \right] = 2 N \hat I (X;Y) \end{equation*}where the hat indicates a maximum likelihood estimate of $I(X;Y)$ and all logarithms are natural, so that $\hat I(X;Y)$ is measured in nats.
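The identity between the count form of $-2 \log \lambda$ and $2 N \hat I(X;Y)$ can be checked numerically. The following is a minimal sketch in Python, not part of the original entry; the function name `g_statistic` and the toy sample are illustrative assumptions. Cells with $T(x,y) = 0$ are never visited, which matches the usual convention $0 \log 0 = 0$.

```python
import math
from collections import Counter

def g_statistic(pairs):
    """Return (-2 log lambda, 2 N I_hat) for a list of observed (x, y) tuples.

    Hypothetical helper for illustration; natural logarithms throughout.
    """
    T = Counter(pairs)                  # T(x, y): cell counts
    Tx = Counter(x for x, _ in pairs)   # T(x, *): row sums
    Ty = Counter(y for _, y in pairs)   # T(*, y): column sums
    N = len(pairs)                      # T(*, *): total number of tuples

    # Count form: 2 * sum T(x,y) * log( T(x,y) T(*,*) / (T(x,*) T(*,y)) )
    g = 2.0 * sum(t * math.log(t * N / (Tx[x] * Ty[y]))
                  for (x, y), t in T.items())

    # Entropy form: 2N [ sum pi log pi - sum mu_x* log mu_x* - sum mu_*y log mu_*y ]
    def plogp(counts):
        return sum((c / N) * math.log(c / N) for c in counts.values())

    mi_hat = plogp(T) - plogp(Tx) - plogp(Ty)
    return g, 2.0 * N * mi_hat

# Toy sample (made up for this sketch): a dependent 2 x 2 table with N = 100.
pairs = [("a", 0)] * 30 + [("a", 1)] * 10 + [("b", 0)] * 15 + [("b", 1)] * 45
g, two_n_mi = g_statistic(pairs)
print(g, two_n_mi)  # the two values agree up to floating-point rounding
```

A table whose cell counts factor exactly into row and column proportions gives a statistic of zero, since every ratio inside the logarithm is then 1.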
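This also gives the asymptotic distribution of $\hat I(X;Y)$ when $X$ and $Y$ are independent: $2N \hat I(X;Y)$ is asymptotically $\chi^2$ distributed, so $\hat I(X;Y)$ itself is distributed as $1/(2N)$ times a $\chi^2$ deviate. For the test of independence the number of degrees of freedom is $(r-1)(c-1)$, where $r$ and $c$ are the numbers of distinct values taken by $X$ and $Y$.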
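The asymptotic claim can be illustrated by simulation. The sketch below, which is not from the original entry, draws independent $X$ and $Y$, computes $2 N \hat I(X;Y)$ for each sample, and compares the average with the mean of the reference $\chi^2$ distribution. It assumes the standard result that the independence test has $(r-1)(c-1)$ degrees of freedom and that a $\chi^2$ deviate has mean equal to its degrees of freedom; the marginal probabilities, sample sizes and function names are arbitrary choices made for illustration.

```python
import math
import random
from collections import Counter

def two_n_mi(pairs):
    """Compute 2 N I_hat(X;Y) (natural log) from a list of (x, y) tuples."""
    T = Counter(pairs)
    Tx = Counter(x for x, _ in pairs)
    Ty = Counter(y for _, y in pairs)
    N = len(pairs)
    return 2.0 * sum(t * math.log(t * N / (Tx[x] * Ty[y]))
                     for (x, y), t in T.items())

def simulate(n_trials=1000, n_samples=300, seed=0):
    """Sample X and Y independently and collect 2 N I_hat for each trial."""
    rng = random.Random(seed)
    xs, x_weights = [0, 1], [0.4, 0.6]          # assumed marginal for X (r = 2)
    ys, y_weights = [0, 1, 2], [0.2, 0.3, 0.5]  # assumed marginal for Y (c = 3)
    stats = []
    for _ in range(n_trials):
        x = rng.choices(xs, weights=x_weights, k=n_samples)
        y = rng.choices(ys, weights=y_weights, k=n_samples)
        stats.append(two_n_mi(list(zip(x, y))))
    return stats

stats = simulate()
mean_stat = sum(stats) / len(stats)
dof = (2 - 1) * (3 - 1)  # (r - 1)(c - 1) for a 2 x 3 table
print(mean_stat, dof)    # the sample mean should be close to dof
```

Agreement of the means is only a rough sanity check; a fuller comparison would look at quantiles of `stats` against those of the $\chi^2$ distribution.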
"derivation of mutual information" is owned by tdunning.
This is version 2 of derivation of mutual information, born on 2005-05-01, modified 2005-05-01.
Object id is 6994, canonical name is DerivationOfMutualInformation.
Classification: AMS MSC 94A17 (Information and communication, circuits :: Communication, information :: Measures of information, entropy)