derivation of mutual information
The maximum likelihood estimator for mutual information is identical (except for a scale factor) to the generalized log-likelihood ratio for multinomials and closely related to Pearson's $\chi^2$ test. This implies that the distribution of observed values of mutual information, computed using maximum likelihood estimates of the probabilities, is asymptotically $\chi^2$ distributed up to that scale factor.
In particular, suppose we sample each of $X$ and $Y$ and combine the samples to form $N$ tuples drawn from $X \times Y$. Define $T(x,y)$ to be the number of times the tuple $(x,y)$ was observed, $T(x,*)$ to be the number of tuples starting with $x$, and $T(*,y)$ to be the number of tuples ending with $y$. Clearly, $T(*,*)$ is just $N$, the total number of tuples in the sample. From the definition, the generalized log-likelihood ratio test of independence for $X$ and $Y$ (based on the sample of tuples) is
$$-2\log\lambda = 2\sum_{x,y} T(x,y)\log\frac{\pi_{x|y}}{\mu_x}$$
where
$$\pi_{x|y} = \frac{T(x,y)}{\sum_x T(x,y)}$$
and
$$\mu_x = \frac{T(x,*)}{T(*,*)}.$$
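To make the notation concrete, here is a minimal Python sketch that draws $N$ tuples and tabulates the counts and plug-in estimates defined above; the sample size, alphabet sizes, and the dependence of $y$ on $x$ are made-up choices for illustration only.

```python
import numpy as np

# Made-up example for illustration: X takes 2 values, Y takes 3,
# and Y is constructed to depend on X.
rng = np.random.default_rng(0)
N = 10000
x = rng.integers(0, 2, size=N)             # samples of X
y = (x + rng.integers(0, 3, size=N)) % 3   # samples of Y, dependent on X

# T[i, j] holds T(x, y): the number of times the tuple (x, y) was observed.
T = np.zeros((2, 3))
np.add.at(T, (x, y), 1)

pi_x_given_y = T / T.sum(axis=0)   # pi_{x|y} = T(x,y) / sum_x T(x,y)
mu_x = T.sum(axis=1) / T.sum()     # mu_x = T(x,*) / T(*,*)
```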
This allows the log-likelihood ratio to be expressed in terms of row and column sums,
$$-2\log\lambda = 2\sum_{x,y} T(x,y)\log\frac{T(x,y)\,T(*,*)}{T(x,*)\,T(*,y)}.$$
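Continuing the sketch above, this form of the statistic can be computed directly from the table of counts `T`:

```python
# -2 log(lambda) in terms of cell counts and row/column sums; empty
# cells contribute 0, by the usual 0 log 0 = 0 convention.
row = T.sum(axis=1, keepdims=True)   # T(x,*)
col = T.sum(axis=0, keepdims=True)   # T(*,y)
N = T.sum()                          # T(*,*), the sample size
nz = T > 0                           # skip empty cells
g = 2.0 * np.sum(T[nz] * np.log((T * N / (row * col))[nz]))
print(g)                             # -2 log(lambda)
```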
In terms of the maximum likelihood estimates of the cell, row, and column probabilities, $\pi_{xy} = T(x,y)/N$, $\mu_{x*} = T(x,*)/N$, and $\mu_{*y} = T(*,y)/N$, the ratio reduces to
$$-2\log\lambda = 2\sum_{x,y} T(x,y)\log\frac{\pi_{xy}}{\mu_{x*}\,\mu_{*y}}.$$
Since $T(x,y) = N\,\pi_{xy}$, this can be rearranged into
$$-2\log\lambda = 2N\left[\sum_{x,y}\pi_{xy}\log\pi_{xy} - \sum_x \mu_{x*}\log\mu_{x*} - \sum_y \mu_{*y}\log\mu_{*y}\right] = 2N\,\hat{I}(X;Y)$$
where the hat indicates a maximum likelihood estimate of $I(X;Y)$.
This also gives the asymptotic distribution of $\hat{I}(X;Y)$ as $1/(2N)$ times a $\chi^2$ deviate with $(|X|-1)(|Y|-1)$ degrees of freedom.
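A short, self-contained numerical check of the identity $-2\log\lambda = 2N\,\hat{I}(X;Y)$: the contingency table below is made-up data, and the helper `entropy` is an illustrative plug-in entropy, not a standard library function.

```python
import numpy as np

# Made-up contingency table of counts T(x, y) for illustration.
T = np.array([[30.0, 10.0, 20.0],
              [15.0, 45.0, 10.0]])
N = T.sum()
row = T.sum(axis=1, keepdims=True)   # T(x,*)
col = T.sum(axis=0, keepdims=True)   # T(*,y)

# -2 log(lambda) via the row/column-sum formula (no empty cells here).
g = 2.0 * np.sum(T * np.log(T * N / (row * col)))

def entropy(p):
    """Plug-in entropy in nats, with the 0 log 0 = 0 convention."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# I_hat(X;Y) = H(mu_x*) + H(mu_*y) - H(pi_xy), all maximum likelihood.
i_hat = entropy(row / N) + entropy(col / N) - entropy(T / N)

print(np.isclose(g, 2.0 * N * i_hat))   # True: -2 log(lambda) = 2 N I_hat
```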
Title | derivation of mutual information
---|---
Canonical name | DerivationOfMutualInformation
Date of creation | 2013-03-22 15:13:38
Last modified on | 2013-03-22 15:13:38
Owner | tdunning (9331)
Last modified by | tdunning (9331)
Numerical id | 5
Author | tdunning (9331)
Entry type | Derivation
Classification | msc 94A17