derivation of mutual information
The maximum likelihood estimator for mutual information is identical (except for a scale factor) to the generalized log-likelihood ratio for multinomials and is closely related to Pearson's $\chi^2$ test. This implies that observed values of mutual information computed using maximum likelihood estimates for probabilities are asymptotically $\chi^2$ distributed, except for that scaling factor.
In particular, suppose we sample each of $X$ and $Y$ and combine the samples to form tuples sampled from $X \times Y$. Define $T(x,y)$ to be the total number of times the tuple $(x,y)$ was observed. Further define $T(x,*)$ to be the number of times that a tuple starting with $x$ was observed and $T(*,y)$ to be the number of times that a tuple ending with $y$ was observed. Clearly, $T(*,*)$ is just $N$, the number of tuples in the sample. From the definition, the generalized log-likelihood ratio test of independence for $X$ and $Y$ (based on the sample of tuples) is
$$-2\log\lambda = 2\sum_{x,y} T(x,y)\,\log\frac{\pi_{x|y}}{\mu_{x*}}$$
where $\pi_{x|y} = T(x,y)/T(*,y)$ and $\mu_{x*} = T(x,*)/T(*,*)$.
This allows the log-likelihood ratio to be expressed in terms of row and column sums,
$$-2\log\lambda = 2\sum_{x,y} T(x,y)\,\log\frac{T(x,y)\,T(*,*)}{T(x,*)\,T(*,y)}$$
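This reduces to the following expression in terms of maximum likelihood estimates of cell, row and column probabilities, $\pi_{xy} = T(x,y)/N$, $\mu_{x*} = T(x,*)/N$ and $\mu_{*y} = T(*,y)/N$,
$$-2\log\lambda = 2N\sum_{x,y} \pi_{xy}\,\log\frac{\pi_{xy}}{\mu_{x*}\,\mu_{*y}}$$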
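Since the cell estimates sum to the row and column estimates ($\sum_y \pi_{xy} = \mu_{x*}$ and $\sum_x \pi_{xy} = \mu_{*y}$), this can be rearranged into
$$-2\log\lambda = 2N\left[\sum_{x,y} \pi_{xy}\log\pi_{xy} - \sum_x \mu_{x*}\log\mu_{x*} - \sum_y \mu_{*y}\log\mu_{*y}\right] = 2N\,\hat{I}(X;Y)$$
where the hat indicates a maximum likelihood estimate of $I(X;Y)$.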
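The identity can be checked numerically on a small contingency table. The following is a minimal sketch, not part of the original entry: the function names `llr_statistic` and `mutual_information_mle` and the example counts are hypothetical, natural logarithms are assumed throughout (so the mutual information is in nats), and empty cells are skipped under the usual convention $0 \log 0 = 0$.

```python
import math

def llr_statistic(table):
    """-2 log(lambda) computed from cell counts and row/column sums."""
    row = [sum(r) for r in table]            # T(x,*)
    col = [sum(c) for c in zip(*table)]      # T(*,y)
    n = sum(row)                             # T(*,*) = N
    total = 0.0
    for x, cells in enumerate(table):
        for y, t in enumerate(cells):
            if t > 0:                        # convention: 0 log 0 = 0
                total += t * math.log(t * n / (row[x] * col[y]))
    return 2.0 * total

def mutual_information_mle(table):
    """Plug-in estimate of I(X;Y) in nats, written in the entropy form."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)

    def plogp(count):
        p = count / n
        return p * math.log(p) if p > 0 else 0.0

    joint = sum(plogp(t) for cells in table for t in cells)
    return joint - sum(plogp(r) for r in row) - sum(plogp(c) for c in col)

# Hypothetical counts T(x,y) for a 2x2 table, chosen only for illustration.
counts = [[10, 20], [30, 40]]
n = sum(map(sum, counts))
g = llr_statistic(counts)
mi = mutual_information_mle(counts)
print(g, 2 * n * mi)  # the two values agree up to rounding error
```

Computing the statistic from raw counts and the mutual information from estimated probabilities separately makes the agreement a genuine check of the rearrangement and not a restatement of it.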
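This also gives the asymptotic distribution of $\hat{I}(X;Y)$, measured in nats, as $\frac{1}{2N}$ times a $\chi^2$ deviate. When $X$ and $Y$ are independent, the $\chi^2$ deviate has $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of distinct values of $X$ and $Y$.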
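As an illustration of how this distributional result might be used, the sketch below converts an estimated mutual information into an approximate p-value for a test of independence. It makes several assumptions that are not in the text above: both variables are binary, so the reference distribution is $\chi^2$ with one degree of freedom; the estimate is in nats; and the sample is large enough for the asymptotic approximation to be reasonable. The function name `mi_p_value_2x2` is hypothetical.

```python
import math

def mi_p_value_2x2(mi_nats, n):
    """Approximate p-value for independence from an MLE of I(X;Y).

    Assumes a 2x2 table (one degree of freedom) and mutual information
    measured in nats. Uses the fact that a chi-squared deviate with one
    degree of freedom is the square of a standard normal deviate, so
    P(chi2 > g) = erfc(sqrt(g / 2)).
    """
    g = max(2.0 * n * mi_nats, 0.0)   # clamp tiny negative rounding errors
    return math.erfc(math.sqrt(g / 2.0))

# Hypothetical inputs: an estimate of 0.02 nats from 500 tuples.
print(mi_p_value_2x2(0.02, 500))
```

For tables larger than 2x2 the closed form above no longer applies, and a general $\chi^2$ survival function would be needed instead.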
|Title||derivation of mutual information|
|Date of creation||2013-03-22 15:13:38|
|Last modified on||2013-03-22 15:13:38|
|Last modified by||tdunning (9331)|