derivation of mutual information

The maximum likelihood estimator for mutual information is identical (except for a scale factor) to the generalized log-likelihood ratio for multinomials and is closely related to Pearson's χ² test. This implies that, when X and Y are independent, observed values of mutual information computed using maximum likelihood estimates for probabilities are asymptotically χ² distributed, except for that scaling factor.

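In particular, suppose we sample each of X and Y and combine the samples to form N tuples sampled from X×Y. Define T(x,y) to be the total number of times the tuple (x,y) was observed. Further define T(x,*) to be the number of times that a tuple starting with x was observed and T(*,y) to be the number of times that a tuple ending with y was observed. Clearly, T(*,*) is just N, the number of tuples in the sample. From the definition, the generalized log-likelihood ratio test of independence for X and Y (based on the sample of tuples) is

\[
\lambda = \frac{\max_{\mu,\nu} \prod_{x,y} (\mu_x \nu_y)^{T(x,y)}}{\max_{\pi} \prod_{x,y} \pi_{xy}^{T(x,y)}}
\]

where \pi_{xy} are the parameters of the joint distribution and \mu_x and \nu_y are the parameters of the marginal distributions of X and Y. The maximum likelihood estimates of these parameters are

\[
\hat\pi_{xy} = \frac{T(x,y)}{T(*,*)}, \qquad \hat\mu_x = \frac{T(x,*)}{T(*,*)}, \qquad \hat\nu_y = \frac{T(*,y)}{T(*,*)}
\]

so that

\[
-2 \log \lambda = 2 \sum_{x,y} T(x,y) \log \frac{\hat\pi_{xy}}{\hat\mu_x \hat\nu_y}
\]
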
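This allows the log-likelihood ratio statistic -2 log λ to be expressed in terms of row and column sums,

\[
-2 \log \lambda = 2 \sum_{x,y} T(x,y) \log \frac{T(x,y)\, T(*,*)}{T(x,*)\, T(*,y)}
\]
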
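This reduces to the following expression in terms of the maximum likelihood estimates \hat\pi_{xy} = T(x,y)/N, \hat\mu_x = T(x,*)/N and \hat\nu_y = T(*,y)/N of the cell, row and column probabilities,

\[
-2 \log \lambda = 2N \sum_{x,y} \hat\pi_{xy} \log \frac{\hat\pi_{xy}}{\hat\mu_x \hat\nu_y}
\]
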
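This can be rearranged into

\[
-2 \log \lambda = 2N \left[ \sum_{x,y} \hat\pi_{xy} \log \hat\pi_{xy} - \sum_{x} \hat\mu_x \log \hat\mu_x - \sum_{y} \hat\nu_y \log \hat\nu_y \right] = 2N\, \hat{I}(X;Y)
\]
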
where the hat indicates the maximum likelihood estimate of I(X;Y).
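
The identity between the log-likelihood ratio statistic and 2N times the estimated mutual information can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names `g_statistic` and `mi_plugin` and the example counts are hypothetical, and natural logarithms are assumed throughout, so the estimate is in nats.

```python
import math
from collections import Counter


def margins(counts):
    """Return N, the row sums T(x,*) and the column sums T(*,y)."""
    row, col = Counter(), Counter()
    for (x, y), t in counts.items():
        row[x] += t
        col[y] += t
    return sum(counts.values()), row, col


def g_statistic(counts):
    """-2 log(lambda), written in terms of cell, row and column counts."""
    n, row, col = margins(counts)
    return 2.0 * sum(
        t * math.log(t * n / (row[x] * col[y]))
        for (x, y), t in counts.items()
        if t > 0  # empty cells contribute nothing to the sum
    )


def mi_plugin(counts):
    """Plug-in estimate of I(X;Y), written as three sums of p log p."""
    n, row, col = margins(counts)

    def plogp(ts):
        return sum((t / n) * math.log(t / n) for t in ts if t > 0)

    return plogp(counts.values()) - plogp(row.values()) - plogp(col.values())


# Hypothetical 2x2 table of observed tuple counts T(x, y).
counts = {("a", "u"): 30, ("a", "v"): 10, ("b", "u"): 20, ("b", "v"): 40}
n = sum(counts.values())
g = g_statistic(counts)
mi = mi_plugin(counts)
print(g, 2 * n * mi)  # the two values should agree up to rounding error
```

The two functions deliberately use different forms of the expression (count ratios versus sums of p log p), so agreement between them illustrates the rearrangement rather than restating it.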

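This also gives the asymptotic distribution of Î(X;Y). When X and Y are independent, 2N Î(X;Y) (computed with natural logarithms) is asymptotically χ² distributed with (|X|-1)(|Y|-1) degrees of freedom, where |X| and |Y| are the numbers of distinct values of X and Y. Equivalently, Î(X;Y) is asymptotically distributed as 1/(2N) times a χ² deviate.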
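
The scaling can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a proof: X and Y are drawn independently and uniformly from hypothetical alphabets of sizes 2 and 3, the helper name `two_n_mi` is made up for this example, and the only check is that the average of 2N Î(X;Y) lies near the mean of a χ² distribution, which equals its number of degrees of freedom.

```python
import math
import random
from collections import Counter


def two_n_mi(pairs):
    """Return 2N times the plug-in mutual information (nats) of (x, y) tuples."""
    n = len(pairs)
    cell = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    # Counter only stores observed cells, so every t here is positive.
    mi = sum(
        (t / n) * math.log(t * n / (row[x] * col[y]))
        for (x, y), t in cell.items()
    )
    return 2 * n * mi


rng = random.Random(0)  # fixed seed so the run is repeatable
n_tuples, trials = 500, 1000
xs, ys = [0, 1], [0, 1, 2]  # hypothetical alphabets for X and Y

stats = []
for _ in range(trials):
    # X and Y are sampled independently, so the true I(X;Y) is zero.
    pairs = [(rng.choice(xs), rng.choice(ys)) for _ in range(n_tuples)]
    stats.append(two_n_mi(pairs))

mean_stat = sum(stats) / trials
df = (len(xs) - 1) * (len(ys) - 1)
print(mean_stat, df)  # the sample mean should be close to df
```

Note that the estimate Î(X;Y) itself is never exactly zero on finite samples even though the true mutual information is; its expected value under independence is roughly df/(2N).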

Title derivation of mutual information
Canonical name DerivationOfMutualInformation
Date of creation 2013-03-22 15:13:38
Last modified on 2013-03-22 15:13:38
Owner tdunning (9331)
Last modified by tdunning (9331)
Numerical id 5
Author tdunning (9331)
Entry type Derivation
Classification msc 94A17