<?xml version="1.0" encoding="UTF-8"?>

<record version="1" id="2887">
 <title>mutual information</title>
 <name>MutualInformation</name>
 <created>2002-04-30 01:00:13</created>
 <modified>2002-04-30 01:00:13</modified>
 <type>Definition</type>
 <creator id="72" name="drummond"/>
 <author id="72" name="drummond"/>
 <classification>
	<category scheme="msc" code="94A17"/>
 </classification>
 <defines>
	<concept>information</concept>
	<concept>mutual information</concept>
 </defines>
 <synonyms>
	<synonym concept="mutual information" alias="information"/>
 </synonyms>
 <related>
	<object name="RelativeEntropy"/>
	<object name="Entropy"/>
	<object name="ShannonsTheoremEntropy"/>
	<object name="DynamicStream"/>
 </related>
 <preamble>\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}

% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic} 

% there are many more packages, add them here as you need them

% define commands here
\newcommand{\md}{d}
\newcommand{\mv}[1]{\mathbf{#1}}	% matrix or vector
\newcommand{\mvt}[1]{\mv{#1}^{\mathrm{T}}}
\newcommand{\mvi}[1]{\mv{#1}^{-1}}
\newcommand{\mderiv}[1]{\frac{\md}{\md {#1}}} %d/dx
\newcommand{\mnthderiv}[2]{\frac{\md^{#2}}{\md {#1}^{#2}}} %d^n/dx
\newcommand{\mpderiv}[1]{\frac{\partial}{\partial {#1}}} %partial d^n/dx
\newcommand{\mnthpderiv}[2]{\frac{\partial^{#2}}{\partial {#1}^{#2}}} %partial d^n/dx
\newcommand{\borel}{\mathfrak{B}}
\newcommand{\integers}{\mathbb{Z}}
\newcommand{\rationals}{\mathbb{Q}}
\newcommand{\reals}{\mathbb{R}}
\newcommand{\complexes}{\mathbb{C}}
\newcommand{\naturals}{\mathbb{N}}
\newcommand{\defined}{:=}
\newcommand{\var}{\mathrm{var}}
\newcommand{\cov}{\mathrm{cov}}
\newcommand{\corr}{\mathrm{corr}}
\newcommand{\set}[1]{\left\{#1\right\}}
\newcommand{\powerset}[1]{\mathcal{P}(#1)}
\newcommand{\bra}[1]{\langle#1 \vert}
\newcommand{\ket}[1]{\vert \hspace{1pt}#1\rangle}
\newcommand{\braket}[2]{\langle #1 \ket{#2}}
\newcommand{\abs}[1]{\left|#1\right|}
\newcommand{\norm}[1]{\left|\left|#1\right|\right|}
\newcommand{\esssup}{\mathrm{ess\ sup}}
\newcommand{\Lspace}[1]{L^{#1}}
\newcommand{\Lone}{\Lspace{1}}
\newcommand{\Ltwo}{\Lspace{2}}
\newcommand{\Lp}{\Lspace{p}}
\newcommand{\Lq}{\Lspace{q}}
\newcommand{\Linf}{\Lspace{\infty}}
\newcommand{\sequence}[1]{\{#1\}}</preamble>
 <content>Let $(\Omega, \mathcal{F}, \mu)$ be a discrete probability space, and let $X$ and $Y$ be discrete random variables on $\Omega$.  

The \emph{mutual information} $I[X;Y]$, read as ``the mutual information of $X$ and $Y$,'' is defined as
\begin{align*}
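I[X;Y] &amp;= \sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} \mu(X=x,Y=y) \log \frac{\mu(X=x,Y=y)}{\mu(X=x)\mu(Y=y)}\\
&amp;= D(\mu(x,y)\,\|\,\mu(x)\mu(y)),
\end{align*}
where $\mathcal{X}$ and $\mathcal{Y}$ denote the sets of values taken by $X$ and $Y$, $D$ denotes the relative entropy, and terms with $\mu(X=x,Y=y)=0$ are taken to contribute zero.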

Mutual information, or just information, is measured in bits if the logarithm is to the base 2, and in ``nats'' when using the natural logarithm.
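
As an illustration (the numbers here are chosen purely for the example and are not part of the definition), suppose $X$ and $Y$ each take values in $\{0,1\}$ with joint probabilities $\mu(X=0,Y=0)=\tfrac12$, $\mu(X=0,Y=1)=\tfrac14$, $\mu(X=1,Y=0)=0$ and $\mu(X=1,Y=1)=\tfrac14$.  The marginals are then $\mu(X=0)=\tfrac34$, $\mu(X=1)=\tfrac14$ and $\mu(Y=0)=\mu(Y=1)=\tfrac12$.  Using base-2 logarithms and the usual convention $0 \log 0 = 0$, the definition gives
\begin{align*}
I[X;Y] &amp;= \tfrac12\log_2\frac{1/2}{(3/4)(1/2)} + \tfrac14\log_2\frac{1/4}{(3/4)(1/2)} + \tfrac14\log_2\frac{1/4}{(1/4)(1/2)}\\
&amp;= \tfrac12\log_2\tfrac43 + \tfrac14\log_2\tfrac23 + \tfrac14\log_2 2\\
&amp;= \tfrac32 - \tfrac34\log_2 3 \approx 0.311 \text{ bits}.
\end{align*}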

\paragraph{Discussion}
The most obvious characteristic of mutual information is that it depends on both $X$ and $Y$.  There is no information in a vacuum---information is always \emph{about} something.  In this case, $I[X;Y]$ is the information in $X$ about $Y$.  As its name suggests, mutual information is symmetric, $I[X;Y] = I[Y;X]$, so any information $X$ carries about $Y$, $Y$ also carries about $X$.

The definition in terms of relative entropy gives a useful interpretation of $I[X;Y]$ as a kind of ``distance'' between the joint distribution $\mu(x,y)$ and the product distribution $\mu(x)\mu(y)$.  Relative entropy is not a true distance (it is not symmetric and does not satisfy the triangle inequality), so this is only a conceptual tool.  It does, however, capture another intuitive notion of information.  For independent $X$ and $Y$ we have $\mu(x,y) = \mu(x)\mu(y)$, so the relative entropy ``distance'' is zero and $I[X;Y]=0$, as one would expect for independent random variables.
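
Written out from the definition, the independent case is immediate.  Substituting $\mu(X=x,Y=y)=\mu(X=x)\mu(Y=y)$, with the sums running over the values of $X$ and $Y$ and zero-probability terms contributing nothing, makes every logarithm vanish:
\begin{align*}
I[X;Y] &amp;= \sum_{x}\sum_{y} \mu(X=x)\mu(Y=y) \log \frac{\mu(X=x)\mu(Y=y)}{\mu(X=x)\mu(Y=y)}\\
&amp;= \sum_{x}\sum_{y} \mu(X=x)\mu(Y=y) \log 1 = 0.
\end{align*}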

A number of useful expressions, most of them apparent from the definition, relate mutual information to the entropy $H$:

\begin{align}
0 \le I[X;Y] &amp;\le H[X]\\
I[X;Y] &amp;= H[X] - H[X|Y]\\
I[X;Y] &amp;= H[X] + H[Y] - H[X,Y]\\
I[X;X] &amp;= H[X]
\end{align}

Recall that the entropy $H[X]$ quantifies our uncertainty about $X$.  The last line justifies the description of entropy as ``self-information.''
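
As a sketch of why the second and third identities hold, write $p(x,y)$, $p(x)$ and $p(y)$ as shorthand (introduced here only for brevity) for $\mu(X=x,Y=y)$, $\mu(X=x)$ and $\mu(Y=y)$, and assume the usual definitions $H[X] = -\sum_x p(x)\log p(x)$ and $H[X|Y] = -\sum_{x,y} p(x,y)\log p(x|y)$ with $p(x|y) = p(x,y)/p(y)$.  Then
\begin{align*}
I[X;Y] &amp;= \sum_{x,y} p(x,y)\log\frac{p(x|y)}{p(x)}\\
&amp;= -\sum_{x,y} p(x,y)\log p(x) + \sum_{x,y} p(x,y)\log p(x|y)\\
&amp;= H[X] - H[X|Y].
\end{align*}
The third identity then follows from the chain rule $H[X,Y] = H[Y] + H[X|Y]$, and taking $Y = X$ gives $H[X|X] = 0$ and hence $I[X;X] = H[X]$.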

\paragraph{Historical Notes}
Mutual information, or simply information, was introduced by Shannon in his landmark 1948 paper ``A Mathematical Theory of Communication.''</content>
</record>
