# Fisher information matrix

Given a statistical model $\{f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})\}$ of a random vector $\textbf{X}$, the Fisher information matrix $I$ is the variance of the score function $U$. So,

 $I=\operatorname{Var}[U].$

If there is only one parameter involved, then $I$ is simply called the Fisher information or information of $f_{\textbf{X}}(\boldsymbol{x}\mid\theta)$.
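As a rough numerical illustration of the definition $I=\operatorname{Var}[U]$, the sketch below estimates the variance of the score by simulation in the one-parameter case. It assumes a Bernoulli($p$) model, chosen only for illustration. That model's score is $U(p)=x/p-(1-x)/(1-p)$ and its Fisher information is $1/(p(1-p))$. The function names, sample size and seed are hypothetical choices.

```python
import random
import statistics


def bernoulli_score(x: int, p: float) -> float:
    """Score U(p) = d/dp log f(x | p) for a single Bernoulli(p) observation."""
    return x / p - (1 - x) / (1 - p)


def fisher_info_by_simulation(p: float, draws: int = 100_000, seed: int = 0) -> float:
    """Estimate I(p) = Var[U(p)] by drawing x ~ Bernoulli(p) repeatedly and
    taking the sample variance of the resulting score values."""
    rng = random.Random(seed)
    scores = [bernoulli_score(1 if rng.random() < p else 0, p) for _ in range(draws)]
    return statistics.variance(scores)
```

For example, `fisher_info_by_simulation(0.3)` should come out close to $1/0.21\approx 4.76$. Monte Carlo error shrinks roughly like one over the square root of `draws`.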

## Remarks

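If $f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})$ belongs to the exponential family, then $\operatorname{E}[U]=\boldsymbol{0}$, so the variance reduces to $I=\operatorname{E}\big[U^{\operatorname{T}}U\big]$ (here $U$ is written as a row vector, so $U^{\operatorname{T}}U$ is a matrix). Furthermore, under suitable regularity conditions (for instance, that differentiation with respect to $\boldsymbol{\theta}$ can be interchanged with integration over $\boldsymbol{x}$), we have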

 $I=-\operatorname{E}\Big{[}\frac{\partial U}{\partial\boldsymbol{\theta}}\Big{]}.$
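To see the two expressions agree in a case where everything can be computed exactly, the sketch below assumes a Poisson($\lambda$) model, an exponential family used here only for illustration. There $U=x/\lambda-1$ and $\partial U/\partial\lambda=-x/\lambda^{2}$, so $\operatorname{E}[U]$ should be $0$ and both $\operatorname{E}[U^{2}]$ and $-\operatorname{E}[\partial U/\partial\lambda]$ should equal $1/\lambda$. The expectations are sums over the probability mass function, truncated at a hypothetical `cutoff` that is assumed large enough for the neglected tail to be negligible.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability mass, computed in log space to avoid overflow."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def fisher_info_two_ways(lam: float, cutoff: int = 200) -> tuple[float, float, float]:
    """Return (E[U], E[U^2], -E[dU/dlam]) for one Poisson(lam) observation,
    with each expectation truncated at x = cutoff."""
    mean_score = 0.0
    outer = 0.0        # E[U^2], the one-parameter case of E[U^T U]
    neg_deriv = 0.0    # -E[dU/dlam]
    for x in range(cutoff + 1):
        w = poisson_pmf(x, lam)
        u = x / lam - 1.0        # score
        du = -x / lam**2         # derivative of the score
        mean_score += w * u
        outer += w * u * u
        neg_deriv -= w * du
    return mean_score, outer, neg_deriv
```

With `lam = 3.0`, the second and third values should both be (numerically) $1/3$.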
As an example, the normal distribution, $N(\mu,\sigma^{2})$, belongs to the exponential family and its log-likelihood function $\ell(\boldsymbol{\theta}\mid x)$ is

 $-\frac{1}{2}\operatorname{ln}(2\pi\sigma^{2})-\frac{(x-\mu)^{2}}{2\sigma^{2}},$

where $\boldsymbol{\theta}=(\mu,\sigma^{2})$. Then the score function $U(\boldsymbol{\theta})$ is given by

 $\Big(\frac{\partial\ell}{\partial\mu},\frac{\partial\ell}{\partial\sigma^{2}}\Big)=\Big(\frac{x-\mu}{\sigma^{2}},\frac{(x-\mu)^{2}}{2\sigma^{4}}-\frac{1}{2\sigma^{2}}\Big).$

Taking the derivative with respect to $\boldsymbol{\theta}$, we have

 $\frac{\partial U}{\partial\boldsymbol{\theta}}=\begin{pmatrix}\dfrac{\partial U_{1}}{\partial\mu}&\dfrac{\partial U_{2}}{\partial\mu}\\[2ex]\dfrac{\partial U_{1}}{\partial\sigma^{2}}&\dfrac{\partial U_{2}}{\partial\sigma^{2}}\end{pmatrix}=\begin{pmatrix}-\dfrac{1}{\sigma^{2}}&-\dfrac{x-\mu}{\sigma^{4}}\\[2ex]-\dfrac{x-\mu}{\sigma^{4}}&\dfrac{1}{2\sigma^{4}}-\dfrac{(x-\mu)^{2}}{\sigma^{6}}\end{pmatrix}.$
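Before taking expectations, the score and its derivative matrix for the normal model can be sanity-checked numerically. The sketch below compares the analytic formulas with central finite differences at one arbitrary point $(x,\mu,\sigma^{2})$; the step size `h` and the test point are assumptions, and `s2` stands for $\sigma^{2}$. Entry `[i][j]` of the matrix is $\partial U_{j}/\partial\theta_{i}$, matching the layout above.

```python
import math


def loglik(x: float, mu: float, s2: float) -> float:
    """Log-likelihood of a single N(mu, s2) observation; s2 stands for sigma^2."""
    return -0.5 * math.log(2 * math.pi * s2) - (x - mu) ** 2 / (2 * s2)


def score(x: float, mu: float, s2: float) -> tuple[float, float]:
    """Analytic score (dl/dmu, dl/dsigma^2), as derived in the text."""
    return (x - mu) / s2, (x - mu) ** 2 / (2 * s2**2) - 1 / (2 * s2)


def score_jacobian(x: float, mu: float, s2: float) -> list[list[float]]:
    """Analytic dU/dtheta; entry [i][j] = dU_j / dtheta_i with theta = (mu, sigma^2)."""
    return [
        [-1 / s2, -(x - mu) / s2**2],
        [-(x - mu) / s2**2, 1 / (2 * s2**2) - (x - mu) ** 2 / s2**3],
    ]


def central_diff(f, theta: list[float], i: int, h: float = 1e-5) -> float:
    """Central finite difference of f(mu, s2) with respect to coordinate i of theta."""
    up, down = list(theta), list(theta)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def compare(x: float = 1.7, mu: float = 0.4, s2: float = 2.5):
    """Return (numerical score, analytic score, numerical Jacobian, analytic Jacobian)."""
    theta = [mu, s2]
    num_score = [central_diff(lambda m, v: loglik(x, m, v), theta, i) for i in range(2)]
    num_jac = [
        [central_diff(lambda m, v, j=j: score(x, m, v)[j], theta, i) for j in range(2)]
        for i in range(2)
    ]
    return num_score, list(score(x, mu, s2)), num_jac, score_jacobian(x, mu, s2)
```

Agreement to several decimal places suggests the symbolic differentiation is right; it does not by itself check the expectation step that follows.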

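Since $\operatorname{E}[x-\mu]=0$ and $\operatorname{E}[(x-\mu)^{2}]=\sigma^{2}$ when $x\sim N(\mu,\sigma^{2})$, the Fisher information matrix $I$ is

 $-\operatorname{E}\Big[\frac{\partial U}{\partial\boldsymbol{\theta}}\Big]=\begin{pmatrix}\dfrac{1}{\sigma^{2}}&0\\[2ex]0&\dfrac{1}{2\sigma^{4}}\end{pmatrix}=\frac{1}{2\sigma^{4}}\begin{pmatrix}2\sigma^{2}&0\\ 0&1\end{pmatrix}.$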
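The closed form for $N(\mu,\sigma^{2})$ can also be checked by simulation. The sketch below estimates both $\operatorname{E}[U^{\operatorname{T}}U]$ and $-\operatorname{E}[\partial U/\partial\boldsymbol{\theta}]$ from the Remarks by averaging over draws of $x$. Both estimates should be close to the diagonal matrix with entries $1/\sigma^{2}$ and $1/(2\sigma^{4})$. The number of draws and the seed are arbitrary assumptions.

```python
import random


def normal_fisher_mc(mu: float, s2: float, draws: int = 100_000, seed: int = 1):
    """Monte Carlo estimates of E[U^T U] and -E[dU/dtheta] for N(mu, s2),
    with theta = (mu, sigma^2). Returns two 2x2 nested lists."""
    rng = random.Random(seed)
    sd = s2 ** 0.5
    outer = [[0.0, 0.0], [0.0, 0.0]]
    neg_jac = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(draws):
        d = rng.gauss(mu, sd) - mu
        # Score and its derivative matrix, from the formulas in the text.
        u = (d / s2, d * d / (2 * s2**2) - 1 / (2 * s2))
        jac = ((-1 / s2, -d / s2**2),
               (-d / s2**2, 1 / (2 * s2**2) - d * d / s2**3))
        for i in range(2):
            for j in range(2):
                outer[i][j] += u[i] * u[j] / draws
                neg_jac[i][j] -= jac[i][j] / draws
    return outer, neg_jac
```

For instance, with `mu = 1.0` and `s2 = 2.0` both returned matrices should be near `[[0.5, 0], [0, 0.125]]`; a negative lower-right entry would signal a sign error, since a variance matrix cannot have negative diagonal entries.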
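Now, in a linear regression model $\textbf{Y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$ whose errors are independent and normally distributed with constant variance $\sigma^{2}$, it can be shown that the Fisher information matrix $I$ for the coefficient vector $\boldsymbol{\beta}$ is

 $\frac{1}{\sigma^{2}}\textbf{X}^{\operatorname{T}}\textbf{X},$

where $\textbf{X}$ is the design matrix of the regression model.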
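A sketch of where $\textbf{X}^{\operatorname{T}}\textbf{X}/\sigma^{2}$ comes from, assuming independent $N(0,\sigma^{2})$ errors: the score for $\boldsymbol{\beta}$ is $\textbf{X}^{\operatorname{T}}(\textbf{y}-\textbf{X}\boldsymbol{\beta})/\sigma^{2}$, and its simulated covariance should match the closed form. The small design matrix, coefficients, variance, number of draws and seed used below are made-up values for illustration only.

```python
import random


def xtx_over_sigma2(X: list[list[float]], s2: float) -> list[list[float]]:
    """Closed form X^T X / sigma^2, with X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) / s2 for j in range(p)] for i in range(p)]


def simulated_score_cov(X, beta, s2, draws=50_000, seed=2):
    """Monte Carlo estimate of Var[U] for the score U(beta) = X^T (y - X beta) / sigma^2,
    where y = X beta + e with independent N(0, sigma^2) errors. Since E[U] = 0,
    the average of U^T U (U as a row vector) estimates Var[U]."""
    rng = random.Random(seed)
    sd = s2 ** 0.5
    p = len(beta)
    fitted = [sum(b * xij for b, xij in zip(beta, row)) for row in X]  # X beta
    acc = [[0.0] * p for _ in range(p)]
    for _ in range(draws):
        y = [m + rng.gauss(0.0, sd) for m in fitted]
        resid = [yi - m for yi, m in zip(y, fitted)]
        u = [sum(row[i] * r for row, r in zip(X, resid)) / s2 for i in range(p)]
        for i in range(p):
            for j in range(p):
                acc[i][j] += u[i] * u[j] / draws
    return acc
```

With the intercept-plus-slope design `X = [[1, 0], [1, 1], [1, 2], [1, 3]]` and `s2 = 0.5`, the closed form is `[[8, 12], [12, 28]]`, and the simulated covariance should land within a few percent of it.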

In general, the Fisher information measures how much "information" the observation $X$ carries about a parameter $\theta$. If $T$ is an unbiased estimator of $\theta$, it can be shown, under regularity conditions like those above, that

 $\operatorname{Var}\big[T(X)\big]\geq\frac{1}{I(\theta)}.$

This is known as the Cramér-Rao inequality, and the number $1/I(\theta)$ is known as the Cramér-Rao lower bound. Thus the more information $I(\theta)$ we have on $\theta$, the smaller the variance that an unbiased estimator of $\theta$ can possibly attain. If there is more than one parameter, the above generalizes as follows: for an unbiased estimator $T$ of $\boldsymbol{\theta}$, the matrix

 $\operatorname{Var}\big{[}T(X)\big{]}-I(\boldsymbol{\theta})^{-1}$

is positive semidefinite, where $\operatorname{Var}\big[T(X)\big]$ denotes the covariance matrix of $T(X)$ and $I$ is the Fisher information matrix.
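As a rough numerical illustration, assume $X=(X_{1},\dots,X_{n})$ consists of $n$ independent $N(\mu,\sigma^{2})$ observations. Information from independent observations adds, so $I(\mu)=n/\sigma^{2}$ when $\sigma^{2}$ is known, and $I(\boldsymbol{\theta})=n\operatorname{diag}(1/\sigma^{2},1/(2\sigma^{4}))$ for $\boldsymbol{\theta}=(\mu,\sigma^{2})$. The sketch below simulates the sample mean, whose variance $\sigma^{2}/n$ attains the scalar bound, and the sample median, which is unbiased here by symmetry (with $n$ odd) but should have larger variance. It then checks the matrix version for $T=(\bar{X},S^{2})$, using the standard normal-theory facts $\operatorname{Var}[\bar{X}]=\sigma^{2}/n$ and $\operatorname{Var}[S^{2}]=2\sigma^{4}/(n-1)$, with $\bar{X}$ and $S^{2}$ independent. Sample sizes, seeds and parameter values are arbitrary choices.

```python
import random
import statistics


def estimator_variances(n=25, mu=0.0, s2=1.0, reps=20_000, seed=3):
    """Simulated variances of the sample mean and the sample median of n i.i.d.
    N(mu, s2) observations; both are unbiased for mu here."""
    rng = random.Random(seed)
    sd = s2 ** 0.5
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sd) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.variance(means), statistics.variance(medians)


def crlb_gap(n: int, s2: float) -> list[list[float]]:
    """Var[T] - I(theta)^{-1} for T = (sample mean, sample variance), theta = (mu, s2),
    using the exact normal-theory covariance of T and I(theta) for n observations."""
    var_t = [[s2 / n, 0.0], [0.0, 2 * s2**2 / (n - 1)]]
    inv_info = [[s2 / n, 0.0], [0.0, 2 * s2**2 / n]]
    return [[var_t[i][j] - inv_info[i][j] for j in range(2)] for i in range(2)]


def is_psd_2x2(m: list[list[float]], tol: float = 1e-12) -> bool:
    """A symmetric 2x2 matrix is positive semidefinite iff its trace and determinant are >= 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr >= -tol and det >= -tol
```

With `n = 25` and `s2 = 1.0` the bound for $\mu$ is $0.04$: the simulated variance of the mean should sit near it and that of the median clearly above it. `crlb_gap(25, 1.0)` has a zero in the $\mu$ position (the sample mean is efficient) and a small positive entry $2\sigma^{4}/(n(n-1))$ in the $\sigma^{2}$ position, so the difference is positive semidefinite but $S^{2}$ does not attain the bound.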

