# Fisher information matrix

Given a statistical model $\{f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})\}$ of a random vector $\textbf{X}$, the Fisher information matrix, $I$, is the variance (that is, the covariance matrix) of the score function $U$. That is,

 $I=\operatorname{Var}[U].$

If there is only one parameter involved, then $I$ is simply called the Fisher information or information of $f_{\textbf{X}}(\boldsymbol{x}\mid\theta)$.
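As a one-parameter illustration, the sketch below computes $\operatorname{Var}[U]$ exactly for a Bernoulli model with success probability $p$. The Bernoulli model and the function names are assumptions chosen for this example and are not part of the entry; the known closed form $1/(p(1-p))$ is used only as a check.

```python
def bernoulli_score(x, p):
    """Score d/dp of log f(x | p) = x*log(p) + (1 - x)*log(1 - p)."""
    return x / p - (1 - x) / (1 - p)


def bernoulli_fisher_information(p):
    """Var[U], computed exactly by summing over the outcomes x in {0, 1}."""
    probs = {0: 1 - p, 1: p}
    mean = sum(probs[x] * bernoulli_score(x, p) for x in (0, 1))
    return sum(probs[x] * (bernoulli_score(x, p) - mean) ** 2 for x in (0, 1))


p = 0.3
info = bernoulli_fisher_information(p)
closed_form = 1 / (p * (1 - p))  # textbook value of the Bernoulli information
print(info, closed_form)
```

The information is smallest at $p=1/2$ and grows without bound as $p$ approaches 0 or 1.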

Remarks

• If $f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})$ belongs to the exponential family, then $\operatorname{E}[U]=0$, so the variance of the score (written as a row vector) reduces to $I=\operatorname{E}\big[U^{\operatorname{T}}U\big]$. Furthermore, with some regularity conditions imposed, we have

 $I=-\operatorname{E}\Big{[}\frac{\partial U}{\partial\boldsymbol{\theta}}\Big{]}.$
• As an example, the normal distribution, $N(\mu,\sigma^{2})$, belongs to the exponential family and its log-likelihood function $\ell(\boldsymbol{\theta}\mid x)$ is

 $-\frac{1}{2}\operatorname{ln}(2\pi\sigma^{2})-\frac{(x-\mu)^{2}}{2\sigma^{2}},$

where $\boldsymbol{\theta}=(\mu,\sigma^{2})$. Then the score function $U(\boldsymbol{\theta})$ is given by

 $\Big(\frac{\partial\ell}{\partial\mu},\frac{\partial\ell}{\partial\sigma^{2}}\Big)=\Big(\frac{x-\mu}{\sigma^{2}},\frac{(x-\mu)^{2}}{2\sigma^{4}}-\frac{1}{2\sigma^{2}}\Big).$

Taking the derivative with respect to $\boldsymbol{\theta}$, we have

 $\frac{\partial U}{\partial\boldsymbol{\theta}}=\begin{pmatrix}\displaystyle\frac{\partial U_{1}}{\partial\mu}&\displaystyle\frac{\partial U_{2}}{\partial\mu}\\ \\ \displaystyle\frac{\partial U_{1}}{\partial\sigma^{2}}&\displaystyle\frac{\partial U_{2}}{\partial\sigma^{2}}\end{pmatrix}=\begin{pmatrix}\displaystyle\frac{-1}{\sigma^{2}}&\displaystyle-\frac{x-\mu}{\sigma^{4}}\\ \\ \displaystyle-\frac{x-\mu}{\sigma^{4}}&\displaystyle\frac{1}{2\sigma^{4}}-\frac{(x-\mu)^{2}}{\sigma^{6}}\end{pmatrix}.$

Therefore, since $\operatorname{E}[x-\mu]=0$ and $\operatorname{E}[(x-\mu)^{2}]=\sigma^{2}$, the Fisher information matrix $I$ is

 $-\operatorname{E}\Big[\frac{\partial U}{\partial\boldsymbol{\theta}}\Big]=\frac{1}{2\sigma^{4}}\begin{pmatrix}2\sigma^{2}&0\\ 0&1\end{pmatrix}.$
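The two expressions for $I$ in the normal example can be checked by simulation. The following is a minimal sketch, assuming arbitrary values $\mu=1$, $\sigma^{2}=2$ and hypothetical helper names; it estimates both $\operatorname{E}[U^{\operatorname{T}}U]$ and $-\operatorname{E}[\partial U/\partial\boldsymbol{\theta}]$ by Monte Carlo and compares them with the diagonal matrix with entries $1/\sigma^{2}$ and $1/(2\sigma^{4})$.

```python
import random


def score(x, mu, var):
    """Score U = (dl/dmu, dl/dvar) of the N(mu, var) log-likelihood."""
    d = x - mu
    return (d / var, d * d / (2 * var ** 2) - 1 / (2 * var))


def score_jacobian(x, mu, var):
    """Second derivatives dU/dtheta as a 2x2 nested tuple."""
    d = x - mu
    off = -d / var ** 2
    return ((-1 / var, off), (off, 1 / (2 * var ** 2) - d * d / var ** 3))


def estimate_information(mu, var, n=100_000, seed=0):
    """Monte Carlo averages of U^T U and of -dU/dtheta over n draws."""
    rng = random.Random(seed)
    sd = var ** 0.5
    outer = [[0.0, 0.0], [0.0, 0.0]]    # running sum of U^T U
    neg_jac = [[0.0, 0.0], [0.0, 0.0]]  # running sum of -dU/dtheta
    for _ in range(n):
        x = rng.gauss(mu, sd)
        u = score(x, mu, var)
        j = score_jacobian(x, mu, var)
        for a in range(2):
            for b in range(2):
                outer[a][b] += u[a] * u[b]
                neg_jac[a][b] -= j[a][b]
    return ([[v / n for v in row] for row in outer],
            [[v / n for v in row] for row in neg_jac])


mu, var = 1.0, 2.0  # illustrative parameter values, not from the entry
by_outer, by_jacobian = estimate_information(mu, var)
exact = [[1 / var, 0.0], [0.0, 1 / (2 * var ** 2)]]
print(by_outer, by_jacobian, exact)
```

Both estimates should agree with `exact` up to simulation noise, and both diagonal entries are positive, as they must be for a variance matrix.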
• Now, in a linear regression model with normally distributed errors of constant variance $\sigma^{2}$, it can be shown that the Fisher information matrix $I$ for the regression coefficients is

 $\frac{1}{\sigma^{2}}\textbf{X}^{\operatorname{T}}\textbf{X},$

where $\textbf{X}$ here denotes the design matrix of the regression model.
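A small numerical sketch of the regression case follows. It assumes normally distributed errors, for which the score with respect to the coefficient vector $\boldsymbol{\beta}$ is $\textbf{X}^{\operatorname{T}}(\boldsymbol{y}-\textbf{X}\boldsymbol{\beta})/\sigma^{2}$; the design matrix, coefficients and function names are made up for illustration.

```python
import random


def information_matrix(X, sigma2):
    """Return X^T X / sigma^2 for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) / sigma2 for b in range(p)]
            for a in range(p)]


def score(X, y, beta, sigma2):
    """Score for beta under normal errors: X^T (y - X beta) / sigma^2."""
    resid = [yi - sum(xij * bj for xij, bj in zip(row, beta))
             for row, yi in zip(X, y)]
    return [sum(row[a] * r for row, r in zip(X, resid)) / sigma2
            for a in range(len(beta))]


# Hypothetical design: an intercept column and one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta, sigma2 = [0.5, -1.0], 4.0
exact = information_matrix(X, sigma2)

# Monte Carlo estimate of E[U^T U] at the true beta (where E[U] = 0).
rng = random.Random(0)
n_rep = 50_000
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(n_rep):
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, sigma2 ** 0.5)
         for row in X]
    u = score(X, y, beta, sigma2)
    for a in range(2):
        for b in range(2):
            acc[a][b] += u[a] * u[b]
empirical = [[v / n_rep for v in row] for row in acc]
print(exact, empirical)
```

The simulated variance of the score should match $\textbf{X}^{\operatorname{T}}\textbf{X}/\sigma^{2}$ up to sampling error. Note that the information does not depend on $\boldsymbol{\beta}$, only on the design and the error variance.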

• In general, the Fisher information measures how much “information” is known about a parameter $\theta$. If $T$ is an unbiased estimator of $\theta$, it can be shown that

 $\operatorname{Var}\big[T(X)\big]\geq\frac{1}{I(\theta)}.$

This is known as the Cramér-Rao inequality, and the number $1/I(\theta)$ is known as the Cramér-Rao lower bound. The smaller the variance of the estimate of $\theta$, the more information we have on $\theta$. If there is more than one parameter, the above can be generalized by saying that

 $\operatorname{Var}\big{[}T(X)\big{]}-I(\boldsymbol{\theta})^{-1}$

is positive semidefinite, where $I$ is the Fisher information matrix.
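The bound can be illustrated with the sample mean of $n$ independent $N(\mu,\sigma^{2})$ observations. This is a sketch under stated assumptions: information adds over independent observations, so the information about $\mu$ in the sample is $n/\sigma^{2}$, and the parameter values and function name below are arbitrary choices for the example.

```python
import random


def variance_of_sample_mean(mu, var, n, reps=20_000, seed=0):
    """Empirical variance of the mean of n N(mu, var) draws over many replications."""
    rng = random.Random(seed)
    sd = var ** 0.5
    means = [sum(rng.gauss(mu, sd) for _ in range(n)) / n for _ in range(reps)]
    grand = sum(means) / reps
    return sum((m - grand) ** 2 for m in means) / (reps - 1)


mu, var, n = 1.0, 2.0, 10
info_n = n / var    # information about mu in n independent observations
bound = 1 / info_n  # lower bound for any unbiased estimator of mu
observed = variance_of_sample_mean(mu, var, n)
print(observed, bound)
```

The sample mean is unbiased and its variance $\sigma^{2}/n$ equals the bound, so `observed` should be close to `bound`; a simulated value slightly below the bound reflects Monte Carlo noise, not a violation of the inequality.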

| Field | Value |
|---|---|
| Title | Fisher information matrix |
| Canonical name | FisherInformationMatrix |
| Date of creation | 2013-03-22 14:30:15 |
| Last modified on | 2013-03-22 14:30:15 |
| Owner | CWoo (3771) |
| Last modified by | CWoo (3771) |
| Numerical id | 14 |
| Author | CWoo (3771) |
| Entry type | Definition |
| Classification | msc 62H99, msc 62B10, msc 62A01 |
| Synonym | information matrix |
| Defines | Fisher information, information, Cramer-Rao inequality, Cramer-Rao lower bound |