Fisher information matrix

Given a statistical model $\{f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})\}$ of a random vector $\textbf{X}$, the Fisher information matrix, $I$, is the variance (that is, the covariance matrix) of the score function $U$. So,

$I=\operatorname{Var}[U].$

If there is only one parameter involved, then $I$ is simply called the Fisher information or information of $f_{\textbf{X}}(\boldsymbol{x}\mid\theta)$ .

Remarks

•

If $f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})$ belongs to the exponential family, then $\operatorname{E}[U]=0$, and so $I=\operatorname{E}\big{[}U^{\operatorname{T}}U\big{]}$, where $U$ is written as a row vector. Furthermore, with some regularity conditions imposed, we have

$I=-\operatorname{E}\Big{[}\frac{\partial U}{\partial\boldsymbol{\theta}}\Big{]}.$
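
The first identity in this remark can be checked by a short calculation. The following is only a sketch: it assumes that the score is $U=\partial\ln f_{\textbf{X}}/\partial\boldsymbol{\theta}$ (a definition not restated in this entry), that $f_{\textbf{X}}$ is a density, and that differentiation may be moved outside the integral.

```latex
\begin{align*}
\operatorname{E}[U]
  &= \int \frac{\partial \ln f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,
          f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})\,d\boldsymbol{x}
   = \int \frac{\partial f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,d\boldsymbol{x}
   = \frac{\partial}{\partial\boldsymbol{\theta}} \int f_{\textbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta})\,d\boldsymbol{x}
   = \frac{\partial}{\partial\boldsymbol{\theta}}\,1 = 0, \\
\operatorname{Var}[U]
  &= \operatorname{E}\big[U^{\operatorname{T}}U\big]
     - \operatorname{E}[U]^{\operatorname{T}}\operatorname{E}[U]
   = \operatorname{E}\big[U^{\operatorname{T}}U\big].
\end{align*}
```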
•

As an example, the normal distribution, $N(\mu,\sigma^{2})$ , belongs to the exponential family and its log-likelihood function $\ell(\boldsymbol{\theta}\mid x)$ is

$-\frac{1}{2}\operatorname{ln}(2\pi\sigma^{2})-\frac{(x-\mu)^{2}}{2\sigma^{2}},$

where $\boldsymbol{\theta}=(\mu,\sigma^{2})$ . Then the score function $U(\boldsymbol{\theta})$ is given by

$\Big{(}\frac{\partial\ell}{\partial\mu},\frac{\partial\ell}{\partial\sigma^{2}}\Big{)}=\Big{(}\frac{x-\mu}{\sigma^{2}},\frac{(x-\mu)^{2}}{2\sigma^{4}}-\frac{1}{2\sigma^{2}}\Big{)}.$

Taking the derivative with respect to $\boldsymbol{\theta}$ , we have

$\frac{\partial U}{\partial\boldsymbol{\theta}}=\begin{pmatrix}\displaystyle{\frac{\partial U_{1}}{\partial\mu}}&\displaystyle{\frac{\partial U_{2}}{\partial\mu}}\\ \\ \displaystyle{\frac{\partial U_{1}}{\partial\sigma^{2}}}&\displaystyle{\frac{\partial U_{2}}{\partial\sigma^{2}}}\\ \end{pmatrix}=\begin{pmatrix}\displaystyle{\frac{-1}{\sigma^{2}}}&\displaystyle{-\frac{x-\mu}{\sigma^{4}}}\\ \\ \displaystyle{-\frac{x-\mu}{\sigma^{4}}}&\displaystyle{\frac{1}{2\sigma^{4}}-\frac{(x-\mu)^{2}}{\sigma^{6}}}\end{pmatrix}.$

Therefore, since $\operatorname{E}[x-\mu]=0$ and $\operatorname{E}[(x-\mu)^{2}]=\sigma^{2}$, the Fisher information matrix $I$ is

$-\operatorname{E}\Big{[}\frac{\partial U}{\partial\boldsymbol{\theta}}\Big{]}=\frac{1}{2\sigma^{4}}\begin{pmatrix}2\sigma^{2}&0\\ 0&1\end{pmatrix}.$
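
As a rough numerical check of this example, one can simulate the score of $N(\mu,\sigma^{2})$ and compare its sample covariance with $\operatorname{diag}\big(1/\sigma^{2},\,1/(2\sigma^{4})\big)$, which is what $-\operatorname{E}[\partial U/\partial\boldsymbol{\theta}]$ evaluates to. The sketch below uses only the Python standard library; the function names, the seed, the sample size and the values $\mu=1$, $\sigma^{2}=4$ are illustrative assumptions, not part of the entry.

```python
import random


def score(x, mu, var):
    """Score of one observation x ~ N(mu, var) with respect to (mu, var)."""
    d = x - mu
    return (d / var, d * d / (2 * var ** 2) - 1 / (2 * var))


def empirical_information(mu, var, n=200_000, seed=0):
    """Monte Carlo estimate of the 2x2 matrix Var[U] from n simulated scores."""
    rng = random.Random(seed)
    sd = var ** 0.5
    scores = [score(rng.gauss(mu, sd), mu, var) for _ in range(n)]
    mean = [sum(s[j] for s in scores) / n for j in range(2)]
    return [[sum((s[j] - mean[j]) * (s[k] - mean[k]) for s in scores) / n
             for k in range(2)] for j in range(2)]


def exact_information(var):
    """Closed form for N(mu, var): diag(1/var, 1/(2 var^2))."""
    return [[1 / var, 0.0], [0.0, 1 / (2 * var ** 2)]]


if __name__ == "__main__":
    mu, var = 1.0, 4.0  # illustrative values, not taken from the entry
    print("simulated:", empirical_information(mu, var))
    print("exact:    ", exact_information(var))
```

The simulated matrix should agree with the closed form only up to Monte Carlo error, so the comparison is a sanity check rather than a proof.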
•

Now, in a linear regression model with normally distributed errors of constant variance $\sigma^{2}$, it can be shown that the Fisher information matrix $I$ for the vector of regression coefficients is

$\frac{1}{\sigma^{2}}\textbf{X}^{\operatorname{T}}\textbf{X},$

where $\textbf{X}$ is the design matrix of the regression model.
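
The regression formula can be illustrated with a small computation. This sketch assumes the model $\boldsymbol{y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{e}$ with independent $N(0,\sigma^{2})$ errors, for which the score with respect to $\boldsymbol{\beta}$ is $\textbf{X}^{\operatorname{T}}(\boldsymbol{y}-\textbf{X}\boldsymbol{\beta})/\sigma^{2}$. The function names, the design matrix, the coefficients and the seed are hypothetical choices made for the example.

```python
import random


def fisher_information(X, sigma2):
    """Return X^T X / sigma2 for a design matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[j] * row[k] for row in X) / sigma2 for k in range(p)]
            for j in range(p)]


def score(X, y, beta, sigma2):
    """Score for beta under y = X beta + e with e ~ N(0, sigma2 I)."""
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, resid)) / sigma2
            for j in range(len(beta))]


def simulated_information(X, beta, sigma2, reps=20_000, seed=0):
    """Average of U^T U over simulated responses (estimates Var[U], as E[U] = 0)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    p = len(beta)
    total = [[0.0] * p for _ in range(p)]
    for _ in range(reps):
        y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sd)
             for row in X]
        u = score(X, y, beta, sigma2)
        for j in range(p):
            for k in range(p):
                total[j][k] += u[j] * u[k]
    return [[t / reps for t in row] for row in total]


if __name__ == "__main__":
    # Hypothetical design: an intercept column and one covariate.
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    print("formula:  ", fisher_information(X, 2.0))
    print("simulated:", simulated_information(X, [0.5, -1.0], 2.0))
```

Note that the information does not depend on $\boldsymbol{\beta}$ or on the responses, only on the design matrix and $\sigma^{2}$.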
•

In general, the Fisher information measures how much “information” the data carry about a parameter $\theta$. If $T$ is an unbiased estimator of $\theta$, it can be shown that

$\operatorname{Var}\big{[}T(X)\big{]}\geq\frac{1}{I(\theta)}.$

This is known as the Cramér-Rao inequality, and the number $1/I(\theta)$ is known as the Cramér-Rao lower bound. The larger the information $I(\theta)$, the smaller the variance that an unbiased estimator of $\theta$ can attain. If there is more than one parameter, the above can be generalized by saying that

$\operatorname{Var}\big{[}T(X)\big{]}-I(\boldsymbol{\theta})^{-1}$

is positive semidefinite, where $\operatorname{Var}[T(X)]$ is the covariance matrix of the unbiased estimator $T$ and $I$ is the Fisher information matrix.
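
A simple case in which the lower bound is attained is the sample mean of $n$ independent $N(\mu,\sigma^{2})$ observations with $\sigma^{2}$ known. The sketch below relies on the standard fact, not stated in this entry, that the information of $n$ independent observations is $n$ times that of one, so that the bound for $\mu$ is $\sigma^{2}/n$. The function names and all numerical values are illustrative assumptions.

```python
import random


def cramer_rao_bound(sigma2, n):
    """Bound 1 / (n * I(mu)) for mu, with I(mu) = 1 / sigma2 per observation."""
    return sigma2 / n


def variance_of_sample_mean(mu, sigma2, n, reps=20_000, seed=0):
    """Monte Carlo variance of the sample mean of n draws from N(mu, sigma2)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    means = [sum(rng.gauss(mu, sd) for _ in range(n)) / n for _ in range(reps)]
    centre = sum(means) / reps
    return sum((m - centre) ** 2 for m in means) / reps


if __name__ == "__main__":
    mu, sigma2, n = 0.0, 4.0, 10  # illustrative values, not from the entry
    print("Cramér-Rao bound:       ", cramer_rao_bound(sigma2, n))
    print("simulated Var of mean:  ", variance_of_sample_mean(mu, sigma2, n))
```

The simulated variance should sit close to the bound, up to Monte Carlo error, because the sample mean is unbiased for $\mu$ and has variance exactly $\sigma^{2}/n$.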

Title	Fisher information matrix
Canonical name	FisherInformationMatrix
Date of creation	2013-03-22 14:30:15
Last modified on	2013-03-22 14:30:15
Owner	CWoo (3771)
Last modified by	CWoo (3771)
Numerical id	14
Author	CWoo (3771)
Entry type	Definition
Classification	msc 62H99
Classification	msc 62B10
Classification	msc 62A01
Synonym	information matrix
Defines	Fisher information
Defines	information
Defines	Cramer-Rao inequality
Defines	Cramer-Rao lower bound