deviance

Background

One way to test the fit of a generalized linear model $\mathcal{P}$ to some data (with response variable $Y$ and explanatory variable(s) $X$) is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_{0}$. By similarity we mean: given $\mathcal{P}$ with the response variable $Y_{i}\sim f_{Y_{i}}$ and link function $g$ such that $g(\operatorname{E}[Y_{i}])={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}$, the model $\mathcal{P}_{0}$

1. is a generalized linear model of the same data,
2. has the response variable $Y$ distributed as $f_{Y}$, same as found in $\mathcal{P}$,
3. has the same link function $g$ as found in $\mathcal{P}$, such that $g(\operatorname{E}[Y_{i}])={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}_{0}$.

Notice that the only possible difference is found in the parameters $\boldsymbol{\beta}$ .

It is desirable for $\mathcal{P}_{0}$ to serve as a base model when more than one model is being assessed. Two possible candidates for $\mathcal{P}_{0}$ are the null model and the saturated model. The null model $\mathcal{P}_{null}$ is one in which only one parameter $\mu$ is used, so that $g(\operatorname{E}[Y_{i}])=\mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{max}$ is the other extreme, where the maximum number of parameters is used, so that the predicted response values equal the observed response values exactly: $g(\operatorname{E}[Y_{i}])={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}_{max}=g(y_{i})$.

Definition The deviance of a generalized linear model $\mathcal{P}$ is given by

$\operatorname{dev}(\mathcal{P})=2\big{[}\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})\big{]},$

where $\ell$ is the log-likelihood function, $\hat{\boldsymbol{\beta}}$ is the MLE of the parameter vector $\boldsymbol{\beta}$ from $\mathcal{P}$ and $\hat{\boldsymbol{\beta}}_{max}$ is the MLE of parameter vector $\boldsymbol{\beta}_{max}$ from the saturated model $\mathcal{P}_{max}$.
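
As a minimal sketch of this definition, the Python snippet below computes the deviance of a null model directly from the two log-likelihoods. The Poisson family, the count data, and the helper names `poisson_loglik` and `deviance` are illustrative assumptions that do not come from the entry. For the null model, the MLE of the common mean is the sample mean.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum_i [y_i*log(mu_i) - mu_i - log(y_i!)]."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention: y*log(mu) is taken as 0 when y == 0
        # (this also covers mu == 0 in the saturated fit).
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

def deviance(y, mu_hat):
    """dev = 2*[loglik(saturated) - loglik(fitted)]; the saturated fit has mu_i = y_i."""
    return 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu_hat))

# Hypothetical count data, for illustration only.
y = [2, 0, 3, 1, 4]

# Null model: one parameter, so every observation gets the same fitted mean.
mu_null = [sum(y) / len(y)] * len(y)
dev_null = deviance(y, mu_null)
print(dev_null)
```

The term $\log(y_{i}!)$ appears in both log-likelihoods and cancels, so the deviance depends only on the observed and fitted means.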

Example Consider a normal (general) linear model, where the link function is the identity:

$\operatorname{E}[Y_{i}]={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta},$

and the $Y_{i}$ are mutually independent and normally distributed as $N(\mu_{i},\sigma^{2})$. The log-likelihood function is given by

$\ell(\boldsymbol{\beta}\mid\textbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\mu_{i})^{2}-\frac{n\operatorname{ln}(2\pi\sigma^{2})}{2},$

where the $\mu_{i}={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}$ are the predicted response values, and $n$ is the number of observations.

For the model in question, let $\hat{\mu}_{i}={\textbf{X}_{i}}^{\operatorname{T}}\hat{\boldsymbol{\beta}}$ be the fitted mean calculated from the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. Then

$\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\hat{\mu}_{i})^{2}-\frac{n\operatorname{ln}(2\pi\sigma^{2})}{2}.$

For the saturated model $\mathcal{P}_{max}$, the predicted value $(\hat{\mu}_{max})_{i}$ equals the observed response value $y_{i}$. Therefore,

$\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-(\hat{\mu}_{max})_{i})^{2}-\frac{n\operatorname{ln}(2\pi\sigma^{2})}{2}=-\frac{n\operatorname{ln}(2\pi\sigma^{2})}{2}.$

So the deviance is

$\operatorname{dev}(\mathcal{P})=2\big{[}\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})\big{]}=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\hat{\mu}_{i})^{2},$

which is the residual sum of squares, or RSS, used in regression models, divided by the variance $\sigma^{2}$ (treated here as known).
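
The identity above can be checked numerically. The following sketch fits a straight line by least squares (which coincides with maximum likelihood under normal errors) and compares the deviance computed from the log-likelihoods with the residual sum of squares divided by $\sigma^{2}$. The data, the value of `sigma2`, and the helper names `normal_loglik` and `fit_line` are hypothetical choices made for illustration.

```python
import math

def normal_loglik(y, mu, sigma2):
    """Log-likelihood of independent N(mu_i, sigma2) observations."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -rss / (2.0 * sigma2) - n * math.log(2.0 * math.pi * sigma2) / 2.0

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data; sigma2 is assumed known, as in the example above.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.2, 1.9, 3.2, 3.9]
sigma2 = 0.25

b0, b1 = fit_line(x, y)
mu_hat = [b0 + b1 * xi for xi in x]

# Saturated model: fitted values equal the observations themselves.
dev = 2.0 * (normal_loglik(y, y, sigma2) - normal_loglik(y, mu_hat, sigma2))
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

print(dev, rss / sigma2)  # the two values should agree up to rounding
```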

Remarks

• The deviance is necessarily non-negative.

• The distribution of the deviance is asymptotically a chi-square distribution (http://planetmath.org/ChiSquaredRandomVariable) with $n-p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.

• If two generalized linear models $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ are nested, say $\mathcal{P}_{1}$ is nested within $\mathcal{P}_{2}$, we can test the hypothesis $H_{0}$: the model for the data is $\mathcal{P}_{1}$ with $p_{1}$ parameters, against $H_{1}$: the model for the data is the more general $\mathcal{P}_{2}$ with $p_{2}$ parameters, where $p_{1}<p_{2}$. The deviance difference $\Delta(\operatorname{dev})=\operatorname{dev}(\mathcal{P}_{1})-\operatorname{dev}(\mathcal{P}_{2})$, which is non-negative because the larger model fits at least as well, can be used as a test statistic; under $H_{0}$ it approximately follows a chi-square distribution with $p_{2}-p_{1}$ degrees of freedom.
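
The nested-model test in the last remark can be sketched as follows, taking the difference as the deviance of the smaller model minus that of the larger one so that the statistic is non-negative. The sketch assumes a normal linear model with known variance, compares an intercept-only model ($p_{1}=1$) with a straight-line model ($p_{2}=2$), and uses the fact that the upper tail of a chi-square distribution with one degree of freedom can be written with the complementary error function. The data, `sigma2`, and the helper names are hypothetical.

```python
import math

def rss(y, mu):
    """Residual sum of squares between observations and fitted means."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def chi2_sf_1df(x):
    """P(chi-square with 1 degree of freedom > x), valid for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical data; sigma2 is assumed known.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.2, 1.9, 3.2, 3.9]
sigma2 = 0.25
n = len(y)

# Smaller model P1: intercept only, fitted mean is the sample mean.
ybar = sum(y) / n
mu1 = [ybar] * n

# Larger model P2: intercept and slope, fitted by least squares.
xbar = sum(x) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
mu2 = [ybar + slope * (xi - xbar) for xi in x]

# For the normal model with known variance, deviance = RSS / sigma2.
dev1 = rss(y, mu1) / sigma2
dev2 = rss(y, mu2) / sigma2

delta = dev1 - dev2            # test statistic, p2 - p1 = 1 degree of freedom
p_value = chi2_sf_1df(delta)   # a small p-value is evidence against H0 (model P1)
print(delta, p_value)
```

For more than one extra parameter, a general chi-square tail function would be needed in place of `chi2_sf_1df`.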

Title	deviance
Canonical name	Deviance
Date of creation	2013-03-22 14:34:04
Last modified on	2013-03-22 14:34:04
Owner	CWoo (3771)
Last modified by	CWoo (3771)
Numerical id	8
Author	CWoo (3771)
Entry type	Definition
Classification	msc 62J12
Defines	null model
Defines	saturated model