# deviance

## Background

One way to test the fit of a generalized linear model $\mathcal{P}$ for some data (with response variable $Y$ and explanatory variable(s) $\textbf{X}$) is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_{0}$. By similarity we mean: given $\mathcal{P}$ with response variables $Y_{i}\sim f_{Y_{i}}$ and link function $g$ such that $g(\operatorname{E}[Y_{i}])={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}$, the model $\mathcal{P}_{0}$

1. is a generalized linear model of the same data,

2. has response variables $Y_{i}$ distributed as $f_{Y_{i}}$, the same as in $\mathcal{P}$, and

3. has the same link function $g$ as in $\mathcal{P}$, such that $g(\operatorname{E}[Y_{i}])={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}_{0}$.

Notice that the only possible difference is found in the parameters $\boldsymbol{\beta}$.

It is desirable for this $\mathcal{P}_{0}$ to serve as a base model when more than one model is being assessed. Two natural candidates for $\mathcal{P}_{0}$ are the null model and the saturated model. The null model $\mathcal{P}_{null}$ uses only one parameter $\mu$, so that $g(\operatorname{E}[Y_{i}])=\mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{max}$ is the other extreme, in which the maximum number of parameters is used so that the predicted response values equal the observed response values exactly: $g(\operatorname{E}[Y_{i}])={\textbf{X}_{i}}^{\operatorname{T}}\boldsymbol{\beta}_{max}=y_{i}$.
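As a small illustration (a Python sketch with made-up data), for an identity link the null model predicts the common mean of the responses, while the saturated model reproduces each observation exactly:

```python
# Identity-link illustration with made-up data: the null model predicts one
# common value for every observation, the saturated model reproduces each y_i.
y = [2.0, 5.0, 3.0, 6.0]

mu_null = [sum(y) / len(y)] * len(y)  # one parameter: the grand mean
mu_saturated = list(y)                # one parameter per observation

print(mu_null)       # [4.0, 4.0, 4.0, 4.0]
print(mu_saturated)  # [2.0, 5.0, 3.0, 6.0]
```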

## Definition

The deviance of a generalized linear model $\mathcal{P}$ is given by

 $\operatorname{dev}(\mathcal{P})=2\big[\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})\big],$

where $\ell$ is the log-likelihood function, $\hat{\boldsymbol{\beta}}$ is the MLE of the parameter vector $\boldsymbol{\beta}$ from $\mathcal{P}$ and $\hat{\boldsymbol{\beta}}_{max}$ is the MLE of parameter vector $\boldsymbol{\beta}_{max}$ from the saturated model $\mathcal{P}_{max}$.
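The definition can be applied directly whenever the saturated log-likelihood is available in closed form. Below is a minimal Python sketch for a Poisson response (the data and fitted means are hypothetical, and the $y_{i}$ are assumed positive so the saturated log-likelihood is defined); it checks the definition against the familiar closed form of the Poisson deviance, $2\sum_{i}\big[y_{i}\ln(y_{i}/\hat{\mu}_{i})-(y_{i}-\hat{\mu}_{i})\big]$:

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum_i [y_i ln(mu_i) - mu_i - ln(y_i!)]."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu_hat):
    """dev = 2 [ell(saturated) - ell(fitted)]; the saturated model sets mu_i = y_i."""
    ell_sat = poisson_loglik(y, y)       # assumes every y_i > 0
    ell_fit = poisson_loglik(y, mu_hat)
    return 2.0 * (ell_sat - ell_fit)

y = [3, 7, 2, 6]
mu_hat = [4.0, 5.5, 2.5, 6.0]  # hypothetical fitted means from some GLM fit
dev = poisson_deviance(y, mu_hat)

# The factorial terms cancel in the difference, leaving the closed form:
closed = 2 * sum(yi * math.log(yi / mi) - (yi - mi) for yi, mi in zip(y, mu_hat))
```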

## Example

For a normal (general linear) model, the link function is the identity:

 $\operatorname{E}[Y_{i}]={\textbf{x}_{i}}^{\operatorname{T}}\boldsymbol{\beta},$

where the $Y_{i}$’s are mutually independent and normally distributed as $N(\mu_{i},\sigma^{2})$. The log-likelihood function is given by

 $\ell(\boldsymbol{\beta}\mid\textbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\mu_{i})^{2}-\frac{n\ln(2\pi\sigma^{2})}{2},$

where $\mu_{i}={\textbf{x}_{i}}^{\operatorname{T}}\boldsymbol{\beta}$ are the predicted response values, and $n$ is the number of observations.

For the model in question, suppose $\hat{\mu}_{i}={\textbf{x}_{i}}^{\operatorname{T}}\hat{\boldsymbol{\beta}}$ is the predicted mean calculated from the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. So,

 $\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\hat{\mu}_{i})^{2}-\frac{n\ln(2\pi\sigma^{2})}{2}.$

For the saturated model $\mathcal{P}_{max}$, the predicted value $(\hat{\mu}_{max})_{i}$ equals the observed response value $y_{i}$. Therefore,

 $\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-(\hat{\mu}_{max})_{i})^{2}-\frac{n\ln(2\pi\sigma^{2})}{2}=-\frac{n\ln(2\pi\sigma^{2})}{2}.$

So the deviance is

 $\operatorname{dev}(\mathcal{P})=2\big[\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})\big]=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\hat{\mu}_{i})^{2},$

which is the residual sum of squares, or RSS, used in regression models, scaled by $1/\sigma^{2}$.
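This identity is easy to check numerically. The sketch below (simulated data, with $\sigma^{2}$ treated as known) computes the deviance from the two log-likelihoods and compares it with $\operatorname{RSS}/\sigma^{2}$:

```python
import numpy as np

# Simulate a small normal linear model with known variance sigma^2.
rng = np.random.default_rng(0)
n, sigma2 = 50, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# MLE of beta under the normal model is the least-squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_hat = X @ beta_hat

def normal_loglik(y, mu, sigma2):
    """Normal log-likelihood with known variance sigma2."""
    m = len(y)
    return -np.sum((y - mu) ** 2) / (2 * sigma2) - m * np.log(2 * np.pi * sigma2) / 2

# Saturated model predicts mu_i = y_i, so its residual term vanishes.
dev = 2 * (normal_loglik(y, y, sigma2) - normal_loglik(y, mu_hat, sigma2))
rss = np.sum((y - mu_hat) ** 2)
# dev equals rss / sigma2 up to floating-point error
```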

## Remarks

• The deviance is necessarily non-negative, since the saturated model attains the highest achievable likelihood for the data.

• The distribution of the deviance is asymptotically a chi-squared distribution (http://planetmath.org/ChiSquaredRandomVariable) with $n-p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.

• If two generalized linear models $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ are nested, say $\mathcal{P}_{1}$ is nested within $\mathcal{P}_{2}$, we can test $H_{0}$: the model for the data is $\mathcal{P}_{1}$ with $p_{1}$ parameters, against $H_{1}$: the model for the data is the more general $\mathcal{P}_{2}$ with $p_{2}$ parameters, where $p_{1}<p_{2}$. The deviance difference $\Delta(\operatorname{dev})=\operatorname{dev}(\mathcal{P}_{1})-\operatorname{dev}(\mathcal{P}_{2})$ can be used as a test statistic; it approximately follows a chi-squared distribution with $p_{2}-p_{1}$ degrees of freedom.
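For the normal model of the example above (known $\sigma^{2}$, so each deviance is just $\operatorname{RSS}/\sigma^{2}$), the deviance difference test can be sketched as follows; the data are simulated under the smaller model, and 3.841 is the 95th percentile of the chi-squared distribution with $p_{2}-p_{1}=1$ degree of freedom:

```python
import numpy as np

# Simulate data under the smaller (intercept-only) model.
rng = np.random.default_rng(1)
n, sigma2 = 80, 1.0
x = rng.normal(size=n)
y = 2.0 + rng.normal(size=n)  # true slope is zero

def deviance(X, y, sigma2):
    """Deviance of a normal linear model with known sigma2: RSS / sigma2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2) / sigma2

X1 = np.ones((n, 1))                   # P1: intercept only (p1 = 1)
X2 = np.column_stack([np.ones(n), x])  # P2: intercept + slope (p2 = 2)

# Test statistic: deviance of the smaller model minus deviance of the
# larger one (non-negative for nested least-squares fits).
delta = deviance(X1, y, sigma2) - deviance(X2, y, sigma2)
reject = delta > 3.841  # chi-squared(1) critical value at the 5% level
```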
