deviance
Background
One way to test the fit of a generalized linear model $\mathcal{P}$ to some data (with response variable $Y$ and explanatory variable(s) $X$) is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_{0}$. By similar we mean: given $\mathcal{P}$ with response variables $Y_{i}\sim f_{Y_{i}}$ and link function $g$ such that $g(\mathrm{E}[Y_{i}])=\mathbf{X}_{i}^{\mathrm{T}}\bm{\beta}$, the model $\mathcal{P}_{0}$

1.
is a generalized linear model of the same data,

2.
has the response variable $Y$ distributed as $f_{Y}$, the same as in $\mathcal{P}$, and

3.
has the same link function $g$ as found in $\mathcal{P}$, such that $g(\mathrm{E}[Y_{i}])=\mathbf{X}_{i}^{\mathrm{T}}\bm{\beta}_{0}$.
Notice that the only possible difference is found in the parameters $\bm{\beta}$.
It is desirable for $\mathcal{P}_{0}$ to serve as a base model when more than one model is being assessed. Two possible candidates for $\mathcal{P}_{0}$ are the null model and the saturated model. The null model $\mathcal{P}_{null}$ is one in which only one parameter $\mu$ is used, so that $g(\mathrm{E}[Y_{i}])=\mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{max}$ is the other extreme, where the maximum number of parameters is used so that the predicted response values equal the observed response values exactly: $g(\mathrm{E}[Y_{i}])=\mathbf{X}_{i}^{\mathrm{T}}\bm{\beta}_{max}=g(y_{i})$.
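To make the two extremes concrete, here is a minimal sketch. It assumes the normal case with identity link, where the maximum likelihood estimate of the single parameter of the null model is the sample mean, and the saturated model reproduces each observation. The function names and the data are hypothetical.

```python
def null_fit(y):
    """Fitted means under the null model (identity link assumed):
    every response receives the same value, the sample mean."""
    mu = sum(y) / len(y)
    return [mu] * len(y)


def saturated_fit(y):
    """Fitted means under the saturated model: one parameter per
    observation, so the fitted values coincide with the data."""
    return list(y)


y = [1.0, 2.0, 4.0, 5.0]   # made-up responses
print(null_fit(y))         # one common fitted value for all observations
print(saturated_fit(y))    # fitted values equal to the observed values
```

Any model of interest sits between these two: it uses more parameters than the null model and fewer than the saturated one.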
Definition The deviance of a model $\mathcal{P}$ (generalized linear model) is given by
$$\mathrm{dev}(\mathcal{P})=2[\ell(\widehat{\bm{\beta}}_{max}\mid \mathbf{y})-\ell(\widehat{\bm{\beta}}\mid \mathbf{y})],$$ 
where $\ell$ is the loglikelihood function, $\widehat{\bm{\beta}}$ is the MLE of the parameter vector $\bm{\beta}$ from $\mathcal{P}$ and $\widehat{\bm{\beta}}_{max}$ is the MLE of the parameter vector $\bm{\beta}_{max}$ from the saturated model $\mathcal{P}_{max}$.
Example For a normal or general linear model, where the link function is the identity:
$$\mathrm{E}[{Y}_{i}]=\text{\mathbf{x}}_{i}{}^{\mathrm{T}}\bm{\beta},$$ 
where the $Y_{i}$'s are mutually independent and normally distributed as $N(\mu_{i},\sigma^{2})$, with $\sigma^{2}$ treated as known. The loglikelihood function is given by
$$\ell(\bm{\beta}\mid \mathbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\mu_{i})^{2}-\frac{n\ln(2\pi\sigma^{2})}{2},$$ 
where the $\mu_{i}=\mathbf{x}_{i}^{\mathrm{T}}\bm{\beta}$ are the predicted response values, and $n$ is the number of observations.
For the model in question, suppose $\widehat{\mu}_{i}=\mathbf{x}_{i}^{\mathrm{T}}\widehat{\bm{\beta}}$ is the fitted mean calculated from the maximum likelihood estimate $\widehat{\bm{\beta}}$ of the parameter vector $\bm{\beta}$. Then
$$\ell(\widehat{\bm{\beta}}\mid \mathbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\widehat{\mu}_{i})^{2}-\frac{n\ln(2\pi\sigma^{2})}{2}.$$ 
For the saturated model $\mathcal{P}_{max}$, the predicted value $(\widehat{\mu}_{max})_{i}$ equals the observed response value $y_{i}$. Therefore,
$$\ell(\widehat{\bm{\beta}}_{max}\mid \mathbf{y})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-(\widehat{\mu}_{max})_{i})^{2}-\frac{n\ln(2\pi\sigma^{2})}{2}=-\frac{n\ln(2\pi\sigma^{2})}{2}.$$ 
So the deviance is
$$\mathrm{dev}(\mathcal{P})=2[\ell(\widehat{\bm{\beta}}_{max}\mid \mathbf{y})-\ell(\widehat{\bm{\beta}}\mid \mathbf{y})]=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i}-\widehat{\mu}_{i})^{2},$$ 
which is the residual sum of squares, or RSS, used in regression models, divided by the variance $\sigma^{2}$.
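The relation between the deviance and the residual sum of squares divided by $\sigma^{2}$ can be checked numerically. The sketch below is a minimal illustration, assuming a simple linear model with one explanatory variable, a known variance `sigma2`, and made-up data. The helper names (`normal_loglik`, `fit_line`, `normal_deviance`) are hypothetical and not taken from any library.

```python
import math


def normal_loglik(y, mu, sigma2):
    """Loglikelihood of independent N(mu_i, sigma2) observations."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -rss / (2 * sigma2) - n * math.log(2 * math.pi * sigma2) / 2


def fit_line(x, y):
    """Least-squares intercept and slope; for the normal model with
    identity link these coincide with the maximum likelihood estimates."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def normal_deviance(y, mu_hat, sigma2):
    """Deviance: twice the loglikelihood gap between the saturated
    model (fitted means equal to y) and the fitted model."""
    return 2 * (normal_loglik(y, y, sigma2) - normal_loglik(y, mu_hat, sigma2))


# Made-up data and an assumed known variance
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]
sigma2 = 2.0

b0, b1 = fit_line(x, y)
mu_hat = [b0 + b1 * xi for xi in x]
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))
dev = normal_deviance(y, mu_hat, sigma2)

# The two printed numbers agree up to floating-point rounding
print(dev, rss / sigma2)
```

The $n\ln(2\pi\sigma^{2})/2$ terms cancel in the difference, which is why only the sum of squares survives.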
Remarks

•
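The deviance is necessarily nonnegative, since $\mathcal{P}$ is a special case of the saturated model and so its maximized loglikelihood cannot exceed that of $\mathcal{P}_{max}$.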

•
The distribution of the deviance is asymptotically a chi square distribution (http://planetmath.org/ChiSquaredRandomVariable) with $n-p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.

•
If two generalized linear models $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ are nested, say $\mathcal{P}_{1}$ is nested within $\mathcal{P}_{2}$, we can test the hypothesis $H_{0}$: the model for the data is $\mathcal{P}_{1}$ with $p_{1}$ parameters, against $H_{1}$: the model for the data is the more general $\mathcal{P}_{2}$ with $p_{2}$ parameters, where $p_{1}<p_{2}$. The deviance difference $\Delta(\mathrm{dev})=\mathrm{dev}(\mathcal{P}_{1})-\mathrm{dev}(\mathcal{P}_{2})$, which is nonnegative because the smaller model cannot fit better than the larger one, can be used as a test statistic. Under $H_{0}$ it has approximately a chi square distribution with $p_{2}-p_{1}$ degrees of freedom.
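The nested-model test in the last remark can be sketched as follows. This is a minimal illustration, assuming the larger model has exactly one more parameter than the smaller one, so that the reference distribution is chi square with 1 degree of freedom. In that case the upper tail probability has the closed form $\mathrm{erfc}(\sqrt{x/2})$. The difference is taken as the deviance of the smaller model minus that of the larger model. The function names and the two deviance values are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper tail P(X > x) of a chi-square variable with 1 degree of
    freedom, using X = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2.0))


def deviance_difference_test(dev_small, dev_large):
    """Compare a smaller model against a larger nested model that has
    one extra parameter. Returns the deviance difference and the
    approximate p-value under the smaller model."""
    delta = dev_small - dev_large
    if delta < 0:
        raise ValueError("the smaller model cannot have the lower deviance")
    return delta, chi2_sf_1df(delta)


# Made-up deviances for a smaller and a larger nested model
delta, p_value = deviance_difference_test(dev_small=4.375, dev_large=1.35)
print(delta, p_value)
```

A small p-value would count as evidence against the smaller model. For more than one extra parameter, a general chi-square tail function is needed in place of `chi2_sf_1df`.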
Title  deviance 

Canonical name  Deviance 
Date of creation  2013-03-22 14:34:04 
Last modified on  2013-03-22 14:34:04 
Owner  CWoo (3771) 
Last modified by  CWoo (3771) 
Numerical id  8 
Author  CWoo (3771) 
Entry type  Definition 
Classification  msc 62J12 
Defines  null model 
Defines  saturated model 