Background
In testing the fit of a generalized linear model $\mathcal{P}$ of some data (with response variable $Y$ and explanatory variable(s) $X$), one approach is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_0$. By similarity we mean: given $\mathcal{P}$ with the response variable $Y_i\sim f_{Y_i}$ and link function $g$ such that $g(\operatorname{E}[Y_i])={\textbf{X}_i}^{\operatorname{T}}\boldsymbol{\beta}$, the model $\mathcal{P}_0$

- is a generalized linear model of the same data,
- has the response variable $Y_i$ distributed as $f_{Y_i}$, the same as in $\mathcal{P}$,
- has the same link function $g$ as in $\mathcal{P}$, such that $g(\operatorname{E}[Y_i])={\textbf{X}_i}^{\operatorname{T}}\boldsymbol{\beta_0}$.

Notice that the only possible difference between the two models lies in the parameters $\boldsymbol{\beta}$ and $\boldsymbol{\beta_0}$.
It is desirable for $\mathcal{P}_0$ to serve as a base model when more than one model is being assessed. Two possible candidates for $\mathcal{P}_0$ are the null model and the saturated model. The null model $\mathcal{P}_{null}$ is one in which only one parameter $\mu$ is used, so that $g(\operatorname{E}[Y_i])=\mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{max}$ is the other extreme, where the maximum number of parameters is used so that the predicted response values equal the observed response values exactly:
$$g(\operatorname{E}[Y_i])={\textbf{X}_i}^{\operatorname{T}}\boldsymbol{\beta}_{max}=g(y_i)$$
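Definition The deviance of a generalized linear model $\mathcal{P}$ is given by
$$\operatorname{dev}(\mathcal{P})=2\big[\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})\big]$$
where $\ell$ is the log-likelihood function, $\hat{\boldsymbol{\beta}}$ is the MLE of the parameter vector $\boldsymbol{\beta}$ from $\mathcal{P}$, and $\hat{\boldsymbol{\beta}}_{max}$ is the MLE of the parameter vector $\boldsymbol{\beta}_{max}$ from the saturated model $\mathcal{P}_{max}$.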
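As an illustration of the definition, the following minimal sketch computes the deviance of the null model for Poisson-distributed counts. The Poisson family, the data values, and the function names `poisson_loglik` and `deviance` are assumptions chosen for the example and do not come from the text. It relies on two standard facts: the MLE of the single mean in the null model is the sample mean, and the saturated model fits each observation exactly.

```python
import math

def poisson_loglik(y, mu):
    """Log-likelihood of independent Poisson counts y with means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * log(mu) is taken as 0 when y == 0, so a fitted mean of 0 is allowed there
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

def deviance(y, mu_hat):
    """2 * [l(saturated) - l(fitted)], following the definition of the deviance."""
    l_sat = poisson_loglik(y, y)       # saturated model: fitted mean equals y_i
    l_fit = poisson_loglik(y, mu_hat)  # model under assessment
    return 2.0 * (l_sat - l_fit)

# Illustrative counts (made up for this sketch)
y = [2, 3, 6, 7, 8, 9, 10, 12, 15]

# Null model: one parameter, so every response gets the same fitted mean
mean_y = sum(y) / len(y)
null_fit = [mean_y] * len(y)

dev_null = deviance(y, null_fit)
print("deviance of the null model:", dev_null)
```

Calling `deviance(y, y)` returns 0, since the saturated model compared with itself has no lack of fit.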
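Example For a normal or general linear model, where the link function is the identity:
$$\operatorname{E}[Y_i]={\textbf{X}_i}^{\operatorname{T}}\boldsymbol{\beta}$$
where the $Y_i$'s are mutually independent and normally distributed as $N(\mu_i,\sigma^2)$. The log-likelihood function is given by
$$\ell(\boldsymbol{\beta}\mid\textbf{y})=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu_i)^2-n\ln(\sigma\sqrt{2\pi})$$
where $\mu_i={\textbf{X}_i}^{\operatorname{T}}\boldsymbol{\beta}$ are the predicted response values, and $n$ is the number of observations.
For the model $\mathcal{P}$ in question, suppose $\hat{\mu}_i={\textbf{X}_i}^{\operatorname{T}}\hat{\boldsymbol{\beta}}$ is the expected mean calculated from the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. So,
$$\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\hat{\mu}_i)^2-n\ln(\sigma\sqrt{2\pi})$$
For the saturated model $\mathcal{P}_{max}$, the predicted value $(\hat{\mu}_{max})_i$ equals the observed response value $y_i$. Therefore,
$$\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})=-n\ln(\sigma\sqrt{2\pi})$$
So the deviance is
$$\operatorname{dev}(\mathcal{P})=2\big[\ell(\hat{\boldsymbol{\beta}}_{max}\mid\textbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\textbf{y})\big]=\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i-\hat{\mu}_i)^2$$
which is the residual sum of squares, or RSS, used in regression models, divided by the variance $\sigma^2$ (treated here as known). When $\sigma^2=1$ the deviance is exactly the RSS.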
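The normal example can be checked numerically. The sketch below fits a simple linear regression by least squares (which coincides with maximum likelihood for the normal model) and compares twice the log-likelihood gap between the saturated and fitted models with the RSS divided by $\sigma^2$. The data, the value of `sigma`, and all function and variable names are assumptions made up for this illustration; $\sigma$ is treated as known.

```python
import math

def normal_loglik(y, mu, sigma):
    """Log-likelihood of independent N(mu_i, sigma^2) observations."""
    n = len(y)
    ss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -ss / (2 * sigma ** 2) - n * math.log(sigma * math.sqrt(2 * math.pi))

# Illustrative data (made up); sigma is assumed known
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3]
sigma = 0.5

# Least-squares estimates of intercept and slope (the MLE under normal errors)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
mu_hat = [intercept + slope * xi for xi in x]

# Deviance from the definition: saturated model has fitted values equal to y
dev = 2 * (normal_loglik(y, y, sigma) - normal_loglik(y, mu_hat, sigma))

# Residual sum of squares of the fitted model
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

print("deviance     :", dev)
print("RSS / sigma^2:", rss / sigma ** 2)
```

The two printed values agree up to floating-point rounding, because the normalizing term $n\ln(\sigma\sqrt{2\pi})$ cancels in the difference of log-likelihoods.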
Remarks
- The deviance is necessarily non-negative.
- The distribution of the deviance is asymptotically a chi-square distribution with $n-p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.
- If two generalized linear models $\mathcal{P}_1$ and $\mathcal{P}_2$ are nested, say $\mathcal{P}_1$ is nested within $\mathcal{P}_2$, we can perform hypothesis testing of $H_0$: the model for the data is $\mathcal{P}_1$ with $p$ parameters, against $H_1$: the model for the data is the more general $\mathcal{P}_2$ with $q$ parameters, where $p<q$. The deviance difference $\Delta(\operatorname{dev})=\operatorname{dev}(\mathcal{P}_1)-\operatorname{dev}(\mathcal{P}_2)$ can be used as a test statistic; under $H_0$ it approximately follows a chi-square distribution with $q-p$ degrees of freedom.
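The nested-model test in the last remark can be sketched as follows. The example assumes Poisson counts in two groups and compares a one-parameter model (a common mean) with a two-parameter model (one mean per group), so the difference in parameter counts is 1. The data and the names `poisson_deviance`, `mu_small`, and `mu_large` are hypothetical. For one degree of freedom the chi-square upper tail probability is $\operatorname{erfc}(\sqrt{x/2})$, which keeps the sketch within the standard library. With so few observations the chi-square approximation is rough, so the p-value is illustrative only.

```python
import math

def poisson_deviance(y, mu):
    """2 * [l(saturated) - l(fitted)] for Poisson counts.

    The term y * log(y / mu) is taken as 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Illustrative data (made up): a binary group label and a count per observation
group = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 4, 3, 5, 9, 7, 11, 8]

# Smaller model: a single common mean, whose MLE is the overall sample mean
mean_all = sum(y) / len(y)
mu_small = [mean_all] * len(y)

# Larger model: one mean per group, whose MLEs are the group sample means
means = {}
for g in sorted(set(group)):
    vals = [yi for yi, gi in zip(y, group) if gi == g]
    means[g] = sum(vals) / len(vals)
mu_large = [means[g] for g in group]

# Deviance difference between the nested models
delta = poisson_deviance(y, mu_small) - poisson_deviance(y, mu_large)
df = 2 - 1  # difference in the number of parameters

# Upper tail of the chi-square distribution with 1 degree of freedom
p_value = math.erfc(math.sqrt(delta / 2.0))

print("deviance difference:", delta, "on", df, "degree of freedom")
print("approximate p-value:", p_value)
```

A small p-value suggests that the larger model describes the data better than the smaller one. For more than one degree of freedom, a general chi-square tail function from a statistics library would be needed in place of the `erfc` shortcut.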
References
1. P. McCullagh and J. A. Nelder, Generalized Linear Models, 2nd ed., Chapman & Hall/CRC, London (1989).
2. A. J. Dobson, An Introduction to Generalized Linear Models, 2nd ed., Chapman & Hall (2001).
"deviance" is owned by CWoo.
Also defines: null model, saturated model
This is version 5 of deviance, born on 2004-09-02, modified 2006-09-12.
Classification: AMS MSC 62J12 (Statistics :: Linear inference, regression :: Generalized linear models)