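In testing the fit of a generalized linear model $\mathcal{P}$ of some data (with response variable $Y$ and explanatory variable(s) $X$), one approach is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_0$. By similarity we mean: given $\mathcal{P}$ with the response variable $Y_i \sim f_{Y_i}$ and link function $g$ such that $g(\operatorname{E}[Y_i]) = \mathbf{X}_i^{\operatorname{T}} \boldsymbol{\beta}$, the model $\mathcal{P}_0$
1. is a generalized linear model of the same data,
2. has the response variable $Y$ distributed as $f_Y$, the same as in $\mathcal{P}$, and
3. has the same link function $g$ as in $\mathcal{P}$, such that $g(\operatorname{E}[Y_i]) = \mathbf{X}_i^{\operatorname{T}} \boldsymbol{\beta}_0$.
Notice that the only possible difference lies in the parameters $\boldsymbol{\beta}$ and $\boldsymbol{\beta}_0$.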
It is desirable for $\mathcal{P}_0$ to serve as a base model when more than one model is being assessed. Two possible candidates for $\mathcal{P}_0$ are the null model and the saturated model. The null model $\mathcal{P}_{null}$ is one in which only one parameter $\mu$ is used, so that $g(\operatorname{E}[Y_i]) = \mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{max}$ is the other extreme, where the maximum number of parameters is used so that the predicted response values equal the observed response values exactly: $(\hat{\mu}_{max})_i = y_i$ for every $i$.
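Definition. The deviance of a generalized linear model $\mathcal{P}$ is given by
$$\operatorname{dev}(\mathcal{P}) = 2\big[\ell(\hat{\boldsymbol{\beta}}_{max} \mid \mathbf{y}) - \ell(\hat{\boldsymbol{\beta}} \mid \mathbf{y})\big],$$
where $\ell$ is the log-likelihood function, $\hat{\boldsymbol{\beta}}$ is the MLE of the parameter vector $\boldsymbol{\beta}$ from $\mathcal{P}$, and $\hat{\boldsymbol{\beta}}_{max}$ is the MLE of the parameter vector $\boldsymbol{\beta}_{max}$ from the saturated model $\mathcal{P}_{max}$.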
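The definition can be tried on a small case. The following is a minimal sketch, not part of the original entry: it assumes a Poisson response with a log link, a made-up vector of counts `y`, and the null model as the model being assessed. The helper names `poisson_loglik` and `deviance` are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Log-likelihood of independent Poisson(mu_i) counts y_i."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y*log(mu) is taken as 0 when y == 0, which also covers
        # the saturated fit mu_i = y_i = 0 without evaluating log(0).
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

def deviance(loglik, y, mu_model, mu_saturated):
    """Twice the log-likelihood gap between the saturated and the assessed model."""
    return 2 * (loglik(y, mu_saturated) - loglik(y, mu_model))

# Hypothetical counts, chosen only for illustration.
y = [2, 0, 3, 1, 4]

# Null model: one common mean; its MLE is the sample mean.
mu_null = [sum(y) / len(y)] * len(y)

# Saturated model: fitted values equal the observations.
mu_sat = [float(yi) for yi in y]

dev_null = deviance(poisson_loglik, y, mu_null, mu_sat)
print(dev_null)
```

Passing the log-likelihood as an argument keeps `deviance` independent of the response distribution, which mirrors the way the definition is stated.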
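Example. For a normal linear model with known variance $\sigma^2$, the link function is the identity:
$$\operatorname{E}[Y_i] = \mathbf{X}_i^{\operatorname{T}} \boldsymbol{\beta},$$
where the $Y_i$'s are mutually independent and normally distributed as $N(\mu_i, \sigma^2)$. The log-likelihood function is given by
$$\ell(\boldsymbol{\beta} \mid \mathbf{y}) = -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu_i)^2 - \frac{n \ln(2\pi\sigma^2)}{2},$$
where $\mu_i = \mathbf{X}_i^{\operatorname{T}} \boldsymbol{\beta}$ is the predicted response value for the $i$th observation, and $n$ is the number of observations.
For the model $\mathcal{P}$ in question, suppose $\hat{\mu}_i = \mathbf{X}_i^{\operatorname{T}} \hat{\boldsymbol{\beta}}$ is the expected mean calculated from the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. So,
$$\ell(\hat{\boldsymbol{\beta}} \mid \mathbf{y}) = -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \hat{\mu}_i)^2 - \frac{n \ln(2\pi\sigma^2)}{2}.$$
For the saturated model $\mathcal{P}_{max}$, the predicted value $(\hat{\mu}_{max})_i$ equals the observed response value $y_i$. Therefore,
$$\ell(\hat{\boldsymbol{\beta}}_{max} \mid \mathbf{y}) = -\frac{n \ln(2\pi\sigma^2)}{2}.$$
So the deviance is
$$\operatorname{dev}(\mathcal{P}) = 2\big[\ell(\hat{\boldsymbol{\beta}}_{max} \mid \mathbf{y}) - \ell(\hat{\boldsymbol{\beta}} \mid \mathbf{y})\big] = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \hat{\mu}_i)^2,$$
which is exactly the residual sum of squares, or RSS, used in regression models, divided by $\sigma^2$.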
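The identity derived in this example can be checked numerically. The sketch below is an illustration only: the data `x`, `y`, the known variance `sigma2`, and the helper names `normal_loglik` and `fit_line` are all assumptions introduced here. It fits a straight line by least squares (which coincides with maximum likelihood for the normal model) and compares the deviance computed from the two log-likelihoods with the residual sum of squares divided by the variance.

```python
import math

def normal_loglik(y, mu, sigma2):
    """Log-likelihood of independent N(mu_i, sigma2) observations."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -rss / (2 * sigma2) - n * math.log(2 * math.pi * sigma2) / 2

def fit_line(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data and a known variance, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
sigma2 = 0.25

b0, b1 = fit_line(x, y)
mu_hat = [b0 + b1 * xi for xi in x]

loglik_model = normal_loglik(y, mu_hat, sigma2)
loglik_saturated = normal_loglik(y, y, sigma2)  # fitted values equal y

deviance = 2 * (loglik_saturated - loglik_model)
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

# The two printed numbers should agree up to rounding error.
print(deviance, rss / sigma2)
```

The constant term involving $\ln(2\pi\sigma^2)$ appears in both log-likelihoods and cancels in the difference, which is why only the sum of squares remains.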
The deviance is necessarily non-negative.
The distribution of the deviance is asymptotically a chi square distribution (http://planetmath.org/ChiSquaredRandomVariable) with $n - p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.
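If two generalized linear models $\mathcal{P}_1$ and $\mathcal{P}_2$ are nested, say $\mathcal{P}_1$ is nested within $\mathcal{P}_2$, we can test the hypothesis $H_0$: the model for the data is $\mathcal{P}_1$ with $p_1$ parameters, against $H_1$: the model for the data is the more general $\mathcal{P}_2$ with $p_2$ parameters, where $p_1 < p_2$. The deviance difference $\Delta(\operatorname{dev}) = \operatorname{dev}(\mathcal{P}_1) - \operatorname{dev}(\mathcal{P}_2)$ can be used as a test statistic; under $H_0$ it is approximately chi square distributed with $p_2 - p_1$ degrees of freedom.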
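The nested-model comparison can be sketched for the normal case with known variance, where each deviance is a residual sum of squares divided by the variance. Everything specific below is an assumption made for illustration: the data, the value of `sigma2`, the choice of the intercept-only model as the smaller model and a straight line as the larger one, and the helper names. Because the two models differ by one parameter, the chi square tail probability with one degree of freedom can be written with the complementary error function from the standard library.

```python
import math

def rss(y, mu):
    """Residual sum of squares between observations and fitted values."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def chi2_sf_1df(x):
    """P(X > x) for a chi square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical data and a known variance, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
sigma2 = 0.25
n = len(y)

# Smaller model: intercept only (1 parameter); fitted value is the mean.
ybar = sum(y) / n
mu_small = [ybar] * n

# Larger model: intercept and slope (2 parameters), fitted by least squares.
xbar = sum(x) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
mu_large = [intercept + slope * xi for xi in x]

# Deviance of each model, then the difference used as the test statistic.
dev_small = rss(y, mu_small) / sigma2
dev_large = rss(y, mu_large) / sigma2
delta_dev = dev_small - dev_large

# Degrees of freedom = 2 - 1 = 1, so the 1-df tail formula applies.
p_value = chi2_sf_1df(delta_dev)
print(delta_dev, p_value)
```

A small tail probability suggests that the extra parameter improves the fit by more than chance alone would explain. For differences of more than one parameter, a general chi square tail function would be needed in place of `chi2_sf_1df`.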
1. P. McCullagh and J. A. Nelder, Generalized Linear Models, 2nd ed., Chapman & Hall/CRC, London (1989).
2. A. J. Dobson, An Introduction to Generalized Linear Models, 2nd ed., Chapman & Hall (2001).
|Date of creation||2013-03-22 14:34:04|
|Last modified on||2013-03-22 14:34:04|
|Last modified by||CWoo (3771)|