deviance
Background
One way to test the fit of a generalized linear model $\mathcal{P}$ to some data (with response variable $Y$ and explanatory variable(s) $X$) is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_0$. By similar we mean the following: given $\mathcal{P}$ with response variables $Y_i\sim f_{Y_i}$ and link function $g$ such that $g(\operatorname{E}[Y_i])=\mathbf{X}_i^{T}\boldsymbol{\beta}$, the model $\mathcal{P}_0$
1. is a generalized linear model of the same data,
2. has the response variable $Y$ distributed as $f_Y$, the same as in $\mathcal{P}$, and
3. has the same link function $g$ as $\mathcal{P}$, so that $g(\operatorname{E}[Y_i])=\mathbf{X}_i^{T}\boldsymbol{\beta}_0$.
Notice that the only possible difference lies in the parameters $\boldsymbol{\beta}$.
It is desirable for $\mathcal{P}_0$ to serve as a base model when more than one model is being assessed. Two possible candidates for $\mathcal{P}_0$ are the null model and the saturated model. The null model $\mathcal{P}_{\text{null}}$ is one in which only one parameter $\mu$ is used, so that $g(\operatorname{E}[Y_i])=\mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{\max}$ is the other extreme, in which the maximum number of parameters is used, so that the predicted response values equal the observed response values exactly: $g(\operatorname{E}[Y_i])=\mathbf{X}_i^{T}\boldsymbol{\beta}_{\max}$ with fitted means $(\hat{\mu}_{\max})_i=y_i$.
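Definition The deviance of a generalized linear model $\mathcal{P}$ is given by
$$\operatorname{dev}(\mathcal{P})=2\left[\ell(\hat{\boldsymbol{\beta}}_{\max}\mid\mathbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\mathbf{y})\right],$$
where $\ell$ is the log-likelihood function, $\hat{\boldsymbol{\beta}}$ is the MLE of the parameter vector $\boldsymbol{\beta}$ from $\mathcal{P}$, and $\hat{\boldsymbol{\beta}}_{\max}$ is the MLE of the parameter vector $\boldsymbol{\beta}_{\max}$ from the saturated model $\mathcal{P}_{\max}$.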
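As an illustration of the definition, the following minimal sketch computes the deviance of a null model directly from the two log-likelihoods. It assumes a Poisson response (a choice made here for illustration, not taken from the entry) and uses made-up counts; the helper name `poisson_loglik` is hypothetical. For the Poisson null model the fitted mean is the sample mean, and for the saturated model the fitted mean of each observation is the observation itself.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood sum(y*ln(mu) - mu - ln(y!)), with 0*ln(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

# Made-up count data, for illustration only.
y = [2, 3, 6, 7, 8, 9, 10, 12, 15]
n = len(y)

# Null model: one common mean, whose MLE is the sample mean.
mu_null = [sum(y) / n] * n
# Saturated model: each fitted mean equals the observed response.
mu_sat = [float(v) for v in y]

# Deviance as twice the gap between the two maximized log-likelihoods.
dev_null = 2.0 * (poisson_loglik(y, mu_sat) - poisson_loglik(y, mu_null))

# The same quantity from the standard closed form for Poisson deviance.
dev_closed = 2.0 * sum(
    (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    for yi, mi in zip(y, mu_null)
)

print(dev_null, dev_closed)
```

The terms involving $\ln(y_i!)$ cancel in the difference, which is why the closed form does not contain them.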
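Example Consider a normal or general linear model, where the link function is the identity:
$$\operatorname{E}[Y_i]=\mathbf{X}_i^{T}\boldsymbol{\beta},$$
where the $Y_i$'s are mutually independent and normally distributed as $N(\mu_i,\sigma^2)$. The log-likelihood function is given by
$$\ell(\boldsymbol{\beta}\mid\mathbf{y})=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu_i)^2-\frac{n\ln(2\pi\sigma^2)}{2},$$
where the $\mu_i=\mathbf{X}_i^{T}\boldsymbol{\beta}$ are the predicted response values and $n$ is the number of observations.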
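For the model in question, suppose $\hat{\mu}_i=\mathbf{X}_i^{T}\hat{\boldsymbol{\beta}}$ is the expected mean calculated from the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. Then
$$\ell(\hat{\boldsymbol{\beta}}\mid\mathbf{y})=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\hat{\mu}_i)^2-\frac{n\ln(2\pi\sigma^2)}{2}.$$
For the saturated model $\mathcal{P}_{\max}$, the predicted value $(\hat{\mu}_{\max})_i$ equals the observed response value $y_i$. Therefore,
$$\ell(\hat{\boldsymbol{\beta}}_{\max}\mid\mathbf{y})=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i-(\hat{\mu}_{\max})_i\bigr)^2-\frac{n\ln(2\pi\sigma^2)}{2}=-\frac{n\ln(2\pi\sigma^2)}{2}.$$
So the deviance is
$$\operatorname{dev}(\mathcal{P})=2\left[\ell(\hat{\boldsymbol{\beta}}_{\max}\mid\mathbf{y})-\ell(\hat{\boldsymbol{\beta}}\mid\mathbf{y})\right]=\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i-\hat{\mu}_i)^2,$$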
which is the residual sum of squares (RSS) used in regression models, divided by $\sigma^2$.
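The calculation in this example can be checked numerically. The sketch below assumes a simple straight-line model with one explanatory variable, a known variance `sigma2`, and made-up data; the helper names `fit_line` and `normal_loglik` are hypothetical. It computes the deviance from the two log-likelihoods and compares it with the residual sum of squares divided by the variance.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x, which is also the MLE under normal errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def normal_loglik(y, mu, sigma2):
    """Normal log-likelihood with means mu and known variance sigma2."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -rss / (2.0 * sigma2) - n * math.log(2.0 * math.pi * sigma2) / 2.0

# Made-up data and an assumed known variance, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
sigma2 = 0.25

b0, b1 = fit_line(x, y)
mu_hat = [b0 + b1 * xi for xi in x]

# Saturated model: fitted means equal the observations, so pass y as the means.
dev = 2.0 * (normal_loglik(y, y, sigma2) - normal_loglik(y, mu_hat, sigma2))

# Residual sum of squares of the fitted line.
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

print(dev, rss / sigma2)
```

The constant term $-n\ln(2\pi\sigma^2)/2$ appears in both log-likelihoods and cancels, leaving only the scaled sum of squared residuals.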
Remarks
- The deviance is necessarily non-negative, since the maximized log-likelihood of $\mathcal{P}$ cannot exceed that of the saturated model.
- The distribution of the deviance is asymptotically a chi-square distribution (http://planetmath.org/ChiSquaredRandomVariable) with $n-p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.
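- If two generalized linear models $\mathcal{P}_1$ and $\mathcal{P}_2$ are nested, say $\mathcal{P}_1$ is nested within $\mathcal{P}_2$, we can test the hypothesis $H_0$: the model for the data is $\mathcal{P}_1$ with $p_1$ parameters, against $H_1$: the model for the data is the more general $\mathcal{P}_2$ with $p_2$ parameters, where $p_1<p_2$. The deviance difference $\Delta(\operatorname{dev})=\operatorname{dev}(\mathcal{P}_1)-\operatorname{dev}(\mathcal{P}_2)$, which is non-negative because the larger model fits at least as well, can be used as a test statistic. Under $H_0$ it has approximately a chi-square distribution with $p_2-p_1$ degrees of freedom.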
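The nested-model test in the last remark can be sketched as follows. This assumes the normal model of the example with a known variance `sigma2`, made-up data, an intercept-only model as the smaller model (one parameter) and a straight line as the larger model (two parameters). The difference is taken as the deviance of the smaller model minus that of the larger one, so that it is non-negative. With one degree of freedom, the chi-square tail probability equals `erfc(sqrt(x/2))`, which avoids any dependency outside the standard library.

```python
import math

def rss(y, mu):
    """Residual sum of squares between observations y and fitted means mu."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

# Made-up data and an assumed known variance, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
sigma2 = 0.25
n = len(y)

xbar = sum(x) / n
ybar = sum(y) / n

# Smaller model P1: intercept only (p1 = 1); fitted mean is the sample mean.
mu1 = [ybar] * n

# Larger model P2: straight line (p2 = 2), fitted by least squares.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
mu2 = [ybar + slope * (xi - xbar) for xi in x]

# For the normal model with known variance, deviance = RSS / sigma2.
dev1 = rss(y, mu1) / sigma2
dev2 = rss(y, mu2) / sigma2

# Test statistic: smaller-model deviance minus larger-model deviance.
delta = dev1 - dev2

# Upper tail of the chi-square distribution with p2 - p1 = 1 degree of freedom.
p_value = math.erfc(math.sqrt(delta / 2.0))

print(delta, p_value)
```

A small `p_value` is evidence against $H_0$, that is, against the smaller model being adequate for the data.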
Title | deviance |
---|---|
Canonical name | Deviance |
Date of creation | 2013-03-22 14:34:04 |
Last modified on | 2013-03-22 14:34:04 |
Owner | CWoo (3771) |
Last modified by | CWoo (3771) |
Numerical id | 8 |
Author | CWoo (3771) |
Entry type | Definition |
Classification | msc 62J12 |
Defines | null model |
Defines | saturated model |