deviance


Background

One way to test the fit of a generalized linear model $\mathcal{P}$ of some data (with response variable $Y$ and explanatory variable(s) $X$) is to compare $\mathcal{P}$ with a similar model $\mathcal{P}_0$. By similarity we mean: given $\mathcal{P}$ with response variables $Y_i \sim f_{Y_i}$ and link function $g$ such that $g(\operatorname{E}[Y_i]) = \mathbf{X}_i^{T}\boldsymbol{\beta}$, the model $\mathcal{P}_0$

  1. is a generalized linear model of the same data,

  2. has the response variable $Y$ distributed as $f_Y$, the same as in $\mathcal{P}$,

  3. has the same link function $g$ as $\mathcal{P}$, such that $g(\operatorname{E}[Y_i]) = \mathbf{X}_i^{T}\boldsymbol{\beta}_0$.

Notice that the only possible difference between $\mathcal{P}$ and $\mathcal{P}_0$ lies in the parameter vector $\boldsymbol{\beta}$.

It is desirable for $\mathcal{P}_0$ to serve as a base model when more than one model is being assessed. Two possible candidates for $\mathcal{P}_0$ are the null model and the saturated model. The null model $\mathcal{P}_{null}$ is the one in which only a single parameter $\mu$ is used, so that $g(\operatorname{E}[Y_i]) = \mu$ and all responses have the same predicted outcome. The saturated model $\mathcal{P}_{max}$ is the other extreme, in which the maximum number of parameters is used, so that the predicted response values equal the observed response values exactly: $g(\operatorname{E}[Y_i]) = \mathbf{X}_i^{T}\boldsymbol{\beta}_{max}$ with fitted means $(\hat{\mu}_{max})_i = y_i$.

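Definition The deviance of a generalized linear model $\mathcal{P}$ is given by

$$\operatorname{dev}(\mathcal{P}) = 2\left[\ell(\hat{\boldsymbol{\beta}}_{max}; \mathbf{y}) - \ell(\hat{\boldsymbol{\beta}}; \mathbf{y})\right],$$

where $\ell$ is the log-likelihood function, $\hat{\boldsymbol{\beta}}$ is the MLE of the parameter vector $\boldsymbol{\beta}$ from $\mathcal{P}$, and $\hat{\boldsymbol{\beta}}_{max}$ is the MLE of the parameter vector $\boldsymbol{\beta}_{max}$ from the saturated model $\mathcal{P}_{max}$.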
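As a minimal sketch of this definition, the following Python computes the deviance of a null model as twice the log-likelihood gap to the saturated model. The choice of the Poisson family, the sample counts, and the function names `poisson_loglik` and `deviance` are illustrative assumptions and do not come from the entry itself.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention: yi * ln(mi) is taken as 0 when yi == 0,
        # which also covers mi == 0 in the saturated fit.
        log_term = yi * math.log(mi) if yi > 0 else 0.0
        total += log_term - mi - math.lgamma(yi + 1)
    return total

def deviance(y, mu_hat):
    """dev = 2 * [loglik(saturated) - loglik(model)]; the saturated fit has mu_i = y_i."""
    return 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu_hat))

# Hypothetical count data, for illustration only.
y = [2, 0, 5, 3, 1, 4]

# Null model: the MLE of the single common mean is the sample mean.
mu_null = [sum(y) / len(y)] * len(y)

dev_null = deviance(y, mu_null)   # deviance of the null model
dev_sat = deviance(y, y)          # deviance of the saturated model itself

print("null deviance:", dev_null)
print("saturated deviance:", dev_sat)
```

The saturated model has zero deviance by construction, while any other model of the same data has a deviance that measures how far its log-likelihood falls short of that upper bound.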

Example For a normal or general linear model, the link function is the identity:

$$\operatorname{E}[Y_i] = \mathbf{x}_i^{T}\boldsymbol{\beta},$$

where the $Y_i$'s are mutually independent and normally distributed as $N(\mu_i, \sigma^2)$. The log-likelihood function is given by

$$\ell(\boldsymbol{\beta}; \mathbf{y}) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu_i)^2 - \frac{n\ln(2\pi\sigma^2)}{2},$$

where the $\mu_i = \mathbf{x}_i^{T}\boldsymbol{\beta}$ are the predicted response values and $n$ is the number of observations.

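For the model in question, suppose $\hat{\mu}_i = \mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}$ is the fitted mean calculated from the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. Then

$$\ell(\hat{\boldsymbol{\beta}}; \mathbf{y}) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \hat{\mu}_i)^2 - \frac{n\ln(2\pi\sigma^2)}{2}.$$

For the saturated model $\mathcal{P}_{max}$, the predicted value $(\hat{\mu}_{max})_i$ equals the observed response value $y_i$. Therefore,

$$\ell(\hat{\boldsymbol{\beta}}_{max}; \mathbf{y}) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - (\hat{\mu}_{max})_i\bigr)^2 - \frac{n\ln(2\pi\sigma^2)}{2} = -\frac{n\ln(2\pi\sigma^2)}{2}.$$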

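So the deviance is

$$\operatorname{dev}(\mathcal{P}) = 2\left[\ell(\hat{\boldsymbol{\beta}}_{max}; \mathbf{y}) - \ell(\hat{\boldsymbol{\beta}}; \mathbf{y})\right] = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \hat{\mu}_i)^2,$$

which is the residual sum of squares, or RSS, used in regression models, divided by $\sigma^2$.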
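The identity between the normal-model deviance and RSS divided by $\sigma^2$ can be checked numerically. The sketch below assumes a simple linear regression with one covariate, a known variance `sigma2`, and made-up data; the least-squares fit is used because it coincides with the MLE of $\boldsymbol{\beta}$ under normal errors. All names and values are hypothetical.

```python
import math

def normal_loglik(y, mu, sigma2):
    """Log-likelihood of independent N(mu_i, sigma2) observations."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    return -rss / (2.0 * sigma2) - n * math.log(2.0 * math.pi * sigma2) / 2.0

# Hypothetical data and an assumed known variance.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
sigma2 = 0.25

# Least-squares fit of y = intercept + slope * x (the MLE under normal errors).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
mu_hat = [intercept + slope * xi for xi in x]

# Deviance from the definition: the saturated model fits mu_i = y_i.
dev = 2.0 * (normal_loglik(y, y, sigma2) - normal_loglik(y, mu_hat, sigma2))

# Residual sum of squares of the fitted model.
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

print("deviance:", dev)
print("RSS / sigma^2:", rss / sigma2)
```

The two printed values agree up to floating-point rounding, since the constant term $-n\ln(2\pi\sigma^2)/2$ cancels in the difference of log-likelihoods.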

Remarks

  • The deviance is necessarily non-negative.

  • The distribution of the deviance is asymptotically a chi-square distribution (http://planetmath.org/ChiSquaredRandomVariable) with $n-p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of parameters in the model $\mathcal{P}$.

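  • If two generalized linear models $\mathcal{P}_1$ and $\mathcal{P}_2$ are nested, say $\mathcal{P}_1$ is nested within $\mathcal{P}_2$, we can test the hypothesis $H_0$: the model for the data is $\mathcal{P}_1$ with $p_1$ parameters, against $H_1$: the model for the data is the more general $\mathcal{P}_2$ with $p_2$ parameters, where $p_1 < p_2$. Because the larger model fits at least as well, $\operatorname{dev}(\mathcal{P}_1) \geq \operatorname{dev}(\mathcal{P}_2)$, and the deviance difference $\Delta(\operatorname{dev}) = \operatorname{dev}(\mathcal{P}_1) - \operatorname{dev}(\mathcal{P}_2)$ can be used as a test statistic. Under $H_0$ it approximately follows a chi-square distribution with $p_2 - p_1$ degrees of freedom.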
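The nested-model test in the last remark can be sketched as follows. This example assumes Poisson responses, a one-parameter null model $\mathcal{P}_1$ nested within a two-group model $\mathcal{P}_2$ (whose MLEs are simply the group means), and made-up counts. The helper `chi2_sf` is a hypothetical stdlib-only stand-in for a library chi-square tail function, valid for integer degrees of freedom.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y ln(y/mu) - (y - mu)], with y ln y = 0 at y = 0."""
    dev = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        dev += 2.0 * (log_term - (yi - mi))
    return dev

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # Q(1/2, h) = erfc(sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical counts from two groups, for illustration only.
group_a = [2, 3, 1, 4, 2]
group_b = [6, 5, 8, 7, 4]
y = group_a + group_b

# P1 (p1 = 1): one common mean. P2 (p2 = 2): one mean per group.
mean_all = sum(y) / len(y)
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
mu1 = [mean_all] * len(y)
mu2 = [mean_a] * len(group_a) + [mean_b] * len(group_b)

dev1 = poisson_deviance(y, mu1)
dev2 = poisson_deviance(y, mu2)

delta = dev1 - dev2                 # non-negative because P1 is nested in P2
p_value = chi2_sf(delta, df=2 - 1)  # degrees of freedom p2 - p1

print("deviance difference:", delta)
print("approximate p-value:", p_value)
```

A small p-value suggests rejecting $H_0$ in favour of the larger model; with so few observations the chi-square approximation is rough, so the result should be read as indicative only.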

Title deviance
Canonical name Deviance
Date of creation 2013-03-22 14:34:04
Last modified on 2013-03-22 14:34:04
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 8
Author CWoo (3771)
Entry type Definition
Classification msc 62J12
Defines null model
Defines saturated model