PlanetMath (more info)
 Math for the people, by the people.
generalized linear model (Definition)

Given a random vector $ \textbf{Y}$, called the response variable, a generalized linear model, or GLM for short, is a statistical model $ \lbrace f_\textbf{Y}(\boldsymbol{y}\mid\boldsymbol{\theta})\rbrace$ such that

  1. the components $ Y_i$ of $ \textbf{Y}$ are mutually independent,
  2. $ f_{Y_i}(y_i\mid\theta_i)$ belongs to the exponential family of distributions and has the following canonical form:
    $\displaystyle f_{Y_i}(y_i\mid\theta_i)=\operatorname{exp}[y_i\theta_i-b(\theta_i)+c(y_i)],$
    where the parameter $ \theta_i$ is called the canonical parameter and $ b(\theta_i)$ is called the cumulant function.
  3. for each component or variate $ Y_i$, with a corresponding set of $ p$ covariates $ X_{ij}$, there exists a monotone differentiable function $ g$, called the link function, such that
    $\displaystyle g(\operatorname{E}[Y_i])={\textbf{X}_i}^{\operatorname{T}}\boldsymbol{\beta},$
    where $ {\textbf{X}_i}^{\operatorname{T}}=(X_{i1},\ldots,X_{ip})$, and $ \boldsymbol{\beta}=(\beta_1,\ldots,\beta_p)^{\operatorname{T}}$ is a parameter vector.

In practice, an extra parameter $ \phi$, called the dispersion parameter, is introduced into the model to account for a phenomenon known as overdispersion. The density of each component then takes the form:

$\displaystyle f_{Y_i}(y_i\mid\theta_i)=\operatorname{exp}\left[\frac{y_i\theta_i-b(\theta_i)}{a(\phi)}+c(y_i,\phi)\right]$
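
As a worked illustration of this form, the normal density $ N(\mu,\sigma^2)$ can be rearranged as sketched below. The identifications that follow are the standard ones for the normal distribution, stated here as an example rather than as part of the definition.

```latex
\begin{align*}
f(y \mid \mu, \sigma^2)
  &= \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right] \\
  &= \exp\left[\frac{y\mu - \mu^2/2}{\sigma^2}
     - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)\right]
\end{align*}
```

Matching terms gives $ \theta=\mu$, $ b(\theta)=\theta^2/2$, $ a(\phi)=\phi=\sigma^2$ and $ c(y,\phi)=-y^2/(2\phi)-\frac{1}{2}\operatorname{ln}(2\pi\phi)$.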

Remarks

  • Below is a table of canonical parameters and cumulant functions for some well-known distributions from the exponential family:
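    distribution | notation | canonical parameter $ \theta$ | cumulant function $ b(\theta)$
    Normal | $ N(\mu,\sigma^2)$ | $ \mu$ | $ \displaystyle{\frac{\theta^2}{2}}$
    Poisson | $ Poisson(\mu)$ | $ \operatorname{ln}\mu$ | $ \operatorname{exp}(\theta)$
    Binomial | $ Bin(m,\pi)$ | $ \operatorname{logit}(\pi)$ | $ \operatorname{ln}(1+e^{\theta})$
    Gamma | $ Gamma(\alpha,\lambda)$ | $ \displaystyle{-\frac{\lambda}{\alpha}}$ | $ -\operatorname{ln}(-\theta)$
    The entries refer to the form with a dispersion parameter: $ a(\phi)=\sigma^2$ for the Normal, $ a(\phi)=1/m$ for the Binomial proportion $ Y/m$, and $ a(\phi)=1/\alpha$ for the Gamma with rate $ \lambda$ (mean $ \alpha/\lambda$).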
  • The GLM is a direct generalization of the general linear model, which includes linear regression models, ANOVA and ANCOVA. The general linear model is recovered by taking a normally distributed response and the identity function $ g(\mu)=\mu$ as the link function.
  • For a GLM in canonical form, $ \operatorname{E}[Y]=b^{\prime}(\theta)$ and $ \operatorname{Var}[Y]=b^{\prime\prime}(\theta)$; when a dispersion parameter is present, $ \operatorname{Var}[Y]=a(\phi)b^{\prime\prime}(\theta)$. The function $ b^{\prime\prime}(\theta)$, when expressed in terms of $ \mu=\operatorname{E}[Y]$, is known as the variance function $ V(\mu)$. Below are some examples of variance functions:
    distribution | notation | variance function $ V(\mu)$
    Normal | $ N(\mu,\sigma^2)$ | 1
    Poisson | $ Poisson(\mu)$ | $ \mu$
    Binomial | $ Bin(m,\pi)$ | $ \pi(1-\pi)$
    Gamma | $ Gamma(\alpha,\lambda)$ | $ \displaystyle{\mu^2=\frac{\alpha^2}{\lambda^2}}$
  • The logistic regression model, where the response variable $ Y$ is categorical in nature (typically binary), is a special case of the GLM. Possible link functions are the logit function, $ \operatorname{logit}(\pi)=\operatorname{ln}(\operatorname{odds}(\pi))$, the probit function $ \Phi^{-1}(\pi)$, which is the inverse of the cumulative normal distribution function, and the complementary-log-log function, $ \operatorname{ln}(-\operatorname{ln}(1-\pi))$. Here the parameter $ \pi$ lies between 0 and 1 and is usually measured as the relative frequency of occurrence of a certain event.
  • The log-linear model, where the response variable $ Y$ has a Poisson distribution, is also a special case of the GLM; its link function is the natural logarithm of the mean, $ g(\mu)=\operatorname{ln}\mu$. The Poisson distribution is typically used to model count or frequency data.
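
The relations $ \operatorname{E}[Y]=b^{\prime}(\theta)$ and $ V(\mu)=b^{\prime\prime}(\theta)$ from the remarks above can be checked numerically. The sketch below does so by finite differences for the Normal, Poisson and Bernoulli (Binomial with $ m=1$) rows of the tables. The helper names `d1` and `d2`, the step sizes, and the test value of `mu` are assumptions chosen for illustration.

```python
import math

def d1(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# family name -> (cumulant function b,
#                 canonical parameter theta as a function of the mean mu,
#                 variance function V)
families = {
    "normal": (lambda t: t * t / 2,
               lambda m: m,
               lambda m: 1.0),
    "poisson": (math.exp,
                math.log,
                lambda m: m),
    "bernoulli": (lambda t: math.log1p(math.exp(t)),
                  lambda m: math.log(m / (1 - m)),   # logit(mu)
                  lambda m: m * (1 - m)),
}

mu = 0.3  # illustrative mean; must lie in (0, 1) for the Bernoulli row

results = {}
for name, (b, theta_of_mu, V) in families.items():
    theta = theta_of_mu(mu)
    # (b'(theta), b''(theta), V(mu)): the first should be close to mu,
    # the second close to the third.
    results[name] = (d1(b, theta), d2(b, theta), V(mu))
    print(name, results[name])
```

In each family the map from $ \mu$ to $ \theta$ used here is also the canonical link, which is why the logit and the natural logarithm appear as link functions in the last two remarks.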

"generalized linear model" is owned by CWoo.
Other names:  GLM
Also defines:  link function, canonical parameter, cumulant function, variance function

This is version 14 of generalized linear model, born on 2004-07-27, modified 2007-12-18.
Object id is 6040, canonical name is GeneralizedLinearModel.

Classification:
AMS MSC62J12 (Statistics :: Linear inference, regression :: Generalized linear models)
