# score function

Given a statistical model $\{f_{\mathbf{X}}(\boldsymbol{x}\mid\boldsymbol{\theta}):\boldsymbol{\theta}\in\Theta\}$ with log-likelihood function $\ell(\boldsymbol{\theta}\mid\boldsymbol{x})$, the *score function* $U$ is defined to be the gradient of $\ell$:

$U(\boldsymbol{\theta})=\nabla\ell=\frac{\partial\ell}{\partial\boldsymbol{\theta}}.$

Since the score function $U$ is also a function of the random vector $\mathbf{X}$, $U$ is itself a random vector.
Setting $U$ to zero gives a system of $k$ equations, known as the *likelihood equations*:

$U(\boldsymbol{\theta})=\Big(\frac{\partial\ell}{\partial\theta_{1}},\ldots,\frac{\partial\ell}{\partial\theta_{k}}\Big)=(0,\ldots,0).$

If $\boldsymbol{\theta}=\theta$ is one-dimensional, then the score function is simply referred to as the *score* of $\theta$.

The maximum likelihood estimate (MLE) $\hat{\boldsymbol{\theta}}$ of the parameter vector $\boldsymbol{\theta}$ can usually be found by solving the likelihood equations. The likelihood equations may also be formed by setting the gradient of the plain likelihood function to zero, since $\ln$ is monotonic. The log transformation often simplifies the algebra, as many common distributions are exponential in nature. For some distributions it may also be necessary to check that the solution of the likelihood equations is really a maximum, as opposed to a point of inflection.
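When the likelihood equations have no closed-form solution, they can be solved numerically. A minimal sketch, using an assumed one-parameter exponential model with rate $\theta$ (so $U(\theta)=n/\theta-\sum x_i$, a case where the root is also known in closed form) and hypothetical data:

```python
# Minimal sketch: solving a likelihood equation U(theta) = 0 with
# Newton's method, for an assumed exponential model with rate theta:
#   U(theta)  = n/theta - sum(x)
#   U'(theta) = -n/theta**2
data = [0.8, 1.3, 0.4, 2.1, 0.9]  # hypothetical observations
n, s = len(data), sum(data)

theta = 1.0  # initial guess
for _ in range(50):
    U = n / theta - s
    dU = -n / theta ** 2
    theta -= U / dU  # Newton step

# For this model the equation also solves in closed form:
# theta_hat = n / sum(x) = 1 / xbar, which Newton's method recovers.
print(theta, n / s)
```

The same iteration applies to any one-parameter model once $U$ and $U'$ are written down; only the two formulas inside the loop change.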

Example. $n$ independent observations are made from a random variable $X$ with a Poisson distribution with parameter $\lambda$. The observed values are $x_{1},\ldots,x_{n}$. The log-likelihood of the joint pdf is

$\ell(\lambda\mid\boldsymbol{x})=\sum_{i=1}^{n}\big(-\lambda+x_{i}\ln(\lambda)-\ln(x_{i}!)\big)$

and so the score function is

$U(\lambda)=\frac{d\ell}{d\lambda}=\sum_{i=1}^{n}\Big(-1+\frac{x_{i}}{\lambda}\Big)=-n+\frac{n\overline{x}}{\lambda},$

where $n\overline{x}=\sum x_{i}$. To find the MLE of $\lambda$, we set $U=0$ and solve for $\lambda$, giving the MLE $\hat{\lambda}=\overline{x}$.
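The Poisson result above can be checked numerically: with hypothetical data, the score $U(\lambda)=-n+n\overline{x}/\lambda$ vanishes exactly at the sample mean.

```python
# Sketch: checking that the Poisson score vanishes at the sample mean,
# i.e. that lambda_hat = xbar. The data below are hypothetical.
data = [2, 3, 1, 4, 0, 2, 3, 2]
n = len(data)
xbar = sum(data) / n

def score(lam):
    # U(lambda) = -n + n * xbar / lambda, as derived above
    return -n + n * xbar / lam

print(score(xbar))  # → 0.0
```

The score is positive below $\overline{x}$ and negative above it, confirming that the root is a maximum of $\ell$ rather than a minimum or inflection point.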

## Mathematics Subject Classification

62A01

## Comments

## Expectation evaluated with respect to what?

It says we evaluate the expectation of the score function and set it to zero. But with respect to which distribution is the expectation evaluated?

If we have a Bernoulli variable and observe a heads and b tails,

then

log-likelihood: l(t)=a log(t) + b log(1-t)

score function: U(t) = a/t - b/(1-t)

maximum likelihood solution is a/(a+b) which is the solution of U(t)=0

So it looks like here we are setting the score function to 0 directly, and not its expectation. Am I missing something here?
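(A quick numerical check of the computation above, with hypothetical counts a and b: differentiating l(t) = a log(t) + b log(1-t) gives U(t) = a/t - b/(1-t), which does vanish at t = a/(a+b) with no expectation taken.)

```python
# Sketch: verifying that t = a/(a+b) solves U(t) = 0 for the Bernoulli
# log-likelihood l(t) = a*log(t) + b*log(1-t). Counts are hypothetical.
a, b = 7, 3

def U(t):
    # d/dt [a*log(t) + b*log(1-t)] = a/t - b/(1-t)
    return a / t - b / (1 - t)

t_hat = a / (a + b)
print(U(t_hat))  # ~ 0, up to floating-point rounding
```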

## Re: Expectation evaluated with respect to what?

First, for a given random variable, say X, and its probability density function f(X;t), where t is the parameter, we find the log-likelihood function (with respect to X). Then the score function is calculated by taking the derivative with respect to the parameter t. Finally, the expectation of the score function with respect to the random variable X is calculated. This is then set to zero and solved for t, in terms of E(X).

So, in your example, we are interested in the MLE of t in a binomial random variable, with density function f(X;t) = (n choose X) t^X (1-t)^(n-X). Taking the natural log and then the first partial derivative with respect to t, you have

U(t) = X/t - (n-X)/(1-t).

Setting E(U)=0, we have 0 = E(X)/t - (n-E(X))/(1-t). Now solve for t, in terms of E(X), to get t = E(X)/n. Given the experimental result, n = a+b, and the sample expectation of X is a. So the MLE of t is E(X)/n = a/(a+b).

I will add an example (perhaps this one) to my entry to clarify...

Chi

## Likelihood Equations

Hi,

I think the expectation of the score function will always be zero; only the variance is affected by the parameter.

I think the likelihood equations should be "score(\theta) = 0", not "expect(score(\theta)) = 0". If you do the integration necessary in computing the expectation, you end up differentiating a constant and thus get zero.

Am I wrong?

Thanks,

Greg

## Re: Likelihood Equations

You're right. Thanks for pointing it out. How does it look now?

Chi

## Re: Likelihood Equations

Much better :)

Perhaps "The maximum likelihood estimate (MLE) of" would be better than just "MLE \theta".

Also, and this is probably just pedantry, you can probably form the likelihood equations by setting the gradient of the plain likelihood function to zero (it doesn't have to be the log-likelihood), since ln is monotonic. This may, in some strange cases, be easier to differentiate, I suppose.

Just out of interest, it's obvious that the point won't be a minimum, but I suppose it could be a point of inflection, so perhaps there is a need to check this before declaring \theta a MLE.

Greg

## Re: Likelihood Equations

Right again Greg! When I created this entry, I had pdf's from the exponential families in my mind. So naturally the log-likelihood functions are easier to use. Also, I was trying to tie the likelihood equations to the score function.

Go ahead, you should be able to edit the entry now as well.

Chi

## Re: Likelihood Equations

Thanks Chi. I hope the corrections are OK.

Greg