# one-pass algorithm to compute sample variance

In many situations it is desirable to calculate,
in a single pass, the sample variance of many numbers,
without having to keep every number
in computer memory before processing begins.

Let $x_1, x_2, \dots$ denote the data. The naïve formula for calculating the sample variance in one pass,

$$v=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2=\frac{\left(\sum_{i=1}^{n}x_i^2\right)-n\bar{x}^2}{n-1},\qquad \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,$$

suffers from computational round-off error.
If the mean $\bar{x}$ is large in absolute value,
and $\sum_{i=1}^{n}x_i^2$ is close to $n\bar{x}^2$,
then the final subtraction tends to cancel
significant digits in the result.
Moreover, in rare cases, the sum of squares $\sum_{i=1}^{n}x_i^2$
can overflow on a computer.
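The loss of precision is easy to demonstrate. The sketch below (in Python, with illustrative data chosen so the mean is large relative to the spread) compares the naïve sum-of-squares formula against the exact-arithmetic `statistics.variance` from the standard library:

```python
import statistics

def naive_variance(xs):
    # One-pass accumulation of sum and sum of squares, then the
    # textbook formula v = (sum(x^2) - n*mean^2) / (n - 1).
    n = len(xs)
    s = sq = 0.0
    for x in xs:
        s += x
        sq += x * x
    mean = s / n
    return (sq - n * mean * mean) / (n - 1)

# Large mean, tiny spread: the true sample variance is exactly 1.
data = [1e8, 1e8 + 1, 1e8 + 2]
print(naive_variance(data))        # badly wrong in double precision
print(statistics.variance(data))   # 1.0
```

Here the two subtracted quantities agree in their leading sixteen digits or so, which exhausts the precision of an IEEE double, so the naïve result is dominated by rounding noise.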

A better alternative, though requiring more work per iteration,
is to calculate the running sample mean
and variance instead,
and update these as each datum is processed.
Here we give the derivation of the one-pass algorithm, which
involves nothing more than simple algebraic manipulation.

Define the running arithmetic mean and the sum of squared residuals:

$$a_n=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad s_n=\sum_{i=1}^{n}(x_i-a_n)^2.$$

We want to express $a_{n+1}$ and $s_{n+1}$ in terms of the old values $a_n$ and $s_n$.

For convenience, let $\delta = x_{n+1}-a_n$ and $\gamma = a_{n+1}-a_n$. Then we have

$$a_{n+1}=\frac{n a_n+x_{n+1}}{n+1}=\frac{(n+1)a_n+x_{n+1}-a_n}{n+1}=a_n+\frac{\delta}{n+1}.$$
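This mean recurrence alone can be sketched in a few lines of Python (function name and sample data are ours, for illustration):

```python
def running_means(xs):
    # Maintain a_{n+1} = a_n + (x_{n+1} - a_n) / (n + 1),
    # recording the running mean after each datum.
    a = 0.0
    means = []
    for n, x in enumerate(xs):
        a += (x - a) / (n + 1)
        means.append(a)
    return means

print(running_means([2.0, 4.0, 6.0, 8.0]))  # [2.0, 3.0, 4.0, 5.0]
```

Each entry equals the arithmetic mean of the prefix processed so far, without ever storing the whole list of partial sums.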

For the variance calculation, we have

$$\begin{aligned}
s_{n+1}&=\sum_{i=1}^{n+1}(x_i-a_{n+1})^2
=\sum_{i=1}^{n}\bigl((x_i-a_n)-\gamma\bigr)^2+(x_{n+1}-a_{n+1})^2\\
&=\sum_{i=1}^{n}(x_i-a_n)^2-2\gamma\sum_{i=1}^{n}(x_i-a_n)+\sum_{i=1}^{n}\gamma^2+(x_{n+1}-a_{n+1})^2\\
&=s_n+0+n\gamma^2+(x_{n+1}-a_{n+1})^2.
\end{aligned}$$

Now observe:

$$\gamma=\frac{\delta}{n+1},\qquad x_{n+1}-a_{n+1}=\delta-\gamma=(n+1)\gamma-\gamma=n\gamma;$$

hence we obtain:

$$s_{n+1}=s_n+n\gamma^2+n^2\gamma^2=s_n+n(n+1)\gamma^2=s_n+n\gamma\delta=s_n+(x_{n+1}-a_{n+1})\,\delta.$$

Note that the number to be added to ${s}_{n}$ is never negative, so no cancellation error will occur from this procedure. (However, there can still be computational round-off error if ${s}_{n+1}-{s}_{n}$ happens to be very small compared to ${s}_{n}$.)
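Putting the two recurrences together gives the one-pass algorithm (commonly attributed to Welford). A minimal sketch in Python, with names of our own choosing:

```python
def welford_variance(xs):
    # One pass over the data, maintaining the running mean a_n
    # and the running sum of squared residuals s_n.
    a = 0.0  # running mean a_n
    s = 0.0  # running sum of squared residuals s_n
    for n, x in enumerate(xs, start=1):
        delta = x - a          # delta = x_{n+1} - a_n
        a += delta / n         # a_{n+1} = a_n + delta/(n+1)
        s += (x - a) * delta   # s_{n+1} = s_n + (x_{n+1} - a_{n+1}) * delta
    return s / (len(xs) - 1)   # sample variance = s_n / (n-1)

# The same ill-conditioned data that defeats the naïve formula:
print(welford_variance([1e8, 1e8 + 1, 1e8 + 2]))  # 1.0
```

Note that `delta` is computed against the *old* mean and the factor `x - a` against the *updated* mean, exactly as in the recurrence $s_{n+1}=s_n+(x_{n+1}-a_{n+1})\,\delta$ above.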

The recurrence relation for the
sample covariance of two lists of numbers $x_1, x_2, \dots$
and $y_1, y_2, \dots$
can be derived similarly.
If $a_n$ and $b_n$ denote the arithmetic means
of the first $n$ numbers of each of the two lists respectively,
then the sum of adjusted products

$$c_n=\sum_{i=1}^{n}(x_i-a_n)(y_i-b_n)$$

can be incrementally updated by

$$c_{n+1}=c_n+(y_{n+1}-b_{n+1})(x_{n+1}-a_n)=c_n+(x_{n+1}-a_{n+1})(y_{n+1}-b_n).$$
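Using the second form of this update, the one-pass covariance can be sketched as follows (names are ours; note the careful ordering of mean updates so that the old mean of $y$ and the new mean of $x$ are used):

```python
def welford_covariance(xs, ys):
    # One pass over paired data, maintaining running means a_n, b_n
    # and the sum of adjusted products c_n.
    a = b = c = 0.0
    for n, (x, y) in enumerate(zip(xs, ys), start=1):
        dy = y - b             # y_{n+1} - b_n, using the OLD mean of y
        a += (x - a) / n       # a_{n+1}
        b += dy / n            # b_{n+1}
        c += (x - a) * dy      # c_{n+1} = c_n + (x_{n+1} - a_{n+1})(y_{n+1} - b_n)
    return c / (len(xs) - 1)   # sample covariance = c_n / (n-1)

print(welford_covariance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 2.0
```

For the perfectly correlated pair above, the sample covariance is $\bigl((-1)(-2)+0\cdot 0+1\cdot 2\bigr)/2=2$, which the streaming update reproduces.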


| Title | one-pass algorithm to compute sample variance |
|---|---|
| Canonical name | OnepassAlgorithmToComputeSampleVariance |
| Date of creation | 2013-03-22 16:45:19 |
| Last modified on | 2013-03-22 16:45:19 |
| Owner | stevecheng (10074) |
| Last modified by | stevecheng (10074) |
| Numerical id | 9 |
| Author | stevecheng (10074) |
| Entry type | Algorithm |
| Classification | msc 68W01 |
| Classification | msc 65-00 |
| Classification | msc 62-00 |