one-pass algorithm to compute sample variance


In many situations it is desirable to calculate, in one iteration, the sample varianceMathworldPlanetmath of many numbers, and without having to have every number available in computer memory before beginning processing.

Let x1,x2, denote the data. The naïve formula for calculating the sample variance in one pass,

v=1n-1i=1n(xi-x¯)2=(i=1nxi2)-nx¯2n-1,x¯=1ni=1nxi,

suffers from computational round-off error. If the mean x¯ is large in absolute valuePlanetmathPlanetmath, and i=1nxi2 is close to nx¯2, then the subtraction at the end will tend to lose significant digits on the result. Also, in rare cases, the sum of squares i=1nxi2 can overflow on a computer.

A better alternative, though requiring more work per iteration, is to calculate the running sample mean and varianceMathworldPlanetmath instead, and update these as each datum is processed. Here we give the derivation of the one-pass algorithm — which involves nothing more than simple algebraic manipulations.

Define the running arithmetic mean and the sum of squared residuals:

an=1ni=1nxi,sn=i=1n(xi-an)2.

We want to express an+1 and sn+1 in terms of the old values an and sn.

For convenience, let δ=xn+1-an and γ=an+1-an. Then we have

an+1=nan+xn+1n+1=(n+1)an+xn+1-ann+1=an+δn+1.

For the variance calculation, we have

sn+1 =i=1n((xi-an)-γ)2+(xn+1-an+1)2
=i=1n(xi-an)2-2γi=1n(xi-an)+i=1nγ2+(xn+1-an+1)2
=sn+0+nγ2+(xn+1-an+1)2.

Now observe:

γ=δn+1,xn+1-an+1=δ-γ=(n+1)γ-γ=nγ;

hence we obtain:

sn+1=sn+nγ2+n2γ2=sn+n(n+1)γ2 =sn+nγδ
=sn+(xn+1-an+1)δ.

Note that the number to be added to sn is never negative, so no cancellation error will occur from this procedure. (However, there can still be computational round-off error if sn+1-sn happens to be very small compared to sn.)

The recurrence relationMathworldPlanetmath for the sample covariance of two lists of numbers x1,x2, and y1,y2, can be derived similarly. If an and bn denote the arithmetic means of first n numbers of each of the two lists respectively, then the sum of adjusted products

cn=i=1n(xi-an)(yi-bn)

can be incrementally updated by

cn+1=cn+(yn+1-bn+1)(xn+1-an)=cn+(xn+1-an+1)(yn+1-bn).

References

  • 1 B. P. Welford. “Note on a Method for Calculating Corrected Sums of Squares and Products”. Technometrics, Vol. 4, No. 3 (Aug., 1962), p. 419-420.
  • 2 http://en.wikipedia.org/wiki/Algorithms_for_calculating_varianceAlgorithms for calculating variance”. Wikipedia, The Free Encyclopedia. Accessed 25 February 2007.
Title one-pass algorithm to compute sample variance
Canonical name OnepassAlgorithmToComputeSampleVariance
Date of creation 2013-03-22 16:45:19
Last modified on 2013-03-22 16:45:19
Owner stevecheng (10074)
Last modified by stevecheng (10074)
Numerical id 9
Author stevecheng (10074)
Entry type AlgorithmMathworldPlanetmath
Classification msc 68W01
Classification msc 65-00
Classification msc 62-00