one-pass algorithm to compute sample variance
(Algorithm)
In many situations it is desirable to calculate the sample variance of many numbers in a single pass over the data, without having every number available in computer memory before processing begins.
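Let $x_1, x_2, \ldots, x_n$ denote the data. The naïve formula for calculating the sample variance in one pass,
$$ s^2 = \frac{1}{n-1} \left( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \right), \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, $$
suffers from computational round-off error. If the mean $\bar{x}$ is large in absolute value, and $\sum_{i=1}^n x_i^2$ is close to $n \bar{x}^2$, then the subtraction at the end will tend to lose significant digits in the result. Also, in rare cases, the sum of squares $\sum_{i=1}^n x_i^2$ can overflow on a computer.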
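The loss of significant digits can be seen in a small experiment. The following is a minimal sketch, assuming IEEE double-precision floats (Python's `float`); the function names and the test data are illustrative choices, not part of the original entry. The data have a mean near $10^9$ and a true sample variance of exactly 30.

```python
def naive_variance(data):
    """One-pass sample variance via (sum of squares - n * mean^2) / (n - 1)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    # Subtraction of two nearly equal, very large numbers: digits are lost here.
    return (total_sq - n * mean * mean) / (n - 1)


def two_pass_variance(data):
    """Reference computation; needs all of the data in memory."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


# Large mean, small spread: the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
naive = naive_variance(data)
reference = two_pass_variance(data)
print("naive:", naive, "two-pass:", reference)
```

With these inputs the two-pass reference recovers 30, while the naïve result is far from it, because the sum of squares is of order $10^{18}$ and cannot resolve a difference of 90 in double precision.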
A better alternative, though requiring more work per iteration, is to calculate the running sample mean and variance instead, and update these as each datum is processed. Here we give the derivation of the one-pass algorithm, which involves nothing more than simple algebraic manipulations.
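Define the running arithmetic mean and the sum of squared residuals of the first $n$ data $x_1, \ldots, x_n$:
$$ a_n = \frac{1}{n} \sum_{i=1}^n x_i, \qquad s_n = \sum_{i=1}^n (x_i - a_n)^2. $$
The sample variance of the first $n$ data is then $s_n/(n-1)$. We want to express $a_{n+1}$ and $s_{n+1}$ in terms of the old values $a_n$ and $s_n$.
For convenience, let $\delta = x_{n+1} - a_n$ and $\gamma = a_{n+1} - a_n$. Then we have
$$ a_{n+1} = \frac{n a_n + x_{n+1}}{n+1} = a_n + \frac{\delta}{n+1}, \qquad \text{so} \qquad \gamma = \frac{\delta}{n+1}. $$
For the variance calculation, we have
$$
\begin{aligned}
s_{n+1} &= \sum_{i=1}^{n+1} (x_i - a_{n+1})^2 \\
&= \sum_{i=1}^{n} \bigl( (x_i - a_n) - \gamma \bigr)^2 + (x_{n+1} - a_{n+1})^2 \\
&= s_n - 2\gamma \sum_{i=1}^{n} (x_i - a_n) + n\gamma^2 + (\delta - \gamma)^2 \\
&= s_n + n\gamma^2 + (\delta - \gamma)^2,
\end{aligned}
$$
since $\sum_{i=1}^n (x_i - a_n) = 0$ and $x_{n+1} - a_{n+1} = \delta - \gamma$.
Now observe:
$$ n\gamma = (n+1)\gamma - \gamma = \delta - \gamma, \qquad \text{so} \qquad n\gamma^2 + (\delta - \gamma)^2 = (\delta - \gamma)\gamma + (\delta - \gamma)^2 = \delta(\delta - \gamma); $$
hence we obtain:
$$ s_{n+1} = s_n + \delta(\delta - \gamma) = s_n + (x_{n+1} - a_n)(x_{n+1} - a_{n+1}). $$
Note that the number to be added to $s_n$ is never negative (it equals $\frac{n}{n+1}\delta^2$), so no cancellation error will occur from this procedure. (However, there can still be computational round-off error if $s_{n+1} - s_n$ happens to be very small compared to $s_n$.)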
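The update rule derived above can be sketched in code as follows. This is a minimal illustration, not a reference implementation: the class name `RunningVariance` and its attribute names are hypothetical, and Python floats are assumed. Each call to `update` adds the product of the new datum's deviation from the old mean and its deviation from the new mean to the running sum of squared residuals.

```python
class RunningVariance:
    """Running mean and sum of squared residuals, updated one datum at a time."""

    def __init__(self):
        self.n = 0        # number of data processed so far
        self.mean = 0.0   # running arithmetic mean
        self.s = 0.0      # running sum of squared residuals

    def update(self, x):
        self.n += 1
        delta = x - self.mean              # deviation from the old mean
        self.mean += delta / self.n        # new mean
        self.s += delta * (x - self.mean)  # (x - old mean) * (x - new mean)

    def variance(self):
        """Sample variance of the data seen so far (requires at least 2 data)."""
        if self.n < 2:
            raise ValueError("sample variance needs at least two data points")
        return self.s / (self.n - 1)


rv = RunningVariance()
for x in [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]:
    rv.update(x)
print(rv.mean, rv.variance())
```

Only three numbers are stored regardless of how many data are processed, and the quantity added to the running sum is a product of two deviations of moderate size rather than a difference of two huge sums.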
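The recurrence relation for the sample covariance of two lists of numbers $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ can be derived similarly. If $a_n$ and $b_n$ denote the arithmetic means of the first $n$ numbers of each of the two lists respectively, then the sum of adjusted products
$$ c_n = \sum_{i=1}^n (x_i - a_n)(y_i - b_n) $$
can be incrementally updated by
$$ c_{n+1} = c_n + (x_{n+1} - a_n)(y_{n+1} - b_{n+1}) = c_n + \frac{n}{n+1} (x_{n+1} - a_n)(y_{n+1} - b_n). $$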
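The covariance update can be sketched the same way. This is a minimal illustration under the assumption that the data arrive as $(x, y)$ pairs; the function name `one_pass_covariance` and its variable names are hypothetical. Each step adds the product of the new $x$'s deviation from the old mean of the $x$'s and the new $y$'s deviation from the new mean of the $y$'s.

```python
def one_pass_covariance(pairs):
    """Sample covariance of (x, y) pairs, computed in a single pass."""
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    c = 0.0  # running sum of adjusted products
    for x, y in pairs:
        n += 1
        dx = x - mean_x            # deviation of x from the old mean of the x's
        mean_x += dx / n           # new mean of the x's
        mean_y += (y - mean_y) / n # new mean of the y's
        c += dx * (y - mean_y)     # (x - old x-mean) * (y - new y-mean)
    if n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    return c / (n - 1)


cov = one_pass_covariance(zip([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(cov)
```

Because `pairs` is consumed by a single `for` loop, any iterable works, including a generator that reads the data from a stream.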
References
1. B. P. Welford. "Note on a Method for Calculating Corrected Sums of Squares and Products". Technometrics, Vol. 4, No. 3 (Aug. 1962), pp. 419-420.
2. "Algorithms for calculating variance". Wikipedia, The Free Encyclopedia. Accessed 25 February 2007.
"one-pass algorithm to compute sample variance" is owned by stevecheng.
This is version 6 of one-pass algorithm to compute sample variance, born on 2007-02-24, modified 2007-04-11.
Object id is 8981, canonical name is OnePassAlgorithmToComputeSampleVariance.
Classification (AMS MSC):
- 62-00 (Statistics :: General reference works)
- 65-00 (Numerical analysis :: General reference works)
- 68W01 (Computer science :: Algorithms :: General)