PlanetMath

linear least squares fit (Definition)

One of the most common uses of least squares fitting is fitting a straight line to data. Whilst it is, in general, difficult to determine the curve which best fits the data, for a straight line there is a relatively simple closed-form formula.

Theorem 1   Suppose we have a data set $(x_1,y_1), \ldots, (x_n,y_n)$ in which at least two of the $x_k$ are distinct. Then the straight line which best fits this set in the least squares sense is given by $$ y = {ns - pq \over nr - p^2} x + {qr - ps \over nr - p^2} $$ where
$$ p = \sum_{k=1}^n x_k \tag{1} $$
$$ q = \sum_{k=1}^n y_k \tag{2} $$
$$ r = \sum_{k=1}^n x_k^2 \tag{3} $$
$$ s = \sum_{k=1}^n x_k y_k \tag{4} $$
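The formula of Theorem 1 can be evaluated directly from the four sums. The Python sketch below is illustrative only. The name `fit_line` and the error raised when $nr - p^2 = 0$ are assumptions, not part of the entry.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b, following Theorem 1."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    p = sum(xs)                              # (1) sum of x_k
    q = sum(ys)                              # (2) sum of y_k
    r = sum(x * x for x in xs)               # (3) sum of x_k^2
    s = sum(x * y for x, y in zip(xs, ys))   # (4) sum of x_k y_k
    d = n * r - p * p                        # common denominator nr - p^2
    if d == 0:
        raise ValueError("need at least two distinct x values")
    a = (n * s - p * q) / d                  # slope
    b = (q * r - p * s) / d                  # intercept
    return a, b

# Points lying exactly on y = 2x + 1.
print(fit_line([0, 1, 2], [1, 3, 5]))  # (2.0, 1.0)
```

Accumulating raw sums is the most literal reading of the theorem. However, it can lose precision when the $x_k$ are large and close together, because $nr - p^2$ then suffers from cancellation. Subtracting the mean of the $x_k$ before fitting is a common remedy.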

Proof. The best fitting line is the one that minimizes the merit function $M$, given by $$ M(a,b) = \sum_{k=1}^n (a x_k + b - y_k)^2 , $$ over the parameters $a$ and $b$. Expanding the square and summing term by term, this can be written as $$ M(a,b) = r a^2 + 2pab + nb^2 - 2sa - 2qb + t $$ where $p,q,r,s$ are as above and $$ t = \sum_{k=1}^n y_k^2 . $$
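As a numerical sanity check on the expansion, the sketch below evaluates $M$ in two ways. It computes $M$ once as the original sum of squares and once through the expanded form in $p, q, r, s, t$. The function names and the sample data are hypothetical.

```python
def merit(a, b, xs, ys):
    """M(a, b) computed directly as a sum of squared residuals."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def merit_expanded(a, b, xs, ys):
    """M(a, b) from the expanded form r a^2 + 2pab + n b^2 - 2sa - 2qb + t."""
    n = len(xs)
    p = sum(xs)
    q = sum(ys)
    r = sum(x * x for x in xs)
    s = sum(x * y for x, y in zip(xs, ys))
    t = sum(y * y for y in ys)
    return r * a**2 + 2 * p * a * b + n * b**2 - 2 * s * a - 2 * q * b + t

xs, ys = [0, 1, 2, 3], [1, 2, 2, 4]
print(merit(0.5, 1.0, xs, ys), merit_expanded(0.5, 1.0, xs, ys))  # prints 2.5 2.5
```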

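This function $M$ is a quadratic polynomial in $a$ and $b$. Its highest order part, $r a^2 + 2pab + n b^2 = \sum_{k=1}^n (a x_k + b)^2$, is a sum of squares and hence positive semidefinite. It is in fact positive definite, since $n > 0$ and its determinant $nr - p^2 = n \sum_{k=1}^n (x_k - \bar{x})^2$, where $\bar{x} = p/n$, is strictly positive whenever at least two of the $x_k$ are distinct. Hence $M$ has a unique minimum, and all that remains is to find it. To do this, we set the partial derivatives equal to zero to obtain the following equations: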

$$ 0 = {\partial M(a,b) \over \partial a} = 2ar + 2pb - 2s \tag{5} $$
$$ 0 = {\partial M(a,b) \over \partial b} = 2pa + 2nb - 2q \tag{6} $$

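Dividing by 2, these form the linear system $ra + pb = s$, $pa + nb = q$. Its determinant $nr - p^2$ is nonzero because the highest order part of $M$ is positive definite, so Cramer's rule gives
$$ a = {ns - pq \over nr - p^2} \tag{7} $$
$$ b = {qr - ps \over nr - p^2} ; \tag{8} $$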

substituting in the equation $y = ax + b$ for a straight line, we obtain the answer given above. $ \qedsymbol$
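The derivation can also be checked numerically. The sketch below solves equations (5) and (6) by Cramer's rule, which yields (7) and (8). It then confirms that both partial derivatives vanish at the result, up to rounding. The helper names and the sample data are hypothetical.

```python
def sums(xs, ys):
    """Return n, p, q, r, s as defined in (1)-(4)."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)))

def gradient(a, b, xs, ys):
    """The partial derivatives (5) and (6) of M at (a, b)."""
    n, p, q, r, s = sums(xs, ys)
    return 2 * a * r + 2 * p * b - 2 * s, 2 * p * a + 2 * n * b - 2 * q

def solve_normal_equations(xs, ys):
    """Solve r a + p b = s, p a + n b = q by Cramer's rule, giving (7) and (8)."""
    n, p, q, r, s = sums(xs, ys)
    d = n * r - p * p  # determinant of the 2x2 system
    return (n * s - p * q) / d, (q * r - p * s) / d

xs, ys = [0, 1, 2, 3], [1, 2, 2, 4]
a, b = solve_normal_equations(xs, ys)
print(a, b, gradient(a, b, xs, ys))  # the gradient is zero up to rounding
```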

Because of the ease with which one can make a least squares fit of a line, this technique is often adapted to fitting other sorts of curves by making a change of variables. Two common cases of this practice are power laws and exponentials.

Suppose that one wants to fit some data to a curve of the form $y = c e^{kx}$ with $c > 0$, where all the $y_k$ are positive. Making the change of variable $y = e^u$ and defining $b = \log c$, the equation of the curve becomes $u = kx + b$. One can therefore fit the data set $(x_1, \log y_1), \ldots, (x_n, \log y_n)$ to a straight line and recover $c = e^b$ from the intercept.
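A minimal sketch of the exponential case follows, assuming all $y_k > 0$. It takes logarithms of the $y_k$ and reuses the line fit of Theorem 1. The names `fit_line` and `fit_exponential` are illustrative, not from the entry.

```python
import math

def fit_line(xs, ys):
    """Least squares line y = a*x + b via Theorem 1 (assumes two distinct x values)."""
    n = len(xs)
    p, q = sum(xs), sum(ys)
    r = sum(x * x for x in xs)
    s = sum(x * y for x, y in zip(xs, ys))
    d = n * r - p * p
    return (n * s - p * q) / d, (q * r - p * s) / d

def fit_exponential(xs, ys):
    """Fit y = c * exp(k*x) by fitting a straight line to the points (x, log y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logarithms")
    k, b = fit_line(xs, [math.log(y) for y in ys])  # slope k, intercept b = log c
    return math.exp(b), k                            # recover c = e^b

# Exact data from y = 3 e^{0.5 x}; the fit recovers c and k up to rounding.
xs = [0, 1, 2, 3]
ys = [3 * math.exp(0.5 * x) for x in xs]
print(fit_exponential(xs, ys))
```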

Suppose that one wants to fit some data to a curve of the form $y = cx^m$, with $c > 0$ and all the $x_k$ and $y_k$ positive. The exponent is written $m$ because $p$ already denotes $\sum_{k=1}^n x_k$. Making the change of variables $x = e^v$, $y = e^u$ and defining $b = \log c$, the equation of the curve becomes $u = mv + b$. One can therefore fit the data set $(\log x_1, \log y_1), \ldots, (\log x_n, \log y_n)$ to a straight line.
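The power-law case works the same way on log-log data. This sketch assumes all $x_k$ and $y_k$ are positive. The exponent is called `m` in the code so it does not clash with the sum $p$ of Theorem 1, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least squares line y = a*x + b via Theorem 1 (assumes two distinct x values)."""
    n = len(xs)
    p, q = sum(xs), sum(ys)
    r = sum(x * x for x in xs)
    s = sum(x * y for x, y in zip(xs, ys))
    d = n * r - p * p
    return (n * s - p * q) / d, (q * r - p * s) / d

def fit_power_law(xs, ys):
    """Fit y = c * x**m by fitting a straight line to the points (log x, log y)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("all x and y values must be positive to take logarithms")
    m, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(b), m  # c = e^b, exponent m is the slope

# Exact data from y = 5 x^{1.5}; the fit recovers c and m up to rounding.
xs = [1, 2, 4, 8]
ys = [5 * x ** 1.5 for x in xs]
print(fit_power_law(xs, ys))
```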

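Although convenient and common, this procedure is only an approximation. Changing variables and making a least squares fit of a line is not the same as making a least squares fit to the curve itself. The reason is that the two merit functions are different and will not, in general, have their minima in the same place. For the exponential, for instance, the transformed fit minimizes $\sum_{j=1}^n (\log y_j - k x_j - b)^2$ rather than $\sum_{j=1}^n (y_j - c e^{k x_j})^2$. Roughly speaking, it measures relative rather than absolute deviations, since $\log y - \log \hat{y} \approx (y - \hat{y})/\hat{y}$ for small residuals. However, if the data happen to lie approximately on a power curve or an exponential, then the parameters obtained by changing variables and fitting will approximate the true least squares parameters. Depending on what one is doing, this approximation may be good enough, or one may use it as a starting point for an iterative algorithm that computes the true minimum.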
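To illustrate the last point, the sketch below starts from the change-of-variables fit of $y = c e^{kx}$. It then minimizes the curve's own merit function $\sum_j (y_j - c e^{k x_j})^2$ using damped Gauss-Newton steps. Gauss-Newton is one standard choice assumed here; the entry does not prescribe any particular algorithm. The function names and the sample data are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Least squares line y = a*x + b via Theorem 1 (assumes two distinct x values)."""
    n = len(xs)
    p, q = sum(xs), sum(ys)
    r = sum(x * x for x in xs)
    s = sum(x * y for x, y in zip(xs, ys))
    d = n * r - p * p
    return (n * s - p * q) / d, (q * r - p * s) / d

def exp_merit(c, k, xs, ys):
    """Merit function of the curve itself: sum of (c e^{kx} - y)^2."""
    return sum((c * math.exp(k * x) - y) ** 2 for x, y in zip(xs, ys))

def refine_exponential(xs, ys, iterations=20):
    """Start from the log-linear fit, then take damped Gauss-Newton steps on exp_merit."""
    k, b = fit_line(xs, [math.log(y) for y in ys])  # assumes every y > 0
    c = math.exp(b)
    for _ in range(iterations):
        # Columns of the Jacobian of c*e^{kx} with respect to (c, k), and the residuals.
        g1 = [math.exp(k * x) for x in xs]
        g2 = [c * x * math.exp(k * x) for x in xs]
        res = [y - c * e for y, e in zip(ys, g1)]
        # Normal equations (J^T J) delta = J^T res, solved by Cramer's rule.
        a11 = sum(u * u for u in g1)
        a12 = sum(u * v for u, v in zip(g1, g2))
        a22 = sum(v * v for v in g2)
        r1 = sum(u * e for u, e in zip(g1, res))
        r2 = sum(v * e for v, e in zip(g2, res))
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        dc = (r1 * a22 - a12 * r2) / det
        dk = (a11 * r2 - a12 * r1) / det
        # Halve the step until the true merit function does not increase.
        current = exp_merit(c, k, xs, ys)
        t = 1.0
        while t > 1e-10 and exp_merit(c + t * dc, k + t * dk, xs, ys) > current:
            t /= 2
        if t <= 1e-10:
            break  # no further progress possible at this precision
        c, k = c + t * dc, k + t * dk
    return c, k

xs = [0, 1, 2, 3, 4]
ys = [2.1, 3.2, 5.6, 8.9, 15.1]  # made-up data lying roughly on 2 e^{0.5 x}
k0, b0 = fit_line(xs, [math.log(y) for y in ys])
c, k = refine_exponential(xs, ys)
print(exp_merit(math.exp(b0), k0, xs, ys), exp_merit(c, k, xs, ys))
```

The step halving is there so that each accepted step never increases the curve's merit function. Without it, plain Gauss-Newton can overshoot when the starting point is poor.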




"linear least squares fit" is owned by rspuzio.

See Also: regression model, Gauss-Markov theorem



This is version 10 of linear least squares fit, born on 2007-07-17, modified 2007-07-18.
Object id is 9776, canonical name is LinearLeastSquaresFit.

Classification:
AMS MSC15-00 (Linear and multilinear algebra; matrix theory :: General reference works )
