extended discussion of the conjugate gradient method
Suppose we wish to solve the system

(1) $Ax = b$

where $A$ is a symmetric positive definite $n \times n$ matrix. If we define the function

(2) $\varphi(y) = \tfrac{1}{2}\, y^T A y - y^T b,$

we realise that solving (1) is equivalent to minimizing $\varphi$. This is because if $y$ is a minimum of $\varphi$ then we must have

(3) $\nabla\varphi(y) = Ay - b = 0.$
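To make the equivalence concrete, here is a small numerical sketch (assuming NumPy; the matrix, right-hand side and seed are arbitrary choices made only for illustration): the solution of (1) makes the gradient in (3) vanish, and $\varphi$ does not decrease at nearby perturbed points.

```python
import numpy as np

# Illustrative system: a randomly generated symmetric positive definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)   # symmetric positive definite
b = rng.standard_normal(4)

def phi(y):
    """The quadratic functional (2): phi(y) = 1/2 y^T A y - y^T b."""
    return 0.5 * y @ A @ y - y @ b

x = np.linalg.solve(A, b)     # exact solution of (1)

print(np.linalg.norm(A @ x - b))   # gradient (3) at the solution: ~ 0

# phi is not smaller at nearby perturbed points, consistent with x minimizing phi.
for _ in range(5):
    z = x + 1e-3 * rng.standard_normal(4)
    assert phi(z) >= phi(x)
```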
These considerations give rise to the steepest descent algorithm. Given an approximation $x_k$ to the solution $x$, the idea is to improve the approximation by moving in the direction in which $\varphi$ decreases most rapidly. This direction is the negative gradient of $\varphi$ at $x_k$, namely $-\nabla\varphi(x_k) = b - Ax_k$. Therefore we formulate our algorithm as follows.

Given an initial approximation $x_0$, for $k = 0, 1, 2, \dots$ set

$x_{k+1} = x_k + \alpha_k r_k,$

where $r_k = b - Ax_k$ and $\alpha_k$ is a scalar to be determined.

$r_k$ is traditionally called the residual vector. We wish to choose $\alpha_k$ in such a way that we reduce $\varphi$ as much as possible in each iteration; in other words, we wish to minimize the function $\alpha \mapsto \varphi(x_k + \alpha r_k)$ with respect to $\alpha$. Setting its derivative to zero gives

$\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}.$
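The following is a minimal sketch of this iteration (assuming NumPy; the function name, tolerance and iteration cap are illustrative choices, not part of the algorithm itself):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Minimize phi(y) = 1/2 y^T A y - y^T b by repeatedly moving along the
    residual r_k = b - A x_k with the exact line-search step derived above."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = b - A @ x                      # residual = direction of steepest descent
        if np.linalg.norm(r) < tol:        # stop once the residual is negligible
            break
        alpha = (r @ r) / (r @ (A @ r))    # minimizes phi(x + alpha r) over alpha
        x = x + alpha * r
    return x
```

For example, `steepest_descent(A, b, np.zeros(len(b)))` with the system from the previous snippet should converge to `np.linalg.solve(A, b)`, though on badly conditioned systems it may need a great many iterations.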
It is possible to show that the steepest descent algorithm described above converges to the solution $x$, but in general only asymptotically, i.e. in an infinite number of iterations. The conjugate gradient method improves on this by finding the exact solution after at most $n$ iterations. Let's see how we can achieve this.

We say that $x$ is optimal with respect to a direction $d \neq 0$ if $\lambda = 0$ is a local minimum for the function $\lambda \mapsto \varphi(x + \lambda d)$. Since $\frac{d}{d\lambda}\varphi(x + \lambda d)\big|_{\lambda=0} = d^T(Ax - b) = -d^T r$, this is the same as requiring $d^T r = 0$, where $r = b - Ax$.

In the steepest descent algorithm, $x_{k+1}$ is optimal with respect to $r_k$, but in general it is not optimal with respect to the previous directions $r_0, \dots, r_{k-1}$. If we could modify the algorithm so that optimality with respect to all the previous search directions is preserved, we might hope that $x_n$ is optimal with respect to $n$ linearly independent directions, at which point we would have found the exact solution.
Let's make the following modification:

(4) $x_{k+1} = x_k + \alpha_k d_k,$

(5) $d_{k+1} = r_{k+1} + \beta_k d_k, \qquad d_0 = r_0,$

where $\alpha_k$ and $\beta_k$ are scalar multipliers to be determined.

We choose $\alpha_k$ in the same way as before, i.e. such that $x_{k+1}$ is optimal with respect to $d_k$ (that is, $d_k^T r_{k+1} = 0$):

(6) $\alpha_k = \frac{d_k^T r_k}{d_k^T A d_k}.$
Now we wish to choose $\beta_k$ such that $x_{k+2}$ is also optimal with respect to $d_k$. We require that

(7) $d_k^T r_{k+2} = 0.$

Since $r_{k+2} = r_{k+1} - \alpha_{k+1} A d_{k+1}$, and assuming that $x_{k+1}$ is optimal with respect to $d_k$ (i.e. that $d_k^T r_{k+1} = 0$), we can rewrite this condition as

(8) $d_k^T A d_{k+1} = 0.$

Substituting the definition (5) of $d_{k+1}$, we obtain the required value for $\beta_k$,

(9) $\beta_k = -\frac{d_k^T A r_{k+1}}{d_k^T A d_k}.$
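Putting (4)-(6) and (9) together gives the following sketch of the resulting iteration (again assuming NumPy; the function name, stopping tolerance and the cap of $n$ steps are illustrative):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-12):
    """Iteration (4)-(5): x_{k+1} = x_k + alpha_k d_k, d_{k+1} = r_{k+1} + beta_k d_k,
    with alpha_k taken from (6), beta_k from (9), and d_0 = r_0."""
    n = len(b)
    x = x0.astype(float).copy()
    r = b - A @ x
    d = r.copy()                              # d_0 = r_0
    for _ in range(n):                        # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = (d @ r) / (d @ Ad)            # (6): makes x_{k+1} optimal w.r.t. d_k
        x = x + alpha * d
        r = r - alpha * Ad                    # r_{k+1} = r_k - alpha_k A d_k
        if np.linalg.norm(r) < tol:           # residual vanished: solution found
            break
        beta = -(d @ (A @ r)) / (d @ Ad)      # (9): enforces d_k^T A d_{k+1} = 0
        d = r + beta * d                      # (5)
    return x
```

In practice one usually rewrites $\alpha_k$ and $\beta_k$ in terms of $r_k^T r_k$ and $r_{k+1}^T r_{k+1}$ to save a matrix-vector product per step, but the form above follows the derivation literally.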
Now we want to show that $x_{k+2}$ is also optimal with respect to the earlier directions $d_j$, $j < k$, i.e. that

(10) $d_j^T r_{k+2} = 0, \qquad j < k.$

We do this by strong induction on $k$, assuming that for all $j < i \le k+1$, $x_i$ is optimal with respect to $d_j$, or equivalently that

(11) $d_j^T r_i = 0, \qquad j < i \le k+1.$

Noticing that $r_{k+2} = r_{k+1} - \alpha_{k+1} A d_{k+1}$, we want to show

(12) $d_j^T r_{k+1} - \alpha_{k+1}\, d_j^T A d_{k+1} = 0,$

and therefore, since $d_j^T r_{k+1} = 0$ by inductive hypothesis, it suffices to prove that

(13) $d_j^T A d_{k+1} = 0.$

But, again by the definition (5) of $d_{k+1}$, this is equivalent to proving

(14) $d_j^T A r_{k+1} + \beta_k\, d_j^T A d_k = 0,$

and since $d_j^T A d_k = 0$ by inductive hypothesis (write $\alpha_k A d_k = r_k - r_{k+1}$ and apply (11) to both terms), all we need to prove is that

(15) $d_j^T A r_{k+1} = 0.$
To proceed we require the following lemma.

Let $V_k = \operatorname{span}\{r_0, A r_0, \dots, A^k r_0\}$. Then

- • $r_k, d_k \in V_k$;

- • $v^T r_{k+1} = 0$ for every $v \in V_k$.

If $j < k$ then $d_j \in V_j$, and therefore $A d_j \in V_{j+1} \subseteq V_k$. Since $A d_j \in V_k$, it follows from the second point of the lemma that $(A d_j)^T r_{k+1} = 0$, and since $A$ is symmetric,

(16) $d_j^T A r_{k+1} = (A d_j)^T r_{k+1} = 0,$

which is exactly (15).
To finish, let's prove the lemma. The first point is by induction on $k$. The base case $r_0 = d_0 \in V_0$ holds. Assuming that $r_k, d_k \in V_k$, and noting that $d_k \in V_k$ implies $A d_k \in V_{k+1}$, we have that

(17) $r_{k+1} = r_k - \alpha_k A d_k \in V_{k+1} \qquad\text{and}\qquad d_{k+1} = r_{k+1} + \beta_k d_k \in V_{k+1}.$
For the second point we need an alternative characterization of $V_k$. Since $d_j \in V_j \subseteq V_k$ for every $j \le k$,

(18) $\operatorname{span}\{d_0, \dots, d_k\} \subseteq V_k.$

By (11), we have that if for some $j \le k$

(19) $d_j = c_0 d_0 + \dots + c_{j-1} d_{j-1},$

then $d_j^T r_j = c_0\, d_0^T r_j + \dots + c_{j-1}\, d_{j-1}^T r_j = 0$, but we know that

(20) $d_j^T r_j = (r_j + \beta_{j-1} d_{j-1})^T r_j = r_j^T r_j$

(for $j \ge 1$; the case $j = 0$ is immediate since $d_0 = r_0$), and were this zero, we'd have $r_j = 0$ and we would have solved the original problem. Thus we conclude that no $d_j$ is a linear combination of the preceding directions, so the vectors $d_0, \dots, d_k$ are linearly independent. It follows that $\dim\operatorname{span}\{d_0, \dots, d_k\} = k+1$, which on the other hand is the maximum possible dimension of $V_k$, and thus we must have

(21) $V_k = \operatorname{span}\{d_0, \dots, d_k\},$

which is the alternative characterization we were looking for. Now we have, again by (11), that if $v \in V_k$ and $v = c_0 d_0 + \dots + c_k d_k$, then

(22) $v^T r_{k+1} = c_0\, d_0^T r_{k+1} + \dots + c_k\, d_k^T r_{k+1} = 0,$

and thus the second point is proven.

This completes the induction: assuming no residual vanishes earlier (in which case we are already done), $x_n$ is optimal with respect to the $n$ linearly independent directions $d_0, \dots, d_{n-1}$, so $r_n$ is orthogonal to a basis of $\mathbb{R}^n$ and therefore $r_n = 0$, i.e. $x_n = x$.
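As a sanity check, here is a small numerical experiment (assuming NumPy; the system is randomly generated and all names are illustrative) that runs the iteration above while recording the search directions and residuals, and verifies the relations established in this discussion: the optimality relations $d_j^T r_i \approx 0$ for $j < i$, the $A$-conjugacy of the search directions, and the recovery of the exact solution after $n$ steps.

```python
import numpy as np

# Illustrative randomly generated symmetric positive definite system.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

# Run the conjugate gradient iteration (4)-(5), recording d_k and r_k.
x = np.zeros(n)
r = b - A @ x
d = r.copy()
ds, rs = [d.copy()], [r.copy()]
for _ in range(n):
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)            # (6)
    x = x + alpha * d
    r = r - alpha * Ad                    # r_{k+1} = r_k - alpha_k A d_k
    beta = -(d @ (A @ r)) / (d @ Ad)      # (9)
    d = r + beta * d                      # (5)
    ds.append(d.copy())
    rs.append(r.copy())

# d_j^T r_i = 0 for j < i, as in (10)/(11); the maximum should be ~ 0 up to rounding.
print(max(abs(ds[j] @ rs[i]) for i in range(1, n + 1) for j in range(i)))
# d_j^T A d_i = 0 for j != i, as in (8)/(13).
print(max(abs(ds[j] @ A @ ds[i]) for i in range(1, n) for j in range(i)))
# After n steps the exact solution is recovered.
print(np.linalg.norm(x - np.linalg.solve(A, b)))
```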