extended discussion of the conjugate gradient method
Suppose we wish to solve the system
$$Ax = b,$$
where $A$ is an $n \times n$ symmetric positive definite matrix and $b \in \mathbb{R}^n$. Since $A$ is symmetric positive definite, the quadratic function
$$\phi(x) = \tfrac{1}{2} x^T A x - x^T b$$
has gradient $\nabla \phi(x) = A x - b$ and a unique minimizer, so minimizing $\phi$ is equivalent to solving the original system.
These considerations give rise to the steepest descent algorithm. Given an approximation $x_k$ to the solution, the idea is to improve the approximation by moving in the direction in which $\phi$ decreases most rapidly. This direction is given by the negative gradient of $\phi$ at $x_k$, namely $-\nabla\phi(x_k) = b - A x_k$. Therefore we formulate our algorithm as follows.
Given an initial approximation $x_0$, for $k = 0, 1, 2, \ldots$ set
$$x_{k+1} = x_k + \alpha_k r_k,$$
where $r_k = b - A x_k$ and $\alpha_k$ is a scalar to be determined.
The vector $r_k$ is traditionally called the residual vector. We wish to choose $\alpha_k$ in such a way that we reduce $\phi$ as much as possible in each iteration; in other words, we wish to minimize the function $\alpha \mapsto \phi(x_k + \alpha r_k)$ with respect to $\alpha$. Setting its derivative $\alpha\, r_k^T A r_k - r_k^T r_k$ to zero yields
$$\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}.$$
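As a concrete illustration of the iteration just described, here is a minimal NumPy sketch (my own addition, not part of the original text; the function name `steepest_descent` and the stopping tolerance are illustrative choices):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Sketch of steepest descent for Ax = b with A symmetric positive definite."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        r = b - A @ x                 # residual r_k, the direction of steepest descent of phi
        rr = r @ r
        if np.sqrt(rr) < tol:         # stop once the residual is negligible
            break
        alpha = rr / (r @ (A @ r))    # exact minimization of phi along r_k
        x = x + alpha * r
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(steepest_descent(A, b, np.zeros(2)))   # approximately [0.0909, 0.6364]
```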
It’s possible to demonstrate that the steepest descent algorithm described above converges to the solution $x^\ast = A^{-1} b$, but in general only in the limit, after an infinite number of iterations. The conjugate gradient method improves on this by finding the exact solution after at most $n$ iterations. Let’s see how we can achieve this.
We say that $x$ is optimal with respect to a direction $d \neq 0$ if $\phi(x)$ is a local minimum of the function $\lambda \mapsto \phi(x + \lambda d)$. Since $\phi$ is a convex quadratic, this is equivalent to requiring that the residual at $x$ be orthogonal to $d$, i.e. that $d^T(b - Ax) = 0$.
In the steepest descent algorithm, $x_{k+1}$ is optimal with respect to $r_k$, but in general it is not optimal with respect to the earlier search directions $r_0, \ldots, r_{k-1}$. If we could modify the algorithm so that optimality with respect to all previous search directions is preserved, we might hope that $x_n$ is optimal with respect to $n$ linearly independent directions, at which point we would have found the exact solution.
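The optimality condition above is easy to test numerically. The following small sketch (the helper name `is_optimal_along` is mine, purely for illustration) checks optimality of an iterate with respect to a direction by testing orthogonality of the residual:

```python
import numpy as np

def is_optimal_along(A, b, x, d, tol=1e-12):
    """True if x is optimal w.r.t. direction d, i.e. the residual b - A x is orthogonal to d."""
    r = b - A @ x
    return abs(d @ r) <= tol * max(1.0, np.linalg.norm(d) * np.linalg.norm(r))
```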
Let’s make the following modification. Take $d_0 = r_0$ and, for $k = 0, 1, 2, \ldots$,
$$x_{k+1} = x_k + \alpha_k d_k, \qquad r_{k+1} = b - A x_{k+1}, \qquad d_{k+1} = r_{k+1} + \beta_k d_k,$$
where $\alpha_k$ and $\beta_k$ are scalar multipliers to be determined.
We choose $\alpha_k$ in the same way as before, i.e. such that $x_{k+1}$ is optimal with respect to $d_k$; the condition $d_k^T r_{k+1} = 0$ gives
$$\alpha_k = \frac{d_k^T r_k}{d_k^T A d_k}.$$
Now we wish to choose $\beta_k$ such that the next iterate, $x_{k+2}$, is also optimal with respect to $d_k$. We require that
$$d_k^T r_{k+2} = 0.$$
Since $r_{k+2} = r_{k+1} - \alpha_{k+1} A d_{k+1}$, and assuming that $x_{k+1}$ is optimal with respect to $d_k$ (i.e. that $d_k^T r_{k+1} = 0$), we can rewrite this condition as
$$d_k^T A d_{k+1} = 0,$$
and therefore, substituting $d_{k+1} = r_{k+1} + \beta_k d_k$, we obtain the required value for $\beta_k$,
$$\beta_k = -\frac{d_k^T A r_{k+1}}{d_k^T A d_k}.$$
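Collecting the updates and the formulas for $\alpha_k$ and $\beta_k$ gives the conjugate gradient iteration. The NumPy sketch below is only an illustration of the derivation above (the names are mine); practical implementations usually use algebraically equivalent but cheaper expressions for $\alpha_k$ and $\beta_k$:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Sketch of the conjugate gradient method in the form derived above."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x                          # residual r_0
    d = r.copy()                           # first search direction d_0 = r_0
    for _ in range(len(b)):                # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (d @ r) / (d @ Ad)         # alpha_k = d_k^T r_k / d_k^T A d_k
        x = x + alpha * d
        r = r - alpha * Ad                 # r_{k+1} = r_k - alpha_k A d_k
        beta = -(d @ (A @ r)) / (d @ Ad)   # beta_k = -d_k^T A r_{k+1} / d_k^T A d_k
        d = r + beta * d                   # d_{k+1} = r_{k+1} + beta_k d_k
    return x
```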
Now we want to show that $x_{k+1}$ is also optimal with respect to the earlier search directions, i.e. that
$$d_j^T r_{k+1} = 0 \qquad \text{for all } j < k.$$
We do this by strong induction on $k$, assuming that for all $j < k$, $x_k$ is optimal with respect to $d_j$, or equivalently that
$$d_j^T r_k = 0 \qquad \text{for all } j < k.$$
Noticing that $r_{k+1} = r_k - \alpha_k A d_k$, we want to show
$$d_j^T r_{k+1} = d_j^T r_k - \alpha_k\, d_j^T A d_k = 0,$$
and therefore, since $d_j^T r_k = 0$ by the inductive hypothesis, it suffices to prove that
$$d_j^T A d_k = 0.$$
For $j = k-1$ this is exactly the condition enforced by our choice of $\beta_{k-1}$, so we may assume $j < k-1$. But, again by the definition of $d_k = r_k + \beta_{k-1} d_{k-1}$, this is equivalent to proving
$$d_j^T A r_k + \beta_{k-1}\, d_j^T A d_{k-1} = 0,$$
and since $d_j^T A d_{k-1} = 0$ by the inductive hypothesis (this conjugacy was established at the previous step of the induction), all we need to prove is that
$$d_j^T A r_k = 0 \qquad \text{for } j < k-1.$$
To proceed we require the following lemma.
If $r_j \neq 0$ for $j = 0, \ldots, k$, then for all $j \leq k$ we have (1) $\mathrm{span}\{d_0, d_1, \ldots, d_j\} \subseteq \mathrm{span}\{r_0, A r_0, \ldots, A^j r_0\}$, and (2) $A d_j \in \mathrm{span}\{d_0, d_1, \ldots, d_{j+1}\}$ whenever $j < k$.

Granting the lemma, we can finish the induction. Since $A d_j \in \mathrm{span}\{d_0, d_1, \ldots, d_{j+1}\}$ by the second point, it follows that $d_j^T A r_k = (A d_j)^T r_k$ is a linear combination of the products $d_i^T r_k$ with $i \leq j+1 \leq k-1$, and since $d_i^T r_k = 0$ for all $i < k$ by the inductive hypothesis, we conclude that $d_j^T A r_k = 0$, as required.
To finish, let’s prove the lemma. The first point is by induction on $j$. The base case $\mathrm{span}\{d_0\} = \mathrm{span}\{r_0\}$ holds because $d_0 = r_0$. Assuming that $\mathrm{span}\{d_0, \ldots, d_j\} \subseteq \mathrm{span}\{r_0, A r_0, \ldots, A^j r_0\}$, we have that $r_j$ (which equals $d_j - \beta_{j-1} d_{j-1}$ for $j \geq 1$) also lies in this span, and hence $r_{j+1} = r_j - \alpha_j A d_j$ and $d_{j+1} = r_{j+1} + \beta_j d_j$ both lie in $\mathrm{span}\{r_0, A r_0, \ldots, A^{j+1} r_0\}$, since applying $A$ to an element of the old span raises the highest power of $A$ that appears by at most one.
For the second point we need an alternative characterization of $\mathrm{span}\{d_0, \ldots, d_j\}$: the inclusion of the first point is in fact an equality. Since the residuals are assumed nonzero, the directions $d_0, \ldots, d_j$ are linearly independent. Indeed, by the optimality of the iterates with respect to the previous search directions (available here from the inductive hypothesis), we have that if
$$\lambda_0 d_0 + \lambda_1 d_1 + \cdots + \lambda_j d_j = 0$$
for some scalars $\lambda_i$ that are not all zero, and $m$ is the largest index with $\lambda_m \neq 0$,
then taking the scalar product with $r_m$ gives $\lambda_m d_m^T r_m = 0$, since $d_i^T r_m = 0$ for $i < m$; but we know that
$$d_m^T r_m = (r_m + \beta_{m-1} d_{m-1})^T r_m = r_m^T r_m,$$
and were this zero, we’d have $r_m = 0$ and we would have solved the original problem. Thus we conclude that $\lambda_m = 0$, a contradiction, so the vectors $d_0, \ldots, d_j$ are linearly independent. It follows that $\mathrm{span}\{d_0, \ldots, d_j\}$ has dimension $j+1$, which on the other hand is the maximum possible dimension of $\mathrm{span}\{r_0, A r_0, \ldots, A^j r_0\}$, and thus we must have
$$\mathrm{span}\{d_0, \ldots, d_j\} = \mathrm{span}\{r_0, A r_0, \ldots, A^j r_0\},$$
which is the alternative characterization we were looking for. Now we have that if $j < k$, then $d_j \in \mathrm{span}\{r_0, A r_0, \ldots, A^j r_0\}$, and applying $A$ gives
$$A d_j \in \mathrm{span}\{A r_0, A^2 r_0, \ldots, A^{j+1} r_0\} \subseteq \mathrm{span}\{r_0, A r_0, \ldots, A^{j+1} r_0\} = \mathrm{span}\{d_0, d_1, \ldots, d_{j+1}\};$$
thus the second point is proven.
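As a numerical sanity check of the properties established above (this check is my own addition, not part of the original discussion), the following sketch runs the iteration on a small random symmetric positive definite system and verifies that each residual is orthogonal to all previous search directions, that the search directions are mutually $A$-conjugate, and that the exact solution is reached within $n$ iterations up to rounding error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
d = r.copy()
directions, residuals = [d.copy()], [r.copy()]

for _ in range(n):                     # n iterations suffice in exact arithmetic
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)
    x = x + alpha * d
    r = r - alpha * Ad
    beta = -(d @ (A @ r)) / (d @ Ad)
    d = r + beta * d
    residuals.append(r.copy())
    directions.append(d.copy())

# each residual is orthogonal to all earlier search directions ...
for k, rk in enumerate(residuals):
    assert all(abs(dj @ rk) < 1e-8 for dj in directions[:k])
# ... the search directions are mutually A-conjugate ...
for i in range(len(directions)):
    for j in range(i):
        assert abs(directions[i] @ (A @ directions[j])) < 1e-8
# ... and the exact solution is reached after n steps (up to rounding error)
assert np.linalg.norm(A @ x - b) < 1e-8
print("all checks passed")
```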