lecture notes on the Cayley-Hamilton theorem
1 Overview
You should all know about the characteristic polynomial of a square matrix, $A$. To calculate the characteristic polynomial of $A$, one subtracts a variable, say $x$, from the diagonal entries of $A$, and then takes the determinant of the result. In other words, letting $c_A(x)$ denote the characteristic polynomial of $A$, one has
\[
c_A(x) = \det(A - xI),
\]
where as usual $I$ denotes the identity matrix. For example, set
\[
A = \begin{pmatrix} 1 & 2 & 0\\ 0 & 1 & 1\\ 1 & 0 & 1 \end{pmatrix}.
\]
Evaluating the determinant of $A - xI$ one gets
\[
c_A(x) = \det\begin{pmatrix} 1-x & 2 & 0\\ 0 & 1-x & 1\\ 1 & 0 & 1-x \end{pmatrix} = -x^3 + 3x^2 - 3x + 3.
\]
Now the interesting thing about square matrices is that one can do algebra with them. So if $A$ is a $3\times 3$ matrix then $A^2$, $A^3$, indeed every power of $A$ will also be a $3\times 3$ matrix. Indeed, one can take any polynomial $p(x)$, and happily plug $A$ into it. The result, $p(A)$, will be some other $3\times 3$ matrix. The obvious question now is: what will happen when one plugs a square matrix into its own characteristic polynomial? Let's see what happens for the sample matrix $A$ above. Straightforward calculations show that
\[
A^2 = \begin{pmatrix} 1 & 4 & 2\\ 1 & 1 & 2\\ 2 & 2 & 1 \end{pmatrix},\qquad
A^3 = \begin{pmatrix} 3 & 6 & 6\\ 3 & 3 & 3\\ 3 & 6 & 3 \end{pmatrix}.
\]
Next, adding the various powers of $A$ with the coefficients of the characteristic polynomial (note that one uses the identity matrix $I$ in place of the constant term) one gets
\[
c_A(A) = -A^3 + 3A^2 - 3A + 3I = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}.
\]
Zero! One gets zero. This seemingly miraculous answer is not a coincidence. Indeed one gets zero regardless of what matrix one starts with. I encourage you to try this with a few examples of your own.
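If you would like to repeat this experiment on a computer, here is a minimal sketch in Python using SymPy; any computer algebra system would do just as well. (Note that SymPy's `charpoly` computes $\det(xI - A)$ rather than $\det(A - xI)$; for odd $n$ the two differ by a sign, which makes no difference when testing for zero.)

```python
# Verify c_A(A) = 0 for the sample matrix; then try matrices of your own.
from sympy import Matrix, symbols

x = symbols('x')
A = Matrix([[1, 2, 0],
            [0, 1, 1],
            [1, 0, 1]])

p = A.charpoly(x)     # det(x*I - A) = x**3 - 3*x**2 + 3*x - 3
n = A.shape[0]

# Plug A into the polynomial, using the identity matrix for the constant term.
result = Matrix.zeros(n, n)
for k, c in enumerate(reversed(p.all_coeffs())):   # constant term first
    result += c * A**k
print(result)         # Matrix([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
```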
Theorem 1 (Cayley-Hamilton).
Let $A$ be an $n\times n$ matrix, and let $c_A(x)$ be the corresponding characteristic polynomial. Then, $c_A(A) = 0$.
The goal of these notes will be to explain and prove the above theorem. There are various hidden reasons that make the Cayley-Hamilton theorem work. It is the purpose of these notes to bring these reasons into the open.
2 The Gist of the Matter
Indeed there are two factors that make the Cayley-Hamilton theorem such a striking and interesting result. Recall that if $V$ is an $n$-dimensional vector space, and $u_0, u_1, \ldots, u_n$ are any $n+1$ vectors in $V$, then there will be some kind of a linear relation between the $u_i$'s, i.e. for some choice of scalars $c_0, c_1, \ldots, c_n$, not all zero, one will have
\[
c_0 u_0 + c_1 u_1 + \cdots + c_n u_n = 0.
\]
Now the space of $3\times 3$ matrices is $9$-dimensional. Therefore for every $3\times 3$ matrix $A$ there must be a linear relationship between the ten different matrix powers
\[
I = A^0,\; A,\; A^2,\; \ldots,\; A^9.
\]
The “miracle” of the Cayley-Hamilton theorem is twofold. First, a linear relation arises already for the powers $I, A, A^2, A^3$. Second, the coefficients for this linear relation are precisely the coefficients of the characteristic polynomial of $A$.
Let’s put it another way. Look at the first column vectors of the matrices $I, A, A^2, A^3$, i.e. the vectors
\[
\begin{pmatrix} 1\\0\\0 \end{pmatrix},\quad
\begin{pmatrix} 1\\0\\1 \end{pmatrix},\quad
\begin{pmatrix} 1\\1\\2 \end{pmatrix},\quad
\begin{pmatrix} 3\\3\\3 \end{pmatrix}.
\]
Now $\mathbb{R}^3$ is a $3$-dimensional vector space, and so there should be a linear relation between the above four vectors. Indeed there is: the coefficients of the linear relation are $3, -3, 3, -1$ ( i.e. $3$ times the first vector, plus $-3$ times the second, plus $3$ times the third, plus $-1$ times the fourth is equal to zero — try it yourself!). What about the second column vectors of $I, A, A^2, A^3$? Now the vectors in question are
\[
\begin{pmatrix} 0\\1\\0 \end{pmatrix},\quad
\begin{pmatrix} 2\\1\\0 \end{pmatrix},\quad
\begin{pmatrix} 4\\1\\2 \end{pmatrix},\quad
\begin{pmatrix} 6\\3\\6 \end{pmatrix}.
\]
Again, we have here four vectors from a $3$-dimensional vector space, and therefore there should be a linear relation between the vectors. However by some miracle the coefficients of the linear relation for the second column vectors are the same as the coefficients of the linear relation between the first column vectors, namely $3, -3, 3, -1$. Furthermore, these coefficients are precisely the coefficients of the characteristic polynomial: $c_A(x) = 3 - 3x + 3x^2 - x^3$. Needless to say, the third column vectors are joined in a linear relation with the same coefficients:
\[
3\begin{pmatrix} 0\\0\\1 \end{pmatrix}
- 3\begin{pmatrix} 0\\1\\1 \end{pmatrix}
+ 3\begin{pmatrix} 2\\2\\1 \end{pmatrix}
- \begin{pmatrix} 6\\3\\3 \end{pmatrix}
= \begin{pmatrix} 0\\0\\0 \end{pmatrix}.
\]
Why is this happening?
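Here is a quick way to check all three column relations at once, sketched in NumPy; the matrix and the coefficients $3, -3, 3, -1$ are the ones from the discussion above.

```python
# Check that every column of I, A, A^2, A^3 obeys the same linear relation.
import numpy as np

A = np.array([[1, 2, 0],
              [0, 1, 1],
              [1, 0, 1]])
powers = [np.linalg.matrix_power(A, k) for k in range(4)]  # I, A, A^2, A^3
coeffs = [3, -3, 3, -1]   # constant first, matching c_A(x) = 3 - 3x + 3x^2 - x^3

for j in range(3):        # test each column separately
    combo = sum(c * P[:, j] for c, P in zip(coeffs, powers))
    print(f"column {j + 1}:", combo)   # [0 0 0] every time
```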
3 The Cyclic Basis
Let’s look again at the first column vectors of the matrices $I, A, A^2$ (recall that $A^0 = I$ is just the identity matrix):
\[
v_1 = \begin{pmatrix} 1\\0\\0 \end{pmatrix},\quad
v_2 = \begin{pmatrix} 1\\0\\1 \end{pmatrix},\quad
v_3 = \begin{pmatrix} 1\\1\\2 \end{pmatrix},
\]
and let’s take these vectors as a new basis, $B = \{v_1, v_2, v_3\}$. A basis obtained in this fashion, i.e. by starting with a vector and successively applying a matrix to it, is called a cyclic basis. What will be the representation of the matrix $A$ relative to this cyclic basis? Now $v_1$ is just the first elementary vector, $e_1$. Furthermore, note that $v_2$ is nothing but $Av_1$, and that $v_3 = Av_2$. Now $Av_3 = A^3e_1$ is the first column vector of $A^3$ and we already determined the linear relation between the first column vectors of $I, A, A^2, A^3$. The bottom line is that
\[
Av_1 = v_2,\qquad Av_2 = v_3,\qquad Av_3 = 3v_1 - 3v_2 + 3v_3,
\]
and consequently $A$ will have the following appearance relative to the basis $B$:
\[
[A]_B = \begin{pmatrix} 0 & 0 & 3\\ 1 & 0 & -3\\ 0 & 1 & 3 \end{pmatrix}.
\]
The transition matrix from $B$ to the standard basis is given by
\[
P = \begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1\\ 0 & 0 & 1\\ 0 & 1 & 2 \end{pmatrix}.
\]
Of course $P$ is relevant to our discussion precisely because
\[
[A]_B = P^{-1} A P.
\]
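As a sanity check, one can build $P$ from the cyclic basis and verify this similarity numerically; a minimal NumPy sketch:

```python
# Form the cyclic basis e1, A e1, A^2 e1 and check that P^{-1} A P is the
# companion-type matrix displayed above.
import numpy as np

A = np.array([[1, 2, 0],
              [0, 1, 1],
              [1, 0, 1]])
e1 = np.array([1.0, 0.0, 0.0])
P = np.column_stack([e1, A @ e1, A @ A @ e1])   # columns v1, v2, v3

print(np.round(np.linalg.inv(P) @ A @ P, 10))
# [[ 0.  0.  3.]
#  [ 1.  0. -3.]
#  [ 0.  1.  3.]]
```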
Proposition 1.
Let $A$ be an $n\times n$ matrix, $P$ a non-singular $n\times n$ matrix, and set $B = P^{-1}AP$. The matrices $A$ and $B$ have the same characteristic polynomial.
Proof.
Since $P^{-1}(A - xI)P = P^{-1}AP - xP^{-1}P = B - xI$, the multiplicative property of the determinant gives
\[
c_B(x) = \det(B - xI) = \det\!\left(P^{-1}(A - xI)P\right) = \det(P^{-1})\,\det(A - xI)\,\det(P) = \det(A - xI) = c_A(x). \qquad ∎
\]
In other words, according to the above proposition we should expect the characteristic polynomial of $[A]_B$ to be equal to the characteristic polynomial of $A$. Let's check this using a co-factor expansion:
\[
\det([A]_B - xI) = \det\begin{pmatrix} -x & 0 & 3\\ 1 & -x & -3\\ 0 & 1 & 3-x \end{pmatrix}
= -x\det\begin{pmatrix} -x & -3\\ 1 & 3-x \end{pmatrix} + 3\det\begin{pmatrix} 1 & -x\\ 0 & 1 \end{pmatrix}
= -x(x^2 - 3x + 3) + 3 = -x^3 + 3x^2 - 3x + 3.
\]
Also note that the last column of $[A]_B$ contains all but one of the coefficients of the characteristic polynomial. This too is not a coincidence.
Proposition 2.
Consider an $n\times n$ matrix $A$ such that the $j$-th column vector, for $j = 1, \ldots, n-1$, is the basic vector $e_{j+1}$, while the last column of $A$ is the vector $(a_1, a_2, \ldots, a_n)^T$. In other words, $A$ has the following form:
\[
A = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_1\\
1 & 0 & \cdots & 0 & a_2\\
0 & 1 & \cdots & 0 & a_3\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & a_n
\end{pmatrix}.
\]
Then, the characteristic polynomial of $A$ is given by
\[
c_A(x) = \det(A - xI) = (-1)^n\left(x^n - a_n x^{n-1} - a_{n-1}x^{n-2} - \cdots - a_2 x - a_1\right).
\]
Proof.
We will calculate the determinant of $A - xI$ by doing a co-factor expansion along the first row. Let $A_1$ be the matrix obtained by deleting the first row and the first column from $A - xI$, and let $B_1$ be the matrix obtained by deleting the first row and the last column from $A - xI$. Doing a co-factor expansion along the top row, it is easy to see that
\[
\det(A - xI) = -x\det(A_1) + (-1)^{n+1} a_1 \det(B_1).
\]
Now $B_1$ is an upper triangular matrix with ones on the diagonal, and therefore $\det(B_1) = 1$. The matrix $A_1$, on the other hand, has the same structure as $A - xI$, only it's one size smaller. To that end let $A_2$ be the matrix obtained by deleting the first two rows and columns from $A - xI$. By the same reasoning as above it's easy to see that
\[
\det(A_1) = -x\det(A_2) + (-1)^{n} a_2,
\]
and therefore
\[
\det(A - xI) = x^2\det(A_2) + (-1)^{n+1}\left(a_1 + a_2 x\right).
\]
Continuing inductively we see that for even $n$, the determinant of $A - xI$ will have the form:
\[
\det(A - xI) = x^n - a_n x^{n-1} - a_{n-1}x^{n-2} - \cdots - a_2 x - a_1.
\]
For odd $n$, $\det(A - xI)$ will be just like the formula above, but multiplied through by a negative sign. ∎
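Proposition 2 is easy to spot-check symbolically. Here is a sketch for $n = 4$ (an even case, so no overall sign change), using SymPy:

```python
# Symbolic spot-check of Proposition 2 for n = 4.
from sympy import Matrix, symbols, det, expand, eye

x, a1, a2, a3, a4 = symbols('x a1 a2 a3 a4')
A = Matrix([[0, 0, 0, a1],
            [1, 0, 0, a2],
            [0, 1, 0, a3],
            [0, 0, 1, a4]])

lhs = expand(det(A - x * eye(4)))
rhs = expand(x**4 - a4*x**3 - a3*x**2 - a2*x - a1)
print(lhs == rhs)   # True
```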
4 Putting it all together
Thanks to Propositions 1 and 2 we are now in a position to understand and to prove the Cayley-Hamilton Theorem. Let $A$ be an $n\times n$ matrix. Start by setting $v_1 = e_1$, and then create a sequence of vectors by successively applying $A$, i.e. $v_2 = Av_1$, $v_3 = Av_2$, etc. Notice that $v_k = A^{k-1}e_1$; in other words, $v_k$ is the first column of the matrix $A^{k-1}$.
Next, suppose that the vectors $v_1, v_2, \ldots, v_n$ form a basis, $B$, of $\mathbb{R}^n$. (There are matrices for which this doesn't happen, but we'll consider this possibility later.) There will therefore exist scalars $a_1, a_2, \ldots, a_n$ such that
\[
Av_n = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n.
\]
Now the representation of $A$ relative to the cyclic basis $B$ will have the form
\[
[A]_B = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_1\\
1 & 0 & \cdots & 0 & a_2\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & a_n
\end{pmatrix}.
\]
By Proposition 1 the characteristic polynomial of $A$ is equal to the characteristic polynomial of $[A]_B$. Furthermore, by Proposition 2 the characteristic polynomial of $[A]_B$ is equal to
\[
(-1)^n\left(x^n - a_n x^{n-1} - \cdots - a_2 x - a_1\right).
\]
Only one conclusion is possible: up to sign, $a_1, a_2, \ldots, a_n$ must be precisely the coefficients of the characteristic polynomial of $A$. Let us summarize these findings.
Proposition 3.
Let $A$ be an $n\times n$ matrix, with characteristic polynomial
\[
c_A(x) = \det(A - xI) = (-1)^n\left(x^n - a_n x^{n-1} - a_{n-1}x^{n-2} - \cdots - a_2 x - a_1\right).
\]
Fix a number $j$ between $1$ and $n$, and let $u_k$ be the $j$-th column of the matrix $A^{k-1}$; equivalently, $u_k = A^{k-1}e_j$. If the vectors $u_1, u_2, \ldots, u_n$ form a basis of $\mathbb{R}^n$, then the vectors $u_1, \ldots, u_{n+1}$ satisfy the linear relation:
\[
a_1 u_1 + a_2 u_2 + \cdots + a_n u_n - u_{n+1} = 0.
\]
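In computational terms, Proposition 3 says that solving the linear system $u_{n+1} = a_1 u_1 + \cdots + a_n u_n$ recovers the same coefficients no matter which column $j$ one starts from. A NumPy sketch, using the sample matrix from Section 1:

```python
# For each column j, express A^3 e_j in the cyclic basis e_j, A e_j, A^2 e_j
# and compare the resulting coefficients across columns.
import numpy as np

A = np.array([[1, 2, 0],
              [0, 1, 1],
              [1, 0, 1]])
n = 3
for j in range(n):
    e = np.zeros(n)
    e[j] = 1.0
    U = np.column_stack([np.linalg.matrix_power(A, k) @ e for k in range(n)])
    target = np.linalg.matrix_power(A, n) @ e
    a = np.linalg.solve(U, target)               # a1, a2, a3
    print(f"column {j + 1}:", np.round(a, 10))   # [ 3. -3.  3.] each time
```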
5 A Complication
We are almost done with the proof of the Cayley-Hamilton Theorem. First, however, we must deal with the possibility that the square matrix $A$ is such that the first column vectors of $I, A, \ldots, A^{n-1}$ do not form a basis. Consider, for example,
\[
A = \begin{pmatrix} 1 & -1 & 1\\ 1 & 3 & 0\\ 0 & 0 & 2 \end{pmatrix}.
\]
An easy calculation shows that the characteristic polynomial is given by
\[
c_A(x) = \det(A - xI) = -(x-2)^3 = -x^3 + 6x^2 - 12x + 8.
\]
Writing down the sequence of powers of $A$:
\[
I = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix},\quad
A = \begin{pmatrix} 1 & -1 & 1\\ 1 & 3 & 0\\ 0 & 0 & 2 \end{pmatrix},\quad
A^2 = \begin{pmatrix} 0 & -4 & 3\\ 4 & 8 & 1\\ 0 & 0 & 4 \end{pmatrix},\quad
A^3 = \begin{pmatrix} -4 & -12 & 6\\ 12 & 20 & 6\\ 0 & 0 & 8 \end{pmatrix},
\]
we notice that the first columns do, in fact, obey a linear relation with the coefficients of the characteristic polynomial:
\[
8\begin{pmatrix} 1\\0\\0 \end{pmatrix}
- 12\begin{pmatrix} 1\\1\\0 \end{pmatrix}
+ 6\begin{pmatrix} 0\\4\\0 \end{pmatrix}
- \begin{pmatrix} -4\\12\\0 \end{pmatrix}
= \begin{pmatrix} 0\\0\\0 \end{pmatrix}. \tag{1}
\]
However these first column vectors do not form a basis of $\mathbb{R}^3$ (notice that their third coordinates all vanish), and therefore Proposition 3 is not enough to explain why these vectors obey the above linear relation.
In order to find an explanation, let us proceed as follows. Just as before, start by setting $v_1 = e_1$, and $v_2 = Av_1 = (1,1,0)^T$. If we take $v_3 = Av_2 = (0,4,0)^T$, then $v_1, v_2, v_3$ will not form a basis, so instead, let us choose a $v_3$ that is linearly independent from $v_1$ and $v_2$, thereby ensuring that $B = \{v_1, v_2, v_3\}$ is a basis. There are many, many possible such choices for $v_3$. To keep the discussion concrete, let us take $v_3 = e_3$. Note that
\[
Av_1 = v_2,\qquad Av_2 = -4v_1 + 4v_2,\qquad Av_3 = v_1 + 2v_3.
\]
Therefore, representing $A$ relative to the basis $B$ we obtain
\[
[A]_B = \begin{pmatrix} 0 & -4 & 1\\ 1 & 4 & 0\\ 0 & 0 & 2 \end{pmatrix}.
\]
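The same numerical check as before confirms this representation; a short NumPy sketch:

```python
# Transition matrix for the basis v1 = e1, v2 = A e1, v3 = e3.
import numpy as np

A = np.array([[1, -1, 1],
              [1,  3, 0],
              [0,  0, 2]])
P = np.column_stack([[1, 0, 0], A @ [1, 0, 0], [0, 0, 1]])

print(np.round(np.linalg.inv(P) @ A @ P, 10))
# [[ 0. -4.  1.]
#  [ 1.  4.  0.]
#  [ 0.  0.  2.]]
```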
By Proposition 1, we know that the characteristic polynomial of $A$ is equal to the characteristic polynomial of $[A]_B$. However, we know much more.
Proposition 4.
Let $T$ be an $n\times n$ matrix of the form
\[
T = \begin{pmatrix} A & B\\ 0 & C \end{pmatrix},
\]
where $A$ is a $k\times k$ matrix, $B$ is a $k\times(n-k)$ matrix, and $C$ is an $(n-k)\times(n-k)$ matrix. Then, the characteristic polynomial of $T$ is the product of the characteristic polynomials of $A$ and $C$, i.e. $c_T(x) = c_A(x)\,c_C(x)$.
Proof.
Note that
\[
T - xI = \begin{pmatrix} A - xI_k & B\\ 0 & C - xI_{n-k} \end{pmatrix},
\]
where $I_k$ is the $k\times k$ identity matrix, and $I_{n-k}$ is the $(n-k)\times(n-k)$ identity matrix. The Proposition now follows from the fact that the determinant of a matrix whose shape is like that of $T - xI$ is the determinant of the upper-left block times the determinant of the lower-right block:
\[
\det(T - xI) = \det(A - xI_k)\,\det(C - xI_{n-k}) = c_A(x)\,c_C(x).
\]
∎
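Applied to our $[A]_B$, Proposition 4 can be spot-checked with SymPy as follows; the block slicing here is purely for illustration.

```python
# The characteristic polynomial of the block-triangular matrix factors as
# the product of the characteristic polynomials of its diagonal blocks.
from sympy import Matrix, symbols, det, expand, eye, factor

x = symbols('x')
T = Matrix([[0, -4, 1],
            [1,  4, 0],
            [0,  0, 2]])
A1 = T[0:2, 0:2]    # upper-left 2x2 block
A2 = T[2:3, 2:3]    # lower-right 1x1 block

lhs = expand(det(T - x * eye(3)))
rhs = expand(det(A1 - x * eye(2)) * det(A2 - x * eye(1)))
print(factor(lhs), lhs == rhs)   # -(x - 2)**3 True
```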
Thanks to Proposition 4 we know that the characteristic polynomial of $[A]_B$ is a product of the characteristic polynomial of the matrix
\[
A_1 = \begin{pmatrix} 0 & -4\\ 1 & 4 \end{pmatrix},
\]
and the characteristic polynomial of the $1\times 1$ matrix $(2)$. In other words,
\[
c_A(x) = (x^2 - 4x + 4)(2 - x).
\]
Furthermore by Proposition 2 we know that $e_1$, $Ae_1$, $A^2e_1$, i.e. the first column vectors of $I, A, A^2$, obey a linear relation with the coefficients of the polynomial $p(x) = x^2 - 4x + 4$:
\[
4\begin{pmatrix} 1\\0\\0 \end{pmatrix}
- 4\begin{pmatrix} 1\\1\\0 \end{pmatrix}
+ \begin{pmatrix} 0\\4\\0 \end{pmatrix}
= \begin{pmatrix} 0\\0\\0 \end{pmatrix}. \tag{2}
\]
Multiplying this relation through by $A$ we deduce that the first column vectors of $A, A^2, A^3$ obey the same linear relation:
\[
4\begin{pmatrix} 1\\1\\0 \end{pmatrix}
- 4\begin{pmatrix} 0\\4\\0 \end{pmatrix}
+ \begin{pmatrix} -4\\12\\0 \end{pmatrix}
= \begin{pmatrix} 0\\0\\0 \end{pmatrix}. \tag{3}
\]
Next think about what it means to multiply a polynomial such as $p(x) = x^2 - 4x + 4$ by another polynomial such as $q(x) = 2 - x$. Indeed, one can structure the multiplication by multiplying the first polynomial through by $2$, then multiplying it through by $-x$, and then adding the two terms:
\[
\begin{aligned}
2\,p(x) &= 2x^2 - 8x + 8\\
-x\,p(x) &= -x^3 + 4x^2 - 4x\\
(2 - x)\,p(x) &= -x^3 + 6x^2 - 12x + 8.
\end{aligned}
\]
The bottom line is, of course, just the characteristic polynomial of $A$, and the whole idea behind the above calculation is that $c_A(x)$ can be “formed out of” the polynomial $p(x)$. This shows that we can combine relations (2) and (3) and produce in the end the desired relation (1). All we have to do is take $2$ times the relation (2), and subtract from it the relation (3). This explains why the first column vectors of $I, A, A^2, A^3$ obey a linear relation whose coefficients come from the characteristic polynomial of $A$.
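The bookkeeping is easy to mirror in code: record each relation by its vector of coefficients on $e_1, Ae_1, A^2e_1, A^3e_1$, and combine. A tiny NumPy sketch:

```python
# Combining relation (2) and relation (3) with weights 2 and -1
# reproduces the coefficients of relation (1).
import numpy as np

rel2 = np.array([4, -4, 1, 0])   # coefficients on e1, A e1, A^2 e1, A^3 e1
rel3 = np.array([0, 4, -4, 1])   # relation (2), shifted once by A
print(2 * rel2 - rel3)           # [  8 -12   6  -1]
```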
Proposition 5.
Let $A$ be an $n\times n$ matrix, with characteristic polynomial
\[
c_A(x) = \det(A - xI) = (-1)^n\left(x^n - a_n x^{n-1} - a_{n-1}x^{n-2} - \cdots - a_2 x - a_1\right).
\]
Fix a number $j$ between $1$ and $n$, and let $u_k = A^{k-1}e_j$ be the $j$-th column of the matrix $A^{k-1}$. The vectors $u_1, u_2, \ldots, u_{n+1}$ satisfy the linear relation:
\[
a_1 u_1 + a_2 u_2 + \cdots + a_n u_n - u_{n+1} = 0,
\]
even if the vectors $u_1, \ldots, u_n$ do not form a basis of $\mathbb{R}^n$.
Proof.
If the vectors $u_1, \ldots, u_n$ are linearly independent, then Proposition 3 already applies, so suppose that there is a number $k < n$ such that $u_{k+1} = A^k e_j$ can be given as a linear combination of $u_1, \ldots, u_k$; let's say
\[
A^k e_j = b_1 e_j + b_2 Ae_j + \cdots + b_k A^{k-1}e_j.
\]
Choose vectors $v_{k+1}, \ldots, v_n$ so that the list
\[
B\colon\; e_j,\; Ae_j,\; \ldots,\; A^{k-1}e_j,\; v_{k+1},\; \ldots,\; v_n
\]
forms a basis of $\mathbb{R}^n$. Relative to this basis, $A$ will have the form
\[
[A]_B = \begin{pmatrix} A_1 & B_1\\ 0 & A_2 \end{pmatrix},
\]
where the upper-left $k\times k$ block has the form
\[
A_1 = \begin{pmatrix}
0 & 0 & \cdots & 0 & b_1\\
1 & 0 & \cdots & 0 & b_2\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & b_k
\end{pmatrix}.
\]
By Proposition 2,
\[
p(x) = \det(A_1 - xI) = (-1)^k\left(x^k - b_k x^{k-1} - \cdots - b_2 x - b_1\right).
\]
Let $q(x) = q_0 + q_1 x + \cdots + q_{n-k}x^{n-k}$ denote the characteristic polynomial of $A_2$. By Proposition 4, the characteristic polynomial of $[A]_B$ (which is equal to the characteristic polynomial of $A$) is the product $p(x)q(x)$, and therefore can be obtained by taking linear combinations of $p(x)$ times various powers of $x$:
\[
c_A(x) = q(x)\,p(x) = q_0\,p(x) + q_1\,x\,p(x) + \cdots + q_{n-k}\,x^{n-k}\,p(x).
\]
Corresponding to the above polynomials are the relations
\[
p(A)e_j = 0,\quad A\,p(A)e_j = 0,\quad \ldots,\quad A^{n-k}\,p(A)e_j = 0,
\]
where the first relation restates the definition of $b_1, \ldots, b_k$, and each subsequent relation is obtained from it by multiplying through by $A$.
Adding these relations in the same way as the polynomials yields the desired relation:
\[
c_A(A)e_j = q_0\,p(A)e_j + q_1\,A\,p(A)e_j + \cdots + q_{n-k}\,A^{n-k}\,p(A)e_j = 0.
\]
∎
Now we really are finished. Thanks to Propositions 3 and 5 we know that for every $j = 1, \ldots, n$, the $j$-th column vectors of the matrices
\[
I,\; A,\; A^2,\; \ldots,\; A^n
\]
obey a linear relation with the coefficients of the characteristic polynomial of $A$. Since this is true for every column of the above matrices, it is certainly true for the full matrices — and $c_A(A) = 0$ is precisely the conclusion of the Cayley-Hamilton theorem.
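As a closing check, here is a sketch that plugs a handful of random integer matrices into their own characteristic polynomials, using SymPy; the output is `True` every time, just as the theorem promises.

```python
# Random test of the Cayley-Hamilton theorem.
import random
from sympy import Matrix, symbols

x = symbols('x')
random.seed(0)
for trial in range(5):
    n = random.randint(2, 4)
    A = Matrix(n, n, lambda i, j: random.randint(-5, 5))
    p = A.charpoly(x)
    value = sum((c * A**k for k, c in enumerate(reversed(p.all_coeffs()))),
                Matrix.zeros(n, n))
    print(value.is_zero_matrix)   # True
```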