# lecture notes on the Cayley-Hamilton theorem

## 1 Overview

You should all know about the characteristic polynomial of a square
matrix, $A$. To calculate the characteristic polynomial of $A$, one
subtracts a variable, say $t$, from the diagonal entries of $A$, and
then takes the determinant of the result. In other words, letting
${p}_{A}(t)$ denote the characteristic polynomial of $A$, one has

$${p}_{A}(t)=det(A-tI),$$

where $I$ as usual denotes the identity matrix.
For example, set

$$A=\left[\begin{array}{ccc}\hfill 1& \hfill 2& \hfill 3\\ \hfill 0& \hfill -2& \hfill 1\\ \hfill 1& \hfill 1& \hfill 0\end{array}\right].$$

Evaluating the determinant of $A-tI$ one gets

$${p}_{A}(t)=-{t}^{3}-{t}^{2}+6t+7.$$
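These coefficients are easy to check numerically. A minimal sketch using NumPy; note that `np.poly` returns the coefficients of $det(tI-A)$, which differs from the notes' $det(A-tI)$ by the sign ${(-1)}^{n}$:

```python
import numpy as np

# The example matrix from the notes.
A = np.array([[1, 2, 3],
              [0, -2, 1],
              [1, 1, 0]])

# np.poly returns the coefficients of det(tI - A), highest power first;
# the notes define p_A(t) = det(A - tI), so for a 3x3 matrix the signs flip.
coeffs = -np.poly(A)
print(coeffs)   # approximately [-1, -1, 6, 7]
```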

Now the interesting thing about square matrices is that one can do
algebra with them. So if $A$ is a $3\times 3$ matrix then ${A}^{2}$,
${A}^{3}$, indeed every power of $A$ will also be a $3\times 3$ matrix.
Indeed, one can take any polynomial $p(t)$, and happily plug $A$ into
it. The result will be some other $3\times 3$ matrix. The obvious
question now is: what will happen when one plugs a square matrix $A$
into its own characteristic polynomial? Let’s see what happens for
the sample $3\times 3$ matrix above. Straightforward calculations show
that

$${A}^{2}=\left[\begin{array}{ccc}\hfill 4& \hfill 1& \hfill 5\\ \hfill 1& \hfill 5& \hfill -2\\ \hfill 1& \hfill 0& \hfill 4\end{array}\right],\qquad{A}^{3}=\left[\begin{array}{ccc}\hfill 9& \hfill 11& \hfill 13\\ \hfill -1& \hfill -10& \hfill 8\\ \hfill 5& \hfill 6& \hfill 3\end{array}\right].$$

Next, adding the various powers of $A$ weighted by the coefficients of the characteristic polynomial (note that one uses the identity matrix in place of the constant term), one gets

$$\begin{array}{cccc}\hfill -{A}^{3}\hfill & \hfill -{A}^{2}\hfill & \hfill 6A\hfill & \hfill 7I\hfill \\ & & & \\ \hfill -\left[\begin{array}{ccc}\hfill 9& \hfill 11& \hfill 13\\ \hfill -1& \hfill -10& \hfill 8\\ \hfill 5& \hfill 6& \hfill 3\end{array}\right]\hfill & \hfill -\left[\begin{array}{ccc}\hfill 4& \hfill 1& \hfill 5\\ \hfill 1& \hfill 5& \hfill -2\\ \hfill 1& \hfill 0& \hfill 4\end{array}\right]\hfill & \hfill +6\left[\begin{array}{ccc}\hfill 1& \hfill 2& \hfill 3\\ \hfill 0& \hfill -2& \hfill 1\\ \hfill 1& \hfill 1& \hfill 0\end{array}\right]\hfill & \hfill +7\left[\begin{array}{ccc}\hfill 1& \hfill 0& \hfill 0\\ \hfill 0& \hfill 1& \hfill 0\\ \hfill 0& \hfill 0& \hfill 1\end{array}\right]\hfill \end{array}$$

$$=\left[\begin{array}{ccc}\hfill 0& \hfill 0& \hfill 0\\ \hfill 0& \hfill 0& \hfill 0\\ \hfill 0& \hfill 0& \hfill 0\end{array}\right].$$

Zero! One gets zero. This seemingly miraculous answer is not a coincidence. Indeed one gets zero regardless of what matrix one starts with. I encourage you to try this with a few examples of your own.
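The computation above can be replayed in a few lines of NumPy; the sketch below plugs the sample matrix into its own characteristic polynomial:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 1],
              [1, 1, 0]])
I = np.eye(3)

A2 = A @ A
A3 = A2 @ A

# Evaluate p_A(A) = -A^3 - A^2 + 6A + 7I, with I standing in for the constant.
result = -A3 - A2 + 6 * A + 7 * I
print(result)   # the 3x3 zero matrix
```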

###### Theorem 1 (Cayley-Hamilton).

Let $A$ be an $n\times n$ matrix, and let ${p}_{A}(t)=det(A-tI)$ be the corresponding characteristic polynomial. Then ${p}_{A}(A)=0$.

The goal of these notes will be to explain and prove the above
theorem. There are various hidden reasons that make the
Cayley-Hamilton theorem work. It is the purpose of these notes to
bring these reasons into the open.

## 2 The Gist of the Matter.

Indeed, there are two factors that make the Cayley-Hamilton theorem
such a striking and interesting result. Recall that if $U$ is an
$n$-dimensional vector space, and ${\mathbf{u}}_{1},{\mathbf{u}}_{2},{\mathbf{u}}_{3},\mathrm{\dots},{\mathbf{u}}_{n+1}$ are any $n+1$ vectors in $U$, then there must be
a linear relation between the ${\mathbf{u}}_{i}$’s, i.e. for some choice of
scalars ${a}_{1},{a}_{2},\mathrm{\dots},{a}_{n+1}$, not all zero, one will have

$${a}_{1}{\mathbf{u}}_{1}+{a}_{2}{\mathbf{u}}_{2}+{a}_{3}{\mathbf{u}}_{3}+\mathrm{\dots}+{a}_{n+1}{\mathbf{u}}_{n+1}=0.$$

Now the space of $3\times 3$ matrices is $9$-dimensional. Therefore, for every $3\times 3$ matrix $A$ there must be a linear relationship between the $10$ different matrix powers

$${A}^{9},{A}^{8},{A}^{7},{A}^{6},{A}^{5},{A}^{4},{A}^{3},{A}^{2},{A}^{1}=A,{A}^{0}=I.$$

The “miracle” of the Cayley-Hamilton theorem is twofold. First, a linear relation arises already for the powers ${A}^{3},{A}^{2},{A}^{1},{A}^{0}$. Second, the coefficients for this linear relation are precisely the coefficients of the characteristic polynomial of $A$.
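This dimension count is easy to test numerically. The sketch below (assuming NumPy) flattens the ten powers into vectors of length $9$ and checks their rank:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 1],
              [1, 1, 0]])

# Flatten A^0, ..., A^9 into ten vectors living in the 9-dimensional
# space of 3x3 matrices; ten vectors there must be linearly dependent.
powers = np.array([np.linalg.matrix_power(A, k).ravel() for k in range(10)])
rank = np.linalg.matrix_rank(powers)
print(rank)   # at most 9, so some linear relation must exist
```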

Let’s put it another way. Look at the first column vectors of the
matrices ${A}^{3},{A}^{2},{A}^{1},{A}^{0}$, i.e. the vectors

$$\left[\begin{array}{c}\hfill 9\\ \hfill -1\\ \hfill 5\end{array}\right],\left[\begin{array}{c}\hfill 4\\ \hfill 1\\ \hfill 1\end{array}\right],\left[\begin{array}{c}\hfill 1\\ \hfill 0\\ \hfill 1\end{array}\right],\left[\begin{array}{c}\hfill 1\\ \hfill 0\\ \hfill 0\end{array}\right].$$

Now ${\mathbb{R}}^{3}$ is a $3$-dimensional vector space, and so there should be a linear relation between the above $4$ vectors. Indeed there is: the coefficients of the linear relation are $-1,-1,6,7$ (i.e. $-1$ times the first vector, plus $-1$ times the second, plus $6$ times the third, plus $7$ times the fourth is equal to zero — try it yourself!). What about the second column vectors of ${A}^{3},{A}^{2},{A}^{1},{A}^{0}$? Now the vectors in question are

$$\left[\begin{array}{c}\hfill 11\\ \hfill -10\\ \hfill 6\end{array}\right],\left[\begin{array}{c}\hfill 1\\ \hfill 5\\ \hfill 0\end{array}\right],\left[\begin{array}{c}\hfill 2\\ \hfill -2\\ \hfill 1\end{array}\right],\left[\begin{array}{c}\hfill 0\\ \hfill 1\\ \hfill 0\end{array}\right].$$

Again, we have here $4$ vectors from a $3$-dimensional vector space, and therefore there should be a linear relation between the vectors. However, by some miracle, the coefficients of the linear relation for the second column vectors are the same as the coefficients of the linear relation between the first column vectors, namely $-1,-1,6,7$. Furthermore, these coefficients are precisely the coefficients of the characteristic polynomial: $-{t}^{3}-{t}^{2}+6t+7$. Needless to say, the third column vectors are joined in a linear relation with the same coefficients: $-1,-1,6,7$. Why is this happening?
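The three column relations can be checked in one loop; a short NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 1],
              [1, 1, 0]])

# The powers A^3, A^2, A^1, A^0, paired with coefficients -1, -1, 6, 7.
powers = [np.linalg.matrix_power(A, k) for k in (3, 2, 1, 0)]
coeffs = [-1, -1, 6, 7]

# Column by column, the same linear relation holds.
residuals = [sum(c * M[:, j] for c, M in zip(coeffs, powers)) for j in range(3)]
print(residuals)   # three zero vectors
```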

## 3 The Cyclic Basis

Let’s look again at the first column vectors of the matrices ${A}^{0},{A}^{1},{A}^{2}$ (recall that ${A}^{0}$ is just the identity matrix):

$${\mathbf{u}}_{0}=\left[\begin{array}{c}\hfill 1\\ \hfill 0\\ \hfill 0\end{array}\right],{\mathbf{u}}_{1}=\left[\begin{array}{c}\hfill 1\\ \hfill 0\\ \hfill 1\end{array}\right],{\mathbf{u}}_{2}=\left[\begin{array}{c}\hfill 4\\ \hfill 1\\ \hfill 1\end{array}\right],$$

and let’s take these $3$ vectors as a new basis, $\mathbf{B}$. A basis obtained in this fashion, i.e. by starting with a vector and successively applying a matrix to it, is called a cyclic basis. What will be the representation of the matrix $A$ relative to this cyclic basis? Now ${\mathbf{u}}_{0}$ is just the first elementary vector, ${\mathbf{e}}_{1}$. Furthermore, note that ${\mathbf{u}}_{1}$ is nothing but $A{\mathbf{u}}_{0}$, and that ${\mathbf{u}}_{2}=A{\mathbf{u}}_{1}$. Now $A{\mathbf{u}}_{2}$ is the first column vector of ${A}^{3}$, and we have already determined the linear relation between the first column vectors of ${A}^{3},\mathrm{\dots},{A}^{0}$. The bottom line is that

$$A{\mathbf{u}}_{2}=-({\mathbf{u}}_{2}-6{\mathbf{u}}_{1}-7{\mathbf{u}}_{0}),$$

and consequently $A$ will have the following appearance relative to the basis $\mathbf{B}$:

$${[A]}_{\mathbf{B}}=\left[\begin{array}{ccc}\hfill 0& \hfill 0& \hfill 7\\ \hfill 1& \hfill 0& \hfill 6\\ \hfill 0& \hfill 1& \hfill -1\end{array}\right].$$

The transition matrix $P$ from $\mathbf{B}$ to the standard basis ${\mathbf{e}}_{1},{\mathbf{e}}_{2},{\mathbf{e}}_{3}$ is given by

$$P=\left[\begin{array}{ccc}\hfill 1& \hfill 1& \hfill 4\\ \hfill 0& \hfill 0& \hfill 1\\ \hfill 0& \hfill 1& \hfill 1\end{array}\right].$$

Of course $P$ is relevant to our discussion precisely because

$${[A]}_{\mathbf{B}}={P}^{-1}AP.$$
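The change of basis can be verified numerically; the following NumPy sketch builds the cyclic basis, forms $P$, and recovers the matrix ${[A]}_{\mathbf{B}}$:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 1],
              [1, 1, 0]])

# Cyclic basis: u0 = e1, u1 = A u0, u2 = A u1.
u0 = np.array([1.0, 0.0, 0.0])
u1 = A @ u0
u2 = A @ u1
P = np.column_stack([u0, u1, u2])   # transition matrix from B to the standard basis

A_B = np.linalg.inv(P) @ A @ P      # representation of A relative to B
print(np.round(A_B))                # matches [A]_B computed in the notes
```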

###### Proposition 1.

Let $A$ be an $n\times n$ matrix, $P$ a non-singular $n\times n$ matrix, and set $B={P}^{-1}AP$. Then the matrices $A$ and $B$ have the same characteristic polynomial.

###### Proof.

The characteristic polynomial of $B$ is given as

$$det(B-tI)=det\left({P}^{-1}AP-tI\right)=det\left({P}^{-1}(A-tI)P\right).$$

Recall that the determinant of a product is the product of the
determinants, and that the determinant of an inverse is the inverse of
the determinant. Therefore

$$det(B-tI)=det({P}^{-1})det(A-tI)det(P)=det(A-tI).$$

∎
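Proposition 1 is easy to spot-check numerically on a random example (a NumPy sketch; the random matrices here are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)

P = rng.integers(-3, 4, size=(4, 4)).astype(float)
while abs(np.linalg.det(P)) < 0.5:            # re-draw until P is non-singular
    P = rng.integers(-3, 4, size=(4, 4)).astype(float)

B = np.linalg.inv(P) @ A @ P
# Similar matrices share a characteristic polynomial.
print(np.poly(A))
print(np.poly(B))   # same coefficients, up to round-off
```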

In other words, according to the above proposition, we should expect the characteristic polynomial of ${[A]}_{\mathbf{B}}$ to be equal to the characteristic polynomial of $A$. Let’s check this using a co-factor expansion.

$$\left|\begin{array}{ccc}\hfill -t\hfill & \hfill 0\hfill & \hfill 7\hfill \\ \hfill 1\hfill & \hfill -t\hfill & \hfill 6\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill -1-t\hfill \end{array}\right|=-t({t}^{2}+t-6)+7\cdot 1=-{t}^{3}-{t}^{2}+6t+7$$

Also note that the last column of ${[A]}_{\mathbf{B}}$ contains all but one of the coefficients of the characteristic polynomial. This too is not a coincidence.

###### Proposition 2.

Consider an $n\times n$ matrix $B$ such that the ${j}^{\text{th}}$ column vector, for $j=1,2,\mathrm{\dots},n-1$, is the basic vector ${\mathbf{e}}_{j+1}$, while the last column of $B$ is the vector ${[-{b}_{0},-{b}_{1},\mathrm{\dots},-{b}_{n-1}]}^{T}$. In other words, $B$ has the following form:

$$B=\left[\begin{array}{cccccc}\hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{0}\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{1}\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{2}\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{3}\hfill \\ \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\ddots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 1\hfill & \hfill -{b}_{n-1}\hfill \end{array}\right]$$

Then, the characteristic polynomial of $B$ is given by

$${(-1)}^{n}{p}_{B}(t)={t}^{n}+{b}_{n-1}{t}^{n-1}+\mathrm{\dots}+{b}_{2}{t}^{2}+{b}_{1}t+{b}_{0}.$$

###### Proof.

We will calculate the determinant of $B-tI$ by doing a co-factor expansion along the first row. Let ${B}_{1}$ be the matrix obtained by deleting the first row and the first column from $B-tI$, and let $D$ be the matrix obtained by deleting the first row and the last column from $B-tI$. It is easy to see that

$$det(B-tI)=-t\,det({B}_{1})+{(-1)}^{n}{b}_{0}\,det(D).$$

Now $D$ is an
upper triangular matrix with ones on the diagonal, and therefore
$det(D)=1$. The matrix ${B}_{1}$, on the other hand, has the same
structure as $B-tI$, only one size smaller. To that end, let
${B}_{2}$ be the matrix obtained by deleting the first two rows and
columns from $B-tI$. By the same reasoning as above, it is easy to
see that

$$det({B}_{1})=-t\,det({B}_{2})+{(-1)}^{n-1}{b}_{1},$$

and therefore

$$det(B-tI)={(-1)}^{n}{b}_{0}-t\left({(-1)}^{n-1}{b}_{1}-t\,det({B}_{2})\right).$$

Continuing inductively we see that for even $n$, the determinant of $B-tI$ will have the form:

$${b}_{0}-t\left(-{b}_{1}-t\left({b}_{2}-t\left(-{b}_{3}-t(\mathrm{\dots})\right)\right)\right)={b}_{0}+{b}_{1}t+{b}_{2}{t}^{2}+{b}_{3}{t}^{3}+\mathrm{\dots}+{b}_{n-1}{t}^{n-1}+{t}^{n}$$

For odd $n$, $det(B-tI)$ will be just like the
formula above, but multiplied through by a negative sign.
∎
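Proposition 2 can be sanity-checked numerically. The helper `companion` below is a hypothetical name, not from the notes; it assembles the matrix $B$ from the coefficients ${b}_{0},\mathrm{\dots},{b}_{n-1}$ and compares against `np.poly`, which returns the coefficients of $det(tI-B)={(-1)}^{n}{p}_{B}(t)$:

```python
import numpy as np

def companion(b):
    """Assemble the matrix of Proposition 2 from coefficients b0, ..., b_{n-1}."""
    n = len(b)
    B = np.zeros((n, n))
    B[1:, :-1] = np.eye(n - 1)        # ones just below the diagonal
    B[:, -1] = -np.asarray(b)         # last column: -b0, -b1, ..., -b_{n-1}
    return B

b = [5.0, -2.0, 3.0, 1.0]             # b0, b1, b2, b3 (an arbitrary test case)
B = companion(b)

# det(tI - B) = t^4 + b3 t^3 + b2 t^2 + b1 t + b0.
print(np.poly(B))   # approximately [1, 1, 3, -2, 5]
```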

## 4 Putting it all together

Thanks to Propositions 1 and 2 we are
now in a position to understand and to prove the
Cayley-Hamilton Theorem. Let $A$ be an $n\times n$ matrix. Start by
setting ${\mathbf{u}}_{0}={\mathbf{e}}_{1}$, and then create a sequence of vectors by
successively applying $A$, i.e. ${\mathbf{u}}_{1}=A{\mathbf{u}}_{0}$, ${\mathbf{u}}_{2}=A{\mathbf{u}}_{1}$, etc.
Notice that ${\mathbf{u}}_{k}={A}^{k}{\mathbf{u}}_{0}$; in other words, ${\mathbf{u}}_{k}$ is the first
column of the matrix ${A}^{k}$.

Next, suppose that the $n$ vectors ${\mathbf{u}}_{0},{\mathbf{u}}_{1},{\mathbf{u}}_{2},\mathrm{\dots},{\mathbf{u}}_{n-1}$ form a basis, $\mathbf{B}$, of ${\mathbb{R}}^{n}$ (there are matrices $A$ for which this doesn’t happen, but we’ll consider that possibility later). There will therefore exist scalars ${b}_{0},{b}_{1},\mathrm{\dots},{b}_{n-1}$ such that

$${\mathbf{u}}_{n}+{b}_{n-1}{\mathbf{u}}_{n-1}+\mathrm{\dots}+{b}_{1}{\mathbf{u}}_{1}+{b}_{0}{\mathbf{u}}_{0}=0.$$

Now the representation of $A$ relative to the cyclic basis $\mathbf{B}$ will have the form

$${[A]}_{\mathbf{B}}=\left[\begin{array}{cccccc}\hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{0}\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{1}\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{2}\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{b}_{3}\hfill \\ \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\ddots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 1\hfill & \hfill -{b}_{n-1}\hfill \end{array}\right]$$

By Proposition 1 the characteristic polynomial of ${[A]}_{\mathbf{B}}$ is equal to the characteristic polynomial of $A$. Furthermore, by Proposition 2 the characteristic polynomial of ${[A]}_{\mathbf{B}}$ is equal to

$$\pm ({t}^{n}+{b}_{n-1}{t}^{n-1}+\mathrm{\dots}+{b}_{1}t+{b}_{0}).$$

Only one conclusion
is possible: ${b}_{0},{b}_{1},\mathrm{\dots},{b}_{n-1}$ must be precisely the
coefficients of the characteristic polynomial of $A$. Let us
summarize these findings.

###### Proposition 3.

Let $A$ be an $n\mathrm{\times}n$ matrix, with characteristic polynomial

$${p}_{A}(t)=\pm \left({t}^{n}+{b}_{n-1}{t}^{n-1}+\mathrm{\dots}+{b}_{1}t+{b}_{0}\right).$$

Fix a number $k$ between $1$ and $n$, and let ${\mathbf{u}}_{j}$ be the ${k}^{\text{th}}$ column of the matrix ${A}^{j}$. If the vectors ${\mathbf{u}}_{0},{\mathbf{u}}_{1},\mathrm{\dots},{\mathbf{u}}_{n-1}$ form a basis of ${\mathbb{R}}^{n}$, then the vectors ${\mathbf{u}}_{0},{\mathbf{u}}_{1},\mathrm{\dots},{\mathbf{u}}_{n-1},{\mathbf{u}}_{n}$ satisfy the linear relation:

$${\mathbf{u}}_{n}+{b}_{n-1}{\mathbf{u}}_{n-1}+\mathrm{\dots}+{b}_{1}{\mathbf{u}}_{1}+{b}_{0}{\mathbf{u}}_{0}=0.$$

## 5 A Complication

We are almost done with the proof of the Cayley-Hamilton Theorem. First, however, we must deal with the possibility that the square matrix $A$ is such that the column vectors of ${A}^{0},{A}^{1},\mathrm{\dots},{A}^{n-1}$ do not form a basis. Consider, for example,

$$A=\left[\begin{array}{ccc}\hfill 1& \hfill -1& \hfill 2\\ \hfill 1& \hfill 4& \hfill -4\\ \hfill 1& \hfill 2& \hfill -2\end{array}\right].$$

An easy calculation shows that the characteristic polynomial is given, up to sign, by

$${p}_{A}(t)={t}^{3}-3{t}^{2}+t+2.$$

Writing down the sequence of powers of $A$:

$$\begin{array}{cccc}\hfill {A}^{3}\hfill & \hfill {A}^{2}\hfill & \hfill {A}^{1}\hfill & \hfill {A}^{0}\hfill \\ & & & \\ \hfill \left[\begin{array}{ccc}\hfill 3& \hfill -2& \hfill 4\\ \hfill 2& \hfill 15& \hfill -14\\ \hfill 2& \hfill 7& \hfill -6\end{array}\right]\hfill & \hfill \left[\begin{array}{ccc}\hfill 2& \hfill -1& \hfill 2\\ \hfill 1& \hfill 7& \hfill -6\\ \hfill 1& \hfill 3& \hfill -2\end{array}\right]\hfill & \hfill \left[\begin{array}{ccc}\hfill 1& \hfill -1& \hfill 2\\ \hfill 1& \hfill 4& \hfill -4\\ \hfill 1& \hfill 2& \hfill -2\end{array}\right]\hfill & \hfill \left[\begin{array}{ccc}\hfill 1& \hfill 0& \hfill 0\\ \hfill 0& \hfill 1& \hfill 0\\ \hfill 0& \hfill 0& \hfill 1\end{array}\right]\hfill \end{array}$$

we notice that the first columns do, in fact, obey a linear relation with the coefficients of the characteristic polynomial:

$$\left[\begin{array}{c}\hfill 3\\ \hfill 2\\ \hfill 2\end{array}\right]-3\left[\begin{array}{c}\hfill 2\\ \hfill 1\\ \hfill 1\end{array}\right]+\left[\begin{array}{c}\hfill 1\\ \hfill 1\\ \hfill 1\end{array}\right]+2\left[\begin{array}{c}\hfill 1\\ \hfill 0\\ \hfill 0\end{array}\right]=\left[\begin{array}{c}\hfill 0\\ \hfill 0\\ \hfill 0\end{array}\right].$$ (1)

However these first column vectors do not form a basis of ${\mathbb{R}}^{3}$, and therefore Proposition 3 is not enough to explain why these vectors obey the above linear relation.

In order to find an explanation, let us proceed as follows. Just as
before, start by setting ${\mathbf{u}}_{0}={\mathbf{e}}_{1}$, and ${\mathbf{u}}_{1}=A{\mathbf{u}}_{0}$. If we
take ${\mathbf{u}}_{2}=A{\mathbf{u}}_{1}$, then $\mathbf{B}=({\mathbf{u}}_{0},{\mathbf{u}}_{1},{\mathbf{u}}_{2})$ will not form a
basis, so instead let us choose a vector ${\mathbf{u}}_{2}$ that is linearly independent
of ${\mathbf{u}}_{0}$ and ${\mathbf{u}}_{1}$, thereby ensuring that $\mathbf{B}$ is a basis.
There are many possible such choices for ${\mathbf{u}}_{2}$. To keep the
discussion concrete, let us take ${\mathbf{u}}_{2}={\mathbf{e}}_{3}={[0,0,1]}^{T}$. Note that

$$A{\mathbf{u}}_{0}={\mathbf{u}}_{1},$$

$$A{\mathbf{u}}_{1}={[2,1,1]}^{T}={\mathbf{u}}_{0}+{\mathbf{u}}_{1},$$

$$A{\mathbf{u}}_{2}={[2,-4,-2]}^{T}=6{\mathbf{u}}_{0}-4{\mathbf{u}}_{1}+2{\mathbf{u}}_{2}.$$

Therefore, representing $A$ relative to the basis $\mathbf{B}$ we obtain

$${[A]}_{\mathbf{B}}=\left[\begin{array}{ccc}\hfill 0& \hfill 1& \hfill 6\\ \hfill 1& \hfill 1& \hfill -4\\ \hfill 0& \hfill 0& \hfill 2\end{array}\right].$$
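The computation of ${[A]}_{\mathbf{B}}$ can again be verified with a short NumPy sketch:

```python
import numpy as np

A = np.array([[1, -1, 2],
              [1, 4, -4],
              [1, 2, -2]], dtype=float)

u0 = np.array([1.0, 0.0, 0.0])    # e1
u1 = A @ u0                        # [1, 1, 1]
u2 = np.array([0.0, 0.0, 1.0])    # e3, chosen to complete the basis

P = np.column_stack([u0, u1, u2])
A_B = np.linalg.inv(P) @ A @ P
print(np.round(A_B))               # matches the block-triangular matrix above
```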

By Proposition 1, we know that the characteristic polynomial of ${[A]}_{\mathbf{B}}$ is equal to the characteristic polynomial of $A$. However, we know much more.

###### Proposition 4.

Let $B$ be an $n\mathrm{\times}n$ matrix of the form

$$B=\left[\begin{array}{cc}\hfill {B}_{1}\hfill & \hfill {B}_{2}\hfill \\ \hfill \mathbf{0}\hfill & \hfill {B}_{3}\hfill \end{array}\right]$$

where ${B}_{1}$ is a $k\times k$ matrix, ${B}_{2}$ is a $k\times (n-k)$ matrix, and ${B}_{3}$ is an $(n-k)\times (n-k)$ matrix. Then the characteristic polynomial of $B$ is the product of the characteristic polynomials of ${B}_{1}$ and ${B}_{3}$, i.e. ${p}_{B}(t)={p}_{{B}_{1}}(t)\times {p}_{{B}_{3}}(t)$.

###### Proof.

Note that

$$B-tI=\left[\begin{array}{cc}\hfill {B}_{1}-t{I}_{1}\hfill & \hfill {B}_{2}\hfill \\ \hfill \mathbf{0}\hfill & \hfill {B}_{3}-t{I}_{3}\hfill \end{array}\right]$$

where ${I}_{1}$ is the $k\times k$ identity matrix, and ${I}_{3}$ is the $(n-k)\times (n-k)$ identity matrix. The Proposition now follows from the fact that the determinant of a block upper-triangular matrix such as this one is the determinant of the upper-left block times the determinant of the lower-right block. ∎
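A quick numerical illustration of Proposition 4, using the blocks from the example above (a sketch assuming NumPy; `np.poly` works with $det(tI-B)$, but the product relation holds for either sign convention):

```python
import numpy as np

# Blocks taken from the example: B1 is 2x2, B2 is 2x1, B3 is 1x1.
B1 = np.array([[0.0, 1.0],
               [1.0, 1.0]])
B2 = np.array([[6.0],
               [-4.0]])
B3 = np.array([[2.0]])

B = np.block([[B1, B2],
              [np.zeros((1, 2)), B3]])

# The characteristic polynomial of B factors as p_{B1} * p_{B3}.
lhs = np.poly(B)
rhs = np.polymul(np.poly(B1), np.poly(B3))
print(lhs)   # approximately [1, -3, 1, 2]
print(rhs)
```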

Thanks to Proposition 4 we know that the characteristic polynomial of ${[A]}_{\mathbf{B}}$ is a product of the characteristic polynomial of the $2\times 2$ matrix

$$\left[\begin{array}{cc}\hfill 0& \hfill 1\\ \hfill 1& \hfill 1\end{array}\right]$$

and the characteristic polynomial of the $1\times 1$ matrix $[2]$. In other words,

$${p}_{{[A]}_{\mathbf{B}}}(t)=({t}^{2}-t-1)(t-2).$$

Furthermore, by Proposition 2, we know that $A{\mathbf{u}}_{1}$, ${\mathbf{u}}_{1}$, ${\mathbf{u}}_{0}$, i.e. the first column vectors of ${A}^{2},{A}^{1},{A}^{0}$, obey a linear relation with the coefficients of the polynomial ${t}^{2}-t-1$:

$$A{\mathbf{u}}_{1}-{\mathbf{u}}_{1}-{\mathbf{u}}_{0}=\mathbf{0},$$

$$\left[\begin{array}{c}\hfill 2\\ \hfill 1\\ \hfill 1\end{array}\right]-\left[\begin{array}{c}\hfill 1\\ \hfill 1\\ \hfill 1\end{array}\right]-\left[\begin{array}{c}\hfill 1\\ \hfill 0\\ \hfill 0\end{array}\right]=\left[\begin{array}{c}\hfill 0\\ \hfill 0\\ \hfill 0\end{array}\right].$$ (2)

Multiplying this relation through by $A$, we deduce that the first column vectors of ${A}^{3},{A}^{2},{A}^{1}$ obey the same linear relation:

$$\left[\begin{array}{c}\hfill 3\\ \hfill 2\\ \hfill 2\end{array}\right]-\left[\begin{array}{c}\hfill 2\\ \hfill 1\\ \hfill 1\end{array}\right]-\left[\begin{array}{c}\hfill 1\\ \hfill 1\\ \hfill 1\end{array}\right]=\left[\begin{array}{c}\hfill 0\\ \hfill 0\\ \hfill 0\end{array}\right].$$ (3)

Next think about what it means to multiply a polynomial such as ${t}^{2}-t-1$ by another polynomial such as $t-2$. Indeed, one can structure the multiplication by multiplying the first polynomial through by $t$, then multiplying it through by $-2$, and then adding the two terms:

$$\begin{array}{cccc}\hfill {t}^{3}\hfill & \hfill -{t}^{2}\hfill & \hfill -t\hfill & \\ & \hfill -2{t}^{2}\hfill & \hfill 2t\hfill & \hfill 2\hfill \\ \hfill {t}^{3}\hfill & \hfill -3{t}^{2}\hfill & \hfill +t\hfill & \hfill +2\hfill \end{array}$$

The bottom line is, of course, just the characteristic polynomial of $A$, and the whole idea behind the above calculation is that ${p}_{A}(t)$ can be “formed out of” the polynomial ${t}^{2}-t-1$. This shows that we can combine relations (2) and (3) and produce in the end the desired relation (1). All we have to do is take relation (3), and add to it $-2$ times the relation (2). This explains why the first column vectors of ${A}^{3},{A}^{2},{A}^{1},{A}^{0}$ obey a linear relation whose coefficients come from the characteristic polynomial of $A$.
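The bookkeeping of combining relations (2) and (3) is easy to verify numerically; a NumPy sketch:

```python
import numpy as np

# First columns of A^3, A^2, A^1, A^0 for the second example matrix.
u3 = np.array([3.0, 2.0, 2.0])
u2 = np.array([2.0, 1.0, 1.0])
u1 = np.array([1.0, 1.0, 1.0])
u0 = np.array([1.0, 0.0, 0.0])

rel2 = u2 - u1 - u0        # relation (2)
rel3 = u3 - u2 - u1        # relation (3), i.e. (2) multiplied through by A

# Mirroring (t^2 - t - 1)(t - 2): take (3) and add -2 times (2).
combined = rel3 - 2 * rel2             # equals u3 - 3*u2 + u1 + 2*u0
print(combined, u3 - 3 * u2 + u1 + 2 * u0)   # both are the zero vector
```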

###### Proposition 5.

Let $A$ be an $n\mathrm{\times}n$ matrix, with characteristic polynomial

$${p}_{A}(t)=\pm \left({t}^{n}+{b}_{n-1}{t}^{n-1}+\mathrm{\dots}+{b}_{1}t+{b}_{0}\right).$$

Fix a number $k$ between $1$ and $n$, and let ${\mathbf{u}}_{j}$ be the ${k}^{\text{th}}$ column of the matrix ${A}^{j}$. The vectors ${\mathbf{u}}_{0},{\mathbf{u}}_{1},\mathrm{\dots},{\mathbf{u}}_{n-1},{\mathbf{u}}_{n}$ satisfy the linear relation:

$${\mathbf{u}}_{n}+{b}_{n-1}{\mathbf{u}}_{n-1}+\mathrm{\dots}+{b}_{1}{\mathbf{u}}_{1}+{b}_{0}{\mathbf{u}}_{0}=0,$$

even if the vectors ${\mathbf{u}}_{0},{\mathbf{u}}_{1},\mathrm{\dots},{\mathbf{u}}_{n-1}$ do not form a basis of ${\mathbb{R}}^{n}$.

###### Proof.

Suppose that there is a number $m<n$ such that ${\mathbf{u}}_{m}$ can be given
as a linear combination of ${\mathbf{u}}_{0},{\mathbf{u}}_{1},\mathrm{\dots},{\mathbf{u}}_{m-1}$; let’s
say

$${\mathbf{u}}_{m}+{c}_{m-1}{\mathbf{u}}_{m-1}+\mathrm{\dots}+{c}_{1}{\mathbf{u}}_{1}+{c}_{0}{\mathbf{u}}_{0}=0.$$

Choose vectors ${\mathbf{v}}_{m},{\mathbf{v}}_{m+1},\mathrm{\dots},{\mathbf{v}}_{n-1}$ so that the list

$$\mathbf{B}=({\mathbf{u}}_{0},{\mathbf{u}}_{1},\mathrm{\dots},{\mathbf{u}}_{m-1},{\mathbf{v}}_{m},{\mathbf{v}}_{m+1},\mathrm{\dots},{\mathbf{v}}_{n-1})$$

forms a basis of ${\mathbb{R}}^{n}$. Relative to this basis, $A$ will have the form

$${[A]}_{\mathbf{B}}=\left[\begin{array}{cc}\hfill {A}_{1}& \hfill {A}_{2}\\ \hfill 0& \hfill {A}_{3}\end{array}\right],$$

where the upper-left block has the form

$${A}_{1}=\left[\begin{array}{ccccc}\hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{c}_{0}\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{c}_{1}\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill \mathrm{\dots}\hfill & \hfill 0\hfill & \hfill -{c}_{2}\hfill \\ \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\ddots}\hfill & \hfill \mathrm{\vdots}\hfill & \hfill \mathrm{\vdots}\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill \mathrm{\dots}\hfill & \hfill 1\hfill & \hfill -{c}_{m-1}\hfill \end{array}\right]$$

By Proposition 2,

$${p}_{{A}_{1}}(t)=\pm ({t}^{m}+{c}_{m-1}{t}^{m-1}+\mathrm{\dots}+{c}_{1}t+{c}_{0}).$$

Let $\pm ({t}^{n-m}+{d}_{n-m-1}{t}^{n-m-1}+\mathrm{\dots}+{d}_{1}t+{d}_{0})$ denote the characteristic polynomial of ${A}_{3}$. By Proposition 4, the characteristic polynomial of ${[A]}_{\mathbf{B}}$ (which is equal to the characteristic polynomial of $A$) is the product ${p}_{{A}_{1}}(t)\times {p}_{{A}_{3}}(t)$, and therefore ${p}_{A}(t)$ can be obtained by taking linear combinations of ${p}_{{A}_{1}}(t)$ times various powers of $t$:

$$\begin{array}{cccccccccc}\hfill {d}_{0}\times & & & \hfill {t}^{m}+& \hfill {c}_{m-1}{t}^{m-1}+& \hfill \mathrm{\dots}+& \hfill {c}_{2}{t}^{2}+& \hfill {c}_{1}t+& \hfill {c}_{0}& \\ \hfill {d}_{1}\times & & \hfill {t}^{m+1}+& \hfill {c}_{m-1}{t}^{m}+& \hfill {c}_{m-2}{t}^{m-1}+& \hfill \mathrm{\dots}+& \hfill {c}_{1}{t}^{2}+& \hfill {c}_{0}t& & \\ & & \hfill \mathrm{\dots}& \hfill \mathrm{\dots}& \hfill \mathrm{\dots}& & & & & \\ \hfill 1\times & \hfill {t}^{n}+& \hfill {c}_{m-1}{t}^{n-1}+& \hfill \mathrm{\dots}+& \hfill {c}_{1}{t}^{n-m+1}+& \hfill {c}_{0}{t}^{n-m}& & & & \\ & \hfill {t}^{n}+& \hfill {b}_{n-1}{t}^{n-1}+& \hfill {b}_{n-2}{t}^{n-2}+& \hfill \mathrm{\dots}& \hfill +& \hfill {b}_{2}{t}^{2}+& \hfill {b}_{1}t+& \hfill {b}_{0}& \end{array}$$

Corresponding to the above polynomials are the relations

$$\begin{array}{cccccccccc}& & \hfill {\mathbf{u}}_{m}+& \hfill {c}_{m-1}{\mathbf{u}}_{m-1}+& \hfill \mathrm{\dots}+& \hfill {c}_{2}{\mathbf{u}}_{2}+& \hfill {c}_{1}{\mathbf{u}}_{1}+& \hfill {c}_{0}{\mathbf{u}}_{0}& \hfill =0& \\ & \hfill {\mathbf{u}}_{m+1}+& \hfill {c}_{m-1}{\mathbf{u}}_{m}+& \hfill {c}_{m-2}{\mathbf{u}}_{m-1}+& \hfill \mathrm{\dots}+& \hfill {c}_{1}{\mathbf{u}}_{2}+& \hfill {c}_{0}{\mathbf{u}}_{1}& & \hfill =0& \\ & \hfill \mathrm{\dots}& \hfill \mathrm{\dots}& \hfill \mathrm{\dots}& & & & & & \\ \hfill {\mathbf{u}}_{n}+& \hfill {c}_{m-1}{\mathbf{u}}_{n-1}+& \hfill \mathrm{\dots}+& \hfill {c}_{1}{\mathbf{u}}_{n-m+1}+& \hfill {c}_{0}{\mathbf{u}}_{n-m}& & & & \hfill =0& \end{array}$$

Adding these relations in the same way as the polynomials yields the desired relation:

$${\mathbf{u}}_{n}+{b}_{n-1}{\mathbf{u}}_{n-1}+\mathrm{\dots}+{b}_{1}{\mathbf{u}}_{1}+{b}_{0}{\mathbf{u}}_{0}=0.$$

∎

Now we really are finished. Thanks to Propositions 3 and 5 we know that for every $k=1,\mathrm{\dots},n$, the ${k}^{\mathrm{th}}$ column vectors of the matrices

$${A}^{n},{A}^{n-1},\mathrm{\dots},{A}^{1},{A}^{0}$$

obey a linear relation with the coefficients of the characteristic polynomial of $A$. Since this is true for every column of the above matrices, it is certainly true for the full matrices, and that is precisely the conclusion of the Cayley-Hamilton theorem.
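As a final sanity check, the whole theorem can be tested on a random matrix. The sketch below evaluates $A$ in its own characteristic polynomial by Horner's rule, again using `np.poly` (whose sign convention does not affect whether the result is zero); the helper name is illustrative, not from the notes:

```python
import numpy as np

def char_poly_at(A):
    """Evaluate A in its own characteristic polynomial via Horner's rule."""
    n = A.shape[0]
    coeffs = np.poly(A)    # coefficients of det(tI - A); the sign is immaterial here
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ A + c * np.eye(n)
    return result

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(5, 5)).astype(float)
print(np.max(np.abs(char_poly_at(A))))   # essentially zero, up to round-off
```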

| Title | lecture notes on the Cayley-Hamilton theorem |
|---|---|
| Canonical name | LectureNotesOnTheCayleyHamiltonTheorem |
| Date of creation | 2013-03-22 16:50:59 |
| Last modified on | 2013-03-22 16:50:59 |
| Owner | rmilson (146) |
| Last modified by | rmilson (146) |
| Numerical id | 8 |
| Author | rmilson (146) |
| Entry type | Topic |
| Classification | msc 15A18 |
| Classification | msc 15A15 |