You are here
Homeproof of CayleyHamilton theorem by formal substitutions
Primary tabs
proof of CayleyHamilton theorem by formal substitutions
Let $A$ be a $n\times n$ matrix with entries in a commutative ring with identity, and let $p(\lambda)=c_{0}+c_{1}\lambda+\dots+c_{n}\lambda^{n}$ be its characteristic polynomial. We will prove that $p(A):=c_{0}I+c_{1}A+\dots+c_{n}A^{n}=0$.
Proof$\,$ (Popular fake proof):
In the expression
$\displaystyle p(t)=\det(AtI)=c_{0}+c_{1}t+\cdots+c_{n}t^{n}\,,$ 
substitute $t=A$; then $p(A)=\det(AAI)=\det(0)=0$. $\square$
$\,$
It is clear why the argument is faulty. But interestingly, there is a way to rescue it using clever formal substitution arguments. For the moment, we assume that the matrix $A$ is over the complex field.
Since the notation $p(\lambda)$ and $p(A)$ can be confusing at first sight, as one expression takes scalar values and the other matrix values, we will change it. From now on, we will use the notation $\widetilde{p}(t)$ when applying the polymial $p$ to a matrix $t$. In this sense $p(\cdot)$ is a function of $\mathbb{C}$ and $\widetilde{p}(\cdot)$ is function of matrices. Also, by definition,
$\displaystyle\widetilde{p}(t):=c_{0}I+c_{1}t+\dots+c_{n}t^{n}\,,\qquad\qquad% \text{where $t$ is a matrix}$ 
Of course, we intend to prove that $\widetilde{p}(A)=0$.
Proof $\,$ (Proof of the complex case):
For each $\lambda\in\mathbb{C}$ let $B(\lambda)$ be the classical adjoint to $A\lambda I$. We have
$B(\lambda)(A\lambda I)=\det(A\lambda I)I=p(\lambda)I=c_{0}I+\lambda(c_{1}I)+% \cdots+\lambda^{n}(c_{n}I)\,.$  (1) 
From the definition of the classical adjoint, it is clear that $B(\lambda)$ can be written as a polynomial of $\lambda$ having degree $\leq n1$, whose coefficients are matrices. That is,
$\displaystyle B(\lambda)=D_{0}+D_{1}\lambda+\dots+D_{{n1}}\lambda^{{n1}}\,,$ 
for some constant coefficient matrices $D_{i}$.
Now, we would like $B$ to be defined for matrices (just like from the polynomial $p$ we have considered $\widetilde{p}$), so we define for every matrix $t$
$\displaystyle\widetilde{B}(t):=D_{0}+D_{1}t+\dots+D_{{n1}}t^{{n1}}\,.$ 
Now consider the following function of matrices:
$\displaystyle Q(t):=(D_{0}Ac_{0}I)+(D_{1}AD_{0}c_{1}I)t+\dots+(D_{{n1}}AD% _{{n2}}c_{{n1}}I)t^{{n1}}+(D_{{n1}}c_{n}I)t^{n}$  (2) 
The above expression may look strange, but if we think that the matrix $t$ commutes with all $D_{0},\dots,D_{{n1}}$ and $A$, then expression (2) is easily seen to be equal to
$\displaystyle(D_{0}+D_{1}t+\dots+D_{{n1}}t^{{n1}})(AIt)c_{0}Ic_{1}t\dots% c_{n}t^{n}$ 
This means that
$\displaystyle Q(t)=\widetilde{B}(t)(AIt)\widetilde{p}(t)\;\;\;\;\;\;\;\text{% whenever}\;t\text{commutes with}D_{0},\dots,D_{{n1}},A$  (3) 
The reason for not defining $Q(t)$ by the expression in (3) is that we want $Q(t)$ to be some kind of ”polynomial” in $t$ (with matrix coefficients on the left of each $t^{k}$).
We now state some properties that can be easily checked by straightforward calculation:

$\widetilde{p}(\lambda I)=p(\lambda)I$

$\widetilde{B}(\lambda I)=B(\lambda)$
Notice that matrices of the form $\lambda I$ with $\lambda\in\mathbb{C}$ commute with every other matrix, so that
$\displaystyle Q(\lambda I)=\widetilde{B}(\lambda I)(A\lambda I)\widetilde{p}% (\lambda I)=B(\lambda)(A\lambda I)p(\lambda)I=0$ 
Now $Q(\lambda I)$ is also a matrix whose entries $q_{{ij}}(\lambda)$ are polynomials in $\lambda$. Since $Q(\lambda I)=0$ we must have $q_{{ij}}(\lambda)=0$ for all $\lambda\in\mathbb{C}$. This means that $q_{{ij}}(t)$ is the zero polynomial and, since this occurs for all $i,j$, it follows that the matrix coefficients of $t^{k}$ occurring in (2) are all zero, i.e. $Q(t)$ is the zero matrix for all matrices $t$.
Taking $t=A$ we can also see that
$\displaystyle Q(A)$  $\displaystyle=$  $\displaystyle(D_{0}Ac_{0}I)+(D_{1}AD_{0}c_{1}I)A+\dots+(D_{{n1}}AD_{{n2}% }c_{{n1}}I)A^{{n1}}+(D_{{n1}}c_{n}I)A^{n}$  
$\displaystyle=$  $\displaystylec_{0}Ic_{1}A\dotsc_{{n1}}A^{{n1}}c_{n}A^{n}$  
$\displaystyle=$  $\displaystyle\widetilde{p}(A)$ 
Hence $\widetilde{p}(A)=0$, which finishes the proof. $\square$
$,$
Actually, $\mathbb{C}$ could have been substituted with $\mathbb{R}$ or $\mathbb{Q}$ in the proof above. The only property of $\mathbb{C}$ that was used is that it is an infinite integral domain.
Proof $\,$ (Proof for an arbitrary commutative ring with identity):
Let $A=(a_{{ij}})$, where the entries $a_{{ij}}$ are in commutative ring with identity $R$. First notice that, since $c_{0}+c_{1}\lambda+\dots+c_{n}\lambda^{n}=p(\lambda)=\det(A\lambda I)$, where $\lambda\in R$, we have that the coefficients $c_{0},\dots,c_{n}$ are polynomials in $\{a_{{ij}}\}$.
Hence $\widetilde{p}(A):=c_{0}I+c_{1}A+\dots+c_{n}A^{n}$ is a matrix whose entries are also polynomials in $\{a_{{ij}}\}$. These polynomials vanish for every assignment of $\{a_{{ij}}\}$ to numbers in $\mathbb{C}$, because the complex case of the theorem has already been proven (this would be the same as substituting the matrix $A$ by a matrix with complex entries). Therefore these polynomials are zero polynomials and we conclude that $\widetilde{p}(A)=0$ as we inteded to prove.$\square$
Comments on other proofs
Yet another proof of the CayleyHamilton Theorem is to establish it for diagonalizable matrices, and then by a density argument (i.e. every matrix can be approximated by diagonalizable ones in an algebraically closed field), we conclude that $p(A)=0$ is an identity for all matrices over a field. This kind of proof is also presented as an exercise in [1].
The “standard approach” can be found in [3]. It proves the result for matrices over any field, but requires no “abstract algebra” (no algebraic closure, Zariski topology or formal substitutions).
The two other proofs just mentioned can be extended to matrices over an arbitrary commutative ring simply by repeating the last argument in our proof.
References
 1 Michael Artin. Algebra. PrenticeHall, 1991.
 2 Martin Braun. Differential equations and their applications: an introduction to applied mathematics, 3rd edition. SpringerVerlag, 1983.
 3 Friedberg, Insel, Spence. Linear Algebra, 3rd edition. PrenticeHall, 1997.
Mathematics Subject Classification
15A18 no label found15A15 no label found Forums
 Planetary Bugs
 HS/Secondary
 University/Tertiary
 Graduate/Advanced
 Industry/Practice
 Research Topics
 LaTeX help
 Math Comptetitions
 Math History
 Math Humor
 PlanetMath Comments
 PlanetMath System Updates and News
 PlanetMath help
 PlanetMath.ORG
 Strategic Communications Development
 The Math Pub
 Testing messages (ignore)
 Other useful stuff
Recent Activity
new question: numerical method (implicit) for nonlinear pde by roozbe
new question: Harshad Number by pspss
Sep 14
new problem: Geometry by parag
Aug 24
new question: Scheduling Algorithm by ncovella
new question: Scheduling Algorithm by ncovella
Comments
proof needs correction
The proof on this page is not quite correct. I found your page by a link from the wikipedia (http://en.wikipedia.org/wiki/CayleyHamilton_theorem), where I recently corrected the also erroneous proof that use to be there, and provided ample discussion of how and why, so please look there for details. When you say "we think of the formal symbol $t$ as representing an $n \times n$ matrix" you ignore the fact that already it has been treated as commuting with matrices, for instance by placing them arbitrarily at the left of the terms in the expansion of $B(t)$. So you simply cannot just substitute a matrix for $t$ and expect to get a valid identity, even if (like you do) you care about left and right _after_ the substitution.
A valid argument can be given by doing some noncommutative theory first, and carefully distinguishing leftevaluation from rightevaluation. For instance another proof I found on PlanteMath, http://planetmath.org/encyclopedia/ProofOfCayleyHamiltonTheoremInACommut... is basically correct, provided one supplies proper definitions of left factor and left hand value, and proves the result used (which however is about as hard as proving CayleyHamilton in the first place...).
how corrections work in planetmath
Just a note. Here on PlanetMath we have a correction systems which allows you to request corrections to the author. Once these are fixed, the corrections are closed, meaning removed from view and the author is no longer bugged by the mistake. However, posts, such as this one, are never removed and so it is preferable to file a correction so that in the future, once the error is cleared up, the author's entry returns to normal. (You didn't know this but just info for the future.) Also, the corrections are stored in the system (and can be seen by all) and if the author does not fix them in time eventually the entry is placed on a pile for public editing.
Unlike Wikipedia, PlanetMath tries to give the original authors a chance to make changes in the style they feel matches the rest of their entry. They may also disagree with a correction so from time to time some interaction and communication is needed  for which we have a messaging system and forums.
For us it works the best and we are usually quite happy to be told of incorrect aspects in our entry. So thank you very much for taking the time to indicate the mistake you found.
If I were you, I would file a correction with your same comments so that the article will begin the correction process as normal. Thanks for joining in. I hope we see more comments/corrections/entries from you in the future.
Re: proof needs correction
Hello van Leeuwen,
I think Steve's proof is some incomplete, but I cannot see mistakes there. My opinion is that one may circumvent commutativity issue. On the ring of matrices (an algebra) is valid X^{1}X=XX^{1}=I. The definition of matrix adjugate B(\lambda) leads to
(\lambda IA)B(\lambda)=\lambda IAI, (1)
and
B(\lambda)(\lambda IA)=\lambda IAI. (2)
In both equations the RHS as well as B(\lambda) on the LHS are polynomial matrices in \lambda. One see in those eq., that \lambda IAI us divisible without remainder. So by applying generalized Bezout theorem,
http://planetmath.org/encyclopedia/GeneralizedBezoutTheoremOnMatrices.html
CayleyHamilton may be proved.
perucho