# simultaneous triangularisation of commuting matrices over any field

Let $\mathbf{e}_{i}$ denote the (column) vector whose $i$th position is $1$ and where all other positions are $0$. Denote by $[n]$ the set $\{1,\ldots,n\}$. Denote by $\mathrm{M}_{n}(\mathcal{K})$ the set of all $n\times n$ matrices over $\mathcal{K}$, and by $\mathrm{GL}_{n}(\mathcal{K})$ the set of all invertible elements of $\mathrm{M}_{n}(\mathcal{K})$. Let $d_{i}$ be the function which extracts the $i$th diagonal element of a matrix, i.e., $d_{i}(A)=\mathbf{e}_{i}^{\mathrm{T}}\!A\mathbf{e}_{i}$.
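The notation $d_{i}(A)=\mathbf{e}_{i}^{\mathrm{T}}\!A\mathbf{e}_{i}$ can be sketched in code as follows (a minimal illustration with NumPy; the helper `d` and the sample matrix are hypothetical, and indices are 1-based to match the text):

```python
import numpy as np

# d_i extracts the i-th diagonal entry of A as e_i^T A e_i.
# The helper uses 1-based indices to match the text.
def d(i, A):
    e = np.zeros(A.shape[0])
    e[i - 1] = 1.0
    return e @ A @ e

A = np.array([[5.0, 1.0],
              [0.0, 7.0]])
assert d(1, A) == 5.0 and d(2, A) == 7.0
```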

###### Theorem.

Let $\mathcal{K}$ be a field, let $A_{1},\ldots,A_{r}\in\mathrm{M}_{n}(\mathcal{K})$ be pairwise commuting matrices, and let $\mathcal{L}$ be a field extension of $\mathcal{K}$ in which the characteristic polynomials of all $A_{k}$ split (http://planetmath.org/SplittingField). Then there exists some $P\in\mathrm{GL}_{n}(\mathcal{L})$ such that

1. $P^{-1}A_{k}P$ is upper triangular for all $k=1,\ldots,r$, and

2. if $i,j,l\in[n]$ are such that $i\leqslant l\leqslant j$ and $d_{i}(P^{-1}A_{k}P)=d_{j}(P^{-1}A_{k}P)$ for all $k=1,\ldots,r$, then $d_{l}(P^{-1}A_{k}P)=d_{j}(P^{-1}A_{k}P)$ for all $k=1,\ldots,r$ as well.

The proof relies on two lemmas.

###### Lemma 1.

Let $\mathcal{K}$ be a field, let $A_{1},\ldots,A_{r}\in\mathrm{M}_{n}(\mathcal{K})$ be pairwise commuting matrices, and let $\mathcal{L}$ be a field extension of $\mathcal{K}$ in which the characteristic polynomials of all $A_{k}$ split. Then there exists some nonzero $\mathbf{u}\in\mathcal{L}^{n}$ which is an eigenvector of $A_{k}$ for all $k=1,\ldots,r$.
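For instance, over $\mathcal{L}=\mathbb{C}$ every characteristic polynomial splits, and the lemma can be checked on a small hypothetical example (here $A_{2}$ is a polynomial in $A_{1}$, so the two commute by construction, and the eigenspace of $A_{1}$ is one-dimensional, forcing the common eigenvector):

```python
import numpy as np

# Two commuting matrices: A2 is a polynomial in A1, so A1 A2 = A2 A1.
A1 = np.array([[2.0, 1.0],
               [0.0, 2.0]])
A2 = np.eye(2) + 3 * A1          # A2 = I + 3 A1

# u = e_1 is an eigenvector of A1 (eigenvalue 2) and of A2
# (eigenvalue 1 + 3*2 = 7): a common eigenvector as in Lemma 1.
u = np.array([1.0, 0.0])

assert np.allclose(A1 @ A2, A2 @ A1)
assert np.allclose(A1 @ u, 2.0 * u)
assert np.allclose(A2 @ u, 7.0 * u)
```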

###### Lemma 2.

For any field $\mathcal{L}$, any sequence $R_{1},\ldots,R_{r}\in\mathrm{M}_{n}(\mathcal{L})$ of upper triangular pairwise commuting matrices, and every row index $i\in[n]$, there exists $\mathbf{v}\in\mathcal{L}^{n}\setminus\{0\}$ such that

 $R_{k}\mathbf{v}=d_{i}(R_{k})\mathbf{v}\quad\text{for all }k\in[r]\text{.}$
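A small hypothetical instance of the lemma with $n=r=2$ and $i=2$: the matrices below are upper triangular and commute, and the vector $\mathbf{v}=(1,1)^{\mathrm{T}}$ satisfies $R_{k}\mathbf{v}=d_{2}(R_{k})\mathbf{v}$ for $k=1,2$.

```python
import numpy as np

# Upper triangular commuting R1, R2; the common eigenvector v has
# eigenvalues d_2(R1) = 2 and d_2(R2) = 3, the second diagonal entries.
R1 = np.array([[1.0, 1.0],
               [0.0, 2.0]])
R2 = np.array([[1.0, 2.0],
               [0.0, 3.0]])
v  = np.array([1.0, 1.0])

assert np.allclose(R1 @ R2, R2 @ R1)   # R1 and R2 commute
assert np.allclose(R1 @ v, 2.0 * v)    # R1 v = d_2(R1) v
assert np.allclose(R2 @ v, 3.0 * v)    # R2 v = d_2(R2) v
```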
###### Proof.

This is by induction on $n$. The induction hypothesis is that given pairwise commuting matrices $A_{1},\ldots,A_{r}\in\mathrm{M}_{n}(\mathcal{L})$, whose characteristic polynomials all split in $\mathcal{L}$, and a sequence of arbitrary scalars $\mu_{1},\ldots,\mu_{r}\in\mathcal{L}$, there exists some $P\in\mathrm{GL}_{n}(\mathcal{L})$ such that:

1. $P^{-1}A_{k}P$ is upper triangular for all $k=1,\ldots,r$.

2. If some $i,j\in[n]$ are such that $i<j$ and $d_{j}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\in[r]$, then $d_{i+1}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\in[r]$.

3. If some $j\in[n]$ is such that $d_{j}(P^{-1}A_{k}P)=\mu_{k}$ for all $k\in[r]$, then $d_{1}(P^{-1}A_{k}P)=\mu_{k}$ for all $k\in[r]$.

For $n=1$ this hypothesis is trivially fulfilled (all $1\times 1$ matrices are upper triangular). Assume that it holds for $n=m$ and consider the case $n=m+1$.

It is easy to see that condition 1 forces $P\mathbf{e}_{1}$ to be a common eigenvector of all the matrices. If there exists a nonzero vector $\mathbf{u}_{1}\in\mathcal{L}^{n}$ such that $A_{k}\mathbf{u}_{1}=\mu_{k}\mathbf{u}_{1}$ for all $k=1,\ldots,r$, then this is such a common eigenvector, and in that case let $\lambda_{k}=\mu_{k}$ for all $k=1,\ldots,r$. Otherwise, by Lemma 1, there exists a vector $\mathbf{u}_{1}\in\mathcal{L}^{n}\setminus\{\mathbf{0}\}$ such that $A_{k}\mathbf{u}_{1}=\lambda_{k}\mathbf{u}_{1}$ for some $\{\lambda_{k}\}_{k=1}^{r}\subseteq\mathcal{L}$. Either way, one gets a suitable candidate $\mathbf{u}_{1}$ for $P\mathbf{e}_{1}$, together with eigenvalues $\lambda_{1},\ldots,\lambda_{r}$ that will satisfy $d_{1}(P^{-1}A_{k}P)=\lambda_{k}$ for all $k\in[r]$.

Let $\mathbf{u}_{2},\ldots,\mathbf{u}_{n}\in\mathcal{L}^{n}$ be arbitrary vectors such that $\{\mathbf{u}_{i}\}_{i=1}^{n}$ is a basis of $\mathcal{L}^{n}$, and let $U$ be the $n\times n$ matrix whose $i$th column is $\mathbf{u}_{i}$ for $1\leqslant i\leqslant n$. (By imposing extra conditions on the choice of the basis $\{\mathbf{u}_{i}\}_{i=1}^{n}$ at this point, for example requesting that it be orthonormal, one can often prove a stronger claim where the choice of $P$ is restricted to some smaller group of matrices, such as the group of orthogonal matrices, but this requires assuming additional things about the fields $\mathcal{K}$ and $\mathcal{L}$.) Then $U$ is invertible and for each $k$ the first column of $B_{k}=U^{-1}A_{k}U$ is

 $U^{-1}A_{k}U\mathbf{e}_{1}=U^{-1}A_{k}\mathbf{u}_{1}=\lambda_{k}U^{-1}\mathbf{u}_{1}=\lambda_{k}\mathbf{e}_{1}\text{.}$

Furthermore

 $B_{j}B_{k}=U^{-1}A_{j}UU^{-1}A_{k}U=U^{-1}A_{j}A_{k}U=U^{-1}A_{k}A_{j}U=U^{-1}A_{k}UU^{-1}A_{j}U=B_{k}B_{j}$

for all $j$ and $k$.

Now let $A_{k}^{\prime}$ be the matrix formed from rows and columns $2$ through $n$ of $B_{k}$. Since $\det(A_{k}-xI)=\det(B_{k}-xI)=(\lambda_{k}-x)\det(A_{k}^{\prime}-xI)$ by expansion (http://planetmath.org/LaplaceExpansion) along the first column, it follows that the characteristic polynomial of $A_{k}^{\prime}$ splits in $\mathcal{L}$. Furthermore all the $A_{k}^{\prime}$ have side $m=n-1$ and commute pairwise with each other, whence by the induction hypothesis there exists some $P^{\prime}\in\mathrm{GL}_{n-1}(\mathcal{L})$ such that every $P^{\prime-1}A_{k}^{\prime}P^{\prime}$ is upper triangular. Let $P=U\left(\begin{smallmatrix}1&0\\ 0&P^{\prime}\end{smallmatrix}\right)$. Then the submatrix consisting of rows and columns $2$ through $n$ of $P^{-1}A_{k}P$ is equal to $P^{\prime-1}A_{k}^{\prime}P^{\prime}$ and hence contains no nonzero subdiagonal elements. Furthermore the first column of $P^{-1}A_{k}P$ is equal to the first column of $B_{k}$, and thus all the $P^{-1}A_{k}P$ are upper triangular, as claimed.

It also follows from the induction hypothesis that $P$ can be chosen such that $d_{2}(P^{-1}A_{k}P)=d_{1}(P^{\prime-1}A_{k}^{\prime}P^{\prime})=\lambda_{k}=d_{1}(P^{-1}A_{k}P)$ for all $k\in[r]$ whenever there is some $j\geqslant 2$ for which $d_{j}(P^{-1}A_{k}P)=d_{j-1}(P^{\prime-1}A_{k}^{\prime}P^{\prime})=\lambda_{k}=d_{1}(P^{-1}A_{k}P)$ for all $k\in[r]$, and more generally such that if $2\leqslant i<j\leqslant n$ are such that $d_{j}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\in[r]$, then $d_{i+1}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\in[r]$. This verifies condition 2 of the induction hypothesis. For the remaining condition 3, one may first observe that if there is some $i\in[n]$ such that $d_{i}(P^{-1}A_{k}P)=\mu_{k}$ for all $k\in[r]$, then by Lemma 2 there exists a nonzero $\mathbf{v}\in\mathcal{L}^{n}$ such that $P^{-1}A_{k}P\mathbf{v}=\mu_{k}\mathbf{v}$ for all $k\in[r]$. This means $P\mathbf{v}$ fulfills the condition in the choice of $\mathbf{u}_{1}$, and hence $d_{1}(P^{-1}A_{k}P)=\lambda_{k}=\mu_{k}$ as claimed.

The theorem now follows from the principle of induction. ∎
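The recursion in the proof can be sketched numerically over $\mathbb{C}$, where every characteristic polynomial splits. This is only a simplified illustration, not the construction itself: it assumes, hypothetically, that the first matrix has $n$ distinct eigenvalues, so that any eigenvector of it is a common eigenvector of all the commuting matrices (a special case of Lemma 1), and it ignores the diagonal-ordering conditions 2 and 3.

```python
import numpy as np

def simultaneous_triangularise(mats):
    """Numerical sketch of the proof's recursion over C, assuming
    (hypothetically) that mats[0] has n distinct eigenvalues, so every
    eigenvector of mats[0] is a common eigenvector of all the commuting
    matrices. Returns P with P^{-1} A_k P upper triangular for all k."""
    n = mats[0].shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    # Step 1: a common eigenvector u1 (any eigenvector of mats[0] works
    # under the assumption above).
    _, vecs = np.linalg.eig(mats[0].astype(complex))
    u1 = vecs[:, 0]
    # Step 2: extend u1 to a basis of C^n; QR makes U unitary with
    # first column proportional to u1.
    U, _ = np.linalg.qr(np.column_stack([u1, np.eye(n)]))
    # Step 3: conjugate, then recurse on the trailing (n-1)x(n-1)
    # blocks A_k', whose first columns are already cleared.
    Bs = [U.conj().T @ A @ U for A in mats]
    Pp = simultaneous_triangularise([B[1:, 1:] for B in Bs])
    P = np.eye(n, dtype=complex)
    P[1:, 1:] = Pp
    return U @ P

# Two commuting matrices: A2 is a polynomial in A1.
A1 = np.array([[1.0, 2.0], [3.0, 4.0]])
A2 = A1 @ A1 + 2 * A1
P = simultaneous_triangularise([A1, A2])
for A in (A1, A2):
    T = np.linalg.inv(P) @ A @ P
    assert np.allclose(np.tril(T, -1), 0, atol=1e-8)
```

The basis extension step uses a QR factorisation, so the change-of-basis matrix $U$ at each level is unitary; over a general field one would instead complete $\mathbf{u}_{1}$ to an arbitrary basis, exactly as in the proof.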

Title: simultaneous triangularisation of commuting matrices over any field
Canonical name: SimultaneousTriangularisationOfCommutingMatricesOverAnyField
Date of creation: 2013-03-22 15:29:38
Last modified on: 2013-03-22 15:29:38
Owner: lars_h (9802)
Last modified by: lars_h (9802)
Numerical id: 4
Author: lars_h (9802)
Entry type: Theorem
Classification: msc 15A21
Related topic: CommutingMatrices