proof of Birkhoff-von Neumann theorem

First, we prove the following lemma:
Lemma:
A convex combination of doubly stochastic matrices is doubly stochastic. Proof:
Let $\{A_{i}\}_{i=1}^{m}$ be a collection of $n\times n$ doubly-stochastic matrices, and suppose $\{\lambda_{i}\}_{i=1}^{m}$ is a collection of scalars satisfying $\sum_{i=1}^{m}\lambda_{i}=1$ and $\lambda_{i}\geq 0$ for each $i=1,\ldots,m$ . We claim that $A=\sum_{i=1}^{m}\lambda_{i}A_{i}$ is doubly stochastic.
Take any $i\in\{1,\ldots,m\}$ . Since $A_{i}$ is doubly stochastic, each of its rows and columns sum to 1. Thus each of the rows and columns of $\lambda_{i}A_{i}$ sum to $\lambda_{i}$ .
By the definition of elementwise summation, given matrices $N=M_{1}+M_{2}$ , the sum of the entries in the $i$ th column of $N$ is clearly the sum of the sums of entries of the $i$ th columns of $M_{1}$ and $M_{2}$ respectively. A similar result holds for the $j$ th row.
Hence the sum of the entries in the $i$ th column of $A$ is the sum of the sums of entries of the $i$ th columns of $\lambda_{k}A_{k}$ for each $i$ , that is, $\sum_{k=1}^{m}\lambda_{k}=1$ . The sum of entries of the $j$ th row of $A$ is the same. Hence $A$ is doubly stochastic.
Observe that since a permutation matrix has a single nonzero entry, equal to 1, in each row and column, so the sum of entries in any row or column must be 1. So a permutation matrix is doubly stochastic, and on applying the lemma, we see that a convex combination of permutation matrices is doubly stochastic. This provides one direction of our proof; we now prove the more difficult direction: suppose $B$ is doubly-stochastic. Define a weighted graph $G=(V,E)$ with vertex set $V=\{r_{1},\ldots,r_{n},c_{1},\ldots,c_{n}\}$ , edge set $E$ , where $e_{ij}=(r_{i},c_{i})\in E$ if $B_{ij}\neq 0$ , and edge weight $\omega$ , where $\omega(e_{ij})=B_{ij}$ . Clearly $G$ is a bipartite graph, with partitions $R=\{r_{1},\ldots,r_{n}\}$ and $C=\{c_{1},\ldots,c_{n}\}$ , since the only edges in $E$ are between $r_{i}$ and $c_{j}$ for some $i,j\in\{1,\ldots,n\}$ . Furthermore, since $B_{ij}\geq 0$ , then $\omega(e)>0$ for every $e\in E$ . For any $A\subset V$ define $N(A)$ , the neighbourhood of $A$ , to be the set of vertices $u\in V$ such that there is some $v\in A$ such that $(u,v)\in E$ . We claim that, for any $v\in V$ , $\sum_{u\in N(\{v\})}\omega(u,v)=1$ . Take any $v\in V$ ; either $v\in R$ or $v\in C$ . Since $G$ is bipartite, $v\in R$ implies $N(\{v\})\subset C$ , and $v\in C$ implies $N(\{v\})\subset R$ . Now,

\begin{array}[]{c}{\displaystyle v=r_{i}\implies\sum_{u\in N(r_{i})}\omega(r_{% i},u)=\sum_{\lx@stackrel{{\scriptstyle j=1,}}{{e_{ij}\in E}}}^{n}\omega(r_{i},% c_{j})=\sum_{\lx@stackrel{{\scriptstyle j=1,}}{{B_{ij}\neq 0}}}^{n}B_{ij}=\sum% _{j=1}^{n}B_{ij}=1}\\ {\displaystyle v=c_{j}\implies\sum_{u\in N(c_{j})}\omega(u,c_{j})=\sum_{% \lx@stackrel{{\scriptstyle i=1,}}{{e_{ij}\in E}}}^{n}\omega(r_{i},c_{j})=\sum_% {\lx@stackrel{{\scriptstyle i=1,}}{{B_{ij}\neq 0}}}^{n}B_{ij}=\sum_{i=1}^{n}B_% {ij}=1}\end{array}

since $B$ is doubly stochastic. Now, take any $A\subset R$ . We have

\sum_{\lx@stackrel{{\scriptstyle v\in A}}{{w\in N(A)}}}\omega(v,w)=\sum_{v\in A% }\sum_{w\in N(\{v\})}\omega(v,w)=\sum_{v\in A}1=|A|

Let $B=N(A)$ . But then clearly $A\subset N(B)$ , by definition of neighbourhood. So

|N(A)|=|B|=\sum_{\lx@stackrel{{\scriptstyle v\in B}}{{w\in N(B)}}}\omega(v,w)% \geq\sum_{\lx@stackrel{{\scriptstyle v\in B}}{{w\in A}}}\omega(v,w)=\sum_{% \lx@stackrel{{\scriptstyle w\in A}}{{v\in N(A)}}}\omega(v,w)=|A|

So $|N(A)|\geq|A|$ . We may therefore apply the graph-theoretic version of Hall’s marriage theorem to $G$ to conclude that $G$ has a perfect matching. So let $M\subset E$ be a perfect matching for $G$ . Define an $n\times n$ matrix $P$ by

P_{ij}=\left\{\begin{array}[]{ll}1&\text{if }e_{ij}\in M\\ 0&\text{otherwise}\end{array}\right.

Note that $B_{ij}=0$ implies $P_{ij}=0$ : if $B_{ij}=0$ , then $(r_{i},c_{j})\not\in E$ , so $(r_{i},c_{j})\not\in M$ , which implies $P_{ij}=0$ . Further, we claim that $P$ is a permutation matrix:
Let $i$ be any row of $P$ . Since $M$ is a perfect matching of $G$ , there exists $e_{0}\in M$ such that $r_{i}$ is an end of $e_{0}$ . Let the other end be $c_{j}$ for some $i$ ; then $P_{ij}=1$ .
Suppose $i_{i},i_{2}\in\{1,\ldots,n\}$ with $i_{1}\neq i_{2}$ and $P_{i_{1},j}=P_{i_{2},j}=1$ for some $j$ . This implies $(r_{i_{1}},c_{j}),(r_{i_{2}},c_{j})\in M$ , but this implies the vertex $i$ is the end of two distinct edges, which contradicts the fact that $M$ is a matching.
Hence, for each row and column of $P$ , there is exactly one nonzero entry, whose value is 1. So $P$ is a permutation matrix. Define $\displaystyle{\lambda=\min_{i,j\in\{1,\ldots,n\}}\{B_{ij}\mid P_{ij}\neq 0\}}$ . We see that $\lambda>0$ since $B_{ij}\geq 0$ , and $P_{ij}\neq 0\implies B_{ij}\neq 0$ . Further, $\lambda=B_{pq}$ for some $p, q$ . Let $D=B-\lambda P$ . If $D=0$ , then $\lambda=1$ and $B$ is a permutation matrix, so we are done. Otherwise, note that $D$ is nonnegative; this is clear since $\lambda P_{ij}\leq\lambda\leq B_{ij}$ for any $B_{ij}\neq 0$ . Notice that $D_{pq}=B_{pq}-\lambda P_{pq}=\lambda-\lambda\cdot 1=0$ . Note that since every row and column of $B$ and $P$ sums to 1, that every row and column of $D=B-\lambda P$ sums to $1-\lambda$ . Define $B^{\prime}=\frac{1}{1-\lambda}D$ . Then every row and column of $B^{\prime}$ sums to 1, so $B^{\prime}$ is doubly stochastic. Rearranging, we have $B=\lambda P+(1-\lambda)B^{\prime}$ . Clearly $B_{ij}=0$ implies that $P_{ij}=0$ which implies that $B^{\prime}_{ij}=0$ , so the zero entries of $B^{\prime}$ are a superset of those of $B^{\prime}$ . But notice that $B^{\prime}_{pq}=\frac{1}{1-\lambda}D_{pq}=0$ , so the zero entries of $B^{\prime}$ are a strict superset of those of $B$ . We have decomposed $B$ into a convex combination of a permutation matrix and another doubly stochastic matrix with strictly more zero entries than $B$ . Thus we may apply this procedure repeatedly on the doubly stochastic matrix obtained from the previous step, and the number of zero entries will increase with each step. Since $B$ has at most $n^{2}$ nonzero entries, we will obtain a convex combination of permutation matrices in at most $n^{2}$ steps. Thus $B$ is indeed expressible as a convex combination of permutation matrices.

References

G. Birkhoff, “Tres observaciones sobre el algebra lineal,” Univ. Nac. Tucumán Rev, Ser. A, no. 5, pp147–151. (1946)

Title	proof of Birkhoff-von Neumann theorem
Canonical name	ProofOfBirkhoffvonNeumannTheorem
Date of creation	2013-03-22 13:10:03
Last modified on	2013-03-22 13:10:03
Owner	Andrea Ambrosio (7332)
Last modified by	Andrea Ambrosio (7332)
Numerical id	14
Author	Andrea Ambrosio (7332)
Entry type	Proof
Classification	msc 15A51