# proof of Birkhoff-von Neumann theorem

First, we prove the following lemma:

Lemma:

A convex combination^{} of doubly stochastic matrices is doubly stochastic.
Proof:

Let ${\{{A}_{i}\}}_{i=1}^{m}$ be a collection of $n\times n$ doubly-stochastic matrices, and suppose ${\{{\lambda}_{i}\}}_{i=1}^{m}$ is a collection of scalars satisfying
${\sum}_{i=1}^{m}{\lambda}_{i}=1$ and ${\lambda}_{i}\ge 0$ for each $i=1,\mathrm{\dots},m$.
We claim that $A={\sum}_{i=1}^{m}{\lambda}_{i}{A}_{i}$ is doubly stochastic.

Take any $i\in \{1,\mathrm{\dots},m\}$. Since ${A}_{i}$ is doubly stochastic, each of its rows and columns sum to 1. Thus each of the rows and columns of ${\lambda}_{i}{A}_{i}$ sum to ${\lambda}_{i}$.

By the definition of elementwise summation, given matrices $N={M}_{1}+{M}_{2}$, the sum of the entries in the $i$th column of $N$ is clearly the sum of the sums of entries of the $i$th columns of ${M}_{1}$ and ${M}_{2}$ respectively. A similar result holds for the $j$th row.

Hence the sum of the entries in the $i$th column of $A$ is the sum of the sums of entries of the $i$th columns of ${\lambda}_{k}{A}_{k}$ for each $i$, that is, ${\sum}_{k=1}^{m}{\lambda}_{k}=1$. The sum of entries of the $j$th row of $A$ is the same. Hence $A$ is doubly stochastic.

Observe that since a permutation matrix^{} has a single nonzero entry, equal to 1, in each row and column, so the sum of entries in any row or column must be 1. So a permutation matrix is doubly stochastic, and on applying the lemma, we see that a convex combination of permutation matrices is doubly stochastic.
This provides one direction of our proof; we now prove the more difficult direction: suppose $B$ is doubly-stochastic. Define a weighted graph $G=(V,E)$ with vertex set $V=\{{r}_{1},\mathrm{\dots},{r}_{n},{c}_{1},\mathrm{\dots},{c}_{n}\}$, edge set $E$, where ${e}_{ij}=({r}_{i},{c}_{i})\in E$ if ${B}_{ij}\ne 0$, and edge weight $\omega $, where $\omega ({e}_{ij})={B}_{ij}$.
Clearly $G$ is a bipartite graph^{}, with partitions^{} $R=\{{r}_{1},\mathrm{\dots},{r}_{n}\}$ and $C=\{{c}_{1},\mathrm{\dots},{c}_{n}\}$, since the only edges in $E$ are between ${r}_{i}$ and ${c}_{j}$ for some $i,j\in \{1,\mathrm{\dots},n\}$. Furthermore, since ${B}_{ij}\ge 0$, then $\omega (e)>0$ for every $e\in E$.
For any $A\subset V$ define $N(A)$, the neighbourhood of $A$, to be the set of vertices $u\in V$ such that there is some $v\in A$ such that $(u,v)\in E$.
We claim that, for any $v\in V$, ${\sum}_{u\in N(\{v\})}\omega (u,v)=1$. Take any $v\in V$; either $v\in R$ or $v\in C$. Since $G$ is bipartite, $v\in R$ implies $N(\{v\})\subset C$, and $v\in C$ implies $N(\{v\})\subset R$. Now,

$$\begin{array}{c}\hfill v={r}_{i}\u27f9\sum _{u\in N({r}_{i})}\omega ({r}_{i},u)=\sum _{\stackrel{j=1,}{{e}_{ij}\in E}}^{n}\omega ({r}_{i},{c}_{j})=\sum _{\stackrel{j=1,}{{B}_{ij}\ne 0}}^{n}{B}_{ij}=\sum _{j=1}^{n}{B}_{ij}=1\hfill \\ \hfill v={c}_{j}\u27f9\sum _{u\in N({c}_{j})}\omega (u,{c}_{j})=\sum _{\stackrel{i=1,}{{e}_{ij}\in E}}^{n}\omega ({r}_{i},{c}_{j})=\sum _{\stackrel{i=1,}{{B}_{ij}\ne 0}}^{n}{B}_{ij}=\sum _{i=1}^{n}{B}_{ij}=1\hfill \end{array}$$ |

since $B$ is doubly stochastic. Now, take any $A\subset R$. We have

$$\sum _{\stackrel{v\in A}{w\in N(A)}}\omega (v,w)=\sum _{v\in A}\sum _{w\in N(\{v\})}\omega (v,w)=\sum _{v\in A}1=|A|$$ |

Let $B=N(A)$. But then clearly $A\subset N(B)$, by definition of neighbourhood. So

$$|N(A)|=|B|=\sum _{\stackrel{v\in B}{w\in N(B)}}\omega (v,w)\ge \sum _{\stackrel{v\in B}{w\in A}}\omega (v,w)=\sum _{\stackrel{w\in A}{v\in N(A)}}\omega (v,w)=|A|$$ |

So $|N(A)|\ge |A|$. We may therefore apply the graph-theoretic version of Hall’s marriage theorem^{} to $G$ to conclude that $G$ has a perfect matching.
So let $M\subset E$ be a perfect matching for $G$. Define an $n\times n$ matrix $P$ by

$${P}_{ij}=\{\begin{array}{cc}1\hfill & \text{if}{e}_{ij}\in M\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$ |

Note that ${B}_{ij}=0$ implies ${P}_{ij}=0$: if ${B}_{ij}=0$, then
$({r}_{i},{c}_{j})\notin E$, so $({r}_{i},{c}_{j})\notin M$, which implies ${P}_{ij}=0$.
Further, we claim that $P$ is a permutation matrix:

Let $i$ be any row of $P$. Since $M$ is a perfect matching of $G$, there exists ${e}_{0}\in M$ such that ${r}_{i}$ is an end of ${e}_{0}$. Let the other end be ${c}_{j}$ for some $i$; then ${P}_{ij}=1$.

Suppose ${i}_{i},{i}_{2}\in \{1,\mathrm{\dots},n\}$ with ${i}_{1}\ne {i}_{2}$ and ${P}_{{i}_{1},j}={P}_{{i}_{2},j}=1$ for some $j$. This implies $({r}_{{i}_{1}},{c}_{j}),({r}_{{i}_{2}},{c}_{j})\in M$, but this implies the vertex $i$ is the end of two distinct edges, which contradicts the fact that $M$ is a matching.

Hence, for each row and column of $P$, there is exactly one nonzero entry, whose value is 1. So $P$ is a permutation matrix.
Define $\lambda =\underset{i,j\in \{1,\mathrm{\dots},n\}}{\mathrm{min}}\{{B}_{ij}\mid {P}_{ij}\ne 0\}$. We see that $\lambda >0$ since ${B}_{ij}\ge 0$, and ${P}_{ij}\ne 0\u27f9{B}_{ij}\ne 0$. Further, $\lambda ={B}_{pq}$ for some $p,q$.
Let $D=B-\lambda P$. If $D=0$, then $\lambda =1$ and $B$ is a permutation matrix, so we are done. Otherwise, note that $D$ is nonnegative; this is clear since $\lambda {P}_{ij}\le \lambda \le {B}_{ij}$ for any ${B}_{ij}\ne 0$.
Notice that ${D}_{pq}={B}_{pq}-\lambda {P}_{pq}=\lambda -\lambda \cdot 1=0$.
Note that since every row and column of $B$ and $P$ sums to 1, that every row and column of $D=B-\lambda P$ sums to $1-\lambda $. Define ${B}^{\prime}=\frac{1}{1-\lambda}D$. Then every row and column of ${B}^{\prime}$ sums to 1, so ${B}^{\prime}$ is doubly stochastic.
Rearranging, we have $B=\lambda P+(1-\lambda ){B}^{\prime}$. Clearly ${B}_{ij}=0$ implies that ${P}_{ij}=0$ which implies that ${B}_{ij}^{\prime}=0$, so the zero entries of ${B}^{\prime}$ are a superset of those of ${B}^{\prime}$. But notice that ${B}_{pq}^{\prime}=\frac{1}{1-\lambda}{D}_{pq}=0$, so the zero entries of ${B}^{\prime}$ are a strict superset of those of $B$.
We have decomposed $B$ into a convex combination of a permutation matrix and another doubly stochastic matrix with strictly more zero entries than $B$. Thus we may apply this procedure repeatedly on the doubly stochastic matrix obtained from the previous step, and the number of zero entries will increase with each step. Since $B$ has at most ${n}^{2}$ nonzero entries, we will obtain a convex combination of permutation matrices in at most ${n}^{2}$ steps.
Thus $B$ is indeed expressible as a convex combination of permutation matrices.

## References

G. Birkhoff, “Tres observaciones sobre el algebra lineal,” Univ. Nac. Tucumán Rev, Ser. A, no. 5, pp147–151. (1946)

Title | proof of Birkhoff-von Neumann theorem |
---|---|

Canonical name | ProofOfBirkhoffvonNeumannTheorem |

Date of creation | 2013-03-22 13:10:03 |

Last modified on | 2013-03-22 13:10:03 |

Owner | Andrea Ambrosio (7332) |

Last modified by | Andrea Ambrosio (7332) |

Numerical id | 14 |

Author | Andrea Ambrosio (7332) |

Entry type | Proof |

Classification | msc 15A51 |