
ABSTRACT

TAYLOR, VALERIE EOWYN. The Birkhoff-von Neumann Decomposition and its Applications. (Under the direction of Dr. Arvind Krishna Saibaba).

This paper explores the Birkhoff-von Neumann decomposition theorem, a celebrated theorem applicable to a specific class of matrices called doubly stochastic matrices. The Birkhoff-von Neumann decomposition has many applications, ranging from theoretical areas such as approximation to applied areas such as graph isomorphisms and assignment. The purpose of this paper is to review the literature and to collect and organize various statements and applications of this theorem.

This paper will start by presenting two different and equivalent statements of the theorem. We will present proofs for both statements and highlight the connections between them. We then turn to the applications of this theorem. We mention several theoretical applications where the Birkhoff-von Neumann theorem is used in the proof of other results. We then address some more applied results such as graph isomorphism and the assignment problem.

The Birkhoff-von Neumann Decomposition and its Applications

by Valerie Eowyn Taylor

A thesis submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the degree of Master of Science

Mathematics

Raleigh, North Carolina

2018

APPROVED BY:

Dr. Arvind Krishna Saibaba
Chair of Advisory Committee

Dr. Ernie Stitzinger

Dr. Agnes Szanto

BIOGRAPHY

The author was born in Chapel Hill, NC on April 8th, 1991. They were homeschooled by their mother until attending public high school in 9th grade at Cedar Ridge High School in Hillsborough, NC. They then completed their undergraduate career at the University of North Carolina at Wilmington, receiving their Bachelor of Arts in Mathematics Education. Upon graduating, they began teaching math at John T. Hoggard High School in Wilmington. They taught at JTH for three years until entering graduate school at North Carolina State University. They plan on returning to the education field upon completion of the graduate program at NCSU.

TABLE OF CONTENTS

1. Introduction
2. Birkhoff-von Neumann Decomposition
3. Applications
   3.1. Von Neumann Trace Inequality
   3.2. Hoffman-Wielandt Theorem
   3.3. Other Applications
        Graph Isomorphisms
        Assignment Problem
        Majorization
        Fan Dominance Principle
4. Conclusion
References
Appendices
   Appendix A: Additional Proofs
   Appendix B: Examples

1. Introduction

Doubly stochastic matrices are square matrices with non-negative entries such that all the rows and columns sum to 1. Doubly stochastic matrices have many applications. These matrices are used in intercity population migration models. For example, suppose there are cities C1, . . . , Cn with n ≥ 2 and each day a constant fraction aij of the current population of city j moves to city i, for all distinct i, j ∈ {1, . . . , n}. Problems like this are necessary for issues such as planning city services and capital investment. These migration models quickly become complicated, and doubly stochastic matrices can be extremely useful in the calculations [4]. Doubly stochastic matrices are also used in Markov chains and in modeling problems in economics and operations research. Another application related to modeling deals with communication theory and satellites orbiting the earth [2].

This paper will explore the Birkhoff-von Neumann decomposition and various applications of the theorem. We will first present two different, and equivalent, statements of the theorem along with two different proofs. Then we will look at how this result impacted the development of other results, as well as some practical applications of the theorem.

Throughout this document let A = [aij] be an n × n matrix over R, unless otherwise stated. We denote by π a permutation of the integers {1, . . . , n}, and its entries by π(i) for i = 1, . . . , n. Denote the columns of the n × n identity matrix by e1, . . . , en. Associated with every permutation π is the permutation matrix P given by

P = [eπ(1) · · · eπ(n)].

2. Birkhoff-von Neumann Decomposition

The Birkhoff-von Neumann Decomposition theorem is an important result for a special class of matrices known as doubly stochastic, a definition of which follows below.

Definition 1 (Doubly Stochastic Matrices). A doubly stochastic matrix is a square n × n matrix A with non-negative entries aij ≥ 0 for i, j = 1, . . . , n and

∑_{i=1}^n aij = 1 for j = 1, . . . , n,  and  ∑_{j=1}^n aij = 1 for i = 1, . . . , n.

This definition simply means that all the columns and the rows sum to 1. Alternatively, given a doubly stochastic matrix A and the vector of all ones e = (1, . . . , 1)^T ∈ R^n, we have Ae = e and e^T A = e^T. This second characterization implies that 1 is always an eigenvalue of a doubly stochastic matrix, with corresponding right and left eigenvector e.
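The defining conditions are easy to check directly. Below is a minimal sketch in plain Python with exact rational arithmetic; the function name is our own.

```python
from fractions import Fraction as F

def is_doubly_stochastic(A):
    """Check the definition directly: non-negative entries,
    every row sums to 1, and every column sums to 1."""
    n = len(A)
    if any(A[i][j] < 0 for i in range(n) for j in range(n)):
        return False
    rows_ok = all(sum(A[i]) == 1 for i in range(n))
    cols_ok = all(sum(A[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok

A = [[F(1, 2), F(1, 2)],
     [F(1, 2), F(1, 2)]]
print(is_doubly_stochastic(A))  # True
```

Equivalently, one could check Ae = e and e^T A = e^T for e = (1, . . . , 1)^T.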

Given this definition, we can state the first version of the Birkhoff-von Neumann decomposition theorem.

Theorem 2 (Birkhoff-von Neumann Decomposition). Let A be a doubly stochastic matrix. Then there exist constants α1, α2, . . . , αk ∈ (0, 1] with ∑_{i=1}^k αi = 1 and permutation matrices P1, P2, . . . , Pk such that

A = α1P1 + . . . + αkPk.

That is, a doubly stochastic matrix can be expressed as a convex combination of permutation matrices. Conversely, a single permutation matrix is itself doubly stochastic. The summation above is what is referred to as the Birkhoff-von Neumann decomposition. Before we can prove this theorem, we need to establish the following lemma.

Lemma 3. Let A be an n × n doubly stochastic matrix that is not the identity matrix. Then there is a permutation π of {1, . . . , n} that is not the identity permutation and is such that

a1π(1) · · · anπ(n) > 0.

This means that we can find n nonzero elements of A, one in each row and column. Recall that all of the entries of A are non-negative; the lemma says that the product of these entries is positive, which simply means that none of them is zero. The proof of this lemma is located in Appendix A, so that we can directly give the proof of the Birkhoff-von Neumann theorem. The following proof is adapted from the proof presented by Marshall, Olkin and Arnold [6].

Proof of Theorem 2. Let A be doubly stochastic. If A is a permutation matrix, there is nothing to prove. So, assume that A is not a permutation matrix.

Let π be a permutation of (1, . . . , n) that is not the identity permutation, such that the product a1π(1)a2π(2) · · · anπ(n) ≠ 0, whose existence is ensured by Lemma 3. Denote the corresponding permutation matrix by P1. Let c1 = min{a1π(1), . . . , anπ(n)} and define R by A = c1P1 + R.

Note that c1 ≤ 1 since A is doubly stochastic. Also note that c1 ≠ 0, since none of the values {a1π(1), . . . , anπ(n)} can equal 0, as their product is not 0.

Because c1P1 has the element c1 in positions 1π(1), 2π(2), . . . , nπ(n) and A has elements a1π(1), . . . , anπ(n) in the corresponding positions, the choice of c1 = min{a1π(1), . . . , anπ(n)} ensures that alπ(l) − c1 ≥ 0, with equality for some l. Consequently, R has non-negative elements, since rlπ(l) = alπ(l) − c1, and contains at least one more zero element than A, since for some l we know that alπ(l) = c1, which implies rlπ(l) = 0.

Observe that for e = (1, 1, . . . , 1)^T we have that

e = Ae = (c1P1 + R)e = c1P1e + Re = c1e + Re.

Now we have two cases. First consider c1 = 1. This implies that R = 0 and A = P1, so A is already a permutation matrix and the desired decomposition is trivial.

For the second case, consider c1 < 1. From our earlier statement that e = c1e + Re, we can write e = A1e, where A1 = R/(1 − c1). The matrix A1 is doubly stochastic, since all of its entries are non-negative and A1e = e and e^T A1 = e^T, which is the definition of doubly stochastic.

In this case, we apply the same procedure to A1 to continue the decomposition. Each time we reduce the number of nonzero entries in the remainder, until we reach the zero matrix. Note that each time we pick a permutation of {1, . . . , n} we necessarily pick a permutation that is not the identity. If the identity permutation is the only one available, then A = I as shown in the proof of Lemma 3, and hence the decomposition is done.

Consequently, for some k, when the remainder is 0, we have A = c1P1 + . . . + ckPk, where each Pi is a permutation matrix. It remains to observe that

e = Ae = c1P1e + . . . + ckPke = (c1 + . . . + ck)e,

which implies that ∑_{i=1}^k ci = 1. This completes the proof. □

It is relevant here to note a bound on the number of iterations needed to reach the decomposition and a bound on k, the number of summands. A doubly stochastic matrix has at most n² − n zero entries: there are a total of n² entries and there must be at least one nonzero entry in each column, meaning at least n nonzero entries. Since there are at most n² − n zero entries, there are at most n² − n iterations in the decomposition process. After this many iterations we would have n² − n + 1 summands; that is, k ≤ n² − n + 1.

This bound has since been improved upon by several sources; for reference consult [1, Section II, pg. 38] and [6, Section 2, Theorem F.2]. It has been proved that the best possible bound for k is n² − 2n + 2. That is, every n × n doubly stochastic matrix can be represented as a convex combination of at most n² − 2n + 2 permutation matrices.

One immediate consequence is that this is a constructive proof, giving an algorithm for computing the Birkhoff-von Neumann decomposition. Below we give an example of a doubly stochastic 3 × 3 matrix and its resulting decomposition; the complete work for finding this decomposition by hand is located in Appendix B. However, in practice this is not necessarily the algorithm that would be used: finding a Birkhoff-von Neumann decomposition with the minimum number of permutation matrices has been shown to be an NP-hard problem.

Example 1. Let A be defined as the matrix below:

A = ⎡1/3  2/3   0 ⎤
    ⎢1/3  1/3  1/3⎥
    ⎣1/3   0   2/3⎦

It can easily be verified that A is doubly stochastic, as each row and column sums to 1. After following the steps described in the proof above, we reach the decomposition

A = (1/3)⎡0 1 0⎤ + (1/3)⎡0 1 0⎤ + (1/3)⎡1 0 0⎤
         ⎢0 0 1⎥        ⎢1 0 0⎥        ⎢0 1 0⎥
         ⎣1 0 0⎦        ⎣0 0 1⎦        ⎣0 0 1⎦

There is another, equivalent, way to state the Birkhoff-von Neumann decomposition theorem. It is stated below, followed by a different proof of the same result.

Theorem 4 (Birkhoff-von Neumann Decomposition). The permutation matrices constitute the extreme points of the set of doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex hull of the permutation matrices.

Let us clarify the concepts of extreme points and convex hull in regard to matrices. Geometrically, a convex figure is one in which, if we connect any two points a and b that are in the figure, the line segment ab is also contained within the figure. A convex polytope is the general name given to multi-dimensional figures in R^n that are both a polytope and a convex set of points. Recall that a polytope is a geometric object with flat sides. That is, a convex polytope is the smallest convex figure that contains a finite nonempty set of points in R^n.

Convex polytopes can also be viewed as intersections of half-spaces. This viewpoint is helpful when relating these polytopes to matrices. First note that a closed half-space can be written as a linear inequality, such as a^T x ≤ b, so a convex polytope can be viewed as the set of solutions to a system of linear inequalities Ax ≤ b, where A is the matrix of coefficients. The doubly stochastic matrices are exactly the solutions of such a system (the non-negativity constraints together with the row- and column-sum constraints), so the set of n × n doubly stochastic matrices can be viewed as a convex polytope in R^k, where k = n².

An extreme point of a convex set is similar to the idea of a maximum or minimum. Geometrically, the extreme points are the vertices of the convex polytope. Note that any point contained within the convex polytope can be written as a convex combination of its vertices. Thus, the extreme-point matrices in this case are those that represent the vertices of the convex polytope formed by the set of doubly stochastic matrices. This also implies that every doubly stochastic matrix can be written as a convex combination of the extreme-point matrices. Another important property of an extreme point A is that A cannot be written as λA1 + (1 − λ)A2 for distinct elements A1 and A2 of the convex set with 0 < λ < 1.

For the convex polytope in R^{n²} that is formed by the doubly stochastic matrices, an extreme point must satisfy pij = 0 for at least n² − (2n − 1) = (n − 1)² elements. This comes from the fact that there are n² total entries in each matrix and 2n − 1 independent linear constraints specifying that the row and column sums all equal one. There are 2n − 1 constraints rather than 2n because one of the constraints is dependent on the others: if every row sums to one, the column sums automatically total n, so one column constraint is redundant. One can also think of each entry aij as subject to 2n − 1 constraints, since there are n entries in the i-th row and n entries in the j-th column, and we subtract one so that the aij entry isn't double counted.

The following proof was also adapted from the proof presented by Marshall, Olkin and Arnold [6].

Proof of Theorem 4. Suppose that the matrix P = [pij] is an extreme point of the convex set of doubly stochastic n × n matrices. Then, as we just observed, it must be that pij = 0 for at least (n − 1)² pairs (i, j). At least one row must have n − 1 zero entries, with the remaining entry being a one. We know some row must have n − 1 zeros because if we put only n − 2 zero entries in each row, then only n(n − 2) = (n − 1)² − 1 zeros would be distributed, not the (n − 1)² that we need. In the column containing this unit entry, all other entries must be zero. If this row and column are deleted from P, then an (n − 1) × (n − 1) doubly stochastic matrix P′ is obtained.

To see that P′ must be an extreme point of the set of (n − 1) × (n − 1) doubly stochastic matrices, suppose for ease of notation that P′ is obtained from P by deleting the last row and column. We use the fact that if P′ is extreme then it cannot be written as a convex combination of distinct elements. If P′ has a representation λP1′ + (1 − λ)P2′, where 0 < λ < 1 and P1′ and P2′ are doubly stochastic, we can express P as

P = λ ⎡P1′  0⎤ + (1 − λ) ⎡P2′  0⎤
      ⎣ 0   1⎦           ⎣ 0   1⎦

Since P is extreme, we must have P1′ = P2′, which implies that P′ is extreme. Now that we know P′ is extreme, we can repeat the argument we just used with P and say that P′ must have at least one row with a single one entry and all other entries zero. Using induction on n with this argument, we arrive at the conclusion that P is a permutation matrix. Hence, we have shown that if P is an extreme point then P is a permutation matrix.

To show that each n × n permutation matrix is extreme, suppose that P is a permutation matrix and P = λP1 + (1 − λ)P2, where 0 < λ < 1 and P1 and P2 are doubly stochastic. Since P1 and P2 are doubly stochastic, all of their entries lie in the interval [0, 1]. Thus P cannot have entries consisting only of zeros and ones unless P1 = P2, which implies that P is extreme.

Lastly, we must show that the class of doubly stochastic matrices is the convex hull of its extreme points, meaning the set of doubly stochastic matrices is the smallest convex set that contains its extreme points, the permutation matrices. This follows from the fact that the set of doubly stochastic matrices is a closed, bounded convex set, and such a set is the convex hull of its extreme points. □

Although perhaps not immediately obvious, these two statements of the Birkhoff-von Neumann theorem are readily equivalent. The algebraic version of the theorem states that a doubly stochastic matrix A can be written as A = c1P1 + . . . + ckPk, a convex combination of permutation matrices. The geometric version says that the set of doubly stochastic matrices forms a convex polytope with the permutation matrices as the vertices. Any point in a convex polytope can be written as a convex combination of its vertices. Thus, since every doubly stochastic matrix can be written as a convex combination of permutation matrices, the permutation matrices are the vertices (extreme points) of the convex polytope that the doubly stochastic matrices form.

3. Applications

There are many applications of the Birkhoff-von Neumann decomposition. On one hand, this theorem led to the formation of several other theorems and is crucial in their proofs. Furthermore, the theorem has applications in applied areas such as graph theory and assignment problems.

3.1. Von Neumann Trace Inequality. One of the theoretical applications of the Birkhoff-von Neumann theorem is a trace inequality proved by von Neumann. Before stating this theorem, let us clarify some notation and definitions.

Let A be a complex n × n matrix and let its conjugate transpose be denoted by A*. The SVD of A is denoted by

A = UΣV*,  Σ = diag(σ1(A), . . . , σn(A)).

Recall the singular values of A are always real and non-negative, and arranged in decreasing order.

The theorem proved by von Neumann gives an upper bound on the trace of matrix products, and is useful in many matrix approximation problems.

Theorem 5 (Von Neumann Trace Inequality). If A, B are complex n × n matrices with singular values arranged in decreasing order,

σ1(A) ≥ . . . ≥ σn(A),  σ1(B) ≥ . . . ≥ σn(B),

respectively, then

|tr(AB)| ≤ ∑_{r=1}^n σr(A)σr(B).

The proof of this theorem relies on the following lemma. The proof of this lemma, in turn, is a consequence of the Birkhoff-von Neumann decomposition.

Lemma 6. Let A ∈ R^{n×n} be doubly stochastic. If the entries of the vectors x and y are real and arranged in decreasing order,

x1 ≥ . . . ≥ xn ≥ 0,  y1 ≥ . . . ≥ yn ≥ 0,

then

∑_{r,s=1}^n ars xr ys ≤ ∑_{r=1}^n xr yr.

Proof. It is readily verified that the statement of the lemma can alternatively be expressed as x^T Ay ≤ x^T y.

Since A is doubly stochastic, we know by Theorem 2 that it can be decomposed as a convex combination of permutation matrices. That is, A = c1P1 + c2P2 + . . . + ckPk, where ∑_{i=1}^k ci = 1 and each Pi is a permutation matrix. On the left side of the inequality, we replace A with its decomposition, so that

x^T Ay = x^T (c1P1 + . . . + ckPk) y = c1 x^T P1 y + . . . + ck x^T Pk y = ∑_{j=1}^k cj ∑_{i=1}^n xi yπj(i).

Here πj is the permutation corresponding to the matrix Pj. The notation yπj(i) means that the indices of the vector y have been permuted according to the permutation πj. Note that the permutation matrix was directly incorporated into y; however, without loss of generality, we could equivalently have incorporated it into the x vector. The following inequality will be of use in the sequel, but we defer its proof until the end. For any permutation π of the integers,

(1)  ∑_{i=1}^n xi yπ(i) ≤ x^T y.

Applying this inequality, we have

x^T Ay = c1 (x1 yπ1(1) + . . . + xn yπ1(n)) + . . . + ck (x1 yπk(1) + . . . + xn yπk(n))
       ≤ c1 x^T y + . . . + ck x^T y = (c1 + . . . + ck) x^T y = x^T y.

This proves the desired result; it remains only to justify (1), which completes the proof. It is important to remember that x1 ≥ . . . ≥ xn and likewise y1 ≥ . . . ≥ yn; in particular, xi − xi+1 ≥ 0 since the xi are in decreasing order. Using summation by parts, for any permutation πl we can write

∑_{i=1}^n xi yπl(i) = ∑_{i=1}^{n−1} (xi − xi+1) ∑_{j=1}^i yπl(j) + xn ∑_{j=1}^n yπl(j)
                   ≤ ∑_{i=1}^{n−1} (xi − xi+1) ∑_{j=1}^i yj + xn ∑_{j=1}^n yj
                   = ∑_{i=1}^n xi yi.

The middle inequality holds because y is sorted in decreasing order, so any i entries of y sum to at most y1 + . . . + yi. Thus we have verified inequality (1) and the proof is complete. □
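Lemma 6 is easy to test numerically. The sketch below (plain Python; helper names are our own) builds a random doubly stochastic matrix as a convex combination of permutation matrices, relying on the converse direction of Theorem 2, and checks x^T Ay ≤ x^T y for sorted non-negative vectors.

```python
import random

def rand_doubly_stochastic(n, k=5):
    """Random convex combination of k permutation matrices;
    doubly stochastic by the converse direction of Theorem 2."""
    weights = [random.random() for _ in range(k)]
    total = sum(weights)
    A = [[0.0] * n for _ in range(n)]
    for w in weights:
        pi = random.sample(range(n), n)     # a uniformly random permutation
        for i in range(n):
            A[i][pi[i]] += w / total
    return A

def quad_form(x, A, y):
    """Compute sum_{r,s} a_rs x_r y_s = x^T A y."""
    n = len(x)
    return sum(A[r][s] * x[r] * y[s] for r in range(n) for s in range(n))

n = 6
x = sorted((random.random() for _ in range(n)), reverse=True)
y = sorted((random.random() for _ in range(n)), reverse=True)
A = rand_doubly_stochastic(n)
# Lemma 6: x^T A y <= x^T y for non-negative decreasing x and y.
assert quad_form(x, A, y) <= sum(a * b for a, b in zip(x, y)) + 1e-12
```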

Now that we have proven this lemma, we can move on to proving von Neumann’s Trace Inequality theorem. This proof is largely based on a proof presented by Mirsky [7].

Proof of Theorem 5. Let A and B have the SVDs A = U1RV1 and B = U2SV2, where U1, V1, U2, V2 are unitary matrices and R, S are diagonal matrices. It is important to note that in the SVD the diagonal matrix in the middle of the decomposition has the singular values on its diagonal, so we have the following:

R = diag(σ1(A), . . . , σn(A)),  S = diag(σ1(B), . . . , σn(B)).

Because of the cyclic property of the trace we can say

tr(AB) = tr(U1RV1U2SV2) = tr(V2U1RV1U2S) = tr(U*RVS),

where U = [urs] = U1*V2* (so that U* = V2U1) and V = [vrs] = V1U2. The matrices U and V are unitary, since each is the product of two unitary matrices. So we have that

tr(AB) = ∑_{r,s=1}^n urs vrs σr(A) σs(B).

An application of the triangle inequality shows that

|tr(AB)| ≤ ∑_{r,s=1}^n |urs vrs| σr(A) σs(B)
         ≤ (1/2) ∑_{r,s=1}^n |urs|² σr(A) σs(B) + (1/2) ∑_{r,s=1}^n |vrs|² σr(A) σs(B).

The last step uses the inequality |urs vrs| ≤ (|urs|² + |vrs|²)/2 between the geometric and arithmetic means. Since U and V are unitary, the rows and columns of each have unit norm (recall that U*U = I), so the matrices [|urs|²] and [|vrs|²] are doubly stochastic, and we can use Lemma 6. Hence, applying the lemma, we have

(1/2) ∑_{r,s=1}^n |urs|² σr(A) σs(B) + (1/2) ∑_{r,s=1}^n |vrs|² σr(A) σs(B)
    ≤ (1/2) ∑_{r=1}^n σr(A) σr(B) + (1/2) ∑_{r=1}^n σr(A) σr(B)
    = ∑_{r=1}^n σr(A) σr(B).

Thus we have shown that |tr(AB)| ≤ ∑_{r=1}^n σr(A)σr(B) and have proven the theorem. □

This result, which bounds the trace of a matrix product in terms of singular values, has several applications and is used in the proofs of many other theorems. One result where it is crucial is the unitary Procrustes problem. This problem asks: given m × n matrices A and B, how well can A be approximated in the Frobenius norm by the rotation UB for some m × m unitary matrix U? Although we will not complete the whole proof of the Procrustes problem here, the relevant calculation involving the von Neumann trace inequality is shown below:

‖A − UB‖²_F = ‖A‖²_F − 2 Re tr(AB*U*) + ‖B‖²_F
            ≥ ‖A‖²_F − 2 ∑_{i=1}^m σi(AB*) + ‖B‖²_F.

The inequality follows directly from the von Neumann theorem, since |tr(AB)| ≤ ∑_{r=1}^n σr(A)σr(B) implies that −tr(AB) ≥ −∑_{r=1}^n σr(A)σr(B).
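As a sanity check, the trace inequality can be verified numerically. The sketch below (plain Python; helper names are our own) computes the singular values of a real 2 × 2 matrix in closed form, as the square roots of the eigenvalues of M^T M, and checks Theorem 5 for a pair of example matrices.

```python
import math

def singular_values_2x2(M):
    """Singular values of a real 2x2 matrix M, in decreasing order:
    the square roots of the eigenvalues of G = M^T M."""
    (a, b), (c, d) = M
    g11 = a * a + c * c
    g22 = b * b + d * d
    g12 = a * b + c * d
    t = g11 + g22                                    # trace of G
    disc = math.sqrt((g11 - g22) ** 2 + 4 * g12 * g12)
    return [math.sqrt((t + disc) / 2), math.sqrt(max(t - disc, 0.0) / 2)]

def trace_product(A, B):
    """tr(AB) for 2x2 matrices."""
    return sum(A[i][k] * B[k][i] for i in range(2) for k in range(2))

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
bound = sum(sa * sb for sa, sb in zip(singular_values_2x2(A),
                                      singular_values_2x2(B)))
# Theorem 5: |tr(AB)| <= sum_r sigma_r(A) sigma_r(B)
assert abs(trace_product(A, B)) <= bound + 1e-12
```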

3.2. Hoffman-Wielandt Theorem. Another theoretical result that utilizes the Birkhoff-von Neumann theorem is the Hoffman-Wielandt theorem stated below. This theorem is useful in dealing with matrix perturbations. Let A be the target matrix and let A + E be the perturbed matrix. Supposing that both matrices are normal, Theorem 7 gives a Frobenius-norm upper bound on the perturbation of all of the eigenvalues.

Let ‖·‖_F denote the Frobenius norm, which is defined as ‖A‖_F = √(tr(A*A)). Alternatively, the Frobenius norm can be expressed in terms of the singular values of the matrix as

‖A‖_F = ( ∑_{j=1}^n σj² )^{1/2}.

Theorem 7 (Hoffman-Wielandt). Let A, E ∈ C^{n×n}, and assume that A and A + E are both normal. Let λ1, . . . , λn be the eigenvalues of A in some given order, and let λ̂1, . . . , λ̂n be the eigenvalues of A + E in some given order. Then there exists a permutation π of the integers {1, . . . , n} such that

∑_{i=1}^n |λ̂π(i) − λi|² ≤ ‖E‖²_F = tr(E*E).

The following proof is based on the proof presented by Horn and Johnson [4].

Proof of Theorem 7. Let Λ = diag{λ1, . . . , λn} and Λ̂ = diag{λ̂1, . . . , λ̂n} contain the eigenvalues of A and A + E, respectively. Since both matrices are normal, the spectral theorem says that we can represent A = V ΛV* and A + E = W Λ̂W*, where V and W are unitary. Using unitary invariance of the Frobenius norm, we have the following:

‖E‖²_F = ‖(A + E) − A‖²_F
       = ‖W Λ̂W* − V ΛV*‖²_F
       = ‖V*W Λ̂ − ΛV*W‖²_F
       = ‖UΛ̂ − ΛU‖²_F,  where U = V*W,
       = ∑_{i,j=1}^n |λ̂i − λj|² |uij|².

As we have seen in the proof of Theorem 5, the matrix [|uij|²] is doubly stochastic. So we can say

‖E‖²_F = ∑_{i,j=1}^n |λ̂i − λj|² |uij|² ≥ min { ∑_{i,j=1}^n |λ̂i − λj|² sij },

where the minimum is taken over doubly stochastic matrices S = [sij]. The function f(S) = ∑_{i,j=1}^n |λ̂i − λj|² sij is a linear function on the compact convex set of doubly stochastic matrices, so by Theorem 4 it attains its minimum at some permutation matrix P. If P^T corresponds to the permutation π of {1, . . . , n}, then we have

‖E‖²_F ≥ ∑_{i,j=1}^n |λ̂i − λj|² pij = ∑_{i=1}^n |λ̂π(i) − λi|².

Thus we have proven the desired result. □

Note that this theorem guarantees the bound only for a specific permutation π, and not necessarily for an arbitrary permutation. However, if A is Hermitian, then the eigenvalues may be ordered in increasing (or decreasing) order and the theorem simplifies to

∑_{i=1}^n |λ̂i − λi|² ≤ ‖E‖²_F = tr(E*E).
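For diagonal matrices, which are normal and whose eigenvalues are simply the diagonal entries, the bound is easy to check by brute force over all permutations. A small sketch (plain Python; the example values are our own):

```python
from itertools import permutations

# Diagonal matrices are normal, and their eigenvalues are the
# diagonal entries, so the Hoffman-Wielandt bound is easy to test.
lam = [1.0, 4.0, -2.0]              # eigenvalues of A = diag(lam)
pert = [0.3, -0.1, 0.2]             # E = diag(pert), so A + E is also normal
lam_hat = [l + e for l, e in zip(lam, pert)]

fro_sq = sum(e * e for e in pert)   # ||E||_F^2 for the diagonal E

best = min(sum(abs(lam_hat[p[i]] - lam[i]) ** 2 for i in range(3))
           for p in permutations(range(3)))
# Theorem 7: some permutation achieves the bound sum <= ||E||_F^2.
assert best <= fro_sq + 1e-9
```

Here the identity permutation already attains the bound, matching the Hermitian special case above.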

3.3. Other Applications. There are several more applications of the Birkhoff-von Neumann theorem, and we will briefly discuss some of them here. The first two are more applied in nature and are related to graph theory and the assignment problem. The next two are more theoretical, one of which gives a way to tackle unitarily invariant norms.

Graph Isomorphisms. One unexpected application of the Birkhoff-von Neumann theorem is within graph theory. There are a multitude of ways that doubly stochastic matrices, and thus the Birkhoff-von Neumann theorem, can be applied to graphs. One major way is in the discussion of graph isomorphisms. Consider a graph G that has n vertices, in which each pair of vertices is connected by at most one edge. We can create an adjacency matrix A such that aij = 1 if vertices i and j are adjacent and aij = 0 if they are not.

Let A be the adjacency matrix for graph G and let B be the adjacency matrix for a graph H that also has n vertices. A permutation π of the vertex set {1, . . . , n} is an isomorphism of G and H if it is such that i and j are joined by an edge in G if and only if π(i) and π(j) are joined by an edge in H. If we let P be the permutation matrix corresponding to π, then G and H are isomorphic if and only if A = PBP⁻¹, or equivalently AP = PB.
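The condition AP = PB is straightforward to verify for small graphs. Below is a minimal sketch (plain Python; the helper names and example graphs are our own): G is the path 0–1–2 and H is the same path with its center relabeled as vertex 0.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def perm_matrix(pi):
    """Permutation matrix with a one in position (i, pi(i))."""
    n = len(pi)
    return [[1 if pi[i] == j else 0 for j in range(n)] for i in range(n)]

# G: path 0-1-2 (center vertex 1); H: the same path with center vertex 0.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
B = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
P = perm_matrix((1, 0, 2))   # candidate isomorphism: swap vertices 0 and 1
# pi is an isomorphism of G and H iff AP = PB.
assert matmul(A, P) == matmul(P, B)
```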

The definition of an isomorphism can be relaxed to that of a doubly stochastic isomorphism. If X is a doubly stochastic matrix, then G and H have a doubly stochastic isomorphism if and only if AX = XB. Note that this is a relaxed definition because permutation matrices are a subset of doubly stochastic matrices. Within graph isomorphisms, we often want to discuss automorphisms. A graph automorphism is the case G = H; that is, we consider isomorphisms from G to itself.

To understand the full connection with the Birkhoff-von Neumann theorem we must define some concepts pertaining to automorphisms. For a graph G with adjacency matrix A, let Π(A) be the set of automorphisms of G, and let Π̄(A) be the convex hull of the automorphisms of G, that is, the smallest convex set containing all of them. Let Ωn(A) be the convex polytope of doubly stochastic isomorphisms of G; that is, Ωn(A) is the set of doubly stochastic matrices X that commute with A, the adjacency matrix of G.

A question of interest is then: when does Π̄(A) = Ωn(A)? Graphs for which this holds are called compact graphs. Some examples of compact graphs are complete graphs and graphs consisting of only self-loops. Identifying families of compact graphs is an extension of the Birkhoff-von Neumann theorem.

To see this connection, we will consider a special case. Let G be a graph with n vertices and n edges that are all self-loops. This implies that A = In, since every vertex is connected only to itself. Then Ωn(In) is equal to the set of all doubly stochastic matrices, since the identity matrix commutes with everything, and Π(In) is the set of all n × n permutation matrices, since every relabeling of the vertices preserves the self-loops.

Thus by the Birkhoff-von Neumann theorem we know that Π̄(In) is the set of doubly stochastic matrices, since the theorem states exactly that the set of doubly stochastic matrices is the convex hull of the permutation matrices. Therefore Ωn(In) = Π̄(In) and the graph G is compact. The Birkhoff-von Neumann theorem can be used in a similar way to show that other families of graphs are compact.

Assignment Problem. Another application of the Birkhoff-von Neumann decomposition that is applied in nature is the random assignment problem. The assignment problem refers to matching n objects to n categories, for example, matching n men/factories/workers to n women/buyers/jobs.

To see how this problem is related to doubly stochastic matrices, consider randomly assigning three agents {1, 2, 3} to three objects {a, b, c} with the following matrix:

P = ⎡1/3  2/3   0 ⎤
    ⎢1/3  1/3  1/3⎥
    ⎣1/3   0   2/3⎦

Here the pij entry is the probability of assigning agent i to object j. If each agent were considered separately, some objects might be allocated more than once while others are not allocated at all, which is not feasible. Thus the doubly stochastic matrix is used to treat the problem as a whole. Note that the example above is the same matrix for which we computed the Birkhoff-von Neumann decomposition earlier, restated below:

P = (1/3)⎡0 1 0⎤ + (1/3)⎡0 1 0⎤ + (1/3)⎡1 0 0⎤
         ⎢0 0 1⎥        ⎢1 0 0⎥        ⎢0 1 0⎥
         ⎣1 0 0⎦        ⎣0 0 1⎦        ⎣0 0 1⎦

Each permutation matrix in the decomposition represents a possible allocation of agents to objects. For example, the first matrix represents agent 1 assigned to object b, agent 2 assigned to object c, and agent 3 assigned to object a. The coefficient in front is the probability of that allocation occurring, so in this case each of the three allocations has a one-third probability of happening.

When the random assignment is a simple one such as this, where the number of agents equals the number of objects, the Birkhoff-von Neumann theorem can be directly applied. However, adjustments must be made to the theorem for more complicated situations: it may be acceptable for some agents or objects to remain unassigned, some agents may be assigned to more than one object, or the numbers of agents and objects may differ. In these situations the Birkhoff-von Neumann theorem must be altered, sometimes a little and sometimes a lot, to compensate for these changes.
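Implementing such a random assignment amounts to a lottery over the permutation matrices of the decomposition, with the coefficients as probabilities. A minimal sketch for the 3 × 3 example above (plain Python; the variable names are our own):

```python
import random

# The three permutations from the decomposition of P above, written as
# tuples: position i holds the object assigned to agent i+1.
allocations = [("b", "c", "a"), ("b", "a", "c"), ("a", "b", "c")]
weights = [1 / 3, 1 / 3, 1 / 3]     # the Birkhoff-von Neumann coefficients

# Draw one allocation with its coefficient as probability; marginally,
# agent i then receives object j with probability p_ij.
chosen = random.choices(allocations, weights=weights, k=1)[0]
assert chosen in allocations
assert sorted(chosen) == ["a", "b", "c"]   # every object assigned exactly once
```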

Majorization. The theory of majorization is closely connected to doubly stochastic matrices, and thus to the Birkhoff-von Neumann theorem. Majorization is a way of comparing two vectors: we say that y majorizes x, and write x ≺ y, according to the following definition.

Definition 8 (Majorization). For x, y ∈ R^n with entries arranged in decreasing order, x ≺ y if

∑_{i=1}^k xi ≤ ∑_{i=1}^k yi for k = 1, . . . , n − 1,  and  ∑_{i=1}^n xi = ∑_{i=1}^n yi.

A common theorem in majorization is directly related to doubly stochastic matrices. It states a necessary and sufficient condition for y to majorize x.

Theorem 9. Let x, y ∈ R^n. Then x ≺ y if and only if there exists a doubly stochastic matrix P such that x = P y.

This theorem provides a crucial connection between majorization and doubly stochastic matrices. Once we have this connection, we can apply the Birkhoff-von Neumann theorem in a variety of ways within majorization. One such way is the proposition stated below.

Proposition 10. Let x, y ∈ R^n. Then x ≺ y if and only if x lies in the convex hull of the n! permutations of y.

The proof of this proposition relies heavily on the Birkhoff-von Neumann theorem. From Theorem 9 we know that x is majorized by y if and only if x = P y for some doubly stochastic matrix P. Since P is doubly stochastic, the Birkhoff-von Neumann theorem tells us that it can be written as a convex combination of some of the n! permutation matrices. Once P is replaced with its decomposition, we can view P y as a convex combination of permutations of y. Thus if x = P y, then x can be written as a convex combination of these permutations of y; that is, x lies in the convex hull of the n! permutations of y.

Fan Dominance Principle. This principle uses Ky Fan's $(k)$-norm $\|\cdot\|_{(k)}$, defined as
$$\|A\|_{(k)} = \sum_{i=1}^{k} \sigma_i(A), \qquad k = 1, \ldots, n,$$
where $\sigma_1(A) \geq \cdots \geq \sigma_n(A)$ are the singular values of $A$. Each $(k)$-norm is a special case of a unitarily invariant norm, meaning that $\|A\| = \|UAV\|$ for any unitary $U$ and $V$. Two special cases are worth noting: if $k = 1$, this is the spectral norm, the maximum singular value; if $k = n$, it is the trace norm, also called the nuclear norm, the sum of all the singular values. The Fan dominance principle, stated below, provides a useful way of proving inequalities for unitarily invariant norms, since it shows that it suffices to analyze the Ky Fan norms.
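Since the $(k)$-norms are determined by the singular values, they are straightforward to compute; a brief sketch using NumPy (the helper name `ky_fan` is my own):

```python
import numpy as np

def ky_fan(A, k):
    """Ky Fan (k)-norm: the sum of the k largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, decreasing order
    return float(s[:k].sum())

# For a diagonal matrix the singular values are the |diagonal entries|
A = np.diag([3.0, 2.0, 1.0])
print(ky_fan(A, 1))   # 3.0 -- spectral norm (largest singular value)
print(ky_fan(A, 3))   # 6.0 -- trace / nuclear norm (sum of all singular values)
```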

Theorem 11. Let $A, B \in \mathbb{C}^{n \times n}$. Then $\|A\| \leq \|B\|$ for all unitarily invariant norms if and only if $\|A\|_{(k)} \leq \|B\|_{(k)}$ for all $k = 1, \ldots, n$.

The proof of this theorem, although omitted here, relies heavily on doubly stochastic matrices and the Birkhoff-von Neumann theorem. First, one considers the equivalent statement of the theorem in terms of vectors and gauge functions. Let the vectors $x$ and $y$ contain the singular values of $A$ and $B$, respectively. Then the condition $\|A\|_{(k)} \leq \|B\|_{(k)}$ for all $k = 1, \ldots, n$ implies that $y$ weakly majorizes $x$ (it does not majorize, since the total sums are not necessarily equal). From there an analog of Theorem 9 shows that $x$ and $y$ are related by a doubly stochastic matrix $P$. Then $P$ is replaced with its Birkhoff-von Neumann decomposition, and the rest of the proof of Theorem 11 falls into place [9].

4. Conclusion

The Birkhoff-von Neumann decomposition theorem is a celebrated result with applications in a wide range of subjects. Although the theorem applies only to doubly stochastic matrices, it finds use everywhere from graph theory to singular value inequalities. The theorem is so useful because it can be applied in a variety of ways, a flexibility highlighted by the fact that there are two equivalent, yet quite different, statements of the theorem.

References

[1] Bhatia, Rajendra. Matrix Analysis. Graduate Texts in Mathematics, vol. 169, Springer, 1997.

[2] Brualdi, Richard A. Some Applications of Doubly Stochastic Matrices. Elsevier Science Publishing Co., 1988.

[3] Budish, Eric, et al. Implementing Random Assignments: A Generalization of the Birkhoff-von Neumann Theorem. 2008.

[4] Horn, Roger A., and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[5] Li, Chi-Kwong, and Roy Mathias. Generalizations of Ky Fan's Dominance Theorem. SIAM Journal on Matrix Analysis and Applications, 1997.

[6] Marshall, Albert W., et al. Inequalities: Theory of Majorization and Its Applications. 2nd ed., Springer, 2011.

[7] Mirsky, Leon. A Trace Inequality of John von Neumann. Monatshefte für Mathematik, vol. 79, no. 4, Dec. 1975, pp. 303–306.

[8] Mirsky, Leon. On the Trace of Matrix Products. Mathematische Nachrichten, vol. 20, 1959.

[9] Stewart, G. W., and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, 1990.

APPENDICES

Appendix A: Additional Proofs. Proof of Lemma 3. Suppose that $A \neq I$ and that every permutation $\pi$ of $\{1, \ldots, n\}$ other than the identity permutation $\pi_0$ has the property that $a_{1\pi(1)} \cdots a_{n\pi(n)} = 0$. This assumption allows us to compute the characteristic polynomial of $A$:

$$p_A(t) = \det(tI - A) = \prod_{i=1}^{n}(t - a_{ii}) + \sum_{\pi \neq \pi_0} \operatorname{sgn}(\pi) \prod_{i=1}^{n} \bigl(-a_{i\pi(i)}\bigr) = \prod_{i=1}^{n}(t - a_{ii}).$$

It follows that the main diagonal entries of $A$ are its eigenvalues. Since $+1$ is an eigenvalue of $A$, at least one of its main diagonal entries is $+1$. Hence $A$ is permutation similar to $[1] \oplus B$, where $B \in \mathbb{C}^{(n-1) \times (n-1)}$ is doubly stochastic. The main diagonal entries of $B$ are obtained from the main diagonal entries of $A$ by omitting one $+1$ entry. Since $B$ is doubly stochastic as well, $+1$ is also an eigenvalue of $B$, and the characteristic polynomial of $B$ is

$$p_B(t) = \frac{p_A(t)}{t - 1} = \prod_{i=1}^{n-1}(t - b_{ii}).$$

Applying to $B$ the same argument we just used for $A$ shows that some $b_{ii} = 1$, which implies that at least two main diagonal entries of $A$ are $+1$. Continuing in this way, after at most $n - 1$ steps we conclude that every main diagonal entry of $A$ is $+1$, so $A = I$.

This contradicts our assumption that $A \neq I$. Thus some product $a_{1\pi(1)} \cdots a_{n\pi(n)}$ must be positive, and we have proved the lemma. $\square$

Appendix B: Examples. Example 1 continued from page 5.
$$A = \begin{pmatrix} 1/3 & 2/3 & 0 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 0 & 2/3 \end{pmatrix}$$

Let $A$ be defined as the matrix above. It can easily be verified that $A$ is doubly stochastic, as each row and column sums to 1. Let $p_1 = (2, 3, 1)$ be a permutation of $(1, 2, 3)$ such that $a_{12} \cdot a_{23} \cdot a_{31} = \frac{2}{3} \cdot \frac{1}{3} \cdot \frac{1}{3} > 0$. Note that this permutation is not unique. Now let

  0 1 0       P1 = 0 0 1       1 0 0

be the permutation matrix that corresponds to $p_1$. Now let $c_1 = \min\{\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\} = \frac{1}{3}$, and we can perform the first step of the decomposition, writing $A$ as

    0 1 0 1/3 1/3 0         1     A = c1P1 + R1 = 0 0 1 + 1/3 1/3 0  3             1 0 0 0 0 2/3

where we define $R_1$, the second matrix, as the remainder. Following the steps in our earlier proof, we know that $\frac{R_1}{1 - c_1}$ should be doubly stochastic; for ease, let us denote this new matrix by $B$. Evaluating this, we get
$$B = \frac{R_1}{1 - c_1} = \frac{R_1}{1 - 1/3} = \frac{R_1}{2/3} = \frac{3}{2} R_1 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

We can easily verify that $B$ is doubly stochastic. However, to keep the expression equal to $A$, we must multiply $B$ by $\frac{2}{3}$, since $R_1 = \frac{2}{3} B$. Our decomposition now looks like the following:

    0 1 0 1/2 1/2 0         1 2 1   2   A = P1 + B = 0 0 1 + 1/2 1/2 0 . 3 3 3   3           1 0 0 0 0 1

We now decompose $B$ in the same fashion, since it is doubly stochastic. Let $p_2 = (2, 1, 3)$ be a new permutation of $(1, 2, 3)$, for which $b_{12} \cdot b_{21} \cdot b_{33} = \frac{1}{2} \cdot \frac{1}{2} \cdot 1 > 0$. Let

  0 1 0       P2 = 1 0 0       0 0 1

be the permutation matrix corresponding to $p_2$, and let $c_2 = \min\{\frac{1}{2}, \frac{1}{2}, 1\} = \frac{1}{2}$. Now we can decompose $B$ as
$$B = c_2 P_2 + R_2 = \frac{1}{2}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{pmatrix},$$
where $R_2$ is again the second matrix, our remainder. Note that the decomposition of the original matrix $A$ currently looks like

      0 1 0 0 1 0 1/2 0 0             ! 1   2 1     A = 0 0 1 + 1 0 0 +  0 1/2 0  . 3   3 2                 1 0 0 0 0 1 0 0 1/2

Continuing, we know that $\frac{R_2}{1 - c_2}$ should be doubly stochastic; again for ease, let us denote this matrix by $B_2$. Evaluating this, we get the following matrix.

  1 0 0     R2 R2 R2   B2 = = = = 2R2 = 0 1 0 . 1 − c 1 − 1/2 1/2   2     0 0 1

Clearly $B_2$ is doubly stochastic; in fact it is the identity, itself a permutation matrix. If we were to attempt another round of decomposition, we would have $c_3 = 1$ and the remainder $R_3$ would be zero, which signals the end of the process. Now we can combine and simplify our expressions to get the final decomposition.

      0 1 0 0 1 0 1 0 0             ! 1   2 1   1   A = 0 0 1 + 1 0 0 + 0 1 0 3   3 2   2               1 0 0 0 0 1 0 0 1       0 1 0 0 1 0 1 0 0             1   1   1   = 0 0 1 + 1 0 0 + 0 1 0 . 3   3   3               1 0 0 0 0 1 0 0 1

The coefficients in the final decomposition are each $\frac{1}{3}$, and we can verify that $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$, as it should be according to the theorem. In addition, the three matrices in the final decomposition are clearly permutation matrices. Thus we have computed a Birkhoff-von Neumann decomposition of the doubly stochastic matrix $A$.
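The procedure carried out by hand above can be sketched in code. The following is a minimal greedy implementation (my own sketch, not from the thesis) that brute-forces the positive permutation over all $n!$ candidates, which is fine for small $n$; at scale one would instead find each permutation via a bipartite matching. Because the choice of permutation at each step is not unique, the permutations produced may differ from those in the example, but the coefficients still sum to 1 and the terms reconstruct $A$.

```python
from itertools import permutations

def birkhoff_decompose(A, tol=1e-12):
    """Greedy Birkhoff-von Neumann decomposition of a doubly stochastic matrix.

    Returns a list of (coefficient, permutation) pairs, where permutation p
    means row i is matched to column p[i]. The coefficients sum to 1.
    """
    n = len(A)
    R = [row[:] for row in A]          # working remainder, initially A itself
    terms = []
    while True:
        # Find a permutation whose entries are all positive in the remainder
        perm = next((p for p in permutations(range(n))
                     if all(R[i][p[i]] > tol for i in range(n))), None)
        if perm is None:
            break                      # remainder is (numerically) zero
        c = min(R[i][perm[i]] for i in range(n))
        terms.append((c, perm))
        for i in range(n):             # subtract c times the permutation matrix
            R[i][perm[i]] -= c
    return terms

A = [[1/3, 2/3, 0.0],
     [1/3, 1/3, 1/3],
     [1/3, 0.0, 2/3]]
terms = birkhoff_decompose(A)
print(sum(c for c, _ in terms))        # the coefficients sum to 1 (up to rounding)
```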