
Graphs, Principal Minors, and Eigenvalue Problems by John C. Urschel. Submitted to the Department of Mathematics in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, September 2021. © John C. Urschel, MMXXI. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Mathematics, August 6, 2021

Certified by: Michel X. Goemans, RSA Professor of Mathematics and Department Head, Thesis Supervisor

Accepted by: Jonathan Kelner, Chairman, Department Committee on Graduate Theses

Graphs, Principal Minors, and Eigenvalue Problems by John C. Urschel

Submitted to the Department of Mathematics on August 6, 2021, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics

Abstract

This thesis considers four independent topics within linear algebra: determinantal point processes, extremal problems in spectral graph theory, force-directed layouts, and eigenvalue algorithms. For determinantal point processes (DPPs), we consider the classes of symmetric and signed DPPs, respectively, and in both cases connect the problem of learning the parameters of a DPP to a related recovery problem. Next, we consider two conjectures in spectral graph theory regarding the spread of a graph, and resolve both. For force-directed layouts of graphs, we connect the layout of the boundary of a Tutte spring embedding to trace theorems from the theory of elliptic PDEs, and we provide a rigorous theoretical analysis of the popular Kamada-Kawai objective, proving hardness of approximation and structural results regarding optimal layouts, and providing a polynomial time randomized approximation scheme for low diameter graphs. Finally, we consider the Lanczos method for computing extremal eigenvalues of a symmetric matrix and produce new error estimates for this algorithm.

Thesis Supervisor: Michel X. Goemans Title: RSA Professor of Mathematics and Department Head

Acknowledgments

I would like to thank my advisor, Michel Goemans. From the moment I met him, I was impressed and inspired by the way he thinks about mathematics. Despite his busy schedule as head of the math department, he always made time when I needed him. He has been the most supportive advisor a student could possibly ask for. He’s the type of mathematician I aspire to be. I would also like to thank the other members of my thesis committee, Alan Edelman and Philippe Rigollet. They have been generous with their time and their mathematical knowledge. In my later years as a PhD candidate, Alan has served as a secondary advisor of sorts, and I’m thankful for our weekly Saturday meetings.

I’ve also been lucky to work with a number of great collaborators during my time at MIT. This thesis would not have been possible without my mathematical collaborations and conversations with Jane Breen, Victor-Emmanuel Brunel, Erik Demaine, Adam Hesterberg, Fred Koehler, Jayson Lynch, Ankur Moitra, Alex Riasanovsky, Michael Tait, and Ludmil Zikatanov. And, though the associated work does not appear in this thesis, I am also grateful for collaborations with Xiaozhe Hu, Dhruv Rohatgi, and Jake Wellens during my PhD. I’ve been the beneficiary of a good deal of advice, mentorship, and support from the larger mathematics community. This group is too large to list in full, but I’m especially grateful to Rediet Abebe, Michael Brenner, David Mordecai, Jelani Nelson, Peter Sarnak, Steven Strogatz, and Alex Townsend, among many others, for always being willing to give me some much needed advice during my time at MIT.

I would like to thank my office mates, Fred Koehler and Jake Wellens (and Megan Fu), for making my time in office 2-342 so enjoyable. With Jake’s wall mural and very large and dusty rug, our office felt lived in and was a comfortable place to do math. I’ve had countless interesting conversations with fellow grad students, including Michael Cohen, Younhun Kim, Mehtaab Sawhney, and Jonathan Tidor, to name a few. My interactions with MIT undergrads constantly reminded me just how fulfilling and rewarding mentorship can be. I’m thankful to all the support staff members at MIT math, and have particularly fond memories of grabbing drinks with André at the Muddy, constantly bothering Michele with

whatever random MIT problem I could not solve, and making frequent visits to Rosalee for more espresso pods. More generally, I have to thank the entire math department for fostering such a warm and welcoming environment to do math. MIT math really does feel like home. I’d be remiss if I did not thank my friends and family for their non-academic contributions. I have to thank Ty Howle for supporting me every step of the way, and trying to claim responsibility for nearly all of my mathematical work. This is a hard claim to refute, as it was often only once I took a step back and looked at things from a different perspective that breakthroughs were made. I’d like to thank Robert Hess for constantly encouraging and believing in me, despite knowing absolutely no math. I’d like to thank my father and mother for instilling a love of math in me from a young age. My fondest childhood memories of math were of going through puzzle books at the kitchen table with my mother, and leafing through the math section in the local Chapters in St. Catharines while my father worked in the cafe. I’d like to thank Louisa for putting up with me, moving with me (eight times during this PhD), and proofreading nearly all of my academic papers. I would thank Joanna, but, frankly, she was of little to no help. This research was supported in part by ONR Research Contract N00014-17-1-2177. During my time at MIT, I was previously supported by a Dean of Science Fellowship and am currently supported by a MathWorks Fellowship, both of which have allowed me to place a greater emphasis on research, and without which this thesis, in its current form, would not have been possible.

For John and Venita

Contents

1 Introduction

2 Determinantal Point Processes and Principal Minor Assignment Problems
   2.1 Introduction
      2.1.1 Symmetric Matrices
      2.1.2 Magnitude-Symmetric Matrices
   2.2 Learning Symmetric Determinantal Point Processes
      2.2.1 An Associated Principal Minor Assignment Problem
      2.2.2 Definition of the Estimator
      2.2.3 Information Theoretic Lower Bounds
      2.2.4 Algorithmic Aspects and Experiments
   2.3 Recovering a Magnitude-Symmetric Matrix from its Principal Minors
      2.3.1 Principal Minors and Magnitude-Symmetric Matrices
      2.3.2 Efficiently Recovering a Matrix from its Principal Minors
      2.3.3 An Algorithm for Principal Minors with Noise

3 The Spread and Bipartite Spread Conjecture
   3.1 Introduction
   3.2 The Bipartite Spread Conjecture
   3.3 A Sketch for the Spread Conjecture
      3.3.1 Properties of Spread-Extremal Graphs

4 Force-Directed Layouts
   4.1 Introduction
   4.2 Tutte’s Spring Embedding and a Trace Theorem
      4.2.1 Spring Embeddings and a Schur Complement
      4.2.2 A Discrete Trace Theorem and Spectral Equivalence
   4.3 The Kamada-Kawai Objective and Optimal Layouts
      4.3.1 Structural Results for Optimal Embeddings
      4.3.2 Algorithmic Lower Bounds
      4.3.3 An Approximation Algorithm

5 Error Estimates for the Lanczos Method
   5.1 Introduction
      5.1.1 Related Work
      5.1.2 Contributions and Remainder of Chapter
   5.2 Preliminary Results
   5.3 Asymptotic Lower Bounds and Improved Upper Bounds
   5.4 Distribution Dependent Bounds
   5.5 Estimates for Arbitrary Eigenvalues and Condition Number
   5.6 Experimental Results

Chapter 1

Introduction

In this thesis, we consider a number of different mathematical topics in linear algebra, with a focus on determinantal point processes and eigenvalue problems. Below we provide a brief non-technical summary of each topic in this thesis, as well as a short summary of other interesting projects that I had the pleasure of working on during my PhD that do not appear in this thesis.

Determinantal Point Processes and Principal Minor Assignment Problems

A determinantal point process is a probability distribution over the subsets of a finite ground set where the distribution is characterized by the principal minors of some fixed matrix. Given a finite set, say, $[N] = \{1, \ldots, N\}$ where $N$ is a positive integer, a DPP is a random subset $Y \subseteq [N]$ such that $P(J \subseteq Y) = \det(K_J)$ for all fixed $J \subseteq [N]$, where $K \in \mathbb{R}^{N \times N}$ is a given matrix called the kernel of the DPP and $K_J = (K_{i,j})_{i,j \in J}$ is the principal submatrix of $K$ associated with the set $J$. In Chapter 2, we focus on DPPs with kernels that have two different types of structure. First, we restrict ourselves to DPPs with symmetric kernels, and produce an algorithm that learns the kernel of a DPP from its samples. Then, we consider the slightly broader class of DPPs with kernels whose corresponding off-diagonal entries are equal in magnitude, i.e., $K$ satisfies $|K_{i,j}| = |K_{j,i}|$ for all $i, j \in [N]$, and produce an algorithm to recover such a matrix $K$ from (possibly perturbed versions of) its principal minors. Sections 2.2 and 2.3 are written so that they may be read

independently of each other. This chapter is joint work with Victor-Emmanuel Brunel, Michel Goemans, Ankur Moitra, and Philippe Rigollet, and is based on [119, 15].

The Spread and Bipartite Spread Conjecture

Given a graph $G = ([n], E)$, $[n] = \{1, \ldots, n\}$, the adjacency matrix $A \in \mathbb{R}^{n \times n}$ of $G$ is defined as the matrix with $A_{i,j} = 1$ if $\{i,j\} \in E$ and $A_{i,j} = 0$ otherwise. $A$ is a real symmetric matrix, and so has real eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. One interesting quantity is the difference between the extreme eigenvalues, $\lambda_1 - \lambda_n$, commonly referred to as the spread of the graph $G$. In [48], the authors proposed two conjectures. First, the authors conjectured that if a graph $G$ of size $|E| \le n^2/4$ maximizes the spread over all graphs of order $n$ and size $|E|$, then the graph is bipartite. The authors also conjectured that if a graph $G$ maximizes the spread over all graphs of order $n$, then the graph is the join (i.e., with all edges in between) $K_{\lfloor 2n/3 \rfloor} \vee \overline{K}_{\lceil n/3 \rceil}$ of a clique of order $\lfloor 2n/3 \rfloor$ and an independent set of order $\lceil n/3 \rceil$. We refer to these conjectures as the bipartite spread and spread conjectures, respectively. In Chapter 3, we consider both of these conjectures. We provide an infinite class of counterexamples to the bipartite spread conjecture, and prove an asymptotic version of the bipartite spread conjecture that is as tight as possible. In addition, for the spread conjecture, we make a number of observations regarding any spread-optimal graph, and use these observations to sketch a proof of the spread conjecture for all $n$ sufficiently large. The full proof of the spread conjecture for all $n$ sufficiently large is rather long and technical, and can be found in [12]. This chapter is joint work with Jane Breen, Alex Riasanovsky, and Michael Tait, and is based on [12].
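To make these objects concrete, here is a small numerical sketch (my own illustration, not from the thesis; it assumes numpy and networkx are available) that computes the spread of a graph and evaluates it on the conjectured extremal join of a clique and an independent set.

```python
# A small illustration (not thesis code): compute the spread lambda_1 - lambda_n
# of a graph, and evaluate it on the conjectured extremal graph, the join of a
# clique on floor(2n/3) vertices with an independent set on the remaining ones.
import numpy as np
import networkx as nx

def spread(G):
    eigs = np.linalg.eigvalsh(nx.to_numpy_array(G))  # sorted ascending
    return eigs[-1] - eigs[0]

def clique_join_independent(k, m):
    # K_k joined with an empty graph on m vertices (all edges in between).
    G = nx.complete_graph(k)
    G.add_nodes_from(range(k, k + m))
    G.add_edges_from((i, j) for i in range(k) for j in range(k, k + m))
    return G

n = 12
G = clique_join_independent((2 * n) // 3, n - (2 * n) // 3)
print(spread(G))  # spread of the conjectured extremal graph on n vertices
```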

Force-Directed Layouts

A force-directed layout, broadly speaking, is a technique for drawing a graph in a low-dimensional Euclidean space (usually dimension ≤ 3) by applying “forces” between the set of vertices and/or edges. In a force-directed layout, vertices connected by an edge (or at a small graph distance from each other) tend to be close to each other in the resulting layout. Two well-known examples of a force-directed layout are Tutte’s spring embedding and metric multidimensional scaling. In his 1963 work titled “How to Draw a Graph,” Tutte found an elegant technique to produce planar embeddings of planar graphs that also minimize the sum of squared edge lengths in some sense. In particular, for a three-connected planar graph, he showed that if the outer face of the graph is fixed as the complement of some convex region in the plane, and every other point is located at the mass center of its neighbors, then the resulting embedding is planar. This result is now known as Tutte’s spring embedding theorem, and is considered by many to be the first example of a force-directed layout [117]. One of the major questions that this result does not treat is how best to embed the outer face. In Chapter 4, we investigate this question, consider connections to a Schur complement, and provide some theoretical results for this Schur complement using a discrete energy-only trace theorem. In addition, we also consider the later-proposed Kamada-Kawai objective, which can provide a force-directed layout of an arbitrary graph. In particular, we prove a number of structural results regarding layouts that minimize this objective, provide algorithmic lower bounds for the optimization problem, and propose a polynomial time approximation scheme for drawing low diameter graphs. Sections 4.2 and 4.3 are written so that they may be read independently of each other. This chapter is joint work with Erik Demaine, Adam Hesterberg, Fred Koehler, Jayson Lynch, and Ludmil Zikatanov, and is based on [124, 31].
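The computation behind Tutte’s theorem reduces to a linear system; the following minimal sketch (my own, with numpy and networkx assumed; the helper name tutte_layout is hypothetical) pins an outer face to a convex polygon and places every interior vertex at the average of its neighbors.

```python
# A minimal sketch (not the thesis's code) of a Tutte spring embedding:
# pin an outer face to a regular polygon, then solve the linear system
# placing each interior vertex at the mass center of its neighbors.
import numpy as np
import networkx as nx

def tutte_layout(G, outer_cycle):
    k = len(outer_cycle)
    pos = {v: np.array([np.cos(2 * np.pi * t / k), np.sin(2 * np.pi * t / k)])
           for t, v in enumerate(outer_cycle)}
    interior = [v for v in G if v not in pos]
    idx = {v: i for i, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    b = np.zeros((len(interior), 2))
    for v in interior:
        A[idx[v], idx[v]] = G.degree(v)
        for u in G[v]:
            if u in idx:
                A[idx[v], idx[u]] -= 1.0
            else:
                b[idx[v]] += pos[u]          # contribution of pinned neighbors
    xy = np.linalg.solve(A, b)               # interior Laplacian system
    pos.update({v: xy[idx[v]] for v in interior})
    return pos

# K4 is 3-connected and planar; pinning the face {0, 1, 2} sends vertex 3
# to the barycenter of the outer triangle, i.e., approximately (0, 0).
print(tutte_layout(nx.complete_graph(4), outer_cycle=[0, 1, 2])[3])
```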

Error Estimates for the Lanczos Method

The Lanczos method is one of the most powerful and fundamental techniques for solving an extremal symmetric eigenvalue problem. Convergence-based error estimates depend heavily on the eigenvalue gap, but in practice, this gap is often relatively small, resulting in significant overestimates of error. One way to avoid this issue is through the use of uniform error estimates, namely, bounds that depend only on the dimension of the matrix and the number of iterations. In Chapter 5, we prove upper uniform error estimates for the Lanczos method, and provide a number of lower bounds through the use of orthogonal polynomials. In addition, we prove more specific results for matrices that possess some level of eigenvalue regularity or whose eigenvalues converge to some limiting empirical spectral

distribution. Through numerical experiments, we show that the theoretical estimates of this chapter do apply to practical computations for reasonably sized matrices. This chapter has no collaborators, and is based on [122].
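For orientation, here is a bare-bones Lanczos sketch (my own illustration, not thesis code; full reorthogonalization is used for clarity rather than efficiency): after $m$ iterations the largest Ritz value of the tridiagonal matrix approximates the largest eigenvalue.

```python
# A bare-bones Lanczos sketch (my own): the largest eigenvalue of the m x m
# tridiagonal matrix T approximates the largest eigenvalue of A.
import numpy as np

def lanczos_max_ritz(A, m, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, m))
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    T = np.zeros((m, m))
    for j in range(m):
        w = A @ Q[:, j]
        T[j, j] = Q[:, j] @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        if j + 1 < m:
            beta = np.linalg.norm(w)
            T[j, j + 1] = T[j + 1, j] = beta
            Q[:, j + 1] = w / beta
    return np.linalg.eigvalsh(T).max()

# Uniformly spaced spectrum on [0, 1]: 25 iterations on a 500 x 500 matrix
# already give a Ritz value close to the true extremal eigenvalue 1.
A = np.diag(np.linspace(0.0, 1.0, 500))
print(lanczos_max_ritz(A, m=25))
```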

Topics not in this Thesis

During my PhD, I have had the chance to work on a number of interesting topics that do not appear in this thesis. Below I briefly describe some of these projects. In [52] (joint with X. Hu and L. Zikatanov), we consider graph disaggregation, a technique to break high degree nodes of a graph into multiple smaller degree nodes, and prove a number of results regarding the spectral approximation of a graph by a disaggregated version of it. In [17] (joint with V.-E. Brunel, A. Moitra, and P. Rigollet), we analyze the curvature of the expected log-likelihood around its maximum and characterize when the maximum likelihood estimator converges at a parametric rate. In [120], we study centroidal Voronoi tessellations from a variational perspective and show that, for any density function, there does not exist a unique two-generator centroidal Voronoi tessellation for dimensions greater than one. In [121], we prove a generalization of Courant’s theorem for discrete graphs, namely, that for the $k^{th}$ eigenvalue of a generalized Laplacian of a discrete graph, there exists a set of corresponding eigenvectors such that each eigenvector can be decomposed into at most $k$ nodal domains, and that this set is of co-dimension zero with respect to the entire eigenspace. In [123] (joint with J. Wellens), we show that, given a graph with local crossing number either at most $k$ or at least $2k$, it is NP-complete to decide whether the local crossing number is at most $k$ or at least $2k$. In [97] (joint with D. Rohatgi and J. Wellens), we treat two conjectures: one, due to de Caen, Erdős, Pullman, and Wormald, regarding the maximum possible value of $cp(G) + cp(\overline{G})$, where $cp(G)$ is the minimum number of cliques needed to cover the edges of $G$ exactly once; the other, due to de Caen, Gregory, and Pritikin, regarding $bp_k(K_n)$, where $bp_k(G)$ is the minimum number of bicliques needed to cover each edge of $G$ exactly $k$ times. We disprove the first, obtaining improved lower and upper bounds on $\max_G cp(G) + cp(\overline{G})$, and we prove an asymptotic version of the second, showing that $bp_k(K_n) = (1 + o(1))n$. If any of the topics (very briefly) described here interest you, I encourage you to take a look at the associated paper. This thesis was constructed to center around a topic and to be of a manageable length, and I am no less proud of some of the results described here that do not appear in this thesis.

Chapter 2

Determinantal Point Processes and Principal Minor Assignment Problems

2.1 Introduction

A determinantal point process (DPP) is a probability distribution over the subsets of a finite ground set where the distribution is characterized by the principal minors of some fixed matrix. DPPs emerge naturally in many probabilistic setups such as random matrices and integrable systems [10]. They have also attracted interest in machine learning in the past years for their ability to model random choices while being tractable and mathematically elegant [71]. Given a finite set, say, $[N] = \{1, \ldots, N\}$ where $N$ is a positive integer, a DPP is a random subset $Y \subseteq [N]$ such that $P(J \subseteq Y) = \det(K_J)$ for all fixed $J \subseteq [N]$, where $K \in \mathbb{R}^{N \times N}$ is a given matrix called the kernel of the DPP and $K_J = (K_{i,j})_{i,j \in J}$ is the principal submatrix of $K$ associated with the set $J$. Assumptions on $K$ that yield the existence of a DPP can be found in [14], and it is easily seen that uniqueness of the DPP is then automatically guaranteed. For instance, if $I - K$ is invertible ($I$ being the identity matrix), then $K$ is the kernel of a DPP if and only if $K(I-K)^{-1}$ is a $P_0$-matrix, i.e., all its principal minors are nonnegative. We refer to [54] for further definitions and properties of $P_0$-matrices. Note that in that case, the DPP is also an $L$-ensemble, with probability mass function given by $P(Y = J) = \det(L_J)/\det(I + L)$ for all $J \subseteq [N]$, where $L = K(I-K)^{-1}$. DPPs with a symmetric kernel have attracted a lot of interest in machine learning because they satisfy a property called negative association, which models repulsive interactions between items [8]. Following the seminal work of Kulesza and Taskar [72], discrete symmetric DPPs have found numerous applications in machine learning, including in document and timeline summarization [76, 132], image search [70, 1] and segmentation [73], audio signal processing [131], bioinformatics [6] and neuroscience [106]. What makes such models appealing is that they exhibit repulsive behavior and lend themselves naturally to tasks where returning a diverse set of objects is important. For instance, when applied to recommender systems, DPPs enforce diversity in the items within one basket [45]. From a statistical point of view, DPPs raise two essential questions. First, what are the families of kernels that give rise to one and the same DPP? Second, given observations of independent copies of a DPP, how can one recover its kernel (which, as foreseen by the first question, is not necessarily unique)? These two questions are directly related to the principal minor assignment problem (PMA). Given a class of matrices: (1) describe the set of all matrices in this class that have a prescribed list of principal minors (this set may be empty); (2) find one such matrix. While the first task is theoretical in nature, the second one is algorithmic and should be solved using as few queries of the prescribed principal minors as possible. In this work, we focus on the question of recovering a kernel from observations, which is closely related to the problem of recovering a matrix with given principal minors. In this chapter, we focus on symmetric DPPs (which, when clear from context, we simply refer to as DPPs) and signed DPPs, a slightly larger class of kernels that only requires corresponding off-diagonal entries to be symmetric in magnitude.
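As a sanity check of these definitions, the following brute-force computation (my own illustration, assuming numpy; feasible only for very small $N$) verifies that the $L$-ensemble atom probabilities $\det(L_J)/\det(I+L)$ indeed have marginals $P(J \subseteq Y) = \det(K_J)$.

```python
# A brute-force check (my own, small N only): with L = K(I - K)^{-1}, the
# atom probabilities det(L_S)/det(I + L) sum to the marginal det(K_J).
import itertools
import numpy as np

N = 4
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
K = Q @ np.diag(rng.uniform(0.1, 0.9, N)) @ Q.T   # symmetric, 0 < K < I

L = K @ np.linalg.inv(np.eye(N) - K)
Z = np.linalg.det(np.eye(N) + L)

def atom(S):  # P(Y = S); the determinant of the empty submatrix is 1
    S = list(S)
    return np.linalg.det(L[np.ix_(S, S)]) / Z

J = [0, 2]
marginal = sum(atom(S) for r in range(N + 1)
               for S in itertools.combinations(range(N), r) if set(J) <= set(S))
print(np.isclose(marginal, np.linalg.det(K[np.ix_(J, J)])))  # True
```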

2.1.1 Symmetric Matrices

In the symmetric case, there are fast algorithms for sampling (or approximately sampling) from a DPP [32, 94, 74, 75]. Marginalizing the distribution on a subset 퐼 ⊆ [푁] and conditioning on the event that 퐽 ⊆ 푌 both result in new DPPs and closed form expressions for their kernels are known [11]. All of this work pertains to how to use a DPP once we

have learned its parameters. However, there has been much less work on the problem of learning the parameters of a symmetric DPP. A variety of heuristics have been proposed, including Expectation-Maximization [46], MCMC [1], and fixed point algorithms [80]. All of these attempt to solve a non-convex optimization problem, and no guarantees on their statistical performance are known. Recently, Brunel et al. [16] studied the rate of estimation achieved by the maximum likelihood estimator, but the question of efficient computation remains open. Apart from positive results on sampling, marginalization and conditioning, most provable results about DPPs are actually negative. It is conjectured that the maximum likelihood estimator is NP-hard to compute [69]. Moreover, approximating the mode of size $k$ of a DPP to within a $c^k$ factor is known to be NP-hard for some $c > 1$ [22, 108]. The best known algorithms currently obtain an $e^{k + o(k)}$ approximation factor [88, 89].

In Section 2.2, we bypass the difficulties associated with maximum likelihood estimation by using the method of moments to achieve optimal sample complexity. In the setting of DPPs, the method of moments has a close connection to a principal minor assignment problem, which asks, given some set of principal minors of a matrix, to recover the original matrix, up to some equivalence class. We introduce a parameter ℓ, called the cycle sparsity of the graph induced by the kernel 퐾, which governs the number of moments that need to be considered and, thus, the sample complexity. The cycle sparsity of a graph is the smallest integer ℓ so that the cycles of length at most ℓ yield a basis for the cycle space of the graph. We use a refined version of Horton’s algorithm [51, 2] to implement the method of moments in polynomial time. Even though there are in general exponentially many cycles in a graph to consider, Horton’s algorithm constructs a minimum weight cycle basis and, in doing so, also reveals the parameter ℓ together with a collection of at most ℓ induced cycles spanning the cycle space. We use such cycles in order to construct our method of moments estimator.
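For intuition, the cycle sparsity is easy to compute with off-the-shelf tools: since a cycle basis of minimum total length is a shortest maximal cycle basis [21], one can take the longest cycle in a minimum weight cycle basis. A small sketch of this (my own, assuming networkx's minimum_cycle_basis routine) follows.

```python
# A small sketch (my own; assumes networkx) computing the cycle sparsity as
# the longest cycle in a minimum weight cycle basis, which is a shortest
# maximal cycle basis [21].
import networkx as nx

def cycle_sparsity(G):
    basis = nx.minimum_cycle_basis(G)               # cycles as vertex lists
    return max((len(c) for c in basis), default=2)  # convention: 2 if acyclic

print(cycle_sparsity(nx.cycle_graph(7)))     # 7: a single cycle of length 7
print(cycle_sparsity(nx.complete_graph(6)))  # 3: triangles span the cycle space
```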

For any fixed $\ell \ge 2$ and kernel $K$ satisfying either $|K_{i,j}| \ge \alpha$ or $K_{i,j} = 0$ for all $i, j \in [N]$, our algorithm has sample complexity

$$n = O\left(\left(\frac{C}{\alpha}\right)^{2\ell} + \frac{\log N}{\alpha^2 \varepsilon^2}\right)$$

for some constant $C > 1$, runs in time polynomial in $n$ and $N$, and learns the parameters up to an additive $\varepsilon$ with high probability. The $(C/\alpha)^{2\ell}$ term corresponds to the number of samples needed to recover the signs of the entries of $K$. We complement this result with a minimax lower bound (Theorem 2) showing that this sample complexity is in fact near optimal. In particular, we show that there is an infinite family of graphs with cycle sparsity $\ell$ (namely, cycles of length $\ell$) on which any algorithm requires at least $(C'\alpha)^{-2\ell}$ samples to recover the signs of the entries of $K$, for some constant $C' > 1$. We also provide experimental results that confirm many quantitative aspects of our theoretical predictions. Together, our upper bounds, lower bounds, and experiments present a nuanced understanding of which symmetric DPPs can be learned provably and efficiently.

2.1.2 Magnitude-Symmetric Matrices

Recently, DPPs with non-symmetric kernels have gained interest in the machine learning community [14, 44, 3, 93]. However, for these classes, questions of recovery are significantly harder in general. In [14], Brunel proposes DPPs with kernels that are symmetric in magnitude, i.e., $|K_{i,j}| = |K_{j,i}|$ for all $i, j = 1, \ldots, N$. This class is referred to as signed DPPs. A signed DPP is a generalization of DPPs which allows for both attractive and repulsive behavior. Such matrices are relevant in machine learning applications, because they increase the modeling power of DPPs. An essential assumption made in [14] is that $K$ is dense, which simplifies the combinatorial analysis. We consider magnitude-symmetric matrices, but without any density assumption. The problem becomes significantly harder because it requires a fine analysis of the combinatorial properties of a graphical representation of the sparsity of the matrix and the products of corresponding off-diagonal entries. In Section 2.3, we treat both theoretical and algorithmic questions around recovering a magnitude-symmetric matrix from its principal minors. First, in Subsection 2.3.1, we show that, for a given magnitude-symmetric matrix $K$, the principal minors of order at most $\ell$, for some graph invariant $\ell$ depending only on principal minors of order one and two, uniquely determine the principal minors of all orders. Next, in Subsection 2.3.2, we describe an efficient algorithm that, given the principal minors of a magnitude-symmetric matrix, computes a matrix with those principal minors. This algorithm queries only $O(n^2)$ principal minors, all of a bounded order that depends solely on the sparsity of the matrix. Finally, in Subsection 2.3.3, we consider the question of recovery when principal minors are only known approximately, and construct an algorithm that, for magnitude-symmetric matrices that are sufficiently generic in a sense, recovers the matrix almost exactly. This algorithm immediately implies a procedure for learning a signed DPP from its samples, as basic probabilistic techniques (e.g., a union bound) can be used to show that with high probability all estimators are sufficiently close to the principal minors they approximate.

2.2 Learning Symmetric Determinantal Point Processes

Let $Y_1, \ldots, Y_n$ be $n$ independent copies of $Y \sim \mathrm{DPP}(K)$, for some unknown kernel $K$ such that $0 \preceq K \preceq I_N$. It is well known that $K$ is identified by $\mathrm{DPP}(K)$ only up to flips of the signs of its rows and columns: if $K'$ is another symmetric matrix with $0 \preceq K' \preceq I_N$, then $\mathrm{DPP}(K') = \mathrm{DPP}(K)$ if and only if $K' = DKD$ for some $D \in \mathcal{D}_N$, where $\mathcal{D}_N$ denotes the class of all $N \times N$ diagonal matrices with only $1$ and $-1$ on their diagonals [69, Theorem 4.1]. We call such a transform a $\mathcal{D}_N$-similarity of $K$. In view of this equivalence class, we define the following pseudo-distance between kernels $K$ and $K'$:
$$\rho(K, K') = \inf_{D \in \mathcal{D}_N} |DKD - K'|_\infty,$$
where for any matrix $K$, $|K|_\infty = \max_{i,j \in [N]} |K_{i,j}|$ denotes the entrywise sup-norm.

For any $S \subseteq [N]$, we write $\Delta_S = \det(K_S)$, where $K_S$ denotes the $|S| \times |S|$ submatrix of $K$ obtained by keeping rows and columns with indices in $S$. Note that for $1 \le i \ne j \le N$, we have the following relations:
$$K_{i,i} = \mathbb{P}[i \in Y], \qquad \Delta_{\{i,j\}} = \mathbb{P}[\{i,j\} \subseteq Y],$$
and $|K_{i,j}| = \sqrt{K_{i,i} K_{j,j} - \Delta_{\{i,j\}}}$. Therefore, the principal minors of size one and two of $K$ determine $K$ up to the signs of its off-diagonal entries. What remains is to compute the signs of the off-diagonal entries. In fact, for any $K$, there exists an $\ell$ depending only on the graph $G_K$ induced by $K$, such that $K$ can be recovered up to a $\mathcal{D}_N$-similarity with only the knowledge of its principal minors of size at most $\ell$. We will show that this $\ell$ is exactly the cycle sparsity.
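The two relations above already give a simple reconstruction of the diagonal and the off-diagonal magnitudes; a short sketch (my own illustration, using exact principal minors rather than estimates):

```python
# A short sketch (my own; exact minors, no sampling): recover K_{i,i} and
# |K_{i,j}| via K_{i,i} = Delta_{i} and
# |K_{i,j}| = sqrt(K_{i,i} K_{j,j} - Delta_{i,j}).
import numpy as np

K = np.array([[0.5,  0.3,  0.0],
              [0.3,  0.5, -0.2],
              [0.0, -0.2,  0.5]])   # a symmetric kernel
N = K.shape[0]

M = np.zeros((N, N))
for i in range(N):
    M[i, i] = np.linalg.det(K[np.ix_([i], [i])])         # Delta_{i}
for i in range(N):
    for j in range(i + 1, N):
        d_ij = np.linalg.det(K[np.ix_([i, j], [i, j])])  # Delta_{i,j}
        M[i, j] = M[j, i] = np.sqrt(M[i, i] * M[j, j] - d_ij)
print(M)  # diagonal of K and |K| off the diagonal; signs remain unknown
```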

2.2.1 An Associated Principal Minor Assignment Problem

In this subsection, we consider the following related principal minor assignment problem:

Given a symmetric matrix $K \in \mathbb{R}^{N \times N}$, what is the minimal $\ell$ such that $K$ can be recovered (up to $\mathcal{D}_N$-similarity) using principal minors $\Delta_S$ of size at most $\ell$?

This problem has a clear relation to learning DPPs: in our setting, we can approximate the principal minors of $K$ by empirical averages. However, the accuracy of our estimator deteriorates with the size of the principal minor, and we must therefore estimate the smallest possible principal minors in order to achieve optimal sample complexity. Answering the above question tells us how small the principal minors we consider can be. The relationship between the principal minors of $K$ and recovery of $\mathrm{DPP}(K)$ has also been considered elsewhere. There has been work regarding the symmetric principal minor assignment problem, namely the problem of computing a matrix given an oracle that gives any principal minor in constant time [96]. Here, we prove that the smallest $\ell$ such that all the principal minors of $K$ are uniquely determined by those of size at most $\ell$ is exactly the cycle sparsity of the graph induced by $K$. We begin by recalling some standard graph-theoretic notions. Let $G = ([N], E)$, $|E| = m$. A cycle $C$ of $G$ is any connected subgraph in which each vertex has even degree.

Each cycle $C$ is associated with an incidence vector $x \in GF(2)^m$ such that $x_e = 1$ if $e$ is an edge in $C$ and $x_e = 0$ otherwise. The cycle space $\mathcal{C}$ of $G$ is the subspace of $GF(2)^m$ spanned by the incidence vectors of the cycles in $G$. The dimension $\nu_G$ of the cycle space is called the cyclomatic number, and it is well known that $\nu_G = m - N + \kappa(G)$, where $\kappa(G)$ denotes the number of connected components of $G$. Recall that a simple cycle is a graph where every vertex has either degree two or zero and the set of vertices with degree two forms a connected set. A cycle basis is a basis of $\mathcal{C} \subseteq GF(2)^m$ such that every element is a simple cycle. It is well known that every cycle space has a cycle basis of induced cycles.
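As a quick illustration of these notions (my own check, assuming networkx), the cyclomatic number of the Petersen graph is $15 - 10 + 1 = 6$, matching the size of any cycle basis:

```python
# A quick check (my own; assumes networkx) of nu_G = m - N + kappa(G).
import networkx as nx

G = nx.petersen_graph()
nu = (G.number_of_edges() - G.number_of_nodes()
      + nx.number_connected_components(G))
print(nu, len(nx.minimum_cycle_basis(G)))  # 6 6
```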

Definition 1. The cycle sparsity of a graph 퐺 is the minimal ℓ for which 퐺 admits a cycle basis of induced cycles of length at most ℓ, with the convention that ℓ = 2 whenever the cycle space is empty. A corresponding cycle basis is called a shortest maximal cycle basis.

A shortest maximal cycle basis of the cycle space was also studied, for other reasons, by [21]. We defer a discussion of computing such a basis to a later subsection. For any subset $S \subseteq [N]$, denote by $G_K(S) = (S, E(S))$ the subgraph of $G_K$ induced by $S$. A matching of $G_K(S)$ is a subset $M \subseteq E(S)$ such that any two distinct edges in $M$ are not adjacent in $G_K(S)$. The set of vertices incident to some edge in $M$ is denoted by $V(M)$. We denote by $\mathcal{M}(S)$ the collection of all matchings of $G_K(S)$. Then, if $G_K(S)$ is an induced cycle, we can write the principal minor $\Delta_S = \det(K_S)$ as follows:

$$\Delta_S = \sum_{M \in \mathcal{M}(S)} (-1)^{|M|} \prod_{\{i,j\} \in M} K_{i,j}^2 \prod_{i \notin V(M)} K_{i,i} \;+\; 2 \times (-1)^{|S|+1} \prod_{\{i,j\} \in E(S)} K_{i,j}. \tag{2.1}$$
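As a sanity check on (2.1), the following brute-force computation (my own illustration, not thesis code) enumerates all matchings of an induced cycle and compares the right-hand side with the determinant:

```python
# A brute-force numerical check (my own) of identity (2.1) on an induced
# cycle: sum over matchings plus the two full cyclic permutations.
import itertools
import numpy as np

s = 5
rng = np.random.default_rng(1)
K = np.diag(rng.uniform(0.2, 0.8, s))
edges = [(i, (i + 1) % s) for i in range(s)]
for i, j in edges:
    K[i, j] = K[j, i] = rng.uniform(-0.5, 0.5)

total = 0.0
for r in range(len(edges) + 1):
    for M in itertools.combinations(edges, r):
        covered = [v for e in M for v in e]
        if len(covered) != len(set(covered)):
            continue                      # shares a vertex: not a matching
        term = (-1.0) ** len(M)
        term *= np.prod([K[i, j] ** 2 for i, j in M])
        term *= np.prod([K[i, i] for i in range(s) if i not in covered])
        total += term
total += 2.0 * (-1.0) ** (s + 1) * np.prod([K[i, j] for i, j in edges])
print(np.isclose(total, np.linalg.det(K)))  # True
```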

Proposition 1. Let $K \in \mathbb{R}^{N \times N}$ be a symmetric matrix, $G_K$ be the graph induced by $K$, and $\ell \ge 3$ be some integer. The kernel $K$ is completely determined up to $\mathcal{D}_N$-similarity by its principal minors of size at most $\ell$ if and only if the cycle sparsity of $G_K$ is at most $\ell$.

Proof. Note first that all the principal minors of $K$ completely determine $K$ up to a $\mathcal{D}_N$-similarity [96, Theorem 3.14]. Moreover, recall that principal minors of degree at most 2 determine the diagonal entries of $K$ as well as the magnitudes of its off-diagonal entries. In particular, given these principal minors, one only needs to recover the signs of the off-diagonal entries of $K$. Let the sign of a cycle $C$ in $K$ be the product of the signs of the entries of $K$ corresponding to the edges of $C$.

Suppose $G_K$ has cycle sparsity $\ell$ and let $(C_1, \ldots, C_\nu)$ be a cycle basis of $G_K$ where each $C_i$, $i \in [\nu]$, is an induced cycle of length at most $\ell$. By (2.1), the sign of any $C_i$, $i \in [\nu]$, is completely determined by the principal minor $\Delta_S$, where $S$ is the set of vertices of $C_i$ and is such that $|S| \le \ell$. Moreover, for $i \in [\nu]$, let $x_i \in GF(2)^m$ denote the incidence vector of $C_i$. By definition, the incidence vector $x$ of any cycle $C$ is given by $\sum_{i \in \mathcal{I}} x_i$ for some subset $\mathcal{I} \subseteq [\nu]$. The sign of $C$ is then given by the product of the signs of $C_i$, $i \in \mathcal{I}$, and thus by corresponding principal minors. In particular, the signs of all cycles are determined by the principal minors $\Delta_S$ with $|S| \le \ell$. In turn, by Theorem 3.12 in [96], the signs of all cycles completely determine $K$ up to a $\mathcal{D}_N$-similarity.

Next, suppose the cycle sparsity of $G_K$ is at least $\ell + 1$, and let $\mathcal{C}_\ell$ be the subspace of $GF(2)^m$ spanned by the induced cycles of length at most $\ell$ in $G_K$. Let $x_1, \ldots, x_\nu$ be a basis of $\mathcal{C}_\ell$ made of the incidence column vectors of induced cycles of length at most $\ell$ in $G_K$, and form the matrix $A \in GF(2)^{m \times \nu}$ by concatenating the $x_i$'s. Since $\mathcal{C}_\ell$ does not span the cycle space of $G_K$, $\nu < \nu_{G_K} \le m$. Hence, the rank of $A$ is less than $m$, so the null space of $A^\top$ is nontrivial. Let $\bar{x}$ be the incidence column vector of an induced cycle $\bar{C}$ that is not in $\mathcal{C}_\ell$, and let $h \in GF(2)^m$ with $A^\top h = 0$, $h \ne 0$, and $\bar{x}^\top h = 1$. These three conditions are compatible because $\bar{C} \notin \mathcal{C}_\ell$. We are now in a position to define an alternate kernel $K'$ as follows: let $K'_{i,i} = K_{i,i}$ and $|K'_{i,j}| = |K_{i,j}|$ for all $i, j \in [N]$. We define the signs of the off-diagonal entries of $K'$ as follows: for all edges $e = \{i,j\}$, $i \ne j$, $\operatorname{sgn}(K'_e) = \operatorname{sgn}(K_e)$ if $h_e = 0$, and $\operatorname{sgn}(K'_e) = -\operatorname{sgn}(K_e)$ otherwise. We now check that $K$ and $K'$ have the same principal minors of size at most $\ell$ but differ on a principal minor of size larger than $\ell$. To that end, let $x$ be the incidence vector of a cycle $C$ in $\mathcal{C}_\ell$, so that $x = Aw$ for some $w \in GF(2)^\nu$. Thus the sign of $C$ in $K$ is given by
$$\prod_{e \,:\, x_e = 1} K_e = (-1)^{x^\top h} \prod_{e \,:\, x_e = 1} K'_e = (-1)^{w^\top A^\top h} \prod_{e \,:\, x_e = 1} K'_e = \prod_{e \,:\, x_e = 1} K'_e,$$
because $A^\top h = 0$. Therefore, the sign of any $C \in \mathcal{C}_\ell$ is the same in $K$ and $K'$. Now, let $S \subseteq [N]$ with $|S| \le \ell$, and let $G = G_{K_S} = G_{K'_S}$ be the graph corresponding to $K_S$ (or, equivalently, to $K'_S$). For any induced cycle $C$ in $G$, $C$ is also an induced cycle in $G_K$ and its length is at most $\ell$. Hence, $C \in \mathcal{C}_\ell$ and the sign of $C$ is the same in $K$ and $K'$. By [96, Theorem 3.12], $\det(K_S) = \det(K'_S)$. Next, observe that the sign of $\bar{C}$ in $K$ is given by
$$\prod_{e \,:\, \bar{x}_e = 1} K_e = (-1)^{\bar{x}^\top h} \prod_{e \,:\, \bar{x}_e = 1} K'_e = -\prod_{e \,:\, \bar{x}_e = 1} K'_e.$$
Note also that since $\bar{C}$ is an induced cycle of $G_K = G_{K'}$, the above quantity is nonzero. Let $\bar{S}$ be the set of vertices in $\bar{C}$. By (2.1) and the above display, we have $\det(K_{\bar{S}}) \ne \det(K'_{\bar{S}})$. Together with [96, Theorem 3.14], this yields $K' \ne DKD$ for all $D \in \mathcal{D}_N$.

2.2.2 Definition of the Estimator

Our procedure is based on the previous result and can be summarized as follows. We first estimate the diagonal entries (i.e., the principal minors of size one) of $K$ by the method of moments. By the same method, we estimate the principal minors of size two of $K$, and we deduce estimates of the magnitudes of the off-diagonal entries. To use these estimates to deduce an estimate $\hat{G}$ of $G_K$, we make the following assumption on the kernel $K$.

Assumption 1. Fix $\alpha \in (0, 1)$. For all $1 \le i < j \le N$, either $K_{i,j} = 0$ or $|K_{i,j}| \ge \alpha$.

Finally, we find a shortest maximal cycle basis of $\hat{G}$, and we set the signs of our non-zero off-diagonal entry estimates by using estimators of the principal minors induced by the elements of the basis, again obtained by the method of moments. For $S \subseteq [N]$, set
$$\hat{\Delta}_S = \frac{1}{n} \sum_{p=1}^{n} \mathbb{1}_{S \subseteq Y_p},$$
and define
$$\hat{K}_{i,i} = \hat{\Delta}_{\{i\}} \qquad \text{and} \qquad \hat{B}_{i,j} = \hat{K}_{i,i}\hat{K}_{j,j} - \hat{\Delta}_{\{i,j\}},$$
where $\hat{K}_{i,i}$ and $\hat{B}_{i,j}$ are our estimators of $K_{i,i}$ and $K_{i,j}^2$, respectively. Define $\hat{G} = ([N], \hat{E})$, where, for $i \ne j$, $\{i,j\} \in \hat{E}$ if and only if $\hat{B}_{i,j} \ge \frac{1}{2}\alpha^2$. The graph $\hat{G}$ is our estimator of $G_K$. Let $\{\hat{C}_1, \ldots, \hat{C}_{\nu_{\hat{G}}}\}$ be a shortest maximal cycle basis of the cycle space of $\hat{G}$. Let $\hat{S}_i \subseteq [N]$ be the subset of vertices of $\hat{C}_i$, for $1 \le i \le \nu_{\hat{G}}$. We define
$$\hat{H}_i = \hat{\Delta}_{\hat{S}_i} - \sum_{M \in \mathcal{M}(\hat{S}_i)} (-1)^{|M|} \prod_{\{i,j\} \in M} \hat{B}_{i,j} \prod_{i \notin V(M)} \hat{K}_{i,i},$$
for $1 \le i \le \nu_{\hat{G}}$. In light of (2.1), for large enough $n$, this quantity should be close to
$$H_i = 2 \times (-1)^{|\hat{S}_i|+1} \prod_{\{i,j\} \in E(\hat{S}_i)} K_{i,j}.$$
We note that this definition is only symbolic in nature, and computing $\hat{H}_i$ in this fashion is extremely inefficient. Instead, to compute it in practice, we will use the determinant of an auxiliary matrix, computed via a matrix factorization. Namely, let us define the matrix $\widetilde{K} \in \mathbb{R}^{N \times N}$ such that $\widetilde{K}_{i,i} = \hat{K}_{i,i}$ for $1 \le i \le N$, and $\widetilde{K}_{i,j} = \hat{B}_{i,j}^{1/2}$. We have
$$\det \widetilde{K}_{\hat{S}_i} = \sum_{M \in \mathcal{M}(\hat{S}_i)} (-1)^{|M|} \prod_{\{i,j\} \in M} \hat{B}_{i,j} \prod_{i \notin V(M)} \hat{K}_{i,i} \;+\; 2 \times (-1)^{|\hat{S}_i|+1} \prod_{\{i,j\} \in \hat{E}(\hat{S}_i)} \hat{B}_{i,j}^{1/2},$$
so that we may equivalently write
$$\hat{H}_i = \hat{\Delta}_{\hat{S}_i} - \det(\widetilde{K}_{\hat{S}_i}) + 2 \times (-1)^{|\hat{S}_i|+1} \prod_{\{i,j\} \in \hat{E}(\hat{S}_i)} \hat{B}_{i,j}^{1/2}.$$
Finally, let $\hat{m} = |\hat{E}|$. Set the matrix $A \in GF(2)^{\nu_{\hat{G}} \times \hat{m}}$ with $i$-th row representing $\hat{C}_i$ in $GF(2)^{\hat{m}}$, $1 \le i \le \nu_{\hat{G}}$, let $b = (b_1, \ldots, b_{\nu_{\hat{G}}}) \in GF(2)^{\nu_{\hat{G}}}$ with $b_i = \frac{1}{2}[\operatorname{sgn}(\hat{H}_i) + 1]$, $1 \le i \le \nu_{\hat{G}}$, and let $x \in GF(2)^{\hat{m}}$ be a solution to the linear system $Ax = b$ if a solution exists, and $x = \mathbb{1}_{\hat{m}}$ otherwise. We define $\hat{K}_{i,j} = 0$ if $\{i,j\} \notin \hat{E}$, and $\hat{K}_{i,j} = \hat{K}_{j,i} = (2x_{\{i,j\}} - 1)\hat{B}_{i,j}^{1/2}$ for all $\{i,j\} \in \hat{E}$.
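The first stage of this estimator is easy to express directly; here is a compact sketch (my own illustration; `samples` is an assumed list of observed subsets of $[N]$):

```python
# A sketch (my own) of the moment-estimation stage: empirical minors
# Delta-hat for |S| <= 2, the estimators K-hat_{i,i} and B-hat_{i,j}, and
# the graph estimate G-hat via the threshold alpha^2 / 2.
import numpy as np

def moment_estimates(samples, N, alpha):
    n = len(samples)
    inc = np.zeros((n, N))
    for p, Y in enumerate(samples):
        inc[p, list(Y)] = 1.0
    K_diag = inc.mean(axis=0)               # Delta-hat_{i} = K-hat_{i,i}
    pair = (inc.T @ inc) / n                # Delta-hat_{i,j}
    B = np.outer(K_diag, K_diag) - pair     # estimates K_{i,j}^2
    np.fill_diagonal(B, 0.0)
    E = {(i, j) for i in range(N) for j in range(i + 1, N)
         if B[i, j] >= 0.5 * alpha ** 2}    # edge set of G-hat
    return K_diag, B, E

# With samples drawn from DPP(K), E converges to the edge set of G_K.
```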

Next, we prove the following lemma, which relates the quality of estimation of $K$ in terms of $\rho$ to the quality of estimation of the principal minors $\Delta_S$.

Lemma 1. Let $K$ satisfy Assumption 1, and let $\ell$ be the cycle sparsity of $G_K$. Let $\varepsilon > 0$. If $|\hat{\Delta}_S - \Delta_S| \le \varepsilon$ for all $S \subseteq [N]$ with $|S| \le 2$, and if $|\hat{\Delta}_S - \Delta_S| \le (\alpha/4)^{|S|}$ for all $S \subseteq [N]$ with $3 \le |S| \le \ell$, then $\rho(\hat{K}, K) < 4\varepsilon/\alpha$.

Proof. We can bound $|\hat{B}_{i,j} - K_{i,j}^2|$; namely,
$$\hat{B}_{i,j} \le (K_{i,i} + \alpha^2/16)(K_{j,j} + \alpha^2/16) - (\Delta_{\{i,j\}} - \alpha^2/16) \le K_{i,j}^2 + \alpha^2/4$$
and
$$\hat{B}_{i,j} \ge (K_{i,i} - \alpha^2/16)(K_{j,j} - \alpha^2/16) - (\Delta_{\{i,j\}} + \alpha^2/16) \ge K_{i,j}^2 - 3\alpha^2/16,$$
giving $|\hat{B}_{i,j} - K_{i,j}^2| < \alpha^2/4$. Thus, we can correctly determine whether $K_{i,j} = 0$ or $|K_{i,j}| \ge \alpha$, yielding $\hat{G} = G_K$. In particular, the cycle basis $\hat{C}_1, \ldots, \hat{C}_{\nu_{\hat{G}}}$ of $\hat{G}$ is a cycle basis of $G_K$. Let $1 \le i \le \nu_{\hat{G}}$, and denote $t = (\alpha/4)^{|\hat{S}_i|}$. We have
$$\begin{aligned}
\left|\hat{H}_i - H_i\right| &\le |\hat{\Delta}_{\hat{S}_i} - \Delta_{\hat{S}_i}| + |\mathcal{M}(\hat{S}_i)| \max_{x \in \pm 1}\left[(1 + 4tx)^{|\hat{S}_i|} - 1\right] \\
&\le (\alpha/4)^{|\hat{S}_i|} + |\mathcal{M}(\hat{S}_i)|\left[(1 + 4t)^{|\hat{S}_i|} - 1\right] \\
&\le (\alpha/4)^{|\hat{S}_i|} + T\!\left(|\hat{S}_i|, \left\lfloor \tfrac{|\hat{S}_i|}{2} \right\rfloor\right) 4t \, T\!\left(|\hat{S}_i|, |\hat{S}_i|\right) \\
&\le (\alpha/4)^{|\hat{S}_i|} + 4t\,(2^{|\hat{S}_i|/2} - 1)(2^{|\hat{S}_i|} - 1) \\
&\le (\alpha/4)^{|\hat{S}_i|} + t\,2^{2|\hat{S}_i|} \\
&< 2\alpha^{|\hat{S}_i|} \le |H_i|,
\end{aligned}$$
where, for positive integers $p \le q$, we denote $T(q, p) = \sum_{i=1}^{p} \binom{q}{i}$. Therefore, we can determine the sign of the product $\prod_{\{i,j\} \in E(\hat{S}_i)} K_{i,j}$ for every element in the cycle basis, and hence recover the signs of the non-zero off-diagonal entries of $K$. Hence,
$$\rho(\hat{K}, K) = \max_{1 \le i,j \le N} \left| |\hat{K}_{i,j}| - |K_{i,j}| \right|.$$
For $i = j$, $\left||\hat{K}_{i,j}| - |K_{i,j}|\right| = |\hat{K}_{i,i} - K_{i,i}| \le \varepsilon$. For $i \ne j$ with $\{i,j\} \in \hat{E} = E$, one can easily show that $|\hat{B}_{i,j} - K_{i,j}^2| \le 4\varepsilon$, yielding
$$\left|\hat{B}_{i,j}^{1/2} - |K_{i,j}|\right| \le \frac{4\varepsilon}{\left|\hat{B}_{i,j}^{1/2} + |K_{i,j}|\right|} \le \frac{4\varepsilon}{\alpha},$$
which completes the proof.

We are now in a position to establish a sufficient sample size for estimating $K$ within distance $\varepsilon$.

Theorem 1. Let $K$ satisfy Assumption 1, and let $\ell$ be the cycle sparsity of $G_K$. Let $\varepsilon > 0$. For any $A > 0$, there exists $C > 0$ such that
$$n \ge C\left(\frac{1}{\alpha^2 \varepsilon^2} + \ell \left(\frac{4}{\alpha}\right)^{2\ell}\right) \log N$$
yields $\rho(\hat{K}, K) \le \varepsilon$ with probability at least $1 - N^{-A}$.

Proof. Using the previous lemma and applying a union bound,
$$\mathbb{P}\left[\rho(\hat{K}, K) > \varepsilon\right] \le \sum_{|S| \le 2} \mathbb{P}\left[|\hat{\Delta}_S - \Delta_S| > \alpha\varepsilon/4\right] + \sum_{2 \le |S| \le \ell} \mathbb{P}\left[|\hat{\Delta}_S - \Delta_S| > (\alpha/4)^{|S|}\right] \le 2N^2 e^{-n\alpha^2\varepsilon^2/8} + 2N^{\ell+1} e^{-2n(\alpha/4)^{2\ell}}, \tag{2.2}$$
where we used Hoeffding's inequality.

2.2.3 Information Theoretic Lower Bounds

We prove an information-theoretic lower bound that holds already when $G_K$ is an $\ell$-cycle. Let $D(K\|K')$ and $\mathrm{H}(K, K')$ denote, respectively, the Kullback-Leibler divergence and the Hellinger distance between $\mathrm{DPP}(K)$ and $\mathrm{DPP}(K')$.

Lemma 2. For $\eta \in \{-, +\}$, let $K^\eta$ be the $\ell \times \ell$ matrix with elements given by
$$K^\eta_{i,j} = \begin{cases} 1/2 & \text{if } j = i, \\ \alpha & \text{if } j = i \pm 1, \\ \eta\,\alpha & \text{if } (i,j) \in \{(1,\ell), (\ell,1)\}, \\ 0 & \text{otherwise.} \end{cases}$$
Then, for any $\alpha \le 1/8$, it holds that
$$D(K^+ \| K^-) \le 4(6\alpha)^\ell \qquad \text{and} \qquad \mathrm{H}(K^+, K^-) \le (8\alpha^2)^\ell.$$

Proof. It is straightforward to see that
$$\det(K^+_J) - \det(K^-_J) = \begin{cases} 2\alpha^\ell & \text{if } J = [\ell], \\ 0 & \text{else.} \end{cases}$$
If $Y$ is sampled from $\mathrm{DPP}(K^\eta)$, we denote by $p_\eta(S) = \mathbb{P}[Y = S]$, for $S \subseteq [\ell]$. It follows from the inclusion-exclusion principle that for all $S \subseteq [\ell]$,
$$p_+(S) - p_-(S) = \sum_{J \subseteq [\ell] \setminus S} (-1)^{|J|}\left(\det K^+_{S \cup J} - \det K^-_{S \cup J}\right) = (-1)^{\ell - |S|}\left(\det K^+ - \det K^-\right) = \pm 2\alpha^\ell, \tag{2.3}$$
where $|J|$ denotes the cardinality of $J$. The inclusion-exclusion principle also yields that $p_\eta(S) = |\det(K^\eta - I_{\bar{S}})|$ for all $S \subseteq [\ell]$, where $I_{\bar{S}}$ stands for the $\ell \times \ell$ diagonal matrix with ones on its entries $(i,i)$ for $i \notin S$, and zeros elsewhere. We denote by $D(K^+ \| K^-)$ the Kullback-Leibler divergence between $\mathrm{DPP}(K^+)$ and $\mathrm{DPP}(K^-)$:
$$D(K^+ \| K^-) = \sum_{S \subseteq [\ell]} p_+(S) \log\frac{p_+(S)}{p_-(S)} \le \sum_{S \subseteq [\ell]} \left(p_+(S) - p_-(S)\right) \frac{p_+(S)}{p_-(S)} \le 2\alpha^\ell \sum_{S \subseteq [\ell]} \frac{|\det(K^+ - I_{\bar{S}})|}{|\det(K^- - I_{\bar{S}})|}, \tag{2.4}$$
by (2.3). Using the fact that $0 < \alpha \le 1/8$ and the Gershgorin circle theorem, we conclude that the absolute values of all eigenvalues of $K^\eta - I_{\bar{S}}$ are between $1/4$ and $3/4$. Thus we obtain from (2.4) the bound $D(K^+ \| K^-) \le 4(6\alpha)^\ell$.

Using the same arguments as above, the Hellinger distance $\mathrm{H}(K^+, K^-)$ between $\mathrm{DPP}(K^+)$ and $\mathrm{DPP}(K^-)$ satisfies
$$\mathrm{H}(K^+, K^-) = \sum_{J \subseteq [\ell]} \left(\frac{p_+(J) - p_-(J)}{\sqrt{p_+(J)} + \sqrt{p_-(J)}}\right)^2 \le \sum_{J \subseteq [\ell]} \frac{4\alpha^{2\ell}}{2 \cdot 4^{-\ell}} = (8\alpha^2)^\ell,$$
which completes the proof.

The sample complexity lower bound now follows from standard arguments.

Theorem 2. Let $0 < \varepsilon \le \alpha \le 1/8$ and $3 \le \ell \le N$. There exists a constant $C > 0$ such that, if
$$n \le C\left(\frac{8^{-\ell}}{\alpha^{2\ell}} + \frac{\log(N/\ell)}{(6\alpha)^\ell} + \frac{\log N}{\varepsilon^2}\right),$$
then the following holds: for any estimator $\hat{K}$ based on $n$ samples, there exists a kernel $K$ that satisfies Assumption 1, such that the cycle sparsity of $G_K$ is $\ell$, and for which $\rho(\hat{K}, K) \ge \varepsilon$ with probability at least $1/3$.

Proof. Recall the notation of Lemma 2. First consider the $N \times N$ block diagonal matrix $K$ (resp. $K'$) whose first block is $K^+$ (resp. $K^-$) and whose second block is $I_{N-\ell}$. By a standard argument, the Hellinger distance $\mathrm{H}_n(K, K')$ between the product measures $\mathrm{DPP}(K)^{\otimes n}$ and $\mathrm{DPP}(K')^{\otimes n}$ satisfies
$$1 - \frac{\mathrm{H}_n^2(K, K')}{2} = \left(1 - \frac{\mathrm{H}^2(K, K')}{2}\right)^n \ge \left(1 - \frac{\alpha^{2\ell}}{2 \times 8^{-\ell}}\right)^n,$$
which yields the first term in the desired lower bound. Next, by padding with zeros, we can assume that $L = N/\ell$ is an integer. Let $K^{(0)}$ be a block diagonal matrix where each block is $K^+$ (using the notation of Lemma 2). For $j = 1, \ldots, L$, define the $N \times N$ block diagonal matrix $K^{(j)}$ as the matrix obtained from $K^{(0)}$ by replacing its $j$th block with $K^-$ (again using the notation of Lemma 2). Since $\mathrm{DPP}(K^{(j)})$ is the product measure of $L$ lower dimensional DPPs that are each independent of each other, using Lemma 2 we have $D(K^{(j)} \| K^{(0)}) \le 4(6\alpha)^\ell$. Hence, by Fano's lemma (see, e.g., Corollary 2.6 in [115]), the sample complexity to learn the kernel of a DPP within a distance $\varepsilon \le \alpha$ is
$$\Omega\left(\frac{\log(N/\ell)}{(6\alpha)^\ell}\right),$$
which yields the second term.

The third term follows from considering $K_0 = (1/2)I_N$ and letting $K_j$ be obtained from $K_0$ by adding $\varepsilon$ to the $j$th entry along the diagonal. It is easy to see that $D(K_j \| K_0) \le 8\varepsilon^2$. Hence, a second application of Fano's lemma yields that the sample complexity to learn the kernel of a DPP within a distance $\varepsilon$ is $\Omega\left(\frac{\log N}{\varepsilon^2}\right)$.

The third term in the lower bound is the standard parametric term and is unavoidable in order to estimate the magnitudes of the coefficients of $K$. The other terms are more interesting. They reveal that the cycle sparsity of $G_K$, namely $\ell$, plays a key role in the task of recovering the sign pattern of $K$. Moreover, the theorem shows that the sample complexity of our method of moments estimator is near optimal.

2.2.4 Algorithmic Aspects and Experiments

We first give an algorithm to compute the estimator $\hat{K}$ defined in Subsection 2.2.2. A well-known algorithm of Horton [51] computes a cycle basis of minimum total length in $O(m^3 N)$ time. Subsequently, the running time was improved to $O(m^2 N / \log N)$ [2]. Also, it is known that a cycle basis of minimum total length is a shortest maximal cycle basis [21]. Together, these results imply the following.

Lemma 3. Let $G = ([N], E)$, $|E| = m$. There is an algorithm to compute a shortest maximal cycle basis in $O(m^2 N / \log N)$ time.

In addition, we recall the following standard result regarding the complexity of Gaussian elimination [47].

Lemma 4. Let $A \in GF(2)^{\nu \times m}$ and $b \in GF(2)^\nu$. Then Gaussian elimination will find a vector $x \in GF(2)^m$ such that $Ax = b$, or conclude that none exists, in $O(\nu^2 m)$ time.
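For concreteness, here is a compact version of this elimination (my own sketch, using dense numpy arrays rather than the bitset representations one would use in practice):

```python
# A compact GF(2) Gaussian elimination sketch (my own) for a linear system
# Ax = b, such as the sign-recovery system of Algorithm 1 below.
import numpy as np

def solve_gf2(A, b):
    """Return x with Ax = b over GF(2), or None if no solution exists."""
    A = np.array(A, dtype=np.uint8) % 2
    b = np.array(b, dtype=np.uint8) % 2
    nrows, ncols = A.shape
    M = np.hstack([A, b[:, None]])           # augmented matrix
    pivots, r = [], 0
    for c in range(ncols):
        rows = [i for i in range(r, nrows) if M[i, c]]
        if not rows:
            continue
        M[[r, rows[0]]] = M[[rows[0], r]]    # swap pivot row into place
        for i in range(nrows):
            if i != r and M[i, c]:
                M[i] ^= M[r]                 # XOR is addition over GF(2)
        pivots.append(c)
        r += 1
    if any(M[i, -1] for i in range(r, nrows)):
        return None                          # inconsistent system
    x = np.zeros(ncols, dtype=np.uint8)
    for i, c in enumerate(pivots):
        x[c] = M[i, -1]                      # free variables default to 0
    return x

print(solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0]))  # e.g. [1 0 0]
```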

We give our procedure for computing the estimator 퐾ˆ in Algorithm 1. In the following theorem, we bound the running time of Algorithm 1 and establish an upper bound on the sample complexity needed to solve the recovery problem as well as the sample complexity needed to compute an estimate 퐾ˆ that is close to 퐾.

Algorithm 1: Compute Estimator $\hat{K}$
Input: samples $Y_1, \ldots, Y_n$, parameter $\alpha > 0$.
Compute $\hat{\Delta}_S$ for all $|S| \le 2$.
Set $\hat{K}_{i,i} = \hat{\Delta}_{\{i\}}$ for $1 \le i \le N$.
Compute $\hat{B}_{i,j}$ for $1 \le i < j \le N$.
Form $\widetilde{K} \in \mathbb{R}^{N \times N}$ and $\hat{G} = ([N], \hat{E})$.
Compute a shortest maximal cycle basis $\{\hat{C}_1, \ldots, \hat{C}_{\nu_{\hat{G}}}\}$.
Compute $\hat{\Delta}_{\hat{S}_i}$ for $1 \le i \le \nu_{\hat{G}}$.
Compute $\hat{H}_i$ using $\det \widetilde{K}_{\hat{S}_i}$ for $1 \le i \le \nu_{\hat{G}}$.
Construct $A \in GF(2)^{\nu_{\hat{G}} \times \hat{m}}$ and $b \in GF(2)^{\nu_{\hat{G}}}$.
Solve $Ax = b$ using Gaussian elimination.
Set $\hat{K}_{i,j} = \hat{K}_{j,i} = (2x_{\{i,j\}} - 1)\hat{B}_{i,j}^{1/2}$ for all $\{i,j\} \in \hat{E}$.

Theorem 3. Let $K \in \mathbb{R}^{N \times N}$ be a symmetric matrix satisfying $0 \preceq K \preceq I$ and Assumption 1. Let $G_K$ be the graph induced by $K$, let $\ell$ be the cycle sparsity of $G_K$, let $Y_1, \ldots, Y_n$ be samples from $\mathrm{DPP}(K)$, and let $\delta \in (0, 1)$. If
$$n > \frac{\log(N^{\ell+1}/\delta)}{(\alpha/4)^{2\ell}},$$
then with probability at least $1 - \delta$, Algorithm 1 computes an estimator $\hat{K}$ which recovers the signs of $K$ up to a $\mathcal{D}_N$-similarity and satisfies
$$\rho(K, \hat{K}) < \frac{1}{\alpha}\left(\frac{8\log(4N^{\ell+1}/\delta)}{n}\right)^{1/2}, \tag{2.5}$$
in $O(m^3 + nN^2)$ time.

Proof. Bound (2.5) follows directly from (2.2) in the proof of Theorem 1. That same proof also shows that, with probability at least $1 - \delta$, the support of $G_K$ and the signs of $K$ are recovered up to a $\mathcal{D}_N$-similarity. What remains is to upper bound the worst case running time of Algorithm 1. We perform this analysis line by line. Initializing $\hat{K}$ requires $O(N^2)$ operations. Computing $\hat{\Delta}_S$ for all subsets $|S| \le 2$ requires $O(nN^2)$ operations. Setting $\hat{K}_{i,i}$ requires $O(N)$ operations. Computing $\hat{B}_{i,j}$ for $1 \le i < j \le N$ requires $O(N^2)$ operations. Forming $\widetilde{K}$ requires $O(N^2)$ operations. Forming $\hat{G}$ requires $O(N^2)$ operations. By Lemma 3, computing a shortest maximal cycle basis requires $O(m^2 N / \log N)$ operations. Constructing the subsets $\hat{S}_i$, $1 \le i \le \nu_{\hat{G}}$, requires $O(mN)$ operations. Computing $\hat{\Delta}_{\hat{S}_i}$ for $1 \le i \le \nu_{\hat{G}}$ requires $O(nm)$ operations. Computing $\hat{H}_i$ using $\det(\widetilde{K}_{\hat{S}_i})$ for $1 \le i \le \nu_{\hat{G}}$ requires $O(m\ell^3)$ operations, where a factorization of each $\widetilde{K}_{\hat{S}_i}$ is used to compute each determinant in $O(\ell^3)$ operations. Constructing $A$ and $b$ requires $O(m\ell)$ operations. By Lemma 4, finding a solution $x$ using Gaussian elimination requires $O(m^3)$ operations. Setting $\hat{K}_{i,j}$ for all edges $\{i,j\} \in \hat{E}$ requires $O(m)$ operations. Putting this all together, Algorithm 1 runs in $O(m^3 + nN^2)$ time.

Chordal Graphs

Here we show that it is possible to obtain faster algorithms by exploiting the structure of $G_K$. Specifically, in the case where $G_K$ is chordal, we give an $O(m)$ time algorithm to determine the signs of the off-diagonal entries of the estimator $\hat{K}$, resulting in an improved overall runtime of $O(m + nN^2)$. Recall that a graph $G = ([N], E)$ is said to be chordal if every induced cycle in $G$ is of length three. Moreover, a graph $G = ([N], E)$ has a perfect elimination ordering (PEO) if there exists an ordering of the vertex set $\{v_1, \ldots, v_N\}$ such that, for all $i$, the graph induced by $\{v_i\} \cup \{v_j \mid \{v_i, v_j\} \in E,\ j > i\}$ is a clique. It is well known that a graph is chordal if and only if it has a PEO. A PEO of a chordal graph with $m$ edges can be computed in $O(m)$ operations using lexicographic breadth-first search [98].

Lemma 5. Let $G = ([N], E)$ be a chordal graph and $\{v_1, \ldots, v_N\}$ be a PEO. Given $i$, let $i^* := \min\{j \mid j > i,\ \{v_i, v_j\} \in E\}$. Then the graph $G' = ([N], E')$, where $E' = \{\{v_i, v_{i^*}\}\}_{i=1}^{N - \kappa(G)}$, is a spanning forest of $G$.

Proof. We first show that there are no cycles in $G'$. Suppose, to the contrary, that there is a cycle $C$ of length $k$ on the vertices $\{v_{j_1}, \ldots, v_{j_k}\}$. Let $v$ be the vertex of smallest index. Then $v$ is connected to two other vertices in the cycle of larger index, while by construction $v$ is incident to only one edge of $E'$ leading to a vertex of larger index, a contradiction.

What remains is to show that $|E'| = N - \kappa(G)$, and it suffices to prove this in the case $\kappa(G) = 1$. Suppose, to the contrary, that there exists a vertex $v_i$, $i < N$, with no neighbors of larger index. Let $P$ be the shortest path in $G$ from $v_i$ to $v_N$; by connectivity, such a path exists. Let $v_k$ be the vertex of smallest index in the path. Then $v_k$ has two neighbors in the path of larger index, which, by the PEO property, must be adjacent to each other. Therefore, there is a shorter path from $v_i$ to $v_N$, a contradiction.
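A small sketch of the construction in Lemma 5 (my own illustration; the PEO is supplied by hand for a toy chordal graph):

```python
# A sketch (my own) of the Lemma 5 construction: connect each vertex v_i
# that has a later neighbor to its lowest-indexed later neighbor v_{i*}.
import networkx as nx

def peo_spanning_forest(G, order):
    pos = {v: i for i, v in enumerate(order)}
    edges = []
    for v in order:
        later = [u for u in G[v] if pos[u] > pos[v]]
        if later:
            edges.append((v, min(later, key=pos.get)))
    return nx.Graph(edges)

# A triangulated 4-cycle is chordal, and [1, 3, 0, 2] is a PEO for it.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
F = peo_spanning_forest(G, order=[1, 3, 0, 2])
print(sorted(F.edges()), nx.is_forest(F))  # 3 = N - kappa(G) edges, True
```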

Algorithm 2: Compute Signs of Edges in a Chordal Graph
Input: $G_K = ([N], E)$ chordal, $\hat{\Delta}_S$ for $|S| \le 3$.
Compute a perfect elimination ordering $\{v_1, \ldots, v_N\}$.
Compute the spanning forest $G' = ([N], E')$.
Set all edges in $E'$ to have positive sign.
Compute $\hat{H}_{\{i,j,i^*\}}$ for all $\{i,j\} \in E \setminus E'$, $j < i$.
Order the edges $E \setminus E' = \{e_1, \ldots, e_\nu\}$ such that $i > j$ if $\max e_i < \max e_j$.
Visit the edges in this order and, for $e = \{i,j\}$, $j > i$, set
$$\operatorname{sgn}(\{i,j\}) = \operatorname{sgn}(\hat{H}_{\{i,j,i^*\}}) \operatorname{sgn}(\{i, i^*\}) \operatorname{sgn}(\{j, i^*\}).$$

Now, given the chordal graph $G_K$ induced by $K$ and estimates of the principal minors of size at most three, we provide an algorithm to determine the signs of the edges of $G_K$, or, equivalently, of the off-diagonal entries of $K$.

Theorem 4. If $G_K$ is chordal, Algorithm 2 correctly determines the signs of the edges of $G_K$ in $O(m)$ time.

Proof. We simultaneously perform a count of the operations and a proof of the correctness of the algorithm. Computing a PEO requires $O(m)$ operations. Computing the spanning forest requires $O(m)$ operations. The edges of the spanning forest can be given arbitrary signs, because it is a cycle-free graph; this assigns a sign to two edges of each 3-cycle. Computing each $\hat{H}_{\{i,j,i^*\}}$ requires a constant number of operations because $\ell = 3$, requiring a total of $O(m - N)$ operations. Ordering the edges requires $O(m)$ operations. Setting the signs of each remaining edge requires $O(m)$ operations.

Therefore, when $G_K$ is chordal, the overall complexity required by our algorithm to compute $\hat{K}$ is reduced to $O(m + nN^2)$.

Experiments

Here we present experiments to supplement the theoretical results of this section. We test our algorithm on two types of random matrices. First, we consider the matrix $K \in \mathbb{R}^{N \times N}$ corresponding to the cycle on $N$ vertices,
$$K = \frac{1}{2} I + \frac{1}{4} A,$$
where $A$ is symmetric and has non-zero entries only on the edges of the cycle, either $+1$ or $-1$, each with probability $1/2$. By the Gershgorin circle theorem, $0 \preceq K \preceq I$. Next, we consider the matrix $K \in \mathbb{R}^{N \times N}$ corresponding to the clique on $N$ vertices,
$$K = \frac{1}{2} I + \frac{1}{4\sqrt{N}} A,$$
where $A$ is symmetric and has all entries either $+1$ or $-1$, each with probability $1/2$. It is well known that $-2\sqrt{N} \preceq A \preceq 2\sqrt{N}$ with high probability, implying $0 \preceq K \preceq I$.

Figure 2-1: Plots of the proportion of successful graph recovery, and of graph and sign recovery, for random matrices with cycle and clique graph structure, respectively: (a) graph recovery, cycle; (b) graph and sign recovery, cycle; (c) graph recovery, clique; (d) graph and sign recovery, clique. The darker the box, the higher the proportion of trials recovered successfully.
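The two ensembles are straightforward to generate; a short sketch (my own, following the constructions above; the convention that the clique ensemble's $\pm 1$ matrix includes a random diagonal is my assumption):

```python
# A sketch (my own) of the two random test ensembles described above.
import numpy as np

def random_cycle_kernel(N, rng):
    A = np.zeros((N, N))
    for i in range(N):
        s = rng.choice([-1.0, 1.0])
        A[i, (i + 1) % N] = A[(i + 1) % N, i] = s
    return 0.5 * np.eye(N) + 0.25 * A

def random_clique_kernel(N, rng):
    U = np.triu(rng.choice([-1.0, 1.0], size=(N, N)))  # +-1, incl. diagonal
    A = U + np.triu(U, k=1).T                          # symmetrize
    return 0.5 * np.eye(N) + 0.25 * A / np.sqrt(N)

rng = np.random.default_rng(0)
for K in (random_cycle_kernel(10, rng), random_clique_kernel(10, rng)):
    eigs = np.linalg.eigvalsh(K)
    print(round(eigs.min(), 3), round(eigs.max(), 3))  # within [0, 1] w.h.p.
```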

For both cases, and for a range of values of the matrix dimension $N$ and number of samples $n$, we run our algorithm on 50 randomly generated instances. We record the proportion of trials in which we recover the graph induced by $K$, and the proportion of trials in which we recover both the graph and correctly determine the signs of the entries. In Figure 2-1, the shade of each box represents the proportion of trials where recovery was successful for a given pair $(N, n)$: a completely white box corresponds to a zero success rate, a black box to a perfect success rate. The plots corresponding to the cycle and the clique are telling. We note that for the clique, recovering the sparsity pattern and recovering the signs of the off-diagonal entries go hand-in-hand. However, for the cycle, there is a noticeable gap between the number of samples required to recover the sparsity pattern and the number of samples required to recover the signs of the off-diagonal entries. This empirically confirms the central role that cycle sparsity plays in parameter estimation, and further corroborates our theoretical results.

2.3 Recovering a Magnitude-Symmetric Matrix from its Principal Minors

In this section, we consider matrices $K \in \mathbb{R}^{N \times N}$ satisfying $|K_{i,j}| = |K_{j,i}|$ for $i, j \in [N]$, which we refer to as magnitude-symmetric matrices, and investigate the algorithmic question of recovering such a matrix from its principal minors. First, we require a number of key definitions and notation. For completeness (and to allow independent reading), we include terms and notation that we have already defined in Section 2.2. If $K \in \mathbb{R}^{N \times N}$ and $S \subseteq [N]$, then $K_S := (K_{i,j})_{i,j \in S}$ and $\Delta_S(K) := \det K_S$ is the principal minor of $K$ associated with the set $S$ (with $\Delta_\emptyset(K) = 1$ by convention). When it is clear from the context, we simply write $\Delta_S$ instead of $\Delta_S(K)$, and for sets of order at most four, we replace the set itself by its elements, i.e., we write $\Delta_{\{1,2,3\}}$ as $\Delta_{1,2,3}$.

its elements, i.e., write ∆{1,2,3} as ∆1,2,3. In addition, we recall a number of relevant graph-theoretic definitions. In this work, all graphs 퐺 are simple, undirected graphs. An articulation point or cut vertex of a graph 퐺 is a vertex whose removal disconnects the graph. A graph 퐺 is said to be two-connected if it has at least two vertices and has no articulation points. A maximal two-connected subgraph 퐻 of 퐺 is called a block of 퐺. Given a graph 퐺, a cycle 퐶 is a subgraph of 퐺 in which every vertex of 퐶 has even degree (within 퐶). A simple cycle 퐶 is a connected subgraph of 퐺 in which every vertex of 퐶 has degree two. For simplicity, we may sometimes describe 퐶 by a

traversal of its vertices along its edges, i.e., 퐶 = 푖1 푖2 ... 푖푘 푖1; notice we have the choice of the start vertex and the orientation of the cycle in this description. We denote the subgraph induced by a vertex subset 푆 ⊂ 푉 by 퐺[푆]. A simple cycle 퐶 of a graph 퐺 on vertex set 푉 (퐶) is not necessarily induced, and while the cycle itself has all vertices of degree two, this cycle may contain some number of chords in the original graph 퐺, and we denote this set 퐸(퐺[푆])∖퐸(퐶) of chords by 훾(퐶).

Given any subgraph $H$ of $G$, we can associate with $H$ an incidence vector $\chi_H \in GF(2)^m$, $m = |E|$, where $\chi_H(e) = 1$ if and only if $e \in H$, and $\chi_H(e) = 0$ otherwise. Given two subgraphs $H_1, H_2 \subseteq G$, we define their sum $H_1 + H_2$ as the graph containing all edges in exactly one of $E(H_1)$ and $E(H_2)$ (i.e., their symmetric difference) and no isolated vertices. This corresponds to the graph resulting from the sum of their incidence vectors. The cycle space of $G$ is given by
$$\mathcal{C}(G) = \operatorname{span}\{\chi_C \mid C \text{ is a cycle of } G\} \subseteq GF(2)^m,$$
and has dimension $\nu = m - N + \kappa(G)$, where $\kappa(G)$ denotes the number of connected components of $G$. The quantity $\nu$ is commonly referred to as the cyclomatic number. The cycle sparsity $\ell$ of the graph $G$ is the smallest number for which the set of incidence vectors $\chi_C$ of cycles of edge length at most $\ell$ spans $\mathcal{C}(G)$.

In this section, we require the use and analysis of graphs $G = ([N], E)$ endowed with a linear Boolean function $\epsilon$ that maps subgraphs of $G$ to $\{-1, +1\}$, i.e., $\epsilon(e) \in \{-1, +1\}$ for all $e \in E(G)$ and
$$\epsilon(H) = \prod_{e \in E(H)} \epsilon(e)$$
for all subgraphs $H$. The graph $G$ combined with a linear Boolean function $\epsilon$ is denoted by $G = ([N], E, \epsilon)$ and referred to as a charged graph. If $\epsilon(H) = +1$ (resp. $\epsilon(H) = -1$), then we say the subgraph $H$ is positive (resp. negative). For the sake of space, we often denote $\epsilon(\{i,j\})$, $\{i,j\} \in E(G)$, by $\epsilon_{i,j}$. Given a magnitude-symmetric matrix $K \in \mathbb{R}^{N \times N}$, $|K_{i,j}| = |K_{j,i}|$ for $i, j \in [N]$, we define the charged sparsity graph $G_K = ([N], E, \epsilon)$ by
$$E := \{\{i,j\} \mid i, j \in [N],\ i \ne j,\ |K_{i,j}| \ne 0\}, \qquad \epsilon(\{i,j\}) := \operatorname{sgn}(K_{i,j} K_{j,i}),$$
and when clear from context, we simply write $G$ instead of $G_K$.
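These definitions translate directly into code; a small sketch (my own illustration; the charge of any subgraph is then the product of its edge charges):

```python
# A sketch (my own) building the charged sparsity graph G_K of a
# magnitude-symmetric matrix: edges where K_{i,j} != 0, each carrying the
# charge epsilon({i,j}) = sgn(K_{i,j} K_{j,i}).
import numpy as np

def charged_sparsity_graph(K):
    N = K.shape[0]
    edges, charge = [], {}
    for i in range(N):
        for j in range(i + 1, N):
            if K[i, j] != 0:
                edges.append((i, j))
                charge[(i, j)] = 1 if K[i, j] * K[j, i] > 0 else -1
    return edges, charge

K = np.array([[0.5,  0.3, 0.0],
              [-0.3, 0.5, 0.2],
              [0.0,  0.2, 0.5]])     # magnitude-symmetric, not symmetric
print(charged_sparsity_graph(K))    # edge (0,1) has charge -1, (1,2) has +1
```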

We define the span of the incidence vectors of positive cycles as
$$\mathcal{C}^+(G) = \operatorname{span}\{\chi_C \mid C \text{ is a cycle of } G,\ \epsilon(C) = +1\}.$$
As we will see, when $H$ is a block, the space $\mathcal{C}^+(H)$ is spanned by positive simple cycles (Proposition 3). We define the simple cycle sparsity $\ell_+$ of $\mathcal{C}^+(G)$ to be the smallest number for which the set of incidence vectors $\chi_C$ of positive simple cycles of edge length at most $\ell_+$ spans $\mathcal{C}^+(G)$ (or, equivalently, the set $\mathcal{C}^+(H)$ for every block $H$ of $G$). Unlike for $\mathcal{C}(G)$, for $\mathcal{C}^+(G)$ this quantity $\ell_+$ depends on whether we consider a basis of cycles or of only simple cycles. The study of bases of $\mathcal{C}^+(H)$, for blocks $H$, consisting of incidence vectors of simple cycles constitutes the major graph-theoretic subject of this work, and has connections to the recovery of a magnitude-symmetric matrix from its principal minors.

2.3.1 Principal Minors and Magnitude-Symmetric Matrices

In this subsection, we are primarily concerned with recovering a magnitude-symmetric matrix $K$ from its principal minors. Using only principal minors of order one and two, the quantities $K_{i,i}$, $K_{i,j}K_{j,i}$, and the charged graph $G_K$ can be computed immediately, as $K_{i,i} = \Delta_i$ and $K_{i,j}K_{j,i} = \Delta_i \Delta_j - \Delta_{i,j}$ for all $i \ne j$. The main focus of this subsection is to obtain further information on $K$ using principal minors of order greater than two, and to quantify the extent to which $K$ can be identified from its principal minors. To avoid the unintended cancellation of terms in a principal minor, in what follows we assume that $K$ is generic in the sense that

If 퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖 ̸= 0 for 푖, 푗, 푘, ℓ ∈ [푁] distinct, then |퐾푖,푗퐾푘,ℓ|= ̸ |퐾푗,푘퐾ℓ,푖|, (2.6)

and 휑1 퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖 + 휑2퐾푖,푗퐾푗,ℓ퐾ℓ,푘퐾푘,푖 + 휑3퐾푖,푘퐾푘,푗퐾푗,ℓ퐾ℓ,푖 ̸= 0

for all 휑1, 휑2, 휑3 ∈ {−1, 1}.

The first part of the condition in (2.6) implies that the three terms (corresponding to the three distinct cycles on 4 vertices) in the second part are all distinct in magnitude; the second part strengthens this requirement. Condition (2.6) can be thought of as a no-cancellation requirement for four-cycles of principal minors of order four. As we will see, this condition is quite important for the recovery of a magnitude-symmetric matrix from its principal minors. Though, the results of this section, slightly modified, hold under slightly weaker conditions than (2.6), albeit at the cost of simplicity. This condition rules out dense matrices with a high degree of symmetry, as well as the large majority of rank one matrices, but this condition is satisfied for almost all matrices. We denote the set of 푁 × 푁 magnitude-symmetric matrices satisfying Property (2.6)

by 풦푁 , and, when the dimension is clear from context, we often simply write 풦. We note

40 ′ ′ that if 퐾 ∈ 풦, then any matrix 퐾 satisfying |퐾푖,푗| = |퐾푖,푗|, 푖, 푗 ∈ [푁], is also in 풦. In this subsection, we answer the following question:

′ Given 퐾 ∈ 풦, what is the minimal ℓ+ such that any 퐾 ∈ 풦 with (2.7)

′ ′ ∆푆(퐾) = ∆푆(퐾 ) for all |푆| ≤ ℓ+ also satisfies ∆푆(퐾) = ∆푆(퐾 ) for all 푆 ⊂ [푁]?

Question (2.7) asks for the smallest ℓ such that the principal minors of order at most ℓ uniquely determines principal minors of all orders. In Theorem 5, we show that the answer is the simple cycle sparsity of 풞+(퐺). In Subsections 2.3.2 and 2.3.3, we build on the analysis of this subsection, and produce a polynomial-time algorithm for recovering a matrix 퐾 with prescribed principal minors (possibly given up to some error term). The polynomial- time algorithm (of Subsection 2.3.3) for recovery given perturbed principal minors has key connections to learning signed DPPs.

In addition, it is also reasonable to ask:

′ ′ Given 퐾 ∈ 풦, what is the set of 퐾 ∈ 풦 that satisfy ∆푆(퐾) = ∆푆(퐾 ) (2.8) for all 푆 ⊂ [푁]?

Question (2.8) treats the extent to which we can hope to recover a matrix from its principal minors. For instance, the transpose operation 퐾푇 and the similarity transformation 퐷퐾퐷, where 퐷 is a diagonal matrix with entries {−1, +1} on the diagonal, both clearly preserve principal minors. In fact, these two operations suitably combined completely define this set. This result follows fairly quickly from a more general result of Loewy [79, Theorem 1] regarding matrices with all principal minors equal. In particular, Loewy shows that if two 푛 × 푛, 푛 ≥ 4, matrices 퐴 and 퐵 have all principal minors equal, and 퐴 is irreducible and

rank(퐴푆,푇 ) ≥ 2 or rank(퐴푇,푆) ≥ 2 (where 퐴푆,푇 is the submatrix of 퐴 containing the rows in 푆 and the columns in 푇 ) for all partitions 푆 ∪ 푇 = [푛], 푆 ∩ 푇 = ∅, |푆|, |푇 | ≥ 2, then either 퐵 or 퐵푇 is diagonally similar to 퐴. In Proposition 4, we include an alternate proof

41 answering Question (2.8), as Loewy’s result and proof, though more general and quite nice, is more involved and not as illuminating for the specific case that we consider here.

To answer Question (2.7), we study properties of certain bases of the space 풞+(퐺). We recall that, given a matrix 퐾 ∈ 풦, we can define the charged sparsity graph 퐺 = ([푁], 퐸, 휖) of 퐾, where 퐺 is the simple graph on vertex set [푁] with an edge {푖, 푗} ∈ 퐸, 푖 ̸= 푗, if

퐾푖,푗 ̸= 0 and endowed with a function 휖 mapping subgraphs of 퐺 to {−1, +1}. Viewing the possible values ±1 for 휖 as the two elements of 퐺퐹 (2), we note that 휖(퐻) is additive over

퐺퐹 (2), i.e. 휖(퐻1 + 퐻2) = 휖(퐻1)휖(퐻2). We can make a number of observations regarding the connectivity of 퐺. If 퐺 is disconnected, with connected components given by vertex

sets 푉1, ..., 푉푘, then any principal minor ∆푆 satisfies

푘 ∏︁ ∆푆 = ∆푆∩푉푗 . (2.9) 푗=1

In addition, if 퐺 has a cut vertex 푖, whose removal results in connected components with vertex sets 푉1, ..., 푉푘, then the principal minor ∆푆 satisfies

푘 푘 ∑︁ ∏︁ ∏︁ ∆ = ∆ ∆ − (푘 − 1)∆ ∆ . (2.10) 푆 [{푖}∪푉푗1 ]∩푆 푉푗2 ∩푆 {푖}∩푆 푉푗 ∩푆 푗1=1 푗2̸=푗1 푗=1

This implies that the principal minors of 퐾 are uniquely determined by principal minors corresponding to subsets of blocks of 퐺. Given this property, we focus on matrices 퐾 without an articulation point, i.e. 퐺 is two-connected. Given results for matrices without an articulation point and Equations (2.9) and (2.10), we can then answer Question (2.7) in the more general case.

Next, we make an important observation regarding the contribution of certain terms in the Laplace expansion of a principal minor. Recall that the Laplace expansion of the determinant is given by

푁 ∑︁ ∏︁ det(퐾) = sgn(휎) 퐾푖,휎(푖),

휎∈풮푁 푖=1

42 where sgn(휎) is multiplicative over the (disjoint) cycles forming the permutation, and is preserved when taking the inverse of the permutation, or of any cycle therein. Consider now an arbitrary, possibly non-induced, simple cycle 퐶 (in the graph-theoretic sense) of the sparsity graph 퐺, without loss of generality given by 퐶 = 1 2 ... 푘 1, that satisfies 휖(퐶) = −1. Consider the sum of all terms in the Laplace expansion that contains either the cycle (1 2 ... 푘) or its inverse (푘 푘 − 1 ... 1) in the associated permutation. Because 휖(퐶) = −1, 푘−1 푘 ∏︁ ∏︁ 퐾푘,1 퐾푖,푖+1 + 퐾1,푘 퐾푖,푖−1 = 0, 푖=1 푖=2 and the sum over all permutations containing the cyclic permutations (푘 푘 − 1 ... 1) or (1 2 ... 푘) is zero. Therefore, terms associated with permutations containing negative cycles (휖(퐶) = −1) do not contribute to principal minors.

To illustrate the additional information contained in higher order principal minors ∆푆, |푆| > 2, we first consider principal minors of order three and four. Consider the principal

minor ∆1,2,3, given by

∆1,2,3 = 퐾1,1퐾2,2퐾3,3 − 퐾1,1퐾2,3퐾3,2 − 퐾2,2퐾1,3퐾3,1 − 퐾3,3퐾1,2퐾2,1

+ 퐾1,2퐾2,3퐾3,1 + 퐾3,2퐾2,1퐾1,3 [︀ ]︀ [︀ ]︀ [︀ ]︀ = ∆1∆2∆3 − ∆1 ∆2∆3 − ∆2,3 − ∆2 ∆1∆3 − ∆1,3 − ∆3 ∆1∆2 − ∆1,2 [︀ ]︀ + 1 + 휖1,2휖2,3휖3,1 퐾1,2퐾2,3퐾3,1.

If the corresponding graph 퐺[{1, 2, 3}] is not a cycle, or if the cycle is negative (휖1,2휖2,3휖3,1 =

−1), then ∆1,2,3 can be written in terms of principal minors of order at most two, and contains no additional information about 퐾. If 퐺[{1, 2, 3}] is a positive cycle, then we can write

퐾1,2퐾2,3퐾3,1 as a function of principal minors of order at most three,

1 1 퐾 퐾 퐾 = ∆ ∆ ∆ − [︀∆ ∆ + ∆ ∆ + ∆ ∆ ]︀ + ∆ , 1,2 2,3 3,1 1 2 3 2 1 2,3 2 1,3 3 1,2 2 1,2,3

which allows us to compute sgn(퐾1,2퐾2,3퐾3,1). This same procedure holds for any posi- tively charged induced simple cycle in 퐺. However, when a simple cycle is not induced,

43 further analysis is required. To illustrate some of the potential issues for a non-induced positive simple cycle, we consider principal minors of order four.

Consider the principal minor ∆1,2,3,4. By our previous analysis, all terms in the Laplace

expansion of ∆1,2,3,4 corresponding to permutations with cycles of length at most three can be written in terms of principal minors of size at most three. What remains is to consider the sum of the terms in the Laplace expansion corresponding to permutations containing a cycle of length four (which there are three pairs of, each pair corresponding to the two orientations of a cycle of length four in the graph sense), which we denote by 푍, given by

푍 = −퐾1,2퐾2,3퐾3,4퐾4,1 − 퐾1,2퐾2,4퐾4,3퐾3,1 − 퐾1,3퐾3,2퐾2,4퐾4,1

− 퐾4,3퐾3,2퐾2,1퐾1,4 − 퐾4,2퐾2,1퐾1,3퐾3,4 − 퐾4,2퐾2,3퐾3,1퐾1,4.

If 퐺[{1, 2, 3, 4}] does not contain a positive simple cycle of length four, then 푍 = 0 and

∆1,2,3,4 can be written in terms of principal minors of order at most three. If 퐺[{1, 2, 3, 4}] has exactly one positive cycle of length four, without loss of generality given by 퐶 := 1 2 3 4 1, then

[︀ ]︀ 푍 = − 1 + 휖(퐶) 퐾1,2퐾2,3퐾3,4퐾4,1 = −2퐾1,2퐾2,3퐾3,4퐾4,1,

and we can write 퐾1,2퐾2,3퐾3,4퐾4,1 as a function of principal minors of order at most four, which allows us to compute sgn(퐾1,2퐾2,3퐾3,4퐾4,1). Finally, we treat the case in which there is more than one positive simple cycle of length four. This implies that all simple cycles of length four are positive and 푍 is given by

[︀ ]︀ 푍 = −2 퐾1,2퐾2,3퐾3,4퐾4,1 + 퐾1,2퐾2,4퐾4,3퐾3,1 + 퐾1,3퐾3,2퐾2,4퐾4,1 . (2.11)

By Condition (2.6), the magnitude of each of these three terms is distinct, 푍 ̸= 0, and we can

compute the sign of each of these terms, as there is a unique choice of 휑1, 휑2, 휑3 ∈ {−1, +1}

44 satisfying

푍 휑 |퐾 퐾 퐾 퐾 | + 휑 |퐾 퐾 퐾 퐾 | + 휑 |퐾 퐾 퐾 퐾 | = − . 1 1,2 2,3 3,4 4,1 2 1,2 2,4 4,3 3,1 3 1,3 3,2 2,4 4,1 2

In order to better understand the behavior of principal minors of order greater than four, we investigate the set of positive simple cycles (i.e., simple cycles 퐶 satisfying 휖(퐶) = +1). In the following two propositions, we compute the dimension of 풞+(퐺), and note that when 퐺 is two-connected, this space is spanned by incidence vectors corresponding to positive simple cycles.

Proposition 2. Let 퐺 = ([푁], 퐸, 휖) be a charged graph. Then dim(풞+(퐺)) = 휈 − 1 if 퐺 contains at least one negative cycle, and equals 휈 otherwise.

Proof. Consider a basis {푥1, ..., 푥휈} for 풞(퐺). If all associated cycles are positive, then dim(풞+(퐺)) = 휈 and we are done. If 퐺 has a negative cycle, then, by the linearity of 휖, at

least one element of {푥1, ..., 푥휈} must be the incidence vector of a negative cycle. Without

loss of generality, suppose that 푥1, ..., 푥푖 are incidence vectors of negative cycles. Then

푥1 + 푥2, ..., 푥1 + 푥푖, 푥푖+1, ..., 푥휈 is a linearly independent set of incidence vectors of positive + cycles. Therefore, dim(풞 (퐺)) ≥ 휈 − 1. However, 푥1 is the incidence vector of a negative

cycle, and so cannot be in the span of 푥1 + 푥2, ..., 푥1 + 푥푖, 푥푖+1, ..., 푥휈.

Proposition 3. Let 퐺 = ([푁], 퐸, 휖) be a two-connected charged graph. Then

+ 풞 (퐺) = span{휒퐶 | 퐶 is a simple cycle of 퐺, 휖(퐶) = +1}.

Proof. If 퐺 does not contain a negative cycle, then the result follows immediately, as then 풞+(퐺) = 풞(퐺). Suppose then that 퐺 has at least one negative cycle. Decomposing this negative cycle into the union of edge-disjoint simple cycles, we note that 퐺 must also have at least one negative simple cycle. Since 퐺 is a two-connected graph, it admits a proper (also called open) ear decomposition with 휈 − 1 proper ears (Whitney [128], see also [63]) starting from any simple cycle. We

choose our initial simple cycle to be a negative simple cycle denoted by 퐺0. Whitney’s proper

45 ear decomposition says that we can obtain a sequence of graphs 퐺0, 퐺1, ··· , 퐺휈−1 = 퐺 where 퐺푖 is obtained from 퐺푖−1 by adding a path 푃푖 between two distinct vertices 푢푖 and 푣푖

of 푉 (퐺푖−1) with its internal vertices not belonging to 푉 (퐺푖−1) (a proper or open ear). By

construction, 푃푖 is also a path between 푢푖 and 푣푖 in 퐺푖. We will prove a stronger statement by induction on 푖, namely that for suitably constructed

positive simple cycles 퐶푗, 푗 = 1, ..., 휈 − 1, we have that, for every 푖 = 0, 1, ..., 휈 − 1:

+ (i) 풞 (퐺푖) = span{휒퐶푗 : 1 ≤ 푗 ≤ 푖},

(ii) for every pair of distinct vertices 푢, 푣 ∈ 푉 (퐺푖), there exists both a positive path in 퐺푖

between 푢 and 푣 and a negative path in 퐺푖 between 푢 and 푣.

For 푖 = 0, (i) is clear and (ii) follows from the fact that the two paths between 푢 and 푣 in

the cycle 퐺0 must have opposite charge since their sum is 퐺0 and 휖(퐺0) = −1. For 푖 > 0, we assume that we have constructed 퐶1, ··· , 퐶푖−1 satisfying (i) and (ii) for smaller values

of 푖. To construct 퐶푖, we take the path 푃푖 between 푢푖 and 푣푖 and complete it with a path of

charge 휖(푃푖) between 푢푖 and 푣푖 in 퐺푖−1. The existence of this latter path follows from (ii),

and these two paths together form a simple cycle 퐶푖 with 휖(퐶푖) = +1. It is clear that this 퐶푖

is linearly independent from all the previously constructed cycles 퐶푗 since 퐶푖 contains 푃푖

but 푃푖 was not even part of 퐺푖−1. This implies (i) for 푖.

To show (ii) for 퐺푖, we need to consider three cases for 푢 ̸= 푣. If 푢, 푣 ∈ 푉 (퐺푖−1), we

can use the corresponding positive and negative paths in 퐺푖−1 (whose existence follows

from (ii) for 푖 − 1). If 푢, 푣 ∈ 푉 (푃푖), one of the paths can be the subpath 푃푢푣 of 푃푖 between

푢 and 푣 and the other path can be 푃푖 ∖ 푃푢푣 together with a path in 퐺푖−1 between 푢푖 and 푣푖 of

charge equal to −휖(푃푖) (so that together they form a negative cycle). Otherwise, we must

have 푢 ∈ 푉 (푃푖) ∖ {푢푖, 푣푖} and 푣 ∈ 푉 (퐺푖−1) ∖ {푢푖, 푣푖} (or vice versa), and we can select the

paths to be the path in 푃푖 from 푢 to 푢푖 together with two oppositely charged paths in 퐺푖−1

between 푢푖 and 푣. In all these cases, we have shown property (ii) holds for 퐺푖.

For the remainder of the analysis in this subsection, we assume that 퐺 contains a negative cycle, otherwise 풞+(퐺) = 풞(퐺) and we inherit all of the desirable properties of 풞(퐺). Next, we study the properties of simple cycle bases (bases consisting of incidence

46 vectors of simple cycles) of 풞+(퐺) that are minimal in some sense. We say that a simple

+ cycle basis {휒퐶1 , ..., 휒퐶휈−1 } for 풞 (퐺) is lexicographically minimal if it lexicographically

minimizes (|퐶1|, |퐶2|, ..., |퐶휈−1|), i.e., minimizes |퐶1|, minimizes |퐶2| conditional on |퐶1| ∑︀ being minimal, etc. Any lexicographical minimal basis also minimizes |퐶푖| and max |퐶푖| (by optimality of the greedy algorithm for (linear) ). For brevity, we will simply refer to such a basis as “minimal." One complicating issue is that, while a minimal cycle basis for 풞(퐺) always consists of induced simple cycles, this is no longer the case for 풞+(퐺). This for example can happen for an appropriately constructed graph with only two negative, short simple cycles, far away from each other; a minimal cycle basis for 풞+(퐺) then contains a cycle consisting of the union of these 2 negative simple cycles. However, we can make a number of statements regarding chords of simple cycles corresponding to elements of a minimal simple cycle basis of 풞+(퐺). In the following lemma, we show that a minimal simple cycle basis satisfies a number of desirable properties. However, before we do so, we introduce some useful notation. Given a simple cycle 퐶, it is

convenient to fix an orientation say 퐶 = 푖1 푖2 ... 푖푘 푖1. Now for any chord {푖푘1 , 푖푘2 } ∈ 훾(퐶),

푘1 < 푘2, we denote the two cycles created by this chord by

퐶(푘1, 푘2) = 푖푘1 푖푘1+1 ... 푖푘2 푖푘1 and 퐶(푘2, 푘1) = 푖푘2 푖푘2+1 ... 푖푘 푖1 ... 푖푘1 푖푘2 .

We have the following result.

Lemma 6. Let 퐺 = ([푁], 퐸, 휖) be a two-connected charged graph, and 퐶 := 푖1 푖2 ... 푖푘 푖1

be a cycle corresponding to an incidence vector of a minimal simple cycle basis {푥1, ..., 푥휈−1} of 풞+(퐺). Then

(︀ )︀ (︀ )︀ (i) 휖 퐶(푘1, 푘2) = 휖 퐶(푘2, 푘1) = −1 for all {푖푘1 , 푖푘2 } ∈ 훾(퐶),

(ii) if {푖푘1 , 푖푘2 }, {푖푘3 , 푖푘4 } ∈ 훾(퐶) satisfy 푘1 < 푘3 < 푘2 < 푘4 (crossing chords), then either

푘3 − 푘1 = 푘4 − 푘2 = 1 or 푘1 = 1, 푘2 − 푘3 = 1, 푘4 = 푘, i.e. these two chords form a four-cycle with two edges of the cycle,

(iii) there does not exist {푖푘1 , 푖푘2 }, {푖푘3 , 푖푘4 }, {푖푘5 , 푖푘6 } ∈ 훾(퐶) satisfying either 푘1 < 푘2 ≤

47 푘3 < 푘4 ≤ 푘5 < 푘6 or 푘6 ≤ 푘1 < 푘2 ≤ 푘3 < 푘4 ≤ 푘5.

Proof. Without loss of generality, suppose that 퐶 = 1 2 ... 푘. Given a chord {푘1, 푘2}, (︀ )︀ (︀ )︀ 퐶(푘1, 푘2) + 퐶(푘2, 푘1) = 퐶, and so 휖(퐶) = +1 implies that 휖 퐶(푘1, 푘2) = 휖 퐶(푘2, 푘1) . If both these cycles were positive, then this would contradict the minimality of the basis, as

|퐶(푘1, 푘2)|, |퐶(푘2, 푘1)| < 푘. This completes the proof of Property (i).

Given two chords {푘1, 푘2} and {푘3, 푘4} satisfying 푘1 < 푘3 < 푘2 < 푘4, we consider the cycles

퐶1 := 퐶(푘1, 푘2) + 퐶(푘3, 푘4),

퐶2 := 퐶(푘1, 푘2) + 퐶(푘4, 푘3).

By Property (i), 휖(퐶1) = 휖(퐶2) = +1. In addition, 퐶1 + 퐶2 = 퐶, and |퐶1|, |퐶2| ≤ |퐶|.

By the minimality of the basis, either |퐶1| = |퐶| or |퐶2| = |퐶|, which implies that either

|퐶1| = 4 or |퐶2| = 4 (or both). This completes the proof of Property (ii).

Given three non-crossing chords {푘1, 푘2}, {푘3, 푘4}, and {푘5, 푘6}, with 푘1 < 푘2 ≤ 푘3 <

푘4 ≤ 푘5 < 푘6 (the other case is identical, up to rotation), we consider the cycles

퐶1 := 퐶(푘3, 푘4) + 퐶(푘5, 푘6) + 퐶,

퐶2 := 퐶(푘1, 푘2) + 퐶(푘5, 푘6) + 퐶,

퐶3 := 퐶(푘1, 푘2) + 퐶(푘3, 푘4) + 퐶.

By Property (i), 휖(퐶1) = 휖(퐶2) = 휖(퐶3) = +1. In addition, 퐶1 + 퐶2 + 퐶3 = 퐶 and

|퐶1|, |퐶2|, |퐶3| < |퐶|, a contradiction to the minimality of the basis. This completes the proof of Property (iii).

The above lemma tells us that in a minimal simple cycle basis for 풞+(퐺), in each cycle two crossing chords always form a positive cycle of length four (and thus any chord has at most one other crossing chord), and there doesn’t exist three non-crossing chords without one chord in between the other two. Using a simple cycle basis of 풞+(퐺) that satisfies the three properties of the above lemma, we aim to show that the principal minors of order at

48 most the length of the longest cycle in the basis uniquely determine the principal minors of all orders. To do so, we prove a series of three lemmas that show that from the principal

minor corresponding to a cycle 퐶 = 푖1 ... 푖푘 푖1 in our basis and 푂(1) principal minors of subsets of 푉 (퐶), we can compute the quantity

∏︁ (︀ )︀ 푠(퐶) := sgn 퐾푖,푗 . {푖,푗}∈퐸(퐶), 푖<푗

Once we have computed 푠(퐶푖) for every simple cycle in our basis, we can compute 푠(퐶) for any simple positive cycle by writing 퐶 as a sum of some subset of the simple cycles

퐶1, ..., 퐶휈−1 corresponding to incidence vectors in our basis, and taking the product of 푠(퐶푖) for all indices 푖 in the aforementioned sum. To this end, we prove the following three lemmas.

Lemma 7. Let 퐶 be a positive simple cycle of 퐺 = ([푁], 퐸, 휖) with 푉 (퐶) = {푖1, 푖2, ··· , 푖푘} whose chords 훾(퐶) satisfy Properties (i)-(iii) of Lemma 6. Then there exist two edges of 퐶,

say, {푖1, 푖푘} and {푖ℓ, 푖ℓ+1}, where 퐶 := 푖1 푖2 ... 푖푘 푖1 such that

(i) all chords {푖푝, 푖푞} ∈ 훾(퐶), 푝 < 푞, satisfy 푝 ≤ ℓ < 푞,

(ii) any positive simple cycle in 퐺[푉 (퐶)] containing {푖1, 푖푘} spans 푉 (퐶) and contains only edges in 퐸(퐶) and pairs of crossed chords.

Proof. We first note that Property (i) of Lemma 6 implies that the charge of a cycle 퐶′ ⊂ 퐺[푉 (퐶)] corresponds to the parity of 퐸(퐶′) ∩ 훾(퐶), i.e., if 퐶′ contains an even number of chords then 휖(퐶′) = +1, otherwise 휖(퐶′) = −1. This follows from constructing a cycle basis for 퐺[푉 (퐶)] consisting of incidence vectors corresponding to 퐶 and 퐶(푢, 푣) for every chord {푢, 푣} ∈ 훾(퐶). Let the cycle 퐶(푖, 푗) be a shortest cycle in 퐺[푉 (퐶)] containing exactly one chord of 퐶. If this is a non-crossed chord, then we take an arbitrary edge in 퐸(퐶(푖, 푗))∖{푖, 푗} to be

′ ′ ′ ′ ′ ′ {푖1, 푖푘}. If this chord crosses another, denoted by {푖 , 푗 }, then either 퐶(푖 , 푗 ) or 퐶(푗 , 푖 ) is also a shortest cycle, and we take an arbitrary edge in the intersection of these two shortest

49 cycles. Without loss of generality, let {1, 푘} denote this edge and let the cycle be 1 2 ... 푘 1, where for simplicity we have assumed that 푉 (퐶) = [푘]. Because of Property (iii) of Lemma 6, there exists ℓ such that all chords {푖, 푗} ∈ 훾(퐶) satisfy 푖 ≤ ℓ and 푗 > ℓ. This completes the proof of Property (i).

Next, consider an arbitrary positive simple cycle 퐶′ in 퐺[푉 (퐶)] containing {1, 푘}. We aim to show that this cycle contains only edges in 퐸(퐺) and pairs of crossed chords, from which the equality 푉 (퐶′) = 푉 (퐶) immediately follows. Since 퐶′ is positive, it contains an even number of chords 훾(퐶), and either all the chords 훾(퐶) ∩ 퐸(퐶′) are pairs

′ of crossed chords or there exist two chords {푝1, 푞1}, {푝2, 푞2} ∈ 훾(퐶) ∩ 퐸(퐶 ), w.l.o.g.

푝1 ≤ 푝2 ≤ ℓ < 푞2 < 푞1 (we can simply reverse the orientation if 푞1 = 푞2), that do not cross ′ ′ any other chord in 훾(퐶) ∩ 퐸(퐶 ). In this case, there would be a 1 − 푞2 path in 퐶 neither ′ containing 푝1 nor 푞1 (as {푝1, 푞1} would be along the other path between 1 and 푞2 in 퐶 and ′ 푞1 ̸= 푞2). However, this is a contradiction as no such path exists in 퐶 , since {푝1, 푞1} does not cross any chord in 훾(퐶) ∩ 퐸(퐶′). Therefore, the chords 훾(퐶) ∩ 퐸(퐶′) are all pairs of crossed chords. Furthermore, since all pairs of crossed chords must define cycles of length 4, we have 푉 (퐶′) = 푉 (퐶). This completes the proof of Property (ii).

Although this is not needed in what follows, observe that, in the above Lemma 7,

{푖ℓ, 푖ℓ+1} also plays the same role as {푖1, 푖푘} in the sense that any positive simple cycle

containing {푖ℓ, 푖ℓ+1} also spans 푉 (퐶) and contains only pairs of crossed chords and edges in 퐸(퐶).

Lemma 8. Let 퐾 ∈ 풦 have charged sparsity graph 퐺 = ([푁], 퐸, 휖), and 퐶 = 푖1 ... 푖푘 푖1 be a positive simple cycle of 퐺 whose chords 훾(퐶) satisfy Properties (i)-(iii) of Lemma 6, and with vertices ordered as in Lemma 7. Then the principal minor corresponding to

50 푆 = 푉 (퐶) is given by

[︀ ]︀ ∆푆 = ∆푖1 ∆푆∖푖1 + ∆푖푘 ∆푆∖푖푘 + ∆푖1,푖푘 − 2∆푖1 ∆푖푘 ∆푆∖푖1,푖푘

− 2퐾푖1,푖푘−1 퐾푖푘−1,푖푘 퐾푖푘,푖2 퐾푖2,푖1 ∆푆∖푖1,푖2,푖푘−1,푖푘 [︀ ]︀[︀ ]︀ − ∆푖1,푖2 − ∆푖1 ∆푖2 ∆푖푘−1,푖푘 − ∆푖푘−1 ∆푖푘 ∆푆∖푖1,푖2,푖푘−1,푖푘 [︀ ]︀[︀ ]︀ + ∆푖1,푖푘−1 − ∆푖1 ∆푖푘−1 ∆푖2,푖푘 − ∆푖2 ∆푖푘 ∆푆∖푖1,푖2,푖푘−1,푖푘 [︀ ]︀[︀ ]︀ + ∆푖1,푖2 − ∆푖1 ∆푖2 ∆푆∖푖1,푖2 − ∆푖푘 ∆푆∖푖1,푖2,푖푘 [︀ ]︀[︀ ]︀ + ∆푖푘−1,푖푘 − ∆푖푘−1 ∆푖푘 ∆푆∖푖푘−1,푖푘 − ∆푖2 ∆푆∖푖1,푖푘−1,푖푘 + 푍.

where 푍 is the sum of terms in the Laplace expansion of ∆푆 corresponding to a permutation

where 휎(푖1) = 푖푘 or 휎(푖푘) = 푖1, but not both.

Proof. Without loss of generality, suppose that 퐶 = 1 2 ... 푘 1. Each term in ∆푆 corre- sponds to a partition of 푆 = 푉 (퐶) into a disjoint collection of vertices, pairs corresponding

to edges of 퐺[푆], and simple cycles 퐶푖 of 퐸(푆) which can be assumed to be positive. We

can decompose the Laplace expansion of ∆푆 into seven sums of terms, based on how the as-

sociated permutation for each term treats the elements 1 and 푘. Let 푋푗1,푗2 , 푗1, 푗2 ∈ {1, 푘, *},

equal the sum of terms corresponding to permutations where 휎(1) = 푗1 and 휎(푘) = 푗2, where * denotes any element in {2, ..., 푘 − 1}. The case 휎(1) = 휎(푘) obviously cannot occur, and so

∆푆 = 푋1,푘 + 푋1,* + 푋*,푘 + 푋푘,1 + 푋푘,* + 푋*,1 + 푋*,*.

By definition, 푍 = 푋푘,* + 푋*,1. What remains is to compute the remaining five terms. We have

푋푘,1 = −퐾1,푘퐾푘,1∆푆∖1,푘 [︀ ]︀ = ∆1,푘 − ∆1∆푘 ∆푆∖1,푘,

51 푋1,푘 = ∆1∆푘∆푆∖1,푘,

푋1,푘 + 푋1,* = ∆1∆푆∖1,

푋1,푘 + 푋*,푘 = ∆푘∆푆∖푘,

and so

[︀ ]︀ ∆푆 = ∆1∆푆∖1 + ∆푘∆푆∖푘 + ∆1,푘 − 2∆1∆푘 ∆푆∖1,푘 + 푋*,* + 푍.

The sum 푋*,* corresponds to permutations where 휎(1) ∈/ {1, 푘} and 휎(푘) ∈/ {1, 푘}. We first show that the only permutations that satisfy these two properties take the form 휎2(1) = 1 or 휎2(푘) = 푘 (or both), or contain both 1 and 푘 in the same cycle. Suppose to the contrary, that there exists some permutation where 1 and 푘 are not in the same cycle, and both are in cycles of length at least three. Then in 퐺[푆] both vertices 1 and 푘 contain a chord emanating from them. By Property (ii) of Lemma 6, each has exactly one chord and these chords are {1, 푘 − 1} and {2, 푘}. The cycle containing 1 must also contain 2 and 푘 − 1, and the cycle containing 푘 must also contain 2 and 푘 − 1, a contradiction. In addition, we note that, also by the above analysis, the only cycles (of a permutation) that can contain both 1 and 푘 without having 휎(1) = 푘 or 휎(푘) = 1 are given by (1 2 푘 푘−1)

and (1 푘 − 1 푘 2). Therefore, we can decompose 푋*,* further into the sum of five terms. Let

• 푌1 = the sum of terms corresponding to permutations containing either (1 2 푘 푘 − 1) or (1 푘 − 1 푘 2),

• 푌2 = the sum of terms corresponding to permutations containing (1 2) and (푘 − 1 푘),

• 푌3 = the sum of terms corresponding to permutations containing (1 푘 − 1) and (2 푘),

• 푌4 = the sum of terms corresponding to permutations containing (1 2) and where 휎(푘) ̸= 푘 − 1, 푘,

• 푌5 = the sum of terms corresponding to permutations containing (푘 − 1 푘) and where 휎(1) ̸= 1, 2.

52 푌1 and 푌3 are only non-zero if {1, 푘 − 1} and {2, 푘} are both chords of 퐶. 푌4 is only

non-zero if there is a chord incident to 푘, and 푌5 is only non-zero if there is a chord incident

to 1. We have 푋*,* = 푌1 + 푌2 + 푌3 + 푌4 + 푌5, and

푌1 = −2퐾1,푘−1퐾푘−1,푘퐾푘,2퐾2,1∆푆∖1,2,푘−1,푘,

푌2 = 퐾1,2퐾2,1퐾푘−1,푘퐾푘,푘−1∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ = ∆1,2 − ∆1∆2 ∆푘−1,푘 − ∆푘−1∆푘 ∆푆∖1,2,푘−1,푘,

푌3 = 퐾1,푘−1퐾푘−1,1퐾2,푘퐾푘,2∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ = ∆1,푘−1 − ∆1∆푘−1 ∆2,푘 − ∆2∆푘 ∆푆∖1,2,푘−1,푘, [︀ ]︀ 푌2 + 푌4 = −퐾1,2퐾2,1 ∆푆∖1,2 − ∆푘∆푆∖1,2,푘 [︀ ]︀[︀ ]︀ = ∆1,2 − ∆1∆2 ∆푆∖1,2 − ∆푘∆푆∖1,2,푘 , [︀ ]︀ 푌2 + 푌5 = −퐾푘−1,푘퐾푘,푘−1 ∆푆∖푘−1,푘 − ∆1∆푆∖1,푘−1,푘 [︀ ]︀[︀ ]︀ = ∆푘−1,푘 − ∆푘−1∆푘 ∆푆∖푘−1,푘 − ∆2∆푆∖1,푘−1,푘 .

Combining all of these terms gives us

푋*,* = −2퐾1,푘−1퐾푘−1,푘퐾푘,2퐾2,1∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ − ∆1,2 − ∆1∆2 ∆푘−1,푘 − ∆푘−1∆푘 ∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ + ∆1,푘−1 − ∆1∆푘−1 ∆2,푘 − ∆2∆푘 ∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ + ∆1,2 − ∆1∆2 ∆푆∖1,2 − ∆푘∆푆∖1,2,푘 [︀ ]︀[︀ ]︀ + ∆푘−1,푘 − ∆푘−1∆푘 ∆푆∖푘−1,푘 − ∆2∆푆∖1,푘−1,푘 .

53 Combining our formula for 푋*,* with our formula for ∆푆 leads to the desired result

[︀ ]︀ ∆푆 = ∆1∆푆∖1 + ∆푘∆푆∖푘 + ∆1,푘 − 2∆1∆푘 ∆푆∖1,푘

− 2퐾1,푘−1퐾푘−1,푘퐾푘,2퐾2,1∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ − ∆1,2 − ∆1∆2 ∆푘−1,푘 − ∆푘−1∆푘 ∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ + ∆1,푘−1 − ∆1∆푘−1 ∆2,푘 − ∆2∆푘 ∆푆∖1,2,푘−1,푘 [︀ ]︀[︀ ]︀ + ∆1,2 − ∆1∆2 ∆푆∖1,2 − ∆푘∆푆∖1,2,푘 [︀ ]︀[︀ ]︀ + ∆푘−1,푘 − ∆푘−1∆푘 ∆푆∖푘−1,푘 − ∆2∆푆∖1,푘−1,푘 + 푍.

Lemma 9. Let 퐾 ∈ 풦 have two-connected charged sparsity graph 퐺 = ([푁], 퐸, 휖), and

퐶 = 푖1 ... 푖푘 푖1 be a positive simple cycle of 퐺 whose chords 훾(퐶) satisfy Properties (i)-(iii) of Lemma 6, with vertices ordered as in Lemma 7. Let 푍 equal the sum of terms in the

Laplace expansion of ∆푆, 푆 = 푉 (퐶), corresponding to a permutation where 휎(푖1) = 푖푘 2 or 휎(푖푘) = 푖1, but not both. Let 푈 ⊂ 푆 be the set of pairs (푎, 푏), 푎 < 푏, for which

{푖푎, 푖푏−1}, {푖푎+1, 푖푏} ∈ 훾(퐶), i.e., cycle edges {푖푎, 푖푎+1} and {푖푏−1, 푖푏}) have a pair of crossed chords between them. Then 푍 is equal to

푘−1 [︂ ]︂ 푘+1 ∏︁ ∏︁ 휖푖푎,푖푎+1 퐾푖푏−1,푖푎 퐾푖푎+1,푖푏 2 (−1) 퐾푖푘,푖1 퐾푖푗 ,푖푗+1 1 − . 퐾푖 ,푖 퐾푖 ,푖 푗=1 (푎,푏)∈푈 푎 푎+1 푏−1 푏

Proof. Without loss of generality, suppose that 퐺 is labelled so that 퐶 = 1 2 ... 푘 1. Edge {1, 푘} satisfies Property (ii) of Lemma 7, and so every positive simple cycle of 퐺[푆] containing {1, 푘} spans 푆 and contains only edges in 퐸(퐶) and pairs of crossing chords. Therefore, we may assume without loss of generality that 퐺[푆] contains 푝 pairs of crossing chords, and no other chords. If 푝 = 0, then 퐶 is an induced cycle and the result follows immediately. There are 2푝 Hamiltonian 1 − 푘 paths in 퐺[푆] not containing {1, 푘}, corresponding to the 푝 binary choices of whether to use each pair of crossing chords or not. Let us

54 1,푘 푝 푡ℎ denote these paths from 1 to 푘 by 푃휃 , where 휃 ∈ Z2, and 휃푖 equals one if the 푖 crossing is used in the path (ordered based on increasing distance to {1, 푘}), and zero otherwise. 1,푘 In particular, 푃0 corresponds to the path = 1 2 ... 푘. We only consider paths with the orientation 1 → 푘, as all cycles under consideration are positive, and the opposite orientation has the same sum. Denoting the product of the terms of 퐾 corresponding to the 1,푘 path 푃휃 = 1 = ℓ1 ℓ2 ... ℓ푘 = 푘, by

푘−1 (︀ 1,푘)︀ ∏︁ 퐾 푃휃 = 퐾ℓ푗 ,ℓ푗+1 , 푗=1 we have (︀ 1,푘)︀ (︁ )︁ ∑︁ 퐾 푃 푍 = 2 (−1)푘+1 퐾 퐾 푃 1,푘 휃 , 푘,1 0 (︀ 1,푘)︀ 푝 퐾 푃0 휃∈Z2 where 푘−1 (︀ 1,푘)︀ ∏︁ 퐾 푃0 = 퐾푗,푗+1. 푗=1 Suppose that the first possible crossing occurs at cycle edges {푎, 푎 + 1} and {푏 − 1, 푏}, 푎 < 푏, i.e., {푎, 푏 − 1} and {푎 + 1, 푏} are crossing chords. Then

푘−1 (︀ 푎,푏)︀ ∏︁ ∑︁ 퐾 푃 푍 = 2 (−1)푘+1 퐾 퐾 휃 . 푘,1 푗,푗+1 (︀ 푎,푏)︀ 푗=1 푝 퐾 푃0 휃∈Z2

We have ∑︁ (︀ 푎,푏)︀ ∑︁ (︀ 푎,푏)︀ ∑︁ (︀ 푎,푏)︀ 퐾 푃휃 = 퐾 푃휃 + 퐾 푃휃 , 푝 휃 =0 휃 =1 휃∈Z2 1 1 and

∑︁ (︀ 푎,푏)︀ ∑︁ (︀ 푎+1,푏−1)︀ 퐾 푃휃 = 퐾푎,푎+1퐾푏−1,푏 퐾 푃휃′ , 휃1=0 ′ 푝−1 휃 ∈Z2

55 ∑︁ (︀ 푎,푏)︀ ∑︁ (︀ 푏−1,푎+1)︀ 퐾 푃휃 = 퐾푎,푏−1퐾푎+1,푏 퐾 푃휃′ 휃1=1 ′ 푝−1 휃 ∈Z2 ∑︁ (︀ 푎+1,푏−1)︀ (︀ 푎+1,푏−1)︀ = 퐾푎,푏−1퐾푎+1,푏 퐾 푃휃′ 휖 푃휃′ . ′ 푝−1 휃 ∈Z2

By Property (i) of Lemma 6, 휖(︀퐶(푎, 푏 − 1))︀ = −1, and so

(︀ 푎+1,푏−1)︀ (︀ 푎+1,푏−1)︀ 휖 푃휃′ = −휖푏−1,푎휖푎,푎+1 and 퐾푎,푏−1휖 푃휃′ = −휖푎,푎+1퐾푏−1,푎.

This implies that

∑︁ (︀ 푎,푏)︀ (︀ )︀ ∑︁ (︁ 푎+1,푏−1)︁ 퐾 푃휃 = 퐾푎,푎+1퐾푏−1,푏 − 휖푎,푎+1퐾푏−1,푎퐾푎+1,푏 퐾 푃휃′ , 휃∈ 푝 ′ 푝−1 Z2 휃 ∈Z2 and that

(︀ 푎,푏)︀ (︂ )︂ (︀ 푎+1,푏−1)︀ ∑︁ 퐾 푃 휖푎,푎+1퐾푏−1,푎퐾푎+1,푏 ∑︁ 퐾 푃 ′ 휃 = 1 − 휃 . (︀ 푎,푏)︀ (︀ 푎+1,푏−1)︀ 퐾 푃 퐾푎,푎+1퐾푏−1,푏 퐾 푃 휃∈ 푝 0 ′ 푝−1 0 Z2 휃 ∈Z2

Repeating the above procedure for the remaining 푝 − 1 crossings completes the proof.

Equipped with Lemmas 6, 7, 8, and 9, we can now make a key observation. Suppose that

+ we have a simple cycle basis {푥1, ..., 푥휈−1} for 풞 (퐺) whose corresponding cycles all satisfy Properties (i)-(iii) of Lemma 6. Of course, a minimal simple cycle basis satisfies this, but in Subsections 2.3.2 and 2.3.3 we will consider alternate bases that also satisfy these conditions and may be easier to compute in practice. For cycles 퐶 of length at most four, we have already detailed how to compute 푠(퐶), and this computation requires only principal minors corresponding to subsets of 푉 (퐶). When 퐶 is of length greater than four, by Lemmas 8 and 9, we can also compute 푠(퐶), using only principal minors corresponding to subsets

of 푉 (퐶), as the quantity sgn(퐾푖1,푖푘−1 퐾푖푘−1,푖푘 퐾푖푘,푖2 퐾푖2,푖1 ) in Lemma 8 corresponds to a positive four-cycle, and in Lemma 9 the quantity

(︂ 휖 퐾 퐾 )︂ sgn 1 − 푖푎,푖푎+1 푖푏−1,푖푎 푖푎+1,푖푏 퐾푖푎,푖푎+1 퐾푖푏−1,푖푏

56 equals +1 if |퐾푖푎,푖푎+1 퐾푖푏−1,푖푏 | > |퐾푖푏−1,푖푎 퐾푖푎+1,푖푏 | and equals

(︀ )︀ −휖푖푎,푖푎+1 휖푖푏−1,푖푏 sgn 퐾푖푎,푖푎+1 퐾푖푎+1,푖푏 퐾푖푏,푖푏−1 퐾푖푏−1,푖푎

if |퐾푖푎,푖푎+1 퐾푖푏−1,푖푏 | < |퐾푖푏−1,푖푎 퐾푖푎+1,푖푏 |. By Condition (2.6), we have |퐾푖푎,푖푎+1 퐾푖푏−1,푖푏 | ̸=

|퐾푖푏−1,푖푎 퐾푖푎+1,푖푏 |. Therefore, given a simple cycle basis {푥1, ..., 푥휈−1} whose corresponding cycles satisfy Properties (i)-(iii) of Lemma 6, we can compute 푠(퐶) for every such cycle in the basis using only principal minors of size at most the length of the longest cycle in the basis. Given this fact, we are now prepared to answer Question (2.7) and provide an alternate proof for Question (2.8) through the following proposition and theorem.

Proposition 4. Let 퐾 ∈ 풦, with charged sparsity graph 퐺 = ([푁], 퐸, 휖). The set of 퐾′ ∈ 풦

′ that satisfy ∆푆(퐾) = ∆푆(퐾 ) for all 푆 ⊂ [푁] is exactly the set generated by 퐾 and the operations

푁×푁 풟푁 -similarity: 퐾 → 퐷퐾퐷, where 퐷 ∈ R is an arbitrary involutory diagonal matrix, i.e., 퐷 has entries ±1 on the diagonal, and

′ ′ block transpose: 퐾 → 퐾 , where 퐾푖,푗 = 퐾푗,푖 for all 푖, 푗 ∈ 푉 (퐻) for some block 퐻, and ′ 퐾푖,푗 = 퐾푖,푗 otherwise.

Proof. We first verify that the 풟푁 -similarity and block transpose operations both preserve principal minors. The determinant is multiplicative, and so principal minors are immediately

preserved under 풟푁 -similarity, as the principal submatrix of 퐷퐾퐷 corresponding to 푆 is equal to the product of the principal submatrices corresponding to 푆 of the three matrices 퐷, 퐾, 퐷, and so

∆푆(퐷퐾퐷) = ∆푆(퐷)∆푆(퐾)∆푆(퐷) = ∆푆(퐾).

For the block transpose operation, we note that the transpose leaves the determinant un- changed. By equation (2.10), the principal minors of a matrix are uniquely determined by the principal minors of the matrices corresponding to the blocks of 퐺. As the transpose of any block also leaves the principal minors corresponding to subsets of other blocks unchanged,

principal minors are preserved under 풟푁 -similarity.

57 ′ ′ ′ What remains is to show that if 퐾 satisfies ∆푆(퐾) = ∆푆(퐾 ) for all 푆 ⊂ [푁], then 퐾 is generated by 퐾 and the above two operations. Without loss of generality, we may suppose that the shared charged sparsity graph 퐺 = ([푁], 퐸, 휖) of 퐾 and 퐾′ is two-connected, as the general case follows from this one. We will transform 퐾′ to 퐾 by first making them agree for entries corresponding to a spanning tree and a negative edge of 퐺, and then observing that by Lemmas 8 and 9, this property implies that all entries agree.

Let 퐶′ be an arbitrary negative simple cycle of 퐺, 푒 ∈ 퐸(퐶′) be a negative edge in 퐶′,

′ and 푇 be a spanning tree of 퐺 containing the edges 퐸(퐶 ) ∖ 푒. By applying 풟푁 -similarity and block transpose to 퐾′, we can produce a matrix that agrees with 퐾 for all entries corresponding to edges in 퐸(푇 ) ∪ 푒. We perform this procedure in three steps, first by making 퐾′ agree with 퐾 for edges in 퐸(푇 ), then for edge 푒, and then finally fixing any edges in 퐸(푇 ) for which the matrices no longer agree.

′ ′ First, we make 퐾 and 퐾 agree for edges in 퐸(푇 ). Suppose that 퐾푝,푞 = −퐾푝,푞 for some {푝, 푞} ∈ 퐸(푇 ). Let 푈 ⊂ [푁] be the set of vertices connected to 푝 in the forest 푇 ∖ {푝, 푞} ˆ ˆ (the removal of edge {푝, 푞} from 푇 ), and 퐷 be the diagonal matrix with 퐷푖,푖 = +1 if 푖 ∈ 푈 ˆ ˆ ′ ˆ and 퐷푖,푖 = −1 if 푖 ̸∈ 푈. The matrix 퐷퐾 퐷 satisfies

[︀ ˆ ′ ˆ]︀ ˆ ′ ˆ ′ 퐷퐾 퐷 푝,푞 = 퐷푝,푝퐾푝,푞퐷푞,푞 = −퐾푝,푞 = 퐾푝,푞,

[︀ ˆ ′ ˆ]︀ ′ and 퐷퐾 퐷 푖,푗 = 퐾푖,푗 for any 푖, 푗 either both in 푈 or neither in 푈 (and therefore for all edges {푖, 푗} ∈ 퐸(푇 ) ∖ {푝, 푞}). Repeating this procedure for every edge {푝, 푞} ∈ 퐸(푇 ) for

′ which 퐾푝,푞 = −퐾푝,푞 results in a matrix that agrees with 퐾 for all entries corresponding to edges in 퐸(푇 ).

Next, we make our matrix and 퐾 agree for the edge 푒. If our matrix already agrees for edge 푒, then we are done with this part of the proof, and we denote the resulting matrix by 퐾ˆ . If our resulting matrix does not agree, then, by taking the transpose of this matrix, we have a new matrix that now agrees with 퐾 for all edges 퐸(푇 ) ∪ 푒, except for negative edges

in 퐸(푇 ). By repeating the 풟푁 -similarity operation again on negative edges of 퐸(푇 ), we again obtain a matrix that agrees with 퐾 on the edges of 퐸(푇 ), but now also agrees on the

58 edge 푒, as there is an even number of negative edges in the path between the vertices of 푒 in the tree 푇 . We now have a matrix that agrees with 퐾 for all entries corresponding to edges in 퐸(푇 ) ∪ 푒, and we denote this matrix by 퐾ˆ . Finally, we aim to show that agreement on the edges 퐸(푇 ) ∪ 푒 already implies that 퐾ˆ = 퐾. Let {푖, 푗} be an arbitrary edge not in 퐸(푇 ) ∪ 푒, and 퐶ˆ be the simple cycle containing edge {푖, 푗} and the unique 푖 − 푗 path in the tree 푇 . By Lemmas 8 and 9, the value

of 푠퐾 (퐶) can be computed for every cycle corresponding to an incidence vector in a minimal + ˆ simple cycle basis of 풞 (퐺) using only principal minors. Then 푠퐾 (퐶) can be computed ′ ′ using principal minors and 푠퐾 (퐶 ), as the incidence vector for 퐶 combined with a minimal ˆ positive simple cycle basis forms a basis for 풞(퐺). By assumption, ∆푆(퐾) = ∆푆(퐾) for ′ ′ ˆ ˆ all 푆 ⊂ [푁], and, by construction, 푠퐾 (퐶 ) = 푠퐾^ (퐶 ). Therefore, 푠퐾 (퐶) = 푠퐾^ (퐶) for all {푖, 푗} not in 퐸(푇 ) ∪ 푒, and so 퐾ˆ = 퐾.

Theorem 5. Let 퐾 ∈ 풦, with charged sparsity graph 퐺 = ([푁], 퐸, 휖). Let ℓ+ ≥ 3 be the + ′ ′ simple cycle sparsity of 풞 (퐺). Then any matrix 퐾 ∈ 풦 with ∆푆(퐾) = ∆푆(퐾 ) for all ′ ˆ |푆| ≤ ℓ+ also satisfies ∆푆(퐾) = ∆푆(퐾 ) for all 푆 ⊂ [푁], and there exists a matrix 퐾 ∈ 풦 ˆ ˆ with ∆푆(퐾) = ∆푆(퐾) for all |푆| < ℓ+ but ∆푆(퐾) ̸= ∆푆(퐾) for some 푆.

Proof. The first part of theorem follows almost immediately from Lemmas 8 and 9. It suffices to consider a matrix 퐾 ∈ 풦 with a two-connected charged sparsity graph 퐺 = ([푁], 퐸, 휖), as the more general case follows from this one. By Lemmas 8, and 9, the quantity 푠(퐶) is computable for all simple cycles 퐶 in a minimal cycle basis for 풞+(퐺) (and therefore for all positive simple cycles), using only principal minors of size at most the length of the longest cycle in the basis, which in this case is the simple cycle sparsity

+ ℓ+ of 풞 (퐺). The values 푠(퐶) for positive simple cycles combined with the magnitude of the entries of 퐾 and the charge function 휖 uniquely determines all principal minors, as each

term in the Laplace expansion of some ∆푆(퐾) corresponds to a partitioning of 푆 into a disjoint collection of vertices, pairs corresponding to edges, and oriented versions of positive simple cycles of 퐸(푆). This completes the first portion of the proof. ˆ Next, we explicitly construct a matrix 퐾 that agrees with 퐾 in the first ℓ+ − 1 principal

minors (ℓ+ ≥ 3), but disagrees on a principal minor of order ℓ+. To do so, we consider a

59 {︀ }︀ + minimal simple cycle basis 휒퐶1 , ..., 휒퐶휈−1 of 풞 (퐺), ordered so that |퐶1| ≥ |퐶2| ≥ ... ≥ ˆ |퐶휈−1|. By definition, ℓ+ = |퐶1|. Let 퐾 be a matrix satisfying

ˆ ˆ ˆ 퐾푖,푖 = 퐾푖,푖 for 푖 ∈ [푁], 퐾푖,푗퐾푗,푖 = 퐾푖,푗퐾푗,푖 for 푖, 푗 ∈ [푁],

and ⎧ ⎨⎪ 푠퐾 (퐶푖) if 푖 > 1 푠퐾^ (퐶푖) = . ⎩⎪−푠퐾 (퐶푖) if 푖 = 1

ˆ The matrix 퐾 and 퐾 agree on all principal minors of order at most ℓ+ − 1, for if there was a principal minor where they disagreed, this would imply that there is a positive simple

cycle of length less than ℓ+ whose incidence vector is not in the span of {휒퐶2 , ..., 휒퐶휈−1 }, a ˆ contradiction. To complete the proof, it suffices to show that ∆푉 (퐶1)(퐾) ̸= ∆푉 (퐶1)(퐾).

To do so, we consider three different cases, depending on the length of 퐶1. If 퐶1 is

a three-cycle, then 퐶1 is an induced cycle and the result follows immediately. If 퐶1 is a

four-cycle, 퐺[푉 (퐶1)] may possibly have multiple positive four-cycles. However, in this case the quantity 푍 from Equation (2.11) is distinct for 퐾ˆ and 퐾, as 퐾 ∈ 풦. Finally, we consider the general case when |퐶1| > 4. By Lemma 8, all terms of ∆푉 (퐶)(퐾) (and ˆ ∆푉 (퐶)(퐾)) except for 푍 depend only on principal minors of order at most ℓ+ − 1. We denote the quantity 푍 from Lemma 8 for 퐾 and 퐾ˆ by 푍 and 푍ˆ respectively. By Lemma 9, the magnitude of 푍 and 푍ˆ depend only on principal minors of order at most four, and so |푍| = |푍ˆ|. In addition, because 퐾 ∈ 풦, 푍 ̸= 0. The quantities 푍 and 푍ˆ are equal in sign if ˆ and only if 푠퐾^ (퐶1) = 푠퐾 (퐶1), and therefore ∆푉 (퐶1)(퐾) ̸= ∆푉 (퐶1)(퐾).

2.3.2 Efficiently Recovering a Matrix from its Principal Minors

In Subsection 2.3.1, we characterized both the set of magnitude-symmetric matrices in 풦

that share a given set of principal minors, and noted that principal minors of order ≤ ℓ+

uniquely determine principal minors of all orders, where ℓ+ is the simple cycle sparsity of 풞+(퐺) and is computable using principal minors of order one and two. In this subsection, we make use of a number of theoretical results of Subsection 2.3.1 to formally describe a

60 polynomial time algorithm to produce a magnitude-symmetric matrix 퐾 with prescribed principal minors. We provide a high-level description and discussion of this algorithm below, and save the description of a key subroutine for computing a positive simple cycle basis for after the overall description and proof.

An Efficient Algorithm

Our formal algorithm proceeds by completing a number of high-level tasks, which we describe below. This procedure is very similar in nature to the algorithm implicitly described and used in the proofs of Proposition 4 and Theorem 5. The main difference between the algorithmic procedure alluded to in Subsection 2.3.1 and this algorithm is the computation of a positive simple cycle basis. Unlike 풞(퐺), there is no known polynomial time algorithm to compute a minimal simple cycle basis for 풞+(퐺), and a decision version of this problem (︀ )︀ may indeed be NP-hard. Our algorithm, which we denote by RECOVERK {∆푆}푆⊂[푁] , proceeds in five main steps:

Step 1: Compute |퐾푖,푗|, 푖, 푗 ∈ [푁], and the charged sparsity graph 퐺 = ([푁], 퐸, 휖).

√︀ We recall that 퐾푖,푖 = ∆푖 and |퐾푖,푗| = |퐾푗,푖| = |∆푖∆푗 − ∆푖,푗|. The edges {푖, 푗} ∈

퐸(퐺) correspond to non-zero off-diagonal entries |퐾푖,푗|= ̸ 0, and the function 휖 is given

by 휖푖,푗 = sgn(∆푖∆푗 − ∆푖,푗).

+ Step 2: For every block 퐻 of 퐺, compute a simple cycle basis {푥1, ..., 푥푘} of 풞 (퐻).

In the proof of Proposition 3, we defined an efficient algorithm to compute a simple cycle basis of 풞+(퐻) for any two-connected graph 퐻. This algorithm makes use of the property that every two-connected graph has an open ear decomposition. Unfortunately, this algorithm has no provable guarantees on the length of the longest cycle in the basis. For this reason, we introduce an alternate efficient algorithm below that computes a

+ simple cycle basis of 풞 (퐻) consisting of cycles all of length at most 3휑퐻 , where 휑퐻 is

61 the maximum length of a shortest cycle between any two edges in 퐻, i.e.,

휑퐻 := max min |퐶| 푒,푒′∈퐸(퐻) simple cycle 퐶 푠.푡. 푒,푒′∈퐸(퐶)

with 휑퐻 := 2 if 퐻 is acyclic. The existence of a simple cycle through any two edges of a 2-connected graph can be deduced from the existence of two vertex-disjoint paths between newly added midpoints of these two edges. The resulting simple cycle basis also maximizes the number of three- and four-cycles it contains. This is a key property which allows us to limit the number of principal minors that we query.

Step 3: For every block 퐻 of 퐺, convert {푥1, ..., 푥푘} into a simple cycle basis satisfying Properties (i)-(iii) of Lemma 6.

If there was an efficient algorithm for computing a minimal simple cycle basis for 풞+(퐻), by Lemma 6, we would be done (and there would be no need for the previous step). However, there is currently no known algorithm for this, and a decision version of this problem may be NP-hard. By using the simple cycle basis from Step 2 and iteratively removing chords, we can create a basis that satisfies the same three key properties (in Lemma 6) that a minimal simple cycle basis does. In addition, the lengths of the cycles in this basis are no larger than those of Step 2, i.e., no procedure in Step 3 ever increases the length of a cycle. The procedure for this is quite intuitive. Given a cycle 퐶 in the basis, we efficiently check that 훾(퐶) satisfies Properties (i)-(iii) of Lemma 6. If all properties hold, we are done, and check another cycle in the basis, until all cycles satisfy the desired properties. In each case, if a given property does not hold, then, by the proof of Lemma 6, we can efficiently compute an alternate cycle 퐶′, |퐶′| < |퐶|, that can replace 퐶 in the simple cycle basis for 풞+(퐻), decreasing the sum of cycle lengths in the basis by at least one. Because of this, the described procedure is a polynomial time algorithm.

62 Step 4: For every block 퐻 of 퐺, compute 푠(퐶) for every cycle 퐶 in the basis.

This calculation relies heavily on the results of Subsection 2.3.1. The simple cycle basis for 풞+(퐻) satisfies Properties (i)-(iii) of Lemma 6, and so we may apply Lemmas 8 and 9. We compute the quantities 푠(퐶) iteratively based on cycle length, beginning with the shortest cycle in the basis and finishing with the longest. By the analysis at the beginning of Subsection 2.3.1, we recall that when 퐶 is a three- or four-cycle, 푠(퐶) can be computed efficiently using 푂(1) principal minors, all corresponding to subsets of 푉 (퐶). When |퐶| > 4, by Lemma 8, the quantity 푍 (defined in Lemma 8) can be computed efficiently using 푂(1) principal minors, all corresponding to subsets of 푉 (퐶). By Lemma 9, the quantity 푠(퐶) can be computed efficiently using 푍, 푠(퐶′) for positive four-cycles 퐶′ ⊂ 퐺[푉 (퐶)], and 푂(1) principal minors all corresponding to subsets of 푉 (퐶). Because our basis maximizes the number of three- and four-cycles it contains, any such four-cycle 퐶′ is either in the basis (and so 푠(퐶′) has already been computed) or is a linear combination of three- and four-cycles in the basis, in which case 푠(퐶′) can be computed using Gaussian elimination, without any additional querying of principal minors.

Step 5: Output a matrix 퐾 satisfying ∆푆(퐾) = ∆푆 for all 푆 ⊂ [푛].

The procedure for producing this matrix is quite similar to the proof of Proposition 4. It suffices to fix the signs of the upper triangular entries of 퐾, as the lower triangular entries can be computed using 휖. For each block 퐻, we find a negative simple cycle 퐶 (if one exists), fix a negative edge 푒 ∈ 퐸(퐶), and extend 퐸(퐶)∖푒 to a spanning tree 푇 of 퐻, [︀ ]︀ i.e., 퐸(퐶)∖푒 ⊂ 퐸(푇 ). We give the entries 퐾푖,푗, 푖 < 푗, {푖, 푗} ∈ 퐸(푇 ) ∪ 푒 an arbitrary + sign, and note our choice of 푠퐾 (퐶). We extend the simple cycle basis for 풞 (퐻) to a basis for 풞(퐻) by adding 퐶. On the other hand, if no negative cycle exists, then we

simply fix an arbitrary spanning tree 푇 , and give the entries 퐾푖,푗, 푖 < 푗, {푖, 푗} ∈ 퐸(푇 )

an arbitrary sign. In both cases, we have a basis for 풞(퐻) consisting of cycles 퐶푖 for

63 which we have computed 푠(퐶푖). For each edge {푖, 푗} ∈ 퐸(퐻) corresponding to an entry ′ 퐾푖,푗 for which we have not fixed the sign of, we consider the cycle 퐶 consisting of the edge {푖, 푗} and the unique 푖 − 푗 path in 푇 . Using Gaussian elimination, we can write 퐶′ as a sum of a subset of the cycles in our basis. As noted in Subsection 2.3.1, the quantity

′ 푠(퐶 ) is simply the product of the quantities 푠(퐶푖) for cycles 퐶푖 in this sum.

A few comments are in order. Conditional on the analysis of Step 2, the above algorithm

runs in time polynomial in 푁. Of course, the set {∆푆}푆⊂[푁] is not polynomial in 푁, but rather than take the entire set as input, we assume the existence of some querying operation, in which the value of any principal minor can be queried in polynomial time. Combining the (︀ )︀ analysis of each step, we can give the following guarantee for the RECOVERK {∆푆}푆⊂[푁] algorithm.

Theorem 6. Let {∆푆}푆⊂[푁] be the principal minors of some matrix in 풦. The algorithm (︀ )︀ RECOVERK {∆푆}푆⊂[푁] computes a matrix 퐾 ∈ 풦 satisfying ∆푆(퐾) = ∆푆 for all 푆 ⊂ [푁]. This algorithm runs in time polynomial in 푁, and queries at most 푂(푁 2)

principal minors, all of order at most 3 휑퐺, where 퐺 is the sparsity graph of {∆푆}|푆|≤2 and ′ 휑퐺 is the maximum of 휑퐻 over all blocks 퐻 of 퐺. In addition, there exists a matrix 퐾 ∈ 풦, ′ |퐾푖,푗| = |퐾푖,푗|, 푖, 푗 ∈ [푁], such that any algorithm that computes a matrix with principal ′ minors {∆푆(퐾 )}푆⊂[푁] must query a principal minor of order at least 휑퐺.

Proof. Conditional on the existence of the algorithm described in Step 2, i.e., an efficient algorithm to compute a simple cycle basis for 풞+(퐻) consisting of cycles of length at most

3휑퐻 that maximizes the number of three- and four-cycles it contains, by the above analysis (︀ )︀ we already have shown that the RECOVERK {∆푆}푆⊂[푁] algorithm runs in time polynomial in 푛 and queries at most 푂(푁 2) principal minors. To construct 퐾′, we first consider the uncharged sparsity graph 퐺 = ([푁], 퐸) of 퐾, and

′ let 푒, 푒 ∈ 퐸(퐺) be a pair of edges for which the quantity 휑퐺 is achieved (if 휑퐺 = 2, we are done). We aim to define an alternate charge function for 퐺, show that the simple cycle

+ ′ sparsity of 풞 (퐺) is at least 휑퐺, find a matrix 퐾 that has this charged sparsity graph, and

64 then make use of Theorem 5. Consider the charge function 휖 satisfying 휖(푒) = 휖(푒′) = −1, and equal to +1 otherwise. Any simple cycle basis for the block containing 푒, 푒′ must have a simple cycle containing both edges 푒, 푒′. By definition, the length of this cycle is at least

+ ′ 휑퐺, and so the simple cycle sparsity of 풞 (퐺) is at least 휑퐺. Next, let 퐾 be an arbitrary ′ matrix with |퐾푖,푗| = |퐾푖,푗|, 푖, 푗 ∈ [푁], and charged sparsity graph 퐺 = ([푁], 퐸, 휖), with 휖 as defined above. 퐾 ∈ 풦, and so 퐾′ ∈ 풦, and therefore, by Theorem 5, any algorithm that

′ computes a matrix with principal minors {∆푆(퐾 )}푆⊂[푁] must query a principal minor of

order at least ℓ+ ≥ 휑퐺.

Computing a Simple Cycle Basis with Provable Guarantees

Next, we describe an efficient algorithm to compute a simple cycle basis for 풞+(퐻)

consisting of cycles of length at most 3휑퐻 . Suppose we have a two-connected graph

퐻 = ([푁], 퐸, 휖). We first compute a minimal cycle basis {휒퐶1 , ..., 휒퐶휈 } of 풞(퐻), where

each 퐶푖 is an induced simple cycle (as argued previously, any lexicographically minimal basis for 풞(퐻) consists only of induced simple cycles). Without loss of generality, sup-

pose that 퐶1 is the shortest negative cycle in the basis (if no negative cycle exists, we are

done). The set of incidence vectors 휒퐶1 + 휒퐶푖 , 휖(퐶푖) = −1, combined with vectors 휒퐶푗 , + 휖(퐶푗) = +1, forms a basis for 풞 (퐻), which we denote by ℬ. We will build a simple + cycle basis for 풞 (퐻) by iteratively choosing incidence vectors of the form 휒퐶1 + 휒퐶푖 , 휖(퐶 ) = −1, replacing each with an incidence vector 휒 corresponding to a positive simple 푖 퐶̃︀푖

cycle 퐶̃︀푖, and iteratively updating ℬ.

Let 퐶 := 퐶1 + 퐶푖 be a positive cycle in our basis ℬ. If 퐶 is a simple cycle, we are done,

otherwise 퐶 is the union of edge-disjoint simple cycles 퐹1, ..., 퐹푝 for some 푝 > 1. If one of these simple cycles is positive and satisfies

{︀ }︀ 휒퐹푗 ̸∈ span ℬ∖{휒퐶1 + 휒퐶푖 } , we are done. Otherwise there is a negative simple cycle, without loss of generality given by

퐹1. We can construct a set of 푝 − 1 positive cycles by adding 퐹1 to each negative simple

65 {︀ }︀ cycle, and at least one of these cycles is not in span ℬ∖{휒퐶1 + 휒퐶푖 } . We now have a positive cycle not in this span, given by the sum of two edge-disjoint negative simple cycles

퐹1 and 퐹푗. These two cycles satisfy, by construction, |퐹1| + |퐹푗| ≤ |퐶| ≤ 2ℓ, where ℓ is

the cycle sparsity of 풞(퐻). If |푉 (퐹1) ∩ 푉 (퐹푗)| > 1, then 퐹1 ∪ 퐹푗 is two-connected, and we + may compute a simple cycle basis for 풞 (퐹1 ∪ 퐹푗) using the ear decomposition approach of Proposition 3. At least one positive simple cycle in this basis satisfies our desired condition,

and the length of this positive simple cycle is at most |퐸(퐹1 ∪ 퐹푗)| ≤ 2ℓ. If 퐹1 ∪ 퐹푗 is not two-connected, then we compute a shortest cycle 퐶̃︀ in 퐸(퐻) containing both an edge in

퐸(퐹1) and 퐸(퐹푗) (computed using Suurballe’s algorithm, see [109] for details). The graph

′ (︀ )︀ 퐻 = 푉 (퐹1) ∪ 푉 (퐹푗) ∪ 푉 (퐶̃︀), 퐸(퐹1) ∪ 퐸(퐹푗) ∪ 퐸(퐶̃︀)

is two-connected, and we can repeat the same procedure as above for 퐻′, where, in this case,

′ the resulting cycle is of length at most |퐸(퐻 )| ≤ 2ℓ + 휑퐻 . The additional guarantee for maximizing the number of three- and four-cycles can be obtained easily by simply listing all positive cycles of length three and four, combining them with the computed basis, and

performing a greedy algorithm. What remains is to note that ℓ ≤ 휑퐻 . This follows quickly from the observation that for any vertex 푢 in any simple cycle 퐶 of a minimal cycle basis for 풞(퐻), there exists an edge {푣, 푤} ∈ 퐸(퐶) such that 퐶 is the disjoint union of a shortest 푢 − 푣 path, shortest 푢 − 푤 path, and {푣, 푤} [51, Theorem 3].

2.3.3 An Algorithm for Principal Minors with Noise

So far, we have exclusively considered the situation in which principal minors are known (or can be queried) exactly. In applications related to this work, this is often not the case. The key application of a large part of this work is signed determinantal point processes, and the algorithmic question of learning the kernel of a signed DPP from some set of samples. Here, for the sake of readability, we focus on the non-probabilistic setting in which, given some

matrix 퐾 with spectral radius 휌(퐾) ≤ 1, each principal minor ∆푆 can be queried/estimated

66 ˆ up to some absolute error term 0 < 훿 < 1, i.e., we are given a set {∆푆}푆⊂[푁], satisfying

⃒ ˆ ⃒ ⃒∆푆 − ∆푆⃒ < 훿 for all 푆 ⊂ [푁],

where {∆푆}푆⊂[푁] are the principal minors of some magnitude-symmetric matrix 퐾, and asked to compute a matrix 퐾′ minimizing

′ ˆ ′ 휌(퐾, 퐾 ) := min |퐾 − 퐾 |∞, 퐾^ s.t. Δ푆 (퐾^ )=Δ푆 (퐾), 푆⊂[푁]

′ ′ where |퐾 − 퐾 |∞ = max푖,푗 |퐾푖,푗 − 퐾푖,푗|.

In this setting, we require both a separation condition for the entries of 퐾 and a stronger

푁×푁 version of Condition (2.6). Let 풦푁 (훼, 훽) be the set of matrices 퐾 ∈ R , 휌(퐾) ≤ 1,

satisfying: |퐾푖,푗| = |퐾푗,푖| for 푖, 푗 ∈ [퐾], |퐾푖,푗| > 훼 or |퐾푖,푗| = 0 for all 푖, 푗 ∈ [푁], and

If 퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖 ̸= 0 for 푖, 푗, 푘, ℓ ∈ [푁] distinct, (2.12) ⃒ ⃒ then ⃒|퐾푖,푗퐾푘,ℓ| − |퐾푗,푘퐾ℓ,푖|⃒ > 훽, and the sum ⃒ ⃒휑1 퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖 + 휑2퐾푖,푗퐾푗,ℓ퐾ℓ,푘퐾푘,푖

⃒ 3 + 휑3퐾푖,푘퐾푘,푗퐾푗,ℓ퐾ℓ,푖⃒ > 훽 ∀ 휑 ∈ {−1, 1} .

(︀ )︀ The algorithm RECOVERK {∆푆}푆⊂[푁] , slightly modified, performs well for this more general problem. Here, we briefly describe a variant of the above algorithm that performs nearly optimally on any 퐾 ∈ 풦(훼, 훽), 훼, 훽 > 0, for 훿 sufficiently small, and quantify the relationship between 훼 and 훽. The algorithm to compute a 퐾′ sufficiently close to 퐾, which (︀ ˆ )︀ we denote by RECOVERNOISYK {∆푆}푆⊂[푁]; 훿 , proceeds in three main steps:

′ Step 1: Define |퐾푖,푗|, 푖, 푗 ∈ [푁], and the charged sparsity graph 퐺 = ([푁], 퐸, 휖).

67 We define ⎧ ˆ ˆ ⎪∆푖 if |∆푖| > 훿 ′ ⎨ 퐾푖,푖 = , ⎩⎪ 0 otherwise

and ⎧ ˆ ˆ ˆ ˆ ˆ ˆ 2 ⎪∆푖∆푗 − ∆푖,푗 if |∆푖∆푗 − ∆푖,푗| > 3훿 + 훿 ′ ′ ⎨ 퐾푖,푗퐾푗,푖 = . ⎩⎪ 0 otherwise

′ The edges {푖, 푗} ∈ 퐸(퐺) correspond to non-zero off-diagonal entries |퐾푖,푗| ̸= 0, and ′ ′ the function 휖 is given by 휖푖,푗 = sgn(퐾푖,푗퐾푗,푖).

+ Step 2: For every block 퐻 of 퐺, compute a simple cycle basis {푥1, ..., 푥푘} of 풞 (퐻) satis- fying Properties (i)-(iii) of Lemma 6, and define 푠퐾′ (퐶) for every cycle 퐶 in the basis.

The computation of the simple cycle basis depends only on the graph 퐺, and so is identi-

cal to Steps 1 and 2 of the RECOVERK algorithm. The simple cycle basis for 풞+(퐻) satisfies Properties (i)-(iii) of Lemma 6, and so we can apply Lemmas 8 and 9. We

define the quantities 푠퐾′ (퐶) iteratively based on cycle length, beginning with the shortest cycle. We begin by detailing the cases of |퐶| = 3, then |퐶| = 4, and then finally |퐶| > 4.

Case I: |퐶| = 3.

Suppose that the three-cycle 퐶 = 푖 푗 푘 푖, 푖 < 푗 < 푘, is in our basis. If 퐺 was the charged sparsity graph of 퐾, then the equation

1 1 퐾 퐾 퐾 = ∆ ∆ ∆ − [︀∆ ∆ + ∆ ∆ + ∆ ∆ ]︀ + ∆ 푖,푗 푗,푘 푘,푖 푖 푗 푘 2 푖 푗,푘 푗 푖,푘 푘 푖,푗 2 푖,푗,푘

would hold. For this reason, we define 푠퐾′ (퐶) as

(︀ ˆ ˆ ˆ [︀ ˆ ˆ ˆ ˆ ˆ ˆ ]︀ ˆ )︀ 푠퐾′ (퐶) = 휖푖,푘 sgn 2∆푖∆푗∆푘 − ∆푖∆푗,푘 + ∆푗∆푖,푘 + ∆푘∆푖,푗 + ∆푖,푗,푘 .

68 Case II: |퐶| = 4.

Suppose that the four-cycle 퐶 = 푖 푗 푘 ℓ 푖, with (without loss of generality) 푖 < 푗 < 푘 < ℓ, is in our basis. We note that

∆푖,푗,푘,ℓ = ∆푖,푗∆푘,ℓ + ∆푖,푘∆푗,ℓ + ∆푖,ℓ∆푗,푘 + ∆푖∆푗,푘,ℓ + ∆푗∆푖,푘,ℓ + ∆푘∆푖,푗,ℓ

+ ∆ℓ∆푖,푗,푘 − 2∆푖∆푗∆푘,ℓ − 2∆푖∆푘∆푗,ℓ − 2∆푖∆ℓ∆푗,푘 − 2∆푗∆푘∆푖,ℓ

− 2∆푗∆ℓ∆푖,푘 − 2∆푘∆ℓ∆푖,푗 + 6∆푖∆푗∆푘∆ℓ + 푍,

where 푍 is the sum of the terms in the Laplace expansion of ∆푖,푗,푘,ℓ corresponding to a four-cycle of 퐺[{푖, 푗, 푘, ℓ}], and define the quantity 푍ˆ analogously:

ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ 푍 = ∆푖,푗,푘,ℓ − ∆푖,푗∆푘,ℓ − ∆푖,푘∆푗,ℓ − ∆푖,ℓ∆푗,푘 − ∆푖∆푗,푘,ℓ − ∆푗∆푖,푘,ℓ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ − ∆푘∆푖,푗,ℓ − ∆ℓ∆푖,푗,푘 + 2∆푖∆푗∆푘,ℓ + 2∆푖∆푘∆푗,ℓ + 2∆푖∆ℓ∆푗,푘 ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ + 2∆푗∆푘∆푖,ℓ + 2∆푗∆ℓ∆푖,푘 + 2∆푘∆ℓ∆푖,푗 − 6∆푖∆푗∆푘∆ℓ.

We consider two cases, depending on the number of positive four-cycles in 퐺[{푖, 푗, 푘, ℓ}]. First, suppose that 퐶 is the only positive four-cycle in 퐺[{푖, 푗, 푘, ℓ}]. If 퐺 was the charged sparsity graph of 퐾, then 푍 would equal −2퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖. For this reason, we define

푠퐾′ (퐶) as

(︀ ˆ)︀ 푠퐾′ (퐶) = 휖푖,ℓ sgn − 푍 in this case. Next, we consider the second case, in which there is more than one positive four-cycle in 퐺[{푖, 푗, 푘, ℓ}], which actually implies that all three four-cycles in 퐺[{푖, 푗, 푘, ℓ}] are positive. If 퐺 was the charged sparsity graph of 퐾 then 푍 would be

69 given by

(︀ )︀ 푍 = −2 퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖 + 퐾푖,푗퐾푗,ℓ퐾ℓ,푘퐾푘,푖 + 퐾푖,푘퐾푘,푗퐾푗,ℓ퐾ℓ,푖 .

For this reason, we compute an assignment of values 휑1, 휑2, 휑3 ∈ {−1, +1} that mini- mizes (not necessarily uniquely) the quantity

⃒ ⃒ ⃒ ˆ ⃒ ˆ ˆ ˆ ˆ ⃒ ⃒ ˆ ˆ ˆ ˆ ⃒ ⃒ ˆ ˆ ˆ ˆ ⃒⃒ ⃒푍/2 + 휑1⃒퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖⃒ + 휑2⃒퐾푖,푗퐾푗,ℓ퐾ℓ,푘퐾푘,푖⃒ + 휑3⃒퐾푖,푘퐾푘,푗퐾푗,ℓ퐾ℓ,푖⃒⃒, ⃒ ⃒ and define 푠퐾′ (퐶) = 휖푖,ℓ 휑1 in this case.

Case III: |퐶| > 4.

Suppose that the cycle 퐶 = 푖1 ... 푖푘 푖1, 푘 > 4, is in our basis, with vertices ordered to match the ordering of Lemma 8, and define 푆 = {푖1, ..., 푖푘}. Based on the equation in Lemma 8, we define 푍ˆ to equal

ˆ ˆ ˆ ˆ ˆ ˆ [︀ ˆ ˆ ˆ ]︀ ˆ 푍 = ∆푆 − ∆푖1 ∆푆∖푖1 − ∆푖푘 ∆푆∖푖푘 − ∆푖1,푖푘 − 2∆푖1 ∆푖푘 ∆푆∖푖1,푖푘 + 2퐾′ 퐾′ 퐾′ 퐾′ ∆ˆ 푖1,푖푘−1 푖푘−1,푖푘 푖푘,푖2 푖2,푖1 푆∖푖1,푖2,푖푘−1,푖푘 [︀ ˆ ˆ ˆ ]︀[︀ ˆ ˆ ˆ ]︀ ˆ + ∆푖1,푖2 − ∆푖1 ∆푖2 ∆푖푘−1,푖푘 − ∆푖푘−1 ∆푖푘 ∆푆∖푖1,푖2,푖푘−1,푖푘 [︀ ˆ ˆ ˆ ]︀[︀ ˆ ˆ ˆ ]︀ ˆ − ∆푖1,푖푘−1 − ∆푖1 ∆푖푘−1 ∆푖2,푖푘 − ∆푖2 ∆푖푘 ∆푆∖푖1,푖2,푖푘−1,푖푘 [︀ ˆ ˆ ˆ ]︀[︀ ˆ ˆ ˆ ]︀ − ∆푖1,푖2 − ∆푖1 ∆푖2 ∆푆∖푖1,푖2 − ∆푖푘 ∆푆∖푖1,푖2,푖푘 [︀ ˆ ˆ ˆ ]︀[︀ ˆ ˆ ˆ ]︀ − ∆푖푘−1,푖푘 − ∆푖푘−1 ∆푖푘 ∆푆∖푖푘−1,푖푘 − ∆푖2 ∆푆∖푖1,푖푘−1,푖푘 , and note that the quantity

(퐾′ 퐾′ 퐾′ 퐾′ ) sgn 푖1,푖푘−1 푖푘−1,푖푘 푖푘,푖2 푖2,푖1 is, by construction, computable using the signs of three- and four-cycles in our basis.

70 Based on the equation in Lemma 9, we set

(︂ 푘−1 [︃ 휖 퐾′ 퐾′ ]︃ )︂ ∏︁ ∏︁ 푖푎,푖푎+1 푖푏−1,푖푎 푖푎+1,푖푏 sgn 2 (−1)푘+1 퐾′ 퐾′ 1 − = sgn(푍ˆ), 푖푘,푖1 푖푗 ,푖푗+1 퐾′ 퐾′ 푗=1 (푎,푏)∈푈 푖푎,푖푎+1 푖푏−1,푖푏

with the set 푈 defined as in Lemma 9. From this equation, we can compute 푠퐾′ (퐶), as the quantity [︃ ′ ′ ]︃ 휖푖 ,푖 퐾 퐾 sgn 1 − 푎 푎+1 푖푏−1,푖푎 푖푎+1,푖푏 퐾′ 퐾′ 푖푎,푖푎+1 푖푏−1,푖푏 is computable using the signs of three- and four-cycles in our basis (see Subsection 2.3.1

for details). In the case of |퐶| > 4, we note that 푠퐾′ (퐶) was defined using 푂(1) principal minors, all corresponding to subsets of 푉 (퐶), and previously computed information regarding three- and four-cycles (which requires no additional querying).

Step 3: Define 퐾′.

′ ′ We have already defined 퐾푖,푖, 푖 ∈ [푁], |퐾푖,푗|, 푖, 푗 ∈ [푁], and 푠퐾′ (퐶) for every cycle in a simple cycle basis. The procedure for producing this matrix is identical to Step 5 of

the RECOVERK algorithm.

(︀ ˆ )︀ The RECOVERNOISYK {∆푆}푆⊂[푁], 훿 algorithm relies quite heavily on 훿 being small enough so that the sparsity graph of 퐾 can be recovered. In fact, we can quantify the size of 훿 required so that the complete signed structure of 퐾 can be recovered, and only uncertainty in the magnitude of entries remains. In the following theorem, we show that, in this regime,

the RECOVERNOISYK algorithm is nearly optimal.

ˆ ˆ Theorem 7. Let 퐾 ∈ 풦푁 (훼, 훽) and {∆푆}푆⊂[푁] satisfy |∆푆 − ∆푆(퐾)| < 훿 < 1 for all 푆 ⊂ [푁], for some

훿 < min{훼3휑퐺 , 훽3휑퐺/2, 훼2}/400,

where 퐺 is the sparsity graph of {∆푆}|푆|≤2 and 휑퐺 is the maximum of 휑퐻 over all blocks 퐻

71 (︀ ˆ )︀ ′ of 퐺. The RECOVERNOISYK {∆푆}푆⊂[푁], 훿 algorithm outputs a matrix 퐾 satisfying

휌(퐾, 퐾′) < 2훿/훼.

This algorithm runs in time polynomial in 푁, and queries at most 푂(푁 2) principal minors,

all of order at most 3 휑퐺.

Proof. This proof consists primarily of showing that, for 훿 sufficiently small, the charged sparsity graph 퐺 = ([푁], 퐸, 휖) and the cycle signs 푠(퐶) of 퐾′ and 퐾 agree, and quantifying the size of 훿 required to achieve this. If this holds, then the quantity 휌(퐾, 퐾′) is simply

⃒ ′ ⃒ given by max푖,푗 ⃒|퐾푖,푗| − |퐾푖,푗|⃒. We note that, because 휌(퐾) ≤ 1 and 훿 < 1,

⃒ ˆ ˆ ⃒ 푘 푘 ⃒∆푆1 ...∆푆푘 − ∆푆1 ...∆푆푘 ⃒ < (1 + 훿) − 1 < (2 − 1)훿 (2.13)

for any 푆1, ..., 푆푘 ⊂ [푁], 푘 ∈ N. This is the key error estimate we will use throughout the proof.

We first consider error estimates for the magnitudes of the entries of 퐾 and 퐾′. We ˆ ˆ have |퐾푖,푖 − ∆푖| < 훿 and either |퐾푖,푖| > 훼 or 퐾푖,푖 = 0. If 퐾푖,푖 = 0, then |∆푖| < 훿, and ′ ˆ ˆ 퐾푖,푖 = 퐾푖,푖. If |퐾푖,푖| > 훼, then |∆푖| > 훿, as 훼 > 2훿, and 퐾푖,푖 = ∆푖. Combining these two cases, we have ′ |퐾푖,푖 − 퐾푖,푖| < 훿 for all 푖 ∈ [푁].

′ Next, we consider off-diagonal entries 퐾푖,푗. By Equation (2.13),

⃒ ˆ ˆ ˆ ⃒ 2 ⃒(∆푖∆푗 − ∆푖,푗) − 퐾푖,푗퐾푗,푖⃒ ≤ [(1 + 훿) − 1] + 훿 < 4훿.

√︁ ′ ′ ˆ ˆ ˆ Therefore, if 퐾푖,푗 = 0, then 퐾푖,푗 = 0. If |퐾푖,푗| > 훼, then |퐾푖,푗| = |∆푖∆푗 − ∆푖,푗|, as 훼2 > 8훿. Combining these two cases, we have

′ ′ |퐾푖,푗퐾푗,푖 − 퐾푖,푗퐾푗,푖| < 4훿 for all 푖, 푗 ∈ [푁],

72 and therefore

⃒ ′ ⃒ ⃒ |퐾푖,푗| − |퐾푖,푗| ⃒ < 2훿/훼 for all 푖, 푗 ∈ [푁], 푖 ̸= 푗. (2.14)

Our above analysis implies that 퐾 and 퐾′ share the same charged sparsity graph 퐺 = ([푁], 퐸, 휖). What remains is to show that the cycle signs 푠(퐶) of 퐾 and 퐾′ agree for the positive simple cycle basis of our algorithm. Similar to the structure in Step 2 of the

description of the RECOVERNOISYK algorithm, we will treat the cases of |퐶| = 3, |퐶| = 4, and |퐶| > 4 separately, and rely heavily on the associated notation defined above.

Case I: |퐶| = 3.

3 We have |퐾푖,푗퐾푗,푘퐾푘,푖| > 훼 , and, by Equation (2.13), the difference between 퐾푖,푗퐾푗,푘퐾푘,푖 and its estimated value

1 1 ∆ˆ ∆ˆ ∆ˆ − [︀∆ˆ ∆ˆ + ∆ˆ ∆ˆ + ∆ˆ ∆ˆ ]︀ + ∆ˆ 푖 푗 푘 2 푖 푗,푘 푗 푖,푘 푘 푖,푗 2 푖,푗,푘

is at most

3 1 [(1 + 훿)3 − 1] + [(1 + 훿)2 − 1] + [(1 + 훿) − 1] < 12훿 < 훼3. 2 2

Therefore, 푠퐾 (퐶) equals 푠퐾′ (퐶).

Case II: |퐶| = 4.

By repeated application of Equation (2.13),

|푍 − 푍ˆ| < 6[(1 + 훿)4 − 1] + 12[(1 + 훿)3 − 1] + 7[(1 + 훿)2 − 1] + [(1 + 훿) − 1] < 196 훿.

If 퐶 is the only positive four-cycle of 퐺[{푖, 푗, 푘, ℓ}], then 푠퐾 (퐶) equals 푠퐾′ (퐶), as 2훽 >

73 196훿. If 퐺[{푖, 푗, 푘, ℓ}] has three positive four-cycles, then

(︀ )︀ 푍 = −2 퐾푖,푗퐾푗,푘퐾푘,ℓ퐾ℓ,푖 + 퐾푖,푗퐾푗,ℓ퐾ℓ,푘퐾푘,푖 + 퐾푖,푘퐾푘,푗퐾푗,ℓ퐾ℓ,푖 ,

and, by Condition (2.12), any two signings of the three terms of the above equation for 푍

are at distance at least 4훽 from each other. As 2훽 > 196훿, 푠퐾 (퐶) equals 푠퐾′ (퐶) in this case as well.

Case III: |퐶| > 4.

√ Using equations (2.13) and (2.14), and noting that 휌(퐾) < 1 implies |퐾푖,푗| < 2, we can bound the difference between 푍 and 푍ˆ by

|푍 − 푍ˆ| < [(1 + 훿) − 1] + 5[(1 + 훿)2 − 1] + 8[(1 + 훿)3 − 1] + 6[(1 + 훿)4 − 1] √ + 2[(1 + 훿)5 − 1] + 2[( 2 + 2훿/훼)4 − 4][(1 + 훿) − 1]

≤ (224 + 120훿/훼)훿 ≤ 344훿.

Based on the equation of Lemma 9 for 푍, the quantity 푍 is at least

min{훼|퐶|, 훽|퐶|/2} > 344훿,

and so 푠퐾 (퐶) equals 푠퐾′ (퐶). This completes the analysis of the final case. ′ ⃒ ′ ⃒ This implies that 휌(퐾, 퐾 ) is given by max푖,푗 ⃒|퐾푖,푗| − |퐾푖,푗|⃒ < 2훿/훼.

74 Chapter 3

The Spread and Bipartite Spread Conjecture

3.1 Introduction

The spread 푠(푀) of an arbitrary 푛 × 푛 complex matrix 푀 is the diameter of its spectrum; that is,

푠(푀) := max |휆푖 − 휆푗|, 푖,푗 where the maximum is taken over all pairs of eigenvalues of 푀. This quantity has been well-studied in general, see [33, 53, 83, 129] for details and additional references. Most notably, Johnson, Kumar, and Wolkowicz produced the lower bound

⃒ ∑︀ ⃒ 푠(푀) ≥ ⃒ 푖̸=푗 푚푖,푗⃒/(푛 − 1)

for normal matrices 푀 = (푚푖,푗) [53, Theorem 2.1], and Mirsky produced the upper bound

√︁ ∑︀ 2 ⃒ ∑︀ ⃒2 푠(푀) ≤ 2 푖,푗 |푚푖,푗| − (2/푛)⃒ 푖 푚푖,푖⃒

for any 푛 × 푛 matrix 푀, which is tight for normal matrices with 푛 − 2 of its eigenvalues all equal and equal to the arithmetic mean of the other two [83, Theorem 2].

75 The spread of a matrix has also received interest in certain particular cases. Consider a simple undirected graph 퐺 = (푉, 퐸) of order 푛. The adjacency matrix 퐴 of a graph 퐺 is the 푛 × 푛 matrix whose rows and columns are indexed by the vertices of 퐺, with entries satisfying ⎧ ⎨⎪ 1 if {푢, 푣} ∈ 퐸(퐺) 퐴푢,푣 = . ⎩⎪ 0 otherwise

This matrix is real and symmetric, and so its eigenvalues are real, and can be ordered

휆1(퐺) ≥ 휆2(퐺) ≥ · · · ≥ 휆푛(퐺). When considering the spread of the adjacency matrix 퐴

of some graph 퐺, the spread is simply the distance between 휆1(퐺) and 휆푛(퐺), denoted by

푠(퐺) := 휆1(퐺) − 휆푛(퐺).

In this instance, 푠(퐺) is referred to as the spread of the graph. The spectrum of a bipartite

graph is symmetric with respect to the origin, and, in particular, 휆푛(퐺) = −휆1(퐺) and so

푠(퐺) = 2 휆1(퐺). In [48], the authors investigated a number of properties regarding the spread of a graph, determining upper and lower bounds on 푠(퐺). Furthermore, they made two key conjectures. Let us denote the maximum spread over all 푛 vertex graphs by 푠(푛), the maximum spread over all 푛 vertex graphs of size 푚 by 푠(푛, 푚), and the maximum spread

over all 푛 vertex bipartite graphs of size 푚 by 푠푏(푛, 푚). Let 퐾푘 be the clique of order 푘 and

퐺(푛, 푘) := 퐾푘 ∨ 퐾푛−푘 be the join of the clique 퐾푘 and the independent set 퐾푛−푘 (i.e., the

disjoint union of 퐾푘 and 퐾푛−푘, with all possible edges between the two graphs added). The conjectures addressed in this article are as follows.

Conjecture 1 ([48], Conjecture 1.3). For any positive integer 푛, the graph of order 푛 with maximum spread is 퐺(푛, ⌊2푛/3⌋); that is, 푠(푛) is attained only by 퐺(푛, ⌊2푛/3⌋).

Conjecture 2 ([48], Conjecture 1.4). If 퐺 is a graph with 푛 vertices and 푚 edges attaining the maximum spread 푠(푛, 푚), and if 푚 ≤ ⌊푛2/4⌋, then 퐺 must be bipartite. That is,

2 푠푏(푛, 푚) = 푠(푛, 푚) for all 푚 ≤ ⌊푛 /4⌋.

76 Conjecture 1 is referred to as the Spread Conjecture, and Conjecture 2 is referred to as the Bipartite Spread Conjecture. In this chapter, we investigate both conjectures. In Section 3.2, we prove an asymptotic version of the Bipartite Spread Conjecture, and provide an infinite family of counterexamples to illustrate that our asymptotic version is as tight as possible, up to lower order terms. In Section 3.3, we provide a high-level sketch of a proof that the Spread Conjecture holds for all 푛 sufficiently large (the full proof can be found in [12]). The exact associated results are given by Theorems 8 and 9.

Theorem 8. There exists a constant 푁 so that the following holds: Suppose 퐺 is a graph on 푛 ≥ 푁 vertices with maximum spread; then 퐺 is the join of a clique on ⌊2푛/3⌋ vertices and an independent set on ⌈푛/3⌉ vertices.

Theorem 9. 1 + 16 푚−3/4 푠(푛, 푚) − 푠 (푛, 푚) ≤ 푠(푛, 푚) 푏 푚3/4

2 for all 푛, 푚 ∈ N satisfying 푚 ≤ ⌊푛 /4⌋. In addition, for any 휀 > 0, there exists some 푛휀 such that 1 − 휀 푠(푛, 푚) − 푠 (푛, 푚) ≥ 푠(푛, 푚) 푏 푚3/4

2 for all 푛 ≥ 푛휀 and some 푚 ≤ ⌊푛 /4⌋ depending on 푛.

The proof of Theorem 8 is quite involved, and for this reason we provide only a sketch, illustrating the key high-level details involved in the proof, and referring some of the exact details to the main paper [12]. The general technique consists of showing that a spread- extremal graph has certain desirable properties, considering and solving an analogous problem for graph limits, and then using this result to say something about the Spread Conjecture for sufficiently large 푛. In comparison, the proof of Theorem 9 is surprisingly short, making use of the theory of equitable decompositions and a well-chosen class of counterexamples.

77 3.2 The Bipartite Spread Conjecture

In [48], the authors investigated the structure of graphs which maximize the spread over all graphs with a fixed number of vertices 푛 and edges 푚, denoted by 푠(푛, 푚). In particular, they proved the upper bound

√︁ √ 2 푠(퐺) ≤ 휆1 + 2푚 − 휆1 ≤ 2 푚, (3.1)

and noted that equality holds throughout if and only if 퐺 is the union of isolated vertices

and 퐾푝,푞, for some 푝 + 푞 ≤ 푛 satisfying 푚 = 푝푞 [48, Thm. 1.5]. This led the authors to conjecture that if 퐺 has 푛 vertices, 푚 ≤ ⌊푛2/4⌋ edges, and spread 푠(푛, 푚), then 퐺 is bipartite [48, Conj. 1.4]. In this section, we prove a weak asymptotic form of this conjecture and provide an infinite family of counterexamples to the exact conjecture which verifies that the error in the aforementioned asymptotic result is of the correct order of magnitude. Recall

2 that 푠푏(푛, 푚), 푚 ≤ ⌊푛 /4⌋, is the maximum spread over all bipartite graphs with 푛 vertices and 푚 edges. To explicitly compute the spread of certain graphs, we make use of the theory of equitable partitions. In particular, we note that if 휑 is an automorphism of 퐺, then the

quotient matrix of 퐴(퐺) with respect to 휑, denoted by 퐴휑, satisfies Λ(퐴휑) ⊂ Λ(퐴), and

therefore 푠(퐺) is at least the spread of 퐴휑 (for details, see [13, Section 2.3]). Additionally, we require two propositions, one regarding the largest spectral radius of subgraphs of 퐾푝,푞 of a given size, and another regarding the largest gap between sizes which correspond to a complete bipartite graph of order at most 푛.

푚 Let 퐾푝,푞, 0 ≤ 푝푞 − 푚 < min{푝, 푞}, be the subgraph of 퐾푝,푞 resulting from removing 푝푞 − 푚 edges all incident to some vertex in the larger side of the bipartition (if 푝 = 푞, the vertex can be from either set). In [78], the authors proved the following result.

푚 Proposition 5. If 0 ≤ 푝푞 − 푚 < min{푝, 푞}, then 퐾푝,푞 maximizes 휆1 over all subgraphs of

퐾푝,푞 of size 푚.

We also require estimates regarding the longest sequence of consecutive sizes 푚 < ⌊푛2/4⌋ for which there does not exist a complete bipartite graph on at most 푛 vertices and

78 exactly 푒 edges. As pointed out by [4], the result follows quickly by induction. However, for completeness, we include a brief proof.

Proposition 6. The length of the longest sequence of consecutive sizes 푚 < ⌊푛2/4⌋ for which there does not exist a complete bipartite graph on at most 푛 vertices and exactly 푚 √ edges is zero for 푛 ≤ 4 and at most 2푛 − 1 − 1 for 푛 ≥ 5.

Proof. We proceed by induction. By inspection, for every 푛 ≤ 4, 푚 ≤ ⌊푛2/4⌋, there exists a complete bipartite graph of size 푚 and order at most 푛, and so the length of the longest sequence is trivially zero for 푛 ≤ 4. When 푛 = 푚 = 5, there is no complete bipartite graph of order at most five with exactly five edges. This is the only such instance for 푛 = 5, and so the length of the longest sequence for 푛 = 5 is one.

Now, suppose that the statement holds for graphs of order at most 푛 − 1, for some 푛 > 5. We aim to show the statement for graphs of order at most 푛. By our inductive hypothesis, it suffices to consider only sizes 푚 ≥ ⌊(푛 − 1)2/4⌋ and complete bipartite graphs on 푛 vertices. We have √ (︁푛 )︁ (︁푛 )︁ (푛 − 1)2 2푛 − 1 + 푘 − 푘 ≥ for |푘| ≤ . 2 2 4 2 √ When 1 ≤ 푘 ≤ 2푛 − 1/2, the difference between the sizes of 퐾푛/2+푘−1,푛/2−푘+1 and

퐾푛/2+푘,푛/2−푘 is at most

⃒ (︀ )︀⃒ ⃒ (︀ )︀⃒ √ 퐸 퐾 푛 푛 − 퐸 퐾 = 2푘 − 1 ≤ 2푛 − 1 − 1. ⃒ 2 +푘−1, 2 −푘+1 ⃒ ⃒ 푛/2+푘,푛/2−푘 ⃒

√ Let 푘* be the largest value of 푘 satisfying 푘 ≤ 2푛 − 1/2 and 푛/2 + 푘 ∈ N. Then

(︂ √ )︂ (︂ √ )︂ ⃒ (︀ )︀⃒ 푛 2푛 − 1 푛 2푛 − 1 ⃒퐸 퐾 푛 +푘*, 푛 −푘* ⃒ < + − 1 − + 1 2 2 2 2 2 2 √ (푛 − 1)2 = 2푛 − 1 + − 1, 4

79 and the difference between the sizes of 퐾푛/2+푘*,푛/2−푘* and 퐾 푛−1 푛−1 is at most ⌈ 2 ⌉,⌊ 2 ⌋

2 ⌊︂ 2 ⌋︂ ⃒ (︀ )︀⃒ ⃒ (︀ )︀⃒ √ (푛 − 1) (푛 − 1) ⃒퐸 퐾 푛 +푘*, 푛 −푘* ⃒ − ⃒퐸 퐾⌈ 푛−1 ⌉,⌊ 푛−1 ⌋ ⃒ < 2푛 − 1 + − − 1 2 2 2 2 4 4 √ < 2푛 − 1.

Combining these two estimates completes our inductive step, and the proof.

We are now prepared to prove an asymptotic version of [48, Conjecture 1.4], and provide an infinite class of counterexamples that illustrates that the asymptotic version under consideration is the tightest version of this conjecture possible.

Theorem 10. 1 + 16 푚−3/4 푠(푛, 푚) − 푠 (푛, 푚) ≤ 푠(푛, 푚) 푏 푚3/4

2 for all 푛, 푚 ∈ N satisfying 푚 ≤ ⌊푛 /4⌋. In addition, for any 휖 > 0, there exists some 푛휖 such that 1 − 휖 푠(푛, 푚) − 푠 (푛, 푚) ≥ 푠(푛, 푚) 푏 푚3/4

2 for all 푛 ≥ 푛휖 and some 푚 ≤ ⌊푛 /4⌋ depending on 푛.

Proof. The main idea of the proof is as follows. To obtain an upper bound on 푠(푛, 푚) − √ 푠푏(푛, 푚), we upper bound 푠(푛, 푚) by 2 푚 (using Inequality 3.1) and lower bound 푠푏(푛, 푚)

by the spread of some specific bipartite graph. To obtain a lower bound on 푠(푛, 푚)−푠푏(푛, 푚)

for a specific 푛 and 푚, we explicitly compute 푠푏(푛, 푚) using Proposition 5, and lower bound 푠(푛, 푚) by the spread of some specific non-bipartite graph.

푚 First, we analyze the spread of 퐾푝,푞, 0 < 푝푞 − 푚 < 푞 ≤ 푝, a quantity that will be used in the proof of both the upper and lower bound. Let us denote the vertices in the bipartition

푚 of 퐾푝,푞 by 푢1, ..., 푢푝 and 푣1, ..., 푣푞, and suppose without loss of generality that 푢1 is not

adjacent to 푣1, ..., 푣푝푞−푚. Then

휑 = (푢1)(푢2, ..., 푢푝)(푣1, ..., 푣푝푞−푚)(푣푝푞−푚+1, ..., 푣푞)

80 푚 is an automorphism of 퐾푝,푞. The corresponding quotient matrix is given by

⎛ ⎞ 0 0 0 푚 − (푝 − 1)푞 ⎜ ⎟ ⎜ ⎟ ⎜0 0 푝푞 − 푚 푚 − (푝 − 1)푞⎟ 퐴휑 = ⎜ ⎟ , ⎜ ⎟ ⎜0 푝 − 1 0 0 ⎟ ⎝ ⎠ 1 푝 − 1 0 0

has characteristic polynomial

4 2 푄(푝, 푞, 푚) = det[퐴휑 − 휆퐼] = 휆 − 푚휆 + (푝 − 1)(푚 − (푝 − 1)푞)(푝푞 − 푚),

and, therefore,

(︃ )︃1/2 푚 + √︀푚2 − 4(푝 − 1)(푚 − (푝 − 1)푞)(푝푞 − 푚) 푠 (︀퐾푚 )︀ ≥ 2 . (3.2) 푝,푞 2

2 푚 For 푝푞 = Ω(푛 ) and 푛 sufficiently large, this upper bound is actually an equality, as 퐴(퐾푝,푞) is a perturbation of the adjacency matrix of a complete bipartite graph with Ω(푛) bipartite √ sets by an 푂( 푛) norm matrix. For the upper bound, we only require the inequality, but for the lower bound, we assume 푛 is large enough so that this is indeed an equality.

Next, we prove the upper bound. For some fixed 푛 and 푚 ≤ ⌊푛2/4⌋, let 푚 = 푝푞 − 푟, where 푝, 푞, 푟 ∈ N, 푝 + 푞 ≤ 푛, and 푟 is as small as possible. If 푟 = 0, then by [48,

Thm. 1.5] (described above), 푠(푛, 푚) = 푠푏(푛, 푚) and we are done. Otherwise, we note that 0 < 푟 < min{푝, 푞}, and so Inequality 3.2 is applicable (in fact, by Proposition 6, √ √ 푟 = 푂( 푛)). Using the upper bound 푠(푛, 푚) ≤ 2 푚 and Inequality 3.2, we have

(︃ √︃ )︃1/2 푠(푛, 푝푞 − 푟) − 푠 (︀퐾푚 )︀ 1 1 4(푝 − 1)(푞 − 푟)푟 푝,푞 ≤ 1 − + 1 − . (3.3) 푠(푛, 푝푞 − 푟) 2 2 (푝푞 − 푟)2

√ To upper bound 푟, we use Proposition 6 with 푛′ = ⌈2 푚⌉ ≤ 푛 and 푚. This implies that

√︁ √ √︁ √ √︁ √ 푟 ≤ 2⌈2 푚⌉ − 1 − 1 < 2(2 푚 + 1) − 1 − 1 = 4 푚 + 1 − 1 ≤ 2푚1/4.

81 √ Recall that 1 − 푥 ≥ 1 − 푥/2 − 푥2/2 for all 푥 ∈ [0, 1], and so

√ (︀ 1 1 )︀1/2 (︀ 1 1 1 1 2 )︀1/2 (︀ 1 2 )︀ 1 − 2 + 2 1 − 푥 ≤ 1 − 2 + 2 (1 − 2 푥 − 2 푥 ) = 1 − 1 − 4 (푥 + 푥 ) (︀ 1 2 1 2 2)︀ ≤ 1 − 1 − 8 (푥 + 푥 ) − 32 (푥 + 푥 ) 1 1 2 ≤ 8 푥 + 4 푥

for 푥 ∈ [0, 1]. To simplify Inequality 3.3, we observe that

4(푝 − 1)(푞 − 푟)푟 4푟 8 ≤ ≤ . (푝푞 − 푟)2 푚 푚3/4

Therefore, 푠(푛, 푝푞 − 푟) − 푠 (︀퐾푚 )︀ 1 16 푝,푞 ≤ + . 푠(푛, 푝푞 − 푟) 푚3/4 푚3/2 This completes the proof of the upper bound.

Finally, we proceed with the proof of the lower bound. Let us fix some 0 < 휖 < 1, and consider some arbitrarily large 푛. Let 푚 = (푛/2 + 푘)(푛/2 − 푘) + 1, where 푘 is the smallest

number satisfying 푛/2 + 푘 ∈ N and 휖ˆ := 1 − 2푘2/푛 < 휖/2 (here we require 푛 = Ω(1/휖2)).

Denote the vertices in the bipartition of 퐾푛/2+푘,푛/2−푘 by 푢1, ..., 푢푛/2+푘 and 푣1, ..., 푣푛/2−푘, + and consider the graph 퐾푛/2+푘,푛/2−푘 := 퐾푛/2+푘,푛/2−푘 ∪ {(푣1, 푣2)} resulting from adding

one edge to 퐾푛/2+푘,푛/2−푘 between two vertices in the smaller side of the bipartition. Then

휑 = (푢1, ..., 푢푛/2+푘)(푣1, 푣2)(푣3, ..., 푣푛/2−푘)

+ is an automorphism of 퐾푛/2+푘,푛/2−푘, and

⎛ ⎞ 0 2 푛/2 − 푘 − 2 ⎜ ⎟ ⎜ ⎟ 퐴휑 = ⎜푛/2 + 푘 1 0 ⎟ ⎝ ⎠ 푛/2 + 푘 0 0

82 has characteristic polynomial

3 2 (︀ 2 2)︀ det[퐴휑 − 휆퐼] = −휆 + 휆 + 푛 /4 − 푘 휆 − (푛/2 + 푘)(푛/2 − 푘 − 2) (︂푛2 (1 − 휖ˆ)푛)︂ (︂푛2 (3 − 휖ˆ)푛 )︂ = −휆3 + 휆2 + − 휆 − − − √︀2(1 − 휖ˆ)푛 . 4 2 4 2

By matching higher order terms, we obtain

푛 1 − 휖ˆ (8 − (1 − 휖ˆ)2) 휆 (퐴 ) = − + + 표(1/푛), 푚푎푥 휑 2 2 4푛

푛 1 − 휖ˆ (8 + (1 − 휖ˆ)2) 휆 (퐴 ) = − + + + 표(1/푛), 푚푖푛 휑 2 2 4푛 and (1 − 휖ˆ)2 푠(퐾+ ) ≥ 푛 − (1 − 휖ˆ) − + 표(1/푛). 푛/2+푘,푛/2−푘 2푛

Next, we aim to compute 푠푏(푛, 푚), 푚 = (푛/2 + 푘)(푛/2 − 푘) + 1. By Proposition 5, 푚 푠푏(푛, 푚) is equal to the maximum of 푠(퐾푛/2+ℓ,푛/2−ℓ) over all ℓ ∈ [0, 푘 − 1], 푘 − ℓ ∈ N. As 푚 previously noted, for 푛 sufficiently large, the quantity 푠(퐾푛/2+ℓ,푛/2−ℓ) is given exactly by Equation (3.2), and so the optimal choice of ℓ minimizes

푓(ℓ) := (푛/2 + ℓ − 1)(푘2 − ℓ2 − 1)(푛/2 − ℓ − (푘2 − ℓ2 − 1))

= (푛/2 + ℓ)(︀(1 − 휖ˆ)푛/2 − ℓ2)︀(︀휖푛/ˆ 2 + ℓ2 − ℓ)︀ + 푂(푛2).

We have 푓(푘 − 1) = (푛/2 + 푘 − 2)(2푘 − 2)(푛/2 − 3푘 + 3),

4 3 4 and if ℓ ≤ 5 푘, then 푓(ℓ) = Ω(푛 ). Therefore the minimizing ℓ is in [ 5 푘, 푘]. The derivative of 푓(ℓ) is given by

푓 ′(ℓ) = (푘2 − ℓ2 − 1)(푛/2 − ℓ − 푘2 + ℓ2 + 1)

− 2ℓ(푛/2 + ℓ − 1)(푛/2 − ℓ − 푘2 + ℓ2 + 1)

+ (2ℓ − 1)(푛/2 + ℓ − 1)(푘2 − ℓ2 − 1).

83 4 For ℓ ∈ [ 5 푘, 푘],

푛(푘2 − ℓ2) 푓 ′(ℓ) ≤ − ℓ푛(푛/2 − ℓ − 푘2 + ℓ2) + 2ℓ(푛/2 + ℓ)(푘2 − ℓ2) 2 9푘2푛 ≤ − 4 푘푛(푛/2 − 푘 − 9 푘2) + 18 (푛/2 + 푘)푘3 50 5 25 25 81푘3푛 2푘푛2 = − + 푂(푛2) 125 5 (︂81(1 − 휖ˆ) 2)︂ = 푘푛2 − + 푂(푛2) < 0 250 5

for sufficiently large 푛. This implies that the optimal choice is ℓ = 푘 − 1, and 푠푏(푛, 푚) = 푚 2 2 푠(퐾푛/2+푘−1,푛/2−푘+1). The characteristic polynomial 푄(푛/2+푘−1, 푛/2−푘+1, 푛 /4−푘 +1) equals

휆4 − (︀푛2/4 − 푘2 + 1)︀ 휆2 + 2(푛/2 + 푘 − 2)(푛/2 − 3푘 + 3)(푘 − 1).

By matching higher order terms, the extreme root of 푄 is given by

√︂ 푛 1 − 휖ˆ 2(1 − 휖ˆ) 27 − 14ˆ휖 − 휖ˆ2 휆 = − − + + 표(1/푛), 2 2 푛 4푛

and so

√︂ 2(1 − 휖ˆ) 27 − 14ˆ휖 − 휖ˆ2 푠 (푛, 푚) = 푛 − (1 − 휖ˆ) − 2 + + 표(1/푛), 푏 푛 2푛

and

푠(푛, 푚) − 푠 (푛, 푚) 23/2(1 − 휖ˆ)1/2 14 − 8ˆ휖 푏 ≥ − + 표(1/푛2) 푠(푛, 푚) 푛3/2 푛2 (1 − 휖ˆ)1/2 (1 − 휖ˆ)1/2 [︂ (푛/2)3/2 ]︂ 14 − 8ˆ휖 = + 1 − − + 표(1/푛2) 푚3/4 (푛/2)3/2 푚3/4 푛2 1 − 휖/2 ≥ + 표(1/푚3/4). 푚3/4

This completes the proof.

84 3.3 A Sketch for the Spread Conjecture

Here, we provide a concise, high-level description of an asymptotic proof of the Spread Conjecture, proving only the key graph-theoretic structural results for spread-extremal graphs. The full proof itself is quite involved (and long), making use of interval arith- metic and a number of fairly complicated symbolic calculations, but conceptually, is quite intuitive. For the full proof, we refer the reader to [12]. The proof consists of four main steps:

Step 1: Graph-Theoretic Results

In Subsection 3.3.1, we observe a number of important structural properties of any graph that maximizes the spread for a given order 푛. In particular, in Theorem 11, we show that

• any graph that maximizes spread is be the join of two threshold graphs, each with order linear in 푛,

• the unit eigenvectors x and z corresponding to 휆1(퐺) and 휆푛(퐺) have infinity norms of order 푛−1/2 ,

2 2 −1 • the quantities 휆1x푢 − 휆푛z푢, 푢 ∈ 푉 , are all nearly equal, up to a term of order 푛 .

This last structural property serves as the key backbone of our proof. In addition, we

note that, by a tensor argument (replacing vertices by independent sets 퐾푡 and edges by

bicliques 퐾푡,푡), an asymptotic upper bound for 푠(푛) implies a bound for all 푛.

Step 2: Graph Limits, Averaging, and a Finite-Dimensional Eigenvalue Problem

For this step, we provide only a brief overview, and refer the reader to [12] for additional details. In this step, we can make use of graph limits to understand how spread-extremal graphs behave as 푛 tends to infinity. By proving a continuous version of Theorem 11 for graphons, and using an averaging argument inspired by [112], we can show that

85 the spread-extremal graphon takes the form of a step-graphon with a fixed structure of symmetric seven by seven blocks, illustrated below.

푇 The lengths 훼 = (훼1, ..., 훼7), 훼 1 = 1, of each row/column in the spread-optimal step-graphon is unknown. For any choice of lengths 훼, we can associate a seven by seven matrix whose spread is identical to that of the associated step-graphon pictured above. Let 퐵 be the seven by seven matrix with 퐵푖,푗 equal to the value of the above step-graphon on block 푖, 푗, and 퐷 = diag(훼1, ..., 훼7) be a diagonal matrix with 훼 on the diagonal. Then the matrix 퐷1/2퐵퐷1/2, given by

⎛ ⎞ 1 1 1 1 1 1 1 ⎜ ⎟ ⎜ ⎟ ⎜1 1 1 0 1 1 1⎟ ⎜ ⎟ ⎜ ⎟ ⎜1 1 0 0 1 1 1⎟ ⎜ ⎟ 1/2 ⎜ ⎟ 1/2 diag(훼1, ..., 훼7) ⎜1 0 0 0 1 1 1⎟ diag(훼1, ..., 훼7) ⎜ ⎟ ⎜ ⎟ ⎜1 1 1 1 1 1 0⎟ ⎜ ⎟ ⎜ ⎟ ⎜1 1 1 1 1 0 0⎟ ⎝ ⎠ 1 1 1 1 0 0 0 has spread equal to the spread of the associated step-graphon. This process has reduced the graph limit version of our problem to a finite-dimensional eigenvalue optimization problem, which we can endow with additional structural constraints resulting from the graphon version of Theorem 11.

86 (a) 훼푖 ̸= 0 for all 푖 (b) 훼2 = 훼3 = 훼4 = 0 Figure 3-1: Contour plots of the spread for some choices of 훼. Each point (푥, 푦) of Plot (a) illustrates the maximum spread over all choices of 훼 satisfying 훼3 +훼4 = 푥 and 훼6 +훼7 = 푦 (and therefore, 훼1 + 훼2 + 훼5 = 1 − 푥 − 푦) on a grid of step size 1/100. Each point (푥, 푦) of Plot (b) illustrates the maximum spread over all choices of 훼 satisfying 훼2 = 훼3 = 훼4 = 0, 훼5 = 푦, and 훼7 = 푥 on a grid of step size 1/100. The maximum spread of Plot (a) is achieved at the black x, and implies that, without loss of generality, 훼3 + 훼4 = 0, and therefore 훼2 = 0 (indices 훼1 and 훼2 can be combined when 훼3 + 훼4 = 0). Plot (b) treats this case when 훼2 = 훼3 = 훼4 = 0, and the maximum spread is achieved on the black line. This implies that either 훼5 = 0 or 훼7 = 0. In both cases, this reduces to the block two by two case 훼1, 훼7 ̸= 0 (or, if 훼7 = 0, then 훼1, 훼6 ̸= 0).

Step 3: Computer-Assisted Proof of a Finite-Dimensional Eigenvalue Problem

Using a computer-assisted proof, we can show that the optimizing choice of 훼푖, 푖 =

1, ..., 7, is, without loss of generality, given by 훼1 = 2/3, 훼6 = 1/3, and all other 훼푖 = 0. This is exactly the limit of the conjectured spread optimal graph as 푛 tends to infinity. The proof of this fact is extremely technical. Though not a proof, in Figure 1 we provide intuitive visual justification that this result is true. In this figure, we provide contour plots resulting from numerical computations of the spread of the above matrix for var- ious values of 훼. The numerical results suggest that the 2/3 − 1/3 two by two block step-graphon is indeed optimal. See Figure 1 and the associated caption for details. The

87 actual proof of this fact consists of three parts. First, we can reduce the possible choices

7 of non-zero 훼푖 from 2 to 17 different cases by removing different representations of the 2 2 same matrix. Second, using eigenvalue equations, the graph limit version of 휆1x푢 −휆푛z푢 all nearly equal, and interval arithmetic, we can prove that, of the 17 cases, only the

cases 훼1, 훼7 ̸= 0 or 훼4, 훼5, 훼7 ̸= 0 need to be considered. Finally, using basic results from the theory of cubic polynomials and computer-assisted symbolic calculations, we

can restrict ourselves to the case 훼1, 훼7 ̸= 0. This case corresponds to a quadratic

polynomial and can easily be shown to be maximized by 훼1 = 2/3, 훼7 = 1/3. We refer the reader to [118] for the code (based on [95]) and output of the interval arithmetic algorithm (the key portion of this step), and [12] for additional details regarding this step.

Step 4: From Graph Limits to an Asymptotic Proof of the Spread Conjecture

We can convert our result for the spread-optimal graph limit to a statement for graphs.

This process consists of showing that any spread optimal graph takes the form (퐾푛1 ∪

퐾푛2 ) ∨ 퐾푛3 for 푛1 = (2/3 + 표(1))푛, 푛2 = 표(푛), and 푛3 = (1/3 + 표(1))푛, i.e. any spread optimal graph is equal up to a set of 표(푛) vertices to the conjectured optimal

graph 퐾⌊2푛/3⌋ ∨ 퐾⌈푛/3⌉, and then showing that, for 푛 sufficiently large, the spread of

(퐾푛1 ∪ 퐾푛2 ) ∨ 퐾푛3 , 푛1 + 푛2 + 푛3 = 푛, is maximized when 푛2 = 0. This step is fairly straightforward, and we refer the reader to [12] for details.

3.3.1 Properties of Spread-Extremal Graphs

In this subsection, we consider properties of graphs that maximize the quantity 휆1(퐺) −

푐 휆푛(퐺) over all graphs of order 푛, for some fixed 푐 > 0. When 푐 = 1, this is exactly the class of spread-extremal graphs. Let x be the unit eigenvector of 휆1 and z be the unit eigenvector of 휆푛. As noted in [48] (for the case of 푐 = 1), we have

∑︁ 휆1 − 푐 휆푛 = x푢x푣 − 푐 z푢z푣, 푢∼푣

88 and if a graph 퐺 maximizes 휆1 − 푐 휆푛 over all 푛 vertex graphs, then 푢 ∼ 푣, 푢 ̸= 푣, if x푢x푣 − 푐 z푢z푣 > 0 and 푢 ̸∼ 푣 if x푢x푣 − 푐 z푢z푣 < 0. We first produce some weak lower

bounds for the maximum of 휆1 − 푐 휆푛 over all 푛 vertex graphs.

Proposition 7. [︀ 2 ]︀ max 휆1(퐺) − 푐 휆푛(퐺) ≥ 푛 1 + min{1, 푐 }/50 퐺

for all 푐 > 0 and 푛 ≥ 100 max{1, 푐−2}.

Proof. We consider two different graphs, depending on the choice of 푐. When 푐 ≥ 5/4,

푛 ≥ 100, we consider the bicliques 퐾⌊푛/2⌋,⌈푛/2⌉ and note that

(푐 + 1)(푛 − 1) 휆 (퐾 ) − 푐 휆 (퐾 ) ≥ > 푛[1 + 1/50]. 1 ⌊푛/2⌋,⌈푛/2⌉ 푛 ⌊푛/2⌋,⌈푛/2⌉ 2

When 푐 ≤ 5/4, we consider the graph 퐺(푛, 푘) = 퐾푘 ∨ 퐾푛−푘, 푘 > 0, with characteristic polynomial 휆푛−푘−1(휆 + 1)푘−1(︀휆2 − (푘 − 1)휆 − 푘(푛 − 푘))︀, and note that

(1 − 푐)(푘 − 1) 1 + 푐 휆 (퐺(푛, 푘)) − 푐 휆 (퐺(푛, 푘)) = + √︀(푘 − 1)2 + 4(푛 − 푘)푘. 1 푛 2 2

Setting 푘ˆ = ⌈(1−푐/3)푛⌉+1 and using the inequalities (1−푐/3)푛+1 ≤ 푘ˆ < (1−푐/3)푛+2, we have (1 − 푐)(푘ˆ − 1) (1 − 푐)(1 − 푐 )푛 [︂1 2푐 푐2 ]︂ ≥ 3 = 푛 − + , 2 2 2 3 6 and

(︂ 푐 )︂2 (︂푐푛 )︂(︂ 푐푛 )︂ (푘ˆ − 1)2 + 4(푛 − 푘ˆ)푘ˆ ≥ 1 − 푛2 + 4 − 2 푛 − + 1 3 3 3 [︂ 2푐 푐2 4푐 − 8 8 ]︂ 2푐 푐2 = 푛2 1 + − + − ≥ 1 + − 3 3 푛 푛2 3 2

for 푛 ≥ 100/푐2. Therefore,

√︂ [︂1 2푐 푐2 1 + 푐 2푐 푐2 ]︂ 휆 (퐺(푛, 푘ˆ)) − 푐 휆 (퐺(푛, 푘ˆ)) ≥ 푛 − + + 1 + − . 1 푛 2 3 6 2 3 2

89 √ Using the inequality 1 + 푥 ≥ 1 + 푥/2 − 푥2/8 for all 푥 ∈ [0, 1],

√︂ 2 2푐 푐2 (︂ 푐 푐2 )︂ 1(︂2푐 푐2 )︂ 1 + − ≥ 1 + − − − , 3 2 3 4 8 3 2

and, after a lengthy calculation, we obtain

[︂ 13푐2 푐3 5푐4 푐5 ]︂ 휆 (퐺(푛, 푘ˆ)) − 푐 휆 (퐺(푛, 푘ˆ)) ≥ 푛 1 + − + − 1 푛 72 9 192 64 [︂ ]︂ ≥ 푛 1 + 푐2/50 .

Combining these two estimates completes the proof.

This implies that for 푛 sufficiently large and 푐 fixed, any extremal graph must satisfy

|휆푛| = Ω(푛). In the following theorem, we make a number of additional observations regarding the structure of these extremal graphs. Similar results for the specific case of 푐 = 1 can be found in the full paper [12].

Theorem 11. Let 퐺 = (푉, 퐸), 푛 ≥ 3, maximize the quantity 휆1(퐺) − 푐 휆푛(퐺), for some

푐 > 0, over all 푛 vertex graphs, and x and z be unit eigenvectors corresponding to 휆1 and

휆푛, respectively. Then

1. 퐺 = 퐺[푃 ] ∨ 퐺[푁], where 푃 = {푢 ∈ 푉 | z푢 ≥ 0} and 푁 = 푉 ∖푃 ,

2. x푢x푣 − 푐 z푢z푣 ̸= 0 for all 푢 ̸= 푣,

3. 퐺[푃 ] and 퐺[푁] are threshold graphs, i.e., there exists an ordering 푢1, ..., 푢|푃 | of the

vertices of 퐺[푃 ] (and 퐺[푁]) such that if 푢푖 ∼ 푢푗, 푖 < 푗, then 푢푗 ∼ 푢푘, 푘 ̸= 푖, implies

that 푢푖 ∼ 푢푘,

4. |푃 |, |푁| > min{1, 푐4} 푛/2500 for 푛 ≥ 100 max{1, 푐−2},

1/2 3 −3 1/2 −2 5. ‖x‖∞ ≤ 6/푛 and ‖z‖∞ ≤ 10 max{1, 푐 }/푛 for 푛 ≥ 100 max{1, 푐 },

⃒ 2 2 ⃒ 12 −12 6. ⃒푛(휆1x푢 − 푐 휆푛z푢) − (휆1 − 푐 휆푛)⃒ ≤ 5 · 10 max{푐, 푐 } for all 푢 ∈ 푉 for 푛 ≥ 4 · 106 max{1, 푐−6}.

90 Proof. Property 1 follows immediately, as x푢x푣 − 푐 z푢z푣 > 0 for all 푢 ∈ 푃 , 푣 ∈ 푁. To prove Property 2, we proceed by contradiction. Suppose that this is not the case, that

′ x푢1 x푢2 − 푐 z푢1 z푢2 = 0 for some 푢1 ̸= 푢2. Let 퐺 be the graph resulting from either

adding edge {푢1, 푢2} to 퐺 (if {푢1, 푢2} ̸∈ 퐸(퐺)) or removing edge {푢1, 푢2} from 퐺 (if

{푢1, 푢2} ∈ 퐸(퐺)). Then,

∑︁ ′ ′ 휆1(퐺) − 푐 휆푛(퐺) = x푢x푣 − 푐 z푢z푣 ≤ 휆1(퐺 ) − 푐 휆푛(퐺 ), {푢,푣}∈퐸(퐺′)

and so, by the optimality of 퐺, x and z are also eigenvectors of the adjacency matrix of 퐺′. This implies that

⃒ ′ ⃒ ⃒ ⃒ ′ ⃒(휆1(퐺) − 휆1(퐺 ))x푢1 ⃒ = ⃒x푢2 ⃒ and (휆1(퐺) − 휆1(퐺 ))x푣 = 0

for any 푣 ̸= 푢1, 푢2. If 푛 ≥ 3, then there exists some 푣 such that x푣 = 0, and so, by the Perron-Frobenius theorem, 퐺 is disconnected. By Property 1, either 푃 = ∅ or 푁 = ∅, and

so 휆푛 ≥ 0, a contradiction, as the trace of 퐴 is zero. This completes the proof of Property

2. To show Property 3, we simply order the vertices of 푃 so that the quantity z푢푖 /x푢푖 is

non-decreasing in 푖. If 푢푖 ∼ 푢푗, 푖 < 푗, and 푢푗 ∼ 푢푘, without loss of generality with 푖 < 푘, then

x푢푖 x푢푖 x푢푖 x푢푘 − 푐 z푢푖 z푢푘 ≥ x푢푖 x푢푘 − 푐 z푢푗 z푢푘 = (x푢푗 x푢푘 − 푐 z푢푗 z푢푘 ) > 0. x푢푗 x푢푗

To show Properties 4-6, we assume 푛 ≥ 100 max{1, 푐−2}, and so, by Proposition 7,

2 휆푛(퐺) < − min{1, 푐 } 푛/50. We begin with Property 4. Suppose, to the contrary, that 4 |푃 | ≤ min{1, 푐 } 푛/2500, and let 푣 maximize |z푣|. If 푣 ∈ 푁, then

∑︁ z푢 ∑︁ z푢 4 휆푛 = ≥ ≥ −|푃 | ≥ − min{1, 푐 } 푛/2500, z푣 z푣 푢∼푣 푢∈푃

91 2 a contradiction to 휆푛(퐺) < − min{1, 푐 } 푛/50. If 푣 ∈ 푃 , then

2 ∑︁ 휆푛z푢 ∑︁ ∑︁ z푤 ∑︁ ∑︁ z푤 4 2 휆푛 = = ≤ ≤ 푛 |푃 | ≤ min{1, 푐 } 푛 /2500, z푣 z푣 z푣 푢∼푣 푢∼푣 푤∼푢 푢∼푣 푤∈푃, 푤∼푢

again, a contradiction. This completes the proof of Property 4.

We now prove Property 5. Suppose that 푢ˆ maximizes |x푢| and 푣ˆ maximizes |z푣|, and, without loss of generality, 푣ˆ ∈ 푁. Let

{︂ ⃒ }︂ {︂ ⃒ }︂ ⃒ x푢^ ⃒ 2 |z푣^| 퐴 = 푤 ⃒ x푤 > and 퐵 = 푤 ⃒ z푤 > min{1, 푐 } . ⃒ 3 ⃒ 100

We have ∑︁ (︀ )︀ 휆1x푢^ = x푣 ≤ x푢^ |퐴| + (푛 − |퐴|)/3 푣∼푢^

and 휆1 ≥ 푛/2, and so |퐴| ≥ 푛/4. In addition,

∑︁ ‖x‖2 푛 ‖x‖2 1 ≥ x2 ≥ |퐴| ∞ ≥ ∞ , 푣 9 36 푣∈퐴

1/2 so ‖x‖∞ ≤ 6/푛 . Similarly,

∑︁ (︂ min{1, 푐2} )︂ 휆 z = z ≤ |z | |퐵| + (푛 − |퐵|) 푛 푣^ 푢 푣^ 100 푢∼푣^

2 2 and |휆푛(퐺)| ≥ min{1, 푐 } 푛/50, and so |퐵| ≥ min{1, 푐 } 푛/100. In addition,

∑︁ ‖z‖2 ‖z‖2 1 ≥ z2 ≥ |퐵| min{1, 푐4} ∞ ≥ min{1, 푐6} 푛 ∞ , 푢 104 106 푢∈퐵

3 −3 1/2 so ‖z‖∞ ≤ 10 max{1, 푐 }/푛 . This completes the proof of Property 5.

Finally, we prove Property 6. Let 퐺˜ = (푉 (퐺˜), 퐸(퐺˜)) be the graph resulting from deleting 푢 from 퐺 and cloning 푣, namely,

푉 (퐺˜) = {푣′} ∪ [︀푉 (퐺) ∖ {푢}]︀ and 퐸(퐺˜) = 퐸(퐺 ∖ {푢}) ∪ {{푣′, 푤} | {푣, 푤} ∈ 퐸(퐺)}.

92 Let 퐴˜ be the adjacency matrix of 퐺˜, and ˜x and ˜z equal

⎧ ⎧ ′ ′ ⎨⎪x푤 푤 ̸= 푣 ⎨⎪z푤 푤 ̸= 푣 ˜x푤 = and ˜z푤 = . ′ ′ ⎩⎪x푣 푤 = 푣 , ⎩⎪z푣 푤 = 푣 .

Then

푇 2 2 ˜x ˜x = 1 − x푢 + x푣,

푇 2 2 ˜z ˜z = 1 − z푢 + z푣,

and

푇 ˜ 2 2 ˜x 퐴˜x = 휆1 − 2휆1x푢 + 2휆1x푣 − 2퐴푢푣x푢x푣, 푇 ˜ 2 2 ˜z 퐴˜z = 휆푛 − 2휆푛z푢 + 2휆푛z푣 − 2퐴푢푣z푢z푣.

By the optimality of 퐺,

(︃ )︃ ˜x푇 퐴˜˜x ˜z푇 퐴˜˜z 휆 (퐺) − 푐 휆 (퐺) − − 푐 ≥ 0. 1 푛 ˜x푇 ˜x ˜z푇 ˜z

Rearranging terms, we have

2 2 2 2 휆1x푢 − 휆1x푣 + 2퐴푢,푣x푢x푣 휆푛z푢 − 휆푛z푣 + 2퐴푢,푣z푢z푣 2 2 − 푐 2 2 ≥ 0, 1 − x푢 + x푣 1 − z푢 + z푣

and so

[︂ 2 2 2 2 2 2 2 휆1(x푢 − x푣) + 2퐴푢,푣x푢x푣 (휆1x푢 − 푐 휆푛z푢) − (휆1x푣 − 푐 휆푛z푣) ≥ − 2 2 1 − x푢 + x푣 2 2 2 ]︂ 휆푛(z푢 − z푣) + 2퐴푢,푣z푢z푣 − 푐 2 2 . 1 − z푢 + z푣

The infinity norms of x and z are at most 6/푛1/2 and 103 max{1, 푐−3}/푛1/2, respectively,

93 and so we can upper bound

⃒ 2 2 2 ⃒ ⃒ 4 2 ⃒ 4 2 ⃒휆1(x푢 − x푣) + 2퐴푢,푣x푢x푣 ⃒ ⃒4푛‖x‖∞ + 2‖x‖∞ ⃒ 4 · 6 + 2 · 6 ⃒ 2 2 ⃒ ≤ ⃒ 2 ⃒ ≤ , ⃒ 1 − x푢 + x푣 ⃒ ⃒ 1 − 2‖x‖∞ ⃒ 푛/2 and

⃒ 2 2 2 ⃒ ⃒ 4 2 ⃒ 12 6 −12 ⃒휆푛(z푢 − z푣) + 2퐴푢,푣z푢z푣 ⃒ ⃒2푛‖z‖∞ + 2‖z‖∞ ⃒ 2(10 + 10 ) max{1, 푐 } ⃒ 2 2 ⃒ ≤ ⃒ 2 ⃒ ≤ ⃒ 1 − z푢 + z푣 ⃒ ⃒ 1 − 2‖z‖∞ ⃒ 푛/2 for 푛 ≥ 4 · 106 max{1, 푐−6}. Our choice of 푢 and 푣 was arbitrary, and so

⃒ 2 2 2 2 ⃒ 12 −12 ⃒(휆1x푢 − 푐 휆푛z푢) − (휆1x푣 − 푐 휆푛z푣)⃒ ≤ 5 · 10 max{푐, 푐 }/푛

∑︀ 2 2 for all 푢, 푣 ∈ 푉 . Noting that 푢(휆1x푢 − 푐 휆푛z푢) = 휆1 − 푐 휆푛 completes the proof.

The generality of this result implies that similar techniques to those used to prove the spread conjecture could be used to understand the behavior of graphs that maximize 휆1 −푐 휆푛 and how the graphs vary with the choice of 푐 ∈ (0, ∞).

94 Chapter 4

Force-Directed Layouts

4.1 Introduction

Graph drawing is an area at the intersection of mathematics, computer science, and more qualitative fields. Despite the extensive literature in the field, in many ways the concept of what constitutes the optimal drawing of a graph is heuristic at best, and subjective at worst. For a general review of the major areas of research in graph drawing, we refer the reader to [7, 57]. In this chapter, we briefly introduce the broad class of force-directed layouts of graphs and consider two prominent force-based techniques for drawing a graph in low-dimensional Euclidean space: Tutte’s spring embedding and metric multidimensional scaling. The concept of a force-directed layout is somewhat ambiguous, but, loosely defined, it is a technique for drawing a graph in a low-dimensional Euclidean space (usually dimension ≤ 3) by applying “forces” between the set of vertices and/or edges. In a force-directed layout, vertices connected by an edge (or at a small graph distance from each other) tend to be close to each other in the resulting layout. Below we briefly introduce a number of prominent force-directed layouts, including Tutte’s spring embedding, Eades’ algorithm, the Kamada-Kawai objective and algorithm, and the more recent UMAP algorithm. In his 19631 work titled “How to Draw a Graph,” Tutte found an elegant technique

1The paper was received in May 1962, but was published in 1963. For this reason, in [124], the year is

95 to produce planar embeddings of planar graphs that also minimize “energy” (the sum of squared edge lengths) in some sense [116]. In particular, for a three-connected planar graph, he showed that if the outer face of the graph is fixed as the complement of some convex region in the plane, and every other point is located at the mass center of its neighbors, then the resulting embedding is planar. This result is now known as Tutte’s spring embedding theorem, and is considered by many to be the first example of a force-directed layout. One of the major questions that this result does not treat is how to best embed the outer face. In Section 4.2, we investigate this question, consider connections to a Schur complement, and provide some theoretical results for this Schur complement using a discrete energy-only trace theorem. An algorithm for general graphs was later proposed by Eades in 1984, in which vertices are placed in a random initial position in the plane, a logarithmic spring force is applied to each edge and a repulsive inverse square-root law force is applied to each pair of non- adjacent vertices [38]. The algorithm proceeds by iteratively moving each vertex towards its local equilibrium. Five years later, Kamada and Kawai proposed an objective function corresponding to the situation in which each pair of vertices are connected by a spring with equilibrium length given by their graph distance and spring constant given by the inverse square of their graph distance [55]. Their algorithm for locally minimizing this objective consists of choosing vertices iteratively based on the magnitude of the associated derivative of the objective, and locally minimizing with respect to that vertex. The associated objective function is a specific example of metric multidimensional scaling, and both the objective function and the proposed algorithm are quite popular in practice (see, for instance, popular packages such as GRAPHVIZ [39] and the SMACOF package in R). In Section 4.3, we provide a theoretical analysis of the Kamada-Kawai objective function. In particular, we prove a number of structural results regarding optimal layouts, provide algorithmic lower bounds for the optimization problem, and propose a polynomial time randomized approximation scheme for drawing low diameter graphs. Most recently, a new force-based layout algorithm called UMAP was proposed as an referred to as 1962.

96 alternative to the popular t-SNE algorithm [82]. The UMAP algorithm takes a data set as input, constructs a weighted graph, and performs a fairly complex force-directed layout in which attractive and repulsive forces governed by a number of hyper-parameters are applied. For the exact details of this approach, we refer the reader to [82, Section 3.2]. Though the UMAP algorithm is not specifically analyzed in this chapter, it is likely that some the techniques proposed in this chapter can be applied (given a much more complicated analysis) to more complicated force-directed layout objectives, such as that of the UMAP algorithm. In addition, the popularity of the UMAP algorithm illustrates the continued importance and relevance of force-based techniques and their analysis. In the remainder of this chapter, we investigate both Tutte’s spring embedding and the Kamada-Kawai objective function. In Section 4.2, we formally describe Tutte’s spring embedding and consider theoretical questions regarding the best choice of boundary. In particular, we consider connections to a Schur complement and, using a discrete version of a trace theorem from the theory of elliptic PDEs, provide a spectral equivalence result for this matrix. In Section 4.3, we formally define the Kamada-Kawai objective, and prove a number of theoretical results for this optimization problem. We lower bound the objective value and upper bound the diameter of any optimal layout, prove that a gap version of the optimization problem is NP-hard even for bounded diameter graph metrics, and provide a polynomial time randomized approximation scheme for low diameter graphs.

97 4.2 Tutte’s Spring Embedding and a Trace Theorem

The question of how to “best” draw a graph in the plane is not an easy one to answer, largely due to the ambiguity in the objective. When energy (i.e. Hall’s energy, the sum of squared distances between adjacent vertices) minimization is desired, the optimal embedding in the plane is given by the two-dimensional diffusion map induced by the eigenvectors of the two smallest non-zero eigenvalues of the graph Laplacian [60, 61, 62]. This general class of graph drawing techniques is referred to as spectral layouts. When drawing a planar graph, often a planar embedding (a drawing in which edges do not intersect) is desirable. However, spectral layouts of planar graphs are not guaranteed to be planar. When looking at a triangulation of a given domain, it is commonplace for the near-boundary points of the spectral layout to “grow” out of the boundary, or lack any resemblance to a planar embedding. For instance, see the spectral layout of a random triangulation of a disk in Figure 4-1.

Tutte’s spring embedding is an elegant technique to produce planar embeddings of planar graphs that also minimize energy in some sense [116]. In particular, for a three-connected planar graph, if the outer face of the graph is fixed as the complement of some convex region in the plane, and every other point is located at the mass center of its neighbors, then the resulting embedding is planar. This embedding minimizes Hall’s energy, conditional on the embedding of the boundary face. This result is now known as Tutte’s spring embedding theorem. While this result is well known (see [59], for example), it is not so obvious how to embed the outer face. This, of course, should vary from case to case, depending on the dynamics of the interior.

In this section, we investigate how to embed the boundary face so that the resulting drawing is planar and Hall’s energy is minimized (subject to some normalization). In what follows, we produce an algorithm with theoretical guarantees for a large class of three- connected planar graphs. Our analysis begins by observing that the Schur complement of the graph Laplacian with respect to the interior vertices is, in some sense, the correct matrix to consider when choosing an optimal embedding of boundary vertices. See Figure 4-2

98 (a) Circle (b) Spectral Layout (c) Schur Complement Layout

Figure 4-1: A Delaunay triangulation of 1250 points randomly generated on the disk (A), its non-planar spectral layout (B), and a planar layout using a spring embedding of the Schur complement of the graph Laplacian with respect to the interior vertices (C).

for a visual example of a spring embedding using the two minimal non-trivial eigenvectors of the Schur complement. In order to theoretically understand the behavior of the Schur complement, we prove a discrete trace theorem. Trace theorems are a class of results in the theory of partial differential equations relating norms on the domain to norms on the boundary, which are used to provide a priori estimates on the Dirichlet integral of functions with given data on the boundary. We construct a discrete version of a trace theorem in the plane for energy-only semi-norms. Using a discrete trace theorem, we show that this Schur complement is spectrally equivalent to the boundary Laplacian to the one-half power. This spectral equivalence proves the existence of low energy (w.r.t. the Schur complement) convex and planar embeddings of the boundary, but is also of independent interest and applicability in the study of spectral properties of planar graphs. These theoretical guarantees give rise to a simple graph drawing algorithm with provable guarantees.

The remainder of this section is as follows. In Subsection 4.2.1, we formally introduce Tutte’s spring embedding theorem, describe the optimization problem under consideration, illustrate the connection to a Schur complement, and, conditional on a spectral equivalence, describe a simple algorithm with provable guarantees. In Subsection 4.2.2, we prove the aforementioned spectral equivalence for a large class of three-connected planar graphs. In particular, we consider trace theorems for Lipschitz domains from the theory of elliptic

99 (a) Laplacian Embedding (b) Schur Complement Embedding

Figure 4-2: A visual example of embeddings of the 2D finite element discretization graph 3elt, taken from the SuiteSparse Matrix Collection [27]. Figure (A) is the non-planar spectral layout of this 2D mesh, and Figure (B) is a planar spring embedding of the mesh using the minimal non-trivial eigenvectors of the Schur complement to embed the boundary.

partial differential equations, prove discrete energy-only variants of these results for a large class of graphs in the plane, and show that the Schur complement with respect to the interior is spectrally equivalent to the boundary Laplacian to the one-half power.

Definitions and Notation

Let 퐺 = (푉, 퐸), 푉 = {1, ..., 푛}, 퐸 ⊂ {푒 ⊂ 푉 | |푒| = 2}, be a simple, connected, undirected graph. A graph 퐺 is 푘-connected if it remains connected upon the removal of any 푘 − 1 vertices, and is planar if it can be drawn in the plane such that no edges intersect (save for adjacent edges at their mutual endpoint). A face of a planar embedding of a graph is a region of the plane bounded by edges (including the outer infinite region, referred to as the outer

face). Let 풢푛 be the set of all ordered pairs (퐺, Γ), where 퐺 is a simple, undirected, planar, three-connected graph of order 푛 > 4, and Γ ⊂ 푉 is the set of vertices of some face of 퐺. Three-connectedness is an important property for planar graphs, which, by Steinitz’s theorem, guarantees that the graph is the skeleton of a convex polyhedron [107]. This characterization implies that, for three-connected graphs, the edges corresponding to each face in a planar embedding are uniquely determined by the graph. In particular, the set of

100 faces is simply the set of induced cycles, so we may refer to faces of the graph without specifying an embedding. One important corollary of this result is that, for 푛 ≥ 3, the vertices of any face form an induced simple cycle.

Let 푁퐺(푖) be the neighborhood of vertex 푖, 푁퐺(푆) be the union of the neighborhoods

of the vertices in 푆, and 푑퐺(푖, 푗) be the distance between vertices 푖 and 푗 in the graph 퐺. When the associated graph is obvious, we may remove the subscript. Let 푑(푖) be the

degree of vertex 푖. Let 퐺[푆] be the graph induced by the vertices 푆, and 푑푆(푖, 푗) be the distance between vertices 푖 and 푗 in 퐺[푆]. If 퐻 is a subgraph of 퐺, we write 퐻 ⊂ 퐺. The

Cartesian product 퐺1퐺2 between 퐺1 = (푉1, 퐸1) and 퐺2 = (푉2, 퐸2) is the graph with vertices (푣1, 푣2) ∈ 푉1 × 푉2 and edge {(푢1, 푢2), (푣1, 푣2)} ∈ 퐸 if (푢1, 푣1) ∈ 퐸1 and 푢2 = 푣2, 푛×푛 or 푢1 = 푣1 and (푢2, 푣2) ∈ 퐸2. The graph Laplacian 퐿퐺 ∈ R of 퐺 is the symmetric positive semi-definite matrix defined by

∑︁ 2 ⟨퐿퐺푥, 푥⟩ = (푥푖 − 푥푗) , {푖,푗}∈퐸

and, in general, a matrix is the graph Laplacian of some weighted graph if it is symmetric diagonally dominant, has non-positive off-diagonal entries, and the vector 1 := (1, ..., 1)푇

푡ℎ 푡ℎ lies in its nullspace. Given a matrix 퐴, we denote the 푖 row by 퐴푖,·, the 푗 column by 퐴·,푗, 푡ℎ 푡ℎ and the entry in the 푖 row and 푗 column by 퐴푖,푗.

4.2.1 Spring Embeddings and a Schur Complement

Here and in what follows, we refer to Γ as the “boundary” of the graph 퐺, 푉 ∖Γ as the

1/2 “interior,” and generally assume 푛Γ := |Γ| to be relatively large (typically 푛Γ = Θ(푛 )). The concept of a “boundary” face is somewhat arbitrary, but, depending on the application from which the graph originated (i.e., a discretization of some domain), one face is often already designated as the boundary face. If a face has not been designated, choosing the largest induced cycle is a reasonable choice. By producing a planar drawing of 퐺 in the plane and traversing the embedding, one can easily find all the induced cycles of 퐺 in linear time and space [20].

101 푛×2 Without loss of generality, suppose that Γ = {푛 − 푛Γ + 1, ..., 푛}. A matrix 푋 ∈ R is said to be a planar embedding of 퐺 if the drawing of 퐺 using straight lines and with vertex 푖

푛Γ×2 located at coordinates 푋푖,· for all 푖 ∈ 푉 is a planar drawing. A matrix 푋Γ ∈ R is said to be a convex embedding of Γ if the embedding is planar and every point is an extreme

푛Γ point of the convex hull conv({[푋Γ]푖,·}푖=1). Tutte’s spring embedding theorem states that if

푋Γ is a convex embedding of Γ, then the system of equations

⎧ ⎪ 1 ∑︀ ⎨ 푑(푖) 푗∈푁(푖) 푋푗,· 푖 = 1, ..., 푛 − 푛Γ 푋푖,· = ⎪ ⎩[푋Γ]푖−(푛−푛Γ),· 푖 = 푛 − 푛Γ + 1, ..., 푛 has a unique solution 푋, and this solution is a planar embedding of 퐺 [116].

We can write both the Laplacian and embedding of 퐺 in block-notation, differentiating between interior and boundary vertices as follows:

⎛ ⎞ ⎛ ⎞ 퐿표 + 퐷표 −퐴표,Γ 푛×푛 푋표 푛×2 퐿퐺 = ⎝ ⎠ ∈ R , 푋 = ⎝ ⎠ ∈ R , 푇 −퐴표,Γ 퐿Γ + 퐷Γ 푋Γ

(푛−푛Γ)×(푛−푛Γ) 푛Γ×푛Γ (푛−푛Γ)×푛Γ (푛−푛Γ)×2 where 퐿표, 퐷표 ∈ R , 퐿Γ, 퐷Γ ∈ R , 퐴표,Γ ∈ R , 푋표 ∈ R ,

푛Γ×2 푋Γ ∈ R , and 퐿표 and 퐿Γ are the Laplacians of 퐺[푉 ∖Γ] and 퐺[Γ], respectively. Using block notation, the system of equations for the Tutte spring embedding of some convex

embedding 푋Γ is given by

−1 푋표 = (퐷표 + 퐷[퐿표]) [(퐷[퐿표] − 퐿표)푋표 + 퐴표,Γ푋Γ], where 퐷[퐴] is the diagonal matrix with diagonal entries given by the diagonal of 퐴. The unique solution to this system is

−1 푋표 = (퐿표 + 퐷표) 퐴표,Γ푋Γ.

We note that this choice of 푋표 not only guarantees a planar embedding of 퐺, but also

102 minimizes Hall’s energy, namely,

−1 arg min ℎ(푋) = (퐿표 + 퐷표) 퐴표,Γ푋Γ, 푋표 where ℎ(푋) := Tr(푋푇 퐿푋) (see [62] for more on Hall’s energy).

While Tutte’s theorem is a very powerful result, guaranteeing that, given a convex embedding of any face, the energy minimizing embedding of the remaining vertices results in a planar embedding, it gives no direction as to how this outer face should be embedded.

We consider embeddings of the boundary that minimize Hall’s energy, given some

푇 푇 normalization. We consider embeddings that satisfy 푋Γ 푋Γ = 퐼 and 푋Γ 1 = 0, though other normalizations, such as 푋푇 푋 = 퐼 and 푋푇 1 = 0, would be equally appropriate. The analysis that follows in this section can be readily applied to this alternate normalization, but it does require the additional step of verifying a norm equivalence between 푉 and Γ for the harmonic extension of low energy vectors, which can be produced relatively easily for the class of graphs considered in Subsection 4.2.2. In addition, alternate normalization techniques, such as the introduction of negative weights on the boundary cycle, can also

푇 be considered. Let 풳 be the set of all planar embeddings 푋Γ that satisfy 푋Γ 푋Γ = 퐼, 푇 −1 푋Γ 1 = 0, and for which the spring embedding 푋 with 푋표 = (퐿표 + 퐷표) 퐴표,Γ푋Γ is planar. We consider the optimization problem

min ℎ(푋) 푠.푡. 푋Γ ∈ cl(풳 ), (4.1) where cl(·) is the closure of a set. The set 풳 is not closed, and the minimizer of (4.1) may be non-planar, but must be arbitrarily close to a planar embedding. The normalizations

푇 푇 푋Γ 1 = 0 and 푋Γ 푋Γ = 퐼 ensure that the solution does not degenerate into a single point or line. In what follows we are primarily concerned with approximately solving this

optimization problem and connecting this problem to the Schur complement of 퐿퐺 with respect to 푉 ∖Γ. It is unclear whether there exists an efficient algorithm to solve (4.1) exactly or if the associated decision problem is NP-hard. If (4.1) is NP-hard, it seems rather difficult to verify that this is indeed the case.

103 A Schur Complement

Given some choice of 푋Γ, by Tutte’s theorem the minimum value of ℎ(푋) is attained when −1 푋표 = (퐿표 + 퐷표) 퐴표,Γ푋Γ, and given by

⎡ ⎛ ⎞ ⎛ ⎞⎤ −1 (︁ )︁ 퐿표 + 퐷표 −퐴표,Γ (퐿표 + 퐷표) 퐴표,Γ푋Γ −1 푇 푇 Tr ⎣ [(퐿표 + 퐷표) 퐴표,Γ푋Γ] 푋 ⎝ ⎠ ⎝ ⎠⎦ Γ 푇 −퐴표,Γ 퐿Γ + 퐷Γ 푋Γ

(︀ 푇 [︀ 푇 −1 ]︀ )︀ = Tr 푋Γ 퐿Γ + 퐷Γ − 퐴표,Γ(퐿표 + 퐷표) 퐴표,Γ 푋Γ (︀ 푇 )︀ = Tr 푋Γ 푆Γ푋Γ ,

where 푆Γ is the Schur complement of 퐿퐺 with respect to 푉 ∖Γ,

푇 −1 푆Γ = 퐿Γ + 퐷Γ − 퐴표,Γ(퐿표 + 퐷표) 퐴표,Γ.

For this reason, we can instead consider the optimization problem

min ℎΓ(푋Γ) 푠.푡. 푋Γ ∈ cl(풳 ), (4.2) where (︀ 푇 )︀ ℎΓ(푋Γ) := Tr 푋Γ 푆Γ푋Γ .

Therefore, if the minimal two non-trivial eigenvectors of 푆Γ produce a planar spring embedding, then this is the exact solution of (4.2). However, a priori, there is no reason to think that this drawing would be planar. It turns out that, for a large class of graphs with some macroscopic structure, this is often the case (see, for example, Figures 4-1 and 4-2), and, in the rare instance that it is not, through a spectral equivalence result, a low energy planar and convex embedding of the boundary always exists. This fact follows from the 1/2 spectral equivalence of 푆Γ and 퐿Γ , which is shown in Subsection 4.2.2. First, we present a number of basic properties of the Schur complement of a graph Laplacian. For more information on the Schur complement, we refer the reader to [18, 40, 134].

104 푛×푛 Proposition 8. Let 퐺 = (푉, 퐸), 푛 = |푉 |, be a graph and 퐿퐺 ∈ R the associated graph 푛 Laplacian. Let 퐿퐺 and vectors 푣 ∈ R be written in block form

⎛ ⎞ ⎛ ⎞ 퐿11 퐿12 푣1 퐿(퐺) = ⎝ ⎠ , 푣 = ⎝ ⎠ , 퐿21 퐿22 푣2

푚×푚 푚 where 퐿22 ∈ R , 푣2 ∈ R , and 퐿12 ̸= 0. Then

−1 (1) 푆 = 퐿22 − 퐿21퐿11 퐿12 is a graph Laplacian,

∑︀푚 푇 푇 −1 (2) 푖=1(푒푖 퐿221푚)푒푖푒푖 − 퐿21퐿11 퐿12 is a graph Laplacian,

(3) ⟨푆푤, 푤⟩ = inf{⟨퐿푣, 푣⟩|푣2 = 푤}.

⎛ −1 ⎞ −퐿11 퐿12 Proof. Let 푃 = ⎝ ⎠ ∈ R푛×푚. Then 퐼

⎛ ⎞ ⎛ −1 ⎞ (︁ )︁ 퐿11 퐿12 −퐿 퐿12 푇 −1 11 −1 푃 퐿푃 = −퐿21퐿11 퐼 ⎝ ⎠ ⎝ ⎠ = 퐿22 − 퐿21퐿11 퐿12 = 푆. 퐿21 퐿22 퐼

−1 Because 퐿111푛−푚 + 퐿121푚 = 0, we have 1푛−푚 = −퐿11 퐿121푚. Therefore 푃 1푚 = 1푛, and, as a result, 푇 푇 푇 푆1푚 = 푃 퐿푃 1푚 = 푃 퐿1푛 = 푃 0 = 0.

In addition,

[︃ 푚 ]︃ [︂ 푚 ]︂ ∑︁ 푇 푇 −1 ∑︁ 푇 푇 (푒푖 퐿221푚)푒푖푒푖 − 퐿21퐿11 퐿12 1푚 = (푒푖 퐿221푚)푒푖푒푖 − 퐿22 1푚 + 푆1푚 푖=1 푖=1 푚 ∑︁ 푇 = (푒푖 퐿221푚)푒푖 − 퐿221푚 푖=1 [︂ 푚 ]︂ ∑︁ 푇 = 푒푖푒푖 − 퐼푚 퐿221푚 = 0. 푖=1

−1 −1 퐿11 is an M-matrix, so 퐿11 is a non-negative matrix. 퐿21퐿11 퐿12 is the product of three non- negative matrices, and so must also be non-negative. Therefore, the off-diagonal entries of

105 ∑︀푚 푇 푇 −1 푆 and 푖=1(푒푖 퐿221)푒푖푒푖 − 퐿21퐿11 퐿12 are non-positive, and so both are graph Laplacians. Consider

⟨퐿푣, 푣⟩ = ⟨퐿11푣1, 푣1⟩ + 2⟨퐿12푣2, 푣1⟩ + ⟨퐿22푣2, 푣2⟩, with 푣2 fixed. Because 퐿11 is symmetric positive definite, the minimum occurs when

휕 ⟨퐿푣, 푣⟩ = 2퐿11푣1 + 2퐿12푣2 = 0. 휕푣1

−1 Setting 푣1 = −퐿11 퐿12푣2, the desired result follows.

The above result illustrates that the Schur complement Laplacian 푆Γ is the sum of two 푇 −1 Laplacians 퐿Γ and 퐷Γ − 퐴표,Γ(퐿표 + 퐷표) 퐴표,Γ, where the first is the Laplacian of 퐺[Γ], and the second is a Laplacian representing the dynamics of the interior.

An Algorithm

Given the above analysis for 푆Γ, and assuming a spectral equivalence result of the form

1 1/2 1/2 ⟨퐿Γ 푥, 푥⟩ ≤ ⟨푆Γ푥, 푥⟩ ≤ 푐2 ⟨퐿Γ 푥, 푥⟩, (4.3) 푐1

푛Γ for all 푥 ∈ R and some constants 푐1 and 푐2 that are not too large, we can construct a simple

algorithm for producing a spring embedding. Constants 푐1 and 푐2 can be computed explicitly using the results of Subsection 4.2.2, and are reasonably sized if the graph resembles a discretization of some planar domain (i.e., a graph that satisfies the conditions of Theorem 15). Given these two facts, a very natural algorithm consists of the following steps. First we compute the two eigenvectors corresponding to the two minimal non-trivial eigenvalues of

the Schur complement 푆Γ. If these two eigenvectors produce a planar embedding, we are done, and have solved the optimization problem (4.2) exactly. Depending on the structure of the graph, this may often be the case. For instance, this occurs for the graphs in Figures 4-1 and 4-2, and the numerical results of [124] confirm that this also occurs often for discretizations of circular and rectangular domains. If this is not the case, then we consider

106 Algorithm 3 A Schur Complement-Based Spring Embedding Algorithm function SchurSPRING(퐺, Γ) 푛Γ×2 Compute the two minimal non-trivial eigenpairs 푋푎푙푔 ∈ R of 푆Γ if spring embedding of 푋푎푙푔 is non-planar then {︀ 2 (︀ 2휋푗 2휋푗 )︀}︀푛Γ 푋푛푒푤 ← cos , sin ; gap ← 1 푛Γ 푛Γ 푛Γ 푗=1 while spring embedding of 푋푛푒푤 is planar, gap > 0 do ˆ 푋 ← smooth(푋푛푒푤); 푋푎푙푔 ← 푋푛푒푤 ˆ ˆ 푇 ˆ 푋 ← 푋 − 11 푋/푛Γ ˆ 푇 ˆ ˆ −1/2 Solve [푋 푋]푄 = 푄Λ, 푄 orthogonal, Λ diagonal; set 푋푛푒푤 ← 푋푄Λ gap ← ℎΓ(푋푎푙푔) − ℎΓ(푋푛푒푤) end while end if Return 푋푎푙푔 end function

the boundary embedding

2 (︂ 2휋푗 2휋푗 )︂ [푋퐶 ]푗,· = cos , sin , 푗 = 1, ..., 푛Γ, 푛Γ 푛Γ 푛Γ

1/2 namely, the embedding of the two minimal non-trivial eigenvectors of 퐿Γ . This choice of boundary is convex and planar, and so the associated spring embedding is planar. By spectral equivalence,

휋 ℎΓ(푋퐶 ) ≤ 4푐2 sin ≤ 푐1푐2 min ℎΓ(푋Γ), 푛Γ 푋Γ∈cl(풳 )

and therefore, this algorithm already produces a 푐1푐2 approximation guarantee for (4.2).

The choice of 푋퐶 can often be improved, and a natural technique consists of smoothing

푋퐶 using 푆Γ and renormalizing until either the resulting drawing is no longer planar or the objective value no longer decreases. We describe this procedure in Algorithm 3.

We now discuss some of the finer details of the SchurSPRING(퐺, Γ) algorithm (Algo- rithm 3). Determining whether a drawing is planar can be done in near-linear time using the sweep line algorithm [102]. Also, in practice, it is advisable to replace conditions of the form gap > 0 in Algorithm 3 by gap > tol for some small value of tol, in order to ensure that the algorithm terminates after some finite number of steps. For a reasonable choice of

107 1/2 smoother, a constant tolerance, and 푛Γ = Θ(푛 ), the SchurSPRING(퐺, Γ) algorithm has complexity near-linear in 푛. The main cost of this procedure is due to the computations that

involve 푆Γ.

−1 The SchurSPRING(퐺, Γ) algorithm requires the repeated application of 푆Γ or 푆Γ in

order to compute the minimal eigenvectors of 푆Γ and also to perform smoothing. The Schur

complement 푆Γ is a dense matrix and requires the inversion of a (푛 − 푛Γ) × (푛 − 푛Γ) matrix, but can be represented as the composition of functions of sparse matrices. In practice,

푆Γ should never be formed explicitly. Rather, the operation of applying 푆Γ to a vector 푥

should occur in two steps. First, the sparse Laplacian system (퐿표 + 퐷표)푦 = 퐴표,Γ푥 should 푇 be solved for 푦, and then the product 푆푥 is given by 푆Γ푥 = (퐿Γ + 퐷Γ)푥 − 퐴표,Γ푦. Each ˜ application of 푆Γ is therefore an 푂(푛) procedure (using a nearly-linear time Laplacian −1 solver). The application of the inverse 푆Γ defined on the subspace {푥 | ⟨푥, 1⟩ = 0} also −1 requires the solution of a Laplacian system. As noted in [130], the action of 푆Γ on a vector 푥 ∈ {푥 | ⟨푥, 1⟩ = 0} is given by

⎛ ⎞−1 ⎛ ⎞ −1 (︁ )︁ 퐿표 + 퐷표 −퐴표,Γ 0 푆Γ 푥 = 0 퐼 ⎝ ⎠ ⎝ ⎠ , 푇 −퐴표,Γ 퐿Γ + 퐷Γ 푥 as verified by the computation

⎡⎛ ⎞ ⎛ ⎞⎤−1 ⎛ ⎞ 퐼 0 퐿 + 퐷 −퐴 0 [︀ −1 ]︀ (︁ )︁ 표 표 표,Γ 푆Γ 푆Γ 푥 = 푆Γ 0 퐼 ⎣⎝ ⎠ ⎝ ⎠⎦ ⎝ ⎠ 푇 −1 −퐴표,Γ(퐿표 + 퐷표) 퐼 0 푆Γ 푥 ⎛ ⎞−1 ⎛ ⎞ ⎛ ⎞ (︁ )︁ 퐿표 + 퐷표 −퐴표,Γ 퐼 0 0 = 푆Γ 0 퐼 ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ 푇 −1 0 푆Γ 퐴표,Γ(퐿표 + 퐷표) 퐼 푥 ⎛ ⎞−1 ⎛ ⎞ (︁ )︁ 퐿표 + 퐷표 −퐴표,Γ 0 = 푆Γ 0 퐼 ⎝ ⎠ ⎝ ⎠ = 푥. 0 푆Γ 푥

−1 Given that the application of 푆Γ has the same complexity as an application 푆Γ, the inverse power method is naturally preferred over the shifted power method for both smoothing and the computation of low energy eigenvectors.

108 4.2.2 A Discrete Trace Theorem and Spectral Equivalence

The main result of this subsection takes classical trace theorems from the theory of partial differential equations and extends them to a class of planar graphs. However, for our purposes, we require a stronger form of trace theorem, one between energy semi-norms (i.e., no ℓ2 term), which we refer to as “energy-only” trace theorems. These energy-only trace theorems imply their classical variants with ℓ2 terms almost immediately. We then 1/2 use these new results to prove the spectral equivalence of 푆Γ and 퐿Γ for the class of graphs under consideration. This class of graphs is rigorously defined below, but includes planar three-connected graphs that have some regular structure (such as graphs of finite element discretizations). For the sake of space and readibility, a number of the results of this subsection are stated for arbitrary constants, or constants larger than necessary. More complicated proofs with explicit constants (or, in some cases, improved constants) can be found in [124]. We begin by formally describing a classical trace theorem.

Let Ω ⊂ R푑 be a domain with boundary Γ = 훿Ω that, locally, is a graph of a Lipschitz function. 퐻1(Ω) is the Sobolev space of square integrable functions with square integrable weak gradient, with norm

∫︁ 2 2 2 2 2 2 2 ‖푢‖1,Ω = ‖∇푢‖퐿 (Ω) + ‖푢‖퐿 (Ω), where ‖푢‖퐿2(Ω) = 푢 푑푥. Ω

Let ∫︁∫︁ (휙(푥) − 휙(푦))2 ‖휙‖2 = ‖휙‖2 + 푑푥 푑푦 1/2,Γ 퐿2(Γ) 푑 Γ×Γ |푥 − 푦| for functions defined on Γ, and denote by 퐻1/2(Γ) the Sobolev space of functions defined on

1 the boundary Γ for which ‖ · ‖1/2,Γ is finite. The trace theorem for functions in 퐻 (Ω) is one of the most important and frequently used trace theorems in the theory of partial differential equations. More general results for traces on boundaries of Lipschitz domains, which involve 퐿푝 norms and fractional derivatives, are due to E. Gagliardo [42] (see also [24]).

Gagliardo’s theorem, when applied to the case of 퐻1(Ω) and 퐻1/2(Γ), states that if Ω ⊂ R푑

109 is a Lipschitz domain, then the norm equivalence

⃒ ‖휙‖1/2,Γ h inf{‖푢‖1,Ω ⃒ 푢|Γ = 휙} holds (the right hand side is indeed a norm on 퐻1/2(Γ)). These results are key tools in proving a priori estimates on the Dirichlet integral of functions with given data on the boundary of a domain Ω. Roughly speaking, a trace theorem gives a bound on the energy of a harmonic function via norm of the trace of the function on Γ = 휕Ω. In addition to the classical references given above, further details on trace theorems and their role in the analysis of PDEs (including the case of Lipschitz domains) can be found in [77, 87]. There are several analogues of this theorem for finite element spaces (finite dimensional subspaces of 퐻1(Ω)). For instance, in [86] it is shown that the finite element discretization of the Laplace-Beltrami operator on the boundary to the one-half power provides a norm which is equivalent to the 퐻1/2(Γ)-norm. Here we prove energy-only analogues of the classical trace theorem for graphs (퐺, Γ) ∈ 풢푛, using energy semi-norms

2 ∑︁ (휙(푝) − 휙(푞)) |푢|2 = ⟨퐿 푢, 푢⟩ and |휙|2 = . 퐺 퐺 Γ 푑2 (푝, 푞) 푝,푞∈Γ, 퐺 푝<푞

The energy semi-norm | · |퐺 is a discrete analogue of ‖∇푢‖퐿2(Ω), and the boundary ∫︀∫︀ (휙(푥)−휙(푦))2 semi-norm | · |Γ is a discrete analogue of the quantity Γ×Γ |푥−푦|2 푑푥 푑푦. In addition, by connectivity, | · |퐺 and | · |Γ are norms on the quotient space orthogonal to 1. We aim to prove that for any 휙 ∈ R푛Γ ,

1 |휙|Γ ≤ min |푢|퐺 ≤ 푐2 |휙|Γ 푐1 푢|Γ=휙 for some constants 푐1, 푐2 that do not depend on 푛Γ, 푛. We begin by proving these results for a simple class of graphs, and then extend our analysis to more general graphs. Some of the proofs of the below results are rather technical, and are omitted for the sake of readibility and space.

110 Trace Theorems for a Simple Class of Graphs

Let 퐺푘,ℓ = 퐶푘  푃ℓ be the Cartesian product of the 푘 vertex cycle 퐶푘 and the ℓ vertex path 푃ℓ, where 4ℓ < 푘 < 2푐ℓ for some constant 푐 ∈ N. The lower bound 4ℓ < 푘 is arbitrary in some sense, but is natural, given that the ratio of boundary length to in-radius of a convex region is

at least 2휋. Vertex (푖, 푗) in 퐺푘,ℓ corresponds to the product of 푖 ∈ 퐶푘 and 푗 ∈ 푃ℓ, 푖 = 1, ..., 푘, 푘 푘×ℓ 푗 = 1, ..., ℓ. The boundary of 퐺푘,ℓ is defined to be Γ = {(푖, 1)}푖=1. Let 푢 ∈ R and 푘 휙 ∈ R be functions on 퐺푘,ℓ and Γ, respectively, with 푢[(푖, 푗)] denoted by 푢푖,푗 and 휙[(푖, 1)]

denoted by 휙푖. For the remainder of the section, we consider the natural periodic extension

of the vertices (푖, 푗) and the functions 푢푖,푗 and 휙푖 to the indices 푖 ∈ Z. In particular, if * * 푖 ̸∈ {1, ..., 푘}, then (푖, 푗) := (푖 , 푗), 휙푖 := 휙푖* , and 푢푖,푗 := 푢푖*,푗, where 푖 ∈ {1, ..., 푘} and * * 푖 = 푖 mod 푘. Let 퐺푘,ℓ be the graph resulting from adding to 퐺푘,ℓ all edges of the form {(푖, 푗), (푖 − 1, 푗 + 1)} and {(푖, 푗), (푖 + 1, 푗 + 1)}, 푖 = 1, ..., 푘, 푗 = 1, ..., ℓ − 1. We provide

* a visual example of 퐺푘,ℓ and 퐺푘,ℓ in Figure 4-3. First, we prove a trace theorem for 퐺푘,ℓ. The proof of the trace theorem consists of two lemmas. Lemma 10 shows that the discrete trace operator is bounded, and Lemma 11 shows that it has a continuous right inverse. Taken

together, these lemmas imply our desired result for 퐺푘,ℓ. We then extend this result to all * graphs 퐻 satisfying 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ.

푘 Lemma 10. Let 퐺 = 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, with boundary Γ = {(푖, 1)}푖=1. For any 푘×ℓ √ 푢 ∈ R , the vector 휙 = 푢|Γ satisfies |휙|Γ ≤ 4 푐 |푢|퐺.

Proof. We can decompose 휙푝+ℎ − 휙ℎ into a sum of differences, given by

푠−1 ℎ 푠−1 ∑︁ ∑︁ ∑︁ 휙푝+ℎ − 휙푝 = 푢푝+ℎ,푖 − 푢푝+ℎ,푖+1 + 푢푝+푖,푠 − 푢푝+푖−1,푠 + 푢푝,푠−푖+1 − 푢푝,푠−푖, 푖=1 푖=1 푖=1

111 ⌈︀ ℎ ⌉︀ where 푠 = 푐 is a function of ℎ. By Cauchy-Schwarz,

푘 ⌊푘/2⌋ (︂ )︂2 푘 ⌊푘/2⌋ (︃ 푠−1 )︃2 ∑︁ ∑︁ 휙푝+ℎ − 휙푝 ∑︁ ∑︁ 1 ∑︁ ≤ 3 푢 − 푢 ℎ ℎ 푝+ℎ,푖 푝+ℎ,푖+1 푝=1 ℎ=1 푝=1 ℎ=1 푖=1 푘 ⌊푘/2⌋ (︃ ℎ )︃2 ∑︁ ∑︁ 1 ∑︁ + 3 푢 − 푢 ℎ 푝+푖,푠 푝+푖−1,푠 푝=1 ℎ=1 푖=1 푘 ⌊푘/2⌋ (︃ 푠−1 )︃2 ∑︁ ∑︁ 1 ∑︁ + 3 푢 − 푢 . ℎ 푝,푠−푖+1 푝,푠−푖 푝=1 ℎ=1 푖=1

We bound the first and the second term separately. The third term is identical to the first. Using Hardy’s inequality [50, Theorem 326], we can bound the first term by

푘 ⌊푘/2⌋ (︃ 푠−1 )︃2 푘 ℓ (︃ 푠−1 )︃2 ∑︁ ∑︁ 1 ∑︁ ∑︁ ∑︁ 1 ∑︁ ∑︁ 푠2 푢 − 푢 = 푢 − 푢 ℎ 푝,푖 푝,푖+1 푠 푝,푖 푝,푖+1 ℎ2 푝=1 ℎ=1 푖=1 푝=1 푠=1 푖=1 ℎ:⌈ℎ/푐⌉=푠 1≤ℎ≤⌊푘/2⌋

푘 ℓ−1 2 ∑︁ ∑︁ 2 ∑︁ 푠 ≤ 4 (︀푢 − 푢 )︀ . 푝,푠 푝,푠+1 ℎ2 푝=1 푠=1 ℎ:⌈ℎ/푐⌉=푠 1≤ℎ≤⌊푘/2⌋

We have

푐푠 ∑︁ 푠2 ∑︁ 1 푠2(푐 − 1) 4(푐 − 1) 1 ≤ 푠2 ≤ ≤ ≤ ℎ2 푖2 (푐(푠 − 1) + 1)2 (푐 + 1)2 2 ℎ:⌈ℎ/푐⌉=푠 푖=푐(푠−1)+1 1≤ℎ≤⌊푘/2⌋

for 푠 ≥ 2 (푐 ≥ 3, by definition), and for 푠 = 1,

∞ ∑︁ 1 ∑︁ 1 휋2 ≤ = . ℎ2 푖2 6 ℎ:⌈ℎ/푐⌉=1 푖=1 1≤ℎ≤⌊푘/2⌋

112 * (a) 퐺16,3 = 퐶16푃3 (b) 퐺16,3

* Figure 4-3: A visual example of 퐺푘,ℓ and 퐺푘,ℓ for 푘 = 16, ℓ = 3. The boundary Γ is given by the outer (or, by symmetry, inner) cycle.

Therefore, we can bound the first term by

2 푘 ⌊푘/2⌋ (︃ 푠−1 )︃ 2 푘 ℓ−1 ∑︁ ∑︁ 1 ∑︁ 2휋 ∑︁ ∑︁ 2 푢 − 푢 ≤ (︀푢 − 푢 )︀ . ℎ 푝,푖 푝,푖+1 3 푝,푠 푝,푠+1 푝=1 ℎ=1 푖=1 푝=1 푠=1

For the second term, we have

푘 ⌊푘/2⌋ (︃ ℎ )︃2 푘 ⌊푘/2⌋ ℎ ∑︁ ∑︁ 1 ∑︁ ∑︁ ∑︁ 1 ∑︁ 2 푢 − 푢 ≤ (︀푢 − 푢 )︀ ℎ 푝+푖,푠 푝+푖−1,푠 ℎ 푝+푖,푠 푝+푖−1,푠 푝=1 ℎ=1 푖=1 푝=1 ℎ=1 푖=1 푘 ℓ ∑︁ ∑︁ (︀ )︀2 ≤ 푐 푢푝+1,푠 − 푢푝,푠 . 푝=1 푠=1

2 2 2 Combining these bounds produces the result |휙|Γ ≤ max{4휋 , 3푐} |푢|퐺. Noting that 푐 ≥ 3 gives the desired result.

In order to show that the discrete trace operator has a continuous right inverse, we need to produce a provably low-energy extension of an arbitrary function on Γ. Let

푘 푗−1 1 ∑︁ 1 ∑︁ 푎 = 휙 and 푎 = 휙 . 푘 푝 푖,푗 2푗 − 1 푖+ℎ 푝=1 ℎ=1−푗

113 We consider the extension

푗 − 1 (︂ 푗 − 1)︂ 푢 = 푎 + 1 − 푎 . (4.4) 푖,푗 ℓ − 1 ℓ − 1 푖,푗

The proof of following inverse result for the discrete trace operator is similar in technique to that of Lemma 10, but significantly more involved.

푘 Lemma 11. Let 퐺 = 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, with boundary Γ = {(푖, 1)}푖=1. For any 푘 √ 휙 ∈ R , the vector 푢 defined by (4.4) satisfies |푢|퐺 ≤ 5 푐 |휙|Γ.

2 Proof. We can decompose |푢|퐺 into two parts, namely,

푘 ℓ 푘 ℓ−1 2 ∑︁ ∑︁ 2 ∑︁ ∑︁ 2 |푢|퐺 = (푢푖+1,푗 − 푢푖,푗) + (푢푖,푗+1 − 푢푖,푗) . 푖=1 푗=1 푖=1 푗=1

We bound each sum separately, beginning with the first. We have

(︂ 푗 − 1)︂ (︂ 푗 − 1)︂ 휙 − 휙 푢 − 푢 = 1 − (푎 − 푎 ) = 1 − 푖+푗 푖+1−푗 . 푖+1,푗 푖,푗 ℓ − 1 푖+1,푗 푖,푗 ℓ − 1 2푗 − 1

Squaring both sides and noting that 4ℓ < 푘, we have

푘 ℓ 푘 ℓ [︂ ]︂2 푘 2ℓ−1 [︂ ]︂2 ∑︁ ∑︁ ∑︁ ∑︁ 휙푖+푗 − 휙푖+1−푗 ∑︁ ∑︁ 휙푝+ℎ − 휙푝 (푢 − 푢 )2 ≤ ≤ ≤ |휙|2 . 푖+1,푗 푖,푗 2푗 − 1 ℎ Γ 푖=1 푗=1 푖=1 푗=1 푝=1 ℎ=1

We now consider the second sum. Each term can be decomposed as

푎 − 푎 (︂ 푗 )︂ 푢 − 푢 = 푖,푗 + 1 − [푎 − 푎 ], 푖,푗+1 푖,푗 ℓ − 1 ℓ − 1 푖,푗+1 푖,푗 which leads to the upper bound

푘 ℓ−1 푘 ℓ−1 [︂ ]︂2 푘 ℓ−1 ∑︁ ∑︁ ∑︁ ∑︁ 푎 − 푎푖,푗 ∑︁ ∑︁ (푢 − 푢 )2 ≤ 2 + 2 (푎 − 푎 )2. 푖,푗+1 푖,푗 ℓ − 1 푖,푗+1 푖,푗 푖=1 푗=1 푖=1 푗=1 푖=1 푗=1

We estimate the two terms in the previous equation separately, beginning with the first. The

114 difference 푎 − 푎푖,푗 can be written as

푘 푗−1 푘 푗−1 1 ∑︁ 1 ∑︁ 1 ∑︁ ∑︁ 푎 − 푎 = 휙 − 휙 = 휙 − 휙 . 푖,푗 푘 푝 2푗 − 1 푖+ℎ 푘(2푗 − 1) 푝 푖+ℎ 푝=1 ℎ=1−푗 푝=1 ℎ=1−푗

Squaring both sides,

(︃ 푘 푗−1 )︃2 푘 푗−1 1 ∑︁ ∑︁ 1 ∑︁ ∑︁ (푎 − 푎 )2 = 휙 − 휙 ≤ (휙 − 휙 )2. 푖,푗 푘2(2푗 − 1)2 푝 푖+ℎ 푘(2푗 − 1) 푝 푖+ℎ 푝=1 ℎ=1−푗 푝=1 ℎ=1−푗

Summing over all 푖 and 푗 gives

푘 ℓ−1 [︂ ]︂2 푘 ℓ−1 푘 푗−1 ∑︁ ∑︁ 푎 − 푎푖,푗 1 ∑︁ ∑︁ 1 ∑︁ ∑︁ ≤ (휙 − 휙 )2 ℓ − 1 (ℓ − 1)2 푘(2푗 − 1) 푝 푖+ℎ 푖=1 푗=1 푖=1 푗=1 푝=1 ℎ=1−푗

ℓ−1 푗−1 푘 2 푘 ∑︁ 1 ∑︁ ∑︁ (휙푝 − 휙푖+ℎ) = 4(ℓ − 1)2 2푗 − 1 푘2/4 푗=1 ℎ=1−푗 푖,푝=1 푘 ≤ |휙|2 ≤ 푐|휙|2 . 4(ℓ − 1) Γ Γ

This completes the analysis of the first term. For the second term, we have

[︃ 푗−1 ]︃ 1 2 ∑︁ 푎 − 푎 = 휙 + 휙 − 휙 . 푖,푗+1 푖,푗 2푗 + 1 푖+푗 푖−푗 2푗 − 1 푖+ℎ ℎ=1−푗

Next, we note that

⃒ 푗−1 ⃒ ⃒ 푗−1 ⃒ ⃒ 휙푖 2 ∑︁ ⃒ ⃒휙푖+푗 − 휙푖 ∑︁ 휙푖+푗 − 휙푖+ℎ ⃒ ⃒휙 − − 휙 ⃒ = ⃒ + 2 ⃒ ⃒ 푖+푗 2푗 − 1 2푗 − 1 푖+ℎ⃒ ⃒ 2푗 − 1 2푗 − 1 ⃒ ⃒ ℎ=1 ⃒ ⃒ ℎ=1 ⃒ 푗−1 ∑︁ |휙푖+푗 − 휙푖+ℎ| ≤ 2 , 2푗 − 1 ℎ=0

115 and, similarly,

⃒ 푗−1 ⃒ ⃒ 푗−1 ⃒ ⃒ 휙푖 2 ∑︁ ⃒ ⃒휙푖−푗 − 휙푖 ∑︁ 휙푖−푗 − 휙푖−ℎ ⃒ ⃒휙 − − 휙 ⃒ = ⃒ + 2 ⃒ ⃒ 푖−푗 2푗 − 1 2푗 − 1 푖−ℎ⃒ ⃒ 2푗 − 1 2푗 − 1 ⃒ ⃒ ℎ=1 ⃒ ⃒ ℎ=1 ⃒ 푗−1 ∑︁ |휙푖−푗 − 휙푖−ℎ| ≤ 2 . 2푗 − 1 ℎ=0

Hence,

푙−1 푙−1 ⎡(︃ 푗−1 )︃2 (︃ 푗−1 )︃2⎤ ∑︁ ∑︁ 8 ∑︁ |휙푖+푗 − 휙푖+ℎ| ∑︁ |휙푖−푗 − 휙푖−ℎ| (푎 − 푎 )2 ≤ + . 푖,푗+1 푖,푗 (2푗 + 1)2 ⎣ 2푗 − 1 2푗 − 1 ⎦ 푗=1 푗=1 ℎ=0 ℎ=0

Once we sum over all 푖, the sum of the first and second term are identical, and therefore

푘 푙−1 푘 푙−1 (︃ 푗−1 )︃2 ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ |휙푖+푗 − 휙푖+ℎ| (푎 − 푎 )2 ≤ 16 . 푖,푗+1 푖,푗 (2푗 − 1)(2푗 + 1) 푖=1 푗=1 푖=1 푗=1 ℎ=0

We have

푗−1 푖+푗−1 푖+푗−1 ∑︁ |휙푖+푗 − 휙푖+ℎ| 1 ∑︁ |휙푖+푗 − 휙푝| 1 ∑︁ |휙푖+푗 − 휙푝| ≤ ≤ , (2푗 − 1)(2푗 + 1) 3푗 푗 3푗 푖 + 푗 − 푝 ℎ=0 푝=푖 푝=푖 which implies that

푘 푙−1 (︃ 푗−1 )︃2 푘 푙−1 (︃ 푖+푗−1 )︃2 ∑︁ ∑︁ ∑︁ |휙푖+푗 − 휙푖+ℎ| 16 ∑︁ ∑︁ 1 ∑︁ |휙푖+푗 − 휙푝| 16 ≤ (2푗 − 1)(2푗 + 1) 9 푗 푖 + 푗 − 푝 푖=1 푗=1 ℎ=0 푖=1 푗=1 푝=푖 푘+ℓ−1 푞−1 (︃ 푞−1 )︃2 16 ∑︁ ∑︁ 1 ∑︁ |휙푞 − 휙푝| ≤ , 9 푞 − 푚 푞 − 푝 푞=1 푚=1 푝=푚 where 푞 = 푖 + 푗 and 푚 = 푖. Letting 푟 = 푞 − 푚, 푠 = 푞 − 푝, and using Hardy’s inequality [50,

116 Theorem 326], we obtain

푘+ℓ−1 푞−1 (︃ 푞−1 )︃2 푘+ℓ−1 푞−1 (︃ 푟 )︃2 16 ∑︁ ∑︁ 1 ∑︁ |휙푞 − 휙푝| 16 ∑︁ ∑︁ 1 ∑︁ |휙푞 − 휙푞−푠| = 9 푞 − 푚 푞 − 푝 9 푟 푠 푞=1 푚=1 푝=푚 푞=1 푟=1 푠=1 푘+ℓ−1 푞−1 [︂ ]︂2 64 ∑︁ ∑︁ 휙푞 − 휙푞−푟 ≤ 9 푟 푞=1 푟=1 푘+ℓ−1 푞−1 [︃ ]︃2 64 ∑︁ ∑︁ 휙푞 − 휙푞−푟 ≤ 9 푑 (︀(푞, 1), (푞 − 푟, 1))︀ 푞=1 푟=1 퐺 256 ≤ |휙|2 , 9 Γ where we note that the sum over the indices 푞 and 푟 consists of some amount of over- counting, with some terms (휙(푞) − 휙(푞 − 푟))2 appearing up to four times. Improved estimates can be obtained by noting that the choice of indexing for Γ is arbitrary (see [124] for details), but, for the sake of readability, we focus on simplicity over tight constants. Combining our estimates and noting that 푐 ≥ 3, we obtain the desired result

√︃ (︂ 256)︂ √︂ 521 √ |푢| ≤ 1 + 2 푐 + |휙| = 2푐 + |휙| ≤ 5 푐 |휙| . 퐺 9 Γ 9 Γ Γ

Combining Lemmas 10 and 11, we obtain our desired trace theorem.

푘 Theorem 12. Let 퐺 = 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, with boundary Γ = {(푖, 1)}푖=1. For any 휙 ∈ R푘, 1 √ √ |휙|Γ ≤ min |푢|퐺 ≤ 5 푐 |휙|Γ. 4 푐 푢|Γ=휙

With a little more work, we can prove a similar result for a slightly more general class of graphs. Using Theorem 12, we can almost immediately prove a trace theorem for any

* graph 퐻 satisfying 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ. In fact, Lemma 10 carries over immediately. In order to prove a new version of Lemma 11, it suffices to bound the energy of 푢 on the edges in

117 * 퐺푘,ℓ not contained in 퐺푘,ℓ. By Cauchy-Schwarz,

푘 ℓ−1 [︂ ]︂ 2 2 ∑︁ ∑︁ 2 2 |푢|퐺* = |푢|퐺 + (푢푖,푗+1 − 푢푖−1,푗) + (푢푖,푗+1 − 푢푖+1,푗) 푖=1 푗=1 푘 ℓ 푘 ℓ−1 ∑︁ ∑︁ 2 ∑︁ ∑︁ 2 2 ≤ 3 (푢푖+1,푗 − 푢푖,푗) + 2 (푢푖,푗+1 − 푢푖,푗) ≤ 3 |푢|퐺, 푖=1 푗=1 푖=1 푗=1 and therefore Corollary 1 follows immediately from Lemmas 10 and 11.

* Corollary 1. Let 퐻 satisfy 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, with boundary 푘 푘 Γ = {(푖, 1)}푖=1. For any 휙 ∈ R ,

1 √ √ |휙|Γ ≤ min |푢|퐻 ≤ 5 3푐 |휙|Γ. 4 푐 푢|Γ=휙

Trace Theorems for General Graphs

In order to extend Corollary 1 to more general graphs, we introduce a graph operation which is similar in concept to an aggregation (a partition of 푉 into connected subsets) in which the size of aggregates is bounded. In particular, we give the following definition.

* Definition 2. The graph 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, is said to be an 푀-aggregation of (퐺, Γ) ∈ 푗=1,...,ℓ 풢푛 if there exists a partition 풜 = 푎* ∪ {푎푖,푗}푖=1,...,푘 of 푉 (퐺) satisfying

1. 퐺[푎푖,푗] is connected and |푎푖,푗| ≤ 푀 for all 푖 = 1, ..., 푘, 푗 = 1, ..., ℓ,

⋃︀푘 2. Γ ⊂ 푖=1 푎푖,1, and Γ ∩ 푎푖,1 ̸= ∅ for all 푖 = 1, ..., 푘,

⋃︀푘 3. 푁퐺(푎*) ⊂ 푎* ∪ 푖=1 푎푖,ℓ,

4. the aggregation graph of 풜∖푎*, given by

(풜∖푎*, {(푎푖1,푗1 , 푎푖2,푗2 ) | 푁퐺(푎푖1,푗1 ) ∩ 푎푖2,푗2 ̸= 0}),

is isomorphic to 퐻.

118 * (a) graph (퐺, Γ) (b) partition 풜 (c) 퐺6,2 ⊂ 퐻 ⊂ 퐺6,2 Figure 4-4: An example of an 푀-aggregation. Figure (A) provides a visual representation of a graph 퐺, with boundary vertices Γ enlarged. Figure (B) shows a partition 풜 of 퐺, in which each aggregate (enclosed by dotted lines) has order at most four. The set 푎* is denoted by a shaded region. Figure (C) shows the aggregation graph 퐻 of 풜∖푎*. The graph 퐻 * satisfies 퐺6,2 ⊂ 퐻 ⊂ 퐺6,2, and is therefore a 4-aggregation of (퐺, Γ).

We provide a visual example in Figure 4-4, and later show that this operation applies to a fairly large class of graphs. However, the 푀-aggregation procedure is not the only operation for which we can control the behavior of the energy and boundary semi-norms. For instance, the behavior of our semi-norms under the deletion of some number of edges can be bounded easily if each edge can be replaced by a path of constant length, and no remaining edge is contained in more than a constant number of such paths. In addition, the behavior of these semi-norms under the disaggregation of large degree vertices is also relatively well-behaved (see [52] for details). Here, we focus on the 푀-aggregation procedure. We

* give the following result regarding graphs (퐺, Γ) for which some 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, is an 푀-aggregation of (퐺, Γ), but note that a large number of minor refinements are possible, such as the two briefly mentioned above.

* Theorem 13. If 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, is an 푀-aggregation of

푛Γ (퐺, Γ) ∈ 풢푛, then for any 휙 ∈ R ,

1 |휙|Γ ≤ min |푢|퐺 ≤ 퐶2 |휙|Γ 퐶1 푢|Γ=휙

for some fixed constants 퐶1, 퐶2 > 0 that depend only on 푐 and 푀.

119 Proof. We first prove that there is an extension 푢 of 휙 which satisfies |푢|퐺 ≤ 퐶2|휙|Γ for

some 퐶2 > 0 that depends only on 푐 and 푀. To do so, we define auxiliary functions 푢̂︀ and * 휙̂︀ on (퐺2푘,ℓ, Γ2푘,ℓ). Let

⎧ ⎪ ⎨max푞∈Γ∩푎(푝+1)/2,1 휙(푞) if 푝 is odd, 휙̂︀(푝) = ⎪ ⎩ min푞∈Γ∩푎푝/2,1 휙(푞) if 푝 is even,

and 푢̂︀ be extension (4.4) of 휙̂︀. The idea is to upper bound the semi-norm for 푢 by 푢̂︀, for 푢̂︀ by 휙̂︀ (using Corollary 1), and for 휙̂︀ by 휙. On each aggregate 푎푖,푗, let 푢 take values between 2 푢̂︀(2푖 − 1, 푗) and 푢̂︀(2푖, 푗), and let 푢 equal 푎 on 푎*. We can decompose |푢|퐺 into

푘 ℓ 푘 ℓ 2 ∑︁ ∑︁ ∑︁ 2 ∑︁ ∑︁ ∑︁ 2 |푢|퐺 = (푢(푝) − 푢(푞)) + (푢(푝) − 푢(푞)) 푖=1 푗=1 푝,푞∈푎푖,푗 , 푖=1 푗=1 푝∈푎푖,푗 , 푝∼푞 푞∈푎푖+1,푗 , 푝∼푞 푘 ℓ−1 푘 ℓ−1 ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ + (푢(푝) − 푢(푞))2 + (푢(푝) − 푢(푞))2

푖=1 푗=1 푝∈푎푖,푗 , 푖=1 푗=1 푝∈푎푖,푗 , 푞∈푎푖−1,푗+1, 푞∈푎푖+1,푗+1, 푝∼푞 푝∼푞 푘 ℓ−1 ∑︁ ∑︁ ∑︁ + (푢(푝) − 푢(푞))2,

푖=1 푗=1 푝∈푎푖,푗 , 푞∈푎푖,푗+1, 푝∼푞

2 and bound each term of |푢|퐺 separately, beginning with the first. The maximum energy semi-norm of an 푚 vertex graph that takes values in the range [푎, 푏] is bounded above by (푚/2)2(푏 − 푎)2. Therefore,

2 ∑︁ 푀 2 (푢(푝) − 푢(푞))2 ≤ (푢(2푖 − 1, 푗) − 푢(2푖, 푗)) . 4 ̂︀ ̂︀ 푝,푞∈푎푖,푗 , 푝∼푞

120 For the second term,

∑︁ 2 2 2 (푢(푝) − 푢(푞)) ≤ 푀 max (푢̂︀(푖1, 푗) − 푢̂︀(푖2, 푗)) 푖1∈{2푖−1,2푖}, 푝∈푎푖,푗 , 푖2∈{2푖+1,2푖+2} 푞∈푎푖+1,푗 , 푝∼푞 2[︀ 2 2 ≤ 3푀 (푢̂︀(2푖 − 1, 푗) − 푢̂︀(2푖, 푗)) + (푢̂︀(2푖, 푗) − 푢̂︀(2푖 + 1, 푗)) 2]︀ +(푢̂︀(2푖 + 1, 푗) − 푢̂︀(2푖 + 2, 푗)) .

The exact same type of bound holds for the third and fourth terms. For the fifth term,

∑︁ 2 2 2 (푢(푝) − 푢(푞)) ≤ 푀 max (푢̂︀(푖1, 푗) − 푢̂︀(푖2, 푗 + 1)) , 푖1∈{2푖−1,2푖}, 푝∈푎푖,푗 , 푖2∈{2푖−1,2푖} 푞∈푎푖,푗+1, 푝∼푞

2 and, unlike terms two, three, and four, this maximum appears in |푢| * . Combining these ̂︀ 퐺2푘,ℓ 2 1 2 2 73 2 2 three estimates, we obtain the upper bound |푢| ≤ (6 × 3 + )푀 |푢| * = 푀 |푢| * . 퐺 4 ̂︀ 퐺2푘,ℓ 4 ̂︀ 퐺2푘,ℓ

Next, we lower bound |휙| by a constant times |휙| . By definition, in Γ ∩ 푎 there is Γ ̂︀ Γ2푘,ℓ 푖,1 a vertex that takes value 휙̂︀(2푖−1) and a vertex that takes value 휙̂︀(2푖). This implies that every term in |휙| is a term in |휙| , with a possibly different denominator. Distances between ̂︀ Γ2푘,ℓ Γ vertices on Γ can be decreased by at most a factor of 2푀 on Γ2푘,ℓ. In addition, it may be the

case that an aggregate contains only one vertex of Γ, which results in 휙̂︀(2푖 − 1) = 휙̂︀(2푖). Therefore, a given term in |휙|2 could appear four times in |휙|2 . Combining these Γ ̂︀ Γ2푘,ℓ two facts, we immediately obtain the bound |휙|2 ≤ 16푀 2|휙|2 . Combining these two ̂︀ Γ2푘,ℓ Γ estimates with Corollary 1 completes the first half of the proof.

All that remains is to show that for any 푢, |휙|Γ ≤ 퐶1|푢|퐺 for some 퐶1 > 0 that depends

only on 푐 and 푀. To do so, we define auxiliary functions 푢̃︀ and 휙̃︀ on (퐺2푘,2ℓ, Γ2푘,2ℓ). Let ⎧ ⎪ ⎨max푝∈푎⌈푖/2⌉,⌈푗/2⌉ 푢(푝) if 푖 = 푗 mod 2, 푢̃︀(푖, 푗) = ⎪ ⎩min푝∈푎⌈푖/2⌉,⌈푗/2⌉ 푢(푝) if 푖 ̸= 푗 mod 2.

Here, the idea is to lower bound the semi-norm for 푢 by 푢̃︀, for 푢̃︀ by 휙̃︀ (using Corollary 1),

121 and for 휙 by 휙. We can decompose |푢|2 into ̃︀ ̃︀ 퐺2푘,2ℓ

푘 ℓ ∑︁ ∑︁ 2 |푢|2 = 4 (푢(2푖 − 1, 2푗 − 1) − 푢(2푖, 2푗 − 1)) ̃︀ 퐺2푘,2ℓ ̃︀ ̃︀ 푖=1 푗=1 푘 ℓ ∑︁ ∑︁ 2 2 + (푢̃︀(2푖, 2푗 − 1) − 푢̃︀(2푖 + 1, 2푗 − 1)) + (푢̃︀(2푖, 2푗) − 푢̃︀(2푖 + 1, 2푗)) 푖=1 푗=1 푘 ℓ−1 ∑︁ ∑︁ 2 2 + (푢̃︀(2푖 − 1, 2푗) − 푢̃︀(2푖 − 1, 2푗 + 1)) + (푢̃︀(2푖, 2푗) − 푢̃︀(2푖, 2푗 + 1)) . 푖=1 푗=1

The minimum squared energy semi-norm of an 푚 vertex graph that takes value 푎 at some vertex and value 푏 at some vertex is bounded below by (푏 − 푎)2/푚. Therefore,

2 ∑︁ 2 (푢̃︀(2푖 − 1, 2푗 − 1) − 푢̃︀(2푖, 2푗 − 1)) ≤ 푀 (푢(푝) − 푢(푞)) , 푝,푞∈푎푖,푗 , 푝∼푞

and 2 ∑︁ 2 max (푢̃︀(푖1, 2푗) − 푢̃︀(푖2, 2푗)) ≤ 2푀 (푢(푝) − 푢(푞)) . 푖1∈{2푖−1,2푖}, 푝,푞∈푎푖,푗 ∪푎푖+1,푗 , 푖2∈{2푖+1,2푖+2} 푝∼푞 Repeating the above estimate for each term results in the desired bound.

Next, we upper bound |휙| by a constant multiple of |휙| . We can write |휙|2 as Γ ̃︀ Γ2푘,2ℓ Γ

푘 2 푘−1 푘 2 2 ∑︁ ∑︁ (휙(푝) − 휙(푞)) ∑︁ ∑︁ ∑︁ (휙(푝) − 휙(푞)) |휙|Γ = 2 + 2 , 푑퐺(푝, 푞) 푑퐺(푝, 푞) 푖=1 푝,푞∈Γ∩푎푖,1 푖1=1 푖2=푖1+1 푝∈Γ∩푎푖1,1, 푞∈Γ∩푎푖2,1

and bound each term separately. The first term is bounded by

∑︁ (휙(푝) − 휙(푞))2 푀 2 ≤ (휙(2푖 − 1) − 휙(2푖))2. 2 ̃︀ ̃︀ 푑퐺(푝, 푞) 4 푝,푞∈Γ∩푎푖,1

1 For the second term, we first note that 푑퐺(푝, 푞) ≥ 3 푑Γ2푘,2ℓ ((푚1, 1), (푚2, 1)) for 푝 ∈ Γ∩푎푖1,1,

푞 ∈ Γ ∩ 푎푖2,1, 푚1 ∈ {2푖1 − 1, 2푖1}, 푚2 ∈ {2푖2 − 1, 2푖2}, which allows us to bound the

122 second term by

2 2 ∑︁ (휙(푝) − 휙(푞)) 2 (휙̃︀(푚1) − 휙̃︀(푚2)) 2 ≤ 9푀 max 2 . 푑 (푝, 푞) 푚1∈{2푖1−1,2푖1}, 푑 (푚1, 푚2) 푝∈Γ∩푎 , 퐺 Γ2푘,2ℓ 푖1,1 푚2∈{2푖2−1,2푖2} 푞∈Γ∩푎푖2,1

Combining the lower bound for 푢 in terms of 푢̃︀, the upper bound for 휑 in terms of 휑̃︀, and Corollary 1 completes the second half of the proof.

The proof of Theorem 13 also immediately implies a similar result. Let 퐿̃︀ ∈ R푛Γ×푛Γ be −2 the Laplacian of the complete graph on Γ with weights 푤(푖, 푗) = 푑Γ (푖, 푗). The same proof implies the following.

* Corollary 2. If 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, is an 푀-aggregation of

푛Γ (퐺, Γ) ∈ 풢푛, then for any 휙 ∈ R ,

1 1/2 1/2 ⟨퐿휙,̃︀ 휙⟩ ≤ min |푢|퐺 ≤ 퐶2 ⟨퐿휙,̃︀ 휙⟩ , 퐶1 푢|Γ=휙

for some fixed constants 퐶1, 퐶2 > 0 that depend only on 푐 and 푀.

1/2 Spectral Equivalence of 푆Γ and 퐿Γ

2 By Corollary 2, and the property ⟨휙, 푆Γ휙⟩ = min푢|Γ=휙 |푢|퐺 (see Proposition 8), in order 1/2 1/2 to prove spectral equivalence between 푆Γ and 퐿Γ , it suffices to show that 퐿Γ and 퐿̃︀ are spectrally equivalent. This can be done relatively easily, and leads to a proof of the main result of the section.

* Theorem 14. If 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, 4ℓ < 푘 < 2푐ℓ, 푐 ∈ N, is an 푀-aggregation of

푛Γ (퐺, Γ) ∈ 풢푛, then for any 휙 ∈ R ,

1 1/2 1/2 ⟨퐿Γ 휙, 휙⟩ ≤ ⟨푆Γ휙, 휙⟩ ≤ 퐶2 ⟨퐿Γ 휙, 휙⟩, 퐶1 for some fixed constants 퐶1, 퐶2 > 0 that depend only on 푐 and 푀.

123 Proof. Let 휑(푖, 푗) = min{푖 − 푗 mod 푛Γ, 푗 − 푖 mod 푛Γ}. 퐺[Γ] is a cycle, so 퐿̃︀(푖, 푗) = −2 −휑(푖, 푗) for 푖 ̸= 푗. The spectral decomposition of 퐿Γ is well known, namely,

⌊︀ 푛Γ ⌋︀ 2 [︂ 푇 푇 ]︂ ∑︁ 푥푘푥푘 푦푘푦푘 퐿Γ = 휆푘(퐿Γ) 2 + 2 , ‖푥푘‖ ‖푦푘‖ 푘=1

2휋푘 2휋푘푗 2휋푘푗 where 휆푘(퐿Γ) = 2 − 2 cos and 푥푘(푗) = sin , 푦푘(푗) = cos , 푗 = 1, ..., 푛Γ. If 푛Γ 푛Γ 푛Γ 푛Γ

is odd, then 휆(푛Γ−1)/2 has multiplicity two, but if 푛Γ is even, then 휆푛Γ/2 has only multiplicity

one, as 푥푛Γ/2 = 0. If 푘 ̸= 푛Γ/2, we have

푛Γ (︂ )︂ 푛Γ (︂ )︂ ∑︁ 2휋푘푗 푛Γ 1 ∑︁ 4휋푘푗 ‖푥 ‖2 = sin2 = − cos 푘 푛 2 2 푛 푗=1 Γ 푗=1 Γ [︃ 1 ]︃ 푛 1 sin(2휋푘(2 + )) 푛 = Γ − 푛Γ − 1 = Γ , 2 4 sin 2휋푘 2 푛Γ

2 푛Γ 2 and so ‖푦푘‖ = 2 as well. If 푘 = 푛Γ/2, then ‖푦푘‖ = 푛Γ. If 푛Γ is odd,

√ 푛Γ−1 2 [︂ ]︂1/2 [︂ ]︂ 1/2 2 2 ∑︁ 2푘휋 2휋푘푖 2휋푘푗 2휋푘푖 2휋푘푗 퐿Γ (푖, 푗) = 1 − cos sin sin − cos cos 푛Γ 푛Γ 푛Γ 푛Γ 푛Γ 푛Γ 푘=1 푛Γ−1 4 ∑︁2 (︂휋 2푘 )︂ (︂ 2푘 )︂ = sin cos 휑(푖, 푗)휋 푛Γ 2 푛Γ 푛Γ 푘=1 푛 2 ∑︁Γ (︂휋 2푘 )︂ (︂ 2푘 )︂ = sin cos 휑(푖, 푗)휋 , 푛Γ 2 푛Γ 푛Γ 푘=0

and if 푛Γ is even,

푛Γ −1 2 (︂ )︂ (︂ )︂ 1/2 2 푖+푗 4 ∑︁ 휋 2푘 2푘 퐿Γ (푖, 푗) = (−1) + sin cos 휑(푖, 푗)휋 푛Γ 푛Γ 2 푛Γ 푛Γ 푘=1 푛 2 ∑︁Γ (︂휋 2푘 )︂ (︂ 2푘 )︂ = sin cos 휑(푖, 푗)휋 . 푛Γ 2 푛Γ 푛Γ 푘=0

1/2 휋 퐿Γ (푖, 푗) is simply the trapezoid rule applied to the integral of sin( 2 푥) cos(휑(푖, 푗)휋푥) on

124 the interval [0, 2]. Therefore,

⃒ ⃒ ⃒ ∫︁ 2 ⃒ ⃒ 1/2 2 ⃒ ⃒ 1/2 (︁휋 )︁ ⃒ 2 ⃒퐿Γ (푖, 푗) + 2 ⃒ = ⃒퐿Γ (푖, 푗) − sin 푥 cos (휑(푖, 푗)휋푥) 푑푥⃒ ≤ 2 , ⃒ 휋(4휑(푖, 푗) − 1)⃒ ⃒ 0 2 ⃒ 3푛Γ where we have used the fact that if 푓 ∈ 퐶2([푎, 푏]), then

⃒ ∫︁ 푏 ⃒ 3 ⃒ 푓(푎) + 푓(푏) ⃒ (푏 − 푎) ′′ ⃒ 푓(푥)푑푥 − (푏 − 푎)⃒ ≤ max |푓 (휉)|. ⃒ 푎 2 ⃒ 12 휉∈[푎,푏]

Noting that 푛Γ ≥ 3, it quickly follows that √ √ (︂ 1 2)︂ (︂ 2 2)︂ − ⟨퐿휙,˜ 휙⟩ ≤ ⟨퐿1/2휙, 휙⟩ ≤ + ⟨퐿휙,˜ 휙⟩. 2휋 12 Γ 3휋 27

2 Combining this result with Corollary 2, and noting that ⟨휙, 푆Γ휙⟩ = |푢̂︀|퐺, where 푢̂︀ is the harmonic extension of 휙, we obtain the desired result.

An Illustrative Example

* While the concept of a graph (퐺, Γ) having some 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, as an 푀-aggregation seems somewhat abstract, this simple formulation in itself is quite powerful. As an example, we illustrate that this implies a trace theorem (and, therefore, spectral equivalence) for all three-connected planar graphs with bounded face degree (number of edges in the associated induced cycle) and for which there exists a planar spring embedding with a convex hull that is not too thin (a bounded distance to Hausdorff distance ratio for the boundary with respect to some point in the convex hull) and satisfies bounded edge length and small angle

푓≤푐 conditions. Let 풢푛 be the elements of (퐺, Γ) ∈ 풢푛 for which every face other than the outer face Γ has at most 푐 edges. The exact proof the following theorem is rather long, and can be found in the Appendix of [124]. The intuition is that by taking the planar spring embedding, rescaling it so that the vertices of the boundary face lie on the unit circle, and

* overlaying a natural embedding of 퐺푘,ℓ on the unit disk, an intuitive 푀-aggregation of the graph 퐺 can be formed.

푓≤푐1 Theorem 15. If there exists a planar spring embedding 푋 of (퐺, Γ) ∈ 풢푛 for which

125 푛Γ (1) 퐾 = conv ({[푋Γ]푖,·}푖=1) satisfies

‖푢 − 푣‖ sup inf sup ≥ 푐2 > 0, 푢∈퐾 푣∈휕퐾 푤∈휕퐾 ‖푢 − 푤‖

(2) 푋 satisfies

‖푋푖1,· − 푋푖2,·‖ max ≤ 푐3 and min 푋푗 ,· 푋푖,· 푋푗 ,· ≥ 푐4 > 0, 푖∈푉 ∠ 1 2 {푖1,푖2}∈퐸 ‖푋푗1,· − 푋푗2,·‖ {푗1,푗2}∈퐸 푗1,푗2∈푁(푖)

* then there exists an 퐻, 퐺푘,ℓ ⊂ 퐻 ⊂ 퐺푘,ℓ, ℓ ≤ 푘 < 2푐ℓ, 푐 ∈ N, such that 퐻 is an

푀-aggregation of (퐺, Γ), where 푐 and 푀 are constants that depend on 푐1, 푐2, 푐3, and 푐4.

126 4.3 The Kamada-Kawai Objective and Optimal Layouts

Given the distances between data points in a high dimensional space, how can we meaning- fully visualize their relationships? This is a fundamental task in exploratory data analysis, for which a variety of different approaches have been proposed. Many of these techniques seek to visualize high-dimensional data by embedding it into lower dimensional, e.g. two or three-dimensional, space. Metric multidimensional scaling (MDS or mMDS) [64, 66] is a classical approach that attempts to find a low-dimensional embedding that accurately represents the distances between points. Originally motivated by applications in psychomet- rics, MDS has now been recognized as a fundamental tool for data analysis across a broad range of disciplines. See [66, 9] for more details, including a discussion of applications to data from scientific, economic, political, and other domains. Compared to other classical visualization tools like PCA2, metric multidimensional scaling has the advantage of not being restricted to linear projections of the data, and of being applicable to data from an arbitrary metric space, rather than just Euclidean space. Because of this versatility, MDS has also become one of the most popular algorithms in the field of graph drawing, where the goal is to visualize relationships between nodes (e.g. people in a social network). In this context, MDS was independently proposed by Kamada and Kawai [55] as a force-directed graph drawing method. In this section, we consider the algorithmic problem of computing an optimal drawing under the MDS/Kamada-Kawai objective, and structural properties of optimal drawings. The

Kamada-Kawai objective is to minimize the following energy/stress functional 퐸 : R푟푛 → R

(︂ )︂2 ∑︁ ‖⃗푥푖 − ⃗푥푗‖ 퐸(⃗푥 , . . . , ⃗푥 ) = − 1 , (4.5) 1 푛 푑(푖, 푗) 푖<푗

푟 which corresponds to the physical situation where ⃗푥1, . . . , ⃗푥푛 ∈ R are particles and for each

푖 ̸= 푗, particles ⃗푥푖 and ⃗푥푗 are connected by an idealized spring with equilibrium length 푑(푖, 푗) 1 following Hooke’s law with spring constant 푘푖푗 = 푑(푖,푗)2 . In applications to visualization,

2In the literature, PCA is sometimes referred to as classical multidimensional scaling, in contrast to metric multidimensional scaling, which we study in this work.

127 the choice of dimension is often small, i.e. 푟 ≤ 3. We also note that in (4.5) the terms in the sum are sometimes re-weighted with vertex or edge weights, which we discuss in more detail later. In practice, the MDS/Kamada-Kawai objective (4.5) is optimized via a heuristic proce- dure like gradient descent [65, 135] or stress majorization [29, 43]. Because the objective is non-convex, these algorithms may not reach the global minimum, but instead may terminate at approximate critical points of the objective function. Heuristics such as restarting an algorithm from different initializations and using modified step size schedules have been proposed to improve the quality of results. In practice, these heuristic methods do seem to work well for the Kamada-Kawai objective and are implemented in popular packages like

GRAPHVIZ [39] and the SMACOF package in R. We revisit this problem from an approximation algorithms perspective. First, we resolve the computational complexity of minimizing (4.5) by proving that finding the global mini- mum is NP-hard, even for graph metrics (where the metric is the shortest path distance on a graph). Consider the gap decision version of stress minimization over graph metrics, which we formally define below:

GAP STRESS MINIMIZATION

Input: Graph 퐺 = ([푛], 퐸), 푟 ∈ N, 퐿 ≥ 0. 푛푟 Output: TRUE if there exists ⃗푥 = (⃗푥1, . . . , ⃗푥푛) ∈ R such that 퐸(⃗푥) ≤ 퐿; FALSE if for every ⃗푥, 퐸(⃗푥) ≥ 퐿 + 1.

Theorem 16. Gap Stress Minimization in dimension 푟 = 1 is NP-hard. Furthermore, the problem is hard even restricted to input graphs with diameter bounded by an absolute constant.

As a gap problem, the output is allowed to be arbitrary if neither case holds; the hardness of the gap formulation shows that there cannot exist a Fully-Polynomial Randomized Approximation Scheme (FPRAS) for this problem if P ̸= NP, i.e. the runtime cannot be polynomial in the desired approximation guarantee. Our reduction shows this problem is

128 hard even when the input graph has diameter bounded above by an absolute constant. This is a natural setting to consider, since many real world graphs (for example, social networks [35]) and random graph models [127] indeed have low diameter due to the “small-world phenomena.” Other key aspects of this hardness proof are that we show the problem is hard even when the input 푑 is a graph metric, and we show it is hard even in its canonical unweighted formulation (4.5). Given that computing the minimizer is NP-hard, a natural question is whether there exist polynomial time approximation algorithms for minimizing (4.5). We show that if the input graph has bounded diameter 퐷 = 푂(1), then there indeed exists a Polynomial-Time Approximation Scheme (PTAS) to minimize (4.5), i.e. for fixed 휖 > 0 and fixed 퐷 there exists an algorithm to approximate the global minimum of a 푛 vertex diameter 퐷 graph up to multiplicative error (1 + 휖) in time 푓(휖, 퐷) · 푝표푙푦(푛). More generally, we show the following

result, where KKSCHEME is a simple greedy algorithm described in Subsection 4.3.3 below.

Theorem 17. For any input metric over [푛] with min푖,푗∈[푛] 푑(푖, 푗) = 1 and any 푅 > 휖 > 0, 2 2 푂(푟푅4/휖2) Algorithm KKSCHEME with 휖1 = 푂(휖/푅) and 휖2 = 푂(휖/푅 ) runs in time 푛 (푅/휖) 푟 and outputs ⃗푥1, . . . , ⃗푥푛 ∈ R with ‖⃗푥푖‖ ≤ 푅 such that

* * 2 E [퐸(⃗푥1, . . . , ⃗푥푛)] ≤ 퐸(⃗푥1, . . . , ⃗푥푛) + 휖푛

* * * for any ⃗푥1, . . . , ⃗푥푛 with ‖⃗푥푖 ‖ ≤ 푅 for all 푖, where E is the expectation over the randomness of the algorithm.

The fact that this result is a PTAS for bounded diameter graphs follows from combining it with the two structural results regarding optimal Kamada-Kawai embeddings, which are of independent interest. The first (Lemma 13) shows that the optimal objective value for low diameter graphs must be of order Ω(푛2), and the second (Lemma 15) shows that the optimal KK embedding is “contractive” in the sense that the diameter of the output is never much larger than the diameter of the input. Both lemmas are proven in Subsection 4.3.1. In the multidimensional scaling literature, there has been some study of the local convergence of algorithms like stress majorization (for example [28]) which shows that

129 stress majorization will converge quickly if in a sufficiently small neighborhood of a local minimum. This work seems to propose the first provable guarantees for global optimization. The closest previous hardness result is the work of [19], where they showed that a similar problem is hard. In their problem, the terms in (4.5) are weighted by 푑(푖, 푗), absolute value loss replaces the squared loss, and the input is an arbitrary pseudometric where nodes in the input are allowed to be at distance zero from each other. The second assumption makes the diameter (ratio of max to min distance in the input) infinite, which is a major obstruction to modifying their approach to show Theorem 16. See Remark 1 for further discussion. In the approximation algorithms literature, there has been a great deal of interest in optimizing the worst-case distortion of metric embeddings into various spaces, see e.g. [5] for approximation algorithms for embeddings into one dimension, and [34, 85] for more general surveys of low distortion metric embeddings. Though conceptually related, the techniques used in this literature are not generally targeted for minimizing a measure of average pairwise distortion like (4.5).

In what follows, we generally assume the input is given as an unweighted graph to simplify notation. However, for the upper bound (Theorem 17) we do handle the general case of arbitrary metrics with distances in [1, 퐷] (the lower bound of 1 is without loss of generality after re-scaling). In the lower bound (Theorem 16), we prove the (stronger) result that the problem is hard when restricted to graph metrics, instead of just for arbitrary metrics.

Throughout this section, we use techniques involving estimating different components of

the objective function 퐸(⃗푥1, . . . , ⃗푥푛) given by (4.5). For convenience, we use the notation

(︂ )︂2 ‖⃗푥푖 − ⃗푥푗‖ ∑︁ ∑︁ ∑︁ 퐸 (⃗푥) := − 1 , 퐸 (⃗푥) := 퐸 (⃗푥), 퐸 (⃗푥) := 퐸 (⃗푥). 푖,푗 푑(푖, 푗) 푆 푖,푗 푆,푇 푖,푗 푖,푗∈푆 푖∈푆 푗∈푇 푖<푗

The remainder of this section is as follows. In Subsection 4.3.1, we prove two structural results regarding the objective value and diameter of an optimal layout. In Subsection 4.3.2, we provide a proof of Theorem 16, the main algorithmic lower bound of the section. Finally, in Subsection 4.3.3, we formally describe an approximation algorithm for low diameter graphs and prove Theorem 17.

130 4.3.1 Structural Results for Optimal Embeddings

In this subsection, we present two results regarding optimal layouts of a given graph. In particular, we provide a lower bound for the energy of a graph layout and an upper bound for the diameter of an optimal layout. First, we recall the following standard 휖-net estimate.

푟 Lemma 12 (Corollary 4.2.13 of [126]). Let 퐵푅 = {푥 : ‖푥‖ ≤ 푅} ⊂ R be the origin-

centered radius 푅 ball in 푟 dimensions. For any 휖 ∈ (0, 푅) there exists a subset 푆휖 ⊂ 퐵푅 푟 with |푆휖| ≤ (3푅/휖) such that max min ‖푥 − 푦‖ ≤ 휖, i.e. 푆휖 is an 휖-net of 퐵푅. ‖푥‖≤푅 푦∈푆휖

We use this result to prove a lower bound for the objective value of any layout of a

diameter 퐷 graph in R푟.

Lemma 13. Let 퐺 = ([푛], 퐸) have diameter

(푛/2)1/푟 퐷 ≤ . 10

Then any layout ⃗푥 ∈ R푟푛 has energy

푛2 퐸(⃗푥) ≥ . 81(10퐷)푟

Proof. Let 퐺 = ([푛], 퐸) have diameter 퐷 ≤ (푛/2)1/푟/10, and suppose that there exists a

layout ⃗푥 ⊂ R푟 of 퐺 in dimension 푟 with energy 퐸(⃗푥) = 푐푛2 for some 푐 ≤ 1/810. If no such layout exists, then we are done. We aim to lower bound the possible values of 푐. For

each vertex 푖 ∈ [푛], we consider the quantity 퐸푖,푉 ∖푖(⃗푥). The sum

∑︁ 2 퐸푖,푉 ∖푖(⃗푥) = 2푐푛 , 푖∈[푛]

′ and so there exists some 푖 ∈ [푛] such that 퐸푖′,푉 ∖푖′ (⃗푥) ≤ 2푐푛. By Markov’s inequality,

⃒ ⃒ ⃒{푗 ∈ [푛] | 퐸푖′,푗(⃗푥) > 10푐}⃒ < 푛/5,

131 and so at least 4푛/5 vertices (including 푖′) in [푛] satisfy

(︂ )︂2 ‖⃗푥 ′ − ⃗푥 ‖ 푖 푗 − 1 ≤ 10푐, 푑(푖′, 푗)

and also √ ′ 10 ‖⃗푥 ′ − ⃗푥 ‖ ≤ 푑(푖 , 푗)(1 + 10푐) ≤ 퐷. 푖 푗 9

The remainder of the proof consists of taking the 푑-dimensional ball with center ⃗푥푖′ and radius 10퐷/9 (which contains ≥ 4푛/5 vertices), partitioning it into smaller sub-regions, and then lower bounding the energy resulting from the interactions between vertices within each sub-region.

By applying Lemma 12 with 푅 := 10퐷/9 and 휖 := 1/3, we may partition the 푟

푟 dimensional ball with center ⃗푥푖′ and radius 10퐷/9 into (10퐷) disjoint regions, each of 푟 diameter at most 2/3. For each of these regions, we denote by 푆푗 ⊂ [푛], 푗 ∈ [(10퐷) ], the subset of vertices whose corresponding point lies in the corresponding region. As each region is of diameter at most 2/3 and the graph distance between any two distinct vertices is at least one, either

(︂|푆 |)︂ |푆 |(|푆 | − 1) 퐸 (⃗푥) ≥ 푗 (2/3 − 1)2 = 푗 푗 푆푗 2 18

or |푆푗| = 0. Empty intervals provide no benefit and can be safely ignored. A lower bound on the total energy can be produced by the following optimization problem

ℓ ℓ ∑︁ ∑︁ min 푚푘(푚푘 − 1) s.t. 푚푘 = 푚, 푚푘 ≥ 1, 푘 ∈ [ℓ], 푘=1 푘=1 where the 푚푘 have been relaxed to real values. This optimization problem has a non-empty feasible region for 푚 ≥ ℓ, and the solution is given by 푚(푚/ℓ − 1) (achieved when

푟 푚푘 = 푚/ℓ for all 푘). In our situation, 푚 := 4푛/5 and ℓ := (10퐷) , and, by assumption,

132 푚 ≥ ℓ. This leads to the lower bound

ℓ ∑︁ 4푛 [︂ 4푛 ]︂ 푐푛2 = 퐸(⃗푥) ≥ 퐸 (⃗푥) ≥ − 1 , 푆푗 90 5(10퐷)푟 푗=1 which implies that

16 (︂ 5(10퐷)푟 )︂ 1 푐 ≥ 1 − ≥ 450(10퐷)푟 4푛 75(10퐷)푟

for 퐷 ≤ (푛/2)1/푟/10. This completes the proof.

The above estimate has the correct dependence for 푟 = 1. For instance, consider the

lexicographical product of a path 푃퐷 and a clique 퐾푛/퐷 (i.e., a graph with 퐷 cliques in a line, and complete bipartite graphs between neighboring cliques). This graph has diameter

퐷, and the layout in which the “vertices” (each corresponding to a copy of 퐾푛/퐷) of 푃퐷 lie 푛 exactly at the integer values [퐷] has objective value 2 (푛/퐷 − 1). This estimate is almost certainly not tight for dimensions 푟 > 1, as there is no higher dimensional analogue of the

path (i.e., a graph with 푂(퐷푟) vertices and diameter 퐷 that embeds isometrically in R푟). Next, we provide a proof of Lemma 15, which upper bounds the diameter of any optimal layout of a diameter 퐷 graph. However, before we proceed with the proof, we first prove

an estimate for the concentration of points ⃗푥푖 at some distance from the marginal median 푟 푟 in any optimal layout. The marginal median ⃗푚 ∈ R of a set of points ⃗푥1, . . . , ⃗푥푛 ∈ R is 푡ℎ the vector whose 푖 component ⃗푚(푖) is the univariate median of ⃗푥1(푖), . . . , ⃗푥푛(푖) (with the convention that, if 푛 is even, the univariate median is given by the mean of the (푛/2)푡ℎ and (푛/2 + 1)푡ℎ ordered values). The below result, by itself, is not strong enough to prove our desired diameter estimate, but serves as a key ingredient in the proof of Lemma 15.

Lemma 14. Let 퐺 = ([푛], 퐸) have diameter 퐷. Then, for any optimal layout ⃗푥 ∈ R푟푛, i.e., ⃗푥 such that 퐸(⃗푥) ≤ 퐸(⃗푦) for all ⃗푦 ∈ R푟푛,

√ 푘 ⃒ ⃒ − 2 ⃒{푖 ∈ [푛] | ‖⃗푚 − ⃗푥푖‖∞ ≥ (퐶 + 푘)퐷}⃒ ≤ 2푟푛퐶

133 for all 퐶 > 0 and 푘 ∈ N, where ⃗푚 is the marginal median of ⃗푥1, . . . , ⃗푥푛.

Proof. Let 퐺 = ([푛], 퐸) have diameter 퐷, and ⃗푥 be an optimal layout. Without loss

of generality, we order the vertices so that ⃗푥1(1) ≤ · · · ≤ ⃗푥푛(1), and shift ⃗푥 so that [︀ ]︀ ⃗푥⌊푛/2⌋(1) = 0. Next, we fix some 퐶 > 0, and define the subsets 푆0 := ⌊푛/2⌋ and

{︀ [︀ )︀}︀ 푆푘 := 푖 ∈ [푛] | ⃗푥푖(1) ∈ (퐶 + 푘)퐷, (퐶 + 푘 + 1)퐷 , {︀ }︀ 푇푘 := 푖 ∈ [푛] | ⃗푥푖(1) ≥ (퐶 + 푘)퐷 ,

푘 ∈ N. Our primary goal is to estimate the quantity |푇푘|, for an arbitrary 푘, from which the desired result will follow quickly.

(︀푛)︀ The objective value 퐸(⃗푥) is at most 2 ; otherwise we could replace ⃗푥 by a layout with

all ⃗푥푖 equal. To obtain a first estimate on |푇푘|, we consider a crude lower bound on 퐸푆0,푇푘 (⃗푥). We have

(︂(퐶 + 푘)퐷 )︂2 퐸 (⃗푥) ≥ |푆 ||푇 | − 1 = (퐶 + 푘 − 1)2⌊푛/2⌋|푇 |, 푆0,푇푘 0 푘 퐷 푘

and therefore (︂푛)︂ 1 푛 |푇 | ≤ ≤ . 푘 2 (퐶 + 푘 − 1)2⌊푛/2⌋ (퐶 + 푘 − 1)2

From here, we aim to prove the following claim

푛 |푇푘| ≤ ℓ , 푘, ℓ ∈ N, 푘 ≥ 2ℓ − 1, (4.6) (︀퐶 + 푘 − (2ℓ − 1))︀2

by induction on ℓ. The above estimate serves as the base case for ℓ = 1. Assuming the above statement holds for some fixed ℓ, we aim to prove that this holds for ℓ + 1.

To do so, we consider the effect of collapsing the set of points in 푆푘, 푘 ≥ 2ℓ, into a

134 hyperplane with a fixed first value. In particular, consider the alternate layout ⃗푥′ given by

⎧ ⎪ ⎪(퐶 + 푘)퐷 푗 = 1, 푖 ∈ 푆푘 ⎪ ′ ⎨ ⃗푥푖(푗) = ⃗푥푖(푗) − 퐷 푗 = 1, 푖 ∈ 푇푘+1 . ⎪ ⎪ ⎩⎪⃗푥푖(푗) otherwise

For this new layout, we have

|푆 |2 퐸 (⃗푥′) ≤ 퐸 (⃗푥) + 푘 + (︀|푆 | + |푆 |)︀|푆 | 푆푘−1∪푆푘∪푆푘+1 푆푘−1∪푆푘∪푆푘+1 2 푘−1 푘+1 푘

and

′ (︁(︀ )︀2 (︀ )︀2)︁ 퐸푆0,푇푘+1 (⃗푥 ) ≤ 퐸푆0,푇푘+1 (⃗푥) − |푆0||푇푘+1| (퐶 + 푘 + 1) − 1 − (퐶 + 푘) − 1

= 퐸푆0,푇푘+1 (⃗푥) − (2퐶 + 2푘 − 1)⌊푛/2⌋|푇푘+1|.

Combining these estimates, we note that this layout has objective value bounded above by

′ (︀ )︀ 퐸(⃗푥 ) ≤ 퐸(⃗푥) + |푆푘−1| + |푆푘|/2 + |푆푘+1| |푆푘| − (2퐶 + 2푘 − 1)⌊푛/2⌋|푇푘+1|

≤ 퐸(⃗푥) + |푇푘−1||푇푘| − (2퐶 + 2푘 − 1)⌊푛/2⌋|푇푘+1|,

and so

|푇 ||푇 | |푇 | ≤ 푘−1 푘 푘+1 (2퐶 + 2푘 − 1)⌊푛/2⌋ 푛2 ≤ ℓ ℓ (︀퐶 + (푘 − 1) − (2ℓ − 1))︀2 (︀퐶 + 푘 − (2ℓ − 1))︀2 (2퐶 + 2푘 − 1)⌊푛/2⌋ 2ℓ 푛 (︀퐶 + (푘 − 1) − (2ℓ − 1))︀ 푛 ≤ ℓ ℓ+1 2⌊푛/2⌋ (︀퐶 + 푘 − (2ℓ − 1))︀2 (︀퐶 + (푘 − 1) − (2ℓ − 1))︀2 푛 ≤ ℓ+1 (︀퐶 + (푘 + 1) − (2(ℓ + 1) − 1))︀2

135 for all 푘 + 1 ≥ 2(ℓ + 1) − 1 and 퐶 + 푘 ≤ 푛 + 1. If 퐶 + 푘 > 푛 + 1, then

|푇 ||푇 | |푇 ||푇 | |푇 | ≤ 푘−1 푘 ≤ 푘−1 푘 < 1, 푘+1 (2퐶 + 2푘 − 1)⌊푛/2⌋ (2푛 + 1)⌊푛/2⌋

√ 푘 − 2 and so |푇푘+1| = 0. This completes the proof of claim (4.6), and implies that |푇푘| ≤ 푛 퐶 . Repeating this argument, with indices reversed, and also for the remaining dimensions 푗 = 2, . . . , 푟, leads to the desired result

√ 푘 ⃒ ⃒ − 2 ⃒{푖 ∈ [푛] | ‖⃗푚 − ⃗푥푖‖∞ ≥ (퐶 + 푘)퐷}⃒ ≤ 2푟푛퐶

for all 푘 ∈ N and 퐶 > 0.

√ 푘 − 2 Using the estimates in the proof of Lemma 14 (in particular, the bound |푇푘| ≤ 푛퐶 ), we are now prepared to prove the following diameter bound.

Lemma 15. Let 퐺 = ([푛], 퐸) have diameter 퐷. Then, for any optimal layout ⃗푥 ∈ R푟푛, i.e., ⃗푥 such that 퐸(⃗푥) ≤ 퐸(⃗푦) for all ⃗푦 ∈ R푟푛,

‖⃗푥푖 − ⃗푥푗‖2 ≤ 8퐷 + 4퐷 log2 log2 2퐷 for all 푖, 푗 ∈ [푛].

Proof. Let 퐺 = ([푛], 퐸), have diameter 퐷, and ⃗푥 be an optimal layout. Without loss of generality, suppose that the largest distance between any two points of ⃗푥 is realized between

two points lying in span{푒1} (i.e., on the axis of the first dimension). In addition, we order

the vertices so that ⃗푥1(1) ≤ · · · ≤ ⃗푥푛(1) and translate ⃗푥 so that ⃗푥⌊푛/2⌋(1) = 0.

Let 푥1(1) = −훼푒1, and suppose that 훼 ≥ 5퐷. If this is not the case, then we are done.

By differentiating 퐸 with respect to ⃗푥1(1), we obtain

푛 (︂ )︂ 휕퐸 ∑︁ ‖⃗푥푗 − ⃗푥1‖ ⃗푥푗(1) + 훼 = − − 1 = 0. 휕⃗푥 (1) 푑(1, 푗) 푑(1, 푗)‖⃗푥 − ⃗푥 ‖ 1 푗=2 푗 1

136 Let

푇1 := {푗 ∈ [푛]: ‖⃗푥푗 − ⃗푥1‖ ≥ 푑(1, 푗)},

푇2 := {푗 ∈ [푛]: ‖⃗푥푗 − ⃗푥1‖ < 푑(1, 푗)},

and compare the sum of the terms in the above equation corresponding to 푇1 and 푇2,

respectively. We begin with the former. Note that for all 푗 ≥ ⌊푛/2⌋ we must have 푗 ∈ 푇1,

because ‖푥푗 − 푥1‖ ≥ 훼 ≥ 5퐷 > 푑(1, 푗). Therefore we have

(︂ )︂ 푛 (︂ )︂ ∑︁ ‖⃗푥푗 − ⃗푥1‖ ⃗푥푗(1) + 훼 ∑︁ ‖⃗푥푗 − ⃗푥1‖ ⃗푥푗(1) + 훼 − 1 ≥ − 1 푑(1, 푗) 푑(1, 푗)‖⃗푥푗 − ⃗푥1‖ 푑(1, 푗) 푑(1, 푗)‖⃗푥푗 − ⃗푥1‖ 푗∈푇1 푗=⌊푛/2⌋ (︂‖⃗푥 − ⃗푥 ‖ )︂ 훼 ≥ ⌈푛/2⌉ 푗 1 − 1 퐷 퐷‖⃗푥푗 − ⃗푥1‖ ⌈푛/2⌉(훼 − 퐷) ≥ . 퐷2

where in the last inequality we used ‖푥푗 − 푥1‖ ≥ 훼 and ‖푥푗 − 푥1‖/퐷 − 1 ≥ 훼/퐷 − 1.

Next, we estimate the sum of terms corresponding to 푇2. Let

푇3 := {푖 ∈ [푛]: ⃗푥푖(1) + 훼 ≤ 퐷}

and note that 푇2 ⊂ 푇3. We have

(︂ )︂ ⃒ ⃒ ∑︁ ‖⃗푥푗 − ⃗푥1‖ ⃗푥푗(1) + 훼 ∑︁ ⃒ ‖⃗푥푗 − ⃗푥1‖⃒ ⃗푥푗(1) + 훼 1 − = ⃒1 − ⃒ 푑(1, 푗) 푑(1, 푗)‖⃗푥푗 − ⃗푥1‖ ⃒ 푑(1, 푗) ⃒ 푑(1, 푗)‖⃗푥푗 − ⃗푥1‖ 푗∈푇2 푗∈푇3 + ⃒ ⃒ ∑︁ ⃒ ‖⃗푥푗 − ⃗푥1‖⃒ 1 ≤ ⃒1 − ⃒ ⃒ 푑(1, 푗) ⃒ 푑(1, 푗) 푗∈푇3 + ∑︁ 1 ≤ ≤ |푇 |, 푑(1, 푗) 3 푗∈푇3 where | · |+ := max{·, 0}. Combining these estimates, we have

⌈푛/2⌉(훼 − 퐷) |푇 | − ≥ 0, 3 퐷2

137 or equivalently, |푇 |퐷2 훼 ≤ 퐷 + 3 . ⌈푛/2⌉

√ 푘 − 2 By the estimate in the proof of Lemma 14, |푇3| ≤ 푛퐶 for any 푘 ∈ N and 퐶 > 0

satisfying (퐶 + 푘)퐷 ≤ 훼 − 퐷. Taking 퐶 = 2, and 푘 = ⌊훼/퐷 − 3⌋ > 2퐷 log2 log2(2퐷), we have 2 √ 푘 2퐷 훼 ≤ 퐷 + 2퐷22− 2 ≤ 퐷 + = 2퐷, 2퐷

a contradiction. Therefore, 훼 is at most 4퐷 + 2퐷 log2 log2 2퐷. Repeating this argument with indices reversed completes the proof.

While the above estimate is sufficient for our purposes, we conjecture that it is not tight, and that the diameter of an optimal layout of a diameter 퐷 graph is always at most 2퐷.

4.3.2 Algorithmic Lower Bounds

In this subsection, we provide a proof of Theorem 16, the algorithmic lower bound of this section. Our proof is based on a reduction from a version of Max All-Equal 3SAT. The

Max All-Equal 3SAT decision problem asks whether, given variables 푡1, . . . , 푡ℓ, clauses

퐶1, . . . , 퐶푚 ⊂ {푡1, . . . , 푡ℓ, 푡¯1,..., 푡¯ℓ} each consisting of at most three literals (variables or their negation), and some value 퐿, there exists an assignment of variables such that at least 퐿 clauses have all literals equal. The Max All-Equal 3SAT decision problem is known to be APX-hard, as it does not satisfy the conditions of the Max CSP classification theorem for a polynomial time optimizable Max CSP [58] (setting all variables true or all variables false does not satisfy all clauses, and all clauses cannot be written in disjunctive normal form as two terms, one with all unnegated variables and one with all negated variables). However, we require a much more restrictive version of this problem. In particular, we require a version in which all clauses have exactly three literals, no literal appears in a clause more than once, the number of copies of a clause is equal to the number of copies of its complement (defined as the negation of all its elements), and each literal appears in exactly

138 푘 clauses. We refer to this restricted version as Balanced Max All-Equal EU3SAT-E푘. This is indeed still APX-hard, even for 푘 = 6. The proof adds little to the understanding of the hardness of the Kamada-Kawai objective, and for this reason is omitted. The proof can be found in [31].

Lemma 16. Balanced Max All-Equal EU3SAT-E6 is APX-hard.

Reduction. Suppose we have an instance of Balanced Max All-Equal EU3SAT-E푘 with variables 푡1, . . . , 푡ℓ, and clauses 퐶1, . . . , 퐶2푚. Let ℒ = {푡1, . . . , 푡ℓ, 푡¯1,..., 푡¯ℓ} be the set of

literals and 풞 = {퐶1, . . . , 퐶2푚} be the multiset of clauses. Consider the graph 퐺 = (푉, 퐸), with

푖 푖 푖 푉 = {푣 }푖∈[푁푣] ∪ {푡 } 푡∈ℒ ∪ {퐶 } 퐶∈풞 , 푖∈[푁푡] 푖∈[푁푐] [︂ ]︂ (2) 푖 푗 푖 푗 퐸 = 푉 ∖ {(푡 , 푡¯ )} 푡∈ℒ ∪ {(푡 , 퐶 )} 푡∈퐶,퐶∈풞 , 푖,푗∈[푁푡] 푖∈[푁푡],푗∈[푁푐]

(2) 20 2 20 20 where 푉 := {푈 ⊂ 푉 | |푈| = 2}, 푁푣 = 푁푐 , 푁푡 = 푁푐 , 푁푐 = 2000ℓ 푚 . The graph 퐺

consists of a central clique of size 푁푣, a clique of size 푁푡 for each literal, and a clique of

size 푁푐 for each clause; the central clique is connected to all other cliques (via bicliques); the clause cliques are connected to each other; each literal clique is connected to all other literals, save for the clique corresponding to its negation; and each literal clique is connected to the cliques of all clauses it does not appear in. Note that the distance between any two nodes in this graph is at most two.

Our goal is to show that, for some fixed 퐿 = 퐿(ℓ, 푚, 푛, 푝) ∈ N, if our instance of Balanced Max All-Equal EU3SAT-E푘 can satisfy 2푝 clauses, then there exists a layout ⃗푥 with 퐸(⃗푥) ≤ 퐿, and if it can satisfy at most 2푝 − 2 clauses, then all layouts ⃗푥′ have objective value at least 퐸(⃗푥′) ≥ 퐿 + 1. To this end, we propose a layout and calculate its objective function up to lower-order terms. Later we will show that, for any layout of an instance that satisfies at most 2푝 − 2 clauses, the objective value is strictly higher than our proposed construction for the previous case. The anticipated difference in objective value is of order Θ(푁푡푁푐), and so we will attempt to correctly place each literal vertex up to

139 (︀ )︀ (︀ )︀ accuracy 표 푁푐/(ℓ푛) and each clause vertex up to accuracy 표 푁푡/(푚푛) . The general idea of this reduction is that the clique {푣푖} serves as an “anchor" of sorts that forces all other vertices to be almost exactly at the correct distance from its center. Without loss of generality, assume this anchor clique is centered at 0. Roughly speaking, this forces each literal clique to roughly be at either −1 or +1, and the distance between negations

forces negations to be on opposite sides, i.e., ⃗푥푡푖 ≈ −⃗푥푡¯푖 . Clause cliques are also roughly at either −1 or +1, and the distance to literals forces clauses to be opposite the side with the

majority of its literals, i.e., clause 퐶 = {푡 , 푡 , 푡 } lies at ⃗푥 푖 ≈ −휒{⃗푥 푖 + ⃗푥 푖 + ⃗푥 푖 ≥ 0}, 1 2 3 퐶 푡1 푡2 푡3 where 휒 is the indicator variable. The optimal embedding of 퐺, i.e. the location of variable cliques at either +1 or −1, corresponds to an optimal assignment for the Max All-Equal 3SAT instance.

Remark 1 (Comparison to [19]). As mentioned in the Introduction, the reduction here is significantly more involved than the hardness proof for a related problem in [19]. At a high level, the key difference is that in [19] they were able to use a large number of distance-zero vertices to create a simple structure around the origin. This is no longer possible in our setting (in particular, with bounded diameter graph metrics), which results in graph layouts with much less structure. For this reason, we require a graph that exhibits as much structure as possible. To this end, a reduction from Max All-Equal 3SAT using both literals and clauses in the graph is a much more suitable technique than a reduction from Not All-Equal 3SAT using only literals. In fact, it is not at all obvious that the same approach in [19], applied to unweighted graphs, would lead to a computationally hard instance.

Structure of optimal layout. Let ⃗푥 be a globally optimal layout of 퐺, and let us label the vertices of 퐺 based on their value in ⃗푥, i.e., such that ⃗푥1 ≤ · · · ≤ ⃗푥푛. In addition, we define

푛ˆ := 푛 − 푁푣 = 2ℓ푁푡 + 2푚푁푐.

∑︀ Without loss of generality, we assume that 푖 ⃗푥푖 = 0. We consider the first-order conditions

for an arbitrary vertex 푖. Let 푆 := {(푖, 푗) | 푖 < 푗, 푑(푖, 푗) = 2}, 푆푖 := {푗 ∈ [푛] | 푑(푖, 푗) = 2},

140 and

< > 푆푖 := {푗 < 푖 | 푑(푖, 푗) = 2}, 푆푖 := {푗 > 푖 | 푑(푖, 푗) = 2},

<,+ >,+ 푆푖 := {푗 < 푖 | 푑(푖, 푗) = 2, ⃗푥푗 ≥ 0}, 푆푖 := {푗 > 푖 | 푑(푖, 푗) = 2, ⃗푥푗 ≥ 0}, <,− >,− 푆푖 := {푗 < 푖 | 푑(푖, 푗) = 2, ⃗푥푗 < 0}, 푆푖 := {푗 > 푖 | 푑(푖, 푗) = 2, ⃗푥푗 < 0}.

We have

휕퐸 ∑︁ ∑︁ ∑︁ = 2 (⃗푥 − ⃗푥 − 1) − 2 (⃗푥 − ⃗푥 − 1) − 2 (⃗푥 − ⃗푥 − 1) 휕⃗푥 푖 푗 푗 푖 푖 푗 푖 푗<푖 푗>푖 < 푗∈푆푖 ∑︁ 1 ∑︁ 1 ∑︁ + 2 (⃗푥푗 − ⃗푥푖 − 1) + (⃗푥푖 − ⃗푥푗 − 2) − (⃗푥푗 − ⃗푥푖 − 2) > 2 < 2 > 푗∈푆푖 푗∈푆푖 푗∈푆푖 ⎡ ⎤ (︀ 3 )︀ 1 ∑︁ ∑︁ = 2푛 − 2 |푆푖| ⃗푥푖 + 2(푛 + 1 − 2푖) + ⎣ (3⃗푥푗 + 2) + (3⃗푥푗 − 2)⎦ 2 < > 푗∈푆푖 푗∈푆푖 (︀ 3 )︀ 1 (︀ >,+ >,− <,+ <,− )︀ 3 = 2푛 − 2 |푆푖| ⃗푥푖 + 2(푛 + 1 − 2푖) + 2 |푆푖 | − 5|푆푖 | + 5|푆푖 | − |푆푖 | + 2 퐶푖 = 0, where

∑︁ ∑︁ 퐶푖 := (⃗푥푗 + 1) + (⃗푥푗 − 1). <,− >,− <,+ >,+ 푗∈푆푖 ∪푆푖 푗∈푆푖 ∪푆푖

The 푖푡ℎ vertex has location

>,+ >,− <,+ <,− 2푖 − (푛 + 1) |푆푖 | − 5|푆푖 | + 5|푆푖 | − |푆푖 | 3퐶푖 ⃗푥푖 = 3 − − . (4.7) 푛 − 4 |푆푖| 4푛 − 3|푆푖| 4푛 − 3|푆푖|

By Lemma 15,

|퐶푖| ≤ |푆푖| max max{|⃗푥푗 − 1|, |⃗푥푗 + 1|} ≤ 75푁푡, 푗∈[푛]

5 which implies that |⃗푥푖| ≤ 1 + 1/푁푡 for all 푖 ∈ [푛].

141 Good layout for positive instances. Next, we will formally describe a layout that we will then show to be nearly optimal (up to lower-order terms). Before giving the formal description, we describe the layout’s structure at a high level first. Given some assignment of variables that corresponds to 2푝 clauses being satisfied, for each literal 푡, we place the

푖 clique {푡 }푖∈[푁푡] at roughly +1 if the corresponding literal 푡 is true; otherwise we place the 푖 clique at roughly −1. For each clause 퐶, we place the corresponding clique {퐶 }푖∈[푁퐶 ] at roughly +1 if the majority of its literals are near −1; otherwise we place it at roughly −1.

푖 The anchor clique {푣 }푖∈[푁푣] lies in the middle of the interval [−1, +1], separating the literal and clause cliques on both sides.

The order of the vertices, from most negative to most positive, consists of

0 푘 푘 0 푇1, 푇2, 푇3 , . . . , 푇3 , 푇4, 푇5 , . . . , 푇5 , 푇6, 푇7, where

푇1 = { clause cliques near −1 with all corresponding literal cliques near +1 },

푇2 = { clause cliques near −1 with two corresponding literal cliques near +1},

휑 푇3 = { literal cliques near −1 with 휑 corresponding clauses near −1 }, 휑 = 0, . . . , 푘,

푇4 = { the anchor clique },

휑 푇5 = { literal cliques near +1 with 휑 corresponding clauses near +1 }, 휑 = 0, . . . , 푘,

푇6 = { clause cliques near +1 with two corresponding literal cliques near −1},

푇7 = { clause cliques near +1 with all corresponding literal cliques near −1 },

⋃︀푘 휑 ⋃︀푘 휑 푇3 := 휑=0 푇3 , 푇6 := 휑=0 푇6 , 푇푐 := 푇1 ∪ 푇2 ∪ 푇6 ∪ 푇7, and 푇푡 := 푇3 ∪ 푇5. Let us define 2푖−(푛+1) ⃗푦푖 := 푛 , i.e., the optimal layout of a clique 퐾푛 in one dimension. We can write our

proposed optimal layout as a perturbation of ⃗푦푖.

Using the above Equation 4.7 for ⃗푥푖, we obtain ⃗푥푖 = ⃗푦푖 for 푖 ∈ 푇4, and, by ignoring both

142 the contribution of 퐶푖 and 표(1/푛) terms, we obtain

3푁 3푁 ⃗푥 = ⃗푦 − 푡 , 푖 ∈ 푇 , ⃗푥 = ⃗푦 + 푡 , 푖 ∈ 푇 , 푖 푖 푛 1 푖 푖 푛 7 3푁 3푁 ⃗푥 = ⃗푦 − 푡 , 푖 ∈ 푇 , ⃗푥 = ⃗푦 + 푡 , 푖 ∈ 푇 , 푖 푖 2푛 2 푖 푖 2푛 6 푁 + (푘 − 휑/2)푁 푁 + (푘 − 휑/2)푁 ⃗푥 = ⃗푦 − 푡 푐 , 푖 ∈ 푇 , ⃗푥 = ⃗푦 + 푡 푐 , 푖 ∈ 푇 . 푖 푖 푛 3 푖 푖 푛 5

Next, we upper bound the objective value of ⃗푥. We proceed in steps, estimating different components up to 표(푁푐푁푡). We recall the useful formulas

푟−1 푟 2 (︀ 2 )︀ ∑︁ ∑︁ (︂2(푗 − 푖) )︂ (푟 − 1)푟 3푛 − 4푛(푟 + 1) + 2푟(푟 + 1) − 1 = , (4.8) 푛 6푛2 푖=1 푗=푖+1 and

푟 푠 2 ∑︁ ∑︁ (︂2(푗 − 푖) + 푞 )︂ 푟푠 − 1 = (︀3푛2 + 3푞2 + 4푟2 + 4푠2 − 6푛푞 + 6푛푟 − 6푛푠 푛 3푛2 푖=1 푗=1 − 6푞푟 + 6푞푠 − 6푟푠 − 2)︀ 푟푠3 13푟푠 ≤ + max{(푛 − 푠)2, (푟 − 푞)2, 푟2}. (4.9) 3푛2 3푛2

For 푖 ∈ 푇푐, |1 − |⃗푥푖|| ≤ 4푁푡/푛, and so

2 2 2 2 퐸푇푐 (⃗푥) ≤ |푇푐| max(|⃗푥푖 − ⃗푥푗| − 1) ≤ 8푚 푁푐 . 푖,푗∈푇푐

In addition, for 푖 ∈ 푇푡, |1 − |⃗푥푖|| ≤ 3ℓ푁푡/푛, and so

(︂ 6ℓ푁 )︂2 1 (︂6ℓ푁 )︂2 퐸 (⃗푥) ≤ |{푖, 푗 ∈ 푇 | 푑(푖, 푗) = 1}| 1 + 푡 + |{푖, 푗 ∈ 푇 | 푑(푖, 푗) = 2}| 푡 푇푡 푡 푛 푡 4 푛 ℓ3푁 3 ≤ (2ℓ − 1)ℓ푁 2 + 40 푡 푡 푛

143 and

(︂ 6ℓ푁 )︂2 퐸 (⃗푥) ≤ 1 + 푡 |{푖 ∈ 푇 , 푗 ∈ 푇 | 푑(푖, 푗) = 1}| 푇푐,푇푡 푁 푐 푡 1 (︂ 6ℓ푁 )︂2 + 2 × 2 + 푡 |{푖 ∈ 푇 , 푗 ∈ 푇 | 푑(푖, 푗) = 2}| 4 푛 2 3 1 (︂6ℓ푁 )︂2 + 2 × 푡 |{푖 ∈ 푇 ∪ 푇 , 푗 ∈ 푇 | 푑(푖, 푗) = 2}| 4 푛 1 2 5 (︂ 18ℓ푁 )︂ ≤ 1 + 푡 (2ℓ − 3)푁 2푚푁 푛 푡 푐 1 (︂ 30ℓ푁 )︂ + 4 + 푡 (푚 − 푝)푁 푁 2 푛 푐 푡 1 6ℓ푁 + 푡 (2푚 + 푝)푁 푁 2 푛 푐 푡 푚ℓ2푁 2푁 ≤ (4ℓ − 6)푚푁 푁 + 2(푚 − 푝)푁 푁 + 100 푡 푐 . 푡 푐 푡 푐 푛

The quantity 퐸푇4 (⃗푥) is given by (4.8) with 푟 = 푁푣, and so

푁 (푁 − 1)(3푛2 − 4푛(푁 + 1) + 2푁 (푁 + 1)) 퐸 (⃗푥) = 푣 푣 푣 푣 푣 푇4 6푛2 (푛 − 1)(푛 − 2) − 2ˆ푛푛 + 3ˆ푛2 + 3ˆ푛 ≤ . 6

The quantity 퐸푇1,푇4 (⃗푥) is given by (4.9) with

푞 = 3푁푡 +푛/ ˆ 2, 푟 = 푝푁푐, and 푠 = 푁푣,

and so

푝푁 푁 3 13푝푁 푁 푛ˆ2 푝푁 16푝푁 푛ˆ2 퐸 (⃗푥) ≤ 푐 푣 + 푐 푣 ≤ 푐 (푛 − 3ˆ푛) + 푐 . 푇1,푇4 3푛2 3푛2 3 3푛

Similarly, the quantity 퐸푇2,푇4 (⃗푥) is given by (4.9) with

3 푞 = 2 푁푡 +푛/ ˆ 2 − 푝푁푐, 푟 = (푚 − 푝)푁푐, and 푠 = 푁푣,

144 and so

(푚 − 푝)푁 푁 3 13(푚 − 푝)푁 푁 푛ˆ2 퐸 (⃗푥) ≤ 푐 푣 + 푐 푣 푇1,푇4 3푛2 3푛2 (푚 − 푝)푁 16(푚 − 푝)푁 푛ˆ2 ≤ 푐 (푛 − 3ˆ푛) + 푐 . 3 3푛

Next, we estimate 퐸 휑 (⃗푥). Using formula (4.9) with 푇3 ,푇4

푘 ∑︁ 푞 = 푁푡 + (푘 − 휑/2)푁푐 + ℓ푗, 푟 = ℓ휑, and 푠 = 푁푣, 푗=휑

휑 where ℓ휑 := |푇3 |, we have

3 2 2 ℓ휑푁푣 13ℓ휑푁푣푛ˆ ℓ휑 16ℓ휑푛ˆ 퐸푇 휑,푇 ≤ + ≤ (푛 − 3ˆ푛) + . 3 4 3푛2 3푛2 3 3푛

휑 Combining the individual estimates for each 푇3 gives us

ℓ푁 16ℓ푁 푛ˆ2 퐸 ≤ 푡 (푛 − 3ˆ푛) + 푡 . 푇3,푇4 3 3푛

Finally, we can combine all of our above estimates to obtain an upper bound on 퐸(⃗푥). We have

(푛 − 1)(푛 − 2) 퐸(⃗푥) ≤ (2ℓ − 1)ℓ푁 2 + (4ℓ − 6)푚푁 푁 + 2(푚 − 푝)푁 푁 + − 1 푛푛ˆ 푡 푡 푐 푡 푐 6 3 1 2 2 2 2 2 + 2 푛ˆ + 3 푚푁푐(푛 − 3ˆ푛) + 3 ℓ푁푡(푛 − 3ˆ푛) + 200푚 푁푐 (푛 − 1)(푛 − 2) = − 1 푛ˆ2 + (2ℓ − 1)ℓ푁 2 6 2 푡 [︀ ]︀ 2 2 + (4ℓ − 6)푚 + 2(푚 − 푝) 푁푡푁푐 + 200푚 푁푐 (푛 − 1)(푛 − 2) ≤ − ℓ푁 2 − 2(2푚 + 푝)푁 푁 + 200푚2푁 2. 6 푡 푡 푐 푐

We define the ceiling of this final upper bound to be the quantity 퐿. The remainder of the proof consists of showing that if our given instance satisfies at most 2푝 − 2 clauses, then any layout has objective value at least 퐿 + 1.

145 ′ ∑︀ ′ Suppose, to the contrary, that there exists some layout ⃗푥 (shifted so that 푖 ⃗푥푖 = 0), with 퐸(⃗푥′) < 퐿 + 1.

′ 5 From the analysis above, |⃗푥푖| ≤ 1 + 1/푁푡 for all 푖. Intuitively, an optimal layout should have a large fraction of the vertices at distance two on opposite sides. To make this intuition precise, we first note that

푛 Lemma 17. Let ⃗푥 ∈ R be a layout of the clique 퐾푛. Then 퐸(⃗푥) ≥ (푛 − 1)(푛 − 2)/6.

Proof. The first order conditions (4.7) imply that the optimal layout (up to translation and

′ (︀ )︀ ′ vertex reordering) is given by ⃗푥푖 = 2푖−(푛+1) /푛. By (4.8), 퐸(⃗푥 ) = (푛−1)(푛−2)/6.

Using Lemma 17, we can lower bound 퐸(⃗푥′) by

∑︁ 2 ∑︁ 2 1 ∑︁ 2 퐸(⃗푥′) = (︀⃗푥′ − ⃗푥′ − 1)︀ − (︀⃗푥′ − ⃗푥′ − 1)︀ + (︀⃗푥′ − ⃗푥′ − 2)︀ 푗 푖 푗 푖 4 푗 푖 푖<푗 (푖,푗)∈푆 (푖,푗)∈푆 (푛 − 1)(푛 − 2) ∑︁ ≥ − [︀ 3 (⃗푥′ − ⃗푥′ )2 − (⃗푥′ − ⃗푥′ )]︀. 6 4 푗 푖 푗 푖 (푖,푗)∈푆

Therefore, by assumption,

∑︁ (푛 − 1)(푛 − 2) [︀ 3 (⃗푥′ − ⃗푥′ )2 − (⃗푥′ − ⃗푥′ )]︀ ≥ − (퐿 + 1) 4 푗 푖 푗 푖 6 (푖,푗)∈푆

2 2 2 ≥ ℓ푁푡 + 2(2푚 + 푝)푁푡푁푐 − 200푚 푁푐 − 2.

3 2 [︀ 4 )︀ We note that the function 4 푥 − 푥 equals one at 푥 = 2 and is negative for 푥 ∈ 0, 3 . ′ 5 Because |⃗푥푖| ≤ 1 + 1/푁푡 for all 푖,

(︂ )︂2 (︂ )︂ [︀ 3 ′ ′ 2 ′ ′ ]︀ 3 2 2 7 max 4 (⃗푥푗 − ⃗푥푖) − (⃗푥푗 − ⃗푥푖) ≤ 2 + 5 − 2 + 5 ≤ 1 + 5 . (푖,푗)∈푆 4 푁푡 푁푡 푁푡

Let ′ ′ 1 ′ 1 푇 := {(푖, 푗) ∈ 푆 | ⃗푥푖 ≤ − 6 and ⃗푥푗 ≥ 6 }.

′ 2 By assumption, |푇 | is at most ℓ푁푡 + 2(2푚 + 푝 − 1)푁푡푁푐, otherwise the corresponding instance could satisfy at least 2푝 clauses, a contradiction.

146 [︀ 3 ′ ′ 2 ′ ′ ]︀ ′ However, the quantity 4 (⃗푥푗 − ⃗푥푖) − (⃗푥푗 − ⃗푥푖) is negative for all (푖, 푗) ∈ 푆∖푇 . Therefore,

(︂ )︂ 7 ′ 2 2 2 1 + 5 |푇 | ≥ ℓ푁푡 + 2(2푚 + 푝)푁푡푁푐 − 200푚 푁푐 − 2, 푁푡 which implies that

(︂ )︂ ′ 7 (︀ 2 2 2 )︀ |푇 | ≥ 1 − 5 ℓ푁푡 + 2(2푚 + 푝)푁푡푁푐 − 200푚 푁푐 − 2 푁푡 2 2 2 ≥ ℓ푁푡 + 2(2푚 + 푝)푁푡푁푐 − 200푚 푁푐 − 2000

2 > ℓ푁푡 + 2(2푚 + 푝 − 1)푁푡푁푐 + 푁푡푁푐,

a contradiction. This completes the proof, with a gap of one.

4.3.3 An Approximation Algorithm

In this subsection, we formally describe an approximation algorithm using tools from the Dense CSP literature, and prove theoretical guarantees for the algorithm.

Preliminaries: Greedy Algorithms for Max-CSP

A long line of work studies the feasibility of solving the Max-CSP problem under various related pseudorandomness and density assumptions. In our case, an algorithm with mild dependence on the alphabet size is extremely important. A very simple greedy approach, proposed and analyzed by Mathieu and Schudy [81, 101] (see also [133]), satisfies this requirement.

Theorem 18 ([81, 101]). Suppose that Σ is a finite alphabet, 푛 ≥ 1 is a positive integer, (︀푛)︀ and for every 푖, 푗 ∈ 2 we have a function 푓푖푗 :Σ × Σ → [−푀, 푀]. Then for any 2 2 푂(1/휖2) 휖 > 0, Algorithm GREEDYCSP with 푡0 = 푂(1/휖 ) runs in time 푛 |Σ| and returns

147 Algorithm 4 Greedy Algorithm for Dense CSPs [81, 101]

function GreedyCSP(Σ, 푛, 푡0, {푓푖푗}) Shuffle the order of variables 푥1, . . . , 푥푛 by a random permutation. 푡0 for all assignments 푥1, . . . , 푥푡0 ∈ Σ do for (푡0 + 1) ≤ 푖 ≤ 푛 do Choose 푥푖 ∈ Σ to maximize ∑︁ 푓푗푖(푥푗, 푥푖) 푗<푖

end for ∑︀ Record 푥 and objective value 푖̸=푗 푓푖푗(푥푖, 푥푗). end for Return the assignment 푥 found with maximum objective value. end function

푥1, . . . , 푥푛 ∈ Σ such that

∑︁ ∑︁ * * 2 E 푓푖푗(푥푖, 푥푗) ≥ 푓푖푗(푥푖 , 푥푗 ) − 휖푀푛 푖̸=푗 푖̸=푗

* * for any 푥1, . . . , 푥푛 ∈ Σ, where E denotes the expectation over the randomness of the algorithm.

In comparison, we note that computing the maximizer using brute force would run in time |Σ|푛, i.e. exponentially slower in terms of 푛. This guarantee is stated in expectation but, if desired, can be converted to a high probability guarantee by using Markov’s inequality

and repeating the algorithm multiple times. We use GREEDYCSP to solve a minimization

problem instead of maximization, which corresponds to negating all of the functions 푓푖푗. To do so, we require estimates on the error due to discretization of our domain.

Lemma 18. For 푐, 푅 > 0 and 푦 ∈ R푟 with ‖푦‖ ≤ 푅, the function 푥 ↦→ (‖푥 − 푦‖/푐 − 1)2 is 2 푐 max(1, 2푅/푐)-Lipschitz on 퐵푅 = {푥 : ‖푥‖ ≤ 푅}.

2 2 Proof. We first prove that the function 푥 ↦→ (푥/푐 − 1) is 푐 max(1, 푅/푐)-Lipschitz on the 2 ⃒ 2 ⃒ interval [0, 푅]. Because the derivative of the function is 푐 (푥/푐 − 1) and ⃒ 푐 (푥/푐 − 1)⃒ ≤ 2 푐 max(1, 푅/푐) on [0, 푅], this result follows from the mean value theorem.

148 Algorithm 5 Approximation Algorithm KKSCHEME

function KKSCHEME(휖1, 휖2, 푅): 푟 Build an 휖1-net 푆휖1 of 퐵푅 = {푥 : ‖푥‖ ≤ 푅} ⊂ R as in Lemma 12. Apply the GREEDYCSP algorithm of Theorem 18 with 휖 = 휖2 to approximately 푛 minimize 퐸(⃗푥1, . . . , ⃗푥푛) over ⃗푥1, . . . , ⃗푥푛 ∈ 푆휖1 . Return ⃗푥1, . . . , ⃗푥푛. end function

Because the function ‖푥 − 푦‖ is 1-Lipschitz and ‖푥 − 푦‖ ≤ ‖푥‖ + ‖푦‖ ≤ 2푅 by the

2 2 triangle inequality, the result follows from the fact that 푥 ↦→ (푥/푐 − 1) is 푐 max(1, 푅/푐)- Lipschitz and a composition of Lipschitz functions is Lipschitz.

Combining this result with Lemma 12 we can bound the loss in objective value due to restricting to a well-chosen discretization.

푟 Lemma 19. Let ⃗푥1, . . . , ⃗푥푛 ∈ R be arbitrary vectors such that ‖⃗푥푖‖ ≤ 푅 for all 푖 and 푟 휖 > 0 be arbitrary. Define 푆휖 to be an 휖-net of 퐵푅 as in Lemma 12, so |푆휖| ≤ (3푅/휖) . For ′ ′ any input metric over [푛] with min푖,푗∈[푛] 푑(푖, 푗) = 1, there exists ⃗푥1, . . . , ⃗푥푛 ∈ 푆휖 such that

′ ′ 2 퐸(⃗푥1, . . . , ⃗푥푛) ≤ 퐸(⃗푥1, . . . , ⃗푥푛) + 4휖푅푛 where 퐸 is (4.5) defined with respect to an arbitrary graph with 푛 vertices.

(︀푛)︀ 2 Proof. By Lemma 18 the energy 퐸(⃗푥1, . . . , ⃗푥푛) is the sum of 2 ≤ 푛 /2 many terms, ′ which, for each 푖 and 푗, are individually 4푅-Lipschitz in ⃗푥푖 and ⃗푥푗. Therefore, defining ⃗푥푖

to be the closest point in 푆휖 for all 푖 gives the desired result.

This result motivates a straightforward algorithm for producing nearly-optimal layouts of a graph. Given some graph, by Lemma 15, a sufficiently large radius can be chosen so that there exists an optimal layout in a ball of that radius. By constructing an 휖-net of that

ball, and applying the GREEDYCSP algorithm to the resulting discretization, we obtain a nearly-optimal layout with theoretical guarantees. This technique is formally described in

the algorithm KKSCHEME. We are now prepared to prove Theorem 17.

149 By combining Lemma 19 with Theorem 18 (used as a minimization instead of maxi- mization algorithm), the output ⃗푥1, . . . , ⃗푥푛 of KKSCHEME satisfies

* * 2 2 2 퐸(⃗푥1, . . . , ⃗푥푛) ≤ 퐸(⃗푥1, . . . , ⃗푥푛) + 4휖1푅푛 + 휖2푅 푛

2 2 푂(1/휖 )푟 log(3푅/휖1) 2 and runs in time 푛 2 2 . Taking 휖2 = 푂(휖/푅 ) and 휖1 = 푂(휖/푅) completes the proof of Theorem 17. The runtime can be improved to 푛2 + (푅/휖)푂(푑푅4/휖2) using a slightly more complex greedy CSP algorithm [81]. Also, by the usual argument, a high probability guarantee can be derived by repeating the algorithm 푂(log(2/훿)) times, where 훿 > 0 is the desired failure probability.

150 Chapter 5

Error Estimates for the Lanczos Method

5.1 Introduction

The computation of extremal eigenvalues of matrices is one of the most fundamental problems in numerical linear algebra. When a matrix is large and sparse, methods such as the Jacobi eigenvalue algorithm and QR algorithm become computationally infeasible, and, therefore, techniques that take advantage of the sparsity of the matrix are required. Krylov subspace methods are a powerful class of techniques for approximating extremal eigenvalues, most notably the Arnoldi iteration for non-symmetric matrices and the Lanczos method for symmetric matrices.

The Lanczos method is a technique that, given a symmetric matrix 퐴 ∈ R푛×푛 and an 푛 푚×푚 initial vector 푏 ∈ R , iteratively computes a tridiagonal matrix 푇푚 ∈ R that satisfies 푇 푛×푚 푇푚 = 푄푚퐴푄푚, where 푄푚 ∈ R is an orthonormal basis for the Krylov subspace

푚−1 풦푚(퐴, 푏) = span{푏, 퐴푏, ..., 퐴 푏}.

(푚) (푚) The eigenvalues of 푇푚, denoted by 휆1 (퐴, 푏) ≥ ... ≥ 휆푚 (퐴, 푏), are the Rayleigh-Ritz

approximations to the eigenvalues 휆1(퐴) ≥ ... ≥ 휆푛(퐴) of 퐴 on 풦푚(퐴, 푏), and, therefore,

151 are given by

푇 (푚) 푥 퐴푥 휆푖 (퐴, 푏) = min max 푇 , 푖 = 1, ..., 푚, (5.1) 푈⊂풦푚(퐴,푏) 푥∈푈 푥 푥 dim(푈)=푚+1−푖 푥̸=0

or, equivalently,

푇 (푚) 푥 퐴푥 휆푖 (퐴, 푏) = max min 푇 , 푖 = 1, ..., 푚. (5.2) 푈⊂풦푚(퐴,푏) 푥∈푈 푥 푥 dim(푈)=푖 푥̸=0

This description of the Lanczos method is sufficient for a theoretical analysis of error (i.e., without round-off error), but, for completeness, we provide a short description of the

Lanczos method (when 풦푚(퐴, 푏) is full-rank) in Algorithm 6 [113]. For a more detailed discussion of the nuances of practical implementation and techniques to minimize the effects of round-off error, we refer the reader to [47, Section 10.3]. If 퐴 has 휈푛 non-zero entries,

then Algorithm 6 outputs 푇푚 after approximately (2휈 + 8)푚푛 floating point operations. A number of different techniques, such as the divide-and-conquer algorithm with the fast multipole method [23], can easily compute the eigenvalues of a tridiagonal matrix. The complexity of this computation is negligible compared to the Lanczos algorithm, as, in practice, 푚 is typically orders of magnitude less than 푛.

(푚) Equation (5.1) for the Ritz values 휆푖 illustrates the significant improvement that the Lanczos method provides over the power method for extremal eigenvalues. Whereas the power method uses only the iterate 퐴푚푏 as an approximation of an eigenvector associated with the largest magnitude eigenvalue, the Lanczos method uses the span of all of the iterates

of the power method (given by 풦푚+1(퐴, 푏)). However, the analysis of the Lanczos method is significantly more complicated than that of the power method. Error estimates for extremal eigenvalue approximation using the Lanczos method have been well studied, most notably by Kaniel [56], Paige [91], Saad [99], and Kuczynski and Wozniakowski [67] (other notable work includes [30, 36, 68, 84, 92, 104, 105, 125]). The work of Kaniel, Paige, and Saad focused on the convergence of the Lanczos method as 푚 increases, and, therefore, their results have strong dependence on the spectrum of the matrix 퐴 and the choice of initial

152 vector 푏. For example, a standard result of this type is the estimate

휆 (퐴) − 휆(푚)(퐴, 푏) (︂ tan (푏, 휙 ) )︂2 휆 (퐴) − 휆 (퐴) 1 1 ≤ ∠ 1 , 훾 = 1 2 , (5.3) 휆1(퐴) − 휆푛(퐴) 푇푚−1(1 + 2훾) 휆2(퐴) − 휆푛(퐴)

푡ℎ where 휙1 is the eigenvector corresponding to 휆1, and 푇푚−1 is the (푚−1) degree Chebyshev polynomial of the first kind [100, Theorem 6.4]. Kuczynski and Wozniakowski took a quite (푚) different approach, and estimated the maximum expected relative error (휆1 − 휆1 )/휆1 over all 푛 × 푛 symmetric positive definite matrices, resulting in error estimates that depend only on the dimension 푛 and the number of iterations 푚. They produced the estimate

[︃ (푚) ]︃ 2 4 휆1(퐴) − 휆1 (퐴, 푏) ln (푛(푚 − 1) ) sup E푏∼풰(푆푛−1) ≤ .103 , (5.4) 푛 2 퐴∈풮++ 휆1(퐴) (푚 − 1)

푛 for all 푛 ≥ 8 and 푚 ≥ 4, where 풮++ is the set of all 푛 × 푛 symmetric positive definite matrices, and the expectation is with respect to the uniform probability measure on the

hypersphere 푆푛−1 = {푥 ∈ R푛 | ‖푥‖ = 1} [67, Theorem 3.2]. One can quickly verify

that equation (5.4) also holds when the 휆1(퐴) term in the denominator is replaced by 푛 휆1(퐴) − 휆푛(퐴), and the supremum over 풮++ is replaced by the maximum over the set of all 푛 × 푛 symmetric matrices, denoted by 풮푛. Both of these approaches have benefits and drawbacks. If an extremal eigenvalue is known to have a reasonably large eigenvalue gap (based on application or construction), then a distribution dependent estimate provides a very good approximation of error, even for small 푚. However, if the eigenvalue gap is not especially large, then distribution dependent estimates can significantly overestimate error, and estimates that depend only on 푛 and 푚 are preferable. This is illustrated by the following elementary, yet enlightening, example.

푛 Example 1. Let 퐴 ∈ 푆++ be the tridiagonal matrix resulting from the discretization of the

Laplacian operator on an interval with Dirichlet boundary conditions, namely, 퐴푖,푖 = 2

for 푖 = 1, ..., 푛 and 퐴푖,푖+1 = 퐴푖+1,푖 = −1 for 푖 = 1, ..., 푛 − 1. The eigenvalues of 퐴

are given by 휆푖(퐴) = 2 + 2 cos (푖휋/(푛 + 1)), 푖 = 1, ..., 푛. Consider the approximation

of 휆1(퐴) by 푚 iterations of the Lanczos method. For a random choice of 푏, the expected

153 Algorithm 6 Lanczos Method

Input: symmetric matrix 퐴 ∈ R푛×푛, vector 푏 ∈ R푛, number of iterations 푚.

푚×푚 Output: symmetric tridiagonal matrix 푇푚 ∈ R , 푇푚(푖, 푖) = 훼푖,

푇 푇푚(푖, 푖 + 1) = 훽푖, satisfying 푇푚 = 푄푚퐴푄푚, where 푄푚 = [푞1 ... 푞푚].

Set 훽0 = 0, 푞0 = 0, 푞1 = 푏/‖푏‖ For 푖 = 1, ..., 푚

푣 = 퐴푞푖 푇 훼푖 = 푞푖 푣

푣 = 푣 − 훼푖푞푖 − 훽푖−1푞푖−1

훽푖 = ‖푣‖

푞푖+1 = 푣/훽푖

2 2 value of tan ∠(푏, 휙1) is (1 + 표(1))푛. If tan ∠(푏, 휙1) = 퐶푛 for some constant 퐶, then (5.3) produces the estimate

(푚) (︂ (︂ )︂ (︂ )︂)︂−2 휆1(퐴) − 휆1 (퐴, 푏) 휋 3휋 ≤ 퐶푛 푇푚−1 1 + 2 tan tan 휆1(퐴) − 휆푛(퐴) 2(푛 + 1) 2(푛 + 1) ≃ 푛(1 + 푂(푛−1))−푚 ≃ 푛.

In this instance, the estimate is a trivial one for all choices of 푚, which varies greatly from the error estimate (5.4) of order ln2 푛/푚2. The exact same estimate holds for the smallest

eigenvalue 휆푛(퐴) when the corresponding bounds are applied, since 4퐼 − 퐴 is similar to 퐴.

Now, consider the approximation of the largest eigenvalue of 퐵 = 퐴−1 by 푚 iterations of the Lanczos method. The matrix 퐵 possesses a large gap between the largest and second- largest eigenvalue, which results in a value of 훾 for (5.3) that remains bounded below by a constant independent of 푛, namely

훾 = (2 cos (휋/(푛 + 1)) + 1) / (2 cos (휋/(푛 + 1)) − 1) .

154 Therefore, in this instance, the estimate (5.3) illustrates a constant convergence rate, pro- duces non-trivial bounds (for typical 푏) for 푚 = Θ(ln 푛), and is preferable to the error estimate (5.4) of order ln2 푛/푚2.

−훼 More generally, if a matrix 퐴 has eigenvalue gap 훾 . 푛 and the initial vector 푏 2 훼/2 satisfies tan ∠(푏, 휙1) & 푛, then the error estimate (5.3) is a trivial one for 푚 . 푛 . This implies that the estimate (5.3) is most useful when the eigenvalue gap is constant or tends to zero very slowly as 푛 increases. When the gap is not especially large (say, 푛−훼, 훼 constant), then uniform error estimates are preferable for small values of 푚. In this work, we focus on uniform bounds, namely, error estimates that hold uniformly over some large set of

푛 푛 matrices (typically 풮++ or 풮 ). We begin by recalling some of the key existing uniform error estimates for the Lanczos method.

5.1.1 Related Work

Uniform error estimates for the Lanczos method have been produced almost exclusively for symmetric positive definite matrices, as error estimates for extremal eigenvalues of

푛 symmetric matrices can be produced from estimates for 푆++ relatively easily. The majority

of results apply only to either 휆1(퐴), 휆푛(퐴), or some function of the two (i.e., condition number). All estimates are probabilistic and take the initial vector 푏 to be uniformly distributed on the hypersphere. Here we provide a brief description of some key uniform error estimates previously produced for the Lanczos method. In [67], Kuczynski and Wozniakowski produced a complete analysis of the power method and provided a number of upper bounds for the Lanczos method. Most notably, they produced error estimate (5.4) and provided the following upper bound for the probability (푚) that the relative error (휆1 − 휆1 )/휆1 is greater than some value 휖:

[︃ ]︃ (푚) √ 휆1(퐴) − 휆1 (퐴, 푏) √ − 휖(2푚−1) sup P > 휖 ≤ 1.648 푛푒 . (5.5) 푛 푏∼풰(푆푛−1) 퐴∈풮++ 휆1(퐴)

However, the authors were unable to produce any lower bounds for (5.4) or (5.5), and stated that a more sophisticated analysis would most likely be required. In the same paper, they

155 performed numerical experiments for the Lanczos method that produced errors of the order 푚−2, which led the authors to suggest that the error estimate (5.4) may be an overestimate, and that the ln2 푛 term may be unnecessary.

In [68], Kuczynski and Wozniakowski noted that the above estimates immediately (푚) translate to relative error estimates for minimal eigenvalues when the error 휆푚 − 휆푛 is

considered relative to 휆1 or 휆1 − 휆푛 (both normalizations can be shown to produce the 푛 same bound). However, we can quickly verify that there exist sequences in 풮++ for which [︀ (푚) ]︀ the quantity E (휆푚 − 휆푛)/휆푛 is unbounded. These results for minimal eigenvalues, combined with (5.4), led to error bounds for estimating the condition number of a matrix. Unfortunately, error estimates for the condition number face the same issue as the quantity (푚) (휆푚 − 휆푛)/휆푛, and therefore, only estimates that depend on the value of the condition number can be produced. [︀ The proof technique used to produce (5.4) works specifically for the quantity E (휆1 − (푚) ]︀ 휆1 )/휆1 (i.e., the 1-norm), and does not carry over to more general 푝-norms of the form [︀ (푚) 푝]︀1/푝 E ((휆1 − 휆1 )/휆1) , 푝 ∈ (1, ∞). Later, in [30, Theorem 5.2, 푟 = 1], Del Corso and √ Manzini produced an upper bound of the order ( 푛/푚)1/푝 for arbitrary 푝-norms, given by

푝 1 1 [︃(︃ (푚) )︃ ]︃ 푝 (︃ (︀ 1 )︀ (︀ 푛 )︀)︃ 푝 휆1(퐴) − 휆1 (퐴, 푏) 1 Γ 푝 − 2 Γ 2 sup 푛−1 . E푏∼풰(푆 ) . 1/푝 (︀ 푛−1 )︀ (5.6) 푛 휆 (퐴) 푚 퐴∈풮++ 1 Γ(푝)Γ 2

This bound is clearly worse than (5.4), and better bounds can be produced for arbitrary 푝 simply by making use of (5.5). Again, the authors were unable to produce any lower bounds.

More recently, the machine learning and optimization community has become increas- ingly interested in the problem of approximating the top singular vectors of a symmetric matrix by randomized techniques (for a review, see [49]). This has led to a wealth of new results in recent years regarding classical techniques, including the Lanczos method. In [84], Musco and Musco considered a block Krylov subspace algorithm similar to the block Lanczos method and showed that with high probability their algorithm achieves an error of 휖 in 푚 = 푂(휖−1/2 ln 푛) iterations, matching the bound (5.5) for a block size of one. Very recently, a number of lower bounds for a wide class of randomized algorithms were shown

156 in [104, 105], all of which can be applied to the Lanczos method as a corollary. First, the authors showed that the dependence on 푛 in the above upper bounds was in fact necessary. In particular, it follows from [105, Theorem A.1] that 푂(log 푛) iterations are necessary to obtain a non-trivial error. In addition, as a corollary of their main theorem [105, Theorem

−1 1], there exist 푐1, 푐2 > 0 such that for all 휖 ∈ (0, 1) there exists an 푛표 = poly(휖 ) such that 푛 for all 푛 ≥ 푛표 there exists a random matrix 퐴 ∈ 풮 such that

[︂ ]︂ 휌(퐴) − 휌(푇 ) 휖 푐 ≥ ≥ 1 − 푒−푛 2 for all 푚 ≤ 푐 휖−1/2 ln 푛, (5.7) P 휌(퐴) 12 1 where 휌(·) is the spectral radius of a matrix, and the randomness is over both the initial vector and matrix.

5.1.2 Contributions and Remainder of Chapter

In what follows, we prove improved upper bounds for the maximum expected error of the Lanczos method in the 푝-norm, 푝 ≥ 1, and combine this with nearly-matching asymptotic lower bounds with explicit constants. These estimates can be found in Theorem 19. The upper bounds result from using a slightly different proof technique than that of (5.4), which is more robust to arbitrary 푝-norms. Comparing the lower bounds of Theorem 19 to the estimate (5.7), we make a number of observations. Whereas (5.7) results from a statistical analysis of random matrices, our estimates follow from taking an infinite sequence of non- random matrices with explicit eigenvalues and using the theory of orthogonal polynomials. Our estimate for 푚 = 푂(ln 푛) (in Theorem 19) is slightly worse than (5.7) by a factor of ln2 ln 푛, but makes up for this in the form of an explicit constant. The estimate for 푚 = 표(푛1/2 ln−1/2 푛) (also in Theorem 19) does not have 푛 dependence, but illustrates a useful lower bound, as it has an explicit constant and the ln 푛 term becomes negligible as 푚 increases. The results (5.7) do not fully apply to this regime, as the dependence of the required lower bound on 푛 is at least cubic in the inverse of the eigenvalue gap (see [105, Theorem 6.1] for details). To complement these bounds, we also provide an error analysis for matrices that have a

157 certain structure. In Theorem 20, we produce improved dimension-free upper bounds for

matrices that have some level of eigenvalue “regularity" near 휆1. In addition, in Theorem 21, we prove a powerful result that can be used to determine, for any fixed 푚, the asymptotic

푛 relative error for any sequence of matrices 푋푛 ∈ 풮 , 푛 = 1, 2, ..., that exhibits suitable convergence of its empirical spectral distribution. Later, we perform numerical experiments that illustrate the practical usefulness of this result. Theorem 21, combined with estimates for Jacobi polynomials (see Proposition 13), illustrates that the inverse quadratic dependence on the number of iterations 푚 in the estimates produced throughout this chapter does not simply illustrate the worst case, but is actually indicative of the typical case in some sense.

In Corollary 3 and Theorem 22, we produce results similar to Theorem 19 for arbitrary

eigenvalues 휆푖. The lower bounds follow relatively quickly from the estimates for 휆1, but the upper bounds require some mild assumptions on the eigenvalue gaps of the matrix. These results mark the first uniform-type bounds for estimating arbitrary eigenvalues by the Lanczos method. In addition, in Corollary 4, we translate our error estimates for the

extremal eigenvalues 휆1(퐴) and 휆푛(퐴) into error bounds for the condition number of a symmetric positive definite matrix, but, as previously mentioned, the relative error of the condition number of a matrix requires estimates that depend on the condition number itself. Finally, we present numerical experiments that support the accuracy and practical usefulness of the theoretical estimates detailed above.

The remainder of the chapter is as follows. In Section 5.2, we prove basic results regarding relative error and make a number of fairly straightforward observations. In Section 5.3, we prove asymptotic lower bounds and improved upper bounds for the relative error in an arbitrary 푝-norm. In Section 5.4, we produce a dimension-free error estimate for a large class of matrices and prove a theorem that can be used to determine the asymptotic

푛 relative error for any fixed 푚 and sequence of matrices 푋푛 ∈ 풮 , 푛 = 1, 2, ..., with suitable convergence of its empirical spectral distribution. In Section 5.5, under some mild additional

assumptions, we prove a version of Theorem 19 for arbitrary eigenvalues 휆푖, and extend

our results for 휆1 and 휆푛 to the condition number of a symmetric positive definite matrix. Finally, in Section 5.6, we perform a number of experiments and discuss how the numerical

158 results compare to the theoretical estimates in this work.

5.2 Preliminary Results

Because the Lanczos method applies only to symmetric matrices, all matrices in this

푛 chapter are assumed to belong to 풮 . The Lanczos method and the quantity (휆1(퐴) − (푚) 휆1 (퐴, 푏))/(휆1(퐴) − 휆푛(퐴)) are both unaffected by shifting and scaling, and so any 푛 푛 maximum over 퐴 ∈ 풮 can be replaced by a maximum over all 퐴 ∈ 풮 with 휆1(퐴) and

휆푛(퐴) fixed. Often for producing upper bounds, it is convenient to choose 휆1(퐴) = 1 and (푚) (푚) 휆푛(퐴) = 0. For the sake of brevity, we will often write 휆푖(퐴) and 휆푗 (퐴, 푏) as 휆푖 and 휆푗 when the associated matrix 퐴 and vector 푏 are clear from context.

(푚) We begin by rewriting expression (5.2) for 휆1 in terms of polynomials. The Krylov

subspace 풦푚(퐴, 푏) can be alternatively defined as

풦푚(퐴, 푏) = {푃 (퐴)푏 | 푃 ∈ 풫푚−1},

where 풫푚−1 is the set of all real-valued polynomials of degree at most 푚 − 1. Suppose 퐴 ∈ 풮푛 has eigendecomposition 퐴 = 푄Λ푄푇 , where 푄 ∈ R푛×푛 is an and 푛×푛 Λ ∈ R is the diagonal matrix satisfying Λ(푖, 푖) = 휆푖(퐴), 푖 = 1, ..., 푛. Then we have the relation

푥푇 퐴푥 푏푇 푃 2(퐴)퐴푏 ˜푏푇 푃 2(Λ)Λ˜푏 휆(푚)(퐴, 푏) = max = max = max , 1 푇 푇 2 푇 2 푥∈풦푚(퐴,푏) 푥 푥 푃 ∈풫푚−1 푏 푃 (퐴)푏 푃 ∈풫푚−1 ˜푏 푃 (Λ)˜푏 푥̸=0 푃 ̸=0 푃 ̸=0 where ˜푏 = 푄푇 푏. The relative error is given by

휆 (퐴) − 휆(푚)(퐴, 푏) ∑︀푛 ˜푏2푃 2(휆 )(휆 − 휆 ) 1 1 = min 푖=2 푖 푖 1 푖 , 푃 ∈풫 ∑︀푛 ˜2 2 휆1(퐴) − 휆푛(퐴) 푚−1 (휆1 − 휆푛) 푏 푃 (휆푖) 푃 ̸=0 푖=1 푖

159 and the expected 푝푡ℎ moment of the relative error is given by

푝 푝 [︃(︃ (푚) )︃ ]︃ ∫︁ [︃ ∑︀푛 ˜2 2 ]︃ 휆1 − 휆1 푖=2 푏푖 푃 (휆푖)(휆1 − 휆푖) ˜ 푏∼풰(푆푛−1) = min 푑휎(푏), E 푃 ∈풫 ∑︀푛 ˜2 2 휆1 − 휆푛 푆푛−1 푚−1 (휆1 − 휆푛) 푏 푃 (휆푖) 푃 ̸=0 푖=1 푖 where 휎 is the uniform probability measure on 푆푛−1. Because the relative error does not depend on the norm of ˜푏 or the sign of any entry, we can replace the integral over 푆푛−1 by

푛 an integral of 푦 = (푦1, ..., 푦푛) over [0, ∞) with respect to the joint chi-square probability density function {︃ 푛 }︃ 푛 1 1 1 ∑︁ ∏︁ − 푓 (푦) = exp − 푦 푦 2 (5.8) 푌 (2휋)푛/2 2 푖 푖 푖=1 푖=1

2 of 푛 independent chi-square random variables 푌1, ..., 푌푛 ∼ 휒1 with one degree of freedom each. In particular, we have

푝 [︃(︃ (푚) )︃ ]︃ ∫︁ [︂ ∑︀푛 2 ]︂푝 휆1 − 휆1 푖=2 푦푖푃 (휆푖)(휆1 − 휆푖) 푏∼풰(푆푛−1) = min 푛 푓푌 (푦) 푑푦. E 푃 ∈풫 ∑︀ 2 휆1 − 휆푛 [0,∞)푛 푚−1 (휆1 − 휆푛) 푖=1 푦푖푃 (휆푖) 푃 ̸=0 (5.9) Similarly, probabilistic estimates with respect to 푏 ∼ 풰(푆푛−1) can be replaced by estimates

2 with respect to 푌1, ..., 푌푛 ∼ 휒1, as

⎡ ⎤ [︃ (푚) ]︃ ∑︀푛 2 휆1 − 휆1 푖=2 푌푖푃 (휆푖)(휆1 − 휆푖) P ≥ 휖 = P ⎣ min 푛 ≥ 휖⎦ . (5.10) 푏∼풰(푆푛−1) 2 푃 ∈풫 ∑︀ 2 휆1 − 휆푛 푌푖∼휒1 푚−1 (휆1 − 휆푛) 푖=1 푌푖푃 (휆푖) 푃 ̸=0

For the remainder of the chapter, we almost exclusively use equation (5.9) for expected relative error and equation (5.10) for probabilistic bounds for relative error. If 푃 minimizes the expression in equation (5.9) or (5.10) for a given 푦 or 푌 , then any polynomial of the

form 훼푃 , 훼 ∈ R∖{0}, is also a minimizer. Therefore, without any loss of generality, we

can alternatively minimize over the set 풫푚−1(1) = {푄 ∈ 풫푚−1 | 푄(1) = 1}. For the sake of brevity, we will omit the subscripts under E and P when the underlying distribution is clear from context. In this work, we make use of asymptotic notation to express the limiting behavior of a function with respect to 푛. A function 푓(푛) is 푂(푔(푛)) if there exists

푀, 푛0 > 0 such that |푓(푛)| ≤ 푀푔(푛) for all 푛 ≥ 푛0, 표(푔(푛)) if for every 휖 > 0 there exists

160 a 푛휖 such that |푓(푛)| ≤ 휖푔(푛) for all 푛 ≥ 푛0, 휔(푔(푛)) if |푔(푛)| = 표(|푓(푛)|), and Θ(푔(푛)) if 푓(푛) = 푂(푔(푛)) and 푔(푛) = 푂(푓(푛)).

5.3 Asymptotic Lower Bounds and Improved Upper Bounds

[︀ (푚) 푝]︀1/푝 In this section, we obtain improved upper bounds for E ((휆1 − 휆1 )/휆1) , 푝 ≥ 1, and produce nearly-matching lower bounds. In particular, we prove the following theorem for the behavior of 푚 as a function of 푛 as 푛 tends to infinity.

Theorem 19. [︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) max 1 1 ≥ 1 − 표(1) ≥ 1 − 표(1/푛) 푛 P 퐴∈풮 푏∼풰(푆푛−1) 휆1(퐴) − 휆푛(퐴)

for 푚 = 표(ln 푛),

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) .015 ln2 푛 max 1 1 ≥ ≥ 1 − 표(1/푛) 푛 P푛−1 2 2 퐴∈풮 푏∼풰(푆 ) 휆1(퐴) − 휆푛(퐴) 푚 ln ln 푛 for 푚 = Θ(ln 푛),

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) 1.08 max 1 1 ≥ ≥ 1 − 표(1/푛) 푛 P 2 퐴∈풮 푏∼풰(푆푛−1) 휆1(퐴) − 휆푛(퐴) 푚

(︁ )︁ for 푚 = 표 푛1/2 ln−1/2 푛 and 휔(1), and

푝 1/푝 [︃(︃ (푚) )︃ ]︃ 2 8푝 휆1(퐴) − 휆1 (퐴, 푏) ln (푛(푚 − 1) ) max 푛−1 ≤ .068 푛 E푏∼풰(푆 ) 2 퐴∈풮 휆1(퐴) − 휆푛(퐴) (푚 − 1) for 푛 ≥ 100, 푚 ≥ 10, 푝 ≥ 1.

Proof. The first result is a corollary of [105, Theorem A.1], and the remaining results follow from Lemmas 20, 21, and 22.

By Hölder’s inequality, the lower bounds in Theorem 19 also hold for arbitrary 푝-norms,

161 푝 ≥ 1. All results are equally applicable to 휆푛, as the Krylov subspace is unaffected by

shifting and scaling, namely 풦푚(퐴, 푏) = 풦푚(훼퐴 + 훽퐼, 푏) for all 훼 ̸= 0. We begin by producing asymptotic lower bounds. The general technique is as follows.

∞ 푛 We choose an infinite sequence of matrices {퐴푛}푛=1, 퐴푛 ∈ 풮 , treat 푚 as a function of 푛, and show that, as 푛 tends to infinity, for “most" choices of an initial vector 푏, the relative error of this sequence of matrices is well approximated by an integral polynomial minimization problem for which bounds can be obtained. First, we recall a number of useful propositions regarding Gauss-Legendre quadrature, orthogonal polynomials, and Chernoff bounds for chi-square random variables.

푘 푡ℎ Proposition 9. ([41],[114]) Let 푃 ∈ 풫2푘−1, {푥푖}푖=1 be the zeros of the 푘 degree Legendre

polynomial 푃푘(푥), 푃푘(1) = 1, in descending order (푥1 > ... > 푥푘), and 푤푖 = 2(1 − 2 −1 ′ −2 푥푖 ) [푃푘(푥푖)] , 푖 = 1, ..., 푘 be the corresponding weights. Then

푘 ∫︁ 1 ∑︁ 1. 푃 (푥) 푑푥 = 푤푖푃 (푥푖), −1 푖=1

(︀ 1 )︀ (︁ (4푖−1)휋 )︁ −3 2. 푥푖 = 1 − 8푘2 cos 4푘+2 + 푂(푘 ), 푖 = 1, ..., 푘,

휋 √︁ (︂ 1 )︂ 휋 3. 1 − 푥2 1 − ≤ 푤 < ... < 푤 ≤ . 1 1 1 2 2 1 ⌊︀ 푘+1 ⌋︀ 1 푘 + 2 8(푘 + 2 ) (1 − 푥1) 2 푘 + 2 Proposition 10. ([110, Section 7.72], [111]) Let 휔(푥) 푑푥 be a measure on [−1, 1] with

infinitely many points of increase, with orthogonal polynomials {푝푘(푥)}푘≥0, 푝푘 ∈ 풫푘. Then

∫︀ 1 푥푃 2(푥) 휔(푥) 푑푥 max −1 = max {︀푥 ∈ [−1, 1] ⃒ 푝 (푥) = 0}︀. ∫︀ 1 ⃒ 푘+1 푃 ∈풫푘 푃 2(푥) 휔(푥) 푑푥 푃 ̸=0 −1

푘 2 [︀ 푥 {︀ 푥 }︀]︀ 2 Proposition 11. Let 푍 ∼ 휒푘. Then P[푍 ≤ 푥] ≤ 푘 exp 1 − 푘 for 푥 ≤ 푘 and 푘 [︀ 푥 {︀ 푥 }︀]︀ 2 P[푍 ≥ 푥] ≤ 푘 exp 1 − 푘 for 푥 ≥ 푘.

Proof. The result follows from taking [26, Lemma 2.2] and letting 푑 → ∞.

We are now prepared to prove a lower bound of order 푚−2.

162 Lemma 20. If 푚 = 표(︀푛1/2 ln−1/2 푛)︀ and 휔(1), then

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) 1.08 max 1 1 ≥ ≥ 1 − 표(1/푛). 푛 P 2 퐴∈풮 푏∼풰(푆푛−1) 휆1(퐴) − 휆푛(퐴) 푚

Proof. The structure of the proof is as follows. We choose a matrix with eigenvalues based on the zeros of a Legendre polynomial, and show that a large subset of [0, ∞)푛 (with

respect to 푓푌 (푦)) satisfies conditions that allow us to lower bound the relative error using an integral minimization problem. The choice of Legendre polynomials is based on connections to Gaussian quadrature, but, as we will later see (in Theorem 21 and experiments), the 1/푚2 dependence is indicative of a large class of matrices with connections to orthogonal

푡ℎ polynomials. Let 푥1 > ... > 푥2푚 be the zeros of the (2푚) degree Legendre polynomial. 푛 Let 퐴 ∈ 풮 have eigenvalue 푥1 with multiplicity one, and remaining eigenvalues given by

푥푗, 푗 = 2, ..., 2푚, each with multiplicity at least ⌊(푛 − 1)/(2푚 − 1)⌋. By equation (5.10),

⎡ ⎤ [︃ (푚) ]︃ ∑︀푛 2 휆1 − 휆1 푖=2 푌푖푃 (휆푖)(휆1 − 휆푖) ≥ 휖 = ⎣ min 푛 ≥ 휖⎦ P P 푃 ∈풫 ∑︀ 2 휆1 − 휆푛 푚−1 (휆1 − 휆푛) 푖=1 푌푖푃 (휆푖) 푃 ̸=0 ⎡ ⎤ ∑︀2푚 ˆ 2 푗=2 푌푗푃 (푥푗)(푥1 − 푥푗) = P ⎣ min ≥ 휖⎦ , 푃 ∈풫 ∑︀2푚 ˆ 2 푚−1 (푥1 − 푥2푚) 푌푗푃 (푥푗) 푃 ̸=0 푗=1

2 ˆ ˆ where 푌1, ..., 푌푛 ∼ 휒1, and 푌푗 is the sum of the 푌푖 that satisfy 휆푖 = 푥푗. 푌1 has one degree of ˆ freedom, and 푌푗, 푗 = 2, ..., 2푚, each have at least ⌊(푛 − 1)/(2푚 − 1)⌋ degrees of freedom.

Let 푤1, ..., 푤2푚 be the weights of Gaussian quadrature associated with 푥1, ..., 푥2푚. By Proposition 9,

4 + 9휋2 4 + 49휋2 푥 = 1 − + 푂(푚−3), 푥 = 1 − + 푂(푚−3), 1 128푚2 2 128푚2

2 4+9휋2 −3 and, therefore, 1 − 푥1 = 64푚2 + 푂(푚 ). Again, by Proposition 9, we can lower bound

163 the smallest ratio between weights by

√︁ (︂ )︂ 푤1 2 1 ≥ 1 − 푥1 1 − 2 2 푤푚 8(2푚 + 1/2) (1 − 푥1) (︃√ )︃ 4 + 9휋2 (︂ 1 )︂ = + 푂(푚−2) 1 − 8푚 (4 + 9휋2)/2 + 푂(푚−1) 2 + 9휋2 = √ + 푂(푚−2). 8 4 + 9휋2 푚

Therefore, by Proposition 11,

[︂ ]︂ [︂ ⌊︂ ⌋︂ ]︂ [︂ ⌊︂ ⌋︂]︂ ˆ 푤푚 ˆ ˆ 1 푛 − 1 ˆ 푤1 푛 − 1 P min 푌푗 ≥ 푌1 ≥ P 푌푗 ≥ , 푗 ≥ 2 × P 푌1 ≤ 푗≥2 푤1 3 2푚 − 1 3푤푚 2푚 − 1 1 ⌊︁ 푛−1 ⌋︁ 1 [︂ ]︂2푚−1[︂ (︂ 푤 ⌊︁ 푛−1 ⌋︁)︂ ]︂ (︁ 2/3 )︁ 1 2 푒 2 2푚−1 푤1 ⌊︀ 푛−1 ⌋︀ 1− ≥ 1 − 1 − 푒 3푤푚 2푚−1 3 3푤푚 2푚−1

= 1 − 표(1/푛).

푛 We now restrict our attention to values of 푌 = (푌1, ..., 푌푛) ∈ [0, ∞) that satisfy the ˆ ˆ inequality 푤1 min푗≥2 푌푗 ≥ 푤푚푌1. If, for some fixed choice of 푌 ,

∑︀2푚 ˆ 2 푗=2 푌푗푃 (푥푗)(푥1 − 푥푗) min ≤ 푥1 − 푥2, (5.11) 푃 ∈풫 ∑︀2푚 ˆ 2 푚−1 푌푗푃 (푥푗) 푃 ̸=0 푗=1

then, by Proposition 9 and Proposition 10 for 휔(푥) = 1,

∑︀2푚 ˆ 2 ∑︀2푚 2 푌푗푃 (푥푗)(푥1 − 푥푗) 푤푗푃 (푥푗)(푥1 − 푥푗) min 푗=2 ≥ min 푗=2 푃 ∈풫 ∑︀2푚 ˆ 2 푃 ∈풫 ∑︀2푚 2 푚−1 푌푗푃 (푥푗) 푚−1 푤푗푃 (푥푗) 푃 ̸=0 푗=1 푃 ̸=0 푗=1 ∑︀2푚 2 푗=1 푤푗푃 (푥푗)(1 − 푥푗) = min − (1 − 푥1) 푃 ∈풫 ∑︀2푚 2 푚−1 푤푗푃 (푥푗) 푃 ̸=0 푗=1 ∫︁ 1 2 푃 (푦)(1 − 푦) 푑푦 2 −1 4 + 9휋 −3 = min 1 − 2 + 푂(푚 ) 푃 ∈풫푚−1 ∫︁ 128푚 푃 ̸=0 푃 2(푦) 푑푦 −1 4 + 9휋2 4 + 9휋2 12 + 27휋2 = − + 푂(푚−3) = + 푂(푚−3). 32푚2 128푚2 128푚2

164 Alternatively, if equation (5.11) does not hold, then we can lower bound the left hand side

5휋2 −3 of (5.11) by 푥1 − 푥2 = 16푚2 + 푂(푚 ). Noting that 푥1 − 푥2푚 ≤ 2 completes the proof, as 12+27휋2 5휋2 256 and 32 are both greater than 1.08.

The previous lemma illustrates that the inverse quadratic dependence of the relative error on the number of iterations 푚 persists up to 푚 = 표(푛1/2/ ln1/2 푛). As we will see in Sections 4 and 6, this 푚−2 error is indicative of the behavior of the Lanczos method for a wide range of typical matrices encountered in practice. Next, we aim to produce a lower bound of the form ln2 푛/(푚2 ln2 ln 푛). To do so, we will make use of more general Jacobi polynomials instead of Legendre polynomials. In addition, in order to obtain a result that contains some dependence on 푛, the choice of Jacobi polynomial must vary with 푛 (i.e., 훼 and 훽 are a function of 푛). However, due to the distribution of eigenvalues required to produce this bound, Gaussian quadrature is no longer exact, and we must make use of estimates for basic composite quadrature. We recall the following propositions regarding composite quadrature and Jacobi polynomials.

1 Proposition 12. If 푓 ∈ 풞 [푎, 푏], then for 푎 = 푥0 < ... < 푥푛 = 푏,

⃒∫︁ 푏 푛 ⃒ ⃒ ∑︁ * ⃒ 푏 − 푎 ′ ⃒ 푓(푥) 푑푥 − (푥푖 − 푥푖−1)푓(푥푖 )⃒ ≤ max |푓 (푥)| max (푥푖 − 푥푖−1), ⃒ ⃒ 2 푥∈[푎,푏] 푖=1,...,푛 ⃒ 푎 푖=1 ⃒

* where 푥푖 ∈ [푥푖−1, 푥푖], 푖 = 1, ..., 푛.

(훼,훽) ∞ Proposition 13. ([103, Chapter 3.2]) Let {푃푘 (푥)}푘=0, 훼, 훽 > −1, be the orthogonal system of Jacobi polynomials over [−1, 1] with respect to weight function 휔훼,훽(푥) = (1 − 푥)훼(1 + 푥)훽, namely,

푘 (︂ )︂ (︂ )︂푖 (훼,훽) Γ(푘 + 훼 + 1) ∑︁ 푘 Γ(푘 + 푖 + 훼 + 훽 + 1) 푥 − 1 푃 (푥) = . 푘 푘! Γ(푘 + 훼 + 훽 + 1) 푖 Γ(푖 + 훼 + 1) 2 푖=0

Then

∫︁ 1 2 훼+훽+1 [︁ (훼,훽) ]︁ 훼,훽 2 Γ(푘 + 훼 + 1)Γ(푘 + 훽 + 1) (i) 푃푘 (푥) 휔 (푥) 푑푥 = , −1 (2푘 + 훼 + 훽 + 1)푘! Γ(푘 + 훼 + 훽 + 1)

165 ⃒ ⃒ {︂Γ(푘 + 훼 + 1) Γ(푘 + 훽 + 1)}︂ ⃒ (훼,훽) ⃒ 1 (ii) max 푃푘 (푥) = max , for max{훼, 훽} ≥ − , 푥∈[−1,1] ⃒ ⃒ 푘! Γ(훼 + 1) 푘! Γ(훽 + 1) 2

푑 푘 + 훼 + 훽 + 1 (iii) 푃 (훼,훽)(푥) = 푃 (훼+1,훽+1)(푥), 푑푥 푘 2 푘−1

√︃ (︂ 훼 + 3/2 )︂2 (iv) max {푥 ∈ [−1, 1] | 푃 (훼,훽)(푥) = 0} ≤ 1 − for 훼 ≥ 훽 > − 11 and 푘 푘 + 훼 + 1/2 12 푘 > 1.

Proof. Equations (i)-(iii) are standard results, and can be found in [103, Chapter 3.2] or

푡ℎ [110]. What remains is to prove (iv). By [90, Lemma 3.5], the largest zero 푥1 of the 푘 (휆−1/2,휆−1/2) degree Gegenbauer polynomial (i.e., 푃푘 (푥)), with 휆 > −5/12, satisfies

(푘 − 1)(푘 + 2휆 + 1) (푘 − 1)(푘 + 2휆 + 1) (︂휆 + 1)︂2 푥2 ≤ ≤ = 1− . 1 2 5 1 2 2 (푘 + 휆) + 3휆 + 4 + 3(휆 + 2 ) /(푘 − 1) (푘 + 휆) 푘 + 휆

(훼,훽+푡) By [37, Theorem 2.1], the largest zero of 푃푘 (푥) is strictly greater than the largest zero (훼,훽) of 푃푘 (푥) for any 푡 > 0. As 훼 ≥ 훽, combining these two facts provides our desired result.

For the sake of brevity, the inner product and norm on [−1, 1] with respect to 휔훼,훽(푥) =

훼 훽 (1 − 푥) (1 + 푥) will be denoted by ⟨·, ·⟩훼,훽 and ‖·‖훼,훽. We are now prepared to prove the following lower bound.

Lemma 21. If 푚 = Θ (ln 푛), then

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) .015 ln2 푛 max 1 1 ≥ ≥ 1 − 표(1/푛). 푛 P푛−1 2 2 퐴∈풮 푏∼풰(푆 ) 휆1(퐴) − 휆푛(퐴) 푚 ln ln 푛

Proof. The structure of the proof is similar in concept to that of Lemma 20. We choose a matrix with eigenvalues based on a function corresponding to an integral minimization

푛 problem, and show that a large subset of [0, ∞) (with respect to 푓푌 (푦)) satisfies conditions that allow us to lower bound the relative error using the aforementioned integral minimization problem. The main difficulty is that, to obtain improved bounds, we must use Proposition 10

166 with a weight function that requires quadrature for functions that are no longer polynomials, and, therefore, cannot be represented exactly using Gaussian quadrature. In addition, the required quadrature is for functions whose derivative has a singularity at 푥 = 1, and so Proposition 12 is not immediately applicable. Instead, we perform a two-part error analysis. In particular, if a function is 퐶1[0, 푎], 푎 < 1, and monotonic on [푎, 1], then using Proposition 12 on [0, 푎] and a monotonicity argument on [푎, 1] results in an error bound for quadrature.

⌊︀ .2495 ln 푛 ⌋︀ ⌊︀ 4.004ℓ⌋︀ Let ℓ = ln ln 푛 , 푘 = 푚 , and 푚 = Θ(ln 푛). We assume that 푛 is sufficiently large, so that ℓ > 0. ℓ is quite small for practical values of 푛, but the growth of ℓ with respect to 푛 is required for a result with dependence on 푛. Consider a matrix 퐴 ∈ 풮푛 with eigenvalues given by 푓(푥푗), 푗 = 1, ..., 푘, each with multiplicity either ⌊푛/푘⌋ or ⌈푛/푘⌉, 1 where 푓(푥) = 1 − 2(1 − 푥) ℓ , and 푥푗 = 푗/푘, 푗 = 1, ..., 푘. Then

⎡ ⎤ [︃ (푚) ]︃ ∑︀푛 2 휆1 − 휆1 푖=2 푌푖푃 (휆푖)(휆1 − 휆푖) ≥ 휖 = ⎣ min 푛 ≥ 휖⎦ P P 푃 ∈풫 ∑︀ 2 휆1 − 휆푛 푚−1 (휆1 − 휆푛) 푖=1 푌푖푃 (휆푖) 푃 ̸=0 ⎡ ⎤ ∑︀푘 ˆ 2 푗=1 푌푗푃 (푓(푥푗)) (1 − 푓(푥푗)) ≥ P ⎣ min ≥ 휖⎦ , 푃 ∈풫 ∑︀푘 ˆ 2 푚−1 2 푌푗푃 (푓(푥푗)) 푃 ̸=0 푗=1

2 ˆ ˆ where 푌1, ..., 푌푛 ∼ 휒1, and 푌푗 is the sum of the 푌푖’s that satisfy 휆푖 = 푓(푥푗). Each 푌푗, 푗 = 1, ..., 푘, has either ⌊푛/푘⌋ or ⌈푛/푘⌉ degrees of freedom. Because 푚 = Θ(ln 푛), we have 푘 = 표(푛.999) and, by Proposition 11,

(︂ 푛 푛 )︂푘 [︁ ]︁ ⌊ ⌋ ⌈ ⌉ 푛 ˆ 푛 (︀ .001)︀ 푘 (︀ −.001)︀ 푘 P .999⌊ 푘 ⌋ ≤ 푌푗 ≤ 1.001⌈ 푘 ⌉, 푗 = 1, ..., 푘 ≥ 1 − .999푒 − 1.001푒 = 1 − 표(1/푛).

Therefore, with probability 1 − 표(1/푛),

∑︀푘 ˆ 2 ∑︀푘 2 푌푗푃 (푓(푥푗)) (1 − 푓(푥푗)) .998 푃 (푓(푥푗))(1 − 푓(푥푗)) min 푗=1 ≥ min 푗=1 . 푃 ∈풫 ∑︀푘 ˆ 2 푃 ∈풫 ∑︀푘 2 푚−1 2 푌푗푃 (푓(푥푗)) 2 푚−1 푃 (푓(푥푗)) 푃 ̸=0 푗=1 푃 ̸=0 푗=1

167 ∑︀푚−1 (ℓ,0) Let 푃 (푦) = 푟=0 푐푟푃푟 (푦), 푐푟 ∈ R, 푟 = 0, ..., 푚 − 1, and define

(ℓ,0) (ℓ,0) 푔푟,푠(푥) = 푃푟 (푓(푥))푃푠 (푓(푥)) and 푔ˆ푟,푠(푥) = 푔푟,푠(푥)(1 − 푓(푥)),

푟, 푠 = 0, ..., 푚 − 1. Then we have

∑︀푘 2 ∑︀푚−1 ∑︀푘 푃 (푓(푥푗))(1 − 푓(푥푗)) 푐푟푐푠 푔ˆ푟,푠(푥푗) 푗=1 = 푟,푠=0 푗=1 . ∑︀푘 2 ∑︀푚−1 ∑︀푘 푗=1 푃 (푓(푥푗)) 푟,푠=0 푐푟푐푠 푗=1 푔푟,푠(푥푗)

∑︀푘 ∑︀푘 We now replace each quadrature 푗=1 푔ˆ푟,푠(푥푗) or 푗=1 푔푟,푠(푥푗) in the previous expression

by its corresponding integral, plus a small error term. The functions 푔푟,푠 and 푔ˆ푟,푠 are not elements of 풞1[0, 1], as 푓 ′(푥) has a singularity at 푥 = 1, and, therefore, we cannot use Proposition 12 directly. Instead we break the error analysis of the quadrature into two

1 components. We have 푔푟,푠, 푔ˆ푟,푠 ∈ 풞 [0, 푎] for any 0 < 푎 < 1, and, if 푎 is chosen to be large

enough, both 푔푟,푠 and 푔ˆ푟,푠 will be monotonic on the interval [푎, 1]. In that case, we can apply Proposition 12 to bound the error over the interval [0, 푎] and use basic properties of Riemann sums of monotonic functions to bound the error over the interval [푎, 1].

The function 푓(푥) is increasing on [0, 1], and, by equations (iii) and (iv) of Proposition (ℓ,0) 13, the function 푃푟 (푦), 푟 = 2, ..., 푚 − 1, is increasing on the interval

⎡√︃ ⎤ (︂ ℓ + 5/2 )︂2 1 − , 1 . (5.12) ⎣ 푚 + ℓ − 1/2 ⎦

(ℓ,0) (ℓ,0) The functions 푃0 (푦) = 1 and 푃1 (푦) = ℓ + 1 + (ℓ + 2)(푦 − 1)/2 are clearly non- √ decreasing over interval (5.12). By the inequality 1 − 푦 ≤ 1−푦/2, 푦 ∈ [0, 1], the function (ℓ,0) 푃푟 (푓(푥)), 푟 = 0, ..., 푚 − 1, is non-decreasing on the interval

[︃ ]︃ (︂ ℓ + 5/2 )︂2ℓ 1 − , 1 . (5.13) 2푚 + 2ℓ − 1

Therefore, the functions 푔푟,푠(푥) are non-decreasing and 푔ˆ푟,푠(푥) are non-increasing on the 2ℓ (︁ ℓ+5/2 )︁ interval (5.13) for all 푟, 푠 = 0, ..., 푚 − 1. The term 2푚+2ℓ−1 = 휔(1/푘), and so, for

168 sufficiently large 푛, there exists an index 푗* ∈ {1, ..., 푘 − 2} such that

(︂ ℓ + 5/2 )︂2ℓ (︂ ℓ + 5/2 )︂2ℓ 1 1 − ≤ 푥 * < 1 − + < 1. 2푚 + 2ℓ − 1 푗 2푚 + 2ℓ − 1 푘

We now upper bound the derivatives of 푔푟,푠 and 푔ˆ푟,푠 on the interval [0, 푥푗* ]. By equation

(iii) of Proposition 13, the first derivatives of 푔푟,푠 and 푔ˆ푟,푠 are given by

(︃[︂ ]︂ [︂ ]︂ )︃ ′ ′ 푑 (ℓ,0) (ℓ,0) (ℓ,0) 푑 (ℓ,0) 푔푟,푠(푥) = 푓 (푥) 푃푟 (푦) 푃푠 (푓(푥)) + 푃푟 (푓(푥)) 푃푠 (푦) 푑푦 푦=푓(푥) 푑푦 푦=푓(푥) ℓ−1 − (1 − 푥) ℓ (︂ = (푟 + ℓ + 1)푃 (ℓ+1,1)(푓(푥))푃 (ℓ,0)(푓(푥)) ℓ 푟−1 푠 )︂ (ℓ,0) (ℓ+1,1) +(푠 + ℓ + 1)푃푟 (푓(푥))푃푠−1 (푓(푥)) ,

and ℓ−1 − 1 (1 − 푥) ℓ 푔ˆ′ (푥) = 푔′ (푥)(1 − 푥) ℓ − 2푔 (푥) . 푟,푠 푟,푠 푟,푠 ℓ 푦 (︀푥)︀ (︁ 푒푥 )︁ By equation (ii) of Proposition 13, and the inequality 푦 ≤ 푦 , 푥, 푦 ∈ N, 푦 ≤ 푥,

ℓ−1 − ℓ (︂ (︂ )︂(︂ )︂ ′ (1 − 푥푗* ) 푟 + ℓ 푠 + ℓ max |푔푟,푠(푥)| ≤ (푟 + ℓ + 1) 푥∈[0,푥푗* ] ℓ ℓ + 1 ℓ (︂푟 + ℓ)︂(︂푠 + ℓ)︂)︂ +(푠 + ℓ + 1) ℓ ℓ + 1 ℓ−1 (︃ )︃− ℓ 푚 + ℓ (︂ ℓ + 5/2 )︂2ℓ 1 (︂푒(푚 + ℓ − 1))︂2ℓ+1 ≤ 2 − , ℓ 2푚 + 2ℓ − 1 푘 ℓ

and

ℓ−1 − ℓ (︂ )︂(︂ )︂ ⃒ ′ ⃒ ′ (1 − 푥푗* ) 푟 + ℓ 푠 + ℓ max ⃒푔ˆ푟,푠(푥)⃒ ≤ max |푔푟,푠(푥)| + 2 푥∈[0,푥푗* ] 푥∈[0,푥푗* ] ℓ ℓ ℓ ℓ−1 (︃ )︃− ℓ 푚 + ℓ (︂ ℓ + 5/2 )︂2ℓ 1 (︂푒(푚 + ℓ − 1))︂2ℓ+1 ≤ 4 − . ℓ 2푚 + 2ℓ − 1 푘 ℓ

′ ′ 4.002ℓ Therefore, both max푥∈[0,푥푗* ] |푔푟,푠(푥)| and max푥∈[0,푥푗* ] |푔ˆ푟,푠(푥)| are 표(푚 ). Then, by

169 Proposition 12, equation (ii) of Proposition 13 and monotonicity on the interval [푥푗* , 1], we have

⃒ 푘 ⃒ ⃒ 푗* ⃒ ⃒1 ∑︁ ∫︁ 1 ⃒ ⃒1 ∑︁ ∫︁ 푥푗* ⃒ ⃒ 푔 (푥 ) − 푔 (푥) 푑푥⃒ ≤ ⃒ 푔 (푥 ) − 푔 (푥) 푑푥⃒ ⃒푘 푟,푠 푗 푟,푠 ⃒ ⃒푘 푟,푠 푗 푟,푠 ⃒ ⃒ 푗=1 0 ⃒ ⃒ 푗=1 0 ⃒ ⃒ 푘 ⃒ ⃒1 ∑︁ ∫︁ 1 ⃒ + ⃒ 푔 (푥 ) − 푔 (푥) 푑푥⃒ ⃒푘 푟,푠 푗 푟,푠 ⃒ ⃒ 푗=푗*+1 푎 ⃒ 1 + 표(1) 푔 (1) ≤ + 푟,푠 2푘푚4.002ℓ 푘 1 + 표(1) 1 (︂푒(푚 + ℓ + 1))︂2(ℓ+1) ≤ + 2푚.002ℓ 푘 ℓ + 1 1 + 표(1) ≤ , 2푚.002ℓ and, similarly,

⃒ 푘 ⃒ ⃒ 푗* ⃒ ⃒1 ∑︁ ∫︁ 1 ⃒ ⃒1 ∑︁ ∫︁ 푥푗* ⃒ ⃒ 푔ˆ (푥 ) − 푔ˆ (푥) 푑푥⃒ ≤ ⃒ 푔ˆ (푥 ) − 푔ˆ (푥) 푑푥⃒ ⃒푘 푟,푠 푗 푟,푠 ⃒ ⃒푘 푟,푠 푗 푟,푠 ⃒ ⃒ 푗=1 0 ⃒ ⃒ 푗=1 0 ⃒ ⃒ 푘 ⃒ ⃒1 ∑︁ ∫︁ 1 ⃒ + ⃒ 푔ˆ (푥 ) − 푔ˆ (푥) 푑푥⃒ ⃒푘 푟,푠 푗 푟,푠 ⃒ ⃒ 푗=푗*+1 푥푗* ⃒

1 + 표(1) 푔ˆ (푥 * ) ≤ + 푟,푠 푗 2푚.002ℓ 푘 1 + 표(1) 푔 (1) ≤ + 푟,푠 2푚.002ℓ 푘 1 + 표(1) ≤ . 2푚.002ℓ

Let us denote this upper bound by 푀 = (1 + 표(1))/(2푚.002ℓ). By using the substitution

1−푦 ℓ 푥 = 1 − ( 2 ) , we have

∫︁ 1 ∫︁ 1 ℓ (ℓ,0) (ℓ,0) ℓ (ℓ,0) (ℓ,0) 푔ˆ푟,푠(푥) 푑푥 = ℓ ⟨푃푟 , 푃푠 ⟩ℓ,0 and 푔푟,푠(푥) 푑푥 = ℓ ⟨푃푟 , 푃푠 ⟩ℓ−1,0. 0 2 0 2

170 Then

푚−1 [︁ (ℓ,0) (ℓ,0) ]︁ ∑︀푘 2 ∑︀ ℓ 푃 (푓(푥푗)) 푟,푠=0 푐푟푐푠 2ℓ ⟨푃푟 , 푃푠 ⟩ℓ−1,0 + 휖푟,푠 max 푗=1 ≤ max . 푘 푚 [︁ ]︁ 푃 ∈풫푚−1 ∑︀ 푃 2(푓(푥 ))(1 − 푓(푥 )) 푐∈R ∖0 ∑︀푚−1 ℓ (ℓ,0) (ℓ,0) 푗=1 푗 푗 푐푟푐푠 ℓ ⟨푃푟 , 푃푠 ⟩ℓ,0 +휖 ˆ푟,푠 푃 ̸=0 |휖푟,푠|≤푀 푟,푠=0 2 |휖^푟,푠|≤푀

(ℓ,0) Letting 푐˜푟 = 푐푟‖푃푟 ‖ℓ,0, 푟 = 0, ..., 푚−1, and noting that, by equation (i) of Proposition 1/2 (ℓ,0) (︁ 2ℓ+1 )︁ 13, ‖푃푟 ‖ℓ,0 = 2푟+ℓ+1 , we obtain the bound

[︂ (ℓ,0) (ℓ,0) ]︂ ∑︀푚−1 ⟨푃푟 ,푃푠 ⟩ℓ−1,0 ∑︀푘 2 푟,푠=0 푐˜푟푐˜푠 (ℓ,0) (ℓ,0) + 휖푟,푠 푃 (푓(푥푗)) ‖푃푟 ‖ℓ,0‖푃푠 ‖ℓ,0 max 푗=1 ≤ max , 푘 푚 푚−1 푚−1 푃 ∈풫푚−1 ∑︀ 2 푐˜∈ ∖0 ∑︀ 2 ∑︀ 푗=1 푃 (푓(푥푗))(1 − 푓(푥푗)) R 푟=0 푐˜푟 + 푟,푠=0 푐˜푟푐˜푠휖ˆ푟,푠 푃 ̸=0 |휖푟,푠|≤휖 |휖^푟,푠|≤휖 where 푀(2푚 + ℓ − 1) 2푚 + ℓ − 1 휖 = = (1 + 표(1)) = 표(1/푚). ℓ 2푚.002ℓℓ

푚×푚 (ℓ,0) (ℓ,0) (ℓ,0) −1 (ℓ,0) −1 Let 퐵 ∈ R be given by 퐵(푟, 푠) = ⟨푃푟 , 푃푠 ⟩ℓ−1,0‖푃푟 ‖ℓ,0 ‖푃푠 ‖ℓ,0 , 푟, 푠 = 0, ..., 푚 − 1. By Proposition 10 applied to 휔ℓ−1,0(푥) and equation (iv) of Proposition 13, we have 푐˜푇 퐵푐˜ 1 (︂푚 + ℓ − 1/2)︂2 max ≤ ≤ 2 . 푐˜∈ 푚∖0 푇 √︂ 2 R 푐˜ 푐˜ (︁ ℓ+1/2 )︁ ℓ + 1/2 1 − 1 − 푚+ℓ−1/2

This implies that

∑︀푘 2 푇 (︀ 푇 )︀ 푃 (푓(푥푗)) 푐˜ 퐵 + 휖11 푐˜ max 푗=1 ≤ max 푃 ∈풫 ∑︀푘 2 푐˜∈ 푚 푇 11푇 푚−1 푃 (푓(푥푗))(1 − 푓(푥푗)) R 푐˜ (퐼 − 휖 )푐 ˜ 푃 ̸=0 푗=1 푐˜̸=0 1 [︂푐˜푇 퐵푐˜]︂ 휖푚 ≤ max + 푚 푇 1 − 휖푚 푐˜∈R 푐˜ 푐˜ 1 − 휖푚 푐˜̸=0 2 (︂푚 + ℓ − 1/2)︂2 휖푚 ≤ + , 1 − 휖푚 ℓ + 1/2 1 − 휖푚

171 and that, with probability 1 − 표(1/푛),

∑︀푘 ˆ 2 2 2 푌푗푃 (푓(푥푗)) (1 − 푓(푥푗)) .998 (︂ ℓ + 1/2 )︂ .015 ln 푛 min 푗=1 ≥ (1−표(1)) ≥ . 푃 ∈풫 ∑︀푘 ˆ 2 2 2 푚−1 푌푗푃 (푓(푥푗)) 4 푚 + ℓ − 1/2 푚 ln ln 푛 푃 ̸=0 푗=1

This completes the proof.

In the previous proof, a number of very small constants were used, exclusively for the purpose of obtaining an estimate with constant as close to 1/64 as possible. The constants used in the definition of ℓ and 푘 can be replaced by more practical numbers (that begin to exhibit asymptotic convergence for reasonably sized 푛), at the cost of a constant worse than .015. This completes the analysis of asymptotic lower bounds. We now move to upper bounds for relative error in the 푝-norm. Our estimate for relative error in the one-norm is of the same order as the estimate (5.4), but with an improved constant. Our technique for obtaining these estimates differs from the technique of [67] in

one key way. Rather than integrating first by 푏1 and using properties of the arctan function, we replace the ball 퐵푛 by 푛 chi-square random variables on [0, ∞)푛, and iteratively apply Cauchy-Schwarz to our relative error until we obtain an exponent 푐 for which the inverse chi-square distribution with one degree of freedom has a convergent 푐푡ℎ moment.

Lemma 22. Let 푛 ≥ 100, 푚 ≥ 10, and 푝 ≥ 1. Then

푝 1 [︃(︃ (푚) )︃ ]︃ 푝 2 8푝 휆1(퐴) − 휆1 (퐴, 푏) ln (푛(푚 − 1) ) max 푛−1 ≤ .068 . 푛 E푏∼풰(푆 ) 2 퐴∈풮 휆1(퐴) − 휆푛(퐴) (푚 − 1)

Proof. Without loss of generality, suppose 휆1(퐴) = 1 and 휆푛(퐴) = 0. By repeated application of Cauchy Schwarz,

푛 푛 1 푛 푞 1 ∑︀ 푦 푄2(휆 )(1 − 휆 ) [︂∑︀ 푦 푄2(휆 )(1 − 휆 )2 ]︂ 2 [︂∑︀ 푦 푄2(휆 )(1 − 휆 )2 ]︂ 2푞 푖=1 푖 푖 푖 ≤ 푖=1 푖 푖 푖 ≤ 푖=1 푖 푖 푖 ∑︀푛 2 ∑︀푛 2 ∑︀푛 2 푖=1 푦푖푄 (휆푖) 푖=1 푦푖푄 (휆푖) 푖=1 푦푖푄 (휆푖)

for 푞 ∈ N. Choosing 푞 to satisfy 2푝 < 2푞 ≤ 4푝, and using equation (5.9) with polynomial

172 normalization 푄(1) = 1, we have

푝 1 푝 1 [︃(︃ (푚) )︃ ]︃ 푝 [︃ 푞 ]︃ 푝 휆 − 휆 ∫︁ (︂∑︀푛 푦 푄2(휆 )(1 − 휆 )2 )︂ 2푞 1 1 ≤ min 푖=2 푖 푖 푖 푓 (푦) 푑푦 E ∑︀푛 2 푌 휆1 − 휆푛 푄∈풫푚−1(1) [0,∞)푛 푖=1 푦푖푄 (휆푖) 1 푝 ⎡ ⎤ 푝 (︃∑︀ 2 2푞 )︃ 2푞 ∫︁ 푦푖푄 (휆푖)(1 − 휆푖) ≤ min 푖:휆푖<훽 푓 (푦) 푑푦 ⎣ ∑︀푛 2 푌 ⎦ 푄∈풫푚−1(1) [0,∞)푛 푖=1 푦푖푄 (휆푖)

1 푝 ⎡ ⎤ 푝 (︃∑︀ 2 2푞 )︃ 2푞 ∫︁ 푦푖푄 (휆푖)(1 − 휆푖) + 푖:휆푖≥훽 푓 (푦) 푑푦 ⎣ ∑︀푛 2 푌 ⎦ [0,∞)푛 푖=1 푦푖푄 (휆푖)

for any 훽 ∈ (0, 1). The integrand of the first term satisfies

푝 1 (︃∑︀ 2 2푞 )︃ 2푞 (︃∑︀ 2 2푞 )︃ 4 푦푖푄 (휆푖)(1 − 휆푖) 푦푖푄 (휆푖)(1 − 휆푖) 푖:휆푖<훽 ≤ 푖:휆푖<훽 ∑︀푛 2 ∑︀푛 2 푖=1 푦푖푄 (휆푖) 푖=1 푦푖푄 (휆푖) 1 (︂∑︀ )︂ 4 푞−2 푦푖 ≤ max |푄(푥)|1/2(1 − 푥)2 푖:휆푖<훽 , 푥∈[0,훽] 푦1

and the second term is always bounded above by max휆∈[훽,1](1 − 휆) = 1 − 훽. We replace 2 푇푚−1( 훽 푥−1) the minimizing polynomial in 풫푚−1(1) by 푄̂︀(푥) = 2 , where 푇푚−1(·) is the 푇푚−1( 훽 −1) Chebyshev polynomial of the first kind. The Chebyshev polynomials 푇푚−1(·) are bounded by one in magnitude on the interval [−1, 1], and this bound is tight at the endpoints. Therefore, our maximum is achieved at 푥 = 0, and

⃒ (︂ )︂⃒−1/2 1/2 2푞−2 ⃒ 2 ⃒ max |푄̂︀(푥)| (1 − 푥) = ⃒푇푚−1 − 1 ⃒ . 푥∈[0,훽] ⃒ 훽 ⃒

(︁ √ 푚−1 √ 푚−1)︁ (︀ 2 )︀ (︀ 2 )︀ By the definition 푇푚−1(푥) = 1/2 푥 − 푥 − 1 + 푥 + 푥 − 1 , |푥| ≥ 1,

173 2푥 1+푥 and the standard inequality 푒 ≤ 1−푥 , 푥 ∈ [0, 1],

⎛ √︃ ⎞푚−1 √ (︂ 2 )︂ 1 2 (︂ 2 )︂2 1 (︂1 + 1 − 훽 )︂푚−1 푇푚−1 − 1 ≥ ⎝ − 1 + − 1 − 1⎠ = √ 훽 2 훽 훽 2 1 − 1 − 훽 1 {︁ }︁ ≥ exp 2√︀1 − 훽 (푚 − 1) . 2

In addition,

∫︁ (︂∑︀푛 )︂1/4 푖=2 푦푖 Γ(푛/2 − 1/4) Γ(1/4) Γ(1/4) 1/4 푓푌 (푦) 푑푦 = ≤ 1/4 푛 , [0,∞)푛 푦1 Γ(푛/2 − 1/2)Γ(1/2) 2 Γ(1/2) which gives us

1 푝 [︃(︃ (푚) )︃ ]︃ 푝 [︂ 1/4 ]︂1/푝 휆1 − 휆1 2 Γ(1/4) 1/4 −훾(푚−1)/푝 2 E ≤ 푛 푒 + 훾 , 휆1 − 휆푛 Γ(1/2)

√ 푝 (︀ 1/4푝 2)︀ where 훾 = 1 − 훽. Setting 훾 = 푚−1 ln 푛 (푚 − 1) (assuming 훾 < 1, otherwise our bound is already greater than one, and trivially holds), we obtain

(︁ 1/4 )︁1/푝 [︃(︃ (푚) )︃푝]︃1/푝 2 Γ(1/4) + 푝2 ln2(푛1/4푝(푚 − 1)2) 휆1 − 휆1 Γ(1/2) E ≤ 2 휆1 − 휆푛 (푚 − 1) 1/푝 (︁ 21/4Γ(1/4) )︁ 1 2 8푝 Γ(1/2) + 16 ln (푛(푚 − 1) ) = (푚 − 1)2 ln2 (푛(푚 − 1)8푝) ≤ .068 , (푚 − 1)2

21/4Γ(1/4) for 푚 ≥ 10, 푛 ≥ 100. The constant of .068 is produced by upper bounding Γ(1/2) by a constant1 times ln2 [︀푛(푚 − 1)8푝]︀. This completes the proof.

∑︀푛 A similar proof, paired with probabilistic bounds on the quantity 푖=2 푌푖/푌1, where

1/4 1 2 Γ(1/4) 2 [︀ 8푝]︀ The best constant in this case is given by the ratio of Γ(1/2) to the minimum value of ln 푛(푚 − 1) over 푚 ≥ 10, 푛 ≥ 100.

174 2 푌1, ..., 푌푛 ∼ 휒1, gives a probabilistic upper estimate. Combining the lower bounds in Lemmas 20 and 21 with the upper bound of Lemma 22 completes the proof of Theorem 19.

5.4 Distribution Dependent Bounds

In this section, we consider improved estimates for matrices with certain specific properties.

First, we show that if a matrix 퐴 has a reasonable number of eigenvalues near 휆1(퐴), then we can produce an error estimate that depends only on the number of iterations 푚. The intuition is that if there are not too few eigenvalues near the maximum eigenvalue, then the initial vector has a large inner product with a vector with Rayleigh quotient close to the maximum eigenvalue. In particular, we suppose that the eigenvalues of our matrix 퐴 are such that, once scaled, there are at least 푛/(푚 − 1)훼 eigenvalues in the range [훽, 1], for a specific value of 훽 satisfying 1 − 훽 = 푂 (︀푚−2 ln2 푚)︀. For a large number of matrices for which the Lanczos method is used, this assumption holds true. Under this assumption, we prove the following error estimate.

Theorem 20. Let 퐴 ∈ 풮푛, 푚 ≥ 10, 푝 ≥ 1, 훼 > 0, and 푛 ≥ 푚(푚 − 1)훼. If

{︃ ⃒ (︂ )︂2}︃ ⃒ 휆1(퐴) − 휆푖(퐴) (2푝 + 훼/4) ln(푚 − 1) 푛 # 휆푖(퐴) ⃒ ≤ ≥ 훼 , ⃒ 휆1(퐴) − 휆푛(퐴) 푚 − 1 (푚 − 1)

then

푝 1 [︃(︃ (푚) )︃ ]︃ 푝 2 2 휆1(퐴) − 휆1 (퐴, 푏) (2푝 + 훼/4) ln (푚 − 1) 푛−1 E푏∼풰(푆 ) ≤ .077 2 . 휆1(퐴) − 휆푛(퐴) (푚 − 1)

In addition, if

{︃ ⃒ (︂ )︂2}︃ ⃒ 휆1(퐴) − 휆푖(퐴) (훼 + 2) ln(푚 − 1) 푛 # 휆푖(퐴) ⃒ ≤ ≥ 훼 , ⃒ 휆1(퐴) − 휆푛(퐴) 4(푚 − 1) (푚 − 1)

175 then

[︃ (푚) 2 2 ]︃ 휆1(퐴) − 휆1 (퐴, 푏) (훼 + 2) ln (푚 − 1) −푚 P ≤ .126 2 ≥ 1 − 푂(푒 ). 푏∼풰(푆푛−1) 휆1(퐴) − 휆푛(퐴) (푚 − 1)

Proof. We begin by bounding expected relative error. The main idea is to proceed as in

the proof of Lemma 22, but take advantage of the number of eigenvalues near 휆1. For

simplicity, let 휆1 = 1 and 휆푛 = 0. We consider eigenvalues in the ranges [0, 2훽 − 1) and [2훽 − 1, 1] separately, 1/2 < 훽 < 1, and then make use of the lower bound for the number of eigenvalues in [훽, 1]. From the proof of Lemma 22, we have

1 1 푝 푝 ⎡ ⎤ 푝 [︃(︃ (푚) )︃ ]︃ 푝 (︃∑︀ 2 2푞 )︃ 2푞 휆 − 휆 ∫︁ 푦푖푄 (휆푖)(1 − 휆푖) 1 1 ≤ min 푖:휆푖<2훽−1 푓 (푦) 푑푦 E ⎣ ∑︀푛 2 푌 ⎦ 휆1 − 휆푛 푄∈풫푚−1(1) [0,∞)푛 푖=1 푦푖푄 (휆푖)

1 푝 ⎡ ⎤ 푝 (︃∑︀ 2 2푞 )︃ 2푞 ∫︁ 푦푖푄 (휆푖)(1 − 휆푖) + 푖:휆푖≥2훽−1 푓 (푦) 푑푦 , ⎣ ∑︀푛 2 푌 ⎦ [0,∞)푛 푖=1 푦푖푄 (휆푖)

푞 where 푞 ∈ N, 2푝 < 2 ≤ 4푝, 푓푌 (푦) is given by (5.8), and 훽 ∈ (1/2, 1). The second term is at most 2(1 − 훽), and the integrand in the first term is bounded above by

푞 (︃∑︀ 2 2푞 )︃푝/2 (︃∑︀ 2 2푞 )︃1/4 푦푖푄 (휆푖)(1 − 휆푖) 푦푖푄 (휆푖)(1 − 휆푖) 푖:휆푖<2훽−1 ≤ 푖:휆푖<2훽−1 ∑︀푛 2 ∑︀푛 2 푖=1 푦푖푄 (휆푖) 푖=1 푦푖푄 (휆푖)

(︃∑︀ 2 2푞 )︃1/4 푦푖푄 (휆푖)(1 − 휆푖) ≤ 푖:휆푖<2훽−1 ∑︀ 푦 푄2(휆 ) 푖:휆푖≥훽 푖 푖 1/4 1/2 2푞−2 (︃∑︀ )︃ max |푄(푥)| (1 − 푥) 푦푖 ≤ 푥∈[0,2훽−1] 푖:휆푖<2훽−1 . 1/2 ∑︀ 푦 min푥∈[훽,1] |푄(푥)| 푖:휆푖≥훽 푖

√ (2푝+훼/4) ln(푚−1) Let 1 − 훽 = 푚−1 < 1/4 (if 훽 ≤ 1/2, then our estimate is a trivial one, and we are already done). By the condition of the theorem, there are at least 푛/(푚−1)훼 eigenvalues

176 in the interval [훽, 1]. Therefore,

(︃∑︀ )︃1/4 ∫︁ 푦푖 [︁ ]︁ [︁ ]︁ 푖:휆푖<2훽−1 ˆ 1/4 ˜ −1/4 푓푌 (푦) 푑푦 ≤ ^ 2 푌 ˜ 2 푌 ∑︀ E푌 ∼휒푛 E푌 ∼휒 훼 푛 푦 ⌈푛/(푚−1) ⌉ [0,∞) 푖:휆푖≥훽 푖 Γ(푛/2 + 1/4)Γ(⌈푛/(푚 − 1)훼⌉/2 − 1/4) = Γ(푛/2)Γ(⌈푛/(푚 − 1)훼⌉/2) Γ(푚(푚 − 1)훼/2 + 1/4)Γ(푚/2 − 1/4) ≤ Γ(푚(푚 − 1)훼/2)Γ(푚/2) ≤ 1.04 (푚 − 1)훼/4

for 푛 ≥ 푚(푚 − 1)훼 and 푚 ≥ 10. Replacing the minimizing polynomial by 푄̂︀(푥) = 2푥 푇푚−1( 2훽−1 −1) 2 , we obtain 푇푚−1( 2훽−1 −1)

⃒ ⃒ 1 ⃒ ⃒ 2 2푞−2 max푥∈[0,2훽−1] ⃒푄̂︀(푥)⃒ (1 − 푥) 1 1 √ √ = ≤ ≤ 2푒− 1−훽(푚−1). 1 1/2 (︁ 2훽 )︁ 1/2 (︁ 2 )︁ ⃒ ⃒ 2 푇 − 1 푇 − 1 ⃒ ⃒ 푚−1 2훽−1 푚−1 훽 min푥∈[훽,1] ⃒푄̂︀(푥)⃒

Combining our estimates results in the bound

1 [︃(︃ )︃푝]︃ (푚) 푝 √ 1/푝 √ 휆1 − 휆1 (︁ 훼/4)︁ − 1−훽(푚−1)/푝 E ≤ 1.04 2(푚 − 1) 푒 + 2(1 − 훽) 휆1 − 휆푛 √ (1.04 2)1/푝 (2푝 + 훼/4)2 ln2(푚 − 1) = + (푚 − 1)2 (푚 − 1)2 (2푝 + 훼/4)2 ln2(푚 − 1) ≤ .077 (푚 − 1)2

for 푚 ≥ 10, 푝 ≥ 1, and 훼 > 0. This completes the bound for expected relative error. We

177 √ (훼+2) ln(푚−1) now produce a probabilistic bound for relative error. Let 1 − 훽 = 4(푚−1) . We have

휆 − 휆(푚) ∑︀푛 푌 푄2(휆 )(1 − 휆 ) 1 1 = min 푖=2 푖 푖 푖 ∑︀푛 2 휆1 − 휆푛 푄∈풫푚−1(1) 푖=1 푌푖푄 (휆푖) 2 ∑︀ max 푄 (푥)(1 − 푥) 푌푖 ≤ min 푥∈[0,2훽−1] 푖:휆푖<2훽−1 + 2(1 − 훽) 2 ∑︀ 푄∈풫푚−1(1) min 푄 (푥) 푌 푥∈[훽,1] 푖:휆푖≥훽 푖 ∑︀ (︂ 2 )︂ 푌푖 ≤ 푇 −2 − 1 푖:휆푖<2훽−1 + 2(1 − 훽) 푚−1 훽 ∑︀ 푌 푖:휆푖≥훽 푖 ∑︀ 푌푖 ≤ 4 exp{−4√︀1 − 훽(푚 − 1)} 푖:휆푖<2훽−1 + 2(1 − 훽). ∑︀ 푌 푖:휆푖≥훽 푖

By Proposition 11,

[︃∑︀ ]︃ [︃ ]︃ [︃ ]︃ 푖:휆 <2훽−1 푌푖 ∑︁ ∑︁ 푛 푖 ≥ 4(푚 − 1)훼 ≤ 푌 ≥ 2푛 + 푌 ≤ P ∑︀ P 푖 P 푖 훼 푖:휆 ≥훽 푌푖 2(푚 − 1) 푖 푖:휆푖<2훽−1 푖:휆푖≥훽 √ 훼 ≤ (2/푒)푛/2 + (︀ 푒/2)︀푛/2(푚−1) = 푂(푒−푚).

Then, with probability 1 − 푂(푒−푚),

(푚) √ 2 2 휆1 − 휆1 훼 −4 1−훽(푚−1) 16 (훼 + 2) ln (푚 − 1) ≤ 16(푚 − 1) 푒 + 2(1 − 훽) = 2 + 2 . 휆1 − 휆푛 (푚 − 1) 8(푚 − 1)

The 16/(푚 − 1)2 term is dominated by the log term as 푚 increases, and, therefore, with probability 1 − 푂(푒−푚),

(푚) 2 2 휆1 − 휆1 (훼 + 2) ln (푚 − 1) ≤ .126 2 . 휆1 − 휆푛 (푚 − 1)

This completes the proof.

The above theorem shows that, for matrices whose distribution of eigenvalues is inde- pendent of 푛, we can obtain dimension-free estimates. For example, the above theorem holds for the matrix from Example 1, for 훼 = 2. When a matrix has eigenvalues known to converge to a limiting distribution as the

dimension increases, or a random matrix 푋푛 exhibits suitable convergence of its empirical

178 ∑︀푛 spectral distribution 퐿푋푛 := 1/푛 푖=1 훿휆푖(푋푛), improved estimates can be obtained by simply estimating the corresponding integral polynomial minimization problem. However, to do so, we first require a law of large numbers for weighted sums of independent identically distributed (i.i.d.) random variables. We recall the following result.

Proposition 14. ([25]) Let 푎1, 푎2, ... ∈ [푎, 푏] and 푋1, 푋2, ... be i.i.d. random variables, with 2 1 ∑︀푛 E[푋1] = 0 and E[푋1 ] < ∞. Then 푛 푖=1 푎푖푋푖 → 0 almost surely.

We present the following theorem regarding the error of random matrices that exhibit suitable convergence.

푛 Theorem 21. Let 푋푛 ∈ 풮 , Λ(푋푛) ∈ [푎, 푏], 푛 = 1, 2, ... be a sequence of random matrices,

such that 퐿푋푛 converges in probability to 휎(푥) 푑푥 in 퐿2([푎, 푏]), where 휎(푥) ∈ 퐶([푎, 푏]), 푎, 푏 ∈ supp(휎). Then, for all 푚 ∈ N and 휖 > 0,

(︂⃒ (푚) ⃒ )︂ ⃒휆1(푋푛) − 휆1 (푋푛, 푏) 푏 − 휉(푚)⃒ lim P ⃒ − ⃒ > 휖 = 0, 푛→∞ ⃒ 휆1(푋푛) − 휆푛(푋푛) 푏 − 푎 ⃒

where 휉(푚) is the largest zero of the 푚푡ℎ degree orthogonal polynomial of 휎(푥) in the interval [푎, 푏].

Proof. The main idea of the proof is to use Proposition 14 to control the behavior of 푌 , and

the convergence of 퐿푋푛 to 휎(푥) 푑푥 to show convergence to our integral minimization prob- ∑︀푚−1 푗 lem. We first write our polynomial 푃 ∈ 풫푚−1 as 푃 (푥) = 푗=0 푐푗푥 and our unnormalized error as

∑︀푛 2 (푚) 푖=2 푌푖푃 (휆푖)(휆1 − 휆푖) 휆1(푋푛) − 휆1 (푋푛, 푏) = min 푛 푃 ∈풫 ∑︀ 2 푚−1 푖=1 푌푖푃 (휆푖) 푃 ̸=0 ∑︀푚−1 ∑︀푛 푗1+푗2 푐푗 푐푗 푌푖휆 (휆1 − 휆푖) = min 푗1,푗2=0 1 2 푖=2 푖 , 푚 푚−1 푛 푗 +푗 푐∈R ∑︀ 푐 푐 ∑︀ 푌 휆 1 2 푐̸=0 푗1,푗2=0 푗1 푗2 푖=1 푖 푖 where 푌1, ..., 푌푛 are i.i.d. chi-square random variables with one degree of freedom each. The functions 푥푗, 푗 = 0, ..., 2푚 − 2, are bounded on [푎, 푏], and so, by Proposition 14, for

179 any 휖1, 휖2 > 0,

⃒ 푛 푛 ⃒ ⃒ 1 ∑︁ 푗 1 ∑︁ 푗 ⃒ ⃒ 푌푖휆 (휆1 − 휆푖) − 휆 (휆1 − 휆푖)⃒ < 휖1, 푗 = 0, ..., 2푚 − 2, ⃒푛 푖 푛 푖 ⃒ 푖=2 푖=2

and ⃒ 푛 푛 ⃒ ⃒ 1 ∑︁ 푗 1 ∑︁ 푗⃒ ⃒ 푌푖휆 − 휆 ⃒ < 휖2, 푗 = 0, ..., 2푚 − 2, ⃒푛 푖 푛 푖 ⃒ 푖=1 푖=1 with probability 1−표(1). 퐿푋푛 converges in probability to 휎(푥) 푑푥, and so, for any 휖3, 휖4 > 0,

⃒ 푛 ∫︁ 푏 ⃒ ⃒ 1 ∑︁ 푗 푗 ⃒ ⃒ 휆 (휆1 − 휆푖) − 푥 (푏 − 푥)휎(푥) 푑푥⃒ < 휖3, 푗 = 0, ..., 2푚 − 2, ⃒푛 푖 ⃒ 푖=2 푎

and ⃒ 푛 ∫︁ 푏 ⃒ ⃒ 1 ∑︁ 푗 푗 ⃒ ⃒ 휆 − 푥 휎(푥) 푑푥⃒ < 휖4, 푗 = 0, ..., 2푚 − 2, ⃒푛 푖 ⃒ 푖=1 푎 with probability 1 − 표(1). This implies that

[︁ 푏 ]︁ ∑︀푚−1 ∫︀ 푗1+푗2 ˆ 푐푗 푐푗 푥 (푏 − 푥)휎(푥) 푑푥 + 퐸(푗1, 푗2) (푚) 푗1,푗2=0 1 2 푎 휆1(푋푛) − 휆1 (푋푛, 푏) = min , 푐∈ 푚 푚−1 [︁∫︀ 푏 ]︁ R ∑︀ 푐 푐 푥푗1+푗2 휎(푥) 푑푥 + 퐸(푗 , 푗 ) 푐̸=0 푗1,푗2=0 푗1 푗2 푎 1 2

ˆ where |퐸(푗1, 푗2)| < 휖1 + 휖3 and |퐸(푗1, 푗2)| < 휖2 + 휖4, 푗1, 푗2 = 0, ..., 푚 − 1, with probability 1 − 표(1).

By the linearity of integration, the minimization problem

∫︀ 푏 푃 2(푥) (︀ 푏−푥 )︀ 휎(푥) 푑푥 min 푎 푏−푎 , ∫︀ 푏 푃 ∈풫푚−1 푃 2(푥)휎(푥) 푑푥 푃 ̸=0 푎 when rewritten in terms of the polynomial coefficients 푐푖, 푖 = 0, ..., 푚 − 1, corresponds 푐푇 퐴푐 푚 to a generalized Rayleigh quotient 푐푇 퐵푐 , where 퐴, 퐵 ∈ 풮++ and 휆푚푎푥(퐴), 휆푚푎푥(퐵), 푇 and 휆푚푖푛(퐵) are all constants independent of 푛, and 푐 = (푐0, ..., 푐푚−1) . By choosing

180 휖1, 휖2, 휖3, 휖4 sufficiently small,

⃒ ⃒ ⃒ ⃒ ⃒ 푐푇 (퐴 + 퐸ˆ)푐 푐푇 퐴푐 ⃒ ⃒(푐푇 퐸푐ˆ )푐푇 퐵푐 − (푐푇 퐸푐)푐푇 퐴푐⃒ ⃒ − ⃒ = ⃒ ⃒ ⃒ 푇 푇 ⃒ ⃒ 푇 푇 푇 ⃒ ⃒푐 (퐵 + 퐸)푐 푐 퐵푐⃒ ⃒ (푐 퐵푐 + 푐 퐸푐)푐 퐵푐 ⃒ (휖 + 휖 )푚휆 (퐵) + (휖 + 휖 )푚휆 (퐴) ≤ 1 3 푚푎푥 2 4 푚푎푥 ≤ 휖 (휆푚푖푛(퐵) − (휖2 + 휖4)푚)휆푚푖푛(퐵)

for all 푐 ∈ R푚 with probability 1 − 표(1). Applying Proposition 10 to the above integral minimization problem completes the proof.

The above theorem is a powerful tool for explicitly computing the error in the Lanczos method for certain types of matrices, as the computation of the extremal eigenvalue of an 푚 × 푚 matrix (corresponding to the largest zero) is a nominal computation compared to one application of an 푛 dimensional matrix. In Section 6, we will see that this convergence occurs quickly in practice. In addition, the above result provides strong evidence that the inverse quadratic dependence on 푚 in the error estimates throughout this chapter is not so much a worst case estimate, but actually indicative of error rates in practice. For instance, if the eigenvalues of a matrix are sampled from a distribution bounded above and below by some multiple of a Jacobi weight function, then, by equation (iv) Proposition 13 and Theorem 21, it immediately follows that the error is of order 푚−2. Of course, we note that

Theorem 21 is equally applicable for estimating 휆푛.

5.5 Estimates for Arbitrary Eigenvalues and Condition Num- ber

Up to this point, we have concerned ourselves almost exclusively with the extremal eigen- values 휆1 and 휆푛 of a matrix. In this section, we extend the techniques of this chapter to arbitrary eigenvalues, and also obtain bounds for the condition number of a positive definite matrix. The results of this section provide the first uniform error estimates for arbitrary eigenvalues and the first lower bounds for the relative error with respect to the condition number. Lower bounds for arbitrary eigenvalues follow relatively quickly from our previous

181 work. However, our proof technique for upper bounds requires some mild assumptions regarding the eigenvalue gaps of the matrix. We begin with asymptotic lower bounds for an arbitrary eigenvalue, and present the following corollary of Theorem 19.

Corollary 3.

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) max 푖 푖 ≥ 1 − 표(1) ≥ 1 − 표(1/푛) 푛 P 퐴∈풮 푏∼풰(푆푛−1) 휆1(퐴) − 휆푛(퐴)

for 푚 = 표(ln 푛),

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) .015 ln2 푛 max 푖 푖 ≥ ≥ 1 − 표(1/푛) 푛 P푛−1 2 2 퐴∈풮 푏∼풰(푆 ) 휆1(퐴) − 휆푛(퐴) 푚 ln ln 푛

for 푚 = Θ(ln 푛), and

[︃ ]︃ 휆 (퐴) − 휆(푚)(퐴, 푏) 1.08 max 푖 푖 ≥ ≥ 1 − 표(1/푛) 푛 P 2 퐴∈풮 푏∼풰(푆푛−1) 휆1(퐴) − 휆푛(퐴) 푚

(︁ )︁ for 푚 = 표 푛1/2 ln−1/2 푛 and 휔(1).

Proof. Let 휙1, ..., 휙푛 be orthonormal eigenvectors corresponding to 휆1(퐴) ≥ ... ≥ 휆푛(퐴), ˆ ∑︀푖−1 푇 and 푏 = 푏 − 푗=1(휙푗 푏)푏. By the inequalities

푇 푇 (푚) 푥 퐴푥 푥 퐴푥 (푚)(︀ ˆ)︀ 휆푖 (퐴, 푏) ≤ max ≤ max = 휆1 퐴, 푏 , 푥∈풦 (퐴,푏)∖0 푇 ^ 푇 푚 푥 푥 푥∈풦푚(퐴,푏)∖0 푥 푥 푇 푥 휙푗 =0, 푗=1,...,푖−1 the relative error 휆 (퐴) − 휆(푚)(퐴, 푏) 휆 (퐴) − 휆(푚)(퐴, ˆ푏) 푖 푖 ≥ 푖 1 . 휆1(퐴) − 휆푛(퐴) 휆1(퐴) − 휆푛(퐴) The right-hand side corresponds to an extremal eigenvalue problem of dimension 푛 − 푖 + 1. Setting the largest eigenvalue of 퐴 to have multiplicity 푖, and defining the eigenvalues

휆푖, ..., 휆푛 based on the eigenvalues (corresponding to dimension 푛 − 푖 + 1) used in the proofs of Lemmas 20 and 21 completes the proof.

182 Next, we provide an upper bound for the relative error in approximating 휆푖 under the

assumption of non-zero gaps between eigenvalues 휆1, ..., 휆푖.

Theorem 22. Let 푛 ≥ 100, 푚 ≥ 9 + 푖, 푝 ≥ 1, and 퐴 ∈ 풮푛. Then

푝 1/푝 [︃(︃ (푚) )︃ ]︃ 2 (︀ −2(푖−1) 8푝)︀ 휆푖(퐴) − 휆푖 (퐴, 푏) ln 훿 푛(푚 − 푖) 푛−1 E푏∼풰(푆 ) ≤ .068 2 휆푖(퐴) − 휆푛(퐴) (푚 − 푖)

and

[︃ (푚) 2 (︀ −2(푖−1)/3 2/3)︀]︃ 휆푖(퐴) − 휆푖 (퐴, 푏) ln 훿 푛(푚 − 푖) P ≤ .571 2 ≥ 1 − 표(1/푛), 푏∼풰(푆푛−1) 휆푖(퐴) − 휆푛(퐴) (푚 − 푖)

where 1 휆 (퐴) − 휆 (퐴) 훿 = min 푘−1 푘 . 2 푘=2,...,푖 휆1(퐴) − 휆푛(퐴)

Proof. As in previous cases, it suffices to prove the theorem for matrices 퐴 with 휆푖(퐴) = 1

and 휆푛(퐴) = 0 (if 휆푖 = 휆푛, we are done). Similar to the polynomial representation of (푚) (푚) 휆1 (퐴, 푏), the Ritz value 휆푖 (퐴, 푏) corresponds to finding the polynomial in 풫푚−1 with (푚) zeros 휆푘 , 푘 = 1, ..., 푖 − 1, that maximizes the corresponding Rayleigh quotient. For the ∏︀푖−1 (푚) 2 (푚) sake of brevity, let 휑푖(푥) = 푘=1(휆푘 − 푥) . Then 휆푖 (퐴, 푏) can be written as

∑︀푛 2 2 (푚) 푗=1 푏푗 푃 (휆푗)휑푖(휆푗)휆푗 휆푖 (퐴, 푏) = max 푛 , 푃 ∈풫 ∑︀ 2 2 푚−푖 푗=1 푏푗 푃 (휆푗)휑푖(휆푗)

and, therefore, the error is bounded above by

∑︀푛 2 2 (푚) 푗=푖+1 푏푗 푃 (휆푗)휑푖(휆푗)(1 − 휆푗) 휆푖(퐴) − 휆푖 (퐴, 푏) ≤ max 푛 . 푃 ∈풫 ∑︀ 2 2 푚−푖 푗=1 푏푗 푃 (휆푗)휑푖(휆푗)

The main idea of the proof is very similar to that of Lemma 22, paired with a pigeonhole

principle. The intervals [휆푗(퐴) − 훿휆1, 휆푗(퐴) + 훿휆1], 푗 = 1, ..., 푖, are disjoint, and so there exists some index 푗* such that the corresponding interval does not contain any of the Ritz (푚) values 휆푘 (퐴, 푏), 푘 = 1, ..., 푖 − 1. We begin by bounding expected relative error. As in the proof of Lemma 22, by integrating over chi-square random variables and using

183 Cauchy-Schwarz, we have

1 푝 ⎡ ⎤ 푝 1 ∫︁ (︃∑︀푛 2 2푞 )︃ 2푞 [︁(︀ (푚))︀푝]︁ 푝 푗=푖+1 푦푗푃 (휆푗)휑푖(휆푗)(1 − 휆푗) 휆푖 − 휆푖 ≤ min ⎣ 푛 푓푌 (푦) 푑푦⎦ E 푃 ∈풫 ∑︀ 2 푚−푖 [0,∞)푛 푗=1 푦푗푃 (휆푗)휑푖(휆푗)

1 푝 ⎡ 푞 ⎤ 푝 ∫︁ (︃∑︀ 푦 푃 2(휆 )휑 (휆 )(1 − 휆 )2 )︃ 2푞 푗:휆푗 <훽 푗 푗 푖 푗 푗 ≤ min ⎣ 푛 푓푌 (푦) 푑푦⎦ 푃 ∈풫 ∑︀ 2 푚−푖 [0,∞)푛 푗=1 푦푗푃 (휆푗)휑푖(휆푗)

1 푝 ⎡ 푞 ⎤ 푝 (︃∑︀ 2 2 )︃ 2푞 ∫︁ 푦푗푃 (휆푗)휑푖(휆푗)(1 − 휆푗) + 푗:휆푗 ∈[훽,1] 푓 (푦) 푑푦 ⎣ ∑︀푛 2 푌 ⎦ [0,∞)푛 푗=1 푦푗푃 (휆푗)휑푖(휆푗)

for 푞 ∈ N, 2푝 < 2푞 ≤ 4푝, and any 훽 ∈ (0, 1). The second term on the right-hand side is bounded above by 1 − 훽 (the integrand is at most max휆∈[훽,1](1 − 휆) = 1 − 훽), and the integrand in the first term is bounded above by

푝 1 푞 푞 (︃∑︀ 2 2 )︃ 2푞 (︃∑︀ 2 2 )︃ 4 푦푗푃 (휆푗)휑푖(휆푗)(1 − 휆푗) 푦푗푃 (휆푗)휑푖(휆푗)(1 − 휆푗) 푗:휆푗 <훽 ≤ 푗:휆푗 <훽 ∑︀푛 2 ∑︀푛 2 푗=1 푦푗푃 (휆푗)휑푖(휆푗) 푗=1 푦푗푃 (휆푗)휑푖(휆푗) 1 1 1 푞−2 4 2 ∑︀ 4 |푃 (푥)| 2 휑 (푥)(1 − 푥) (︂ 푦푖 )︂ ≤ max 푖 푖:휆푖<훽 . 1 1 푥∈[0,훽] 4 푦푗* |푃 (휆푗* )| 2 휑푖 (휆푗* )

By replacing the minimizing polynomial in $\mathcal{P}_{m-i}$ by $T_{m-i}\big(\tfrac{2}{\beta}x - 1\big)$, the maximum is achieved at $x = 0$, and, by the monotonicity of $T_{m-i}$ on $[1,\infty)$,

\[
T_{m-i}\!\left(\frac{2}{\beta}\lambda_{j^*} - 1\right) \ge T_{m-i}\!\left(\frac{2}{\beta} - 1\right) \ge \frac{1}{2}\exp\!\big\{2\sqrt{1-\beta}\,(m-i)\big\}.
\]

In addition,

\[
\frac{\varphi_i^{1/4}(0)}{\varphi_i^{1/4}(\lambda_{j^*})} = \prod_{k=1}^{i-1}\left|\frac{\lambda_k^{(m)}}{\lambda_k^{(m)} - \lambda_{j^*}}\right|^{1/2} \le \delta^{-(i-1)/2}.
\]

We can now bound the $p$-norm by

\[
\mathbb{E}\Big[\big(\lambda_i - \lambda_i^{(m)}\big)^p\Big]^{1/p} \le \left[\frac{2^{1/4}\,\Gamma(1/4)}{\Gamma(1/2)}\;\frac{n^{1/4}}{\delta^{(i-1)/2}}\; e^{-\gamma(m-i)}\right]^{1/p} + \gamma^2,
\]

where $\gamma = \sqrt{1-\beta}$. Setting $\gamma = \frac{p}{m-i}\ln\!\big(\delta^{-(i-1)/2p}\, n^{1/4p}\,(m-i)^2\big)$ (assuming $\gamma < 1$; otherwise our bound is already greater than one and trivially holds), we obtain

\[
\mathbb{E}\Big[\big(\lambda_i - \lambda_i^{(m)}\big)^p\Big]^{1/p} \le \frac{\Big(\frac{2^{1/4}\Gamma(1/4)}{\Gamma(1/2)}\Big)^{1/p} + \frac{1}{16}\ln^2\!\big(\delta^{-2(i-1)}\, n\,(m-i)^{8p}\big)}{(m-i)^2} \le .068\, \frac{\ln^2\!\big(\delta^{-2(i-1)}\, n\,(m-i)^{8p}\big)}{(m-i)^2}
\]

for $m \ge 9 + i$, $n \ge 100$.
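For the reader's convenience, here is a quick check of the final constant (an added verification, not text from the original): with $C = 2^{1/4}\Gamma(1/4)/\Gamma(1/2) \approx 2.433$, the conditions $\delta \le 1/2$, $n \ge 100$, $m - i \ge 9$, and $p \ge 1$ give $\ln\big(\delta^{-2(i-1)}\, n\, (m-i)^{8p}\big) \ge \ln(100 \cdot 9^8) > 22.1$, so that $C^{1/p} \le C \le .005\,\ln^2\big(\delta^{-2(i-1)}\, n\, (m-i)^{8p}\big)$, and $.005 + 1/16 \le .068$ yields the stated bound.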

This completes the proof of the expected error estimate. We now focus on the probabilistic estimate. We have

\[
\lambda_i(A) - \lambda_i^{(m)}(A,b) \le \min_{P\in\mathcal{P}_{m-i}} \frac{\sum_{j=i+1}^n Y_j\, P^2(\lambda_j)\,\varphi_i(\lambda_j)\,(1-\lambda_j)}{\sum_{j=1}^n Y_j\, P^2(\lambda_j)\,\varphi_i(\lambda_j)}
\le \min_{P\in\mathcal{P}_{m-i}}\, \max_{x\in[0,\beta]} \frac{P^2(x)\,\varphi_i(x)\,(1-x)}{P^2(\lambda_{j^*})\,\varphi_i(\lambda_{j^*})}\;\frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} + (1-\beta)
\]

\[
\le \delta^{-2(i-1)}\; T_{m-i}^{-2}\!\left(\frac{2}{\beta} - 1\right) \frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} + (1-\beta)
\le 4\,\delta^{-2(i-1)} \exp\!\big\{-4\sqrt{1-\beta}\,(m-i)\big\}\; \frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} + (1-\beta).
\]

By Proposition 11,

\[
\mathbb{P}\!\left[\frac{\sum_{j:\lambda_j<\beta} Y_j}{Y_{j^*}} \ge n^{3.02}\right] \le \mathbb{P}\big[Y_{j^*} \le n^{-2.01}\big] + \mathbb{P}\Big[\sum_{j\ne j^*} Y_j \ge n^{1.01}\Big] \le \big(e/n^{2.01}\big)^{1/2} + \big(n^{.01}\, e^{1-n^{.01}}\big)^{(n-1)/2} = o(1/n).
\]

Let $\sqrt{1-\beta} = \dfrac{\ln\big(\delta^{-2(i-1)}\, n^{3.02}\,(m-i)^2\big)}{4(m-i)}$. Then, with probability $1 - o(1/n)$,

\[
\lambda_i(A) - \lambda_i^{(m)}(A,b) \le \frac{4}{(m-i)^2} + \frac{\ln^2\!\big(\delta^{-2(i-1)}\, n^{3.02}\,(m-i)^2\big)}{16(m-i)^2}.
\]

The $4/(m-i)^2$ term is dominated by the logarithmic term as $n$ increases, and, therefore, with probability $1 - o(1/n)$,

\[
\lambda_i(A) - \lambda_i^{(m)}(A,b) \le .571\, \frac{\ln^2\!\big(\delta^{-2(i-1)/3}\, n\,(m-i)^{2/3}\big)}{(m-i)^2}.
\]

This completes the proof.

For typical matrices with no repeated eigenvalues, $\delta$ is usually bounded below by an inverse polynomial in $n$ of very low degree, and, for $i$ constant, the estimates for $\lambda_i$ are not much worse than those for $\lambda_1$. In addition, it seems likely that, given any matrix $A$, a small random perturbation of $A$ before the application of the Lanczos method will satisfy the condition of the theorem with high probability, while changing the eigenvalues by a negligible amount. Of course, the bounds from Theorem 22 for the maximal eigenvalues $\lambda_i$ also apply to the minimal eigenvalues $\lambda_{n-i}$.
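To illustrate the perturbation heuristic (a sketch of the suggestion above, not a procedure from the thesis), the snippet below symmetrizes a tiny Gaussian perturbation and reports the gap parameter $\delta$ of Theorem 22 before and after; `gap_delta` is an illustrative helper name.

```python
import numpy as np

def gap_delta(eigs, i):
    """delta from Theorem 22: half the smallest normalized gap among the top i eigenvalues."""
    lam = np.sort(eigs)[::-1]
    gaps = lam[:i - 1] - lam[1:i]                 # lambda_{k-1} - lambda_k, k = 2, ..., i
    return 0.5 * gaps.min() / (lam[0] - lam[-1])

rng = np.random.default_rng(1)
n, i, eps = 500, 4, 1e-8

A = np.diag(np.repeat([3.0, 2.0, 1.0], [2, 2, n - 4]))   # repeated top eigenvalues
E = rng.standard_normal((n, n))
A_pert = A + eps * (E + E.T) / 2                          # tiny symmetric perturbation

print(gap_delta(np.linalg.eigvalsh(A), i))        # 0.0: Theorem 22 is vacuous here
print(gap_delta(np.linalg.eigvalsh(A_pert), i))   # positive, while eigenvalues move O(eps)
```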

Next, we make use of Theorem 19 to produce error estimates for the condition number of a symmetric positive definite matrix. Let $\kappa(A) = \lambda_1(A)/\lambda_n(A)$ denote the condition number of a matrix $A \in \mathcal{S}^n_{++}$, and let $\kappa^{(m)}(A,b) = \lambda_1^{(m)}(A,b)/\lambda_m^{(m)}(A,b)$ denote the condition number of the tridiagonal matrix $T_m \in \mathcal{S}^m_{++}$ resulting from $m$ iterations of the Lanczos method applied to a matrix $A \in \mathcal{S}^n_{++}$ with initial vector $b$. Their difference can be written as

\[
\kappa(A) - \kappa^{(m)}(A,b) = \frac{\lambda_1}{\lambda_n} - \frac{\lambda_1^{(m)}}{\lambda_m^{(m)}} = \frac{\lambda_1\lambda_m^{(m)} - \lambda_n\lambda_1^{(m)}}{\lambda_n\lambda_m^{(m)}} = \frac{\lambda_m^{(m)}\big(\lambda_1 - \lambda_1^{(m)}\big) + \lambda_1^{(m)}\big(\lambda_m^{(m)} - \lambda_n\big)}{\lambda_n\lambda_m^{(m)}}
= \big(\kappa(A) - 1\big)\left[\frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} + \kappa^{(m)}(A,b)\,\frac{\lambda_m^{(m)} - \lambda_n}{\lambda_1 - \lambda_n}\right],
\]

which leads to the bounds

\[
\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa(A)} \le \frac{\lambda_1 - \lambda_1^{(m)}}{\lambda_1 - \lambda_n} + \kappa(A)\,\frac{\lambda_m^{(m)} - \lambda_n}{\lambda_1 - \lambda_n}
\]

and

\[
\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge \big(\kappa(A) - 1\big)\,\frac{\lambda_m^{(m)} - \lambda_n}{\lambda_1 - \lambda_n}.
\]

Using Theorem 19 and Minkowski's inequality, we have the following corollary.

Corollary 4.

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar\kappa}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\!\left[\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge \big(1 - o(1)\big)(\bar\kappa - 1)\right] \ge 1 - o(1/n)
\]

for $m = o(\ln n)$,

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar\kappa}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\!\left[\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge .015\, \frac{(\bar\kappa - 1)\ln^2 n}{m^2\ln^2\ln n}\right] \ge 1 - o(1/n)
\]

for $m = \Theta(\ln n)$,

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar\kappa}} \mathbb{P}_{b\sim\mathcal{U}(S^{n-1})}\!\left[\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa^{(m)}(A,b)} \ge 1.08\, \frac{\bar\kappa - 1}{m^2}\right] \ge 1 - o(1/n)
\]

for $m = o\big(n^{1/2}\ln^{-1/2} n\big)$ and $\omega(1)$, and

\[
\sup_{\substack{A \in \mathcal{S}^n_{++} \\ \kappa(A) = \bar\kappa}} \mathbb{E}_{b\sim\mathcal{U}(S^{n-1})}\!\left[\left(\frac{\kappa(A) - \kappa^{(m)}(A,b)}{\kappa(A)}\right)^{\!p}\,\right]^{1/p} \le .068\,(\bar\kappa + 1)\, \frac{\ln^2\!\big(n\,(m-1)^{8p}\big)}{(m-1)^2}
\]

for $n \ge 100$, $m \ge 10$, $p \ge 1$.

As previously mentioned, it is not possible to produce uniform bounds for the relative error of $\kappa^{(m)}(A,b)$, and so some dependence on $\kappa(A)$ is necessary.
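As an illustration of the quantities in Corollary 4 (a sketch under the stated definitions, not code from the thesis), the following runs a textbook Lanczos three-term recurrence with full reorthogonalization and compares $\kappa^{(m)}(A,b)$ to $\kappa(A)$; `lanczos_tridiag` is an illustrative helper, not an optimized implementation.

```python
import numpy as np

def lanczos_tridiag(A, b, m):
    """m steps of Lanczos with full reorthogonalization; returns diagonals of T_m."""
    n = len(b)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return alpha, beta

rng = np.random.default_rng(2)
n, m = 1000, 30
eigs = np.linspace(1.0, 100.0, n)                  # kappa(A) = 100 by construction
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (V * eigs) @ V.T                               # A = V diag(eigs) V^T
b = rng.standard_normal(n)

alpha, beta = lanczos_tridiag(A, b, m)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
ritz = np.linalg.eigvalsh(T)
print(ritz[-1] / ritz[0])                          # kappa^{(m)}(A, b), approaches 100 as m grows
```

Consistent with the lower bound above, the printed ratio always underestimates $\kappa(A)$, since $\lambda_1^{(m)} \le \lambda_1$ and $\lambda_m^{(m)} \ge \lambda_n$.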

5.6 Experimental Results

In this section, we present a number of experimental results that illustrate the error of the Lanczos method in practice. We consider:

• eigenvalues of the one-dimensional Laplacian with Dirichlet boundary conditions, $\Lambda_{lap} = \{2 + 2\cos(i\pi/(n+1))\}_{i=1}^n$,

• a uniform partition of $[0,1]$, $\Lambda_{unif} = \{(n-i)/(n-1)\}_{i=1}^n$,

• eigenvalues from the semi-circle distribution, $\Lambda_{semi} = \{\lambda_i\}_{i=1}^n$, where $1/2 + \big(\lambda_i\sqrt{1 - \lambda_i^2} + \arcsin \lambda_i\big)/\pi = (n-i)/(n-1)$, $i = 1, \dots, n$ (computed numerically; see the sketch after this list),

• eigenvalues corresponding to Lemma 21, $\Lambda_{log} = \big\{1 - [(n+1-i)/n]^{\ln\ln n/\ln n}\big\}_{i=1}^n$.
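For reference, the four spectra can be generated as follows (a sketch assuming the definitions in the list above; `test_spectra` is an illustrative name, and the semicircle values are obtained by root-finding on the distribution function, which is unoptimized for large $n$).

```python
import numpy as np
from scipy.optimize import brentq

def test_spectra(n):
    """The four test spectra, each returned in descending order of index i."""
    i = np.arange(1, n + 1)
    lap = 2 + 2 * np.cos(i * np.pi / (n + 1))
    unif = (n - i) / (n - 1)
    # semicircle: solve 1/2 + (x*sqrt(1-x^2) + arcsin(x))/pi = t for t = (n-i)/(n-1)
    F = lambda x, t: 0.5 + (x * np.sqrt(1 - x * x) + np.arcsin(x)) / np.pi - t
    semi = np.empty(n)
    for k in range(n):
        t = unif[k]
        semi[k] = 1.0 if t >= 1 else -1.0 if t <= 0 else brentq(F, -1.0, 1.0, args=(t,))
    # exponent (ln ln n)/(ln n), as reconstructed from the definition in the text
    log = 1 - ((n + 1 - i) / n) ** (np.log(np.log(n)) / np.log(n))
    return lap, unif, semi, log
```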

For each one of these spectra, we perform tests for dimensions $n = 10^i$, $i = 5, 6, 7, 8$. For each dimension, we generate 100 random vectors $b \sim \mathcal{U}(S^{n-1})$ and perform $m = 100$ iterations of the Lanczos method on each vector. In Figure 5-1, we report the results of these experiments. In particular, for each spectrum, we plot the empirical average relative error $(\lambda_1 - \lambda_1^{(m)})/(\lambda_1 - \lambda_n)$ for each dimension as $m$ varies from 1 to 100. In addition, for $n = 10^8$, we present a box plot for each spectrum, illustrating the variability in relative error that each spectrum possesses.
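A minimal version of this experiment can be run with the same three-term recurrence as above (a sketch, not the thesis's actual scripts). Since the Lanczos error depends only on the spectrum and on the components of $b$ in the eigenbasis, a diagonal matrix suffices; only 5 trials are used here for speed, versus 100 in the thesis.

```python
import numpy as np

def lanczos_errors(eigs, b, m):
    """Relative error (lam_1 - lam_1^{(j)})/(lam_1 - lam_n) after j = 1..m steps,
    for A = diag(eigs)."""
    n = len(eigs)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    Q[:, 0] = b / np.linalg.norm(b)
    err = np.zeros(m)
    for j in range(m):
        w = eigs * Q[:, j]                         # matvec with diag(eigs)
        alpha[j] = Q[:, j] @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        T = np.diag(alpha[:j + 1]) + np.diag(beta[:j], 1) + np.diag(beta[:j], -1)
        err[j] = (eigs.max() - np.linalg.eigvalsh(T)[-1]) / (eigs.max() - eigs.min())
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return err

rng = np.random.default_rng(3)
n, m, trials = 10**5, 100, 5
i = np.arange(1, n + 1)
eigs = (n - i) / (n - 1)                           # Lambda_unif; swap in any spectrum
avg = np.mean([lanczos_errors(eigs, rng.standard_normal(n), m)
               for _ in range(trials)], axis=0)
print(avg * np.arange(1, m + 1) ** 2)              # scaled errors, cf. Figure 5-1
```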

In Figure 5-1,² the plots of $\Lambda_{lap}$, $\Lambda_{unif}$, and $\Lambda_{semi}$ all illustrate an empirical average error estimate that scales with $m^{-2}$ and has no dependence on the dimension $n$ (for $n$ sufficiently large). This is consistent with the theoretical results of the chapter, most notably Theorem 21, as all of these spectra exhibit suitable convergence of their empirical spectral distribution, and the related integral minimization problems all have solutions of order $m^{-2}$. In addition, the box plots corresponding to $n = 10^8$ illustrate that the relative error for a given iteration number has a relatively small variance. For instance, all extreme values remain within a range of length less than $.01\, m^{-2}$ for $\Lambda_{lap}$, $.1\, m^{-2}$ for $\Lambda_{unif}$, and $.4\, m^{-2}$ for $\Lambda_{semi}$. This is also consistent with the convergence of Theorem 21. The empirical spectral distributions of all three spectra converge to shifted and scaled versions of Jacobi weight functions: $\Lambda_{lap}$ corresponds to $\omega^{-1/2,-1/2}(x)$, $\Lambda_{unif}$ to $\omega^{0,0}(x)$, and $\Lambda_{semi}$ to $\omega^{1/2,1/2}(x)$. The limiting value of $m^2(1 - \xi(m))/2$ for each of these three cases is given by $j_{1,\alpha}^2/4$, where $j_{1,\alpha}$ is the first positive zero of the Bessel function $J_\alpha(x)$; namely, the scaled error converges to $\pi^2/16 \approx .617$ for $\Lambda_{lap}$, $\approx 1.45$ for $\Lambda_{unif}$, and $\pi^2/4 \approx 2.47$ for $\Lambda_{semi}$.

²In Figure 5-1, the box plots are labelled as follows. For a given $m$, the 25th and 75th percentiles of the values, denoted by $q_1$ and $q_3$, are the bottom and top of the corresponding box, and the red line in the box is the median. The whiskers extend to the most extreme points in the interval $[q_1 - 1.5(q_3 - q_1),\, q_3 + 1.5(q_3 - q_1)]$, and outliers not in this interval are marked by the '+' symbol.

[Figure 5-1 appears here, with panels (a)-(h): a plot and a box plot for each of $\Lambda_{lap}$, $\Lambda_{unif}$, $\Lambda_{semi}$, and $\Lambda_{log}$.]

Figure 5-1: Plot and box plot of relative error times $m^2$ vs. iteration number $m$ for $\Lambda_{lap}$, $\Lambda_{unif}$, $\Lambda_{semi}$, and $\Lambda_{log}$. The plot contains curves for each dimension $n$ tested; each curve represents the empirical average relative error for each value of $m$, averaged over 100 random initializations. The box plot illustrates the variability of relative error for $n = 10^8$.

Two additional properties suggested by Figure 5-1 are that the variance and the dimension $n$ required to observe asymptotic behavior both appear to increase with $\alpha$. This is consistent with the theoretical results, as Lemma 21 (and $\Lambda_{log}$) results from considering $\omega^{\alpha,0}(x)$ with $\alpha$ as a function of $n$.
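These limiting constants are easy to verify numerically (an added check, not part of the thesis): bracket the first sign change of $J_\alpha$ and solve for its first positive zero.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import jv

for alpha in (-0.5, 0.0, 0.5):                    # Jacobi parameters for lap, unif, semi
    x = np.linspace(0.1, 10, 1000)
    s = np.sign(jv(alpha, x))
    k = np.argmax(s[:-1] * s[1:] < 0)             # first sign change brackets j_{1,alpha}
    j1 = brentq(lambda t: jv(alpha, t), x[k], x[k + 1])
    print(alpha, j1 ** 2 / 4)
# prints roughly 0.617 (= pi^2/16), 1.45, and 2.47 (= pi^2/4)
```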

The plot of relative error for $\Lambda_{log}$ illustrates that the relative error does indeed depend on $n$ in a tangible way, increasing with $n$ in what appears to be a logarithmic fashion. For instance, when looking at the average relative error scaled by $m^2$, the maximum over all iteration numbers $m$ appears to increase roughly logarithmically ($\approx 4$ for $n = 10^5$, $\approx 5$ for $n = 10^6$, $\approx 6$ for $n = 10^7$, and $\approx 7$ for $n = 10^8$). In addition, the box plot for $n = 10^8$ illustrates that this spectrum exhibits a large degree of variability and is susceptible to extreme outliers. These numerical results support the theoretical lower bounds of Section 5.3, and illustrate that the asymptotic theoretical lower bounds which depend on $n$ do occur in practical computations.
