The Mathematical Work of Daniel Spielman

Michel X. Goemans and Jonathan A. Kelner

The Notices solicited the following article describing the work of Daniel Spielman, recipient of the 2010 Rolf Nevanlinna Prize. The International Mathematical Union also issued a news release about the prize, which appeared in the December 2010 issue of the Notices.

Michel X. Goemans is Leighton Family Professor of Mathematics at the Massachusetts Institute of Technology. His email address is [email protected]. Jonathan Kelner is the Kokusai Denshin Denwa Assistant Professor of Applied Mathematics at MIT and a member of the MIT Computer Science and Artificial Intelligence Laboratory. His email address is [email protected].

The Rolf Nevanlinna Prize is awarded once every four years at the International Congress of Mathematicians by the International Mathematical Union for outstanding contributions in mathematical aspects of information sciences. The 2010 recipient was Daniel Spielman, who was cited for "smoothed analysis of Linear Programming, algorithms for graph-based codes, and applications of graph theory to Numerical Computing".

In this article, we summarize some of Spielman's seminal contributions in these areas. Unfortunately, because of space constraints, we can barely scratch the surface and have to leave out many of his impressive results (and their interconnections) over the last two decades.

Error-Correcting Codes

Several of Spielman's important early contributions were in the design of error-correcting codes. Error-correcting codes are fundamental in our increasingly digital lives, and they are important mathematical tools in the theory of computing. In a code of block length n and rate r < 1 over the alphabet {0, 1}, a message in {0, 1}^{rn} is encoded into a codeword in {0, 1}^n. To be able to correct errors in the transmission of a codeword, one would like the minimum distance d between any two codewords to be large. A code is said to be asymptotically good if both the rate r and the relative distance d/n are bounded below by positive constants as n grows. The goal is to get not only asymptotically good codes with the best possible trade-off between the rate and the relative minimum distance but also codes for which both the encoding and the decoding can be done as fast as possible, ideally in linear time.

In his Ph.D. work, Spielman and his advisor Michael Sipser proposed the first asymptotically good codes that allow linear-time decoding (for a survey, see [6] and references therein). Their codes are low-density parity-check (LDPC) codes, introduced by Gallager half a century ago, in which a sparse bipartite graph links the bits of the codeword on one side to parity checks on the other side that constrain sets of bits to sum to 0 over GF(2); these are linear codes. The Sipser and Spielman codes use a bipartite expander graph as the parity-check graph. The expansion property not only implies a good bound on the minimum distance but also allows a simple decoding algorithm, in which one repeatedly flips codeword bits with more than half of their neighboring constraints violated. The Sipser and Spielman codes require quadratic time for encoding, as does any generic linear code. Subsequently, Spielman (see also [6]) provided a more intricate construction, still based on expanders, that yielded the first asymptotically good codes with linear-time encoding and decoding.

In later work, Spielman and collaborators designed highly practical LDPC codes for the erasure channel model, in which some of the bits are simply lost. Their codes [4] approach the maximum capacity possible according to Shannon's theory, and they can be encoded and decoded in linear time. The decoding is based on belief propagation, which sets a missing bit whenever it can be recovered unambiguously. Despite the simplicity of these algorithms, the design and analysis of these codes are mathematically quite involved. Some of this work has had considerable practical impact.
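The bit-flipping decoder used by Sipser and Spielman is simple enough to state in a few lines. The sketch below is ours and purely illustrative, not their implementation: the function name and data representation (each parity check given as a set of bit positions) are our own choices, and the expansion conditions that make the loop provably correct are assumed rather than verified.

    def flip_decode(bits, checks, max_rounds=1000):
        # Map each bit position to the checks that contain it.
        checks_of_bit = {}
        for c, check in enumerate(checks):
            for i in check:
                checks_of_bit.setdefault(i, []).append(c)
        for _ in range(max_rounds):
            # A parity check is violated if its bits sum to 1 over GF(2).
            violated = [sum(bits[i] for i in check) % 2 for check in checks]
            # Find a bit with more than half of its checks violated.
            flip = next((i for i, cs in checks_of_bit.items()
                         if 2 * sum(violated[c] for c in cs) > len(cs)), None)
            if flip is None:
                return bits   # no such bit: decoding is done (or stuck)
            bits[flip] ^= 1   # flip it and try again
        return bits

On a toy instance such as flip_decode([1, 0, 0], [{0, 1}, {1, 2}, {0, 2}]), a single flip of the first bit recovers the all-zeros codeword; when the parity-check graph is a sufficiently good expander, this flipping process provably corrects a constant fraction of adversarial errors.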

Smoothed Analysis of Algorithms

One of Daniel Spielman's most celebrated contributions is his introduction of a new notion for analyzing the efficiency of algorithms, the concept of smoothed analysis, and its powerful and highly technical illustration on the simplex algorithm for linear programming. This was done in joint work with Shang-Hua Teng, his longtime collaborator.

The traditional way to measure the efficiency of an algorithm for a computational problem (such as inverting a matrix, factoring a number, or computing a shortest path in a graph) is to measure its worst-case running time over all possible instances of a given input size. Efficient algorithms are considered to be those whose worst-case running times grow polynomially with the size of the input. But this worst-case, pessimistic measure does not always reflect the behavior of algorithms on typical instances that arise in practice.

A compelling example of this is the case of linear programming, in which one aims to maximize a linear function subject to linear constraints: max{⟨c, x⟩ : Ax ≤ b, x ∈ R^d} with the inputs c ∈ R^d, A ∈ R^{m×d}, and b ∈ R^m. This problem has numerous industrial applications. In the late 1940s, George Dantzig proposed the simplex algorithm for linear programming, which has been cited as one of the "top ten algorithms of the 20th century" (Dongarra and Sullivan). In the nondegenerate case, the simplex algorithm can be viewed as starting from a vertex of the polyhedron P = {x ∈ R^d : Ax ≤ b} and repeatedly moving to an adjacent vertex (connected by an edge, or face of dimension 1) while improving the value ⟨c, x⟩. However, no pivot rule (dictating which adjacent vertex is chosen next) is known for which the worst-case number of operations is polynomial in the size of the input. Even worse, for almost every known pivot rule, there exist instances for which the number of steps grows exponentially. Whether there exist polynomial-time algorithms for linear programming was a long-standing open question until the discovery of the ellipsoid algorithm (by Nemirovski and Shor, and Khachian) in the late 1970s and interior-point algorithms (first by Karmarkar) in the mid-1980s. Still, the simplex algorithm is the most often used algorithm for linear programming, as it performs extremely well on typical instances that arise in practice. This almost paradoxical disparity between its exponential worst-case behavior and its fast typical behavior called for an explanation. In the 1980s, researchers considered probabilistic analyses of the simplex method under various probabilistic models, but results in one model do not necessarily carry over to other probabilistic models and do not necessarily shed any light on instances that occur in practice.

Spielman and Teng [7] instead proposed smoothed analysis, a blend of worst-case and probabilistic analyses, which marvelously explains the typical behavior of the simplex algorithm. In smoothed analysis, one measures the maximum over all instances of a certain size of the expected running time of an algorithm under small random perturbations of the input. In the setting of the simplex algorithm for linear programming, Spielman and Teng consider any linear program given by c ∈ R^d, A ∈ R^{m×d}, and b ∈ R^m and randomly perturb A and b by independently adding to each entry a Gaussian random variable of variance σ² times the maximum entry of A and b. Using an intricate technical argument, they prove that the expected number of steps of the simplex algorithm with the so-called shadow-vertex pivot rule is polynomial in the dimensions of A and 1/σ. The shadow-vertex pivot rule essentially corresponds to walking along the projection of the polyhedron onto a 2-dimensional linear subspace containing c. One step of the proof is to bound the expected number of vertices along a random 2-dimensional projection by a polynomial in m, d, and 1/σ. For this purpose, they prove that the angle at a vertex of the projection is unlikely to be too flat, and this is formalized and proved by showing that a random Gaussian perturbation of any given matrix is sufficiently well conditioned, a fundamental notion in numerical linear algebra.
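One can get a feel for this definition with a small computational experiment. The sketch below is ours, not Spielman and Teng's: it uses SciPy's dual-simplex method as a stand-in for the shadow-vertex rule, and it simplifies the noise model to a standard deviation of σ times the largest entry in magnitude.

    import numpy as np
    from scipy.optimize import linprog

    def smoothed_simplex_iterations(c, A, b, sigma, trials=50, seed=0):
        """Average simplex iteration count for max <c, x> s.t. Ax <= b
        under Gaussian perturbations of A and b."""
        rng = np.random.default_rng(seed)
        scale = sigma * max(np.abs(A).max(), np.abs(b).max())
        counts = []
        for _ in range(trials):
            A_pert = A + rng.normal(0.0, scale, size=A.shape)
            b_pert = b + rng.normal(0.0, scale, size=b.shape)
            # linprog minimizes, so negate c; "highs-ds" is a dual simplex.
            res = linprog(-c, A_ub=A_pert, b_ub=b_pert,
                          bounds=(None, None), method="highs-ds")
            if res.status == 0:  # count only feasible, bounded solves
                counts.append(res.nit)
        return float(np.mean(counts)) if counts else float("nan")

As σ grows, the perturbation washes out any pathological structure in (A, b), and the average iteration count observed in such experiments tends to drop; Spielman and Teng's theorem makes this phenomenon quantitative for the shadow-vertex rule.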
Their work has motivated further developments, including better estimates on the condition number of randomly perturbed matrices, other probabilistic models for the perturbations, and smoothed analyses for other algorithms and problems (see the Smoothed Analysis page in [5]). Smoothed analysis also suggested the first randomized polynomial-time simplex algorithm for linear programming, by Daniel Spielman and the second author [3]. This could be a step toward finding a strongly polynomial-time algorithm for linear programming, one whose number of operations does not depend on the size of the numbers involved.

Fast Algorithms for Numerical Linear Algebra and for Graph Problems

In recent years, Spielman and his collaborators have ignited what appears to be an incipient revolution in the theory of graph algorithms. By developing and exploiting deep connections between graph theory and computational linear algebra, they have introduced a new set of tools that have the potential to transform both fields. This work has already resulted in faster algorithms for several fundamental problems in both linear algebra and graph theory.

Fast Algorithms for Laplacian Linear Systems

With any graph G, one can associate a matrix L_G known as the graph Laplacian. Much of Spielman's recent work has been in the realm of spectral graph theory, which studies the rich interplay between the combinatorial properties of G and the linear algebraic properties of L_G.

Motivated by scientific computing, Spielman's work in this field began with the search for a faster algorithm to solve linear systems of the form L_G x = b. While the best known algorithms for solving general n × n linear systems require time O(n^2.378), Spielman and Teng showed that this class of linear systems can be solved much more quickly. In a technical tour de force, they gave an algorithm [8] that approximately solves such linear systems to any given accuracy in an amount of time that is nearly linear in the number of edges of G. This allows one to solve sparse diagonally dominant linear systems, which are ubiquitous in scientific computing.
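To make the objects concrete, here is a toy construction of the Laplacian L_G = D − A of an unweighted graph, solved with off-the-shelf conjugate gradients. This sketch of ours only sets the stage: the substance of Spielman and Teng's algorithm is precisely the preconditioning machinery that makes such iterative solves provably fast on every input.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg

    def laplacian(n, edges):
        """Graph Laplacian L = D - A of an undirected graph on n vertices."""
        rows = [u for u, v in edges] + [v for u, v in edges]
        cols = [v for u, v in edges] + [u for u, v in edges]
        A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
        return sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

    # Solve L x = b on a 4-cycle; b must sum to zero since L is singular.
    L = laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
    b = np.array([1.0, 0.0, 0.0, -1.0])
    x, info = cg(L, b)   # info == 0 signals convergence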

However, its impact has extended far beyond solving linear systems. Their algorithm works by finding simpler graphs whose Laplacians are algebraically similar to L_G and using the solutions to linear systems involving them to speed up an iterative solver. To find these graphs, they introduce two algorithmic tools that have found broader applicability and led to substantial follow-up work: spectral sparsification and local clustering.

Spectral Sparsification

The goal of sparsification is to approximate a dense graph G by a sparser graph H. In a landmark paper, Benczur and Karger showed that any graph G can be approximated in nearly linear time by a weighted graph H with O(n log n/ε²) edges so that the weight of every cut in H is preserved up to a factor of at most 1 + ε. This allows one to approximately solve any problem whose solution depends only on the weights of cuts by operating on H instead of G. Since H has many fewer edges when G is dense, this is usually much faster.

For their solver, Spielman and Teng introduced the notion of spectral sparsification, which requires that L_G and L_H be approximately the same as quadratic forms. This is a strictly stronger requirement that implies that H approximately preserves the weights of cuts as well. They then showed how to compute such sparsifiers with O(n log^{O(1)} n/ε²) edges in nearly linear time [9].

In later work, Batson, Spielman, and Srivastava [1] have shown that one can eliminate the polylogarithmic factors and compute spectral sparsifiers with just O(n/ε²) edges. This was not previously known even for the weaker cut-based notion of sparsification. In addition to being a surprising result in graph theory, their techniques have led to a new, simpler proof of an important theorem by Bourgain and Tzafriri in functional analysis and a new approach to the long-open Kadison-Singer conjecture.
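A later construction, due to Spielman and Srivastava, makes the sampling idea behind spectral sparsification especially transparent: sample edges with probability proportional to their effective resistances, reweighting the kept edges so that the Laplacian is preserved in expectation. The sketch below is ours: it computes exact effective resistances with a dense pseudoinverse, which is far too slow for large graphs but shows the idea, and the constant in the sample count is illustrative.

    import numpy as np

    def spectral_sparsify(n, edges, eps, seed=0):
        """Toy spectral sparsifier via effective-resistance sampling."""
        rng = np.random.default_rng(seed)
        L = np.zeros((n, n))
        for u, v in edges:
            L[u, u] += 1
            L[v, v] += 1
            L[u, v] -= 1
            L[v, u] -= 1
        Lpinv = np.linalg.pinv(L)   # approximated in nearly linear time in the real algorithm
        # Effective resistance of (u, v) is (e_u - e_v)^T L^+ (e_u - e_v).
        R = np.array([Lpinv[u, u] + Lpinv[v, v] - 2 * Lpinv[u, v]
                      for u, v in edges])
        p = R / R.sum()                        # sampling probabilities
        q = int(8 * n * np.log(n) / eps**2)    # number of samples (illustrative)
        H = {}
        for i in rng.choice(len(edges), size=q, p=p):
            H[edges[i]] = H.get(edges[i], 0.0) + 1.0 / (q * p[i])
        return H   # weighted edge set of the sparsifier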
Local Clustering

A key part of Spielman and Teng's construction of sparsifiers was an algorithm for local clustering [10]. The goal is to find a well-connected cluster of vertices that contains a given vertex v in some graph G, so that the running time grows with the size of the cluster and depends only very weakly on the size of G. This can be a vast improvement over global algorithms when one is looking for a small cluster inside a huge ambient graph.

Assuming that v is sufficiently well contained in a cluster, Spielman and Teng provide an algorithm for finding this cluster using the connection between graph conductance and random walks. This has sparked an active area of research into improving and applying these techniques.

Electrical Flows and Graph Algorithms

In constructing their solver, Spielman and Teng use graph theory to speed up linear algebra. In more recent work, Spielman and his collaborators have shown how to exploit this connection in the other direction, using the linear system solver to obtain faster algorithms for several fundamental graph problems, most notably for approximating maximum flows in undirected graphs [2]. To do this, they model the edges of a graph as resistors, and they study the electrical currents and potentials that result when one imposes voltages or injects current into various parts of the graph. It turns out that this can be computed by solving a Laplacian linear system, so it can be done using the Spielman-Teng solver in nearly linear time. Since electrical flows encode complex global information about a circuit, this provides a powerful new tool to probe the structure of a graph.
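Concretely, computing such an electrical flow amounts to a single Laplacian solve. In the sketch below (ours, with a dense pseudoinverse standing in for the nearly-linear-time solver), one unit of current enters at s and leaves at t, every edge is a unit resistor, and Ohm's law gives the current on each edge as a potential difference.

    import numpy as np

    def electrical_flow(n, edges, s, t):
        """Potentials and edge currents for one unit of s-t current."""
        L = np.zeros((n, n))
        for u, v in edges:
            L[u, u] += 1
            L[v, v] += 1
            L[u, v] -= 1
            L[v, u] -= 1
        b = np.zeros(n)
        b[s], b[t] = 1.0, -1.0         # inject at s, extract at t
        phi = np.linalg.pinv(L) @ b    # solve L phi = b (L is singular)
        # Current on edge (u, v) is the potential drop phi[u] - phi[v].
        return phi, {(u, v): phi[u] - phi[v] for u, v in edges}

In the maximum-flow application, such flows are recomputed with adjusted edge resistances so that congested edges are gradually penalized; this is a simplified description of the approach in [2].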
Summary

We were able to give only a glimpse of Daniel Spielman's mathematical work in the theory of computing, but we hope we were able to convey that his numerous mathematical insights have led to groundbreaking results in the design and analysis of algorithms for error-correcting codes, linear programming, graph algorithms, and numerical linear algebra. We refer the interested reader to his website [5] for additional results, pointers, references, and mathematical gems.

References

[1] Joshua Batson, Daniel A. Spielman, and Nikhil Srivastava, Twice-Ramanujan sparsifiers, arXiv:0808.0163, 2008.
[2] Paul Christiano, Jonathan A. Kelner, Aleksander Madry, Daniel A. Spielman, and Shang-Hua Teng, Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs, arXiv:1010.2921.
[3] Jonathan A. Kelner and Daniel A. Spielman, A randomized polynomial-time simplex algorithm for linear programming, Proceedings of the 38th Annual ACM Symposium on Theory of Computing, 2006.
[4] Michael G. Luby, Michael Mitzenmacher, M. Amin Shokrollahi, and Daniel A. Spielman, Efficient erasure correcting codes; Improved low-density parity-check codes using irregular graphs, IEEE Transactions on Information Theory 47(2), 569–584; 585–598, 2001.
[5] Daniel A. Spielman, http://www.cs.yale.edu/homes/spielman/.
[6] Daniel A. Spielman, Constructing error-correcting codes from expanders, in: Emerging Applications of Number Theory, IMA Volume 109 (Dennis A. Hejhal, Joel Friedman, Martin C. Gutzwiller, and Andrew M. Odlyzko, eds.), 591–600, 1999.
[7] Daniel A. Spielman and Shang-Hua Teng, Smoothed analysis: Why the simplex algorithm usually takes polynomial time, Journal of the ACM 51(3), 385–463, 2004.
[8] Daniel A. Spielman and Shang-Hua Teng, Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems, arXiv:cs/0607105, 2006.
[9] Daniel A. Spielman and Shang-Hua Teng, Spectral sparsification of graphs, arXiv:0808.4134, 2008.
[10] Daniel A. Spielman and Shang-Hua Teng, A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning, arXiv:0809.3232, 2008.
