Faster (Spectral) Algorithms via Approximation Theory

Nisheeth K. Vishnoi (EPFL)

[Figure: the Chebyshev polynomials T_d(x) on [−1, 1] for d = 0, 1, ..., 5]

Based on a recent monograph with Sushant Sachdeva (Yale). Simons Institute, Dec. 3, 2014.

The Goal

Many algorithms today rely on our ability to quickly compute good approximations to matrix-function-vector products: e.g., A^s v, A^{−1} v, exp(−A)v, ..., or the top few eigenvalues and eigenvectors.

Goal: Demonstrate how to reduce the problem of computing these primitives to a small number of computations of the form Bu, where B is a matrix closely related to A (often A itself) and u is some vector.

A key feature of these algorithms is that if the matrix-vector product for A can be computed quickly, e.g., when A is sparse, then Bu can also be computed in essentially the same time.

The classical area in analysis of approximation theory provides the right framework to study these questions.

Approximation Theory

How well can functions be approximated by simpler ones?

Uniform (Chebyshev) Approximation by Polynomials/Rationals: For f : R → R and an interval I, what is the closest a degree-d polynomial/rational function can remain to f(x) throughout I?

  inf_{p ∈ Σ_d} sup_{x ∈ I} |f(x) − p(x)|,        inf_{p,q ∈ Σ_d} sup_{x ∈ I} |f(x) − p(x)/q(x)|.

Σ_d: the set of all polynomials of degree at most d.

150+ years of fascinating history, deep results and many applications.

Interested in fundamental functions such as x^s, e^{−x}, and 1/x over finite and infinite intervals such as [−1, 1], [0, n], [0, ∞). For our applications, good-enough approximations suffice.

Algorithms/Numerical Linear Algebra: f(A)v, Eigenvalues, ...

A simple example: Compute A^s v, where A is symmetric with eigenvalues in [−1, 1], v is a vector, and s is a large positive integer.

The straightforward way to compute A^s v takes time O(ms), where m is the number of non-zero entries in A.

Suppose x^s can be δ-approximated over the interval [−1, 1] by a degree-d polynomial p_{s,d}(x) = Σ_{i=0}^{d} a_i x^i.

Candidate approximation to A^s v:  Σ_{i=0}^{d} a_i A^i v.

The time to compute Σ_{i=0}^{d} a_i A^i v is O(md).

‖Σ_{i=0}^{d} a_i A^i v − A^s v‖ ≤ δ‖v‖, since all the eigenvalues of A lie in [−1, 1] and p_{s,d} is δ-close to x^s on the entire interval [−1, 1].
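As a sketch of this reduction, here is a minimal NumPy routine (the helper name `poly_matvec` is ours, not from the talk) that evaluates Σ a_i A^i v using d matrix-vector products with A:

```python
import numpy as np

def poly_matvec(A, v, coeffs):
    """Evaluate sum_{i=0}^{d} coeffs[i] * A^i v with d matrix-vector products.

    Each step advances A^{i-1} v to A^i v by one multiplication with A,
    so for a sparse A with m non-zeros the total cost is O(m * d)."""
    v = np.asarray(v, dtype=float)
    result = coeffs[0] * v
    power = v.copy()           # holds A^i v
    for a in coeffs[1:]:
        power = A @ power
        result = result + a * power
    return result
```

With the coefficients of a δ-approximating p_{s,d}, `poly_matvec(A, v, a)` is the candidate approximation to A^s v described above.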

How small can d be?

Example: Approximating the Monomial

For any s, any δ > 0, and d ∼ √(s log(1/δ)), there is a polynomial p_{s,d} such that sup_{x ∈ [−1,1]} |p_{s,d}(x) − x^s| ≤ δ.

Simulating Random Walks: If A is the random-walk matrix of a graph, we can simulate s steps of a random walk in ∼ m√s time.

Conjugate Gradient Method: Given Ax = b with eigenvalues of A in (0, 1], one can find y such that ‖y − A^{−1}b‖_A ≤ δ‖A^{−1}b‖_A in time roughly m√(κ(A)) log(1/δ).

Quadratic speedup over the Power Method: Given A, in time ∼ m/√δ one can compute a value µ ∈ [(1 − δ)λ_1(A), λ_1(A)].
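For concreteness, here is a textbook conjugate-gradient iteration (a generic sketch, not code from the talk); each iteration costs one product with A, and the iteration count scales with √κ(A) log(1/δ), which is where the Chebyshev analysis enters:

```python
import numpy as np

def conjugate_gradient(A, b, iters=100, tol=1e-12):
    """Plain conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x              # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol ** 2:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])  # SPD
b = np.array([1.0, 2.0, 3.0])
assert np.allclose(A @ conjugate_gradient(A, b), b, atol=1e-8)
```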

Chebyshev Polynomials

The Chebyshev polynomial of degree d is defined recursively: T_d(x) := 2x T_{d−1}(x) − T_{d−2}(x) for d ≥ 2, with T_0(x) := 1 and T_1(x) := x.

Averaging Property

x T_d(x) = (T_{d+1}(x) + T_{d−1}(x)) / 2.

Boundedness Property

For any θ and any integer d, T_d(cos θ) = cos(dθ). Thus, |T_d(x)| ≤ 1 for all x ∈ [−1, 1].
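Both properties are easy to check numerically; a small sketch (the helper name `cheb` is ours) evaluating T_d via the recurrence:

```python
import numpy as np

def cheb(d, x):
    """Evaluate T_d(x) via T_d = 2x T_{d-1} - T_{d-2}, with T_0 = 1, T_1 = x."""
    x = np.asarray(x, dtype=float)
    t_prev, t_cur = np.ones_like(x), x
    if d == 0:
        return t_prev
    for _ in range(d - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

theta = np.linspace(0.0, np.pi, 101)
x = np.cos(theta)
assert np.allclose(cheb(7, x), np.cos(7 * theta))                   # T_d(cos t) = cos(dt)
assert np.allclose(x * cheb(5, x), (cheb(6, x) + cheb(4, x)) / 2)   # averaging property
assert np.abs(cheb(9, x)).max() <= 1 + 1e-12                        # boundedness
```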

Back to Approximating Monomials

D_s := Σ_{i=1}^{s} Y_i, where Y_1, ..., Y_s are i.i.d. ±1, each with probability 1/2 (D_0 := 0). Thus, Pr[|D_s| ≥ √(2s log(2/δ))] ≤ δ.

Key Claim: E_{Y_1,...,Y_s}[T_{D_s}(x)] = x^s.

Proof, by induction on s: x^{s+1} = x · E_{Y_1,...,Y_s}[T_{D_s}(x)] = E_{Y_1,...,Y_s}[x · T_{D_s}(x)] = E_{Y_1,...,Y_s}[(T_{D_s+1}(x) + T_{D_s−1}(x))/2] = E_{Y_1,...,Y_{s+1}}[T_{D_{s+1}}(x)].

Our approximation to x^s:  p_{s,d}(x) := E_{Y_1,...,Y_s}[T_{D_s}(x) · 1_{|D_s| ≤ d}]  for d = √(2s log(2/δ)).
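The Key Claim can be sanity-checked by simulation. A small Monte Carlo sketch (helper names are ours), using that T_{−d} = T_d since T_d(cos t) = cos(dt):

```python
import numpy as np

def cheb(d, x):
    """T_d(x) via the recurrence; note T_{-d} = T_d."""
    d = abs(d)
    t_prev, t_cur = 1.0, x
    if d == 0:
        return t_prev
    for _ in range(d - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

rng = np.random.default_rng(0)
x, s, trials = 0.7, 6, 200_000
walks = rng.choice([-1, 1], size=(trials, s)).sum(axis=1)   # samples of D_s
ends, counts = np.unique(walks, return_counts=True)
estimate = sum(c * cheb(int(D), x) for D, c in zip(ends, counts)) / trials
assert abs(estimate - x ** s) < 0.02    # E[T_{D_s}(x)] should equal x^6 = 0.117649
```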

sup_{x ∈ [−1,1]} |p_{s,d}(x) − x^s| = sup_{x ∈ [−1,1]} |E[T_{D_s}(x) · 1_{|D_s| > d}]| ≤ E[1_{|D_s| > d} · sup_{x ∈ [−1,1]} |T_{D_s}(x)|] ≤ E[1_{|D_s| > d}] ≤ δ.

A General Recipe?

Suppose f(x) is δ-approximated by a Taylor polynomial Σ_{s=0}^{k} c_s x^s; then one may instead try the approximation (with suitably shifted p_{s,d})

  Σ_{s=0}^{k} c_s p_{s, √(s log(1/δ))}(x).

Approximating the Exponential

For every b > 0 and δ > 0, there is a polynomial r_{b,δ} of degree ∼ √(b log(1/δ)) such that sup_{x ∈ [0,b]} |e^{−x} − r_{b,δ}(x)| ≤ δ. (A Taylor approximation needs degree Ω(b).)
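A quick numerical illustration (a sketch, not the talk's construction; the degree is chosen per the √(b log(1/δ)) rule): interpolate e^{−x} at Chebyshev nodes mapped to [0, b] and measure the uniform error.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

b, deg = 16.0, 12   # degree ~ sqrt(b * log(1/delta)) for delta ~ 1e-3
k = np.arange(deg + 1)
nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))  # Chebyshev nodes in [-1, 1]
xs = (nodes + 1) * b / 2                               # mapped to [0, b]
coeffs = C.chebfit(nodes, np.exp(-xs), deg)            # interpolant of e^{-x}

grid = np.linspace(0.0, b, 4001)
err = np.max(np.abs(np.exp(-grid) - C.chebval(2 * grid / b - 1, coeffs)))
assert err < 1e-2   # far better than a degree-12 Taylor expansion can do on [0, 16]
```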

Implies an Õ(m √‖A‖ log(1/δ))-time algorithm to compute a δ-approximation to e^{−A}v for a PSD matrix A. Useful in solving SDPs.

When A is a graph Laplacian, this implies an optimal spectral algorithm for Balanced Separator that runs in time Õ(m/√γ), where γ is the target conductance [Orecchia-Sachdeva-V. 2012].

How far can polynomial approximations take us?

Lower Bounds for Polynomial Approximations

Bad News [see Sachdeva-V. 2014]: Polynomial approximation to x^s on [−1, 1] requires degree Ω(√s). Polynomial approximation to e^{−x} on [0, b] requires degree Ω(√b).

Markov's Theorem (inspired by a problem of Mendeleev in chemistry): Any degree-d polynomial p with |p(x)| ≤ 1 over [−1, 1] must have derivative |p′(x)| ≤ d² for all x ∈ [−1, 1].

Chebyshev polynomials are a tight example for this theorem.
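Tightness is easy to verify numerically: T_d is bounded by 1 on [−1, 1], yet T_d′(1) = d² exactly. A quick check with NumPy's Chebyshev utilities:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

for d in (3, 8, 15):
    basis = [0.0] * d + [1.0]          # T_d expressed in the Chebyshev basis
    deriv = C.chebder(basis)           # coefficients of T_d'
    assert abs(C.chebval(1.0, deriv) - d * d) < 1e-8   # derivative d^2 at x = 1
    xs = np.linspace(-1.0, 1.0, 1001)
    assert np.abs(C.chebval(xs, basis)).max() <= 1 + 1e-12   # |T_d| <= 1 on [-1, 1]
```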

Bypass this barrier via rational functions!

Example: Approximating the Exponential

For all integers d ≥ 0, there is a degree-d polynomial S_d(x) such that

sup_{x ∈ [0,∞)} |e^{−x} − 1/S_d(x)| ≤ 2^{−Ω(d)}.

S_d(x) := Σ_{k=0}^{d} x^k / k!. (Proof by induction.)

No dependence on the length of the interval!

Hence, for any δ > 0, we have a rational function of degree O(log(1/δ)) that is a δ-approximation to e^{−x}. For most applications, an error of δ = 1/poly(n) suffices, so we can choose d = O(log n).

Thus, (S_d(A))^{−1} v δ-approximates e^{−A} v.
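As a small dense-matrix sketch (not the Laplacian-solver pipeline the talk builds toward; d = 20 is our choice), one can form S_d(A) explicitly and do a single linear solve; for a symmetric PSD A this already matches e^{−A}v to high accuracy:

```python
import numpy as np

def expm_neg_v(A, v, d=20):
    """Approximate e^{-A} v as (S_d(A))^{-1} v, with S_d the degree-d
    Taylor polynomial of e^x evaluated at the matrix A."""
    n = A.shape[0]
    S = np.zeros((n, n))
    term = np.eye(n)                  # holds A^k / k!
    for k in range(d + 1):
        S += term
        term = term @ A / (k + 1)
    return np.linalg.solve(S, v)

A = np.array([[2.0, 0.5], [0.5, 1.0]])    # symmetric PSD
v = np.array([1.0, -2.0])
w, Q = np.linalg.eigh(A)
exact = Q @ (np.exp(-w) * (Q.T @ v))      # e^{-A} v via eigendecomposition
assert np.allclose(expm_neg_v(A, v), exact, atol=1e-8)
```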

How do we compute (S_d(A))^{−1} v?

Why any Rational Approximation is not Enough?

Factor S_d(x) = α_0 Π_{i=1}^{d} (x − β_i), and output α_0^{−1} Π_{i=1}^{d} (A − β_i I)^{−1} v.

Since d is O(log n), it suffices to compute (A − β_i I)^{−1} u.

When A is a Laplacian and β_i ≤ 0, then A − β_i I is SDD!

The β_i's could be complex:

S_d(x) has exactly one real zero x_d ∈ [−d, −1] if d is odd, and no real zeros if d is even. Also, the zeros of S_d(x) grow linearly in magnitude with d.
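These root properties can be observed directly (a sketch using `numpy.roots` on the Taylor coefficients; numerically reliable only for modest d):

```python
import numpy as np
from math import factorial

def taylor_exp_roots(d):
    """Roots beta_i of S_d(x) = sum_{k=0}^{d} x^k / k!."""
    coeffs = [1.0 / factorial(k) for k in range(d, -1, -1)]  # highest degree first
    return np.roots(coeffs)

for d in (7, 8):
    roots = taylor_exp_roots(d)
    real = roots[np.abs(roots.imag) < 1e-8].real
    if d % 2 == 1:
        assert len(real) == 1 and -d <= real[0] <= -1   # one real zero in [-d, -1]
    else:
        assert len(real) == 0                           # no real zeros
```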

However, since S_d has real coefficients, its complex roots appear in conjugate pairs. Hence, the task reduces to computing (A² − (β_i + β̄_i)A + |β_i|² I)^{−1} u. The matrix A² − (β_i + β̄_i)A + |β_i|² I is PSD, but its condition number can be comparable to that of A².

Desire: A rational approximation with negative poles. Why any Rational Approximation is not Enough?

Qd Qd −1 Factor Sd (x) = α0 i=1(x − βi ) and output α0 i=1(A − βi I ) v.

−1 Since d is O(log n), sufficient to compute (A − βi I ) u.

When A is Laplacian, and βi ≤ 0, then A − βi I is SDD!

βi s could be complex:

Sd (x) has exactly one real zero xd ∈ [−d, −1] if d is odd, and no real

zeros if d is even. Also, zeros of Sd (x) grow linearly in magnitude with d.

However, since Sd has real coefficients, its complex roots appear as conjugates. Hence, the task reduces to computing 2 2 −1 (A − (βi + β¯i )A + |βi | I ) u. 2 2 The matrix (A − (βi + β¯i )A + |βi | I ) is PSD but the condition number can be comparable to that of A.

Desire: A rational approximation with negative poles. Why any Rational Approximation is not Enough?

Qd Qd −1 Factor Sd (x) = α0 i=1(x − βi ) and output α0 i=1(A − βi I ) v.

−1 Since d is O(log n), sufficient to compute (A − βi I ) u.

When A is Laplacian, and βi ≤ 0, then A − βi I is SDD!

βi s could be complex:

Sd (x) has exactly one real zero xd ∈ [−d, −1] if d is odd, and no real

zeros if d is even. Also, zeros of Sd (x) grow linearly in magnitude with d.

However, since Sd has real coefficients, its complex roots appear as conjugates. Hence, the task reduces to computing 2 2 −1 (A − (βi + β¯i )A + |βi | I ) u. 2 2 The matrix (A − (βi + β¯i )A + |βi | I ) is PSD but the condition number can be comparable to that of A.

Desire: A rational approximation with negative poles. Why any Rational Approximation is not Enough?

Qd Qd −1 Factor Sd (x) = α0 i=1(x − βi ) and output α0 i=1(A − βi I ) v.

−1 Since d is O(log n), sufficient to compute (A − βi I ) u.

When A is Laplacian, and βi ≤ 0, then A − βi I is SDD!

βi s could be complex:

Sd (x) has exactly one real zero xd ∈ [−d, −1] if d is odd, and no real

zeros if d is even. Also, zeros of Sd (x) grow linearly in magnitude with d.

However, since Sd has real coefficients, its complex roots appear as conjugates. Hence, the task reduces to computing 2 2 −1 (A − (βi + β¯i )A + |βi | I ) u. 2 2 The matrix (A − (βi + β¯i )A + |βi | I ) is PSD but the condition number can be comparable to that of A.

Desire: A rational approximation with negative poles.

Rational Approximation with Negative Poles

How about (1 + x/d)^{−d}? It converges to e^{−x} uniformly over [0, ∞).

But the convergence rate is slow: at x = 1 the error is Θ(1/d).

More generally, for every rational function of the form 1/p_d(x), where p_d is a degree-d polynomial with real roots:

sup_{x ∈ [0,∞)} |e^{−x} − 1/p_d(x)| = Ω(1/d^2).

Saff-Schönhage-Varga 1975: For every d, there exists a degree-d polynomial p_d s.t.

sup_{x ∈ [0,∞)} |e^{−x} − p_d(1/(1 + x/d))| ≤ 2^{−Ω(d)}.

(Note that p_d(1/(1 + x/d)) is a rational function whose only pole, x = −d, is negative.)

Sachdeva-V. 2014: Moreover, the coefficients of p_d are bounded by d^{O(d)} and can be approximated up to an error of d^{−Θ(d)} using poly(d) arithmetic operations, where all intermediate numbers have poly(d) bits.
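The slow Θ(1/d) rate at x = 1 is easy to check numerically; a quick sketch (degrees chosen arbitrarily):

```python
import math

# Error of the simple rational approximation (1 + x/d)^(-d) to e^(-x) at x = 1.
# Since (1 + 1/d)^d = e * (1 - 1/(2d) + O(1/d^2)), the error decays like e^(-1)/(2d).
for d in (10, 100, 1000, 10000):
    err = (1 + 1 / d) ** (-d) - math.exp(-1)
    print(d, err, d * err)   # d * err approaches e^(-1)/2 ≈ 0.1839
```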

Computing the Matrix Exponential - Summary

Orecchia-Sachdeva-V. 2012, Sachdeva-V. 2014: Given an SDD matrix A ⪰ 0, a vector v with ‖v‖ = 1, and δ > 0, we can compute a vector u s.t. ‖exp(−A)v − u‖ ≤ δ in time Õ(m log‖A‖ log(1/δ)).

Corollary [Orecchia-Sachdeva-V. 2012]: √γ-approximation for Balanced Separator in time Õ(m). Spectral guarantee for the approximation; running time independent of γ.

SDD Solvers: Given Lx = b with L SDD and ε > 0, obtain a vector u s.t. ‖u − L^{−1}b‖_L ≤ ε‖L^{−1}b‖_L. Time required: Õ(m log(1/ε)).

Are Laplacian solvers necessary for the matrix exponential?
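The primitive in the theorem, exp(−A)v for a Laplacian A, can be checked on small instances against a dense spectral computation (a path-graph Laplacian here, chosen only as an illustration); the fast algorithm above replaces this O(n^3) reference:

```python
import numpy as np

# Laplacian of the path graph on n vertices (SDD and PSD).
n = 8
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
A[0, 0] = A[-1, -1] = 1.0

v = np.zeros(n)
v[0] = 1.0                       # unit vector: heat starts at one endpoint

# Dense reference for exp(-A) v via the spectral decomposition of A.
w, V = np.linalg.eigh(A)
u = V @ (np.exp(-w) * (V.T @ v))

# Row sums of A are zero, so exp(-A) fixes the all-ones vector: mass is conserved.
print(u.sum())                   # ≈ 1.0
```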

Matrix Inversion via Exponentiation

Beylkin-Monzón 2010, Sachdeva-V. 2014

For ε, δ ∈ (0, 1], there exist poly(log(1/(εδ))) numbers w_j, t_j > 0 s.t. for all symmetric εI ⪯ A ⪯ I,

(1 − δ)A^{−1} ⪯ Σ_j w_j e^{−t_j A} ⪯ (1 + δ)A^{−1}.

The weights w_j are O(poly(1/(δε))), so we lose only a polynomial factor in the approximation error.

For applications: polylogarithmic dependence on both 1/δ and the condition number of A (1/ε in this case).

Discretizing x^{−1} = ∫_0^∞ e^{−xt} dt naively needs poly(1/(εδ)) terms. Substituting t = e^y in this integral gives the identity x^{−1} = ∫_{−∞}^{∞} e^{−xe^y + y} dy. Discretizing the latter integral, we bound the error using the Euler-Maclaurin formula and the Riemann zeta function: a global error analysis!
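The substitution t = e^y and a plain trapezoid rule already give usable weights; a minimal sketch (grid bounds and step size chosen by hand, not the optimized choice from the paper):

```python
import numpy as np

# Discretize 1/x = ∫ e^{-x e^y + y} dy with a uniform trapezoid rule.
h = 0.2
y = np.arange(-15, 15, h)
t = np.exp(y)        # exponents t_j
w = h * t            # weights w_j (the e^y factor comes from the substitution)

for x in (0.01, 0.1, 1.0):
    approx = np.sum(w * np.exp(-t * x))
    print(x, approx, abs(approx - 1 / x) * x)   # relative error is tiny
```

About 150 terms cover the whole range x ∈ [0.01, 1] to relative error below 1e-5, whereas a uniform grid for ∫_0^∞ e^{−xt} dt would need polynomially many terms in 1/x.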

A Heuristic for solving PSD Systems?

Goal: compute A^{−1}u for A PSD.

Write A = A_1 + ··· + A_k. Hence e^{−tA} = e^{−t(A_1+···+A_k)}. Suppose, magically:

1. k is small.
2. Computing e^{−tA_i}v is easier, for all i.
3. e^{−t(A_1+···+A_k)} ≈ e^{−tA_1} ··· e^{−tA_k}.

Then A^{−1}u ≈ Σ_j w_j ∏_i e^{−t_j A_i} u (from the previous slide).

Theorem: For large enough p, e^{B_1+···+B_k} ≈ (e^{B_1/p} ··· e^{B_k/p})^p.
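The theorem is the Lie-Trotter product formula; a quick numerical check with two random symmetric matrices (sizes chosen arbitrarily) shows the roughly O(1/p) convergence:

```python
import numpy as np

def expm_sym(B):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, V = np.linalg.eigh(B)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(1)
n = 5
B1 = rng.standard_normal((n, n)); B1 = (B1 + B1.T) / 2
B2 = rng.standard_normal((n, n)); B2 = (B2 + B2.T) / 2

exact = expm_sym(B1 + B2)
errs = {}
for p in (10, 100, 1000):
    trotter = np.linalg.matrix_power(expm_sym(B1 / p) @ expm_sym(B2 / p), p)
    errs[p] = np.linalg.norm(trotter - exact)
    print(p, errs[p])   # error shrinks roughly like 1/p
```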

Conclusion

Uniform approximation is the right notion for algorithmic applications; the Taylor series is often not the best. We can often reduce the computation of f(A)v to a small number of sparse matrix-vector computations. The mere existence of a good approximation suffices (see V. 2013).

Rational approximations can benefit from the ability to solve Lx = b. Much is left to be explained in the fascinating world of rational approximations. Challenge problem: Can we compute a δ-approximation to W^s v in time Õ(m log s · log(1/δ))? (W is the random walk matrix of an undirected graph.) Beyond Lx = b?
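For scale, the naive baseline for the challenge costs s sparse multiplies, i.e., O(ms) total work; a sketch with the random walk matrix of a cycle (the graph is an arbitrary illustration):

```python
import numpy as np

# Random walk matrix of the n-cycle: step left or right with probability 1/2.
n = 20
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[i, (i - 1) % n] = 0.5

v = np.zeros(n)
v[0] = 1.0                 # walk starts at vertex 0

s = 100
u = v.copy()
for _ in range(s):         # naive: s matrix-vector products, O(m s) total work
    u = W @ u
print(u.sum())             # still a probability distribution: sums to 1
```

The challenge asks for an exponentially better dependence on s, replacing the factor s with log s.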

Thanks for your attention!
