A fast and well-conditioned spectral method: The US method

Alex Townsend University of Oxford

Leslie Fox Prize, 24th of June 2013

Partially based on: S. Olver & T., A fast and well-conditioned spectral method, to appear in SIAM Review, (2013).

The problem

Problem: Solve linear ordinary differential equations (ODEs)

$$a_N(x)\frac{d^N u}{dx^N} + \cdots + a_1(x)\frac{du}{dx} + a_0(x)u = f(x), \qquad x \in [-1, 1],$$

with $K$ linear boundary conditions (Dirichlet, Neumann, $\int_{-1}^{1} u = c$, etc.)

Assumptions:
$a_0, \ldots, a_N, f$ are continuous with bounded variation.
The leading term $a_N(x)$ is non-zero on the interval.
The solution exists and is unique.

Main philosophy: Replace $a_0, \ldots, a_N, f$ by polynomial approximations, and solve for a polynomial approximation for $u$.

The spectral method finds the coefficients in a Chebyshev series in $O(m^2 n)$ operations:

$$u(x) \approx \sum_{k=0}^{n-1} \alpha_k T_k(x), \qquad T_k(x) = \cos(k \cos^{-1} x).$$
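As a small illustration of the representation above, here is a Python sketch that evaluates a truncated Chebyshev series from its coefficients. The function names are illustrative only and do not come from the talk or from Chebfun. It uses Clenshaw's recurrence, a standard technique, which can be compared against the definition $T_k(x) = \cos(k\cos^{-1}x)$.

```python
import math

def cheb_T(k, x):
    """T_k(x) from the definition cos(k * arccos(x)), valid for x in [-1, 1]."""
    return math.cos(k * math.acos(x))

def cheb_series(alpha, x):
    """Evaluate sum_k alpha[k] * T_k(x) with Clenshaw's recurrence."""
    b1 = b2 = 0.0
    # Run the recurrence b_k = alpha_k + 2 x b_{k+1} - b_{k+2} downwards to k = 1.
    for a in reversed(alpha[1:]):
        b1, b2 = a + 2.0 * x * b1 - b2, b1
    return alpha[0] + x * b1 - b2

# Example: u(x) = T_0 + 2 T_1 + 3 T_2 evaluated both ways.
alpha = [1.0, 2.0, 3.0]
direct = sum(a * cheb_T(k, 0.5) for k, a in enumerate(alpha))
clenshaw = cheb_series(alpha, 0.5)
```

Clenshaw's recurrence avoids calling a trigonometric function for every term, which matters once the degree reaches the tens of thousands quoted in the talk.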

Overview

Conventional wisdom
A comparison
First-order ODEs
Higher-order ODEs
Stability and convergence
Fast linear algebra
PDEs
Conclusion

[Figures: a solution $u(x)$ against $x$ on $[-1, 1]$, annotated degree(u) = 51511; and a plot over $x$ and $y$ on $[-1, 1]^2$.]

Quotes about spectral methods

“It is known that matrices generated by spectral methods can be dense and ill-conditioned.” [Chen 2005]

“The idea behind spectral methods is to take this process to the limit, at least in principle, and work with a differentiation formula of infinite order and infinite bandwidth, i.e., a dense matrix.” [Trefethen 2000]

“The best advice is still this: use pseudospectral (collocation) methods instead of spectral (coefficient), and use Fourier series and Chebyshev polynomials in preference to more exotic functions.” [Boyd 2003]

The Ultraspherical spectral method (US method) can result in almost banded, well-conditioned linear systems.

Chebyshev spectral methods

$$u''(x) + \cos(x)u(x) = 0, \qquad u(\pm 1) = 0.$$

[Figure: condition number against $n$ (0 to 200) and matrix density for the collocation, coefficient and US methods.]

                   Collocation    Coefficient          US method
Matrix structure   Dense          Banded from below    Banded + rank-K
Condition number   O(n^{2N})      O(n^{2N})            O(n^{2(D-1)})
Problem sizes      n ≤ 5,000      n ≤ 5,000            n ≤ 70,000
Operation count    O(n^3)         O(mn^2)              O(m^2 n)

First-order differential equations

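$$u'(x) + a(x)u(x) = f(x)$$

$$(\mathcal{D} + \mathcal{S}\mathcal{M}[a])\,u = \mathcal{S}f$$

Differentiation operator: $\mathcal{D} : T \to C^{(1)}$
Multiplication operator: $\mathcal{M}[a] : T \to T$
Conversion operator: $\mathcal{S} : T \to C^{(1)}$

[Figure: the $(\alpha, \beta)$ parameter plane of the Jacobi polynomials with weight $w(x) = (1-x)^{\alpha}(1+x)^{\beta}$, marking $T$, $L$ and $C^{(1)}$, with arrows for differentiation and conversion and the labels Jacobi, Ultraspherical and Hypergeometric functions.]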

First-order operators

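Differentiation operator: $\mathcal{D} : T \to C^{(1)}$, [DLMF]

$$\frac{dT_k}{dx} = \begin{cases} kC^{(1)}_{k-1}, & k \ge 1,\\ 0, & k = 0,\end{cases} \qquad \mathcal{D} = \begin{pmatrix} 0 & 1 & & & \\ & & 2 & & \\ & & & 3 & \\ & & & & \ddots \end{pmatrix}.$$

Conversion operator: $\mathcal{S} : T \to C^{(1)}$, [DLMF]

$$T_k = \begin{cases} \tfrac{1}{2}\left(C^{(1)}_k - C^{(1)}_{k-2}\right), & k \ge 2,\\ \tfrac{1}{2}C^{(1)}_1, & k = 1,\\ C^{(1)}_0, & k = 0,\end{cases} \qquad \mathcal{S} = \begin{pmatrix} 1 & 0 & -\tfrac{1}{2} & & \\ & \tfrac{1}{2} & 0 & -\tfrac{1}{2} & \\ & & \tfrac{1}{2} & 0 & \ddots \\ & & & \ddots & \ddots \end{pmatrix}.$$

Multiplication operator: $\mathcal{M}[a] : T \to T$, [DLMF]

$$T_jT_k = \tfrac{1}{2}\left(T_{|j-k|} + T_{j+k}\right), \qquad j, k \ge 0.$$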
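The two sparse operators above can be checked numerically. The following Python sketch (helper names are hypothetical, not part of any library) applies $\mathcal{D}$ and $\mathcal{S}$ to a coefficient vector and evaluates the result in the $C^{(1)}$ basis, using the fact that $C^{(1)}_k$ is the Chebyshev polynomial of the second kind, $U_k(\cos\theta) = \sin((k+1)\theta)/\sin\theta$.

```python
import math

def series_T(alpha, x):
    """sum_k alpha[k] T_k(x) for x in [-1, 1]."""
    return sum(a * math.cos(k * math.acos(x)) for k, a in enumerate(alpha))

def series_U(beta, x):
    """sum_k beta[k] C_k^(1)(x) for x in (-1, 1), with C_k^(1) = U_k."""
    theta = math.acos(x)
    return sum(b * math.sin((k + 1) * theta) / math.sin(theta)
               for k, b in enumerate(beta))

def apply_D(alpha):
    """Differentiation T -> C^(1): entry k-1 of the result is k * alpha[k]."""
    return [k * alpha[k] for k in range(1, len(alpha))]

def apply_S(alpha):
    """Conversion T -> C^(1): T_k = (C_k - C_{k-2}) / 2, with special cases k = 0, 1."""
    out = [0.0] * len(alpha)
    for k, a in enumerate(alpha):
        if k == 0:
            out[0] += a
        else:
            out[k] += 0.5 * a
            if k >= 2:
                out[k - 2] -= 0.5 * a
    return out

alpha = [0.3, -1.0, 0.5, 2.0]           # arbitrary test coefficients
same_value = series_U(apply_S(alpha), 0.37)   # should match series_T(alpha, 0.37)
derivative = series_U(apply_D(alpha), 0.37)   # should match u'(0.37)
```

Both operators touch each coefficient a bounded number of times, which is the sparsity the method relies on.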

First-order multiplication operator

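$$T_jT_k = \tfrac{1}{2}T_{|j-k|} + \tfrac{1}{2}T_{j+k}$$

$$\mathcal{M}[a] = \frac{1}{2}\underbrace{\begin{pmatrix} 2a_0 & a_1 & a_2 & a_3 & \cdots \\ a_1 & 2a_0 & a_1 & a_2 & \ddots \\ a_2 & a_1 & 2a_0 & a_1 & \ddots \\ a_3 & a_2 & a_1 & 2a_0 & \ddots \\ \vdots & \ddots & \ddots & \ddots & \ddots \end{pmatrix}}_{\text{Toeplitz}} + \frac{1}{2}\underbrace{\begin{pmatrix} 0 & 0 & 0 & 0 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ a_2 & a_3 & a_4 & a_5 & \cdots \\ a_3 & a_4 & a_5 & a_6 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}}_{\text{Hankel + rank-1}}$$

Multiplication is not a dense operator in finite precision. It is $m$-banded:

$$a(x) = \sum_{k=0}^{\infty} a_k T_k(x) = \sum_{k=0}^{m} \tilde{a}_k T_k(x) + O(\epsilon),$$

where $\tilde{a}_k$ are aliased Chebyshev coefficients.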
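A minimal Python sketch of a finite section of $\mathcal{M}[a]$, built entry by entry from the Toeplitz and Hankel-type matrices above. The function names are illustrative assumptions. The check multiplies two short Chebyshev series and compares against pointwise multiplication; it also confirms that entries further than $m$ from the diagonal vanish when $a$ has degree $m$.

```python
import math

def cheb_mult_operator(a, rows, cols):
    """rows x cols section of M[a] for a(x) = sum_j a[j] T_j(x)."""
    def coef(i):
        return a[i] if i < len(a) else 0.0
    M = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(cols):
            toeplitz = 2.0 * coef(0) if i == k else coef(abs(i - k))
            hankel = coef(i + k) if i >= 1 else 0.0  # first row of the second matrix is zero
            M[i][k] = 0.5 * (toeplitz + hankel)
    return M

def apply_op(M, u):
    """Matrix-vector product with plain lists."""
    return [sum(row[k] * u[k] for k in range(len(u))) for row in M]

def cheb_eval(alpha, x):
    """sum_k alpha[k] T_k(x) for x in [-1, 1]."""
    return sum(c * math.cos(k * math.acos(x)) for k, c in enumerate(alpha))

a = [1.0, 0.5, 0.25]            # a(x) of degree m = 2
u = [0.2, -0.3, 0.7, 0.1]       # u(x) of degree 3
# The product has degree 5, so 6 rows capture it exactly.
product = apply_op(cheb_mult_operator(a, len(u) + len(a) - 1, len(u)), u)
```

The dense double loop is for clarity only; a banded implementation would fill just the $2m+1$ diagonals.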

Imposing boundary conditions

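Dirichlet: $u(-1) = c$:

$$\mathcal{B} = (1, -1, 1, -1, \ldots), \qquad T_k(-1) = (-1)^k.$$

Neumann: $u'(1) = c$:

$$\mathcal{B} = (0, 1, 4, 9, \ldots), \qquad T_k'(1) = k^2.$$

Other conditions: $\int_{-1}^{1} u = c$, $u(0) = c$, or $u(0) + u'(\pi/6) = c$.

Imposing boundary conditions by boundary bordering: for $u'(x) + a(x)u(x) = f$ with $\mathcal{B}u = c$,

$$\begin{pmatrix} \mathcal{B} \\ \mathcal{D} + \mathcal{S}\mathcal{M}[a] \end{pmatrix} u = \begin{pmatrix} c \\ \mathcal{S}f \end{pmatrix}.$$

The finite section can be done exactly by projection $P_n = (I_n, 0)$:

$$\begin{pmatrix} \mathcal{B}P_n^T \\ P_{n-1}\mathcal{D}P_n^T + P_{n-1}\mathcal{S}P_{n+m}^T P_{n+m}\mathcal{M}[a]P_n^T \end{pmatrix} u_n = \begin{pmatrix} c \\ P_{n-1}\mathcal{S}f \end{pmatrix},$$

where $u_n$ holds the $n$ computed Chebyshev coefficients.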
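The bordered finite section can be assembled and solved in a few lines. The Python sketch below is an illustration under stated assumptions, not the talk's implementation: the function names are hypothetical, the test problem $u' + xu = x$, $u(-1) = 2$ (exact solution $u = 1 + e^{(1-x^2)/2}$) is chosen here for convenience, and dense Gaussian elimination stands in for the fast almost-banded solver described later in the talk.

```python
import math

def diff_op(rows, cols):
    """Section of D: T -> C^(1), with D[k-1][k] = k."""
    D = [[0.0] * cols for _ in range(rows)]
    for k in range(1, min(cols, rows + 1)):
        D[k - 1][k] = float(k)
    return D

def conv_op(rows, cols):
    """Section of S: T -> C^(1)."""
    S = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        if i < cols:
            S[i][i] = 1.0 if i == 0 else 0.5
        if i + 2 < cols:
            S[i][i + 2] = -0.5
    return S

def mult_op(a, rows, cols):
    """Section of M[a]: T -> T (Toeplitz plus Hankel-type part, halved)."""
    coef = lambda i: a[i] if i < len(a) else 0.0
    return [[0.5 * ((2.0 * coef(0) if i == k else coef(abs(i - k)))
                    + (coef(i + k) if i >= 1 else 0.0))
             for k in range(cols)] for i in range(rows)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (stand-in solver)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def us_first_order(a, f, c, n):
    """Chebyshev coefficients of u solving u' + a u = f, u(-1) = c."""
    big = n + len(a)                       # enough rows to hold M[a] P_n^T exactly
    D = diff_op(n - 1, n)
    S = conv_op(n - 1, big)
    SM = matmul(S, mult_op(a, big, n))
    A = [[(-1.0) ** k for k in range(n)]]  # Dirichlet row: T_k(-1) = (-1)^k
    A += [[D[i][k] + SM[i][k] for k in range(n)] for i in range(n - 1)]
    f_pad = list(f) + [0.0] * (big - len(f))
    Sf = [sum(S[i][k] * f_pad[k] for k in range(big)) for i in range(n - 1)]
    return solve(A, [c] + Sf)

def cheb_eval(alpha, x):
    return sum(v * math.cos(k * math.acos(x)) for k, v in enumerate(alpha))

# a(x) = x = T_1 and f(x) = x = T_1, both given by Chebyshev coefficients.
alpha = us_first_order([0.0, 1.0], [0.0, 1.0], 2.0, 24)
```

The boundary row is dense while the remaining rows are banded, which is the "banded + rank-K" structure listed in the comparison table.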

First-order examples

Example 1

$$u'(x) + x^3u(x) = 100\sin(20{,}000x^2), \qquad u(-1) = 0.$$

The exact solution is

$$u(x) = e^{-\frac{x^4}{4}}\left(\int_{-1}^{x} 100\,e^{\frac{t^4}{4}}\sin(20{,}000t^2)\,dt\right).$$

    N = ultraop(@(x,u) ··· );
    N.lbc = 0; u = N \ f;

Adaptively selects the discretisation size.
Forms a chebfun object [Chebfun V4.2].
degree(u) = 20391, time = 15.5s, $\|\tilde{u} - u\|_{\infty} = 1.5 \times 10^{-15}$.

[Figure: $u(x)$ against $x$ on $[-1, 1]$.]

First-order examples

Example 2

$$u'(x) + \frac{1}{1 + 50{,}000x^2}u(x) = 0, \qquad u(-1) = 1.$$

The exact solution with $a = 50{,}000$ is

$$u(x) = \exp\left(-\frac{\tan^{-1}(\sqrt{a}\,x) + \tan^{-1}(\sqrt{a})}{\sqrt{a}}\right).$$

[Figure: left, $u(x)$ against $x$, annotated degree(u) = 5093; right, absolute error against $x$ for the old chebop, preconditioned collocation and the US method.]

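Higher-order linear ODEs

Higher-order diff operators: $\mathcal{D}_{\lambda} : T \to C^{(\lambda)}$, [DLMF]

$$\frac{dC^{(\lambda)}_k}{dx} = \begin{cases} 2\lambda C^{(\lambda+1)}_{k-1}, & k \ge 1,\\ 0, & k = 0,\end{cases} \qquad \mathcal{D}_{\lambda} = 2^{\lambda-1}(\lambda-1)!\begin{pmatrix} \overbrace{0 \;\cdots\; 0}^{\lambda \text{ times}} & \lambda & & & \\ & & \lambda+1 & & \\ & & & \lambda+2 & \\ & & & & \ddots \end{pmatrix}.$$

Higher-order conversion operators: $\mathcal{S}_{\lambda} : C^{(\lambda)} \to C^{(\lambda+1)}$, [DLMF]

$$C^{(\lambda)}_k = \begin{cases} \frac{\lambda}{\lambda+k}\left(C^{(\lambda+1)}_k - C^{(\lambda+1)}_{k-2}\right), & k \ge 2,\\ \frac{\lambda}{\lambda+1}C^{(\lambda+1)}_1, & k = 1,\\ C^{(\lambda+1)}_0, & k = 0,\end{cases} \qquad \mathcal{S}_{\lambda} = \begin{pmatrix} 1 & 0 & -\frac{\lambda}{\lambda+2} & & \\ & \frac{\lambda}{\lambda+1} & 0 & -\frac{\lambda}{\lambda+3} & \\ & & \frac{\lambda}{\lambda+2} & 0 & \ddots \\ & & & \ddots & \ddots \end{pmatrix}.$$

Multiplication operators:

$$\mathcal{M}_{\lambda}[a] : C^{(\lambda)} \to C^{(\lambda)}, \qquad \mathcal{M}_1[a] : C^{(1)} \to C^{(1)}, \qquad \mathcal{M}_1[a] = \text{Toeplitz} + \text{Hankel}.$$

[Figure: the $(\alpha, \beta)$ parameter plane of the Jacobi polynomials with weight $w(x) = (1-x)^{\alpha}(1+x)^{\beta}$, marking $T$, $L$, $C^{(1)}$ and $C^{(2)}$, with arrows for differentiation and conversion.]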
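The higher-order operators can be checked in the same spirit as the first-order ones. This Python sketch (illustrative names, not a library API) applies $\mathcal{D}_{\lambda}$ to Chebyshev coefficients, converts the result with $\mathcal{S}_{\lambda}$, and evaluates in the ultraspherical basis through the standard three-term recurrence $(k+1)C^{(\lambda)}_{k+1} = 2(k+\lambda)xC^{(\lambda)}_k - (k+2\lambda-1)C^{(\lambda)}_{k-1}$.

```python
import math

def gegenbauer(lam, n, x):
    """C_n^(lam)(x) by the three-term recurrence, for lam > 0."""
    c_prev, c = 1.0, 2.0 * lam * x
    if n == 0:
        return c_prev
    for k in range(1, n):
        c_prev, c = c, (2.0 * (k + lam) * x * c - (k + 2.0 * lam - 1.0) * c_prev) / (k + 1.0)
    return c

def ultra_eval(coeffs, lam, x):
    """sum_k coeffs[k] C_k^(lam)(x)."""
    return sum(ck * gegenbauer(lam, k, x) for k, ck in enumerate(coeffs))

def diff_to_ultra(alpha, lam):
    """D_lam: C^(lam) coefficients of the lam-th derivative of sum_k alpha[k] T_k."""
    scale = 2.0 ** (lam - 1) * math.factorial(lam - 1)
    # Column k of D_lam has the single entry scale * k in row k - lam.
    return [scale * k * alpha[k] for k in range(lam, len(alpha))]

def convert_up(coeffs, lam):
    """S_lam: C^(lam) -> C^(lam+1) for lam >= 1."""
    out = [0.0] * len(coeffs)
    for k, ck in enumerate(coeffs):
        w = lam / (lam + k)
        out[k] += w * ck
        if k >= 2:
            out[k - 2] -= w * ck
    return out

# Second derivative of T_5(x) = 16x^5 - 20x^3 + 5x, expressed in C^(2) and then C^(3).
t5 = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
d2_in_c2 = diff_to_ultra(t5, 2)
d2_in_c3 = convert_up(d2_in_c2, 2)
```

Each column of $\mathcal{D}_{\lambda}$ has one non-zero entry and each column of $\mathcal{S}_{\lambda}$ at most two, so both operations cost $O(n)$.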

Construction of ultraspherical multiplication operators

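$\mathcal{M}_{\lambda}[a] : C^{(\lambda)} \to C^{(\lambda)}$ represents multiplication with $a(x)$ in $C^{(\lambda)}$. If

$$a(x) = \sum_{j=0}^{m} a_j C^{(\lambda)}_j(x),$$

then it can be shown that $\mathcal{M}_{\lambda}[a]$ is $m$-banded with

$$(\mathcal{M}_{\lambda}[a])_{j,k} = \sum_{s=\max(0,k-j)}^{k} a_{2s+j-k}\, c^{\lambda}_s(k, 2s+j-k), \qquad j, k \ge 0,$$

where (to prevent overflow issues) we have

$$c^{\lambda}_s(j,k) = \frac{j+k+\lambda-2s}{j+k+\lambda-s} \times \prod_{t=0}^{s-1}\frac{\lambda+t}{1+t} \times \prod_{t=0}^{j-s-1}\frac{\lambda+t}{1+t} \times \prod_{t=0}^{s-1}\frac{2\lambda+j+k-2s+t}{\lambda+j+k-2s+t} \times \prod_{t=0}^{j-s-1}\frac{k-s+1+t}{k-s+\lambda+t}.$$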
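A direct Python transcription of the product formula and of the entries of $\mathcal{M}_{\lambda}[a]$ may make the indices easier to follow. The names below are hypothetical, and the coefficients are evaluated straight from the products rather than by a recurrence. The check compares the operator against pointwise multiplication of two short $C^{(2)}$ series.

```python
def gegenbauer(lam, n, x):
    """C_n^(lam)(x) by the three-term recurrence, for lam > 0."""
    c_prev, c = 1.0, 2.0 * lam * x
    if n == 0:
        return c_prev
    for k in range(1, n):
        c_prev, c = c, (2.0 * (k + lam) * x * c - (k + 2.0 * lam - 1.0) * c_prev) / (k + 1.0)
    return c

def ultra_eval(coeffs, lam, x):
    """sum_k coeffs[k] C_k^(lam)(x)."""
    return sum(ck * gegenbauer(lam, k, x) for k, ck in enumerate(coeffs))

def linearisation_coeff(lam, s, j, k):
    """c_s^lam(j, k): coefficient of C_{j+k-2s} in the product C_j C_k, 0 <= s <= min(j, k)."""
    val = (j + k + lam - 2 * s) / (j + k + lam - s)
    for t in range(s):
        val *= (lam + t) / (1 + t)
    for t in range(j - s):
        val *= (lam + t) / (1 + t)
    for t in range(s):
        val *= (2 * lam + j + k - 2 * s + t) / (lam + j + k - 2 * s + t)
    for t in range(j - s):
        val *= (k - s + 1 + t) / (k - s + lam + t)
    return val

def ultra_mult_operator(a, lam, rows, cols):
    """rows x cols section of M_lam[a] for a(x) = sum_i a[i] C_i^(lam)(x)."""
    M = [[0.0] * cols for _ in range(rows)]
    for j in range(rows):
        for k in range(cols):
            for s in range(max(0, k - j), k + 1):
                i = 2 * s + j - k          # index of the coefficient of a that contributes
                if i < len(a):
                    M[j][k] += a[i] * linearisation_coeff(lam, s, k, i)
    return M

a = [0.5, -1.0, 0.25]
u = [1.0, 0.3, -0.7, 0.2]
M = ultra_mult_operator(a, 2, len(a) + len(u) - 1, len(u))
product = [sum(M[j][k] * u[k] for k in range(len(u))) for j in range(len(M))]
```

For $\lambda = 1$ every coefficient reduces to 1, which recovers the Toeplitz-plus-Hankel structure of $\mathcal{M}_1[a]$.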

For efficiency, we use a recurrence relation to generate $c^{\lambda}_s(j,k)$.

A high-order example

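$$a_N(x)\frac{d^N u}{dx^N} + a_{N-1}(x)\frac{d^{N-1} u}{dx^{N-1}} + \cdots + a_0(x)u = f(x), \qquad \mathcal{B}u = c,$$

$$\begin{pmatrix} \mathcal{B} \\ \mathcal{M}_N[a^N]\mathcal{D}_N + \mathcal{S}_{N-1}\mathcal{M}_{N-1}[a^{N-1}]\mathcal{D}_{N-1} + \cdots + \mathcal{S}_{N-1}\cdots\mathcal{S}_0\mathcal{M}_0[a^0] \end{pmatrix} u = \begin{pmatrix} c \\ \mathcal{S}_{N-1}\cdots\mathcal{S}_0 f \end{pmatrix}$$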

Example 3: High-order ODE

$$u^{(10)}(x) + \cosh(x)u^{(8)}(x) + \cos(x)u^{(2)}(x) + x^2u(x) = 0,$$

$$u(\pm 1) = 0, \qquad u'(\pm 1) = 1, \qquad u^{(k)}(\pm 1) = 0, \quad k = 2, 3, 4.$$

$$\left(\int_{-1}^{1}\left(\tilde{u}(x) + \tilde{u}(-x)\right)^2 dx\right)^{1/2} = 1.3 \times 10^{-14}.$$

[Figure: left, $u(x)$ against $x$ on $[-1, 1]$; right, $\|u_{n+1} - u_n\|_2$ against $n$ (20 to 140) on a logarithmic scale.]

Airy equation and regularity preserving

Example 4

Consider Airy's equation for $\epsilon > 0$,

$$\epsilon u''(x) - xu(x) = 0, \qquad u(-1) = \mathrm{Ai}\left(-\sqrt[3]{1/\epsilon}\right), \qquad u(1) = \mathrm{Ai}\left(\sqrt[3]{1/\epsilon}\right).$$

[Figure: left, condition number against $n$ (up to $3 \times 10^4$) for $\epsilon = 10^{-9}$, $10^{-4}$ and $1$; right, relative error in the derivative at $x = 1/2$ against $n$ (0 to 200) for $\mathrm{Ai}^{(1)}$, $\mathrm{Ai}^{(20)}$ and $\mathrm{Ai}^{(100)}$.]

The US method can accurately compute high derivatives of functions. For a different approach, see [Bornemann 2011].

Stability when K = N

Assumption: Number of boundary conditions = Differential order.

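Definition: Higher-order $\ell^2_{\lambda}$-norms

For $\lambda \in \mathbb{N}$, $\ell^2_{\lambda} \subset \mathbb{C}^{\infty}$ is the space with norm

$$\|u\|^2_{\ell^2_{\lambda}} = \sum_{k=0}^{\infty} |u_k|^2 (1+k)^{2\lambda} < \infty.$$

Right diagonal preconditioner:

$$\mathcal{R} = \frac{1}{2^{N-1}(N-1)!}\begin{pmatrix} I_N & & & \\ & \frac{1}{N} & & \\ & & \frac{1}{N+1} & \\ & & & \ddots \end{pmatrix}.$$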

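Stability when K = N

Recall the notation: Solve $\mathcal{L}u = f$ with $\mathcal{B}u = c$, $\mathcal{B} : \ell^2_D \to \mathbb{C}^K$ bounded, $D \ge 1$.

Theorem: Condition number is bounded with $n$

Let $\begin{pmatrix} \mathcal{B} \\ \mathcal{L} \end{pmatrix} : \ell^2_{\lambda+1} \to \ell^2_{\lambda}$ be an invertible operator for some $\lambda \ge D - 1$. Then, as $n \to \infty$,

$$\|A_nR_n\|_{\ell^2_{\lambda}} = O(1), \qquad \|(A_nR_n)^{-1}\|_{\ell^2_{\lambda}} = O(1),$$

where

$$R_n = P_n\mathcal{R}P_n^T, \qquad A_n = P_n\begin{pmatrix} \mathcal{B} \\ \mathcal{L} \end{pmatrix}P_n^T.$$

Proof uses integral reformulation ideas: “$\mathcal{R}$ integrates the ODE to form a Fredholm operator”, i.e.,

$$\begin{pmatrix} \mathcal{B} \\ \mathcal{L} \end{pmatrix}\mathcal{R} = I + \mathcal{K}, \qquad \mathcal{K} \text{ compact}.$$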

Convergence with K = N

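Theorem: Computed solution converges in high-order norms

Suppose $f \in \ell^2_{\lambda-N+1}$ for some $\lambda \ge D - 1$, and that $\begin{pmatrix} \mathcal{B} \\ \mathcal{L} \end{pmatrix} : \ell^2_{\lambda+1} \to \ell^2_{\lambda}$ is an invertible operator. Define

$$u_n = A_n^{-1}P_n\begin{pmatrix} c \\ \mathcal{S}_{N-1}\cdots\mathcal{S}_0 f \end{pmatrix},$$

then

$$\|u - P_n^T u_n\|_{\ell^2_{\lambda+1}} \le C\|u - P_n^T P_n u\|_{\ell^2_{\lambda+1}}.$$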

[Figure: relative error in the derivative at $x = 1/2$ against $n$ (0 to 200) for $\mathrm{Ai}^{(1)}$, $\mathrm{Ai}^{(20)}$ and $\mathrm{Ai}^{(100)}$.]

The constant, $C$, does not depend on $n$.
You don't always need the complex plane to compute high derivatives.

Fast linear algebra: The QTP* factorisation

[Figure: sparsity patterns of the original matrix, after left Givens rotations, and after right Givens rotations.]

Apply a partial factorisation on the left, followed by a partial factorisation on the right.

Factorisation: $A_n = QTP^*$ in $O(m^2 n)$ operations.

No fill-in: The matrix $T$ contains no more non-zeros than $A_n$.

For solving $A_n x = b$:
The matrix $Q$ can be immediately applied to $b$.
The matrix $P^*$ can be stored using Demmel's trick [Demmel 1997].
UMFPACK by Davis is faster for $n \le 50000$, but has observed complexity of, roughly, $O(n^{1.25})$ with a very small constant.

Constant coefficient PDEs

Consider the constant coefficient PDE

$$\sum_{j=0}^{N}\sum_{i=0}^{N-j} a_{ij}\frac{\partial^{i+j}u}{\partial x^i\,\partial y^j} = f(x, y), \qquad a_{ij} \in \mathbb{R},$$

defined on $[a, b] \times [c, d]$ with linear boundary conditions satisfying continuity conditions. We discretise as a generalised Sylvester matrix equation

$$\sum_{j=1}^{k} \sigma_j A_j X B_j^T = F, \qquad A_j \in \mathbb{R}^{n_1 \times n_1}, \quad B_j \in \mathbb{R}^{n_2 \times n_2},$$

where $A_j$ and $B_j$ are US discretisations.

[Figure: sparsity pattern of an example $A_j$.]

Determining the rank of a PDE operator

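Given a PDE with constant coefficients of the form

$$\mathcal{L} = \sum_{j=0}^{N}\sum_{i=0}^{N-j} a_{ij}\frac{\partial^{i+j}}{\partial x^i\,\partial y^j}, \qquad A = (a_{ij}),$$

we can rewrite this as

$$\mathcal{L} = \mathcal{D}(y)^T A\, \mathcal{D}(x), \qquad \mathcal{D}(x) = \begin{pmatrix} \partial^0/\partial x^0 \\ \vdots \\ \partial^N/\partial x^N \end{pmatrix}.$$

Hence, if $A = U\Sigma V^T$ is the truncated SVD of $A$ with $\Sigma \in \mathbb{R}^{k \times k}$ then

$$\mathcal{L} = \mathcal{D}(y)^T\left(U\Sigma V^T\right)\mathcal{D}(x) = \left(\mathcal{D}(y)^T U\right)\Sigma\left(V^T\mathcal{D}(x)\right).$$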

This low-rank representation for $\mathcal{L}$ tells us how to discretise and solve the PDE.
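To make the notion of rank concrete, the Python sketch below computes the rank of the coefficient matrix $A$ for two operators. It is an illustration under stated assumptions: exact rational elimination is swapped in for the SVD (the rank is the same, but no singular vectors are produced), the function name is hypothetical, and the matrices are indexed so that entry $(j, i)$ multiplies $\partial^{i+j}/\partial x^i\partial y^j$.

```python
from fractions import Fraction

def coefficient_rank(A):
    """Rank of the coefficient matrix A, by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    rank = 0
    for col in range(cols):
        # Look for a non-zero pivot at or below the current pivot row.
        pivot = next((r for r in range(rank, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, rows):
            factor = M[r][col] / M[rank][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank

omega = 50
# Helmholtz operator u_xx + u_yy + 2*omega^2*u: entries a_00, a_20 and a_02.
helmholtz = [[2 * omega ** 2, 0, 1],
             [0, 0, 0],
             [1, 0, 0]]
# u_xy + u_x + u_y + u = (d/dx + 1)(d/dy + 1) u factorises, so its rank is 1.
separable = [[1, 1],
             [1, 1]]
ranks = (coefficient_rank(helmholtz), coefficient_rank(separable))
```

The rank equals the number of terms $k$ in the generalised Sylvester equation, so it decides which of the matrix equation solvers applies.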

Matrix equation solvers

Rank-1: $A_1 X B_1^T = F$. Solve $A_1 Y = F$, then $B_1 X^T = Y^T$.
Rank-2: $A_1 X B_1^T + A_2 X B_2^T = F$. Generalised Sylvester solver [Jonsson & Kågström, 2002].
Rank-$k$, $k \ge 3$: Solve the $n_1 n_2 \times n_1 n_2$ system with the QTP* factorisation.

Boundary conditions are imposed by carefully removing degrees of freedom in the matrix equation.
Automatically forms and solves subproblems, if possible.
Corner singularities can be partly resolved by domain subdivision.

[Figure: cost of the three solvers against $n_1$ and $n_2$, with curves labelled $O(n_1^3 n_2)$, $O(n_1^3 + n_2^3)$ and $O(n_1 n_2)$.]
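The rank-1 case reduces to two ordinary linear solves. A minimal Python sketch follows, assuming small dense matrices and a plain Gauss-Jordan solver in place of the banded solvers the method would use in practice; all names are hypothetical.

```python
def solve_multi(A, B):
    """Solve A Z = B for Z with Gauss-Jordan elimination and partial pivoting."""
    n, m = len(A), len(B[0])
    M = [A[i][:] + B[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(m)] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve_rank1(A1, B1, F):
    """Solve A1 X B1^T = F: first A1 Y = F, then B1 X^T = Y^T."""
    Y = solve_multi(A1, F)
    Xt = solve_multi(B1, transpose(Y))
    return transpose(Xt)

# Build F from a known X and recover it.
A1 = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
B1 = [[1.0, 2.0], [0.0, 1.0]]
X_true = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
F = matmul(matmul(A1, X_true), transpose(B1))
X = solve_rank1(A1, B1, F)
```

Only matrices of size $n_1$ and $n_2$ are ever factorised, never one of size $n_1 n_2$.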

Examples

Example 5: Helmholtz equation

$$u_{xx} + u_{yy} + 2\omega^2 u = 0, \qquad u(\pm 1, y) = f(\pm 1, y), \qquad u(x, \pm 1) = f(x, \pm 1),$$

where $f(x, y) = \cos(\omega x)\cos(\omega y)$.

    N = chebop2(@(u) ... );
    N.lbc=f(-1,:); u=N\0;

Adaptively selects the discretisation size.
Forms a chebfun2 object [T. & Trefethen 2013].
For $\omega = 2000$, $\|\tilde{u} - u\|_2 = 2.43 \times 10^{-9}$.

[Figure: the solution for $\omega = 50$ on $[-1, 1]^2$.]

For ω = 50 solve takes 0.27s for n1 = n2 = 126, for ω = 2000 solve takes 32.1s for n1 = n2 = 2124.

Conclusions

The US method results in almost banded, well-conditioned matrices.
It requires $O(m^2 n)$ operations to solve an ODE.
It can be used as a preconditioner for the collocation method.
The US method converges and is numerically stable.
It can be extended to a spectral method for solving PDEs on rectangular domains.

Historical example [Orszag 1971]

Orr-Sommerfeld equation

$$\frac{1}{5772.2}\left(u'''' - 2u'' + u\right) - 2iu - i(1 - x^2)(u'' - u) = \lambda(u'' - u), \qquad u(\pm 1) = u'(\pm 1) = 0.$$

[Figure: eigenvalues of the Orr-Sommerfeld operator, imaginary part against real part.]

The end

Thank you

More information: S. Olver & T., A fast and well-conditioned spectral method, to appear in SIAM Review, (2013).

References

F. Bornemann, Accuracy and stability of computing high-order derivatives of analytic functions by Cauchy integrals, Found. Comput. Math., 11 (2011), pp. 1–63.

I. Jonsson & B. Kågström, Recursive blocked algorithms for solving triangular matrix equations - Part II: Two-sided and generalized Sylvester and Lyapunov equations, ACM Trans. Math. Softw., 28 (2002), pp. 416–435.

S. Olver & T., A fast and well-conditioned spectral method, to appear in SIAM Review, (2013).

S. Orszag, Accurate solution of the Orr-Sommerfeld stability equation, J. Fluid Mech., 50 (1971), pp. 689–703.

T. & L. N. Trefethen, An extension of Chebfun to two dimensions, likely to appear in SISC, (2013).

L. N. Trefethen et al., The Chebfun software system, version 4.2, 2011.

Chebyshev polynomials

An underappreciated fact: Every Chebyshev series can be rewritten as a palindromic Laurent series.

The degree $k$ Chebyshev polynomial (of the first kind) is defined by

$$T_k(x) = \cos\left(k\cos^{-1}x\right) = \frac{1}{2}\left(z^k + z^{-k}\right), \qquad x \in [-1, 1],$$

where $z = e^{i\theta}$ and $\cos\theta = x$.

[Figure: $T_{25}(x)$ against $x$ on $[-1, 1]$, and $T_{25}(\cos\theta) = \cos(25\theta)$ against $\theta$ on $[0, \pi]$.]

Every Chebyshev series can be written as a palindromic Laurent series:

$$\sum_{k=0}^{N} \alpha_k T_k(x) = \frac{1}{2}\sum_{k=1}^{N} \alpha_k z^{-k} + \alpha_0 + \frac{1}{2}\sum_{k=1}^{N} \alpha_k z^{k}, \qquad z = e^{i\theta}.$$
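The identity can be confirmed numerically with complex arithmetic. In this short Python sketch (function names are illustrative) the Laurent series is evaluated at $z = e^{i\theta}$ with $\cos\theta = x$ and compared with the Chebyshev series at $x$; the coefficients of $z^k$ and $z^{-k}$ are equal, which is what makes the series palindromic.

```python
import cmath
import math

def cheb_series(alpha, x):
    """sum_k alpha[k] T_k(x) from the definition T_k(x) = cos(k arccos x)."""
    return sum(a * math.cos(k * math.acos(x)) for k, a in enumerate(alpha))

def laurent_series(alpha, z):
    """alpha[0] + sum_{k>=1} (alpha[k] / 2) * (z^k + z^(-k))."""
    total = complex(alpha[0])
    for k in range(1, len(alpha)):
        total += 0.5 * alpha[k] * (z ** k + z ** (-k))
    return total

alpha = [0.4, -1.2, 0.0, 2.5, 0.3]
x = 0.3
z = cmath.exp(1j * math.acos(x))       # z lies on the unit circle, cos(theta) = x
value = laurent_series(alpha, z)       # real part matches cheb_series(alpha, x)
```

On the unit circle $z^{-k}$ is the complex conjugate of $z^k$, so the imaginary parts cancel and the sum is real up to rounding.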

Demmel's trick for storing Givens rotations

A Givens rotation zeros out one entry of a matrix and we can store the rotation information there.

[Diagram: a Givens rotation applied to a $4 \times 4$ matrix; the zeroed entry is overwritten with the stored value $p$.]

Let $s = \sin\theta$ and $c = \cos\theta$, where $\theta$ is the rotation angle, then

$$p = \begin{cases} s \cdot \operatorname{sign}(c), & |s| < |c|,\\ \operatorname{sign}(s)/c, & |s| \ge |c|.\end{cases}$$

To restore (up to a rotation by 180 degrees): If $|p| < 1$, then $s = p$ and $c = \sqrt{1 - s^2}$, otherwise $c = 1/p$ and $s = \sqrt{1 - c^2}$.
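The encoding and decoding rules translate directly into code. The Python sketch below follows the two formulas above; the function names are hypothetical, and the handling of $c = 0$ (storing $p = 1$, a value the two regular cases can never produce) is an assumption added here to avoid a division by zero, not something stated on the slide.

```python
import math

def sign(v):
    return 1.0 if v >= 0.0 else -1.0

def encode_rotation(c, s):
    """Pack a rotation (c, s) with c^2 + s^2 = 1 into a single number p."""
    if c == 0.0:
        return 1.0                      # assumed sentinel for c = 0 (not from the slide)
    if abs(s) < abs(c):
        return s * sign(c)              # here |p| < 1/sqrt(2)
    return sign(s) / c                  # here |p| >= sqrt(2)

def decode_rotation(p):
    """Recover (c, s) from p, up to a simultaneous sign flip of both."""
    if p == 1.0:
        return 0.0, 1.0                 # assumed sentinel for c = 0
    if abs(p) < 1.0:
        s = p
        c = math.sqrt(1.0 - s * s)
    else:
        c = 1.0 / p
        s = math.sqrt(1.0 - c * c)
    return c, s

theta = 2.0
c, s = math.cos(theta), math.sin(theta)
c2, s2 = decode_rotation(encode_rotation(c, s))   # equals (c, s) or (-c, -s)
```

The square root is always taken of the smaller of the two components squared subtracted from one, so the recovered pair does not lose accuracy to cancellation.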
