Fast and stable computations with spherical harmonics

Richard Mika¨elSlevinsky Joint work with Hadrien Montanelli and Qiang Du

Department of , University of Manitoba

[email protected]

PDEs on the 2019

April 30, 2019 Spherical Harmonics

Let µ be a positive Borel measure on D ⊂ Rn. The inner product: Z hf , gi = f (x)g(x) dµ(x), D

p 2 induces the norm kf k2 = hf , f i and the L (D, dµ(x)).

Let S2 ⊂ R3 denote the unit 2-sphere and let dΩ = sin θ dθ dϕ. Then any function f ∈ L2(S2, dΩ) may be expanded in spherical harmonics:

+∞ +` +∞ +∞ X X m m X X m m f (θ, ϕ) = f` Y` (θ, ϕ) = f` Y` (θ, ϕ), `=0 m=−` m=−∞ `=|m| where the expansion coefficients are:

m m hY` , f i f` = m m hY` , Y` i

2 of 24 Spherical Harmonics

For ` ∈ N0 and |m| ≤ `, orthonormal spherical harmonics are defined by: s eimϕ (` − |m|)! Y m(x) = Y m(θ, ϕ) = √ (−1)|m| (` + 1 ) P|m|(cos θ) . ` ` 2π 2 (` + |m|)! ` | {z } ˜|m| P` (cos θ) Associated Legendre functions are defined by ultraspherical polynomials:

1 m m 1 m (m+ 2 ) P` (cos θ) = (−2) ( 2 )m sin θC`−m (cos θ).

˜m The notation P` is used to denote , and: Γ(x + n) (x) = n Γ(x) is the Pochhammer symbol for the rising factorial. 2 of 24 Synthesis and Analysis

A band-limited function of degree-n on the sphere has no nonzero spherical harmonic expansion coefficient of degree-n or greater:

n−1 +` X X m m fn−1(θ, ϕ) = f` Y` (θ, ϕ). `=0 m=−`

m For a spherical harmonic Y` , ` is the degree and m is the order. The transforms of synthesis and analysis convert between representations of a band-limited function in momentum and physical spaces: Synthesis Sample a band-limited function at a set of points on S2. Analysis Convert samples on S2 to spherical harmonic expansion coefficients. Normally, an appropriate set of points is chosen to be able to perfectly reconstruct a band-limited function of degree-n.

3 of 24 Synthesis and Analysis

Point sets include:

Equiangular θk and ϕj are equispaced and their Cartesian product is taken. These include the celebrated Driscoll–Healy (2n)(2n − 1) and McEwen–Wiaux (n − 1)(2n − 1) + 1 sampling theorems.

Gaussian cos θk are Gauss–Legendre points and ϕj are equispaced and their Cartesian product is taken. HEALPix procedure to generate points that correspond to a hierarchical equal area isolatitude pixelization of S2. Random with common distributions (e.g. uniformly distributed on S2). Data-driven may be treated similar to random. The na¨ıvecost of synthesis and analysis is O(n4) but it can be trivially reorganized to O(n3) for isolatitudinal point sets.

3 of 24 The Connection Problem

Complementary to synthesis and analysis is the connection problem: the exact expansion of spherical harmonics in another basis. The sphere is doubly periodic and supports a bivariate . The (N)FFT solves synthesis and analysis with bivariate Fourier series. eimϕ Y m(θ, ϕ) = √ P˜m(cos θ) ⇒ in , we are done. ` 2π ` ˜m The problem is to convert P` (cos θ) to Fourier series. (m+ 1 ) ˜m m 2 Since P` (cos θ) ∝ sin θC`−m (cos θ), the intuition is that ˜m even-ordered P` are trigonometric polynomials in cos θ; and, ˜m odd-ordered P` are trigonometric polynomials in sin θ. Conversions are not one-to-one. The connection problem is an analogue of the McEwen–Wiaux sampling theorem.

4 of 24 Connection vs. Synth. & Anal.

10 10 2 SHTns SHTns 10 11 2 FT FT

10 12

10 13 Relative Error

10 14

10 15

102 103 104 n + 1 5 of 24 Fast Transforms

Fast transforms should: have an O(n2 logO(1) n) run-time with a similar pre-computation for synthesis and analysis or the connection problem; be (backward) stable; and, be fast in practice. Algorithms may be classified: as either exact in exact arithmetic and unstable in floating-point arithmetic, and numerically stable algorithms in fixed precision that are approximate to some arbitrarily small tolerance ; as either acceleration of synthesis and analysis (>10), or acceleration of the connection problem (3); or, by analytical apparatus: split-Legendre functions, WKB approximation, the fast multipole method (FMM), and the butterfly algorithm.

6 of 24 The SH Connection Problem

Definition

Let {φn(x)}n≥0 be a family of with respect to L2(Dˆ, dµˆ(x)); and,

let {ψn(x)}n≥0 be another family of orthogonal functions with respect to L2(D, dµ(x)). The connection coefficients:

hψ`, φnidµ c`,n = , hψ`, ψ`idµ allow for the expansion:

∞ X φn(x) = c`,nψ`(x). `=0

7 of 24 The SH Connection Problem

Theorem Let {φn(x)}n≥0 and {ψn(x)}n≥0 be two families of orthonormal functions with respect to L2(D, dµ(x)). Then the connection coefficients satisfy:

∞ X c`,mc`,n = δm,n. `=0

Any matrix A ∈ Rm×n, m ≥ n, with orthonormal columns is well-conditioned and Moore–Penrose pseudo-invertible A+ = A>. ˜m For every m, the P` (x) are a family of orthonormal functions for the same Hilbert space L2([−1, 1], dx).

7 of 24 The SH Connection Problem

Definition Let Gn denote the real Givens rotation: 1 ··· 0 0 0 ··· 0 ......  ......    0 ··· cn 0 sn ··· 0   Gn = 0 ··· 0 1 0 ··· 0 ,   0 · · · −sn 0 cn ··· 0   ......  ......  0 ··· 0 0 0 ··· 1 where the sines and the cosines are in the intersections of the nth and n + 2nd rows and columns, embedded in the identity of a conformable size.

7 of 24 The SH Connection Problem

Theorem (Slevinsky 2017a) ˜m+2 ˜m The connection coefficients between Pn+m+2(cos θ) and P`+m(cos θ) are:

s  (` + 2m)! (n + m + 5 )n!  (2` + 2m + 1)(2m + 2) 2 , for ` ≤ n, ` + n even,  1  (` + m + 2 )`! (n + 2m + 4)! m  s c`,n = (n + 1)(n + 2)  − , for ` = n + 2,   (n + 2m + 3)(n + 2m + 4)  0, otherwise.

Furthermore, the matrix of connection coefficients (m) (m) (m) (m) (m) C = G0 G1 ··· Gn−1Gn I(n+3)×(n+1), where the sines and cosines are: s s (n + 1)(n + 2) (2m + 2)(2n + 2m + 5) sm = , and cm = . n (n + 2m + 3)(n + 2m + 4) n (n + 2m + 3)(n + 2m + 4)

7 of 24 The SH Connection Problem

“Proof.” W.l.o.g., consider m = 0 and n = 5.

 0.91287 0.0 0.31623 0.0 0.17593 0.0   0.0 0.83666 0.0 0.39641 0.0 0.24398    −0.40825 0.0 0.70711 0.0 0.3934 0.0    (0)  0.0 −0.54772 0.0 0.60553 0.0 0.37268  C =    0.0 0.0 −0.63246 0.0 0.5278 0.0     0.0 0.0 0.0 −0.69007 0.0 0.46718     0.0 0.0 0.0 0.0 −0.73193 0.0  0.0 0.0 0.0 0.0 0.0 −0.76376

7 of 24 The SH Connection Problem

(0)> “Proof.” Apply the Givens rotation G0 : 1.0 0.0 0.0 0.0 0.0 0.0  0.0 0.83666 0.0 0.39641 0.0 0.24398    0.0 0.0 0.7746 0.0 0.43095 0.0    (0)> (0) 0.0 −0.54772 0.0 0.60553 0.0 0.37268  G C =   0 0.0 0.0 −0.63246 0.0 0.5278 0.0    0.0 0.0 0.0 −0.69007 0.0 0.46718    0.0 0.0 0.0 0.0 −0.73193 0.0  0.0 0.0 0.0 0.0 0.0 −0.76376

7 of 24 The SH Connection Problem

“Proof.” And again:

1.0 0.0 0.0 0.0 0.0 0.0  0.0 1.0 0.0 0.0 0.0 0.0    0.0 0.0 0.7746 0.0 0.43095 0.0    (0)> (0)> (0) 0.0 0.0 0.0 0.72375 0.0 0.44544  G G C =   1 0 0.0 0.0 −0.63246 0.0 0.5278 0.0    0.0 0.0 0.0 −0.69007 0.0 0.46718    0.0 0.0 0.0 0.0 −0.73193 0.0  0.0 0.0 0.0 0.0 0.0 −0.76376

7 of 24 The SH Connection Problem

“Proof.” And again:

1.0 0.0 0.0 0.0 0.0 0.0  0.0 1.0 0.0 0.0 0.0 0.0    0.0 0.0 1.0 0.0 0.0 0.0    (0)> (0)> (0)> (0) 0.0 0.0 0.0 0.72375 0.0 0.44544  G G G C =   2 1 0 0.0 0.0 0.0 0.0 0.68139 0.0    0.0 0.0 0.0 −0.69007 0.0 0.46718    0.0 0.0 0.0 0.0 −0.73193 0.0  0.0 0.0 0.0 0.0 0.0 −0.76376

7 of 24 The SH Connection Problem

“Proof.” And again:

1.0 0.0 0.0 0.0 0.0 0.0  0.0 1.0 0.0 0.0 0.0 0.0    0.0 0.0 1.0 0.0 0.0 0.0    (0)> (0)> (0)> (0)> (0) 0.0 0.0 0.0 1.0 0.0 0.0  G G G G C =   3 2 1 0 0.0 0.0 0.0 0.0 0.68139 0.0    0.0 0.0 0.0 0.0 0.0 0.6455    0.0 0.0 0.0 0.0 −0.73193 0.0  0.0 0.0 0.0 0.0 0.0 −0.76376

7 of 24 The SH Connection Problem

“Proof.” And again:

1.0 0.0 0.0 0.0 0.0 0.0  0.0 1.0 0.0 0.0 0.0 0.0    0.0 0.0 1.0 0.0 0.0 0.0    (0)> (0)> (0)> (0)> (0)> (0) 0.0 0.0 0.0 1.0 0.0 0.0  G G G G G C =   4 3 2 1 0 0.0 0.0 0.0 0.0 1.0 0.0    0.0 0.0 0.0 0.0 0.0 0.6455    0.0 0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0 −0.76376

7 of 24 The SH Connection Problem

“Proof.” And finally:

1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0   0.0 0.0 1.0 0.0 0.0 0.0   (0)> (0)> (0)> (0)> (0)> (0)> (0) 0.0 0.0 0.0 1.0 0.0 0.0 G G G G G G C =   5 4 3 2 1 0 0.0 0.0 0.0 0.0 1.0 0.0   0.0 0.0 0.0 0.0 0.0 1.0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

7 of 24 The SH Connection Problem

“Proof.” Schematically:

(m)

C . I

= .

.

(m) (m) G0 ··· Gn

Conversion between neighbouring layers is O(n) flops and storage. The Givens rotations are computed to high relative accuracy due to analytical expressions of sines and cosines ⇒ backward stable.

7 of 24 Spherical Harmonics to Fourier

˜` ˜1 P` P` sin θU` ˜`−1 ˜0 P` P` T` ...... ˜3 =⇒ ˜1 =⇒ P` P` sin θU` ˜2 ˜0 P` P` T` ˜1 ˜1 P` P` sin θU` ˜0 ˜0 P` P` T`

1 Convert high-order layers to layers of order 0 and 1 in O(n3) flops and O(n2) storage; and, 2 Convert low-order layers to Fourier series in O(n2 log n) flops and O(n log n) storage `ala Fast Multipole Method. To make an algorithm fast, we usually need to make approximations.

8 of 24 A Fast Transform

To convert high-order layers to layers of order 0 and 1 in O(n2 log2 n) flops, we need something more. ˜2m ˜0 For m ∈ N, the connection coefficients between P`+2m and Pn are given by the inner product: Z 1 2m ˜2m ˜0 c`,n = P`+2m(x)Pn (x) dx. −1 ˜0 Using the Fourier transform of Pn : q n 1 1 (−i) n + 2 Z Z c2m = j (k) dk eikx P˜2m (x) dx. `,n π n `+2m R −1 The matrix of connection coefficients is an operator composition with the variables (n × k) × (k × x) × (x × `). The Fourier integral operator is special.

9 of 24 The Butterfly Algorithm

Purpose: Abstract the algebra of the FFT. Technique: Divide-and-conquer ⇔ merge-and-split. Technology: The interpolative decomposition. Proof: Fourier integral operators have rank-proportional-to-area. The ranks of operator compositions are bounded by the smallest rank in the composition, extending applicability beyond Fourier integral operators.

10 of 24 The Interpolative Decomposition

Lemma m×n m×k Let A ∈ R . For any k, there exist ACS ∈ R whose columns are a k×n unique subset of the columns of A and AI ∈ R such that: 1 some subset of the columns of AI makes up the k × k identity matrix;

2 kvec(AI)k∞ ≤ 1;

3 p the spectral norm of AI satisfies kAIk2 ≤ k(n − k) + 1; 4 the least singular value of AI is at least 1;

5 ACSAI = A whenever k = m or k = n; and,

6 when k < min{m, n}, the spectral norm of A − ACSAI satisfies: p kA − ACSAIk2 ≤ k(n − k) + 1σk+1.

st where σk+1 is the k + 1 singular value of A.

We say that A ≈ ACSAI and any structure in A is also in ACS. 11 of 24 The Butterfly Algorithm

Step 1: Partition A ∈ Rn×n into thin strips. Compute IDs of each subblock.

12 of 24 The Butterfly Algorithm

Step 2 (a): Merge the strips and split them approximately in half.

12 of 24 The Butterfly Algorithm

Step 2 (b): Compute IDs of each subblock.

12 of 24 The Butterfly Algorithm

Step 3 (a): Again, merge the strips and split them approximately in half.

12 of 24 The Butterfly Algorithm

Step 3 (b): Compute IDs of each subblock.

12 of 24 The Butterfly Algorithm

Step 4 (a): Final step, merge the strips and split them approximately in half.

12 of 24 The Butterfly Algorithm

Step 4 (b): Final step, compute IDs of each subblock.

12 of 24 The Butterfly Algorithm

Let A ∈ Rn×n have rank-proportional-to-area. Every time we merge-and-split, the complexity of a matrix-vector product is approximately halved. We have created a permuted and sparse block-diagonal factorization. 2 Costs O(kavgn ) to compute the factorization and O(kavgn log n) for a matrix-vector product, where kavg is the average rank of all IDs. To convert all associated Legendre functions to orders 0 and 1, requires 3 2 O(kavgn ) flops to pre-compute and O(kavgn log n) to apply.

12 of 24 Spherical Harmonics to Fourier

˜` ˜1 P` P` sin θU` ˜`−1 ˜0 P` P` T` ...... ˜3 =⇒ ˜1 =⇒ P` P` sin θU` ˜2 ˜0 P` P` T` ˜1 ˜1 P` P` sin θU` ˜0 ˜0 P` P` T`

2 1 Convert high-order layers to layers of order 0 and 1 in O(kavgn log n) 2 flops and O(kavgn log n) storage; and, 2 Convert low-order layers to Fourier series in O(n2 log n) flops and O(n log n) storage `ala Fast Multipole Method. 3 Can we beat O(kavgn ) pre-computation?

13 of 24 Skeletonizing the Pre-Computation Skeletonizing the pre-computation makes it practical for a laptop. Neighbouring layers are converted via Given rotations.

n

m

` } ) {z avg | (k O 14 of 24 Conclusions

New transforms are created to convert spherical harmonic expansions into bivariate Fourier series. They are asymptotically fast, O(n2 log2 n), and backward stable by construction. For practical band-limits of n ≤ O(10, 000), the asymptotically optimal complexity does not yet appear. They are freely available in Julia in FastTransforms.jl and in C in FastTransforms. Is there a pre-computation-free method? Yes! [Slevinsky 2017b] uses fast symmetric tridiagonal eigensolvers (via FMM) to reduce pre-computation 3 3 from O(n ) down to O(n 2 log n), which is superoptimal. The approach extends to more exotic 2D harmonic polynomials on the disk, triangle, rectangle, deltoid, wedge, etc. . . .

15 of 24 Time Evolution

Semi-linear PDEs with nonlocal diffusion have many real-world applications:

2 ut =  Lδu + N (u), u(t = 0, x) = u0(x),  > 0,

where Lδ is a linear nonlocal operator, and N is not. Our nonlocal model is: Z Lδu(x) = ρδ(|x − y|)[u(y) − u(x)] dΩ(y), 2 S

The kernel ρδ is nonnegative and compactly supported by a horizon 0 < δ ≤ 2. For example, N (u) = u − u3 in the Allen–Cahn equation.

Problem: Lδ is localized in momentum (coefficient) space and N is localized in physical (value) space.

16 of 24 Application: Nonlocal Diffusion Joint work with Hadrien Montanelli & Qiang Du (Columbia). Our nonlocal diffusion is: Z Lδu(x) = ρδ(|x − y|)[u(y) − u(x)] dΩ(y), 2 S where: 4(1 + α) ρδ(|x − y|) = χ[0,δ](|x − y|), πδ2+2α |x − y|2−2α where α ∈ (−1, 1) and 0 < δ ≤ 2 are parameters governing the strength. Using the Funk–Hecke formula, we proved: m m LδY` (x) = λδ(`)Y` (x), with eigenvalues: 1 Z p λδ(`) = 2π (P`(t) − 1) ρδ( 2(1 − t)) dt. 1−δ2/2

The constants ensure that λδ(`) → −`(` + 1) as δ → 0. 17 of 24 A View of the Spectrum

8 10 0( )

0.01( ) 5 .

0 7 10 0.1( )

1( ) = 6 ( ) 10 2 t a

) ( 105

f o

n 4

o 10 i t a u l 3 a 10 v E

l a

c 2 i 10 r e m

u 1

N 10

100 100 101 102 103 104

18 of 24 A View of the Spectrum

8 10 0( )

0.01( ) 5

. 7 10 0.1( ) 0

= 1( )

6 t 10 2( ) a

) ( 5

10 f o

n

o 4 i

t 10 a u l

a 3 v 10 E

l a c i r 102 e m u

N 101

100 100 101 102 103 104

18 of 24 Local Allen–Cahn

2 3 ut =  ∆u + u − u , t = 0

19 of 24 Local Allen–Cahn

2 3 ut =  ∆u + u − u , t = 1

19 of 24 Local Allen–Cahn

2 3 ut =  ∆u + u − u , t = 4

19 of 24 Local Allen–Cahn

2 3 ut =  ∆u + u − u , t = 16

19 of 24 Local Allen–Cahn

2 3 ut =  ∆u + u − u , t = 64

19 of 24 Typical Alloy Microstructure

Google image search of “microstructure of alloys”

20 of 24 Typical Alloy Microstructure

Google image search of “microstructure of alloys”

20 of 24 Nonlocal Allen–Cahn

2 3 ut =  Lδu + u − u , t = 0

21 of 24 Nonlocal Allen–Cahn

2 3 ut =  Lδu + u − u , t = 1

21 of 24 Nonlocal Allen–Cahn

2 3 ut =  Lδu + u − u , t = 4

21 of 24 Nonlocal Allen–Cahn

2 3 ut =  Lδu + u − u , t = 16

21 of 24 Nonlocal Allen–Cahn

2 3 ut =  Lδu + u − u , t = 64

21 of 24 Convergence Results

Spherical harmonic expansions are spectrally accurate.

10 1

10 3

10 5

10 7 Relative Error

10 9

10 11 t = 1 t = 4 10 13 100 200 300 400 500 22 of 24 n Convergence Results

We use ETDRK4 which is fourth-order in time.

10 1

10 3

10 5

10 7 Relative Error

10 9

t = 1 10 11 t = 4 (h4) 10 13 10 2 10 1 h 22 of 24 Convergence Results

Sufficiently small time steps ⇒ spectral spatial accuracy. Sufficiently large degrees ⇒ fourth-order temporal accuracy. It is known that the nonlocal steady-state may be discontinuous [Du & Yang 2016]. Why use a spectral method? A spectral method converges with spectral accuracy arbitrarily close to the discontinuity. Ces`aro(C, 2) summation as global post-processing removes the Gibbs phenomenon ⇒ pointwise convergence almost everywhere. No better or worse than a finite difference/element/volume method, but the spectrum is trivial. Analysis question: does the solution become discontinuous in finite or infinite time?

22 of 24 References

1 B. K. Alpert and V. Rokhlin. SIAM J. Sci. Stat. Comput., 12:158–179, 1991.

2 L. Greengard and V. Rokhlin. J. Comp. Phys., 73:325–348, 1987.

3 E. Liberty et al. Proc. Nat. Acad. Sci., 104:20167–20172, 2007.

4 M. J. Mohlenkamp. J. Fourier Anal. Appl., 5:159–184, 1999.

5 A. Mori, R. Suda, and M. Sugihara. 40:3612–3615, 1999.

6 V. Rokhlin and M. Tygert. SIAM J. Sci. Comput., 27:1903–1928, 2006.

7 R. M. Slevinsky. Appl. Comp. Harmon. Anal., 2017.

8 R. M. Slevinsky. arXiv:1711.07866, 2017.

9 R. M. Slevinsky, H. Montanelli, and Q. Du. J. Comp. Phys., 372:893–911, 2018.

10 R. Suda and M. Takami. Math. Comp., 71:703–715, 2002.

11 M. Tygert. J. Comp. Phys., 227:4260–4279, 2008.

12 M. Tygert. J. Comp. Phys., 229:6181–6192, 2010.

23 of 24 Fin

Thank you!

24 of 24