Fast and stable computations with spherical harmonics
Richard Mika¨elSlevinsky Joint work with Hadrien Montanelli and Qiang Du
Department of Mathematics, University of Manitoba
PDEs on the Sphere 2019
April 30, 2019 Spherical Harmonics
Let µ be a positive Borel measure on D ⊂ Rn. The inner product: Z hf , gi = f (x)g(x) dµ(x), D
p 2 induces the norm kf k2 = hf , f i and the Hilbert space L (D, dµ(x)).
Let S2 ⊂ R3 denote the unit 2-sphere and let dΩ = sin θ dθ dϕ. Then any function f ∈ L2(S2, dΩ) may be expanded in spherical harmonics:
+∞ +` +∞ +∞ X X m m X X m m f (θ, ϕ) = f` Y` (θ, ϕ) = f` Y` (θ, ϕ), `=0 m=−` m=−∞ `=|m| where the expansion coefficients are:
m m hY` , f i f` = m m hY` , Y` i
2 of 24 Spherical Harmonics
For ` ∈ N0 and |m| ≤ `, orthonormal spherical harmonics are defined by: s eimϕ (` − |m|)! Y m(x) = Y m(θ, ϕ) = √ (−1)|m| (` + 1 ) P|m|(cos θ) . ` ` 2π 2 (` + |m|)! ` | {z } ˜|m| P` (cos θ) Associated Legendre functions are defined by ultraspherical polynomials:
1 m m 1 m (m+ 2 ) P` (cos θ) = (−2) ( 2 )m sin θC`−m (cos θ).
˜m The notation P` is used to denote orthonormality, and: Γ(x + n) (x) = n Γ(x) is the Pochhammer symbol for the rising factorial. 2 of 24 Synthesis and Analysis
A band-limited function of degree-n on the sphere has no nonzero spherical harmonic expansion coefficient of degree-n or greater:
n−1 +` X X m m fn−1(θ, ϕ) = f` Y` (θ, ϕ). `=0 m=−`
m For a spherical harmonic Y` , ` is the degree and m is the order. The transforms of synthesis and analysis convert between representations of a band-limited function in momentum and physical spaces: Synthesis Sample a band-limited function at a set of points on S2. Analysis Convert samples on S2 to spherical harmonic expansion coefficients. Normally, an appropriate set of points is chosen to be able to perfectly reconstruct a band-limited function of degree-n.
3 of 24 Synthesis and Analysis
Point sets include:
Equiangular θk and ϕj are equispaced and their Cartesian product is taken. These include the celebrated Driscoll–Healy (2n)(2n − 1) and McEwen–Wiaux (n − 1)(2n − 1) + 1 sampling theorems.
Gaussian cos θk are Gauss–Legendre points and ϕj are equispaced and their Cartesian product is taken. HEALPix procedure to generate points that correspond to a hierarchical equal area isolatitude pixelization of S2. Random with common distributions (e.g. uniformly distributed on S2). Data-driven may be treated similar to random. The na¨ıvecost of synthesis and analysis is O(n4) but it can be trivially reorganized to O(n3) for isolatitudinal point sets.
3 of 24 The Connection Problem
Complementary to synthesis and analysis is the connection problem: the exact expansion of spherical harmonics in another basis. The sphere is doubly periodic and supports a bivariate Fourier series. The (N)FFT solves synthesis and analysis with bivariate Fourier series. eimϕ Y m(θ, ϕ) = √ P˜m(cos θ) ⇒ in longitude, we are done. ` 2π ` ˜m The problem is to convert P` (cos θ) to Fourier series. (m+ 1 ) ˜m m 2 Since P` (cos θ) ∝ sin θC`−m (cos θ), the intuition is that ˜m even-ordered P` are trigonometric polynomials in cos θ; and, ˜m odd-ordered P` are trigonometric polynomials in sin θ. Conversions are not one-to-one. The connection problem is an analogue of the McEwen–Wiaux sampling theorem.
4 of 24 Connection vs. Synth. & Anal.
10 10 2 SHTns SHTns 10 11 2 FT FT
10 12
10 13 Relative Error
10 14
10 15
102 103 104 n + 1 5 of 24 Fast Transforms
Fast transforms should: have an O(n2 logO(1) n) run-time with a similar pre-computation for synthesis and analysis or the connection problem; be (backward) stable; and, be fast in practice. Algorithms may be classified: as either exact in exact arithmetic and unstable in floating-point arithmetic, and numerically stable algorithms in fixed precision that are approximate to some arbitrarily small tolerance ; as either acceleration of synthesis and analysis (>10), or acceleration of the connection problem (3); or, by analytical apparatus: split-Legendre functions, WKB approximation, the fast multipole method (FMM), and the butterfly algorithm.
6 of 24 The SH Connection Problem
Definition
Let {φn(x)}n≥0 be a family of orthogonal functions with respect to L2(Dˆ, dµˆ(x)); and,
let {ψn(x)}n≥0 be another family of orthogonal functions with respect to L2(D, dµ(x)). The connection coefficients:
hψ`, φnidµ c`,n = , hψ`, ψ`idµ allow for the expansion:
∞ X φn(x) = c`,nψ`(x). `=0
7 of 24 The SH Connection Problem
Theorem Let {φn(x)}n≥0 and {ψn(x)}n≥0 be two families of orthonormal functions with respect to L2(D, dµ(x)). Then the connection coefficients satisfy:
∞ X c`,mc`,n = δm,n. `=0
Any matrix A ∈ Rm×n, m ≥ n, with orthonormal columns is well-conditioned and Moore–Penrose pseudo-invertible A+ = A>. ˜m For every m, the P` (x) are a family of orthonormal functions for the same Hilbert space L2([−1, 1], dx).
7 of 24 The SH Connection Problem
Definition Let Gn denote the real Givens rotation: 1 ··· 0 0 0 ··· 0 ...... ...... 0 ··· cn 0 sn ··· 0 Gn = 0 ··· 0 1 0 ··· 0 , 0 · · · −sn 0 cn ··· 0 ...... ...... 0 ··· 0 0 0 ··· 1 where the sines and the cosines are in the intersections of the nth and n + 2nd rows and columns, embedded in the identity of a conformable size.
7 of 24 The SH Connection Problem
Theorem (Slevinsky 2017a) ˜m+2 ˜m The connection coefficients between Pn+m+2(cos θ) and P`+m(cos θ) are:
s (` + 2m)! (n + m + 5 )n! (2` + 2m + 1)(2m + 2) 2 , for ` ≤ n, ` + n even, 1 (` + m + 2 )`! (n + 2m + 4)! m s c`,n = (n + 1)(n + 2) − , for ` = n + 2, (n + 2m + 3)(n + 2m + 4) 0, otherwise.
Furthermore, the matrix of connection coefficients (m) (m) (m) (m) (m) C = G0 G1 ··· Gn−1Gn I(n+3)×(n+1), where the sines and cosines are: s s (n + 1)(n + 2) (2m + 2)(2n + 2m + 5) sm = , and cm = . n (n + 2m + 3)(n + 2m + 4) n (n + 2m + 3)(n + 2m + 4)
7 of 24 The SH Connection Problem
“Proof.” W.l.o.g., consider m = 0 and n = 5.
0.91287 0.0 0.31623 0.0 0.17593 0.0 0.0 0.83666 0.0 0.39641 0.0 0.24398 −0.40825 0.0 0.70711 0.0 0.3934 0.0 (0) 0.0 −0.54772 0.0 0.60553 0.0 0.37268 C = 0.0 0.0 −0.63246 0.0 0.5278 0.0 0.0 0.0 0.0 −0.69007 0.0 0.46718 0.0 0.0 0.0 0.0 −0.73193 0.0 0.0 0.0 0.0 0.0 0.0 −0.76376
7 of 24 The SH Connection Problem
(0)> “Proof.” Apply the Givens rotation G0 : 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.83666 0.0 0.39641 0.0 0.24398 0.0 0.0 0.7746 0.0 0.43095 0.0 (0)> (0) 0.0 −0.54772 0.0 0.60553 0.0 0.37268 G C = 0 0.0 0.0 −0.63246 0.0 0.5278 0.0 0.0 0.0 0.0 −0.69007 0.0 0.46718 0.0 0.0 0.0 0.0 −0.73193 0.0 0.0 0.0 0.0 0.0 0.0 −0.76376
7 of 24 The SH Connection Problem
“Proof.” And again:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7746 0.0 0.43095 0.0 (0)> (0)> (0) 0.0 0.0 0.0 0.72375 0.0 0.44544 G G C = 1 0 0.0 0.0 −0.63246 0.0 0.5278 0.0 0.0 0.0 0.0 −0.69007 0.0 0.46718 0.0 0.0 0.0 0.0 −0.73193 0.0 0.0 0.0 0.0 0.0 0.0 −0.76376
7 of 24 The SH Connection Problem
“Proof.” And again:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 (0)> (0)> (0)> (0) 0.0 0.0 0.0 0.72375 0.0 0.44544 G G G C = 2 1 0 0.0 0.0 0.0 0.0 0.68139 0.0 0.0 0.0 0.0 −0.69007 0.0 0.46718 0.0 0.0 0.0 0.0 −0.73193 0.0 0.0 0.0 0.0 0.0 0.0 −0.76376
7 of 24 The SH Connection Problem
“Proof.” And again:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 (0)> (0)> (0)> (0)> (0) 0.0 0.0 0.0 1.0 0.0 0.0 G G G G C = 3 2 1 0 0.0 0.0 0.0 0.0 0.68139 0.0 0.0 0.0 0.0 0.0 0.0 0.6455 0.0 0.0 0.0 0.0 −0.73193 0.0 0.0 0.0 0.0 0.0 0.0 −0.76376
7 of 24 The SH Connection Problem
“Proof.” And again:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 (0)> (0)> (0)> (0)> (0)> (0) 0.0 0.0 0.0 1.0 0.0 0.0 G G G G G C = 4 3 2 1 0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6455 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 −0.76376
7 of 24 The SH Connection Problem
“Proof.” And finally:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 (0)> (0)> (0)> (0)> (0)> (0)> (0) 0.0 0.0 0.0 1.0 0.0 0.0 G G G G G G C = 5 4 3 2 1 0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 of 24 The SH Connection Problem
“Proof.” Schematically:
(m)
C . I
= .
.
(m) (m) G0 ··· Gn
Conversion between neighbouring layers is O(n) flops and storage. The Givens rotations are computed to high relative accuracy due to analytical expressions of sines and cosines ⇒ backward stable.
7 of 24 Spherical Harmonics to Fourier
˜` ˜1 P` P` sin θU` ˜`−1 ˜0 P` P` T` ...... ˜3 =⇒ ˜1 =⇒ P` P` sin θU` ˜2 ˜0 P` P` T` ˜1 ˜1 P` P` sin θU` ˜0 ˜0 P` P` T`
1 Convert high-order layers to layers of order 0 and 1 in O(n3) flops and O(n2) storage; and, 2 Convert low-order layers to Fourier series in O(n2 log n) flops and O(n log n) storage `ala Fast Multipole Method. To make an algorithm fast, we usually need to make approximations.
8 of 24 A Fast Transform
To convert high-order layers to layers of order 0 and 1 in O(n2 log2 n) flops, we need something more. ˜2m ˜0 For m ∈ N, the connection coefficients between P`+2m and Pn are given by the inner product: Z 1 2m ˜2m ˜0 c`,n = P`+2m(x)Pn (x) dx. −1 ˜0 Using the Fourier transform of Pn : q n 1 1 (−i) n + 2 Z Z c2m = j (k) dk eikx P˜2m (x) dx. `,n π n `+2m R −1 The matrix of connection coefficients is an operator composition with the variables (n × k) × (k × x) × (x × `). The Fourier integral operator is special.
9 of 24 The Butterfly Algorithm
Purpose: Abstract the algebra of the FFT. Technique: Divide-and-conquer ⇔ merge-and-split. Technology: The interpolative decomposition. Proof: Fourier integral operators have rank-proportional-to-area. The ranks of operator compositions are bounded by the smallest rank in the composition, extending applicability beyond Fourier integral operators.
10 of 24 The Interpolative Decomposition
Lemma m×n m×k Let A ∈ R . For any k, there exist ACS ∈ R whose columns are a k×n unique subset of the columns of A and AI ∈ R such that: 1 some subset of the columns of AI makes up the k × k identity matrix;
2 kvec(AI)k∞ ≤ 1;
3 p the spectral norm of AI satisfies kAIk2 ≤ k(n − k) + 1; 4 the least singular value of AI is at least 1;
5 ACSAI = A whenever k = m or k = n; and,
6 when k < min{m, n}, the spectral norm of A − ACSAI satisfies: p kA − ACSAIk2 ≤ k(n − k) + 1σk+1.
st where σk+1 is the k + 1 singular value of A.
We say that A ≈ ACSAI and any structure in A is also in ACS. 11 of 24 The Butterfly Algorithm
Step 1: Partition A ∈ Rn×n into thin strips. Compute IDs of each subblock.
≈
12 of 24 The Butterfly Algorithm
Step 2 (a): Merge the strips and split them approximately in half.
≈
12 of 24 The Butterfly Algorithm
Step 2 (b): Compute IDs of each subblock.
≈
12 of 24 The Butterfly Algorithm
Step 3 (a): Again, merge the strips and split them approximately in half.
≈
12 of 24 The Butterfly Algorithm
Step 3 (b): Compute IDs of each subblock.
≈
12 of 24 The Butterfly Algorithm
Step 4 (a): Final step, merge the strips and split them approximately in half.
≈
12 of 24 The Butterfly Algorithm
Step 4 (b): Final step, compute IDs of each subblock.
≈
12 of 24 The Butterfly Algorithm
Let A ∈ Rn×n have rank-proportional-to-area. Every time we merge-and-split, the complexity of a matrix-vector product is approximately halved. We have created a permuted and sparse block-diagonal factorization. 2 Costs O(kavgn ) to compute the factorization and O(kavgn log n) for a matrix-vector product, where kavg is the average rank of all IDs. To convert all associated Legendre functions to orders 0 and 1, requires 3 2 O(kavgn ) flops to pre-compute and O(kavgn log n) to apply.
12 of 24 Spherical Harmonics to Fourier
˜` ˜1 P` P` sin θU` ˜`−1 ˜0 P` P` T` ...... ˜3 =⇒ ˜1 =⇒ P` P` sin θU` ˜2 ˜0 P` P` T` ˜1 ˜1 P` P` sin θU` ˜0 ˜0 P` P` T`
2 1 Convert high-order layers to layers of order 0 and 1 in O(kavgn log n) 2 flops and O(kavgn log n) storage; and, 2 Convert low-order layers to Fourier series in O(n2 log n) flops and O(n log n) storage `ala Fast Multipole Method. 3 Can we beat O(kavgn ) pre-computation?
13 of 24 Skeletonizing the Pre-Computation Skeletonizing the pre-computation makes it practical for a laptop. Neighbouring layers are converted via Given rotations.
n
m
` } ) {z avg | (k O 14 of 24 Conclusions
New transforms are created to convert spherical harmonic expansions into bivariate Fourier series. They are asymptotically fast, O(n2 log2 n), and backward stable by construction. For practical band-limits of n ≤ O(10, 000), the asymptotically optimal complexity does not yet appear. They are freely available in Julia in FastTransforms.jl and in C in FastTransforms. Is there a pre-computation-free method? Yes! [Slevinsky 2017b] uses fast symmetric tridiagonal eigensolvers (via FMM) to reduce pre-computation 3 3 from O(n ) down to O(n 2 log n), which is superoptimal. The approach extends to more exotic 2D harmonic polynomials on the disk, triangle, rectangle, deltoid, wedge, etc. . . .
15 of 24 Time Evolution
Semi-linear PDEs with nonlocal diffusion have many real-world applications:
2 ut = Lδu + N (u), u(t = 0, x) = u0(x), > 0,
where Lδ is a linear nonlocal operator, and N is not. Our nonlocal model is: Z Lδu(x) = ρδ(|x − y|)[u(y) − u(x)] dΩ(y), 2 S
The kernel ρδ is nonnegative and compactly supported by a horizon 0 < δ ≤ 2. For example, N (u) = u − u3 in the Allen–Cahn equation.
Problem: Lδ is localized in momentum (coefficient) space and N is localized in physical (value) space.
16 of 24 Application: Nonlocal Diffusion Joint work with Hadrien Montanelli & Qiang Du (Columbia). Our nonlocal diffusion is: Z Lδu(x) = ρδ(|x − y|)[u(y) − u(x)] dΩ(y), 2 S where: 4(1 + α) ρδ(|x − y|) = χ[0,δ](|x − y|), πδ2+2α |x − y|2−2α where α ∈ (−1, 1) and 0 < δ ≤ 2 are parameters governing the strength. Using the Funk–Hecke formula, we proved: m m LδY` (x) = λδ(`)Y` (x), with eigenvalues: 1 Z p λδ(`) = 2π (P`(t) − 1) ρδ( 2(1 − t)) dt. 1−δ2/2
The constants ensure that λδ(`) → −`(` + 1) as δ → 0. 17 of 24 A View of the Spectrum
8 10 0( )
0.01( ) 5 .
0 7 10 0.1( )
1( ) = 6 ( ) 10 2 t a
) ( 105
f o
n 4
o 10 i t a u l 3 a 10 v E
l a
c 2 i 10 r e m
u 1
N 10
100 100 101 102 103 104
18 of 24 A View of the Spectrum
8 10 0( )
0.01( ) 5
. 7 10 0.1( ) 0
= 1( )
6 t 10 2( ) a
) ( 5
10 f o
n
o 4 i
t 10 a u l
a 3 v 10 E
l a c i r 102 e m u
N 101
100 100 101 102 103 104
18 of 24 Local Allen–Cahn
2 3 ut = ∆u + u − u , t = 0
19 of 24 Local Allen–Cahn
2 3 ut = ∆u + u − u , t = 1
19 of 24 Local Allen–Cahn
2 3 ut = ∆u + u − u , t = 4
19 of 24 Local Allen–Cahn
2 3 ut = ∆u + u − u , t = 16
19 of 24 Local Allen–Cahn
2 3 ut = ∆u + u − u , t = 64
19 of 24 Typical Alloy Microstructure
Google image search of “microstructure of alloys”
20 of 24 Typical Alloy Microstructure
Google image search of “microstructure of alloys”
20 of 24 Nonlocal Allen–Cahn
2 3 ut = Lδu + u − u , t = 0
21 of 24 Nonlocal Allen–Cahn
2 3 ut = Lδu + u − u , t = 1
21 of 24 Nonlocal Allen–Cahn
2 3 ut = Lδu + u − u , t = 4
21 of 24 Nonlocal Allen–Cahn
2 3 ut = Lδu + u − u , t = 16
21 of 24 Nonlocal Allen–Cahn
2 3 ut = Lδu + u − u , t = 64
21 of 24 Convergence Results
Spherical harmonic expansions are spectrally accurate.
10 1
10 3
10 5
10 7 Relative Error
10 9
10 11 t = 1 t = 4 10 13 100 200 300 400 500 22 of 24 n Convergence Results
We use ETDRK4 which is fourth-order in time.
10 1
10 3
10 5
10 7 Relative Error
10 9
t = 1 10 11 t = 4 (h4) 10 13 10 2 10 1 h 22 of 24 Convergence Results
Sufficiently small time steps ⇒ spectral spatial accuracy. Sufficiently large degrees ⇒ fourth-order temporal accuracy. It is known that the nonlocal steady-state may be discontinuous [Du & Yang 2016]. Why use a spectral method? A spectral method converges with spectral accuracy arbitrarily close to the discontinuity. Ces`aro(C, 2) summation as global post-processing removes the Gibbs phenomenon ⇒ pointwise convergence almost everywhere. No better or worse than a finite difference/element/volume method, but the spectrum is trivial. Analysis question: does the solution become discontinuous in finite or infinite time?
22 of 24 References
1 B. K. Alpert and V. Rokhlin. SIAM J. Sci. Stat. Comput., 12:158–179, 1991.
2 L. Greengard and V. Rokhlin. J. Comp. Phys., 73:325–348, 1987.
3 E. Liberty et al. Proc. Nat. Acad. Sci., 104:20167–20172, 2007.
4 M. J. Mohlenkamp. J. Fourier Anal. Appl., 5:159–184, 1999.
5 A. Mori, R. Suda, and M. Sugihara. 40:3612–3615, 1999.
6 V. Rokhlin and M. Tygert. SIAM J. Sci. Comput., 27:1903–1928, 2006.
7 R. M. Slevinsky. Appl. Comp. Harmon. Anal., 2017.
8 R. M. Slevinsky. arXiv:1711.07866, 2017.
9 R. M. Slevinsky, H. Montanelli, and Q. Du. J. Comp. Phys., 372:893–911, 2018.
10 R. Suda and M. Takami. Math. Comp., 71:703–715, 2002.
11 M. Tygert. J. Comp. Phys., 227:4260–4279, 2008.
12 M. Tygert. J. Comp. Phys., 229:6181–6192, 2010.
23 of 24 Fin
Thank you!
24 of 24