
Multi-grid Methods in Theory

Rich Brower, Boston University, Jan 10, 2017 (SciDAC Software Co-director and NVIDIA CUDA Fellow)

Machine Learning/Multi-scale Physics has Come of Age

[Figure: The Multigrid V-cycle. Labels: Fine Grid, smoothing, restriction, Smaller Coarse Grid, prolongation (interpolation).]

CM-2: 100 Mflops (1989). BG/Q: 1 Pflops (2012). A $10^7$ increase in 25 years.

Lagrangian for QCD

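What's so difficult about this! (The quark mass $m$ is the only explicit scale.)

$$S = \int d^4x\, \mathcal{L}, \qquad \mathcal{L}(x) = \frac{1}{4g^2} F^{ab}_{\mu\nu} F^{ab}_{\mu\nu} + \bar\psi_a \gamma_\mu \left(\partial_\mu \delta^{ab} + A^{ab}_\mu\right) \psi_b + m\, \bar\psi \psi$$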

• 3x3 “Maxwell” matrix field & 2+ Dirac quarks
• 1 “color” charge g & “small” quark masses m.
• Sample quantum “probability” of gluonic “plasma”:

$$\text{Prob} \sim \int \mathcal{D}A_\mu(x)\, \det\!\left[D^\dagger_{\text{quark}}(A)\, D_{\text{quark}}(A)\right] e^{-\int d^4x\, F^2/2g^2}$$

All predictions from QCD require “Algorithms”

Z = Path Integral exp[ - Action]

[Figure: word cloud of methods. Feynman Diagrams, Wilson Lattice QCD, PDE/FEM, Schwarz, Schur, OPE & Group, Multi-Grid, ‘t Hooft Dim Reg, Domain Wall, Twistors, Wilson Flow, DD, Bootstrap, AdS/CFT, Peter’s Multiscale (P. Boyle, yesterday).]

What about multi-scales inside of Lattice QCD?

a(lattice) ≪ 1/M_proton ≪ 1/m_π ≪ L(box)
0.06 fermi ≪ 0.2 fermi ≪ 1.4 fermi ≪ 6.0 fermi

⇒ L/a = O(100), or Minimum Lattice Volume $100^4$!

Multigrid: Case History in Algorithm Development

* “MG is always the Future”: Anonymous, JLab 2008
** “But, the future has arrived!”: Rich, Oak Ridge 2013

• History Lessons (1989-1992)*
  - Cause of early failure

• Modern Era (2008-2013)
  - 5 years to put into production the QCD MG solver for Wilson-clover

• Future** (2013-2018)
  - Domain Wall & Staggered Solvers, HMC evolution, etc.
  - Adaptation to heterogeneous architectures, etc.

Outline

1. Lattice Gauge Multi-grid: Why was it so difficult?
  – Lessons from 1990s
  – The scaling & RG metaphor*

2. The Adaptive Geometric MG breakthrough
  - Wilson clover (twisted mass)
  - Staggered (?)
  - Domain Wall (overlap/Peter Boyle!)

3. Multi-scale extensions:
  - Monte Carlo MG (Endres et al ?)
  - Quantum Finite Elements (FEM + quantum)
  - SUSY/Graphene/etc.

*Combining Renormalization Group and Multigrid Methods, 1988, R. Brower, R. Giles, K.J.M. Moriarty, P. Tamayo

A faithful Discrete Dirac PDE on Lattice is not trivial

Standard Finite Difference or Finite Element Methods Fail
There are several popular choices and trade-offs
Each poses different Multigrid Challenges

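• Wilson Clover & Twisted Mass
• Staggered (or Kogut-Susskind, SUSY, Graphene)
• Domain Wall & Overlap (exact chiral)
• Simplicial Lattice (Random Flat: Christ et al.) (General Riemann: Brower et al.)

Math Preliminaries

Continuum:

$$\gamma_\mu (\partial_\mu - iA_\mu)\,\psi(x) + m\,\psi(x) = b(x)$$

or, with all indices explicit,

$$\sum_\mu \gamma_\mu^{ij}\left(\frac{\partial}{\partial x_\mu}\delta^{ab} - iA_\mu^{ab}\right)\psi_{jb}(x) + m\,\psi_{ia}(x) = b_{ia}(x)$$

with 3 by 3 gauged colors and 4 x 4 ($2^{d/2} \times 2^{d/2}$) spin matrices.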

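Even more compact Linear Op form: $(D + m)\psi = b$

D is anti-Hermitian, which implies a purely imaginary spectrum:

$$(D + m)|\psi_\lambda\rangle = (i\lambda + m)|\psi_\lambda\rangle$$

Normal Operator:

$$(D + m)^\dagger (D + m) = -(\partial_\mu - iA_\mu)^2 - \sigma_{\mu\nu}F_{\mu\nu} + m^2$$

Putting it on hypercubic lattice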

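Wilson (sum over µ implied):

$$D_{\text{quark}} = d - \frac{1-\gamma_\mu}{2}\, U(x, x+\mu) - \frac{1+\gamma_\mu}{2}\, U(x+\mu, x) + m_q$$

Staggered:

$$D(U, m) = \frac{1}{2}\sum_{\mu=1}^{d} \eta_\mu(\vec x)\left[U_\mu(\vec x)\,\delta_{\vec x,\, \vec y - \hat\mu} - U_\mu^\dagger(\vec x - \hat\mu)\,\delta_{\vec x,\, \vec y + \hat\mu}\right] + m\,\delta_{\vec x, \vec y}$$

$$\eta_\mu = (-1)^{x_1 + x_2 + \cdots + x_{\mu-1}} \quad \text{(just phases!)}$$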

Domain Wall is 5d Wilson-like

Gauge link from x to x+µ:

$$U_{\text{glue}}(x, x+\mu) = e^{-iaA_\mu(x)}$$

Scaling & Ren Group Metaphor

Massless Laplace: $\phi(x) \to \lambda^{2-d}\,\phi(x)$ under $x \to \lambda x$

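$$-\nabla^2 \phi(x, y, \cdots) = -\left[\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \cdots\right]\phi = e^2\,\delta^d(x, y, \cdots)$$

Solution (Green’s function):

$$\phi(x, y, \cdots) = \frac{1}{S_d}\,\frac{e^2}{\left(\sqrt{x^2 + y^2 + \cdots}\right)^{d-2}}$$

$$\phi(x, y) = -\frac{e^2}{2\pi}\log\sqrt{x^2 + y^2} \quad \text{for } d = 2$$

Naive Scaling and 1-d MG Toy Example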

$$(D\phi)(x) = -\phi(x + a) + 2\phi(x) - \phi(x - a) + a^2 m^2 \phi(x) = b(x)$$

a ➔ 2a Restriction R = P†

2 a ➔ a Prolongation P
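A minimal numerical sketch of this toy blocking, under stated assumptions: a periodic 1-d lattice in units a = 1, linear interpolation for P (the slide does not say which P is used), and Galerkin coarsening with R = P†. The function names are hypothetical. It checks that P reproduces the constant mode, that the coarse kinetic term is one half of the coarse-lattice Laplacian, and that, once the kinetic coefficient is normalised back to 1, the mass seen by the constant mode is 2m.

```python
import numpy as np

def laplacian_1d(n):
    """Periodic 1-d operator -[phi(x+1) - 2 phi(x) + phi(x-1)] in units a = 1."""
    shift = np.roll(np.eye(n), 1, axis=1)       # shift[i, (i+1) % n] = 1
    return 2.0 * np.eye(n) - shift - shift.T

def linear_prolongator(n_fine):
    """Linear interpolation from n_fine/2 coarse sites to n_fine fine sites (assumed P)."""
    n_coarse = n_fine // 2
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        P[2 * j, j] = 1.0                       # coarse site j sits on fine site 2j
        P[(2 * j + 1) % n_fine, j] += 0.5       # odd fine sites average their neighbours
        P[(2 * j - 1) % n_fine, j] += 0.5
    return P

n, m = 16, 0.1
P = linear_prolongator(n)
K_fine, K_coarse = laplacian_1d(n), laplacian_1d(n // 2)

# (1) The constant (null) mode of the massless operator survives the blocking.
const_preserved = np.allclose(P @ np.ones(n // 2), np.ones(n))

# (2) Galerkin coarse operator with R = P^T, split into kinetic and mass parts.
K_hat = P.T @ K_fine @ P                        # equals 0.5 * coarse Laplacian
M_hat = P.T @ (m**2 * np.eye(n)) @ P            # coarse mass term
mass_on_const = (M_hat @ np.ones(n // 2))[0]    # equals 2 m^2
m_eff = np.sqrt(mass_on_const / 0.5)            # rescale so the kinetic coefficient is 1

print(const_preserved, np.allclose(K_hat, 0.5 * K_coarse), m_eff / m)
```

With a piecewise-constant P instead of linear interpolation the same calculation gives a different rescaling, so the factor of 2 here is tied to the assumed interpolation and normalisation.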

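(1) Blocking preserves the scale invariant const solutions (null state)
(2) Coarse operator is renormalized: m ➔ 2m (in units a = 1)

Wilson operator (sum over µ implied):

$$D_{\text{wilson}} = d - \frac{1-\gamma_\mu}{2}\, U(x, x+\mu) - \frac{1+\gamma_\mu}{2}\, U(x+\mu, x) + m_q = \frac{1}{2}\gamma_\mu\left(\nabla_\mu - \nabla_\mu^\dagger\right) + \frac{1}{2}\nabla_\mu^\dagger \nabla_\mu + m_q$$

where $\nabla_\mu \psi(x) = U(x, x+\mu)\,\psi(x+\mu) - \psi(x) \simeq a\,(\partial_\mu - iA_\mu(x))\,\psi(x)$.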

Wilson Spectra in 2d

QCD MG attempts in 1990’s

See Thomas Kalkreuter, hep-lat/9409008, review on “MG Methods for Propagators in LGT”.

Israel: Ben-Av, M. Harmatz, P.G. Lauwers & S.Solomon

Boston: Brower, Edwards, Rebbi & Vicari

Amsterdam: A. Hulsebos, J. Smit, J. C. Vink

QCD MG “failure” in 1990s: 2x2 Blocks for U(1) Dirac

[Figure: 2-d lattice with Uµ(x) on links and Ψ(x) on sites, β = 1. Gauss-Jacobi (diamond), CG (circle), V cycle (square), W cycle (star).]

Universality of Critical Slowing Down:

[Figure: τ = F(ml) for β = 3 (cross), 10 (plus), 100 (square). Gauss-Jacobi (diamond), CG (circle), 3 level (square & star).]

Lessons from MG attempts in 1990s

– Partial success at weak coupling.

– “Local” real space RG blocking but not perfect.

– Maintain Gauge invariance

– Maintain γ5 Hermiticity

– BUT why did it fail at small mass? Perturbative RG Metaphor?

“Education is knowing how far to push a metaphor!”

Why Didn’t It Work?

Instantons, Topological Zero Modes (Atiyah-Singer index) and Confinement length $l_\sigma$

$$\psi_0^{ia}(x) = \frac{1}{x^2 + \rho^2}\,\hat x_\mu\, \gamma_\mu^{ik}\left(\frac{1+\gamma_5}{2}\right)^{kj}\epsilon^{ja}$$

“A little knowledge is a dangerous thing”

                                 Lagrangian      Lattice (a > 0)    Quantum Theory*
                                 (i.e. PDE’s)    (i.e. Computer)    (i.e. Nature)
Rotational (Lorentz) Invariance  ✔               ✘                  ✔
Gauge Invariance                 ✔               ✔                  ✔
Scale Invariance                 ✔               ✘                  ✘
Chiral Invariance                ✔               ✘/✔                ✘

*Must take lattice spacing to zero for the quantum averages.

2. Adaptive Multigrid Revolution?

2. Machine Learning Multigrid Revolution!

Adaptive Smooth Aggregation Algebraic MultiGrid

Slow convergence of the Dirac solver is due to small eigenvalues for vectors in the near null subspace $S$.

$D : S \simeq 0$

Split the vector space into the near null space $S$ and the complement $S_\perp$.

[Figure: The Multi-grid V-cycle. Labels: Fine Grid, smoothing, restriction, Smaller Coarse Grid, prolongation (interpolation).]

2-level Multi-grid Cycle (simplified)

• Smooth: $x' = (1 - D)x + b$, $r' = (1 - D)r$
• Project: $\hat D = P^\dagger D P$, $\hat r = P^\dagger r$
• Approx. Solve: $\hat D \hat e = \hat r \Rightarrow \hat e \simeq \hat D^{-1} P^\dagger r$
• Prolongate: $e = P \hat e$
• Update: $x'' = x' + e$
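The five steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the production solver: it uses the 1-d toy operator $-\nabla^2 + m^2$ on a periodic lattice, linear interpolation for P, a damped smoother $x' = x + \omega(b - Dx)$ with $\omega = 1/3$ (the undamped form on the slide presumes a suitably normalised D), and an exact coarse solve in place of the approximate one. All function names are hypothetical.

```python
import numpy as np

def toy_operator(n, m):
    """Periodic 1-d operator -Laplacian + m^2 in units a = 1."""
    shift = np.roll(np.eye(n), 1, axis=1)
    return (2.0 + m**2) * np.eye(n) - shift - shift.T

def linear_prolongator(n_fine):
    """Linear interpolation from n_fine/2 coarse sites to n_fine fine sites (assumed P)."""
    P = np.zeros((n_fine, n_fine // 2))
    for j in range(n_fine // 2):
        P[2 * j, j] = 1.0
        P[(2 * j + 1) % n_fine, j] += 0.5
        P[(2 * j - 1) % n_fine, j] += 0.5
    return P

def two_grid_cycle(D, P, x, b, omega=1.0 / 3.0, n_smooth=2):
    """One 2-level cycle: smooth, project, coarse solve, prolongate, update."""
    for _ in range(n_smooth):
        x = x + omega * (b - D @ x)             # Smooth (damped Richardson)
    r = b - D @ x                               # fine-grid residual
    D_hat = P.T @ D @ P                         # Project: Galerkin coarse operator
    r_hat = P.T @ r
    e_hat = np.linalg.solve(D_hat, r_hat)       # Solve (exact here, approximate in practice)
    return x + P @ e_hat                        # Prolongate and update

def rel_residual(D, x, b):
    return np.linalg.norm(b - D @ x) / np.linalg.norm(b)

rng = np.random.default_rng(0)
n, m = 64, 0.01
D, P = toy_operator(n, m), linear_prolongator(n)
b = rng.standard_normal(n)

x_mg = np.zeros(n)
for _ in range(20):                             # 20 cycles = 40 smoothing sweeps
    x_mg = two_grid_cycle(D, P, x_mg, b)

x_smooth = np.zeros(n)
for _ in range(40):                             # same 40 sweeps, no coarse correction
    x_smooth = x_smooth + (b - D @ x_smooth) / 3.0

print(rel_residual(D, x_mg, b), rel_residual(D, x_smooth, b))
```

At small mass the smoother alone barely touches the low modes, while the coarse correction removes them. With an exact coarse solve the updated residual is orthogonal to the coarse space, $P^\dagger r = 0$.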

Petrov-Galerkin Oblique Projector: Exact Upscale

$$r_{\text{exact}} = \left[1 - DP\,\frac{1}{P^\dagger D P}\,P^\dagger\right] r \;\Rightarrow\; P^\dagger r_{\text{exact}} = 0$$

Adaptive Smooth Aggregation Algebraic Multigrid

“Adaptive multigrid algorithm for the lattice Wilson-Dirac operator,” R. Babich, J. Brannick, R. C. Brower, M. A. Clark, T. Manteuffel, S. McCormick, J. C. Osborn, and C. Rebbi, PRL (2010).

Good News/Bad News

More Data: Should Save MG projectors with lattice

Actually MG error is smaller at fixed residual


Staggered Multigrid: Preliminary (Brower, Clark, Strelchenko and Weinberg)

$$D(U, m) = \frac{1}{2}\eta_\mu(x)\left[U(x, x+\mu) - U(x+\mu, x)\right] + m = \frac{1}{2}\eta_\mu(\vec x)\left[\nabla_\mu - \nabla_\mu^\dagger\right] + m$$

[Figure: 2d Staggered Spectrum]

Normal Equation (e/o precond): Free field is 2a Laplace

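$$D^\dagger(U, m)\,D(U, m) = \begin{bmatrix} m & -D_{eo} \\ -D_{oe} & m \end{bmatrix}\begin{bmatrix} m & D_{eo} \\ D_{oe} & m \end{bmatrix} = \begin{bmatrix} m^2 - D_{eo}D_{oe} & 0 \\ 0 & m^2 - D_{oe}D_{eo} \end{bmatrix}$$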
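The even/odd decoupling of the normal operator can be checked directly on a small lattice. The sketch below is an illustration only: it builds a dense 2-d staggered operator with random U(1) links as a stand-in for a gauge field, uses the phases $\eta_1 = 1$, $\eta_2 = (-1)^{x_1}$, and the function name `staggered_2d` is hypothetical. It also looks at the free field, where $D^\dagger D$ couples a site only to its neighbours two links away.

```python
import numpy as np

def staggered_2d(L, m, rng=None):
    """Dense 2-d staggered operator on a periodic L x L lattice (L even, L >= 4).

    rng=None gives the free field (all links U = 1); otherwise the links are
    random U(1) phases, used here only as a stand-in for a gauge field.
    """
    idx = lambda x, y: (x % L) * L + (y % L)
    theta = np.zeros((2, L, L)) if rng is None else rng.uniform(0.0, 2.0 * np.pi, (2, L, L))
    U = np.exp(1j * theta)
    D = m * np.eye(L * L, dtype=complex)
    for x in range(L):
        for y in range(L):
            i = idx(x, y)
            eta = (1.0, (-1.0) ** x)                  # eta_1 = 1, eta_2 = (-1)^{x_1}
            forward = (idx(x + 1, y), idx(x, y + 1))
            for mu in range(2):
                j = forward[mu]
                D[i, j] += 0.5 * eta[mu] * U[mu, x, y]            # forward hop
                D[j, i] -= 0.5 * eta[mu] * np.conj(U[mu, x, y])   # backward hop
    return D

L, m = 4, 0.1
D = staggered_2d(L, m, np.random.default_rng(1))

# Order sites by parity (x + y) mod 2 to expose the even/odd block structure.
parity = np.array([(x + y) % 2 for x in range(L) for y in range(L)])
even, odd = np.where(parity == 0)[0], np.where(parity == 1)[0]
D_eo, D_oe = D[np.ix_(even, odd)], D[np.ix_(odd, even)]

N = D.conj().T @ D                                    # normal operator
off_block = np.abs(N[np.ix_(even, odd)]).max()        # vanishes: even and odd decouple
ee_matches = np.allclose(N[np.ix_(even, even)],
                         m**2 * np.eye(len(even)) - D_eo @ D_oe)

# Free field, m = 0: D^dagger D is a Laplacian at lattice spacing 2a.
D_free = staggered_2d(8, 0.0)
N_free = D_free.conj().T @ D_free

print(off_block, ee_matches)
```

Because the two parity blocks decouple, CG on the normal equations can be run on the even (or odd) sites alone.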

[Figure: Total D_eo, D_oe applications for the normal-equation solve versus mass m (0 to 0.1), log scale from 100 to 10000. Curves: 128², CG on Even-Odd; 256², CG on Even-Odd; 128², MG-GCR; 256², MG-GCR.]

Spurious Galerkin Eigenmodes

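[Figure: Eigenvalues, 32², β = 6.0, Naive Galerkin. Spectra of $D$, $\hat D$ and $\hat{\hat D}$ in the complex plane; vertical axis Im(λ) from 0 to 0.2.]

$$\hat D = P^\dagger D P, \qquad \hat{\hat D} = \hat P^\dagger \hat D \hat P$$

Removing Spurious Galerkin Eigenmodes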

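[Figure: Eigenvalues, 32², β = 6.0, Hybrid Algorithm. Axes: Re(λ) from -0.05 to 0.25, Im(λ) from 0 to 0.2. Legend: L1: $D$; L2: $\hat D + 0.16\,[\widehat{D^\dagger D}]_T$; L3: $\hat{\hat D} + 0.32\,[\widehat{\widehat{D^\dagger D}}]_T$.]

Truncated Normal:

$$\hat D = P^\dagger D P + [P^\dagger D^\dagger D P]_T$$

Stabilized Staggered MG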

[Figure: Total D_eo, D_oe applications for the D solve versus mass m (0 to 0.1), log scale from 100 to 10000. Curves: 128², CG on Even-Odd; 256², CG on Even-Odd; 128², MG-GCR; 256², MG-GCR.]

Domain Wall

Hierarchically deflated conjugate gradient P A Boyle (Edinburgh U.). Feb 11, 2014. 37 pp. QCD EDINBURGH-2014-03 arXiv:1402.2585

Multigrid Algorithms for Domain-Wall Fermions, Saul D. Cohen (Washington U., Seattle), R.C. Brower (Boston U., Ctr. Comp. Sci.), M.A. Clark (Harvard-Smithsonian Ctr. Astrophys.), J.C. Osborn (Argonne). May 2012. 7 pp. Published in PoS LATTICE2011 (2011) 030

2d U(1)

Domain Wall: 4 + 1 with extra 5th dimension of size Ls

2 + 1 Domain Wall Spectrum: Violently Non-Normal Operator

Non Normal, Non Hermitian, Non Pos. Def.

(Saul Cohen)

High Performance on GPUs

• Cost in $s reduced by a factor of at least 100+

GPU: O(10+), MG: O(10+)

QUDA: NVIDIA GPU

• “QCD on CUDA” team – http://lattice.github.com/quda

  – Ron Babich (BU -> NVIDIA)
  – Kip Barros (BU -> LANL)
  – Rich Brower (Boston University)
  – Michael Cheng (Boston University)
  – Mike Clark (BU -> NVIDIA)
  – Justin Foley (University of Utah)
  – Steve Gottlieb (Indiana University)
  – Bálint Joó (JLab)
  – Claudio Rebbi (Boston University)
  – Guochun Shi (NCSA -> Google)
  – Alexei Strelchenko (Cyprus Inst. -> FNAL)
  – Hyung-Jin Kim (BNL)
  – Mathias Wagner (Bielefeld -> Indiana Univ)
  – Frank Winter (UoE -> JLab)

Mapping Multi-scale Algorithms to Multi-scale Architecture

First Step: Wilson-Dslash on GPUs

• REDUCE MEMORY TRAFFIC:

• (1) Lossless Data Compression:
  • SU(3) matrices are all unitary complex matrices with det = 1.
  • 12-number parameterization: reconstruct full matrix on the fly in registers
  • Additional 384 (free) flops per site

• Also have an 8-number parameterization of SU(3) manifold (requires sin/cos and sqrt)

[Figure: the 3x3 matrix with rows (a1 a2 a3), (b1 b2 b3), (c1 c2 c3) is stored as its first two rows, with c = (a×b)*. Group Manifold: $S^3 \times S^5$.]

• (2) Similarity Transforms to increase sparsity

• (3) Mixed Precision: Use 16-bit fixed-point representation. No loss in precision with mixed-precision solves (almost a free lunch: small increase in iteration count)

Latest Results from SuperComputing 2016 (Kate Clark)
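As a sketch of the 12-number SU(3) compression listed under item (1) of the memory-traffic list above: only the first two rows a and b are stored, and the third row is rebuilt as c = (a×b)*. This is a NumPy illustration of the identity, not QUDA's GPU kernel, and the helper names (`random_su3`, `compress`, `reconstruct`) are hypothetical.

```python
import numpy as np

def random_su3(rng):
    """Random SU(3) matrix: QR gives a unitary matrix, then the determinant is fixed to 1."""
    z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    q, _ = np.linalg.qr(z)                    # unitary, det(q) is a pure phase
    q[2, :] *= np.conj(np.linalg.det(q))      # rephase the last row so det = 1
    return q

def compress(u):
    """Keep the first two rows: 6 complex = 12 real numbers."""
    return u[:2, :].copy()

def cross(a, b):
    """Plain (non-conjugating) cross product of two complex 3-vectors."""
    return np.array([a[1] * b[2] - a[2] * b[1],
                     a[2] * b[0] - a[0] * b[2],
                     a[0] * b[1] - a[1] * b[0]])

def reconstruct(rows):
    """Rebuild the full matrix; the third row is c = (a x b)^*."""
    a, b = rows
    return np.vstack([a, b, np.conj(cross(a, b))])

rng = np.random.default_rng(2)
u = random_su3(rng)
u_rebuilt = reconstruct(compress(u))
print(np.abs(u_rebuilt - u).max())
```

The reconstruction relies on unitarity and det = 1 together: for such a matrix each entry's complex conjugate equals its cofactor, which for the third row is exactly the cross product of the first two.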

Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization, M.A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Gambhir, Richard Brower. Dec 22, 2016. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '16).

3. Multi-scale Extensions

– Monte Carlo Equilibration
– HMC dynamics
– Simplicial Lattice Riemann Manifolds
– Recursive spherical lattice applied to Conformal Field Theory
– Deep learning: simplicial renormalization.

Multigrid ideas for HMC

Very important and difficult problem
Major focus of US Exascale Software project (see Poster by Mike Endres)

New simulation strategies for lattice gauge theory, Michael G. Endres, Lattice 2016

Multiscale Monte Carlo equilibration: Pure Yang-Mills theory, Michael G. Endres, Richard C. Brower, William Detmold, Kostas Orginos, Andrew V. Pochinsky

Deep Learning Challenges

Lattice Quantum corrected Finite Elements (QFE)

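$$S = \frac{1}{2}\sum_{\langle i,j \rangle} \frac{V_{ij}}{l_{ij}}\left[\bar\psi_i\, e_a^{(i)j}\gamma^a\, \Omega_{ij}\,\psi_j - \bar\psi_j\, \Omega_{ji}\, e_a^{(i)j}\gamma^a\, \psi_i\right] + \frac{1}{2}\, m V_i\, \bar\psi_i \psi_i + S_{\text{Wilson term}}$$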

Lattice Dirac Fermions on a Simplicial Riemannian Manifold, Richard C. Brower (Boston U.), George T. Fleming, Andrew D. Gasbarro (Yale U.), Timothy G. Raben, Chung-I Tan (Brown U.), Evan S. Weinberg (Boston U.). Oct 26, 2016. 55 pp.

Solving CFT with Deep Learning?

Back to the Bootstrap: Neural Like Equations (i.e. Data: spectra + couplings to conformal blocks)

Exact 2- and 3-point correlators

Only “tree” diagrams! “partial waves” exp: sum over conformal blocks

CFT Bootstrap: OPE & factorization completely fix the theory

Thanks

Let’s Make Multigrid Great Again!

Devils in the Details!

Boris Grigoryevich Galerkin (Russian: Бори́с Григо́рьевич Галёркин, surname more accurately romanized as Galyorkin; March 4 [O.S. February 20] 1871 – July 12, 1945)

Karl Hermann Amandus Schwarz (25 January 1843 – 30 November 1921)

First Small Success: Applied Math/Physics Collaboration

• NVIDIA: Mike Clark, Ron Babich

• Michael Cheng • Saul Cohen • Oliver Witzel

• INT Seattle: Saul Cohen

• MIT: Andrew Pochinsky

• Chris Schroeder

What is the New Idea?

• Math Speak: A Schur/Schwarzian DD splitting of the vector space:
  - How do you split the space into Fine vs Coarse Space?
  - Classical MG vs Adaptive MG

But P†P = 1 so Ker(P) = 0

[Figure: subspace diagram. Labels: ker(P†), UV, fine space, $S_\perp$, P†, span(P†), span(P), $S$, P, IR.]

(See the front cover of Strang’s undergraduate MIT math text!)

• In Physics Speak: The Wilsonian Renormalization Group:
  - How to separate UV (short scales) from IR (long scales)
  - Conformal (Scale Inv) vs Non-perturbative RG