Some thoughts for particle-in-cell on leadership class computers

W. B. Mori

University of California Los Angeles (UCLA) Departments of Physics and Astronomy and of Electrical Engineering

Institute of Digital Research and Education

[email protected]

Much of the material on the Joule Metric study is from R. A. Fonseca, GoLP/IPFN, Instituto Superior Técnico, Lisboa, Portugal

My experimental colleagues' view of the world

Life of an experimentalist vs. life of a simulationist

But we face a tsunami of advances...

...in theory, in computational methods, in new hardware, and in new algorithms

Contrary to the perception of my experimental colleagues, we rarely get to “relax”. This workshop offers us a rare opportunity to think in a “quiet room”.

The evolution of state-of-the-art HPC systems cannot be ignored

(Plot: top system performance [MFlop/s] vs. year, 1950 to 2010, from the IBM 704/709, UNIVAC LARC and IBM 7030 Stretch through the CDC 6600/7600, ILLIAC IV, CDC STAR-100, Cray-1, Cray X-MP, ETA10-G/8, NEC SX-3/44R, Fujitsu Numerical Wind Tunnel, CP-PACS/2048, Intel Paragon, Intel ASCI Red, IBM ASCI White, NEC Earth Simulator, SGI Columbia, IBM Blue Gene/L, IBM RoadRunner, Cray XT5-HE and Tianhe-1A. Highlighted: Jaguar (#2, 11/2010), Cray XT5, Rpeak 2.33 PFlop/s, Rmax 1.76 PFlop/s. Memory has also increased!)

Petascale computing has arrived and exascale is around the corner

• Jaguar (jaguarpf)
• XT5 compute node:
  • Dual hex-core AMD Opteron 2435 (Istanbul) at 2.6 GHz
  • 16 GB DDR2-800 memory
  • SeaStar2+ network
• Complete system:
  • 18688 compute nodes (224256 processing cores)
  • Dedicated service/login nodes
  • 300 TB of memory
  • Peak performance 2.3 PFlop/s

Outline

Introduction/Motivation

The particle-in-cell approach

Code developments for HEDP modeling in HPC systems

Conclusions and perspectives


Advanced ICF modeling is demanding due to the different scales involved, e.g. fast ignition or shock ignition

Typical compressed target: computational requirements for PIC

Physical size: box size 1 mm; cell size 5 Å; duration 10 ps; time step 1 as (10^-18 s)
Numerical size: # cells/dim: 2x10^6; # particles/cell: 100 (1D), 10 (2D), 1 (3D); # time steps: 10^6
Particle push time: ~1 μs

Computational time:
1D: 2x10^3 CPU days
2D: 5x10^8 CPU days (~10^6 CPU years)
3D: 2x10^11 CPU days (~7x10^8 CPU years)

What is high energy density plasma physics? Why are both plasma-based acceleration and the nonlinear optics of plasmas considered high-energy-density plasma research?

(Image: H2 gas jet with CO2 laser pulse and 400 nm driver pulse; laser duration 10^2 ps to 10 ns.)

• High energy density means high pressure
  – What is a high pressure? MBar? GBar?
  – We need a dimensionless parameter.
• In plasma physics an important parameter is the number of particles in a Debye sphere (directly related to the ratio of the kinetic energy of an electron to the potential energy between particles). It measures the discreteness of the plasma (see the short sketch after this list):

$$N_D \equiv \frac{4\pi}{3}\, n\, \lambda_D^3 = 2.1\times 10^3\,\frac{T_{\rm keV}^2}{P_{\rm MBar}^{1/2}}$$

  – When the pressure exceeds ~1 MBar the discrete nature of the plasma becomes important: N_D is not “infinite”

• Discreteness makes developing computational methods difficult.
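To get a feel for the scaling of $N_D$ with temperature and pressure, here is a minimal sketch (the function name and the sample (T, P) values are mine; the formula is the one quoted above):

```python
# Minimal sketch: evaluate the number of particles in a Debye sphere from the
# scaling above, N_D = 2.1e3 * T_keV**2 / sqrt(P_MBar).
# The function name and the sample (T, P) values are illustrative.
import math

def debye_number(T_keV, P_MBar):
    """Particles in a Debye sphere at temperature T [keV] and pressure P [MBar]."""
    return 2.1e3 * T_keV**2 / math.sqrt(P_MBar)

for T, P in [(1.0, 1.0), (1.0, 100.0), (0.1, 1.0)]:
    print(f"T = {T} keV, P = {P} MBar -> N_D ~ {debye_number(T, P):.0f}")
```

At 1 keV and 1 MBar there are only a couple of thousand particles in a Debye sphere, so discreteness is far from negligible.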

How can we describe the plasma physics?

Hierarchy of descriptions (ignoring quantum effects):
‣ Klimontovich equation (“exact”)

‣ Ensemble average the Klimontovich equation • Leads to Vlasov Fokker Planck equation (approximate)

‣ Take the limit that ND is very very large • Vlasov equation

‣ Take moments of the Vlasov or Vlasov Fokker Planck equation • “Two” fluid (transport) equations

‣ Ignore electron inertia • Single fluid or MHD equations

Klimontovich description (“exact”)

Klimontovich equation:

$$\frac{D}{Dt} F = 0, \qquad F(\vec{x},\vec{v};t) = \sum_{i=1}^{N} \delta\big(\vec{x}-\vec{x}_i(t)\big)\,\delta\big(\vec{v}-\vec{v}_i(t)\big)$$

$$\frac{D}{Dt} \equiv \frac{\partial}{\partial t} + \vec{v}\cdot\nabla_x + \vec{a}\cdot\nabla_v, \qquad \vec{a} \equiv \frac{d\vec{v}}{dt} = \frac{q}{m}\left(\vec{E} + \frac{\vec{v}}{c}\times\vec{B}\right)$$

Maxwell's equations:

$$\frac{\partial \vec{E}}{\partial t} = c\,\nabla\times\vec{B} - 4\pi \vec{J}, \qquad \frac{\partial \vec{B}}{\partial t} = -c\,\nabla\times\vec{E}, \qquad \vec{J}(\vec{x},t) = \int d\vec{v}\; q\,\vec{v}\, F(\vec{x},\vec{v},t)$$

Descriptions of a plasma: not practical to follow every particle

• Ensemble average of Klimontovich equation

• Ensemble averages:

$$F = F_S + \delta F, \qquad \vec{J} = \vec{J}_S + \delta\vec{J}, \qquad \vec{B} = \vec{B}_S + \delta\vec{B}, \qquad \vec{E} = \vec{E}_S + \delta\vec{E}$$

• Equation for the smooth $F_S$:

$$\left.\frac{D}{Dt}\right|_S F_S = -\left\langle \delta\vec{a}\cdot\nabla_v\,\delta F \right\rangle \equiv \left.\frac{d}{dt} F_S\right|_{\rm col}$$

$$\left.\frac{D}{Dt}\right|_S \delta F = -\,\delta\vec{a}\cdot\nabla_v F_S - \left(\delta\vec{a}\cdot\nabla_v\,\delta F - \left\langle \delta\vec{a}\cdot\nabla_v\,\delta F \right\rangle\right)$$

with

$$\vec{a}_S = \frac{q}{m}\left(\vec{E}_S + \frac{\vec{v}}{c}\times\vec{B}_S\right), \qquad \left.\frac{D}{Dt}\right|_S = \frac{\partial}{\partial t} + \vec{v}\cdot\nabla_x + \vec{a}_S\cdot\nabla_v$$

Descriptions of a plasma: where does the Vlasov equation fit into this?

• Smooth F descriptions

• Vlasov-Fokker-Planck:

$$\left.\frac{D}{Dt}\right|_S F_S = -\left\langle \delta\vec{a}\cdot\nabla_v\,\delta F \right\rangle \equiv \left.\frac{d}{dt} F_S\right|_{\rm col}, \qquad \left.\frac{D}{Dt}\right|_S \delta F = -\,\delta\vec{a}\cdot\nabla_v F_S \quad \text{(sometimes called the quasi-linear approximation)}$$

• Vlasov equation:

$$\left.\frac{D}{Dt}\right|_S F_S = 0 \qquad \text{(only true if } N_D \ggg 1\text{)}$$


Outline

Introduction/Motivation

The particle-in-cell approach

Code developments for HEDP modeling in HPC systems

Conclusions and perspectives

What computational method do we use to model HEDP including discrete effects? The particle-in-cell method. Not all PIC codes are the same!

The PIC loop (one cycle per time step Δt):

1. Integration of the equations of motion (push particles): $\frac{d\vec{p}}{dt} = q\left(\vec{E} + \frac{\vec{v}}{c}\times\vec{B}\right)$;  F_k → u_k → x_k
2. Depositing: (x, u)_k → J_ij
3. Integration of the field equations on the grid: $\frac{\partial \vec{E}}{\partial t} = c\,\nabla\times\vec{B} - 4\pi\vec{J}$, $\frac{\partial \vec{B}}{\partial t} = -c\,\nabla\times\vec{E}$;  J_ij → (E, B)_ij
4. Interpolating: (E, B)_ij → F_k
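As a concrete illustration of the cycle, here is a minimal 1D electrostatic, non-relativistic sketch of one PIC step with linear weighting (all names and normalizations are mine; this is a toy, not the relativistic electromagnetic scheme used in OSIRIS):

```python
# Minimal sketch of the PIC cycle: 1D electrostatic, non-relativistic, linear weighting.
# A toy illustration of interpolate -> push -> deposit -> field solve,
# not the relativistic electromagnetic scheme of OSIRIS.
import numpy as np

def pic_step(x, v, E_grid, q, m, dx, dt, L):
    ng = E_grid.size
    qm = q / m

    # Interpolate grid fields to particles: (E)_i -> F_k (linear weighting)
    g = x / dx
    i0 = np.floor(g).astype(int) % ng
    w1 = g - np.floor(g)
    E_part = (1.0 - w1) * E_grid[i0] + w1 * E_grid[(i0 + 1) % ng]

    # Push: integrate the equations of motion
    v = v + qm * E_part * dt
    x = (x + v * dt) % L

    # Deposit: (x)_k -> rho_i with the same linear weighting
    rho = np.zeros(ng)
    g = x / dx
    i0 = np.floor(g).astype(int) % ng
    w1 = g - np.floor(g)
    np.add.at(rho, i0, 1.0 - w1)
    np.add.at(rho, (i0 + 1) % ng, w1)
    rho = q * (rho * ng / x.size - 1.0)   # charge density with neutralizing background

    # Field solve on the grid (here Poisson via FFT instead of advancing E and B)
    k = 2.0 * np.pi * np.fft.fftfreq(ng, d=dx)
    rho_k = np.fft.fft(rho)
    E_k = np.zeros_like(rho_k)
    E_k[1:] = rho_k[1:] / (1j * k[1:])
    E_grid = np.real(np.fft.ifft(E_k))

    return x, v, E_grid

if __name__ == "__main__":
    ng, N = 64, 20000
    L = 2.0 * np.pi
    dx, dt = L / ng, 0.1
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, L, N)
    v = rng.normal(0.0, 0.05, N)
    E = np.zeros(ng)
    for _ in range(200):
        x, v, E = pic_step(x, v, E, q=-1.0, m=1.0, dx=dx, dt=dt, L=L)
```

The four stages map one-to-one onto the loop above; in an electromagnetic code the field solve advances E and B from the deposited current J instead of solving Poisson's equation.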

What is the particle-in-cell model?

Is it an efficient way of modeling the Vlasov equation? No: it is a Klimontovich description for finite-size particles (macro-particles).

Klimontovich equation of macro-particles:

$$\frac{D}{Dt} F = 0, \qquad F(\vec{x},\vec{v};t) = \sum_{i=1}^{N} S_p\big(\vec{x}-\vec{x}_i(t)\big)\,\delta\big(\vec{v}-\vec{v}_i(t)\big)$$

$$\frac{D}{Dt} \equiv \frac{\partial}{\partial t} + \vec{v}\cdot\nabla_x + \vec{a}\cdot\nabla_v, \qquad \vec{a} \equiv \frac{d\vec{v}}{dt} = \frac{q}{m}\left(\vec{E} + \frac{\vec{v}}{c}\times\vec{B}\right)$$

Maxwell's equations:

$$\frac{\partial \vec{E}}{\partial t} = c\,\nabla\times\vec{B} - 4\pi \vec{J}, \qquad \frac{\partial \vec{B}}{\partial t} = -c\,\nabla\times\vec{E}, \qquad \vec{J}(\vec{x},t) = \int d\vec{v}\; q\,\vec{v}\, F(\vec{x},\vec{v},t)$$

Here $S_p$ is the finite-size particle shape, which replaces the point-particle delta function in $\vec{x}$.

Does it matter that we agree on what it is? I think it does.

Yes. It changes your view of what convergence tests should be done to VERIFY the implementation of the algorithm.

Yes. Today's computers are getting very powerful. We can do runs with smaller particle sizes (compared to the Debye length) and with more particles, and converge toward “real” conditions.

Yes. A single PIC simulation is not an ensemble average. So for some problems, perhaps we should start thinking about taking ensemble averages.

Yes. PIC simulations can be used to VALIDATE reduced models

Outline

Introduction/Motivation

The particle-in-cell approach

Code developments for HEDP modeling in HPC systems

Conclusions and perspectives

OSIRIS 2.0

osiris framework

· Massively Parallel, Fully Relativistic Particle-in-Cell (PIC) Code
· Visualization and Data Analysis Infrastructure
· Developed by the osiris.consortium ⇒ UCLA + IST

New Features in v2.0

· Optimized high-order splines
· Binary Collision Module
· PML absorbing BC
· Tunnel (ADK) and Impact Ionization
· Dynamic Load Balancing
· Parallel I/O
· Hybrid options

Contacts: Ricardo Fonseca: [email protected] · Frank Tsung: [email protected]
http://cfp.ist.utl.pt/golp/epp/
http://exodus.physics.ucla.edu/

Want to discuss what performance is currently possible with PIC.

This sets a bar for other kinetic approaches for HEDP

Results from a Joule Metric exercise on Jaguar: ~30% of peak speed can be obtained, and dynamic load balancing can be important.

High-order particle weighting

Run simulations for up to ~10^7 iterations:
• Stable algorithm
• Energy conservation
• Numerical-noise-seeded instabilities

Control numerical self-heating:
• Increase the number of particles per cell
• Use high-order particle weighting

(Plot: energy conservation, ~10^-8 to 10^-4, vs. iteration, 10^0 to 10^5, for 1, 4, 16, and 64 particles per cell, comparing linear and quadratic particle weighting; insets show the linear and quadratic particle shapes.)

Calculation overhead

Weights and interpolation by order:

Order 1 (linear): $W_0 = 1-\Delta$, $W_1 = \Delta$; $A_{\rm int} = \sum_{i=0}^{1} W_i A_i$

Order 2 (quadratic): $W_{-1} = \tfrac{1}{8}(1-2\Delta)^2$, $W_0 = \tfrac{3}{4} - \Delta^2$, $W_1 = \tfrac{1}{8}(1+2\Delta)^2$; $A_{\rm int} = \sum_{i=-1}^{1} W_i A_i$ (a code sketch of this case follows below)

Order 3 (cubic): $W_{-1} = \tfrac{1}{6}(1-\Delta)^3$, $W_0 = \tfrac{1}{6}\left(4 - 6\Delta^2 + 3\Delta^3\right)$, $W_1 = \tfrac{1}{6}\left(1 + 3\Delta + 3\Delta^2 - 3\Delta^3\right)$, $W_2 = \tfrac{\Delta^3}{6}$; $A_{\rm int} = \sum_{i=-1}^{2} W_i A_i$

(Plot: B-spline blending functions S(x) for linear, quadratic, and cubic weighting, x from -2 to 2 cells.)

Operation counts per interpolation:

     | Linear | Quadratic | Cubic
2D   | 219    | 480       | 890
3D   | 349    | 1180      | 2863

Measured performance: 3D quadratic is only ~2.0x slower than 3D linear, although the operation count suggests ~3.4x, thanks to pipeline and cache effects.
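As an illustration of the quadratic column of the table, here is a sketch of 2nd-order B-spline interpolation of a gridded quantity to particle positions in 1D (array and function names are mine; the weights are the $W_{-1,0,1}$ listed above):

```python
# Sketch: quadratic (2nd-order B-spline) interpolation of a gridded quantity A
# to particle positions x on a periodic 1D grid with spacing dx.
import numpy as np

def interp_quadratic(A_grid, x, dx):
    ng = A_grid.size
    i0 = np.rint(x / dx).astype(int)          # nearest grid point
    delta = x / dx - i0                        # offset, in [-1/2, 1/2]
    w_m = 0.125 * (1.0 - 2.0 * delta) ** 2     # W_-1
    w_0 = 0.75 - delta ** 2                    # W_0
    w_p = 0.125 * (1.0 + 2.0 * delta) ** 2     # W_+1
    return (w_m * A_grid[(i0 - 1) % ng]
            + w_0 * A_grid[i0 % ng]
            + w_p * A_grid[(i0 + 1) % ng])

# usage: interpolate a smooth field and compare with the analytic value
ng, dx = 128, 0.5
xg = dx * np.arange(ng)
A = np.cos(2 * np.pi * xg / (ng * dx))
xp = np.array([3.2, 17.7, 40.05])
print(interp_quadratic(A, xp, dx), np.cos(2 * np.pi * xp / (ng * dx)))
```

The same three weights, evaluated once per particle, are reused for the current deposit, which is why the operation count grows with interpolation order.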

Running with high order interpolation

Linear: significantly noisier; inferior energy conservation (16x worse than quadratic).

Quadratic: small overhead (advance/deposit 1.85x, total simulation 1.45x).

Higher-order interpolation:

Same level of energy conservation can be achieved faster!

OASCR Software Effectiveness Metric

OASCR Joule
• Analyze/improve applications requiring high capability/capacity HPC systems
• 4 applications selected for 2011: OMEN/NEMO 5, LAMMPS, OSIRIS and eSTOMP
• Q2 test problems:
  • Warm plasma, baseline code performance
  • 200 TW (6 Joule) → 1.5 x 10^18 cm^-3 uniform plasma
  • 1 PW (30 Joule) → 0.5 x 10^18 cm^-3 uniform plasma
• Run in ~1/4 of the full machine (Jaguar): 55296 cores, 20 hours wall clock, 1.1 M CPU hours
• Scale up to the full machine by Q4

Run       | Grid               | Simulation Box [c/ω0]      | Particles    | Iterations | Laser a0 | Ions
Warm test | 6144 x 6144 x 1536 | 614.4 x 614.4 x 153.6      | 4.46 x 10^11 | 5600       | n/a      | n/a
Run 1     | 8064 x 480 x 480   | 806.4 x 1171.88 x 1171.88  | 3.72 x 10^9  | 41000      | 4.0      | fixed
Run 2     | 8832 x 432 x 432   | 1766.4 x 2041.31 x 2041.31 | 6.59 x 10^9  | 47000      | 4.58     | fixed
Run 3     | 4032 x 312 x 312   | 806.4 x 1171.88 x 1171.88  | 1.26 x 10^10 | 52000      | 4.0      | moving

HPC system level parallelism

Distributed memory

• Each process cannot directly access memory on another node: information is exchanged between nodes using network messages (MPI)
• Standard parallelization uses a spatial decomposition: each node handles a specific region of simulation space
• Works very well also on multi-core nodes: message passing inside a node is very efficient
• Very efficient for uniform plasmas
• Susceptible to imbalance if particles accumulate in a subset of nodes

(Diagram: the simulation volume split into parallel domains; node 1 and node 2 exchange boundary messages msg 1→2 and msg 2→1.)
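A minimal mpi4py sketch of this spatial decomposition and of the neighbor particle exchange (the domain size, particle counts and exchange payload are illustrative, not OSIRIS code):

```python
# Sketch: spatial domain decomposition with MPI (mpi4py) and neighbor particle exchange.
# Illustrative only; domain sizes, particle counts and the exchange payload are made up.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
ndims = 2
dims = MPI.Compute_dims(comm.Get_size(), [0] * ndims)      # e.g. 4 ranks -> [2, 2]
cart = comm.Create_cart(dims, periods=[True] * ndims, reorder=True)
coords = cart.Get_coords(cart.Get_rank())

# Each rank owns one rectangular region of the global box [0, L)^ndims
L = 100.0
lo = np.array([L * c / d for c, d in zip(coords, dims)])
hi = np.array([L * (c + 1) / d for c, d in zip(coords, dims)])

# Local particles: random positions inside the local domain
rng = np.random.default_rng(cart.Get_rank())
x = lo + rng.random((1000, ndims)) * (hi - lo)

# After a push, particles that crossed the upper boundary of the local domain are
# sent to the neighbor in that direction (the lower boundary is handled the same
# way with cart.Shift(axis, -1); omitted to keep the sketch short).
for axis in range(ndims):
    src, dst = cart.Shift(axis, 1)
    leaving = x[:, axis] >= hi[axis]
    outgoing = x[leaving]
    x = x[~leaving]
    incoming = cart.sendrecv(outgoing, dest=dst, source=src)
    if len(incoming):
        x = np.vstack([x, np.asarray(incoming) % L])   # periodic wrap at the global box
```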

Parallel Overhead

Test problem:
• 3D warm plasma, quadratic interpolation, 8 particles per cell
• 216 (6^3) nodes
• Scan the problem size starting at the smallest possible size (5^3 grid / MPI process)

(Plots: simulation loop time [s] and code performance [particles/s] vs. particles per node, 10^2 to 10^7; parallel efficiency drops below ~16 k particles per node.)

• Above 128 k particles / 25 k cells per MPI process the performance is stable: perfect weak/strong scalability
• At 16 k particles per node the performance is ~50% of peak

Load Imbalance Limitations

Run                          | Warm         | LWFA-01      | LWFA-02      | LWFA-03
Partition                    | 48 x 48 x 24 | 96 x 24 x 24 | 96 x 24 x 24 | 96 x 24 x 24
Min particles/node           | 8.39E+06     | 1.20E+03     | 1.84E+01     | 6.01E+04
Max particles/node           | 8.39E+06     | 1.11E+06     | 2.02E+06     | 1.89E+06
Avg particles/node           | 8.39E+06     | 5.73E+04     | 9.69E+04     | 2.06E+05
Average unbalance (max/avg)  | 1            | 18.85        | 20.83        | 9.04
Average perf [part/s]        | 7.62E+10     | 4.22E+09     | 3.72E+09     | 8.85E+09
Average perf/core [M part/s] | 1.377        | 0.0764       | 0.0673       | 0.1601
Average push time [μs]       | 0.726        | 13.09        | 14.85        | 6.25
Push time / unbalance [μs]   | 0.726        | 0.695        | 0.713        | 0.691

Improving Load Imbalance

Using shared memory:
• A group of cores shares memory access / a simulation region:
  • Distribute particles evenly across these cores
  • The workload for these cores will always be perfectly balanced
  • Implement using OpenMP
  • Use 1 thread / core

Dynamic load balance:
• The code can change node boundaries dynamically to attempt to maintain an even load across nodes:
  • Determine the best possible partition from the current particle distribution
  • Reshape the simulation

(Diagrams: a 4-core node running one process with threads 0-3 sharing the particle load, vs. four single-core processes 0-3 with an even particle load across processes.)

SMP particle pusher

(Diagram: the particle buffer split across threads 1-3, each depositing into its own copy of the current grid, followed by a parallel reduction into the total electric current.)

Algorithm:
• An MPI process handles a given simulation region
• This process spawns nt threads to process the particles:
  • Use nt copies of the current grid
  • Divide particles evenly across threads
  • Each thread deposits current in only 1 of the grid copies
• Accumulate all current grids in a single (total) current grid with a parallel reduction
  • Divide this work also over nt threads
• Algorithm overhead:
  • Zeroing the additional grid copies
  • Reduction of the multiple current grids
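A sketch of the scheme above with the "threads" iterated serially for clarity (the real code uses OpenMP; the 1D linear deposit and all names are mine):

```python
# Sketch of the SMP deposition scheme: nt copies of the current grid, each "thread"
# deposits its share of the particles into its own copy, then the copies are reduced.
# Threads are emulated with a serial loop here; the real code uses OpenMP.
import numpy as np

def deposit_threaded(x, vx, ng, dx, nt=4, q=-1.0):
    # Overhead 1: zero the nt additional grid copies
    j_copies = np.zeros((nt, ng))

    # Divide the particles evenly across the "threads";
    # each deposits current into its own grid copy (1D, linear weighting)
    for tid, (xc, vc) in enumerate(zip(np.array_split(x, nt), np.array_split(vx, nt))):
        g = xc / dx
        i0 = np.floor(g).astype(int) % ng
        w1 = g - np.floor(g)
        np.add.at(j_copies[tid], i0, q * vc * (1.0 - w1))
        np.add.at(j_copies[tid], (i0 + 1) % ng, q * vc * w1)

    # Overhead 2: reduce the copies into the single total current grid
    # (in the OpenMP version this sum is itself split over the nt threads)
    return j_copies.sum(axis=0)
```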

Dynamic load balance performance

No overall speedup

• Best result:
  • Dynamic load balance along x2 and x3
  • Use max load
  • 30% improvement in imbalance, but...
  • ...led to a 5% overall slowdown!

• The ASCR problems are very difficult to load balance:
  • Very small problem size per node
  • When choosing partitions with fewer nodes, imbalance degrades significantly
  • Not enough room to dynamically load balance along the propagation direction
• Dynamic load balancing in the transverse directions does not yield big improvements:
  • Very different distributions from slice to slice along the propagation direction
  • Dynamic load balance in just 1 direction does not improve imbalance
  • Using the max node load helps to highlight the hot spots for the algorithm

Dynamic load balance

(Figure: node boundaries overlaid on the load estimate per vertical slice, comparing “only particles” with “particles + cells”, cell weight α = 1.5.)

• Load estimate considering particles per vertical slice
• Cell calculations (field solver) can be accounted for by including an effective cell weight
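A sketch of the corresponding 1D boundary calculation: build a load estimate per slice from particles plus an effective cell weight α, then place node boundaries at equal shares of the cumulative load (names and the α value are illustrative):

```python
# Sketch: choose 1D node boundaries from a per-slice load estimate,
#   load(slice) = particles_in_slice + alpha * cells_in_slice,
# then split the cumulative load into (nearly) equal shares.
import numpy as np

def balance_1d(particles_per_slice, cells_per_slice, n_nodes, alpha=1.5):
    load = particles_per_slice + alpha * cells_per_slice
    cum = np.cumsum(load)
    targets = cum[-1] * np.arange(1, n_nodes) / n_nodes
    # boundary k: first slice where the cumulative load reaches k/n_nodes of the total
    bounds = np.searchsorted(cum, targets) + 1
    return np.concatenate(([0], bounds, [load.size]))   # slice-index ranges per node

# usage: 64 vertical slices, particles bunched in the middle, 8 nodes
pps = 1.0e5 * np.exp(-((np.arange(64) - 32) / 8.0) ** 2)
cps = np.full(64, 1024.0)
print(balance_1d(pps, cps, n_nodes=8))
```

With the bunched particle distribution in the usage line, nodes covering the dense central slices end up with only a few slices each while the outer nodes take many, which is the reshaping shown in the figure.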

Multi-dimensional dynamic load balance

(Diagram: node boundaries at t = 0 and t′ > 0.)

Best partition:
• A difficult task: a single solution exists only for 1D parallel partitions
• Improving load balance in one direction might result in worse results in another direction

Multidimensional partition:
• Assume a separable load function (i.e. the load is a product fx(x) × fy(y) × fz(z))
• Each partition direction becomes independent
• Accounting for transverse directions: use the max / sum value across the perpendicular partition
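A sketch of this separable approach: collapse the 3D load onto each axis with a max (or sum) over the perpendicular directions, then partition each axis independently with the same equal-cumulative-load rule as before (all names and values are mine):

```python
# Sketch: multi-dimensional load balance assuming a separable load,
# f(x1,x2,x3) ~ f1(x1)*f2(x2)*f3(x3). Each axis profile is obtained by collapsing
# the 3D load with a max (or sum) over the perpendicular directions and is then
# partitioned independently at equal shares of its cumulative load.
import numpy as np

def split_axis(profile, n_nodes):
    cum = np.cumsum(profile)
    targets = cum[-1] * np.arange(1, n_nodes) / n_nodes
    return np.concatenate(([0], np.searchsorted(cum, targets) + 1, [profile.size]))

def balance_3d(load_3d, nodes_per_axis, collapse="max"):
    op = np.max if collapse == "max" else np.sum
    bounds = []
    for axis, n_nodes in enumerate(nodes_per_axis):
        other = tuple(a for a in range(3) if a != axis)
        profile = op(load_3d, axis=other)      # collapse onto this axis
        bounds.append(split_axis(profile, n_nodes))
    return bounds                               # one boundary array per axis

# usage: load concentrated near the head of the laser along x1
rng = np.random.default_rng(1)
load = rng.random((96, 24, 24)) + \
       10.0 * np.exp(-((np.arange(96)[:, None, None] - 60) / 6.0) ** 2)
print(balance_3d(load, nodes_per_axis=(8, 4, 4)))
```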

Parallel performance

(Plots: left, number of particles per node (min. and max. load) vs. iteration, 0 to 3000, with fixed boundaries and with dynamic load balance; right, total simulation time [s] vs. iterations between load balance, 128 down to 8, comparing load estimates from “particles only” and “particles and grids”.)

Load:
• The algorithm maintains the best possible balance
• Repartitioning the simulation is a time-consuming task: don't do it too often!

Performance:
• Particles & grids is ~50% better than particles only
• n_load-balance ~ 64 iterations yields good results
• Performance boost ≳ 2 (other scenarios can be as high as 4)

OASCR Full System Tests

PIC performance and machine efficiency (full system):

Run             | Algorithm performance [G particles/s] | Floating point performance [TFlops] | Speedup vs. Q2
Frozen (s1)     | 1464                                   | 517                                  |
Frozen (s2)     | 784                                    | 736                                  |
Warm.3d strong  | 680                                    | 720                                  | 9.45x
Warm.3d weak    | 700                                    | 741                                  | 9.73x
LWFA-01 strong  | 77                                     | 71                                   | 16.80x
HPL (reference) |                                        | 1759                                 |

PIC performance:
• 4x the number of cores
• Frozen tests for the full system: asymptotic limit for code performance
• Very large particle simulations: up to 1.86x10^12 particles
• Outstanding performance: up to 1.46x10^12 particles / s
• Excellent speedup from Q2: 9.45x to 16.80x

Machine efficiency:
• Very high floating point / parallel efficiency
• 0.74 PFlops peak performance: 32% of Rpeak (42% of Rmax)
• Quadratic interpolation improves the memory / operation ratio
• Linear interpolation has higher particle performance but a lower FLOP count

What was unthinkable is now possible

10^12 particles / 10^6 time steps:
• Today: 50-300 ns/particle/step
• Today: 10^12 particles can fit on a machine with ~0.1 PByte
• 10^12 particles / 10^6 time steps ~ 10^11 seconds, or ~100 days on 250,000 cores
• GPU / many-core can improve speed by 50-100x?
• Load balancing is an issue

HEDP:
• Make cell sizes less than, to much less than, a Debye length: the collision term becomes “correct”
• 3D EM PIC simulation of 100s of interacting speckles in 3D (1000s in 2D)
• Shocks
• ...and much more
