
Accelerating Research and Development Using the

Fernanda Foertter HPC User Support, ORNL

ORNL is managed by UT-Battelle for the US Department of Energy

What is the Leadership Computing Facility (LCF)?

• Collaborative DOE Office of Science program at ORNL and ANL
• Mission: Provide the computational and data resources required to solve the most challenging problems.
• 2-centers/2-architectures to address diverse and growing computational needs of the scientific community
• Highly competitive user allocation programs (INCITE, ALCC).
• Projects receive 10x to 100x more resource than at other generally available centers.
• LCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).

The OLCF has delivered five systems and six upgrades to our users since 2004

• Increased our system capability by 10,000x
• Strong partnerships with computer designers and architects
• Worked with users to scale codes by 10,000x
• Science delivered through strong user partnerships to scale codes and algorithms

2004: Phoenix X1
  • Doubled size
  • X1e
2005: Jaguar XT3
  • Dual core upgrade
2007: Jaguar XT4
  • Quad core upgrade
2008: XT5
  • 6 core upgrade
2012: Titan XK7
  • GPU upgrade

Science breakthroughs at the OLCF: SELECTED science and engineering advances over the period 2003 - 2013

• Researchers solved the 2D Hubbard model and presented evidence that it predicts HTSC behavior, Phys. Rev. Lett. (2005). 105 citations, 3/2014
• First-Principles Flame Simulation Provides Crucial Information to Guide Design of Fuel-Efficient Clean Engines, Proc. Combust. Inst. (2007). 78 citations, 3/2014
• Largest simulation of a galaxy’s worth of dark matter, showed for the first time the fractal-like appearance of dark matter substructures, Nature (2008). 326 citations, 3/2014
• Biomass as a viable, sustainable feedstock for hydrogen production for fuel cells, Nano Letters (2011) and J. Phys. Chem. Lett. (2010). 71 & 74 citations, respectively
• Demonstrated that three-body forces are necessary to describe the long lifetime of 14C, Phys. Rev. Lett. (2011). 28 citations, 3/2014
• Calculation of the number of bound nuclei in nature, Nature (2012). 36 citations, 3/2014
• Global Warming preceded by increasing carbon dioxide concentrations during the last deglaciation, Nature (2012). 64 citations, 3/2014
• MD simulations show selectivity filter of a trans-membrane ion channel is sterically locked open by hidden water molecules, Nature (2013)
• World’s first continuous simulation of 21,000 years of Earth’s climate history, Science (2009), and Astrophysicists discover shock-wave instability, Astrophys. J. (2003). 116 citations and 254 citations, 3/2014 (the flattened slide text does not show which count belongs to which paper)

No more free lunch

Herb Sutter, Dr. Dobb’s Journal: http://www.gotw.ca/publications/concurrency-ddj.htm

Power is THE problem

Power consumption of 2.3 PF (Peak) Jaguar: 7 megawatts, equivalent to that of a small city (5,000 homes)

Using traditional CPUs is not economically feasible

20 PF+ system: 30 megawatts (30,000 homes)

Why GPUs? Hierarchical Parallelism
High performance and power efficiency on path to exascale

• Hierarchical parallelism improves scalability of applications
• Expose more parallelism through code refactoring and source code directives
  – Doubles performance of many codes
• Heterogeneous multicore processor architecture: Using right type of processor for each task
• Data locality: Keep data near processing
  – GPU has high bandwidth to local memory for rapid access
  – GPU has large internal
• Explicit data management: Explicitly manage data movement between CPU and GPU memories

CPU
• Optimized for sequential multitasking

GPU Accelerator
• Optimized for many simultaneous tasks
• 10x performance per socket
• 5x more energy-efficient systems

#2

8.2 Megawatts 27 Pflops (Peak) 17.59 PFlops (Linpack)
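The power argument on these slides can be checked with a line of arithmetic. The sketch below divides peak performance by power draw for Jaguar (2.3 PF, 7 MW) and Titan (27 PF, 8.2 MW), using only the figures quoted in this deck. The function name `gflops_per_watt` is an illustrative choice, and pairing peak flops with these power figures is a simplifying assumption, since the slides do not say under what load the power was measured.

```python
def gflops_per_watt(peak_pflops: float, megawatts: float) -> float:
    """Peak GFLOPS delivered per watt of power draw."""
    gflops = peak_pflops * 1e6   # 1 PFLOPS = 1e6 GFLOPS
    watts = megawatts * 1e6      # 1 MW = 1e6 W
    return gflops / watts


# Figures quoted on the slides (peak performance, power draw).
jaguar = gflops_per_watt(2.3, 7.0)   # CPU-only Jaguar
titan = gflops_per_watt(27.0, 8.2)   # hybrid CPU/GPU Titan

print(f"Jaguar: {jaguar:.2f} GFLOPS/W")
print(f"Titan:  {titan:.2f} GFLOPS/W")
print(f"Improvement: {titan / jaguar:.1f}x")
```

Under these assumptions Titan comes out roughly ten times more efficient per watt than Jaguar.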

Roadmap to Exascale

Our Science requires that we advance computational capability 1000x over the next decade.
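To put the 1000x target in perspective, a constant yearly growth factor g sustained over n years must satisfy g^n = 1000. The sketch below solves for g. The helper name `annual_growth` is illustrative, and steady compound growth is an assumption, since real systems arrive in discrete steps.

```python
def annual_growth(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


g = annual_growth(1000.0, 10.0)
print(f"1000x in 10 years needs about {g:.2f}x per year")
```

In other words, a 1000x advance in a decade corresponds to capability roughly doubling every year.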

2022: OLCF-5: 1 EF, 20 MW
2017: OLCF-4: 100-250 PF, 4000 TB memory, > 20 MW, 6 day resilience
2012: Titan: 27 PF, 600 TB DRAM, Hybrid GPU/CPU

What are the Challenges?

Requirements gathering
DOE/SC and LCFs support a diverse user community

• Science benefits and impact of future systems are examined on an ongoing basis
• LCF staff have been actively engaged in community assessments of future computational needs and solutions
• Computational science roadmaps are developed in collaboration with leading domain scientists
• Detailed performance analyses are conducted for applications to understand future architectural bottlenecks
• Analysis of INCITE, ALCC, Early Science, and Center for Accelerated Application Readiness (CAAR) projects history and trends

Science Category: Represented Research Areas
Biology: Bioinformatics, Biophysics, Life Sciences, Medical Science, Neuroscience, Proteomics, Systems Biology
Chemistry: Chemistry, Physical Chemistry
Computer Science: Computer Science
Earth Science: Climate, Geosciences
Engineering: Aerodynamics, Bioenergy, Combustion, Turbulence
Fusion: Fusion Energy, Plasma Physics
Materials: Materials Science, Nanoelectronics, Nanomechanics, Nanophotonics, Nanoscience
Nuclear Energy: Nuclear Fission
Physics: Accelerator Physics, Atomic/Molecular Physics, Condensed Matter Physics, High Energy Physics, Lattice Gauge Theory, Nuclear Physics, Solar/Space Physics

Requirements Process

• Surveys are a “lagging indicator” that tend to tell us what problems the users are seeing now, not what they expect to see in the future

• https://www.olcf.ornl.gov/wp-content/uploads/2013/01/OLCF_Requirements_TM_2013_Final.pdf

OLCF User Requirements Survey – Key Findings

• Memory bandwidth was reported as the greatest need
• Local memory capacity was not a driver for most users, perhaps in recognition of cost trends
• 76% of users said there is still a moderate to large amount of parallelism to extract in their code, but…
• 85% of users rated difficulty level of extracting that parallelism as moderate to difficult - often requires application refactoring
  – Highlights training needs and community based efforts for application readiness

Hardware feature             Ranking
Memory Bandwidth             4.4
Flops                        4.0
Interconnect Bandwidth       3.9
Archival Storage Capacity    3.8
Interconnect Latency         3.7
Disk Bandwidth               3.7
WAN Network Bandwidth        3.7
Memory Latency               3.5
Local Storage Capacity       3.5
Memory Capacity              3.2
Mean Time to Interrupt       3.0
Disk Latency                 2.9

Rankings from OLCF users: 1=not important, 5=very important

Center for Accelerated Application Readiness (CAAR)

• We created CAAR as part of the Titan project to help prepare applications for accelerated architectures
• Goals:
  – Work with code teams to develop and implement strategies for exposing hierarchical parallelism for our users’ applications
  – Maintain code portability across modern architectures
  – Learn from and share our results
• We selected six applications from across different science domains and algorithmic motifs

CAAR Plan

• Comprehensive team assigned to each app
  – OLCF application lead
  – engineer
  – developer
  – Other: other application developers, local tool/library developers, computational scientists
• Single early-science problem targeted for each app
  – Success on this problem is ultimate metric for success
• Particular plan-of-attack different for each app
  – WL-LSMS – dependent on accelerated ZGEMM
  – CAM-SE – pervasive and widespread custom acceleration required
• Multiple acceleration methods explored
  – WL-LSMS – CULA, MAGMA, custom ZGEMM
  – CAM-SE – CUDA, directives
  – Two-fold aim
    – Maximum acceleration for model problem
    – Determination of optimal, reproducible acceleration path for other applications

Early Science Challenges for Titan

• WL-LSMS: Illuminating the role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
• LAMMPS: A simulation of organic polymers for applications in organic photovoltaic heterojunctions, de-wetting phenomena and biosensor applications
• CAM-SE: Answering questions about specific climate change adaptation and mitigation scenarios; realistically represent features like precipitation patterns / statistics and tropical storms.
• S3D: Understanding turbulent combustion through direct numerical simulation with complex chemistry.
• Denovo: Discrete ordinates radiation transport calculations that can be used in a variety of nuclear energy and technology applications.
• NRDF: Radiation transport – important in astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging – computed on AMR grids.

Effectiveness of GPU Acceleration?
OLCF-3 Early Science Codes -- Performance on Titan XK7

Cray XK7 vs. Cray XE6

Application                                             Performance Ratio*
LAMMPS* (Molecular dynamics)                            7.4
S3D (Turbulent combustion)                              2.2
Denovo (3D neutron transport for nuclear reactors)      3.8
WL-LSMS (Statistical mechanics of magnetic materials)   3.8

Titan: Cray XK7 (Kepler GPU plus AMD 16-core CPU)
Cray XE6: (2x AMD 16-core Opteron CPUs)
*Performance depends strongly on specific problem size chosen
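Speedup ratios are commonly summarized with a geometric mean rather than an arithmetic one, because ratios combine multiplicatively. As a sketch, the snippet below applies that to the four XK7 vs. XE6 ratios listed above. The summary figure is derived here and is not a number reported on the slide, and it inherits the caveat that each ratio depends strongly on the problem size chosen.

```python
import math

# XK7 vs. XE6 performance ratios as listed for the early science codes.
ratios = {"LAMMPS": 7.4, "S3D": 2.2, "Denovo": 3.8, "WL-LSMS": 3.8}


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


summary = geometric_mean(ratios.values())
print(f"Geometric mean speedup: {summary:.2f}x")
```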

Additional Applications from Community Efforts
Current Performance Measurements on Titan

Cray XK7 vs. Cray XE6

Application                                                   Performance Ratio*
AWP-ODC (Seismology)                                          2.1
DCA++ (Condensed Matter Physics)                              4.4
QMCPACK (Electronic structure)                                2.0
RMG (DFT – real-space, multigrid) (Electronic Structure)      2.0
XGC1 (Plasma Physics for Fusion Energy R&D)                   1.8

Titan: Cray XK7 (Kepler GPU plus AMD 16-core Opteron CPU)
Cray XE6: (2x AMD 16-core Opteron CPUs)
*Performance depends strongly on specific problem size chosen

All Codes Will Need Rework To Scale!

• Up to 1-2 person-years required to port each code from Jaguar to Titan
  – Takes work, but an unavoidable step required for exascale regardless of the type of processors. It comes from the required level of parallelism on the node
  – Also pays off for other systems: the ported codes often run significantly faster CPU-only (Denovo 2X, CAM-SE >1.7X)
• We estimate possibly 70-80% of developer time is spent in code restructuring, regardless of whether using OpenMP / CUDA / OpenCL / OpenACC / …
• Each code team must make its own choice of using OpenMP vs. CUDA vs. OpenCL vs. OpenACC, based on the specific case; the conclusion may be different for each code
• Our users and their sponsors must plan for this expense.

High-impact science across a broad range of disciplines (2013)

• Paleoclimate Science: “Northern Hemisphere forcing of Southern Hemisphere climate during the last deglaciation,” Feng He (UW Madison), et al., Nature, February (2013)
• Molecular Biology: MD simulations show selectivity filter of a trans-membrane ion channel is sterically locked open by hidden water. Jared Ostmeyer, et al. (U. Chicago), Nature, Sept. (2013)
• Superconductivity: “Doping dependence of spin excitations and correlations with high-temperature superconductivity in iron pnictides,” Meng Wang (IOP CAS Beijing), Nature Communications, December (2013)
• Molecular Biology: “A phenylalanine rotameric switch for signal-state control in bacterial chemoreceptors,” D. Ortega (UTK), Nature Communications, December (2013)
• Polymer Science: “Self-Organized and Cu-Coordinated Surface Linear Polymerization,” Qing Li, B. Sumpter (ORNL), Nature Scientific Reports, July (2013)
• Complex Oxide Materials: “Atomically resolved spectroscopic study of Sr2IrO4: Experiment and theory,” Qing Li (ORNL), E.G. Eguiluz (UTK), Nature Scientific Reports, October (2013)

Increasing Usage of GPUs

INCITE 2013

INCITE 2014

As measured by ALTD against linked libraries

Advancing Department of Energy’s Strategic Plan

Goal 2: Maintain a vibrant U.S. effort in science and engineering as a cornerstone of our economic prosperity, with clear leadership in strategic areas.

Priority:
• Lead Computational Sciences and High-Performance Computing

Targeted Outcome:
• Continue to develop and deploy high-performance computing hardware and software systems through exascale platforms.

Advancing Department of Energy’s Strategic Plan

Goal 1: Catalyze the timely, material, and efficient transformation of the nation’s energy system and secure U.S. leadership in clean energy technologies.

Priority: “We will facilitate the transfer of our computer simulation capability to industry with the goal of accelerating energy technology innovation by improving designs, compressing the design cycle and easing the transitions to scale, thereby enhancing US economic competitiveness.”

New Industry Projects

Chart: Number of New Industry Projects Launched at OLCF, by year, 2005-2014*

Current as of March 2014. * Year in progress

Growth of Industry Projects

Chart: Number of Industry Projects Underway at OLCF, by year, 2005-2014*

Current as of March 2014. * Year in progress

Innovation through Industrial Partnerships

• Human skin barrier: Demonstrated small molecules can have large and varying impact on skin permeability depending on their molecular characteristics; important for product efficacy and safety
• Global flood maps: Developed fluvial and pluvial high resolution global flood maps to enable insurance firms to better price risk and reduce loss of life and property
• Engine cycle-to-cycle variation: Developing novel approach to using massively parallel, multiple simultaneous combustion cycle simulations to address cycle-to-cycle variations in spark ignition engine
• Fuel efficient jet engines: Conducting first-of-a-kind high-fidelity LES computations of flow in turbomachinery components for more fuel efficient, next-generation jet engines
• Wind turbine resilience: First time simulation of ice formation within million-molecule water droplets is expanding understanding of freezing at the molecular level to enhance wind turbine resilience in cold climates
• Welding Software: Evaluating large-scale HPC and GPU capability of critical welding simulation software and further developing & testing weld optimization algorithm

Innovation through Industrial Partnerships

• Aircraft design: Unexpected discovery of multiple solutions for steady RANS equations with separated flow helps explain why numerical modeling sometimes fails to capture maximum lift
• Consumer product stability: Developed method to measure impact of additives, such as dyes and perfumes, on properties of lipid systems such as fabric enhancer and other formulated products
• Gasoline engine injector: Optimizing multihole gasoline spray injector nozzle designs for better in-cylinder fuel-air mixture distributions, greater fuel efficiency and reduced physical prototypes
• Jet engine efficiency: Accurate predictions of atomization of liquid fuel by aerodynamic forces enhance combustion stability, improve efficiency, and reduce emissions
• Li-ion batteries: New classes of solid inorganic Li-ion electrolytes could deliver high ionic and low electronic conductivity and good electrochemical stability
• Underhood cooling: Developed a new, efficient and automatic analytical cooling package optimization process leading to one of a kind design optimization of cooling systems

Innovation through Industrial Partnerships

• Catalysis: Demonstrated biomass as a viable, sustainable feedstock for hydrogen production for fuel cells; showed nickel is a feasible catalytic alternative to platinum
• Design innovation: Accelerating design of shock wave turbo compressors for carbon capture and sequestration
• Aircraft design: Simulated takeoff and landing scenarios improved a critical code for estimating characteristics of commercial aircraft, including lift, drag, and controllability
• Industrial fire suppression: Developing high-fidelity modeling capability for fire growth and suppression; fire losses account for 30% of U.S. property loss costs
• Turbo machinery efficiency: Simulated unsteady flow in turbo machinery, opening new opportunities for design innovation and efficiency improvements.
• Long-haul truck fuel efficiency: Simulations reduced by 50% the time to develop a unique system of add-on parts that increases fuel efficiency by 7-12% for long-haul (18-wheeler) trucks

Education and Training at OLCF

• Tutorials online
• Events open to general public
• Upcoming events
  – OpenACC Hackathon
  – Data Workshop
• Contact me
  – [email protected]

LCF User Programs

• 60% INCITE (6 Billion Hours, 60 Projects)
• 30% ALCC
• 10% Director’s Discretionary

Getting Started at OLCF: Project Allocation Requests

https://www.olcf.ornl.gov/support/getting-started/

INCITE 2015 Call opens soon

• Planning Request for Information (RFI)
• Call opens April, 2014. Closes June, 2014
• Expect to allocate more than 5 billion core-hours
• Expect 3X oversubscription
• Awards to be announced in November for CY 2015
• Average award to exceed 50 million core-hours
• INCITE Proposal Writing Webinars!
  – April 22, 2014 1:30pm EST
  – May 15, 2014 9:30am EST

Contact information: Julia White, INCITE Manager, [email protected]

Conclusions

• DOE will continue to develop and deploy high-performance computing hardware and software systems through exascale platforms.

• Exascale will require investment in preparing applications

• OLCF will continue to support users via training and staffing.

• The road to exascale will make impossible science possible.

Acknowledgements

OLCF Staff: Jack Wells
Mike Matheson and Dave Pugmire (ORNL) for visualizations
OLCF user requirements process: Ricky Kendall & Doug Kothe (ORNL)
OLCF-3 Vendor Partners: Cray, AMD, NVIDIA, CAPS, Allinea

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
