ASC Platforms' Impact on the U.S. HPC Industry
Performance Measures x.x, x.x, and x.x

High Performance Computing at Livermore: Petascale and Beyond
Presentation to LSD, May 2, 2008, Livermore, CA
Dr. Mark K. Seager, ASC Platforms Lead at LLNL
UCRL-PRES-403698. This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Talk Overview
. Current computing environments at LLNL
• Systems and simulation environment
. Peta- and exascale programmatic drivers
• Weapons physics for known unknowns
• Uncertainty quantification for the existing stockpile
. Sequoia target architecture
. Comments on the evolutionary changes required for a petascale programming model

Our platform strategy is to straddle multiple technology curves to appropriately balance risk against cost/performance benefit
Any given technology curve is ultimately limited by Moore's Law, so the strategy rides three complementary curves:
1. Vendor-integrated SMP clusters (IBM SP, HP SC) — the production environment for "must have" deliverables now. Delivers to today's stockpile's demanding needs.
2. Commodity Linux clusters — a "near production", riskier environment: capability systems for risk-tolerant programs and capacity systems for risk-averse programs. Delivers the transition for the next generation.
3. Advanced architectures (IBM BG/L, Cell) — the research environment, leading the transition to petaflop systems. Delivers an affordable path to petaFLOP/s. Are there other paths to a breakthrough regime by 2006-8?
(Chart: performance versus time for the three curves, with White, Q, and Purple on the curve-1 "mainframes (RIP)" line, MCR and Thunderbird on curve 2, and "Today" marked near FY05.)

NNSA has a sterling record of delivering production computers, setting the standard for national supercomputing user facilities
. ASC White [2001]: the first shared tri-laboratory resource came in 23% over its required peak.
. ASC Purple [2005]: the first NNSA User Facility came in $50M under budget and is routinely 90+% utilized.
(Chart: Purple utilization by month, Nov-10 through Dec-11, split into used, down, and idle time on a 0-100% scale.)

Curve 2 leveraged strategic tri-lab institutional investments to create huge efficiencies for the institution and programs with a scalable COTS and open-source software approach
. LLNL institutional investments filled gaps in Linux cluster technology:
• Early Lustre adoption, leading to an integrated simulation environment
• SLURM and Linux distribution development for manageability
• Thunder reached #2 on the 23rd TOP500 list
. Real-world synergy:
• The LLNL Peloton SU procurement covered three institutional clusters
• Enabled >50% TCO improvement for ASC
(Diagram: one scalable-unit (SU) building block — 138 4-socket, dual-core compute nodes (1,104 CPUs); 4x InfiniBand data network with a 144-port IBA switch per SU and a 288-port federated switch, alongside a QsNet Elan3 legend; 100BaseT control network; 1 login and 4 gateway nodes; remote partition server; metadata servers; and a federated 1/10 GbE management network. One "Peloton" SU ~ 5.5 TFLOPS; Minos = 6 SUs = 33 TF.)
Peloton is not a computer; it is a procurement and a way of thinking about aggressively achieving TCO efficiency.
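The SU figures above are internally consistent; as a quick worked check (the arithmetic is added here, not in the original):

\[
138 \text{ nodes} \times 4 \text{ sockets} \times 2 \text{ cores} = 1{,}104 \text{ CPUs per SU},
\qquad
6 \text{ SUs} \times 5.5\ \mathrm{TFLOPS/SU} = 33\ \mathrm{TFLOPS} \text{ (Minos)}.
\]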
America's dominance in simulation today is the result of NNSA's investments and its astute technical choices
. 37% of today's Top 500 systems depended on ASC technology investments.
. Today's top 12 supercomputers depended on NNSA investments.
. BG/L is #1 for the seventh consecutive TOP500 list.
. Four Gordon Bell prizes in three years.
. The world's first low-power supercomputer reduces the energy footprint by 75%.
(Chart: delivered speed in TF/s versus positions 1-12 on the TOP500 list, coding Purple/Power (IBM) sites, Blue Gene (IBM) sites, Red Storm (Cray) sites, and other sites; BG/L at LLNL sits on curve 3, Red Storm at SNL on curve 2, and ASC Purple at LLNL on curve 1.)

Livermore Currently Has the World's Best Computing Infrastructure, by a Wide Margin

The Open Computing Facility (OCF) simulation environment is based on Lustre
Compute platforms on the OCF's federated Ethernet core network: ALC (9 TFLOPS), Atlas (44 TFLOPS), Thunder (23 TFLOPS), Zeus (11 TFLOPS), Prism (1 TFLOP), Yana (3 TFLOPS), and BGL-Dev (6 TFLOPS), plus the IGS test system (95 TB). Lustre file systems: lscratcha (1.2 PB) and lscratchb (744 TB). Total: ~100 TFLOPS and ~2 PB.

The Secure Computing Facility (SCF) simulation environment replicates many aspects of the OCF
Compute platforms: BGL (596 TFLOPS), Minos (33 TFLOPS), Rhea (22 TFLOPS), Lilac (9 TFLOPS), Hopi (3 TFLOPS), and Gauss (2 TFLOPS), attached to the Lustre federated Ethernet core network at link rates ranging from 2.5 GB/s up to 75 GB/s (with 15, 30, and 45 GB/s tiers in between). Lustre file systems: lscratch1 (1 PB) and lscratch3 (2.5 PB). Total: 665 TFLOPS and 3.5 PB.
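The stated facility totals check out (worked arithmetic added here, not in the original):

\[
9+44+23+11+1+3+6 = 97 \approx 100\ \mathrm{TFLOPS} \text{ (OCF)},
\qquad
596+33+22+9+3+2 = 665\ \mathrm{TFLOPS} \text{ (SCF)}.
\]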
Predictive simulation roadmap through exascale
DP goals, in sequence: assess and certify without requiring reliance on UGTs; develop and demonstrate the capability to certify aging weapons with codes calibrated to past UGTs; certify LEPs and RRWs (near neighbors to the test base); and finally, validated predictive capabilities with the technical need for future nuclear tests eliminated.
ASC goals, in sequence: create 3D integrated design codes and calibrate them with AGEX and UGT data; replace ad hoc factors with physics-based models and transition to a quantified 3D predictive capability; perform first-principles/atomistic calculations of key physics and design with new materials or components in months.
Principal uncertainties addressed along the way: boost, energy balance, initial conditions for boost, and secondary performance.
(Timeline: 1996-2026, spanning terascale, petascale, and exascale systems, with uncertainty quantification throughout. Computing-power milestones: Red Storm at 124 TF; Purple at 100 TF; BG/L at 360, then 590 TF; Roadrunner; Dawn; Sequoia at 14-20 PF; 150 PF; and 1 EF.)

Predicting stockpile performance drives five separate classes of petascale calculations:
1. Quantifying uncertainty (for all classes of simulation)
2. Identifying and modeling missing physics (e.g., boost)
3. Improving accuracy in material property data
4. Improving models for known physical processes
5. Improving the performance of complex models and algorithms in macro-scale simulation codes
Each of these mission drivers requires petascale computing.

Sequoia is the key integrating tool for Stockpile Stewardship in the 2011-2015 time frame
. Sequoia will provide credible UQ for 3D stockpile certification: 4,400 runs at 10 TF/s per run in one month.
. Execution of the Integrated Boost Program (IBP) requires Sequoia; IBP cannot be done on Purple.
. IBP is necessary to achieve certification with predictive UQ.
(Chart: UQ capability versus resolution, from standard through high to ultra resolution. Today's 2D UQ capability — 4,400 runs at 10 TF/s/run — fits on 10% of Purple and baselines simulation uncertainties. A 5 PF/s system delivers 10X UQ capability, 4,400 3D runs in one month exposing true physics uncertainty to drive IBP, plus 100X capability runs: a first 3D look at initial conditions for boost, bounding the impact of 3D features on an RRW primary for engineering, and high-resolution discovery for IBP. Exascale, around 2020, brings certification with predictive UQ.)

Sequoia's target architecture leverages LLNL success with five generations of ASC platforms
. Sequoia peta-scale architecture:
• Smoothly integrates into the LLNL environment.
• Architecture suitable for weapons science and the Integrated Design Codes (IDC): 24x Purple for design codes in support of certification, and 20-50x for weapons-science requirements.
• Risk reduction is woven into every aspect of Sequoia: associated R&D contracts accelerate technology development, and targets and off-ramps allow innovation with shared risk.
• ASC HQ supports Sequoia, the ID system, and the R&D: $250M total project cost.

Each year we get more processors, not faster ones
. Historically, vendors boosted single-stream performance via more complex chips: first via one big feature, then via lots of smaller features.
. Now they deliver more cores per chip.
. The free lunch is over for today's sequential apps, and for many concurrent apps (expect some regressions).
. We need killer apps with lots of latent parallelism.
. A generational advance beyond objects (">OO") is necessary to get above the "threads + locks" programming model.
(Chart: Intel CPU trends from the 386 through the Pentium to Montecito; sources: Intel, Wikipedia, K. Olukotun. From Herb Sutter <[email protected]>.)

How many cores are you coding for?
(Chart: hardware parallelism per chip, 2005-2013, counting out-of-order cores, in-order cores, and in-order threads; totals double from 1-2 in 2005 through 8-16 in 2008, marked "You are here!", toward 256-512 in 2013, marked "How do we get to here?". From Herb Sutter <[email protected]>.)
Microprocessor parallelism will increase exponentially (2x every 1.5 years) over the next decade.

DRAM component density is doubling every 3 years
(Chart: DRAM density trend. Source: IBM.)

Memory power projection (quad-rank DIMM)
(Chart: power per DIMM in watts, 0-25 W, versus year, 2000-2018, across the DDR, DDR2, DDR3, and DDR4 generations. Source: IBM.)
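Taken together, these two trends squeeze memory per core. A rough worked comparison, extrapolating the deck's two stated rates (the arithmetic and the per-core reading are ours, not the slides'):

\[
\underbrace{2^{10/1.5} \approx 100\times}_{\text{cores per chip, per decade}}
\quad \text{versus} \quad
\underbrace{2^{10/3} \approx 10\times}_{\text{DRAM density, per decade}},
\]

so at roughly constant cost and power, memory per core falls about an order of magnitude per decade.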
Application Programming Model Requirements
. MPI parallelism at the top level.
• Static allocation of MPI tasks to nodes and to sets of cores + threads.
. Effectively absorb multiple cores + threads within an MPI task.
. Support multiple languages: C/C++/Fortran03.
. Allow different physics packages to express node concurrency in different ways.

Unified Nested Node Concurrency
(Diagram: four pthreads, Thread0 through Thread3, live from MPI_INIT to MPI_FINALIZE; only Thread0 issues MPI calls, while the other threads serve as workers in nested OpenMP and TM/SE parallel regions inside Funct1 and Funct2.)
1) Pthreads are born with MAIN.
2) Only Thread0 calls functions to nest parallelism.
3) The pthreads-based MAIN calls the OpenMP-based Funct1.
4) The OpenMP-based Funct1 calls the TM/SE-based Funct2.
5) Funct2 returns to the OpenMP-based Funct1.
6) Funct1 returns to the pthreads-based MAIN.
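A minimal C sketch of the funneled part of this pattern — MPI parallelism at the top level, OpenMP node concurrency inside a called function, and MPI calls made only by thread 0 — is shown below. The function and variable names (funct1, local_sum) are illustrative, not from the slides, and the TM/SE nesting level is omitted.

```c
/* Sketch: hybrid MPI + OpenMP with MPI calls funneled through thread 0. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* A "physics package" expressing node concurrency with OpenMP. */
static double funct1(int n)
{
    double local_sum = 0.0;
    /* Worker threads join here; control returns to the calling thread. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n; i++)
        local_sum += (double)i;
    return local_sum;
}

int main(int argc, char **argv)
{
    int provided, rank;

    /* FUNNELED: the process is multithreaded, but only the thread that
     * called MPI_Init_thread (thread 0) will ever make MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = funct1(1000000);
    double global = 0.0;

    /* Back on thread 0 only, so funneled MPI communication is safe. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %g\n", global);

    MPI_Finalize();
    return 0;
}
```

Requesting MPI_THREAD_FUNNELED rather than MPI_THREAD_MULTIPLE matches the diagram's discipline that only Thread0 talks to MPI, and it asks much less of the MPI library on a highly threaded node.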