Tsubame 2.5 Towards 3.0 and Beyond to Exascale


Being Very Green with TSUBAME 2.5, towards 3.0 and beyond to Exascale
Satoshi Matsuoka, Professor, Global Scientific Information and Computing (GSIC) Center, Tokyo Institute of Technology; ACM Fellow / SC13 Tech Program Chair
NVIDIA Theater Presentation, November 19, 2013, Denver, Colorado

TSUBAME2.0, November 1, 2010: "The Greenest Production Supercomputer in the World". The newly developed system spans a hierarchy from ~1 kW nodes (>400 GB/s memory bandwidth, 80 Gbps network bandwidth, 40 nm/32 nm silicon) up to the full 1.4 MW system with >600 TB/s aggregate memory bandwidth and 220 Tbps network bisection bandwidth.

Performance comparison of CPU vs. GPU (peak GFLOPS and memory bandwidth in GB/s): the GPU holds a 5-6x socket-to-socket advantage in both compute and memory bandwidth at the same power (200 W GPU vs. 200 W CPU + memory + network + ...).

TSUBAME2.0 compute node: 1.6 TFlops, 400 GB/s memory bandwidth, 80 Gbps network, ~1 kW max; productized as the HP ProLiant SL390s (HP SL390G7, developed for TSUBAME 2.0).
- GPU: NVIDIA Fermi M2050 x 3 (515 GFlops, 3 GB of memory per GPU)
- CPU: Intel Westmere-EP 2.93 GHz x 2 (12 cores/node)
- Multiple I/O chips, 72 PCI-e lanes (16 x 4 + 4 x 2) feeding 3 GPUs + 2 Infiniband QDR links (80 Gbps)
- Memory: 54 or 96 GB DDR3-1333; SSD: 60 GB x 2 or 120 GB x 2
System totals: 2.4 PFlops peak, ~100 TB of memory, ~200 TB of SSD.

2010: TSUBAME2.0 was No. 1 in Japan, with 2.4 Petaflops total, more than all other Japanese centers on the Top500 combined (2.3 Petaflops); #4 on the Top500, November 2010.

TSUBAME wins awards: "Greenest Production Supercomputer in the World" on the Green 500 in November 2010 and June 2011 (while #4 on the November 2010 Top500), about three times more power-efficient than a laptop; and the ACM Gordon Bell Prize 2011, Special Achievements in Scalability and Time-to-Solution, for the 2.0 Petaflops dendrite simulation "Peta-Scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer".

TSUBAME's three key application areas, "of high national interest and societal benefit to the Japanese taxpayers":
1. Safety/Disaster & Environment
2. Medical & Pharmaceutical
3. Manufacturing & Materials
plus co-design for general IT industry and ecosystem impact (IDC, Big Data, etc.).

Lattice-Boltzmann LES with a coherent-structure SGS model [Onodera & Aoki 2013]: in the coherent-structure Smagorinsky model, the model parameter is determined locally from the second invariant (Q) of the velocity gradient tensor and the energy dissipation (ε), which makes the method well suited to turbulent flow around complex objects and to large-scale parallel computation.
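The coherent-structure Smagorinsky closure mentioned above is usually written as follows. This is a reconstruction of the standard published form of the model, not taken from the slides; the constant C1 (and any corrections) actually used in the TSUBAME runs may differ.

```latex
% Coherent-structure Smagorinsky model (standard published form; a reconstruction,
% the constant C_1 used in the TSUBAME runs may differ)
\begin{align*}
  \nu_{\mathrm{SGS}} &= C\,\bar{\Delta}^{2}\,|\bar{S}|, &
  C &= C_{1}\,|F_{\mathrm{CS}}|^{3/2}, &
  F_{\mathrm{CS}} &= \frac{Q}{E}, \\
  Q &= \tfrac{1}{2}\bigl(\bar{W}_{ij}\bar{W}_{ij} - \bar{S}_{ij}\bar{S}_{ij}\bigr), &
  E &= \tfrac{1}{2}\bigl(\bar{W}_{ij}\bar{W}_{ij} + \bar{S}_{ij}\bar{S}_{ij}\bigr), &
  -1 &\le F_{\mathrm{CS}} \le 1.
\end{align*}
```

Here S_ij and W_ij are the resolved strain-rate and vorticity tensors and Δ is the filter width; the slide's "energy dissipation (ε)" corresponds to the denominator E. Because F_CS is bounded, the coefficient stays well behaved and vanishes in laminar regions without the averaging a dynamic Smagorinsky model would need, which is what makes the model attractive for complex urban geometry on thousands of GPUs.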
The computational area covers the major part of downtown Tokyo, a 10 km x 10 km region including the Shinjuku, Chiyoda, Minato, Meguro, Chuo, Shibuya and Shinagawa wards (building data: Pasco Co. Ltd.; map data © 2012 Google, ZENRIN). The simulation achieved 0.592 Petaflops using over 4,000 GPUs (15% efficiency). A close-up of the area around the Tokyo Metropolitan Government Building shows the flow profile at a height of 25 m above the ground over a 960 m x 640 m domain. (Images: Takayuki Aoki / Global Scientific Information and Computing Center, Tokyo Institute of Technology.)

A related LBM computation of the turbulent flow around a car and driver (BMW-Audi; driver geometry from the Lehrstuhl für Aerodynamik und Strömungsmechanik, Technische Universität München) at Re = 1,000,000 and 60 km/h used 3,072 x 1,536 x 768 = 3,623,878,656 grid points at 4.2 mm resolution (a 13 m x 6.5 m x 3.25 m domain) on 288 GPUs (96 nodes).

TSUBAME2.0 also contributed significantly to the creation of the National Earthquake Hazard Map.

Industry programs (e.g., with TOTO Inc.): 150 TSUBAME GPUs combined with an in-house cluster accelerate in-silico screening and data mining. Mixed-precision Amber on TSUBAME2.0 for industrial drug discovery runs about 10x faster and 75% more energy-efficiently (nucleosome benchmark, 25,095 particles). At a development cost of $500 million to $1 billion per drug, a >10% improvement of the process will more than pay for TSUBAME.

Towards TSUBAME 3.0: the interim upgrade from TSUBAME2.0 to 2.5 (September 10, 2013)
- Upgrade TSUBAME2.0's GPUs from NVIDIA Fermi M2050 to Kepler K20X (3 x 1408 = 4224 GPUs, summer 2013), raising the SFP/DFP peak from 4.8 PF / 2.4 PF to 17 PF / 5.7 PF (c.f. the K Computer at 11.2 PF / 11.2 PF).
- Acceleration of important applications, with considerable improvement.
- A significant capacity improvement at low cost and without a power increase, bridging to TSUBAME3.0 in 2H2015.

TSUBAME2.0 => 2.5 thin node upgrade (HP SL390G7, developed for TSUBAME 2.0 and modified for 2.5): peak performance 4.08 TFlops, ~800 GB/s memory bandwidth, Infiniband QDR x 2 (80 Gbps), ~1 kW max per node.
- GPU: NVIDIA Kepler K20X x 3 (6 GB of memory per GPU; 3950/1310 GFlops SFP/DFP per K20X vs. 1039/515 GFlops for the Fermi M2050)
- CPU: Intel Westmere-EP 2.93 GHz x 2
- Multiple I/O chips, 72 PCI-e lanes (16 x 4 + 4 x 2) feeding 3 GPUs + 2 Infiniband QDR links
- Memory: 54 or 96 GB DDR3-1333; SSD: 60 GB x 2 or 120 GB x 2

TSUBAME2.0 => 2.5 changes:
- Doubled to tripled performance: 2.4 (DFP) / 4.8 (SFP) Petaflops => 5.76 (x2.4) / 17.1 (x3.6). Preliminary results: ~2.7 PF Linpack (x2.25), ~3.4 PF for the dendrite Gordon Bell application (x1.7).
- Bigger and higher-bandwidth GPU memory: 3 GB => 6 GB per GPU, 150 GB/s => 250 GB/s.
- Higher reliability: a minor hardware bug was resolved, so compute-node fail-stop occurrences should decrease by up to 50%.
- Lower power: a ~20% drop in power/energy is being observed (tentative).
- Better programmability with new GPU features: dynamic parallelism, Hyper-Q, and CPU/GPU shared memory.
- Prolongs the TSUBAME2 lifetime by at least one year, with TSUBAME 3.0 due in FY2015 Q4.

Thin node comparison (x 1408 units):
  Item                      TSUBAME2.0                        TSUBAME2.5
  Machine                   HP ProLiant SL390s                (no change)
  CPU                       Intel Xeon X5670 x 2              (no change)
                            (6-core 2.93 GHz, Westmere)
  GPU                       NVIDIA Tesla M2050 x 3            NVIDIA Tesla K20X x 3
  CUDA cores per GPU        448 (Fermi)                       2688 (Kepler)
  GPU SFP / DFP             1.03 / 0.515 TFlops               3.95 / 1.31 TFlops
  GPU memory                3 GiB GDDR5                       6 GiB GDDR5
  GPU memory BW             150 GB/s peak, ~90 GB/s STREAM    250 GB/s peak, ~180 GB/s STREAM
  Node SFP / DFP            3.40 / 1.70 TFlops                12.2 / 4.08 TFlops
  (incl. CPU, Turbo Boost)
  Node memory BW            ~500 GB/s peak, ~300 GB/s STREAM  ~800 GB/s peak, ~570 GB/s STREAM
  System SFP                4.80 PFlops                       17.1 PFlops (x3.6)
  System DFP                2.40 PFlops                       5.76 PFlops (x2.4)
  System memory BW          ~0.70 PB/s peak,                  ~1.16 PB/s peak,
                            ~0.440 PB/s STREAM                ~0.804 PB/s STREAM (x1.8)

2013: TSUBAME2.5 is No. 1 in Japan* in single-precision floating point with 17 Petaflops (*but not in Linpack). Its 17.1 Petaflops SFP exceeds all university centers combined (~9 Petaflops SFP) and the K Computer (11.4 Petaflops, SFP and DFP alike); in double precision TSUBAME2.5 delivers 5.76 Petaflops.

Phase-field simulation for dendritic solidification [Shimokawabe, Aoki et al.], Gordon Bell 2011 winner. Weak scaling on TSUBAME in single precision, with a mesh of 4096 x 162 x 130 per GPU + 4 CPU cores:
- TSUBAME 2.5: 3.444 PFlops (3,968 GPUs + 15,872 CPU cores) on a 4,096 x 5,022 x 16,640 mesh
- TSUBAME 2.0: 2.000 PFlops (4,000 GPUs + 16,000 CPU cores) on a 4,096 x 6,480 x 13,000 mesh
Peta-scale phase-field simulations can simulate the multiple dendritic growth during solidification required for the evaluation of new materials, e.g. developing lightweight strengthened materials by controlling the microstructure, towards a low-carbon society. The work received the 2011 ACM Gordon Bell Prize, Special Achievements in Scalability and Time-to-Solution.

Peta-scale stencil application: a large-scale LES wind simulation using the lattice Boltzmann method [Onodera, Aoki]. The target is a wind simulation for a 10 km x 10 km area of metropolitan Tokyo on a 10,080 x 10,240 x 512 grid (4,032 GPUs). Weak scalability in single precision (N = 192 x 256 x 256 per GPU, with overlap):
- TSUBAME 2.5: 1,142 TFlops on 3,968 GPUs (288 GFlops/GPU), x1.93 over TSUBAME 2.0
- TSUBAME 2.0: 149 TFlops on 1,000 GPUs (149 GFlops/GPU)
These peta-scale simulations were executed under the TSUBAME Grand Challenge Program, Category A, in fall 2012. An LES wind simulation of a 10 km x 10 km area at 1 m resolution had never been done before anywhere in the world; 1.14 PFLOPS was achieved using 3,968 GPUs on the TSUBAME 2.5 supercomputer.

AMBER pmemd benchmark (nucleosome, 25,095 atoms; Dr. Sekijima, Tokyo Tech), in ns/day:
  TSUBAME2.5, K20X x 8 / x 4 / x 2 / x 1:          11.39 / 6.66 / 4.04 / 3.11
  TSUBAME2.0, M2050 x 8 / x 4 / x 2 / x 1:         3.44 / 2.22 / 1.85 / 0.99
  CPU-only MPI, 4 / 2 / 1 node(s) (12 cores/node): 0.31 / 0.15 / 0.11

Application performance, TSUBAME2.0 vs. TSUBAME2.5:
  Top500/Linpack, 4,131 GPUs (PFlops):                                      1.192 -> 2.843 (x2.39)
  Semi-definite programming (nonlinear optimization), 4,080 GPUs (PFlops):  1.019 -> 1.713 (x1.68)
  Gordon Bell dendrite stencil, 3,968 GPUs (PFlops):                        2.000 -> 3.444 (x1.72)
  LBM LES whole-city airflow, 3,968 GPUs (PFlops):                          0.592 -> 1.142 (x1.93)
  Amber 12 pmemd, 4 nodes / 8 GPUs (ns/day):                                3.44 -> 11.39 (x3.31)
  GHOSTM genome homology search, 1 GPU (seconds):                           19,361 -> 10,785 (x1.80)
  MEGADOCK protein docking, 1 node / 3 GPUs (speedup vs. 1 CPU core):       37.11 -> 83.49 (x2.25)

Stay tuned for the TSUBAME2.5 and TSUBAME-KFC Green500 submission numbers later in this talk. The 2.0 => 2.5 upgrade also brought a 10-20% power reduction (September 2012 vs. September 2013), and TSUBAME continues to evolve towards exascale and extreme big data: 5.7 PF now, 25-30 PF and 1 TB/s next, and Graph 500 No.
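As a sanity check on the headline figures above, the system peaks follow directly from the per-node numbers. The short sketch below (node counts and per-node TFlops taken from the slides; everything else is just arithmetic) reproduces them to within the rounding used in the talk.

```python
# Back-of-the-envelope check of the TSUBAME2.0 -> 2.5 peak figures quoted above.
# Node counts and per-node TFlops are taken from the slides; small differences
# from the headline numbers come from rounding in the talk, not new data.

NODES = 1408           # thin nodes, 3 GPUs each
GPUS = NODES * 3       # 4224 GPUs in total

node_dfp_tflops = {"TSUBAME2.0": 1.70, "TSUBAME2.5": 4.08}   # per node, GPU + CPU
node_sfp_tflops = {"TSUBAME2.0": 3.40, "TSUBAME2.5": 12.2}

for system in ("TSUBAME2.0", "TSUBAME2.5"):
    dfp_pf = NODES * node_dfp_tflops[system] / 1000.0
    sfp_pf = NODES * node_sfp_tflops[system] / 1000.0
    print(f"{system}: {dfp_pf:.2f} PFlops DFP, {sfp_pf:.1f} PFlops SFP")

# Share of the double-precision peak contributed by the K20X GPUs alone:
k20x_dfp_tflops = 1.31
gpu_share = GPUS * k20x_dfp_tflops / (NODES * node_dfp_tflops["TSUBAME2.5"])
print(f"GPUs provide ~{gpu_share:.0%} of the TSUBAME2.5 DFP peak")
```

Running it prints roughly 2.4/4.8 PFlops for TSUBAME2.0 and 5.7/17.2 PFlops for TSUBAME2.5, with about 96% of the double-precision peak coming from the GPUs, which is why the GPU swap alone delivers the x2.4/x3.6 jump.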
Recommended publications
  • Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
    Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
    Toshio Endo (Graduate School of Information Science and Engineering, Tokyo Institute of Technology), Akira Nukada (Global Scientific Information and Computing Center, Tokyo Institute of Technology), Satoshi Matsuoka (Global Scientific Information and Computing Center, Tokyo Institute of Technology / National Institute of Informatics), Naoya Maruyama (Global Scientific Information and Computing Center, Tokyo Institute of Technology), Tokyo, Japan.
    Abstract: We report Linpack benchmark results on the TSUBAME supercomputer, a large scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. With all of 10,480 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87.01 TFlops, which is the third record as a heterogeneous system in the world. This paper describes a careful tuning and load balancing method required to achieve this performance. On the other hand, since the peak speed is 163 TFlops, the efficiency is 53%, which is lower than other systems.
    ... Roadrunner or other systems described above, it includes two types of accelerators. This is due to incremental upgrade of the system, which has been the case in commodity CPU clusters; they may have processors with different speeds as a result of incremental upgrade. In this paper, we present a Linpack implementation and evaluation results on TSUBAME with 10,480 Opteron cores, 624 Tesla GPUs and 648 ClearSpeed accelerators. In the evaluation, we also used a ...
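The 87.01 TFlops run described above required balancing work across three very different device classes. The paper's actual scheme is not reproduced here; the sketch below only illustrates the general idea of dividing block columns in proportion to each class's aggregate measured throughput, and all throughput figures are hypothetical placeholders, not measurements from the paper.

```python
# Illustrative sketch of proportional work partitioning across heterogeneous
# devices, in the spirit of (but not copied from) the paper's load-balancing
# scheme. Throughput figures are hypothetical placeholders.

from math import floor

measured_gflops = {          # sustained per-device DGEMM throughput (hypothetical)
    "opteron_core": 4.0,
    "tesla_gpu": 70.0,
    "clearspeed_board": 30.0,
}
device_counts = {"opteron_core": 10480, "tesla_gpu": 624, "clearspeed_board": 648}

def partition_columns(total_block_columns: int) -> dict:
    """Split block columns among device classes in proportion to their
    aggregate measured throughput, handing rounding leftovers to the fastest."""
    aggregate = {d: measured_gflops[d] * n for d, n in device_counts.items()}
    total = sum(aggregate.values())
    share = {d: floor(total_block_columns * a / total) for d, a in aggregate.items()}
    share[max(aggregate, key=aggregate.get)] += total_block_columns - sum(share.values())
    return share

print(partition_columns(2048))
```

The point of such a scheme is that a static, throughput-weighted split keeps slow and fast devices finishing their panels at roughly the same time, which is the basic requirement for good HPL efficiency on an incrementally upgraded, mixed-accelerator machine.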
  • High Performance Computing (HPC) with Intel® Omni-Path Architecture (Intel® OPA) for Tsubame 3
    CASE STUDY: High Performance Computing (HPC) with Intel® Omni-Path Architecture. Tokyo Institute of Technology Chooses Intel® Omni-Path Architecture for Tsubame 3. Price/performance, thermal stability, and adaptive routing are key features for enabling #1 on the Green 500 list.
    Challenge: How do you make a good thing better? Professor Satoshi Matsuoka of the Tokyo Institute of Technology (Tokyo Tech) has been designing and building high-performance computing (HPC) clusters for 20 years. Among the systems he and his team at Tokyo Tech have architected, Tsubame 1 (2006) and Tsubame 2 (2010) have shown him the importance of heterogeneous HPC systems for scientific research, analytics, and artificial intelligence (AI). Tsubame 2, built on Intel® Xeon® processors and Nvidia* GPUs with InfiniBand* QDR, was Japan's first peta-scale HPC production system; it achieved #4 on the Top500, was the #1 Green 500 production supercomputer, and was the fastest supercomputer in Japan at the time. For Matsuoka, the next-generation machine needed to take all the goodness of Tsubame 2, enhance it with new technologies to not only advance all the current and latest generations of simulation codes, but also drive the latest application targets, which included deep learning/machine learning, AI, and very big data analytics, and make it more efficient than its predecessor.
    Tsubame at a glance:
    • Tsubame 3, the second-generation large, production cluster based on heterogeneous computing at Tokyo Institute of Technology (Tokyo Tech); #61 on the June 2017 Top 500 list and #1 on the June 2017 Green 500 list
    • The system is based upon HPE Apollo* 8600 blades, which are smaller than a 1U server, ...
  • TSUBAME---A Year Later
    TSUBAME---A Year Later
    Satoshi Matsuoka, Professor/Dr.Sci., Global Scientific Information and Computing Center, Tokyo Inst. Technology & NAREGI Project, National Inst. Informatics. EuroPVM/MPI, Paris, France, Oct. 2, 2007.
    Topics for today: intro; upgrades and other new stuff; new programs; the Top500 and acceleration; towards TSUBAME 2.0.
    The TSUBAME production "Supercomputing Grid Cluster", spring 2006-2010 ("Fastest Supercomputer in Asia", 29th [Top500 list]):
    • Compute: Sun Galaxy 4 (Opteron dual-core, 8-socket), 10,480 cores / 655 nodes, 21.4 Terabytes of memory, 50.4 TeraFlops, Linux (SuSE 9, 10), NAREGI Grid MW, plus an NEC SX-8i (for porting)
    • Network: unified Infiniband network, Voltaire ISR9288, 10 Gbps x2 (DDR next ver.), ~1310+50 ports, ~13.5 Terabits/s (3 Tbits/s bisection), 10 Gbps external network
    • Storage: 1.5 PB total; 1.0 Petabyte (Sun "Thumper", 48-disk x 500 GB units) and 0.1 Petabyte (NEC iStore); Lustre FS, NFS, CIFS, WebDAV (over IP); 50 GB/s aggregate I/O bandwidth
    • Acceleration: ClearSpeed CSX600 SIMD accelerators, 360 boards, 35 TeraFlops (current)
    The Titech TSUBAME installation occupies ~76 racks and 350 m2 of floor area at 1.2 MW (peak). Local Infiniband switches (288 ports) currently provide 2 GB/s per node, easily scalable to 8 GB/s per node, with ~32 cooling-tower units and ~500 TB (out of 1.1 PB) of storage shown. TSUBAME was assembled like an iPod: NEC as main integrator (storage, operations), Sun (Galaxy compute nodes, storage, Solaris), AMD (Opteron CPUs, Fab36), Voltaire (Infiniband network), ClearSpeed (CSX600 accelerators), CFS (parallel file system), Novell (SuSE 9/10), NAREGI (grid middleware), and Titech GSIC (us), spanning the UK, Germany, the USA, Israel, and Japan. The racks were ready ...
  • Tokyo Tech's TSUBAME 3.0 and AIST's AAIC Ranked 1st and 3rd on the Green500
    PRESS RELEASE. Sources: Tokyo Institute of Technology; National Institute of Advanced Industrial Science and Technology. For immediate release: June 21, 2017.
    Subject line: Tokyo Tech's TSUBAME 3.0 and AIST's AAIC ranked 1st and 3rd on the Green500 List
    Highlights:
    ► Tokyo Tech's next-generation supercomputer TSUBAME 3.0 ranks 1st on the Green500 list (the ranking of the most energy-efficient supercomputers).
    ► AIST's AI cloud, AAIC, ranks 3rd on the Green500 list, and 1st among air-cooled systems.
    ► These achievements were made possible through collaboration between Tokyo Tech and AIST via the AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL).
    The supercomputers at the Tokyo Institute of Technology (Tokyo Tech) and the National Institute of Advanced Industrial Science and Technology (AIST) have been ranked 1st and 3rd, respectively, on the Green500 List, which ranks supercomputers worldwide in the order of their energy efficiency. The rankings were announced on June 19 (German time) at the international conference ISC HIGH PERFORMANCE 2017 (ISC 2017) in Frankfurt, Germany. These achievements were made possible through our collaboration at the AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), which was established on February 20th this year and is headed by Director Satoshi Matsuoka.
    (At the award ceremony. Fourth from left to right: Professor Satoshi Matsuoka, Specially Appointed Associate Professor Akira Nukada.)
    The TSUBAME 3.0 supercomputer of the Global Scientific Information and Computing Center (GSIC) at Tokyo Tech will commence operation in August 2017; it can achieve 14.110 GFLOPS per watt. It has been ranked 1st on the Green500 List of June 2017, making it Japan's first supercomputer to top the list.
  • World's Greenest Petaflop Supercomputers Built with NVIDIA Tesla GPUs
    World's Greenest Petaflop Supercomputers Built With NVIDIA Tesla GPUs
    GPU Supercomputers Deliver World-Leading Performance and Efficiency in Latest Green500 List
    (Related videos: Leaders in GPU supercomputing talk about their Green500 systems; the Tianhe-1A supercomputer at the National Supercomputer Center in Tianjin; Tsubame 2.0 from Tokyo Institute of Technology; Tokyo Tech talks about their Tsubame 2.0 supercomputer, Parts 1 and 2.)
    NEW ORLEANS, LA--(Marketwire - November 18, 2010) - SC10 -- The "Green500" list of the world's most energy-efficient supercomputers was released today, revealing that the only petaflop system in the top 10 is powered by NVIDIA® Tesla™ GPUs. The system was Tsubame 2.0 from Tokyo Institute of Technology (Tokyo Tech), which was ranked number two.
    "The rise of GPU supercomputers on the Green500 signifies that heterogeneous systems, built with both GPUs and CPUs, deliver the highest performance and unprecedented energy efficiency," said Wu-chun Feng, founder of the Green500 and associate professor of Computer Science at Virginia Tech.
    GPUs have quickly become the enabling technology behind the world's top supercomputers. They contain hundreds of parallel processor cores capable of dividing up large computational workloads and processing them simultaneously. This significantly increases overall system efficiency as measured by performance per watt. "Top500" supercomputers based on heterogeneous architectures are, on average, almost three times more power-efficient than non-heterogeneous systems.
    Three other Tesla GPU-based systems made the Top 10. The National Center for Supercomputing Applications (NCSA) and Georgia Institute of Technology in the U.S. and the National Institute for Environmental Studies in Japan secured 3rd, 9th and 10th, respectively.
  • The TSUBAME Grid: Redefining Supercomputing
    The TSUBAME Grid: Redefining Supercomputing
    One of the world's leading technical institutes, the Tokyo Institute of Technology (Tokyo Tech) created the fastest supercomputer in Asia, and one of the largest outside of the United States. Using Sun x64 servers and data servers deployed in a grid architecture, Tokyo Tech built a cost-effective, flexible supercomputer that meets the demands of compute- and data-intensive applications. With hundreds of systems incorporating thousands of processors and terabytes of memory, the TSUBAME grid delivers 47.38 TeraFLOPS of sustained performance and 1 petabyte (PB) of storage to users running common off-the-shelf applications.
    Supercomputing demands: Tokyo Tech set out to build the largest, and most flexible, supercomputer in Japan. With numerous groups providing input into the size and functionality of the system, the new supercomputing campus grid infrastructure had several key requirements. Groups focused on large-scale, high-performance distributed parallel computing required a mix of 32- and 64-bit systems that could run the Linux operating system and be capable of providing over 1,200 SPECint2000 (peak) and 1,200 ...
    Not content with sheer size, Tokyo Tech was looking to bring supercomputing to everyday use. Unlike traditional, monolithic systems based on proprietary solutions that service the needs of the few, the new supercomputing architecture had to be able to run commercial off-the-shelf and open source applications, including structural analysis applications like ABAQUS and MSC/NASTRAN, computational chemistry tools like Amber and Gaussian, and statistical analysis packages like SAS, Matlab, and Mathematica.
    Highlights:
    • The Tokyo Tech Supercomputer and UBiquitously Accessible Mass storage Environment (TSUBAME) redefines supercomputing
    • 648 Sun Fire™ X4600 servers deliver 85 TeraFLOPS of peak raw ...
  • Highlights of the 53rd TOP500 List
    Highlights of the 53rd TOP500 List. Erich Strohmaier. ISC 2019, Frankfurt, June 17, 2019.
    Topics: Petaflops are everywhere!; the "new" TOP10; Dennard scaling and the TOP500; China: top consumer and producer? A closer look; Green500 and HPCG; the future of TOP500.
    The TOP10 (# / Site / Manufacturer / Computer / Country / Cores / Rmax [PFlops] / Power [MW]):
      1. Oak Ridge National Laboratory / IBM / Summit, IBM Power System, P9 22C 3.07GHz, Mellanox EDR, NVIDIA GV100 / USA / 2,414,592 / 148.6 / 10.1
      2. Lawrence Livermore National Laboratory / IBM / Sierra, IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100 / USA / 1,572,480 / 94.6 / 7.4
      3. National Supercomputing Center in Wuxi / NRCPC / Sunway TaihuLight, NRCPC Sunway SW26010, 260C 1.45GHz / China / 10,649,600 / 93.0 / 15.4
      4. National University of Defense Technology / NUDT / Tianhe-2A, NUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Matrix-2000 / China / 4,981,760 / 61.4 / 18.5
      5. Texas Advanced Computing Center / Univ. of Texas / Dell / Frontera, Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox HDR / USA / 448,448 / 23.5 / -
      6. Swiss National Supercomputing Centre (CSCS) / Cray / Piz Daint, Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 / Switzerland / 387,872 / 21.2 / 2.38
      7. Los Alamos NL / Sandia NL / Cray / Trinity, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries / USA / 979,072 / 20.2 / 7.58
      8. National Institute of Advanced Industrial Science and Technology / Fujitsu / AI Bridging Cloud Infrastructure (ABCI), PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100 / Japan / 391,680 / 19.9 / 1.65
      9. Leibniz Rechenzentrum / Lenovo / SuperMUC-NG, ThinkSystem SD530 / Germany / 305,856 ...
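Since the excerpt also touches on the Green500, a rough energy-efficiency figure can be derived from the Rmax and power columns quoted above. The sketch below does that for the systems that report power; it is only indicative, as the official Green500 uses its own measured numbers.

```python
# Rough energy efficiency (GFlops per watt) derived from the Rmax and power
# columns quoted above for the June 2019 TOP10 systems that report power.
# Indicative only: the official Green500 uses its own measured figures.

top10_rmax_power = {          # name: (Rmax in PFlops, power in MW)
    "Summit": (148.6, 10.1),
    "Sierra": (94.6, 7.4),
    "Sunway TaihuLight": (93.0, 15.4),
    "Tianhe-2A": (61.4, 18.5),
    "Piz Daint": (21.2, 2.38),
    "Trinity": (20.2, 7.58),
    "ABCI": (19.9, 1.65),
}

for name, (rmax_pf, power_mw) in sorted(
        top10_rmax_power.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
    gflops_per_watt = (rmax_pf * 1e6) / (power_mw * 1e6)   # PFlops -> GFlops, MW -> W
    print(f"{name:>18}: {gflops_per_watt:5.2f} GFlops/W")
```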
  • Tokyo Tech Tsubame Grid Storage Implementation
    TOKYO TECH TSUBAME GRID STORAGE IMPLEMENTATION
    Syuuichi Ihara, Sun Microsystems. Sun BluePrints™ On-Line, May 2007. Part No 820-2187-10, Revision 1.0, 5/22/07.
    Table of contents: Introduction; TSUBAME Architecture and Components (Compute Servers: Sun Fire™ X4600 Servers; Data Servers: Sun Fire X4500 Servers; Voltaire Grid Director ISR9288; Lustre File System; Operating Systems); Installing Required RPMs (Required RPMs; Creating a Patched Kernel on the Sun Fire X4500 Servers; Installing Lustre-Related RPMs on Red Hat Enterprise Linux 4; Modifying and Installing the Marvell Driver for the Patched Kernel; Installing Lustre Client-Related RPMs on SUSE Linux Enterprise Server 9); Configuring Storage and Lustre (Sun Fire X4500 Disk Management; Configuring the Object Storage Server (OSS); Setting Up the Meta Data Server; Configuring Clients on the Sun Fire X4600 Servers); Software RAID and Disk Monitoring; Summary; About the Author; Acknowledgements; References; Ordering Sun Documents; Accessing Sun Documentation Online.
    Introduction: One of the world's leading technical institutes, the Tokyo Institute of Technology (Tokyo Tech) recently created the fastest supercomputer in Asia, and one of the largest supercomputers outside of the United States. Deploying Sun Fire™ x64 servers and data servers in a grid architecture enabled Tokyo Tech to build a cost-effective, flexible supercomputer to meet the demands of compute- and data-intensive applications. Hundreds of systems in the grid, which Tokyo Tech named TSUBAME, incorporate thousands of processors and terabytes of memory, delivering 47.38 trillion floating-point operations per second (TeraFLOPS) of sustained LINPACK benchmark performance; the grid is expected to reach 100 TeraFLOPS in the future.
  • TSUBAME2.0: a Tiny and Greenest Petaflops Supercomputer
    TSUBAME2.0: A Tiny and Greenest Petaflops Supercomputer
    Satoshi Matsuoka, Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology (Tokyo Tech.). Booth #1127 Booth Presentations, SC10, Nov 2010.
    The TSUBAME 1.0 "Supercomputing Grid Cluster", spring 2006 ("Fastest Supercomputer in Asia", 7th on the 27th [Top500 list]):
    • Compute: Sun Galaxy 4 (Opteron dual-core, 8-socket), 10,480 cores / 655 nodes, 21.4 Terabytes of memory, 50.4 TeraFlops, Linux (SuSE 9, 10), NAREGI Grid MW, plus an NEC SX-8i (for porting)
    • Network: unified IB network, Voltaire ISR9288 Infiniband, 10 Gbps x2 (DDR next ver.), ~1310+50 ports, ~13.5 Terabits/s (3 Tbits/s bisection), 10 Gbps external network
    • Storage: 1.0 Petabyte (Sun "Thumper", 48-disk x 500 GB units) and 0.1 Petabyte (NEC iStore); Lustre FS, NFS, WebDAV (over IP); 50 GB/s aggregate I/O bandwidth
    • Acceleration: ClearSpeed CSX600 SIMD accelerators, 360 boards, 35 TeraFlops (current)
    Titech TSUBAME: ~76 racks, 350 m2 of floor area, 1.2 MW (peak), PUE = 1.44.
    You know you have a problem when... the biggest problem is power:
      Machine                      CPU cores   Peak Watts   Peak GFLOPS   MFLOPS/Watt   Watts/CPU core   Ratio c.f. TSUBAME
      TSUBAME (Opteron)            10,480      800,000      50,400        63.00         76.34
      TSUBAME2006 (w/360 CSs)      11,200      810,000      79,430        98.06         72.32
      TSUBAME2007 (w/648 CSs)      11,776      820,000      102,200       124.63        69.63            1.00
      Earth Simulator              5,120       6,000,000    40,000        6.67          1171.88          0.05
      ASCI Purple (LLNL)           12,240      6,000,000    77,824        12.97         490.20           0.10
      AIST Supercluster (Opteron)  3,188       522,240      14,400        27.57         163.81           0.22
      LLNL BG/L (rack)             2,048       25,000       5,734.4       229.38        12.21            1.84
      Next Gen BG/P (rack)         4,096       30,000       16,384        546.13        7.32             4.38
      TSUBAME 2.0 (2010Q3/4)       160,000     810,000      1,024,000     1264.20       5.06             10.14
    TSUBAME 2.0: a x24 improvement in 4.5 years...? ~x1000 over 10 years. Scaling Peta to Exa design? Shorten latency as much as possible: extreme multi-core incl.
  • The Immersion Cooled TSUBAME-KFC: from Exascale Prototype to the Greenest Supercomputer in the World
    ANALYST REPORT, Tokyo Institute of Technology: The Immersion Cooled TSUBAME-KFC: From Exascale Prototype to the Greenest Supercomputer in the World. In collaboration with GRC. Republished 2018.
    Authors: Toshio Endo, Akira Nukada, Satoshi Matsuoka. Conference paper: the 20th IEEE International Conference on Parallel and Distributed Systems. Discussions, stats, and author profiles at: http://www.researchgate.net/publication/275646769
    Toshio Endo, Akira Nukada, Satoshi Matsuoka. Global Scientific Information and Computing Center, Tokyo Institute of Technology, Japan. Email: {endo, matsu}@is.titech.ac.jp, [email protected]
    Abstract: Modern supercomputer performance is principally limited by power. TSUBAME-KFC is a state-of-the-art prototype for our next-generation TSUBAME3.0 supercomputer and towards future exascale. In collaboration with Green Revolution Cooling (GRC) and others, TSUBAME-KFC submerges compute nodes configured with extremely high processor/component density into a non-toxic, low-viscosity coolant with a high flash point of 260 degrees Celsius, cooled using an ambient/evaporative cooling tower. This minimizes cooling power while all semiconductor components are kept at low temperature to lower leakage current. Numerous off-line in addition to on-line power and temperature sensors are installed throughout and constantly monitored to immediately observe the effect of voltage/frequency control. As a result, TSUBAME-KFC achieved world No. 1 on the Green500 in Nov. 2013 and Jun. 2014, by over 20% c.f. the nearest competitors.
    Fig. 1. The breakdown of the 1000 times improvement in power efficiency in the ULPHPC project.
  • HPC Strategy and Direction for Meteorological Modelling
    HPC Strategy and Direction for Meteorological Modelling
    Hans Joraandstad, [email protected]
    HPC, a Sun priority: "HPC represents an enormous opportunity for Sun and Sun's partners. We have products today as well as on our future roadmap which are uniquely positioned to gain market share in HPC. I am personally leading the cross-Sun team to grow our position in this opportunity area and am looking forward to rapid success." (John Fowler, Executive Vice President)
    Sun's HPC technology strategy: power, space, performance; standards-based components; balance, efficiency, density; open source, integratable; low-risk solutions.
    Sun Fire systems for every application, delivering real-world application performance.
    • Scale up: large databases; enterprise apps (CRM, ERP, SCM); data warehousing, business intelligence; server consolidation/migration; mainframe rehosting.
    • Scale out: web services, mail, messaging, security, firewall; application server, database, ERP, CRM; HPC, compute grid solutions; network-facing, I/O-intensive; load balancing, business logic; distributed databases; server consolidation.
    Your choice of operating systems.
    Sun joins Open MPI: Sun has joined Open MPI with the firm belief that HPC ultra-scale MPI computing requirements are best met through a concerted and collaborative effort for the good of the community as a whole.
    • Open MPI brings together world-class expertise to deliver ultra-scale MPI capabilities
    • Sun brings nine years of MPI implementation experience and expertise to the community
    • Sun engineers will participate as active developers of Open MPI
    • Sun will ship and support Open MPI for Solaris x64 and SPARC platforms
    Innovation at Sun: innovate at the system level with industry-standard chips ...
  • TSUBAME 2.0 Begins the Long Road from TSUBAME1.0 to 2.0 (Part One)
    TSUBAME 2.0 Begins: The long road from TSUBAME1.0 to 2.0 (Part One)
    Satoshi Matsuoka*, *Global Scientific Information and Computing Center
    Contents: TSUBAME 2.0 Begins, the long road from TSUBAME1.0 to 2.0 (Part One); Multi-GPU Computing for Next-generation Weather Forecasting, 145.0 TFlops with 3990 GPUs on TSUBAME 2.0; Computer prediction of protein-protein interaction network using MEGADOCK, application to systems biology.
    The long-awaited TSUBAME2.0 will finally commence production operation in November 2010. However, the technological evolutionary pathway that stems from times even earlier than TSUBAME1, its direct predecessor, was by no means paved smoothly. In Part One of this article, we discuss the pros and cons of TSUBAME1, and how they have been addressed to achieve a 30-fold speedup in a mere 4.5 years in TSUBAME2.0.
    1. Introduction
    Early October 2010: opening the door to a room on the ground floor of our center, where our admin staff was kept busy handling paperwork, a brand new scenery from another world would immediately jump into sight (Figure 1). ... newly developed with extreme computational density becomes visible, augmented with mere few cables protruding from each of its front panels. The overall performance contained in only a single rack that looks more like a large refrigerator is 50 Teraflops, a performance that would have had it ranked number one in the world a mere 8 years ago, comparable to the actual machine, the Earth Simulator, which had occupied the entire facility of a large gymnasium and consisted of more than 600 racks that resembled ...