Fujitsu's Architectures and Collaborations for Weather Prediction and Climate Research


Ross Nobes, Fujitsu Laboratories of Europe
ECMWF Workshop on HPC in Meteorology, October 2014
Copyright 2014 FUJITSU LIMITED

Fujitsu HPC - Past, Present and Future
Fujitsu has been developing advanced supercomputers since 1977.
(Figure: timeline of Fujitsu HPC systems from the F230-75APU, Japan's first vector (array) supercomputer in 1977, through the AP1000, the VP series, the VPP500, the NWT* developed with NAL (No. 1 in Top500, Nov. 1993; Gordon Bell Prize 1994, 95, 96), VPP300/700, AP3000, VPP5000 (world's fastest vector processor, 1999), PRIMEPOWER HPC2500 (world's most scalable supercomputer, 2003), the PRIMERGY RX200 cluster (Japan's largest cluster in Top500, July 2004), HX600 and PRIMERGY BX900 cluster nodes, FX1 (most efficient performance in Top500, Nov. 2008), the K computer (No. 1 in Top500, June and Nov. 2011), PRIMEHPC FX10, the x86 PRIMEQUEST and PRIMERGY CX400 lines, and on to the next-generation Post-FX10 and the national exascale project.)
*NWT: Numerical Wind Tunnel

Fujitsu's Approach to the HPC Market
(Figure: pyramid of Fujitsu's HPC offerings, from SPARC64-based supercomputers and the x86 PRIMERGY supercomputer at the top, through divisional, departmental and workgroup HPC solutions, down to powerful workstations.)

The K computer
- June 2011: No. 1 in TOP500 list (ISC11)
- November 2011: consecutive No. 1 in TOP500 list (SC11)
- November 2011: ACM Gordon Bell Prize, Peak Performance (SC11)
- November 2012: No. 1 in three HPC Challenge Award benchmarks (SC12)
- November 2012: ACM Gordon Bell Prize (SC12)
- November 2013: No. 1 in Class 1 and Class 2 of the HPC Challenge Awards (SC13)
- June 2014: No. 1 in the Graph 500 "Big Data" supercomputer ranking (ISC14)
Post-FX10 builds on the success of the K computer.

Evolution of PRIMEHPC
                 K computer           PRIMEHPC FX10        Post-FX10
CPU              SPARC64 VIIIfx       SPARC64 IXfx         SPARC64 XIfx
Peak perf.       128 GFLOPS           236.5 GFLOPS         ~1 TFLOPS
# of cores       8                    16                   32 + 2
Memory           DDR3 SDRAM           DDR3 SDRAM           HMC
Interconnect     Tofu interconnect    Tofu interconnect    Tofu interconnect 2
System size      11 PFLOPS            Max. 23 PFLOPS       Max. 100 PFLOPS
Link BW          5 GB/s x 2 (bidir.)  5 GB/s x 2 (bidir.)  12.5 GB/s x 2 (bidir.)

Continuity in Architecture for Compatibility
- Upwards-compatible CPU: binary-compatible with the K computer and PRIMEHPC FX10, good byte/flop balance
- New features: new instructions, improved micro-architecture
- For distributed parallel execution: compatible interconnect architecture, improved interconnect bandwidth

32 + 2 Core SPARC64 XIfx
- Rich micro-architecture improves single-thread performance
- 2 additional assistant cores for avoiding OS jitter and for non-blocking MPI functions
                            K          FX10        Post-FX10
Peak FP performance         128 GF     236.5 GF    1-TF class
Execution units per core    2 FMA      2 FMA       2 FMA
SIMD width                  128 bit    128 bit     256 bit
Dual SP mode                N/A        N/A         2x DP performance
Integer SIMD                N/A        N/A         Supported
Single-thread enhancement   -          -           Improved out-of-order execution, better branch prediction, larger cache
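The "1-TF class" peak follows directly from the table above: cores x FMA pipelines x SIMD lanes x flops per FMA x clock. The slides do not state the clock frequency, so the value used below (about 2.2 GHz) is an assumption; everything else is taken from the slide. A minimal sketch in C:

#include <stdio.h>

int main(void) {
    /* Values from the slide, except clock_ghz, which is an assumption
       (the presentation does not state the SPARC64 XIfx clock frequency). */
    const int    cores         = 32;   /* compute cores; the 2 assistant cores are excluded */
    const int    fma_pipes     = 2;    /* FMA execution units per core */
    const int    simd_dp_lanes = 4;    /* 256-bit SIMD = 4 double-precision lanes */
    const int    flops_per_fma = 2;    /* one fused multiply-add counts as 2 flops */
    const double clock_ghz     = 2.2;  /* assumed clock frequency */

    double peak_dp_gflops = cores * fma_pipes * simd_dp_lanes * flops_per_fma * clock_ghz;
    double peak_sp_gflops = 2.0 * peak_dp_gflops;   /* dual SP mode: 2x DP performance */

    printf("Peak DP: %.0f GFLOPS (1-TF class)\n", peak_dp_gflops);
    printf("Peak SP: %.0f GFLOPS\n", peak_sp_gflops);
    return 0;
}

With these numbers the double-precision peak comes out at roughly 1.1 TFLOPS, consistent with the "~1 TFLOPS" entry in the Evolution of PRIMEHPC table.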
Flexible SIMD Operations
New 256-bit wide SIMD functions enable versatile operations:
- Four double-precision calculations performed simultaneously
- Strided load/store
- Indirect (list) load/store
- Permutation and concatenation
(Figure: schematics of a 4-wide double-precision SIMD calculation, a strided load with a specified stride, an indirect load from memory, and an arbitrary shuffle (permutation) across the 128 SIMD registers.)

Hybrid Memory Cube (HMC)
- The increased arithmetic performance of the processor needs higher memory bandwidth
- The required memory bandwidth can almost be met by HMC (480 GB/s)
- The interconnect is boosted to 12.5 GB/s x 2 (bidirectional) with optical links
Peak performance per node:
                                     K      FX10     Post-FX10
DP performance (GFLOPS)              128    236.5    Over 1,000
Memory bandwidth (GB/s)              64     85       480
Interconnect link bandwidth (GB/s)   5      5        12.5

HMC (2)
Amount per processor:
               Capacity     Memory BW
HMC x8         32 GB        480 GB/s
DDR4-DIMM x8   32~128 GB    154 GB/s
GDDR5 x16      8 GB         320 GB/s
- HMC was adopted for main memory since its bandwidth is roughly three times that of DDR4
- HMC can deliver the required bandwidth for high-performance multi-core processors

Tofu2 Interconnect
- Successor to the Tofu interconnect
- Highly scalable, 6-dimensional mesh/torus topology
- Logical 3D, 2D or 1D torus network from the user's point of view
- Link bandwidth increased by 2.5 times to 100 Gbps
- Scalable three-dimensional torus with a well-balanced shape, built from a 2x3x2 three-dimensional torus unit

A Next Generation Interconnect
Optical-dominant: 2/3 of the network links are optical.
(Figure: gross raw bitrate per node (Tbps), split into optical and electrical network links, for IBM Blue Gene/Q, InfiniBand FDR fat-tree, Cray Cascade, Tofu1 and Tofu2; per-lane signalling rates range from 5-14 Gbps for the older systems up to 25 Gbps for Tofu2.)

Reduction in the Chip Area Size
- Process technology shrinks from 65 nm to 20 nm
- System-on-chip integration eliminates the host bus interface
- Chip area shrinks to about 1/3: the Tofu1 InterConnect Controller (65 nm), containing the host bus interface, PCI Express root complex, Tofu network interface (TNI), Tofu barrier interface (TBI) and Tofu network router (TNR), was under 300 mm2; in Tofu2 (20 nm) the TNR, TNI and TBI are integrated into the SPARC64 XIfx die, alongside the 34-core processor, 8 HMC links and PCI Express root complex, in under 100 mm2.

Throughput of Single Put Transfer
Tofu2 achieves a put throughput of 11.46 GB/s, 92% of the 12.5 GB/s link bandwidth.
(Figure: put-transfer throughput versus message size, 4 bytes to 4096 bytes, for Tofu1 and Tofu2; Tofu2 reaches 11.46 GB/s compared with 4.66 GB/s for Tofu1.)

Throughput of Concurrent Put Transfers
Throughput increases linearly with the number of concurrent transfers, without the host bus bottleneck.
(Figure: put-transfer throughput for 1-way to 4-way concurrent transfers; Tofu2 scales to 45.82 GB/s at 4-way, while Tofu1 saturates at 17.63 GB/s because of the host bus bottleneck.)
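The put transfers measured above are one-sided RDMA writes over Tofu2; the slides do not say which software interface was used for the measurements. At the application level the same style of transfer is commonly expressed with MPI one-sided communication, so the sketch below uses MPI_Put purely as an illustration (message size, window layout and rank choice are arbitrary):

#include <mpi.h>
#include <stdio.h>

#define MSG_BYTES 4096   /* illustrative message size, the largest shown in the throughput plot */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Expose a window of memory on every rank for one-sided access. */
    char *win_buf;
    MPI_Win win;
    MPI_Win_allocate(MSG_BYTES, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win_buf, &win);

    char src[MSG_BYTES];
    for (int i = 0; i < MSG_BYTES; i++) src[i] = (char)(i & 0x7f);

    /* Rank 0 writes directly into rank 1's window, analogous to a put transfer. */
    MPI_Win_fence(0, win);
    if (rank == 0)
        MPI_Put(src, MSG_BYTES, MPI_BYTE, 1, 0, MSG_BYTES, MPI_BYTE, win);
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 received %d bytes via MPI_Put\n", MSG_BYTES);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Run with at least two ranks (for example, mpiexec -n 2 ./put_sketch); a real bandwidth benchmark would time repeated transfers over a range of message sizes rather than a single put.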
Communication Latencies
Pattern      Method                            Latency
One-way      Put 8-byte to memory              0.87 μs
One-way      Put 8-byte to cache               0.71 μs
Round-trip   Put 8-byte ping-pong by CPU       1.42 μs
Round-trip   Put 8-byte ping-pong by session   1.41 μs
Round-trip   Atomic RMW 8-byte                 1.53 μs

Communication Intensive Applications
- Fujitsu's math library provides 3D-FFT functionality with improved scalability for massively parallel execution
- Significantly improved interconnect bandwidth
- Optimised process configuration enabled by the Tofu interconnect and the job manager
- Optimised MPI library provides high-performance collective communications

Enhanced Software Stack for Post-FX10
(Figure: overview of the Post-FX10 software stack.)

Enduring Programming Model
(Figure: the programming model carried over from the K computer and FX10; see the hybrid MPI + OpenMP sketch at the end of this section.)

Smaller, Faster, More Efficient
- Highly integrated components with high-density packaging; the performance of one chassis corresponds to approximately one cabinet of the K computer
- Efficient in space, time and power
- Fujitsu-designed CPU: 32 + 2 core SPARC64 XIfx with Tofu2 integrated, 1 CPU per node
- CPU memory board: three CPUs, 3 x 8 HMCs (Hybrid Memory Cube), 8 optical modules (25 Gbps x 12 channels)
- Chassis: 12 nodes per 2U chassis
- Cabinet: over 200 nodes per cabinet, water cooled

Scale-Out Smart for HPC and Cloud Computing
Fujitsu Server PRIMERGY CX400 M1 (CX2550 M1 / CX2570 M1)

PRIMERGY x86 Cluster Series
(Figure: the same Fujitsu supercomputer timeline since 1977 shown earlier, here highlighting the x86 PRIMERGY line: the RX200, HX600 and BX900 cluster nodes through to the PRIMERGY CX400.)

Fujitsu PRIMERGY Portfolio
- TX family, RX family, BX family, CX family
- CX products: CX400 chassis with CX420, CX2550 and CX2570 server nodes

Fujitsu PRIMERGY CX400 M1 Feature Overview
- 4 dual-socket nodes in 2U density
- Up to 24 storage drives
- Choice of server nodes: dual-socket servers with the Intel Xeon processor E5-2600 v3 product family (Haswell)
- Optionally up to two GPGPU or co-processor cards
- DDR4 memory technology
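The "Enduring Programming Model" slide is figure-only in this extraction; on the K computer and PRIMEHPC line the established approach is hybrid parallelism, and the sketch below assumes that model: MPI between nodes, OpenMP threads across the cores of a node, and plain inner loops left to the compiler's SIMD vectorisation. It is an illustrative pattern, not Fujitsu-specific code:

#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double a[N];
    double local = 0.0, global = 0.0;

    /* Shared-memory level: OpenMP threads across the cores of one node.
       The simple inner loop is left for the compiler to map onto the SIMD units. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        a[i] = (double)(rank * N + i);
        local += a[i] * a[i];
    }

    /* Distributed-memory level: an MPI collective across the nodes. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of squares over %d ranks: %e\n", nprocs, global);

    MPI_Finalize();
    return 0;
}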