Many Integrated Core Prototype


G. Erbacci – CINECA
PRACE Autumn School 2012 on Massively Parallel Architectures and Molecular Simulations
Sofia, 24-28 September 2012

Outline
• HPC evolution
• The Eurora Prototype
• MIC architecture
• Programming MIC

HPC at CINECA
CINECA: National Supercomputing Centre in Italy
• manages the HPC infrastructure
• provides support to Italian and European researchers
• promotes technology transfer initiatives for industry
• CINECA is a Hosting Member in PRACE
  – PLX: Linux cluster with GPUs (Tier-1 in PRACE)
  – FERMI: IBM BG/Q (Tier-0 in PRACE)

PLX@CINECA (IBM Linux cluster)
• Processor type: 2 six-core Intel Xeon (Westmere) X5645 @ 2.40 GHz, 12 MB cache
• Nodes / cores: 274 / 3288
• RAM: 48 GB per compute node (14 TB in total)
• Internal network: InfiniBand with 4x QDR switches (40 Gbps)
• Accelerators: 2 NVIDIA M2070 GPUs per node, 548 GPUs in total
• Peak performance: 32 TFlops (CPUs); 565 TFlops SP / 283 TFlops DP (GPUs)

FERMI@CINECA
• Architecture: 10 BG/Q frames
• Model: IBM BG/Q
• Processor type: IBM PowerA2 @ 1.6 GHz
• Computing cores: 163,840
• Computing nodes: 10,240
• RAM: 1 GByte per core (163 TByte in total)
• Internal network: 5D Torus
• Disk space: 2 PByte of scratch space
• Peak performance: 2 PFlop/s
• No. 7 in the Top500 ranking (June 2012)
• National and PRACE Tier-0 calls

CINECA HPC Infrastructure (overview diagram)

Computational Sciences
Computational science (together with theory and experimentation) is the "third pillar" of scientific inquiry, enabling researchers to build and test models of complex phenomena.
Quick evolution of innovation:
- Instantaneous communication
- Geographically distributed work
- Increased productivity
- More data everywhere
- Increasing problem complexity
- Innovation happens worldwide

Technology Evolution
More data everywhere: radar, satellites, CAT scans, sensors, micro-arrays, weather models, the human genome. The size and resolution of the problems scientists address today are limited only by the size of the data they can reasonably work with. There is a constantly increasing demand for faster processing on bigger data.
Increasing problem complexity: partly driven by the ability to handle bigger data, but also by the requirements and opportunities brought by new technologies. For example, new kinds of medical scans create new computational challenges.
HPC evolution: as technology allows scientists to handle bigger datasets and faster computations, they push to solve harder problems. In turn, the new class of problems drives the next cycle of technology innovation.

Top 500: some facts
• 1976: Cray 1 installed at Los Alamos, peak performance 160 MegaFlop/s (10^6 flop/s)
• 1993 (1st edition of the Top 500): No. 1 at 59.7 GFlop/s (10^9 flop/s)
• 1997: Teraflop/s barrier (10^12 flop/s)
• 2008: Petaflop/s (10^15 flop/s): Roadrunner (LANL), Rmax 1,026 TFlop/s, Rpeak 1,375 TFlop/s; hybrid system: 6,562 dual-core AMD Opteron processors accelerated with 12,240 IBM Cell processors (98 TByte of RAM)
• 2012 (June): 16.3 Petaflop/s: Lawrence Livermore's Sequoia supercomputer, BlueGene/Q (1,572,864 cores)
  - 4 European systems in the Top 10
  - Total combined performance of all 500 systems has grown to 123.02 PFlop/s, compared to 74.2 PFlop/s six months ago
  - 57 systems use accelerators
Toward Exascale

Dennard Scaling law (MOSFET)
Classic scaling, per generation:
• L' = L / 2
• V' = V / 2
• F' = F × 2
• D' = 1 / L'² = 4D
• P' = P
It does not hold anymore: the core frequency and performance no longer grow following Moore's law.
What happens instead:
• L' = L / 2
• V' ≈ V
• F' ≈ F × 2
• D' = 1 / L'² = 4D
• P' = 4P
→ the power crisis!
CPU + accelerator architectures are adopted to maintain the evolution promised by Moore's law → the programming crisis!
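The P' = 4P arrow follows from the standard dynamic-power relation. A short derivation is sketched below; it assumes (this is not stated on the slide) that dynamic power per transistor scales as C·V²·F and that the gate capacitance C halves when L halves.

```latex
% Dynamic power per transistor (assumption: P_t proportional to C V^2 F, with C' = C/2 when L' = L/2)
\[
  P_{\mathrm{t}} \;\propto\; C\,V^{2}F .
\]
% Classic Dennard scaling: V' = V/2, F' = 2F, density D' = 4D
\[
  P' \;\propto\; 4D \cdot \frac{C}{2}\left(\frac{V}{2}\right)^{2}(2F)
     \;=\; D\,C\,V^{2}F \;=\; P
  \qquad \text{(power density stays constant).}
\]
% Post-Dennard: the voltage no longer scales, V' \approx V
\[
  P' \;\propto\; 4D \cdot \frac{C}{2}\,V^{2}(2F)
     \;=\; 4\,D\,C\,V^{2}F \;=\; 4P
  \qquad \text{(the power crisis).}
\]
```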
• V’ = V / 2 The core frequency and performance do not • F’ = F * 2 grow following the • D’ = 1 / L2 = 4D Moore’s law any longer • P’ = P L’ = L / 2 CPU + Accelerator V’ = ~V to maintain the F’ = ~F * 2 architectures evolution In the Moore’s law D’ = 1 / L2 = 4 * D P’ = 4 * P Programming crisis! The power crisis! 11 Roadmap to Exascale(architectural trends) 12 Heterogeneous Multi-core Architecture • Combines different types of processors – Each optimized for a different operational modality • Performance – Synthesis favors superior performance • For complex computation exhibiting distinct modalities • Purpose-designed accelerators – Integrated to significantly speedup some critical aspect of one or more important classes of computation – IBM Cell architecture, ClearSpeed SIMD attached array processor, • Conventional co-processors – Graphical processing units (GPU) – Network controllers (NIC) – Many Integrated Cores (MIC ) – Efforts underway to apply existing special purpose components to general applications 13 Accelerators A set (one or more) of very simple execution units that can perform few operations (with respect to standard CPU) with very high efficiency. When combined with full featured CPU (CISC or RISC) can accelerate the “nominal” speed of a system. CPU ACC. Single thread perf. throughput ACCCPU. Physical integration CPU & ACC Architectural integration 14 nVIDIA GPU Fermi implementation packs 512 processor cores 15 ATI FireStream, AMD GPU 2012 New Graphics Core Next “GCN” With new instruction set and new SIMD design 16 Intel MIC (Knight Ferry) 17 Real HPC Crisis is with Software A supercomputer application and software are usually much more long-lived than a hardware - Hardware life typically four-five years at most. - Fortran and C are still the main programming models Programming is stuck - Arguably hasn’t changed so much since the 70’s Software is a major cost component of modern technologies - The tradition in HPC system procurement is to assume that the software is free. It’s time for a change - Complexity is rising dramatically - Challenges for the applications on Petaflop systems - Improvement of existing codes will become complex and partly impossible - The use of O(100K) cores implies dramatic optimization effort - New paradigm as the support of a hundred threads in one node implies new parallelization strategies - Implementation of new parallel programming methods in existing large applications has not always a promising perspective There is the need for new community codes 18 What about parallel App? • In a massively parallel context, an upper limit for the scalability of parallel applications is determined by the fraction of the overall execution time spent in non-scalable operations (Amdahl's law). 
Trends
• Traditional paradigms: scalar applications, vector, distributed memory, shared memory
• Current trends: MPP systems with message passing (MPI); multi-core nodes (OpenMP); accelerators (GPGPU, FPGA) with CUDA, OpenCL; hybrid codes

EURORA Prototype
• Evolution of the AURORA architecture by Eurotech (http://www.eurotech.com/)
  – Aurora rack: 256 nodes, 512 CPUs
  – 101 TFlops @ 100 kW
  – liquid cooled
• CPU: Intel Xeon Sandy Bridge (SB)
  – Up to one full cabinet (128 nodes + 256 accelerators)
• Accelerator: Intel Many Integrated Core (MIC)
• Network architecture: InfiniBand and Torus interconnect
  – low-latency / high-bandwidth interconnect
• Cooling: hot water

EURORA chassis
• 1 rack, 16 chassis
• 16-node cards, or 8-node cards + 16 accelerators
• Eurora rack physical dimensions: 2133 mm (48U) h, 1095 mm w, 1500 mm d
• Weight (full rack with cooling, fully loaded with water): 2000 kg
• Power/cooling typical requirements: 120-130 kW @ 48 Vdc

EURORA node
• 2 Intel Xeon E5
• 2 Intel MIC or 2 NVIDIA Kepler
• 16 GByte DDR3 1.6 GHz
• SSD disk

Node card mockup
• Presented at ISC12
• Can host MIC and K20 cards
• Thermal analysis and validation performed

EURORA Network
• 3D Torus: custom network, FPGA (Altera Stratix V), EXTOLL, APEnet, ad-hoc MPI subset, synchronization
• InfiniBand FDR: Mellanox ConnectX-3, MPI + filesystem

Cooling
• Hot water 50-80 °C
• Temperature gap 3-5 °C
• No rotating fans
• Cold plates – direct on-component liquid cooling
• Dry chillers
• Free cooling, quick disconnect
• Temperature sensors – downgrade performance if required
• System isolation

EURORA prototype (Node Accelerator)
EURopean many integrated cORe Architecture
• Goal: evaluate a new architecture for next-generation Tier-0 systems
• Partners:
  - CINECA, Italy
  - GRNET, Greece
  - IPB, Serbia
  - NCSA, Bulgaria
• Vendor: Eurotech, Italy

EURORA Installation Plan (timeline slide)

HW Procurement
• Contract with Eurotech signed in July
  – 64 compute cards
  – 128 Xeon Sandy Bridge 3.1 GHz
  – 16 GByte DDR3 1600 MHz per node
  – 160 GByte SSD per node
  – 1 FPGA (Altera Stratix V) per node
  – InfiniBand FDR
  – 128 accelerator cards: Intel KNC (or NVIDIA K20)
  – Thermal sensor network

HW Procurement and Facility
• Contract with Eurotech signed in July
• Integration in the facility
  – First assessment of the location with Eurotech in May
  – First integration project completed: estimated cost higher than budgeted
  – Second assessment with Eurotech in September (before the end)
  – Procurement of the technology: dry coolers, pipes and pumps, exchanger, tanks, filters

Some Applications
• www.quantum-espresso.org
• www.gromacs.org

EURORA Programming Models (a minimal offload sketch is given at the end of this deck)
• Message passing (MPI)
• Shared memory (OpenMP, TBB)
• MIC offload (pragmas) / native
• Hybrid: MPI + OpenMP + MIC extensions / OpenCL

ACCELERATORS
• First K20 and KNC (dense form factor) samples in September
• KNC standard expansion module already available to start the work on software

Software
• Installation of the KNC software kit
• Test of the compiler and of the node card HW
• First simple (MPI + OpenMP) application test
• First MIC-to-MIC MPI communication test
  – Intel MPI
  – within the same node
• Test of the affinity

ACCESS
• Access will be granted upon request to the partners of the prototype project.
• Other requests will be evaluated case by case.
• We are working
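As a companion to the EURORA Programming Models slide, here is a minimal sketch of the "MIC offload (pragmas)" style with an OpenMP region executed on the card. It uses the Intel compiler's offload pragmas for KNC; the build line and the host-fallback remark are assumptions, not taken from the slides.

```c
/* Minimal sketch: sum of squares offloaded to a MIC (KNC) card.
 * Assumed build line (not from the slides): icc -std=c99 -openmp offload_sum.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 20;
    double *a = (double *)malloc(n * sizeof(double));
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        a[i] = 1.0 / (double)(i + 1);

    /* Offload to MIC card 0: 'in' copies the array to the card,
     * 'inout' copies the scalar accumulator back. The Intel offload
     * runtime typically falls back to host execution if no coprocessor
     * is present. */
    #pragma offload target(mic:0) in(a : length(n)) inout(sum)
    {
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i] * a[i];
    }

    printf("sum of squares = %f\n", sum);
    free(a);
    return 0;
}
```

In the native model listed on the same slide, the source would instead be cross-compiled for the card (e.g. with the Intel compiler's -mmic option) and run directly on the coprocessor, with no offload pragmas.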