Rainer Buchty, Jan-Philipp Weiß (eds.)

High-performance and Hardware-aware Computing
Proceedings of the Second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'11)

High-performance system architectures are increasingly exploiting heterogeneity: multi- and manycore-based systems are complemented by coprocessors, accelerators, and reconfigurable units providing huge computational power. However, applications of scientific interest (e.g., in high-performance computing and numerical simulation) are not yet ready to exploit the available high computing potential. Different programming models, non-adjusted interfaces, and bandwidth bottlenecks complicate holistic programming approaches for heterogeneous architectures. In modern microprocessors, hierarchical memory layouts and complex logics obscure predictability of memory transfers or performance estimations.

The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach.

ISBN 978-3-86644-626-7


Rainer Buchty, Jan-Philipp Weiß (eds.)

High-performance and Hardware-aware Computing Proceedings of the Second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC‘11)

San Antonio, Texas, USA, February 2011 (In Conjunction with HPCA-17)



Rainer Buchty, Jan-Philipp Weiß (eds.)

Impressum

Karlsruher Institut für Technologie (KIT)
KIT Scientific Publishing
Straße am Forum 2
D-76131 Karlsruhe
www.ksp.kit.edu

KIT – University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association

This publication is available on the Internet under the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/3.0/de/

KIT Scientific Publishing 2011 Print on Demand

ISBN 978-3-86644-626-7

Organization

Workshop Organizers:
Rainer Buchty, Eberhard Karls University Tübingen, Germany & Karlsruhe Institute of Technology, Germany
Jan-Philipp Weiß, Karlsruhe Institute of Technology, Germany

Steering Committee:
Vincent Heuveline, Karlsruhe Institute of Technology, Germany
Wolfgang Karl, Karlsruhe Institute of Technology, Germany

Program Committee:
David A. Bader, Georgia Tech, Atlanta, USA
Michael Bader, Univ. Stuttgart, Germany
Mladen Berekovic, Univ. Braunschweig, Germany
Alan Berenbaum, SMSC, USA
Martin Bogdan, Univ. Leipzig, Germany
Dominik Göddeke, TU Dortmund, Germany
Georg Hager, Univ. Erlangen, Germany
Vincent Heuveline, Karlsruhe Institute of Technology, Germany
Eric d'Hollander, Ghent University, Belgium
Michael Hübner, Karlsruhe Institute of Technology, Germany
Ben Juurlink, TU Berlin, Germany
Wolfgang Karl, Karlsruhe Institute of Technology, Germany
Rainer Keller, HLRS, Stuttgart, Germany
Hiroaki Kobayashi, Tohoku University, Japan
Harald Köstler, Univ. Erlangen, Germany
Dieter an Mey, RWTH Aachen, Germany
Andy Nisbet, Manchester Metropolitan University, UK
Christian Perez, INRIA, France
Franz-Josef Pfreundt, ITWM Kaiserslautern, Germany
Wolfgang Rosenstiel, Eberhard Karls University Tübingen, Germany
Olaf Schenk, Basel University, Switzerland
Martin Schulz, LLNL, USA
Masha Sosonkina, Ames Lab, USA
Thomas Steinke, Zuse-Institut Berlin, Germany
Josef Weidendorfer, TU Munich, Germany
Felix Wolf, GRS-SIM Aachen/Jülich, Germany

Preface

High-performance system architectures are increasingly exploiting heterogeneity: multi- and manycore-based systems are complemented by coprocessors, accelerators, and reconfigurable units providing huge computational power. However, applications of scientific interest (e.g. in high-performance computing and numerical simulation) are not yet ready to exploit the available high computing potential. Different programming models, non-adjusted interfaces, and bandwidth bottlenecks complicate holistic programming approaches for heterogeneous architectures. In modern microprocessors, hierarchical memory layouts and complex logics obscure predictability of memory transfers or performance estimations.

For efficient implementations and optimal results, underlying algorithms and mathematical solution methods have to be adapted carefully to architectural constraints like fine-grained parallelism and memory or bandwidth limitations that require additional communication and synchronization. Currently, a comprehensive knowledge of the underlying hardware is therefore mandatory for application programmers. Hence, there is a strong need for virtualization concepts that free programmers from hardware details while maintaining best performance and enabling deployment in heterogeneous and reconfigurable environments.

The Second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'11) – held in conjunction with the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17) – aims at combining new aspects of parallel, heterogeneous, and reconfigurable system architectures with concepts of high-performance computing and, particularly, numerical solution methods. It brings together international researchers of all affected fields to discuss issues of high-performance computing on emerging hardware architectures, ranging from architecture work to programming and tools.

The workshop organizers would therefore like to thank the HPCA-17 Workshop Chair for giving us the chance to host this workshop in conjunction with one of the world's finest conferences on high-performance architectures – and of course all the people who made this workshop finally happen. Thanks to the many contributors submitting exciting and novel work, HipHaC'11 will reflect a broad range of issues on architecture design, algorithm implementation, and application optimization.

Karlsruhe, January 2011

Rainer Buchty¹,² & Jan-Philipp Weiß²
¹ Eberhard Karls University Tübingen
² Karlsruhe Institute of Technology (KIT)

Table of Contents

Architectures

Convey HC-1 Hybrid Core Computer – The Potential of FPGAs in Numerical Simulation ...... 1
Werner Augustin, Jan-Philipp Weiss, and Vincent Heuveline

Optimized Replacement in the Configuration Layers of the Grid ALU Processor ...... 9
Ralf Jahr, Basher Shehan, Sascha Uhrig, and Theo Ungerer

Performance Measurement, Modeling, and Engineering

Performance Engineering of an Orthogonal Matching Pursuit Algorithm for Sparse Representation of Signals on Different Architectures ...... 17
Markus Stürmer, Florian Rathgeber, and Harald Köstler

Impact of Data Sharing on CMP design: A study based on Analytical Modeling ...... 25
Anil Krishna, Ahmad Samih, and Yan Solihin

Traffic Prediction for NoCs using Fuzzy Logic ...... 33
Gervin Thomas, Ben Juurlink, and Dietmar Tutsch

GPU Acceleration of the Assembly Process in Isogeometric Analysis ...... 41
Nathan Collier, Hyoseop Lee, Aron Ahmadia, Craig C. Douglas, and Victor M. Calo

GPU Accelerated Scientific Computing: Evaluation of the Fermi Architecture; Elementary Kernels and Linear Solvers ...... 47
Hartwig Anzt, Tobias Hahn, Björn Rocker, and Vincent Heuveline

List of Authors ...... 55


Convey HC-1 Hybrid Core Computer – The Potential of FPGAs in Numerical Simulation

Werner Augustin, Jan-Philipp Weiss
Engineering Mathematics and Computing Lab (EMCL)
SRG New Frontiers in High Performance Computing – Exploiting Multicore and Coprocessor Technology
Karlsruhe Institute of Technology, Germany
[email protected], [email protected]

Vincent Heuveline
Engineering Mathematics and Computing Lab (EMCL)
Karlsruhe Institute of Technology, Germany
[email protected]

Abstract—The Convey HC-1 Hybrid Core Computer brings FPGA technologies closer to numerical simulation. It combines two types of processor architectures in a single system. Highly capable FPGAs are closely connected to a host CPU, and the accelerator-to-memory bandwidth has remarkable values. Reconfigurability by means of pre-defined application-specific instruction sets called personalities has the appeal of optimized hardware configuration with respect to application characteristics. Moreover, Convey's solution eases the programming effort considerably. In contrast to hardware-centric and time-consuming classical coding of FPGAs, a dual-target compiler interprets pragma-extended C/C++ or Fortran code and produces implementations running on both host and accelerator. In addition, a global view of host and device memory is provided by means of a cache-coherent shared virtual memory space. In this work we analyze Convey's programming paradigm and the associated programming effort, and we present practical results on the HC-1. We consider vectorization strategies for the single and double precision vector personalities and a suite of basic numerical routines. Furthermore, we assess the viability of the Convey HC-1 Hybrid Core Computer for numerical simulation.

Keywords—FPGA, Convey HC-1, reconfigurable architectures, high-performance heterogeneous computing, coherent memory system, performance analysis, BLAS

I. INTRODUCTION

Field Programmable Gate Arrays (FPGAs) have their main pillar and standing in the domain of embedded computing. Application-specific designs implemented by hardware description languages (HDL) like VHDL and Verilog [1], [2] make them a perfect fit for specific tasks. From a software-oriented programmer's point of view, FPGA's capabilities are hidden behind an alien hardware design development cycle. Although there are some C-to-HDL tools like ROCCC, Impulse C or Handel-C [3] available, viability and translation efficiency for realistic code scenarios still have to be proven.

For several years, FPGAs have not been interesting for numerical simulations due to their limited capabilities and resource requirements for double precision floating point arithmetics. But following Moore's law and with increased FPGA sizes, more and more area is becoming available for computing. Moreover, further rates of increase are expected to outpace those of common multicore CPUs. For a general deployment, and in particular for numerical simulation, FPGAs are very attractive from further points of view: run-time configurability is an interesting topic for applications with several phases of communication and computation and might be considered for adaptive numerical methods. In addition, energy efficiency is a great concern in high performance computing, and FPGA technology is a possible solution approach. The main idea of FPGAs is to build one's own parallel fixed-function units according to the special needs of the underlying application.

Currently, numerical simulation adopts all kinds of emerging technologies. In this context, a trend towards heterogeneous platforms has become apparent [4]. Systems accelerated by graphics processing units (GPUs) offer unrivaled computing power but often suffer from slow interconnection via PCIe links. The idea to connect FPGAs via socket replacements closer to CPUs is nothing new (cf. technologies from Nallatech, DRC, XtremeData) – but the software concept offered by Convey is revolutionary [5]. A related FPGA platform is Intel's Atom reconfigurable processor – an embedded single-board computer based on the Intel Atom E600C processor paired with an Altera FPGA in a single package. Here, both entities communicate via PCIe-x1 links. Former hybrid CPU-FPGA machines were the Cray XD1 [6] and the SGI RC100.

In this work we outline the hardware and software architecture of the Convey HC-1 Hybrid Core Computer. We analyze Convey's programming concept and assess the functionalities and capabilities of Convey's single and double precision vector personalities. Furthermore, we evaluate the viability of the Convey HC-1 Hybrid Core Computer for numerical simulation by means of selected numerical kernels that are well-known building blocks for higher-level numerical schemes, solvers, and applications. Some performance results on the HC-1 can be found in [7]. Our work puts more emphasis on floating point kernels relevant for numerical simulation. Stencil applications on the HC-1 are also considered in [8].

II. HARDWARE CONFIGURATION OF THE CONVEY HC-1

The Convey HC-1 Hybrid Core Computer is an example of a heterogeneous computing platform. By its hybrid setup, specific application needs can either be handled by an x86-64 dual-core CPU or by the application-adapted FPGAs. All

algorithms. The user only has to treat an integrated instruction set controlled by pragmas and compiler settings. Convey offers a set of pre-defined personalities for single and double precision floating point arithmetics that turn the FPGAs into a soft-core vector processor. Furthermore, personalities for financial analytics and for proteomics are available. Currently, a finite difference personality for stencil computations is under development. The choice for a requested personality is specified at compile time by setting compiler flags. With Convey's personality development kit, custom personalities can be developed by following the typical FPGA hardware design tool chain (requiring considerable effort and additional knowledge). Convey's Software Performance Analysis Tool (SPAT) gives insight into the system's actual runtime behavior and feedback on possible optimizations. The Convey Math Library (CML) provides tuned basic mathematical kernels. For our experiments we used the Convey64 Compiler Suite, Version 2.0.0.

IV. THE POTENTIAL OF FPGAS

FPGAs have been considered to be non-optimal for floating point number crunching. But FPGAs show particular benefits for specific workloads like processing complex mathematical expressions (logs, exponentials, transcendentals), performing bit operations (shifts, manipulations), and performing sort operations (string comparison, pattern recognition). Further benefits can be achieved for variable bit length of data types with reduced or increased precision, or for treating non-standard number formats (e.g. decimal representation). The latter points are exploited within Convey's personalities for financial applications and proteomics. Recently, Convey reported a remarkable speedup of 172 for the Smith-Waterman algorithm [9].

Pure floating point-based algorithms in numerical simulation are often limited by bandwidth constraints and low arithmetic intensity (ratio of flop per byte). The theoretical peak bandwidth of 80 GB/s on the Convey FPGA device goes along with a specific appeal in this context. However, memory accesses on the device are not cached. Hence, particular benefits are expected for kernels with limited data reuse like vector updates (SAXPY/DAXPY), scalar products (SDOT/DDOT) and sparse matrix-vector multiplications (SpMV). Convey's special Scatter-Gather DIMMs are well adapted to applications with irregular data access patterns where CPUs and GPUs typically show tremendous performance breakdowns.

V. PERFORMANCE EVALUATION

In order to assess the performance potential of Convey's FPGA platform for floating point-based computations in numerical simulation we analyze some basic numerical kernels and their performance behavior. In particular, we consider library-based kernels provided by the CML and hand-written, optimized kernels. By comparing both results we draw some conclusions on the capability of Convey's compiler. In all cases, vectorization of the code and NUMA-aware placement of data is crucial for performance. Without vectorization there is a dramatic performance loss, since scalar code for the accelerator is executed on the slow application engine hub (AEH) that builds the interface between host and accelerator device. If data is not located in the accelerator memory but is accessed in the host memory over the FSB, bandwidth and hence performance also drop considerably.

For our numerical experiments we consider some basic building blocks for high-level solvers, namely vector updates z = ax + y (SAXPY/DAXPY in single and double precision), the vector product α = x · y (SDOT/DDOT), dense matrix-vector multiplication y = Ax (SGEMV/DGEMV), dense matrix-matrix multiplication C = AB (DGEMM/SGEMM), sparse matrix-vector multiplication (SpMV), and stencil operations.

VI. VECTORIZATION AND OPTIMIZATION OF CODE

In order to exploit the full capabilities of the FPGA accelerator, specific measures are necessary for code creation, for organizing data accesses, and for supporting the compiler in vectorizing the code. Due to its nature as a low-frequency, highly parallel vector architecture, performance on the Convey HC-1 heavily depends on the ability of the compiler to vectorize the code. One of the examples where this did not work out-of-the-box is dense matrix-vector multiplication SGEMV. The code snippet in Figure 2 shows a straightforward implementation. Here, the pragma cny no_loop_dep gives a hint to the compiler for vectorization that there are no data dependencies in the corresponding arrays.

    void gemv(int length, float A[], float x[],
              float y[]) {
      for( int i = 0; i < length; i++ ) {
        float sum = 0;
    #pragma cny no_loop_dep(A, x, y)
        for( int j = 0; j < length; j++ )
          sum += A[i*length+j] * x[j];
        y[i] = sum;
      }
    }

Fig. 2. Straightforward implementation of dense matrix-vector multiplication (SGEMV)

Although the compiler claims to vectorize the inner loop, performance is only approx. 2 GFlop/s and a factor of 7 below the performance of the CML math library version. The coprocessor instruction set supports vector reduction operations, but these seem to have a pretty high startup latency. The outer loop is not unrolled. Attempts to do that manually improved the performance somewhat, but introduced new performance degradations for certain vector lengths.
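Before turning to the loop-reordered variant, it may help to see the simplest kernel class from the list in Section V in the same pragma-annotated style. The SAXPY sketch below is our own illustration rather than code from the paper; only the pragma name is taken from the listings in Figures 2 and 3.

    /* Vector update z = a*x + y (SAXPY). The no_loop_dep hint, as used
     * in the GEMV listings, tells the compiler that the arrays do not
     * overlap so the loop can be vectorized for the coprocessor. */
    void saxpy(int length, float a, float x[], float y[], float z[]) {
    #pragma cny no_loop_dep(x, y, z)
      for( int i = 0; i < length; i++ )
        z[i] = a * x[i] + y[i];
    }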

The solution lies in exploiting Convey's scatter-gather memory, which allows for fast strided memory reads and therefore allows changing the loop ordering (see Figure 3). This gives considerably better results; performance improvements by loop reordering are detailed in Figure 4. For the reordered loops we consider three different memory allocation scenarios: dynamic memory allocated on the host and migrated with Convey's pragma cny migrate_coproc, dynamic memory allocated on the device, and static memory allocated on the host and migrated to the device with the pragma mentioned above. Performance increases with vector length but has some oscillations. These results even outperform the CML CBLAS library implementation from Convey (cf. Figure 13).

    void optimized_gemv(int length, float A[],
                        float x[], float y[]) {
      for( int i = 0; i < length; i++ )
        y[i] = 0.0;
      for( int j = 0; j < length; j++ )
    #pragma cny no_loop_dep(A, x, y)
        for( int i = 0; i < length; i++ )
          y[i] += A[i*length+j] * x[j];
    }

Fig. 3. Dense matrix-vector multiplication (SGEMV) optimized by loop reordering

Fig. 4. Performance results and optimization for single precision dense matrix-vector multiplication (SGEMV) with and without loop reordering and for different memory allocation schemes; cf. Fig. 2 and Fig. 3

VII. PERFORMANCE RESULTS AND ANALYSIS

A. Device Memory Bandwidth for Different Access Patterns

Performance of numerical kernels is often influenced by the corresponding memory bandwidth for loading and storing data. For our memory bandwidth measurements we use the following memory access patterns that are characteristic of diverse kernels:

    Sequential Read (SeRd):            d[i] = s[i]
    Sequential Read Indexed (SeRdI):   d[i] = s[seq[i]]
    Scattered Read Indexed (ScaRdI):   d[i] = s[rnd[i]]
    Sequential Write (SeWr):           d[i] = s[i]
    Sequential Write Indexed (SeWrI):  d[seq[i]] = s[i]
    Scattered Write Indexed (ScaWrI):  d[rnd[i]] = s[i]

Here, seq[i] = i, i = 1,...,N, is a sequential but indirect addressing and rnd[i] is an indirect addressing by an arbitrary permutation of [1,...,N]. Performance results for the described memory access patterns in single and double precision on the Convey HC-1, on a 2-way 2.53 GHz Intel Nehalem processor using 8 threads on 8 cores, on a single GPU of an NVIDIA Tesla S1070 system, and on a GTX 480 consumer GPU with the latest Fermi architecture are presented in Figure 5.

Fig. 5. Memory bandwidth for different memory access patterns on the Convey HC-1, on a 2-way 2.53 GHz Intel Nehalem system with 8 cores, on a single GPU of an NVIDIA Tesla S1070 system, and on an NVIDIA GTX480 consumer GPU

For the sequential indirect access, the CPU and GPU have built-in hardware mechanisms (caches for the CPU and memory coalescing for the GPU) to detect the access locality and therefore the performance does not decrease much, while the HC-1 bandwidth already drops considerably. For the scattered read and write access, Convey's memory configuration gives better values than the Nehalem system and even outpaces the GTX 480 for scattered write in double precision. The Convey HC-1 not only has an about 60% higher peak memory bandwidth than the Nehalem system, but it really shows the potential of its scatter-gather capability when accessing random locations in memory. Here, traditional cache-based architectures typically perform poorly and at least the older generation of GPU systems has a breakdown by an order of magnitude. Newer GPU systems represented by the GTX 480 on the other hand have improved considerably.
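A host-side sketch of how the six access patterns can be generated is given below. It is our own illustration under the definitions of seq[] and rnd[] stated above; it is not the measurement code used for Figure 5.

    #include <stdlib.h>

    /* Build seq[] as the identity and rnd[] as an arbitrary permutation
     * of 0..n-1 (Fisher-Yates shuffle). */
    void make_indices(int n, int seq[], int rnd[]) {
      for( int i = 0; i < n; i++ )
        seq[i] = rnd[i] = i;
      for( int i = n - 1; i > 0; i-- ) {
        int j = rand() % (i + 1);
        int t = rnd[i]; rnd[i] = rnd[j]; rnd[j] = t;
      }
    }

    /* The copy kernels whose bandwidth is compared in Figure 5; the
     * indexed variants cover SeRdI/ScaRdI and SeWrI/ScaWrI depending on
     * whether seq[] or rnd[] is passed as ix. */
    void se_rd  (int n, float d[], float s[])           { for( int i = 0; i < n; i++ ) d[i] = s[i]; }
    void se_rd_i(int n, float d[], float s[], int ix[]) { for( int i = 0; i < n; i++ ) d[i] = s[ix[i]]; }
    void se_wr  (int n, float d[], float s[])           { for( int i = 0; i < n; i++ ) d[i] = s[i]; }
    void se_wr_i(int n, float d[], float s[], int ix[]) { for( int i = 0; i < n; i++ ) d[ix[i]] = s[i]; }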

B. Data Transfers Between Host and Device

Because of the strongly asymmetric NUMA architecture of the HC-1 there are different methods to use main memory. Three of them are used in the following examples:
• dynamically allocate (malloc) and initialize on the host; use migration pragmas
• dynamically allocate (cny_cp_malloc) and initialize on the device
• statically allocate and initialize on the host; use migration pragmas

By initialization we mean the first touch of the data in memory. Because the Convey HC-1 is based on Intel's precedent technology of using the front-side bus (FSB) to connect memory to processors, a major bottleneck is the data connection between host memory and device memory. Figure 6 shows a comparison between read bandwidth on the host and the device and the migration bandwidth over the HC-1's front side bus in GB/s. While read bandwidth on the device memory reaches almost 33 GB/s, the transfer over the FSB achieves only about 700 MB/s. This impedes fast switching between parts of an algorithm which perform well on the coprocessor and its vector units and other parts relying on the flexibility of a high-clocked general purpose CPU. Compared to GPUs attached via PCIe, the FSB represents an even narrower bottleneck.

Fig. 6. Read bandwidth on host and device and migration bandwidth in GB/s on the Convey HC-1; performance results in GFlop/s for the SAXPY vector update for different memory allocation schemes

Furthermore, Figure 6 depicts performance of the SAXPY in terms of GFlop/s (the unit on the y-axis has to be chosen correspondingly). For data originally allocated on the host and migrated to the device we observe some oscillations in the performance.

C. Avoiding Bank Conflicts with 31-31 Interleave

The scatter-gather memory configuration of the Convey HC-1 can be used in two different mapping modes:
• Binary interleave: the traditional approach; parts of the address bitmap are mapped round-robin to different memory banks
• 31-31 interleave: modulo-31 mapping of parts of the address bitmap

Because in the 31-31 interleave mode the memory is divided into 31 groups of 31 banks, memory strides of powers of two and many other strides hit different banks and therefore do not suffer from memory bandwidth degradation. But to integrate this prime-number scheme into a power-of-two dominated world, one of 32 groups and every 32nd bank are not used, resulting in a loss of some addressable memory and approximately 6% of peak memory bandwidth. In Figure 7 performance results for the SAXPY vector update are shown for both interleave options. For the SAXPY, binary memory interleave is slightly worse. Performance results for the CML DGEMM routine in Figure 8 show larger variations with 31-31 interleave. The DGEMM routine achieves about 36 GFlop/s and the SGEMM routine yields about 72 GFlop/s on our machine. In both cases this is roughly 90% of the estimated machine peak performance.

Fig. 7. Performance of SAXPY vector updates with 31-31 and binary interleave

D. BLAS Operations

Basic Linear Algebra Subprograms (BLAS) [10] are a collection and interface for basic numerical linear algebra routines. We use these routines for assessment of the HC-1 FPGA platform. We compare our own, straightforward implementations of BLAS routines with those provided by Convey's Math Library (CML). Loop reordering techniques are applied for performance improvements. In the following examples we use the three different memory usage schemes detailed in Section VII-B. In all three cases initialization and migration costs are not considered in our measurements.

Data allocation on the host followed by migration routines or pragmas is not really a controllable and reliable procedure. From time to time considerable drops in performance are observed. So far, we could not identify a reasonable pattern or a satisfactory explanation for these effects. Our measurements are made using separate program calls for a set of parameters.
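The following sketch spells out the three memory usage schemes of Section VII-B. The identifiers cny_cp_malloc and migrate_coproc are only named in the text; their exact prototypes and pragma argument lists are our assumptions, so the snippet should be read as illustrative pseudo-C rather than verified Convey API usage.

    #include <stdlib.h>

    #define N (1 << 22)

    extern void *cny_cp_malloc(size_t bytes);   /* prototype assumed, not verified */

    static double a_static[N];                  /* scheme 3: static host array */

    void setup_buffers(void) {
      /* Scheme 1: dynamic allocation and first touch on the host,
       * then migration to coprocessor memory. */
      double *a_host = malloc(N * sizeof(double));
      for( int i = 0; i < N; i++ ) a_host[i] = 0.0;       /* first touch on host */
    #pragma cny migrate_coproc(a_host)                     /* argument list assumed */

      /* Scheme 2: dynamic allocation directly in coprocessor memory. */
      double *a_dev = cny_cp_malloc(N * sizeof(double));
      for( int i = 0; i < N; i++ ) a_dev[i] = 0.0;         /* first touch on device */

      /* Scheme 3: static host allocation, first touch on the host,
       * then migration with the same pragma. */
      for( int i = 0; i < N; i++ ) a_static[i] = 0.0;
    #pragma cny migrate_coproc(a_static)                   /* argument list assumed */
    }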

Fig. 8. Performance of the cblas_dgemm matrix-matrix multiplication provided by the CML with 31-31 and binary interleave

Fig. 10. DAXPY vector update using different implementations and different memory allocation strategies
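To make the difference between the two mapping modes of Section VII-C concrete, the toy model below maps word addresses to banks once by their low address bits and once modulo 31. It is a deliberately simplified illustration of why power-of-two strides collide under binary interleave but spread out under a 31-31 scheme; it is not Convey's actual address mapping.

    #include <stdio.h>

    /* 1024 banks addressed by the low bits (binary interleave) versus
     * 31 groups of 31 banks addressed modulo 31 (31-31 interleave). */
    static int bank_binary(long a) { return (int)(a & 1023); }
    static int bank_31_31(long a)  { return (int)((a % 31) * 31 + (a / 31) % 31); }

    int main(void) {
      const long stride = 1024;                 /* power-of-two stride in words */
      for( long i = 0; i < 8; i++ ) {
        long a = i * stride;
        printf("access %ld -> binary bank %4d, 31-31 bank %4d\n",
               i, bank_binary(a), bank_31_31(a));
      }
      return 0;   /* the binary column stays at bank 0, the 31-31 column varies */
    }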

When trying to measure by looping over different vector lengths, allocating and freeing memory on the host and using migration calls in between, the results are even less reliable.

We observe that our own implementations are usually faster for short vector lengths – probably due to lower call overhead and less parameter checking. For longer vector lengths the CML library implementations usually give better results. Results for the SAXPY/DAXPY vector updates are depicted in Figure 9 and in Figure 10. Performance data for the SDOT/DDOT scalar products are shown in Figure 11 and in Figure 12, and for the SGEMV/DGEMV dense matrix-vector multiplication in Figure 13 and in Figure 14.

Fig. 9. SAXPY vector update using different implementations and different memory allocation strategies

Fig. 11. SDOT scalar product using different implementations and different memory allocation strategies

E. Sparse Matrix-Vector Multiplication

Many numerical discretization schemes for scientific problems result in sparse system matrices. Typically, iterative methods are used for solving these sparse systems. On top of scalar products and vector updates, the efficiency of sparse matrix-vector multiplications is very important for these scenarios. When using the algorithm for the compressed sparse row (CSR) storage format [11] presented in Figure 15, loops and reduction operations are vectorized by the compiler. However, the performance results are very disappointing – being in the range of a few MFlop/s. Although the memory bandwidth for indexed access as presented in Figure 5 is very good, the relatively short vector length and the overhead of the vector reduction in the inner loop seem to slow down computations (see also observations in Section VI on loop optimizations). Unfortunately, in this case loop reordering is not that easy because the length of the inner loop depends on the outer loop. A possible solution is to use other sparse matrix representations like the ELL format, as used on GPUs e.g. in [12].
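The ELL format mentioned above pads every row to a fixed number of entries and stores the matrix slice-wise, so a long unit-stride loop over all rows replaces the short inner reduction of the CSR kernel in Figure 15. The kernel below is our own sketch of such a layout in the style of the paper's listings, not code from the paper.

    /* ELL SpMV: val and coli hold max_nnz slices of length nrows each
     * (column-major), padded with zero values and a valid column index. */
    void spmv_ell(int nrows, int max_nnz, float val[], int coli[],
                  float vin[], float vout[]) {
      for( int i = 0; i < nrows; i++ )
        vout[i] = 0.0;
      for( int k = 0; k < max_nnz; k++ ) {
    #pragma cny no_loop_dep(val, coli, vin, vout)
        for( int i = 0; i < nrows; i++ )          /* long, unit-stride loop */
          vout[i] += val[k * nrows + i] * vin[coli[k * nrows + i]];
      }
    }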

Fig. 12. DDOT scalar product using different implementations and different memory allocation strategies

Fig. 14. DGEMV matrix-vector multiplication using different implementations and different memory allocation strategies
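For reference, a hand-written DDOT kernel in the same style as the other listings is sketched below (our own illustration). Its inner reduction is exactly the operation for which Section VI reports a high start-up latency of the coprocessor's vector reduction instructions.

    /* Scalar product alpha = x . y in double precision (DDOT). */
    double ddot(int length, double x[], double y[]) {
      double sum = 0.0;
    #pragma cny no_loop_dep(x, y)
      for( int i = 0; i < length; i++ )
        sum += x[i] * y[i];
      return sum;
    }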

    void spmv(int nrows, float val[],
              int coli[], int rowp[],
              float vin[], float vout[]) {
    #pragma cny no_loop_dep(val, vin, vout)
    #pragma cny no_loop_dep(coli, rowp)
      for( int row = 0; row < nrows; row++ ) {
        int start = rowp[row];
        int end = rowp[row+1];
        float sum = 0.0;
        for( int i = start; i < end; i++ ) {
          sum += val[i] * vin[coli[i]];
        }
        vout[row] = sum;
      }
    }

Fig. 15. Sparse matrix-vector multiplication (SpMV) routine for CSR data format

Fig. 13. SGEMV matrix-vector multiplication using different implementations and different memory allocation strategies

F. Stencil Operations

Stencil kernels are one of the most important routines applied in the context of solving partial differential equations (PDEs) on structured grids. They originate from the discretization of differential expressions in PDEs by means of finite element, finite volume or finite difference methods. They are defined as a fixed subset of nearest neighbors where the corresponding node values are used for computing weighted sums. The associated weights correspond to the coefficients of the PDEs, where coefficients are assumed to be constant in our context. In our test we use a 3-dimensional 7-point stencil for solving the Laplace equation on grids of different sizes. The performance results on the HC-1, on an 8-core Nehalem CPU system, and on two different NVIDIA GPUs are shown in Figure 16. Our stencil code for the HC-1 is close to the example given in the Convey documentation material. The CPU implementation is the one used in [13], not using the presented in-place optimization but only conventional space-blocking and streaming optimizations. The NVIDIA CUDA GPU implementation uses blocking of input values in shared memory for data reuse.
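A minimal C version of such a constant-coefficient 7-point sweep is sketched below; one grid update performs 6 additions and 2 multiplications, matching the 8 Flop counted in Figure 16. This is our own illustration, not the HC-1 code from the Convey documentation or the CPU/GPU implementations of [13].

    /* One Jacobi-style sweep of a 3D 7-point stencil with constant
     * coefficients c0 (center) and c1 (neighbors) on an n^3 grid with a
     * one-cell boundary layer; u and unew must not alias. */
    void stencil7(int n, double c0, double c1, double u[], double unew[]) {
      const long nn = (long)n * n;
      for( int k = 1; k < n - 1; k++ )
        for( int j = 1; j < n - 1; j++ )
    #pragma cny no_loop_dep(u, unew)
          for( int i = 1; i < n - 1; i++ ) {
            long idx = k * nn + (long)j * n + i;
            unew[idx] = c0 * u[idx]
                      + c1 * (u[idx - 1]  + u[idx + 1]
                            + u[idx - n]  + u[idx + n]
                            + u[idx - nn] + u[idx + nn]);   /* 8 Flop per update */
          }
    }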

Fig. 16. Performance of a 3-dimensional 7-point Laplace stencil (one grid update is counted as 8 Flop) in single precision (SP) and double precision (DP) on the Convey HC-1, on a 2-way 2.53 GHz Intel Nehalem system using 8 cores, on a single GPU of an NVIDIA Tesla S1070 system, and on an NVIDIA GTX480 consumer GPU

For the conventional CPU one can see a high peak for small grid sizes when the data can be kept in the cache. For larger grid sizes a pretty constant performance with slight increases due to less loop overhead is observed. The Convey HC-1 on the other hand shows no cache effects but lower performance on smaller grids. But unfortunately, because of its lack of caching, neighboring values of the stencil have to be reloaded every time they are needed – wasting a large portion of the much higher total memory bandwidth. On the Convey HC-1 the difference between single and double precision stencil performance becomes apparent only for large grid sizes. Both NVIDIA GPUs show impressive performance, especially the newer generation. At the same time, they show the restrictive memory limitations – especially for the budget-priced consumer GPU with only 2 GB device memory. Here, larger samples could not be computed.

VIII. CONCLUSION

Convey's HC-1 Hybrid Core Computer offers seamless integration of a highly capable FPGA platform with an easy coprocessor programming model, a coherent memory space shared by the host and the accelerator, and remarkable bandwidth values on the coprocessor. Moreover, Convey's scatter-gather memory configuration offers advantages for codes with irregular memory access patterns. With Convey's personalities the actual hardware configuration can be adapted to, and optimized for, specific application needs. With its HC-1 platform, Convey brings FPGAs closer to high performance computing. However, we have failed to port more complex applications originating in numerical simulation due to the failure to obtain acceptable speed for sparse matrix-vector multiplication.

The HC-1 has the potential to be used for general purpose applications. Although the HC-1 falls behind the impressive performance numbers of GPU systems and the latest multicore CPUs, it provides an innovative approach to asymmetric processing, to compiler-based parallelization, and in particular to portable programming solutions. Only a single code base is necessary for x86-64 and FPGA platforms, which facilitates maintainability of complex codes. In contrast to GPUs, memory capacity is not limited to a few GB, and FPGAs connected by direct networks come in reach. A great opportunity lies in the possibility to develop custom personalities – if time, knowledge and costs permit.

Convey's approach represents emerging technology with some deficiencies but also with a high level of maturity. Major drawbacks arise from limitations for floating point arithmetics on FPGAs, compiler capabilities for automatic vectorization, and the usage of Intel's obsolete FSB communication infrastructure. In our experience, typical code bases still show room for code and compiler improvements. While major benefits have been reported for specific workloads in bioinformatics, the HC-1 also provides a viable means for floating point-dominated and bandwidth-limited numerical applications. Despite its high acquisition costs, this breakthrough technology needs further attention.

ACKNOWLEDGEMENTS

The Shared Research Group 16-1 received financial support by the Concept for the Future of Karlsruhe Institute of Technology in the framework of the German Excellence Initiative and by the industrial collaboration partner Hewlett-Packard.

REFERENCES

[1] P. J. Ashenden, The VHDL Cookbook. Dept. Computer Science, Univ. Adelaide, S. Australia. [Online]. Available: http://tams-www.informatik.uni-hamburg.de/vhdl/doc/cookbook/VHDL-Cookbook.pdf
[2] D. Thomas and P. Moorby, The Verilog Hardware Description Language, 2008.
[3] J. M. P. Cardoso, P. C. Diniz, and M. Weinhardt, "Compiling for reconfigurable computing: A survey," ACM Comput. Surv., vol. 42, pp. 13:1–13:65, June 2010. [Online]. Available: http://doi.acm.org/10.1145/1749603.1749604
[4] A. Shan, "Heterogeneous processing: A strategy for augmenting Moore's law," Linux J. [Online]. Available: http://www.linuxjournal.com/article8368
[5] T. M. Brewer, "Instruction set innovations for the Convey HC-1 Computer," IEEE Micro, vol. 30, pp. 70–79, 2010.
[6] O. Storaasli and D. Strenski, "Cray XD1 – exceeding 100x speedup/FPGA: Timing analysis yields further gains," in Proc. 2009 Cray User Group, Atlanta, GA, 2009.
[7] J. Bakos, "High-performance heterogeneous computing with the Convey HC-1," Computing in Science Engineering, vol. 12, no. 6, pp. 80–87, 2010.
[8] J. M. Kunkel and P. Nerge, "System performance comparison of stencil operations with the Convey HC-1," Research Group Scientific Computing, University of Hamburg, Tech. Rep. 2010-11-16, 2010.
[9] Convey Computer, http://www.conveycomputer.com/resources/ConveyBioinformatics web.pdf.
[10] Basic Linear Algebra Subprograms (BLAS), http://www.netlib.org/blas/.
[11] S. Williams, R. Vuduc, L. Oliker, J. Shalf, K. Yelick, and J. Demmel, "Optimizing sparse matrix-vector multiply on emerging multicore platforms," Parallel Computing (ParCo), vol. 35, no. 3, pp. 178–194, March 2009.
[12] N. Bell and M. Garland, "Efficient sparse matrix-vector multiplication on CUDA," NVIDIA Corporation, NVIDIA Technical Report NVR-2008-004, Dec. 2008.
[13] W. Augustin, V. Heuveline, and J.-P. Weiss, "Optimized stencil computation using in-place calculation on modern multicore systems," pp. 772–784, 2009.

Optimized Replacement in the Configuration Layers of the Grid ALU Processor

Ralf Jahr, Basher Shehan, Theo Ungerer
University of Augsburg
Institute of Computer Science
86135 Augsburg, Germany
Email: {jahr, shehan, ungerer}@informatik.uni-augsburg.de

Sascha Uhrig
Technical University Dortmund
Robotics Research Institute
44221 Dortmund, Germany
Email: [email protected]

Abstract—The Grid ALU Processor comprises a reconfigurable array is optimized, too. Because of this, it is a worthwhile goal two-dimensional array of ALUs. A conventional sequential in- to increase the usage of the configuration layers. Analyzing struction stream is mapped dynamically to this array by a the execution of benchmarks we came to the conclusion special configuration unit within the front-end of the processor pipeline. One of the features of the Grid ALU Processor are that for some of them our default replacement strategy LRU its configuration layers, which work like a trace cache to store works unexpectedly bad, even worse than replacing a random instruction sequences that have been already mapped to the ALU configuration layer (we call this strategy RANDOM). So LRU array recently. is in some cases not clever at all and humbles the execution Originally, the least recently used (LRU) strategy has been speed. implemented to evict older configurations from the layers. As The main contributions of this paper are (1) the analysis we show in this paper the working set is frequently larger than the available number of configuration layers in the processor and comparison of the behavior of well-known replacement resulting in thrashing. Hence, there is quite a large gap between algorithms when applied to the replacement in the configura- the hit rate achieved by LRU and the hit rate achievable tion layers and (2) the introduction and analysis of qdLRU. with an optimal algorithm. We propose an approach called QdLRU improves the hit rate of LRU by adding flags to the qdLRU to enhance the performance of the configuration layers. program code based on a feedback-directed approximation of Using qdLRU closes the gap between LRU and an optimal eviction strategy by 66% on average and achieves a maximum the working sets. performance improvement of 390% and 5.06% on average with After giving a short introduction of the target platform respect to the executed instructions per clock cycle (IPC). in Section II, we discuss some basics about replacement Index Terms—Trace Cache, Replacement Strategy, Post-link strategies in Section III. The extended version of LRU called Optimization, Feedback-directed Optimization, Coarse-Grained qdLRU is introduced in Section IV and evaluated in Section V. Reconfigurable Architecture Related work is presented in Section VI and Section VII concludes the paper. I.INTRODUCTION Within this paper, we present an optimization for the Grid II.TARGET PLATFORM:THE GRID ALUPROCESSOR ALU Processor (GAP), which has been introduced by Uhrig The Grid ALU Processor (GAP) has been developed to et al. [1]. It brings together a superscalar-like processor front- speed up the execution of conventional single-threaded instruc- end and a coarse-grained reconfigurable architecture, i.e. a tion streams. To achieve this goal, it combines the advantages reconfigurable array of functional units (FUs). The front-end of superscalar processor architectures, those of coarse-grained consisting of instruction fetch and decode unit is extended reconfigurable systems, and asynchronous execution. with a new configuration unit. This unit maps the instructions A superscalar-like processor front-end consisting of fetch- from the instruction stream dynamically and at run-time onto and decode units is used together with a novel configuration the array of FUs. 
Mapping of instructions and execution unit (see Figure 1(a)) to load instructions and map them dy- of instructions in the array run in parallel until there is a namically onto an array of functional units (FUs) accompanied reason to flush the array and restart the mapping process. The by a branch control unit and several load/store units to handle mapping which has been built until this moment is called a memory accesses (see Figure 1(b)). configuration. The array of FUs is organized in columns and rows. Each These configurations can be buffered in so-called configu- column is dynamically and per configuration assigned to one ration layers, which are formed by some memory cells very architectural registers. Instructions are assigned to the columns close to all the FUs. The configuration layers are very similar whose register match the instructions’ output registers. The to trace caches. If a part of a program, i.e. a configuration, is rows of the array are used to model dependencies between already stored in the configuration layers it can be executed instructions. If an instruction B is dependent of an instruction faster because it does not have to go through the front-end first, A, it will be mapped to a row below the row of A. This way so instruction cache misses cannot occur. The timing inside the it is possible for the in-order configuration unit to also “issue”

(a) Block diagram of the GAP core    (b) General organization of the ALU array

Fig. 1. Architecture of the Grid ALU Processor dependent instructions without the need of complex out-of- A. Measuring the Performance of a Replacement Strategy order logic. A bimodal branch predictor is used to effectively To analyze the performance of a replacement policy, we map control dependencies onto the array. suggest two measures. The total hit rate htotal of the layer Execution starts in the first row of the array. The dataflow subsystem, which is the number of accesses of layers which is performed asynchronously inside the array of FUs and it is can be found in the configuration layers ahit divided by the synchronized with the clock of the branch control unit and the total number of accesses atotal. The total hit rate htotal can L/S units by so-called timing tokens [1]. also be understood as the sum of the hit rate by re-accessing Whenever either a branch is miss-predicted or execution the identical configuration subsequently hloop = aloop/atotal, reaches the last row of the array with configured FUs the array which is independent from the number of layers available, is cleared and the configuration unit maps new instructions and the hit rate contributed by the layer subsystem hlayer = starting from the first row of the array. In order to save con- alayer/atotal for all other accesses: figurations for repeated execution all elements of the array are equipped with some memory cells which form configuration ahit aloop alayer layers. Typically, 2, 4, 8, 16, 32, or 64 configuration layers htotal = = + = hloop + hlayer atotal atotal atotal are available. The array is quasi three-dimensional and its size can be written as columns x rows x layers. A replacement policy can influence only the hit rate of the layer subsystem h . For a given benchmark, h has the With this extension it has to be checked before mapping layer loop same value for all replacement strategies. new instructions if the next instruction to execute is equal An optimal offline replacement algorithm (named OPT in to the first instruction in any of the layers. If a match is the remainder) has been introduced by Belady [4] and it can be found, the corresponding layer is set to active and execution used as upper bound. In other words, no (online) replacement continues there. If no match is found, the least recently policy can achieve a better hit rate than this offline policy, used configuration layer is cleared and used to store the which chooses the element for eviction that will be reused as new configuration. In all cases, the new values of registers the last one of all elements in the future. calculated in columns are copied into the registers at the top Another offline algorithm has been mentioned by Temam [5] of the columns. with the goal to maximize the number of instructions which To evaluate the architecture a cycle- and signal-accurate can be accessed without cache misses. As upper bound for the simulator has been developed. It uses the Portable Instruction performance of a replacement policy the algorithm OPT is a Set Architecture (PISA), hence the simulator can execute the much more feasible measure because in the GAP, the penalty identical program files as the SimpleScalar simulation tool caused by activating the front-end of the processor when a set [2] (but it is not based on it). Detailed information about the new configuration must be build is much higher compared to processor are given by Uhrig et al. [1] and Shehan et al. [3]. 
the time, which is saved when some additional instructions can be found in a layer. III.TOWARDS AN IMPROVED POLICY The second measure to evaluate a replacement policy is the Several basic terms of replacement strategies with respect performance of the whole system, which is e.g. described by to the GAP architecture are discussed in this section. the number of instructions executed per clock cycle (IPC).
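Restating the hit-rate decomposition from Section III-A in display form (same notation as in the text, where a_hit, a_loop, a_layer, and a_total count accesses to the configuration layers):

    \[
      h_{\mathrm{total}}
        = \frac{a_{\mathrm{hit}}}{a_{\mathrm{total}}}
        = \frac{a_{\mathrm{loop}}}{a_{\mathrm{total}}} + \frac{a_{\mathrm{layer}}}{a_{\mathrm{total}}}
        = h_{\mathrm{loop}} + h_{\mathrm{layer}}
    \]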

10 11 12 Listing 1. Algorithm to configuration lines By this heuristic, we select configurations in a manner i n p u t : l i s t t r a c e that they influence as little as possible the execution of configuration lines that fit into the layers. If a configuration #define line list s e t a l l l i n e s line fits into the layers, but one of its configurations is marked, map l i n e c o u n t e r s than this can humble the hit rate of this configuration line extremely. line current l i n e = configuration last configuration{} In the last step, our post-link optimization tool GAPtimize (introduced in [9]) is used to mark the first instruction of the foreach(configuration item in trace) selected configurations with a special drop quickly flag. This if(item == last configuration) // Do nothing flag directs the configuration layer subsystem of GAP to drop else if(item / c u r r e n t l i n e ) the configuration starting with the actual instruction quickly. c u r r e n t l∈ i n e += item l a s t configuration = item B. Executing the modified binary e l s e a l l lines += current l i n e When implementing qdLRU, changes are necessary both in l i n e counters[current l i n e ]++ hardware and in software. The changes in hardware are very c u r r e n t l i n e = simple. All which has to be done is to make sure that either a l a s t configuration{} = item configuration beginning with a marked instruction is inserted in the least recently used position in the LRU access queue or that, when looking for a layer for eviction, it is first looked for a configuration layer starting with a marked instruction and number of layers in the processor and the other group Clong contains all the other configuration lines, those configuration then replacing this layer. lines are too long to fit into the layers without evictions. If a program is executed on the GAP which has not been With having prepared these groups the following algorithm optimized (and is hence without flags), then qdLRU behaves is performed: exactly like LRU, which still offers reasonable performance. This graceful degradation is one of the requirements of all 1) Select a configuration line item from Clong. techniques used for the GAP. 2) Select from item the configuration with the least usage V. EVALUATION in Cshort, mark its first instruction. 3) Select all configuration lines from Clong where the For the practical evaluation we rely on the cycle-accurate number of all configurations minus the number of all simulator which has been developed for the GAP and was marked configurations in the line is smaller than the extended to support qdLRU. As the hardware complexity of number of layers of the processor. Move them to Cshort. GAP can vary very much because of different sizes of its 4) If Clong is not empty, restart the algorithm with step 1. ALU array, we set it to a fixed size of 12 columns and 12

(a) Access plot for the first 1000 accesses of configuration layers for benchmark stringsearch

(b) Access plot for the first 1000 accesses of configuration layers for benchmark qsort

Fig. 4. Access plots (see Section III-C) for GAP with 12x12x16 array and LRU as replacement policy; some patterns are marked and labeled.
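The hardware-side rule of qdLRU described in Section IV (configurations starting with a drop-quickly-flagged instruction are preferred eviction victims, otherwise plain LRU is used) can be summarized by the following sketch. The data layout and field names are our assumptions for illustration, not the GAP implementation.

    /* Victim selection for the configuration layers under qdLRU. */
    typedef struct {
      int      valid;          /* layer currently holds a configuration          */
      int      drop_quickly;   /* flag of the configuration's first instruction  */
      unsigned last_use;       /* LRU timestamp maintained by the layer logic    */
    } config_layer_t;

    int qdlru_select_victim(const config_layer_t layer[], int nlayers) {
      /* Prefer empty layers or layers marked to be dropped quickly. */
      for( int i = 0; i < nlayers; i++ )
        if( !layer[i].valid || layer[i].drop_quickly )
          return i;
      /* Otherwise fall back to plain LRU, giving graceful degradation
       * for binaries without drop-quickly flags. */
      int victim = 0;
      for( int i = 1; i < nlayers; i++ )
        if( layer[i].last_use < layer[victim].last_use )
          victim = i;
      return victim;
    }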

13 14 15 one. Together with OPT as upper bound the performance LRU has already been implemented. It also supports graceful of LRU, FIFO and RANDOM have been compared for our degradation back to LRU. situation in Section III-B. As future work, we propose to work on the detection of The second class of algorithms are the Dynamic Insertion working sets. The rule which has been introduced is simple Policy (DIP) proposed by Qureshi et al. [11] and the Shepherd and effective. Nevertheless, there are situations where this rule Cache proposed by Rajan etc al. [12]. Both share the property cannot find a sufficient solution. Hence, to find better solutions that they require additional hardware effort. In our experi- it should be thought about the scope of the working sets. From ments, we also got for our particular situation performance our point of view, it is important that the configurations in a numbers at most comparable to LRU for the Shepherd Cache. working set should be executed repeatedly in the same order. The DIP is only applicable if it can select between LRU If this restriction is weakened, the scope of working sets could and BIP with extreme parameters to prevent thrashing. The be enlarged which must be handled carefully but might lead suggested approach to divide the configuration layers into two to further improved results. Concluding, it might be possible sets does not seem to be applicable due to the small number to find better solutions with biologically inspired algorithms, of configuration layers. The small number of lines prevents e.g. ant algorithms or genetic algorithms. Linear programming using strategies like ARC [13] where the lines are split into should also be taken into consideration. two sections and handled in different ways. REFERENCES Some other techniques have also been proposed (see e.g. [14]) but most of them either require large changes of the [1] S. Uhrig, B. Shehan, R. Jahr, and T. Ungerer, “The two-dimensional su- perscalar gap processor architecture,” International Journal on Advances hardware and/or are not supposed to work well because the low in Systems and Measurements, 2010. number of layers available in the GAP normally restricts the [2] D. Burger and T. Austin, “The simplescalar tool set, version 2.0,” ACM eventual gain in performance caused by replacement strategies. SIGARCH Computer Architecture News, vol. 25, no. 3, pp. 13–25, June 1997. Trace caches as introduced by Rotenberg et al. [15] work for [3] B. Shehan, R. Jahr, S. Uhrig, and T. Ungerer, “Reconfigurable grid alu superscalar processors very similar to the configuration layers processor: Optimization and design space exploration,” in Proceedings of the 13th Euromicro Conference on Digital System Design (DSD) 2010, because they are used to buffer parts of a program flow, too. Lille, France, 2010. To our knowledge, nobody has yet been working on thrashing [4] L. A. Belady, “A study of replacement algorithms for a virtual-storage situations in this context. computer,” IBM Systems, vol. 5, no. 2, pp. 78–101, 1966. [5] O. Temam, “Investigating optimal local memory performance,” SIGOPS Oper. Syst. Rev., vol. 32, no. 5, pp. 218–227, 1998. VII.CONCLUSIONAND FUTURE WORK [6] M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge, and T. Brown, “MiBench: A free, commercially representative embedded benchmark We introduced a software-supported replacement strategy suite,” 4th IEEE International Workshop on Workload Characteristics, for the configuration layers of the GAP processor, which are pp. 
3–14, December 2001. [7] P. J. Denning, “The locality principle,” Commun. ACM, vol. 48, no. 7, used like a trace cache to buffer instructions sequences ready pp. 19–24, 2005. for execution. So far, LRU is used as replacement strategy [8] P. Denning, “Thrashing: its causes and prevention,” in AFIPS ’68 (Fall, which offers an unsatisfying performance for several bench- part I): Proceedings of the December 9-11, 1968, fall joint computer conference, part I. New York, NY, USA: ACM, 1968, pp. 915–922. marks. Strangely enough, LRU shows for some benchmarks [9] R. Jahr, B. Shehan, S. Uhrig, and T. Ungerer, “Static speculation as even worse performance than RANDOM, a strategy evicting a post-link optimization for the grid alu processor,” in Proceedings of the random element. The main reason for this is thrashing, which 4th Workshop on Highly Parallel Processing on a Chip (HPPC 2010), 2010. can happen if the elements of a working set are processed [10] R. W. Carr and J. L. Hennessy, “WSCLOCK - a simple and effective repeatedly and sequentially, i.e. there is a huge degree of algorithm for virtual memory management,” in SOSP ’81: Proceedings locality, and the set contains more configurations than the GAP of the eighth ACM symposium on Operating systems principles. New York, NY, USA: ACM Press, 1981, pp. 87–95. provides configuration layers. In this case, the hit rate achieved [11] M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, with LRU collapses. “Adaptive insertion policies for high performance caching,” in ISCA ’07: Proceedings of the 34th annual international symposium on Computer To overcome this issue, we proposed a replacement strategy architecture. New York, NY, USA: ACM, 2007, pp. 381–391. called qdLRU and a heuristic to approximate the working [12] K. Rajan and G. Ramaswamy, “Emulating optimal replacement with sets in software. Based on working sets we select some a shepherd cache,” in MICRO 40: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture. Washing- configurations which are evicted immediately from the con- ton, DC, USA: IEEE Computer Society, 2007, pp. 445–454. figuration layers. With this, we can draw the behavior of [13] N. Megiddo and D. S. Modha, “Outperforming lru with an adaptive qdLRU nearer to the optimal strategy OPT. The performance replacement cache algorithm,” Computer, vol. 37, no. 4, pp. 58–65, 2004. measured by the IPC for qdLRU is on average 5.06% higher [14] G. Keramidas, P. Petoumenos, and S. Kaxiras, “Where replacement than the performance achieved by LRU. A peak improvement algorithms fail: a thorough analysis,” in CF ’10: Proceedings of the of 390% is gained for secu-rinjdael-decode caused by a peak 7th ACM international conference on Computing frontiers. New York, NY, USA: ACM, 2010, pp. 141–150. improvement of the hit rate of 0.5. [15] E. Rotenberg, S. Bennett, and J. E. Smith, “Trace cache: a low This approach could be adapted for all situations in which a latency approach to high bandwidth instruction fetching,” in MICRO 29: replacement strategy is needed for a small number of complex Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture. Washington, DC, USA: IEEE Computer Society, elements with many thrashing-risky situations. The introduced 1996, pp. 24–35. strategy requires only very little changes of the hardware when

Performance Engineering of an Orthogonal Matching Pursuit Algorithm for Sparse Representation of Signals on Different Architectures

Markus Stürmer, Harald Köstler (University of Erlangen-Nuremberg, 91058 Erlangen, Germany)
Florian Rathgeber (Imperial College London, London SW7 2AZ, United Kingdom)

Abstract—Modern multicore architectures require patches thus can be processed completely in paral- adapted, parallel algorithms and implementation strategies lel, within one patch the very time-consuming and for many applications. As a non-trivial example we chose hardly parallelizable task is to find the coefficients in this paper a patch-based sparse coding algorithm for the sparse representation in the given basis. called Orthogonal Matching Pursuit (OMP) and discuss parallelization and implementation strategies on current We solve this approximately by a variant of an hardware. The OMP algorithm is used in imaging and orthogonal matching pursuit (OMP) [4], the Batch- involves heavy computations on many small blocks of OMP algorithm [5]. pixels called patches. From a global view the patches In previous work we have shown that sparse within the image can be processed completely in parallel representations are suitable for CT image denois- but within one patch the algorithm is hard to parallelize. ing and compared the results to other denoising We compare the performance on the Cell Broadband Engine Architecture (CBEA), different GPUs, and current approaches [6], [7], [8], [9]. We also accelerated multicore CPUs. the Batch-OMP algorithm on the Cell Broadband Index Terms—Batch-OMP algorithm, orthogonal Engine Architecture (CBEA) in order to achieve matching pursuit, GPGPU, Cell Broadband Engine close to real-time performance [10]. In this paper Architecture, performance engineering, multicore we compare the performance of the Batch-OMP algorithm on different hardware architectures like I.INTRODUCTION standard CPUs, the CBEA, and GPUs. Since image acquisition systems produce more Sparse coding is a prominent topic and thus and more data and most imaging applications are several other contributions have been made by other time-critical, efficient implementations are needed groups. GPU, CPU, and CBEA implementations for that run on current hardware. the related compressive sensing are presented in In this work we consider sparse coding, i.e. [11], [12] and general image reconstruction algo- finding sparse representations of signals or images, rithms on GPUs in [13]. In [14], a matching pursuit based on a frame (overcomplete basis), the so- algorithm is implemented on the GPU, which is called dictionary. Sparse representations are widely easier to parallelize and requires less computational used and currently state-of-the-art in imaging ap- effort than OMP. Closest to our work are Septimus plications like image denoising, super-resolution, or and Steinberg [15], who port the Batch-OMP algo- image restoration [1], [2], [3]. Since sparse coding rithm to a Xilinx Virtex 5 FPGA, and Braun [16], is in general an NP-hard problem, we reduce the who shows a GPU implementation of the Batch- complexity for the creation of the sparse represen- OMP algorithm. Since Braun states that GPUs are tation of an image by decomposing it into many not significantly better or worse than CPUs for small blocks called patches. Then, sparse coding OMP, we decided to make a deeper analysis. We is done independently for each of them. While the achieve better performance with our manually tuned

17 code, but still observe interestingly similar results B. Orthogonal Matching Pursuit for CBEA, GPUs and standard CPUs. This is due Orthogonal matching pursuit (OMP) [19], a sim- to the versatile requirements of the different parts ple greedy algorithm, solves (1) approximately by of OMP algorithm which cancel out the specifics of sequentially selecting dictionary atoms. OMP is each platform. guaranteed to converge in finite-dimensional spaces In the following we give more details about sparse within a finite number of iterations [4]. The com- representations and the Batch-OMP algorithm in plexity of this algorithm for finding n atoms is section II. After that we discuss our GPU implemen- (n3) [5]. In each iteration it proceeds as follows: O tation strategy in section III and in section IV we First, the projection of the residual r on the • present a performance comparison on CPU, GPU, dictionary p = DT r is computed, and the atom and CBEA. Future work is outlined in section V. kˆ = argmax p with maximal correlation to k | | II.METHODS the residual is selected. Then, the current patch x is orthogonally pro- • A. Sparse representation jected onto the span of the selected atoms by + The goal of a sparse representation of a signal computing a = (DI ) x. This orthogonaliza- is to reduce the amount of data required to store tion step ensures that all selected atoms are it. This is achieved by first defining a frame — linearly independent [5]. Finally, the new residual is computed by r = an overcomplete basis of the signal vector space— • that is also called dictionary and then representing x DI a, which is orthogonal to all previously − signals compactly by a linear combination of only selected atoms. few vectors out of the frame. The signals can also Here, I denotes a set containing indices of selected be represented only approximately to reduce data. atoms, DI are the corresponding columns of D and + Mathematically, the problem of finding the sparsest (DI ) represents the pseudoinverse of DI . representation a RK of a signal X Rn, up to a The Batch-OMP algorithm [5], summarized in given error tolerance∈  > 0, can be formulated∈ as: algorithm 1, accelerates the OMP algorithm for larger patch sizes. It pre-computes the Gram matrix min a 0 subject to Da X 2  , (1) G = DTD and the initial projection p0 = DT x in a k k k − k ≤ order to require only the projection of the residual where 0 denotes the `0 - seminorm that counts on the dictionary instead of explicitly computing the k·k K 0 the nonzero entries of a vector a 0 = j=0 aj . residual. In addition to that we replace the compu- n kK k | | The full rank matrix D R × is the dictionary tation of the pseudoinverse in the orthogonalization that forms an overcomplete∈ basis of the signalP space step, which is done in OMP by a singular value (i.e. K > n). Its column vectors are called atoms. decomposition, with a progressive Cholesky update A suitable dictionary, that we use throughout our performed in lines 5 - 8 of algorithm 1 by means of paper, is a frame derived from cosine functions. An forward substitution. The two subscripts at the Gram alternative would be dictionary training with the K- matrix in line 6 indicate that entries of the Gram SVD algorithm [17]. matrix in rows corresponding to previously chosen Unfortunately, exactly determining the sparsest atoms and in the column corresponding to the latest representation of signals is an NP-hard combina- chosen atom are considered. The orthogonalization torial problem [4], [18]. 
Therefore, one usually and residual update step in the OMP algorithm can does not try to find the sparsest representation of be written as the whole signal directly, but one divides X into T 1 T r = x D (D D )− D x . (2) small, typically overlapping patches x and solves − I I I I T (1) for each patch separately in order to reduce the Due to orthogonalization, the matrix (DI DI ) computational effort. The single patches are treated is symmetric positive definite, which allows the independently of each other and at the end they are Cholesky decomposition. In each iteration, the tri- assembled to obtain the whole signal X. angular matrix L is extended by another row.
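As a side note (our addition, not spelled out in the paper): the row extension used in lines 5-8 of Algorithm 1 is exactly what keeps L a valid Cholesky factor of the Gram matrix of the selected atoms. Assuming unit-norm atoms, so that G has a unit diagonal, and writing w for the solution of L_{i-1} w = G_{I_{i-1},k̂}, one can check that

$$
G_{I_i,I_i}
=
\begin{pmatrix}
G_{I_{i-1},I_{i-1}} & G_{I_{i-1},\hat k}\\
G_{\hat k,I_{i-1}} & 1
\end{pmatrix}
=
\begin{pmatrix}
L_{i-1} & 0\\
w^{T} & \sqrt{1-w^{T}w}
\end{pmatrix}
\begin{pmatrix}
L_{i-1}^{T} & w\\
0 & \sqrt{1-w^{T}w}
\end{pmatrix}
= L_i L_i^{T},
$$

so appending the row (w^T, sqrt(1 - w^T w)) to L_{i-1} yields the Cholesky factor of the enlarged Gram matrix without refactoring it from scratch.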

Algorithm 1  a = Batch-OMP(p0 = D^T x, ε0 = x^T x, G = D^T D)
 1: Init: Set I = ∅, L = [1], a = 0, δ0 = 0, p = p0, i = 1
 2: while ε_{i-1} > ε do
 3:   k̂ = argmax_k |p_k|
 4:   I_i = I_{i-1} ∪ {k̂}
 5:   if i > 1 then
 6:     Solve: L_{i-1} w = G_{I_{i-1},k̂}
 7:     where L_i = [ L_{i-1}, 0 ; w^T, sqrt(1 - w^T w) ]
 8:   end if
 9:   Solve: L_i (L_i)^T a_{I_i} = p0_{I_i}
10:   β = G_{I_i} a_{I_i}
11:   p = p0 - β
12:   δ_i = (a_{I_i})^T β_{I_i}
13:   ε_i = ε_{i-1} - δ_i + δ_{i-1}
14:   i = i + 1
15: end while
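To make the control flow concrete, here is a self-contained C sketch of Algorithm 1 for a single patch. It assumes K unit-norm atoms, a precomputed Gram matrix G (K x K, row-major) and the initial projection p0 = D^T x; all identifiers are ours, error handling is minimal, and it is meant as an illustration of the algorithm rather than the tuned implementations compared in this paper.

#include <math.h>
#include <stdlib.h>

/* Batch-OMP for one patch (cf. Algorithm 1).
 * G       : K x K Gram matrix D^T D, row-major, unit diagonal (unit-norm atoms).
 * p0      : initial projection D^T x, length K.
 * eps0    : x^T x.
 * eps     : target error; max_atoms bounds the number of selected atoms.
 * idx/coef: caller-provided arrays of length max_atoms receiving the selected
 *           atom indices and their coefficients (the non-zeros of a).
 * n_sel   : number of atoms actually selected.
 */
static void batch_omp(const double *G, const double *p0, int K,
                      double eps0, double eps, int max_atoms,
                      int *idx, double *coef, int *n_sel)
{
    double *p    = malloc(K * sizeof *p);                /* current projection  */
    double *beta = malloc(K * sizeof *beta);
    double *L    = calloc((size_t)max_atoms * max_atoms, sizeof *L);
    double *t    = malloc(max_atoms * sizeof *t);        /* forward-subst. temp */
    double err = eps0, delta_prev = 0.0;
    int i = 0;

    for (int k = 0; k < K; ++k) p[k] = p0[k];

    while (err > eps && i < max_atoms) {
        /* line 3: atom with maximal correlation to the residual */
        int k_hat = 0;
        for (int k = 1; k < K; ++k)
            if (fabs(p[k]) > fabs(p[k_hat])) k_hat = k;
        idx[i] = k_hat;

        /* lines 5-8: grow the Cholesky factor L of G_{I,I} by one row */
        double ww = 0.0;
        for (int r = 0; r < i; ++r) {                    /* solve L w = G_{I,k_hat} */
            double s = G[idx[r] * K + k_hat];
            for (int c = 0; c < r; ++c)
                s -= L[r * max_atoms + c] * L[i * max_atoms + c];
            L[i * max_atoms + r] = s / L[r * max_atoms + r];
            ww += L[i * max_atoms + r] * L[i * max_atoms + r];
        }
        if (ww >= 1.0) break;                            /* numerically dependent atom */
        L[i * max_atoms + i] = sqrt(1.0 - ww);
        ++i;

        /* line 9: solve L L^T a_I = p0_I (forward, then backward substitution) */
        for (int r = 0; r < i; ++r) {
            double s = p0[idx[r]];
            for (int c = 0; c < r; ++c) s -= L[r * max_atoms + c] * t[c];
            t[r] = s / L[r * max_atoms + r];
        }
        for (int r = i - 1; r >= 0; --r) {
            double s = t[r];
            for (int c = r + 1; c < i; ++c) s -= L[c * max_atoms + r] * coef[c];
            coef[r] = s / L[r * max_atoms + r];
        }

        /* lines 10-13: beta = G_I a_I, p = p0 - beta, error update */
        for (int k = 0; k < K; ++k) {
            double b = 0.0;
            for (int c = 0; c < i; ++c) b += G[k * K + idx[c]] * coef[c];
            beta[k] = b;
            p[k] = p0[k] - b;
        }
        double delta = 0.0;
        for (int c = 0; c < i; ++c) delta += coef[c] * beta[idx[c]];
        err = err - delta + delta_prev;
        delta_prev = delta;
    }
    *n_sel = i;
    free(p); free(beta); free(L); free(t);
}

The progressive Cholesky update in the sketch is what replaces the per-iteration pseudoinverse of plain OMP, as described above.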

The non-zero element coefficient vector aIi is vectors become columns of matrices. However, we computed in line 9 by means of a forward- and still keep the symbols from the algorithm descrip- backward substitution. In line 11 we update the tion. projection As input, the dictionary D, the Gram matrix G and a two-dimensional array containing all patches p = DT r = p0 G (D )+ x . (3) − I I x are taken. Instead of computing each of them When an error-constrained sparse approximation separately, all p0s can be computed by a matrix- problem is to be solved, the residual is required to matrix multiplication of the matrix containing all x check the termination criterion. The `2 norm of the (or its transpose, depending on storage scheme) with T residual i is computed in line 13. D using a CUBLAS call. Another kernel computes a vector containing all initial errors e0. III.GPUIMPLEMENTATION To find the next atom from the dictionary and A first approach to port Batch-OMP to CUDA the optimal coefficients for the respective set, four [20] was to perform sparse coding of one patch in kernel calls are required each time: a single thread block as a monolithic kernel. Data find next atom: As it is required only for this is held on-chip in registers and shared memory, but task, all values of p = p0 β are computed on the the working set is too large to allow for many such fly, which requires to have−β initialized as zero. One blocks to be active concurrently on a given mul- thread is used for each patch. This task corresponds tiprocessor. Furthermore, the degree of parallelism to lines 3, 4, and 11 of the algorithm. changes strongly during the computation, keeping substitutions: All operations involving L, cor- most threads idle for a large portion of the kernel responding to lines 5 to 9, are performed by a single execution. The results shown in table I confirm the kernel. Values from the Gram matrix are fetched results of [16] that this approach is inferior. from the texture unit, which performed best also on A better solution computes the sparse representa- the Fermi architecture. All temporaries during the tion of as many patches as possible concurrently, substitution are held in shared memory. It should allowing for at least as many CUDA threads as also be mentioned that the forward substitution 0 patches throughout the whole algorithm. Conse- Liti = pIi in line 9 needs only to compute a single quently, scalars in algorithm 1 become vectors, and additional element to ti 1. Again, one thread is used −

19 per patch. an NVIDIA GTX295 dual-GPU. The former pro- projection: Line 10 of the algorithm corre- vides 15 multiprocessors (MP) of CUDA compute sponds to a matrix-sparse-vector product for each capability 2.0 that access 1.5 GiB of main memory patch. In contrast to the other kernels, two com- with a theoretical peak bandwidth of 177.4 GB/s. pletely different strategies are used on CUDA com- The latter provides two GPUs, each consisting of pute capability 1.3 and 2. 30 MPs of compute capability 1.3, each having For 1.3, blocks of 16 16 are used to compute β access to 896 MiB of device memory with a peak for 16 patches. First, coefficients× and indices are bandwidth of 111.9 GB/s each. The results for the copied to shared memory, with the x-dimension GTX295 have been measured on a single GPU and determining the patch. Then threads synchronize, doubled where appropriate. and now the x-dimension determines the elements We are using our implementation [10] for the of β and the y-dimension the patch. The computed Cell Broadband Engine Architecture (CBEA) as elements are stored in shared memory, and only comparison. The CBEA results are measured using written to device memory after synchronizing again very precise counters on a single compute core, and associating the x-dimension with patches again. a so-called synergistic processing element (SPE), As temporary data is stored with an appropriate and extrapolated to a whole two-socket system. As padding, it is possible to load and store it column- no double precision is required, the older Cell/B.E. as well as row-wise without bank conflicts in shared and the newer PowerXCell8i processors perform memory. The flipping of the orientation further equally fast for this kernel. The results are therefore ensures that global and texture memory are accessed equally valid for IBM’s QS20, QS21 and QS22 contiguously. The code is further unrolled so that blade servers, whose two CPUs have the same clock four elements of β are computed by a thread at a frequency of 3.2 GHz and provide 8 SPEs each. time. We also compare against an implementation in For the Fermi architecture, which introduces a C99 which has been optimized for performance, configurable 16 KiB or 48 KiB L1 and 768 KiB but without usage of libraries or compiler intrinsics. unified L2 cache, the kernel simplifies drastically. Test are run on a Fujitsu Celsius R570 worksta- All data is fetched from device memory completely tion containing two Intel X5670 processors relying on the caches. Some blocking is performed (Westmere core). Each processor has six cores at a by having each thread computing eight elements at base frequency of 2.93 GHz, but this can increase to a time. The y-dimension of the grid can be used to 3.33 GHz for a single core using Intel’s TurboBoost compute different elements of β concurrently. technology. Each core provides a second virtual core error update: Computation of the new error using simultaneous multithreading (SMT). corresponds to lines 12 and 13 of the algorithms. The scattered accesses to β are best treated by System performance mapping the storage to texture memory, but for Figure 1 shows how many patches can be pro- i being small (typically up to two or three) it is cessed on a whole system depending on the number faster to recompute the respective elements of β, as of atoms to be chosen. Quite surprisingly, both accesses to G have a much better cache reuse. 
GPUs and the Cell blades are close with a small A drawback of this implementation strategy is advantage for the GPUs when only few atoms that for each patch the same number of atoms needs should be used and for the CBEA for three or to be chosen. Approaches that allow to continue more. The x86-64 system is slower especially for only with patches that could not yet be represented few atoms to be chosen due to the scalar matrix good enough without drastically losing performance multiplications in C99. This is even more surprising are still being explored. when considering that all data for processing a single patch fits into a core’s L2 caches or IV. RESULTS an SPE’s local storage, but that the graphics card We measured the performance of our Batch-OMP must largely operate on device memory due to the implementation on an NVIDIA GTX480 GPU and high degree of parallelism.

20 21 22 23 least programming effort to get acceptable perfor- [10] D. Bartuschat, M. Sturmer,¨ and H. Kostler,¨ “An Orthog- mance, and could perhaps keep up after further opti- onal Matching Pursuit Algorithm for Image Denoising on the Cell Broadband Engine,” in Parallel Processing and Ap- mization. To accumulate that much compute power plied Mathematics, ser. Lecture Notes in Computer Science, in a single system, however, expensive hardware is R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Was- required – both Xeon processors alone are currently niewski, Eds. Springer Berlin / Heidelberg, 2010, vol. 6067, pp. 557–566. worth about five GTX480. [11] A. Borghi, J. Darbon, S. Peyronnet, T. Chan, and S. Osher, “A Compressive Sensing Algorithm for Many-Core Architectures,” V. FUTURE WORK Advances in Visual Computing, pp. 678–686, 2010. [12] ——, “A simple compressive sensing algorithm for parallel We compared the performance of the Batch-OMP many-core architectures,” CAM Report, pp. 08–64, 2008. algorithm on different hardware architectures. We [13] S. Lee and S. Wright, “Implementing algorithms for signal and find that the performance achievable on CBEA, image reconstruction on graphical processing units,” Computer Sciences Department, University of Wisconsin-Madison, Tech. GPGPU and standard CPUs is interestingly similar Rep., 2008. for this algorithm. We attribute this to the versatile [14] M. Andrecut, “Fast GPU Implementation of Sparse Sig- requirements of the various parts of Batch-OMP nal Recovery from Random Projections,” Arxiv preprint arXiv:0809.1833, 2008. which cancel out the specifics of each platform. [15] A. Septimus and R. Steinberg, “Compressive sampling hard- The next steps are tuning and analysis of the ware reconstruction,” in Circuits and Systems (ISCAS), Pro- implementation for standard CPUs, and the devel- ceedings of 2010 IEEE International Symposium on. IEEE, 2010, pp. 3316–3319. opment of a strategy to compute a different number [16] T. R. Braun, “An evaluation of GPU acceleration for sparse re- of atoms on a GPU efficiently. construction,” in Signal Processing, Sensor Fusion, and Target Recognition XIX, I. Kadar, Ed., vol. 7697, no. 15. Proc. of REFERENCES SPIE, 2010, pp. 1–10. [17] M. Elad and M. Aharon, “Image denoising via sparse and re- [1] J. Starck, M. Elad, and D. Donoho, “Image decomposition dundant representations over learned dictionaries,” IEEE Trans. via the combination of sparse representations and a variational Image Process, vol. 15, no. 12, pp. 3736–3745, 2006. approach,” IEEE transactions on image processing, vol. 14, [18] D. L. Donoho and M. Elad, “Optimally sparse representations 1 no. 10, pp. 1570–1582, 2005. in general (non-orthogonal) dictionaries via l minimization,” [2] J. Tropp, “Topics in Sparse Approximation,” Ph.D. dissertation, Proc. Nat. Acad. Sci., vol. 100, pp. 2197–2202, 2002. The University of Texas at Austin, 2004. [19] J. Tropp and A. Gilbert, “Signal recovery from random mea- [3] M. Aharon, M. Elad, and A. Bruckstein, “On the uniqueness surements via orthogonal matching pursuit,” Information The- of overcomplete dictionaries, and a practical way to retrieve ory, IEEE Transactions on, vol. 53, no. 12, pp. 4655–4666, them,” Linear Algebra and Its Applications, vol. 416, no. 1, 2007. pp. 48–67, 2006. [20] “Compute Unified Device Architecture, C Programming Guide, [4] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive greedy version 3.2,” NVIDIA Corporation, 2010. approximations,” Constructive approximation, vol. 13, no. 1, pp. 
57–98, 1997. [5] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient Im- plementation of the K-SVD Algorithm and the Batch-OMP Method,” Department of Computer Science, Technion, Israel, Tech. Rep., 2008. [6] A. Borsdorf, R. Raupach, and J. Hornegger, “Wavelet based Noise Reduction by Identification of Correlation,” in Pattern Recognition (DAGM 2006), Lecture Notes in Computer Science, K. Franke, K. Muller,¨ B. Nickolay, and R. Schafer,¨ Eds., vol. 4174. Berlin: Springer, 2006, pp. 21–30. [7] ——, “Separate CT-Reconstruction for 3D Wavelet Based Noise Reduction Using Correlation Analysis,” in IEEE NSS/MIC Conference Record, B. Yu, Ed., 2007, pp. 2633–2638. [8] M. Mayer, A. Borsdorf, H. Kostler,¨ J. Hornegger, and U. Rude,¨ “Nonlinear Diffusion vs. Wavelet Based Noise Reduction in CT Using Correlation Analysis,” in Vision, Modeling, and Visualization 2007, H. Lensch, B. Rosenhahn, H.-P. Seidel, P. Slusallek, and J. Weickert, Eds., 2007, pp. 223–232. [9] D. Bartuschat, A. Borsdorf, H. Kostler,¨ R. Rubinstein, and M. Sturmer,¨ “A parallel K-SVD implementation for CT im- age denoising,” Department of Computer Science 10 (Sys- tem Simulation), Friedrich-Alexander-University of Erlangen- Nuremberg, Germany, Tech. Rep., 2009.

Impact of Data Sharing on CMP design: A study based on Analytical Modeling

Anil Krishna, Ahmad Samih, Yan Solihin

Dept. of Electrical and Computer Engineering, North Carolina State University
{akrishn2, aasamih, solihin}@ece.ncsu.edu

Abstract Analytical modeling of chips and multi-chip systems is a partic- ularly powerful, flexible and insightful approach to design space In this work we study the effect data and instruction sharing pruning and performance characterization. There is, however, a on cache miss rates. We then extend an analytical system-level need to develop and teach newer ways of using analytical perfor- throughput model to take multi-threaded data and instruction shar- mance models to draw system-level design insights. ing into account. We use the model to provide insights into the inter- Desktop and personal computing applications, which used to be action of thread count, cache size, off-chip bandwidth, and, sharing, single-threaded, are being rewritten as multi-threaded applications on system throughput. Using specific examples we teach how the to take advantage of the multiple cores on-chip. CMPs have also en- model can reveal insights about the impact of sharing on questions abled a trend in which multi-threaded applications are now moving such as - 1) how should cores in a CMP be sized (fewer larger cores from the realm of small and medium-size Symmetric Multi Proces- vs. more smaller cores); 2) how many and what configuration CMP sor (SMP) systems to a single chip. However, though it is tempting chips should be used to build an SMP system given a total thread- to directly transplant, say, a 64-thread medium scale SMP applica- count requirement; 3) how do system-level core, cache and band- tion to a single CMP, the performance impact of such a move is not width provisioning decisions change between multi-programmed clear. workloads (as in a cloud computing context) and multi-threaded CMPs are fundamentally limited by on-chip cache and memory workloads (as in a commercial or scientific server). Our model re- bandwidth compared to an SMP. In fact, as the number of on-chip veals that the optimal number and size of cores in a CMP, and the processing cores grow, they take up corresponding die area from optimal number of CMPs in an SMP, can be significantly different what could have been on-chip cache. Further, more cores without a depending of the amount of sharing in the target workload. corresponding increase in the die area place greater pressure on the available on-chip cache resources, leading to greater pressure on the Keywords: Chip multiprocessor, Analytical modeling, Data memory bandwidth. This can hurt performance. Therefore, while Sharing, Multithreading, Performance the achievable performance scalability in SMPs is limited primarily by the inherent scalability limitations in the application itself, with 1 Introduction CMPs, another first order concern comes into play – the hardware resources available to each thread with thread scaling. Over the past few years, Chip Multi Processor(CMP) archi- Multi-threaded applications may be able to withstand this tecture has become the dominating hardware architecture across a cache and memory bandwidth pressure better compared to multi- spectrum of computing machinery – personal computing devices, programmed workload mixes depending on the amount of instruc- workstations, commercial and scientific servers, and warehouse tion and data sharing 1 they exhibit. The prefetching effect of shar- scale computers. The sheer complexity involved in the design and ing (where one thread fetches data and other threads can use it) can verification of each unit in a CMP solution has necessitated sig- make the on-chip cache behave as an effectively larger cache. 
This, nificant design reuse. The differentiation between computational in turn, may enable additional cores to be added to the chip. machinery is orchestrated by careful system-level design choices, There is a growing need to understand and incorporate the effects while keeping the basic core-level designs relatively unchanged. of multi-threaded application data sharing into analytical models of Chip and system-level architects make high level design trade- chip and multi-chip performance. Developing insights from chip offs during the concept phase and the high level design phase of a and multi-chip analytical models is an increasingly relevant skill project. Some of the design parameters considered by system ar- for system-level design. chitects are the number and type of cores used in a CMP, the cache Most prior work in analytical modeling does not focus on the sizes and organization, memory bandwidth, and, on-chip and off- impact of multi-threaded applications on system-level performance. chip interconnect design. Each individual unit on a chip (cores, Prior work has attempted to explore the design considerations in caches, interconnect, ) is typically designed with scaling up uni-processor chips into CMPs, primarily assuming mul- extensive simulation driven methodologies. System level design de- tiprogrammed workloads. However, the reverse trend enabled by cisions, however, are tackled differently. These are driven more by CMPs, the consolidation of small and medium-scale multi-threaded the target market requirements (performance, energy and cost) and applications on one or a few CMP chips, has not been explicitly technology limits. Detailed system-level simulations are often not studied. feasible due to simulator complexity, large design spaces, and inor- In this work we first study and classify the impact of data sharing dinate simulation turnaround times. This has necessitated the use of on miss rates for PARSEC 2.1 and NAS parallel benchmarks. We less precise, yet more insightful and flexible approaches to system design involving queuing model simulators, spreadsheet analysis, 1Throughout the rest of the paper the phrase data sharing will be used to and, analytical modeling. refer to the sharing of both instructions and data

25 then extend an existing system-level analytical performance model, Zhao et al. [19] develop a constraint-aware analysis methodol- with specific focus on multi-threaded applications; we incorporate ogy that uses chip area and bandwidth as the two constraints. They the impact of data sharing into the analytical model. We illustrate explore CMP cache design options, pruning the design space for specific examples of how to use the analytical model to draw design future Large-scale CMPs (LCMPs). They use an in-house simula- insights. tion framework, commercial workloads, and restrict the analysis to We find that when the system is off-chip bandwidth limited it a fixed number of processor cores per chip (16 or 32). Our work is possible to increase the core size (and, by extension, core capa- relaxes the core count stipulation and develops an analytical frame- bilities) to get better single-threaded throughput without any signif- work to explore additional dimensions in the CMP design space. icant loss in overall chip throughput. We discover that Amdahl’s Oh et al. [14] explore the cache design space for CMPs with up Law, which dictates that multi-threaded application performance is to 70 processor cores. They explore different cache hierarchies - constrained by the serial section of code, is significantly less impor- they consider private, shared (both uniform and non-uniform cache tant in on-chip core vs. cache rationing decision when the system access) and hybrid caches at L2 and L3 levels. is bandwidth constrained. That is, though an application’s paral- Wu et al. [18] use a simulator to model 1 to 256 processor cores lelization efficiency continues to affect the system throughput, in (of a fixed design) in a tiled CMP configuration, running parallel a bandwidth constrained scenario it does not affect the decision of benchmarks. They evaluate the on-chip network contention with how to divide up die area between cores and cache significantly. We cache scaling, and off-chip bandwidth requirements as the number find that when designing a system with a target number of threads, of cores and cache size changes. Their primary goal is to identify there is an ideal SMP size in a bandwidth constrained scenario. In- whether on-chip communication or off-chip bandwidth will be the stead of cramming all the cores on to a single CMP, it may be better, main bottleneck to LCMP scaling. They find the off-chip bandwidth for system throughput, to spread the cores across multiple intercon- to be the bottleneck. They do not explore different core sizes, or the nected CMPs. After a certain amount of SMP scaling, however, effect of reducing CMP scaling at the expense of increasing SMP there is a distinct knee in the system throughput curve, and any fur- scaling. ther scaling only results in negligible performance improvement. Rogers et al. [16] use a simple analytical model to highlight the This may make a case for building SMPs out of CMPs with fewer growth in off-chip traffic due to exponential CMP scaling, and then cores than technologically possible (or, alternatively, using CMPs propose techniques (such as 3D stacking, DRAM caches, link com- in which only a subset of the cores are defect-free). pression etc.) to overcome the bandwidth wall for several technol- ogy generations. In our work we develop an analytical model to 2 Related Work design a CMP or an SMP system in the presence of the bandwidth wall, rather than to prove the severity of the bandwidth wall. 
The popularity of CMPs has encouraged significant research re- cently in the area of high-level design and resource sizings. 3 Impact of Data Sharing on Miss Rate Huh et. al [8] evaluate CMP designs based on two types of pro- Motivation In multi-threaded applications data sharing can help cessor cores (one in-order and the other about 3x larger, out-of-order improve cache miss rates. One thread brings the cache block on- core). They predict that the lack of bandwidth scaling hurts CMP chip and one or more threads can then use the cache block with- scaling, and find that the larger processor core proves to be more out suffering misses. However, as the number of threads grows the area efficient in a bandwidth-constrained scenario. This work is competition for the on-chip cache space increases, especially for ap- similar to ours in the type of issues addressed. The main differ- plications with large working sets. In such a scenario, cache blocks ence of our work is that we focus on multi-threaded workloads and that could have been shared across threads might not stay in the try to understand the impact of realistic data sharing in CMP and cache long enough to be effective, and the benefit of data sharing SMP design, where as this prior work does not model data sharing. declines. A second difference is that we develop and use an analytical model, The interaction between the number of threads, the amount of which may be easier to use than a simulation based study, especially algorithmic sharing potential in an application, and the applica- for a large number of on-chip cores. tion’s working set size is complex. In order to develop insights Alameldeen et al. [1] develop an analytical model, similar in into the impact of data sharing on on-chip cache miss rates, in an spirit to the model we develop, to study the interplay between cores area-constrained CMP, we use simulation. We then incorporate the and caches. They incorporate sharing in their throughput model; findings into an analytical model for chip IPC that we develop in however, they only explore a relatively simple sharing pattern (1.3 Section 4 and draw insights in Section 6. sharers for any number of threads greater than 1). The focus of Simulated Machine We simulate a CMP under the Virtutech their work is to understand the role of cache and link compression Simics3.1 [12] simulation framework. We assume that each core in in CMP design decisions. Further, though they vary the off-chip the CMP is the same size as 1MB of L2 cache (we elaborate on this bandwidth between 10 GB/s and 10 TB/s, their queuing model is assumption in Section 5). We assume that the size of the CMP is not integrated into their throughput model to mimic a bandwidth- the equivalent of 33 Cache Effective Areas (CEAs); a CEA is the constrained system. size of 1MB of cache. Therefore, the CMP can contain 1 core with Wentzlaff et al. [17] develop a system-level IPC model, and 32MB of L2 cache, or, 32 cores and 1 MB of L2 cache, or, any explore a large space of cache hierarchies for multicores in the intermediate combination. Each processor core is in-order, single- cloud computing context. They use SPEC Int 2000 rate applications threaded, and has 32KB each of L1 instruction and data caches. We which have no data sharing. They find that there is a greater leeway simulate a shared L2 in order to maximize the potential impact of in optimizing L1 sizes, but using an optimally-sized L2 is more im- sharing on the L2 cache miss rates. portant. 
We use their approach to develop our baseline throughput We use a total of 20 unique multi-threaded applications from the model, and extend it for multi-threaded workloads while incorpo- PARSEC 2.1 [5] and the NAS [4] benchmark suites. We run each rating data sharing, parallelizing efficiency and SMP scaling. application for a billion instructions in the region of interest (i.e.

26 the parallel region) in order to warm up the on chip caches. We then NoSharing: There is no data sharing between threads. This run the application to the end of the region of interest, or, 10 billion • mimics multi-program workloads, and multi-threaded work- instruction, whichever happens first. loads with minimal sharing.

We run each application with 1, 2, 4, 8, 16 and 32 threads (and a PerfectDoPipe: The amount of data sharing, and therefore, correspondingly increasing number of cores and decreasing amount • the effective cache size, grows linearly with the number of of L2 cache). For each application and thread-count, we measure threads. That is, only the first thread brings data on-chip and the L2 miss rates in two ways. In the first case, we simulate an all remaining threads can use that data without suffering any L2 cache where data sharing is allowed as is normal. In a second misses. This helps bound multi-threaded performance. case, we simulate an L2 cache where data and instruction blocks are Realistic1: Sharing improves miss-rate (by reducing it) by 7% treated as being private to the thread that brings the block on-chip. • This mimics the effect of running the application without any inter- with every extra thread; this is represented by the canonical thread sharing. We then study the relative difference between the curve shown in Figure 1(a). miss-rates. We believe that this difference yields an accurate indi- Realistic2: Sharing improves miss-rate by a fixed amount, cation of how helpful sharing is in reducing miss rates for different • equal to 11% of the single-threaded miss-rate for each ex- workloads under different thread-scaling scenarios. Prior work has tra thread for half the maximum number of threads; beyond often studied the effect of sharing indirectly, using the sharer count that, sharing worsens (increases) the miss-rate with an equal- for a cache block. We found that sharer count is an unreliable metric magnitude negative slope. This is represented by the canonical since it does not capture how many misses each shared line saved. curve as shown in Figure 1(b).
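For later use in the model (Equation 7 introduces the sharing factor E(n)), the canonical curves above can be encoded as simple functions of the thread count. The functional forms below are our own reading of the text and of the canonical curves in Figure 1 (for example, a miss-rate reduction factor that grows by 0.07 per extra thread for Realistic1), so they should be treated as illustrative assumptions rather than the authors' exact curves.

/* Canonical sharing-impact curves expressed as the multiplicative
 * miss-rate factor E(n) of Equation 7 (E(1) = 1; smaller E means the
 * effective cache looks larger).  The exact functional forms are our
 * assumptions, chosen to roughly match Figure 1:
 *  - NoSharing:      private data only, E(n) = 1.
 *  - PerfectDoPipe:  only the first thread misses, E(n) = 1/n.
 *  - Realistic1:     miss-rate reduction factor 1 + 0.07*(n-1).
 *  - Realistic2:     reduction factor 1 + 0.11*(n-1) up to n_max/2,
 *                    falling off with the same slope beyond that.
 */
enum sharing { NO_SHARING, PERFECT_DOPIPE, REALISTIC1, REALISTIC2 };

static double sharing_factor(enum sharing s, int n, int n_max)
{
    switch (s) {
    case NO_SHARING:
        return 1.0;
    case PERFECT_DOPIPE:
        return 1.0 / n;
    case REALISTIC1:
        return 1.0 / (1.0 + 0.07 * (n - 1));
    case REALISTIC2: {
        int half = n_max / 2;
        double steps = (n <= half) ? (double)(n - 1)
                                   : (double)(half - 1) - (n - half);
        return 1.0 / (1.0 + 0.11 * steps);
    }
    }
    return 1.0;
}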

[Figure 1: two panels, (a) and (b); x-axis: 1t to 32t threads; y-axis: sharing-induced miss-rate reduction factor; each panel compares the Simics-simulation-based curve with the canonical fit (Realistic1 in (a), Realistic2 in (b)).]

Figure 1. How data sharing affects miss rates.

Result: Figure 1 shows how sharing impacts miss-rates. The figure plots the relative improvement in miss-rates attributable to sharing (on the y-axis) as the number of threads increases (on the x-axis). The y-axis is normalized to the single-threaded case, which, understandably, involves no sharing and sees no change in the miss-rate due to sharing. The two sub-plots show the two fundamental behaviors we observed across the 20 benchmarks we studied. Some applications continue to see sharing improve the miss-rates as the number of threads the application is spread across increases. Figure 1(a) plots the average of the following 10 applications which fall under this first category: blackscholes, bodytrack, facesim, fluidanimate, freqmine, vips, x264, dedup, cg and is. Some applications continue to see sharing improve the miss-rates up to a point, beyond which the beneficial impact of sharing on miss-rates starts to decline. As the number of threads grows, inter-thread cache interference makes potential sharing opportunities unrealizable in cache in these workloads. Figure 1(b) plots the average of the following 10 applications which fall under this second category: ferret, swaptions, canneal, streamcluster, bt, ep, ft, lu, mg and sp. We develop two simple canonical curves to fit the observed sharing impact, Realistic1 and Realistic2; these are also shown in Figure 1. We use these canonical curves in Section 6, where we study CMP designs using an analytical model; these curves are assumed to represent the realistic impact of sharing on miss rates. Further, in order to bookend the impact of sharing, we study two additional canonical curves, NoSharing and PerfectDoPipe. These four canonical curves, described above, are the ones we use in this work to capture the impact of sharing on cache miss rates.

4 Derivation of the Analytical IPC Model

IPC model: In this section we develop the analytical model for CMP and SMP system throughput. Our model derivation is based on the approach developed by Wentzlaff et al. [17]. We calculate the Cycles Per Instruction (CPI) for each individual core and then use that to calculate the system-wide throughput. We assume for the purposes of this study that a CMP system is composed of homogeneous cores, and an SMP system is composed of interconnected homogeneous CMPs. Further, we assume that a core is single-threaded; the number of cores corresponds to the number of threads a multi-threaded application can be spread across. Note that if a core under consideration for a design is multi-threaded, then our model can still be used by proportionally dividing the core area by the number of threads per core. Given the per-core CPI (CPIcore) and the number of cores in the system (n), the system-wide IPC (IPCsystem) is given by Equation 1.

IPCsystem = n / CPIcore    (1)

Equation 1 is simplistic; it assumes that the performance of a multi-threaded application grows linearly with thread scaling. It has been observed [2] that for multi-threaded workloads realistic scaling is limited by the fraction of the serial section (fs) in the application code. Taking this into account, the system-wide IPC is given by Equation 2.

IPCsystem = n / ((1 + fs · (n - 1)) · CPIcore)    (2)

Components of CPI: CPIcore in Equation 2 is the number of processor cycles it takes to execute an average instruction on a single core. It can be calculated as the sum of the average number of cycles spent by an instruction in three phases: before reaching the L2, in the L2, and beyond the L2. This is shown in Equation 3.

CPIcore = CPIpreL2 + CPIinL2 + CPIbeyondL2    (3)

CPIpreL2 includes the time an instruction spends in the processor core and its reasonably-sized L1s. We assume that CPIpreL2 can be obtained by system architects for the core designs and the applications under consideration. These can be obtained via detailed simulations or hardware measurements.
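As a minimal illustration of Equations 1 and 2 above (the function and variable names are ours):

/* System-wide IPC from Equations 1 and 2; f_serial = 0 recovers Eq. 1. */
static double ipc_system(int n, double f_serial, double cpi_core)
{
    double effective_threads = (double)n / (1.0 + f_serial * (n - 1));
    return effective_threads / cpi_core;
}

For example, with CPIcore = 2 and a 1% serial fraction, 64 cores yield a system IPC of about 19.6 rather than the 32 that Equation 1 would predict.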

27 The CPIinL2 term takes into account the cycles an average in- L2 Miss Rate and Data Sharing We use the well-known Power struction spends in the L2. Note that the model can be extended to Law of cache miss-rates [6], and, the canonical sharing-impact L3 and L4 cache levels in a similar fashion. For the purposes of the curves from Section 3, to replace fmiss() in Equation 6 with. We high level insights we draw in this paper, we find that a two level assume that α, the workload’s sensitivity to cache size changes, and cache hierarchy suffices. m1MB , the miss-rate of the application with a 1MB L2 cache, are The CPIbeyondL2 depends on application characteristics (the both easily obtainable quantities from measurement or simulation. spatial and temporal locality of the application in the L2) as well as Further, we assume that E(n) represents the canonical sharing- microarchitecture characteristics (the per-core L2 size and geome- impact curves for a given workload as a function of n, the num- try). We assume that the cache space on the CMP chip is divided ber of threads the workload is spread across. Equation 7 shows the evenly across the cores as private L2 caches. We assume that for updated equation. a given application and a given core design, the number of L2 ac- α CPIbeyondL2 = apiL2 m1MB L2core− E(n) (TM +TQ) (7) cesses per instruction(apiL2) is a metric that is available to a system · · · · architect from simulation or measurement. Every access to the L2 It is useful to recognize that without spends the time to look-up the L2, regardless of whether it is a hit Misses Per Instruction the (TM + TQ) term the right side of Equation 7 represents the L2 or a miss. This latency is dependent on the L2 cache size. We leave misses per instruction. Equation 8 extracts this metric (MPIL2) the latency term as function of the cache size. When using the for- explicitly. mula we plug in latency values calculated by CACTI6.5 [7] for the appropriate cache size. α MPIL2 = apiL2 m1MB L2core− E(n) (8) For both the CPIinL2 and CPIbeyondL2 (which refers to the · · · cycles spent by an average instruction in the memory subsystem Memory Latency The memory penalty, TM , in Equation 7 is, lower than the L2), we first need the per-core L2 cache size. typically, a well understood constant. Coherence protocol over- Per-core L2 Size We assume that the CMP is composed of heads and average on-chip network delays (often, tens of processor CEAT units of area, where each unit of area is the equivalent of clocks) may be added to the pure memory penalty (often, hundreds 1MB of cache. We use 1MB only because it is a reasonably small of processor clocks) to get TM . area in process technologies of today, and gives us a fine grain unit Memory Queuing Delay To calculate TQ, we assume a stan- of area to work with. We refer to these grains of chip area as Cache dard M/D/1 queuing model to represent the memory interconnect. Equivalent Areas(CEAs). We assume that system architects start The mean service rate, µ, is a constant depending on the rate at with some rough estimate of the total chip area, based on process which the interconnect can transfer a request. We assume that the yield estimates, chip power budget etc. We assume that the CEAT off-chip bandwidth in one direction in B GB/s and that the proces- CEAs of area corresponds to the total chip area to be devoted to sor frequency is f GHz. Thus the service rate in bytes per cycle is cores and L2 cache (L1 cache is included in core area). 
We assume B/f overall, and 1/n of that per core. that each core is CEAp CEAs in size (fractional values allowed). We model the arrival of memory requests as a Poisson process, Given n cores in a chip, the amount of cache per core (L2core) can with a mean arrival rate of λ. The mean arrival rate in bytes per be calculated as shown by Equation 4. cycle depends on the request rate (in request per cycle) and the re- quest size (in bytes per request). The request size is simply the L2 CEAT CEAp n L2core = − · (4) block size (lL2 bytes). The request rate in requests per cycle, can n be calculated from the L2 misses per instruction (MPIL2) and the The L2core is a unitless quantity, and represents the number of cycles per instruction (CPIcore). Equation 9 summarizes this. CEAs (each CEA is equal to the size of a 1MB cache array) that B MPI make up the per-core L2 cache size. µ = , λ = L2 (9) f n CPIcore Returning to the second term in Equation 3, we can now repre- · sent CPIinL2 as shown in Equation 5, where, thit() represents the For an M/D/1 queue, the queuing delay(TQ), is given by Equa- function that takes in the cache size and returns the hit latency. λ tion 10, where ρ refers to µ . CPIinL2 = apiL2 thit(L2core 1MB) (5) ρ · · TQ = (10) 2µ(1 ρ) The third term in Equation 3 is CPIbeyondL2, and is perhaps − the most interesting as memory system bandwidth is likely to be- Substituting from Equation 9 into Equation 10, and simplifying it, come a constrained resource in the future. The number of cycles an we get to Equation 11. average instruction spends beyond the L2 depends, firstly, on how 2 2 often it misses the L2, and, secondly, on how long does it take to MPIL2 f n TQ = 2 · · (11) return after a miss. How often the average instruction misses the 2 B CPIcore 2 B f n MPIL2 L2 depends on how often the average instruction accesses the L2 · · − · · · · (a parameter we have seen before, apiL2) and the application’s L2 Final CPI Model Now we have all the components of Equa- miss-rate. How long the average instruction takes beyond a miss is tion 3. Equation 12 puts it all together. the sum of the time an instruction spends queued up trying to get to CPIcore = CPIpreL2 + (apiL2 thit(L2core 1MB)) + memory(TQ), and the memory access penalty(TM ). We put these · 2 2 · together into Equation 6, where fmiss() represents a function which MPIL2 f n MPIL2 (TM + 2 · · ) (12) takes the per-core L2 size and appropriate workload characteristics · 2 B CPIcore 2 B f n MPIL2 into consideration and generates the cache miss-rate. · · − · · · · We leave the L2core and MPIL2 terms unexpanded. We can see CPIbeyondL2 = apiL2 fmiss(workload, L2core 1MB) (TM +TQ) that the CPIcore term appears on both sides of equation. In par- · · · (6) ticular, the TQ term is inversely proportional to CPIcore. This is

28 expected and insightful. It indicates that as the off-chip queuing de- According to the projections from ITRS [11] chip pin-counts lay (TQ) increases, the instruction processing rate slows down and are growing at a rate of 5% per year. For a cost-effective bus sig- CPIcore increases. However, an increasing CPIcore slows down naling solution, the rate of growth of bus frequency is about 10%. the rate at which requests are made to the memory subsystem; this This falls significantly short of the 59% per year growth in transistor helps the queued up requests drain and improves TQ. density which is expected to continue for at least another decade. In We ignore the on-chip network queuing, which, being not pin- order to highlight this bandwidth wall, we assume that the off-chip bound, tends to scale much better than off-chip bandwidth. bandwidth is 30 GBps (15 GBps in each direction). This is sim- It is also clear from studying Equation 12 that when ilar to the usable bandwidth available with a 128-bit dual-channel workload-specific parameters (α, m1MB , apiL2, E(n)), the DDR3-1066. We assume that the processor frequency is 2GHz, the microarchitecture-and-workload-specific parameter (CPIpreL2), memory penalty (including coherence protocol overheads, on-chip the technology process specific parameter (thit()) and design spe- network delay, memory controller and memory device overhead) is cific variables (CEAT , n, CEAp) are all set, we end up with a 400 processor clocks, and the L2 cache line size is 128 bytes. quadratic equation in CPIcore. This can be solved by using the We assume, as a baseline, that the multi-threaded code can be standard quadratic equation solution, taking care that the + form of perfectly parallelized (fs, serial fraction = 0); however, we relax this the solution is used [17]. in Section 6.2 to study the impact of parallelization efficiency. We study data sharing as suggested by the 4 canonical sharing-impact 5 Assumptions curves identified in Section 3 – NoSharing, PerfectDoPipe, Real- istic1 and Realistic2. We extend these curves to applications that The model developed in Section 4 has many parameters. De- scale to 256 threads (compared to the 32 threads that the curves pending on the process technology, area budget, workload charac- were originally generated with). We realize that the specific bench- teristics, core microarchitecture, and power or energy constraints, marks we studied in Section 3 may not be scalable to 256 threads; several parameters in the model can be fixed, allowing the model however, similar sharing patterns may still provide valuable insights to explore a relevant and focused solution space. In this section we into how sharing impacts design choices. describe the specific parameter assumptions that we make for the analysis in Section 6. We assume a 22nm process technology. We assume that the chip 6 Evaluation and Analysis size devoted to cores and lower level cache is around 307mm2. The In this section we use the analytical throughput model from Sec- actual chip die area is larger, by about 25%, to account for on-chip tion 4 and the design space parameter assumptions from Section 5 interconnect, memory controllers and other pervasive logic. Using to develop both design insights and provide examples of how the CACTI6.5 [7] we calculate 2 that 1MB of SRAM cache occupies model may be used. about 1.2mm2 (we assume 8-way set associativity), giving us about 256 CEAs (a CEA is equal to 1MB of cache). 
6.1 Effect of off-chip bandwidth For the baseline core microarchitecture, we use area estimates Using Equation 12 with and without the TQ term (off-chip queu- from IBM’s latest embedded processor core, PPC476FP [10] (out- ing delay), we get the per-core CPI (CPIcore) in the bandwidth of-order, multi-issue superscalar, single-threaded core, with Float- limited (30 GBps) and the infinite bandwidth scenarios respectively. ing Point support), ARM’s Cortex A9 core [3] (out-of-order, multi- Using Equation 2, and assuming perfect parallelism (a constraint we issue superscalar, with Floating Point support) and MIPS 1004K relax in section 6.2), we can estimate the best-case chip-wide IPC. core [13] (in-order, dual-core design with dual-threading per core, We vary the number of cores, n, from 1 (almost all-cache) to a max- mm2 with floating point support). We find that at 1.2 to 1.5 (in imum of 255 (almost all-cores). Figure 2 contains 4 plots, one for 22nm technology) there are several baseline core designs which each of the 4 sharing patterns we study. The plots show the number can be used as a building block for CMP or an SMP. This baseline of processor cores (out of a maximum of 256) on the x-axis and the CEA core choice allows us to approximate p, the number of CEAs chip IPC on the y-axis. needed per core, to 1. We will relax this restriction and approximate There are two main observations that can be drawn from the impact of going to larger and smaller cores in Section 6.3. these plots. First, in a bandwidth-constrained system the optimal For the workload characteristics we assume a memory intensive core count can be significantly smaller than in a non-bandwidth- workload. We assume an apiL2, the probability of an average in- constrained system (47 vs 220 with a Realistic1 sharing assump- struction accessing the L2, of 0.033. This corresponds to the apiL2 tion). This observation, by itself, is not novel; however, the model’s we measured for canneal, facesim, swaptions and x264 benchmarks. agreement with prior observations [8, 18, 16] is comforting. Sec- Most other workloads we studied have a much smaller rate to access ond, even with reasonably good amount of sharing (Realistic1 and to the L2. However, we intentionally choose a memory intensive Realistic2) the optimal core count does not increase as much in a workload to illustrate the impact of sharing, while the memory sys- bandwidth-constrained system, compared to the NoSharing case. tem is stressed. We assume a m1MB , the miss-rate with a 1MB With Realistic1 and Realistic2 sharing-impact, only an extra 9 and L2 cache, of 0.6. This corresponds to highly memory intensive, 7 cores may be added respectively, to the optimal core-count in the data streaming workloads where there is a large working set and NoSharing case. Similarly, the best achievable IPC can be vastly little reuse. We made the decision to go with such an example to different across different sharing behaviors; however, in a con- more clearly illustrate high-level insights in Section 6, effects may strained bandwidth scenario IPC is much less sensitive to sharing be more subtle if less memory intensive parameters are chosen. We behavior. assume the standard √2 miss-rate sensitivity [6] for the power law In summary, these figures indicate that though data sharing has (α=0.5). We set CPIpreL2 to 2 (varied in Section 6.3). 
the potential to allow the integration of a lot more cores on chip 2CACTI6.5 provides latency and area estimates only down to 32nm pro- compared to a workload mix with no sharing (and thus benefit from cess technology; we measure area and latency for different cache sizes for the resultant IPC improvement), in a bandwidth constrained system 90nm, 65nm, 45nm and 32nm technologies and extrapolate these to 22nm. only a small fraction of this potential gain may be realized.
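Pulling Equations 2-12 together, the following C sketch reproduces the kind of core-count sweep behind Figure 2. The identifiers are ours, the L2 hit-latency function is a crude stand-in for the CACTI 6.5 numbers used in the paper, the sharing factor E(n) is supplied by the caller, and the closed form for CPIcore comes from substituting the M/D/1 queuing delay (with the L2 line size in the arrival rate, as in the text around Equation 9) into Equation 12 and taking the '+' root of the resulting quadratic, as Section 4 prescribes. With these placeholder latencies the absolute IPC values will differ from the paper's figures; the shape of the curves is what the sketch is meant to convey.

#include <math.h>
#include <stdio.h>

struct params {
    double cea_total;   /* total CEAs for cores + L2 (1 CEA = 1 MB of cache) */
    double cea_core;    /* core size in CEAs                                 */
    double cpi_pre_l2;  /* CPIpreL2: core + L1 component                     */
    double api_l2;      /* apiL2: L2 accesses per instruction                */
    double m_1mb;       /* m1MB: miss rate with a 1 MB L2                    */
    double alpha;       /* power-law exponent                                */
    double t_mem;       /* TM: memory penalty in core cycles                 */
    double line_bytes;  /* lL2: L2 line size in bytes                        */
    double bw_gbps;     /* B: off-chip bandwidth, one direction, in GB/s     */
    double freq_ghz;    /* f: core frequency in GHz                          */
    double f_serial;    /* fs: serial fraction for Equation 2                */
};

/* Placeholder for the CACTI-derived L2 hit latency (an assumption). */
static double l2_hit_cycles(double size_mb)
{
    return 8.0 + 2.0 * log2(size_mb + 1.0);
}

/* Per-core CPI (Equation 12) for n cores and sharing factor E(n). */
static double cpi_core(const struct params *p, int n, double e_sharing)
{
    double l2_per_core = (p->cea_total - p->cea_core * n) / n;      /* Eq. 4 */
    if (l2_per_core <= 0.0)
        return INFINITY;                          /* no area left for cache  */
    double mpi_l2 = p->api_l2 * p->m_1mb
                  * pow(l2_per_core, -p->alpha) * e_sharing;        /* Eq. 8 */
    double c0 = p->cpi_pre_l2
              + p->api_l2 * l2_hit_cycles(l2_per_core)              /* Eq. 5 */
              + mpi_l2 * p->t_mem;                /* Eq. 7 with TQ = 0       */
    double mu = p->bw_gbps / (p->freq_ghz * n);   /* service rate, bytes/cyc */
    double a  = mpi_l2 * p->line_bytes;           /* bytes issued per instr. */
    /* '+' root of the quadratic obtained from CPI = c0 + mpi_l2 * TQ(CPI)  */
    double d  = a - mu * c0;
    return ((a + mu * c0) + sqrt(d * d + 2.0 * a * mpi_l2)) / (2.0 * mu);
}

static double chip_ipc(const struct params *p, int n, double e_sharing)
{
    double amdahl = (double)n / (1.0 + p->f_serial * (n - 1));      /* Eq. 2 */
    return amdahl / cpi_core(p, n, e_sharing);
}

int main(void)
{
    /* Parameter values along the lines of Section 5; E(n) = 1 (NoSharing). */
    struct params p = { 256.0, 1.0, 2.0, 0.033, 0.6, 0.5,
                        400.0, 128.0, 15.0, 2.0, 0.0 };
    for (int n = 1; n <= 255; n += 2)
        printf("%3d cores: chip IPC = %5.2f\n", n, chip_ipc(&p, n, 1.0));
    return 0;
}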

[Figure 2: four panels, (a) Sharing = None, (b) Sharing = Ideal Pipelined, (c) Sharing = Realistic1, (d) Sharing = Realistic2; each panel compares realistic and infinite off-chip bandwidth.]

Figure 2. Effect of off-chip bandwidth on CMP scaling. The x-axis is the number of cores and the y-axis is chip IPC

[Figure 3: four panels, (a) No Sharing, Constrained B/w; (b) Realistic Sharing, Constrained B/w; (c) No Sharing, Infinite B/w; (d) Realistic Sharing, Infinite B/w; each plots chip IPC against the number of cores for parallel fractions p = 1, 0.995, 0.99, and 0.95.]

Figure 3. Core-count that maximizes chip-IPC is less sensitive to parallelization efficiency in a bandwidth-constrained system.

6.2 Effect of Amdahl's Law

Amdahl's Law [2], when applied to the performance scaling of a multi-threaded program, states that the serial section of code significantly affects the realizable speed-up as parallel programs scale to a larger number of threads. We study the impact of this by varying the fraction of the serial section in workloads between 0%, 0.5%, 1% and 5% in Equation 2. Figure 3 shows the chip IPC curves for the 4 parallelization efficiencies. There are 4 plots. The first pair of plots assumes a bandwidth-constrained scenario; the second pair assumes an infinite bandwidth scenario. In each pair, the first plot depicts the NoSharing scenario and the second plot depicts the Realistic1 sharing pattern. The x-axis is the number of cores on-chip (out of a maximum of 256). The y-axis plots the chip IPC.

The figure shows that data sharing improves chip IPC significantly, and parallelization efficiency affects chip IPC noticeably. However, the main takeaway from this figure is that the peaks of the IPC curves fall along an almost vertical line in the left two plots. This indicates that the optimal core-count decision does not change much with parallelization efficiency in a bandwidth-constrained scenario. We deal with a bandwidth-constrained system in the later sections; therefore, we assume 100% parallelization efficiency, without affecting the correctness of the decisions related to core vs. cache rationing.

A second important observation is seen in the rightmost two plots. When there is sufficient bandwidth, the benefit of sharing is better realized in workloads which are more efficiently parallelized. With 95% parallel code the IPC only improves 1.30X going from NoSharing to Realistic1. However, with 99%, 99.5% and 100% parallel code the improvement is 1.68X, 1.77X and 1.76X, respectively.

In summary, in a bandwidth-constrained system the core vs. cache decision is roughly parallelization-efficiency independent. In a non-bandwidth-constrained system the impact of data sharing on chip throughput is directly correlated with the parallelization efficiency.

6.3 Effect of Core Size

System architects often face a situation where they need to pick a core design from several baseline core designs to integrate into a CMP. Our baseline core size is equal to 1 CEA (CEAp = 1). We relax this restriction in this section to study the effect of integrating smaller or larger cores in a CMP. We try CEAp values of 0.25, 0.5, 1 (baseline), 2, 4 and 8. We assume, in accordance with Pollack's Rule [15], that a core K times larger than a base design (and therefore K times more complex) provides only a √K improvement in performance. Doubling the core size thus boosts per-core IPC by approximately 41%, while halving the core size reduces it to about 71% of its initial value. Figure 4 plots the best chip-wide IPC and the corresponding per-core IPC for 6 different core designs, each under the Realistic2 and NoSharing scenarios. The core sizes (in CEAs) and their corresponding expected CPIpreL2s are shown on the x-axis. With NoSharing, the chip IPC reduces somewhat with smaller cores; however, the drop is steeper with 4x and 8x larger cores. There is an 8% drop in chip IPC by going to cores 0.25x the baseline size. With the Realistic2 sharing, there is no loss in chip-wide IPC by moving to smaller cores. However, for this to be realized, applications must be parallelizable across hundreds of threads, which may not be practical. When the core size is increased to 2x the base size, the per-core IPC improves considerably (14%), for a relatively small loss in chip IPC. Note that with larger cores, the number of cores on the chip is reduced compared to the core count with smaller cores. Therefore, for applications which may be single-threaded or scale to a smaller degree of parallelism, larger cores may make sense.
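The two scaling assumptions used above can be illustrated with a small toy calculation. The sketch below is not the paper's Equation 2 or its full throughput model (it ignores the cache and bandwidth terms entirely); it only combines Amdahl's Law with the Pollack's Rule square-root scaling to show why doubling the core size yields roughly a 41% per-core IPC gain and halving it leaves roughly 71% of the original IPC:

import math

def chip_ipc_estimate(cores, core_size_cea=1.0, serial_fraction=0.0, base_core_ipc=1.0):
    # Pollack's Rule: a core K times larger gives only a sqrt(K) IPC improvement.
    core_ipc = base_core_ipc * math.sqrt(core_size_cea)
    # Amdahl's Law: the serial fraction limits the usable parallel speedup.
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
    return core_ipc * speedup

# Doubling the core size boosts per-core IPC by ~41%; halving it leaves ~71%.
print(math.sqrt(2.0), math.sqrt(0.5))          # 1.414..., 0.707...
# With a 1% serial section, 256 single-CEA cores fall far short of a 256x speedup.
print(chip_ipc_estimate(256, serial_fraction=0.01))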


Figure 5. Effect of increasing core size on the bandwidth and cache walls

Figure 4. Effect of core type on overall, and per-core, throughput

Figure 5 shows the underlying reason why fewer, larger cores may do just as well as more, smaller cores. The figure plots the CEAs devoted to processor logic along the x-axis (note that 1 CEA no longer necessarily corresponds to 1 core). The y-axis is the chip IPC. The per-core IPC is not explicitly plotted; it is understood that a larger core will have a better core IPC. There are 8 series in the plot. The IPC curves are plotted for core sizes of 1, 2, 4 and 8 CEAs, each with and without the bandwidth constraint. As can be seen from the figure, there are two main effects of going to larger cores. First, the fraction of chip area occupied by cores increases. This is because there are fewer cores, and they need less cache compared to the case with smaller cores occupying the same amount of total core area. This frees up more area for the cores to occupy. Second, the chip IPC falls. This too happens because there are fewer cores, and the individual core's IPC does not grow in proportion to the core's size. However, because of the bandwidth-constrained nature of the design space, the reduction in the best-case chip IPC, especially going from 1 to 2 CEAs per core, is negligible. As the cores get larger, the peaks of the bandwidth-constrained and the non-bandwidth-constrained curves get closer to each other. In other words, as the number of processors reduces, the bandwidth wall becomes less severe (compared to the cache wall). This makes a case for moving to larger processors rather than scaling to a larger number of processors in the presence of the bandwidth wall.

In summary, there may be an optimal core size which gives the best single-threaded performance without hurting chip IPC noticeably.

6.4 SMP Scaling

So far we have used our model to draw insights about CMP design. Another interesting area that the model can help explore is SMP to CMP consolidation. With CMPs becoming the industry standard for all computing platforms, and CMP scaling continuing in the foreseeable future, server consolidation is generating widespread interest. CMPs are being seen as a vehicle to assimilate SMPs by effectively creating an SMP on-chip. With IBM's 8-core POWER7 system already announced [9], and with more scaling likely in the future, a small to medium scale SMP could fit on a single chip. To study the impact of such an SMP to CMP consolidation, we take as an example a system which expects to use 200 threads. The system designer has the option of either placing all the 200 threads on one CMP (across 200 CEAp=1 cores), or spreading the cores across multiple tightly interconnected CMPs. We continue to assume a bandwidth-constrained system. We further assume that with each additional CMP that is added to the SMP system the additional pin resources provide another 30 GBps of bandwidth to the distributed memory. We ignore the additional network delay to get to the appropriate SMP node.³

Figure 6 shows the system IPC (across the SMP) when 200 cores are spread across 1 through 8 CMP chips. There are 4 series plotted, corresponding to the 4 sharing scenarios we study. Focusing first on the NoSharing curve, the figure shows that instead of placing all the 200 cores on one CMP (which is possible, but performance suffers significantly due to the bandwidth limitation), it is better to spread the 200 cores across multiple chips.

In fact, performance improves almost linearly up to about a 6-way SMP, beyond which performance starts to saturate. There are 2 reasons why spreading the cores across multiple CMPs improves performance. First, by reducing the number of cores on each chip, both the total cache on-chip and the per-core cache increase. This reduces the pressure on off-chip bandwidth significantly. Second, every new CMP brings with it additional pins and, therefore, additional bandwidth. After about 5 to 6 chips, however, spreading cores onto additional chips does not give a proportional performance benefit because the SMP as a whole has overcome the bandwidth wall. The additional chips do add more cache, leading to monotonically increasing performance. We model a small-scale SMP system. For larger SMP systems, a more robust representation of the effect of the interconnect becomes important, and hence it would not be surprising if the system IPC starts to degrade as scaling spreads across more chips.

³ We experimented by increasing the TM term in Equation 6 by 50 and 100 pClks to account for both the increase in average memory latency and the coherence overhead, and found that both qualitatively and quantitatively the results are not affected significantly.
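The resource side of this trade-off is easy to tabulate. The helper below is our own back-of-the-envelope script, not the paper's IPC model: it only shows how per-chip cache area and aggregate pin bandwidth grow as the 200 cores are spread over more chips, under the stated assumptions of a 256-CEA chip, 1-CEA cores and 30 GBps of bandwidth per chip.

def smp_resources(chips, total_cores=200, chip_cea=256.0, core_cea=1.0, bw_per_chip_gbps=30.0):
    # Cores are divided as evenly as possible across the chips.
    cores_per_chip = -(-total_cores // chips)          # ceiling division
    # Whatever chip area is not spent on cores is available for cache.
    cache_cea_per_chip = chip_cea - cores_per_chip * core_cea
    aggregate_bw = chips * bw_per_chip_gbps            # each chip adds its own pins
    return cores_per_chip, cache_cea_per_chip, aggregate_bw

for n in (1, 2, 4, 6, 8):
    print(n, smp_resources(n))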

Figure 6. Scaling the number of chips in an SMP system

With the Realistic1 and Realistic2 sharing scenarios, the effect of going from 1 CMP with 200 cores to 2 chips with 100 cores each is even more dramatic. In fact, at a 3-way SMP with about 67 cores per chip for Realistic1 (4-way with 50 cores per chip for Realistic2), there is a distinct performance knee. This figure shows that sharing can significantly help reduce the size of an SMP system. Effectively, these curves tell us that in a bandwidth-constrained scenario, where the total number of cores is known a priori, the cost trade-off is between spreading the work across multiple chips versus providing a more expensive, higher-bandwidth memory bus and multiple memory controllers to provide the necessary off-chip bandwidth. There are at least three issues to consider. First, with fewer cores needed per chip, the CMP yields may be much better, thus making each chip cheaper. Second, with fewer cores per chip, the chip will likely run cooler, thus saving packaging and cooling costs. Third, with the extra pins available with the extra chips, the memory bus can be operated at a lower frequency (and lower power) using standard single-ended signaling technology (rather than, say, a higher-frequency differential signaling technology).

In fact, Figure 6 may also make the case to reduce the area of the CMP chip. There are fewer cores in each CMP in the optimal design. The performance of the SMP is more sensitive to the off-chip bandwidth increase than to the cache increase that comes with more chips (the cache sensitivity is visible in the slowly growing tails of the curves in this figure). Therefore, it may be possible to reduce the chip size noticeably by reducing the on-chip cache, while retaining the core count. We leave this study for future work.

In summary, to enable a targeted number of cores/threads in a system there exists a certain minimum number of chips across which the cores/threads should be spread for best overall throughput; scaling beyond that results in almost no further performance gain.

7 Conclusion

CMP design in the future will rely on core-level design-reuse to provide chip-level design-differentiation. Multi-threaded workloads require a new dimension, data sharing, to be considered in chip design. In this work we studied how data sharing impacts cache miss rates of multithreaded workloads. We then extended a simple, but powerful, analytical throughput model that can help system architects explore the interplay of design parameters such as the number and size of cores, workload characteristics, and cache and memory organizations. We incorporated the effect of multi-threaded data sharing into the analytical model. The model revealed interesting insights. For example, the model revealed that though data sharing can significantly boost throughput compared to an application whose threads do not share data, in a bandwidth-constrained scenario the benefit from sharing is severely restricted. The model also showed that in future, off-chip bandwidth constrained systems, it may be possible to use fewer larger cores to build a CMP, rather than many smaller cores, without reducing the chip throughput.

References

[1] A. R. Alameldeen. Using Compression to Improve Chip Multiprocessor Performance. PhD thesis, University of Wisconsin at Madison, 2006.
[2] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conf. Proc., pages 483–485, 1967.
[3] ARM. Cortex-A9 Processor. http://www.arm.com/products/processors/cortex-a/cortex-a9.php, 2010.
[4] D. Bailey et al. The NAS Parallel Benchmarks. Intl. Journal of Supercomputer Applications, 5(3):63–73, 1991.
[5] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In 17th Intl. Conf. on Parallel Architectures and Compilation Techniques, 2008.
[6] A. Hartstein, V. Srinivasan, T. Puzak, and P. Emma. On the Nature of Cache Miss Behavior: Is It √2? The Journal of Instruction-Level Parallelism, volume 10, pages 1–22, 2008.
[7] Hewlett Packard Laboratories. CACTI 6.5. http://www.hpl.hp.com/research/cacti/, 2010.
[8] J. Huh, D. Burger, and S. W. Keckler. Exploring the Design Space of Future CMPs. In 2001 Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 199–210, 2001.
[9] IBM. POWER7. http://www-03.ibm.com/press/us/en/pressrelease/29315.wss, 2009.
[10] IBM. PowerPC 476FP. http://www.power.org/events/powercon09/taiwan09/IBM_Overview_PowerPC476FP.pdf, 2009.
[11] ITRS. International Technology Roadmap for Semiconductors: 2007 Edition, Assembly and Packaging. http://www.itrs.net/Links/2007ITRS/2007Chapters/2007Assembly.pdf, 2007.
[12] P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A Full System Simulation Platform. Computer, 35:50–58, 2002.
[13] MIPS. MIPS 1004K Processor. http://www.mips.com/products/processors/32-64-bit-cores/mips32-1004k, 2010.
[14] T. Oh, H. Lee, K. Lee, and S. Cho. An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor. In 2009 IEEE Computer Society Symp. on VLSI, pages 181–186, 2009.
[15] F. J. Pollack. New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address). In 32nd Annual ACM/IEEE International Symposium on Microarchitecture, pages 2–, 1999.
[16] B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang, and Y. Solihin. Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling. In 36th Intl. Symp. on Computer Architecture, pages 371–382, 2009.
[17] D. Wentzlaff, N. Beckmann, J. Miller, and A. Agarwal. Core Count vs Cache Size for Manycore Architectures in the Cloud. Tech. Rep. MIT-CSAIL-TR-2010-008, MIT, 2010.
[18] M.-J. Wu and D. Yeung. Scaling Single-Program Performance on Large-Scale Chip Multiprocessors. Tech. Rep. UMIACS-TR-2009-16, University of Maryland, 2009.
[19] L. Zhao, R. Iyer, et al. Performance, Area, and Bandwidth Implications for Large-scale CMP Cache Design. In CMP-MSI, 2007.

Traffic Prediction for NoCs using Fuzzy Logic

Gervin Thomas and Ben Juurlink
Technische Universität Berlin
Department of Computer Engineering and Microelectronics
Embedded Systems Architectures
Berlin, Germany
Email: {gthomas,juurlink}@cs.tu-berlin.de

Dietmar Tutsch
Bergische Universität Wuppertal
Automation / Computer Science
Wuppertal, Germany
Email: [email protected]

Abstract—Networks on Chip provide faster communication and higher throughput for chip multiprocessor systems than conventional bus systems. Having multiple processing elements on one chip, however, leads to a large number of message transfers in the NoC. The consequence is that more blocking occurs, and time and power are wasted waiting until the congestion is dissolved. With knowledge of future communication patterns, blocking could be avoided. Therefore, in this paper a model is introduced to predict future communication patterns in order to avoid network congestion. Our model uses a fuzzy based algorithm to predict end-to-end communication. The presented model accurately predicts up to 10 time intervals ahead for continuous patterns. Communication patterns with non-continuous behaviors, such as fast changes from peak to zero, can also be predicted accurately for the next 1 to 2 time intervals to come. The model is a first step towards predicting future communication patterns. In addition, some limitations are identified that must be solved in order to improve the model.

I. INTRODUCTION

Increasing the clock frequency to increase performance is no longer an option due to, amongst others, energy consumption, heat development, and the enormous costs for new technologies [1] [2]. To increase the performance of a chip, processor vendors integrate more cores on one die. The current trend is that the number of cores on a chip multiprocessor (CMP) increases with every new generation, and so parallel computing has become more important than ever. The increasing number of cores requires a communication system different from a conventional bus system, since a bus quickly becomes the bottleneck of the system. One approach is to employ a Network on Chip (NoC). With the ongoing trend to increase the number of cores on CMPs, the NoC becomes an essential part of the system.

There are many NoC topologies such as meshes, trees, multistage interconnection networks (MINs), and many more. NoCs have several advantages such as scalability and modularity. The optimal network configuration depends on the application that is running because every application produces different traffic patterns and, moreover, these patterns may change over time. The NoC should realize communications with minor congestion or, if possible, free of congestion. Otherwise, the communication between cores may become the bottleneck. Several researchers [3] [4] have proposed reconfigurable networks to establish communication paths without congestion. The challenge of establishing congestion-free communication depends on the applications that run on the system. Often, congestion arises because several cores send messages at the same time and all messages must be routed through the same network. If two or more messages arrive at the same time at the same switching element and compete for the same link, only one can pass while the others must wait. This situation could be avoided if, before the communication starts, it is already known how much data each core will send and to which core. In that case the routing in the network could be realized with minor congestion by changing the routing algorithm. As another example, if the NoC has a reconfigurable structure, disjoint or lightly loaded paths through the network could be established.

This work presents a method to predict end-to-end communication patterns. Our method is based on a fuzzy algorithm. The prediction method searches for similar patterns in the communication history and, based on that information, predicts the next data point. By taking the newly predicted data point into account and applying this technique several times, several future steps can be predicted. The method is validated with a chaotic time series and with some real traffic traces obtained on a multicore system.

This paper is organized as follows. Section II describes related work. Section III provides a motivational example and Section IV describes the model that is applied. Section V describes the fuzzy based algorithm that is used to predict end-to-end traffic. Results are presented in Section VI. Finally, Section VII summarizes the paper and presents some directions for future work.

II. RELATED WORK

Huang et al. [5] proposed a table-driven predictor to predict communication in NoCs. Like us, they predicted end-to-end traffic without taking intermediate switches into account. Their method, however, only predicted one future time interval. The predicted amount of communication is either zero or the current quantity. The technique was evaluated by running a modified block LU decomposition kernel on Tilera's TILE64 platform. Kaxiras and Young [6] used coherence communication prediction in distributed shared-memory systems to detect data that is needed by several processors and to deliver the data as soon as possible. Their approach is also table-driven.
Duato and Lysne [7] [8] have proposed a methodology for

33 deriving procedures for dynamic and deadlock-free recon- figuration between routing functions but did not used any 0,4 0,4 prediction technique. Ahmad [3] introduced a dynamically reconfigurable NoC 1,4 1,4 architecture for reconfigurable Multiprocessor system-on-chip. Hansson and Goossens [4] introduced a library for NoC 2,0 2,1 2,0 2,1 reconfiguration for dynamically changing the interconnections 3,3 3,3 in dependency of the modules connected to the ports. Both works, however, did not investigate how traffic prediction 4,4 4,4 could improve the reconfiguration of the network. Chen et al. [9] used a fuzzy based predictive traffic model (a) Congested communication (b) Uncongested communication to avoid congestion at high utilization while maintaining high Fig. 1. Reducing congestion by rerouting communications quality of service in ATM networks. This prediction model was only applied to ATM networks. Pang et al. [10] used a fuzzy traffic predictor and also applied it to ATM traffic management. Results have been presented only for one-step prediction in contrast to our model which predicts several time steps. Otto and Schunk [11] applied fuzzy logic successfully to load forecasting for electric utilities. They did not apply it to other problems, however. Ogras and Marculescu [12] proposed Fig. 2. System as a black box a flow control algorithm to predict switch-to-switch traffic. This prediction is decentralized and based on the information the routers receive directly from their neighbors. From the take place until the first communication releases the switching prediction the number of packets injected in the network is elements. controlled. Brand et al. [13] presented a congestion control With traffic prediction it could be known a priori that the above strategy based on a Model Predictive Controller which controls mentioned nodes want to communicate. With this knowledge the offered load. This method requires that routing is not a different routing decision could be taken. Alternative routing dynamic, however, in contrast to our model. paths are shown in Figure 1(b). The first communication The approach presented in this paper differs from and im- between nodes (2, 0) and (0, 4) (indicated by solid line) could proves upon the ones mentioned above as follows. First, our be realized by dimension-order (yx) routing which first routes approach uses a fuzzy based algorithm while previous ap- the message vertically and then horizontally. With this new proaches use a table-driven predictor or flow control algorithm. routing decision the other nodes (2, 1) and (3, 3) (indicated by Second, our method predicts several future time steps which dotted line) as well as (4, 4) and (1, 4) (indicated by dashed allows to avoid congestion or low utilization in a more flexible line) can communicate in parallel. This example illustrates way. For example, reconfiguration of a network takes some the advantages of traffic prediction to realize blocking free time and is gainful only when the sum of the reconfiguration communications. We remark that deadlocks could arise due to time and message transfer time after the reconfiguration is the new routing decision, but this is not the main focus of this shorter than the transfer time without reconfiguration. It is work, since it can be solved using other techniques such as therefore necessary to predict several time steps ahead to be virtual channels [14]. more flexible for reconfiguration.

III.MOTIVATIONAL EXAMPLE IV. END-TO-END TRAFFIC PREDICTION Communications that take place at the same time is the Normally it is important to know the specific NoC topology reason for blocking in the network. Assume, for example, to be able to analyze it. In order to generalize our method a mesh NoC topology with 5 5, nodes as depicted in we do not consider the specific network topology. Instead, × Figure 1. In both figures it is illustrated that node (2, 0) our goal is to predict end-to-end communication. This means communicates with node (0, 4) (indicated by solid lines). With that we do not consider the switching elements between the dimension-order (xy) routing the communication is established nodes. It is also irrelevant which type of components (e.g. core, by first routing the message horizontally followed by routing memory, I/O) is connected to the NoC. Every component is it vertically. Additional messages that are sent simultaneously simply seen as a node. The NoC is considered as black box and need to cross the same links as the first message cannot to which several node are connected. reach their destination and a congestion occurs. Such an The structure of the model is depicted in Figure 2. For the example is shown in Figure 1(a), where an additional message communication between nodes it is important to know which transfer should be established between nodes (2, 1) and (3, 3) nodes want to exchange information between them and when. (indicated by dotted line) as well as the nodes (4, 4) and Therefore the point of time at which communication takes (1, 4) (indicated by dashed line). These communication cannot places and the amount of data that is transmitted are needed.

With these assumptions the problem of predicting traffic in NoCs is similar to predicting a time series.

V. FUZZY BASED TRAFFIC PREDICTOR

The proposed traffic predictor is based on [11] and uses fuzzy logic, introduced by Zadeh in 1965 [15]. Fuzzy logic has no strict assignment of elements to sets like binary logic. Instead, every element has a degree of membership to a set. This degree is represented by a value between 0 and 1. To be able to apply fuzzy logic to a specific problem such as the prediction of a time series, a fuzzy system must be constructed. The construction consists of three steps:

1) Fuzzyfication: In this step the degree of membership of the input values is assigned to fuzzy sets. The degree of membership is given by µ : X → [0, 1], where X is the set of input values. So every input value is mapped to a value between 0 and 1.
2) Fuzzy-Inference: In this step the output values from the membership function are linked with several different functions to generate an output set.
3) De-Fuzzyfication: In this step a numerical output value is generated from the output set.

The above mentioned steps are used to predict a time series. To do so, several time steps from the past are needed. The idea behind the algorithm is to consider the latest m (m < n) data points from the time series Y = (y_0, y_1, ..., y_n) and then search for similar patterns in the past. We refer to m as the pattern length. The time series Y has n + 1 data points. To determine similarity between patterns, fuzzy logic is used. If there are similar patterns in the past, the algorithm forecasts the next step by interpreting these patterns. This method is depicted in Figure 3. The last data points between (y_n) and (y_{n-m+1}) are compared with patterns from the communication history. If there is a pattern of pattern length m in the past that is very similar to the latest one, like the pattern between (y_γ) and (y_{γ-m+1}), the algorithm predicts that the next future point (y_{n+1}) is also very similar to the point that follows the past pattern (y_{γ+1}).

Fig. 3. Search for similar patterns in the history

The latest m data points correspond to a sub-vector Y[n-m+1, n] = (y_{n-m+1}, ..., y_{n-1}, y_n), and this vector is used as a window. That window vector is subtracted iteratively from the past data points, so that in total j = n - m + 1 difference vectors D^{(n-m-i)} = (d_0^{(n-m-i)}, d_1^{(n-m-i)}, ..., d_{m-1}^{(n-m-i)}) are obtained (i ∈ [0, n-m]), where D^{(n-m-i)} is given by

D^{(n-m-i)} = Y[n-m-i, n-1-i] - Y[n-m+1, n].    (1)

The superscripts indicate the different difference vectors. All elements of the calculated difference vectors are mapped using the membership function µ : X → [0, 1] to a value between 0 and 1 which shows the similarity to the original data points from the past. In this work the triangular function, given by

µ(x) = 1 - |x|/w   if |x| < w,
µ(x) = 0           otherwise,    (2)

is used as membership function. In this expression w is the width of the membership function. The width is a degree of how much the latest data points may differ from those in the past and can be set by the user. If the difference is too high, the membership function generates the output value 0, which means there is no similarity. Applying the membership function is the first step (fuzzyfication) of the fuzzy system.

All j difference vectors are now weighted based on their similarity. This is done by multiplying the memberships of all elements of a difference vector, as given by the following equation:

β^{(n-m-i)} = ∏_{k=0}^{m-1} µ(d_k^{(n-m-i)}).    (3)

In this equation d_k^{(n-m-i)} is element k of difference vector D^{(n-m-i)}. So every difference vector is now reduced to a scalar value which reflects the similarity of the patterns from the past to the last m data points. This step corresponds to the fuzzy-inference step.

From these weights we calculate the next future data point by performing a weighted sum of all past data points. This is done by the following equation:

y_{n+1} = ( ∑_{γ=0}^{n-m} β^{(n-m-γ)} · y_{n-γ} ) / ( ∑_{γ=0}^{n-m} β^{(n-m-γ)} ).    (4)

This step corresponds to de-fuzzyfication.

The steps explained above predict the next future data point. To predict several data points, the algorithm can be reapplied, including the predicted data point.

To predict future data points, we need several data points from the past. The more data points are available, the higher the possibility is to find a very similar pattern in the past and increase the accuracy of the algorithm. The disadvantage of using all data points from the past is that the calculation time and memory requirements increase. So a trade-off must be made between the number of considered data points and the accuracy of the algorithm. This trade-off depends on how often some communication patterns repeat. If some communication patterns repeat very often, fewer data points are needed than with less repetitive patterns. The effect of the history length as well as the pattern length m on the accuracy of the proposed algorithm is investigated in Section VI.
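For concreteness, the following sketch implements Equations (1) to (4) directly. It is a minimal NumPy version we add for illustration, not the authors' implementation, and the variable names are ours; with the parameters reported later for the Mackey-Glass experiments (pattern length m = 7, width w = 0.3) it reproduces the intended behaviour of weighting past windows by their similarity to the latest one.

import numpy as np

def predict_next(history, m, w):
    # One-step prediction following Eqs. (1)-(4).
    y = np.asarray(history, dtype=float)
    n = len(y) - 1                       # history holds y_0 .. y_n
    window = y[n - m + 1: n + 1]         # latest m data points, Y[n-m+1, n]
    num = den = 0.0
    for i in range(0, n - m + 1):        # i = 0 .. n-m  (j = n-m+1 windows)
        past = y[n - m - i: n - i]       # Y[n-m-i, n-1-i]
        d = past - window                # difference vector, Eq. (1)
        mu = np.where(np.abs(d) < w, 1.0 - np.abs(d) / w, 0.0)   # Eq. (2)
        beta = np.prod(mu)               # similarity weight, Eq. (3)
        num += beta * y[n - i]           # point that followed the past pattern
        den += beta
    return num / den if den > 0.0 else y[n]    # Eq. (4); fall back if no match

def predict_k_steps(history, k, m, w):
    # Re-apply the one-step predictor, feeding each prediction back in.
    y = list(history)
    out = []
    for _ in range(k):
        y.append(predict_next(y, m, w))
        out.append(y[-1])
    return out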



Fig. 4. Cumulative average error as a function of the pattern length

Fig. 5. Generated and predicted data points (history length is 300)
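The accuracy metric reported in Figs. 4 to 8 can be stated compactly: the cumulative average error after n predicted data points is the mean of the per-point errors of the first n predictions. A small helper of our own (assuming the relative error |predicted - true| / |true|; the MPI traces later use the absolute error instead because several true values are zero):

import numpy as np

def prediction_errors(true_values, predicted_values):
    t = np.asarray(true_values, dtype=float)
    p = np.asarray(predicted_values, dtype=float)
    rel_err = np.abs(p - t) / np.abs(t)                  # error per predicted point
    cum_avg = np.cumsum(rel_err) / np.arange(1, len(rel_err) + 1)
    return rel_err, cum_avg                              # cum_avg[n-1]: mean of first n errors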

VI.EXPERIMENTAL RESULTS AND ANALYSIS HL 300 In this section experimental results are provided using two HL 200 types of inputs. First, a chaotic time series will be used as 0.6 HL 100 input to the proposed algorithm. Thereafter, traces from a real MPI application will be used. A. Mackey-Glass 0.4 First the proposed algorithm is tested with a chaotic time

series given by the Mackey-Glass differential equation [16]:

dx xτ 0.2 = β n γ x. (5) dt · 1 + xτ − · To generate a chaotic time series from this equation, the β = 0.2 γ = 0.1 n = 10 parameters are set as follows: , , and . 0 In this equation xτ represents the value of the variable x at 0 10 20 30 40 50 time (t τ). The first 600 data points are calculated by solving Data point − the differential equation. The pattern length m is an important parameter for the Fig. 6. Average error for 50 predicted data points with different history accuracy of the proposed algorithm. Therefore, the impact lengths (HL) of the pattern length on the accuracy of the algorithm is investigated first. To perform this investigation the pattern length varied from 1 to 10 and the history length is set to 300. width w of the membership function, Equation (2), is set to Furthermore, the algorithm is used to predict different numbers 0.3. The predicted data points differs only slightly from the of data points. For every number of predicted data points generated data points. The average error in Figure 5 is less the cumulative average error is calculated. The cumulative than 4.5%. average error after n data points is the average error of the The history length is another important parameter for the first n data points. Figure 4 depicts the cumulative average accuracy of the proposed algorithm. Therefore the impact error as a function of the pattern length. The results show of the history length on the accuracy of the algorithm is that the cumulative average error decreases when the pattern investigated in more detail. To perform this investigation, the length is increased up to a length of 7. Therefore, all further starting point for the prediction is set to four different values, investigations with the Mackey-Glass time series, the pattern and from there on 50 data points are predicted. Afterwards length m is set to 7. the average error for every predicted point is calculated. This Figure 5 compares the predicted data points to the data points experiment is performed for different history lengths. Figure 6 generated by Equation (5) for up to 50 predicted data points. depicts the average error for each predicted data point and The history length is set to 300. Thus the algorithm only Figure 7 depicts the cumulative average error after a certain considers the last 300 data points to make a prediction. The number of data points have been predicted. The cumulative




Fig. 7. Cumulative average error for different history lengths (HL)

Fig. 8. Cumulative average error as a function of the history length

8 HL 50 average error after n data points is the average error of the HL 100 first n data points in Figure 6. In both Figure 6 and Figure 7, HL 150 different history lengths of 100, 200 and 300 have been used. HL 200 These results show that the algorithm has the ability to predict 6 HL 250 up to 10 data points with high accuracy (5.2% error) for a HL 300 chaotic time series with a history depth of 300. The high accuracy is achieved because of the continuous data points. 4 The results also shows that the error increases when the history Improvement length is reduced. When 10 future data points need to be predicted the cumulative average error increases to 6.2% for a 2 history length of 200, and to 9.8% for a history length of 100. There is a direct dependence between the first predicted values and the error propagation for succeeding data points. Predicted data points are used for the prediction of the succeeding data 1 2 5 10 25 50 point. The error accumulates from data point to data point so Predicted data points error propagation happens. With a longer history lengths the error for the first predicted value is smaller and only a minor Fig. 9. Prediction accuracy normalized to the prediction accuracy obtained error propagation takes places. The algorithm miss predicts a for a history length (HL) of 50 for different numbers of predicted data points. peak in step 31 for all history lengths, which results in an error peak depicted in Figure 6. Figure 8 depicts the cumulative average error as a function of the shortest considered history length (50). This figure shows the history length. The history length is varied from 50 to 300. that the history length has a huge impact on the prediction The four lines correspond to different number of predicted data accuracy. When one data point is predicted, the accuracy is points (1, 10, 25, 50). The figure shows that the first accuracy improved by a factor of 8 when the history length is increased improvement occurs when the history length is between 100 for 50 to 300. However, the impact of the history length and 150. Afterwards the history length has no considerable decreases when the number of predicted data points increases. influence on the accuracy of the predicted data points up to When 50 data points are predicted, the improvement is reduced a history length of around 280. Another improvement of the to a factor of 1.5. prediction accuracy takes place beyond a history length of 280. It can be seen that for 25 and 50 predicted data points, using a B. MPI Application history length of e.g. 125 yields better predictions than using 1) Traffic Trace: To validate the proposed algorithm on longer history lengths. We cannot fully explain this behavior, real traffic patterns, traffic traces from real applications are but expect this is due to the ”period” of the chaotic time series. needed. To generate these traces the application Meep [17] is Figure 9 depicts normalized prediction accuracy for different used. Meep is a free finite-difference time-domain (FDTD) history lengths. The accuracy is normalized with respect to simulation software package developed at MIT to model




Fig. 10. Non-equidistant MPI communication patterns become equidistant
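As described below, the non-equidistant logical communication events are converted into an equidistant series by summing the transferred data over fixed-size time intervals (the conversion shown in Fig. 10). A minimal sketch of this binning step, with argument names and units that are ours and purely illustrative:

import numpy as np

def bin_traffic(event_times_ms, event_sizes_kb, interval_ms, horizon_ms):
    # Fixed-size time intervals spanning the observation window.
    edges = np.arange(0.0, horizon_ms + interval_ms, interval_ms)
    # Sum the kB sent by all events falling into each interval.
    binned, _ = np.histogram(event_times_ms, bins=edges, weights=event_sizes_kb)
    return binned            # one value (kB sent) per time interval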

Fig. 11. Prediction of MPI communications (first behavior) electromagnetic systems. The program has the ability to to par- allelize a problem with the Message Passing Interface (MPI) reduced in order to be able to apply the proposed algorithm. which is used for the communication between processes. For To reduce this problem, time is divided into fixed sized our purpose Meep is used with OpenMPI [18]. intervals. In every time interval, the amount of data that is Every communication is recorded using the MPItrace tool [19]. sent is summed up. The solid bars in Figure 10 depict the This tool traces the basic activities in an MPI program and amount of data sent in each time interval. The advantage of generates a Paraver [20] trace file. This trace file includes time intervals is that now the time axis no longer has non- information about thread states and communication. For our equidistant time steps. Therefore the above mentioned problem purpose only the communication information between MPI with two unknown variables has been reduced to a problem processes is relevant. Therefore a short script has been im- with one unknown, namely to predict the amount of data plemented that separates the communication information from that will be sent. With this technique the problem that the the rest. A drawback of the MPItrace tool is, however, that tracing tool provides only traces with logical communication the start and end time of every communication correspond to information is also reduced. It can be assumed that the physical the time when the MPI functions (send and receive) are called communication will take place shortly after the logical so the respectively return. This calls the logical communication. It assumption is made that most communications will start in the is not possible to determine when the real (physical) com- corresponding time interval, provided the size of the interval is munication takes places. How we dealt with this problem is not too short. For communications that take places at the end described in the next section. of a time interval it, cannot be determined if they are assigned The application Meep has been running on a server with 2 to the correct time interval. This side effect will be neglected. processors with 4 cores each. The proposed algorithm should predict up to 20 prospective 2) MPI Analysis: The dashed bars in Figure 10 depicts time intervals. The width of the membership function, Equa- exemplary the amount of data sent by an arbitrary core over tion (2), is set to 5.5 and the pattern length m is set to 7. When time. The figure shows that the gap between two transferred predicting the MPI communication traces two behaviors have messages and also the amount of data that is sent vary. This been observed, which are depicted in Figure 11 and Figure 12. leads to a prediction problem with two unknown variables, The first data point numbered zero in both figures is the real since the problem is not only to predict how much data is sent one. Therefore the measured and predicted values match. The but also at which time. There is a large difference between not prediction starts in time interval 1. In time intervals where no knowing if a certain data point in time exists or to know that bars are visible the communication volume during this interval there is a data point whose value may be zero. This means that is zero. 
the time between two communications in the time domain are Both figures show the amount of data sent in each time not equidistant, which introduces problems for the proposed interval. The dashed bars depict the real data and the solid bars algorithm. The introduced algorithm cannot deal with this the predicted. The first behavior is depicted in Figure 11. The problem because there are too many unknown variables. The time at which a communication takes places is predicted with algorithm can predict only one unknown variable over an high accuracy, but the predicted amount of data is below the equidistant scale, for example, the amount of data that is sent actual amount. The second behavior is depicted in Figure 12. at a fixed point in time. For that reason the problem must be In this case the amount of transmitted data is predicted better

38 propagation. This is identical to a prediction with few future 15 Prediction time intervals. Moreover, the size of the time intervals must Original also be taken into account. The size of a time interval is set by the user so that one interval could correspond to many clock cycles. Based on the problem one or two predicted intervals 10 could be sufficient.

VII.CONCLUSIONSAND FUTURE WORK

Data [kB] In this paper a model has been proposed to predict end-to- 5 end traffic in NoC-based multiprocessor systems. The model predicts end-to-end communication, so intermediate switching elements are not considered. A fuzzy based algorithm is employed that searches for similar traffic patterns in the 0 history to predict prospective data points. The prediction is 0 5 10 15 20 performed for time intervals. Experimental results have been provided for a chaotic time series as well as real traffic patterns Time [ms] obtained by tracing an MPI application on a multiprocessor Fig. 12. Prediction of MPI communications (second behavior) system. The accuracy of the prediction depends mainly on the behavior the traffic patterns that should be predicted. Chaotic TABLE I patterns with continuous behavior can be predicted with high ABSOLUTEAVERAGEERRORINKB accuracy for up to 10 data points. Traffic patterns that are non- continuous, jumping from high communication volume to zero Steps 1 2 5 10 20 communication or vice versa, can also be predicted accurately, Avg. Err. (Fig. 11) 0.008 0.768 1.551 1.498 1.920 but only up to two steps ahead. Accurately predicting two data Avg. Err. (Fig. 12) 0.246 1.596 2.442 2.427 2.656 points can be sufficient, however, because the prediction is performed for time intervals and one interval consist of many clock cycles. than in the case depicted in Figure 11. The algorithm also, As future work, we plan to validate the proposed method however, predicts communication peaks where no peaks are. on a NoC system. To do that the proposed algorithm must This can be seen in time interval 12. be integrated into a NoC simulator. This step allows us to Table I shows the absolute average error in kB for both figures investigate the NoC system speedup due to traffic prediction. after several predictions steps. It is not possible to present Furthermore, this step is important to check how the prediction the relative error because several data points are zero which of future data points influences the NoC system. Also the would lead to a division by zero. Table I shows that a 1- computational complexity of the model must be analyzed and step prediction can be performed with high accuracy for both optimized in order to be able to integrate the proposed model behaviors. For the first behavior also a 2-step prediction has in NoC systems. On a real NoC system, it could be validated a small error. After that the error increases but stays nearly how many time steps must be predicted in order to improve constant up to step 10. Thereafter the error increases more the system performance. Furthermore, the prediction accuracy and more because of error propagation. of the algorithm could be improved. In particular the amount The miss predictions for both shown behaviors arise from the of data that is transferred could be predicted with higher relative distance between two data points in contiguous time accuracy, since the predicted amount is currently below the intervals, especially when a peak communication is followed actual amount. by the absence of communication or vice versa. Such jumps mislead the proposed algorithm so that a wrong estimation REFERENCES is produced. The communication patterns from the history [1] D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, are weighted incorrectly so an error arises which influences vol. 38, no. 5, pp. 11 – 13, May 2005. coming data points. [2] P. Gepner and M. 
Kowalik, “Multi-Core Processors: New Way to Achieve High System Performance,” in Parallel Computing in Electrical To interpret the presented results it must be taken into account Engineering, 2006. PAR ELEC 2006. International Symposium on, 2006, that only static simulations are performed. Data points that pp. 9 –13. occur far into the future are predicted with previously predicted [3] B. Ahmad, A. Erdogan, and S. Khawam, “Architecture of a Dynamically Reconfigurable NoC for Adaptive Reconfigurable MPSoC,” in Adaptive values. That means that no new data points are taken into Hardware and Systems, 2006. AHS 2006. First NASA/ESA Conference account so that the error propagates. In a real system the time on, 2006, pp. 405 –411. goes on and new data points are produced and so the history [4] A. Hansson and K. Goossens, “Trade-offs in the Configuration of a Network on Chip for Multiple Use-Cases,” in Networks-on-Chip, 2007. is updated. In that case more real data points could be used to NOCS 2007. First International Symposium on, May 2007, pp. 233 – predict the next data points, which would lead to slower error 242.

39 [5] Y. Huang, K.-K. Chou, C.-T. King, and S.-Y. Tseng, “NTPT: On the Automation Conference, ser. DAC ’06. New York, NY, USA: ACM, End-to-End Traffic Prediction in the On-Chip Networks,” in Design 2006, pp. 839–844. [Online]. Available: http://doi.acm.org/10.1145/ Automation Conference (DAC), 2010 47th ACM/IEEE, 2010, pp. 449 1146909.1147123 –452. [13] J. van den Brand, C. Ciordas, K. Goossens, and T. Basten, “Congestion- [6] S. Kaxiras and C. Young, “Coherence communication prediction in Controlled Best-Effort Communication for Networks-on-Chip,” in De- shared-memory multiprocessors ,” in High-Performance Computer Ar- sign, Automation Test in Europe Conference Exhibition, 2007. DATE chitecture, 2000. HPCA-6. Proceedings. Sixth International Symposium ’07, 2007, pp. 1 –6. on, 2000, pp. 156 –167. [14] W. Dally and C. Seitz, “Deadlock-Free Message Routing in Multipro- [7] J. Duato, O. Lysne, R. Pang, and T. Pinkston, “A Theory for Deadlock- cessor Interconnection Networks,” Computers, IEEE Transactions on, Free Dynamic Network Reconfiguration. Part I,” Parallel and Dis- vol. C-36, no. 5, pp. 547 –553, May 1987. tributed Systems, IEEE Transactions on, vol. 16, no. 5, pp. 412 – 427, [15] L. Zadeh, “Fuzzy Sets,” Information and Control, vol. 8, pp. 338–353, May 2005. 1965. [8] O. Lysne, T. Pinkston, and J. Duato, “A Methodology for Developing [16] L. Glass and M. C. Mackey, “Oscillation and Chaos in Physiological Deadlock-Free Dynamic Network Reconfiguration Processes. Part II,” Control Systems,” Science, vol. 197, pp. 287–289, 1977. Parallel and Distributed Systems, IEEE Transactions on, vol. 16, no. 5, [17] S. G. Johnson, J. D. Joannopoulos, and M. Soljai, “Meep,” 2006. pp. 428 – 443, May 2005. [Online]. Available: http://ab-initio.mit.edu/wiki/index.php/Meep [9] B.-S. Chen, Y.-S. Yang, B.-K. Lee, and T.-H. Lee, “Fuzzy Adaptive [18] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Predictive Flow Control of ATM Network traffic,” Fuzzy Systems, IEEE Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Transactions on, vol. 11, no. 4, pp. 568 – 581, 2003. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall, “Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementa- [10] Q. Pang, S. Cheng, and P. Zhang, “Adaptive fuzzy traffic predictor and tion,” in Proceedings, 11th European PVM/MPI Users’ Group Meeting, its applications in ATM networks,” in Communications, 1998. ICC 98. Budapest, Hungary, September 2004, pp. 97–104. Conference Record.1998 IEEE International Conference on, vol. 3, Jun. [19] H. S. Gelabert and G. L. Snchez, “MPItrace - User Guide Manual,” 1998, pp. 1759 –1763 vol.3. 2010. [Online]. Available: http://www.bsc.es/plantillaA.php?cat id=492 [11] P. Otto and T. Schunk, “Fuzzybasierte Zeitreihenvorhersage,” Automa- [20] Paraver - Parallel Program Visualization and Analysis tool, Version tisierungstechnik, vol. 48, pp. 327–334, 2000, In: German. 3.1 ed., Barcelona Supercomputing Center - Centro Nacional de [12] U. Y. Ogras and R. Marculescu, “Prediction-based Flow Control for Supercomputacin, October 2001. [Online]. Available: http://www.bsc. Network-on-Chip Traffic,” in Proceedings of the 43rd annual Design es/plantillaA.php?cat id=493

GPU Acceleration of the Assembly Process for Isogeometric Analysis

Nathan Collier, Hyoseop Lee, Aron Ahmadia, Craig C. Douglas, V. M. Calo

Department of Mathematics, University of Wyoming, Laramie, Wyoming 82071-3036, USA
Applied Mathematics and Computational Science / Earth and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Abstract—We present a method for plications, much smaller than the one needed for representing (GPU) acceleration of matrix assembly as applied to isogeometric the solution to the physical phenomena on that domain. analysis, spline-based isoparametric finite elements. We take Isogeometric analysis is isoparametric finite elements using advantage of the basis being defined on a structured grid to precompute basis functions and assemble the local contributions the Bernstein basis in place of the Lagrange basis traditionally to the stiffness matrix on the GPU. We show how GPU resources used [2]. The Bernstein bases has been used in CAD for may be utilized with minor changes to a serial code. This initial several decades due to several of its properties, particularly the work achieves a speedup of up to 11 in the matrix assembly stage smoothness of the basis and ability to represent conic sections of the isogeometric finite element algorithm. in its rational form. For this reason typically NURBS are used, Index Terms—GPU, isogeometric, finite elements, matrix as- yet at the core they are piecewise polynomials possessing sembly, acceleration, CUDA higher orders of continuity. In addition to the geometrical benefits, the basis is also well I.INTRODUCTION suited to solving nonlinear and higher-order partial differential Isogeometric analysis [1] is a new method proposed in equations due to its higher-order continuity. Isogeometric 2005, originally motivated by the desire to find a technique for analysis has successfully been applied to a number of ar- solving partial differential equations which would simplify, if eas including, structural vibrations, fluid-structure interaction, not eliminate the problem of converting geometric descriptions particularly patient-specific arterial blood flow, complex fluid for discretizations in the engineering design process. Once flow and turbulence, shape optimization, phase field models a design is born inside a Computer Aided Design (CAD) via the Cahn-Hilliard equation, cavitation modeling, and shell program, the process to convert it to an analysis suitable analysis [3]–[10]. In addition, a book [11] has been written, form is the bottleneck of the analysis process, consuming up detailing the method and showcasing several applications. Due to 80% of this process. Isogeometric analysis aims to use to this increased interest in both engineering and scientific CAD representations directly by using the B-spline or NURBS applications, it is important that codes which implement this (Non-uniform Rational B-Spline) basis in isoparametric finite method are efficient. elements. Although, as their name indicates, the primary role of a The term isogeometric refers to the fact that all representa- graphical processing unit (GPU) is to accelerate graphics ren- tions of the problem have the same (iso) geometry. Refinement dering, computational scientists have demonstrated impressive of a finite element space changes the geometry and requires performance and speedups over traditional scientific codes by communication to and from the original CAD description. offloading computationally intensive routines to GPUs [12]– Isogeometric analysis is capable of geometry preserving re- [14]. Graphical processing units have also become increasingly finements, meaning that there is no need to return to the popular in the solution of partial differential equations, princi- CAD description. 
The refinements available to isogeometric pally via the finite difference method (for example [15], [16]). analysis are richer than in the standard finite element method Finite difference methods map well onto GPUs due to their (FEM). Isogeometric analysis possesses h-refinements (knot structured topology and low memory requirements. insertion), p-refinements (degree elevation) and k-refinements Recent work has looked towards acceleration of finite ele- (degree elevation followed by knot insertion), all of which are ment solvers via GPUs. Research has been published address- geometry preserving. ing the optimization of the sparse matrix solves [17] as well as This means that once a mesh is defined, which accurately the assembly of matrix equations in unstructured meshes [18]. describes the geometry, as refinements are needed the ge- In spite of their speedups in computational time, the limited ometry does not change. The number of degrees of freedom memory size of a GPU restricts the problem size that can be necessary to represent the geometry accurately is, in most ap- solved on a single GPU. A GPU cluster is a possible solution

41 to tackle the limitation. In [19] we already utilized a GPU cluster as a linear solver for finite element simulation. In this paper we are concerned with the local matrix assembly of the matrix equations used in isogeometric analysis on a single GPU, which can be naturally extended into a GPU cluster. The combination of this work and the linear solve on a GPU cluster will be a forthcoming paper. The purpose of this paper is two-fold. First, we present an algorithmic analysis of the stiffness matrix assembly process using NURBS as the basis. We present results specific to elasticity equations in three-dimensions. While the NURBS basis functions are capable of representing complex geome- tries, topologically each patch has a tensor product structure in the parametric domain. This fact allows for some computa- tional efficiencies previously unutilized. Second, we present an adaptation of a publicly distributed NURBS code [20] which takes advantage of these efficiencies as well as utilizes the

GPU.

II. ISOGEOMETRIC ANALYSIS

Isogeometric analysis is a Galerkin finite element method that uses B-spline basis functions. These basis functions are polynomial splines based on the Bernstein basis. A spline space is defined by first specifying a knot vector, a one-dimensional set of non-decreasing locations in parametric space. The knot vector is denoted by

Ξ = {ξ_1, ξ_2, . . . , ξ_{n+p+1}},

where ξ_i ∈ R is the i-th knot, p is the polynomial order, and n is the number of basis functions. The knot vector encodes the basis functions, which can be evaluated using the Cox-deBoor recursion formula, described in what follows. The zeroth order functions are defined as

N_{i,0}(ξ) = 1 if ξ_i ≤ ξ < ξ_{i+1}, and 0 otherwise,

while for p > 0,

N_{i,p}(ξ) = (ξ - ξ_i)/(ξ_{i+p} - ξ_i) · N_{i,p-1}(ξ) + (ξ_{i+p+1} - ξ)/(ξ_{i+p+1} - ξ_{i+1}) · N_{i+1,p-1}(ξ).

Sample basis functions can be seen in Fig. 1. Higher orders of continuity have the effect of extending the support of the basis functions beyond their typical single element support.

Fig. 1. Examples of B-spline basis functions. (a) Three spans of a C^0 basis for p = 1, 2, 3, 4. (b) Three spans of a C^{p-1} basis for p = 1, 2, 3, 4.

In multi-dimensions the rational basis functions, R_i^p, can be formed directly from the polynomial counterparts, N_{i,p}, by weighting each basis, w_i, and then dividing by the sum of all functions to maintain the partition of unity property. Here p refers to the polynomial order and i indexes the basis,

R_i^p(ξ) = N_{i,p}(ξ) w_i / ∑_î N_{î,p}(ξ) w_î.

The polynomial basis functions are formed by tensor product and the rationals are formed subsequently. For example, in three dimensions

R_{i,j,k}^{p,q,r}(ξ, η, ζ) = N_{i,p}(ξ) M_{j,q}(η) L_{k,r}(ζ) w_{i,j,k} / ∑_î ∑_ĵ ∑_k̂ N_{î,p}(ξ) M_{ĵ,q}(η) L_{k̂,r}(ζ) w_{î,ĵ,k̂},

where N, M, L are the one-dimensional basis functions, p, q, r denote the polynomial order in each direction and i, j, k index the basis. Note that this is not a tensor product structure since the denominator couples all directions; thus R_{i,j,k}^{p,q,r} cannot be written as the product of three functions where each depends on a single parametric direction.
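A direct transcription of the Cox-deBoor recursion above, added here as a self-contained NumPy sketch for illustration (0/0 terms are treated as zero, as is conventional; the knot vector and weights in the usage lines are made up):

import numpy as np

def bspline_basis(knots, p, xi):
    # Evaluate all n = len(knots)-p-1 functions N_{i,p}(xi) by the Cox-deBoor recursion.
    knots = np.asarray(knots, dtype=float)
    m = len(knots) - 1
    N = np.where((knots[:-1] <= xi) & (xi < knots[1:]), 1.0, 0.0)
    if xi >= knots[-1]:                      # close the last non-empty span
        N[:] = 0.0
        N[np.nonzero(knots[1:] > knots[:-1])[0][-1]] = 1.0
    for q in range(1, p + 1):
        Nq = np.zeros(m - q)
        for i in range(m - q):
            left = right = 0.0
            if knots[i + q] > knots[i]:
                left = (xi - knots[i]) / (knots[i + q] - knots[i]) * N[i]
            if knots[i + q + 1] > knots[i + 1]:
                right = (knots[i + q + 1] - xi) / (knots[i + q + 1] - knots[i + 1]) * N[i + 1]
            Nq[i] = left + right
        N = Nq
    return N

# Example: open knot vector, p = 2, five basis functions; NURBS weighting below.
Xi = [0, 0, 0, 1, 2, 3, 3, 3]
w = np.array([1.0, 1.0, 2.0, 1.0, 1.0])     # made-up weights
Npoly = bspline_basis(Xi, 2, 1.5)
R = Npoly * w / np.sum(Npoly * w)           # rational (NURBS) basis at xi = 1.5
print(Npoly.sum(), R.sum())                 # both sums equal 1 (partition of unity)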

A. Why Rationals?

Rational B-splines (commonly referred to as NURBS) are used because low degree-of-freedom polynomial spaces poorly approximate conic sections. This can be demonstrated by considering a quadratic Bernstein representation of a quarter circle, shown in Fig. 2(a). The corresponding polynomial basis is shown below in Fig. 2(c). Note that to maintain the tangency at the endpoints, the middle control point must be located in the corner of the inscribed square. So each basis is weighted (Fig. 2(d)) to precisely match the circle (Fig. 2(b)). This enables a low-degree-of-freedom, exact representation of conic sections, common in engineering design applications.

B. Elasticity

We focus on the acceleration of stiffness matrix formation using NURBS basis functions; particularly, we address the case of linear elasticity. The elasticity equations are derived in many references, such as [11]. The strong form is to find u_i : Ω̄ → R given that f_i : Ω → R, g_i : Γ_{D_i} → R, and h_i : Γ_{N_i} → R such that

σ_{ij,j} + f_i = 0 in Ω,
u_i = g_i on Γ_{D_i},
σ_{ij} n_j = h_i on Γ_{N_i},

where the stress, σ_ij, is related to the symmetric part of the displacement gradient, u_{i,j}, by a linear constitutive law, known as generalized Hooke's law,

    σ_ij = c_ijkl ε_kl ,

with

    ε_kl = (1/2) (u_{k,l} + u_{l,k}) ,

and

    c_ijkl = λ δ_ij δ_kl + μ (δ_ik δ_jl + δ_il δ_jk) ,

where λ and μ are the so-called Lamé constants, and δ_ij is the Kronecker delta, defined as 1 if i = j and 0 otherwise.

Assuming the pointwise satisfaction of the constitutive relation, the weak form of the elasticity equations is stated as: find u_i ∈ S_i which satisfies the Dirichlet condition u_i = g_i on Γ_{D_i} such that for all w_i ∈ V_i which satisfy w_i = 0 on Γ_{D_i}

    ∫_Ω w_{(i,j)} σ_ij dΩ = ∫_Ω w_i f_i dΩ + Σ_{i=1}^{d} ( ∫_{Γ_{N_i}} w_i h_i dΓ ) ,

where u_i denotes the displacement vector components, u_{i,j} denotes the derivative of u_i in the direction of x_j, σ_ij and ε_ij are the components of the Cauchy stress and small strain tensors, respectively, and c_ijkl denotes the components of the elasticity tensor. The reader is referred to section 4.1 of [11] for more details on the formation of matrix equations from this weak form.

We profiled the reference code using input files representing a statically loaded cylinder by specifying an internal pressure. The NURBS basis exactly represents the cylinder geometry at all levels of the discretization. While cylinder problems can be analyzed using standard finite elements, doing so while exactly representing the geometry in an isoparametric framework is unique to isogeometric analysis.

Fig. 2. Rational basis functions approximate conic sections, such as circles, exactly. The circle appears as a dashed line, control points appear as small circles. (a) Quadratic polynomial approximation of a quarter circle. (b) Rational basis approximates the circle exactly. (c) Polynomial basis functions. (d) Rational basis functions, w1, w2, w3 = {1, √2/2, 1}.

To solve the system, we use a sparse conjugate gradient solver with the zero tolerance set to 10^-8. While the basis function routine was suspected to be a major cost of the simulation, particularly for higher p, the profile shows that this computation represents a small portion of the overall runtime. We examined each routine, counting the floating point operations (FLOPS) and the memory accesses. The algorithmic intensity is the ratio of these quantities; thus, routines with a high intensity perform more FLOPS per data access and stand to benefit more from parallelization. The algorithmic intensity was highest (0.7 and higher) in the routine which computes the local stiffness matrices. We expect this routine to achieve decent acceleration by mapping it to the GPU.
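As an illustration of the constitutive relation that the local stiffness routine evaluates at every quadrature point, the following small C++ snippet (our own sketch, not part of the reference code; all names and the sample values are made up) builds c_ijkl from the Lamé constants and applies generalized Hooke's law to a given strain tensor.

    #include <stdio.h>

    // Kronecker delta
    static double kron(int i, int j) { return (i == j) ? 1.0 : 0.0; }

    // Elasticity tensor component c_ijkl = lambda*d_ij*d_kl + mu*(d_ik*d_jl + d_il*d_jk)
    static double c_elast(double lambda, double mu, int i, int j, int k, int l) {
        return lambda * kron(i, j) * kron(k, l)
             + mu * (kron(i, k) * kron(j, l) + kron(i, l) * kron(j, k));
    }

    // Generalized Hooke's law: sigma_ij = c_ijkl * eps_kl (summed over k and l)
    static void hooke(double lambda, double mu, const double eps[3][3], double sigma[3][3]) {
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) {
                double s = 0.0;
                for (int k = 0; k < 3; ++k)
                    for (int l = 0; l < 3; ++l)
                        s += c_elast(lambda, mu, i, j, k, l) * eps[k][l];
                sigma[i][j] = s;
            }
    }

    int main(void) {
        // Hypothetical Lamé constants and a small symmetric strain, for illustration only.
        double lambda = 1.0, mu = 0.5;
        double eps[3][3] = {{1e-3, 2e-4, 0.0}, {2e-4, 5e-4, 0.0}, {0.0, 0.0, 0.0}};
        double sigma[3][3];
        hooke(lambda, mu, eps, sigma);
        printf("sigma_00 = %g, sigma_01 = %g\n", sigma[0][0], sigma[0][1]);
        return 0;
    }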

III. ALGORITHM

In this section we describe the methods used to accelerate the assembly process of the stiffness matrix in finite element analysis. Note that while the work here is specific to linear elasticity, most of the insights apply to other weak forms as well. We present the algorithmic changes relative to those presented in [11]. For reference, the algorithm for a multi-patch isogeometric finite element code is presented here and designated Alg. 1.

Algorithm 1 Original Isogeometric Analysis
 1: Read global input data
 2: Build connectivities and allocate global array
 3: K = 0 and F = 0
 4: for all patches in mesh do
 5:   Read patch input data
 6:   for all elements in current patch do
 7:     Ke = 0 and Fe = 0
 8:     for all quadrature points do
 9:       Evaluate basis functions and derivatives
10:       Add contributions to Ke and Fe
11:     end for
12:     Assemble K ← Ke and F ← Fe
13:   end for
14: end for
15: Solve Kd = F
16: Write output data

A. Precomputation of Basis Functions

The basis function computation on the GPU was avoided by a reorganization of the code that takes advantage of the structured nature of tensor product splines and precomputes the one-dimensional basis functions. This is not typically done, as the memory to store this information grows quickly. However, NURBS-based isogeometric meshes are structured grids, not unlike finite differences. This structure enables the reuse of the same basis functions in many elements, which reduces storage requirements and removes the need to compute the basis functions on the GPU. The precomputation slightly economizes the stiffness matrix assembly process (a 1% speedup for matrix assembly for elasticity) and requires a small change to the original algorithm, described in Alg. 2. This unexpected and marginal efficiency gain is due to the comparatively more expensive local stiffness matrix computations. In other equations where the local stiffness contributions are less intense, such as in the Laplace equation, precomputation of the basis functions can speed up matrix assembly by as much as 5%.

Algorithm 2 Isogeometric Analysis: precomputation of basis functions
 1: Read global input data
 2: Build connectivities and allocate global array
 3: K = 0 and F = 0
 4: for all patches in mesh do
 5:   Read patch input data
 6:   Precompute 1D basis functions                 (new)
 7:   for all elements in current patch do
 8:     Ke = 0 and Fe = 0
 9:     for all quadrature points do
10:       Evaluate 3D basis from 1D values          (new)
11:       Add contributions to Ke and Fe
12:     end for
13:     Assemble K ← Ke and F ← Fe
14:   end for
15: end for
16: Solve Kd = F
17: Write output data
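The "Evaluate 3D basis from 1D values" step in Alg. 2 exploits the tensor-product structure of the NURBS basis. A minimal sketch of that combination step is given below; it is our own illustration with assumed array layouts (one vector of univariate values per parametric direction and one weight per local control point), not the authors' data structures.

    // Combine precomputed 1D basis values into 3D rational (NURBS) basis values
    // at one quadrature point: R_a = N1_i * N2_j * N3_k * w_a / W.
    #include <vector>

    void tensor_product_basis(const std::vector<double>& N1,
                              const std::vector<double>& N2,
                              const std::vector<double>& N3,
                              const std::vector<double>& w,   // NURBS weights, size n1*n2*n3
                              std::vector<double>& R)         // rational basis, size n1*n2*n3
    {
        const int n1 = (int)N1.size(), n2 = (int)N2.size(), n3 = (int)N3.size();
        R.resize((size_t)n1 * n2 * n3);
        double W = 0.0;                           // weighting function W = sum_a N_a * w_a
        for (int k = 0; k < n3; ++k)
            for (int j = 0; j < n2; ++j)
                for (int i = 0; i < n1; ++i) {
                    const int a = i + n1 * (j + n2 * k);
                    R[a] = N1[i] * N2[j] * N3[k] * w[a];
                    W += R[a];
                }
        for (double& r : R) r /= W;               // rationalize
    }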
B. GPU Implementation

The strategy used here is that we compute the local stiffness matrix and load vector (Ke and Fe) on the GPU and then transfer them back to the CPU for global assembly and solution. Alg. 3 provides a schematic for how this is accomplished. We copy the precomputed 1D basis functions, the rational weights, connectivity information, and geometric information to the GPU.

Algorithm 3 Isogeometric Analysis: GPU enhanced
 1: Read global input data
 2: Build connectivities and allocate global array
 3: K = 0 and F = 0
 4: for all patches in mesh do
 5:   Read patch input data
 6:   Precompute 1D basis functions and derivatives
 7:   procedure MOVE TO GPU
 8:     Compute 3D basis                           (GPU)
 9:     Add contributions to Ke and Fe             (GPU)
10:   end procedure
11:   Copy all Ke and Fe to the CPU
12: end for
13: Assemble K ← Ke and F ← Fe
14: Solve Kd = F
15: Write output data

The GPU kernel launches, assigning a block to each element, where each thread in the block is in charge of a single Gauss point. The kernel is then responsible for, first, computing the three-dimensional rational basis, the isoparametric mapping and its inverse, and the basis derivatives modified by this inverse mapping. Second, the local stiffness matrix is computed for the linear-elastic weak form. Finally, the local load vector is assembled. These operations are safe because the computation is independent for each Gauss point.

The implementation of each thread is a simple port of the original code, but the memory access pattern of the threads should be considered carefully for the sake of optimization. First of all, the variables that are accessed simultaneously by consecutive threads are indexed to be coalesced into a single memory transaction. Coalesced memory access enables efficient use of the global memory bandwidth, and thus it plays the most important role in terms of optimization. In particular, the local stiffness matrix is stored with threadID-based indexing to achieve coalescent memory access. The number of local variables is another consideration for efficient memory management. Local variables declared inside kernel functions are stored either in global memory or in registers. Though the CUDA compiler determines the storage, variables above the register limit basically spill to global memory, which can cause a serious decrease in performance. We modified the original code in order to shrink the number of local variables by removing redundant storage and computations.

The last two operations of the kernel involve a summation of each thread's contribution to the matrix and the load vector and thus require reductions. The reduction over Gauss points is achieved through the shared memory on the GPU. A naive implementation of the reduction results in a considerable performance loss. The possible reasons for the loss include instruction bottlenecks, bank conflicts, and variable addressing. The details of the optimization strategy can be found in [21], [22].

Subsequently, the local contributions are copied back to the CPU and assembled into the global matrix. While on the GPU, each local matrix is computed ignoring Dirichlet boundary conditions. The Dirichlet boundary conditions are incorporated at the global assembly stage on the CPU.
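A skeleton of the assembly kernel described above, with one thread block per element, one thread per Gauss point, and a shared-memory reduction over Gauss points, could look as follows. This is our own sketch under simplifying assumptions (placeholder physics, a power-of-two number of Gauss points, made-up array names); the authors' kernel additionally uses threadID-based indexing of Ke for coalescing and the tuned reductions of [21], [22].

    // Skeleton of the per-element assembly kernel.
    #define MAX_GP 128        // assumed upper bound on Gauss points per element
    #define NDOF   24         // assumed number of local dofs (8 nodes x 3 components)

    __global__ void assemble_local(const double* gauss_weight,  // quadrature weights
                                   const double* elem_data,     // per-element geometry/basis data
                                   double* Ke,                  // [numElements][NDOF*NDOF]
                                   double* Fe)                  // [numElements][NDOF]
    {
        const int e  = blockIdx.x;    // element handled by this block
        const int q  = threadIdx.x;   // Gauss point handled by this thread
        const int nq = blockDim.x;    // number of Gauss points (power of two assumed here)

        __shared__ double partial[MAX_GP];

        // Each thread would evaluate the rational basis, the isoparametric mapping and
        // its inverse, and the weak-form integrand at its Gauss point. That physics is
        // replaced by placeholder arithmetic so that the reduction pattern stands out.
        for (int entry = 0; entry < NDOF * NDOF; ++entry) {
            double contrib = gauss_weight[q] * elem_data[e];    // placeholder integrand value
            partial[q] = contrib;
            __syncthreads();

            // Tree reduction over Gauss points in shared memory.
            for (int s = nq / 2; s > 0; s >>= 1) {
                if (q < s) partial[q] += partial[q + s];
                __syncthreads();
            }
            if (q == 0) Ke[(size_t)e * NDOF * NDOF + entry] = partial[0];
            __syncthreads();  // keep partial[] intact until thread 0 has written it out
        }
        // The load vector Fe is reduced the same way (omitted for brevity).
        (void)Fe;
    }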

IV. NUMERICAL RESULTS

This strategy was implemented and run on an NVIDIA Tesla C1060 with the NVIDIA CUDA compiler (release 3.2, V0.2.1221). In the result table, meshes are detailed by their polynomial order, p, the number of elements in the cardinal directions, Nx, Ny, and Nz, as well as the total number of degrees of freedom, Ndof. The execution times are measured excluding the time for data transfer between the CPU and the GPU. The emphasis of isogeometric analysis is that while the discretizations vary in polynomial order and number of elements, all exactly represent the cylinder geometry. Tab. I shows up to an 11-fold speedup compared to the publicly distributed implementation on a single core of the CPU (an Intel Xeon E5405, 2.00 GHz). In order to maximize the thread occupancy, we split the computation of one element into two blocks only for p = 4. Note that the relatively low speedup for p = 4 is caused by a higher local memory requirement and by the number of Gauss points (5^3), which is not divisible by the size of a half-warp (16) and therefore loses performance in the reduction.

TABLE I
TIME COMPARISON FOR STIFFNESS MATRIX ASSEMBLY (TIME IN SEC.).

p   Nx   Ny   Nz   Ndof     CPU      GPU     Speedup
2   32   16    8    4096      6.19    0.58   10.7
2   32   16   16    8192     12.40    1.16   10.7
2   64   16   16   16384     49.43    4.60   10.7
3   16   16    8    2048     14.85    1.56    9.5
3   32   16    8    4096     29.70    3.11    9.5
3   32   16   16    8192     59.35    6.22    9.5
4    8    8    8     512     26.27    3.10    8.5
4   16    8    8    1024     52.55    6.06    8.7
4   16   16    8    2048    105.47   12.08    8.7

V. CONCLUSION

We introduced isogeometric analysis in the context of linear elasticity. We described algorithmic changes that must take place in an isogeometric code to benefit from the GPU. We detailed our approach and presented numerical results which show decent speedups on a series of discretizations representing both p- and h-refinements. The approach is valuable beyond linear elasticity; only the GPU code which assembles the local stiffness matrix and load vector must be changed.

We believe that there is a lot to be gained by fully assembling the global matrix on the GPU. Isogeometric local stiffness matrices overlap each other to a greater extent than in conventional finite element analysis due to the higher continuity of the basis. This means that a large savings in the amount of memory can be realized by directly assembling local contributions into the global matrix. This also complicates the code because additional race conditions must be avoided. Once the global assembly is performed on the GPU, we plan to also solve the system there using a sparse iterative method.

ACKNOWLEDGMENTS

This research was supported in part by NSF grants 1018072 and 1018079 and Award No. KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST).

REFERENCES

[1] T. J. R. Hughes, J. Cottrell, and Y. Bazilevs, "Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement," Computer Methods in Applied Mechanics and Engineering, vol. 194, pp. 4135–4195, 2005.
[2] M. J. Borden, M. A. Scott, J. A. Evans, and T. J. R. Hughes, "Isogeometric finite element data structures based on Bézier extraction of NURBS," International Journal for Numerical Methods in Engineering, 2010.
[3] Y. Bazilevs, V. M. Calo, Y. Zhang, and T. J. R. Hughes, "Isogeometric fluid-structure interaction analysis with applications to arterial blood flow," Computational Mechanics, vol. 38, p. 310, 2006.
[4] Y. Bazilevs, V. M. Calo, T. J. R. Hughes, and Y. Zhang, "Isogeometric fluid-structure interaction: theory, algorithms, and computations," Computational Mechanics, vol. 43, pp. 3–37, 2008.
[5] J. Cottrell, A. Reali, Y. Bazilevs, and T. J. R. Hughes, "Isogeometric analysis of structural vibrations," Computer Methods in Applied Mechanics and Engineering, vol. 195, no. 41-43, p. 5257, 2006.
[6] H. Gomez, V. M. Calo, Y. Bazilevs, and T. J. R. Hughes, "Isogeometric analysis of the Cahn-Hilliard phase-field model," Computer Methods in Applied Mechanics and Engineering, vol. 197, no. 49-50, pp. 4333–4352, 2008.
[7] H. Gomez, T. J. R. Hughes, X. Nogueira, and V. M. Calo, "Isogeometric analysis of the isothermal Navier-Stokes-Korteweg equations," Computer Methods in Applied Mechanics and Engineering, vol. 199, p. 1828, 2010.
[8] L. Dedè, T. J. R. Hughes, S. Lipton, and V. M. Calo, "Structural topology optimization with isogeometric analysis in a phase field approach," in USNCTAM2010, 16th US National Congress of Theoretical and Applied Mechanics, 2010.
[9] W. A. Wall, M. A. Frenzel, and C. Cyron, "Isogeometric structural shape optimization," Computer Methods in Applied Mechanics and Engineering, vol. 197, no. 33-40, pp. 2976–2988, 2008.
[10] Y. Zhang, Y. Bazilevs, S. Goswami, C. L. Bajaj, and T. J. R. Hughes, "Patient-specific vascular NURBS modeling for isogeometric analysis of blood flow," Computer Methods in Applied Mechanics and Engineering, vol. 196, no. 29-30, pp. 2943–2959, 2007.
[11] J. A. Cottrell, T. J. R. Hughes, and Y. Bazilevs, Isogeometric Analysis: Toward Unification of CAD and FEA. John Wiley and Sons, 2009.
[12] D. Komatitsch, D. Michéa, and G. Erlebacher, "Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA," Journal of Parallel and Distributed Computing, vol. 69, no. 5, pp. 451–460, 2009.
[13] A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka, "Bandwidth intensive 3-D FFT kernel for GPUs using CUDA," Conference on High Performance Networking and Computing, 2008.
[14] S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA," in Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 2008, pp. 73–82.
[15] S. Krakiwsky, L. Turner, and M. Okoniewski, "Acceleration of finite-difference time-domain (FDTD) using graphics processor units (GPU)," vol. 2, Jun. 2004, pp. 1033–1036.
[16] P. Micikevicius, "3D finite difference computation on GPUs using CUDA," in GPGPU-2: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units. New York, NY, USA: ACM, 2009, pp. 79–84.
[17] J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, "Sparse matrix solvers on the GPU: conjugate gradients and multigrid," ACM Trans. Graph., vol. 22, pp. 917–924, July 2003.
[18] C. Cecka, A. J. Lew, and E. Darve, "Assembly of finite element methods on graphics processors," International Journal for Numerical Methods in Engineering, 2010.
[19] C. C. Douglas, H. Lee, G. Haase, M. Liebmann, V. Calo, and N. Collier, "Parallel algebraic multigrid method with GP-GPU hardware acceleration," 2010, submitted.
[20] "NURBS code," http://users.ices.utexas.edu/~evans/isogeometric/nurbs.zip, Jan. 2011.
[21] M. Harris, "Optimizing parallel reduction in CUDA," White paper, NVIDIA Developer Technology. [Online]. Available: http://developer.nvidia.com
[22] M. Harris, S. Sengupta, and J. D. Owens, "Parallel prefix sum (scan) with CUDA," in GPU Gems 3, H. Nguyen, Ed. Addison Wesley, August 2007.

GPU Accelerated Scientific Computing: Evaluation of the NVIDIA Fermi Architecture; Elementary Kernels and Linear Solvers

Hartwig Anzt, Tobias Hahn, Vincent Heuveline and Björn Rocker
Engineering Mathematics and Computing Lab (EMCL)
Karlsruhe Institute of Technology (KIT), Germany
{hartwig.anzt, tobias.hahn, vincent.heuveline, bjoern.rocker}@kit.edu

Abstract—This study compares the latest GPU generation of NVIDIA, named "Fermi", to the previous generation with respect to their performance in scientific computing. Both the consumer version of the hardware, GeForce GTX480 and GTX280, as well as the professional line, Tesla C2050 and C1060, are taken into account. The experiments include benchmarks of elementary kernels as well as of linear solvers applied to problems arising in the area of computational fluid dynamics. The study shows a raw performance gain of up to 50 % for the Fermi generation, while the GPU memory technology plays a central role for overall performance and energy efficiency in more data-dependent applications.

I. INTRODUCTION

Recently, the number of users and lines of code taking advantage of the computational power of accelerators, especially GPUs, grew enormously. One reason is the facilitated programmability of GPUs by OpenCL and NVIDIA's CUDA. As early as 2003, several papers described the solution of the Navier-Stokes equations for incompressible fluid flow on GPUs [1], [2] or other boundary value problems [4]. With the introduction of full double precision support on GPUs, many more scientific projects started porting their algorithms to GPU hardware.

An analysis of a meteorological simulation for tropical cyclones based on finite differences and an implementation of rigid particle flows using finite element techniques on GPU hardware can be found in [3]. As these studies demonstrate, hardware-aware numerical mathematics is a research area of high potential. Software and hardware have to develop hand-in-hand to yield highest performance and allow simulation and optimization algorithms to reach higher levels of detail and accuracy, a fact that is reflected in the development of our in-house finite element package HiFlow [11], which is able to use different kinds of accelerators.

In fall 2009 NVIDIA released their new chip architecture "Fermi" [7]. This study compares the performance of this new architecture to former generations, indicating the path of GPU development and its implications for scientific computing. In this paper, we benchmark basic BLAS routines for evaluating the raw chip performance of the GPUs. In order to measure the impact of the on-board memory technology during the interplay of different kernels, we also present performance results of a CG solver for a sparse system resulting from a finite difference discretization of the Laplace equation. Larger parallel applications often need to communicate with the host, which is why we finally include results of a mixed precision GMRES solver that is partly executed on the CPU.

II. HARDWARE AND SOFTWARE ENVIRONMENT

We have chosen four graphics cards in total for this evaluation, two of the current and two of the former chip generation, where of each generation one is from the NVIDIA professional line (Tesla) and one from the consumer line (GeForce). The chip and on-board memory specifications are given in table I. The main improvements over the first generation are an increased number of single (MADD) and double precision units and the introduction of GDDR5 memory with significantly higher clock rates. The latter is the reason for a 25 % higher energy consumption of the Tesla C2050 compared to the C1060.

Access to the graphics chips is currently only possible via PCIe, which is why the whole system configuration has to be taken into account when interpreting benchmark results. Not all tests could be conducted on site, so the system characteristics differ. The Tesla cards were benchmarked on Xeon systems that achieved similar PCIe saturations as the Core i7 system with the GTX280. Only the device-to-host rates differ a little for the non-pageable memory we are using for our tests. The GTX480 could only be tested on an older workstation that merely achieved about half of the PCIe transfer rates. Details on the systems are given in table II, together with measured memory transfer rates and their saturation when performing operations with large vectors.

III. NUMERICAL EXPERIMENTS

With the above mentioned introduction of built-in double precision support and, furthermore, IEEE 754 compatibility, GPUs evolve towards universally usable processing units. Still, their paradigm is related to former graphics processing: the same series of operations is applied to every element of a set of data (i.e. a stream). Operations of a kernel are pipelined, such that many stream processors can process the stream in parallel. The limiting factor in this context is memory latency, especially when data dependency is high and data locality is low. GPUs try to hide memory latency by executing many kernel instances in parallel on the same core.

TABLE I
KEY SYSTEM CHARACTERISTICS OF THE FOUR GPUS USED. COMPUTATION RATE AND MEMORY BANDWIDTH ARE THEORETICAL PEAK VALUES.

Name                 Tesla C2050    Tesla C1060    GTX480         GTX280
Chip                 T20            T10            GF100          GT200
Transistors          3 x 10^9       1.4 x 10^9     3 x 10^9       1.4 x 10^9
Core frequency       1.15 GHz       1.3 GHz        1.4 GHz        1.3 GHz
ALUs (MADD)          448            240            480            240
GFLOPS (single)      1030           933            1345           933
GFLOPS (double)      515            78             168            78
Memory               3 GB GDDR5     4 GB GDDR3     1.5 GB GDDR5   1 GB GDDR3
Memory frequency     1.5 GHz        0.8 GHz        1.8 GHz        1.1 GHz
Memory bandwidth     144 GB/s       102 GB/s       177 GB/s       141 GB/s
ECC memory           yes            no             no             no
Power consumption    247 W          187 W          250 W          236 W
IEEE double/single   yes/yes        yes/partial    yes/yes        yes/partial

TABLE II
SYSTEMS' CONFIGURATIONS. THE ABBREVIATIONS ARE AS FOLLOWS: MEM IS THE AMOUNT OF MEMORY, BW THE BANDWIDTH, H2D DENOTES THE HOST-TO-DEVICE BANDWIDTH VIA PCIE AND D2H THE OTHER TRANSFER DIRECTION, CC IS THE 'CUDA COMPUTE CAPABILITY', AND ECC DENOTES THE AVAILABILITY OF ERROR-CORRECTING MEMORY. PA MEANS PAGEABLE MEMORY IS ALLOCATED, PI DENOTES THE USAGE OF PINNED MEMORY.

Host: 2 x Intel Xeon (E5520, 4 cores), MEM 32 GB, BW 12.07 GB/s, H2D PA 3.25 / PI 5.86 GB/s
Device: Tesla T20, MEM 3 GB, BW BT 91.28, daxpy 82.5, ddot 88.3 GB/s, D2H PA 2.51 / PI 4.75 GB/s, CC 2.0, ECC yes

Host: 2 x Intel Xeon (E5450, 4 cores), MEM 16 GB, BW 6.14 GB/s, H2D PA 1.92 / PI 5.44 GB/s
Device: Tesla T10, MEM 4 GB, BW BT 71.80, daxpy 83.1, ddot 83.3 GB/s, D2H PA 1.55 / PI 3.77 GB/s, CC 1.3, ECC no

Host: 1 x Intel Core2 (6600, 2 cores), MEM 2 GB, BW 3.28 GB/s, H2D PA 1.76 / PI 2.57 GB/s
Device: GTX480, MEM 1.5 GB, BW BT 108.56, daxpy 135.0, ddot 146.7 GB/s, D2H PA 1.38 / PI 1.82 GB/s, CC 2.0, ECC no

Host: 1 x Intel Core i7 (920, 4 cores, SMT on), MEM 6 GB, BW 12.07 GB/s, H2D PA 5.08 / PI 5.64 GB/s
Device: GTX280, MEM 1 GB, BW BT 111.54, daxpy 124.3, ddot 94.81 GB/s, D2H PA 2.75 / PI 5.31 GB/s, CC 1.3, ECC no
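The pageable (PA) versus pinned (PI) transfer rates in Table II can be reproduced with a small CUDA runtime test along the following lines; this is our own sketch, and the buffer size and repetition count are arbitrary choices rather than the settings used for the table.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Measure host-to-device bandwidth for a given host buffer (pageable or pinned).
    static float h2d_gbps(void* hbuf, void* dbuf, size_t bytes, int reps) {
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        for (int i = 0; i < reps; ++i)
            cudaMemcpy(dbuf, hbuf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
        return (float)(bytes * (double)reps) / (ms * 1.0e6);   // GB/s
    }

    int main() {
        const size_t bytes = 64u << 20;   // 64 MiB per transfer, arbitrary
        const int reps = 20;
        void *pageable = malloc(bytes), *pinned = nullptr, *dev = nullptr;
        cudaMallocHost(&pinned, bytes);   // page-locked (pinned) host memory
        cudaMalloc(&dev, bytes);
        printf("H2D pageable: %.2f GB/s\n", h2d_gbps(pageable, dev, bytes, reps));
        printf("H2D pinned:   %.2f GB/s\n", h2d_gbps(pinned,   dev, bytes, reps));
        cudaFree(dev);
        cudaFreeHost(pinned);
        free(pageable);
        return 0;
    }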

Switching these lightweight "threads" and operating on other register sets can be done in just a few cycles, whereas the cost of fetching data from the global memory extends over several hundreds of cycles. While the problem described above is often inherent to many-core computing, other restrictions of stream processing techniques have been addressed in CUDA [7], which offers e.g. gather and scatter operations on the global graphics memory. The chips evaluated here can all be programmed with slightly extended C and runtime libraries, including hardware support for double precision, even obeying IEEE 754 completely in the newest generation. As outlined in section I, we perform benchmarks for some elementary kernels in a first step, namely dot products, vector updates, scalar products, and matrix-vector and matrix-matrix operations, both in single and double precision. As our background is numerical simulation and optimization, our goal is to solve problems applying sophisticated mathematical methods. Hence, we rather aim at future compatibility than at optimizing low-level routines for single hardware generations. The tests are thus conducted using the same CUBLAS 3.0 routines, provided by NVIDIA.

In the second step, we evaluate the performance of a CG solver applied to a stencil discretization of the Laplace equation, where we use our own code for the sparse matrix routines. Finally, we apply mixed precision iterative refinement solvers to linear systems arising in the field of fluid dynamics. These implementations use the GPU as well as the CPU of the host system, enabling us to give a performance evaluation of a more representative application.

A. Elementary Kernels Performance Evaluation

Figures 1 to 5 show the benchmark results of elementary kernels. The operations on vectors are clearly memory bound, as the number of computations is low and data cannot be re-used. The performance gain of 30 % on average from one generation to the other is thus mainly due to the improved on-board memory technology. Also, the difference from the consumer to the professional line is proportional to the memory clock difference. In the case of the GTX480/C2050, the factor is even larger, possibly due to the ECC protection.

The single-precision matrix-vector benchmarks show speed-ups proportional to the increased MADD number. The same applies to the single-precision matrix-matrix benchmarks, where each card achieves about one third of its theoretical peak performance.
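The benchmarks themselves call the CUBLAS 3.0 routines; purely to illustrate how achieved-bandwidth figures for a memory-bound vector update are obtained (compare the GB/s rows of Table IV below), a hand-written daxpy with CUDA event timing is sketched here. It is a stand-in for, not a copy of, the benchmark harness.

    #include <cstdio>
    #include <cuda_runtime.h>

    // y <- alpha * x + y (the memory-bound vector update benchmarked via CUBLAS in the paper)
    __global__ void daxpy_kernel(int n, double alpha, const double* x, double* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = alpha * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 24;            // arbitrary vector length
        double *x, *y;
        cudaMalloc(&x, n * sizeof(double));
        cudaMalloc(&y, n * sizeof(double));
        cudaMemset(x, 0, n * sizeof(double));
        cudaMemset(y, 0, n * sizeof(double));

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        daxpy_kernel<<<(n + 255) / 256, 256>>>(n, 2.0, x, y);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);

        // daxpy touches 2 reads + 1 write of 8 bytes per element; 2 flops per element.
        double gbps   = 3.0 * n * sizeof(double) / (ms * 1.0e6);
        double gflops = 2.0 * n / (ms * 1.0e6);
        printf("daxpy: %.3f ms, %.2f GB/s, %.2f GFLOPS\n", ms, gbps, gflops);
        return 0;
    }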

Fig. 5. Performance of the general matrix-matrix multiplication routine gemm performed in single (sgemm) and double precision (dgemm).

Most interesting are the double-precision results, where we observed two phenomena. Firstly, the C2050 cannot really stand out against the GTX480, although its theoretical peak performance is three times higher. Secondly, all but the C2050 almost reach their peak performance in the dgemm benchmark, while the C2050 drops to the ratio of one third, as for sgemm. A reason for this is not known to us; results provided by NVIDIA using the preliminary CUBLAS 3.1 on the C2050 are in the same range.

For easier understanding of these findings, table III shows selected results out of figures 1 to 5. Small data sizes are chosen in order to cover influences of latency, which seems to be equal for all cards. Larger data sizes measure the available bandwidth and computing power.

TABLE III
PERFORMANCE FOR THE ELEMENTARY KERNELS FOR SPECIAL DATA SIZES (GFLOPS).

Routine   Data size    C2050    C1060    GTX480   GTX280
sdot      185364       9.27     6.50     12.36    8.24
sdot      39903170     29.83    18.41    38.17    29.11
ddot      110218       5.38     3.50     6.68     4.16
ddot      39903170     18.79    11.03    19.38    12.48
snrm2     185364       7.72     4.88     9.76     7.41
snrm2     39903170     44.34    26.47    48.72    48.99
dnrm2     110218       3.06     1.79     3.39     1.79
dnrm2     39903170     22.74    13.49    33.22    19.22
saxpy     185364       11.96    8.62     14.83    11.23
saxpy     39903170     23.83    13.23    22.08    19.16
daxpy     110218       6.68     4.59     7.87     5.80
daxpy     39903170     12.54    6.92     11.33    10.52
sgemv     8192         58.19    34.63    68.72    40.49
dgemv     4096         25.39    14.74    29.69    19.52
sgemm     4096         330.17   367.61   430.40   368.75
dgemm     2048         174.21   74.40    161.97   73.76

B. CG Solver

The second benchmark stage focuses on an implementation of the conjugate gradients algorithm (see e.g. [5]) based on the CSR data format. The linear system is obtained from a finite element discretization of the Laplace equation on a unit square using linear test functions, which is equivalent to a finite differences discretization based on the 5-point stencil. The matrix has the following characteristics: 4,000,000 degrees of freedom (dofs) and 19,992,000 nonzero entries (nnz). All computations are executed exclusively on the GPU and are performed in double precision; the absolute stopping criterion for the residual is set to 10^-6. Again, we avoid optimizing for one specific hardware platform, but use the CUBLAS kernels in version CUBLAS 3.0 as much as possible. Since it does not contain any kernels for sparse matrix-vector operations, we use an implementation of dcsrgemv following the guidelines suggested in [6]. Optimizing the code for the different generations of hardware accelerator technology would result in higher performance, but at the same time limit the portability of the application and therefore make comparisons more difficult.

The performance results of the itemized kernels within the CG algorithm are presented in table IV; the complete runtime of the solver on the different GPUs is plotted in figure 6. Again we see unexpected results for the C2050. While the GTX480 clearly outperforms the former GTX280 model for ddot and achieves an overall speed-up of 10 %, the factor between C2050 and C1060 is only 1.05. Also, the consumer cards achieve overall speed-ups of more than 1.5 compared to the professional versions, whereby even the older GTX280 is faster than the C2050. This is especially surprising as the specifications in table I indicate the opposite.

TABLE IV
PERFORMANCE EVALUATION OF ELEMENTARY KERNELS OF THE CG ALGORITHM ON THE FOUR EVALUATED ACCELERATORS. ALL MEASUREMENTS ARE PERFORMED IN DOUBLE PRECISION.

Routine        Unit     C2050    C1060    GTX480   GTX280
ddot           ms       0.73     0.77     0.44     0.68
ddot           GB/s     88.28    83.33    146.78   94.81
ddot           GFLOPS   11.03    10.42    18.35    11.85
dscal+daxpy    ms       1.85     1.92     1.15     1.29
dscal+daxpy    GB/s     51.89    50.10    83.26    74.71
dscal+daxpy    GFLOPS   4.33     4.18     6.94     6.23
dcsrgemv       ms       18.70    19.59    11.53    13.15
dcsrgemv       GB/s     17.10    16.33    27.75    24.34
dcsrgemv       GFLOPS   2.14     2.04     3.47     3.04
daxpy          ms       1.16     1.15     0.71     0.77
daxpy          GB/s     82.55    83.41    135.02   124.35
daxpy          GFLOPS   6.88     6.95     11.25    10.36
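For orientation, a scalar CSR matrix-vector kernel (one thread per row) in the spirit of the guidelines in [6] is sketched below. This is our own minimal version; the dcsrgemv implementation benchmarked above is further tuned (e.g. the texture-cache variants shown in Fig. 6).

    // y <- A*x for A stored in CSR (row_ptr, col_idx, val); one thread per row.
    __global__ void dcsrgemv_scalar(int n_rows,
                                    const int* __restrict__ row_ptr,
                                    const int* __restrict__ col_idx,
                                    const double* __restrict__ val,
                                    const double* __restrict__ x,
                                    double* __restrict__ y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n_rows) {
            double sum = 0.0;
            for (int jj = row_ptr[row]; jj < row_ptr[row + 1]; ++jj)
                sum += val[jj] * x[col_idx[jj]];
            y[row] = sum;
        }
    }

    // Launch example (block size 256 is an assumption):
    //   dcsrgemv_scalar<<<(n_rows + 255) / 256, 256>>>(n_rows, row_ptr, col_idx, val, x, y);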

Fig. 6. Runtimes of the CG algorithm of the Laplace test for the four evaluated accelerators. tc results make use of texture caches, ntc does not.

C. Iterative Refinement Method

To be able to evaluate the computational power of the hardware platform in a more complex application, we use a GPU implementation of a plain GMRES-(30) solver and a mixed precision iterative refinement implementation based on the same solver. Mixed precision iterative refinement solvers use a reduced precision floating point format for the inner error correction solver, and therefore are able to exploit the often superior single precision performance of GPUs and the double precision performance of the CPU [8], [9], [10].

Both the plain double precision GMRES-(30) and the mixed precision variant solve for a right-hand side vector of all ones with initial guess zero, using the relative residual stopping criterion ε = 10^-10 ||r_0||_2, while we choose ε_inner = 10^-1 ||r_i||_2 as the relative inner stopping criterion for the error correction variant. In the case of the mixed precision iterative refinement implementation, the error correction solver is executed on the GPU, while the solution update is performed by the CPU of the host system. This enables us to address larger problem sizes, since the available memory on the GPU is usually small compared to the main host memory.

As test problems, we chose three systems of linear equations, CFD1, CFD2 and CFD3, originating from a 2D Venturi nozzle problem, discretized in different resolutions using Q2/Q1 finite elements. The distinct number of supporting points leads to different matrix characteristics in terms of dimension, sparsity, and condition number. When comparing the total computation time needed, as given in table V, the Tesla C2050 performs better for small problems than the C1060 and the GTX280, but not as well as the GTX480. For large problems, the C2050 system outperforms also the GTX480. The explanation for this resides in the employed host hardware setup for the GTX480, where the memory bandwidth between host and device (PCIe) becomes the limiting factor for large problem sizes.

Fig. 7. Sparsity plots and properties of the CFD test-matrices:
CFD1: 2D fluid flow, matrix dimension n = 395009, sparsity nnz = 3544321, storage format CSR.
CFD2: 2D fluid flow, matrix dimension n = 634453, sparsity nnz = 5700633, storage format CSR.
CFD3: 2D fluid flow, matrix dimension n = 1019967, sparsity nnz = 9182401, storage format CSR.
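The structure of the mixed precision scheme, an outer double precision correction loop wrapped around a reduced-precision inner solver, is sketched below in plain host code. The inner solver is only a callback placeholder standing in for the single precision GMRES-(30) error correction solver that runs on the GPU, and all names and the tolerance handling are illustrative assumptions.

    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <vector>

    // Sketch of mixed precision iterative refinement for A*x = b.
    // spmv  : double precision y = A*x
    // inner : approximate single precision solve of A*c = r (e.g. GMRES-(30) on the GPU)
    void mixed_precision_ir(int n,
                            const std::function<void(const std::vector<double>&, std::vector<double>&)>& spmv,
                            const std::function<void(const std::vector<float>&, std::vector<float>&)>& inner,
                            const std::vector<double>& b,
                            std::vector<double>& x,
                            double rel_eps, int max_iter)
    {
        std::vector<double> r(n), Ax(n);
        std::vector<float>  rs(n), cs(n);

        auto nrm2 = [](const std::vector<double>& v) {
            double s = 0.0; for (double vi : v) s += vi * vi; return std::sqrt(s);
        };

        spmv(x, Ax);
        for (int i = 0; i < n; ++i) r[i] = b[i] - Ax[i];
        const double tol = rel_eps * nrm2(r);                  // eps * ||r_0||_2

        for (int it = 0; it < max_iter && nrm2(r) > tol; ++it) {
            for (int i = 0; i < n; ++i) rs[i] = (float)r[i];   // demote residual
            inner(rs, cs);                                     // single precision error correction
            for (int i = 0; i < n; ++i) x[i] += (double)cs[i]; // solution update in double (CPU)
            spmv(x, Ax);
            for (int i = 0; i < n; ++i) r[i] = b[i] - Ax[i];   // fresh double precision residual
            printf("refinement step %d: ||r||_2 = %g\n", it + 1, nrm2(r));
        }
    }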

IV. REMARKS ON ENERGY EFFICIENCY AND ECC

Besides the computational performance, energy efficiency becomes more and more important for customers from academia and industry. In figure 8 we therefore present energy consumption estimates based on the theoretical peak power consumption of the accelerated systems given in table II and the runtime of our CG solver. In the case of the GTX480, the higher GPU performance does not come with a much higher energy consumption, resulting in fairly good results. The unexpectedly bad performance of the C2050 in the CG test and its much higher energy consumption compared to the older C1060 caused its fairly poor test results.

The remaining main advantage of the C2050 is thus the more reliable hardware, which does support ECC. As this already requires lower clock rates, we decided to evaluate the necessity of using error correcting code at all. We therefore implemented a test program performing matrix-matrix and matrix-vector multiplications using the CUBLAS routines over and over. The tests ran in parallel to a CPU-only implementation for a time period of seven days. During this test, we did not observe a single cache bit error, neither on the C1060 nor on the GTX480. This indicates that also the cards without ECC are very reliable, though we cannot simulate data center production operation. Of course, for critical applications requiring ECC, there is no alternative to the C2050.

Fig. 8. Energy consumption in Watt hours (Wh) of the Laplace test on the four evaluated accelerators. tc employs the texture cache, ntc does not.
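The estimates in Fig. 8 follow from multiplying the (peak) power draw by the measured runtime; a small helper for this conversion is shown below, where the power value is the C2050 board power from Table I and the runtime is a made-up placeholder rather than a measured number.

    #include <cstdio>

    // Energy estimate in watt hours from power in W and runtime in s.
    static double energy_wh(double power_w, double runtime_s) {
        return power_w * runtime_s / 3600.0;
    }

    int main() {
        // Illustrative only: 247 W from Table I, runtime value is a placeholder.
        printf("C2050, 100 s CG run: %.2f Wh\n", energy_wh(247.0, 100.0));
        return 0;
    }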
V. CONCLUSION

The current GPU generation offers enormous potential that can be utilized not only in synthetic examples, but also in CFD applications with academic and real-life background. An essential condition for this is that the underlying mathematical model, combined with the numerical schemes for solving it, offers enough parallelism to create a sufficient number of threads for the GPU to hide the waiting time for memory calls of one GPU thread by executing another thread. Exchanging threads is cheap compared to the time a global memory operation takes. The programmability of NVIDIA GPUs was heavily simplified by the introduction of CUDA compared to the most common former approaches. Due to this, significant performance increases can be achieved in very short time by e.g. extending existing applications with accelerated kernels in Fortran and/or C/C++, as the partially ported meteorological implementation in [3] shows. Highest performance is achieved when porting the code completely to the GPU and thereby avoiding host-device communication as much as possible.

The new generation of Tesla and GeForce accelerators based on the Fermi architecture offers an additional gain in computational performance already by looking at the theoretical values compared to the previous generation. For many kernels and applications based on such kernels, the performance can be used in practice, as this study demonstrates, though we did not observe the jump in double precision performance of the new Tesla line that we had hoped for. When memory bandwidth is the limiting factor, the speed-ups are lower again, due to the fact that the memory bandwidth did not increase at the same ratio as the ALU count (see table I). Still, the measured speed-ups are in a range of 1.2 for the CG algorithm, because of the newer memory standard (GDDR3 vs. GDDR5). Similar results can be achieved for the mixed precision iterative refinement solvers, though the speed-up decreases for larger dimensions, as then the PCIe memory bandwidth becomes the bottleneck. The energy efficiency tests reveal that the performance gain in terms of execution time of the new Tesla generation comes at the price of a significantly higher energy consumption. The consumer versions show much better test results in this category. In this respect, it should be mentioned that besides the floating point performance this paper focuses on, the resilience of hardware is equally important in cluster computing.

TABLE V
COMPUTATION TIME (S) FOR PROBLEMS CFD1, CFD2 AND CFD3 BASED ON A GMRES-(30).

Problem   Solver type                    C2050    C1060     GTX480    GTX280
CFD1      plain double GMRES-(30)        164.84   252.74    145.23    183.37
CFD1      mixed precision GMRES-(30)     80.48    129.19    60.98     98.46
CFD2      plain double GMRES-(30)        473.38   778.75    456.17    518.49
CFD2      mixed precision GMRES-(30)     273.99   510.38    256.43    301.41
CFD3      plain double GMRES-(30)        993.63   1921.64   1145.08   1046.49
CFD3      mixed precision GMRES-(30)     554.28   1555.36   669.57    697.12

The rather high performance of the consumer cards comes with an uncertainty in terms of correctness of long-lasting computations, as especially the memory components are not designed for reliability. The importance of ECC could not be evaluated conclusively in this study and is a topic of further investigation.

ACKNOWLEDGEMENT

The authors would like to thank Werner Augustin and Dimitar Lukarski from the Shared Research Group (SRG) [12] for their assistance while performing the benchmarks and for their contributions to the content of this paper.

REFERENCES

[1] Bolz, J., Farmer, I., Grinspun, E., Schröder, P.: Sparse matrix solvers on the GPU: conjugate gradients and multigrid. ACM Transactions on Graphics, vol. 22, 2003, pp. 917-924
[2] Krüger, J., Westermann, R.: Linear algebra operators for GPU implementation of numerical algorithms. ACM Transactions on Graphics, vol. 22, 2003, pp. 908-916
[3] Hahn, T., Heuveline, V., Rocker, B.: GPU-based Simulation of Particulate Flows with CUDA. Proceedings of the PARS Workshop 2009, German Informatics Society, 2009
[4] Goodnight, N., Lewin, G., Luebke, D., Skadron, K.: A multigrid solver for boundary-value problems using programmable graphics hardware. Eurographics/SIGGRAPH Workshop on Graphics Hardware, 2003, pp. 102-111
[5] Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, Philadelphia, PA, 2003
[6] Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, December 2008
[7] NVIDIA: NVIDIA's next generation CUDA compute architecture: Fermi, v1.1. Whitepaper (electronic), September 2009. www.nvidia.com/content/PDF/fermi white papers/NVIDIA Compute Architecture Whitepaper.pdf
[8] Anzt, H., Heuveline, V., Rocker, B.: Mixed Precision Error Correction Methods for Linear Systems: Convergence Analysis based on Krylov Subspace Methods. Proceedings of PARA 2010 State of the Art in Scientific and Parallel Computing, 2010
[9] Anzt, H., Rocker, B., Heuveline, V.: An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations. Proceedings of VECPAR 2010 High Performance Computing for Computational Science, 2010
[10] Anzt, H., Rocker, B., Heuveline, V.: Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms. Computer Science - Research and Development, Springer Berlin / Heidelberg, 2010
[11] Heuveline, V. et al.: HiFlow - A Flexible and Hardware-Aware Parallel Finite Element Package. EMCL Preprint Series, 2010. www.emcl.kit.edu/preprints/emcl-preprint-2010-06.pdf
[12] Shared Research Group (SRG), Karlsruhe Institute of Technology (KIT), http://www.numhpc.math.kit.edu

List of Authors

Ahmadia, Aron ...... 41 Anzt, Hartwig ...... 47 Augustin, Werner ...... 1

Calo, Victor M...... 41 Collier, Nathan ...... 41

Douglas, Craig C...... 41

Hahn, Tobias ...... 47 Heuveline, Vincent ...... 1,47

Jahr, Ralf ...... 9 Juurlink, Ben ...... 33

Köstler, Harald ...... 17 Krishna, Anil ...... 25

Lee, Hyoseop ...... 41

Rathgeber, Florian ...... 17 Rocker, Björn ...... 47

Samih, Ahmad ...... 25 Shehan, Basher ...... 9 Solihin, Yan ...... 25 Stürmer, Markus ...... 17

Thomas, Gervin ...... 33 Tutsch, Dietmar ...... 33

Uhrig, Sascha ...... 9 Ungerer, Theo ...... 9

Weiß, Jan-Philipp ...... 1

Financial support

The Shared Research Group (SRG) 16-1 on New Frontiers in High Performance Computing Exploiting Multicore and Coprocessor Technology is a joint initiative of Karlsruhe Institute of Technology and Hewlett-Packard. The SRG receives grants by the Concept for the Future of Karlsruhe Institute of Technology in the framework of the German Excellence Initiative and by the industrial collaboration partner Hewlett-Packard. The present proceedings of the Second International Workshop on New Frontiers in High-performance and Hardware-aware Computing are kindly sponsored by the SRG.
