Design of a Modern Distributed and Accelerated Linear Algebra Library

Total Page:16

File Type:pdf, Size:1020Kb

Design of a Modern Distributed and Accelerated Linear Algebra Library SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library Mark Gates Jakub Kurzak Ali Charara [email protected] [email protected] [email protected] University of Tennessee University of Tennessee University of Tennessee Knoxville, TN Knoxville, TN Knoxville, TN Asim YarKhan Jack Dongarra [email protected] [email protected] University of Tennessee University of Tennessee Knoxville, TN Knoxville, TN Oak Ridge National Laboratory University of Manchester ABSTRACT USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3295500. The SLATE (Software for Linear Algebra Targeting Exascale) library 3356223 is being developed to provide fundamental dense linear algebra capabilities for current and upcoming distributed high-performance systems, both accelerated CPU–GPU based and CPU based. SLATE will provide coverage of existing ScaLAPACK functionality, includ- ing the parallel BLAS; linear systems using LU and Cholesky; least 1 INTRODUCTION squares problems using QR; and eigenvalue and singular value The objective of SLATE is to provide fundamental dense linear problems. In this respect, it will serve as a replacement for Sca- algebra capabilities for the high-performance computing (HPC) LAPACK, which after two decades of operation, cannot adequately community at large. The ultimate objective of SLATE is to replace be retrofitted for modern accelerated architectures. SLATE uses ScaLAPACK [7], which has become the industry standard for dense modern techniques such as communication-avoiding algorithms, linear algebra operations in distributed memory environments. Af- lookahead panels to overlap communication and computation, and ter two decades of operation, ScaLAPACK is past the end of its life task-based scheduling, along with a modern C++ framework. Here cycle and overdue for a replacement, as it can hardly be retrofitted we present the design of SLATE and initial reports of several of its to support hardware accelerators, which are an integral part of components. today’s HPC hardware infrastructure. In the past two decades, HPC has witnessed tectonic shifts in hardware technology, software CCS CONCEPTS technology, and a plethora of algorithmic innovations in scientific • Mathematics of computing → Computations on matrices; computing that are not reflected in ScaLAPACK. At the same time, • Computing methodologies → Massively parallel algorithms; while several libraries provide functionality similar to ScaLAPACK, Shared memory algorithms; no replacement for ScaLAPACK has emerged to gain widespread adoption in applications. SLATE aims to be this replacement, boast- KEYWORDS ing superior performance and scalability in modern, heterogeneous, dense linear algebra, distributed computing, GPU computing distributed-memory HPC environments. Primarily, SLATE aims to extract the full performance potential ACM Reference Format: and maximum scalability from modern, many-node HPC machines Mark Gates, Jakub Kurzak, Ali Charara, Asim YarKhan, and Jack Dongarra. with large numbers of cores and multiple hardware accelerators 2019. SLATE: Design of a Modern Distributed and Accelerated Linear Alge- per node. For typical dense linear algebra workloads, this means bra Library. In The International Conference for High Performance Computing, getting close to the theoretical peak performance and scaling to Networking, Storage, and Analysis (SC ’19), November 17–22, 2019, Denver, CO, the full size of the machine (i.e., thousands to tens of thousands of nodes). Permission to make digital or hard copies of all or part of this work for personal or SLATE currently implements parallel BLAS (PBLAS); linear sys- classroom use is granted without fee provided that copies are not made or distributed tems using partial pivoted LU, Cholesky, and block Aasen’s for for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM symmetric indefinite systems [6]; mixed-precision iterative refine- must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, ment; least squares using communication-avoiding QR [15]; and ma- to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. trix inversion. We are actively developing communication-avoiding SC ’19, November 17–22, 2019, Denver, CO, USA two-stage symmetric eigenvalue and singular value decompositions. © 2019 Association for Computing Machinery. Future work will include non-symmetric eigenvalue problems and ACM ISBN 978-1-4503-6229-0/19/11...$15.00 randomized algorithms. https://doi.org/10.1145/3295500.3356223 SC ’19, November 17–22, 2019, Denver, CO, USA Gates, Kurzak, Charara, YarKhan, Dongarra 1.1 Motivation derived communication-avoiding algorithms for QR and LU factor- In the United States, the plan for achieving exascale relies heavily ization [15], as well as for matrix-matrix multiplication [14, 42]. on the use of GPU accelerated machines, similar to the Summit 1 There are also numerous node-level linear algebra libraries. Eigen and Sierra 2 systems at Oak Ridge National Laboratory (ORNL) [32] and Armadillo [41] use C++ templates extensively to make and Lawrence Livermore National Laboratory (LLNL), respectively, precision-independent code. MAGMA [16] provides accelerated which currently occupy positions #1 and #2 on the TOP500 list. linear algebra, with extensive coverage of LAPACK’s functionality. The urgency of developing multi-GPU accelerated, distributed- ViennaCL [40] provides accelerated linear algebra, but lacks most memory software is underscored by the architectures of these sys- dense linear algebra aside from BLAS and no-pivoting LU. NVIDIA tems. The Summit system contains three NVIDIA V100 GPUs per cuBLAS [38] and AMD rocBLAS [4] provide native BLAS opera- POWER9 CPU. The peak double precision floating point perfor- tions on GPU accelerators. Vendor libraries such as Cray Scientific mance of the CPUs is 22 cores × 24:56 GFLOP/s = 0:54 TFLOP/s. Libraries (LibSci) [13] and IBM Engineering and Scientific Sub- The peak performance of the GPUs is 3 devices × 7:8 TFLOP/s = routine Library (ESSL) [29] provide accelerated versions of BLAS 23:4 TFLOP/s. That is, 97:7% of performance is on the GPU side, and LAPACK for GPU accelerators. These accelerated node-level and only 2:3% of performance is on the CPU side. This means that, libraries may give some limited benefit to ScaLAPACK, which is for all practical purposes, the CPUs can be ignored as floating point built on top of BLAS and LAPACK. However, this is limited by hardware, with their main purpose being to keep the GPUs busy. the bulk-synchronous, fork-join design of ScaLAPACK. The Intel Math Kernels Library (MKL) [31] provides BLAS, LAPACK, and 1.2 Related work ScaLAPACK for CPUs and Xeon Phi accelerators. Numerous libraries provide dense linear algebra routines. Since its initial release nearly 30 years ago, LAPACK [5] has become the 1.3 Goals de facto standard library for dense linear algebra on a single node. Ultimately, a distributed linear algebra library built from the ground It leverages vendor-optimized BLAS for node-level performance, up to support accelerators, and incorporating state-of-the-art par- including shared-memory parallelism. ScaLAPACK [7] built upon allel algorithms, will give the best performance. In this manuscript, LAPACK by extending its routines to distributed computing, relying we describe how we have designed SLATE to achieve this, and on both the Parallel BLAS (PBLAS) and explicit distributed-memory describe how it fulfills the following design goals. parallelism. Some attempts [18, 19] have been made to adapt the Targets modern HPC hardware consisting of a large num- ScaLAPACK library for accelerators, but these efforts have shown ber of nodes with multicore processors and several hardware the need for a new framework. accelerators per node. More recently, the DPLASMA [9] and Chameleon [2, 3] libraries Achieves portable high performance by relying on vendor both build a task dependency graph and launch tasks as their depen- optimized standard BLAS, batch BLAS, and LAPACK, and dencies are fulfilled. This eliminates the artificial synchronizations standard parallel programming technologies such as MPI inherent in ScaLAPACK’s design, and allows for overlap of commu- and OpenMP. Using the OpenMP runtime puts less of a nication and computation. DPLASMA relies on the PaRSEC runtime burden on applications to integrate SLATE than adopting a to schedule tasks, while Chameleon uses the StarPU runtime. If proprietary runtime would. the dataflow is known at compile time, an algorithm can beex- Provides scalability by employing proven techniques in dense pressed using the efficient and scalable Parameterized Task Graph linear algebra, such as 2D block cyclic data distribution and (PTG) interface. Algorithms that make runtime decisions based on communication-avoiding algorithms, as well as modern par- data, such as partial pivoting in LU, are easier to express using the allel programming approaches, such as dynamic scheduling less efficient but more intuitive Insert Task interface, similar to and communication overlapping. OpenMP tasks. Both DPLASMA and Chameleon are accelerated Facilitates productivity by relying on the intuitive Single on GPU-based architectures. Instead of leveraging C++ templates, Program, Multiple Data (SPMD) programming model and they auto-generate all precisions
Recommended publications
  • Jack Dongarra: Supercomputing Expert and Mathematical Software Specialist
    Biographies Jack Dongarra: Supercomputing Expert and Mathematical Software Specialist Thomas Haigh University of Wisconsin Editor: Thomas Haigh Jack J. Dongarra was born in Chicago in 1950 to a he applied for an internship at nearby Argonne National family of Sicilian immigrants. He remembers himself as Laboratory as part of a program that rewarded a small an undistinguished student during his time at a local group of undergraduates with college credit. Dongarra Catholic elementary school, burdened by undiagnosed credits his success here, against strong competition from dyslexia.Onlylater,inhighschool,didhebeginto students attending far more prestigious schools, to Leff’s connect material from science classes with his love of friendship with Jim Pool who was then associate director taking machines apart and tinkering with them. Inspired of the Applied Mathematics Division at Argonne.2 by his science teacher, he attended Chicago State University and majored in mathematics, thinking that EISPACK this would combine well with education courses to equip Dongarra was supervised by Brian Smith, a young him for a high school teaching career. The first person in researcher whose primary concern at the time was the his family to go to college, he lived at home and worked lab’s EISPACK project. Although budget cuts forced Pool to in a pizza restaurant to cover the cost of his education.1 make substantial layoffs during his time as acting In 1972, during his senior year, a series of chance director of the Applied Mathematics Division in 1970– events reshaped Dongarra’s planned career. On the 1971, he had made a special effort to find funds to suggestion of Harvey Leff, one of his physics professors, protect the project and hire Smith.
    [Show full text]
  • A Peer-To-Peer Framework for Message Passing Parallel Programs
    A Peer-to-Peer Framework for Message Passing Parallel Programs Stéphane GENAUD a;1, and Choopan RATTANAPOKA b;2 a AlGorille Team - LORIA b King Mongkut’s University of Technology Abstract. This chapter describes the P2P-MPI project, a software framework aimed at the development of message-passing programs in large scale distributed net- works of computers. Our goal is to provide a light-weight, self-contained software package that requires minimum effort to use and maintain. P2P-MPI relies on three features to reach this goal: i) its installation and use does not require administra- tor privileges, ii) available resources are discovered and selected for a computation without intervention from the user, iii) program executions can be fault-tolerant on user demand, in a completely transparent fashion (no checkpoint server to config- ure). P2P-MPI is typically suited for organizations having spare individual comput- ers linked by a high speed network, possibly running heterogeneous operating sys- tems, and having Java applications. From a technical point of view, the framework has three layers: an infrastructure management layer at the bottom, a middleware layer containing the services, and the communication layer implementing an MPJ (Message Passing for Java) communication library at the top. We explain the de- sign and the implementation of these layers, and we discuss the allocation strategy based on network locality to the submitter. Allocation experiments of more than five hundreds peers are presented to validate the implementation. We also present how fault-management issues have been tackled. First, the monitoring of the infras- tructure itself is implemented through the use of failure detectors.
    [Show full text]
  • Wrangling Rogues: a Case Study on Managing Experimental Post-Moore Architectures Will Powell Jeffrey S
    Wrangling Rogues: A Case Study on Managing Experimental Post-Moore Architectures Will Powell Jeffrey S. Young Jason Riedy Thomas M. Conte [email protected] [email protected] [email protected] [email protected] School of Computational Science and Engineering School of Computer Science Georgia Institute of Technology Georgia Institute of Technology Atlanta, Georgia Atlanta, Georgia Abstract are not always foreseen in traditional data center environments. The Rogues Gallery is a new experimental testbed that is focused Our Rogues Gallery of novel technologies has produced research on tackling rogue architectures for the post-Moore era of computing. results and is being used in classes both at Georgia Tech as well as While some of these devices have roots in the embedded and high- at other institutions. performance computing spaces, managing current and emerging The key lessons from this testbed (so far) are the following: technologies provides a challenge for system administration that • Invest in rogues, but realize some technology may be are not always foreseen in traditional data center environments. short-lived. We cannot invest too much time in configuring We present an overview of the motivations and design of the and integrating a novel architecture when tools and software initial Rogues Gallery testbed and cover some of the unique chal- stacks may be in an unstable state of development. Rogues lenges that we have seen and foresee with upcoming hardware may be short-lived; they may not achieve their goals. Find- prototypes for future post-Moore research. Specifically, we cover ing their limits is an important aspect of the Rogues Gallery.
    [Show full text]
  • Simulating Low Precision Floating-Point Arithmetic
    SIMULATING LOW PRECISION FLOATING-POINT ARITHMETIC Higham, Nicholas J. and Pranesh, Srikara 2019 MIMS EPrint: 2019.4 Manchester Institute for Mathematical Sciences School of Mathematics The University of Manchester Reports available from: http://eprints.maths.manchester.ac.uk/ And by contacting: The MIMS Secretary School of Mathematics The University of Manchester Manchester, M13 9PL, UK ISSN 1749-9097 SIMULATING LOW PRECISION FLOATING-POINT ARITHMETIC∗ NICHOLAS J. HIGHAMy AND SRIKARA PRANESH∗ Abstract. The half precision (fp16) floating-point format, defined in the 2008 revision of the IEEE standard for floating-point arithmetic, and a more recently proposed half precision format bfloat16, are increasingly available in GPUs and other accelerators. While the support for low precision arithmetic is mainly motivated by machine learning applications, general purpose numerical algorithms can benefit from it, too, gaining in speed, energy usage, and reduced communication costs. Since the appropriate hardware is not always available, and one may wish to experiment with new arithmetics not yet implemented in hardware, software simulations of low precision arithmetic are needed. We discuss how to simulate low precision arithmetic using arithmetic of higher precision. We examine the correctness of such simulations and explain via rounding error analysis why a natural method of simulation can provide results that are more accurate than actual computations at low precision. We provide a MATLAB function chop that can be used to efficiently simulate fp16 and bfloat16 arithmetics, with or without the representation of subnormal numbers and with the options of round to nearest, directed rounding, stochastic rounding, and random bit flips in the significand. We demonstrate the advantages of this approach over defining a new MATLAB class and overloading operators.
    [Show full text]
  • R00456--FM Getting up to Speed
    GETTING UP TO SPEED THE FUTURE OF SUPERCOMPUTING Susan L. Graham, Marc Snir, and Cynthia A. Patterson, Editors Committee on the Future of Supercomputing Computer Science and Telecommunications Board Division on Engineering and Physical Sciences THE NATIONAL ACADEMIES PRESS Washington, D.C. www.nap.edu THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001 NOTICE: The project that is the subject of this report was approved by the Gov- erning Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engi- neering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for ap- propriate balance. Support for this project was provided by the Department of Energy under Spon- sor Award No. DE-AT01-03NA00106. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the organizations that provided support for the project. International Standard Book Number 0-309-09502-6 (Book) International Standard Book Number 0-309-54679-6 (PDF) Library of Congress Catalog Card Number 2004118086 Cover designed by Jennifer Bishop. Cover images (clockwise from top right, front to back) 1. Exploding star. Scientific Discovery through Advanced Computing (SciDAC) Center for Supernova Research, U.S. Department of Energy, Office of Science. 2. Hurricane Frances, September 5, 2004, taken by GOES-12 satellite, 1 km visible imagery. U.S. National Oceanographic and Atmospheric Administration. 3. Large-eddy simulation of a Rayleigh-Taylor instability run on the Lawrence Livermore National Laboratory MCR Linux cluster in July 2003.
    [Show full text]
  • Mathematics People
    NEWS Mathematics People or up to ten years post-PhD, are eligible. Awardees receive Braverman Receives US$1 million distributed over five years. NSF Waterman Award —From an NSF announcement Mark Braverman of Princeton University has been selected as a Prizes of the Association cowinner of the 2019 Alan T. Wa- terman Award of the National Sci- for Women in Mathematics ence Foundation (NSF) for his work in complexity theory, algorithms, The Association for Women in Mathematics (AWM) has and the limits of what is possible awarded a number of prizes in 2019. computationally. According to the Catherine Sulem of the Univer- prize citation, his work “focuses on sity of Toronto has been named the Mark Braverman complexity, including looking at Sonia Kovalevsky Lecturer for 2019 by algorithms for optimization, which, the Association for Women in Math- when applied, might mean planning a route—how to get ematics (AWM) and the Society for from point A to point B in the most efficient way possible. Industrial and Applied Mathematics “Algorithms are everywhere. Most people know that (SIAM). The citation states: “Sulem every time someone uses a computer, algorithms are at is a prominent applied mathemati- work. But they also occur in nature. Braverman examines cian working in the area of nonlin- randomness in the motion of objects, down to the erratic Catherine Sulem ear analysis and partial differential movement of particles in a fluid. equations. She has specialized on “His work is also tied to algorithms required for learning, the topic of singularity development in solutions of the which serve as building blocks to artificial intelligence, and nonlinear Schrödinger equation (NLS), on the problem of has even had implications for the foundations of quantum free surface water waves, and on Hamiltonian partial differ- computing.
    [Show full text]
  • Co 2019 Vivekanandan Balasubramanian ALL RIGHTS
    c 2019 Vivekanandan Balasubramanian ALL RIGHTS RESERVED A PROGRAMMING MODEL AND EXECUTION SYSTEM FOR ADAPTIVE ENSEMBLE APPLICATIONS ON HIGH PERFORMANCE COMPUTING SYSTEMS by VIVEKANANDAN BALASUBRAMANIAN A dissertation submitted to the School of Graduate Studies Rutgers, The State University of New Jersey In partial fulfillment of the requirements For the degree of Doctor of Philosophy Graduate Program in Electrical and Computer Engineering Written under the direction of Shantenu Jha and Matteo Turilli and approved by New Brunswick, New Jersey October, 2019 ABSTRACT OF THE DISSERTATION A programming model and execution system for adaptive ensemble applications on high performance computing systems by Vivekanandan Balasubramanian Dissertation Directors: Shantenu Jha and Matteo Turilli Traditionally, advances in high-performance scientific computing have focused on the scale, performance, and optimization of an application with a large, single task, and less on applications comprised of multiple tasks. However, many scientific problems are expressed as applications that require the collective outcome of one or more ensembles of computational tasks in order to provide insight into the problem being studied. Depending on the scientific problem, a task of an ensemble can be any type of a program: from a molecular simulation, to a data analysis or a machine learning model. With different communication and coordination patterns, both within and across ensembles, the number and type of applications that can be formulated as ensembles is vast and spans many scientific domains, including biophysics, climate science, polar science and earth science. The performance of ensemble applications can be improved further by using partial results to adapt the application at runtime. Partial results of ongoing executions can be analyzed with multiple methods to adapt the application to focus on relevant portions of the problem space or reduce the time to execution of the application.
    [Show full text]
  • Michel Foucault Ronald C Kessler Graham Colditz Sigmund Freud
    ANK RESEARCHER ORGANIZATION H INDEX CITATIONS 1 Michel Foucault Collège de France 296 1026230 2 Ronald C Kessler Harvard University 289 392494 3 Graham Colditz Washington University in St Louis 288 316548 4 Sigmund Freud University of Vienna 284 552109 Brigham and Women's Hospital 5 284 332728 JoAnn E Manson Harvard Medical School 6 Shizuo Akira Osaka University 276 362588 Centre de Sociologie Européenne; 7 274 771039 Pierre Bourdieu Collège de France Massachusetts Institute of Technology 8 273 308874 Robert Langer MIT 9 Eric Lander Broad Institute Harvard MIT 272 454569 10 Bert Vogelstein Johns Hopkins University 270 410260 Brigham and Women's Hospital 11 267 363862 Eugene Braunwald Harvard Medical School Ecole Polytechnique Fédérale de 12 264 364838 Michael Graetzel Lausanne 13 Frank B Hu Harvard University 256 307111 14 Yi Hwa Liu Yale University 255 332019 15 M A Caligiuri City of Hope National Medical Center 253 345173 16 Gordon Guyatt McMaster University 252 284725 17 Salim Yusuf McMaster University 250 357419 18 Michael Karin University of California San Diego 250 273000 Yale University; Howard Hughes 19 244 221895 Richard A Flavell Medical Institute 20 T W Robbins University of Cambridge 239 180615 21 Zhong Lin Wang Georgia Institute of Technology 238 234085 22 Martín Heidegger Universität Freiburg 234 335652 23 Paul M Ridker Harvard Medical School 234 318801 24 Daniel Levy National Institutes of Health NIH 232 286694 25 Guido Kroemer INSERM 231 240372 26 Steven A Rosenberg National Institutes of Health NIH 231 224154 Max Planck
    [Show full text]
  • CV January 2001 Geoffrey Charles
    Geoffrey Charles Fox Phone Email Cell: 8122194643 [email protected] Informatics: 8128567977 [email protected] Lab: 8128560927 [email protected] Home: 8123239196 Web Site: http://www.infomall.org Addresses ..................................................................................................... 2 Date of Birth ................................................................................................. 2 Education ...................................................................................................... 2 Professional Experience .................................................................................. 2 Awards and Honors ........................................................................................ 3 Board Membership ......................................................................................... 3 Industry Experience ....................................................................................... 3 Current Journal Editor..................................................................................... 3 Previous Journal Responsibility ........................................................................ 4 PhD Theses supervised ................................................................................... 4 Research Activities ......................................................................................... 7 Conferences Organized ................................................................................... 8 Publications: Introduction ..............................................................................
    [Show full text]
  • NHBS Backlist Bargains 2011 Main Web Page NHBS Home Page
    www.nhbs.com [email protected] T: +44 (0)1803 865913 The Migration Biochemistry How To Identify How and Why Collins Field Guide: The Iberian Lynx: Ecology of Birds £51.99 £30.50 Trees in Southen Species Multiply: Birds of the Extinction or Africa The Radiation of Palearctic - Non- Recovery? £83.00 £48.99 £14.99 £8.99 Darwin's Finches Passerines £6.99 £4.50 £27.95 £16.50 £24.99 £12.99 For Love of Insects Field Guide to the The Status and Atlas of Seeds and The Biology of A Primer of £18.95 £11.50 Birds of The Distribution of Fruits of Central African Savannahs Ecological Statistics Gambia and Dragonflies of the and East-European £34.95 £20.50 £32.99 £19.50 Senegal Mediterranean Flora £24.99 £14.99 Region £359.00 £209.99 £9.99 £6.50 Browse the subject pages Dear Customers, Mammals Birds Welcome to the 2011 Backlist Bargains Catalogue! Every year we offer you the Reptiles & Amphibians chance to update your library collections, top up on textbooks or explore new interests, at greatly reduced prices. This year we have nearly 5000 books at up Fishes to 50% off. Invertebrates Palaeontology You'll find books from across our range of scientific and environmental subjects, Marine & Freshwater Biology from heavyweight science and monographs to field guides and natural history General Natural History writing. Regional & Travel Botany & Plant Science Please enjoy browsing the catalogue. The NHBS Backlist Bargains sale ends March Animal & General Biology 31st 2011. Take advantage of these great discounts - Order Now! Evolutionary Biology Ecology Happy reading and buying, Habitats & Ecosystems Conservation & Biodiversity Nigel Massen Managing Director Environmental Science Physical Sciences Sustainable Development Using the Backlist Bargains Catalogue Data Analysis Reference Now that NHBS Catalogues are online there are many ways to access the title information that you need.
    [Show full text]
  • Studying the Regulatory Landscape of Flowering Plants
    Studying the Regulatory Landscape of Flowering Plants Jan Van de Velde Promoter: Prof. Dr. Klaas Vandepoele Co-Promoter: Prof. Dr. Jan Fostier Ghent University Faculty of Sciences Department of Plant Biotechnology and Bioinformatics VIB Department of Plant Systems Biology Comparative and Integrative Genomics Research funded by a PhD grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT Vlaanderen). Dissertation submitted in fulfilment of the requirements for the degree of Doctor in Sciences:Bioinformatics. Academic year: 2016-2017 Examination Commitee Prof. Dr. Geert De Jaeger (chair) Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University Prof. Dr. Klaas Vandepoele (promoter) Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University Prof. Dr. Jan Fostier (co-promoter) Faculty of Engineering and Architecture, Department of Information Technology (INTEC), Ghent University - iMinds Prof. Dr. Kerstin Kaufmann Institute for Biochemistry and Biology, Potsdam University Prof. Dr. Pieter de Bleser Inflammation Research Center, Flanders Institute of Biotechnology (VIB) and Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium Dr. Vanessa Vermeirssen Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University Dr. Stefanie De Bodt Crop Science Division, Bayer CropScience SA-NV, Functional Biology Dr. Inge De Clercq Department of Animal, Plant and Soil Science, ARC Centre of Excellence in Plant Energy Biology, La Trobe University and Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University iii Thank You! Throughout this PhD I have received a lot of support, therefore there are a number of people I would like to thank. First of all, I would like to thank Klaas Vandepoele, for his support and guidance.
    [Show full text]
  • Jack Dongarra University of Tennessee and Oak Ridge National Laboratory
    Seminario EEUU/Venezuela de Computación de Alto Rendimiento 2000 US/Venezuela Workshop on High Performance Computing 2000 - WoHPC 2000 4 al 6 de Abril del 2000 Puerto La Cruz Venezuela Jack Dongarra University of Tennessee and Oak Ridge National Laboratory http://www.cs.utk.edu/~dongarra/ 1 ♦ Look at trends in HPC Ø Top500 statistics ♦ NetSolve Ø Example of grid middleware ♦ Performance on today’s architecture Ø ATLAS effort ♦ Tools for performance evaluation Ø Performance API (PAPI) 2 - Started in 6/93 by JD,Hans W. Meuer and Erich Strohmaier - Basis for analyzing the HCP market - Quantify observations - Detection of trends (market, architecture, technology) 3 - Listing of the 500 most powerful Computers in the World - Yardstick: Rmax from LINPACK MPP Ax=b, dense problem TPP performance - Updated twice a year Rate Size SC‘xy in November Meeting in Mannheim in June - All data available from www.top500.org 4 ♦ Original classes of ♦ A way for machines tracking trends Ø Sequential Ø SMPs Øin performance Ø MPPs Ø in market Ø SIMDs Øin classes of ♦ Two new classes HPC systems Ø Beowulf-class ØArchitecture systems Ø ØTechnology Clustering of SMPs and DSMs ØRequires additional terminology 5 Ø “Constellation” ClusterCluster ofof ClustersClusters ♦ An ensemble of N nodes each comprising p computing elements ♦ The p elements are tightly bound shared memory (e.g. smp, dsm) ♦ The N nodes are loosely coupled, i.e.: distributed memory 4TF Blue Pacific SST 3 x 480 4-way SMP nodes ♦ p is greater than N 3.9 TF peak performance 2.6 TB memory ♦ Distinction
    [Show full text]