Download Extract
Total Page:16
File Type:pdf, Size:1020Kb
Full text available at: http://dx.doi.org/10.1561/1900000076 Database Systems on GPUs Full text available at: http://dx.doi.org/10.1561/1900000076 Other titles in Foundations and Trends® in Databases Machine Knowledge: Creation and Curation of Comprehensive Knowl- edge Bases Gerhard Weikum, Xin Luna Dong, Simon Razniewski and Fabian Suchanek ISBN: 978-1-68083-836-7 Cloud Data Services: Workloads, Architectures and Multi-Tenancy Vivek Narasayya and Surajit Chaudhuri ISBN: 978-1-68083-774-2 Data Provenance Boris Glavic ISBN: 978-1-68083-828-2 FPGA-Accelerated Analytics: From Single Nodes to Clusters Zsolt István, Kaan Kara and David Sidler ISBN: 978-1-68083-734-6 Distributed Learning Systems with First-Order Methods Ji Liu and Ce Zhango ISBN: 978-1-68083-700-1 Full text available at: http://dx.doi.org/10.1561/1900000076 Database Systems on GPUs Johns Paul National University of Singapore Shengliang Lu National University of Singapore Bingsheng He National University of Singapore [email protected] Boston — Delft Full text available at: http://dx.doi.org/10.1561/1900000076 Foundations and Trends® in Databases Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 United States Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is J. Paul, S. Lu and B. He. Database Systems on GPUs. Foundations and Trends® in Databases, vol. 11, no. 1, pp. 1–108, 2021. ISBN: 978-1-68083-849-7 © 2021 J. Paul, S. Lu and B. He All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on the internet at: www.copyright.com For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1 781 871 0245; www.nowpublishers.com; [email protected] now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail: [email protected] Full text available at: http://dx.doi.org/10.1561/1900000076 Foundations and Trends® in Databases Volume 11, Issue 1, 2021 Editorial Board Editor-in-Chief Joseph M. Hellerstein Surajit Chaudhuri University of California at Berkeley Microsoft Research, Redmond United States United States Editors Azza Abouzied NYU-Abu Dhabi Gustavo Alonso ETH Zurich Mike Cafarella University of Michigan Alan Fekete University of Sydney Ihab Ilyas University of Waterloo Andy Pavlo Carnegie Mellon University Sunita Sarawagi IIT Bombay Full text available at: http://dx.doi.org/10.1561/1900000076 Editorial Scope Topics Foundations and Trends® in Databases publishes survey and tutorial articles in the following topics: • Data Models and Query • Data Warehousing Languages • Adaptive Query Processing • Query Processing and • Data Stream Management Optimization • Search and Query Integration • Storage, Access Methods, and Indexing • XML and Semi-Structured Data • Transaction Management, Concurrency Control and • Web Services and Middleware Recovery • Data Integration and Exchange • Deductive Databases • Private and Secure Data • Parallel and Distributed Management Database Systems • Peer-to-Peer, Sensornet and • Database Design and Tuning Mobile Data Management • Metadata Management • Scientific and Spatial Data Management • Object Management • Data Brokering and • Trigger Processing and Active Publish/Subscribe Databases • Data Cleaning and Information • Data Mining and OLAP Extraction • Approximate and Interactive • Probabilistic Data Query Processing Management Information for Librarians Foundations and Trends® in Databases, 2021, Volume 11, 4 issues. ISSN paper version 1931-7883. ISSN online version 1931-7891. Also available as a combined paper and online subscription. Full text available at: http://dx.doi.org/10.1561/1900000076 Contents 1 Introduction3 2 GPU Hardware 8 2.1 GPU Hardware Architecture ................. 8 2.2 GPU Programming Model .................. 16 2.3 Performance Optimizations on GPU Programming ..... 18 3 A Brief on Database Systems on CPU 20 4 System Design on GPUs 25 4.1 Data Storage & Data Access ................ 25 4.2 Operators Design ...................... 31 4.3 Parallelism Granularities ................... 38 4.4 Summary ........................... 40 5 Query Processing on GPUs 41 5.1 Query Processing on a Single GPU ............. 41 5.2 Query Processing on Multiple Processors .......... 48 6 Systems 52 6.1 Systems in Academia .................... 52 6.2 Systems in Industry ..................... 61 6.3 Summary ........................... 71 Full text available at: http://dx.doi.org/10.1561/1900000076 7 Future Directions 73 7.1 Hardware trends....................... 74 7.2 Application Trends ...................... 78 7.3 Other development demands ................ 79 References 81 Full text available at: http://dx.doi.org/10.1561/1900000076 Database Systems on GPUs Johns Paul1, Shengliang Lu2 and Binsheng He3 1National University of Singapore 2National University of Singapore 3National University of Singapore; [email protected] ABSTRACT This article gives an overview of history and recent de- velopments in database systems on graphics processing units (GPUs). GPU, which was originally designed as a co-processor for rendering and graphics, has become a pow- erful, programmable, many-core processor in the past decade. As the GPU achieves much higher computation power and memory bandwidth than the CPU, GPU accelerations be- come an effective means to improve the performance of main memory databases. Database systems on GPUs have their root designs on traditional database systems on the CPU, but many GPU-optimized system designs have been introduced, ranging from data layouts, operator design to query processing and query optimizations. Those designs can achieve significant performance improvements over the traditional designs. In this article, we start with introduc- ing the background on GPU as a parallel architecture and the traditional parallel query processing in main memory databases. Next, we present the details of GPU-optimized system designs. We then survey a series of commercial and research systems, and outline the research trends. We wrote this article as an introductory article in GPU- optimized database systems especially in online analytical Johns Paul, Shengliang Lu and Binsheng He (2021), “Database Systems on GPUs”, Foundations and Trends® in Databases: Vol. 11, No. 1, pp 1–108. DOI: 10.1561/1900000076. Full text available at: http://dx.doi.org/10.1561/1900000076 2 processing (OLAP), which can be used as a short text for graduate level or a survey for researchers. We emphasize on the breadth and try to cover as many publications (such as those published in ACM/IEEE) as possible, with necessary details in some key GPU-optimized designs. Full text available at: http://dx.doi.org/10.1561/1900000076 1 Introduction The explosive growth in the amount of data that needs to be ana- lyzed (Data, 2020) and the slow down of CPU performance growth over the last decade due to the breakdown of Moore’s law (Michelogiannakis et al., 2016) has pushed the usage of heterogeneous architectures to many data intensive applications. Among those heterogeneous archi- tectures, graphic processing units (GPUs) have been successfully used to improve the performance of in-memory data analytics. While CPU performance growth has slowed down, GPU hardware has achieved over 45x increase in core count and over 23x increase in global memory bandwidth since 2008. Hence, GPUs have evolved as the hardware of choice for high performance database systems, especially for Online Analytical Processing (OLAP). To demonstrate the attractiveness of GPU hardware for high per- formance data analytics, we compare the key hardware capabilities of modern CPUs against modern GPUs in Table 1.1. As shown in the table, a typical GPU offers over 6x higher global memory bandwidth and close to 200x more cores than CPUs. This use of higher bandwidth global memory and their massively parallel architecture design is ideal for OLAP systems that require the parallel processing of a larger number of 3 Full text available at: http://dx.doi.org/10.1561/1900000076 4 Introduction data records. Therefore, we have witnessed significant efforts in the past decade to enable the use of GPUs in high performance data analytics operations (e.g., BreB et al., 2018; Paul et al., 2016; Chrysogelos et al., 2019a; Funke and Teubner, 2020). Table 1.1: Hardware comparison between GPUs and CPUs. GPU GPU CPU Hardware Feature (NVIDIA Quadro V100) (AMD Radeon RX 6900 XT) (Xeon Gold 6230R) Core Count 5,120 5,120 26 Base Clock (GHz) 1.1 1.825 4.0 Boost Clock (GHz) 1.62 2.25 4.0 Compute Cache Size (MB) 6 128 35.75 Memory Type HBM 2.0 GDDR6 DDR4-2933 Memory Size (GB) 32 16 1024 Memory Memory Bandwidth (GB/s) 870 512 140 Hardware Generation Volta Radeon 6000 Series Cascade Lake Price1 (USD) 8,000 2,100 2,500 Other TDP (W) 300 300 150 Despite the numerous hardware advantages of GPU hardware, de- signing and optimizing high performance database systems for GPUs is challenging due to the following reasons.