Building a Columnar Database on Shared Main Memory-Based Storage

Database Operator Placement in a Shared Main Memory-Based Storage System that Supports Data Access and Code Execution

Dissertation for the attainment of the academic degree

“doctor rerum naturalium” (Dr. rer. nat.)

in the scientific discipline "Practical Computer Science", submitted to the Faculty of Mathematics and Natural Sciences of the University of Potsdam

Christian Tinnefeld [email protected]

Hasso-Plattner-Institut für IT Systems Engineering Fachgebiet Enterprise Platform and Integration Concepts

August-Bebel-Str. 88 14482 Potsdam, Germany http://epic.hpi.uni-potsdam.de/

Supervisors/Examiners: Prof. Dr. Dr. h.c. Hasso Plattner, Hasso-Plattner-Institut; Prof. Dr. Donald Kossmann, ETH Zürich; Prof. John Ousterhout, PhD, Stanford University

Hasso-Plattner-Institut, Potsdam, Germany. Date of thesis defense: 14th of November 2014

Published online at the Institutional Repository of the University of Potsdam:
URL: http://publishup.uni-potsdam.de/opus4-ubp/frontdoor/index/index/docId/7206/
URN: urn:nbn:de:kobv:517-opus4-72063
http://nbn-resolving.de/urn:nbn:de:kobv:517-opus4-72063

Title of the Thesis: Building a Columnar Database on Shared Main Memory-Based Storage: Database Operator Placement in a Shared Main Memory-Based Storage System that Supports Data Access and Code Execution

Author: Christian Tinnefeld, Hasso-Plattner-Institut

Supervisor/Examiners: 1. Prof. Dr. Dr. h.c. Hasso Plattner, Hasso-Plattner-Institut 2. Prof. Dr. Donald Kossmann, ETH Zürich 3. Prof. John Ousterhout, PhD, Stanford University

Abstract:

In the field of disk-based parallel database management systems, a great variety of solutions based on a shared-storage or a shared-nothing architecture exists. In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach as it preserves the in-memory performance advantage by processing data locally on each server. We argue that this unilateral development is going to cease due to the combination of the following three trends: a) Nowadays, network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing the main memory inside a server and the main memory of a remote server to, and even below, a single order of magnitude. b) Modern storage systems scale gracefully, are elastic, and provide high-availability. c) A modern storage system such as Stanford's RAMCloud even keeps all data resident in main memory. Exploiting these characteristics in the context of a main memory parallel database management system is desirable. The advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible.

This thesis describes building a columnar database on shared main memory-based storage. The thesis discusses the resulting architecture (Part I) and the implications for query processing (Part II), and presents an evaluation of the resulting solution in terms of performance, high-availability, and elasticity (Part III).

In our architecture, we use Stanford's RAMCloud as shared-storage, and the self-designed and developed in-memory AnalyticsDB as relational query processor on top. AnalyticsDB encapsulates data access and operator execution via an interface which allows seamless switching between local and remote main memory, while RAMCloud provides not only storage capacity, but also processing power. Combining both aspects allows pushing down the execution of database operators into the storage system. We describe how the columnar data processed by AnalyticsDB is mapped to RAMCloud's key-value data model and how the performance advantages of columnar data storage can be preserved.

The combination of fast network technology and the possibility to execute database operators in the storage system opens the discussion for site selection. We construct a system model that allows the estimation of operator execution costs in terms of network transfer, data processed in memory, and wall time. This can be used for database operators that work on one relation at a time, such as a scan or materialize operation, to discuss the site selection problem (data pull vs. operator push). Since a database query translates to the execution of several database operators, it is possible that the optimal site selection varies per operator. For the execution of a database operator that works on two (or more) relations at a time, such as a join, the system model is enriched by additional factors such as the chosen algorithm (e.g. Grace Join vs. Distributed Block Nested Loop Join vs. Cyclo-Join), the data partitioning of the respective relations and their overlap, as well as the allowed resource allocation.

We present an evaluation on a cluster with 60 nodes where all nodes are connected via RDMA-enabled network equipment. We show that query processing performance is about 2.4x slower if everything is done via the data pull operator execution strategy (i.e. RAMCloud is being used only for data access) and about 27% slower if operator execution is also supported inside RAMCloud (in comparison to operating only on main memory inside a server without any network communication at all). The fast crash recovery feature of RAMCloud can be leveraged to provide high-availability, e.g. a server crash during query execution only delays the query response for about one second. Our solution is elastic in the sense that it can adapt to changing workloads a) within seconds, b) without interruption of the ongoing query processing, and c) without manual intervention.

Title of the Thesis: Building a Columnar Database on Shared Main Memory-Based Storage: Database Operator Placement in a Shared Main Memory-Based Storage System that Supports Data Access and Code Execution

Author: Christian Tinnefeld, Hasso-Plattner-Institut

Supervisors: 1. Prof. Dr. Dr. h.c. Hasso Plattner, Hasso-Plattner-Institut 2. Prof. Dr. Donald Kossmann, ETH Zürich 3. Prof. John Ousterhout, PhD, Stanford University

Summary: This thesis describes the creation of a column-oriented database on shared main memory-based storage. The work is motivated by three factors. First, modern network technology is equipped with remote direct memory access (RDMA), which reduces the difference in latency and throughput between accessing memory within a server and accessing the memory of a remote server to a single order of magnitude. Second, modern storage systems scale, are elastic, and are highly available. Third, a modern storage system such as Stanford's RAMCloud keeps all data resident in main memory. Exploiting these properties in the context of a column-oriented database is desirable.

The thesis is divided into three parts. The first part describes the architecture of a column-oriented database on shared main memory-based storage. For this purpose, the database AnalyticsDB, designed and developed in the course of this work, and Stanford's RAMCloud are used. The architecture describes how data access and operator execution are encapsulated in order to switch seamlessly between local and remote main memory. Furthermore, it covers how the data of AnalyticsDB, which is formatted according to a relational schema, is stored in RAMCloud, which operates with a key-value data model. The second part focuses on the implications for the processing of database queries. The central discussion is where (either in AnalyticsDB or in RAMCloud) and with which parameterization individual database operations are executed. For this purpose, suitable cost models are presented which allow modeling database operations that work on one or more relations. The third part of the thesis presents an evaluation on a cluster of 60 nodes with regards to the performance, high-availability, and elasticity of the system.

Acknowledgments

When you wake up in the morning, what makes you look forward to going to work? Two things: challenging tasks to work on and intellectually curious and fun colleagues to work with. I am very thankful to Prof. Hasso Plattner for creating an environment where having both is the norm: studying, learning, and working at the Hasso-Plattner-Institut is one of the most rewarding experiences of my life. I am also very grateful for the opportunity to be part of Prof. Hasso Plattner's research group that is driven by his enthusiasm and energy.

Talking about intellectually curious and fun colleagues: Dr. Anja Bog, Martin Faust, Dr. Martin Grund, Dr. Jens Krüger, Stephan Müller, Dr. Jan Schaffner, David Schwalb, Dr. Matthias Uflacker, and Johannes Wust significantly contributed to my well-being during the last five years. I want to thank my students Daniel Taschik and Björn Wagner for helping me to implement AnalyticsDB. I also want to thank Andrea Lange for her great administrative support.

From ETH Zürich, I am very grateful to Prof. Donald Kossmann for his guidance, feedback, and lengthy Skype sessions shortly before paper deadlines. I also thank Simon Loesing and Markus Pilman for fruitful discussions.

From Stanford University, I thank Prof. John Ousterhout for his time, effort, and hospitality. Working together with him as well as with the CHEC ("Cup is Half Empty Club") consisting of Diego Ongaro, Dr. Stephen Rumble, and Dr. Ryan Stutsman was one of the best learning experiences ever.

From SAP, I thank Dr. Vishal Sikka for his guidance and backing throughout my Master’s and PhD program. I also thank Fredrick Chew and Dr. Norman May for fostering the collaboration between SAP and the Hasso-Plattner-Institut.

I thank my parents Gerd and Margret for their love and support!

Contents

1 Introduction
1.1 Motivation
1.2 Research Questions and Scope
1.3 Thesis Outline and Contributions

2 Related Work and Background
2.1 Current Computing Hardware Trends
2.1.1 Larger and Cheaper Main Memory Capacities
2.1.2 Multi-Core Processors and the Memory Wall
2.1.3 Switch Fabric Network and Remote Direct Memory Access
2.2 In-Memory Database Management Systems
2.2.1 Column- and Row-Oriented Data Layout
2.2.2 Transactional vs. Analytical vs. Mixed Workload Processing
2.2.3 State-of-the-Art In-Memory Database Management Systems
2.3 Parallel Database Management Systems
2.3.1 Shared-Memory vs. Shared-Disk vs. Shared-Nothing
2.3.2 State-of-the-Art Parallel Database Management Systems
2.3.3 Database-Aware Storage Systems
2.3.4 Operator Placement for Distributed Query Processing
2.4 Cloud Storage Systems
2.4.1 State-of-the-Art Cloud Storage Systems
2.4.2 Combining Database Management and Cloud Storage Systems
2.5 Classification of this Thesis


I A Database System Architecture for a Shared Main Memory-Based Storage

3 System Architecture
3.1 System Architecture - Requirements, Assumptions, and Overview
3.2 AnalyticsDB
3.3 Stanford's RAMCloud

4 Data Storage
4.1 Mapping from Columnar Data to RAMCloud Objects
4.2 Main Memory Access Costs and Object Sizing

5 Data Processing
5.1 Database Operators in AnalyticsDB
5.2 Operator Push-Down Into RAMCloud
5.3 From SQL Statement to Main Memory Access

II Database Operator Execution on a Shared Main Memory-Based Storage

6 Operator Execution on One Relation
6.1 Evaluating Operator Execution Strategies
6.2 Optimizing Operator Execution
6.3 Implications of Data Partitioning

7 Operator Execution on Two Relations
7.1 Grace Join
7.2 Distributed Block Nested Loop Join
7.3 Cyclo Join
7.4 Join Algorithm Comparison
7.5 Parallel Join Executions


III Evaluation

8 Performance Evaluation
8.1 Analytical Workload: Star Schema Benchmark
8.2 Mixed Workload: Point-Of-Sales Customer Data

9 High-Availability Evaluation

10 Elasticity Evaluation

IV Conclusions and Future Work

11 Conclusions

12 Future Work

Appendix

Abbreviations and Glossary

Bibliography


List of Figures

2.1 Modified Von Neumann Architecture with added system bus.
2.2 Memory hierarchy as described by Patterson and Hennessy.
2.3 Evolution of main memory price and capacity.
2.4 Evolution of Intel processors from 1995 until 2013.
2.5 Two different processor interconnect architectures.
2.6 Memory and local area network specifications from 1995 until 2013.
2.7 Illustration of a row- and column-oriented data layout.
2.8 Comparison of query distribution in analyzed customer systems.
2.9 Three different parallel DBMS architectures.
2.10 Illustration of query, data, and hybrid shipping.
2.11 Ratio of queries which are eligible for query, hybrid or data shipping.
2.12 Classification of this thesis.

3.1 Assumptions with regards to the hardware and software stacks.
3.2 System architecture overview.
3.3 AnalyticsDB architecture.
3.4 RAMCloud architecture.

4.1 Mapping from AnalyticsDB columns to objects in RAMCloud.
4.2 Illustration of data region R.
4.3 Single sequential traversal access pattern.
4.4 Single random traversal access pattern.
4.5 Single random block traversal access pattern.
4.6 Object sizing experiments.

5.1 From SQL statement to main memory access in RAMCloud.

6.1 Evaluating operator execution strategies.
6.2 Execution of a SQL query with different execution strategies.
6.3 Operator execution with a varying data partitioning.
6.4 Benchmark execution with a varying data partitioning.


7.1 Comparisons of the execution times for different join algorithms.
7.2 Comparisons of the network transfer and amount of processed data.
7.3 Evaluation of join execution heuristics.

8.1 Operator breakdown for executing the Star Schema Benchmark.
8.2 Star Schema Benchmark execution with a varying data size.
8.3 Star Schema Benchmark execution with a varying number of nodes.
8.4 Operator breakdown for executing the customer data mixed workload.

9.1 High-availability experiment with nodes being killed.

10.1 Elasticity evaluation with a sinus-shaped workload.
10.2 Elasticity evaluation with a plateau-shaped workload.
10.3 Elasticity evaluation with a quadratic-shaped workload.

List of Tables

2.1 Bandwidth and latency for accessing local and remote DRAM.
2.2 Site selection for different classes of database operators.

4.1 Overview on cache parameters.

5.1 Database Operators in AnalyticsDB.
5.2 AnalyticsDB operator break-down for the Star Schema Benchmark.

6.1 System model symbols for operating on one relation at a time.

7.1 System model symbols for operating on two relations at a time.
7.2 Hardware parameters.
7.3 Distribution of join algorithms in dependence of the chosen heuristic.

12.1 Star Schema Benchmark relations involved in join operations.
12.2 Star Schema Benchmark join operations.


Chapter 1

Introduction

Elmasri and Navathe [EN10] describe a database system as consisting of a database and a database management system (DBMS). They define a database as "a collection of related data" and a DBMS as "a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the process of defining, constructing, manipulating, and sharing databases among various users and applications". According to Hellerstein and Stonebraker [HS05], IBM DB/2 [HS13], PostgreSQL [Sto87], and SQL Server [Kir96] are typical representatives of database management systems. These DBMSs are optimized for the characteristics of disk storage mechanisms. In their seminal paper Main Memory Database Systems: An Overview [GMS92] from 1992, Garcia-Molina and Salem describe a main memory database system¹ (MMDB) as a database system where data "resides permanently in main physical memory". Operating on data that resides in main memory results in an order of magnitude better performance than operating on data that sits on a disk. In the last century, main memory database systems played only a minor role in the overall database market as the capacities of main memory chips were small and their prices high. This development changed significantly in the last decade, resulting in main memory database systems becoming more popular: for example, Plattner [Pla09, Pla11a] presents SanssouciDB as a main memory DBMS that is tailored for supporting the execution of modern business applications. Since the storage capacity and processing power of a single server is limited, one motivation for deploying a distributed database system is the combination of the hardware resources provided by many servers. Özsu and Valduriez [ÖV11] define a distributed database as

¹ Throughout this thesis, the terms main memory database system and in-memory database system are used interchangeably.


“a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system is then defined as the software system that permits the management of the distributed database system and makes the distribution transparent to the users.”

A parallel database system can be seen as a revision and extension of a distributed database system. According to DeWitt and Gray [DG92], a parallel database system exploits the parallel nature of an underlying computing system in order to accelerate query execution and provide high-availability. One fundamental and much debated aspect of a parallel DBMS is its architecture, which determines how the available hardware resources are shared and interconnected. There are three different parallel DBMS textbook architectures [ÖV11, DG92, DMS13]: shared-memory, shared-storage (or shared-disk), and shared-nothing. In a shared-memory architecture, all processors share direct access to any main memory module and to all disks over an interconnection. This approach is not popular in the field of parallel DBMS due to its limited extensibility and availability: since the physical memory space is shared by all processors, a memory fault, for example, will affect many processors and potentially lead to a corrupted or unavailable database. Shared-storage (or shared-disk or shared-data) is an architectural approach where processors each have their own memory, but share common access to a storage system, typically in the form of a storage area network (SAN) or a network-attached storage (NAS). This approach has the advantage of being very extensible: the overall processing and storage capacities can be increased by adding more processors or by increasing the capacity of the storage system, and adding more capacity to the storage system does not involve or affect the DBMS in any way. Shared-nothing is an architectural approach where each memory and disk is owned by some processor which acts as a server for a partition of the data. The advantage of a shared-nothing architecture is reducing interferences and resource conflicts by minimizing resource sharing. Operating on the data inside a local server allows operating with full raw memory and disk performance.

A great variety of solutions based on a shared-storage or a shared-nothing architecture exists in the field of disk-based parallel database management systems (as illustrated in Subsection 2.3.2). In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach as it preserves the in-memory performance advantage by processing data locally on each server. As mentioned in the abstract, we argue that this unilateral development is going to cease due to the combination of the following three trends: a) Nowadays, network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing the main memory inside a server and the main memory of a remote server to, and even below, a single order of magnitude. b) Modern storage systems scale gracefully, are elastic, and provide high-availability.

c) A modern storage system such as Stanford's RAMCloud [OAE+11] even keeps all data resident in main memory. In addition, such storage systems provide not only storage capacity, but also processing power, which allows for code execution in the storage system. The subsequent creation of a shared-storage parallel DBMS that is composed of a main memory columnar DBMS, a main memory-based storage system, as well as an RDMA-enabled interconnect, results in a number of implications and challenges:

• Current main memory DBMSs are designed to operate on the main memory that is available inside a server. Their database operators are designed to directly access a memory address. When using a main memory-based storage system, these access mechanisms have to be adapted in order to access local and remote memory alike (a minimal interface sketch illustrating this and the following point is given after this list).

• The combination of fast network technology and the possibility to execute database operators in the storage system opens the discussion for site selection. This means that a) there exists the possibility to place the execution of some database operators either in the DBMS or in the storage system, which allows exploiting the computing power in the storage system for accelerating query processing; b) exercising this option may not always be beneficial as, depending on the database operation and its parameterization, it might be more efficient to bring the data to the operator inside the DBMS.

• Modern storage systems are designed to scale gracefully and be elastic. One technique that is being applied to enable these properties is the application of a simplified, key-value based data model. Each key-value pair is a small object that is usually stored in a hash-based data structure. However, chopping up data into small objects and distributing them via a randomized technique contradicts the concept of a column store where data is being grouped by attribute and stored sequentially. Hence it has to be ensured that using a modern storage system in conjunction with a columnar DBMS does not negate the advantages of column-orientation.

• Even though the storage system provides high-availability and is elastic, it is not clear if these properties can be leveraged by a database application. The execution of a query and the included database operators are more complex operations than the simple retrieval of a key-value pair. The following has to be shown: if a storage system becomes one component of a shared-storage parallel DBMS, are the functional guarantees made by the storage system itself then also applicable to the overall system?


• Although the bandwidth and latency provided by RDMA-enabled network equipment close the performance gap between operating on local and remote memory, there is still a gap. It has to be shown to what extent this remaining gap impacts query processing in comparison to an execution solely on the main memory inside a server.
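To make the first two implications more concrete, the following C++ sketch shows how a column access interface could hide whether a column resides in local main memory or in a remote storage system, and how it could expose both a data pull path (read) and an operator push path (scanEqual). All class and method names are hypothetical illustrations and not the actual AnalyticsDB or RAMCloud API.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical interface over one column of fixed-width values, independent of
// whether the column lives in local DRAM or in a remote storage system.
class ColumnAccessor {
public:
    virtual ~ColumnAccessor() = default;
    // Data pull: fetch a range of values into the local address space.
    virtual std::vector<std::int64_t> read(std::size_t offset, std::size_t count) = 0;
    // Operator push: evaluate a predicate where the data resides and return
    // only the qualifying positions (row ids).
    virtual std::vector<std::size_t> scanEqual(std::int64_t value) = 0;
};

// Local variant: operates directly on main memory inside the server.
class LocalColumn : public ColumnAccessor {
public:
    explicit LocalColumn(std::vector<std::int64_t> values) : values_(std::move(values)) {}

    std::vector<std::int64_t> read(std::size_t offset, std::size_t count) override {
        return std::vector<std::int64_t>(values_.begin() + offset,
                                         values_.begin() + offset + count);
    }

    std::vector<std::size_t> scanEqual(std::int64_t value) override {
        std::vector<std::size_t> positions;
        for (std::size_t i = 0; i < values_.size(); ++i) {
            if (values_[i] == value) positions.push_back(i);
        }
        return positions;
    }

private:
    std::vector<std::int64_t> values_;
};

// A remote variant would implement read() by fetching the objects that hold the
// column's value chunks from the storage system (data pull), and scanEqual() by
// shipping the predicate to the storage node so that only the qualifying
// positions travel over the network (operator push).

int main() {
    LocalColumn column({42, 7, 42, 13});
    for (std::size_t pos : column.scanEqual(42)) std::cout << pos << " ";  // prints: 0 2
    std::cout << "\n";
    return 0;
}
```

Which side of such an interface is cheaper for a given operator is exactly the site selection question raised above.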

1.1 Motivation

There are two distinct aspects which motivate this work: on the one hand, the aforementioned external factors of having RDMA-enabled network technology and the properties of modern storage systems seem to pave the way for deploying a parallel main memory DBMS according to the principles of a shared-storage architecture. Ideally, such a system would expose the performance characteristics of operating on the local main memory inside a server and at the same time provide the benefits of a modern storage system such as durability, high-availability, and elasticity. In reality, this is not feasible as remote data access will always incorporate a performance penalty, distributed data processing is more complex than a centralized one, and modern storage systems are not specifically tailored to be used in conjunction with a DBMS. However, the motivation is to find out how close the aforementioned external factors bring such a system to the ideal, and if they have the potential to end the one-sidedness of the architectural landscape for parallel main memory database systems, which is currently dominated by the shared-nothing approach. On the other hand, there is a range of applications which would benefit from a main memory parallel DBMS based on a shared-storage architecture. Two examples are:

• Cold Store: Garcia-Molina and Salem [GMS92] argue that it is reasonable to assume that the entire database of some applications can fit into main memory. For those applications where this is not the case, a hot and cold data partitioning schema is being proposed: frequently queried, fast accessible data which is usually of low volume is being kept in main memory (hot store), whereas rarely accessed, more voluminous data can be stored on a slower storage medium (cold store). The different data partitions can be realized, for example, by having several logical databases. With the advent of a storage system that keeps all data resident in main memory, it seems beneficial to use it as shared-storage for the DBMS that serves the cold data. The performance characteristics for accessing this data may not be as good as operating on local main memory inside a server, but the real challenge for a cold store is scaling out gracefully as the amount of cold data grows over time and the required storage capacity increases.

• On-Demand In-Memory Computing: Nowadays, information technology infrastructure, services, and applications can be provisioned over the Internet. Typical characteristics are the on-demand availability of these resources, their quick adaptation to changing workloads, and the billing based on actual usage. These offerings include the provisioning of an in-memory database, such as the availability of SAP's in-memory database HANA hosted at Amazon Web Services [Ser13]. Solutions exist to adapt the capacity of an in-memory database that is based on a shared-nothing architecture to fluctuating workloads, as summarized and extended by the work of Schaffner et al. [Sch13, SJK+13]. However, utilizing a shared-storage architecture for providing on-demand in-memory computing bears the advantages that a) storage and processing capacities can be adjusted independently, and b) the complexity of the DBMS software can be reduced as mechanisms such as data redistribution in case of a scale-out are implemented and hidden in the storage system.

1.2 Research Questions and Scope

This work focuses on the implications of building a columnar database on a shared main memory-based storage and the resulting database operator placement problem in a shared main memory-based storage system that supports data access and code execution. This leads to the following research questions:

1. What is a suitable architecture for a main memory database in order to utilize a shared main memory-based storage?

2. Where to place the execution of a database operator (site selection) in a main memory-based storage system that supports data access and code execution (data-pull vs. operator-push)?

3. What is the performance penalty for the execution of database operators not on the local main memory inside a server, but on a remote main memory over an RDMA-enabled network?

4. Can the fast-recovery mechanism provided by a modern, main memory-based storage system be leveraged by a database application, so that a hardware failure does not interrupt query processing?


5. Can the elasticity provided by a modern, main memory-based storage system that is incorporated in a shared-storage parallel DBMS be leveraged to maintain a constant query execution time under a varying workload?

While the aforementioned research questions span the scope of this work, the following aspects are considered out of scope:

• This work does not aim to compare different parallel DBMS architectures, let alone try to come up with a verdict on which one is superior. As presented in Chapter 2, this is a much discussed and long debated aspect of parallel DBMSs and, as discussed in Subsection 2.3.1, work already exists that solely addresses architectural comparisons.

• This work does not consider distributed transaction processing. The aspect of distributed transaction processing in the context of query execution, in conjunction with a modern main memory-based storage system, is currently under research at ETH Zürich [SG13]. As described in Section 3.2, we assume that there is a single, central instance that handles all write operations.

• This work does not dive into the details or present a comparison of different high-end network technologies. It describes the concepts behind RDMA-enabled network technology in Subsection 2.1.3 and takes it as a given for the remainder of the thesis.

1.3 Thesis Outline and Contributions

The thesis is divided into three parts preceded by this introduction and the presentation of related work and background in Chapter 2. Chapter 2 explains the four major areas influencing this thesis: current computing hardware trends in Section 2.1, in-memory database management systems in Section 2.2, parallel database management systems in Section 2.3, and cloud storage systems in Section 2.4. The description of the related work closes with a classification of the thesis in Section 2.5.

Part I presents a database system architecture for a shared main memory-based storage and consists of three chapters. Chapter 3 describes the architectural requirements and assumptions, presents an overview, and introduces the components AnalyticsDB and RAMCloud. Chapter 4 deals with the resulting data storage implications and illustrates how columnar data inside AnalyticsDB is mapped to objects in RAMCloud, and how the right sizing of these objects can uphold the advantages of a columnar data layout. Chapter 5 illustrates the data processing by describing the database operators in AnalyticsDB and how to push their execution into RAMCloud.

Part II tackles the site selection problem in detail and presents a study of database operator execution on a shared main memory-based storage. Chapter 6 explains the operator execution on one relation at a time, presents the related system model in detail, evaluates the different operator execution strategies, illustrates the subsequent optimization of a query execution, and points out the implications of data partitioning. Chapter 7 extends the system model to describe the costs for the execution of an operator that works on two relations, such as the distributed join algorithms Grace Join, Distributed Block Nested Loop Join, and Cyclo Join. The chapter continues with a comparison of these algorithms and the influence of partitioning criteria and resource allocation on their parallel executions.

Part III presents an evaluation of our system with regards to performance, high-availability, and elasticity. Chapter 8 presents a performance evaluation that aims to quantify the gap between query execution on local and remote main memory while considering the different operator execution strategies (data pull vs. operator push). Two different workloads are being used: an analytical workload consisting of the Star Schema Benchmark, and a mixed workload based on point-of-sales customer data from a large European retailer. Chapter 9 evaluates the high-availability of our solution by provoking server crashes during query execution. Chapter 10 evaluates the elasticity by aiming to maintain a constant query processing time under a heavily varying workload. Chapters 11 and 12 close with the conclusions and future work.

The thesis makes the following contributions:

1. We design a distributed main memory database system architecture that separates the query execution engine and the data access. This separation is done via an interface which allows switching between a data pull and operator push execution strategy per database operator execution.

2. We provide a database operator execution cost model for operating on a shared main memory-based storage system that supports data access and code execution: the cost model covers operators which work on one or two relations at a time.

3. We investigate the implications of different partitioning criteria inside a main memory-based storage system on the query processing performance of a shared-storage parallel main memory DBMS.


4. We provide an extensive evaluation of the presented shared-storage parallel main memory DBMS. This evaluation considers the performance based on two different workloads, high-availability, elasticity as well as the operator execution on local or on remote main memory with data pull or operator push execution strategies.

A number of publications were created in the context of this work:

• Christian Tinnefeld, Hasso Plattner: Exploiting memory locality in distributed key-value stores, ICDE Workshops 2011, 2011 [TP11b]

• Christian Tinnefeld, Hasso Plattner: Cache-Conscious Data Placement in an In-Memory Key-Value Store, IDEAS'11: 15th International Database Engineering and Applications Symposium, 2011 [TP11a]

• Christian Tinnefeld, Donald Kossmann, Martin Grund, Joos-Hendrik Boese, Frank Renkes, Vishal Sikka, Hasso Plattner: Elastic Online Analytical Processing on RAMCloud, EDBT, 2013 [TKG+13]

• Christian Tinnefeld, Daniel Taschik, Hasso Plattner: Providing High-Availability and Elasticity for an In-Memory Database System with RAMCloud, GI-Jahrestagung 2013: 472-486, 2013 [TTP13]

• Christian Tinnefeld, Daniel Taschik, Hasso Plattner: Quantifying the Elasticity of a Database Management System, DBKDA, 2014 [TTP14]

• Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Boese, Hasso Plattner: Parallel Join Executions in RAMCloud, CloudDB - In conjunction with ICDE 2014, 2014 [TKBP14]

Some content of this thesis has already been published in the previously listed publications. Chapter 3 is based on [TKG+13], Section 4.2 is also covered in [TP11b], and Chapter 6 and Chapter 7 are partly covered in [TKG+13] and [TKBP14]. Part III is based on [TKG+13] and [TTP14]. In addition, a number of publications were written which are not directly related to this thesis, but also make contributions in the field of in-memory data management:

• Christian Tinnefeld, Björn Wagner, Hasso Plattner: Operating on Hierarchical Enterprise Data in an In-Memory Column Store, DBKDA, 2012 [TWP12]

• Christian Tinnefeld, Stephan Müller, Hasso Plattner: Available-To-Promise on an In-Memory Column Store, Datenbanksysteme in Business, Technologie und Web (BTW 2011), 14. Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS), Proceedings, Kaiserslautern, Germany, 2011 [TWP12]

• Jens Krüger, Martin Grund, Christian Tinnefeld, Hasso Plattner, Franz Faerber: Optimizing Write Performance for Read Optimized Databases, Database Systems for Advanced Applications, Japan, 2010 [KGT+10]

• Jens Krüger, Christian Tinnefeld, Martin Grund, Hasso Plattner: A Case for Online Mixed Workload Processing, Third International Workshop on Testing Database Systems (DBTest), in conjunction with SIGMOD, 2010 [KTGP10]

• Christian Tinnefeld, Stephan Müller, Jens Krüger, Martin Grund: Leveraging Multi-Core CPUs in the Context of Demand Planning, 16th International Conference on Industrial Engineering and Engineering Management (IEEM), Beijing, China, 2009 [TMKG09]

• Christian Tinnefeld, Jens Krüger, Jan Schaffner, Anja Bog: A Database Engine for Flexible Real-Time Available-to-Promise, IEEE Symposium on Advanced Management of Information for Globalized Enterprises (AMIGE'08), Tianjin, China, 2008


Chapter 2

Related Work and Background

This chapter presents the background as well as the related work for this thesis and closes with a classification of the thesis. Instead of separating the chapters related work and background, both topics are presented together in one chapter, giving the reader the advantage of understanding underlying concepts and getting to know the respective related work in one stroke. Each section and subsection is preceded by a short summary. Additional related work is presented throughout the thesis where appropriate: for example, Chapter 6 starts with discussing system models which are related to the system model presented in the remainder of that chapter, while Chapter 7 contains an overview of state-of-the-art distributed join algorithms. The four major areas which influence the thesis are current computing hardware trends, in-memory database management systems, parallel database management systems, as well as cloud storage systems. The subsequent sections and subsections also include discussions regarding how the different areas influence each other.

2.1 Current Computing Hardware Trends

Summary: This section introduces the computing hardware trends by describing the foundation for current computer architecture. John Von Neumann described in 1945 [vN93] a computer architecture that consists of the following basic components: a) an Arithmetic Logic Unit (ALU), also known as the processor, which executes calculations and logical operations; b) memory, which holds a program and its to-be-processed data; c) a Control Unit that moves program and data into and out of the memory and executes the program instructions; for supporting the execution of the instructions, the Control Unit can use a Register for storing intermediate values; d) an Input and Output mechanism that allows the interaction with external entities such as a user via keyboard and screen.

The original Von Neumann architecture included the interconnection of the different architectural components; however, an explicit System Bus was added later to be able to connect a non-volatile memory medium for persistent data storage [NL06]. Figure 2.1 depicts the Von Neumann Architecture with an added system bus.


Figure 2.1: Modified Von Neumann Architecture with added system bus.

As indicated in the description of the Von Neumann architecture, there are different types of memory technologies which vary in properties such as cost ($ per bit), access performance, and volatility. Inside a single computer, a combination of different memory technologies is used with the intention of combining their advantages. The combination of the different technologies is done by aligning them in a hierarchy in order to create the illusion that the overall memory is as large as the largest level of the hierarchy. Such a memory hierarchy can be composed of multiple levels, and data is transferred between two adjacent levels at a time. As shown in Figure 2.2, the upper levels are usually smaller, faster, more expensive, and closer to the CPU, whereas the lower levels provide more capacity (due to the lower costs), but are slower in terms of access performance. The textbook memory technology examples from Patterson and Hennessy [PH08] are SRAM (static random access memory), DRAM (dynamic random access memory), and magnetic disk, which build a three-level hierarchy: SRAM provides fast access times, but requires more transistors for storing a single bit, which makes it expensive and take up more space [HP11]. SRAM is used in today's computers as a cache very close to the CPU and is the first memory level in this example. DRAM is less costly per bit than SRAM, but also substantially slower; it is the second memory level and is used as main memory for holding currently processed data and programs. In addition, DRAM cells need a constant energy supply, otherwise they lose the stored information, which makes them volatile. In order to permanently store data, a third memory level is introduced that uses a magnetic disk. However, the moving parts inside a magnetic disk result in a penalty with regards to the access performance in comparison to DRAM. Two other frequently used terms for differentiating between volatile and non-volatile memory are primary and secondary memory [PH08]: primary memory is a synonym for volatile memory such as main memory holding currently processed programs and data, whereas secondary memory is non-volatile memory storing programs and data between runs.

A shortcoming of the Von Neumann Architecture is that the CPU can either read an instruction or read data from memory, but not both at the same time. This is addressed by the Harvard architecture [HP11], which physically separates storage and signal pathways for instructions and data.


Figure 2.2: Memory hierarchy as described by Patterson and Hennessy.
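The performance impact of this hierarchy can be made visible with a small measurement. The following self-contained C++ sketch (an illustrative microbenchmark written for this purpose, not one of the calibration tools cited later in this chapter) performs dependent, randomly ordered reads over working sets of increasing size; once the working set no longer fits into a cache level, the measured time per read jumps towards the latency of the next, slower level.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Average latency of dependent reads over a working set of numElements
// 8-byte entries, visited in a random order so that hardware prefetching
// cannot hide the access latency.
double averageReadLatencyNs(std::size_t numElements) {
    std::vector<std::size_t> next(numElements);
    std::iota(next.begin(), next.end(), std::size_t{0});
    // Sattolo's algorithm: turns the identity permutation into a single cycle,
    // so the traversal below visits every element exactly once per round trip.
    std::mt19937_64 rng{42};
    for (std::size_t i = numElements - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> dist(0, i - 1);
        std::swap(next[i], next[dist(rng)]);
    }

    const std::size_t reads = 10'000'000;
    std::size_t pos = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < reads; ++i) {
        pos = next[pos];  // each read depends on the previous one
    }
    auto end = std::chrono::steady_clock::now();
    volatile std::size_t sink = pos;  // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(end - start).count() /
           static_cast<double>(reads);
}

int main() {
    // Working sets from 256 KB (cache-resident) up to 256 MB (DRAM-resident).
    for (std::size_t elements = std::size_t{1} << 15;
         elements <= (std::size_t{1} << 25); elements <<= 2) {
        std::cout << (elements * sizeof(std::size_t)) / 1024 << " KB: "
                  << averageReadLatencyNs(elements) << " ns per read\n";
    }
    return 0;
}
```

On typical hardware, the reported value grows from a few nanoseconds for cache-resident working sets to roughly the uncached main memory latency discussed later in Subsection 2.1.3.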

Reflecting on the Von Neumann Architecture and the aspect of memory hierarchies, it becomes apparent that a computer is a combination of different hardware components with unique properties and that the capabilities of those hardware components ultimately set the boundaries of the capabilities of the to-be-run programs on that computer. Consequently, hardware trends also impact the way computer programs are designed and utilize the underlying hardware. In the remainder of this section we want to describe three hardware trends which contribute to the motivation for writing this thesis.

2.1.1 Larger and Cheaper Main Memory Capacities

Summary: This subsection quantifies the advancements in main memory capacities and the price reduction over the last 18 years. The price for main memory DRAM modules has dropped constantly during the previous years. As shown in Figure 2.3, the price for one megabyte of main memory used to be $50 in 1995, and dropped to $0.03 in 2007 — a price reduction of three orders of magnitude in about 20 years.

In addition, chip manufacturers managed to pack transistors and capacitors more densely, which increased the capacity per chip. In 1995, a single DRAM chip had a capacity of 4 megabytes, which increased to 1024 megabytes per chip in 2013. Putting 32 of those chips on a single DRAM module gives it a capacity of 32 gigabytes. Nowadays, server mainboards (e.g. the Intel¹ Server Board S4600LH2 [Int13]) can hold up to 48 of those modules, resulting in a capacity of 1.5 terabytes of main memory per board. Such an Intel server board equipped with 1.5 terabytes of Kingston server-grade memory can be bought for under $25,000 (undiscounted retail price [new13]), which illustrates the combination of price decline and advancement in capacity per chip. Despite the previously described developments, solid-state drives (SSD) and hard-disk drives (HDD) still have a more attractive price point, e.g. the cost per megabyte capacity of a three terabyte hard-disk is about $0.004 [new13]. In addition, DRAM is a volatile storage and, as long as non-volatile memory is not being mass-produced, one always needs the same capacity of SSD/HDD storage somewhere to durably store the data being kept and processed in DRAM (the situation is comparable to alternative energy sources: you still need the coal power plant when the sun is not shining or the wind is not blowing). However, in the end the performance advantage of operating on main memory outweighs the higher cost per megabyte in the context of performance-critical applications such as large-scale web applications [mem13a] or in-memory databases [Pla11a].
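As a back-of-envelope check on these numbers (assuming, for simplicity, that the quoted retail price of the fully equipped board is dominated by the 1.5 terabytes of DRAM), the implied cost per megabyte is

\[
\frac{\$25{,}000}{1.5 \cdot 1024 \cdot 1024\ \text{MB}} \approx \$0.016\ \text{per megabyte,}
\]

which is roughly four times the cited $0.004 per megabyte for a three terabyte hard disk.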

2.1.2 Multi-Core Processors and the Memory Wall

Summary: This subsection describes the development of having multiple cores per processor and the resulting main memory access bottleneck. In 1965, Gordon Moore made a statement about the future development of the complexity of integrated circuits in the semiconductor industry [Moo65]. His prediction that the number of transistors on a single chip is doubled approximately every two years became famous as Moore's Law, as it turned out to be a relatively accurate prediction. The development of the processor clock rate and the number of processor cores is of relevance: as depicted in Figure 2.4(a), Intel CPUs reached a clock rate of 3000 MHz in 2003. Until then, the clock rate had improved yearly, which used to be convenient for a programmer, as he usually did not have to adjust his code in order to leverage the capabilities of a new generation of processors.

¹ There is a great variety of computer processor and mainboard vendors such as AMD [AMD13], GIGABYTE [GIG13], or Intel [Int13]. For the sake of better comparability throughout the different subsections, this section cites only Intel processor and mainboard products. In addition, Intel held a market share of over 90% in the worldwide server processor market in 2012 [RS13].


Figure 2.3: Evolution of memory price development and capacity per chip advancement [Pla11a] [PH08]. Please note that both y-axes have a logarithmic scale.

This changed for Intel processors between 2003 and 2006, when a further increase of the clock rate would have resulted in too much power consumption and heat emission. Instead, the clock rate remained relatively stable, but having more than one physical processing core per (multi-core) CPU was introduced. This led to the saying that the free lunch is over [Sut05]: now application developers had to write their software accordingly to utilize the capabilities of multi-core CPUs. This is also expressed by Amdahl's Law, which states that the achievable speedup of a program is limited by the fraction of its code that can be parallelized [Amd67]. Reflecting on the importance of main memory as mentioned in the previous subsection, Figure 2.4(b) shows the development of CPU processing power and its memory bandwidth. One can observe that with the advent of multiple cores per processor, processing power spiked significantly: an equivalent increase in the maximum specified rate at which data can be read from or stored into a semiconductor memory by the processor could not be realized [BGK96]. As described in the beginning of this section, a processor is not directly wired to the main memory, but accesses it over a system bus. This system bus becomes a bottleneck when multiple processors or processor cores utilize it. Figure 2.5(a) shows a shared front-side bus architecture where several processors access the memory over a shared bus. Here, the access time to data in memory is independent of which processor makes the request or which memory chip contains the transferred data: this is called uniform memory access (UMA). In order to overcome the bottleneck of having a single shared bus, Intel, for example, introduced Quick Path Interconnect as depicted in Figure 2.5(b), where every processor has its exclusively assigned memory. In addition, each processor is directly interconnected with the others, which increases the overall bandwidth between the different processors and the memory. A single processor can access its local memory or the memory of another processor, whereby the memory access time depends on the memory location relative to the processor. Therefore, such an architecture is described as NUMA, standing for non-uniform memory access.
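Amdahl's Law, referenced above, can be stated in its standard formulation with p denoting the parallelizable fraction of a program and n the number of cores:

\[
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\]

For example, with p = 0.9 and the twelve cores of a 2013 Intel Xeon E5, the achievable speedup is at most 1/(0.1 + 0.9/12) ≈ 5.7, and even with an unlimited number of cores it is bounded by 1/(1 - p) = 10.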


(a) Evolution of Intel processor clock rate and number of processor cores [Int13]. The figure shows that Intel distributed a processor with more than 3000 MHz clock rate (Intel Pentium 4 Xeon) already in 2003, but until 2013 the clock rate only evolved to 3900 MHz (Intel Core i7 Extreme Edition). The first mass-produced processor from Intel with two cores (Intel Pentium Dual-Core) came out in 2006: the number of cores per processor evolved to twelve in 2013 (Intel Xeon E5).


(b) Comparison of Coremark Benchmark Score [The13] and memory bandwidth (maximum specified rate at which data can be read from or stored to a semiconductor memory by the processor) of selected Intel processors [Int13] over time. The Coremark benchmark measures the performance of a CPU by executing algorithms such as list processing or matrix manipulation and intends to replace the Dhrystone [Wei84] benchmark. The figure illustrates that the processing power of CPUs increased more significantly (by introducing multiple physical processing cores per CPU) than their memory bandwidth.

Figure 2.4: Illustration of the evolution of Intel processors from 1995 until 2013.


(a) Shared Front-Side Bus (b) Intel’s Quick Path Interconnect

Figure 2.5: Two different processor interconnect architectures.

2.1.3 Switch Fabric Network and Remote Direct Memory Access

Summary: This subsection quantifies the advancements of network bandwidth and latency over the last 18 years, as well as the closing performance gap between accessing main memory inside a server and that of a remote server. The previous subsection describes the performance characteristics when operating on the main memory inside a single server. Although Subsection 2.1.1 emphasizes the growing main memory capacities inside a single server, the storage space requirements from an application can exceed this capacity. When utilizing the main memory capacities from several servers, the performance characteristics of the network interconnect between the servers have to be considered as well. Modern network technologies such as InfiniBand or Ethernet Fabrics have a switched fabric topology which means that a) each network node connects with each other via one or more switches and b) the connection between two nodes is established based on the crossbar switch theory [Mat01], resulting in no resource conflicts with connections between any other nodes at the same time [GS02]: in the case of InfiniBand, this results in full bisection bandwidth between any two nodes at any time.


(a) Comparison of memory and local area network bandwidth specifications [HP11] [Ass13]. The figure illustrates how the bandwidth performance gap narrows down from over an order of magnitude (1995: 267 MBytes/s Fast Page Mode DRAM vs. 12.5 MBytes/s Fast Ethernet) to a factor of 1.3 (2010: 16 GBytes/s DDR3-2000 SDRAM vs. 12.5 GBytes/s 100 Gigabit Ethernet) over a period of 15 years, and that today's local area network technology specifies a higher bandwidth than memory (2013: 24 GBytes/s DDR3-3000 SDRAM vs. 37.5 GBytes/s Enhanced Data Rate (EDR) 12x).


(b) Comparison of memory and local area network latency specifications [HP11] [Ass13]. The figure shows that the latency performance gap has been reduced from five orders of magnitude (2000: 0.052 µs Double Data Rate SDRAM vs. 340 µs Gigabit Ethernet) to one to two orders of magnitude (2013: 0.037 µs DDR3 SDRAM vs. 1 µs InfiniBand).

Figure 2.6: Comparisons of memory and local area network bandwidth and latency specifications from 1995 until 2013.

In addition, the InfiniBand specification [Ass13] describes that an InfiniBand link can be operated at five different data rates: 0.25 GBytes/s for single data rate (SDR), 0.5 GBytes/s for double data rate (DDR), 1 GBytes/s for quad data rate (QDR), 1.7 GBytes/s for fourteen data rate (FDR), and 3.125 GBytes/s for enhanced data rate (EDR). In addition, an InfiniBand connection between two devices can aggregate several links in units of four and twelve (typically denoted as 4x or 12x). For example, the aggregation of four quad data rate links results in 4xQDR with a specified data rate of 4 GBytes/s. These specifications describe the effective theoretical unidirectional throughput, meaning that the overall bandwidth between two hosts can be twice as high.

Figure 2.6(a) compares the bandwidth specifications of main memory and network technologies. The figure shows how the bandwidth performance gap narrows down from over an order of magnitude (1995: 267 MBytes/s Fast Page Mode DRAM vs. 12.5 MBytes/s Fast Ethernet) to a factor of 1.3 (2010: 16 GBytes/s DDR3-2000 SDRAM vs. 12.5 GBytes/s 100 Gigabit Ethernet) over a period of 15 years, and that today's local area network technology specifies a higher bandwidth than memory (2013: 24 GBytes/s DDR3-3000 SDRAM vs. 37.5 GBytes/s InfiniBand Enhanced Data Rate (EDR) 12x). Figure 2.6(b) compares the latency specifications of main memory and network technologies. The figure depicts how the latency performance gap has been reduced from five orders of magnitude (2000: 0.052 µs Double Data Rate SDRAM vs. 340 µs Gigabit Ethernet) to two orders of magnitude (2013: 0.037 µs DDR3 SDRAM vs. 1 µs InfiniBand). The recent improvement in network latency is the result of applying a technique called remote direct memory access (RDMA). RDMA enables the network interface card to transfer data directly into the main memory, which bypasses the operating system by eliminating the need to copy data into the data buffers in the operating system (which is also known as zero-copy networking) and in turn also increases the available bandwidth. In addition, transferring data via RDMA can be done without invoking the CPU [Mel13]. RDMA was originally intended to be used in the context of high-performance computing and is currently implemented in networking hardware such as InfiniBand. However, three trends indicate that RDMA can become wide-spread in the context of standardized computer server hardware: From a specification perspective, RDMA over Converged Ethernet (RoCE) allows remote direct memory access over an Ethernet network, and network interface cards supporting RoCE are available on the market. From an operating system perspective, Microsoft, for example, supports RDMA in Windows 8 and Windows Server 2012 [WW13]. From a hardware perspective, it is likely that RDMA-capable network controller chips become on-board commodity equipment on server-grade mainboards.

The comparison of main memory and network technology specifications suggests that the performance gap between operating on local (inside a single computer) and remote (between two separate computers) memory closes.


                 Specification                                        Measurements [3]
                 Intel Nehalem [Tho11]    InfiniBand [Ass13]          Intel Xeon Processor    Mellanox ConnectX-3
                                          4xFDR / 12xEDR              E5-4650                 4xFDR
Bandwidth        32 GBytes/s [1]          6.75 / 37.5 GBytes/s [2]    10.2 GBytes/s [4]       4.7 GBytes/s [6]
Latency          0.06 µs                  1 µs                        0.07 µs [5]             1.87 µs [6]
                 (uncached main           (RDMA operation             (uncached main          (RDMA operation
                 memory access)           end-to-end latency)         memory access)          end-to-end latency)

Table 2.1: Bandwidth and latency comparison for accessing local (inside a single computer) and remote (between two separate computers) DRAM. [1] Combined memory bandwidth of three memory channels per processor. [2] These are the specified actual data rates (not signaling rates) for 4xFDR respectively 12xEDR. [3] Measurements have been performed on the following hardware: Intel Xeon E5-4650 CPU, 24 GB DDR3 DRAM, Mellanox ConnectX-3 MCX354A-FCBT 4xFDR InfiniBand NIC connected via a Mellanox InfiniScale IV switch. [4] Measured via the Cache-Memory and TLB Calibration Tool from S. Manegold and S. Boncz [MBK02a]. [5] Measured via the Bandwidth Benchmark from Z. Smith [Smi13], sequential read of 64 MB sized objects. [6] Measured via the native InfiniBand Open Fabrics Enterprise Distribution benchmarking tools ib_read_bw and ib_read_lat.

Table 2.1 presents a comparison of hardware specifications and actual measurements in order to quantify the bandwidth and latency performance gap between main memory and network technologies. The Intel Nehalem architecture, which is going to be used in the context of the measurements, specifies a maximum bandwidth of 32 GBytes/s by combining its three memory channels per processor [Tho11]. The specified access latency for retrieving a single cache line from main memory that is not resident in the caches between the CPU and the main memory is 0.06 µs. Actual measurements with an Intel Xeon E5-4650 processor show a memory bandwidth of 10.2 GBytes/s and a memory access latency of 0.07 µs. The difference between the bandwidth specification and the measurement is the number of memory channels: the memory traversal of a data region executed by a single processor core usually invokes a single memory channel at a time, resulting in approximately one third of the specified bandwidth (due to the three available memory channels). Current typical InfiniBand equipment, such as Mellanox ConnectX-3 cards (e.g. Mellanox ConnectX-3 MCX354A-FCBT), supports 4xFDR, resulting in a unidirectional data rate of 6.75 GBytes/s between two devices (as previously mentioned, the current InfiniBand specification [Ass13] itself supports up to 37.5 GBytes/s (12xEDR)). The end-to-end latency for an RDMA operation is specified with 1 µs. Measurements between two machines, each equipped with a Mellanox ConnectX-3 MCX354A-FCBT card and connected via a Mellanox InfiniScale IV switch, reveal a unidirectional bandwidth of 4.7 GBytes/s and an end-to-end latency for an RDMA operation of 1.87 µs.

a certain carefulness, as it is debatable what is the correct way and appropriate granularity to compare the local and the remote bandwidth: should a single or several combined memory channels be cited, a single InfiniBand link or the aggregation of multiple links, and would one quote the unidirectional or bidirectional bandwidth between two machines? Ultimately, the comparison of the measurements intends to illustrate the ballpark performance gap that can be summarized as follows: as shown in Figure 2.6, from a bandwidth perspective, local and remote memory access are in the same order of magnitude, and depending on the chosen performance metric, even on par. When it comes to comparing the latency of main memory access inside a machine and between two separate machines, there is still a gap of one to two orders of magnitude.
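To make the local side of this comparison more tangible, the following minimal C++ sketch shows how a sequential-read memory bandwidth figure of the kind reported in Table 2.1 can be obtained: it allocates a 64 MB buffer, traverses it sequentially several times, and reports the achieved throughput. It is an illustrative sketch only, not the benchmark tool used for the measurements above; buffer size and repetition count are arbitrary choices.

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // 64 MB buffer of 8-byte integers, initialized so the read cannot be optimized away entirely.
  const std::size_t buffer_bytes = 64 * 1024 * 1024;
  std::vector<std::uint64_t> buffer(buffer_bytes / sizeof(std::uint64_t), 1);

  const int repetitions = 10;
  volatile std::uint64_t sink = 0;  // prevents the compiler from eliding the traversal

  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repetitions; ++r) {
    std::uint64_t sum = 0;
    for (std::uint64_t value : buffer) {
      sum += value;  // sequential traversal: hardware prefetching applies
    }
    sink = sink + sum;
  }
  auto end = std::chrono::steady_clock::now();

  double seconds = std::chrono::duration<double>(end - start).count();
  double gbytes = (static_cast<double>(buffer_bytes) * repetitions) / (1024.0 * 1024.0 * 1024.0);
  std::cout << "sequential read bandwidth: " << (gbytes / seconds) << " GBytes/s\n";
  return 0;
}

Since such a single-threaded traversal typically utilizes only one memory channel at a time, its result is expected to stay well below the specified combined bandwidth of all memory channels, which is consistent with the gap between the specified 32 GBytes/s and the measured 10.2 GBytes/s discussed above.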

2.2 In-Memory Database Management Systems

Summary: This section describes a database management system where the data permanently resides in physical main memory. As mentioned in Chapter 1, Elmasri and Navathe [EN10] describe a database sys- tem as consisting of a database and a database management system (DBMS). They define a database as “a collection of related data” and a DBMS as “a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the process of defining, constructing, manipulating, and sharing databases among various users and ap- plications”. According to Hellerstein and Stonebraker [HS05], IBM DB/2 [HS13], PostgreSQL [Sto87], and Sybase SQL Server [Kir96] are typical representatives of relational database management systems. The term relational in the context of the aforementioned DBMSs refers to the implementation of the relational data model by Codd [Cod70], which allows querying the database by using a Struc- tured (English) Query Language initially abbreviated as SEQUEL then shortened to SQL [CB74]. This led to the SQL-standard which is regularly updated [Zem12] and is ISO-certified. Usually a SQL query is issued by an application to a DBMS. The DBMS then parses the query and creates a query plan potentially with the help of a query optimizer. The query plan is then executed by a query execution engine which orchestrates a set of database operators (such as a scan or join) in order to create the result for that query [GMUW08]. The previously mentioned DBMSs are optimized for the characteristics of disk storage mechanisms. In their paper Main Memory Database Systems: An Overview [GMS92] from 1992, Garcia-Molina and Salem describe a main memory database system (MMDB) as a database system where data “resides permanently in main physical memory”. They argue that in a disk-based DBMS the data is cached in

main memory for access, whereas in a main memory database system the data residing permanently in memory may have a backup copy on disk. They observe that in both cases a data item can have copies in memory and on disk at the same time. However, they also note the following main differences when it comes to accessing data that resides in main memory:

1. “The access time for main memory is orders of magnitude less than for disk storage.
2. Main memory is normally volatile, while disk storage is not. However it is possible (at some cost) to construct nonvolatile memory.
3. Disks have a high, fixed cost per access that does not depend on the amount of data that is retrieved during the access. For this reason, disks are block-oriented storage devices. Main memory is not block oriented.
4. The layout of data on a disk is much more critical than the layout of data in main memory, since sequential access to a disk is faster than random access. Sequential access is not as important in main memories.
5. Main memory is normally directly accessible by the processor(s), while disks are not. This may make data in main memory more vulnerable than disk resident data to software errors. [GMS92]”

In the remainder of their paper, Garcia-Molina and Salem describe the impli- cations of memory resident data on database system design aspects such as access methods (e.g. index structures do not need to hold a copy of the indexed data, but just a pointer), query processing (e.g. focus should be on processing costs rather than minimize disk access) or recovery (e.g. use of checkpointing to and recov- ery from disk). IBM’s Office-By-Example [AHK85], IMS/VS Fast Path [GK85] or System M from Princeton [SGM90] are presented as state-of-the-art main memory database systems at that time. The further development of memory technology in the subsequent 20 years after this statement — as illustrated in detail in Subsec- tion 2.1.1 and Figure 2.3 — led to increased interest in main memory databases. Plattner describes in 2011 [Pla11a, Pla11b] an in-memory database system called SanssouciDB which is tailored for business applications. SanssouciDB takes hard- ware developments such as multi-core processors and the resulting importance of the memory wall — as explained in Subsection 2.1.2 — into consideration and leverages them by allowing inter- and intra-query parallelism and exploiting cache hierarchies: an important enabler for this is the use of a columnar data layout which will be discussed in detail in the next two subsections.
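The access-method implication mentioned above — an in-memory index can hold a pointer to the indexed record instead of a copy of the indexed data — can be illustrated with a minimal C++ sketch; the record layout and names are hypothetical:

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Record {
  std::uint64_t id;
  std::string payload;
};

int main() {
  // The table itself resides permanently in main memory.
  std::vector<Record> table = {{42, "alpha"}, {7, "beta"}, {13, "gamma"}};

  // The index stores only pointers into the table, not copies of the records.
  // (The table is not modified after the index is built, so the pointers stay valid.)
  std::map<std::uint64_t, const Record*> index;
  for (const Record& r : table) {
    index[r.id] = &r;
  }

  // An index lookup dereferences the pointer; no duplicated data has to be maintained.
  auto it = index.find(13);
  if (it != index.end()) {
    std::cout << "id 13 -> " << it->second->payload << "\n";
  }
  return 0;
}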


2.2.1 Column- and Row-Oriented Data Layout

Summary: This subsection distinguishes between two physical database table layouts, namely storing all attribute values of a tuple together (row-orientation) or storing all instances of the same attribute type from different tuples together (column-orientation).

As quoted in the previous subsection, Garcia-Molina and Salem [GMS92] stated that “sequential access is not as important in main memories” in comparison to disk-resident data. While the performance penalty for non-sequential data traversal is higher when operating on disk, it also exists when accessing data that is in main memory. As described in Section 2.1 and as evaluated by Ailamaki et al. [ADHW99] or Boncz, Manegold, and Kersten [BMK99], the access latency from the processor to data in main memory is not truly random due to the memory hierarchy. Since the data travels from main memory through the caches to the processor, it is of relevance whether all the data that sits on a cache line is truly being used by the processor (cache locality) and whether a requested cache line is already present in one of the caches (temporal locality). In addition, a sequential traversal of data is a pattern that modern CPUs can detect and exploit to improve the traversal performance by loading the next cache line to be accessed while the previous cache line is still being processed: this mechanism is known as hardware prefetching (spatial locality) [HP11]. Whether these mechanisms can be exploited depends on the chosen data layout and the kind of data access thereupon. The two basic distinctions for a data layout are the n-ary storage model (NSM) and the decomposition storage model (DSM) [CK85]. In NSM, all attributes of a tuple are physically stored together, whereas in DSM the instances of the same attribute type from different tuples are stored together. In database table terms, NSM is declared as a row-oriented data layout and DSM is called a column-oriented layout. As shown in Figure 2.7, accessing a row or accessing a column can both leverage the benefits of locality if the data layout has been chosen accordingly. This has led to the classic distinction that databases which are intended for workloads that center around transaction processing and operate on a few rows at a time choose a row-oriented layout, and read-only workloads that scan over table attributes and therefore operate on columns choose a column-oriented layout [AMH08] — the next subsection is going to discuss the classification of workloads in detail.

Another aspect in the discussion of row- and column-oriented data layout is that light-weight data compression mechanisms work particularly well in a columnar data layout [AMF06]. The intention for using compression mechanisms is saving storage space and — by traversing fewer bytes for processing the same amount of information — increasing performance.


Figure 2.7: Illustration of a row-oriented data layout (N-ary Storage Model, NSM) and a column-oriented data layout (Decomposition Storage Model, DSM), each showing the access pattern for reading a column and for reading a row.

The term light-weight describes the compression on a sequence of values (e.g. in contrast to heavy-weight compression, which refers to the compression of an array of bytes) with techniques such as dictionary compression, run-length encoding (RLE), and bit-vector encoding: these mechanisms allow the processing of a query on the compressed values and it is desirable to decompress the values as late as possible — for example, before returning the result of a query to the user. Dictionary compression is appealing in a columnar data layout as the values inside a single column have the same data type and similar semantics (in contrast to the data inside a single row). The resulting low entropy inside a column can be exploited by dictionary compression in order to reduce the required storage space drastically. Additionally, bit-vector encoding can be used to further reduce the needed storage space: e.g. if a column has up to 4096 different values in total, it can be encoded with a dictionary key size of 12 bit, and the first 60 bits of an 8-byte integer can hold five different column values. Run-length encoding can be applied to columns if they contain sequences in which the same data value occurs multiple times: instead of storing each occurrence separately, RLE allows storing the value once accompanied by the number of subsequent occurrences.
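The following C++ sketch makes the preceding discussion concrete: it stores the same table once in a row-oriented layout and once as a dictionary-compressed column, and evaluates a simple predicate in both representations. Table contents and names are made up for illustration, and the sketch omits further techniques mentioned above such as bit-packing the codes or run-length encoding.

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Row-oriented layout (NSM): all attributes of a tuple are stored together.
struct Row {
  std::uint32_t customer_id;
  std::string country;
  double revenue;
};

// Column-oriented layout (DSM) with dictionary compression for the 'country' attribute:
// the column stores small integer codes, the dictionary stores each distinct value once.
struct CountryColumn {
  std::vector<std::string> dictionary;                   // code -> value
  std::unordered_map<std::string, std::uint32_t> codes;  // value -> code
  std::vector<std::uint32_t> values;                     // one code per tuple

  void append(const std::string& value) {
    auto it = codes.find(value);
    std::uint32_t code;
    if (it == codes.end()) {
      code = static_cast<std::uint32_t>(dictionary.size());
      dictionary.push_back(value);
      codes.emplace(value, code);
    } else {
      code = it->second;
    }
    values.push_back(code);
  }
};

int main() {
  std::vector<Row> rows = {{1, "DE", 10.0}, {2, "US", 20.0}, {3, "DE", 30.0}, {4, "FR", 40.0}};

  CountryColumn country;
  for (const Row& r : rows) country.append(r.country);

  // Predicate: country == "DE". In the columnar layout the predicate is evaluated
  // on the compressed codes; the string is only needed once for the dictionary lookup.
  std::uint32_t de_code = country.codes.at("DE");
  std::size_t row_matches = 0, column_matches = 0;

  for (const Row& r : rows) {                  // row scan touches every attribute of every tuple
    if (r.country == "DE") ++row_matches;
  }
  for (std::uint32_t code : country.values) {  // column scan touches only the one attribute
    if (code == de_code) ++column_matches;
  }

  std::cout << row_matches << " / " << column_matches << " matching tuples\n";
  return 0;
}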

2.2.2 Transactional vs. Analytical vs. Mixed Workload Processing

Summary: This subsection explains different database workloads and how they are being affected by the choice of the data layout. As implied in the previous subsection, database textbooks contain a distinction between online transaction processing (OLTP) and online analytical processing (OLAP) [EN10]. The term online expresses the instant processing of a query and delivering its result. The term transaction in OLTP refers to a database transaction which, in turn, has been named after the concept of a business or

commercial transaction. Typical examples for OLTP applications are bank teller processing, airline reservation processing or insurance claims processing [BN97]. OLTP workloads are characterized by operating on a few rows per query with an equal mix of read and write operations. The term analytical in OLAP describes the intent to perform analysis on data. These analyses consist of ad-hoc queries which are, for example, issued to support a decision. Systems that are designed for handling OLAP workloads are also referred to as decision support systems (DSS) [Tur90]. An OLAP workload is characterized by executing aggregations over the values of a database table issued by read-mostly queries [Tho02]. OLTP and OLAP work on semantically the same data (e.g. sales transactions are recorded by an OLTP system and then analyzed with an OLAP system), yet they are typically separate systems. Initially, analytical queries were executed against the transactional system, but at the beginning of the 1990s large corporations were no longer able to do so as the performance of the transactional system was not good enough for handling both workloads at the same time. This led to the introduction of OLAP and the separation of the two systems [CCS93]. However, the resulting duplication and separation of data introduced a series of drawbacks. First, the duplication and transformation of data from the transactional to the analytical system (known as extract, transform, and load (ETL) [KC04]) requires additional processing for data denormalization and rearranging. Second, since the execution of the ETL procedure happens only periodically (e.g. once a day), the analytical processing happens on slightly outdated data. Third, the provisioning of a dedicated system for analytical processing requires additional resources. As a consequence, there are two motivational factors for a reunification of OLTP and OLAP systems as proposed by Plattner [Pla09]: First, the elimination of the previously explained drawbacks resulting from the separation and second, the support of applications which cannot be clearly assigned to one of the workload categories, but expose a mix of many analytical queries and some transactional queries. Tinnefeld et al. [Tin09, KTGP10] elaborate that especially business applications, such as the available-to-promise (ATP) application [TMK+11], expose a mixed workload. An ATP application evaluates if a requested quantity of a product can be delivered to a customer at a requested date. This is done by aggregating and evaluating stock levels and to-be-produced goods with analytical queries, followed by transactional queries upon the reservation of products by the customer. In order to find a common database approach for OLTP and OLAP, it is logical to reevaluate if the previously mentioned workload characterizations are accurate [Pla09]. Krüger et al. [KKG+11] evaluated the query processing in twelve enterprise resource planning (ERP) systems from medium- and large-sized companies with an average of 74,000 database tables per customer system.

Figure 2.8: Comparison of query distribution in analyzed business customer systems and a database benchmark (comparison by and figure taken from Krüger et al. [KKG+11]); (a) query distribution in the analyzed customer systems, (b) query distribution in the TPC-C benchmark. The comparison shows that OLTP workloads on customer enterprise resource planning systems are also dominated by read operations, in contrast to the common understanding that OLTP exposes an equal mix of read and write operations (as e.g. implied by the TPC-C benchmark [Raa93]).

As shown in Figure 2.8(a), the distribution of queries in the OLAP category is as expected: over 90% of the queries are read operations dominated by range select operations. The OLTP queries, however, also consist of over 80% read queries dominated by lookup operations. Only 17% of the overall queries result in write operations: a contradiction to the common understanding that OLTP consists of an equal mix of read and write queries. This misconception can be visualized by looking at the query distribution of the TPC-C benchmark (which is the standard benchmark for OLTP systems [Raa93]) as shown in Figure 2.8(b): almost 50% of the queries are write operations, while range select and table scan operations are not included at all.
The drawbacks of the separation of OLTP and OLAP, the missing support for applications that expose a mixed workload, and the misconception of the nature of OLTP in the first place drove Plattner to design a common database approach for OLTP and OLAP [Pla09, Pla11b, Pla11a]: SanssouciDB provides a common database for OLTP and OLAP and provides adequate performance by leveraging modern hardware technology: mainly the storage of all data in main memory and the utilization of multi-core processors. Although it supports a column- and row-oriented data layout, it puts heavy emphasis on the use of a columnar layout as i) it is the best match for OLAP workloads, ii) it has been shown that even OLTP applications have a portion of over 80% read queries, and iii) also mixed workloads



are dominated by analytical queries.

2.2.3 State-of-the-Art In-Memory Database Management Systems

Summary: This subsection lists state-of-the-art in-memory database management systems.

In this subsection we present a selection of academic and industry in-memory database management systems which are considered state-of-the-art, in alphabetical order. We describe their main characteristics, starting with the systems from academia:

• HyPer (Technical University of Munich) [KNI+11, Neu11] is a hybrid main memory database system that supports a row and columnar data layout with the goal of supporting OLTP and OLAP workloads. To guarantee good performance for both workloads simultaneously, HyPer creates snapshots of the transactional data with the help of hardware-assisted replication mechanisms. Data durability is ensured via logging onto a non-volatile storage; high-availability can be achieved by deploying a second hot-standby server. The separate data snapshots for OLTP and OLAP workloads allow conflict-free multi-threaded query processing as well as the deployment to several servers to increase the OLAP throughput [MRR+13].

• HYRISE (Hasso-Plattner-Institut) [GKP+10, GCMK+12] is a main memory storage engine that provides dynamic vertical partitioning of the tables it stores. This means that fragments of a single table can either be stored in a row- or column-oriented manner with the intention of supporting OLTP, OLAP, and mixed workloads. HYRISE features a layout algorithm based on a main memory cost model in order to find the best hybrid data layout for a given workload.

• MonetDB (Centrum Wiskunde & Informatica) [BGvK+06, BKM08a] is a main memory columnar database management system that is optimized for the bandwidth bottleneck between CPU and main memory. The optimizations include cache-conscious algorithms, data compression, and the modeling of main memory access costs as an input parameter for query optimization. MonetDB is purely intended for executing OLAP workloads and does not support transactions or durability.

The following main memory database systems from industry are presented:


• IBM solidDB [MWV+13] is a relational database management system that can either be deployed as an in-memory cache for traditional database systems, such as IBM's DB2 [HS13], or as a stand-alone database. In both cases, it exposes an SQL interface to applications. When deployed as a stand-alone database it offers an in-memory as well as a disk-based storage engine. The in-memory engine uses a trie data structure for indexing, where the nodes in the trie are optimized for modern processor cache sizes. The trie nodes point to data stored consecutively in arrays in main memory. When using the in-memory storage, snapshot-consistent checkpointing [WH 1] to disk is used for ensuring durability. IBM is positioning solidDB as a database for application areas such as banking, retail, or telecom [MWV+13].

• Microsoft SQL Server has two components which are tailored for in-memory data management: the in-memory database engine Hekaton [DFI+13], which is optimized for OLTP workloads, and xVelocity [Doc13], which is a columnstore index and an analytics engine designed for OLAP workloads. Hekaton stores data either via lock-free hash tables [Mic02] or via lock-free B-trees [LLS13]. In order to improve transactional throughput, Hekaton is able to compile requests to native code. xVelocity offers an in-memory columnar index — not a memory resident columnar storage — that supports data compression and can be utilized by the xVelocity analytics engine to provide analytical capabilities in conjunction with Microsoft Excel. However, Microsoft SQL Server also offers an updateable columnar storage engine [LCF+13] which stores its data on SSD/disk. Microsoft SQL Server is positioned as a general-purpose database.

• SAP HANA [FCP+12] is an in-memory database that supports row- and column-oriented storage in a hybrid engine for supporting OLTP, OLAP as well as mixed workloads (see Subsection 2.2.2). In addition, it features a graph and text processing engine for semi- and unstructured data management within the same system. HANA is mainly intended to be used in the context of business applications and can be queried via SQL, SQL Script, MDX, and other domain-specific languages. It supports multiversion concurrency control and ensures durability by logging to SSD. A unique aspect of HANA is its support of transactional workloads via the column store [KGT+10]: the highly compressed column store is accompanied by an additional write-optimized buffer called delta store. The content of the delta is periodically merged into the column store (a simplified sketch of this delta/main pattern follows after this list). This architecture provides both fast read and write performance.

• VoltDB [SW13] is an in-memory database designed for OLTP workloads and implements the design of the academic H-Store project [KKN+08]. VoltDB persists its data in a row-oriented data layout in main memory and applies


checkpointing and transaction logging for providing durability. It boosts transactional throughput by analyzing transactions at compile time, and compiles them as stored procedures which are invoked by the user at run- time with individual parameters. It is designed for a multi-node setup where data is partitioned horizontally and replicated across nodes to provide high availability. VoltDB is relatively young (first release in 2010) and positions itself as a scalable database for transaction processing.
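As referenced in the SAP HANA item above, the following simplified C++ sketch illustrates the general delta/main pattern: writes are appended to an uncompressed, write-optimized delta, which is periodically merged into a read-optimized, dictionary-compressed main store. All names and the merge procedure are illustrative simplifications and do not reflect HANA's actual interfaces or algorithms.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Read-optimized "main": a sorted dictionary plus integer codes per row.
struct MainStore {
  std::vector<std::string> dictionary;   // sorted, each distinct value stored once
  std::vector<std::uint32_t> codes;      // one code per row
};

// Write-optimized "delta": new values are simply appended, uncompressed.
struct DeltaStore {
  std::vector<std::string> values;
};

// Merge: rebuild the main store from the decompressed main plus the delta contents.
MainStore merge(const MainStore& main, const DeltaStore& delta) {
  std::vector<std::string> all;
  for (std::uint32_t code : main.codes) all.push_back(main.dictionary[code]);
  all.insert(all.end(), delta.values.begin(), delta.values.end());

  MainStore merged;
  merged.dictionary = all;
  std::sort(merged.dictionary.begin(), merged.dictionary.end());
  merged.dictionary.erase(std::unique(merged.dictionary.begin(), merged.dictionary.end()),
                          merged.dictionary.end());
  for (const std::string& v : all) {
    auto it = std::lower_bound(merged.dictionary.begin(), merged.dictionary.end(), v);
    merged.codes.push_back(static_cast<std::uint32_t>(it - merged.dictionary.begin()));
  }
  return merged;
}

int main() {
  MainStore main;                                // starts out empty
  DeltaStore delta;
  delta.values = {"banana", "apple", "banana"};  // fast, append-only writes

  main = merge(main, delta);                     // periodic merge makes the new values read-optimized
  delta.values.clear();

  std::cout << main.dictionary.size() << " distinct values, "
            << main.codes.size() << " rows\n";   // prints: 2 distinct values, 3 rows
  return 0;
}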

2.3 Parallel Database Management Systems

Summary: This section introduces parallel database management systems which are a variation of distributed database management systems with the intention to execute a query as fast as possible. As mentioned in Chapter 1, Özsu and Valduriez [ÖV11] define a distributed database as

“a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system is then defined as the software system that permits the management of the distributed database system and makes the distribution transparent to the users.”

The two important terms in these definitions are logically interrelated and distributed over a computer network. Özsu and Valduriez mention that the typical challenges tackled by distributed database systems include, for example, the aspect of data placement, distributed query processing, distributed concurrency control and deadlock management, ensuring reliability and availability, as well as the integration of heterogeneous databases. The term distributed over a computer network makes no assumption whether the network is a wide area or local area network. A database system that is running on a set of nodes which are connected via a fast network inside a cabinet or inside a data center can be classified as a parallel database system. According to Özsu and Valduriez [ÖV11] this can be seen as a revision and extension of a distributed database system. According to DeWitt and Gray [DG92], a parallel database system exploits the parallel nature of an underlying computing system in order to provide high-performance and high-availability.



Figure 2.9: Three different parallel DBMS architectures: (a) shared-memory, (b) shared-disk, and (c) shared-nothing.

2.3.1 Shared-Memory vs. Shared-Disk vs. Shared-Nothing

Summary: This subsection introduces and compares different parallel database management system architectures and reflects those in the context of main memory databases. As briefly summarized in Chapter 1, one fundamental and much debated as- pect of a parallel DBMS is its architecture. The architecture influences how the available hardware resources are shared and interconnected. As shown in Fig- ure 2.9, there are three different parallel DBMS textbook architectures [ÖV11, DG92, DMS13]: shared-memory, shared-storage (or shared-disk or shared-data), and shared-nothing: Shared-memory (or shared-everything) (see Figure 2.9(a)) is an archi- tectural approach where all processors share direct access to any main memory module and to all disks over an interconnection. Examples for shared-memory DBMS are DBS3 [BCV91] and Volcano [Gra94b]. Shared-memory provides the advantages of an architecturally simple solution: there is no need for complex dis- tributed locking or commit protocols as the lock manager and buffer pool are both stored in the memory system where they can be accessed uniformly by all proces- sors [ERAEB05, DMS13]. In addition, the shared-memory approach is great for parallel processing: inter-query parallelism is an inherent property as all proces- sors share all underlying resources, which means that any query can be executed by any processor. Intra-query parallelism can also be easily achieved due to the shared resources. Shared-memory has two major disadvantages: the extensibility is limited as all the communication between all resources goes over a shared inter- connection. For example, a higher number of processors causes conflicts with the shared-memory resource which degrades performance [TS90]. The biggest draw- back of shared-memory is its limited availability. Since the memory space is shared

by all processors, a memory fault will affect many processors and potentially lead to a corrupted or unavailable database. Although the research community addressed this problem by work on fault-tolerant shared-memory [SMHW02], the shared-memory architecture never had much traction outside academic work and had its peak in the nineties (in terms of available products in industry and published research papers).

Shared-storage (or shared-disk or shared-data) (see Figure 2.9(b)) is an architectural approach where processors each have their own memory, but they share access to a single collection of disks. The term shared-disk is a bit confusing in the way that it suggests that rotating disks are an integral part. This is not the case, but hard drive disks were the commonly used storage device when the term was coined. Nowadays a shared-storage architecture can be realized by storing data on disk, SSD, or even keeping it main memory resident (e.g. see Texas Memory Systems RamSan-440 [ME13] or RAMCloud [OAE+11]), typically in the form of a storage area network (SAN) or a network-attached storage (NAS). However, each processor in a shared-storage approach can copy data from the database into its local memory for query processing. Conflicting access can be avoided by global locking or protocols for maintaining data coherence [MN92]. Examples for shared-storage systems are IBM IMS [KLB+12] or Sybase IQ [Moo11]. Shared-storage brings the advantage that it is very extensible, as an increase in the overall processing and storage capacity can be achieved by adding more processors or disks, respectively. Since each processor has its own memory, interference on the disks can be minimized. The coupling of main memory and processor isolates a memory module from other processors, which results in better availability. As each processor can access all data, load-balancing is trivial (e.g. distributing load in a round-robin manner over all processors). The downsides are an increased coordination effort between the processors in terms of distributed database system protocols, and the shared disks becoming a bottleneck (similar to the shared-memory approach, sharing a resource over an interconnection is always a potential bottleneck).

Shared-nothing (see Figure 2.9(c)) is an architectural approach where each memory and disk is owned by some processor which acts as a server for that data. The Gamma Database Machine Project [DGS+90] or VoltDB [SW13] are examples for shared-nothing architectures. The biggest advantage of a shared-nothing architecture is reducing interferences and resource conflicts by minimizing resource sharing. Operating on the data inside a local machine allows operating with full raw memory and disk performance; high-availability can be achieved by replicating data onto multiple nodes. With careful partitioning across the different machines, a linear speed-up and scale-up can be achieved for simple workloads [ÖV11]. A shared-nothing architecture also has its downsides: the complete decoupling of all resources introduces a higher complexity when implementing distributed database

functions (e.g. providing high-availability). Load balancing also becomes more difficult as it is highly dependent on the chosen partition criteria, which bases load balancing on data location rather than on the actual load of the system. This also impacts the extensibility, as adding new machines requires a reevaluation and potentially a reorganization of the existing data partitioning. When comparing the different architectures one can conclude that there is no better or worse and no clear winner. The different architectures simply offer different trade-offs with various degrees of trading resource sharing against system complexity. Consequently, this is a much debated topic. For example, Rahm [Rah93] says that

“A comparison between shared-disk and shared-nothing reveals many potential benefits for shared-disk with respect to parallel query process- ing. In particular, shared-disk supports more flexible control over the communication overhead for intratransaction parallelism, and a higher potential for dynamic load balancing and efficient processing of mixed OLTP/query workloads.”

DeWitt, Madden, and Stonebraker [DMS13] argue that

“Shared-nothing does not typically have nearly as severe bus or resource contention as shared-memory or shared-disk machines, shared-nothing can be made to scale to hundreds or even thousands of machines. Be- cause of this, it is generally regarded as the best-scaling architecture. Shared-nothing clusters also can be constructed using very low-cost commodity PCs and networking hardware.”

Hogan [Hog13] summarizes his take on the discussion with

“The comparison between shared-disk and shared-nothing is analogous to comparing automotive transmissions. Under certain conditions and, in the hands of an expert, the manual transmission provides a modest performance improvement ... Similarly, shared-nothing can be tuned to provide superior performance ... Shared-disk, much like an automatic transmission, is easier to set-up and it adjusts over time to accommo- date changing usage patterns.”

When reviewing this discussion in the context of parallel main memory databases, there is a clearer picture: most popular systems such as MonetDB, SAP HANA or VoltDB use a shared-nothing architecture and not a shared-storage approach.


In the past, there was a big performance gap between accessing main memory in- side a machine and in a remote machine. Consequently, a performance advantage that has been achieved by keeping all data in main memory should not vanish by sending much data over a substantially slower network interconnect. As shown in Subsection 2.1.3, the performance gap between local and remote main memory access performance is closing, which paves the way for discussing a shared-storage architecture for a main memory database as covered by this thesis.

2.3.2 State-of-the-Art Parallel Database Management Systems

Summary: This subsection presents state-of-the-art shared-storage and shared- nothing parallel database management systems. In this subsection we present a selection of disk-based and main memory-based as well as shared-storage and shared-nothing parallel database management systems and the different variations thereof. Some of the systems were previously intro- duced in the thesis, but this subsection focuses on their ability to be deployed on several servers. We start with shared-storage parallel database management systems:

• IBM DB2 pureScale [IBM13] is a disk-based shared-storage solution for IBM's row-oriented DB2 [HS13]. It allows the creation of a parallel DBMS consisting of up to 128 nodes where each node is an instance of DB2 and all nodes share access to a storage system that is based on IBM's General Parallel File System (GPFS) [SH02]. Each database node utilizes local storage for caching or maintaining a bufferpool. In addition to the local bufferpools per node, there is also a global bufferpool that keeps a record of all pages that have been updated, inserted or deleted. This global bufferpool is used in conjunction with a global lock manager: before any node can, for example, make an update, it has to request the global lock. After data modification, the global lock manager invalidates all local copies of the respective page in the local memory of the nodes. PureScale offers so-called cluster services which, for example, orchestrate the recovery process in the event of a downtime. GPFS stores the data in blocks of a configured size and also supports striping and mirroring to ensure high-availability and improved performance.

• MySQL on MEMSCALE [MSFD11a] [MSFD11b] is a shared-storage deployment of MySQL on MEMSCALE, which is a hardware-based shared-memory system that claims to scale gracefully by not sharing cores nor caches, and therefore working without a coherency protocol. This approach


uses the main memory storage engine of MySQL (known as heap or memory) and replaces the common malloc by a library function that allocates memory from the MEMSCALE shared-memory. As a consequence, all properties of MySQL are still present so that queries can be executed in a multithreaded, ACID compliant manner with row-level locking.

• MySQL on RamSan [ME13] is a shared-storage solution where MySQL utilizes the storage space provided by a storage area network called RamSan that keeps all data resident in main memory. RamSan was originally developed by Texas Memory Systems (now acquired by IBM). It acts as a traditional storage area network, but depending on the configuration keeps all data in main memory modules such as DDR, and uses additional SSDs for non-volatile storage. RamSan provides the advantage that it is transparent to a database system, such as MySQL, that main memory-based storage is being used, but incorporates the downside that data access is not optimized for main memory.

• ScaleDB [Sca13] implements a shared-storage architecture that consists of three different entities, namely database nodes, storage nodes, and a cluster manager. Each database node runs an instance of MySQL server that is equipped with a custom ScaleDB storage engine by utilizing the MySQL feature of pluggable storage engines [MyS13a]. That results in a multithreaded, disk-based, ACID compliant engine that supports row-level locking and operates in read committed mode. All database nodes share common access to the data inside the storage nodes. ScaleDB manages the data in block-structured files with each individual file being broken into blocks of a fixed size. These blocks are stored twice for providing high-availability and are distributed randomly across the storage nodes. In addition, ScaleDB also features a cluster manager, a distributed lock manager that synchronizes lock requests of different nodes.

• Sybase IQ [Syb13a] is a shared-storage, columnar, relational database system that is mainly used for data analytics. As depicted in the Sybase IQ 15 sizing guide [Syb13b], a set of database nodes accesses a commonly shared storage that holds all data. Among the database nodes is a primary server (or coordinator node) that manages all global read-write transactions and maintains the global catalog and metadata. In order to maximize the throughput when operating on the shared storage system, Sybase IQ stripes data with the intention of utilizing as many disks in parallel as possible.

The following shared-nothing parallel database management systems are pre- sented:


• C-Store [SAB+05] and its commercial counterpart Vertica [LFV+12] are disk-based columnar parallel DBMSs based on a shared-nothing architecture. The data distribution in Vertica is done by splitting tuples among nodes by a hash-based ring style segmentation scheme. Within each node, tuples are physically grouped into local segments which are used as a unit of transfer when nodes are being added to or removed from the cluster in order to speed up the data transfer. Vertica allows defining how often a data item is being replicated across the cluster (called k-safety), thereby realizing high-availability, and allows the operator of the cluster to set the desired tradeoff between hardware costs and availability guarantees.

• MySQL Cluster [DF06] enables MySQL (MySQL [MyS13b] is one of the most popular open-source relational DBMSs) to be used as a shared-nothing parallel DBMS. MySQL Cluster partitions data across all nodes in the system by hash-based partitioning according to the primary key of a table. The database administrator can choose a different partitioning schema by specifying another key of a table as partitioning criteria. In addition, data is synchronously replicated to multiple nodes for guaranteeing availability. Durability is ensured in a way that each node writes logs to disk in addition to checkpointing the data regularly. In a MySQL cluster, there are three different types of nodes: a management node that is used for configuration and monitoring of the cluster, a data node which stores parts of the tables, and a SQL node that accepts queries from clients and automatically communicates with all other nodes that hold a piece of the data needed to execute a query.

• Postgres eXtensible Cluster (Postgres-XC) [Pos13] [BS13] is a solution to deploy PostgreSQL as a disk-based, row-oriented, shared-nothing parallel DBMS. A Postgres-XC cluster consists of three different types of entities: a global transaction manager, a coordinator, and datanodes. The global transaction manager is a central instance in a Postgres-XC cluster and enables multi-version concurrency control (e.g. by issuing unique transaction IDs). The coordinator accepts queries from an application and coordinates their execution by requesting a transaction ID from the global transaction manager, determining which datanodes are needed for answering a query and sending them the respective part of the query. The overall data is partitioned across datanodes where each datanode executes (partial) queries on its own data. Data can also be replicated across the datanodes in order to provide high-availability.

• SAP HANA [FML+12] is an in-memory parallel DBMS based on a shared-nothing architecture [LKF+13]. A database table in HANA can be split by applying round-robin, hash- or range-based partitioning strategies: the database administrator can assign the resulting individual partitions to


individual HANA nodes either directly or based on the suggestions of automated partitioning tools. There are two different types of nodes: a coordinator node and a worker node. A database query issued by a client is sent to a coordinator node first. A coordinator is responsible for compiling a distributed query plan based on data locality or issuing global transaction tokens. The query plan then gets executed by a set of worker nodes where each worker node operates on its local data. HANA also features a distributed cost-based query optimizer that lays out the execution of single database operators which span multiple worker nodes. High-availability is ensured by hardware redundancy which allows providing a stand-by server for a worker node that takes over immediately if the associated node fails [SAP13]. Durability is ensured by persisting transaction logs, savepoints, and snapshots to SSD or disk in order to recover from host failures or to support the restart of the complete system.

• MonetDB [BGvK+06] is an in-memory columnar DBMS that can be deployed as a shared-nothing cluster. As MonetDB is a research project, it comes with the necessary primitives such as networking support and setting up a cluster by connecting individual nodes each running MonetDB. This foundation can be used to add shared-nothing data partitioning features and distributed query optimizations for running data analytics [MC13]. MonetDB is also used as a platform for researching novel distributed data processing schemes: for example, the Data Cyclotron [GK10] project creates a virtual network ring based on RDMA-enabled network facilities where data is perpetually being passed through the ring, and individual nodes pick up data fragments for query processing.

• Teradata Warehouse [Ter13] [CC11] is a shared-nothing parallel database system used for data analytics. The architecture consists of three major component types: a parsing engine, an access module processor (AMP), and the BYNET framework. The parsing engine accepts queries from the user, creates query plans and distributes them over the network (BYNET framework) to the corresponding access module processors. An AMP is a separate physical machine that runs an instance of the Teradata Warehouse database management system which solely operates on the disks of that machine. The disks inside an AMP are organized in redundant arrays to prevent data loss. The data is partitioned across all AMPs by a hash partitioning schema in order to distribute the data evenly and reduce the risk of bottlenecks. Teradata Warehouse allows configuring dedicated AMPs for hot standby which can seamlessly take over in the case of a failure of an active AMP.

• VoltDB [SW13] is a row-oriented in-memory database that can be deployed as a shared-nothing parallel database system. In such a cluster, each server


runs an instance of the VoltDB database management system. Tables are partitioned across nodes by hashing the primary key values (a minimal sketch of such hash-based partitioning follows after this list). In addition, tables can also be replicated across nodes for performance and high-availability. For ensuring high-availability, three mechanisms are in place: k-safety, which allows specifying the number k of data replicas in the cluster; network fault detection, which evaluates in the case of a network fault which side of the cluster should continue operation based on the completeness of the data; and live node rejoin, which allows nodes, when they restart after a failure, to be reintroduced to the running cluster and retrieve their copy of the data. Durability is ensured via snapshots to disks in intervals and command logging.
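Several of the shared-nothing systems listed above (e.g. MySQL Cluster and VoltDB) distribute tuples by hashing a partitioning key and mapping the hash value to a node. The following C++ sketch shows this idea in its simplest form; the modulo-based node assignment and all names are illustrative assumptions.

#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

// Assigns a tuple to one of n nodes based on the hash of its partitioning key.
std::size_t node_for_key(std::uint64_t primary_key, std::size_t node_count) {
  return std::hash<std::uint64_t>{}(primary_key) % node_count;
}

int main() {
  const std::size_t node_count = 4;
  std::vector<std::vector<std::uint64_t>> nodes(node_count);     // key lists per node

  for (std::uint64_t key = 1; key <= 12; ++key) {
    nodes[node_for_key(key, node_count)].push_back(key);         // route each tuple to its node
  }

  for (std::size_t n = 0; n < node_count; ++n) {
    std::cout << "node " << n << " stores " << nodes[n].size() << " tuples\n";
  }
  return 0;
}

The sketch also hints at the extensibility issue discussed in Subsection 2.3.1: changing the node count changes the key-to-node mapping, so adding servers typically requires repartitioning existing data.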

2.3.3 Database-Aware Storage Systems

Summary: This subsection introduces database-aware storage systems which are storage systems that support the execution of database operations in order to reduce network communication.

The previous discussion of shared-storage versus shared-nothing architectures describes that each architecture has its advantages: one advantage of a shared-nothing architecture is that a processor performs data operations on the data that is inside the same machine without the need for transferring the data over a network first. Database-aware storage systems [RGF98, Kee99, SBADAD05] aim at bringing that advantage to shared-storage systems by executing database operators directly in the storage system. This approach is based on the idea of active storage/active disks/intelligent disks [AUS98, KPH98] where the computational power inside the storage device is being used for moving computation closer to the data. The American National Standards Institute (ANSI) Object-based Storage Device (OSD) T10 standard describes a command set for the Small Computer System Interface (SCSI) [sta04] that allows communication between the application and the storage system. This, in turn, is the foundation for the work of Raghuveer, Schlosser, and Iren [RSI07] who use this OSD interface for improving data access for a database application by making the storage device aware of relations, in contrast to just returning blocks of data. However, they do not support the execution of application/database code inside the storage device. This is done in the Diamond project [HSW+04], as it applies the concept of early discard in an active storage system. Early discard describes the rejection of data that is to be filtered out as early as possible. Diamond supports the processing of such filters inside an active disk and makes sure that data which is not requested is discarded before it reaches the application, such as a database system. Riedel, Gibson, and Faloutsos [RGF98] evaluate the usage of active disks for a different set of application scenarios

including nearest neighbor search in a database and data mining of frequent sets. They conclude that the processing power of a disk drive will always be inferior to server CPUs, but that a storage system usually consists of a lot of disks resulting in the advantage of parallelism, and that combining their computational power with the processors inside the server results in a higher total processing capacity. In addition, the benefit of filtering in the disk reduces the load on the interconnect and brought significant performance advantages for their application scenarios. An example for a current system that exploits the possibilities of database-aware storage systems is Oracle's Exadata which combines a database and a storage system inside a single appliance: the Exadata storage system supports Smart-Scan and Offloading [SBADAD05]. Smart-Scan describes the possibility to execute column projections, predicate filtering, and index creation inside the storage system. Offloading enables the execution of more advanced database functions inside the storage system, such as simple joins or the execution of mathematical or analytical functions.
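The principle behind database-aware storage — shipping a predicate to the storage system so that only qualifying data travels over the interconnect — can be sketched with a hypothetical interface in C++. It merely illustrates the idea of predicate push-down and is not the API of Exadata or any other product mentioned above.

#include <cstdint>
#include <functional>
#include <iostream>
#include <utility>
#include <vector>

// A storage node that can either return raw data or evaluate a predicate locally.
class StorageNode {
 public:
  explicit StorageNode(std::vector<std::int64_t> column) : column_(std::move(column)) {}

  // Conventional path: the database system fetches all data and filters it itself.
  const std::vector<std::int64_t>& read_all() const { return column_; }

  // Database-aware path: the predicate is executed inside the storage system,
  // so only matching values are transferred to the database system.
  std::vector<std::int64_t> scan(const std::function<bool(std::int64_t)>& predicate) const {
    std::vector<std::int64_t> result;
    for (std::int64_t v : column_) {
      if (predicate(v)) result.push_back(v);
    }
    return result;
  }

 private:
  std::vector<std::int64_t> column_;
};

int main() {
  StorageNode node({5, 42, 7, 99, 13});

  // Without push-down, 5 values cross the interconnect; with push-down, only 2 do.
  auto pulled = node.read_all();
  auto pushed = node.scan([](std::int64_t v) { return v > 20; });

  std::cout << "transferred without push-down: " << pulled.size()
            << ", with push-down: " << pushed.size() << "\n";
  return 0;
}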

2.3.4 Operator Placement for Distributed Query Processing

Summary: This subsection describes and discusses query, hybrid, and data shipping, which are three different approaches how the resources from client and server can be utilized for processing a query in a distributed database management system.

When using a database-aware storage system, the resources from the storage system, as well as the resources from the DBMS, can be utilized for query processing. In the field of distributed query processing, this problem is classified as the exploitation of client resources in the context of client-server database systems. As this problem originated in a setting where the client is an application and the DBMS acts as a server, the remainder of this section presents a detailed description of the problem including the related work and discusses it in conjunction with database-aware storage systems, where the DBMS is the client and the server is a storage system. Tanenbaum explains the client-server model by stating that “a client process sends a request to a server process which then does the work and sends back the answer” [Tan07]. Based on that definition, Kossmann gives in his seminal paper The State of the Art in Distributed Query Processing [Kos00] (which is in form and content the blueprint for the remainder of this subsection) a description of the client resource exploitation problem:

“The essence of client-server computing is that the database is persistently stored by server machines and that queries are initiated at client



Figure 2.10: Illustration of query, data, and hybrid shipping (Figure taken from [Kos00]).

machines. The question is whether to execute a query at the client machine at which the query was initiated or at the server machines that store the relevant data. In other words, the question is whether to move the query to the data (execution at servers) or to move the data to the query (execution at clients).”

This results in the following three approaches:

Query shipping (see Figure 2.10(a)) is an approach where the client sends the query in its entirety to the server, the server processes the query and sends back the result. This is the approach that is typical, for example, for relational DBMSs such as Microsoft SQL Server or IBM DB2.

Data shipping (or data pull) (see Figure 2.10(b)) is the opposite solution where the client consumes all the needed data from the server and then processes the query locally. This results in an execution of the query where it originated. Object-oriented databases often work after the data shipping principle as the client consumes the objects as a whole and then does the processing.

Hybrid shipping (see Figure 2.10(c)) is a combination of the two previous approaches. As shown by Franklin, Jónsson, and Kossmann [FJK96], hybrid shipping allows executing some query operators at the server side, but also pulling data to and processing it at the client side. As shown in Figure 2.10(c), data region A is being pulled and scanned at the client's side. Data region B is scanned at the server's side, and the results are then transferred to the client where they are joined with the results from the scan on data region A.


Database Operator                        Query Shipping                      Data Shipping            Hybrid Shipping
display                                  client                              client                   client
update                                   server                              client                   client or server
binary operators (e.g. join)             producer of left or right input     consumer (i.e. client)   consumer or producer of left or right input
unary operators (e.g. sort, group-by)    producer                            consumer (i.e. client)   consumer or producer
scan                                     server                              client                   client or server

Table 2.2: Site selection for different classes of database operators for query, data, and hybrid shipping (taken from [Kos00]).

The three different approaches imply a number of design decisions when creating a model that supports the decision where the execution of each individual database operator that belongs to a query is being placed. Those decisions are: a) the site selection itself which determines where each individual operator is being executed, b) where to decide the site selection, c) what parameters should be considered when doing the site selection, and d) when to determine the site selection. Site selection in conjunction with the client resource exploitation problem says that each individual database operator has a site annotation that indicates where this operator is going to be executed [Kos00]. As shown in Table 2.2, different annotations are possible per operator class: display operations return the result of a query, which always has to happen at the client. All other operators in a data shipping approach are executed at the client site as well or, in other words, at the site where the data is consumed. Query shipping executes all operators at the server site or where the data is produced (e.g. by the execution of a previous operator). Hybrid shipping supports both annotations. The question where to make the site selection depends on several factors, most notably the number of servers in the system: if there is only one server, it makes sense to let the server decide the site selection as it knows its own current load [HF86]. If there are many servers, there might be no or only little information gained by executing it at the server site as a single server has no complete knowledge of the system. In addition, the site selection itself is also an operation that consumes resources which can be scaled with the number of clients if executed at the client. What information is considered for the site selection depends on the nature of the site selection algorithm. For example, for a cost-based approach one might want to consider information about the database operation itself, such as the amount of data to be

40 Section 2.3: Parallel Database Management Systems traversed or the selectivity, the hardware properties of the client, the server, as well as the interconnect or information about the current load of the client and/or the server. If the site selection algorithm is a heuristic, simple information such as the class of the current database operation and the selectivity might be sufficient (e.g. executing a scan operator at the server site if the selectivity is greater than x). Three different approaches are possible with regards to deciding when site selection occurs: a static, a dynamic, and a two-step optimization approach. A static approach can be chosen if the queries are known and compiled at the same time when the application itself is being compiled. This allows also to decide on the site selection at compile time. Only in exceptional situations, such as a change in the predetermined queries, doesa reevaluation of the site selection occur [CAK+81]. This approach works well if queries and workload are static, but performs poorly with a fluctuating or changing workload. A simple dynamic approach generates alternative site selection plans at design time and chooses the site selection plan during query execution that for example best matches the assumptions about the current load of the system. If that repeatedly results in poor performance, a new set of site selection plans can be generated [CG94]. This can be especially useful if certain servers are not responsive [UFA98] or if the initial assumptions of e.g. the sizes of intermediate results turned out to be wrong [KD98]. Two- step optimization is a more advanced dynamic approach that determines the site selection just before the query execution. This is achieved by a decomposition of the overall query execution into two steps [Kos00]. First, a query plan is generated that specifies the join order, join methods, and access paths. This first step has the same complexity as the query plan creation in a centralized system. Second, right before executing the query plan, it is determined where every operator is being executed. As the second step only carries out the site selection for a single query at a time, its complexity is reasonably low and can also be done during query execution. As a consequence, it is assumed that this approach reduces the overall complexity of distributed query optimization and is hereby popular for distributed and parallel database systems [HM95, GGS96]. The advantages of the two-step optimization are the aforementioned low complexity and the ability to consider the current state of the system at the time of query execution which can be used for workload distribution [CL86] and to exploit caching [FJK96]. The downside of decoupling query plan creation and site selection is ignoring the placement of data across servers during query plan creation as it might result in a site selection plan with unnecessary high communication costs [Kos00].
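The heuristic mentioned above can be made concrete with a small sketch. The following C++ fragment is purely illustrative: the operator classes, the selectivity convention (following the text, a scan is shipped to the server if its selectivity is greater than a threshold x), and all names are assumptions of this example, not part of [Kos00] or of the systems described later in this thesis.

#include <iostream>

enum class Site { Client, Server };
enum class OperatorClass { Display, Update, Scan, Join, Sort };

struct OperatorInfo {
    OperatorClass cls;
    double selectivity;   // convention follows the text above (assumption)
};

// Heuristic site annotation in the spirit of hybrid shipping: display always runs
// at the client, updates at the server, and a scan is shipped to the server if its
// selectivity exceeds the threshold x.
Site annotateSite(const OperatorInfo& op, double x = 0.5) {
    switch (op.cls) {
        case OperatorClass::Display: return Site::Client;
        case OperatorClass::Update:  return Site::Server;
        case OperatorClass::Scan:    return op.selectivity > x ? Site::Server : Site::Client;
        default:                     return Site::Server;  // join/sort at the producer's site
    }
}

int main() {
    OperatorInfo scan{OperatorClass::Scan, 0.8};
    std::cout << (annotateSite(scan) == Site::Server ? "server" : "client") << "\n";
}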

The previous explanations are a general take on the client resource exploitation problem and the resulting site selection problem. They include the underlying assumption that every operator in a query can be executed at the client or the server. However, this might not be the case, as not all database operators are available at both sites.


[Figure 2.11 contrasts two configurations: in the upper one the client is an application and the server is a relational database management system; in the lower one the client is a relational database management system and the server is a database-aware storage system. For each configuration, the figure depicts the ratio of queries eligible for query, hybrid, and data shipping.]

Figure 2.11: Depicting the ratio of queries which are eligible for query, hybrid or data shipping in correspondence with a relational DBMS either acting as server or client.

This limits the site selection scope to a subset or a set of classes of database operators, depending on which are available at both sites. In addition, the execution order of the operators according to the query plan also limits the site selection: assume, for example, that a query plan foresees that two relations are scanned and the results of the scan operations are joined, that the scan operators are available at the client and the server sites, but that the join operator is only available at the server. In this case, executing the scans at the client site results in an unreasonably high communication overhead, as both relations have to be shipped in their entirety to the client and then the scan results have to be shipped back to the server for processing the join. The availability of operators at a site depends on which kind of system acts as the client and the server. If the client is an application and the server is a relational database management system, then the entirety of database operators is available at the server site. The default mode is to ship the query to the DBMS and retrieve the result. It is possible that some operators are also available at the client site (e.g. a scan and a group-by operator). That allows hybrid shipping for queries that involve both, and makes data shipping an option for queries that only consist of these two types of operators. In such a setting, all possible queries can be executed with query shipping, a subset of them with hybrid shipping, and an even smaller subset with data shipping (as depicted in Figure 2.11). The situation is reversed when the client is a relational DBMS and the server is a database-aware storage system. If the database-aware storage system has a scan operator implemented, all queries can be executed with data shipping, queries which involve a scan can be executed via hybrid shipping, and queries solely consisting of scans have the option of using a query shipping approach.
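To make the eligibility argument concrete, the following sketch classifies a query by the operators it contains. It is a simplified illustration under assumptions of my own (operator names as plain strings, a coarse eligibility rule that ignores, e.g., the display operator); it is not an interface of any of the systems discussed here.

#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Eligibility { bool query_shipping; bool hybrid_shipping; bool data_shipping; };

// A query is eligible for query shipping if every operator it contains is available
// at the server, for data shipping if every operator is available at the client, and
// for hybrid shipping if every operator is available at at least one of the two sites.
Eligibility classify(const std::vector<std::string>& queryOperators,
                     const std::set<std::string>& clientOperators,
                     const std::set<std::string>& serverOperators) {
    bool allAtClient = true, allAtServer = true, allSomewhere = true;
    for (const std::string& op : queryOperators) {
        bool atClient = clientOperators.count(op) > 0;
        bool atServer = serverOperators.count(op) > 0;
        allAtClient = allAtClient && atClient;
        allAtServer = allAtServer && atServer;
        allSomewhere = allSomewhere && (atClient || atServer);
    }
    return { allAtServer, allSomewhere, allAtClient };
}

int main() {
    // Client: relational DBMS (all operators); server: database-aware storage with only a scan.
    std::set<std::string> client = {"scan", "join", "group-by", "sort"};
    std::set<std::string> server = {"scan"};
    Eligibility e = classify({"scan", "join"}, client, server);
    std::cout << "query=" << e.query_shipping << " hybrid=" << e.hybrid_shipping
              << " data=" << e.data_shipping << "\n";        // query=0 hybrid=1 data=1
}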


2.4 Cloud Storage Systems

Summary: This section introduces and classifies cloud storage systems and describes how they differ from traditional database and file systems.

The paradigm of cloud computing describes the provisioning of information technology infrastructure, services, and applications over the Internet. Typical characteristics are the on-demand availability of such resources, their quick adaptation to changing workloads, and billing based on actual usage. From a data management perspective, two different types of cloud storage systems have been created to manage and persist the large amounts of data created and consumed by cloud computing applications: a) so-called NoSQL² systems, including distributed key-value stores such as Amazon Dynamo [DHJ+07] or Project Voldemort, wide column stores such as Google Bigtable [CDG+06] or Cassandra [LM10], as well as document and graph stores, and b) distributed file systems such as the Google File System (GFS) [GGL03] or the Hadoop Distributed File System (HDFS) [SKRC10]. The remainder of this section explains the motivation for building these cloud storage systems and their underlying concepts, as well as how they differ from traditional database and file systems.

The previous section explains that parallel database management systems combine the resources of multiple computers to accelerate the execution of queries as well as to increase the overall data processing capacity. Such systems were greatly challenged with the advent of e-commerce and Internet-based services offered by companies such as Amazon, eBay, or Google in the mid-nineties. These companies not only grew rapidly in their overall size, they also did so in a comparatively short period of time with unpredictable growth bursts. This development placed great emphasis on the aspects of scalability and elasticity in the context of data storage and processing systems. Scalability is defined as

“a desirable property of a system, which indicates its ability to either handle growing amounts of work in a graceful manner or its ability to improve throughput when additional resources (typically hardware) are added. A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system” [AEADE11].

² There is no definitive explanation of what the abbreviation NoSQL stands for, but it is most commonly agreed that it means “not only SQL”. This term does not reject the query language SQL, but rather expresses that the design of relational database management systems is unsuitable for large-scale cloud applications [Bur10] (see Eric Brewer's CAP theorem [Bre00, Bre12] as explained later in this section).


Elasticity hardens the scalability property as it focuses on the quality of the workload adaptation process — e.g. when resources have been added — in terms of metrics such as money or time. Elasticity can be defined as

“the ability to deal with load variations by adding more resources during high load or consolidating the tenants to fewer nodes when the load decreases, all in a live system without service disruption, is therefore critical for these systems. ... Elasticity is critical to minimize operating costs while ensuring good performance during high loads. It allows consolidation of the system to consume less resources and thus minimize the operating cost during periods of low load while allowing it to dynamically scale up its size as the load increases” [AEADE11].³

³ The definitions and usage of the terms scalability and elasticity are much discussed in the computer science community as they are not strictly quantifiable [Hil90].

For example, the Internet-based retailer Amazon initially used off-the-shelf relational database management systems as the data processing backend for its Internet platform. The Amazon Chief Technology Officer Werner Vogels explains [Vog07] that Amazon had to perform short-cycled hardware upgrades to its database machines in the early days. Each upgrade would only provide sufficient data processing performance for a couple of months until the next upgrade was due to the company's extreme growth. This was followed by attempts to tune the relational database systems by simplifying the database schema, introducing various caching layers, or partitioning the data differently. At some point the engineering team at Amazon decided to evaluate the data processing needs at Amazon and create their own data processing infrastructure accordingly. An analysis of these data processing needs reveals [Vog07] that at Amazon, for example, about 65% of the data accesses are based on the primary key only, about 15% of the data accesses involve many writes in combination with the need for strong consistency, and 99.9% of the data accesses require low latency, with a response time of less than 15 ms expected. The variety in these data processing needs is covered by a set of solutions, of which the most prominent one is Amazon Dynamo.

Amazon Dynamo [DHJ+07] is a distributed key-value store. Since the majority of data operations at Amazon encompass primary key access only and require low latency, Dynamo is designed for storing and retrieving key-value pairs (also referred to as objects). Dynamo implements a distributed hash table where the different nodes that hold data of that hash table are organized as a ring. The data distribution is done via consistent hashing. However, a plain hash-based data partitioning onto physical nodes would lead to a non-uniform data and load distribution, due to the random position assignment of each node in the ring, and the basic algorithm is oblivious to the heterogeneity in the performance of nodes. This is addressed by the introduction of virtual nodes (which can be thought of as namespaces), where a single virtual node is mapped onto different physical nodes: this allows for providing high-availability and elasticity by changing the number of physical nodes assigned to a virtual node upon workload changes. As a result, nodes can be added to and removed from Dynamo without any need for manual partitioning and redistribution. Consistency is facilitated by object versioning [Lam78]. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. These features in combination make Dynamo a scalable, highly-available, completely decentralized system with minimal need for manual administration. Amazon Dynamo is just one example of a distributed key-value store, but it is conceptually similar to other popular distributed key-value stores such as Riak [Klo10] or Project Voldemort [SKG+12]. Google Bigtable [CDG+06] and Cassandra [LM10] are examples of wide column stores (also known as extensible record stores [Cat11]): for example, Google Bigtable implements a sparse, distributed, persistent multi-dimensional map indexed by a row key, a column key, and some kind of versioning information such as a timestamp or version number. This multi-dimensional map is partitioned in a cluster in the following way: the rows of a map are split across nodes, with the assignment being done by ranges and based on hashing. Each column of a map is described as a column family, the contained data is of the same type, and its data is distributed over multiple nodes. Such a column family must not be mistaken for a relational database column, which implies that related data is physically collocated, but can rather be seen as a namespace. The motivation for using such wide column stores over simple distributed key-value stores is the more expressive data model, which provides a simple approach to model references between data items [CDG+06].

How a cloud storage system such as Amazon Dynamo differs from a relational DBMS and how it addresses the previously mentioned scalability shortcomings can be illustrated with the Consistency, Availability, and Partition Tolerance (CAP) theorem by Eric Brewer [Bre00, Bre12]. In this theorem

“consistency means that all nodes see the same data at the same time, availability is a guarantee that every request receives a response about whether it was successful or failed, and partition tolerance lets the system continue to operate despite arbitrary message loss.”

The theorem states that a distributed system can provide two of these properties at the same time, but not all three. Relational DBMSs provide consistency and availability, whereas cloud storage systems provide availability and partition tolerance. This is done by relaxing the ACID constraints in cloud storage systems, as doing so significantly reduces the communication overhead in a distributed system, by providing simple data access APIs instead of complex query interfaces, and by schema-free data storage [Bur10].

Another category of cloud storage systems are distributed filesystems. Distributed filesystems in general have been available for over 30 years; their purpose is to “allow users of physically distributed computers to share data and storage resources by using a common file system” [LS90]. Well-cited examples are Sun's Network File System (NFS) [SGK+88] and the Sprite Network File System [OCD+88]. However, a distributed file system — such as the Google File System — which is designed to act as a cloud storage system differs from traditional distributed filesystems in a similar way as a distributed key-value store differs from a relational database management system. It weakens consistency and reduces synchronization operations, along with replicating data multiple times, in order to scale gracefully and provide high-availability. This can be illustrated by reviewing the underlying assumptions and requirements that led to the design of the Google File System [GGL03]: the file system is built from many inexpensive commodity components where hardware failures are the norm, not the exception, which means that fault-tolerance and auto-recovery need to be built into the system. Files can be huge, bandwidth is more important than latency, reads are mostly sequential, and writes are dominated by append operations. The corresponding Google File System architecture foresees that a GFS cluster has a single master server and multiple chunkservers, and that they are accessed by multiple clients. Files in GFS are divided into fixed-size chunks, where each chunk is identified by a unique 64-bit chunk handle, the size of a chunk is 64 MB, and chunks are replicated at least three times throughout the cluster. The master server maintains all file system metadata such as namespaces, access control information, the mappings from files to chunks, as well as the locations of the different chunks on the chunkservers. Each chunkserver stores its chunks as separate files in a Linux file system. If a client wants to access a file, it contacts the master server, which provides it with the chunkserver locations as well as the file-to-chunk mappings: the client is then enabled to autonomously contact the chunkservers. This allows clients to read, write, and append records in parallel at the price of a relaxed consistency model (e.g. it may take some time until updates are propagated to all replicas): these relaxed consistency guarantees have to be covered by the applications that run on top of GFS (e.g. by application-level checkpointing).

With the increasing popularity of cloud storage systems emerged the need to execute more complex operations on the stored data than simple data retrieval and modification operations. This need is addressed by Google's MapReduce programming model [DG08], which was built for being used in conjunction with Google Bigtable or the Google File System. The idea of MapReduce is to process large sets of key-value pairs with a parallel, distributed algorithm on a cluster: the algorithm performs at first a Map() operation that performs filter operations on key-value pairs and creates corresponding intermediate results, followed by a Reduce() operation which groups the intermediate results, for example by combining all results that share the same key. The simplicity of this approach, the ability to quickly identify non-interleaving data partitions, and the ability to execute the respective sub-operations independently enable a great degree of parallelism.
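As a rough illustration of the Map()/Reduce() flow just described, the following self-contained sketch counts words across a set of input splits. The single-process word-count setting and all names are assumptions for illustration only; a real deployment distributes the map and reduce tasks across a cluster and performs the shuffle step inside the framework.

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// Map(): emit an intermediate (word, 1) pair for every word in an input split.
std::vector<KV> mapFn(const std::string& split) {
    std::vector<KV> out;
    std::istringstream in(split);
    std::string word;
    while (in >> word) out.push_back({word, 1});
    return out;
}

// Reduce(): combine all intermediate values that share the same key.
int reduceFn(const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int main() {
    std::vector<std::string> inputSplits = {"ram cloud", "cloud storage", "ram"};
    // Shuffle phase: group intermediate pairs by key (done by the framework on a real cluster).
    std::map<std::string, std::vector<int>> groups;
    for (const auto& split : inputSplits)
        for (const auto& kv : mapFn(split))
            groups[kv.first].push_back(kv.second);
    for (const auto& entry : groups)
        std::cout << entry.first << ": " << reduceFn(entry.second) << "\n";
}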

2.4.1 State-of-the-Art Cloud Storage Systems

Summary: This subsection presents state-of-the-art cloud storage systems. After describing Amazon Dynamo, Google Bigtable, and the Google File System in the previous subsection, we discuss additional state-of-the-art cloud storage systems.

• Amazon Dynamo [DHJ+07]: see the previous Section 2.4.

• Google Bigtable [CDG+06]: see the previous Section 2.4.

• Google File System [GGL03]: see the previous Section 2.4.

• Hadoop [Whi09] and the Hadoop Distributed File System (HDFS) [SKRC10] are open-source implementations based on the concepts of Google's MapReduce and the Google File System. HDFS has a similar architecture as GFS, as it deploys a central entity called namenode that maintains all the meta information about the HDFS cluster, so-called datanodes that hold the data, which gets replicated across racks, and clients that directly interact with datanodes after retrieving the needed metadata from the namenode. HDFS differs from GFS in the details (e.g. HDFS uses 128 MB blocks instead of GFS's 64 MB chunks): an HDFS client can freely choose against which datanode it wants to write, and HDFS is aware of the concept of a datacenter rack when it comes to data balancing. Hadoop itself is a framework for executing MapReduce; it includes and utilizes HDFS for storing data and provides additional modules such as the MapReduce engine: this engine takes care of scheduling and executing MapReduce jobs and consists of a JobTracker, which accepts and dispatches MapReduce jobs from clients, and several TaskTrackers, which aim to execute the individual jobs as close to the data as possible.

• Memcached [Fit04] is a distributed key-value store which is commonly used for caching data in the context of large web applications (e.g. Facebook [NFG+13]). As a consequence, each server in a Memcached cluster keeps its data resident in main memory for performance improvements. Upon a power or hardware failure, the main memory resident data is lost. However, this is not considered harmful as it is cached data, which might be invalidated after a potential server recovery. Memcached itself has no support for data recovery; it is expected to be provided by the application (e.g. Memcached is extensively used at Facebook and Facebook's engineering team implemented their own data replication mechanism [NFG+13]). The data in a Memcached cluster is partitioned across servers based on hash values of the keys to be stored: the hash ranges are mapped to buckets and each server is assigned one or more buckets.

• Project Voldemort [SKG+12] is a distributed key-value store developed by the social network LinkedIn and is conceptually similar to Amazon's Dynamo. Project Voldemort also applies consistent hashing to partition its data across nodes and to replicate data multiple times with a configurable replication factor (a minimal sketch of this partitioning technique follows this list). Project Voldemort also does not provide strong consistency, but facilitates a versioning system to ensure that data replicas become consistent at some point.

• Stanford's RAMCloud [OAE+11] is a research project that combines the in-memory performance of a solution such as Memcached with the durable, highly-available, and gracefully scaling storage of data as realized by a project such as Bigtable. It does so by keeping all data entirely in DRAM, aggregating the main memory of multiple commodity servers at scale. In addition, all of these servers are connected via a high-end network such as InfiniBand (as discussed in Section 2.1.3), which provides low latency [ROS+11] and high bandwidth. RAMCloud employs randomized techniques to manage the system in a scalable and decentralized fashion and is based on a key-value data model. RAMCloud scatters backup data across hundreds or thousands of disks or SSDs, and harnesses hundreds of servers in parallel to reconstruct lost data. The system uses a log-structured approach for all its data, in DRAM as well as on disk/SSD, which provides high performance both during normal operation and during recovery [ORS+11]. The inner workings are explained in detail in Section 3.3.
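The following sketch shows consistent hashing with virtual nodes, the partitioning technique referred to above for Dynamo, Project Voldemort, and Memcached-style clusters. It is a toy illustration under my own assumptions (std::hash as the hash function, eight virtual nodes per server) and does not reproduce the actual implementation of any of the systems above.

#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Each physical server is mapped to several positions ("virtual nodes") on the ring,
// which smooths out data and load distribution across heterogeneous servers.
class ConsistentHashRing {
public:
    void addServer(const std::string& server, int virtualNodes) {
        for (int v = 0; v < virtualNodes; ++v)
            ring_[hash_(server + "#" + std::to_string(v))] = server;
    }
    // A key is owned by the first virtual node at or after its hash position (wrapping around).
    std::string lookup(const std::string& key) const {
        auto it = ring_.lower_bound(hash_(key));
        if (it == ring_.end()) it = ring_.begin();
        return it->second;
    }
private:
    std::map<std::size_t, std::string> ring_;
    std::hash<std::string> hash_;
};

int main() {
    ConsistentHashRing ring;
    ring.addServer("node-1", 8);
    ring.addServer("node-2", 8);
    ring.addServer("node-3", 8);
    std::cout << "customer:42 -> " << ring.lookup("customer:42") << "\n";
}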

2.4.2 Combining Database Management and Cloud Storage Systems

Summary: This subsection discusses different approaches of combining database management and cloud storage systems, including the adaptation of each other's features, providing connectors, translating SQL to MapReduce programs, providing specialized SQL engines on top of cloud storage systems, having a hybrid SQL/MapReduce execution, and utilizing a cloud storage system as shared-storage for a DBMS.


The advent of cloud storage systems piqued interest in the DBMS community as well as in the cloud storage systems community to evaluate the use and adaptation of each other's features. Initially, this was an unstructured undertaking, which for example resulted in the statement by Michael Carey that “it is the wild west out there again” [ACC+10]. Dean Jacobs put it as follows: “I recently reviewed a large number of ‘cloud database’ papers for various conferences. Most of these papers were either adding features to distributed key-value stores to make them more usable or removing features from conventional relational databases to make them more scalable” [Jac13]. However, the adaptation of relational query processing features in a cloud storage system eventually became a well-established area of work in academia, as demonstrated by the Stratosphere project [Mem13b], which extends the MapReduce model with operators [BEH+10] that are common in relational DBMSs. The advantages and growing popularity of cloud storage systems led to the desire to execute SQL statements against data that resides inside a cloud storage system. The different approaches can be put into the following four categories:

• DBMS to cloud storage system connectors allow the bidirectional exchange of data between the two systems. Such an approach is commonly used for running ad-hoc queries on the outcome of a MapReduce job by preparing the unstructured data in the cloud storage system and then converting it to structured data inside the DBMS. Those connectors are popular with traditional DBMS vendors as they allow them to label their products as “Hadoop compatible”. Examples for such connectors are the Microsoft SQL Server Connector for Apache Hadoop [Cor13] and the HP Vertica Hadoop Distributed File System Connector [Ver13]. The disadvantages of such connectors are that they a) require an ETL process, which prevents ad-hoc querying, b) transfer the complete dataset to be queried over the network, and c) create a redundant copy of the dataset.

• SQL translated to MapReduce allows sending an SQL query to a cloud storage system such as Hadoop. The SQL query is translated to a set of MapReduce jobs which are then executed by the cluster. This bears the advantages that it a) utilizes the properties of the cloud storage system in terms of high-availability as well as scalability and b) integrates into the existing scheduling of MapReduce jobs (as opposed to the previous approach, where the extraction of data creates an unexpected extra load). The main disadvantages are that a) the accepted SQL is a SQL dialect and not SQL-standard conformant and b) the translation overhead and the MapReduce batch-oriented execution style prevent ad-hoc queries. Facebook's Hive [TSJ+09] is an example of a system that uses such an approach.

• Specialized SQL engines on top of cloud storage systems accept SQL-like queries. They execute the queries not by translating them to MapReduce jobs, but by shipping custom database operators to the data nodes. Systems that fall into this category are the row-oriented Google F1 [SOE+12] and the column-oriented Google Dremel [MGL+10]. For example, Google Dremel operates on GFS and exploits the GFS interface that allows code execution on chunkservers, and thereby ships and executes operators there. This results in the advantage of being able to execute ad-hoc queries as well as collocating data and its processing. The main disadvantages of such an approach are that a) such specialized SQL engines are not SQL-standard compliant and b) they only provide poor coverage of the common SQL operators. These disadvantages prevent the use of existing tools and the ability to execute applications which expose SQL-standard compliant queries.

• Hybrid SQL and MapReduce execution aims at combining both of the previous approaches: an SQL query submitted to the system gets analyzed and then parts of it are processed via MapReduce and other parts via the execution of native database operators. That allows determining the mix of MapReduce and database operator execution based on the type of query: for executing ad-hoc queries, MapReduce-style execution is avoided as much as possible, whereas it is preferred for queries at massive scale in combination with the need for fault tolerance. Examples for such systems are HadoopDB [BPASP11] or Polybase [DHN+13].

Another category of research that focuses on the combination of database management and cloud storage systems is the use of cloud storage systems as shared-storage for parallel DBMSs. Whereas the previously explained approaches trim the SQL coverage and sacrifice the compliance with the SQL standard, this approach takes a standard relational query processor or DBMS and utilizes the cloud storage system instead of a classic shared-disk storage.

• Building a Database on S3 [BFG+08] by Brantner et al. demonstrates the use of Amazon S3 as a shared-disk for persisting the data from a MySQL database. This work maps the elements from the MySQL B-tree to key-value objects and provides a corresponding custom MySQL storage engine that allows for prototypical experiments. It also introduces a set of protocols which show how different levels of consistency can be implemented using S3. Driven by the TPC-W benchmark, the trade-offs between performance, response time, and costs (in terms of US dollars) are discussed.


• Running a transactional Database on top of RAMCloud [Pil12] by Pilman takes a similar conceptual approach, but utilizes RAMCloud instead of Amazon S3. The motivation for using RAMCloud is to exploit its performance advantages provided by in-memory data storage and RDMA capabilities. The work by Pilman presents two different architectures: one where a MySQL instance runs exclusively on a RAMCloud cluster, and another one where several instances of MySQL run on a RAMCloud cluster. A benchmark is presented that executes TPC-W and uses MySQL with InnoDB as a baseline. The experiments show that “we can run MySQL on top of a key-value store without any loss of performance or scalability but still gain the advantages this architecture provides. We have the desired elasticity and several applications could run in the same network using each its own database system, but all on the same key-value store” [Pil12].

2.5 Classification of this Thesis

Summary: This section classifies this thesis among the related work as presented throughout the chapter.

After an extensive explanation of the background and the related work in the previous sections of this chapter, this section presents a compact overview of the related work and the corresponding classification of this thesis (as illustrated in Figure 2.12). This thesis is positioned in the field of evaluating a parallel DBMS architecture and its implications on query processing. As explained in Subsection 2.3.1 and illustrated in the upper half of Figure 2.12, the shared-nothing vs. shared-storage architecture trade-offs are much discussed in the context of classic storage architectures (e.g. SAN/NAS storage), and a great variety of products from big vendors is available in both markets. It is noteworthy that for main memory resident parallel DBMSs a shared-nothing architecture is dominating: this is due to the intention of not sacrificing the performance advantage of keeping data in main memory by constantly shipping it over a significantly slower network. However, with the advent of fast switch fabric communication links — as discussed in Subsection 2.1.3 — the performance gap between accessing local and remote main memory narrows down, and the implications of this development on the architecture discussion for main memory parallel DBMSs are not clear yet: as argued in Section 1.1, this is one of the main motivations for writing this thesis.

The lower half of Figure 2.12 depicts the work on deploying a parallel DBMS on a cloud storage system: that this is a relatively new area of research leaps to the eye immediately due to the sparseness in that part of the figure. As explained in


Subsection 2.4.2, the cloud community has worked out several approaches for executing SQL-like statements against data in a cloud storage system. Specialized SQL engines on top of cloud storage systems most closely resemble a traditional DBMS, as they just use database operators for query execution and neglect the batch-oriented MapReduce paradigm altogether. But even those systems are not SQL-standard compliant and provide their own SQL dialect, which makes them uninteresting for the broad range of applications that expose SQL-standard compliant queries. This downside is not inherent to the work from the DBMS community, which takes the opposite approach by deploying a standard, SQL-standard compliant DBMS on top of a cloud storage system. In this field, the work co-authored [BFG+08] and supervised [Pil12] by Donald Kossmann constitutes the most closely related pieces of work. In this work, we also focus on deploying a parallel DBMS on a cloud storage system, but a) we keep all data resident in main memory all the time, b) we apply a column-oriented data layout, and c) we use the storage system for both — data access and code execution. As depicted in Figure 2.12, this area of work is currently not addressed in the research community. Google works with Dremel in the same domain, but their approach is based on disk-resident data. So far, the work co-authored [BFG+08] and supervised [Pil12] by Donald Kossmann focuses on the processing of transactional workloads by a row-oriented database and utilizes the cloud storage system solely as passive storage, without considering the possibilities of operator shipping and execution.


[Figure 2.12 classifies the related work along the following dimensions: row-oriented vs. column-oriented data layout, HDD/SSD-resident vs. main memory-resident data, shared-nothing vs. SAN/NAS shared storage vs. cloud storage as parallel DBMS architecture, data access only vs. data access and code execution, and SQL-standard compliant vs. non-compliant. Systems placed in the figure include IBM DB2 [IBM13b], C-Store [SAB+05] / Vertica [LFV+12], Postgres eXtensible Cluster [Pos13], Teradata Warehouse [Ter13], VoltDB [SW13], MonetDB [BGvK+06], SAP HANA [FML+12], IBM DB2 pureScale [IBM13a], Sybase IQ [Syb13], Oracle Exadata [OJP11], MySQL on MEMSCALE [MSFD11], MySQL on RAMSAN [ME13], Google F1 [SOE+12], Google Dremel [MGL+10], Building a Database on S3 [BFG+08], Running a transactional Database on top of RAMCloud [Pil12], and this thesis.]

Figure 2.12: Classification of this thesis.


Part I

A Database System Architecture for a Shared Main Memory-Based Storage


Chapter 3

System Architecture

3.1 System Architecture - Requirements, Assumptions, and Overview

Requirements

In order to address the research questions as formulated in Section 1.2, we have to define a system architecture that meets the following requirements:

1. The overall system architecture is composed of a parallel DBMS and a storage system, following the principles of a shared-storage approach (as explained in Subsection 2.3.1), resulting in:

   (a) A separation of the processors belonging to the parallel DBMS and the storage system, with the processors sharing common access to the data inside the storage system.

   (b) Each processor can access all data.

   (c) The capacities of the parallel DBMS as well as the storage system can be adjusted independently.

2. The architecture of the parallel DBMS must support:

   (a) Operating on data that is structured according to the relational model and processing SQL-standard conformant queries (as explained in Section 2.2).

   (b) Organizing data in a columnar format (as explained in Subsection 2.2.1) and providing corresponding database operators which are able to execute workloads that benefit from column-oriented data (as explained in Subsection 2.2.2).


   (c) Seamlessly switching between executing a database operator itself and delegating the execution to another component (as explained in Subsection 2.3.4).

3. The storage system is required to:

   (a) Keep all data resident in main memory (as explained in Subsection 2.3.1).

   (b) Provide durability and high-availability, scale gracefully, and be elastic (as explained in Section 2.4).

   (c) Provide data access as well as the possibility to execute code (as explained in Section 2.3.3).

4. The different components of the architecture are connected via a network infrastructure that supports remote direct memory access (RDMA) (as explained in Subsection 2.1.3).

Assumptions and Overview

The previously mentioned requirements can be addressed in many different ways. As discussed in Subsections 2.3.2 and 2.4.1, there are a number of state-of-the-art systems, each of which addresses a subset of the requirements. Utilizing them for constructing our architecture introduces a set of assumptions which we want to articulate in order to illustrate the translation from the requirements to the actual system architecture. We make the following assumptions about our system architecture: the parallel DBMS runs on a number of servers which are referred to as nodes. Each node is equipped with its own hardware consisting of standard server hardware (as explained in Section 2.1) and an RDMA-enabled network interface card (as illustrated in Figure 3.1)¹. The node is equipped with its own local storage in the form of a disk or SSD, which is only used for storing the operating system and the DBMS software, not parts of the database itself. Each node runs an instance of the parallel DBMS software and has access to meta information about the database such as the contained relations, attributes, data types, and foreign keys. The parallel DBMS is equipped with a central instance called a federator, which accepts queries from clients and distributes them in a round-robin manner among the nodes in the parallel DBMS cluster. The number of nodes for the parallel DBMS can be adjusted in order to meet certain performance requirements.

¹ The Fundamental Modeling Concepts (FMC) notation [KGT06] is used for architectural figures throughout this chapter.



Figure 3.1: Assumptions with regards to the deployed hardware and software stacks.

The storage system also runs on its own servers, which are likewise referred to as nodes. Again, each node is equipped with its own hardware consisting of standard server hardware (as explained in Section 2.1) and an RDMA-enabled network interface card. Each node keeps all to-be-stored data resident in main memory, which implies that the overall storage capacity of the storage system cannot be greater than the sum of the main memories of all nodes. Each node also utilizes a local disk or SSD to store the data in a non-volatile manner. For data recovery purposes, for example in the case of a failure of an individual node, the to-be-stored data is replicated and scattered across all nodes and can be recovered in a very short period of time. Each node runs the storage system software, which can access the storage meta information that describes the data it holds. The distribution of data and their replication across the nodes is managed by a central instance in the storage system. This central instance also distributes this information.

In this work, we use AnalyticsDB as parallel DBMS and Stanford’s RAMCloud as storage system, and utilize them as components to construct the aforementioned system architecture as illustrated in the system architecture overview in Figure 3.2. Both components in combination allow fulfilling the previously stated requirements and go along with our assumptions as their detailed description in the remainder of this chapter shows.

[Figure 3.2 shows an application issuing queries to a federator, which distributes them across the AnalyticsDB nodes 1 to n; each AnalyticsDB node comprises a query engine with meta data, the AnalyticsDB operators, and a RAMCloud client, together forming the AnalyticsDB cluster responsible for query processing. The AnalyticsDB operators are also deployed on the RAMCloud nodes 1 to n, which form the RAMCloud cluster acting as storage.]

Figure 3.2: System architecture overview.

3.2 AnalyticsDB

AnalyticsDB is our prototypical in-memory DBMS written in C++. It has been designed and developed in the context of this work. Initially, the idea was to stand on the shoulders of giants and utilize existing open-source in-memory columnar database systems and their implementations, such as MonetDB [BKM08b] and HYRISE [GKP+10]: this option turned out to be cumbersome, since the operator execution in both systems is parameterized by providing the local memory address of the data to be processed. This makes the decoupling of operator execution and data location — and in turn the switching between operator execution on local or remote memory — difficult. Instead of reengineering those existing systems, AnalyticsDB has been designed and built from scratch.

[Figure 3.3 shows the AnalyticsDB daemon accepting JSON-structured query plans and a database importer reading CSV files. The query engine invokes the AnalyticsDB operators (arithmetic, group-by, hash-join, materialization, merge positions, order-by, and scan), which access table columns through the AnalyticsDB API (get/set/append); a column is backed either by a local std::vector-based AnalyticsDB column or by a RAMCloud column accessed via the RAMCloud client.]

Figure 3.3: AnalyticsDB architecture.

The AnalyticsDB architecture is shown in Figure 3.3 and features the following properties:

• A columnar data layout according to the decomposed storage model (as explained in Subsection 2.2.1) in combination with all data residing permanently in main physical memory (as explained in Section 2.2).


• Dictionary compression (as explained in Subsection 2.2.1) is used for compressing non-numeric attributes (e.g. strings) to integer values. AnalyticsDB supports choosing between 2-, 4-, or 8-byte sized integers.

• The pattern of late materialization [AMDM07] is applied in order to defer the materialization of intermediate results as long as possible. Until full materialization, AnalyticsDB operates on position lists and dictionary-encoded values (a simplified sketch combining dictionary compression and position lists follows this list).

• A column-at-a-time execution model [Bon02] — as opposed to a Volcano-style [Gra94a] query execution — but without the policy of full column materialization (see previous point).

• The use of a storage application programming interface (API) which encapsulates storage access as well as operator execution. The granularity of the API is per column. This API can be implemented by using a local data structure or, for example, by using the client of a separate storage system.² Listing 3.1 shows an excerpt from a simplified version of the AnalyticsDB storage API.

• AnalyticsDB is designed to process queries written in standard-compliant SQL. The implementation in its current version accepts fully phrased query plans.

• AnalyticsDB can process analytical and mixed workloads (as explained in Subsection 2.2.2). The execution of mixed workloads is supported by column-level locking, resulting in a read committed isolation level. In a distributed setup, one AnalyticsDB node handles all write operations while read-only operations can be executed by all remaining nodes³: this approach is common for scaling out an in-memory database system that handles a mixed workload, as practiced for scaling out HyPer [MRR+13] or by SAP HANA [LKF+13].
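The following fragment illustrates how dictionary compression and position lists interact in a column scan with deferred materialization. It is a minimal sketch under assumptions of my own (a single string column, 32-bit codes, invented class and function names); it does not reflect AnalyticsDB's actual classes.

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Dictionary-compressed string column: values are stored as integer codes,
// the dictionary maps each distinct string to its code.
struct DictionaryColumn {
    std::unordered_map<std::string, std::uint32_t> dictionary;
    std::vector<std::string> reverse;            // code -> value, used only at materialization
    std::vector<std::uint32_t> codes;            // the actual column data

    void append(const std::string& value) {
        auto it = dictionary.find(value);
        std::uint32_t code;
        if (it == dictionary.end()) {
            code = static_cast<std::uint32_t>(reverse.size());
            dictionary.emplace(value, code);
            reverse.push_back(value);
        } else {
            code = it->second;
        }
        codes.push_back(code);
    }

    // Scan works on encoded values; the result is a position list (late materialization).
    std::vector<std::size_t> scanEquals(const std::string& value) const {
        std::vector<std::size_t> positions;
        auto it = dictionary.find(value);
        if (it == dictionary.end()) return positions;
        for (std::size_t pos = 0; pos < codes.size(); ++pos)
            if (codes[pos] == it->second) positions.push_back(pos);
        return positions;
    }

    // Materialize only at the end, translating codes back into strings.
    std::vector<std::string> materialize(const std::vector<std::size_t>& positions) const {
        std::vector<std::string> out;
        for (std::size_t pos : positions) out.push_back(reverse[codes[pos]]);
        return out;
    }
};

int main() {
    DictionaryColumn name;
    for (const auto& v : {"Smith", "Jones", "Miller", "Jones"}) name.append(v);
    for (auto pos : name.scanEquals("Jones")) std::cout << pos << " ";   // prints: 1 3
    std::cout << "\n";
}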

3.3 Stanford’s RAMCloud

As mentioned in Subsection 2.4.1, RAMCloud [OAE+11] is a storage system that combines DRAM-based data storage with an RDMA-enabled network. A RAMCloud consists of three different types of software components: a coordinator, a master, and a backup.

² The introduction of such an API creates a penalty of a couple of CPU cycles per operator execution. However, since the ultimate goal is the evaluation of local vs. remote operator execution, this penalty is negligible.

³ As mentioned in Section 1.2, distributed transaction processing is out of scope in this work.


ColumnPosition append(ColumnValue value);
ColumnValue get(ColumnPosition position);
void set(ColumnPosition position, ColumnValue value);

ColumnPositionList scan(SCAN_COMPARATOR comparator,
                        ColumnValue value,
                        ColumnPositionList positionList);
Array materialize(ColumnPositionList positionList);
ColumnPositionList joinProbe(ArrayRef probingValues,
                             ColumnPositionList positionList);
size_t size();
void restore(ArrayRef values);

Listing 3.1: Excerpt from a simplified version of the AnalyticsDB storage API
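To illustrate how an interface of this kind decouples operator code from data location, the following self-contained sketch defines toy stand-ins for the listed types, an abstract per-column interface, and a purely local implementation. All type definitions and enumerator names are assumptions made for this example; AnalyticsDB's real storage API may differ in its details.

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Toy stand-ins for the types used in Listing 3.1 (assumptions for illustration).
using ColumnValue = std::int64_t;
using ColumnPosition = std::size_t;
using ColumnPositionList = std::vector<ColumnPosition>;
enum class ScanComparator { Equals, GreaterThan };

// Operator code is written against this interface rather than against a local
// memory address, which is what allows switching between local and remote columns.
class AbstractColumn {
public:
    virtual ~AbstractColumn() = default;
    virtual ColumnPosition append(ColumnValue value) = 0;
    virtual ColumnValue get(ColumnPosition position) = 0;
    virtual ColumnPositionList scan(ScanComparator comparator, ColumnValue value,
                                    const ColumnPositionList& candidates) = 0;
};

// A purely local, std::vector-backed column; a RAMCloud-backed column would
// implement the same interface on top of the RAMCloud client.
class LocalColumn : public AbstractColumn {
public:
    ColumnPosition append(ColumnValue value) override {
        values_.push_back(value);
        return values_.size() - 1;
    }
    ColumnValue get(ColumnPosition position) override { return values_[position]; }
    ColumnPositionList scan(ScanComparator comparator, ColumnValue value,
                            const ColumnPositionList& /*candidates ignored in this toy*/) override {
        ColumnPositionList result;
        for (ColumnPosition p = 0; p < values_.size(); ++p) {
            bool match = (comparator == ScanComparator::Equals) ? values_[p] == value
                                                                : values_[p] > value;
            if (match) result.push_back(p);
        }
        return result;
    }
private:
    std::vector<ColumnValue> values_;
};

int main() {
    LocalColumn column;
    for (ColumnValue v : {5, 7, 5, 9}) column.append(v);
    for (ColumnPosition p : column.scan(ScanComparator::Equals, 5, {}))
        std::cout << p << " ";                                   // prints: 0 2
}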

Instances thereof are deployed on the nodes of a RAMCloud cluster, whereat instances of different components can be deployed on the same physical machine at the same time. A master stores a set of objects in main memory and replicates these objects and the corresponding changes synchronously over the network into the main memory of a number of backups, which asynchronously persist the data on disk/SSD. In addition, a central coordinator keeps track of all master and backup instances and of the partitioning of the objects' address ranges across the different master instances. Clients can utilize the provided storage capabilities of a RAMCloud by using a RAMCloud library, which enables them to communicate with the central coordinator and the master instances.

A coordinator is a central instance in RAMCloud, as it maintains a global view on the locations of stored objects and the available master and backup servers. For doing so, it employs two different data structures: a tablet map, which keeps track of the address ranges of stored objects and of the corresponding master that stores the objects within a certain address range, and a host list, which keeps track of the locations of the different servers, their IP addresses, and their status. The coordinator is contacted by other entities to find out which master stores a certain object.

Each master stores a number of objects. The address range of all objects to be stored — as maintained in the tablet map of the coordinator — is partitioned among the different masters, where each master is considered the owner of all objects that fall into a certain span of the global address range. Each master is responsible for propagating updates on its objects to the corresponding backup servers. A master uses two different data structures to store its objects: a log-based data structure that stores the actual objects and an object map that keeps track of which object is placed at what position in the log. The look-up of an object has a complexity of O(1), since the object map uses a hash function to look up the position of an object.



Figure 3.4: RAMCloud architecture.

A backup keeps copies of the objects which are stored in a master. Whenever an object in a master is created or gets modified, the changes are synchronously dispatched to a number of backups, which also keep the objects in main memory. Besides this in-memory storage, a backup also has a file storage which persists the data on disk/SSD. The writes to disk happen asynchronously. The in-memory storage as well as the disk/SSD-based storage is log-structured in order to exploit sequential I/O during write operations. A segments map provides a hash table lookup in order to find objects in the log.
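As a rough illustration of the master's data path described above (an append-only log plus a hash-based object map that yields O(1) lookups), consider the following toy sketch. It is an assumption-laden simplification (single process, no replication to backups, no log cleaning) and not RAMCloud's actual implementation.

#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model of a master's storage path: objects are appended to a log and the
// object map records, per key, where in the log the latest version lives.
class ToyMaster {
public:
    void write(const std::string& key, const std::string& value) {
        objectMap_[key] = log_.size();       // remember the log position of the newest version
        log_.push_back(value);               // append-only log (stale versions are cleaned later)
        // In RAMCloud, the change would also be replicated synchronously to backups here.
    }
    bool read(const std::string& key, std::string& value) const {
        auto it = objectMap_.find(key);      // O(1) hash lookup into the log
        if (it == objectMap_.end()) return false;
        value = log_[it->second];
        return true;
    }
private:
    std::vector<std::string> log_;
    std::unordered_map<std::string, std::size_t> objectMap_;
};

int main() {
    ToyMaster master;
    master.write("db1::cust::id/0", "1,2,3");   // key format is illustrative only
    std::string value;
    if (master.read("db1::cust::id/0", value)) std::cout << value << "\n";
}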

Chapter 4

Data Storage

4.1 Mapping from Columnar Data to RAMCloud Objects

Using RAMCloud with its key-value based data model as the storage system for a columnar DBMS raises the question of how to map the columnar data to objects in RAMCloud. RAMCloud provides the concept of namespaces. A namespace defines a logical container for a set of objects, where each object key occurs only once. Upon the creation of a new namespace, the parameter server span is set to define how many storage nodes will be used to store the objects of the namespace. These namespaces are assigned to nodes in a round-robin manner. The assignment of key-value pairs across nodes is done by partitioning the range of the hashes of the object keys contained in that namespace. To map an AnalyticsDB table, we create a namespace for each database table attribute with the naming convention “dbname::dbtablename::attributename”. In each namespace we create a number of objects, where each object stores a chunk of the corresponding attribute column. How many column values are held by a single object is configurable via a parameter object size. The object size parameter and the actual size of the column determine how many objects have to be created for storing the complete column. Figure 4.1 depicts this concept for a table consisting of two columns id and name with object size = 3. We discuss the importance and determination of the object size parameter in the next section. To store the complete example table, we create a namespace for each attribute and create three objects with keys 0-2 for every column. In the example depicted in Figure 4.1, we define server span = 3 for the namespaces “db1::cust::id” and “db1::cust::name”, resulting in the shown distribution for a three node RAMCloud cluster.
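Under this mapping, the location of an individual column value follows directly from its row position and the object size. The sketch below spells this out; the struct and function names are invented for illustration and are not part of AnalyticsDB.

#include <cstddef>
#include <iostream>
#include <string>

// A column value at row position p lives in the object with key p / object_size,
// at offset p % object_size within that object's chunk.
struct ObjectAddress {
    std::string tableNamespace;   // "dbname::dbtablename::attributename"
    std::size_t objectKey;        // which chunk of the column
    std::size_t offsetInObject;   // position of the value inside the chunk
};

ObjectAddress locate(const std::string& db, const std::string& table,
                     const std::string& attribute, std::size_t rowPosition,
                     std::size_t objectSize) {
    return { db + "::" + table + "::" + attribute,
             rowPosition / objectSize,
             rowPosition % objectSize };
}

int main() {
    // With object size = 3, row 4 (0-based) of column "name" sits in object 1, offset 1.
    ObjectAddress a = locate("db1", "cust", "name", 4, 3);
    std::cout << a.tableNamespace << " key=" << a.objectKey
              << " offset=" << a.offsetInObject << "\n";
}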


[Figure 4.1 shows the example table cust (columns id and name, nine rows) from database db1. String values are dictionary-encoded (e.g. Jones = 100, Miller = 101, Moore = 102, Smith = 103). With object size = 3, every three column values are put into a single RAMCloud object, and with server span = 3 the objects with keys 0-2 of the namespaces db1::cust::id and db1::cust::name are partitioned across RAMCloud nodes 1 to 3.]

Figure 4.1: Mapping from AnalyticsDB columns to objects in RAMCloud. Partitioning of two columns across three storage nodes with a server span = 3.

Putting this partitioning mechanism in context with the aforementioned data mapping has the following implication: the partition granularity is on AnalyticsDB column level, meaning it is not possible to enforce placing an entire AnalyticsDB table consisting of several columns on a single RAMCloud storage node (except when the RAMCloud cluster has only one node).

4.2 Main Memory Access Costs and Object Sizing

The introduction of splitting columnar data into key-value pairs raises the question of whether it ruins the advantages of columnar data storage. As explained in Subsection 2.2.1, the sequential data alignment of columnar data placement in main memory allows exploiting cache and spatial locality: we have to clarify whether chopping the columnar data into small key-value pairs lets these advantages vanish, and how much data has to be held sequentially to enable the advantages of columnar data storage. In order to address these questions, we a) formally describe main memory access costs as a function of the size and sequential alignment of the data to be accessed and b) present a set of corresponding experiments and micro-benchmarks. The foundation for our formal description is the generic database cost model for hierarchical memory systems from Manegold et al. [MBK02b, MBK02c], which we extend by a data access pattern that represents our problem.

In the model from Manegold et al., the multiple cascading levels of cache memories between the main memory and the CPU are referred to as individual caches (Level 1 Cache, Level 2 Cache, etc.) and are denoted in this work with a subscript i. Caches are characterized by three major characteristics: the capacity C defines the total capacity of a cache in bytes, the cache line or cache block size B describes the smallest unit of transfer between adjacent cache levels, and the cache associativity A influences the cache replacement policy. Additionally, the cache latency l describes the time span (in CPU cycles or in nanoseconds (ns)) that passes between requesting data and having it available. The bandwidth b is a metric that notes the data volume (in megabytes per second (MB/s)) that can be transferred between two levels of the hierarchy within a certain period of time. When it comes to measuring the latency and the bandwidth, there is a distinction between sequential and random access due to Extended Data Output (EDO). Another relevant concept is address translation, which is used to translate virtual memory addresses to physical page addresses. The Translation Lookaside Buffer (TLB) holds the most recently used pages and is treated as another layer in the memory hierarchy. As a summary, the aforementioned cache parameters are listed in Table 4.1.
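The corresponding micro-benchmarks are presented later in this section. As a rough, self-contained illustration of the effect the cost model captures, the following sketch sweeps over the same data region once sequentially and once in random order; the array size and the timing approach are arbitrary choices of this example, not the thesis's actual benchmark setup.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Toy comparison of a single sequential vs. a single random traversal over the same data region.
int main() {
    const std::size_t n = 1 << 24;               // ~16M 8-byte items (illustrative size)
    std::vector<std::uint64_t> data(n, 1);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);

    auto sweep = [&](const std::vector<std::size_t>& idx) {
        auto start = std::chrono::steady_clock::now();
        std::uint64_t sum = 0;
        for (std::size_t i : idx) sum += data[i];
        auto end = std::chrono::steady_clock::now();
        std::cout << "sum=" << sum << " took "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
                  << " ms\n";
    };

    sweep(order);                                // sequential traversal: mostly sequential misses
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    sweep(order);                                // random traversal: mostly random misses
}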

Data Access Patterns and their Costs

The total time T needed for a computing task that works on data persisted in main memory can be expressed by the sum of the needed CPU time T_CPU and the corresponding memory access time T_Mem.

T = T_CPU + T_Mem    (4.1)

Deriving T_CPU is straightforward as it is the pure CPU time that is needed once the data has traveled through the memory hierarchies to the CPU. Modeling T_Mem is more sophisticated, though, as it has to take the memory hierarchies and the different latencies associated with the respective access patterns into consideration.


Table 4.1: Overview of cache parameters (i ∈ {1, ..., N}) (taken from [MBK02b])

Description                  Unit          Symbol
cache name (level)           -             L_i
cache capacity               [bytes]       C_i
cache block size             [bytes]       B_i
number of cache lines        -             #_i = C_i / B_i
cache associativity          -             A_i
sequential access:
  access bandwidth           [bytes/ns]    b^s_{i+1}
  access latency             [ns]          l^s_{i+1} = B_i / b^s_{i+1}
random access:
  access latency             [ns]          l^r_{i+1}
  access bandwidth           [bytes/ns]    b^r_{i+1} = B_i / l^r_{i+1}


Figure 4.2: Illustration of data region R.

T_Mem = Σ_{i=1}^{N} (M_i^s · l^s_{i+1} + M_i^r · l^r_{i+1})    (4.2)

As shown in Equation (4.2), the memory access time is modeled by adding the product of the sequential cache misses M^s and the sequential access latency l^s, and the product of the random cache misses M^r and the random access latency l^r. This is done for all memory hierarchy levels i independently, whereat the latency always describes the latency needed for accessing the next memory hierarchy level. In order to be able to apply this model, Manegold et al. introduced a unified description of data structures, referred to as data regions, and the corresponding data operations thereupon, which are referred to as data access patterns. A data region R consists of R.n data items where each data item has a size of R.w bytes. Consequently, the product R.n · R.w expresses the size ||R|| of a data region R. A data region R spans |R|_B cache lines, and |C|_{R.w} data items fit in the cache.


A tuple u specifies how many bytes are actually used out of every data item (e.g. if all bytes are used, then u equals R.w). We extend the description of a data region by adding the concept of a block, where a data region R consists of Blk.n blocks and each block covers Blk.w data items, as illustrated in Figure 4.2. The product Blk.n · Blk.w equals R.n.

Data access patterns describe the different ways of sweeping over data and vary in their referential locality. Therefore, not only the access latency and the resulting costs per cache miss, but also the number of cache misses differ between access patterns. Cache misses can be divided into random and sequential misses; the different associated costs depend on the performance optimization features of the underlying hardware. A sequential miss is a miss of data which is located closely to the previously read data, whereas a random miss describes accessing data which is not located closely to the previously accessed data. A random miss always causes the full costs of memory access, whereas a sequential miss can benefit from hardware that exploits data locality. Based on the two different kinds of cache misses, Manegold et al. [MBK02b] introduced different data access patterns. In the following, we describe the two most relevant — namely the single sequential traversal and the single random traversal — and then introduce the single random block traversal pattern.

Figure 4.3: Single sequential traversal access pattern (taken from [MBK02b]).

Single Sequential Traversal. As illustrated in Figure 4.3, a single sequential traversal s_tra(R[,u]) accesses the R.n data items in a data region R. It does so by processing the data items in the same order as they are stored in memory. Hence, a single sequential traversal produces exactly one random miss, which is the first access. After that, it reads or writes u consecutive bytes out of every data item. Consequently, if the length R.w of a data item equals u, then the whole data item gets loaded. If u is smaller than R.w, a constant part of R.w − u bytes is skipped between two values. When describing the costs T_Mem associated with a single sequential traversal within a data region R, it is essential to differentiate whether the gap R.w − u between two adjacent accesses is smaller than, or greater than or equal to, the size B of a single cache line. If the gap is smaller, each loaded cache line serves at least one adjacent access. Consequently, when going over data region R, all covered cache lines |R|_B have to be loaded, as modeled in Equation 4.3.



Figure 4.4: Single random traversal access pattern (taken from [MBK02b]).

$$M_i^s(s\_tra^s(R,u)) = |R|_{B_i} \qquad (4.3)$$

If the gap between two adjacent accesses is greater than or equal to a single cache line, not all cache lines covered by R have to be loaded. Additionally, if a tuple u is not placed in correspondence with the cache line size, reading or writing it can result in the necessity to load two separate cache lines. Taking both additional constraints into account, the costs can be modeled as noted in Equation 4.4.

$$M_i^s(s\_tra^s(R,u)) = R.n \cdot \left( \left\lceil \frac{u}{B_i} \right\rceil + \frac{(u-1) \bmod B_i}{B_i} \right) \qquad (4.4)$$

Single Random Traversal. As illustrated in Figure 4.4, a single random traversal r_tra(R[,u]) accesses each data item in R exactly once, but the data items are not accessed in the order in which they are stored in memory; instead, they are accessed completely at random. Out of every data item, u consecutive bytes are read or written. A single random traversal does not produce any sequential misses. The memory costs associated with a single random traversal r_tra(R[,u]) again depend on the size of the gap R.w - u between two adjacent accesses. If the gap is equal to or larger than the size B of a single cache line, then no adjacent access can benefit from an already loaded cache line, which makes the same formula applicable as for the single sequential traversal in such a case (see Equation 4.5).

$$M_i^r(r\_tra^r(R,u)) = R.n \cdot \left( \left\lceil \frac{u}{B_i} \right\rceil + \frac{(u-1) \bmod B_i}{B_i} \right) \qquad (4.5)$$

If the gap between two adjacent accesses is smaller than a single cache line, a single cache line may have to be loaded several times throughout the sweep, as locally adjacent data is not accessed in temporal proximity. As it is possible that a cache line has been purged out of the cache before all accesses to it are completed, the probability of an early eviction has to be modeled.



Figure 4.5: Single random block traversal access pattern.

The likelihood of an early eviction depends on ||R|| and the cache capacity C. As an eviction only occurs once the available cache capacity is filled, the number #_i of cache lines that can be stored in the cache and, in case of a tuple that spans several cache lines, the number of data items of size R.w that fit into the cache are of relevance. Putting it all together, the cache misses can be derived by applying Equation 4.6 below.

$$M_i^r(r\_tra^r(R,u)) = |R|_{B_i} + \left( R.n - \min\{\#_i, |C_i|_{R.w}\} \right) \cdot \left( 1 - \min\left\{ 1, \frac{C_i}{\|R\|} \right\} \right) \qquad (4.6)$$

Single Random Block Traversal. We define a single random block traversal rb_tra(Blk,R[,u]) as a sweep over a data region R where every data item in R is accessed exactly once, whereas the data items are grouped in Blk.n blocks. Within each block the data items are traversed sequentially; the blocks themselves, however, are placed randomly in memory. Hence, a single random block traversal starts with a random access in order to get the first data item of the first block, followed by a number of sequential accesses depending on the number of data items Blk.w per block, followed again by a random access for retrieving the first data item of the next block. This implies the following two extreme cases: if Blk.w equals R.n and all data items are stored in one single block, the sweep equals a single sequential traversal and the resulting cache misses can be described through Equations 4.3 and 4.4. In contrast, if Blk.w equals 1 and each block contains just a single data item, the sweep equals a single random traversal and the resulting cache misses can be described through Equations 4.5 and 4.6. Disregarding the extreme cases, one can observe that the block sizing influences the ratio of random and sequential misses. When describing the random access for retrieving the first data item of each block, one has to differentiate whether the block size Blk.w is greater than or equal to, or smaller than, the size B of a single cache line.

If the block size is greater than or equal, a distinct cache line has to be touched for every block Blk.n in R. Hence, the number of cache misses per cache hierarchy level equals the number of blocks, as shown in Equation 4.7.

$$M_i^r(rb\_tra^r(Blk,R,u)) = Blk.n \qquad (4.7)$$

If the block size is smaller than a cache line, we again have at least as many cache misses as there are blocks. Additionally, there is the chance that a cache line has to be accessed several times, as it stores two or more disjoint blocks; since a locally adjacent access is not temporally adjacent, the cache line must, in the worst case, be loaded into the cache again for every block access. Consequently, depending on the block sizing and the cache capacity, each access to a different block which is stored in the same cache line may result in an additional cache miss, as described in Equation 4.8.

$$M_i^r(rb\_tra^r(Blk,R,u)) = Blk.n + \left( Blk.n - \min\{\#_i, |C_i|_{Blk.w}\} \right) \cdot \left( 1 - \min\left\{ 1, \frac{C_i}{\|R\|} \right\} \right) \qquad (4.8)$$

The sequential accesses in the context of the single random block traversal cover the sequential sweep over the remaining Blk.w - 1 data items in every block. Having a block size Blk.w smaller than the cache line size B also implies that the gap R.w - u between two data items is smaller than B, and all cache lines covering the remaining data items have to be loaded, as noted in Equation 4.9. This equation is also applicable if the block size Blk.w is larger than B, but the gaps between two data items are still smaller than B.

$$M_i^s(rb\_tra^s(Blk,R,u)) = |Blk.n \cdot (Blk.w - 1)|_{B_i} \qquad (4.9)$$

If the block size Blk.w as well as the gap R.w - u is greater than or equal to the cache line size B, not all cache lines covered by a block have to be loaded. Similar to a single sequential traversal, a suboptimal placement of a tuple u can result in accessing separate cache lines. The resulting number of cache misses is the product of the number of remaining data items per block over all blocks and the occurrence of cache misses in correspondence with the tuple size u and the cache line size B_i, as shown in Equation 4.10.

$$M_i^s(rb\_tra^s(Blk,R,u)) = \left( Blk.n \cdot (Blk.w - 1) \right) \cdot \left( \left\lceil \frac{u}{B_i} \right\rceil + \frac{(u-1) \bmod B_i}{B_i} \right) \qquad (4.10)$$
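To make the block-traversal cost terms tangible, the following C++ sketch evaluates Equations 4.7-4.10 for one cache level; the struct layout and the interpretation of |C_i|_{Blk.w} as the number of blocks fitting into the cache are assumptions of this example and not part of the thesis' implementation.

```cpp
#include <algorithm>
#include <cmath>

// Hedged sketch of Equations 4.7-4.10 for the single random block traversal.
struct DataRegion  { double n; double w; };               // R.n items of R.w bytes each
struct BlockLayout { double n; double w; };               // Blk.n blocks of Blk.w items each
struct CacheLevel  { double B; double C; double lines; }; // line size, capacity, #_i

// Random misses: one miss for the first item of every block (Eq. 4.7 / 4.8).
double random_block_misses(const BlockLayout& blk, const DataRegion& r, const CacheLevel& c) {
    if (blk.w * r.w >= c.B)                        // block spans at least one cache line
        return blk.n;                              // Eq. 4.7
    double blocks_in_cache = c.C / (blk.w * r.w);  // assumed reading of |C_i|_{Blk.w}
    return blk.n + (blk.n - std::min(c.lines, blocks_in_cache))
                 * (1.0 - std::min(1.0, c.C / (r.n * r.w)));   // Eq. 4.8
}

// Sequential misses: sweep over the remaining Blk.w - 1 items of every block.
double sequential_block_misses(const BlockLayout& blk, const DataRegion& r,
                               const CacheLevel& c, double u) {
    double remaining_items = blk.n * (blk.w - 1.0);
    if (r.w - u < c.B)                                          // gap smaller than a line:
        return std::ceil(remaining_items * r.w / c.B);          // Eq. 4.9, |Blk.n*(Blk.w-1)|_B
    return remaining_items * (std::ceil(u / c.B)
                            + std::fmod(u - 1.0, c.B) / c.B);   // Eq. 4.10
}
```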


Experiments

After formally describing the cache misses that occur when traversing block-wise grouped data in main memory, we present a set of experiments. Through the experiments we want to gain insights into the extent to which bandwidth-bound operations can be accelerated by a block-wise grouping of data, and into the correlation between data item size and block size. The experiments are based on a prototypical implementation of a hashtable which can be partitioned into blocks. When initializing the hashtable, one can define the size of a block, which implicitly determines into how many blocks the hashtable will be partitioned. If the block size is chosen to equal a single data item, the block mechanism has no further impact, as all data items will be placed randomly within the hashtable, according to the hash value of their key. An IBM Blade-Server H21 XM with an Intel Xeon E5450 CPU (the CPU has an L1 cache capacity of 32KB, an L2 cache capacity of 6MB, and an L1 and L2 cache line size of 64 bytes) was used for the experiments. The experiment executes a data traversal which touches 10 million data items in total; such an operation underlies a scan operation in AnalyticsDB. Throughout the experiments we vary three different factors: the data item size R.w, the block size Blk.w, and the type of operation on the data. The data item size R.w is set to 16, 64, or 1024 bytes in order to have data items which are smaller than, equal to, and significantly bigger than the L1 and L2 cache line size. The block size, or the number of consecutive data items per block, is increased in steps throughout the experiments. The data that is used in the experiments has the following characteristics: each data item consists of at least one integer value, making the minimum size of a data item 4 bytes. This integer value is the only data that is actually touched during our experiments and therefore constitutes u. Consequently, data items with a size R.w of 16, 64, or 1024 bytes all have the same 4-byte u, but they vary in their padding. The padding is done with additional integer, boolean, and string values. In every experiment run, 10 million such data items are traversed. Consequently, the size ||R|| of the data region is 153, 610, or 9766 megabytes, depending on whether the data item size R.w is set to 16, 64, or 1024 bytes. Figure 4.6(b) illustrates the number of L1 and L2 cache data misses and TLB misses during the data traversal. The three graphs vary in the data item size R.w (16, 64, 1024 bytes). When having a data item size of 16 bytes, one can observe that the number of L1 cache data misses is reduced by 50%, the number of L2 cache misses is reduced by 13%, and the number of TLB misses is reduced by 86% when eight data items are grouped in a block compared to one data item per block (Blk.w = 8 vs. Blk.w = 1).


Figure 4.6: Object sizing experiments. (a) Detailed breakdown of a data traversal with varying block size (Blk.w = {1, 10, 100, 1000, 10000, 10^6, 10^7, R.n}). (b) Data traversal with varying data item size (R.w = {16, 64, 1024} bytes) and block size (Blk.w = {1, 4, 8, 16, 128, 1024, 10^7, R.n}). (c) Impact of prefetching and block sizing on CPU cycles spent for the data traversal.

When having a data item size of 64 bytes, one can observe that the number of L1 cache data misses is reduced by 27%, the number of L2 cache misses remains constant, and the number of TLB misses is reduced by 86% when eight data items are grouped in a block compared to one data item per block. When having a data item size of 1024 bytes, one can observe that the number of L1 cache data misses is reduced by 36%, the number of L2 cache misses remains constant, and the number of TLB misses is reduced by 74% when eight data items are grouped in a block compared to one data item per block. Although the exact numbers vary, one can see a similar drop in cache misses once the block size is increased. Additionally, one can observe that at a block size of around 128 data items, the minimum number of cache misses for all hierarchy levels is reached. This observation matches Figure 4.6(a).

The calculations and experiments so far have been made with activated data prefetching, as it is the CPU chipset's default mode. As indicated in Subsection 2.2.1, prefetching has an impact on sequential data traversal. In this paragraph, we want to experimentally quantify that impact. Figure 4.6(c) compares the CPU cycles needed for executing the data traversal with a data item size R.w of 16 bytes, with activated and deactivated data prefetching. One can see that the bigger the block size, the greater the performance improvement: starting from a block size of 128 data items, activated prefetching can speed up the aggregation task by a factor of two. Figure 4.6(c) also shows that for a block size Blk.w = 1, the execution with deactivated prefetching is actually faster. This is due to incorrect prefetching: memory and bus are blocked for fetching the next adjacent cache line, yet this cache line will not be used, as all values are placed randomly in memory, and while it is being prefetched, the transmission of the actually needed cache line is delayed.

In conclusion, the micro benchmarks in Figure 4.6 illustrate that the required number of CPU cycles becomes minimal when a relatively small number of data items is placed consecutively in DRAM; at that point, the maximum data traversal or scan speed has already been reached. In the shown micro benchmarks a block size of around 1000 data items is sufficient to reach the maximum data traversal speed. Reflecting these insights on the object sizing in RAMCloud, we choose the allowed upper limit of 1MB for RAMCloud objects, which results in 131,072 column values per object (as an AnalyticsDB column value is 8 bytes). Given the results of our benchmark above, we conclude that we still achieve maximum scan performance with this partitioning scheme, which is validated in the subsequent Chapter 6 with a scan operation micro benchmark shown in Figure 6.1(a).
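A simplified version of the traversal used in the experiments could look like the following C++ sketch; the Record layout, the fixed seed, and the way blocks are shuffled are assumptions made for this illustration and differ from the partitioned hashtable prototype described above.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// 16-byte records whose first 4 bytes carry the value that is actually read (u),
// stored in blocks of `block_size` consecutive records; blocks are visited in
// random order, records within a block sequentially.
struct Record { int32_t value; int32_t padding[3]; };   // R.w = 16 bytes, u = 4 bytes

int64_t traverse(const std::vector<Record>& data, std::size_t block_size) {
    std::size_t num_blocks = data.size() / block_size;
    std::vector<std::size_t> block_order(num_blocks);
    std::iota(block_order.begin(), block_order.end(), 0);
    std::shuffle(block_order.begin(), block_order.end(), std::mt19937{42});

    int64_t sum = 0;
    for (std::size_t b : block_order)                   // random access per block
        for (std::size_t i = 0; i < block_size; ++i)    // sequential access within a block
            sum += data[b * block_size + i].value;
    return sum;
}
```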


Chapter 5

Data Processing

5.1 Database Operators in AnalyticsDB

AnalyticsDB processes queries with the help of eight different database operators. Each database operator O accepts as input one or several column(s) C and/or (a) position list(s) P. Each operator O evaluates a condition D and outputs a single value, column(s), or a position list. A position list can be seen as a filter on a column, as it references a subset of the total entries in a column. Since AnalyticsDB applies the pattern of late materialization (see Section 3.2), it tries to work with position lists as long as possible during the processing of a query for performance reasons. Table 5.1 presents an overview of the operators. Although the table shows that most operators accept one or more columns as input, they also accept a position list as input if the to-be-processed column is not yet materialized. Similar to the input, the output can also be a materialized column or just a position list. From an operator's perspective, a database table is just a collection of columns.

The Arithmetic Operator allows mathematical operations such as addition, subtraction, multiplication, or division between the tuples of two columns. The GroupBy Operator can reduce a table by a certain criterion which is denoted by the GroupBy Columns. The reduction is done by aggregating the corresponding tuples in the aggregation column, for example by calculating the sum. After the execution of a GroupBy operation, only the distinct combinations of the particular GroupBy Columns are returned. A HashJoin Operation identifies the matching tuples of two or more columns. Whether two tuples are considered to match is defined via the join criteria, which can for example require that the two tuples have the same value. The HashJoin Operation utilizes a hash table as an auxiliary data structure for executing the join by inserting the tuples from one column into the hash table, and then probing with each value from the other relation.


Operator | Input | Output
Arithmetic | Columns, Operation (add, sub, mult, div) | Column
GroupBy | Column(s), Aggregation Column, Aggregation Type (sum, average, count), GroupBy Column(s) | Column(s)
HashJoin | Columns, Join Criteria | Column(s) or Position List
Materialize | Column, Position List | Column
Merge Position Lists | Position Lists | Position List
OrderBy | Column(s), OrderBy Column(s) | Table
Scan | Column, Operand(s), Comparator(s) (>, ≥, =, ≤, <), Logical Combination (&, ||) | Column or Position List

Table 5.1: Database Operators in AnalyticsDB.

The Materialize Operator accepts as input a position list and a column, and returns the tuples as specified in the position list. The Merge Position List Operator merges two or more position lists. The OrderBy Operator sorts by (a) to-be-specified OrderBy Column(s), and the tuples of additional columns are rearranged accordingly. A Scan Operator traverses a column and evaluates for each tuple whether a condition matches. This condition can contain one or two operand(s), one or two comparator(s), and a logical combination.
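To illustrate how a position list flows between operators under late materialization, the following C++ sketch shows a simplified equality Scan followed by a Materialize; the type aliases and the fixed 8-byte value type are assumptions for this example and not AnalyticsDB's actual operator interfaces.

```cpp
#include <cstdint>
#include <vector>

using Column = std::vector<int64_t>;
using PositionList = std::vector<std::size_t>;

// Scan: evaluate the condition "value == operand" and return only the positions.
PositionList scan_equals(const Column& column, int64_t operand) {
    PositionList positions;
    for (std::size_t pos = 0; pos < column.size(); ++pos)
        if (column[pos] == operand)
            positions.push_back(pos);   // defer materialization of the values
    return positions;
}

// Materialize: turn a position list back into the referenced column values.
Column materialize(const Column& column, const PositionList& positions) {
    Column values;
    values.reserve(positions.size());
    for (std::size_t pos : positions)
        values.push_back(column[pos]);
    return values;
}
```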

5.2 Operator Push-Down Into RAMCloud

So far we have described how RAMCloud is used as shared storage in our system. With a standard configuration of RAMCloud, query execution can only happen on an AnalyticsDB node by loading the required data from RAMCloud into the query processing engine of an AnalyticsDB node. In this section we describe how we extend the RAMCloud system to allow for the execution of database operators directly in the storage, close to the data. Specifically, we first identify which operators are most significant for a database system designed for read-mostly workloads, such as AnalyticsDB, and then describe how we designed and implemented these operators in RAMCloud. We analyze the queries of the Star-Schema-Benchmark (SSB) [O’N] to identify which operators benefit from a push-down into the storage system.


Operator: Arithmetic | GroupBy | HashJoin | Materialize | Merge Positions | OrderBy | Scan
%: 0.0003 | 0.0754 | 0.6657 | 0.0693 | 0.0283 | 0.0017 | 0.1594

Table 5.2: AnalyticsDB operator break-down when executing the Star Schema Benchmark, normalized by the contribution of each operator to the overall query runtime.

Table 5.2 shows the AnalyticsDB operator break-down for one execution cycle of the SSB with a scale factor of 10 in local main memory on an Intel Xeon E5620. The complete execution time is normalized to highlight the contribution of each operator to the total execution time. To identify operators which should be considered for a push-down, two questions are of interest: to what extent does an operator contribute to the overall execution time? Does the operator usually work on data as stored in the storage system or on intermediate results? Table 5.2 shows that the HashJoin and Scan operators accumulate 82% of the total execution time in the SSB. From our query execution plans for the SSB, we derived that these operators are always the first to touch the raw data and consume it sequentially. The Materialize Operator also works directly on the data as stored in the persistence layer, for example when retrieving the actual values in a column based on a position list. Consequently, we decided to implement support for these operators in the storage layer. As a first approach, we want to push down the execution of operators which operate on one relation at a time. This simplifies the push-down, as it avoids the synchronization of two separate operator executions in the storage system. However, the HashJoin Operator works on two or more relations at a time, and in order to support its execution in the storage system, we dissect its execution into three parts (S ⋈ R with |S| ≤ |R|): building the hash table from relation S (hashBuild), probing against relation R (joinProbe), and optionally materializing the tuples that meet the join condition (materialize). Pushing down the execution of the hashBuild seems impractical as a) each RAMCloud node sees only a fraction of the data, which would introduce the extra effort of merging the different hash tables, and b) the execution of this merge operation in RAMCloud would require additional synchronization. Therefore, it is favorable to send the data from relation S to the AnalyticsDB node and build the hash table there. The join probing, in contrast, is eligible for a push-down, as each probing operation can happen separately at each respective RAMCloud node. The same applies to a potentially subsequent materialization operation. Consequently, we added support for the Scan, Materialization, and JoinProbing operations in RAMCloud. The GroupBy, Merge Position Lists, OrderBy, and Arithmetic operators work mostly on intermediate results which are processed inside the query engine of an AnalyticsDB node and, therefore, are not eligible for being pushed down to RAMCloud.


Figure 5.1: From SQL statement in AnalyticsDB (select town_id from cities where zip_code = 11011) to the corresponding main memory access in RAMCloud.

5.3 From SQL Statement to Main Memory Access

To allow for the push-down of the Scan and Materialize Operators as well as of the Join-Probing to RAMCloud nodes, we implement support for these operators in RAMCloud and add their operator signatures to the AnalyticsDB storage API as shown in Listing 3.1. To implement the AnalyticsDB storage API for RAMCloud, we added the corresponding RAMCloud client code in AnalyticsDB for invoking the operators in RAMCloud. The RAMCloud client component is responsible for mapping the columnar data to RAMCloud namespaces and objects. Figure 5.1 illustrates how the processing of a SQL query in an AnalyticsDB node results in a main memory access in a RAMCloud node. The presented query includes the execution of a Scan Operator which scans through the column zip_code and aims to find tuples whose value equals 11011. The Scan Operator is executed on the respective AnalyticsDB column, which then invokes the RAMCloud client: the invocation passes the fully qualified namespace of the column in RAMCloud as well as the scan operand. The RAMCloud client maps the namespace to a RAMCloud namespace ID (internally called a tablet in RAMCloud, see Section 3.3).

In our example, the data belonging to this namespace sits on a single RAMCloud node. The RAMCloud client also resolves the corresponding IP address and port, and invokes a SCAN remote procedure call (RPC). The incoming RPC is handled by an RPC handler in the RAMCloud node, which passes the to-be-scanned namespace id and the scan operand to its Scan Operator. The Scan Operator reads the key-value pairs which belong to this namespace from the hash table: in our example there are 1000 column tuples stored per key-value pair. Since there are 30000 different zip codes in the zip_code column (with the lowest value being 01000 and the highest 99998), the scan operator has to traverse 30 key-value pairs and finds the zip code 11011 at position 224. This position information is then passed back to AnalyticsDB. In a next step, the town_id at position 224 in the cities table is resolved.
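The end-to-end path can be condensed into the following self-contained C++ sketch; the Tablet and ScanRequest types and the in-process call that stands in for the ScanRPC over the RDMA-enabled interconnect are assumptions made for this illustration and do not reflect the actual RAMCloud client or RPC handler interfaces.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Tablet { std::vector<int64_t> values; };            // column tuples of one namespace
struct ScanRequest { uint64_t namespace_id; int64_t operand; };

// Storage side: the RPC handler hands the request to the Scan Operator, which
// traverses the values of the namespace and returns the matching positions.
std::vector<std::size_t> handle_scan_rpc(const std::unordered_map<uint64_t, Tablet>& store,
                                         const ScanRequest& request) {
    std::vector<std::size_t> positions;
    const Tablet& tablet = store.at(request.namespace_id);
    for (std::size_t pos = 0; pos < tablet.values.size(); ++pos)
        if (tablet.values[pos] == request.operand)
            positions.push_back(pos);
    return positions;
}

int main() {
    std::unordered_map<uint64_t, Tablet> store;
    store[10] = Tablet{{10999, 11011, 11013}};              // e.g. "de::cities::zip_code" -> id 10
    // Client side: ship the condition (zip_code = 11011) instead of pulling the column.
    std::vector<std::size_t> hits = handle_scan_rpc(store, ScanRequest{10, 11011});
    return hits.empty() ? 1 : 0;                            // matching positions go back to AnalyticsDB
}
```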


Part II

Database Operator Execution on a Shared Main Memory-Based Storage


Chapter 6

Operator Execution on One Relation

In this chapter we introduce an execution cost model for AnalyticsDB to analyze the impact of different parameters that are induced by the data mapping, the column partitioning, and the design of the operators themselves. We first derive an abstract system model which is later used to predict execution costs analytically for different scenarios. Afterwards, we use our cost model to evaluate operator push-down and data pull execution strategies and show how the cost model can be used to decide on different execution strategies. We start with operators that operate on one relation at a time, and continue with two or more relations in the next chapter.

Related System Models

There are a number of system models available that can be applied to model the operator execution costs in a parallel DBMS. For example, Culler et al. from the University of California, Berkeley present LogP [CKP+93] as a step towards a realistic model of distributed and parallel computation. LogP intends to be a general model that can be applied in the context of portable and parallel algorithms. L defines the latency introduced by communicating a message, o identifies the overhead for transmitting and receiving a message, g defines the gap between consecutive message transmissions, and P describes the number of processor and memory modules. Although an approach such as LogP is generally applicable, system models in the context of distributed query processing and database operator execution take additional query processing-related aspects into consideration. This is demonstrated by Lanzelotte et al. who present a cost model [LVZ93] for query execution in a parallel DBMS. They consider the number of tuples in a relation R, the size of one tuple of a relation R, the CPU and network speed, the size of a packet, as well as the time needed for sending and receiving messages.

Based on these input parameters, they define cost functions for the various database operations. Özsu and Valduriez [ÖV11] describe a cost model for the query optimizer of a parallel DBMS. They define the cost of a plan via three components: total work, response time, and memory consumption. This model also operates on a higher level of abstraction, as total work and response time are expressed in seconds (wall time) and memory consumption in bytes. However, we want to address a very specific aspect of operator execution, namely the site selection problem. As presented and discussed in Subsection 2.3.4, there is a great amount of related work that also incorporates several system models. Carey and Lu [CL86] include DB site and communication-related parameters as well as query-related parameters such as the query type or the result size of a query. Franklin et al. [FJK96] differentiate between operator-specific characteristics by modeling the number of instructions that are needed to perform underlying functions such as comparing values or calculating a hash value.

System Model

We abstract the following system model: a column C is defined by the number of contained records SC and the size of a single record Sr in bytes. It may be partitioned among n RAMCloud nodes RN1,...,RNn, resulting in disjoint, non-overlapping partitions C1,...,Cn with sizes SC,1,...,SC,n. All nodes are connected by network channels with constant bandwidth BWNet, measured in bytes per second. The execution of an operation O for a column C is coordinated by a single AnalyticsDB node AN, while O is executed by evaluating the position list P and condition D on C. Our system allows for two execution strategies:

1. ship all partitions C1,...,Cn from nodes RN1,...,RNn to AN and evaluate P and D locally at AN. We denote this strategy data pull (DP).

2. ship P and D to the RAMCloud nodes and evaluate them remotely at RN1,...,RNn. Here, P has to be split up into sub-position lists P1,...,Pn in order to ship a specific position list to each RAMCloud node. We denote this strategy operator push-down (OP).

To push down an operation O to RAMCloud nodes, it is split up into sub-operations O1,...,On. These sub-operations Oi take a condition D and a specific position list Pi as input to be evaluated on all values in Ci. Since we measure the network traffic in bytes, we have to distinguish different cases for each operator:

Table 6.1: Symbols in the system model for operating on one relation at a time.

Symbol | Parameter
C | A column
O | An operator executed on C
O1,...,On | O split up into sub-operations
P | Position list (input parameter for O)
SP | # of entries in P
Sp | Size of a single entry of P in bytes
P1,...,Pn | Partitioned position list
D | Condition (evaluated by O on C)
SC | # of contained records in C
Sr | Size of a single record in bytes
n | # of RAMCloud nodes
AN | AnalyticsDB node
RN1,...,RNn | Individual RAMCloud nodes
C1,...,Cn | Partitions of C
SC,1,...,SC,n | Sizes of the partitions in bytes
BWNet | Network bandwidth in bytes/second
BWMem | In-memory processing speed in bytes/second

For a scan operation, D is the selection condition and usually only a few bytes large (e.g. the size of two scan comparators and two comparison values). In case of a materialization operation, we set D = ∅ to return all values defined in P. For a join operation, D denotes the probing data and has a significant size. We denote the size of D in bytes by SD. The output of Oi is a list of column values or column positions where D evaluates to true. In our model, the fraction of values referenced in Pi for which D evaluates to true is defined by the selectivity parameter s. In case Pi = ∅, D is applied to all values in Ci. We denote the number of entries in P by SP and the size of one entry in bytes by Sp.

Execution Cost

To derive the overall time EO required to execute an operation O for DP and OP analytically, we first derive the delay induced by network transfers and afterwards the time required to execute operators in local DRAM. For operation O applied to a column C partitioned over n RAMCloud nodes, we derive network costs M as follows: for DP the network costs are simply given

by

$$M_{DP} = \frac{S_C \cdot S_r}{BW_{Net}} \qquad (6.1)$$

because their only dependency is the amount of data that is pulled from RN1,...,RNn to the local execution on AN.

For OP, the network costs MOP depend on the size of D and P, as well as on the selectivity s of the predicate. We derive MOP in (6.2):

$$M_{OP} = \frac{(S_P \cdot S_r + S_D \cdot n) + (S_P \cdot s \cdot S_r)}{BW_{Net}} \qquad (6.2)$$

The time required to execute an Oi on a RAMCloud node RNi depends in case of OP on the scan speed in DRAM at RNi, and in case of DP on the scan speed at AN. In our system model we define the in-memory processing speed by the parameter BWMem. We abstract the execution time TO of an operation as the sum of the time required to traverse the data and the time to write the results as follows:

$$T_O = \frac{S_P \cdot S_r + S_P \cdot S_r \cdot s}{BW_{Mem}} \qquad (6.3)$$

If the operation is a sub-operation Oi, the execution time TO,i at RNi is derived similarly to (6.3) by using the node-specific position list size SP,i instead of SP.

In case of OP, we have to consider the overhead time Tovh required to split P, and to later merge the results received from the RAMCloud nodes RN1,...,RNn. This results in an in-memory traversal over the operator's input and output data. We define Tovh as

$$T_{ovh} = \frac{S_P \cdot n}{BW_{Mem} \cdot S_p} + \frac{S_P \cdot s \cdot S_r}{BW_{Mem}} \qquad (6.4)$$

We now derive the overall execution time EO in seconds of operation O as the sum of the required network transfer time M and the time for operator execution, distribution, and merge overhead. For DP we derive (6.5).

$$E_{O,DP} = M_{DP} + T_O \qquad (6.5)$$

For OP, we have to consider the overhead for computing n specific position lists as well as the merge of Oi results. Hence, we derive (6.6) as execution time for OP.

$$E_{O,OP} = M_{OP} + T_{ovh} + \max(T_{O,i}) \qquad (6.6)$$

While our cost model abstracts from numerous system parameters, we found this abstraction accurate enough to evaluate the impact of our operator execution parameters.
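Taken together, the cost terms can be evaluated with a few lines of C++; the following sketch mirrors Equations 6.1-6.6 under the simplifying assumption of an evenly partitioned position list (so that max(TO,i) reduces to the cost of one node's share). The struct and function names are assumptions of this example and not part of AnalyticsDB.

```cpp
// Sketch of Equations 6.1-6.6; all symbols follow Table 6.1.
struct OperatorParams {
    double S_C;     // # of records in column C
    double S_r;     // record size in bytes
    double S_P;     // # of entries in position list P
    double S_p;     // size of one position list entry in bytes
    double S_D;     // size of condition D in bytes
    double s;       // selectivity
    double n;       // # of RAMCloud nodes holding partitions of C
    double BW_net;  // network bandwidth in bytes/s
    double BW_mem;  // in-memory processing speed in bytes/s
};

double cost_data_pull(const OperatorParams& p) {
    double m_dp = (p.S_C * p.S_r) / p.BW_net;                        // Eq. 6.1
    double t_o  = (p.S_P * p.S_r + p.S_P * p.S_r * p.s) / p.BW_mem;  // Eq. 6.3
    return m_dp + t_o;                                               // Eq. 6.5
}

double cost_operator_push(const OperatorParams& p) {
    double m_op  = ((p.S_P * p.S_r + p.S_D * p.n)
                   + (p.S_P * p.s * p.S_r)) / p.BW_net;              // Eq. 6.2
    double t_ovh = (p.S_P * p.n) / (p.BW_mem * p.S_p)
                 + (p.S_P * p.s * p.S_r) / p.BW_mem;                 // Eq. 6.4
    double s_p_i = p.S_P / p.n;                                      // assumed even split of P
    double t_o_i = (s_p_i * p.S_r + s_p_i * p.S_r * p.s) / p.BW_mem; // per-node variant of Eq. 6.3
    return m_op + t_ovh + t_o_i;                                     // Eq. 6.6
}

// Data pull or operator push-down can then be chosen per operator by comparing costs.
bool prefer_operator_push(const OperatorParams& p) {
    return cost_operator_push(p) < cost_data_pull(p);
}
```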


6.1 Evaluating Operator Execution Strategies

After introducing the AnalyticsDB execution cost model, we validate it with a set of micro benchmarks. Each micro benchmark represents a single operator execution. We vary the previously described cost model parameters throughout the micro benchmarks, so that they allow for a discussion of the relation between the execution strategies data shipping and operator push-down. We execute the micro benchmarks on a cluster of 50 nodes in total, where each node has an Intel Xeon X3470 CPU, 24GB DDR3 DRAM, and a Mellanox ConnectX-2 InfiniBand HCA network interface card. The nodes are connected via a 36-port Mellanox InfiniScale IV (4X QDR) switch. We use one node for running AnalyticsDB and vary the number of RAMCloud nodes between one and 20. As the chosen number of nodes is sufficient for demonstrating the impact of the cost model parameters on the operator execution, we will use the full cluster capacity only in the subsequent Part III. In addition, we also provide a baseline where the respective micro benchmark is executed on local DRAM on a single AnalyticsDB node.

Figure 6.1(a) shows a scan operation on the entire column with an increasing selectivity and a fixed RAMCloud cluster size of one. The operator push execution time for a scan with a low selectivity is close to the local DRAM variant, and data pull takes almost twice as long due to the initial full column copy over the network. With an increasing selectivity, the execution time of the operator push-down strategy approaches the data pull variant, as the same amount of data travels over the network. Figure 6.1(b) illustrates a full column scan with a fixed selectivity of 0.5, but with a varying number of nodes in the RAMCloud cluster. One can see that with an increasing number of nodes, the operator push-down execution is accelerated due to the parallel execution of the scan operator, but reaches a limit at around 10 nodes: at that point TO,i is minimized and the execution time is dominated by MOP + Tovh, which cannot be reduced by adding more nodes. Tovh even grows with an increasing number of nodes, which causes the operator execution time to slightly increase towards a cluster size of 20 nodes. The execution time of data pull is not affected by a larger cluster size and remains constant. Figure 6.1(c) depicts a materialization operation with an increasing position list size. The operator push-down execution time increases gradually with a growing position list size and at some point exceeds the execution time of data pull. This is caused by the combined size of the position list and the returned column values exceeding the column size. In such a case, the data pull execution time is faster than the operator push execution time. Figure 6.1(d) shows a join probing with an increasing probing data size on a RAMCloud cluster with 20 nodes.


Figure 6.1: Evaluating operator execution strategies. (a) Scan with increasing selectivity (SD=2, SP=0, s={0.1...1}, n=1, BWMem=2 GB/s). (b) Scan with increasing number of nodes (SD=2, SP=0, s=0.5, n={1...20}, BWMem=2 GB/s). (c) Materialization with increasing position list size (SD=0, SP={0...60 Mio.}, s=1, n=1, BWMem=2 GB/s). (d) Join probing with increasing probing data size (SD={0...60 Mio.}, SP=10 Mio., s=0.5, n=20, BWMem=0.4 GB/s).

BWMem is smaller here than in the previous micro benchmarks, since the creation of, and the probing against, a hash map takes longer than a scan or a materialization operation. The graph illustrates that operator push-down can benefit from the parallel execution on 20 nodes and is faster than the execution on local DRAM when the probing data is small. When the probing data becomes bigger, the operator push-down execution time increases, as the probing data has to be sent to all 20 nodes. At a certain probing data size, the operator push-down execution time exceeds the data pull execution time and makes data pull preferable.

Summarizing the insights gained from the micro benchmarks: the data pull execution strategy is two to three times slower than operating on local DRAM. The performance of the operator push strategy varies: if the to-be-accessed data is on a single RAMCloud node, the performance is only a few percent worse than operating on local DRAM if the selectivity or the number of entries in the position list is small. If these parameters grow, the operator push performance gradually approximates the data pull performance and can even become worse. If the to-be-processed data is partitioned across several nodes, the operator push execution time can be up to five times faster than on local DRAM. But node parallelism can also worsen the operator push execution time to an extent that it becomes slower than data pull: this is the case if the input parameter SD becomes large and must be dispatched to all involved nodes.

6.2 Optimizing Operator Execution

The previous subsection demonstrated that the optimal operator execution strategy depends on a set of parameters. In this section, we show that the optimal execution strategy within a single query can vary for each involved operator. We use the same cluster setup as in the previous subsection. Figure 6.2 depicts the execution times for different execution strategies based on the query shown in the figure. The join probing operation (SD=200,000, SP=6 Mio., s=1) can benefit from the parallelism of the ten nodes and is fastest with an operator push strategy. The materialization operation (SD=0, SP=6 Mio., s=1) has a position list size that is as large as the column itself: the data pull strategy performs better for this operator execution. Consequently, the optimal execution time can be reached with a mix of the data pull and operator push strategies, as illustrated by the last column in Figure 6.2.


select sum(lo_revenue) from lineorder, part where lo_partkey = p_partkey and p_mfgr between MFGR#1 and MFGR#5


Figure 6.2: Execution of a SQL query on the Star Schema Benchmark data set with SF=1, SC=6 Mio., Sr=8 bytes, N=10 and different execution strategies. The figure illustrates that for this query a mix of data pull and operator push execution strategies is preferable.

6.3 Implications of Data Partitioning

Section 6.1 illustrates that the partitioning criterion of the data in RAMCloud influences the execution time of a pushed-down operator, as it can benefit from a parallel execution on several RAMCloud nodes. As shown in the micro benchmark in Figure 6.1(b), the parallel execution can be leveraged until the execution time of each operator is minimized to an extent that the overall execution time is dominated by the data transfer over the network and by overhead costs such as merging the results from all nodes. We further explore this causality by executing a set of 42 scan operations on three different columns. The scan operations vary in their selectivity, and the columns vary in their size (60 million values, 800,000 values, and 2556 values). We execute the scan operations multiple times, but vary across how many RAMCloud nodes each column is partitioned (via the RAMCloud server span parameter as explained in Section 4.1). The variation ranges from storing each column on a single RAMCloud node (server span=1) up to partitioning each column across 20 RAMCloud nodes (server span=20). Figure 6.3 shows that the scan operations on the column with 60 million values benefit by up to a factor of 5 from being distributed, and the scan operations on the column with 800,000 values are accelerated by up to a factor of 1.6, but the performance of the scan operations on the column with 2556 values decreases with every additional node, by up to a factor of 4.5.



Figure 6.3: AnalyticsDB runs on a single node with an operator push execution strategy, the RAMCloud cluster has a size of 20 nodes, and the server span varies.

So far, we have only covered the aspect of data partitioning when one AnalyticsDB instance operates exclusively on a RAMCloud cluster. Now we cover the aspect of partitioning in combination with a variable number of AnalyticsDB nodes: we have a constant number of 20 nodes in the RAMCloud cluster, but vary the number of AnalyticsDB nodes between 1 and 30. If a new AnalyticsDB node is added, it is instructed by the federator to continuously execute the SSB: this results in a load increase. Figure 6.4 shows the corresponding experiment: in Figure 6.4(a) the server span is 10 and in Figure 6.4(b) the server span is 20. In Figure 6.4(a) the throughput increases until 15 AnalyticsDB nodes and then begins to flatten out, which means the operator throughput in RAMCloud is saturated. The maximum throughput is 2607 SSB cycles per hour. In Figure 6.4(b) the throughput increases until 20 AnalyticsDB nodes and then begins to flatten out. The maximum throughput is 3280 SSB cycles per hour. The following two insights can be derived from the experiments: a) we demonstrated that a server span of 10 delivers the optimal SSB execution time when a single AnalyticsDB node uses a RAMCloud cluster with 20 nodes. This statement remains valid as long as up to ten AnalyticsDB nodes are running. With more than ten AnalyticsDB nodes, a server span of 20 results in a better SSB execution time, as the to-be-accessed data is distributed across more RAMCloud nodes and therefore the operator throughput in RAMCloud is saturated at a later point. b) Even in the case of over-provisioning (e.g. 30 AnalyticsDB nodes vs. 20 RAMCloud nodes), the SSB throughput remains constant: the execution time of a single benchmark run increases over-linearly (due to the operator throughput saturation in RAMCloud), but this does not result in a reduction of the SSB throughput.


(a) RAMCloud running on 20 nodes with server span = 10; (b) RAMCloud running on 20 nodes with server span = 20.

Figure 6.4: RAMCloud cluster with a constant number of 20 nodes and a varying number (1-30) of nodes running AnalyticsDB with an operator push execution strategy and a Star Schema Benchmark data scale factor of 10.

In addition, the increasing throughput in both experiments can either be leveraged for performing a higher number of SSB executions in parallel or for reducing the execution time of a single SSB execution by dispatching its queries across the different AnalyticsDB nodes via the federator.

Chapter 7

Operator Execution on Two Relations

The previous chapter describes the possibility of pushing down the execution of database operators into RAMCloud, but the corresponding model is limited to operating on a single database column or relation at a time. This is practical for the execution of a scan operation, but necessitates breaking up a join operation into two separate operations. This comes along with the previously mentioned drawbacks of a) limiting the maximum size of a join relation by the client's available main memory and b) introducing the client's network link as a bottleneck. In addition, it makes pre- or post-join projections in the context of a join very expensive, as the intermediate results always have to be transferred to the client instead of being left in the storage system. Motivated by avoiding those drawbacks, we describe in this chapter the execution and comparison of different distributed join algorithms inside RAMCloud. Before we describe the choice of the to-be-evaluated join algorithms, we extend the existing system model and define the following assumptions with regard to the distributed join execution within RAMCloud: we execute a join operation between two relations S ⋈ R with Ssize ≤ Rsize. The data of a relation is partitioned across a number of RAMCloud nodes (npartR/S), but among those nodes the data distribution is considered to be even. As explained previously, the data of a relation is distributed on a fine-granular level in a round-robin manner across the respective RAMCloud nodes. Consequently, we also assume that the data is partitioned evenly across the nodes even after filtering or a pre-join projection. For the evaluation of our models, a pre-join projection simply results in a smaller Ssize/Rsize. It is possible that a number of RAMCloud nodes hold some data from R as well as from S (npartOvl). The RAMCloud nodes used for executing the join (njoin) are either a sub- or superset of npartR/S.


In order to reduce the amount of data to be transferred, joins are executed as semi-joins, which results in transferring only the relations containing the join predicate, succeeded by a materialization of the remaining relations as defined by the query. However, this potentially necessary materialization is out of our scope, as we define a join execution as finished when the matching tuples from both relations have been derived across all RAMCloud nodes that executed the join (njoin). Table 7.1 explains the parameters and their notation used within the join algorithms and our analytical model.

There is a great variety of different join algorithms in the field of parallel database systems. However, when comparing the different algorithms, there are two aspects which strongly characterize such an algorithm: the way the data is partitioned and transferred across the participating nodes for the join execution, and the mechanism to perform the actual comparison between elements of the two join-relations on each node. Since we implement and execute the joins in a storage system with switched fabric communication links, we want to focus on the implications of data partitioning and transfer, and not on the different options for performing the data item comparisons. Therefore, we only consider hashing and probing for the comparison (and not e.g. sort-merge), since it is a well-established technique, especially in the context of main memory data access [BLP11, BTAÖ13]. As for the aspect of partitioning and data transfer we want to evaluate the following three strategies:

1. Partitioning of both relations equally across all njoin nodes. This results in a constant amount of data to be transferred. This can be done with the Grace Join.

2. Replication of the smaller relation across all njoin nodes. This results in an amount of data to be transferred which grows linearly with the number of njoin nodes. After the completion of the data transfer, we want to execute the data item comparison, which results in a minimal probing effort at the price of a decreased degree of parallelism. This can be done with the distributed block-nested loop join (DBNLJ).

3. Replication of the smaller relation across all njoin nodes. This results in an amount of data to be transferred which grows linearly with the number of njoin nodes. During the data transfer, we want to execute the data item comparison, which results in an increased probing effort, but enables data transfer and probing to happen in parallel. This can be done with the Cyclo Join.

Table 7.1: System model symbols for operating on two relations at a time.

Symbol | Parameter
n | # of nodes in the cluster
R/S | To-be-joined relations R/S
npartR | # of nodes relation R is partitioned across
npartS | # of nodes relation S is partitioned across
npartOvl | # of nodes where relations R and S overlap
njoin | # of nodes used for processing a join
N^partR/S_1...i | Individual nodes with a partition of R/S
N^join_1...i | Individual nodes used for processing a join
|R| / |S| | Individual records of R/S
R1...i/S1...i | Individual partitions of R/S
|Ri| / |Si| | Individual records of a partition of R/S
Rsize/Ssize | Size of relations R/S in bytes
BWnet | Network throughput between two nodes in bytes/sec
BWhash | In-memory throughput for building a hash table on a single node in bytes/sec
BWprobe | In-memory throughput for probing against a hash table on a single node in bytes/sec


7.1 Grace Join

Grace Join algorithm: the Grace Join is described by Schneider and DeWitt [SD89] and is one of the standard join algorithms in the Gamma Database Machine Project [DGS+90]. The Grace Join can be divided into two phases, as shown in Algorithms 1 and 2. The first phase partitions the data across all nodes N^join_1...i used for executing the join. This happens by creating a number of buckets on each node N^part_1...i that holds some data of a to-be-joined relation. The number of created buckets on each node matches the number of nodes that will be used for executing the join. The data from relations R and S on each node N^part_i is then hashed into the buckets. In a next step, those buckets are sent to the nodes executing the join. This hash partitioning of R and S across all nodes that execute the join ensures that all records which potentially match the join criteria end up on the same node. In the second phase (Algorithm 2), each node N^join_1...i locally executes a simple hash join by hashing its data from S (which is considered to be smaller than or equal to its data from R). A probing is then done against the resulting hash table by iterating over the data from R.

foreach |Ri| / |Si| do
    hash |Ri| / |Si| into bucket^R/S_(i mod njoin)
foreach bucket^R/S_i do
    send bucket^R/S_i to host N^join_i

Algorithm 1: Grace Join partitioning phase at every N^partR/S_i

foreach bucket^S_i in buckets^S do
    hash bucket^S_i into hashtable_S
foreach bucket^R_i in buckets^R do
    probe bucket^R_i against hashtable_S

Algorithm 2: Grace Join processing phase at every N^join_i
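On each join node, the processing phase of Algorithm 2 boils down to a local hash join; the following C++ sketch assumes that records have already been reduced to (join key, position) pairs during partitioning, which is an assumption of this example rather than the thesis' implementation.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Bucket = std::vector<std::pair<int64_t, std::size_t>>;   // (join key, position)
using Match  = std::pair<std::size_t, std::size_t>;            // (position in S, position in R)

std::vector<Match> local_hash_join(const std::vector<Bucket>& buckets_s,
                                   const std::vector<Bucket>& buckets_r) {
    std::unordered_multimap<int64_t, std::size_t> hashtable_s;  // build side: S
    for (const Bucket& bucket : buckets_s)
        for (const auto& [key, pos_s] : bucket)
            hashtable_s.emplace(key, pos_s);

    std::vector<Match> matches;                                  // probe side: R
    for (const Bucket& bucket : buckets_r)
        for (const auto& [key, pos_r] : bucket) {
            auto [first, last] = hashtable_s.equal_range(key);
            for (auto it = first; it != last; ++it)
                matches.emplace_back(it->second, pos_r);
        }
    return matches;
}
```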

Grace Join system model: the Grace Join distributes the relations S and R over all nodes that participate in the join. This results in the transfer of both relations over the network, minus the data that does not have to be moved because its initial placement already equals the placement for the join execution (Eq. 7.1). The total amount of bytes to be processed by hashing and probing (Eq. 7.4) consists of the initial hashing of the data for partitioning (Eq. 7.5 and 7.6) and the hashing (Eq. 7.7) and probing (Eq. 7.8) of the partitioned data on each node for executing the simple hash join.

The total execution time (Eq. 7.9) is the sum of the time for partitioning the data and the time for the join execution on each node. The time for hash-partitioning the data (Eq. 7.10-7.11) is influenced by the initial partitioning of R and S and by the extent to which their data overlaps on nodes. The hashing of the data for partitioning and the transmission over the network to the join nodes (Eq. 7.13-7.15) can be done in parallel. After the data distribution, the execution time is determined by executing the simple hash join (Eq. 7.16). The partitioning phase and the join processing phase are modelled as non-overlapping, a) because the processing capabilities of the nodes can be fully utilized during the hash partitioning with the data being stored in memory, and b) to keep the number of probing operations constant during the join processing phase.

$$N_{tot} = N_r + N_s \qquad (7.1)$$

$$N_r = R_{size} - \frac{R_{size}}{n_{partR} \cdot n_{join}} \cdot \min(n_{partR}, n_{join}) \qquad (7.2)$$

$$N_s = S_{size} - \frac{n_{partOvl}}{n_{partS}} \cdot \frac{S_{size} \cdot \min(n_{partS}, n_{join})}{n_{partS} \cdot n_{join}} \qquad (7.3)$$

$$J_{tot} = J_{hashPartR} + J_{hashPartS} + J_{hash} + J_{probe} \qquad (7.4)$$

$$J_{hashPartR} = R_{size} \qquad (7.5)$$

$$J_{hashPartS} = S_{size} \qquad (7.6)$$

$$J_{hash} = R_{size} \qquad (7.7)$$

$$J_{probe} = S_{size} \qquad (7.8)$$


$$T_{tot} = \max(T_{hashPart}, T_{net}) + T_{join} \qquad (7.9)$$

$$T_{hashPart} = \max\left( \frac{J_{hashPartR}}{n_{partR} \cdot BW_{hash}}, \frac{J_{hashPartS}}{n_{partS} \cdot BW_{hash}} \right) \quad \text{if } n_{partOvl} = 0 \qquad (7.10)$$

$$T_{hashPart} = \frac{\frac{J_{hashPartR}}{n_{partR}} + \frac{J_{hashPartS}}{n_{partS}}}{BW_{hash}} \quad \text{if } n_{partOvl} > 0 \qquad (7.11)$$

$$T_{net} = \max(T_{send}, T_{recv}) \qquad (7.12)$$

$$T_{send} = \max\left( \frac{N_r}{n_{partR} \cdot BW_{net}}, \frac{N_s}{n_{partS} \cdot BW_{net}} \right) \quad \text{if } n_{partOvl} = 0 \qquad (7.13)$$

$$T_{send} = \frac{\frac{N_r}{n_{partR}} + \frac{N_s}{n_{partS}}}{BW_{net}} \quad \text{if } n_{partOvl} > 0 \qquad (7.14)$$

$$T_{recv} = \frac{N_s + N_r}{n_{join} \cdot BW_{net}} \qquad (7.15)$$

$$T_{join} = \frac{J_{hash}}{n_{join} \cdot BW_{hash}} + \frac{J_{probe}}{n_{join} \cdot BW_{probe}} \qquad (7.16)$$

7.2 Distributed Block Nested Loop Join

Distributed Block Nested Loop Join algorithm: a Distributed Block Nested Loop Join performs a join between R and S by scanning every block of S tuples once for every block of R tuples. In a cluster, the data of R and S is distributed across a set of nodes and, for processing the join, the blocks of R and S have to be exchanged between the respective nodes. The resulting network traffic can only be reduced by running the outer loop on each node, supplying each node N^join_1...i with the entire relation S.

As depicted in Algorithm 3, each node N^join_i receives the complete relation S from the nodes N^partS_1...i. If the number of nodes the relations S and R are partitioned across and the number of nodes used for join processing are not equal, then R also has to be distributed. However, each node N^join_i only receives a part of R, where the size of the part depends on the number of nodes participating in the join. Once the data has been distributed, the join can be processed. Whether S or the respective part of R will be hashed or probed against depends on their sizes.


foreach N^partR/S_1...i do
    if npart = njoin then
        receive Si from N^partS_i
    else
        receive Si, R_(i mod njoin) from N^partR/S_i
compute Ri ⋈ Si in memory

Algorithm 3: BNLJ algorithm for every N^join_i
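The in-memory compute step on each join node can be sketched as follows; the reduction to bare join keys and the purely local decision of which side to hash are assumptions of this example (in the model the choice depends on the global sizes, see Equations 7.22-7.25).

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Per-node compute step of Algorithm 3 as a semi-join on keys: S is replicated
// in full while the node only holds its part of R, so the locally smaller side
// is hashed and the other side is probed against it.
std::size_t local_bnlj_matches(const std::vector<int64_t>& s_full,     // replicated relation S
                               const std::vector<int64_t>& r_local) {  // this node's part of R
    const bool hash_r = r_local.size() < s_full.size();
    const std::vector<int64_t>& build = hash_r ? r_local : s_full;
    const std::vector<int64_t>& probe = hash_r ? s_full : r_local;

    std::unordered_set<int64_t> hashtable(build.begin(), build.end());
    std::size_t matches = 0;
    for (int64_t key : probe)
        matches += hashtable.count(key);
    return matches;
}
```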

Distributed Block Nested Loop Join system model: the DBNLJ, as specified in Algorithm 3, replicates the relation S on, and distributes the relation R equally over, every node that participates in the join. The resulting network transfer Ntot is the sum of both operations (Eq. 7.17). The network transfer for R is the distribution of R over all nodes that participate in the join (Eq. 7.18). The network transfer for S is the replication of parts of the relation to nodes that already hold some data of S and the full replication to extra nodes which might be added to the join execution (Eq. 7.19-7.20). The total amount of bytes to be processed in memory for the join execution (Eq. 7.21) is the sum of hashing and probing. Since S is replicated to all nodes, but each node only holds a part of R, it might be cheaper to hash R and probe over S (Eq. 7.22-7.25). The total execution time (Eq. 7.26) is the sum of the time needed for partitioning the data (Eq. 7.27) and the time for hashing and probing the data (Eq. 7.31). The time needed for partitioning the data across the nodes is determined by sending and receiving the data over the network (Eq. 7.28-7.30).

$$N_{tot} = N_s + N_r \qquad (7.17)$$

$$N_r = \frac{R_{size}}{n_{partR}} \cdot \max(n_{partR} - n_{join}, 0) + \frac{R_{size}}{n_{join}} \cdot \max(n_{join} - n_{partR}, 0) \qquad (7.18)$$

$$N_s = S_{size} \cdot (n_{join} - 1) \quad \text{if } n_{join} \geq (n_{partR} + n_{partS} - n_{partOvl}) \qquad (7.19)$$

$$N_s = S_{size} \cdot \left( n_{join} - \frac{n_{join}}{n_{partR} + n_{partS} - n_{partOvl}} \right) \quad \text{if } n_{join} < (n_{partR} + n_{partS} - n_{partOvl}) \qquad (7.20)$$


$$J_{tot} = J_{hash} + J_{probe} \qquad (7.21)$$

$$J_{hash} = S_{size} \cdot n_{join} \quad \text{if } S_{size} \leq \frac{R_{size}}{n_{join}} \qquad (7.22)$$

$$J_{hash} = R_{size} \quad \text{if } S_{size} > \frac{R_{size}}{n_{join}} \qquad (7.23)$$

$$J_{probe} = R_{size} \quad \text{if } S_{size} \leq \frac{R_{size}}{n_{join}} \qquad (7.24)$$

$$J_{probe} = S_{size} \cdot n_{join} \quad \text{if } S_{size} > \frac{R_{size}}{n_{join}} \qquad (7.25)$$

$$T_{tot} = T_{part} + T_{join} \qquad (7.26)$$

$$T_{part} = \max(T_{send}, T_{recv}) \qquad (7.27)$$

$$T_{send} = \max\left( \frac{N_r}{n_{partR} \cdot BW_{net}}, \frac{N_s}{n_{partS} \cdot BW_{net}} \right) \quad \text{if } n_{partOvl} = 0 \qquad (7.28)$$

$$T_{send} = \frac{\frac{N_r}{n_{partR}} + \frac{N_s}{n_{partS}}}{BW_{net}} \quad \text{if } n_{partOvl} > 0 \qquad (7.29)$$

$$T_{recv} = \frac{N_s + N_r}{n_{join} \cdot BW_{net}} \qquad (7.30)$$

$$T_{join} = \frac{J_{hash}}{n_{join} \cdot BW_{hash}} + \frac{J_{probe}}{n_{join} \cdot BW_{probe}} \qquad (7.31)$$

7.3 Cyclo Join

Cyclo Join algorithm: Frey, Goncalves, Kersten, and Teubner introduced the Cyclo Join [FGKT10] as a way to exploit inter-node bandwidth for join processing by creating a virtual ring between the nodes that participate in the processing of a join. During the join processing, data is continuously pumped through that ring, thereby allowing for a greater degree of parallelism. In order to minimize the impact of the network processing overhead and to utilize the performance characteristics of in-memory data storage at the same time, RDMA is the network technology of choice for the Cyclo Join. Goncalves and Kersten describe the Data Cyclotron [GK11] as a complete ring-centered data processing architecture.


Algorithm 4 describes the Cyclo Join. The algorithm is similar to the Distributed BNLJ algorithm as introduced in the previous subsection, except for the distribution of S. Before the join execution starts, relation R is equally distributed across all nodes $N_{1 \dots i}^{join}$. Then each node $N_i^{join}$ can in parallel calculate the join between $R_i$ and $S_i$ and ship parts of S to the next node in the virtual ring. Whether S or R will be hashed or probed against depends on their sizes as well as on $n_{join}$. After completion of the Cyclo Join, relation S has traversed each node $N_i^{join}$ in its entirety.

1  if $n_{part} \neq n_{join}$ then
2      receive $R_{i \bmod n_{join}}$ from $N_i^{partR}$;
3  foreach block $S_i$ received from $N_{(i-1) \bmod n_{join}}^{join}$ do
4      compute $R_i \Join S_i$ in memory;
5      forward $S_i$ to host $N_{(i+1) \bmod n_{join}}^{join}$;

Algorithm 4: Cyclo Join algorithm for every $N_i^{join}$
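The ring traversal of Algorithm 4 can be sketched as follows in C++; the transport and join callbacks are hypothetical placeholders for the RDMA-based block exchange and the local probing, not RAMCloud or AnalyticsDB interfaces.

#include <cstddef>
#include <functional>
#include <vector>

struct Tuple { unsigned long long key; unsigned long long rowId; };
using Block = std::vector<Tuple>;

// Hypothetical ring step for node i of nJoin nodes: blocks of S traverse the
// virtual ring while each node joins the passing blocks against its local R_i.
void cycloJoinNode(std::size_t i, std::size_t nJoin, std::size_t blocksOfS,
                   const std::function<Block(std::size_t /*fromNode*/)>& receiveFrom,
                   const std::function<void(std::size_t /*toNode*/, const Block&)>& forwardTo,
                   const std::function<void(const Block&)>& joinAgainstLocalR) {
    for (std::size_t b = 0; b < blocksOfS; ++b) {
        // Receive the next block of S from the predecessor in the virtual ring.
        Block sBlock = receiveFrom((i + nJoin - 1) % nJoin);
        // Compute R_i join S_block in memory (e.g. probe a prebuilt hash table of R_i).
        joinAgainstLocalR(sBlock);
        // Forward the block to the successor so that S eventually visits every node.
        forwardTo((i + 1) % nJoin, sBlock);
    }
}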

Cyclo Join system model: the total network transfer (Eq. 7.32) for a Cyclo Join execution consists of the data transfer before the join execution and the cyclic data transfer during the join. Before the join execution, S and R are equally distributed across all nodes that participate in the join processing (Eq. 7.33-7.35). During the join execution, S is fully replicated to every node via the virtual network ring (Eq. 7.36). The total amount of bytes to be processed in memory for the join execution (Eq. 7.37) is the sum of hashing and probing. Since the number of probing operations grows with the number of nodes participating in the join, either the hashing or the probing operations can dominate the join execution. Therefore, two helper equations (Eq. 7.38-7.39) are introduced to determine whether the total execution time of the join processing is smaller if S or R is being hashed or probed against. Equations (Eq. 7.40-7.43) use these functions and provide the resulting amount of bytes to be processed in memory for hashing and probing. The total execution time for a Cyclo Join (Eq. 7.44) consists of the initial data partitioning as well as the cyclic data transfer and the join processing. The time for the initial data partitioning (Eq. 7.45) includes the time for distributing S and R equally (Eq. 7.46-7.48) and for hashing either S or R. Since the cyclic data transfer of S and the probing can happen in parallel, $T_{cyc}$ (Eq. 7.49) is dominated by the more time-consuming operation of the two.


$N_{tot} = N_{partR} + N_{partS} + N_{cyc}$   (7.32)

$N_{partR} = \max(n_{partR} - n_{join}, 0) \cdot \frac{R_{size}}{n_{partR}} + \max(n_{join} - n_{partR}, 0) \cdot \frac{R_{size}}{n_{join}}$   (7.33)

$N_{partS} = \max(n_{partS} - n_{join}, 0) \cdot \frac{S_{size}}{n_{join}}$ if $n_{join} \geq (n_{partR} + n_{partS} - n_{partOvl})$   (7.34)

$N_{partS} = \max(n_{partR} - n_{join}, 0) \cdot \frac{S_{size}}{n_{partS}}$ if $n_{join} < (n_{partR} + n_{partS} - n_{partOvl})$   (7.35)

$N_{cyc} = (n_{join} - 1) \cdot S_{size}$   (7.36)

$J_{tot} = J_{hash} + J_{probe}$   (7.37)

$T_{hash}^{s} = \frac{S_{size}}{BW_{hash}} + \frac{R_{size} \cdot n_{join}}{BW_{probe}}$   (7.38)

$T_{hash}^{r} = \frac{R_{size}}{BW_{hash}} + \frac{S_{size} \cdot n_{join}}{BW_{probe}}$   (7.39)

$J_{hash} = S_{size}$ if $T_{hash}^{s} \leq T_{hash}^{r}$   (7.40)

$J_{hash} = R_{size}$ if $T_{hash}^{s} > T_{hash}^{r}$   (7.41)

$J_{probe} = R_{size} \cdot n_{join}$ if $T_{hash}^{s} \leq T_{hash}^{r}$   (7.42)

$J_{probe} = S_{size} \cdot n_{join}$ if $T_{hash}^{s} > T_{hash}^{r}$   (7.43)


Table 7.2: Hardware parameters.

Parameter   Value
CPU         Intel Xeon X3470 CPU
NIC         Mellanox ConnectX-2 InfiniBand HCA
BW_net      3.142 GB/sec   (network bandwidth between two nodes)
BW_hash     0.107 GB/sec   (throughput for inserting into std::unordered_set)
BW_probe    0.731 GB/sec   (throughput for probing against std::unordered_set)
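The two memory-side parameters refer to std::unordered_set throughput; the exact benchmark code is not shown here, but a minimal micro-benchmark of that kind could look like the following sketch (the key count and the use of 8-byte keys are illustrative assumptions).

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_set>
#include <vector>

// Minimal sketch of how BW_hash / BW_probe could be measured: throughput in
// GB/s for inserting into and probing against an std::unordered_set.
int main() {
    const std::size_t n = 10'000'000;              // 10M 8-byte keys = 80 MB
    std::mt19937_64 rng(42);
    std::vector<uint64_t> keys(n);
    for (auto& k : keys) k = rng();

    std::unordered_set<uint64_t> set;
    set.reserve(n);

    auto t0 = std::chrono::steady_clock::now();
    for (uint64_t k : keys) set.insert(k);          // hashing phase
    auto t1 = std::chrono::steady_clock::now();

    std::size_t hits = 0;
    for (uint64_t k : keys) hits += set.count(k);   // probing phase
    auto t2 = std::chrono::steady_clock::now();

    const double gb = static_cast<double>(n) * sizeof(uint64_t) / 1e9;
    std::chrono::duration<double> hashSec = t1 - t0;
    std::chrono::duration<double> probeSec = t2 - t1;
    std::cout << "BW_hash  ~ " << gb / hashSec.count()  << " GB/s\n"
              << "BW_probe ~ " << gb / probeSec.count() << " GB/s (hits=" << hits << ")\n";
}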

$T_{tot} = T_{part} + T_{cyc}$   (7.44)

$T_{part} = \max(T_{send}, T_{recv}) + \frac{J_{hash}}{BW_{hash} \cdot n_{join}}$   (7.45)

$T_{send} = \max\left(\frac{N_{partR}}{n_{partR} \cdot BW_{net}}, \frac{N_{partS}}{n_{partS} \cdot BW_{net}}\right)$ if $n_{partOvl} = 0$   (7.46)

$T_{send} = \frac{N_{partR} / n_{partR} + N_{partS} / n_{partS}}{BW_{net}}$ if $n_{partOvl} > 0$   (7.47)

$T_{recv} = \frac{N_{partS} + N_{partR}}{n_{join} \cdot BW_{net}}$   (7.48)

$T_{cyc} = \max\left(\frac{N_{cyc}}{BW_{net} \cdot n_{join}}, \frac{J_{probe}}{BW_{probe} \cdot n_{join}}\right)$   (7.49)
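The decision encoded in Eq. 7.38-7.43 can be transcribed directly; the following C++ sketch mirrors the helper equations and the resulting hashing/probing volumes as reconstructed above (the function and parameter names are illustrative, with sizes in bytes and bandwidths in bytes per second).

// Sketch of Eq. 7.38-7.43: decide whether hashing S or hashing R yields the
// smaller in-memory processing time for the Cyclo Join.
struct CycloHashChoice {
    double jHash;   // bytes to be hashed in total   (Eq. 7.40 / 7.41)
    double jProbe;  // bytes to be probed in total   (Eq. 7.42 / 7.43)
};

CycloHashChoice chooseHashSide(double sSize, double rSize, double nJoin,
                               double bwHash, double bwProbe) {
    // Eq. 7.38: estimated time if S is hashed and R is probed against it.
    const double tHashS = sSize / bwHash + (rSize * nJoin) / bwProbe;
    // Eq. 7.39: estimated time if R is hashed and S is probed against it.
    const double tHashR = rSize / bwHash + (sSize * nJoin) / bwProbe;

    if (tHashS <= tHashR) {
        return {sSize, rSize * nJoin};   // hash S, probe with R
    }
    return {rSize, sSize * nJoin};       // hash R, probe with S
}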

7.4 Join Algorithm Comparison

We implemented the system model from the previous section in the statistical language R and prototypically implemented the join algorithms in RAMCloud in order to compare the different algorithms. Figure 7.1 illustrates a set of comparisons based on the parameters shown in Table 7.2. The comparisons reveal the execution times of a single join operation in a RAMCloud cluster with 16 nodes. Figures 7.1(a) and 7.1(c) are calculations based on the system model, Figures 7.1(b) and 7.1(d) are the respective experimental validations based on the prototypical implementations.


Figure 7.1: Four comparisons of the execution times (in seconds) for the different join algorithms (Grace Join, DBNLJ, Cyclo Join) with $n_{join}$ = 4, 8, and 16 nodes used for join execution. (a) Calculation with system model: $n_{partR/S}$ = 16 nodes, $n_{partOvl}$ = 16 nodes, $R_{size}$ = 1 GB, $S_{size}$ = 1 GB. (b) Experiment with implementation: same parameters as (a). (c) Calculation with system model: $n_{partR/S}$ = 16 nodes, $n_{partOvl}$ = 16 nodes, $R_{size}$ = 1 GB, $S_{size}$ = 0.1 GB. (d) Experiment with implementation: same parameters as (c). Figures 7.1(a) and 7.1(c) are calculations based on the system model, Figures 7.1(b) and 7.1(d) are the respective experiments based on the prototypical implementations.


Figure 7.2: Four comparisons of the network transfer and the data processed in memory for the different join algorithms (Grace Join, DBNLJ, Cyclo Join) with $n_{join}$ = 4, 8, and 16 nodes used for join execution. (a) $N_{tot}$ based on the system model for $n_{partR/S}$ = 16 nodes, $n_{partOvl}$ = 16 nodes, $R_{size}$ = 1 GB, $S_{size}$ = 1 GB. (b) $N_{tot}$ based on the system model for $R_{size}$ = 1 GB, $S_{size}$ = 0.1 GB. (c) $J_{tot}$ based on the system model for $R_{size}$ = 1 GB, $S_{size}$ = 1 GB. (d) $J_{tot}$ based on the system model for $R_{size}$ = 1 GB, $S_{size}$ = 0.1 GB. Figures 7.2(a)+7.2(b) depict the network transfer $N_{tot}$, Figures 7.2(c)+7.2(d) illustrate the data amount processed in memory $J_{tot}$.


The size of relation S is changed between the first and the second calculation and validation. Figures 7.1(a) and 7.1(b) indicate that the Grace Join is preferable when joining evenly sized relations. At $n_{join}$ = 16 nodes, the Grace Join produces an $N_{tot}$ of 1.8 GB, whereas the DBNLJ and the Cyclo Join each transfer 15 GB over the network for executing the join due to their replication of the complete relation S to all nodes. This also spikes the amount of data to be processed (e.g. $J_{tot}$ Grace Join = 4 GB vs. Cyclo Join = 17 GB) as the full relation S is traversed at every node (as shown in Figure 7.2). Figures 7.1(c) and 7.1(d) show that the Cyclo Join performs best when one relation is substantially smaller than the other one. In addition, a small relation S lets the Cyclo Join benefit from its high degree of parallelism without introducing a noteworthy penalty for the additional probing (as shown in Figures 7.2(c) and 7.2(d)). Concluding the comparison, the Grace Join and the Cyclo Join are the winners in the context of our chosen execution strategies when one join is executed at a time. We observe that choosing the right algorithm is heavily influenced by the sizes of the joined relations. With regards to the validation of the system model, we observe that the measurements are about 20-25% off in comparison to the system model. The reasons for that are, on the one hand, that our system model abstracts from resource conflicts and assumes a perfect distribution of network bandwidth among clients. On the other hand, our implementation in RAMCloud is not fully optimized, as we currently are working with dynamically growing data structures instead of estimating the data structure size and preallocating memory accordingly.

7.5 Parallel Join Executions

After comparing the execution of one join operation at a time, we now address the execution of many join operations in parallel via a simulation based on the previously created R model. We take the join operations from the Star Schema Benchmark [O'N] at a sizing factor of 100 (which has 600 million records in its main table) as workload, as they vary in to-be-joined relations, relation sizes, and selectivity (as shown in Tables 12.1 and 12.2 in the Appendix). Our goal is to execute this workload as fast as possible. The execution of the join operations in the workload happens sequentially (no change in the execution order), although join operations from different queries can run in parallel. Furthermore, we execute one join operation per RAMCloud node at a time. We introduce a set of heuristics that can decide at run-time how to parameterize the execution of a join operation. These parameters include the number of nodes to be used for executing the join ($n_{join}$) and the choice of the algorithm.


The heuristics are:

• Greedy Heuristic. The Greedy Heuristic uses all nodes in the cluster for every join. This results in a sequential execution of all join operations in the workload. For each join operation the fastest algorithm is determined via the system model (with $n_{join} = n$).

• Modest Heuristic. This heuristic uses one-fourth of the nodes in the cluster for every join. This results in up to four join executions running in parallel. For each join operation the fastest algorithm is determined via the system model (with $n_{join} = \frac{n}{4}$).

• Graceful Heuristic. The Graceful Heuristic monitors the current load of the cluster and takes half of the currently idling nodes (where idling is defined as not currently executing a join) for the join execution. For each join operation the optimal algorithm is determined via the system model (with $n_{join} = \frac{n_{idling}}{2}$).

• Smart Heuristic. This heuristic calculates for every join the to-be-used number of nodes and algorithm with the most efficient hardware utilization based on the cost model (minimal $\frac{T_{tot}}{n_{join}}$); a sketch of these selection rules is given below.
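As referenced above, the following C++ sketch illustrates how the four heuristics could pick the number of nodes for a single join; the cost-model callback and all names are hypothetical placeholders rather than actual AnalyticsDB interfaces.

#include <algorithm>
#include <cstddef>
#include <functional>

enum class Heuristic { Greedy, Modest, Graceful, Smart };

// Pick n_join for one join operation according to the chosen heuristic.
// estimateTtot stands in for the system model of this chapter.
std::size_t pickJoinNodes(Heuristic h, std::size_t clusterSize, std::size_t idlingNodes,
                          const std::function<double(std::size_t /*nJoin*/)>& estimateTtot) {
    switch (h) {
        case Heuristic::Greedy:   // use every node in the cluster for each join
            return clusterSize;
        case Heuristic::Modest:   // use one-fourth of the cluster per join
            return std::max<std::size_t>(1, clusterSize / 4);
        case Heuristic::Graceful: // use half of the currently idling nodes
            return std::max<std::size_t>(1, idlingNodes / 2);
        case Heuristic::Smart: {  // minimize Ttot / nJoin over all feasible nJoin
            std::size_t best = 1;
            double bestCost = estimateTtot(1);
            for (std::size_t n = 2; n <= clusterSize; ++n) {
                const double cost = estimateTtot(n) / static_cast<double>(n);
                if (cost < bestCost) { bestCost = cost; best = n; }
            }
            return best;
        }
    }
    return 1;
}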

In addition to the above heuristics we also introduce a set of different partitioning strategies:

• All Relations Uniform. All relations are uniformly partitioned across all nodes.

• All Relations Round-Robin. All relations are distributed in a round-robin manner over four nodes at a time.

• Small Relations Pinned - Large Relations Uniform. Small relations are pinned on one node, large relations are partitioned uniformly across the remaining nodes.

Figure 7.3 illustrates the resulting execution times, showing that the partitioning of the large relations across all nodes in the cluster is preferable over placing them on only a few nodes each. The choice of the partitioning criterion has potentially more impact than the choice of the heuristic. When comparing the heuristics, one can see that the Modest Heuristic always performs best while the Smart and the Graceful Heuristic perform worse.



Figure 7.3: Evaluation of join execution heuristics on a cluster with 32 nodes and hardware parameters shown in Table 7.2

Table 7.3: Distribution of join algorithms for the Star Schema Benchmark execution shown in Figure 7.3, in dependence of the chosen heuristic: how many of the 36 joins are executed with which algorithm.

            All Relations Uniform    All Relations Round-Robin    Small Relations Pinned, Large Uniform
            Grace  DBNLJ  Cyclo      Grace  DBNLJ  Cyclo          Grace  DBNLJ  Cyclo
Greedy      36     0      0          36     0      0              19     0      17
Modest      30     0      6          11     0      25             30     0      6
Graceful    36     0      0          36     0      0              36     0      0
Smart       32     0      4          17     0      19             34     0      2

This is due to executing each join very efficiently, but resulting in an overall bad cluster utilization. When looking at the chosen algorithms, Table 7.3 reports that the Grace Join and the Cyclo Join were chosen exclusively. Depending on the heuristic, either only the Grace Join has been used or the Grace Join in conjunction with the Cyclo Join. The winning Modest Heuristic always picked a mix of the Grace Join and the Cyclo Join. Concluding this section, the evaluation shows that a) it is preferable to partition the data across as many nodes as possible and b) to perform the join operations with a mix of the Grace and the Cyclo Join. Furthermore, it is preferable to allocate a fixed number of nodes for the execution of each join, as choosing the number of nodes based on efficient join execution or cluster load leads to underutilization of the cluster.

Part III

Evaluation


Chapter 8

Performance Evaluation

This chapter presents a performance evaluation to quantify the gap between query execution on local and remote main memory while considering the different operator execution strategies (data pull vs. operator push). Two different workloads are being used: an analytical workload consisting of the Star Schema Benchmark in Section 8.1 and a mixed workload based on point-of-sales customer data from a large European retailer in Section 8.2. The used hardware is the same as in the previous part. Each node has an Intel Xeon X3470 CPU, 24 GB DDR3 DRAM, and a Mellanox ConnectX-2 InfiniBand HCA network interface card, with the nodes connected via a 36-port Mellanox InfiniScale IV (4X QDR) switch.

8.1 Analytical Workload: Star Schema Benchmark

Figure 8.1 shows an AnalyticsDB operator breakdown for each query of the Star Schema Benchmark (SSB). AnalyticsDB runs on a single node, the RAMCloud cluster has 20 nodes. The execution on RAMCloud happens either via the data pull or the operator push strategy, and each AnalyticsDB column is either stored on one storage node (server span = 1) or partitioned across all nodes (server span = 20). The figure illustrates that the partitioning criterion has only very little impact (2.8%) on the data pull execution strategy and that data pull is on average 2.6 times slower than the execution on local DRAM. With a server span of one, the operator push execution strategy is on average 11% slower than the execution on local DRAM. With a server span of 20, the operator push execution strategy can be accelerated by a factor of 3.4. The figure does not show the execution times of the local AnalyticsDB operators such as Sort in detail, but summarizes them as Other Operators.


Figure 8.1: Operator breakdown (Scan, Materialize, Join Probing, and Other Operators) for AnalyticsDB executing Star Schema Benchmark queries with a data scale factor of 10 and different storage options and operator execution strategies: local DRAM, RAMCloud (RC) with data pull or operator push, each with a server span of 1 or 20. (a) Star Schema Benchmark queries 1.1-2.3. (b) Star Schema Benchmark queries 3.1-4.3. AnalyticsDB runs on a single node, the RAMCloud cluster has a size of 20 nodes. The figure illustrates that the data pull execution strategy is on average 2.6 times (or 260%) slower than the execution on local DRAM and that the operator push execution strategy is on average 11% slower than the execution on local DRAM.


Figure 8.2: Star Schema Benchmark execution time in seconds on a RAMCloud cluster with 20 nodes and a single node running AnalyticsDB with a varying Star Schema Benchmark data scale factor (SF), comparing local DRAM, data pull, and operator push with a server span of 1 or 20. (a) Sizing factor = 1. (b) Sizing factor = 10. (c) Sizing factor = 100 (no execution on local DRAM). The figure shows that the ratio between data set size and Star Schema Benchmark execution times remains constant with a growing data set size.

Figure 8.3: Star Schema Benchmark execution time over the experiment time for a RAMCloud cluster with a varying number of nodes and a single node running AnalyticsDB with the operator push execution strategy and a Star Schema Benchmark data scale factor of 10. In panel (a) RAMCloud nodes are successively added to the cluster (2nd to 10th node), in panel (b) they are successively removed again.

Figure 8.2 evaluates the impact a varying data set size has on the execution time by showing the execution of the SSB with a varying data scale factor SF. Scale factor 1 has a fact table with 6 million rows and a total data size of 600 MB, scale factor 10 has a fact table with 60 million rows and a total data set size of 6 GB, and scale factor 100 has 600 million rows in the fact table and a total data set size of 60 GB. The experiments with SF 100 could not be executed on local DRAM as the data set size exceeded the capacity of a single server.


Figure 8.2 illustrates that the ratio between the data set size and the SSB execution times of the different execution strategies remains constant even with a growing data set size and with a constant cluster size.

Throughout the previous experiments, we varied the number of nodes in the RAMCloud cluster and the resulting server span. In this subsection, we want to perform this variation not in separate experiment executions, but continuously while a single AnalyticsDB node is executing queries. Therefore, we designed and used a simplistic data migration manager which distributes the data equally across the available nodes: if a new node joins the RAMCloud cluster, it gets a chunk of the data; before a node is removed from the cluster, its contained data is distributed across the remaining nodes. The data distribution is done via a splitting of the RAMCloud namespaces and a subsequent migration of the data that is contained in a part of a namespace: the complexity and execution time of this mechanism benefit from an equal partitioning of all namespaces. Figure 8.3 illustrates the SSB execution time while RAMCloud nodes are being added to or removed from the cluster. With every added RAMCloud node, the overall storage capacity increases and the SSB execution time decreases as previously discussed. With every removed node, the overall storage capacity decreases and the SSB execution time increases.
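The rebalancing arithmetic behind such a simplistic migration manager can be sketched as follows in C++; the Range type and both functions are illustrative only and do not correspond to RAMCloud's actual namespace or tablet metadata.

#include <cstddef>
#include <vector>

// Illustrative key-range descriptor for one namespace: [begin, end).
struct Range { std::size_t begin; std::size_t end; };

// Split the key space of one namespace into equally sized chunks, one per node.
std::vector<Range> equalPartitions(std::size_t keySpaceSize, std::size_t nodeCount) {
    std::vector<Range> out;
    out.reserve(nodeCount);
    const std::size_t chunk = keySpaceSize / nodeCount;
    for (std::size_t i = 0; i < nodeCount; ++i) {
        const std::size_t begin = i * chunk;
        const std::size_t end = (i + 1 == nodeCount) ? keySpaceSize : begin + chunk;
        out.push_back({begin, end});
    }
    return out;
}

// On a cluster resize, recompute the target layout; the actual data movement
// (splitting a namespace and migrating parts of it) would then be driven by
// the difference between the old and the new layout.
std::vector<Range> targetLayoutAfterResize(std::size_t keySpaceSize, std::size_t newNodeCount) {
    return equalPartitions(keySpaceSize, newNodeCount);
}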

8.2 Mixed Workload: Point-Of-Sales Customer Data

This section also aims to quantify the gap between query execution on local and remote main memory while considering the different operator execution strategies (data pull vs. operator push). In contrast to the previous section, this is not done by taking a synthetic benchmark with generated data, but an excerpt of the point-of-sales data from one of the largest European retailers with over 5,000 branches. The excerpt includes about 62 million records, which is a data volume of 10.8 GB, and holds the point-of-sales data of some branches over the course of a month. Each record represents a single product being sold at the cash register in a single branch. The point-of-sales data itself is in a single fact table accompanied by a number of dimension tables which describe the different branches and products. Products are hierarchically grouped in four levels, where Product Group Level 1 contains for example all non-alcoholic drinks, Product Group Level 2 contains all soft drinks, Product Group Level 3 contains all energy drinks, and Product Level 4 contains the different actual products, where each variation of a product in size or flavor is a separate product. The workload itself consists of four analytical and one transactional query.


Figure 8.4: Operator breakdown (Scan, Materialize, Join Probing, GroupBy, and Other Operators) for executing the point-of-sales customer data mixed workload (analytical queries AQ1-AQ4 and transactional query TQ1) with different operator execution strategies: local DRAM, RAMCloud (RC) with data pull, and RC with operator push, each with a server span of 8. AnalyticsDB runs on a single node, the RAMCloud cluster has a size of 8 nodes. The figure illustrates i.a. that the data pull execution strategy is on average 2.2x slower than the execution on local DRAM and that the operator push execution strategy is on average 44% slower than the execution on local DRAM.

The analytical queries represent how a sales analyst operates on the data, while the transactional query covers a product being sold at a cash register. The following queries are used:

• AQ1: The first analytical query calculates the grouping of all sold items by Product Group Level 1, showing for example which percentage of the overall sales was achieved by non-alcoholic drinks.

• AQ2: This analytical query lists how often a single product has been sold for what price, analyzing the price elasticity of a product.

• AQ3: This query groups all sales by a chosen Product Group Level 2 and shows the respective revenues and quantities. This query has a low selectivity as it groups by a product group that includes many sold products.

• AQ4: Same as AQ3, but selecting a product group where only few products were sold, resulting in a high selectivity.

• TQ1: This query performs an insert for a sold product that was registered at the counter.

Figure 8.4 presents the operator breakdown for executing the customer data mixed workload with different operator execution strategies. AnalyticsDB runs on a single node, the RAMCloud cluster has a size of 8 nodes.

The outcome of the experiment is similar to the experiment presented in Section 8.1: for the queries AQ1-AQ4 the data pull execution strategy is on average 2.3x slower, the operator push execution strategy is on average 44% slower. It can be observed that in query AQ3 the operator push execution strategy is slower than the data pull strategy due to the low selectivity, whereas in query AQ4 the operator push execution strategy is even faster than an execution on local DRAM due to the parallelism in RAMCloud in combination with the high selectivity. The transactional query takes 4 µs on local DRAM and 12 µs with RAMCloud, regardless of the operator execution strategy.

Chapter 9

High-Availability Evaluation

As shown in Figure 8.3 in the previous chapter, the size of a RAMCloud cluster can be changed without an interruption of the query processing executed by AnalyticsDB. However, the dynamic resizing in Figure 8.3(b) is done via a purposeful revocation of a node, which gives RAMCloud the time to redistribute the data from the to-be-removed node before its actual revocation. This kind of awareness cannot be expected in the event of a hardware failure. For the scenario of an unexpected hardware failure, RAMCloud features a fast crash recovery mechanism [ORS+11] as explained in Section 3.3.

Figure 9.1: High-availability experiment with one AnalyticsDB and ten RAMCloud nodes, showing the Star Schema Benchmark execution time over the experiment time. Throughout the experiment RAMCloud nodes get killed and the impact on the query response time is observed.


AnalyticsDB can make use of the fast crash recovery feature in RAMCloud. To prove its applicability in the context of a database application, we conduct the following experiment: the experiment utilizes a total of 20 nodes where 10 nodes run the RAMCloud master service as well as the RAMCloud backup service. The number of replicas is set to three. The remaining 10 nodes run AnalyticsDB which consecutively executes the Star Schema Benchmark suite to create a base load on the cluster. The Star Schema Benchmark data set is sized at factor 10. In this experiment, each AnalyticsDB node permanently executes the Star Schema Benchmark. At a sizing factor of 10, the average runtime is about 8.2 seconds. As shown in Figure 9.1, after about 60 seconds, one RAMCloud node is killed, decreasing the total RAMCloud node count to 9 running nodes within the cluster. When a RAMCloud node is killed, its data is restored from the backups to the remaining servers. The RAMCloud data recovery process itself takes between 0.2 and 0.5 seconds. At an SSB sizing factor of 10 the overall dataset is about 6 GB, resulting in about 600 MB of data per node at the initial setup of 10 RAMCloud nodes. It is possible that an AnalyticsDB node is connected to the RAMCloud node that is being killed and executes an operator. Here, the AnalyticsDB node is not notified in any way of the crashed RAMCloud node. For that case, AnalyticsDB has an RPC timeout of 1 second after which the respective RPC is issued again. This RPC timeout also explains the execution time bumps in Figure 9.1, which are greater than just the time needed for the RAMCloud recovery process. AnalyticsDB is not aware of the fact that a node of its storage system was killed; it just notices the timed-out RPC and restarts it. Throughout the experiment in Figure 9.1, the execution time of the Star Schema Benchmark increases as the capacity of the RAMCloud cluster constantly decreases.
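The retry behaviour described above can be sketched as follows; the two callbacks are hypothetical placeholders for issuing and polling an operator RPC and are not part of the RAMCloud API.

#include <chrono>
#include <functional>
#include <thread>

// Sketch of the timeout/retry logic: an operator RPC that has not completed
// within the timeout is simply issued again, so a crashed RAMCloud node only
// delays the query by roughly the timeout plus the recovery time.
bool executeWithRetry(const std::function<void()>& issueRpc,
                      const std::function<bool()>& pollCompletion,
                      std::chrono::milliseconds timeout = std::chrono::seconds(1)) {
    for (;;) {
        issueRpc();
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (std::chrono::steady_clock::now() < deadline) {
            if (pollCompletion()) return true;   // RPC answered in time
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // Timeout hit: the target node may have crashed; reissue the RPC so
        // that it is served from the recovered data.
    }
}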

Chapter 10

Elasticity Evaluation

In the previous experiments in our evaluation, we varied either the number of RAMCloud nodes or the load, by changing the number of AnalyticsDB nodes. In this chapter we want to put the pieces together by maintaining a constant query execution time through resizing the RAMCloud cluster online under a changing load, which is represented by a varying number of AnalyticsDB nodes executing the Star Schema Benchmark. The experiments in this chapter define an upper and a lower limit for the average Star Schema Benchmark execution time of 30 and 20 seconds, respectively. If the load is increased by adding AnalyticsDB nodes and subsequently the execution time goes above the upper limit, then new RAMCloud nodes are added to the cluster until the average execution time is back within the boundaries. The same approach is used when AnalyticsDB nodes are being removed and the execution time drops below the lower limit, resulting in the removal of RAMCloud nodes. It is common to take differently shaped workloads for evaluating the elasticity of a system [ILFL12, CST+10]: we use three different workloads, namely a sinus-shaped, a plateau-shaped, and an exponential workload. We added a simplistic load manager to the system (as the RAMCloud project does not feature such a component as of now), which checks at configurable intervals the average processing time of the Star Schema Benchmark and adds or removes RAMCloud nodes accordingly.
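A minimal sketch of such a reactive load manager loop is shown below; the three callbacks and the default limits are illustrative placeholders, since the actual monitoring and cluster management hooks are not part of RAMCloud itself.

#include <chrono>
#include <functional>
#include <thread>

// Reactive load manager: sample the average benchmark runtime at a
// configurable interval and add or remove one RAMCloud node at a time to keep
// the runtime between the upper and the lower limit.
void loadManagerLoop(const std::function<double()>& averageRuntimeSeconds,
                     const std::function<void()>& addRamcloudNode,
                     const std::function<void()>& removeRamcloudNode,
                     double upperLimit = 30.0, double lowerLimit = 20.0,
                     std::chrono::seconds interval = std::chrono::seconds(10)) {
    for (;;) {
        const double runtime = averageRuntimeSeconds();
        if (runtime > upperLimit) {
            addRamcloudNode();        // under-provisioned: scale out by one node
        } else if (runtime < lowerLimit) {
            removeRamcloudNode();     // over-provisioned: scale in by one node
        }
        std::this_thread::sleep_for(interval);
    }
}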

Sinus-Shaped Workload

The first experiment we present executes a sinus-shaped workload. In the beginning, only one AnalyticsDB node executes the SSB benchmark; after a while a second node, then three more nodes, and finally five nodes are added. The experiment's highest load counts ten AnalyticsDB nodes in total.

Following the sinus shape, after a while, five nodes, then three nodes, and finally one node are removed to lower the load back to the initial load. Once the load manager detects a breach of the upper bound, it will check another 90 times before acting. The delay of the lower bound is set to 30 seconds, while the delay for the upper bound breach is three times higher than for the lower bound. This can be useful to avoid provisioning resources in case of runaways. On the other hand, a shorter delay helps the load manager to provision new resources faster and therefore react more elastically. Figure 10.1 shows a sinus-shaped workload pattern in which RAMCloud scales out and later back in as the load is decreasing. The delay between breaching the upper bound and starting the second RAMCloud node is obvious. Since the load manager is a simple reactive manager, it only starts one RAMCloud node at a time, leading to a period of slow mitigation from the moment of high load at about 500 seconds from the experiment's beginning. At about 700 seconds, the SSB runtime is back within its normal boundaries. With lowering the load, the system reacts faster with de-provisioning of nodes.

Plateau-Shaped Workload

The second experiment is depicted in Figure 10.2. The upper and lower boundary for the SSB runtime are set to the same values (30 seconds upper and 20 seconds lower) as in the first experiment. One RAMCloud node is started at the beginning of the experiment. The cluster load is ramped up by starting an AnalyticsDB node every 60 seconds. In this experiment, the load manager delays acting upon a boundary violation by 300 milliseconds. Because of the instantaneous under-provisioning situation, the load manager starts seven more RAMCloud nodes to bring the SSB execution time into the specified boundaries. The time period the cluster runs on high load is called plateau time. In this experiment, it is 600 seconds long. At about 1100 seconds, the plateau time is over. The benchmark framework stops 9 out of 10 AnalyticsDB nodes, reducing the cluster load to one tenth of the plateau load. This leads to an immense over-provisioning situation. The runtime of the SSB benchmark drops to under 10 seconds per SSB run. The load manager acts immediately by stopping 7 of the 8 running nodes to bring the runtime back into its boundaries. The SSB runtime stays within the boundaries until the end of the experiment.

Exponential Workload

In the third experiment, we probe an exponential workload. As shown in Figure 10.3, the experiment starts with one AnalyticsDB node. After the second node has been added, first two and then four nodes are added with a delay of 120 seconds.

It is clearly visible that the system needs much more time to normalize the SSB runtime when four nodes are added to the system. This has two reasons: first, the load manager only adds one node at a time. With a more efficient load manager, it could be possible to start more RAMCloud nodes the more the load has increased. This would bring the execution time within the specified boundaries more quickly, but also bears the risk of oversteering. Second, the higher the load in the system, the longer it takes to migrate data between the nodes to distribute the load among more nodes. When the load decreases, the system corrects the over-provisioning situation by stopping RAMCloud nodes. The experiments in this chapter show that a) the architecture can adapt to workload changes of different orders within a short period of time (seconds), b) the adaptation does not interrupt the ongoing query processing, and c) the resulting elasticity allows the compliance with a performance goal without any adjustments from a DBMS perspective and without any manual intervention from a database administrator.

Figure 10.1: Elasticity evaluation with a sinus-shaped workload. The number of the nodes in the RAMCloud cluster varies depending on the changing workload imposed by a changing number of AnalyticsDB nodes. Each AnalyticsDB node executes the Star Schema Benchmark at a sizing factor of ten, the execution strategy is operator push.

Figure 10.2: Elasticity evaluation with a plateau-shaped workload. The number of the nodes in the RAMCloud cluster varies depending on the changing workload imposed by a changing number of AnalyticsDB nodes. Each AnalyticsDB node executes the Star Schema Benchmark at a sizing factor of ten, the execution strategy is operator push.

Figure 10.3: Elasticity evaluation with a quadratically increasing and decreasing workload. The number of the nodes in the RAMCloud cluster varies in dependence of a changing workload imposed by a changing number of AnalyticsDB nodes. Each AnalyticsDB node executes the Star Schema Benchmark at a sizing factor of ten, the execution strategy is operator push.

Part IV

Conclusions and Future Work


Chapter 11

Conclusions

Current state-of-the-art parallel main memory DBMSs are designed according to the principles of a shared-nothing architecture, driven by the intention of minimizing network traffic and thereby preserving the main memory performance advantage (as discussed in Chapter 2). The advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible. A modern storage system such as RAMCloud keeps all data resident in main memory, provides durability and high-availability, and is elastic: exploiting these characteristics in the context of a database management system is desirable. Nowadays, provisioning of information technology infrastructure over the Internet, including in-memory computing, allows the service providers to leverage the economies of scale and offer their services at an unbeatable price point. Being able to utilize hosted main memory-based storage for operating a database system makes in-memory data management even more economically viable. Consequently, this work describes building a columnar database on shared main memory-based storage. The DBMS that is used throughout the thesis is AnalyticsDB, which features dictionary compression and a column-at-a-time execution model, applies the pattern of late materialization, and is optimized for read-mostly and mixed workloads. Several instances of AnalyticsDB share common access to main memory-based storage provided by RAMCloud, in combination with an RDMA-enabled network. Part I of this thesis describes that AnalyticsDB features an encapsulation of data access and operator execution via an interface which allows seamlessly switching between local and remote main memory. Since RAMCloud provides not only storage capacity but also processing power, this allows for pushing down the execution of database operators into the storage system. Part I also shows that the data of a column-oriented DBMS can be persisted in a hash table while at the same time maintaining the fast scan speed when working on column-oriented data, by chopping up a column into small blocks and storing each block as a key-value pair.

We demonstrated that with a tuple size of 8 bytes a block size of 1000 tuples is sufficient to achieve the same scan speed as with a completely sequential placement of all tuples. Part II tackles the problem of placing the execution of database operators. The presented system model allows the estimation of operator execution costs in terms of network transfer, data processed in memory, and wall time. This can be used for database operators that work on one relation at a time, such as a scan or materialize operation, to discuss the site selection problem (data pull vs. operator push). Since a database query translates to the execution of several database operators, it is possible that the optimal site selection varies per operator. For the execution of a database operator that works on two (or more) relations at a time, such as a join, the system model is enriched by additional factors such as the chosen algorithm (e.g. Grace vs. DBNL vs. Cyclo Join), the data partitioning of the respective relations and their overlapping, as well as the allowed resource allocation. Part III presents an evaluation of AnalyticsDB on RAMCloud on a cluster with 60 nodes where all nodes are connected via RDMA-enabled network equipment. We show that query processing performance is about 2.4x slower if everything is done via the data pull operator execution strategy (i.e. RAMCloud is being used only for data access), and about 27% slower if operator execution is also supported inside RAMCloud (in comparison to operating only on main memory inside a server without any network communication). The fast crash recovery feature of RAMCloud can be leveraged for providing high-availability (e.g. a server crash during query execution only delays the query response for about one second). Our solution is elastic in a way that it adapts to changing workloads a) within seconds, b) without interruption of the ongoing query processing, and c) without manual intervention. A closing remark: this work comes to the conclusion that deploying a parallel main memory-based DBMS on a storage system such as RAMCloud allows leveraging the features of the storage system, but incorporates a performance penalty in comparison to operating on local main memory. Is this a beneficial trade-off? We believe yes: a) When operating a system at scale, the scalability and elasticity of the overall system are more important than the last percentage points of performance. b) This approach enables the main memory DBMS to neglect the aspect of data durability, which currently is a complex aspect of a main memory DBMS. c) The comparison to operating on local main memory for quantifying the performance penalties references the best-case scenario, which is not achievable in any parallel DBMS architecture.

Chapter 12

Future Work

The work presented in this thesis can be seen as a foundation for future research in the area of creating a parallel main memory DBMS on top of a main memory-based storage system. The following aspects are of interest:

• Implications on the DBMS query optimizer: in this work, the DBMS (AnalyticsDB) received a query plan and executed it as is; the only variation with regards to operating on a shared main memory-based storage is deciding on the execution strategy of each individual operator. In a next step, it might be beneficial to include the information of remotely located data while creating and optimizing the query plan.

• Limitations of the independent scale-out of query processing and storage capacities: the experiments in Part III cover the independent scale-out of query processing and storage capacities. However, they do not explore the limitations of this separation or come up with a guideline on which ratios of query processing and storage capacities can be considered reasonable.

• Limitations of scaling out while using the operator push execution strategy: pushing down the execution of database operators into a storage system creates an additional, potentially unevenly balanced load. This promotes the creation of bottlenecks and resource contention in the storage system. Specifying the interdependencies of this additional load and the goal of scaling out gracefully is a remaining challenge.

• Comparison with non-volatile memory: a main memory-based storage system such as RAMCloud provides data durability while operating on volatile DRAM. With the impending advent of non-volatile main memory modules, it is not clear from a DBMS perspective which approach is preferable in terms of performance, high-availability, and total cost of ownership.


Appendix

Table 12.1: Star Schema Benchmark relations involved in join operations. One record has a size of 8 bytes.

Relation Name          ID    Number of Records in SF1
LineOrder::Custkey     R1    6000000
LineOrder::Orderdate   R2    6000000
LineOrder::Partkey     R3    6000000
LineOrder::Suppkey     R4    6000000
Customer::Custkey      S1    150000
Date::Datekey          S2    2556
Part::Partkey          S3    200000
Supplier::Suppkey      S4    10000


Table 12.2: Star Schema Benchmark join operations.

Join ID   Query       R ID   R Selectivity   S ID   S Selectivity
J1        Query 1.1   R2     0.136           S2     0.142
J2        Query 1.2   R2     0.054           S2     0.011
J3        Query 1.3   R2     0.027           S2     0.002
J4        Query 2.1   R3     1               S3     0.04
J5        Query 2.1   R4     0.04            S4     0.2
J6        Query 2.1   R2     0.008           S2     1
J7        Query 2.2   R3     1               S3     0.008
J8        Query 2.2   R4     0.008           S4     0.2
J9        Query 2.2   R2     0.0016          S2     1
J10       Query 2.3   R3     1               S3     0.001
J11       Query 2.3   R4     0.001           S4     0.2
J12       Query 2.3   R2     0.0002          S2     1
J13       Query 3.1   R1     1               S1     0.2
J14       Query 3.1   R4     0.2             S4     0.2
J15       Query 3.1   R2     0.04            S2     1
J16       Query 3.2   R1     1               S1     0.04
J17       Query 3.2   R4     0.04            S4     0.04
J18       Query 3.2   R2     0.0016          S2     0.86
J19       Query 3.3   R1     1               S1     0.008
J20       Query 3.3   R4     0.008           S4     0.008
J21       Query 3.3   R2     0.00006         S2     0.86
J22       Query 3.4   R1     1               S1     0.008
J23       Query 3.4   R4     0.008           S4     0.008
J24       Query 3.4   R2     0.00006         S2     0.012
J25       Query 4.1   R1     1               S1     0.2
J26       Query 4.1   R4     0.2             S4     0.2
J27       Query 4.1   R3     0.04            S3     0.4
J28       Query 4.1   R2     0.016           S2     1
J29       Query 4.2   R1     1               S1     0.2
J30       Query 4.2   R4     0.2             S4     0.2
J31       Query 4.2   R3     0.04            S3     0.4
J32       Query 4.2   R2     0.016           S2     0.285
J33       Query 4.3   R3     1               S3     0.008
J34       Query 4.3   R4     0.08            S4     0.04
J35       Query 4.3   R1     0.00003         S1     0.2
J36       Query 4.3   R2     0.00006         S2     0.285

Abbreviations and Glossary

AnalyticsDB is a prototypical in-memory DBMS written in C++ that can seam- lessly switch between local and remote main memory.

Cloud Computing is a paradigm that describes the provisioning of information technology infrastructure, services, and applications over the Internet.

Cloud Storage System manages and persists large amounts of data created and consumed by cloud computing applications.

Column-Oriented Data Layout stores all the instances of the same attribute type from different tuples physically together.

CPU Central Processing Unit

Database is a collection of related data [EN10].

Database Management System (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications [EN10].

Database Operator evaluates a condition on a set of tuples.

Database-Aware Storage System directly executes database operators inside the storage system [RGF98, Kee99, SBADAD05]. This approach is based on the idea of active storage/active disks/intelligent disks where the computational power inside the storage device is being used for moving computation closer to the data [AUS98, KPH98].

DBMS Database Management System


Distributed Database is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DBMS) is then defined as the software system that permits the management of the distributed database system and makes the distribution transparent to the users [ÖV11].

Distributed Filesystem allows users of physically distributed computers to share data and storage resources by using a common file system [LS90].

DRAM Dynamic Random-Access Memory

Elasticity is the ability to deal with load variations by adding more resources during high load or consolidating tenants to fewer nodes when the load decreases, all in a live system without service disruption. Elasticity is critical to minimize the operating cost while ensuring good performance during high loads. It allows consolidating the system to consume fewer resources and thus minimize the operating cost during periods of low load, while allowing it to dynamically scale up its size as the load increases [AEADE11].

ETL Extract, Transform, and Load

IMDB In-Memory Database

In-Memory Database (IMDB) system, also referred to as main memory database system, is a database system where data resides permanently in main physical memory [GMS92].

InfiniBand is a switched fabric computer network communications link.

Mixed Workload is a database workload that includes transactional as well as analytical queries [Pla09].

MMDB Main Memory Database, see In-Memory Database

NoSQL is most commonly referred to as “not only SQL”. This term does not reject the query language SQL, but rather expresses that the design of relational database management systems is unsuitable for large-scale cloud applications [Bur10].

OLAP Online Analytical Processing

OLTP Online Transaction Processing


Parallel Database Management System (DBMS) is a revision and extension of a distributed database management system that exploits the parallel nature of an underlying computing system in order to accelerate query execution [DG92].

RAMCloud is a storage system from Stanford University where data is kept entirely in DRAM [OAE+11].

RDMA Remote Direct Memory Access

Remote Direct Memory Access (RDMA) enables the network interface card to transfer data directly into the main memory, which bypasses the operating system by eliminating the need to copy data into the data buffers of the operating system (also known as zero-copy networking). In addition, transferring data via RDMA can be done without invoking the CPU [Mel13].

Row-Oriented Data Layout stores all attributes of a tuple physically together.

Scalability is a desirable property of a system, which indicates its ability to either handle growing amounts of work in a graceful manner or its ability to improve throughput when additional resources (typically hardware) are added. A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system [AEADE11].

Shared-memory (or shared-everything) is an architectural approach for a paral- lel database management system where all processors share direct access to any main memory module and to all disks over an interconnection [ÖV11].

Shared-nothing is an architectural approach for a parallel database management system where each memory and disk is owned by some processor which acts as a server for that data [ÖV11].

Shared-storage (or shared-disk or shared-data) is an architectural approach for a parallel database management system where processors each have their own memory, but they share access to a single collection of storage devices such as a hard-disk [ÖV11].

Site Selection in the context of client-server computing describes the decision whether to execute a query at the client machine at which the query was initiated or at the server machines that store the relevant data. In other words, the question is whether to move the query to the data (execution at servers) or to move the data to the query (execution at clients) [Kos00].


SQL Structured (English) Query Language [CB74]

SSB Star Schema Benchmark

Star Schema Benchmark (SSB) is an analytical database benchmark [O’N].

Switch Fabric Communication describes a network topology where a) each network node connects with each other node via one or more switches, and b) the connection between two nodes is established based on the crossbar switch theory [Mat01], resulting in no resource conflicts with connections between any other nodes at the same time [GS02].

Bibliography

[ACC+10] Daniel J. Abadi, Michael Carey, Surajit Chaudhuri, Hector Garcia-Molina, Jignesh M. Patel, and Raghu Ramakrishnan. let's. Proc. VLDB Endow., 3(1-2):1657–1657, September 2010.

[ADHW99] Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and David A. Wood. Dbmss on a modern processor: Where does time go? In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 266–277. Morgan Kaufmann, 1999.

[AEADE11] Divyakant Agrawal, Amr El Abbadi, Sudipto Das, and Aaron J. Elmore. Database scalability, elasticity, and autonomy in the cloud. In Proceedings of the 16th International Conference on Database Systems for Advanced Applications - Volume Part I, DASFAA'11, pages 2–15, Berlin, Heidelberg, 2011. Springer-Verlag.

[AHK85] Arthur C. Ammann, Maria Hanrahan, and Ravi Krishnamurthy. Design of a memory resident dbms. In COMPCON, pages 54–58. IEEE Computer Society, 1985.

[Amd67] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS ’67 (Spring), pages 483–485, New York, NY, USA, 1967. ACM.

[AMD13] AMD - Global Provider of Innovative Graphics, Processors and Media Solutions. http://www.amd.com, Last checked on July 23rd 2013.

[AMDM07] Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel Madden. Materialization Strategies in a Column-Oriented DBMS.


In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, pages 466–475. IEEE, 2007.

[AMF06] Daniel Abadi, Samuel Madden, and Miguel Ferreira. Integrating compression and execution in column-oriented database systems. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, SIGMOD ’06, pages 671–682, New York, NY, USA, 2006. ACM.

[AMH08] Daniel J. Abadi, Samuel R. Madden, and Nabil Hachem. Column-stores vs. row-stores: how different are they really? In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD '08, pages 967–980, New York, NY, USA, 2008. ACM.

[Ass13] InfiniBand Trade Association. The infiniband architecture. http://www.infinibandta.org/content/pages.php?pg=technology_download, Last checked on July 23rd 2013.

[AUS98] Anurag Acharya, Mustafa Uysal, and Joel Saltz. Active disks: programming model, algorithms and evaluation. In Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, ASPLOS VIII, pages 81–91, New York, NY, USA, 1998. ACM.

[BCV91] Björn Bergsten, Michel Couprie, and Patrick Valduriez. Prototyping dbs3, a shared-memory parallel database system. In Proceedings of the First International Conference on Parallel and Distributed Information Systems (PDIS 1991),pages226–234,1991.

[BEH+10] Dominic Battré, Stephan Ewen, Fabian Hueske, Odej Kao, Volker Markl, and Daniel Warneke. Nephele/pacts: A programming model and execution framework for web-scale analytical processing. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10, pages 119–130, New York, NY, USA, 2010. ACM.

[BFG+08] Matthias Brantner, Daniela Florescu, David A. Graf, Donald Kossmann, and Tim Kraska. Building a database on S3. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, pages 251–264. ACM, 2008.


[BGK96] Doug Burger, James R. Goodman, and Alain Kägi. Memory bandwidth limitations of future microprocessors. In Proceedings of the 23rd annual international symposium on Computer architecture, ISCA '96, pages 78–89, New York, NY, USA, 1996. ACM.

[BGvK+06] Peter Boncz, Torsten Grust, Maurice van Keulen, Stefan Manegold, Jan Rittinger, and Jens Teubner. Monetdb/xquery: a fast xquery processor powered by a relational engine. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, SIGMOD ’06, pages 479–490, New York, NY, USA, 2006. ACM.

[BKM08a] Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. Breaking the memory wall in monetdb. Commun. ACM, 51(12):77–85, December 2008.

[BKM08b] Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. Breaking the memory wall in MonetDB. Commun. ACM, 51(12):77–85, 2008.

[BLP11] Spyros Blanas, Yinan Li, and Jignesh M. Patel. Design and evalu- ation of main memory hash join algorithms for multi-core cpus. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, SIGMOD ’11, pages 37–48, New York, NY, USA, 2011. ACM.

[BMK99] Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stan- ley B. Zdonik, and Michael L. Brodie, editors, VLDB’99, Proceed- ings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 54–65. Mor- gan Kaufmann, 1999.

[BN97] Philip Bernstein and Eric Newcomer. Principles of transaction processing: for the systems professional. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.

[Bon02] P. A. Boncz. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications. PhD thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands, May 2002.

[BPASP11] Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, and Erik Paulson. Efficient processing of data warehousing queries in a split execution environment. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pages 1165–1176, New York, NY, USA, 2011. ACM.

[Bre00] Eric A. Brewer. Towards robust distributed systems (abstract). In Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’00, pages 7–, New York, NY, USA, 2000. ACM.

[Bre12] Eric Brewer. Cap twelve years later: How the "rules" have changed. Computer, 45(2):23–29, 2012.

[BS13] Zoltan Böszörmenyi and Hans-Jürgen Schönig. PostgreSQL Replication. Packt Publishing, 2013.

[BTAÖ13] Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. Tamer Özsu. Main-memory hash joins on multi-core cpus: Tuning to the underlying hardware. In Proceedings of the 29th Int’l Conference on Data Engineering (ICDE), Brisbane, Australia, April 2013.

[Bur10] Greg Burd. NoSQL. login magazine, 36(5), 2010.

[CAK+81] D. D. Chamberlin, M. M. Astrahan, W. F. King, R. A. Lorie, J. W. Mehl, T. G. Price, M. Schkolnick, P. Griffiths Selinger, D. R. Slutz, B. W. Wade, and R. A. Yost. Support for repetitive transactions and ad hoc queries in system r. ACM Trans. Database Syst., 6(1):70–94, March 1981.

[Cat11] Rick Cattell. Scalable sql and nosql data stores. SIGMOD Rec., 39(4):12–27, May 2011.

[CB74] Donald D. Chamberlin and Raymond F. Boyce. Sequel: A structured english query language. In Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control, pages 249–264, New York, NY, USA, 1974. ACM.

[CC11] Tom Coffing and Leona Coffing. Tera-Tom on Teradata SQL V12/V13. Coffing Publishing, 2011.

[CCS93] E. F. Codd, S. B. Codd, and C. T. Salley. Providing olap (on-line analytical processing) to user-analysts: An it mandate. Technical report, E. F. Codd Associates, 1993.

[CDG+06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7, OSDI ’06, pages 15–15, Berkeley, CA, USA, 2006. USENIX Association.

[CG94] Richard L. Cole and Goetz Graefe. Optimization of dynamic query evaluation plans. In Proceedings of the 1994 ACM SIGMOD international conference on Management of data, SIGMOD ’94, pages 150–160, New York, NY, USA, 1994. ACM.

[CK85] George P. Copeland and Setrag N. Khoshafian. A decomposition storage model. In Proceedings of the 1985 ACM SIGMOD international conference on Management of data, SIGMOD ’85, pages 268–279, New York, NY, USA, 1985. ACM.

[CKP+93] David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. Logp: Towards a realistic model of parallel computation. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP ’93, pages 1–12, New York, NY, USA, 1993. ACM.

[CL86] Michael J. Carey and Hongjun Lu. Load balancing in a locally distributed db system. In Proceedings of the 1986 ACM SIGMOD international conference on Management of data, SIGMOD ’86, pages 108–119, New York, NY, USA, 1986. ACM.

[Cod70] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970.

[Cor13] Microsoft Corporation. Microsoft sql server-hadoop connector user guide. http://www.microsoft.com/en-us/download/details.aspx?id=27584, Last checked on July 23rd 2013.

[CST+10] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with ycsb. In SoCC ’10: Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154, 2010.

[DF06] Alex Davies and Harrison Fisk. MySQL Clustering. MySQL Press, 2006.

[DFI+13] Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. Hekaton: Sql server’s memory-optimized oltp engine. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 1243–1254, New York, NY, USA, 2013. ACM.

[DG92] David DeWitt and Jim Gray. Parallel database systems: the future of high performance database systems. Commun. ACM, 35(6):85–98, June 1992.

[DG08] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.

[DGS+90] D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. I. Hsiao, and R. Rasmussen. The gamma database machine project. IEEE Trans. on Knowl. and Data Eng., 2(1):44–62, March 1990.

[DHJ+07] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s highly available key-value store. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP ’07, pages 205–220, New York, NY, USA, 2007. ACM.

[DHN+13] David J. DeWitt, Alan Halverson, Rimma Nehme, Srinath Shankar, Josep Aguilar-Saborit, Artin Avanes, Miro Flasza, and Jim Gramling. Split query processing in polybase. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 1255–1266, New York, NY, USA, 2013. ACM.

[DMS13] D. J. DeWitt, S. Madden, and M. Stonebraker. How to Build a High-Performance Data Warehouse. http://db.lcs.mit.edu/madden/high_perf.pdf, Last checked on July 23rd 2013.

[Doc13] Microsoft SQL Server 2012 Documentation. xVelocity in SQL Server 2012. http://msdn.microsoft.com/en-us/library/hh922900.aspx, Last checked on July 23rd 2013.

[EN10] Ramez A. Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 6th edition, 2010.

[ERAEB05] Hesham El-Rewini and Mostafa Abd-El-Barr. Advanced Computer Architecture and Parallel Processing (Wiley Series on Parallel and Distributed Computing). Wiley-Interscience, 2005.

[FCP+12] Franz Färber, Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, and Wolfgang Lehner. Sap hana database: data management for modern business applications. SIGMOD Rec., 40(4):45–51, January 2012.

[FGKT10] Philip Werner Frey, Romulo Goncalves, Martin L. Kersten, and Jens Teubner. A spinning join that does not get dizzy. In 2010 International Conference on Distributed Computing Systems, ICDCS 2010, Genova, Italy, June 21-25, 2010, pages 283–292, 2010.

[Fit04] Brad Fitzpatrick. Distributed caching with memcached. Linux J., 2004(124):5–, August 2004.

[FJK96] Michael J. Franklin, Björn Thór Jónsson, and Donald Kossmann. Performance tradeoffs for client-server query processing. In Proceedings of the 1996 ACM SIGMOD international conference on Management of data, SIGMOD ’96, pages 149–160, New York, NY, USA, 1996. ACM.

[FML+12] Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, and Jonathan Dees. The SAP HANA Database – An Architecture Overview. IEEE Data Eng. Bull., 35(1):28–33, 2012.

[GCMK+12] Martin Grund, Philippe Cudré-Mauroux, Jens Krüger, Samuel Madden, and Hasso Plattner. An overview of hyrise - a main memory hybrid storage engine. IEEE Data Eng. Bull., 35(1):52–57, 2012.

[GGL03] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, SOSP ’03, pages 29–43, New York, NY, USA, 2003. ACM.

[GGS96] Sumit Ganguly, Akshay Goel, and Avi Silberschatz. Efficient and accurate cost models for parallel query optimization (extended abstract). In Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, PODS ’96, pages 172–181, New York, NY, USA, 1996. ACM.

[GIG13] GIGABYTE - Motherboard, Graphics Card, Notebook, Server and More. http://www.gigabyte.us, Last checked on July 23rd 2013.

[GK85] Dieter Gawlick and David Kinkade. Varieties of concurrency control in ims/vs fast path. IEEE Database Eng. Bull., 8(2):3–10, 1985.

[GK10] Romulo Goncalves and Martin L. Kersten. The data cyclotron query processing scheme. In EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings, pages 75–86, 2010.

[GK11] Romulo Goncalves and Martin Kersten. The data cyclotron query processing scheme. ACM Trans. Database Syst., 36(4):27:1–27:35, December 2011.

[GKP+10] Martin Grund, Jens Krüger, Hasso Plattner, Philippe Cudre-Mauroux, and Samuel Madden. Hyrise: a main memory hybrid storage engine. Proc. VLDB Endow., 4(2):105–116, November 2010.

[GMS92] H. Garcia-Molina and K. Salem. Main memory database systems: An overview. IEEE Trans. on Knowl. and Data Eng., 4(6):509–516, December 1992.

[GMUW08] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Prentice Hall Press, Upper Saddle River, NJ, USA, 2nd edition, 2008.

[Gra94a] G. Graefe. Volcano - an extensible and parallel query evaluation system. IEEE Trans. on Knowl. and Data Eng., 6(1):120–135, February 1994.

[Gra94b] Goetz Graefe. Volcano - an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng., 6(1):120–135, 1994.

[GS02] Meeta Gupta and C. Anita Sastry. Storage Area Network Fundamentals. Cisco Press, 2002.

[HF86] Robert Brian Hagmann and Domenico Ferrari. Performance analysis of several back-end database architectures. ACM Trans. Database Syst., 11(1):1–26, March 1986.

[Hil90] Mark D. Hill. What is scalability? SIGARCH Comput. Archit. News, 18(4):18–21, December 1990.

[HLM+11] Theo Härder, Wolfgang Lehner, Bernhard Mitschang, Harald Schöning, and Holger Schwarz, editors. Datenbanksysteme für Business, Technologie und Web (BTW), 14. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2.-4.3.2011 in Kaiserslautern, Germany, volume 180 of LNI. GI, 2011.

[HM95] Waqar Hasan and Rajeev Motwani. Coloring away communication in parallel query optimization. In Proceedings of the 21th International Conference on Very Large Data Bases, VLDB ’95, pages 239–250, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.

[Hog13] Mike Hogan. Shared-Disk vs. Shared-Nothing - Comparing Architectures for Clustered Databases. http://www.scaledb.com/pdfs/WP_SDvSN.pdf, Last checked on July 23rd 2013.

[HP11] John L. Hennessy and David A. Patterson. Computer Architecture, Fifth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 5th edition, 2011.

[HS13] Donald J. Haderle and Cynthia M. Saracco. The history and growth of ibm’s db2. IEEE Annals of the History of Computing, 35(2):54–66, 2013.

[HSW+04] Larry Huston, Rahul Sukthankar, Rajiv Wickremesinghe, M. Satyanarayanan, Gregory R. Ganger, Erik Riedel, and Anastassia Ailamaki. Diamond: A storage architecture for early discard in interactive search. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies, FAST ’04, pages 73–86, Berkeley, CA, USA, 2004. USENIX Association.

[IBM13] IBM. DB2 pureScale. http://www.ibm.com/software/data/db2/linux-unix-windows/purescale/, Last checked on July 23rd 2013.

[ILFL12] Sadeka Islam, Kevin Lee, Alan Fekete, and Anna Liu. How a consumer can measure elasticity for cloud platforms. In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, ICPE ’12, pages 85–96, New York, NY, USA, 2012. ACM.

[Int13] Intel Corporation. Intel automated relational knowledgebase. http://ark.intel.com, Last checked on July 1st 2013.

[Jac13] Dean Jacobs. Dean’s Blog: After reviewing some cloud database papers. http://deanbjacobs.wordpress.com/2010/08/12/cloudpaperreviews/, Last checked on July 23rd 2013.

[KC04] Ralph Kimball and Joe Caserta. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. Wiley, Indianapolis, IN, 2004.

[KD98] Navin Kabra and David J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data, SIGMOD ’98, pages 106–117, New York, NY, USA, 1998. ACM.

[Kee99] Kimberly Kristine Keeton. Computer Architecture Support For Database Applications. PhD thesis, University of California at Berkeley, 1999.

[KGT06] Andreas Knoepfel, Bernhard Groene, and Peter Tabeling. Fundamental Modeling Concepts: Effective Communication of IT Systems. John Wiley & Sons, 2006.

[KGT+10] Jens Krueger, Martin Grund, Christian Tinnefeld, Hasso Plattner, and Franz Faerber. Optimizing write performance for read optimized databases. In Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II, DASFAA ’10, pages 291–305, Berlin, Heidelberg, 2010. Springer-Verlag.

[Kir96] John Kirkwood. SYBASE SQL Server II. International Thomson Publishing Company, 1st edition, 1996.

[KKG+11] Jens Krueger, Changkyu Kim, Martin Grund, Nadathur Satish, David Schwalb, Jatin Chhugani, Hasso Plattner, and Pradeep Dubey. Fast updates on read-optimized databases using multi-core cpus. Proc. VLDB Endow., 5(1):61–72, September 2011.

[KKN+08] Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. H-Store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow., 1(2):1496–1499, 2008.

[KLB+12] Barbara Klein, Richard Alan Long, Kenneth Ray Blackman, Diane Lynne Goff, Stephen Paul Nathan, Moira McFadden Lanyi, Margaret M. Wilson, John Butterweck, and Sandra L. Sherrill. An Introduction to IMS: Your Complete Guide to IBM Information Management System. IBM Press, 2nd edition, 2012.

[Klo10] Rusty Klophaus. Riak core: Building distributed applications without shared state. In ACM SIGPLAN Commercial Users of Functional Programming, CUFP ’10, pages 14:1–14:1, New York, NY, USA, 2010. ACM.

[KNI+11] Alfons Kemper and Thomas Neumann. Hyper: A hybrid oltp&olap main memory database system based on virtual memory snapshots. In ICDE, 2011.

[Kos00] Donald Kossmann. The state of the art in distributed query processing. ACM Comput. Surv., 32(4):422–469, December 2000.

[KPH98] Kimberly Keeton, David A. Patterson, and Joseph M. Hellerstein. A case for intelligent disks (idisks). SIGMOD Rec., 27(3):42–52, September 1998.

[KTGP10] Jens Krueger, Christian Tinnefeld, Martin Grund, and Hasso Plattner. A case for online mixed workload processing. In Proceedings of the Third International Workshop on Testing Database Systems, DBTest ’10, pages 8:1–8:6, New York, NY, USA, 2010. ACM.

[Lam78] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, July 1978.

[LCF+13] Per-Ake Larson, Cipri Clinciu, Campbell Fraser, Eric N. Hanson, Mostafa Mokhtar, Michal Nowakiewicz, Vassilis Papadimos, Susan L. Price, Srikumar Rangarajan, Remus Rusanu, and Mayukh Saubhasik. Enhancements to sql server column stores. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 1159–1168, New York, NY, USA, 2013. ACM.

[LFV+12] Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, and Chuck Bear. The vertica analytic database: C-store 7 years later. Proc. VLDB Endow., 5(12):1790–1801, August 2012.

[LKF+13] Juchang Lee, Yong Sik Kwon, Franz Farber, Michael Muehle, Chulwon Lee, Christian Bensberg, Joo Yeon Lee, Arthur H. Lee, and Wolfgang Lehner. Sap hana distributed in-memory database system: Transaction, session, and metadata management. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 1165–1173, 2013.

[LLS13] Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. The bw-tree: A b-tree for new hardware platforms. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 302–313, 2013.

[LM10] Avinash Lakshman and Prashant Malik. Cassandra: A decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35–40, April 2010.

[LS90] Eliezer Levy and Abraham Silberschatz. Distributed file systems: Concepts and examples. ACM Comput. Surv., 22(4):321–374, December 1990.

[LVZ93] Rosana S. G. Lanzelotte, Patrick Valduriez, and Mohamed Zaït. On the effectiveness of optimization search strategies for parallel execution spaces. In Proceedings of the 19th International Conference on Very Large Data Bases, VLDB ’93, pages 493–504, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc.

[Mat01] S. Matsuda. Theoretical limitations of a hopfield network for crossbar switching. Trans. Neur. Netw., 12(3):456–462, May 2001.

[MBK02a] Stefan Manegold, Peter Boncz, and Martin L. Kersten. Generic database cost models for hierarchical memory systems. In Proceedings of the 28th international conference on Very Large Data Bases, VLDB ’02, pages 191–202. VLDB Endowment, 2002.

[MBK02b] Stefan Manegold, Peter Boncz, and Martin L. Kersten. Generic database cost models for hierarchical memory systems. In Proceedings of the 28th international conference on Very Large Data Bases, VLDB ’02, pages 191–202. VLDB Endowment, 2002.

[MBK02c] Stefan Manegold, Peter Boncz, and Martin L. Kersten. Generic database cost models for hierarchical memory systems. Technical Report INS-R0203, CWI, Amsterdam, The Netherlands, March 2002.

[MC13] Masood Mortazavi, Yanchen Liu, Stephen Morgan, Aniket Adnaik, Mengmeng Chen, and Fang Cao. Distributed query processing with monetdb. In 7th Extremely Large Databases Conference (XLDB), 2013.

[ME13] Martin Edwards (The Register). Texas Memory RamSan-440: Texas Memory Systems Makes Mighty Big RAM SSD. http://www.theregister.co.uk/2008/07/22/texas_memory_systems_ramsan_440/, Last checked on July 23rd 2013.

[Mel13] Mellanox Technologies. RDMA Aware Networks Programming User Manual, V1.4, 2013.

[mem13a] memcached - a distributed memory object caching system. http://memcached.org, Last checked on July 23rd 2013.

[Mem13b] Stratosphere Project Members. Stratosphere. http://www.stratosphere.eu, Last checked on July 23rd 2013.

[MGL+10] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis. Dremel: interactive analysis of web-scale datasets. Proc. VLDB Endow., 3(1-2):330–339, September 2010.

[Mic02] Maged M. Michael. High performance dynamic lock-free hash tables and list-based sets. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, SPAA ’02, pages 73–82, New York, NY, USA, 2002. ACM.

[MN92] C. Mohan and Inderpal Narang. Efficient locking and caching of data in the multisystem shared disks transaction environment. In Alain Pirotte, Claude Delobel, and Georg Gottlob, editors, Advances in Database Technology - EDBT’92, 3rd International Conference on Extending Database Technology, Vienna, Austria, March 23-27, 1992, Proceedings, volume 580 of Lecture Notes in Computer Science, pages 453–468. Springer, 1992.

[Moo65] G. E. Moore. Cramming More Components onto Integrated Circuits. Electronics, 38(8):114–117, April 1965.

[Moo11] Trevor Moore. The Sybase IQ Survival Guide, Versions 12.6 through to 15.2. Lulu, 1st edition, 2011.

[MRR+13] Tobias Mühlbauer, Wolf Rödiger, Angelika Reiser, Alfons Kemper, and Thomas Neumann. Scyper: elastic olap throughput on transactional data. In Proceedings of the Second Workshop on Data Analytics in the Cloud, DanaC ’13, pages 11–15, New York, NY, USA, 2013. ACM.

[MSFD11a] Héctor Montaner, Federico Silla, Holger Fröning, and José Duato. Memscale: in-cluster-memory databases. In Proceedings of the 20th ACM international conference on Information and knowledge management, CIKM ’11, pages 2569–2572, New York, NY, USA, 2011. ACM.

[MSFD11b] Héctor Montaner, Federico Silla, Holger Fröning, and José Duato. Memscale™: A scalable environment for databases. In Parimala Thulasiraman, Laurence Tianruo Yang, Qiwen Pan, Xingang Liu, Yaw-Chung Chen, Yo-Ping Huang, Lin-Huang Chang, Che-Lun Hung, Che-Rung Lee, Justin Y. Shi, and Ying Zhang, editors, HPCC, pages 339–346. IEEE, 2011.

[MWV+13] Marko Milek, Antoni Wolski, Katriina Vakkila, Dan Behman, Samir Gupta, and John Seery. Ibm soliddb: Delivering data with extreme speed. http://www-01.ibm.com/support/docview.wss?uid=swg27020436&aid=1, Last checked on July 23rd 2013.

[MyS13a] MySQL Team. Mysql internals manual - writing a custom storage engine. http://dev.mysql.com/doc/internals/en/custom-engine.html, Last checked on July 23rd 2013.

[MyS13b] MySQL Team. MySQL - The world’s most popular open source database. http://www.mysql.com/, Last checked on Sep 17th 2013.

[Neu11] Thomas Neumann. Efficiently compiling efficient query plans for modern hardware. Proc. VLDB Endow., 4(9):539–550, June 2011.

[new13] Newegg.com - computer parts, laptops, electronics. http://newegg.com, Prices checked on Sep 21st 2013.

[NFG+13] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling memcache at facebook. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, nsdi’13, pages 385–398, Berkeley, CA, USA, 2013. USENIX Association.

[NL06] Linda Null and Julia Lobur. The Essentials of Computer Organization And Architecture. Jones and Bartlett Publishers, Inc., USA, 2006.

[OAE+11] John K. Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Diego Ongaro, Guru M. Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. The case for RAMCloud. Commun. ACM, 54(7):121–130, 2011.

[OCD+88] John K. Ousterhout, Andrew R. Cherenson, Fred Douglis, Michael N. Nelson, and Brent B. Welch. The sprite network operating system. IEEE Computer, 21(2):23–36, 1988.

[O’N] P. E. O’Neil, E. J. O’Neil, and X. Chen. The Star Schema Benchmark (SSB).

[ORS+11] Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John K. Ousterhout, and Mendel Rosenblum. Fast crash recovery in RAMCloud. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles 2011, SOSP 2011, Cascais, Portugal, October 23-26, 2011, pages 29–41. ACM, 2011.

[ÖV11] M. Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems, Third Edition. Springer, 2011.

[PH08] David A. Patterson and John L. Hennessy. Computer Organization and Design, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 4th edition, 2008.

[Pil12] Markus Pilman. Running a transactional database on top of ramcloud. Master’s thesis, ETH Zürich, Department of Computer Science, 2012.

[Pla09] Hasso Plattner. A common database approach for oltp and olap using an in-memory column database. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, SIGMOD ’09, pages 1–2, New York, NY, USA, 2009. ACM.

[Pla11a] Hasso Plattner. In-Memory Data Management - An Inflection Point for Enterprise Applications. Springer-Verlag Berlin Heidelberg, 2011.

[Pla11b] Hasso Plattner. Sanssoucidb: An in-memory database for processing enterprise workloads. In Härder et al. [HLM+11], pages 2–21.

[Pos13] Postgres-XC Team. Postgres-XC (eXtensible Cluster) Wiki. http://postgres-xc.sourceforge.net, Last checked on October 2nd 2013.

[Raa93] Francois Raab. Tpc-c - the standard benchmark for online transaction processing (oltp). In Jim Gray, editor, The Benchmark Handbook. Morgan Kaufmann, 1993.

[Rah93] Erhard Rahm. Parallel query processing in shared disk database systems. SIGMOD Rec., 22(4):32–37, December 1993.

[RGF98] Erik Riedel, Garth A. Gibson, and Christos Faloutsos. Active storage for large-scale data mining and multimedia. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pages 62–73, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.

[ROS+11] Stephen M. Rumble, Diego Ongaro, Ryan Stutsman, Mendel Rosenblum, and John K. Ousterhout. It’s time for low latency. In Proceedings of the 13th USENIX conference on Hot topics in operating systems, HotOS’13, pages 11–11, Berkeley, CA, USA, 2011. USENIX Association.

[RS13] Rajani Singh and David Daoud (IDC). Intel announces 4q12 earnings: Performance assessment and the road ahead. http://www.idc.com/getdoc.jsp?containerId=lcUS23915413, Last checked on July 23rd 2013.

[RSI07] Aravindan Raghuveer, Steven W. Schlosser, and Sami Iren. Enabling database-aware storage with osd. In Mass Storage Systems and Technologies, IEEE / NASA Goddard Conference on, pages 129–142, 2007.

[SAB+05] Mike Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O’Neil, Pat O’Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-store: a column-oriented dbms. In Proceedings of the 31st international conference on Very large data bases, VLDB ’05, pages 553–564. VLDB Endowment, 2005.

[SAP13] SAP AG. SAP HANA Administration Guides. http://help.sap.com/hana_platform, Last checked on July 23rd 2013.

[SBADAD05] Muthian Sivathanu, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Database-aware semantically-smart storage. In Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4, FAST’05, pages 18–18, Berkeley, CA, USA, 2005. USENIX Association.

[Sca13] ScaleDB Team. ScaleDB for MySQL - Technical Overview. http://scaledb.com/pdfs/TechnicalOverview.pdf, Last checked on July 23rd 2013.

[Sch13] Jan Schaffner. Multi Tenancy for Cloud-Based In-Memory Column Databases. PhD thesis, Hasso-Plattner-Institut, 2013.

[SD89] Donovan A. Schneider and David J. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. In Proceedings of the 1989 ACM SIGMOD international conference on Management of data, SIGMOD ’89, pages 110–121, New York, NY, USA, 1989. ACM.

[Ser13] Amazon Web Services. SAP HANA One on Amazon Web Services. https://aws.amazon.com/marketplace/pp/B009KA3CRY, Last checked on July 23rd 2013.

[SG13] ETH Zürich Systems Group. Department of computer science at eth zurich: Systems group. http://www.systems.ethz.ch/, Last checked on July 23rd 2013.

[SGK+88] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Innovations in internetworking. chapter Design and Implementation of the Sun Network Filesystem, pages 379–390. Artech House, Inc., Norwood, MA, USA, 1988.

[SGM90] K. Salem and H. Garcia-Molina. System m: A transaction processing testbed for memory resident data. IEEE Trans. on Knowl. and Data Eng., 2(1):161–172, March 1990.

[SH02] Frank Schmuck and Roger Haskin. Gpfs: A shared-disk file system for large computing clusters. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST ’02, Berkeley, CA, USA, 2002. USENIX Association.

[SJK+13] Jan Schaffner, Tim Januschowski, Megan Kercher, Tim Kraska, Hasso Plattner, Michael J. Franklin, and Dean Jacobs. Rtp: robust tenant placement for elastic in-memory database clusters. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, pages 773–784, 2013.

[SKG+12] Roshan Sumbaly, Jay Kreps, Lei Gao, Alex Feinberg, Chinmay Soman, and Sam Shah. Serving large-scale batch computed data with project voldemort. In Proceedings of the 10th USENIX Conference on File and Storage Technologies, FAST’12, pages 18–18, Berkeley, CA, USA, 2012. USENIX Association.

[SKRC10] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The hadoop distributed file system. In Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST ’10, pages 1–10, Washington, DC, USA, 2010. IEEE Computer Society.

[SMHW02] Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, and David A. Wood. Safetynet: improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proceedings of the 29th annual international symposium on Computer architecture, ISCA ’02, pages 123–134, Washington, DC, USA, 2002. IEEE Computer Society.

[Smi13] Zack Smith. Bandwidth: a memory bandwidth benchmark. http://home.comcast.net/~veritas/bandwidth.html, Last checked on July 1st 2013.

[SOE+12] Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, and Phoenix Tong. F1: the fault-tolerant distributed rdbms supporting google’s ad business. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, Scottsdale, AZ, USA, May 20-24, 2012, pages 777–778, 2012.

[sta04] American National Standards Institute (ANSI) T10 standard. Information technology - scsi object-based storage device commands (osd). Standard ANSI/INCITS 400-2004, December 2004.

[Sto87] Michael Stonebraker. The design of the postgres storage system. In Proceedings of the 13th International Conference on Very Large Data Bases, VLDB ’87, pages 289–300, San Francisco, CA, USA, 1987. Morgan Kaufmann Publishers Inc.

[Sut05] Herb Sutter. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb’s Journal, 30(3), 2005.

[SW13] Michael Stonebraker and Ariel Weisberg. The voltdb main memory dbms. IEEE Data Eng. Bull., 36(2):21–27, 2013.

[Syb13a] Sybase - An SAP Company. SAP Sybase IQ. http://www.sybase.com/products/datawarehousing/sybaseiq, Last checked on July 23rd 2013.

[Syb13b] Sybase - An SAP Company. A Practical Hardware Sizing Guide for Sybase® IQ 15. http://www.sybase.de/files/White_Papers/Sybase_IQ_15_SizingGuide_wp.pdf, Last checked on Nov 1st 2013.

[Tan07] Andrew S. Tanenbaum. Modern Operating Systems. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2007.

[Ter13] Teradata. Teradata Active Enterprise Data Warehouse 6650. http://www.teradata.com/brochures/Teradata-Active-Enterprise-Data-Warehouse-6650/, Last checked on July 23rd 2013.

[The13] The Embedded Microprocessor Benchmark Consortium. The coremark benchmark. http://www.eembc.org/coremark, Last checked on September 1st 2013.

[Tho02] Erik Thomsen. Olap Solutions: Building Multidimensional Information Systems. John Wiley & Sons, Inc., New York, NY, USA, 2nd edition, 2002.

[Tho11] Michael E. Thomadakis. The architecture of the nehalem processor and nehalem-ep smp platforms. Research Report at the University of Texas, 2011.

[Tin09] Christian Tinnefeld. Application characteristics of enterprise applications. Master’s thesis, Hasso-Plattner-Institut, Potsdam, Germany, 2009.

[TKBP14] Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Boese, and Hasso Plattner. Parallel join executions in ramcloud. In CloudDB - In conjunction with ICDE 2014, 2014.

[TKG+13] Christian Tinnefeld, Donald Kossmann, Martin Grund, Joos-Hendrik Boese, Frank Renkes, Vishal Sikka, and Hasso Plattner. Elastic online analytical processing on ramcloud. In Proceedings of the 16th International Conference on Extending Database Technology, EDBT ’13, pages 454–464, New York, NY, USA, 2013. ACM.

[TMK+11] Christian Tinnefeld, Stephan Müller, Helen Kaltegärtner, Sebastian Hillig, Lars Butzmann, David Eickhoff, Stefan Klauck, Daniel Taschik, Björn Wagner, Oliver Xylander, Hasso Plattner, and Cafer Tosun. Available-to-promise on an in-memory column store. In Härder et al. [HLM+11], pages 667–686.

[TMKG09] Christian Tinnefeld, Stephan Müller, Jens Krüger, and Martin Grund. Leveraging multi-core cpus in the context of demand planning. In 16th International Conference on Industrial Engineering and Engineering Management (IEEM), Beijing, China, 2009.

[TP11a] Christian Tinnefeld and Hasso Plattner. Cache-conscious data placement in an in-memory key-value store. In IDEAS 2011: 15th International Database Engineering and Applications Symposium, 2011.

[TP11b] Christian Tinnefeld and Hasso Plattner. Exploiting memory locality in distributed key-value stores. In ICDE Workshops 2011, 2011.

[TS90] Shreekant S. Thakkar and Mark Sweiger. Performance of an oltp application on symmetry multiprocessor system. In Proceedings of the 17th annual international symposium on Computer Architecture, ISCA ’90, pages 228–238, New York, NY, USA, 1990. ACM.

[TSJ+09] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. Hive: A warehousing solution over a map-reduce framework. Proc. VLDB Endow., 2(2):1626–1629, August 2009.

[TTP13] Christian Tinnefeld, Daniel Taschik, and Hasso Plattner. Providing high-availability and elasticity for an in-memory database system with ramcloud. In GI-Jahrestagung, pages 472–486, 2013.

[TTP14] Christian Tinnefeld, Daniel Taschik, and Hasso Plattner. Quantifying the elasticity of a database management system. In DBKDA, 2014.

[Tur90] Efraim Turban. Decision Support and Expert Systems: Management Support Systems. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2nd edition, 1990.

[TWP12] Christian Tinnefeld, Björn Wagner, and Hasso Plattner. Operating on hierarchical enterprise data in an in-memory column store. In DBKDA, 2012.

[UFA98] Tolga Urhan, Michael J. Franklin, and Laurent Amsaleg. Cost-based query scrambling for initial delays. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data, SIGMOD ’98, pages 130–141, New York, NY, USA, 1998. ACM.

[Ver13] HP Vertica. Native BI, ETL, Hadoop/MapReduce Integration. http://www.vertica.com/the-analytics-platform/native-bi-etl-and-hadoop-mapreduce-integration/, Last checked on July 23rd 2013.

[vN93] John von Neumann. First draft of a report on the edvac. IEEE Ann. Hist. Comput., 15(4):27–75, October 1993.

[Vog07] Werner Vogels. Data access patterns in the amazon.com technology platform. In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB ’07, pages 1–1. VLDB Endowment, 2007.

[Wei84] Reinhold P. Weicker. Dhrystone: a synthetic systems programming benchmark. Commun. ACM, 27(10):1013–1030, October 1984.

[Whi09] Tom White. Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st edition, 2009.

[WH 1] Antoni Wolski and Sally Hartnell. soliddb and the secrets of speed. IBM Data Magazine, 2010, Issue 1.

[WW13] Wenhao Wu (Microsoft Windows HPC Team). Overview of rdma on windows. https://www.openfabrics.org/ofa-documents/doc_download/222-microsoft-overview-of-rdma-on-windows.html, Last checked on July 23rd 2013.

[Zem12] Fred Zemke. What’s new in sql:2011. SIGMOD Rec., 41(1):67–73, April 2012.
