
DISS. ETH NO. 24063

Database/Operating Co-Design

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH

(Dr. sc. ETH Zürich)

presented by

JANA GIČEVA

MSc in Computer Science, ETH Zürich

born on 14.12.1987

citizen of Macedonia

accepted on the recommendation of

Prof. Dr. Gustavo Alonso (ETH Zürich), examiner
Prof. Dr. Timothy Roscoe (ETH Zürich), co-examiner
Dr. Timothy L. Harris (Oracle Labs, Cambridge, UK), co-examiner
Dr. Kimberly K. Keeton (Hewlett Packard Laboratories), co-examiner

2016

Abstract

For decades, database engines have found the generic interfaces offered by conventional operating systems at odds with the need for efficient utilization of hardware resources. This is partly due to the big semantic gap between the two layers. The rigid DB/OS interface does not allow information to flow between them, and as a result: (1) the operating system is unaware of the database's requirements and provides a set of general-purpose policies and mechanisms for all applications running on it; (2) the database can, at best, duplicate a lot of the OS functionality internally, at the cost of absorbing a significant portion of additional complexity in order to use the underlying hardware efficiently – an approach that does not scale with the current pace of hardware developments.

In this dissertation, I approach the problem from two perspectives. First, I reduce the knowledge gap between the database and the operating system by introducing an OS policy engine and a declarative interface between the two layers. I show how such extensions allow easier deployment on different machines, robust execution in noisy environments, and close to optimal resource allocation without sacrificing performance or tail latencies.

Second, I propose using an OS architecture that allows dynamic splitting of the machine's resources into a control and a compute plane. I show how a compute plane kernel can be tailored to the needs of data processing applications by integrating a kernel-based runtime for efficient execution of concurrent parallel analytical jobs. I also address a modern challenge of database optimizers regarding the balance of concurrency and parallelism, and the influence that modern multicore machines have on the problem.

I conclude by discussing future directions which arise from the work presented in this dissertation, and by highlighting the potential of cross-layer optimizations on the system stack in the light of increasingly heterogeneous hardware platforms and modern workload requirements.


Zusammenfassung

For decades, database systems have found the generic interfaces provided by conventional operating systems in conflict with the need to use hardware resources efficiently. Part of the problem stems from the large semantic gap between the two layers. The rigid DB/OS interface does not allow any information flow between these layers. As a consequence, (1) the operating system has no knowledge of the requirements of the database and therefore implements only generic policies and mechanisms for all applications running on it, and (2) the database, in the best case, internally duplicates a lot of OS functionality, at the cost of a significant amount of additional complexity, just to be able to use the available hardware efficiently – an approach that does not scale with the rapid pace at which hardware changes today.

In this dissertation I tackle the problem from two sides. First, I reduce the knowledge gap between the database and the operating system with the help of an OS policy engine and a declarative interface between the two layers. I show how these extensions enable easier deployment on different machines, more robust execution in noisy systems, and close to optimal resource allocation, without sacrificing execution speed or tail latencies.

Furthermore, I propose an OS architecture that allows a machine to be dynamically split into a control and a compute plane. I show how a kernel in the compute plane can be specialized for the requirements of a data processing application by integrating a kernel-based runtime for the efficient execution of concurrent data analysis jobs. In addition, I address the current challenge that database optimizers face regarding the balance between concurrency and parallelism, and the influence that modern multicore machines have on this problem.

I conclude with a discussion of future research directions that arise from the presented work, and by highlighting the potential of cross-layer optimizations of the system stack in the light of increasingly heterogeneous hardware platforms and the requirements of modern workloads.

Acknowledgments

This has been an amazing journey. From the early days I have been captivated by the thrill of approaching and exploring even the most challenging of problems. And it is thanks to my advisers, collaborators, family, and friends that I have loved every bit of it.

First, I would like to express my gratitude to my adviser Gustavo Alonso for all his support, advice, guidance, and patience; for helping me grow as a scholar and for teaching me to love what it takes to do great research. I want to thank my co-adviser Timothy Roscoe for being supportive and always ready to give insightful feedback on my work, and for helping me improve myself as a researcher. I also extend my gratitude to my mentor from Oracle Labs, Tim Harris, for many great discussions, his feedback, and guidance. I have greatly enjoyed our collaboration, which I hope we continue in the future.

I would like to thank Kim Keeton for agreeing to be part of my PhD committee and for her feedback that significantly improved the quality of my dissertation; John Wilkes for being a supportive mentor for my PhD fellowship and for always challenging me to clearly define what I do; Eric Sedlar and Nipun Agrawal for giving me the opportunity to work on Project RAPID, an experience in the early days of my PhD that was very rewarding; and Donald Kossmann, Frank McSherry, Derek Murray, Michael Isard, Onur Mutlu, and many others from ETH, Oracle Labs, and Microsoft Research SVC for your guidance, all of our discussions, and for allowing me to learn so much from all of it.

I had the pleasure and luck to work with many great students and would like to thank all my collaborators: Tudor, Adrian, Ionut, Kaan, Claude, Darko, Gerd, Pratanu, Zaheer, and Simon P. It was such a rewarding experience working with all of you.

Throughout the years, the friendship in the Systems Group has been one of the greatest highlights. Therefore, a big thank you goes to Anja, Besa, Pravin, Stefan, Gerd, Lukas, Zsolt, Pratanu, Akhi, Georgios, Tudor and Desi. I would also like to use this opportunity to thank my closest allies and friends for many years: Tijana, Sanja, Kiki, Alen, Gogi, Ozan, Kaveh, Sukriti, Kaan, Josip, Irena.

For almost everything I have achieved so far, I have to thank my parents Gjorgji and Ljubica. They have been supportive like no other. Both have been my role models for many years and have strongly encouraged me to follow my dreams. I would also like to thank my sister Mila for always cheering me up and supporting me wherever I go.

And finally, to my best friend, biggest supporter and critic – Darko. Thank you for putting up with all the different versions of me and for always being there. Thank you for teaching me the value of a balanced life, and for showing me the positive side of every situation. Much of this success is thanks to you.

Contents

1 Introduction
  1.1 Background
  1.2 Motivation and challenges
    1.2.1 Hardware trends
    1.2.2 Deployment trends
  1.3 Problem statement
  1.4 Contributions
    1.4.1 Policies and information flow
    1.4.2 Customized OS support for data processing
  1.5 Thesis outline
  1.6 Related publications

2 OS policy engine and adaptive DB storage engine
  2.1 System Overview
  2.2 DB Storage engine
    2.2.1 The architecture of the storage engine
    2.2.2 Working unit and its properties
    2.2.3 Properties of the CSCS storage engine
    2.2.4 Embedding into COD
  2.3 The OS Policy engine
    2.3.1 Architecture
    2.3.2 Implementation
    2.3.3 Discussion
  2.4 Interface
    2.4.1 Scope
    2.4.2 Semantics
    2.4.3 Syntax and implementation
    2.4.4 Evaluation
  2.5 Experiments
    2.5.1 Experimental Setup
    2.5.2 Deployment on different machines
    2.5.3 Deployment in a noisy system
    2.5.4 Adaptability to changes
  2.6 Related work
    2.6.1 Interacting with operating systems
    2.6.2 Means of obtaining application's requirements
    2.6.3 based on applications' requirements
  2.7 Summary

3 Execution engine: Efficient deployment of query plans
  3.1 Motivation
  3.2 Background
    3.2.1 Complex Query Plans
    3.2.2 Scheduling of Shared Systems
    3.2.3 Problem Statement
    3.2.4 Sketch of the Solution
    3.2.5 In the context of COD
  3.3 Resource Activity Vectors
    3.3.1 RAV Definition
    3.3.2 RAV Implementation
    3.3.3 Capturing CPU Utilization
    3.3.4 Capturing Memory Utilization
    3.3.5 Parallel Operators
  3.4 Deployment Algorithm
    3.4.1 Operator Collapsing
    3.4.2 Minimizing Computational Requirements
    3.4.3 Minimizing Bandwidth Requirements
    3.4.4 Deployment Mapping
    3.4.5 Discussion and Possible Extensions
  3.5 Evaluation
    3.5.1 Experiment Setup
    3.5.2 Resource Activity Vectors (RAVs)
    3.5.3 Performance Comparison: Baseline vs Compressed deployment
    3.5.4 Analysis of the Deployment Algorithm
    3.5.5 Discussion
  3.6 Generalizing the approach
    3.6.1 Parallel Operators
    3.6.2 Dynamic Workload
    3.6.3 Non-shared (Traditional) Database Systems
  3.7 Related work
    3.7.1 General scheduling for multicore systems
    3.7.2 Scheduling for
    3.7.3 Resource allocation for data-oriented systems
    3.7.4 Deriving application's requirements
  3.8 Summary

4 Optimizer: Concurrency vs. parallelism in NUMA systems
  4.1 Background and Motivation
  4.2 Problem statement
    4.2.1 Factors influencing concurrent execution
    4.2.2 Scheduling approaches
    4.2.3 Evaluation Metrics
  4.3 Methodology
    4.3.1 Parallel data-processing
    4.3.2 Hardware architectures
  4.4 Algorithms in isolation
    4.4.1 Relational operators
    4.4.2 Graph processing algorithms
  4.5 Concurrent WL execution
    4.5.1 Interference in concurrent workloads
    4.5.2 Scheduling approaches – experimental setup
    4.5.3 Scheduling concurrent DB operators
    4.5.4 Scheduling concurrent graph algorithms
    4.5.5 Effect of underlying architecture
    4.5.6 Heterogeneous workload
  4.6 Discussion
  4.7 Related Work
    4.7.1 and data placement
    4.7.2 Scheduling concurrent parallel workloads
    4.7.3 Contention-aware scheduling
    4.7.4 Constructive resource sharing
    4.7.5 Impact of resource sharing and performance isolation
  4.8 Summary

5 Scheduler: Kernel-integrated runtime for parallel data analytics
  5.1 Motivating use-case
  5.2 Foundations
    5.2.1 Expanding the Application/OS model
    5.2.2 The OS architecture
  5.3 Architecture overview
    5.3.1 Control plane
    5.3.2 Compute plane
    5.3.3 Customized compute-node kernels
  5.4 Customizing a compute-plane kernel
    5.4.1 The need for better OS interface
    5.4.2 The need for run-to-completion execution
    5.4.3 The need for co-scheduling
    5.4.4 The need for spatial isolation
    5.4.5 The need for data aware placement
  5.5 Implementation
    5.5.1 Control and Compute plane
    5.5.2 Compute plane kernel – Basslet
    5.5.3 Basslet runtime
    5.5.4 Code size
  5.6 Evaluation
    5.6.1 Interference between a pair of parallel jobs
    5.6.2 System throughput scale-out
    5.6.3 Comparing standalone runtime: vs. Basslet
    5.6.4 Overhead of Badis enqueuing
    5.6.5 Evaluating the adaptive feature of Badis
  5.7 Related work
    5.7.1 Scheduling parallel workloads
    5.7.2 Scheduling of and within Runtime systems
    5.7.3 Specialized kernels
    5.7.4 OS mechanisms for scheduling and performance isolation
    5.7.5 Linux containers and
  5.8 Integration with COD's policy engine
  5.9 Summary

6 Future work
  6.1 Follow up work
    6.1.1 Managing different types of resources
    6.1.2 Supporting workloads beyond traditional analytics
  6.2 Beyond DB/OS co-design

7 Conclusion

1 Introduction

The design and implementation of today's system software (e.g., databases, operating systems) has been influenced by the increasing complexity of modern machines, the necessity of efficient resource utilization, and the need to meet the growing demands of complex data processing workloads. Unfortunately, most of the research focuses primarily on individual layers of the system stack. This is often based on making specific assumptions about the rest of the system stack and discarding the information and knowledge available in the other layers. This, however, often results in redundancy in the services and mechanisms implemented and in mismatched policies for managing the hardware resources as required by modern application workloads. My claim is that in order to efficiently address such challenges, system developers need to revisit the problem of cross-layer optimizations and re-evaluate the benefits of system co-design. In this dissertation I explore the interaction between database engines and operating systems in the light of recent hardware advancements and current trends in workload requirements. The focus of this work is on data processing systems executing modern analytical workloads on multisocket multicore machines.


1.1 Background

Database management systems and operating systems have a decades-long conflict when it comes to resource management and scheduling [Gra77, Sto81]. Even though they originally started with the same philosophy – providing access to manage data in files – they took a different approach when implementing it. For many years the two systems targeted different types of applications and machines (e.g., DBMS systems were running on dedicated, complex, and expensive database machines [Hsi79, BD83]), which shaped the role of conventional monolithic databases and operating systems as we know them today. However, the efficiency and economic advantage of off-the-shelf hardware and the need to reduce the costs of application development have led to today's situation, where a database typically runs on top of a conventional operating system.

The operating system is primarily responsible for scheduling resources across multiple applications and for providing isolation and protection. In many cases the OS also provides hardware abstractions to enable easier portability of applications across different hardware platforms. As its primary purpose is to serve multiple applications, the OS often sees the DB engine as just another program and offers generic mechanisms and policies to all of its applications.

Generally, today's operating systems multiplex applications with little to no knowledge about their requirements. They migrate, preempt, and schedule threads on various cores, trying to optimize certain system-wide global objectives (e.g., load-balancing the work queues on individual cores and across the NUMA nodes [LLF+16]). Furthermore, the operating system has no notion of how its decisions affect the performance of its applications, primarily due to the limited information flow between the two layers.

As a result, performance-sensitive applications like DBMSs often find the generic mechanisms and policies provided by the operating system ill-suited for their requirements [Sto81]. Many data processing engines running on conventional operating systems suffer from a number of performance-related problems and inefficient resource usage. Therefore, they try to take control over the management of the machine's resources. In order to do that, databases have two alternatives. First, they can try to understand how the underlying operating system operates, what its policies are, and how the available mechanisms work, and adapt their implementation accordingly; or second, they can completely ignore the policies of the operating system and find means to override them.


The problem with the first alternative is that it is tied to the current understanding of the OS policies and mechanisms (and without the OS being aware of it). As soon as the default kernel policies change, all the effort in optimizing the resource allocation is lost and performance problems become more difficult to debug and understand. Therefore, applications use the second approach and leverage libraries to override the default policies of the operating system. One example is the libnuma [Kle05] library, which allows applications to have more control over thread and data placement. Similar techniques are also used for buffer management and pinning [Gra77], page coloring for cache partitioning [CJ06, LDC+09], setting the granularity of memory allocation (e.g., using huge pages), etc. The benefits of using this alternative come primarily from the internal knowledge that applications like database systems have about their workload properties: the algorithmic complexity of their operations, data access patterns, resource requirements, and data distributions.

The idea of giving the application control over the resource management policies has also been explored by research operating systems (e.g., Exokernel [EKO95], fos [WA09], Tessellation [CEH+13], Barrelfish [BBD+09], etc.). In this dissertation we argue that these interfaces, which open up the OS functionality, can improve performance but require prohibitive effort by the developer, due to the increasing complexity of the hardware platforms and the diversity of machines on the market. Moreover, even if a single application can benefit from such interfaces, the situation for several concurrently executing applications remains unresolved: improving overall resource efficiency requires a global view of the system state and resource utilization, and knowing the set of running applications. We discuss these factors in more detail in the next section.
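As a concrete illustration of this second alternative, the following minimal sketch (my own example, not taken from the thesis; it assumes Linux with libnuma installed and linking with -lnuma) binds the calling thread to one NUMA node and allocates its data partition from that node's local memory, thereby overriding the kernel's default placement policies. The node id and partition size are arbitrary example values.

/*
 * Minimal sketch of overriding the OS defaults with libnuma (Linux, link with
 * -lnuma). The node id and partition size are arbitrary example values.
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this machine\n");
        return EXIT_FAILURE;
    }

    int node = 0;                         /* place thread and data on node 0 */
    size_t partition_size = 64UL << 20;   /* 64 MiB data partition */

    /* Restrict the calling thread to the cores of the chosen NUMA node,
     * instead of letting the kernel migrate it freely. */
    if (numa_run_on_node(node) != 0) {
        perror("numa_run_on_node");
        return EXIT_FAILURE;
    }

    /* Allocate the data partition from the node's local memory, instead of
     * relying on the default first-touch policy. */
    void *partition = numa_alloc_onnode(partition_size, node);
    if (partition == NULL) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }
    memset(partition, 0, partition_size); /* touch the memory locally */

    /* ... scan/process the partition with guaranteed local accesses ... */

    numa_free(partition, partition_size);
    return EXIT_SUCCESS;
}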

1.2 Motivation and challenges

Modern data processing systems face challenges from several recent trends, which are briefly described in this section.

1.2.1 Hardware trends

The last years have seen profound changes in the hardware available for running database systems. As hardware vendors reached the power wall, there was no longer a speed-up to be gained by simply increasing the CPU frequency. Computer architects have tried to use the available transistors by introducing multiple cores, heterogeneous computational resources (some of which are tailored to a particular application), accelerators, etc. With the rise of the memory wall and the gap between DRAM and CPU frequencies, hardware vendors introduced more complex cache hierarchies, non-uniform cache-coherent memory, etc. Consequently, the system software today has to adapt and embrace the hardware changes as an opportunity to rethink its architecture model and design principles, and find means to address the underlying hardware complexity.

Database engines today perform their own memory allocation and thread scheduling, store data on raw disk partitions, and implement complex strategies to optimize I/O and avoid synchronization problems. Extensive research has been invested over more than three decades to optimize such engine decisions, but some of these optimizations may no longer apply in the face of the changing interfaces and properties of new storage (e.g., SSDs or NVRAM instead of HDDs). Unfortunately, and unless something changes, a great deal of additional complexity will need to be added to database systems to cope with the increasing heterogeneity within and across hardware architectures. For example, in order to improve performance, the implementation of certain relational algorithms has shifted towards hardware awareness and fine-tuning to the new features and topologies of modern machines [Bal14, Mue16, WS11]. Optimal use of resources now requires detailed knowledge of the underlying hardware (memory affinities, cache hierarchies, interconnect distances, CPU/core/die layouts, etc.). Absorbing such hardware complexity has now become a burden for the programmer, and the problem gets further aggravated by the increasing diversity of microarchitectures. While this is a viable approach in the short run, it does not scale in the long term with the pace at which hardware and workload complexity are evolving.

1.2.2 Deployment trends

On the deployment side, in an age of virtualization and consolidation, databases can no longer assume they have a complete physical machine to themselves. Databases are increasingly deployed on hardware shared with other applications: in virtual machines, multi-tenant hosting scenarios, cloud platforms, etc. As a result, the carefully constructed internal model of machine resources a DBMS uses to plan its execution has become highly dependent on the runtime state of the whole machine. This state, however, is unknown to the database and is currently only available to the operating system.


As we will show with our experiments, even a single task can impact the performance of an otherwise scalable database engine, mainly because the DBMS is unaware of it. Hence, good performance in the presence of other applications requires the database to have an accurate picture of the runtime state of the whole machine.

1.3 Problem statement

This dissertation presents a fresh look at the interface and co-design of data-processing engines and operating systems in the light of new hardware architectures, and at the design of novel database and operating system architectures. Even though the main focus of this thesis is on scheduling and management of CPU resources for analytical workloads, we believe that it establishes the basis for more general cross-layer optimizations that could be applied to a wide range of resources and application properties. Some of the research questions we address are:

1. Given a data processing system and an OS, both designed to fully exploit multicore hardware, what is the best interface and knowledge distribution between them? Who knows what, and where should the knowledge reside?

2. How can the operating system help database engines deal with machine diversity and internal resource heterogeneity?

3. How can the OS policies be improved using knowledge from the data processing layer to deliver efficient resource utilization?

4. What mechanisms should an operating system provide in order to meet the requirements of modern workloads?

5. What are the recommended changes to data-processing systems so that they benefit from the richer interface with the OS? Which components can immediately leverage the newly introduced mechanisms and policies in the operating system?

The rest of the dissertation is structured such that the first part targets the knowledge distribution, its impact on the OS policies, and the information exchange over the DB/OS interface. The second part focuses on the need for efficient scheduling of modern analytical workloads, and revisits the OS architecture and process model to find a better match for dynamic workload requirements.


1.4 Contributions

In order to address the research questions and the challenges listed above, we have made the following contributions. On a conceptual level, our changes to the operating system layer can be grouped into: (i) contributions to the internal reasoning in the OS and its computation of the resource allocation policies, as well as enabling a richer information flow with the database layer; and (ii) revisiting the services and mechanisms offered by the OS and adapting them to better suit the requirements of modern data processing workloads.

1.4.1 Policies and information flow

The first part of the thesis addresses the big semantic gap between the knowledge available to the operating system and the DBMS.

Policy engine

The OS policy engine is designed such that the system itself, and the applications running on top of it, can better handle the complexity, address the challenges, and reason about the properties of the internal hardware resources and the diversity of machines on the market. In particular, it consists of a knowledge base that contains information about (i) machine-specific facts (e.g., the topology of the machine, the number of cores and the amount of memory per NUMA node, details about the cache hierarchy), (ii) application-related facts (e.g., information about whether an application is compute- or memory-bound, or sensitive to sharing caches), and (iii) information about the current system state (e.g., the number of active applications and their resource usage).

Figure 1.1: Overview of affected or new OS components

This knowledge is used both by the knowledge base to build a detailed model of the multicore machine and its resources, and by a set of algorithms and solvers that reason about it to compute resource allocation schedules. The resource manager is the active component of the policy engine, responsible for communicating with the applications, triggering resource allocation computations in the knowledge base, and implementing the output policies by invoking various OS mechanisms. Finally, the policy engine relies on a resource profiler to capture resource capacities (e.g., by measuring the maximum attainable bandwidth on the interconnect links, or estimating the local DRAM bandwidth per NUMA node), to monitor the current utilization of resources, and to enable applications to learn their resource requirements.

Declarative DB/OS interface

The proposed interface is declarative and allows for a richer two-way information exchange between the DBMS and the OS policy engine. For example, it allows applications to (i) push part of their logic (expressed as cost models or stored procedures) down to the OS, together with (ii) information about their properties. That way both layers can reason over all the information present in the knowledge base. For example, the OS can do a better job when deploying the application's threads onto a range of different machines (as we show in Section 2.5.2), and provide efficient resource allocation without affecting the application's performance or predictability (discussed in Section 3.5); and the database can adapt itself based on the current system state. The interface also allows applications to (iii) query the OS policy engine about hardware properties, the machine model, or the current system state, and (iv) subscribe for notifications in case of changes in the global resource allocation.

1.4.2 Customized OS support for data processing

In the second part I show the benefits of customizing the operating system for the needs of data processing workloads.


Revisiting the process model

Recent developments in operating systems [BBD+09, WA09] enable us to configure and specialize the operating system stack (i.e., apply changes in both kernel- and user-space) for particular application classes. Such customizations across the system stack have been shown to bring significant performance and security benefits (although at the moment limited to a single address space [MMR+13]). In this part of the dissertation, I explore how to customize the underlying OS process model to better fit the requirements of concurrent parallel analytical workloads. More specifically, I propose extending the thread-based process model consisting of OS processes, threads, and user-level threads to also support the OS task and ptask (parallel task) as program execution units. This way the application can explicitly specify that a certain job needs to be executed to completion without interruption – an OS task – and that a certain pool of user-level threads executes a common parallel job and should be co-scheduled until completion – an OS ptask. The proposed changes are implemented as a kernel-based runtime (Basslet), which can execute parallel analytical jobs on behalf of existing applications. I also show how using Basslet can resolve many resource interference problems across different workloads and improve the system throughput.
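To make the proposed execution units more tangible, the sketch below shows what submitting a parallel job as a ptask could look like from the application's perspective. The ptask_* names and signatures are my own illustrative assumptions, not the actual Basslet API; to keep the example self-contained and runnable, the compute plane is mocked with plain pthreads, whereas the real kernel-based runtime would additionally pin the workers to one NUMA node, co-schedule them, and run them to completion without preemption.

/*
 * Illustrative sketch only: the ptask_* names and signatures are assumptions,
 * not the thesis' actual Basslet API. The "compute plane" is mocked here with
 * plain pthreads so the example is self-contained (compile with -lpthread).
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*ptask_fn)(void *arg, int worker_id, int nworkers);

typedef struct {
    ptask_fn   fn;
    void      *arg;
    int        nworkers;
    pthread_t *workers;
} ptask_t;

typedef struct { ptask_t *t; int id; } worker_ctx_t;

static void *worker_main(void *p)
{
    worker_ctx_t *ctx = p;
    ctx->t->fn(ctx->t->arg, ctx->id, ctx->t->nworkers);
    free(ctx);
    return NULL;
}

/* Describe a parallel job: one function executed by a pool of nworkers. */
ptask_t *ptask_create(ptask_fn fn, void *arg, int nworkers)
{
    ptask_t *t = malloc(sizeof(*t));
    t->fn = fn; t->arg = arg; t->nworkers = nworkers;
    t->workers = malloc(sizeof(pthread_t) * nworkers);
    return t;
}

/* Submit the job; the caller stays on the "control plane". */
int ptask_submit(ptask_t *t)
{
    for (int i = 0; i < t->nworkers; i++) {
        worker_ctx_t *ctx = malloc(sizeof(*ctx));
        ctx->t = t; ctx->id = i;
        if (pthread_create(&t->workers[i], NULL, worker_main, ctx) != 0)
            return -1;
    }
    return 0;
}

/* Wait until the whole parallel job has completed. */
int ptask_wait(ptask_t *t)
{
    for (int i = 0; i < t->nworkers; i++)
        pthread_join(t->workers[i], NULL);
    free(t->workers);
    free(t);
    return 0;
}

/* Example use: a parallel scan where each worker handles one partition. */
static void scan_partition(void *arg, int worker_id, int nworkers)
{
    (void)arg; (void)nworkers;
    printf("worker %d scanning its partition\n", worker_id);
}

int main(void)
{
    ptask_t *job = ptask_create(scan_partition, NULL, 4);
    if (ptask_submit(job) != 0)
        return 1;
    ptask_wait(job);           /* returns once all workers are done */
    return 0;
}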

Adaptive OS architecture

The novel Badis OS architecture splits the machine's resources into a control and a compute plane. The control plane runs the full-weight OS stack (FWK), while the compute plane consists of customized light-weight kernels (LWKs). The compute plane kernels provide selected OS services tailored to a particular workload and a noise-free environment for executing parallel jobs on behalf of the applications running on the control plane's FWK. For the purpose of this dissertation, I used a customized OS stack for executing analytical workloads – Basslet. It uses the newly introduced OS program execution units (task and ptask) for task-based co-scheduling of parallel jobs. The Basslet kernels run on the compute plane, alongside the FWK on the control plane, which offers traditional thread-based scheduling of application threads. The boundary between the two planes, as well as the set of compute plane kernels, can be changed at runtime depending on the requirements of the workload mix. Such a dynamic architecture makes the system stack suitable for scheduling hybrid workloads (e.g., operational analytics), where different kernels can co-exist at the same time, each one customized for a particular workload.


1.5 Thesis outline

The rest of the thesis describes the benefits of the OS changes in the context of, and as used by, several components of a parallel data processing system.

Chapter 2: Storage Engine
In this chapter we present COD, a system that combines a database storage engine, which is highly customized for executing analytical queries and updates with predictable runtime guarantees, with the OS policy engine, which computes and suggests deployment configurations for the database storage engine in different execution environments. COD's enhanced DB/OS interface enables the storage engine to offload to the OS policy engine the challenge of dealing with diverse hardware architectures and of finding suitable resource allocations in noisy and dynamic environments.

Chapter 3: Execution Engine
This chapter of the dissertation describes how the execution engine of a modern database system can benefit from information exchange with the OS policy engine for efficient deployment of complex query plans on multicore machines. By combining knowledge from both systems, the policy engine proposes a thread-to-core placement that delivers maximum performance and predictability, minimizes the resource utilization, and is robust across different server architectures.

Chapter 4: Optimizer
The material presented in this chapter focuses on an important problem today's database optimizers face when executing modern analytical workloads: determining a suitable degree of parallelism in concurrent scenarios for data-intensive jobs when executed on modern multi-socket multi-core machines. In particular, with an empirical study we explore the complex interplay between the chosen (i) degree of multiprogramming (concurrency), (ii) degree of parallelism of the individual jobs, as well as the (iii) corresponding placement of threads onto cores, so that we maximize the system-wide throughput and minimize the per-job variance when executed in noisy environments.


Chapter 5: Scheduler
In Chapter 5 we show the benefits of using a customized OS stack as part of the Badis OS architecture when executing multiple parallel jobs belonging to the same or different analytical applications. We discuss the limitations of the existing OS interfaces and process model for executing parallel data-intensive jobs, and propose introducing support for the task and ptask (parallel task) program execution units in the new compute plane kernel. Additionally, we discuss the design principles of the kernel-integrated runtime scheduler, and demonstrate their effects when running graph kernels on top of a popular OpenMP-based graph analytics framework.

Finally, in Chapter 6 we outline a few opportunities for future work before concluding the dissertation with a short summary in Chapter 7.

1.6 Related publications

Part of the work in the thesis has already been covered in the following publications:

[GSS+13] Jana Giceva, Tudor-Ioan Salomie, Adrian Schuepbach, Gustavo Alonso, and Timothy Roscoe. “COD: Database/Operating System Co-Design”. In 6th Biennial Conference on Innovative Data Systems Research (CIDR ’13). Asilomar, CA, USA, January, 2013 (Online Proceedings).

[GARH14] Jana Giceva, Gustavo Alonso, Timothy Roscoe, and Tim Harris. “Deployment of Query Plans on Multicores”. In Proceedings of the VLDB Endowment (PVLDB), November 2014, Vol. 8, no. 3, pp. 233–244.

[GZAR16] Jana Giceva, Gerd Zellweger, Gustavo Alonso, and Timothy Roscoe. “Customized OS support for data-processing”. In Proceedings of the 12th International Workshop on Data Management on New Hardware (DaMoN ’16). San Francisco, CA, USA, June 2016, 2:1–2:6.

2 OS policy engine and adaptive DB storage engine

The work in this chapter explores the following research questions:

• How can the operating system help data-processing applications deal with machine diversity and internal resource heterogeneity?

• How can we extend the database/operating system interface so that they exchange more meaningful information?

• How can the OS understand database-specific properties and requirements?

In particular, we show how the interaction between data processing applications and operating systems can be improved by allowing a richer (declarative) interface for mutual information exchange. The goal is to integrate the extensive internal knowledge of the database on its resource requirements (including cost models) into the operating system. That way the OS can reason about the database in addition to its system-wide and runtime view of the available hardware configuration and application mix, to provide better management of the shared resources, and act as a homogeneous hardware driver across architectural differences.


Our research prototype, COD (Co-design of an Operating system and a Database engine), combines a database storage engine designed to operate well on multicore machines with an OS policy engine in charge of making suggestions and decisions regarding the deployment aspects of the database.

The content of this chapter has been published at the 6th Biennial Conference on Innovative Data Systems Research (CIDR) in 2013 [GSS+13]. The work was done in collaboration with Tudor Salomie and Adrian Schuepbach. Salomie developed the database storage engine that was used in our prototype (CSCS [AKSS12]), while Schuepbach's work on the system knowledge base of Barrelfish (SKB) is the basis for the policy engine [SPB+08].

2.1 System Overview

This work makes several contributions vertically crossing the system stack in order to address the research questions listed above. The key contribution is the interface between the database and the operating system. Using it for knowledge exchange with the OS policy engine, the storage engine of the database can make efficient use of the available system resources even in a dynamic, noisy environment where it shares the machine with other applications. More concretely, the architecture of the system is shown in Figure 2.1. In order to leverage the benefits of the interface, both sides of the system stack need to be modified. Our prototype system, COD, is built using two experimental engines for which we had access to the source code. However, the design ideas and principles discussed in the rest of the chapter can be easily generalized and applied to other types of systems.

The storage engine we modified is a main-memory, column-oriented, shared-scan engine (marked (1) in Figure 2.1). Its main design goal is achieving robust performance and in particular predictable latencies even for unpredictable workloads [UGA+09]. As a result, its performance can be precisely controlled with a few parameters. Section 2.2 presents its design and main characteristics, before discussing how we modify it for richer communication with the operating system and better adaptivity based on the notifications received from the OS policy engine.

The second building block is the OS policy engine, a service provided by the operating system, marked (2) in Figure 2.1. Its main purpose is to unify the knowledge of available hardware resources (e.g., cores and NUMA domains) known to the operating system with information provided by the applications running on top.

Figure 2.1: COD's architecture. Shaded blocks denote the main components.

As a result, the operating system can better orchestrate the resource allocation among all running tasks/applications, by being aware of how its decisions affect their objectives. Additionally, it also serves as an OS service that applications can:

• Query for details on hardware specifications or the current system state;

• Rely on for absorbing the hardware complexity and diversity, and for providing suitable deployment suggestions; and

• Use to translate the application's internal characteristics into resource requirements, i.e., terms that the OS can reason about.

We discuss its internal building blocks in Section 2.3. Finally, between the two systems is a rich query-based interface. Even though its primary purpose is to allow the exchange of system-level information, the interface also enables the creation of application-specific stored procedures. By pushing such functions to the OS policy engine an application such as our database storage engine would be able to exploit both the Boolean Satisfiability Problem (SAT) solver and optimizer as well as

13 Chapter 2. OS policy engine and adaptive DB storage engine the overall knowledge available on the OS side. Section 2.4 describes the properties of the interface in more detail.
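As a rough illustration of the style of interaction this interface enables, the sketch below shows how a storage engine might (i) push part of its logic down as a stored procedure, (ii) submit properties as hints, and (iii) query for a deployment suggestion. The pe_* functions are hypothetical stand-ins, not the actual COD API, and are stubbed out so the example is self-contained; the pushed cost model uses the coefficients that are derived later in Section 2.2.4.

/*
 * Hypothetical illustration of the declarative DB/OS interface; the pe_*
 * names are NOT the actual COD API. The policy engine is mocked with stubs so
 * the example is self-contained; in COD the calls would reach the OS-side SKB.
 */
#include <stdio.h>
#include <string.h>

static int pe_set_property(const char *key, const char *value)
{
    printf("property: %s = %s\n", key, value);       /* stub */
    return 0;
}

static int pe_push_function(const char *name, const char *body)
{
    printf("stored procedure %s: %s\n", name, body); /* stub */
    return 0;
}

static int pe_query(const char *query, char *result, size_t len)
{
    printf("query: %s\n", query);                    /* stub */
    strncpy(result, "cores([4,5,6,7])", len - 1);    /* canned answer */
    result[len - 1] = '\0';
    return 0;
}

int main(void)
{
    /* (ii) system-level properties the OS can reason about directly */
    pe_set_property("app", "cscs-storage-engine");
    pe_set_property("cpu_bound", "true");
    pe_set_property("numa_sensitive", "true");

    /* (i) application logic pushed down as a stored procedure: the response
     * time cost model of Section 2.2.4, evaluated inside the policy engine */
    pe_push_function("cscs_rt",
        "rt(Cores, Tuples, Requests, RT) :- "
        "RT is (Tuples / Cores) * (0.85 * Requests + 601) / 3750000");

    /* (iii) ask for a deployment suggestion computed from global knowledge */
    char answer[64];
    pe_query("min_cores(cscs_rt, slo(350))", answer, sizeof(answer));
    printf("suggested core allocation: %s\n", answer);
    return 0;
}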

2.2 DB Storage engine

The most important properties that a database component needs to have in order to be able to fully benefit from the interaction with the operating system are the following:

1. A working unit (e.g., a thread) which is typically mapped to a hardware context, especially for multithreaded operations, enables flexible and elastic use of resources. This is important as it allows the application to adjust itself based on recommendations from the operating system. It could be extended to other types of resources (e.g., network channels, memory, etc.), but such an analysis is out of the scope of this work.

2. Information about the application's properties and resource requirements. For example, a cost model that predicts the impact of resource allocation on the performance of each application's task. This is important as it informs the operating system about the needs and sensitivity of the database jobs. Consequently, the OS policy engine can use this information when computing the global allocation of resources among different application mixes.

This section presents the DB storage engine that was used in our prototype. It was originally designed and implemented by Tudor Salomie, and the details are provided in the technical report for Clock-Scan Column Store (CSCS) [AKSS12]. After covering its basic properties and why it satisfies the required features, we describe how we adjusted it for better interaction with the operating system policy engine.

2.2.1 The architecture of the storage engine

Traditionally, database storage managers use pages and memory blocks as the unit of exchange with the rest of the system stack. Their primary function is to manage the buffer pool, which is used by all transaction threads when accessing data from memory. The storage manager enforces the corresponding mechanisms to ensure transactional durability (e.g., flushing dirty pages to disk when needed) and isolation (e.g., enforcing locking), in addition to prefetching and replacing pages to minimize the number and cost of page misses. Very often the performance of the DBMS heavily depends on the performance of its storage engine, as it controls the policies for internal memory allocation and page replacement.

Storage managers are an important component of DBMSs and they need to arbitrate among many concurrent transactions that compete for various resources. In NUMA multicore architectures with large main memories, the dominant aspects of execution time are no longer the size of memory chunks and the management of pages, but rather the thread-to-core placement and the memory affinities used when allocating data on the available NUMA nodes. As a result, these challenges are currently a hot topic in both the research and industrial communities.

In particular, CSCS was built as a storage engine that exposes a SQL interface instead of pure memory blocks and pages. Its design was motivated by the Crescando storage engine [UGA+09], which was tailored for the airline industry. Like many state-of-the-art systems (e.g., MonetDB [BKM08], C-Store [SAB+05], etc.), CSCS is a main memory engine that leverages columnar storage for more efficient processing of analytical workloads due to improved data locality. For example, with a column store, if a query involves only a few columns of a table, only a fraction of the data has to be brought into the processor's cache.

The CSCS engine achieves good throughput and predictable response times as a result of having the following characteristics:

1. Batching incoming requests. Instead of executing each transaction alone (by its own thread), the CSCS engine batches the incoming requests and processes them as a group – an idea increasingly applied in operations like scans (Crescando [UGA+09], IBM Blink [RSQ+08, HKL+08]), which have been extensively studied [ZHNB07, SBZ12], joins (CJOIN [CPV09, CPV11], MQ-Join [MGAK16]), or complete query processing engines (SharedDB [GAK12], DataPath [ADJ+10]).

2. Shared execution. This is an example of a multi-query optimization technique [Sel88]. The idea is to simultaneously process a group of queries on the same table. Similar to IBM Blink and Crescando, CSCS avoids static indexes on the data. As a result, these systems do not pay the performance penalty of updating many index data structures during insert/update requests. Instead, by only scanning the data for answering the queries and applying the updates in-place, these systems can offer an upper bound on the response time of each transaction – equivalent to the cost of a full scan over the data. With modern hardware a single scan thread can answer thousands of requests at a time. The shared scan is CPU bound, as its performance is limited by the time to perform predicate evaluation (typically very CPU intensive due to many string-compare operations) on the subset of records already in the processor's cache, rather than by the DRAM latency to bring the tuples in and out of the cache.

Figure 2.2: CSCS architecture

Similar to Crescando, SharedDB, and DataPath, the CSCS storage engine maps individual operators (e.g., the scan threads) to cores and well-defined memory regions. Therefore, by design there is no interaction between the operator threads beyond the necessary data flow. This makes the system more flexible in terms of resource allocation.

Figure 2.2 shows the CSCS architecture. Incoming requests are enqueued in the input queue, where they are batched and indexed based on their predicates. The scan threads perform full data scans (similar to the ClockScan of Crescando), each one responsible for its own (horizontal) partition of the dataset. After completing a scan and merging the results, the scan threads coordinate before they read a new batch of requests and begin a new data scan phase. As soon as the resulting tuples are processed by the merging and aggregation thread, they are pushed into the output queue. More details on the system can be found in the corresponding technical report by Salomie et al. [AKSS12].
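The following sketch is my own simplification of the shared-scan idea (not the actual CSCS code, which probes a predicate index rather than iterating over all requests): one pass over a thread's partition evaluates the entire batch of requests, which is what lets a single scan amortize its cost across the whole batch.

/*
 * Simplified sketch of a shared scan over one horizontal partition; an
 * illustration of the idea only, not the actual CSCS implementation.
 */
#include <stdio.h>
#include <stddef.h>

typedef struct { int key; int payload; } tuple_t;      /* toy record          */
typedef struct { int id; int key_pred; } request_t;    /* point query: key == */

/* One scan pass: every batched request is evaluated against every tuple of
 * this thread's partition, so one pass over the data answers the whole batch. */
static void shared_scan(const tuple_t *partition, size_t ntuples,
                        const request_t *batch, size_t nrequests)
{
    for (size_t i = 0; i < ntuples; i++)
        for (size_t r = 0; r < nrequests; r++)
            if (batch[r].key_pred == partition[i].key)
                printf("request %d matched tuple with key %d\n",
                       batch[r].id, partition[i].key);
}

int main(void)
{
    tuple_t partition[] = { {1, 10}, {2, 20}, {3, 30}, {4, 40} };
    request_t batch[]   = { {0, 3}, {1, 1} };   /* two point queries */
    shared_scan(partition, 4, batch, 2);
    return 0;
}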

2.2.2 Working unit and its properties

As mentioned in Section 2.2, it is important to identify the working unit of a database component in order to allow for elastic and adaptable deployment based on recommendations from the operating system. In the case of the CSCS storage engine, such a working unit is the scan thread. To ensure good data locality, each scan thread is pinned to a core in the NUMA region where its data partition resides. The columnar storage of the CSCS engine ensures good data locality during the scan by minimizing L1 data cache misses. Therefore, the CSCS scan threads are CPU bound, but also NUMA-sensitive. Furthermore, as all requests within a batch are processed in the same scan, the response time for all of them is bounded by the time it takes to perform a scan over all data partitions and match each tuple against the indexed request predicates before merging the results.

2.2.3 Properties of the CSCS storage engine

The number of scan threads assigned to the CSCS storage engine is determined by the amount of data in the system, expected peak throughput of incoming client requests, and the bounds for response time for processing a request (i.e., the service-level objectives (SLOs) for response time). Given these parameters, the number of scan threads can be derived, making the performance of the CSCS engine both scalable and predictable.

1. The scalability property of the CSCS engine is a consequence of its ability to achieve an almost linear decrease in the response time when increasing the number of scan threads for a constant data size. Since the scan threads are synchronization free, except when coordinating before fetching a new batch of requests, the number of scan threads in the system has a negligible effect on the communication costs.


2. The predictability property is due to the CSCS engine's ability to guarantee an upper bound on the response time of any request that it receives. There are two major reasons for this. First, for each request (query or update) it does a full table scan, and does not optimize the latency of particular requests by building indexes on the data. As a result, workloads with a large number of update requests become a lot cheaper, as there is no longer a need to maintain up-to-date indexes. This makes the latency of individual requests more expensive, but predictable: we always know how long it takes to scan the whole dataset. Second, by operating on a whole batch of requests at a time, all of them can be handled with a single full scan over the data. Therefore, at almost the same cost we can significantly increase the throughput and compensate for the increased latency of each individual query. More details on this trade-off are discussed in Crescando [UGA+09].

2.2.4 Embedding into COD

So far I have discussed the properties of the CSCS engine, and shown that it satisfies the first requirement of having a working unit that is easily deployable, which also makes the storage engine scalable. This section describes the second requirement: providing information about the application's properties to the operating system. In the case of the CSCS engine and its scan threads, we provide a cost model that can be used to determine the type and amount of resources needed such that the storage engine can meet its response time SLOs. From now on we will assume, without loss of generality, that the system's SLO is an upper bound on the response time of all requests.

While determining the appropriate cost model for a particular engine is a problem orthogonal to the primary focus of this section, we would like to note that deriving one is fairly common practice in the design and deployment of any database (especially for the optimizer). Therefore, I do not consider it unreasonable to expect such a cost model as part of a system co-design with the operating system.

For the purpose of our prototype (COD), we focus on a workload comprising both update and read-only queries. It is inspired by the operational business intelligence (BI) workload from the travel industry, which contains a high number of both queries and updates [UGA+09]. In particular, for our experiments we use the same workload from Amadeus as in Crescando. It consists of highly selective requests (i.e., point queries and updates matching a few records). The dataset is stored in a single table, where each record is approximately 315 bytes in size (fixed) and contains 47 attributes.

Deriving the cost model for CSCS

The cost model for the CSCS scan can be derived as follows. For each scanned tuple (record), the scan thread probes the predicate index of the batched requests. If successful, it checks whether the other predicates of the query/update are also satisfied. Therefore, the cost per tuple can be computed as y · #requests + z, where y and z denote coefficients that can be derived empirically and depend on the workload and machine characteristics. The number of tuples processed by each scan thread corresponds to the size of its horizontal partition, #tuples/#cores, where #cores denotes the number of scan threads and #tuples represents the total number of tuples in the dataset. Therefore, the cost model that describes the response time (RT) of the CSCS engine as a function of the number of scan threads (#cores), the dataset size (#tuples), and the batch size per thread (#requests) is:

RT = x · (#tuples / #cores) · (y · #requests + z)    (2.1)

For the AMD MagnyCours machine (introduced in Section 2.5.1) and this particular workload (3.75 million records, and 4:1 ratio of queries vs. updates), the coefficients are:

x = 1/3750000, y = 0.85, and z = 601

Constants y and z are machine and workload dependent, while x is used to normalize the total number of tuples as used in the calibration experiment. Using this model, the CSCS engine can delegate the decision for determining the minimum number of cores needed for its scan threads’ deployment to the operating system. Given the cost model, the OS has a better idea of what the storage engine needs and can find the best possible match to meet those requirements given the characteristics of the underlying hardware and the current system state.
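Assuming the calibrated coefficients above, the following minimal sketch (compile with -lm) shows how either side could use the model: evaluating Equation 2.1 for a given allocation, or inverting it to obtain the smallest number of scan threads that still meets a response-time SLO. The SLO and batch size in main() are arbitrary example values, expressed in the same time units as the calibration.

/*
 * Minimal sketch of using the cost model of Equation 2.1; the coefficients
 * are the ones calibrated above, the SLO and batch size are example values.
 */
#include <math.h>
#include <stdio.h>

static const double X = 1.0 / 3750000.0;   /* normalization coefficient */
static const double Y = 0.85;              /* per-request cost          */
static const double Z = 601.0;             /* per-tuple base cost       */

/* Predicted response time for a given number of scan threads (Equation 2.1). */
static double response_time(double ntuples, int ncores, double nrequests)
{
    return X * (ntuples / ncores) * (Y * nrequests + Z);
}

/* Smallest number of scan threads whose predicted RT stays within the SLO. */
static int min_cores_for_slo(double ntuples, double nrequests, double slo)
{
    return (int)ceil(X * ntuples * (Y * nrequests + Z) / slo);
}

int main(void)
{
    double ntuples = 3750000;   /* dataset size used in the calibration     */
    double nrequests = 512;     /* example batch size per scan thread       */
    double slo = 350.0;         /* example response-time bound (same units) */

    int cores = min_cores_for_slo(ntuples, nrequests, slo);
    printf("need %d scan threads; predicted RT = %.1f\n",
           cores, response_time(ntuples, cores, nrequests));
    return 0;
}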


For more concrete examples on how to derive cost models for complex database operators for hierarchical memory systems, I refer the reader to the work by Manegold et al. [MBK02].

Making the storage engine elastic

Finally, the CSCS storage engine was extended to be both proactive and reactive, i.e., in addition to stating its requirements, it can also receive notifications from the OS about any changes in the resource allocation. Therefore, before starting a new scanning phase, the parent scan thread checks for updates from the operating system on whether it needs to re-configure its deployment. If so, it re-organizes the data partitions, spawns or kills scan threads as required, or migrates scan threads and their associated data partitions across the allocated cores. In order to make a good decision, it can query the operating system policy engine for further instructions or information. We discuss this in more detail with a concrete use case in Section 2.5.4. After the re-organization is done, the main thread resumes the execution of all other scan threads, which now operate on the new batch of requests.
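The control flow described above might look roughly as follows; all helper functions are hypothetical stubs (not the CSCS/COD implementation), kept trivial so that the sketch is self-contained and runnable.

/*
 * Illustrative sketch of the elastic scan loop; all helpers are hypothetical
 * stubs, not the actual CSCS/COD implementation.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int ncores; } allocation_t;

static int phase = 0;

/* Stub: pretend the policy engine grants two extra cores before phase 3. */
static bool check_os_notification(allocation_t *a)
{
    if (phase == 3) { a->ncores = 6; return true; }
    return false;
}

static void reorganize_partitions(const allocation_t *a)
{
    printf("re-partitioning data for %d scan threads\n", a->ncores);
}

static void adjust_scan_threads(const allocation_t *a)
{
    printf("spawning/killing/migrating scan threads (target: %d)\n", a->ncores);
}

static void run_scan_phase(void)
{
    printf("scan phase %d: processing one batch of requests\n", phase);
}

int main(void)
{
    allocation_t new_alloc;

    for (phase = 0; phase < 5; phase++) {
        /* Before each scan phase, check whether the OS pushed a new
         * resource allocation (e.g., more or fewer cores). */
        if (check_os_notification(&new_alloc)) {
            reorganize_partitions(&new_alloc);
            adjust_scan_threads(&new_alloc);
        }
        run_scan_phase();  /* all scan threads process the next batch */
    }
    return 0;
}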

2.3 The OS Policy engine

The main idea of our work is to extend traditional operating systems with new functionality that allows for a richer interface between the OS and the applications running on top. One important challenge that we address is the distribution of knowledge in the system. For instance, the operating system has information about the state of the system resources, while applications such as the CSCS storage engine have detailed knowledge about their workloads and algorithms. An important observation is that in order for the operating system to be able to make optimal resource allocation decisions, and for the CSCS engine to be able to make optimal use of the available resources, both need access to all of this knowledge. The challenge is where to place it. On the one hand, centralizing it in the operating system requires that the OS take "on trust" information from the application and perform certain optimizations on behalf of the application itself. On the other hand, placing it in the application (in our case the CSCS engine) prevents the operating system from making global resource decisions. We address this problem as follows:


1. The operating system maintains detailed information about the underlying hardware (cores, caches, NUMA interconnect, performance trade-offs, etc.) and its own state (resource allocations, load, applications running, etc.) in a rich framework that enables reasoning about this information to make medium-term (i.e., in the order of a few seconds) policy decisions, such as the spatial placement of OS and application tasks onto cores.

2. The OS also offers this functionality to the applications via the new declarative interface (Section 2.4). The operating system can then perform complex application-specific calculations on that state on behalf of the CSCS engine or other applications, allowing them to optimize the usage of their own resources. It also enables applications like the CSCS engine to reason about the complete system state, without the cost of collecting and maintaining the same state in the application itself. This becomes especially useful with the facility for explicit notifications to the application when the allocation has changed.

3. Finally, the operating system also allows the applications to submit hints about their properties and requirements, which are stored along with the system state.

In our prototype COD, we refer to this part of the system as the Policy Engine and view it as part of the operating system.

2.3.1 Architecture

The new OS module is structured as shown in Figure 2.3 and consists of two components: (1) the Resource Manager (RM) and (2) the System Knowledge Base (SKB). We borrow the design of the SKB from the Barrelfish OS, and more details about it and its widespread use can be found in the PhD thesis by Adrian Schuepbach [Sch12]. In short, the SKB stores information in the form of free-form predicates in a Constraint Logic Programming (CLP) engine. This allows reasoning over this information by issuing logical queries, extended with facilities for constraint solving and optimization.

Figure 2.3: Functionality that was added to the OS policy engine. The Resource Manager (RM) is the active component that triggers computation in the SKB and communicates with applications. The System Knowledge Base (SKB) stores information about the system and now also understands application-specific properties.

The knowledge stored in the SKB can be put into two categories:

1. System-level facts. The SKB is populated with information from the operating system about (i) the underlying hardware platform, and (ii) the system state. It

21 Chapter 2. OS policy engine and adaptive DB storage engine

CSCS engine

OS Policy engine Constraint solver and optimizer 2 SKB

1 Resource Manager System-level properties CSCS-specific (RM) Hardware architecture properties properties System-level properties of the CSCSengine

Hardware

Figure 2.3: Functionality that was added to the OS policy engine. The Resource Manager (RM) is the active component that triggers computation in the SKB and communicates with applications. The System Knowledge Base (SKB) stores information about the sys- tem and now also understands application specific properties.

It obtains the former at startup by resource discovery and online micro-benchmarks. Example data includes the hardware topology, memory hierarchy, core to NUMA node affinities, etc. The system state information is kept by bookkeeping the set of running tasks and their spatial assignments to cores. The SKB can also store other named constraints and inference rules. We refer to these collectively as system-level facts. Applications like the CSCS engine can then issue queries to the SKB to retrieve information about the current resource allocation. More importantly, they can also submit more complex queries that allow both the SKB and the application itself to optimize the resource allocation and execution based on this information.

2. Application-specific facts. The SKB can also be populated with information that is specific to its applications. In particular, programs like the CSCS engine can submit system-level properties (as "hints") for resource allocation to the SKB in the form of additional constraints, which the OS can take into account when allocating resources (e.g., CPU-bound, number of cores).


Note that this is not intended to override OS policies, even though it may influence them, as it provides the operating system with additional application-specific information. As a result, the OS policy engine can select the best option from a set of several policy-compliant alternatives. The application-level facts, however, can be part of an application's domain knowledge, which may often not be understood by the operating system (e.g., the number of tuples). In such cases, these facts can be used by application-specific stored procedures (explained later) to compute the desired system-level properties. We discuss concrete use-cases in Section 2.5.

The SKB is the reactive component of the policy engine. It can be seen as a repository and calculation engine for the system knowledge. It is complemented by the resource manager (RM), which is the active component. It implements the allocation policies as computed by the SKB. On each registered change in the system environment (e.g., a new application registers, a task terminates, an application's properties change) the resource manager triggers a re-computation of the global allocation within the SKB. As soon as the re-computation is done, the resource manager invokes the required OS mechanisms and implements the desired OS policy. It is also responsible for notifying the registered applications about the new resource allocation decisions. As the reader has probably already realized, the OS policy engine and its computations are not intended to be on the critical path, neither in the operating system nor in the application, i.e., they should not delay regular operations in either system. The main purpose is to provide the means to calculate and reason about medium-term policies, such as thread placement or data partitioning, based on global system knowledge.
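To make the interplay between the two components more concrete, the following C sketch outlines the event-driven control flow just described: every registered change triggers a re-computation in the SKB, after which the resource manager enacts the new policy and notifies the affected applications. The event names and helper functions (skb_recompute_global_allocation, apply_os_mechanisms, notify_registered_applications) are hypothetical stand-ins introduced only for this illustration and do not correspond to actual COD symbols.

#include <stdio.h>

/* Hypothetical event types the resource manager reacts to. */
typedef enum {
    EVT_APP_REGISTERED,
    EVT_TASK_TERMINATED,
    EVT_PROPERTIES_CHANGED
} rm_event_t;

/* Stubs standing in for the SKB computation and the OS mechanisms. */
static void skb_recompute_global_allocation(void) {
    puts("SKB: recomputing the global allocation plan");
}
static void apply_os_mechanisms(void) {
    puts("RM: pinning threads / enacting the new policy");
}
static void notify_registered_applications(void) {
    puts("RM: sending upcalls with the new core assignments");
}

/* The RM is the active component: every registered change in the system
 * environment follows the same path: recompute in the SKB (off the
 * critical path), apply the result, and notify the applications. */
static void rm_handle_event(rm_event_t evt) {
    (void)evt;
    skb_recompute_global_allocation();
    apply_os_mechanisms();
    notify_registered_applications();
}

int main(void) {
    rm_handle_event(EVT_APP_REGISTERED);
    return 0;
}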

2.3.2 Implementation

The OS policy engine is implemented on the Linux kernel, version 2.6.32, for 64-bit machines. Note that there is almost nothing that prevents us from porting it to an operating system with more explicit allocation of resources, such as the Barrelfish operating system. The resource manager populates the SKB with the required information about the hardware architecture, and triggers periodic re-computations of the system state and resource allocation based on the current environment and the applications running on top. The resource manager primarily implements the spatial scheduling policies by means of thread pinning. In our prototype, the RM mediates between the SKB and the CSCS engine, and informs the CSCS of changes in the resource allocation by means of upcalls.
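Since the spatial policies are enforced through thread pinning, it may help to recall what that mechanism looks like on Linux. The snippet below pins the calling thread to a given core using the standard pthread_setaffinity_np call; it is a generic illustration of the mechanism, not code taken from the COD prototype.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Pin the calling thread to a single core; returns 0 on success. */
static int pin_self_to_core(int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void) {
    int err = pin_self_to_core(0);
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        return 1;
    }
    printf("now running on core %d\n", sched_getcpu());
    return 0;
}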


The SKB side of the policy engine is implemented using the ECLiPSe CLP engine [AW07], and runs as a system service. Even though ECLiPSe is expressive, convenient and easy to use, executing complex queries can sometimes be (prohibitively) slow.

2.3.3 Discussion

Even though the computation of a global allocation plan is done periodically and is off the critical path, it is still important to complete the computation in a reasonable time. Unfortunately, the current implementation has not been optimized to restrict the ECLiPSe solver in any way when searching for valid resource allocations. By allowing it such freedom, the solver considers all possible solutions, which comes at the risk of high and often unpredictable computation times. There are several options to address this. First, we can restrict the solver's search space in a way that it still finds suitable allocations. For instance, if an application needs four cores, the current implementation will choose them from a full permutation of all available cores, which is excessive. Alternatively, we could shift to using a modern Satisfiability Modulo Theories (SMT) solver like Z3 [DMB08]. In Chapter 3 we discuss how we extend it with an approximate capacity-constrained bin-packing algorithm, which trades off the optimality of the deployment solution for a significantly lower computation time.
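To give a flavor of how much cheaper a restricted search can be, the sketch below uses a first-fit-decreasing heuristic to assign tasks with given core demands to NUMA nodes with fixed core capacities. This is not the algorithm developed in Chapter 3; it merely illustrates the general idea of a capacity-constrained bin-packing heuristic that avoids enumerating all core permutations. All task demands and capacities are made up for the example.

#include <stdio.h>
#include <stdlib.h>

#define NUM_NODES 4
#define NUM_TASKS 5

/* Remaining core capacity per NUMA node and core demand per task (example values). */
static int capacity[NUM_NODES] = {6, 6, 6, 6};
static int demand[NUM_TASKS]   = {4, 3, 3, 2, 1};
static int placement[NUM_TASKS];          /* chosen node per task, or -1 */

/* Sort task indices by decreasing demand. */
static int cmp_desc(const void *a, const void *b) {
    return demand[*(const int *)b] - demand[*(const int *)a];
}

int main(void) {
    int order[NUM_TASKS];
    for (int i = 0; i < NUM_TASKS; i++) order[i] = i;
    qsort(order, NUM_TASKS, sizeof(int), cmp_desc);

    /* First-fit decreasing: place the largest demand on the first node that fits. */
    for (int k = 0; k < NUM_TASKS; k++) {
        int t = order[k];
        placement[t] = -1;
        for (int n = 0; n < NUM_NODES; n++) {
            if (capacity[n] >= demand[t]) {
                capacity[n] -= demand[t];
                placement[t] = n;
                break;
            }
        }
        if (placement[t] < 0)
            printf("task %d (%d cores) cannot be placed\n", t, demand[t]);
        else
            printf("task %d (%d cores) -> NUMA node %d\n", t, demand[t], placement[t]);
    }
    return 0;
}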

2.4 Interface

The most important component of our design is the interface that joins the OS policy engine with the CSCS storage engine. Before discussing concrete examples on how it can be used for a variety of use-cases in Section 2.5, we provide an overview of the current scope covered by the interface. We also briefly describe the different classes of supported functions and their intended use.

2.4.1 Scope

The interface between the operating system and an application (e.g., a DBMS) can cover a range of aspects but in this section we primarily focus on the interaction and information exchange between the CSCS storage engine and the OS policy engine.

24 2.4. Interface

Table 2.1: Message types supported by COD’s interface

Message Type                      Example
Core function                     rsmgr_client_connect()
Add facts                         add_fact(var_name(value))
Add stored procedures             add_query("f_name(f_vars):-f_content")
(De-)Register for notification    rsmgr_register_fn(, , handler)
Query system-specific facts       execute_query("get_nr_cores()")
Query stored procedures           execute_query("f_name(f_param,f_result)")

The interface provides support for actions such as:

1. Retrieving information about the underlying architecture and the available resources;

2. Pushing down application-specific properties/facts and cost models, so that the OS policy engine can reason about them;

3. Adding stored procedures which compute application-specific logic using system state information; and

4. Allowing for continuous information flow between the two layers during runtime in the form of notifications and updates.

The list of features and supported message types is neither exhaustive nor exclusive. Its primary purpose is to show the type of interactions that can be implemented in a co-design architecture. We discuss further possibilities and extensions in Chapter 3.

2.4.2 Semantics

For a better overview of the current possibilities, we have grouped the supported message types into several categories and summarized them in Table 2.1. COD's interface currently supports the following types of messages:

1. The core functions provide support for: (1) Initializing the communication between the CSCS and the resource manager of the OS policy engine;


(2) Registering the CSCS as a running task, which informs the RM of the policy engine that a new task has arrived, and sets up state so the RM can forward notifications to it; and (3) Requesting resource allocation suggestions from the RM, which will invoke the execution of the global allocation code in the SKB, and eventually forward the decision to the affected applications/tasks.

2. Add facts messages enable the CSCS engine to load information about its own properties, as facts, into the SKB. As described earlier, the policy engine distinguishes between system-level properties and application-specific facts. The interface allows for both of these to be modified and/or removed at any time during the execution of the application.

3. Add stored procedures messages enable the CSCS engine to add application-specific functions to the SKB that are tailored to its own needs. An example is the deployment cost function passed on as a stored procedure (a code sample is provided in Listing 2.3). These procedures can use all application-related facts as well as all the system-level properties belonging to the application.

4. (De-)Register for notification messages allow the CSCS engine to be informed about changes occurring in the system, and to filter unrelated events. The resource manager by default will notify all affected applications via upcalls when the global system optimizer of the SKB changes the resource allocation. This enables the CSCS and other affected applications to adapt their execution plans and internal resource management accordingly, and be able to react to the changes in the system state that affect the resources they operate with.

5. Query system-specific facts messages enable the CSCS engine to issue queries that retrieve system-specific information from the SKB.

6. Query stored procedures messages allow the CSCS engine to also query its own application-specific functions, previously added to the SKB, and to receive concrete information based on the current system state.

As we mentioned before, this list of message types is neither exhaustive nor exclusive. It is rather intended to illustrate the possibilities offered by the currently supported interface and is subject to alterations and changes in the future.


2.4.3 Syntax and implementation

Currently, the implementation of the interface is tightly coupled with the semantics of the OS policy engine. In particular, it depends on the support provided by the OS policy engine, and on the syntax that the SKB module understands. Since the OS policy engine is implemented as a user-space library, written in C, and uses the ECLiPSe CLP language, most of the message types supported by the interface resemble the Prolog format. As such, they contain a string with a Prolog command as an argument, followed by the input parameters for the function called, as well as pointers to the output variables. Parsing the response obtained from the policy engine on the client side (i.e., the application side) is implemented in a similar fashion. An exception is the core function, which enables the CSCS engine to connect to the resource manager of the OS policy engine, and to register itself as an application entering the system. For both message types, we provide a more detailed explanation and examples in Section 2.5.
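The sketch below illustrates this calling convention from the client side: a query is shipped as a plain Prolog string, the answer comes back as a string that the application parses, and allocation changes arrive through a registered upcall handler. The function names follow Table 2.1, but their signatures are simplified assumptions and the bodies are stubs, so the example is self-contained rather than an excerpt of the real COD client library.

#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical client-side prototypes modelled on Table 2.1;
 * the real library takes additional parameters that are elided here. */
typedef void (*rsmgr_handler_t)(const char *new_allocation);
static rsmgr_handler_t g_handler;

static int rsmgr_client_connect(void) { return 0; }
static int rsmgr_register_fn(rsmgr_handler_t h) { g_handler = h; return 0; }
static int skb_execute_query(const char *query, char *out, size_t len) {
    /* A real call would ship the Prolog string to the SKB and copy the
     * solver's textual answer back; here we fabricate a reply. */
    (void)query;
    snprintf(out, len, "nr_cores(4), part_size(2147483648)");
    return 0;
}

/* Upcall invoked by the resource manager when the global allocation changes. */
static void on_allocation_change(const char *new_allocation) {
    printf("RM upcall: new allocation %s\n", new_allocation);
}

int main(void) {
    char answer[256];

    rsmgr_client_connect();
    rsmgr_register_fn(on_allocation_change);

    skb_execute_query("dbos_cost_function(S, NrCores, PartSize).",
                      answer, sizeof(answer));
    if (strstr(answer, "nr_cores") != NULL)   /* parse the Prolog-style reply */
        printf("SKB answered: %s\n", answer);

    /* Simulate the RM recomputing the global plan and issuing an upcall. */
    if (g_handler) g_handler("cores([2,3,4,5])");
    return 0;
}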

2.4.4 Evaluation

The message types and the interface as a whole are evaluated from two different aspects:

1. Applicability, or how the interface and its messages can be used by the applications. We discuss concrete scenarios and use-cases in Section 2.5.

2. Overhead, or the cost of making each call. Such an analysis is presented after each use-case, as part of the discussion of the overall results obtained from that experiment.

2.5 Experiments

This section presents in more detail the interaction between the CSCS engine and the OS policy engine through use-cases, presenting both the advantages of this approach, supported by experiments, and the overhead imposed by the communication. Furthermore, we conclude each scenario with ideas for extensions and possible future work. Finally, we discuss the limitations of the current approach.


More concretely, in this section we show that COD can deploy efficiently on a variety of different machine configurations without prior knowledge of hardware, and react to dynamic workloads to preserve performance.

2.5.1 Experimental Setup

The dataset and workload used in these experiments is generated from traces of the Amadeus on-line flight booking system. It is characterized by a large number of concurrent point queries, frequent peak loads, many updates, and strong latency requirements. The main dataset represents records of flight bookings: one record for every person on a plane. A record contains 47 attributes and is approximately 315 bytes in size (fixed). Many of its attributes are highly selective (e.g., seat class, vegetarian). A travel booking can contain millions of such records (in our experiments we use between 3.75 million and 180 million tuples, corresponding to 8 and 53 GB of data). The same workload was also used by Unterbrunner et al. [UGA+09]. We used Linux kernel version 2.6.32 on four different hardware platforms:

1. AMD Shanghai: Four quad-core 2.5 GHz AMD Opteron 8380 processors, and a total of 16 GB RAM, arranged as 4 GB per NUMA node.

2. AMD Barcelona: Eight quad-core AMD Opteron 8350 processors running at 2 GHz, a total of 16 GB RAM spread across eight NUMA nodes.

3. Intel Nehalem-EX: Four 8-core 1.87 GHz Intel Xeon L7555 processors and a total of 128 GB RAM with a NUMA node size of 32 GB. Hyperthreads are disabled.

4. AMD MagnyCours: Four 2.2 GHz AMD Opteron 6174 processors. Each one with two 6-core dies and a NUMA node size of 16 GB.

2.5.2 Deployment on different machines

Our first scenario shows how COD can adapt the deployment of the CSCS engine to different hardware platforms using the OS policy engine.


Use-case description

The goal of the first use case is to show how COD assists the CSCS engine in determining the most suitable deployment strategy on a given machine, so that it meets its response time SLO. The output that the CSCS engine needs is: (i) the number of scan threads to be spawned, and (ii) the correct size and placement of its horizontal data partitions. This task is trivial when one knows both the properties of the underlying architecture and the cost model that characterizes the storage engine's scan operation (recall Equation 2.1). Taking this information into account, with this scenario we confirm the importance of deriving such cost models on the application's side (e.g., the cost model for the CSCS scan) and matching them against the available hardware resources.

Listing 2.1: Using the interface

1  rsmgr_client_connect(use_skb);
2  rsmgr_register_function();
3
4  skb_client_connect();
5
6  skb_system_fact(maxCores, MAX_CORES);
7  skb_system_fact(bound, CPU);
8  skb_system_fact(sensitive, cache);
9  skb_system_fact(sensitive, NUMA);
10
11 skb_add_fact("db_ntuple(3750000).");
12 skb_add_fact("db_tsize(315).");
13 skb_add_fact("db_nquery(2048).");
14 skb_add_fact("db_nupdate(256).");
15 skb_add_fact("db_rtime(3000).");
16
17 skb_add_fn("db_cost_fn(X,Y,Z,NrCores):-
18     NrCores is (X*((0.85*Y)+601))/(3750000*Z)");
19
20 skb_add_fn(...query: see Listing 2.3...);
21 skb_execute_query(...query...);


Listing 2.2: Example for retrieving system-level facts

1 get_list_free_cpus(avail_cores):-
2     findall(_,cpu_affinity(_,_,_),avail_cores).
3 get_list_numa_sizes(numa_sizes):-
4     findall(N, memory_affinity(_, N, _), numa_sizes).

Implementation details

Here we provide concrete details about the interaction between the CSCS and the OS policy engines, and how the information exchange comes into play. We refer the reader to Listing 2.1 for the code samples which we use in the description of this use-case. When spawned, the CSCS engine connects to the OS policy engine by registering with both the RM and the SKB (lines 1-4). It then submits its system-level properties to the SKB (lines 6-9). In this example it states to the OS that it could use all available cores on the machine, and that its scan threads are CPU-bound tasks, which are both highly cache- and NUMA-sensitive. It then populates the SKB with its application-specific facts such as: the size of its dataset (in #tuples), the size of a tuple (in bytes), the batch size of requests it needs to process, and the response time SLO requirement (lines 11-15). Once the SKB is introduced to the "domain specific terms" of the database, the CSCS engine adds the scan's cost model as an application-specific function (lines 17-18). Finally, it registers a stored procedure (line 20), which it uses to derive the results needed (line 21): the number of scan threads to use, and the corresponding size of each partition. More details about the stored procedure are provided in Listing 2.3. Listing 2.2 contains Prolog code that retrieves system-level facts: the list of all available cores (lines 1-2), and the list of sizes of all NUMA nodes (lines 3-4). Both of these functions are used in the stored procedure in Listing 2.3. The CSCS engine bases its initial deployment of data partitions and the corresponding scan threads on the outcome of invoking a stored procedure registered with the OS policy engine. The implementation details of the stored procedure are given in Listing 2.3. The function operates by first retrieving the CSCS-specific properties (lines 4-6), and then the necessary system-level facts using the example functions given in Listing 2.2 (lines 8-10). It then continues by calculating the total size of the dataset (line 12), before computing the minimum number of cores required by the cost model to meet the SLO (line 14).


Listing 2.3: CSCS’ Initial deployment stored procedure

1  % status, nr_cores, part_size are output values.
2  dbos_cost_function(status, nr_cores, part_size):-
3
4      db_tsize(tsize), db_ntuple(ntuple),
5      db_nquery(nquery), db_nupdate(nupdate),
6      db_rtime(rtime),
7
8      get_free_memory(avail_memory),
9      get_list_free_cpus(avail_cores),
10     get_list_numa_sizes(numa_sizes),
11
12     memory is (ntuple*tsize),
13
14     db_cost_fn(ntuple,(nquery+nupdate),rtime,sla_nr_cores),
15
16     min(numa_sizes, min_numa_size),
17     numa_nr_cores is (memory/min_numa_size),
18     max([numa_nr_cores, sla_nr_cores], nr_cores),
19
20     part_size is (memory/nr_cores),
21
22     ( nr_cores > length(avail_cores) -> status = 1;
23       memory > avail_memory -> status = 2;
24       status = 0
25     ).

Since the scan threads of the CSCS engine are cache- and NUMA-sensitive, the dataset needs to be partitioned and distributed across the available NUMA nodes. Furthermore, at least one core per NUMA node should be used in order to guarantee data access locality. As a result, no partition size should exceed the size of a NUMA node. Therefore, the stored procedure computes the minimum number of cores (i.e., partitions) so that each partition fits in the smallest of all NUMA nodes (lines 16-17). The final number of cores/partitions needed is the maximum of both requirements (line 18). Once determined, the final number of cores is used to calculate the size of the partitions to be used (line 20).

Finally, the stored procedure checks whether the total size of the dataset fits into main memory, and whether the total number of cores required is in fact available in the machine. If either of these is false, the query fails, notifying the CSCS that this machine cannot meet the desired constraints (lines 22-24); otherwise the CSCS is notified of the obtained results and can partition its data accordingly. During runtime, some of the application-specific properties may change: for example, the size of the batch of requests that needs to be handled. In that case the CSCS simply modifies these values in the OS policy engine, and triggers a re-computation of the stored procedure. If the outcome of an application-specific stored procedure results in new system-level properties for the application, then they are added to the SKB. By design, whenever some system-specific properties change or are added/removed, the resource manager invokes the global allocation function in the SKB to re-compute the resource allocation among all registered applications. In this particular use case, the stored procedure of the CSCS engine computes the minimum requirement for the core count and the size of a horizontal data partition, both of which are added as system-level properties in the SKB. After the global allocation plan completes its computation, the resource manager sends the CSCS engine an upcall containing the concrete core IDs to use. Based on that, the CSCS pins its scan threads to the given cores and allocates memory from the corresponding NUMA nodes.
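For readers less familiar with CLP syntax, the arithmetic performed by the stored procedure in Listing 2.3 can be restated in a few lines of C. The sketch below mirrors that logic (cost-model cores, the NUMA-driven minimum number of partitions, and the two feasibility checks); the cost-function constants are taken from Listing 2.1, the available-resource values in main are made-up examples, and rounding up to whole cores is an assumption made here for clarity.

#include <math.h>
#include <stdio.h>

/* C restatement of the logic in Listing 2.3 (illustrative only).
 * Returns 0 on success, 1 if not enough cores are available,
 * 2 if the dataset does not fit into the available memory. */
static int initial_deployment(double ntuple, double tsize,
                              double nquery, double nupdate, double rtime,
                              double avail_memory, int avail_cores,
                              double min_numa_size,
                              int *nr_cores, double *part_size) {
    double memory     = ntuple * tsize;                       /* total dataset size */
    double sla_cores  = (ntuple * (0.85 * (nquery + nupdate) + 601.0))
                        / (3750000.0 * rtime);                /* cost fn from Listing 2.1 */
    double numa_cores = memory / min_numa_size;               /* one partition per NUMA node */

    *nr_cores = (int)ceil(fmax(sla_cores, numa_cores));       /* max of both requirements */
    if (*nr_cores < 1) *nr_cores = 1;
    *part_size = memory / *nr_cores;

    if (*nr_cores > avail_cores) return 1;
    if (memory > avail_memory)   return 2;
    return 0;
}

int main(void) {
    int cores; double shard;
    /* Facts from Listing 2.1; 64 GB RAM, 48 cores, 16 GB NUMA nodes are example values. */
    int status = initial_deployment(3750000, 315, 2048, 256, 3000,
                                    64e9, 48, 16e9, &cores, &shard);
    printf("status=%d cores=%d shard=%.0f bytes\n", status, cores, shard);
    return 0;
}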

Experiment evaluation

We deployed COD on all machines described in Section 2.5.1, and now present the resulting allocation of scan threads to cores, as suggested by the OS policy engine. The CSCS engine pushed the following application-specific facts to the SKB: 30 · 10^6 tuples, each of size 315 bytes (total dataset size of 8 GB), and a batch size of requests containing 2048 queries and 512 updates. We varied the SLA response time constraint between 2 and 8 seconds. Table 2.2 shows the results of the calculation performed at the SKB and illustrates how the suggested configuration varies considerably for different SLA requests and hardware platforms. The final column of the table shows the results of the actual runs with the proposed configuration. The experiment values confirm that in every case but one, COD does meet the SLA as predicted1.

1The case where the SLA is not met is due to the use of the same cost model for all machines rather than parameterizing it to each one of them.


Table 2.2: Derived deployments for different SLAs and hardware platforms

Hardware platform   RT [s] SLA   # cores (cost fn.)   # cores (NUMA size)   Total # cores   Size of shard   Measured RT [s]
Intel Nehalem EX         2              8                     1                   8              1 GB             1.66
                         4              4                     1                   4              2 GB             3.27
                         8              2                     1                   2              4 GB             6.54
AMD Barcelona            2              8                     5                   8              1 GB             2.18
                         4              4                     5                   5            1.6 GB             3.55
                         8              2                     5                   5            1.6 GB             3.55
AMD Shanghai             2              8                     3                   8              1 GB             1.68
                         4              4                     3                   4              2 GB             3.25
                         8              2                     3                   3            2.7 GB             4.33
AMD MagnyCours           2              8                     1                   8              1 GB             1.87
                         4              4                     1                   4              2 GB             3.71
                         8              2                     1                   2              4 GB             7.37

One important observation from the experiment results is that the deployment of scan threads on different machines can be a non-trivial task. For example, when the SLA for response time is set to 8 seconds, the total number of cores on the four different machines has three different values: 2 on Intel Nehalem EX and AMD MagnyCours, 3 on AMD Shanghai, and 5 on AMD Barcelona. Even though the primary factor that influences this discrepancy (at least in this example) is the size of the NUMA nodes, the example confirms that an application (like the CSCS engine) needs to account for many properties of both the underlying hardware platform and the workload in question, in order to make a good deployment decision. In theory, this calculation could be performed entirely inside the database based on information requested by the CSCS engine from the operating system, regarding details about the underlying architecture and available resources. However, submitting a query to the OS policy engine means that the CSCS storage manager does not need to understand each machine's hardware configuration. More importantly, since the CSCS engine has now delegated useful knowledge to the operating system about its own resource requirements (including how it can trade off cores for memory), the OS is in a position to do more intelligent resource reallocation, and automatically compute the CSCS' deployment in response. Furthermore, in the next scenarios we will see cases where this deployment decision also depends on the system's runtime state and the resource utilization of other applications present in the system. In this case, the OS is the only place where all this information is available and it makes little sense to pass it on to the database.

Communication and computation overhead

We now discuss the communication and computation overhead that the co-design and tighter integration of the CSCS storage manager and the OS policy engine bring to the execution time of the system as a whole. As presented in the implementation details of this scenario, the communication overhead can be calculated as the number of messages that are exchanged between the resource manager and the CSCS engine. In this particular scenario (see Listing 2.1), the initialization phase requires three function calls: one for connecting to the SKB, and two for connecting and registering with the resource manager. It then needs four calls to set up the system-level properties, and seven calls to place the application-specific facts, including the cost model and the stored procedure (note that we count one function call per fact). Lastly, we need one call that triggers the computation of the stored procedure and returns a callback containing the results of the calculation. The final call is the upcall notification from the global allocation plan. In total, that means that we have introduced sixteen function calls for the initialization phase at deployment time, out of which four are fixed and the rest depend on our application's properties. One immediate and obvious optimization is to decrease the communication overhead by batching the redundant calls (e.g., adding application properties to the SKB) into a single function call. The computation overhead for this phase solely depends on the time it takes for the SKB to calculate the output of the CSCS' stored procedure. We measured this overhead during our last experiment to be 0.18 ms (see also Table 2.4). Since it is invoked quite infrequently, i.e., only at deployment time and when one of the application-specific facts has changed, we can conclude that this overhead does not affect the overall performance of the system.
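A minimal sketch of the batching optimization mentioned above: instead of one round-trip per fact, the client concatenates the facts and ships them in a single call. The skb_add_facts_batch function is invented for this illustration (COD's current interface adds facts one at a time) and is stubbed so the example runs on its own.

#include <stdio.h>
#include <string.h>

/* Hypothetical batched variant of skb_add_fact(): one message carrying
 * several facts instead of one call per fact. Stubbed for illustration. */
static int skb_add_facts_batch(const char *facts) {
    printf("SKB <- %s\n", facts);
    return 0;
}

int main(void) {
    const char *facts[] = {
        "db_ntuple(3750000).", "db_tsize(315).",
        "db_nquery(2048).", "db_nupdate(256).", "db_rtime(3000)."
    };
    char batch[256] = "";
    for (size_t i = 0; i < sizeof(facts) / sizeof(facts[0]); i++) {
        strncat(batch, facts[i], sizeof(batch) - strlen(batch) - 1);
        strncat(batch, " ", sizeof(batch) - strlen(batch) - 1);
    }
    return skb_add_facts_batch(batch);   /* five facts, one round-trip */
}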


Discussion

The results confirmed that even a rather simplistic cost model for the scan operation can result in a good deployment when the underlying architecture is known. Ideally, the cost model should also take into consideration other processor properties (like CPU frequency) and the cache layout, so that we get more accurate results when deploying on different machines. This scenario can easily be extended to other DBMS operations, apart from the full table scan, as long as we develop the corresponding cost model that best describes the operation's dependencies on the available system resources.

2.5.3 Deployment in a noisy system

In the second scenario we show the benefits of information exchange between the CSCS and the OS policy engine when deploying the scan threads in a noisy system.

Use-case description

In this use-case we show that just knowing the architecture is not enough to do smart deployment on a machine that is being shared with other tasks. In such circumstances one also has to take into account the current state of the system and act accordingly, otherwise the deployment decision can result in a significant drop in performance. More concretely, we evaluate the performance of the system when the assignment of tasks (such as the CSCS’ scan threads) to cores is done in the presence of other tasks sharing the resources of the machine with the storage engine.

Implementation details

As before, the allocation of cores and NUMA nodes to the registered tasks in the system is performed by the constraint satisfaction solver in the SKB. This part of the OS policy engine was developed by Adrian Schuepbach and is described in more detail in his thesis [Sch12]. For completeness, in this subsection we provide a brief overview of how the SKB does the global resource allocation. The basis for the allocation is a matrix of free variables annotated with constraints derived from the system-level and application-specific facts. The structure of the matrix is itself based on the particular hardware configuration at hand. An example is shown in Figure 2.4, where the number of tasks assigned to each core is constrained to be zero or one2.

Core        0     1     2     3     4     5     6     7
Task 1      X=1   X=0   X=0   X=0   X=0   X=0   X=0   X=0
Task 2      X=0   X=1   X=0   X=0   X=0   X=0   X=0   X=0
Task 3      X=0   X=0   X=1   X=1   X=0   X=0   X=0   X=0
Task 4      X=0   X=0   X=0   X=0   X=1   X=1   X=1   X=1
Shared L3   Cache 0 (cores 0-1)   Cache 1 (cores 2-3)   Cache 2 (cores 4-5)   Cache 3 (cores 6-7)
NUMA        Node 0 (cores 0-3)    Node 1 (cores 4-7)

Figure 2.4: Matrix of core to task allocation, including NUMA, cache and core affinity.

The chosen policy constraints are essentially equivalent to the space-time partitioning scheme proposed for the Tessellation OS [LKB+09], though COD's technique is rather more general: it subsumes shared caches and NUMA nodes, as well as sharing cores between appropriate tasks. Given that the matrix initially contains unconstrained free variables, we are not restricted to spatial placement of tasks. With time, the solver obtains concrete values for these variables to indicate which core on which NUMA node is allocated to which task. To derive a concrete core allocation, additional requirements on the number of required cores and the amount of memory consumption may be registered by the CSCS or other applications. The most common constraints used in the SKB for an application's assignment are the following:

• MaxCores defines how many threads (cores) an application can leverage at most. Not all applications require a high number of cores in every phase of their execution. Some phases may be single threaded, and some algorithms may have scaling limitations. Such information is valuable to the operating system, especially when it indicates the maximum number of cores an application needs. This constraint is implemented by making sure that the row sum of the task's variables is smaller than or equal to MaxCores. In Figure 2.4, tasks 1 and 2 have set their MaxCores to one, while task 3 set it to two. Task 4 did not provide any restrictions.

2Please note that Figure 2.4 is a sample illustration of a machine with eight cores, used for simplicity, and does not match the actual machine used in the experiment.

36 2.5. Experiments

• MinCores defines the minimum number of cores that need to be allocated to an application. It is also implemented as a row sum constraint, but additionally includes an admission control check. To avoid infeasible allocations, the policy code checks in advance that the sum of all requested MinCores values does not exceed the number of available cores.

• WorkingSetSize defines the required memory for each task. This property is particularly important when allocating the cores in a NUMA-aware fashion. If two cores share a 4 GB NUMA node, and the application needs to process a working set size of 4 GB per thread, then the cores must be allocated on different NUMA nodes to accommodate the requested memory capacity.

In our example use case, the CSCS engine declares MinCores to be the value computed by the application-specific stored procedure in the previous experiment, and sets the WorkingSetSize to be equal to the corresponding partition's size. Since the scan can be parallelized easily, MaxCores is set to the maximum number of cores available on the machine. With this information, the SKB component of the OS policy engine can compute a task-to-core deployment configuration for the application, keeping in mind the NUMA-awareness requirement and meeting the SLA requirements for response time.
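To make the constraints more tangible, the sketch below checks a candidate allocation matrix (the one from Figure 2.4) against MaxCores, MinCores, and the one-task-per-core rule. It only validates a given matrix; the SKB instead lets the CLP solver search for an assignment that satisfies these constraints, and the WorkingSetSize/NUMA capacity check is omitted here to keep the example short. The MinCores/MaxCores values are chosen to match the figure.

#include <stdbool.h>
#include <stdio.h>

#define NTASKS 4
#define NCORES 8

/* Candidate allocation from Figure 2.4: alloc[t][c] == 1 iff task t runs on core c. */
static const int alloc[NTASKS][NCORES] = {
    {1, 0, 0, 0, 0, 0, 0, 0},
    {0, 1, 0, 0, 0, 0, 0, 0},
    {0, 0, 1, 1, 0, 0, 0, 0},
    {0, 0, 0, 0, 1, 1, 1, 1},
};
static const int min_cores[NTASKS] = {1, 1, 2, 1};
static const int max_cores[NTASKS] = {1, 1, 2, 8};

static bool allocation_is_valid(void) {
    /* Row-sum constraints: MinCores <= #allocated cores <= MaxCores. */
    for (int t = 0; t < NTASKS; t++) {
        int sum = 0;
        for (int c = 0; c < NCORES; c++) sum += alloc[t][c];
        if (sum < min_cores[t] || sum > max_cores[t]) return false;
    }
    /* Column-sum constraint: at most one task per core. */
    for (int c = 0; c < NCORES; c++) {
        int sum = 0;
        for (int t = 0; t < NTASKS; t++) sum += alloc[t][c];
        if (sum > 1) return false;
    }
    return true;
}

int main(void) {
    printf("allocation %s the constraints\n",
           allocation_is_valid() ? "satisfies" : "violates");
    return 0;
}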

Experimental evaluation

For this experiment we used the AMD MagnyCours machine. We evaluated the performance when deploying the CSCS engine with a dataset of size 53 GB, consisting of 180 · 10^6 tuples of 315 B each. The number of requests processed in a batch was varied from 512 queries and 128 updates to 4096 queries and 1024 updates. We differentiate between three deployment alternatives: (1) the CSCS engine is the only application running in the system, and can therefore use all 48 cores; (2) the CSCS engine runs alongside another CPU-intensive task, which has pinned its thread on core 0, but the CSCS is unaware of it and still uses all 48 cores; and (3) the same set-up as (2), but this time the CSCS engine relies on the policy engine for its deployment configuration. Figure 2.5 shows the measured system throughput for the three different configurations of the experiment. The results indicate that when the CSCS engine has collocated one of its scan threads with another compute-intensive task in the system, its throughput is significantly affected – by almost fifty percent. The outcome is logical, since the scan thread that shares a core becomes the straggler for each scan processing a batch of requests and slows everyone down. The problem can be avoided only if the CSCS is aware of the presence of other tasks (applications) in the system and their utilization of the shared resources. The CSCS engine, as part of COD, delivers superior performance compared to the naïve CSCS because it relies on the OS policy engine to compute a global allocation of the resources, resulting in 47 cores being assigned to the CSCS and avoiding the already occupied core. The measured throughput of the CSCS engine is then almost as good as the maximum throughput that can be achieved in isolation, i.e., when the CSCS engine uses all 48 cores.

Figure 2.5: CSCS performance when deployed in a noisy system. Throughput (queries/second) is plotted against the number of queries in a batch for three configurations: isolated (48 cores), noisy (48 cores), and COD (47 cores).

Communication and computation overhead

The communication cost when deploying an application, after it has expressed its properties and requirements, can be compressed to one function call (which triggers the global allocation function in the SKB), and a notification response given by the OS policy engine to the application itself.


Table 2.3: Computation overhead of the OS policy engine

Cost of executing global allocation plan for    time [ms]
only CSCS in the system                              5.64
CSCS + 1 application in the system                  13.28

On the computation side, the overhead is significantly higher, in particular when the SKB computes the global resource allocation for all registered applications. Table 2.3 summarizes the computation time of the global allocation plan for this experiment: (1) when only the CSCS is running, and (2) when executing the CSCS engine in addition to one more application in the system. As we can see from the results, the computation cost is quite small for this setup, i.e., in the range of milliseconds. It does, however, grow as more applications enter the system, in particular when every new application registers itself with a specific set of system-level and application-specific facts. Adding additional constraints only increases the complexity of the problem that the optimizer needs to solve, and the current solution does not easily scale in such cases3.

Conclusion and possibilities for extensions

Naturally, the effects shown in this use case highlight the sensitivity of the storage manager to sharing CPU cores. This sensitivity is in part due to the CSCS' design of synchronization among the scan threads, and relatively poor skew- and load-balancing. We briefly discuss how these limitations can be addressed in the next experiment. Nevertheless, similar problems can occur for applications that are sensitive to sharing other types of resources: e.g., bandwidth in the network, local DRAM and interconnect, caches, etc. Therefore, it is important to capture not only the capacity of these resources, but also their current utilization as well as the sensitivity of the applications using them. We discuss these in more detail in Chapter 3. As described earlier, the decision for proper deployment of an application in a noisy system requires detailed information about the underlying hardware, the system state, and

3Note that we cannot simply compute the computation overhead as a function of the number of applications in the system, as the cost primarily depends on the number of constraints.

the internal properties of the CSCS engine, which our prototype system COD collocates in the SKB. We would like to emphasize that in a traditional database plus OS deployment, there is no part of the system which has access to all this information.

2.5.4 Adaptability to changes

The CSCS engine needs to be aware of other running applications and tasks not only at deployment time, but also during runtime. It is essential that the data processing system can adapt to the dynamic system state, and is somewhat robust to the constantly changing noise in the machine.

Use-case description

The following experiment shows how COD can guarantee good performance and maintain a predictable runtime even in a dynamic environment where other applications join the system and begin sharing resources. We compare it to an execution of a naïve CSCS storage manager executing independently (i.e., without the assistance of the OS policy engine) and, hence, unaware of the ongoing changes in the system state.

Implementation details

As per design, whenever a new task registers with the resource manager of the OS policy engine, the RM triggers a re-computation of the global resource allocation plan. This can often result in a decision to remove one of the cores (or other resources) previously allocated to the CSCS engine or other applications, as long as they can still meet their SLA constraints. Consequently, after receiving the notification of the new resource assignment, the CSCS engine has to adapt its execution. In the case of losing a core, the CSCS engine needs to decide how to distribute the affected portion of the tuples, i.e., which scan threads should take over their processing. In order to achieve good load balance, the current implementation of the CSCS engine invokes a second application-specific stored procedure, which was also registered with the SKB. At a high level, the stored procedure first checks for memory availability on the corresponding NUMA nodes, and tries to maximize the number of sibling threads that will share the new load. Eventually, it responds with a list of core IDs to which the CSCS will have to move the tuples, as well as the corresponding number of tuples to be moved to each of the cores. As soon as the data is redistributed, the CSCS engine kills the scan thread on the core it just lost, and resumes the scan operation on the next batch of requests.

Figure 2.6: COD's adaptability to changes in the system. The plot shows the response time (in seconds) of COD and the naïve CSCS engine over the duration of the experiment (in minutes), together with the SLA agreement.

Experiment evaluation

This experiment was also conducted on the AMD MagnyCours machine, using a dataset of size 53 GB (180 · 10^6 tuples of size 315 B each), and a batch size consisting of 2048 queries and 512 updates. Initially, the CSCS engine is the only application running in the system and uses all forty-eight cores. The SLA response time is set to 3 seconds. The overall duration of the experiment is eighteen minutes, and about every four to five minutes we spawn another compute-intensive task. Similar to the previous experiment, we compare the performance of the CSCS storage manager (this time its response time in seconds) when it runs standalone (i.e., without the assistance of the OS policy engine – naïve) versus when executed as part of COD. In the naïve run, the external CPU-intensive tasks were always scheduled on core #0.


Table 2.4: Policy engine computation cost for the stored procedures

CSCS-specific function    time [ms]
Initial deployment             0.18
Tuple re-distribution          0.27

At the same time, the naïve CSCS is unaware of their entrance and thus does not react. Consequently, its runtime is significantly affected and it unfortunately no longer meets the required latency SLAs. In COD, the new incoming tasks register with the resource manager of the policy engine and are placed on separate cores, as suggested by the SKB's global allocation plan. Since the CSCS engine is notified every time it needs to release a core, it can act accordingly and redistribute the tuples among the remaining scan threads, as suggested by the second stored procedure. The results of this experiment are presented in Figure 2.6. It shows how the response time of the CSCS storage manager (measured in seconds) changes over the course of the experiment. On the one side, we can see the performance of the naïve CSCS run and how its response time increases with each new task entering the system. The results are unsurprising, as we have already seen the effects of sharing a single physical context (core) with another CPU-intensive task. The sharing scan thread becomes a straggler and slows down the performance of the whole CSCS engine. On the other side, we observe the performance of the CSCS engine when integrated in COD. Its response time remains relatively steady even in the presence of other applications, with spikes observed at the time when a new task enters the system. We explain the sudden spikes in response time as a result of the CSCS engine redistributing the tuples to the other cores, as suggested by the stored procedure. Nevertheless, even when losing a scan thread (core), the CSCS engine can easily resume executing the scans with a latency well within the required SLA.

Communication and computation overhead

The communication overhead specific to this scenario is (1) the upcall from the resource manager that there was a change in the global allocation of resources, and (2) the corresponding reaction from the CSCS engine invoking the tuple redistribution stored procedure. Overall that amounts to a total of two function calls.


The computation overhead on the SKB side is the re-computation of the global allocation plan and, consequently, the cost of calculating the redistribution of tuples to a specific subset of CSCS-owned cores (i.e., the second stored procedure of the CSCS). Table 2.4 displays the measured values for the stored procedures. It shows that the computation overhead for the stored procedure is almost negligible, especially when compared to the time needed to do the actual re-distribution of tuples, which takes around 1.2 seconds4.

Conclusion and possibilities for extensions

Enabling the DB storage engine to react and adapt its execution as a result of receiving an upcall from the OS is especially important when sharing the machine with other tasks executed in parallel. This makes the CSCS storage manager adaptable to the dynamic system state and, as a result, it can provide stronger guarantees about predictable and stable response times within the agreed SLA. One could extend this feature by enabling other types of notifications from the OS and the policy engine, e.g., as a result of monitoring certain events and changes in the utilization of sensitive resources. More importantly, we would like to note that by cleanly separating the policy from the mechanisms and the application logic, it is now easy to change the policies depending on the underlying platform or based on recent proposals in the state of the art. For instance, we can replace the logic of the second stored procedure with the state-of-the-art NUMA-aware load balancing among scan threads proposed by Psaroudakis et al. [PSM+15].

2.6 Related work

The design of COD is motivated by how current trends in both hardware and workloads affect the design and implementation of both databases and operating systems. As a result, its building components – the CSCS storage manager and the OS policy engine – combine several ideas already proposed in the OS and DBMS communities. A major influence is attributed to a recent school of thought dedicated to re-designing system software for multicore architectures [BBD+09, WGB+10, LKB+09, Kim15, JPH+09]. Many of these efforts aim at reducing inter-core memory sharing and synchronization by carefully structuring the whole system. For example, there is considerable interest

4Note that the time it takes to re-distribute the tuples depends on the amount of data to be moved.

in removing scalability bottlenecks in OS designs (e.g., [BWCM+10, BBD+09, WGB+10, LKB+09, GKAS99]), much of which is relevant to database/OS co-design. More important, however, is to position our work, in terms of the posed research questions, with respect to existing systems and related work.

2.6.1 Interacting with operating systems

The first aspect we cover is how existing operating systems interact with the applications running on them. More concretely, we discuss to what extent applications are able (or encouraged) to give hints to the OS about their requirements, or to provide feedback on the current scheduling policy. Similarly, we briefly cover systems which allow applications to query for information about the hardware, the system state, etc.

Commodity operating systems

Abstraction of resources and the associated encapsulation of system state has long been regarded as a core operating system function, and this has tended to go hand-in-hand with the OS determining resource allocation policy. As a result, the abstractions (virtual processors, uniform virtual memory, etc.) provided by conventional OSes have often turned out to be a poor match for the requirements of relational databases. This is in particular the case for commodity operating systems based on UNIX, which use the POSIX interface [Gro13], which abstracts away the underlying hardware differences (e.g., the memory hierarchy) as well as how the operating system internally manages the resources (e.g., the policies in use). Currently, if an application wants to get information about the hardware on Linux, it uses the sys and proc file systems. Windows exports the hardware knowledge in its registry, which is a key-value store that can be queried (and updated) by other OS services and the applications running atop [RS09]. Windows also provides a richer API [Mic10], which allows applications to provide hints to the OS about some resource allocations like memory or scheduling (e.g., user-mode scheduling [Mic16]), or to retrieve information about the system state. Solaris 2.6 introduced the preemption control mechanism, which allows an application thread to notify the kernel's scheduler that a preemption at that stage is undesirable (e.g., in cases where the application has recently acquired a lock and makes progress in a critical

section) [Sun02]. In the same release, Solaris also added support for scheduler activations [ABLL91] as a means to improve user-level scheduling. The virtual memory system in Apple's iOS relies on the cooperation of applications to remove strong references to objects placed in pages which are about to be swapped out. Therefore, the OS issues memory warning messages to the applications to ask them to remove strong references (e.g., data caches) [App13].

Research operating systems

The importance of the separation of mechanism and policy in an OS has a long history, going back at least to the system described in [WDA+08]. One thread of OS research has always sought to get better performance by exposing more information to applications in a controlled way. For example, Appel and Li [AL91] proposed a better interface to virtual memory that still located much of the paging policy in the kernel, but nevertheless allowed applications (specifically, garbage-collected runtimes) to do a better job of managing their own memory. Architecturally, one way to allow applications to gain more control is to remove abstractions from the kernel, and instead implement as much OS functionality (and consequently, policy) as possible in user-space libraries linked into the application. Such an approach was used in Exokernel [EKO95] and Nemesis [Han99]. While such an approach enables the creation of application-specific policies, by itself it does not solve the problem of how each application can map its requirements onto the available resources. Additionally, it pushes the complexity of dealing with the hardware up to the application, which in turn limits portability. An alternative approach was investigated by SPIN [BSP+95] and VINO [SESS96], which are extensible OS kernels that allow applications to inject policy code into the operating system, where both the mechanism and the system state required to make decisions are located. However, even on uniprocessor systems, the benefits of such approaches are debatable [DPZ97]. InfoKernel [ADADB+03] adopted a different approach to overcoming abstraction barriers and the semantic gap between OS state and application-level resource management, by exposing considerable information about the state of a conventional OS to applications. More concretely, it exported abstractions that describe the internal state of the kernel to the user-level applications, which can then reason about it and base their application-specific policies on that additional knowledge.


The Barrelfish OS extends ideas from Exokernel and InfoKernel by combining a small and relatively policy-neutral kernel per core with a novel system facility, the "System Knowledge Base" (SKB), which we used in our implementation of the OS policy engine. In the Barrelfish OS, this knowledge is used to compute optimal communication patterns over the hardware [BBD+09] and to configure hardware devices [SPB+08]. In addition, as we have already discussed in this chapter, applications can use a rich and declarative interface to add and query knowledge present in the SKB. The Barrelfish scheduler also receives hints from the applications in the form of scheduler manifests, which contain a specification of long-term requirements over all the cores, expressed as real-time scheduling parameters (e.g., worst-case execution time, deadline, scheduling period). A similar approach was also leveraged by the Helios [NHM+09] operating system, which used such manifests to express applications' preferences to execute on heterogeneous kernels on heterogeneous cores. Unfortunately, we are aware of surprisingly little such work in the operating systems community that targets databases. This is despite the fact that the detailed resource calculations that databases perform would seem to make them an ideal use case for better OS designs and abstractions.

2.6.2 Means of obtaining application’s requirements

The second aspect we discuss is how the operating system obtains information about the applications' properties and requirements. Specifically, in COD we leverage the fact that the DBMS is already in possession of a lot of knowledge regarding its algorithms, data layout and access patterns, properties of the workloads, and the SLO goals it needs to meet. Therefore, the interface allows the CSCS storage engine to push information such as cost functions down to the OS policy engine in a declarative manner. Other systems have also adopted the same method. For instance, in the networking domain, Rhizoma reasons in a declarative way about the deployment of applications on virtual nodes in the cloud. Both the underlying network (e.g., network links and node resource capacities) and the application requirements are expressed declaratively [YSC+09]. Existing systems have proposed and used other means of gathering the required information.


One alternative is to get it via an explicit request from the programmer. For instance, Poli-C [And12] extends the C language and runtime to expose novel OS resource management features to application developers. The goal is to enable the developers to tailor the resource management to their own needs, by expressing the requirements for a set of resources using a new C statement. Another option is to obtain the information using program analysis. Systems like Shoal [KARH15] use the knowledge available to a domain-specific language (DSL) to extract relevant information about data access patterns. The runtime then uses this information to make better data placement decisions on modern manycore machines. Finally, some information can also be obtained by treating the application as a black box and leveraging the online monitoring facilities offered by the hardware performance counters. We discuss this in more detail in Chapter 3.

2.6.3 Scheduling based on applications’ requirements

This is a more extensive research area, which we discuss in more detail in Chapter 3. Nevertheless, for completeness we mention here the work most immediately related to the spatio-temporal scheduling used by the SKB global allocation solver. Spatio-temporal scheduling on individual cores for longer-term assignment of applications' tasks to processors has been explored by the Tessellation operating system [LKB+09] and Barrelfish [PSB+10]. Both provide means for efficient user-level scheduling. In particular, the Lithe scheduler [PHA09] shows how complex applications comprising multiple parallel libraries can cooperatively use the allocated resources.

2.7 Summary

The interaction between operating systems and database engines has been a difficult system problem for decades. Both try to control and manage the same resources but have very different goals. The strategy of ignoring each other followed in the last decades has worked because the homogeneity of the hardware has allowed databases to optimize against a reduced set of architectural specifications, and over-provisioning of resources (i.e., running a database alone on a single server) was not seen as a problem.


With the advent of multicore and virtualization, these premises have changed. Databases will often no longer run alone in a server and the underlying hardware is becoming significantly more complex and heterogeneous. In fact, and because of these changes, both databases and operating systems are revisiting their internal architectures to accommodate large scale parallelism. Using COD as an example, we argue that the redesign effort on both sides must include the interface between the database and the OS. COD is a proof of concept built out of several prototypes and experimental systems. Yet, it illustrates very well what needs to be changed in both databases and operating systems, as well as the interface between them, to achieve a better integration.

3 Execution engine: Efficient deployment of query plans

In addition to assisting data processing applications with the deployment of their threads on diverse hardware platforms without having them absorb the hardware complexity, it is also important to do this in an efficient manner, i.e., without overprovisioning the available resources and without incurring undesired performance interaction as a result of resource interference among the application and workload mix. Even though COD lays out a promising architecture design, it has a few limitations which need to be addressed before the system can efficiently handle the problem outlined above. For instance, there is no immediate support for resource monitoring, and the algorithms in use are not tailored for the efficient consolidation of application threads. Therefore, the goal of the work presented in this chapter is to augment the available support by addressing the following research questions:

• What information is needed for efficient deployment of complex dataflows on multicore machines? From the data processing side? From the system's side?

• How can a deployment algorithm exploit that information to compute a suitable spatial placement of tasks to resources without hurting the application's performance? Can it be done in reasonable time?


As the chapter title suggests, the context of our work is to assist the execution engine in scheduling complex query plans on multicore machines. In particular, we explore how to deliver maximum performance and predictability, while minimizing resource utilization, when deploying query plans on modern hardware. To provide a concrete context and a platform for experimentation, we work on global query plans such as those used in shared-work systems (explained in Section 3.2.1) and evaluate the proposed ideas on top of SharedDB [GAK12]. Shared-work systems are good examples of the increasing complexity of query plans [ADJ+10, HA05, GAK12]. Even though most of these systems are motivated by multicores, to our knowledge no work has been done on mapping their complex plans onto modern architectures. We propose (1) the use of resource activity vectors to characterize the behavior of individual relational operators, and (2) a deployment algorithm that optimally assigns relational operators to physical cores. Experiments show that such an approach can significantly reduce the resource requirements while preserving performance across different server architectures. The material used in this chapter has been published in the Proceedings of the VLDB Endowment, Volume 8, Issue 3, in 2014 [GARH14]. The paper was done in collaboration with Tim Harris and my advisors Gustavo Alonso and Timothy Roscoe.
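As an informal preview of the first idea, a resource activity vector can be thought of as a small per-operator profile of how intensively the operator exercises different resources. The struct and the collocation rule below are only a possible illustration written for this summary; the actual definition and the way the deployment algorithm uses it are presented later in this chapter.

#include <stdio.h>

/* Illustrative per-operator profile: normalized activity in [0, 1]
 * for two resource dimensions (not the exact definition used later). */
typedef struct {
    const char *operator_name;
    double cpu;            /* compute activity          */
    double mem_bandwidth;  /* memory bandwidth pressure */
} activity_vector;

/* A simple illustrative rule: collocate two operators only if their
 * combined activity stays below capacity in every dimension. */
static int can_collocate(activity_vector a, activity_vector b) {
    return (a.cpu + b.cpu) <= 1.0 &&
           (a.mem_bandwidth + b.mem_bandwidth) <= 1.0;
}

int main(void) {
    activity_vector scan = {"shared scan", 0.9, 0.7};
    activity_vector agg  = {"aggregation", 0.3, 0.2};
    printf("collocate scan + aggregation: %s\n",
           can_collocate(scan, agg) ? "yes" : "no");
    return 0;
}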

3.1 Motivation

Increasing data volumes, complex analytical workloads, and advances in multicores pose various challenges to database systems. On the one hand, most database systems experience performance degradation with increasing data volume and workload concurrency. The loss in performance and stability is due to contention and load interaction among concurrently executing queries [JPH+09, SSGA11]. These effects will become worse with more complex workloads: for instance, with the increasing demand for operational analytics [Pla09, PWM+15]. On the other hand, a generous allocation of resources that can guarantee performance and stability leads to overprovisioning. Overprovisioning results in lower efficiency and prevents the system from leveraging the full potential of the underlying architecture. This problem is being addressed in the context of virtualization and multi-tenancy but it also exists on multicore machines [MCM13, DNLS13].


In this chapter we address the question of efficient deployment of query plans on multicore machines. The focus is, in particular, on the following two problems:

1. Consolidation of relational operators according to temporal (when operators are active) and spatial requirements (memory bandwidth, CPU demand), and

2. Specific resource allocation to the operators of a given plan, i.e., deciding which threads should be placed on which cores, rather than treating the cores as a homogeneous array of processors.

The goal is finding a deployment that minimizes the resources used, while the system maintains its performance and predictability guarantees (i.e., does not increase tail latency).

3.2 Background

The following section briefly covers the necessary background on operator-centric systems and the state of the art in scheduling shared systems, before outlining a more concrete problem statement. I then sketch the outline of the solution and provide an insight into how it all fits in the system design of COD and hence the thesis itself.

3.2.1 Complex Query Plans

Several recent systems suggest sharing of computation and data as a means to overcome resource contention and provide good and predictable performance. Such shared-work (SW) systems were first introduced at the storage engine level in the form of shared (cooperative) scans implemented in systems such as IBM Blink [RSQ+08], Crescando [UGA+09] and in MonetDB/X100 [ZHNB07]. Psaroudakis et al. [PAA13] describe the main approaches to work and data sharing:

• Simultaneous pipelining (SP) – originally introduced in QPipe [HSA05], and

• Global query plans (GQP) – introduced for joins in CJOIN [CPV09, CPV11] and then extended to support more complex relational operators in DataPath [ADJ+10] and SharedDB [GAK12].



Figure 3.1: Examples for query-centric and operator-centric execution models. Three different query types (Q1, Q2, and Q3) need to be executed. The query-centric model generates three different query plans, while the operator-centric uses data and work sharing among operators and handles the three queries in a single plan.

All these systems abandon the traditional query-at-a-time execution model (query-centric) and implement an operator-centric query execution model. Figure 3.1 provides an example illustration of both when handling three queries (Q1, Q2, and Q3). It has been motivated by the classification presented in the work by Psaroudakis et al. [PAA13]. Operator-centric systems try to maximize the sharing of both computation and data among concurrent queries using shared operators. By executing more queries in one go they achieve higher throughput, and in some cases, also more stable performance. We distinguish three common properties of these systems:

1. Operators are deployed as a pipeline (QPipe) or a dataflow graph (DataPath, SharedDB). This is particularly important because the information about dataflow relationships can help in achieving better data-locality when deploying an operator.

2. Plans are composed of shared and always-on relational operators (CJOIN, Blink, SharedDB, QPipe): the operators are active throughout the whole execution of the workload and are shared among the concurrently executing queries. Different systems leverage different techniques in order to maximize sharing of computation and data (e.g., batching, detecting common sub-plans or sub-expressions, etc.).

3. Operators can be characterized as either blocking or non-blocking, depending on whether they need the full input before starting to work or can start processing as data arrives.

We have implemented and evaluated our ideas on SharedDB. It compiles the entire workload into a single global query plan that serves hundreds of concurrent queries and updates, and can be reused for a long period of time. Sharing data and work can be easily exploited in a scalable and generic way as a result of SharedDB's batching of incoming queries and updates. SharedDB's query optimizer can automatically generate the global query plan, and currently assigns each operator thread to a different core [GMAK14].

3.2.2 Scheduling of Shared Systems

Scheduling parallel query plans on multisocket multicore machines is a challenging, multifaceted problem. Not only do modern machines add many more factors to consider (shared caches, shared processing units, NUMA regions, processor interconnects, etc.), they are also far more heterogeneous, even among processors of the same vendor. Although motivated by and designed for multicores, operator-centric database engines have not yet addressed the problem of efficient resource utilization and deployment on multicore machines. In this subsection we provide a short overview of the current approaches. Existing shared work systems have different approaches for assigning processing threads to their relational operators/operators' work units1. For instance, the QPipe system uses a micro-engine's thread pool [HSA05], SharedDB leverages per-operator threads [GAK12], while DataPath uses fixed worker threads executing specific work-units [ADJ+10]. All of these approaches, however, fix the number of threads assigned to the operators. Each thread is pinned to a particular core and there is only one thread assigned to a core. Such an implementation provides (i) predictable performance, because there is no thread migration [GAK12], and (ii) guaranteed progress [HSA05]. The cost for this performant deployment is system-wide resource overprovisioning. This problem is further aggravated by the rigidity of assigning the same amount of resources to all operators, which does not account for their individual resource footprints and requirements. Moreover, none of the approaches currently employed in operator-centric systems takes into consideration the architecture and properties of the multicore machine or the data-dependency of the shared query plan.

1The term work unit is used in the DataPath system to denote a data chunk together with relevant state information, code and meta data needed to process it.



Figure 3.2: Layout of a typical four-socket AMD Bulldozer system. On the right is the interconnect topology of the eight NUMA nodes. On the left is the layout of cores on NUMA node 0. Note that the topology and core IDs are taken from the output of the numactl --hardware command.

3.2.3 Problem Statement

Our goal is to determine how to deploy a query plan minimizing the amount of resources used while maintaining the required performance and predictability guarantees. There are two different aspects of the problem: (i) temporal and (ii) spatial scheduling.

Temporal scheduling aims at deciding which operators are suitable candidates to time-share a CPU, i.e., which can be deployed to run on the same core. The challenging part is to avoid co-locating operators that will interfere with each other, so that we maintain the required system stability and performance. An example of a suitable pair of operators that could time-share a core is a pair of pipelined blocking operators, where it is certain that the downstream operator is only active after its predecessor has finished processing the result-set. Additionally, such a deployment may also benefit from data locality.

Spatial scheduling, on the other hand, aims at determining which cores should be used for the deployment of the operators. This has two versions: collocating operators to run concurrently on the same core if one is CPU-bound and the other memory-bound, and placing operators that communicate with each other in ways that optimize the data traffic on the processor interconnect network. This is a difficult problem that is architecture-dependent and must be addressed accordingly. In order to illustrate the complexity of the problem and the need for appropriate solutions, we present a sample multicore architecture in Figure 3.2. By looking at the inter-NUMA node links one can easily grasp the difference in communication cost and redundant data traffic on the interconnect network when one places two communicating operators on cores 0 and 4, which share a last level cache (LLC) and a local NUMA node, as opposed to using core 0 on socket 0 and core 1, which is on a remote socket (one or two NUMA-hops away).
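To reason about "one or two NUMA-hops away", the deployment layer needs some measure of distance over the interconnect graph. The following is a minimal sketch (in Python; the eight-node adjacency list is purely hypothetical and does not reproduce the exact links of the machine in Figure 3.2) that derives hop counts by breadth-first search:

from collections import deque

def numa_hops(adjacency, src, dst):
    # Breadth-first search over the NUMA interconnect graph; returns the
    # minimum number of hops between two NUMA nodes.
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("inf")

# Hypothetical eight-node topology (not the machine of Figure 3.2).
adjacency = {0: [1, 2, 4], 1: [0, 3, 5], 2: [0, 3, 6], 3: [1, 2, 7],
             4: [0, 5, 6], 5: [1, 4, 7], 6: [2, 4, 7], 7: [3, 5, 6]}
print(numa_hops(adjacency, 0, 4))  # 1: directly linked nodes
print(numa_hops(adjacency, 0, 3))  # 2: one intermediate node on the path

In practice, comparable distance information can also be read off the node-distance matrix that numactl --hardware reports.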

3.2.4 Sketch of the Solution

Optimized resource management and scheduling must take into consideration (Figure 3.3):

1. The data-flow dependency graph in the query plan, which is used to determine temporal dependencies;

2. The resource footprint and characteristics of the individual operators, which are used to determine spatial placement; and

3. The properties of the multicore platform, which determine the available resources and constraints for operator placement and inter-operator communication.

The next two sections describe in detail the two main contributions: the resource activity vectors (RAVs) as a means to characterize the resource requirements of the database operators (Section 3.3, addressing points 2 and 3), and a novel algorithm (Section 3.4) mapping operators to cores (under the constraints of point 1) while maintaining the system's performance and stability. But prior to that, we briefly discuss how all this fits in the context of COD and its components.

3.2.5 In the context of COD

Figure 3.4 shows how the contributions presented in this chapter fit in the overall architecture of the co-design system COD. In addition to the OS policy engine, the system can rely on the Resource Profiler unit (RP), which uses hardware performance counters to monitor specific events which are of interest to the system.


Figure 3.3: Sketch of the solution – overview of the information flow in the deployment algorithm.

Using these measurements, the system can derive system-wide properties of the underlying resources (e.g., the DRAM bandwidth capacities of the local memory controllers, or the interconnect bandwidth between NUMA nodes), but also monitor the current utilization of certain resources (e.g., how efficiently a certain resource is being used by the scheduled jobs). Furthermore, the services of the resource profiler could also be made available to the applications running atop. Therefore, the database engine can use it to derive the application-specific resource activity vectors for its threads (e.g., the RAVs for the operator threads of the DB execution engine). Moreover, the OS policy engine can be extended to understand additional information both from the resource profiler and from the DB execution engine, such as:

1. A more complex machine model covering details about the interconnect topology and cache hierarchy, resource capacities based on the data gathered by the RP, etc.;

2. Data dependency graph (dataflow DAG) based on the query plan passed by the application (e.g., the DB execution engine); and

3. The resource activity vectors for the application threads as derived by the RP (more about it in Section 3.3).



Figure 3.4: Extensions to the COD system architecture

Finally, in addition to the CLP solver and the optimizer already present in the SKB, the system could now use the proposed approximation algorithms to compute efficient application deployments, which trade off optimality of the resource allocation solution for faster computation times. We discuss the deployment algorithm in Section 3.4.

3.3 Resource Activity Vectors

Good resource management and relational operator deployment requires awareness of the thread's resource requirements. In the last decade, we have witnessed significant progress in tuning DBMS performance to the underlying hardware [ADHW99, MBK00, LPM+13, AKN12, ZHNB07, SKC+10, BTAO13, KKL+09, ZR04, BZN05]. As a result, databases and their performance have become more sensitive to the resources they have at hand, and poor scheduling can lead to performance degradation [LDC+09, GSS+13]. In order to capture the relevant characteristics for application threads, in this chapter we introduce the concept of resource activity vectors (RAVs). At the moment, these vectors concentrate on the most important resources, CPU and memory bandwidth utilization, but could be extended to other resources if needed (e.g., network I/O utilization, cache sensitivity, etc.). This approach of characterization is inspired by the notion of activity vectors, initially introduced for energy-efficient scheduling on multicores [MSB10].

3.3.1 RAV Definition

We use RAVs to characterize the resource footprints of relational operators. They summarize the amount of resources needed such that each thread delivers its best performance. In the current implementation we consider two dimensions:

1. CPU utilization [%] represents how efficiently an operator thread uses the CPU throughout the workload execution. This is highly relevant for the deployment algorithm as it identifies the threads that are either rarely active, or when active make poor use of the CPU time. These types of threads are usually good candidates for sharing the core with another task. At the other extreme, operator threads with high CPU utilization are both active for a very long time and use the CPU efficiently, and should thus be left to run in isolation. This way their performance will not be hurt by time-sharing the CPU. We elaborate more in Section 3.3.3.

2. Memory bandwidth utilization [%] identifies both the interconnect and DRAM bandwidth consumption of the operator threads throughout the workload execution. This information is relevant for deployment because, for instance, if several bandwidth-thirsty operators are placed on the same NUMA node, one can easily hit the bandwidth limit and affect performance. Similar to the arguments used for the CPU utilization, it is not enough to look at the memory bandwidth utilization as an average over the active time only; it must be normalized over the total duration of the workload execution. Only then will we be able to avoid over-provisioning of the machine resources by reducing the significance of short-running tasks with heavy memory access requirements. We elaborate more in Section 3.3.4.

3.3.2 RAV Implementation

One approach to obtain the RAV values is to model the relational operators and use the model to deduce their behavior on a set of resources. We used one instance of this approach in Chapter 2, with the use of cost functions that model the behavior of the CSCS scan threads.


Table 3.1: Terminology for the performance counter events used to derive the RAVs

Event name              Description
cycles                  CPU clocks unhalted, i.e., the number of clock cycles when the CPU was not halted
retired_instructions    number of useful instructions the CPU executes on behalf of the program
DRAM_accesses           number of DRAM accesses by the program as measured by the memory controller(s)
system_read             number of system reads by coherency state
system_write            number of octwords written to the system
LLC_misses              number of last level cache misses

An alternative is to treat the operator threads as 'black boxes' and use available hardware instrumentation tools (e.g., the system's Resource Profiler) to derive their resource requirements. We chose the second option because we believe it offers better scalability and could also handle future hardware extensions. The operator threads were instrumented on two different architectures (introduced in Section 3.5.1). The Performance Measuring Units (PMUs) on both architectures differ in their structure, organization and supported events [Int13, Int08, DC08, Adv13]. For simplicity of argumentation, in this chapter we are only going to use generic names, as summarized in Table 3.1. The instrumentation is performed while running a representative sample of the workload in the following way: in the initial system deployment all operator threads are pinned to different cores with nothing else scheduled at the time. Every PMU has a number of registers (typically four per hardware thread) that can be used for gathering performance event counts. Because there is a limited number of such registers, we execute a number of runs to gather all data required for the RAVs. Upon completion of data gathering, we postprocess the measurements to derive the final values of the two RAV components. When the number of cores is smaller than the total number of operators, one can use the 'separate-thread' option in the profiling tool. This way the post-processing can distinguish events that occurred due to another thread's activity2.

2Note that for certain events related to any of the shared resources, the accuracy of the measured counters will be affected by the noise created by other tasks.
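Since each PMU exposes only a handful of counter registers (typically four per hardware thread), the events of Table 3.1 have to be spread over several profiling runs. A minimal sketch of such a grouping step (Python; the helper is a hypothetical illustration, not the tooling used in the thesis):

def plan_profiling_runs(events, registers_per_run=4):
    # Split the list of performance events into consecutive runs that each
    # fit into the available counter registers.
    return [events[i:i + registers_per_run]
            for i in range(0, len(events), registers_per_run)]

events = ["cycles", "retired_instructions", "DRAM_accesses",
          "system_read", "system_write", "LLC_misses"]
print(plan_profiling_runs(events))
# [['cycles', 'retired_instructions', 'DRAM_accesses', 'system_read'],
#  ['system_write', 'LLC_misses']]

With the six events of Table 3.1 this yields two runs, which matches the setup reported in Section 3.5.2.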


3.3.3 Capturing CPU Utilization

The CPU utilization needs to capture both the efficient utilization of the CPU cycles when in possession of the core, as well as the active time of the operator thread as a fraction of the total duration of the workload-sample execution. In order to derive the efficient utilization of the CPU, we calculate the IPC (instructions per cycle) value for each operator. The IPC can be calculated using the retired_instructions and cycles events. Although we work with 'always-on' operators, they are not active at the same time, for instance, because of a data-dependency. Furthermore, we expect to see discrepancies among operators when comparing their active runtime. As mentioned earlier, the deployment algorithm should consider the substantial difference between long- and short-running threads and handle each class accordingly. The active runtime of a thread is derived as the ratio of the total number of unhalted CPU cycles (cycles) measured on that core to the total duration of the sample workload execution. The formula used to calculate the CPU utilization for the RAVs normalizes the thread's IPC value with its active runtime, as presented in Equation (3.1). Please note that after simplifying the formula, the CPU utilization is only dependent on the measured number of retired_instructions events and the total_cycles (duration of the workload sample). Also note that the latter is constant for all operators in the same query plan.

\[
CPU_{util} = IPC \times \text{active time}
           = \frac{\text{retired\_instructions}}{\text{cycles}} \times \frac{\text{cycles}}{\text{total\_cycles}}
           = \frac{\text{retired\_instructions}}{\text{total\_cycles}} \tag{3.1}
\]
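For illustration, Equation (3.1) can be computed directly from the gathered counters. The snippet below (Python, with hypothetical variable names) mirrors the derivation:

def cpu_util(retired_instructions, cycles, total_cycles):
    # CPU utilization [%] of an operator thread (Equation 3.1).
    ipc = retired_instructions / cycles    # efficiency while holding the core
    active_time = cycles / total_cycles    # fraction of the sample the thread was unhalted
    return 100.0 * ipc * active_time       # == 100 * retired_instructions / total_cycles

As in the text, the cycles terms cancel, so only the retired instructions and the sample duration matter.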

3.3.4 Capturing Memory Utilization

The values for memory bandwidth utilization can be calculated in a similar fashion. The first factor is the measured data bandwidth transfer (in bytes) for each operator thread. We gather information for both the interconnect and the DRAM bandwidth utilization. The events used for data gathering and the exact formulas used for deriving these values are highly architecture-dependent. While on AMD the core PMUs can gather specific events that capture both DRAM and system accesses [DC08], on Intel one is limited to using approximation formulas based on the last level cache (LLC) [Int08]. The second factor, just as in the CPU utilization case, is the active runtime of the operator threads. The total duration of the workload sample is used to normalize the memory bandwidth requirements of the threads of interest. Equation (3.2) shows the derivation for memory bandwidth utilization:

\[
MEM_{util} = \text{bandwidth} \times \text{active time}
           = \frac{\text{bytes}}{\text{cycles}} \times \frac{\text{cycles}}{\text{total\_cycles}}
           = \frac{\text{bytes}}{\text{total\_cycles}} \tag{3.2}
\]

The bandwidth utilization of an operator is only dependent on the amount of data transferred over the duration of the workload sample (expressed as total_cycles). For simplicity, the total number of bytes transferred is the sum of the interconnect and DRAM bandwidth consumption (events DRAM_accesses, system_read, system_write). Note that with this particular profiling setup ('operator-per-core' deployment, and distributing all operators across the cores), the memory and in particular the interconnect bandwidth utilization for the operators is overestimated. If the operators are placed on the same NUMA region, it is likely that most of the data-transfer would occur via the shared last-level cache.
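Analogously, Equation (3.2) reduces to a sum of byte counts over the sample duration. A sketch (Python; expressing the result as a percentage of a node's peak bandwidth, e.g., as measured with a STREAM-like microbenchmark, is an additional assumption not spelled out in the equation itself):

def mem_util(dram_bytes, system_read_bytes, system_write_bytes,
             total_cycles, peak_bytes_per_cycle):
    # Memory bandwidth dimension of a RAV (Equation 3.2): total traffic
    # (DRAM plus interconnect reads/writes) normalized by the sample duration,
    # then expressed relative to the node's measured peak bandwidth.
    bytes_per_cycle = (dram_bytes + system_read_bytes + system_write_bytes) / total_cycles
    return 100.0 * bytes_per_cycle / peak_bytes_per_cycle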

3.3.5 Parallel Operators

Until now, we have discussed how to capture the resource footprints and requirements of single-threaded operators. We assume that the degree of parallelism assigned to an operator is decided by the optimizer. Multithreaded operators are then supported in a similar manner – we just consider the individual operator threads as separate entities to be scheduled and hence RAV-annotated. In fact, in the context of SharedDB, we already do this for the scans. The performance of a scan operator, like Crescando (used in SharedDB), can be improved by increasing the number of scan threads working on horizontal partitions. In our experiments we RAV-annotate and schedule each scan thread separately.



Figure 3.5: Overview of the deployment algorithm

3.4 Deployment algorithm

The deployment algorithm we propose aims at delivering a deployment plan of the operators that:

• Minimizes the computational and bandwidth requirements of the query plan;

• Provides NUMA-aware deployment of the relational operators; and

• Enhances data-locality.

As presented in Figure 3.5, the algorithm consists of four phases: (1) operator graph collapsing, (2) bin-packing of relational operators into clusters based on the CPU utilization dimension of the RAVs, (3) bin-packing of the operator-clusters (the output of (2)) onto a number of NUMA nodes based on the RAVs' memory bandwidth utilization dimension as well as the capacity of the NUMA nodes, and (4) deployment mapping of the computed number of NUMA nodes onto a given multicore machine.


The first two phases compute the required number of cores, which corresponds to the temporal scheduling subproblem, the third phase approximates the minimum number of required NUMA nodes, and the fourth computes the final placement of the cores on the machine such that it minimizes bandwidth usage – the spatial scheduling subproblem.

3.4.1 Operator Graph Collapsing

The first phase of the algorithm takes as input an abstract representation of the complex query plan – a dataflow graph of database operators. It iterates over the operators in the dataflow graph and compacts each operator-pipeline into one so-called compound operator. An operator-pipeline is characterized by non-branching dataflow between the operators belonging to the pipeline. An example of such a pipeline is presented in Figure 3.6. This phase of the algorithm targets operator-pipelines with blocking operators. Here, by design, there is a guarantee that the involved operators will never overlap each other's execution and, as a result, can be temporally ordered.


Figure 3.6: The highlighted rectangle illustrates: (left) an example of an operator pipeline that can be collapsed into one compound operator, (right) the resulting compound operator. For readability, only the operator-pipeline and the corresponding compound operator are RAV-annotated.

Since there is a temporal ordering between the blocking operators belonging to the same operator-pipeline, one can easily think of them as an atomic scheduling unit (i.e., they can be grouped into one compound operator). This way the scheduler can safely place all such operators to run on the same set of resources, one after another. Additionally, the new compound operator is expected to have better data locality. Scheduling the component operators sequentially on the same core will leverage the warm data caches and reduce unnecessary memory movement. The newly composed compound operator inherits the RAV characteristics of its components as presented in equations (3.3) and (3.4).

\[
C.\text{cpu\_util} = \sum_{i \in P} i.\text{cpu\_util} \tag{3.3}
\]

\[
C.\text{mem\_util} = \sum_{i \in P} i.\text{mem\_util} \tag{3.4}
\]

where C denotes the compound operator, and P denotes the set of all operators belonging to the operator-pipeline. Both dimensions of the compound operator's RAV are computed as the sum of the values of the corresponding dimensions of its components. The formulas are a direct consequence of the definitions of CPU and memory bandwidth utilization in equations (3.1) and (3.2). Intuitively, the compound operator now has to execute the cumulative number of instructions of its components in the same amount of time (total_cycles). The same reasoning applies for the total number of bytes transferred.
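A small sketch of the collapsing step under Equations (3.3) and (3.4) (Python; the RAV container is illustrative):

from dataclasses import dataclass

@dataclass
class RAV:
    cpu_util: float   # CPU utilization dimension, Equation (3.1)
    mem_util: float   # memory bandwidth dimension, Equation (3.2)

def collapse_pipeline(pipeline):
    # Fold a non-branching pipeline of blocking operators into one compound
    # operator by summing both RAV dimensions (Equations 3.3 and 3.4).
    return RAV(cpu_util=sum(op.cpu_util for op in pipeline),
               mem_util=sum(op.mem_util for op in pipeline))

# Consistent with the annotations in Figure 3.6: components <15,05> and <30,10>
# combine into a compound operator with RAV <45,15>.
print(collapse_pipeline([RAV(15, 5), RAV(30, 10)]))   # RAV(cpu_util=45, mem_util=15)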

3.4.2 Minimizing Computational Requirements

Once the original set of operators has been compressed by collapsing the operator-pipelines into compound operators, the subsequent phases of the deployment algorithm operate on a smaller set of operators. Please note that due to the 'always-on' nature of the operators of the shared-work systems, the operators of the new set can no longer be temporally ordered, i.e., we have to assume that they could run concurrently. The second phase iterates over this new set of operators in search of suitable clusters of operators that can be safely placed on the same CPU core. This clustering is based on the values of the CPU utilization component of the operators' RAVs. The goal is to determine the minimum number of CPU cores needed to accommodate all relational operators. Note that the second dimension of the RAVs does not play a role here. This is because the operator clustering solely determines which operators can be placed safely on the same CPU core. The operators belonging to the same cluster technically only share CPU time. Hence, they will never be executed concurrently and memory bandwidth is not going to be shared. This phase of the algorithm is an instance of the bin packing problem, defined as follows:

Definition. Items of different sizes must be packed into a finite number of bins, each of capacity X, in a way that minimizes the number of bins used.

Even though it is an NP-hard problem, there are many approximation algorithms that give nearly optimal solutions in polynomial time [CGJ97]. Since we know the items (operators) and their properties in advance, we can use offline bin packing solutions. Therefore, in our algorithm, this phase is implemented using an adapted version of one of the simplest heuristics – the first-fit decreasing (FFD) algorithm.

Data: List of items
Result: Number of bins, contents of bins

 1  SortDecreasing(items);
 2  BinPacking(items)
 3  for items i = 1, 2, ..., n do
 4      for bins j = 1, ... do
 5          if item i fits in bin j and i's sibling is not in bin j then
 6              pack item i in bin j
 7              break the loop, and pack the next item
 8          end
 9      end
10      if item i was not placed in any bin then
11          create new bin k
12          pack item i in bin k
13          add bin k to bins
14      end
15  end
16  return number of bins, and bins' contents

Algorithm 1: FFD bin-packing algorithm


Algorithm 1 provides a pseudo-code implementation of the bin-packing algorithm. As a first step, the algorithm sorts the input items in decreasing order based on their CPU utilization RAV dimension (line 1). It then invokes the BinPacking procedure, which iterates over all the items and checks whether an item 'fits' in one of the existing bins. The original algorithm only checks if the item fits, but for our use-case we added an additional check related to our support for parallel operators, i.e., the algorithm now also checks whether another thread of the same operator (aka a sibling item) was previously placed in the same bin, and avoids the bin if so (lines 5-7). Finally, if an item was not placed in any of the existing bins, the algorithm creates a new bin and packs the item in that bin (lines 10-14). The FFD algorithm has been proven to have a tight approximation bound of 11/9 · OPT + 6/9, where OPT is the optimal number of bins [D´os07]. Eventually, the algorithm returns an approximation of the minimum number of cores that can fit all operator threads such that the computational requirements are met.
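To make the adapted heuristic concrete, the following is a self-contained sketch of Algorithm 1 (Python; the item and bin representations are illustrative, not the thesis prototype's code):

def ffd_pack(items, capacity=100.0):
    # Adapted first-fit decreasing bin packing (Algorithm 1).
    # items: (operator_id, sibling_group, cpu_util) tuples; threads of the same
    # parallel operator share a sibling_group and must not share a bin (core).
    bins, loads = [], []
    for item in sorted(items, key=lambda it: it[2], reverse=True):   # line 1
        _, group, util = item
        for j, content in enumerate(bins):                           # lines 3-4
            fits = loads[j] + util <= capacity
            no_sibling = all(other[1] != group for other in content)
            if fits and no_sibling:                                  # lines 5-7
                content.append(item)
                loads[j] += util
                break
        else:                                                        # lines 10-14
            bins.append([item])
            loads.append(util)
    return bins                                                      # line 16

# Example: two scan threads of the same operator are kept on separate cores.
ops = [("scan.0", "scan", 60), ("scan.1", "scan", 55),
       ("join", "join", 30), ("group-by", "gb", 8), ("sort", "sort", 5)]
for b in ffd_pack(ops):
    print([op_id for op_id, _, _ in b])
# ['scan.0', 'join', 'group-by']
# ['scan.1', 'sort']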

3.4.3 Minimizing Bandwidth Requirements

The third phase of the algorithm operates on the following input:

• Model of the internal NUMA-node properties: (1) number of cores and (2) local bandwidth capacity.

• Memory bandwidth requirements of the operator clusters (bins), which were computed during the previous phase of the algorithm. One can compute the resource requirements of these operator clusters in a similar manner as for the compound operators (equations (3.3) and (3.4)).

The goal is to compute the minimum number of NUMA nodes required to accommodate all operator-clusters and their bandwidth requirements given the node’s capacity constraints. This can be formalized as another instantiation of the bin packing problem: the items to be packed are the operator-clusters and the bins are the NUMA nodes. The capacity of the bins is determined by the maximum attainable local DRAM bandwidth (determined by microbenchmarks). Furthermore, the bins are also constrained on the cardinality of items they can accommodate, i.e., the number of CPU cores on the corresponding node.


As such, it is an instantiation of the cardinality-constrained offline two-dimensional bin packing problem. We can, thus, use the same FFD algorithm to compute an approximate solution. The only modifications needed are when evaluating whether an item can fit in a certain bin, and when updating the corresponding data structures (lines 5-6 of Algorithm 1).
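The modified fit check of lines 5-6 then has to test two dimensions plus the cardinality bound. A minimal sketch (Python; per-node capacities are assumed to come from offline measurements such as a STREAM-like microbenchmark):

def fits_numa_node(node_bw_load, node_used_cores, cluster_bw,
                   node_bw_capacity, node_core_count):
    # Cardinality-constrained two-dimensional fit check: a core-level operator
    # cluster fits on a NUMA node only if enough local memory bandwidth and a
    # free core remain.
    return (node_bw_load + cluster_bw <= node_bw_capacity and
            node_used_cores + 1 <= node_core_count)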

3.4.4 Deployment Mapping

The last phase of the deployment algorithm computes the final placement of the operator- clusters onto actual CPU cores of the multicore machine. It uses the output of the previous phase, which computed the minimum number of NUMA nodes required to accommodate the operator-clusters. If one NUMA node is sufficient, the output is trivial – any subset of the cores belonging to the same NUMA node will do, provided it is of cardinality k, where k denotes the required number of CPU cores (output of phase (2)). Here we explain the steps needed should the number of NUMA nodes be larger than one. In order to determine the optimal mapping, this phase first models the multicore’s NUMA interconnect topology as a graph.

Definition. Let G(V, E) represent an undirected graph, where the set of vertices V corresponds to the set of all NUMA nodes and the set of edges E is composed of all direct links between the NUMA nodes. Furthermore, G allows at most one edge (link) between a pair of NUMA nodes.

As initially presented in Section 3.2.2 (Figure 3.2), two communicating operators should ideally be placed close to each other so that we reduce data-access latency and interconnect bandwidth usage. Thus, if it is not possible to accommodate them on a single NUMA node, then priority should be given to the neighboring nodes. As an example, we present the following problem: it has been determined that the deployment of a certain query plan on a given machine needs four NUMA nodes. To demonstrate the generality of the approach, let us assume that the interconnect topology of the machine is not symmetric and looks like the one shown in Figure 3.7. We have denoted with D1 and D2 two of the many possible deployments (subgraphs) encapsulating four NUMA nodes. Ideally, the deployment algorithm should return deployment D1 as a preference, because that way all operators will be at the shortest possible distance from each other (1 hop), as opposed to the other alternative where the average distance between the NUMA nodes is 1.33 hops.



Figure 3.7: Example of two possible deployments (D1,D2) of 4 NUMA nodes within an eight-node AMD Bulldozer machine. The asymmetric topology means deployment D1 is preferable to D2.

Therefore, the algorithm needs to also quantify how close the nodes of a certain subgraph are. In order to do that it leverages the concept of graph density.

Definition. Graph density ($d_G$) of a graph $G(V_G, E_G)$ is defined as the number of edges divided by the number of vertices, or more formally:

\[
d_G = \frac{|E_G|}{|V_G|} \tag{3.5}
\]

Using this metric we can formalize the problem that needs to be solved in this phase as an instantiation of the densest k-subgraph problem. Khuller and Saha [KS09] proved that the problem is NP-hard, but given the small size of our graph this is still within acceptable boundaries. Given a multicore machine, a prephase of our naïve implementation iterates over all subgraphs of size k and computes the density of each. The desired deployment is then obtained by querying for the subgraph with the highest density. The final operator-to-core deployment mapping is chosen accordingly.
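For the eight-node topologies considered here, exhaustive enumeration is cheap (there are only C(8,4) = 70 candidate subsets). A naive sketch of this prephase (Python; the edge list is a hypothetical topology, not the asymmetric machine of Figure 3.7):

from itertools import combinations

def densest_k_subgraph(nodes, edges, k):
    # Exhaustively pick the k NUMA nodes whose induced subgraph has the
    # highest density |E_G| / |V_G| (Equation 3.5).
    best, best_density = None, -1.0
    for subset in combinations(nodes, k):
        chosen = set(subset)
        internal = sum(1 for u, v in edges if u in chosen and v in chosen)
        if internal / k > best_density:
            best, best_density = subset, internal / k
    return best, best_density

edges = [(0, 1), (0, 2), (0, 4), (1, 3), (1, 5), (2, 3), (2, 6),
         (3, 7), (4, 5), (4, 6), (5, 7), (6, 7)]
print(densest_k_subgraph(range(8), edges, k=4))
# ((0, 1, 2, 3), 1.0): a tightly connected four-node neighbourhood is preferred.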


3.4.5 Discussion and Possible Extensions

With the deployment algorithm presented in this section we leverage the dataflow information from the database query plan, the RAV properties of the operators, and the NUMA model of the multicore machine. Using this input the algorithm is able to significantly reduce the total number of resources required by a query plan and at the same time avoid contention for the most critical resources: CPU cycles and memory bandwidth. Note that ideally the RAVs should be extended to also capture the actual memory requirements of the relational operators. Then we can extend (1) the bin packing algorithm in the third stage to also account for the DRAM capacity on the NUMA nodes, and (2) the deployment mapping to also account for available memory on the chosen NUMA nodes. For other workloads and/or hardware architectures, it may also be required to extend the RAVs to capture other resource requirements (e.g., when deploying threads on machines with multithreaded cores where even the private caches are shared). Another immediately applicable enhancement is the following: the chosen implementation of the bin-packing algorithm (sorted first fit) can provide a good approximation of the minimum number of cores/NUMA nodes required, but does not guarantee that all the bins will be equally balanced. Consequently, one can extend it with an additional epilogue phase that performs load-balancing of the content in the bins. As we said, the proposed ideas, concepts and algorithms can easily be integrated as part of COD. Using the OS's system-wide knowledge of the current resource utilization among all running tasks, the deployment algorithm can suggest efficient resource allocation even in noisy system environments.

3.5 Evaluation

In this section we show that the performance, stability and predictability of the query plan remain unaffected despite the heavy reduction in the allocated resources by the deployment algorithm. We evaluate the performance of the deployment of a TPC-W query plan on different dataset sizes on two different multicore architectures. We demonstrate the accuracy of the RAV characterization, and conclude the section with an analysis of the different phases of the deployment algorithm.


3.5.1 Experiment Setup

Infrastructure: For our experiments we used two multicore machines from different vendors in order to compare the influence of their architectures both on the operators’ RAVs and the outcome of the deployment algorithm. The machines used are:

1. AMD MagnyCours: Dell 06JC9T board with four 2.2 GHz AMD Opteron 6174 processors. Each processor has two 6-core dies. Each die has its own 5 MB LLC (L3 cache) and a NUMA node of size 16 GB. The operating system used was Ubuntu 12.04 amd64 with kernel 3.2.0-23-generic.

2. Intel Nehalem-EX: Supermicro X8Q86 board with four 8-core 1.87 GHz Intel Xeon L7555 processors with hyperthreads. The hyperthreads were not used for the experiments. Each processor has its own 20 MB LLC (L3 cache) and a NUMA node of size 32 GB. The operating system used was Linux with kernel 3.9.4 (64-bit).

The clients were running on four machines with two 2.26 GHz Intel Xeon L5520 quadcore processors and a total of 24 GB RAM.

Workload: In order to evaluate the deployment algorithm for a query plan we used SharedDB's global query plan for the TPC-W benchmark [GG03]. The TPC-W workload consists of 11 web-interactions, each consisting of several prepared statements, which are issued based on the frequencies defined by the TPC-W browsing mix. The query parameters were also generated as defined in the TPC-W specification. SharedDB's TPC-W query plan has 44 operators, as presented in Figure 3.8. In the figure we marked all 44 operators with an ID which corresponds to the core-ID assigned in the initial deployment. Please note that the range of the IDs is 0-47, but some core IDs (4 to be exact) are skipped3. The storage engine operators work with data local to their threads, and the internal logic operators get their input from their predecessors and generate data on their local NUMA node. We used two dataset sizes: 5 GB (1.2k emulated browsers (EBs), 100k items) and 20 GB (5k EBs, 100k items).

Setup: Every experiment was run for 10 minutes, plus two minutes dedicated to warm-up and cool-down phases. The record logs obtained in the warm-up and cool-down phases were not taken into account in the final results presented here. The profiling on both machines was done with Oprofile (operf).

3Due to difference in the core-ID mapping between the application and the OS


Figure 3.8: TPC-W shared query plan – as generated for SharedDB


Metrics: Throughput is reported in Web Interactions Per Second (WIPS), and all reported latency values are in seconds (s). The maximum attainable bandwidth is not taken from the machine specifications but measured with the STREAM benchmark [McC95]. In the figures it is expressed in gigabytes per second (GB/s).

3.5.2 Resource Activity Vectors (RAVs)

The RAVs are derived from statistics gathered by profiling the operator threads. Here we present a breakdown of the factors that constitute the values for the CPU and memory bandwidth utilization dimensions. We measured cycles and retired_instructions to derive the IPC values. Moreover, in order to calculate the memory bandwidth consumed we used the formulas as presented in [DC08] and collected the following events: system_read, system_write and DRAM_accesses. The last one included measurements of two different DRAM channels (DCT0 and DCT1). The per-core PMUs on AMD MagnyCours can collect up to four events per run, so in total we needed two runs to derive the RAV properties for the operators. The results presented in Figure 3.9 illustrate the measured events for the operator threads (with IDs 0-47 on the x-axis) from an experiment run on the AMD MagnyCours on the 20 GB dataset. We show the values for IPC (3.9a) and memory bandwidth (3.9b) (consisting of DRAM, system-read, and system-write bandwidth). From these two graphs, one can notice the variety in the distribution of resource consumption among the different operators. As discussed in Section 3.3, just looking at the raw performance metrics (presented in Figure 3.9a and Figure 3.9b) is not enough to make an effective decision on the threads' resource requirements. One must also consider the total active time for each operator thread, and normalize the derived values for both RAV dimensions accordingly. Figure 3.9c presents the active runtime of the operators with respect to the total duration of the experiment. It shows that only a few operator-threads are actively using the CPU time. This is an important observation, as it emphasizes the large number of idle threads and the opportunity for resource consolidation. The final values for CPU and memory bandwidth are presented in Figure 3.9d. Please note that the memory bandwidth utilization no longer contains the bandwidth breakdown of the individual measurements, but rather considers their sum.


Figure 3.9: Understanding the derivation of RAVs, AMD MagnyCours, 20 GB dataset. The four panels plot, per operator ID: (a) Instructions per Cycle (IPC), (b) memory bandwidth [GB/s] (DRAM, system read, and system write), (c) the active runtime distribution (CPU activity [%]), and (d) the resulting Resource Activity Vectors (CPU and memory utilization [%]).


Figure 3.10: Analyzing the impact of varying dataset size on the CPU utilization dimension of the RAVs for different operators (marked by their Operator IDs) shown on the x-axis.

Figure 3.9d shows that both the CPU and memory bandwidth utilization values look significantly different from the raw performance/resource metrics in Figures 3.9a and 3.9b. This confirms that the resources of the machine are overprovisioned and that there is room for improvement. In the rest of the evaluation section we focus on the CPU utilization dimension of the RAVs. The same observations also hold for memory bandwidth utilization.

Impact of Dataset Size on RAVs

This subsection analyzes the effect that dataset size has on the operators' RAVs. The experiments were done on the AMD MagnyCours using datasets of size 5 GB and 20 GB. A summary of the output of the experiment is presented in Figure 3.10. It displays the derived CPU utilization values for the two experimental configurations. For readability, in the figure we only present the values (in decreasing order) for the operators with CPU utilization higher than 5 percent (in this case the top 19). Each row on the x-axis denotes the operator-IDs for the corresponding experiment run. As shown in the figure, the distribution of the CPU utilization varies with the changes in the dataset size. Intuitively, the larger dataset puts more strain on some of the scan operators (operator IDs 0 and 1), while the CPU-heaviest join (ID 16) is busier in the smaller dataset. The difference in CPU utilization for the other operators in both datasets is almost negligible.



Figure 3.11: Analyzing the impact that workload execution on a different HW architecture may have on the operators' CPU utilization. Experiment done on the 20 GB dataset.

The difference in both distribution and absolute values of the CPU utilization influences the output of the deployment algorithm. In this case their effects canceled each other out and consequently the second phase of our algorithm derived the same number of bins (cores) for both configurations – six.

Influence of Architecture on RAVs: Intel vs AMD

Another important factor that influences the values of the RAVs is the underlying multicore architecture. In order to show its impact on the RAVs we executed the same workload with dataset size of 20 GB on the two different machines introduced in Section 3.5.1 (AMD MagnyCours and Intel Nehalem-EX). In Figure 3.11 we present the results of the CPU utilization of the operators in decreasing order. We focus on operators with CPU utilization higher than 5 percent (i.e., the top 19). Once again, there are two x-axes denoting the operator-IDs on the corresponding architectures. On Intel we observe that the heavy scan operators (IDs 0,1,13,14) have higher utilization of the CPU, but also that fewer operators have CPU utilization higher than 5 percent (only the first 8). The observed difference in both distribution and absolute values confirms the benefits of using the RAVs as a means to detect architecture-specific sensitivity. For this setup the deployment algorithm allocated five cores for the run on Intel Nehalem-EX and six for AMD MagnyCours.


Table 3.2: Performance of default vs. compressed deployment

#   Architecture     #Cores   Throughput [WIPS]     Response Time [s]
    (exp. config)             Average (stdev)       Average (stdev)     50th    90th    99th
1   AMD (20 GB)      6        428.07 (+/- 32.80)    14.62 (+/- 0.76)    15.36   23.73   36.13
2                    44       425.86 (+/- 54.34)    14.69 (+/- 0.85)    14.59   22.93   36.08
3   OS baseline      48       317.30 (+/- 31.11)    20.81 (+/- 2.55)     8.22   72.43   82.03
4   AMD (5 GB)       6        645.71 (+/- 38.24)     8.41 (+/- 0.46)     7.00   16.44   19.69
5                    44       703.51 (+/- 55.66)     7.38 (+/- 0.55)     5.65   14.81   17.87
6   Intel (20 GB)    5        362.62 (+/- 62.16)    18.05 (+/- 2.77)    18.35   31.73   43.94
7                    32       386.97 (+/- 59.34)    16.70 (+/- 2.47)    16.95   28.03   41.93

3.5.3 Performance Comparison: Baseline vs Compressed deployment

The following experiment compares the performance of the query plan when deployed on a compressed set of resources (based on the output of our algorithm) versus the approach using an operator-per-core deployment. As a baseline we use the performance when the deployment of operators is handled by the scheduler of the operating system. The results show that both the performance and latency of the query plan are unaffected by the heavy reduction of resources allocated by the deployment algorithm. This can be observed for a range of workload configurations and architectures (see Table 3.2). The table summarizes the performance expressed both in throughput (WIPS) and latency (s). In order to capture the stability of the system, alongside the aggregated average we also present the standard deviation in parentheses. Additionally, for the latency measurements we present the 50th, 90th and 99th percentile of the requests' response time. The presented values for throughput and latency confirm that the resulting system performance was not compromised by the significant reduction in allocated resources, which is important for databases and their SLOs. The performance of the query plan when the OS scheduler was in charge of deployment (displayed in row 3 in Table 3.2) is poorer, both in terms of absolute values and stability, than the other two approaches, including the operator-per-core deployment (row 2). The latter suggests that performance can be affected as a result of thread migration, as this is the only configuration which does not pin threads to cores.


Table 3.3: Performance/Resources efficiency savings

Machine            Dataset size   Savings factor
AMD MagnyCours     5 GB           x6.73
AMD MagnyCours     20 GB          x7.37
Intel Nehalem-EX   20 GB          x5.99

Performance/Resource Savings Ratio

In order to better highlight the significance of the gain in deployment efficiency we introduce a new metric: the performance/resource efficiency savings factor. It is calculated by normalizing the measured throughput (Tput) based on the amount of allocated resources (Res). More specifically, we use the following formula:

\[
\text{savings factor} = \frac{Tput_c}{Res_c} \times \frac{Res_b}{Tput_b} \tag{3.6}
\]

where the subscript c denotes the compressed, and b the operator-per-core deployment. Table 3.3 illustrates the savings factor obtained by the compressed deployment for the different workload setups. The gain in performance/resource efficiency (calculated from the throughput values in Table 3.2) is usually in the range of 6-7x compared with the operator-per-core deployment. In other words, with our deployment algorithm the query plan can achieve the same performance by using only 14% of the resources, or even less when compared to the baseline OS operator scheduling.
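As a worked example, plugging the AMD MagnyCours 20 GB values from rows 1 and 2 of Table 3.2 into Equation (3.6) reproduces the factor reported in Table 3.3:

\[
\text{savings factor} = \frac{428.07\ \text{WIPS}}{6\ \text{cores}} \times \frac{44\ \text{cores}}{425.86\ \text{WIPS}} \approx 7.37
\]

That is, the compressed deployment delivers roughly 7.4 times more throughput per allocated core than the operator-per-core deployment.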

3.5.4 Analysis of the Deployment Algorithm

The deployment algorithm computes an approximation of the minimum number of cores needed by the query plan, and a mapping of how to optimally choose cores on the given multicore architecture in order to reduce bandwidth consumption. As presented in Figure 3.5, it consists of four phases, and each phase of the algorithm contributes to the final result in a different way:

Phase (1) reduces the total number of operators by collapsing the operator-pipelines to compound operators. In the case of the TPC-W global query plan, this phase decreases the number of operators from the original 44 down to 32. A second contribution is the constructive cache sharing between the operators in the original operator-pipelines. These are now scheduled on the same core and benefit from data-locality.


Table 3.4: Evaluating the design choices of algorithm phases

Phase    Deployment layout    #Cores   Throughput [WIPS]
         same NUMA            6        428.07 (+/- 32.80)
(2)      same NUMA            5        366.08 (+/- 51.01)
(3,4)    different NUMA       6        401.84 (+/- 33.21)

Phase (2) is responsible for a more aggressive reduction in the allocation of computational resources. It further decreases the total number of required cores from 32 down to 6 or 5, depending on the architecture. The optimality of the algorithm was briefly addressed in Section 3.4. In order to evaluate the accuracy of the output of this phase, we did another experiment on the AMD MagnyCours (20 GB) where we took the content of the sixth bin and evenly distributed it across all other bins (i.e., we tested a deployment on a smaller number of cores). The results are presented in the row dedicated to Phase (2) in Table 3.4. The first row shows the system's performance based on the output of the algorithm, and we use it as a baseline for comparison. Comparing the first two rows confirms that both the absolute performance and the stability of the system decrease when reducing the number of cores allocated from six down to five.

Phases (3) and (4) compute the final placement of operators to cores. In order to evaluate the heuristic and the importance of the actual core placement on the machine, we compare the output of our algorithm (the heuristic being: if there is no contention for memory bandwidth, place the bins as close as possible) to the other extreme where every bin is placed on a different NUMA node. The results are presented in the row dedicated to Phase (3,4) in Table 3.4. The difference in performance favors the heuristic approach, although the results are still within the error bars. We expect that it will have a higher impact when dealing with more memory-bandwidth-intensive workloads.


3.5.5 Discussion

The results confirmed that RAVs successfully characterize the properties of relational operators, and can accurately capture the changes in resource requirements for different dataset sizes. Furthermore, the evaluation on the two machines (Intel Nehalem-EX and AMD MagnyCours) points out that both the operators' resource requirements and consequently the optimal resource allocation and query plan deployment are architecture-dependent. Most importantly, the performance of SharedDB on the compressed set of resources (compressed deployment) is as good as the performance on the operator-per-core deployment, both in terms of throughput and latency and in their stability, and it is better when compared to the OS operator scheduling baseline. Finally, we emphasized the significance of efficient resource utilization by delivering a performance/resource savings factor of 6-7x compared to the baseline.

3.6 Generalizing the approach

3.6.1 Parallel Operators

In SharedDB and its original operator-per-core deployment policy, because of the 'always-on' nature of the operators, even though it was conceptually possible, in practice one was not able to parallelize all operators at the same time, simply because there were not enough contexts available on current multicores. Therefore, one had to choose to parallelize only a few heavy operators (in our case the scans) that could improve the performance. One of the benefits of the work presented in this chapter is that it allows the system to improve its performance by adding more resources to the existing operators, as a result of the reduction of the amount of resources allocated to the original query plan. We have already shown how multithreaded operators are supported with our deployment algorithm and RAV-annotations. Here, in order to demonstrate the immediate benefits of the smart deployment, without having to parallelize the internal 'logic' operators, we increased the number of threads of all storage (scan) operators and replicated the internal 'logic' operators. Note that the KV store operators were neither parallelized nor replicated. As the original plan deployment fit on one NUMA node, we deployed every new replica on a new NUMA node.



Figure 3.12: Evaluating how the system throughput scales when replicating the inner operators of SharedDB’s TPC-W global query plan. The graph shows values from two experiment runs with 5 GB and 20 GB dataset sizes.

Figure 3.12 summarizes the performance of the system in terms of throughput as we increase the number of replicas in the system. The experiments were executed on the AMD MagnyCours machine, on two datasets of 5 GB and 20 GB. The results indicate that, with the simple technique of plan replication and parallelization of the scan operators, the system scales almost linearly up to four replicas, and achieves a scale-up of almost six when using eight replicas. This observation is in line with the performance/efficiency factors we measured in the previous section (also in the range of six to seven, depending on the underlying hardware platform and working set size). We would also like to point out that this performance improvement was achieved without any system fine-tuning.

3.6.2 Dynamic Workload

To discuss how our approach can address dynamic workloads, we first distinguish three different types of dynamism in a workload:


1. Known-in-advance queries vs. query-types, where in the latter case the known part is implemented in the form of JDBC-like prepared statements and the dynamic part are the changing parameters used in the statement;

2. Changes in the workload distribution, which refers to the percentage of query types being present in the workload mix; and

3. Ad-hoc queries and their arrival rates and distribution.

In this chapter we presented a solution that handles the first type of workload dynamism, which follows directly from the design and implementation of the SharedDB system. Nevertheless, given the reduction of the amount of resources allocated to the global query plan, it is immediately possible to handle incoming ad-hoc queries by running them on the side, using some of the extra, now available, resources. The only dynamism that is not supported by the current implementation of the global query plan and its static deployment of operators is the second one: changes in the distribution of query types in the workload mix, which will affect the hot paths/spots in the query plan. We believe, however, that this type of dynamism cannot be addressed solely from the resource allocation perspective, but must also involve the optimizer. Therefore, we think that adding support for continuous monitoring of the operators' activity and computing the RAVs at regular intervals can assist the DBMS and its optimizer by providing additional information about potential bottleneck operators and hot paths in the system. This way the optimizer can adapt the global query plan and use the same deployment algorithm to re-deploy its operators.

3.6.3 Non-shared (Traditional) Database Systems

Finally, we address how some of these ideas can be used in more traditional (non-shared) data processing systems. First, to our knowledge, there is very little work focusing on the deployment and scheduling of query plans on multicore machines. While in this chapter we focused on shared-work systems, the approach of using RAVs and basing the deployment on temporal and spatial considerations can also be used in other, conventional systems. RAVs can be obtained by instrumenting and observing the ongoing execution of queries (similar to how, e.g., selectivity estimates, result caching, hints for indexing, and data statistics are collected today). Once the RAVs are known, the query optimizer can be extended to

consider the CPU and memory requirements of the operators and use the algorithm proposed in this work to identify how many cores should be allocated to a plan and how the operators of the plan should be deployed among the allocated cores. Obviously, the more complex the plan in terms of overall costs, data movement, and number of operators, the more gains are to be expected from using an approach such as the one outlined in this chapter. Although our prototype has been evaluated using SharedDB, we believe that the presented techniques generalize beyond this kind of shared-work system. In fact, most of the placement decisions apply to conventional query plans. For instance, blocking operators within the same query plan can be placed in the same bin (core). Operators streaming to each other can be placed on adjacent cores. Operators that will not be active at the same time can be placed on the same core. Operators active at the same time but complementary in terms of resource usage (CPU vs. memory bound) can be placed on the same core, etc. The spatial scheduling in these systems would additionally have to take into account the physical data placement and data access patterns of the operators (similar to the approach proposed in ATraPos by Porobic et al. [PLTA14]) in order to decide the most suitable cores/NUMA regions (spatial scheduling subproblem) but, overall, the same concepts as those used here would apply.
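To make these co-placement rules concrete, the following sketch (Python) checks whether two conventional plan operators may share a core. The Operator fields and the capacity threshold are hypothetical RAV-style annotations, not part of SharedDB or of any existing optimizer; this is only a minimal illustration of the temporal and resource-complementarity rules, under the assumption that demands are normalized to a single core's capacity.

# Sketch of the co-placement rules for conventional query plans. The Operator
# fields (cpu, mem_bw, active) are hypothetical RAV-style annotations with
# demands normalized to a single core's capacity.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    cpu: float          # normalized CPU demand in [0, 1]
    mem_bw: float       # normalized memory-bandwidth demand in [0, 1]
    active: tuple       # (start, end): interval in which the operator is active

def overlap(a, b):
    """True if two activity intervals overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

def can_share_core(op1, op2):
    """Operators that are never active at the same time (e.g., a blocking sort
    and the operator consuming its output) can always share a core; operators
    active at the same time may share one only if their combined demands are
    complementary and fit within the core's capacity."""
    if not overlap(op1.active, op2.active):
        return True
    return op1.cpu + op2.cpu <= 1.0 and op1.mem_bw + op2.mem_bw <= 1.0

sort_op = Operator("sort", cpu=0.9, mem_bw=0.4, active=(0, 10))
probe_op = Operator("probe", cpu=0.8, mem_bw=0.3, active=(10, 20))
print(can_share_core(sort_op, probe_op))   # True: they never run concurrently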

3.7 Related work

Scheduling and resource management have inspired a lot of research targeting different systems, which we cover here as part of related work. The section is split into several categories. We first discuss general scheduling on multicore systems, with a focus on contention-aware thread deployment, before outlining recent proposals for hardware-conscious scheduling for database systems. Finally, we position our work in the context of datacenter-level scheduling and discuss possible ways to generalize our approach using existing techniques from different contexts.

3.7.1 General scheduling for multicore systems

There is a considerable amount of research aimed at contention-aware scheduling on multicore machines (Zhuravlev et al. [ZSB+12] provide a comprehensive survey). Some of

these efforts, for example, achieve higher resource efficiency by classifying applications' behaviour: cache light/heavy threads by Knauerhase et al. [KBH+08], turtles/devils by Zhuravlev et al. [ZBF10], or high- and low-pressure threads by McGregor et al. [MAN05], representing light and heavy users of the memory hierarchy. Using that classification, they schedule threads to cores so that contention, typically in the memory subsystem, is reduced. Similarly, the memory bandwidth dimension of the RAVs is used in our deployment algorithm to make sure that conflicting threads are not placed in each other's vicinity, overcommitting the memory subsystem. Additionally, there are several methods for characterizing and modeling thread performance on multicores [WWP09, MSB10], and the possible interference when they share resources [CGKS05, BPA08]. Some of these models are used to aid comprehension and optimization of code performance, while others are used to minimize overall resource contention and performance degradation. Although these methods provide valuable insight into the performance bottlenecks of multicore systems as well as techniques to identify and alleviate contention, they are application-agnostic and consequently unable to provide any performance guarantees to the executing application. Furthermore, none of these approaches optimizes the overall resource allocation so as to maximize its utilization efficiency.

3.7.2 Scheduling for databases

Databases, traditionally, have dealt pessimistically with the challenges imposed by modern hardware by exclusively allocating (or assuming exclusive access to) hardware resources such as cores and memory. This practice leads to hardware resources being overprovisioned and underutilized. However, scheduling has become a topic of increasing interest in databases. For example, cache-aware scheduling (e.g., the MCC-DB work by Lee et al. [LDC+09]) concentrates on minimizing cache conflicts and benefiting from constructive cache sharing via cache-aware scheduling on multicore machines. The work by Chen et al. [CGK+07] demonstrates the advantage of parallel depth-first (PDF) greedy scheduling over work-stealing to enhance performance by constructive cache sharing. This is similar to the first step of our deployment algorithm, where we collapsed operator pipelines to improve data locality and benefit from constructive cache sharing.


Existing work by Porobic et al. [PPB+12] also argues that the topology of modern hardware favors coupling communicating threads, and deploys them onto cores on the same 'hardware island' (NUMA node, or CPU socket) in order to minimize cross-node communication. Follow-up work enhances the system's data locality and reduces redundant bandwidth traffic by providing means for suitable data placement and adaptive re-partitioning techniques as the workload changes [PLTA14]. Leis et al. [LBKN14] take it a step further and propose a novel morsel-driven query execution model which integrates both NUMA-awareness and fine-grained task-based parallelism. This allows for maximizing the utilization of the CPU resources and provides elasticity with respect to load-balancing the resource allocation for dynamic query workloads. Furthermore, they also advocate the deployment of operator pipelines and assign hard CPU affinities to threads in order to maintain locality and stability. These examples provide highly valuable techniques, mechanisms and execution models, but none uses the knowledge at hand to solve the problem we address, which is how to use operator characteristics and inter-thread relationships to minimize the total number of resources allocated to a query plan without affecting performance and predictability. Nevertheless, we corroborate previous works' observations that leveraging knowledge of the underlying hardware for operator deployment is essential for minimizing bandwidth traffic and maximizing data locality [PPB+12, LBKN14].

3.7.3 Resource allocation for data-oriented systems

This brings us to the discussion of how our work compares to existing systems and approaches on a more general scale (i.e., resource consolidation for data processing systems in a cloud environment). The most closely related approaches also rely on:

1. A model that captures the properties of the available resources, and

2. Resource requests from the applications/jobs to be scheduled.

The approaches differ in how they obtain and represent the values for (1) and (2), and in the algorithm they use to compute the resource allocation. The resource allocation algorithm can be mapped to several more traditional problems:


• Multi-dimensional (cardinality-constrained) bin packing [KM77, MCT77], which was used in our deployment algorithm, but also by others, e.g., [ABK+14, BCF+13, GZH+11];

• Balanced k-way min-cut graph partitioning, to which the problem was originally mapped by Efe [Efe82], and which was then used in the allocation approaches by Kiefer [Kie16] and Curino et al. [CJZM10];

• A non-linear optimization program aimed at minimizing the amount of resources and balancing the load while achieving near-zero performance degradation (e.g., as used by Curino et al. [CJMB11]).

In this work, we have used the FFD approximation of the first option [CGJ97] to show the immediate benefits of the approach. We believe that the other two approaches can augment the current solution, especially when composing application threads with non-linear resource demands [Kie16, CJMB11], or when deploying on heterogeneous hardware platforms where the resource capacity constraints can differ and the simple machine model used in our approach cannot capture the diversity.
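To illustrate the first option, here is a minimal sketch of the first-fit-decreasing (FFD) heuristic over two-dimensional demand vectors (a CPU share and a memory-bandwidth share), in the spirit of the deployment algorithm's bin-packing phase. The demand values and the per-core capacity are made up for illustration; the real algorithm additionally respects the cardinality and topology constraints discussed earlier in the chapter.

# Sketch of a first-fit-decreasing (FFD) bin-packing pass over two-dimensional
# operator demands (CPU share, memory-bandwidth share). Demands and the
# per-core capacity are illustrative values, not measured RAVs.

def ffd_pack(demands, capacity=(1.0, 1.0)):
    """Pack items (cpu, bw) into as few bins (cores) as possible.
    Items are considered in order of their dominant dimension, largest first."""
    bins = []  # each bin is [used_cpu, used_bw, [item indices]]
    order = sorted(range(len(demands)),
                   key=lambda i: max(demands[i]), reverse=True)
    for i in order:
        cpu, bw = demands[i]
        for b in bins:
            if b[0] + cpu <= capacity[0] and b[1] + bw <= capacity[1]:
                b[0] += cpu; b[1] += bw; b[2].append(i)
                break
        else:  # no existing bin fits: open a new one
            bins.append([cpu, bw, [i]])
    return [b[2] for b in bins]

# Example: six operator threads packed into a minimal number of cores.
demands = [(0.7, 0.2), (0.5, 0.5), (0.3, 0.1), (0.2, 0.6), (0.6, 0.3), (0.1, 0.1)]
print(ffd_pack(demands))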

3.7.4 Deriving application’s requirements

The final part of the discussion is related to alternative approaches for determining the specific resource requirements of an application's threads. One approach is to rely on models to derive the resource demand for each operation (e.g., Garofalakis and Ioannidis [GI96] and Kiefer [Kie16]). Some researchers additionally differentiate between bounded and unbounded resources [Kie16], or time-shared (preemptable) and space-shared [GI97] resources. This helps them model the underlying hardware platform more accurately, so that the resource allocation algorithm can reason better about the trade-offs among alternative solutions. While using models to capture resource requirements is a viable approach, and one could use generic cost models for database operations [MBK02], a good estimate of the exact resource requirements for various resource types can be non-trivial to derive, especially given the complexity of modern microarchitectures. Alternatively, Curino et al. [CJMB11] and Merkel et al. [MSB10] derive the resource requirements of the workload tasks by using a system profiler and the resource monitoring

tools offered by operating systems. As we motivated this approach before (when introducing the RAVs), such a technique allows us to treat the jobs as black boxes and easily scale across different hardware platforms. Finally, one can also rely on the application itself or the developer to issue a demand vector of resources (e.g., as we did in the previous chapter, but as has also been done by many cluster-level schedulers [ABK+14, HKZ+11, SKAEMW13]). The main problem with this approach is that the scheduling algorithm needs to be robust to overprovisioned demands from the applications, and to be strategy-proof (against applications exploiting its allocation policies [GZH+11]). Some recent work suggests augmenting the knowledge obtained from these interfaces with online resource monitoring in order to provide better and more efficient consolidation (e.g., Quasar [DK14]).

3.8 Summary

The focus of this chapter was to demonstrate the benefits of the COD architecture in the context of resource scheduling for the database execution engine. More concretely, we addressed the problem of minimizing resource utilization for complex query plans on modern multicore architectures without affecting performance or sacrificing desired performance properties such as stability and predictability for both throughput and latency. The solution and our proof-of-concept system build upon two main contributions:

• Resource activity vectors (RAVs), an abstraction used to characterize the performance profile of each relational operator that can be derived from hardware performance measurements.

• A deployment algorithm that, based on the dataflow DAG of the query plan, a model of the machine's memory hierarchy (e.g., size of NUMA nodes (memory and cores), interconnect topology, etc.), and the RAVs, computes the minimum amount of resources needed and an optimal assignment of the operator threads to specific processor cores.

In the evaluation we confirmed that RAVs can be used to accurately characterize the resource requirements of database operators across different hardware platforms, without having to build detailed cost models. Additionally, we showed that our deployment algorithm can significantly reduce the computational and memory resource requirements,

while leaving performance unaffected. To demonstrate the benefits of this approach, we also proposed several ways in which the newly freed resources can be used to improve system throughput, but also to support more dynamic workloads. As part of future work, we consider online monitoring of the resource utilization of the operator threads, which could be fed to the database query optimizer and the deployment algorithm to derive an operator-to-core mapping that exploits runtime information about the machine topology and overall system state.


4 Optimizer: Concurrency vs. parallelism in NUMA systems

Before addressing how modern data processing engines should execute parallel jobs, we try to understand the main challenges when scheduling multiple such jobs and characterize their needs. Until now we have either worked with the storage manager, where reasoning about operator parallelism was easier (Chapter 2), or assumed that the degree of parallelism for the operators in the query plan was pre-determined by the optimizer (Chapter 3). In this chapter we extend the analysis beyond traditional database workloads, and also investigate the behaviour and requirements of parallel graph processing algorithms. More concretely, we focus on a problem that modern optimizers face: determining the trade-off between concurrency and parallelism (i.e., inter-operator and intra-operator parallelism) when executing on modern NUMA systems. If there are enough parallel jobs in the system:

• How many should be executed at the same time (concurrency)?

• What degree of parallelism should be assigned to each job?

• How should specific resources be distributed?


In our study, we experimentally explore how to schedule a workload of modern parallel operators/algorithms on a multicore machine. Our analysis focuses on two evaluation metrics:

1. The per-job runtime slowdown in a concurrent workload, i.e., what the user typically cares about: How much slower will my job run because others are also running their jobs?

2. The system throughput, or the number of jobs executed over a given time frame, i.e., what the data processing engine optimizes for: How many jobs can be completed per unit of time?

More formally, the problem statement is to determine scheduling strategies that maximize the system throughput while guaranteeing that, given the same amount of resources, a parallel job does not suffer any significant slowdown when running alongside concurrent workloads compared to when running alone on the machine. The work in this chapter was done in collaboration with Gustavo Alonso and Tim Harris.

4.1 Background and Motivation

Data processing has become not only a goal in itself but also a first step in more complex analysis and computational tasks. Given the evolution of modern hardware, balancing the degree of parallelism (DOP) necessary for operators to perform well on increasing amounts of data against the degree of concurrency¹ in a workload needed to support as high a throughput as possible is an important research problem. In fact, it is an old database topic that looks into the effective scheduling of system resources among concurrent relational operators, which was addressed decades ago in the context of parallel databases [GI97, GHK92]. This pioneering work, however, considered neither the intricacies of modern multicore machines, nor the need to support operators beyond those of the relational model (e.g., for graph processing). The reasons for that are two-fold. First, the degree of parallelism available to an operator was then quite limited compared to the possibility of having, e.g., 40 or 80 parallel threads today. Second, the amount of data and the nature of the operators was

¹ Also known in the literature as the level of multiprogramming.

also different (e.g., conventional relational operators on row stores vs. column stores and graph processing algorithms). Several modern DBMSs provide support for executing parallel queries (e.g., Oracle¹, IBM DB2², PostgreSQL³). In many cases it is encouraged to use the adaptive multi-user scheduler that assigns the DOP based on the current load in the system [CI12]⁴. For instance, as the number of queries in the system increases, the DOP for each query is decreased proportionally until either the min_dop (minimum degree of parallelism) or the maximum concurrency has been reached. The latter typically depends on the availability of resources (e.g., memory for in-memory processing jobs). As a result of the high computational cost for optimizers to determine the optimal degrees of concurrency and parallelism, many DBMS vendors expose various knobs to database administrators (e.g., min/max degree of parallelism for an index or table)⁵. This transfers the complexity from the database optimizer to the user, and with it also the responsibility: many guidelines now emphasize both the benefits of enabling parallel query execution and the dangers of reduced performance when mis-configuring it in multi-user environments. And while this has already been analyzed and addressed in commercial database systems, in the graph processing space most research so far has focused on single-job runtime [GSC+15], and has not looked into concurrent workloads and throughput performance. In reality, and as for parallel relational operators and queries, the constraints are likely to come from response-time SLOs [UGA+09, RSQ+08, ZHNB07]. Therefore, our experimental analysis attempts to optimize the whole workload execution in terms of throughput, while maintaining predictable latencies for parallel algorithms in comparison with their runtime when executed in isolation.

¹ https://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm
² http://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com..db2.luw.admin. .doc/doc/c0005287.html
³ https://www.postgresql.org/docs/9.6/static/release-9-6.html
⁴ https://docs.oracle.com/cd/B28359_01/server.111/b28320/initparams168.htm
⁵ http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1502/html/queryprocessing/queryprocessing108.htm


4.2 Problem statement

Before formally defining the problem we address, we make the following assumptions:

1. In-memory execution of the algorithms and programs (no disk or network I/O);

2. No synchronization across different jobs for data accesses – analytical processing is done on different input data;

3. Input data is partitioned across the machine in order to leverage higher aggregate memory throughput;

4. Parallel operators are independent, i.e., we do not consider pipelined parallelism as part of this analysis and leave it for future work.

Let $T_i^{SP}(m)$ denote the execution time of job $J_i(m)$ with degree of parallelism $m$ in single-program execution mode ($SP$), i.e., in isolation. And let $T_i^{MP}(m)$ be the execution time of the same job in a multi-program execution ($MP$), i.e., noisy, environment.

Problem statement. Given a workload $WL_p$, which consists of $p$ parallel jobs $WL_p = \{J_1, J_2, ..., J_p\}$, determine how to schedule the jobs on a machine $M(c, n)$ with $c$ cores organized in $n$ NUMA nodes. The goal is to find a resource allocation that maximizes the overall system throughput, while guaranteeing that for a parallel job $J_i(m)$ the ratio in execution time between $MP$ and $SP$ execution mode, $T_i^{MP}(m)/T_i^{SP}(m)$, stays below a given bound. In other words, given the same resources, the execution time of a parallel job will be predictable regardless of the noise in the system.

To illustrate this better, let us take the following concrete example of the problem.

Example. Assume the underlying hardware (a large NUMA many-core machine) has 64 hardware contexts (cores) and 8 NUMA nodes. The concurrent data-processing workload generates 100 parallel operators to be executed, i.e., all are in the run queue at the same time. How should the optimizer and query execution engine allocate the machine's resources to execute the parallel jobs such that they achieve the best throughput and minimize the slowdown perceived by each job in the workload?


Our approach is to understand the properties of parallel operators in isolation and when co-scheduled with other multi-threaded data processing tasks on modern machines. As such, this is primarily an empirical study of several scheduling strategies for concurrent workloads, which we use to devise performance models that capture the observed behavior.

4.2.1 Factors influencing concurrent execution

There are several factors that can influence the execution time of a concurrent workload of parallel jobs on modern hardware:

1. The scalability of each of the parallel jobs as we increase the amount of resources. We evaluate each of the parallel jobs when running in isolation (Section 4.4);

2. The slowdown of each job when executed in a concurrent environment as a result of resource interference (Section 4.5);

One can derive $J_i$'s runtime for various degrees of parallelism when executing in isolation, $T_i^{SP}(m)$, using the following equation based on a power-regression model:

$$T_i^{SP}(m) = T_i^{SP}(1) \cdot \left( \frac{\alpha_i (1 - \delta_i)}{m^{\beta_i}} + \delta_i \right) \tag{4.1}$$

where $T_i^{SP}(1)$ denotes the job's runtime when executing on a single core, and $\delta_i$ denotes the serial fraction of the job, which is negligible for most operations we consider. The formula states, for instance, that the parallel job $J_i$ has linear scalability when $\delta_i$ is zero and both $\alpha_i$ and $\beta_i$ are equal to one. As the reader has probably noticed, this is just an approximation, which does not take into account how the scalability of the algorithm implementation depends on other resources in the system (e.g., if it is bound by memory, DRAM or I/O bandwidth). We discuss this further when explaining the properties of the concrete algorithms/operators we use in our experiments.

Similarly, we derive the execution time of job $J_i$ with degree of parallelism $m$ in multi-program execution mode, $T_i^{MP}(m)$, using equation 4.2, where $\gamma_i$ denotes the observed slowdown as a result of resource interference.

$$T_i^{MP}(m) = T_i^{SP}(m) \cdot \gamma_i \tag{4.2}$$


If there is no interference, then $T_i^{SP}(m) = T_i^{MP}(m)$ for all values of $m$. As we show in the experiments, the amount of interference depends on (i) the type of resource the jobs are sharing, (ii) the characteristics of the jobs that constitute the noise in the system, and (iii) the sensitivity and intensity with which $J_i$ uses the resources when executed in isolation. As such, it is an important factor that often causes the undesired unpredictability in response times when scheduling complex interacting workloads.
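As a minimal sketch, the two models can be written down directly in a few lines of Python; the coefficient values in the usage example are illustrative and not fitted to any of the operators evaluated later.

# Minimal sketch of the scaling model (eq. 4.1) and the interference model
# (eq. 4.2). The coefficient values used below are illustrative, not fitted.

def t_sp(m, t1, alpha, beta, delta=0.0):
    """Single-program runtime of a job at degree of parallelism m (eq. 4.1)."""
    return t1 * (alpha * (1.0 - delta) / m**beta + delta)

def t_mp(m, t1, alpha, beta, gamma, delta=0.0):
    """Multi-program runtime: the isolated runtime scaled by the slowdown gamma (eq. 4.2)."""
    return t_sp(m, t1, alpha, beta, delta) * gamma

# A perfectly scalable job (alpha = beta = 1, delta = 0) with no interference
# (gamma = 1) halves its runtime every time the core count doubles:
for m in (1, 2, 4, 8):
    print(m, t_sp(m, t1=100.0, alpha=1.0, beta=1.0))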

4.2.2 Scheduling approaches

This work evaluates three different scheduling approaches for concurrent workloads, which primarily differ in the granularity of their unit of scheduling. With that we explore the trade-off between the chosen degrees of parallelism and concurrency. In particular we evaluate: (i) core-based, (ii) NUMA-based, and (iii) serial workload execution. We illustrate each of the scheduling approaches in Figure 4.1. The example is based on a homogeneous workload consisting of eight identical parallel programs on a machine with four NUMA nodes and eight cores on each NUMA node. The horizontal dimension represents the spatial deployment of a program's threads, while the vertical dimension shows the elapsed time and how the temporal placement changes (if at all). Each parallel job (denoted as op in the figure legend) is depicted as a set of boxes with the same texture pattern. The number of boxes denotes the degree of parallelism. Finally, the red dashed lines represent the total time for executing the whole workload mix.

Core-based scheduling

In the core-based scheduling approach (Figure 4.1a), the unit of resource allocation is a physical hardware context, i.e., a core. For simplicity, here we assume that a core is not multi-threaded. First, we need to determine the maximum concurrency that the workload can sustain on the given resources of the machine (k). This is particularly important for a bounded resource, which cannot be over-committed without incurring significant performance penalties. In our case, the available DRAM memory (or the size of the buffer cache in a database) is such a resource: if the concurrent jobs have large working set sizes that cannot fit in the allocated memory, then they will need to swap to disk, which can result in very poor performance. As a result, for core-based scheduling the maximum degree


(a) Core-based scheduling    (b) NUMA node-based scheduling    (c) Serial scheduling

Figure 4.1: Three scheduling approaches for homogeneous workload that are evaluated in the chapter. The figure depicts scheduling a workload consisting of eight identical operators on a four NUMA node machine, with eight cores per NUMA.

of concurrency is determined by the memory footprint of each operator/algorithm and the available memory on all NUMA nodes. A second step is determining the maximum degree of parallelism for each job. It depends on the total number of available cores (c) and the number of parallel jobs in the concurrent workload (k), which we determined previously. If k > c, then c jobs get one core each and the others are put in the scheduling queue. If, however, the number of jobs in the concurrent workload is smaller than c, then each job receives ⌊c/k⌋ cores. The final part is determining the thread-to-core mapping, which can use one of the various heuristics explored by prior work. Some are noise-independent and optimize for (1) data-access locality, placing threads on the cores belonging to the corresponding NUMA node, or (2) improving performance by maximizing the utilization of the available bandwidth. Such policies have been adopted by several recent papers proposing NUMA-aware algorithms for relational operators [AKN12, LPM+13, DMR+10]. Alternative approaches, however, take into account the noise in the system by monitoring the LLC miss rate for all the jobs in the system, and propose appropriate thread and data placement and migration to mitigate the congestion on the DRAM and interconnect bandwidth [ZBF10].
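The allocation arithmetic of the core-based approach can be summarized in a few lines. The sketch below uses hypothetical machine and job parameters, approximates the memory bound with the largest working set, and deliberately ignores the thread-to-core mapping heuristics mentioned above.

# Sketch of the core-based allocation step: bound the concurrency by the
# available memory, then split the cores evenly among the admitted jobs.
# Machine and job parameters are hypothetical.

def core_based_allocation(num_cores, mem_per_node_gb, num_nodes, job_mem_gb):
    total_mem = mem_per_node_gb * num_nodes
    # Maximum concurrency k: how many working sets fit in DRAM at once
    # (approximated here by the largest working set in the workload).
    k = min(len(job_mem_gb), int(total_mem // max(job_mem_gb)))
    if k >= num_cores:
        # More runnable jobs than cores: one core per job, the rest wait in a queue.
        return num_cores, 1
    # Otherwise every admitted job receives floor(c / k) cores.
    return k, num_cores // k

# Example: 32 cores, 4 NUMA nodes with 64 GB each, six jobs of 20 GB each.
admitted, dop = core_based_allocation(32, 64, 4, [20] * 6)
print(admitted, dop)   # 6 jobs admitted, 5 cores each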

NUMA-based scheduling

In contrast to the previous scheduling approach, the NUMA-based one uses a coarser granularity to allocate resources to parallel jobs. As the name implies, it uses the whole NUMA node (with all of its cores) as the unit of scheduling. Consequently, the number of operators concurrently executing (i.e., the degree of multiprogramming or concurrency) is limited to the number of NUMA nodes (n), provided that it is smaller than the maximum number of jobs that can run given the constraint of the unbounded resources (k). Similarly, the granularity of parallelism for each job is now fixed to the number of cores in a NUMA region (c/n). Using this scheduling approach, as soon as a job finishes its execution, the NUMA node is assigned to the next parallel job in the scheduling queue. An example of this scheduling strategy is presented in Figure 4.1b. The advantage of this approach is that it provides an opportunity for constructive resource sharing among the threads belonging to the same parallel job, because they can now benefit from data locality in the last level cache (LLC) and do most of the coordination and communication via the LLC rather than over the interconnect. Furthermore, by assigning all the physical contexts of the NUMA node to threads of the same job, we reduce the

danger of destructive resource sharing. In particular, it helps to avoid cache pollution, as well as contention for the local DRAM bandwidth among threads from different jobs. One immediate disadvantage is that it further limits the concurrency in the system, as some parallel jobs may need to wait in a queue for execution. Another limitation is that using such a coarse granularity of assigning resources does not allow a resource-hungry job to expand its resource consumption (e.g., of the last level caches) unless it is explicitly scheduled on more than one NUMA node. Similarly, if the input data is spread out across the machine, scheduling all the threads of a parallel job to a single NUMA node will disrupt data locality, introduce additional interconnect traffic, and potentially also fail to use more aggregate DRAM bandwidth.

Serial scheduling

The serial scheduling strategy does not do any spatial sharing of the machine's resources among the runnable jobs. Instead, it allocates all resources to a single job and schedules the rest to run serially one after another. It is illustrated in Figure 4.1c. This is the scheduling approach most often used when evaluating algorithms for existing graph processing systems (e.g., [GSC+15, GXD+14, HLP+13], etc.), as well as for hardware-tuned relational operators (e.g., [BTAO13, PR14, BLP+14], etc.). It is also the scheduling approach used in many high performance computing systems, where the researchers want to execute their jobs in a noise-free environment [HSL10]. One can argue favorably for this policy based on several observations. First, there is increasing effort in implementing parallel algorithms that scale well with the number of cores. And second, the execution time is predictable since there is no noise in the system: only one parallel job runs on all the cores and hence there is no load interaction with threads from other jobs. The disadvantages are that the sequential execution has queuing effects, which can potentially be devastating for workloads with a mix of short- and long-running jobs, or of batch and latency-critical jobs. The restrictive nature of the scheduling approach does not allow flexibility unless time-sharing among jobs is enforced; in such a set-up this scheduling approach resembles traditional gang scheduling [PBAR11]. Furthermore, we should also point out that this form of serial scheduling is ill-suited for workloads with jobs that have poor scalability or different resource requirements, which can significantly reduce resource usage efficiency.


4.2.3 Evaluation Metrics

Throughout the analysis we use two commonly used metrics to evaluate the performance of the scheduling approaches [EE08]:

1. A system-wide metric that measures the overall performance when executing the whole workload (i.e., the system throughput (STP)), using the Weighted Speedup as proposed by Snavely and Tullsen [ST00]; and

2. A user-perceived metric which measures the performance slowdown as observed by the user that issued the job in a multiprogramming execution mode as opposed to its run in isolation. More concretely, we use the Hmean Speedup which balances throughput and fairness, as proposed by Luo et al. [LGJ01].

The rest of the section covers the derivation of the metrics. The total turnaround time (TTT) for a workload execution is defined as the total time required to execute all jobs in the workload. It quantifies the time from the start of the first job of the workload until the completion of the last one.

For example, using the same notation defined earlier, for a workload $WL_p$ executed on machine $M(c, n)$ using the core-based scheduling approach, the total turnaround time is derived from the runtime of the longest-running job, i.e.,

$$TTT^{c}_{WL} = \max_{i=1}^{p} \; T_i^{MP}\!\left(\left\lfloor \frac{c}{p} \right\rfloor\right) \tag{4.3}$$

Similarly, for the same workload executing its operators serially, the total turnaround time is derived as the sum of the execution times of its jobs (eq. 4.4).

$$TTT^{s}_{WL} = \sum_{i=1}^{p} T_i^{MP}(c) \tag{4.4}$$

Finally, for a scheduling approach that uses a deployment granularity of a NUMA node, one should use one of the approximation algorithms to derive the close-to-optimal parallel schedule that minimizes the makespan (e.g., the one proposed by Hochbaum et al. [HS87]). Here we compute an upper-bound approximation of the TTT based on the number of stages ($s$) that are needed to execute $p$ jobs in parallel ($n$ at a time), i.e., $s = p/n$.

$$TTT^{n}_{WL} = \sum_{j=1}^{s} \; \max_{i=1}^{n} \; T_i^{MP}\!\left(\frac{c}{n}\right) \tag{4.5}$$

The system throughput ($STP$) can then be derived as the total number of jobs executed within the total turnaround time for the whole workload ($TTT_{WL}$):

$$STP = \frac{p}{TTT_{WL}} \tag{4.6}$$

For the user-perceived metric (Hmean Speedup), i.e., the observed slowdown with respect to the job's execution time when run in isolation using the same set of resources, we compute the normalized turnaround time ($NTT_i$) for a job $J_i$ (eq. 4.7).

$$NTT_i = \frac{T_i^{MP}(m)}{T_i^{SP}(m)} = \gamma_i \tag{4.7}$$

In order to summarize the overall slowdown experienced by all the jobs within the workload ($WL_p$), we also define the average normalized turnaround time ($ANTT_{WL}$) as follows:

$$ANTT_{WL} = \frac{1}{p} \sum_{i=1}^{p} NTT_i = \frac{1}{p} \sum_{i=1}^{p} \frac{T_i^{MP}(m)}{T_i^{SP}(m)} \tag{4.8}$$

The reader may already notice that by computing the $NTT$ and $ANTT_{WL}$ we also obtain an empirical value for the slowdown ($\gamma$) in MP execution mode. Also note that if we had linear scalability ($\alpha$ and $\beta$ equal to 1, and $\delta$ equal to 0 in eq. 4.1) and no interference ($\gamma$ equal to 1 in eq. 4.2), then $TTT^{c}_{WL} = TTT^{s}_{WL} = TTT^{n}_{WL}$, i.e., the total turnaround time for the whole workload would be the same regardless of the scheduling approach. The rest of the chapter explores, in a series of experiments, the values of $\alpha$, $\beta$, and $\gamma$ for a set of DB and graph analytics algorithms.
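Before moving on, a small sketch of how these metrics are computed from per-job runtimes may be useful. The t_sp and t_mp callables stand in for measured (or modeled) runtimes and are purely synthetic here; only the arithmetic of eqs. 4.3-4.8 is meant to be illustrated.

# Sketch: computing the evaluation metrics (eqs. 4.3-4.8) from per-job runtime
# functions. t_mp(i, m) and t_sp(i, m) are assumed to return the measured (or
# modeled) runtime of job i at degree of parallelism m; here they are stubs.
import math

def ttt_core_based(p, c, t_mp):
    dop = max(c // p, 1)                                   # floor(c / p), at least one core
    return max(t_mp(i, dop) for i in range(p))             # eq. 4.3

def ttt_serial(p, c, t_mp):
    return sum(t_mp(i, c) for i in range(p))               # eq. 4.4

def ttt_numa_based(p, c, n, t_mp):
    stages = math.ceil(p / n)                              # jobs run n at a time
    total, jobs = 0.0, list(range(p))
    for s in range(stages):                                # eq. 4.5 (upper bound)
        batch = jobs[s * n:(s + 1) * n]
        total += max(t_mp(i, c // n) for i in batch)
    return total

def stp(p, ttt):
    return p / ttt                                         # eq. 4.6

def antt(p, m, t_mp, t_sp):
    return sum(t_mp(i, m) / t_sp(i, m) for i in range(p)) / p   # eqs. 4.7-4.8

# Toy usage: identical, perfectly scalable jobs with a 20% interference slowdown.
t_sp = lambda i, m: 100.0 / m
t_mp = lambda i, m: 1.2 * t_sp(i, m)
p, c, n = 8, 32, 4
print(stp(p, ttt_numa_based(p, c, n, t_mp)), antt(p, c // n, t_mp, t_sp))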

4.3 Methodology

This section explains the experimental setup for our analysis. We cover the necessary background for the parallel data-processing jobs used in the concurrent workload, as well as the properties of the underlying hardware architectures.


4.3.1 Parallel data-processing algorithms

We focus our analysis on parallel data-processing algorithms such as the ones used in parallel databases and graph processing. The parallel relational operators that we use are optimized for multicore machines and are the basic building blocks of more complex analytical queries: radix join, sort-merge join and aggregation. There are many hardware-conscious implementations of these operators (e.g., for hash and sort-merge joins [KKL+09, AKN12, BATO13, BLP11, BLP+14], sort [SKC+10, WS11, PR14], or aggregation [YRV11, CR07]). We use the open-source implementations of some of the most performant algorithms reported to date [BTAO13, YRV11, BATO13]. Similarly, there are many optimized systems for graph processing, most of which are focused on distributed execution [CSCC15, GSC+15, MMI+13, GXD+14] or on disk-based single-node systems [HLP+13, KBG12]. However, Green-Marl [HCSO12] and Ringo [PSB+15] successfully demonstrate that a single multicore machine is a suitable platform for interactive graph analytics. We use an open-source implementation² of the parallel graph algorithms for multicore systems as generated by the Green-Marl suite [HCSO12].

Relational operators

Hash Join: The algorithm we use is by Balkesen et al. [BTAO13] and optimizes its implementation for the cache hierarchy by introducing a partitioning phase (which can itself have several passes in order to reduce TLB misses, as proposed by Manegold et al. [MBK00]). This way it creates partitions that fit in the caches. The source code for the hash join is publicly available³. The experimental setup used throughout the evaluation is as follows: one level of partitioning with a fanout of 2^14, the input data is uniformly sharded across the NUMA nodes, and the two relations are equi-sized with 32-bit keys and 32-bit values.

Group-by Aggregation: The algorithm we use is by Ye et al. [YRV11] and has an aggregation phase whose performance has also been optimized for better cache behavior by introducing a partitioning phase. The difference to hash join is that during the partitioning phase the algorithm performs some partial local aggregation of the input data.

² https://github.com/stanford-ppl/Green-Marl
³ http://www.systems.ethz.ch/sites/default/files/multicore-hashjoins-0 1 .gz


For our experiments, the optimal fanout for partitioning was determined to be 2^8. The input data is synthetic with a uniform distribution and a random alphabet type. For large cardinality (LC) the group cardinality is set to 2^20, and for small cardinality (SC) it is set to 2^10. As in the original paper, the input relations consist of a 64-bit group-by key and 64-bit values.

Sort-merge Join: The algorithm we use is by Balkesen et al. [BATO13] and uses the AVX registers to speed up the building blocks of a traditional merge-sort: the run generation and the merging of pre-sorted runs. Furthermore, the memory bandwidth usage is optimized by introducing a multi-way merge phase that reuses hot data in the last level cache (LLC). The authors also optimize the data movement by using NUMA-aware data transfer [LPM+13]. The source code for this algorithm is also publicly available⁴. The experimental setup we used is as follows. As recommended by the authors, we set the partitioning fanout to 2^7 and the multi-way buffer size to the size of the LLC. Furthermore, the NUMA shuffling strategy is set to 'ring-based', as defined by Li et al. [LPM+13], and the input data is synthetic with a uniform distribution. The two relations are equi-sized and have 32-bit keys and 32-bit values. In our experiments, for all relational operators we used input relations with 128 M, 1024 M and 2048 M tuples.

Graph processing algorithms

As part of the Green-Marl graph algorithm suite we use the following kernels:

1. PageRank (PR) iteratively calculates the PageRank of each node in the graph;

2. Single-Source Shortest Path (SSSP) computes the shortest distance to every node in the graph from a single node;

3. Hop-dist (HD) computes the distance of every node from a given root node using the Bellman-Ford algorithm;

4. Triangle Counting (TC) computes the number of closed triangles; and

5. Strongly Connected Components (SCC) finds the strongly connected components of a given graph using Kosaraju's algorithm.

⁴ http://www.systems.ethz.ch/sites/default/files/file/sort-merge-joins-1 4 tar.gz




Figure 4.2: Architecture of the Intel SandyBridge: the left side depicts the QPI interconnect topology between the four NUMA nodes. The middle illustrates the internal layout of a NUMA node, and the right side shows the layout of one of the multi-threaded cores, which consists of two hardware threads that share resources such as the L1 and L2 caches.

We evaluate their performance on the Twitter graph [KLPM10] with 41 M nodes and 1.5 B edges (5.6 GB of binary data), the LiveJournal graph [BHKL06] from the SNAP datasets⁵ with 5 M nodes and 68 M edges (300 MB of binary data), and a random graph generated with the Green-Marl uniform synthetic graph generator with 128 M nodes and 2 B edges (8 GB of binary data). The graph data structure uses 32-bit node and edge IDs. The initial graph data is uniformly sharded across all NUMA nodes on the machine. That way we get the maximum aggregate memory bandwidth of the machine. As done in prior work, our measurements do not include the time it takes to load the graph into memory or prepare any other relevant input data.

4.3.2 Hardware architectures

To illustrate the complexity of modern hardware and the many options to consider when scheduling, we use as an example one of the machines used in our experiments – the Intel SandyBridge, whose architecture is shown in Figure 4.2. It has four Intel Xeon E5-4640 processors, each containing one NUMA node. The left side of Figure 4.2 illustrates the topology of the 8 GT/s QPI interconnect between the four NUMA regions. Zooming in to one of the NUMA nodes, its internal layout is shown in the middle section of Figure 4.2. There are eight multi-threaded cores connected via a bi-directional internal bus to the shared 20 MiB L3 cache, the memory controller with four memory channels (constituting the access to one NUMA node) running at 1600 MHz, as well as the QPI interconnect for interfacing with the other NUMA nodes. The opportunities for resource sharing increase further as we zoom into one of the multi-threaded cores (e.g., core 13). It has two hardware threads running at 2.4 GHz, which share both the 32 KiB L1 cache and the 256 KiB L2 cache (as depicted in the rightmost section of Figure 4.2).

To emphasize the differences across hardware platforms, we also present the internal architecture of another machine that was used in our experiments – the AMD Bulldozer (Figure 4.3). It has four AMD Opteron 6378 processors, each with two dies, resulting in a total of eight NUMA nodes (64 GiB each). The complexity of the 6.4 GT/s HyperTransport (HT 3.0) interconnect is displayed on the left side of Figure 4.3. We would like to point out the asymmetry in the topology, which was not present on the Intel platform. Each NUMA node has eight cores, each with a single hardware thread. Each core runs at 2.4 GHz and has a private 64 KiB L1 cache. Two sibling cores (belonging to the same module block) share a 128 KiB L2 cache (right side of Figure 4.3). Within a NUMA node, all eight cores share a common 6 MiB L3 cache and access to the local DRAM controller, which is shown in the middle section of Figure 4.3.

Figure 4.3: Architecture of the AMD Bulldozer: the left side depicts the HyperTransport interconnect topology among the eight NUMA nodes. The middle illustrates the internal layout of each die (corresponding to a NUMA node), and the right side shows the layout of one of the module blocks, which consists of two cores that share resources such as the L2 cache.

⁵ http://snap.stanford.edu/data


4.4 Algorithms in isolation

We begin the analysis by characterizing each of the workload's algorithms and their properties when running in isolation on the Intel SandyBridge machine. The characterization consists of two steps:

1. Scalability. We evaluate the performance of each algorithm as we increase the number of cores. The measured scalability is also compared against a projected linear scalability (from the algorithms’ performance on a single core). We use this to determine the α and β coefficients from eq. 4.1, and how their values are affected by the type of algorithm and machine’s properties.

2. Instrumentation. We try to understand each algorithm's behavior using instrumentation data, i.e., by gathering performance counter events and deriving properties that describe the resource usage: executed cycles per instruction (CPI), L2 and L3 cache hit ratios, the total amount of data read from and written to the memory controllers, and the data transfer rate (bytes/cycle). The tool used is the Intel Performance Counter Monitor (PCM)⁶; a small sketch of how these properties are derived from raw counter values follows after this list. For these experiments each algorithm was scheduled on all 32 cores (64 threads). Prior work in the systems community by Zhuravlev et al. [ZSB+12] has suggested that this data can be used to understand how a thread will behave in a concurrent workload setup. For instance, it can often provide hints about the sensitivity of the algorithm to sharing resources.
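The derived properties in item 2 are simple ratios over raw counter readings. The sketch below shows the arithmetic with made-up counter values; the actual events and their names come from the Intel PCM tool and may differ from what is shown here.

# Sketch: deriving the per-run properties used in the tables below from raw
# counter readings. The numbers are made-up examples; in our setup the real
# values are reported by the Intel PCM tool.
def derive_metrics(cycles, instructions, l2_hits, l2_misses,
                   l3_hits, l3_misses, bytes_read, bytes_written):
    return {
        "CPI": cycles / instructions,
        "L2 hit ratio (%)": 100.0 * l2_hits / (l2_hits + l2_misses),
        "L3 hit ratio (%)": 100.0 * l3_hits / (l3_hits + l3_misses),
        "Read from MC (GB)": bytes_read / 1e9,
        "Written to MC (GB)": bytes_written / 1e9,
        "Data rate (byte/cycle)": (bytes_read + bytes_written) / cycles,
    }

print(derive_metrics(cycles=2.0e12, instructions=4.0e11,
                     l2_hits=4.3e9, l2_misses=5.7e9,
                     l3_hits=4.5e9, l3_misses=1.2e9,
                     bytes_read=7.3e10, bytes_written=4.3e10))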

4.4.1 Relational operators

We first analyze the parallel relational operators. Figure 4.4 depicts on a log-log scale the scalability of the algorithms as we increase the number of assigned cores from 1 to 64, where from 32 to 64 we move from assigning one hardware thread per physical core to using hyperthreads. The results show that both the hash join (HJ) and the aggregation (AGG LC) scale linearly with the number of cores within a NUMA node (eight). There is a slight slowdown in the scalability when assigning more than eight cores, and performance flattens when moving to hyperthreads. This observation suggests that the operator implementation is primarily CPU bound. If it were memory-bound, the use of HT would have further reduced the runtime. The limited scalability when moving to hyperthreads implies either that the algorithm has reached another bottleneck (within a NUMA node) or that the sharing between the two sibling threads is destructive and does not bring any improvement in performance. In comparison with them, the sort-merge join (SMJ) scales sub-linearly as we add more cores, even within one NUMA node. This is also implied by the nature of the algorithm, which is not as well optimized for the memory subsystem, especially when it comes to data movement between the DRAM controllers and the cores. We come back to this analysis when discussing the results from the instrumentation. Going back to equation 4.1, these experiments indicate that the scalability of a parallel relational operator depends not only on the degree of parallelism (i.e., the number of cores), but also on the NUMA and HyperThreading properties of the underlying hardware. Let us take HJ as an example. Based on the implementation of HJ, we take $\delta$ to be 0, and the rest of the equation then looks as shown in eq. 4.9. Note how the values of $\alpha_{HJ}$ and $\beta_{HJ}$ decrease with every step up the resource hierarchy.

$$T_i^{SP}(m) =
\begin{cases}
T_i^{SP}(1) \cdot \dfrac{1}{m^{0.93}}, & \text{if } 1 \le m \le 8 \\[4pt]
T_i^{SP}(1) \cdot \dfrac{0.65}{m^{0.65}}, & \text{if } 8 \le m \le 32 \\[4pt]
T_i^{SP}(1) \cdot \dfrac{0.14}{m^{0.3}}, & \text{if } 32 \le m \le 64
\end{cases} \tag{4.9}$$

Figure 4.4: Scalability of DB operators on tables with 1024 M tuples (latency in seconds vs. number of cores, log-log scale).

⁶ http://www.intel.com/software/pcm


Table 4.1: Relational workload characterized by instrumentation

Workload (Dataset): Database Operators (1024 M tuple relations)

Metric                     Hash Join   Aggregation (LC)   Sort-Merge Join
CPI                        4.9         1.9                1.0
L2 hit ratio (%)           43          48                 61
L3 hit ratio (%)           79          76                 22
Read from MC (GB)          73          47                 152
Written to MC (GB)         43          27                 110
Data rate (byte/cycle)     0.4         0.6                0.8

Even though the instrumentation results (Table 4.1) hide the details of the separate phases in the DB operators' implementations, the high L2 and L3 cache hit ratios confirm the importance of the partitioning phase for the good cache behaviour of the subsequent phases (e.g., for HJ the L3 hit ratio is almost 95% for the build and probe phases). Furthermore, all operators exhibit a high data-transfer rate between the memory controllers and the CPUs. In particular, the SMJ reaches 0.79 bytes/cycle and a total of 262 GB transferred for relations of 1024 M tuples. For comparison, the HJ transfers only half that amount for the same dataset size. These results confirm that HJ and AGG may be more sensitive to sharing the caches, as they have high cache locality. One example of this was clearly seen when we also scheduled work on the hyperthreads, which share even the private caches. The results also show that the SMJ uses the DRAM bandwidth more intensively, which not only limits its scalability with the number of cores, but also makes it a parallel job that can easily contend with other tasks for shared resources such as the LLC (via cache pollution) and memory bandwidth (both the local DRAM and the interconnect links).

4.4.2 Graph processing algorithms

We continue the analysis with the algorithms in the graph workload. Figure 4.5 shows on a log-log scale the scalability of the Green-Marl graph kernels as we increase the number of assigned cores. The results indicate that PR, HD, SSSP and TC⁷ have linear scalability up to eight cores (i.e., within a single NUMA node), but also that there is a significant performance slowdown when moving to larger core counts (i.e., when the computation spans multiple NUMA nodes). The performance eventually flattens when moving to hyperthreads. PR has slightly better scalability than the other algorithms up to 16 cores. The sublinear scalability can be attributed to the higher number of random data accesses (both to local and remote NUMA nodes) as a result of increasing the number of worker threads, but also to the need for synchronization after each iteration. The synchronization cost is especially sensitive to the increasing cost of communication among the worker threads when it happens across the interconnect.

Revisiting equation 4.1 for the PR kernel gives us eq. 4.10. Note that the serial fraction $\delta$ is again set to 0, and that the decrease in the values of $\alpha_{PR}$ and $\beta_{PR}$ with increasing core count is even more pronounced than in the case of the HJ algorithm.

$$T_i^{SP}(m) =
\begin{cases}
T_i^{SP}(1) \cdot \dfrac{0.95}{m^{0.91}}, & \text{if } 1 \le m \le 8 \\[4pt]
T_i^{SP}(1) \cdot \dfrac{0.43}{m^{0.51}}, & \text{if } 8 \le m \le 32 \\[4pt]
T_i^{SP}(1) \cdot \dfrac{0.12}{m^{0.13}}, & \text{if } 32 \le m \le 64
\end{cases} \tag{4.10}$$

Figure 4.5: Scalability of Graph algorithms on Twitter data (latency in seconds vs. number of cores, log-log scale).

⁷ Note that the TC kernel could not finish its execution on the Twitter graph. For completeness we include results for its run on the LiveJournal (LJ) graph.


As we can see in Figure 4.5, in contrast to the other graph kernels, the SCC kernel using Kosaraju's algorithm is inherently non-parallelizable. The main reason is that it relies on depth-first traversal of the graph; hence $\delta$ is almost 1. Additionally, the algorithm's runtime increases both in absolute numbers and in variability as we increase the number of worker threads. This is especially visible in the measured standard deviation among five repetitions of the experiment, which grows from less than 1% of the runtime for the smaller core counts up to 22% and 33% of the runtime for 32 and 64 threads, respectively. Later, we will also see how this impacts the parallelism vs. concurrency argument when scheduling concurrent workloads.

Table 4.2: Graph workload characterized by instrumentation
Workload (Dataset): Graph Analytics (Twitter)

Metric                     PR     HD    SSSP   TC (LJ)   SCC
CPI                        36.9   10.7  15.0   1.2       1.7
L2 hit ratio (%)           10     0     0      60        40
L3 hit ratio (%)           30     30    20     100       70
Read from MC (GB)          1519   74    225    2         59
Written to MC (GB)         444    30    100    1         19
Data rate (byte/cycle)     0.2    0.3   0.3    0.1       0.3

Continuing with the instrumentation results, Table 4.2 confirms that the implementation of the graph kernels is not as well optimized for the cache hierarchy as the DB operators. This can be immediately observed from the poor cache hit ratios for almost all graph kernels, unless the set of vertices fits in the L3 cache. This is clear when comparing the behaviour of the TC algorithm, which is executed on the smaller LiveJournal (LJ) social-network graph (100% L3 hit ratio), while the other algorithms process the larger Twitter dataset. This results in CPIs as high as 36 for PR, and in poor L2 (0% for HD and SSSP) and L3 cache hit ratios (30%). The latter results in large data transfers between the DRAM memory controllers and the CPUs, the most significant being the 2 TB of data moved during the PR execution. These results imply that the graph kernels' behaviour depends on the size and properties of the input graph. For smaller graph sizes, the impact of sharing the caches (e.g., the LLC) will be higher, while for larger input graphs the penalty would come from sharing the DRAM bandwidth. Hence, determining the $\gamma_i$ coefficient is context-dependent, as we will see in the experiments in Section 4.5.


Discussion

The results indicate that for many parallel data-processing algorithms the optimal degree of parallelism in concurrent workloads (i.e., when there are enough jobs in the system) is the number of cores up to which they achieve almost linear speed-up in performance. Simply allocating a degree of parallelism higher than that can waste resources. For example, these cores can be assigned to another parallel job, which can use them more efficiently and hence achieve a better overall system throughput (STP). Based on the results presented in this section, most algorithms only scale well (i.e., have a $\beta_i$ coefficient close to 1) when executing with a degree of parallelism smaller than or equal to the number of cores within a NUMA node.
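One way to act on this observation is to grant a job the largest degree of parallelism for which doubling the core count still pays off. The sketch below uses a runtime function that mimics the piecewise behaviour of eq. 4.9 with illustrative coefficients and an arbitrary efficiency threshold; a real optimizer would plug in measured or fitted runtimes instead.

# Sketch: choosing a degree of parallelism for a job as the largest core count
# at which doubling the cores still yields a worthwhile speedup. The runtime
# function mimics the piecewise model of eq. 4.9 with illustrative coefficients.

def hj_like_runtime(m, t1=100.0):
    if m <= 8:
        return t1 * 1.0 / m**0.93          # near-linear within a NUMA node
    if m <= 32:
        return t1 * 0.65 / m**0.65         # flatter across NUMA nodes
    return t1 * 0.14 / m**0.3              # almost flat with hyperthreads

def choose_dop(runtime, max_cores, min_marginal_speedup=1.8):
    """Keep doubling the core count while each doubling still improves the
    runtime by at least min_marginal_speedup (out of an ideal 2x)."""
    m = 1
    while 2 * m <= max_cores and runtime(m) / runtime(2 * m) >= min_marginal_speedup:
        m *= 2
    return m

print(choose_dop(hj_like_runtime, max_cores=64))   # stops at 8 (one NUMA node)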

4.5 Concurrent WL execution

Until now, we have focused on characterizing the behaviour of the parallel analytical algorithms when running in isolation. This section evaluates the same set of algorithms when executed in concurrent workloads in various setups with the goal of understanding how the resource interaction among the parallel jobs changes the performance of the whole workload mix when using the three proposed scheduling approaches. In all experiments the input data for both the relational operators and the graph kernels is uniformly distributed across the NUMA nodes.

4.5.1 Interference in concurrent workloads

Recent work in both the database [LDC+09, SSGA11] and systems [ZBF10, TMV+11, CHCF15] communities identified that resource interference in the memory system can hurt performance. Our experiments confirm that there can be significant performance degradation as a result of sharing of the caches or memory bandwidth (Figure 4.6). The experiment set-up is the following. In each experiment we concurrently run two different algorithms. The chosen pairs are composed of either DB operators (Figure 4.6a) or Green-Marl graph algorithms (Figure 4.6b). Each job is started with four threads and the thread placement is set so that we evaluate the effects of sharing specific resources: a hardware thread, a multi-threaded core (on Intel SandyBridge), an L2 cache shared

with a sibling core (on AMD Bulldozer), or the LLC/NUMA bandwidth. The results are expressed as the observed performance slowdown with respect to the job's runtime in isolation, executed using the same DOP. The histogram clusters represent the co-located algorithm pairs. For each algorithm type (HJ, SMJ, AGG, PR, HD, and SSSP), we show the normalized runtime when executed together with a given partner algorithm. The DB operators process 1024 M tuple relations, and the graph kernels work on the Twitter graph.

Figure 4.6: Effect of resource sharing on a co-scheduled pair of jobs (normalized runtime): (a) co-located DB operators on the Intel SandyBridge machine, sharing a hardware thread, a physical core, or the LLC/NUMA bandwidth; (b) co-located Green-Marl algorithms on the AMD Bulldozer machine, sharing a core, an L2 cache, or the LLC/NUMA bandwidth.


It is not surprising that all parallel jobs experience significant slowdown when sharing a hardware thread or a core with another operator, regardless of its type. However, it is important to observe how much the performance of both HJ and AGG is affected when sharing the LLC/NUMA bandwidth with the SMJ operator (a normalized runtime of 1.60 for HJ, and 1.26 for AGG). Similarly, the performance slowdown of the graph algorithms when sharing the LLC/NUMA bandwidth with each other can be quite substantial (for HD it ranges from 1.12 when co-located with SSSP up to 1.53 when paired with PR). This is, however, not the case for the HJ/AGG pair, where we see no slowdown for either operator when sharing the LLC and local DRAM bandwidth. The explanation can be found in the workload characterization in Tables 4.1 and 4.2. For example, as the SMJ operator uses significantly more bandwidth than any other algorithm, it interferes more intensively with its partners. Going back to equation 4.2, these results highlight the difficulty of determining the γ coefficient, and show that it depends on several factors:

• The sensitivity to resource sharing of the parallel job;

• The characteristics of the partner job and in particular the intensity with which it uses the shared resources; as well as

• The hardware properties of the multicore machine (e.g., the types of shared resources), and the available HW/OS mechanisms for QoS guarantees and performance isolation.

Note that these effects get more convoluted when executing multiple different parallel jobs.
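One way to make these factors actionable is to measure them directly: run each algorithm in isolation and co-located with every potential partner on the shared resource of interest, and record the resulting slowdowns. The sketch below builds such a pairwise slowdown matrix, following the normalization used in Figure 4.6; the fixed job count and names are illustrative only.

/* gamma[i][j]: runtime of job i when co-located with job j, divided by
 * job i's runtime in isolation (same DOP). Values > 1 indicate
 * destructive sharing; values close to 1 indicate the pair can safely
 * share the resource. */
#define NJOBS 3   /* e.g. HJ, SMJ, AGG; illustrative */

static void slowdown_matrix(const double iso[NJOBS],
                            const double coloc[NJOBS][NJOBS],
                            double gamma[NJOBS][NJOBS])
{
    for (int i = 0; i < NJOBS; i++)
        for (int j = 0; j < NJOBS; j++)
            gamma[i][j] = coloc[i][j] / iso[i];
}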

Discussion

The main takeaway from this experiment is the importance of spatial isolation. If the parallel jobs were scheduled on isolated HW islands (i.e., almost no resources are shared, with the exception of the interconnect used to access the input dataset (relations or graph)), none of them would have experienced a slowdown in its runtime despite the noise in the system. Therefore, the system's optimizer and scheduler would either have to work with a per-algorithm cost model that captures the above characteristics (sensitivity and intensity) on different hardware architectures, or make sure the jobs are scheduled on spatially isolated HW islands to ensure the runtime of each job is stable.


4.5.2 Scheduling approaches – experimental setup

For evaluating the three scheduling approaches we use the two performance metrics: STP (Weighted Speedup) and ANTT (Hmean Speedup). We use two types of concurrent workloads for our experimental analysis: homogeneous and heterogeneous workload mixes. The homogeneous workload consists of eight identical jobs which are issued to the system at the same time. An illustration of the resource allocation used for the three scheduling approaches was previously shown in Figure 4.1. For the core-based scheduling approach we used a thread-to-core mapping that is based on a data-locality heuristic. It knows the location of the input data partition to be accessed by each thread, and places the thread on a core belonging to the corresponding NUMA region. Such thread and data placement (and migration) mechanisms have been discussed by Li et al. [LPM+13] and Diener et al. [DMR+10]. We found that for most algorithms, such a policy gives better results than pinning all threads onto cores on the same NUMA node (see Table 4.3). With the exception of HD and SSSP (footnote 8), all other algorithms exhibited better performance when their threads were placed on cores across the NUMA nodes of the machine.
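For reference, both metrics can be computed from per-job runtimes as sketched below, following the standard definitions of average normalized turnaround time and weighted speedup over n jobs. The thesis additionally reports STP as completed jobs per minute, a throughput figure derived from the same runtimes and the wall-clock length of the experiment.

/* t_iso[i]: runtime of job i in isolation (same DOP);
 * t_con[i]: runtime of job i in the concurrent workload mix. */
static double antt(const double *t_iso, const double *t_con, int n)
{
    double sum = 0.0;                       /* average per-job slowdown */
    for (int i = 0; i < n; i++)
        sum += t_con[i] / t_iso[i];
    return sum / n;
}

static double weighted_speedup(const double *t_iso, const double *t_con, int n)
{
    double sum = 0.0;                       /* sum of per-job speedups (STP) */
    for (int i = 0; i < n; i++)
        sum += t_iso[i] / t_con[i];
    return sum;
}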

Footnote 8: The SSSP kernel is based on the Bellman-Ford algorithm, which makes heavy use of synchronization. One explanation for its poor performance is the expensive synchronization across the interconnect.

Table 4.3: Effect of thread to core placement on the execution time of various parallel algorithms on the Intel SandyBridge machine. All algorithms are executed with a degree of parallelism set to eight (i.e., equal to the number of cores per NUMA node). We compare the performance when the eight threads are placed on the same NUMA node or spread across the machine on different NUMA regions.

                                        Execution time (sec)
Algorithm                              same NUMA    across NUMA
Hash Join (HJ)                              7.49           6.46
Sort Merge Join (SMJ)                      19.04          15.20
Aggregation (AGG)                           6.72           6.77
PageRank (PR)                             128.26         113.20
Hop Distance (HD)                           5.47           8.36
Single Source Shortest Path (SSSP)         15.47          26.28
Triangle Counting (TC)                     13.81          12.13
Strongly Connected Components (SCC)       116.17         116.62
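The two placements compared in Table 4.3 boil down to which cores a worker thread's affinity mask contains. The sketch below pins the calling thread to all cores of a chosen NUMA node using the Linux affinity API; it assumes a simple contiguous core numbering (node n owns cores n*CORES_PER_NODE to n*CORES_PER_NODE+CORES_PER_NODE-1), whereas on a real machine the core-to-node map should be read from libnuma or hwloc.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define CORES_PER_NODE 8   /* SandyBridge machine used here: 4 nodes x 8 cores */

/* "same NUMA": every worker calls pin_to_node(fixed_node).
 * "across NUMA": worker i calls pin_to_node(node_of_its_input_partition). */
static int pin_to_node(int node)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int c = 0; c < CORES_PER_NODE; c++)
        CPU_SET(node * CORES_PER_NODE + c, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}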


Figure 4.7: Scheduling approaches for the heterogeneous workload mix (HJ1, HJ2, AGG1, AGG2, SMJ1, SMJ2) on a four NUMA node machine with eight cores per NUMA node: (a) core-based scheduling; (b) NUMA node-based scheduling; (c) serial scheduling.

There are two possible explanations for this behaviour. First, the algorithms can leverage more aggregate bandwidth when their threads are spread across multiple DRAM controllers. This is particularly important for well-optimized implementations, where the intermediate data structures are allocated local to each thread. Second, the algorithms can benefit from better data locality with respect to the input dataset (e.g., the input graph).

The heterogeneous workload mix of relational operators consists of six parallel jobs: two hash joins, two sort-merge joins, and two aggregations with large cardinality. Jobs of the same operator type have the same setup (input data size and amount of allocated resources). The degree of parallelism assigned within the operators is a power of two, which is not uncommon for parallel operators. Figure 4.7 depicts the resource allocation to each of the parallel DB operators for the three scheduling approaches. In the core-based scheduling, all jobs are executed concurrently: the HJ and AGG operators are assigned a degree of parallelism of four, and the SMJ operators have eight threads. For the NUMA-based scheduling policy, the degree of parallelism for HJ and AGG is set to eight (equal to the number of cores in a NUMA node), and for SMJ it is set to sixteen. The parallel operators are then executed in two stages, as depicted in Figure 4.7b. Finally, for the serial execution approach, as before, all parallel operators are assigned a degree of parallelism equal to the total number of cores and are executed sequentially one after another.

Similarly, the heterogeneous workload mix from the set of graph algorithms consists of eight concurrently runnable jobs, i.e., two identical instances of each algorithm (PR, SSSP, HD, and TC). The scheduling setup for the different scheduling strategies is identical to the one depicted for the homogeneous workload with eight parallel jobs in Figure 4.1.
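The resource assignments behind Figures 4.1 and 4.7 can be summarized in a few lines. The sketch below computes, for a homogeneous mix of identical parallel jobs, the per-job DOP, the number of jobs running concurrently, and the number of execution stages under each of the three policies. The heterogeneous mix above additionally doubles the SMJ's DOP relative to HJ and AGG, which this simplified sketch does not model.

struct plan { int dop; int concurrent; int stages; };

/* Core-based: all jobs run at once, splitting the cores among them. */
static struct plan core_based(int nodes, int cores_per_node, int jobs)
{
    struct plan p = { (nodes * cores_per_node) / jobs, jobs, 1 };
    return p;
}

/* NUMA-based: one job per NUMA node, executed in waves if needed. */
static struct plan numa_based(int nodes, int cores_per_node, int jobs)
{
    struct plan p = { cores_per_node, nodes, (jobs + nodes - 1) / nodes };
    return p;
}

/* Serial: one job at a time, each using the whole machine. */
static struct plan serial(int nodes, int cores_per_node, int jobs)
{
    struct plan p = { nodes * cores_per_node, 1, jobs };
    return p;
}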

4.5.3 Scheduling concurrent DB operators

We first evaluate the interplay of both algorithm scalability (i.e., the values of α and β) and interference (γ) for the different data-processing algorithms in concurrent workloads. As presented earlier, the homogeneous workload mix we use to study the three scheduling policies consists of eight identical algorithms. A summary of the results for the relational operators on a variety of input data sizes is provided in Figure 4.8.


Figure 4.8: Comparing scheduling approaches (serial, NUMA, core) for relational DB operators (HJ, SMJ, AGG LC/SC): (a) 128M relations, ANTT; (b) 128M relations, STP (operators/min); (c) 1024M relations, ANTT; (d) 1024M relations, STP; (e) 2048M relations, ANTT; (f) 2048M relations, STP. For the 2048M relations, some runs did not finish.

Per job performance slowdown (Hmean speedup)

The graphs on the left side of the page (Fig. 4.8a, Fig. 4.8c, and Fig. 4.8e) show the average normalized turnaround time (ANTT) for each workload mix. The histogram clusters present the values for a particular scheduling policy (core, NUMA, and serial) and

the bars represent the different operators. In terms of per-job runtime predictability (i.e., the best balance between throughput and fairness), the spatial isolation policies perform best for all operators. For example, when using the NUMA node as a deployment unit for parallel jobs, the system can achieve an ANTT of up to 1.07. Similarly, the serial scheduling policy also delivers good results, especially when the algorithm processes larger input relations. In contrast, the core-based allocation policy often slows down the parallel algorithms, which interfere with each other on the shared resources. This policy results in higher penalties especially for smaller input relations for both the HJ and Aggregation (SC) operators (e.g., the ANTT for AGG SC equals 1.83 for the 1024 M relations). We would also like to note how the input data size changes the behaviour of the AGG LC operator: its ANTT values grow from 1.21 for the 128 M and 1024 M relations up to 1.96 for the 2048 M relations.

System throughput (Weighted speedup)

The plots displayed on the right side (Fig. 4.8b, Fig. 4.8d, and Fig. 4.8f) show the achieved throughput (STP) expressed in number of parallel DB operators executed per minute (ops/min). The histogram clusters are grouped per DB operator type (AGG shows the LC case; see footnote 9), and the bars present the scheduling policies. The results for all input relations show that the NUMA-based policy delivers the best throughput for all operator types. Consistently, the serial scheduling policy achieves the lowest throughput. As we discussed before, this is due to the sublinear scalability of the operators at higher core counts. In general, for the SMJ there is no big difference among the throughputs achieved with the three scheduling policies. One reason is that the SMJ experienced sublinear scalability already at smaller core counts (i.e., within a NUMA node), and as a result the NUMA-based scheduling is only slightly better than the serial execution policy. Furthermore, the poor scalability also explains why the performance-isolation benefits of the spatial isolation strategies do not translate into an advantage over the core-based strategy, despite the slowdown the latter experiences. In contrast, for the HJ case with 1024 M relations the NUMA policy achieves a throughput of 36 ops/min, compared to 26 ops/min in the serial execution mode and 31 ops/min for core-based scheduling. In the case of both HJ and AGG, the core-based scheduling suffers from destructive cache sharing among the different operators' threads, which consequently results in more data traffic between DRAM and the CPUs.

Footnote 9: The AGG SC values for STP are about an order of magnitude higher than for the other algorithms and are thus omitted for readability.

We would also like to point out that the SMJ run with eight parallel jobs working on the 2048 M tuple relations (~15 GB per table) ran out of memory even though the machine has 512 GB of main memory. This confirms the importance of checking the working set (memory) requirements of the jobs against the capacity of the bounded resource when determining the maximum concurrency that can be supported for a given workload mix. The observation also highlights the importance of designing algorithm implementations that are not only highly performant but also modest in their resource usage, especially for throughput-oriented systems (e.g., memory-optimized algorithms like McJoin [BHC12]) and when executing in multi-user (i.e., noisy) environments (e.g., memory-bandwidth-optimized implementations).
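A minimal admission check of this kind could look as follows; the memory capacity and per-job working-set estimates are inputs that the optimizer would have to supply, and the 512 GB figure from the evaluation machine is used only as an example.

/* Limit the degree of concurrency by the bounded resource: admit only as
 * many jobs as fit into main memory given their estimated working sets. */
static int max_admissible_jobs(double mem_capacity_gb,
                               double working_set_per_job_gb,
                               int requested_jobs)
{
    int fit = (int)(mem_capacity_gb / working_set_per_job_gb);
    return (fit < requested_jobs) ? fit : requested_jobs;
}

/* Example: max_admissible_jobs(512.0, 80.0, 8) would admit only 6 jobs. */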

4.5.4 Scheduling concurrent graph algorithms

The results of the experiments comparing the three scheduling strategies for the GreenMarl graph kernels are presented in Figure 4.9. Once again, the homogeneous workload consists of eight identical parallel jobs executed concurrently.

Per job performance slowdown (Hmean speedup)

The plots on the left side of the figure (Fig. 4.9a, Fig. 4.9c, and Fig. 4.9e) show the ANTT measured for the individual jobs, and the histogram clusters represent the values for each of the scheduling policies (core, NUMA, and serial). Fig. 4.9a shows the numbers for experiments executed on the LiveJournal graph, Fig. 4.9c displays the performance for the Twitter graph, and Fig. 4.9e shows the numbers for the Random graph. The TC algorithm did not finish executing on the Twitter dataset, and its results are omitted from the result graph. At a high level, the scheduling approaches behave similarly to the relational operators. Both the serial and NUMA scheduling approaches deliver predictable runtimes for all parallel jobs. The smaller social-network LiveJournal graph shows bigger fluctuations in the ANTT for serial scheduling than the other graphs. The NUMA policy, however, remains stable. In contrast, the core-based scheduling approach results in significant slowdown for all algorithms on all input graphs.


Figure 4.9: Comparing scheduling approaches (serial, NUMA, core) for GreenMarl algorithms (PR, HD, SSSP, TC, SCC): (a) LiveJournal, ANTT; (b) LiveJournal, STP (jobs/min); (c) Twitter, ANTT; (d) Twitter, STP; (e) Random, ANTT; (f) Random, STP. TC did not finish on the Twitter dataset.

The SCC algorithm is in general the least affected, but still has an ANTT of 1.56 on the Twitter dataset. All other algorithms have an ANTT above 3 for the same input graph. The PR algorithm is more sensitive to interference and experiences a big per-job slowdown under the core-based scheduling policy for all graphs. Its ANTT value is 3 for Twitter, more than 5 for LiveJournal, and almost 8 for the Random graph. This is primarily attributed to the nature of the algorithm, which performs several iterations over both of the graph's input data structures (following the edge list of each vertex) when updating the page rank of the vertices. This results in many random accesses over the data, in particular when accessing the current page rank values of the neighboring vertices. Therefore, the algorithm is very sensitive to pollution in the LLC – more random accesses will miss the LLC and result in expensive memory accesses (often also across the interconnect). Naturally, in many hardware architectures the bandwidth capacity for random accesses is smaller than for sequential accesses, and with a higher number of jobs in the system the bandwidth bottleneck significantly reduces the performance of the whole system.

In contrast, under the NUMA-scheduling policy the number of communicating threads computing the pagerank is smaller (because the DOP is limited to the total number of cores in a NUMA node), and all the intermediate working-set data structures (for instance, the current and future pagerank values of the vertices) are allocated on the same NUMA node. Additionally, all the communication between the worker threads happens via the LLC and does not cross the interconnect.

System throughput (Weighted speedup)

The plots on the right side of the figure (Fig. 4.9b, Fig. 4.9d, and Fig. 4.9f) show the achieved system-wide throughput (STP), measured in number of parallel jobs executed per minute. The histogram clusters denote the measured values for each graph algorithm, and the bars represent the different scheduling approaches. The results indicate that, with the exception of the SCC algorithm (which did not scale with the number of cores), the NUMA-based scheduling policy achieves the highest throughput for all other graph kernels. This is most pronounced for the HD and SSSP kernels, where it outperforms the other approaches by more than a factor of two. The reason for this behaviour is twofold. First, the graph processing algorithms did not scale as well as the DB operators (recall Fig. 4.5 and eq. 4.10), so the poor performance of the serial execution policy is not surprising. Second, the core-based policy results in heavy performance interaction among the parallel jobs (see the plots on the left side), which negatively affects the per-job runtime and hence the overall system throughput. Finally, since SCC is inherently non-parallelizable, it does not benefit from being assigned additional cores. Therefore, the NUMA and serial scheduling approaches are simply wasting


resources. So, even though the SCC algorithms are slower under the core-based policy, the overall system throughput is slightly higher.

Figure 4.10: AMD Bulldozer results for the Twitter dataset: (a) ANTT; (b) STP (jobs/min). TC did not finish.

4.5.5 Effect of underlying architecture

Although we base the main evaluation of the different scheduling approaches on the Intel SandyBridge machine, we wanted to check whether the same observations are also valid


on a different architecture. For this purpose we use the AMD Bulldozer machine. The setup is similar to the previous experiments: eight identical algorithms are executed, and we evaluate the Hmean speedup (ANTT) and the weighted speedup (STP) for the three scheduling policies. For these experiments we use the Green-Marl graph processing algorithms on the Twitter dataset, as they are more easily portable across different microarchitectures. The evaluation results are presented in Figure 4.10.

The spatial isolation policy using the NUMA node as a unit of allocation once again delivers the lowest ANTT for all jobs. The serial execution policy also has a low ANTT, with the exception of the large variation for the SCC algorithm. This behaviour is, however, a result of SCC's high variance in response time when spawned with a high degree of parallelism (recall the discussion of Figure 4.5). The per-job slowdown of the core-based policy is higher than with the other scheduling approaches (roughly 1.5, with the exception of PR). In general, the ANTT values are lower than the ones we observed on the Intel SandyBridge. When comparing the STP numbers, the NUMA node scheduling strategy once again delivers superior throughput to the other approaches.

Table 4.4: Scheduling policies on heterogeneous WL

                        Total Turnaround Time (sec)
Workload mix            core    NUMA    sequential
6 x DB operators          24      18            20
8 x GM algorithms        253     197           238

4.5.6 Heterogeneous workload

The next experiment uses the two heterogeneous workload mixes presented in Section 4.5.2. For both, the evaluation metric is the total turnaround time (TTT), measured in seconds. The results are presented in Table 4.4. Similar to the observations made earlier, the NUMA-based policy delivers the best overall turnaround time. As before, even though the serial scheduling policy removes the danger of performance interference, it suffers from the sublinear scalability of the operators at high core counts.


Table 4.5: Time breakdown for heterogeneous workload mix

                        TTT – Total Turnaround Time (sec)
Scheduling policy    WL    PR1   PR2   SSSP1  SSSP2  HD1  HD2  TC1  TC2
Core                 253   253   253     74     72    29   30   70   70
NUMA                 197   197   197     35     34    11   11   36   37
Sequential           238    82    82     13     13     5    4   20   20

When using the core as a unit of allocation, the TTT is the highest as a result of resource interference among the different operators. In order to quantify the slowdown, we show the breakdown of the execution time per job for the graph workload in Table 4.5. The fine-grained scheduling policy that uses the cores as a unit of allocation gives the scheduler flexibility to choose the thread-to-core mapping based on different heuristics. The biggest problem, however, arises when the data processing engine needs to provide predictability guarantees for the runtime of each individual parallel job, regardless of the other jobs in the system. For the same core-based allocation strategy, one could end up with thread-to-core placements similar to (1) the NUMA-based scheduling approach, (2) the data-locality based policy which we used throughout the chapter, or (3) something in between. Unlike the NUMA-based scheduling approach, where the slowdown due to resource interference (γ) is almost always equal to 1.00, the core-based scheduling policy could result in high variance and overall less predictable behaviour. As a reference, the execution of the same graph algorithm (working on the same dataset, with the same amount of resources, etc.) could be up to several times slower depending on the thread-to-core placement. One example is the HD algorithm, which, under the same system noise, runs in 11 seconds with a thread-to-core placement similar to the NUMA-based scheduling strategy, but takes almost 30 seconds when its threads are placed local to the input data.

4.6 Discussion

The outcome of our analysis indicates that a NUMA node should be used as the unit of scheduling for parallel relational and graph processing algorithms on multi-socket multicore machines. Here we discuss its applicability to existing systems and the trade-offs it entails for different types of workloads.


What does it mean for other forms of parallelism?

In our initial assumptions we simplified the discussion to partitioned parallelism (DOP) and independent parallelism (concurrency). However, following up on the conclusions from the previous chapter regarding operator pipelines, which were also leveraged in morsel-driven parallelism [LBKN14] and Oracle's producer-consumer scheduling [Pap14], we believe that allocating NUMA nodes to operator pipelines can positively affect the performance of the system. This way both non-blocking and blocking operator pipelines can benefit from constructive resource sharing, improved data locality, and efficient synchronization. We leave the verification of this hypothesis for future work.

What does this mean for the optimizer?

We now return to the original research question: how can a database optimizer leverage our findings? First, optimizers can use their detailed knowledge of the resource requirements of the parallel operators to request and allocate resources, rather than just cores and working memory. Given the current lack of hardware mechanisms for isolating DRAM bandwidth, we argue that optimizers should allocate resources at NUMA node granularity when optimizing for both Hmean and Weighted speedup for bandwidth-intensive jobs. Second, using NUMA nodes as the unit of allocation also simplifies the optimizer's computation: for each parallel operator it can set the degree of parallelism equal to the number of cores in a NUMA node, and the degree of concurrency equal to the number of NUMA nodes in the system.
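As a sketch, an optimizer could derive these two degrees directly from the machine topology at start-up using libnuma (link with -lnuma); error handling is minimal and the division assumes NUMA nodes of equal size, as on the machines used in this chapter.

#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0)
        return 1;                               /* no NUMA support */
    int nodes = numa_num_configured_nodes();
    int cpus  = numa_num_configured_cpus();
    int dop         = cpus / nodes;   /* DOP per parallel operator         */
    int concurrency = nodes;          /* operators scheduled concurrently  */
    printf("DOP = %d, concurrency = %d\n", dop, concurrency);
    return 0;
}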

How can it be applied to existing systems?

As an example, let us take Oracle's support for parallel queries [Pap14]. Currently, Oracle assigns so-called parallel execution servers (i.e., a dedicated thread pool) to each parallel job based on the pre-computed DOP for the query. In such a system design, applying the suggested scheduling approach is fairly straightforward: assign the threads of each parallel execution server to the cores belonging to the same NUMA node. In cases where pipeline parallelism is needed (e.g., a producer-consumer pipeline pair), the system can either split the NUMA node resources between the producer/consumer operator pair, or assign them to neighboring NUMA nodes.


What is the trade-off?

For the particular workload type and the evaluation metrics we considered, scheduling parallel analytical jobs at NUMA node granularity results in the best performance. Naturally, there is a trade-off. By using a coarser unit of resource allocation, we lose flexibility for job deployment. In fact, system-wide throughput can be improved, if the most expensive job is bandwidth bound, by spawning it across multiple NUMA nodes. That will improve its performance and shorten the makespan of the whole workload, even though it will likely create an unfair advantage over the other jobs. Furthermore, the NUMA-based scheduling alternative is more conservative in terms of resource allocation and may not result in the most resource-efficient deployment. We believe that it is a trade-off between providing a good balance of fairness and throughput and maximizing resource utilization. In cases where the workload consists of short-running latency-sensitive jobs and long-running best-effort jobs, or when the jobs have different priorities, it is highly probable that scheduling approaches that optimize for different objectives will propose a more suitable deployment.

How can it impact future multicore system design?

By identifying the NUMA region as the most suitable unit of scheduling, we confirmed the two major factors contributing to the performance of concurrent operators: (i) constructive and destructive sharing in the cache hierarchy; and (ii) sharing of and contention for the local DRAM bandwidth. One possible direction for future hardware platforms could be to introduce systems with more, rather than larger, NUMA regions. Another direction may be creating an additional cache hierarchy within a NUMA region, such as the one implemented in the new SPARC M7 [Phi14]. There, each chip has eight sets of core clusters (containing four cores that share an L2 cache) and a 64 MB shared on-chip L3 cache. The L3 cache is, however, partitioned into eight 8 MB chunks, one for each core cluster. Finally, if hardware designers enable QoS control over some of the shared resources (e.g., Intel's support for cache allocation or partitioning; see footnote 10), and in particular over the shared DRAM bandwidth, then the challenge will shift to identifying and enforcing a suitable and fair resource allocation.

Footnote 10: http://tinyurl.com/zej88ke


4.7 Related Work

4.7.1 Thread and data placement

Most recent work on parallel data processing in databases and graph processing systems on multisocket multicore machines has focused on careful data placement and partitioning across the machine, and on data-local thread deployment. The classical approach of many data-processing engines is to partition the data across the machine's NUMA nodes to take advantage of higher aggregate memory bandwidth. The thread (worker) placement is then based on the local data chunks. One example is the work done as part of SAP HANA [FML+12], which compares different strategies for data and task placement for main-memory store scans [PSM+15]. The morsel-driven parallelism in Hyper [LBKN14] argues for NUMA-awareness, and additionally makes sure that the data morsels used for fine-grained task-based parallelism are kept warm in the caches when processed by highly tuned operator pipelines.

Careful data and memory placement has also received attention from the systems community. Dashti et al. [DFF+13] consider the placement of memory within NUMA systems, and the sometimes-conflicting goals of providing low latency (local data access) versus spreading memory across the machine to achieve high aggregate bandwidth by reducing interconnect traffic congestion. Similarly, in Shoal [KARH15] the memory allocation for the chosen array implementations is based on the extracted data access patterns of graph analytics programs and the hardware specifications, such that the load on all memory controllers is balanced and the threads access local memory to reduce the pressure on the interconnects. Shoal, however, does not consider concurrent workloads.

As we mentioned earlier, hardware awareness has been an important aspect of many hardware-tuned implementations of relational operators [AKN12, LPM+13, BATO13], but none of these efforts (like most of the work on graph analytics engines for multicore systems) looked into how they perform, or how they should be scheduled, in concurrent workloads.

4.7.2 Scheduling concurrent parallel workloads

In the 90s there was a lot of work done in the context of scheduling and resource management for parallel databases, and on determining suitable parallel schedules for mixed workloads (e.g., [GI97, GHK92]). Authors argued for the need to characterize the resource

requirements of the operators in terms of time-shared and space-shared resources, and developed algorithms that generate close-to-optimal parallel schedules that minimize the makespan of the workload. However, these approaches make certain assumptions that are not applicable in today's systems. First, determining the granularity of parallel execution used to be a matter of finding the right balance between computation and communication overhead; hence, their models of the scalability of a parallel operator do not consider the effects of the microarchitectural features of modern hardware, as we have shown earlier. Second, the parallel schedules do not take into account the non-uniformity of today's cores (computational sites), and their shared resources in the memory sub-system (caches, DRAM bandwidth, etc.). Nevertheless, most of these ideas can be applied to a scheduler that uses, for instance, the NUMA node as a unit of scheduling. It can model its resource capacities, and leverage the valuable information about the resource requirements of the parallel operators to compute the optimal parallel schedule.

4.7.3 Contention-aware scheduling

On the other end of the spectrum are the black-box scheduling approaches from the systems community that aim for contention-aware scheduling. They use the hardware support for instrumentation to capture the characteristics of the application threads and adaptively perform thread and data migration to reduce the contention on the shared resources. Zhuravlev et al. provide an extensive survey [ZSB+12] of work on different scheduling techniques for addressing the contention on shared resources in multicore systems. Some of the techniques base their scheduling decisions on profiling the given workloads and choosing an appropriate subset of tasks that could be co-scheduled. The profiling can be done prior to scheduling the applications [BZFK10, ZBF10, LLD+08, KBH+08, MAN05, DWDS13, BM10, GARH14], or be based on detecting bad scheduling decisions (as a result of a high LLC miss rate, low IPC, or application-level performance indications) and re-scheduling the tasks accordingly [NKG10, MVHS10, BPA08, CHCF15]. While these approaches do not help with determining the degrees of concurrency and parallelism, they still propose valuable mechanisms for characterizing the properties of the operator threads, and strategies for online balancing of the load imposed on the memory

subsystem. Unfortunately, some of the simple heuristics that initially provided a successful characterization of the demand for DRAM bandwidth (e.g., only considering the LLC miss rate) fail to capture the properties of the finely tuned implementations of modern analytics algorithms. Examples include the extensive use of (shared) software pre-fetchers and non-temporal read/write intrinsics (as suggested by Wassenberg et al. [WS11]).

4.7.4 Constructive resource sharing

The benefits of constructive resource sharing within a NUMA node have already been investigated in the context of transactional workloads by Porobic et al. [PPB+12, PLTA14]. Their focus is on optimizing the synchronization cost among communicating threads by collocating them on HW islands to reduce the use of interconnect bandwidth. David et al. [DGT13] provide an extensive analysis of synchronization on modern hardware, and conclude that crossing sockets is harmful for performance and should thus be avoided. Their observations confirm our claim that the allocation of whole NUMA regions has advantages also from the point of view of synchronization. The benefits of NUMA-based allocation were also identified by Zhang and Re [ZR14] in the context of statistical analytics. They conclude that NUMA-based model replication, which enables communication among parallel computations over the LLC, is significantly faster than any other model that requires communication over the interconnect, and can improve performance by up to an order of magnitude. However, as in the graph processing space [GSC+15], they do not consider concurrent workloads and how their implementation and observations would be affected by noisy execution environments.

4.7.5 Impact of resource sharing and performance isolation

MCC-DB [LDC+09] addresses the problem of destructive cache sharing in databases on multicores. The authors classify queries as cache-sensitive or cache-insensitive and propose co-scheduling them to minimize LLC conflicts. They suggest software-based partitioning of the shared caches in order to limit the performance degradation caused by cache pollution. Our results lead to a stronger statement regarding the sharing of resources: rather than working on minimizing cache pollution, operators must be scheduled at the NUMA node level to also eliminate the danger of sharing the local NUMA bandwidth.


Tang et al. [TMV+11] analyze the impact of sharing the memory subsystem on Google's datacenter applications. Their observations are in line with ours: the performance impact of co-scheduling decisions is much larger for such data-intensive workloads than for the benchmarks of [Bie11]. Their scheduling decisions are based on heuristics that predict good thread-to-core deployments from statistics gathered when running alone in the system. Priority is given to latency-sensitive applications, which are later co-scheduled with batch programs. Lo et al. [LCG+15] also address the problem of collocating multiple parallel applications in a shared environment so that they improve resource utilization without violating the SLOs of latency-critical jobs. Similar to our observations, they identified that sharing of the LLC and the DRAM bandwidth can result in significant performance penalties. Instead of isolating the jobs on separate NUMA nodes, they use the recently introduced Intel Cache Allocation Technology (CAT) for way-partitioning of the shared LLC. As there are no commercially available hardware mechanisms for DRAM bandwidth isolation, they monitor the resource utilization of the batch jobs and scale down the allocated cores in order to limit the bandwidth usage.

4.8 Summary

In this chapter we revisited the problem of finding the right balance between the degree of parallelism and multi-programming in the context of modern analytical algorithms running on multicore machines. The goal was to find a thread-to-core deployment that maximizes throughput while minimizing the per-job slowdown in runtime in concurrent workloads. We show that the problem is not trivial, as the performance of individual parallel jobs, and of the workload mix, is highly sensitive to load interactions, the chosen degree of parallelism, and the spatio-temporal resource allocation on different hardware architectures. With a series of experiments, we have shown that the spatial isolation scheduling strategies (e.g., NUMA-based or serial scheduling) consistently help to achieve the best balance between throughput and fairness (Hmean speedup) when executing in noisy environments. The results corroborate observations from prior work that contention in the memory sub-system (e.g., pollution of shared caches, or contention on the DRAM and interconnect bandwidth) can result in a substantial slowdown in the runtime of parallel operators. Consequently, if a system adopts a scheduling strategy that uses finer units of resource

allocation (e.g., physical cores), it will struggle to provide guarantees for the predictable runtime of individual jobs. As we have shown, a poor thread-to-core mapping can result in response time penalties of up to several factors (i.e., even with the same DOP a job's runtime can be subject to large variations due to resource interference). However, despite providing complete performance isolation, the serial scheduling approach delivers lower throughput in almost all experiments. We attribute this to the sublinear scalability of parallel jobs. Therefore, when optimizing for higher system-wide throughput we recommend scheduling jobs with a finer granularity of resources. This is an important insight for the development of parallel algorithms (both in the database and graph processing communities), which so far has primarily focused on a single job's performance. Instead, more attention should be devoted to optimizing the resource utilization footprint, i.e., the intensity with which an implementation uses the shared resources. Finally, we argue that for concurrent workloads with parallel operators, using the NUMA node as the unit of allocation and isolation island often achieves the desired optimization goal of maximizing throughput while at the same time guaranteeing a job's runtime for a given set of resources.


5 Scheduler: Kernel-integrated runtime for parallel data analytics

The research questions we investigate in this chapter are:

• What are the suitable OS mechanisms for scheduling parallel workload mixes on multicore machines?

• How can we improve the interface between the OS and the applications, so that more information is passed regarding parallel operations and their requirements?

• How good is the currently offered OS process model for modern data processing workloads?

Our work primarily targets modern data appliances which address the challenges of executing hybrid (i.e., HTAP, operational analytics) workloads, like SAP Hana [FML+12], or federated data processing engines like BigDAWG [EDS+15] or Oracle's GoldenGate [Ora15]. Our main observation is that one size does not fit all, also when it comes to OS mechanisms and policies for scheduling such diverse workloads. Therefore, we propose an OS kernel which is customized for the needs of data processing systems. The solution we advocate relies on recent advancements in operating systems that enable systems to run multiple different kernels concurrently on a subset of the resources of a given machine [BBD+09],

and adapt the kernels and their resource allocation dynamically [ZGKR14]. We also address some of the limitations that current OS interfaces impose on the information flow regarding the scheduling requirements of different workloads. On a conceptual level, after analyzing the needs of modern analytical engines, we revisit the traditional process model which DBMS systems typically rely on (the UNIX process model), and propose different OS abstractions for using and sharing hardware contexts among jobs from different processes/applications. We implement them in our prototype Basslet, which is a kernel-integrated runtime offered as a service for the predictable execution of parallel jobs on behalf of analytical client applications. This work was done in collaboration with Gerd Zellweger, and our advisors Gustavo Alonso and Timothy Roscoe. Part of the material presented in this chapter, which discusses the capabilities enabled by the proposed system design, was published at the 12th International Workshop on Data Management on New Hardware (DaMoN '16) as "Customized OS support for data-processing".

5.1 Motivating use-case

To make things more concrete, in the rest of the chapter we show how, by using an adaptive OS architecture and specialized kernels, we can solve an immediate problem of scheduling concurrent parallel analytical workloads. More specifically, for this illustrative use case we let the default system stack (i.e., Linux + OpenMP) schedule some of the most common graph processing algorithms (e.g., PageRank) on a multicore machine. Figure 5.1 illustrates how the default and naïve execution of such concurrent workloads can result in poor scaling, both for the client (slower runtime) and for the data processing engine (lower throughput). The experiment measures the overall throughput obtained when concurrent clients each submit a sequence of pagerank (PR) OpenMP jobs over the LiveJournal dataset to a Linux server. The machine (AMD MagnyCours) has 8 NUMA nodes with 6 cores each. We add one socket (6 cores) with every additional client, and, as is default practice, let OpenMP choose the level of parallelism for each job. Ideally, the throughput should increase linearly with the number of clients (the "Ideal" line). Instead, the throughput per client rapidly decreases because the response time for each job increases, even though in principle each client has the same amount of resources as a single client (1 NUMA node, 6 cores).


Figure 5.1: System throughput for executing concurrent pagerank jobs. The plot shows throughput (PR jobs/min) as a function of the number of clients for Linux+OpenMP, Basslet+Badis, and the ideal linear scaling.

There are many factors contributing to this bad scaling:

• Inappropriate selection of the degree of parallelism by the OpenMP runtime;

• Poor co-location of threads within a single parallel task (ptask);

• Migration of threads by the operating system;

• Cache pollution due to context switching between threads of different clients; and

• Memory contention due to poor NUMA placement of data relative to threads.

In Section 5.4 we discuss these factors in more detail and show how we address them with a novel kernel runtime scheduler (Basslet) to reach the system scalability shown in Figure 5.1. Before that, we describe the Badis OS architecture (Section 5.3), which enables running multiple different kernels in a single system and using a customized scheduler to solve the mentioned issues.


5.2 Foundations

We first discuss the limitations of the current process model and how we propose extending it for better execution of parallel analytical workloads. We also discuss the multikernel OS architecture model, which we used as our basis for integrating the required support for task-based program execution units.

5.2.1 Expanding the Application/OS Process model

One of the most important design decisions for every multi-user facing application is its process model, i.e., how it handles the requests of multiple concurrent clients and how it maps its worker threads to OS execution units [HSH07]. As such, the process model of each DBMS depends on the functionality and mechanisms offered by the underlying operating system. Typically the assumption is that of the UNIX-based OS process model. For the purpose of this discussion we take the definitions provided in the book "Architecture of a Database System" by Hellerstein, Stonebraker and Hamilton [HSH07]:

1. An OS process combines an OS program execution unit with a unique and private address space. The single unit of program execution is scheduled by the OS kernel.

2. An OS thread is also an OS program execution unit, but without additional private address space. Instead the program address space is shared among all OS threads within the same OS process. Thread execution is also scheduled by the OS kernel.

3. A user-level thread is an application construct that allows multiple threads to be multiplexed within a single OS thread/process without the involvement or knowledge of the OS kernel.

User-level threads have been deployed in many systems, as they provide inexpensive task switching, relatively easy portability, and more application-level control over the thread scheduling policies. However, as we discussed earlier in the thesis, their usage means replicating a good deal of the OS logic in the application's space (e.g., task switching, thread state management, scheduling, etc.) [Sto81], which not only makes the programming model significantly more difficult, but is also ineffective in consolidation scenarios (i.e.,

when sharing the machine with other programs, and hence being unaware of the current system state and resource usage). The problem with the alternative of mapping DBMS workers to OS threads is that the operating system cannot differentiate long-running services in an application (e.g., a transaction manager) from short-running jobs, often linked to the execution of a query. Furthermore, the operating system cannot distinguish threads working on one parallel job from threads working on another. This is because traditional OS interfaces offer limited opportunity for parallel applications to express the intricacies of their algorithms. In this chapter, we propose extending the traditional OS process model to also support the OS task and ptask (parallel task) as OS program execution units. Similar to OS threads, they do not have their own private address space. They are implemented as part of a kernel-integrated task-based runtime, which can execute parallel jobs on behalf of different programs (applications) by switching to the corresponding process address space.

5.2.2 The multikernel OS architecture

Having support for the extended OS process model requires an OS architecture design that can implement the required functionality. In this case it means that the hardware contexts that are used to execute the OS tasks and ptasks should not be allowed to run kernel threads simultaneously. The multikernel model [BBD+09] proposes to design the OS as a distributed system to better match modern multi-socket multicore hardware, which already resembles networked systems. It has already been implemented by several research operating systems (e.g., fos [WA09], Hive [CRD+95], Barrelfish [bar16]). For example, in the Barrelfish OS the shared OS state (e.g., the Linux scheduler queues) is replaced with a globally partitioned and replicated OS state. More concretely, Barrelfish runs a kernel on each core in the system, and the operating system is built as a set of cooperating processes running on those kernels, which communicate via message passing and share no memory. Note that, as the state on each core is (relatively) decoupled from the rest of the system, a multikernel can run multiple different kernels (or versions of kernels) on different cores [ZGKR14]. In the next section, we describe how we leverage this feature of multikernels to specialize kernels for different types of workloads, and in particular to design a customized light-weight kernel responsible for running the task-based OS execution units.


5.3 Architecture overview

Badis is an OS architecture that dynamically partitions the machine's resources (e.g., CPUs, memory controllers, etc.) into a control plane, running a full-weight operating system stack, and a compute plane, consisting of our specialized lightweight OS stack, which is customized for running parallel data analytics using the newly introduced program execution units (tasks and ptasks). The lightweight OS stack consists of a kernel, a runtime, and selected library OS services, which we describe in more detail in the rest of the chapter. Figure 5.2 shows the system architecture of Badis, which is based on the multikernel model. The key underlying idea is that at any point in time the resources of a given multicore machine (e.g., multi-threaded cores, DRAM controllers, etc.) are partitioned into a control and a compute plane, but the allocation of resources between the two planes is dynamic and can be adapted at runtime based on the workload requirements. Furthermore, the figure illustrates that the Badis architecture is also suitable for addressing hardware heterogeneity and for customizing the OS support for different computational resources. The rest of this section describes the main system components and their properties.

Figure 5.2: Illustrating Badis – an adaptive OS architecture based on the multikernel model. The cores on NUMA node 1 each execute a separate kernel instance of the full-weight kernel (FWK). The cores on NUMA node 2 execute a common instance of the specialized light weight kernel (LWK A), while the computational units on the HW accelerator (e.g., a XeonPhi) execute another version of the light weight kernel (LWK B) that is optimized for the particular hardware platform.


5.3.1 Control plane

The control plane cores run the full-weight kernel (FWK) and various services and drivers, which form the core of the existing operating system. As the full-weight OS retains the standard functionality and services, it can continue hosting the main threads of all applications. The individual applications can then decide whether they want to execute all of their code on the control plane (e.g., using the standard thread-based scheduler) or offload some of their tasks (e.g., parallel analytical jobs) to be executed on the customized kernels on the compute plane cores. Furthermore, the control plane is in full control of managing the compute plane and the kernel instances it hosts. That entails spawning light-weight kernels, managing the dynamic resource allocation both between the two planes and among the compute plane kernels, and providing the mechanisms that allow applications to submit jobs to be executed on the compute plane kernels. Finally, it also provides an interface through which the compute plane kernels can request additional operating system services that were not integrated into them, so as to avoid unnecessary overhead and undesired interference.

5.3.2 Compute plane

The compute’s plane primary purpose is to provide a clean, noise-free (i.e., isolated) and customized environment solely used for executing data computations on behalf of the applications. It can consist of several (potentially different) customized OS stack(s): specialized light-weight kernels, and a selection of library services. The goal is to enable the system to specialize various aspects of the LWKs either to meet specific requirements of the application workloads, or to customize it for better resource management and task-execution on heterogeneous hardware platforms. Another important property of the compute plane is that it is dynamic. Each core that is allocated to the compute plane can be rebooted at runtime with a customized light-weight kernel (LWK). Even though it was not evaluated in this chapter, this feature enables Badis to swiftly adapt to workload requirements. From an application’s side interacting with the Badis compute plane resembles the use of interfaces offered by conventional task-parallel runtimes or batch systems. Figure 5.3


provides an illustration of the interaction and of the role of the different system components. Using the interface offered by the control plane, applications can simply enqueue (parallel) jobs to a given set of queues (labeled as (1)). Each queue has a designated kernel type, and only active instances of the corresponding kernel type in the compute plane can dequeue jobs (ptasks) from that queue (marked as (2)). As soon as one of the kernel instances dequeues a ptask, it extracts the tasks and distributes them to the computational units (e.g., CPUs) belonging to the same instance (labeled as (3)). Then the task-based execution units (recall the extensions we introduced to the OS process model) running on each of the cores switch to the address space of the corresponding application and execute the task on behalf of the application itself (marked as (4)).

Figure 5.3: Interacting with the compute plane: Applications submit parallel tasks to the ptask queue(s). The Badis compute plane dequeues ptasks and dispatches the individual tasks to the hardware contexts owned by the compute plane instance.
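To make the flow in Figure 5.3 concrete, the sketch below shows how a client application might construct and submit a parallel task. Only badis_ptask_enqueue appears in the figure; the remaining type and function names (badis_ptask_t, badis_ptask_create, badis_ptask_add, badis_ptask_wait) are hypothetical stand-ins, and the signatures are assumptions rather than the actual Basslet/Badis interface.

/* Hypothetical client-side view of the compute-plane interface. */
typedef void (*task_fn)(void *arg);
typedef struct badis_ptask badis_ptask_t;            /* opaque parallel task */

badis_ptask_t *badis_ptask_create(void);             /* assumed API */
int badis_ptask_add(badis_ptask_t *pt, task_fn fn, void *arg);
int badis_ptask_enqueue(badis_ptask_t *pt);          /* step (1) in Fig. 5.3 */
int badis_ptask_wait(badis_ptask_t *pt);             /* returns when all tasks ran */

static void pagerank_partition(void *arg)
{
    (void)arg;  /* process one graph partition on behalf of the client */
}

int submit_pagerank(void **partitions, int n)
{
    badis_ptask_t *pt = badis_ptask_create();
    for (int i = 0; i < n; i++)                       /* one task per partition */
        badis_ptask_add(pt, pagerank_partition, partitions[i]);
    badis_ptask_enqueue(pt);                          /* steps (2)-(4) happen on
                                                         the compute plane */
    return badis_ptask_wait(pt);
}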

5.3.3 Customized compute-node kernels

As mentioned before, the flexible design of the compute plane allows kernel customization. One can use it to specialize the kernel scheduling policy or the particular mechanisms used for specific workload requirements. For instance, transactional queries that are typically short-running can be executed by a pool of threads scheduled at very fine time intervals [TTH16] and spatially placed to benefit from constructive LLC sharing [PPB+12]; HPC-like workloads can be scheduled using gang scheduling [Ous82]; a mix of synchronization-heavy algorithms can use Callisto's scheduling mechanism [HMM14]. There are many other customizations that can be applied to the offered OS mechanisms, policies, and interfaces for various types of resources, not only CPUs. In the future work section of this chapter we briefly discuss some of our ideas regarding memory management, the interaction with I/O devices, etc.

A light-weight OS kernel can also be specialized for the heterogeneous computational resources present in modern and future machines. For example, the kernel customization can target power-efficient and specialized hardware architectures as proposed by the research community [AFK+09, WLP+14, LFC+12], as well as co-processors such as Intel's XeonPhi or GPGPUs, which are readily available today. Given their properties, the computational resources on such hardware platforms should not be engaged in system-wide decision making, maintaining the system state, or executing heavy OS services. Instead, they should only run a thin layer of the operating system, optimized and used solely for job execution.

5.4 Customizing a compute-plane kernel

Going back to the scheduling problem presented in Figure 5.1, the following section analyses the main factors leading to performance degradation in such workloads, and discusses the requirements for the runtime scheduler and the need for the task-based execution unit.

5.4.1 The need for a better OS interface

A key problem is that traditional OS interfaces offer limited to no opportunity for parallel applications to express the properties of their algorithms. One example is that the OS does not distinguish between threads working on one concurrent task over another. In Linux, in the absence of more expressive OS interfaces, system developers have for many years abused the cgroups mechanism to group threads, as opposed to processes, in order to let the OS know that they ought to be scheduled differently [Jon15]. However, cgroups v2.0 disallows this [Ram16], and the Linux community is still negotiating what the right mechanisms and interfaces should look like [Jon15]. Therefore, we argue that the operating system should provide parallel applications with an API that allows such information (e.g., a group of threads working on the same parallel job) to be passed down to the OS.


The above-mentioned extension to the OS interface is complementary to COD's interface that enables knowledge exchange to help the OS policy engine compute better policies when managing the available resources. Allowing the DB optimizer to push more information about the workload properties, in the form of cost models (Chapter ??) that suggest the scalability limitations of a particular algorithm, will enable the system to compute a more suitable degree of parallelism for concurrent workloads in noisy environments and avoid the problem of overprovisioning. We discuss the integration of Badis and COD in more detail in Section 5.8.

5.4.2 The need for run-to-completion execution

Runtime libraries (e.g., OpenMP, Cilk, the JVM, the .NET CLR) are widely used, as they capture more information about the program's parallelism and allow easier maintenance and portability across different platforms at the cost of performance. However, they lack the necessary global view and system-state runtime information. In the example shown in Figure 5.1, in the default mode (i.e., without relying on the user's input) every OpenMP runtime assumes it has all the machine's cores to itself. As a result, every additional PageRank client oversubscribes the CPU resources. The Linux kernel must then preempt and time-share the clients, i.e., multiplex multiple runtime threads over the hardware contexts.

High cost of a context switch

There are two scenarios where a context switch can significantly impact the execution time of the preempted programs:

1. In synchronization-heavy applications, preemption can lead to the well-known convoy effect [Bla79], especially when a thread is context-switched while holding a lock or just before reaching a critical barrier [HMM14, OWZS13].

2. When data processing applications are cache-sensitive, preemption can result in cache pollution and consequently in expensive DRAM accesses.

To evaluate the impact of context switching on a data-sensitive application, we use a micro-benchmark by Li et al. [LDS07].


[Figure 5.4: Indirect cost of context switch vs. working set size. The plot shows the cost of a context switch [ms] against the working set size (MB) for three machines: MagnyCours (5 MB LLC), Haswell (8 MB LLC), and IvyBridge (25 MB LLC).]

The benchmark uses a synthetic program that performs accesses on an array of floating point numbers to mimic random accesses to a hash table or a set of graph vertices, both of which are common operations in data processing workloads. Two identical instances of the program are executed repeatedly, and a context switch is made from one to the other instance after the array has been fully read. The cost of the context switch is then calculated by subtracting the time it took for one of the program instances to read the array without being preempted. Figure 5.4 shows the results from running this benchmark on three machines (AMD MagnyCours, Intel IvyBridge, and Intel Haswell) with different last-level cache (LLC) sizes (between 5 and 25 MiB). The experiment was run with different array (i.e., working set) sizes, varied between 1 and 32 MB. The results show that the cost of a context switch can increase from 0.7 microseconds (for smaller working set arrays on all machines) to more than 6 milliseconds (for arrays of size 25 MB on the Intel IvyBridge machine), or by almost four orders of magnitude. In fact, depending on the machine type and the working set size of the application, a context switch may take longer than the standard preemption latency of modern Linux schedulers (about 6 milliseconds [Tor16]). We further note that the cost of a context switch is highest when

the working set is roughly equal to the size of the LLC. However, this is a common case: in order to optimize execution on multicore architectures, modern data processing operators are highly tuned to use a working set of exactly the cache size [BATO13, CR07]. Thus, each preemption has the potential to destroy the carefully arranged data locality and hurt performance. Therefore, we argue that short-running latency-critical jobs, such as queries and data processing algorithms, should be executed without preemption, i.e., as run-to-completion tasks. Note that this can be best achieved within a specialized kernel, which will ensure that no other job is scheduled on the same core.
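For illustration, the following minimal sketch captures the structure of such an indirect-cost measurement (it is not the original code by Li et al.; the array sizes, the sequential access pattern, and the use of a second buffer to stand in for the preempting program instance are simplifying assumptions):

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch in the spirit of the micro-benchmark by Li et al. [LDS07]:
 * time a pass over a working set while it is cache-resident, evict it by
 * streaming through a second large buffer (emulating the preempting
 * instance), and time the pass again. The difference approximates the
 * indirect cost of a context switch. */

static volatile double sink;

static double timed_pass_ms(volatile float *a, size_t n) {
    struct timespec t0, t1;
    double sum = 0.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = sum;                          /* keep the loop from being elided */
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

int main(void) {
    size_t ws_bytes = 8u * 1024 * 1024;              /* working set, e.g. 8 MB */
    size_t n = ws_bytes / sizeof(float);
    volatile float *a = malloc(ws_bytes);
    for (size_t i = 0; i < n; i++) a[i] = (float)i;

    double warm = timed_pass_ms(a, n);               /* working set in the LLC */

    size_t evict_n = (32u * 1024 * 1024) / sizeof(float);
    volatile float *evict = malloc(evict_n * sizeof(float));
    for (size_t i = 0; i < evict_n; i++) evict[i] = 1.0f;  /* pollute the LLC  */

    double cold = timed_pass_ms(a, n);               /* after the "preemption" */
    printf("indirect context-switch cost estimate: %.3f ms\n", cold - warm);
    return 0;
}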

5.4.3 The need for co-scheduling

Unfortunately, running a single thread to completion is not enough for parallel programs: a group of threads working on the same operation should ideally run simultaneously. Otherwise, an out-of-sync worker will become a straggler and slow down the whole parallel job. Even though one prominent use-case for co-scheduling is the workloads from the HPC community [HSL10], data-processing workloads also have synchronization steps where a straggler can impact performance [HMM14, OWZS13]. For example, in Chapter 2 we showed the effects of slowing down one of the scan threads on the performance of the whole CSCS storage engine, because the scan threads synchronize after finishing a full table scan for a given batch of requests before starting the next one. However, unlike gang-scheduling, where preemption is allowed, we propose co-scheduling where all tasks of a parallel job start at the same time and execute until completion. We believe that this is more favorable for two reasons: first, the jobs are not long-running and hence executing them to completion does not take a long time; second, parallel analytical jobs do not have many synchronization steps or other blocking operations (e.g., waiting for disk or network I/O).

5.4.4 The need for spatial isolation

The internal architectural details of modern multicore machines offer many opportunities for resource sharing and, hence, for load interference. For example, Figure 5.5 shows all the resources where hardware interference may occur for the Intel Sandy Bridge machine, which was used for some of the experiments presented in this thesis.


Figure 5.5: The architecture of an Intel Sandy Bridge processor with labeled opportunities for resource sharing: (1) the shared last level cache (LLC), (2) the local DRAM controller and channels – hence the DRAM bandwidth, (3) the QPI interconnect, (4) the private L1 and L2 caches among sibling HyperThreads, and (5) the HyperThread itself.

In fact, one of the principal factors in the observed interference in Figure 5.1 is contention on the memory subsystem. As we have shown in Chapter 4, the sharing of the LLC and the DRAM bandwidth in particular can result in significant drops in performance. Instrumentation shows that a single PageRank client running alone on one NUMA node uses about 18 GB/s of DRAM bandwidth. When executing alone on two NUMA nodes it uses 25 GB/s (this contributes to its runtime decreasing from 6.9 seconds to 4.9 seconds). However, when sharing the two NUMA nodes with another PageRank client, their individual DRAM bandwidth share drops to about 12 GB/s, which is less than what each one gets when running alone on one NUMA node. This is an important observation, especially given the objective of modern schedulers, which is achieving a good load balance across the cores of the multicore machine [LLF+16]. In this scenario, even though the load across the cores is well balanced, neither the overall system achieves its optimal throughput, nor do the individual PageRank jobs achieve their lowest runtime. The Linux scheduler nevertheless achieved its objective of a well-balanced work distribution among the cores of the system, and both PageRank jobs received a thread-to-core deployment which, had they been running in isolation, would have resulted in a better aggregate DRAM bandwidth and hence a lower runtime.
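To spell out the bandwidth arithmetic behind this observation (all numbers taken from the measurements above):

two clients sharing two NUMA nodes:   2 × 12 GB/s = 24 GB/s aggregate
two clients, one NUMA node each:      2 × 18 GB/s = 36 GB/s aggregate

The load-balanced spread over both nodes therefore sacrifices roughly a third of the achievable aggregate DRAM bandwidth compared to giving each client its own node.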


Therefore, we conclude that there is more to allocation than just cores, and one should also account for other resources such as shared caches, DRAM bandwidth, etc. In some cases, as in the example above and in our evaluation in Chapter 4, sharing these resources can lead to performance interference. In other cases, threads can benefit from constructive resource sharing to lower the cost of synchronization and communication. Given the properties of modern multicore machines, one such hardware island [PPB+12] is a NUMA node. Having its own cache hierarchy, several DRAM channels, and a memory bus, it is suitable for co-locating communicating [DGT13] and data-sharing tasks [PSM+15, LBKN14]. Moreover, the same properties make it ideal for achieving performance isolation (e.g., our discussion in Chapter 4, or the static allocation in Callisto [HMM14]), as it can restrict destructive resource sharing caused by cache pollution [LDC+09] and bandwidth contention [TMV+11]. Arguably, in contexts like data processing, for efficient performance isolation the unit of allocation should be entire NUMA regions rather than cores.

5.4.5 The need for data-aware task placement

Finally, data processing applications are sensitive to the location of the data they access, in particular when executing on large NUMA systems. A common algorithmic pattern in data processing is to access the data in multiple stages, thereby creating a data-flow path between parallel sub-jobs [RTG14, LBKN14]. In such cases, it is important to preserve data locality and reduce the traffic between the communicating jobs across the machine's interconnect. Prior studies have shown the effect that data and thread placement can have on performance [PSM+15]. Therefore, there is a need to enable such applications to:

• Specify their preference for the NUMA node on which they want to execute their job, and hence move the computation to the data; or

• Define that a sequence of parallel operations is to be executed in a pipeline and should thus execute on the same set of resources. That way the system can avoid unwanted migrations or inappropriate placement decisions by the OS (e.g., the operator pipelines in Chapter 3, or morsel-driven parallelism by Leis et al. [LBKN14]).


The design objectives of the Badis compute kernel optimized for task-based execution (which we refer to as Basslet) and of its runtime scheduler, explained in the next section (§ 5.5.2), are based on the needs and requirements discussed here.

5.5 Implementation

We chose the Barrelfish OS [bar16] (git revision 1500a71) as a basis for implementing Badis and Basslet primarily because it allows running a separate and different kernel on each core. Barrelfish is a freely available OS based on the multi-kernel design [BBD+09], and provides an infrastructure for dynamic core booting [ZGKR14]. This enables us to exchange or update kernels on a set of cores. In Badis, we make use of this functionality to realize the adaptive control/compute plane separation and the dynamic nature of the compute plane. Our current implementation targets the x86-64 architecture.

5.5.1 Control and Compute plane

Control plane The control plane cores run the full-weight OS, which in our case is based on Barrelfish. Apart from executing applications' threads, the control plane is also responsible for setting up the compute plane. The initial partitioning of resources between the two planes happens at boot time, when the system boots a separate kernel on all available CPUs. The control plane kernel runs a modified version of the boot manager from Barrelfish, which first spawns the Barrelfish OS only on the CPUs allocated to the control plane, and then interacts with the CPU boot manager to spawn the customized kernels on the cores dedicated to the compute plane. For the threads running on the control plane, the compute plane cores are invisible, i.e., these threads can only run on the control plane cores.

Compute plane The customized kernels, which run on the compute plane, are deliberately kept simple to avoid taking part in any globally coordinated OS routine, and to remove any interference coming from the OS services. Such simplicity makes the kernels easily scalable and allows them to be specialized for both the underlying hardware platform and the workload requirements. But it also comes at a price, because certain operations are no longer possible. For

instance, a job that is submitted for execution on the compute plane can access the address space of the corresponding process (which runs on the control plane) from the compute plane, but it cannot modify it. In Barrelfish, the kernels on the compute plane run on a per-core basis and can be spawned at roughly the cost of a context switch [ZGKR14].

Executing a job on Badis' compute plane In user-space, the applications using the control/compute plane architecture link to a library that provides a set of functions for submitting jobs to be executed on a (set of) dedicated kernel(s) on the compute plane (see Figure 5.6). Badis provides a set of queues implemented as shared data structures. When an application submits a job for the compute plane, the Barrelfish kernel will add the job (provided as a task) to the corresponding queue, and signal the related compute plane kernel (using inter-processor interrupts (IPIs)) that work is available. In addition, applications can use a messaging API for communication between threads executed on the control plane and tasks executed on the compute plane. On the compute plane side, dequeuing a job from the queue essentially means executing it as the new task/ptask program execution unit. In some sense this is similar to starting a thread: if the task is dispatched for the first time, the kernel switches to the address space of the owning process; otherwise it initializes the register set with the correct instruction and stack pointers and switches back to user mode. The rest of the chapter explains in more detail the implementation of our concrete compute plane kernel (Basslet), which was customized for executing parallel analytical jobs based on our analysis in Section 5.4.
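To make the submission path concrete, the following minimal sketch follows the calling convention used later in Listings 5.1 and 5.2 (the task body, the two-task sizing, and the type of the ptid handle are illustrative assumptions; Figure 5.6 additionally allows a locality hint on enqueue):

#include <stdint.h>
#include <stddef.h>
/* assumes the Badis user-space library header providing struct ptask
 * and the badis_ptask_* functions of Figure 5.6 */

/* Illustrative task body; tasks run to completion on a compute-plane core
 * within the submitting process' address space. */
void example_task(void *arg) {
    /* ... data-parallel work ... */
}

void submit_and_wait(void) {
    uint64_t ptid;                              /* handle type is an assumption */

    /* Build a ptask with two tasks (cf. Listings 5.1 and 5.2). */
    struct ptask *pt = badis_ptask_create(2);
    for (int i = 0; i < 2; i++) {
        pt->tasks[i].task = example_task;
        pt->tasks[i].data = NULL;
    }

    /* Enqueue to a compute-plane queue; the control plane signals the
     * corresponding Basslet instance via an IPI. */
    badis_ptask_enqueue(pt, &ptid);

    /* Block until all tasks of the ptask have run to completion. */
    badis_ptask_wait(ptid);
    badis_ptask_free(pt);
}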

5.5.2 Compute plane kernel – Basslet

We implemented the Basslet compute plane kernel by forking the existing Barrelfish kernel. A Barrelfish kernel in its basic form has support for multi-tasking, implements a capability system, and has a set of drivers for core-local devices such as interrupt controllers, timers, or MMUs. For Basslet, we kept the initialization routines of the existing Barrelfish kernel, but adapted the remaining system as follows:

• First, we replaced the thread-based scheduling mechanism of Barrelfish with a task-based scheduler that executes tasks to completion;

• Second, we replaced the system call interface with one used by Basslet tasks; and

• Finally, we set the interrupt controller to only accept interrupts specific to Basslet.

Badis Data Structures
struct task  { task_fn fun, void* arg, ... }
struct ptask { struct task* tasks, size_t count, ... }

Application API
badis_ptask_create(tasks) → struct ptask*
badis_ptask_enqueue(ptask, locality, ...)
badis_ptask_free(ptask)
badis_ptask_wait(ptask)
badis_ptask_abort(ptask)

Message API
badis_send(dest, msg, arg1, ...)
badis_wait_for(msg) → [data]

Figure 5.6: The API exposed by the control plane of Badis, which can be used by applications to enqueue and communicate with jobs executed on the compute plane. Each job is constructed using the task and ptask structures offered as new program execution units by Badis' process model.

In Badis, the compute plane cores can either run a separate, different kernel each, or several cores can be grouped into so-called instances of the same kernel. The instances are typically used for executing parallel jobs (ptasks). In our current prototype, the Basslet kernels are grouped into Basslet instances. These instances are currently sized to span an entire NUMA node, in order to provide support for spatial isolation and minimize resource interference. Note that this is just a decision based on the properties of current multicore systems; if desired, a Basslet instance can also be spawned on a smaller or larger scale, e.g., on core-groups in SPARC M7 [Phi14]. Within an instance, typically one of the kernels acts as a master and is responsible for coordinating the ptask execution over the entire instance (i.e., managing the jobs executed on the other worker kernels belonging to the same instance). More concretely, the master is notified by the control plane once a new ptask is available. It then tries to acquire a lock

on its queue to fetch the ptask. If successful, it notifies the workers in the instance (i.e., the other cores) that a new ptask is available. It then runs the task scheduler, which starts dispatching tasks until all the tasks of the ptask are executed. The workers immediately start executing tasks after they are woken up by the master and go back to sleep once the ptask is fully executed. The scheduler guarantees the run-to-completion of tasks, while the grouping of master and worker kernels into Basslet instances running on a dedicated NUMA node ensures co-scheduling and spatial isolation. We would like to note, however, that with co-scheduling there is always a danger of one of the tasks misbehaving or blocking indefinitely due to program bugs. Such a scenario could essentially stall an entire Basslet instance. As a solution, we use watchdogs, programming the local APIC timer to interrupt the task in case the execution takes too long. If the timeout is reached, the entire ptask is aborted and a failure is reported back to the application, which can then react accordingly (for example by restarting the ptask).
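The coordination just described can be summarized by the following structural sketch of the master kernel's dispatch loop. Every identifier below is a placeholder for Basslet-internal state and mechanisms mentioned in the text, not an actual Barrelfish symbol:

struct task;
struct ptask;

extern struct ptask *try_dequeue_ptask(void);   /* lock own queue, fetch ptask    */
extern void wake_workers(void);                 /* notify the instance's cores    */
extern struct task *next_task(struct ptask *p); /* NULL once all tasks are taken  */
extern void run_to_completion(struct task *t);  /* switch address space and run   */
extern void arm_watchdog(void);                 /* program the local APIC timer   */
extern int  watchdog_fired(void);
extern void abort_ptask(struct ptask *p);       /* report failure to application  */
extern void wait_for_ipi(void);                 /* sleep until the control plane
                                                   signals new work               */

void basslet_master_loop(void) {
    for (;;) {
        wait_for_ipi();
        struct ptask *pt = try_dequeue_ptask();
        if (pt == NULL)
            continue;                            /* lost the race for the queue   */
        wake_workers();
        arm_watchdog();
        struct task *t;
        while ((t = next_task(pt)) != NULL) {
            run_to_completion(t);                /* tasks are never preempted     */
            if (watchdog_fired()) {              /* a task ran or blocked too long */
                abort_ptask(pt);
                break;
            }
        }
        /* workers go back to sleep once the ptask is fully executed */
    }
}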

5.5.3 Basslet runtime libraries

Tasks running on a Basslet compute plane kernel can use the API provided by the Basslet library, as shown in Figure 5.7. The API currently supports a set of synchronization functions (similar to the ones exposed by pthreads) and dynamic memory allocation

Basslet Application API (synchronization)
bas_mutex_lock(bas_mutex)
bas_mutex_unlock(bas_mutex)
bas_cond_release(bas_cond)
bas_cond_wait(bas_cond, bas_mutex)
bas_cond_signal(bas_cond)
bas_cond_broadcast(bas_cond)
bas_barrier_wait(bas_barrier)

Basslet Application API (memory)
malloc(size)
free(ptr)

Figure 5.7: Basslet runtime API

using malloc and free. The synchronization primitives are implemented using x86 atomic instructions with exponential back-off. In practice, and as others have reported [DGT13], we found that this works well for our set-up, as tasks are never preempted and communicate with other tasks via the last-level cache. A Basslet kernel cannot allocate or map new memory into an address space. Therefore, if malloc ever runs out of the internal buffer space while called from within a task, it forwards the allocation request to the control plane using message passing. Currently, Basslet provides support for algorithms parallelized either using the OpenMP runtime or standard POSIX threads. In the rest of the section, we first sketch how a program originally parallelized using pthreads can be adapted to Badis tasks, and then discuss how we ported the OpenMP runtime on top of Basslet.
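A simplified sketch of this malloc fallback path follows. The buffer bookkeeping, the destination and message identifiers, and the return type of badis_wait_for are assumptions; only badis_send and badis_wait_for are taken from the messaging API of Figure 5.6:

#include <stddef.h>
#include <stdint.h>

extern void *local_buffer_alloc(size_t size);   /* placeholder: allocator over the
                                                   pre-mapped per-instance buffer  */
extern void badis_send(int dest, int msg, size_t arg);
extern uintptr_t badis_wait_for(int msg);       /* return type is an assumption    */

#define CONTROL_PLANE   0                       /* placeholder destination id      */
#define MSG_ALLOC       1                       /* placeholder message identifiers */
#define MSG_ALLOC_REPLY 2

void *malloc(size_t size) {
    /* Fast path: serve the request from the pre-allocated internal buffer. */
    void *p = local_buffer_alloc(size);
    if (p != NULL)
        return p;

    /* The Basslet kernel cannot map new memory itself, so forward the request
     * to the control plane and block until it replies with a usable pointer. */
    badis_send(CONTROL_PLANE, MSG_ALLOC, size);
    return (void *)badis_wait_for(MSG_ALLOC_REPLY);
}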

POSIX threads

The conversion from pthreads to Badis tasks is fairly straightforward. We show an example of how to translate spawning and joining threads into Badis ptasks in Listing 5.1. With the Basslet runtime API (Figure 5.7), the tasks are able to allocate memory and use various synchronization primitives. While such a restricted set of functions is relatively minimal, we found it sufficient for porting a set of widely used relational operators running inside a database (e.g., a radix-based hash join, or aggregation). In addition, the programmer can always use the messaging API to send requests back to the control plane. However, this incurs additional latency and may stall tasks unnecessarily, so it should only be used in exceptional circumstances. Such expensive calls (e.g., memory allocation) are typically done outside of the critical path of latency-sensitive operations, so we do not expect increasing their cost to be a problem.


Listing 5.1: Example of porting the POSIX thread creation using Badis’ API

/* Create a ptask with n tasks. */
struct ptask *pt = rt_ptask_create(n);

/* Instead of spawning threads, set up the individual tasks. */
for (int i = 0; i < n; i++) {
    set_thread_args(args, ...);
    pt->tasks[i].task = fn;
    pt->tasks[i].data = &args[i];
    // pthread_create(&tid[i], &attr, fn, (void*)&args[i]);
}

/* Once all tasks are set, enqueue the ptask to Basslet. */
rt_ptask_enqueue(pt, &ptid);

/* Instead of joining the threads, wait for the ptask. */
// for (int i = 0; i < n; i++) {
//     pthread_join(tid[i], NULL);
// }
rt_wait(ptid);

The OpenMP runtime

In our prototype we also ported an OpenMP [Ope15] runtime on top of the Basslet API. OpenMP programs consist of C/C++ code with annotations added by the programmer or by a DSL to automatically parallelize loops or other constructs. The annotations are read by the OpenMP compiler, which generates intermediate functions to compartmentalize loops and transforms the annotations into a series of calls to an OpenMP runtime library for execution on different cores. The runtime library is then in charge of distributing the work across the different cores. As an example, a simple implementation in Badis of the #pragma omp parallel annotation is given in Listing 5.2. In our implementation we also added a few optimizations. First, OpenMP typically assumes that thread 0, which initially invokes the GOMP_parallel_start function on the control plane, also takes part in the execution of the parallel section. Hence the nthreads-1

in Listing 5.2. In our implementation, however, we deliberately avoid that and only allow tasks (on the compute plane) to execute the parallel parts.

Listing 5.2: Example of OMP runtime using Badis’ API for #pragma omp parallel

GOMP_parallel_start(void* fn, void* data, int nthreads) {
    struct ptask *pt = badis_ptask_create(nthreads-1);
    for (int i = 0; i < nthreads-1; i++) {
        pt->tasks[i].task = fn;
        pt->tasks[i].data = data;
    }
    badis_ptask_enqueue(pt, &ptid);
}

GOMP_parallel_end(void) {
    badis_ptask_wait(ptid);
    badis_ptask_free(pt);
}

Second, very often the #pragma omp parallel statements are enclosed by a loop that iterates over some data until reaching convergence, i.e., for each iteration we enqueue a separate ptask. In that case, it is desirable to run the consecutive #pragma omp parallel constructs (ptasks) on the same Basslet instance and in sequence, in order to benefit from data locality and cache reuse. There are several ways to implement this second optimization in Badis. The simplest is to use a queue dedicated to a particular Basslet instance and enqueue all the ptasks of such a loop or data-flow pipeline to the same instance queue. An alternative is to spawn a single ptask, and then let the Basslet runtime use the message passing interface to hand off work units to the different Basslet workers executing the ptask. We currently only support a subset of OpenMP, namely the parallel pragma as well as dynamic and static for loops, which is sufficient for running many of the GreenMarl graph kernels used in our evaluation.
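As an illustration of this pattern (the PageRank-style kernel below is schematic and not taken from the GreenMarl sources; the helper functions and rank arrays are illustrative), each iteration of the outer loop produces one #pragma omp parallel region and hence one ptask, so routing all of these ptasks to the same Basslet instance keeps the graph data resident in that instance's NUMA node across iterations:

extern int converged(const double *old_rank, const double *new_rank, int n);
extern double compute_rank(int vertex, const double *old_rank);

void pagerank_like(double *old_rank, double *new_rank, int num_vertices) {
    while (!converged(old_rank, new_rank, num_vertices)) {
        /* Each iteration's parallel region becomes one ptask. */
        #pragma omp parallel for
        for (int v = 0; v < num_vertices; v++)
            new_rank[v] = compute_rank(v, old_rank);

        double *tmp = old_rank;          /* swap rank arrays for the next round */
        old_rank = new_rank;
        new_rank = tmp;
    }
}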

5.5.4 Code size

The Badis user-space library is currently about 1.5k lines of code. However, it does rely on several libraries from the existing Barrelfish source (e.g., for memory allocation,

existing data-structures, and messaging). The Basslet user-space library adds another 1.5k lines, of which the OpenMP runtime implementation accounts for about 1k lines of code. The changes to the existing Barrelfish kernel to adapt it into a Basslet kernel were relatively small. Overall, we changed 20 files in the tree, added 844 lines of code, and removed 122 lines. This includes some changes to the build scripts, which exclude some files entirely (such as the code to manipulate the capability data-structure, which is not required on the data plane). The Basslet kernel itself currently supports the x86-64 architecture. One important hardware requirement is a fast notification mechanism between the compute and control plane, which can be either interrupt- or memory-based (for example by using something similar to the monitor and mwait instructions on Intel).

5.6 Evaluation

Our experiments are run on the AMD MagnyCours machine with the following properties: Dell 06JC9T board with four 2.2 GHz AMD Opteron 6174 processors and 128 GB RAM. Each processor has two 6-core dies, each forming a NUMA node with 16 GB of memory and a 5 MiB LLC. The main advantage is that the machine has more NUMA nodes (eight in total), which allows us to scale the compute plane up to seven instances. Parallel data processing applications are the focus of our evaluation. From the GreenMarl [HCSO12] graph application suite (git revision 4c0d62e) we execute the following algorithms: (1) PageRank (PR), (2) Single-Source Shortest Path (SSSP), and (3) HopDistance (HD). We evaluate the performance of the three algorithms on the LiveJournal graph, which is the largest available social graph from the SNAP dataset [LK14]. It has 4.8 M 32-bit nodes and 68 M 32-bit edges (about 300 MB of raw binary data). The presented measurements do not include the time for loading the graph into memory. For a relational DB workload we use the hashjoin (HJ) operator [BTAO13], as described in Section 4.3.1. The first set of experiments evaluates the efficiency of the Basslet compute plane kernel when scheduling concurrent parallel workloads. We compare its performance with the same workloads executed using the default OpenMP and Linux schedulers.


[Figure 5.8 consists of four heatmaps showing pairwise slowdowns for PR, HD, and SSSP: (a) Linux+OpenMP 12/24, (b) Basslet + Badis 12/24, (c) Linux+OpenMP 6/12, (d) Basslet + Badis 6/12.]

Figure 5.8: Slow-down of PR, HD and SSSP algorithms when co-executed with a partner algorithm. Comparing their normalized runtime when running on Linux + OpenMP to their execution on Basslet + Badis.

5.6.1 Interference between a pair of parallel jobs

The first experiment measures the effects of interference when different pairs of parallel jobs execute concurrently. The experimental setup is the following: we execute the three GreenMarl graph algorithms on separate OpenMP runtime instances. We run them concurrently either using (1) the default Linux scheduler, with OpenMP choosing the degree of parallelism, or (2) the Basslet compute kernel runtime. We tested with


Basslet instances spawned on one or two NUMA nodes each. The reported numbers are the execution times normalized to a baseline experiment, which measured each algorithm's runtime when run in isolation on 6 and 12 cores (on the respective 1 and 2 NUMA nodes). When executing a pair of algorithms concurrently, for both setups we doubled the allocated resources, i.e., we assigned 12 and 24 cores (on the respective 2 and 4 NUMA nodes).¹ Ideally the normalized time should be 1, as for twice the load we allocated double the resources. The results from the run on Linux are presented in Figure 5.8a and Figure 5.8c. The heatmaps show the slowdown of the noisy execution relative to the runtime in isolation. The first observation is that there is significant performance degradation for all combinations of algorithm pairs. In some cases, the slowdown can reach up to 2.7x, despite having enough resources for both jobs to execute well. The second observation is that the degradation and interference get worse with a higher degree of parallelism, i.e., with the number of NUMA nodes used. This is an important insight, especially because NUMA nodes on more recent machines are becoming bigger (i.e., have more cores), so the effects of internal resource sharing are going to be exacerbated. In contrast, when the same combinations of algorithm pairs are executed on Basslet + Badis, the normalized runtimes are as expected (Figures 5.8b and 5.8d), i.e., the jobs' runtimes are almost unaffected compared to their runs in isolation. These results confirm the benefits of Basslet's runtime design decisions, in particular (1) the non-preemptive co-scheduling of the tasks belonging to the same ptask; (2) the spatial isolation of ptasks on complete NUMA nodes – which also limits the degree of parallelism for each job; and (3) the data-aware task placement, which was particularly important for the tight pipeline of ptasks within a loop, as generated by the GreenMarl compiler. As a result, Basslet's scheduler delivers the desired performance isolation, even in noisy environments.
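Put formally (notation mine, not the thesis's), each heatmap entry of Figure 5.8 reports

    slowdown(A | B) = T_A(co-run with B on 2n cores) / T_A(alone on n cores),   n ∈ {6, 12},

so a value of 1.0 corresponds to perfect performance isolation.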

5.6.2 System throughput scale-out

We revisit the problem statement experiment presented in Section 5.1, measuring how well different scheduling approaches do when increasing both the number of clients in the system and the allocated resources.

¹We switched off the remaining cores and let the OpenMP runtime choose the degree of parallelism and schedule the threads on top of the default Linux scheduler.


[Figure 5.9 plots the per-client throughput [PR/min/client] against the number of clients (1–8) for four machines: Intel SandyBridge, Intel IvyBridge, AMD MagnyCours, and AMD Bulldozer.]

Figure 5.9: Expanding the problem statement experiment (recall Figure 5.1) on four different machines.

First, we repeated the same experiment on three additional machines, to verify that the effects we observed on the AMD MagnyCours machine are not a special case:

1. Intel Sandy Bridge with four Intel Xeon E5-4640 processors, each containing one NUMA node with a 20 MiB LLC and eight hyper-threaded CPUs running at 2.4 GHz;

2. Intel IvyBridge with four Intel Xeon E5-4650 v2 processors, each containing one NUMA node with 24 MiB LLC and ten hyper-threaded CPUs running at 2.4 GHz;

3. AMD Bulldozer with four 2.4GHz AMD Opteron 6378 processors with two dies (a total of 8 NUMA nodes with 64 GiB each). Each NUMA node has a total of eight cores with a single hardware thread, and a shared 6 MiB LLC.

Hyper-threading is disabled for the Intel machines. Also for this experiment we run the PageRank algorithm on the LiveJournal graph. The baseline run uses the cores on the first NUMA node.² As a result, please note that the

²We explicitly disabled the rest of the cores on the machine by switching them off.

degree of parallelism used is machine-dependent. The results, presented in Figure 5.9, confirm that the problem of per-client throughput scaling in concurrent workloads is exhibited on all four multicore machines. Second, we measure the performance of three different scheduling approaches on the AMD MagnyCours machine:

1. First, we enable nested parallelism and set the concurrency level internally within the OpenMP program. We execute it on the default Linux scheduler.

2. Second, we evaluate a setting where multiple OpenMP runtimes execute on top of the default Linux scheduler, one OpenMP program for each PageRank client.³

3. Finally, we evaluate the system behaviour when the same parallel workload is executed using the Basslet runtime.

As a client in the workload, we again use the PageRank algorithm running on the LiveJournal social graph. Once again the baseline data point is taken with a single client running a job on 6 cores. For every subsequent client (another instance of the PageRank algorithm) we allow the system to use an additional 6 cores. The rest of the cores remain disabled. The reported system throughput is the inverse of the total time needed to execute all PageRanks, i.e., the throughput as perceived per client. The results are presented in Figure 5.10. They show that the performance interference among multiple clients increases as we add more clients, both when OpenMP alone and when OpenMP+Linux schedule the resources, despite there being sufficient resources. The machine has forty-eight cores, so there are enough resources to execute eight concurrent PageRanks within one OpenMP runtime, or as separate OpenMP runtimes on top of Linux. In contrast, Basslet achieves almost perfect per-client throughput scale-out up to seven clients. The final six cores, belonging to the first NUMA node, are dedicated to the control plane, which limits the scalability to seven Basslet instances. We would like to note, however, that if the multiple OpenMP runtimes were aware of each other and used a static resource partitioning in a noise-free environment, by limiting the degree of parallelism (e.g., by setting OMP_NUM_THREADS) and by thread pinning (e.g., by using libnuma or the GOMP_CPU_AFFINITY flag) on NUMA hardware islands, we would achieve almost the same scalability as with Badis+Basslet.

³This is the setup used in the previous experiment.


[Figure 5.10 plots the per-client throughput [PR/min/client] against the number of clients (1–8) for Basslet, Linux+OpenMP, OpenMP, and the ideal scale-out.]

Figure 5.10: Throughput scale-out when executing multiple PRs using internal OpenMP parallelism (option 1) vs. Linux+OpenMP scheduler (option 2) vs. Basslet (option 3).

5.6.3 Comparing standalone runtime: Linux vs. Basslet

The goal of this experiment is to compare the absolute runtimes of the algorithms when executed on Basslet + Badis versus Linux. All algorithms are executed in isolation and with a degree of parallelism equal to the number of cores in a single NUMA node. For both systems, the algorithms were executed on cores belonging to the same NUMA node. The results, shown in Table 5.1, indicate that the algorithms' runtimes on Basslet are comparable to the ones measured on Linux. We would also like to point out that the compute plane could be further customized to improve the performance of such workloads.

5.6.4 Overhead of Badis enqueuing

This experiment measures the overhead (additional time/cost) of enqueuing parallel jobs for execution on the compute plane. It measures the time of issuing the enqueue system call in two situations: (1) when the compute plane still has enough resources to execute the new job concurrently, and (2) when it is saturated and the new job needs to queue.


Table 5.1: Runtime of parallel algorithms executing on Linux versus Basslet.

Algorithm (input data)        Execution time (ms)
                              Linux      Basslet
Hash join (128M x 128M)       4787       3316
PageRank (LiveJournal)        6712       6509
Hop-Dist (LiveJournal)        515        542
SSSP (LiveJournal)            3390       3491

For this experiment, as a parallel job we use the parallel hashjoin operator and the compute plane runs Basslet instances. We measure the execution time of the hashjoin (HJ) before invoking ptask_enqueue and after returning from ptask_wait, as well as the execution time of the algorithm within the Basslet instance. In order to generate enough load to saturate the compute plane (i.e., spawn more hashjoin clients), we dedicated the cores on two NUMA nodes for the control plane, and use the cores on the remaining six NUMA nodes for the compute plane with six Basslet instances. The performance of the hashjoin is typically evaluated in number of cycles per output

[Figure 5.11 plots the latency [cycles/tuple/client] against the number of clients (0–12), showing both the total time (incl. queuing) and the runtime within a Basslet instance.]

Figure 5.11: Overhead of an enqueue syscall and the queuing effects in Badis using HJ.

tuple [BTAO13]. The join in this experiment is executed on input relations with 32 million 64-bit tuples. The results, shown in Figure 5.11, indicate that the cost of enqueuing a ptask is quite low. They also show that as soon as there are not enough Basslet instances in the compute plane to take over the enqueued ptasks, the wait time increases. We also note that the runtime within a Basslet instance remains stable for all jobs, despite the noise in the system.
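For reference, the per-client latency metric of Figure 5.11 can be read as (notation mine):

    latency [cycles/tuple] = (t_after_wait − t_before_enqueue) · f_core / N_output_tuples,

where f_core is the 2.2 GHz clock frequency of the MagnyCours machine and N_output_tuples is the number of result tuples produced by the join.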

5.6.5 Evaluating the adaptive feature of Badis

Finally, we put into perspective the adaptive functionality of the architecture for swapping a kernel by comparing it to other common operations:

• Performing a context switch,

• Resizing a Linux container – used for example in [Mer14], and

• Resizing a virtual machine (VM).

The numbers, taken from prior work, are presented in Table 5.2. It shows that kernel swapping in Badis can be on-par with a heavy context switch, i.e., it is in the range of a millisecond. That makes it several orders of magnitude faster than adjusting the resource allocation to an application running in a Docker container or on top of a Virtual Machine. It also only accounts for a fraction of the execution time of an analytical job, so any adjustments are unlikely to be noticed as delays.

Table 5.2: Time to adjust the resource allocation for different mechanisms/systems.

Mechanism/System                  Range for adjustment time [us]
context switch [Sec. 5.4.2]       0.6 × 10^0 – 6 × 10^3
Badis (Barrelfish) [ZGKR14]       (0.5 – 1.2) × 10^3
Linux containers [Pan15]          (1.7 – 7.8) × 10^6
Virtual Machines [Pan15]          9 × 10^8


5.7 Related work

5.7.1 Scheduling parallel workloads

Several systems have addressed the challenges of scheduling parallel applications on modern multicore systems. Tessellation [LKB+09] used space-time partitioning to factor the machine's resources into separate units and virtualize them for user-level runtimes. In contrast to Badis, Tessellation does not specialize the OS kernels for task- or thread-based workloads. Instead it exposes “harts” (1:1 abstractions of physical hardware threads) and relies on user-level schedulers [CEH+13] for the fine-grained thread and memory management. Such two-level scheduling has also been adopted in other operating systems [MSLM91, ABLL91, WA09]. The high performance computing (HPC) community has also explored scheduling techniques such as gang-scheduling [Ous82] or flexible co-scheduling [FFPF05]. In Basslet, ptasks run coordinated and with temporal locality. In contrast to gang-scheduling, our system does not require complex scheduling logic across the whole machine [Pet12]. In Chapter 4 we already outlined the challenges of scheduling parallel analytical algorithms on modern multicore machines. However, as workloads become increasingly complex and diverse (e.g., not only HTAP or operational analytics but also graph processing, R, etc., on top of traditional query processing engines), these systems must efficiently schedule heterogeneous workloads with sometimes conflicting requirements. Cloud schedulers [OWZS13, SKAEMW13, DK16] face similar problems for a heterogeneous mix of interactive and batch workloads. Teabe et al. [TTH16] identified five main application types with different requirements for the length of the scheduler quanta, and scheduled the vCPUs on pools of physical CPUs with the desired quantum length. However, this analysis still falls short of addressing the conflicting scheduling requirements of HTAP systems experiencing severe performance interactions [PWM+15].

5.7.2 Scheduling of and within Runtime systems

Programming models as provided by Cilk [FLR98], X10 [CGS+05], or OpenMP [Ope15] offer a convenient way to express parallel algorithms that is simple both for the programmer to use and for the runtime to parallelize. However, they typically have little knowledge about the overall system state or the jobs' requirements. For instance, libgomp


(from OpenMP) uses the average load over the last 15 minutes to estimate the suitable degree of parallelism in dynamic mode [Inc16]. And as we have seen in our experiments, such miscalculations of the degree of parallelism can easily result in oversubscription in the presence of other applications or multiple parallel runtimes. Harris et al. [HMM14] also identified the interference problems among co-existing parallel runtime systems. The authors propose Callisto, a shared library that helps parallel runtimes coordinate their execution within a single machine, to avoid over-subscription of CPU resources, but also to enable reuse of allocated but unused cores. Lithe [PHA09] addresses the problem of composability of such parallel runtimes within a single application by building a hierarchical assignment of “harts” (1:1 abstractions of physical hardware threads) in order to avoid over-subscription. Both Callisto and Lithe have the shortcoming of solving the problem in user space without an overview of the system state and of the other services and applications currently executing in the system. Nevertheless, we believe that both approaches bring valuable ideas that can also be explored in the new dynamic and flexible OS architecture that allows kernel specialization and supports task-based co-scheduling of parallel jobs.

5.7.3 Specialized kernels

High performance computing (HPC) systems are very sensitive to so-called OS “noise”. Such noise, aggravated by the scale at which these systems run, results in severe performance problems [HSL10]. Thus, supercomputing systems initially proposed the idea of using customized lightweight kernels [RMG+15] (e.g., Catamount [KB05], CNK [GGIW10]), a technique we also use for the Badis compute plane. Our approach uses separate kernel programs to specialize the kernel on a compute plane instance, while other approaches rely on exchanging the kernel code on a per-process basis [CCS+15]. HPC processes are synchronization and coordination heavy, and these kernels optimize for such effects. Additionally, they do not have to worry about high workload concurrency and time-sharing of the machine's resources. Basslet, in contrast, provides a task-based runtime execution for general parallel data-processing workloads, for performance isolation on commodity servers. In parallel, however, the HPC community has also been exploring the design space of a multikernel by having light-weight kernels run alongside full-weight kernels like Linux inside the same system. Example systems include FusedOS [PVHH+12],

mOS [WIK+14], and [GTI+15]. Shimosawa et al. [SGT+14] provide a framework (IHK) which allows easy booting of the LWKs by providing the basic functionality for CPU initialization, inter-kernel communication, and resource partitioning (e.g., of CPU cores and physical memory). The idea of offloading system calls from the compute plane kernels to the control plane FWK is inspired by FlexSC [SS10].

5.7.4 OS mechanisms for scheduling and performance isolation

Decades of work in systems research and software development have contributed many different mechanisms and scheduling policies. Here we only briefly discuss a few of the ones that are closely related to our proposed solution. Solaris [Ora09] uses locality groups to describe NUMA machines and to set thread and memory affinities for an entire group of threads. Linux, on the other hand, uses cgroups [Heo15] to divide tasks into hierarchical groups and perform fine-grained scheduling for every group. As we discussed before, even though the mechanism was primarily designed for managing OS processes, practitioners were abusing it to group threads in order to notify the OS that they should be scheduled differently, which is no longer possible. Windows uses fibers as a light-weight alternative to threads for co-operative multitasking. This has been used by the Microsoft SQL Server in some high-scale transactional processing benchmarks [HSH07]. However, in comparison to Badis, fibers are multiplexed on a set of OS threads. Rossbach et al. [RCS+11] introduce a PTask abstraction for describing parallel work units on GPUs, which is very closely related to Basslet's ptask execution unit. Other researchers from the database community have also explored the benefits of scheduling database operations using task-based or morsel-driven parallelism [LBKN14, PSMA13]. Fos [WA09] runs OS services in user space, similar to a micro-kernel. It does not preempt them during execution, but rather uses a cooperative model where a service yields at specific locations in the program. Badis also adopts a cooperative execution model for the Basslet instances. But in contrast to Fos, the Barrelfish user-space OS services remain scheduled as threads on the control plane.


5.7.5 Linux containers and virtualization

Industry solutions like Docker containers [Mer14] rely on cgroups to solve dependency conflicts and provide independence. In contrast to Badis, the resource provisioning of a Docker container is done manually. The solution provided by Badis is orthogonal and relies on customized kernels for the compute plane. However, Docker could benefit from a system like Badis to compartmentalize further and run different Linux kernel versions on certain cores. Hypervisors like Xen [BDF+03] can run multiple operating systems on the same machine by using virtualization techniques. The idea of specialized execution with multiple virtual machines has been proposed in the past [BDSK+08]. However, in our current work hypervisors are orthogonal to Basslet, which aims to provide better support for scheduling a mix of data-processing applications, working on the same data, within a single OS. Library operating systems follow the idea of reducing abstraction in the OS kernel, and instead only mediate safe access to the underlying hardware [EKO95]. Applications then use libraries to abstract the access to the low-level hardware. Such a design allows applications to specialize for various workloads, or enables sand-boxing and lightweight virtualization [PBWH+11]. Both Barrelfish and Badis are implemented as library OSes. The unikernel approach [MMR+13] pushes the library OS concept to the extreme by running only a single address-space machine, which a cloud provider may run on a hypervisor (to avoid overhead due to layering) or directly on bare-metal hardware. Their evaluation shows the performance improvements of customizing the system stack for specific applications, and using Badis' compute plane kernels we hope to be able to achieve similar benefits in a less constrained setup.

5.8 Integration with COD’s policy engine

In this section we briefly discuss how the Badis OS architecture can be integrated with COD's policy engine and its rich declarative DB/OS interface. The OS policy engine is an integral component of the control plane OS. If Badis can choose among several customized compute plane kernels available for different classes of workloads (e.g., with customized schedulers), then it is the OS policy engine in the control

plane that decides how to allocate resources to the compute plane kernels and how to link the applications and their jobs to the corresponding compute plane kernels. COD's declarative interface allows the database to pass information about the properties and preferences of its components down to the OS policy engine. This information can then be used to compute the assignment of the application's working units (e.g., user-level threads) onto hardware threads (and other resources). Therefore, a database can push different preferences for its components depending on their properties. For example, a thread pool that serves short-running, latency-critical, and synchronization-heavy queries will have requirements for a thread-based scheduler with short quanta and optimized storage and I/O accesses. These jobs can then be dispatched for execution to a compute plane kernel which is customized accordingly. Similarly, the database can use the new system call interface to invoke ptask-based scheduling for its parallel analytical jobs on NUMA hardware islands.

5.9 Summary

In this chapter we presented Badis, our OS architecture that allows dynamic adjustments of the OS kernels and services based on workload requirements. Badis' control/compute plane separation enables complex applications to execute efficiently on modern machines, by providing adaptive and customizable OS stacks. Furthermore, we showed how a compute kernel can be specialized for efficient scheduling of parallel analytical algorithms by implementing Basslet, a kernel-integrated runtime scheduler for ptask program execution units. In our experiments, we achieved almost linear throughput scale-out and predictable runtimes for heavy analytical workload mixes. While our prototype is implemented over a multikernel, it is reasonable to ask whether similar benefits could be obtained by modifying a monolithic kernel like Linux. A radically new scheduler (similar to Basslet), leveraging recent proposals for fast core reconfigurability in Linux [PSK15], might be able to achieve similar results for certain workloads but may introduce a penalty for others. Currently the Basslet kernel does not directly address I/O issues, but the Badis architecture makes it easy to integrate the control/data plane design proposed by systems like Arrakis [PLZ+14]. The use of different kernels in the two planes raises the issue of providing stronger isolation between them. Using the techniques proposed by nested kernels [DKD+15] or Mondrix [WRA05] remains to be explored as part of future work.

6 Future work

In this thesis we explored two aspects of the co-design of databases and operating systems. Our focus was primarily on the allocation of (homogeneous) CPU resources for analytical workloads on a single multicore machine. First, in Chapters 2 and 3, we revisited how to improve the policy decision-making process by using application knowledge for resource management on modern multicore machines. Second, in Chapter 4 we revisited the concurrency vs. parallelism argument in the context of modern analytical workloads on NUMA systems, and in Chapter 5 we proposed an extension to the existing OS process model with OS support for task-based co-scheduling of parallel jobs (ptasks). However, the interaction between a database and the operating system spans many other verticals of the system stack and involves different types of resources and workload requirements, which were not covered in this thesis but can be addressed as part of future work. In the first half of this chapter we cover a few of these opportunities. We conclude with a discussion of how some of the concepts and techniques presented in this thesis can be applied beyond the context of database/operating system co-design.


6.1 Follow up work

6.1.1 Managing different types of resources

Machines have a plethora of resources, and operating systems and applications need to collaborate to make the most of them. As already motivated in the previous chapters of this thesis, we can no longer assume a homogeneous set of resources, and there is more to resource allocation than CPU cores.

Memory management The effects of non-uniform memory accesses (NUMA), local DRAM and interconnect bandwidth, and their implications on synchronization and memory movement have been extensively studied both in the database community (e.g., [PPB+12, PSM+15, LPM+13]) and the systems community (e.g., [ZSB+12, DFF+13, DGT13]). Similar analyses and optimizations were also done for the role of the translation lookaside buffer (TLB) and for reducing the costly overhead of virtual-to-physical memory translation [KKL+09, YRV11]. Therefore, it does not come as a surprise that for certain classes of applications and workloads, the virtual memory system is simply becoming a bottleneck for main-memory data processing. This inspired recent proposals such as direct segments [BGC+13], which allow circumventing the costly virtual-to-physical address translation for large regions of memory. Furthermore, the page swapping policies of conventional operating systems do not take into account application-specific knowledge about the content of pages. It has been shown that this can have a negative impact on the performance of database systems [GVK+14], and such effects are likely to become more pronounced with the introduction of NVRAM [Kim15]. Unfortunately, the conventional operating system APIs for manipulating a process' address space still try to shield the application from all this complexity. This particularly limits the opportunities for an application like a DBMS to directly manage memory and its own page-tables, or to use self-paging to swap its data out to disk or NVRAM. Recent proposals in OS memory systems provide greater flexibility by exposing both physical and virtual memory directly to applications and allowing them to safely construct their own page-tables [GZA+15]. Using such an approach, database applications can take over the

control over memory management and thus avoid the potentially sub-optimal global policies enforced by the operating system. Finally, some database engines have explored the idea of using the OS and hardware support for copy-on-write via sometimes heavy and slow system calls like fork. It would be interesting to explore how newer OS abstractions like SpaceJMP [EHMZ+16], which allow for fast address space switching, can be used in combination with copy-on-write to provide a more lightweight alternative to forking.

More efficient use of and access to I/O devices In this thesis we did not consider workloads that span multiple machines or that work on data which does not reside in the memory of the local machine. There are, however, many applications for which these assumptions do not hold: for instance, transactional workloads or other latency-sensitive data processing applications which communicate with clients, rely on cross-machine replication (e.g., for reliability, performance isolation, or scalability), process data which is distributed across multiple machines (e.g., FaRM [DNCH14, DNN+15], HERD [KKA14], rack-scale join processing [BLAK15], or HyPer's rack-scale query processing [RMKN15]), or build data processing systems whose architecture is separated into layers [Loe15, LLMZ11, LLS+15]. In many of these systems, it is important that the system stack supports efficient processing of data requests over the network – in particular, solutions which completely remove the OS from the data path, but still provide means for controlling the policies and the security aspects among several applications. Due to the lack of support from conventional operating systems, most data processing engines today which optimize their implementations for network operations use Infiniband (i.e., they remove the OS from the critical path and rely on the network card (NIC) scheduler to do the resource multiplexing, following certain preferences, like priorities, given by the applications). However, most I/O hardware devices available today have support for virtualization, which allows virtual machines to have direct control over the entire device or parts of it. Recently, research operating systems such as iX [BPK+14] and Arrakis [PLZ+14] use this functionality to give applications direct access to the exposed hardware (disk or network). The reported performance for client requests on Arrakis to a persistent NoSQL store showed 2x better read latency, 5x better write latency, and 9x better write throughput compared to Linux. It would be interesting to explore how the policies devised by the control plane (i.e., the OS) can be improved given more knowledge of the applications' requirements.


Heterogeneous hardware At the moment, operating systems try to hide the underlying hardware complexity and diversity as much as possible from the applications running above. While this made sense in the past, such an approach today is very restrictive, as many of the resources are left underutilized and applications cannot get the most out of the available hardware. From the perspective of the OS policy engine, some of the interesting challenges are: how to capture and model the diversity in resources (e.g., their capabilities, capacities, and other properties), how to monitor their utilization (given the many performance counters), and whether and how to expose some of that information to the applications running on top. For example, it is increasingly common for machines to have heterogeneous computational devices, e.g., a powerful Intel Xeon processor alongside an FPGA (HARP machine), a highly parallel Xeon Phi, or a small and energy-efficient processor. An interesting question is how the OS can assist data processing applications in mapping their work units onto such a diverse set of platforms; we discuss this further later in the context of the DB optimizer. Furthermore, current trends in hardware architectures suggest that we are going to have “active” everything (e.g., integrated data processing at line rate in the chip (DMX in SPARC M7 [A+15]), smart NICs, active near-memory processing [AHY+15, AYMC15, SMB+15], intelligent HDD/SSD storage [WIA14], etc.). As opposed to treating them as devices with external drivers (as was done with GPGPUs [RCS+11]), we should have the operating system manage their computational capacities (i.e., the control plane) and export the device services directly to the applications (i.e., the data or compute plane). Finally, recent advances in operating systems also allow for fast core booting [ZGKR14], which can be used for more energy-efficient resource utilization, or even for handling the upcoming challenges of dark silicon [EBSA+11]. This is another aspect that we would like to explore as part of future work.
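As a sketch of what exposing such diversity could look like, the following C fragment models heterogeneous compute resources as capability descriptors that a policy engine could match against application requirements. The struct layout, the device list, and the matching rule are illustrative assumptions, not the representation used by the actual OS policy engine.

```c
/* Hypothetical descriptors for heterogeneous compute resources and a naive
 * requirement matcher. A real policy engine would reason over far richer
 * properties (topology, monitored utilization) and over multiple jobs at once. */
#include <stdbool.h>
#include <stdio.h>

enum dev_kind { DEV_CPU, DEV_XEON_PHI, DEV_FPGA, DEV_SMART_NIC };

struct compute_resource {
    enum dev_kind kind;
    int           cores;            /* parallel execution contexts            */
    int           mem_bw_gbps;      /* local memory bandwidth                 */
    bool          line_rate_filter; /* can filter/scan data at line rate      */
};

struct job_requirements {
    int  min_cores;
    bool needs_line_rate_filter;
};

/* return the first resource satisfying the requirements, or NULL */
static const struct compute_resource *
match(const struct compute_resource *res, int n, const struct job_requirements *req)
{
    for (int i = 0; i < n; i++)
        if (res[i].cores >= req->min_cores &&
            (!req->needs_line_rate_filter || res[i].line_rate_filter))
            return &res[i];
    return NULL;
}

int main(void)
{
    const struct compute_resource machine[] = {
        { DEV_CPU,       16, 60,  false },
        { DEV_XEON_PHI,  61, 300, false },
        { DEV_SMART_NIC,  4, 40,  true  },
    };
    struct job_requirements scan = { .min_cores = 1, .needs_line_rate_filter = true };

    const struct compute_resource *r = match(machine, 3, &scan);
    printf("scan placed on device kind %d\n", r ? (int)r->kind : -1);
    return 0;
}
```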

6.1.2 Supporting workloads beyond traditional analytics

In the previous chapter we motivated the need for supporting operational analytics. The proposed OS architecture (Badis) in Chapter 5 already provides the necessary foundation for such OS support – an adaptive control and compute plane separation which can adjust based on the workload requirements. Several research groups have recently been exploring the challenges of co-executing transactional and analytical workload mixes (e.g., HyPer [KN11], SAP HANA [Pla09], MemSQL [Mem16]).


A recent study by Psaroudakis et al. [PWM+15] has shown that some of these systems (i.e., HyPer and SAP HANA) struggle to provide predictable performance, and one important factor is resource interference between the threads operating on the two different workload types. We believe that two problems contribute to that behaviour. The first is the process model employed by the DBMS engine, i.e., how it maps incoming queries to internal worker threads, and how these are then mapped onto hardware contexts (and kernel threads). The second is the scheduling support and QoS guarantees it can receive from the underlying system stack (including the scheduler in the OS kernel, but also the schedulers of the various devices for disk and network I/O). Some of the techniques, mechanisms, and policies that we presented in this thesis can be applied within these systems and address a significant portion of the observed performance degradation due to poor resource management decisions. For example, the idea of spatial isolation when placing the DB worker threads onto cores can shield the transactional workload from the bandwidth-intensive queries of the analytical workload, as sketched below. Similarly, for predictable behaviour, the CPU resources can be allocated on a per-execution-engine basis rather than on a per-query/connection basis. Furthermore, if the engines were to use the adaptive Badis control/compute plane separation, the transactional component could benefit from thread-based scheduling with a short quantum on the control plane, while the analytical queries could use ptask-based scheduling on the Basslet runtime. It would be interesting to explore policies which adjust the control/compute plane separation and evaluate how that affects the performance of data processing engines serving such HTAP workloads.
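As a concrete example of the spatial isolation idea mentioned above, the following C sketch pins transactional and analytical worker threads to different NUMA nodes using libnuma. The worker bodies are placeholders, and the node assignment assumes a machine with at least two NUMA nodes.

```c
/* Minimal sketch of spatial isolation: OLTP workers stay on NUMA node 0,
 * analytical workers on NUMA node 1, so bandwidth-heavy scans do not share
 * sockets or local memory with latency-sensitive transactions.
 * Build with: gcc -pthread isolate.c -lnuma */
#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <stdio.h>

static void *oltp_worker(void *arg)
{
    (void)arg;
    numa_run_on_node(0);          /* transactional threads stay on node 0 */
    numa_set_preferred(0);        /* and allocate from node-0 memory      */
    /* ... process short transactions ... */
    return NULL;
}

static void *olap_worker(void *arg)
{
    (void)arg;
    numa_run_on_node(1);          /* analytical threads get node 1        */
    numa_set_preferred(1);
    /* ... run bandwidth-intensive scans and joins ... */
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }
    pthread_t t[2];
    pthread_create(&t[0], NULL, oltp_worker, NULL);
    pthread_create(&t[1], NULL, olap_worker, NULL);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    return 0;
}
```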

6.2 Beyond DB/OS co-design

We continue the discussion on how the ideas proposed in this thesis can have a broader impact beyond the interaction of databases and OSes. First, it would be interesting to explore other applications that may benefit from closer integration with the underlying operating system. Many modern machine learning and analytics workloads have similar properties to more traditional database systems. For instance, they often have a detailed understanding of their internal algorithms and data structures, have specific objectives they need to meet, and can benefit from a more efficient implementation on modern hardware platforms.

As such, many machine learning or graph processing applications can benefit from cross-layer optimizations across the system stack, including the runtime and the operating system. It would be interesting to explore how they can benefit from exposing information about their cost models, SLOs, or dataflow graphs to the OS policy engine, or from the Badis architecture when executing on heterogeneous computing platforms (e.g., co-processors like the Xeon Phi or external accelerators such as Google’s Tensor Processing Unit (TPU) [Goo16]). One limitation of the COD architecture is the assumption that the applications (in our case the database) are long-running and internally can have long- or short-running jobs. When an application consists of only a few (or a single) short jobs, this causes a large overhead. It would be interesting to explore how to schedule such applications in a COD-based OS.

Second, bridging the gap between application and system-state knowledge can also be beneficial for schedulers working at a larger scale, such as in data-center or cloud environments. Even though we have primarily focused on scheduling and resource management in a single operating system for multicore systems, some of the proposed techniques can be applied in a distributed environment. For example, allowing applications to specify cost models for their jobs can assist in mapping resource requirements to their service level objectives. Similarly, having information about the data-dependency graph as a DAG and the individual jobs’ properties can already result in more efficient resource scheduling (e.g., in Graphene [KR16]). It would be interesting to explore what additional knowledge can be provided by applications that can be leveraged by system schedulers at such a scale.

Finally, an interesting opportunity would be to leverage the customizability of the Badis compute plane to extend the current possibilities of widely used Linux containers (e.g., Docker [Mer14]). Currently, Linux containers bundle an application’s dependencies and provide performance isolation in a more lightweight manner than running in a VM by using three mechanisms: images for bundling the application with its dependencies, namespaces for software isolation, and Linux mechanisms that can be attached to cgroups (e.g., core-pinning, constraining memory allocation, and/or device I/O provisioning) for hardware resource isolation; a minimal example is sketched below. Current containers, however, cannot solve performance isolation problems related to shared resources (e.g., DRAM bandwidth). Further, as the Docker manager executes on top of a single Linux image, it does not provide customized OS interfaces, mechanisms, or policies for different applications’ requirements. Therefore, it would be interesting to explore the benefits of using the Docker container as a packaging mechanism on top of a customized OS stack.
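To illustrate the cgroup-based hardware isolation mentioned above, the following C sketch creates a group, restricts it to a subset of cores and a memory cap, and moves the calling process into it. It assumes cgroup v2 mounted at /sys/fs/cgroup with the cpuset and memory controllers enabled, root privileges, and a made-up group name; as discussed above, such limits still do not partition shared resources like DRAM bandwidth.

```c
/* Sketch of cgroup v2 based resource isolation for a container-like group.
 * The group name "db-analytics", the core range, and the memory cap are
 * illustrative values, not part of any standard configuration. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_file(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fputs(val, f);
    return fclose(f);
}

int main(void)
{
    const char *grp = "/sys/fs/cgroup/db-analytics";
    char path[256], pid[32];

    mkdir(grp, 0755);                               /* create the group (needs root) */

    snprintf(path, sizeof path, "%s/cpuset.cpus", grp);
    write_file(path, "8-15");                       /* pin members to cores 8-15     */

    snprintf(path, sizeof path, "%s/memory.max", grp);
    write_file(path, "8G");                         /* cap memory at 8 GiB           */

    snprintf(path, sizeof path, "%s/cgroup.procs", grp);
    snprintf(pid, sizeof pid, "%d", getpid());
    write_file(path, pid);                          /* move this process into it     */

    puts("process now constrained by the db-analytics cgroup");
    return 0;
}
```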

7 Conclusion

The interaction between operating systems and database engines has been a difficult systems problem for decades. Both try to control and manage the same resources but have very different goals. The tactic of ignoring each other, followed over the last decades, has worked because the homogeneity of the hardware has allowed databases to optimize against a reduced set of architectural specifications, and over-provisioning of resources (i.e., running a database on a single server) was not seen as a problem.

With the advent of multicore and virtualization, these premises have changed. Databases will often no longer run alone on a server, and the underlying hardware is becoming significantly more complex and heterogeneous. In fact, because of these changes, both databases and operating systems are revisiting their internal architectures to accommodate large-scale parallelism. Using COD as an example, we argue that the redesign effort on both sides must include the interface between the database and the OS.

This dissertation first addresses the knowledge gap problem with the COD architecture. COD introduces (1) an OS policy engine, which the operating system can use to reason about the underlying machine properties and the applications’ requirements, (2) a declarative interface between the database and the operating system that allows bi-directional knowledge flow, and (3) a resource profiler, which both the OS and the database can use to measure resource utilization.

In Chapter 2 we have shown how the CSCS storage engine can benefit from a close integration with the OS policy engine.

The storage engine pushes down information about its properties, including a cost model that estimates the scan runtime given the computational resources (i.e., cores), latency SLOs, and application-specific stored procedures. Using that information, the OS policy engine can assist in deployments on various server architectures and in noisy environments.

In Chapter 3 we have demonstrated the benefits of the COD architecture in the context of resource scheduling for the database execution engine. More concretely, we showed how to efficiently deploy complex query plans on modern multicore architectures without sacrificing performance or predictability for both throughput and latency. We achieved that by using the resource profiler to estimate the resource requirements of the relational operators, and by allowing the database execution engine to push down information about the data-dependency graph of the complex query plan.

In Chapter 4 we revisited the problem of finding the right balance between the degree of parallelism and multiprogramming in the context of modern analytical algorithms running on multicore machines. We showed that the problem is not trivial, as the performance of individual parallel jobs and of the workload mix is highly sensitive to load interactions, the chosen degree of parallelism, and the spatio-temporal resource allocation on different hardware architectures. With a series of experiments, we have shown that for concurrent workloads with parallel operators, using the NUMA node as the unit of allocation often achieves the desired optimization goal of maximizing throughput while at the same time guaranteeing the runtime of a job given a set of resources.

In Chapter 5 we presented Badis, an OS architecture that allows dynamic adjustment of the OS kernels and services based on the workload requirements. Badis’ control/compute plane separation enables complex applications to execute efficiently on modern machines by providing adaptive and customizable OS stacks. We also showed how to specialize a compute kernel for efficient scheduling of parallel analytical algorithms by implementing the Basslet kernel-integrated runtime scheduler for ptask program execution units. In our experiments, we achieved almost linear throughput scale-out and predictable runtime for heavy analytical workload mixes.

As an overall system design, integrating the policy engine as part of the control plane is a promising way to address the upcoming challenges of managing large (rack-scale, heterogeneous) machines while serving modern data processing workloads. The policy engine can then choose a suitable set of compute kernels and adjust the policies based on the workload requirements.

List of Tables

2.1 Message types supported by COD’s interface...... 25 2.2 Derived deployments for different SLAs and hardware platforms...... 33 2.3 Computation overhead of the OS policy engine...... 39 2.4 Policy engine computation cost for the stored procedures...... 42

3.1 Performance counter events used for deriving the resource activity vectors. 59 3.2 Performance of default vs. compressed deployment...... 76 3.3 Performance/Resources efficiency savings...... 77 3.4 Evaluating the design choices of algorithm phases...... 78

4.1 Relational workload characterized by instrumentation...... 106 4.2 Graph workload characterized by instrumentation...... 108 4.3 Effect of thread deployment on the runtime of parallel algorithms..... 112 4.4 Scheduling policies on heterogeneous WL...... 121 4.5 Time breakdown for heterogeneous workload mix...... 122

5.1 Runtime of parallel algorithms executing on Linux versus Basslet...... 158 5.2 Time to adjust...... 159


List of Figures

1.1 Overview of affected or new OS components...... 6

2.1 COD’s architecture...... 13 2.2 CSCS architecture...... 16 2.3 Architecture of the OS policy engine...... 22 2.4 Matrix of core to task allocation, including NUMA, cache and core affinity. 36 2.5 CSCS performance when deployed in a noisy system...... 38 2.6 COD’s adaptability to changes in the system...... 41

3.1 Query-centric vs. operator-centric execution models...... 52 3.2 Layout of the four-socket AMD Bulldozer system...... 54 3.3 Sketch of the solution – information flow within the deployment algorithm 56 3.4 Extensions to the COD system architecture...... 57 3.5 Overview of the deployment algorithm...... 62 3.6 Collapsing an operator pipeline into a compound operator...... 63 3.7 Thread to core mapping: comparing two deployment alternatives..... 68 3.8 TPC-W shared query plan – as generated for SharedDB...... 71 3.9 Understanding the derivation of RAVs, AMD MagnyCours, 20 GB dataset 73 3.10 RAV – impact of dataset size on CPU utilization...... 74 3.11 RAV – impact of HW architecture on CPU utilization...... 75


3.12 SharedDB’s throughput scalability with plan replication...... 80

4.1 Illustrating the three scheduling approaches for homogeneous workloads.. 95 4.2 Architecture of the Intel SandyBridge machine...... 102 4.3 Architecture of the AMD Bulldozer machine...... 103 4.4 Scalability of DB operators on tables with 1024 M tuples...... 105 4.5 Scalability of Graph algorithms on Twitter data...... 107 4.6 Effect of resource sharing on a co-scheduled pair of jobs...... 110 4.7 Scheduling approaches for heterogeneous workload mix on a four NUMA node machine, with eight cores per NUMA...... 113 4.8 Comparing scheduling approaches for relational DB operators...... 115 4.9 Comparing scheduling approaches for GreenMarl algorithms...... 118 4.10 AMD Bulldozer results...... 120

5.1 System throughput for executing concurrent pagerank jobs...... 133 5.2 Illustrating the Badis OS Architecture...... 136 5.3 Same interaction with the compute plane of Badis...... 138 5.4 Indirect cost of context switch vs. working set size...... 141 5.5 The architecture of an Intel SandyBridge processor with marked opportu- nities for resource sharing...... 143 5.6 Badis control plane APIs...... 147 5.7 Basslet runtime API...... 148 5.8 Measuring the slow-down of graph kernels when co-scheduling with another parallel algorithm...... 153 5.9 Throughput of concurrent PageRank execution on four different machines. 155 5.10 Comparing throughput scale-out using OpenMP vs. Linux+OpenMP vs. Basslet...... 157 5.11 Measuring the overhead of enqueuing a ptask in Badis...... 158

Bibliography

[A+15] K. Aingaran et al. “M7: Oracle’s Next-Generation Sparc Processor.” IEEE Micro, vol. 35, no. 2, 36–45, 2015.

[ABK+14] S. Angel, H. Ballani, T. Karagiannis, G. O’Shea, and E. Thereska. “End- to-end Performance Isolation Through Virtual Datacenters.” In Proceed- ings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI’14, pp. 233–248. 2014.

[ABLL91] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy. “Sched- uler Activations: Effective Kernel Support for the User-level Management of Parallelism.” In Proceedings of the Thirteenth ACM Symposium on Op- erating Systems Principles, SOSP ’91, pp. 95–109. 1991.

[ADADB+03] A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, N. C. Burnett, T. E. Denehy, T. J. Engle, H. S. Gunawi, J. A. Nugent, and F. I. Popovici. “Transform- ing Policies into Mechanisms with Infokernel.” In ACM Symposium on Operating System Principles, pp. 90–105. 2003.

[ADHW99] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. “DBMSs on a Modern Processor: Where Does Time Go?” In VLDB ’99, pp. 266–277. 1999.

[ADJ+10] S. Arumugam, A. Dobra, C. M. Jermaine, N. Pansare, and L. L. Perez. “The DataPath system: a data-centric analytic processing engine for large data warehouses.” In SIGMOD Conference, pp. 519–530. 2010.

[Adv13] Advanced Micro Devices, Inc. (AMD). BIOS and Kernel Developer’s Guide (BKDG) For AMD Family 10h Processors, 2013.


[AFK+09] D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. “FAWN: A Fast Array of Wimpy Nodes.” In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pp. 1–14. 2009.

[AHY+15] J. Ahn, S. Hong, S. Yoo, O. Mutlu, and K. Choi. “A Scalable Processing- in-memory Accelerator for Parallel Graph Processing.” In ISCA ’15, pp. 105–117. 2015.

[AKN12] M.-C. Albutiu, A. Kemper, and T. Neumann. “Massively parallel sort- merge joins in main memory multi-core database systems.” PVLDB ’12, vol. 5, no. 10, 1064–1075, 2012.

[AKSS12] G. Alonso, D. Kossmann, T. Salomie, and A. Schmidt. “Shared Scans on Main Memory Column Stores.” Tech. Rep. no. 769, Department of Computer Science, ETH Zürich, 2012.

[AL91] A. W. Appel and K. Li. “Virtual memory primitives for user programs.” In Proceedings of the fourth international conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-IV, pp. 96– 107. 1991.

[And12] Z. R. Anderson. “Efficiently combining parallel software using fine-grained, language-level, hierarchical resource management policies.” In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Pro- gramming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, pp. 717–736. 2012.

[App13] Apple. “Memory Usage Performance Guidelines.” https:// developer.apple.com/library/prerelease/content/documentation/ Performance/Conceptual/ManagingMemory/Articles/MemoryAlloc. html#//apple_ref/doc/uid/20001881-SW1, 2013.

[AW07] K. R. Apt and M. G. Wallace. Constraint Logic Programming using ECLiPSe. Cambridge University Press, 2007.

[AYMC15] J. Ahn, S. Yoo, O. Mutlu, and K. Choi. “PIM-enabled Instructions: A Low-overhead, Locality-aware Processing-in-memory Architecture.” In ISCA ’15, pp. 336–348. 2015.


[Bal14] C. Balkesen. “In-memory parallel join processing on multi-core processors.” Ph.D. thesis, ETH Zurich, 2014.

[bar16] “Barrelfish Operating System.”, 2016. www.barrelfish.org, accessed 2016-08-12.

[BATO13] C. Balkesen, G. Alonso, J. Teubner, and M. T. Ozsu.¨ “Multi-core, Main- memory Joins: Sort vs. Hash Revisited.” Proc. VLDB Endow., vol. 7, no. 1, 85–96, 2013.

[BBD+09] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. “The multikernel: a new OS architecture for scalable multicore systems.” In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP ’09, pp. 29–44. 2009.

[BCF+13] A. A. Bhattacharya, D. Culler, E. Friedman, A. Ghodsi, S. Shenker, and I. Stoica. “Hierarchical Scheduling for Diverse Datacenter Workloads.” In Proceedings of the 4th Annual Symposium on Cloud Computing, SOCC ’13, pp. 4:1–4:15. 2013.

[BD83] H. Boral and D. J. DeWitt. Database Machines: An Idea Whose Time has Passed? A Critique of the Future of Database Machines, pp. 166–187. Springer Berlin Heidelberg, Berlin, Heidelberg, 1983.

[BDF+03] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neuge- bauer, I. Pratt, and A. Warfield. “Xen and the Art of Virtualization.” In Proceedings of the 19th ACM Symposium on Operating Systems Principles, SOSP ’03, pp. 164–177. 2003.

[BDSK+08] M. Butrico, D. Da Silva, O. Krieger, M. Ostrowski, B. Rosenburg, D. Tsafrir, E. Van Hensbergen, R. W. Wisniewski, and J. Xenidis. “Spe- cialized Execution Environments.” SIGOPS Oper. Syst. Rev., vol. 42, no. 1, 106–107, 2008.

[BGC+13] A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift. “Efficient Virtual Memory for Big Memory Servers.” ISCA ’13, pp. 237–248. 2013.


[BHC12] S. K. Begley, Z. He, and Y.-P. P. Chen. “MCJoin: A Memory-constrained Join for Column-store Main-memory Databases.” In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD ’12, pp. 121–132. 2012.

[BHKL06] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. “Group Forma- tion in Large Social Networks: Membership, Growth, and Evolution.” In KDD, pp. 44–54. 2006.

[Bie11] C. Bienia. “Benchmarking Modern Multiprocessors.” Ph.D. thesis, Princeton University, 2011.

[BKM08] P. A. Boncz, M. L. Kersten, and S. Manegold. “Breaking the memory wall in MonetDB.” Commun. ACM, vol. 51, 77–85, 2008.

[Bla79] M. Blasgen, J. Gray, M. Mitoma, and T. Price. “The Convoy Phenomenon.” SIGOPS Oper. Syst. Rev., vol. 13, no. 2, 20–25, 1979.

[BLAK15] C. Barthels, S. Loesing, G. Alonso, and D. Kossmann. “Rack-Scale In- Memory Join Processing Using RDMA.” In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD ’15, pp. 1463–1475. 2015.

[BLP11] S. Blanas, Y. Li, and J. M. Patel. “Design and Evaluation of Main Mem- ory Hash Join Algorithms for Multi-core CPUs.” In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pp. 37–48. 2011.

[BLP+14] R. Barber, G. Lohman, I. Pandis, V. Raman, R. Sidle, G. Attaluri, N. Chainani, S. Lightstone, and D. Sharpe. “Memory-efficient Hash Joins.” Proc. VLDB Endow., vol. 8, no. 4, 353–364, 2014.

[BM10] M. Bhadauria and S. A. McKee. “An approach to resource-aware co- scheduling for CMPs.” In ICS ’10, pp. 189–199. 2010.

[BPA08] M. Banikazemi, D. Poff, and B. Abali. “PAM: a novel performance/power aware meta-scheduler for multi-core systems.” In SC ’08, pp. 39:1–39:12. 2008.


[BPK+14] A. Belay, G. Prekas, A. Klimovic, S. Grossman, C. Kozyrakis, and E. Bugnion. “IX: A Protected Dataplane Operating System for High Throughput and Low Latency.” OSDI’14, pp. 49–65. 2014.

[BSP+95] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers. “Extensibility safety and perfor- mance in the SPIN operating system.” In Proceedings of the fifteenth ACM Symposium on Operating Systems Principles, SOSP ’95, pp. 267–283. 1995.

[BTAO13] C. Balkesen, J. Teubner, G. Alonso, and M. T. Ozsu. “Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware.” ICDE ’13, vol. 0, 362–373, 2013.

[BWCM+10] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. “An analysis of Linux scalability to many cores.” In Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation, OSDI’10, pp. 1–8. 2010.

[BZFK10] S. Blagodurov, S. Zhuravlev, A. Fedorova, and A. Kamali. “A Case for NUMA-aware Contention Management on Multicore Systems.” In PACT ’10, pp. 557–558. 2010.

[BZN05] P. A. Boncz, M. Zukowski, and N. Nes. “MonetDB/X100: Hyper- Pipelining Query Execution.” In CIDR ’05, vol. 5, pp. 225–237. 2005.

[CCS+15] O. R. A. Chick, L. Carata, J. Snee, N. Balakrishnan, and R. Sohan. “Shadow Kernels: A General Mechanism For Kernel Specialization in Ex- isting Operating Systems.” In Proceedings of the 6th Asia-Pacific Workshop on Systems, APSys ’15, pp. 1:1–1:7. 2015.

[CEH+13] J. A. Colmenares, G. Eads, S. Hofmeyr, S. Bird, M. Moretó, D. Chou, B. Gluzman, E. Roman, D. B. Bartolini, N. Mor, K. Asanović, and J. D. Kubiatowicz. “Tessellation: Refactoring the OS Around Explicit Resource Containers with Continuous Adaptation.” In Proceedings of the 50th Annual Design Automation Conference, DAC ’13, pp. 76:1–76:10. 2013.

[CGJ97] E. G. Coffman, Jr., M. R. Garey, and D. S. Johnson. “Approximation algorithms for bin packing: a survey.” In D. S. Hochbaum, ed., Approximation Algorithms for NP-Hard Problems, pp. 46–93. 1997.


[CGK+07] S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilk- erson. “Scheduling threads for constructive cache sharing on CMPs.” In SPAA ’07, pp. 105–115. 2007.

[CGKS05] D. Chandra, F. Guo, S. Kim, and Y. Solihin. “Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture.” In HPCA ’05, pp. 340–351. 2005.

[CGS+05] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. “X10: An Object-oriented Approach to Non- uniform Cluster Computing.” In Proceedings of the 20th Annual ACM SIG- PLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’05, pp. 519–538. 2005.

[CHCF15] A. Collins, T. Harris, M. Cole, and C. Fensch. “LIRA: Adaptive Contention-Aware Thread Placement for Parallel Runtime Systems.” In ROSS, pp. 2:1–2:8. 2015.

[CI12] A. Costea and A. Ionescu. “Query Optimization and Execution in Vector- wise MPP.” Master’s thesis, Vrije Universiteit, Amsterdam, 2012.

[CJ06] S. Cho and L. Jin. “Managing Distributed, Shared L2 Caches Through OS-Level Page Allocation.” In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, pp. 455–468. 2006.

[CJMB11] C. Curino, E. P. Jones, S. Madden, and H. Balakrishnan. “Workload-aware Database Monitoring and Consolidation.” In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pp. 313–324. 2011.

[CJZM10] C. Curino, E. Jones, Y. Zhang, and S. Madden. “Schism: A Workload- driven Approach to Database Replication and Partitioning.” Proc. VLDB Endow., vol. 3, no. 1-2, 48–57, 2010.

[CPV09] G. Candea, N. Polyzotis, and R. Vingralek. “A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses.” Proceedings VLDB Endowment, vol. 2, no. 1, 277–288, 2009.


[CPV11] G. Candea, N. Polyzotis, and R. Vingralek. “Predictable performance and high query concurrency for data analytics.” VLDB Journal, vol. 20, no. 2, 227–248, 2011.

[CR07] J. Cieslewicz and K. A. Ross. “Adaptive Aggregation on Chip Multiproces- sors.” In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB ’07, pp. 339–350. VLDB Endowment, 2007.

[CRD+95] J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta. “Hive: Fault Containment for Shared-memory Multiproces- sors.” In Proceedings of the Fifteenth ACM Symposium on Operating Sys- tems Principles, SOSP ’95, pp. 12–25. 1995.

[CSCC15] R. Chen, J. Shi, Y. Chen, and H. Chen. “PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs.” In EuroSys, pp. 1:1– 1:15. 2015.

[DC08] P. J. Drongowski and B. D. Center. “Basic Performance Measurements for AMD Athlon 64, AMD Opteron and AMD Phenom Processors.” AMD whitepaper, vol. 25, 2008.

[DFF+13] M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize, B. Lepers, V. Quema, and M. Roth. “Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems.” In ASPLOS, pp. 381–394. 2013.

[DGT13] T. David, R. Guerraoui, and V. Trigonakis. “Everything you always wanted to know about synchronization but were afraid to ask.” In SOSP, pp. 33–48. 2013.

[DK14] C. Delimitrou and C. Kozyrakis. “Quasar: Resource-efficient and QoS- aware Cluster Management.” In Proceedings of the 19th International Con- ference on Architectural Support for Programming Languages and Operat- ing Systems, ASPLOS ’14, pp. 127–144. 2014.

[DK16] C. Delimitrou and C. Kozyrakis. “HCloud: Resource-Efficient Provision- ing in Shared Cloud Systems.” In Proceedings of the Twenty-First Inter- national Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’16, pp. 473–488. 2016.


[DKD+15] N. Dautenhahn, T. Kasampalis, W. Dietz, J. Criswell, and V. Adve. “Nested Kernel: An Operating System Architecture for Intra-Kernel Privi- lege Separation.” In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Sys- tems, ASPLOS ’15, pp. 191–206. 2015.

[DMB08] L. De Moura and N. Bjørner. “Z3: an efficient SMT solver.” In Proceed- ings of the Theory and Practice of Software, 14th international conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pp. 337–340. 2008.

[DMR+10] M. Diener, F. L. Madruga, E. R. Rodrigues, M. A. Z. Alves, J. Schneider, P. O. A. Navaux, and H.-U. Heiss. “Evaluating thread placement based on memory access patterns for multi-core processors.” In HPCC, pp. 491–496. 2010.

[DNCH14] A. Dragojevic, D. Narayanan, M. Castro, and O. Hodson. “FaRM: Fast Remote Memory.” In NSDI, pp. 401–414. 2014.

[DNLS13] S. Das, V. R. Narasayya, F. Li, and M. Syamala. “CPU Sharing Techniques for Performance Isolation in Multitenant Relational Database-as-a-Service.” PVLDB ’13, vol. 7, no. 1, 37–48, 2013.

[DNN+15] A. Dragojević, D. Narayanan, E. B. Nightingale, M. Renzelmann, A. Shamis, A. Badam, and M. Castro. “No Compromises: Distributed Transactions with Consistency, Availability, and Performance.” In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP ’15, pp. 54–70. 2015.

[Dós07] G. Dósa. “The tight bound of first fit decreasing bin-packing algorithm is FFD(I) ≤ 11/9 OPT(I) + 6/9.” In Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, pp. 1–11. Springer, 2007.

[DPZ97] P. Druschel, V. Pai, and W. Zwaenepoel. “Extensible Kernels are Leading OS Research Astray.” In Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI), HOTOS ’97, pp. 38–. 1997.

[DWDS13] T. Dey, W. Wang, J. W. Davidson, and M. L. Soffa. “ReSense: Map- ping Dynamic Workloads of Colocated Multithreaded Applications Using


Resource Sensitivity.” ACM Trans. Archit. Code Optim., vol. 10, no. 4, 41:1–41:25, 2013.

[EBSA+11] H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger. “Dark Silicon and the End of Multicore Scaling.” ISCA ’11, pp. 365–376. 2011.

[EDS+15] A. Elmore, J. Duggan, M. Stonebraker, M. Balazinska, U. Cetintemel, V. Gadepally, J. Heer, B. Howe, J. Kepner, T. Kraska, et al. “A Demon- stration of the BigDAWG Polystore System.” VLDB, vol. 8, no. 12, 2015.

[EE08] S. Eyerman and L. Eeckhout. “System-Level Performance Metrics for Mul- tiprogram Workloads.” IEEE Micro, vol. 28, no. 3, 42–53, 2008.

[Efe82] K. Efe. “Heuristic Models of Task Assignment Scheduling in Distributed Systems.” Computer, vol. 15, no. 6, 50–56, 1982.

[EHMZ+16] I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic, R. Achermann, P. Faraboschi, W.-m. Hwu, T. Roscoe, and K. Schwan. “SpaceJMP: Programming with Multiple Virtual Address Spaces.” ASPLOS ’16. 2016.

[EKO95] D. R. Engler, M. F. Kaashoek, and J. O’Toole, Jr. “Exokernel: an oper- ating system architecture for application-level resource management.” In Proceedings of the fifteenth ACM Symposium on Operating Systems Prin- ciples, SOSP ’95, pp. 251–266. 1995.

[FFPF05] E. Frachtenberg, D. G. Feitelson, F. Petrini, and J. Fernandez. “Adaptive Parallel Job Scheduling with Flexible Coscheduling.” IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 11, 1066–1077, 2005.

[FLR98] M. Frigo, C. E. Leiserson, and K. H. Randall. “The Implementation of the Cilk-5 multithreaded language.” In In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation. 1998.

[FML+12] F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, and J. Dees. “The SAP HANA Database – An Architecture Overview.” IEEE Data Eng. Bull., vol. 35, no. 1, 28–33, 2012.


[GAK12] G. Giannikis, G. Alonso, and D. Kossmann. “SharedDB: killing one thou- sand queries with one stone.” Proceedings VLDB Endowment, vol. 5, no. 6, 526–537, 2012.

[GARH14] J. Giceva, G. Alonso, T. Roscoe, and T. Harris. “Deployment of Query Plans on Multicores.” PVLDB, vol. 8, no. 3, 233–244, 2014.

[GG03] D. F. Garc´ıaand J. Garc´ıa. “TPC-W E-Commerce Benchmark Evalua- tion.” Computer, pp. 42–48, 2003.

[GGIW10] M. Giampapa, T. Gooding, T. Inglett, and R. W. Wisniewski. “Experi- ences with a Lightweight Kernel: Lessons Learned from Blue Gene’s CNK.” SC ’10, pp. 1–10. 2010.

[GHK92] S. Ganguly, W. Hasan, and R. Krishnamurthy. “Query Optimization for Parallel Execution.” In Proceedings of the 1992 ACM SIGMOD Interna- tional Conference on Management of Data, SIGMOD ’92, pp. 9–18. 1992.

[GI96] M. N. Garofalakis and Y. E. Ioannidis. “Multi-dimensional Resource Scheduling for Parallel Queries.” In Proceedings of the 1996 ACM SIG- MOD International Conference on Management of Data, SIGMOD ’96, pp. 365–376. 1996.

[GI97] M. N. Garofalakis and Y. E. Ioannidis. “Parallel Query Scheduling and Optimization with Time- and Space-Shared Resources.” In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB ’97, pp. 296–305. 1997.

[GKAS99] B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm. “Tornado: Maximising Locality and Concurrency in a Shared Memory Multiprocessor Operating System.” In USENIX Symposium on Operating Systems Design and Im- plementation, pp. 87–100. 1999.

[GMAK14] G. Giannikis, D. Makreshanski, G. Alonso, and D. Kossmann. “Shared Workload Optimization.” PVLDB ’14, vol. 7, no. 6, 2014.

[Goo16] Google. “Google supercharges machine learning tasks with TPU custom chip.”, 2016.


[Gra77] J. Gray. “Notes on Data Base Operating Systems.” In R. Bayer, R. M. Graham, and G. Seegmüller, eds., Operating Systems: An Advanced Course, pp. 393–481. Springer-Verlag, 1977.

[Gro13] The Open Group. “POSIX.1-2008 Specification, 2013 edition.” http://pubs.opengroup.org/onlinepubs/9699919799/, 2013.

[GSC+15] I. Gog, M. Schwarzkopf, N. Crooks, M. P. Grosvenor, A. Clement, and S. Hand. “Musketeer: All for One, One for All in Data Processing Sys- tems.” In Proceedings of the Tenth European Conference on Computer Systems, EuroSys ’15, pp. 2:1–2:16. 2015.

[GSS+13] J. Giceva, T.-I. Salomie, A. Schüpbach, G. Alonso, and T. Roscoe. “COD: Database/Operating System Co-Design.” In CIDR ’13. 2013.

[GTI+15] B. Gerofi, M. Takagi, Y. Ishikawa, R. Riesen, E. Powers, and R. W. Wisniewski. “Exploring the Design Space of Combining Linux with Lightweight Kernels for Extreme Scale Computing.” ROSS ’15, pp. 5:1–5:8. 2015.

[GVK+14] G. Graefe, H. Volos, H. Kimura, H. Kuno, J. Tucek, M. Lillibridge, and A. Veitch. “In-memory Performance for Big Data.” PVLDB ’14, pp. 37–48, 2014.

[GXD+14] J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. “GraphX: Graph Processing in a Distributed Dataflow Frame- work.” In OSDI, pp. 599–613. 2014.

[GZA+15] S. Gerber, G. Zellweger, R. Achermann, K. Kourtis, T. Roscoe, and D. Milojicic. “Not Your Parents’ Physical Address Space.” In HotOS’15. 2015.

[GZAR16] J. Giceva, G. Zellweger, G. Alonso, and T. Roscoe. “Customized OS Support for Data-processing.” In Proceedings of the 12th International Workshop on Data Management on New Hardware, DaMoN ’16, pp. 2:1–2:6. 2016.

[GZH+11] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Sto- ica. “Dominant Resource Fairness: Fair Allocation of Multiple Resource


Types.” In Proceedings of the 8th USENIX Conference on Networked Sys- tems Design and Implementation, NSDI’11, pp. 323–336. 2011.

[HA05] S. Harizopoulos and A. Ailamaki. “StagedDB: Designing Database Servers for Modern Hardware.” IEEE Data Eng. Bull., vol. 28, no. 2, 11–16, 2005.

[Han99] S. M. Hand. “Self-paging in the Nemesis operating system.” In Proceedings of the third Symposium on Operating Systems Design and Implementation, OSDI ’99, pp. 73–86. 1999.

[HCSO12] S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. “Green-Marl: A DSL for Easy and Efficient Graph Analysis.” In ASPLOS, pp. 349–362. 2012.

[Heo15] T. Heo. “Control Group v2.” https://www.kernel.org/doc/ Documentation/cgroup-v2.txt, 2015.

[HKL+08] W.-S. Han, W. Kwak, J. Lee, G. M. Lohman, and V. Markl. “Parallelizing query optimization.” Proceedings VLDB Endowment, vol. 1, 188–200, 2008.

[HKZ+11] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. “Mesos: A Platform for Fine-grained Resource Sharing in the Data Center.” In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI’11, pp. 295–308. 2011.

[HLP+13] W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, and H. Yu. “Tur- boGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC.” In KDD, pp. 77–85. 2013.

[HMM14] T. Harris, M. Maas, and V. J. Marathe. “Callisto: Co-scheduling Parallel Runtime Systems.” In Proceedings of the Ninth European Conference on Computer Systems, EuroSys ’14, pp. 24:1–24:14. 2014.

[HS87] D. S. Hochbaum and D. B. Shmoys. “Using Dual Approximation Algo- rithms for Scheduling Problems Theoretical and Practical Results.” J. ACM, vol. 34, no. 1, 144–162, 1987.

[HSA05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: a simulta- neously pipelined relational query engine.” In SIGMOD ’05, pp. 383–394. 2005.


[HSH07] J. M. Hellerstein, M. Stonebraker, and J. Hamilton. “Architecture of a Database System.” Found. Trends databases, vol. 1, no. 2, 141–259, 2007.

[Hsi79] D. K. Hsiao. “Data Base Machines Are Coming, Data Base Machines Are Coming!” Computer, vol. 12, no. 3, 7–9, 1979.

[HSL10] T. Hoefler, T. Schneider, and A. Lumsdaine. “Characterizing the Influence of System Noise on Large-Scale Applications by Simulation.” SC ’10, pp. 1–11. 2010.

[Inc16] Free Software Foundation, Inc. “libgomp: proc.c gomp_dynamic_max_threads().” https://github.com/gcc-mirror/gcc/blob/edd716b6b1caa1a5cb320a8cd7f626f30198e098/libgomp/config/ /proc.c#L55, 2016.

[Int08] Intel Corporation. Intel 64 and IA-32 Architectures Optimization Refer- ence Manual, 2008.

[Int13] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes 3A, 3B, and 3C: System Programming Guide, 2013.

[Jon15] J. Corbet. “Thread-level management in control groups.”, 2015. https://lwn.net/Articles/656115/, accessed 2016-08-12.

[JPH+09] R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi. “Shore- MT: a scalable storage manager for the multicore era.” In EDBT, pp. 24–35. 2009.

[KARH15] S. Kaestle, R. Achermann, T. Roscoe, and T. Harris. “Shoal: Smart Allo- cation and Replication of Memory for Parallel Programs.” In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’15, pp. 263–276. 2015.

[KB05] S. M. Kelly and R. Brightwell. “ of the light weight kernel, Catamount.” In In Cray User Group, pp. 16–19. 2005.

[KBG12] A. Kyrola, G. Blelloch, and C. Guestrin. “GraphChi: Large-scale Graph Computation on Just a PC.” In OSDI, pp. 31–46. 2012.


[KBH+08] R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn. “Using OS Obser- vations to Improve Performance in Multicore Systems.” Micro ’08, vol. 28, no. 3, 54–66, 2008.

[Kie16] T. Kiefer. “Allocation Strategies for Data-Oriented Architectures.” Ph.D. thesis, Dresden University of Technology, 2016.

[Kim15] H. Kimura. “FOEDUS: OLTP Engine for a Thousand Cores and NVRAM.” SIGMOD ’15, pp. 691–706. 2015.

[KKA14] A. Kalia, M. Kaminsky, and D. G. Andersen. “Using RDMA efficiently for key-value services.” In SIGCOMM, pp. 295–306. 2014.

[KKL+09] C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, A. D. Nguyen, N. Satish, J. Chhugani, A. Di Blas, and P. Dubey. “Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs.” PVLDB ’09, vol. 2, no. 2, 1378–1389, 2009.

[Kle05] A. Kleen. “A NUMA API for LINUX.” Tech. rep., 2005.

[KLPM10] H. Kwak, C. Lee, H. Park, and S. Moon. “What is Twitter, a social network or a media?” In WWW, pp. 591–600. 2010.

[KM77] L. T. Kou and G. Markowsky. “Multidimensional Bin Packing Algorithms.” IBM Journal of Research and Development, vol. 21, no. 5, 443–448, 1977.

[KN11] A. Kemper and T. Neumann. “HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots.” In ICDE, pp. 195–206. 2011.

[KR16] K. Keeton and T. Roscoe, eds. 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016. USENIX Association, 2016.

[KS09] S. Khuller and B. Saha. “On Finding Dense Subgraphs.” In Automata, Languages and Programming, vol. 5555 of Lecture Notes in Computer Sci- ence, pp. 597–608. Springer Berlin Heidelberg, 2009.


[LBKN14] V. Leis, P. Boncz, A. Kemper, and T. Neumann. “Morsel-driven Paral- lelism: A NUMA-aware Query Evaluation Framework for the Many-core Age.” In SIGMOD ’14, pp. 743–754. 2014.

[LCG+15] D. Lo, L. Cheng, R. Govindaraju, P. Ranganathan, and C. Kozyrakis. “Heracles: Improving Resource Efficiency at Scale.” In Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, pp. 450–462. 2015.

[LDC+09] R. Lee, X. Ding, F. Chen, Q. Lu, and X. Zhang. “MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases.” Proceedings of VLDB Endowment, vol. 2, no. 1, 373–384, 2009.

[LDS07] C. Li, C. Ding, and K. Shen. “Quantifying the Cost of Context Switch.” In Proceedings of the 2007 Workshop on Experimental Computer Science, ExpCS ’07. 2007.

[LFC+12] I. Lebedev, C. Fletcher, S. Cheng, J. Martin, A. Doupnik, D. Burke, M. Lin, and J. Wawrzynek. “Exploring Many-core Design Templates for FPGAs and ASICs.” Int. J. Reconfig. Comput., vol. 2012, 8:8–8:8, 2012.

[LGJ01] K. Luo, J. Gummaraju, and M. Franklin. “Balancing throughput and fairness in SMT processors.” In Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS ’01, pp. 164–171. 2001.

[LK14] J. Leskovec and A. Krevl. “SNAP Datasets: Stanford Large Network Dataset Collection.” http://snap.stanford.edu/data, 2014.

[LKB+09] R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. “Tessellation: Space-Time Partitioning in a Manycore Client OS.” In USENIX Workshop on Hot Topics in Parallelism. 2009.

[LLD+08] J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. “Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems.” In HPCA, pp. 367–378. 2008.


[LLF+16] J. Lozi, B. Lepers, J. R. Funston, F. Gaud, V. Quéma, and A. Fedorova. “The Linux scheduler: a decade of wasted cores.” In EuroSys ’16, p. 1. 2016.

[LLMZ11] J. J. Levandoski, D. Lomet, M. F. Mokbel, and K. K. Zhao. “Deuteronomy: Transaction Support for Cloud Data.” In CIDR. 2011.

[LLS+15] J. Levandoski, D. Lomet, S. Sengupta, R. Stutsman, and R. Wang. “High Performance Transactions in Deuteronomy.” CIDR 2015, 2015.

[Loe15] S. Loesing. “Architectures for elastic database services.” Ph.D. thesis, ETH Zurich, 2015.

[LPM+13] Y. Li, I. Pandis, R. Müller, V. Raman, and G. M. Lohman. “NUMA-aware algorithms: the case of data shuffling.” In CIDR ’13. 2013.

[MAN05] R. L. McGregor, C. D. Antonopoulos, and D. S. Nikolopoulos. “Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors.” In Proceedings of the 19th IEEE International Parallel and Distributed Pro- cessing Symposium (IPDPS’05) - Papers - Volume 01, IPDPS ’05, pp. 28.1–. 2005.

[MBK00] S. Manegold, P. A. Boncz, and M. L. Kersten. “Optimizing database architecture for the new bottleneck: memory access.” PVLDB ’00, vol. 9, no. 3, 231–246, 2000.

[MBK02] S. Manegold, P. Boncz, and M. L. Kersten. “Generic Database Cost Models for Hierarchical Memory Systems.” In Proceedings of the 28th International Conference on Very Large Data Bases, VLDB ’02, pp. 191–202. 2002.

[McC95] J. D. McCalpin. “Memory Bandwidth and Machine Balance in Current High Performance Computers.” TCCA ’95, pp. 19–25, 1995.

[MCM13] B. Mozafari, C. Curino, and S. Madden. “DBSeer: Resource and Perfor- mance Prediction for Building a Next Generation Database Cloud.” In CIDR ’13. 2013.

[MCT77] K. Maruyama, S. K. Chang, and D. T. Tang. “A general packing algo- rithm for multidimensional resource requirements.” International Journal of Computer & Information Sciences, vol. 6, no. 2, 131–149, 1977.


[Mem16] MemSQL. “MemSQL – distributed In-memory Database.” www.memsql. com, 2016.

[Mer14] D. Merkel. “Docker: Lightweight Linux Containers for Consistent Devel- opment and Deployment.” Linux J., vol. 2014, no. 239, 2014.

[MGAK16] D. Makreshanski, G. Giannikis, G. Alonso, and D. Kossmann. “MQJoin: Efficient Shared Execution of Main-memory Joins.” Proc. VLDB Endow., vol. 9, no. 6, 480–491, 2016.

[Mic10] Microsoft. “Windows API Reference.” https://msdn.microsoft.com/ en-us/library/aa383749(v=vs.85).aspx, 2010.

[Mic16] Microsoft. “User-Mode Scheduling in Windows.” https://msdn. microsoft.com/en-us/library/windows/desktop/dd627187(v=vs.85) .aspx, 2016.

[MMI+13] D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. “Naiad: A Timely Dataflow System.” In SOSP, pp. 439–455. 2013.

[MMR+13] A. Madhavapeddy, R. Mortier, C. Rotsos, D. Scott, B. Singh, T. Gazag- naire, S. Smith, S. Hand, and J. Crowcroft. “Unikernels: Library Operat- ing Systems for the Cloud.” In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Op- erating Systems, ASPLOS ’13, pp. 461–472. 2013.

[MSB10] A. Merkel, J. Stoess, and F. Bellosa. “Resource-conscious scheduling for energy efficiency on multicore processors.” In EuroSys ’10, pp. 153–166. 2010.

[MSLM91] B. D. Marsh, M. L. Scott, T. J. LeBlanc, and E. P. Markatos. “First-class User-level Threads.” In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, SOSP ’91, pp. 110–121. 1991.

[Mue16] I. Mueller. “Engineering Aggregation Operators for Relational In-memory Database Systems.” Ph.D. thesis, Karlsruhe Institute of Technology (KIT), 2016.


[MVHS10] J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. “Contention Aware Execution: Online Contention Detection and Response.” In CGO, pp. 257–265. 2010.

[NHM+09] E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. “Helios: heterogeneous with satellite kernels.” In Proceed- ings of the ACM SIGOPS 22nd symposium on Operating systems princi- ples, SOSP ’09, pp. 221–234. 2009.

[NKG10] R. Nathuji, A. Kansal, and A. Ghaffarkhah. “Q-clouds: Managing Perfor- mance Interference Effects for QoS-aware Clouds.” In EuroSys, pp. 237– 250. 2010.

[Ope15] OpenMP Architecture Review Board. “OpenMP Application Program Interface Version 4.5.”, 2015.

[Ora09] Oracle Inc. Programming Interfaces Guide. 2009.

[Ora15] Oracle White Paper. “Oracle GoldenGate 12c: Real-Time Access to Real- Time Information.” Tech. rep., Oracle, 2015.

[Ous82] J. Ousterhout. “Scheduling Techniques for Concurrent Systems.” IEEE Distributed Computer Systems, 1982.

[OWZS13] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. “Sparrow: Dis- tributed, Low Latency Scheduling.” In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pp. 69–84. 2013.

[PAA13] I. Psaroudakis, M. Athanassoulis, and A. Ailamaki. “Sharing Data and Work Across Concurrent Analytical Queries.” PVLDB ’13, vol. 6, no. 9, 637–648, 2013.

[Pan15] P. Garefalakis. “Bridging the Gap between Serving and Analytics in Scalable Web Applications.” Master’s thesis, Imperial College London, Department of Computing, London, UK, 2015.

[Pap14] Oracle White Paper. “Parallel Execution with Oracle Database 12c Fundamentals.” Tech. rep., Oracle, 2014.


[PBAR11] S. Peter, A. Baumann, Z. Anderson, and T. Roscoe. “Gang scheduling isn’t worth it ... yet.” Tech. Rep. 745, Department of Computer Science, ETH Zurich, 2011.

[PBWH+11] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt. “Rethinking the Library OS from the Top Down.” In Proceedings of the Sixteenth International Conference on Architectural Support for Program- ming Languages and Operating Systems, ASPLOS XVI, pp. 291–304. 2011.

[Pet12] S. Peter. “Resource Management in a Multicore Operating System.” Ph.D. thesis, ETH Zurich, 2012.

[PHA09] H. Pan, B. Hindman, and K. Asanović. “Lithe: Enabling Efficient Composition of Parallel Libraries.” In Proceedings of the First USENIX Conference on Hot Topics in Parallelism, HotPar ’09, pp. 11–11. 2009.

[Phi14] S. Phillips. “M7: Next Generation SPARC.” Presented at Hot Chips (HC 26): A symposium on High Performance Chips, August, 2014.

[Pla09] H. Plattner. “A Common Database Approach for OLTP and OLAP Using an In-memory Column Database.” In SIGMOD, pp. 1–2. 2009.

[PLTA14] D. Porobic, E. Liarou, P. Tozun, and A. Ailamaki. “ATraPos: Adaptive transaction processing on hardware Islands.” In ICDE ’14, pp. 688–699. 2014.

[PLZ+14] S. Peter, J. Li, I. Zhang, D. R. K. Ports, D. Woos, A. Krishnamurthy, T. Anderson, and T. Roscoe. “Arrakis: The Operating System is the Con- trol Plane.” In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI’14, pp. 1–16. 2014.

[PPB+12] D. Porobic, I. Pandis, M. Branco, P. Tozun, and A. Ailamaki. “OLTP on Hardware Islands.” PVLDB ’12, vol. 5, no. 11, 1447–1458, 2012.

[PR14] O. Polychroniou and K. A. Ross. “A Comprehensive Study of Main- memory Partitioning and Its Application to Large-scale Comparison- and Radix-sort.” In Proceedings of the 2014 ACM SIGMOD International Con- ference on Management of Data, SIGMOD ’14, pp. 755–766. 2014.


[PSB+10] S. Peter, A. Schuepbach, P. Barham, A. Baumann, R. Isaacs, T. Harris, and T. Roscoe. “Design Principles for End-to-End Multicore Schedulers.” In Proceedings of the 2nd Usenix Workshop on Hot Topics in Parallelism (HotPar-10). Berkeley, CA, USA, 2010.

[PSB+15] Y. Perez, R. Sosič, A. Banerjee, R. Puttagunta, M. Raison, P. Shah, and J. Leskovec. “Ringo: Interactive Graph Analytics on Big-Memory Machines.” In SIGMOD, pp. 1105–1110. 2015.

[PSK15] S. Panneerselvam, M. Swift, and N. S. Kim. “Bolt: Faster Reconfiguration in Operating Systems.” In 2015 USENIX Annual Technical Conference (USENIX ATC 15), pp. 511–516. USENIX Association, Santa Clara, CA, 2015.

[PSM+15] I. Psaroudakis, T. Scheuer, N. May, A. Sellami, and A. Ailamaki. “Scal- ing Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement.” PVLDB, vol. 8, no. 12, 1442– 1453, 2015.

[PSMA13] I. Psaroudakis, T. Scheuer, N. May, and A. Ailamaki. “Task Scheduling for Highly Concurrent Analytical and Transactional Main-Memory Work- loads.” In ADMS, pp. 36–45. 2013.

[PVHH+12] Y. Park, E. Van Hensbergen, M. Hillenbrand, T. Inglett, B. Rosenburg, K. D. Ryu, and R. W. Wisniewski. “FusedOS: Fusing LWK Performance with FWK Functionality in a Heterogeneous Environment.” In Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD ’12, pp. 211–218. IEEE Computer Society, Washington, DC, USA, 2012.

[PWM+15] I. Psaroudakis, F. Wolf, N. May, T. Neumann, A. Böhm, A. Ailamaki, and K.-U. Sattler. “Scaling Up Mixed Workloads: A Battle of Data Freshness, Flexibility, and Scheduling.” In R. Nambiar and M. Poess, eds., Performance Characterization and Benchmarking. Traditional to Big Data, vol. 8904 of Lecture Notes in Computer Science, pp. 97–112. Springer International Publishing, 2015.


[Ram16] R. Rosen. “Understanding the new control groups API.”, 2016. https://lwn.net/Articles/679786/, accessed 2016-08-12.

[RCS+11] C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. “PTask: Operating System Abstractions to Manage GPUs As Compute Devices.” In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pp. 233–248. 2011.

[RMG+15] R. Riesen, A. B. Maccabe, B. Gerofi, D. N. Lombard, J. J. Lange, K. Pedretti, K. Ferreira, M. Lang, P. Keppel, R. W. Wisniewski, R. Brightwell, T. Inglett, Y. Park, and Y. Ishikawa. “What is a Lightweight Kernel?” ROSS ’15, pp. 9:1–9:8. 2015.

[RMKN15] W. Rödiger, T. Mühlbauer, A. Kemper, and T. Neumann. “High-speed Query Processing over High-speed Networks.” Proc. VLDB Endow., vol. 9, no. 4, 228–239, 2015.

[RS09] M. Russinovich and D. A. Solomon. Windows Internals: Including Windows Server 2008 and Windows Vista, Fifth Edition. Microsoft Press, 5th ed., 2009.

[RSQ+08] V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. “Constant-Time Query Processing.” In ICDE, pp. 60–69. 2008.

[RTG14] P. Roy, J. Teubner, and R. Gemulla. “Low-latency Handshake Join.” Proc. VLDB Endow., vol. 7, no. 9, 709–720, 2014.

[SAB+05] M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Fer- reira, E. Lau, A. Lin, S. Madden, E. O’Neil, P. O’Neil, A. Rasin, N. Tran, and S. Zdonik. “C-store: a column-oriented DBMS.” In Proceedings of the 31st international conference on Very large data bases, VLDB ’05, pp. 553–564. 2005.

[SBZ12] M. Switakowski, P. A. Boncz, and M. Zukowski. “From Cooperative Scans to Predictive Buffer Management.” CoRR, vol. abs/1208.4170, 2012.

[Sch12] A. Schuepbach. “Tackling OS Complexity with Declarative Techniques.” Ph.D. thesis, ETH Zurich, 2012.


[Sel88] T. K. Sellis. “Multiple-query optimization.” ACM Trans. Database Syst., vol. 13, 23–52, 1988.

[SESS96] M. I. Seltzer, Y. Endo, C. Small, and K. A. Smith. “Dealing with disas- ter: surviving misbehaved kernel extensions.” SIGOPS Operating Systems Review, vol. 30, no. SI, 213–227, 1996.

[SGT+14] T. Shimosawa, B. Gerofi, M. Takagi, G. Nakamura, T. Shirasawa, Y. Saeki, M. Shimizu, A. Hori, and Y. Ishikawa. “Interface for heterogeneous ker- nels: A framework to enable hybrid OS designs targeting high performance computing on manycore architectures.” In HiPC’14, pp. 1–10. 2014.

[SKAEMW13] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. “Omega: Flexible, Scalable Schedulers for Large Compute Clusters.” In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pp. 351–364. 2013.

[SKC+10] N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey. “Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.” In SIGMOD ’10, pp. 351–362. 2010.

[SMB+15] V. Seshadri, T. Mullins, A. Boroumand, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry. “Gather-scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses.” In MICRO’15, pp. 267–280. 2015.

[SPB+08] A. Schüpbach, S. Peter, A. Baumann, T. Roscoe, P. Barham, T. Harris, and R. Isaacs. “Embracing diversity in the Barrelfish manycore operating system.” In Proceedings of the Workshop on Managed Many-Core Systems, MMCS ’08. 2008.

[SS10] L. Soares and M. Stumm. “FlexSC: Flexible System Call Scheduling with Exception-less System Calls.” In Proceedings of the 9th USENIX Con- ference on Operating Systems Design and Implementation, OSDI’10, pp. 33–46. 2010.

[SSGA11] T.-I. Salomie, I. E. Subasu, J. Giceva, and G. Alonso. “Database Engines on Multicores, Why Parallelize when You Can Distribute?” In Proceedings


of the Sixth Conference on Computer Systems, EuroSys ’11, pp. 17–30. 2011.

[ST00] A. Snavely and D. M. Tullsen. “Symbiotic Jobscheduling for a Simultane- ous Multithreaded Processor.” In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Op- erating Systems, ASPLOS IX, pp. 234–244. 2000.

[Sto81] M. Stonebraker. “Operating System Support for Database Management.” Commun. ACM, vol. 24, no. 7, 412–418, 1981.

[Sun02] Sun Microsystems. “Multithreading in the Solaris Operating Environment.” http://home.mit.bme.hu/~meszaros/edu/oprendszerek/segedlet/unix/2_folyamatok_es_utemezes/solaris_multithread. , 2002.

[TMV+11] L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. “The Im- pact of Memory Subsystem Resource Sharing on Datacenter Applications.” In ISCA, pp. 283–294. 2011.

[Tor16] L. Torvalds. “Linux 4.6-rc7 scheduler.” https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/fair.c?id=/tags/v4.6-rc7#n50, 2016.

[TTH16] B. Teabe, A. Tchana, and D. Hagimont. “Application-specific Quantum for Multi-core Platform Scheduler.” In EuroSys, pp. 3:1–3:14. 2016.

[UGA+09] P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. “Predictable performance for unpredictable workloads.” Proc. VLDB Endow., vol. 2, 706–717, 2009.

[WA09] D. Wentzlaff and A. Agarwal. “Factored operating systems (fos): the case for a scalable operating system for multicores.” SIGOPS Operating Systems Review, vol. 43, no. 2, 76–85, 2009.

[WDA+08] Y. Weinsberg, D. Dolev, T. Anker, M. Ben-Yehuda, and P. Wyckoff. “Tapping into the fountain of CPUs: on operating system support for programmable devices.” In International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 179–188. 2008.

[WGB+10] D. Wentzlaff, C. Gruenwald III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, and A. Agarwal. “An Operating System for Multicore and Clouds: Mechanisms and Implementation.” In ACM Symposium on Cloud Computing (SOCC). 2010.

[WIA14] L. Woods, Z. István, and G. Alonso. “Ibex: An Intelligent Storage Engine with Support for Advanced SQL Offloading.” PVLDB, vol. 7, no. 11, 963–974, 2014.

[WIK+14] R. W. Wisniewski, T. Inglett, P. Keppel, R. Murty, and R. Riesen. “mOS: An Architecture for Extreme-scale Operating Systems.” In ROSS ’14, pp. 2:1–2:8. 2014.

[WLP+14] L. Wu, A. Lottarini, T. K. Paine, M. A. Kim, and K. A. Ross. “Q100: The Architecture and Design of a Database Processing Unit.” In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 255–268. 2014.

[WRA05] E. Witchel, J. Rhee, and K. Asanović. “Mondrix: Memory Isolation for Linux Using Mondriaan Memory Protection.” In Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, SOSP ’05, pp. 31–44. 2005.

[WS11] J. Wassenberg and P. Sanders. “Engineering a multi-core radix sort.” In Euro-Par 2011 Parallel Processing, pp. 160–169. Springer, 2011.

[WWP09] S. Williams, A. Waterman, and D. Patterson. “Roofline: an insightful visual performance model for multicore architectures.” Commun. ACM, vol. 52, no. 4, 65–76, 2009.

[YRV11] Y. Ye, K. A. Ross, and N. Vesdapunt. “Scalable Aggregation on Multicore Processors.” In Proceedings of the Seventh International Workshop on Data Management on New Hardware, DaMoN ’11, pp. 1–9. 2011.

[YSC+09] Q. Yin, A. Schüpbach, J. Cappos, A. Baumann, and T. Roscoe. “Rhizoma: A Runtime for Self-deploying, Self-managing Overlays.” In Proceedings of the ACM/IFIP/USENIX 10th International Conference on Middleware, Middleware’09, pp. 184–204. 2009.

[ZBF10] S. Zhuravlev, S. Blagodurov, and A. Fedorova. “Addressing shared resource contention in multicore processors via scheduling.” In ASPLOS XV ’10, pp. 129–142. 2010.

[ZGKR14] G. Zellweger, S. Gerber, K. Kourtis, and T. Roscoe. “Decoupling Cores, Kernels, and Operating Systems.” In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 17–31. Broomfield, CO, 2014.

[ZHNB07] M. Zukowski, S. Héman, N. Nes, and P. Boncz. “Cooperative scans: dynamic bandwidth sharing in a DBMS.” In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pp. 723–734. 2007.

[ZR04] J. Zhou and K. A. Ross. “Buffering database operations for enhanced instruction cache performance.” In SIGMOD ’04, pp. 191–202. 2004.

[ZR14] C. Zhang and C. Re. “DimmWitted: A Study of Main-Memory Statistical Analytics.” PVLDB, vol. 7, no. 12, 1283–1294, 2014.

[ZSB+12] S. Zhuravlev, J. C. Saez, S. Blagodurov, A. Fedorova, and M. Prieto. “Survey of scheduling techniques for addressing shared resources in multicore processors.” ACM Comput. Surv., vol. 45, no. 1, 4:1–4:28, 2012.
