Database/Operating System Co-Design
DISS. ETH NO. 24063

A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZURICH
(Dr. sc. ETH Zürich)

presented by
JANA GIČEVA
MSc in Computer Science, ETH Zürich
born on 14.12.1987
citizen of Macedonia

accepted on the recommendation of
Prof. Dr. Gustavo Alonso (ETH Zürich), examiner
Prof. Dr. Timothy Roscoe (ETH Zürich), co-examiner
Dr. Timothy L. Harris (Oracle Labs, Cambridge, UK), co-examiner
Dr. Kimberly K. Keeton (Hewlett Packard Laboratories), co-examiner

2016

Abstract

For decades, database engines have found the generic interfaces offered by conventional operating systems at odds with the need for efficient utilization of hardware resources. This is partly due to the large semantic gap between the two layers. The rigid DB/OS interface does not allow knowledge to flow between them, and as a result: (1) the operating system is unaware of the database requirements, and provides a set of general-purpose policies and mechanisms for all applications running on top; (2) the database can, at best, duplicate much of the OS functionality internally, at the cost of absorbing a significant portion of additional complexity in order to use the underlying hardware efficiently, an approach that does not scale with the current pace of hardware developments.

In this dissertation, I approach the problem from two perspectives. First, I reduce the knowledge gap between the database and the operating system by introducing an OS policy engine and a declarative interface between the two layers. I show how such extensions allow easier deployment on different machines, more robust execution in noisy environments, and close to optimal resource allocation without sacrificing performance or tail latencies. Second, I propose using an OS architecture which allows dynamic splitting of the machine resources into a control and compute plane.
I show how a compute plane kernel can be tailored to the needs of data processing applications by integrating a kernel-based runtime for efficient execution of concurrent parallel analytical jobs. I also address a modern challenge of database optimizers regarding the balance of concurrency and parallelism, and the influence that modern multicore machines have on the problem.

I conclude by discussing future research directions which arise from the work presented in this dissertation, and highlighting the potential of cross-layer optimizations on the system stack in the light of increasingly heterogeneous hardware platforms and modern workload requirements.

Zusammenfassung (Summary)

For decades, the generic interfaces provided by conventional operating systems have been in conflict with the need of database systems to use hardware resources efficiently. Part of the problem stems from the large semantic gap between the two layers. The rigid DB/OS interface does not allow information to flow between these layers. As a result, (1) the operating system has no knowledge of the database's requirements and therefore implements only generic policies and mechanisms for all applications running on it, and (2) the database, in the best case, duplicates much of the OS functionality internally, at the cost of a significant amount of additional complexity, solely in order to use the available hardware efficiently, an approach that does not scale with the rapid pace at which hardware changes today.

In this dissertation, I approach the problem from two sides. First, I reduce the knowledge gap between the database and the operating system by means of an OS policy engine and a declarative interface between the two layers. I show how this extension enables easier deployment on different machines, more robust execution in noisy systems, and near-optimal resource allocation without sacrificing execution speed or tail latencies.

Furthermore, I propose an OS architecture that allows a machine to be dynamically partitioned into a control plane and a compute plane. I show how a kernel in the compute plane can be specialized to the requirements of a data processing application by integrating a kernel-based runtime for the efficient execution of concurrent data analysis tasks. In addition, I address the current challenge that database optimizers face regarding the balance of concurrency and parallelism, and the influence that modern multicore machines have on the problem.

I conclude with a discussion of future research directions that arise from the presented work, and by highlighting the potential of cross-layer optimizations of the system stack in light of increasingly heterogeneous hardware platforms and the requirements of modern workloads.

Acknowledgments

This has been an amazing journey. From the early days I have been captivated by the thrill of approaching and exploring even the most challenging of problems. And it is thanks to my advisers, collaborators, family, and my friends that I have loved every bit of it.

First, I would like to express my gratitude to my adviser Gustavo Alonso for all his support, advice, guidance, and patience; for helping me grow as a scholar and for teaching me to love what it takes to do great research. I want to thank my co-adviser Timothy Roscoe for being supportive and always ready to give insightful feedback on my work, and for helping me improve myself as a researcher. I also extend my gratitude to my mentor from Oracle Labs, Tim Harris, for many great discussions, his feedback and guidance.
I have greatly enjoyed our collaboration, which I hope we continue in the future. I would like to thank Kim Keeton for agreeing to be part of my PhD committee and for her feedback, which significantly improved the quality of my dissertation; John Wilkes for being a supportive mentor for my Google PhD fellowship and always challenging me to clearly define what I do; Eric Sedlar and Nipun Agrawal for giving me the opportunity to work on Project RAPID, an experience that was very rewarding in the early days of my PhD; Donald Kossmann, Frank McSherry, Derek Murray, Michael Isard, Onur Mutlu, and many others from ETH, Oracle Labs and Microsoft Research SVC for your guidance, all of our discussions, and for allowing me to learn so much from all of it.

I had the pleasure and luck to work with many great students and would like to thank all my collaborators: Tudor, Adrian, Ionut, Kaan, Claude, Darko, Gerd, Pratanu, Zaheer, and Simon P. It was such a rewarding experience working with all of you.

Throughout the years, the friendship in the Systems Group has been one of the greatest highlights. Therefore, a big thank you goes to Anja, Besa, Pravin, Stefan, Gerd, Lukas, Zsolt, Pratanu, Akhi, Georgios, Tudor and Desi. I would also like to use this opportunity to thank my closest allies and friends for many years: Tijana, Sanja, Kiki, Alen, Gogi, Ozan, Kaveh, Sukriti, Kaan, Josip, Irena, Kate.

For almost everything I have achieved so far, I have to thank my parents Gjorgji and Ljubica. They have been supportive like no other. Both have been my role models for many years and have strongly encouraged me to follow my dreams. I would also like to thank my sister Mila for always cheering me up and supporting me wherever I go.

And finally, to my best friend, biggest supporter and critic: Darko. Thank you for putting up with all different versions of me and for always being there. Thank you for teaching me the value of a balanced life, and for showing me the positive aspects in all cases.
Much of this success is thanks to you.

Contents

1 Introduction
  1.1 Background
  1.2 Motivation and challenges
    1.2.1 Hardware trends
    1.2.2 Deployment trends
  1.3 Problem statement
  1.4 Contributions
    1.4.1 Policies and information flow
    1.4.2 Customized OS support for data processing
  1.5 Thesis outline
  1.6 Related publications

2 OS policy engine and adaptive DB storage engine
  2.1 System Overview
  2.2 DB Storage engine
    2.2.1 The architecture of the storage engine
    2.2.2 Working unit and its properties
    2.2.3 Properties of the CSCS storage engine
    2.2.4 Embedding into COD
  2.3 The OS Policy engine
    2.3.1 Architecture
    2.3.2 Implementation
    2.3.3 Discussion
  2.4 Interface
    2.4.1 Scope
    2.4.2 Semantics
    2.4.3 Syntax and implementation
    2.4.4 Evaluation
  2.5 Experiments
    2.5.1 Experimental Setup
    2.5.2 Deployment on different machines
    2.5.3 Deployment in a noisy system
    2.5.4 Adaptability to changes
  2.6 Related work
    2.6.1 Interacting with