NEWSLETTER ON PDL ACTIVITIES AND EVENTS • SPRING 2017 • http://www.pdl.cmu.edu/

Automatic Database Management System Tuning Through Large-scale Machine Learning

by Dana Van Aken

Database management systems (DBMSs) are the most important component of any modern data-intensive application, but tuning them is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. This means that organizations often hire human experts to help with tuning activities, though this method can be prohibitively expensive.

OtterTune is a new automatic DBMS configuration tool that overcomes these challenges, making it easier for anyone to deploy a DBMS able to handle large amounts of data and more complex workloads without any expertise needed in database administration. The key feature of OtterTune is that it reduces the amount of time and resources it takes to tune a DBMS for a new application by leveraging knowledge gained from previous DBMS deployments. OtterTune maintains a repository of data collected from previous tuning sessions, and uses this data to build models of how the DBMS responds to different knob configurations. For a new application, these models are used to guide experimentation and recommend optimal settings to the user to improve a target objective (e.g., latency, throughput).

In this article, we first provide a high-level example to demonstrate the tasks performed by each of the components in the OtterTune system, and show how they interact with one another when automatically tuning a DBMS's configuration. We next discuss each of the components in OtterTune's ML pipeline. Finally, we conclude with an evaluation of OtterTune's efficacy by comparing the performance of its best configuration with ones selected by DBAs and other automatic tuning tools for MySQL and Postgres. All the experiments for our evaluation and data collection were conducted on Amazon EC2 Spot Instances. OtterTune's ML algorithms were implemented using Google TensorFlow and Python's scikit-learn.

The OtterTune System

Figure 1 shows an overview of the components in the OtterTune system. At the start of a new tuning session, the DBA tells OtterTune what metric to optimize when selecting a configuration (e.g., latency, throughput). The client-side controller then connects to the target DBMS and collects its Amazon EC2 instance type and current knob configuration. Next, the controller starts its first observation period, during which the controller

Figure 1: The OtterTune System. The client-side controller connects to the target DBMS via JDBC, while the tuning manager's analysis and planning components draw on ML models and the data repository.

continued on page 10

The PDL Packet is an informal publication from academia's premier storage systems research center, devoted to advancing the state of the art in storage and information infrastructures.

CONTENTS

Automatic DBMS Tuning ............ 1
Director's Letter ................ 2
Year in Review ................... 4
Recent Publications .............. 5
PDL News & Awards ................ 8
Big Learning in the Cloud ....... 12
Defenses & Proposals ............ 14
New PDL Faculty ................. 19
Alumni Update ................... 19

PDL CONSORTIUM MEMBERS

Broadcom, Ltd.
Citadel
Dell EMC
Google
Hewlett-Packard Labs
Hitachi, Ltd.
Intel Corporation
Microsoft Research
MongoDB
NetApp, Inc.
Oracle Corporation
Samsung Information Systems America
Seagate Technology
Tintri
Two Sigma
Toshiba
Veritas
Western Digital

THE PDL PACKET

THE PARALLEL DATA LABORATORY
School of Computer Science
Department of ECE
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891
voice 412•268•6716
fax 412•268•3010

PUBLISHER
Greg Ganger

EDITOR
Joan Digney

The PDL Packet is published once per year to update members of the PDL Consortium. A pdf version resides in the Publications section of the PDL Web pages and may be freely distributed. Contributions are welcome.

THE PDL LOGO

Skibo Castle and the lands that comprise its estate are located in the Kyle of Sutherland in the northeastern part of Scotland. Both 'Skibo' and 'Sutherland' are names whose roots are from Old Norse, the language spoken by the Vikings who began washing ashore regularly in the late ninth century. The word 'Skibo' fascinates etymologists, who are unable to agree on its original meaning. All agree that 'bo' is the Old Norse for 'land' or 'place,' but they argue whether 'ski' means 'ships' or 'peace' or 'fairy hill.' Although the earliest version of Skibo seems to be lost in the mists of time, it was most likely some kind of fortified building erected by the Norsemen. The present-day castle was built by a bishop of the Roman Catholic Church. Andrew Carnegie, after making his fortune, bought it in 1898 to serve as his summer home. In 1980, his daughter, Margaret, donated Skibo to a trust that later sold the estate. It is presently being run as a luxury hotel.

FROM THE DIRECTOR'S CHAIR
Greg Ganger

Hello from fabulous Pittsburgh!

It has been another great year for the Parallel Data Lab. Some highlights include record enrollment in PDL's storage systems and cloud classes, continued growth and success for PDL's database systems and big-learning systems research, and new directions for cloud and NVM research. Along the way, many students graduated and joined PDL Consortium companies, new students joined the PDL, and many cool papers have been published. Let me highlight a few things.

Since it headlines this newsletter, I'll lead off my highlights by talking about PDL's database systems research. The largest scope activities focus on automation in DBMSs, such as in the front-page article, and creating a new open source DBMS called Peloton to embody these and other ideas from the database group. There also continue to be cool results and papers on deduplication in databases, better exploitation of NVM in databases, and improved concurrency control mechanisms. The energy that Andy has infused into database systems research at CMU makes it a lot of fun to watch and participate in.

In addition to applying machine learning (ML) to systems for automation purposes, we continue to explore new approaches to system support for large-scale machine learning. As discussed in the second newsletter article, we have been especially active in exploring how ML systems should adapt to cloud computing environments. Important questions include how such systems should address dynamic resource availability, unpredictable levels of time-varying resource interference, and geo-distributed data. And, in a fun twist, we are developing new approaches to automatic tuning for ML systems: ML to improve ML.

I continue to be excited about the growth and evolution of the storage systems and cloud classes created and led by PDL faculty... the popularity of both is at an all-time high. For example, over 100 MS students completed the storage class last Fall, and the project-intensive cloud classes serve even more. We refined the new myFTL project such that the students' FTLs store real data in the NAND Flash SSD simulator we provide them, which includes basic interfaces for read-page, write-page, and erase-block. The students write FTL code to maintain LBA-to-physical mappings, perform cleaning, and address wear leveling. In addition to our lectures and the projects, these classes feature 3-5 corporate guest lecturers (thank you, PDL Consortium members!) bringing insight on real-world solutions, trends, and futures.

PDL continues to explore questions of resource scheduling for cloud computing, which grows in complexity as the breadth of application and resource types grows. Our Tetrisched project has developed new ways of allowing users to express their per-job resource type preferences (e.g., machine locality or hardware accelerators) and then exploring the trade-offs among them to maximize utility of the public and/or private cloud infrastructure.


Our most recent work explores more robust ways of exploiting estimated runtime information, finding that providing full distributions of likely runtimes (e.g., based on history of "similar" jobs) works quite well for real-world workloads as reflected in real cluster traces. In collaboration with the new Intel-funded visual cloud systems research center, we are also exploring how processing of visual data should be partitioned and distributed across capture (edge) devices and core cloud resources.

Our explorations of how systems should be designed differently to exploit and work with new underlying storage technologies, especially NVM and SMR, continue to expand on many fronts. Activities range from new search and sort algorithms that understand the asymmetry between read and write performance (i.e., assuming NVM is used directly) to a modified Linux Ext4 file system implementation that mitigates drive-managed SMR performance overheads. We're excited about continuing to work with PDL companies on understanding where the hardware is (and should be) going and how it should be exploited in systems.

Naturally, PDL's long-standing focus on scalable storage continues strongly. As always, a primary challenge is metadata scaling, and PDL researchers are exploring several approaches to dealing with scale along different dimensions. A cool new concept being explored is allowing high-end applications to manage their own namespaces, for periods of time, bypassing traditional metadata bottlenecks entirely during the heaviest periods of activity.

Many other ongoing PDL projects are also producing cool results. For example, to help our (and others') file systems research, we have developed a new file system aging suite, called Geriatrix. Our key-value store research continues to expose new approaches to indexing and remote value access. Our continued operation of private clouds in the Data Center Observatory (DCO) serves the dual purposes of providing resources for real users (CMU researchers) and providing us with invaluable Hadoop logs, instrumentation data, and case studies. This newsletter and the PDL website offer more details and additional research highlights.

I'm always overwhelmed by the accomplishments of the PDL students and staff, and it's a pleasure to work with them. As always, their accomplishments point at great things to come.

PARALLEL DATA LABORATORY

FACULTY
Greg Ganger (PDL Director), 412•268•1297, [email protected]
David Andersen, Lujo Bauer, Nathan Beckmann, Chuck Cranor, Lorrie Cranor, Christos Faloutsos, Kayvon Fatahalian, Eugene Fink, Rajeev Gandhi, Saugata Ghose, Phil Gibbons, Garth Gibson, Seth Copen Goldstein, Mor Harchol-Balter, Todd Mowry, Onur Mutlu, Priya Narasimhan, David O'Hallaron, Andy Pavlo, Majd Sakr, M. Satyanarayanan, Srinivasan Seshan, Alex Smola, Hui Zhang

STAFF MEMBERS
Bill Courtright, 412•268•5485 (PDL Executive Director), [email protected]
Karen Lindenfelser, 412•268•6716 (PDL Administrative Manager), [email protected]
Jason Boles, Joan Digney, Chad Dougherty, Mitch Franzos, Charlene Zang

VISITING RESEARCHERS / POST DOCS
George Amvrosiadis, Daniel Berger, Akira Kuroda, Hyeontaek Lim, Michael (Tieying) Zhang

GRADUATE STUDENTS
Abutalib Aghayev, Joy Arulraj, Rachata Ausavarungnirun, Ben Blum, Amirali Boroumand, Sol Boucher, Christopher Canel, Kevin Chang, Andrew Chung, Henggang Cui, Wei Dai, Debanshu Das, Utsav Drolia, Chris Fallin, Jian Fang, Kristen Scholes Gardner, Kiryong Ha, Aaron Harlap, Kevin Hsieh, Saksham Jain, Angela Jiang, Junchen Jiang, Wesley Jin, Abishek Joshi, Ellango Jothimurugesan, Saurabh Arun Kadekodi, Anuj Kalia, Rajat Kateja, Jin Kyu Kim, Thomas Kim, Dimitris Konomis, Michael Kuchnik, Conglong Li, Mu Li, Yang Li, Haibin Lin, Yixin Luo, Lin Ma, Charles McGuffey, Prashanth Menon, Nathan Mickulicz, Ravi Teja Mullapudi, Jun Woo Park, Akansha Patel, Matt Perron, Alex Poms, Aurick Qiao, Kai Ren, Dana Van Aken, Nandita Vijaykumar, Haoran Wang, Ziqi Wang, Jinliang Wei, Lin Xiao, Pengtao Xie, Hongyi Xin, Lianghong Xu, Jiajun Yao, Yunfan Ye, Huanchen Zhang, Rui Zhang, Qing Zheng, Dong Zhou, Timothy Zhu

Photo: Industry guests gather with PDL faculty, staff and students in Roberts Hall for breakfast and an industry poster session at the opening of the 2016 PDL Retreat.

Photo: Ben Blum shows some details of his work on "Systematic Testing with Data-Race Preemption Points" with Jeff Heller of NetApp at the 2016 PDL Retreat.

YEAR IN REVIEW

May 2017
• 19th annual Spring Visit Day.
• Timothy Zhu defended his PhD thesis "Meeting Tail Latency SLOs in Shared Networked Storage."
• Jinliang Wei presented his speaking skills talk on "Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics."
• Rachata Ausavarungnirun successfully defended his PhD thesis "Techniques for Shared Resource Management in Systems with Graphics Processing Units."
• Presentations at the SIGMOD Int'l Conference on Data Management 2017 included "Automatic Database Management System Tuning Through Large-scale Machine Learning" (Dana Van Aken) and "Online Deduplication for Databases" (Lianghong Xu).
• Rajat Kateja will be interning at Google with their cloud platforms team this coming summer, working with Niket Agarwal.
• Saksham Jain will be interning with VMware in Palo Alto this summer in their VMKernel team.
• Charles McGuffey will be interning with Google in Mountain View, CA this summer.
• Aaron Harlap will be interning with Microsoft, working on ML for Big Data.

April 2017
• Henggang Cui successfully defended his PhD research on "Exploiting Application Characteristics for Efficient System Support of Data-Parallel Machine Learning."
• Ben Blum proposed his thesis research "Practical Concurrency Testing."
• Priya Narasimhan honored with Pittsburgh History Makers Award.
• Aaron Harlap presented "Proteus: Agile ML Elasticity through Tiered Reliability in Dynamic Resource Markets" at EuroSys '17 in Belgrade, Serbia.

March 2017
• Presentations at NSDI '17 in Boston, MA included "AdaptSize: Orchestrating the Hot Object Memory Cache in a Content Delivery Network" (Daniel Berger) and "Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds" (Kevin Hsieh).
• Utsav Drolia presented "Towards Edge-caching for Image Recognition" at SmartEdge '17 in Hawaii.

February 2017
• Mu Li successfully defended his PhD thesis "Scaling Distributed Machine Learning with System and Algorithm Co-design."
• Anuj Kalia was named a 2017 Facebook Fellow.
• Abutalib Aghayev presented "Evolving Ext4 for Shingled Disks" at FAST '17 in Santa Clara, CA.
• Two papers from Onur Mutlu's group were presented at HPCA '17 in Austin, TX: "Vulnerabilities in MLC NAND Programming: Experimental Analysis, Exploits, and Mitigation Techniques" (Yu Cai) and "SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies" (Hasan Hassan).

January 2017
• Adam Pennington receives CMU Alumni Award.
• Abutalib Aghayev receives Hugo Patterson PDL Entrepreneurship Fellowship.
• Andy Pavlo presented "Self-Driving Database Management Systems" at CIDR '17 in Chaminade, CA.

December 2016
• Todd Mowry became an ACM Fellow.

November 2016
• Jiaqi Tan successfully defended his PhD dissertation "Prescriptive Safety-Checks through Automated Proofs for Control-Flow Integrity."
• Jun Woo Park gave his speaking skills talk on "JVuSched: Robust Scheduling with Auto-estimated Job Runtimes."
• Anuj Kalia presented "FaSST: Fast, Scalable and Simple Distributed Transactions with Two-sided (RDMA) Datagram RPCs" at OSDI '16 in Savannah, GA.
• Jiaqi Tan presented "AUSPICE-R: Automatic Safety-Property Proofs for Realistic Features in Machine Code" at APLAS '16 in Hanoi, Vietnam.

October 2016
• 24th annual PDL Retreat.
• Ben Blum presented "Stateless Model Checking with Data-Race Preemption Points" at SPLASH 2016 in Amsterdam, Netherlands.
• Kiryong Ha successfully defended his PhD thesis "System Infrastructure for Mobile-Cloud Convergence."
• Mu Li proposed his thesis research "Scaling Distributed Machine Learning with System and Algorithm Co-design."
• Timothy Zhu presented "SNC-Meister: Admitting More Tenants with Tail Latency SLOs" at SoCC '16 in Santa Clara, CA.

Photo: PDL staff members Jason Boles and Charlene Zang discuss PDL infrastructure while enjoying the PDL Retreat.

continued on page 32

RECENT PUBLICATIONS

Evolving Ext4 for Shingled Disks

Abutalib Aghayev, Theodore Ts'o, Garth Gibson & Peter Desnoyers

15th USENIX Conference on File and Storage Technologies (FAST '17), Feb 27 - Mar 2, 2017. Santa Clara, CA.

Drive-Managed SMR (Shingled Magnetic Recording) disks offer a plug-compatible higher-capacity replacement for conventional disks. For non-sequential workloads, these disks show bimodal behavior: after a short period of high throughput they enter a continuous period of low throughput. We introduce ext4-lazy, a small change to the Linux ext4 file system that significantly improves the throughput in both modes. We present benchmarks on four different drive-managed SMR disks from two vendors, showing that ext4-lazy achieves 1.7-5.4X improvement over ext4 on a metadata-light file server benchmark. On metadata-heavy benchmarks it achieves 2-13X improvement over ext4 on drive-managed SMR disks as well as on conventional disks.

Figure: Journaling under ext4 (a) versus ext4-lazy (b). (a) Ext4 writes a metadata block to disk twice: it first writes the metadata block to the journal at some location J and marks it dirty in memory; later, the writeback thread writes the same metadata block to its static location S on disk, resulting in a random write. (b) Ext4-lazy writes the metadata block approximately once, to the journal, and inserts a mapping (S, J) into an in-memory map so that the file system can find the metadata block in the journal.
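
The (S, J) mapping described in the caption is the heart of ext4-lazy. The sketch below is a toy, user-level model of that idea in Python, not the actual kernel change: metadata writes go only to the journal, and an in-memory map redirects later reads of the static location S to the journal location J.

```python
# Toy model of the ext4-lazy idea from the figure caption: write metadata
# once, to the journal, and remember where it went, instead of also writing
# it back to its static location. Illustrative only, not the ext4 patch.

class LazyJournal:
    def __init__(self):
        self.journal = []   # append-only list of (static_loc, block) records
        self.jmap = {}      # static location S -> journal index J
        self.disk = {}      # blocks written the traditional way

    def write_metadata(self, static_loc, block):
        """Append the block to the journal and record the (S, J) mapping."""
        self.journal.append((static_loc, block))
        self.jmap[static_loc] = len(self.journal) - 1   # J

    def read_metadata(self, static_loc):
        """Reads consult the map first, so the block is found in the journal."""
        if static_loc in self.jmap:
            return self.journal[self.jmap[static_loc]][1]
        return self.disk.get(static_loc)   # fall back to the static location

fs = LazyJournal()
fs.write_metadata(static_loc=4096, block=b"inode-table-update")
assert fs.read_metadata(4096) == b"inode-table-update"
```
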
Principled Workflow-centric Tracing of Distributed Systems

Raja R. Sambasivan, Ilari Shafer, Jonathan Mace, Benjamin H. Sigelman, Rodrigo Fonseca & Gregory R. Ganger

ACM Symposium on Cloud Computing 2016, Santa Clara, CA, October 5 - 7, 2016.

Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure's utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.

Design Guidelines for High Performance RDMA Systems

Anuj Kalia, Michael Kaminsky & David G. Andersen

2016 USENIX Annual Technical Conference (USENIX ATC '16), June 2016. Best Student Paper.

Modern RDMA hardware offers the potential for exceptional performance, but design choices including which RDMA operations to use and how to use them significantly affect observed performance. This paper lays out guidelines that can be used by system designers to navigate the RDMA design space. Our guidelines emphasize paying attention to low-level details such as individual PCIe transactions and NIC architecture. We empirically demonstrate how these guidelines can be used to improve the performance of RDMA-based systems: we design a networked sequencer that outperforms an existing design by 50x, and improve the CPU efficiency of a prior high-performance key-value store by 83%. We also present and evaluate several new RDMA optimizations and pitfalls, and discuss how they affect the design of RDMA systems.

SNC-Meister: Admitting More Tenants with Tail Latency SLOs

Timothy Zhu, Daniel S. Berger & Mor Harchol-Balter

ACM Symposium on Cloud Computing 2016 (SoCC '16), Santa Clara, CA, October 5 - 7, 2016.

Meeting tail latency Service Level Objectives (SLOs) in shared cloud networks is both important and challenging. One primary challenge is determining limits on the multitenancy such that SLOs are met. Doing so involves estimating latency, which is difficult, especially when tenants exhibit bursty behavior as is common in production environments. Nevertheless, recent papers in the past two years (Silo, QJump, and PriorityMeister) show techniques for calculating latency based on a branch of mathematical modeling called Deterministic Network Calculus (DNC). The DNC theory is designed for adversarial worst-case conditions, which is sometimes necessary, but is often overly conservative. Typical tenants do not require strict worst-case guarantees, but are only looking for SLOs at lower percentiles (e.g., 99th, 99.9th). This paper describes SNC-Meister, a new admission control system for tail latency SLOs. SNC-Meister improves upon the state-of-the-art DNC-based systems by using a new theory, Stochastic Network Calculus (SNC), which is designed for tail latency percentiles. Focusing on tail latency percentiles, rather than the adversarial worst-case DNC latency, allows SNC-Meister to pack together many more tenants: in experiments with production traces, SNC-Meister supports 75% more tenants than the state-of-the-art.

Figure: Admission numbers for state-of-the-art admission control systems and SNC-Meister in 100 randomized experiments. In each experiment, 180 tenants, each submitting hundreds of thousands of requests, arrive in random order and seek a 99.9% SLO randomly drawn from {10ms, 20ms, 50ms, 100ms}. While all systems meet all SLOs, SNC-Meister is able to support on average 75% more tenants with tail latency SLOs than the next-best system.

Parallel Algorithms for Asymmetric Read-Write Costs

Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey & Julian Shun

SPAA 2016: 28th ACM Symposium on Parallelism in Algorithms and Architectures, July 11 - 13, 2016. Asilomar State Beach, CA, USA.

Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced-write, work-efficient, low-span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced-write, low-span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.

Cachier: Edge-caching for Recognition Applications

Utsav Drolia, Katherine Guo, Jiaqi Tan, Rajeev Gandhi & Priya Narasimhan

The 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017), June 5 - 8, 2017, Atlanta, GA, USA.

Recognition and perception-based mobile applications, such as image recognition, are on the rise. These applications recognize the user's surroundings and augment it with information and/or media. These applications are latency-sensitive; they have a soft-realtime nature, since late results are potentially meaningless. On the one hand, given the compute-intensive nature of the tasks performed by such applications, execution is typically offloaded to the cloud. On the other hand, offloading such applications to the cloud incurs network latency, which can increase the user-perceived latency. Consequently, edge-computing has been proposed to let devices offload intensive tasks to edge-servers instead of the cloud, to reduce latency. In this paper, we propose a different model for using edge-servers. We propose to use the edge as a specialized cache for recognition applications and formulate the expected latency for such a cache. We show that using an edge-server like a typical web-cache, for recognition applications, can lead to higher latencies. We propose Cachier, a system that uses the caching model along with novel optimizations to minimize latency by adaptively balancing load between the edge and the cloud, by leveraging spatiotemporal locality of requests, using offline analysis of applications, and online estimates of network conditions. We evaluate Cachier for image-recognition applications and show that our techniques yield 3x speed-up in responsiveness, and perform accurately over a range of operating conditions. To the best of our knowledge, this is the first work that models edge-servers as caches for compute-intensive recognition applications, and Cachier is the first system that uses this model to minimize latency for these applications.
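
The abstract mentions formulating the expected latency of such a recognition cache. As a rough illustration only, not the paper's exact model, the expected end-to-end latency of an edge cache that recognizes a fraction of requests locally and forwards the rest to the cloud might be written as below; Cachier's optimizations effectively tune what is cached at the edge so that this quantity is minimized. All parameter names and the example values are hypothetical.

```python
def expected_latency(p_hit, t_edge_lookup, t_edge_recognize, t_net, t_cloud_recognize):
    """Illustrative expected-latency model for an edge recognition cache.

    p_hit: fraction of requests the edge-server recognizes from its cache.
    A hit pays only the edge lookup/recognition cost; a miss additionally
    pays the network round-trip to the cloud and the cloud recognition cost.
    (A simplified sketch; the actual formulation is in the Cachier paper.)
    """
    hit_cost = t_edge_lookup + t_edge_recognize
    miss_cost = t_edge_lookup + t_net + t_cloud_recognize
    return p_hit * hit_cost + (1.0 - p_hit) * miss_cost

# Example (milliseconds): a 60% recognition rate at the edge,
# with a 120 ms round-trip to the cloud.
print(expected_latency(0.6, 5.0, 40.0, 120.0, 60.0))
```
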
Automatic Database Management System Tuning Through Large-scale Machine Learning

Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon & Bohan Zhang

ACM SIGMOD International Conference on Management of Data, May 14 - 19, 2017. Chicago, IL, USA.

Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be sub-optimal for another). Worse, information about the effects of the knobs typically comes only from (expensive) experience. To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. We implemented our techniques in a new tool called OtterTune and tested it on three DBMSs. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.

Online Deduplication for Databases

Lianghong Xu, Andrew Pavlo, Sudipta Sengupta & Gregory R. Ganger

ACM SIGMOD International Conference on Management of Data, May 14 - 19, 2017. Chicago, IL, USA.

dbDedup is a similarity-based deduplication scheme for on-line database management systems (DBMSs). Beyond block-level compression of individual database pages or operation log (oplog) messages, as used in today's DBMSs, dbDedup uses byte-level delta encoding of individual records within the database to achieve greater savings. dbDedup's single-pass encoding method can be integrated into the storage and logging components of a DBMS to provide two benefits: (1) reduced size of data stored on disk beyond what traditional compression schemes provide, and (2) reduced amount of data transmitted over the network for replication services. To evaluate our work, we implemented dbDedup in a distributed NoSQL DBMS and analyzed its properties using four real datasets. Our results show that dbDedup achieves up to 37X reduction in the storage size and replication traffic of the database on its own and up to 61X reduction when paired with the DBMS's block-level compression. dbDedup provides both benefits with negligible effect on DBMS throughput or client latency (average and tail).

Figure: dbDedup Workflow – (1) Feature Extraction, (2) Index Lookup, (3) Source Selection, and (4) Delta Compression.

AdaptSize: Orchestrating the Hot Object Memory Cache in a Content Delivery Network

Daniel S. Berger, Ramesh K. Sitaraman & Mor Harchol-Balter

14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), March 27 - 29, 2017, Boston, MA.

Most major content providers use content delivery networks (CDNs) to serve web and video content to their users. A CDN is a large distributed system of servers that caches and delivers content to users. The first-level cache in a CDN server is the memory-resident Hot Object Cache (HOC). A major goal of a CDN is to maximize the object hit ratio (OHR) of its HOCs. But the small size of the HOC, the huge variance in the requested object sizes, and the diversity of request patterns make this goal challenging. We propose AdaptSize, the first adaptive, size-aware cache admission policy for HOCs that achieves a high OHR, even when object size distributions and request characteristics vary significantly over time. At the core of AdaptSize is a novel Markov cache model that seamlessly adapts the caching parameters to the changing request patterns. Using request traces from one of the largest CDNs in the world, we show that our implementation of AdaptSize achieves significantly higher OHR than widely-used production systems: 30-48% and 47-91% higher OHR than Nginx and Varnish, respectively. AdaptSize also achieves 33-46% higher OHR than state-of-the-art research systems. Further, AdaptSize is more robust to changing request patterns than the traditional tuning approach of hill climbing and shadow queues studied in other contexts.
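
For intuition, size-aware probabilistic admission of the kind AdaptSize tunes can be sketched as follows: smaller objects are admitted with higher probability, and a single parameter controls how aggressively large objects are filtered out of the small HOC. In AdaptSize that parameter is chosen and continually re-tuned by the Markov cache model described in the abstract; the fixed exponential rule and constant below are assumptions for illustration only.

```python
import math
import random

def admit(object_size_bytes, c_bytes):
    """Size-aware probabilistic admission sketch: admit with probability
    exp(-size / c). Small objects are almost always admitted; objects much
    larger than c are rarely admitted, protecting the small Hot Object Cache.
    AdaptSize adapts its admission parameter continuously as request patterns
    change; this constant-c version is only an illustration."""
    p_admit = math.exp(-object_size_bytes / float(c_bytes))
    return random.random() < p_admit

# With c = 64 KiB, a 4 KiB object is admitted ~94% of the time,
# while a 1 MiB object is admitted far less than 0.01% of the time.
print(admit(4 * 1024, 64 * 1024), admit(1024 * 1024, 64 * 1024))
```
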
Improving the Reliability of Chip-off Forensic Analysis of NAND Flash Memory Devices

Aya Fukami, Saugata Ghose, Yixin Luo, Yu Cai & Onur Mutlu

DFRWS Digital Forensics Research Conference Europe (DFRWS EU), March 21 - 23, 2017, Lake Constance, Germany.

Digital forensic investigators often need to extract data from a seized device that contains NAND flash memory. Many such devices are physically damaged, preventing investigators from using automated techniques

continued on page 20

AWARDS & OTHER PDL NEWS

April 2017
Priya Narasimhan Honored with History Makers Award

Each year, the History Makers Award Dinner honors five men and women who have made a difference through their work to not only the Greater Pittsburgh area, but beyond. Congratulations to Priya, Professor of Electrical and Computer Engineering and the CEO and founder of YinzCam, who will receive this award at a banquet on May 12 for her outstanding achievements in technology and innovation. Other awards are made in the fields of education, sports, healthcare and community.

-- information from heinzhistorycenter.org

January 2017
Anuj Kalia Named a 2017 Facebook Fellow

Congratulations to Anuj Kalia, who has been named to the 2017 class of Facebook fellows! Founded in 2010, the Facebook Fellowship program is designed to help foster ties with the academic community, encourage and support promising Ph.D. students engaged in research across computer science and engineering, and provide those students with opportunities to work with Facebook on problems relevant to their research. Since its inception, the program has supported more than 50 Ph.D. candidates, whose research covers topics that range from power systems and microgrids to the intersection of computer vision, machine learning and cognitive science. Anuj is a Ph.D. student in the Computer Science Department, a member of the PDL, and an advisee of Associate Professor David Andersen. He received his B.Tech in computer science from the Indian Institute of Technology Delhi, and his research interests include networked systems, with a focus on designing efficient software systems for high-speed networks.

-- information from CMU SCS News, Jan 31, 2017, Aisha Rashid

January 2017
Jun Woo Park and Sun Hee Baik Marry!

Congratulations to Jun Woo and Sun Hee, who were married on January 7th, 2017 at home in Seoul, Korea!

January 2017
Adam Pennington Receives CMU Alumni Award

Congratulations to Adam (CS 2001, E 2003), who has been awarded a 2016 CMU Alumni Award for Alumni Service. He has been involved with various volunteer roles since he graduated with a BS in Computer Science in 2001 and an MS in Electrical and Computer Engineering in 2003. He is the current president of the Activities Board Technical Committee (ABTech) alumni interest group, which he founded in 2004. In 2015, Adam started the ABTech Advancement Fund, which is dedicated to helping current undergraduate members of ABTech attend industry conferences. Adam also served on the Alumni Association Board from 2011 until 2015, co-chairing both the Networks and Awards committees. While at CMU, Adam was an executive in ABTech, a board member of Scotch 'n' Soda, and an officer of both Hammerschlag tower clubs (the Carnegie Tech Radio Club [W3VC] and the ECE Graduate Organization). He currently works for a nonprofit and spends as much time as possible SCUBA diving with his girlfriend, Lindsay Spriggs (HS 2005, 2006). First presented in 1950, the Carnegie Mellon Alumni Awards recognize alumni, students, and faculty for their service to the university and its alumni, and for their achievements in the arts, humanities, sciences, technology, and business. To date more than 880 individuals have been honored through the program.

-- information from CMU Alumni Association News, January 2017

January 2017
Hugo Patterson PDL Entrepreneurship Fellowship Awarded

The Parallel Data Laboratory is pleased to announce that the Hugo Patterson PDL Entrepreneurship Fellowship has been awarded to Abutalib Aghayev.

This award provides academic (tuition and stipend) support to a student working within the umbrella of the PDL who has had experience in the workforce prior to entering a graduate program.

Abutalib Aghayev is a Ph.D. student in the Computer Science Department at Carnegie Mellon University. He received his M.S. degree from Northeastern University and B.S. degree from Bogazici University, Istanbul, Turkey. Before coming to the United States for graduate studies, he was a co-founder and CTO of a software startup in Istanbul. His research interests are in storage and systems support for machine learning.

The endowment for the fellowship was made by PDL alum Hugo Patterson, an entrepreneur who wishes to foster similar opportunities for PDL graduate students. He is currently the CTO and Co-Founder of Datrium, where he has helped develop a Server Flash Storage System for private clouds. It's a new kind of storage that leverages server cycles and server flash and combines them with an external NetShelf for durable storage. Prior to Datrium, Hugo was an EMC Fellow serving as CTO of the EMC Backup Recovery Systems Division, and the Chief Architect and CTO of Data Domain (acquired by EMC in 2009), where he built the first deduplication storage system. Prior to that he was the engineering lead at NetApp, developing SnapVault, the first snap-and-replicate disk-based backup product. Hugo holds a Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University, and was the inaugural recipient of the CMU Parallel Data Lab Distinguished Alumni Award.

December 2016
Todd Mowry Named ACM Fellow

Congratulations to Todd Mowry on being named an ACM Fellow for 2016! Mowry, a professor in the Computer Science Department, was cited "for contributions to software prefetching and thread-level speculation." His research interests span the areas of computer architecture, compilers, operating systems, parallel processing, database performance, and modular robotics. He co-leads the Log-Based Architectures project and the Claytronics project. In 2004-2007, he served as the director of Intel Labs Pittsburgh. Todd was among 53 members of the ACM, the world's leading computing society, elevated to fellow status this year. ACM will formally recognize the 2016 fellows at the annual Awards Banquet in San Francisco on June 24, 2017.

-- Carnegie Mellon University News, Dec. 8, 2016, Byron Spice

July 2016
Announcing the Carnegie Mellon Database Application Catalog

The Carnegie Mellon Database Application Catalog (CMDBAC) is an on-line repository of open-source database applications that you can use for benchmarking and experimentation. The goal of this project is to provide ready-to-run real-world applications for researchers and practitioners that go beyond the standard benchmarks.

We built a crawler that finds applications hosted on public repositories (e.g., GitHub). We then created a framework that automatically learns how to deploy and execute an application inside a virtual machine sandbox. You can then safely download the application on your local machine and execute it to collect query traces and other metrics.

The CMDBAC currently contains over 1000 applications of varying complexity. Web applications based on popular programming frameworks are targeted because (1) they are easier to find and (2) we can automate the deployment process. We support applications that use the Django, Ruby on Rails, Drupal, Node.js, and Grails frameworks. Find the CMDBAC website at http://cmdbac.cs.cmu.edu and the source code at https://github.com/cmu-db/cmdbac.

June 2016
Best Student Paper at ATC '16!

Congratulations to Anuj Kalia and his co-authors Michael Kaminsky and David G. Andersen for receiving the award for Best Student Paper at the 2016 USENIX Annual Technical Conference (USENIX ATC '16) in Denver, CO. Their paper, "Design Guidelines for High Performance RDMA Systems," lays out guidelines that can be used by system designers to navigate the RDMA design space in order to take advantage of potential performance improvements.

AUTOMATIC DBMS TUNING
continued from page 1

observes the DBMS and measures DBMS-independent external metrics chosen by the DBA, such as the latency and throughput. At the end of the observation period, the controller collects additional DBMS-specific internal metrics, such as MySQL's counters for pages read from or written to disk. The controller then returns both the external and internal metrics to the tuning manager.

When OtterTune's tuning manager receives the result of a new observation period from the controller, it first stores that information in its repository. From this, OtterTune computes the next configuration that the controller should install on the target DBMS. This configuration, along with an estimate of the expected improvement to be gained by running it, is returned to the controller. The estimate helps the DBA decide whether to iterate or terminate the tuning session.

We note that there are certain assumptions made by the OtterTune system that may limit its capabilities for some users. For example, it assumes that the DBA has administrative privileges so that the controller can modify the DBMS's configuration. A complete discussion of these assumptions and limitations is provided in our paper [1].

Machine Learning Pipeline

Figure 2 shows the processing path of data as it moves through OtterTune's ML pipeline. All previous observations reside in the OtterTune repository. This data is first passed into the workload characterization component that identifies the most distinguishing DBMS metrics. Next, the knob identification component generates a ranked list of the most important knobs. This information is then fed to the automatic tuner component, where it maps the target DBMS's workload to a previously seen workload and generates improved tuning configurations.

Figure 2: OtterTune Machine Learning Pipeline. Phase 1 (workload characterization) distills a set of distinct metrics using factor analysis and k-means clustering; knob identification ranks knob importance with Lasso; Phase 2 (the automatic tuner) maps the new workload to past data in the repository and trains GPs to recommend configurations.

Following is a discussion of the components in the ML pipeline in more detail.

Workload Characterization: OtterTune uses the DBMS's internal runtime metrics to characterize how a workload behaves. These metrics provide an accurate representation of a workload because they capture many aspects of its runtime behavior. The problem with these metrics is that many of them are redundant: some are the same measurement recorded in different units; others represent independent components of the DBMS whose values are highly correlated. Pruning these redundant metrics is important to reduce the complexity of the ML models that use them.

Knob Identification: The knob identification component determines which knobs have the strongest impact on the DBA's target objective function. DBMSs can have hundreds of knobs, but only a subset affect the DBMS's performance. OtterTune uses a popular feature selection technique for linear regression, called Lasso, to identify important knobs and the dependencies that may exist between them. Applying this technique to the knob data and the data for the non-redundant set of metrics in OtterTune's repository exposes the knobs that have the strongest correlation to the system's overall performance.
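
To make the two components just described concrete, here is a compressed sketch of how metric pruning and knob ranking could be done with scikit-learn, which the article says OtterTune uses. The factor-analysis/k-means pruning and the Lasso ranking follow the labels in Figure 2, but the data shapes, the model hyperparameters, and the simple rank-by-coefficient shortcut (OtterTune actually orders knobs along a Lasso regularization path) are assumptions for illustration, not the production code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Assumed shapes: one row per observation period from past tuning sessions.
metrics = np.random.rand(500, 120)   # 120 internal DBMS metrics per observation
knobs = np.random.rand(500, 40)      # 40 knob settings per observation
target = np.random.rand(500)         # target objective (e.g., throughput)

# Workload characterization: prune redundant metrics. Factor analysis gives
# each metric a small loading vector; metrics with similar loadings behave
# alike, so k-means keeps one representative metric per cluster.
loadings = FactorAnalysis(n_components=5).fit(metrics).components_.T   # (120, 5)
labels = KMeans(n_clusters=10, n_init=10).fit_predict(loadings)
pruned_metrics = [int(np.flatnonzero(labels == c)[0]) for c in range(10)]

# Knob identification: rank knobs by how strongly they affect the objective.
X = StandardScaler().fit_transform(knobs)
y = StandardScaler().fit_transform(target.reshape(-1, 1)).ravel()
lasso = Lasso(alpha=0.05).fit(X, y)
ranked_knobs = np.argsort(-np.abs(lasso.coef_))   # most impactful knobs first
top_k_knobs = ranked_knobs[:10]
```
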

Automatic Tuner: The automated tuning component executes a two-step analysis after the completion of each observation period in the tuning process to determine which configuration it should recommend next.

In the first step, the system identifies which workload from a previous tuning session is most representative of the target workload, based on the performance measurements for the non-redundant set of metrics. It does this by comparing the session's metrics with those from previously seen workloads to see which ones react similarly to different knob settings.

In the next step, OtterTune chooses a configuration that is explicitly selected to maximize the target objective over the entire tuning session. Instead of tuning all of the DBMS's configuration knobs, OtterTune selects only the top-k from its ranked list of knobs to tune. OtterTune reuses the data from the most similar workload in its repository and the data collected so far from the target workload (for the top-k knobs and the target metric) to train a Gaussian Process model. This model enables OtterTune to estimate how well the target metric will perform for a given DBMS configuration.
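The Gaussian Process step can be sketched in a few lines. As the evaluation section below notes, OtterTune's own implementation of this component uses TensorFlow for performance; the scikit-learn version here only illustrates the idea of scoring candidate configurations by predicted performance plus an uncertainty bonus and recommending the best one. The kernel choice, the candidate list, and the exploration weight kappa are assumptions, not OtterTune's actual settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def recommend_config(observed_configs, observed_perf, candidate_configs, kappa=2.0):
    """Fit a GP on (top-k knob settings -> target metric) pairs pooled from the
    mapped workload and the current session, then pick the candidate with the
    best upper-confidence-bound score (assumes higher metric values are better)."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.asarray(observed_configs), np.asarray(observed_perf))
    mean, std = gp.predict(np.asarray(candidate_configs), return_std=True)
    scores = mean + kappa * std          # trade off exploitation and exploration
    best = int(np.argmax(scores))
    return candidate_configs[best], scores[best]

# Hypothetical usage with 3 top-k knobs (values normalized to [0, 1]):
configs = [[0.2, 0.5, 0.1], [0.6, 0.4, 0.3], [0.9, 0.2, 0.7]]
throughput = [950.0, 1210.0, 1100.0]
candidates = [[0.5, 0.5, 0.5], [0.8, 0.3, 0.6], [0.7, 0.45, 0.4]]
print(recommend_config(configs, throughput, candidates))
```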

Experimental Evaluation

OtterTune's system is written in Python. The ML algorithms used in the Workload Characterization and Knob Identification components are implemented using scikit-learn. These algorithms are frequently recomputed by background processes as new data becomes available in OtterTune's repository. For instance, the algorithms from the Automatic Tuning component are recomputed after each observation period; we implemented these algorithms using TensorFlow, as performance is more of a consideration here. We integrated OtterTune's controller with a benchmarking framework, called OLTP-Bench, to collect data about the DBMS's hardware, knob configurations, and runtime performance metrics.

We conducted all our experiments on Amazon EC2. Each experiment consists of two instances: OtterTune's controller and the target DBMS deployment. For these, we used the m4.large and m3.xlarge instance types, respectively. We deployed OtterTune's tuning manager and repository on a local server with 20 cores and 128GB RAM.

Our evaluation compares the performance of MySQL and Postgres when using the best configuration selected by OtterTune versus ones selected by human DBAs and open-source tuning advisor tools. We also compare OtterTune's configurations with those created for Amazon RDS-managed DBMSs deployed on the same EC2 instance type as the rest of our experiments. For this comparison, we use the TPC-C workload, which is the current industry standard for evaluating the performance of OLTP systems.

MySQL

The results in Figure 3 show that MySQL achieves 22-35% better throughput and approximately 60% better latency when using the best configuration generated by OtterTune versus ones generated by the tuning script and Amazon RDS for the TPC-C workload. We also see that OtterTune generates a configuration that is almost as good as the DBA's.

We find that MySQL has a small number of knobs that have a large impact on its performance for the TPC-C workload. The configurations generated by OtterTune and the DBA provided good settings for each of these knobs. Amazon RDS performs slightly worse because it provides a sub-optimal setting for one of the knobs. The tuning script's configuration performs the worst because it only modifies one knob.

Figure 3: Efficacy Comparison (MySQL). 99th percentile latency (ms) and throughput (txn/sec) for the default configuration, OtterTune, the tuning script, the DBA, and RDS-config.

%-tile system, take a look at our paper [1] and the tuning script, and ~32% higher 500 or check out the code [2] on Github. 99th performance compared to RDS. We are currently building a website 0 Similar to MySQL, there are only a to provide OtterTune as an online- 1000 few knobs that impact Postgres’ per- tuning service. formance for the TPC-C workload. 750 But in this case, the configurations References txn/sec)

Conclusion

OtterTune is an automatic DBMS configuration tool that reuses training data gathered from previous tuning sessions to tune new DBMS deployments. This drastically reduces tuning time, as OtterTune does not need to generate an initial data set for training its ML models.

Our next step in this research is to enable OtterTune to automatically detect the hardware capabilities of the target DBMS without having remote access to the DBMS's host machine. This restriction is necessary due to the growing popularity of DBaaS deployments where such access is not available.

For more details about the OtterTune system, take a look at our paper [1] or check out the code [2] on GitHub. We are currently building a website to provide OtterTune as an online tuning service.

References

[1] Automatic Database Management System Tuning Through Large-scale Machine Learning. Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang. ACM SIGMOD International Conference on Management of Data, May 14-19, 2017. Chicago, IL.

[2] https://github.com/cmu-db/ottertune

BIG-LEARNING SYSTEMS MEET CLOUD COMPUTING

Greg Ganger and the PDL Big-Learning Group

Statistical machine learning (ML) has become a primary data processing activity for business, science, and online services that attempt to extract insight from observation (training) data. Generally speaking, ML algorithms iteratively process training data to determine model parameter values that make an expert-chosen statistical model best fit it. Once fit (trained), such models can predict outcomes for new data items based on selected characteristics (e.g., for recommendation systems), correlate effects with causes (e.g., for genomic analyses of diseases), label new media files (e.g., which ones are funny cat videos?), and so on.

Given ever-growing quantities of data being collected and stored, combined with ubiquitous attempts to discover and employ new ways of leveraging it, it is not surprising that there is work on ML systems at every scale. PDL researchers have been at the forefront of systems support for large-scale ML, which we refer to as "big-learning systems," and exciting new questions have arisen around how best to adapt to and exploit properties of cloud computing. The convergence of big-learning systems and cloud computing brings new challenges, like the increased straggler effects addressed in our FlexRR work, and many new opportunities. This short article highlights two of our newest works in this space, focused on exploiting transient resource availability and on big-learning for geo-distributed data, respectively.

Proteus: Agile ML Elasticity for Dynamic Resource Markets

ML model training is often quite resource intensive, requiring hours on 10s or 100s of cores to converge on a solution. As such, it should be able to exploit any available extra resources or cost savings. Many modern compute infrastructures offer a great opportunity: transient availability of cheap but revocable resources. For example, Amazon EC2's spot market and Google Compute Engine's preemptible instances often allow customers to use machines at a 70–80% discount off the regular price, but with the risk that they can be taken away at any time. Similarly, many cluster schedulers allow lower-priority jobs to use resources provisioned but not currently needed to support business-critical activities, taking the resources away when those activities need them. ML model training could often be faster and/or cheaper by aggressively exploiting such revocable resources.

Unfortunately, efficient modern frameworks for parallel ML, such as TensorFlow, MXNet, and Petuum, are not designed to exploit transient resources. Most use a parameter server architecture, in which parallel workers process training data independently and use a specialized key-value store for shared state, offloading communication and synchronization challenges from ML app writers. Like MPI-based HPC applications, these frameworks generally assume that the set of machines is fixed, optimizing aggressively for the no-machine-failure and no-machine-change case (and restarting the entire computation from the last checkpoint on any failure). So, using revocable machines risks significant rollback overhead, and adding newly available machines to a running computation is often not supported.
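
For readers unfamiliar with the parameter-server pattern mentioned above, the sketch below shows its essence: workers compute updates on their own shards of training data and push them to a shared key-value store of model parameters, pulling fresh values each iteration. This is a generic, single-process illustration with made-up names and data, not the API of Proteus, TensorFlow, MXNet, or Petuum.

```python
import numpy as np

class ParameterServer:
    """Minimal key-value store of model parameters (generic illustration)."""
    def __init__(self, dim):
        self.params = np.zeros(dim)

    def pull(self):
        return self.params.copy()

    def push(self, update):
        self.params += update            # apply a worker's additive update

def worker_step(ps, data_shard, labels, lr=0.1):
    """One worker iteration: pull parameters, compute a local least-squares
    gradient on its shard, and push the resulting update."""
    w = ps.pull()
    grad = data_shard.T @ (data_shard @ w - labels) / len(labels)
    ps.push(-lr * grad)

ps = ParameterServer(dim=4)
rng = np.random.default_rng(0)
shards = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(3)]
for _ in range(10):                      # workers would normally run in parallel
    for X, y in shards:
        worker_step(ps, X, y)
```
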

Proteus is a new parameter server system that combines agile elasticity with aggressive acquisition strategies to exploit transient revocable resources. As one example of how well this works, we find that using three on-demand instances and up to 189 spot market instances allows Proteus to reduce cost by 85% when compared to using only on-demand instances, even when accounting for spot market variation and revocations, while running 24% faster.

Figure: Cost and time benefits of Proteus. This graph shows average cost (left axis) and runtime (right axis) for running the MLR application in the AWS EC2 US-EAST-1 Region. The three configurations shown are: 128 on-demand machines; 128 spot market machines using checkpoint/restart for dealing with evictions and a standard strategy of bidding the on-demand price; and Proteus using 3 on-demand and up to 189 spot market machines. Proteus reduces cost by 85% relative to using all on-demand machines and by ≈50% relative to the checkpointing-based scheme.

Proteus consists of two principal components: AgileML and BidBrain. The AgileML parameter-server system achieves agile elasticity by explicitly combining multiple reliability tiers, with core functionality residing on more reliable resources (non-transient resources, like on-demand instances on EC2) and most work performed on transient resources. This allows quick and efficient scaling, including expansion when resources become available and bulk extraction of revoked transient resources without big delays for rolling back state or recovering lost work. AgileML transitions among different modes/stages as transient resources come and go. When the ratio of transient to non-transient machines is small (e.g., 2-to-1), it simply distributes the parameter server functionality across only the non-transient machines, instead of across all machines, as is the usual approach. For much larger ratios (e.g., 63-to-1), the one non-transient machine would be a bottleneck in that configuration. In that case, AgileML uses the non-transient machine(s) as on-line backup parameter servers to active primary parameter servers that run on transient machines. Updates are coalesced and streamed from actives to backups in the background, at a rate that the network bandwidth can accommodate.

BidBrain is Proteus's resource allocation component that decides when to acquire and yield transient resources. BidBrain is specialized for EC2, exploiting spot market characteristics in its policies, but its general approach could apply to other environments with transient resources (e.g., private clusters or other cloud costing models). It monitors current market prices for multiple instance types, which move relatively independently, and bids on new resources when their addition to the current footprint would increase work per dollar. Similarly, resources near the end of an hour may be released if they have become less cost-effective relative to others. As part of its considerations, BidBrain estimates the probability of getting free compute cycles due to instance revocation within the billing hour (with later in the hour being better than earlier) for different bids and spot market conditions. Simultaneously considering costs (e.g., revocation and scaling inefficiencies) and benefits (e.g., cheaper new resources), BidBrain finds a happy medium between aggressive bidding on transient resources and more conservative choices.
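
BidBrain's acquisition rule can be illustrated with a simplified "work per dollar" comparison: acquire a candidate allocation only if adding it raises the expected work completed per dollar of the whole footprint. The throughput and price numbers below are placeholders, and the real BidBrain also folds in revocation probabilities, free-computation estimates, and billing-hour boundaries, so treat this as a sketch of the criterion rather than the actual policy.

```python
def work_per_dollar(throughput_units_per_hr, cost_per_hr):
    """Expected work completed per dollar for a set of machines."""
    return throughput_units_per_hr / cost_per_hr if cost_per_hr > 0 else float("inf")

def should_acquire(current, candidate):
    """Acquire the candidate allocation only if it improves the footprint's
    work-per-dollar (a simplified stand-in for BidBrain's policy)."""
    combined = (current[0] + candidate[0], current[1] + candidate[1])
    return work_per_dollar(*combined) > work_per_dollar(*current)

# (throughput units/hour, $/hour) for the current footprint and a spot offer:
on_demand_footprint = (300.0, 3 * 0.10)        # e.g., 3 on-demand machines
spot_offer = (3000.0, 32 * 0.03)               # e.g., 32 cheap spot machines
print(should_acquire(on_demand_footprint, spot_offer))   # True in this toy example
```
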
Our experiments using three data centers, Gaia employs a new syn- tion within the billing hour (with later state-of-the-art distributed ML sys- chronization model, the Approximate in the hour being better than earlier) tems (Bösen, IterStore, and GeePS) Synchronous Parallel (ASP) model, for different bids and spot market show that operating these systems which makes more efficient use of conditions. Simultaneously consider- across as few as two data centers (over the scarce and heterogeneous WAN ing costs (e.g., revocation and scaling WANs) can cause a slowdown of 1.8– bandwidth. By ensuring that each ML inefficiencies) and benefits (e.g., 53.7× relative to their performance model copy that exists in different data cheaper new resources), BidBrain within a data center (over LANs), as centers is approximately correct based finds a happy medium between aggres- the communication overwhelms the on a precise notion, ASP maintains sive bidding on transient resources and limited WAN bandwidth. Existing ML algorithm convergence guarantees. more conservative choices. systems that do address challenges in ASP is based on a key finding that the geo-distributed data analytics focus on Experiments with three real ML appli- vast majority of updates to the global bulk data querying rather than the so- cations confirm that Proteus achieves ML model parameters from each ML phisticated ML algorithms commonly significant cost and runtime reduc- worker machine are insignificant. For run on big-learning systems. tions. AgileML’s elasticity support example, our study of three classes of introduces negligible performance Gaia is a new ML system for geo- ML algorithms shows that more than overhead, scales well, and suffers min- distributed data that (1) employs an 95% of the updates produce less than a imal disruption during bulk addition intelligent communication mechanism or removal of transient machines. In over WANs to efficiently utilize scarce continued on page 13

SPRING 2017 13 DEFENSES & PROPOSALS

DISSERTATION ABSTRACT: meeting tail latency SLOs in networked of applications, ranging from typical Meeting Tail Latency SLOs in storage infrastructures. Our system graphics applications to general pur- Shared Networked Storage uses prioritization and rate limiting pose data parallel (GPGPU) applica- as tools for controlling the congestion tions. However, this success has been Timothy Zhu between workloads. We introduce a accompanied by new performance Carnegie Mellon University, SCS novel approach for intelligently con- bottlenecks throughout the memory figuring the workload priorities and hierarchy of GPU-based systems. This PhD Defense — May 3, 2017 rate limits using two different types of dissertation identifies and eliminates Shared computing infrastructures queueing analyses: Deterministic Net- such performance bottlenecks caused (e.g., cloud computing, enterprise work Calculus (DNC) and Stochastic by interference throughout the mem- datacenters) have become the norm to- Network Calculus (SNC). By inte- ory hierarchy. day due to their lower operational costs grating these mathematical analyses Specifically, we provide an in-depth and IT management costs. However, into our system, we are able to build analysis of inter- and intra-application resource sharing introduces challenges better algorithms for optimizing the as well as inter-address-space inter- in controlling performance for each of resource usage. Our implementation ference that significantly degrade the the workloads using the infrastructure. results using realistic workload traces performance and efficiency of GPU- For user-facing workloads (e.g., web on a physical cluster demonstrate that based systems. server, email server), one of the most our approach can meet tail latency To minimize such interference, we in- important performance metrics com- SLOs while achieving better resource troduce changes to the memory hierar- panies want to control is tail latency, utilization than the state-of-the-art. chy for systems with GPUs that allow the the time it takes to complete the most While this thesis focuses on schedul- memory hierarchy to be aware of both delayed requests. Ideally, companies ing policies, admission control, and CPU and GPU applications’ memory would be able to specify tail latency workload placement in storage and access characteristics. We introduce performance goals, also called Service networks, the ideas presented in our mechanisms to dynamically analyze dif- Level Objectives (SLOs), to ensure work can be applied to other related ferent applications’ characteristics and that almost all requests complete problems such as workload migration propose four major changes through- quickly. and datacenter provisioning. Our out the memory hierarchy. Meeting tail latency SLOs is challeng- theoretically grounded techniques for First, we introduce Memory Diver- ing for multiple reasons. First, tail controlling tail latency can also be ex- gence Correction (MeDiC), a cache latency is significantly affected by the tended beyond storage and networks to management mechanism that miti- burstiness that is commonly exhibited other contexts such as the CPU, cache, gates intra-application interference by production workloads. Burstiness etc. For example, in real-time CPU in GPGPU applications by allowing leads to transient queueing, which scheduling contexts, our DNC-based the shared L2 cache and the memory is a major cause of high tail latency. 
techniques could be used to provide controller to be aware of the GPU’s Second, tail latency is often due to I/O strict latency guarantees while account- warp-level memory divergence char- (e.g., storage, networks), and I/O de- ing for workload burstiness. acteristics. MeDiC uses this warp-level vices exhibit performance peculiarities memory divergence information to that make it hard to meet SLOs. Third, DISSERTATION ABSTRACT: give more cache and memory band- the end-to-end latency is affected by Techniques for Shared width resources to warps that benefit sum of latencies across multiple types Resource Management most from utilizing such resources. of resources such as storage and net- in Systems with Graphics Our evaluations show that MeDiC works. Most of the existing research, Processing Units significantly outperforms multiple however, have ignored burstiness and state-of-the-art caching policies pro- focused on a single resource. Rachata Ausavarungnirun posed for GPUs. This thesis introduces new techniques Carnegie Mellon University, ECE Second, we introduce the Staged for meeting end-to-end tail latency PhD Defense — May 2, 2017 Memory Scheduler (SMS), an ap- SLOs in both storage and networks while accounting for the burstiness The continued growth of the compu- plication-aware CPU-GPU memory that arises in production workloads. tational capabilities, along with con- request scheduler that mitigates inter- We address open questions in schedul- siderable improvements in the pro- application interference in hetero- geneous CPU-GPU systems. SMS ing policies, admission control, and grammability of Graphics Processing creates a fundamentally new approach workload placement. We build a new Units (GPUs), have made GPUs the Quality of Service (QoS) system for platform of choice for a wide variety continued on page 15

14 THE PDL PACKET DEFENSES & PROPOSALS continued from page 14 to memory controller design that de- DISSERTATION ABSTRACT: do not fit in GPU memory. MLtuner couples the memory controller into Exploiting Application is a system for automatically tuning three significantly simpler structures, Characteristics for Efficient the training tunables of ML tasks. It each of which has a separate task and System Support of Data- exploits the characteristic that the best all of which operate together to greatly tunable settings can often be decided Parallel Machine Learning improve both system performance and quickly with just a short trial time. By fairness. Our evaluations show that Henggang Cui making use of optimization-guided SMS not only reduces inter-applica- Carnegie Mellon University, ECE online trial-and-error, MLtuner can tion interference caused by the GPU, robustly find and re-tune tunable set- thereby improving heterogeneous PhD Defense — April 6, 2017 tings for a variety of machine learning system performance, but also provides Large scale machine learning has many applications, including image classifi- better scalability and power efficiency characteristics that can be exploited in cation, video classification, and matrix compared to multiple state-of-the-art the system designs to improve its effi- factorization, and is over an order memory schedulers. ciency. This dissertation demonstrates of magnitude faster than traditional Third, we redesign the GPU memory that, the characteristics of the ML hyper-parameter tuning approaches. management unit to efficiently handle computations can be exploited in the DISSERTATION ABSTRACT: new problems caused by the massive design and implementation of param- address translation parallelism pres- eter server systems, to greatly improve Scaling Distributed Machine ent in GPU computation units in the efficiency by an order of magnitude Learning with System and multi-GPU-application environments. or more. We support this thesis state- Algorithm Co-design Running multiple GPGPU applica- ment with three case study systems, tions concurrently induces significant Mu Li IterStore, GeePS, and MLtuner. Carnegie Mellon University, ML inter-core thrashing on the shared IterStore is an optimized parameter TLB, a new phenomenon that we call server system design that exploits the PhD Defense — February 2, 2017 inter-address-space interference. To repeated data access pattern char- reduce this interference, we introduce acteristic of ML computations. The Due to the rapid growth of data and Multi Address Space Concurrent Ker- designed optimizations allow IterStore the ever increasing model complexity, nels (MASK). MASK introduces TLB- to reduce the total run time of our which often manifests itself in the large awareness throughout the GPU memory ML benchmarks by up to 50X. GeePS number of model parameters, today, hierarchy and introduces TLB- and is a parameter server that is special- many important machine learning cache-bypassing techniques to increase ized for deep learning on distributed problems cannot be efficiently solved the effectiveness of a shared TLB. GPUs. By exploiting the layer-by-layer by a single machine. 
Distributed op- Finally, we introduce MOSAIC, a hard- data access and computation pattern timization and inference is becoming ware-software cooperative technique of deep learning, GeePS provides more and more inevitable for solving that further increases the effectiveness almost linear scalability from single- large scale machine learning problems of TLB by modifying the memory al- machine baselines (13 more training in both academia and industry. How- location policy in the system software. throughput with 16 machines), and ever, obtaining an efficient distributed MOSAIC introduces a high-throughput also supports neural networks that implementation of an algorithm, is far method to support large pages in multi- from trivial. Both intensive compu- GPU-application environments. Our tational workloads and the volume of evaluations show that the MASK-MO- data communication demand careful SAIC combination provides a simple design of distributed computation sys- mechanism that eliminates the perfor- tems and distributed machine learning mance overhead of address translation algorithms. In this thesis, we focus on in GPUs without significant changes to the co-design of distributed comput- GPU hardware, thereby greatly improv- ing systems and distributed optimiza- ing GPU system performance. tion algorithms that are specialized for large machine learning problems. The key conclusion of this disserta- In the first part, we propose two distrib- tion is that a combination of GPU- uted computing frameworks: Parameter aware cache and memory management Kevin Hsieh tells Michael Kozuch of Intel Server, a distributed machine learning techniques can effectively mitigate the about his research on “Gaia: Geo-Distributed framework that features efficient data memory interference on current and Machine Learning Approaching LAN future GPU-based systems. Speeds”at a retreat poster session. continued on page 16

SPRING 2017 15 DEFENSES & PROPOSALS continued from page 15 communication between the machines; part approach for CFI: (i) prescrib- enabling programmers to customize MXNet, a multi-language library that ing source-code safety checks, that them and observe that their software aims to simplify the development of prevent the root-causes of CFI, that is not inadvertently changed–with deep neural network algorithms. We programmers can insert themselves, machine-code proofs of CFI that can have witnessed the wide adoption of the and (ii) formally proving CFI for be automated, and that does not re- two proposed systems in the past two the machine-code of programs with quire a trusted or verified compiler to years. They have enabled and will con- source-code safety-checks. First, our ensure its proven properties hold in tinue to enable more people to harness prescribed safety-checks, when ap- machine-code. the power of distributed computing to plied, prevent the root-causes of CFI, We evaluate our CFI approach on re- design efficient large-scale machine thereby enabling software to recover alistic embedded software. We evaluate learning applications. from CFI violations in a customiz- our approach on the MiBench and In the second part, we examine a able way. In addition, our prescribed WCET benchmarks, implementations number of distributed optimization safety-checks are visible to program- of common file utilities, and programs problems in machine learning, lever- mers, empowering them to ensure interfacing with hardware inputs and aging the two computing platforms. We that the behavior of their software outputs on the Raspberry Pi single- present new methods to accelerate the is not inadvertently changed by the board-computer. The variety of our tar- training process, such as data parti- prescribed safety-checks. However, get programs, and our ability to support tioning with better locality properties, programmer-inserted safety-checks useful features such as file and hardware communication friendly optimization may be incomplete. Thus, current inputs and outputs, demonstrate the methods, and more compact statistical techniques for proving CFI, which as- wide applicability of our approach. models. We implement the new algo- sume that safety-checks are complete, rithms on the two systems and test on may not work. Second, this disserta- DISSERTATION ABSTRACT: large scale real data sets. We successfully tion develops a logic approach that System Infrastructure for demonstrate that careful co-design of automates formal proofs of CFI for the Mobile-Cloud Convergence computing systems and learning algo- machine-code of software containing rithms can greatly accelerate large scale both source-code CFI safety-checks Kiryong Ha distributed machine learning. and system calls. We extend an exist- Carnegie Mellon University, SCS ing trustworthy Hoare logic with new PhD Defense — October 20, 2016 DISSERTATION ABSTRACT: proof rules, proof tactics, and a novel Prescriptive Safety-Checks proof-search algorithm, which ex- The convergence of mobile comput- ploit the principle of local reasoning through Automated Proofs for ing and cloud computing enables new for safety properties to automatically Control-Flow Integrity mobile applications that are both re- generate CFI proofs for the machine- source-intensive and interactive. For Jiaqi Tan code of programs compiled with our these applications, end-to-end network Carnegie Mellon University, ECE prescribed source-code safety-checks. 
bandwidth and latency matter greatly To the best of our knowledge, our ap- when cloud resources are used to aug- PhD Defense — November 2016 proach to CFI is the first to combine ment the computational power and Embedded software today is pervasive: programmer-visible source-code battery life of a mobile device. This dis- they can be found everywhere, from enforcement mechanisms for CFI– sertation designs and implements a new coffee makers and medical devices, to architectural element called a cloudlet, cars and aircraft. Embedded software that arises from the convergence of mo- today is also open and connected to the bile computing and cloud computing. Internet, exposing them to external Cloudlets represent the middle tier of a attacks that can cause its Control-Flow 3-tier hierarchy, mobile device - cloudlet Integrity (CFI) to be violated. Control- - cloud, to achieve the right balance be- Flow Integrity is an important safety tween cloud consolidation and network property of software, which ensures that responsiveness. We first present quanti- the behavior of the software is not inad- tative evidence that shows cloud location vertently changed. The violation of CFI can affect the performance of mobile in software can cause unintended behav- applications and cloud consolidation. iors, and can even lead to catastrophic Abutalib Aghayev presents his research on We then describe an architectural solu- incidents in safety-critical systems. “Evolving ext4 for SMR Disks” at the 2016 tion using cloudlets that are a seamless This dissertation develops a two- PDL Retreat. continued on page 17

16 THE PDL PACKET DEFENSES & PROPOSALS continued from page 16 extension of today’s cloud comput- erences and flexibility in a general, DISSERTATION ABSTRACT: ing infrastructure. Finally, we define extensible way by using a declarative, Practical Data Compression for minimal functionalities that cloudlets composable, intuitive algebraic expres- Modern Memory Hierarchies must offer above/beyond standard sion structure. Second, building on cloud computing, and address corre- Gennady G. Pekhimenko sponding technical challenges. the generality of STRL, we propose an Carnegie Mellon University, SCS equally general STRL Compiler that PhD Defense — July 1, 2016 DISSERTATION ABSTRACT: automatically compiles STRL expres- Scheduling with Space- sions into Mixed Integer Linear Pro- Although compression has been widely Time Soft Constraints in gramming (MILP) problems that can be used for decades to reduce file sizes Heterogeneous Cloud aggregated and solved to maximize the (conserving storage capacity and network Datacenters overall value of shared cluster resources. bandwidth when moving files), there has been little to no use within modern Alexey Tumanov These theoretical contributions form memory hierarchies. Why not? As pro- Carnegie Mellon University, ECE the foundation for the system we grams become increasingly data-intensive, PhD Defense — July 25, 2016 architect, called TetriSched, that the capacity and bandwidth within the instantiates our conceptual contribu- memory hierarchy (including caches, Heterogeneity in modern datacenters is tions: (a) declarative soft constraints, main memory, and their associated inter- on the rise, in hardware resource char- (b) space-time soft constraints, (c) connects) are becoming increasingly im- acteristics, in workload characteristics, portant bottlenecks. If data compression and in dynamic characteristics (e.g., a combinatorial constraints, (d) order- less global scheduling, and (e) in situ were applied successfully to the memory memory-resident copy of input data). hierarchy, it could relieve pressure on preemption. We also propose a set of As a result, which machines are assigned these bottlenecks by increasing effective to a given job can have a significant im- mechanisms that extend the scope and capacity, increasing effective bandwidth, pact. For example, a job may run faster the practicality of TetriSched’s deploy- even reducing energy consumption. on the same machine as its input data or ment by analyzing and improving on In this thesis, I describe a new, practical with a given hardware accelerator, while its scalability, enabling and studying approach to integrating data compres- still being runnable on other machines, the efficacy of preemption, and fea- sion within the memory hierarchy, in- albeit less efficiently. Heterogeneity turing a set of runtime misestimation cluding on-chip caches, main memory, takes on more complex forms as sets of handling mechanisms to address run- and both on-chip and off-chip inter- resources differ in the level of perfor- time prediction inaccuracy. connects. This new approach is fast, mance they deliver, even if they consist simple, and effective in saving storage of identical individual units, such as In collaboration with Microsoft, we space. A key insight in our approach adapt some of these ideas as we design with rack-level locality. We refer to this is that access time (including decom- as combinatorial heterogeneity. 
Mixes and implement a heterogeneity-aware pression latency) is critical in modern of jobs with strict SLOs on completion resource reservation system called memory hierarchies. By combining in- time and increasingly available runtime Aramid with support for ordinal place- expensive hardware support with mod- estimates in production datacenters ment preferences targeting deployment est OS support, our holistic approach deepen the challenge of matching the in production clusters at Microsoft to compression achieves substantial im- right resources to the right workloads scale. A combination of simulation and provements in performance and energy at the right time. real cluster experiments with synthetic efficiency across the memory hierarchy. In this dissertation, we hypothesize that and production-derived workloads, a In addition to exploring compression- it is possible and beneficial to simulta- related issues and enabling practical range of workload intensities, degrees neously leverage all of this information solutions in modern CPU systems, of burstiness, preference strengths, in the form of declaratively specified we discover new problems in realizing spacetime soft constraints. To accom- and input inaccuracies support our hy- hardware-based compression for GPU- plish this, we first design and develop pothesis that leveraging space-time soft based systems and develop new solutions our principal building block—a novel constraints (a) significantly improves to solve these problems. Space-Time Request Language (STRL). scheduling quality and (b) is possible It enables the expression of jobs’ pref- to achieve in a practical deployment. continued on page 18

SPRING 2017 17 DEFENSES & PROPOSALS continued from page 17 THESIS PROPOSAL: For a lot of important machine learning THESIS PROPOSAL: Practical Concurrency Testing problems, due to the rapid growth of Architectural Techniques data and the ever increasing model com- for Improving NAND Flash Benjamin Blum, SCS plexity, which often manifests itself in Memory Reliability April 25, 2017 the large number of model parameters, Concurrent programming presents a no single machine can solve them fast Yixin Luo, SCS challenge to students and experts alike enough. Thus, distributed optimization August 5, 2016 because of the complexity of multi- and inference is becoming more in- Raw bit errors are common in NAND threaded interactions and the difficulty evitable for solving large scale machine flash memory and will increase in learning problems in both academia to reproduce and reason about bugs. the future. These errors reduce flash and industry. Obtaining an efficient Stateless model checking is a concurren- reliability and limit the lifetime of a distributed implementation of an algo- cy testing approach which forces a pro- flash memory device. This proposal gram to interleave its threads in many rithm, however, is far from trivial. Both aims to improve flash reliability with different ways, checking for bugs each intensive computational workloads and a multitude of low-cost architectural time. This technique is powerful, in the volume of data communication principle capable of finding any nonde- demand careful design of distributed techniques. Our thesis statement is: terministic bug in finite time, but suffers computation systems and distributed NAND flash memory reliability can be from exponential explosion as program machine learning algorithms. improved at low cost and with low per- size increases. Checking an exponential In this thesis, we focus on the co-design formance overhead by deploying various number of thread interleavings is not of distributed computing systems and architectural techniques that are aware a practical or predictable approach for distributed optimization algorithms of higher-level application behavior and programmers to find concurrency bugs that are specialized for large machine underlying flash device characteristics. before their project deadlines. learning problems. We propose two Our proposed approach is to under- In this thesis, I propose to make state- distributed computing frameworks: stand flash error characteristics and less model checking more practical a parameter server framework which workload behavior through charac- features efficient data communication, for human use by way of several new terization, and to design smart flash and MXNet, a multi-language library techniques. I have built Landslide, a controller algorithms that utilize this aiming to simplify the development of stateless model checker specializing understanding to improve flash reli- in student projects for undergraduate deep neural network algorithms. In less than two years, we have witnessed ability. We propose to investigate four operating systems classes. Landslide directions through this approach. (1) includes a novel algorithm for auto- the wide adoption of the proposed Our preliminary work proposes a new matically managing multiple state spac- systems. 
We believe that as we continue es according to their estimated comple- to develop these systems, they will en- technique that improves flash reli- tion times, which I will show quickly able more people to take advantage of ability by 12.9 times by managing flash finds bugs should they exist and also the power of distributed computing retention differently for write-hot data quickly verifies correctness otherwise. to design efficient machine learning and write-cold data. (2) We propose I will evaluate Landslide’s suitability applications to solve large-scale com- to characterize and model flash errors for inexpert use by presenting the re- putational problems. Leveraging the on new flash chips. (3) We propose to sults of many semesters providing it to two computing platforms, we examine develop a technique to construct a flash students in 15-410, CMU’s Operating a number of distributed optimization error model online and improve flash System Design and Implementation problems in machine learning. We lifetime by exploiting our online model. present new methods to accelerate the class. Finally, I will explore broader (4) We propose to understand and de- training process, such as data parti- impact by extending Landslide to test velop new techniques that utilize flash tioning with better locality properties, some real-world programs and to be self-healing effect. We hope that these used by students at other universities. communication friendly optimization methods, and more compact statistical four directions will allow us to achieve THESIS PROPOSAL: models. We implement the new algo- higher flash reliability at low cost. Scaling Distributed Machine rithms on the two systems and test on Learning with System and large scale real data sets. We successfully Algorithm Co-design demonstrate that careful co-design of computing systems and learning algo- Mu Li, SCS rithms can greatly accelerate large scale October 14, 2016 distributed machine learning.

18 THE PDL PACKET NEW PDL FACULTY

Nathan Beckmann nearby cache banks. This work spans System”, won the William A. Martin cache partitioning, dynamic NUCA, Memorial Thesis Award for an out- The PDL would like to welcome Na- heterogeneous memories, replacement standing Master’s thesis. He received than Beckmann to our family! Nathan policies, etc to reduce overall data his Bachelor of Science from the is an Assistant Professor at CMU in movement. A common theme is the University of California, Los Ange- the Department of Computer Science. use of analytical models in software to les, graduating summa cum laude in Nathan works in computer architec- control simple, efficient, but “dumb” Computer Science and Mathematics ture, with experience in distributed hardware. This analytical approach of Computation and was honored as operating systems and is currently yields robust performance and unex- the Bachelor of the Year (2008) in focused on reducing data movement pected insights. Following graduation, Computer Science. in multicores, which uses over half of he stayed for a one-year postdoc. the energy available in current chips, by redesigning caches so they dynami- Before his PhD, Nathan worked in cally adopt an application-specific Anant Agarwal’s CSAIL group on fos, organization. a distributed operating system for multicore and clouds (2008-2011). Nathan received his PhD from MIT’s Computer Science and Artificial In- He also briefly worked with Frans telligence Laboratory (CSAIL)(2012- Kaashoek and Nickolai Zeldovich in 2015), advised by Daniel Sanchez. His the Parallel & Distributed Operating PhD thesis, “Design and Analysis of Systems Group (2011-2012). A long Spatially-Partitioned Shared Caches,” time ago, he also worked on Graphite, received the 2015 Sprowls Doctoral a distributed multicore simulator. Thesis Prize for the “best PhD thesis Other interests include math and in computer science at MIT”. This cryptography. work exposes physical cache banks to Nathan received his masters degree in software, letting the operating system 2010 from MIT. His thesis, “Distrib- schedule applications’ working sets in uted Naming in a Factored Operating

ALUMNI NEWS

Vinaykumar Bhat (MS, INI ’15) in dual-degree programs with nearby Niraj Tolia (PhD, SCS ‘07) schools that do offer one. His role will PDL Alumni Vinaykumar Bhat (2014- be to provide Engineering experiences Niraj, who currently resides in the 15) and Preeti Murthy (also a CMU at Mount Holyoke that help develop San Francisco Bay area, has recently ECE graduate), who both took the interest, facilitate transition into the founded a new startup (Kasten, kas- much-in-demand PDL Storage Sys- dual-degree programs, and help inte- ten.io), working on container-based/ tems class, were married on March grate their skills back into the Mount cloud-native storage infrastructure. 16th. Vinay currently works for Ora- While they are still in stealth mode, cle’s in-memory database group. Holyoke College community. He also got a new dog. Her name is Hope. they are hiring and have other PDL alumni on board too! Peter Klemperer (PhD, ECE ‘14) Atreyee Maiti (MS, SCS ‘14) Starting this coming Fall, Peter will Amber Palekar (MS, INI ‘06) Atreyee, now a Senior MTS with Nu- be entering a new role as an Assistant Amber has moved on from his position Professor of Computer Science at tanix, recently had a paper accepted as a file system engineer at NetApp. He Mount Holyoke College in South Had- by the International Conference on is now working at , moving ley, MA. He will be leading a 3-year Cloud Engineering (IC2E) 2017 in pilot into whether Engineering has a the industrial track. The paper “Quest: over as a part of Nutanix’s acquisition role at Mount Holyoke. There is no Search-driven Management of Cloud- of PernixData. He says he is having Engineering Department at Mount Scale Data Centers” was presented in some startup fun at the moment! Holyoke yet, but many students enroll April in Vancouver, B.C.

SPRING 2017 19 RECENT PUBLICATIONS continued from page 7 to extract the data stored within the Proteus: Agile ML Elasticity An Empirical Evaluation of device. Instead, investigators turn to through Tiered Reliability in In-Memory Multi-Version chip-off analysis, where they use a Dynamic Resource Markets Concurrency Control thermal-based procedure to physically remove the NAND flash memory chip Aaron Harlap, Alexey Tumanov, Yingjun Wu, Joy Arulraj, Jiexi Lin, Ran from the device, and access the chip Andrew Chung, Greg Ganger & Phil Xian & Andrew Pavlo Gibbons directly to extract the raw data stored Proceedings of the VLDB Endowment, on the chip. ACM European Conference on Com- vol. 10, iss. 7, pages. 781—792, March We perform an analysis of the er- puter Systems, 2017 (EuroSys’17), 2017. rors introduced into multi-level cell 23rd-26th April, 2017, Belgrade, Multi-version concurrency control (MLC) NAND flash memory chips Serbia. Supersedes Carnegie Mellon (MVCC) is currently the most popular after the device has been seized. We University Parallel Data Lab Techni- transaction management scheme in make two major observations. First, cal Report CMU-PDL-16-102. May modern database management systems between the time that a device is seized 2016. (DBMSs). Although MVCC was dis- and the time digital forensic investiga- Many shared computing clusters allow covered in the late 1970s, it is used in tors perform data extraction, a large users to utilize excess idle resources almost every major relational DBMS number of errors can be introduced at lower cost or priority, with the released in the last decade. Maintaining as a result of charge leakage from proviso that some or all may be taken multiple versions of data potentially the cells of the NAND flash memory away at any time. But, exploiting such increases parallelism without sacri- (known as data retention errors). dynamic resource availability and the ficing serializability when processing Second, when thermal-based chip often fluctuating markets for them transactions. But scaling MVCC in removal is performed, the number requires agile elasticity and effective a multi-core and in-memory setting of errors in the data stored within acquisition strategies. Proteus aggres- is non-trivial: when there are a large NAND flash memory can increase by sively exploits such transient revocable number of threads running in parallel, two or more orders of magnitude, as resources to do machine learning the synchronization overhead can out- the high temperature applied to the (ML) cheaper and/or faster. Its pa- weigh the benefits of multi-versioning. chip greatly accelerates charge leak- rameter server framework, AgileML, To understand how MVCC perform age. We demonstrate that the chip-off efficiently adapts to bulk additions when processing transactions in mod- analysis based forensic data recovery and revocations of transient machines, ern hardware settings, we conduct an procedure is quite destructive, and can through a novel 3-stage active-backup extensive study of the scheme’s four key often render most of the data within approach, with minimal use of more design decisions: concurrency control NAND flash memory uncorrectable, costly non-transient resources. Its protocol, version storage, garbage col- and, thus, unrecoverable. BidBrain component adaptively al- lection, and index management. 
We im- To mitigate the errors introduced locates resources from multiple EC2 plemented state-of-the-art variants of during the forensic recovery process, spot markets to minimize average cost all of these in an in-memory DBMS and we explore a new hardware-based ap- per work as transient resource avail- evaluated them using OLTP workloads. proach. We exploit a fine-grained read ability and cost change over time. Our Our analysis identifies the fundamental evaluations show that Proteus reduces reference voltage control mechanism bottlenecks of each design choice. cost by 85% relative to non-transient implemented in modern NAND flash pricing, and by 43% relative to previ- Gaia: Geo-Distributed Machine memory chips, called read-retry, which ous approaches, while simultaneously can compensate for the charge leakage Learning Approaching LAN reducing runtimes by up to 37%. that occurs due to (1) retention loss Speeds and (2) thermal-based chip removal. Kevin Hsieh, Aaron Harlap, Nandita The read-retry mechanism success- Vijaykumar, Dimitris Konomis, Gregory fully reduces the number of errors, R. Ganger, Phillip B. Gibbons & Onur such that the original data can be fully Mutlu recovered in our tested chips as long as the chips were not heavily used prior 14th USENIX Symposium on Net- to seizure. We conclude that the read- worked Systems Design and Imple- mentation (NSDI), March 27–29, retry mechanism should be adopted The Proteus architecture consists of the as part of the forensic data recovery resource allocation component, BidBrain, and 2017, Boston, MA. process. the elastic ML framework, AgileML. continued on page 21

20 THE PDL PACKET RECENT PUBLICATIONS continued from page 20 between data centers and eliminates the first work that models edge-servers Data Center 1 Data Center 2 the vast majority of insignificant com- as caches for compute-intensive recog- ❶ Global Global Data Model Copy Model Copy Shard Worker munication while still guaranteeing the nition applications. Machine Parameter Parameter Server Server … correctness of ML algorithms. Our Data Shard Worker experiments on our prototypes of Gaia SoftMC: A Flexible and Machine Parameter Parameter … Data Server ❷ ASP Server running across the 11 Amazon EC2 Practical Open-Source Shard Worker Machine ❸ BSP/SSP global regions and on a cluster that Infrastructure for Enabling emulates EC2 WAN bandwidth show Experimental DRAM Studies Gaia system overview. that Gaia provides 1.8-53.5X speed- up over a state-of-art distributed ML Hasan Hassan, Nandita Vijaykumar, Machine learning (ML) is widely used system, and is within 0.94-1.56X of Samira Khan, Saugata Ghose, Kevin to derive useful information from ML-on-LAN speeds. Chang, Gennady Pekhimenko, large-scale data (such as user activi- Donghyuk Lee, Oguz Ergin & Onur ties, pictures, and videos) generated Towards Edge-caching for Mutlu at increasingly rapid rates, all over the Image Recognition International Symposium on High- world. Unfortunately, it is infeasible Utsav Drolia, Katherine Guo, Jiaqi Tan, Performance Computer Architecture to move all this globally-generated (HPCA), February 2017. data to a centralized data center be- Rajeev Gandhi & Priya Narasimhan DRAM is the primary technology used fore running an ML algorithm over it First Workshop on Smart Edge Com- for main memory in modern sys- -- moving large amounts of raw data puting and Networking (SmartEdge) tems. Unfortunately, as DRAM scales over a wide-area network (WAN) can ‘17, held in conjunction with PerCom down to smaller technology nodes, be extremely slow, and is also subject 2017, March 13 - 17, 2017, Hawaii, it faces key challenges in both data to national privacy law constraints. USA. integrity and latency, which strongly This motivates the need for a geo-dis- With the available sensors on mobile tributed ML system spanning multiple affects overall system reliability and devices and their improved CPU and performance. To develop reliable data centers. Unfortunately, commu- storage capability, users expect their nicating over WANs can significantly and high-performance DRAM-based devices to recognize the surrounding main memory in future systems, it is degrade ML system performance (by as environment and to provide relevant much as 53.7X in our study) because critical to characterize, understand, information and/or content auto- and analyze various aspects (e.g., re- the communication overwhelms the matically and immediately. For such limited WAN bandwidth. liability, latency) of existing DRAM classes of real-time applications, user chips. To enable this, there is a strong Our goal in this work is to develop perception of performance is key. To need for a publicly-available DRAM a geo-distributed ML system that (1) enable a truly seamless experience for testing infrastructure that can flexibly employs an intelligent communication the user, responses to requests need and efficiently test DRAM chips in a mechanism over WANs to efficiently to be provided with minimal user- manner accessible to both software and utilize the scarce WAN bandwidth, perceived latency. hardware developers. 
while retaining the accuracy and cor- Current state-of-the-art systems for rectness guarantees of an ML algo- This paper develops the first such in- these applications require offloading frastructure, SoftMC (Soft Memory rithm; and (2) is generic and flexible requests and data to the cloud. This enough to run a wide range of ML Controller), an FPGA-based testing paper proposes an approach to allow platform that can control and test algorithms, without requiring any users’ devices and their onboard ap- changes to the algorithms. memory modules designed for the plications to leverage resources closer commonly-used DDR (Double Data To this end, we introduce a new, to home, i.e., resources at the edge of Rate) interface. SoftMC has two key general geo-distributed ML system, the network. We propose to use edge- properties: (i) it provides flexibility to Gaia, that decouples the communi- servers as specialized caches for image- thoroughly control memory behavior cation within a data center from the recognition applications. We develop or to implement a wide range of mech- communication between data centers, a detailed formula for the expected la- anisms using DDR commands; and (ii) enabling different communication tency for such a cache that incorporates it is easy to use as it provides a simple and consistency models for each. We the effects of recognition algorithms’ and intuitive high-level programming present a new ML synchronization computation time and accuracy. We interface for users, completely hiding model, Approximate Synchronous show that, counter-intuitively, large the low-level details of the FPGA. Parallel (ASP), which dynamically ad- cache sizes can lead to higher latencies. justs to the available WAN bandwidth To the best of our knowledge, this is continued on page 22

SPRING 2017 21 RECENT PUBLICATIONS continued from page 21 in existing DRAM chips, which dem- Vulnerabilities in MLC NAND onstrates the usefulness of SoftMC in Flash Memory Programming: testing new ideas on existing memory Experimental Analysis, modules. We discuss several other Exploits, and Mitigation use cases of SoftMC, including the Techniques ability to characterize emerging non- volatile memory modules that obey Yu Cai, Saugata Ghose, Yixin Luo, Ken the DDR standard. We hope that our Mai, Onur Mutlu & Erich F. Haratsch open-source release of SoftMC fills a 23rd IEEE Symposium on High Per- gap in the space of publicly-available formance Computer Architecture, (a) System experimental memory testing infra- Industrial session, February 2017. structures and inspires new studies, ideas, and methodologies in memory Modern NAND flash memory chips system design. provide high density by storing two bits of data in each flash cell, called a multi- An Evaluation of Distributed level cell (MLC). An MLC partitions Concurrency Control the threshold voltage range of a flash cell into four voltage states. When a Rachael Harding, Dana Van Aken, flash cell is programmed, a high voltage Andrew Pavlo & Michael Stonebraker is applied to the cell. Due to parasitic (b) Chip capacitance coupling between flash Proceedings of the VLDB Endowment, cells that are physically close to each vol. 10, iss. 5, pages. 553—564, Janu- other, flash cell programming can lead ary 2017. to cell-to-cell program interference, Increasing transaction volumes have which introduces errors into neigh- led to a resurgence of interest in dis- boring flash cells. In order to reduce tributed transaction processing. In the impact of cell-to-cell interference particular, partitioning data across sev- on the reliability of MLC NAND flash eral servers can improve throughput by memory, flash manufacturers adopt a allowing servers to process transactions two-step programming method, which in parallel. But executing transactions programs the MLC in two separate (c) Bank across servers limits the scalability and steps. First, the flash memory partially performance of these systems. programs the least significant bit of the DRAM-based memory system organization. MLC to some intermediate threshold In this paper, we quantify the ef- voltage. Second, it programs the most fects of distribution on concurrency We demonstrate the capability, flex- significant bit to bring the MLC up to control protocols in a distributed ibility, and programming ease of its full voltage state. environment. We evaluate six classic SoftMC with two example use cases. and modern protocols in an in-mem- In this paper, we demonstrate that First, we implement a test that char- ory distributed database evaluation two-step programming exposes new acterizes the retention time of DRAM framework called Deneva, providing reliability and security vulnerabilities. cells. Experimental results we obtain an apples-to-apples comparison be- We experimentally characterize the ef- using SoftMC are consistent with the tween each. Our results expose severe fects of two-step programming using findings of prior studies on retention limitations of distributed transaction contemporary 1X-nm (i.e., 15–19nm) time in modern DRAM, which serves processing engines. Moreover, in our flash memory chips. We find that a as a validation of our infrastructure. analysis, we identify several protocol- partially-programmed flash cell (i.e., Second, we validate two recently- specific scalability bottlenecks. 
We a cell where the second programming proposed mechanisms, which rely conclude that to achieve truly scalable step has not yet been performed) is on accessing recently-refreshed or operation, distributed concurrency much more vulnerable to cell-to-cell recently-accessed DRAM cells faster control solutions must seek a tighter interference and read disturb than a than other DRAM cells. Using our coupling with either novel network fully-programmed cell. We show that infrastructure, we show that the ex- hardware (in the local area) or applica- it is possible to exploit these vulner- abilities on solid-state drives (SSDs) pected latency reduction effect of tions (via data modeling and semanti- these mechanisms is not observable cally-aware execution), or both. continued on page 23

22 THE PDL PACKET RECENT PUBLICATIONS continued from page 22 to alter the partially-programmed FaSST is an RDMA-based system data, causing (potentially malicious) 90000 that provides distributed in-memory data corruption. Building on our ex- transactions with serializability and perimental observations, we propose (txn/s) durability. Existing RDMA-based several new mechanisms for MLC 45000 transaction processing systems use NAND flash memory that eliminate or one-sided RDMA primitives for their mitigate data corruption in partially- oughput Thr ability to bypass the remote CPU. This 0 programmed cells, thereby removing NVM-WBL NVM-WAL design choice brings several drawbacks. or reducing the extent of the vulner- LoggingProtocol First, the limited flexibility of one- abilities, and at the same time increas- (a) Throughput sided RDMA reduces performance ing flash memory lifetime by 16%. 1000 and increases software complexity

s) when designing distributed data stores.

Write-Behind Logging y( 100 Second, deep-rooted technical limita- 10 tions of RDMA hardware limit scal-

Joy Arulraj, Matthew Perron & Andrew Latenc ability in large clusters. FaSST eschews Pavlo 1 very one-sided RDMA for fast RPCs using 0.1 Proc. VLDB Endow., vol. 10, pp. 337- Reco two-sided unreliable datagrams, which NVM-WBL NVM-WAL 348, December, 2016. LoggingProtocol we show drop packets extremely rarely The design of the logging and recovery (b) Recovery Time on modern RDMA networks. This ap- components of database management proach provides better performance, systems (DBMSs) has always been 3.50 scalability, and simplicity, without influenced by the difference in the requiring expensive reliability mecha- nisms in software. In comparison with performance characteristics of volatile (GB) 1.75

(DRAM) and non-volatile storage de- ge published numbers, FaSST outper- vices (HDD/SSDs). The key assump- forms FaRM on the TATP benchmark tion has been that non-volatile storage Stora by almost 2x while using close to half 0.00 is much slower than DRAM and only NVM-WBL NVM-WAL the hardware resources, and it out- supports block-oriented read/writes. LoggingProtocol performs DrTM+R on the SmallBank But the arrival of new nonvolatile (c) Storage Footprint benchmark by around 1.7x without memory (NVM) storage that is almost making data locality assumptions. as fast as DRAM with fine-grained NVM-WBL NVM-WAL read/writes invalidates these previous Self-Driving Database design choices. WBL vs. WAL – The throughput, recovery Management Systems time, and storage footprint of the DBMS This paper explores the changes that A. Pavlo, G. Angulo, J. Arulraj, H. Lin, are required in a DBMS to leverage for the YCSB benchmark with the write- ahead logging and write-behind logging J. Lin, L. Ma, P. Menon, T. Mowry, the unique properties of NVM in sys- protocols. M. Perron, I. Quah, S. Santurkar, A. tems that still include volatile DRAM. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. We make the case for a new logging 1.5×. We also demonstrate that our Wu, R. Xian & T. Zhang and recovery protocol, called write- logging protocol is compatible with behind logging, that enables a DBMS standard replication schemes. In CIDR 2017, Conference on Inno- to recover nearly instantaneously vative Data Systems Research. January from system failures. The key idea is FaSST: Fast, Scalable 8-11, 2017, Chaminade, CA. that the DBMS logs what parts of the and Simple Distributed In the last two decades, both research- database have changed rather than how Transactions with Two-sided ers and vendors have built advisory it was changed. Using this method, (RDMA) Datagram RPCs tools to assist database administrators the DBMS flushes the changes to the (DBAs) in various aspects of system database before recording them in the Anuj Kalia, Michael Kaminsky & David tuning and physical design. Most of log. Our evaluation shows that this G. Andersen this previous work, however, is incom- protocol improves a DBMS’s transac- plete because they still require humans 12th USENIX Symposium on Operat- tional throughput by 1.3×, reduces the to make the final decisions about any recovery time by more than two orders ing Systems Design and Implementa- changes to the database and are re- of magnitude, and shrinks the storage tion November 2–4, 2016, Savannah, footprint of the DBMS on NVM by GA, USA. continued on page 24

SPRING 2017 23 RECENT PUBLICATIONS continued from page 23 actionary measures that fix problems than a serial C++ implementation. after they occur. tunable choice to try With a modest dataset (Netflix dataset What is needed for a truly “self- Searching Tunable containing 100 million ratings), the driving” database management system Logic Searcher PySpark implementation showed 5.5X Fork a convergence speed speedup from 1 to 8 machines, using (DBMS) is a new architecture that is branch to try designed for autonomous operation. tunable Progress 1 core per machine. But it failed to for trial me Summarizer This is different than earlier attempts MLtuner achieve further speedup with more because all aspects of the system are training progress machines or gain speedup from using controlled by an integrated planning Training System more cores per machine. While it’s still component that not only optimizes the ongoing investigation, we believed that system for the current workload, but shuffling as Spark’s only scalable mech- also predicts future workload trends Tunable searching procedure. anism of propagating model updates is so that the system can prepare itself responsible for much of the overhead zation-guided online trial-and-error and more efficient approaches for accordingly. With this, the DBMS to find good initial settings as well as can support all of the previous tuning communication could lead to much to re-tune settings during execution. better performance. techniques without requiring a human Experiments with three real ML tasks to determine the right way and proper show that MLtuner automatically en- Zorua: A Holistic Approach time to deploy them. It also enables ables performance within 40–178% new optimizations that are impor- to Resource Virtualization in of having oracle knowledge of the best GPUs tant for modern high-performance settings, and outperforms oracle when DBMSs, but which are not possible no single set of settings are best for the Nandita Vijaykumar, Kevin Hsieh, today because the complexity of man- entire execution. It also significantly Gennady Pekhimenko, Samira Khan, aging these systems has surpassed the outperforms most of the many feasible Saugata Ghose, Ashish Shrestha, abilities of human experts. settings that might get used in practice. Adwait Jog, Phillip B. Gibbons & Onur This paper presents the architecture of Mutlu Peloton, the first self-driving DBMS. Benchmarking Apache Spark 49th IEEE/ACM International Sym- Peloton’s autonomic capabilities are with Machine Learning posium on Microarchitecture (MI- now possible due to algorithmic ad- Applications CRO’16), October 15 - 19, 2016, vancements in deep learning, as well as Taipei, Taiwan. improvements in hardware and adap- Jinliang Wei, Jin Kyu Kim & Garth A. tive database architectures. Gibson This paper introduces a new resource virtualization framework, Zorua, that Carnegie Mellon University Parallel MLtuner: System Support for decouples the programmer-specified Data Lab Technical Report CMU- resource usage of a GPU applica- Automatic Machine Learning PDL-16-107 October 2016. Tuning tion from the actual allocation in the We benchmarked Apache Spark with on-chip hardware resources. Zorua Henggang Cui, Gregory R. Ganger & a popular parallel machine learning enables this decoupling by virtualizing Phillip B. Gibbons training application, Distributed Sto- each resource transparently to the pro- chastic Gradient Descent for Matrix grammer. 
The virtualization provided Carnegie Mellon University Parallel Factorization [5] and compared the by Zorua builds on two key concepts— Data Lab Technical Report CMU- Spark implementation with alternative dynamic allocation of the on-chip PDL-16-108, October 2016. approaches for communicating model resources and their oversubscription MLtuner automatically tunes settings parameters, such as scheduled pipe- using a swap space in memory. for training tunables—such as the lining using POSIX socket or MPI, Zorua provides a holistic GPU re- learning rate, the mini-batch size, and distributed shared memory (e.g. source virtualization strategy, designed and the data staleness bound—that parameter server [13]). We found that to (i) adaptively control the extent of have a significant impact on large-scale Spark performance suffers substantial oversubscription, and (ii) coordinate machine learning (ML) performance. overhead with only modest model size the dynamic management of multiple Traditionally, these tunables are set (rank of a few hundreds). For example, on-chip resources (i.e., registers, manually, which is unsurprisingly er- the PySpark implementation using scratchpad memory, and thread slots), ror prone and difficult to do without one single-core executor was about 3X to maximize the effectiveness of virtu- extensive domain knowledge. MLtuner slower than a serial out-of-core Python uses efficient snapshotting and optimi- implementation and 226X slower continued on page 25

24 THE PDL PACKET RECENT PUBLICATIONS continued from page 24 aging suite with built-in aging profiles Coordinator with the goal of making it easier to age, and harder to justify ignoring file sys- Oversubscribe Oversubscribe Oversubscribe tem aging in storage research. threads? scratchpad? registers? Thread/ Barrier Scratchpad Register Application Queue Queue Queue A Survey of Security Thread Vulnerabilities in Bluetooth Thread Blocks Schedulable Block Threads Low Energy Beacons 1 55 Scheduler Acquire Warp 9 Resources Scheduler Release 2 3 4 7 Hui Jun Tay, Jiaqi Tan & Priya Resources Narasimhan 10 6 To Compute Carnegie Mellon University Parallel Units Data Lab Technical Report CMU- Register Scratchpad Thread Mapping Table Mapping Table Mapping Table PDL-16-109. November 2016.

Phase Change 8 There are currently 4 million Blue- tooth Low Energy beacons, such as Apple’s iBeacons™, deployed world- Overview of Zorua in hardware. wide, with more being used in an in- creasingly wide variety of fields, from alization. Zorua employs a hardware- are already highly tuned to best utilize marketing to navigation. Yet, there is a software codesign, comprising the the hardware resources. The holistic lack of focus on the impact of security compiler, a runtime system and hard- virtualization provided by Zorua can on beacon deployments. This report ware-based virtualization support. The also enable other uses, including looks to capture an overview of cur- runtime system leverages information fine-grained resource sharing among rent beacon deployments, the current from the compiler regarding resource multiple kernels and low-latency pre- vulnerabilities and risks present in requirements of each program phase emption of GPU programs. using such platforms, and the security to (i) dynamically allocate/deallocate measures taken by vendors today. the different resources in the physi- Aging Gracefully with cally available on-chip resources or Geriatrix: A File System Aging A Model for Application their swap space, and (ii) manage the Suite Slowdown Estimation in On- tradeoff between higher thread-level Saurabh Kadekodi, Vaishnavh Chip Networks and Its Use for parallelism due to virtualization versus Nagarajan & Garth A. Gibson Improving System Fairness and the latency and capacity overheads of Performance swap space usage. Carnegie Mellon University Parallel We demonstrate that by providing Data Lab Technical Report CMU- Xiyue Xiang, Saugata Ghose, Onur the illusion of more resources than PDL-16-105. October, 2016. Mutlu & Nian-Feng Tzeng physically available via controlled and File system aging has been advocated International Conference on Com- coordinated virtualization, Zorua for thorough analysis of any design, puter Design (ICCD), October 3-5, offers several important benefits: (i) but it is cumbersome and often by- 2016, Phoenix, USA. Programming Ease. Zorua eases the passed. Our aging study re-evaluates In a network-on-chip (NoC) based burden on the programmer to pro- published file systems after aging using system, the NoC is a shared resource vide code that is tuned to efficiently the same benchmarks originally used. among multiple processor cores. Net- utilize the physically available on-chip We see significant performance degra- dation on HDDs and SSDs. With per- work requests generated by different resources. (ii) Portability. Zorua al- applications running on different cores leviates the necessity of re-tuning an formance of aged file systems on SSDs dropping by as much as 80% relative to can interfere with each other, leading application’s resource usage when to a slowdown in performance of each porting the application across GPU the recreated results of prior papers, aging is even more necessary in the era application. The degree of slowdown generations. (iii) Performance. By introduced by this interference varies dynamically allocating resources and of SSDs. Still more concerning, the rank ordering of compared file systems for each application, as it depends on carefully oversubscribing them when (1) the sensitivity of the application to necessary, Zorua improves or retains can change versus published results. the performance of applications that We offer Geriatrix, a simple-to-use continued on page 26

Stateless Model Checking with Data-Race Preemption Points

Ben Blum & Garth Gibson

SPLASH 2016 OOPSLA, Oct. 30 - Nov. 4, 2016, Amsterdam, Netherlands.

Stateless model checking is a powerful technique for testing concurrent programs, but suffers from exponential state space explosion when the test input parameters are too large. Several reduction techniques can mitigate this explosion, but even after pruning equivalent interleavings, the state space size is often intractable. Most prior tools are limited to preempting only on synchronization APIs, which reduces the space further, but can miss unsynchronized thread communication bugs. Data race detection, another concurrency testing approach, focuses on suspicious memory access pairs during a single test execution. It avoids concerns of state space size, but may report races that do not lead to observable failures, which jeopardizes a user's willingness to use the analysis.

We present QUICKSAND, a new stateless model checking framework which manages the exploration of many state spaces using different preemption points. It uses state space estimation to prioritize jobs most likely to complete in a fixed CPU budget, and it incorporates data-race analysis to add new preemption points on the fly. Preempting threads during a data race's instructions can automatically classify the race as buggy or benign, and uncovers new bugs not reachable by prior model checkers. It also enables full verification of all possible schedules when every data race is verified as benign within the CPU budget. In our evaluation, QUICKSAND found 1.25x as many bugs and verified 4.3x as many tests compared to prior model checking approaches.

Figure: Some data-race candidates may not be identified during a single program execution. Using nondeterministic data races as PPs, QUICKSAND found 128% (Limited HB) to 191% (Pure HB) as many data-race bugs compared to using single-pass candidates alone.
JamaisVu: Robust Scheduling with Auto-Estimated Job Runtimes

Alexey Tumanov, Angela Jiang, Jun Woo Park, Michael A. Kozuch & Gregory R. Ganger

Carnegie Mellon University Parallel Data Laboratory Technical Report CMU-PDL-16-104, September 2016.

JamaisVu is a new end-to-end cluster scheduling system that automatically generates and robustly exploits job runtime predictions. Using runtime knowledge allows it to more effectively pack jobs with diverse time concerns (e.g., deadline vs. latency) and soft placement constraints on heterogeneous cluster resources. JamaisVu's job runtime predictor, JVuPredict, uses a new black-box approach that tracks job runtime history as a function of multiple job submission features (e.g., user ID and program name), and then adaptively uses the most effective feature subset for each submitted job. Analysis of a 1-month Google cluster trace shows JVuPredict predicts reasonably well for complex real-world job mixes; for example, 90% of predictions are within a factor of two of actual runtime. But, because predictions cannot be perfect, JamaisVu includes new techniques for mitigating the effects of such real misprediction profiles. Experiments with workloads derived from the trace show that JamaisVu performs nearly as well as a hypothetical scheduler with perfect job runtime information, outperforming runtime-unaware scheduling by reducing SLO miss rate, increasing goodput, and maintaining comparable latency for best effort jobs.

Figure: End-to-end system integration: JVuSched is integrated into Hadoop YARN—a popular open source cluster scheduling framework.
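Since the JVuPredict description above is essentially procedural (track runtime history per submission feature, then predict with whichever feature has recently been most accurate), a small sketch may help. This is a hedged illustration under our own assumptions; the feature names, the mean-based estimate, and the error metric are placeholders, not JamaisVu's implementation, and a job here is just a dict of feature values such as {"user_id": "alice", "program_name": "sort"}.

from collections import defaultdict
from statistics import mean

class RuntimePredictor:
    """Toy black-box runtime predictor in the spirit of the JVuPredict description."""
    def __init__(self, features=("user_id", "program_name")):
        self.features = features
        self.history = {f: defaultdict(list) for f in features}   # feature value -> runtimes
        self.errors = {f: [] for f in features}                   # relative errors per feature

    def predict(self, job):
        # Use the feature whose past predictions have been most accurate so far.
        best = min(self.features,
                   key=lambda f: mean(self.errors[f]) if self.errors[f] else float("inf"))
        runtimes = self.history[best].get(job[best], [])
        return mean(runtimes) if runtimes else None               # None: no estimate yet

    def observe(self, job, actual_runtime):
        # Record the accuracy of each feature's estimate, then update its history.
        for f in self.features:
            runtimes = self.history[f].get(job[f], [])
            if runtimes:
                self.errors[f].append(abs(mean(runtimes) - actual_runtime) / actual_runtime)
            self.history[f][job[f]].append(actual_runtime)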

AUSPICE-R: Automatic Safety-Property Proofs for Realistic Features in Machine Code

Jiaqi Tan, Hui Jun Tay, Rajeev Gandhi & Priya Narasimhan

14th Asian Symposium on Programming Languages and Systems (APLAS), November 2016.

Verification of machine-code programs using program logic has focused on functional correctness, and proofs have required manually-provided program specifications. Fortunately, the verification of shallow safety properties such as memory and control-flow safety can be easier to automate, but past techniques for automatically verifying machine-code safety have required post-compilation transformations, which can change program behavior. In this work, we automatically verify safety properties for unmodified machine-code programs without requiring user-supplied specifications. We present our novel logic framework, AUSPICE, for automatic safety property verification for unmodified executables, which extends an existing trustworthy Hoare logic for local reasoning, and provides a novel proof tactic for selective composition. We demonstrate our fully automated proof technique on synthetic and realistic programs, and our verification completes in 6 hours for a realistic 533-instruction string search algorithm, demonstrating the feasibility of our approach.

PCFIRE: Towards Provable, Preventative Control-Flow Integrity Enforcement for Realistic Embedded Software

Jiaqi Tan, Hui Jun Tay, Utsav Drolia, Rajeev Gandhi & Priya Narasimhan

ACM SIGBED International Conference on Embedded Software (EMSOFT), October 2016.

Control-Flow Integrity (CFI) is an important safety property of software, particularly in embedded and safety-critical systems, where CFI violations have led to patient deaths and can render cars remotely controllable by attackers. Previous techniques for CFI may reduce the robustness of embedded and safety-critical systems, as they handle CFI violations by stopping programs. In this work, we present PCFIRE, a preventative approach to CFI that prevents the root causes of CFI violations to allow recovery, and enables programmers to specify robust recovery actions by providing CFI via source-code safety-checks. PCFIRE's CFI can be formally proved automatically, and supports realistic features of embedded software such as hardware and I/O access. We showcase PCFIRE by providing, and automatically proving, CFI for: benchmark programs, text utilities containing I/O, and embedded programs with sensor inputs and hardware outputs on the Raspberry Pi single-board computer.
μC-States: Fine-grained GPU Datapath Power Management

Onur Kayıran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu & Chita R. Das

Proceedings of the 25th International Conference on Parallel Architectures and Compilation Techniques (PACT 2016), Haifa, Israel, September 2016.

To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects are recently adopting a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power, as an application does not always utilize all portions of the big core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the more memory requests generated by each big core.

This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.
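The gating decision sketched in that abstract (turn off or tune down datapath components that are not performance bottlenecks) can be pictured in a few lines of Python. This is only an illustration of the idea under invented thresholds and component names; the actual μC-States mechanism is a hardware design driven by the paper's queuing-theory analysis.

def gating_decisions(utilization, low_fraction=0.3):
    """utilization: component name -> fraction of cycles busy (0..1)."""
    bottleneck = max(utilization, key=utilization.get)
    decisions = {}
    for comp, util in utilization.items():
        if comp == bottleneck:
            decisions[comp] = "keep-on"
        elif util < low_fraction * utilization[bottleneck]:
            decisions[comp] = "clock-gate"     # far below the bottleneck: gate it off
        else:
            decisions[comp] = "tune-down"      # moderately used: reduce width or frequency
    return decisions

# Example: the load/store path is the bottleneck, so the nearly idle SFUs get gated.
print(gating_decisions({"int_alu": 0.55, "sfu": 0.08, "ld_st": 0.92}))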

A Better Model for Job Redundancy: Decoupling Server Slowdown and Job Size

Kristen Gardner, Mor Harchol-Balter & Alan Scheller-Wolf

IEEE Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2016), London, UK, September 2016.

Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to replicate a request so that it joins the queue at multiple servers. The request is considered complete as soon as any one copy of the request completes. Redundancy is beneficial because it allows us to overcome server-side variability – the fact that the server we choose might be temporarily slow due to factors such as background load, network interrupts, and garbage collection. When there is significant server-side variability, replicating requests can greatly reduce response times.

In the past few years, queueing theorists have begun to study redundancy, first via approximations, and, more recently, via exact analysis. Unfortunately, for analytical tractability, most existing theoretical analysis has assumed an Independent Runtimes (IR) model, wherein the replicas of a job each experience independent runtimes (service times) at different servers. The IR model is unrealistic and has led to theoretical results which can be at odds with computer systems implementation results. This paper introduces a much more realistic model of redundancy. Our model allows us to decouple the inherent job size (X) from the server-side slowdown (S), where we track both S and X for each job. Analysis within the S&X model is, of course, much more difficult. Nevertheless, we design a policy, Redundant-to-Idle-Queue (RIQ), which is both analytically tractable within the S&X model and has provably excellent performance.

Figure: The S&X model. The system has k servers and jobs arrive as a Poisson process with rate λk. Each job has an inherent size X. When a job runs on a server it experiences slowdown S. A job's running time on a single server is R(1) = X·S. When a job runs on multiple servers, its inherent size X is the same on all these servers and it experiences a different, independently drawn instance of S on each server.
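The figure's relation R(1) = X·S invites a tiny numerical experiment. The Monte Carlo sketch below only illustrates why replication helps when the slowdown S is variable and drawn independently per server; it ignores queueing entirely, whereas the paper's contribution is the queueing analysis and the RIQ policy. The distributions chosen for X and S are arbitrary assumptions for the example.

import random

def running_time(x, n_replicas, draw_slowdown):
    # Fastest copy wins; X is shared across replicas, S is drawn independently per server.
    return min(x * draw_slowdown() for _ in range(n_replicas))

def mean_response(n_replicas, trials=100_000):
    draw_x = lambda: random.expovariate(1.0)                  # inherent job size X
    draw_s = lambda: 1.0 if random.random() < 0.9 else 10.0   # occasional 10x slowdown S
    return sum(running_time(draw_x(), n_replicas, draw_s) for _ in range(trials)) / trials

print(mean_response(1), mean_response(2))   # replication cuts the mean when S is variable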
Soundness Proofs for Iterative Deepening

Ben Blum

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-16-103, September 6, 2016.

The Iterative Deepening algorithm allows stateless model checkers to adjust preemption points on-the-fly. It uses dynamic data-race detection to avoid necessarily preempting on every shared memory access, and ignores false-positive data race candidates arising from certain heap allocation patterns. An Iterative Deepening test that reaches completion soundly verifies all possible thread interleavings of that test.

Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems

Lin Ma, Joy Arulraj, Sam Zhao, Andrew Pavlo, Subramanya R. Dulloor, Michael J. Giardino, Jeff Parkhurst, Jason L. Gardner, Kshitij Doshi & Stanley Zdonik

DaMoN'16, June 26 - July 01, 2016, San Francisco, CA, USA.

In-memory database management systems (DBMSs) outperform disk-oriented systems for on-line transaction processing (OLTP) workloads. But this improved performance is only achievable when the database is smaller than the amount of physical memory available in the system. To overcome this limitation, some in-memory DBMSs can move cold data out of volatile DRAM to secondary storage. Such data appears as if it resides in memory with the rest of the database even though it does not. Although there have been several implementations proposed for this type of cold data storage, there has not been a thorough evaluation of the design decisions in implementing this technique, such as policies for when to evict tuples and how to bring them back when they are needed. These choices are further complicated by the varying performance characteristics of different storage devices, including future non-volatile memory technologies. We explore these issues in this paper and discuss several approaches to solve them. We implemented all of these approaches in an in-memory DBMS and evaluated them using five different storage technologies. Our results show that choosing the best strategy based on the hardware improves throughput by 92–340% over a generic configuration.
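To make the eviction-policy question in that abstract concrete, here is one simple, hedged possibility in Python: evict least-recently-used tuples to a cold store once DRAM is over budget, and pull a tuple back into memory when it is accessed again. The class, its names, and the LRU choice are our own assumptions for illustration, not the designs evaluated in the paper.

from collections import OrderedDict

class ColdDataTable:
    def __init__(self, dram_budget, cold_store):
        self.hot = OrderedDict()        # tuple id -> tuple, kept in LRU order
        self.dram_budget = dram_budget  # max number of tuples kept in DRAM
        self.cold_store = cold_store    # dict-like secondary storage (e.g., SSD- or NVM-backed)

    def read(self, tid):
        if tid in self.hot:
            self.hot.move_to_end(tid)            # refresh recency
            return self.hot[tid]
        value = self.cold_store.pop(tid)         # bring the cold tuple back on access
        self._install(tid, value)
        return value

    def write(self, tid, value):
        self._install(tid, value)

    def _install(self, tid, value):
        self.hot[tid] = value
        self.hot.move_to_end(tid)
        while len(self.hot) > self.dram_budget:  # over budget: evict the coldest tuples
            old_tid, old_val = self.hot.popitem(last=False)
            self.cold_store[old_tid] = old_val

table = ColdDataTable(dram_budget=2, cold_store={})
for i in range(4):
    table.write(i, "row-%d" % i)
print(table.read(0), len(table.hot))             # tuple 0 returns from the cold store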

Addressing the Straggler Problem for Iterative Convergent Parallel ML

Aaron Harlap, Henggang Cui, Wei Dai, Jinliang Wei, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson & Eric P. Xing

ACM Symposium on Cloud Computing 2016, October 5-7, Santa Clara, CA.

FlexRR provides a scalable, efficient solution to the straggler problem for iterative machine learning (ML). The frequent (e.g., per iteration) barriers used in traditional BSP-based distributed ML implementations cause every transient slowdown of any worker thread to delay all others. FlexRR combines a more flexible synchronization model with dynamic peer-to-peer re-assignment of work among workers to address straggler threads. Experiments with real straggler behavior observed on Amazon EC2 and Microsoft Azure, as well as injected straggler behavior stress tests, confirm the significance of the problem and the effectiveness of FlexRR's solution. Using FlexRR, we consistently observe near-ideal run-times (relative to no performance jitter) across all real and injected straggler behaviors tested.

Figure: RapidReassignment example. The middle worker sends progress reports to the other two workers (its helper group). The worker on the left is running at a similar speed, so it ignores the message. The worker on the right is running slower, so it sends a do-this message to re-assign an initial work assignment. Once the faster worker finishes its own work and begins helping, it sends a begun-helping message to the slow worker. Upon receiving this, the slow worker sends a do-this with a follow-up work assignment to the fast worker.
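The RapidReassignment exchange in the figure caption can be restated as a pair of message handlers. The sketch below is our rough rendering of that flow from the slow worker's side, with an invented 20% lag threshold and slice size; it is not FlexRR's actual protocol code.

def on_progress_report(my_progress, peer_progress, remaining, lag=0.2):
    """Handle a peer's progress report; if the peer is well ahead, re-assign an initial slice."""
    if peer_progress <= my_progress * (1 + lag):
        return None, remaining                         # peer is not far ahead: ignore
    k = max(1, len(remaining) // 4)                    # initial work re-assignment
    return {"type": "do-this", "work": remaining[:k]}, remaining[k:]

def on_begun_helping(remaining):
    """The helper started on the initial slice; send it a follow-up assignment."""
    k = max(1, len(remaining) // 4)
    return {"type": "do-this", "work": remaining[:k]}, remaining[k:]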
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu & Stephen W. Keckler

Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), Seoul, South Korea, June 18-22, 2016.

Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer.

Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.

Figure: Overview of an NDP GPU system.

Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads

Joy Arulraj, Andrew Pavlo & Prashanth Menon

SIGMOD'16, June 26 - July 01, 2016, San Francisco, CA, USA.

Data-intensive applications seek to obtain new insights in real-time by analyzing a combination of historical data sets alongside recently collected data. This means that to support such hybrid workloads, database management systems (DBMSs) need to handle both fast ACID transactions and complex analytical queries on the same database. But the current trend is to use specialized systems that are optimized for only one of these workloads, and thus require an organization to maintain separate copies of the database. This adds additional cost to deploying a database application in terms of both storage and administration overhead.

To overcome this barrier, we present a hybrid DBMS architecture that efficiently supports varied workloads on the same database. Our approach differs from previous methods in that we use a single execution engine that is oblivious to the storage layout of data without sacrificing the performance benefits of the specialized systems. This obviates the need to maintain separate copies of the database in multiple independent systems. We also present a technique to continuously evolve the database's physical storage layout by analyzing the queries' access patterns and choosing the optimal layout for different segments of data within the same table. To evaluate this work, we implemented our architecture in an in-memory DBMS. Our results show that our approach delivers up to 3× higher throughput compared to static storage layouts across different workloads. We also demonstrate that our continuous adaptation mechanism allows the DBMS to achieve a near-optimal layout for an arbitrary workload without requiring any manual tuning.
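As one way to picture the layout-adaptation idea in that abstract, the sketch below chooses a row or column layout per table segment from observed access patterns. The statistics format and the 70% scan threshold are invented for illustration; the paper's mechanism works inside the DBMS and is driven by the actual query access patterns it observes.

def choose_layout(segment_stats, scan_threshold=0.7):
    """segment_stats: {segment id: {"point_ops": int, "scan_ops": int}}."""
    layouts = {}
    for seg, s in segment_stats.items():
        total = s["point_ops"] + s["scan_ops"]
        scan_fraction = s["scan_ops"] / total if total else 0.0
        layouts[seg] = "column" if scan_fraction >= scan_threshold else "row"
    return layouts

# Hot transactional segments stay row-oriented; scan-heavy segments become columnar.
print(choose_layout({0: {"point_ops": 900, "scan_ops": 50},
                     3: {"point_ops": 10, "scan_ops": 400}}))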

Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes

Huanchen Zhang, Andy Pavlo, David G. Andersen, Michael Kaminsky, Lin Ma & Rui Shen

ACM SIGMOD International Conference on Management of Data 2016 (SIGMOD'16), June 2016.

Using indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a major source of storage overhead of in-memory databases. To reduce this overhead, we propose using a two-stage index: the first stage ingests all incoming entries and is kept small for fast read and write operations. The index periodically migrates entries from the first stage to the second, which uses a more compact, read-optimized data structure. Our first contribution is hybrid index, a dual-stage index architecture that achieves both space efficiency and high performance. Our second contribution is Dual-Stage Transformation (DST), a set of guidelines for converting any order-preserving index structure into a hybrid index. Our third contribution is applying DST to four popular order-preserving index structures and evaluating them in both standalone microbenchmarks and a full in-memory DBMS using several transaction processing workloads. Our results show that hybrid indexes provide comparable throughput to the original ones while reducing the memory overhead by up to 70%.
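The two-stage structure in that abstract maps naturally onto a few lines of code. The sketch below is a deliberately simplified stand-in: the second stage is just a sorted list searched with bisect, whereas the paper's hybrid indexes use compact tree and trie structures, and the size-triggered migration here is an invented placeholder policy.

import bisect

class HybridIndex:
    def __init__(self, stage1_limit=1024):
        self.stage1 = {}            # small, write-friendly ingest stage
        self.stage2_keys = []       # compact, read-optimized, sorted stage
        self.stage2_vals = []
        self.stage1_limit = stage1_limit

    def insert(self, key, value):
        self.stage1[key] = value
        if len(self.stage1) >= self.stage1_limit:
            self._migrate()

    def get(self, key):
        if key in self.stage1:                            # check the write stage first
            return self.stage1[key]
        i = bisect.bisect_left(self.stage2_keys, key)
        if i < len(self.stage2_keys) and self.stage2_keys[i] == key:
            return self.stage2_vals[i]
        return None

    def _migrate(self):
        merged = dict(zip(self.stage2_keys, self.stage2_vals))
        merged.update(self.stage1)                        # bulk-merge stage 1 into stage 2
        self.stage2_keys = sorted(merged)
        self.stage2_vals = [merged[k] for k in self.stage2_keys]
        self.stage1.clear()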
Achieving One Billion Key-Value Requests Per Second on a Single Server

Sheng Li, Hyeontaek Lim, Victor Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, Seongil O, Sukhan Lee & Pradeep Dubey

IEEE Micro's Top Picks from the Computer Architecture Conferences 2016, May/June 2016.

Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented datacenter infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of datacenters. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, and concurrency control. Two recent research thrusts have focused upon improving key-value performance. Hardware-centric research has started to explore specialized platforms, including FPGAs, for KVSs; results demonstrated an order of magnitude increase in throughput and energy efficiency over stock memcached. Software-centric research revisited the KVS application to address fundamental software bottlenecks and to exploit the full potential of modern commodity hardware; these efforts too showed orders of magnitude improvement over stock memcached.

We aim at architecting high performance and efficient KVS platforms, and start with a rigorous architectural characterization across system stacks over a collection of representative KVS implementations. Our detailed full-system characterization not only identifies the critical hardware/software ingredients for high-performance KVS systems, but also suggests new optimizations to achieve record-setting throughput: 120 million requests per second (MRPS), or 167 MRPS with client-side batching, on a single commodity server. Our system delivers the best performance and energy efficiency (RPS/watt) demonstrated to date with existing KVSs, including the best-published FPGA-based and GPU-based claims. We propose a future manycore platform, and via detailed simulations demonstrate the capability of achieving a billion RPS with a single server constructed following our principles.

Efficient Algorithms with Asymmetric Read and Write Costs

Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu & Julian Shun

24th European Symposium on Algorithms (ESA'16), August 2016.

In several emerging technologies for computer memory (main memory), the cost of reading is significantly cheaper than the cost of writing. Such asymmetry in memory costs poses a fundamentally different model from the RAM for algorithm design. In this paper we study lower and upper bounds for various problems under such asymmetric read and write costs. We consider both the case in which all but O(1) memory has asymmetric cost, and the case of a small cache of symmetric memory. We model both cases using the (M,ω)-ARAM, in which there is a small (symmetric) memory of size M and a large unbounded (asymmetric) memory, both random access, and where reading from the large memory has unit cost, but writing has cost ω » 1.

For FFT and sorting networks we show a lower bound cost of Ω(ωn log_{ωM} n), which indicates that it is not possible to achieve asymptotic improvements with cheaper reads when ω is bounded by a polynomial in M. Moreover, there is an asymptotic gap (of min(ω, log n)/log(ωM)) between the cost of sorting networks and comparison sorting in the model. This contrasts with the RAM, and most other models, in which the asymptotic costs are the same. We also show a lower bound for computations on an n × n diamond DAG of Ω(ωn²/M) cost, which indicates no asymptotic improvement is achievable with fast reads. However, we show that for the minimum edit distance problem (and related problems), which would seem to be a diamond DAG, we can beat this lower bound with an algorithm with only O(ωn²/(M·min(ω^{1/3}, M^{1/2}))) cost. To achieve this we make use of a "path sketch" technique that is forbidden in a strict DAG computation. Finally, we show several interesting upper bounds for shortest path problems, minimum spanning trees, and other problems. A common theme in many of the upper bounds is that they require redundant computation and a tradeoff between reads and writes.
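For readability, the (M,ω)-ARAM bounds quoted in that abstract can be restated as display math (no new results, just the same expressions set in LaTeX):

\[
\text{FFT / sorting networks:}\quad \Omega\!\left(\omega\, n \log_{\omega M} n\right),
\qquad
\text{$n \times n$ diamond DAG:}\quad \Omega\!\left(\frac{\omega n^{2}}{M}\right),
\]
\[
\text{minimum edit distance:}\quad O\!\left(\frac{\omega n^{2}}{M\,\min\!\left(\omega^{1/3},\, M^{1/2}\right)}\right).
\]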

BIG-LEARNING SYSTEMS MEET CLOUD COMPUTING continued from page 13

...1% change to the parameter value. With ASP, these insignificant updates to the same parameter within a data center are aggregated (and thus not communicated to other data centers) until the aggregated updates are significant. ASP allows the ML programmer to specify the function and the threshold to determine the significance of updates for each ML algorithm, while providing default configurations for unmodified ML programs. For example, the programmer can specify that all updates that produce more than a 1% change are significant. ASP ensures all significant updates are synchronized across all model copies in a timely manner. It dynamically adapts communication to the available WAN bandwidth between pairs of data centers and uses special selective barrier and mirror clock control messages to ensure algorithm convergence even during a period of sudden fall (negative spike) in available WAN bandwidth.

In contrast to a state-of-the-art communication-efficient synchronization model, Stale Synchronous Parallel (SSP), which bounds how stale (i.e., old) a parameter can be, ASP bounds how inaccurate a parameter can be, compared to the most up-to-date value. Hence, it provides high flexibility in performing (or not performing) updates, as the server can delay synchronization indefinitely as long as the aggregated update remains insignificant.

Experiments with three popular classes of ML algorithms on Gaia prototypes across 11 Amazon EC2 regions show that, compared to two state-of-the-art parameter server systems, Gaia: (1) significantly improves performance, by 1.8–53.5×, (2) has performance within 0.94–1.40× that of running the same ML algorithm on a LAN in a single data center, and (3) significantly reduces the monetary cost of running the same ML algorithm on WANs by 2.6–59.0×. For more information, see [2].

Figure: Key components of Gaia.
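The significance filter described above is easy to picture in code. The sketch below is our own paraphrase of the idea (aggregate local updates per parameter and ship them over the WAN only once the aggregate is significant); the class name, the relative-change test, and the 1% default are illustrative assumptions, not Gaia's implementation.

class SignificanceFilter:
    def __init__(self, threshold=0.01):
        self.threshold = threshold
        self.pending = {}                         # parameter id -> aggregated local delta

    def local_update(self, param_id, delta, current_value):
        """Aggregate a local update; return a WAN message only when it is significant."""
        agg = self.pending.get(param_id, 0.0) + delta
        denom = abs(current_value) or 1.0         # avoid division by zero near zero values
        if abs(agg) / denom >= self.threshold:
            self.pending[param_id] = 0.0          # flush: synchronize across data centers
            return {"param": param_id, "delta": agg}
        self.pending[param_id] = agg              # otherwise keep aggregating locally
        return None

f = SignificanceFilter()
print(f.local_update("w3", delta=0.002, current_value=1.0))   # None: below the threshold
print(f.local_update("w3", delta=0.009, current_value=1.0))   # flushed: about a 1.1% change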
References

[1] Proteus: Agile ML Elasticity through Tiered Reliability in Dynamic Resource Markets. Aaron Harlap, Alexey Tumanov, Andrew Chung, Greg Ganger, Phil Gibbons. ACM European Conference on Computer Systems 2017 (EuroSys'17), April 23-26, 2017, Belgrade, Serbia.

[2] Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds. Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, Onur Mutlu. 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), March 27-29, 2017, Boston, MA.

YEAR IN REVIEW continued from page 4

• Nandita Vijaykumar presented "Zorua: A Holistic Approach to Resource Virtualization in GPUs" at MICRO '16 in Taipei, Taiwan.
• Papers presented at SoCC '16 in Santa Clara, CA included "Principled Workflow-centric Tracing of Distributed Systems" (Raja R. Sambasivan) and "Addressing the Straggler Problem for Iterative Convergent Parallel ML" (Aaron Harlap).

September 2016
• Kristen Gardner presented "A Better Model for Job Redundancy: Decoupling Server Slowdown and Job Size" at MASCOTS '16 in London, UK.

August 2016
• Yixin Luo proposed his thesis research "Architectural Techniques for Improving NAND Flash Memory Reliability."
• Mor Harchol-Balter presented the keynote presentation on "Queueing with Redundant Requests: A More Realistic Model" at CanQueue 2016 in London, ON.

July 2016
• Alexey Tumanov successfully defended his PhD thesis "Scheduling with Space-Time Soft Constraints in Heterogeneous Cloud Datacenters."
• Gennady Pekhimenko successfully defended his PhD thesis "Practical Data Compression for Modern Memory Hierarchies."
• Phil Gibbons gave the keynote talk on "How Emerging Memory Technologies will Have You Rethinking Algorithm Design" at PODC '16 in Chicago, IL.
• Naama Ben-David presented "Parallel Algorithms for Asymmetric Read-Write Costs" at SPAA '16 in Asilomar, CA.

June 2016
• Anuj Kalia and co-authors won the Best Student Paper award at USENIX ATC '16 in Denver, CO for their work on "Design Guidelines for High Performance RDMA Systems."
• Mor Harchol-Balter gave the keynote presentation on "A Better Model for Task Assignment in Server Farms: How Replication can Help" at Sigmetrics '16 in Antibes Juan-les-Pins.
• Phil Gibbons gave the keynote talk on "What's So Special About Big Learning?...A Distributed Systems Perspective" at ICDCS '16 in Nara, Japan.
• Lin Ma presented "Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems" at DaMoN '16 in San Francisco, CA.
• Papers presented at SIGMOD '16 in San Francisco were "Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads" (Joy Arulraj) and "Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes" (Huanchen Zhang).
• Kevin Hsieh presented "Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems" at ISCA '16 in Seoul, South Korea.

May 2016
• 18th annual PDL Spring Visit Day.
• "Achieving One Billion Key-Value Requests Per Second on a Single Server" by Sheng Li, Hyeontaek Lim, Victor Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, Seongil O, Sukhan Lee and Pradeep Dubey was chosen as one of IEEE Micro's Top Picks from the Computer Architecture Conferences of 2016.

2016 PDL Workshop and Retreat.
