Automatic Database Management System Tuning Through Large-Scale Machine Learning

Total Page:16

File Type:pdf, Size:1020Kb

Automatic Database Management System Tuning Through Large-Scale Machine Learning NEWSLETTER ON PDL ACTIVITIES AND EVENTS • SPRING 2 0 1 7 http://www.pdl.cmu.edu/ Automatic Database Management System AN INFORMAL PUBLICATION FROM ACADEMIA’S PREMIERE STORAGE Tuning Through Large-scale Machine Learning SYSTEMS RESEARCH CENTER DEVOTED by Dana Van Aken TO ADVANCING THE STATE OF THE Database management systems (DBMSs) are the most important component of ART IN STORAGE AND INFORMATION any modern data-intensive application. But, this is historically a difficult task INFRASTRUCTURES. because DBMSs have hundreds of configuration “knobs” that control everything in the system, such as the amount of memory to use for caches and how often data CONTENTS is written to storage. This means that organizations often hire human experts to help with tuning activities, though this method can be prohibitively expensive. Automatic DBMS Tuning ................. 1 OtterTune is a new automatic DBMS configuration tool that overcomes these challenges, making it easier for anyone to deploy a DBMS able to handle large Director’s Letter .............................2 amounts of data and more complex workloads without any expertise needed in Year in Review ...............................4 database administration. The key feature of OtterTune is that it reduces the amount Recent Publications ........................5 of time and resources it takes to tune a DBMS for a new application by leverag- PDL News & Awards........................8 ing knowledge gained from previous DBMS deployments. OtterTune maintains Big Learning in the Cloud .............. 12 a repository of data collected from previous tuning sessions, and uses this data to build models of how the DBMS responds to different knob configurations. For a Defenses & Proposals ..................... 14 new application, these models are used to guide experimentation and recommend New PDL Faculty .......................... 19 optimal settings to the user to improve a target objective (e.g., latency, throughput). Alumni Update ............................ 19 In this article, we first provide a high-level example to demonstrate the tasks per- formed by each of the components in the OtterTune system, and show how they PDL CONSORTIUM interact with one another when automatically tuning a DBMS’s configuration. MEMBERS We next discuss each of the components in OtterTune’s ML pipeline. Finally, we conclude with an evaluation of OtterTune’s efficacy by comparing the performance Broadcom, Ltd. of its best configuration with ones selected by DBAs and other automatic tuning Citadel tools for MySQL and Postgres. All the experiments for our evaluation and data Dell EMC collection were conducted on Amazon EC2 Spot Instances. OtterTune’s ML al- Google gorithms were implemented using Google TensorFlow and Python’s scikit-learn. Hewlett-Packard Labs The OtterTune System Hitachi, Ltd. Intel Corporation Figure 1 shows an overview of the components in the OtterTune system. At the start of Microsoft Research a new tuning session, the DBA tells OtterTune what metric to optimize when selecting MongoDB a configuration (e.g., latency, NetApp, Inc. throughput). The client-side Tuning Manager controller then connects to the Oracle Corporation Analysis Planning r target DBMS and collects its Samsung Information Systems America Amazon EC2 instance type and Seagate Technology current knob configuration. Tintri ML Controlle Models Data JDBC DBMS Repository Next, the controller starts Two Sigma its first observation period, Toshiba during which the controller Veritas Western Digital Figure 1: The OtterTune System continued on page 10 FROM THE DIRECTOR’S CHAIR THE PDL PACKET GREG GANGER THE PARALLEL DATA Hello from fabulous Pittsburgh! LABORATORY School of Computer Science It has been another great year for the Department of ECE Parallel Data Lab. Some highlights in- Carnegie Mellon University clude record enrollment in PDL’s stor- 5000 Forbes Avenue age systems and cloud classes, continued growth and success for PDL’s database systems and big-learning systems research, Pittsburgh, PA 15213-3891 and new directions for cloud and NVM research. Along the way, many students VOICE 412•268•6716 graduated and joined PDL Consortium companies, new students joined the FAX 412•268•3010 PDL, and many cool papers have been published. Let me highlight a few things. PUBLISHER Since it headlines this newsletter, I’ll also lead off my highlights by talking about Greg Ganger PDL’s database systems research. The largest scope activities focus on automa- tion in DBMSs, such as in the front-page article, and creating a new open source EDITOR DBMS called Peloton to embody these and other ideas from the database group. Joan Digney There also continue to be cool results and papers on deduplication in databases, better exploitation of NVM in databases, and improved concurrency control The PDL Packet is published once per year to update members of the PDL mechanisms. The energy that Andy has infused into database systems research at Consortium. A pdf version resides in CMU makes it a lot of fun to watch and participate in. the Publications section of the PDL Web pages and may be freely distributed. In addition to applying machine learning (ML) to systems for automation pur- Contributions are welcome. poses, we continue to explore new approaches to system support for large-scale machine learning. As discussed in the second newsletter article, we have been THE PDL LOGO especially active in exploring how ML systems should adapt to cloud computing Skibo Castle and the lands that com- environments. prise its estate are located in the Kyle of Sutherland in the northeastern part of Important questions include how such systems should address dynamic resource Scotland. Both ‘Skibo’ and ‘Sutherland’ availability, unpredictable levels of time-varying resource interference, and are names whose roots are from Old Norse, the language spoken by the geo-distributed data. And, in a fun twist, we are developing new approaches to Vikings who began washing ashore automatic tuning for ML systems—ML to improve ML. reg ularly in the late ninth century. The word ‘Skibo’ fascinates etymologists, I continue to be excited about the growth and evolution of the storage systems who are unable to agree on its original and cloud classes created and led by PDL faculty... the popularity of both is at an meaning. All agree that ‘bo’ is the Old Norse for ‘land’ or ‘place,’ but they argue all-time high. For example, over 100 MS students completed the storage class last whether ‘ski’ means ‘ships’ or ‘peace’ Fall, and the project-intensive cloud classes serve even more. We refined the new or ‘fairy hill.’ myFTL project such that the students’ FTLs store real data in the NAND Flash Although the earliest version of Skibo SSD simulator we provide them, which includes basic interfaces for read-page, seems to be lost in the mists of time, write-page, and erase-block. The students write FTL code to maintain LBA-to- it was most likely some kind of fortified physical mappings, perform cleaning, and address wear leveling. In addition to building erected by the Norsemen. The our lectures and the projects, these classes feature 3-5 corporate guest lecturers present-day castle was built by a bishop of the Roman Catholic Church. Andrew (thank you, PDL Consortium members!) bringing insight on real-world solu- Carnegie, after making his fortune, tions, trends, and futures. bought it in 1898 to serve as his sum mer home. In 1980, his daughter, Mar garet, PDL continues to explore questions of resource scheduling for cloud computing, donated Skibo to a trust that later sold which grows in complexity as the breadth of application and resource types grows. the estate. It is presently being run as a luxury hotel. Our Tetrisched project has developed new ways of allowing users to express their per-job resource type preferences (e.g., machine locality or hardware accelera- tors) and then exploring the trade-offs among them to maximize utility of the public and/or private cloud infrastructure. Our most recent work explores more 2 THE PDL PACKET PARALLEL DATA LABORATORY FROM THE DIRECTOR’S CHAIR FACULTY robust ways of exploiting estimated runtime information, finding that provid- Greg Ganger (PDL Director) ing full distributions of likely runtimes (e.g., based on history of “similar” jobs) 412•268•1297 [email protected] works quite well for real-world workloads as reflected in real cluster traces. In collaboration with the new Intel-funded visual cloud systems research center, David Andersen Seth Copen Goldstein Lujo Bauer Mor Harchol-Balter we are also exploring how processing of visual data should be partitioned and Nathan Beckmann Todd Mowry distributed across capture (edge) devices and core cloud resources. Chuck Cranor Onur Mutlu Lorrie Cranor Priya Narasimhan Our explorations of how systems should be designed differently to exploit and Christos Faloutsos David O’Hallaron Kayvon Fatahalian Andy Pavlo work with new underlying storage technologies, especially NVM and SMR, Eugene Fink Majd Sakr continues to expand on many fronts. Activities range from new search and sort Rajeev Gandhi M. Satyanarayanan Saugata Ghose Srinivasan Seshan algorithms that understand the asymmetry between read and write performance Phil Gibbons Alex Smola (i.e., assuming using NVM directly) to a modified Linux Ext4 file system imple- Garth Gibson Hui Zhang mentation that mitigates drive-managed SMR performance overheads. We’re STAFF MEMBERS excited about continuing to work with PDL companies on understanding where Bill Courtright,
Recommended publications
  • NUTANIX, INC. (Exact Name of Registrant As Specified in Its Charter)
    Table of Contents Index to Financial Statements As filed with the Securities and Exchange Commission on September 19, 2016 Registration No. 333-208711 UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 AMENDMENT NO. 5 TO FORM S-1 REGISTRATION STATEMENT UNDER THE SECURITIES ACT OF 1933 NUTANIX, INC. (Exact name of Registrant as specified in its charter) Delaware 7372 27-0989767 (State or other jurisdiction of (Primary Standard Industrial (I.R.S. Employer incorporation or organization) Classification Code Number) Identification Number) 1740 Technology Drive, Suite 150 San Jose, California 95110 (408) 216-8360 (Address, including zip code, and telephone number, including area code, of Registrant’s principal executive offices) Dheeraj Pandey Chief Executive Officer and Chairman Nutanix, Inc. 1740 Technology Drive, Suite 150 San Jose, California 95110 (408) 216-8360 (Name, address, including zip code, and telephone number, including area code, of agent for service) Copies to: Jeffrey D. Saper, Esq. Eric S. Whitaker, Esq. Jeffrey R. Vetter, Esq. Mark B. Baudler, Esq. Olive Huang, Esq. James D. Evans, Esq. Andrew D. Hoffman, Esq. Nutanix, Inc. Fenwick & West LLP Wilson Sonsini Goodrich & Rosati, P.C. 1740 Technology Drive, Suite 150 801 California Street 650 Page Mill Road San Jose, California 95110 Mountain View, California 94041 Palo Alto, California 94304 (408) 216-8360 (650) 988-8500 (650) 493-9300 Approximate date of commencement of proposed sale to the public: As soon as practicable after this Registration Statement becomes effective. If any of the securities being registered on this Form are to be offered on a delayed or continuous basis pursuant to Rule 415 under the Securities Act, check the following box.
    [Show full text]
  • Solutions Exchange Virtual Event Guide August 24–28, San Francisco
    SOLUTIONS EXCHANGE OVERVIEW SOLUTIONS EXCHANGE VIRTUAL EVENT GUIDE AUGUST 24–28, SAN FRANCISCO VMworld 2014 Solutions Exchange Virtual Event Guide 1 Experience the Cisco Unified Computing System in Booth 1217 Americas revenue market UCS Channel 1 #1 share in x86 blades 36,000+ Partners Bring IT Unleash the Possibilities In Collaboration with Intel® 1 Source: IDC Worldwide Quarterly Server Tracker, 2014 Q1, May 2014, Vendor Review Share © 2014 Cisco and/or its affiliates. All rights reserved. All third-party products belong to the companies that own them. Cisco, the Cisco logo, and Cisco UCS are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. Intel, the Intel logo, Xeon and Xeon Inside are trademarks or registered trademarks of Intel Corporation in the U.S. and/or other countries. All other trademarks are the property of their respective owners. VMworld 2014 Solutions Exchange Virtual Event Guide 2 SOLUTIONS EXCHANGE OVERVIEW Solutions Exchange Welcome Reception REFUELING LOUNGE 2435 SOLUTIONS EXCHANGE HALL CRAWL Feet hurting? Want a quick snack? Need The Solutions Exchange located in Sunday, August 24 4:00 PM–7:00 PM a pick-me-up to get ready for your next Tuesday, August 26 4:30 PM–6:00 PM Moscone South, Lower Level, features session? Stop in the Refueling Lounge and Join us in celebrating the VMworld 2014 hundreds of VMware partners showcasing Join us to kick off VMworld in the get a robust coffee, delicious cookies and Hall Crawl. Visit the following Sponsors for the latest virtualization and cloud Solutions Exchange in Moscone South, buttery popcorn.
    [Show full text]
  • 2017 Annual Report
    TO OUR STOCKHOLDERS I am honored to address our stockholders in our very first annual report since going public in September 2016. Nutanix is an 8-year old company, and I am incredibly proud of what we have achieved in the first arc of our company’s life. In less than six years since landing our first paying customer, we’ve sold products and services worth $2 billion, and more than 3/4 of that business has come in the last 24 months. We are one of the rare companies that achieved a $1 billion billings run rate at more than 50 percent annual growth. And yet we’ve barely begun our journey of transforming the computing landscape of the enterprise. Foresight and Simplicity Digitization (virtualization) is an unstoppable phenomenon in computing. We saw this with music, photography, and maps, as they all converged into pure software and as digital constructs in our consumer lives. We brought that foresight to enterprise storage and compute. And by doing so, we improved machine productivity by bringing data closer to applications, and also human productivity by breaking down artificial walls in IT departments. By standardizing on commodity hardware and a common operating system, we delivered eco- nomies of scale that were unprecedented in enterprise datacenters. But none of this would have been possible if we hadn’t obsessively focused on product design. The elegance of Nutanix products is in their simplicity, and in our ability to bring a consumer-grade experience to enterprise-grade systems. We are now on a path to digitizing networking, security, and effectively the entire datacenter.
    [Show full text]
  • Nutanix, Inc. 10K 2019 V1
    Nutanix is a global leader in cloud software and hyperconverged infrastructure solutions, making infrastructure invisible so that IT can focus on the applications and services that power their business. Companies around the world use Nutanix Enterprise Cloud OS software to bring one-click application management and mobility across public, private and distributed edge clouds so they can run any application at any scale with a dramatically lower total cost of ownership. The result is organizations that can rapidly deliver a high-performance IT environment on demand, giving application owners a true cloud-like experience. BOARD OF DIRECTORS NUTANIX CORPORATE HEADQUARTERS Dheeraj Pandey 1740 Technology Drive, Suite 150 Chief Executive Officer and Chairman, Nutanix, Inc. San Jose, CA 95110 (408) 216-8360 Susan L. Bostrom (408) 890-4833 Former Executive Vice President, Chief Marketing Officer, www.nutanix.com Worldwide Government Affairs, Cisco Systems, Inc. INVESTOR RELATIONS Craig Conway Tonya Chin Former Chief Executive Officer, PeopleSoft, Inc. Vice President, Corporate Communications and Investor Relations (408) 560-2675 Steven J. Gomo Email: [email protected] Former Chief Financial Officer, NetApp, Inc. You may also reach us by visiting the investor relations John McAdam portion of our website at: ir.nutanix.com Former Chief Executive Officer, F5 Networks, Inc. Our Class A common stock trades on The Nasdaq Ravi Mhatre Global Select Market under the ticker symbol NTNX. Managing Director, Lightspeed Ventures REGISTRAR AND TRANSFER AGENT Jeffrey T. Parks For questions regarding stockholder accounts or changes General Partner, Riverwood Capital of address, please contact our transfer agent: Michael P. Scarpelli Computershare Trust Company, N.A.
    [Show full text]
  • NUTANIX, INC. (Exact Name of Registrant As Specified in Its Charter)
    UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 FORM 10-K (Mark One) ☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the fiscal year ended July 31, 2020 OR ☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the transition period from to Commission File Number: 001-37883 NUTANIX, INC. (Exact name of registrant as specified in its charter) Delaware 27-0989767 (State or other jurisdiction of (I.R.S. Employer incorporation or organization) Identification No.) 1740 Technology Drive, Suite 150 San Jose, CA 95110 (Address of principal executive offices, including zip code) (408) 216-8360 (Registrant's telephone number, including area code) Securities registered pursuant to Section 12(b) of the Act: Title of each class Trading symbol(s) Name of each exchange on which registered Class A common stock, $0.000025 par value per share NTNX NASDAQ Global Select Market Indicate by check mark whether the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes ☒ No ☐ Indicate by check mark whether the registrant is not required to file reports pursuant to Section 13 or 15(d) of the Securities Exchange Act of 1934. Yes ☐ No ☒ Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days.
    [Show full text]
  • Ntnx-07.31.2018-10K
    DEAR STOCKHOLDERS, This September, we completed two years as a public company and nine years since we were founded in 2009. It has not even been a decade and we have already done more than $3.5 billion in lifetime sales, transformed to a software business model while being publicly traded, surpassed $1 billion in annual software and support revenue run rate, and achieved the 10,000 customer mark while still keeping our Net Promoter Score above 90. And during this time, we’ve consistently maintained our leadership position in Gartner, Forrester, and IDC rankings for hyperconverged infrastructure (HCI). For a company that until two years ago — during its IPO roadshow — was prematurely assumed by Wall Street as being swept away in the wake of public cloud, we’ve definitely held our own with Main Street. Our market thesis — of what cloud computing would look like between private (owned) and public (rented) models — is now the dominant thinking BOARD OF DIRECTORS NUTANIX CORPORATE HEADQUARTERS within the enterprise. The new era of hyper-convergence is upon us. Blurring the lines between owned and rented models is the most difficult computer science problem of this decade. Not many companies are ready to exploit the opportunity, their megaphones notwithstanding. www.nutanix.com True Private Cloud is Hard; Requires Integrity INVESTOR RELATIONS The core of HCI is software-defined infrastructure, in that all datacenter devices need to move to pure software running on commodity x86 servers. Standardized hardware, a common operating system, consumer-grade design, and deep automation are the distinct [email protected] virtues that make a true cloud, public or private.
    [Show full text]
  • Storage Spectacular!
    content provided by STORAGE SPECTACULAR! A comprehensive look at virtual storage from Virtualization Review and Redmond magazine, plus a Storage Buyer’s Guide. > The Storage Infrastruggle Page 1 > Do You Really Need Storage Management Software? Page 17 > Storage: Virtualized vs. Software-DefinedPage 27 > Storage Disruptors Page 36 > In the Cloud Era, the Era of Convergence Is Upon Us Page 49 > The 2014 Virtualization Review Buyers Guide Page 56 SPONSORS Storage Spectacular The Storage Infrastruggle Vendors are battling for the future of your storage spend, but are they sidestepping the key drivers of storage cost? By Jon Toigo hile touting “new,” flash-heavy “server-side” topologies and so-called “software-defined architec- tures” as evolutionary replacements for the “legacy” W SANs and NAS appliances that companies deployed in their previous refresh cycles, the industry continues to ignore the more fundamental drivers of storage inefficiency and cost: lack of management both of infrastructure and of data. 1 Storage Spectacular You would think from news reports that the storage industry was on its last legs. Vendor revenues from sales of everything from hard disk drives to external storage arrays, storage area networks (SANs) and network attached storage (NAS) appliances are either flat or declining ever so slightly. Seagate and Western Digital have hit a patch of dol- drums in which the trends of prior years—the doubling of drive capacities every 18 months and the halving of cost per GB every year—have frozen in their tracks. Even the redoubtable EMC, NetApp and IBM are experiencing purchasing slowdowns at customer shops that shouldn’t be happening if the capacity demand explosion that analysts say accompanies server virtualization is to be believed.
    [Show full text]
  • 2020 Annual Report & Proxy Statement
    Nutanix is a global leader in cloud software and a pioneer in hyperconverged infrastructure solutions, making computing invisible anywhere. Organizations around the world use Nutanix software to leverage a single platform to manage any app at any location for their private, hybrid and multicloud environments. BOARD OF DIRECTORS NUTANIX CORPORATE HEADQUARTERS Dheeraj Pandey 1740 Technology Drive, Suite 150 Chief Executive Officer and Chairman, Nutanix, Inc. San Jose, CA 95110 (408) 216-8360 Sohaib Abbasi (408) 890-4833 Former Chairman, Chief Executive Officer and President, www.nutanix.com Informatica Corporation INVESTOR RELATIONS Susan L. Bostrom Tonya Chin Former Executive Vice President, Chief Marketing Officer, SVP, Corporate Marketing, IR and Chief Communications Officer Worldwide Government Affairs, Cisco Systems, Inc. (408) 560-2675 Email: [email protected] Craig Conway Former Chief Executive Officer, PeopleSoft, Inc. You may also reach us by visiting the investor relations portion of our website at: ir.nutanix.com Virginia Gambale Managing Partner, Azimuth Partners LLC Our Class A common stock trades on The Nasdaq Global Select Market under the ticker symbol NTNX. Steven J. Gomo Former Chief Financial Officer, NetApp, Inc. REGISTRAR AND TRANSFER AGENT For questions regarding stockholder accounts or changes Max de Groen of address, please contact our transfer agent: Managing Director, Bain Capital Private Equity Computershare Trust Company, N.A. David Humphrey 462 South 4th Street, Suite 1600 Managing Director, Bain Capital Private Equity Louisville, KY 40202 T (U.S. and Canada): (877) 373-6374 Ravi Mhatre T (Outside U.S. and Canada): (781) 575-3100 Managing Director, Lightspeed Ventures www.computershare.com Jeffrey T.
    [Show full text]