Understanding and Optimizing I/O Virtualization in Data Centers


by Ron Chi-Lung Chiang

M.Sc. in Computer Science, May 2001, National Chung Cheng University
B.Sc. in Computer Science, May 1999, Tamkang University

A Dissertation submitted to The Faculty of The School of Engineering and Applied Science of the George Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

January 31, 2014

Dissertation directed by H. Howie Huang, Assistant Professor of Engineering and Applied Science

The School of Engineering and Applied Science of The George Washington University certifies that Ron Chi-Lung Chiang has passed the Final Examination for the degree of Doctor of Philosophy as of August 28, 2013. This is the final and approved form of the dissertation.

Dissertation Research Committee:

Howie Huang, Assistant Professor of Engineering and Applied Science, Dissertation Director
Tarek El-Ghazawi, Professor of Engineering and Applied Science, Committee Member
Suresh Subramaniam, Professor of Engineering and Applied Science, Committee Member
Guru Venkataramani, Assistant Professor of Engineering and Applied Science, Committee Member
Timothy Wood, Assistant Professor of Computer Science, Committee Member

Dedication

To my beloved wife Claire H. Huang and my family.

Acknowledgement

A PhD dissertation is never accomplished by individual effort alone. I am indebted to all the people who inspire, motivate, and support me in my PhD odyssey. First and foremost, I give my sincere gratitude to my dissertation advisor, Prof. Howie Huang. His immense passion and relentless enthusiasm for doing great research always motivate and encourage me. His accurate guidance steers my vision and goal in the right direction.
Without his great support, I would not have been able to finish my journey of pursuing a PhD. I am also grateful to my dissertation committee members, Prof. Tarek El-Ghazawi, Prof. Suresh Subramaniam, Prof. Guru Prasadh Venkataramani, and Prof. Timothy Wood, for their valuable mentorship throughout my journey and for helping me polish this dissertation. Their insightful acumen and professional acuity strongly support and strengthen my dissertation. I am very fortunate to have the best collaborators in the lab. I express my appreciation to my lab mates, Xin Xu, Hang Liu, Ahsen Uppal, Jie Chen, and Jinho Hwang. I will miss their company during lunch, research, and coursework. I thank Dr. Oliver Spatscheck and Dr. Simon X. Chen for offering me an internship opportunity at AT&T Lab. Last but not least, I give deep thanks to my dearest wife, Claire H. Huang, who has given me endless support, encouragement, and morale boosts over the years. I thank my parents for understanding and supporting my adventure. This work is supported in part by the National Science Foundation.

Abstract

Large-scale data centers leverage virtualization technology to achieve excellent resource utilization, scalability, and high availability. Ideally, the performance of an application running inside a virtual machine (VM) should be independent of co-located applications and VMs that share the physical machine. However, adverse interference effects exist and are especially severe for data-intensive applications in such virtualized environments.

We demonstrate on Amazon Elastic Compute Cloud (EC2) a new type of performance vulnerability caused by competition among virtual I/O workloads. An adversary could intentionally slow down the execution of a targeted application in a VM that shares the same hardware.
In Chapter 3, we design and implement Swiper, a framework which uses a carefully designed workload to incur significant delays on the target VM with minimum cost (i.e., resource consumption). We conduct a comprehensive set of experiments in EC2, which clearly demonstrates that Swiper is capable of significantly slowing down various server applications while consuming a small amount of resources.

Our subsequent research on the interference effect leads us to construct mathematical models of resource contention and to leverage the modeling results in task scheduling. In Chapter 4, we present TRACON, a novel Task and Resource Allocation CONtrol framework that mitigates the interference effects from concurrent data-intensive applications and greatly improves the overall system performance. TRACON utilizes modeling and control techniques from statistical machine learning and consists of three major components: the interference prediction model that infers application performance from resource consumption observed from different VMs, the interference-aware scheduler that is designed to utilize the model for effective resource management, and the task and resource monitor that collects application characteristics at runtime for model adaptation. We implement TRACON on a cluster and validate its effectiveness with experiments using a variety of cloud applications. Experimental results show that TRACON achieves up to 25% improvement in application throughput.

Swiper and TRACON address the contention on the shared physical resources among co-located VMs. In addition, other main factors contributing to VM performance unpredictability include limited control of VM allocation as well as lack of knowledge about the performance of a specific VM out of tens of VM types offered by public cloud providers.
In Chapter 5, we propose Matrix, a novel performance and resource management system that ensures that the performance an application achieves on a VM closely matches its performance on a target physical server. To this end, Matrix utilizes machine learning methods (clustering models with probability estimates) to predict the performance of new workloads in a virtualized environment, choose a suitable VM type, and dynamically adjust the resource configuration of a VM on the fly. Evaluations on a private cloud and two public clouds (Rackspace and Amazon EC2) show that, for an extensive set of cloud applications, Matrix is able to estimate application performance with 90% average accuracy. In addition, Matrix can deliver the target performance within 3% variance, and do so with the best cost-efficiency in most cases.

In addition to the above work, which addresses performance issues on top of the virtualization framework, our exploration goes deeper into the virtualization architecture to design innovative I/O virtualization frameworks. Traditional data prefetching has focused on applications running on bare metal systems using hard drives. In contrast, virtualized systems using solid-state drives (SSDs) present different challenges for data prefetching. Most existing prefetching techniques, if applied unchanged in virtualized environments, are likely to either fail to fully capture I/O access patterns, interfere with blended I/O requests, or cause too much overhead if run in every virtualized instance, all of which could result in undesirable application performance. In Chapter 6, we demonstrate that data prefetching, when running in a virtualization-friendly manner, can provide significant performance benefits for a wide range of data-intensive applications. We have implemented and evaluated VIO-prefetching in a Linux system with the Xen hypervisor.
Our comprehensive study provides insights into VIO-prefetching's behavior under various virtualization system configurations, e.g., the number of VMs, in-guest processes, application types, etc. The proposed method improves virtual I/O performance by up to 43%, with an average of 14%, for 1 to 12 VMs while running various applications on a Xen virtualization system.

In brief, this dissertation shows that virtualization overheads and architectures in cloud computing environments are critical to performance, and proposes effective novel approaches which advance the state of the art. More specifically, Swiper and TRACON construct mathematical models and scheduling algorithms to mitigate the interference problem; Matrix leverages machine learning and optimization techniques to realize the "equivalence" property of virtualization with the best cost-efficiency; and VIO-prefetching fundamentally changes the prefetching scheme in virtualization architecture and improves virtual I/O throughput. The results of this dissertation also point to numerous possibilities for advancing virtualization and cloud computing technology.

Contents

Dedication
Acknowledgement
Abstract
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Swiper
  1.2 TRACON
  1.3 Matrix
  1.4 VIO-Prefetching
  1.5 Contributions
  1.6 Dissertation Organization

2 Background and Related Work
  2.1 Amazon Elastic Compute Cloud
  2.2 Virtualization
  2.3 Preliminary Interference Experiments
  2.4 Related Work
    2.4.1 Swiper
    2.4.2 TRACON
    2.4.3 Matrix
    2.4.4 VIO-prefetching

3 Swiper
  3.1 Introduction
  3.2 Threat Model
    3.2.1 Resource Sharing in Cloud Computing Systems
    3.2.2 Problem Definition
  3.3 I/O-Based Co-Location Detection
  3.4 Resource Competition for a Two-Party System
    3.4.1 Technical Challenges for Reaching the Maximum Delay
    3.4.2 Main Ideas for Synchronization
    3.4.3 Performance Attack
  3.5 Systems with Background Processes
    3.5.1 Synchronization in Multi-VM Systems
    3.5.2 Length of Observation Process
  3.6 Experiment Results
    3.6.1 Experiment Setup
    3.6.2 Comparison with Baseline Attacks
    3.6.3 Analysis of Performance Attack
    3.6.4 Analysis of Synchronization Accuracy
  3.7 Dealing with User Randomness
  3.8 Attacking Migratable VMs
  3.9 Potential Monetary Loss

4 TRACON
  4.1 TRACON System Architecture
  4.2 Interference Prediction Model
  4.3 Interference-Aware Scheduling
    4.3.1 Machine Learning Based Scheduling
  4.4 Simulation
    4.4.1 Data-intensive Benchmarks
    4.4.2 Simulation Settings
    4.4.3 Performance of Prediction Models
    4.4.4 Task Scheduling with Different Models