ElasticMark: An Elasticity Benchmarking Framework for Cloud Platforms

Sadeka Islam

Dissertation submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Computer Science and Engineering, Faculty of Engineering

March 2016

THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: Islam
First name: Sadeka
Abbreviation for degree as given in the University calendar: PhD
School: Computer Science and Engineering (CSE)
Faculty: Engineering
Title: ElasticMark: An Elasticity Benchmarking Framework for Cloud Platforms


ORIGINALITY STATEMENT

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signed: Sadeka Islam, Mar 2016

COPYRIGHT STATEMENT

I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.

Sadeka Islam Mar 2016

AUTHENTICITY STATEMENT

I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.

Sadeka Islam Mar 2016

ACKNOWLEDGEMENTS

When life is sweet, say thank you and celebrate. And when life is bitter, say thank you and grow.

Shauna Niequist

The path to a doctoral degree is full of many ups and downs. Thanks to the Almighty Lord for granting me the ability to adapt well to those fluctuations! More specifically, I am thankful to Him for letting me work on a fabulous topic - elasticity; the more time I spent with it, the more I became enamored with its beauty. This is the area where my mind loves to surf, this is the place where my heart finds its route to contentment!

My passion towards research did not grow suddenly out of thin air; as I look back and connect the dots, I see a group of brilliant scholars as my primary source of inspiration. It is their passionate zeal for research that motivates my young research mind, it is their amazing influence that encourages me to strive hard for quality research. I owe them a huge debt of gratitude. Among them, first comes the name of my supervisor, Dr. Anna Liu; she is an excellent mentor with a good combination of academic skills and business insights. Her inspiring words significantly boosted my confidence. Her thoughtful guidance helped me reach completion within the assigned timeframe. I also feel blessed to have worked with Professor Alan Fekete; his creative thinking and visionary views have had a profound impact on my overall research. He also guided me to the fascinating world of benchmarking. Working with my co-supervisor, Dr. Srikumar Venugopal, enhanced my research rigor. He also guided me through the exciting sphere of control-theoretic elasticity mechanisms. My joint supervisor, Dr. Hiroshi Wada, provided me with invaluable advice for my reading course and earlier work, which quickly got me up and running. I am also indebted to Dr. Sherif Sakr for his sound guidance on my literature review work. Professor Len Bass's insightful feedback proved to be instrumental in shaping my dissertation; I remember that with sincere gratitude. I also worked with Dr. Jacky Keung and Dr. Kevin Lee in my earlier research projects; I appreciate their help and advice.

I am profoundly grateful to my annual review panel - Professor Ross Jeffery, Professor Fethi Rabhi, Dr. Helen Paik and Dr. Adnene Guabtni, for their thoughtful comments and improvement suggestions. Ross also helped me define my research scope using GQM; his amazing mentorship is truly appreciated with respect.

Among others, I'd like to thank the UNSW Learning Centre and Ms. Pam Mort for arranging the thesis writing workshop. My sincere gratitude goes to Mr. Colin Taylor for his sound advice regarding administrative matters. I am also grateful to Coursera for offering high quality online courses, which not only quenched my thirst for knowledge but also influenced my thinking pattern.

During this PhD, I submitted my papers to a number of conferences and journals and received insightful suggestions from anonymous reviewers and shepherds. I appreciate their invaluable feedback; undoubtedly, it enhanced the quality of my work. My sincere gratitude also goes to all researchers and practitioners who provided constructive feedback on my research at those conferences.

I acknowledge the generous research grants from Amazon Web Services (AWS) and RightScale, which helped me carry out a series of comprehensive experiments. I received several travel grants too, which allowed me to present my research at numerous conferences; thank you all for supporting a poor PhD student.

And finally, I would like to express my heartfelt gratitude to my family for their pure, transparent, unconditional love and support. I am really fortunate to have a wonderful dad who, despite not having a background in computer science, eagerly reads my research papers and asks me so many interesting questions. He and my extraordinary father-in-law also sent me motivational messages and blessings for my successful completion. I also appreciate my husband's incredible love and support during the roller-coaster ride of this PhD. These people are the precious assets of my life, my everlasting source of inspiration. Contentment is nothing but a sparkling of silent appreciation in their eyes whenever I accomplish something. Words lose their expressive power whenever I try to thank these beautiful minds!

To the Almighty God, who has given me the ability to learn and reason, enlightened my mind with knowledge, and guided me towards a successful completion.

ABSTRACT

Elasticity is the unique cost-effective proposition of the cloud. It promises rapid adjustment of resources in response to varying workloads so that the application can meet its QoS objectives with minimal operational expenses. It is a crucial attribute that all commercial cloud providers frequently claim to possess in their offerings. However, existing literature has not yet provided any meaningful and systematic guidance to evaluate the elasticity of the cloud platform from the consumer's viewpoint. The lack of an elasticity benchmarking framework makes it difficult for the consumer to diagnose and avoid elasticity issues, validate various claims of elasticity and compare the desirability of competing elastic cloud platforms.

As such, this thesis proposes ElasticMark, a novel consumer-centric elasticity benchmarking framework that reflects the elasticity of the cloud platform as a single figure of merit. It takes the consumer's perspective on running the benchmark by incorporating her application and workload profiles and then encapsulating the consumer's business objectives into the elasticity metric based on observations accessible via the cloud APIs. The core framework is comprised of a penalty-based elasticity measurement model, a standard workload suite with time-varying workload patterns and a set of guidelines for instantiating an executable benchmark. The measurement model derives the elasticity metric based on financial penalty rates resulting from over-provisioning (unutilized resources) and under-provisioning (inadequate resources) when the cloud-hosted application is exposed to a suite of fluctuating workloads. ElasticMark also includes a novel workload model to assist the consumer in generating representative prototypes of her application-specific fine-scale bursty workloads, thus facilitating custom elasticity benchmarking. Furthermore, this framework recommends a set of rigorous techniques to ensure repeatable and valid benchmarking results in the presence of the performance unpredictability of the cloud environment. The framework has been validated against a widely-used commercial cloud service to make sure that the elasticity metric is a good reflection of the low-level adaptability characteristics and observed phenomena. It has also been proven effective in comparing and contrasting the elasticity of multiple cloud platforms as well as pinpointing anomalies in their adaptive behaviors.

PUBLICATIONS AND RESEARCH GRANTS

Publications that have been included in this dissertation:

• Sadeka Islam, Srikumar Venugopal, and Anna Liu. Evaluating the impact of fine-scale burstiness on cloud elasticity. In Proceedings of the Sixth ACM Symposium on Cloud Computing, pages 250–261. ACM, 2015.

• Sadeka Islam, Kevin Lee, Alan Fekete, and Anna Liu. How a consumer can measure elasticity for cloud platforms. In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pages 85–96. ACM, 2012.

Other publications that haven’t been included in this dissertation:

• Sadeka Islam, Jacky Keung, Kevin Lee, and Anna Liu. An empirical study into adaptive resource provisioning in the cloud. In IEEE International Conference on Utility and Cloud Computing (UCC 2010), page 8, 2010.

• Sadeka Islam, Jacky Keung, Kevin Lee, and Anna Liu. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems, 28(1):155–162, 2012.

Research grants:

• AWS Education Research Grant.

• RightScale Premium Education Account.

Contents

List of Abbreviations xv

List of Figures xviii

List of Tables xix

1 Introduction 1

1.1 Cloud computing ...... 1

1.1.1 Deployment models ...... 3

1.1.2 Service models ...... 4

1.1.3 Pricing models ...... 5

1.1.4 Cloud ecosystem ...... 6

1.2 The truth about elasticity: expectation vs. reality ...... 8

1.2.1 Consumers’ expectations about elasticity ...... 8

1.2.2 Reality: imperfect elasticity ...... 10

1.3 Why is an elasticity benchmark the need of the hour? ...... 12

1.4 State of the art ...... 14

1.5 Research problem, hypothesis and goal ...... 14

1.6 Contribution and impact ...... 16

1.7 Scope and assumptions ...... 20

1.7.1 Scope ...... 20


1.7.2 Assumptions ...... 22

1.8 Research method ...... 23

1.9 Terminologies used ...... 24

1.10 Thesis overview ...... 27

2 Background and state of the art 29

2.1 Background ...... 29

2.1.1 Elasticity: definition and characteristics ...... 30

2.1.2 Comparison: Elasticity, Scalability and Efficiency ...... 34

2.1.3 Foundation ...... 36

2.2 Related work ...... 41

2.2.1 Cloud performance analysis and benchmarking ...... 41

2.2.2 Elasticity benchmarking: initial concepts ...... 44

2.2.3 Elasticity benchmarking frameworks ...... 45

2.2.4 Elastic scaling techniques ...... 67

2.2.5 Elasticity quality assurance frameworks ...... 69

2.2.6 Elasticity modeling and simulation ...... 70

2.3 Summary ...... 71

3 The core framework 75

3.1 Introduction ...... 75

3.2 Consumer-centric elasticity measurement ...... 78

3.3 Elements of the elasticity benchmarking framework ...... 80

3.3.1 Penalty model ...... 80

3.3.2 Single figure of merit for elasticity ...... 83

3.3.3 Workload suite specification ...... 84

3.4 Executable benchmark instantiation ...... 85

3.4.1 Choices for an elasticity benchmark ...... 86

3.4.2 Experimental setup ...... 89

3.4.3 Configuration and measurement procedure ...... 91

3.5 Case studies ...... 91

3.5.1 Exploring workload patterns ...... 91

3.5.2 Exploring the impact of scaling rules ...... 96

3.6 Discussion ...... 101

3.7 Critical reflection ...... 103

3.8 Conclusion ...... 109

4 Customized benchmarking? Use fine-scale bursty prototypes 111

4.1 Introduction ...... 111

4.2 Fine-scale burstiness: evidence and repercussions ...... 114

4.3 Prevalent fine-scale burstiness studies ...... 117

4.4 Multifractal analysis ...... 118

4.5 Modeling methodology ...... 121

4.5.1 Step 1: Characterization of pointwise regularity ...... 121

4.5.2 Step 2: Synthesis of fine-scale burstiness ...... 122

4.5.3 Step 3: Trend construction and superposition ...... 123

4.5.4 Working example ...... 125

4.5.5 Comparison with other methods ...... 127

4.6 Experimental setup ...... 128

4.7 Case study ...... 129

4.7.1 Effect of fine-scale burstiness on elasticity ...... 130

4.7.2 Trends in the elasticity penalty rate ...... 135

4.7.3 Summary ...... 136

4.8 Critical reflection ...... 137

4.9 Conclusion ...... 140

5 Validity and repeatability? Tame the unpredictability 141

5.1 Introduction ...... 142

5.2 Runtime variability in elasticity behavior ...... 144

5.3 Causes of runtime variability ...... 147

5.4 Prevalent evaluation methodologies ...... 149

5.4.1 General features ...... 150

5.4.2 Common pitfalls ...... 151

5.5 Rigorous evaluation techniques ...... 153

5.5.1 Experiment design ...... 153

5.5.2 Data analysis ...... 158

5.6 Experimental setup ...... 164

5.7 Case study ...... 166

5.7.1 Rigorous method ...... 167

5.7.2 Comparison with prevalent methods ...... 170

5.7.3 Summary ...... 174

5.8 Critical reflection ...... 175

5.9 Conclusion ...... 178

6 Conclusion and future work 181

6.1 Recap on research problem and objective ...... 181

6.2 Contributions ...... 182

6.3 Critical reflection ...... 186

6.4 Future directions ...... 191

6.5 Concluding remarks ...... 195

Bibliography 197

Appendices 220

A Elasticity definition 221

B Elasticity behavior of standard workloads 227

B.1 Standard workload suite ...... 227

B.2 Elasticity behaviors ...... 231

C Elasticity behavior of fine-scale bursty workloads 253

D A comparison of elasticity evaluation methodologies: Environmental bias perspective 257

E Variability in elasticity behavior 259

List of Abbreviations

AWS Amazon Web Services.
B2C Business to Consumer.
CoV Coefficient of Variation.
DaaS Database as a Service.
DNS Domain Name System.
DoE Design of Experiment.
EC2 Elastic Compute Cloud.
ECU Elastic Compute Unit.
GAE Google App Engine.
GCE Google Compute Engine.
GQM Goal Question Metric.
IaaS Infrastructure as a Service.
NIST National Institute of Standards and Technology.
OLTP Online Transaction Processing.
PaaS Platform as a Service.
QoS Quality of Service.
RDBMS Relational Database Management Systems.
RDS Relational Database Service.
ROI Return On Investment.
RPS Requests Per Second.
S3 Simple Storage Service.
SaaS Software as a Service.
SLA Service Level Agreement.
SLO Service Level Objective.
SME Small and Medium Enterprise.
SOA Service-Oriented Architecture.
SPEC Standard Performance Evaluation Corporation.
SUT System Under Test.
TCO Total Cost of Ownership.

List of Figures

1.1 Cloud service models ...... 4

1.2 Cloud ecosystem (adapted from [67]) ...... 7

1.3 Gartner’s hype cycle for cloud computing 2011 ...... 8

1.4 Traditional IT vs. automated elasticity (adapted from [233]) ...... 9

1.5 Elasticity behavior of EC2 platform in response to a sinusoidal workload . . 11

1.6 Motivation for an elasticity benchmark ...... 12

1.7 Research method: high-level approach ...... 23

2.1 Architecture of an elastic system (adapted from [126]) ...... 37

2.2 Taxonomy of factors for analyzing elasticity benchmarking frameworks . . . 46

2.3 Overview of macro-benchmarking modeling approaches ...... 48

2.4 Elasticity measurement concept of Dory et al. (adapted from [94]) . . . . . 52

2.5 Overview of Micro-benchmarking modeling approaches ...... 56

2.6 Evaluation criteria for elasticity benchmarking frameworks ...... 63

2.7 Classification scheme for elasticity techniques (adapted from [110]) . . . . . 67

3.1 Elasticity behavior of EC2 in response to a periodic sinusoidal workload . . 79

3.2 Elasticity behavior of EC2 with ruleset 1 in response to a sinusoidal workload with 30 minutes period ...... 93

3.3 Results of the trapping scenario ...... 95


3.4 Elasticity behavior of EC2 platform with ruleset 1 for exponential workload with growth 24/hour and decay 3/hour ...... 98

3.5 Rippling effect for sinusoidal workload with 90 minutes period ...... 100

4.1 Wikipedia workload snippet (Oct 1 2007) ...... 114

4.2 EC2 platform’s elasticity behavior under non-bursty and bursty workloads . 115

4.3 Partition function, Z(q,a) vs. Timescale, a ...... 120

4.4 Scaling exponent, τ(q) vs. q ...... 120

4.5 Hölder function, deterministic trend and generated fine-scale bursty prototype ...... 126

4.6 Workload reproduced using LIMBO toolkit ...... 127

4.7 EC2 platform’s elasticity behavior for non-bursty and bursty workloads (smooth vs. sigma150) ...... 131

4.8 Estimated probability of average cpu utilization ...... 132

4.9 Behavior of the Tomcat application server under non-bursty and bursty workloads (smooth vs. sigma150) ...... 133

4.10 Response time percentiles ...... 135

4.11 Trends in elasticity penalty rates ...... 136

5.1 Random elasticity behavior of the m1.medium instance ...... 146

5.2 Elasticity affected by random scaling delay ...... 147

5.3 Diversified workload suite smooths out the random bias to some extent . . 154

5.4 An example 3-level experiment design ...... 157

5.5 Runtime variability in the elasticity score when comparing the elastic improvement of the m1.medium instance with respect to the m1.small instance ...... 169

5.6 Quantile-quantile (qq) plot of the sample elasticity scores ...... 169

5.7 Prevalent method evaluation ...... 173

List of Tables

2.1 Comparison: Elasticity, Scalability and Efficiency ...... 36

2.2 Characteristics of different elasticity macro-benchmarking frameworks . . . 55

2.3 Characteristics of elasticity micro-benchmarking frameworks ...... 62

2.4 Evaluation of elasticity benchmarking frameworks ...... 65

3.1 engine configuration ...... 90

3.2 Penalty for Benchmarking Workloads - Ruleset 1 ...... 97

3.3 Penalty for Benchmarking Workloads - Ruleset 2 ...... 97

3.4 Penalty for Benchmarking Workloads - Ruleset 3 ...... 97

4.1 QoS degradation for under-provisioning in non-bursty and bursty workloads 117

4.2 Elasticity penalty for fine-scale burstiness ...... 130

5.1 State of the art elasticity evaluation methodologies ...... 150

5.2 Elasticity metrics for a 3-level hierarchical experimental design (followed [143])...... 168

5.3 Classification of the outcomes of the prevalent method and the rigorous method ...... 170


Chapter 1

Introduction

“Mirror, mirror on the wall, which cloud is the most elastic of all?”

Cloud consumer

1.1 Cloud computing

Cloud computing is a popular computing paradigm that delivers IT capabilities on demand over the network [204]. Although the widespread adoption of cloud computing has gained momentum over the last few years, its root can be traced back to the mainframe computing era when computers were beyond the affordability range of individuals and small companies. Large enterprises, though they could afford to buy such expensive computers, were constantly seeking ways to increase their utilization using timesharing technology in order to improve their Return On Investment (ROI) [72]. These circumstances eventually drove these companies to find a cost-effective alternative to the mainframe system. Therefore, in 1961 when Professor John McCarthy suggested the concept of computing delivered as a public utility, it was seen as a way to solve all of the aforementioned problems at once [36]. This idea received immense popularity in the late '60s; however, it eventually faded away because of several issues, such as application stack lock-ups due to incompatibilities among hosted application environments, inefficient network infrastructure etc. [107, 68]. It took several decades to establish the concept of utility computing as a sustainable reality. Technological innovations, such as cheaper micro-processors and

datacenters, fiber optic networks, virtualization, web services - all of these contributed to the realization of computing as a public utility [68]. Cloud computing indeed embraces the idea of utility computing; it delivers IT capabilities (such as infrastructure, software, development environments) as a service on demand and the cost incurred is approximately proportional to the amount of resources consumed. It facilitates economies of scale for both parties - the provider and the consumer; the provider can now enjoy better server utilization by serving a large consumer base, thereby achieving faster ROI as well as lower Total Cost of Ownership (TCO). Likewise, the elimination of upfront capital investment as well as cheaper and readily accessible services on demand significantly reduces the cost per transaction at the consumer's end; therefore, the consumer's application can now serve the end-users with lower operational expenses. Undoubtedly, "economies of scale" is the key driving force behind the mainstream adoption of the cloud over the recent years [204].

Substantial efforts were spent to provide a precise definition of "cloud computing" and characterize its key features. Some of these attempts viewed the cloud from a technical perspective [232, 108, 69, 11, 70, 106, 68, 48], while some others focused solely on the delivery model that enables the consumer's business agility [204, 176, 177, 178]. Despite these differences in viewpoints, all of these works unanimously agreed on the following key characteristics of cloud computing: on-demand self-service so that the consumer can provision and release resources automatically without any human intervention from the cloud service provider; broad network access so that resources are readily accessible from anywhere through different types of client platforms (e.g., mobiles, tablets, laptops, workstations); a set of pooled resources at the service provider to dynamically serve the demand of multiple consumers; rapid elasticity to enable quick allocation and deallocation of resources in response to fluctuating demand; and measured service to leverage pay-as-you-go pricing for resource usage. Among these features, elasticity and metered pricing enable the cloud's unique vision of offering economies of scale; as resources are charged based on actual usage in the cloud, this implies that elasticity has the power to shape the operational expenses in proportion to the workload intensity faced by the application. This is what most Small and Medium Enterprises (SMEs) and even large enterprises have yearned for over a long time; they neither wanted to throw away their resources on an application which cannot draw enough customers (over-provisioning) nor wanted to starve an application of resources when it suddenly becomes wildly popular overnight (under-provisioning). What they really wanted to have is resource capacity that matches the workload intensity in real time, and cloud elasticity brings this long-cherished dream into reality.

1.1.1 Deployment models

The cloud deployment model refers to different ways of setting up the cloud environment, typically characterized by ownership, size and accessibility concerns [74]. Several factors need to be considered while choosing a deployment model for hosting a cloud application, such as the application's requirements, time-to-market needs, overall budget, security and so on. At present, cloud computing has four types of deployment models [11, 48, 178]: private cloud, community cloud, public cloud and hybrid cloud.

In a private cloud, the datacenter's hardware and software are operated and managed by the organization itself or a third party and these resources remain dedicated to the internal use of the organization (i.e., they are not available to the general public). Large enterprises (and also government organizations) may find the private cloud beneficial because of enhanced security, accountability and resilience. However, the upfront capital investment, longer time-to-market (typically 6-36 months) and management overhead may make it less attractive to small startups and medium enterprises.

Community cloud is a more generalized form of the private cloud where the datacenter's resources are managed and operated by several organizations or a third party to support some common interests (e.g., security and compliance requirements, mission-critical objectives) of the community. Since the cloud resources are shared by more than one organization, the participating organizations need to establish fair charging policies and governance procedures for the efficient operation of the community cloud.

Public cloud is the most generalized form in the spectrum, where the datacenter's resources are made available to the general public in a pay-as-you-go manner. In the public cloud scenario, the datacenter is owned and maintained by an organization and it sells cloud resources based on its own charging policy. Public cloud eliminates the risks of upfront capital investment and longer time-to-market as well as maintenance overheads; on the downside, the consumers lose some visibility and control over the internal workflow of the underlying virtualization environment.

The last one, the hybrid cloud deployment model, is composed of several clouds (private, community, or public) which act as standalone entities but are bound together by some standardized technology. Sometimes enterprises may need to use a combination of private and public cloud (i.e., the hybrid cloud) to handle sudden surges in their workloads (e.g., during a product launch event or Black Friday deals).

1.1.2 Service models

Any resource which is delivered over the Internet can be considered a cloud service. Some of these cloud services provide infrastructure resources, some offer specialized development environments and proprietary APIs for developers and testers, while some others deliver sophisticated software applications to promote business productivity. These services are classified into three main categories: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [11, 232, 48, 178, 161, 169, 63]. These services are interdependent and form different layers in the cloud technology stack; this is depicted in Fig. 1.1.

Figure 1.1: Cloud service models. The stack comprises SaaS (e.g., Google Docs, Box.net, Salesforce CRM) on top of PaaS (e.g., Elastic Beanstalk, Azure Cloud Services, Google App Engine), which in turn sits on IaaS (e.g., EC2, Rackspace, Azure Virtual Machines, Google Compute Engine).

IaaS is the most rudimentary form of cloud computing, where the service providers offer a shared pool of infrastructure resources (e.g., CPU, memory, bandwidth, storage etc.) using virtualization technology so that the consumers can customize, deploy and run arbitrary operating systems and applications. Some well-known examples falling into this category are Amazon Web Services (AWS) Elastic Compute Cloud (EC2) [18], Rackspace [28], Azure Virtual Machines [34], Google Compute Engine (GCE) [16] and OrionVM [24].

PaaS provides an additional level of abstraction on top of IaaS; i.e., it provides a software development environment with proprietary APIs and tools. The consumers can host their applications on the platform and tune the deployment configuration. It facilitates quick development, testing and deployment of software applications by eliminating the overheads of software licensing and maintenance tasks. However, on the downside, the developers may have to work with a smaller subset of programming languages and tools supported by the platform; their visibility and control over the infrastructure may also be limited. Some examples of this category include Amazon's Elastic Beanstalk [35], Azure Cloud Services [14], Google App Engine (GAE) [3] and Facebook's Developer platform [20].

SaaS offers customized versions of software applications which are typically accessible through web browsers, tablets and cellphones. Some popular examples of SaaS include Google Docs [21], Office 365 [23], Dropbox [17], Box.net [10] and Salesforce CRM [29]. Some of the SaaS applications are free to use, while some others require subscription-based usage.

1.1.3 Pricing models

A well-designed pricing model is a key ingredient for the success of cloud computing [238]. The central theme for cloud resource pricing is the "pay-per-use" model, where the consumer is charged based on the amount of resource usage. Cloud providers nowadays employ different pricing models for charging resources, which can be considered slight variants of the "pay-per-use" theme. The current pricing models for cloud resources can be classified into four categories: pay-as-you-go, pay for resources, subscription based pricing and dynamic pricing [41, 45, 144, 50].

In the pay-as-you-go model, consumers pay a fixed price per unit of resource usage. Cloud services, such as AWS [15], Azure [27] and Google Cloud Platform [26], adopt this model to charge the consumer a fixed price for using a virtual machine for a predefined charging quanta. Variations exist, though, in the specification of the charging quanta by different providers. For instance, the charging quanta of AWS, Microsoft Azure and GCE are 60 minutes, 1 minute and 15 minutes respectively. The duration of the charging quanta has a significant impact on the consumer's operating expenses; a recent study by Andra [45] described some use cases where Microsoft Azure's per-minute based pricing model outperforms Amazon's hourly pricing model. Applications with short spiky or unpredictable load surges may find this pricing model very useful.
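To make the effect of the charging quanta concrete, the following sketch compares the bill for the same spiky usage under an hourly quantum and a per-minute quantum. The rate, the usage trace and the helper function are hypothetical and chosen purely for illustration; they are not taken from any provider's actual price list.

```python
import math

HOURLY_RATE = 0.10  # hypothetical price in dollars per instance-hour (not a real rate)

def bill(minutes_used, quantum_minutes):
    """Charge for one instance run, rounding usage up to the charging quantum."""
    billed_minutes = math.ceil(minutes_used / quantum_minutes) * quantum_minutes
    return billed_minutes / 60.0 * HOURLY_RATE

# A spiky usage pattern: ten short-lived instances, each running for 5 minutes.
runs = [5] * 10

cost_hourly = sum(bill(m, quantum_minutes=60) for m in runs)  # 60-minute quantum
cost_minute = sum(bill(m, quantum_minutes=1) for m in runs)   # 1-minute quantum

print(f"60-minute quantum: ${cost_hourly:.2f}")  # each 5-minute run is billed as a full hour
print(f" 1-minute quantum: ${cost_minute:.2f}")  # billed close to actual usage
```

Under these assumptions the ten short bursts cost $1.00 with hourly billing but roughly $0.08 with per-minute billing, which illustrates why finer charging quanta can favor short, spiky load surges.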

Pay for resources is another commonly observed pricing model where the consumer pays for the amount of resources consumed. Most of the cloud providers (e.g., AWS [15], Microsoft Azure [27] and Google Cloud Platform [26]) offer their network bandwidth and storage resources based on this pricing model.

In the subscription based pricing model, cloud providers offer resources at a discounted rate if the consumer subscribes to use the cloud service for a longer period (e.g., 6 months or 1 year). Amazon's reserved instance pricing model [1] is an example of this type, where the consumer gets a discounted rate if she subscribes to use a fixed resource quantity for a long term, such as a year. Applications with predictable or steady resource usage may be potential candidates for this pricing model.

In contrast to the above three models, dynamic pricing refers to a variable charging rate per unit of cloud resources based on real-time market conditions, such as auctioning, bargaining, supply vs. demand etc. For example, Amazon's spot instance pricing model [2] allows the consumer to bid a price on spare EC2 instances; if the bid price is higher than the spot value, then the instance is allocated to the consumer. The spot price, in fact, fluctuates based on the supply and demand for instances. The allocated instance may terminate abruptly when the spot price goes beyond the consumer's bid price. The main benefit of this pricing strategy is that it is very cost-effective, often 50-90% cheaper than the pay-as-you-go model. The potential use cases to benefit from this pricing scheme may be applications with flexible completion times, applications facing an urgent need for large compute capacity and so on.
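The bid-versus-spot-price rule described above can be sketched in a few lines. The following is a simplified illustration with made-up price values; it does not model the actual EC2 spot market mechanics (e.g., billing rules or interruption notices).

```python
def run_spot_instance(bid_price, spot_price_series):
    """Illustrative bid-vs-spot logic: the instance is allocated while the bid covers the
    current spot price and is interrupted as soon as the spot price exceeds the bid."""
    running = False
    for t, spot in enumerate(spot_price_series):
        if not running and bid_price >= spot:
            running = True
            print(f"t={t}: allocated (spot ${spot:.3f} <= bid ${bid_price:.3f})")
        elif running and spot > bid_price:
            running = False
            print(f"t={t}: interrupted (spot ${spot:.3f} > bid ${bid_price:.3f})")
    return running

# Hypothetical spot-price trace (dollars per hour) sampled at discrete timepoints.
spot_prices = [0.031, 0.029, 0.035, 0.048, 0.052, 0.038, 0.030]
run_spot_instance(bid_price=0.040, spot_price_series=spot_prices)
```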

1.1.4 Cloud ecosystem

With the advent of cloud computing, technological development has entered a brand new era - rapid elasticity and pay-per-use pricing promote quick development and hosting of innovative applications within a limited budget. As a result, many young entrepreneurs came forward with their innovative ideas and exploited cloud resources to make their applications available to the end-users; this is how a complex cloud ecosystem has evolved over the recent years.

In such an ecosystem, there are five major players who communicate with each other through a standard business process: cloud provider, cloud consumer, cloud broker, cloud auditor and cloud carrier [169, 63].

(Throughout this dissertation, the female form of the consumer and the provider is used; however, one can also substitute it with the male form.)

Figure 1.2: Cloud ecosystem (adapted from [67])

The cloud provider offers her resources (e.g., infrastructures, development platforms and applications) to the consumers. The cloud consumer is either an end-user who uses SaaS applications without any modification or a developer/application provider who uses the cloud infrastructure resources or development platforms to make her product/service available to the end-users. The consumers can request the cloud resources directly from the cloud provider or through a cloud broker. A cloud broker typically negotiates the relationship between the consumer and the provider. He may also offer an enhanced quality cloud service by adding more features to it, such as performance reporting, enhanced security etc. A cloud auditor is an independent entity that verifies whether the cloud provider complies with the regulatory and compliance requirements (e.g., security, privacy, performance) based on objective evidence. And finally, a cloud carrier is responsible for providing connectivity (e.g., network, telecommunication, transport agent) between the consumer and the provider.

Fig. 1.2 depicts a simple ecosystem that consists of the cloud provider and the cloud consumer. The interested reader will find more example ecosystems in [169].

1.2 The truth about elasticity: expectation vs. reality

As mentioned in the previous section, elasticity is a cost-effective feature of the cloud that has received significant attention over the years. As suggested by Gartner's hype cycle (Fig. 1.3), it is an emerging technological innovation which is supposed to hit the mainstream in about 2015–2016. According to ComputerWeekly.com [208], this wave of mainstream adoption has already started to widen in 2015. Under these circumstances, the cloud consumer is very often confronted with this question: Is the current state of elasticity good enough to meet the needs of my application? In the following, we explore the answer to this question in the light of the available facts and empirical evidence.

Figure 1.3: Gartner’s hype cycle for cloud computing 2011

1.2.1 Consumers’ expectations about elasticity

When a new technology comes into the market, people usually try to understand its relevance and significance to their lives and businesses. Cloud elasticity faced no exception to this trend. Potential consumers of the cloud, therefore, studied commonly available information sources (e.g., cloud providers' websites, white papers and research papers, practitioners' blogs etc.) to form their expectations about elasticity. The general perception about cloud elasticity goes along with the following definition:

“In cloud computing, elasticity is a term used to reference the ability of a system to adapt to changing workload demand by provisioning and deprovisioning pooled resources so that provisioned resources match current demand as well as possible.” (Cloud computing IT Glossary [12])

This definition points out several advantages of cloud elasticity over traditional IT. First, it relieves the consumer from the burden of upfront capital investment by replacing CapEx (Capital Expenditure) with a moving OpEx (Operational Expenses). Second, the on-demand availability of resources in response to changing workload conditions eliminates the risks of over-provisioning (payment for idle resources) and under-provisioning (unserved demand due to inadequate resources); for this reason, it is believed to shape the consumer's revenue in proportion to the workload demand, something that was never so easy to achieve with conventional IT systems. Fig. 1.4 draws the distinction between an inelastic traditional IT system and an elastic cloud platform; apparently, elasticity appears to be a more economical alternative for serving the fluctuating workload of the cloud consumer.

Figure 1.4: Traditional IT vs. automated elasticity (adapted from [233])

Because of this unprecedented economic appeal of elasticity, it has received significant attention from the cloud consumer community. There are many compelling scenarios where elasticity appears as a boon to the consumer's revenue [48]. A startup facing rapid growth wishes that its costs start small, and grow as and when the income arrives to match. In contrast, traditional data processing requires a large up-front capital expenditure to buy and install IT systems. Traditionally, the cost must cover enough processing for the anticipated and the hoped-for growth; this leaves the company bearing much risk from uncertainty in the rate of growth. If growth is slower than expected, the revenue would not be available to pay for the infrastructure, while if growth is too fast, the systems may reach capacity and then a very expensive upgrade or expansion is needed. Also, it is common in web-based companies for demand to be periodic or bursty (e.g., the Slashdot effect). The workload may grow very rapidly when the idea is "hot", but fads are fickle and demand can then shrink back to a previous level. Traditional infrastructure must try to provision for the peak, and so it risks wasting resources after the peak has passed. In summary, elasticity can remove risk from a startup or an enterprise, by allowing "pay-as-you-grow" computing infrastructure where the costs adjust smoothly to rising (and perhaps falling) workload. Motivated by these examples, many consumers became enamored with the elasticity concept and decided to adopt it to maximize their revenues and net profits.

1.2.2 Reality: imperfect elasticity

There are two sides to every story: the bright side and the dark side, and the truth lies somewhere in between. The same applies to views about elasticity too. Soon after its adoption, some consumers experienced practical issues in the delivered elasticity of the public cloud offerings. For instance, resource acquisition in the public cloud is not instantaneous and the lead time is in minutes; therefore, concerns abound whether it would be a good fit for applications with stringent Quality of Service (QoS) requirements (e.g., web and e-commerce applications, real-time and mission-critical applications). To get a good grasp of the current state of elasticity, we carried out an experiment on the AWS EC2 cloud. In particular, we observed the elasticity behavior of an online bookstore application, TPC-W, hosted on a dynamic web server farm in the EC2 cloud in response to a sinusoidal workload with a period of 30 minutes. We configured the scaling policy as follows: add an EC2 instance to the server farm when the average CPU utilization goes beyond 70% for

2 consecutive minutes and remove an EC2 instance when the average CPU utilization goes below 20% for 2 consecutive minutes. The elasticity behavior is shown in the following, where each EC2 instance is assumed to provide 100% of CPU supply per minute.

Figure 1.5: Elasticity behavior of EC2 platform in response to a sinusoidal workload

In Fig. 1.5, demand means the amount of resource needed by the application to serve the requests with satisfactory performance, available supply (a/supply) implies the amount of resource allocated by the platform and chargeable supply (c/supply) indicates the amount of resource for which the cloud platform charges the consumer. This figure suggests that the elasticity offered by the EC2 platform is not perfect; sometimes the platform charges the consumer for unutilized resources (between timepoints 0–12), sometimes the platform introduces a minutes-long lead time to allocate the resources (between timepoints 15–20) although it starts charging from the very moment the resource request is placed. It also charges the consumer for already released resources due to the hour-long charging quanta (between timepoints 40–50). In fact, these flaws are not only specific to the EC2 platform but apply to all cloud platforms to some extent. Therefore, cloud consumers may end up with operational overspending if they do not know how to judiciously use the cloud resources [200]. Additionally, their expected revenues may also suffer if the cloud resources are not available on the fly.
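The over- and under-provisioning visible in Fig. 1.5 can be quantified directly from the three per-minute series. The sketch below uses short, made-up traces in the spirit of the figure and hypothetical penalty rates; it is only an illustration of the idea, not the penalty-based measurement model developed later in this thesis.

```python
# Illustrative per-minute traces in the spirit of Fig. 1.5 (values are made up):
# demand   = CPU supply the application needs to serve requests with satisfactory performance
# a_supply = CPU supply actually allocated by the platform (available supply)
# c_supply = CPU supply the platform charges for (chargeable supply)
demand   = [100, 100, 200, 200, 300, 300, 200, 100]
a_supply = [200, 200, 200, 200, 200, 300, 300, 300]
c_supply = [200, 200, 200, 200, 300, 300, 300, 300]

over_provisioned  = sum(max(c - d, 0) for d, c in zip(demand, c_supply))   # charged but not needed
under_provisioned = sum(max(d - a, 0) for d, a in zip(demand, a_supply))   # needed but not supplied

# Hypothetical penalty rates in dollars per 100%-CPU-minute (illustration only);
# under-provisioning is priced higher because it also hurts QoS and revenue.
OVER_RATE, UNDER_RATE = 0.001, 0.005

penalty = over_provisioned / 100 * OVER_RATE + under_provisioned / 100 * UNDER_RATE
print(f"over-provisioned:  {over_provisioned} CPU-minutes")
print(f"under-provisioned: {under_provisioned} CPU-minutes")
print(f"illustrative penalty: ${penalty:.4f}")
```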

This reality check may increase the consumers' skepticism about the elasticity claims of different cloud offerings. On one hand, there are compelling benefits of elasticity, but on the other hand, there are some concerning facts which may partially shadow its benefits. This is an important riddle that the potential consumer wants to solve at the preprocessing stage of elastic cloud adoption.

Figure 1.6: Consumers need an elasticity benchmark to compare the elasticity claims of various cloud platforms. The consumer's image is taken from Zedge.net (source: http://www.zedge.net/wallpaper/3428710/).

1.3 Why is an elasticity benchmark the need of the hour?

Example 1.1 Running example: Dona is a local pizza shop owner with some yummy recipes which were very popular in the town. She wanted to boost her business by creating an online presence with a website. She expected a fluctuating traffic pattern with peaks during the lunch and dinner hours; therefore, she decided to host her website in the elastic cloud so that the cost of the infrastructure adapts well to the incoming traffic demand. Based on some back-of-the-envelope calculations [48], she felt convinced about the economic benefit of the elastic cloud compared to traditional infrastructure. Since all cloud providers claim elasticity in their offerings, she did not consider it as an important factor for cloud adoption. She carried out benchmarking for the scalability aspect (i.e., the performance/price ratio) based on a standard benchmark and picked the best one. This approach worked well as long as the request arrival rate for that application remained steady over time.

However, a couple of weeks later, she heard customers' complaints about her website's sluggish performance and unavailability during rush hours. After a careful investigation, she found that the site could not scale quickly in response to growing traffic because of the minutes-long resource spin-up delay of the cloud platform, thus experiencing customer abandonment because of unacceptable response times and dropped requests. She also figured out that her operational expenses did not go down as soon as the resources were released because of the hour-long charging quanta of the cloud platform. These phenomena increased

her concerns about the consequences of imperfect elasticity on her business's revenue and net profit. She also perceived the fact that the delivered elasticity of the public cloud services is not as perfect as their claims. Therefore, she felt the need to compare the elasticity claims of different cloud services and pick the one best suited to her application and workload profile. So, she searched the web for an elasticity benchmark that she could ask:

“Mirror, mirror on the wall, Which cloud is the most elastic of all?”

However, to her disappointment, she could not find that magic mirror and therefore, failed to make a well-informed decision about elastic cloud adoption.

The story portrayed in the above example is somewhat familiar to many potential consumers of the elastic cloud. Enthusiastic consumers, who moved to the elastic cloud just driven by the marketing hype, experienced the repercussions of imperfect elasticity on their applications' revenues. Apparently, elasticity is a double-edged sword; while good elasticity has the potential to maximize the consumer's revenue, bad elasticity can worsen the revenue because of operational overspending and unacceptable website performance. Users who find the website unavailable during the scale-out period (when resources are saturated) not only generate zero revenue but also never come back or recommend it to anyone due to poor service. For this reason, pragmatic cloud consumers feel the need to validate the elasticity claims of different cloud offerings for their applications and workload profiles. However, the irony is that elasticity at present is a frequently-claimed yet never-validated attribute. All public cloud providers claim their offerings to be elastic without specifying any measurement metric (even though none are perfect in delivering instantaneous elasticity). As a consequence, consumers face significant difficulty when they try to compare and contrast the quality of elasticity and pick the one best suited to their applications' requirements and business objectives.

This is the reason why an elasticity benchmark is so desirable to the consumers of the cloud. A well-designed elasticity benchmark can serve as a useful tool to draw insights about the adaptability behavior of the cloud platform in the context of a given application and its workload profile. It can help cloud consumers evaluate competitive cloud platforms and choose the most suitable one for a given application. It can also be applied to tune the deployment configuration (such as scaling policies) in order to achieve better elasticity.

1.4 State of the art

There are undoubtedly several elasticity benchmarks available for evaluating cloud platforms. However, we have not found any elasticity benchmark in the literature that adequately characterizes and evaluates elasticity from the consumer's perspective. Most of the works made attempts to characterize the elasticity of the cloud platform based on scaling delay and resource granularity; however, this sort of representation is flawed as it lacks some key aspects of elasticity, such as the charging quanta and pricing model of the cloud platform (because of their direct impact on the consumer's operational expenses). Scarcely any of these works offered any means to encapsulate the consumer's concerns (e.g., her application-specific business objectives and workload profiles) in the derived elasticity metric. As a consequence, cloud consumers find it very difficult to comprehend the economic worth of elasticity for their application-specific contexts. Moreover, none of the prevalent elasticity benchmarks provided any explicit guidance to express the elasticity score of a workload collection as a single figure of merit, thereby posing difficulty in drawing a simple conclusion about one platform's worthiness over another.

To sum up, there is no adequate yardstick available today to evaluate the elasticity behavior of cloud platforms from the perspective of cloud consumers. As a consequence, it is difficult to identify and fix elasticity issues, validate and compare the elasticity claims of competing cloud offerings and adaptive scaling strategies, and choose the optimal elasticity solution well-suited to the consumer's business situation.

1.5 Research problem, hypothesis and goal

Our research problem is defined as follows:

Research Problem The computing community is lacking a meaningful and systematic means to evaluate the elasticity of the cloud platform from the cloud consumer's perspective.

By “meaningful and systematic means”, we denote a set of standard techniques that can be applied to evaluate the elasticity of competing cloud platforms.

By “cloud platform”, we mean an adaptive cloud system on which the consumer’s application can run. Typically, an adaptive cloud system is comprised of an underlying cloud infrastructure and a set of scaling policies to adjust the resource capacity in response to a fluctuating workload.

By “cloud consumer's perspective”, we mean doing everything with the consumer's specific concerns and limitations in mind. In other words, it denotes putting oneself into the consumer's shoes while evaluating the cloud platforms and supporting their decision-making process with relevant metrics. The following criteria need to be met in order to satisfy the consumer's perspective: first, the evaluation process should make provision to incorporate the consumer's application and realistic workload profiles. Second, only those observations which are accessible to the consumer through the platform's APIs and performance tools need to be recorded. And finally, the reported metrics should readily reflect the degree to which the consumer's business objectives have been met so that they can easily draw simple conclusions about one platform's worthiness over another.

To address the research problem, we have formulated the following hypothesis:

Research Hypothesis It is possible to develop an elasticity benchmarking framework to evaluate the elasticity of cloud platforms, so that consumers can make well-informed decisions about elastic cloud adoption.

By “benchmarking framework”, we refer to a conceptual abstraction that includes a set of metric definitions, a representative workload suite and a precisely defined procedure to yield those metrics for fair comparison of alternative platforms with respect to a specific attribute, such as elasticity. The elasticity benchmarking framework serves as a basic template to instantiate an executable elasticity benchmark, given the consumer's specific context (e.g., application domain, workload characteristics) and business objectives. Designing a generic elasticity benchmarking framework that can be applied consistently across all types of cloud platforms and application domains is far from being trivial in the limited timeframe of a PhD. For this reason, we constrain our focus by considering a limited range of cloud platforms and application domains.
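As a rough structural illustration of these three ingredients - metric definitions, a workload suite and a measurement procedure - the hypothetical sketch below shows one way an executable benchmark instance could be specified. The class and field names are illustrative assumptions, not part of ElasticMark's actual interface.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class WorkloadPattern:
    name: str                               # e.g., "sinusoid-30min", "flash-crowd"
    request_rate: Callable[[float], float]  # requests/sec as a function of time (seconds)
    duration_sec: int

@dataclass
class BenchmarkSpec:
    # metric name -> function mapping observed traces (demand, supply, response times) to a number
    metrics: Dict[str, Callable[[dict], float]] = field(default_factory=dict)
    workload_suite: List[WorkloadPattern] = field(default_factory=list)
    procedure: List[str] = field(default_factory=list)  # precisely defined measurement steps

spec = BenchmarkSpec(
    metrics={"elasticity_penalty_rate": lambda traces: 0.0},  # placeholder metric definition
    workload_suite=[
        WorkloadPattern("sinusoid-30min",
                        lambda t: 100 + 50 * math.sin(2 * math.pi * t / 1800), 3600),
    ],
    procedure=["deploy application", "replay workload", "collect API observations", "compute metrics"],
)
```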

Research Goal Our research objective is defined using the Goal Question Metric (GQM) method [53, 231]:

Design a benchmarking framework for the purpose of Evaluation (i.e., measurement, analysis and comparison) with respect to the elasticity of cloud platforms from the viewpoint of cloud consumers in the following context: Online Transaction Processing (OLTP) type applications hosted on public IaaS and PaaS VM-type offerings and charged based on the pay-per-use pricing scheme.

The research questions to accomplish this goal are specified in the following:

RQ 1 How can we design a core elasticity benchmarking framework for cloud platforms from the consumer’s viewpoint?

RQ 1.1 How can we define a metric for measuring elasticity of the cloud platform from the consumer’s perspective?

RQ 1.2 How can we design a standard workload suite for elasticity evaluation?

RQ 2 What are the concrete steps for instantiating an executable elasticity benchmark?

RQ 3 How can we reproduce custom prototypes of actual workloads for elasticity evaluation?

RQ 4 How can we ensure repeatability and validity in the elasticity benchmarking results in the presence of the performance unpredictability of cloud platforms?

Having defined the goal and research questions, we will now move on to highlight our contributions in the next section.

1.6 Contribution and impact

The primary contribution of this dissertation is ElasticMark - a novel elasticity benchmarking framework that takes a consumer-centric view while assessing the elasticity of cloud platforms. Our contribution advances the state of the art in several ways. First, it helps the cloud consumer make a well-informed decision about elastic cloud adoption by comparing and contrasting the elasticity of competing cloud platforms. Furthermore, it serves as a crucial tool to support innovation in the area of adaptive cloud systems. Performance analysts and adaptive system designers can use this framework to evaluate the effectiveness of alternative adaptive mechanisms and identify possible areas for further improvement. In addition to this, it helps the cloud provider determine the worth of her elastic cloud offerings with respect to other competitive offerings in the market, thereby giving her an opportunity to optimize her products to better address the consumer's objectives.

In the following, we highlight our individual contributions and their impacts:

• A core framework for evaluating the elasticity of competing cloud platforms (e.g., cloud offerings, adaptive scaling strategies) from the consumer's perspective.

The first concrete proposal that clearly and objectively incorporates the consumer’s viewpoint while evaluating the elasticity of cloud platforms. Unlike other frameworks in the computing literature that measure only the technical aspects of elasticity (e.g., scaling delay and precision between demanded and allocated resources), our framework quantifies elasticity in terms of the complex interaction between the technical aspects and the consumer’s business situation. This makes our task more difficult than usual, as it involves looking at the evaluation problem through the lenses of the consumer and understanding the consumer’s preferences and objectives, business context and other constraints (e.g., limited visibility through the cloud APIs). Nevertheless, this undertaking is worthwhile as it gives the consumer a rational basis for making a well-informed decision about elastic cloud adoption. This framework follows a penalty-based approach to elasticity evaluation; it includes metric definitions that help the consumer understand the deviation between the desired and the perceived level of elasticity in terms of monetary units. Another noteworthy feature of this framework is that it expresses elasticity as a single figure of merit, thus helping the consumer draw a simple conclusion about the worthiness of alternative cloud platforms.

Elasticity implies the cloud platform’s adaptive behavior in response to variation in the resource demand of the workload. The workload suite for elasticity evaluation in this framework, therefore, includes a set of time-varying workload patterns to stress the platform’s adaptability. Each of these workload patterns represents realistic usage scenarios commonly seen in web and e-commerce applications (e.g., periodic variation, flash crowds). This workload suite helps the consumer understand the cloud platform’s elasticity behavior in realistic settings. By plotting the workload intensity and elasticity behavior over time, it is also possible to diagnose adaptability issues and identify potential areas for optimization.

• Specific guidance for instantiating an executable elasticity benchmark.

We also provide guidelines for instantiating an executable elasticity benchmark from the conceptual framework. This involves making concrete choices about the benchmark application, the workload suite specification and the consumer’s non-functional objectives (e.g., QoS thresholds, monetary effects of violating the QoS thresholds when elasticity is imperfect). Additionally, it elaborates the details of the testbed setup and guides through the specifics of the configuration and measurement procedure that the consumer has to follow to conduct elasticity benchmarking for cloud platforms. All these steps are illustrated with a working example, thus providing the consumer precise guidance on instantiating an elasticity benchmark. The core framework and instantiation procedure were published in:

How a consumer can measure elasticity for cloud platforms. Sadeka Islam, Kevin Lee, Alan Fekete, and Anna Liu. In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pages 85–96. ACM, 2012.

• A novel workload model for generating representative prototypes of fine-scale bursty workloads based on traces.

Although the standard workload suite includes workloads representative of some common real-world usage scenarios, sometimes a need may arise to reproduce realistic workload prototypes based on the consumer’s application-specific traces. Our literature review reveals that web and e-commerce workloads include many noisy oscillations, or burstiness, at the small timescale (e.g., in the order of seconds), which are difficult to model with simple mathematical equations. To resolve this issue, this framework includes a novel workload model for reproducing realistic prototypes of fine-scale bursty workloads, thus assisting the consumer to carry out customized elasticity evaluation in a cost-effective way. This work was published in:

Evaluating the impact of fine-scale burstiness on cloud elasticity. Sadeka Islam, Srikumar Venugopal, and Anna Liu. In Proceedings of the Sixth ACM Symposium on Cloud Computing, pages 250–261. ACM, 2015.

• A set of rigorous techniques for ensuring valid and repeatable elasticity benchmarking results in the presence of the performance unpredictability of the cloud environment.

Elasticity is a random variable whose precise nature depends on various random factors of the cloud environment; as a result, the derived elasticity metric shows some variation across different runs of the same workload. The elasticity benchmarks to date have neither taken into account the influence of performance unpredictability on elasticity, nor proposed any means to ensure consistent benchmarking results. To address this gap in the literature, our research investigates the impact of performance unpredictability on elasticity and suggests a set of rigorous techniques for ensuring repeatable and valid elasticity benchmarking results in the presence of such unpredictability. This is a distinctive feature of our framework that has never been considered by previous elasticity benchmarks.

• Insightful case studies demonstrating the applicability of the framework for the purpose of elasticity evaluation.

Several empirical case studies conducted on Amazon’s EC2 cloud demonstrate the applicability of the concepts and techniques proposed by our framework. These studies draw some interesting insights about the elasticity of the EC2 platform. Additionally, they reveal the difference between technical elasticity and consumer-perceived elasticity. Furthermore, these studies pinpoint some shortcomings of commonly-used adaptive scaling strategies and suggest several approaches for improving consumer-perceived elasticity. The extracted insights and optimization prescriptions are themselves valuable contributions to the performance engineering literature.

Some additional benefits of this research are summarized as follows:

• A framework that promotes a proactive decision-making approach to elastic cloud adoption, specifically with respect to the consumer’s application context and business situation.

• Several pieces of empirical evidence that unravel the counter-intuitive notion of consumer-perceived elasticity.

• A case study that raises awareness about the impact of fine-scale burstiness on the elasticity of the cloud platform.

• An empirical analysis of the interaction between fine-scale burstiness and cloud elasticity.

• Scientific papers in top-tier peer-reviewed conference proceedings ([139, 140]). A conference paper, elaborating the rigorous techniques for ensuring valid and repeatable elasticity benchmarking results in the presence of the performance unpredictability of the cloud, is currently in progress. A survey paper on elasticity is also being drafted.

1.7 Scope and assumptions

This section describes the scope and assumptions of our work.

1.7.1 Scope

This dissertation concentrates on designing an elasticity benchmarking framework and providing guidance for instantiating an executable benchmark so that the consumer can make a well-informed decision about elastic cloud adoption. Elasticity benchmarking, however, is a vast area of research; covering its entire scope in the lifetime of a PhD is far from trivial. For this reason, we have decided to restrict our scope to a limited range of cloud offerings and application types. In the following, we shed light on the inclusions and exclusions of our scope:

• All IaaS and some PaaS-based cloud providers nowadays offer resources as a bundle and charge the consumer based on the leased time for those bundles. For example, Amazon’s EC2 instance includes CPU, memory and disk storage, and the pricing for an m1.small EC2 instance is 4.4¢ per hour. Our framework applies only to this type of cloud offering. However, it is not uncommon to see some PaaS offerings (e.g., [22], AWS Lambda [9]) where there is an abstraction around the underlying resource configuration and pricing occurs based on the number of completed tasks. Our framework at this moment does not cover this category.

• At present, our benchmarking framework considers only the public cloud offering, which has one or more pricing models for renting resources. We have not investigated the applicability of our framework to other cloud deployment models (e.g., private cloud, community cloud) where the pricing scheme is not as obvious.

• Our benchmarking framework at present applies to VM types that supply a fixed amount of resources throughout their lifecycle. We plan to extend our framework to VM types with variable resource supply or bursting capability in the future.

• Our benchmarking framework currently considers only the pay-per-use pricing model. Incorporating other pricing models (e.g., dynamic pricing, subscription-based pricing) is one of the priorities of our future research.

• Developing a generic elasticity benchmarking framework to assess the elasticity of all application types (e.g., batch processing applications, real-time and mission-critical applications) would be a large area of research, as different applications have different resource access patterns, workload profiles and business objectives. We therefore restrict ourselves to the scope of OLTP type applications, because this application type is considered to be at the heart of e-business and has to maintain stringent QoS (e.g., response time, availability) even in the face of highly fluctuating workload demands. Since cloud elasticity today is not yet mature (i.e., immediate and fine-grained), it is more likely to have a negative impact on the revenue of these applications.

• We have not provided any guidance on extrapolating the benchmarking results. Consumers with large, complex applications may often feel the need to extrapolate the benchmarking results based on data gathered from a small test application deployed into the cloud. However, providing a solution to this problem is not straightforward, as it involves understanding the complex dependency structure of different application components, their deployment configurations, bottleneck switching among different components and the design of workloads representing the interactions among various components. For this reason, we have left extrapolation as future work.

• We have decided to restrict our benchmarking case study to different adaptive scaling strategies of the AWS EC2 cloud platform. Setting up the testbed and carrying out benchmarking on other cloud offerings (e.g., Microsoft Azure, GCE) would require a great deal of development, management and economic effort. For this reason, we have decided to leave this out of our scope for the time being and explore it in the future.

• It is not our intent to develop queuing-theoretic models based on the instantiation of our benchmarking framework.

• Another area we have not covered is the development of elasticity optimization techniques. It is an interesting research direction that we plan to explore in the future.

1.7.2 Assumptions

Our elasticity benchmarking framework relies on several assumptions. We recognize that some of these may not always hold in a real-world setting. However, we have found these assumptions reasonable enough to evaluate and analyze the elasticity of cloud platforms in our case studies. As our research continues, we will gradually relax or refine some of our assumptions so that the framework can also cope with the imperfections of the real world. For the time being, we specify the following assumptions:

• There is a running version of the application hosted on the cloud which exhibits adaptive behavior in response to the fluctuating resource demand of the workload.

• The consumer has access to resource utilization and performance-related information through the platform’s APIs and other performance tools.

• There are sufficient infrastructure resources available to evaluate the cloud platforms over the full range of workload demands.

• The consumer’s application has one or more QoS objectives whose violation has financial implications. It is also possible to express the financial implications as a mathematical function.

• The consumer has the necessary analytical skill to map between low-level observations and high-level business objectives, as well as to instantiate appropriate functions from the metric definition templates.

• The consumer has the necessary skill to split the monetary cost of the VM bundle among its contained resources.

• The consumer has the necessary development and deployment skills to configure the testbed and representative adaptive scaling techniques, and to take measurements for elasticity evaluation.

[Figure 1.7: Research method: high-level approach - a flowchart leading from observation and literature to: define elasticity metrics; design workload suite / develop workload model; outline steps for instantiating an executable benchmark; develop an executable benchmark; recommend techniques for valid, repeatable benchmarking results.]

1.8 Research method

The research method adopted in this dissertation is based on both theoretical concepts and practical views. The research problem is formulated based on a comprehensive literature review of academic publications and experience reports from the practitioners’ blogs. The high-level methodology for the design and development of ElasticMark is depicted in Fig. 1.7.

Initially, an intuitive understanding of elasticity and its core aspects is formed based on the state-of-the-art review and empirical observations on a real cloud platform. The next task is to define the elasticity metric so that it represents the complex interplay between the technical elasticity aspects and the consumer’s concerns. To meet this criterion, a penalty-based approach is adopted that penalizes the cloud platform for imperfect elasticity in terms of monetary cost (e.g., an hourly penalty rate).
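
To make the penalty-based idea more concrete, the following minimal sketch (in Python, with purely illustrative penalty rates and trace values that are assumptions for this example, not the framework's actual definitions) shows how over- and under-provisioning penalties could be accumulated from observed demand and supply and condensed into a single figure:

    # Minimal sketch of a penalty-based elasticity score; the rates, trace
    # format and interval length are illustrative assumptions only.
    def elasticity_penalty(demand, supply, over_rate, under_rate, interval_hours):
        """Accumulate monetary penalties over a benchmark run.

        demand, supply : resource units observed in each interval
        over_rate      : $ per surplus unit per hour (idle, but paid-for, capacity)
        under_rate     : $ per deficit unit per hour (unmet demand, QoS violations)
        """
        penalty = 0.0
        for d, s in zip(demand, supply):
            if s > d:
                penalty += (s - d) * over_rate * interval_hours
            elif d > s:
                penalty += (d - s) * under_rate * interval_hours
        return penalty  # lower penalty suggests better consumer-perceived elasticity

    # Hypothetical trace sampled every 5 minutes (1/12 hour):
    demand = [2, 4, 6, 6, 3, 2]
    supply = [2, 2, 4, 6, 6, 4]
    print(elasticity_penalty(demand, supply, over_rate=0.05,
                             under_rate=0.20, interval_hours=1 / 12))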

Workload characteristics play a crucial role in consumer-centric elasticity evaluation. Since elasticity is a measure of the adaptability behavior of the cloud platform, it needs to be stressed with fluctuating workloads during evaluation. For this reason, we design a time-varying workload suite with representative usage scenarios. In addition, we include a novel workload model that allows the consumer to reproduce representative prototypes of her fine-scale bursty workloads (i.e., workloads with noisy oscillations at the small timescale, such as seconds) from traces and carry out elasticity evaluation for custom scenarios in a cost-effective way.
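
As a rough illustration of such workload shapes (the parameters and the noise model are assumptions chosen for demonstration, not the workload suite or workload model defined later in this thesis), the following sketch generates a periodic pattern, a flash-crowd spike and a fine-scale noisy overlay as request-rate functions of time:

    import math
    import random

    # Illustrative request-rate patterns (requests per second at second t);
    # all parameters are assumptions for demonstration purposes.
    def periodic(t, base=50, amplitude=30, period=3600):
        """Smooth periodic variation around a base rate."""
        return base + amplitude * math.sin(2 * math.pi * t / period)

    def flash_crowd(t, base=50, peak=400, start=1800, duration=300):
        """Sudden spike (e.g., a news event) on top of a steady base rate."""
        return peak if start <= t < start + duration else base

    def bursty(t, noise=0.3):
        """Fine-scale burstiness: multiplicative noise on the periodic pattern."""
        return max(0.0, periodic(t) * (1 + random.uniform(-noise, noise)))

    periodic_trace = [round(periodic(t)) for t in range(7200)]   # two periods
    spike_trace    = [flash_crowd(t) for t in range(3600)]
    bursty_trace   = [round(bursty(t)) for t in range(3600)]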

Furthermore, we specify a set of precisely defined steps for instantiating an executable elasticity benchmark from the abstract framework. This process is exemplified through the construction of a prototype elasticity benchmark. This prototype serves two purposes: first, it helps us demonstrate the applicability of our framework in the context of a real cloud offering, such as the AWS EC2 cloud; second, it helps us explore interesting phenomena and pinpoint anomalies of the cloud platform.

The elasticity behavior shows some variation across different runs of the same workload because of the performance unpredictability of the cloud, which eventually affects the repeatability and validity of the elasticity evaluation results. To resolve this issue, we present a set of rigorous techniques to ensure valid and repeatable elasticity evaluation results even in the presence of the performance unpredictability of the cloud environment.
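
One elementary instance of such a technique (a sketch only; the run values below are hypothetical stand-ins for penalties obtained from independent repetitions, and the full set of procedures is developed in Chapter 5) is to repeat each benchmark run several times and report the mean metric with a confidence interval instead of a single observation:

    import statistics

    # Sketch: summarize repeated benchmark runs with a mean and a 95% confidence
    # interval; the run values are hypothetical elasticity penalties.
    runs = [12.4, 13.1, 11.8, 14.0, 12.6]
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)
    t_crit = 2.776                                 # Student-t, 95%, 4 degrees of freedom
    half_width = t_crit * stdev / len(runs) ** 0.5
    print(f"elasticity penalty: {mean:.2f} +/- {half_width:.2f}")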

1.9 Terminologies used

There are several terms used throughout this dissertation. The definitions of those terms in the context of this research are presented in the following:

• Metric. According to Fenton and Bieman [101], a measure is a number or symbol that characterizes a specific attribute by mapping its empirical notion to the formal, relational world. A metric is “a quantitative measure of the degree to which a system, component or process possesses a certain attribute” [202]. In this dissertation, we define a single-figure elasticity metric so that the consumer can readily understand “how elastic is a given cloud platform” and compare the relative worth of one platform over another.

• Validation. It is the process of “confirmation, through the provision of objective evidence, that the requirements for a specific intended use or application have been fulfilled” [202]. Appropriate validation of software metrics is a necessary precondition for developing a good benchmarking framework [100]. The software measurement discipline stresses the importance of carrying out both theoretical and empirical validation; the former seeks to determine whether the software metric is a good reflection of the attribute it is purported to measure, whereas the latter intends to provide convincing evidence to demonstrate that the metric is practically useful [66, 147].

• Benchmark. It refers to “a procedure, problem or test used to compare systems or components with each other or to a particular standard” [202]. In this dissertation, we define a benchmark as a standardized tool which evaluates the quality of an attribute of a given system (e.g., elasticity of a cloud platform) against an established standard or point of reference. Typically, a benchmark consists of one or more performance metrics, a workload suite and a specification documenting the rules, requirements and constraints of the measurement environment [71]. A benchmark is used to compare the relative worth of various systems for decision-making purposes and to extract insights about the performance bottlenecks in a system.

• Benchmarking framework. By this term, we denote a conceptual abstraction that includes a set of metric definitions, a workload specification and guidelines for instantiating an executable benchmark.

• Cloud platform. By this term, we mean an adaptive cloud system on which the consumer’s application can run. Typically, an adaptive cloud system is comprised of an underlying cloud infrastructure and a set of scaling policies to adjust the resource capacity in response to a fluctuating workload.

• Cloud consumer. In this dissertation, this term refers to an application provider who leases cloud infrastructure resources (IaaS) or development platforms (PaaS) to deliver a specific service to the end-users.

• Workload. Generally, it refers to “a mix of tasks running on a given computer system” [202]. According to the Cloud scale EU project [64], a workload is the combination of work and load, where work denotes the data that needs to be processed by a service to yield a particular result and load refers to the frequency with which the service is invoked to perform the work. In this dissertation, a workload means a sequence of concurrent user requests to be processed by the application. More specifically, a workload has a specific duration and, within that duration, the number of concurrent user requests at each timepoint is precisely defined. Simply put, a workload is a function that relates each timepoint to a specific number of user requests. In this research, our purpose is to evaluate the adaptability aspect of the cloud platform; for this reason, we concentrate on workloads which are time-varying in nature and stress the application to increase and decrease its resource capacity on demand.

• Fine-scale bursty workload. The timepoints of a workload can be viewed at different resolutions or timescales. For instance, one can view the workload at a coarse resolution where the request rates are defined per hour; alternatively, one can view the workload at a fine resolution where the request rates are defined per second. Web and e-commerce workloads exhibit noisy oscillations or burstiness at the fine timescale. In this dissertation, we use this term to denote workloads which exhibit severe oscillations at the fine timescale (e.g., seconds, milliseconds).

• Workload model. By this term, we mean a systematic representation of a workload that resembles some of its important characteristics. A workload model facilitates benchmarking, performance analysis and capacity planning in a managed and cost-effective way.

• Workload demand. By this term, we refer to the resource demand or processing requirement of a given workload. For instance, a workload intensity of 20 requests/second may have a resource demand of 10% CPU capacity and 2MB memory.

• Resource supply. This term implies the amount of resource capacity supplied by the underlying system in response to a resource demand. It can assume two different meanings when used in the context of this dissertation: available supply and chargeable supply. Available supply indicates the amount of resources allocated by the cloud platform, whereas chargeable supply refers to the amount of resources for which the cloud consumer gets charged by the underlying platform.

• Load generator or load driver. It is an important component of the benchmark that initiates workloads during benchmarking [192, 31].

• System Under Test (SUT). It is a collection of system components that processes the workloads generated by the load drivers during benchmarking [192, 31]. In this dissertation, an SUT is comprised of a scalable infrastructure and a management system for monitoring and scaling that infrastructure on demand. The scalable infrastructure is of principal interest to the benchmark; the management system is considered a functional component.

• Testbed. It refers to the entire test setup which includes the SUT as well as any external systems required to carry out the benchmarking process [31]. In this dissertation, the testbed consists of the SUT and one or more load drivers.

1.10 Thesis overview

This dissertation is composed of six themed chapters.

Chapter 1 has presented the motivation of our work. It has also described the contributions and impacts of this research as well as the research problem and hypothesis, scope and assumptions and an overview of the research method.

Chapter 2 begins by laying out the background concepts to understand this research. The position of our research with respect to the state of the art is also discussed in this chapter.

Chapter 3 introduces a core framework for evaluating the elasticity of cloud platforms from the consumer’s perspective. Later in this chapter, we instantiate an executable elasticity benchmark based on this framework and use it to explore some interesting adaptive behavior of the EC2 platform.

Chapter 4 can be regarded as an extension of the core framework. It presents a novel workload model that allows the consumer to reproduce prototypes of their fine-scale bursty workloads and conduct custom elasticity benchmarking in a cost-effective way.

Chapter 5 presents a set of techniques to ensure valid and repeatable elasticity benchmarking results even in the presence of the performance unpredictability of the cloud environment. This chapter brings together several techniques from traditional computing and other fields of science to deal with the variation in the resultant elasticity metric.

Chapter 6 summarizes and critically evaluates the contributions of this work and explores possible avenues for future research.

General remarks. We have not included any separate validation chapter in this dissertation; instead, we have decided to validate our research ideas on a chapter-by-chapter basis. At the end of each concrete chapter, a critical evaluation section is also included that reflects on the potential risks and benefits of the work presented in that chapter.

Chapter 2

Background and state of the art

“Don’t be satisfied with stories, how things have gone with others. Unfold your own myth.”

Jalaluddin Rumi

Elasticity is a relatively new computing paradigm that has the prospect of revolutionizing the way resources are procured and utilized in the IT industry. Despite its novelty, the concept has been associated with a number of obstacles that have ultimately given rise to a large area of research. The aim of this literature review is therefore twofold: first, we lay out the necessary theoretical grounding to understand the elasticity concept and differentiate it from similar terms. Next, we present an overview of the state-of-the-art research in elasticity and related areas and justify how our work fits into the existing body of literature.

2.1 Background

The topics covered in this section include the definition and characteristics of elasticity, its differentiation from similar terms (e.g., scalability and efficiency) and an architectural overview of an elastic system.


2.1.1 Elasticity: definition and characteristics

Since its inception, interest in “Elasticity” has witnessed a dramatic rise, as it was considered to be one of the most important and appealing features of the cloud. Initially, the term was ambiguous, and there was no common and clear definition and understanding of it. Several efforts attempted to address this gap with the main focus on the cloud; however, those efforts differed in the angle from which they looked at this feature.

A list of definitions on elasticity is given in Appendix A. Most of these definitions viewed elasticity as an extension of scalability that allows dynamic provisioning and deprovisioning of resource capacity on demand ([79, 172, 184, 207, 213, 173, 77, 211, 49, 185, 85]). Some interesting definitions are as follows:

“A simple, but interesting property in utility models is elasticity, that is, the ability to stretch and contract services directly according to the consumer’s needs.” (David Chiu, Editor, CrossRoad [77])

“Elasticity is basically a ‘rename’ of scalability [. . . ].” And “Elasticity is [. . . ] more like a rubber band. You ‘stretch’ the capacity when you need it and ‘release’ it when you don’t anymore.” (Edwin Schouten, IBM, Thoughts on Cloud [211])

These definitions, however, provide an oversimplified view about elasticity and do not shed any light on the quality of the adaptation process.

A large and growing body of literature attempted to provide a more rigorous definition for elasticity by relating it to two important dimensions: rapidity of resource adaptation and fine-grained resource supply [156, 48, 55, 223, 160, 150, 178, 125]. Armbrust et al. [48] defined elasticity as the cloud’s ability to add and remove computing resources at a fine grain and with a lead time of minutes so that the supplied capacity can closely follow the workload demand. With the term “fine grain”, they pointed to the granularity of one server at a time; however, they did not discuss at what granularity the server capacity should be provided.

The prestigious National Institute of Standards and Technology (NIST) [178] provided definitions for a number of cloud-specific terms. For elasticity, NIST pointed to rapid provisioning and deprovisioning capability and virtually infinite resource capacity, with unlimited quantity purchasable at any single moment.

Garg et al. [119, 118] followed a similar approach to NIST in their elasticity definition. They defined elasticity in terms of two aspects: the mean adaptation delay to expand or contract the resource capacity, and the maximum amount of resources that can be provisioned during the peak.

Among others, the definition given by Herbst et al. [125] is widely accepted in the community. Their definition is as follows:

“Elasticity is the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.” (Herbst et al. [125])

This definition views elasticity as the quality of the adaptation process in response to a fluctuating workload and associates it with the scaling speed and the precision between demanded and supplied resource quantity. The scaling speed reflects how quickly the system can make a transition from an under-provisioned or over-provisioned state to an optimal resource configuration state. The precision aspect quantifies how closely the demanded and allocated resource quantities match each other. It is intuitive that if the resources are provided at the finest granularity in no time, the demand and supply curves will coincide and the cloud platform can be considered absolutely elastic. However, the real world is not that ideal, so demand and supply rarely coincide.
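
To make the precision aspect concrete, one illustrative formalization (a sketch in the spirit of these definitions rather than the exact metric of Herbst et al.) measures the average over- and under-provisioned capacity over an observation window $[0, T]$ as

\[
\Theta_O = \frac{1}{T}\int_0^T \big[s(t) - d(t)\big]^+ \, dt,
\qquad
\Theta_U = \frac{1}{T}\int_0^T \big[d(t) - s(t)\big]^+ \, dt,
\]

where $d(t)$ denotes the demanded capacity, $s(t)$ the supplied capacity and $[x]^+ = \max(x, 0)$. A perfectly elastic platform would drive both quantities towards zero, while the scaling speed governs how long each excursion of $|s(t) - d(t)|$ persists.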

Although those resource-oriented elasticity definitions seem to be reasonable at first blush, one can easily encounter exceptions when looking through the lens of the cloud consumer. These definitions are subjective (as they use terms such as rapidly and closely), and their precise interpretation depends on the consumer’s context. For instance, a cloud platform with a 2-minute scaling delay may be perceived differently by different applications. A batch processing application may find it elastic, whereas an interactive application with stringent QoS requirements (e.g., a web application, online gaming) may perceive the same platform as not-so-elastic. Moreover, these definitions relate elasticity to the platform’s resource adaptability behavior only; looking at these definitions, it is not possible to interpret the meaning of elasticity in terms of “economies of scale” (the main motivating factor for elasticity). As an example, consider two cloud platforms which are identical in all respects (scaling speed and precision) except their pricing models. Based on the resource elasticity definition, these two platforms have identical elasticity behavior; however, this conclusion is misleading as they yield different operational costs per transaction.

To resolve the above issues, the elasticity definition needs to incorporate the consumer’s context and the notion of cost-effectiveness. A number of authors addressed these concerns in their elasticity definitions [201, 40, 94, 93, 113, 111, 112, 155, 145, 103, 129, 78, 243].

Several authors [40, 94, 93, 155] explicitly tied the rapidity of the resource adaptation process to the consumer’s application performance. They emphasized seamless resource adaptability so that the hosted application does not suffer from service disruption and performance variation when resources are added or removed by the underlying platform.

Gambi et al. [113, 111, 112] specifically stressed two aspects while defining the elasticity of the cloud platform: quick resource adaptation to ensure consistent QoS, and cost-effective resource usage for minimal operational expenses. Their definitions are as follows:

“Elastic computing systems can dynamically scale to continuously and cost-effectively provide their required Quality of Service in face of time-varying workloads, and they are usually implemented in the cloud.” (Gambi et al. [111])

“Cloud-based elastic computing systems dynamically change their resources allocation to provide consistent quality of service and minimal usage of resources in the face of workload fluctuations.” (Gambi et al. [112])

Daryl Plummer [200], a Gartner Fellow, viewed elasticity as a critical tool to be handled by consumers because of several factors, such as the variable nature of workloads and the lack of control in upfront budgetary planning. In his opinion, companies lacking acumen in managing fluctuating workloads may sometimes end up with operational expenditures far beyond their anticipated limits.

The blog posts from Timothy Fitz and Ricky Ho [103, 129] pointed to an additional aspect of elastic behavior, i.e., the granularity of usage accounting. That is, the deprovisioning aspect of elasticity is not only a function of the speed to decommission a resource, but also depends on whether charging for that resource is immediately ceased or not. This aspect is crucial to cover the notion of cost-effectiveness in the elasticity definition. Timothy Fitz also identified several enterprise use cases (e.g., extremely parallel computing needs, web user interaction) where the current degree of spin-down elasticity (typically an hour) is inadequate.

Considering all of these definitions, we identify the following key characteristics of elasticity:

• Resource adaptation speed. Ideally, an elastic cloud platform should seamlessly adapt its resource capacity in response to the time-varying workload so as to satisfy the QoS requirements of the hosted application.

• Maximum scaling factor. It refers to the maximum amount of resources that can be procured at once. To the consumer, the amount of resources in the provider’s resource pool should appear to be virtually infinite, and they should be able to acquire as many servers as they need at any single moment to meet their workload demands.

• Granularity of usage accounting. In an elastic cloud platform, a consumer should be charged based on the amount of resources consumed and no more. It depends on two factors: the granularity of the resource supply and the charging quanta of the cloud platform. If the resources are allocated at a fine granularity, then the supplied resource quantity can closely follow the demand curve, thereby minimizing the operating expenses for idle resources. On the other hand, the charging quanta determine whether charging for a resource is immediately ceased with its release or not; this also influences the operational cost for unutilized resources.

• Cost-effective resource usage. A motivating factor behind elastic cloud adoption is economies of scale; that is, consumers would prefer a cloud offering that minimizes the operational cost per transaction over a range of scales. This essentially points to the pricing strategy of the cloud platform; between two platforms A and B identical in all respects, A provides better elasticity if and only if the cost per transaction is cheaper in A than in B (a small numerical illustration follows below).

Among these aspects, the Maximum scaling factor cannot be measured directly at the consumer’s end. However, this is not a major problem because almost all cloud providers nowadays possess virtually infinite resource capacity, which is most often sufficient to serve the peak workloads of the consumer. The other characteristics, however, can be practically measured at the consumer’s end; therefore, we seek to incorporate these characteristics into our elasticity metric definition.
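
A small numerical illustration (with made-up prices and throughput, purely for exposition) shows how the charging quantum and pricing strategy feed into the cost per transaction:

    import math

    # Hypothetical comparison of two otherwise identical platforms that differ
    # only in charging quantum; prices and throughput are illustrative assumptions.
    def cost_per_transaction(busy_minutes, price_per_hour, quantum_minutes, tx_per_minute):
        """Cost per transaction when usage is rounded up to the charging quantum."""
        billed_minutes = math.ceil(busy_minutes / quantum_minutes) * quantum_minutes
        cost = billed_minutes / 60 * price_per_hour
        return cost / (busy_minutes * tx_per_minute)

    # A VM is busy for 75 minutes at 600 transactions/minute, priced at $0.10/hour.
    print(cost_per_transaction(75, 0.10, 60, 600))   # hourly quantum: billed for 120 min
    print(cost_per_transaction(75, 0.10, 1, 600))    # per-minute quantum: billed for 75 min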

2.1.2 Comparison: Elasticity, Scalability and Efficiency

Elasticity, scalability and efficiency are commonly used (and often misused) in the context of cloud computing. Several academic works and practitioners’ blogs [236, 97, 40, 64, 156, 129, 221] discussed the similarities and dissimilarities among these terms.

2.1.2.1 Scalability

In general, scalability is the ability of a system to meet its quality requirements (e.g., supporting more users, improving QoS or both) in response to a growing workload by incrementally adding a proportional amount of resource capacity [141, 240, 95, 87]. In traditional client-server systems, applications are designed for scalability to make sure that the operational expenses grow cost-effectively with respect to workload demand. The case of removing resources, when the workload demand shrinks back to normal, is usually not considered because the purchased resources are already a sunk cost. However, in the context of the cloud and its pay-as-you-go pricing model, the financial motivation for releasing resources during low demand cannot be ignored, and hence the notion of scalability needs to go through further refinement [129]. In cloud computing, scalability describes the ability of a higher-layer service to meet its quality requirements in response to varying workload demand by adjusting its resource consumption from its lower-layer service [64]. Resource capacity can be scaled in two different ways: vertical scaling (also known as scaling up/down) and horizontal scaling (also known as scaling out/in). Vertical scaling means the addition (or removal) of resources to a computing node. On the other hand, horizontal scaling refers to the addition (or removal) of computing nodes to an existing cluster. Note that scalability is a time-free notion and hence it does not capture how quickly the system adapts to a changing workload demand over time. In contrast, elasticity is a time-dependent notion which measures how quickly a system can adapt to a varying workload without causing any disruption in the offered service. This rapid responsiveness notion is absent from the definition of scalability. Moreover, scalability does not consider how frequently and at what granularity the system adjusts its resource capacity as the workload demand varies over time. On the contrary, elasticity is concerned about how precisely the supplied capacity follows the workload demand over time; therefore, these aspects are considered crucial ingredients in defining and measuring elasticity. In the context of the cloud, elasticity means real-time optimization of the operational expenses with the variation in workload demand; in other words, good elasticity ensures significant savings with reduced operational expenditure and minimal service interruption.

2.1.2.2 Efficiency

Efficiency describes a system’s ability to process a certain amount of work with the smallest possible effort (e.g., cost, consumed energy or consumed amount of resources) [64, 236]. This term can be applied either to part of a system (e.g., a single resource) or to an entire system. The workload for which efficiency is measured could be either constant or fluctuating over time. In contrast, for constant workloads, elasticity is not an attractive option as there is no need for resource adaptation in this case; the value of elasticity can be best realized in a fluctuating-demand setting.

2.1.2.3 Comparison

Table 2.1 compares elasticity, scalability and efficiency based on several criteria. As we can see, elasticity is a dynamic property of the system, as it allows real-time adaptation of its resource quantity in response to workload fluctuations. On the other hand, scalability and efficiency are static properties of the system; neither is concerned with the real-time adaptation process. Scalability and elasticity, however, are both related to the adaptability aspect of the system. Scalability is a measure of the system’s ability to function gracefully at different scales, whereas elasticity specifically determines the quality of the adaptation process - that is, whether the system can make a smooth and continuous transition between different scales in real time to guarantee acceptable QoS in response to the fluctuating workload demand. The quality of the adaptation process is measured in terms of adaptation time and accuracy between the demanded and supplied resource quantity. On the contrary, scalability is not at all concerned with responsiveness delay and resource accuracy.

Table 2.1: Comparison: Elasticity, Scalability and Efficiency

Criteria                 Elasticity   Scalability   Efficiency
Dynamic property             X
Adaptability                 X             X
Real-time adaptation         X
Quality of adaptation        X
Performance (QoS)            X             X             X
Resource                     X             X             X
Cost-effectiveness           X             X             X

Note that all of these terms are more or less related to performance, resource and cost-effectiveness. However, efficiency is a broader concept and may apply to a range of domains in the cloud computing context. For example, power efficiency defines how well a system consumes power with respect to time [76, 75], computational efficiency reflects the ratio between achieved operations and peak operations per second [98], etc. In the context of scalability, cost may sometimes be implicit, such as when the quality of interest is performance and energy consumption (e.g., a performance/watt metric). In Section 2.1.1, several resource-oriented elasticity definitions have been discussed that completely ignore performance and cost aspects. We argue that elasticity is not an absolute concept, but a relative one whose precise value depends on the consumer’s context. Therefore, leaving performance and cost out of its definition and measurement conveys an incomplete view of elasticity.

2.1.3 Foundation

This section provides an architectural overview of the elastic system and its properties [96, 237, 57].

2.1.3.1 Elastic system architecture

An elastic system can dynamically adjust its resource capacity in the face of fluctuating workload conditions. Thus, it maintains a tolerable performance threshold in the delivered service with minimal operational expenditure. In particular, two specific features of the cloud serve as the main motivation behind an elastic system: on-demand availability of resources and a usage-based pricing model.

The working process of an elastic system can be described intuitively as follows. Usually, an elastic system starts with some minimum allocation of resources so that it can process requests under normal workload conditions. In the course of time, there may arise situations (e.g., seasonality of user demand, flash crowds) when the system resources tend to saturate because of a sudden increase in the arrival of requests. In this condition, an elastic system stretches its capacity by allocating additional resources from the cloud to maintain a tolerable performance limit. When the peak is over and the request arrival rate drops back to normal, a fraction of the resources in the system remains unutilized. In this condition, the system contracts its capacity by de-allocating a portion of resources to the cloud, thus no longer paying for idle resources. This is how an elastic system meets its performance criteria with minimal operating expenses in response to workload fluctuations.

Figure 2.1: Architecture of an elastic system (adapted from [126])

Figure 2.1 shows the basic architecture of an elastic system. At the core of the elastic system, there are two components: a scalable infrastructure and a management system. The cloud providers offer infrastructures or development platforms to the consumers in the form of virtual machines (VMs) with access to network bandwidth and storage. A hypervisor or Virtual Machine Monitor (VMM) abstracts the virtual machines from the underlying physical hardware. The hypervisor has control over the physical server’s resources and can manage the allocation (de-allocation) of resources to the virtual machines in case a scale-up (scale-down) is required. In some cases, the cloud consumer prefers to distribute her application load to a pool of virtual machines with dynamic scale-out (scale-in) capability. This task is performed by a load-balancer which sits in front of the pool of virtual machines and forwards the incoming requests based on a pre-configured routing algorithm (e.g., round-robin, least outstanding requests) to the attached VMs. The scalable infrastructure is administered by a Cloud Management System which is comprised of a load-balancer, a monitoring system, a reconfiguration management module and, in most cases, an elastic controller. The monitoring system periodically monitors the resource utilization and performance characteristics of the VM instance pool. It can also send notifications to the elastic controller or the application administrator once a triggering condition is configured. A trigger watches a single metric (e.g., CPU utilization, latency) or a combination of metrics over a time interval and sends a notification once the observed value of the metric(s) meets a predefined threshold condition for a specific number of times. The elastic controller, on receiving the trigger notification, makes an adjustment to the resource capacity by invoking the reconfiguration management API. The reconfiguration management service places a request to the scalable infrastructure for an allocation (de-allocation) of resources and updates the load-balancer’s instance pool, if necessary, once the request gets fulfilled. This is how an elastic system can adapt to the workload demand by stretching and contracting its resource capacity in real time. The quality of the adaptation process depends on several parameters, for example, the provisioning and deprovisioning time of the scalable cloud infrastructure, the supplied resource granularity, the cloud provider’s pricing model, the elastic controller’s adaptation strategy to load variation and so on.
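
The trigger-and-controller interplay described above can be sketched as a simple rule-based loop (a generic illustration with assumed thresholds, breach counts and pool limits, not the configuration of any particular cloud provider):

    # Generic sketch of a threshold-based elastic controller; thresholds, pool
    # limits and the breach count are illustrative assumptions.
    def scaling_decision(cpu_history, pool_size, high=70, low=30,
                         breaches=3, min_vms=1, max_vms=10):
        """Return +1 (scale out), -1 (scale in) or 0 based on recent CPU samples."""
        recent = cpu_history[-breaches:]
        if len(recent) < breaches:
            return 0                                  # not enough observations yet
        if all(u > high for u in recent) and pool_size < max_vms:
            return +1                                 # sustained overload: add a VM
        if all(u < low for u in recent) and pool_size > min_vms:
            return -1                                 # sustained idleness: remove a VM
        return 0

    # Example: three consecutive samples above the high-water mark trigger scale-out.
    print(scaling_decision([65, 82, 88, 91], pool_size=2))   # -> 1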

2.1.3.2 Properties

The behavior of a cloud-based elastic system is a function of various factors and their complex dependencies, such as the adaptation quality of the underlying cloud platform, the workload pattern and the adaptation logic specified in the elasticity controller. These complex dependencies often give rise to hard-to-determine effects on the elasticity behavior of the cloud-based application. For this reason, application providers feel the need for a formalization and verification technique to check whether the elastic system adheres to the intended elasticity behavior or not. To address this need, Bersani et al. [57] characterized the properties of an elastic system and proposed a formalization of it using a temporal logic called CLTLt(D), which stands for Timed Constraint LTL. This formalization is later used to construct a verification tool for validating whether certain facets of an elastic system hold or not during its execution in the cloud. In the following, we describe the properties of a cloud-hosted elastic system. For brevity, we skip the formalized definitions. The details can be found in [57]. The properties of an elastic system can be classified into three categories: elasticity, resource management and quality of service.

• Eagerness. It describes the adaptation speed in response to workload variation over time.

• Sensitivity. It denotes the minimum change in workload that starts an adaptation process. This parameter is defined over a range of the recently observed load. If the load stays within a particular limit, then the system resources are adequate to meet the demand, and therefore no adaptation action is required. Otherwise, the system needs to go through an adaptation process to prevent resource saturation or resource wastage.

• Plasticity. It defines the system’s inability to release resources and go back to the minimal resource configuration from a state with higher resource capacity. Ideally, an elastic system should be able to de-allocate a portion of its resources and revert to the minimal resource configuration within a reasonable amount of time once the load intensity decreases; failure to exhibit this behavior makes the system plastic.

• Precision. It defines the accuracy with which the elastic system can allocate or de-allocate resources so that the difference between the supplied resource quantity and the workload demand is minimal (or zero in an ideal case).

• Oscillation. It refers to repeated allocation and de-allocation of resources even when the workload demand is stable. Poorly designed scaling rulesets may sometimes give rise to oscillatory behavior. Although oscillation appears to be a valid behavior, it has a negative impact on the running cost of the cloud application.

• Resource thrashing. It refers to a situation in which elastic systems tend to exhibit opposite adaptations within a very short interval. For instance, a system may acquire additional resources in one interval and then, when the resource allocation phase is completed, start to release those resources. This can be thought of as a temporary, yet quick, oscillation in the adaptation of resources. In a resource thrashing situation, the portion of resources taking part in the oscillation cannot perform any useful work, yet it increases the running cost of the cloud-hosted application (a small detection sketch follows this list).

• Cool-down period. It is the amount of time that must elapse before a new scaling action can be issued after an adaptation action has been triggered. During this interval, the elastic controller inhibits itself from triggering any scaling action in order to let the system stabilize after the previous adaptation. The purpose of the cool-down period is to prevent any unexpected oscillatory behavior.

• Bounded concurrent adaptations. It is the maximum number of scaling actions that are allowed by the elastic controller when an adaptation is in progress. It can be viewed as a relaxed generalization of the cool-down period strategy, where the maximum number of scaling actions during an adaptation is one.

• Bounded resource usage. It is a property that allows the elastic system to use resources beyond a certain threshold, if necessary, for a pre-specified interval. This means that, whenever the elastic controller allocates more resources than specified by some temporary threshold, it should de-allocate those excess resources before the end of that interval.

• Bounded QoS degradation. It is a property that restrains the amount of QoS degradation during adaptation. It conveys the fact that the normally-required QoS threshold can only be satisfied when the elastic system is stable, i.e., it is not going through any adaptation process; otherwise, a more relaxed QoS threshold needs to be enforced during adaptation.

• Bounded actuation delay. It restrains the maximum delay introduced by the elastic controller and reconfiguration service to execute a scaling action. Intuitively, the time required for the application to be ready to serve requests must be limited by this value.
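
As a small illustration of how some of these properties could be checked from an observed allocation trace (a sketch with an assumed trace format; the window and threshold are illustrative and unrelated to the formalization in [57]), the following snippet counts direction reversals in the VM count within a sliding window to flag oscillation or thrashing:

    # Sketch: flag oscillation/thrashing by counting direction reversals of the
    # allocated VM count inside a sliding window; window and threshold are assumptions.
    def is_thrashing(vm_counts, window=6, max_reversals=2):
        """Return True if the allocation reverses direction too often in any window."""
        deltas = [b - a for a, b in zip(vm_counts, vm_counts[1:]) if b != a]
        for i in range(max(1, len(deltas) - window + 1)):
            chunk = deltas[i:i + window]
            reversals = sum(1 for x, y in zip(chunk, chunk[1:]) if x * y < 0)
            if reversals > max_reversals:
                return True
        return False

    print(is_thrashing([2, 3, 2, 3, 2, 3, 2]))   # rapid up/down adaptations -> True
    print(is_thrashing([2, 2, 3, 4, 4, 3, 2]))   # one ramp up then down      -> False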

Preserving the intended elasticity properties may seem reasonably trivial to achieve at first blush, yet there may arise some imperceptible situations in which the system fails to exhibit the desired level of elasticity. In Chapter 3, we will present some real-world examples to elucidate this point further.

2.2 Related work

This section presents an overview of the state of the art research on elasticity evaluation and other related areas.

2.2.1 Cloud performance analysis and benchmarking

As mentioned in the previous section, elasticity is a combination of several internal attributes of the cloud platform, some of which are directly related to low-level performance measures (e.g., scaling latency, resource specification). It also has some commonalities with the concept of scalability. From this perspective, our work has some relevance to the area of cloud performance analysis and benchmarking; some of those works analyzed newly emerged features (e.g., variability) of the cloud platform, while others refined the existing metrics and evaluation strategies for established ones (e.g., scalability). In the sections that follow, we highlight the main differences between these streams of research and our work.

2.2.1.1 Variability

Variability refers to the extent of spread or fluctuation in the observed values of a performance aspect of the cloud offering [166]. This branch of research is specifically concerned with quantifying the variability of a set of primitive performance measures, investigating possible causes for variability and recommending viable solutions for its effective management [196, 193, 210, 88, 51, 162, 241, 136, 135, 99, 235, 73, 246, 153].

Several studies discovered substantial variability in the performance (e.g., resource scaling latency, CPU, IO and network performance of VMs) of public cloud offerings [196, 136, 210, 88, 162]. Some key findings from these studies are: (1) the performance of identical VM types is very heterogeneous and may vary up to a factor of 4 [88]; (2) most of these performance measures exhibit weekly and yearly patterns [196, 136]; (3) the choice of availability zone, time of the day and day of the week often exert significant influence on these performance measures; for instance, VM startup time showed high mean covariance when aggregated by hour of the day or day of the week (up to 91.4% for day of the week for the EC2 m1.small instance in US data centers) in the study of Schad et al. [210]. Most of these works also conducted a series of case studies to analyze the impact of these micro variances on scientific and interactive application prototypes. These empirical analyses finally led the authors to the following conclusions: (1) variability has significant implications for SLA-aware dynamic provisioning systems in terms of both performance and cost, and (2) the complex dependency of the performance measures on the temporal and spatial aspects of the cloud platform has an influence on the repeatability and reproducibility of wall-clock experiments.

Considerable effort has been spent on addressing this variability problem; some works delved deeper to relate the performance variability effect to the internal factors of the cloud environment [99, 51, 241, 162, 235, 73], while others suggested novel techniques to mitigate its impact and deliver reliable QoS guarantees for the consumer’s application [51, 193, 241, 99, 153]. Various environmental factors of the cloud contribute to performance variability; examples include heterogeneity in the underlying commodity hardware, interference effects from co-located VMs, sudden overload due to resource over-commitment at the provider’s end and so on [99, 51, 241, 162, 235, 246]. To resolve this issue, several solutions have been proposed; for instance, heterogeneity-aware VM placement strategies [99], Overdriver, which adaptively switches between network-memory-based co-operative swap and VM migration techniques for mitigating transient and sustained memory overloads, respectively [241], Q-Clouds, an application-feedback-driven closed-loop controller that tunes resource allocation to guarantee reliable QoS [193], and several admission control policies for performance isolation (such as round-robin request handling and blacklisting disruptive tenants) [153].

Although at first glance this research does not seem to have much relevance to our work, it offers us some valuable insights for designing the elasticity benchmarking framework. It reveals significant variability in the micro characteristics of elasticity (e.g., scaling latency, resource demand due to variation in the VM performance), thereby setting our expectation that the overall elasticity behavior will also exhibit some variation across different benchmark runs. To reiterate, variability may affect the validity and repeatability of the elasticity evaluation results if appropriate precautions are not taken. To resolve this issue, we recommend a set of rigorous techniques to ensure valid and repeatable elasticity evaluation results in Chapter 5.

2.2.1.2 Scalability

As pointed out in Section 2.1.2, both scalability and elasticity address the adaptability aspect of the cloud platform, albeit in complementary ways. From this perspective, our work is closely connected to scalability evaluation research.

This area of research revolves around some key themes: the development of scalability benchmarks and metrics [151, 64, 225, 116, 93, 79], scalability evaluation of cloud offerings for a range of workloads [151, 79, 93, 155] and the development of scalability testing frameworks [115, 234]. Many works defined scalability measurement metrics in the context of cloud computing [151, 64, 225, 116, 93, 79]. Some example scalability metrics are: scaleup [79]; WIPS and cost/WIPS, where WIPS is defined as the number of requests meeting the Service Level Agreement (SLA) constraints [151]; performance change with respect to workload change (measured as the PRR ratio for the compared workloads, where PRR stands for ‘performance to resource ratio’) [225]; mean transactions per second [46]; and the ratio between system load increase and system capacity increase [116]. Most of these works characterize scalability in terms of performance and/or cost - two critical factors that influence the consumer’s purchase decision. The consumer-centric view in this domain is a good source of inspiration for our research. Despite this similarity, there is a clear distinction between scalability and elasticity measurement. A scalability metric is concerned with the system’s ability to meet its quality objectives (e.g., performance and cost) over a given range of workloads; for instance, it checks whether the system can function gracefully (e.g., in terms of latency) when resources are added in proportion to the workload. That means scalability is measured based on the discrete steady-state behavior of the system (pre-adaptation state and post-adaptation state). It is not at all concerned with the system’s behavior during adaptation, that is, how quickly the system makes transitions between states and how accurately the system calibrates its resource allocation. This is what elasticity is concerned with; for this reason, it is considered a dynamic property of the system. This argument also justifies the complementary relationship between scalability and elasticity; both of them together represent the adaptability behavior of the system.

Another obvious difference between scalability and elasticity evaluation relates to the workload specification. Scalability evaluation considers growing workload patterns (that is, workloads that grow larger and larger) [151, 79, 93, 155]; on the contrary, elasticity evaluation requires fluctuating workloads (that is, workloads that grow as well as shrink) to assess the quality of the adaptation process.
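To make this contrast concrete, the following illustrative Python sketch (our own, not drawn from the surveyed works) generates the two kinds of request-rate profiles; the base rate, growth rate, amplitude and period are arbitrary placeholder values.

    # Illustrative request-rate profiles: scalability studies use monotonically
    # growing workloads, whereas elasticity evaluation needs workloads that also shrink.
    import math

    def growing_workload(t, base=100, growth_rate=5):
        """Monotonically increasing request rate (scalability-style pattern)."""
        return base + growth_rate * t

    def fluctuating_workload(t, base=100, amplitude=60, period=60):
        """Sinusoidal request rate that grows as well as shrinks (elasticity-style pattern)."""
        return base + amplitude * math.sin(2 * math.pi * t / period)

    if __name__ == "__main__":
        for t in range(0, 121, 30):
            print(t, growing_workload(t), round(fluctuating_workload(t), 1))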

2.2.2 Elasticity benchmarking: initial concepts

This section presents the initial ideas on elasticity measurement; some of these works followed the naive definition of elasticity and expressed it simply in terms of the resource scaling delay, while others put forth more concentrated effort to sketch the basic layout for an elasticity benchmarking framework.

As mentioned in Section 2.1.1, elasticity was initially viewed as the scaling delay of the underlying cloud service in adjusting its resource capacity. Inspired by this definition, several research groups focused on measuring different statistics of the scaling delay, its complex dependency on various non-deterministic factors of the cloud platform, and its suitability for handling various workload scenarios; examples include [97, 164, 163, 127, 136, 173, 65]. Recall from Section 2.1.1 that scaling delay is but one of the many aspects of elasticity; reporting it alone is not sufficient to provide a complete view of the elasticity of the cloud platform (not even from the resource elasticity perspective).

The Standard Performance Evaluation Corporation (SPEC) Open Systems Group [37] characterized elasticity in terms of four metrics: Provisioning interval (same as scaling delay), Agility (which measures how closely the supplied resource quantity tracks the workload demand), Scale up/down (which measures the system's ability to maintain a consistent unit completion time for an increased problem size by adding a proportional amount of resources) and Elastic Speedup (which measures whether or not the performance improvement is proportional to the increase in resource quantity). Note that the first two metrics capture the resource elasticity view, whereas the last two reflect scalability, not elasticity.

Among others, Suleiman et al. [218] characterized elasticity in terms of elasticity time (i.e., the scaling delay), the minimum and maximum amount of resources that can be added (i.e., how much to scale), the specification and types of resources, and the amount of available resources. The second and third metrics influence the granularity of usage accounting to some extent; however, these are not specific characteristics of elasticity.

Folkerts et al. [105] suggested an elasticity metric based on the price of a varying workload relative to that of the full workload; a cheaper price for the varying workload serves as an indicator of the cloud platform's elasticity. This metric completely disregards the consumer's detriment from spin-up delay and imprecise resource supply, and thus conveys a flawed view of the consumer-perceived elasticity of the cloud platform.

Li et al. [166] characterized elasticity in terms of three metrics: Resource acquisition time, Resource release time, and Cost and time effectiveness (which relates to the granularity of usage accounting to some extent). These discrete metrics, though comprehensive, cannot reflect the complex interaction between the cloud platform's elasticity property and the consumer's application-specific context; this is the gap we want to address in this dissertation.

Binnig et al. [59] sketched the initial layout for an elasticity benchmark. They defined an elasticity metric based on the ratio of WIPS in RT (Web Interactions Per Second satisfying the given Response Time constraint) and Issued WIPS; a ratio of 1 means perfect elasticity of the cloud platform in response to that workload. In addition to this primary metric, they recommended reporting cost (i.e., $/WIPS) and the standard deviation in cost; a smaller and more stable cost value indicates better adaptability of the cloud platform. Furthermore, they also discussed the general characteristics of elasticity benchmarking workloads, such as slowly growing workloads and fast, sudden spikes that stress the platform's adaptability. This idea, although perfectly valid, did not crystallize into a tangible elasticity benchmark; nevertheless, it still serves as a source of inspiration for many consumer-centric elasticity benchmarking frameworks, including ours.

2.2.3 Elasticity benchmarking frameworks

This section reports on the state of the art research in elasticity evaluation. This stream of research specifically concentrates on the design and development of elasticity benchmarks as well as revealing the pros and cons of alternative elasticity solutions. The publications in this area can be roughly divided into two categories: micro-benchmarking frameworks and macro-benchmarking frameworks. A micro-benchmarking framework focuses on the discrete characteristics of elasticity to reveal potential bottlenecks of the underlying system, while a macro-benchmarking framework attempts to draw an overall conclusion about the relative worthiness of competing cloud platforms.

Figure 2.2: Taxonomy of factors for analyzing elasticity benchmarking frameworks (Assumptions and scope: cloud deployment model, cloud service, system evaluated, modeling perspective; Measurement framework: modeling approach, metric(s), workload profile, figure of merit, testing method; Validation strategy; Pragmatic issues)

In the subsections that follow, we present a taxonomy of factors for analyzing this diverse set of elasticity measurement frameworks and then we critically review these works based on the proposed factors. At the end, we highlight the research challenges and open issues in this domain and include those in our research roadmap.

2.2.3.1 Taxonomy

Now we employ a taxonomy for analyzing the elasticity benchmarking frameworks based on four criteria: (1) Assumptions and scope, (2) Measurement framework, (3) Validation strategy and (4) Pragmatic issues that have been addressed. This taxonomy has been derived from the study and analysis of the surveyed elasticity benchmarking frameworks. Fig 2.2 illustrates our proposed taxonomy.

The first criterion, Assumptions and scope, defines the context and applicability of the measurement framework. It can be further divided into several characteristics: Cloud deployment model, Cloud service, System evaluated and Modeling perspective. The cloud deployment model describes how resources are provisioned based on the organizational structure and the provisioning location. The cloud service is any service that is provided over the internet on demand. In our surveyed measurement frameworks, we have identified three different deployment models, namely private, public and hybrid cloud, and two cloud services, namely Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). The system evaluated aspect refers to the type of the targeted system under study: application or database. The modeling perspective corresponds to the point of view from which the framework is designed and developed. Our surveyed elasticity frameworks either hold the consumer's and/or the provider's perspective or focus solely on technical characteristics.

The next criterion, Measurement framework, usually refers to a skeletal structure with a set of metrics and a set of rules that govern the test conditions and method, for instance, input workload profiles, testbed specification, testing methodology, etc. This aspect can be sub-classified as follows: Modeling approach, Metric(s), Workload profile, Figure of merit and Testing method. The modeling approach describes how the specific aspects of elasticity are conceptualized and represented. For micro-benchmarking frameworks, we characterize it with respect to the elasticity dimensions they address: the capacity, QoS and cost dimensions [186]. For the macro-benchmarking frameworks, in contrast, we characterize it based on the technique adopted to represent the elasticity of the whole system; examples include multi-criteria analysis, financial impact analysis and performance overhead analysis. The metric(s) refers to the specific measures used to gauge the elasticity of the SUT. The workload profile corresponds to representative use-case scenarios for a particular domain, e.g., transaction processing applications, scientific applications etc. Usually, the workloads used for elasticity benchmarking include fluctuating patterns and a mix of transactions. The figure of merit indicates whether or not the elasticity metric(s) for a set of workload profiles can be combined into one. A unified elasticity metric facilitates informed decision-making over several competing cloud solutions. The testing method denotes how much control over or internal knowledge about the SUT is needed to carry out the benchmarking task; examples include black-box, grey-box and white-box testing.

The criterion Validation strategy is used to confirm whether or not the proposed measurement framework can appropriately reflect the intended aspects of elasticity. Examples include theoretical analysis, simulation and experimentation.

The last criterion, Pragmatic issues, corresponds to realistic phenomena and considerations that arise most often in practical situations. Examples include SLAs, charging anomalies of cloud environments etc.

Figure 2.3: Overview of macro-benchmarking modeling approaches (financial impact analysis: Weinman, Tinnefeld et al.; performance impact analysis: Dory et al., Shawky et al., Almeida et al.; multi-criteria analysis: Majakorpi)

2.2.3.2 Macro-benchmarking frameworks

A macro-benchmarking framework quantifies the elasticity of the system as a whole. It measures elasticity with respect to a specific class of applications and yields a few summary measures as a reflection of the system's elasticity behavior. It is particularly useful to stakeholders in that it helps them draw conclusions and make informed comparisons over competing cloud offerings, adaptive policies, design alternatives and deployment configurations. It should be noted, however, that it does not necessarily reveal the reasons behind a system's poor elasticity behavior.

The frameworks in this category can be further classified based on their respective approaches to modeling the elasticity behavior. Fig. 2.3 provides an overview of this classification.

2.2.3.2.1 Financial impact analysis

The financial impact analysis based approach quantifies the implications of under-provisioning and over-provisioning in terms of monetary cost. These frameworks usually take on the consumer's perspective and hold the assumption that the consumer's application has to meet a predefined SLA while minimizing operational expenses. Failure to meet the SLA due to insufficient resources or excess payment for unutilized resources contributes to a penalty which is later transformed into monetary units. The obvious merit of these frameworks lies in modeling the consumer's detriment in monetary terms. However, it requires a solid understanding of the consumer's business situation and the cloud platform's charging policy; failure to accommodate any of these aspects may prevent any practical use of this type of framework.

Weinman's framework. To the best of our knowledge, Weinman [239] was the first to coin the term "Penalty model" in his proposed elasticity measurement framework, which explicitly takes on the consumer's viewpoint. The main idea behind Weinman's model stems from the basics of economics, i.e., demand and supply. In his measurement model, the resource demand is expressed as a function of time, D(t), and the allocated resource supply is expressed as a function of time, R(t). Suppose that, for some time point t_i, there is a difference between the observed demand D(t_i) and the provided resource supply R(t_i). If D(t_i) > R(t_i), then the system is under-provisioned, i.e., the current resource capacity is not adequate to meet the demand. On the other hand, if D(t_i) < R(t_i), the system is over-provisioned, i.e., the consumer is paying for resource capacity that remains unutilized. The penalties accumulated in these under- and over-provisioned states are expressed in monetary terms and combined into the consumer's total financial loss.
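The following illustrative Python sketch (our own simplification, not Weinman's actual formulation) accumulates penalties from the gap between D(t_i) and R(t_i) at discrete sample points; the unit penalty rates under_rate and over_rate are hypothetical placeholders.

    # Demand-vs-supply penalty accumulation in the spirit of the model above.
    # The unit penalty rates are hypothetical placeholders.
    def penalty(demand, supply, under_rate=1.0, over_rate=0.5):
        """demand, supply: sequences of D(t_i) and R(t_i) sampled at equal intervals."""
        total = 0.0
        for d, r in zip(demand, supply):
            if d > r:                      # under-provisioned: unmet demand
                total += under_rate * (d - r)
            elif r > d:                    # over-provisioned: idle, paid-for capacity
                total += over_rate * (r - d)
        return total

    # Example: a demand spike that the supply follows with a one-step lag.
    D = [2, 2, 6, 6, 3, 3]
    R = [2, 2, 2, 6, 6, 3]
    print(penalty(D, R))  # non-zero penalty caused by the adaptation lag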

Tinnefeld et al.'s framework. Tinnefeld et al. [223] presented a consumer-centric elasticity benchmarking framework for IaaS-hosted Relational Database Management Systems (RDBMS); this framework adopted and simplified our core elasticity framework [139] for database elasticity evaluation. The authors instantiated a benchmarking prototype by specifying the SLA design, resource type and a set of queries. According to this method, the penalty for imperfect elasticity is computed based on observations taken over several executions of a benchmark suite over a particular interval. To measure the under-provisioning penalty for an RDBMS, one has to quantify the amount of latency violations for each benchmark suite run, aggregate the violations across all benchmark suite runs and calculate the opportunity cost based on the SLA. Instead of computing the over-provisioning penalty, this work estimates the total billing cost for all utilized nodes over the entire runtime of the benchmark suite. Finally, the elasticity penalty rate is derived by normalizing the combined value of the under-provisioning penalty and the billing cost to the unit interval. This work is useful in determining the elasticity penalty for relational database clusters; however, it suffers from several drawbacks. It is tightly coupled with a specific resource type (e.g., Amazon m3 double extra instance) and SLA design. Further generalization is required to incorporate a wide variety of resource types and SLA formats. Moreover, this work simplifies the computation of the over-provisioning penalty by replacing it with the billing cost. With this simplification, it is easy to overlook scenarios that may arise in exceptional circumstances, such as situations when the system exhibits plasticity. Furthermore, this work neither provides any guidance on workload design nor offers any unified metric for comparing alternative elasticity solutions.
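To make the derivation above concrete, the following hypothetical sketch combines an SLA-based opportunity cost for latency violations with the node billing cost and normalizes the sum per unit interval; all parameter names and rates are our own assumptions, not Tinnefeld et al.'s implementation.

    # Hypothetical penalty-rate computation: latency violations are priced via the
    # SLA and added to the node billing cost, then normalized per unit interval.
    def elasticity_penalty_rate(latencies_per_run, latency_bound, sla_cost_per_violation,
                                node_hours, price_per_node_hour, runtime_hours):
        violations = sum(sum(1 for l in run if l > latency_bound)
                         for run in latencies_per_run)
        under_provisioning_penalty = violations * sla_cost_per_violation
        billing_cost = node_hours * price_per_node_hour
        return (under_provisioning_penalty + billing_cost) / runtime_hours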

2.2.3.2.2 Performance impact analysis

The frameworks in this category are specifically concerned with the performance implications of imperfect elasticity for the application or the database cluster. They measure elasticity in terms of the performance overhead caused by a prolonged adaptation interval. This approach, however, does not take into account the granularity of usage accounting and the cost-effectiveness of the cloud platform - two crucial ingredients of consumer-perceived elasticity. Another weakness of this approach is that it does not differentiate the performance overhead of adaptation from performance variability; therefore, it is more likely to give rise to confounding effects while deriving the elasticity metric.

Dory et al.'s framework. Dory et al. [92, 94, 93] were among the early pioneers in considering the problem of elasticity measurement for distributed NoSQL databases. They developed a framework that quantifies elasticity as a dimensionless measure. They viewed elasticity as the reaction of a live database cluster in response to node addition or removal during a workload execution. They incorporated two main aspects into their elasticity metric:

• The cluster's stabilization time right after adding or removing a node

• Performance overhead (i.e., the amount of deviation from stable performance) during the stabilization period.

The scope of their work was restricted to scale-out phases only. They hypothesized that right after the addition of a new node to a cluster, the execution time will rise initially and then gradually settle back to the normal level once the new nodes are up and running. They argued that no matter how small the stabilization period is, the standard deviation of execution time will increase because of the data transfer overhead between nodes during this interval. Figure 2.4 illustrates this concept; A and B represent the surface areas related to the execution time increase and the standard deviation increase respectively. The elasticity metric is then computed by dividing the combined areas of A and B by the sum of the average response times of a single request (before bootstrapping new nodes and after stabilization) and the average execution times of the request set (before bootstrapping new nodes and after stabilization) [93]. Alternatively, the elasticity metric can be estimated by dividing the combined areas of A and B by the squared sum of the response times of a single request (before bootstrapping new nodes and after stabilization) and a factor F expressed as the number of requests per cluster [94]. In both cases, the lower the elasticity score, the better the elastic response of the system.
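The following sketch spells out the ratio of the [93] variant under the assumption that the areas A and B and the four timing averages have already been measured by the experimenter; it is an illustration of the formula, not the authors' tooling.

    # The [93] variant of the ratio: areas A and B over the sum of the four
    # timing averages measured before scale-out and after stabilization.
    def dory_elasticity(area_a, area_b, avg_resp_before, avg_resp_after,
                        avg_exec_before, avg_exec_after):
        """Lower scores indicate a better elastic response."""
        denominator = (avg_resp_before + avg_resp_after +
                       avg_exec_before + avg_exec_after)
        return (area_a + area_b) / denominator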

This approach keeps the workload intensity constant during benchmarking. This assumption is not realistic as the adjustment of the cluster size, in most cases, is triggered by fluctuating workloads. Another shortcoming of this work is that it does not specify any metric to account for the contraction phase of the database cluster; so the stakeholders obtain only a partial view of elasticity. These concerns, if not properly addressed, may jeopardize the adoption of this framework.

Figure 2.4: Elasticity measurement concept of Dory et al. (adapted from [94])

Shawky et al.'s framework. Shawky et al. [213] integrated the concept of elasticity theory in Physics into their elasticity measurement model. In Physics, elasticity is defined as the property of a deformed material to return to its original shape when the external force causing the deformation has been removed. It is expressed as the following ratio:

Elasticity = Stress / Strain    (2.1)

where Stress represents the amount of pressure a material can withstand without undergoing any deformation and Strain refers to the amount of deformation.

Motivated by this concept, the authors drew a one-to-one correspondence between the terminology of Physics and that of cloud computing. They measured Stress as the ratio between the demanded resource capacity and the allocated resource capacity, drawing the analogy that the demanded capacity is the external force causing the deformation while the allocated resources try to resist it. Strain was measured as the combination of two factors: the relative change in network bandwidth as a result of adaptation and the provisioning (or deprovisioning) delay. The ratio between Stress and Strain yields the elasticity metric for a cloud solution. Note that the elasticity metric, in this case, is not a dimensionless measure as in Physics; rather it has a unit of 1/s or Hz.

This work, though very interesting and reflective of the elasticity concept in Physics, has several drawbacks. It does not consider the performance impact during the adaptation period; rather it includes the performance before and after adaptation. This view orients its standpoint more towards scalability instead of elasticity. Furthermore, it does not address cloud-specific pragmatic issues, such as charging quanta and variability. It also lacks appropriate guidance on workload design and a unified elasticity metric for the entire workload collection.
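A rough sketch of the stress/strain ratio follows; how the two strain factors are combined is simplified here to a product, which is our assumption rather than the paper's exact rule.

    # Stress/strain ratio; combining the two strain factors as a product is our
    # simplifying assumption.
    def shawky_style_elasticity(demanded_capacity, allocated_capacity,
                                relative_bandwidth_change, adaptation_delay_s):
        stress = demanded_capacity / allocated_capacity
        strain = relative_bandwidth_change * adaptation_delay_s  # simplified combination
        return stress / strain  # carries a unit of 1/s, as noted above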

Almeida et al.'s framework. Almeida et al. [43] described a framework that evaluates the elasticity of a PaaS database by taking into account both the consumer's and the provider's perspectives. This framework proposes metrics for under-provisioning and over-provisioning based on the query response time. The under-provisioning metric is defined as the average magnitude of upper-bound response time violation (i.e., the ratio between the observed response time and its upper bound) for queries which violate the upper-bound response time. The over-provisioning metric, in contrast, corresponds to the situation when the resource quantity is more than necessary to serve the database queries within the expected SLA. In this scenario, the provider is paying more for excess resource quantity. The over-provisioning metric is measured as the average magnitude of lower-bound response time violation (i.e., the ratio between the lower-bound response time and the observed response time) for queries which violate the lower-bound response time. The overall elasticity metric is computed by taking a weighted combination of the under-provisioning and over-provisioning metrics. The authors recommended a higher weight for under-provisioning than for over-provisioning since the former affects both parties.

The framework considers only one workload type: a periodic step function in the number of concurrent users. The workload mix consists of scan and insert queries. The workloads in this framework are not representative of real-world usage scenarios. Moreover, this measurement model holds a restricted view on SLA design; it is not able to incorporate SLAs which have other dimensions, e.g., dropped queries. Moreover, some SLA specifications have step functions and are flexible enough to allow a certain percentage of queries to experience violations. This model needs further refinement to take into account a wide variety of SLAs.
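The following sketch illustrates the weighted combination under the definitions stated above; the weights w_under and w_over are hypothetical values chosen only to reflect the recommended emphasis on under-provisioning.

    # Weighted combination of the two response-time-based metrics; the weights
    # are hypothetical and merely favor under-provisioning as recommended.
    def weighted_elasticity(observed, lower_bound, upper_bound, w_under=0.7, w_over=0.3):
        under = [r / upper_bound for r in observed if r > upper_bound]
        over = [lower_bound / r for r in observed if r < lower_bound]
        m_under = sum(under) / len(under) if under else 0.0
        m_over = sum(over) / len(over) if over else 0.0
        return w_under * m_under + w_over * m_over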

2.2.3.2.3 Multi-criteria analysis

Multi-criteria analysis (MCA) facilitates rational decision-making in situations where multiple conflicting criteria are present. In the context of the cloud, the stakeholders often have conflicting goals; this approach identifies the stakeholder's goals of interest and evaluates how well the elastic system meets these goals in terms of a combined utility measure. However, MCA has been criticized for several overheads, such as specifying appropriate weights for multiple preference criteria and deriving a utility function for each preference based on high-level business objectives and system-level metrics [90]. If the decision makers are not familiar with this approach and/or fail to convert their business objectives into appropriate utility functions, they will end up with wrong assessment scores for alternative cloud platforms. Note that MCA does not evaluate the monetary worth of alternative decisions; rather it measures the extent to which a platform achieves the desired utility. Therefore, stakeholders who prefer to evaluate the economic worth of alternative platforms may not find MCA based frameworks beneficial [47].

Majakorpi's framework. Majakorpi [172] developed a measurement model in his MSc thesis that applies multi-criteria analysis to transform conflicting preference objectives into an elasticity metric for a workload. This work holds the assumption that an application's business objectives can be expressed in terms of low-level metrics, such as performance measures, costs etc. In this thesis, elasticity is viewed as a meta-quality - a quality that is related to other qualities of the system. For this reason, this work formalizes preference objectives based on application-specific business requirements, e.g., performance, cost etc. and determines how much the application conforms to these preference criteria. These preference factors most often conflict with each other; for example, a gain in performance achieved with additional resources may oppose the cost objective and vice versa. For this reason, this work applies the multi-criteria analysis technique to define a utility function - it quantifies the perceived utility of the system as the weighted combination of achieved conformance under multiple preference objectives. The Quality of Elasticity (QoE) metric is then computed by aggregating the achieved utilities over a particular interval and normalizing that value to the unit interval.

This framework offers weak guidance on preference objective specification (e.g., the cost preference function) and workload suite design; the workloads considered here are mostly arbitrary and lack realistic features, e.g., seasonality, occasional trend, bursts etc. Furthermore, it does not provide any guidance on combining the QoE of multiple workloads into a single summary measure, thereby making it difficult for stakeholders to make a conclusive decision and a fair comparison among competing options. In addition, it does not consider cloud performance unpredictability while deriving the elasticity metric.
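The aggregation step can be sketched as follows, assuming the consumer has already expressed each preference criterion as a utility function returning a conformance score in [0, 1]; the function names and weights are illustrative, not Majakorpi's implementation.

    # Aggregating weighted conformance scores over an observation interval.
    def quality_of_elasticity(samples, utility_fns, weights, interval_length):
        """samples: per-sample measurement dicts; utility_fns/weights: dicts keyed by
        preference criterion (e.g., 'latency', 'cost'), each utility in [0, 1]."""
        total_utility = 0.0
        for s in samples:
            total_utility += sum(w * utility_fns[k](s[k]) for k, w in weights.items())
        return total_utility / interval_length  # normalized to the unit interval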

Table 2.2 compares these frameworks based on their general characteristics.

Table 2.2: Characteristics of different elasticity macro-benchmarking frameworks. The table contrasts Weinman [239], Tinnefeld [223], Dory [94, 93], Shawky [213], Almeida [43] and Majakorpi [172] along the taxonomy aspects of Figure 2.2: modeling approach, cloud deployment model, cloud service, system evaluated, modeling perspective, metric(s), workload profile, figure of merit, testing method, validation strategy and pragmatism.

Figure 2.5: Overview of micro-benchmarking modeling approaches (capacity focused: KIT & BUNGEE, Coutinho et al., SMICloud; QoS focused: YCSB, Konstantinou et al., Sakr et al., Kuhlenkamp et al.)

2.2.3.3 Micro-benchmarking frameworks

A micro-benchmarking framework measures either specific aspects of elasticity or a specific portion of the elastic system. It aids in the investigation of bottlenecks responsible for imperfect elasticity and provides insights on how to tweak the system for an improved level of elasticity. This type of framework, however, has a more detail-oriented view and does not necessarily offer any yardstick for drawing conclusions or comparisons over a range of alternative elastic solutions.

The frameworks falling into this category can be sub-classified based on their modeling approaches. Fig. 2.5 depicts this classification.

2.2.3.3.1 Capacity focused framework

This approach specifically concentrates on the capacity-related measures of the system under study. These include the VM provisioning and deprovisioning delay, the maximum number of VMs that can be provisioned during the peak, the precision of the supplied resource capacity, the temporal distribution of the reconfiguration points and so on. The frameworks belonging to this category are suitable for diagnosing potential causes of imperfect elasticity; nevertheless, they fail to characterize the overall impact of elasticity on the application or the database.

KIT and BUNGEE. The KIT group presented their initial ideas on elasticity measurement metrics in [156] and iteratively refined the benchmarking solution in [237] and [126] respectively. This work is concerned with technical aspects only and adopts the elasticity concept of Herbst et al. [125] in its latest measurement metrics.

The KIT group presented several alternative measurement models to analyze the platform's elasticity behavior. In [156], they represented elasticity with respect to three aspects: effect of reconfiguration, temporal distribution of reconfiguration points and reaction time. Effect of reconfiguration resembles the granularity of an adaptation and can be measured in terms of the amount of resources that are added/removed during the adaptation process. Temporal distribution of reconfiguration points refers to the density of adaptation points over a time interval and reaction time corresponds to the scaling delay. In a later work [237], however, they proposed an alternative view on measuring elasticity. In this work, a number of prerequisites were specified for carrying out the benchmarking task across several cloud platforms; these include access to the cloud management server, an autonomic scaling mechanism, SLO specification, identical resource configuration and scaling range support. The model represents the over-provisioned state and the under-provisioned state in terms of several metrics: speed of scaling, precision (or accuracy), i.e., the deviation between the supplied and the demanded resource quantity, and a combined elasticity metric reflecting the adaptation quality as the reciprocal of the average speed and average precision. In their latest publication [126], they settled on the following metrics: a weighted elasticity metric (i.e., the geometric mean of the normalized values of weighted accuracy and weighted timeshare measures) and a jitter metric (reflecting the number of adaptations with respect to demand fluctuations). The benchmarking approach is composed of two stages; in the first stage, system scalability is analyzed to define the relationship between load intensity and resource demand as a function, resourceDemand = f(intensity). Later, the benchmarking load profile is calibrated using this function. In the next stage, the benchmarking workload is applied to the SUT (System Under Test) and the elasticity metrics are computed based on the observed measures. This framework adopts a meta-model approach [146] and the LIMBO toolkit [124] to configure variable-intensity workloads which represent seasonal variations, trends, occasional bursts and noise. The benchmarking workload can either be configured to mimic a realistic trace or be defined in terms of custom characteristics, e.g., seasonality, trend, burst and noise.

This framework, however, suffers from considerable drawbacks. First, it requires the consumer to specify several weights for deriving the timing and accuracy metrics; unfortunately, there is no guidance on how to derive those weights based on the consumer's business situation. Inappropriate specification of weights may mislead the cloud consumer's subjective conclusion about the elasticity of competing cloud platforms. Second, it does not evaluate different platforms based on the same workload demand.
It adjusts the workload intensity for each compared platform so that the same absolute resource demand is induced on each (e.g., x VMs to serve the peak demand), thus completely disregarding the heterogeneity in the efficiency of the underlying resource units during demand calibration. To reiterate, it calibrated the resource demands for EC2 and CloudStack so that each platform served the peak demand with 10 VMs, although the CPU efficiency of the supplied VMs differed between these platforms (the peak request rates were 710 requests/second and 339 requests/second for EC2 and CloudStack respectively). The usefulness of this approach is somewhat questionable from the viewpoint of consumers who want to compare the elasticity of different platforms for the same set of workloads. Moreover, this approach introduces a conversion overhead between resource demand and load intensity. In the context of cloud environments, this conversion is not straightforward and involves a number of challenges, such as performance variability issues, heterogeneity of the physical hardware, the influence of background load and so on [226]. Unfortunately, this framework does not provide any guidance for elasticity benchmarking in the presence of cloud performance unpredictability. Furthermore, the workload generator LIMBO adopted by this framework fails to generate realistic prototypes when the workload has fine-scale burstiness, thereby making it difficult for consumers to carry out elasticity benchmarking for custom workloads.
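As an illustration of how such a combined score behaves, the following sketch (ours, not BUNGEE's reference implementation) computes the geometric mean of two inputs; the weighting and normalization steps that BUNGEE applies beforehand are assumed to have been done by the caller.

    # Geometric mean of two already-normalized scores in (0, 1]; higher is better.
    import math

    def combined_elasticity(norm_weighted_accuracy, norm_weighted_timeshare):
        return math.sqrt(norm_weighted_accuracy * norm_weighted_timeshare)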

Coutinho et al.'s framework. Coutinho et al. [83] proposed a set of secondary metrics to further investigate the system's orchestrated adaptability behavior at the micro level. This measurement model characterizes elasticity behavior in terms of seven metrics: Under-provisioned allocation time, Over-provisioned allocation time, Stable allocation time, Transient allocation time, Total under-provisioned allocated resources, Total over-provisioned allocated resources and Total stable allocated resources. Under-provisioned allocation time and Over-provisioned allocation time correspond to the intervals used in carrying out resource allocation and de-allocation operations respectively. Stable allocation time refers to the intervals when the allocated resources are stable, that is, the system is not going through any adaptation process. Transient allocation time refers to the intervals when the system makes a transition from one state to another right after an allocation or de-allocation operation is executed. Total under-provisioned allocated resources and Total over-provisioned allocated resources are defined as the total resource quantity over all under-provisioned states and all over-provisioned states respectively. Total stable allocated resources corresponds to the resource quantity over stable time periods.

This model, however, has some conceptual limitations; it does not capture over-provisioning during stable and under-provisioned periods. In addition, it interprets Transient allocation time as a theoretical metric but could not justify its presence in relation to practical scenarios. In particular, this metric reflects the underlying cloud platform's overhead in configuring the load balancer, propagating resource configuration changes to the Domain Name Server (DNS) etc. Moreover, it requires further refinement to diagnose pragmatic issues, e.g., the difference between the charged and the allocated resource quantities during adaptation intervals.
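A simplified tally of these interval-based measures might look as follows; the state labels are assumed to be assigned beforehand using Coutinho et al.'s rules, which are not reproduced here.

    # Tallying time and allocated resources per state from pre-labelled samples.
    def tally(samples, sample_interval):
        """samples: iterable of (state, allocated_resources) tuples, where state is
        'under', 'over', 'stable' or 'transient'."""
        time_in = {"under": 0.0, "over": 0.0, "stable": 0.0, "transient": 0.0}
        resources_in = {"under": 0.0, "over": 0.0, "stable": 0.0}
        for state, allocated in samples:
            time_in[state] += sample_interval
            if state in resources_in:
                resources_in[state] += allocated
        return time_in, resources_in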

SMICloud. Garg et al. [119, 118] developed SMICloud, a framework for ranking IaaS clouds. This ranking framework includes a set of quantifiable KPIs based on which consumers can select the IaaS cloud best suited to their needs. Among these KPIs, there is a metric to evaluate elasticity; it is expressed in terms of two measures: the mean time to adapt to a change in resource capacity and the maximum capacity that can be obtained during the peak. This framework, however, does not consider two fundamental aspects of elasticity: the granularity of usage accounting and cost-effective resource usage.

2.2.3.3.2 QoS focused framework

This approach quantifies the QoS impact of elasticity on the application or the database during the adaptation phase. These measures are useful for understanding the interactions of elasticity with the application or database during the adaptation interval; nevertheless, they do not necessarily reveal the underlying reasons for poor elasticity behavior in terms of system-level measures.

Yahoo! Cloud Serving Benchmark (YCSB). Cooper et al. [79] developed YCSB, an open source benchmarking framework to facilitate fair evaluation of data serving systems in the cloud. This benchmark has two tiers: a performance tier and a scaling tier. The scaling tier specifically concentrates on measuring the impact of elasticity. It defines two metrics: scaleup and elastic speedup. The scaleup metric assesses whether the database cluster can serve an increased amount of workload with graceful performance (e.g., latency) as more nodes are added to it. This is a reflection of database scalability. The other metric, elastic speedup, measures the performance (e.g., read latency) of the database cluster during the adaptation period of node addition and data rebalancing under load. This work does not consider the use case where the database cluster needs to contract itself by releasing a portion of its nodes. It also does not take into account other elasticity characteristics, such as the precision of the supplied resources with respect to demand and the granularity of usage accounting, and therefore conveys a partial view of elasticity. Furthermore, this work quantified elasticity in response to a constant workload; it did not consider fluctuating and representative workloads for elasticity measurement.

Konstantinou et al.'s framework. Konstantinou et al. [150] conducted an empirical study on the elasticity of NoSQL database clusters hosted in the cloud. They developed an elasticity-provisioning prototype that automatically resizes and rebalances a cloud-hosted NoSQL database cluster in response to varying loads. This approach determines the impact of node addition (or removal) and data rebalancing operations on a NoSQL cluster based on straightforward performance measures (e.g., query throughput, query latency), utilization measures (e.g., percentage CPU usage) and adaptation time. It reports the time taken to resize the cluster; however, the cluster's adaptability behavior during the transition between states has not been inspected under load. Moreover, it treats data rebalancing as an offline process and does not consider use cases where resize and rebalance operations need to be carried out under high load and perhaps low load. The cluster's adaptability behavior for those use cases was skipped in this investigation. The workload used here is a step function with three levels of intensity and includes read-only queries. The high and low load intensities are designed to trigger node addition and removal operations respectively, whereas the medium load is intended to keep the cluster stable for a while. This workload is synthetic and does not resemble the characteristics of realistic arrival patterns of database queries.

Sakr and Liu's framework. Sakr and Liu [207] formulated a benchmarking proposal to evaluate the elasticity of Database as a Service (DaaS) and virtualized database server scenarios, with coverage for both NoSQL and relational database systems. This measurement model suggests several metrics to quantify elasticity from the consumer's perspective: speedup, scaleup and billing cost. Speedup refers to the time interval during which the database cluster resizes and stabilizes its resource capacity. A database cluster is considered to be in the stable state if the variation in the response time of the queries is equivalent to that of a steady-state reference system. Scaleup is defined as the throughput gain that can be achieved by adding more resources to the cluster in response to a rise in the workload intensity. This metric is not an appropriate measure of elasticity; rather it tends to reflect scalability. Moreover, the proposal does not differentiate metrics for IaaS and PaaS databases; when databases are hosted on an IaaS cloud, the supplied resource capacity as well as the cost may not precisely follow the workload demand. This aspect of elasticity has not been captured by the metrics in this benchmark. The authors planned to adopt and extend the Yahoo! Cloud Serving Benchmark (YCSB) [79] for elasticity evaluation of NoSQL and relational databases. The workload suite consists of a number of fluctuating patterns, e.g., sinusoidal workloads, exponential rise and decay, linearly growing workloads, random workloads etc. They also aimed to consider the geographical distribution of applications and database replicas in order to support elasticity evaluation over multiple data centers.

Kuhlenkamp et al.'s framework. Kuhlenkamp et al. [155] conducted an empirical measurement study on the scalability and elasticity of distributed database systems. They assessed the elasticity of IaaS-hosted distributed databases in terms of two metrics: scaling latency and performance impact. Scaling latency is the duration from the triggering of a scaling action to the end of data streaming. Performance impact is measured in terms of the average throughput as well as the average and 99th percentile latencies during, before and after scaling. These metrics, however, provide only a partial view of elasticity and do not consider other features, e.g., the precision of the supplied resources in following the demand, cost etc. The workloads used to measure elasticity include both constant and varying patterns; YCSB read-heavy and update-heavy workloads with a Zipfian request distribution. However, those workload profiles fail to capture other real-world usage scenarios, such as diurnal patterns, flash crowds etc.

Table 2.3 compares these frameworks based on their general characteristics.

Table 2.3: Characteristics of elasticity micro-benchmarking frameworks. The table contrasts Kuperberg et al. [156], BUNGEE [237, 126], YCSB [79], Coutinho [83], Konstantinou [150], SMICloud [119, 118], Sakr [207] and Kuhlenkamp [155] along the same taxonomy aspects as Table 2.2.

2.2.3.4 Critical reflection

Figure 2.6: Evaluation criteria for elasticity benchmarking frameworks (Metric relevance: adaptation speed, precision, usage accounting; Workload representation: fluctuating, scalable, realistic; Strong target audience; Appropriate representation; Repeatability of results; Fairness; Economical)

The field of measurement and benchmarking is not without controversy. Many proposed models and metrics become the subject of debate and dispute concerning whether or not they reflect what they are intended to. Several authors [131, 132, 105] characterized the key aspects that all good cloud benchmarks should possess. In this section, we evaluate the existing elasticity benchmarking frameworks in the light of those works.

We evaluate the benchmarking frameworks based on seven criteria: Metric relevance, Workload representation, Strong target audience, Appropriate representation, Repeatability of results, Fairness and Economical. The first criterion, Metric relevance, indicates whether or not the metric(s) provides an appropriate and meaningful representation of elasticity. It is expected that the framework's metric(s) would incorporate and reflect the basic aspects of elasticity: Adaptation speed, Precision and Usage accounting. Adaptation speed corresponds to the time required for the SUT to adjust its resource capacity and stabilize itself with the varying workload intensity. It should account for capacity change in both upward and downward directions. Precision refers to the accuracy with which the supplied resources align with the workload demand. Usage accounting refers to the cost-effective adaptability of the operational expenses with fluctuations in the workload intensity. A system with rapid adaptability, precise resource supply and lower operational cost orchestrates a better elastic response than another which falls short in any of the three aspects. The elasticity metric(s) should be able to convey this notion to its target stakeholders.

The next criterion, Workload representation, indicates whether the workload suite has exemplary workload profiles resembling realistic and adaptive usage scenarios. This can be verified by assessing the workload suite with respect to three factors: Fluctuating behavior, Scalable and Realistic features. In order to evaluate elasticity, the workload profiles should be designed to incorporate load intensity fluctuations at different scales and realistic trace-level characteristics.

The information provided by the benchmark should be of interest to a group of stakeholders or a strong target audience. The target audience may be consumers, providers, performance analysts, adaptive system designers and researchers.

Appropriate representation implies whether the benchmark’s claim is consistent with what it does and vice versa. For instance, if a benchmark claims itself as the consumer’s benchmark, it should be able to adequately incorporate the consumer’s concerns and yield results which are useful to the consumers.

The criterion Repeatability of results denotes whether the benchmark implementation can generate consistent results when executed under the same set of testing conditions. Since cloud platforms exhibit performance variability, cloud performance benchmarks should take appropriate precautions to guarantee valid and repeatable results across identical runs of the benchmark.

Another aspect, Fairness, describes whether all compared systems can participate equally; it also means that the benchmark will not favor one system over another. For example, while comparing EC2 with Google Compute Engine, one has to ensure equivalent optimization of the benchmark for both platforms. If the optimization level is higher for EC2, the overall comparison result will be biased towards EC2. Variability is another factor that affects the fair comparison of cloud platforms. Between the compared alternatives, if one platform's elasticity experiment gets perturbed by an external event (e.g., an interference effect), the overall benchmarking result will be biased towards the other platform.

Finally, Economical implies whether the benchmark is affordable and "worth the investment" [131].

Table 2.4 presents a critical evaluation of the surveyed benchmarking frameworks based on the aforementioned criteria. Looking at this table, we identify a number of serious drawbacks in the prevalent elasticity benchmarking frameworks. The first issue relates to metric relevance; the elasticity metric definitions provided by these frameworks do not adequately cover all important aspects, that is, adaptation speed, precision and usage accounting. In other words, the elasticity metrics at present yield a restricted view of the cloud platform's adaptability behavior.

Table 2.4: Evaluation of elasticity benchmarking frameworks against the criteria of Figure 2.6 (Weinman [239], Tinnefeld et al. [223], Dory et al. [94, 93], Shawky et al. [213], Almeida et al. [43], Majakorpi [172], KIT - Kuperberg et al. [156], BUNGEE [237, 126], YCSB [79], Coutinho et al. [83], Konstantinou et al. [150], SMICloud [119, 118], Sakr and Liu [207], Kuhlenkamp et al. [155])

Another key problem with these frameworks is that their workload representations lack realistic flavor. Most of these frameworks carried out elasticity evaluation for arbitrary workloads. Although a few of these works [239, 207] suggested a standard workload suite for elasticity evaluation, they either did not define it properly (that is, mathematically or algorithmically) or failed to cover some frequently observed usage scenarios (such as recurring patterns). BUNGEE [126] includes the LIMBO toolkit that allows reproducing custom prototypes from workload traces. Nevertheless, a criticism of LIMBO is that it fails to reproduce empirical characteristics of fine-scale bursty workloads (frequently seen in web and e-commerce applications).

A majority of these frameworks claim to have a strong target audience; some target the consumers while others target stakeholders whose concerns relate to the technical aspects of elasticity. However, our literature review suggests that none of the consumer-centric elasticity frameworks can appropriately represent the consumer's concerns. Weinman's work [239] can be considered a good starting point for consumer-centric elasticity evaluation; however, its metric definition and workload representation need further refinement to appropriately incorporate the consumer's concerns and cloud-specific pragmatic issues.

Perhaps the most serious disadvantage of these frameworks lies in their inability to ensure repeatable and valid benchmarking results in the presence of the cloud environment's nondeterminism. None of these frameworks provides any guidance for addressing nondeterminism in public cloud environments. All of them reported elasticity evaluation results from only one execution of the benchmark. If repeatability and fairness are not guaranteed, then there is very little economic advantage in running these benchmarks.

In this section, we have highlighted the principal issues in the prevalent elasticity benchmarking frameworks. As we have seen, none of these frameworks adequately covers the basic requirements of a good benchmark. Although several frameworks claimed consumers as their main target audience, none of them appropriately addressed the consumer's concerns in their elasticity metric definitions and workload representations. A partial or improper view of elasticity not only hurts the consumer's revenue and net profit but also stalls innovation in this area. For this reason, it is crucial to design an elasticity benchmarking framework that can appropriately address the consumer's concerns as well as the basic benchmarking requirements.

2.2.4 Elastic scaling techniques

Most of the commercial cloud services today claim elasticity as a virtue they possess; however, the current degree of elasticity is not adequate to meet the stringent QoS requirements of latency-sensitive and real-time applications. Hence speculation abounds on how to exploit the virtually infinite cloud resources in a cost-effective way so as to achieve a balanced trade-off between QoS objectives and operational expenses. The aim of this research stream is to address this concern with optimal adaptive scaling strategies. The elastic controller's adaptive response, in most cases, depends on the quality of these scaling strategies.

Figure 2.7: Classification scheme for elasticity techniques (adapted from [110])

A plethora of dynamic scaling strategies has been developed over the last decade to improve the elasticity experience of the consumer's application. Fig. 2.7 depicts a taxonomy proposed by Galante and De Bona [110] to classify elastic scaling techniques in the cloud. The general idea behind all these techniques is that they start out with a set of precisely defined optimization criteria which are later transformed into a set of scaling policies and ultimately integrated (as a set of programs) with some layer of the cloud deployment stack (e.g., IaaS, PaaS or the application itself). For example, a web application may want to ensure a 2-second response time threshold for its delivered requests with minimal operational cost. Transforming these optimization constraints into dynamic scaling policies requires a good understanding of the underlying platform's elasticity behavior (e.g., scaling latency, charging quanta) and workload characteristics (e.g., growth rate, decay rate, burstiness). These scaling policies can be classified as reactive or predictive. A reactive policy can be specified in terms of a set of <condition, action> pairs. It triggers the adaptation action as soon as the elastic system's recent condition (e.g., observed CPU load) satisfies the rule condition; a minimal sketch of such a rule follows this paragraph. There are many academic works and commercial cloud portals that propose variants of reactive scaling; examples include the AWS Autoscaling API [7], RightScale's autoscaling algorithm [32], [30], Elastic site [174] and the proportional thresholding technique [168]. Reactive policies are relatively simple to understand and easy to implement; however, they cannot guarantee good elasticity for complex workloads, such as unpredictable and bursty workloads [171, 170]. Predictive policies can effectively manage these situations by forecasting the workload demand in the near future and adapting the resource capacity for the anticipated demand. A wide variety of predictive scaling mechanisms can be found in the literature; examples include queuing theoretic approaches [248, 228], control theoretic techniques [197, 167, 249], reinforcement learning policies [203, 245] and time-series analysis [157, 122, 138, 216]. All of these works validated their optimality claims based on self-designed evaluation platforms and arbitrary workload specifications [170]. Difficulties arise, however, when an attempt is made to rank the effectiveness of these scaling techniques. A well-designed elasticity benchmarking framework is, therefore, the need of the hour for a fair assessment of these elasticity techniques.
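As referenced above, the sketch below shows a single threshold-based condition/action rule; the thresholds, step size and VM bounds are illustrative defaults, not those of any particular commercial offering.

    # A single threshold-based condition/action rule with illustrative defaults.
    def reactive_decision(avg_cpu_load, current_vms, min_vms=1, max_vms=20,
                          scale_out_at=0.70, scale_in_at=0.30, step=1):
        """Return the desired VM count for the observed average CPU load."""
        if avg_cpu_load > scale_out_at and current_vms < max_vms:
            return min(current_vms + step, max_vms)   # condition met -> add capacity
        if avg_cpu_load < scale_in_at and current_vms > min_vms:
            return max(current_vms - step, min_vms)   # condition met -> release capacity
        return current_vms                            # no rule fires -> keep capacity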

Note that the elastic scaling technique is an indispensable part of the elastic system; it is not possible to determine the application's elasticity behavior without configuring a scaling technique (in IaaS cloud scenarios). From this perspective, this research area has a close connection to our work. Despite this fact, there are some fundamental differences between our work and elastic scaling. First of all, it is not our intent to design optimal scaling strategies for cloud platforms. Instead, we are looking at the elasticity optimization problem from a complementary angle. Our goal is to develop a framework so that consumers can compare the effectiveness of alternative scaling techniques and quantify the degree of improvement or degradation as a single figure of merit. Therefore, our framework can be regarded as a useful tool to support fair comparison of elasticity techniques. In addition, we carry out extensive case studies in the EC2 cloud and extract valuable insights about the elasticity behavior of several reactive scaling techniques. Based on our observations, we suggest several prescriptions for improving consumer-perceived elasticity.

These insights and prescriptions contribute to the practical knowledge on elasticity; adaptive system designers will find this information valuable while designing elastic scaling techniques.

2.2.5 Elasticity quality assurance frameworks

The behavior of an elastic system depends on the complex interaction of various factors, such as the elasticity of the cloud offering, the elastic scaling mechanism, the input workloads and so on. Most often, elastic scaling mechanisms are designed as static threshold-based reactive rulesets with some predicted workloads in mind [109]; however, these common-sense based rulesets may turn out to be extremely ineffective when exposed to unprecedented workload conditions and critical usage scenarios [139, 114]. To address this problem, it is necessary to identify and fix adaptability issues in the elastic scaling techniques before putting them into production. The quality assurance stream partially serves this purpose by providing techniques for automated generation and execution of test suites for scaling rulesets [113, 111, 112]. These techniques are specifically concerned with detecting problematic workloads for a given ruleset by iteratively searching the state space of all possible workloads derived from the initial workload suite. To expedite the search and ensure better coverage and prediction accuracy for problematic workloads, these techniques adopt genetic-algorithm-based search optimization, surrogate models and iterative refinement procedures based on behavioral models.

Quality assurance techniques can be used to rule out erroneous elastic scaling techniques; this area therefore shares some commonalities with the assessment of elasticity. However, our benchmarking framework differs from quality assurance in a number of respects. Firstly, the primary objective of our elasticity benchmarking framework is to help consumers compare the worthiness of competing cloud platforms. For this reason, we consider the consumer’s context (e.g., business objectives, representative workloads, pricing model) while evaluating alternative cloud solutions. In contrast, the quality assurance framework is not concerned with the fair assessment of alternative platforms; its purpose is to detect potential bugs in the scaling ruleset for a particular cloud environment. That work takes a resource-oriented view while discovering the technical anomalies in the scaling ruleset, whereas our framework takes on the consumer’s viewpoint. Furthermore, our framework considers representative workloads (e.g., standard use cases, application-specific frequently observed scenarios) to keep the evaluation result relevant to the consumer. On the other hand, the problematic workloads detected by the quality assurance framework may sometimes include unprecedented usage scenarios (i.e., unusual or never-occurring workloads). Although our framework demonstrates the failure of a common-sense based reactive ruleset to recover back to the optimal state in a critical workload situation (see Chapter 3), this is not its sole purpose. Finally, the quality assurance framework deals with scaling rules only, whereas our framework evaluates a wide variety of cloud solutions, including cloud offerings, scaling techniques, deployment architectures and so on.

2.2.6 Elasticity modeling and simulation

A large and growing body of literature has focused on elasticity modeling and simulation, where established models, such as behavioral models, queuing theoretic models, formal verification techniques, Petri nets and stochastic processes (e.g., Markov decision processes), are used to facilitate offline estimation and verification of elasticity characteristics as well as enforcement of elasticity techniques. Some of these works exploited Service-Oriented Architecture (SOA) performance modeling techniques and discrete event simulation to predict the operational cost and elasticity break point of various cloud-hosted applications for a range of cloud scenarios (e.g., default, best, worst and perfectly elastic platforms) [65], whereas others adopted analytical modeling and simulation for offline estimation and comparison of elasticity techniques for multi-tier web applications, such as [220, 145]. These models are indeed useful to simulate the elasticity behavior offline and approximate the relative effectiveness of alternative elasticity techniques, thus saving a great deal of cost and effort; however, they do not substitute for the value of elasticity benchmarking in a real cloud environment. The reason is that in a real cloud deployment there may be numerous complexities and environmental uncertainties involved, not all of which can be appropriately incorporated into a model. Moreover, the accuracy of elasticity estimation using a model depends on various factors, such as the quality of the model and whether the model parameters are configured so that they closely approximate reality. Unfortunately, we have not found sufficient guidance on configuring the model parameters in those works; in most cases, the parameters were chosen in an ad-hoc manner during model validation. Another concern is that there is not much consensus in the model validation procedure; different groups of researchers evaluated the estimation accuracy of their models based on their own elasticity metrics and arbitrary workloads, thereby posing significant difficulty in assessing the relative accuracy of these models. Perhaps following a standard procedure for model validation would have revealed more about their comparative accuracies; our benchmarking framework may come in handy in this regard.

There are also formal models based on temporal logics and Petri nets that support verification of elasticity properties and scaling strategies during a workload execution, such as [57, 44]. This type of model yields a sequence of boolean values to indicate whether an adaptive scaling strategy satisfies a collection of elasticity properties or not during a workload execution. However, it cannot measure the extent to which that strategy exhibits elasticity as a whole in response to that workload (e.g., as one or more summary metrics).

A number of works also concentrated on enforcing optimal elastic scaling actions at runtime; examples include an excess cost driven resource forecasting model [91], a behavioral learning based elasticity estimation framework [80], analytical resource forecasting models [42, 123], and statistical learning based resource demand prediction models (e.g., Markov chain analysis) [122, 214]. These models are specifically concerned with local elasticity behavior and make myopic decisions to ensure optimal elasticity for unforeseen demands at different points of the workload execution. In contrast to these works, our benchmarking framework evaluates the worthiness of competing elasticity strategies in response to the overall execution of the workload; therefore, it can be used to compare the relative effectiveness of alternative elastic scaling techniques that incorporate such prediction models.

2.3 Summary

This chapter has presented the conceptual foundation and state of the art overview required to understand our elasticity benchmarking framework and its position with respect to similar works and related research areas.

Our literature review revealed considerable diversity in the definitions of elasticity. The definition has, in effect, gone through a series of evolutionary phases: a naive representation, a resource-centric view and a consumer-centric view. The naive definition of elasticity simply focused on the ability to dynamically adapt the resource capacity. The resource-centric view refined it further by putting emphasis on the rapid and precise adaptation of the resource capacity in response to the varying workload demand. However, it is not possible to understand the meaning of subjective words (such as rapid and precise) without any specific context. The consumer-centric view made this definition complete by explicitly tying those subjective terms to the consumer’s context, that is, QoS objectives and cost-effectiveness. Based on the consumer-centric definition, we identified the key characteristics of elasticity that influence the consumer’s purchase decision. The elasticity metric(s) is expected to be a good reflection of these characteristics.

Our work also shares some common ground with other research areas: elastic scaling techniques and quality assurance; the former is concerned with designing optimal scaling techniques to satisfy a set of goals, whereas the latter concentrates on preventing bugs in the scaling ruleset. These works can be viewed as different ways to achieve and/or improve elasticity. In contrast, our elasticity benchmarking framework helps consumers make an informed decision about alternative elastic solutions. It can be used to compare different elasticity strategies and quantify the worthiness of one strategy over another. When used with quality assurance frameworks, it can quantify the criticality of a bug in terms of monetary cost. From this viewpoint, our elasticity benchmarking framework complements the objectives of these research domains.

We reviewed a number of elasticity benchmarking frameworks in the literature. Again, we observed substantial diversity in the elasticity evaluation concept. Surprisingly, none of these works provided an elasticity metric definition that completely covers the consumer’s view. Most of these works restricted their elasticity metric definitions to the resource-centric view only. Another potential concern is that the majority of these studies used arbitrary workloads for elasticity evaluation. Hardly any of these frameworks came up with realistic workload representations. As a consequence, the elasticity metrics reported by these frameworks cannot appropriately reflect the consumer’s reality. Another serious weakness of these frameworks is that they did not offer any specific guidance to generate repeatable and valid comparison results in the presence of the environmental nondeterminism of the cloud. Scarcely any of these works acknowledged the influence of nondeterminism on their evaluation results. Therefore, the credibility of the benchmarking results produced by these frameworks remains questionable.

In a nutshell, none of the elasticity benchmarking frameworks available today can adequately address the consumer’s perspective. Although several elasticity frameworks claimed consumers as their main target audience, our critical evaluation suggests that all of them are over-ambitious in their claims. A flawed elasticity benchmarking framework may convey a skewed picture of the elastic ability of competing cloud platforms and, in the worst case, may lead to incorrect conclusions. This may drive the consumer to choose a non-optimal cloud platform, a consequence that may only be realized when the consumer’s application fails to adapt well in response to the production workload. This not only affects the consumer’s current revenue, but also puts her future revenue at risk, and most importantly, it develops skepticism in the consumer’s mind about the real benefit of elasticity. We cannot let this situation continue indefinitely. Therefore, we have decided to introduce a novel elasticity evaluation framework, ElasticMark, for cloud platforms that addresses the consumer’s perspective by resolving the aforementioned issues. Specifically, our research roadmap has been defined with the following tasks in mind:

• Develop a clear understanding of consumer-centric elasticity evaluation; specifically, map the key characteristics of elasticity to metrics which are of interest to the consumer.

• Design a core elasticity benchmarking framework that includes all basic components required to address the consumer’s perspective. At the very least, it should consist of an elasticity metric definition (an appropriate reflection of the consumer-centric elasticity definition), a standard workload suite covering important and frequently observed workloads and a systematic procedure for instantiating an executable benchmark.

• Extend the core framework with a novel workload model to help consumers generate and customize fine-scale bursty workloads from traces. This is required to promote customized elasticity benchmarking with realistic prototypes of actual workloads.

• Extend the core framework with a set of rigorous techniques to ensure repeatable and valid elasticity benchmarking results, even in the presence of cloud environment specific nondeterminisms.

The first two steps are addressed in Chapter 3 and the next two tasks are presented in Chapter 4 and Chapter 5 respectively.

Chapter 3

The core framework

“Measure what can be measured, and make measurable what cannot be measured.”

Galileo Galilei

This chapter delves into the technical details of the core elasticity benchmarking framework for cloud platforms from the consumer’s viewpoint. It presents a penalty-based approach for elasticity evaluation, guides the reader through the process of constructing an executable elasticity benchmark and demonstrates the applicability of this framework in addressing different elasticity concerns of the cloud consumer based on real-world case studies.

3.1 Introduction

Elasticity promises rapid adjustment of resource capacity (as well as operational expenses) to closely follow the workload demand so that there is no disruption and performance degradation in the delivered service. It has the potential to boost the consumer’s revenue by minimizing the perceived risks of over-provisioning (excess operational expenses for idle resources) and under-provisioning (lost opportunities for inadequate resources). Motivated by this fact, cloud consumers have recently shown increased interest in the elastic cloud for their fluctuating and occasionally bursting workloads. While all cloud offerings claim elasticity as a virtue that they possess, recent evidence suggests that none are perfect in delivering ideal elasticity to the fluctuating workloads of the cloud consumer. There may

be a minimum charge (e.g., 1 instance), there may be delays in adapting the resource capacity to a sudden increase in workload, and so on. Therefore, the consumer needs to know how elastic each cloud platform is. Just as with traditional IT infrastructure, the company seeking to use a cloud platform needs a basis for comparing different offerings and choosing the one that will be best for its needs. The usual approach is to follow a benchmark, which includes a standardized workload and defines exactly how to measure the behavior of any system when subjected to this workload. The benchmark gives one or a few summary numbers that represent the value of the system to the chooser; it is then reasonable to select the system with the best benchmark results.

Although a good number of frameworks were proposed before for elasticity evaluation, none of them adequately addressed the concerns of the cloud consumer. Most of these frameworks make unfounded assumptions about or completely disregard the consumer’s business objectives (e.g., application-specific Service Level Objective (SLO) and workload profiles) and cloud-specific pragmatic issues (e.g., charging quanta and resource granularity). None of these frameworks provides any explicit guidance to express elasticity as a single figure of merit for a given workload suite, thereby posing difficulty in drawing a simple conclusion about the worthiness of competing cloud platforms. Obviously, these aspects need to be considered in order to keep the benchmarking results useful and relevant to the cloud consumer. Failure to incorporate these aspects bears the risk of choosing a non-optimal elastic platform that may, in turn, affect the running cost and profitability of the consumer. To overcome this gap in the literature, we introduce a novel elasticity benchmarking framework that addresses the consumer’s concerns while deriving the elasticity metric. In other words, we limit ourselves to running a benchmark as a consumer: this includes considering her application and workload profile, incorporating the business objectives, and taking observations that are available to the consumer through the platform’s API or inside the user’s application code. This makes our task harder, since we do not have access to arbitrary measurements of the infrastructure itself, but this viewpoint is necessary for our work to give the consumer a reasonable basis for choosing between competing platforms. In contrast, the provider’s viewpoint of elasticity can be completely different because they have access to the measurements from the underlying physical infrastructure (hardware configuration and specification) and virtualization environment. This enables them to do performance modeling and look at interactions between components to deduce the performance outcome for a class of applications. From the consumer’s viewpoint, we try to understand the elasticity behavior of the platform from its response to a suite of workload patterns. In the end, we come up with a number that measures elasticity as a property of the cloud platform (though the actual figure-of-merit will vary depending on the application’s business model, workloads, etc.) so that the consumer can draw a simple conclusion about the relative ranking of alternative cloud platforms.

The key contributions of this chapter are summarized as follows:

• We introduce a novel framework for evaluating elasticity, that can be run by a consumer and which takes into account the consumer’s business objectives and usage scenarios. To achieve this, we use workloads that vary over time in different ways. Some workloads will rise and fall repeatedly, others will rise rapidly and then fall back slowly, etc. For each workload, we examine the way a platform responds to this, and we quantify the effect on the consumer’s finances. That is, we use a cost measure in cents per hour, with a component that captures how much is wasted by paying for resources that are not needed at the time for the workload (overprovisioning), and a component to see how much the consumer suffers (opportunity cost) when the system is underprovisioned, that is, the platform is not providing enough resource for a recent surge in workload. In the end, we provide means to summarize elasticity as a single figure of merit so that the consumer can draw a simple conclusion about one platform’s worthiness over another.

• We propose a set of guidelines for constructing an executable elasticity benchmark based on our framework. This includes making concrete choices for designing the workload suite and SLOs relevant to the consumer’s application-specific usage scenarios. We also illustrate with an example how to set the SLOs for the non-functional properties of the application, e.g., response time and dropped request constraints based on user behavior studies. In addition to this, we discuss the experimental testbed as well as the configuration and measurement procedure for elasticity evaluation of actual cloud platforms.

• Finally, we validate our benchmarking framework for different adaptive scaling rulesets. From our measurements, we have discovered several characteristics of a cloud platform that are important influences on the extent of elasticity. Some of these (such as the speed of responding to a request for increased provisioning) have already been discussed by practitioners, but others seem to have escaped attention.

For example, we find in Amazon EC2 that there is a large improvement of elasticity from relatively simple changes in the set of rules that the system uses to control provisioning and deprovisioning. There is a set of rules (based on recent utilization rates) that is widely followed, perhaps because it is done that way in tutorial examples. We find that this leads to rapid deprovisioning when load decreases, which leaves the system underprovisioned if a future upswing occurs. Because the financial impact from poor QoS (when demand cannot be handled) is generally much more severe than the cost of running some extra resources for a while, this is a poor strategy. What is worse, on typical platforms one pays for an instance in quanta that represent a significant period of time (say, 1 hour), so eagerness to deprovision can leave the consumer paying for a resource without the ability to use it.

The remainder of this chapter is structured as follows: Section 3.2 formulates the basic intuition for elasticity evaluation. Section 3.3 presents our framework for evaluating elasticity, which includes a measurement model based on penalties that are expressed in monetary units and specific choices for the workload curves. Section 3.4 exemplifies the instantiation of an elasticity benchmark; it provides guidelines for specifying the SLOs and the benchmarking workload suite, and the details of the experimental setup, including the tools and specific cloud technologies used in our case studies. Section 3.5 presents empirical case studies that show the elasticity of particular platforms, given by different choices of rulesets that control provisioning decisions in a widely-used public cloud. Section 3.6 provides a discussion of the lessons learnt based on the case studies. Section 3.7 reviews the potential risks and benefits associated with our elasticity framework and Section 3.8 concludes.

3.2 Consumer-centric elasticity measurement

This section illustrates our intuitive approach to consumer-perceived elasticity measurement. Our literature review in Chapter 2 suggests several key characteristics of elasticity: resource adaptation speed, granularity of usage accounting and cost-effective resource usage. Measuring these characteristics as in a micro-benchmark could have made our task easier; however, discrete measures, though very comprehensive, fail to reflect the complex interplay between elasticity and the consumer’s business situation. For this reason, we adopt a macro-benchmarking approach to elasticity evaluation by incorporating the consumer’s application of interest, a relevant workload suite and business objectives. We are specifically interested in determining how these discrete measures influence the consumer’s application-specific measures (such as end-to-end QoS, charged versus utilized resource levels) in terms of monetary cost. Similar to Weinman [239], we adopt a penalty-based approach in defining the elasticity metrics in order to assess the detriment to cloud consumers from imperfect elasticity. We do this by computing the total penalty for over-provisioning and under-provisioning; the former reflects excess payment for idle resources, whereas the latter represents the opportunity cost of inadequate resources.

Figure 3.1: Elasticity behavior of EC2 in response to a periodic sinusoidal workload

Looking at Fig. 3.1, we observe that the over-provisioning penalty is a direct reflection of coarsely granular resource supply and charging for unavailable resources (due to long charging quanta, resource release delay, and charging during the spin-up period). It is also related to the pricing model; an expensive pricing strategy may increase the payment for idle resources. The under-provisioning penalty, on the other hand, can be determined by estimating the opportunity cost of unacceptable performance when the underlying platform causes a delay in provisioning adequate resources. This is how we can map the discrete measures of elasticity to the consumer’s business context. Having elaborated the basic intuition for elasticity measurement from the consumer’s viewpoint, we will now move on to represent this empirical notion using a formal mathematical model in the forthcoming section.

3.3 Elements of the elasticity benchmarking framework

This section defines our proposal, to determine a figure that expresses “how elastic is a given cloud platform”. We explain a general framework to measure the cost of imperfect elasticity when running a given workload, with penalties for overprovisioning and underprovisioning; the sum of these is the penalty measurement for the workload. By considering a suite of workloads, and combining penalties calculated for each, we can define a figure-of-merit for a cloud platform.

3.3.1 Penalty model

We present our approach to measuring imperfections in elasticity for a given workload in monetary units. We assume that the system involves a variety of resource types. For example, the capacity of an EC2 instance can be measured by looking at its CPU, memory, disk capacity, etc. We assume that each resource type can be allocated in units. We assume that the user can learn what level of resourcing is allocated and the relevant QoS metrics for their requests (such as the distribution of response times). AWS CloudWatch [8] is an example of the monitoring functionality we expect.

Our elasticity model combines penalties for over-provisioning and for under-provisioning. The former captures the cost of provisioned yet unutilized resources, while the latter measures the opportunity cost from the performance degradation that arises with under-provisioning.

3.3.1.1 Penalty for over-provisioning

In existing cloud platforms, it is usual that resources are temporarily allocated to a consumer from a start time (when the consumer requests the resource based on observed or predicted needs, or when the system proactively allocates the resource) until a finish time.

This is represented by a function we call available supply and denote by $R_i(t)$ for each resource $i$. In current platforms, it can also happen that a resource may be charged to a consumer even without being available. For example, in Amazon EC2, an instance is charged from the time that provisioning is requested (even though there is a delay of several minutes before the instance is actually running for the consumer to utilize). Similarly, charging for an instance is done in one-hour blocks, so even after an instance is deprovisioned, the consumer may continue to be charged for it for a while. Thus we need another function $M_i(t)$ that represents the chargeable supply curve; this is what the consumer is actually paying for. These curves can be compared to the demand curve $D_i(t)$.

The basis of our penalty model is that the consumer’s detriment in overprovisioning (when $R(t) > D(t)$) is essentially given by the difference between chargeable supply and demand; as well, we charge a penalty even in underprovisioned periods whenever a resource is charged for but not available (and hence not used). These penalties are computed with a constant of proportionality $c_i$ that indicates what the consumer must pay for each resource unit. In real systems, resources of different types are often bundled, and only available in collections (e.g., an EC2 instance has CPU, bandwidth, storage etc.). We assume that some weighting is used to partition the actual monetary charge for the bundle between its contained resources.

Formally, we define the overprovision penalty $P_o(t_s,t_e)$ for a period starting at $t_s$ and ending at $t_e$. We assume a set of resources indexed by $i$, and we use functions $D_i(t)$, $R_i(t)$, and $M_i(t)$ for the demand, available supply, and charged supply, respectively, of resource $i$ at time $t$. Our definition aggregates the penalties from each resource, and for each resource we integrate over time.

Definition 3.1

$$P_o(t_s,t_e) = \sum_i P_{o,i}(t_s,t_e), \qquad P_{o,i}(t_s,t_e) = c_i \times \int_{t_s}^{t_e} d_i(t)\,dt$$

$$d_i(t) = \begin{cases} M_i(t) - D_i(t) & \text{if } R_i(t) > D_i(t),\\ M_i(t) - R_i(t) & \text{if } M_i(t) > R_i(t) \text{ and } D_i(t) \ge R_i(t),\\ 0 & \text{otherwise.} \end{cases}$$
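As a purely illustrative aid (not part of the framework’s specification), the following sketch integrates Definition 3.1 numerically from sampled demand, available-supply and chargeable-supply curves for a single resource; the sampling interval and unit price in the comment are assumptions chosen for the example.

```python
def overprovision_penalty(D, R, M, c, dt_hours):
    """Numerical integration of Definition 3.1 for one resource.

    D, R, M  : equal-length lists of per-interval samples of demand,
               available supply and chargeable supply (e.g., in % CPU).
    c        : price per resource unit per hour (e.g., cents per 1% CPU-hour).
    dt_hours : length of one sampling interval in hours.
    """
    penalty = 0.0
    for d, r, m in zip(D, R, M):
        if r > d:                   # over-provisioned: charged minus used
            excess = m - d
        elif m > r and d >= r:      # under-provisioned, yet charged for unavailable units
            excess = m - r
        else:
            excess = 0.0
        penalty += c * excess * dt_hours
    return penalty

# Hypothetical call: 1-minute samples, one instance = 100% CPU at 8.5 cents/hour.
# P_o = overprovision_penalty(D, R, M, c=8.5 / 100, dt_hours=1 / 60)
```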

3.3.1.2 Penalty for under-provisioning

Next, we turn to the penalty model for under-provisioning, when resources are insufficient and performance is poor. We measure the opportunity cost to the consumer, using SLOs that capture how the service matters to them.

We assume that the consumer has used their business environment to determine a set of performance or Quality of Service (QoS) objectives, and that each is the foundation for an SLO-style quantification of unsatisfactory behavior. For example, the platform’s failure to meet the objective of availability can be quantified by counting the percentage of requests that are rejected by the system. In many cases, such SLO quantifications might reflect a wide variety of causes, not only those that arise from underprovisioning, but also some from, e.g., a network outage. We assume that the customer also knows how to convert each measurement into an expected financial impact. For example, there might be a dollar value of lost income for each percent of rejected requests. In many cases, the financial impact may be proportional to the measurement, but sometimes there are step functions or other nonlinear effects (for example, word-of-mouth may give quadratic growth of the damage from inaccurate responses). To provide a proper baseline for the penalties, we also consider the ideal value that occurs when resources are unlimited (in practice, we measure with such a large amount of overprovisioning that any additional allocation would not change the SLO measurement).

Formally, we let $Q$ be a non-empty set of QoS measures, and for each $q \in Q$ we consider a function $p_q(t)$ that reflects the amount of unsatisfactory behavior observed on the platform at time $t$. The consumer also provides, for each QoS aspect $q$, a function $f_q$ that takes the observed measurement of unsatisfactory behavior and maps it to the financial impact on the consumer. Let $p_q^{\mathrm{opt}}(t)$ denote the limit (as $K \to \infty$) of the amount of unsatisfactory behavior observed in a system that is statically allocated $K$ resources.

Thus we define the underprovision penalty $P_u(t_s,t_e)$ for a period starting at $t_s$ and ending at $t_e$.

Definition 3.2

$$P_u(t_s,t_e) = \sum_{q \in Q} P_{u,q}(t_s,t_e), \qquad P_{u,q}(t_s,t_e) = \int_{t_s}^{t_e} \bigl(f_q(p_q(t)) - f_q(p_q^{\mathrm{opt}}(t))\bigr)\,dt$$
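A companion sketch for Definition 3.2, again only illustrative: it assumes the QoS measures have already been sampled per interval, both on the platform under test and on a heavily over-provisioned baseline run.

```python
def underprovision_penalty(observed, baseline, impact_fns, dt_hours):
    """Numerical integration of Definition 3.2.

    observed   : dict mapping each QoS measure q to a list of samples p_q(t).
    baseline   : dict mapping q to samples from a heavily over-provisioned run,
                 the practical stand-in for the K -> infinity limit p_q_opt(t).
    impact_fns : dict mapping q to f_q, a function from the measurement to a
                 financial impact rate (e.g., cents per hour).
    dt_hours   : length of one sampling interval in hours.
    """
    penalty = 0.0
    for q, f_q in impact_fns.items():
        for p, p_opt in zip(observed[q], baseline[q]):
            penalty += (f_q(p) - f_q(p_opt)) * dt_hours
    return penalty
```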

3.3.1.3 Total penalty rate for an execution

We calculate the overall penalty score $P(t_s,t_e)$ accrued during an execution from $t_s$ till $t_e$ by taking the sum of the penalties from both over- and under-provisioning; note that both are expressed in units of cents. We then calculate the total penalty rate $P$ in cents per hour. A lower score for $P$ indicates a better elastic response to the given workload.

Definition 3.3 The penalty score over a time interval $[t_s,t_e]$ is defined as follows:

$$P(t_s,t_e) = P_o(t_s,t_e) + P_u(t_s,t_e), \qquad P = \frac{P(t_s,t_e)}{t_e - t_s}$$
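Putting the pieces together, a small illustrative helper (nothing here beyond Definition 3.3; the numbers in the comment are made up) yields both the penalty score and the penalty rate:

```python
def penalty_score_and_rate(P_o, P_u, t_s_hours, t_e_hours):
    """Definition 3.3: total penalty in cents and penalty rate in cents per hour."""
    score = P_o + P_u
    return score, score / (t_e_hours - t_s_hours)

# Example: 4.25 cents of over-provisioning plus 30 cents of under-provisioning
# over a 110-minute run gives a rate of about 18.7 cents/hour.
# penalty_score_and_rate(4.25, 30.0, 0.0, 110 / 60)
```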

3.3.2 Single figure of merit for elasticity

The definitions above measure the elasticity of the system’s response to a single demand workload. Different features of the workload may make the elastic response easier or harder to achieve; for example, if the workload grows steadily and slowly, a system may adjust the allocation to match the demand, but a workload with unexpected bursts of activity may lead to more extensive under-provisioning. Thus, we consider a suite of different workloads, and determine the penalty rate for each of these; the workload suite will be discussed in detail in Section 3.3.3.

In order to draw a simple conclusion about the worthiness of one platform’s elasticity over another, we wish to summarize the penalty rates for the entire workload collection into a single score, as usual in benchmarks. To combine the measured penalty rates from several workloads into a single summary number, we follow the approach used by the SPEC family of benchmarks. That is, we choose a reference platform, and measure each workload on that platform as well as on the platform of interest. We take the ratio of the penalty rate on the platform we are measuring to the rate of the same workload on the reference platform, and then we combine the ratios for different workloads by the geometric mean. That is, if $P_{x,w}$ is the penalty rate for workload $w$ on platform $x$, and we have $n$ workloads in our suite, then we measure the elasticity of platform $x$ relative to a reference platform $x_0$ by

$$E = \sqrt[n]{\prod_{i=1}^{n} \frac{P_{x,w_i}}{P_{x_0,w_i}}}$$

The lower the elasticity metric $E$, the better the elastic response of the cloud platform $x$ for the given workload suite. Therefore, by sorting the elasticity metrics from the lowest to the highest, consumers can readily draw a conclusion about the relative worthiness of a number of competing cloud platforms for a given workload suite.
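A minimal sketch of this aggregation step, assuming the per-workload penalty rates have already been measured on both platforms:

```python
from math import prod

def elasticity_score(penalty_rates, reference_rates):
    """Geometric mean of penalty-rate ratios against a reference platform.

    penalty_rates   : P_{x, w_i} for each workload w_i on the platform of interest.
    reference_rates : P_{x0, w_i} for the same workloads on the reference platform.
    A lower score indicates a better elastic response for the suite.
    """
    ratios = [p / p_ref for p, p_ref in zip(penalty_rates, reference_rates)]
    return prod(ratios) ** (1.0 / len(ratios))

# Example: halving the penalty rate on every workload yields a score of 0.5.
assert abs(elasticity_score([1.0, 2.0], [2.0, 4.0]) - 0.5) < 1e-9
```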

3.3.3 Workload suite specification

Workload characteristics significantly influence the elasticity behavior of an application; for instance, a steady and slowly changing workload demand may put less stress on the adaptivity of the elastic system as opposed to a highly fluctuating one. Several factors need to be considered while designing the workload suite for elasticity evaluation, namely representativeness, adaptability and scalability. This means the workload should stress the platform’s adaptability behavior in a realistic way. It should also be scalable so that the benchmarker can evaluate the cloud platform’s adaptability behavior for a range of different amplitudes.

We consider various workload characteristics (e.g., periodicity, growth and decay rate, randomness) to understand how the platform’s elastic response varies across the workload space. Across time, some workloads show recurring cycles of growth and decrease, such as an hourly news cycle. Others have a single burst, such as when the news breaks or during a marketing campaign1. We explore some trends as the length of cycles changes; however, work is still needed to consider the behavior with longer cycles such as daily ones, or longer-lasting one-off events. Further research is also needed on whether conclusions from small loads will be valid for much larger ones, as expected by large customers. Note that the workloads described below are expressed in terms of the rate at which requests are generated.

1. See ecn.channel9.msdn.com/o9/pdc09/ppt/SVC54.pptx and http://www.mediabistro.com/alltwitter/osama-bin-laden-twitter-record_b8019

• Sinusoidal workload: This workload type represents periodicity, as experienced by a news server or online restaurants. These loads can be expressed as $x(t) = A(\sin(2\pi t/T + \phi) + 1) + B$, where $A$ is the amplitude, $B$ is the base level, $T$ is the period and $\phi$ is the phase shift. (A minimal request-rate generator for the patterns in this list is sketched after it.)

• Sinusoidal workload with plateau: This workload type modifies the sinusoidal waveform by introducing a level (unchanging) demand for a certain time at each peak and trough. Thus the graph has upswings and downswings, with flat plateau sections spacing them out.

• Exponentially bursting workload: This workload type exhibits extremely rapid buildup in demand (rising U-fold each hour), followed by a decay (declining D-fold each hour). It can be expressed as $x(t) = a \cdot b^{t}$, where $a$ is the initial request rate and $b$ is the growth factor ($b > 1$) or decay factor ($0 < b < 1$).

• Linearly growing workload: This workload represents a website whose popularity rises consistently. It can be stated as $x(t) = mt + c$, where $m$ is the slope of the straight line and $c$ is the y-axis intercept.

• Random workload: The generation of requests is ongoing and independent. We have one example of this type, with requests produced by a Poisson process.
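The following sketch generates request rates for these patterns; it is an illustration only, and the parameter values in the comments are assumptions drawn from the benchmark suite described later in this chapter.

```python
import math
import random

def sinusoidal(t_min, A, B, T_min, phi=0.0):
    """Request rate at minute t for x(t) = A*(sin(2*pi*t/T + phi) + 1) + B."""
    return A * (math.sin(2 * math.pi * t_min / T_min + phi) + 1) + B

def exponential_burst(t_hours, a, b):
    """x(t) = a * b**t: b > 1 during growth, 0 < b < 1 during decay."""
    return a * (b ** t_hours)

def linear(t_min, m, c):
    """x(t) = m*t + c."""
    return m * t_min + c

def poisson_interarrivals(rate_per_sec, n, seed=0):
    """Random workload: n exponentially distributed inter-arrival times (seconds)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate_per_sec) for _ in range(n)]

# Illustrative parameters: the 30-minute sine workload of the suite peaks at
# 450 req/s and troughs at 50 req/s, i.e. A = 200, B = 50, T = 30.
# sinusoidal(t_min=7.5, A=200, B=50, T_min=30)   # ~450 req/s at the crest
```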

So far, we have delineated the abstract concept of elasticity benchmarking. In the section that follows, we illustrate the considerations involved in instantiating an executable elasticity benchmark based on our proposed framework.

3.4 Executable benchmark instantiation

The approach to elasticity evaluation described above is flexible, and could be adapted to the needs of each consumer through the choices available. One can set particular SLOs that reflect the consumer’s business situation, workloads that are representative of her application’s patterns of load variation, etc. To determine an elasticity score, one needs to make concrete choices for all these parameters. In this section, we illustrate the basic steps for constructing an executable benchmark. We first discuss the choices of an executable benchmark, that is, making concrete decisions on the Service-Level Objective (SLO) aspects that are evaluated, the charging rates for cloud resources, and the particular suite of workloads; we then describe the architectural components of the experimental testbed and finally outline how to configure the testbed and take measurements to fit the consumer-specific scenarios.

3.4.1 Choices for an elasticity benchmark

The first step here is to determine the target application type to be benchmarked and set the performance objectives accordingly. We particularly focus on the On-Line Transaction Processing (OLTP) aspect of the e-commerce application type while designing this benchmark. There are several reasons for this selection; first, the e-commerce application is one of the most popular means by which most businesses interact with their customers, and OLTP is considered to be its mainstream. Second, businesses earn their revenues by serving these OLTP workloads, which usually require stringent QoS guarantees; even a few seconds of delay in processing the customer’s request might end up as a lost opportunity. However, the current state of cloud elasticity is not instantaneous and fine-grained enough to adapt to this requirement. As a result, the revenues of these applications are likely to be affected by the imperfect elasticity of the cloud platforms. And finally, these applications most often face highly fluctuating workload demands and require fine-grained resource supply that can catch up with the workload demand in real time, that is, supplied resources need to closely follow the workload demand in no time. However, existing cloud providers today cannot provide elasticity with this level of speed and granularity; this incurs increased IT cost for e-commerce applications. This is why we choose the OLTP application type for our cloud elasticity benchmark. We pick the TPC-W implementation [117] as the benchmark application for its wide acceptance in the enterprise to represent OLTP applications. TPC-W emulates user interactions of a complex e-commerce application (such as an online retail store).

To calculate the overprovisioning penalty, we deal with a single resource (CPU capacity, relative to a standard small EC2 instance) and measure the financial charge as $0.085 per hour per instance. This reflects the current charging policy of AWS2. To calculate the underprovisioning penalty, we set the QoS constraints based on existing user behavior studies in the usability engineering literature [191]. In particular, each user in our workload pattern lands on the homepage first and then searches for newly released books. We expect at least 95% of these requests generated by the users to see a response within 2 seconds. Accordingly, we have used the following two QoS aspects with associated penalties over an hourly evaluation period. The cost penalty for response time violation is a simplified version of the cost function mentioned in [154]. As e-commerce websites lose more revenue when response time is slow than for application down-time3, we associate a lower cost penalty with the unserved requests.

• (Response time) In each hour of measuring, there is no penalty as long as 95% of requests have a response time up to 2 seconds; otherwise, a cost penalty of 12.5¢ will apply for each 1% of additional requests (beyond the allowed 5%) that exceed the 2-second response time constraint.

• (Availability) A cost penalty of 10¢ per hour will apply for each 1% of requests that fail completely (that are rejected or timed out). (A sketch encoding both SLOs as penalty functions follows this list.)
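One possible way to encode these two SLOs as the impact functions $f_q$ of Definition 3.2 is sketched below; the function names are ours, and the worked example in the comment uses made-up percentages.

```python
def response_time_penalty(pct_slower_than_2s):
    """12.5 cents per 1% of requests, beyond the allowed 5%, that exceed 2 seconds."""
    return 12.5 * max(0.0, pct_slower_than_2s - 5.0)

def availability_penalty(pct_failed):
    """10 cents per 1% of requests that are rejected or timed out."""
    return 10.0 * pct_failed

# Hypothetical hour: 9% of requests slower than 2 s and 2% failed
# => 12.5 * (9 - 5) + 10 * 2 = 70 cents of under-provisioning penalty.
```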

Note that the penalty for unmet demand is very high compared to the cost of provisioning; this fits well with the real business scenarios of cloud consumers. As Weinman [239] pointed out, the cost of resources should be much less than the expected gain from using them (and the latter is what determines the opportunity cost of unmet demand).

Note that the appropriate SLOs and their penalties may vary greatly based on the business situation of the consumer of cloud services. In this dissertation, we use a penalty corresponding to a rather small business (the penalty is only $10 in case the service is completely unavailable for an hour, when all requests are rejected). For a large e-commerce business application (e.g., eBay), the appropriate penalty for down-time may be much higher4, say, $2000/second, and similarly, the appropriate workloads would be much greater. The SLOs and the opportunity costs specified here should be considered as an illustration, to be adjusted based on the application’s business context.

We have instantiated a workload suite to explore the platform’s elasticity behavior for a range of patterns of demand change, following the workload specification in Section 3.3.3.

2. As of 2011, when this work was published.
3. See http://blog.alertsite.com/2011/02/online-performance-is-business-performance/
4. See http://www.raritan.com/resources/case-studies/ebay.pdf

In our measurements, we use a set of 10 different workloads, which grow and shrink in a variety of shapes, though (to make benchmarking manageable) all are fairly small, peaking with fewer than 10 instances, and lasting 2-5 hours.

• Sinusoidal workload: For the benchmark suite, we use three different examples, whose periods are 30 minutes, 60 minutes, and 90 minutes, respectively. All have a peak demand of 450 requests/second, and trough at 50 requests/second. A load of 150 requests/second is about what one small VM instance can support.

• Sinusoidal workload with plateau: In the suite, we have three workloads like this, each starting from the sinewave with a period of 30 minutes; in one case the plateau at each peak lasts for 10 minutes, in another it lasts for 40 minutes, and in the last of this type, the peak plateau lasts for 70 minutes. In all cases, the plateaus at troughs last for 45 minutes (and there is always a 10-minute plateau at the start of the experiment and also at the end).

• Exponentially bursting workload: We provide two workloads of this type, one with U = 18 and D = 2.25; the other has U = 24 and D = 3 (so this rises and falls more quickly).

• Linearly growing workload: We have one example of this type, with a workload that starts at 50 requests/second (and stays there for 10 minutes to warm the system up), then the load rises steadily for 3 hours, each hour increasing the rate by an extra 240 requests/second. Thus we end up with 770 requests/second.

• Random workload: We have one realization of this type, with requests produced by a Poisson process.

Appendix B shows all workloads of the standard suite. We note that the demand curves described above are expressed in terms of the rate at which requests are generated; in practice, performance variation in identical instances means that this does not cause the utilization of CPU resources to track the desired demand pattern exactly.

3.4.2 Experimental setup

In the high-level view, the architecture of our experimental setup can be seen as a client-server model. The client side is a workload generator implemented using JMeter [194], which is a Java workload generator used for load testing and measuring performance. The sole purpose of JMeter in this experiment is to generate workloads based on our predefined workload patterns.

We choose TPC-W as the benchmark application; it has easy-to-obtain code examples and it is most often used in the literature. It can be substituted with other applications if desired. TPC-W emulates user interactions of a complex e-commerce application (such as an online retail store). In our experiment, we adopt the online bookshop implementation of TPC-W application and deploy it into EC2 small instances. Instead of having the TPC-W workload generator at the client side, we use JMeter to specify our predefined workload patterns.

The server side is considered to be the System Under Test (SUT), which consists of a single load-balancer facing the client side, and a number of EC2 instances behind the load-balancer. We hosted the web server, application server and database on the same EC2 m1.small instance at the US-East Virginia region (the cost of each instance is 8.5¢ per hour, matching the penalty we apply for overprovisioning); as some of the database queries consumed more CPU, we had to restrict the processing rate for the TPC-W server to 150 requests/second to achieve satisfactory performance. The number of instances is not fixed, but rather it is controlled by an autoscaling engine which dynamically increases and decreases the number of instances based on the amount of workload demand. The behavior of an autoscaling engine follows a set of rules that must be defined. Each ruleset produces a unique “elastic platform” for experimental evaluation.

An autoscaling rule has the form of a pair consisting of an antecedent and a consequence. The antecedent is the condition to trigger the rule (e.g., CPU utilization is greater than 80%) and the consequence is the action to trigger when the antecedent is satisfied (e.g., create one extra instance). In our experiments, we consider three platforms because we run with three different rulesets. The detailed configurations of the autoscaling engine (configured via the Autoscaling library [7]) are shown in Table 3.1.

We have explored rulesets that scale out and in by changing the number of instances, all of the same power. Some cloud platforms, including EC2, also allow one to provision instances of different capacity, vary bandwidth, etc.; how such rules alter the elasticity measures is an issue for further research, although our definitions will still apply.

Table 3.1: Autoscaling engine configuration

| Ruleset | Monitoring interval | Upper breach duration | Lower breach duration | Upper threshold | Lower threshold | VM increment | VM decrement | Scale-out cool-down period | Scale-in cool-down period |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 min | 2 mins | 2 mins | 70% CPU average | 30% CPU average | 1 | 1 | 2 mins | 2 mins |
| 2 | 1 min | 2 mins | 15 mins | 70% CPU average | 20% CPU average | 2 | 2 | 2 mins | 10 mins |
| 3 | 1 min | 4 mins | 10 mins | 1.5 sec max latency | 20% CPU average | 1 | 1 | 2 mins | 5 mins |
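For readers who prefer to see the configuration as data, the rulesets of Table 3.1 can be captured in a plain structure like the hypothetical one below; the actual experiments configured these values through the AWS Autoscaling library, not through this code.

```python
from dataclasses import dataclass

@dataclass
class ScalingRuleset:
    """Hypothetical container mirroring the columns of Table 3.1."""
    monitoring_interval_min: int
    upper_breach_duration_min: int
    lower_breach_duration_min: int
    upper_threshold: str        # e.g., "70% CPU average" or "1.5 sec max latency"
    lower_threshold: str
    vm_increment: int
    vm_decrement: int
    scale_out_cooldown_min: int
    scale_in_cooldown_min: int

RULESET_1 = ScalingRuleset(1, 2, 2, "70% CPU average", "30% CPU average", 1, 1, 2, 2)
RULESET_2 = ScalingRuleset(1, 2, 15, "70% CPU average", "20% CPU average", 2, 2, 2, 10)
RULESET_3 = ScalingRuleset(1, 4, 10, "1.5 sec max latency", "20% CPU average", 1, 1, 2, 5)
```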

We measure available supply $R(t)$ by using the reports from CloudWatch showing the number of instances that are allocated to our experiment; we treat $k$ instances as $R(t) = 100 \times k\%$ of supply, so this function moves in discrete jumps. Chargeable supply $M(t)$ is determined from the launch time and termination time of the allocated EC2 instances, given by the AWS EC2 API tools. For demand, our generator is defined to produce a given number of requests, rather than the measure of CPU capacity that is needed for our measurements. Thus we use an approximation: we graph $D(t)$ from what CloudWatch reports as the sum of the utilization rates for all the allocated instances. As will be seen in the graphs in Section 3.5, this is quite distorted from the intended shape of the demand function. One distortion is that measured $D(t)$ is capped at the available supply, so under-provisioning does not show up as $D(t) > R(t)$. This inaccuracy is not serious for our measurement of elasticity, since the use of $D(t)$ in measurement is only for cases of over-provisioning; during under-provisioning, the penalty is based on QoS measures of unacceptable response times and lost requests, and these do reveal the growth of true demand. Another inaccuracy is from the system architecture, where requests that arrive in a peak period may be delayed long enough that they lead to work being done at a later period (and thus measured $D(t)$ may be shifted rightwards from the true peak). As well, there is considerable variation in the performance of the supplied instances [88], so a given rate of request generation with 450 requests/second can vary from 350% to 450% when we see the measured demand. Future work will find ways to more accurately measure demand in units of CPU capacity.
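The sketch below shows one way these three curves might be assembled from monitoring data; it is an assumption-laden illustration (minute-resolution samples, whole-hour charging from launch) rather than the exact scripts used in our experiments.

```python
def available_supply(instance_counts):
    """R(t): k allocated instances are treated as 100*k % of CPU supply."""
    return [100 * k for k in instance_counts]

def chargeable_supply(charge_windows, run_minutes):
    """M(t): each instance is charged in whole-hour blocks from its launch minute,
    so it contributes 100% for every minute inside a paid block."""
    M = [0] * run_minutes
    for launch, terminate in charge_windows:            # minutes since start of run
        paid_hours = -(-(terminate - launch) // 60)     # ceiling to whole hours
        for t in range(launch, min(run_minutes, launch + 60 * paid_hours)):
            M[t] += 100
    return M

def measured_demand(per_instance_utilization):
    """D(t): approximated as the summed utilization of all allocated instances,
    hence capped at the available supply."""
    return [sum(sample) for sample in per_instance_utilization]
```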

3.4.3 Configuration and measurement procedure

Here is the procedure for setting up the elasticity measurement environment. First, a VM image is prepared by installing necessary components for the target application (e.g., TPC-W). Then the load-balancer is launched and the autoscaling configuration for the dynamically scalable server farm is set up. A monitoring agent (e.g., CloudWatch) is also configured to measure utilization and performance data for each workload run.

Next, the scripts for all workload demands are distributed to the client-side load generator (e.g., JMeter). Each workload demand is applied to a fresh setup of the server farm. At the end of each workload run, utilization and performance data are collected from the monitoring agent and the load generator respectively. The penalty rate for each workload demand is computed with the help of the penalty model described in Section 3.3.1. The same procedure is repeated to measure the penalty rate for the other workload demands. Finally, a single elasticity score is derived by taking the geometric mean of the penalty ratios of all workload demands in the collection with respect to the reference platform.
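The overall procedure can be summarized by the following driver skeleton; `run_workload` stands in for the setup, load generation and data collection steps above and is assumed, not provided, here.

```python
from math import prod

def run_benchmark(workloads, run_workload, reference_rates):
    """Skeleton of the measurement procedure: apply each workload to a fresh
    server-farm setup, obtain its penalty rate (cents/hour, Definition 3.3),
    and summarize the suite relative to a reference platform."""
    rates = [run_workload(w) for w in workloads]
    ratios = [r / r_ref for r, r_ref in zip(rates, reference_rates)]
    return prod(ratios) ** (1.0 / len(ratios))   # single elasticity score E
```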

3.5 Case studies

We describe in some detail the observations made when we run our workloads against Amazon EC2. These case studies serve two purposes. (1) As a means of sanity checking the elasticity model in Section 3.3.1: we hypothesize that the numerical scores, based on our elasticity model, do in fact align with what is observed in over- or under-provisioning. For example, in reducing the steepness of a workload increase we expect to observe that supply tracks demand more closely, and the penalty calculated is lower. (2) We demonstrate the usefulness of our elasticity benchmark in exposing situations where elasticity fails to occur as expected, and where other interesting phenomena can be observed.

3.5.1 Exploring workload patterns

To begin, we applied each of the 10 workload patterns from our elasticity benchmark to EC2 with the fixed scaling ruleset 1 as defined in Table 3.1; the detailed elasticity behavior of all workloads can be found in Appendix B. Note that this ruleset is common in tutorial examples, and it seems widespread in practice5. With this ruleset, the number of instances increases by one when average CPU utilization exceeds 70%, and one instance is deprovisioned when average CPU utilization drops below 30%.

We illustrate first how to derive the penalty rate from the raw measurement data. For each workload demand, we compute the over-provisioning amount by taking the difference between the charged resource supply and the used-up resource demand over the entire workload duration (e.g., a 110-minute interval for the sine workload with a 30-minute period). For these case studies, we consider only the CPU resource and assume that its pricing is equal to that of an EC2 instance (i.e., 8.5¢/hour). Thus we calculate the unit price for the CPU resource, assuming that each hour consists of 60 time units (i.e., 60 minutes) and the supplied CPU at each time unit is $k \times 100\%$, where $k$ is the number of charged VM instances. We work out the over-provisioning penalty by multiplying the unit CPU price by the over-provisioned quantity. For the under-provisioning penalty, we measure the percentage of response time violations and dropped requests and evaluate the opportunity cost of the degraded performance based on the SLO definitions described in Section 3.4.1. Finally, we aggregate the penalty values for over- and under-provisioning and normalize them to compute the penalty rate per hour, thus yielding the penalty rate for a particular workload demand.
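As a purely hypothetical worked example of this arithmetic (the idle period is invented for illustration): with an instance priced at 8.5¢ per hour, the unit CPU price is $8.5/60 \approx 0.14$¢ per 100%-CPU-minute, so one charged-but-unused instance accrues about 0.14¢ of over-provisioning penalty per minute. If such an instance sat idle for 30 minutes of a 110-minute run, it would contribute $30 \times 8.5/60 = 4.25$¢ to the penalty score, i.e. roughly $4.25 \times 60/110 \approx 2.3$¢ per hour to the penalty rate.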

3.5.1.1 Effect of over- and under-provisioning

Figure 3.2 shows the elasticity behavior of the EC2 platform in response to an input sinusoidal workload with a period of 30 minutes. The CPU graph shows the available supply, chargeable supply and demand curves over a 110-minute interval. Initially, there was only one instance available to serve the incoming requests. As workload demand increases (after 15 minutes), the rule triggers provisioning of a new instance, but we observed a delay of about 6 minutes until it is available (however, it is charged as soon as the launch begins). As workload generation is rising fast during this delay, the system experiences severe effects of under-provisioning: latency spikes and penalties accrue at about 20¢/min. In our implementation, demand is measured on the instances and so the curve shown is capped at the available supply, rather than showing the full upswing of the sinewave. The lag between charging for the instance and its being available is reflected in the penalty

for over-provisioning, which is about 0.14¢/min during this period.

5. See http://mtehrani30.blogspot.com/2011/05/amazon-auto-scaling.html

[Figure 3.2 comprises five panels plotted against time in minutes: CPU demand and supply (demand, available supply, chargeable supply, in % CPU), maximum latency (seconds), request counts, penalty for under-provisioning (cents/min) and penalty for over-provisioning (cents/min).]

Figure 3.2: Elasticity behavior of EC2 with ruleset 1 in response to a sinusoidal workload with 30 minutes period

Between timepoints 50 and 55, we see high penalties for both over-provisioning and under-provisioning. This may sound counter-intuitive, as one might wonder how “excess resource” and “insufficient resource” co-exist at the same time. Looking at the CPU graph, we find a significant difference between the charged and available supply during that interval; the consumer continues paying for 3 extra unusable instances (2 deprovisioned instances from the previous cycle and 1 yet-to-be-provisioned instance in the current cycle) which contribute to the over-provisioning penalty. Also, the available instance supply (i.e., 2 instances) is not enough to meet the increasing demand for that duration, thus resulting in a high under-provisioning penalty. This phenomenon draws a clear distinction between resource elasticity and consumer-perceived elasticity of the cloud platform. Just looking at the available supply, it is not possible to imagine how the consumer gets charged in a real cloud platform; therefore, an elasticity metric derived from the resource elasticity concept yields a skewed view, which may eventually blur the consumer’s subjective judgement about the elastic ability of competing cloud platforms.

3.5.1.2 Deprovisioning of resources

Our work’s inclusion of cases where workload declines is different from most previous proposals for cloud benchmarks. These situations show interesting phenomena. We saw situations where a lag in releasing resources was actually helpful for the consumer. In Figure 3.2 we see, on the downswing of the demand curve, that most of the resources claimed on the upswing were kept; this means that the next upswing could utilize these instances, and so the latency problems (and underprovisioning penalty) were much less severe than in the first cycle.

We also observe in downswings that the difference between chargeable supply and available supply is significant, with a deprovisioned instance continuing to attract charges till the end of the hour-long quantum. We see in the CPU graph of Figure 3.2 that the chargeable supply is simply not following the demand curve at all, and indeed there are extensive periods when the consumer is paying for 4 instances, even though they never have more than 2 available for use.

Considering the evolution of the supply led us to discover an unexpected inelasticity phenomenon, where the cloud-hosted application is never able to cut back to its initial state after a temporary workload burst. The average utilization may not drop below the 30% level that triggers deprovisioning, even though several instances are not needed. To demonstrate this fact, we ran a sinusoidal workload pattern with a peak at 670 requests/second and a trough at 270 requests/second, and a 40-minute plateau at each peak and trough; the resultant graphs are shown in Figure 3.3. The peak workload triggered the creation of 6 instances. The long-lasting trough workload (about 136% CPU) could easily be served by 2 instances; however, the number of VM instances remained at 4.

[Figure 3.3 comprises three panels plotted against time in minutes: CPU demand and supply (demand, available supply, chargeable supply, in % CPU), maximum latency (seconds) and request counts.]

Figure 3.3: Results of the trapping scenario

3.5.1.3 Trends in elasticity scores

Looking at the penalty scores in Table 3.2, we can see how the calculated penalty varies with the type of workload. In all workloads (except the linear one), the overall penalty is dominated by the loss in revenue due to under-provisioning. This is appropriate for business customers, as the opportunity cost from unmet requests or unsatisfactory responses that may annoy users is much higher than the cost of resources.

For pure sinusoidal workload patterns, the overall penalty declines with the increase in wave period. This demonstrates that, with this ruleset, the EC2 platform is better at adapting to changes that are less steep. Here underprovisioning penalties will be less severe, as the demand will not have increased too much during the delay between triggering a new instance and its becoming available to serve the load.

The sinusoidal workload with plateaus has a higher overall penalty compared to the basic sinusoidal workload where the cycles are sharper. We attribute this to the insertion of a 45-minute plateau at the trough, which wipes out the resource-reuse phenomenon in subsequent cycles. With a trough plateau, the system has time to deprovision and return to its initial state before the next cycle; therefore, each cycle cannot take advantage of the resources created in the previous cycle and pays a similar underprovisioning penalty.

As the length of the plateau at the peak increases (from 10 minutes to 70 minutes), overall penalty gradually moves down. The system has time to adapt to the peak demand, and serve it effectively for longer; as a result, the total fraction of requests with unacceptable performance drops to a great extent.

For the exponential burst workloads, we observe large penalty values in Table 3.2. In general, the under-provisioning penalty tends to rise as the growth rate increases; that means the underlying cloud platform is not elastic enough to grow rapidly with these fast-paced workloads, thus resulting in sluggish performance as they head towards the peak. Figure 3.4 explains the performance implications of an exponential workload with growth 24-fold per hour and decay 3-fold per hour. The high under-provisioning cost in the penalty graph also confirms the EC2 platform’s inelasticity in coping with this fast-paced workload pattern. The under-provisioning penalties incurred were much higher than in the sinusoidal workloads, indicating that the EC2 platform is not so adaptive to traffic surges with a high acceleration rate. Again, looking at the over-provisioning penalty graph, we observe a large over-provisioning cost (around 0.43¢/min) right after the peak is over; as some of the VM instances launched during the peak load were available at the off-peak period, they just accrued more penalty due to over-provisioning with no significant reduction in under-provisioning penalty.

Unlike the above workloads, the linear workload yields less overall penalty. This suggests that the EC2 platform can easily cope with workloads with lower and consistent growth. This is not surprising, as the slowly growing workload pattern is not affected by the provisioning delay of the underlying platform and therefore incurs less under-provisioning penalty. However, we expect that as the slope becomes steeper, the overall penalty will show a rising trend, as the resources are not provisioned rapidly enough to cope with the rising demand.

3.5.2 Exploring the impact of scaling rules

In our experiments with the widely-used ruleset 1, the under-provisioning penalty dominates the overall score. Sometimes the system took too long in adjusting to rapid growth in demand. When demand drops, there is a tradeoff: slow response increases the duration of overprovisioning charges, but it can help if an upswing follows that might reuse the retained resources.

Table 3.2: Penalty for Benchmarking Workloads - Ruleset 1

Workload          Po(ts,te)/hr   Pu(ts,te)/hr   P(ts,te)/hr
sine_30           27.51¢         374.88¢        402.39¢
sine_60           28.84¢         133.65¢        162.49¢
sine_90           22.17¢          52.82¢         74.99¢
sine_plateau_10   22.08¢         554.44¢        576.52¢
sine_plateau_40   18.52¢         292.96¢        311.48¢
sine_plateau_70   23.81¢         174.19¢        198.0¢
exp_18_2.25       24.83¢         528.61¢        553.44¢
exp_24_3.0        17.65¢        1093.05¢       1110.7¢
linear_240        35.01¢           0.0¢          35.01¢
random            29.31¢         129.14¢        158.45¢

Table 3.3: Penalty for Benchmarking Workloads - Ruleset 2

Workload          Po(ts,te)/hr   Pu(ts,te)/hr   P(ts,te)/hr   Ratio to Rule 1
sine_30           40.33¢         127.50¢        167.83¢       0.41
sine_60           38.49¢           1.98¢         40.47¢       0.24
sine_90           38.03¢           1.24¢         39.27¢       0.52
sine_plateau_10   33.94¢         335.56¢        369.5¢        0.64
sine_plateau_40   32.27¢         138.86¢        171.13¢       0.54
sine_plateau_70   33.24¢          44.52¢         77.76¢       0.39
exp_18_2.25       33.27¢         428.09¢        461.36¢       0.83
exp_24_3.0        60.62¢         416.47¢        477.09¢       0.42
linear_240        39.83¢           0.0¢          39.83¢       1.13
random            61.35¢          35.13¢         96.48¢       0.60
Geometric Mean    N/A            N/A            N/A           0.52

Table 3.4: Penalty for Benchmarking Workloads - Ruleset 3

Workload          Po(ts,te)/hr   Pu(ts,te)/hr   P(ts,te)/hr   Ratio to Rule 1
sine_30           25.22¢         181.29¢        206.51¢       0.51
sine_60           33.78¢         106.28¢        140.06¢       0.86
sine_90           60.23¢           0.0¢          60.23¢       0.80
sine_plateau_10   22.68¢         408.92¢        431.6¢        0.74
sine_plateau_40   20.93¢         223.88¢        244.81¢       0.78
sine_plateau_70   21.97¢         173.92¢        195.89¢       0.98
exp_18_2.25       27.0¢          538.42¢        565.42¢       1.02
exp_24_3.0        37.76¢         577.52¢        615.28¢       0.55
linear_240        15.68¢          11.19¢         26.87¢       0.76
random            36.63¢         108.75¢        145.38¢       0.91
Geometric Mean    N/A            N/A            N/A           0.77

[Figure 3.4 panels (vs. time in minutes): CPU Demand and Supply (demand, a/supply, c/supply); Maximum Latency (in seconds); Request Counts; Penalty for Under-Provisioning (cents/min); Penalty for Over-Provisioning (cents/min)]

Figure 3.4: Elasticity behavior of EC2 platform with ruleset 1 for exponential workload with growth 24/hour and decay 3/hour

To improve elasticity, one can try changing the scaling rules so that they are aggressive in provisioning extra resources and conservative enough in deprovisioning those resources. The initial (Ruleset 1) and adjusted (Rulesets 2 and 3) scaling rulesets are shown in Table 3.1. Ruleset 3 is distinctive in making scale-out decisions based on the monitored values of an application-level performance metric (response time) instead of considering a resource utilization metric. This performance-based approach has been adopted by some practitioners for autoscaling cloud applications6. Ruleset 3 explores the tradeoffs in these different approaches to scaling.

Ruleset 2 performed markedly better than ruleset 1. Two major factors contribute to this improvement in ruleset 2: adding multiple instances at each trigger in the upswing and lazy deprovisioning in the downswing. Ruleset 1 is less responsive to the rapidly increasing sinusoidal and exponential workloads because it only adds one instance at each rule trigger and there is a cooling period which stops it from immediately creating another instance (even if the condition is again met). On the other hand, ruleset 2 adds two instances at each trigger, so it responds more quickly to sharp workload increases.

6 See http://blog.tonns.org/2011/01/autoscaling-revisited.html

On the downswing, ruleset 1 responds much quicker to the drop in demand by deprovisioning its resources. Though the available supply closely follows the demand, the chargeable supply does not follow as well: resources are being charged but not used. In contrast, as we intended, ruleset 2 (with an increased lower breach duration of 15 minutes and a scale-in cool-down period of 10 minutes) keeps the resources from the previous upswing so that they are reused in the subsequent cycles of the workload demand.
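To make the contrast concrete, the following is a declarative sketch of the two CPU-based rulesets as they are described in this section. Only the parameters stated in the text (ruleset 2 adding two instances per trigger, its 15-minute lower breach duration and 10-minute scale-in cool-down, and ruleset 1's single-instance steps with roughly 2-minute breach windows and 30% scale-in threshold) are taken from the thesis; the remaining values are illustrative placeholders, and Table 3.1 remains the authoritative definition.

```python
# Schematic rendering of rulesets 1 and 2 as plain Python dicts (not an AWS API call).
# Values marked "placeholder" are assumptions for illustration only.
RULESET_1 = {
    "scale_out": {"metric": "avg_cpu", "above_pct": 70,    # placeholder threshold
                  "breach_minutes": 2, "add_instances": 1},
    "scale_in":  {"metric": "avg_cpu", "below_pct": 30,    # 30% trigger noted in Section 3.5.1.2
                  "breach_minutes": 2, "remove_instances": 1},
    "cooldown_minutes": 5,                                  # placeholder
}

RULESET_2 = {
    "scale_out": {"metric": "avg_cpu", "above_pct": 70,    # placeholder threshold
                  "breach_minutes": 2, "add_instances": 2},
    "scale_in":  {"metric": "avg_cpu", "below_pct": 30,    # placeholder threshold
                  "breach_minutes": 15, "remove_instances": 1},
    "scale_in_cooldown_minutes": 10,
}
```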

This benefit from resource reuse holds as long as workloads come in periodic bursts and the inter-arrival time of bursts is short enough to retain some resources from the previous ones; otherwise, subsequent bursts will not be able to enjoy the resource-reuse phenomenon, as the resources are likely to be released by that time. For this reason, sine-plateau workloads could not make use of the resources from the previous cycle because of their long plateaus at the trough (45 minutes). The same holds for the exponential workload with growth rate 18 and decay rate 2.25 per hour; it could not improve much with ruleset 2, as the duration between the bursts is long enough to set the number of instances back to the initial state (1 EC2 instance).

Results for all workloads of our suite are shown in Table 3.3, which should be compared to Table 3.2. One clear disadvantage of ruleset 2 is that it is likely to overprovision too much in the case where workload does not increase quickly. This is reflected in the experiment with the linear workload pattern. The modest pace of growth in demand here means that ruleset 1 was sufficient to align the resource supply with its resource demand. Ruleset 2 rather worsened the overall penalty by increasing the over-provisioning cost.

We observe a lower penalty score for ruleset 3 than for ruleset 1. Since the under-provisioning penalty is mostly dominated by high latency, we designed this ruleset to add an extra instance when the maximum latency goes beyond 75% of the allowed threshold. This ruleset triggers an instance provisioning request as soon as the observed latency starts rising due to request-buffering at the server. The results show that this ruleset ensures higher SLO conformance and hence lower under-provisioning penalty for most of the workloads. An increased lower-breach duration and scale-in cool-down period in the deprovisioning rule also promote resource reuse from previous cycles and thus contribute to the improvement in the penalty score.


Figure 3.5: Rippling effect for sinusoidal workload with 90 minutes period

The only shortcoming of ruleset 3 is that it sometimes results in excessive over-provisioning because of a “rippling effect”. This ruleset assumes a resource bottleneck is the only cause of latency violation and therefore provisions extra instances to improve the observed latency. However, there might be several other reasons for high latency even after an instance is available, for example, the warm-up period of the newly provisioned instance, request-queuing in other instances or problems in third party web service calls7. If the rule sets too small a “cool-down” period (how long after a rule is triggered till it can be triggered again), then provisioning requests for new instances might be triggered repeatedly based on latency violation information that has not yet reflected the earlier provisionings. Figure 3.5 demonstrates this rippling phenomenon for one workload.
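To illustrate the role of the cool-down period, here is a minimal sketch of a latency-triggered scale-out check that is suppressed while a previous action is still cooling down. The 75% trigger fraction and the 2-second SLO come from the text; the 10-minute cool-down is an illustrative value rather than the exact setting of ruleset 3.

```python
# Minimal sketch: latency-based trigger guarded by a cool-down window.
def latency_breached(max_latency_s, slo_s=2.0, trigger_fraction=0.75):
    # Fire when the observed maximum latency exceeds 75% of the allowed SLO threshold.
    return max_latency_s > trigger_fraction * slo_s

def should_scale_out(minute, max_latency_s, last_action_minute, cooldown_minutes=10):
    # Ignore the trigger while an earlier provisioning action has not "cooled down",
    # so stale latency readings cannot cause repeated (rippling) scale-outs.
    in_cooldown = (minute - last_action_minute) < cooldown_minutes
    return latency_breached(max_latency_s) and not in_cooldown
```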

We computed a single figure of merit based on the SPEC family of benchmarks as defined in Section 3.3.2. We used the platform with ruleset 1 as the reference when evaluating rulesets 2 and 3. The last column in Tables 3.3 and 3.4 shows the ratio of the total penalties for the two rulesets with respect to ruleset 1 for each of the 10 workload patterns. With one exception for each ruleset, these ratios are smaller than one, indicating that both rulesets 2 and 3 are generally more elastic than ruleset 1 for the benchmark workload patterns. We calculated the geometric mean of these ratios (0.52 for ruleset 2 and 0.77 for ruleset 3), which quantifies the improvement in elasticity. Thus we have demonstrated that our single figure for elasticity can be effectively used to compare different rule configurations. It can also be used to detect the difference in the elasticity level between platforms from different cloud providers, as well as the variation over time within a cloud provider due to the consistently evolving underlying infrastructures of the cloud.
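As a quick check of the aggregation step, the snippet below recomputes the SPEC-style geometric mean from the ruleset 2 ratios listed in Table 3.3; because it uses the rounded table values, the result differs very slightly from the reported 0.52.

```python
# Geometric mean of the per-workload penalty ratios (ruleset 2 vs. ruleset 1, Table 3.3).
import math  # math.prod requires Python 3.8+

ratios_ruleset2 = [0.41, 0.24, 0.52, 0.64, 0.54, 0.39, 0.83, 0.42, 1.13, 0.60]
geo_mean = math.prod(ratios_ruleset2) ** (1.0 / len(ratios_ruleset2))
print(round(geo_mean, 2))  # ~0.53 with these rounded ratios; the thesis reports 0.52 from unrounded penalties
```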

7 See http://aws-musings.com/choosing-the-right-metrics-for-autoscaling-your-ec2-cluster/

3.6 Discussion

Our case studies have given us insights into how a cloud platform can be better or worse at elasticity when following a varying workload. Identifying the importance of these characteristics should be of independent value to consumers who want to choose a platform, and they may also help a cloud provider offer better elasticity in her platform.

It is well understood that the granularity of instances is important in elasticity. If a substantial PC-like (virtual) machine is the smallest unit of increased resource, this is less elastic than a platform which can allow each customer to have whatever percentage of the cycles that they need, as in the case of Google App Engine [3]. Similarly, the time delay between a request for provisioning, until one can actually run on the new instance, can be significant. We observed that this delay varies unpredictably, but can be over 10 minutes. If the workload is increasing fast enough, by the time 10 minutes have elapsed, the previous configuration may have become badly overloaded. Ongoing changes in implementation by platform providers may lessen the provisioning delay, and thus improve observed elasticity. Finally, the delay to decommission an instance is also important. Being too slow to give up resources is wasteful, but being too eager can leave the consumer without resources if/when workload recovers to previous levels.

We have seen how important it is to understand the way the consumer is charged for resources. We have seen that this can be quite different from the actual access to those resources, and the difference is important for the consumer’s perception of elastic behavior. When charging runs till the end of a substantial quantum (e.g., an hour for EC2), we can see financial losses from too rapid response to changed load. In particular, if the fluctuating load leads a consumer to give up an instance, and then they need to request it back, they may end up paying for it twice over.

Our experiments have shown how changes to the provisioning and deprovisioning rules can alter the elasticity of the platform. This seems to deserve much more attention from consumers, and we have not found useful guidelines in research or tutorial literature. In particular, many applications seem to follow sample code, and use a default policy where instances are created or given up based on utilization levels holding for a fairly short time (e.g., 2 minutes). By being less eager to deprovision, we saw a different ruleset gave significant improvement (about 50% for ruleset 2) in the elasticity measure. We also observed better SLO conformance for rules based on QoS (i.e., latency-based ruleset 3) as compared to the utilization-based one (CPU-based ruleset 1); however, ruleset 3 is less reliable for autoscaling as it causes excessive over-provisioning when a QoS threshold is breached because of external factors (e.g., latency spike in other tiers or web services) instead of resource scarcity.

Running our benchmark has been informative for us. We now reflect directly on the advantages and disadvantages of the decisions we made in proposing this benchmark, that is, how exactly we decided to measure elasticity.

Having workloads with diverse patterns of growth and decline in demand is clearly essential. Those that rise rapidly (that is, fast compared to the provisioning delay in the platform) reveal many cases of poor elasticity. When demand declines and rises again, we see effects of charging quanta.

We followed the SPEC approach to combine information from several workloads into one number. It gives consistent relative scores no matter which platform is the reference [104]. It is very robust in that it does not change depending on the subtle choice of weights, nor on the scale chosen for each workload.

Our calculated penalty for overprovisioning is based on the charged level of resources, rather than on the resources that are actually allocated (as in Weinman’s discussion of elasticity [239]). We have seen that there can be a considerable difference between these quantities. By this decision, we properly give a worse score to a system if it keeps charging over a longer quantum. Our penalty calculation for under-provisioning is based on observed QoS, using consumer-supplied functions to convert each observation into an opportunity cost. We do not assume a constant impact of each unmet request. This clearly fits with widespread practice, where SLOs with penalty clauses are enshrined in contracts.
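The two penalty streams can be summarized schematically as follows. This is a sketch of the ideas in this paragraph rather than the exact formulas of the measurement model; the price rate and the 0.05-cent loss per slow request are purely illustrative, consumer-supplied figures.

```python
# Schematic sketch of the two penalty streams (rates and functions are illustrative).
def overprovisioning_penalty(charged_supply, demand, price_per_unit_per_min):
    # Priced on *charged* capacity beyond demand, even if that capacity is not yet usable.
    return max(charged_supply - demand, 0.0) * price_per_unit_per_min

def underprovisioning_penalty(qos_observations, opportunity_cost):
    # opportunity_cost(obs) is a consumer-supplied function encoding her SLO.
    return sum(opportunity_cost(obs) for obs in qos_observations)

# Example consumer-supplied function: lose 0.05 cents for every request slower than 2 seconds.
slow_request_cost = lambda latency_s: 0.05 if latency_s > 2.0 else 0.0
```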

In Weinman’s discussion, each time period is penalized for either underprovisioning or overprovisioning, but not for both. We allow both penalties, for example, when utilization is around 95%, one has 5% of the capacity unused, but one may also see queuing delays that lead to poor QoS. For us, optimal elasticity is not when allocation equals demand, but rather when allocation is the least that still allows as many QoS conditions to be met as possible.

Overall, our approach supports informed decision-making by the consumer selecting a suitable cloud platform for her needs.

3.7 Critical reflection

In this chapter, we have conceptualized a novel framework for elasticity evaluation of cloud platforms from the consumer’s viewpoint. We have also demonstrated the steps to instantiate an executable benchmark based on the proposed framework for e-commerce applications. Through an experimental case study conducted on AWS EC2, we have validated our framework for different adaptive scaling rulesets. Furthermore, we have pinpointed some anomalies in the adaptive scaling strategies frequently used in the tutorial examples and prescribed some workarounds for improved elasticity. There are, however, some risks associated with the work presented in this chapter, which are described as follows.

• Risk: The relevance of the benchmarking results depends on the appropriate characterization of the application domain, workload profile, business objectives, adaptive strategies and platform-specific pricing model.

We want to emphasize the fact that the elasticity metric derived from this framework is not simply a function of cloud elasticity, but a complex interplay between cloud elasticity and the consumer’s application and business goals. Therefore, the actual figure of merit will vary depending on the consumer’s application domain, SLOs, workload characteristics and adaptive scaling strategies. The cloud provider’s pricing scheme also exerts an important influence on the elasticity metric; apparently, fine-grained resource pricing (e.g., based on a smaller charging quantum, more granular resource supply, cheaper resource unit) would result in improved elasticity.

• Risk: The standard workload suite does not allow consumers to reproduce customized prototypes from traces.

It was, indeed, a valid criticism which raised concerns about the workload representation criterion of the core framework in the earlier stages of our research [236]. The standard workload suite describes various realistic use case scenarios based on a set of mathematical equations; however, the patterns of real-world workloads may sometimes lie outside the confines of these simple equations. For this reason, the consumer would prefer to have the flexibility to generate representative prototypes of their application-specific workloads based on traces. Note that the actual figure of merit is a complex function of the consumer’s application domain and workload profiles; therefore, the consumer may want to carry out in-house elasticity benchmarking in response to custom prototypes of her own workloads for relevant benchmarking results. To address this criticism, we provide a novel workload model that reproduces realistic prototypes of the original workload in Chapter 4.

• Risk: The framework does not provide any guidance for repeatable and valid benchmarking results in the presence of cloud-environment-specific nondeterminisms.

Another concerning fact is that our core framework does not consider the performance unpredictability of the cloud platform while reporting the evaluation results. As pointed out by Folkerts et al. [105], the benchmarking results for cloud platforms may show some variation because of random environmental factors (e.g., time-shared resources, over-commitment, ever-changing consumer load and its interference). This may pose a threat to the repeatability and fairness criteria of the benchmarking framework. In Chapter 5, we propose a set of rigorous techniques that ensure repeatable and valid elasticity benchmarking results, even in the presence of the performance unpredictability of the cloud environment.

• Risk: Our empirical findings challenge the conventional notion of elasticity.

The theoretical notion of elasticity considers either over-provisioning or under-provisioning at a particular timepoint, but not both. However, during our case study, we discovered situations where both over-provisioning and under-provisioning co-exist at the same timepoint. We observed a scenario where the platform has some slack resources (e.g., 5% CPU), yet the increased request rate causes a huge queuing effect at the application server, thereby adding up penalties for violating the response time threshold. As we have clarified earlier, optimal elasticity is not the condition where the available resource equals demand; rather, it is the condition where the allocated resource is at least adequate to gracefully meet the consumer’s SLO. Moreover, we encountered another situation where the system is severely under-provisioned, yet over-provisioning occurs because of the excess payment for a resource which is not yet available due to spin-up delay. We believe we have provided convincing arguments and realistic evidence to support our statements.

• Risk: This work does not provide any theoretical validation of the elasticity measurement model.

Lack of theoretical validation is an apparent weakness of this work. Critics from the software metric validation community may, therefore, express their skepticism about the technical soundness and meaningfulness of the derived elasticity metric, such as whether it can properly conserve the consumer-perceived elasticity notion in the real world or not, its ability to reflect the empirical relations and representation conditions of elasticity and so on [89, 100]. Future research will concentrate on providing a rigorous theoretical validation of the elasticity measurement model.

• Risk: Several factors may remain as threats to the validity of our empirical case studies.

A number of factors may remain as threats to the validity of our empirical findings. It was not possible to eliminate the effect of cloud performance variability from the benchmarking setup; however, we infer that its influence was mitigated to some extent due to the diversity in our workload suite and appropriate choice of the SLO thresholds. Perhaps for this reason, the overall conclusions drawn from these studies remain consistent with the observations from other studies [157, 219, 114]. However, the percentage improvements or degradations reported in these case studies should be interpreted with caution, as they were obtained from a single benchmark execution only. Future research should, therefore, focus on determining the uncertainty in those numbers. The external validity of our case study was also affected by a number of factors; for example, the TPC-W benchmark, deployment design, standard workload suite and specific tutorial rulesets used in the case study may not be representative of real-world industrial practices. Due to the limited budget, it was also not feasible to check the generality of our conclusions for other cloud offerings (e.g., Azure [34, 14], Google Compute Engine [16]), VM instance types and scaling techniques. These risks are inherent to case studies conducted in an academic setting; several actions can be taken to overcome such limitations, such as carrying out the case studies in an industrial context for increased representativeness, increasing the breadth of the study by exploring other cloud offerings, VM instance types and scaling techniques, and so on.

Contributions and benefits

The contributions of this chapter are summarized as follows:

• Contribution: A novel framework for evaluating the elasticity of competing cloud offerings and adaptive scaling strategies from the consumer’s perspective.

• Contribution: Guidance for instantiating an executable elasticity benchmark.

Some practitioners have described elasticity as a “double-edged sword” [130]; it can make or break the consumer’s revenue and net profit. If the elasticity turns out to be imperfect during the production phase, when the application is exposed to the real user workload, it may not bring good news to the consumer. In the worst case, the consumer may experience both SLO violations and excessive operational expenditures far beyond the anticipated limit, and thus feel frustrated because of the unexpected loss of revenue. This is the phase when imperfections in elasticity would be the costliest to fix. For this reason, analyzing and evaluating elasticity at the early phases of cloud adoption is considered to be an attractive alternative. However, the irony is that the research to date has paid too little attention to the development of an elasticity benchmarking framework that specifically takes into account the consumer’s perspective.

Unlike frameworks that evaluate only the technical aspects of elasticity (e.g., spin-up delay, spin-down delay and precision of allocated resource), ours evaluates elasticity in terms of the complex interaction of the technical aspects and the consumer’s business situation (e.g., business objectives, application and workload characteristics and the provider’s pricing scheme). Moreover, we have discussed concrete choices (e.g., a standard workload suite describing frequently observed use case scenarios, SLO constraints) for instantiating an executable elasticity benchmark based on our proposed framework. We have also described a compact listing of steps that a consumer has to follow to apply the executable benchmark for elasticity measurement of real cloud platforms. The single figure of merit provided by this benchmark helps the consumer compare and contrast the worthiness of alternative cloud services, adaptive scaling strategies and deployment configurations.

There are, however, several unresolved challenges that need to be dealt with to make this benchmarking framework more informative to the consumer. One particular concern relates to the evaluation of large and complex application types; hosting such applications in different cloud services is a tedious and time-consuming task. It also requires a significant amount of cloud-specific technical expertise, development effort and operational expenses. Therefore, the consumer may be tempted to port only a small representative test application to the cloud, gather benchmarking results for it and extrapolate the data for the actual production application. Now the question remains: how likely is it that the findings for a small (test) application would apply to a large (production) application in case the consumer cannot afford to benchmark the large application on multiple clouds? Likewise, they may want to extrapolate the elasticity benchmarking results of small prototype workloads for larger actual workloads. Now the question is: how likely is it that the elasticity benchmarking results of the small test workload will equally hold for the actual workload? Future research will focus on finding suitable answers to these questions. Another research direction would be developing an elasticity planning framework that can help the consumer define a long-term strategic plan based on the anticipated probabilities of different workload types (i.e., even the non-fluctuating ones) and the cloud provider’s discounted pricing schemes. And last but not least, we intend to develop a harness to automate elasticity benchmarking as much as possible and make it available to the consumer as a software package.

• Contribution: A number of useful insights into the elasticity of the AWS EC2 platform for alternative adaptive scaling policies.

• Contribution: A set of prescriptions for improving the consumer-perceived elasticity for the AWS EC2 platform in response to fluctuating workloads.

The case study conducted on AWS EC2 draws some valuable insights. First of all, it brings to our attention that there is a distinction between the technical elasticity measure and the consumer-perceived elasticity measure. According to the technical view, over-provisioning and under-provisioning are mutually exclusive conditions, whereas the consumer-perceived view of elasticity allows the co-existence of these conditions at the same timepoint. Therefore, a technical-aspect-oriented elasticity benchmark may mislead the consumer’s subjective judgement about the adaptability of cloud platforms. Second, we have detected some anomalies in the widely-used adaptive scaling policies; some policies fail to release the excess resource (e.g., the trapping scenario) while some others cause a rippling resource provisioning effect. Practitioners may consider these phenomena as antipatterns and design workarounds to get rid of such issues. Based on our observations, we have also prescribed several approaches to optimize the consumer-perceived elasticity; examples include bulk-provisioning for fast-growing workloads and lazy deprovisioning for promoting resource reuse.

There are a number of ways to extend this case study, such as: (1) investigating the elasticity behavior of other cloud services (e.g., Microsoft Azure [34, 14], Google Compute Engine [16] etc.), extracting insights and providing recommendations for effective elasticity in those platforms; (2) offering a set of platform-agnostic elasticity patterns and antipatterns to help practitioners design effective adaptive strategies for their applications.

Further benefits of the work presented in this chapter are:

• Benefit: The benchmarking framework offers a proactive means for early diagnosis and treatment of imperfect elasticity.

The benchmarking framework facilitates early detection and treatment of elasticity issues for the consumer’s workload. The total penalty rate works as an indicator for imperfect elasticity; looking at the over-provisioning and under-provisioning penalty rates, the practitioner can identify possible areas for improvement. Plotting the penalty rates with respect to time helps her pinpoint the exact cause for imperfection and devise workarounds to fix it.

• Benefit: The benchmarking framework can have important uses at different stages of the cloud adoption lifecycle.

Our elasticity benchmarking framework can be used in various forms during the cloud adoption lifecycle. During the assessment and planning phases, it can guide the stakeholders’ strategic decision about cloud adoption in several ways. For instance, the financial penalty rates can help them understand whether the chosen application is a good candidate for hosting in the elastic cloud, while the single-figure elasticity metric lets them compare alternative cloud services and pick the one best-suited to their needs. In the adoption and optimization phases, it can be used to guide several design/optimization decisions, such as choosing between different adaptive scaling strategies, and configuration and tuning of scaling parameters for effective elasticity.

In the future, we are interested in developing adaptive scaling policies based on online measurement of the penalty rates in the recent past.

• Benefit: The insights gained from the case study can help cloud providers understand the gap between the expected and the consumer-perceived elasticity and optimize their services accordingly.

In this chapter, we have presented some valuable insights into the elastic response of the EC2 platform for different adaptive scaling policies. We are also interested in extending our case study to other cloud services, e.g., Microsoft Azure [34, 14], Google Compute Engine [16] etc. We hope these insights and elasticity metrics will help the cloud providers better understand the consumer’s concerns and optimize the delivered elasticity of their services accordingly.

3.8 Conclusion

Small and medium enterprises are heading towards the cloud for many reasons, including varying workloads. To choose appropriately between platforms, a consumer of cloud services needs a way to measure the features that are important, one of which is the degree of elasticity of each platform. This chapter has offered a concrete proposal giving a numeric score for elasticity. We have suggested specific new ways to use SLOs to determine penalties for under-provisioning. We have defined a suite of workloads that show a range of patterns over time. We carried out several case studies showing that our approach is feasible, and that it leads to helpful insights into the elasticity properties of the platform. In particular, we have brought to attention the sharp distinction between resource elasticity and consumer-perceived elasticity; we have demonstrated several scenarios where resource elasticity frameworks fail to recognize the consumer’s detriments for imperfect elasticity, such as the deviation between chargeable supply and available supply, co-existence of over- and under-provisioning etc. Our framework, on the other hand, can reliably reflect the consumer’s perception of elasticity behavior in those scenarios, and can therefore be regarded as a more trustworthy indicator of the consumer-perceived elasticity.

However, the standard workload suite designed in this chapter is not sufficient to represent the huge variety of workloads faced by the consumer’s application in the real world. Failure to generate representative workload prototypes may affect the relevance of the benchmarking results for custom workloads. To resolve this problem, the next chapter will describe a novel workload model for reproducing realistic prototypes of fine-scale bursty workloads, often observed in the context of web and e-commerce applications.

Chapter 4

Customized benchmarking? Use fine-scale bursty prototypes

“A cloud is made of billows upon billows upon billows that look like clouds. As you come closer to a cloud you don’t get something smooth, but irregularities at a smaller scale.”

Benoit Mandelbrot

In order to get realistic benchmarking results, the cloud consumer needs to stress the elasticity of the cloud platform with representative workloads. A specific impediment to achieving this goal relates to the realistic modeling of fine-scale bursty workloads. This chapter presents a novel workload modeling method to reproduce the fine-scale bursty prototype so that it can mimic the empirical stylized facts of the original workload.

4.1 Introduction

A primary concern of elasticity benchmarking is the design of a representative workload suite. If the cloud platform is stressed with representative workloads, the elasticity benchmarking results prove to be realistic and relevant to the context of the cloud consumer (i.e., her application and workload profiles). The most straightforward way to address the representativeness criterion is replaying the actual workload during benchmarking. However, this approach turns out to be very expensive (in terms of cloud usage costs, time and effort) when the amplitude and duration of the workload grow large (e.g., seasonal surges, flash crowds). An economically viable alternative, therefore, is to employ a workload model for reproducing realistic prototypes of the actual workload and use those prototypes for benchmarking elasticity of the cloud platform.

Ideally, the workload model should be able to preserve the salient features of the actual workload’s arrival process in the reproduced prototypes. Web and e-commerce workloads, however, exhibit a great deal of complexity in the request arrival process [179, 181, 229]: strong correlation or deterministic trend at the intermediate and coarse timescales (e.g., in the order of minutes) and severe oscillations or burstiness at the smaller timescales (e.g., in the order of seconds). This complex structure of the arrival process poses a significant challenge to the realistic modeling of web and e-commerce workloads. Our literature review reveals a considerable amount of effort spent on the modeling of the deterministic trend of the workload. Burstiness or severe oscillations at the finer timescales, nevertheless, have not received much attention in the existing workload modeling literature, even though its presence in the arrival process induces detrimental effects on the performance and utilization in traditional client-server and virtualized systems [183, 247]. Lack of a fine-scale bursty workload model, therefore, presents a major obstacle for elasticity benchmarking of cloud platforms. Flawed modeling of the fine-scale bursty workloads impairs the representativeness criterion of the benchmark, which in turn affects the relevance of the benchmarking results and blurs the consumer’s subjective judgement while choosing an elastic platform for her application. In this chapter, we address this issue by introducing a novel workload model that generates representative prototypes of the fine-scale bursty workload.

Example 4.1 Use case: Consider the following situation. An application provider has to choose the most elastic cloud platform for her online bookstore application anticipated to experience sudden surges during seasonal events, e.g., Christmas, New Year etc. With a view to getting realistic benchmarking results, she plans to conduct an in-house elasticity benchmarking experiment for a number of cloud platforms based on the request access logs of last year’s seasonal events. To save cost and effort, she generates some workload prototypes from the access logs using a workload model that does not consider fine-scale burstiness. Now the question remains: will these benchmarking results help her predict the expected elasticity of her application or provide her a skewed view because of non-representative prototypes (i.e., due to the absence of fine-scale burstiness)?

The contribution of this chapter is two-fold.

• First, we propose a novel methodology to model fine-scale bursty workloads so that the time-dependent regularity structure (a measure of randomness over time) of the prototype is compatible with the original arrival process. One standard technique to characterize the regularity structure is the estimation of pointwise Holder exponents, a measure of randomness in the workload time series [38]. We advocate an approach to characterize the Holderian regularity from fine-scale fluctuations of the trace. The next step makes use of the regularity information and a user-defined standard deviation to synthesize fine-scale fluctuations from a multifractional Gaussian noise process. The deterministic trend is generated from a normalized shape function by performing rescaling and interpolation. The final step superimposes fine-scale burstiness on the deterministic trend to yield the desired workload. This method is robust and provides the user the flexibility to control the variance of fine-scale fluctuations while preserving the empirical stylized facts of the workload trace.

• Second, we conduct a case study in EC2 to explore the elasticity behavior of an e-commerce application in response to fine-scale burstiness. Our analysis reveals that fine-scale burstiness has significant implications for elasticity; it overwhelms the under-provisioning scenario with highly correlated peak arrivals near the saturation region and leads to over-provisioning with reduced utilization. Our investigation extracts additional interesting insights about fine-scale burstiness, for instance, its influence on adaptive resource scaling and the trend in the elasticity penalty rate.

The remainder of the chapter is organized as follows. Section 4.2 defines the research problem and motivation behind this work. Section 4.3 provides a brief overview of the prevalent works on fine-scale bursty workloads. Section 4.4 reveals some statistical facts about fine-scale burstiness from the viewpoint of multifractal analysis. Section 4.5 defines our methodology to model fine-scale bursty workloads. Section 4.6 describes the experimental setup and Section 4.7 presents empirical observations about the impact of fine-scale burstiness on the elastic response of a cloud-based e-commerce application. Section 4.8 discusses potential risks and benefits associated with our workload model and case study, and Section 4.9 offers the conclusion.

4.2 Fine-scale burstiness: evidence and repercussions

The evaluation and design of elastic cloud platforms is a non-trivial exercise due to the complexity of the workload profiles. Effective elasticity can be achieved if and only if the application’s adaptivity is evaluated and optimized under realistic workload conditions. Hence, it is absolutely crucial to benchmark elasticity and design adaptive strategies under realistic reproducible prototypes of the actual workload. Fine-scale burstiness is an inherent characteristic of the web and e-commerce request arrival process [179, 181]. Despite its prevalence, it has not been addressed in the existing workload modeling and elasticity benchmarking studies. In this section, we first advocate the need for a novel methodology to model fine-scale burstiness so that the reproduced prototype resembles the empirical stylized facts of the actual workload. Next, we argue that ignoring fine-scale burstiness in the elasticity evaluation and adaptive scaling design bears the risk of skewing the overall picture and even leading to incorrect conclusions; therefore, it is important to take this factor into account during elasticity benchmarking and optimization.


Figure 4.1: Wikipedia workload snippet (Oct 1 2007)

Fig. 4.1 demonstrates the presence of fine-scale burstiness in the request arrival process of a Wikipedia workload sample [227]. As we can see, the request arrival rate is not a smooth curve; instead, it consists of many fine-grained fluctuations. The width of the fluctuation band is very large (a rough estimate would be 200–1000 requests/second), which reflects high volatility in the request arrival process in consecutive intervals. As amplitude increases, the fluctuations tend to become more erratic in nature. A correlation coefficient of −0.659 provides evidence of a strong relationship between amplitude and regularity behavior (i.e., the degree of randomness in the noisy fluctuations) over time for this workload sample (the detail on how to quantify regularity is elaborated in Section 4.5.1). In order to get a reliable estimate of the elasticity behavior, it is necessary to carry out benchmarking for a realistic reproducible prototype of the actual workload. Fine-scale burstiness, identified as a salient feature of web and e-commerce workloads, should also be authentically resembled in the reproduced prototype. As yet, the literature does not provide any solution to the realistic modeling of workloads in which the empirical stylized facts about fine-scale burstiness are well-preserved. This is the gap that we address with a novel workload model based on a deterministic time-dependent regularity function.

Now we describe with an example how different assumptions of fine-scale burstiness result in different elasticity behavior. We illustrate this point based on the elastic response of the TPC-W application [117] hosted on a dynamic web server farm in the AWS EC2 cloud under two different workload assumptions: a workload with no fine-scale burstiness versus a workload with fine-scale burstiness. The details on how to reproduce a workload with fine-scale burstiness are provided in Section 4.5. In both experiments, we add an EC2 instance to the server farm when the average CPU utilization goes beyond 70% for 3 consecutive minutes and remove an instance when the average CPU utilization falls below 20% for 10 consecutive minutes using the Autoscaling API [7]. We assume each EC2 instance provides 100% of CPU supply per minute.
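For readers who want to see this rule stated operationally, the toy loop below mimics the threshold-and-breach-duration logic on a per-minute CPU demand series. It is only a sketch: it ignores spin-up delays and cool-down periods, and the real experiments drive the rule through the Autoscaling API mentioned above rather than code like this.

```python
# Toy simulation of the scaling rule used in this experiment:
# +1 instance after 3 consecutive minutes above 70% average CPU,
# -1 instance after 10 consecutive minutes below 20% average CPU.
def simulate_rule(cpu_demand_per_min, cpu_per_instance=100.0,
                  hi=70.0, hi_minutes=3, lo=20.0, lo_minutes=10):
    instances, above, below, history = 1, 0, 0, []
    for demand in cpu_demand_per_min:
        utilization = min(100.0, 100.0 * demand / (instances * cpu_per_instance))
        above = above + 1 if utilization > hi else 0
        below = below + 1 if utilization < lo else 0
        if above >= hi_minutes:
            instances, above = instances + 1, 0
        elif below >= lo_minutes and instances > 1:
            instances, below = instances - 1, 0
        history.append(instances)
    return history
```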


Figure 4.2: EC2 platform’s elasticity behavior under non-bursty and bursty workloads

Fig. 4.2 shows the elasticity behavior of the CPU resource in response to non-bursty and bursty workload conditions in terms of demand and charged supply (c/supply). Recall from Chapter 3 that demand means the amount of resource needed by the application to serve the requests with satisfactory performance, whereas charged supply indicates the amount of resource for which the cloud platform charges the consumer. The cloud platform starts charging the consumer’s application at the very moment it requests a new resource; however, the resource is available for use only after some random delay (typically several minutes). For this reason, the demand function grows a few minutes later than the charged supply function in the above figure. During the scale-out period, the available resource capacity is scarce; therefore, the server farm may not be able to serve the peaks of the bursty arrivals with graceful QoS. The longer the scale-out period and the spikier the request arrivals, the more the degradation in the perceived QoS of the application.

In the above figure, we observe that these workloads influence the elasticity behavior of the EC2 server farm in different ways (even though the average scale-out delay is similar in both cases, around 75 seconds). When the arrival rate exhibits fine-scale burstiness, most of the CPU resource remains under-utilized, thus giving rise to high over-provisioning cost. We also notice that under the bursty workload condition the application server had to stay longer in the under-provisioning state. Consequently, the application failed to serve requests with satisfactory QoS due to limited server resources. Table 4.1 shows the QoS degradation for both workloads because of under-provisioning. Under the bursty workload condition, the application could not meet the response time constraint of 2 seconds for a significant portion of requests (10.82%), which is 5.49 times higher than under the non-bursty workload condition (1.97%). Moreover, a fraction of requests (2.17%) remained unserved due to unavailability or timeout issues under the bursty condition. The autoscaling ruleset, although it appears to be quite sufficient for the non-bursty workload, could not ensure good elastic response for the bursty workload. It caused a significant increase in over-provisioning and under-provisioning for the fine-grained bursty workload. Therefore, we conclude that optimizing and evaluating elasticity under the assumption of non-bursty workloads reduces the productivity of cloud applications when the workload is bursty in practice. This example also justifies the reason behind our evaluation of elasticity for fine-scale burstiness and quantification of its impact on the monetary penalty from the consumer’s perspective.

Table 4.1: QoS degradation for under-provisioning in non-bursty and bursty workloads

QoS                                                    Non-bursty   Bursty
Requests violating 2-seconds response time constraint  1.97%        10.82%
Unserved requests                                       0.79%        2.17%

4.3 Prevalent fine-scale burstiness studies

A solid understanding of workload characteristics is required to ensure effective elasticity of the cloud-hosted applications. Web workloads have been extensively studied and characterized in many publications, such as [179, 181, 229]. These studies identified burstiness or high variability at the fine timescale and deterministic trend at the coarse timescale in the HTTP request arrival process.

Considerable effort has been spent on burstiness modeling and evaluation for traditional client-server scenarios [244, 183, 124, 146]; some of these studies demonstrated the deleterious effect of burstiness on the performance and utilization of the application. Nevertheless, the prototypes generated by these workload models cannot resemble the empirical stylized facts (e.g., regularity behavior) of the original arrival process. Xia et al. [244] proposed a workload model to generate request arrivals at the fine timescale based on an exact self-similarity [56, 242] assumption and fractional Brownian motion (fBm). The roughness or regularity of an fBm process is the same at all timepoints [205]; therefore, the regularity of a workload reproduced using this model remains constant over time. This is not consistent with what we observe in actual workloads, where the regularity behavior varies erratically over time. The LIMBO toolkit [124, 146] generates cloud-specific workloads based on four aspects: seasonality, trend, burst and noise. The noise component resembles fine-scale burstiness, similar to our notion. However, its modeling using a random distribution produces many different realizations of workloads with fine-scale burstiness, most of which lack compatibility with the empirical stylized facts of the actual workload. An incompatible realization may provide a distorted view of the elasticity behavior and, in the worst case, may surprise the cloud consumer with an unexpected loss of revenue. Mi et al. [183] developed a bursty workload model based on a Markov modulated process; it consists of two states, each generating requests with high and low think times respectively, and the transition probabilities between states are governed by a burstiness parameter, the “Index of dispersion”. This approach, too, generates many incompatible realizations of the actual workload and cannot guarantee the preservation of empirical stylized facts in the reproduced prototype.

To sum up, the prevalent workload models fail to reproduce representative prototypes of fine-scale bursty workloads. Recall from Section 4.2 that a flawed representation of fine-scale bursty workloads poses a serious threat to realistic elasticity benchmarking of cloud platforms. To resolve this issue, we have decided to design a workload modeling technique based on the fractal properties of the original request arrival process. Before diving into the detail about the workload model, it is worth taking a brief look at the basics of multifractal analysis, which we present in the next section.

4.4 Multifractal analysis

The concept of “scaling” refers to the absence of any notable change in a finite sequence when observed at different timescales [205]. As a consequence of this phenomenon, the whole and its parts appear statistically identical to each other; processes with this property are known as exactly self-similar or scale-invariant. Fractional Brownian motion (fBm) is a widely accepted model to represent exactly self-similar processes. Most web and e-commerce workloads, however, are approximately self-similar [179, 181]; scaling holds only within a finite range of timescales and the scaling behavior at the fine timescales does not exactly resemble that at the coarse timescales. This type of process exhibits a strongly correlated trend at the coarse timescales and noisy oscillations at the fine timescales. A global scaling exponent on asymptotic self-similarity is typically used to explain the strongly correlated trend feature at the limit of the coarsest scales. The sudden dips or spikes are described with respect to the local scaling exponent, which is related to the degree of randomness in the timeseries. If the local scaling exponent varies with time, the process is referred to as multifractal. Wavelets are perfect tools for analyzing sudden discontinuities or sharp changes in the timeseries; hence, wavelet-based analysis is often used to study the complex scaling phenomena in the timeseries.

To analyze the scaling behavior of the request arrival rate process R(t), we adopt the Wavelet Transform Modulus Maxima (WTMM) method. The wavelet transform (WT) decomposes the function R(t) into elementary space-scale contributions with an analyzing wavelet ψ(t) by means of translations and dilations. The wavelet transform acts as a microscope; it reveals more details on the local characteristics of a function while moving towards smaller timescales a. The partition function Z(q,a) is then computed by summing up the qth moment of the local maxima of the wavelet coefficients at each timescale a.

The basic scaling hypothesis states that the partition function, Z(q,a), should behave as [188]:

Z(q,a) \sim a^{\tau(q)}, \quad a \to 0^{+} \qquad (4.1)

where q ∈ ℝ is the order of the moment, a is the timescale and τ(q) is the scaling exponent. Positive values of q accentuate the strong inhomogeneities in the timeseries whereas negative values of q accentuate the smoothest ones. The slope of the log-log plot between the partition function Z(q,a) and timescale a yields the scaling exponent τ(q). If the τ(q) exponents are linear in q, the timeseries is monofractal or exactly self-similar; the slope of the τ(q) spectrum provides the constant global measure of self-similarity in the timeseries (also known as the Hurst or Holder exponent h). Otherwise, the timeseries is multifractal, which means that the scaling properties of the timeseries are inhomogeneous and varying with time; the local inhomogeneities are described in terms of local Holder exponents h(t). The Holder exponent h can be interpreted as a measure of randomness or regularity of the timeseries at a specific timepoint [38]. A formal definition of this exponent will be provided in Section 4.5.1. This exponent varies within the range (0,1); the higher the value of h, the less volatile and smoother the timeseries is. Depending on the value of h, the timeseries could be positively correlated (h > 0.5), uncorrelated (h = 0.5), or negatively correlated (h < 0.5) [209].
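To make the estimation procedure concrete, the sketch below computes an approximation of Z(q,a) and τ(q) for a per-second arrival-rate series. It is not the Multifractal Toolbox implementation used in this thesis: it sums the q-th moments of the local maxima of |W(a,t)| at each scale instead of tracking full WTMM maxima lines, and the wavelet, scales and q range are illustrative choices.

```python
# Simplified partition-function analysis in the spirit of Eq. (4.1).
import numpy as np
import pywt
from scipy.signal import argrelextrema

def partition_function(rate, scales, q_values, wavelet="mexh"):
    coeffs, _ = pywt.cwt(rate, scales, wavelet)      # shape: (len(scales), len(rate))
    Z = np.zeros((len(q_values), len(scales)))
    for i, a in enumerate(scales):
        mags = np.abs(coeffs[i])
        idx = argrelextrema(mags, np.greater)[0]     # local modulus maxima at scale a
        maxima = mags[idx] if len(idx) else mags     # fall back to all coefficients
        for j, q in enumerate(q_values):
            Z[j, i] = np.sum(maxima ** q)
    return Z

def scaling_exponents(Z, scales):
    # tau(q) is the slope of log2 Z(q,a) versus log2 a over the fitted scale range.
    log_a = np.log2(scales)
    return np.array([np.polyfit(log_a, np.log2(Zq + 1e-12), 1)[0] for Zq in Z])

# Illustrative usage: q_values = np.arange(0.5, 4.25, 0.5), scales = 2 ** np.arange(1, 10)
```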

Fig. 4.3 shows the log-log plot of Z(q,a) as a function of different timescales a for the Wikipedia workload shown in Section 4.2. In this plot, the slope of the partition function Z(q,a) changes with the timescale a, indicating the presence of multiple scaling exponents in the timeseries. Moreover, the slopes at the fine timescale are relatively lower than those at the coarse timescale; it implies that the scaling exponents (and Holder exponents h) are lower at the fine-scale region, thereby suggesting high variability in the request arrival rate. The non-linear scaling spectrum τ(q) versus q in Fig. 4.4 shows evidence for the multifractal behavior of the request arrival process at the fine timescale, which is also congruent with previous analyses [179, 181]. For the computations performed in this section, we acknowledge the use of the Multifractal Toolbox [133].

Our modeling technique relies on the local scaling property of the multifractal process; fine-scale burstiness is synthesized by exploiting the regularity structure of the arrival process, as directed by time-varying Holder exponents.


Figure 4.3: Partition function, Z(q,a) vs. Timescale, a


Figure 4.4: Scaling exponent, τ(q) vs. q


4.5 Modeling methodology

This section presents our approach to modeling workloads with fine-scale fluctuations. We assume that a workload consists of two components [179]: a deterministic trend governed by the asymptotic self-similarity parameter H, and highly variable noisy oscillations governed by the Holder function h(t) and standard deviation σ. We characterize how the deterministic trend and the noisy oscillations evolve over time with a shape function and a regularity function, respectively. Later, workload prototypes are reproduced using these template functions. The standard deviation σ controls the magnitude of fine-scale burstiness; the larger the σ, the burstier the workload prototype is at the fine scale.
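In compact form, the assumed decomposition can be written as follows (the notation mirrors Algorithm 4.1 later in this section; the displayed equation is our own restatement rather than a numbered equation of the thesis):

% Workload = deterministic trend + multifractional Gaussian noise (mGn)
% X(t): trend from the shape function; G_{h(t),sigma}(t): fine-scale burstiness
\lambda(t) \;=\; X(t) \;+\; G_{h(t),\,\sigma}(t)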

The following steps illustrate our methodology in detail.

4.5.1 Step 1: Characterization of pointwise regularity

The notion of “regularity” is used to measure the degree of randomness in a finite sequence [199]. A quantitative understanding of the regularity of a process can be obtained by estimating the Holder exponent h at each point. A function f(t) is Holderian with an exponent h ∈ (0,1) at point t if there exists a constant c such that the following condition holds for all t′ in the neighborhood of t [38]:

$|f(t) - f(t')| \le c\,|t - t'|^{h}$    (4.2)

One way to estimate regularity is to apply the Oscillation method, which determines the degree of abrupt fluctuation at each point with respect to its neighborhood [224]. This method restates the above condition as follows:

$\exists c,\ \forall \tau :\ \mathrm{osc}_{\tau}(t) \le c\,\tau^{h}$    (4.3)

where $\mathrm{osc}_{\tau}(t)$ is defined as

$\mathrm{osc}_{\tau}(t) = \sup_{|t - t'| \le \tau} f(t') - \inf_{|t - t'| \le \tau} f(t') = \sup_{t', t'' \in [t - \tau,\, t + \tau]} |f(t') - f(t'')|$    (4.4)

The exponent h is then estimated as the slope of the regression between the logarithm of the oscillations $\mathrm{osc}_{\tau}(t)$ and the logarithm of the size τ of the neighborhood in which the oscillations are computed. The higher the value of the Holder exponent h, the more regular or smoother the behavior is at that point t. In practice, self-similar or monofractal functions have the same Holder exponent for all t. In contrast, multifractal functions have a range of Holder exponents which vary over the values of t.

We characterize regularity as follows: first, the relative noise ε(t) is extracted from the request arrival rate R(t):

$\varepsilon(t) = R(t) - R(t-1)$    (4.5)

Next, the Holder exponents h(t) are estimated from ε(t) using an implementation of the Oscillation method discussed in [52]. The implementation makes use of several parameters, rmin, rmax and base, which define neighborhoods of different sizes around t. The maximum absolute difference in the oscillations in the neighborhood [t − base^r, t + base^r] is computed for all integer values of r ∈ [rmin, rmax]. Afterwards, the Holder exponent h at that point is estimated from the slope of the linear regression of the logarithm of the maximum absolute differences in the oscillations against the logarithm of the neighborhood sizes. This is repeated for all timepoints t to characterize the underlying regularity structure h(t) of the noisy oscillations.
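The following Python sketch illustrates the Oscillation method under the parameterization just described (base, rmin, rmax). It is not the implementation of [52] used in the thesis; it only shows the idea of regressing log-oscillation against log-neighborhood size.

import numpy as np

def holder_exponents(noise, base=2.1, rmin=1, rmax=2):
    noise = np.asarray(noise, dtype=float)
    n = len(noise)
    radii = [int(round(base ** r)) for r in range(rmin, rmax + 1)]   # neighborhood sizes base^r
    h = np.empty(n)
    for t in range(n):
        oscs = []
        for rad in radii:
            lo, hi = max(0, t - rad), min(n, t + rad + 1)
            window = noise[lo:hi]
            oscs.append(window.max() - window.min())    # oscillation within [t - base^r, t + base^r]
        # slope of log(oscillation) vs. log(neighborhood size) ~ pointwise Holder exponent h(t)
        h[t] = np.polyfit(np.log(radii), np.log(np.maximum(oscs, 1e-12)), 1)[0]
    return h

# e.g. eps = np.diff(R)              # relative noise, Eq. 4.5
#      h_t = holder_exponents(eps)   # regularity structure of the noisy oscillations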

4.5.2 Step 2: Synthesis of fine-scale burstiness

In this step, we reproduce the fine-scale burstiness of the original timeseries based on multifractional Gaussian noise (mGn), specified completely in terms of the Holder function h(t) and standard deviation σ. Before diving into the details, let us review some basic concepts of mGn.

A fractional Gaussian noise G = (G_t : t = 1, 2, ...) is a zero-mean stationary Gaussian process with two parameters: a constant Holder exponent h ∈ (0,1) and variance σ² = var(G_t). The distribution of the fGn is characterized by its auto-covariances at intervals τ ∈ ℤ between discrete timepoints G_t [56]:

$\mathrm{cov}(\tau) = \frac{\sigma^{2}}{2}\left( |\tau + 1|^{2h} - 2|\tau|^{2h} + |\tau - 1|^{2h} \right)$    (4.6)

As we can see from Eq. 4.6, the distribution of the fGn exhibits long-term correlation between discrete timepoints G_t, uniquely determined by the Holder exponent h. fGn is considered the increment process of the self-similar timeseries fractional Brownian motion (fBm). However, the constant h in its autocovariance structure implies a uniform nature of the oscillations, which poses a disadvantage in modeling the erratically changing oscillatory behavior of real-world phenomena. To address this limitation, multifractional Gaussian noise (mGn) has been introduced, where the constant Holder exponent h is generalized to a time-varying Holder function h(t) [198]. The distribution of the mGn is, therefore, non-stationary and varies with time.

Next, we turn to the synthesis of fine-scale burstiness. Suppose our goal is to generate a workload with N request rates. We estimate the Holder exponents for these N points by carrying out cubic spline interpolation on the given Holder function. We then sample each timepoint-specific burstiness value from an mGn process parameterized with h(n) and σ.

4.5.3 Step 3: Trend construction and superposition

The next step is to construct the deterministic trend from a normalized shape function using rescaling and spline interpolation; alternatively, this step can be substituted by existing curve-fitting and statistical modeling techniques for coarse-scale workloads, e.g., [62, 139, 124]. Note that we derive the shape function by applying min-max scaling to the average arrival rates at some coarse timescale a. The final step combines the fine-scale burstiness with the deterministic trend using superposition (i.e., the point-by-point addition of two timeseries).

Algorithm 4.1 elaborates Step 2 and Step 3. It generates a workload of sample size N, for a given standard deviation σ, that resembles the underlying regularity structure of the trace at the fine timescale.

Algorithm 4.1 Synthesis of workload with fine-scale burstiness

Input:  N ▷ Sample size;  h_t ▷ Holder function from trace;  σ ▷ Standard deviation;
        f ▷ Shape function;  b ▷ Lower bound;  u ▷ Upper bound
Output: λ ▷ Workload with fine-scale burstiness
Uses:   getRandomSeed() ▷ generates a random seed
        linspace(a, b, k) ▷ divides [a, b] into k linearly spaced points
        interpolateHolderExponents(h, v) ▷ interpolates Holder exponents
        fBm(k, i, h, σ, seed) ▷ generates an fBm of sample size k and returns the ith value
        rescaleAndInterpolate(f, k, b, u) ▷ rescales and interpolates f within the bounds [b, u]

1: seed ← getRandomSeed()
2: j ← linspace(0, 1, N)
3: h_n ← interpolateHolderExponents(h_t, j)
4: mGn(0) ← fBm(N, 0, h_n(0), σ, seed)
5: for i ← 1 to N − 1 do
6:     mGn(i) ← fBm(N, i, h_n(i), σ, seed) − fBm(N, i − 1, h_n(i), σ, seed)
7: end for
8: X ← rescaleAndInterpolate(f, j, b, u)
9: λ ← X + mGn

At line 2, a row vector j is created with N linearly spaced points in the interval [0,1]. Line 3 performs cubic spline interpolation on h_t to generate Holder exponents corresponding to the N points in vector j. Next, a multifractional Gaussian noise process is synthesized based on the interpolated Holder exponents h_n. At line 6, the noise mGn(i) for the ith point is produced by taking the sampled increment of the fBm parameterized with h_n(i) and σ.

One of the most popular approaches for synthesizing fBm is the Random Midpoint Displacement algorithm [158]. To generate an fBm process Z(t) on an interval [0,T], we start by setting Z(0) = 0 and sampling Z(T) from a Gaussian distribution with mean 0 and variance σ². In the next step, Z(T/2) is constructed as the average of the values at the two endpoints plus an offset, which is a Gaussian random variable with variance σ₁²:

$Z\!\left(\tfrac{T}{2}\right) = \frac{Z(0) + Z(T)}{2} + \mathrm{Gauss}(0, \sigma_{1}^{2})$    (4.7)

where σ₁ is computed as follows:

$\sigma_{1} = \sigma\, 2^{-H} \sqrt{1 - 2^{2H-2}}$    (4.8)

In the following step, the two intervals [0, T/2] and [T/2, T] are further sub-divided and the midpoints are computed as the average of the values at the two endpoints plus a Gaussian random offset whose standard deviation is $2^{-H}$ times the previous level's standard deviation. This process continues recursively until the maximum number of levels is reached.
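As a concrete illustration of the recursion just described, the sketch below generates an fBm path by Random Midpoint Displacement, assuming a constant exponent H and standard deviation σ. It is only a toy reference; the actual computations in this chapter use the Fraclab routines.

import numpy as np

def fbm_midpoint(levels, H, sigma, seed=None):
    # Random Midpoint Displacement: returns 2**levels + 1 samples of an fBm path on [0, T]
    rng = np.random.default_rng(seed)
    n = 2 ** levels
    z = np.zeros(n + 1)
    z[n] = rng.normal(0.0, sigma)                              # Z(0) = 0, Z(T) ~ N(0, sigma^2)
    delta = sigma * 2 ** (-H) * np.sqrt(1 - 2 ** (2 * H - 2))  # first offset std, Eq. 4.8
    step = n
    while step > 1:
        half = step // 2
        for i in range(half, n, step):
            # midpoint = average of the two endpoints plus a Gaussian offset
            z[i] = 0.5 * (z[i - half] + z[i + half]) + rng.normal(0.0, delta)
        delta *= 2 ** (-H)                                     # shrink the offset std by 2^-H per level
        step = half
    return z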

The next step of our algorithm constructs the trend part by rescaling the normalized shape function f to the range [b, u] and then interpolating the datapoints from the rescaled function, as shown in Line 8. Finally, the fine-scale fluctuations and the trend are combined using superposition in Line 9. We acknowledge the use of Fraclab [134] routines for the computations performed in this section.
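Putting the pieces together, the following Python sketch mirrors Algorithm 4.1 end to end. It reuses the fbm_midpoint function from the sketch above and SciPy's cubic spline interpolation; the helper names and the brute-force per-point fBm generation are our own simplifications (the thesis relies on Fraclab), so treat this as illustrative rather than as the reference implementation.

import numpy as np
from scipy.interpolate import CubicSpline

def synthesize_workload(N, h_trace, sigma, shape, lower, upper, seed=0):
    j = np.linspace(0.0, 1.0, N)                                   # line 2
    # line 3: spline-interpolate the trace's Holder function onto the N points
    h_n = np.clip(CubicSpline(np.linspace(0, 1, len(h_trace)), h_trace)(j), 0.01, 0.99)
    levels = int(np.ceil(np.log2(N)))                              # fBm sample size >= N
    mgn = np.empty(N)
    mgn[0] = fbm_midpoint(levels, h_n[0], sigma, seed)[0]          # line 4 (zero by construction)
    for i in range(1, N):
        # line 6: increment of an fBm parameterized with h_n(i) and sigma (same seed throughout);
        # regenerating a full path per point is O(N * 2^levels) and is only meant for illustration
        path = fbm_midpoint(levels, h_n[i], sigma, seed)
        mgn[i] = path[i] - path[i - 1]
    # line 8: rescale the normalized shape function to [lower, upper] and interpolate at j
    trend = lower + (upper - lower) * CubicSpline(np.linspace(0, 1, len(shape)), shape)(j)
    return trend + mgn                                             # line 9: superposition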

4.5.4 Working example

We illustrate our methodology with a 12-hour trace segment of the Wikipedia workload starting from 11:20:00 PM on Oct 1, 2007 and viewed at per-second resolution. We use the Wikipedia traces because these are the most recent publicly available real workload traces for web-scale applications. At first, the regularity exponents are estimated from the noisy oscillations using the method outlined in Section 4.5.1. We keep the neighborhood size small so that the abrupt irregularity in the noisy oscillations does not get smoothed out.

The parameters for estimating the neighborhood are base = 2.1, rmin = 1 and rmax = 2.

Next, Algorithm 4.1 is executed to generate a sample of N = 10800 points whose fine-scale burstiness has standard deviation σ and obeys the regularity structure h(t). It rescales the shape function f to the range [100, 650] and performs spline interpolation on the rescaled function to produce the trend part. It should be noted that the shape function is derived from the min-max scaling (normalization) of the average arrival rates at a = 300 seconds. The final step of the algorithm combines the trend and the fine-scale burstiness to yield the desired workload.
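In terms of the sketches above, this working example corresponds roughly to the following (commented-out) pipeline; the file name is a placeholder and the helper functions are the illustrative ones defined earlier, not the exact tooling used for the thesis experiments.

import numpy as np

# R = np.loadtxt("wikipedia_12h_per_second.txt")                 # 43200 per-second arrival rates
# eps = np.diff(R)                                               # relative noise, Eq. 4.5
# h_t = holder_exponents(eps, base=2.1, rmin=1, rmax=2)          # Step 1
# coarse = R[: len(R) // 300 * 300].reshape(-1, 300).mean(axis=1)        # averages at a = 300 s
# shape = (coarse - coarse.min()) / (coarse.max() - coarse.min())        # min-max scaled shape function
# workload = synthesize_workload(N=10800, h_trace=h_t, sigma=50,
#                                shape=shape, lower=100, upper=650)      # Steps 2 and 3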

Figure 4.5: Holder function, deterministic trend and generated fine-scale bursty prototype (panels: regularity behavior, Holder exponent h vs. time; trend component, arrival rate X vs. time; workload with fine-scale burstiness, σ = 50, arrival rate λ vs. time)

The Holder exponents h(n), the trend X(n) and the resulting workload λ(n) are shown in Fig. 4.5. The Holder exponents vary within the range [0.065451, 0.23649]. The fluctuation band around the trend curve becomes thick because of the negatively correlated noisy oscillations [h(n) < 0.5]. The dips of h(n) are assumed to bear high risk as they induce abrupt irregularity in the arrival rates. Also, the declining tendency of the dips with regard to amplitude suggests an increased degree of fluctuation at higher amplitudes (so events of very high arrivals will be followed by events of very low arrivals). The correlation coefficient between amplitude and regularity for this workload prototype is −0.617, which is close to that of the original workload (−0.659). The prototype also preserves the time-dependent stylized facts of the original arrival process (as the fine-scale noises are generated using the regularity function of the actual workload). It is important to ascertain the cloud application's elastic response to the abrupt change in the fluctuation band. It is also necessary to understand the effect of fine-scale burstiness on adaptive resource scaling: when its presence can be abstracted out and when it matters.

Figure 4.6: Workload reproduced using the LIMBO toolkit (panels: request arrival rate vs. time; regularity behavior, Holder exponent h vs. time)

4.5.5 Comparison with other methods

In this section, we demonstrate with an example how existing approaches fail to reproduce the empirical stylized facts in the generated workload. Fig. 4.6 shows a realization of the Wikipedia workload generated using the LIMBO toolkit [124, 146]. Looking at this figure, we observe that this prototype workload can mimic the coarse-scale structure of the actual workload. However, the Holder function estimated from this prototype does not resemble the regularity behavior of the original arrival process (compare with Fig. 4.5). The correlation between amplitude and regularity for this prototype workload is −0.52, which is a significantly weaker correlation than what is observed in the actual workload (−0.659). The prototype generated by our workload model, in contrast, closely follows the time-dependent regularity structure and approximately preserves the correlation between amplitude and regularity of the original arrival process (−0.617). We have already discussed in Section 4.2 how different assumptions about fine-scale burstiness result in different elasticity behavior. Therefore, it is vitally important to evaluate adaptive cloud systems in response to realistic bursty workloads to ensure effective elasticity in production.

4.6 Experimental setup

In this chapter, we also present a case study that reports on the elasticity implications of fine-scale burstiness for an e-commerce application, namely TPC-W, which emulates the complex user interactions of a Business-to-Consumer (B2C) website [117]. We host the online bookshop implementation of the TPC-W application on a server farm of EC2 m1.small instances with 32-bit architecture. This instance type enables the consumer's application to scale in small, low-cost increments with the varying workload intensity; for this reason, it is frequently used as the basic building block for autoscaling the server farm [39]. An EC2 m1.small instance provides the equivalent CPU capacity of a 1.0 to 1.2 GHz 2007 Opteron or 2007 Xeon processor, 1.7 GB of memory and 160 GB of local instance storage [25]. We host both the web server and the application server on the same EC2 instance; the web server directs all dynamic requests to the application server, therefore the utilization of the EC2 instance depends mostly on the service rate of the application server. Instead of keeping the images on the same EC2 instance, we deploy them in the Simple Storage Service (S3) [5]. In this experiment, our prime concern is the elasticity implications of the application server in response to fine-scale burstiness; hence, we host the back-end database on an m1.xlarge Relational Database Service (RDS) [4] MySQL instance so that it does not become a bottleneck tier and its influence on the end-to-end performance is negligible.

The web server farm resides behind an internet-facing load balancer [6]. The number of EC2 instances in the server farm is dynamically adapted in response to the workload demand using the Autoscaling API. All experiments start with only one EC2 instance in the server farm; the capacity of the server farm is then adapted in response to the workload intensity based on the following ruleset: increase the server farm capacity by one instance when the average CPU utilization goes beyond 70% for 3 consecutive minutes, and decrease the capacity by one instance when the average CPU utilization goes below 20% for 10 consecutive minutes. The maximum number of EC2 instances used at the peak varied between 3 and 4 across different runs of the experiment because of the performance variability and heterogeneity issues in the cloud. We host the server farm in a single availability zone for these experiments.
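For clarity, the ruleset amounts to the following toy control loop. This is a simplified sketch, not the AWS Auto Scaling service itself; the thresholds follow the ruleset above, while the max_instances cap is an illustrative bound reflecting the peaks of 3 to 4 instances observed in the experiments, not a configured limit.

def simulate_ruleset(avg_cpu_per_minute, out_thresh=70, out_mins=3,
                     in_thresh=20, in_mins=10, max_instances=4):
    # avg_cpu_per_minute: one farm-wide average CPU utilization sample per minute
    instances, high, low, history = 1, 0, 0, []
    for cpu in avg_cpu_per_minute:
        high = high + 1 if cpu > out_thresh else 0        # consecutive minutes above 70%
        low = low + 1 if cpu < in_thresh else 0           # consecutive minutes below 20%
        if high >= out_mins and instances < max_instances:
            instances, high = instances + 1, 0            # scale out by one instance
        elif low >= in_mins and instances > 1:
            instances, low = instances - 1, 0             # scale in by one instance
        history.append(instances)
    return history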

Instead of using the TPC-W workload generator, we used JMeter (jmeter.apache.org) to specify our predefined workload pattern. JMeter provides a convenient interface for generating an open workload and varying the number of concurrent Requests Per Second (RPS) over time using an XML script. A user session is defined in terms of browsing-specific requests: each user lands on the homepage, then searches for new books in a random genre and then browses the details of a randomly selected item.
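The sketch below conveys the spirit of this open-workload setup in Python rather than JMeter: each second it starts the number of sessions dictated by the workload vector without waiting for earlier responses. The base URL and request paths are placeholders, and treating one vector entry as the number of sessions started per second is a simplifying assumption.

import threading, time, urllib.request

SESSION_PATHS = ["/tpcw/home", "/tpcw/new_products", "/tpcw/product_detail"]   # placeholder paths

def run_session(base_url):
    for path in SESSION_PATHS:                 # home -> new books in a genre -> item detail
        try:
            urllib.request.urlopen(base_url + path, timeout=10).read()
        except Exception:
            pass                               # dropped or timed-out requests feed the SLO penalty

def drive(base_url, sessions_per_second):
    for rate in sessions_per_second:           # one entry per second of the workload prototype
        for _ in range(int(rate)):
            threading.Thread(target=run_session, args=(base_url,), daemon=True).start()
        time.sleep(1)                          # coarse pacing; a real driver would correct for drift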

To compute the penalties, we adhere to the concrete SLO choices made by the executable benchmark in Chapter 3. We expect that at least 95% of all requests will see a response within 2 seconds and that there will not be any service disruption. Otherwise, a financial penalty of 12.5¢ applies for each additional 1% of requests for which the 2-second threshold is breached. Furthermore, for each 1% of dropped requests, a financial penalty of 10¢ is charged. The SLO evaluation period is considered to be an hour. The price of an m1.small instance is 4.4¢ per instance-hour (according to the revised on-demand pricing model in 2015), which is used to compute the over-provisioning penalty for the CPU resource.
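A minimal sketch of an hourly penalty computation implied by these choices is given below. It is only indicative: the full penalty-based measurement model is defined in Chapter 3, and the idle_instance_hours argument is a hypothetical stand-in for the over-provisioned capacity that the model actually derives from the demand and supply curves.

def hourly_penalty(pct_within_2s, pct_dropped, idle_instance_hours,
                   slo_pct=95.0, slow_rate=12.5, drop_rate=10.0, instance_price=4.4):
    # all rates in cents; percentages in [0, 100]
    p_under = max(0.0, slo_pct - pct_within_2s) * slow_rate + pct_dropped * drop_rate
    p_over = idle_instance_hours * instance_price        # unused m1.small capacity in this hour
    return p_over, p_under, p_over + p_under

# e.g. hourly_penalty(pct_within_2s=90.7, pct_dropped=1.0, idle_instance_hours=1.5)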

4.7 Case study

In this section, we report our findings on the impact of fine-scale burstiness on the elasticity of cloud applications. The purpose of this analysis is two-fold. First, we seek to investigate the impact of fine-scale burstiness on the elasticity penalty by measuring its effect on over-provisioning and under-provisioning with respect to a reference smooth workload (i.e., the workload with no fine-scale burstiness). We also aim to determine how fine-scale burstiness influences adaptive resource scaling. Next, we conduct a sensitivity analysis to explore the relationship between fine-scale burstiness and the elasticity penalty. Table 4.2 shows the results of our case study.

The execution time of each workload is 190 minutes, where the first 5 minutes and the last 5 minutes are warmup and cooldown periods, respectively, and have been omitted from the analysis. The penalty rates reported in this table represent the average of three runs of a workload. The elasticity behaviors of all fine-scale bursty workloads can be found in Appendix C.

Table 4.2: Elasticity penalty for fine-scale burstiness

Workload              Pover/hr   Punder/hr   Ptotal/hr
smooth   (σ = 0)        8.70¢      2.04¢      10.74¢
sigma50  (σ = 50)      10.24¢      8.87¢      19.11¢
sigma100 (σ = 100)     10.20¢     10.21¢      20.41¢
sigma150 (σ = 150)     10.08¢     17.08¢      27.16¢
sigma200 (σ = 200)      9.77¢     14.04¢      23.81¢
sigma250 (σ = 250)     10.62¢     23.59¢      34.21¢

4.7.1 Effect of fine-scale burstiness on elasticity

Fig. 4.7 depicts the elasticity behavior of a workload with fine-scale burstiness (sigma150) compared to the smooth workload. The topmost plot shows the CPU demand and charged supply over time. Note that the scale of the time axis is in minutes; we could not sample data at the fine granularity of seconds due to the limitation of the CloudWatch API [8]. The sigma150 workload's CPU demand exhibits high variability; it is lightly loaded most of the time, except during abrupt surges. In contrast, the smooth workload's CPU usage varies steadily and its average CPU consumption is also higher. For this reason, we observe a lower over-provisioning penalty for the smooth workload.

The empirical probability density of the average CPU utilization for both workloads is shown in Fig. 4.8. If the request arrival process does not exhibit fine-scale burstiness, then there is a large mass clustered around 55% to 70% in the distribution of CPU utilizations. In contrast, the distribution of CPU utilizations appears extreme for sigma150; the distribution is bimodal, with one mode centering around 30% and the other at 100%. Therefore, it is apparent that the EC2 server farm spent an increased fraction of time in the over-provisioned state while serving the sigma150 workload and incurred excess payment for idle resource capacity. The other mode at 100% indicates the fraction of time in the under-provisioned state (waiting for the scale-out rule to trigger as the CPU demand surged abruptly), which is almost 2 times higher than for the smooth workload; during this interval, an increased percentage of requests were served with very high response time and some requests were even dropped or timed out, thus accruing financial penalty because of SLO violations. Therefore, we see that fine-scale burstiness significantly deteriorates the elasticity of cloud-hosted applications. An adaptive scaling strategy that is optimized and evaluated under a non-bursty assumption fails to guarantee effective elasticity when the actual workload has fine-scale burstiness.

Figure 4.7: EC2 platform's elasticity behavior for non-bursty and bursty workloads (smooth vs. sigma150); panels per workload: CPU demand and supply, 95th percentile response time, request counts (served/rejected), ELB maximum queue length, and penalties for over-provisioning and under-provisioning, over time (in minutes)

Figure 4.8: Estimated probability of average CPU utilization ((a) smooth workload, (b) sigma150 workload)

Next, we focus on the positioning of the scale-out points in the CPU graph (see Fig. 4.7). The first scale-out points for both workloads are relatively close (only 3 minutes apart); however, the second scale-out points are quite far apart (28 minutes apart). After closely inspecting the Holder exponents and the deterministic trend in Fig. 4.5, we infer that fine-scale burstiness influences adaptive resource scaling. Initially, the deterministic trend had steep growth compared to the fine-scale fluctuations (up to the 1640th second) and became the dominant factor in adaptive scaling. This is why the first scale-out for both workloads occurred at almost the same time. Afterwards, the Holder exponents started to decline and caused increased variability in the request arrival rate. The application server served the fluctuating arrivals with a small thread pool (363 threads) compared to that of the smooth workload (644 threads). Also, the concurrency level of the active pool got balanced by alternating episodes of high and low arrival rates (see Fig. 4.9; note that the time axis is in seconds and that these data were collected using the Command-line JMX Client, crawler.archive.org/cmdline-jmxclient/). This balancing-out effect reduced the CPU demand to some extent and delayed the provisioning of the third EC2 instance for the sigma150 workload.

We now turn our attention to the under-provisioning scenario. Workloads with fine-scale burstiness lead to a higher under-provisioning penalty.

Figure 4.9: Behavior of the Tomcat application server under non-bursty and bursty workloads (smooth vs. sigma150); panels: thread pool size, active pool size and queue size over time (in seconds)

The reason can be explained with respect to the abrupt surge of correlated request arrivals and its prolonged stay in the saturation regions for the sigma150 workload. This is quite surprising, because the input workload does not have any surge lasting several minutes, only transient fluctuations around the deterministic trend. This phenomenon can be explained with respect to the saturation region near the second scale-out point. A close inspection of the traces and the Holder graph reveals that at around the 2800th second (the 47th minute in the CPU graph of Fig. 4.7), the dips of the Holder exponents hit very low values (around 0.065); this caused high variability in the request arrival rate. This variability provoked the application server to work at its maximum capacity with an abrupt, sharp increase in the thread pool, active pool and queue (see Fig. 4.9). At some point, the application server could not increase its service capacity enough to handle the episodes of very high arrival rate; so only a small fraction of the requests got processed immediately and the rest were kept waiting in the queue. When the queue filled up, the application started rejecting requests with an HTTP 503 error indicating service unavailability. Users whose requests got rejected or timed out in the queue retried at a later timepoint; this is how a huge number of requests piled up during that interval and magnified the under-provisioning scenario. The RequestCount graph in Fig. 4.7(b) supports our conjecture, indicating the presence of an abrupt surge in the application's served throughput. Most of the requests during this interval experienced very high response time because of the queuing effect at the application server and the ELB; the long tails in the 95th percentile response time graph are a direct reflection of this overloaded situation. The fraction of requests that found the service unavailable contributed to the under-provisioning penalty even more. Fig. 4.10 depicts the overall response time percentiles for both bursty and non-bursty workloads; we see that the higher percentiles of sigma150 pass through slower response time values (only 90.7% of requests were served within the 2-second threshold). This indicates that the presence of fine-scale burstiness induces a deleterious effect on the response time of a significant portion of the overall requests. Moreover, the autoscaling ruleset, configured at coarse-scale granularity (in the order of minutes), could not ensure effective elasticity in response to fine-scale bursty arrivals. This phenomenon suggests the need for fine-grained monitoring metrics and more agile adaptive policies to guarantee effective elasticity under fine-scale burstiness, such as [122, 214, 195]. Therefore, we conclude that fine-scale burstiness has significant implications for under-provisioning; it may give rise to sudden large correlated surges that in turn cause long response time tails and a high request defection rate, if not properly handled.

Figure 4.10: Response time percentiles (smooth, sigma150 and sigma250 workloads)

Random variability

A few remarks on the performance variability effect. We observed the elasticity behavior for each workload under three different randomized experimental conditions by varying parameters such as the time of the day, the availability zone, etc. We noticed extreme CPU demand patterns for the fine-scale bursty workloads in all experimental conditions. The under-provisioning penalty deteriorated for the fine-scale bursty workloads in all runs; however, the extent of degradation varied with the scale-out delay of the EC2 platform. Because of their highly volatile demand pattern and abruptly correlated request arrivals in the resource saturation region, the elasticity of the fine-scale bursty workloads exhibits increased susceptibility to the random scaling delay of the cloud platform.

4.7.2 Trends in the elasticity penalty rate

Fig. 4.11 illustrates the trends in the elasticity penalty rates as a function of the magnitude of the fine-scale burstiness σ.

The over-provisioning penalty rate remains almost stable, perhaps because of the cheap price and resource heterogeneity of the cloud VMs. However, the under-provisioning penalty rate follows an upward trend. We also observe that the under-provisioning penalty rate stays lower than that of over-provisioning up until σ = 100. After that point, the under-provisioning penalty grows beyond its over-provisioning counterpart and starts dominating the total penalty. Since the over-provisioning penalty rate stays almost constant, the total penalty rate mimics the trend of the under-provisioning penalty curve. Therefore, we conclude that adaptive scaling strategies designed under a non-bursty assumption perform worse in bursty conditions; the degradation in the elasticity penalty is proportional to the magnitude of the fine-scale burstiness in the arrival stream.

Figure 4.11: Trends in elasticity penalty rates (Pover, Punder and Ptotal, in cents per hour, vs. standard deviation σ)

4.7.3 Summary

In this section, we have examined the impact of fine-scale burstiness on the elasticity of the cloud platform. The evidence presented thus far demonstrates the detrimental effect of fine-scale burstiness on the over-provisioning and under-provisioning states of the server farm. Overall, this case study supports the view that appropriate prototyping of the fine-scale bursty workload is crucial to ensure realistic and meaningful elasticity benchmarking results. Our findings from this section are summarized as follows:

• Fine-scale burstiness causes an increased queuing effect and request defection rate, thus degrading the under-provisioning scenario with increased SLO violations. It also causes reduced resource utilization at the application server, thus resulting in a high over-provisioning penalty.

• Both fine-scale burstiness and the deterministic trend influence adaptive resource scaling. Fine-scale burstiness causes an abrupt surge in resource demand near the saturation region which severely overloads the server farm and aggravates the under-provisioning situation.

• The elasticity penalty rate tends to get higher with the increase in fine-scale burstiness. There exists a cut-off threshold beyond which the under-provisioning penalty dominates the total penalty.

4.8 Critical reflection

In this chapter, we have presented a novel workload model to generate representative fine-scale bursty prototypes that preserve the empirical stylized facts of the original arrival process. Through an experimental case study, we have also extracted insights about the impact of fine-scale burstiness on the elasticity perceived by the cloud consumer. There are, however, several risks associated with our work, which are discussed as follows:

• Risk: The workload model needs to be refined further to accommodate extreme fluctuations.

We have demonstrated in Section 4.5.5 that our workload model can closely approximate the empirical stylized facts of fine-scale burstiness in the original arrival process. Nevertheless, an important limitation lies in its inability to model the intensity of an extreme fluctuation (refer to the sudden periodic dips in the original Wikipedia workload). A close inspection of the prototype generated by our model reveals the presence of an abrupt fluctuation at the same relative position; however, its magnitude is not large enough to authentically represent those sudden dips in the original workload. This problem usually occurs when the extreme fluctuation is lonely, that is, it does not have similar-type neighbors surrounding it. We consider it a minor issue for several reasons: first, these extreme fluctuations seem to be an artefact of the Wikipedia workload we used (note that these extreme dips occur at regular intervals and no extreme spikes were found in that workload). We have not found any other workload (e.g., FIFA World Cup 1998) with such extreme and lonely fluctuations. Moreover, a single extreme fluctuation, especially if it is a dip, is not expected to have much influence on the elasticity behavior of the platform. Nevertheless, we will explore other workload traces to evaluate the effectiveness of our workload model. If these extreme fluctuations appear to be a typical aspect of workloads, we will refine our model in future.

• Risk: The generation of realistic prototypes includes a number of steps, some of which may seem rather complex.

Our workload modeling method includes some complex steps, for instance, the identification of pointwise Holder exponents and the generation of fractional Brownian motion (fBm). Although we have described each step in detail with a working example (Section 4.5.4), cloud consumers may still consider its implementation an additional overhead. In future, we plan to provide a user-friendly tool to simplify the prototype generation process at the cloud consumer's end.

• Risk: The results reported in the case study should be interpreted with caution because of the presence of bias in the elasticity responses.

The elasticity of the cloud platform shows some variation across different runs of the same workload because of the presence of nondeterminism (e.g., random scaling delay). Since we conducted the case study with m1.small EC2 instances, a possible interference effect (e.g., cache contention) from a neighboring VM cannot be ruled out either. Further research is, therefore, recommended to explore the elasticity implications of fine-scale burstiness for other instance types (e.g., m1.medium) and cloud services.

Contributions and benefits

The contributions of this chapter are recapitulated as follows:

• Contribution: A novel workload model for generating representative prototypes of the fine-scale bursty workload.

Web and e-commerce workloads are bursty at small timescales. To date, there has been no reliable evidence of a workload model that can reproduce realistic prototypes of the fine-scale bursty workload. The lack of a representative workload model poses a serious obstacle to carrying out realistic elasticity benchmarking in a cost-effective manner.

We fill this gap in the existing literature by proposing a novel workload model so that the intrinsic stylized facts of fine-scale burstiness are appropriately reflected in the reproduced prototype. The intuitive approach relies on the time-dependent regularity structure of the noisy oscillations h(t) and a user-defined standard deviation σ; both of these can be easily derived from the workload trace. Statistical validation confirms the efficacy of our model in resembling fine-scale burstiness with a good deal of accuracy. Overall, our model resolves an important problem in the area of workload modeling. It also proves to be particularly valuable for realistic and cost-effective benchmarking, especially of cloud platforms.

There are numerous avenues that can be explored in the future, such as (1) enhancing the workload model to accommodate extreme fluctuations, (2) validating its efficacy in modeling a wide variety of web and e-commerce workloads, and (3) developing a user-friendly prototype generation tool for the cloud consumer.

• Contribution: Increased awareness about the impact of fine-scale burstiness on elasticity.

• Contribution: A thorough empirical analysis on the interaction of fine-scale burstiness and elasticity.

Up to now, no detailed investigation has been carried out to understand the association between fine-scale burstiness and the elasticity of the cloud platform. Without a sound understanding of the workload characteristics, it is very difficult to design effective adaptive provisioning mechanisms for cloud-hosted applications. In this regard, our case study makes several noteworthy contributions. First, it demonstrates the concerning effect of fine-scale burstiness on elasticity, thereby raising awareness of the need to consider realistic workload prototypes during elasticity evaluation and benchmarking. Second, it points out that the current granularity (i.e., coarse-scale, in the order of minutes) of the monitoring and scaling APIs is not adequate to achieve good elasticity under fine-scale bursty workload conditions; it therefore leaves a message for cloud service providers to offer fine-grained monitoring and scaling to cloud consumers. Based on the fine-grained information, cloud consumers may design online prediction-based adaptive policies to ensure effective elasticity for their fine-scale bursty workloads.

In terms of directions for future research, further investigation could be carried out to generalize our hypotheses to a wide variety of instance types and cloud services. Another possible extension might be the development of online prediction-based adaptive mechanisms to ensure effective elasticity under fine-scale bursty workloads.

Additional benefits of the work presented in this chapter are as follows:

• Benefit: The workload model fosters realistic and cost-effective elasticity benchmarking for cloud platforms.

Realistic elasticity evaluation and benchmarking require representative prototypes of actual workloads. Our workload model has demonstrated its effectiveness in modeling real-world fine-scale bursty workloads, thereby promoting realistic elasticity evaluation and benchmarking with increased efficiency (in terms of time, effort and cloud usage costs).

• Benefit: The workload model can have different uses during adaptive mechanism design and optimization.

Our workload model can be used for the design and optimization of adaptive policies. For instance, it can be used to evaluate the immunity of the adaptive policy in response to fine-scale bursty conditions. It can also be used while tuning the amount of excess resource padding in the adaptive policy for sustaining the expected degree of fine-scale burstiness.

4.9 Conclusion

The analysis of elasticity behavior under representative workloads is an essential step towards effective cloud adoption. Fine-scale burstiness is an inherent feature of the web request arrival process. In this chapter, we have provided a novel methodology to reproduce such burstiness so that it resembles the empirical stylized facts of the original arrival process. We have exemplified the effectiveness of our methodology through an experimental case study in the AWS EC2 cloud and have extracted valuable insights about its elasticity behavior in response to workloads with fine-scale burstiness. Our findings demonstrate that fine-scale burstiness has significant implications for elasticity; it deteriorates the under-provisioning and over-provisioning scenarios with increased SLO violations and reduced resource utilization, respectively. Therefore, ignoring fine-scale burstiness in the design and analysis of adaptive systems may provide an over-optimistic view of the elasticity behavior, which may turn out to be disastrous in practical scenarios.

This chapter resolves one of the crucial challenges for elasticity benchmarking, that is, workload representation. However, cloud performance variability may remain a threat to the validity of the evaluation results. This is a real risk that deserves appropriate attention to ensure valid and repeatable benchmarking results for cloud platforms. The next chapter recommends a set of rigorous techniques to guarantee valid and repeatable elasticity benchmarking results, even in the presence of the performance unpredictability of cloud environments.

Chapter 5

Validity and repeatability? Tame the unpredictability

“A good decision cannot guarantee a good outcome. All real decisions are made under uncertainty. A decision is therefore a bet, and evaluating it as good or not must depend on the stakes and the odds, not on the outcome.”

Ward Edwards

The runtime nondeterminism of the cloud environment often causes unpredictable variation in the elasticity benchmarking results. This nondeterminism needs to be properly addressed to guarantee valid and repeatable benchmarking results for cloud platforms. Despite the pervasive presence of such nondeterminism in the cloud, there is no explicit guidance in the existing literature for handling it effectively during elasticity benchmarking. In this chapter, we recommend a number of rigorous techniques to ensure repeatable and valid elasticity benchmarking results, even in the presence of the runtime nondeterminism of the cloud environment.


5.1 Introduction

Evaluating the elasticity of a cloud platform is a non-trivial task because of the influence of various environmental variables, such as random delays in resource scaling, interference effects from co-hosted VMs and heterogeneous commodity cloud infrastructures. Unfortunately, the cloud consumer does not have clear visibility and explicit control over these environmental variables; therefore, it is not feasible on her part to guarantee the same environmental state across multiple executions of the benchmarking workload. As a consequence, the elasticity metric fluctuates quite often across different runs of the same workload. Clearly, the benchmarker needs to effectively address such environmental nondeterminism during elasticity evaluation. Failure to do so may have severe ramifications for the repeatability and validity of the elasticity benchmarking results.

There exists a proliferation of elasticity benchmarking frameworks and case studies in the literature. Unfortunately, most of these works reported results from only one execution of the benchmark. Barely any of those works provided systematic guidance for evaluating elasticity in the presence of the cloud environment's nondeterminism. Moreover, none of them reported the variation in the overall elasticity metric. This sort of malpractice poses a significant threat to the validity and repeatability of the elasticity benchmarking results. In this chapter, we point out the inability of the existing approaches to ensure credible conclusions about elasticity in the presence of the environmental nondeterminism of the cloud. This random variation, nevertheless, is not a unique and isolated phenomenon specific to the cloud; traditional computing systems and other fields of science have been dealing with such problems for decades. We adapt a set of solutions from these fields to effectively deal with the runtime nondeterminism, thereby ensuring that the reported elasticity metric is a good reflection of reality.

Example 5.1 Use case: Consider the following scenario. An application provider has to choose the most elastic cloud platform in order to host her e-commerce application, anticipated to grow wildly popular during a product launch event. Instead of carrying out the tedious tasks of setting up a series of experiments and measuring the elasticity behavior of each cloud service, she searches for available benchmark results to accelerate her decision-making process. After searching the web, she gets directed to a benchmarking organization's website which contains a considerable amount of elasticity benchmarking results for a wide variety of cloud platforms. Now the question is whether these benchmarking results can help her make the right procurement decision; in other words, are the published results representative of the cloud platform's elasticity behavior or mostly spurious because of the random bias induced by the cloud environment?

The contributions of this chapter are summarized as follows:

• We provide a brief synopsis of the potential causes of random variation during elasticity evaluation. We also demonstrate that the extent of such random variation is significant; it may present a skewed view or, at worst, an incorrect conclusion about the elasticity of the cloud platform.

• We argue that existing elasticity evaluation methods are unable to ensure repeatable and valid elasticity benchmarking results, based on a literature review of 40 papers. Most of these works seem to be completely oblivious to the non-deterministic effect of the cloud environment and do not specify any means to smooth out the influence of the runtime nondeterminism or to summarize the uncertainty in the reported elasticity metric. For this reason, these approaches are vulnerable to ending up with incorrect conclusions. Failure to address the runtime nondeterminism of the cloud also makes the elasticity results generated by these benchmarks non-repeatable and non-reproducible, thereby posing a threat to the validity of the elasticity scores.

• Motivated by the existing solutions in traditional computing systems and other fields of science, we bring together a set of rigorous techniques to ensure repeatable and valid elasticity benchmarking results, even in the presence of the environmental nondeterminism of the cloud. These techniques incorporate statistical principles in the experiment design and data analysis. The experiment design mainly concentrates on factoring out the random bias by exploiting techniques such as workload suite diversification, setup randomization and careful planning of the SLO thresholds to avoid confounding effects. The data analysis focuses on reporting the uncertainty in the elasticity metric for a given confidence level and comparing the alternatives based on hypothesis testing (a minimal sketch of this style of reporting follows this list). Although not a panacea, adopting our techniques can significantly reduce the variation and ensure repeatable and valid elasticity benchmarking results for cloud platforms.

• Finally, we demonstrate the effectiveness of our solution by evaluating the elasticity of m1.small and m1.medium EC2 instances and comparing the results with those of the prevalent methods. Our case study reveals a serious weakness in the prevalent approaches in terms of yielding valid conclusions about the elasticity of cloud platforms (about 29% incorrect conclusions). Our rigorous solution, on the other hand, proves to be quite effective in providing valid and repeatable elasticity benchmarking results for non-deterministic cloud platforms.
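As an illustration of the style of reporting advocated in the third bullet above, the sketch below (assuming SciPy is available) computes a confidence interval on a penalty-based elasticity metric from repeated runs and compares two platforms with Welch's t-test. The sample values are placeholders, not measurements from this thesis.

import numpy as np
from scipy import stats

def penalty_ci(samples, confidence=0.95):
    samples = np.asarray(samples, dtype=float)
    mean, sem = samples.mean(), stats.sem(samples)           # sample mean and standard error
    half = sem * stats.t.ppf(0.5 + confidence / 2, len(samples) - 1)
    return mean, (mean - half, mean + half)                  # mean with its confidence interval

platform_a = [19.1, 21.3, 18.4, 20.2, 22.0]   # placeholder penalty rates (cents/hour), repeated runs
platform_b = [24.7, 23.1, 26.0, 25.2, 24.1]
print("platform A:", penalty_ci(platform_a))
t_stat, p_value = stats.ttest_ind(platform_a, platform_b, equal_var=False)   # Welch's t-test
print("significant difference" if p_value < 0.05 else "no significant difference at the 5% level")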

The remainder of the chapter is organized as follows. Section 5.2 reveals significant variation in the elasticity benchmarking results due to the non-deterministic factors of the cloud environment and explains its consequences for the overall conclusion. Section 5.3 discusses the potential sources of runtime nondeterminism affecting elasticity evaluation. Section 5.4 demonstrates that prior works do not adequately address nondeterminism in their elasticity evaluation methods. To address this gap, we recommend a set of rigorous techniques to alleviate the impact of the runtime nondeterminism and report the elasticity benchmarking results with confidence in Section 5.5. We also evaluate the effectiveness of our approach as compared to the existing ones based on a case study conducted on the EC2 platform in Section 5.7. Finally, Section 5.8 outlines the potential risks and benefits associated with our recommended solution and Section 5.9 draws conclusions.

5.2 Runtime variability in elasticity behavior

As mentioned, there are many sources of runtime nondeterminism in the cloud that can bias an elasticity evaluation study. For instance, consider a researcher who wants to determine whether a new configuration I is more elastic than the existing one O for a cloud platform C. However, the elasticity measurement experiment of configuration I gets perturbed by severe interference from its neighbor VMs, resulting in a worse elasticity penalty. The researcher, not being aware of this anomaly, decides to discard the new configuration I based on data that underestimate its elasticity. The bias arising from such random events of the system is termed random bias. The presence of such bias in the benchmarking results can make us believe that a configuration is less (or more) elastic than another, even though it is not.

In this section, we show that the effect of random bias on the elasticity metric is significant, that is, it is large enough to yield a flawed conclusion about the elasticity of the cloud platform. To demonstrate this fact, we ran our benchmark once in each of the availability zones us-east-1b and us-east-1c for a sinusoidal workload with a 30-minute period and estimated the elasticity ratio of the m1.medium EC2 instance with respect to the m1.small EC2 instance for the following ruleset: add 1 EC2 instance when the average CPU demand goes beyond 70% for 3 minutes and remove 1 EC2 instance when the average CPU demand goes below 20% for 10 minutes. The elasticity ratios found from the benchmark executions are 1.42 and 0.55.

Example 5.2 The contradiction: Consider the following scenario. Alice and Bob run the same benchmark to compare the elasticity of the EC2 m1.small and m1.medium instances in response to a sine_30 workload. Alice finds an elasticity ratio of 1.42 and concludes that the m1.medium instance yields a 42% degradation in elasticity compared to the m1.small instance. Bob, on the other hand, finds an elasticity ratio of 0.55 and concludes that the m1.medium instance yields a 45% improvement in elasticity over the m1.small instance. The elasticity benchmarking results of Alice and Bob now contradict each other. So the question is: which result is valid and how can we verify it?

Unfortunately, most of the elasticity benchmarking studies today are not aware of these consequences of random bias; therefore they do not evaluate elasticity with the expected level of rigor. The lack of statistical rigor in the current practice also makes it extremely difficult to validate the benchmarking results as well as the overall conclusions.

Note that this random variability in elasticity is not a result of some flaw in our benchmark; instead, it is a ubiquitous phenomenon that should be attributed to the inherent non-determinism of the cloud environment. We now illustrate how the runtime nondeterminism affects the overall elasticity behavior. Fig. 5.1 shows the elasticity behaviors of the m1.small and m1.medium instances in response to the input sinusoidal workload (sine_30). The y-axis here represents the demand and the supply of ECU (Elastic Compute Unit); note that m1.small provides 1 ECU per minute whereas m1.medium provides 2 ECUs per minute. Since our application (TPC-W) is compute-intensive and the sine_30 workload has a sharp growth rate, it may benefit from an instance supplying more ECUs per minute (at least intuitively). In the optimistic case (see Fig. 5.1(a)), the m1.medium instance exhibits improved elasticity behavior due to the “resource reuse” phenomenon (45% improvement compared to m1.small). However, in the pessimistic case (see Fig. 5.1(b)), the m1.medium instance fails to reuse the resources from the previous cycle and therefore yields poor elasticity (42% degradation in elasticity compared to m1.small). In both cases, the elasticity behavior of the m1.small instance remains the same; therefore, the variation in the elasticity metric directly maps to the elastic performance of the m1.medium instance in this example (although we observed significant variation in the elasticity behaviors of both m1.small and m1.medium instances during our benchmarking in EC2). From these two cases, we observe that the demand curve of the m1.medium instance assumed different shapes in different runs, which eventually affected the supply curves too. Therefore, we see that varying an innocuous aspect of the experiment setup (e.g., the availability zone) can severely affect the elasticity benchmarking results. The lack of visibility into the cloud platform's internal environment-specific anomalies poses a serious impediment to ensuring repeatable benchmarking for cloud services. In the next section, we describe possible causes of such variation in the elasticity benchmarking results.

Figure 5.1: Random elasticity behavior of the m1.medium instance (ECU demand and supply over time for the m1.medium and m1.small instances in (a) the optimistic case and (b) the pessimistic case)


5.3 Causes of runtime variability

Suppose there are two cloud platforms A and B, and A's elasticity is superior to B's by x%. Random bias arises if the benchmarking environment favors (or disfavors) one system over another when it should not [190, 189]. In such cases, the derived elasticity metric may convey a skewed view or, at worst, an incorrect conclusion about the elastic ability of the cloud platforms; for instance, the benchmark may end up saying “A is superior to B by (x+y)%” if it unduly favors A by y%, or “B is the same as or superior to A” if it improperly favors B. In order to ensure fair comparison of cloud platforms and guarantee valid conclusions, the benchmarker should take precautions to address the random bias.

According to our literature review in Chapter 2, adaptation speed and precision of resource quantity are typically regarded as the core aspects of elasticity. However, both of these aspects are subject to random variation. As a consequence, the observed elasticity behavior varies randomly in response to the same workload across different benchmark executions. In the following, we describe different sources which are prone to introduce random bias into the elasticity evaluation results.

Figure 5.2: Elasticity affected by random scaling delay (CPU demand and supply over time for two runs, (a) and (b), of the same workload)

• Random scaling delay: Adaptation speed or scaling delay indicates how quickly the resource pool grows or shrinks in response to a fluctuating workload intensity [125]. However, the time it takes to scale a resource pool in the cloud is variable and random [127, 210, 136, 165]; for instance, Schad et al. [210] reported considerable variation in the startup time of an EC2 m1.small instance (139.9% and 43.9% of the mean value for the US and EU regions, respectively). This variation affects elasticity behavior by adding random bias to the total time spent in the under-provisioned and over-provisioned states. The temporal distribution of scaling points is also influenced by the random adaptation delay, which in turn gives rise to different supply curves for different executions of the same workload (Fig. 5.2).

• Performance heterogeneity of the physical infrastructure: Performance variability is very common in today's commodity cloud environments [127, 210, 136]. The performance of the same abstract instance type may vary by up to a factor of 3 to 4 [88, 99]. One important cause of this variability is the heterogeneity of the underlying commodity cloud infrastructure [99]. Datacenters typically contain hardware from multiple generations (CPU architectures, network switches, memory, disks, etc.); old components get replaced with new ones on failure, and new capacity is added for further expansion. Performance heterogeneity of the underlying hardware exerts a random effect on the throughput-serving capacity of the same instance type. Therefore, the number of instances needed to serve the same workload may vary across different benchmark executions, resulting in different demand and supply curves. The precision aspect of elasticity defines how closely the supplied resource quantity can follow the workload demand; runtime variability in the demand and supply curves induces bias in the precision characteristic of the elasticity metric.

• Perturbation effects: The performance of the same abstract instance type in the cloud may exhibit variability due to the interference effect from other tenants hosted on the same physical machine [51, 99, 152]. This situation arises when the workloads of several tenants contend for the same resource type. Sometimes cloud providers over-commit their resources for profitability reasons, which also affects the performance and throughput of the cloud-hosted applications [241, 102]. The perturbation effect varies naturally and unpredictably over time; it may also give rise to random demand and supply curves across different runs of the same workload.

Therefore, we conclude that runtime nondeterminism is a ubiquitous phenomenon for cloud platforms. It is virtually impossible to isolate the benchmarking state from all the different sources of random variation and compute the elasticity metric in a deterministic manner, especially in the context of public cloud services. However, hope is still there! Nondeterminism is also pervasive in traditional computing systems and other fields of science, and several techniques are available in that literature to deal with such issues. Building on the solutions of these fields, we will describe a set of rigorous yet practical benchmarking techniques to ensure proper estimation and reporting of the elasticity metric for cloud platforms. Before describing our approach, however, it is worth reviewing the current state-of-the-art practices for elasticity evaluation.

5.4 Prevalent evaluation methodologies

There is a wide variety of elasticity evaluation methodologies available in the literature. We surveyed 40 papers (published from 2010 onwards) which proposed evaluation techniques or carried out measurements for one or more aspects of elasticity. However, it appears that researchers and practitioners are not completely aware of the presence of runtime nondeterminism and its severity on elasticity evaluation. The only aspect of elasticity that has been rigorously evaluated so far is the resource scaling delay. How runtime nondeterminism influences other elasticity aspects (e.g., precision of the supplied resource, temporal distribution of scaling points) or the overall elasticity metric, nevertheless, has not been addressed in the literature. Moreover, we have not found any specific guidance for addressing the runtime nondeterminism of the cloud environment during elasticity benchmarking.

During our survey, we found 7 papers which proposed conceptual frameworks for benchmarking elasticity but did not present any direction for instantiating executable elasticity benchmarks ([166, 105, 207, 237, 223, 125, 54]). Without any explicit guidance or working example, it is impossible to validate the efficacy of the benchmarking concept for actual cloud platforms.

Among the surveyed papers, 29 evaluated one or more aspects of elasticity ([174, 127, 210, 164, 163, 136, 173, 79, 150, 94, 93, 213, 148, 217, 43, 91, 159, 215, 126, 81, 239, 206, 139, 83, 172, 222, 155, 145, 82]) and 4 reported the elasticity measure without any evaluation ([119, 118, 218, 65]). Table 5.1 summarizes the main findings of our literature review; a detailed comparison of various evaluation methodologies can be found in Appendix D. In this section, we first discuss some general features of the current methodologies to illustrate their positions with respect to nondeterminism. We close this section by spotting some common pitfalls in these methodologies and explaining their repercussions on the unbiased estimation of the elasticity metric.

Table 5.1: State of the art elasticity evaluation methodologies

Total number of papers surveyed                                             40
Evaluated elasticity metric(s)                                              29
Considered measurement bias (for scaling delay only)                         7
Reported uncertainty in the elasticity metric(s) (for scaling delay only)    9
Reported elastic improvement as a single number                              2

5.4.1 General features

5.4.1.1 Repeated measurements

Only 7 out of 40 papers in our survey addressed random bias in scaling delay by taking repeated measurements ([127, 174, 210, 164, 163, 136, 173]). Others, on the contrary, reported elasticity metric(s) from a single execution of the benchmark. We have demonstrated in Section 5.2 that elasticity metric(s) reported from a single execution can easily lead to flawed conclusions.

5.4.1.2 Acknowledgement of runtime nondeterminism

Some researchers explored the influence of runtime nondeterminism on scaling delay (7 papers). Various factors influence scaling delay, e.g., datacenter location, temporal aspects, VM instance type etc.; these papers measured random variation in scaling delay by varying one factor at a time in the experimental setup. The interactions of multiple factors, nevertheless, were not considered in those experiments. Moreover, runtime nondeterminism has implications for other aspects of elasticity too; however, none of the works in our survey evaluated its impact on the overall elasticity metric(s).

5.4.1.3 Reporting uncertainty

The only aspect of elasticity that was evaluated repeatedly and reported with statistical rigor is the scaling delay of the cloud platform. Different research groups reported the uncertainty in scaling delay using different statistics; however, there is not much consensus in their adopted data analysis techniques. During our survey, we found 9 papers which reported the uncertainty in the scaling delay of the cloud platform ([127, 174, 210, 164, 163, 136, 173, 119, 118]). All of those papers reported the uncertainty in the VM startup delay; only 3 of those additionally reported uncertainty in the VM release delay ([127, 174, 173]).

Among these papers, some reported uncertainty using a single statistical measure, while others reported it using multiple statistical numbers. The most popular approaches are the range (3 papers [136, 119, 118]) and the mean and/or median with standard deviation (3 papers [127, 174, 136]). Quartiles, percentiles and the Coefficient of Variation (CoV) were reported in 2 papers [136, 173], 2 papers [164, 163] and 1 paper [210] respectively.

Only a small fraction of the papers (2 out of 40 - [139, 126]) reported the overall elasticity ratio of one platform with respect to another based on a single execution of the benchmark. However, none of those works evaluated the uncertainty in the elasticity ratio.

5.4.2 Common pitfalls

Our literature survey reveals some common pitfalls in the prevalent elasticity evaluation methodologies. These pitfalls affect the fairness and repeatability aspects of elasticity benchmarking. They also pose threats to the validity of the derived elasticity metric(s). We summarize those pitfalls in the following:

5.4.2.1 Uncontrolled random factors

At present, researchers pay little attention to the impact of random environmental factors on elasticity behavior. Most of the papers in our survey reported elasticity metric(s) from a single execution of the benchmark. If elasticity metric(s) are reported from a single benchmark execution, their validity remains questionable. It also puts the repeatability criterion of the benchmark at stake; a different benchmark execution may end up with a different result and contradict the previous conclusion. Sometimes the random bias may be large enough to mask the effect of an innovation; in such situations, it becomes difficult to determine whether an improvement in elasticity results from the optimized system or from artefacts arising from random bias. Failure to consider random environmental factors provides a skewed view and leads to flawed conclusions about the cloud platform's elastic ability.

5.4.2.2 Omitted joint interaction of factors

The elasticity of a cloud platform is the result of the complex interaction among various random environmental factors, such as the location of the datacenter, day of the week and time of the day. The daily and weekly variations in elasticity for one datacenter may not be representative of another datacenter and extrapolating this behavior may lead to flawed conclusions. Some of the surveyed papers repeatedly measured scaling delay in the one-factor-at-a-time experimental setup. This type of experiment design may yield a partial view of elasticity when the joint interaction of factors becomes significant.

5.4.2.3 Lack of rigorous data analysis

The elasticity behavior of the cloud platform is influenced by environmental nondeterminism; consequently, one may observe considerable variation in the reported metric across different executions of the benchmark. However, none of the papers in our survey reported uncertainty in the overall elasticity metric. If uncertainty is not reported, it is extremely difficult to verify the benchmarking results and the inferred conclusions. Because of runtime nondeterminism, different executions of the benchmark may yield different elasticity metrics, and without quantified uncertainty it is extremely difficult to ascertain whether an observed difference in elasticity stems from random bias or from a serious error in the experimental design.

5.5 Rigorous evaluation techniques

In order to ensure valid and repeatable elasticity benchmarking results, we need to adopt a scientific approach that can alleviate the effect of runtime nondeterminism in the collected data and report the elasticity benchmarking results (i.e., the ranking of the compared platforms as well as the percent elasticity difference between them) with a given confidence level. To achieve this goal, we need to incorporate statistical design principles [187, 175, 61, 190, 189] and statistically rigorous data analysis techniques [121, 142, 143, 120] into the elasticity benchmarking methodology.

There are two key steps to any benchmarking methodology: experiment design and data analysis. Experiment design refers to the planning of the experimental setup; it requires a good combination of theoretical understanding and practical experience of the environmental factors of the cloud platform. For instance, if the benchmarking is carried out only during the weekend (i.e., when the cloud has low load), the elasticity results may be a partial representation of the reality. Data analysis, on the other hand, denotes the process of analyzing the collected data with statistical rigor so that the reported metric is a good reflection of the reality and also verifiable when the benchmarking is repeated at a later time under identical conditions.

5.5.1 Experiment design

We have already discussed in Section 5.3 how different sources of runtime nondeterminism exert bias on the elasticity metric. Several key strategies need to be adopted in the experiment design to factor out the random bias from the elasticity measurement data; they are presented as follows.

5.5.1.1 Workload suite diversification

One important strategy to factor out the random bias is the diversification of the benchmarking workload suite. This approach is often followed in Java performance evaluation to avoid random bias [61, 190]. We can assess the diversity of the workload suite by examining the tightness of the probability distribution for the elasticity metric.

[Figure: probability density of the elasticity ratio for (a) the sinusoidal workload alone and (b) both workloads combined.]

Figure 5.3: Diversified workload suite smooths out the random bias to some extent

Typically, the width of the probability distribution indicates the amount of variability in the possible outcomes of the elasticity metric. Thus, a tight distribution reduces the amount of dispersion in the elasticity metric; under this condition, it is more likely that the actual elasticity metric will converge to the expected value. The tightness measure of a discrete distribution is often specified in terms of the standard deviation; it can be computed using the following steps [58]:

• Compute the expected value of the elasticity metric, $\hat{e}$, as follows:

$$\hat{e} = \sum_{i=1}^{n} e_i \Pr(e_i) \qquad (5.1)$$

where $e_i$ is the $i$-th possible outcome of the elasticity metric, $\Pr(e_i)$ is the probability of occurrence of $e_i$, and $n$ is the number of possible outcomes.

• Estimate the standard deviation using the following formula:

$$\hat{\sigma} = \sqrt{\sum_{i=1}^{n} (e_i - \hat{e})^2 \Pr(e_i)} \qquad (5.2)$$

Therefore, a small standard deviation implies that the distribution of the elasticity metric is tight; in other words, it indicates that the workload suite is diverse enough to factor out the random bias. We want to emphasize again at this point that it is not merely the multiplicity of the workload suite that is important, but also the variety within the suite that is crucial to smooth out the random bias.
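For concreteness, the tightness measure can be computed directly from a sample of elasticity scores. The following is a minimal Python sketch (the function name and the illustrative score arrays are assumptions for demonstration only, not benchmark output): it estimates $\Pr(e_i)$ empirically and then applies Eqs. (5.1) and (5.2).

```python
import numpy as np

def tightness(elasticity_scores):
    """Expected value (Eq. 5.1) and standard deviation (Eq. 5.2) of a
    discrete elasticity-metric distribution, with Pr(e_i) estimated
    empirically from the observed scores."""
    outcomes, counts = np.unique(np.round(elasticity_scores, 2), return_counts=True)
    probs = counts / counts.sum()                       # Pr(e_i)
    e_hat = np.sum(outcomes * probs)                    # Eq. (5.1)
    sigma_hat = np.sqrt(np.sum((outcomes - e_hat) ** 2 * probs))  # Eq. (5.2)
    return e_hat, sigma_hat

# Arbitrary illustrative scores: a wide single-workload sample vs. a tighter
# sample from a more diverse suite (not measured values).
wide = np.array([0.55, 0.61, 0.72, 0.90, 1.41])
tight = np.array([0.66, 0.70, 0.72, 0.75, 0.81])
print(tightness(wide), tightness(tight))
```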

Looking at Fig. 5.3, we see that the distribution of the elasticity scores for the sine_30 workload is very wide; the elasticity score varies within the range [0.5461, 1.4190] and the standard deviation is 0.1949. This means that the elasticity of this workload is very sensitive to the runtime variability of the cloud environment. Adding an exponential workload (exp_18_2.25) to the workload suite improves the diversity, as is evident from the tighter distribution of the elasticity scores for both workloads. The elasticity scores for the combined workload suite vary within the range [0.5661, 0.9887] and the standard deviation of the distribution is 0.1009. Therefore, the diversified workload suite improves the tightness measure of the elasticity metric distribution by 48.23% in this example.

Several aspects need to be considered while forming a diversified portfolio of elasticity benchmarking workloads. The most important is to ensure that the workloads stress the elasticity of the cloud platform in different ways. For instance, in the above workload suite, the sine_30 and exp_18_2.25 workloads differ on several points. The sine_30 workload has a smaller period and fast growth and decay rates, which make it well placed to benefit from the resource reuse phenomenon of the cloud pricing model. On the contrary, the exp_18_2.25 workload has an extremely high growth rate but slower decay, and the trough between the two exponential curves prevents resource reuse. The exp_18_2.25 workload also has a longer duration than the sine_30 workload; duration is another crucial aspect that influences the diversity of the workload suite [190].

5.5.1.2 Planning the SLOs

Performance variability is a frequently observed phenomenon in the cloud; it may sometimes degrade the QoS (e.g., response time and throughput) of the applications. If the SLOs are too stringent (i.e., do not have sufficient padding for accommodating random fluctuations), some fraction of requests may breach the threshold because of the interference effect from co-located VMs. Since the benchmarker does not have clear visibility into the cloud environment's internal events, she will not be able to differentiate the causes of SLO violations, that is, whether they are due to resource scaling delay or to artefacts arising from the interference effect. In that case, the effects of scaling delay and interference on the observed SLO violations are said to be confounded. The design of the SLOs needs to incorporate some flexibility so as to avoid confounding and minimize the random bias while measuring the elasticity of the cloud platform.

5.5.1.3 Setup randomization

A number of environmental variables in the cloud influence elasticity; they are inherently random in nature and sometimes beyond the control of the benchmarker. In order to obtain valid results, we need to randomize the experimental setups by varying controllable factors which are likely to cause random bias. Note that a controllable factor is one which can be set by the benchmarker (e.g., datacenter location); in contrast, an uncontrollable factor is beyond the benchmarker's control (e.g., interference effect, underlying hardware). The presence of uncontrollable factors may also introduce bias into the elasticity metric; in such situations, we need to devise a way to minimize the variability transmitted from the uncontrollable factors [187, 142]. For instance, randomizing the experiment schedule to cover both high-load and low-load scenarios (e.g., peak hours and off-peak hours) and using appropriate statistical analysis can guard against lurking effects, such as interference. Once the randomized setup combinations are finalized, we measure elasticity for the platforms of interest $C_i$ in each of these setups, where $C_i$ denotes the $i$-th cloud platform. This process results in a number of distributions corresponding to the elasticity metrics of the $C_i$s. Finally, we use statistical methods to report the uncertainty in the elasticity metric for the compared platforms; these methods will be presented in Section 5.5.2.

When randomizing the experimental setups, we first need to identify potential factors which exert significant influence on elasticity and organize them in a hierarchical structure. This can be represented by a random effect model with n-way classification. This approach turned out to be effective in smoothing out the random bias in the performance measures of Java runtime systems [143]. In this model, the factors are arranged in a hierarchy of n random ways, each of which has an influence on the distribution of actual elasticity measures. For this reason, an n-way classification has n + 1 levels in the experimental design. For instance, we can think of a 3-level experiment design that takes into account 3 factors: availability zone, day of the week and time of the day. Fig. 5.4 shows an example experiment design for 2 availability zones (AZ1 and AZ2), 2 days of the week (D1 and D2) and 2 times of the day (T1 and T2). To find out the factor combinations for the experimental setups, we start from the root and traverse towards a leaf (e.g., AZ1 → D1 → T2); the values of the factors seen in each root-to-leaf traversal give us a unique setup combination (e.g., (AZ1, D1, T2)). The main advantage of a hierarchical design is that it considers the complex interaction among different factors while evaluating the elastic response, which would otherwise be overlooked in a one-factor-at-a-time or single-level experiment design. For instance, the pair (Day, Time) is likely to exert significant joint influence on elasticity; a single-level experiment which varies only one factor at a time (e.g., either day or time) misses the information about the impact of the joint interaction of day and time on elasticity.

[Figure: a three-level tree; the root branches into availability zones AZ1 and AZ2, each zone branches into days D1 and D2, and each day branches into times T1 and T2.]

Figure 5.4: An example 3-level experiment design
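A sketch of how the setup combinations of such a hierarchical design can be enumerated is shown below. The factor names and values mirror Fig. 5.4; the code itself is only an illustration, not part of the benchmarking framework.

```python
from itertools import product

# Controllable factors of the 3-level design in Fig. 5.4.
availability_zones = ["AZ1", "AZ2"]
days = ["D1", "D2"]
times = ["T1", "T2"]

# Each root-to-leaf traversal corresponds to one tuple, e.g. ("AZ1", "D1", "T2").
setups = list(product(availability_zones, days, times))
for setup in setups:
    # measure the elasticity of every compared platform in this setup
    print(setup)
```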

The hierarchical structure provides us with a set of test conditions; we need to carry out the elasticity measurement one or more times in each of these experimental setups. Repeating the experiment in multiple test conditions is known as replication. Repeated measurement, on the contrary, implies conducting the experiment more than once in an identical test condition [175]. Both repeated measurement and replication are useful for neutralizing the bias in the elasticity metric. Increasing the number of repeated measurements and/or replications increases the sample size and thus reduces the standard deviation of the distribution of the elasticity metric.

5.5.1.4 Blocking

When comparing the elasticity of two or more cloud platforms, it is beneficial to adopt blocking to reduce the variability due to heterogeneous test conditions. This design technique is often used in industrial experimentation to ensure homogeneous test conditions [187]. Consider the situation where the hardness of four different tips needs to be evaluated based on the depth of the depression they cause when pressed on a metal test coupon. If the metal coupons are produced under different heat conditions, then their heterogeneity will influence the overall conclusion by inducing variability in the hardness evaluation results. This issue is tackled by taking a hardness reading of all four tips on each of the coupons [187]. A similar strategy can be adopted for elasticity evaluation too.

According to Design of Experiments (DoE), a block refers to a set of homogeneous experimental conditions which are almost identical to each other in all respects [175]. To ensure a fair comparison, we recommend conducting the experiments for the alternative cloud platforms under conditions which are alike in terms of temporal and spatial aspects (at least from the benchmarker's perspective, by ensuring the same combination of controllable factors). For instance, when comparing two adaptive policies in EC2, we can run both experiments in parallel in the same availability zone; in this case, the two adaptive policies are said to be tested in the same block, which is identical in terms of day, time and availability zone. This is the best that a benchmarker can do to ensure homogeneous experimental conditions.

We want to emphasize the fact that cloud benchmarking is an iterative process [105]; therefore, it would not be wise to conduct a massive and comprehensive elasticity benchmarking study across all test conditions for all potential factors. For instance, a 6-level experiment design with 5 possible values at each level yields $5^6 = 15625$ combinations of experimental setups; it is neither feasible nor economically efficient to carry out elasticity benchmarking at such a large scale. Instead, what we can do is conduct the benchmarking periodically at a reasonable scale (e.g., by reducing the number of factors based on an initial screening experiment or by picking a feasible subset of experimental setups), perform analysis on the results, draw conclusions about the compared platforms as well as the influential factors and suggest a course of action for the future benchmarking process. Follow-up benchmarking runs can be performed to validate the previous results further. Applying the strategies mentioned above in the benchmarking design thus helps us spend our budget economically, explore the impact of various controllable factors as well as their complex interactions on elasticity and smooth out the random bias in the elasticity metric.

5.5.2 Data analysis

Adopting the techniques specified in the previous section works as a partial remedy to factor out the random bias from the elasticity metric. Some bias still remains in the measured data because of the benchmarker's lack of clear visibility into, and control over, the cloud services. The presence of such bias may mask the actual mean of the elasticity metric distribution of the benchmarked system, thereby posing a threat to the validity of the inferred conclusions. To resolve this issue, we need to conduct rigorous data analysis and summarize the results with a confidence interval for the mean elasticity metric.

We now demonstrate how to report the uncertainty in the elasticity metric for an n-way classification experiment design based on the work of Kalibera and Jones [143]. This work advocates two approaches to constructing the confidence interval for the benchmarking results: a parametric method based on an asymptotic normality assumption and a non-parametric method based on bootstrapping at different levels of the hierarchical experiment design.

In the hierarchical experiment design, there are $n+1$ levels and each level corresponds to a controllable factor. Suppose the repetition counts of the benchmarking experiment from the lowest level (level 1) to the highest level (level $n+1$) are $r_1, \ldots, r_{n+1}$. In each of the experimental setups, we measure the elasticity behavior of the cloud platform. We express the overall elasticity metric as $E_{j_{n+1} \ldots j_1}$, where the subscript of $E$ denotes the index of the factor levels in the hierarchy (highest to lowest) and $j_i = 1 \ldots r_i$. For instance, $E_{2,1,2}$ refers to the elasticity metric observed for the setup combination (AZ2, D1, T2), i.e., the 2nd availability zone, 1st day and 2nd time.
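As a practical note, one simple way to hold these observations is a mapping keyed by the factor levels; the sketch below is purely illustrative (the key names and the stored value are hypothetical), and a NumPy array indexed by $(j_{n+1}, \ldots, j_1)$ works equally well.

```python
# Observed elasticity metrics keyed by (availability zone, day, time).
measurements = {}

def record(az, day, time, elasticity_ratio):
    """Store one observation E_{j3, j2, j1} for a 3-level design."""
    measurements[(az, day, time)] = elasticity_ratio

# e.g. E_{2,1,2}: 2nd availability zone, 1st day, 2nd time (value is hypothetical)
record("AZ2", "D1", "T2", 0.81)
```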

5.5.2.1 Asymptotic confidence interval for elasticity metric

We report the mean elasticity metric as the arithmetic mean of all elasticity metric observations, denoted $\bar{E}$. However, this may not represent the true mean $\mu$ of the overall population of elasticity metrics because of sampling variability. Different benchmarkers may end up with different samples of elasticity metrics, and therefore the estimated mean may vary from sample to sample. To ensure valid and repeatable benchmarking results, we need to approximate the true mean $\mu$ based on the sample mean $\bar{E}$ and a range of values $[c_l, c_u]$ around $\bar{E}$ denoting the confidence interval at a specific probability or confidence level. The confidence interval $[c_l, c_u]$ is defined such that the probability of $\mu$ lying between $c_l$ and $c_u$ equals $1 - \alpha$; $\alpha$ is known as the significance level and $1 - \alpha$ is the confidence level. For example, a 90% confidence interval implies that there is a 90% probability that the true mean $\mu$ of the actual distribution will lie within the interval $[c_l, c_u]$.

The approach for constructing the confidence interval depends on the number of observations at the top-most level. If the number of observations at the highest level is large (i.e., $r_{n+1} \geq 30$), the confidence interval with significance level $\alpha$ can be estimated as follows:

$$\bar{E} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S_{n+1}^2}{r_{n+1}}} \;=\; \bar{E} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{1}{r_{n+1}(r_{n+1}-1)} \sum_{j_{n+1}=1}^{r_{n+1}} \left( \bar{E}_{j_{n+1}\,\bullet\ldots\bullet} - \bar{E}_{\bullet\,\bullet\ldots\bullet} \right)^2} \qquad (5.3)$$

where $z_{1-\frac{\alpha}{2}}$ is the $1-\frac{\alpha}{2}$ quantile of the standard normal (Z) distribution with mean 0 and variance 1, and $S_{n+1}^2$ is the unbiased variance (i.e., not the sample-specific variance) at the highest level $n+1$.

On the other hand, when the number of observations at the highest level is relatively small (i.e., $r_{n+1} < 30$), the confidence interval with significance level $\alpha$ can be estimated as follows:

$$\bar{E} \pm t_{1-\frac{\alpha}{2},\nu} \sqrt{\frac{S_{n+1}^2}{r_{n+1}}} \;=\; \bar{E} \pm t_{1-\frac{\alpha}{2},\nu} \sqrt{\frac{1}{r_{n+1}(r_{n+1}-1)} \sum_{j_{n+1}=1}^{r_{n+1}} \left( \bar{E}_{j_{n+1}\,\bullet\ldots\bullet} - \bar{E}_{\bullet\,\bullet\ldots\bullet} \right)^2} \qquad (5.4)$$

where $t_{1-\frac{\alpha}{2},\nu}$ is the $1-\frac{\alpha}{2}$ quantile of the two-tailed Student's t distribution for $\nu = r_{n+1} - 1$ degrees of freedom.
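The two intervals above differ only in the quantile used; a minimal Python sketch is given below, assuming the $r_{n+1}$ top-level mean elasticity metrics have already been computed. The function name and the SciPy usage are illustrative, not part of the framework.

```python
import numpy as np
from scipy import stats

def asymptotic_ci(top_level_means, alpha=0.10):
    """Confidence interval for the mean elasticity metric from the
    r_{n+1} top-level means, following Eqs. (5.3)/(5.4)."""
    x = np.asarray(top_level_means, dtype=float)
    r = len(x)
    grand_mean = x.mean()
    std_err = np.sqrt(x.var(ddof=1) / r)          # sqrt(S^2_{n+1} / r_{n+1})
    if r >= 30:
        q = stats.norm.ppf(1 - alpha / 2)         # z quantile, Eq. (5.3)
    else:
        q = stats.t.ppf(1 - alpha / 2, df=r - 1)  # t quantile, Eq. (5.4)
    return grand_mean - q * std_err, grand_mean + q * std_err
```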

5.5.2.2 Bootstrap confidence interval

We now describe how to construct the confidence interval for the mean elasticity metric using non-parametric bootstrapping. Let us assume that the elasticity metrics at the lowest level, that is, $E_{j_{n+1} j_n \ldots j_2 \,\bullet}$ for any fixed $j_{n+1} j_n \ldots j_2$, are independent and identically distributed (i.i.d.) and that the means at the higher levels are also independent. Also, assume that the distribution of the elasticity ratios for each workload in the benchmark suite is i.i.d.

The primary motivation behind bootstrapping is to simulate many experiments based on real experimental measurements and then estimate the confidence interval from the simulated samples. Therefore, instead of getting just one realization of the elasticity metric

E estimated from the actual sample, we can generate many realizations of the elasticity metric using simulation, thereby saving time and cost for experimentation. After that, we can construct the confidence interval for the mean elasticity metric by selecting appropriate quantiles from the simulated samples.

The pseudocode for constructing the bootstrap confidence interval for the elasticity metric is shown in Algorithm 5.1. In each iteration, the algorithm generates a simulated sample of elasticity metrics by carrying out random sampling with replacement on the real sample; it then estimates the mean elasticity metric $\bar{E}$ from the elasticity metrics of the simulated sample. Once all the bootstrap iterations are completed, we end up with a large collection of mean elasticity metrics. For an appropriately chosen $\alpha$, we estimate the lower (probability $\frac{\alpha}{2}$) and upper (probability $1-\frac{\alpha}{2}$) quantiles of the collection of mean elasticity metrics. The resampling method used in the algorithm obeys the hierarchical structure of the actual experiment and performs random sampling at each level. Several variations of the bootstrapping method are possible, for example, random resampling at each level with (or without) replacement, or ignoring the hierarchical structure and sampling at random from all measurements. More information on bootstrapping can be found in [143, 86].
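A compact Python sketch of this hierarchical resampling for the special case of a 3-level design is shown below; the array layout (setups along the first three axes, workloads along the last) and the use of NumPy are assumptions made for illustration, while Algorithm 5.1 gives the general n-level pseudocode.

```python
import numpy as np

def bootstrap_ci(E, n_iterations=1000, alpha=0.10, rng=None):
    """Hierarchical bootstrap CI for the mean elasticity metric.
    E has shape (r3, r2, r1, m): elasticity ratios for r3 x r2 x r1
    replicated setups and m workloads."""
    rng = np.random.default_rng() if rng is None else rng
    r3, r2, r1, m = E.shape
    simulated_means = np.empty(n_iterations)
    for it in range(n_iterations):
        measures = []
        for j3 in rng.integers(0, r3, size=r3):        # resample level 3 with replacement
            for j2 in rng.integers(0, r2, size=r2):    # resample level 2 with replacement
                for _ in range(r1):                    # lowest level
                    ks = rng.integers(0, r1, size=m)   # resample lowest index per workload
                    vals = E[j3, j2, ks, np.arange(m)]
                    measures.append(vals.prod() ** (1.0 / m))  # geometric mean over workloads
        simulated_means[it] = np.mean(measures)
    return (np.quantile(simulated_means, alpha / 2),
            np.quantile(simulated_means, 1 - alpha / 2))
```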

5.5.2.3 Comparing alternatives

So far, we have discussed different approaches to computing the confidence interval for the mean elasticity metric. However, there are many practical scenarios where the cloud consumer wants to rank different platforms based on their elasticity behaviors. In this section, we demonstrate how to compare two alternatives based on the elasticity metric.

The first step for comparing two alternatives is to find out some way that helps us decide whether to choose the null hypothesis or the alternative hypothesis:

$$H_0 : g_1 = g_2 \;\equiv\; E = \frac{g_2}{g_1} = 1$$
$$H_a : g_1 \neq g_2 \;\equiv\; E \neq 1$$

where $g_i$ stands for the geometric mean of the penalty rates for the $i$-th cloud platform.

However, due to the presence of nondeterminism, we will not be able to find out the true elasticity metric for the overall population of the compared platforms.

Algorithm 5.1 Bootstrap confidence interval for elasticity metric

Input:
    m ▷ number of workloads
    (w_1, ..., w_m) ▷ workload suite
    n ▷ number of ways in an n-way classification
    (r_1, ..., r_{n+1}) ▷ repetition counts at each level
    E_{j_{n+1}...j_1}[w], for all w and 1 ≤ j_i ≤ r_i ▷ observed elasticity ratio for workload w
    nIterations = 1000 ▷ number of iterations
    α = 0.10 ▷ significance level
Output:
    c_l ▷ lower limit of the confidence interval
    c_u ▷ upper limit of the confidence interval
Uses:
    mean(x) ▷ computes the arithmetic average
    quantile(probability, x) ▷ selects the sample quantile
    resample(replacement, x) ▷ performs random sampling
    sqrt(q, x) ▷ computes the q-th root of x

    simulatedMeans ← new vector[nIterations]
    for iteration ← 1 to nIterations do
        simulatedMeasures ← new vector[r_{n+1} · ... · r_1]
        index ← 1
        for each j_{n+1} ∈ resample(TRUE, 1...r_{n+1}) do
            ...
            for each j_2 ∈ resample(TRUE, 1...r_2) do
                for j_1 ← 1 to r_1 do
                    metric ← 1
                    for each w ∈ (w_1, ..., w_m) do
                        k ← resample(TRUE, 1...r_1)
                        metric ← metric × E_{j_{n+1} j_n ... j_2 k}[w]
                    end for
                    simulatedMeasures[index] ← sqrt(m, metric)
                    index ← index + 1
                end for
            end for
            ...
        end for
        simulatedMeans[iteration] ← mean(simulatedMeasures)
    end for
    c_l ← quantile(α/2, simulatedMeans)
    c_u ← quantile(1 − α/2, simulatedMeans)

Instead, what we can do is compute the confidence interval for the elasticity metric from the sample. Suppose the confidence interval is $[c_l, c_u]$ at a confidence level of $(1-\alpha)$. If this confidence interval includes 1 (i.e., $1 \in [c_l, c_u]$), we can conclude with confidence level $(1-\alpha)$ that there is no statistically significant difference between the two alternatives, and we will not reject $H_0$. Otherwise, we will reject $H_0$ and give a verdict in favor of the alternative hypothesis $H_a$ at confidence level $(1-\alpha)$, suggesting that the two platforms are different. Note that although we reject the null hypothesis, there would still be a non-zero probability $\alpha$ that the observed elasticity difference between the compared platforms has arisen simply due to some nondeterministic effect of the cloud environment.

5.5.2.4 Testing repeatability

As stated in Section 5.5.1, benchmarking is an iterative process; therefore, benchmarkers and researchers may carry out reproducibility studies and want to check whether their benchmarking results conform to the previous ones. This can be done by comparing the new confidence interval for the elasticity metric with the previous one; if the two confidence intervals do not overlap, we can conclude that the results are significantly different. On the other hand, if the confidence intervals overlap, we need to perform a Student's t-test to determine whether they differ or not [84, 149].

Let us assume that the original metric distributions corresponding to the previous and current elasticity benchmarking studies have the same variance. Then the t-statistic can be calculated as follows:

$$t = \frac{\bar{E}_p - \bar{E}_c}{s_{E_p E_c} \cdot \sqrt{\frac{1}{n_p} + \frac{1}{n_c}}}, \qquad \text{where } s_{E_p E_c} = \sqrt{\frac{(n_p - 1)\, s_{E_p}^2 + (n_c - 1)\, s_{E_c}^2}{n_p + n_c - 2}}.$$

In the above formula, $\bar{E}_x$ is the mean elasticity metric for benchmarking study $x$, $s_{E_p E_c}$ is the pooled standard deviation for both study samples, and $n_x$ is the number of benchmark executions at the highest level for study $x$. $(n_x - 1)$ is the degrees of freedom for study $x$; therefore, $(n_p + n_c - 2)$ is the degrees of freedom for both studies, which is also used to find the critical t value for significance testing. If the t value estimated from the above equation is smaller than the critical t value, then we cannot reject the null hypothesis (i.e., the benchmarking results of the two studies are the same) at significance level $\alpha$; otherwise, we reject the null hypothesis, concluding that the benchmarking results differ significantly from each other.
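A small Python sketch of this repeatability check is given below, using the top-level mean elasticity metrics of the previous and current studies as input. The function names are illustrative; scipy.stats.ttest_ind performs the same pooled, equal-variance test as the formula above.

```python
import numpy as np
from scipy import stats

def t_statistic(prev_means, cur_means):
    """Pooled two-sample t statistic, as in the formula above."""
    p, c = np.asarray(prev_means, float), np.asarray(cur_means, float)
    n_p, n_c = len(p), len(c)
    pooled = np.sqrt(((n_p - 1) * p.var(ddof=1) + (n_c - 1) * c.var(ddof=1))
                     / (n_p + n_c - 2))
    return (p.mean() - c.mean()) / (pooled * np.sqrt(1 / n_p + 1 / n_c))

def results_agree(prev_means, cur_means, alpha=0.10):
    """True if we cannot reject H0 (the two studies give the same result)."""
    _, p_value = stats.ttest_ind(prev_means, cur_means, equal_var=True)
    return p_value > alpha
```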

Now that the different rigorous techniques have been presented, we next demonstrate how they fit together in a real-world case study for evaluating the elasticity of cloud platforms in the presence of environmental nondeterminism.

5.6 Experimental setup

To validate our proposed techniques, we conducted a case study in the EC2 cloud and compared the elasticity of two adaptive scaling strategies: one strategy uses the m1.small instance as the basic building block, while the other uses the m1.medium instance. We sought to determine which strategy yields better elasticity for a set of fluctuating workloads. Our experimental environment is described below.

We deploy the TPC-W application on the EC2 platform and measure the elasticity penalty in response to a small workload suite (sine_30 and exp_18_2.25). TPC-W is a widely used e-commerce benchmark that emulates the complex user interactions of a Business-to-Consumer (B2C) website [117]. We host the online bookshop implementation of the TPC-W application on a server farm of EC2 instances, choosing the 32-bit architecture of the EC2 instances for this experiment. We host both the web server and the application server on the same EC2 instance; the web server directs all dynamic requests to the application server, therefore the utilization of the EC2 instance depends mostly on the service rate of the application server. Instead of keeping the images on the same EC2 instance, we deploy them in S3 [5]. In this experiment, we want to measure the elasticity implications of the application server in response to the applied workloads; for this reason, we host the back-end database on an m2.2xlarge RDS (Relational Database Service) [4] MySQL instance so that it does not become a bottleneck tier and its influence on the end-to-end performance is negligible.

The web server farm resides behind an internet-facing load balancer. The number of EC2 instances in the server farm is dynamically adapted in response to the workload demand using the Autoscaling API [7]. All experiments start with only one EC2 instance in the server farm; the capacity of the server farm is adapted in response to the workload intensity based on the following ruleset: increase the server farm capacity by one instance when the average CPU utilization exceeds 70% for 3 consecutive minutes, and decrease the capacity by one instance when the average CPU utilization falls below 20% for 10 consecutive minutes. We host the server farm in a single availability zone for these experiments.
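For illustration only, the scaling ruleset above can be mimicked by the following Python sketch operating on a series of per-minute average CPU utilization samples; in the actual experiments the adaptation is carried out by EC2's autoscaling service, not by this code.

```python
def apply_ruleset(cpu_per_minute, capacity=1,
                  up_threshold=70, up_minutes=3,
                  down_threshold=20, down_minutes=10):
    """Simulate the scale-out/scale-in rules on a CPU utilization trace."""
    high = low = 0
    trace = []
    for cpu in cpu_per_minute:
        high = high + 1 if cpu > up_threshold else 0     # consecutive high-load minutes
        low = low + 1 if cpu < down_threshold else 0     # consecutive low-load minutes
        if high >= up_minutes:
            capacity += 1                                # add one instance
            high = 0
        elif low >= down_minutes and capacity > 1:
            capacity -= 1                                # remove one instance
            low = 0
        trace.append(capacity)
    return trace
```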

Instead of using the TPC-W workload generator, we use JMeter [194] to specify our predefined workload patterns. JMeter provides a convenient interface for generating an open workload and varying the number of concurrent requests per second (RPS) over time using an XML script. A user session is defined in terms of browsing-specific requests: each user lands on the homepage, then searches for new books in a random genre and then browses the details of a randomly selected item.

We benchmark the elasticity of two different adaptive scaling strategies and determine the elasticity ratio of one strategy over the other. The first adaptive scaling strategy uses the m1.small instance as the basic building block of the server farm. An EC2 m1.small instance provides 1 Elastic Compute Unit (ECU), where 1 ECU is the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor, along with 1.7 GB of memory and 160 GB of local instance storage (see http://aws.amazon.com/ec2/previous-generation/). The other scaling strategy uses the m1.medium instance as the basic building block; it provides 2 ECUs, 3.75 GB of memory and 410 GB of local instance storage. The prices of the m1.small and m1.medium instances are 4.4¢ and 8.7¢ per instance hour respectively; this pricing is used to compute the over-provisioning penalty for the CPU resource. We choose m1.medium as the alternative platform because it provides a larger resource capacity; since the workloads of our benchmark grow at a fast pace, the benefits of using the m1.medium instance may outweigh its additional cost.

To compute the under-provisioning penalty, we need an SLO specification which is relevant but flexible enough to factor out the confounding effect of performance variability. We run an experiment with a single instance behind the load balancer, gradually increase the number of concurrent users and monitor the 95th percentile response time just before the point at which the instance starts dropping requests. We repeat this experiment 3 times and find the upper bound of the 95th percentile response times for all request types. The upper bounds of the 95th percentile response times for all request types in those 3 experiments are 1138 ms, 1532 ms and 1489 ms.

We keep sufficient padding for variability when designing the SLO threshold so that performance variability does not get confounded with the under-provisioned situation. We expect that at least 95% of all requests will see a response within 2 seconds and that there will not be any service disruption. Otherwise, a financial penalty of 12.5¢ applies for each additional 1% of requests for which the 2-second threshold is breached. Furthermore, for each 1% of dropped requests, a financial penalty of 10¢ is charged. The SLO evaluation period is considered to be an hour.
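Under one plausible reading of this SLO, the hourly under-provisioning penalty can be computed as in the sketch below; the 5% allowance and the per-percentage-point charging are our interpretation of the specification, and the function itself is illustrative only.

```python
def hourly_underprovisioning_penalty_cents(pct_slower_than_2s, pct_dropped):
    """Penalty for one SLO evaluation period (an hour), in cents."""
    excess_violations = max(0.0, pct_slower_than_2s - 5.0)  # SLO tolerates up to 5% slow requests
    return 12.5 * excess_violations + 10.0 * pct_dropped

# e.g. 8% of requests slower than 2 s and 1% dropped:
# 12.5 * 3 + 10.0 * 1 = 47.5 cents for that hour.
```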

We randomize the experimental setups based on a hierarchical design of 3 controllable factors: availability zone, day of the week and time of the day. The hierarchical structure is similar to Fig. 5.4, as explained in Section 5.5.1.3. Different availability zones may have different combinations of physical infrastructure; we assume that the hardware heterogeneity across availability zones may induce bias in the elasticity measures, which is why we place this factor at the highest level of the hierarchy. The next factor, "day of the week", causes variation in the elasticity behavior with the variation in the day-to-day load in the availability zone. The lowest-level factor, "time of the day", causes variation in the elasticity measure as a consequence of the load variation in the datacenter throughout the day (we partition time into two categories: peak hours, 7:00 AM - 7:00 PM, and off-peak hours, 7:00 PM - 7:00 AM). There may be other factors responsible for nondeterministic influence (e.g., seasonality, weekly variability); however, we restrict our attention to a small set of factors to ensure tractability and meet budgetary constraints. We took 2 measurements per day throughout the week (7 days) for 2 availability zones (us-east-1b and us-east-1c) in May 2015; therefore, we have a total of 2 × 7 × 2 = 28 replicated setups in which each of the adaptive scaling strategies gets measured. The overall cost of this experiment was about 700 USD.

5.7 Case study

In this section, we apply our method to compare the elasticity of two adaptive scaling strategies which have the m1.small and m1.medium instances as their building blocks respectively. At first, we perform statistically rigorous data analysis on our observed sample of elasticity ratios, and report the ranking and percent elasticity difference between the alternatives. Then we compare and contrast the prevalent methods with our rigorous methodology based on the accuracy of the elasticity benchmarking results.

5.7.1 Rigorous method

Table 5.2a shows the elasticity metrics obtained from 28 different replicated experimental setups (i.e., for 2 availability zones, 7 days of the week and 2 different times of the day); the elasticity behaviors of the scaling rulesets in each of these setup combinations can be found in Appendix E. Our objective is to answer the following questions:

• Which EC2 instance (between m1.small and m1.medium) yields better elasticity for the given workload suite?

• What is the difference in elasticity between m1.medium and m1.small for the given workload suite?

To answer these questions, we analyze the sample of elasticity metrics in an iterative manner, as demonstrated by the series of tables in Table 5.2. At first, we calculate the sample means and variances for each day ($M_{\bullet\bullet}$ and $V_{\bullet\bullet}$), as reported in Tables 5.2b and 5.2c respectively. In the next step, we compute the mean and variance for each of the columns of Table 5.2b; this results in the vectors $MM_{\bullet}$ and $VM_{\bullet}$ shown in Tables 5.2d and 5.2e respectively. We also calculate the mean for each column of Table 5.2c, getting Table 5.2f. In the final step, we compute the mean for all tables found in the second step (i.e., the tables in the second row) and the variance for Table 5.2d. We thereby obtain the grand mean $\bar{E}$ and the variances at the different levels of the hierarchy, i.e., $S_3^2$, $S_2^2$ and $S_1^2$. Now we can estimate the 90% confidence interval (i.e., significance level $\alpha = 0.10$) for the sample of elasticity metrics as follows:

$$\bar{E} \pm t_{1-\frac{\alpha}{2},\nu} \sqrt{\frac{S_3^2}{r_3}} = 0.72 \pm 6.314 \sqrt{\frac{0.001}{2}} = 0.7172 \pm 0.157 \qquad (5.5)$$

where 6.314 is the rounded value of the 0.95 quantile of the t distribution with 1 degree of freedom. This ultimately yields [0.56, 0.87] as the 90% confidence interval for the elasticity metric. Note that the confidence interval does not include 1, indicating a statistically significant difference in elasticity behavior between m1.medium and m1.small. Therefore, we conclude that "m1.medium yields better elasticity than the m1.small instance and the percentage improvement lies within the range [13, 44]".

Table 5.2: Elasticity metrics for a 3-level hierarchical experimental design (following [143]).

(a) Observed sample

       us-east-1b            us-east-1c
       off-peak    peak      off-peak    peak
Sat    0.62        0.68      0.77        0.59
Sun    0.85        0.66      0.77        0.65
Mon    0.67        0.86      0.68        0.70
Tue    0.99        0.61      0.65        0.63
Wed    0.82        0.72      0.81        0.64
Thu    0.73        0.76      0.66        0.57
Fri    0.68        0.77      0.66        0.90

(b) Day means, $M_{\bullet\bullet}$

       us-east-1b  us-east-1c
Sat    0.65        0.68
Sun    0.76        0.71
Mon    0.76        0.69
Tue    0.80        0.64
Wed    0.77        0.73
Thu    0.74        0.62
Fri    0.72        0.78

(c) Day variances, $V_{\bullet\bullet}$

       us-east-1b  us-east-1c
Sat    0.002       0.016
Sun    0.017       0.007
Mon    0.018       0.000
Tue    0.073       0.000
Wed    0.005       0.015
Thu    0.000       0.005
Fri    0.004       0.029

(d) Availability zone means, $MM_{\bullet}$:  us-east-1b 0.74,  us-east-1c 0.69
(e) Variances of day means, $VM_{\bullet}$:   us-east-1b 0.002, us-east-1c 0.003
(f) Means of day variances, $MV_{\bullet}$:   us-east-1b 0.017, us-east-1c 0.011

(g) Variance of availability zone means, VMM ($S_3^2$): 0.001
(h) Mean variance of day means, MVM ($S_2^2$): 0.003
(i) Mean of time variances, MMV ($S_1^2$): 0.014

(j) Grand mean, MMM ($\bar{E}$): 0.72
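The nested aggregation of Table 5.2 can be reproduced with a few NumPy reductions, as sketched below; the observed sample is arranged as an array of shape (availability zones, days, times of day), and small discrepancies with respect to the table arise only from rounding.

```python
import numpy as np

# Observed elasticity ratios from Table 5.2a: (zone, day, time) = (2, 7, 2).
E = np.array([
    # us-east-1b: (off-peak, peak) for Sat..Fri
    [[0.62, 0.68], [0.85, 0.66], [0.67, 0.86], [0.99, 0.61],
     [0.82, 0.72], [0.73, 0.76], [0.68, 0.77]],
    # us-east-1c: (off-peak, peak) for Sat..Fri
    [[0.77, 0.59], [0.77, 0.65], [0.68, 0.70], [0.65, 0.63],
     [0.81, 0.64], [0.66, 0.57], [0.66, 0.90]],
])

day_means = E.mean(axis=2)                    # Table 5.2b
day_vars = E.var(axis=2, ddof=1)              # Table 5.2c
zone_means = day_means.mean(axis=1)           # Table 5.2d
S1_sq = day_vars.mean()                       # (i) mean of time variances,   ~0.014
S2_sq = day_means.var(axis=1, ddof=1).mean()  # (h) mean variance of day means, ~0.003
S3_sq = zone_means.var(ddof=1)                # (g) variance of zone means,   ~0.001
grand_mean = zone_means.mean()                # (j) grand mean,               ~0.72
```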

We also estimate elasticity using the bootstrapping technique. After 1000 iterations, the 90% confidence interval for the elasticity ratio settles at [0.67, 0.76]. Based on the bootstrapping method, we conclude that "m1.medium yields better elasticity than the m1.small instance and the percentage improvement lies within the range [24, 33]". Note that both methods show consensus in the ranking of the EC2 instances; however, bootstrapping ends up with a narrower confidence interval due to the larger set of realizations of the elasticity metrics.

Fig. 5.5 depicts the violin plot for the elasticity metrics. A violin plot is an approximate lookalike of a box plot, but with an additional rotated probability density plot overlaid on each side [33, 128]. The mid-point (i.e., the white dot) of the violin plot represents the median and the black box shows the interquartile range, IQR (therefore, 50% of the values stay in this region); furthermore, the thin vertical line shows the lower and upper adjacent values (i.e., the furthest values that lie within 1.5 × IQR from the edge of the box in the outward direction) and the bottom-most and top-most points represent the minimum and the maximum values respectively.

This violin plot is quite revealing in several ways. First, the variability in the elasticity metrics appears to be significant; the coefficient of variation (CoV) is about 14%, indicating notable dispersion in the elasticity behavior of the EC2 instances. Second, the difference between the top-most and bottom-most points in this plot is 0.42, indicating that the elasticity improvement due to the m1.medium instance (compared to m1.small) is highly variable. Third, the distribution of the elasticity metrics appears to be almost Gaussian with substantial weight around the mean. Statistical tests, e.g., the Kolmogorov-Smirnov test and the Shapiro-Wilk test (significance level $p \leq 0.05$), have given a verdict in favor of the approximate normality assumption. The quantile-quantile (qq) plot (Fig. 5.6) also supports the normality assumption for this data sample (the quantiles of the elasticity metric sample closely approximate the theoretical quantiles of a normal distribution).

[Figure: violin plot of the elasticity scores obtained with ElasticMark; y-axis: elasticity score, approximately 0.55 to 1.0.]

Figure 5.5: Runtime variability in the elasticity score when comparing the elastic improvement of the m1.medium instance with respect to the m1.small instance.

[Figure: normal Q-Q plot; x-axis: theoretical quantiles, y-axis: sample quantiles of the elasticity scores.]

Figure 5.6: Quantile-quantile (qq) plot of the sample elasticity scores

5.7.2 Comparison with prevalent methods

Now we assess the correctness of the existing elasticity evaluation methods. Georges et al. [121, 120] delineated an approach for assessing Java performance evaluation methods; a similar approach is adopted here for comparing our rigorous method with the prevalent ones. Table 5.3 shows how to compare the existing elasticity evaluation methods with the rigorous method based on different combinations of their outcomes. The purpose of this classification table is to find out the percentage of cases where both the rigorous method and the prevalent method agree on the ranking and the percentage elasticity difference between the compared platforms. Cases where the prevalent method disagrees with the outcome of the rigorous method are considered to steer us in the wrong direction, with either an incorrect ranking or an erroneous elasticity difference between the compared platforms.

Table 5.3: Classification of the outcomes of the prevalent method and the rigorous method

Rigorous methodology                                        Elasticity ratio for the prevalent methodology
                                                            $|\hat{E} - 1| \leq \theta$      $|\hat{E} - 1| > \theta$
$H_0: E = \frac{g_2}{g_1} = 1$ is not rejected              indicative                       misleading
$H_0$ is rejected but the ordering of the
prevalent method is maintained                              misleading but correct           correct
$H_0$ is rejected and the ordering of the
prevalent method is reversed                                incorrect                        incorrect

• $H_0$ is not rejected: In this scenario, the difference in elasticity behavior between the compared platforms may have been caused by random bias. Recall from Section 5.5.2.3 that we identify this situation by examining whether the confidence interval of the elasticity scores contains 1; if it does, we do not reject $H_0$ and conclude that there is no statistically significant difference between the compared platforms. If the prevalent method states the superiority of one platform over another in this situation, its validity would be at stake, because the correct conclusion here nullifies the presence of any statistically significant difference between the compared platforms.

Benchmarkers usually do not differentiate the alternatives when the elasticity difference is very small or negligible. Following [121], we incorporate this practice by specifying a threshold $\theta$ for categorizing the decisions; when the elasticity difference is less than or equal to $\theta$, it is considered to be small, and vice versa. If the prevalent method reports an elasticity difference greater than $\theta$, its conclusion would be considered as misleading (remember, the valid conclusion nullifies the presence of any significant difference in this scenario). On the other hand, if the prevalent method states a small difference in the elasticity metric, the conclusion would be considered as indicative.

• $H_0$ is rejected: In this scenario, the valid conclusion is that there is a statistically significant difference between the compared platforms at a given confidence level $1-\alpha$. Two possible ranking scenarios may arise based on the relative positioning of the confidence interval $[c_l, c_u]$ of the rigorous method and the estimated elasticity ratio $x$ of the prevalent method: either both $[c_l, c_u]$ and $x$ lie on the same side of 1 (e.g., all $y \in [c_l, c_u] < 1$ and $x < 1$) or they lie on different sides of 1 (e.g., all $y \in [c_l, c_u] < 1$ and $x > 1$). If both $[c_l, c_u]$ and $x$ lie on the same side of 1, the ranking would be considered as correct. On the other hand, the ranking of the prevalent method would be considered as incorrect if $x$ and $[c_l, c_u]$ lie on different sides of 1; in this case, it contradicts the ranking of the rigorous method.

The above procedure tells us whether the prevalent method can yield a valid ranking for the compared platforms. However, among these correctly ranked cases, if the reported elasticity difference is relatively small (i.e., below the threshold), then the conclusion of the prevalent method would be considered as misleading. Because of the small difference, the prevalent method may decide not to distinguish the elastic abilities of the compared platforms. By contrast, the rigorous approach precisely reports a statistically significant albeit small difference between the alternatives in these scenarios.
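The decision rules of Table 5.3 can be written down compactly; the following Python sketch (function and argument names are ours) classifies one comparison given the rigorous confidence interval $[c_l, c_u]$, the prevalent method's point estimate $x$ and the threshold $\theta$.

```python
def classify_outcome(cl, cu, x, theta=0.05):
    """Classify a prevalent-method result against the rigorous CI (Table 5.3)."""
    h0_rejected = not (cl <= 1.0 <= cu)        # CI excludes 1: significant difference
    small_difference = abs(x - 1.0) <= theta
    if not h0_rejected:
        return "indicative" if small_difference else "misleading"
    same_side = (cu < 1.0 and x < 1.0) or (cl > 1.0 and x > 1.0)
    if not same_side:
        return "incorrect"                     # ordering reversed
    return "misleading but correct" if small_difference else "correct"
```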

Now we compare the prevalent methods against the rigorous method based on the classification mentioned above. We report our findings for two different values of the threshold $\theta$: $\theta = 5\%$ and $\theta = 10\%$. We choose the following prevalent methods for evaluation:

• Single workload, single benchmark execution: A considerable number of prevalent elasticity evaluation methods fall into this category (e.g., [150, 94, 217, 219, 43, 126]). This approach does not consider any statistical principle in the experimental setup and data analysis. We report the accuracy of this approach for the sine_30 and exp_18_2.25 workloads.

• Diversified workload suite, single benchmark execution: We demonstrated this approach for evaluating the elasticity of several rulesets in Chapter 3. This approach considers a diversified workload suite and reports the elasticity metric based on only one benchmark execution.

• Diversified workload suite, multiple benchmark executions, reporting the mean ratio: We have not observed any work that reports the overall elasticity metric based on multiple benchmark executions. However, reporting the mean of 3 benchmark executions is very common in the benchmark industry; therefore, one may be tempted to consider this approach for reporting the elasticity metric. It follows a number of statistical techniques, such as workload diversity, repeated measurements and data analysis using the mean.

Note that none of the prevalent methods used the blocking technique when evaluating alternative platforms. For this reason, we randomly pick $z$ elasticity penalties from the samples of the m1.small and m1.medium scaling rulesets respectively (sample size, $N = 28$), pair them up, estimate the overall elasticity metric for the $z$ pairs and then classify the result into one of the aforementioned categories. At the end, we report the percentage of correct, incorrect and misleading conclusions over all possible combinations of the given samples (i.e., $\binom{N}{z} \times \binom{N}{z}$).

Fig. 5.7 shows the accuracy of the prevalent methods in terms of the percentage of correct, incorrect and misleading conclusions. These charts reveal some interesting observations about the prevalent methods:

• Most of the prevalent methodologies fail to provide 100% accuracy for the elasticity benchmarking results. We find some cases where the fraction of incorrect conclusions ranges up to 29.59%; in those cases, the prevalent method yields an incorrect ranking for the compared platforms (i.e., it states that m1.small provides better elasticity than m1.medium). We also observe an increased percentage of misleading but correct conclusions (up to 24.32%); the prevalent method reports the correct ranking in those cases but ends up with a weaker statement saying that the gap in the elastic response is not large enough to consider the alternative platforms to be different.

Now let us concentrate on the accuracy rate of the single workload, single benchmark execution scenario.

[Figure: bar charts of the percentage of total comparisons classified as correct, misleading, incorrect, misleading but correct and indicative for each prevalent method (sine30 - 1 exec, exp - 1 exec, both - 1 exec, sine30 - mean of 3 execs, exp - mean of 3 execs, both - mean of 3 execs); panel (a) threshold θ = 5%, panel (b) threshold θ = 10%.]

Figure 5.7: Prevalent method evaluation

We observe contrasting results between the sine_30 workload and the exp_18_2.25 workload: sine_30 yields the lowest accuracy (varying between 55.99% and 64.92%) whereas the exp_18_2.25 workload yields 100% accuracy. This observation gives the impression that the elasticity of some workloads (e.g., sine_30) is more prone to being affected by random bias than that of others. Based on this observation, one should not be tempted to exclude a workload from the benchmarking suite simply because of its sensitivity to random bias; each workload in the portfolio should be chosen based on the importance of its usage scenario. To reduce the intensity of the random bias, we have to ensure a diversified workload suite. Looking at the chart, we can observe a significant increase (about 34%) in the correct conclusion rate for the combined workload case (because the chosen workloads ensure diversity).

• Taking repeated measurements and reporting the mean of 3 executions also improves the percentage of correct conclusions about the elasticity of the compared platforms. However, the improvement is much better when the workload suite is diverse (correctness rate of more than 99%).

• Note that the accuracy of the prevalent methods also depends on the choice of the threshold $\theta$: if the percentage of elasticity improvement (or degradation) is less than or equal to $\theta$, the elasticity difference between the compared platforms will not be considered significant. The fraction of correct conclusions tends to decline as the threshold $\theta$ increases.

• The prevalent methods, however, cannot guarantee the repeatability of the benchmarking results and the overall conclusions. All of them report a single number for elasticity, which is subject to change because of sampling variability. Our method reports a confidence interval for the elasticity metric and therefore ensures the repeatability and reproducibility of the benchmarking results.

5.7.3 Summary

In this section, we have evaluated the effectiveness of our rigorous solution against the prevalent methodologies for elasticity benchmarking. In a nutshell, our evaluation reveals severe limitations in the ability of the prevalent methodologies to yield unbiased and repeatable elasticity benchmarking results in the presence of the environmental nondeterminism of the cloud.

For some prevalent methods, the total percentage of incorrect and misleading conclusions went up to 44%; this situation is very concerning and may turn out to be a serious blocking factor for innovation in the area of cloud elasticity. Incorporating statistical rigor into the elasticity benchmarking method can help us avoid this problem. Our case study demonstrates the efficacy of the rigorous method in smoothing out most of the random bias and yielding valid elasticity benchmarking results, even in the presence of the environmental nondeterminism of the cloud.

5.8 Critical reflection

In this chapter, we have presented a number of rigorous techniques to yield valid and repeatable elasticity benchmarking results in the presence of the environmental nondeterminism of the cloud. However, there are some risks associated with our benchmarking techniques, which are presented as follows:

• Risk: There may be skepticism about the time complexity and economic efficiency of our benchmarking techniques.

Our solution suggests measuring elasticity in a number of randomized experimental setups. Stakeholders may feel skeptical about the economic efficiency of this technique and prefer to report elasticity from a single execution or the mean of 3 executions. Although the prevalent methods appear to be economical in terms of the number of executions (as well as time and monetary cost), the economy of effort may turn out to be deceiving because they neglect the wider environmental effect on elasticity; for this reason, the prevalent methods sometimes compromise the correctness of the results. In contrast, our solution ensures the validity and repeatability of the benchmarking results with a reasonable increase in monetary cost, time and effort, which is likely to be amortized over several months. Our rigorous techniques can be adopted even when the budget is limited; in that situation, we have to take elasticity measurements for a subset of the possible randomized setups. It would also be helpful to conduct an initial screening experiment to eliminate factors whose effect on elasticity is negligible or deterministic. Decreasing the number of executions, however, results in a wider confidence interval (or a narrower interval with a reduced confidence level).

• Risk: Some of the solutions described in this chapter are in the inception phase and need further refinements.

Some of the solutions recommended in this chapter are just initial ideas, such as the diversification of the workload suite and the selection of the controllable factors. We have specified an approach to evaluate whether the workload suite is diversified or not; however, there is not much guidance on how to design a diversified and complete workload suite, for example, which attributes of the workload need to be considered when stressing the cloud platform's elasticity behavior. We have observed in Section 5.5 that a diversified workload suite can smooth out the random bias to a great extent and ensure a tighter distribution; therefore, we intend to come up with a set of guidelines for designing a diversified workload suite for elasticity evaluation in future work.

Likewise, we have not presented any method for screening the controllable factors for elasticity benchmarking. There is a proliferation of controllable factors in the cloud that may affect the elasticity behavior; we need to conduct cause-effect analysis to understand the effects of various factors on elasticity. This will also help benchmarkers design experiments with more economic efficiency by eliminating factors whose effects are negligible. More research is required to further refine this technique.

• Risk: There is still a lack of guidance about the optimal number of benchmark executions.

We have not given any direction for estimating the optimal number of benchmark executions for elasticity measurement. Designing the elasticity experiment with an optimal number of benchmark executions is very important because of its direct implication for monetary cost, especially in the context of the public cloud. Because of its undeniable importance, we include it as one of the top priorities in our future research roadmap.

• Risk: Several factors may remain as threats to the external validity of our case study.

Factors that affected the external validity of the case studies in Chapter 3 and Chapter 4 also remain as threats to the case study performed in this chapter. Due to budgetary constraints, we had to conduct this case study with a small fraction of the standard workload suite and a simple scenario (m1.small vs. m1.medium); this also affects the generalizability of the findings reported in this study. Further research needs to be undertaken to validate the rigorous techniques under more general experimental conditions.

Contributions and benefits

A short summary of the contribution associated with this chapter is as follows:

• Contribution: A number of rigorous techniques for ensuring valid and repeatable elasticity benchmarking results in the presence of the environmental nondeterminism of the cloud.

Our literature review suggests that environmental nondeterminism is mostly overlooked during elasticity evaluation. A myriad of factors may be responsible for this negligence, such as the lack of awareness about the environmental effect, concerns about time complexity and monetary cost, and the lack of techniques to address the random bias. In this chapter, we have demonstrated that the random bias induced by the cloud environment is significant and commonplace. Because of its unpredictable nature, it is practically impossible to avoid such bias. Since the benchmarker does not have clear visibility and explicit control over the cloud environment's internal events, it is also non-trivial to correct for the bias in the elasticity measures.

The elasticity benchmarking techniques that we have presented in this chapter incorporate statistical design principles and statistically rigorous data analysis; these techniques have proved to be quite effective in tackling the issues of nondeterminism in traditional computing systems and other fields of science. This methodology advocates a set of proactive approaches that should be followed in the experiment design phase to smooth out most of the random bias. The statistical data analysis technique then helps us compare the alternative platforms and report the uncertainty in the elasticity metric for a given confidence level. We have demonstrated the effectiveness of our method based on a case study conducted on the EC2 platform. The validation results are quite promising; our rigorous methodology outperforms the existing ones in the presence of environmental nondeterminism.

There are a number of research directions that can be taken in the future. Possible avenues include: (1) using causal analysis to determine the effects of different controllable factors and define a set of guidelines for the factor selection process; (2) developing a statistical model and/or defining convergence criteria that take into account the extent of various factor effects and suggest the optimal number of benchmark executions, improving economic efficiency without compromising correctness; and (3) further investigation to determine which specific workload attributes need to be considered to ensure a diversified workload suite for elasticity evaluation, and designing a diversified benchmarking workload collection. For Java performance evaluation, some efforts [121, 142, 60] were undertaken to address the optimal benchmark execution and workload diversification issues; it would be worthwhile to study those approaches for resolving similar problems in elasticity benchmarking.

A further benefit of the work introduced in this chapter is as follows:

• Benefit: increased awareness about the significant effect of environmental nondeterminism on the elasticity behavior of the cloud platform.

Based on our literature review, we have found that the prevalent state of the art in elasticity evaluation is not aware of the detrimental effect of nondeterminism. In this work, we have demonstrated that the influence of environmental nondeterminism on the elasticity benchmarking result can be significant. We have also outlined possible sources of nondeterminism that may induce random bias in the elasticity measures. We hope this information will increase awareness in the elasticity benchmarking community about cloud environment-specific nondeterminism and encourage the community to devise useful techniques to address the random bias.

5.9 Conclusion

In this chapter, we have demonstrated the ubiquitous presence of nondeterminism in the cloud environment and its severe impact on the elasticity benchmarking results. Unfortunately, in most cases, the extent of such bias is unpredictable; there is no straightforward way to avoid or correct it. However, it is possible to neutralize the impact of the random bias by incorporating statistical rigor in the experimental design and data analysis phases. Motivated by the solutions available in other fields of science and traditional computing systems, we have introduced a number of rigorous techniques to ensure valid and repeatable elasticity benchmarking results even in the presence of the environmental nondeterminism of the cloud. In the end, we have evaluated the effectiveness of our solution based on a case study conducted on the EC2 platform; specifically, we have applied our rigorous method to compare the elasticity of m1.small and m1.medium instances. We have also revealed the shortcomings of the prevalent methods in ensuring valid elasticity benchmarking results when affected by the random bias of the cloud environment. Our solution turns out to be quite effective in yielding valid and repeatable elasticity benchmarking results at a reasonable monetary cost.

Chapter 6

Conclusion and future work

“WARNING! Everything in this book may be all wrong. But if so, it’s all right!”

Mark Twain

My PhD quest to design a consumer-centric elasticity benchmarking framework comes to an end with this chapter. It recapitulates the research problem and goal, summarizes the key contributions, offers a critical appraisal of our work and finally speculates on possible avenues for further research.

6.1 Recap on research problem and objective

The unique cost-effective vision of elasticity for time-varying and unpredictable workloads makes it a popular alternative to traditional IT systems. From startups to large enterprises, almost everyone is considering the elastic cloud as a great opportunity to maximize their Return on Investment (ROI). Although all cloud providers nowadays claim elasticity in their offered services, we can be pretty sure that none are perfect in delivering the right degree of elasticity for the consumer's applications. Therefore, pragmatic consumers often feel the need for an elasticity benchmark to compare and contrast the elasticity claims of different cloud offerings. However, the irony is that at present the computing community lacks an elasticity benchmarking framework that can adequately reflect the consumer's perception of elasticity. As a result, consumers find it very difficult to validate the elasticity claims of different cloud platforms, compare the elasticity of competing cloud solutions, and diagnose and avoid elasticity issues often arising in the context of her application.

Our objective, therefore, is to address this gap by introducing an elasticity benchmarking framework that can address the consumer's concerns by taking into account her application context, business objectives and technical constraints. In the end, this framework is supposed to provide a single-figured elasticity metric in order to help the consumer draw a simple conclusion about the relative worthiness of alternative cloud platforms. However, it is not feasible to cover all the different types of cloud offerings and application contexts within the 4-year timeframe of a PhD. Therefore, in this dissertation we restrict our scope to VM based IaaS and PaaS offerings, the pay-as-you-go pricing model and the e-commerce application type. Our high-level research questions, defined in Chapter 1, are as follows:

RQ 1 How can we design a core elasticity benchmarking framework for cloud platforms from the consumer’s viewpoint?

RQ 1.1 How can we define a metric for measuring elasticity of the cloud platform from the consumer’s perspective?

RQ 1.2 How can we design a standard workload suite for elasticity evaluation?

RQ 2 What are the concrete steps for instantiating an executable elasticity benchmark?

RQ 3 How can we reproduce custom prototypes of actual workloads for elasticity evaluation?

RQ 4 How can we ensure repeatability and validity in the elasticity benchmarking results in the presence of the performance unpredictability of cloud platforms?

6.2 Contributions

The main contribution of this dissertation is ElasticMark - a novel elasticity benchmarking framework that considers the consumer's perspective for assessing the elasticity of cloud platforms. This framework expresses the elasticity of a cloud platform as a single figure of merit, thereby aiding the consumer in making a well-informed decision about the desirability of competing cloud platforms. It also promotes innovation in this field by letting cloud providers and researchers understand the worth of their products with respect to other competitive elasticity offerings available in the market.

ElasticMark is composed of a number of novel concepts, each of which aims to solve a critical research problem in the field of cloud performance engineering. These contributions are summarized as follows:

• A core framework for evaluating the elasticity of competing cloud platforms (e.g., cloud offerings, adaptive scaling strategies) from the consumer's perspective.

We have designed a core framework that incorporates the consumer's viewpoint throughout the entire elasticity evaluation process. It starts off by characterizing the consumer's viewpoint in terms of four aspects: application context, business objectives, observational constraints and financial concerns. These views are later embodied into different components of the elasticity benchmarking framework - metric definition, workload representation and the instantiation procedure of the executable elasticity benchmark. In the end, this framework expresses the elasticity of the cloud platform as a single figure of merit to help the consumer draw a simple and well-informed conclusion about the worthiness of competing cloud platforms.

Our elasticity metric encapsulates the key characteristics of the consumer-centric elasticity definition - adaptation speed (upswing and downswing), precision and usage accounting - thereby conveying a complete view of the platform's elasticity behavior. It also incorporates a platform-specific pragmatic issue by making a clear distinction between chargeable supply and available supply, a crucial aspect that significantly influences the consumer's operational expenses. It brings together all of these factors into a penalty based elasticity measurement model, thereby helping the consumer estimate the financial implications of imperfect elasticity. It also presents an approach to express elasticity as a single figure of merit so that consumers can rank the relative worthiness of alternative cloud platforms. In addition, it provides a precisely defined procedure for instantiating an executable elasticity benchmark from the abstract framework; this procedure discusses concrete choices for SLO definition and workload representation. Moreover, it includes a standard workload suite covering a number of frequently observed usage scenarios, thereby satisfying the workload representation criterion to some extent. In summary, our framework takes into account the different facets of the consumer's viewpoint as well as pragmatic issues to make sure that the elasticity metric is a good indicator of the consumer's reality. A schematic sketch of the penalty-based scoring idea is given at the end of this summary.

This elasticity benchmarking framework facilitates well-informed decision-making at different phases of the cloud adoption lifecycle. For instance, the financial penalty rate estimated by this framework can help the consumer understand whether the application is a good candidate for the elastic cloud or not. The single-figured elasticity metric helps them compare and contrast the worthiness of one platform over another and pick the one best suited to their needs. It also helps them evaluate the relative effectiveness of alternative scaling techniques and deployment configurations. By plotting the financial penalty rate over time, it is also possible to diagnose possible causes of imperfect elasticity behavior under unexpected circumstances and imperceptible workload conditions.
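The sketch below is only a schematic of the penalty-based scoring idea summarized above; the aggregation of penalties, the mapping to a single figure of merit and all names, rates and interval lengths are illustrative assumptions rather than the exact measurement model defined earlier in this dissertation.

# Schematic only: fold over- and under-provisioning penalties into one score.
# All names, rates and the final mapping are invented for illustration.
def penalty_based_score(demand, chargeable_supply, over_rate, under_rate,
                        interval_hours=1.0 / 60.0):
    """demand / chargeable_supply: resource units observed per interval;
    over_rate / under_rate: financial penalty per unit-hour of over-/under-provisioning."""
    over = sum(max(s - d, 0) for d, s in zip(demand, chargeable_supply))
    under = sum(max(d - s, 0) for d, s in zip(demand, chargeable_supply))
    total_penalty = (over * over_rate + under * under_rate) * interval_hours
    return 1.0 / (1.0 + total_penalty)   # higher is better; 1.0 means penalty-free

Plotting the two per-interval penalty terms over time, as suggested above, is what makes it possible to localize anomalies in the adaptive behavior.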

• A novel workload model for generating representative prototypes of fine-scale bursty workloads based on traces.

The standard workload suite of our framework covers only a fraction of usage scenarios; this may not be adequate to support custom usage scenarios, which often arise in the context of different types of consumer applications. As a result, elasticity benchmarking under custom workload conditions may be at stake. Our literature review revealed that existing workload models fall short of generating realistic prototypes of the fine-scale bursty workloads frequently faced by web and e-commerce applications. To address this problem, we have presented a novel workload model that can generate representative prototypes of a fine-scale bursty workload based on trace-level characteristics.

The intuitive approach of our workload modeling relies on the time-dependent regularity structure of the noisy oscillations and a user-defined standard deviation σ - both of which can be easily extracted from the original arrival process. Statistical validation confirms that the prototype generated by our model is a good approximation of the empirical stylized facts of fine-scale burstiness in the actual workload. This modeling technique is robust and gives the consumer the flexibility to scale the prototype as well as the degree of burstiness according to her needs. Cloud consumers and benchmarkers will find this workload model very handy for generating custom prototypes for elasticity benchmarking. Performance analysts and adaptive system designers can also use this model to check the sustainability of their adaptive techniques in response to fine-scale bursty behavior.

• A set of rigorous techniques for ensuring valid and repeatable elasticity benchmarking results in the presence of the environmental non-determinism of the cloud.

The elasticity behavior of the cloud platform shows some variation across different runs of the same workload. The reason behind this variation is the environmental non-determinism of the cloud. Since the consumer has limited visibility and control over the public cloud environment's internal events, there is little she can do to ensure the same environmental state across different benchmark runs. This is a serious problem that poses a threat to the validity and repeatability of the elasticity benchmarking results. To resolve this problem, we have adapted a number of statistically rigorous techniques from traditional computing systems and other areas of science.

Our solution incorporates a number of statistical experimental design techniques to factor out the random variation from the elasticity benchmarking setup; examples include workload suite diversification, randomized setups and careful SLO planning to avoid confounding effects. However, adopting these techniques is only a partial remedy; some bias may still remain in the derived elasticity metrics. Statistically rigorous data analysis can then help report the uncertainty in the elasticity metric with a confidence interval and compare the alternatives based on hypothesis testing. Although not a panacea, adopting our solution significantly factors out the variation in the elasticity benchmarking scores. Our case study also demonstrates its effectiveness in ensuring valid and repeatable elasticity benchmarking results as compared to the prevalent elasticity evaluation methods.
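As one concrete, and merely illustrative, way of carrying out the hypothesis-testing step just mentioned, repeated elasticity scores from two platforms could be compared with Welch's t-test; the specific test, significance level and names below are assumptions, not the exact procedure prescribed in Chapter 5.

# Illustrative only: do two platforms' elasticity scores differ significantly?
from scipy import stats

def platforms_differ(scores_a, scores_b, alpha=0.05):
    """scores_a, scores_b: elasticity metrics from repeated benchmark executions."""
    # Welch's t-test does not assume equal variances across the two platforms.
    _, p_value = stats.ttest_ind(scores_a, scores_b, equal_var=False)
    return p_value < alpha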

• Comprehensive empirical case studies demonstrating the applicability of the framework for the purpose of elasticity evaluation.

A number of case studies were carried out to understand the elasticity behavior of an online bookstore application hosted on the AWS EC2 platform. These case studies were planned to serve two purposes: validation and insights extraction. The validation task sought to check whether the different components of the elasticity framework serve their intended purposes; examples include sanity checking whether the elasticity metric is a good reflection of the consumer's reality, and demonstrating the effectiveness of the rigorous methodology in ensuring repeatable and valid benchmarking results. The insights extraction task, on the other hand, specifically concentrated on developing a practical understanding of the platform's elasticity behavior, such as pinpointing anomalies in the delivered elasticity of the cloud platform, illuminating the distinction between technical elasticity and consumer-perceived elasticity, and demonstrating flaws in common-sense based scaling techniques for handling imperceptible workload conditions. Some general optimization prescriptions have also been recommended to improve consumer-perceived elasticity, examples being lazy deprovisioning and bulk provisioning. These case studies significantly improve the current understanding as well as the state of the art practice regarding the platform's elasticity behavior and can be considered important contributions to the cloud performance engineering literature.

6.3 Critical reflection

According to the research goal stated in Chapter 1, the specific objective of this dissertation is to design an elasticity benchmarking framework to facilitate well-informed decision-making about the relative worthiness of competing cloud platforms from the consumer's perspective.

Designing a consumer-centric elasticity evaluation framework, however, is a challenging task because of the involvement of various factors and their complex interactions, such as the consumer's context, the wide diversity of cloud offerings (e.g., instance types, pricing models) and cloud-specific pragmatic issues. It is, therefore, not feasible to design an elasticity framework that is broad and all-encompassing during the lifetime of a PhD. For this reason, we decided to focus on VM based IaaS and PaaS cloud offerings with a pay-as-you-go pricing model in the context of the e-commerce application type. This dissertation developed several core components of the consumer-centric elasticity benchmarking framework, namely, a set of elasticity metric definitions, a standard workload suite covering representative usage scenarios, a novel workload model for generating custom prototypes of fine-scale bursty workloads and a number of rigorous techniques for valid and repeatable elasticity benchmarking results. Each of these components was validated through argumentation and extensive case studies.

Despite the restricted scope of this research, the tasks we picked for framework construction and validation were inherently complex and therefore could not be completely dealt with. These limitations are summarized as follows:

Limitations of the core framework

At this moment, our elasticity framework supports only fixed-capacity VM types and the pay-per-use pricing model. However, this narrow scope may not always be sufficient to address the consumer's needs in the real world. Consumers are more likely to take advantage of cheaper and discounted pricing schemes (e.g., dynamic pricing, subscription-based pricing) to lower their operational expenses. They are often tempted to exploit VMs with bursting capabilities - that is, a VM type that stores unused CPU credits from idle time periods so that it can burst beyond its baseline performance level to serve sudden surges in demand. Due to the increased interest of consumers in these offerings, we intend to broaden the scope of our framework to consider different VM types and pricing models in the future.

Another limitation of our framework relates to the representativeness of the standard workload suite. We designed this workload suite based on information available in academic works and practitioners' blogs. In an academic setting with very limited access to OLTP workload traces, it was not possible to carry out a comprehensive workload characterization study for elasticity benchmarking. For this reason, these workloads should be considered a rough approximation of real-world usage scenarios. This is a real risk that should be dealt with cautiously when developing an industry standard benchmark; in particular, a thorough workload characterization study needs to be carried out to identify representative workloads for elasticity evaluation. To support prototype extraction from workload traces, we have provided a novel workload model in Chapter 4.

Another shortcoming of our framework is that it does not provide any guidance for extrapolating the benchmark results. As pointed out in Chapter 3, a consumer with a large and complex application may find it very inconvenient to migrate the entire application to different cloud platforms and conduct elasticity benchmarking, as it requires a good deal of development and economic effort. She would prefer to conduct elasticity benchmarking for a small portion of her application in different clouds and extrapolate the benchmarking results for the whole application. Now, what is the probability that the findings for the small application will equally hold for the larger one? What would be the relationship between the two? In addition to this, she may feel the need to extrapolate the benchmarking results of a small prototype workload to the anticipated production workload. Cloud consumers may also want to know the likelihood and applicability of these benchmarking results to their long-term capacity planning decisions. These are legitimate concerns that need to be addressed in order to promote the usefulness of our framework. Future research will concentrate on finding suitable answers to these questions of the cloud consumer.

The lack of theoretical validation is an obvious threat to the validity of our elasticity metric. Due to time constraints, we had to leave it out of our scope; however, we have included it as one of our top priorities for future research.

Several factors may remain as threats to the validity of our empirical case studies. It was not possible to eliminate the effect of cloud performance variability from the benchmarking setup; however, we infer that its influence was mitigated to some extent by the diversity of our workload suite and an appropriate choice of SLO thresholds. Perhaps for this reason, the overall conclusions drawn from these studies remain consistent with the observations from other studies [157, 219, 114]. However, the percentage improvements or degradations reported in these case studies should be interpreted with caution, as they were reported from a single benchmark execution only. Future research should, therefore, focus on determining the uncertainty in those numbers. The external validity of our case study was also affected by a number of factors; for example, the TPC-W benchmark, deployment design, standard workload suite and specific tutorial rulesets used in the case study may not be representative of real-world industrial practices. Due to the limited budget, it was also not feasible to check the generality of our conclusions for other cloud offerings (e.g., Azure [34, 14], Google Compute Engine [16]), VM instance types and scaling techniques. These risks are inherent to case studies conducted in an academic setting; several actions can be taken to overcome such limitations, such as carrying out some case studies in an industrial context for increased representativeness, and increasing the breadth of the study by exploring other cloud offerings, VM instance types and scaling techniques.

Limitations of the workload model

As described above, the main reason for including this workload model is to promote elasticity benchmarking for custom fine-scale bursty workloads by extracting prototypes from the trace. This workload model can preserve the empirical stylized facts of fine-scale burstiness in the reproduced prototype. However, a minor issue with this model is that it cannot appropriately model the magnitude of an extreme and lonely fluctuation. A close inspection of the prototype generated by our model reveals the presence of an abrupt fluctuation at the same relative position; however, its magnitude is not large enough to authentically represent extreme fluctuations of the workload. This problem usually occurs when the extreme fluctuation is lonely, that is, it does not have neighbors of a similar type surrounding it. We consider this a minor issue for several reasons: first, these extreme fluctuations seem to be an artefact of the Wikipedia workload we used (note that these extreme dips occurred at regular intervals and no extreme spikes were found in those workloads). We could not find any other workload (e.g., FIFA World Cup 1998) with such extreme and lonely fluctuations. Moreover, a single extreme fluctuation, especially if it is a dip, is not supposed to have much influence on the elasticity behavior of the platform. Nevertheless, we will explore other workload traces to evaluate the effectiveness of our workload model. If these extreme fluctuations appear to be a typical aspect of the workload, we will refine our model in the future.

Some of the steps in our workload modeling method are rather complex; for instance, the identification of pointwise Hölder exponents and the generation of fractional Brownian motion (fBm). Although we described each step in detail with a working example, cloud consumers may consider its implementation an additional overhead. However, undertaking this overhead is worthwhile as it gives the consumer a realistic picture of the cloud platform's elasticity behavior for the production workload. To address this inconvenience, we plan to develop a user-friendly tool to simplify the prototype generation process at the consumer's end.
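To give a flavor of one of the "complex" steps mentioned above, the sketch below synthesizes a fractional Brownian motion path by Cholesky factorization of its covariance matrix; this is a generic textbook construction with an invented interface, not the generation procedure implemented in Chapter 4.

# Generic fBm synthesis via Cholesky factorization of the covariance matrix
# Cov(B_H(t), B_H(s)) = 0.5 * (t^2H + s^2H - |t - s|^2H); O(n^2) memory, fine for small n.
import numpy as np

def fbm_path(n, hurst, seed=None):
    """Return one fractional Brownian motion sample path of length n (t = 1..n)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1, dtype=float)
    cov = 0.5 * (t[:, None] ** (2 * hurst) + t[None, :] ** (2 * hurst)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * hurst))
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

In a multifractional setting the exponent varies with time (the pointwise Hölder exponent), which is precisely what makes the actual modeling step more involved than this constant-H sketch.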

Most of the threats reported for the previous case study also hold for this evaluation. However, in this case study we adopted some good practices, such as randomized experiments and sensitivity analysis, to reduce the influence of variability on the overall conclusion. The general findings of this study are also in line with other studies in traditional computing and virtualized systems [247], and can therefore be considered valid. Nevertheless, the reported numbers should be interpreted with caution as they are based on a small number of experiments. Furthermore, the conclusions derived from this case study need to be verified for other workload scenarios, cloud offerings and instance types for further generalizability. Therefore, a natural progression of this work would be to investigate the impact of fine-scale burstiness in the context of other workloads, a variety of instance types and other cloud offerings.

Limitations of the rigorous techniques

The environmental non-determinism of the cloud exerts a substantial influence on the validity and repeatability of elasticity benchmarking results. To address this problem, we have recommended a number of rigorous techniques based on statistical experimental design principles and data analysis approaches from other domains. This work, however, is still at the inception phase; some of the steps specified in the rigorous solution therefore lack explicit guidance.

Our rigorous solution requires the consumer to carry out benchmarking in a number of replicated experimental setups. Perhaps the most obvious criticism of this work may emerge from the economic efficiency perspective. Critics may argue that conducting benchmarking in many experimental setups affects the "economic" outlook of the benchmark. Our response to this criticism is as follows. As mentioned in Section 2.2.3.4, the economic efficiency of a benchmark depends on two factors: affordability and worth of investment [131]. Existing elasticity benchmarking methods can only guarantee the affordability aspect; the lack of any mechanism to ensure valid and repeatable benchmarking results in the presence of the environmental non-determinism of the cloud is a major drawback, which also raises questions about their worth as an investment. Our solution, on the contrary, can ensure valid and repeatable results with a reasonable increase in operating expenses. Several workarounds are available for applying our solution even when the budget is limited; for instance, carrying out benchmarking for a subset of the randomized setups, or eliminating factors with negligible and/or deterministic effects. However, decreasing the number of benchmark executions may result in a wider confidence interval.

At present, our solution does not provide any specific guideline for the optimal number of benchmark executions, the factor screening process or workload diversification strategies; this is a real risk that needs to be addressed to improve the affordability of running our benchmarking framework. Further research will, therefore, concentrate on providing explicit guidance on these tasks. For Java performance evaluation, some efforts [121, 142, 60] were undertaken to address the optimal benchmark execution and workload diversification issues; it would be worthwhile to investigate those approaches for resolving similar problems in elasticity benchmarking.

All the external threats already specified also apply to this case study. Due to budgetary limitations, we could not carry out this evaluation for the entire standard workload suite, other instance types and scaling techniques; these also contribute to the external threats to validity. More research is required to validate the effectiveness of these techniques under other experimental conditions.

Notwithstanding these limitations, this dissertation significantly advances the current state of the art practice in elasticity evaluation. This is the first concrete proposal that comprehensively addresses the consumer's perspective on elasticity evaluation of cloud platforms. The framework also yields a single figure of merit for elasticity to facilitate simple and proactive decision-making about the relative worthiness of alternative cloud platforms, thereby helping the consumer make a well-informed decision about cloud adoption. The case studies presented in this dissertation also enhance the existing knowledge about consumer-perceived elasticity with valuable insights and real-world phenomena.

6.4 Future directions

As discussed above, this research has thrown up many questions in need of further investigation. In terms of directions for future research, a number of avenues can be explored to address the existing limitations of our elasticity framework as well as broaden the scope of our present research; these are briefly summarized in the following:

Framework extension and refinement. The elasticity benchmarking framework introduced in this dissertation currently applies to a limited range of cloud offering types, instance types and pricing models, which may not be adequate to address the consumer's needs in general. There is, therefore, a definite need to iteratively expand and refine this framework so that it can help the consumer determine elasticity even for non-VM based PaaS offering types (e.g., Heroku [22], AWS Lambda [9]), other instance types (e.g., bursting instances) and pricing models (e.g., dynamic pricing, subscription-based pricing).

Framework validation. This dissertation has not provided any theoretical validation of the elasticity metric and the associated measurement model. Future work will concentrate on offering a rigorous validation of the elasticity metric and the measurement model. A possible starting point could be the work of Meneely et al. [182], which describes a step-by-step procedure for selecting the validation criteria of a software metric based on its intended use. For example, to demonstrate that the elasticity metric is meaningful, one has to validate several criteria, such as attribute validity, content validity and dimensional consistency. Additionally, we will address the limitations of our empirical validation as much as possible. Possible strategies include broadening the scope of our case studies to other cloud offerings, scaling techniques and instance types, validating the generalizability of our insights and prescriptions, determining the uncertainty in the reported elasticity metrics and carrying out a few case studies in an industrial context, if possible.

Workload model validation. This dissertation has shown the validation of our workload model for a specific instance of the Wikipedia workload. Future work will explore other workload traces to validate its effectiveness. Recall that our workload model at present cannot model the sharp, lonely dips of the Wikipedia workload. This is an intriguing issue that needs to be properly investigated in further research. If such an abrupt change (spike or dip) is found to be a typical characteristic of the workload arrival process, then necessary actions should be taken to fix this issue.

Rigorous methodology validation. This dissertation has brought together a number of statistical techniques to ensure the validity and repeatability of elasticity benchmarking results. However, due to budgetary constraints, we validated them for a small workload suite and a limited set of scaling techniques. Future work will concentrate on validating the effectiveness of these techniques for a larger workload suite, more scaling techniques and other cloud offerings.

Automated benchmark harness. Future work will also develop an automated benchmark harness for elasticity benchmarking. Such a harness is a useful tool that promotes productivity in benchmarking, reduces the possibility of manual errors and ensures repeatability across different benchmark executions. The harness should provide a user-friendly interface through which the consumer can specify their benchmarking workloads, cloud platform configurations, SLOs and an optional benchmarking schedule. It will then generate workload scripts based on those specifications, set up the testing environments according to the cloud platform configurations, take the necessary observations and finally generate an elasticity benchmarking report for individual workload scenarios as well as the overall workload suite. In cases where multiple benchmark executions are required, it will ensure homogeneous testing conditions across multiple executions with minimal operational cost.
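A hypothetical skeleton for such a harness is sketched below; every class, field and callback name is an assumption chosen for illustration, since the tool itself is future work.

# Hypothetical harness skeleton: cloud-specific steps are injected as callbacks so the
# same loop can drive different platforms and keep repeated runs homogeneous.
from dataclasses import dataclass

@dataclass
class BenchmarkSpec:
    workload_scripts: list       # generated workload scripts, one per scenario
    platform_config: dict        # e.g., instance type, scaling policy, region
    slo: dict                    # e.g., {"p95_response_ms": 500}
    executions: int = 3          # repeated runs for statistical analysis

def run_benchmark(spec, provision, drive_workload, derive_metrics, teardown):
    """Return one elasticity report per benchmark execution."""
    reports = []
    for _ in range(spec.executions):
        env = provision(spec.platform_config)                  # set up the test environment
        observations = drive_workload(env, spec.workload_scripts, spec.slo)
        reports.append(derive_metrics(observations))           # per-run elasticity metrics
        teardown(env)                                          # release resources between runs
    return reports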

Extrapolation of evaluation results. As discussed in the previous section, our work does not offer any guidance for extrapolating elasticity evaluation results to large, complex applications and bigger workloads. At present, our framework relies on the assumption that the consumer's application is manageable enough to be directly migrated to different cloud platforms. It also assumes that the elasticity evaluation results for a small workload hold equally when the workload grows in time and amplitude. These simplistic assumptions may not always hold in practice. In most cases, the consumer may want to conduct benchmarking for a small component or module of her application in response to a small prototype workload and extrapolate the results to the full-scale application as well as the actual workloads. This is a valid requirement of the cloud consumer, which needs to be addressed in further research. The extrapolation method should take into account the dependency structure among different components, application-level operation flow characteristics, fine-grained detail about the performance and utilization of individual operation types, micro aspects of elasticity and so on. A tentative approach to tackle this issue may include behavioral modeling, micro-benchmarking and simulation.

Elasticity planning framework. Note that the elasticity ranking between alternative cloud platforms in the short term may not always hold in the long term because of several factors, such as long-term pricing discounts and workload combinations. A natural progression of our benchmarking framework, therefore, leads to the development of an elasticity planning framework to help the consumer make a well-informed decision about the elasticity of alternative cloud platforms in the long term. Unlike the elasticity evaluation framework, which focuses only on adaptive and peak usage scenarios, the elasticity planning framework should consider all types of workload usage scenarios - peak, moderate and low usage. The planning framework also needs to incorporate probabilistic occurrences of different workload types as well as long-term pricing discounts when evaluating the long-term elasticity of cloud platforms.

Incorporation of user navigation behavior in the benchmark workload. In this dissertation, we considered a simple browsing workflow consisting of read-only transactions for elasticity evaluation. However, in reality, the users of an e-commerce application exhibit complex navigation patterns with varying degrees of read-only and update transactions. For this reason, we plan to make provision for simulating the diversity of user navigation behavior in the benchmarking workload. More specifically, we want to integrate the following user navigation characteristics into the workload specification: all possible workflows (e.g., browsing, purchasing) and their probabilities of occurrence, inter-request dependencies and data dependencies within each workflow, transition probabilities between different states (or web pages) within each workflow, the session length distribution and the think time distribution. The inter-request dependencies within a workflow can be specified using several variants of Finite State Machines (FSM) with guards and actions, whereas the inter-state transition probabilities within a workflow can be described using Probabilistic Finite State Machines (PFSM), such as Markov chains [180, 212, 230].
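As a toy illustration of the Markov-chain idea, the sketch below generates one user session from an invented transition table; the states, probabilities and names are placeholders rather than the workload specification planned above.

# Toy Markov-chain (a simple PFSM) session generator; states and probabilities are invented.
import random

TRANSITIONS = {
    "home":     [("browse", 0.7), ("search", 0.2), ("exit", 0.1)],
    "browse":   [("item", 0.6), ("home", 0.2), ("exit", 0.2)],
    "search":   [("item", 0.7), ("home", 0.1), ("exit", 0.2)],
    "item":     [("purchase", 0.2), ("browse", 0.5), ("exit", 0.3)],
    "purchase": [("home", 0.4), ("exit", 0.6)],
}

def generate_session(start="home", max_length=50, rng=None):
    """Return the sequence of page requests for one simulated user session."""
    rng = rng or random.Random()
    state, path = start, []
    while state != "exit" and len(path) < max_length:
        path.append(state)
        states, probs = zip(*TRANSITIONS[state])
        state = rng.choices(states, weights=probs, k=1)[0]   # probabilistic transition
    return path

Guards and actions, think-time distributions and session-length distributions would then be layered on top of this basic chain.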

Representative and diversified workload suite. This is, without any doubt, one of the most crucial elements of a good benchmark. A representative workload suite guarantees the relevance of the benchmarking results, whereas diversity in the workload collection minimizes the dispersion of the reported elasticity metrics. An initial outline for performing this work includes the following tasks: extraction of typical adaptive request arrival patterns and request type distributions from workload traces, understanding how these workloads stress the elasticity behavior of cloud platforms as well as their affinity to different resources (e.g., CPU intensive applications, memory intensive applications), defining a workload classification scheme based on insights about the variety of stressing patterns and resource affinities, classifying workloads based on that scheme and picking one or more representative workloads from each equivalence class for inclusion in the benchmarking workload suite.
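One possible, purely hypothetical, way to mechanize the classification and selection step above is to cluster workloads by a few stress and resource-affinity features and pick the member closest to each cluster centre; the features, the use of k-means and all names are assumptions.

# Hypothetical selection of representative workloads via k-means clustering.
# feature_matrix: NumPy array, rows = workloads, columns = invented features such as
# burstiness, CPU affinity and memory affinity; names: the workload identifiers.
import numpy as np
from sklearn.cluster import KMeans

def pick_representatives(feature_matrix, names, k=4, seed=0):
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(feature_matrix)
    representatives = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(feature_matrix[members] - km.cluster_centers_[c], axis=1)
        representatives.append(names[members[np.argmin(dists)]])   # closest to the centre
    return representatives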

Rigorous and economical benchmarking. In Chapter 5, the number of benchmark executions was chosen arbitrarily. The controllable factors for the benchmark design were also chosen on an ad-hoc basis. In order to improve economic efficiency and rigor, these problems need to be resolved in a systematic manner. There are several ways to address them; for instance, defining convergence criteria for elasticity benchmarking, or developing statistical models to estimate the optimal number of benchmark executions. During the factor screening phase, it is possible to determine the influence of different environmental factors on the variation in the elasticity metric using statistical meta-analysis [175, 187].
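By way of example, one simple convergence criterion (an assumption for illustration, not a result of this dissertation) would be to keep adding executions until the confidence-interval half-width drops below a chosen fraction of the mean elasticity score.

# Illustrative stopping rule: declare convergence when the relative half-width of the
# confidence interval over the elasticity scores falls below rel_tolerance.
import math
from statistics import mean, stdev
from scipy import stats

def converged(scores, rel_tolerance=0.05, confidence=0.95):
    if len(scores) < 3:                               # need at least a few executions
        return False
    n, m, s = len(scores), mean(scores), stdev(scores)
    half = stats.t.ppf(1 - (1 - confidence) / 2, n - 1) * s / math.sqrt(n)
    return half <= rel_tolerance * abs(m)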

Beyond the horizon of benchmarking. The case studies in this dissertation discovered a number of useful insights about the cloud platform's elasticity behavior. This intuitive understanding can be used for the optimization of elastic systems. Therefore, a natural extension of our work could be designing optimal scaling techniques to improve consumer-perceived elasticity; one possible roadmap is to develop optimal scaling strategies for fine-scale bursty workloads. A tentative direction would be to develop an online prediction based feedback controller that takes into account the evolving nature of fine-grained bursty request arrivals; for online prediction, several techniques can be used, such as dynamic MDPs (Markov Decision Processes), SARIMA and ARFIMA models. Recall from Chapter 5 that consumer-perceived elasticity is also influenced by the heterogeneous infrastructure and perturbation effects of the cloud platform. Therefore, we can also focus on developing elasticity optimization strategies that are aware of those perturbation effects and exploit the heterogeneity of the underlying infrastructure and various pricing models.
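The prediction-driven control loop hinted at above might look roughly like the following; the forecaster, headroom factor and capacity model are placeholders, not a validated controller design.

# Rough sketch of one prediction-driven scaling step; all parameters are placeholders.
import math

def scaling_decision(recent_arrivals, forecaster, capacity_per_vm, current_vms, headroom=1.2):
    """Return the number of VMs to add (positive) or remove (negative)."""
    predicted = forecaster(recent_arrivals)           # e.g., a one-step SARIMA forecast
    required = max(1, math.ceil(predicted * headroom / capacity_per_vm))
    return required - current_vms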

6.5 Concluding remarks

Elasticity offers unprecedented economies of scale and business agility to SMEs with a limited budget; my interest in elasticity began as soon as my software developer's mind noticed its ability to promote creative freedom and innovation. Anyone with an innovative idea can now go live on an elastic cloud infrastructure within hours; an undergraduate student with an innovative software application can plan her own startup, a local restaurant owner can bring her creative recipes to the marketplace through an online presence, and so on. Undeniably, it is a true revolution in the history of IT! However, as we continued our use-inspired research, we started to see some imperfections in the currently delivered elasticity of cloud platforms and to speculate about their potential ramifications for the consumer's finances; without an appropriate elasticity metric, consumers bear the risk of choosing a non-optimal elastic platform which, if it does not adapt well in response to the fluctuations of the production workload, may eventually make them feel disillusioned about the real benefit of elasticity. This concern ultimately led us to design a consumer-centric elasticity benchmarking framework, ElasticMark - a yardstick to aid proactive decision-making about the relative worthiness of alternative elastic cloud platforms.

This benchmarking framework incorporates the consumer's financial concerns, business objectives and technical constraints while evaluating the elasticity of cloud platforms and finally yields elasticity as a single figure of merit. It helps the consumer validate the elasticity claims of different cloud offerings for her application-specific context. It also helps her make a well-informed decision by comparing and contrasting the relative worthiness of alternative elastic cloud platforms. For this reason, it is a must-have tool for consumers, simplifying decision-making at different stages of elastic cloud adoption.

We plan to release the initial version of the framework as soon as possible. Our research effort will also continue in parallel to address its current limitations. We encourage consumers to use our framework for evaluating the elasticity of cloud platforms. We also urge researchers and practitioners to put forth a more coordinated effort to resolve the existing issues of consumer-perceived elasticity.

Bibliography

[1] Amazon EC2 Reserved Instances. . [Online; accessed 2016-03-09].

[2] Amazon EC2 Spot Instances. . [Online; accessed 2016-03-09].

[3] App Engine - Platform as a Service | Google Cloud Platform. . [Online; accessed 2016-03-09].

[4] AWS | Amazon Relational Database Service (RDS). . [Online; accessed 2016-03-09].

[5] AWS | Amazon Simple Storage Service (S3). . [Online; accessed 2016-03-09].

[6] Aws | elastic load balancing - cloud network load balancer. . [Online; accessed: 2016-03-09].

[7] AWS Autoscaling API. . [Online; accessed 2016-03-09].

[8] AWS CloudWatch. . [Online; accessed 2016-03-09].

[9] Aws lambda | product details. . [Online; accessed: 2016-03-09].

[10] Box | Secure Content & Online File Sharing for Businesses. . [Online; accessed 2016-03-09].

[11] Cloud computing for business : What is cloud? . [Online; accessed: 2016-03-01].


[12] Cloud computing it glossary | top 50 cloud terms defined. . [Online; accessed: 2016-02-09].

[13] Cloud service elasticity. . [Online; accessed: 2016-02-09].

[14] Cloud Services - Deploy web apps & APIs | Microsoft Azure. . [Online; accessed 2016-03-09].

[15] Cloud Services Pricing - Amazon Web Services (AWS). . [Online; accessed 2016-03-09].

[16] Compute Engine - IaaS | Google Cloud Platform. . [Online; accessed 2016-03-09].

[17] Dropbox. . [Online; accessed 2016-03-09].

[18] Elastic Compute Cloud (EC2) Cloud Server & Hosting - AWS. . [Online; accessed 2016-03-09].

[19] Exponential growth. . [Online; accessed: 2016-03-30].

[20] Facebook Developers - Facebook for Developers. . [Online; accessed 2016-03-09].

[21] Google Docs - create and edit documents online, for free. . [Online; accessed 2016-03-09].

[22] Heroku: Cloud application platform. . [Online; accessed: 2016-03-09].

[23] Office 365 | Home, Personal and Business Editions ... . [Online; accessed 2016-03-09].

[24] OrionVM - Wholesale Cloud IaaS Provider. . [Online; accessed 2016-03-09].

[25] Previous generation instances. . [Online; accessed: 2016-03-09].

[26] Pricing - Price Performance Leadership | Google Cloud Platform. . [Online; accessed 2016-03-09].

[27] Pricing Overview - How Azure pricing works | Microsoft Azure. . [Online; accessed 2016-03-09].

[28] Rackspace: Managed Dedicated & Cloud Computing Services. . [Online; accessed 2016-03-09].

[29] Sales Cloud: Sales Force Automation CRM Software Features. . [Online; accessed 2016-03-09].

[30] Scalr. . [Online; accessed 2016-03-09].

[31] Spec glossary. . [Online; accessed: 2016-02-09].

[32] Understanding the voting process. . [Online; accessed 2016-03-09].

[33] Violin plot. . [Online; accessed: 2016-03-30].

[34] Virtual Machines - & Windows VMs | Microsoft Azure. . [Online; accessed 2016-03-09].

[35] What Is AWS Elastic Beanstalk? - AWS Documentation. . [Online; accessed 2016-03-09].

[36] Life in the cloud, living with cloud computing. , 2008. [Online; accessed: 2016-03-01].

[37] Report on cloud computing to the osg steering committee. 2012.

[38] Patrice Abry, Paolo Goncalves, and Jacques Lévy Véhel. Scaling, fractals and wavelets, volume 74. John Wiley & Sons, 2010.

[39] Brian Adler. Will aws t2 replace 30 percent of instances? not so fast. , 2014. [Online; accessed 2016-03-09].

[40] Divyakant Agrawal, Amr El Abbadi, Sudipto Das, and Aaron J Elmore. Database scalability, elasticity, and autonomy in the cloud. In Database Systems for Advanced Applications, pages 2–15. Springer Berlin Heidelberg, 2011.

[41] May Al-Roomi, Shaikha Al-Ebrahim, Sabika Buqrais, and Imtiaz Ahmad. Cloud computing pricing models: a survey. International Journal of Grid & Distributed Computing, 6(5):93–106, 2013.

[42] Ahmed Ali-Eldin, Johan Tordsson, and Erik Elmroth. An adaptive hybrid elasticity controller for cloud infrastructures. In Network Operations and Management Symposium (NOMS), 2012 IEEE, pages 204–212. IEEE, 2012.

[43] Rodrigo F Almeida, FR Sousa, Sérgio Lifschitz, and Javam C Machado. On defining metrics for elasticity of cloud databases. In Proceedings of the 28th Brazilian Symposium on Databases, 2013.

[44] Mourad Amziani. Modeling, evaluation and provisioning of elastic service-based business processes in the cloud. PhD dissertation, Evry, Institut national des télécommunications, 2015.

[45] Raluca Suzana Andra. Investigating pricing and negotiation models for cloud computing. MSc in High Performance Computing, University of Edinburgh, 2013.

[46] Claudio A Ardagna, Ernesto Damiani, Fulvio Frati, Guido Montalbano, Davide Rebeccani, and Marco Ughetti. A competitive scalability approach for cloud architectures. In Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on, pages 610–617. IEEE, 2014.

[47] George Argyrous. Cost-benefit analysis and multi-criteria analysis: Competing or complementary approaches. School of Social and International Studies, The Australia and New Zealand School of Government, 2010.

[48] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.

[49] Neil Ashizawa. The Elasticity of the Cloud. , 2010. [Online; accessed 26-Jan-2016].

[50] Thepparit Banditwattanawong. The survey of infrastructure-as-a-service taxonomies from consumer perspective. In Proceedings of the 10th International Conference on e-Business, 2015.

[51] Sean Kenneth Barker and Prashant Shenoy. Empirical evaluation of latency-sensitive application performance in the cloud. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems, pages 35–46. ACM, 2010.

[52] Olivier Barrière. Synthèse et estimation de mouvements browniens multifractionnaires et autres processus à régularité prescrite: définition du processus auto-régulé multifractionnaire et applications. PhD dissertation, Nantes, 2007.

[53] Victor R Basili. Software modeling and measurement: the goal/question/metric paradigm. 1992.

[54] Matthias Becker, Sebastian Lehrig, and Steffen Becker. Systematically deriving quality metrics for cloud computing systems. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, pages 169–174. ACM, 2015.

[55] Dominique Bellenger, Jens Bertram, Andy Budina, Arne Koschel, Benjamin Pfänder, Carsten Serowy, Irina Astrova, Stella Gatziu Grivas, and Marc Schaaf. Scaling in cloud environments. Recent Researches in Computer Science, 2011.

[56] Jan Beran. Statistics for long-memory processes, volume 61. CRC Press, 1994.

[57] Marcello Maria Bersani, Domenico Bianculli, Schahram Dustdar, Alessio Gambi, Carlo Ghezzi, and Srdan Krstic. Towards the formalization of properties of cloud-based elastic systems. In Proceedings of the 6th International Workshop on Principles of Engineering Service-oriented Systems (PESOS 2014). ACM, 2014.

[58] Scott Besley and Eugene Brigham. Principles of finance. Cengage Learning, 2011.

[59] Carsten Binnig, Donald Kossmann, Tim Kraska, and Simon Loesing. How is the weather tomorrow?: towards a benchmark for the cloud. In Proceedings of the Second International Workshop on Testing Database Systems, page 9. ACM, 2009.

[60] Stephen M Blackburn, Robin Garner, Chris Hoffmann, Asjad M Khang, Kathryn S McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, et al. The dacapo benchmarks: Java benchmarking development and analysis. In ACM Sigplan Notices, volume 41, pages 169–190. ACM, 2006.

[61] Stephen M Blackburn, Kathryn S McKinley, Robin Garner, Chris Hoffmann, Asjad M Khan, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, et al. Wake up and smell the coffee: evaluation methodology for the 21st century. Communications of the ACM, 51(8):83–89, 2008.

[62] Peter Bodik, Armando Fox, Michael J Franklin, Michael I Jordan, and David A Patterson. Characterizing, modeling, and generating workload spikes for stateful services. In Proceedings of the 1st ACM symposium on Cloud computing, pages 241–252. ACM, 2010.

[63] Robert B Bohn, John Messina, Fang Liu, Jin Tong, and Jian Mao. Nist cloud computing reference architecture. In Services (SERVICES), 2011 IEEE World Congress on, pages 594–596. IEEE, 2011.

[64] Gunnar Brataas and Executive Board. Scalability management for cloud computing.

[65] Paul C Brebner. Is your cloud elastic enough?: performance modelling the elasticity of infrastructure as a service (iaas) cloud applications. In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pages 263–266. ACM, 2012.

[66] Lionel Briand, Khaled El Emam, and Sandro Morasca. Theoretical and empirical validation of software product measures. International Software Engineering Research Network, Technical Report ISERN-95-03, 1995.

[67] Gerard Briscoe and Alexandros Marinos. Digital ecosystems in the clouds: towards community cloud computing. In Digital Ecosystems and Technologies, 2009. DEST’09. 3rd IEEE International Conference on, pages 103–108. IEEE, 2009.

[68] Rajkumar Buyya, James Broberg, and Andrzej M Goscinski. Cloud computing: principles and paradigms, volume 87. John Wiley & Sons, 2010.

[69] Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. Market-oriented cloud computing: Vision, hype, and reality for delivering it services as computing utilities. In High Performance Computing and Communications, 2008. HPCC’08. 10th IEEE International Conference on, pages 5–13. IEEE, 2008.

[70] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging it platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.

[71] Maria Carla Calzarossa and Salvatore Tucci. Performance Evaluation of Complex Systems: Techniques and Tools: Performance 2002. Tutorial Lectures, volume 2459. Springer Science & Business Media, 2002.

[72] Nicholas G Carr. The big switch: Rewiring the world, from Edison to Google. WW Norton & Company, 2008.

[73] Davide Cerotti, Marco Gribaudo, Pietro Piazzolla, and Giuseppe Serazzi. Flexible cpu provisioning in clouds: A new source of performance unpredictability. In Quantitative Evaluation of Systems (QEST), 2012 Ninth International Conference on, pages 230–237. IEEE, 2012.

[74] K Chandrasekaran. Essentials of cloud computing. CRC Press, 2014.

[75] Qingwen Chen. Towards energy-aware vm scheduling in iaas clouds through empirical studies. Master’s thesis, University of Amsterdam, Netherlands, pages 10–11, 2011.

[76] Qingwen Chen, Paola Grosso, Karel van der Veldt, Cees De Laat, Rutger Hofman, and Henri Bal. Profiling energy consumption of vms for green cloud computing. In Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth International Conference on, pages 768–775. IEEE, 2011.

[77] David Chiu. Elasticity in the cloud. Crossroads, 16(3):3–4, 2010.

[78] Reuven Cohen. Defining Elastic Computing. , 2013. [Online; accessed 26-Jan-2016].

[79] Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154. ACM, 2010.

[80] Georgiana Copil, Demetris Trihinas, Hong-Linh Truong, Daniel Moldovan, George Pallis, Schahram Dustdar, and Marios Dikaiakos. Advise–a framework for evaluating cloud service elasticity behavior. In Service-Oriented Computing, pages 275–290. Springer, 2014.

[81] E Coutinho, Danielo G Gomes, and JD Souza. On applying microeconomics concepts to cloud elasticity evaluation. In XIII Workshop em Clouds e Aplicacoes (WCGA2015), 2015.

[82] Emanuel F Coutinho, Paulo AL Rego, Danielo G Gomes, and José N de Souza. Physics and microeconomics-based metrics for evaluating cloud computing elasticity. Journal of Network and Computer Applications, 63:159–172, 2016.

[83] Emanuel Ferreira Coutinho, Danielo Gonçalves Gomes, and Jose Neuman de Souza. An analysis of elasticity in cloud computing environments based on allocation time and resources. In Cloud Computing and Communications (LatinCloud), 2nd IEEE Latin American Conference on, pages 7–12. IEEE, 2013.

[84] MJ Crawley. The R Book. 2007. Imperial College London at Silwood Park, UK, pages 527–528.

[85] Sudipto Das. Scalable and elastic transactional data stores for cloud computing platforms. PhD dissertation, University of California, Santa Barbara, 2011.

[86] Anthony Christopher Davison and David Victor Hinkley. Bootstrap methods and their application, volume 1. Cambridge university press, 1997.

[87] AL de Cerqueira Leite Duboc. A framework for the characterization and analysis of software systems scalability. PhD dissertation, UCL (University College London), 2010.

[88] Jiang Dejun, Guillaume Pierre, and Chi-Hung Chi. Ec2 performance analysis for resource provisioning of service-oriented applications. In Service-Oriented Computing. ICSOC/ServiceWave 2009 Workshops, pages 197–207. Springer, 2010.

[89] Richard A DeMillo and Richard J Lipton. Software project forecasting. Cambridge, MA: MIT Press, 1981.

[90] Hepu Deng. Multicriteria analysis with fuzzy pairwise comparison. In Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE’99. 1999 IEEE International, volume 2, pages 726–731. IEEE, 1999.

[91] Mohit Dhingra, J Lakshmi, SK Nandy, Chiranjib Bhattacharyya, and Kanchi Gopinath. Elastic resources framework in iaas, preserving performance slas. In 2013 IEEE Sixth International Conference on Cloud Computing, pages 430–437. IEEE, 2013.

[92] Thibault Dory. Study and Comparison of Elastic Cloud Databases: Myth or Reality? Master’s thesis, Université Catholique de Louvain, 2011.

[93] Thibault Dory, Boris Mejías, Peter Van Roy, and Nam-Luc Tran. Comparative elasticity and scalability measurements of cloud databases. In Proc of the 2nd ACM symposium on cloud computing (SoCC), volume 11, 2011.

[94] Thibault Dory, Boris Mejías, Peter Van Roy, and Nam-Luc Tran. Measuring elasticity for cloud databases. In CLOUD COMPUTING 2011, The Second International Conference on Cloud Computing, GRIDs, and Virtualization, pages 154–160, 2011.

[95] Leticia Duboc, David Rosenblum, and Tony Wicks. A framework for characterization and analysis of software system scalability. In Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, pages 375–384. ACM, 2007.

[96] Schahram Dustdar, Yike Guo, Benjamin Satzger, and Hong-Linh Truong. Principles of elastic processes. IEEE Internet Computing, 15(5):66–71, 2011.

[97] Jeremy Elson and Jon Howell. Handling flash crowds from your garage. In USENIX Annual Technical Conference, pages 171–184, 2008.

[98] E Evans and R Grossman. Cyber security and reliability in a digital cloud. US Department of Defense Science Board Study, 2013.

[99] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D Bowers, and Michael M Swift. More for your money: exploiting performance heterogeneity in public clouds. In Proceedings of the Third ACM Symposium on Cloud Computing, page 20. ACM, 2012.

[100] Norman Fenton. Software measurement: A necessary scientific basis. Software Engineering, IEEE Transactions on, 20(3):199–206, 1994.

[101] Norman Fenton and James Bieman. Software metrics: a rigorous and practical approach. CRC Press, 2014.

[102] Marc Fielding. Virtual CPUs with Amazon Web Services. , 2014. [Online; accessed 01-Jun-2015].

[103] Timothy Fitz. Cloud elasticity. , 2009. [Online; accessed 26-Jan-2016].

[104] Philip J Fleming and John J Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM, 29(3):218–221, 1986.

[105] Enno Folkerts, Alexander Alexandrov, Kai Sachs, Alexandru Iosup, Volker Markl, and Cafer Tosun. Benchmarking in the cloud: What it should, can, and cannot be. In Selected Topics in Performance Evaluation and Benchmarking, pages 173–188. Springer, 2013.

[106] Will Forrest and Charlie Barthold. Clearing the air on cloud computing. Discussion Document from McKinsey and Company, 2009.

[107] Ian Foster and Steven Tuecke. Describing the elephant: The different faces of it as service. Queue, 3(6):26–29, 2005.

[108] Ian Foster, Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008. GCE’08, pages 1–10. IEEE, 2008.

[109] Sören Frey, Florian Fittkau, and Wilhelm Hasselbring. Search-based genetic optimization for deployment and reconfiguration of software in the cloud. In Proceedings of the 2013 International Conference on Software Engineering, pages 512–521. IEEE Press, 2013.

[110] Guilherme Galante and Luis Carlos E de Bona. A survey on cloud computing elasticity. In Utility and Cloud Computing (UCC), 2012 IEEE Fifth International Conference on, pages 263–270. IEEE, 2012.

[111] Alessio Gambi, Antonio Filieri, and Schahram Dustdar. Iterative test suites refinement for elastic computing systems. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 635–638. ACM, 2013.

[112] Alessio Gambi, Waldemar Hummer, and Schahram Dustdar. Automated testing of cloud-based elastic systems with autocles. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 714–717. IEEE, 2013.

[113] Alessio Gambi, Waldemar Hummer, and Schahram Dustdar. Testing elastic systems with surrogate models. In Combining Modelling and Search-Based Software Engineering (CMSBSE), 2013 1st International Workshop on, pages 8–11. IEEE, 2013.

[114] Alessio Gambi, Giovanni Toffetti, Cesare Pautasso, and Mauro Pezze. Kriging controllers for cloud applications. Internet Computing, IEEE, 17(4):40–47, 2013.

[115] Jerry Gao, K Manjula, P Roopa, E Sumalatha, Xiaoying Bai, Wei-Tek Tsai, and Tadahiro Uehara. A cloud-based taas infrastructure with tools for saas validation, performance and scalability evaluation. In Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on, pages 464–471. IEEE, 2012.

[116] Jerry Gao, Pushkala Pattabhiraman, Xiaoying Bai, and Wei-Tek Tsai. Saas performance and scalability evaluation in clouds. In Service Oriented System Engineering (SOSE), 2011 IEEE 6th International Symposium on, pages 61–71. IEEE, 2011.

[117] Daniel F García and Javier García. Tpc-w e-commerce benchmark evaluation. Computer, 36(2):42–48, 2003.

[118] Saurabh Kumar Garg, Steve Versteeg, and Rajkumar Buyya. A framework for ranking of cloud computing services. Future Generation Computer Systems, 29(4):1012–1023, 2013.

[119] Saurabh Kumar Garg, Steven Versteeg, and Rajkumar Buyya. Smicloud: A framework for comparing and ranking cloud services. In Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on, pages 210–218. IEEE, 2011.

[120] Andy Georges. Three pitfalls in java performance evaluation. PhD dissertation, Ghent University, 2008.

[121] Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous java performance evaluation. ACM SIGPLAN Notices, 42(10):57–76, 2007.

[122] Zhenhuan Gong, Xiaohui Gu, and John Wilkes. Press: Predictive elastic resource scaling for cloud systems. In Network and Service Management (CNSM), 2010 International Conference on, pages 9–16. IEEE, 2010.

[123] Rui Han, Moustafa M Ghanem, Li Guo, Yike Guo, and Michelle Osmond. Enabling cost-aware and adaptive elasticity of multi-tier cloud applications. Future Generation Computer Systems, 32:82–98, 2014.

[124] Nikolas Herbst and Samuel Kounev. Limbo: a tool for modeling variable load intensities. In Proceedings of the 5th ACM/SPEC international conference on Performance engineering, pages 225–226. ACM, 2014.

[125] Nikolas Roman Herbst, Samuel Kounev, and Ralf Reussner. Elasticity in cloud computing: What it is, and what it is not. In ICAC, pages 23–27, 2013.

[126] Nikolas Roman Herbst, Samuel Kounev, Andreas Weber, and Henning Groenda. Bungee: An elasticity benchmark for self-adaptive iaas cloud environments. In Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS 2015), 2015.

[127] Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, and Marty Humphrey. Early observations on the performance of windows azure. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 367–376. ACM, 2010.

[128] Jerry L Hintze and Ray D Nelson. Violin plots: a box plot-density trace synergism. The American Statistician, 52(2):181–184, 1998.

[129] Ricky Ho. Between Elasticity and Scalability. , 2009. [Online; accessed 26-Jan-2016].

[130] Ricky Ho. Between elasticity and scalability. , 2009. [Online; accessed 2016-03-09].

[131] Karl Huppler. The art of building a good benchmark. In Performance Evaluation and Benchmarking, pages 18–30. Springer, 2009.

[132] Karl Huppler. Benchmarking with your head in the cloud. In Topics in Performance Evaluation, Measurement and Characterization, pages 97–110. Springer, 2012.

[133] Espen AF Ihlen. Introduction to multifractal detrended fluctuation analysis in matlab. Frontiers in physiology, 3, 2012.

[134] INRIA. Fraclab: A fractal analysis toolbox for signal and image processing.

[135] Alexandru Iosup, Simon Ostermann, M Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick HJ Epema. Performance analysis of cloud computing services for many-tasks scientific computing. Parallel and Distributed Systems, IEEE Transactions on, 22(6):931–945, 2011.

[136] Alexandru Iosup, Nezih Yigitbasi, and Dick Epema. On the performance variability of production cloud services. In Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on, pages 104–113. IEEE, 2011.

[137] Sadeka Islam, Jacky Keung, Kevin Lee, and Anna Liu. An empirical study into adaptive resource provisioning in the cloud. In IEEE International Conference on Utility and Cloud Computing (UCC 2010), page 8, 2010.

[138] Sadeka Islam, Jacky Keung, Kevin Lee, and Anna Liu. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems, 28(1):155–162, 2012.

[139] Sadeka Islam, Kevin Lee, Alan Fekete, and Anna Liu. How a consumer can measure elasticity for cloud platforms. In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pages 85–96. ACM, 2012.

[140] Sadeka Islam, Srikumar Venugopal, and Anna Liu. Evaluating the impact of fine-scale burstiness on cloud elasticity. In Proceedings of the Sixth ACM Symposium on Cloud Computing, pages 250–261. ACM, 2015.

[141] PP Jogalekar and CM Woodside. A scalability metric for distributed computing applications in telecommunications. Teletraffic Science and Engineering, 2:101–110, 1997.

[142] Tomas Kalibera and Richard Jones. Rigorous benchmarking in reasonable time. In ACM SIGPLAN Notices, volume 48, pages 63–74. ACM, 2013.

[143] Tomas Kalibera and Richard E Jones. Quantifying performance changes with effect size confidence intervals. Technical report, Technical Report 4–12, University of Kent, 2012.

[144] Sahil Kansal, Gurjeet Singh, Harish Kumar, and Sakshi Kaushal. Pricing models in cloud computing. In Proceedings of the 2014 International Conference on Information and Communication Technology for Competitive Strategies, page 33. ACM, 2014.

[145] Pankaj Deep Kaur and Inderveer Chana. A resource elasticity framework for qos-aware execution of cloud applications. Future Generation Computer Systems, 37:14–25, 2014.

[146] Jóakim Gunnarsson v Kistowski. Modeling Variations in Load Intensity Profiles. Master’s thesis, National Research Center, 2014.

[147] Barbara Kitchenham, Shari Lawrence Pfleeger, and Norman Fenton. Towards a framework for software measurement validation. Software Engineering, IEEE Transactions on, 21(12):929–944, 1995.

[148] Markus Klems, David Bermbach, and Rene Weinert. A runtime quality measurement framework for service systems. In Quality of Information and Communications Technology (QUATIC), 2012 Eighth International Conference on the, pages 38–46. IEEE, 2012.

[149] Andrea Knezevic. Overlapping confidence intervals and statistical significance. StatNews: Cornell University Statistical Consulting Unit, 73, 2008.

[150] Ioannis Konstantinou, Evangelos Angelou, Christina Boumpouka, Dimitrios Tsoumakos, and Nectarios Koziris. On the elasticity of nosql databases over cloud management platforms. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 2385–2388. ACM, 2011.

[151] Donald Kossmann, Tim Kraska, and Simon Loesing. An evaluation of alternative architectures for transaction processing in the cloud. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 579–590. ACM, 2010.

[152] George Kousiouris, Tommaso Cucinotta, and Theodora Varvarigou. The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks. Journal of Systems and Software, 84(8):1270–1291, 2011.

[153] Rouven Krebs, Christof Momm, and Samuel Kounev. Metrics and techniques for quantifying performance isolation in cloud environments. Science of Computer Programming, 90:116–134, 2014.

[154] Stefan Krompass, Daniel Gmach, Andreas Scholz, Stefan Seltzsam, and Alfons Kemper. Quality of service enabled database applications. In Service-Oriented Computing–ICSOC 2006, pages 215–226. Springer, 2006.

[155] Jörn Kuhlenkamp, Markus Klems, and Oliver Röss. Benchmarking scalability and elasticity of distributed database systems. Proceedings of the VLDB Endowment, 7(13), 2014.

[156] Michael Kuperberg, Nikolas Herbst, Joakim von Kistowski, and Ralf Reussner. Defining and quantifying elasticity of resources in cloud computing and scalable platforms. 2011.

[157] Jonathan Kupferman, Jeff Silverman, Patricio Jara, and Jeff Browne. Scaling into the cloud. CS270-advanced operating systems, 2009.

[158] Wing-Cheong Lau, Ashok Erramilli, Jonathan L Wang, and Walter Willinger. Self-similar traffic generation: The random midpoint displacement algorithm and its properties. In Communications, 1995. ICC’95 Seattle, ’Gateway to Globalization’, 1995 IEEE International Conference on, volume 1, pages 466–472. IEEE, 1995.

[159] Sebastian Lehrig and Matthias Becker. Approaching the cloud: Using palladio for scalability, elasticity, and efficiency analyses. In Proceedings of the Symposium on Software Performance, pages 26–28, 2014.

[160] Sebastian Lehrig, Hendrik Eikerling, and Steffen Becker. Scalability, elasticity, and efficiency in cloud computing: a systematic literature review of definitions and metrics. In Proceedings of the 11th International ACM SIGSOFT Conference on Quality of Software Architectures, pages 83–92. ACM, 2015.

[161] Alexander Lenk, Markus Klems, Jens Nimis, Stefan Tai, and Thomas Sandholm. What’s inside the cloud? an architectural map of the cloud landscape. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, pages 23–31. IEEE Computer Society, 2009.

[162] Alexander Lenk, Michael Menzel, Johannes Lipsky, Stefan Tai, and Philipp Offermann. What are you paying for? performance benchmarking for infrastructure-as-a-service offerings. In Cloud Computing (CLOUD), 2011 IEEE International Conference on, pages 484–491. IEEE, 2011.

[163] Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. Cloudcmp: comparing public cloud providers. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pages 1–14. ACM, 2010.

[164] Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. Cloudcmp: shopping for a cloud made easy. USENIX HotCloud, pages 5–5, 2010.

[165] Zheng Li, Liam O’Brien, and He Zhang. Ceem: A practical methodology for cloud services evaluation. In Services (SERVICES), 2013 IEEE Ninth World Congress on, pages 44–51. IEEE, 2013.

[166] Zheng Li, Liam O’Brien, He Zhang, and Rainbow Cai. On a catalogue of metrics for evaluating commercial cloud services. In Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing, pages 164–173. IEEE Computer Society, 2012.

[167] Harold C Lim, Shivnath Babu, and Jeffrey S Chase. Automated control for elastic storage. In Proceedings of the 7th international conference on Autonomic computing, pages 1–10. ACM, 2010.

[168] Harold C Lim, Shivnath Babu, Jeffrey S Chase, and Sujay S Parekh. Automated control in cloud computing: challenges and opportunities. In Proceedings of the 1st workshop on Automated control for datacenters and clouds, pages 13–18. ACM, 2009.

[169] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and Dawn Leaf. Nist cloud computing reference architecture. NIST special publication, 500(2011):292, 2011.

[170] Tania Lorido-Botran, Jose Miguel-Alonso, and Jose A Lozano. A review of auto-scaling techniques for elastic applications in cloud environments. Journal of Grid Computing, 12(4):559–592, 2014.

[171] Tania Lorido-Botrán, José Miguel-Alonso, and Jose Antonio Lozano. Auto-scaling techniques for elastic applications in cloud environments. Department of Computer Architecture and Technology, University of Basque Country, Tech. Rep. EHU-KAT- IK-09, 12:2012, 2012.

[172] Mika Majakorpi. Theory and practice of rapid elasticity in cloud applications. MSc thesis, University of Helsinki, 2013.

[173] Ming Mao and Marty Humphrey. A performance study on the vm startup time in the cloud. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, pages 423–430. IEEE, 2012.

[174] Paul Marshall, Kate Keahey, and Tim Freeman. Elastic site: Using clouds to elastically extend site resources. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pages 43–52. IEEE Computer Society, 2010.

[175] Robert L Mason, Richard F Gunst, and James L Hess. Statistical principles in experimental design. Statistical Design and Analysis of Experiments: With Applications to Engineering and Science, Second Edition, pages 107–139, 2003.

[176] Peter Mell and Tim Grance. Draft nist working definition of cloud computing. Referenced on June. 3rd, 15:32, 2009.

[177] Peter Mell and Tim Grance. The nist definition of cloud computing. Communications of the ACM, 53(6):50, 2010.

[178] Peter Mell and Tim Grance. The nist definition of cloud computing. 2011.

[179] Daniel Menascé, Virgílio Almeida, Rudolf Riedi, Flávia Ribeiro, Rodrigo Fonseca, and Wagner Meira Jr. In search of invariants for e-business workloads. In Proceedings of the 2nd ACM conference on Electronic commerce, pages 56–65. ACM, 2000.

[180] Daniel A Menascé, Virgilio AF Almeida, Rodrigo Fonseca, and Marco A Mendes. A methodology for workload characterization of e-commerce sites. In Proceedings of the 1st ACM conference on Electronic commerce, pages 119–128. ACM, 1999.

[181] Daniel A Menascé, Virgilio AF Almeida, Rudolf Riedi, Flávia Ribeiro, Rodrigo Fonseca, and Wagner Meira Jr. A hierarchical and multiscale approach to analyze e-business workloads. Performance Evaluation, 54(1):33–57, 2003.

[182] Andrew Meneely, Ben Smith, and Laurie Williams. Validating software metrics: A spectrum of philosophies. ACM Transactions on Software Engineering and Methodology (TOSEM), 21(4):24, 2012.

[183] Ningfang Mi, Giuliano Casale, Ludmila Cherkasova, and Evgenia Smirni. Injecting realistic burstiness to a traditional client-server benchmark. In Proceedings of the 6th international conference on Autonomic computing, pages 149–158. ACM, 2009.

[184] Umar Farooq Minhas, Rui Liu, Ashraf Aboulnaga, Kenneth Salem, Jonathan Ng, and Sean Robertson. Elastic scale-out for partition-based database systems. In Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on, pages 281–288. IEEE, 2012.

[185] OCDA Master Usage Model. Compute infrastructure as a service. Technical report, Open Data Center Alliance (OCDA), 2012. Available at www.opendatacenteralliance.org/docs/ODCA_Compute_IaaS_MasterUM_v1.0_Nov2012.pdf.

[186] Daniel Moldovan, Georgiana Copil, Hong-Linh Truong, and Schahram Dustdar. Mela: Monitoring and analyzing elasticity of cloud services. In Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on, volume 1, pages 80–87. IEEE, 2013.

[187] Douglas C Montgomery. Design and analysis of experiments. John Wiley & Sons, 2008.

[188] Jean-François Muzy, Emmanuel Bacry, and Alain Arneodo. Multifractal formalism for fractal signals: The structure-function approach versus the wavelet-transform modulus-maxima method. Physical review E, 47(2):875, 1993.

[189] Todd Mytkowicz. Supporting experiments in computer systems research. PhD dissertation, University of Colorado, 2010.

[190] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F Sweeney. Producing wrong data without doing anything obviously wrong! ACM Sigplan Notices, 44(3):265–276, 2009.

[191] Fiona Fui-Hoon Nah. A study on tolerable waiting time: how long are web users willing to wait? Behaviour & Information Technology, 23(3):153–163, 2004.

[192] Raghunath Nambiar and Meikel Poess. Selected Topics in Performance Evaluation and Benchmarking: 4th TPC Technology Conference, TPCTC 2012, Istanbul, Turkey, August 27, 2012, Revised Selected Papers, volume 7755. Springer, 2013.

[193] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: managing performance interference effects for qos-aware clouds. In Proceedings of the 5th European conference on Computer systems, pages 237–250. ACM, 2010.

[194] Dmitri Nevedrov. Using jmeter to performance test web services. Published on dev2dev (dev2dev.bea.com/), 2006.

[195] Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, and John Wilkes. Agile: Elastic distributed resource scaling for infrastructure-as-a-service. In Proc. of the USENIX International Conference on Automated Computing (ICAC’13). San Jose, CA, 2013.

[196] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. An early performance analysis of cloud computing services for scientific computing. Delft University of Technology, Tech. Rep, 2008.

[197] Pradeep Padala, Kai-Yuan Hou, Kang G Shin, Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal, and Arif Merchant. Automated control of multiple virtualized resources. In Proceedings of the 4th ACM European conference on Computer systems, pages 13–26. ACM, 2009.

[198] Romain-François Peltier, Jacques Lévy Véhel, et al. Multifractional brownian motion: definition and preliminary results. 1995.

[199] Steve Pincus and Burton H Singer. Randomness and degrees of irregularity. Proceedings of the National Academy of Sciences, 93(5):2083–2088, 1996.

[200] Daryl Plummer. Cloud Elasticity Could Make You Go Broke. , 2009. [Online; accessed 26-Jan-2016].

[201] Daryl C Plummer, Thomas J Bittman, Tom Austin, David W Cearley, and David Mitchell Smith. Cloud computing: Defining and describing an emerging phenomenon. Gartner, June, 17, 2008.

[202] Jane Radatz, Anne Geraci, and Freny Katki. Ieee standard glossary of software engineering terminology. IEEE Std, 610121990(121990):3, 1990.

[203] Jia Rao, Xiangping Bu, Cheng-Zhong Xu, Leyi Wang, and George Yin. Vconf: a reinforcement learning approach to virtual machines auto-configuration. In Proceedings of the 6th international conference on Autonomic computing, pages 137–146. ACM, 2009.

[204] Stefan Ried, Holger Kisker, and Pascal Matzke. The evolution of cloud computing markets. Forrester Research, 2010.

[205] Rudolf H Riedi. Multifractal processes. Technical report, DTIC Document, 1999.

[206] Sherif Sakr and Anna Liu. Sla-based and consumer-centric dynamic provisioning for cloud databases. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, pages 360–367. IEEE, 2012.

[207] Sherif Sakr and Anna Liu. Is your cloud-hosted database truly elastic? In Services (SERVICES), 2013 IEEE Ninth World Congress on, pages 444–447. IEEE, 2013.

[208] Cliff Saran. Preparing for mainstream cloud it. , 2015. [Online; accessed 2016-03-09].

[209] Dietmar Saupe. Algorithms for random fractals. In The science of fractal images, pages 71–136. Springer, 1988.

[210] Jörg Schad, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. Runtime measurements in the cloud: observing, analyzing, and reducing variance. Proceedings of the VLDB Endowment, 3(1-2):460–471, 2010.

[211] Edwin Schouten. Rapid elasticity and the cloud. , 2012. [Online; accessed 26-Jan-2016].

[212] Mahnaz Shams, Diwakar Krishnamurthy, and Behrouz Far. A model-based approach for testing the performance of web applications. In Proceedings of the 3rd international workshop on Software quality assurance, pages 54–61. ACM, 2006.

[213] Doaa M Shawky and Ahmed F Ali. Defining a measure of cloud computing elasticity. In Systems and Computer Science (ICSCS), 2012 1st International Conference on, pages 1–5. IEEE, 2012.

[214] Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, and John Wilkes. Cloudscale: elastic resource scaling for multi-tenant cloud systems. In Proceedings of the 2nd ACM Symposium on Cloud Computing, page 5. ACM, 2011.

[215] Rhodney Simões and Carlos Kamienski. Elasticity management in private and hybrid clouds. In 2014 IEEE 7th International Conference on Cloud Computing, pages 793–800. IEEE, 2014.

[216] Matthew Sladescu, Alan Fekete, Kevin Lee, and Anna Liu. Event aware workload prediction: A study using auction events. Springer, 2012.

[217] Basem Suleiman. Elasticity economics of cloud-based applications. In Services Computing (SCC), 2012 IEEE Ninth International Conference on, pages 694–695. IEEE, 2012.

[218] Basem Suleiman, Sherif Sakr, Ross Jeffery, and Anna Liu. On understanding the economics and elasticity challenges of deploying business applications on public cloud infrastructure. Journal of Internet Services and Applications, 3(2):173–193, 2012.

[219] Basem Suleiman, Sherif Sakr, Srikumar Venugopal, and Wasim Sadiq. Trade-off analysis of elasticity approaches for cloud-based business applications. In Web Information Systems Engineering-WISE 2012, pages 468–482. Springer, 2012.

[220] Basem Suleiman and Srikumar Venugopal. Modeling performance of elasticity rules for cloud-based applications. In Enterprise Distributed Object Computing Conference (EDOC), 2013 17th IEEE International, pages 201–206. IEEE, 2013.

[221] Abhisar Swami. Cloud Elasticity or Scalability, What’s the Difference? , 2011. [Online; accessed 26-Jan-2016].

[222] Christian Tinnefeld, Daniel Taschik, and Hasso Plattner. Providing high-availability and elasticity for an in-memory database system with ramcloud. In GI-Jahrestagung, pages 472–486, 2013.

[223] Christian Tinnefeld, Daniel Taschik, and Hasso Plattner. Quantifying the elasticity of a database management system. In DBKDA 2014, The Sixth International Conference on Advances in Databases, Knowledge, and Data Applications, pages 125–131, 2014.

[224] Claude Tricot. Curves and fractal dimension. Springer, 1995.

[225] Wei-Tek Tsai, Yu Huang, and Qihong Shao. Testing the scalability of saas applications. In Service-Oriented Computing and Applications (SOCA), 2011 IEEE International Conference on, pages 1–4. IEEE, 2011.

[226] Markus Ullrich and Jorg Lassig. Current challenges and approaches for resource demand estimation in the cloud. In Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on, pages 387–394. IEEE, 2013.

[227] Guido Urdaneta, Guillaume Pierre, and Maarten van Steen. Wikipedia workload analysis for decentralized hosting. Elsevier Computer Networks, 53(11):1830–1845, July 2009.

[228] Bhuvan Urgaonkar, Prashant Shenoy, Abhishek Chandra, Pawan Goyal, and Timothy Wood. Agile dynamic provisioning of multi-tier internet applications. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 3(1):1, 2008.

[229] Udaykiran Vallamsetty, Krishna Kant, and Prasant Mohapatra. Characterization of e-commerce traffic. Electronic Commerce Research, 3(1-2):167–192, 2003.

[230] André Van Hoorn, Matthias Rohr, and Wilhelm Hasselbring. Generating probabilistic and intensity-varying workload for web-based software systems. Performance Evaluation: Metrics, Models and Benchmarks, pages 124–143, 2008.

[231] Rini Van Solingen, Vic Basili, Gianluigi Caldiera, and H Dieter Rombach. Goal question metric (gqm) approach. Encyclopedia of software engineering, 2002.

[232] Luis M Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39(1):50–55, 2008.

[233] Jinesh Varia. Architecting for the cloud: Best practices. Amazon Web Services, 2010.

[234] Martti Vasar, Satish Narayana Srirama, and Marlon Dumas. Framework for monitoring and testing web application scalability on the cloud. In Proceedings of the WICSA/ECSA 2012 Companion Volume, pages 53–60. ACM, 2012.

[235] Qingyang Wang, Yasuhiko Kanemasa, Jie Li, Danushka Jayasinghe, Motoyuki Kawaba, and Calton Pu. Response time reliability in cloud environments: an empirical study of n-tier applications at high resource utilization. In Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on, pages 378–383. IEEE, 2012.

[236] Andreas Weber. Resource elasticity benchmarking in cloud environments. Master’s thesis, Department of Informatics, KIT, 2014.

[237] Andreas Weber, Nikolas Roman Herbst, Henning Groenda, and Samuel Kounev. Towards a resource elasticity benchmark for cloud environments. In 2nd International Workshop on Hot Topics in Cloud Service Scalability (HotTopiCS 2014). ACM (March 2014), 2014.

[238] Christof Weinhardt, Dipl-Inform-Wirt Arun Anandasivam, Benjamin Blau, Dipl- Inform Nikolay Borissov, Dipl-Math Thomas Meinl, Dipl-Inform-Wirt Wibke Michalk, and Jochen Stößer. Cloud computing–a classification, business models, and research directions. Business & Information Systems Engineering, 1(5):391– 399, 2009.

[239] Joe Weinman. Time is money: the value of on-demand, January 2011. Working paper, available from www.joeweinman.com (last retrieved on February 28, 2012).

[240] Charles B Weinstock and John B Goodenough. On system scalability. Technical report, DTIC Document, 2006.

[241] Dan Williams, Hani Jamjoom, Yew-Huey Liu, and Hakim Weatherspoon. Overdriver: Handling memory overload in an oversubscribed cloud. In ACM SIGPLAN Notices, volume 46, pages 205–216. ACM, 2011.

[242] Walter Willinger, Murad S Taqqu, Will E Leland, and Daniel V Wilson. Self-similarity in high-speed packet traffic: analysis and modeling of traffic measurements. Statistical science, pages 67–85, 1995.

[243] Rich Wolski. Cloud Computing and Open Source: Watching Hype meet Reality. , 2011. [Online; accessed 26-Jan-2016].

[244] Cathy H Xia, Zhen Liu, Mark S Squillante, Li Zhang, and Naceur Malouch. Web traffic modeling at finer time scales and performance implications. Performance Evaluation, 61(2):181–201, 2005.

[245] Cheng-Zhong Xu, Jia Rao, and Xiangping Bu. Url: A unified reinforcement learning approach for autonomic cloud management. Journal of Parallel and Distributed Computing, 72(2):95–105, 2012.

[246] Yunjing Xu. Characterizing and Mitigating Virtual Machine Interference in Public Clouds. PhD dissertation, University of Michigan, 2014.

[247] Jianwei Yin, Xingjian Lu, Hanwei Chen, Xinkui Zhao, and Neal N Xiong. System resource utilization analysis and prediction for cloud based applications under bursty workloads. Information Sciences, 279:338–357, 2014.

[248] Qi Zhang, Ludmila Cherkasova, and Evgenia Smirni. A regression-based analytic model for dynamic resource provisioning of multi-tier applications. In Autonomic Computing, 2007. ICAC’07. Fourth International Conference on, pages 27–27. IEEE, 2007.

[249] Qian Zhu and Gagan Agrawal. Resource provisioning with budget constraints for adaptive applications in cloud environments. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 304–307. ACM, 2010.

Appendix A

Elasticity definition

• “In cloud computing, elasticity is a term used to reference the ability of a system to adapt to changing workload demand by provisioning and de-provisioning pooled resources so that provisioned resources match current demand as well as possible.” (Cloud computing IT Glossary [12])

• “In the service provider view, cloud service elasticity is the ability to increase or decrease the amount of system capacity (for example, CPU, storage, memory and input/output bandwidth) that is available for a given cloud service on demand, in an automated fashion. This gives their customers the perception of unlimited capacity. From the consumer and RTI perspectives, cloud service elasticity is an automated means to increase or decrease a specific service capacity in response to increasing or scheduled demand changes.” (Gartner’s IT Glossary [13])

• “Cloud Computing’s ability to add or remove resources at a fine grain (one server at a time with EC2) and with a lead time of minutes rather than weeks allows matching resources to workload much more closely.” (Armbrust et al. [48])

• “Elasticity, i.e. the ability to deal with load variations by adding more resources during high load or consolidating the tenants to fewer nodes when the load decreases, all in a live system without service disruption, is therefore critical for these systems.” (Agrawal et al. [40])

• “near 100% resource elasticity would hint towards a high density of reconfiguration points, low reaction time and small effects.” (Kuperberg et al. [156])

• “Scale is an aspect of performance and the ability to support customer needs. The concept of elasticity is related to the ability to support those needs in large or small scale at will. The key issue with elasticity is the ability for a system to scale both in an upward direction (for example, to millions of users) and in a downward direction (for example, to one user) without disrupting the economics of the business model associated with the cloud service.” (Plummer et al. [201])

• “Elasticity: While scale-out provides the ability to have large systems, elasticity means that we can add more capacity to a running system by deploying new instances of each component, and shifting load to them.” (Cooper et al. [79])

• “The ability of an application deployment on a cloud platform to change in size dynamically at runtime is referred to as elasticity or rapid elasticity.” (Majakorpi [172])

• “Elasticity is defined in terms of how much a Cloud service can be scaled during peak times. This is defined by two attributes: mean time taken to expand or contract the service capacity, and maximum capacity of service. The capacity is the maximum number of compute unit which can be provided at peak times.” (Garg et al. [119, 118])

• “In cloud environments, resources are virtualized. This virtualization enables elasticity of the cloud, meaning that the cloud can easily and quickly be resized to adjust to a variable workload.” (Bellenger et al. [55])

• “With the advent of provisioning information technology infrastructure over the Internet, the aspect of elasticity became more important as it defines how well a system adapts to a changing workload.” (Tinnefeld et al. [223])

• “An important goal for database systems today is to provide elastic scale-out, i.e., the ability to grow and shrink processing capacity on demand, with varying load.” (Minhas et al. [184])

• “Elasticity is the degree a cloud layer autonomously adapts capacity to workload over time.” (Lehrig et al. [160])

• Elasticity is “the ability to expand or contract resources in order to meet the exact demand.” (Konstantinou et al. [150])

• “The elasticity is a characterization of how a cluster reacts when new nodes are added or removed under load. It is defined by two properties. First, the time needed for the cluster to stabilize and second the impact on performance. To measure the time for stabilization, it is mandatory to characterize the stability of a cluster, and therefore a measure of the variation in performance is needed.” (Dory et al. [94])

• Elasticity “represents the ability to dynamically and rapidly scale up or down the allocated computing resources on demand.” (Sakr and Liu [207])

• “Elastic computing systems can dynamically scale to continuously and cost-effectively provide their required Quality of Service in face of time-varying workloads, and they are usually implemented in the cloud.” (Gambi et al. [111])

• “Elasticity of a cloud computing system refers to its ability to expand and contract overtime in response to users’ demands.” (Shawky et al. [213])

• “we define elasticity as the impact on performance of new nodes bootstrapped while the same load is applied until stabilization.” (Dory et al. [93])

• “Elasticity measures how efficient a system can be scaled at runtime, in terms of scaling speed and performance impact on the concurrent workloads.” (Kuhlenkamp et al. [155])

• “Cloud-based elastic computing systems dynamically change their resources allocation to provide consistent quality of service and minimal usage of resources in the face of workload fluctuations.” (Gambi et al. [112])

• “Resource Elasticity is thus referred to as the process of allocating sufficient amount of computational resources to the application so that the QoS requirements are always met in response to the changing user load.” (Kaur and Chana [145])

• “One of many advantages of the cloud is the elasticity, the ability to dynamically acquire or release computing resources in response to demand.” (Mao and Humphrey [173])

• “Elastic systems dynamically change their resources allocation to provide consistent quality of service in face of workload fluctuations. However, their ability to adapt could be a double-edged sword if not properly designed: They may fail to acquire the right amount of resources or even fail to release them” (Gambi et al. [113])

• “Elasticity is a critical mesaurement: the time it takes to start a node up, and your minimum time commitment per node.” And “there are two important measures of cloud elasticity: spin-up elasticity and spin-down elasticity. Spin-up elasticity is the time between requesting compute power and recieving it. Spin-down elasticity is the time between no longer requiring compute power and no longer paying for it. In the case of EC2 these numbers aren’t balanced, it’s a minute to spin up and up to an hour to spin down. EC2’s true elasticity is an hour!” (Timothy Fitz [103])

• “elasticity is the ability to instantly commission and decommission large amount of resource capacity on the fly, and then charge purely based on the actual resource usage. Elasticity is quantitatively measured by - Speed of commissioning / decommissioning, Max amount of resource can be brought in, Granularity of usage accounting” (Ricky Ho [129])

• “A simple, but interesting property in utility models is elasticity, that is, the ability to stretch and contract services directly according to the consumer’s needs.” (David Chiu, Editor, CrossRoad [77])

• “Elasticity is basically a ‘rename’ of scalability [. . . ].” And “Elasticity is [. . . ] more like a rubber band. You ‘stretch’ the capacity when you need it and ‘release’ it when you don’t anymore.” (Edwin Schouten, IBM, Thoughts on Cloud [211])

• “In the cloud, because you have this idea of elasticity, where you can scale up your compute resources when you need them, and scale them back down, obviously that adds another dimension to old-school capacity planning.” (Neil Ashizawa, HP [49])

• Elasticity is “the quantifiable ability to manage, measure, predict and adapt responsiveness of an application based on real time demands placed on an infrastructure using a combination of local and remote computing resources.” (Reuven Cohen [78])

• “Elasticity measures the ability of the cloud to map a single user request to different resources.” (Rich Wolski, CTO, [243])

• “[. . . ] defines elasticity as the configurability and expandability of the solution [. . . ] Centrally, it is the ability to scale up and scale down capacity based on subscriber workload.” (OCDA, Compute Infrastructure as a Service [185])

• “Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.” (Mell & Grance, NIST [178])

• “Elasticity is the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.” (Herbst et al. [125])

Appendix B

Elasticity behavior of standard workloads

This chapter presents the details of the standard workload suite and the EC2 platform's elasticity behavior in response to each individual workload of this suite.

B.1 Standard workload suite

This section depicts the workloads of the standard workload suite: sinusoidal workloads (Fig. B.1), sinusoidal workloads with plateau (Fig. B.2), exponential workloads (Fig. B.3), linear workload (Fig. B.4) and random workload (Fig. B.5).
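As a rough illustration only, request-rate profiles of these shapes can be synthesized with a few lines of code. The sketch below is not the workload generator used by the benchmark, and the base rates, amplitudes, plateau approximation and the meaning assigned to the numeric parameters are assumptions made purely for illustration.

```python
import numpy as np

def sinusoidal(duration_min, period_min, base=15000, amplitude=10000):
    """Sinusoidal request rate (cf. the sine_* workloads); one sample per minute."""
    t = np.arange(duration_min)
    return base + amplitude * np.sin(2 * np.pi * t / period_min)

def sinusoidal_with_plateau(duration_min, period_min, plateau_min,
                            base=15000, amplitude=10000):
    """Sine wave whose peaks and troughs are held roughly flat for plateau_min
    minutes (cf. the sine_plateau_* workloads), approximated here by clipping."""
    t = np.arange(duration_min)
    raw = np.sin(2 * np.pi * t / period_min)
    clip = np.cos(np.pi * plateau_min / period_min)  # level at which the sine flattens
    return base + amplitude * np.clip(raw, -clip, clip) / clip

def exponential(duration_min, doubling_min, base=1000):
    """Exponentially growing request rate (cf. the exp_* workloads)."""
    t = np.arange(duration_min)
    return base * 2.0 ** (t / doubling_min)

def linear(duration_min, start=1000, end=30000):
    """Linearly increasing request rate (cf. linear_240)."""
    return np.linspace(start, end, duration_min)

def random_workload(duration_min, base=15000, step=2000, seed=42):
    """Random workload: a non-negative random walk around a base rate."""
    rng = np.random.default_rng(seed)
    increments = rng.integers(-step, step + 1, size=duration_min)
    return np.clip(base + np.cumsum(increments), 0, None)
```

For example, `sinusoidal(duration_min=100, period_min=30)` produces a minute-by-minute request-rate trace of the same general shape as the sine_30 profile shown in Fig. B.1(a).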


(a) sine_30

(b) sine_60

(c) sine_90

Figure B.1: Sinusoidal workloads

(a) sine_plateau_10

(b) sine_plateau_40

(c) sine_plateau_70

Figure B.2: Sinusoidal workloads with plateau

(a) exp_18_2.25

(b) exp_24_3.0

Figure B.3: Exponential workloads

Figure B.4: Linear workload (linear_240)

Figure B.5: Random workload

B.2 Elasticity behaviors

This section shows the elasticity behavior of the EC2 platform in response to the standard workload suite. Recall from Chapter 3 that the scaling rulesets were defined as follows (with the m1.small instance as the basic building block):

Table B.1: Autoscaling engine configuration

Ruleset 1: monitoring interval 1 min; upper breach duration 2 mins; lower breach duration 2 mins; upper threshold 70% CPU (average); lower threshold 30% CPU (average); VM increment 1; VM decrement 1; scale-out cooldown period 2 mins; scale-in cooldown period 2 mins.

Ruleset 2: monitoring interval 1 min; upper breach duration 2 mins; lower breach duration 15 mins; upper threshold 70% CPU (average); lower threshold 20% CPU (average); VM increment 2; VM decrement 2; scale-out cooldown period 2 mins; scale-in cooldown period 10 mins.

Ruleset 3: monitoring interval 1 min; upper breach duration 4 mins; lower breach duration 10 mins; upper threshold 1.5 sec (maximum latency); lower threshold 20% CPU (average); VM increment 1; VM decrement 1; scale-out cooldown period 2 mins; scale-in cooldown period 5 mins.
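To make the semantics of these parameters concrete, the following sketch evaluates a Ruleset 1-style CPU rule once per monitoring tick. It is a minimal illustration of how breach durations, thresholds, increments and cooldowns interact; it is not the autoscaling engine used in the experiments, and all names (`Ruleset`, `scaling_decision`, and so on) are assumptions introduced only for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Ruleset:
    # All durations are in minutes; thresholds are average CPU utilization (%).
    monitoring_interval: int
    upper_breach_duration: int
    lower_breach_duration: int
    upper_threshold: float
    lower_threshold: float
    vm_increment: int
    vm_decrement: int
    scale_out_cooldown: int
    scale_in_cooldown: int

# Ruleset 1 of Table B.1 (m1.small instances as the scaling unit).
RULESET_1 = Ruleset(1, 2, 2, 70.0, 30.0, 1, 1, 2, 2)

def scaling_decision(rule, cpu_history, current_vms,
                     mins_since_scale_out, mins_since_scale_in):
    """Return the change in VM count (+/-/0) for the current monitoring tick.

    cpu_history holds the most recent average-CPU samples, one per monitoring
    interval. A breach must persist for the configured duration, and cooldown
    periods suppress further actions after a scaling event.
    """
    up_samples = rule.upper_breach_duration // rule.monitoring_interval
    down_samples = rule.lower_breach_duration // rule.monitoring_interval

    if (len(cpu_history) >= up_samples
            and all(c > rule.upper_threshold for c in cpu_history[-up_samples:])
            and mins_since_scale_out >= rule.scale_out_cooldown):
        return rule.vm_increment            # scale out

    if (len(cpu_history) >= down_samples
            and all(c < rule.lower_threshold for c in cpu_history[-down_samples:])
            and mins_since_scale_in >= rule.scale_in_cooldown
            and current_vms > 1):           # never drop below one instance
        return -rule.vm_decrement           # scale in

    return 0
```

Ruleset 3 differs in that its scale-out trigger is a maximum-latency threshold (1.5 sec) rather than an average-CPU threshold, which is presumably what the figure captions below refer to as SLO-aware provisioning.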

Each of Figures B.6 to B.14 compares two configurations side by side: panel (a) Ruleset 1 (reference) and panel (b) Ruleset 2 or Ruleset 3. Each panel plots, over time (in mins), the request counts, the CPU demand and supply (%) (legend: demand, a/supply, c/supply), the maximum latency (seconds), and the penalties (in cents) for over-provisioning and for under-provisioning. [Plots omitted; captions reproduced below.]

Figure B.6: EC2 platform's elasticity behavior in response to sine_30 workload. Penalty ratio of Ruleset 2 for this workload is 0.41, indicating 59% improvement with respect to Ruleset 1 (reason for the huge improvement can be attributed to bulk-provisioning and resource reuse from the previous cycle).

Figure B.7: EC2 platform's elasticity behavior in response to sine_30 workload. Penalty ratio of Ruleset 3 for this workload is 0.51, indicating 49% improvement with respect to Ruleset 1 (reason for the improvement can be attributed to resource reuse from the previous cycle).

Figure B.8: EC2 platform's elasticity behavior in response to sine_60 workload. Penalty ratio of Ruleset 2 for this workload is 0.24, indicating 76% improvement with respect to Ruleset 1 (reason for the huge improvement can be attributed to bulk-provisioning and resource reuse from the previous cycle).

Figure B.9: EC2 platform's elasticity behavior in response to sine_60 workload. Penalty ratio of Ruleset 3 for this workload is 0.86, indicating 14% improvement with respect to Ruleset 1 (reason for the improvement can be attributed to resource reuse from the previous cycle).

Figure B.10: EC2 platform's elasticity behavior in response to sine_90 workload. Penalty ratio of Ruleset 2 for this workload is 0.52, indicating 48% improvement with respect to Ruleset 1 (reason for the improvement can be attributed to bulk-provisioning and resource reuse from the previous cycle).

Figure B.11: EC2 platform's elasticity behavior in response to sine_90 workload. Penalty ratio of Ruleset 3 for this workload is 0.80, indicating 20% improvement with respect to Ruleset 1 (resource reuse alleviated the under-provisioned state to some extent; however, the rippling effect gave rise to increased over-provisioning cost).

Figure B.12: EC2 platform's elasticity behavior in response to sine_plateau_10 workload. Penalty ratio of Ruleset 2 for this workload is 0.64, indicating 36% improvement with respect to Ruleset 1 (bulk-provisioning improved the under-provisioned situation; however, the trough plateau prevented resource reuse, and the peak plateau, lasting for a small duration, could not utilize the newly provisioned resources that much).

Figure B.13: EC2 platform's elasticity behavior in response to sine_plateau_10 workload. Penalty ratio of Ruleset 3 for this workload is 0.74, indicating 26% improvement with respect to Ruleset 1 (SLO-aware provisioning improved the under-provisioned situation to some extent; however, the trough plateau prevented resource reuse, and the peak plateau, lasting for a small duration, could not utilize the newly provisioned resources that much).

Figure B.14: EC2 platform's elasticity behavior in response to sine_plateau_40 workload. Penalty ratio of Ruleset 2 for this workload is 0.54, indicating 46% improvement with respect to Ruleset 1 (bulk-provisioning improved the under-provisioned situation; however, the trough plateau prevented resource reuse; the peak plateau, lasting for a longer duration, utilized some of the newly provisioned resources).

[Figure: panel (b) Ruleset 3 — request counts, CPU demand and supply, maximum latency, and over-/under-provisioning penalties for the sine_plateau_40 workload.]

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

30000 25000 20000 15000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (SLO-aware provisioning improved the 350 350 350 350 350 22% 300 300 300 300 300 250 250 250 250 250 0 . 78 , indicating 200 200 200 200 200 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts 150 150 150 150 150 Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning Penalty for Under-Provisioning (a) Ruleset 1 (reference) 100 100 100 100 100 50 50 50 50 50 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

30000 25000 20000 15000 10000 No. of Requests of No. under-provisioned situation to someutilized extent, some however, of trough the plateau newly prevented provisioned resource-reuse; resources). the peak plateau, lasting for a longer duration, Figure B.15: EC2 platform’sPenalty elasticity ratio behavior of Ruleset in 3 response for to this sine_plateau_40 workload workload. is 242 Elasticity behavior of standard workloads 400 400 400 400 400 350 350 350 350 350 300 300 300 300 300 250 250 250 250 250 200 200 200 200 200 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 2 CPU Demand and Supply Penalty for Over-Provisioning 150 150 150 150 150 Penalty for Under-Provisioning 100 100 100 100 100 50 50 50 50 50 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

30000 25000 20000 15000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (Bulk-provisioning and some resource 61% 400 400 400 400 400 350 350 350 350 350 300 300 300 300 300 0 . 39 , indicating 250 250 250 250 250 200 200 200 200 200 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 150 150 150 150 150 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 100 100 100 100 100 50 50 50 50 50 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

30000 25000 20000 15000 10000 No. of Requests of No. Figure B.16: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset 2 response for to this sine_plateau_70 workload workload. is reuse from previous cyclelasting improved the for under-provisioned a situation, longer however, duration, trough could plateau make mostly better prevented resource-reuse; utilization the of peak the plateau, newly provisioned resources.) B.2. Elasticity behaviors 243 400 400 400 400 400 350 350 350 350 350 300 300 300 300 300 250 250 250 250 250 200 200 200 200 200 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 3 CPU Demand and Supply Penalty for Over-Provisioning 150 150 150 150 150 Penalty for Under-Provisioning 100 100 100 100 100 50 50 50 50 50 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

30000 25000 20000 15000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (Both rulesets showed almost similar 2% 400 400 400 400 400 350 350 350 350 350 0 . 98 , indicating 300 300 300 300 300 250 250 250 250 250 200 200 200 200 200 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 150 150 150 150 150 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 100 100 100 100 100 50 50 50 50 50 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

30000 25000 20000 15000 10000 No. of Requests of No. Figure B.17: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset response 3 to for sine_plateau_70 this workload. workload is performance in this particularsome of scenario; the the newly trough provisioned resources). plateau prevented resource-reuse; the peak plateau, lasting for a longer duration, utilized 244 Elasticity behavior of standard workloads 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 2 CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

40000 35000 30000 25000 20000 15000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (Bulk-provisioning improved the 17% 160 160 160 160 160 140 140 140 140 140 0 . 83 , indicating 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

40000 35000 30000 25000 20000 15000 10000 No. of Requests of No. Figure B.18: EC2 platform’sPenalty elasticity ratio behavior of in response Ruleset to 2 exp_18_2.25 for workload. this workload is under-provisioned situation, however, the troughlasting in for between a two very peaks small was duration, long could enough to not prevent utilize resource-reuse the from newly previous provisioned cycle; resources the at peak, all.) B.2. Elasticity behaviors 245 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 3 CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

40000 35000 30000 25000 20000 15000 10000 No. of Requests of No. degradation with respect to Ruleset 1 (The trough in between two peaks was 2% 160 160 160 160 160 140 140 140 140 140 1 . 02 , indicating 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10 0.8 0.6 0.4 0.2 Penalty (cents) Penalty 800 700 600 500 (%) 400 300 CPU 200 100

5000

40000 35000 30000 25000 20000 15000 10000 No. of Requests of No. Figure B.19: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset response 3 to for exp_18_2.25 this workload. workload is long enough to preventresources resource-reuse at from all). previous cycle; the peak, lasting for a very small duration, could not utilize the newly provisioned 246 Elasticity behavior of standard workloads 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 2 CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 2 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10

1.5 0.5 800 600 400 200 (cents) Penalty

1400 1200 1000 (%) CPU

60000 50000 40000 30000 20000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (Bulk-provisioning and resource-reuse 58% 160 160 160 160 160 140 140 140 140 140 0 . 42 , indicating 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 2 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10

1.5 0.5 800 600 400 200 (cents) Penalty

1400 1200 1000 (%) CPU

60000 50000 40000 30000 20000 10000 No. of Requests of No. significantly improved the under-provisioned situation). Figure B.20: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset 2 response for to this exp_24_3.0 workload workload. is B.2. Elasticity behaviors 247 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 3 CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 2 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10

1.5 0.5 800 600 400 200 (cents) Penalty

1400 1200 1000 (%) CPU

60000 50000 40000 30000 20000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (SLO-aware provisioning and resource- 45% 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 0 . 55 , indicating 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 2 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 35 30 25 20 15 10

1.5 0.5 800 600 400 200 (cents) Penalty

1400 1200 1000 (%) CPU

60000 50000 40000 30000 20000 10000 No. of Requests of No. reuse significantly improved the under-provisioned situation). Figure B.21: EC2 platform’sPenalty elasticity ratio behavior of Ruleset in 3 response for to this exp_24_3.0 workload is workload. 248 Elasticity behavior of standard workloads 180 180 180 180 180 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) 80 80 80 80 80 Request Counts Maximum Latency (b) Ruleset 2 CPU Demand and Supply Penalty for Over-Provisioning Penalty for Under-Provisioning 60 60 60 60 60 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

60 50 40 (seconds) 30 20 10 Latency 25 20 15 10

0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. degradation with respect to Ruleset 1 (This workload, growing at a slow pace, 180 180 180 180 180 13% 160 160 160 160 160 140 140 140 140 140 1 . 13 , indicating 120 120 120 120 120 100 100 100 100 100 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) 80 80 80 80 80 Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning Penalty for Under-Provisioning 60 60 60 60 60 (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

60 50 40 (seconds) 30 20 10 Latency 25 20 15 10

0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. ered due to bulk-provisioning; note the increased amount of over-provisioning). su Figure B.22: EC2 platform’sPenalty elasticity ratio behavior of Ruleset in 2 response for to this linear_240 workload workload. is B.2. Elasticity behaviors 249 180 180 180 180 180 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) 80 80 80 80 80 Request Counts Maximum Latency (b) Ruleset 3 CPU Demand and Supply Penalty for Over-Provisioning Penalty for Under-Provisioning 60 60 60 60 60 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

60 50 40 (seconds) 30 20 10 Latency 25 20 15 10

0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (SLO-aware provisioning correctly 24% 180 180 180 180 180 160 160 160 160 160 140 140 140 140 140 0 . 76 , indicating 120 120 120 120 120 100 100 100 100 100 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) 80 80 80 80 80 Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning Penalty for Under-Provisioning 60 60 60 60 60 (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

60 50 40 (seconds) 30 20 10 Latency 25 20 15 10

0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. understood the resource requirement of this slow-paced workload) Figure B.23: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset response 3 to for linear_240 workload. this workload is 250 Elasticity behavior of standard workloads 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 2 CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 30 25 20 15 10

1.4 1.2 0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (Bulk-provisioning and resource-reuse 40% 160 160 160 160 160 140 140 140 140 140 0 . 60 , indicating 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 30 25 20 15 10

1.4 1.2 0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. better served sudden spikes, hence improved the under-provisioned situation). Figure B.24: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset 2 response for to this random workload workload. is B.2. Elasticity behaviors 251 160 160 160 160 160 140 140 140 140 140 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency (b) Ruleset 3 CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 30 25 20 15 10

1.4 1.2 0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. improvement with respect to Ruleset 1 (SLO-aware provisioning and resource- 9% 160 160 160 160 160 140 140 140 140 140 0 . 91 , indicating 120 120 120 120 120 100 100 100 100 100 80 80 80 80 80 Time (in mins) Time (in mins) Time (in mins) Time (in mins) Time (in mins) Request Counts Maximum Latency CPU Demand and Supply Penalty for Over-Provisioning 60 60 60 60 60 Penalty for Under-Provisioning (a) Ruleset 1 (reference) 40 40 40 40 40 20 20 20 20 20 demand a/supply c/supply 0 0 0 0 0

0 0 0 5 0 1 0

Penalty (cents) Penalty

70 60 50 40 (seconds) 30 20 10 Latency 30 25 20 15 10

1.4 1.2 0.8 0.6 0.4 0.2 800 600 400 200 (cents) Penalty

1200 1000 (%) CPU

50000 40000 30000 20000 10000 No. of Requests of No. reuse improved the under-provisioned situation to some extent). Figure B.25: EC2 platform’sPenalty elasticity ratio behavior of in Ruleset 3 response for to this random workload workload. is 252 Elasticity behavior of standard workloads Appendix C

Elasticity behavior of fine-scale bursty workloads

This chapter shows the elasticity behavior of the EC2 platform in response to the fine-scale bursty workloads. Recall from Chapter 4 that the scaling ruleset was defined as follows (with the m1.small instance as the basic building block):

Table C.1: Autoscaling engine configuration

Ruleset   Monitoring interval   Upper breach duration   Lower breach duration   Upper threshold   Lower threshold   VM increment   VM decrement
1         1 min                 3 mins                  10 mins                 70% CPU average   20% CPU average   1              1
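To make the rule concrete, the sketch below evaluates the Table C.1 parameters over a window of per-minute average CPU readings. It is a minimal illustration in plain Python, assuming a simple consecutive-breach interpretation of the breach durations; it is not the autoscaling engine used in the experiments.

```python
# Illustrative evaluation of the Ruleset 1 parameters from Table C.1.
# Scale out by one VM after 3 consecutive minutes above 70% average CPU;
# scale in by one VM after 10 consecutive minutes below 20% average CPU.

RULESET_1 = {
    "monitoring_interval_mins": 1,
    "upper_breach_duration": 3,   # consecutive samples above the upper threshold
    "lower_breach_duration": 10,  # consecutive samples below the lower threshold
    "upper_threshold": 70.0,      # % average CPU across the instances
    "lower_threshold": 20.0,
    "vm_increment": 1,
    "vm_decrement": 1,
}

def scaling_decision(cpu_history, rules=RULESET_1):
    """Return +vm_increment, -vm_decrement or 0 for a list of per-minute CPU averages."""
    up = rules["upper_breach_duration"]
    down = rules["lower_breach_duration"]
    if len(cpu_history) >= up and all(s > rules["upper_threshold"] for s in cpu_history[-up:]):
        return rules["vm_increment"]
    if len(cpu_history) >= down and all(s < rules["lower_threshold"] for s in cpu_history[-down:]):
        return -rules["vm_decrement"]
    return 0

# Three consecutive minutes above 70% trigger a scale-out of one m1.small instance.
print(scaling_decision([55, 68, 72, 75, 81]))  # -> 1
```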

Figures C.1 to C.6 show the elasticity behavior of the EC2 platform for the fine-scale bursty workloads. Each figure contains six panels plotted against time in minutes: request counts (sum of served and rejected requests), 95th percentile response time (ms), CPU demand and supply (demand, a/supply and c/supply, in CPU %), ELB maximum queue length, penalty for over-provisioning and penalty for under-provisioning (cents/min). The caption of each figure reports the resulting penalty rate.

Figure C.1: Elasticity behavior of smooth workload; penalty rate 9.46¢ per hour.

Figure C.2: Elasticity behavior of sigma50 workload; penalty rate 28.04¢ per hour.

Figure C.3: Elasticity behavior of sigma100 workload; penalty rate 34.54¢ per hour.

Figure C.4: Elasticity behavior of sigma150 workload; penalty rate 35.89¢ per hour.

Figure C.5: Elasticity behavior of sigma200 workload; penalty rate 38.68¢ per hour.

Figure C.6: Elasticity behavior of sigma250 workload; penalty rate 39.67¢ per hour.

Appendix D

A comparison of elasticity evaluation methodologies: Environmental bias perspective

This chapter presents a comparison of various elasticity evaluation methodologies from the perspective of handling the nondeterministic bias of the cloud environment. Most of these evaluation methods relied on a single execution of the benchmark; however, a few of those reported the scaling delay aspect of elasticity based on repeated measurements and data analysis. Table D.1 illustrates the diversity in the elasticity evaluation methodologies.
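To make the data-analysis dimensions of Table D.1 concrete, the sketch below computes the statistics that appear as table rows (mean/median and standard deviation, CoV, quartiles, range) over a set of repeated elasticity measurements. The sample values are the elasticity metrics reported later in Figures E.1 to E.8; the code itself is illustrative only and not part of the benchmarking framework.

```python
import statistics

# Elasticity metrics from eight replicated runs (values as reported in Figures E.1-E.8).
replicates = [0.85, 0.66, 0.67, 0.86, 0.99, 0.61, 0.82, 0.72]

mean = statistics.mean(replicates)
median = statistics.median(replicates)
stdev = statistics.stdev(replicates)              # sample standard deviation
cov = stdev / mean                                # coefficient of variation
q1, q2, q3 = statistics.quantiles(replicates, n=4)  # quartiles (Python 3.8+)
value_range = max(replicates) - min(replicates)

print(f"mean={mean:.3f} median={median:.3f} stdev={stdev:.3f} CoV={cov:.3f}")
print(f"quartiles=({q1:.3f}, {q2:.3f}, {q3:.3f}) range={value_range:.3f}")
```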

Table D.1: A comparison of various elasticity evaluation methodologies

Methodology   References
A             [174, 127]
B             [210]
C             [164, 163]
D             [136]
E             [173]
F             [119, 118]
G             [79, 150, 213, 148, 217], [43, 91, 159, 215, 126, 81]
H             [239, 94, 93, 206, 139], [83, 172, 222, 155, 145, 82]

For each methodology the table marks which experimental-design choices are used (no workload with micro measurements, e.g., VM startup time vs. hour or VM startup time vs. location; single workload; multiple workloads; repeated measurements; one-factor-at-a-time method) and which data-analysis techniques are applied (mean/median and standard deviation; range; quartiles and mean; percentiles and median; CoV, i.e., coefficient of variation).

Appendix E

Variability in elasticity behavior

This chapter shows the non-deterministic elasticity behavior of the EC2 platform in response to the sinusoidal (sine_30) and exponential (exp_18_2.25) workloads of the standard suite. Recall from Chapter 5 that the scaling rulesets were defined as follows.

Table E.1: Autoscaling engine configuration

Ruleset   Monitoring interval   Upper breach duration   Lower breach duration   Upper threshold   Lower threshold   VM increment   VM decrement   Building block
1         1 min                 3 mins                  10 mins                 70% CPU average   20% CPU average   1              1              m1.small
2         1 min                 3 mins                  10 mins                 70% CPU average   20% CPU average   1              1              m1.medium

Figures E.1 to E.28 show the EC2 platform's elasticity behavior for the sine_30 and exp_18_2.25 workloads across the replicated set-up combinations (availability zone, day of week, peak/off-peak period). Each figure contains four CPU demand and supply panels plotted against time in minutes (demand, a/supply and c/supply, in CPU %): (a) sinusoidal workload - Ruleset 1, (b) sinusoidal workload - Ruleset 2, (c) exponential workload - Ruleset 1 and (d) exponential workload - Ruleset 2. An m1.small instance provides 1 ECU per minute whereas an m1.medium instance provides 2 ECUs per minute; note that the terms "ECU" and "CPU" have been used interchangeably in the figures. Each caption reports the Ruleset 2 penalty ratio for both workloads and the elasticity metric of the replicate:

Figure E.1 (us-east-1b, Sun, off-peak): elasticity metric 0.85; Ruleset 2 penalty ratios 0.99 (sinusoidal) and 0.73 (exponential).
Figure E.2 (us-east-1b, Sun, peak): elasticity metric 0.66; Ruleset 2 penalty ratios 0.78 (sinusoidal) and 0.56 (exponential).
Figure E.3 (us-east-1b, Mon, off-peak): elasticity metric 0.67; Ruleset 2 penalty ratios 0.73 (sinusoidal) and 0.61 (exponential).
Figure E.4 (us-east-1b, Mon, peak): elasticity metric 0.86; Ruleset 2 penalty ratios 1.13 (sinusoidal) and 0.65 (exponential).
Figure E.5 (us-east-1b, Tue, off-peak): elasticity metric 0.99; Ruleset 2 penalty ratios 1.42 (sinusoidal) and 0.69 (exponential).
Figure E.6 (us-east-1b, Tue, peak): elasticity metric 0.61; Ruleset 2 penalty ratios 0.55 (sinusoidal) and 0.67 (exponential).
Figure E.7 (us-east-1b, Wed, off-peak): elasticity metric 0.82; Ruleset 2 penalty ratios 1.09 (sinusoidal) and 0.61 (exponential).
Figure E.8 (us-east-1b, Wed, peak): elasticity metric 0.72; Ruleset 2 penalty ratios 0.88 (sinusoidal) and 0.58 (exponential).
Figure E.9 (us-east-1b, Thu, off-peak): elasticity metric 0.73; Ruleset 2 penalty ratios 0.90 (sinusoidal) and 0.59 (exponential).
Figure E.10 (us-east-1b, Thu, peak): elasticity metric 0.76; Ruleset 2 penalty ratios 0.83 (sinusoidal) and 0.69 (exponential).
Figure E.11 (us-east-1b, Fri, off-peak): elasticity metric 0.68; Ruleset 2 penalty ratios 0.72 (sinusoidal) and 0.64 (exponential).
Figure E.12 (us-east-1b, Fri, peak): elasticity metric 0.77; Ruleset 2 penalty ratios 0.85 (sinusoidal) and 0.69 (exponential).
Figure E.13 (us-east-1b, Sat, off-peak): elasticity metric 0.62; Ruleset 2 penalty ratios 0.64 (sinusoidal) and 0.59 (exponential).
Figure E.14 (us-east-1b, Sat, peak): elasticity metric 0.68; Ruleset 2 penalty ratios 0.78 (sinusoidal) and 0.59 (exponential).
Figure E.15 (us-east-1c, Sun, peak): elasticity metric 0.65; Ruleset 2 penalty ratios 0.63 (sinusoidal) and 0.67 (exponential).
Figure E.16 (us-east-1c, Mon, off-peak): elasticity metric 0.68; Ruleset 2 penalty ratios 0.92 (sinusoidal) and 0.50 (exponential).
Figure E.17 (us-east-1c, Mon, peak): elasticity metric 0.70; Ruleset 2 penalty ratios 0.88 (sinusoidal) and 0.57 (exponential).
Figure E.18 (us-east-1c, Tue, off-peak): elasticity metric 0.65; Ruleset 2 penalty ratios 0.78 (sinusoidal) and 0.54 (exponential).
Figure E.19 (us-east-1c, Tue, peak): elasticity metric 0.63; Ruleset 2 penalty ratios 0.69 (sinusoidal) and 0.57 (exponential).
Figure E.20 (us-east-1c, Wed, off-peak): elasticity metric 0.81; Ruleset 2 penalty ratios 1.10 (sinusoidal) and 0.60 (exponential).
Figure E.21 (us-east-1c, Wed, peak): elasticity metric 0.64; Ruleset 2 penalty ratios 0.76 (sinusoidal) and 0.54 (exponential).
Figure E.22 (us-east-1c, Thu, off-peak): elasticity metric 0.66; Ruleset 2 penalty ratios 0.73 (sinusoidal) and 0.60 (exponential).
Figure E.23 (us-east-1c, Thu, peak): elasticity metric 0.57; Ruleset 2 penalty ratios 0.74 (sinusoidal) and 0.43 (exponential).
Figure E.24 (us-east-1c, Fri, off-peak): elasticity metric 0.66; Ruleset 2 penalty ratios 0.86 (sinusoidal) and 0.51 (exponential).
Figure E.25 (us-east-1c, Fri, peak): elasticity metric 0.90; Ruleset 2 penalty ratios 1.00 (sinusoidal) and 0.81 (exponential).
Figure E.26 (us-east-1c, Sat, off-peak): elasticity metric 0.77; Ruleset 2 penalty ratios 0.86 (sinusoidal) and 0.69 (exponential).
Figure E.27 (us-east-1c, Sat, peak): elasticity metric 0.59; Ruleset 2 penalty ratios 0.55 (sinusoidal) and 0.64 (exponential).
Figure E.28 (us-east-1c, Sun, off-peak): elasticity metric 0.77; Ruleset 2 penalty ratios 1.11 (sinusoidal) and 0.54 (exponential).
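Across these replicates, the reported elasticity metric is consistent (up to rounding) with the geometric mean of the two Ruleset 2 penalty ratios. The sketch below checks this for two of the figures; the geometric-mean aggregation is an assumption inferred from the reported values, not a definition stated in this appendix.

```python
from math import prod

def aggregate_metric(penalty_ratios):
    """Geometric mean of per-workload penalty ratios (assumed aggregation rule)."""
    return prod(penalty_ratios) ** (1.0 / len(penalty_ratios))

# Figure E.1: ratios 0.99 and 0.73, reported metric 0.85.
print(round(aggregate_metric([0.99, 0.73]), 2))  # 0.85
# Figure E.28: ratios 1.11 and 0.54, reported metric 0.77.
print(round(aggregate_metric([1.11, 0.54]), 2))  # 0.77
```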