Modeling and Evaluation of Rule-based Elasticity for Cloud-based Applications

Basem Suleiman

A thesis in fulfillment of the requirements for the degree of Doctor of Philosophy

The University of New South Wales

School of Computer Science and Engineering
Faculty of Engineering

January 2015

THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: Suleiman

First name: Basem
Other name/s: Fathi

Abbreviation for degree as given in the University calendar: PhD

School: Computer Science and Engineering
Faculty: Engineering

Title: Modeling and Evaluation of Rule-Based Elasticity for Cloud-based Applications

Abstract (350 words maximum):

Cloud computing has evolved to become a dominant computing paradigm in which computing resources such as software, platform and infrastructure are provisioned and consumed as services. The Infrastructure as a Service (IaaS) cloud model has particularly attracted many web-based businesses to run their business applications. This is primarily due to the elasticity characteristic, which encompasses dynamic provisioning and de-provisioning of computing resources, e.g., servers, storage and network, on-demand through self-service interfaces. In this model, resource usage is billed per unit of time. Different application classes, specifically internet-based business (or e-business) applications, can highly benefit from IaaS elasticity. Such applications are of high business value and subject to variable workload patterns and volumes due to their exposure to the web. Realizing the elasticity benefits of IaaS for e-business applications primarily relies on achieving application Service Level Objectives (SLOs) through efficient use of computing resources. These SLOs are often specified in Service Level Agreements (SLAs) agreed between businesses and their customers. This is of paramount importance for both IaaS cloud providers and consumers, but it has been faced with many challenges. First, most IaaS providers support elasticity through auto-scaling rules which are mainly based on thresholds. Choosing appropriate values for these thresholds, however, is not a trivial task for the cloud consumer as it requires exhaustive empirical testing with application workloads and performance and cost metrics. Second, IaaS elasticity is primarily driven by resource-based metrics such as CPU utilization. However, elasticity that is based on application SLA metrics is also crucial for cloud consumers. Third, while resource and application metrics are readily available, it is still a challenge to fashion resource provisioning rules that perform well in terms of important performance and cost metrics. In this thesis, we address these challenges through three main contributions. First, we propose novel analytical models that capture core elasticity thresholds and emulate how IaaS elasticity works. The proposed models also approximate key metrics for evaluating the performance of elasticity rules, including CPU utilization, application response time and server usage cost. We also develop algorithms that decide when and how to scale-out and scale-in based on CPU utilization and other thresholds, and estimate the server costs incurred by scaling actions. We validate our models and algorithms using Matlab simulation and equivalent experiments in the Amazon EC2 cloud using an e-commerce 3-tier web application. Second, we propose an architecture and a method for deriving IaaS elasticity based on application SLA metrics. We present algorithms that monitor response time SLA satisfaction and decide when to scale-out and scale-in application servers in an IaaS cloud environment. Third, we extensively evaluate the two common types of elasticity rules with different CPU utilization and response time SLA thresholds. In addition, we carry out a trade-off analysis of the performance of both elasticity approaches in terms of key metrics including response time SLA satisfaction, CPU utilization and server usage cost. We carry out our evaluation of both elasticity approaches using the same e-commerce 3-tier web application running on the Amazon EC2 cloud.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature ……………………  Witness ……………………  Date ……………………

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

THIS SHEET IS TO BE GLUED TO THE INSIDE FRONT COVER OF THE THESIS

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ……………………………………………......

Date ……………………………………………......

COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.’

Signed ……………………………………………......

Date ……………………………………………......

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

Date ……………………………………………......

To my loving parents, brothers and sisters, and my beloved wife and son, Elias.

Acknowledgements

It would not have been possible to write this doctoral thesis without the help and support of the kind people around me, only some of whom it is possible to give particular mention here. My deepest thanks to my great parents for their endeavour and invaluable support and encouragement, which enabled me to successfully complete my PhD. Sincere thanks also to my loving brothers and sisters, and to my beloved wife and son for their personal support and patience.

I would like to sincerely thank my supervisors Dr. Srikumar Venugopal and Prof. Ross Jeffery. Their invaluable advice, encouragement and knowledge have been of great motivation and help throughout my PhD. I would also like to thank the other supervisors I worked with previously for their feedback, support and suggestions. Dr. Vladimir Tosic provided me with great help and support to start and progress with my PhD candidature. He greatly helped me to secure financial support and internship opportunities. Dr. Wasim Sadiq's advice and feedback have been inspiring and practical, and helped me to strive for real-world scenarios in the context of my research. Dr. Sherif Sakr supported my move to research with his feedback and guidance. I would also like to extend my thanks to my PhD committees from UNSW, NICTA and SAP Research, who continuously provided suggestions and feedback to maintain my PhD progress.

I would like to extend my thanks to the different parties who supported me during my PhD journey. In particular, I am thankful to The University of New South Wales (UNSW), National ICT Australia (NICTA) and SAP Research Australia for their academic, financial and technical support, which was of great help. The generous research grant from Amazon has also been of great help in enabling extensive experimentation using their cloud services. It is important to mention, though, that Amazon was neither involved in nor had any influence on the research reported in this thesis.

Last but not least, many thanks to all the people I have met and worked with during my PhD journey at conferences, UNSW, NICTA, SAP Research and internships.

Abstract

Cloud computing has become a dominant computing paradigm in which computing resources such as software, platform and infrastructure are provisioned and consumed as services. The Infrastructure as a Service (IaaS) cloud model has particularly attracted many web-based businesses to run their business applications. This is primarily due to the elasticity characteristic, which encompasses dynamic provisioning and de-provisioning of computing resources, e.g., servers, storage and network, on-demand through internet self-service interfaces. In the IaaS cloud model, resources are billed based on usage per unit of time. Different application classes, specifically internet-based business (e-business) applications, can highly benefit from IaaS elasticity. Such applications are of high business value and subject to variable workload patterns and volumes due to their exposure to the web.

Realizing the elasticity benefits of IaaS for e-business applications primarily relies on achieving application Service Level Objectives (SLOs) through efficient use of computing resources. SLOs are often specified in Service Level Agreements (SLAs) agreed between businesses and their customers. Meeting SLAs is of paramount importance for both IaaS cloud providers and consumers, but it has been faced with many challenges. First, most IaaS providers support elasticity through auto-scaling rules which are mainly based on performance thresholds. Choosing appropriate values for these thresholds, however, is not a trivial task for the cloud consumer as it requires exhaustive empirical evaluation with application workloads and performance and cost metrics. Second, IaaS elasticity is primarily driven by resource-based metrics such as CPU utilization. However, elasticity that is based on application SLA metrics is also crucial for cloud consumers. Third, while resource and application metrics are readily available, it is still a challenge to fashion elasticity rules that perform well in terms of important performance and cost metrics.

In this thesis, we address these challenges through three main contributions. First, we propose novel analytical models that capture core elasticity thresholds and emulate how IaaS elasticity works. The proposed models also approximate key metrics for evaluating the performance of elasticity rules, including CPU utilization, application response time and server usage cost. We also develop algorithms that decide when and how to scale-out and scale-in based on CPU utilization and other thresholds, and estimate the server costs incurred by scaling actions. We validate our models and algorithms using Matlab simulation and equivalent experiments in the Amazon cloud using an e-commerce 3-tier web application. Second, we propose an architecture and a method for deriving IaaS elasticity based on application SLA metrics. We present algorithms that monitor response time SLA satisfaction and decide when to scale-out and scale-in application servers in an IaaS cloud environment. Third, we extensively evaluate the two common types of elasticity rules with different CPU utilization and response time SLA thresholds. In addition, we carry out a trade-off analysis of the performance of both elasticity approaches in terms of key metrics including response time SLA satisfaction, CPU utilization and server usage cost. We carry out our evaluation of both elasticity approaches using the same e-commerce 3-tier web application running on the Amazon cloud.

Contents

Contents vi

List of Figures ix

List of Tables xi

Nomenclature xi

1 INTRODUCTION 3
1.1 Cloud Computing ...... 3
1.2 Service Level Agreements ...... 9
1.3 IaaS Cloud Elasticity ...... 13
1.4 Thesis Aims ...... 17
1.5 Thesis Contributions ...... 22
1.6 Thesis Organization ...... 26

2 BACKGROUND 30
2.1 Cloud Computing ...... 31
2.1.1 Cloud Service Models ...... 33
2.1.2 Cloud Deployment Models ...... 35
2.1.3 Cloud Business Models ...... 37
2.2 SLAs in Cloud Computing Environments ...... 39
2.3 IaaS Cloud Elasticity ...... 44
2.3.1 Architecture and Mechanisms of IaaS Elasticity ...... 47
2.3.2 Types of IaaS Elasticity Techniques ...... 52
2.3.3 Structure and Types of Rule-based Elasticity ...... 54


2.4 Summary ...... 55

3 LITERATURE REVIEW 57
3.1 Modeling Performance of IaaS Cloud Elasticity and Cloud-based Applications ...... 58
3.1.1 Modeling Cloud Resource Provisioning Mechanisms ...... 58
3.1.2 Modeling Performance of Cloud-based Applications ...... 60
3.1.3 Modeling IaaS Elasticity ...... 62
3.2 Performance Evaluation of IaaS Clouds and Cloud-based Applications ...... 63
3.2.1 Performance Evaluation of Elasticity of IaaS Cloud Services ...... 64
3.2.2 Evaluating Elasticity Performance on Different Server Instances ...... 66
3.2.3 Impact of Performance Variability on IaaS Cloud Elasticity ...... 69
3.3 SLA-based Elasticity for Cloud-based Applications ...... 71
3.3.1 SLA-based Auto-scaling for IaaS Providers ...... 71
3.3.2 SLA-based Cloud Resource Management Mechanisms ...... 74
3.3.3 IaaS Auto-scaling Mechanisms ...... 76

4 MODELING CPU-BASED ELASTICITY 80
4.1 Elasticity Rules Structure ...... 81
4.2 Modeling CPU-based Elasticity ...... 83
4.2.1 Queue Model for Multi-tier Applications: Assumptions and Scope ...... 84
4.2.2 Modeling CPU Utilization ...... 88
4.2.3 Modeling Application Response Time ...... 91
4.2.4 Modelling Other Elasticity Constraints ...... 95
4.2.5 CPU-based Elasticity Algorithms ...... 97
4.2.6 Cost Models ...... 101
4.3 Validation ...... 104
4.3.1 Experimental Design and Methodology ...... 104
4.3.2 Results and Data Analysis ...... 112
4.3.3 Summary ...... 122


4.4 Use Case Scenario ...... 124
4.5 Discussion ...... 130

5 PERFORMANCE EVALUATION OF SLA-BASED AND CPU-BASED ELASTICITY 134
5.1 IaaS Cloud Elasticity and Application SLAs ...... 135
5.2 SLA-based Elasticity Approach ...... 138
5.2.1 Design ...... 139
5.2.2 SLA-based Elasticity Algorithms ...... 144
5.3 Experimental Evaluation ...... 147
5.3.1 Experiment Design ...... 148
5.3.2 Evaluation Methodology ...... 155
5.3.3 Results and Analysis ...... 160
5.4 Discussion ...... 173

6 CONCLUSIONS AND FUTURE WORK 181
6.1 Summary ...... 181
6.2 Limitations ...... 189
6.3 Future Work ...... 191

APPENDIX A: LIST OF PUBLICATIONS 194

APPENDIX B: GLOSSARY 197

References 202

List of Figures

1.1 Example of CPU-based Elasticity Rules ...... 15

2.1 Cloud Service Models with Examples of Cloud Providers and Consumers ...... 33
2.2 Types of SLAs in IaaS Cloud Environment and its Dependency ...... 41
2.3 Elasticity and Cost-effectiveness of IaaS cloud Illustrated (adapted from [1]) ...... 46
2.4 Types of IaaS Cloud Scaling ...... 49
2.5 Common Structure of IaaS Elasticity Rules ...... 54

4.1 Key Elements of an Elasticity Rule ...... 81
4.2 Example of an Elasticity Rule ...... 82
4.3 Queue Model of 3-tier Application Architecture ...... 85
4.4 Deployment Architecture of TPC-W Book Store on Amazon EC2 ...... 106
4.5 TPC-W Workload Used in All Experiments ...... 108
4.6 Experimental Results of all Elasticity Rules - Models and Empirical ...... 113
4.7 CPU Utilization and Response Time Spikes of CPU90 Experiments ...... 117
4.8 No. of Servers Triggered by all Elasticity Rules - Models and Empirical ...... 119
4.9 Servers Cost Resulted from all Elasticity Rules - Models and Empirical ...... 121
4.10 Simulated Workload of TrustedCRM Use Case ...... 125
4.11 Use Case Scenario Simulation Results ...... 127
4.12 Use Case Simulation Results - Average Response Time Statistics at the Application Tier ...... 129


5.1 Example of an SLA-based Elasticity Rules ...... 136
5.2 Empirical Example of SLA Violations with CPU-based Elasticity Rules ...... 137
5.3 Architecture of Proposed SLA-based Elasticity Approach ...... 139
5.4 Generated Workload for TPC-W Application ...... 156
5.5 Experimental Results of all Evaluated Elasticity Rules ...... 161
5.6 Experimental Results of Elasticity Rules on Servers with Different Capacity Profiles ...... 165
5.7 Performance Consistency of SLA-based and CPU-based Elasticity Rules on Small and Medium Servers ...... 169
5.8 Performance Consistency of CPU Utilization of Elasticity Rules ...... 171
5.9 Performance Consistency Results of CPU and SLA-based Elasticity Rules on Small and Medium Servers ...... 172

List of Tables

1.1 Example of Different IaaS Cloud Server Offerings ...... 8
1.2 Example of an SLA: QoS properties and SLOs ...... 11

2.1 Cloud Computing Pricing Models and IaaS Cloud Offering Examples ...... 37
2.2 Examples of CI-SLAs from Different IaaS Providers ...... 42
2.3 Practical Examples of IaaS Auto Scaling Rules ...... 52

4.1 TPC-W Workload Profiles and Web Interaction Groups ...... 105
4.2 Elasticity Thresholds Used in all Experiments ...... 109
4.3 Request Mix: Percentages of Request Types and CPU Demands of Browsing Profile ...... 110
4.4 Parameters and Thresholds Used in the Use Case Experiments ...... 126

5.1 Response Time SLOs for TPC-W Requests at the Application Tier ...... 149
5.2 Elasticity Rules Set at the Application Tier Used in All Experiments ...... 151
5.3 Server Capacity Profiles Used in Second and Third Experiment Sets ...... 159
5.4 Resulted Metrics of CPU-based and SLA-based Elasticity Experiments ...... 163
5.5 Statistics of SLA and CPU-based Rules on Small & Medium Servers ...... 166

1 List of Symbols and their Explanation ...... 200


Chapter 1

INTRODUCTION

In this chapter we motivate the main topics of this thesis. We first present the key principles of cloud computing service and deployment models with supporting examples of different cloud service offerings. We then introduce Service Level Agreements (SLAs) and motivate their importance for web applications. The concept of IaaS elasticity is then explained with supporting examples of the key elements of elasticity rules. Here, we highlight the key challenges that relate to IaaS elasticity, which represent one of the core pillars of this thesis. Based on these challenges, we describe our thesis aims and the main contributions realized by our research work. The chapter concludes with the organization of the thesis chapters and their key sections.

1.1 Cloud Computing

The rapid development in Information and Communication Technology (ICT) has motivated a wide range of business organizations and enterprises to seek new ways to improve their business services. This includes improving business agility and flexibility, reducing operational costs, achieving high performance and improving the efficiency of computing resource utilization of their business software applications and services. The advancement in ICT has also opened different channels through which business organizations and enterprises can offer their services. In particular, most business organizations and enterprises have been increasingly making their business processes and services readily available through the internet, and recently through mobile devices such as smart phones and tablets. This exposure has considerably changed access patterns and volumes to the offered business processes and services. Consequently, business organizations and enterprises have been urged to investigate better ways to deploy and run their business processes and services on a computing infrastructure that meets their requirements.

In response to the evolving requirements of today's business organizations and enterprises, we have witnessed significant advancement in several computing technologies including Distributed Systems, Virtualization, Grid computing, Networking and Autonomic Computing [2; 3; 4; 5]. These technologies attempted to improve the means through which computing resources are provisioned and allocated, but they were not widely adopted due to unresolved concerns. Having said that, they provided the key pillars for a new computing paradigm: cloud computing. Cloud computing has been widely adopted by various business organizations and enterprises due to its proven capabilities that support businesses in addressing the dynamic requirements of their business services and processes. A large number of research areas and topics have also emerged to help advance the cloud computing field.

Since its emergence, many definitions, characteristics and classifications have evolved to explain the key concepts of cloud computing. In this context we use the definition introduced by the US National Institute of Standards and Technology (NIST) [6], which is widely adopted by various research and industry communities:

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, stor- age, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.


In the cloud computing model, there are two main parties involved: cloud providers, who offer computing resources as services, and cloud consumers or users, who consume or use the offered computing services. Particularly, cloud providers offer computing resources, either hardware or software, on a service-based model similar to well-known utility models. From the cloud consumer perspective, there are a number of characteristics that distinguish cloud computing from traditional computing paradigms. Among these characteristics are on-demand self-service, rapid elasticity, and measured service [6]. These features enable cloud consumers to instantly use as many computing resources (e.g., servers, storage, software) as they need, charged based on resource usage time, typically hourly. Therefore, in contrast to traditional computing models, the cloud computing model requires neither up-front capital investment nor long-term contractual commitment.

There are different service models of cloud computing, which classify the type of services being offered and consumed. NIST [6] defines the following main service models for cloud computing:

• Infrastructure as a Service (IaaS): in this model, cloud consumers have access to and use various computing resources such as CPU, memory, storage and network on-demand. Such services are typically offered as hardware bundles, such as servers, which can be customized by cloud consumers. Cloud consumers can therefore configure the hardware, the operating system and the system and software applications to be deployed on the servers.

• Platform as a Service (PaaS): this service model builds on top of the IaaS model. Here, cloud consumers are offered a computing environment on which they can develop and deploy their applications. The environment includes development frameworks, e.g., Eclipse, J2SDK or .NET, operating system environments, and other software development tools. In this model, cloud consumers, however, have limited control over the hosting infrastructure, including the hardware, the operating system and the networking infrastructure.


• Software as a Service (SaaS): in this model cloud consumers use offered software applications, e.g., Customer Relationship Management, through internet-based interfaces and based on user subscriptions. They can configure high-level parameters of the offered software. However, cloud consumers do not have any control over the hosting environment and platform, including the application, the operating system, the computing hardware and the network infrastructure.

There are also different classifications of cloud deployment models, which are based on who owns and uses the offered services. The NIST definition [6] describes the following main deployment models:

• Public Cloud: in this model, computing resources are made available to all consumers, including individual users, individual organizations and large industry groups, through the Internet. Public clouds are owned by cloud providers, i.e., organizations who offer and sell cloud services. Cloud providers provide secure access control mechanisms to allow cloud consumers to access their services. For cloud consumers, one of the key benefits of public clouds is that there is no upfront investment in computing resources and no ongoing management and maintenance costs, as cloud providers are responsible for such issues.

• Private Cloud: this type of cloud is dedicated to the exclusive use of a single organization. It may be designed and managed by that organization or by external service providers. Private clouds may be hosted on-premise, i.e., within the organization's data-center, or off-premise, i.e., by an external cloud provider. One benefit of private clouds is the high degree of control in terms of resource configuration and allocation, data storage and backup, and service levels such as availability, performance and security. However, this comes at a cost; a private cloud involves huge upfront capital investment in computing infrastructure and the related risks and overhead of managing it.

• Hybrid Cloud: in this deployment model, the computing infrastructure is a combination of private and public clouds. Specifically, cloud consumers outsource non-mission-critical business processes to a public cloud environment while mission-critical processes are hosted on-premise under full control.

In the context of this research, the IaaS and public cloud models are of particular interest. We consider the perspective of business organizations or enterprises that want to move their business applications, or parts of them, to an IaaS cloud to benefit from on-demand and dynamic use of computing resources at time-based prices. This allows business organizations to be agile in meeting changing customer requirements and to focus on their business needs rather than the underlying computing infrastructure. In contrast to PaaS clouds, utilizing an IaaS cloud does not require major redesign and development of the application that needs to be deployed and run on a cloud platform. The IaaS model provides a seamless way of moving and integrating applications and processes that require a dynamic computing infrastructure. In contrast to private clouds, public clouds, particularly IaaS clouds, relieve cloud consumers of the large upfront capital and ongoing management costs of computing infrastructure. The scope of this research centres on public IaaS clouds due to their potential benefits when compared to the other models. For cloud consumers, the key benefits of public IaaS offerings are reduction of computing costs and increase of business flexibility and agility [7; 8].

The models and characteristics of cloud computing, specifically the public IaaS cloud, have made it one of the most widely adopted computing paradigms in research and industry [9; 10]. Due to their significant business potential, public IaaS clouds have attracted a large number of cloud providers such as Amazon Web Services (AWS) [11], Rackspace [12], GoGrid [13], Joyent [14] and ElasticHosts [15]. Such providers offer internet-based access to a wide range of computing resources such as servers, storage and network resources, on-demand and on a pay-as-you-go pricing model. Table 1.1 describes a number of different cloud servers along with their main hardware-software specifications and costs.¹

¹ Server specifications and costs were collected during our research work, 2011-2013.


| IaaS Cloud Provider | Server Type and Specification | Charges |
| Amazon Web Services | General purpose (standard) servers - Small: Linux OS, 1 vCPU (32-bit or 64-bit), 1.7 GiB, 1x160 GB, low I/O performance | $0.08/hour |
| GoGrid | Standard cloud server - Large: Linux OS, 4 virtual cores, 4 GiB, 200 GB, bandwidth charged separately | $0.32/hour |
| Joyent | High I/O cloud servers: Linux OS, 8 vCPU, 60.5 GiB, 1452 GB, 1 Gbit/s | $3.067/hour |
| Amazon Web Services | Memory optimized servers - Extra Large: Windows OS, 2 vCPU (64-bit), 6.5 GiB, 1x420 GB, moderate I/O performance | $0.51/hour |
| Rackspace | General purpose cloud servers: Windows OS + SQL standard, 2 vCPU, 4 GiB, 160 GB, bandwidth included | $1.04/hour |
| Amazon Web Services | Compute optimized servers - Extra Large: Linux OS, 8 vCPU (64-bit), 7 GiB, 4x420 GB, high I/O performance | $0.58/hour |

Table 1.1: Example of Different IaaS Cloud Server Offerings

From Table 1.1 we can highlight a number of important points. First, IaaS providers bundle computing resources based on hardware and software specifications and capacity. Second, the usage charges of each bundle are made per hour, which means cloud consumers need to pay only for the computing resources they use. Changes in one or more hardware and/or software specifications influence usage charges; often, the higher the capacity or the more add-ons, the higher the charges. Third, the hardware or software included in cloud server bundles may vary between different offerings. For example, unlike the other providers, GoGrid does not include bandwidth in its server packages and charges for it separately. Fourth, some providers offer special-purpose cloud servers in addition to the general-purpose (standard) servers. For example, AWS offers compute-optimized and memory-optimized servers, in which CPU capacity and memory capacity respectively are high. Such cloud servers can be useful for CPU-intensive and memory-intensive applications. Similarly, Joyent offers I/O-optimized servers, which can be utilized for applications that require high I/O capacity.
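To make the per-hour billing model concrete, the following minimal sketch (in Python) computes the usage charge for a single server. It is illustrative only: the rates are taken from the examples in Table 1.1, the server-type names are hypothetical labels, and we assume usage is rounded up to whole hours, which was the common billing granularity at the time.

import math

# Hypothetical per-hour rates, taken from the examples in Table 1.1.
RATES_PER_HOUR = {
    "aws_standard_small": 0.08,
    "gogrid_standard_large": 0.32,
    "rackspace_general_purpose": 1.04,
}

def usage_charge(server_type, minutes_used):
    # Usage is assumed to be billed per started hour.
    billed_hours = math.ceil(minutes_used / 60)
    return round(billed_hours * RATES_PER_HOUR[server_type], 2)

# Example: a small AWS server used for 150 minutes is billed for 3 hours.
print(usage_charge("aws_standard_small", 150))  # 0.24

Under this model, a server released a few minutes into a new hour still incurs the full hour's charge, which is one reason the timing of scaling actions matters for cost.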

IaaS providers offer cloud servers prepackaged with commonly used Operating Systems (OSs), e.g., Linux and Windows, with different supported versions of each. Here, cloud server usage charges are also influenced by the type of OS used. Windows-based cloud servers are often more expensive than Linux servers due to additional licensing costs. In addition, some IaaS providers prepackage cloud servers with popular system and/or application software such as the JBoss application server, the Apache web server, IBM WebSphere and the MySQL database server. In our example, Rackspace provides SQL standard software in its Windows servers. The included software often increases server usage charges due to licensing, management and maintenance costs. Most IaaS providers also allow cloud consumers to customize and configure certain cloud servers with any OS and application and system software they may need.

1.2 Service Level Agreements

IaaS cloud offerings have increasingly attracted a wide range of applications [16], including Internet business applications such as internet banking, online retailers and online Customer Relationship Management (CRM). One important characteristic that distinguishes such applications is workload variability [17; 18; 19], where the number of an application's users and its usage patterns often grow and shrink at various rates and times due to its exposure to the web. Furthermore, workload variability has also increased as most organizations and enterprises have been increasingly provisioning their business applications, or parts of their business processes, as cloud services (SaaS) by hosting them on a cloud platform. These business applications are bound to Service Level Agreements (SLAs), which define a contract that ensures quality levels of the offered services to the application's users, or are defined internally by the business organization to ensure that the organization's performance objectives are met [20; 21; 22].

An SLA is a formal contract between the service provider, e.g., a business organization or enterprise, and the service consumer, i.e., the organization's customers or their application's users [21; 22]. It formally defines the functional and non-functional, or Quality of Service (QoS), properties and their levels that a service provider is obliged to meet for their users or customers. An SLA specifies one or more QoS properties, or SLA parameters, that can be used to measure how well an application is performing [22]. The QoS properties are defined by specifying a business or technical metric that has to be monitored continuously during the application's execution.

The monitoring and measurement of QoS metrics takes place either at the client side or the provider side. In the context of this thesis, we assume all measurements occur at the provider side, as there might be a number of different factors at the client that could influence the accuracy of measurement. Each QoS property is evaluated against a Service Level Objective (SLO). An SLO represents a threshold that is used to continuously evaluate whether the desired QoS satisfaction level has been met. SLOs may vary depending on how critical the application, or part of its processes, is to the business enterprise's goals.

Table 1.2 describes a number of QoS properties and their SLOs that are crucial for Internet-based business applications. In this example, the SLA consists of three QoS properties and SLOs, which can be read as follows. The application response time, which should be measured periodically, has to be less than or equal to 78 milliseconds. If the response time goes beyond this threshold then the response time SLO is considered to be unsatisfied. The throughput QoS states that the application throughput must be greater than 75 requests per second. Similarly, the availability indicates that the application must be available 98% of the time during a certain time period, e.g., a day, week or month. Further details such as when, how and how regularly to monitor such QoS properties can be described in the SLA and agreed between the service provider and their users or customers.


| QoS Property | SLO |
| Response Time | ≤ 78 ms |
| Throughput | ≥ 75 req/s |
| Availability | ≥ 98% |

Table 1.2: Example of an SLA: QoS properties and SLOs

The QoS properties and SLOs can be specified at different levels of the system, e.g., at the application level, at the component level or at the tier level.
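As a concrete illustration of SLO evaluation, the sketch below (a simplified provider-side check we assume for illustration, not a mechanism from this thesis; the metric names are hypothetical) tests measured QoS values against the SLOs of Table 1.2.

# SLOs from Table 1.2: each entry maps a QoS property to
# (comparison operator, threshold).
SLOS = {
    "response_time_ms": ("<=", 78),
    "throughput_req_s": (">=", 75),
    "availability_pct": (">=", 98),
}

def slo_satisfied(qos_property, measured_value):
    # Return True if the measured QoS value meets its SLO.
    op, threshold = SLOS[qos_property]
    if op == "<=":
        return measured_value <= threshold
    return measured_value >= threshold

# Example: 85 ms response time violates its SLO; 99.2% availability meets its SLO.
print(slo_satisfied("response_time_ms", 85))    # False
print(slo_satisfied("availability_pct", 99.2))  # True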

Meeting SLAs is one of the crucial concerns for most Internet business applications, and therefore a primary concern for business organizations and enterprises [20; 23; 24; 25; 26]. Failure to meet promised SLOs can make the provider liable for penalties. Such penalties are usually part of the SLA between the providers and their consumers [21; 22; 27]. SLA violations cost service providers additional charges, as they have to pay penalties to the cloud consumers affected by QoS degradations. Furthermore, failure to meet SLOs can also lead to loss of business due to customer dissatisfaction [28], especially in the case of service failure or unplanned downtime [29; 30]. Although most business organizations and enterprises agree to pay penalty charges for their SLA violations, such penalties do not quantify the actual business losses. Service degradations or failures often lead to the loss of a large number of business transactions. As a result, this affects the reputation and credibility of the business organizations who offer such business applications [28; 29; 31].

Most internet service providers, including IaaS providers, offer monetary compensation for any QoS performance degradation that does not comply with the SLOs contracted with their consumers, i.e., business organizations. Such compensation does not appropriately quantify business losses, as the calculation of agreed SLA penalties is not based on exact business losses. It has been argued that whatever compensation cloud providers offer, it often will not cover the cost of lost revenue, breach of customer SLAs or loss of market share as a result of credibility issues [32]. Essentially, the cost of downtime or failure is very high for enterprises, especially in specific application domains [29; 30]. For example, GoGrid, like many other IaaS providers, offers compensation of seven hours of downtime with ten servers (approximately $560). This will not compensate an e-business web application, which could easily lose thousands of dollars of sales, plus its reputation and credibility with its customers [31].

Among the most important QoS properties for Internet business applications are response time and availability. Due to its significance, the response time SLA has been widely considered in various performance and cloud studies, including [18; 23; 25; 33; 34]. Other studies, such as [28; 35], have shown the negative impact of the high response time of web applications and transactions on a business. In most cases, end users or customers are likely to leave a page or transaction if it takes too long to respond to their requests.

Performance is among the key service levels that hinder the adoption of cloud services [36]. The multi-tenancy feature, in which the public cloud infrastructure is shared, makes it complex to predict system performance, especially on a public cloud. This leads to what is called "performance variability" [36], where cloud consumers observe variable and unpredictable performance over various periods of time. For cloud service providers it is very challenging and costly to guarantee a certain level of performance to their consumers. For cloud consumers it can incur significant financial losses, especially in performance-critical applications, e.g., stock trading, as monetary compensation has to be paid when SLA levels are violated. In their long-term study of performance traces of Amazon and other production cloud services, Iosup et al. [36] show that service performance varies in significance and magnitude between different IaaS services of the same provider.

Cloud infrastructure availability is one of the biggest challenges facing the public cloud infrastructure model [7; 32]. Most IaaS cloud providers provide availability guarantees for their cloud services, along with monetary compensation in case of service downtime. Durkee [32] argues that a meaningful SLA should provide an availability of five-nines of uptime, i.e., 99.999%, with 10% monetary compensation if that availability level is not satisfied in any month. None of the IaaS providers, however, can guarantee the "five-nines" of uptime, and they try to attract consumers with usage rewards or cash discounts. Such publicly available cloud infrastructure services are not designed to achieve "five-nines" uptime due to the unpredictable performance behavior that results from multi-tenancy, sharing and virtualization.

One of the key challenges facing business organizations and enterprises is how to maintain consistent and appropriate SLA levels for their business applications in a cost-efficient way. This challenge arises primarily from the growing workload variability of Internet business applications [18]. Particularly, the exposure of such applications, or parts of their business processes, to the web and offering them as consumable services have led to variable user access patterns and volumes. This, as a result, has created demand for a flexible way to dynamically adapt the underlying computing infrastructure resources in response to an application's workload changes. The dynamic change of computing resources would yield two significant cost benefits: (1) reducing SLA violations, and hence penalty costs and customer dissatisfaction, and (2) utilizing computing resources efficiently by dynamically changing them in a timely manner and reducing under-utilized resources over time. Realizing such benefits requires investigating ways that can help business organizations to understand how efficient the flexible use of computing resources can be in maintaining appropriate SLAs. Therefore, it becomes necessary for business organizations to understand IaaS elasticity and its mechanisms to achieve this goal.

1.3 IaaS Cloud Elasticity

IaaS elasticity is a service quality attribute that refers to the dynamic and flexible provisioning of computing resources, and the ability to automatically scale them to meet cloud consumers' changing needs [6; 8]. IaaS elasticity allows enterprises, or cloud consumers, to dynamically acquire and release as many computing resources as they need, on-demand and through internet-based interfaces. Here, computing resources can be servers, disk storage, memory and network. In practice, IaaS elasticity is also known as auto-scaling [8; 37; 38]. There are two main types of elasticity or auto-scaling: vertical scaling and horizontal scaling [8; 39]. Replacing an existing computing resource with another one of larger or smaller capacity is vertical scaling (also known as scale-up/down). Horizontal scaling, on the other hand, refers to adding new computing resources (of any capacity) to, or removing them from, the existing resources (also called scale-out/in). In practice, scale-out/in is more common than scale-up/down as it is easier to implement and provides a more reliable fail-over scenario [8; 40]. Therefore, in this thesis we focus on the scale-out/in type of IaaS cloud elasticity.

IaaS elasticity is crucial for many classes of applications, including Internet business applications [8; 16]. It provides automated mechanisms to dynamically change computing resource capacity at fine-granular levels. The costs of such resources are charged based on capacity and usage time. These features provide business enterprises with a flexible and efficient way to meet an application's workload changes and hence its SLA requirements. When application workload volumes change, appropriate scale-out/in decisions can be triggered to add appropriate capacity, to avoid potentially poor application performance and to reduce potential SLA violations and charges [7; 8; 41].

IaaS providers enable cloud consumers to control elasticity through implicit or explicit mechanisms. The most common mechanism is elasticity policies or rules. For example, Amazon Auto Scaling [37] allows cloud consumers to set elasticity rules that define actions to be executed in response to conditions. These conditions have to be defined by cloud consumers based on thresholds over measurable metrics and parameters. Cloud consumers can define scale-out rules to add computing resources (e.g., servers) and scale-in rules to remove computing resources. Similar facilities are provided for Windows Azure through a library called the Windows Azure Auto-scaling Block (WASABi) [42], and by third-party cloud management platforms such as Scalr [43] and RightScale [44].

In such elasticity rules, a number of thresholds, e.g., CPU utilization thresholds, form the basis on which the elasticity service decides when to scale-out or scale-in computing resources. Figure 1.1 presents an example of such elasticity rules.


Monitor CPU Utilization (U) every 1 min.

IF U > 80% FOR 7 min.
    Add 1 server of small capacity      // Scale-out
    Wait 5 consecutive 1-min. intervals

IF U < 30% FOR 10 min.
    Remove 1 server of small capacity   // Scale-in
    Wait 7 consecutive 1-min. intervals

Figure 1.1: Example of CPU-based Elasticity Rules

An elasticity rule consists of a condition and actions. In this example, if CPU utilization increases above 80% for 7 minutes (the condition), the rule triggers to add 1 server (scale-out) and to wait 5 minutes before evaluating this rule again (the actions). The scale-in rule triggers when CPU utilization decreases below 30% for 10 minutes, removing one server (scale-in) and re-evaluating the rule after 7 minutes. Changing one or more threshold or parameter values influences when a scale-out or scale-in action is triggered, and therefore directly influences resource and application performance (and SLA guarantees) and cost requirements [45]. For instance, setting a low value for the CPU utilization threshold can improve application performance (e.g., response time SLA), but at the expense of high server usage cost and under-utilized servers [45]. In contrast, setting a high value for the CPU utilization threshold can reduce server cost, but at the expense of poor application performance due to the potential over-utilization of servers. The same applies to the other parameters and thresholds that must be specified in elasticity rules. Resource utilization, an application's SLOs and cost factors could also depend on the application's workload patterns.
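The sketch below restates the rules of Figure 1.1 as executable logic; it is a minimal illustration of how such rules are evaluated, not the algorithms developed later in this thesis. We assume one CPU utilization sample per minute, and the duration conditions and post-action waits are tracked with simple counters.

SCALE_OUT_CPU, SCALE_OUT_MINS, SCALE_OUT_WAIT = 80, 7, 5
SCALE_IN_CPU, SCALE_IN_MINS, SCALE_IN_WAIT = 30, 10, 7

def evaluate_rules(cpu_samples):
    # Replay the CPU-based rules of Figure 1.1 over a sequence of
    # per-minute average CPU utilization samples (0-100).
    servers, high, low, cooldown = 1, 0, 0, 0
    for u in cpu_samples:
        if cooldown > 0:            # waiting interval after an action
            cooldown -= 1
            continue
        high = high + 1 if u > SCALE_OUT_CPU else 0
        low = low + 1 if u < SCALE_IN_CPU else 0
        if high >= SCALE_OUT_MINS:  # condition held for 7 minutes
            servers += 1            # scale-out: add 1 small server
            high, low, cooldown = 0, 0, SCALE_OUT_WAIT
        elif low >= SCALE_IN_MINS and servers > 1:
            servers -= 1            # scale-in: remove 1 small server
            high, low, cooldown = 0, 0, SCALE_IN_WAIT
    return servers

# Example: eight consecutive minutes above 80% trigger one scale-out action.
print(evaluate_rules([85] * 8))  # 2

Even in this toy form, every constant above is a consumer-chosen threshold, which illustrates how many decisions the cloud consumer must get right for the rules to be cost-effective.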

Therefore, the most important inputs required from cloud consumers are the parameters and thresholds of elasticity rules. Specifying the elasticity thresholds inaccurately can result in adding excess servers due to over-provisioning, or in the loss of QoS guarantees specified in terms of SLAs for the application due to server under-provisioning. In such cases, defining cost-effective elasticity rules becomes one of the most crucial needs for enterprises that want to benefit from IaaS elasticity. Here, we use the term economic or cost-effective elasticity to refer to the efficient use and management of the computing resources required to run business applications on an IaaS cloud with the desired SLOs and costs. In the elasticity rules above, we explained two examples of how achieving economic elasticity can be influenced by one or more thresholds and parameters defined in elasticity rules. This becomes even more crucial for business applications facing dynamic workloads and bound to rigid SLOs and SLA violation penalties. Hence, it is critical that the cloud consumer specify the most appropriate thresholds, i.e., those that lead to economic elasticity.

Another important challenge that faces cloud consumers in achieving economic elasticity is the type of metrics to be used as the basis of elasticity rules. Most IaaS providers, such as [37; 42; 44], enable cloud consumers to define elasticity rules based on resource-level metrics such as the CPU and memory utilization of a server and the amount of inbound and outbound bytes of a network. This is reasonable, as IaaS providers can monitor and manage their own computing infrastructure resources, which are homogeneous in nature. Application-level metrics such as response time are, however, also crucial for cloud consumers [8; 28; 32]. Specifically, application-level metrics form the basis of the SLOs that the consumer is obliged to meet as part of the SLAs with their users or customers [32; 46]. Maintaining appropriate SLA levels for business applications when they are hosted on an IaaS cloud is crucial. Such applications have critical business value, and any degradation in meeting SLOs translates into financial loss and customer dissatisfaction [28; 32; 47]. Currently, IaaS providers only provide limited resource-level SLA guarantees, such as resource availability [8; 32; 48; 49], and they do not offer or support any guarantees on consumer-specific application SLOs. In the face of these challenges, it becomes the responsibility of cloud consumers to ensure appropriate SLA satisfaction for their business applications deployed on an IaaS cloud. The first challenge in this direction is how cloud consumers can define elasticity rules based on application-level metrics for their applications hosted on an IaaS cloud. The second challenge is concerned with evaluating both resource-specific and application-specific elasticity approaches in terms of the performance-cost metrics of the application and the IaaS resources. This includes evaluating the impact of different metric thresholds for application-specific and resource-specific elasticity.

Evaluating application-level and resource-level elasticity approaches also raises further challenges. One important challenge concerns server instance types. Most IaaS providers offer different types of servers (known as server instances) that vary in terms of compute capacity and usage charges [8]. For example, AWS offers a variety of server instance categories, each of which varies in the proportion of CPU and memory capacity, including High-Memory and High-CPU instances. Under each category, a number of instance types are also offered with varying amounts of memory and/or CPU capacity. The usage charge per hour also varies accordingly. With such server offerings, cloud consumers need to investigate and analyse the performance-cost metrics of both elasticity approaches: the resource-specific and the application-specific approach.

Another important challenge that adds to evaluating the performance of elasticity approaches is the performance variability of public IaaS computing resources, which has been reported by recent studies [50; 51; 52; 53]. These studies showed that computing resources, including servers, of different IaaS providers exhibit variable performance over time. This performance variability also raises the question of how consistent the performance and costs of application-level and resource-level elasticity approaches will be over time, and to what extent cloud consumers can rely upon them.

1.4 Thesis Aims

In this section, we describe the main and secondary research aims addressed in this thesis. In Sections 1.2 and 1.3, we discussed the research context and the key challenges of IaaS cloud elasticity respectively, which motivate the thesis research topics. Based on this motivation, we present the concrete aims which are the focus of the research carried out in this thesis. The overall aim of this thesis is:

To develop models that emulate the behavior of IaaS elasticity and to evaluate the performance of IaaS elasticity approaches in terms of application and resources metrics from a cloud consumer perspective

For feasibility purposes, we define this research aim within a scope that specifies a number of important constraints as follows. The deployment and service models we use are the public cloud and the IaaS cloud model respectively. We focus on the Internet business application class, such as online shops and business services. For this application class, we adopt a 3-tier application architecture, namely web, application and database tiers, each of which is considered to be physically deployed on a separate cloud server. The focus of elasticity modeling and performance evaluation is scoped at the application tier. Here, elasticity rules are considered to be defined on the application tier to automatically increase and decrease the number of application servers to meet application workload changes. We also consider horizontal scaling actions, i.e., scale-out and scale-in operations defined at the application tier. We choose two types of rule-based elasticity which are widely used by cloud consumers and IaaS providers. For resource-level elasticity, we use the average CPU utilization (of all servers at the application tier) as the monitored metric, and refer to it as CPU-based elasticity. Application-level elasticity, on the other hand, is based on the response time SLA metric, which is measured at the application tier, and we refer to it as SLA-based elasticity. The metrics (resource and application) considered in evaluating the performance of the elasticity mechanisms are collected at the application tier as well. These metrics are the average CPU utilization, the percentile of SLA response time, the number of servers triggered by scale-out actions and the server usage cost. We also collect the end-to-end response time, which is measured across all tiers of the application. Further details about each of these constraints will be provided in the appropriate sections.

We further divide the overall research aim into a number of research objectives, each of which will be addressed and evaluated individually. The following describes these objectives and the scope within which they will be realized.

To develop analytical models that can approximate the parameters and thresholds and related performance and cost metrics of CPU-based elasticity

Defining cost-effective elasticity rules requires choosing appropriate values for their parameters and thresholds. The chosen values have to satisfy application and resource performance and cost metrics that are crucial for cloud consumers. Our objective here is to model the key parameters and thresholds of the resource-level elasticity mechanism, which are exemplified in figure 1.1. The modeling focuses on CPU-based elasticity, in which the CPU utilization metric is used as the basis for triggering auto-scaling actions. Furthermore, this modeling is used to emulate how the CPU-based elasticity mechanism works in a real IaaS cloud environment. This objective also includes modeling application and resource metrics that are influenced by elasticity parameters and thresholds. These metrics are average CPU utilization, average application response time, number of servers and server usage costs. As previously reasoned, the scope of our modeling is at the application tier of a 3-tier internet business application hosted on a public IaaS cloud. Further details about this aim are provided in chapter 4.
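To give a flavor of the kind of relationship such models capture, a simple queueing-style approximation (purely illustrative, and not the actual models developed in chapter 4) could relate the request arrival rate \lambda(t), the average CPU service demand per request d and the current number of application servers n(t) to average utilization and response time:

    U(t) = \frac{\lambda(t)\, d}{n(t)}, \qquad R(t) \approx \frac{d}{1 - U(t)}, \qquad 0 \le U(t) < 1

Under such an approximation, utilization falls as servers are added, and response time grows sharply as utilization approaches saturation, which is precisely the behavior that elasticity thresholds are designed to control.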

To develop a simulation that emulates the CPU-based elasticity mechanism and corresponding performance and cost metrics

The modeling of CPU-based elasticity parameters and thresholds provides the core building blocks of elasticity rules. The main objective here is to use these models to develop algorithms that simulate how CPU-based elasticity works in a real IaaS environment. This includes simulating the evaluation of auto-scaling conditions and actions over time, and simulating the real conditions under which decisions are made to scale a group of cloud servers. The algorithms also aim to simulate the approximation of the corresponding metrics that are influenced by changes in the elasticity mechanism, i.e., average CPU utilization, average application response time, number of servers and server usage cost. Similar to the modeling objective, the scope of the simulation will be at the application tier of a 3-tier business application, which is running on an IaaS cloud. The models and simulation will provide a platform for conducting experimental evaluation and analysis of various elasticity rules, as described in the following objective.
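As a rough sketch of the kind of discrete-time loop such a simulation involves (the workload function, per-server capacity and threshold values below are hypothetical assumptions, and this is not the Matlab implementation described in chapter 4):

    import math

    def simulate_cpu_elasticity(workload, capacity_per_server, steps, dt=60,
                                out_threshold=75.0, in_threshold=30.0,
                                cooldown_s=300, min_servers=1, max_servers=10):
        """Toy discrete-time loop: evaluate a CPU rule each step and track cost."""
        servers, cooldown, server_hours = min_servers, 0, 0.0
        for step in range(steps):
            load = workload(step * dt)  # offered load (requests/s) at this time
            # Approximate average CPU utilization across the application tier.
            util = min(100.0, 100.0 * load / (servers * capacity_per_server))
            if cooldown > 0:
                cooldown -= dt  # scaling is suspended during the cool-down period
            elif util > out_threshold and servers < max_servers:
                servers += 1            # scale-out: add one application server
                cooldown = cooldown_s
            elif util < in_threshold and servers > min_servers:
                servers -= 1            # scale-in: remove one application server
                cooldown = cooldown_s
            server_hours += servers * dt / 3600.0  # accumulate billable server-hours
        return servers, server_hours

    # Example run: exponentially growing load against 100 requests/s per server.
    final_servers, hours = simulate_cpu_elasticity(
        workload=lambda t: 50 * math.exp(0.0005 * t),
        capacity_per_server=100.0, steps=240)

A loop of this shape makes explicit the three ingredients the thesis models: the scaling condition, the cool-down period that suppresses repeated actions, and the server-usage cost accumulated over time.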

To analyse the capability of the analytical models in simulating the real behavior of CPU-based elasticity and approximating performance and cost metrics

The main purpose of the analytical models and algorithms is to approximate how CPU-based elasticity works in a real IaaS cloud environment. Therefore, it is crucial to evaluate how well the simulation of the analytical models performs in comparison to CPU-based elasticity behavior under a real IaaS cloud environment. We aim here to analyse how well our analytical models can simulate the behavior and the performance and cost metrics of the CPU-based elasticity mechanism. The analysis focuses on the ability of our models to emulate the real behavior of CPU-based elasticity rules, including when to trigger scale-out and scale-in actions. Furthermore, it focuses on evaluating how well our CPU-based elasticity simulation can approximate important application and resource metrics. These metrics are average CPU utilization, average application response time, number of triggered servers and server usage cost. The metrics are computed at the application tier of a 3-tier Internet business application. These metrics are compared with the empirical metrics obtained from our empirical experiments with the same CPU-based elasticity rule set defined at the application tier of an online bookshop application running on an IaaS cloud. The analysis is carried out from the perspective of a cloud consumer who wants to achieve desired performance and cost metrics by realizing IaaS elasticity. The above two aims are addressed in further detail in chapter 4.
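One simple way such simulated-versus-empirical agreement could be quantified (the thesis does not prescribe this particular statistic; it is shown only as an illustration) is a mean absolute percentage error over paired metric samples:

    def mape(simulated, empirical):
        """Mean absolute percentage error between a simulated and an empirical
        metric series; assumes the empirical values are non-zero."""
        pairs = list(zip(simulated, empirical))
        return 100.0 * sum(abs(s - e) / abs(e) for s, e in pairs) / len(pairs)

Applied, say, to per-interval average CPU utilization, a low value would indicate that the simulation tracks the empirically observed behavior closely.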

To evaluate the performance of CPU-based and application SLA-based elasticity under real IaaS cloud conditions

As previously explained, elasticity mechanisms can be driven by either resource or application metrics. Each mechanism relies on a different type of metric to decide when to trigger scale-out and scale-in actions, and hence each elasticity mechanism is likely to have a different influence on the performance and cost metrics that are crucial for cloud consumers. Therefore, our objective here is to evaluate how well resource-level and application-level elasticity mechanisms perform under real IaaS cloud environment and conditions. We choose average CPU utilization and response time SLA at the application tier as the basis for the CPU-based and SLA-based elasticity mechanisms respectively. To maintain an appropriate scope of evaluation, the evaluation is further divided into a number of dimensions, each of which is concerned with a specific real-world condition or scenario that exists in cloud environments. The first dimension is concerned with evaluating the performance of a set of CPU-based and SLA-based elasticity rules with different CPU utilization and response time SLA thresholds. This aims to analyse the impact of such thresholds on satisfying important performance and cost metrics. The second dimension is concerned with evaluating the performance of CPU-based and SLA-based elasticity rules on two cloud servers with different capacity profiles. The main aim here is to provide insights into how cloud server profiles could impact the performance of CPU-based and SLA-based elasticity rules. The third dimension focuses on evaluating the performance consistency of CPU-based and SLA-based elasticity rules on the same two cloud servers of different capacity profiles. This aims to provide insights into whether the performance variability of cloud servers would influence the performance of CPU-based and SLA-based elasticity mechanisms. The evaluation in all dimensions is carried out in terms of resource and application metrics important for cloud consumers. These are CPU utilization, SLA satisfaction, response time, server cost and percentage of successfully served requests. All these metrics are scoped at the application tier.

Evaluating the performance of SLA-based elasticity demands an approach that enables elasticity based on SLA metrics. For this reason, we aim here to develop an approach for enabling SLA-based elasticity for Internet business applications running on an IaaS cloud. This includes the development of an architecture and methods to scale an IaaS cloud based on application SLA metrics. Following the scope of the thesis, the design of our SLA-based elasticity approach is focused on a multi-tier application architecture which is deployed on an IaaS cloud. Auto-scaling decisions are also focused on the application tier. The chosen metric for deriving elasticity decisions is the percentile of SLA satisfaction of request response times. We also aim to implement the SLA-based elasticity architecture on an IaaS cloud and to use it to carry out the performance evaluation of the SLA-based elasticity mechanism. Therefore, the design of the SLA-based elasticity architecture becomes an integral part of the performance evaluation aim. Further details of this aim will be addressed in chapter 5.
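As a minimal sketch of the kind of decision logic this metric drives (the 470 ms SLO is borrowed from the SLA example given in chapter 2; the satisfaction thresholds are hypothetical, and the actual algorithms are presented in chapter 5):

    def sla_satisfaction(response_times_ms, slo_ms=470.0):
        """Percentage of requests in the monitoring window meeting the SLO."""
        met = sum(1 for rt in response_times_ms if rt <= slo_ms)
        return 100.0 * met / max(1, len(response_times_ms))

    def sla_scaling_decision(response_times_ms, scale_out_below=95.0,
                             scale_in_above=99.0):
        """Map the SLA satisfaction percentile to a scaling action."""
        satisfaction = sla_satisfaction(response_times_ms)
        if satisfaction < scale_out_below:
            return "scale-out"  # SLA dropping below the desired level: add a server
        if satisfaction > scale_in_above:
            return "scale-in"   # comfortably above target: release capacity
        return "no-op"

The key contrast with CPU-based elasticity is that the trigger is computed from client-visible response times rather than from server-side resource utilization.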

1.5 Thesis Contributions

Based on the research aims presented in section 1.4, we highlight the primary research contributions of the work conducted in this thesis. These contributions add to the research output in a number of areas including modeling the performance of elasticity mechanisms for cloud-based applications, empirical evaluation of resource-level and application-level elasticity mechanisms, and deriving elasticity based on application SLA metrics for multi-tier business applications running on an IaaS cloud. The contributions are discussed as follows.

Modeling and simulating the behavior of CPU-based elasticity and corresponding performance and cost metrics.

The contribution in this dimension is threefold. First, analytical models that approximate the core parameters and thresholds of CPU-based elasticity at the application tier of a multi-tier application running on an IaaS cloud. Specifically, the models approximate CPU utilization, changes in the number of servers over time, scale-out and scale-in cool-down time periods, and maximum and minimum limits on the number of servers. These are crucial elements for emulating the behavior of the CPU-based elasticity mechanism. Second, models that approximate performance and cost metrics that are crucial for evaluating the performance of the CPU-based elasticity mechanism. These metrics are the application's average response time, average CPU utilization, number of servers and server usage cost at the application tier. Third, algorithms that simulate how the CPU-based elasticity mechanism works, i.e., when to scale-out and when to scale-in cloud servers at the application tier of an application running on an IaaS cloud. The algorithms also approximate the values of average response time, changes in the number of servers and the server usage cost incurred by scale-out and scale-in actions over the execution of the CPU-based elasticity mechanism.

The scope of the proposed models and algorithms is defined in terms of a number of aspects. First, the modeling and algorithms of CPU-based elasticity and corresponding metrics focus on the application tier of multi-tier web applications running on an IaaS cloud platform. Second, the modeling focuses on web application workload that is modeled exponentially. It does not consider other types of web workloads such as spiking workloads. Third, the application tier is considered to be equally load-balanced, in which all requests received by the web tier are evenly distributed between all servers at the application tier. Further details of this contribution are presented in chapter 4.
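For illustration, an exponentially increasing arrival rate of the kind assumed here could be generated as follows (the base rate and growth constant are illustrative assumptions only):

    import math

    def exponential_workload(t_seconds, base_rate=10.0, growth=0.0005):
        """Request arrival rate (requests/s) under exponential workload growth."""
        return base_rate * math.exp(growth * t_seconds)

    # With these assumed constants, the offered load doubles roughly every
    # ln(2) / 0.0005 ≈ 1386 seconds (about 23 minutes).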

Performance evaluation and analysis of CPU-based elasticity in simulation and empirically in a real-world IaaS cloud environment

This contribution focuses on two main aspects. First, validating the analytical models of CPU-based elasticity using Matlab simulation and empirically in a real cloud environment. In the simulation, we have implemented our models and algorithms in Matlab, and we have used them to simulate CPU-based elasticity and application and resource metrics. The empirical evaluation is based on the TPC-W online bookstore application running on the AWS cloud. Using both simulation and empirical experiments, we have provided analysis of how well the simulated models can approximate the behavior of CPU-based elasticity and the corresponding application and resource metrics. This validation provides experimental evidence of how well our analytical models and algorithms perform in emulating the behavior of CPU-based elasticity, in comparison to the real behavior of CPU-based elasticity in an IaaS cloud environment. The validation also shows the extent to which our CPU-based elasticity models can approximate core metrics including the application's average response time, average CPU utilization and server usage cost at the application tier. The second aspect of this contribution is evaluating and analysing the impact of the CPU thresholds, which are defined in CPU-based elasticity rules, on performance and cost metrics. The evaluation is also carried out using the same empirical and simulation experiments. It provides empirical evidence on the capability of our elasticity models in capturing the impact of CPU thresholds on application response time, number of servers and server usage cost. These contributions are also constrained by the same scope presented in the previous paragraph.

The CPU-based elasticity models and algorithms provide a tool that can support cloud consumers in efficiently evaluating different elasticity rules' thresholds and parameters. It can also help them to carry out trade-off analysis of the impact of changing one or more parameters on the application's response time, CPU utilization, number of servers and server usage cost at the application tier of multi-tier cloud-based applications. Further details on this contribution are presented in chapter 4.

Empirical evaluation and trade-off analysis of the performance of application SLA-based and CPU-based elasticity in real IaaS cloud conditions.

This contribution is primarily focused on evaluating the performance of SLA-based and CPU-based elasticity mechanisms in a real-world IaaS cloud environment. Particularly, we have evaluated the performance of a number of SLA-based and CPU-based elasticity rules with different SLA and CPU utilization thresholds respectively. The SLA-based elasticity is based on application response time SLA computed as a percentile at the application tier only. The evaluation is based on the TPC-W online bookstore benchmark deployed and running on the AWS cloud with a 3-tier architecture. The application workload has been generated using the TPC-W workload generation software. We have evaluated SLA-based and CPU-based elasticity rules with exponentially increasing web application workload. Other types of workloads are not considered within the scope of this contribution. We have also evaluated two sets of SLA-based and CPU-based elasticity rules on two server instances with different capacity profiles, small and medium server instances. Furthermore, we have evaluated the consistency of performance of CPU-based and SLA-based elasticity on the same cloud server instances.

In all evaluations, we have provided detailed trade-off analysis of important application and resource metrics including response time SLA satisfaction, end-to-end response time, percentage of served/dropped requests, average CPU utilization, number of servers and server usage cost at the application tier. This performance analysis equips cloud consumers with empirical evidence on a number of important aspects. First, it provides empirical evidence on how well each elasticity mechanism performs in terms of crucial performance and cost metrics. Second, it provides empirical evidence of the impact of changing SLA and CPU thresholds on those metrics. Third, it equips cloud consumers with empirical evidence on how well CPU-based and SLA-based elasticity could perform on different cloud server types in terms of performance and cost metrics from a cloud consumer perspective. Fourth, it provides empirical evidence on how consistent the performance of elasticity rules can be over time given cloud resource performance variability. Further details on this contribution can be found in section 5.2.

Architecture and methods for supporting SLA-based elasticity for multi-tier web applications on IaaS cloud.

As previously explained, evaluating the performance of SLA-based elasticity demanded an approach for enabling an auto-scaling mechanism based on SLA metrics. To achieve this, we developed an architecture for enabling IaaS cloud elasticity based on application SLA metrics. Therefore, this contribution is concerned with the design of an SLA-based elasticity approach for multi-tier applications running on an IaaS cloud. In addition, we have developed algorithms that determine SLA satisfaction based on monitored response times and SLOs of different request types at the application tier. Using these monitoring and SLA algorithms, we have also developed application-level SLA auto-scaling algorithms that dynamically scale-out and scale-in a pool of cloud servers at the application tier based on the percentile of the response time SLA metric.


The application SLA elasticity approach provides cloud consumers with a way to automatically scale IaaS cloud servers based on the response time SLA of their applications. It can therefore be used to ensure appropriate levels of response time SLA are met by automatically adding servers when the SLA is below the desired level and removing servers when the SLA crosses above the desired level. The SLA elasticity approach also allows cloud consumers to evaluate the performance and cost trade-offs between SLA-based and CPU-based elasticity rules. Further details on this contribution can be found in section 5.2.

1.6 Thesis Organization

Most of the content of this thesis has been peer-reviewed and published in international conferences and journals. The following papers represent the core content of the thesis.

• JISA2011 : B. Suleiman, S. Sakr, R. Jeffery, and A. Liu, On understanding the economics and elasticity challenges of deploying business applications on public cloud infrastructure, JISA, 2011, vol. 2, no. 3, pp. 1-21.

• SCC2012 : B. Suleiman, Elasticity economics of cloud-based applications, in 2012 IEEE Ninth International Conference on Services Computing (SCC), 2012, pp. 694-695.

• WISE2012 : B. Suleiman, S. Sakr, S. Venugopal, and W. Sadiq, Trade-off analysis of elasticity approaches for cloud-based business applications, in WISE2012, 2012, pp. 468-482.

• EDOC2013 : B. Suleiman and S. Venugopal, 17th IEEE International Enterprise Distributed Object Computing Conference, EDOC 2013, Vancouver, Canada. IEEE, Sep. 2013, pp. 201–206.

• UNSW-CSE2013 : B. Suleiman and S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, School of Computer Science and Engineering, University of New South Wales, Tech. Rep. UNSW-CSE-TR-201323, Sep. 2013.

The rest of this thesis is organized as follows:

• Chapter 2 describes background information that is fundamental for understanding the work presented in this thesis. It starts by presenting the pillars of cloud computing and its deployment, service and business models. Application Service Level Agreements (SLAs) and their significance for cloud-based applications are then explained, with a focus on two important types of SLAs in cloud environments. The concept of IaaS elasticity and types of auto-scaling techniques are then presented, with examples of different elasticity services provided by different IaaS providers to support auto-scaling mechanisms. Following this, we present the general structure of elasticity rules and two common types of elasticity mechanisms, namely resource-level and application-level elasticity. The chapter is then concluded with a summary of the key points. The content of this chapter is mainly based on the JISA2011, SCC2012 and WISE2012 papers. The work in these papers is primarily my own work. I received feedback and suggestions on the JISA2011 paper from the second and third co-authors which helped to make the paper precise and improved the clarity of the content.

• In chapter 3, we discuss the research work that has been done in the area of the thesis topics. In particular, we describe the state-of-the-art and identify its contributions and limitations in the context of our research aims. We also identify research gaps and how our research contributions could fill these gaps and complement existing research. The primary topics that are covered in the literature analysis are (i) modeling of IaaS elasticity for cloud-based applications, (ii) performance evaluation of IaaS elasticity and cloud-based applications and (iii) SLA-based elasticity approaches for cloud-based applications.


The content of this chapter is partially based on the work from a number of papers including JISA2011, WISE2012, and EDOC2013. The related work in these papers is primarily my own work. I received general feedback and suggestions from the co-authors on it.

• Chapter 4 presents our analytical models that capture the CPU-based elasticity mechanism. Particularly, the modeling of the parameters and thresholds that comprise CPU-based elasticity is detailed in this chapter. Furthermore, we also present the key models for approximating application and resource metrics that are crucial for evaluating IaaS elasticity performance. Based on these models, we describe the scale-out, scale-in and server usage cost algorithms that simulate how CPU-based elasticity works. The validation of our IaaS models and algorithms is also presented here using Matlab simulation and empirical experimentation with the TPC-W online bookstore deployed on the Amazon IaaS cloud. Following the model validation, we discuss a use case that shows how our CPU-based elasticity models and algorithms can be used to help cloud consumers choose appropriate elasticity rules for their cloud-based applications. This chapter is concluded with a discussion of the key results of the models and algorithms and an analysis of the validation. The content of chapter 4 is mainly based on the EDOC2013 paper and the UNSW-CSE2013 technical report.

• Our empirical evaluation of CPU-based and SLA-based elasticity mechanisms is presented in chapter 5. In this chapter, we first motivate the need for an SLA-based elasticity approach to support cloud consumers in evaluating elasticity mechanisms with both resource-level and application-level metrics. We then present the architectural components of our SLA-based elasticity for 3-tier cloud-based applications running on an IaaS cloud. Based on this architecture, we describe our monitoring and auto-scaling algorithms to monitor and scale the application tier based on the response time SLA metric. The experimental evaluation of our SLA-based elasticity approach is presented next. Here, we present our evaluation methodology and experimental results for evaluating the performance of SLA-based and CPU-based elasticity mechanisms under different scenarios. The first scenario is concerned with evaluating elasticity performance with different CPU utilization and application SLA thresholds. In the second scenario, we evaluate the performance of one set of SLA-based and CPU-based elasticity rules on two cloud servers with different capacity profiles. Using these elasticity rule sets, we also evaluate the impact of cloud performance variability on the performance of SLA-based and CPU-based elasticity rules. The chapter concludes with an analysis and discussion of the results and key findings. The content of this chapter is mainly based on the WISE2012 paper. I have carried out the whole work in this paper. The second co-author provided feedback and suggestions on the experiment design and method carried out in this paper. Both the third and the fourth co-authors provided feedback and suggestions to improve the content of the paper.

• Chapter 6 discusses the key conclusions of the work presented in this thesis. We first summarize the key aspects of the research work we have carried out in the context of the thesis aims and contributions. We then discuss the limitations of our research and threats to validity. Finally, we conclude the chapter with potential future work that has arisen from the research work conducted in this thesis.

Chapter 2

BACKGROUND

In this chapter we present essential topics that form the context for the research work of this thesis. In particular, cloud computing and its key characteristics and cloud deployment, service and business models are presented in section 2.1, with real-world examples of cloud service bundles and charging models. Application Service Level Agreements (SLAs) and their significance for cloud-based applications are then explained in section 2.2, with examples of QoS properties which are crucial for Internet business applications. In this section, we also present two important types of SLAs in cloud environments, one at the cloud resource level and the other at the application level. In section 2.3, we explain the concept of IaaS elasticity and its significance for multi-tier business applications. Following this, we describe types of auto-scaling techniques and examples of different elasticity services provided by different IaaS providers to support auto-scaling mechanisms in section 2.3.1. We then present the general structure of the rule-based elasticity mechanism, which is commonly used by cloud consumers and providers, in section 2.3.3. The two common types of rule-based elasticity mechanisms, resource-level and application-level elasticity, are then presented with examples. The chapter is then concluded with a summary of the key points.


2.1 Cloud Computing

Cloud computing has evolved as a result of a number of computing technologies in research and practice such as grid computing, utility computing and virtualization. It has consequently gained compelling characteristics and benefits that attracted the attention of a wide range of communities including research, IT and business organizations and societies. As a result, a number of useful papers such as [3; 54; 55] have introduced comprehensive definitions of cloud computing and discussed various related aspects of cloud computing from different perspectives. The following is the US National Institute of Standards and Technology (NIST) [6] definition which has been widely accepted and adopted by various research and industry communities:

“Cloud computing is a model for enabling convenient, on-demand network ac- cess to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”

The above definition highlights a number of interesting points. First, cloud computing is not a new technology, but rather a new computing model of accessing and using computing resources at massive scale. Second, the cloud computing model has been made possible by advances in various technologies such as virtualization, dynamic resource allocation and configuration, fast and reliable network access, e.g., LAN and WAN, and computing processing power, e.g., CPU, RAM and storage. Third, computing resources are provisioned and consumed as services through the internet, following web services principles. This implies that computing resource services can be programmatically accessed and utilized on-demand through internet-based interfaces. The service model of cloud computing also implies that there are two main players, cloud service providers and cloud service consumers. Fourth, different types of computing resources, separately or bundled together, are provisioned and consumed, including servers, storage, networking, applications and services. Fifth, the management of computing resources and services has become easier for cloud consumers as it has been shifted to cloud service providers. Various software and tools, such as monitoring and control tools, are provided by cloud service providers and other software providers to enable efficient and easy-to-use creation, execution and management of computing resources.

Characteristics of Cloud Computing

Many studies including [3; 54; 55] have also investigated the principles and characteristics that distinguish cloud computing from other traditional computing paradigms. The NIST definition [6] describes the following key characteristics of the cloud computing model:

• On-demand self-service: computing resources and services, e.g., servers, storage and monitoring applications, can be automatically consumed as needed with minimal human interaction with cloud service providers.

• Broad network access: computing resources and services are accessed through the network, e.g., WLAN and LAN, using standard communication mechanisms that promote use by heterogeneous light or heavy client platforms such as mobile phones, laptops, and PDAs.

• Resource pooling: computing resources and services of a cloud service provider are pooled to serve multiple cloud consumers using a multi-tenant model and virtualization technology, in which different physical and virtualized computing resources and services are dynamically allocated and de-allocated based on cloud consumer demand. The cloud consumer has no control or knowledge over the exact location of the utilized computing resources, i.e., location-independent services, but may be able to specify location at a higher level of abstraction.

• Rapid elasticity: computing resources and services can be quickly and flexibly scaled based on consumers' changing needs, e.g., the amount and size of data supported by an application and the number of concurrent users. The cloud appears to be unlimited to the service consumer and therefore they can rent and release cloud resources as much or as little as they need. Resource scaling can be realized in different ways, i.e., vertical and horizontal scaling, as will be described in section 2.3.

• Measured services: because cloud computing resources are provisioned on-demand as self-service, their usage is automatically controlled and monitored using metering capabilities by cloud service providers. The metering service covers different types of cloud resources such as storage, compute processing, bandwidth and active user accounts. Metering services are essential for accurate resource usage billing, the pay-per-use model, capacity planning and resource optimization.

Figure 2.1: Cloud Service Models with Examples of Cloud Providers and Consumers

2.1.1 Cloud Service Models

Many attempts, such as [3; 6; 55], have been made to classify the service models of cloud computing. These models are based on the type of computing resources offered and consumed and are typically presented as a cloud hierarchy, stack or architecture. The NIST definition [6] provides a common service model that addresses the key abstraction levels which are commonly accepted in research and industry. Figure 2.1 shows the main cloud service models according to the NIST definition with real-world examples of cloud providers, services and consumers.

• Infrastructure as a Service (IaaS): in this model, computing resources such as servers, storage and network bandwidth are offered as on-demand metered services by cloud service providers. Examples of major IaaS providers include Amazon Web Services (AWS) [11], GoGrid [13] and Rackspace [12]. The IaaS layer enables access to a pool of computing resources and storage by dynamically partitioning the physical infrastructure resources using virtualization technologies such as Xen [56] and VMware [57]. As shown in Figure 2.1, the IaaS layer forms a foundation for other cloud services as it provides the essential infrastructure resources that are needed to develop and run software applications and services. At this layer, dynamic scaling of computing resources and services takes place. Cloud consumers can utilize IaaS to deploy, run and control their software applications in an elastic way. They can either use computing resources that are pre-configured with systems and software applications, such as Linux or Windows and an IBM DB2 database, or configure and customize computing resources with certain system and application software. Cloud consumers, however, do not have control over the underlying physical infrastructure itself.

• Platform as a Service (PaaS): this layer builds on top of the IaaS layer and provides application development frameworks such as Eclipse, J2EE or .NET environments, operating system environments and other software development tools. PaaS provides cloud consumers with a complete platform to develop cloud-based applications from scratch. The users or consumers of the PaaS layer, often software/system developers, have control over the deployed applications and the configuration of their hosting environment. They can develop software applications or services, often from scratch, to be run on the cloud provider's infrastructure. However, they do not have control over the infrastructure itself, including network, servers, operating systems or storage. Having said this, the extent of control cloud platform users or consumers have over managing the platform varies from one provider to another depending on their platform capabilities and the technologies they use. The use of the underlying computing resources, abstracted below the PaaS layer, is typically charged similarly to IaaS offerings. Examples of major PaaS platforms include Google's App Engine [58], Microsoft Azure [59], and Platform [60].

• Software as a Service (SaaS): this model refers to the provisioning of software applications, fully or in part, as services over the internet. The offered applications are completely configured on a certain computing environment and ready to be run and used by cloud consumers. Examples of primary SaaS providers include Salesforce Sales Cloud [61], SAP Business ByDesign [62], and Microsoft Office 365 [63]. In the SaaS model, applications are accessible from different platforms such as laptops, mobile phones and PDAs. SaaS consumers have no control over the underlying infrastructure, including storage, servers, network, operating systems, or even the offered software application's capabilities, except for some high-level user-specific application configuration and settings. The level of management and control the SaaS user could have depends on the SaaS provider's capabilities and offering.

2.1.2 Cloud Deployment Models

In different studies [3; 6; 55], clouds are distinguished based on who owns and uses the computing resources and where they are distributed. The NIST definition [6] of cloud computing describes the following common cloud deployment types.

• Public Clouds: in this type, the infrastructure resources are made available to the general public, individual users, individual organizations, or a large industry group. Public clouds are owned by organizations who sell cloud services such as Rackspace, GoGrid, Amazon and Google. In this model, cloud consumers neither need to make upfront investments in infrastructure resources nor bear ongoing management and maintenance costs, as these are shifted to the cloud providers. In addition, computing resource configurations are provided and billed at different levels of granularity, i.e., based on CPU and RAM speed, storage amount, etc. This provides cloud consumers with a wide variety of cloud services that meet the needs of organizations of different sizes and different application types. In public clouds, the quality of delivered services, e.g., availability, performance and security, is usually specified and agreed in a contract between the cloud service provider and consumer using a Service Level Agreement (SLA). In public clouds, consumers have limited control over the physical infrastructure resources, which could influence their performance.

• Private Clouds: in this model, the cloud infrastructure is designed for the exclusive use of only one organization. It may be designed and managed by the organization who owns it or by external IaaS providers. Private clouds may also be hosted either internally within the organization's walls or externally off-premise. One obvious benefit of private clouds is the highest degree of control in terms of resource configuration and allocation, data storage and backup, and service levels such as availability, performance and security. One drawback of private clouds is the large upfront capital investment in infrastructure and the related risks and overhead of managing it, similar to traditional on-premise server farms.

• Community Clouds: in this type of cloud, several organizations that belong to a specific community and have shared concerns, e.g., mission, security requirements, policy and compliance considerations, share the cloud infrastructure. The cloud infrastructure may be managed by the organizations or a third-party organization, and it may exist on-premise or off-premise. Similar to private clouds, the community who share and own the cloud have maximum control over several aspects of the cloud infrastructure.

• Hybrid Clouds: here, the cloud infrastructure is a combination of two or more different cloud deployment models. Typically, the part of the infrastructure where the information or applications to be processed and stored are not mission-critical is outsourced to a public or community cloud. Mission-critical business information or applications, on the other hand, are hosted on-premise under full control (either on a private cloud or a traditional hosting environment). Hybrid clouds enable higher control and security over various aspects of the cloud infrastructure and the applications running on them. They also benefit from the dynamic elasticity and reduced costs of the public cloud model.


2.1.3 Cloud Business Models

Pricing model | Commitment | Example of Cloud Offerings
Per-use | Nil | On-demand servers ($ per hour use). Examples: Amazon on-demand and spot instances, servers, Terremark vCloud (per-hour).
Subscription | Short-term (less than 6 months) and long-term (1-3 years) | Dedicated servers (upfront $ per time period). Examples: GoGrid dedicated servers (monthly), Joyent SmartMachines (monthly), Rackspace servers (monthly), Amazon reserved instances (1 or 3 years).
Prepaid per-use | Nil | On-demand servers ($ per hour use deducted from prepaid credit). Examples: ElasticHosts hourly-burst cloud servers, GoGrid cloud servers (hourly), Joyent SmartMachines (daily).
Subscription + per-use | Short-term (less than 6 months) and long-term (1-3 years) | Dedicated servers (upfront $ per month/year) + on-demand instances ($ per hour use). Examples: ElasticHosts monthly cloud servers + hourly usage, Joyent monthly SmartMachines + daily usage.

Table 2.1: Cloud Computing Pricing Models and IaaS Cloud Offering Examples

The business models of cloud computing describe how cloud services are charged and the level of commitment that cloud consumers need to make. Table 2.1 summarizes the main pricing models according to which cloud services are charged, with some real-world examples of cloud offerings [8] (the offering types are current as of the published research in 2011). These models are summarized as follows.


• Per-use model (pay-as-you-go): this pricing model is the most commonly used by IaaS cloud providers for charging various cloud offerings including cloud servers. In this model computing resources are bundled and billed per unit of usage time (e.g., per-minute, per-hour). Here, cloud usage prices vary based on the capacity of the bundled computing resources such as CPU, RAM, disk storage and/or bandwidth. Examples of this pricing type include $ per GB of input/output data transfer over a network, $ per GB of data storage per period of time and $ per IP address per unit of time. The per-use model is simple and does not require any upfront payment and/or long-term commitments. Furthermore, computing resources can be allocated and released on-demand and charged according to their usage time.

• Subscription model: in this model, cloud consumers need to subscribe in advance for using computing resources for a short or long period through an agreement with a cloud provider. Here, cloud computing resources are reserved or dedicated for a certain cloud consumer. Often, the rates of most cloud computing resources under this model are cheaper than the equivalent amount of on-demand hourly usage. However, unlike the pay-as-you-go model, the subscription model requires upfront payment and a long-term or short-term commitment, ranging from months to years.

• Prepaid per-use model: this model is a variation of the per-use pricing model in which cloud computing resources are billed on on-demand usage, e.g., at an hourly rate, but deducted from prepaid credit set up by the cloud consumer upfront. Unlike the subscription model, neither upfront payments are required nor are cloud computing resources reserved in advance.

• Subscription+per-use model: this model is a hybrid which combines both the per-use and subscription models. Here, dedicated computing resources must be rented in advance for a period of time with upfront payments and commitments. In addition, cloud consumers can rent additional computing resources on-demand, which will be charged based on usage per unit of time.

Table 2.1 summarizes the four pricing models along with key commitments and examples of relevant cloud offerings of different IaaS providers. Often, server cost with the subscription model is cheaper than the equivalent amount of on-demand hourly usage. The latter, however, is more convenient as it allows flexible allocation and release of computing resources of any capacity at any time. Joyent's daily-usage SmartMachines are an intermediate solution; they are also cheaper than on-demand hourly-usage servers. Physical machines are often more reliable as they do not rely on dynamic resource scheduling and sharing like on-demand ones. In the prepaid per-use model, the prepaid credit must not go below a certain limit, and some IaaS providers such as ElasticHosts may not refund unused credit, although they still charge their consumers based on the per-use model.

The subscription+per-use model combines the advantage of discounted dedicated servers, which fit continuous and stable fixed workloads, with the availability of on-demand instances for variable application workloads. Some pricing parameters are also used to differentiate the offerings. Most cloud server offerings of all pricing models differentiate between Windows and Linux/Unix servers; Windows servers are often more expensive as Linux/Unix systems often have open source licenses which do not incur any upfront costs to purchase and install. Most IaaS providers allow cloud consumers to customize their cloud servers with different software applications, and server instance prices are adjusted accordingly. Amazon offers most of its cloud servers in three main regional areas and pricing varies accordingly.
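To make the per-use versus subscription trade-off concrete, the following sketch compares the two models under purely hypothetical rates (no actual provider prices are implied):

    import math

    def on_demand_cost(hours_used, hourly_rate=0.10):
        """Per-use model: partial hours are commonly rounded up to full hours."""
        return math.ceil(hours_used) * hourly_rate

    def break_even_hours(monthly_subscription=50.0, hourly_rate=0.10):
        """Monthly usage beyond which the subscription becomes cheaper."""
        return monthly_subscription / hourly_rate

    # With these assumed rates: 720 h of continuous on-demand use costs $72.00,
    # so a $50.00 monthly subscription (break-even at 500 h) would be cheaper;
    # lighter, bursty usage would favor the per-use model instead.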

2.2 SLAs in Cloud Computing Environments

One of the most important aspects of Internet-based business applications is ensuring appropriate levels of Quality of Service (QoS). This includes achieving high performance and availability of business services and operations for an organization's clients and/or end users. Over the last decade, Service Level Agreements (SLAs) have received increasingly wide interest, particularly for e-business applications [20; 23; 26; 27; 64]. As business services, processes and operations are made available to clients and end users through the Internet, business organizations have an increasingly important need to ensure that desired service levels are maintained.


An SLA is an agreement between a service provider and a service consumer that specifies the QoS attributes and Service Level Objectives (SLOs) that must be guaranteed by the service provider. As an example, an organization (service provider) may specify to their clients (service consumers) that the response time of a certain service offered via the Web will not exceed the agreed SLO, e.g., 470 milliseconds. Breaching this SLO will lead to violating the agreed SLA, and this often incurs financial and non-financial losses to the organization.

Achieving desired SLAs for e-business applications is highly dependent on the underlying computing infrastructure on which the applications are deployed. Before the emergence of the cloud computing model, business organizations used to run their business applications on either their own on-premise infrastructure or an external hosting infrastructure. Therefore, it was the responsibility of the business organization or hosting provider to guarantee and manage appropriate application SLAs. This, however, is a very expensive approach as maximum computing resources and capacity should be provided upfront and continuously monitored by the organization or the hosting provider. The introduction of the cloud computing model, particularly the public IaaS cloud, has brought better approaches to manage computing infrastructure in a more cost-effective way. As previously explained, deploying and running business applications on an IaaS cloud can help business organizations leverage dynamic computing resources on-demand and on a pay-as-you-go model. The public IaaS cloud is useful for achieving application SLAs, but it introduces new challenges that face business organizations. Deploying and running business applications on an IaaS cloud demands distinguishing between two primary types of SLAs in cloud environments, as described in [8]:

• Cloud Infrastructure SLA (CI-SLA): this type of SLA is concerned with the QoS of an IaaS cloud on which applications are deployed and running. Particularly, IaaS cloud providers offer their cloud consumers guarantees for their cloud services. Such SLA guarantees encompass QoS levels and capabilities of offered IaaS resources and services such as server performance, network speed, resource availability and storage capacity. The agreement here is between the IaaS provider and the business organization who would like to use IaaS services to deploy and run their applications on the IaaS cloud. Therefore, IaaS providers are obliged to ensure appropriate performance levels of their IaaS services are satisfied. However, IaaS providers do not guarantee or maintain application-level QoS metrics, which are crucial for cloud consumers.

• Cloud-based Application SLA (CA-SLA): in this type of SLA, the QoS levels of applications deployed and running on an IaaS cloud are the main concern. This type of agreement is between a business organization, who hosts their business applications or part of them on an IaaS cloud, and their clients, who use the application services and functions. Here, business organizations specify guarantees that describe the quality levels of their offered functions and services for their application clients or end users. Examples of common QoS properties that are specified in such SLAs include the application's response time, percentage of request errors and service availability. For example, an organization who deploys their Customer Relationship Management (CRM) application on a public IaaS cloud would be interested in monitoring and ensuring that average waiting and average service times meet the desired SLOs agreed with their clients. Therefore, CA-SLAs reflect the perspective and responsibility of cloud consumers in ensuring the application's QoS.

Figure 2.2: Types of SLAs in IaaS Cloud Environment and their Dependency

IaaS Provider | Cloud SLA
Windows Azure IaaS | 99.9% of the time the Content Delivery Network (CDN) will respond to client requests and deliver the requested content without error [65]
Amazon Web Services (AWS) | 99.95% uptime of Elastic Block Store (EBS) and Cloud Servers and 99% uptime of Simple Storage Service (S3) during the monthly billing cycle [66; 67]
Rackspace | 99.9% availability of “Cloud Files” and “Instances” services in a given billing cycle [68]
Terremark | 100% availability of the “vCloud Express” service in a monthly billing cycle [69]

Table 2.2: Examples of CI-SLAs from Different IaaS Providers

The definitions of both types of SLAs bring our attention to an important relationship between CI-SLA and CA-SLA. This relationship has been illustrated by [8] as shown in Figure 2.2. Deploying and running applications on an IaaS cloud makes them highly dependent on the provisioned cloud infrastructure resources. In particular, the performance and service levels of the offered cloud infrastructure resources are highly likely to influence the application's QoS levels. For example, a performance degradation in offered servers or network bandwidth will translate into high application response times and probably dropped or erroneous requests. Therefore, achieving an appropriate CA-SLA in a cloud environment requires careful consideration of cloud services and the QoS guarantees of the CI-SLAs provided by IaaS cloud providers.

Currently, most IaaS cloud providers provide limited CI-SLAs for their computing resources and services. Table 2.2 shows examples of real CI-SLAs of IaaS cloud services offered by different providers. From this table, we can see that most IaaS providers offer QoS guarantees only for their cloud resource services, such as the availability of servers, network and storage services. Furthermore, CI-SLAs cover only certain aspects of the service and they are not detailed enough to cover further aspects such as the performance and speed of servers and network [7; 32]. Providing appropriate CI-SLAs is crucial for cloud consumers as their application's QoS levels are significantly dependent on CI-SLAs (as illustrated in Figure 2.2). In particular, the costs of application down-time or performance degradation are often very high [29; 30]. Several cost factors have been reported in Gartner research [70], including revenue, financial performance, employee productivity and damaged reputation. Most cloud providers offer a "credit percentage" of the offered service in case agreed service levels are not respected. Durkee [32] argues that whatever compensation a cloud provider may offer for cloud service degradation or failure, it will often not compensate for the cost of lost revenue, breach of customer SLAs or loss of market share and credibility.

For cloud consumers, CA-SLAs are also crucial for their business applications and services. When hosting and running their applications on an IaaS cloud, cloud consumers, or business organizations, need to ensure adequate CA-SLAs for their application clients. Examples of such CA-SLAs include application response time and percentage of successfully served requests. In such cases, CI-SLAs are not adequate for determining how well the application's QoS will perform, and therefore for ensuring CA-SLA levels. Moreover, it would be very challenging and complicated for IaaS providers to guarantee CA-SLAs for their cloud consumers [8]. This is primarily because of the heterogeneity of cloud consumers' application workloads and characteristics, and hence their CA-SLA requirements. Therefore, maintaining and managing the CA-SLA becomes one of the crucial responsibilities of cloud consumers.

One of the key QoS properties that could influence satisfying CA-SLAs and CI-SLAs is cloud resource performance [8; 36]. The multi-tenancy nature of an IaaS cloud, in which computing resources are virtualized and shared between different cloud consumers, considerably increases the complexity of predicting IaaS system performance. The same holds for predicting the performance of applications running on such IaaS clouds. The variability of applications' workloads and usage patterns of IaaS computing resources over time also contributes to the performance variability of the IaaS cloud [36]. Performance variability of IaaS cloud resources can lead to application performance degradation such as an increase in application response time and the number of dropped requests. As a result, this will highly influence CA-SLAs and cause financial and non-financial losses for cloud consumers. Business organizations who rely on an IaaS cloud to run their applications need to understand how much performance variability and cost their business can tolerate in order to meet their CA-SLAs. IaaS cloud providers employ automated elasticity mechanisms that aim to help cloud consumers control the changing performance of their applications. Therefore, it would be beneficial to investigate how such mechanisms work and to what extent they can help consumers meet their CA-SLAs. The following section presents these automated elasticity mechanisms to form a foundation for exploring their benefits in meeting application SLAs and performance and cost metrics.

2.3 IaaS Cloud Elasticity

Elasticity, or elastic computing, is one of the most crucial characteristics of cloud computing. It refers to the ability of a system to dynamically adapt its underlying computing infrastructure resources to variable workload changes over time [8]. This involves scaling, or resizing, computing resources by different means including automatic means, which is also known as auto-scaling. Elasticity applies to all cloud computing service models, but the focus of adaptation varies accordingly. In the SaaS model, the focus of scaling is at the application level, e.g., the number of supported users who use the provisioned software or service. In PaaS, certain blocks or packages of the application need to be defined as scalable objects along with the application performance metrics that must be met. Therefore, elasticity is realized at the code or development framework level. IaaS elasticity is realized at the level of the physical or virtual hardware, for example, increasing or decreasing the capacity of processing power, memory or storage.


In the SaaS model, the whole stack, from hardware to application or services, is managed and handled by the cloud provider. A PaaS provides cloud consumers with a platform and development environment on which they can develop and deploy their applications, often from scratch, and the PaaS will take care of managing the underlying hardware. Therefore, in both models, SaaS and PaaS, cloud consumers do not take control of adapting and managing the underlying hardware infrastructure, including the provisioning of computing resources, auto-scaling and load balancing. In the IaaS model, computing hardware resources are offered as services for cloud consumers to run and manage their applications on. In contrast to SaaS and PaaS, therefore, cloud consumers need to control and manage the provisioned IaaS cloud services through automated and self-service interfaces.

Unlike PaaS, deploying and running business applications on a public IaaS cloud does not require major architecture redesign and coding, therefore allowing business organizations to focus on their core business competitive advantages. Such applications can derive immense benefits from (a) massive computing resources at hourly usage costs with no upfront payments or long-term commitments [7] and (b) on-demand dynamic computing resource elasticity [6]. In this context, we consider IaaS elasticity for internet-based transactional business applications such as the Amazon Web store (http://www.amazon.com/), Ticketek (http://www.premier.ticketek.com.au/) and Customer Relationship Management (CRM), but not analytical ones. This class of applications often has fluctuating workload patterns and volumes because it serves a wide range of users who have different needs. Failure to accommodate such variable transaction volumes would directly affect critical business metrics such as profit, customer satisfaction and company reputation. Such factors emphasize the significance of the elasticity and cost-effectiveness benefits of IaaS clouds. The discussion of the elasticity concept, therefore, will be focused on the IaaS cloud model.


Figure 2.3: Elasticity and Cost-effectiveness of IaaS cloud Illustrated (adapted from [1])

Figure 2.3 illustrates the elasticity and cost-effectiveness of deploying and running business applications on an IaaS cloud. Traditionally, business organizations used to plan their computing infrastructure based on the maximum expected computing resource capacity, i.e., a fixed computing capacity as shown in Figure 2.3(a). Given today's dynamic business workload changes and growth, such capacity planning needs to be more flexible and cost-effective for two main reasons. First, traditional infrastructure capacity planning involves a very large upfront capital investment which could reduce an organization's cash flow considerably, and it has a very long payback period. Second, such large computing capacity cannot be efficiently utilized all the time except when the maximum load is reached. Therefore, there are time periods where computing resources are under-utilized (see Figure 2.3(a)). In addition to resource under-utilization, there are additional on-going costs for keeping the computing infrastructure operational and healthy (e.g., renting physical space, electricity power, management services, maintenance, etc.). Figure 2.3(a) also shows some time periods where the application's required computing capacity exceeds its planned capacity because of unexpected workload spikes or business growth. Inability to meet such dynamic computing capacity often leads to customer frustration and a negative impact on the organization's reputation, and as a result potential loss of profit and customers [28].

The IaaS model of cloud computing enables cloud consumers to achieve efficient and flexible utilization of computing infrastructure resources, as illustrated in Figure 2.3(b). In this figure, the blue line represents the actual infrastructure capacity needed to serve application workload changes. The green line refers to the capacity of allocated computing resources over time using the IaaS model. From this figure, we can notice that the elastic provisioning of computing resource capacity in the IaaS model allows business organizations to handle over-utilization and under-utilization problems in an efficient manner. Here, computing resources are launched and released on-demand to meet workload changes and to ensure enough resources are provisioned. Therefore, there is no need for large upfront capital investment in computing infrastructure resources and the on-going management and maintenance costs. As a result, the under-utilization and over-utilization scenarios of traditional capacity planning can be minimized.

2.3.1 Architecture and Mechanisms of IaaS Elasticity

Most IaaS providers combine computing resources such as processing units, memory, disk storage and network bandwidth into service bundles. Such bundles are often offered as instances or classes of service by varying one or more of these resources and/or their capacity. For example, Terremark [71] cloud servers are packaged in terms of CPU and RAM resources only. Joyent SmartMachines and Rackspace Cloud Servers instances are bundles of CPU, RAM, disk and bandwidth. Unlike most IaaS providers, who offer a set number of server instance types, ElasticHosts [15] allows its cloud consumers to customize their cloud servers by varying CPU, RAM, disk and data transfer/bandwidth capacity at very fine-grained levels. Based on this service structure, IaaS providers offer different levels of granularity at which they allow cloud consumers to scale cloud resources. This includes the following common scaling points related to different computing resources [72]:

• Adding/removing virtual or physical servers of the same or bigger capacity.

• Increasing/decreasing computing resources such as CPU, memory and storage capacity by adding/removing hardware components to/from existing machines.

• Increasing/decreasing network speed and number of IP addresses.


• Increasing/decreasing the amount of data transfer and the number of data operations/requests of cloud resources.

Depending on their service bundles, IaaS providers vary in the way they enable elasticity and allow resources to scale. Some providers, such as AWS [11] and Rackspace [12], allow their consumers to add and remove resources in the form of pre-configured service instances with a certain capacity. Some others, such as GoGrid [13], allow consumers to vary the capacity of individual resources such as memory. It is worth noting, therefore, that the service offering structure and the level of granularity of computing resources influence how elasticity can be realized from one IaaS provider to another. Our discussion in this thesis focuses on scaling the capacity of cloud servers which are pre-configured and packaged by an IaaS provider. In practice, scaling cloud servers is the most common approach adopted by IaaS providers and consumers.

There are two common scaling architectural types that are adopted by cloud providers and consumers [8; 37; 39] as shown in Figure 2.4:

• Horizontal scaling is concerned with adding (scale-out) and removing (scale-in) computing resources (e.g., servers) to/from existing ones in order to meet an application's workload changes (Figure 2.4(a)). Here, a load balancer is often used to distribute or direct the incoming workload between all servers depending on pre-configured metrics or policies. Horizontal scaling increases capacity gradually, in small increments.

• Vertical scaling is concerned with replacing an existing computing resource, e.g., a cloud server, with another one of either larger capacity (scale-up) or smaller capacity (scale-down) to serve changing application workload. Here, increasing workload is served by a more powerful server with higher capacity, with no need to distribute the workload across a number of smaller servers.

Scaling out/in is suitable for transactional web application architectures, e.g., web servers and application servers [72]. In this class of applications, the focus is on request/response processing rather than conversational processing, which needs to keep state about client requests and related information.

Figure 2.4: Types of IaaS Cloud Scaling

One advantage of horizontal scaling is that it usually does not require much configuration work or architectural change, as the application workload is distributed between servers using a load balancer. On the other hand, in contrast to scaling up/down, management and monitoring activities become increasingly complex as the number of servers grows. Vertical scaling is suitable for database application servers, where powerful machines with high throughput and processing power become crucial. One of its biggest disadvantages is that such powerful servers become a single point of failure and could lead to periods of downtime. This is not the case with horizontal scaling: if one server fails, the incoming workload can be redirected to the other running servers until that server becomes healthy again or a new server is added. We adopt horizontal scaling in this thesis as it is widely used by cloud consumers and fits well with the design-for-failure best practice for architecting multi-tier internet business applications [40].

Both techniques, horizontal and vertical scaling, can be realized through static and dynamic means [72]. In static scaling, an IaaS cloud expert scales cloud servers, vertically or horizontally, manually using the cloud management tools and/or technical interfaces provided by a cloud provider. Static scaling allows better human control over the computing infrastructure. However, it can be very expensive as it requires human labour, and it increases the risk of human error. Some organizations use this approach because of rigid business processes that require approvals, or because of legal constraints. Dynamic scaling, also known as auto-scaling, is achieved through programmatic interfaces that can be configured once to automatically scale an IaaS cloud horizontally or vertically. Dynamic scaling automates the control process over the computing infrastructure, and hence enables rapid scaling decisions with minimal human intervention and cost. Therefore, dynamic scaling fits well with the definition and characteristics of cloud computing, i.e., rapid elasticity through automated self-service interfaces over the internet. In practice, although most IaaS cloud providers support both static and dynamic scaling, dynamic scaling is the one widely used by cloud consumers as it realizes the real benefits of elasticity [8].

Most IaaS cloud providers do not inherently enable elasticity of computing resources in their offerings. IaaS providers often have a large number of cloud consumers with different types of applications and requirements. Such applications often have very dynamic workload changes over time, as well as differing performance and cost objectives. Therefore, it would be highly challenging and risky for an IaaS provider to guarantee, as part of its offerings, scaling the computing infrastructure according to each customer's workloads and performance-cost metrics. Instead, most IaaS providers offer web-based management tools and technical Application Programming Interfaces (APIs) to support cloud consumers in realizing elasticity on their IaaS cloud. Examples of such services include:

• AWS Elastic Load Balancing [73]: a service that distributes an application's requests across server instances that are set to be auto-scaled by predefined auto-scaling policies.

• Autoscaling Application Block for Windows Azure (WASABi) [74]: a feature that allows cloud consumers to define how a computing platform should respond to application workload changes. Management APIs are also provided with WASABi to request additional resources when they are needed.

• GoGrid's Infrastructure [75] and RAM Scaling [76]: GoGrid provides a management console and automated APIs to scale cloud servers out/in and up/down. It also provides a RAM scaling method that allows cloud consumers to increase and decrease the RAM capacity of their cloud servers.

• Rackspace Cloud Monitoring [77] and AWS CloudWatch [78]: tools to monitor, control and visualize different IaaS cloud resource metrics in support of the automated scaling techniques these providers offer. Both providers offer manual configuration and management through a management console, as well as APIs for automating the configuration and monitoring of their services.

• AWS Auto Scaling [37]: a set of services, such as auto-scaling groups, alarms and load balancing, for configuring and managing scaling policies on the AWS IaaS cloud. These services are provided through a management console so that cloud consumers can configure them according to their application's needs. APIs are also provided for all these services so that scaling operations can be automated (see the sketch after this list).

• RightScale Autoscaling [44]: tools that allow cloud consumers to automate the configuration, monitoring and scaling of their computing infrastructure across multiple IaaS platforms.
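As an illustration of how such APIs are used, the following sketch wires up a simple CPU-based scale-out rule using the AWS APIs via the boto3 Python library. The auto-scaling group name, region and threshold values are hypothetical placeholders, and the group is assumed to already exist; this is a sketch of the general pattern, not a complete deployment script:

```python
# Sketch: configuring a CPU-based scale-out rule on AWS with boto3.
# Group name, region and thresholds are hypothetical; the auto-scaling
# group itself is assumed to exist already.
import boto3

autoscaling = boto3.client('autoscaling', region_name='us-east-1')
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Action part of the rule: add one server, then honour a cool-down period.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName='app-tier-group',    # hypothetical group name
    PolicyName='scale-out-on-cpu',
    AdjustmentType='ChangeInCapacity',
    ScalingAdjustment=1,                      # scale out by one instance
    Cooldown=300,                             # cool-down time in seconds
)

# Condition part of the rule: average CPU > 90% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName='app-tier-high-cpu',
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'app-tier-group'}],
    Statistic='Average',
    Period=300,                               # monitoring interval in seconds
    EvaluationPeriods=2,                      # intervals the breach must last
    Threshold=90.0,                           # CPU utilization threshold (%)
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=[policy['PolicyARN']],       # fire the scaling policy
)
```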

It is not only IaaS providers that offer tools for automating and managing auto-scaling of computing resources, but also third parties. RightScale [79] is an example of such third-party cloud providers; it offers IaaS cloud management services across multiple cloud platforms. It provides tools for automated deployment, monitoring and auto-scaling for multiple IaaS providers, including the AWS, GoGrid and Rackspace clouds. The main advantage of such third-party services is a management platform that consolidates the configuration, automation and management of auto-scaling policies across multiple IaaS clouds in a transparent way.

As described above, there are different methods, tools and services through which IaaS providers enable cloud consumers to realize auto-scaling, or elasticity. Most of these techniques, however, are based on policies, more specifically rules [45]. Such rules enable cloud consumers to define, configure and control auto-scaling operations for their applications running on an IaaS cloud.

IaaS Auto Scaling Service | Auto Scaling Rule Example
Windows Azure WASABi | If the average CPU utilization over the last 10 minutes for worker role A (across all instances) is greater than 90%, then scale out application server instances.
Amazon Web Services (AWS) | If the application latency seen by the load balancer is greater than 800 milliseconds for 9 minutes continuously, scale out the application tier.
Rackspace Auto Scale | If average memory utilization across all web servers is greater than 85%, then scale out the web servers.

Table 2.3: Practical Examples of IaaS Auto Scaling Rules

Table 2.3 shows practical examples of such rules from different IaaS providers. As these examples show, scaling decisions are primarily based on thresholds and parameters of monitored metrics. Examples of these thresholds include CPU utilization, memory utilization and application latency thresholds. Such thresholds are crucial as they determine when to scale computing resources so that application workload changes are accommodated appropriately. Scaling operations directly influence performance and cost metrics. Scaling out increases resource capacity, and hence usage charges, to accommodate growing workload while maintaining appropriate performance levels or even improving them. In response to shrinking workload, scaling in releases unneeded computing resources, and therefore reduces usage costs, while maintaining appropriate performance levels.

2.3.2 Types of IaaS Elasticity Techniques

There are various methods and techniques employed to derive scaling decisions that automatically change computing capacity in IaaS cloud environments [8; 80]. These techniques primarily focus on deciding when and how to automatically scale an IaaS cloud. They can be classified into two main categories: reactive and proactive elasticity techniques.

Proactive elasticity: in this technique, scaling decisions are primarily based on predictive models that decide when and how to scale an IaaS cloud [8; 80]. Such models employ methods and algorithms that analyse historical information and/or future trends to derive scaling decisions before a change happens. Such information and trends could relate to anything in the cloud environment, including the cloud resources (e.g., servers, network, systems) and the application components (e.g., workload, application tiers). Significant work, such as [33; 81; 82; 83; 84; 85], focuses on proactive elasticity of IaaS cloud resources. One of the main goals of such approaches is to enable autonomic and self-adaptive scaling decisions that automatically change computing resource capacity. For example, in [33; 85] the authors used reinforcement learning, which includes mapping the usage and performance metrics of each server and application. The auto-scaling techniques proposed in [81; 82; 83] are based on analysing historical workload traces, resource capacity and load, and on forecasting future application workload. While proactive elasticity attempts to make autonomic predictive scaling decisions, it involves probabilistic assumptions that could lead to inaccurate decisions. In practice, the majority of IaaS cloud providers support rule-based elasticity based on actual events, as it provides more certain and accurate scaling decisions.
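To illustrate the flavour of proactive techniques, the sketch below forecasts the next interval's workload with a simple moving average and provisions capacity ahead of the predicted change. This toy predictor, with hypothetical figures, is only illustrative; it is not the model of any of the cited studies:

```python
# Illustrative proactive scaling decision: forecast the next interval's
# workload from recent history and provision ahead of the change. This is
# a toy moving-average predictor, not the model of any cited study.

def forecast_next(history, window=3):
    """Predict the next interval's request rate as a simple moving average."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def servers_needed(rate, per_server_capacity=100):
    """Smallest number of servers that can serve the predicted rate."""
    return max(1, -(-int(rate) // per_server_capacity))  # ceiling division

history = [220, 260, 310, 380, 450]      # hypothetical requests/sec samples
predicted = forecast_next(history)       # -> 380.0 req/s
print(f"predicted load: {predicted:.0f} req/s, "
      f"provision {servers_needed(predicted)} servers in advance")
```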

Reactive elasticity: in this technique, scaling decisions are mainly based on an event or a combination of events. Such events could relate to the cloud environment, e.g., changes in system or resource metrics, or to the application hosted in that environment. The most common technique that supports reactive elasticity is rule-based elasticity. These rules define a number of parameters and thresholds that determine when and how to scale computing resources. There is significant work in the literature, such as [38; 45; 86; 87; 88; 89], that focuses on rule-based auto-scaling techniques for adapting computing resources. Such work uses application and resource metrics, parameters and thresholds as the core components of auto-scaling decisions. In practice, rule-based elasticity is the most common auto-scaling technique currently supported by most IaaS cloud providers, including AWS [37], [42] and RightScale [44]. It provides tangible ways of measuring and deciding when and how to scale cloud resources. Therefore, the focus of this thesis is rule-based elasticity techniques. In the following, we explain the general structure of elasticity rules and the common types of thresholds used in such rules, namely resource-based and application SLA-based rules.

Figure 2.5: Common Structure of IaaS Elasticity Rules

2.3.3 Structure and Types of Rule-based Elasticity

The general form of elasticity rules is shown in Figure 2.5 [45; 90]. An elasticity rule consists of two main parts: a condition and an action. Upon evaluation of the condition, a decision is made whether or not to trigger an elasticity action (e.g., scale out/in). A condition is based on a metric M (e.g., CPU utilization or SLA satisfaction), which is evaluated against a specific threshold value (Mθ). The metric can be any variable that is measurable through monitoring scripts provided by either a cloud consumer or an IaaS provider.

Every time interval, the measured value of the metric is compared against the user-defined threshold on the metric (denoted here as Mθ). If this condition holds for a time window (Tw), the action is triggered; Tw must consist of consecutive time intervals of length T. The action specifies the change in capacity (P) to be administered to the resource identified in the rule. An elasticity action can be adding more servers (scale-out) or removing existing servers (scale-in). For each metric, a pair of auto-scaling rules must be defined: one to scale out and one to scale in. The elasticity action also specifies a cool-down time (Tc) for which the system has to wait, after the action is executed, before the elasticity condition is evaluated again.
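The following minimal sketch illustrates this evaluation cycle for a scale-out rule. The monitor() and add_server() functions are hypothetical stand-ins for a provider's monitoring and provisioning APIs, and all parameter values are example settings rather than recommended ones:

```python
import time

# Minimal sketch of the generic rule-evaluation cycle described above.
# monitor() and add_server() are hypothetical stand-ins for a provider's
# monitoring and provisioning APIs; parameter values are just examples.

T = 60            # length of one monitoring interval, in seconds
TW = 3            # time window: consecutive intervals the condition must hold
M_THETA = 90.0    # metric threshold, e.g., average CPU utilization in %
P = 1             # capacity change: servers to add on each scale-out
TC = 300          # cool-down time in seconds after an action fires

def evaluate_scale_out_rule(monitor, add_server):
    breached = 0                      # consecutive intervals above threshold
    while True:
        if monitor() > M_THETA:       # condition: metric exceeds threshold
            breached += 1
        else:
            breached = 0              # the window must be consecutive
        if breached >= TW:            # condition held for the whole window
            add_server(P)             # action: scale out by P servers
            breached = 0
            time.sleep(TC)            # wait out the cool-down period
        else:
            time.sleep(T)             # wait one monitoring interval
```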

Thresholds, and hence elasticity approaches, can be classified into two main categories: resource-based and application SLA-based thresholds. The former is concerned with computing resource metrics such as CPU, memory and disk utilization, and network throughput. Application SLA-based thresholds, and hence SLA-based elasticity, are derived from application SLA (CA-SLA) metrics such as response time and the percentage of successfully served requests. As shown in Table 2.3, such thresholds are defined in elasticity rules to derive auto-scaling decisions. Resource-based thresholds are widely supported by IaaS providers for enabling elasticity [8; 37; 75; 91]. This is mainly because IaaS providers offer homogeneous hardware resources which can be monitored through common APIs. For cloud consumers, however, it is also vital to monitor and scale their IaaS cloud based on application SLA metrics [8; 45] to reduce service level violations and the potential costs that could arise from them. The structure and elements of elasticity rules form a core basis for the performance modeling and evaluation of elasticity mechanisms.
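To illustrate the difference between the two threshold types, the sketch below computes an application SLA-based metric, the percentage of requests in a monitoring window whose response time meets a target, which can then be compared against a threshold exactly like a CPU-utilization value. The target and threshold figures are hypothetical examples:

```python
# Sketch: computing an application SLA-based metric over one monitoring
# window. The response-time target and the satisfaction threshold are
# hypothetical examples, not values prescribed in this thesis.

RT_TARGET = 0.5          # response-time SLO per request, in seconds
SLA_THRESHOLD = 95.0     # scale out if satisfaction drops below this (%)

def sla_satisfaction(response_times):
    """Percentage of requests in the window meeting the response-time SLO."""
    met = sum(1 for rt in response_times if rt <= RT_TARGET)
    return 100.0 * met / len(response_times)

window = [0.21, 0.34, 0.48, 0.62, 0.55, 0.30, 0.71, 0.25]   # sample window
satisfaction = sla_satisfaction(window)                     # -> 62.5%
if satisfaction < SLA_THRESHOLD:
    print(f"SLA satisfaction {satisfaction:.1f}% < {SLA_THRESHOLD}%: scale out")
```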

2.4 Summary

In this chapter, we have presented the key topics that form the contextual background for the research work carried out in this thesis. We have explained the concept of cloud computing in detail, along with its service, deployment and business models. This has been supported by real-world examples of IaaS, PaaS and SaaS cloud offering types, as well as service and charging models of IaaS offerings. We have focused our examples on IaaS offerings as per the thesis scope. We also distinguished between two main types of SLA for cloud-based applications, namely (i) cloud infrastructure SLA (CI-SLA), which describes QoS properties at the cloud infrastructure level, and (ii) cloud-based application SLA (CA-SLA), which describes QoS attributes of an application running on an IaaS cloud. The latter has been motivated with real-world examples due to its significance for cloud consumers and for the thesis topic.

Two primary benefits of the IaaS model have been explained in detail, namely elasticity (or auto-scaling) and the cost-effectiveness of the IaaS cloud model. We have distinguished both benefits by comparing the IaaS cloud model with the traditional way of resource provisioning. We described how elasticity can be achieved for different cloud resources, including cloud servers, which are among the most commonly scaled resources. We have also presented the two common scaling types, vertical and horizontal, with a focus on horizontal scaling as it is widely used in practice. We have then explained how elasticity is currently supported, using real-world examples from different IaaS providers. Based on these examples, we have presented the common structure of the elasticity rules that IaaS providers employ to enable elasticity. This forms the basis for our key contributions presented in Chapters 4 and 5. In particular, performance modeling of elasticity and its key parameters requires an understanding of how elasticity works in reality. This is also important for evaluating the performance of elasticity driven by resource-level and application-level metrics. The main topics presented in this chapter also provide the basis for the research studies analysed and compared with our work in the next chapter.

Chapter 3

LITERATURE REVIEW

In this chapter, we discuss the state of the art in the research areas related to the core contributions of this thesis. The discussion is organized into three main research categories. The first category is modeling studies that focus on modeling the performance of clouds, cloud-based applications and related performance parameters. Such studies relate to our performance modeling of IaaS cloud elasticity, which is presented in Chapter 4. We analyse and compare research studies that attempted to model dynamic cloud resource provisioning approaches, the performance of multi-tier web applications running on IaaS clouds, and the performance of elasticity, or auto-scaling, of IaaS clouds. The second category is empirical evaluation studies that evaluate the performance of IaaS cloud services, including auto-scaling, and their impact on the performance of cloud-based applications. Here, we focus on studies that relate to our performance evaluation of CPU-based and SLA-based elasticity mechanisms, which is introduced in Chapter 5, specifically Section 5.3. In this group, we examine and analyse studies that evaluate the performance of IaaS clouds and cloud-based applications in terms of cloud resource and application metrics such as CPU utilization, response time SLA and server usage cost. The third category is SLA-based elasticity, or auto-scaling, research studies. This category corresponds to our design of an SLA-based elasticity approach presented in Chapter 5, Section 5.2. Here, we examine auto-scaling mechanisms that are driven by resource and application SLA metrics in IaaS cloud environments, and compare them with our SLA-based elasticity approach.


3.1 Modeling Performance of IaaS Cloud Elasticity and Cloud-based Applications

Due to its significance, a large number of research studies have addressed elasticity, or auto-scaling, challenges in cloud environments. In particular, modeling the performance of IaaS clouds and dynamic resource provisioning has received considerable attention from the research community. In such studies, dynamic cloud resource provisioning or allocation refers to the elasticity, or auto-scaling, mechanisms of IaaS clouds. The key topics of these studies include the efficiency of provisioning cloud resources, application and resource SLA metrics, the management of computing resources in cloud environments, and the optimization of multiple performance criteria.

3.1.1 Modeling Cloud Resource Provisioning Mechanisms

A number of research studies such as [18; 92; 93; 94; 95] proposed techniques for dynamic server provisioning for multi-tier internet applications running in cloud data centers. One of the primary objectives of all these studies is to efficiently allocate cloud servers for the different applications hosted in a cloud environment while ensuring that applications' response time targets are satisfied. Xu et al. [93] and Lama et al. [92] proposed an autonomic self-tuning resource controller that uses models based on fuzzy logic. The main goal of this controller is to achieve an optimal resource allocation that satisfies applications' response time targets. Their models were specifically designed for multi-tier business applications running on different cloud servers. In [93], Xu et al. employed local and global controllers. The local controller, which is created with every virtual container, is responsible for determining and allocating the cloud resources needed by the application running in that container. The global controller, on the other hand, manages the local controllers' resource requests and allocates shared resources to them. Both controllers communicate with each other to make optimal cloud resource allocation decisions. The provisioning system proposed by Singh et al. [18] considers non-stationarity in multi-tier internet application workloads, and proposes a clustering algorithm to detect the workload mix over time. Based on their clustering algorithm, they also proposed another algorithm, based on a G/G/1 queuing model, to determine the number of cloud servers needed to serve a workload mix over time. The provisioning model proposed by Malkowski et al. [94] is based on automated learning and empirical models that require empirical measurements from previous application runs. Such application traces are often hard to obtain in practice. Ghanbari et al. [95] also addressed the dynamic resource allocation problem, but in a private cloud. Their focus is on optimal resource allocation through maximizing resource sharing (to minimize the provider's costs) while meeting the application SLA requirements of all clients.

Our elasticity modeling shares a common objective with these studies, i.e., modeling when and how to provision and de-provision servers in cloud environments. In contrast to these studies, however, our elasticity performance modeling is based on the elasticity mechanisms commonly used by IaaS providers and cloud consumers. More specifically, none of these studies [18; 92; 93; 94; 95] considered modeling the key thresholds and performance parameters of the rule-based elasticity approach. Our modeling also simulates the mechanism of elasticity rules using queuing theory and performance models. Furthermore, in our research we examine performance modeling of elasticity mechanisms from the cloud consumer's perspective rather than the IaaS provider's. The above studies consider meeting providers' performance and cost targets while maintaining appropriate performance levels for the consumers' applications hosted on IaaS clouds. As we previously argued, managing various applications on an IaaS cloud, and hence modeling their performance, is complicated and challenging due to the diversity and variability of application requirements in terms of performance and workloads. For IaaS providers this would not be a feasible solution. We, therefore, believe in techniques and methods that support cloud consumers in modeling and simulating the core performance parameters of elasticity mechanisms and their impact on application performance. In particular, our models and CPU-based elasticity algorithms can help cloud consumers to simulate their application workloads with different elasticity rules and perform off-line cost-performance analysis to decide on the best elasticity rule thresholds and parameters.


3.1.2 Modeling Performance of Cloud-based Applications

Some other research studies, including [20; 83; 96; 97; 98; 99], employed analytical models for modeling and approximating the performance of web applications. Urgaonkar et al. [83; 96] proposed comprehensive analytical models, based on queuing networks, for multi-tier internet applications, and dynamic provisioning techniques on top of them. The models proposed in [96] were used to determine the capacity a server farm needs to serve application workloads, and to predict the performance of an application running in the server farm. In [83], the authors addressed when and how many cloud servers to provision for a multi-tier internet application. They derived a model based on the G/G/1 queue to determine how many servers to provision for an application given the application's performance objectives. Furthermore, they proposed predictive and reactive techniques that decide when to scale based on a predictive model of workload changes and an estimation of its potential impact on the application's performance bounds. The cloud server provisioning technique presented in [98] consists of components including an "application provisioner" and a "load predictor and performance modeler". The application provisioner is assumed to have access to infinite cloud servers from an IaaS cloud, and is hence modeled as M/M/∞. Each cloud server was modeled as an M/M/1/k queue. The load predictor and performance modeler approximate the expected workload and the server and application performance to decide when to add or remove cloud servers.

Similar to these models [83; 96; 97; 98], we have utilized queuing network principles to model the performance of rule-based elasticity mechanisms for cloud-based applications. In addition, our analytical models partially share the purpose of these models, which is to analyse servers' CPU utilization and application performance, and to decide when to scale out and scale in a pool of cloud servers. However, our modeling effort differs in terms of capturing elasticity thresholds and elements, and their influence on application and resource metrics. In contrast to these models, we have used M/M/m queues to model the performance of a group of cloud servers, and the applications running on them, at the application tier rather than modeling individual servers. This is crucial because, in practice, elasticity rules are defined on a group of servers which is set to be elastic. In contrast to the M/M/∞ model [98], our models also capture core elasticity parameters of the IaaS cloud, such as the maximum number of servers, which is enforced by IaaS providers in their elasticity mechanisms. Furthermore, our models consider other important elasticity parameters, including server provisioning lag time and scaling cool-down time periods. These parameters have proved to have an impact on important metrics including CPU utilization and application response time SLA [45]. Moreover, none of the models proposed in [83; 96; 97; 98] captures server costs on the basis of the usage-based business model of IaaS cloud services. Our models capture and simulate server usage costs based on the pay-per-usage-time charging model by considering the effect of scale-out and scale-in actions over time. Therefore, our elasticity modeling of IaaS cloud performance and costs is closer to how elasticity works in real environments.
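For illustration, the sketch below computes the standard M/M/m steady-state quantities, per-server utilization and mean response time via the Erlang-C formula, for candidate sizes of an application-server group. The arrival and service rates are hypothetical, and this is a textbook M/M/m calculation rather than the full model developed in Chapter 4:

```python
from math import factorial

# Textbook M/M/m calculation for a group of m identical servers: per-server
# utilization and mean response time via the Erlang-C formula. The arrival
# rate (lam) and per-server service rate (mu) are hypothetical examples.

def mm_m_response_time(lam, mu, m):
    """Mean response time of an M/M/m queue (requires lam < m * mu)."""
    a = lam / mu                           # offered load in Erlangs
    rho = a / m                            # per-server utilization
    assert rho < 1, "system must be stable"
    # Erlang C: probability that an arriving request has to wait.
    top = (a ** m / factorial(m)) / (1 - rho)
    erlang_c = top / (sum(a ** k / factorial(k) for k in range(m)) + top)
    return 1 / mu + erlang_c / (m * mu - lam)   # service time + mean wait

lam, mu = 350.0, 100.0                     # requests/sec in, per-server rate
for m in (4, 5, 6):                        # candidate server-group sizes
    print(f"m={m}: utilization={lam / (m * mu):.0%}, "
          f"mean response time={mm_m_response_time(lam, mu, m) * 1000:.1f} ms")
```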

Li et al. [20] and Al-Azzoni et al. [99] developed performance and cost models based on queuing theory. Li's performance models are specific to multi-tier ERP applications and forecast resource utilization and end-to-end response time. Their cost models estimate two types of cost: (i) fixed hardware costs and (ii) dynamic operational costs of servers. Based on their performance and cost models, they used multi-objective optimization models to help service providers evaluate cost and performance trade-offs in order to efficiently plan appropriate resources for ERP applications. Li's performance and cost models were developed for applications running in on-premise or private cloud environments. Therefore, these models are not suitable for applications running in a public IaaS cloud environment, where server provisioning and pricing are based on auto-scaling mechanisms that trigger dynamically in response to workload changes. Similar to [20], Malkowski et al. [34] proposed models for the configuration and planning of computing resources, but in IaaS cloud environments. In contrast to our models, their approach is based on an interactive and iterative refinement process carried out on empirical data collected by system management tools. These models help cloud infrastructure providers to find and plan suitable cloud resource configurations that consider an "infrastructure cost model" and a "provider revenue model" (based on SLA penalties and earnings). The performance and cost models of Al-Azzoni et al. [99] are based on public IaaS cloud service provisioning. They utilized the service demand law and the Mean Value Analysis (MVA) algorithm to model servers' CPU utilization and applications' average response time, respectively. These models were then used to determine the appropriate server capacity for internet multi-tier applications running on the Amazon IaaS cloud. In contrast to our models, the models in [20; 34; 99] have not captured IaaS cloud elasticity rules and their key thresholds and parameters. We have captured the core parameters and thresholds of IaaS elasticity in our models, including metric thresholds, monitoring time windows and cool-down times, and the maximum and minimum number of servers that can be triggered. Using these models, we have developed algorithms that simulate how scale-out and scale-in work in response to an application's workload. The algorithms allow cloud consumers to carry out trade-off analysis of the impact of changing elasticity thresholds on important metrics including servers' CPU utilization, application response time SLA and server usage cost.

3.1.3 Modeling IaaS Elasticity

Modeling the elasticity of public IaaS clouds was addressed differently by a few other studies, including [38; 100; 101]. Brebner [102] utilized three real-world applications and workloads obtained from their SOA performance modeling approach. He evaluated the performance of each of these application workloads, running on the Amazon IaaS cloud, under four scenarios. In each scenario, he evaluated the impact of changing the elasticity thresholds defined in Amazon auto-scaling rules on application performance. The performance of these applications was evaluated in terms of a number of metrics, including server cost and end-to-end response time SLA. The performance evaluation of Brebner [38] shares an important aspect with our elasticity modeling: evaluating different elasticity thresholds and their impact on response time SLA and server usage costs. However, this study does not provide models that capture and simulate the behavior of rule-based elasticity and its core thresholds and parameters, nor algorithms that approximate crucial application and resource metrics. Our models capture these core elasticity thresholds and parameters, including CPU utilization thresholds, monitoring time windows, cool-down times and server provisioning lag time. Furthermore, our algorithms estimate important performance and cost metrics, including CPU utilization, application response time, server usage cost and the changes in the number of servers over time that result from scale-out and scale-in operations. On top of our models, we have developed simulations of rule-based scale-out and scale-in mechanisms, and of the estimation of resource and application performance and cost metrics, in an IaaS cloud environment.

The analytical models developed by Ghosh et al. [100; 101] focused on end-to-end performance analysis of IaaS cloud elasticity. Their models were based on "stochastic reward nets" to analyse the QoS of resource provisioning and of the cloud resource service requests of IaaS cloud consumers. Furthermore, "provisioning response delay" and "service availability" were the key performance measures used to quantify the effects of the dynamic workload changes of the various cloud consumers using IaaS cloud services. Unlike our models, the analytical models of Ghosh et al. [100; 101] aim to support IaaS providers in analysing the performance of their elasticity services, ensuring service availability, and reducing service provisioning delays and job rejection rates. Our analytical models are consumer-driven, as they focus on modeling elasticity thresholds and parameters to support cloud consumers in analysing the performance and cost trade-offs of different thresholds for their application workload.

3.2 Performance Evaluation of IaaS Clouds and Cloud-based Applications

Evaluating the performance of IaaS cloud platforms, and of the applications running on them, relates to our research work in Chapter 5 from four points of view. First, we have carried out a performance evaluation of the elasticity mechanisms commonly offered by IaaS cloud providers and used by consumers. It is therefore useful to investigate how our performance evaluation of elasticity mechanisms is positioned in the context of existing performance evaluations of IaaS cloud platforms. Second, some studies focus on evaluating the scalability and elasticity of offered cloud services. This is one of the objectives of our performance evaluation, which is primarily concerned with evaluating how well common IaaS elasticity services perform. Third, some studies consider factors that may influence the performance of offered cloud services, such as service specification and service variability. This is similar to our consideration of the impact of cloud server instance type and the performance variability of offered services on achieving cost-effective elasticity. Fourth, our performance evaluation methodology involves applying different elasticity mechanisms to a multi-tier business application running on an IaaS cloud. This brings forward the need to investigate current performance studies of cloud-based applications, and how our performance evaluation fits in the context of such existing studies.

3.2.1 Performance Evaluation of Elasticity of IaaS Cloud Services

The cloud computing model has been built on a number of technologies and principles, including resource virtualization and internet-based, measured self-service. This has made maintaining appropriate performance levels of IaaS clouds, such as service response time and availability, one of the most important prerequisites for cloud consumers to rely on and utilize such services. Therefore, an increasing number of research studies, including [50; 53; 103; 104], have focused on evaluating the performance of different IaaS cloud services and their impact on web applications.

Dejun et al. [103] focused their evaluation on two important performance properties of the cloud servers provided by the Amazon IaaS cloud, namely (i) "performance stability", which measures how consistent the performance of provisioned cloud servers remains over time, and (ii) "performance homogeneity", which measures how homogeneous the performance behavior of provisioned cloud servers of the same type is. Similar to our work, the evaluation was carried out on a 3-tier web application running on Amazon small and medium cloud servers. The evaluation was also carried out on cloud servers in the six different availability zones that Amazon offers its cloud consumers. One of the application workloads used in their experiments was CPU-intensive, which stresses performance at the application tier, similar to our workload and performance evaluation objectives, which are defined at the application tier. The other application workloads tested were read-intensive and write-intensive, which stress the performance of the database tier. The evaluation results were compared according to the application response time measured at the application and database tiers. Lenk et al. [53] proposed a method to support cloud consumers in measuring and comparing the performance and costs of virtual cloud servers from different providers. Their evaluation was based on cloud servers' CPU utilization and application response time metrics, which are among the primary metrics we used in our evaluation and analysis of elasticity approaches.

Although these performance evaluation studies [53; 103] share some aspects of our performance evaluation methodology, there are important differences in evaluation scope and objectives. In particular, our research scope is narrowed down to evaluating the performance of rule-based elasticity mechanisms of IaaS clouds. We have evaluated two primary elasticity approaches, CPU-based and SLA-based elasticity, with different thresholds and parameters. Furthermore, we have evaluated the performance of these elasticity mechanisms on cloud servers with different capacity profiles. Unlike [103; 104], we have also provided trade-off analysis between crucial metrics from the cloud consumer's perspective. These metrics include response time SLA, CPU utilization, end-to-end response time and server usage cost. None of the studies [53; 103; 104] has employed and evaluated the performance of different elasticity rules with different thresholds. Therefore, we see our evaluation of IaaS elasticity mechanisms as complementing such studies in the performance evaluation area.

Cloud performance studies such as [50; 104; 105] focused on evaluating IaaS cloud services for High Performance Computing (HPC) or scientific applications.


In particular, Jackson et al. [104] evaluated the performance of five applications on the Amazon IaaS cloud and on two conventional virtualized HPC clouds, namely Franklin and Lawrencium. They also provided an analysis of Amazon cloud performance variations and their impact on application performance. Iosup et al. [50] analysed the sufficiency of IaaS clouds for running scientific computing workloads at similar performance and lower cost than corresponding grid and parallel production infrastructure resources. In contrast to [104], the empirical performance and cost evaluation in [50] was carried out on the cloud servers of four major IaaS providers, namely Amazon EC2, GoGrid, ElasticHosts and Mosso. From each provider, a number of cloud server instances, each with a different capacity profile, were used in the evaluation study. This evaluation provided a more comprehensive analysis of different IaaS clouds and server instance types. Although the performance and cost results of these studies [50; 104] showed that conventional HPC grid and parallel infrastructure resources performed better than public IaaS clouds, such results would not benefit our study, primarily because of the type of applications used in the evaluation. Our evaluation focuses on e-business (web transactional) applications with multi-tier architectures. Furthermore, unlike our evaluation work, none of these studies [50; 104; 105] has considered the impact of elasticity mechanisms on application and resource metrics. Therefore, we see our IaaS elasticity performance evaluation as adding to these studies, as it covers another common class of applications and employs trade-off analysis of the impact of elasticity mechanisms and their thresholds on resource and application metrics.

3.2.2 Evaluating Elasticity Performance on Different Server Instances

Some cloud studies, such as [50; 103; 105; 106; 107; 108; 109], have recently focused on evaluating the performance of cloud servers with different capacity profiles. Such studies relate to the performance evaluation of IaaS clouds that we have carried out with different cloud server instances (presented in Section 5.3.3). Jayasinghe et al. [106] evaluated performance and scalability variations of n-tier web applications on different clouds, including the Amazon IaaS cloud. One part of their evaluation involves scale-out scenarios of the database tier with different predefined server configurations. They analysed the impact of scaling out the database servers with Amazon small, large and extra-large server instances in terms of CPU utilization and throughput metrics. Kossmann et al. [107] evaluated the scalability of multi-tier web applications on different IaaS cloud platforms, including the Amazon IaaS cloud. In particular, they evaluated the scale-out capabilities of different cloud database services (e.g., Amazon Relational Database Service, SimpleDB and MySQL) with different database architectures. Michael et al. [39] also carried out an experimental case study to evaluate the performance-price trade-off of scale-out and scale-up mechanisms. Their evaluation was done on dedicated servers (IBM-specific clusters) and was specific to the Nutch/Lucene framework for implementing search applications. The performance characteristics of such dedicated servers differ from those of the servers we used in our evaluation study, i.e., public cloud servers offered on demand. In addition, web transactional applications have workload characteristics that differ from search applications, as the latter consist of crawling, indexing and query functions rather than user sessions such as shopping and browsing. Therefore, this evaluation cannot be beneficial for the purpose of our study, but rather complements it by considering another class of applications and other cloud server instances.

While horizontal scaling was employed in these performance studies [39; 106; 107], they differ from our work in three aspects. First, scaling was scoped at the database tier, not the application tier [106; 107]. The scope of our performance evaluation is the application tier, with a CPU-intensive workload class that stresses that tier. Scaling the database tier is beyond the scope of our study as it involves additional challenges such as data consistency and replication. Second, scale-out operations were manually configured in the studies [39; 106]. We have employed auto-scaling rules, which are commonly used in practice, to dynamically scale out and scale in the application tier in response to workload changes. In [107], the auto-scaling mechanism was a custom implementation rather than an existing auto-scaling technique provided by an IaaS provider. As explained in our evaluation methodology in Chapter 5, we have chosen the auto-scaling rules provided by Amazon's auto-scaling mechanisms as they provide built-in and well-tested scaling operations. Such an auto-scaling service, which is widely used by cloud consumers, does not incur additional performance overhead when triggering provisioning and de-provisioning requests in the cloud. Third, all these studies [39; 106; 107] employed scale-out but not scale-in operations. Our performance evaluation has considered both scale-out and scale-in operations, defined based on application and resource parameters. Furthermore, unlike [106; 107], we studied the impact of elasticity rule thresholds on important application and resource performance metrics such as response time SLA, CPU utilization and server usage cost. We have used the two common types of threshold-based elasticity rules: (i) resource-based rules, based on CPU utilization, and (ii) application-based rules, based on response time SLA satisfaction. Therefore, we see our work as complementing these studies [106; 107] in several respects.

Similar to our evaluation methodology, [103; 108] carried out their performance analysis on Amazon IaaS cloud instances of small and medium capacity, but with different evaluation objectives. The work of Dejun et al. [103] shares with ours the type of application used in the evaluation (n-tier web applications) and the application metric (i.e., response time at the application tier). However, their analysis scope is performance stability and performance homogeneity, as explained previously in this section. Meanwhile, the focus of [108] is analysing the performance of Amazon IaaS cloud servers based on network metrics such as TCP/UDP throughput and packet delays.

Some other research studies, like [50; 105; 109], considered various cloud servers with different capacity profiles, including the small and medium instances we used in our performance evaluation. Ostermann et al. [105] evaluated and analysed the acquisition and release overhead in terms of the time taken to allocate or release different Amazon cloud server instances. They also evaluated the performance of each server instance type in terms of CPU, memory and I/O metrics. Both studies [50; 105] would have been useful for our performance evaluation had web transactional applications with n-tier architectures been considered, rather than scientific, or HPC, application types. In contrast to scientific applications, web transactional applications exhibit dynamic workload changes which impose different performance requirements. In such applications, the trade-off between meeting application SLAs and resource utilization and cost metrics is crucial for evaluating application and IaaS cloud performance.

Chen et al. carried out trade-off analysis based on different metrics, including application and resource metrics [109]. In particular, one goal of their study is to analyse the trade-off between applications' SLA performance and the cost of running them on the Amazon IaaS cloud. Furthermore, they investigated the influence of using different Amazon server instances with different capacity profiles to serve application requests on customer satisfaction and IaaS provider profit. In their evaluation, customer satisfaction is defined in terms of application response time and service price using their utility models. While their performance analysis considered trade-offs between the cloud consumer's SLA satisfaction and the provider's profit, our performance work focuses solely on comprehensive metrics from the consumer's perspective, using two common Amazon IaaS cloud server instances with small and medium capacity. Specifically, we analyse the impact of elasticity thresholds on achieving an appropriate trade-off between CPU utilization, response time SLA satisfaction, end-to-end response time, percentage of served requests and server usage costs.

3.2.3 Impact of Performance Variability on IaaS Cloud Elasticity

The last group of performance studies related to our research comprises studies of IaaS cloud performance variability. Such work relates to our evaluation of the performance consistency of IaaS cloud elasticity. Several research studies, including [51; 52; 103; 105], have recently examined the performance variability of different IaaS cloud services. The studies [52; 105] observed variable performance for a number of Amazon cloud services, including cloud servers. They measured the variability of what they call "deployment latency" [52] and "resource acquisition time" [105], which refer to the time needed to provision cloud servers, measured from the time they are requested until the time they are up and running. In [105], the results are derived from requesting 20 cloud server instances of similar capacity for different Amazon cloud server instance types. Meanwhile, in [52] the results were derived from empirical measurements of weekly traces over a period of two months, during different weeks, for both Amazon small and large instances. Their empirical analysis showed somewhat variable acquisition times for Amazon cloud servers over time. Based on their results, Iosup et al. [52] emphasised the importance of having more steady cloud server acquisition times, especially for applications that use auto-scaling operations. This observation is consistent with one of the conclusions from our evaluation of the performance consistency of elasticity rules. Specifically, we noticed how variable server acquisition time resulted in slightly inconsistent performance in terms of CPU utilization, response time SLA, end-to-end response time and server usage costs in most cases.

Schad et al. [51] observed somewhat high CPU performance variability for both Amazon small and large cloud server instances across different weekly periods. Furthermore, they noticed higher server CPU performance variation during weekdays and attributed this to cloud consumers' applications running mostly on weekdays. The results also showed that the performance variation of Amazon IaaS cloud servers differs between availability zones. Dejun et al. [103] also evaluated performance variations of Amazon small and medium server instances in six Amazon availability zones. Their focus is on how application response time varies when applications run on cloud servers in different availability zones.

The above studies provided cloud consumers with useful performance guidelines and indicators about performance variation in various IaaS cloud services, from different perspectives. Our evaluation of elasticity mechanisms did not aim to study the performance of IaaS cloud services and its variability as such. Rather, we have evaluated how consistent the performance of elasticity mechanisms is on two different Amazon server instance types, namely small and medium instances. Our analysis focused on how consistently elasticity mechanisms perform in achieving the desired resource and application metrics that are important for cloud consumers. These metrics are CPU utilization, response time SLA satisfaction, server usage costs, end-to-end response time and the percentage of requests served successfully. Our evaluation covers two common types of elasticity mechanisms, namely CPU-based and application SLA-based elasticity. Therefore, we see our performance consistency evaluation of elasticity mechanisms as contributing to, and filling gaps in, these performance evaluation studies from a number of perspectives, supporting cloud consumers in their decision-making when defining appropriate elasticity policies.

3.3 SLA-based Elasticity for Cloud-based Applications

One of the primary benefits of IaaS cloud elasticity is the dynamic allocation of computing resources to accommodate the variable workload changes of the applications running on them. Elasticity can be driven by different objectives that could be important for cloud consumers, cloud providers or both. One of the most important objectives for cloud consumers is to maintain appropriate application SLA satisfaction while ensuring efficient use of computing resources. Due to the significance of this need, there has been a considerable and growing body of research in the area of scaling IaaS cloud resources based on application SLA metrics [8; 49]. In this section, we discuss and position our work on SLA-based elasticity for n-tier cloud-based applications, presented in Section 5.2, in the context of the existing literature in this field.

3.3.1 SLA-based Auto-scaling for IaaS Providers

Many research studies, such as [82; 110; 111; 112; 113], proposed automated scaling mechanisms for applications running in IaaS cloud environments. The auto-scaling mechanisms proposed in [110; 111] were based on predefined scaling policies/rules and cloud server configurations. One important characteristic of these auto-scaling mechanisms is that they are cloud-independent, so they are not bound to a specific cloud infrastructure. Furthermore, such mechanisms aim to trigger timely and efficient scaling actions based on user-defined cloud resource utilization rules; they therefore do not consider application SLA metrics. Unlike these mechanisms [110; 111], the auto-scaling approaches presented in [82; 112; 113] are driven by application and resource SLA metrics. These elasticity approaches are driven by multiple objectives: satisfying application SLOs while maintaining appropriate resource utilization and minimizing resource usage and SLA violation costs. The auto-scaling mechanisms in [82; 112] both employed predictive workload models to predict potential workload behavior and make appropriate scaling decisions beforehand. The main goal of such models is to reduce the server provisioning time overhead found in the existing auto-scaling mechanisms provided by IaaS providers. In contrast to the other mechanisms, the auto-scaling approach of Dutta et al. [113] attempts to find the optimal combination of horizontal and vertical scaling actions for the different applications running on an IaaS cloud.

Our SLA-based elasticity approach focuses solely on cloud consumers' application SLA metrics. While the above approaches attempt to meet consumers' application SLAs, they also consider optimizing cloud providers' objectives. This would influence the efficiency of the proposed auto-scaling mechanisms, as the performance and cost objectives of cloud providers often conflict with those of cloud consumers. In particular, IaaS providers often demand mechanisms that optimize resource usage to minimize the running and operational costs of their IaaS clouds. Furthermore, in such approaches cloud providers try to balance the SLO satisfaction of different cloud consumers' applications, and hence they do not focus on the specific application SLA requirements of each cloud consumer; they need, instead, to balance the SLA requirements of all their cloud consumers. One objective of our SLA-based elasticity is to enable auto-scaling actions based on application SLOs that are managed by a cloud consumer. Our elasticity decisions are, therefore, driven by the cloud consumer's SLA metrics. In addition, our SLA-based elasticity mechanism can also be integrated or used with existing resource-based elasticity mechanisms to help cloud consumers identify the impact of resource thresholds on satisfying application SLAs, and hence to choose the most appropriate thresholds.


Some other research work, including [95; 114; 115], proposed approaches for the virtualization and provisioning of cloud servers based on application and resource SLAs. Kertesz et al. [114] proposed a resource virtualization architecture that can run on top of an IaaS cloud. It consists of a number of components that coordinate SLA-based resource allocation. In particular, these components automate the service negotiation, brokerage and deployment of IaaS cloud resources based on QoS requirements that must be specified by a cloud consumer. In [115], Goiri et al. presented a resource-based metric for defining QoS guarantees on CPU resource capacity. Based on a customer's defined CPU guarantees, they derived models for dynamic resource allocation that ensure the satisfaction of QoS guarantees through continuous monitoring. The goal of such models is to ensure that the resource capacity agreed with the cloud provider is met. This is primarily to reduce false resource-level SLA violations that could arise from other factors, such as poor application performance. This approach, however, lacks appropriate methods for deriving resource allocation from application SLAs. Unlike [114; 115], the dynamic allocation of resources introduced in [95] focuses on optimizing both cloud provider and consumer metrics. In particular, the allocation approach aims to maximize resource sharing, and hence the provider's profit, while maintaining maximum levels of the cloud consumer's application SLA satisfaction. The optimization approach was designed for private cloud environments.

Unlike these approaches [95; 114; 115], our SLA-based elasticity derives auto-scaling (or dynamic resource allocation) from the cloud consumer's perspective. We define SLA metrics as the core input for deciding when to scale-out or scale-in cloud servers. In our case, the application SLA metrics of a cloud consumer are solely monitored and used as the basis for auto-scaling actions. Meanwhile, the approaches presented in [95; 114; 115] require optimizing and ensuring provider metrics, e.g., maximizing resource sharing, alongside a large number of cloud consumer applications' SLAs. Therefore, we believe such approaches would not be practical, as they could lead to conflicting scenarios between cloud consumers' SLAs and cloud providers' performance and profit objectives. More importantly, we believe such approaches could not scale when a large number of applications' SLAs need to be maintained and optimized in addition to the provider's resource metrics. We argue cloud providers would not adopt such solutions due to the huge risks involved.

3.3.2 SLA-based Cloud Resource Management Mechanisms

There has been a growing number of research studies, such as [24; 33; 94; 116; 117; 118], proposing approaches for managing and controlling dynamic resource allocation operations with a focus on the database tier. These approaches are driven by application SLAs and by optimizing resource metrics. The approaches presented in [24; 33; 116; 117] share two main management objectives: first, to minimize the operating cost of an IaaS cloud environment by determining the smallest number of servers needed to serve all applications hosted on that cloud; second, to dynamically distribute, or place, these applications among all the provisioned servers so that the applications' SLA requirements are satisfied. However, they have some similarities and differences in the methods and models employed to achieve these objectives. Unlike the other approaches, Nguyen et al. [117] utilized a "local decision module", which is associated with each application, and a "global decision module", which interacts with all local modules. Dynamic resource scaling decisions in both modules are derived from constraint optimization models. The local constraint model attempts to optimize application SLAs. The global constraint model attempts to maximize SLA satisfaction while maintaining minimum resource costs.

The management approach presented in [116] is primarily driven by a forecasting algorithm that periodically predicts the variability of application workload demand based on historical workload data. In contrast, both management approaches [24; 33] proposed a "resource manager" or "controller" that controls dynamic resource management decisions. Li et al. utilized a "reinforcement learning" approach to continuously develop a knowledge base used by the controller to map environment states to suitable scaling actions [33]. The "system modeling module" of the "SmartSLA" component proposed in [24] also

employed machine learning techniques to estimate the potential resource profits that could result from different allocations for each application hosted on the cloud. Based on this module, a "resource allocation decision module" dynamically adjusts allocated resources to achieve applications' SLAs and optimize provisioned resources. Unlike [24; 33; 116], the dynamic resource management approach of Malkowski et al. [94] consists of a multi-model controller that incorporates resource allocation decisions from multiple models: a "horizontal scale model", an "empirical model" and a "workload forecast model". The controller decides on the best allocation of virtual machines for each tier of 3-tier cloud-based applications in terms of application SLA fulfillment and resource costs.

Unlike our SLA-based scaling approach, Pujol et al. [118] proposed a scale-up method, "One Hop Replication", for centralized "Online Social Networks", designed to reduce the costs of a full transition to a distributed cloud environment. Similarly, the resource management approach of Xiong et al. [24] also focuses on database systems, efficiently scaling out resources to meet an application's cost and performance SLAs. Our approach, like other approaches [24; 33; 94; 116], considers transactional n-tier cloud-based web applications. Database and social network applications have special characteristics and scaling requirements that differ from those of n-tier transactional web applications. In [24; 118], the application class is more memory-intensive and data-driven, which requires addressing data replication and partitioning. Our approach, like [33; 94], focuses on auto-scaling of CPU-intensive transactional workloads at the application tier. In contrast to our approach and the other approaches [24; 33; 94; 116], Pujol et al. [118] have not considered scale-out and scale-in operations. As previously discussed, we focus on horizontal scaling as it provides a more reliable scaling solution by avoiding a single point of failure.

Our SLA-based elasticity has a narrower focus in comparison to the approaches presented in [24; 33; 94; 116; 117; 118]. In particular, our SLA-based elasticity algorithms focus on driving scale-out and scale-in actions primarily based on the SLOs of the application's individual request types, and on the overall SLA satisfaction at fine-granular time intervals. In contrast to these approaches, we do not optimize multiple constraints such as the provider's resource utilization and costs, and

other hosted applications' SLAs. Therefore, we derive more appropriate auto-scaling decisions for an individual cloud consumer's application SLA, independently from other metrics. The SLA monitoring part of our SLA-based architecture can be utilized by cloud consumers to analyse the trade-offs between different CPU utilization thresholds and the response time SLA satisfaction, cost and end-to-end response time metrics. Our SLA-based elasticity architecture is also designed for multi-tier web applications, where auto-scaling decisions can be made independently at each tier.

3.3.3 IaaS Auto-scaling Mechanisms

Many IaaS and cloud management providers offer tools and services to support cloud consumers in defining auto-scaling policies for their cloud-based applications. Examples include the AWS Auto Scaling service [37], the GoGrid RAM Scaling service [76], the RightScale Autoscaling service [44], the Microsoft Azure Auto-scaling Block (WASABi) [42] and the Rackspace Auto Scale service [91]. Such tools provide APIs and management consoles that help cloud consumers automate auto-scaling operations. This includes services such as cloud resource monitoring metrics [78; 119], cloud load balancing services [73; 120; 121] and server configuration and deployment services [37; 44]. Realizing auto-scaling requires technical configuration and management of a number of these services from a particular IaaS provider. RightScale is an example of a third-party cloud management provider that offers cloud management services, including auto-scaling, across multiple IaaS clouds such as AWS [11], Microsoft Azure [59], Rackspace [12], GoGrid [13] and Google [122]. RightScale management tools allow cloud consumers to define and automate various auto-scaling operations using a high-level management console, without the need for technical API development.

Existing commercial auto-scaling tools such as Microsoft WASABi [42], AWS Auto Scaling [37] and RightScale autoscaling [44] provide cloud consumers with ways to define elasticity rules for their cloud-based applications. This helps them define and automate scaling decisions for their web applications.


For example, both AWS Auto Scaling and Microsoft WASABi require defining rules and actions that determine when and how to scale cloud resources, respectively. These rules are often defined in terms of resource metrics, such as CPU utilization, and other parameters. They can also be defined based on a predefined schedule, rather than metrics and parameters, which determines when and by how much to change resources. In contrast to AWS Auto Scaling, Microsoft WASABi allows defining auto-scaling actions at the application block level rather than the resource level. Therefore, Microsoft WASABi allows scaling actions on certain parts of the application, e.g., application throttling, in addition to changing the capacity of computing resources.

While such auto-scaling mechanisms enable cloud consumers to define various rules and actions, they do not provide cloud consumers with ways to determine how the elasticity rules' thresholds would influence satisfying their resource and application metrics. Our performance modeling and simulation of IaaS elasticity rules (in Chapter 4) support cloud consumers in analysing and determining the impact of changing threshold values on important resource and application metrics. Using our elasticity rules simulation, cloud consumers can, for example, determine the CPU utilization thresholds that lead to SLAs being satisfied. This extends the benefits and practicality of commercial auto-scaling rules and helps cloud consumers to make the best use of them.

Another important limitation of existing elasticity rules is that they are based on resource-level metrics such as CPU and memory utilization, and the amount of network traffic received or sent by a cloud server. Such rules, therefore, do not consider scaling decisions based on application-level metrics. Our architecture and algorithms presented in Chapter 5 extend these elasticity rules and enable automated scaling decisions based on application-level metrics. The proposed architecture and algorithms, therefore, support cloud consumers in analysing and comparing how well SLA-based elasticity performs in comparison to auto-scaling rules based on resource metrics.

One major limitation of these auto-scaling tools is that they primarily support

the automation and management of resource-level auto-scaling operations [8; 49]. In particular, auto-scaling policies are defined based on resource-level metrics such as CPU and disk utilization, and data transfer. Such metrics are collected by appropriate cloud resource monitoring services, often provided by IaaS providers. Unlike application SLA metrics, resource-level metrics are homogeneous and can be measured across groups of resources, e.g., servers and network, through common APIs. Our SLA-based auto-scaling architecture and algorithms fill this gap. They provide cloud consumers with a mechanism to define and automate scale-out and scale-in operations based on response time SLAs. We believe it would be very challenging and risky for IaaS cloud providers to manage and scale their IaaS clouds based on the different application-level SLA metrics of their cloud consumers. This has not yet been provided by any IaaS provider due to the complexity and diverse requirements of cloud consumers' application SLA metrics.

It can be argued that there is an interdependence between resource metrics and application performance, and that this interdependence blurs the distinction between resource-based and SLA-based elasticity. Resource metrics, and therefore resource-based elasticity, may indirectly indicate the impact of changing resource thresholds on application performance. However, there can be cases in which an application SLA metric is not satisfied while the resource metrics have not triggered scale-out actions. This can happen during the time the resource-based elasticity condition takes to be satisfied, or when the resource metric is just below its threshold baseline. We demonstrate these cases empirically in Section 5.1. Such scenarios, in fact, demonstrate the need for elasticity rules based on application SLAs, as these provide a direct measure of application performance. They also guarantee that auto-scaling actions are triggered on a direct measure of application SLA metrics rather than on interdependent resource metrics.

Recently, support services have emerged that allow cloud consumers to define their own specific metrics. Such services are not straightforward to use for multi-tier applications, as they involve non-trivial design decisions and the development of application-specific SLA-based algorithms. Our SLA-based elasticity architecture and algorithms, presented in Sections 5.2.1 and 5.2.2, provide a method

for enabling cloud consumers to define and automate scaling decisions and actions based on application-level metrics. Furthermore, it is a generic architecture through which any application-level metric can be incorporated to derive auto-scaling decisions. In addition, our conceptual design can be integrated with any IaaS cloud platform. The proof-of-concept implementation of the architecture has demonstrated the feasibility of our SLA-based elasticity mechanism. Moreover, our evaluation with different SLA-based elasticity rules, presented in Section 5.3, showed its usability for supporting cloud consumers in analysing trade-offs between different application and resource performance and cost metrics.

Chapter 4

MODELING CPU-BASED ELASTICITY

In the previous chapter, we analysed the literature related to the thesis aims and positioned our elasticity modeling within it. In this chapter, we present our modeling of resource-level elasticity based on the CPU utilization metric. We first present the structure of elasticity rules and their key elements. This is followed by an introduction to queuing theory and its basic notation in the context of multi-tier application architecture. Based on queuing models, we then introduce our performance and cost models for estimating the key parameters and metrics of CPU-based elasticity. Furthermore, we present our algorithms that simulate scale-out and scale-in mechanisms and server usage cost for cloud-based applications. We then present our validation of the proposed models using simulation and equivalent empirical experiments in the AWS cloud environment. The results of evaluating CPU-based elasticity rules with different thresholds are then analysed and discussed. The chapter concludes with a use case that demonstrates how our proposed CPU-based elasticity models and algorithms can support cloud consumers in defining appropriate elasticity thresholds for their cloud-based applications.


Figure 4.1: Key Elements of an Elasticity Rule

4.1 Elasticity Rules Structure

As explained in Section 2.3.3, IaaS elasticity can be either proactive or reactive. The former requires sophisticated and highly reliable predictive models based on historical data such as workload and environment changes. The latter, on the other hand, involves reacting to real-time system events and/or changes and does not require such predictive models. In comparison, reactive elasticity approaches provide more reliable and realistic ways of scaling, because reactive elasticity relies on real-time monitoring data and events from which concrete performance indicators can be drawn. Most IaaS providers support reactive IaaS elasticity through rule-based mechanisms. Such rule-based mechanisms are commonly adopted by cloud consumers due to their fit for real-world scenarios [38; 45; 86; 87; 88; 89]. Therefore, the focus of our modeling effort in this thesis is on rule-based elasticity techniques. In order to model such mechanisms and the effects of elasticity on application and resource metrics, it is crucial to understand the key elements of the common structure of such elasticity rules.

The general form of elasticity rules is shown in Figure 4.1. An elasticity rule consists of two main parts: a condition and an action. The condition specifies a metric to be evaluated against a specific threshold value (Mθ).


Monitor CPU Utilization (CPUUtil) every 1 min. interval

IF CPUUtil > 80% FOR 7 min.
    Add 1 server of small capacity       // Scale-out
    Wait 5 consecutive 1 min. intervals

IF CPUUtil < 30% FOR 10 min.
    Remove 1 server of small capacity    // Scale-in
    Wait 7 consecutive 1 min. intervals

Figure 4.2: Example of an Elasticity Rule

The metric can be any variable that is measurable through monitoring mechanisms either provided by the IaaS provider or installed by the cloud consumer. These metrics can be monitored either at the resource level or at the application level. Examples of resource-level metrics include CPU and memory utilization and the amount of network traffic out/in (in bytes). Response time and the percentage of dropped requests are examples of application-level metrics. The utilized metrics determine the basis of the elasticity approach, which can be resource-based or application-based elasticity. Here, we are primarily concerned with monitoring the application's response time and the servers' CPU utilization (at the application tier), as these are the most important measures for determining the performance of an application. These metrics are measured at regular time intervals (e.g., 1 minute), the length of which is denoted here as the Time Interval Length (T).

At every time interval, the measured value of the specified metric is compared against a user-defined threshold on the metric (denoted here as Mθ). If this condition holds for a time window (T_w), then the action part is triggered. The T_w must consist of consecutive time intervals of length T. The action specifies the change in capacity (P) to be administered to a specified resource. Examples of changes in resource capacity are an increase or decrease in the number of servers. After the action is executed, the elasticity rule also specifies a cool-down time (T_c), for which the system has to wait before the elasticity condition

is again evaluated. Figure 4.2 presents an example of such elasticity rules. Here, the scale-out rule triggers when CPU utilization increases above 80% for 7 consecutive minutes, adding 1 server of small capacity. The scale-in rule triggers to remove 1 server of small capacity when the CPU utilization decreases below 30% for 10 consecutive minutes. These rules are not evaluated for 5 and 7 minutes after the scale-out and scale-in actions are triggered, respectively, to allow the resource changes to take place.

The parameters and metrics of elasticity specify how the conditions and actions of an elasticity rule work and are, therefore, important inputs for the modeling process. Our models for approximating CPU-based elasticity incorporate them as input variables.
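To make these inputs concrete, the following Python sketch gathers the rule elements above into one structure. It is an illustrative fragment, not part of our Matlab implementation; the field names and the instantiated values, which mirror Figure 4.2, are assumptions for illustration only.

    from dataclasses import dataclass

    @dataclass
    class ElasticityRule:
        """Key elements of a threshold-based elasticity rule (cf. Figure 4.1)."""
        interval_len_min: int    # T: monitoring time interval length (minutes)
        metric_threshold: float  # M_theta: threshold the metric is compared against
        window_min: int          # T_w: consecutive minutes the condition must hold
        capacity_change: int     # P: servers to add (+) or remove (-)
        cooldown_min: int        # T_c: wait time after the action fires

    # Hypothetical instantiation of the example rules shown in Figure 4.2.
    scale_out = ElasticityRule(interval_len_min=1, metric_threshold=0.80,
                               window_min=7, capacity_change=+1, cooldown_min=5)
    scale_in = ElasticityRule(interval_len_min=1, metric_threshold=0.30,
                              window_min=10, capacity_change=-1, cooldown_min=7)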

4.2 Modeling CPU-based Elasticity

In this section, we first present the M/M/m queuing model for the 3-tier application architecture. We then discuss the basic notation of the queuing model, including the request arrival rate and the mean service rate. Based on M/M/m queuing theory, we derive the key models for estimating the core metrics and outputs of CPU-based elasticity, namely the average CPU utilization, the average response time, the number of servers and the server usage costs. We also derive models for estimating other elasticity parameters and constraints, i.e., the cool-down time, the maximum and minimum allowed servers and the server provisioning lag time. We further define the conditions under which scale-out and scale-in decisions should be evaluated and scaling actions should be triggered. We then present our algorithms for scale-out and scale-in operations on an IaaS cloud, and for calculating the costs of server usage.


4.2.1 Queue Model for Multi-tier Applications: Assumptions and Scope

Typical Internet business applications are often structured using a multi-tier architecture, in which each tier is responsible for certain processing of incoming requests [96; 97; 123]. The most commonly used structure in industry and academia is the 3-tier architecture [23; 25; 40; 96], which consists of web, application and database tiers. The web tier receives requests from the application users and sends the responses back to them. It also balances incoming requests between a number of servers at the application tier. Each server at the application tier processes incoming requests and sends the processing results back to the web tier. To achieve this, a server may need to communicate with the database tier one or more times to retrieve the needed information. The database tier maintains records of all important data of the application and, hence, provides access to this data through the application tier. In the context of our modeling, we assume that a 3-tier business application is deployed and running on a public IaaS cloud such as the Amazon cloud. As a common practice, each tier is deployed on a separate server in the cloud to enable reliable fail-over [40].

Queuing theory models the processes and parameters of queue systems and how they work [124; 125]. It has been widely used in research and academia as an analytical modeling technique for the performance analysis of computer systems [124; 126]. A queuing model abstracts a computer system as a number of workstations (computers or servers) that serve jobs arriving at the system. Each workstation has a queue of jobs waiting to be served, one at a time [124]. Therefore, a workstation has two main parameters: the request arrival rate and the mean service rate. The request arrival rate is the average number of jobs arriving at a workstation; the jobs are either served or wait in a queue depending on the status of the workstation. The mean service rate is the average number of jobs the workstation can serve per unit of time. In queuing systems, there can be one or more workstations, each of which is represented as an independent queue.


Figure 4.3: Queue Model of 3-tier Application Architecture

Based on queuing theory, a 3-tier application architecture can be modeled as a network of queues, as shown in Figure 4.3 [97; 124; 126]. In such a queuing network model, each tier consists of one or more servers that serve the requests received by an application. Here, each server is represented as a queue at which application requests are served. As shown in Figure 4.3, the web tier (load balancer) distributes incoming requests across a pool of application servers that reside at the application tier. At the application tier, each application server needs to send one or more queries to the database tier in order to serve a request. As previously discussed, the scope of this thesis is the elasticity of the application tier of Internet business applications. Therefore, the discussion of rule-based elasticity modeling focuses on the application tier.

Queues are differentiated by a number of elements, such as the number and type of parameters, the processes included to capture certain system behavior, and the derivation or estimation of system parameters [124]. Examples of queue models include G/G/1, M/D/1 and M/M/m [124; 127]. We chose the M/M/m queue to model IaaS elasticity at the application tier of multi-tier web applications running on an IaaS cloud. Due to its suitability, the M/M/m queuing model has been widely used in the performance modeling and analysis of multi-tier Internet applications [20; 83; 96; 99; 128]. This is because the way servers receive and process application requests follows the core principles of the queuing model [96; 124].

In an M/M/m queue model, the first parameter represents the arrival process of requests at a server. Here, the arrival and execution of requests at each server follows a birth-death process, which is a special Markov chain [124; 125]. In Internet applications, a user requests certain pages or functions independently from other users, which implies that there is no dependency between incoming requests. Therefore, the arrival of requests follows a Poisson process [83; 125]. In a Poisson process, the inter-arrival times are independent and identically distributed (IID) and can therefore be derived from an exponential distribution (M) [124]. In this thesis, we assume that request arrivals follow an exponential inter-arrival distribution, which has been shown to be a common workload characteristic of e-commerce applications by the TPC-W benchmark [129] and by workload characterization studies such as [17].

The second parameter in the model represents the service time. Each request brings with it a certain amount of work to be processed by the server. In the M/M/m model, service times are assumed to be drawn from an exponential distribution (M). The third parameter refers to the number of servers available to serve requests (m). Requests are served in First-Come-First-Served (FCFS) order. Any request that arrives while all servers are busy has to wait in a queue until all requests that arrived before it have been served.

As shown in Figure 4.3, requests arriving at the web tier at rate λ (lambda) are equally distributed between the m servers at the application tier. Therefore, each server receives λ/m requests. Also, each server has a mean service rate, denoted µ, which is the number of requests served per unit of time. As we treat the application tier as a whole, it receives requests arriving at rate λ, serves them at service rate µ, and has m servers of the same processing capacity.

By considering the time interval factor described in the elasticity rule structure, we define the following core parameters of the application tier queue model with

respect to a particular time interval t:

• Request arrival rate (λt): the rate at which requests arrive at the application tier during time interval t, to be served by m servers.

• Mean service rate (µt): the mean service rate at time interval t, which is also equal for all servers at the application tier as we assume each server to have the same processing capacity.

• Number of servers (mt): the number of servers at the application tier at time interval t, which varies over time as scale-out and scale-in actions are triggered.

• Average CPU utilization (Ut): the average CPU utilization of all servers at the application tier at time interval t.

• Average response time (Rt): the average time required to serve all requests arriving at the application tier during time interval t. The response time is measured at the application tier.

It should be noted that the metrics, especially the CPU utilization, are calculated at the application tier as an average over all servers. This is primarily because the application tier is modeled as a whole, i.e., as one M/M/m queue, rather than each server as an independent queue. Furthermore, the incoming requests are distributed equally among all servers, which means all servers are under similar workload at all times.

According to queuing theory principles, λt and µt can be estimated from appropriate exponential distribution functions [124] at every time interval t [97].

We use Poisson and Power Law functions to generate λt and µt, respectively. These functions are commonly used for generating request and service rates for multi-tier Internet applications [33]. The average CPU utilization and response time can then be derived using queuing theory principles, as explained in the following sections. For convenience, all symbols used in our modeling and their meanings are described in Appendix B, Table 1.
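As an illustration of this generation step, the following Python sketch draws per-interval rates. The numpy calls, the means and the number of intervals are assumed example values rather than settings from our experiments, and an exponential distribution is used for µt as one possible choice.

    import numpy as np

    rng = np.random.default_rng(seed=42)
    num_intervals = 180  # e.g., 180 one-minute intervals (assumed)

    # Per-interval request arrival rates lambda_t, drawn from a Poisson
    # distribution with an assumed mean of 100 requests per interval.
    lam = rng.poisson(lam=100, size=num_intervals).astype(float)

    # Per-interval mean service rates mu_t, drawn here from an exponential
    # distribution with an assumed mean of 40 requests per interval.
    mu = rng.exponential(scale=40.0, size=num_intervals) + 1e-9  # avoid zero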


4.2.2 Modeling CPU Utilization

Approximating the average CPU utilization of the mt servers is crucial for modeling CPU-based elasticity. It forms the basis for evaluating elasticity conditions at every time interval t and deciding when elasticity actions have to be triggered. The following derivations of CPU utilization and response time are based on queuing theory and operational laws [124].

The total number of requests that arrive and get served at the application tier during time interval t can be estimated by multiplying the request arrival rate at time interval t by the length of the time interval during which the system is monitored, T (i.e., λtT). Therefore, the average busy time of each server at time interval t is:

\[
B_{t,m} = \frac{\lambda_t T / \mu_t}{m_t} \tag{4.1}
\]

Accordingly, the average busy time of mt servers during time interval t is:

\[
B_t = \lambda_t T / \mu_t \tag{4.2}
\]

According to the Utilization Law in queuing theory [124; 125], the CPU utilization of a server is defined as the busy time of that server divided by the total time during which the system is monitored. Therefore, the CPU utilization of each server during time interval t is defined as follows:

\[
U_t = B_{t,m} / T \tag{4.3}
\]

By substituting B_{t,m} into equation 4.3, we obtain the average CPU utilization of a server at the application tier:

\[
U_t = \frac{\lambda_t}{m_t \mu_t} \tag{4.4}
\]


In the above equations, it is important to note that the CPU utilization has been derived from the Utilization Law of the queuing model. Furthermore, in equation 4.4, the product of the number of servers and the mean service rate (mtµt) represents the total service capacity against which the intensity of incoming requests is measured.

Empirical experiments with different CPU elasticity rules [45] have demonstrated that CPU utilization when m = 1 (i.e., before any scale-out actions are triggered) increases significantly as the request rate increases. This is due to the immediate start of concurrent user sessions (the ramp-up stage), which leads to sharp increases in the number of concurrent requests that have to be served by one server. It can also be observed that such significant increases in CPU utilization start decreasing quickly as soon as new servers are added by the scale-out actions triggered by elasticity rules. This can be explained by the fact that the increased workload is distributed between multiple servers rather than handled by one server. To capture this impact on CPU utilization, we employ a utilization ramp-up threshold (U_rθ) to adjust the approximated CPU utilization value. This threshold captures the effect of a sudden workload increase while only one server handles requests. Therefore, U_rθ does not need to be modified when auto-scaling adds more servers. By applying the ramp-up threshold to equation 4.4, we obtain the following equation, which determines the final approximated value of the CPU utilization.

\[
U_t = \begin{cases} U_t + U_{r\theta} & \text{if } m_t = 1,\\ U_t & \text{if } m_t > 1 \end{cases} \tag{4.5}
\]

As shown in equation 4.5, the ramp-up threshold is added to the estimated CPU utilization (from equation 4.4) when the number of servers equals 1. The value of the ramp-up threshold is estimated from a distribution based on the value of the CPU threshold specified in the elasticity rules. Based on empirical experiments with different CPU thresholds [8], we have observed that as the value of the CPU threshold increases, the increase in CPU utilization becomes higher when only one server is serving requests, i.e., m = 1.
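The following Python sketch combines equations 4.4 and 4.5 into a single function; the function name and the example ramp-up adjustment value are illustrative assumptions.

    def cpu_utilization(lam_t: float, mu_t: float, m_t: int,
                        ramp_up_threshold: float) -> float:
        """Average CPU utilization of the application tier (equation 4.4),
        with the ramp-up adjustment applied when a single server handles
        all requests (equation 4.5)."""
        u_t = lam_t / (m_t * mu_t)     # utilization law: U_t = lambda_t / (m_t mu_t)
        if m_t == 1:
            u_t += ramp_up_threshold   # single-server ramp-up case
        return u_t

    # Example with assumed values: 30 req/min served by 1 server at 40 req/min,
    # plus an assumed ramp-up adjustment of 0.05 (5 percentage points).
    u = cpu_utilization(lam_t=30.0, mu_t=40.0, m_t=1, ramp_up_threshold=0.05)  # 0.80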


The CPU utilization model (equation 4.5) forms the basis for simulating CPU-based elasticity rules, as it is used to evaluate elasticity conditions and decide whether or not the specified elasticity actions should be triggered. Using this CPU utilization model, we can define the following states in which elasticity conditions can be:

\[
\begin{cases}
U_t \ge U_\theta^u & \text{over-utilized}\\
U_t \le U_\theta^l & \text{under-utilized}\\
U_\theta^l < U_t < U_\theta^u & \text{normal utilization}
\end{cases} \tag{4.6}
\]

In the above conditions, the approximated CPU utilization value (from equation 4.5) is evaluated at every time interval t against the CPU utilization thresholds specified in the elasticity rules by the cloud consumer. Here, U_θ^u represents the upper CPU utilization threshold for scale-out, and U_θ^l the lower CPU utilization threshold for scale-in. These thresholds are defined in the scale-out and scale-in conditions of elasticity rules, which we capture in our modeling. The evaluation can result in one of three CPU utilization states: over-utilization, under-utilization or normal utilization.

Using the above CPU utilization equations and the evaluation conditions that determine the CPU utilization states, we can decide when elasticity conditions on CPU utilization hold. However, as explained in Section 4.1, an elasticity condition should hold for a time window (T_w) of consecutive time intervals, each of length T. This can be modeled as follows:

 u m + m if PTw t ≥ T u,  t inc j=1 i w  l m = PTw l (4.7) t mt − mdec if j=1 ti ≥ Tw,   mt otherwise


The first condition states that the number of servers is scaled out by m_inc servers when U_t ≥ U_θ^u for consecutive time intervals of total length T_w^u (the upper time window threshold). The second condition states that the number of servers is scaled in by m_dec when U_t ≤ U_θ^l for consecutive time intervals of total length T_w^l (the lower time window threshold). The number of servers does not change in any other state.

The increase and decrease in the number of servers, i.e., m_inc and m_dec, are determined by the cloud consumer. The change in the number of servers (when scale-out and scale-in actions are triggered) is also subject to other constraints, such as the server provisioning lag time and the cool-down time. These constraints are explained in Section 4.2.4.
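To illustrate how the utilization states (equation 4.6) feed the time-window condition (equation 4.7), the following Python sketch maintains over- and under-utilization counters across intervals; all names and threshold values are illustrative assumptions.

    def evaluate_window(u_t, u_upper, u_lower, over_time, under_time, T):
        """Accumulate the time for which an elasticity condition has held.

        Returns updated (over_time, under_time) counters; a counter resets
        as soon as its condition stops holding (cf. equation 4.7).
        """
        over_time = over_time + T if u_t >= u_upper else 0
        under_time = under_time + T if u_t <= u_lower else 0
        return over_time, under_time

    # Example: thresholds of 80%/30% and 1-minute intervals (assumed values).
    over, under = 0, 0
    for u_t in [0.85, 0.90, 0.70, 0.82, 0.84]:
        over, under = evaluate_window(u_t, 0.80, 0.30, over, under, T=1)
    # A scale-out triggers once `over` reaches the upper window T_w^u (e.g. 7 min).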

4.2.3 Modeling Application Response Time

Response time is a crucial metric for evaluating the performance of Internet business applications [28; 35; 130]. In web application performance, response time is defined as the time needed to execute a task [127], which includes the processing time at each tier. Here, we are interested in the application tier response time, i.e., the time needed by the application tier to serve one request. As the scope of our elasticity modeling is the application tier, we focus on modeling response time at this tier. Therefore, we approximate the average response time of all requests seen by the application tier at every time interval t, as specified in the elasticity rules.

According to Little's Law [124; 125], the mean response time relates the number of requests in the system to λ. This relationship can be expressed with respect to time interval t as shown in the following equation:

\[
R_t = \frac{n_t}{\lambda_t} \tag{4.8}
\]

where n_t is the expected number of requests in the application tier during time interval t. Based on queuing theory, a request can be in one of two states: being served or waiting in a queue. Therefore, n_t can be further divided into the number of requests being served and the number of requests queuing at each time interval t, as follows:

\[
n_t = (n_s)_t + (n_q)_t \tag{4.9}
\]

where (n_s)_t is the number of requests being served and (n_q)_t is the number of requests queuing at the application tier. These can be derived based on Erlang's formula and the equations for the probability of the number of requests in the system [124]. Therefore, equation 4.9 can be further expressed in terms of the number of servers, the server utilization and the probability of queuing, as in the following equation:

\[
n_t = m_t U_t + \frac{U_t \varrho_t}{1 - U_t} \tag{4.10}
\]

where ϱ_t is the probability that a request has to wait in a queue during time interval t. When all m_t servers are busy, arriving requests have to wait in a queue. Here, the probability of queuing is derived based on the number of jobs in the system [124; 125]. As previously mentioned, the arrival and serving of requests follow a birth-death process, which can be represented as a state transition diagram [124; 125]. Given this, probability laws can be used to estimate the number of requests in the system. To do so, we need to estimate the probability of the number of jobs in the system in two states: when there are no requests in the system, and when the number of requests is greater than or equal to the number of servers. Using probability laws [124], the probability of zero requests in the system, P_0, can be estimated as follows:

\[
P_0 = \left[\, 1 + \frac{(m_t U_t)^{m_t}}{m_t!\,(1 - U_t)} + \sum_{n=1}^{m_t - 1} \frac{(m_t U_t)^n}{n!} \,\right]^{-1} \tag{4.11}
\]

The probability P_n of n requests in the system, where n is greater than or equal to the number of servers, represents the case in which requests have to wait in a queue until a server becomes available. This probability of queuing can be estimated using probability functions as follows:

\[
P_n(n \ge m_t) = P_0 \frac{U_t^{\,n}\, m_t^{m_t}}{m_t!} \tag{4.12}
\]

By substituting P_0 into equation 4.12, we obtain the probability of queuing ϱ_t as follows:

\[
\varrho_t = P_0 \frac{(m_t U_t)^{m_t}}{m_t!\,(1 - U_t)} \tag{4.13}
\]

By substituting nt from equation 4.10 into equation 4.8, we obtain the formula for the average response time at time interval t:

\[
R_t = \frac{1}{\mu_t}\left(1 + \frac{\varrho_t}{m_t(1 - U_t)}\right) \tag{4.14}
\]

One important aspect that queuing theory does not take into consideration is the types and mix of requests [18; 125]. A request type represents the particular functionality or page that a user may request. For example, users can request any of the following pages of an online shop: Home, New Products, Search Items and Check Out. At any point in time, the application is likely to receive requests from different users concurrently. It is also likely that each user will request different pages at different times, and therefore the application will likely receive a mix of requests of different types.

The mix of requests received by the system in a particular time interval consists of different proportions of different request types, each requiring different processing times. Different request types (e.g., a "Home" page, "Search Results" or "Item Details" request) put different demands on the CPU, as each request has a different execution path. Furthermore, the percentage of requests of each type plays a significant role in determining the time spent on resources (e.g., the CPU). Therefore, request types and the request mix can have different footprints on the servers, and this influence has to be considered in estimating application response times. Recent studies have demonstrated the impact of request types and mix on the performance of web applications [18]. We capture the influence of the request mix and types on response time as follows:

\[
Tx_t = \sum_{i=1}^{n} D_i \lambda_{it} \tag{4.15}
\]

where Tx_t is the additional time resulting from the request mix, D_i is the average demand a request of type i puts on the CPU, and n is the number of request types. The values of D_i can be obtained from real measurements for each request type, as explained in Section 4.3. λ_it is the average arrival rate of requests of type i during time interval t, and can be estimated as follows:

\[
\lambda_{it} = \lambda_t Q_{pi} \tag{4.16}
\]

where Q_pi is the percentage of request type i relative to all requests. These percentages can be defined by classifying the workload profiles. For example, TPC-W, an industry-standard benchmark for web applications, classifies e-commerce application profiles into Browsing, Shopping and Ordering profiles, in which the percentages of request types vary [131].

By adding the time resulting from the request mix (equation 4.15) to equation 4.14, we obtain the final average response time at every time interval t as follows:

\[
R_t = \frac{1}{\mu_t}\left(1 + \frac{\varrho_t}{m_t(1 - U_t)}\right) + \sum_{i=1}^{n} D_i \lambda_{it} \tag{4.17}
\]

The above modeling of the key performance metrics, i.e., CPU utilization and response time, is based on queuing theory and its principles. There are still other constraints and parameters that are important for simulating elasticity rules. These parameters do not follow queuing theory but rely on the estimated performance metrics. In the next section we explain how we capture

these constraints in our elasticity models.
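Putting equations 4.8–4.17 together, the following Python sketch computes the approximate average response time for one interval as a direct transcription of the formulas above. The request-mix demands, percentages and rates in the example are assumed values (with units taken to be consistent), not measurements.

    from math import factorial

    def response_time(lam_t, mu_t, m_t, demands, mix):
        """Average response time at interval t (equations 4.11-4.17).

        demands: per-type CPU demands D_i; mix: per-type percentages Q_pi.
        """
        u_t = lam_t / (m_t * mu_t)                       # utilization (eq. 4.4)
        assert u_t < 1.0, "queue is unstable when U_t >= 1"
        # P0: probability of zero requests in the system (eq. 4.11)
        p0 = 1.0 / (1.0
                    + (m_t * u_t) ** m_t / (factorial(m_t) * (1.0 - u_t))
                    + sum((m_t * u_t) ** n / factorial(n) for n in range(1, m_t)))
        # probability that an arriving request has to queue (eq. 4.13)
        queue_prob = p0 * (m_t * u_t) ** m_t / (factorial(m_t) * (1.0 - u_t))
        # base M/M/m response time (eq. 4.14)
        r_t = (1.0 / mu_t) * (1.0 + queue_prob / (m_t * (1.0 - u_t)))
        # additional time from the request mix (eqs. 4.15-4.16)
        return r_t + sum(d * lam_t * q for d, q in zip(demands, mix))

    # Example with assumed values: 2 servers, a Browsing-like two-type mix.
    rt = response_time(lam_t=50.0, mu_t=40.0, m_t=2,
                       demands=[0.002, 0.004], mix=[0.95, 0.05])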

4.2.4 Modeling Other Elasticity Constraints

The remaining elasticity constraints for modeling elasticity rules are the server provisioning lag time, the cool-down time and the limits on the number of servers. In the following, we explain these concepts and how we model them.

Server Provisioning Lag Time

The M/M/m model assumes an ideal queue with frictionless elasticity, that is, servers are provisioned and de-provisioned instantly. However, a scale-out action involves a delay before a server becomes operational, due to the time required to provision and boot a new server with all the required operating system packages and software applications [45]. For example, it has been shown that AWS cloud servers require on average about 5 minutes to start a cloud server of small size (m1.small) and about 2 minutes to start a high-CPU medium server (c1.medium) [45]. We term this the server provisioning lag time (or server lag time) and denote it by T_sl. In contrast, the effect of a scale-in or de-provisioning action can be considered near instantaneous. The server lag time is important as it dictates when a change in the number of servers takes effect after a scale-out action is triggered. We model the server lag time and its influence on changes in the number of servers (m_t) as in the following equation:

\[
m_t = \begin{cases}
m_t + m_{inc} & \text{if } \sum_{i=1}^{T_{sl\theta}} T \ge T_{sl\theta},\\[4pt]
m_t & \text{if } \sum_{i=1}^{T_{sl\theta}} T < T_{sl\theta}
\end{cases} \tag{4.18}
\]

The above conditions read as follows. The number of servers is increased by m_inc when two conditions are satisfied: a scale-out action has been triggered, and the total duration of elapsed time intervals of length T is greater than or equal to the server lag time threshold (T_slθ). The summation of these time intervals runs over all time intervals of length T_slθ. Here, m_inc is a constant that can be defined by cloud consumers to indicate the number of servers added by each scale-out


action. The threshold T_slθ depends on the server type and can be obtained from real measurements of the server types to be used from an IaaS provider. The threshold value should be provided by the cloud consumer as an input to the above models. The other condition states that the number of servers should not be changed while the total duration of elapsed time intervals of length T is less than T_slθ. For scale-in actions, there is no server lag time to apply, as shutting down a server takes effect immediately.

Therefore, the change in servers from m_t to m_t − 1 is applied as soon as the scale-in condition has been satisfied for a duration of T_w.

Cool-down Time

IaaS providers such as AWS allow cloud consumers to specify a cool-down time to take into account the effects of an elasticity action on the system, such as the provisioning lag time and the time needed for a server to start serving requests, before the elasticity conditions are evaluated again. This time interval gives a triggered scaling action an appropriate period to take effect and impact application performance (e.g., response time) before a new scaling action can be triggered. Therefore, it is important to reflect the cool-down time in the evaluation of elasticity conditions. We denote the cool-down time after a scale-out action as T_c^u and after a scale-in action as T_c^l. The following conditions model the cool-down time that must be respected after scale-out and scale-in actions.

\[
\text{when } \sum_{j=1}^{T_c^u} T \ge T_c^u \ \text{ evaluate } U_t \ge U_\theta^u
\]
\[
\text{when } \sum_{j=1}^{T_c^l} T \ge T_c^l \ \text{ evaluate } U_t < U_\theta^l \tag{4.19}
\]

The first condition is the cool-down condition for scale-out. It states that the elasticity condition based on the estimated CPU utilization (U_t) must be evaluated against the CPU utilization threshold only after the upper cool-down time T_c^u has elapsed; that is, the total duration of elapsed time intervals of length T must be greater than or equal to T_c^u. As previously explained, T represents the length of time during which the system is monitored; it is a fixed constant measured in minutes. The cool-down condition has to be evaluated after a scale-out action is triggered. The same applies to the cool-down condition for scale-in, but with T_c^l (the lower cool-down threshold). The upper and lower cool-down threshold values could differ and should be determined by cloud consumers depending on their requirements.
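A minimal sketch of this cool-down bookkeeping follows, assuming 1-minute intervals: the timer is reset when an action fires and must elapse before the corresponding condition is evaluated again. Names and values are illustrative.

    def may_evaluate(cooldown_remaining: int) -> bool:
        """The elasticity condition is only evaluated once the cool-down
        timer has elapsed (equation 4.19)."""
        return cooldown_remaining <= 0

    # After a scale-out fires, reset the upper cool-down timer, e.g. to
    # 5 one-minute intervals (an assumed value).
    cooldown_up = 5
    for _ in range(6):               # six intervals of length T pass
        if may_evaluate(cooldown_up):
            pass                     # evaluate U_t >= U_theta^u here
        cooldown_up -= 1             # one interval of length T elapses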

Number of Servers Limits

Another important constraint that should be considered in modeling elasticity involves the user-specified maximum and minimum numbers of servers that must not be breached by elasticity actions. This parameter is required by most elasticity mechanisms to allow cloud consumers to control the limits on provisioning servers, and hence to ensure that cost constraints and baselines of required servers are respected. Thus, a scale-out action will not be executed when the maximum number of servers S_max has been reached. Similarly, a scale-in action will not be triggered if the minimum number of servers S_min has been reached. The following conditions capture the maximum and minimum server limits that must be respected each time a scale-out or scale-in action is triggered.

\[
m_t = \begin{cases}
m_t + m_{inc} & \text{if scale-out} \ \wedge\ m_t < S_{max},\\
m_t - m_{dec} & \text{if scale-in} \ \wedge\ m_t > S_{min},\\
m_t & \text{otherwise}
\end{cases} \tag{4.20}
\]

The first condition states that the number of servers at time interval t is increased only when a scale-out action is triggered and the current number of servers is below the maximum allowed number of servers (S_max). Similarly, the number of servers at time interval t is decreased only when a scale-in action triggers and the current number of servers is greater than the minimum allowed server limit (S_min). If the maximum or minimum server limit has been reached, no scaling action is executed.

4.2.5 CPU-based Elasticity Algorithms

In previous sections, we have explained how key parameters and metrics of elasticity rules are modeled. We also explained the constructs that capture the logic of elasticity conditions and when to trigger elasticity actions. These models and constructs form core building blocks that can be put together to simulate the logic of elasticity rules in an IaaS cloud. Here, we present two algorithms that emulate the scale-out and scale-in operations respectively.

Scale-Out Algorithm

Algorithm 1 describes the logic of our scale-out algorithm, which is executed at each time interval t of length T as follows. It first requires initialization of the values of the key thresholds and parameters, namely the length of the time intervals under which the application is monitored T, the upper CPU utilization threshold U_θ^u, the server lag time T_sl, the cool-down time T_c^u, the monitoring time window T_w^u, the maximum allowed server limit S_max and the number of servers to add m_inc. The values of these parameters depend on the application's requirements and the IaaS cloud, and therefore have to be provided by the cloud consumer as input. As explained in Sections 1.3 and 1.4, cloud consumers do not know which values will lead to achieving the desired performance objectives, e.g., resource utilization and SLA satisfaction. This is the purpose of the scale-out algorithm: to approximate how well the performance metrics are met for a set of values provided by cloud consumers. By evaluating different sets of values, cloud consumers can make appropriate decisions about the impact of changing one or more values on performance metrics, and therefore choose appropriate values to satisfy their application performance metrics.

The values of the request arrival rate at the application tier (λt) and the mean service rate of each server (µt) are generated from Poisson and exponential distributions, respectively. As explained in Section 4.2.1, these distributions are commonly used for workload generation for web applications [83; 124; 125]. Alternatively, they can be generated from real workloads and benchmarks. The average server utilization at the application tier (Ut) and the average response time (Rt) at every time interval t are approximated using equations 4.4 and 4.5, and equation 4.17, respectively.

Algorithm 1 CPU-based Elasticity - Scale-Out

Initialize T, U_θ^u, T_sl, T_c^u, T_w^u, S_max, m_inc
Estimate λt, µt, Ut, Rt
if (T_ct^u ≤ 0) && (T_slt ≤ 0) then
    if Ut ≥ U_θ^u then
        overUtilTime = overUtilTime + T
    else
        overUtilTime = 0
    end if
else
    T_slt = T_slt − 1
    T_ct^u = T_ct^u − 1
end if
if (overUtilTime ≥ T_w^u) && (mt < S_max) then
    mt = mt + m_inc
    overUtilTime = 0
    T_ct^u = T_c^u
    T_slt = T_sl
else
    mt = m_{t−1}
end if
return Ut, Rt, mt

The first condition ensures that the current interval is not within the cool-down time or the server provisioning lag time; that is, it decides whether to wait for the server lag time and the cool-down time before evaluating the scale-out condition. If these do not hold, the algorithm checks whether the approximated CPU utilization value is above the upper threshold U_θ^u for at least a time window of length T_w^u. If the approximated utilization drops below U_θ^u within T_w^u, the counter is reset to start a new interval check. When the approximated CPU utilization remains above U_θ^u for a time window T_w^u and the maximum number of servers has not been reached, a scale-out action is triggered to increase the number of servers by m_inc. To ensure that no further scaling actions are taken until both the cool-down and server provisioning lag times elapse, the cool-down time and the server provisioning lag time are then reset to their configured values. The number of servers does not change during time intervals in which the approximated CPU utilization is below U_θ^u.

The outputs of the algorithm are the approximated values of the key parameters and metrics, including the average CPU utilization, the average response time and the number of servers at each time interval t. These values are represented as arrays indexed by time intervals. The resulting values can be used for evaluating and analysing elasticity rules with different thresholds and parameter values.
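As a companion to Algorithm 1, the following Python sketch shows one way the scale-out step could be expressed; it mirrors the pseudocode above but is not our Matlab implementation, and all identifiers are illustrative.

    def scale_out_step(state, u_t, cfg):
        """One evaluation of the scale-out rule at a time interval of length T.

        state: dict with mutable counters m, over_time, cooldown, lag.
        cfg:   dict with T, u_upper, T_w_upper, S_max, m_inc, T_c_upper, T_sl.
        """
        if state["cooldown"] <= 0 and state["lag"] <= 0:
            # Accumulate over-utilization time; reset if the condition breaks.
            state["over_time"] = (state["over_time"] + cfg["T"]
                                  if u_t >= cfg["u_upper"] else 0)
        else:
            # Still waiting for the cool-down / provisioning lag to elapse.
            state["cooldown"] -= 1
            state["lag"] -= 1
        if state["over_time"] >= cfg["T_w_upper"] and state["m"] < cfg["S_max"]:
            state["m"] += cfg["m_inc"]            # trigger the scale-out action
            state["over_time"] = 0
            state["cooldown"] = cfg["T_c_upper"]  # reset both wait timers
            state["lag"] = cfg["T_sl"]
        return state["m"]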

Scale-In Algorithm

Algorithm 2 describes the logic of our scale-in algorithm. Similar to the scale-out logic, the scale-in logic is executed at each time interval t as follows. It starts by initializing the values of the time interval length T, the lower CPU utilization threshold U_θ^l, the lower cool-down time T_c^l, the lower monitoring time window T_w^l, the minimum server limit S_min and the number of servers to remove m_dec. Also, the values of the request arrival rate λt, the mean service rate µt, the average CPU utilization Ut and the average response time Rt are estimated using the models presented earlier.

Algorithm 2 has similar logic, but it does not involve a server provisioning lag time (as shutting down a server takes effect immediately). Another difference is that the lower thresholds (e.g., the CPU utilization and cool-down thresholds) and parameters (e.g., the minimum number of servers) are used instead of the upper ones. The outputs obtained from the algorithm are similar to the outputs of the scale-out algorithm.

Algorithm 2 CPU-based Elasticity - Scale-In

Initialize T, U_θ^l, T_c^l, T_w^l, S_min, m_dec
Estimate λt, µt, Ut, Rt
if T_ct^l > 0 then
    T_ct^l = T_ct^l − 1
else
    if Ut ≤ U_θ^l then
        underUtilTime = underUtilTime + T
    else
        underUtilTime = 0
    end if
end if
if (underUtilTime ≥ T_w^l) && (mt > S_min) then
    mt = mt − m_dec
    underUtilTime = 0
    T_ct^l = T_c^l
else
    mt = m_{t−1}
end if
return Ut, Rt, mt

4.2.6 Cost Models

One of the important outputs of the scale-out and scale-in algorithms is the number of servers at every time interval t. The number of servers over time provides an important input for estimating server usage costs. This is because IaaS providers mainly charge users based on the number of servers in operation and the time for which they have been used (see the cloud business models in Section 2). Most IaaS providers charge on an hourly basis, which is consistent with elasticity, in which servers can be provisioned and de-provisioned at any time. Triggering elasticity actions leads to changes in the overall cost of operation due to the provisioning and de-provisioning of servers. The following equation captures server usage costs over time based on the number of servers over time intervals t:

\[
S_c = \sum_{j=1}^{Max(m)} S_{h_j \to T}\, S_r \tag{4.21}
\]

where Max(m) represents the maximum number of servers that have been provisioned during the period for which cloud consumers need to calculate server usage costs, and S_{h_j→T} is the uptime of server j in hours. We explain below an algorithm for determining the value of S_{h_j→T} for each server j. S_r is the server usage charge per hour, which is fixed by the IaaS provider.

Algorithm 3 describes the logic for approximating the cost of server usage resulting from scale-out and scale-in actions. The cost of provisioning can only be calculated by considering the entire time from the application's first deployment to the present. We denote this time period as τ; it is the sum of all time intervals t. The first two (nested) loops fill in the number of minutes for each server in Smins (the server minutes array), whose dimensions are the maximum number of servers (Max(m)) by the measurement time τ. For each server, represented by a column in the array, a value of 0 or 1 is assigned depending on whether that server is provisioned or not (the change in m over time determines when a server is provisioned or de-provisioned). In the Smins array, a value of 1 indicates that the server is used for 1 time interval (e.g., 1 minute). The second loop block uses the server minutes array to compute the total uptime (rounded up to the next hour) of each server that has been used. The number of hours is then multiplied by the hourly server charge (S_r) to compute the cost of each server, which is added to the total cost of provisioning (S_c).

In cloud service models, there are additional charges for using the elasticity or auto-scaling service. Examples of such charges include a monitoring service for important metrics (e.g., CPU utilization) and a notification service, such as alerts to notify cloud consumers when a scaling action is taken or the application is in a critical performance state. We have not considered such costs as they are not related to the scale-out and scale-in actions that frequently change the number of servers over time.

Algorithm 3 CPU-based Elasticity - Servers Usage Cost

Smins(Max(m), τ) = 0
for t ← 1, τ − 1 do
    if (m_{t+1} > m_t) || (m_{t+1} < m_t) then
        mCount = m_{t+1}
    else
        mCount = m_t
    end if
    for i ← 1, mCount do
        Smins(i, t) = 1
    end for
end for
for i ← 1, Max(m) do
    totalMins = 0
    for t ← 1, τ do
        totalMins = totalMins + Smins(i, t)
    end for
    S_c = S_c + ⌈totalMins/60⌉ × S_r
end for
return S_c, Smins

The above scale-out, scale-in and cost algorithms must execute simultaneously, to ensure that both scale-out and scale-in conditions are evaluated at each time interval and appropriate scaling decisions are made. Similarly, the server usage cost must be evaluated at every time interval, as the number of servers might change at any time due to scale-out or scale-in actions. In our Matlab simulation, we have implemented these algorithms in a single logical block that continuously updates all the desired parameters and metrics simultaneously.
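The hourly-billing logic of Algorithm 3 can be condensed as in the following Python sketch: per-server uptime minutes are rounded up to whole hours and multiplied by the hourly rate, assuming (as Algorithm 3 does) that server j is provisioned exactly when m_t ≥ j. The uptime trace and the rate are assumed example values.

    from math import ceil

    def server_usage_cost(m_over_time: list[int], rate_per_hour: float) -> float:
        """Total server cost given the number of servers m_t at each
        1-minute interval, mirroring Algorithm 3's rounding to whole hours."""
        max_m = max(m_over_time)
        total = 0.0
        for j in range(1, max_m + 1):
            # Minutes during which server j was provisioned (m_t >= j).
            minutes = sum(1 for m_t in m_over_time if m_t >= j)
            total += ceil(minutes / 60) * rate_per_hour
        return total

    # Example: one server for 90 min, a second for 30 of them, at $0.10/hour.
    cost = server_usage_cost([1] * 30 + [2] * 30 + [1] * 30, rate_per_hour=0.10)
    # Server 1: 90 min -> 2 h; server 2: 30 min -> 1 h; total = $0.30.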


4.3 Validation

In this section we describe how we validate our elasticity models. Validation is necessary to evaluate the extent to which the analytical models can capture the behavior of elasticity mechanisms as they work in real-world environments. Therefore, the main purpose of our validation is to evaluate how well our elasticity models and algorithms can (1) emulate the behavior of scale-out and scale-in elasticity rules, and (2) approximate the important metrics, namely average CPU utilization, average response time, number of servers and server costs. This evaluation is primarily based on the metrics that are crucial for cloud consumers who want to host their business applications on an IaaS cloud and utilize the benefits of elasticity. Another important aim of the validation is to discuss factors that may influence modeling the behavior of elasticity rules and approximating those metrics. This is important as it opens opportunities for further research.

We have validated our elasticity models and algorithms empirically and by simulation. We first describe the design and methodology of our empirical and simulation environments (Section 4.3.1). We then compare and discuss the results of the empirical and simulation experiments in terms of CPU utilization, application response time, number of servers and server costs (Section 4.3.2).

4.3.1 Experimental Design and Methodology

In this section we describe the experimental setup and methodology for (1) our simulations in Matlab and (2) our empirical experiments with the TPC-W benchmark on Amazon EC2.

Simulation Experiments

We have simulated our elasticity models and algorithms, presented in Section 4.2, using Matlab. Specifically, we have implemented the scale-out and scale-in algorithms as one program to ensure they operate simultaneously. The key inputs to the simulation program are the application workload (derived from real application traces), the threshold values, and the parameter values of the elasticity rules to be evaluated (see Algorithms 1 and 2). The outputs of these algorithms are the resource and application metrics listed in Algorithms 1 and 2.

Table 4.1: TPC-W Workload Profiles and Web Interaction Groups

Web Interaction Group / Workload Profile   Browsing   Shopping   Ordering
Browse (Read Operations)                   95%        80%        50%
Order (Write Operations)                   5%         20%        50%

We have also implemented the algorithm to calculate server usage cost as another program that takes as input the changes in the number of servers over time (i.e., mt) resulting from the scale-out and scale-in algorithms. The main purpose of the simulation is to evaluate how well our elasticity models can approximate elasticity parameters and the related performance and cost metrics. Therefore, the output of the simulation has to be compared with the corresponding metrics obtained empirically, as explained next.
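To illustrate how these programs fit together, the following hypothetical top-level Matlab driver sketches the simulation flow; the function names (cpu85RuleSet, simulateElasticity, serverUsageCost) and the trace file name are illustrative assumptions, not our actual code.

load('workloadTrace.mat', 'lambda');       % per-minute request rates from the empirical traces
rule = cpu85RuleSet();                     % thresholds and parameters of one rule set
[m, cpuUtil, respTime] = simulateElasticity(lambda, rule);  % Algorithms 1 and 2
[Sc, Smins] = serverUsageCost(m, 0.08);    % Algorithm 3, at $0.08 per server-hour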

Empirical Experiments

In our empirical experiments, we have chosen TPC-W, an industry benchmark for transactional web applications [131], as a representative of online retail applications. The TPC-W benchmark has been widely used for evaluation in cloud-related studies [18; 45; 99]. TPC-W offers a comprehensive specification of the functionality of an online bookstore application and the behavior of its users. According to the TPC-W specifications [131], the basic scenario of user interaction in the TPC-W benchmark is as follows. Each emulated user opens a session that consists of a sequence of interactions (or requests) such as "Best Sellers", "Item Details", "Search", "Add to Cart" and "Buy Confirm". Each emulated user waits for a certain interval (think time) before issuing the next interaction to the web server. The transition from one interaction to another is governed by a state transition matrix that specifies the transition probabilities between interactions.


Figure 4.4: Deployment Architecture of TPC-W Book Store on Amazon EC2 (web tier: Elastic Load Balancer; application tier: auto-scaling group of m1.small Linux servers running the TPC-W application logic, JBoss and J2SDK; database tier: m1.xlarge Linux server running the TPC-W database on MySQL; workload generation: m1.xlarge Linux server running the TPC-W user emulation software)

Based on 14 different web interactions, the TPC-W specifications differentiate between three workload profiles: Browsing, Shopping and Ordering. These profiles vary in the percentage of interactions in the Browse (read operations) and Order (write operations) groups, as shown in Table 4.1. In all experiments we used the Browsing profile as it stresses the application tier (i.e., 95% read operations and 5% write operations), which is the focus of our elasticity performance modeling.

We have used the open-source Java implementation of the TPC-W benchmark developed by Horvath [132]. We have deployed it on Amazon EC2 cloud in a 3-tier architecture, typically used for Internet applications [18; 40; 45; 83; 99; 107; 133]. Figure 4.4 shows the deployment architecture of TPC-W on Amazon EC2 cloud. On the web tier, we configured Amazon's Elastic Load Balancer to distribute user requests among the pool of instances at the application tier. We have created an Amazon Machine Image (AMI) on which we installed JBoss 2.3.2, J2SDK 1.4.19 and AWS SDK for Java 1.3.27. We also deployed the TPC-W bookstore application logic on this AMI. We use the VM image as root storage to create Amazon Linux virtual servers for the application tier. The application tier consists of a pool of Amazon Linux virtual servers of small capacity (i.e., m1.small instances). Furthermore, we configured the application tier as an auto-scaling group to scale out and scale in based on the elasticity rules to be defined by cloud consumers. All virtual servers in the auto-scaling group are instantiated from the AMI we have created for the application server.

We have deployed the TPC-W bookstore database on a separate Amazon Linux virtual server of extra-large capacity (i.e., an m1.xlarge instance) running MySQL 5.1.92. We populated the database with 10,000 books generated randomly according to the TPC-W specifications [131]. The TPC-W user emulation program was deployed on another Amazon Linux virtual server of extra-large capacity (i.e., an m1.xlarge instance). We have chosen servers of very high capacity for both the database and the user emulation program to ensure they do not become a performance bottleneck. All the cloud servers we used were located in the same Amazon geographic region, US East (Virginia), namely us-east-1b. This reduces network overhead between cloud servers at different tiers and hence its impact on application response time.

Application Workload

We have generated the workload using the TPC-W user emulation software developed by Horvath [132] based on the TPC-W specification [131]. The workload has been generated using the TPC-W Browsing profile as it stresses the application tier, the focus of our elasticity modeling. The TPC-W user emulation takes as input a number of parameters such as the number of concurrent user sessions and their inter-arrival times. In this workload, the number of concurrent users and the inter-arrival times have been drawn from power-law (Zipf) and Poisson probability distribution functions, respectively. These functions are commonly used for generating workloads for web applications [33]. The other key parameters that have been used to generate the workload are the ramp-up time (180 seconds) and the ramp-down time (120 seconds). The ramp-up time is the number of seconds that all user sessions have to wait before starting to interact with the TPC-W application. This is useful for synchronizing multiple user sessions. The ramp-down time is the number of seconds that must elapse before all user sessions are terminated at the end of the measurement interval, i.e., the time interval during which all user sessions are executed.

(Server capacities: m1.small — 1 EC2 Compute Unit (ECU) (1 virtual core), 64-bit architecture, 1.7 GB memory, moderate I/O performance, 160 GB disk storage. m1.xlarge — 8 ECUs (4 virtual cores), 64-bit architecture, 15 GB memory, high I/O performance, 2x840 GB disk storage.)

Figure 4.5: TPC-W Workload Used in All Experiments (request arrival rate, in requests/minute, over the 570-minute measurement period)
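As an illustration of this generation scheme, the sketch below draws per-minute request counts from a Poisson distribution whose mean is modulated by Zipf (power-law) weights; all parameter values are assumptions for illustration only, not the exact settings used to produce Figure 4.5 (poissrnd requires the Matlab Statistics Toolbox).

tau = 570;                            % measurement period in minutes
lambdaBase = 800;                     % baseline mean arrival rate (req/min), illustrative
w = 1 ./ (1:tau);                     % Zipf-like weights with exponent 1
w = w / max(w);                       % normalize to [0, 1]
w = w(randperm(tau));                 % spread peaks and troughs over time
rate = zeros(1, tau);
for t = 1:tau
    rate(t) = poissrnd(lambdaBase * (1 + 2 * w(t)));  % Poisson request count for minute t
end
plot(1:tau, rate); xlabel('Time (minutes)'); ylabel('Requests/min');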

The resulting workload is shown in Figure 4.5. It represents a varying workload in which the number of requests increases to a peak and then decreases several times over the measurement period. The peaks and troughs have different magnitudes. This is an example of the varying workloads that web applications typically experience. It motivates the need for the underlying infrastructure to be scalable in order to meet application performance and cost requirements. We have generated the workload over a long time period, about 10 hours, to allow enough time for the scale-out and scale-in conditions and actions to take effect.

Table 4.2: Elasticity Thresholds Used in All Experiments

Rule/Threshold   Uθ^u   Tw^u    Uθ^l   Tw^l
CPU75            75%    5 min   30%    10 min
CPU80            80%    5 min   30%    10 min
CPU85            85%    5 min   30%    10 min
CPU90            90%    5 min   30%    10 min

This workload has been generated, using the parameters explained previously, for each empirical experiment to evaluate the elasticity rules in practice. From the empirical experiments, we collected the traces of the generated workload and provided them as input to the simulation experiments. This ensures that the same workload has been used in the empirical and simulation experiments, so that there was no influence from changes in workloads. In this workload, T is 1 minute and τ is 573 minutes; the workload was therefore divided into 1-minute time intervals. We divide the workload into 1-minute intervals to make it consistent with the length of the monitoring time intervals. In addition, a 1-minute time interval is granular enough to perform monitoring and analysis of the collected metrics. This is also consistent with most existing cloud monitoring and auto-scaling techniques, including [37; 44; 77; 78; 119]. As explained earlier in this chapter, our modeling of elasticity mechanisms and related performance metrics is also divided into time intervals of the same length. In our simulation experiments we used time intervals of 1-minute length, consistent with the time intervals of the empirical experiments.

Elasticity Rules and Thresholds

In all experiments, simulation and empirical, we have used four elasticity rule sets for scale-out and scale-in, as shown in Table 4.2. The naming convention for an elasticity rule set is based on the CPU utilization upper threshold (e.g., CPU75 for a scale-out action with a CPU utilization upper threshold of 75%).


Table 4.3: Request Mix: Percentages of Request Types and CPU Demands of the Browsing Profile

Request Type     Request Percentage (%)   Avg. CPU Demand (s)
Home             29%                      0.062
New Products     11%                      0.580
Best Sellers     11%                      0.482
Product Detail   21%                      0.035
Search           12%                      0.059
Search Results   11%                      0.289
Shopping Cart    2%                       0.073
Buy Confirm      0.69%                    0.069

As shown in Table 4.2, we focus on the scale-out conditions by changing the upper CPU utilization threshold. The scale-out condition and its upper threshold have a direct impact on server usage cost and application performance, as these determine when servers are added. In practice, cloud consumers try to carefully choose CPU utilization values that lead to efficient utilization of their resources while maintaining the desired application performance [45]. Therefore, we have chosen CPU utilization values in accordance with recommended performance design and testing principles [134] and empirical studies such as [45].

The other parameters of all elasticity rule sets are set as follows: upper cool-down (for scale-out) Tc^u = 5 minutes, lower cool-down (for scale-in) Tc^l = 5 minutes, server lag time Tsl = 4 minutes, minimum number of allowed servers Smin = 1, and maximum number of servers that can be triggered Smax = 20.
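For illustration, one rule set (here CPU85) can be parameterized in Matlab as a simple struct; the field names are assumptions chosen for readability, not our exact implementation.

rule.UthUpper = 0.85;   % scale-out CPU utilization threshold (Uθ^u)
rule.TwUpper  = 5;      % minutes above threshold before scale-out (Tw^u)
rule.UthLower = 0.30;   % scale-in CPU utilization threshold (Uθ^l)
rule.TwLower  = 10;     % minutes below threshold before scale-in (Tw^l)
rule.TcUpper  = 5;      % cool-down after scale-out, minutes (Tc^u)
rule.TcLower  = 5;      % cool-down after scale-in, minutes (Tc^l)
rule.Tsl      = 4;      % server lag time: minutes for a new server to become active
rule.Smin     = 1;      % minimum number of servers
rule.Smax     = 20;     % maximum number of servers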

Request Mix and Demand

As previously discussed, the response time models require determining the percentage of each request type and its demand on the CPU (see Section 4.2.3). We have used the percentages of the request mix of the Browsing profile specified by the TPC-W benchmark [131]. We chose the percentages from the Browsing profile as it stresses the application tier, the focus of our elasticity modeling. This request mix is also consistent with that used in the implementation of the TPC-W workload generation [132]. Therefore, in our simulations, we divided the request mix at each time interval t into 8 request types, each of which constitutes a percentage of the overall number of requests λt, as shown in Table 4.3.

We derived the demand that each request type puts on the CPU through measurements in the real cloud environment. As described previously, we used the TPC-W experimental setup on Amazon EC2. Specifically, we generated individual requests of the same type, one at a time, and sent them to the TPC-W application to be served. We repeated this 19 times for each request type and collected the service times taken by the CPU to serve each request individually. For each request type, we calculated the average service time over the 19 samples. The resulting times are the service demands of each request type, as shown in Table 4.3. We have used these request percentages and CPU demands in our simulation implementation.
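This measurement procedure amounts to a simple per-type averaging, as the Matlab sketch below shows; serviceTimes is an assumed 8-by-19 matrix of measured service times (in seconds), one row per request type and one column per repetition.

pct = [0.29 0.11 0.11 0.21 0.12 0.11 0.02 0.0069];  % request-mix percentages (Table 4.3)
avgDemand = mean(serviceTimes, 2);                  % per-type demand: average of the 19 samples
weightedDemand = pct * avgDemand;                   % mean CPU demand per request under this mix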

Metrics and Data Representation

In all experiments, the following metrics are continuously monitored and their measurements collected at every time interval t:

• CPU Utilization: the average CPU utilization of all servers in the application tier at every minute interval t.

• Response Time: the average response time of all requests at the application tier every minute interval t.

• Number of Servers: the number of servers at the application tier every minute interval t.

• Servers Cost: the usage cost of the servers at the application tier based on Amazon hourly charges (USD $0.08 for N. Virginia small server instances).

These are the key metrics for evaluating elasticity rules and the impact of changing the values of elasticity thresholds. From a cloud consumer perspective, it is crucial to ensure that their cloud servers are well-utilized and costs are reduced while the application's response time stays within desired levels. In the empirical experiments, we have configured Amazon CloudWatch [78] to collect the average CPU utilization and to count the number of servers in the application tier at every 1-minute time interval. From the number of servers, we calculate the server costs over the duration of the experiments. The application's response times are collected using our implementation of the SLA monitoring component described in Section 5.2.1. This represents the average response time of all requests served by all servers at the application tier at every 1-minute time interval. In the simulations, the measurements of the above metrics are approximated using our Matlab implementation of the CPU-based elasticity models and algorithms.

We have represented the CPU utilization and response time measurements using box plots, as these provide useful statistics which help in analysing and comparing data points at a glance [135]. These statistics are the mean, the median (50th percentile), and the 1st, 25th, 75th and 99th percentiles, as will be illustrated in Section 4.3.2.
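These statistics can be computed directly from each experiment's per-minute measurements; a one-line Matlab sketch, assuming x is a vector of per-minute values (prctile requires the Statistics Toolbox):

stats = [mean(x), prctile(x, [1 25 50 75 99])];  % mean, then 1st/25th/50th/75th/99th percentiles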

4.3.2 Results and Data Analysis

In this section we analyse and discuss the results of the empirical and simulation experiments in terms of CPU utilization, application’s response time, number of servers and server usage costs at the application tier.

CPU Utilization

Figure 4.6(a) shows the box plots of the average CPU utilization of all elasticity rules resulting from the simulation and the empirical experiments. We have used the CPU upper threshold as a naming convention, followed by either 'M', referring to results from the simulation experiments, or 'E', referring to results from the empirical experiments.


Figure 4.6: Experimental Results of all Elasticity Rules - Models and Empirical. (a) CPU Utilization - Model and Empirical: box plots of average CPU utilization (%) for CPU75, CPU80, CPU85 and CPU90, model (M) vs. empirical (E); (b) Response Time - Model and Empirical: box plots of average response time (sec.) for the same rules; (c) legend.


As can be seen from Figure 4.6(a), the simulated CPU utilization statistics approximated the empirical CPU utilization statistics with quite small variations. For example, the variation between the simulated and empirical mean CPU utilization across all evaluated elasticity rules ranges between 1.5% and 6.8%. Similarly, the simulated CPU utilization exhibits slight variations from the empirical one in terms of the other statistics, namely the 75th percentile, the median and the 25th percentile. This shows that our models were able to approximate the CPU utilization metrics quite well, though they could still be improved by capturing other factors that may influence CPU utilization changes.

One observation relating to the variation between simulation and empirical data is its relationship with the CPU utilization threshold. It can be noticed from Figure 4.6(a) that the variation between simulated and empirical CPU utilization statistics is sensitive to the upper CPU utilization threshold: the higher this threshold, the higher the variation between simulated and empirical data. We attribute this to our models not capturing other performance factors whose effects become more apparent when the system enters heavily overloaded periods.

Another important observation from the simulated and empirical experiments is a consistent pattern in the CPU utilization data (see Figure 4.6(a)). Specifically, the majority of simulated CPU utilization statistics are lower than the corresponding empirical statistics. This pattern applies to most of the evaluated elasticity rules. We believe this can be explained by our models not capturing other factors that influence performance. Examples of such factors include virtualization mechanisms and the sharing of computing resources in cloud environments.

The simulation and empirical CPU utilization data also reveal another important observation. From Figure 4.6(a), it can be noticed that there is a relationship between the upper CPU utilization threshold (Uθ^u) and the mean and median CPU utilization. Particularly, the mean and median CPU utilization resulting from the simulation and empirical experiments increase as the upper CPU utilization threshold increases. Our models are not only able to approximate CPU utilization statistics, but also capture such relationships. As the purpose of any modeling effort is to approximate real-world behavior, we believe our models and simulation can help cloud consumers understand the effect of changing elasticity thresholds on important metrics such as CPU utilization.

One important observation for improving our models' accuracy in approximating CPU utilization statistics concerns the spike effect. In particular, examining the raw data of both the empirical and simulation experiments shows a spike in the CPU utilization within the first 80 minutes, as shown in Figure 4.7(a). This pattern occurs in all elasticity rules experiments, but the magnitude of the spike increases as the upper CPU threshold increases. As illustrated in Figure 4.7(a), our CPU-based elasticity models have not been able to approximate such spikes. These spikes occurred because of the continuous increase in request rate while only one server was serving all incoming requests, before the first scale-out action was triggered [45]. Modeling workload spikes and their effect on CPU utilization requires considering other system factors. In a production environment, however, this effect is naturally captured, as all factors that could influence performance are present. We can see that such spikes gradually disappear as the number of servers increases due to the scale-out actions triggered over time.

As shown in Figure 4.7(a), the CPU utilization data points of the simulation and empirical experiments converged over time, as the number of servers increased. We observed the same convergence pattern in all other elasticity rules experiments. We believe this is because the increasing request rate was distributed among the servers resulting from scale-out actions instead of being handled by a single server, as was the case before any scale-out actions were triggered. As previously explained in Section 4.2, our models are primarily based on the request rate and the number of servers, in addition to other variables, for approximating CPU utilization. The effect of the increasing request rate is therefore divided among more servers over time due to the triggering of scale-out actions. This has led to the similarity between the empirical and modeled CPU utilization data shown in Figure 4.7(a).

Response Time

Figure 4.6(b) shows the box plots of the average response time statistics of all elasticity rules resulting from the empirical and simulation experiments. As shown in this figure, our models approximated response times with a mean and median relatively similar to those resulting from the empirical experiments. The mean response times of the simulation experiments were higher than those obtained from the empirical results. For all elasticity rules, the mean response times of the simulations were between 3.6% and 11.9% higher than the corresponding empirical ones. These results are relatively close, and the models can therefore be relied upon to approximate response times under different elasticity rules. In terms of the other response time statistics, we can see some variation between the simulation and the empirical results. Such variations were caused by the CPU utilization spikes discussed previously. Figure 4.7(b) shows the mean response time data points obtained from the empirical and simulation experiments of elasticity rule CPU90. It is clear that the CPU utilization spike influenced response time by causing a spike in the latter. This effect is also found in the experiments involving all other elasticity rules, but with a magnitude that increases as the upper CPU utilization threshold increases. The spike effect is proportional to the upper threshold due to the relationship between the upper threshold and when a scale-out action is triggered. Particularly, the higher the upper threshold, the longer a scale-out action takes to add a server. This lengthens the period during which the CPU is over-utilized and hence increases response times.

The spike effect has not been precisely approximated by our models, as they do not consider the situation under which such spikes occur, i.e., high increases in concurrent request volumes over time while a single server at the application tier is serving these requests.


Figure 4.7: CPU Utilization and Response Time Spikes of CPU90 Experiments. (a) CPU Utilization of CPU90 - Model and Empirical: average CPU utilization (%) over time (minutes); (b) Response Time of CPU90 - Model and Empirical: average response time (sec.) over time (minutes).

As shown in Figure 4.7(b), the spike effect in the response time did not persist once the number of servers increased as a result of triggering scale-out actions. Accordingly, the simulated response time data points converged and became reasonably close to the empirical response time data points (see Figure 4.7(b)) when the number of servers increased.

Another important observation that can be derived from the response time box plots is that the simulation results showed a relationship with the upper CPU utilization threshold (Uθ^u). Specifically, as Uθ^u increased, the average response time either increased or remained constant. This trend is consistent with the trend in the empirical response time data. As previously explained, this relationship arises because the load on the CPU increases with the upper CPU utilization threshold. The CPU therefore needs longer to process incoming requests until a scale-out action is triggered and the load is distributed among more servers. The ability of our models to capture the impact of changing the threshold on application response time demonstrates their capability, and hence cloud consumers can rely upon them. This provides cloud consumers with a reliable way of predicting such trends when evaluating the performance of different elasticity thresholds.

Number of Servers and Servers Usage Cost

Figure 4.8 shows the number of servers resulting from our model simulations and from the empirical experiments. As can be seen, the simulations of all elasticity rules resulted in a higher number of servers than the corresponding empirical experiments. Furthermore, the times at which servers were triggered vary between the simulations and the empirical experiments. From Figure 4.8, we can also observe the relationship between the CPU utilization upper threshold Uθ^u and the number of servers. Particularly, as Uθ^u increases, the number of servers decreases or remains constant. This is mainly because the elasticity conditions take longer to hold as Uθ^u increases: the higher Uθ^u, the longer it takes the CPU utilization to exceed the threshold, which delays when a scale-out action is triggered and when the number of servers is changed. Our models have demonstrated the ability to capture the relationship between Uθ^u and the number of servers.

Figure 4.8: No. of Servers Triggered by all Elasticity Rules - Models and Empirical. (a) No. of servers over time - empirical; (b) no. of servers over time - model. Both panels plot the number of servers at the application tier over the 570-minute period for CPU75, CPU80, CPU85 and CPU90.

Figure 4.8 also shows that the times at which our simulation triggered scale-out actions (i.e., added servers) do not match the times at which scale-out actions happened in the empirical experiments, especially for the first scale-out actions of all elasticity rules experiments. This can mainly be attributed to the limited ability of our models to approximate the CPU utilization spikes, which did not push the simulated CPU utilization beyond the upper threshold as happened in the empirical experiments. This, in turn, delayed when a scale-out condition was satisfied and therefore when a server was added. This pattern occurred in all elasticity rules experiments.

It can be noted from Figure 4.8 that a number of servers were added but not removed in both the empirical and simulation experiments. This is because of the scale-in rules, which require the average CPU utilization to remain below 30% for 10 minutes before removing one server. Furthermore, a cool-down period of 5 minutes has to elapse before the scale-in conditions are evaluated again. As our experiments have not continued for more than 375 minutes, there was not enough time to observe the removal of the servers that were added by scale-out actions. Servers added toward the end of the experiment need time to be removed one by one as the scale-in conditions are satisfied.

Figure 4.9 shows the server usage costs resulting from both the empirical and simulation experiments. In both, we calculated the server usage cost using Algorithm 3. The algorithm takes as input the number-of-servers data over time, i.e., over the duration of the experiments. It computes the total number of hours that each server was used during the experiment and then multiplies these by the hourly charge ($0.08 for small server instances). We have followed the AWS charging model, in which servers that are successfully started and used for at least one minute are charged for a complete hour even if they have not been used for the full hour. For example, a server used for 61 minutes is charged for two hours: ⌈61/60⌉ × $0.08 = $0.16.


Figure 4.9: Servers Cost Resulting from all Elasticity Rules - Models and Empirical

Although the simulation has not approximated the exact number of servers or the times at which new servers were added, this does not have a significant impact on estimating the server costs. As can be seen from Figure 4.9, the approximated server usage costs resulting from the simulations of all elasticity rules are quite similar to the costs resulting from the corresponding empirical experiments. This is mainly because the costs of provisioning are calculated based on the total server-hours during which servers were utilized. At the beginning, the models did not trigger scale-out actions at times similar or close to those at which scale-out actions were triggered in the empirical experiments. This was due to the spike effect, which delayed when a scale-out condition was satisfied and hence when a new server was added. However, the models later triggered scale-out actions at times closer to those in the empirical experiments. The total server-hours accumulated in the simulations and the empirical experiments thus converged over time. Consequently, the simulated server costs are close to the empirical ones.


Figure 4.9 also reveals another important observation relating Uθ^u to provisioning cost. Particularly, the server costs of the simulation experiments follow the trend of the provisioning costs in the empirical experiments; i.e., as Uθ^u increases, the provisioning costs decrease.

4.3.3 Summary

In the previous sections, we have presented a validation of the proposed elasticity models by simulation and empirical means. The simulation is an implementation of our models which emulates how elasticity rules work and approximates important metrics. The empirical experiments were carried out with an online bookstore application deployed on Amazon EC2 in a three-tier web application architecture. In all experiments, simulation and empirical, we have evaluated the performance of different elasticity rules, with different thresholds, in terms of the average CPU utilization, the average response time, the number of servers triggered by elasticity rules and the server costs. All these metrics were collected at the application tier, where the elasticity rules are configured. The resulting data from both simulation and empirical experiments have been statistically analysed and compared. From this comparison, a number of important concluding remarks can be made:

• Our models and algorithms have been able to emulate the behavior of CPU-based elasticity rules. The models, which are based on queuing principles and laws, provided approximations of the crucial metrics and parameters that represent the core building blocks of elasticity rules and their mechanisms. The algorithms provided the logic to continuously evaluate elasticity conditions and trigger appropriate elasticity actions for scale-out and scale-in operations. The simulation results have demonstrated the ability of our models to approximate the related application and resource metrics that are crucial for evaluating the performance of elasticity rules.

• The simulation results have shown good approximations of the important application and resource metrics when compared to the corresponding empirical results. The results have demonstrated small differences between the statistics resulting from the simulations and those obtained from the empirical experiments. The difference between simulation and empirical data ranges between 1.52% and 6.69% for the average CPU utilization, 11 and 34 milliseconds for the average response time, 1 and 2 servers for the number of servers triggered, and $0.08 and $0.24 for the server costs. The main aim of performance modeling is to approximate system behavior and related performance-cost metrics, not the exact behavior or values. With this in mind, one can see that our models have approximated CPU-based elasticity behavior and the corresponding metrics with only slight deviation from the real behavior and metrics.

• One of the key observations in the empirical results is the impact of changing CPU thresholds on application and resource metrics. The simulation results have demonstrated the ability of our models to predict such relationships. Specifically, the simulation showed higher average CPU utilization and average response time as the CPU threshold increases. It also showed a decrease in the number of servers and the related costs as the CPU utilization threshold increased. These trends match the corresponding ones resulting from the empirical experiments.

• One key performance aspect that our models were unable to approximate is the spike effect. This spike effect resulted when the application workload continuously increased while only one server was serving all incoming requests, i.e., before any scale-out action was triggered. The empirical results show that the spike in the average CPU utilization influenced when a scale-out was triggered and therefore when the number of servers changed. It also impacted response times by causing a spike in the average response time. As our models were unable to approximate the initial spike effect in the average CPU utilization, this influenced all other metrics. It influenced when a scale-out decision was triggered, as the simulated CPU utilization did not cross the upper threshold. This in turn influenced the number of servers and the server costs. It also influenced the response time, as the spike in the average CPU utilization was not captured and the change in the number of servers was delayed; these two parameters are a core part of the response time formula. Therefore, improving the models to capture the spike effect in CPU utilization would improve their accuracy in approximating the corresponding metrics and the behavior of CPU-based elasticity.

4.4 Use Case Scenario

In this section we demonstrate how our models can be used as a tool to support cloud consumers in performing cost-performance analysis of different CPU-based elasticity rules. It should be noted that the scenario presented in this section is an illustrative one. It has been developed based on real-world experience obtained from work in an enterprise environment.

TrustedCRM is a company that specializes in providing Customer Relationship Management (CRM) services for business enterprises. With the advancement of cloud service offerings, TrustedCRM is planning to offer its CRM services as Software as a Service (SaaS), on-demand and based on a subscription model. While offering CRM as SaaS will create new business opportunities for TrustedCRM, it will also impose some challenges. As the CRM services will be provided on a subscription basis, the access pattern on these services will vary over time in terms of request volumes. Furthermore, TrustedCRM provides its customers with an SLA guarantee on application response time, to ensure response times stay within certain limits. If TrustedCRM fails to meet the response time SLA, it has to pay a penalty to the customer as compensation for under-performing services. This will require a dynamic computing infrastructure to meet variable workloads and to ensure that SLAs are met. To achieve this flexibility, TrustedCRM decided to deploy its CRM application on Amazon to benefit from the on-demand elastic computing infrastructure and the hourly charging model. It chose a 3-tier architecture as it is a common architecture for business applications.


Figure 4.10: Simulated Workload of TrustedCRM Use Case (request rate in requests/minute, 0-3,600, over time in minutes, 0-780)

To enable elasticity for Amazon's cloud servers, however, TrustedCRM needs to define and configure auto-scaling rules. By analysing the historical workloads of its CRM application, TrustedCRM finds that most of the operations are CPU-intensive, which creates load on the application tier. Therefore, the elasticity rules will mainly be defined based on the CPU utilization of the cloud servers at the application tier. TrustedCRM also needs to ensure that the servers are well-utilized and costs are kept to a minimum while the changing number of requests is served and the response time SLA is met. Defining such rules requires determining appropriate thresholds and parameters that satisfy the performance and cost objectives. TrustedCRM needs to test different elasticity rules to decide on the thresholds and parameters that achieve the most appropriate response time SLA, number of servers to be triggered and server usage cost. In the following we demonstrate how our models can help TrustedCRM evaluate the impact of elasticity rules with different thresholds on the desired performance and cost metrics.


Table 4.4: Parameters and Thresholds Used in the Use Case Experiments

Parameter/Elasticity Rule   CPU75    CPU80    CPU85    CPU90
Uθ^u                        75%      80%      85%      90%
Tw^u                        5 min    5 min    5 min    5 min
Uθ^l                        30%      30%      30%      30%
Tw^l                        10 min   10 min   10 min   10 min

Application workloads are often generated from historical workload traces [136; 137]. We assume that TrustedCRM has collected traces from its historical workloads, represented as shown in Figure 4.10. We have simulated this workload in Matlab using Poisson and Zipf power-law probability distribution functions. As can be seen, the number of requests peaks and drops a number of times with different magnitudes. These peaks and drops include some sharp increases and decreases over about 13 hours (i.e., τ = 800 minutes), which emulates workload variability.

TrustedCRM also needs to decide on the parameters and thresholds of the elasticity rules and evaluate their impact on important performance and cost metrics. For this purpose, the parameter and threshold values shown in Table 4.4 are used. The other elasticity parameters are set as follows: Tc^u = 6 minutes, Tc^l = 10 minutes, Tsl = 4 minutes, Smin = 1, Smax = 40.

Using our CPU-based elasticity simulation, we ran four separate experiments, each using the varying workload described previously (Figure 4.10) and the set of thresholds of one elasticity rule (CPU75, CPU80, etc.) as input. Figures 4.11 and 4.12 summarize the key results of the experiments with the four elasticity rules: CPU75, CPU80, CPU85 and CPU90. From the simulation results, TrustedCRM can make a number of important observations, including the following:

• As the upper CPU utilization threshold increases, the response time statistics become slightly higher. This informs TrustedCRM of how increasing the CPU threshold would influence the response time. Using the approximated response time statistics, TrustedCRM can decide which threshold ensures the satisfaction of the response time SLO.

Figure 4.11: Use Case Scenario Simulation Results. (a) Number of servers over time at the application tier (CPU75, CPU80, CPU85 and CPU90); (b) servers usage cost at the application tier.

• There were increases followed by decreases in the number of servers, which occurred twice, with a higher magnitude the second time (see Figure 4.11(a)). All elasticity rules resulted in this pattern, with magnitudes that depend on the upper CPU utilization threshold. Specifically, the lower the CPU utilization threshold, the higher the increase in the number of servers. Furthermore, the lower the CPU utilization threshold, the earlier a scale-out triggers and therefore the higher the number of servers over time. This can guide TrustedCRM on how the number of servers changes over time in response to workload changes, and hence how server costs are influenced.

• There were a number of increases and decreases in the number of servers over the duration of the experiment, due to the triggering of scale-out and scale-in actions. We have seen more frequent changes in the number of servers, in comparison to those in Figure 4.8(a), due to two factors. First, the sharp changes in the workload (Figure 4.10) are frequent and of high magnitude. Second, the duration of the workload changes, and hence the experiment time, was long enough to allow the addition and removal of servers in response to these workload changes (see Figure 4.10).

• The servers usage cost also decreased as the upper CPU threshold increased, as shown in Figure 4.11(b). Noticeably, CPU85 and CPU90 had significant decreases in cost compared to CPU80 and CPU75.

• The decrease in servers usage cost (shown in Figure 4.11(b)) has a direct impact on the response time statistics: the lower the server costs, the higher the response time statistics. This demonstrates to TrustedCRM that reducing costs will lead to increased response times, and hence to SLA violations that may arise.

Figure 4.12: Use Case Simulation Results - Average Response Time Statistics at the Application Tier (box plots of average response time, in seconds, for CPU75, CPU80, CPU85 and CPU90)

• There is increasing variability in the response time statistics; CPU90, specifically, exhibited the highest variability, as shown in Figure 4.12.

We believe this pattern relates to the upper CPU utilization threshold, to which application response time seems to be very sensitive. A higher CPU threshold delays the triggering of a scale-out, i.e., adding a server, to a stage at which the system reaches a critically overloaded state. This pushes the application response time to very critical levels until a new server is added as a result of the scale-out action. The addition of a new server rebalances the load across all servers and therefore quickly drops the application response time to lower levels. As a result, we see such large variations, especially with high CPU thresholds. Our empirical experiments have shown a similar pattern of increasing response time variability relative to the CPU threshold. Therefore, we believe our models are capable of approximating the impact of elasticity thresholds and parameters on performance metrics.

TrustedCRM can perform what-if analyses and make appropriate decisions based on the above observations. If cost is the most important criterion for TrustedCRM, then the elasticity rule CPU90 would be the best choice. If maintaining response times is the predominant criterion (due to the high penalties for violating SLAs), then the CPU75 rule would be more suitable. Another scenario TrustedCRM might consider is when both cost and response time are important criteria that need to be met. Assume that TrustedCRM promised its customers an average response time of less than 230 milliseconds, and that TrustedCRM's internal budget constraint is set at up to $7.50 a day. In that case, the rules CPU75, CPU80 and CPU85 all meet the response time objective, but only the CPU85 rule meets the cost constraint.
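This selection can be expressed as a simple filter over the simulated metrics, as sketched below; the numeric values are placeholders for illustration, not the measured results of the use case.

rules    = {'CPU75', 'CPU80', 'CPU85', 'CPU90'};
meanRT   = [0.190 0.205 0.225 0.260];   % simulated avg response time (s), illustrative
dailyUSD = [9.20 8.40 7.30 6.10];       % simulated daily server cost (USD), illustrative
ok = (meanRT < 0.230) & (dailyUSD <= 7.50);
disp(rules(ok))                         % rules meeting both objectives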

In a similar manner, the simulation results can be used to answer such cost-performance questions and help TrustedCRM make appropriate decisions about choosing elasticity rule thresholds. Furthermore, the simulation can be used to evaluate the impact of one or more thresholds and/or parameters on such performance and cost metrics. Different application workloads can also be evaluated with different elasticity thresholds and parameters.

4.5 Discussion

We have motivated the challenges facing cloud consumers in dealing with existing elasticity mechanisms. Particularly, when defining elasticity rules for their cloud-based applications, cloud consumers are challenged with the issue of evaluating the impact of elasticity thresholds on important resource and application metrics. The elasticity models and algorithms introduced in this chapter provide cloud consumers with a practical tool to help them meet this challenge. Furthermore, our simulations have demonstrated fairly reliable results in emulating the behavior of CPU-based elasticity. Specifically, the model simulations have captured the trends and relationships between elasticity thresholds and CPU utilization, response time, number of servers and servers usage cost. In terms of approximating these metrics, our simulations produced statistics that deviate only slightly from the empirical statistics, with deviations ranging between 2% and 11%. The differences between the simulation and empirical results are therefore within acceptable ranges, so that cloud consumers can rely on the simulation results to predict and analyse the impact of elasticity thresholds on the desired performance and cost metrics. The main purpose of modeling and simulation is to approximate the key performance and cost metrics of real systems rather than their exact values. With this in mind, one can conclude that our models have achieved this purpose and produced fairly reliable results. Having said this, the simulations and empirical experiments have produced a number of observations worth discussing.

The models showed a weakness in approximating the resource and application metrics under certain circumstances, i.e., when the number of concurrent requests increased significantly while only one server was serving the requests. First, this affected the approximation of the CPU utilization values at time intervals when only one server was serving all requests. Consequently, this influenced the accuracy of approximating the other metrics (response time, number of servers and servers usage cost), which depend on the CPU utilization values. The increase in the number of servers is subject to the satisfaction of the CPU utilization condition for a certain interval, and therefore it influenced the hourly usage cost. The number of servers and CPU utilization are key parameters for approximating the response time.

The inability of our models to accurately approximate these metrics can be attributed to a number of factors. First, queuing theory, and particularly the M/M/m queue model, considers only certain system parameters in an abstract way, i.e., the number of incoming requests and service rates. It does not, for example, consider the request types and mix. This is an important factor, as different types of requests are likely to put a different utilization footprint on the CPU and could influence the service rate. Second, multi-tenancy, resource sharing, and the software and virtualization layers of the offered cloud services also play an important role. Among the main interests of cloud providers are maximizing resource utilization and reducing the operational costs of their cloud infrastructure. Therefore, they try to maximize resource sharing among different cloud consumers and provision many cloud servers on the same virtual software and system layers.


This especially holds as the details of the resource allocation and sharing mechanisms of cloud providers are not visible, and cloud consumers have no control over the underlying physical infrastructure and resource configuration. Consequently, cloud consumers cannot determine performance bottlenecks at the infrastructure level and cannot tune system parameters to improve their application performance.
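For reference, the standard M/M/m mean response time on which such queuing-based models build can be written as follows; this is the textbook Erlang C result, stated here for context, and not necessarily the exact formulation of our response time model in Section 4.2.3. With arrival rate λ, per-server service rate μ, m servers and ρ = λ/(mμ):

\[
E[T] = \frac{1}{\mu} + \frac{C(m,\rho)}{m\mu - \lambda},
\qquad
C(m,\rho) = \frac{\dfrac{(m\rho)^m}{m!}\,\dfrac{1}{1-\rho}}{\displaystyle\sum_{k=0}^{m-1}\frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!}\,\frac{1}{1-\rho}}.
\]

Here C(m, ρ) is the probability that an arriving request has to queue. The formula abstracts away the request types and mix, which is precisely the limitation noted above.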

The simulations demonstrated considerable improvements in approximating the desired metrics after the first scale-out action was triggered and more servers were added. We attribute this improvement to the distribution of the increasing concurrent requests among multiple servers (in contrast to the initial state of the system). The simulation results thereafter showed improvements in approximating the performance and cost metrics. We believe this is because of the dependency between CPU utilization and the other metrics, i.e., number of servers, server costs and response time, as discussed above. Although the same factors still hold in the case of multiple servers, their effect becomes less prominent as the increasing number of concurrent requests is distributed among two or more servers rather than being served by one server.

The latency in accurately approximating CPU utilization before any scale-out actions are triggered is an important factor that is likely to influence the overall accuracy of all metrics. In all the simulations, the scale-out actions were triggered later than in the empirical experiments. This delayed when the number of servers changed and therefore impacted the approximated response time values. Hence, the earlier the CPU utilization can be approximated accurately, the smaller the impact on the overall accuracy of the other metrics.

The elasticity models and algorithms, together with their simulation and validation, have demonstrated their ability to approximately predict the patterns and relationships between elasticity thresholds and the performance and cost metrics of cloud-based applications. In addition, our models have approximated the CPU utilization, number of servers, servers usage cost and response time values over time with slight deviation from the empirical values. Both results can be considered reliable enough to guide cloud consumers in evaluating elasticity rules with different thresholds without the need to run empirical experiments and tests in real cloud environments. This especially holds when cloud consumers are interested in approximating the trends and behaviors of important metrics rather than obtaining exact values.

As application-level metrics, e.g., response time, are among the most important metrics for cloud consumers, it could be beneficial for them to consider scaling based on application metrics. This requires developing an application-level elasticity approach and evaluating its performance in terms of the metrics desired by cloud consumers. This will be addressed in the following chapter.

Chapter 5

PERFORMANCE EVALUATION OF SLA-BASED AND CPU-BASED ELASTICITY

In the previous chapter, we have discussed the performance modeling and evaluation of CPU-based elasticity, which is based on a resource-level metric, CPU utilization. Resource-level elasticity enables cloud consumers to dynamically allocate and release computing resources to meet their application workload changes in a cost-effective manner. Meeting application performance metrics defined in Service Level Agreements (SLAs) is also one of the primary concerns for cloud consumers. Therefore, it is crucial to investigate how cloud consumers can enable elasticity of IaaS cloud based on application SLA metrics. Furthermore, it also becomes important to investigate how well CPU-based and SLA-based elasticity mechanisms perform in terms of meeting desired application SLAs and other performance and cost metrics.

In this chapter, we present an architecture for enabling application-level elasticity of IaaS cloud based on application SLA metrics. The design and core components of our SLA-based elasticity approach are discussed first. We also present key algorithms for monitoring and scaling IaaS cloud resources based on the response time SLA percentile. Second, we present our empirical evaluation of different CPU-based and SLA-based elasticity rules in terms of key application and resource metrics that are important for cloud consumers. Furthermore, we present our experimental evaluation of CPU-based and SLA-based elasticity on servers with different capacity profiles, and we evaluate the consistency of their performance and its impact on the application and resource metrics.

5.1 IaaS Cloud Elasticity and Application SLAs

As previously explained, SLAs are a crucial aspect of Internet-based business applications as they define the desired Quality of Service (QoS) properties, e.g., response time and availability, that must be satisfied by the application. QoS properties need to be monitored continuously during application execution to determine their performance levels against the desired Service Level Objectives (SLOs) (as illustrated in the example in Section 1.2). Based on this monitoring and evaluation, a measure of the SLA satisfaction level can be determined. One common way of computing the SLA satisfaction level is the SLA satisfaction percentile. For example, a 95% application response time SLA means that 95 percent of the application requests must have a response time within the desired response time SLO. This can be specifically stated as: 95% of all requests monitored every 3 minutes must have response times less than 120 milliseconds.
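Computing the satisfaction percentile for one monitoring interval is straightforward; a minimal Matlab sketch, assuming rt is the vector of response times (in milliseconds) observed during the interval and slo is the response time objective:

slo = 120;                                         % response time SLO (ms)
satisfaction = 100 * sum(rt <= slo) / numel(rt);   % percent of requests meeting the SLO
slaMet = satisfaction >= 95;                       % true if the 95% SLA target is met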

SLA satisfaction can be utilized as a crucial application metric for enabling SLA-based elasticity. Figure 5.1 illustrates an example of elasticity rules defined based on the SLA satisfaction metric. In this example, a scale-out action will trigger to add one server if the response time SLA remains below 90% for 3 consecutive 1-minute intervals. If the response time SLA remains above 90% for 7 consecutive 1-minute intervals, then one server will be removed from the server fleet (i.e., a scale-in action will be executed). After each elasticity action, a waiting period is applied to allow the server changes to take effect before the elasticity conditions are evaluated again.


Monitor Response Time SLA (RT_SLA) every 1 min.

IF RT_SLA < 90% FOR 3 min.
    Add 1 server of small capacity      // Scale-out
    Wait 10 consecutive 1-min. intervals

IF RT_SLA > 90% FOR 7 min.
    Remove 1 server of small capacity   // Scale-in
    Wait 15 consecutive 1-min. intervals

Figure 5.1: Example of SLA-based Elasticity Rules
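A minimal sketch of how the rules in Figure 5.1 could be evaluated each minute is given below; slaHistory holds per-minute SLA satisfaction percentages, and addServer/removeServer are hypothetical scaling hooks standing in for calls to the IaaS provider.

belowCount = 0; aboveCount = 0; coolDown = 0;
for t = 1:numel(slaHistory)
    if coolDown > 0, coolDown = coolDown - 1; continue; end
    if slaHistory(t) < 90, belowCount = belowCount + 1; else, belowCount = 0; end
    if slaHistory(t) > 90, aboveCount = aboveCount + 1; else, aboveCount = 0; end
    if belowCount >= 3                 % scale-out: SLA below 90% for 3 consecutive minutes
        addServer(); coolDown = 10; belowCount = 0;
    elseif aboveCount >= 7             % scale-in: SLA above 90% for 7 consecutive minutes
        removeServer(); coolDown = 15; aboveCount = 0;
    end
end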

Existing elasticity mechanisms offered by IaaS cloud providers are primarily based on resource metrics. This is because IaaS providers have full control of the computing infrastructure resources, which have common abstractions and interfaces. This allows IaaS providers to configure their resources to collect and monitor appropriate resource metrics that can be used as a basis for defining resource-level elasticity rules. Achieving this with application-level metrics, however, is more challenging for IaaS providers. This is mainly because each application has different SLA obligations and thresholds and requires different configuration and monitoring methods at the application level. This complicates the configuration and design of the monitoring and scaling operations of the IaaS cloud. Therefore, it becomes the responsibility of cloud consumers to define appropriate elasticity rules, CPU-based or SLA-based, that guarantee the satisfaction of their SLAs.

As we have demonstrated in Section 4.3, meeting desired application and resource metrics is bound to a number of parameters and thresholds that need to be specified in elasticity rules. Likewise, this applies to achieving desired levels of SLA satisfaction, which can be influenced by one or more elasticity thresholds. Evaluating elasticity rules, CPU-based or SLA-based, can help cloud consumers choose the appropriate thresholds that lead to the satisfaction of application SLAs and other desired metrics. However, this does not necessarily guarantee satisfying the application SLA at all times. Figure 5.2 shows empirical evidence of this case. The figure shows the CPU utilization and response time SLA satisfaction percentile data resulting from one of our CPU-based elasticity experiments. In this experiment, the elasticity rules were defined at the application tier of the TPC-W benchmark (under the same experimental settings discussed in Section 4.3). Specifically, the scale-out rule was set to add one server when the CPU utilization is above 85% for 5 consecutive 1-minute intervals. Assume that the desired response time SLA is that 90% of the requests must have response times within an acceptable range.

[Figure 5.2: Empirical Example of SLA Violations with CPU-based Elasticity Rules. The figure plots CPU utilization and response time SLA satisfaction (0-100%) over time (0-165 minutes).]

In this experiment, the elasticity rules were defined at the application tier of the TPC-W benchmark (under the same experimental settings discussed in Section 4.3). Specifically, the scale-out rule was set to add one server when the CPU utilization is above 85% for 5 consecutive 1-minute intervals. Assume that the desired response time SLA is that 90% of the requests must have response times within an acceptable range.

Figure 5.2 reveals two key observations. First, the SLA satisfaction was below the desired level, i.e., 90%, during the period 20-80 minutes. During this period, the CPU utilization exceeded the desired utilization threshold, i.e., 85%, for 5 minutes three times, and a server was added each time. However, there were fairly significant drops in SLA satisfaction during these three periods until the CPU elasticity rule was triggered and a new server was added. Second, during the period 80-140 minutes the SLA satisfaction was again below the desired level, i.e., 90%.


During this period the CPU utilization threshold was not breached and no servers were added. These cases demonstrate scenarios in which SLA satisfaction falls below desired levels, with significant drops during some periods. Such scenarios become more complicated when cloud consumers need to evaluate the impact of other elasticity thresholds and parameters on SLA satisfaction and on other resource and application metrics.

5.2 SLA-based Elasticity Approach

The SLA-based elasticity approach is based on a number of important assumptions that must be clarified. First, the SLA-based elasticity approach is designed for a multi-tier internet application architecture, due to its wide adoption in practice and research. Second, making and enabling SLA-based elasticity decisions for IaaS cloud resources is focused at the application tier. Third, the basis metric for evaluating and making scaling decisions is limited to the response time SLA metric, which is measured at the application tier, following the same scope as the modeling of CPU-based elasticity. Fourth, the approach is independent of the IaaS provider and can be built on top of any IaaS cloud by utilizing appropriate cloud infrastructure services.

The web application, to be monitored and scaled, is deployed on an IaaS cloud and uses computing resources such as cloud servers and network services. This can be any IaaS cloud in which computing resources can be allocated on-demand following a pay-as-you-go model (e.g., hourly). The application deployment architecture on the IaaS cloud is based on multi-tier architecture principles. Specifically, we use a deployment architecture that is commonly represented by a web tier, an Application (App.) tier and a Database (DB) tier. The functional components of the application logic are deployed on the corresponding tier. At each tier, the corresponding server software is deployed: the Web Server (WS), the Application Server (AS) and the Database Server (DS), respectively. The number of servers can vary at each tier. All incoming requests from the application users are served by these servers by performing certain tasks based on the deployed application logic and server software.

[Figure 5.3: Architecture of Proposed SLA-based Elasticity Approach. The figure shows the cloud-based application deployed on IaaS resources across the web, application and database tiers, each with run-time metric monitors (WS, AS and DS Run-time Monitors). The monitors feed run-time monitoring data to the Application SLA Manager, the Application SLA Statistics repository and the SLA Elasticity Controller, which drives application scaling actions through IaaS cloud services (metrics monitoring, auto-scaling and resource APIs).]

The design decisions of a multi-tier deployment and the physical distribution of WS, AS and DS on separate cloud servers are crucial for performance monitoring and scaling of individual tiers, as will be explained next.

5.2.1 Design

Based on the above assumptions, we have designed the architecture of our SLA-based elasticity approach as shown in Figure 5.3. The architectural design is primarily composed of a number of components, which are highlighted in green and/or with dashed lines. The other components are part of an IaaS cloud and its services. In the following, we describe the main components, their functionality and the design decisions behind them.


Application Monitor

An application monitor is an independent software component that continuously measures running application functional components, processes the collected metrics and stores them in a repository. The metric of interest in the context of this thesis is the response time of all requests at the application tier. Cloud consumers need to configure the monitors at each tier with the desired metrics to be measured and the length of the time interval during which requests are measured and recorded in the repository. These monitors must be deployed and configured on all servers at each tier, including the servers to be added by auto-scaling actions. Each monitor is named according to the tier naming convention, e.g., AS Run-time Monitor for the application tier (see Figure 5.3).

The logic of how these monitors work at each tier is quite similar. Most modern systems and servers employ performance logging functions and plug-ins that can be customized and extended. Often, these functions and plug-ins log performance data desired by the cloud consumer into a database. Each monitoring component starts collecting the desired metrics when a server boots up and stops when the server is shut down. In addition to collecting metrics, each monitoring component collects additional data about the requests received and processed by the application and where they have been served. Such data is necessary to link the collected metrics with the appropriate request types and with the server and tier at which they are served. Knowing the request types is important for evaluating each request against its desired SLO, as different request types have different SLOs. Knowing the tier and server details is also crucial for making appropriate elasticity decisions. In particular, it helps in determining the tier and the server at which performance bottlenecks occur and in deciding where elasticity actions should be triggered. This design decision fits well with the principles of elasticity mechanisms in which auto-scaling policies or rules are configured for a group of homogeneous servers that reside on the same tier.


Application Monitoring Repository

The main purpose of this repository is to maintain structured records of the application metrics collected by all the monitors in a database, as shown in Figure 5.3. The metrics for each tier are maintained in a separate database instance. This ensures that SLAs can be evaluated at each tier separately and that scaling decisions are made appropriately for each tier. The records of these database instances contain details of the requests received by the application (e.g., URI and date-time stamp), server details (e.g., server name and IP address) and user session details (e.g., user name and session ID). This information is crucial for grouping requests and their metrics based on desired criteria (e.g., request type, date-time, server IP address), and for computing the SLA satisfaction of the desired performance metrics, i.e., response time SLA in this context. The repository is designed as a centralized repository so that the metrics data collected by all servers are synchronized at every desired time interval. This helps in grouping and evaluating metrics from multiple servers synchronously and in near real-time at every desired time interval.
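A minimal sketch of one such record structure is shown below; the class and field names are hypothetical and chosen only to illustrate the fields discussed above:

    # Hypothetical record structure for the application-tier monitoring
    # database instance; the names are illustrative only.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class RequestMetricRecord:
        request_uri: str       # request details, e.g., "/tpcw/search"
        request_type: str      # e.g., "Search", "Buy Confirm"
        recorded_at: datetime  # date-time stamp of the request
        server_name: str       # server details
        server_ip: str
        user_name: str         # user session details
        session_id: str
        resp_time_ms: float    # collected response time metric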

Application SLA Manager

The application SLA manager (also referred to as the SLA evaluator) is an independent software component that plays a central role in enabling SLA-based elasticity. It performs the following primary functions:

• It retrieves a set of metric records at every predefined time interval from the application monitoring repository. The time interval represents the period during which the desired SLA is monitored, and can be defined by the cloud consumer to suit their needs.

• It groups these metric records based on desired criteria such as request type and time-stamp. The criteria depend on the component configuration that has to be defined by the cloud consumer.

• For all gathered records, it evaluates the corresponding metrics of each request against the predefined SLOs and computes SLA and request statistics for each time interval. The SLOs are the performance objectives for each request type defined by the cloud consumer. The SLA statistics include the percentage of requests that satisfy their SLOs, both over all requests and per request type. The SLA evaluator stores the resulting SLA and corresponding request statistics in a separate repository.

Application SLA Statistics Repository

This repository maintains all SLA and request statistics, as shown in Figure 5.3. The records are structured based on the time interval during which the request and SLA statistics are computed; there is therefore one record for every time interval. It has been designed as a separate repository to reduce potential performance bottlenecks that could arise from multiple accesses from different sources.

SLA Elasticity Controller

The main function of the SLA elasticity controller is to make elasticity decisions and enforce their execution based on the desired SLA statistics. It is a software component that retrieves the desired SLA statistics, based on which elasticity conditions and actions can be decided. The evaluation and actuation of such SLA-based elasticity decisions can be achieved either independently or through integration with IaaS cloud services.

Independent Auto-scaling: in this mechanism, the SLA controller fully controls the evaluation of elasticity conditions and actions, and uses cloud resource APIs (Application Programming Interfaces) to actuate them. At every time interval, the SLA controller retrieves the SLA statistics from the SLA statistics repository and evaluates them against the desired SLA levels defined by the cloud consumer. Here, the elasticity rule structure presented in Figure 4.1 forms the core of the elasticity conditions and actions, but with SLA satisfaction as the metric to be monitored. An example of SLA-based elasticity rules is shown in Figure 5.1. The logic to continuously evaluate the elasticity conditions and determine when elasticity actions should be triggered is similar to the logic presented in Algorithms 1 and 2. The main difference is the use of SLA satisfaction as the evaluation metric instead of CPU utilization. The cloud consumer should set the desired values for the key parameters, including the upper and lower SLA thresholds, the monitoring time window, the cool-down time and the server lag time. The actuation of the scale-out or scale-in actions that may result from this continuous evaluation can then be enabled through the APIs provided by the IaaS cloud or a third-party cloud management provider. The logic of the elasticity rules and the API integration have to be implemented within the SLA elasticity controller.
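The following is a minimal Python sketch of such an independent controller loop, using the thresholds and windows of Figure 5.1. The helpers get_sla_statistic(), add_server() and remove_server() are hypothetical stubs standing in for the SLA statistics repository query and the provider's resource APIs:

    # Minimal sketch of an independent SLA elasticity controller loop.
    import time

    LOWER_SLA = UPPER_SLA = 90.0        # scale-out / scale-in thresholds (%)
    OUT_WINDOW, IN_WINDOW = 3, 7        # consecutive 1-minute breaches required
    OUT_COOLDOWN, IN_COOLDOWN = 10, 15  # cool-down intervals after an action

    def get_sla_statistic():            # stub: replace with a repository query
        return 95.0
    def add_server(size):               # stub: replace with a resource API call
        print("scale-out:", size)
    def remove_server(size):            # stub: replace with a resource API call
        print("scale-in:", size)

    below = above = 0
    while True:
        sla = get_sla_statistic()       # latest SLA satisfaction percentile
        below = below + 1 if sla < LOWER_SLA else 0
        above = above + 1 if sla > UPPER_SLA else 0
        if below >= OUT_WINDOW:         # sustained SLA violation: scale out
            add_server(size="small")
            below = above = 0
            time.sleep(OUT_COOLDOWN * 60)
        elif above >= IN_WINDOW:        # sustained SLA satisfaction: scale in
            remove_server(size="small")
            below = above = 0
            time.sleep(IN_COOLDOWN * 60)
        else:
            time.sleep(60)              # next 1-minute monitoring interval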

Integration with Auto-scaling Services: this mechanism integrates with the auto-scaling services provided by an IaaS provider or a third-party cloud management provider. To enable the evaluation and actuation of elasticity conditions and actions, the cloud consumer needs to define and configure auto-scaling policies using the services offered by the cloud provider. To achieve this, the SLA elasticity controller has to publish the desired SLA statistics to a cloud monitoring service using the appropriate APIs provided by the cloud provider. Based on these published SLA metrics, the cloud consumer configures the auto-scaling services to define the parameters and thresholds for the elasticity conditions and actions. These rules call the resource APIs to enable the allocation or de-allocation of computing resources when the defined conditions hold. These components are highlighted with dashed lines in Figure 5.3 as they require some configuration from the cloud consumer.
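As an illustrative sketch of this integration, assuming Amazon CloudWatch as the monitoring service and the boto3 library, the controller could publish the SLA satisfaction percentile as a custom metric; the namespace and metric name below are our own choices, not part of the approach:

    # Sketch: publish the response time SLA satisfaction percentile as a
    # custom CloudWatch metric that auto-scaling policies can alarm on.
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    def publish_sla_statistic(sla_percent):
        cloudwatch.put_metric_data(
            Namespace="Custom/SLAElasticity",        # hypothetical namespace
            MetricData=[{
                "MetricName": "RespTimeSLASatisfaction",
                "Value": sla_percent,                # e.g., 87.5
                "Unit": "Percent",
            }],
        )

    publish_sla_statistic(87.5)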

The resulting actions, regardless of which mechanism is adopted, can be either scale-out or scale-in operations based on the continuous evaluation of SLA satisfaction as defined in the elasticity rules. Once executed through the resource APIs, these actions trigger changes in the computing resources at the appropriate tier.


5.2.2 SLA-based Elasticity Algorithms

The logic of the SLA-based mechanism is mainly dependent on the key components presented in Figure 5.3. In the preceding paragraphs, we have explained the functionality and architectural design decisions of these components. Based on this, we now present the algorithms that form the logic behind these components. As previously explained, the algorithms focus on scaling the application tier out and in based on the response time metric.

Algorithm 4 presents the main logic of the AS run-time metrics monitor. Basically, the algorithm starts by establishing a connection to the application server metrics database asReqsMonitoringDB, in which request details and metrics are to be stored. It then loops until the program is halted or the server is shut down. Within this loop, a line is first read from the request metrics log file reqsMetricLogFile. This file is continuously updated with request details and collected metrics by the logging components each time a request is served by a server being monitored. If the end of the file is reached, then a waiting time of a predefined syncTime seconds is applied to synchronize reading with logging. The syncTime is set by the cloud consumer. Once a log line is read, it is parsed to extract the desired request and metric details, including request URI, type, date-time, response time, server IP address and user session ID.

The extracted details are stored in a structured record reqMetricDetails and added to a temporary buffer. When the temporary buffer size reaches the maximum limit set by the cloud consumer, the records accumulated in reqMetricTempBuffer are inserted into the application server monitoring database (asReqsMonitoringDB) and the temporary buffer is reset. Accumulating the monitoring records and inserting them as a block of records, rather than one record at a time, reduces the write overhead on the monitoring repository so that other components can read/write data from/to it efficiently.


Algorithm 4 Response Time Monitor Logic at the Application Tier
Initialize: syncTime, reqsMetricLogFile, asReqsMonitoringDB, maxBufferLength
Establish connection to asReqsMonitoringDB
while (true) do
    logLine = Read(reqsMetricLogFile)
    if (logLine ≠ EndOfFile) then
        reqMetricDetails = Parse(logLine)
        Add reqMetricDetails to reqMetricTempBuffer
        if (reqMetricBufferLength == maxBufferLength) then
            Insert reqMetricTempBuffer into asReqsMonitoringDB
            Reset reqMetricTempBuffer
        end if
    else
        Wait syncTime
    end if
end while
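A minimal Python rendering of Algorithm 4 is sketched below. The log-line format, the parse_log_line() helper and the insert_records() stub are illustrative assumptions of ours, not a prescribed implementation:

    # Minimal Python sketch of Algorithm 4 (tail the request metrics log,
    # buffer parsed records, insert them in blocks).
    import time

    SYNC_TIME = 5            # seconds to wait when the log has no new lines
    MAX_BUFFER_LENGTH = 100  # flush the buffer to the database in blocks

    def parse_log_line(line):
        """Assumed log format: uri|type|datetime|resp_ms|server_ip|session_id."""
        uri, rtype, dt, resp_ms, ip, sid = line.rstrip("\n").split("|")
        return {"uri": uri, "type": rtype, "datetime": dt,
                "resp_time_ms": float(resp_ms), "server_ip": ip, "session_id": sid}

    def insert_records(records):
        """Stub: insert a block of records into asReqsMonitoringDB."""
        print("inserted", len(records), "records")

    def monitor(log_path):
        buffer = []
        with open(log_path) as log:            # continuously read the metrics log
            while True:
                line = log.readline()
                if line:                       # a new request was logged
                    buffer.append(parse_log_line(line))
                    if len(buffer) == MAX_BUFFER_LENGTH:
                        insert_records(buffer) # block insert reduces write overhead
                        buffer = []
                else:                          # end of file: wait for new entries
                    time.sleep(SYNC_TIME)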

Algorithm 5 describes the logic of the SLA manager component. Specifically, the algorithm details how the response time SLA is continuously monitored at the application tier. There are two main inputs to the algorithm: the SLOs for all requests and the SLA monitoring time interval (timeInterval). The SLOs are an array of response time targets that must be met by all requests at all times. SLOs may vary depending on the request type, as each request of a particular type has its own code execution path. The timeInterval is the duration of time over which the response time SLA is monitored. The algorithm then initializes the key variables with appropriate default values. A connection to the application server monitoring database is established to read the request and metric records. Another connection to the SLA statistics database is also created to write the response time SLA and request statistics. The current date-time is then assigned to the variable startTime, which is used as the starting time for retrieving the metrics.


Algorithm 5 Application SLA Manager Logic - Response Time
Input: SLOs[ ], timeInterval
Initialize: asReqsMonitoringDB, slaStatisticsDB, syncTime, respTimeSLA, reqStatistics
Establish connection to asReqsMonitoringDB
Establish connection to slaStatisticsDB
startTime = GetCurrentTime()
while (true) do
    reqMetricRecords = GetReqRecord(asReqsMonitoringDB, startTime, startTime + timeInterval)
    timeIntervalLength = GetTimeIntervalLength(reqMetricRecords)
    if (reqMetricRecords ≠ NULL && timeIntervalLength ≥ timeInterval) then
        satCount = 0
        for i = 1 → reqMetricRecordsSize do
            reqTypeIndex = ParseRequestType(reqMetricRecords[i])
            if (reqMetricRecords[i].respTime ≤ SLOs[reqTypeIndex]) then
                satCount = satCount + 1
            end if
        end for
        respTimeSLA = satCount / reqMetricRecordsSize
        reqStatistics[ ] = ComputeStatistics(reqMetricRecords)
        Insert respTimeSLA, reqStatistics[ ] into slaStatisticsDB
        startTime = startTime + timeInterval
    else
        Wait syncTime
    end if
end while

The algorithm then iterates until the SLA manager is halted. Inside this main loop, the algorithm performs a number of operations to compute and store the response time SLA statistics, as follows. It first retrieves the set of all request records within the time interval that starts at startTime and has a length of timeInterval seconds. If no records are returned, or the time length of the returned records is less than timeInterval, then a waiting time of length syncTime is applied to gather the remaining records of the current interval. This ensures that the response time SLA is computed over all request metrics within the specified time interval and that no records are shifted to the next interval. For all request records retrieved from the application server monitoring database, the response time of each request is evaluated against its SLO to count the number of requests that have satisfied their SLOs. After the loop iterates through all the request records, the percentage of requests that satisfy their SLOs and statistics about the request types are calculated for the current time interval. The resulting statistics are then inserted into the SLA statistics database for the current time interval. To process the next group of metrics, the startTime variable is shifted forward by timeInterval seconds before the next loop iteration starts. Once the response time SLA statistics are stored in the SLA statistics repository, SLA-based elasticity can be enabled either through integration with an available auto-scaling mechanism or independently, as explained before.
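The core of Algorithm 5, computing the response time SLA for one interval, is sketched below in Python. The record fields and the SLO table follow our hypothetical schema from the monitor sketch above; the SLO values are a subset of Table 5.1:

    # Minimal Python sketch of the core of Algorithm 5: the response time SLA
    # satisfaction for one monitoring interval.
    SLOS_MS = {"Home": 149, "Search": 143, "Shopping Cart": 132}  # subset of Table 5.1

    def compute_resp_time_sla(records):
        """records: list of dicts with 'type' and 'resp_time_ms' fields."""
        if not records:
            return None                        # caller waits syncTime and retries
        sat_count = sum(1 for r in records
                        if r["resp_time_ms"] <= SLOS_MS[r["type"]])
        return 100.0 * sat_count / len(records)

    interval = [{"type": "Home", "resp_time_ms": 120.0},
                {"type": "Search", "resp_time_ms": 180.0},
                {"type": "Home", "resp_time_ms": 90.0}]
    print(compute_resp_time_sla(interval))     # 66.7: 2 of 3 requests met their SLOs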

Either way, the elasticity controller plays a central role. In the case of integration with an existing auto-scaling service, the SLA controller publishes the response time SLA statistics at every time interval into a monitoring service. These metrics are then used as the basis to evaluate and trigger the appropriate elasticity conditions and actions, based on rules that have to be configured by the cloud consumer using existing auto-scaling policies. In the case of independent auto-scaling (by the SLA controller), the same logic as the CPU-based elasticity Algorithms 1 and 2 in Chapter 4 is used, but with the response time SLA as the basis of the evaluation.

5.3 Experimental Evaluation

In this section, we present our experimental evaluation of the SLA-based and CPU-based elasticity approaches. The main goal of this evaluation is to assess the performance of CPU-based and SLA-based elasticity under real IaaS cloud scenarios. The first dimension of this aim is concerned with evaluating how well CPU-based and SLA-based elasticity rules with different thresholds perform in terms of resource and application metrics. We aim here to provide empirical evidence on how the CPU utilization and SLA satisfaction thresholds, which are used as the basis for triggering auto-scaling actions, influence performance and cost metrics from a cloud consumer perspective. The second dimension is to evaluate the performance of CPU-based and SLA-based elasticity on different cloud server instances. The focus here is on analysing the impact of the server capacity profile on the performance of both elasticity approaches from a cloud consumer perspective. The third dimension is focused on the performance consistency that both elasticity approaches can achieve on different server capacity profiles. Here, we focus on analysing the performance variability that SLA-based and CPU-based elasticity can exhibit on servers with different capacity profiles.

We first describe the design of our experiments and justify our design decisions and technology choices. We then describe how we carried out our experiments to address each of the aims described above. This is followed by data analysis and a discussion of the results in terms of these three aims.

5.3.1 Experiment Design

The details of the environment setup used to carry out our experiments are as follows.

Application Benchmark

Similar to our elasticity modeling evaluation, we have used TPC-W [131], the online bookstore application, as a representative Internet business application to carry out our SLA-based and CPU-based elasticity evaluation. The details of the TPC-W specifications and implementation are explained in section 4.3.1.


Table 5.1: Response Time SLOs for TPC-W Requests at the Application Tier

Request Type      SLO Threshold
Home              149 ms
Item Details      81 ms
New Products      1362 ms
Best Sellers      1276 ms
Search            143 ms
Execute Search    1051 ms
Shopping Cart     132 ms
Buy Confirm       105 ms

Deployment Architecture

Following the same architectural principles as the elasticity modeling empirical experiments, we have deployed the online bookstore on the Amazon EC2 cloud with a 3-tier architecture (as shown in Figure 4.4). The details of the deployment architecture and the hardware and software specifications are described in section 4.3.1.

Determining SLOs

As previously explained, determining SLA satisfaction levels requires defining the SLO thresholds against which collected metrics are evaluated (see section 1.2). An SLO threshold specifies a value which is used to measure whether a performance objective is met or not. In this thesis, we needed to define the response time SLO thresholds that are used to evaluate the performance of all requests and to compute the SLA satisfaction metric.

We have followed an experimental approach to determine the response time SLOs of all TPC-W request types, as shown in Table 5.1. We used the 3-tier deployment architecture of TPC-W on the Amazon EC2 cloud (see Figure 4.4) to run a number of preliminary experiments. First, we gradually increased the number of concurrent user sessions, using the TPC-W user emulation software, until the system started dropping new requests. This gave the maximum number of concurrent users who can be served by our system without dropping requests; in our case this was 65 concurrent users. Using the same deployment setting, we then ran five separate experiments, each generating a workload of 65 concurrent user sessions and running for one hour. We then collected the resulting response time data of all requests at the application tier and computed the 90th percentile of the response time of each request type. From the 90th percentiles of the five experiments, we computed the average response time value for each request type. The resulting values represent the SLO thresholds of all request types, as shown in Table 5.1. These SLO thresholds were used for evaluating the response times collected during system execution and for calculating the percentage of SLA satisfaction of all requests at every time interval.
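The derivation procedure can be summarized in a short sketch: per request type, take the 90th percentile of the response times observed in each run, then average across runs. The data values below are made up for illustration; only two runs are shown where the actual procedure used five:

    # Sketch of the SLO derivation: average the per-run 90th percentiles.
    import numpy as np

    runs = [  # each run maps request type -> observed response times (ms)
        {"Home": [100, 120, 140, 160], "Search": [90, 130, 150, 170]},
        {"Home": [110, 125, 135, 150], "Search": [95, 120, 160, 180]},
    ]  # ... five runs in the actual procedure

    def derive_slos(runs, percentile=90):
        req_types = runs[0].keys()
        return {t: float(np.mean([np.percentile(run[t], percentile) for run in runs]))
                for t in req_types}

    print(derive_slos(runs))  # averaged 90th-percentile SLO threshold per type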

Elasticity Rules

To evaluate the performance of resource-based and SLA-based elasticity, we needed to define elasticity rules for each approach. Here, we used two categories of elasticity rules: one based on the CPU utilization metric (CPU-based elasticity rules) and the other based on the response time SLA metric (SLA-based elasticity rules). We chose the CPU utilization metric as a load-dependent metric which is commonly used in performance and monitoring studies [53; 134; 138]. Similarly, response time SLA is a widely used metric for measuring the performance of web applications [23; 25; 92; 130; 139].

Table 5.2 describes the CPU-based and SLA-based elasticity rules that were used in our experiments, using the elasticity rule structure presented in section 4.1. The naming convention of these rules is based on the metric name and the threshold value (e.g., CPU75, SLA90). The CPU-based elasticity rules vary in the CPU utilization threshold value (i.e., 75%, 80% and 85%). The threshold values were chosen based on Haines' [134] performance testing guidelines, which state that as CPU utilization increases from 80% to 95%, the system begins to thrash, the response time increases and requests are dropped. One of the primary aims of cloud consumers is to efficiently utilize the cloud servers triggered by auto-scaling rules while ensuring the desired application performance levels.


Therefore, a CPU utilization threshold below 75% will likely lead to an under-utilized servers scenario, as all servers will settle around that threshold. Meanwhile, choosing a CPU utilization threshold above 85% will likely push server utilization to a critical stage where system performance starts to degrade and influence application performance.

Table 5.2: Elasticity Rules Set at the Application Tier Used in All Experiments

CPU-based Elasticity Rules

CPU75:
    IF CPUUtil > 75% FOR 5 minutes (scale-out)
        Add 1 server of small capacity, Wait 7 consecutive 1 min. intervals
    IF CPUUtil ≤ 30% FOR 5 minutes (scale-in)
        Remove 1 server of small capacity, Wait 10 consecutive 1 min. intervals

CPU80:
    IF CPUUtil > 80% FOR 5 minutes (scale-out)
        Add 1 server of small capacity, Wait 7 consecutive 1 min. intervals
    IF CPUUtil ≤ 30% FOR 5 minutes (scale-in)
        Remove 1 server of small capacity, Wait 10 consecutive 1 min. intervals

CPU85:
    IF CPUUtil > 85% FOR 5 minutes (scale-out)
        Add 1 server of small capacity, Wait 7 consecutive 1 min. intervals
    IF CPUUtil ≤ 30% FOR 5 minutes (scale-in)
        Remove 1 server of small capacity, Wait 10 consecutive 1 min. intervals

SLA-based Elasticity Rules

SLA90:
    IF SLASat < 90% FOR 5 minutes (scale-out)
        Add 1 server of small capacity, Wait 7 consecutive 1 min. intervals
    IF SLASat ≥ 90% FOR 5 minutes (scale-in)
        Remove 1 server of small capacity, Wait 10 consecutive 1 min. intervals

SLA95:
    IF SLASat < 95% FOR 5 minutes (scale-out)
        Add 1 server of small capacity, Wait 7 consecutive 1 min. intervals
    IF SLASat ≥ 95% FOR 5 minutes (scale-in)
        Remove 1 server of small capacity, Wait 10 consecutive 1 min. intervals

151 5. SLA-based Elasticity

The objective of our scale-out rules is to allocate a new server of small size 1 when the CPU utilization of the application tier reaches this range. The load balancer delegates new requests to the new servers, once they are online, thereby reducing the load on the existing servers. We also consider a CPU utilization of less than 30% as indicating under-utilised servers. To reduce costs and improve return on investment, the scale-in rule de-allocates a server when utilisation stays below 30% for 5 consecutive 1-minute time intervals.

In the SLA-based elasticity rules, we used the SLA satisfaction percentile as the monitored metric for evaluating elasticity conditions. It is based on the response time of incoming requests at the application tier (the scope of our elasticity performance evaluation). As shown in Table 5.2, we have chosen 90% and 95% thresholds for the SLA satisfaction metric. A study of enterprise application performance [140] showed that monitoring the 90th response time SLA percentile is sufficient to maintain appropriate application performance without anomalies. Furthermore, the TPC-W specification [132] requires that 90% of requests meet the response time SLOs. Therefore, we use 90% as one of the SLA thresholds for triggering scale-out and scale-in actions. The 95% threshold provides a more stringent SLA threshold for testing the durability and reaction times of the elasticity rules. The choice of 95% tests whether higher SLA satisfaction levels can be achieved while maintaining server cost and CPU utilization similar to those with a 90% threshold. In both rules, a server is de-allocated when SLA satisfaction stays above the same threshold, 90% or 95%, for 5 consecutive 1-minute time intervals.

For the CPU-based elasticity rules, we have chosen 75%, 80% and 85% as the values for the CPU utilization upper threshold. In practice, cloud consumers try to carefully choose CPU utilization values that lead to efficiently utilized resources while maintaining the desired application performance [45]. Therefore, we have chosen CPU utilization values in accordance with recommended performance design and testing principles [134] and empirical studies such as [45]. A 75% utilization value is low enough to allow the system to respond early, before a server gets overloaded.

1 m1.small capacity: 1.7GB memory, 1 virtual core (Xeon processors of 1.7GHz), and moderate network I/O performance

On the other hand, 85% allows better CPU utilization, but a server will likely get close to overload before a new server is added [134].

For all rules, we used a cool-down period of 7 minutes after a scale-out action is executed to allow system changes to take place. This is to ensure that the elasticity conditions are evaluated while the system is not in a transition stage. For scale-in rules, we used a 10-minute cool-down period to ensure that SLA satisfaction is maintained for an appropriate time interval. As previously discussed, the scope of our performance evaluation is horizontal scaling (scale-out and scale-in) at the application tier. Horizontal scaling is widely used in practice as it provides a transparent fail-over strategy when a server becomes unresponsive or fails [40].

Metrics and Data Collection

The core metrics in our experiments are CPU utilization and SLA satisfaction, which are used as the basis for enabling elasticity rules. They also provide useful measures of how well an elasticity rule performs. In addition to these metrics, we collected the end-to-end response time, the percentage of served requests and the server usage costs as primary metrics for evaluating the performance of elasticity rules. The details of these metrics and how they are measured and collected are as follows.

• Average CPU Utilization: this represents the average CPU utilization of all servers at the application tier, the focus of our auto-scaling rules. We have configured an Amazon cloud service, called CloudWatch [78], to collect CPU utilization statistics at every 1-minute time interval. These statistics include the minimum, maximum and average CPU utilization at the application tier. In all CPU-based experiments, we configured the Amazon Auto Scaling service [37] to scale out and scale in the application tier based on the computed average CPU utilization metric. The CPU-based elasticity rules we used in configuring the Auto Scaling service are those defined in Table 5.2. In all experiments, we extracted the average CPU utilization data from CloudWatch to use in the performance analysis.


• SLA Satisfaction Percentile: we used our SLA-based mechanism to compute and monitor the SLA satisfaction metric at every 1-minute time interval (see section 5.2 for further details). The SLA satisfaction metric was computed based on the response time of all requests measured at each server at the application tier. The SLA satisfaction was computed as a percentile at every 1-minute interval (as explained in Algorithm 5). In all SLA-based experiments, we used this SLA satisfaction percentile to configure the Amazon Auto Scaling service to scale out and scale in the application tier. The SLA-based elasticity rules used in the experiments are those defined in Table 5.2. In all experiments, we collected the SLA satisfaction data from the Application SLA Statistics repository (shown in Figure 5.3). The SLA satisfaction percentile provides a measure for evaluating the impact of changing the elasticity thresholds at the application tier, at which the elasticity rules are defined.

• End-to-end Response Time: this represents the overall time a request spends across all the application tiers. It measures the elapsed time between the moment a request is received by the web tier and the moment a response is sent back to the end user. In all experiments, we configured the Amazon load balancer [73] (in our 3-tier architecture) to monitor and collect the response time of all requests as seen from the web tier. This was configured at every monitoring time interval, 1 minute in our experiments. We extracted the end-to-end response time data from Amazon CloudWatch for analysis. The end-to-end response time provides an overall performance measure of the entire application.

• Percentage of Served Requests: this measures the percentage of requests that are served successfully at every 1-minute time interval. This excludes all request errors that may occur at any of the tiers, including (a) requests that are not served due to error responses from the application servers (server error 5XX series and client error 4XX series) and (b) errors at the load balancer due to unregistered or unhealthy application servers, or the request rate exceeding the load balancer's current capacity. In all experiments, we configured Amazon CloudWatch to collect these request errors and compute the percentage of successfully served requests at every 1-minute time interval. We extracted the percentage of served requests data from Amazon CloudWatch for analysis.

• Servers Usage Cost: these costs result from allocating and de-allocating servers at the application tier due to scale-out and scale-in actions. We computed these costs based on the server-hours of all servers instantiated in each experiment, as sketched below. We used instances of small capacity (m1.small) for one group of experiments and of medium (c1.medium) capacity for another group of experiments. Both server instance types were from the US East (N. Virginia) region; hence, the hourly charges were $0.08 for small and $0.16 for medium Linux servers at the time of the experiments. We extracted the usage times of all servers during each experiment to compute the incurred server usage costs. With regard to Amazon CloudWatch metrics (resource and user-defined), charges are computed on a per-metric-month basis, and therefore such charges were not incurred in our experiments, as we used the metrics for only a few hours (not a whole month). The server usage costs were computed for every 1-hour period, the common billing cycle of most IaaS cloud providers. In case a server was used for less than one hour, a full hour's charge was applied, following the charging mechanisms employed by IaaS cloud providers. The server usage costs provide a measure of the monetary value that elasticity rule thresholds incur, and therefore help in analysing the cost-effectiveness of each elasticity rule.
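As an illustration of this charging model, the following sketch (our own simplification of hour-based IaaS billing) computes the usage cost of a set of servers from their running times; the usage values are made up:

    # Sketch of hour-based server usage cost: partial hours are rounded up,
    # following the charging model described above.
    import math

    HOURLY_RATE = {"m1.small": 0.08, "c1.medium": 0.16}  # USD, US East region

    def servers_cost(usage_minutes, instance_type):
        """usage_minutes: running time of each instantiated server, in minutes."""
        billed_hours = sum(math.ceil(m / 60.0) for m in usage_minutes)
        return billed_hours * HOURLY_RATE[instance_type]

    # Three small servers used for 150, 61 and 20 minutes -> 3 + 2 + 1 = 6 hours
    print(servers_cost([150, 61, 20], "m1.small"))  # 0.48 (dollars)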

5.3.2 Evaluation Methodology

In this section, we describe the approach we followed to conduct our experiments evaluating the performance of the SLA-based and CPU-based elasticity approaches. We first describe how the application workload was generated for all experiments. This is followed by the details of how we ran the different experiment sets to address the research objectives presented earlier. We then describe how the data is collected and presented for analysis.


[Figure 5.4: Generated Workload for TPC-W Application. The figure plots the request rate (requests/minute, 0-2600) over time (0-150 minutes).]

As previously explained, the TPC-W user emulation [131] is based on a precise user behavior model and web interaction mixes. This emulation program requires generating a workload density, i.e., a variation of the number of concurrent users over time. In all experiments, we generated the workload density by using Power Law and Poisson distribution functions, which have been used in the literature to generate web application workloads [33; 103]. These functions were selected based on the available evidence in the literature showing that the requests of web applications follow a power-law distribution [141; 142]. It is also well known in system and performance modeling that the arrival process of requests follows a Poisson process [124].

In all experiments, the number of concurrent user sessions was generated using a Zipf function (a type of power-law function). The input parameters for this function are (a) 65 user sessions, which represents the maximum number of user sessions that can be served by a cloud server of small size without dropping any request, and (b) 0.05 as the exponent for changing the number of user sessions. The inter-arrival time (the time to wait before generating a new number of concurrent user sessions) was generated using a Poisson function with a mean value of 7 minutes. We fed the number of concurrent user sessions and the inter-arrival times to the TPC-W emulation program over a 150-minute period. Figure 5.4 shows the resulting workload, represented in terms of the request rate per minute.
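The sketch below illustrates this style of workload-density generation: concurrent session counts drawn from a Zipf-type (power-law) weighting capped at 65 users, with Poisson inter-arrival times of mean 7 minutes. The exact parameterization and sampling procedure used in the thesis experiments may differ; this is only an illustration under those stated assumptions:

    # Illustrative sketch of workload-density generation (not the exact
    # generator used in the experiments).
    import numpy as np

    rng = np.random.default_rng(42)
    MAX_USERS, EXPONENT, MEAN_GAP_MIN, DURATION_MIN = 65, 0.05, 7, 150

    def generate_workload():
        levels = np.arange(1, MAX_USERS + 1)
        weights = levels ** (-EXPONENT)            # power-law weighting over levels
        weights /= weights.sum()
        t, schedule = 0, []
        while t < DURATION_MIN:
            users = rng.choice(levels, p=weights)  # concurrent user sessions
            schedule.append((t, int(users)))
            t += max(1, rng.poisson(MEAN_GAP_MIN)) # inter-arrival time (minutes)
        return schedule

    for minute, users in generate_workload()[:5]:
        print(f"t={minute:3d} min: {users} concurrent user sessions")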

To address the three aims previously described, we ran three experiment sets, each with particular comparison and analysis objectives. Each experiment set was conducted as follows:

Evaluating Performance of SLA-based and CPU-based Elasticity

In this group of experiments, we ran five experiments, each with one of the CPU-based and SLA-based elasticity rule sets described in Table 5.2 (i.e., CPU75, CPU80, ..., SLA95). We repeated each experiment three times, following experimental guidelines for performance evaluation in practice [134; 143]. In each experiment, we configured the corresponding elasticity rules at the application tier of the TPC-W application, which was deployed on the Amazon cloud. Furthermore, we configured the CPU-based elasticity rules using the Amazon Auto Scaling and CloudWatch services. For the SLA-based elasticity rules, we used our implementation of the SLA-based elasticity architecture and algorithms (Figure 5.3), integrated with the Amazon Auto Scaling and CloudWatch services, to enable SLA-based auto-scaling. In all experiments, we fed in the workload generated as described in Figure 5.4. In each experiment, we collected the resource and application metrics described in section 5.3.1 (metrics and data collection). To evaluate the performance of the tested elasticity rules, we carried out a trade-off analysis in terms of all collected metrics. We focused on CPU utilization and SLA satisfaction as the primary metrics for evaluating the performance of each elasticity rule. We also analysed how changing the CPU utilization and SLA satisfaction thresholds, as defined in the corresponding elasticity rules, influences the overall SLA satisfaction and CPU utilization at the application tier. In addition, we considered the impact of changing the elasticity thresholds on the other metrics, including server usage costs and the percentage of successfully served requests.


Evaluating Performance of SLA-based and CPU-based Elasticity on Different Cloud Servers

In the second group of experiments, we chose the two best performing elasticity rule sets: one from the SLA-based rules and the other from the CPU-based rules. The aim here is to evaluate the performance of CPU-based and SLA-based elasticity on cloud servers with different capacity profiles. Similar to the experiments described above, we configured the chosen elasticity rules at the application tier. For each elasticity rule set, we ran two sets of experiments by changing the server capacity at the application tier. The first experiment set used cloud server instances of small capacity (m1.small), which have a balance of compute, memory and network resources [144], as shown in Table 5.3. The second experiment set used cloud server instances of medium (c1.medium) capacity, which are optimized for compute processing [144]. The key difference between the two server instance types is that the c1.medium instance has double the CPU processing power of the m1.small instance (see Table 5.3). We chose the medium server instance as it has the high CPU capacity that matches the requirements of the TPC-W Browsing workload profile (95% of the Browsing profile web interactions are Browse operations, which stress the application tier). We ran each experiment set three times using the generated workload shown in Figure 5.4. In all experiments, we collected all the metrics described in section 5.3.1 (metrics and data collection).

We analysed the collected metrics and discuss how the SLA-based and CPU-based elasticity rules performed on the two cloud server profiles, m1.small and c1.medium. We specifically analysed how server capacity influenced the performance of SLA-based and CPU-based elasticity. We carried out this analysis in terms of the metrics described previously, namely CPU utilization, SLA satisfaction, end-to-end response time, percentage of served requests and server costs.


Table 5.3: Server Capacity Profiles Used in the Second and Third Experiment Sets

Server Instance    Capacity Specifications
m1.small           1.7GB memory, 1 virtual core (Xeon processors of 1.7GHz), and moderate network I/O performance
c1.medium          1.7GB memory, 2 virtual cores (Xeon processors of 1.7GHz), and moderate network I/O performance

Evaluating Consistency of Performance of SLA-based and CPU-based Elasticity

This group of experiments focuses on analysing how consistent the performance of SLA-based and CPU-based elasticity is. We carried out this analysis on the same two server capacity profiles described in Table 5.3. We ran a group of experiments similar to the second group (described above), but at different times. In particular, we ran three experiment sets, each with the same CPU-based and SLA-based elasticity rules, but on different days of the week. Each experiment was also run at a different time of day. The same workload shown in Figure 5.4 was used in all experiments. We analysed how consistent the performance of each elasticity rule set was on both the small and medium server profiles. The analysis was carried out in terms of the CPU utilization and SLA satisfaction metrics, which are the core performance indicators. The main aim here is to investigate how the reported performance variability of cloud servers [51; 52; 145] may influence the performance of elasticity rules.

In all experiments, we used the box-and-whisker (box plot) method to summarize the resulting SLA satisfaction and CPU utilization data. Box plots provide a robust way to show the important statistics of a set of data points and its distribution at a glance [135]. These statistics are the minimum (Q0), the lower quartile (Q1, the 25th percentile), the median (Q2, the 50th percentile), the upper quartile (Q3, the 75th percentile) and the maximum value (Q4), as well as the mean of all data points. The higher the quartile values, the better the results. In addition, we computed the end-to-end average response time seen from the load balancer at the web tier. We also computed the total server costs and the percentage of requests served successfully (at the application tier, where the elasticity rules are applied). The results of our experiments are presented in the following sections.
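For reference, the box plot statistics above can be computed directly from a series of monitored values; the SLA satisfaction samples in this sketch are made up:

    # Computing the box plot statistics (Q0-Q4 and mean) used to summarize
    # the SLA satisfaction and CPU utilization data.
    import numpy as np

    sla_samples = np.array([78.0, 85.0, 90.0, 92.0, 95.0, 97.0, 99.0])
    q0, q1, q2, q3, q4 = np.percentile(sla_samples, [0, 25, 50, 75, 100])
    print(f"Q0={q0}, Q1={q1}, median={q2}, Q3={q3}, Q4={q4}, "
          f"mean={sla_samples.mean():.1f}")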

5.3.3 Results and Analysis

In this section, we present the results of all the elasticity rule experiments carried out as described above. Based on these results, we discuss and analyse the performance of the SLA-based and CPU-based elasticity approaches. The discussion is focused on the three aims of evaluating SLA-based and CPU-based elasticity described in section 5.3 and detailed in section 1.4. The results analysis and discussion are organized into three experiment categories, each relating to one aim, as described in section 5.3.2.

Evaluating Performance of SLA-based and CPU-based Elasticity

Here we present and analyse the results of the first experiment set, with the aim of evaluating SLA-based and CPU-based elasticity rules in terms of the different performance and cost metrics described in section 5.3.1. We first focus on analysing how well both elasticity approaches perform in terms of SLA satisfaction and CPU utilization as the primary metrics. Then, we analyse how they perform in terms of the other metrics, namely end-to-end response time, percentage of served requests, number of servers and server usage costs.

Figure 5.5 shows the box plots of the SLA satisfaction and CPU utilization data resulting from the SLA-based and CPU-based elasticity rule experiments. As can be noticed from Figure 5.5(a), the SLA-based elasticity rules performed slightly better than the CPU-based elasticity rules in terms of SLA satisfaction. In particular, the SLA90 and SLA95 rules resulted in slightly higher mean and median SLA satisfaction compared to the values resulting from the CPU75, CPU80 and CPU85 rules. We can also see higher Q0-Q4 statistics for both SLA90 and SLA95, which means better SLA satisfaction. From both the CPU and SLA elasticity rules, we can notice that there is a relationship between the CPU and SLA thresholds and the SLA satisfaction statistics.

[Figure 5.5: Experimental Results of all Evaluated Elasticity Rules. (a) SLA Satisfaction box plot and (b) CPU Utilization box plot (0-100%) for the evaluated elasticity rules CPU75, CPU80, CPU85, SLA90 and SLA95.]

In terms of the CPU rules, the lower the CPU utilization threshold, the better the SLA satisfaction statistics. This is mainly because a lower CPU utilization threshold leads to earlier satisfaction of the elasticity rule, adding a new server before the existing servers start to under-perform and SLA satisfaction starts to degrade. In terms of the SLA rules, it can be noticed that there was a slight improvement in SLA satisfaction when the SLA threshold was increased. Similarly, this was due to the earlier triggering and addition of new servers with the SLA elasticity rule with the higher threshold.

In terms of the CPU utilization metric, it can be noticed from Figure 5.5(b) that the CPU-based rules outperformed the SLA-based rules. Specifically, the mean CPU utilization resulting from the CPU-based rules ranges between 62% and 72%, compared to between 48% and 53% for the SLA-based elasticity rules. Furthermore, we can see that all the CPU-based elasticity rules resulted in better Q0-Q4 statistics when compared with the Q0-Q4 statistics of the SLA-based rules. Similar to the SLA satisfaction results, it can be observed that there is a relationship between the CPU and SLA thresholds and the CPU utilization statistics. In particular, a higher CPU utilization threshold led to better CPU utilization statistics for the CPU-based elasticity rules. CPU-based rules with a lower CPU utilization threshold resulted in adding more servers than rules with a higher threshold. This was because the higher the CPU utilization threshold, the longer it took to add a new server, and hence the better the CPU utilization statistics. Examining the SLA-based rules, we can observe that SLA90 resulted in better CPU utilization statistics than SLA95. This is because SLA90 added fewer servers than SLA95, and a smaller number of servers led to better server utilization.

By examining both metrics, we can observe that the SLA-based rules resulted in better SLA satisfaction than the CPU-based rules, but at the expense of lower CPU utilization. On the other hand, the CPU-based rules provided more balanced SLA satisfaction and CPU utilization statistics. For both types of elasticity rules, we can also notice that achieving better SLA satisfaction statistics influenced how well the servers' CPUs were utilized. Therefore, higher SLA satisfaction was achieved at the expense of lower CPU utilization of the instantiated servers.


Table 5.4: Resulting Metrics of CPU-based and SLA-based Elasticity Experiments

Exp./Metric    Resp. Time (ms)    %Served Reqs.    Servers Cost ($)
CPU75          351                99.996           1.04
CPU80          404                99.999           0.80
CPU85          372                99.998           0.80
SLA90          299                99.998           1.68
SLA95          295                99.999           2.40

Table 5.4 presents the other metrics collected from the CPU-based and SLA-based elasticity rule experiments, namely the average end-to-end response time, the percentage of served requests and the resulting server costs. From this table, a number of important observations about rule performance can be drawn. First, the SLA-based rules performed better in terms of response time than the CPU-based elasticity rules. The improvement in average response time achieved by SLA-based elasticity ranges between 16% and 27%. Second, the differences in average response time between the CPU-based elasticity rules are larger than those between the SLA-based elasticity rules. This means that the response time is more sensitive to CPU utilization thresholds than to SLA thresholds. Third, the percentage of served requests is very high for all the CPU-based and SLA-based elasticity rules, meaning that only very few requests were not served. This demonstrates that both types of elasticity rules are reliable and that changing the CPU and SLA thresholds has very little impact on the number of served requests. Fourth, the CPU-based elasticity rules resulted in server costs much lower than those of the SLA-based rules. The difference in server costs ranges between 52% and 66%, which is significant.

When comparing the server costs of rules of the same type, we can see that the CPU-based elasticity rules have smaller cost differences than the SLA-based rules; SLA95 resulted in almost double the server costs of the SLA90 rule.


One might expect that, with a lower CPU trigger threshold, CPU75 should have resulted in higher costs. However, this is a function of the workload sensitivity and the time required to bring new servers online. Since CPU75 triggers scale-out actions early, a new server was able to start up and participate in the load distribution by the time the utilization reached dangerous levels. We can notice that the CPU80 and CPU85 rules added more servers than CPU75 and therefore resulted in higher server costs. With the SLA-based rules, a new instance was started while the SLOs were already being violated. Furthermore, we observed that the time needed to instantiate a new server and for it to start serving requests was almost equal to the monitoring time window specified in the elasticity rules (i.e., 5 minutes). This effect did not occur with the CPU-based rules, as new servers were added at an earlier stage. Therefore, with the SLA-based rules, multiple new servers were instantiated at a time, and hence the higher costs.

Evaluating Performance of SLA-based and CPU-based Elasticity on Different Cloud Server Instances

According to our evaluation methodology described in section 5.3.2, we chose CPU75 and SLA90 as the best performing elasticity rules of the CPU-based and SLA-based approaches, respectively. We ran three experiments with each rule on both small and medium cloud server instances.

Figure 5.6 shows the SLA satisfaction and CPU utilization statistics resulting from the CPU75 and SLA90 experiments on both small and medium cloud server instances. As can be seen, both elasticity rules achieved better SLA statistics on servers with medium capacity than on servers with small capacity (Figure 5.6(a)). The SLA90 and CPU75 rules scored average SLA satisfaction on medium server instances 8% and 3% higher, respectively, than the values scored on small server instances. Both rules also achieved better Q0-Q4 statistics. Clearly, SLA90 achieved better SLA satisfaction than CPU75 on medium server instances.

The SLA results demonstrate that the increase in server capacity contributed to improving SLA satisfaction with both the SLA-based and CPU-based rules.

[Figure 5.6: Experimental Results of Elasticity Rules on Servers with Different Capacity Profiles. (a) SLA Satisfaction box plot and (b) CPU Utilization box plot (0-100%) for CPU75 and SLA90 on small and medium servers.]

The cloud servers at the application tier were able to serve more requests when the workload increased, and therefore higher rates of SLA satisfaction were achieved. Another important factor that contributed to achieving better SLA satisfaction is the difference in server lag time (the time needed to start a server until it becomes operational) between small and medium server instances. We observed that medium server instances required about half the lag time of small instances. On average, it took a medium instance about 2 minutes (compared to about 4 minutes for a small instance) to become operational and start serving requests. As a result, the time during which the SLA was violated and the CPU utilization exceeded the specified threshold was shorter on medium server instances, and this resulted in better SLA satisfaction.

In contrast to SLA satisfaction, both the SLA90 and CPU75 elasticity rules resulted in CPU utilization statistics on medium server instances considerably lower than those on small instances (see Figure 5.6(b)). Noticeably, medium server instances were 19% and 32% less utilized than small server instances for CPU75 and SLA90, respectively. Both elasticity rules also achieved better CPU utilization in terms of the other statistics on small instances. We can also see that SLA90 had larger drops in its CPU utilization statistics than the CPU75 elasticity rule. The drop in CPU utilization with both elasticity rules on medium servers was due to the fact that medium server instances have double the CPU capacity of small server instances. Each medium server added by an elasticity rule is equivalent to two CPUs of small server capacity, and the CPUs are therefore less utilized.

Table 5.5 summarizes the other metrics collected from the SLA90 and CPU75 elasticity experiments on both small and medium cloud servers. Similar to the SLA satisfaction results, both SLA90 and CPU75 achieved better average end-to-end response times on medium instances. Furthermore, SLA90 scored a higher average response time improvement, about 31%, compared to the CPU75 response time improvement of about 13%. Therefore, we can see a considerable improvement in end-to-end response time with medium server instances.


Table 5.5: Statistics of SLA and CPU-based Rules on Small & Medium Servers

Exp./Metric     Resp. Time (ms)   %Served Reqs.   Servers Cost ($)
CPU75-Small     321               99.999          1.07
CPU75-Medium    279               100             0.48
SLA90-Small     357               99.999          1.55
SLA90-Medium    245               99.999          1.60

In terms of the percentage of served requests, both rules, SLA90 and CPU75, resulted in almost the same percentage of served requests on both small and medium cloud server instances. This means both server types can be very reliable, as only very few requests were not served.

We calculated the servers usage cost shown in Table 5.5 using Algorithm 3. The algorithm takes as input the number-of-servers data recorded over the duration of the experiment. It computes the total number of hours that each server was used during the experiment and then multiplies them by the hourly charges ($0.08 and $0.16 for small and medium server instances respectively). We followed the AWS charging model, in which servers that are successfully started and used for at least one minute are charged for a complete hour, even if they have not been used for the full hour.
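To make the charging model concrete, the following is a minimal Python sketch of the hourly rounding logic described above; the function name and the per-server input format are illustrative, and the actual Algorithm 3 operates on the number-of-servers time series rather than on per-server durations.

```python
import math

# Hourly on-demand charges used in our experiments.
PRICE_PER_HOUR = {"small": 0.08, "medium": 0.16}

def servers_usage_cost(usage_minutes, instance_type):
    """Estimate total servers cost from per-server usage durations.

    usage_minutes: one entry per started server, giving how many
    minutes it ran. Follows the AWS charging model assumed here:
    a server used for at least one minute is billed in whole hours,
    i.e., a partial hour is charged as a complete hour.
    """
    hourly = PRICE_PER_HOUR[instance_type]
    total = 0.0
    for minutes in usage_minutes:
        if minutes >= 1:  # started and actually used
            total += math.ceil(minutes / 60) * hourly
    return total

# Example: servers used for 55, 61 and 120 minutes are billed
# 1 + 2 + 2 = 5 hours, i.e. 5 * $0.08 = $0.40 on small instances.
print(servers_usage_cost([55, 61, 120], "small"))
```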

In terms of servers cost, the CPU75 elasticity rule achieved better cost savings on medium server instances, nearly half of the servers cost on small instances. In contrast, the SLA90 rule resulted in a cost on medium servers slightly higher than the cost on small server instances. We see two main reasons behind the lower servers cost in the case of the CPU75 elasticity rule. First, in the initial server state the CPU capacity was doubled (2 virtual cores with medium servers). This helped to stabilize server performance from the start, and therefore increasing workload was addressed beforehand, unlike in the small server instances case. Second, a 75% threshold cannot be reached and maintained easily with medium instances due to the doubled CPU capacity. This especially holds as additional CPU capacity was offered before the workload increased, and therefore

less stress was put on CPU utilization. Both factors contributed to the CPU75 rule adding fewer servers and therefore reduced servers cost. However, this was not the case with the SLA90 rule on medium servers. The main difference here is that SLA satisfaction was the threshold upon which adding new servers was decided. SLA violations occurred before the 75% CPU utilization threshold was reached, and therefore SLA90 triggered the addition of new servers a number of times and hence resulted in higher servers cost. This shows that a CPU utilization threshold, and hence CPU-based rules, would not be a sufficient indicator of how well the SLA is satisfied, or a sufficient basis for scaling decisions. Figure 5.6(a) demonstrates this, as it shows that SLA90 achieved better SLA satisfaction, particularly on medium instances.

Evaluating Performance Consistency of SLA-based and CPU-based Elasticity

In this section, we analyse performance variability of CPU–based and SLA-based elasticity rules. Similar to the previous group of experiments, we run six ex- periments at different days and times, three with each of CPU75 and SLA90 elasticity rules. For both CPU75 and SLA90 elasticity rules, we grouped the box plots into two groups; experiments with small server instances and experiments with cloud medium instances. The juxtaposition of box plots here provides a good way to investigate if there are differences between the SLA satisfaction and CPU utilization data sets [135] obtained from individual experiments. We also analyse variability of performance in terms of average end-to-end response time and servers cost.
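As an aside, the juxtaposed box plots used throughout this section are straightforward to reproduce; the short Python sketch below shows the idea with made-up placeholder samples (the labels mirror our experiment names, but the values are illustrative only, not measured data).

```python
import matplotlib.pyplot as plt

# Illustrative placeholder samples of per-experiment SLA satisfaction;
# real values would come from the experiment logs.
data = {
    "CPU75-1": [0.90, 0.93, 0.95, 0.97],
    "CPU75-2": [0.91, 0.94, 0.95, 0.96],
    "CPU75-3": [0.86, 0.90, 0.93, 0.95],
    "SLA90-1": [0.93, 0.95, 0.96, 0.98],
    "SLA90-2": [0.94, 0.95, 0.97, 0.98],
    "SLA90-3": [0.93, 0.96, 0.97, 0.98],
}

# One box per experiment, side by side, to expose run-to-run variability.
plt.boxplot(list(data.values()), labels=list(data.keys()))
plt.ylabel("% SLA Satisfaction")
plt.xlabel("Evaluated Elasticity Rules")
plt.show()
```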

Figure 5.7 summarizes the SLA satisfaction box plots resulting from the three experiments we ran with both the CPU75 and SLA90 elasticity rules on (a) small server instances and (b) medium server instances. Figure 5.7(a) shows that both CPU75 and SLA90 achieved almost consistent SLA satisfaction statistics across the three box plots of each elasticity rule. On medium server instances (Figure 5.7(b)), we noticed that the CPU75 experiments exhibited slightly higher variation in SLA satisfaction statistics than the SLA90 experiments.

[Figure 5.7: Performance Consistency of SLA-based and CPU-based Elasticity Rules on Small and Medium Servers. (a) Small Server Instances: CPU75 & SLA90; (b) Medium Server Instances: CPU75 & SLA90.]

Particularly, CPU75 resulted in up to a 6% difference in average SLA satisfaction, compared to up to a 2% difference resulting from SLA90. The third experiment of CPU75 shows mean, median, 75th percentile and minimum SLA satisfaction lower than the first and second experiments.

Figure 5.8 summarizes the CPU utilization box plots resulting from the three experiments of each of the CPU75 and SLA90 elasticity rules on (a) small server instances and (b) medium server instances. As can be noticed from Figure 5.8(a), both CPU75 and SLA90 resulted in CPU utilization with only slight variations in all statistics on small servers. As shown, there were slight differences in mean and median CPU utilization in the three experiments of both CPU75 and SLA90; the other statistics were quite similar across the three experiments of each elasticity rule. In comparison to small servers, the medium server instance experiments resulted in more performance variation in terms of average, median and 75th percentile CPU utilization with both CPU75 and SLA90 (Figure 5.8(b)). For example, the first experiment of CPU75 had mean CPU utilization and a 75th percentile higher than the other two experiments, while the other statistics were quite similar. Similarly, the third experiment of SLA90 resulted in mean, median, and 25th and 75th percentiles slightly lower than the first and second experiments, whose statistics were quite similar.

Figure 5.9 shows the average response time and servers cost resulting from the three experiments of both CPU75 and SLA90 on small and medium server instances. As shown in Figure 5.9(a), both elasticity rules exhibited slight variations in average response time on small and medium server instances. Among all experiments, the CPU75 experiments on medium instances scored the highest variation, reaching a 0.13% difference in average response time. Both CPU75 and SLA90 scored lower average response time variation on small server instances than on medium server instances. Furthermore, SLA90 resulted in lower variations on both small and medium servers than the corresponding ones with CPU75.

[Figure 5.8: Performance Consistency of CPU Utilization of Elasticity Rules. (a) Small Server Instances: CPU75 & SLA90; (b) Medium Server Instances: CPU75 & SLA90.]

[Figure 5.9: Performance Consistency Results of CPU and SLA-based Elasticity Rules on Small and Medium Servers. (a) Average Response Time of CPU75 & SLA90 on Small and Medium Instances; (b) Servers Cost of CPU75 & SLA90 on Small and Medium Instances.]


In terms of servers cost (Figure 5.9(b)), both elasticity rules, CPU75 and SLA90, exhibited slight variations on both small and medium server instances, except CPU75 on medium instances. The servers cost of the CPU75 experiments on medium servers was identical across runs, i.e., highly consistent. The CPU75 experiments on small servers scored a lower variation (0.02%) than SLA90 on small and medium servers. The variations resulting from SLA90 on both small and medium server experiments were minor, about 0.1% in both cases. We noticed that this variation was due to differences in when a server was added in the three experiments, which resulted in 1-2 hours more or less server usage.

It is important to note that the servers usage cost in Figure 5.9(b) shows the variation in cost resulting from three experiments; the aim of these experiments is to evaluate how consistent the servers cost is on small and medium server instances. Meanwhile, the servers cost shown in Table 5.5 represents a fixed cost resulting from only one experiment, with the purpose of comparing servers cost on small and medium instances.

The other metric which we used to compare the performance variation of small and medium server instances is the percentage of served requests. We noticed very minor variations, ranging from 0.001% to 0.002%, in all elasticity experiments on both small and medium instances. Therefore, the performance of CPU75 and SLA90 is highly consistent in terms of the percentage of served requests.

5.4 Discussion

In this section, we discuss the key findings of our experiments and data analysis in the context of the research aims and contributions presented in sections 1.4 and 1.5. We divide the discussion into three main categories, following the experiment categories presented in section 5.3.3.


Evaluating SLA-based and CPU-based Elasticity

Each elasticity approach achieved better performance than the other in terms of some, but not all, metrics. Specifically, SLA-based elasticity performed better than CPU-based elasticity in terms of SLA satisfaction and end-to-end average response time. Meanwhile, CPU-based elasticity had better performance in terms of CPU utilization and servers cost. Both elasticity approaches had almost equivalent performance in terms of the percentage of successfully served requests. By examining the performance data of both elasticity experiments, we can conclude that CPU-based elasticity achieved a better performance-to-cost trade-off than SLA-based rules. Noticeably, CPU75 scored fairly comparable SLA satisfaction and average response time while keeping servers cost low. SLA-based elasticity rules achieved slightly higher SLA satisfaction and average response time, but at the expense of high servers cost.

Based on these observations, we can infer that CPU-based elasticity rules are more sensitive to application workload changes than SLA-based rules, and therefore they trigger adding new servers (scale-out actions) earlier than SLA-based rules. This is because an increase in workload puts load on the CPU first, and this load is only then translated into a degradation in application response time; therefore SLA-based rules took longer to trigger adding new servers while SLOs were being violated. Furthermore, the average time needed by a server (added by a scale-out action) to start serving requests was about 4 minutes, which is close to the cool-down time (7 minutes in our experiments). In the case of SLA-based elasticity, this did not provide enough time to improve SLA satisfaction, and hence SLOs were quickly violated again and a new scale-out condition was satisfied, triggering the addition of another server. This was not the case with CPU-based elasticity, as servers were added before SLOs became severely violated, and hence there was enough time to improve SLA satisfaction once a server was added. As a result, SLA-based elasticity needed to add more servers to reduce SLA violations, which increased the servers cost. In this direction, further investigation would be useful to improve the cost-effectiveness of SLA-based elasticity while

maintaining SLA satisfaction, response time and CPU utilization at appropriate levels. In the case of CPU-based elasticity, further investigation of how to improve the SLA satisfaction and response time metrics while maintaining appropriate servers cost would be interesting future work as well.

Another important inference that can be drawn from the results of the CPU-based and SLA-based experiments is the impact of changing metric thresholds on the performance and cost metrics. Increasing or decreasing the CPU utilization threshold in CPU-based rules and the SLA satisfaction threshold in SLA-based rules influenced all metrics. Noticeably, the highest impact was on servers cost, which increased by up to 30% and 23% when the thresholds were increased in SLA-based and CPU-based rules respectively. There was also an increase of up to 13% in average response time with CPU-based rules when the CPU utilization threshold was increased. Similarly, average CPU utilization increased by 11% when the CPU threshold was increased to 85%, and it decreased when the SLA threshold was increased to 95%.

The experimental evaluation and analysis of SLA-based and CPU-based elasticity rules address the first sub-aim: evaluating the performance of resource-level and SLA-level elasticity based on different CPU utilization and SLA satisfaction thresholds. We have presented empirical data on how well SLA-based and CPU-based elasticity perform in terms of the important metrics, namely SLA satisfaction, CPU utilization, end-to-end response time, servers cost and percentage of served requests. We have also presented a trade-off analysis of the performance of both elasticity approaches and discussed their strengths and weaknesses in terms of all these metrics. Furthermore, we have analysed the impact of changing the CPU utilization and SLA satisfaction thresholds on the performance of CPU-based and SLA-based elasticity in terms of these metrics.

Cloud consumers need to choose an appropriate elasticity approach and its thresholds depending on which metrics they want to optimize. Our performance evaluation and analysis equip cloud consumers with practical guidelines and inferences for defining appropriate elasticity rules that meet the performance and cost

metrics for their internet business applications running on an IaaS cloud. The performance analysis also provides cloud consumers with empirical evidence of the factors that influence achieving performance and cost metrics, such as using SLA satisfaction and CPU utilization thresholds and changing their values, in addition to server provisioning time and cool-down time.

One of the key lessons that can be derived from our experiments is that SLA and CPU thresholds affect application performance, CPU utilization and servers usage cost differently. Cloud consumers are therefore challenged with determining the impact of CPU and SLA thresholds on application and resource metrics. Our analysis provides insights into how SLA and CPU thresholds impact the important performance and cost metrics. Another key lesson is the essential need for modeling SLA-based auto-scaling. Empirically considering all possible thresholds and different types of application workloads would be impracticable; therefore, there is an essential need for modeling SLA-based elasticity and the related performance metrics. We see this as a key item of future work to further support cloud consumers in analysing and understanding the effect of different thresholds and application workloads on performance and cost metrics.

One crucial outcome of our performance evaluation in this chapter is the design and development of an SLA-based elasticity approach. We have designed and developed an architecture and methods for auto-scaling n-tier applications running on an IaaS cloud based on application SLA metrics. Our SLA-based elasticity primarily consists of a monitoring component that performs real-time monitoring of the response time SLA at any tier of web applications running on an IaaS cloud. It also contains an elasticity controller that evaluates and controls elasticity conditions and decisions based on the response time SLA computed by the monitoring component. These components provide cloud consumers with a way to define elasticity rules to automatically scale-out and scale-in a pool of cloud servers, with auto-scaling decisions based on the application response time SLA monitored in real-time in the cloud environment.
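To illustrate the control loop, the following is a minimal sketch of an SLA-driven scale-out/scale-in decision cycle, assuming a 90% SLA satisfaction threshold (as in SLA90) and the 7-minute cool-down used in our experiments; the function names, the scale-in margin and the one-minute polling interval are illustrative placeholders rather than the exact interfaces of our controller.

```python
import time

SLA_THRESHOLD = 0.90    # e.g. SLA90: scale out below 90% satisfaction
SCALE_IN_MARGIN = 0.98  # assumed margin: scale in only when SLA is safely met
COOL_DOWN = 7 * 60      # cool-down time in seconds between scaling actions

def elasticity_controller(get_sla_satisfaction, add_server, remove_server,
                          min_servers=1):
    servers = min_servers
    last_action = 0.0
    while True:
        sla = get_sla_satisfaction()        # fraction of requests meeting SLO
        now = time.time()
        if now - last_action >= COOL_DOWN:  # respect the cool-down window
            if sla < SLA_THRESHOLD:
                add_server()                # scale-out on SLA violation
                servers += 1
                last_action = now
            elif sla > SCALE_IN_MARGIN and servers > min_servers:
                remove_server()             # scale-in when SLA is safely met
                servers -= 1
                last_action = now
        time.sleep(60)                      # one-minute monitoring interval
```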

Together with our experimental approach to elasticity rules, SLA-based elasticity equips cloud consumers with a platform for evaluating the performance of elasticity rules for their web applications running on an IaaS cloud, and for choosing appropriate thresholds that satisfy their performance and cost metrics. The application SLA monitoring component can also be utilized together with CPU-based elasticity to evaluate the performance impact of CPU thresholds on the application SLA, and to choose CPU thresholds that meet desired SLA levels.

Evaluating SLA-based and CPU-based Elasticity on Cloud Servers of Different Capacity Profiles

Our experiments with different cloud server instances provided empirical evidence of the impact of server capacity profiles on the performance of both CPU-based and SLA-based elasticity. In particular, CPU-based elasticity achieved better SLA satisfaction, average response time and servers cost on medium cloud servers. Likewise, SLA-based elasticity achieved better performance on medium cloud servers in all metrics except servers cost. The most noticeable improvement with CPU-based elasticity is the cost on medium cloud servers, a reduction of more than 50% in comparison to the servers cost on small instances. For SLA-based elasticity, the average end-to-end response time is the biggest improvement achieved on medium server instances, a 31% improvement over the average response time on small cloud servers. Both elasticity approaches had lower CPU utilization on medium server instances in comparison to small instances. There was almost no impact on the percentage of served requests with either CPU-based or SLA-based elasticity, as both approaches scored very similar figures.

The analysis of the performance of both elasticity approaches on small and medium cloud servers brings up a number of points worth discussing. First, increasing the capacity of cloud servers resulted in considerable performance improvements in some metrics but not all. The average CPU utilization dropped 13% and 32% on medium servers with the CPU75 and SLA90 elasticity rules respectively. Furthermore, SLA90 resulted in slightly higher servers cost on medium servers in comparison to the small servers cost. Therefore, cloud consumers need to be

aware that increasing server capacity will not necessarily lead to improvements in all metrics. Our empirical experiments showed which metrics were influenced by changing server capacity, and the ratios of improvement. If certain metrics are more important than others, cloud consumers can approximate potential performance improvement patterns using our empirical data and trade-off analysis. Second, both elasticity approaches followed performance patterns similar to those on small instances. Specifically, SLA-based elasticity performed better than CPU-based elasticity in terms of SLA satisfaction and average response time on both small and medium cloud servers. Meanwhile, CPU-based elasticity achieved better performance in terms of average CPU utilization and servers cost. This shows that even an increase in server capacity does not improve the performance of those metrics that under-perform on small servers. Therefore, further investigation is required to determine other factors that may improve the trade-off between all metrics, such as dynamically tuning the elasticity thresholds and parameters defined in elasticity rules.

Third, using cloud server instances of higher capacity requires careful trade-off considerations. In our experiments, a medium server instance, which has double the CPU capacity of a small server instance, resulted in better SLA, end-to-end response time and cost metrics but lower CPU utilization. The capacity of a medium cloud server is equivalent to two small servers, which means adding a medium server via elasticity rules is equivalent to adding two small servers. This contributed to the low CPU utilization under both CPU-based and SLA-based rules, and hence the higher cost with the latter, although it improved the performance of the other metrics.

The addition of two small servers' worth of capacity at the moment a scale-out condition is satisfied might not be necessary. Instead, a gradual increase of one small server at a time could be more cost-effective in terms of servers cost and CPU utilization. One advantage of using medium cloud servers is the server provisioning time, which was half the provisioning time of small servers. This improved SLA satisfaction and response time with medium cloud servers, as it reduced the time during which servers were under high workload and the SLA was violated. If the server

provisioning time of small instances could be halved, then we would expect further improvements in the SLA and response time metrics with small server instances.

Evaluating Performance Consistency of SLA-based and CPU-based Elasticity

Our third group of experiments showed that both elasticity approaches exhibited slight performance variations in all but a few metrics. CPU-based elasticity showed complete consistency in servers cost on medium cloud servers. In addition, CPU-based and SLA-based elasticity demonstrated almost consistent performance in terms of the percentage of served requests. The performance variations in all other metrics on medium cloud server instances were slightly higher than those on small server instances. In comparison to SLA-based elasticity, CPU-based elasticity exhibited slightly higher variations in average SLA satisfaction and average end-to-end response time on medium servers (about 6% and 0.13% respectively). In addition, SLA-based elasticity showed similar variation in cost on both small and medium cloud servers (about 0.1%).

We attribute the performance variations of CPU-based and SLA-based elasticity on small and medium servers to a number of factors. First, slight differences in the emulated workload: the TPC-W workload generator [131] we utilized requires several variables to calculate the number and type of requests to be emulated by each user session, and also uses probability functions to emulate the number and type of requests. This contributed to some variation in the number and type of requests generated in each repeated experiment. Second, slight changes in the amount of resource capacity allocated each time a server is instantiated by a scale-out action. As resource allocation in cloud environments is primarily driven by multi-tenancy and virtualization technology, the allocated capacity is unlikely to be exactly the same each time. Cloud providers employ automated mechanisms to determine when and how much resource capacity is shared between the different applications hosted on their cloud, so they can maximize their resource utilization and minimize costs [146; 147]. The performance variability of cloud infrastructure resources

reported in a number of studies such as [36; 51; 52; 53] provides empirical evidence of variations in the performance of allocated resources from the same cloud provider.

Although our experiments showed slight performance variations in both elasticity approaches, we believe these variations are minor and would not have a major impact on the reliability of elasticity rule performance. As discussed above, one of the factors contributing to such variations is the performance variability of the offered cloud resources. Cloud consumers cannot control this factor, as they have no control beneath the virtualization layer of the physical resources. Therefore, cloud consumers can summarize the key performance metrics along with a variation percentage that represents a range within which performance values may occur, rather than using exact values. For example, average SLA satisfaction can be represented as an interval such as [88%, 90%], read as: the average SLA satisfaction lies between these two values, as there could be a variation of 2% because of the performance variability of cloud resources.
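One simple way to derive such an interval from repeated runs is sketched below; the function is illustrative and simply reports the observed range and mean of a metric across experiments.

```python
def metric_interval(samples):
    """Summarize a metric from repeated experiments as a range,
    e.g. [0.88, 0.90] for SLA satisfaction, instead of one figure."""
    return min(samples), max(samples), sum(samples) / len(samples)

lo, hi, mean = metric_interval([0.88, 0.90, 0.89])
print(f"average SLA satisfaction in [{lo:.0%}, {hi:.0%}] (mean {mean:.1%})")
```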

We believe that the evaluation and trade-off analysis of performance consistency provide empirical evidence of how CPU-based and SLA-based elasticity might vary in terms of key application and resource metrics. It also shows that these performance variations are tolerable. Therefore, we believe cloud consumers can rely on the resulting performance figures to define appropriate elasticity thresholds that approximate how well important metrics will be met with a given SLA-based or CPU-based elasticity rule.

Chapter 6

CONCLUSIONS AND FUTURE WORK

In this chapter we draw the key conclusions from the research work carried out in this thesis. In particular, we summarize the key challenges with respect to IaaS elasticity approaches and their performance for multi-tier cloud-based applications. We discuss how the research work achieved in this thesis can help address these challenges, and how it contributes to the related research areas (section 6.1). In section 6.2, we discuss the main research limitations that could pose threats to the validity of our claimed contributions. We then describe potential future research work that could improve and/or complement our proposed research work.

6.1 Summary

One important benefit of running e-business applications on an IaaS cloud is satisfying Service Level Objectives (SLOs), specified in Service Level Agreements (SLAs), through efficient use of computing resources. Modern enterprises are increasingly running their business applications on such IaaS clouds to utilize elasticity benefits and maintain appropriate SLA levels. SLAs are significant for businesses to maintain appropriate levels of service quality and to reduce the

potential financial and non-financial losses that can result from violating SLOs. The dynamic provisioning and de-provisioning of IaaS computing resources can support cloud consumers, i.e., business organizations, in achieving this goal. Most IaaS providers, such as Amazon Web Services (AWS), provide elasticity (or auto-scaling) mechanisms to dynamically resize computing resources on-demand. However, elasticity benefits are not inherently realized by deploying such applications on IaaS clouds. Instead, cloud consumers face a number of challenges that demand appropriate methods and approaches to maintain application SLAs in efficient ways.

In the context of this thesis, we have proposed and developed models, methods and algorithms that can help cloud consumers address important elasticity challenges when running their business applications on IaaS clouds. The following summarizes the main output of our research in terms of the challenges addressed and the contributions made to tackle them.

Modeling and simulating performance of IaaS elasticity

Currently, the commonly used mechanisms for controlling elasticity are driven by rules that trigger scale-out and scale-in actions based on thresholds and parameters. One of the key challenges facing cloud consumers here is to decide on appropriate threshold and parameter values that lead to cost-effective elasticity in terms of application and resource metrics. In this thesis, we have proposed analytical models, based on queuing theory, that emulate the behavior of the CPU-based elasticity mechanism for multi-tier applications running on an IaaS cloud. These models capture key parameters and thresholds of elasticity rules such as CPU utilization thresholds, monitoring time windows and cool-down time. In addition, our models approximate a number of metrics that are important for analysing the performance of elasticity rules, including CPU utilization, the application's response time, the number of servers triggered by elasticity and the cost of server usage. Based on these models, we have also proposed CPU-based elasticity algorithms that simulate when and how to trigger scale-out and scale-in actions. Furthermore, we have presented a cost algorithm that estimates the

hourly server usage costs that could be incurred from such scale-out and scale-in actions.
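For reference, with the symbols listed in Appendix B, the standard M/M/m forms of the two central quantities are given below; this is the textbook approximation, not necessarily the exact equations developed in chapter 4.

```latex
% Average per-server CPU utilization over time interval t:
U_t = \frac{\lambda_t}{m_t \, \mu_t}

% Mean response time, where C(m_t, \lambda_t / \mu_t) is the Erlang-C
% probability that an arriving request has to queue:
R_t = \frac{1}{\mu_t} + \frac{C\left(m_t, \lambda_t / \mu_t\right)}{m_t \, \mu_t - \lambda_t}
```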

We have implemented our elasticity models and algorithms in Matlab. Using this implementation, we have conducted simulation experiments with elasticity rules using different CPU utilization thresholds. We have validated the performance metrics resulting from the simulation experiments against the same metrics resulting from corresponding empirical experiments, which we conducted with an e-commerce application running on the Amazon IaaS cloud. The validation of our models and algorithms has demonstrated their feasibility and accuracy in approximating the behavior and performance of rule-based mechanisms based on CPU utilization metrics. Based on our validation and experiments, a number of important observations can be drawn.

• First, our models and algorithms have been able to emulate the behavior of CPU-based elasticity rules to a fairly high extent. In particular, they have been able to trigger scale-out and scale-in actions in response to workload changes that mimic the auto-scaling actions triggered in a real production environment. There were some delays in triggering auto-scaling actions, especially at the start of experiments, that influenced the accuracy of our models in matching the real auto-scaling behavior.

• Second, the simulation results demonstrated reasonable accuracy of our elasticity models and algorithms in approximating application and resource metrics. The difference between simulation and empirical data ranges between 1.52% and 6.69% for average CPU utilization, 11 and 34 milliseconds for average response time, 1 and 2 servers for the number of servers triggered, and $0.08 and $0.24 for the server costs. The main aim of any performance modeling is to approximate system behavior and the related performance-cost metrics, not to reproduce exact behavior or values. With this in mind, we can see that our models approximated the CPU-based elasticity behavior and the corresponding metrics with only slight deviation from the real behavior and metrics.

183 6. Conclusions and Future Work

• Third, our elasticity models have shown the ability to emulate the trends and relationship between the CPU utilization threshold specified in elasticity rules and the application and resource metrics. Specifically, our simulation showed higher average CPU utilization and average response time as the CPU threshold increases. It also showed a decrease in the number of servers and the related costs when the CPU utilization threshold increased. Both observations matched the trends and relationships between CPU utilization and application and resource metrics obtained from empirical experiments in a real IaaS cloud environment.

• Fourth, our IaaS elasticity models and algorithms provide a tool that can support cloud consumers in defining appropriate elasticity thresholds for their application workloads. The use case we presented in chapter 4, section 4.4, demonstrated a representative scenario in which cloud consumers can define appropriate elasticity thresholds to meet their resource and application performance and cost metrics. In particular, our elasticity model simulations have been used to analyse the impact of changing elasticity thresholds on the performance and cost metrics that are important to satisfy as part of SLOs.

One of the key lessons learned from our modeling is that it is possible to model how CPU-based elasticity works and to approximate the impact of thresholds on important performance and cost metrics. Although our models exhibited some variations in approximating CPU-based elasticity and the related metrics, they have highlighted the need for further improvements to the queuing models to capture more complex parameters. Our CPU-based models have shown the ability to capture the general trends and patterns between elasticity thresholds and their impact on performance and cost metrics. As it is impracticable to evaluate all possible thresholds and different workload types, our models and algorithms can provide cloud consumers with a method to evaluate the impact of thresholds on desired application performance and cost metrics.

184 6. Conclusions and Future Work

Evaluating performance of elasticity approaches

Elasticity of an IaaS cloud can be driven by either resource or application metrics. Among the most common approaches are CPU-based and SLA-based elasticity, in which auto-scaling decisions are triggered based on CPU utilization and application SLA metrics respectively. The main challenge here is to decide how well each elasticity approach performs from the cloud consumer's perspective. To address this challenge, we have carried out an empirical performance evaluation of both elasticity mechanisms under different IaaS cloud scenarios. In the first evaluation scenario, we empirically evaluated IaaS elasticity based on different CPU utilization and application response time SLA thresholds on a 3-tier e-commerce application running on the Amazon IaaS cloud. Our evaluation experiments provided empirical evidence of how well both elasticity mechanisms perform, based on a trade-off analysis of application and resource metrics. From this analysis, we have demonstrated that there are trade-offs with either elasticity mechanism. Specifically, SLA-based elasticity achieved slightly better SLA satisfaction and end-to-end response time than CPU-based elasticity. CPU-based elasticity, on the other hand, resulted in higher server CPU utilization and lower servers cost.

This result brought forward important take-aways regarding both elasticity approaches. In particular, the main factor influencing how well each of the metrics is satisfied was the number of servers triggered by scale-out and scale-in actions over time. SLA-based elasticity triggered a higher number of servers, which achieved higher SLA satisfaction and better response time, but at higher servers cost and lower CPU utilization. In contrast, CPU-based elasticity was able to reduce servers cost and improve the CPU utilization of all servers, but this affected the SLA satisfaction and response time metrics. We have inferred that the number of servers triggered by each elasticity approach was influenced by a number of factors, primarily sensitivity to workload changes and server provisioning time. CPU-based elasticity rules are more sensitive to application workload changes than SLA-based rules and therefore trigger adding new servers (scale-out actions) earlier than SLA-based rules. Meanwhile, SLA-based elasticity took longer

to satisfy scale-out conditions, due to the fact that increases in workload first influence server CPU utilization and only then the application's metrics. The server provisioning time also impacted SLA satisfaction, as it increased the period of time during which the SLA was violated, and therefore more servers were added in the case of SLA-based elasticity.

Our performance evaluation of both elasticity approaches has also drawn crucial conclusions about the impact of CPU and SLA threshold values on application and resource metrics. Noticeably, the highest impact was on servers cost, which increased by up to 30% and 23% when the thresholds were increased in SLA-based and CPU-based rules respectively. In addition, there was an increase of up to 13% in average response time with CPU-based rules when the CPU utilization threshold was increased. Similarly, the average CPU utilization increased by 11% when the CPU threshold was increased to 85%, and it decreased when the SLA threshold was increased to 95%.

Another important take-away is that although both CPU-based and SLA-based elasticity can be used to satisfy SLA and CPU utilization targets, there is a need for a systematic method to determine how they should be defined. Our CPU-based elasticity models aim to support cloud consumers in analysing the impact of CPU thresholds on performance and cost metrics. With regard to SLA-based elasticity, we see a similar need for determining how SLA thresholds influence such metrics.

In the second evaluation scenario, we considered different cloud server instances and their impact on the performance of SLA-based and CPU-based elasticity. Here, a number of important conclusions can be drawn from the analysis of the resulting data. First, there were noticeable performance improvements for both elasticity mechanisms in some metrics but not all. In particular, CPU-based and SLA-based elasticity achieved better SLA satisfaction and end-to-end response time on medium server instances when compared to small instances. CPU-based elasticity scored the biggest improvement, with a 50% servers cost reduction. The biggest improvement that SLA-based elasticity achieved was a 31% decrease in response time. The average CPU utilization dropped 13% and


32% on medium server instances with the CPU-based and SLA-based elasticity rules respectively. Therefore, cloud consumers cannot assume that increasing server capacity will lead to performance improvements in all metrics. Our empirical experiments showed which metrics were influenced by changing server capacity, and the ratios of improvement. Depending on which metrics matter most, cloud consumers can approximate potential performance improvement patterns based on our empirical experiments. Second, server provisioning time had less effect on the performance of SLA-based and CPU-based elasticity, because medium server instances needed less provisioning time than small instances, and hence improvements in SLA satisfaction and response time were achieved.

The third evaluation scenario is concerned with evaluating how consistent the performance of both SLA-based and CPU-based elasticity is on small and medium server instances. Our empirical evaluation showed slight performance variations in most metrics. CPU-based elasticity showed complete consistency in servers cost on medium cloud servers. In addition, CPU-based and SLA-based elasticity demonstrated almost consistent performance in terms of the percentage of served requests. The performance variations in all other metrics on medium cloud servers were slightly higher than those on small servers. In comparison to SLA-based elasticity, CPU-based elasticity exhibited slightly higher variations in average SLA satisfaction and average response time on medium servers (about 6% and 0.13% respectively). SLA-based elasticity showed similar variation in cost on both small and medium cloud servers (about 0.1%).

We believe these slight variations were primarily driven by two main factors: (i) slight variations in workload generation, due to the probability functions in the TPC-W workload generator program we used, and (ii) the dynamic resource allocation and sharing techniques employed by public IaaS cloud providers. Therefore, we believe the real variations arising from both SLA-based and CPU-based elasticity are minor and would not have a clear effect on performance and cost metrics.


We believe that our evaluation of elasticity under different real-world scenarios can strongly support cloud consumers in defining appropriate auto-scaling rules that satisfy their application and resource metrics. The performance of CPU-based and SLA-based elasticity has been quantified in terms of metrics that are crucial for cloud consumers, including application SLA satisfaction and end-to-end response time, and servers' CPU utilization and usage cost. The trade-off analysis provides quantified evidence of how well each elasticity approach performs with different SLA and CPU utilization thresholds. Furthermore, the evaluation shows quantified evidence of the impact of capacity profiles and the potential performance variability of cloud servers on the performance of SLA-based and CPU-based elasticity. Collectively, this evidence shows that there are fairly considerable dynamics between the thresholds and the application and resource metrics, which influence realizing optimal performance and cost constraints. In addition, other influencing factors such as server provisioning time and server capacity profile have shown an impact on realizing performance and cost metrics. Based on our empirical data, cloud consumers can decide on the appropriate elasticity mechanisms, SLA and CPU utilization thresholds and server instances that meet their desired performance and cost metrics.

SLA-based Elasticity Architecture

Currently, elasticity approaches are mainly driven by resource-level metrics such as CPU and memory utilization. For cloud consumers, it is also crucial to derive elasticity decisions based on application SLA metrics, due to the significance of such metrics for them. In this regard, we have designed and developed an architecture and methods to scale-out and scale-in multi-tier applications running on an IaaS cloud based on an application SLA metric. The architecture primarily consists of an elasticity controller that evaluates and controls elasticity conditions and decisions based on response time SLA. This is computed from data continuously collected by our monitoring components, which reside on each server of the pool of servers under auto-scaling rules. We also developed algorithms that describe how the elasticity controller and monitor components work together to enable automated auto-scaling decisions for multi-tier applications running on an


IaaS cloud. We have implemented our SLA-based architecture and algorithms and used them in our performance evaluation of both the SLA-based and CPU-based elasticity mechanisms.

Our SLA-based elasticity architecture provides cloud consumers with a way to define elasticity rules to automatically scale-out and scale-in a pool of cloud servers based on application SLA metrics, which are monitored in real-time in IaaS cloud environments. Together with our experimental approach to elasticity mechanisms, SLA-based elasticity equips cloud consumers with a useful platform for evaluating the performance of elasticity rules for their web applications running on an IaaS cloud. It also supports cloud consumers in determining appropriate elasticity thresholds that satisfy their performance and cost metrics. The application SLA monitoring component can also be utilized together with CPU-based elasticity to evaluate the performance impact of CPU thresholds on the application SLA, and to choose CPU thresholds that meet desired SLA levels.

6.2 Limitations

This section describes the key limitations that could constrain our research contributions. Such limitations provide a better understanding of potential threats to the validity of our research results. The following summarizes the key limitations of our research.

• The impact of virtualization and resource sharing has not been considered in our elasticity and performance models. In our elasticity modeling, we assumed that servers triggered by scale-out actions have maximum capacity and are equivalent to physical machines. In IaaS cloud environments, however, computing resources are virtualized and shared between different applications. Resource allocation and sharing often aims to maximize the IaaS provider's benefit, and hence could influence the cloud consumer's application performance and cost metrics. Resource sharing and allocation remain beyond the control of cloud consumers, and it would be hard to investigate


their impact on the performance of elasticity.

• Some performance factors such as network latency and memory caching have not been considered in our performance models. These factors could influence the performance metrics employed to evaluate IaaS elasticity. Network latency could increase performance overhead, and hence end-to-end response time; caching could reduce the load on servers' CPU utilization. Modeling these factors could improve the accuracy of elasticity modeling and its performance estimates.

• Our performance modeling of IaaS elasticity is focused on the application tier. The web and database tiers were allocated a fixed maximum capacity in our modeling. We did not model elasticity on all tiers and the dependencies between them. Furthermore, we did not consider the implications of scaling one tier on the other tiers in terms of performance and cost metrics. Modeling each tier with elastic resources using queuing networks could change how elasticity and performance models should be captured for multi-tier applications.

• Our performance modeling approach considers workloads without spike effects. The arrival of requests was assumed to follow a Poisson process, with inter-arrival times distributed exponentially. Workload spikes often result from unexpected increases in workload volume that are very challenging to characterise and model. They could significantly influence how well elasticity can be achieved, and hence an application's performance and cost metrics.

• Similar to the performance modeling, our performance evaluation of elasticity approaches focused on the application tier independently of the other tiers. It is assumed that performance bottlenecks will occur at the application tier, and the workload used in the evaluation was CPU-intensive. The performance bottleneck could shift to the database tier due to the auto-scaling effect on the application tier; this would especially hold for a workload mix with a higher ratio of write operations. Such aspects would influence the evaluation methodology and results.

190 6. Conclusions and Future Work

• The workload mix used in our performance evaluation was considered stationary. Web applications nowadays tend to have non-stationary workload patterns, in which the workload mix (the combination of request types) changes over time. In our evaluation, we considered the browsing mix, which generates similar workload mixes over time, but with changing volumes. A changing workload mix would likely influence performance over time, as different request types put different loads on the CPU and hence take different times to process.

• There might be performance bottlenecks due to the code structure of the application being tested. The code of the TPC-W benchmark (an online bookstore) we used in our evaluation is not proven to be optimized in terms of modularity and communication between application and system components. Furthermore, today's web applications utilize modern internet development technologies that also contribute to improving the responsiveness of application functions at the component level.

• The synchronization of monitoring data with the SLA manager in our SLA-based elasticity architecture involves some delays. The SLA manager reads the monitoring data collected by all server agents at one-minute intervals. In some cases, the SLA manager needs to pause metric collection for a minute to obtain all the monitoring data in the current interval before it starts filtering and evaluating the data based on request types and their SLOs. This introduces some overhead in calculating and publishing the aggregated SLA metrics that need to be evaluated in elasticity conditions. Reducing this overhead could improve the responsiveness of auto-scaling actions by evaluating SLA metrics more accurately in real-time.

6.3 Future Work

Our performance modeling and evaluation of elasticity approaches presented in this thesis have brought forward important research challenges. The research limitations discussed in section 6.2 provide pointers for potential future research

work. Investigating these future research challenges would complement the research contributions introduced in this thesis. The following describes the key research problems that could be interesting to investigate.

Modeling the performance of elasticity provides useful means for cloud consumers to define auto-scaling policies that meet performance and cost metrics for their applications. There is still room for achieving better accuracy in modeling elasticity mechanisms and the related performance parameters. One way to improve elasticity modeling is to investigate and model the impact of the virtualization technologies utilized in IaaS environments on the capacity of allocated computing resources. This would also include investigating the impact of known resource allocation and sharing strategies on realizing cost-effective elasticity in terms of application and resource metrics. Other important factors worth investigating to improve elasticity modeling include incorporating the impact of network latency and caching techniques on auto-scaling and on resource and application metrics. The network could play an important role in influencing the performance of cloud-based applications, due to the changing communication overhead resulting from variable access patterns to computing resources. Furthermore, in a multi-tier application architecture, there is considerable communication between the tiers that could influence the application response time SLA and resource usage metrics, i.e., network cost. It would also be interesting to investigate the impact of caching techniques on reducing the CPU utilization metric, and hence the number of servers triggered by auto-scaling and their cost. Addressing this future work could help in two main areas: (i) improving our models' accuracy in estimating resource metrics, e.g., CPU utilization, and application metrics such as response time and SLA satisfaction; and (ii) improving the accuracy of our elasticity algorithms in estimating when to trigger scale-out and scale-in actions and in estimating servers usage cost.

The performance modeling and evaluation can be further improved by modeling the elasticity performance of the web and database tiers of the application. This requires investigating and modeling when and how to scale all tiers in response to workload changes, and the dependencies between the tiers.

Furthermore, it requires modeling the performance and cost implications on all tiers when scaling decisions are applied to a certain tier.

Our elasticity models and algorithms can be further extended to consider the SLA-based elasticity mechanism and its performance. In our elasticity models, we have been able to approximate the average response time at each time interval based on queuing theory. However, the SLA-based elasticity mechanism we presented requires evaluating each request of a certain type against its SLO in order to compute the SLA satisfaction metric at each time interval. Therefore, one crucial challenge here is to estimate the response time of each request individually, given the different types of requests in the request mix collected in each time interval. This is the core requirement for using our algorithms and other models to derive auto-scaling decisions based on the SLA satisfaction metric.
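For clarity, the per-interval SLA satisfaction metric that such an extension must reproduce can be sketched as below; the names (requests, slos) are illustrative, and in practice the per-request response times are what the extended models would need to estimate.

```python
def sla_satisfaction(requests, slos):
    """requests: iterable of (request_type, response_time_ms) pairs
    collected in one time interval; slos: mapping from request type
    to its response-time SLO in ms. Returns the fraction of requests
    meeting their SLO (1.0 for an empty interval)."""
    met = total = 0
    for req_type, response_time in requests:
        total += 1
        if response_time <= slos[req_type]:
            met += 1
    return met / total if total else 1.0

# Example: 3 of 4 requests meet their SLOs -> 0.75
print(sla_satisfaction(
    [("browse", 250), ("buy", 900), ("browse", 480), ("buy", 400)],
    {"browse": 500, "buy": 800}))
```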

The performance modeling and evaluation of elasticity could also be improved by incorporating models for capturing SLA violations and their cost. This includes deriving utility functions, or using existing ones, to determine when application SLAs are not respected and the financial penalties that cloud consumers have to pay to their application users. Such violations are related to under-provisioning of computing resources. Therefore, it would be beneficial to quantify the financial losses that could be incurred from the failure of the employed elasticity mechanism to trigger scale-out actions, or from any delay in provisioning cloud servers in a timely manner.

APPENDIX A: LIST OF PUBLICATIONS

Below is a list of the papers that I have published during my Ph.D. study.

• B. Suleiman and S. Venugopal, "Modeling Performance of Elasticity Rules for Cloud-based Applications", School of Computer Science and Engineering, University of New South Wales, Tech. Rep. UNSW-CSE-TR-201323, Sep. 2013. [Online]. Available: ftp://ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/201323.pdf

• B. Suleiman and S. Venugopal, "Modeling Performance of Elasticity Rules for Cloud-Based Applications", in Proceedings of the 17th IEEE International Enterprise Distributed Object Computing Conference (EDOC), IEEE Computer Society, Vancouver, Canada, 9-13 Sep. 2013, pp. 201-206. [Online]. Available: http://dx.doi.org/10.1109/EDOC.2013.3

• B. Suleiman, "Elasticity economics of cloud-based applications," in IEEE 9th International Conference on Services Computing (SCC), 2012, pp. 694-695. [Online]. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6274216

• B. Suleiman, S. Sakr, S. Venugopal, and W. Sadiq, "Trade-off analysis of elasticity approaches for cloud-based business applications," in 13th International Conference on Web Information System Engineering (WISE 2012), Nov. 2012, pp. 468-482. [Online]. Available: http://dl.acm.org/citation.cfm?id=2426760

• B. Suleiman, S. Sakr, R. Jeffery, and A. Liu, "On understanding the economics and elasticity challenges of deploying business applications on public cloud infrastructure," Journal of Internet Services and Applications (JISA), vol. 2, no. 3, pp. 1-21, 2011. [Online]. Available: http://link.springer.com/article/10.1007%2Fs13174-011-0050-y

• B. Suleiman, C. E. da Silva, and S. Sakr, "One size does not fit all: A group-based service selection for web-based business processes," in 25th IEEE International Conference on Advanced Information Networking and Applications Workshops (AINA Workshops), IEEE Computer Society, 2011, pp. 253-260. [Online]. Available: http://dx.doi.org/10.1109/WAINA.2011.106

• M. Roy, B. Suleiman, and I. Weber, "Facilitating enterprise service discovery for non-technical business users," in Proceedings of the 2010 International Conference on Service-Oriented Computing (ICSOC'10), Springer-Verlag, 2011, pp. 100-110. [Online]. Available: http://dl.acm.org/citation.cfm?id=1987684.1987697

• M. Roy, B. Suleiman, D. Schmidt, I. Weber, and B. Benatallah, "Using SOA governance design methodologies to augment enterprise service descriptions," in Proceedings of the 23rd International Conference on Advanced Information Systems Engineering (CAiSE'11), Springer-Verlag, 2011, pp. 566-581. [Online]. Available: http://dl.acm.org/citation.cfm?id=2026716.2026771

• B. Suleiman and F. Ishikawa, "A constraint-based approach for developing consistent contracts in composite services," in Proceedings of the 2009 Congress on Services - I (SERVICES '09), IEEE Computer Society, 2009, pp. 392-399. [Online]. Available: http://dx.doi.org/10.1109/SERVICES-I.2009.15


• B. Suleiman, V. Tosic, D. R. Jeffery, and Y. J. Liu, "Models and algorithms for business value-driven adaptation of business processes and software infrastructure," in 31st International Conference on Software Engineering (ICSE Companion), IEEE Computer Society, 2009, pp. 387-390. [Online]. Available: http://dx.doi.org/10.1109/ICSE-COMPANION.2009.5071028

• B. Suleiman and V. Tosic, "Integration of UML modeling and policy-driven management of web service systems," in Proceedings of the 2009 ICSE Workshop on Principles of Engineering Service Oriented Systems (PESOS '09), IEEE Computer Society, 2009, pp. 75-82. [Online]. Available: http://dx.doi.org/10.1109/PESOS.2009.5068823

• B. Suleiman, “Commercial-off-the-shelf software development framework,” in Proceedings of the 19th Australian Conference on Software Engineering, ser. ASWEC ’08. IEEE Computer Society, 2008, pp. 690–695. [Online]. Available: http://dl.acm.org/citation.cfm?id=1395083.1395633

• B. Suleiman, V. Tosic, and E. Aliev, “Non-functional property specifications for WRIGHT ADL,” in Proceedings of the 8th IEEE International Conference on Computer and Information Technology, ser. CIT 2008. IEEE, 2008, pp. 766–771. [Online]. Available: http://dx.doi.org/10.1109/CIT.2008.4594771

• V. Tosic, B. Suleiman, and H. Lutfiyya, “UML profiles for WS-Policy4MASC as support for business value driven engineering and management of web services and their compositions,” in Proceedings of the 11th IEEE International Enterprise Distributed Object Computing Conference, ser. EDOC ’07. IEEE Computer Society, 2007, pp. 157–168. [Online]. Available: http://dl.acm.org/citation.cfm?id=1317532.1318045

• B. Suleiman, E. Aliev, and V. Tosic, “Time-quality metric model for quality measurement of web-based systems,” in 14th Asia-Pacific Software Engineering Conference (APSEC 2007). IEEE Computer Society, 2007, p. 566. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/APSEC.2007.94

APPENDIX B: GLOSSARY

Abbreviations

ICT Information and Communication Technology

IT Information Technology

IaaS Infrastructure as a Service

SaaS Software as a Service

PaaS Platform as a Service

SLA Service Level Agreement

NIST National Institute of Standards and Technology

SLO Service Level Objective

QoS Quality of Service

CI-SLA Cloud Infrastructure Service Level Agreement

CA-SLA Cloud Application Service Level Agreement

AWS Amazon Web Services

EC2 Elastic Compute Cloud

WASABi Windows Azure Autoscaling Application Block

RAM Random Access Memory

CPU Central Processing Unit

API Application Programming Interface

MVA Mean Value Analysis

IID Independent and Identically Distributed

FCFS First Come, First Served

HPC High Performance Computing

RDS Relational Database Service

I/O Input/Output

TPC Transaction Processing Performance Council

TPC-W Transactional Web Benchmark

AMI Amazon Machine Image

ECU EC2 Compute Unit

CRM Customer Relationship Management

DaaS Database as a Service

WS Web Server

AS Application Server

DS Database Server

DB Database

URI Uniform Resource Identifier

IP Internet Protocol

PDA Personal Digital Assistant

TCP/UDP Transmission Control Protocol/User Datagram Protocol

List of Symbols

The following symbols and their explanations are used in the modeling and evaluation of IaaS elasticity rules. A brief worked example of how the core queueing symbols combine follows the table.

Table 1: List of Symbols and their Explanation

Symbol Description

Mθ Metric threshold

M/M/m Multi-server queue model with exponential distribution

G/G/1 Single-server queue model with general distribution

T Time interval length

Tw Time evaluation window of an elasticity condition

Tc Cool-down time of elasticity action

t Time interval during which a system is monitored

τ Sum of all time intervals t

λt Request arrival rate at time interval t

µt Requests service rate during time interval t

mt Number of servers at time interval t

Ut Average CPU utilization at time interval t

Rt Average response time at time interval t

Bt Busy time of application tier servers at time interval t

Bt^m Busy time of a single server m at time interval t

Urθ Utilization ramp-up threshold

Uθ^u Upper CPU utilization threshold for scale-out

Uθ^l Lower CPU utilization threshold for scale-in

Tw^u Upper time window threshold of a scale-out condition

Tw^l Lower time window threshold of a scale-in condition

minc Number of servers to be added by a scale-out action

mdec Number of servers to be removed by a scale-in action

nt Expected number of requests in the system during time interval t

(ns)t Number of requests being served at time interval t

(nq)t Number of requests queuing at time interval t

P0 Probability of zero requests in the system

Pn Probability of n requests in the system

ϱt Probability of queuing at time interval t

Txt Additional serving time due to request mix during time interval t

Di Average demand that a request of type i puts on the CPU

n Number of request types

λit Average request rate of a request of type i during time interval t

Qpi Percentage of requests of type i relative to all other requests

Tslθ Server provisioning lag time threshold

Tc^u Upper cool-down threshold applied after a scale-out action

Tc^l Lower cool-down threshold applied after a scale-in action

Smin Minimum number of servers that scale-in actions may reach

Smax Maximum number of servers that scale-out actions may reach

Sr Server usage charge ($) per hour

Sc Total servers usage cost ($)
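
To show how the core symbols above fit together, the relations below are a brief orienting sketch of the standard M/M/m results from queueing theory (see, e.g., [125]); they are not a restatement of the thesis models, which combine these quantities with the elasticity thresholds listed above:

\[ U_t = \frac{\lambda_t}{m_t\,\mu_t}, \qquad U_t < 1 \]

\[ P_0 = \left[\, \sum_{k=0}^{m_t-1} \frac{(\lambda_t/\mu_t)^k}{k!} \;+\; \frac{(\lambda_t/\mu_t)^{m_t}}{m_t!\,(1-U_t)} \right]^{-1} \]

\[ \varrho_t = \frac{(\lambda_t/\mu_t)^{m_t}}{m_t!\,(1-U_t)}\, P_0, \qquad R_t = \frac{\varrho_t}{m_t\,\mu_t - \lambda_t} + \frac{1}{\mu_t} \]

For example, with λt = 120 requests/s, µt = 50 requests/s per server and mt = 3 servers, these relations give Ut = 0.8, ϱt ≈ 0.65 and Rt ≈ 0.042 s.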

References

[1] D. Rutland, “Cloud: Economics,” http://www.rackspace.com/knowledge_center/whitepaper/cloud-economics, Rackspace Hosting, Tech. Rep. 55, Sep. 2012.

[2] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” in Proceedings of the nineteenth ACM symposium on Operating systems principles, ser. SOSP ’03. New York, NY, USA: ACM, 2003, pp. 164–177. [Online]. Available: http://doi.acm.org/10.1145/945445.945462

[3] S. Zhang, S. Zhang, X. Chen, and X. Huo, “Cloud computing research and development trend,” in Proceedings of the 2010 Second International Conference on Future Networks, ser. ICFN ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 93–97. [Online]. Available: http://dx.doi.org/10.1109/ICFN.2010.58

[4] I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003.

[5] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,” Commun. ACM, vol. 53, no. 4, pp. 50–58, Apr. 2010. [Online]. Available: http://doi.acm.org/10.1145/1721654.1721672

[6] P. Mell and T. Grance, “Definition of cloud computing,” National Institute of Standards and Technologies (NIST), Tech. Rep., July 2009. [Online]. Available: http://csrc.nist.gov/groups/SNS/cloud-computing/

[7] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, Feb. 2009.

[8] B. Suleiman, S. Sakr, R. Jeffery, and A. Liu, “On understanding the economics and elasticity challenges of deploying business applications on public cloud infrastructure,” JISA, vol. 2, no. 3, pp. 1–21, 2011.

[9] W. Kim, “Cloud computing adoption,” International Journal of Web and Grid Services, vol. 7, no. 3, pp. 225–245, Jan. 2011. [Online]. Available: http://inderscience.metapress.com/content/t44w170h26081458/

[10] N. Leavitt, “Is cloud computing really ready for prime time?” Computer, vol. 42, no. 1, pp. 15–20, 2009.

[11] Amazon Web Services Inc., “Amazon web services,” http://aws.amazon.com/, Feb. 2013, date accessed: May. 2013.

[12] Rackspace Inc., “Rackspace the open cloud company,” http://www.rackspace.com/, Feb. 2013, date accessed: Jun. 2013.

[13] GoGrid Inc., “Gogrid,” http://www.gogrid.com/, Feb. 2013, date accessed: Aug. 2013.

[14] Joyent Inc., “Joyent,” http://www.joyent.com/, Feb. 2013, date accessed: Jul. 2013.

[15] ElasticHosts Inc., “Elastichosts pricing,” http://www.elastichosts.com/cloud-hosting/ pricing, May 2013, date accessed: Jun. 2013.

[16] Amazon Web Services Inc., “Case studies,” http://www.aws.amazon.com/solutions/ case-studies/, Jan 2012, date accessed: Feb. 2012.

[17] D. Menascé, V. Almeida, R. Riedi, F. Ribeiro, R. Fonseca, and W. Meira, Jr., “In search of invariants for e-business workloads,” in Proceedings of the 2nd ACM Conference on Electronic Commerce, ser. EC ’00. New York, NY, USA: ACM, 2000, pp. 56–65. [Online]. Available: http://doi.acm.org/10.1145/352871.352878

[18] R. Singh, U. Sharma, E. Cecchet, and P. Shenoy, “Autonomic mix-aware provisioning for non-stationary workloads,” in Proceedings of the 7th International Conference on Autonomic Computing, ser. ICAC ’10. New York, NY, USA: ACM, 2010, pp. 21–30. [Online]. Available: http://doi.acm.org/10.1145/1809049.1809053

[19] J. Hellerstein, F. Zhang, and P. Shahabuddin, “An approach to predictive detection for service management,” in Proceedings of the Sixth IFIP/IEEE International Symposium on Integrated Network Management (IM ’99): Distributed Management for the Networked Millennium, 1999, pp. 309–322.

[20] H. Li, G. Casale, and T. Ellahi, “SLA-driven planning and optimization of enterprise applications,” ser. WOSP/SIPEW ’10. ACM, 2010, pp. 117–128.

[21] P. Bianco, G. A. Lewis, and P. Merson, “Service Level Agreements in Service-Oriented Architecture Environments,” http://www.sei.cmu.edu/reports/08tn021.pdf, Software Engineering Institute, Carnegie Mellon University, Tech. Rep. CMU/SEI-2008-TN-021, Sep. 2008.

[22] A. Keller and H. Ludwig, “Defining and monitoring service-level agreements for dynamic e-business,” in Proceedings of the 16th USENIX Conference on System Administration, ser. LISA ’02. Berkeley, CA, USA: USENIX Association, 2002, pp. 189–204. [Online]. Available: http://dl.acm.org/citation.cfm?id=1050517.1050540

[23] Y. Chen, S. Iyer, X. Liu, D. Milojicic, and A. Sahai, “Translating service level objectives to lower level policies for multi-tier services,” Cluster Computing, vol. 11, no. 3, pp. 299–311, Sep. 2008. [Online]. Available: http://dx.doi.org/10.1007/s10586-008-0059-6

[24] P. Xiong, Y. Chi, S. Zhu, H. J. Moon, C. Pu, and H. Hacigumus, “Intelligent management of virtualized resources for database systems in cloud environment,” in ICDE’11, 2011, pp. 87–98. [Online]. Available: http://dx.doi.org/10.1109/ICDE.2011.5767928

[25] D. Breitgand, E. Henis, O. Shehory, and J. Lake, “Derivation of Response Time Service Level Objectives for Business Services.” in Proceedings of the 2nd IEEE/IFIP International Workshop on Business-Driven IT Management (BDIM’07), May 2007, pp. 29–38. [Online]. Available: http://dx.doi.org/10.1109/BDIM.2007.375009

[26] M. Seibold, A. Kemper, and D. Jacobs, “Strict SLAs for operational business intelligence,” in Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing, ser. CLOUD ’11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 25–32. [Online]. Available: http://dx.doi.org/10.1109/CLOUD.2011.22

[27] J. Kosinski, D. Radziszowski, K. Zielinski, S. Zielinski, G. Przybylski, and P. Niedziela, “Definition and evaluation of penalty functions in sla management framework,” in Proceedings of the Fourth International Conference on Networking and Services, ser. ICNS ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 176–181. [Online]. Available: http://dx.doi.org/10.1109/ICNS.2008.32

[28] JupiterResearch, “Retail web site performance: Consumer reaction to a poor online shopping experience,” Akamai and JupiterKagan, Tech. Rep., June 2006. [Online]. Available: http://www.akamai.com/dl/reports/Site_Abandonment_Final_Report.pdf

[29] D. A. Patterson, “A simple way to estimate the cost of downtime,” in Proceedings of the 16th USENIX conference on System administration, ser. LISA’02. Berkeley, CA, USA: USENIX Association, 2002, pp. 185–188. [Online]. Available: http://dl.acm.org/citation.cfm?id=1050517.1050538

[30] Eagle Rock Alliance, Ltd., “2001 Cost of Downtime,” http://contingencyplanningresearch.com/2001Survey.pdf, Tech. Rep., August 2001.

[31] D. Hilley, “Cloud Computing: A Taxonomy of Platform and Infrastructure-level Offerings,” http://www.cercs.gatech.edu/tech-reports/tr2009/git-cercs-09-13.pdf, Georgia Institute of Technology, Tech. Rep. git-cercs-09-13, April 2009.

[32] D. Durkee, “Why cloud computing will never be free,” Commun. ACM, vol. 53, no. 5, 2010.

[33] H. Li and S. Venugopal, “Using reinforcement learning for controlling an elastic web application hosting platform,” in ICAC ’11, 2011, pp. 205–208. [Online]. Available: http://doi.acm.org/10.1145/1998582.1998630

[34] S. Malkowski, M. Hedwig, D. Jayasinghe, C. Pu, and D. Neumann, “CloudXplor: A tool for configuration planning in clouds based on empirical data,” in Proceedings of the 2010 ACM Symposium on Applied Computing, ser. SAC ’10. New York, NY, USA: ACM, 2010, pp. 391–398. [Online]. Available: http://doi.acm.org/10.1145/1774088.1774172

[35] J. Ramsay, A. Barbesi, and J. Preece, “A psychological investigation of long retrieval times on the World Wide Web,” Interacting with Computers, vol. 10, no. 1, pp. 77–86, 1998.

[36] A. Iosup, N. Yigitbasi, and D. Epema, “On Performance Variability of Production Cloud Services,” http://pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf, Delft University of Technology, Tech. Rep. PDS-2010-002, January 2010.

[37] Amazon Web Services Inc., “Auto scaling,” http://aws.amazon.com/autoscaling/, Oct. 2012, date accessed: Nov. 2012.

[38] P. C. Brebner, “Is your cloud elastic enough?: Performance modelling the elasticity of infrastructure as a service (IaaS) cloud applications,” in Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, ser. ICPE ’12. New York, NY, USA: ACM, 2012, pp. 263–266. [Online]. Available: http://doi.acm.org/10.1145/2188286.2188334

[39] M. Michael, J. Moreira, D. Shiloach, and R. Wisniewski, “Scale-up x Scale-out: A Case Study using Nutch/Lucene,” International Symposium on Parallel and Distributed Pro- cessing, pp. 1–8, Mar. 2007.

[40] R. Shoup, “More best practices for large-scale websites: Lessons from eBay,” http://www.infoq.com/presentations/Best-Practices-eBay, Nov. 2010, date accessed: Apr. 2011.

[41] A. Khajeh-Hosseini, I. Sommerville, and I. Sriram, “Research challenges for enterprise cloud computing,” ACM Computing Research Repository (CoRR), vol. abs/1001.3257, Jan. 2010. [Online]. Available: http://arxiv.org/abs/1001.3257

[42] Microsoft Inc., “Windows Azure Autoscaling Application Block (WASABi),” http://msdn.microsoft.com/en-us/library/hh680945(v=pandp.50).aspx, Feb 2013, date accessed: May. 2013.

[43] Scalr Inc., “How it works: Scaling,” http://www.scalr.com/how-it-works/, Feb 2013, date accessed: Apr. 2013.

[44] RightScale Inc., “Set up autoscaling,” http://support.rightscale.com/03-Tutorials/02-AWS/02-Website_Edition/How_do_I_set_up_Autoscaling%3F, Feb 2013, date accessed: May. 2013.

[45] B. Suleiman, S. Sakr, S. Venugopal, and W. Sadiq, “Trade-off analysis of elasticity approaches for cloud-based business applications,” in Proceedings of the 13th International Conference on Web Information Systems Engineering, ser. WISE ’12. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 468–482. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-35063-4_34

[46] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599–616, Jun. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.future.2008.12.001

[47] Eagle Rock Alliance, Ltd., “2001 cost of downtime online survey,” http://contingencyplanningresearch.com/2001Survey.pdf, Mar. 2001, date accessed: Jun. 2021.

[48] S. Ferretti, V. Ghini, F. Panzieri, M. Pellegrini, and E. Turrini, “QoS-Aware Clouds,” in Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), 2010, pp. 321–328. [Online]. Available: http://dx.doi.org/10.1109/CLOUD.2010.17

[49] S. Bouchenak, “Automated control for SLA-aware elastic clouds,” in Proceedings of the Fifth International Workshop on Feedback Control Implementation and Design in Computing Systems and Networks, ser. FeBiD ’10. New York, NY, USA: ACM, 2010, pp. 27–28. [Online]. Available: http://doi.acm.org/10.1145/1791204.1791210

[50] A. Iosup, S. Ostermann, M. Yigitbasi, R. Prodan, T. Fahringer, and D. H. J. Epema, “Performance analysis of cloud computing services for many-tasks scientific computing,” Parallel and Distributed Systems, IEEE Transactions on, vol. 22, no. 6, pp. 931–945, June 2011.

[51] J. Schad, J. Dittrich, and J.-A. Quiané-Ruiz, “Runtime measurements in the cloud: observing, analyzing, and reducing variance,” VLDB, vol. 3, no. 1-2, pp. 460–471, Sep. 2010. [Online]. Available: http://dl.acm.org/citation.cfm?id=1920841.1920902

[52] A. Iosup, N. Yigitbasi, and D. Epema, “On the performance variability of production cloud services,” in CCGRID’11, 2011, pp. 104–113. [Online]. Available: http://dx.doi.org/10.1109/CCGrid.2011.22

[53] A. Lenk, M. Menzel, J. Lipsky, S. Tai, and P. Offermann, “What Are You Paying For? Performance Benchmarking for Infrastructure-as-a-Service Offerings,” in IEEE International Conference on Cloud Computing (CLOUD ’11), 2011, pp. 484–491.

[54] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud definition,” SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 50–55, Dec. 2008. [Online]. Available: http://doi.acm.org/10.1145/1496091.1496100

[55] A. Lenk, M. Klems, J. Nimis, S. Tai, and T. Sandholm, “What’s inside the cloud? an architectural map of the cloud landscape,” in Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, ser. CLOUD ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 23–31. [Online]. Available: http://dx.doi.org/10.1109/CLOUD.2009.5071529

[56] Linux Foundation Collaborative Projects, “Xen project,” http://www.xenproject.org/, Jun. 2013, date accessed: Sep. 2013.

[57] VMware Inc., “Vmware,” http://www.vmware.com/, Jun. 2013, date accessed: Sep. 2013.

[58] Google Inc., “Google App Engine,” https://developers.google.com/appengine/docs/whatisgoogleappengine, Feb. 2013, date accessed: Jun. 2013.

[59] Microsoft Inc., “Microsoft azure,” http://www.windowsazure.com/, Feb. 2013, date ac- cessed: May. 2013.

[60] Salesforce.com Inc., “Salesforce platform,” http://www.salesforce.com/platform/ overview/, Feb. 2013, date accessed: May. 2013.

[61] Salesforce.com, Inc., “Salesforce sales cloud,” http://www.salesforce.com/sales-cloud/ overview/, Feb. 2013, date accessed: Jun. 2013.

[62] SAP Inc., “Sap business bydesign,” http://www.sap.com/pc/tech/cloud/software/ business-management-bydesign/overview/index.html, Feb. 2013, date accessed: Jun. 2013.

[63] Microsoft Inc., “Office 365, office in the cloud,” http://office.microsoft.com/en-au/ business/office-365-online-business-software-programs-FX102997619.aspx, Feb. 2013, date accessed: May. 2013.

[64] A. Keller and H. Ludwig, “The WSLA framework: Specifying and monitoring service level agreements for web services,” J. Netw. Syst. Manage., vol. 11, no. 1, pp. 57–81, Mar. 2003. [Online]. Available: http://dx.doi.org/10.1023/A:1022445108617

[65] Microsoft Inc., “Windows Azure support: Service level agreements,” http://www.windowsazure.com/en-us/support/legal/sla/, Jan. 2014, date accessed: Feb. 2014.

[66] Amazon Web Services Inc., “Amazon EC2 service level agreement,” http://aws.amazon.com/ec2/sla/, Jun. 2013, date accessed: Jan. 2014.

[67] Amazon Web Services, Inc., “Amazon S3 service level agreement,” http://aws.amazon.com/s3/sla/, Jun. 2013, date accessed: Jan. 2014.

[68] Rackspace Inc., “Rackspace cloud sla,” http://www.rackspace.com/information/legal/ cloud/sla, Jan. 2014, date accessed: Feb. 2014.

[69] Terremark Inc., “Terremark cloud service level agreement,” https://community.vcloudexpress.terremark.com/en-us/product_docs/w/wiki/service-level-agreement.aspx, Jan. 2014, date accessed: Jan. 2014.

[70] R. Witty, “Best Practice in Business Continuity Planning, Gartner, 2001,” http://www.gartner.com/5_about/news/bcpbestpractices.ppt, March 2011, date accessed: Mar. 2013.

[71] Terremark Inc., “vcloud express pricing,” http://www.vcloudexpress.terremark.com/ pricing.aspx, May 2013, date accessed: Jun. 2013.

[72] G. Reese, Cloud Application Architectures: Building Applications and Infrastructure on the Cloud. O’Reilly, 2009.

[73] Amazon Web Services, Inc., “Elastic Load Balancing: Developer Guide,” http://awsdocs.s3.amazonaws.com/ElasticLoadBalancing/latest/elb-dg.pdf, Amazon Web Services, Tech. Rep., Jun. 2012, date accessed: Aug. 2012.

[74] Microsoft MSDN, “Microsoft Patterns and Practices: Autoscaling and Windows Azure,” http://msdn.microsoft.com/en-us/library/hh680945(v=pandp.50).aspx, Jun. 2012, date accessed: Jan. 2013.

[75] Michael Sheehan, “How to scale your infrastructure,” http://blog.gogrid.com/2013/02/13/how-to-scale-your-gogrid-infrastructure/, Mar. 2013, date accessed: Jun. 2013.

[76] Michael Sheehan, “RAM scaling and more now available on GoGrid,” http://blog.gogrid.com/2010/10/12/ram-scaling-and-more-now-available-on-gogrid/, Nov. 2013, date accessed: Dec. 2013.

[77] Sanjay Sohoni, “Rackspace cloud monitoring developer guide,” http://docs.rackspace.com/cm/api/v1.0/cm-devguide/content/overview.html, Feb. 2014, date accessed: Mar. 2014.

[78] Amazon Web Services Inc., “Amazon CloudWatch,” http://aws.amazon.com/cloudwatch/, Mar. 2011, date accessed: Nov. 2012.

[79] RightScale Inc., “Rightscale,” http://www.rightscale.com/, Apr. 2013, date accessed: Jun. 2013.

[80] N. R. Herbst, S. Kounev, and R. Reussner, “Elasticity in cloud computing: What it is, and what it is not,” in Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13). San Jose, CA: USENIX, 2013, pp. 23–27. [Online]. Available: https://www.usenix.org/conference/icac13/technical-sessions/presentation/herbst

[81] N. D. Mickulicz, P. Narasimhan, and R. Gandhi, “To auto scale or not to auto scale,” in Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13). San Jose, CA: USENIX, 2013, pp. 145–151. [Online]. Available: https://www.usenix.org/conference/icac13/technical-sessions/presentation/mickulicz

[82] N. Roy, A. Dubey, and A. Gokhale, “Efficient autoscaling in the cloud using predictive models for workload forecasting,” in 2011 IEEE International Conference on Cloud Computing (CLOUD), July 2011, pp. 500–507.

[83] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and T. Wood, “Agile dynamic provision- ing of multi-tier internet applications,” ACM Trans. Auton. Adapt. Syst., vol. 3, no. 1, pp. 1–39, Mar. 2008.

[84] E. Kalyvianaki, T. Charalambous, and S. Hand, “Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters,” in Proceedings of the 6th International Conference on Autonomic Computing, ser. ICAC ’09. New York, NY, USA: ACM, 2009, pp. 117–126. [Online]. Available: http://doi.acm.org/10.1145/1555228.1555261

[85] X. Dutreilh, S. Kirgizov, O. Melekhova, J. Malenfant, N. Rivierre, and I. Truck, “Using Reinforcement Learning for Autonomic Resource Allocation in Clouds: towards a fully automated workflow,” in Seventh International Conference on Autonomic and Autonomous Systems, ICAS 2011. IEEE, May 2011, pp. 67–74.

[86] M. Maurer, I. Brandic, and R. Sakellariou, “Enacting SLAs in clouds using rules,” in Proceedings of the 17th International Conference on Parallel Processing - Volume Part I, ser. Euro-Par ’11. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 455–466. [Online]. Available: http://dl.acm.org/citation.cfm?id=2033345.2033393

[87] R. Han, L. Guo, M. M. Ghanem, and Y. Guo, “Lightweight resource scaling for cloud applications,” in Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), ser. CCGRID ’12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 644–651. [Online]. Available: http://dx.doi.org/10.1109/CCGrid.2012.52

[88] T. C. Chieu, A. Mohindra, A. A. Karve, and A. Segal, “Dynamic scaling of web applications in a virtualized cloud computing environment,” in Proceedings of the 2009 IEEE International Conference on e-Business Engineering, ser. ICEBE ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 281–286. [Online]. Available: http://dx.doi.org/10.1109/ICEBE.2009.45

[89] T. Chieu, A. Mohindra, and A. Karve, “Scalability and performance of web applications in a compute cloud,” in 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE), Oct 2011, pp. 317–323.

[90] B. Suleiman and S. Venugopal, “Modeling performance of elasticity rules for cloud-based applications,” in Proceedings of the 2013 17th IEEE International Enterprise Distributed Object Computing Conference, ser. EDOC ’13. Washington, DC, USA: IEEE Computer Society, 2013, pp. 201–206. [Online]. Available: http://dx.doi.org/10.1109/EDOC.2013.31

[91] Sanjay Sohoni, “Easily scale your cloud with Rackspace Auto Scale,” http://www.rackspace.com/blog/easily-scale-your-cloud-with-rackspace-auto-scale/, Nov. 2013, date accessed: Feb. 2014.

[92] P. Lama and X. Zhou, “Efficient server provisioning with control for end-to-end response time guarantee on multitier clusters,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 1, pp. 78–86, Jan. 2012.

[93] J. Xu, M. Zhao, J. Fortes, R. Carpenter, and M. Yousif, “On the use of fuzzy modeling in virtualized data center management,” in Proceedings of the Fourth International Conference on Autonomic Computing, ser. ICAC ’07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 25–34. [Online]. Available: http://dx.doi.org/10.1109/ICAC.2007.28

[94] S. J. Malkowski, M. Hedwig, J. Li, C. Pu, and D. Neumann, “Automated control for elastic n-tier workloads based on empirical modeling,” in Proceedings of the 8th ACM International Conference on Autonomic Computing, ser. ICAC ’11. New York, NY, USA: ACM, 2011, pp. 131–140. [Online]. Available: http://doi.acm.org/10.1145/1998582.1998604

[95] H. Ghanbari, B. Simmons, M. Litoiu, and G. Iszlai, “Feedback-based optimization of a private cloud,” Future Gener. Comput. Syst., vol. 28, no. 1, pp. 104–111, Jan. 2012.

[96] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi, “An analytical model for multi-tier internet services and its applications,” ser. SIGMETRICS’05. ACM, 2005, pp. 291–302.

[97] B. Urgaonkar, G. Pacifici, P. J. Shenoy, M. Spreitzer, and A. N. Tantawi, “Analytical modeling of multitier internet applications,” ACM Trans. Web, vol. 1, no. 1, pp. 291–302, May 2007. [Online]. Available: http://doi.acm.org/10.1145/1232722.1232724

[98] R. N. Calheiros, R. Ranjan, and R. Buyya, “Virtual machine provisioning based on analytical performance and qos in cloud computing environments,” in Proceedings of the 2011 International Conference on Parallel Processing, ser. ICPP ’11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 295–304. [Online]. Available: http://dx.doi.org/10.1109/ICPP.2011.17

[99] I. Al-Azzoni and D. Kondo, “Cost-aware performance modeling of multi-tier web applica- tions in the cloud,” in Networked Digital Technologies, ser. Communications in Computer and Information Science, 2012, vol. 293, pp. 186–196.

[100] R. Ghosh, K. Trivedi, V. Naik, and D. S. Kim, “End-to-end performability analysis for infrastructure-as-a-service cloud: An interacting stochastic models approach,” in 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), Dec 2010, pp. 125–132.

[101] R. Ghosh, F. Longo, V. K. Naik, and K. S. Trivedi, “Modeling and performance analysis of large scale IaaS clouds,” Future Gener. Comput. Syst., vol. 29, no. 5, pp. 1216–1234, Jul. 2013. [Online]. Available: http://dx.doi.org/10.1016/j.future.2012.06.005

[102] P. C. Brebner, “Performance modeling for service oriented architectures,” in Companion of the 30th International Conference on Software Engineering, ser. ICSE Companion ’08. New York, NY, USA: ACM, 2008, pp. 953–954. [Online]. Available: http://doi.acm.org/10.1145/1370175.1370204

[103] J. Dejun, G. Pierre, and C.-H. Chi, “EC2 performance analysis for resource provisioning of service-oriented applications,” in ICSOC/ServiceWave ’09, 2009, pp. 197–207. [Online]. Available: http://dl.acm.org/citation.cfm?id=1926618.1926641

[104] K. R. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. J. Wasserman, and N. J. Wright, “Performance analysis of high performance computing applications on the amazon web services cloud,” in CLOUDCOM’10, 2010, pp. 159–168. [Online]. Available: http://dx.doi.org/10.1109/CloudCom.2010.69

[105] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, “A performance analysis of EC2 cloud computing services for scientific computing,” in Cloud Computing, ser. Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, D. Avresky, M. Diaz, A. Bode, B. Ciciani, and E. Dekel, Eds., vol. 34. Springer Berlin Heidelberg, 2010, pp. 115–131. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-12636-9_9

[106] D. Jayasinghe, S. Malkowski, Q. Wang, J. Li, P. Xiong, and C. Pu, “Variations in performance and scalability when migrating n-tier applications to different clouds,” in CLOUD ’11, 2011, pp. 73–80. [Online]. Available: http://dx.doi.org/10.1109/CLOUD.2011.43

[107] D. Kossmann, T. Kraska, and S. Loesing, “An evaluation of alternative architectures for transaction processing in the cloud,” in SIGMOD’10, 2010, pp. 579–590. [Online]. Available: http://doi.acm.org/10.1145/1807167.1807231

[108] G. Wang and T. S. E. Ng, “The impact of virtualization on network performance of amazon ec2 data center,” in Proceedings of the 29th Conference on Information Communications, ser. INFOCOM’10. Piscataway, NJ, USA: IEEE Press, 2010, pp. 1163–1171. [Online]. Available: http://dl.acm.org/citation.cfm?id=1833515.1833691

[109] J. Chen, C. Wang, B. B. Zhou, L. Sun, Y. C. Lee, and A. Y. Zomaya, “Tradeoffs between profit and customer satisfaction for service provisioning in the cloud,” in Proceedings of the 20th International Symposium on High Performance Distributed Computing, ser. HPDC ’11. New York, NY, USA: ACM, 2011, pp. 229–238. [Online]. Available: http://doi.acm.org/10.1145/1996130.1996161

[110] P. Marshall, K. Keahey, and T. Freeman, “Elastic site: Using clouds to elastically extend site resources,” in Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, ser. CCGRID ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 43–52. [Online]. Available: http://dx.doi.org/10.1109/CCGRID.2010.80

[111] Y. Jie, J. Qiu, and Y. Li, “A profile-based approach to just-in-time scalability for cloud applications,” in 2009 IEEE International Conference on Cloud Computing (CLOUD’09), Sept 2009, pp. 9–16. [Online]. Available: http://dx.doi.org/10.1109/CLOUD.2009.87

[112] H. Ghanbari, B. Simmons, M. Litoiu, C. Barna, and G. Iszlai, “Optimal autoscaling in a IaaS cloud,” in Proceedings of the 9th International Conference on Autonomic Computing, ser. ICAC ’12. New York, NY, USA: ACM, 2012, pp. 173–178. [Online]. Available: http://doi.acm.org/10.1145/2371536.2371567

[113] S. Dutta, S. Gera, A. Verma, and B. Viswanathan, “SmartScale: Automatic application scaling in enterprise clouds,” in 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), June 2012, pp. 221–228.

[114] A. Kertesz, G. Kecskemeti, and I. Brandic, “An sla-based resource virtualization approach for on-demand service provision,” in Proceedings of the 3rd International Workshop on Virtualization Technologies in Distributed Computing, ser. VTDC ’09. New York, NY, USA: ACM, 2009, pp. 27–34. [Online]. Available: http: //doi.acm.org/10.1145/1555336.1555341

[115] Í. Goiri, F. Julià, J. O. Fitó, M. Macías, and J. Guitart, “Supporting CPU-based guarantees in cloud SLAs via resource-level QoS metrics,” Future Gener. Comput. Syst., vol. 28, no. 8, pp. 1295–1302, Oct. 2012. [Online]. Available: http://dx.doi.org/10.1016/j.future.2011.11.004

[116] N. Bobroff, A. Kochut, and K. Beaty, “Dynamic placement of virtual machines for managing SLA violations,” in 10th IFIP/IEEE International Symposium on Integrated Network Management (IM ’07), May 2007, pp. 119–128.

[117] H. N. Van, F. D. Tran, and J.-M. Menaud, “SLA-aware virtual resource management for cloud infrastructures,” in Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology - Volume 02, ser. CIT ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 357–362. [Online]. Available: http://dx.doi.org/10.1109/CIT.2009.109

[118] J. M. Pujol, G. Siganos, V. Erramilli, and P. Rodriguez, “Scaling Online Social Net- works without Pains,” in 5th International Workshop on Networking Meets Databases (NetDB’09), 2009.

[119] Rackspace, Inc., “Rackspace cloud monitoring: Developer guide,” http://docs.rackspace.com/cm/api/v1.0/cm-devguide/cm-devguide-20140311.pdf, Rackspace Inc., Tech. Rep., Mar. 2014, date accessed: Nov. 2012.

[120] Rackspace, Inc., “Rackspace cloud load balancer: Getting started guide,” Rackspace Inc., Tech. Rep., Jan. 2014, date accessed: Feb. 2014. [Online]. Available: http://docs.rackspace.com/loadbalancers/api/v1.0/clb-getting-started/clb-getting-started-20140103.pdf

[121] Michael Sheehan, “How to create a distributed, reliable, and fault-tolerant GoGrid dynamic load balancer,” http://blog.gogrid.com/2013/02/26/how-to-create-a-distributed-reliable-fault-tolerant-gogrid-dynamic-load-balancer/, Feb. 2013, date accessed: Jun. 2013.

[122] Google Inc., “Google Compute Engine,” https://cloud.google.com/products/compute-engine/, Aug. 2013, date accessed: Jun. 2013.

[123] G. Casale, N. Mi, L. Cherkasova, and E. Smirni, “Dealing with burstiness in multi-tier applications: Models and their parameterization,” IEEE Trans. Soft. Eng., vol. 38, no. 5, pp. 1040–1053, 2012.

[124] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. New York, USA: Wiley Computer Publishing, 1991.

[125] D. Gross, J. F. Shortle, J. M. Thompson, and C. M. Harris, Fundamentals of Queueing Theory, 4th ed. New York, NY, USA: Wiley-Interscience, 2008.

[126] D. A. Menascé, L. W. Dowdy, and V. A. F. Almeida, Performance by Design: Computer Capacity Planning By Example. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2004.

[127] C. Millsap, “Thinking clearly about performance,” Queue, vol. 8, no. 9, pp. 10:10–10:20, Sep. 2010. [Online]. Available: http://doi.acm.org/10.1145/1854039.1854041

[128] C. Stewart, T. Kelly, and A. Zhang, “Exploiting nonstationarity for performance prediction,” in Proceedings of the European Conference on Computer Systems, ser. EuroSys ’07. New York, NY, USA: ACM, Mar. 2007, pp. 164–177.

[129] D. A. Menascé, “Testing e-commerce site scalability with TPC-W,” in Int. CMG Conference, 2001, pp. 457–466. [Online]. Available: http://dblp.uni-trier.de/db/conf/cmg/cmg2001.html#Menasce01

[130] G. Pacifici, W. Segmuller, M. Spreitzer, M. Steinder, A. Tantawi, and A. Youssef, “Managing the response time for multi-tiered web applications,” IBM Research, Tech. Rep. RC23651, July 2005. [Online]. Available: http://domino.watson.ibm.com/library/cyberdig.nsf/papers/B260A7AFF3C130928525704B004E6A9B/$File/rc23651.pdf

[131] Transaction Processing Performance Council, “TPC Benchmark Web Commerce Specification (TPC-W),” http://www.tpc.org/tpcw/spec/tpcw_V1.8.pdf, Tech. Rep., Feb 2002, date accessed: Jan. 2011.

[132] T. Horvath, “TPC-W Java implementation,” http://www.cs.virginia.edu/~th8k/downloads/, Nov. 2008, date accessed: Nov. 2011.

[133] B. C. Tak, C. Tang, C. Zhang, S. Govindan, B. Urgaonkar, and R. N. Chang, “vPath: precise discovery of request processing paths from black-box observations of thread and network activities,” in USENIX Annual Technical Conference (USENIX ’09), 2009, pp. 19–28. [Online]. Available: http://dl.acm.org/citation.cfm?id=1855807.1855826

[134] S. Haines, Pro Java EE 5 Performance Management and Optimization. USA: Apress, 2006.

[135] D. L. Massart, J. A. Smeyers-Verbeke, X. Capron, and K. Schlesier, “Presentation of data by means of box plots,” LC GC Europe, vol. 18, no. 4, pp. 215–218, 2005.

[136] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, “Workload analysis and demand prediction of enterprise data center applications,” in Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization, ser. IISWC ’07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 171–180. [Online]. Available: http://dx.doi.org/10.1109/IISWC.2007.4362193

[137] P. Bodik, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson, “Characterizing, modeling, and generating workload spikes for stateful services,” in Proceedings of the 1st ACM symposium on Cloud computing, ser. SoCC ’10. New York, NY, USA: ACM, 2010, pp. 241–252. [Online]. Available: http://doi.acm.org/10.1145/1807128.1807166

[138] A. Bahga and V. Madisetti, “Performance evaluation approach for multi-tier cloud applications,” Journal of Software Engineering and Applications, vol. 6, no. 1, pp. 74–83, 2013. [Online]. Available: http://www.scirp.org/journal/jsea

[139] H. J. Moon, Y. Chi, and H. Hacigümüş, “Performance evaluation of scheduling algorithms for database services with soft and hard SLAs,” in DataCloud-SC ’11, 2011, pp. 81–90. [Online]. Available: http://doi.acm.org/10.1145/2087522.2087536

[140] R. Gow, S. Venugopal, and P. K. Ray, ““The tail wags the dog”: A study of anomaly detection in commercial application performance,” in Proceedings of the 2013 IEEE 21st International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, ser. MASCOTS ’13. Washington, DC, USA: IEEE Computer Society, 2013, pp. 355–359. [Online]. Available: http://dx.doi.org/10.1109/MASCOTS.2013.51

[141] A. Mahanti, N. Carlsson, A. Mahanti, M. F. Arlitt, and C. Williamson, “A tale of the tails: Power-laws in internet measurements,” IEEE Network, vol. 27, no. 1, pp. 59–64, 2013. [Online]. Available: http://dx.doi.org/10.1109/MNET.2013.6423193

[142] A. Clauset, C. R. Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” SIAM Rev., vol. 51, no. 4, pp. 661–703, Nov. 2009. [Online]. Available: http://dx.doi.org/10.1137/070710111

[143] J. D. Meier, C. Farre, P. Bansode, S. Barber, and D. Rea, Microsoft Patterns and Practices: Performance Testing Guidance for Web Applications, Microsoft MSDN Library, 2007, date accessed: Aug. 2013.

[144] Amazon Web Services Inc., “Amazon EC2 instances,” http://aws.amazon.com/ec2/instance-types/, Oct 2011, date accessed: Nov. 2013.

[145] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” in Proceedings of the 1st ACM symposium on Cloud computing, ser. SoCC ’10. New York, NY, USA: ACM, 2010, pp. 143–154. [Online]. Available: http://doi.acm.org/10.1145/1807128.1807152

[146] A. Nathani, S. Chaudhary, and G. Somani, “Policy based resource allocation in IaaS cloud,” Future Gener. Comput. Syst., vol. 28, no. 1, pp. 94–103, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1016/j.future.2011.05.016

[147] Z. Xiao, W. Song, and Q. Chen, “Dynamic resource allocation using virtual machines for cloud computing environment,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 6, pp. 1107–1117, 2013.
