MULTI-TIER INTERNET SERVICE MANAGEMENT :

STATISTICAL LEARNING APPROACHES

by

SIREESHA MUPPALA

B.E., Nagarjuna University, India, 1992

A dissertation submitted to the Graduate Faculty of the

University of Colorado at Colorado Springs

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

Department of Computer Science

2013

This dissertation for the Doctor of Philosophy degree by

Sireesha Muppala

has been approved for the

Department of Computer Science

by

Xiaobo Zhou, Chair

Edward Chow

Chuan Yue

Rory Lewis

Jia Rao

Liqiang Zhang

Date

Muppala, Sireesha (Ph.D., Computer Science)

Multi-tier Internet Service Management: Statistical Learning Approaches

Dissertation directed by Associate Professor and Chair Xiaobo Zhou

Modern Internet services are multi-tiered and are typically hosted in virtualized shared platforms. While facilitating flexible service deployment, multi-tier architecture introduces significant challenges for Quality of Service (QoS) provisioning in hosted Internet services. Complex inter-tier dependencies and dynamic bottleneck tier shift are challenges inherent to tiered architectures. Hard-to-predict and bursty session-based Internet workloads further magnify this complexity. Virtualization of shared platforms adds yet another layer of complication in managing the hosted multi-tier Internet services.

We consider three critical aspects of Internet service management for improved performance and quality of service provisioning : admission control, dynamic resource provisioning and service differentiation.

This thesis concentrates on statistical learning based approaches for multi-tier Internet service management to achieve efficient, balanced and scalable services. Statistical learning techniques are capable of solving complex dynamic problems through learning and adaptation with no a priori domain-specific knowledge. We explore the effectiveness of supervised and unsupervised learning in managing multi-tier Internet services.

First, we develop a session based admission control strategy to improve session throughput of multi-tier Internet services. Using a supervised Bayesian network, it achieves coordination among multiple tiers resulting in a balanced service. Second, we promote session slowdown, a novel session-oriented metric for user perceived performance. We develop a regression based dynamic resource provisioning strategy, which utilizes a combination of offline training and online monitoring, for session slowdown guarantees in multi-tier systems. Third, we develop a reinforcement learning based coordinated combination of admission control and adaptive resource management for multi-tier Internet service differentiation and performance improvement in a shared virtualized platform. It addresses limitations of supervised learning by integrating the model-independence of reinforcement learning and the self-learning of neural networks for system scalability and agility. Finally, we develop a user interface based Monitoring and Management Console, intended for an administrator to monitor and fine-tune the performance of hosted multi-tier Internet services.

We evaluate the developed management approaches using an e-commerce simulator and an implementation testbed on a virtualized blade server system hosting multi-tier RUBiS benchmark applications. Results demonstrate the effectiveness and efficiency of statistical learning approaches for QoS provisioning and performance improvement in virtualized multi-tier Internet services.

Dedication

This thesis is dedicated to my father, Dasaraju Chandrashekhar Raju, for his constant support, encouragement and for teaching me the love of hard work.

Acknowledgements

I would like to first and foremost thank my advisor, Dr. Xiaobo Zhou, for his amazing support and guidance throughout my Ph.D. program. His continuous demand for the best possible work and his enthusiastic willingness to help his students reach their goals were invaluable to me. I would also like to thank my graduate committee members, Dr. Chow, Dr. Yue, Dr. Lewis, Dr. Rao and Dr. Zhang, for their help and encouragement throughout my student life at UCCS. Their critique and feedback at the research proposal stage tremendously helped improve the quality of this thesis. Many thanks to my fellow DISCO lab members, Palden Lama and Yanfei Guo, for their help over the past few years.

I am incredibly fortunate to have the support of my family. My husband, Damu, and kids, Anurag and Amritha, never let me quit, even in the toughest of situations. I am forever grateful for their support and the sacrifices they made so I could reach my goal. I am also thankful to my parents and mother-in-law for their experience, advice and encouragement.

Without the flexibility extended by my manager, Robert Holstine, I could not have justified being a part-time Ph.D. student while holding a full-time job. I am thankful for the privilege of working with my lead Ben Kirk and the rest of the most talented and generous team at Blackhawk Network. Thanks to Yishiuan Chin and many, many other friends for their strength and love. Late night work sessions were bearable and even fun because of their companionship and humor.

The research and dissertation were supported in part by the US National Science Foundation CAREER Award CNS-0844983 and research grant CNS-0720524. I thank the NISSC for providing blade servers for conducting the experiments.

Contents

1 Introduction 1

1.1 Typical Multi-tier Internet Service Architecture ...... 2

1.2 Virtualized Data Centers and Cloud Computing ...... 4

1.3 Multi-tier Internet Service Management and Challenges ...... 6

1.4 Statistical Learning for Multi-tier Internet Service Management ...... 12

1.5 Research Objectives ...... 13

1.6 Research Contributions ...... 17

1.7 Thesis Roadmap ...... 18

2 Related Work 20

2.1 Multi-tier Systems : Analytical Models ...... 20

2.2 Admission Control for Internet Services ...... 22

2.3 Dynamic Resource Provisioning of Internet Services ...... 24

2.4 Service Differentiation in Internet Services ...... 26

2.5 Related Issues in Internet Service Management ...... 29

2.5.1 Network Virtualization ...... 29

2.5.2 Server Consolidation, VM Placement and VM Migration ...... 30

2.5.3 Storage Virtualization ...... 30

2.5.4 Security ...... 31

2.5.5 Distributed Data Centers ...... 31

2.6 Statistical Learning Techniques for Internet Service Management ...... 32

2.7 Summary ...... 33

3 Coordinated Session Based Admission Control In Multi-tier Internet Services 36

3.1 Introduction ...... 36

3.2 Session Based Admission Control : Algorithms ...... 38

3.2.1 A Blackbox Approach ...... 38

3.2.2 The MBAC Approach ...... 38

3.3 Coordinated Session Based Admission Control : A Statistical Learning Approach ...... 42

3.3.1 Bayesian Networks - Background ...... 43

3.3.2 Multi-tier Internet Service : A Bayesian Network Model ...... 44

3.3.3 Conditional Probability Tables (CPTs) and Bayesian Network Training ...... 45

3.3.4 CoSAC : Operation ...... 46

3.4 Performance Evaluation ...... 48

3.4.1 Experimental Setup ...... 48

3.4.2 Impact of Multi-tier Architecture on Session based Admission Control ...... 49

3.4.3 Why not accept Ordering sessions only ...... 51

3.4.4 Impact of CoSAC on Throughput ...... 53

3.4.5 Choosing the Admission Control Interval ...... 56

3.5 Summary And Discussion ...... 58

4 Dynamic Server Provisioning for Multi-tier Internet Service Performance Guarantees 60

4.1 Introduction ...... 60

4.2 Statistical Regression Analysis : Background ...... 62

4.3 Terms and Definitions ...... 63

4.3.1 Session Slowdown ...... 63

4.3.2 Tier Session Slowdown Ratio ...... 64

4.3.3 Resource Utilization ...... 65

viii 4.3.4 A Behavior Model ...... 66

4.4 Dynamic Resource Provisioning : A Statistical Regression Approach ...... 66

4.4.1 The Training Phase : Learning Behavior Models ...... 67

4.4.2 The Online Phase : Dynamic Server Provisioning ...... 71

4.5 Performance Evaluation ...... 74

4.5.1 Experimental Setup ...... 74

4.5.2 Session Slowdown Guarantee ...... 74

4.5.3 Efficiency in Per-Tier Resource Allocation ...... 77

4.5.4 Impact of Session Slowdown Threshold on Performance ...... 78

4.5.5 Impact of Resource Utilization Threshold on Performance ...... 79

4.5.6 Impact of Online Monitoring Interval ...... 81

4.5.7 Validation of Regression Models and Comparison with an Analytical Model . . . . . 82

4.5.8 Impact of TPC-W Workload Burstiness on Performance ...... 87

4.6 Summary and Discussion ...... 89

5 Multi-tier Internet Service Differentiation 90

5.1 Introduction ...... 90

5.2 Reinforcement Learning and Neural Networks : Background ...... 92

5.3 Multi-tier Service Differentiation and Performance Improvement ...... 95

5.3.1 Problem Statement ...... 95

5.3.2 Problem Formulation ...... 96

5.4 The System Design for Scalability and Agility ...... 98

5.5 Algorithms ...... 100

5.5.1 Basic Reinforcement Learning for VM Auto-Configuration ...... 100

5.5.2 Basic Reinforcement Learning for Session-Based Admission Control ...... 102

5.5.3 Cascade Neural Network Enhancements ...... 103

5.6 Testbed Implementation ...... 104

5.7 Performance Evaluation ...... 106

5.7.1 Effectiveness of VM Auto-configuration - Stationary Workloads ...... 106

5.7.2 Effectiveness of VM Auto-configuration - Dynamic Workloads ...... 107

5.7.3 Effectiveness of Coordinated VM Auto-configuration and Admission Control . . . . 110

5.7.4 Impact of Bursty Workloads ...... 111

5.7.5 Sensitivity Analysis of the Learning Algorithms ...... 113

5.7.6 Agility Analysis of the Learning Algorithms ...... 115

5.7.7 Scalability Analysis of the Learning Algorithms ...... 116

5.7.8 Comparison with Statistical Regression Based Resource Provisioning ...... 117

5.8 Summary and Discussion ...... 122

6 Multi-tier Internet Service Management and Monitoring Console 124

6.1 MISMC Features ...... 125

6.1.1 Dashboard ...... 125

6.1.2 Applications ...... 127

6.1.3 Virtual Machines ...... 127

6.1.4 Notifications ...... 131

6.1.5 Reports ...... 131

6.2 Future Enhancements ...... 133

7 Conclusion 136

7.1 Contributions and Accomplishments ...... 136

7.2 Future Work ...... 138

List of Figures

1.1 Typical multi-tier Internet service architectures...... 3

1.2 An abstracted 3-tier Internet service...... 4

1.3 A shared hosting platform...... 5

1.4 Key issues in virtualized data centers...... 6

1.5 Integrated dependency model for multi-tier Web systems ...... 8

1.6 Dynamics of a 3-tier e-commerce service...... 10

1.7 Traffic volume intensity variations at an online travel agency...... 11

1.8 1998 FIFA World Cup trace over ten consecutive days : Burstiness of traffic arrival . . . . . 12

1.9 A multi-tier Internet service with admission control...... 15

1.10 Dynamic resource provisioning in a multi-tier Internet service...... 16

3.1 CoSAC architecture...... 42

3.2 Bayesian network model of a multi-tier Internet service...... 44

3.3 Predicted utilizations of a three-tier website...... 50

3.4 Throughput comparison with different traffic mixes...... 52

3.5 Impact of admission control strategies on session throughput (with 1:1:1 workload mix). . . 54

3.6 Impact of admission control strategies on session throughput (with 3:2:1 workload mix). . . 55

3.7 Impact of admission control strategies on session throughput (with dynamic workload). . . . 57

4.1 A typical three-tier Internet service architecture...... 64

4.2 “Allocated resources - session slowdown” and “Allocated resources - resource utilization”. . 68

4.3 Session slowdown due to regression based dynamic provisioning...... 75

4.4 Resources allocation at the overall Internet service and at the individual tiers...... 76

4.5 Impact of using a resource utilization threshold on the session slowdown...... 77

4.6 Impact of using a resource utilization threshold on the virtual server allocations...... 78

4.7 Impact of session slowdown threshold on slowdown guarantee and resource allocation. . . . 79

4.8 Impact of the resource utilization threshold on resource allocation efficiency...... 80

4.9 Impact of interval length on session slowdown...... 81

4.10 “Allocated virtual servers - session slowdown” and “Allocated virtual servers - resource utilization” relations for a queuing model of a multi-tier Internet service...... 84

4.11 Session slowdown behavior with extended MVA algorithm...... 84

4.12 Performance comparison of regression and MVA based approaches...... 85

4.13 Bursty TPC-W workload...... 88

4.14 Performance evaluation with bursty TPC-W workload...... 88

5.1 Reinforcement Learning and Neural Networks ...... 93

5.2 A platform hosting multi-tier applications on virtualized servers...... 95

5.3 The diagram of the system design...... 99

5.4 Relative service differentiation between two RUBiS applications (stationary workload). . . . 107

5.5 A highly dynamic step-change workload...... 108

5.6 Relative service differentiation between two RUBiS applications (dynamic workloads). . . . 108

5.7 Resource allocation and utilization of the VM hosting the web tier of application A...... 109

5.8 Relative service differentiation between two RUBiS applications (CPU-only allocation). . . 110

5.9 Coordinated VM auto-configuration and admission control for service differentiation. . . . . 111

5.10 Coordinated VM auto-configuration and admission control for performance improvement. . 111

5.11 Coordinated VM auto-configuration and admission control for service differentiation (bursty workload)...... 112

5.12 Comparison of VM auto-configuration and coordinated approaches for service differentiation, application response time and provisioning oscillations (bursty workload)...... 112

5.13 Effect of reinforcement learning parameter values ...... 114

5.14 Effect of cascade neural network learning parameters ...... 115

5.15 Agility of the enhanced reinforcement learning approach with cascading neural network. . . 115

5.16 Scalability of the enhanced reinforcement learning approach under dynamic workloads. . . . 117

5.17 Comparison of regression and reinforcement learning based VM auto-configuration...... 123

6.1 Dashboard ...... 126

6.2 Application management ...... 128

6.3 Application monitoring ...... 129

6.4 Virtual machine management ...... 130

6.5 Virtual machine monitoring ...... 131

6.6 Notification configuration...... 132

6.7 Notification view...... 133

6.8 Report configuration and view ...... 134

List of Tables

1.1 The shifting bottleneck tier problem...... 10

1.2 Statistical learning techniques ...... 14

2.1 Related research summary - I ...... 34

2.2 Related research summary - II ...... 35

3.1 Session based admission control strategies : notations ...... 38

3.2 CPT for WebTierState Node ...... 46

3.3 Evidences applied to the Bayesian network...... 47

3.4 TPC-W transactions ...... 48

3.5 Request compositions in TPC-W...... 49

3.6 Accepted sessions by Blackbox and MBAC...... 51

4.1 Training workload characteristics...... 67

4.2 A behavior model...... 71

4.3 Regression-based dynamic resource provisioning strategy: notations ...... 72

4.4 Experimental workload characteristics...... 74

4.5 Inputs for Mean-Value Analysis algorithm...... 83

5.1 Training workload characteristics...... 119

5.2 A behavior model...... 121

Chapter 1

Introduction

The Internet has become an indispensable segment of modern day society with billions of users world-wide.

The last few years witnessed an explosion of Internet users from 394 million in the year 2000 to over 2.4 billion by June 2012 [114]. Private citizens, business corporations, academic and government institutions rely heavily on the Internet on a daily basis. Internet services have evolved radically from serving static web pages to delivering highly dynamic and interactive content with extensive multi-media support. The influence of ubiquitous online services such as trading platforms, social networking and e-commerce on national and global economies is undeniable. J.P. Morgan, a leading financial company, forecasts $963 billion in global e-commerce revenue for the year 2013 [90]. Ongoing technology innovations, such as virtualization and cloud computing, continue to revolutionize the way private corporations and government institutions manage their online presence with reduced IT infrastructure costs.

In this chapter, we discuss the typical Internet service architecture and identify challenges in managing multi-tier services. We introduce our choice of technical approach, statistical learning, to address these critical challenges. We next present our research focus and contributions. The chapter concludes with a road map to the rest of the thesis.

Modern Internet services are complex systems that typically employ a multi-tier architecture. In a multi-tier architecture, the functionalities of an Internet service are distributed among two or more tiers, with each tier serving distinct tasks. Each tier provides a certain functionality to its preceding tier and utilizes the functionality provided by its successor [104]. The primary advantage of tiered architectures is ease of service manageability and maintenance. It is possible to repair or upgrade a tier, while the others remain both unaware and unaffected by the change [113]. Other advantages include scalability and reduced data replication.

In a 2-tier architecture, the service functionality is divided between presentation and database tiers. The client program residing in the presentation tier communicates directly with the database server residing in the database tier. The client is responsible for both browser display and application logic execution. The client workstation can thus be optimized for data input and presentation by providing mouse and graphics support.

The server can similarly be optimized for data processing and storage with large amounts of disk space and memory. While this simple structure facilitates ease of setup and maintenance, it can cause a bottleneck for data requests. Moreover, the client programs may get too complicated with various application logic rules and result in performance degradation.

The bottleneck and performance problems are addressed by a 3-tier model, where the service functionality is distributed among the web, application and database tiers. In a 3-tier architecture, the complex business logic is moved to the application layer. The result is a less complicated client program residing in the customer's browser. A single server in the application layer can handle multiple clients simultaneously, thereby reducing the performance bottleneck effect. Database access is now limited to the application servers where tighter security measures can be employed. Figures 1.1 (a) and (b) show the tasks handled by each tier and tier interactions in 2-tiered and 3-tiered Internet services respectively.

1.1 Typical Multi-tier Internet Service Architecture

In multi-tiered Internet services, multiple tiers participate in the processing of each incoming request from the users. Depending on the processing demand, a tier may be replicated using clustering techniques. In such a case, a dispatcher is used at each replicated tier to achieve load balancing by distributing requests among the replicas. Figure 1.2 depicts a typical e-commerce service with three tiers, where the first two tiers are replicated, while the third tier is not. Such an architecture is commonly employed by e-commerce services where a clustered Web server and a clustered Java application server constitute the first two tiers, and the third tier consists of a non-replicable database [104].

(a) Two tiers

(b) Three tiers

Figure 1.1: Typical multi-tier Internet service architectures.

Figure 1.2: An abstracted 3-tier Internet service.

More complicated ‘n-tier’ architectures are also possible by separating each tier into multiple functional components. For example, the database tier may be separated into ‘data access’ and ‘data storage’ tiers.

In this thesis, we focus on Internet services deployed using a typical 3-tier architecture. The approaches developed are, however, applicable to general 'n-tier' architectures.

1.2 Virtualized Data Centers and Cloud Computing

Modern multi-tier Internet services are often hosted in data centers or on cloud computing platforms that abstract the data center resources. Data centers today play a major role in providing on-demand computing power to various enterprise applications supporting different business processes such as e-commerce, human resource, payroll, customer relationship management, etc. [69]. Data centers typically apply server virtualization to host multiple Internet services that share the underlying high density server resources in the same platform.

Individual Internet services are typically hosted in dedicated virtual containers. A virtual container can be a virtual machine (VM) provided by hypervisor technologies including Xen [4] and VMware [6] or OS-level virtualization like OpenVZ [7] and VServer [8]. A conceptual view of a shared platform hosting multiple Internet services is provided in Figure 1.3.

Figure 1.3: A shared hosting platform.

A public or private cloud platform supports on-demand allocation of virtual resources to the hosted services to satisfy the dynamic workloads. Public clouds facilitate "pay-as-you-go" charging for the resources provided to the hosted services. For example, Amazon's EC2, a leading public cloud platform in the industry, supports dynamic provisioning (aka "auto scaling") where VMs are automatically started when a threshold on a user-specified metric such as CPU utilization is exceeded in the current service [1].
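To make the threshold rule concrete, the minimal sketch below implements such a scaling policy; the utilization metric, thresholds, and VM limits are illustrative assumptions and not any particular provider's API.

```python
# Minimal sketch of a threshold-based auto-scaling rule (illustrative only;
# thresholds, limits and the metric are assumptions, not a provider's API).

def scaling_decision(cpu_utilization, vm_count, high=0.75, low=0.30,
                     min_vms=1, max_vms=10):
    """Return the new VM count for a tier given its average CPU utilization."""
    if cpu_utilization > high and vm_count < max_vms:
        return vm_count + 1   # scale out when the utilization threshold is exceeded
    if cpu_utilization < low and vm_count > min_vms:
        return vm_count - 1   # scale in when resources are under-utilized
    return vm_count           # otherwise keep the current allocation

# Example: a web tier running 3 VMs at 82% average CPU utilization
print(scaling_decision(0.82, 3))  # -> 4
```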

The popularity of the shared infrastructure paradigm is rapidly increasing in many enterprises due to several benefits offered by server virtualization. The benefits include, but are not limited to :

• Server consolidation : Multiple services are consolidated onto a single high density physical server. This translates to reduced infrastructure and operational costs in data centers.

• High utilization : Sharing resources allows better utilization of overall data center resources.

• Performance isolation : The hosted services can be isolated from each other by running them in dedicated virtual machines. Different virtualization technologies differ in levels of performance isolation, but most of them offer safety, correctness and fault-tolerance isolation.

However, virtualization does not come for free. It adds a fair amount of complexity to data center environments. Figure 1.4 identifies the key areas that need to be addressed in an effective manner to completely reap the benefits offered by virtualized data centers and cloud computing platforms.

Figure 1.4: Key issues in virtualized data centers.

Performance of hosted services is affected by everything from the creation, imaging and deployment of VMs, to managing the available resources at multiple granularities. Network virtualization plays a crucial role in QoS provisioning in public cloud platforms hosting high performance applications. Technological advances are also targeting storage virtualization to enable full-fledged virtual data centers. Security is yet another critical issue instrumental in improving the adoption of clouds for mission critical applications. Data centers are also under economic and environmental pressures to reduce power consumption and minimize greenhouse gas emissions. Distributed data centers with physical resources in multiple geographical locations provide a viable option for realizing scalability and high availability. However, they present yet another set of challenges such as geographic load balancing, database mirroring and session monitoring across multiple data centers.

1.3 Multi-tier Internet Service Management and Challenges

The popularity of multi-tier Internet services continues to grow with an exponentially increasing customer base. Customers demand and expect ubiquitous availability, high performance and quality of service (QoS) guarantees from these services. Most modern Internet services provide a service directly to end users. Thus the service success criteria should be defined at the customer level, not just as easily verifiable service-level semantics.

However, managing multi-tier Internet services for optimal performance is non-trivial. Data center administrators managing the hosted Internet services often do not have comprehensive knowledge of the service being managed. Administrators have at best an incomplete view (and at worst a wrong view) of what the services are doing, what they are supposed to be doing and how they are doing it. This leads to unreliable services that are difficult to manage [55].

Kephart and Chess [53] originally defined autonomic computing as a general methodology in which computing systems can manage themselves given high-level objectives from administrators. It aims for self-management that attempts to "free system administrators from the details of system operation and maintenance and to provide users with a machine that runs at peak performance 24/7". Since then, automating computer systems has been an active topic of research.

In this thesis, we systematically study three critical multi-tier Internet service management mechanisms :

1. Admission control : To regulate the load exercised on multi-tier Internet service resources by policing and selectively accepting the incoming traffic.

2. Dynamic resource provisioning : To enable real-time adaptation to workload variations through on-demand allocation and removal of resources from a multi-tier Internet service.

3. Service differentiation : To provide differentiated treatment of multiple multi-tier Internet services hosted in a virtualized data center.

By exploring novel statistical learning based approaches for automating the three identified management mechanisms, our research aims to achieve improved performance and QoS assurances in hosted multi-tier Internet services. QoS provisioning in multi-tier Internet services is a non-trivial challenge. While enabling modularity and simplified service deployment, multi-tier architectures result in a complex environment, which needs to serve unpredictable Internet workloads. Virtualization of the host platform further magnifies this complexity.

We now discuss several challenges encountered in managing multi-tier Internet services.

Figure 1.5: Integrated dependency model for multi-tier Web systems.

1. Inter-tier dependencies : Multi-tier Internet services exhibit complex inter-tier dependencies, with

each tier depending on its preceding tier and providing functionality to its successor. An integrated

dependency model that identifies different types of inter-tier software dependencies in a multi-tier

system as proposed in [73] is shown in Figure 1.5. However, the inter-tier dependency dynamics in

their entirety are so complex and complicated that it is even a big challenge to get a good understanding

of the entire system dynamic behavior [19, 89].

Moreover, the resource demands posed by user sessions on the individual tiers vary, but are also dependent on and correlated with each other. For instance, the web tier may consume mainly CPU and network

bandwidth, whereas the database tier consumes more I/O bandwidth than the web tier does. However,

a database tier only serves connections established through the web tier [69].

2. Dynamic session length : Typical customer interaction with an Internet service is captured as a

session. A session is a sequence of related requests of different types made by a customer during a

single visit to an Internet service. For example, consider a customer’s on-line shopping experience at

a retail e-commerce web site. A customer’s session involves multiple requests that search for products

of interest, retrieve information about a specific product, add the selected product to the shopping

cart, initiate the check-out process, and finally commit the order. In this scenario, completion and

performance of the session in its entirety is a critical QoS goal, in contrast to the well adopted individual

request-based QoS goals.

Request-based QoS goals typically aim for absolute performance metrics such as response time and queueing delay. However, they are not applicable to session-based workloads because of the dynamic session

length. Session length, which is the number of requests in a session, is dynamic and is unknown at

the time of session origination. Therefore, it is not practical to provide absolute session response time

or session delay guarantees. A relative performance metric that is independent of session length is

favorable for session based Internet services.

3. Tier-specific servers and constraints : Despite being interdependent, each tier of a multi-tier Internet

service comprises distinct servers with distinct characteristics. In a typical e-commerce service, web servers make up the web tier, the application tier is composed of Java application servers and the database tier

hosts SQL servers. While a web server may exclusively handle HTTP protocol, an application server is

capable of handling additional protocols like TCP/IP and SMTP. Similarly while the application servers

host business objects like EJBs, a database server hosts a database instance like MySQL. Different

server types are characterized by distinct performance metrics. Collectively modeling the effects of

different types of servers is a complex task [105].

Further complications result from different constraints enforced at different tiers. For instance, not all

tiers of a multi-tier Internet service may be replicable. Typically, database tier is difficult to replicate

on-the-fly.

4. Dynamic bottleneck tier shift : In a multi-tier Internet service, the resource demands posed by user

sessions on different tiers are dynamic in nature. Different customers often exhibit different navigational patterns and hence invoke different functions in different ways and with different frequencies [117]. As

the client access patterns change, the bottleneck tier dynamically shifts among tiers.

To demonstrate the dynamic behavior of multi-tier Internet services, we simulate the activities of an

e-commerce service using the industry standard TPC-W benchmark workloads. TPC-W supports three distinct session mixes, Browsing, Shopping and Ordering. Each mix is characterized by different probability-based session navigational patterns. Sessions belonging to different mixes visit the tiers a varying number of times with different workloads. (TPC-W workloads and the e-commerce service simulator are further discussed in detail in Chapter 3.)

Intervals               Bottleneck Tier
1, 4-8, 10-40, 42-50    None
2                       All
3, 9                    Web
41                      Database

Table 1.1: The shifting bottleneck tier problem.

Figure 1.6: Dynamics of a 3-tier e-commerce service. (Plot: web, application and database tier utilization (%) over 50 sampling intervals.)

Figure 1.6 depicts the tier-specific capacity utilizations measured at different sampling intervals when the multi-tier service is subjected to a combination of equal numbers of TPC-W Browsing, Shopping and Ordering sessions. A tier is considered to be the bottleneck tier if its capacity utilization exceeds a pre-configured threshold, set to 60% for this experiment. Table 1.1 captures the observed bottleneck tier for each sampling interval. It is clear that a different tier becomes the bottleneck at different intervals. For instance, during sampling interval 9 the web tier is the bottleneck, whereas the database tier becomes the bottleneck during interval 41. This demonstrates the bottleneck shift challenge inherent to multi-tier Internet services (a small detection sketch is given after this list of challenges).

Figure 1.7: Traffic volume intensity variations at an online travel agency.

5. Highly variable workloads : Internet services experience extreme user demand variations because of

the unpredictable nature of the Internet traffic. Predicting the peak workload of an Internet service and

capacity provisioning based on worst case estimates is notoriously difficult [106]. Often unforeseeable

events such as a stock market roller coaster ride, terror attacks, or a Mars landing can result in a surge of

Internet traffic. The unexpected traffic surges can quickly saturate the service capacity and hence affect

the services and even lead to lawsuits due to breakage of service level agreements [117].

Internet workloads exhibit long-term variations such as time-of-day effects as well as short-term fluctuations identified by burstiness (temporal surges). The workload experienced by a top national online

travel agency web site is characterized in [84]. The study reveals the traffic patterns and load variations

experienced by the web site over a period of 7 days as shown in Figure 1.7.

Internet traffic burstiness is explored in [79]. The work analyzes the 1998 FIFA World Cup web-site

traces over a period of ten days. The analysis reveals dramatic traffic surges connected to sport events

as shown in Figure 1.8. Burstiness, or temporal surges in the incoming requests to an e-commerce server, generally turns out to be catastrophic for performance, leading to dramatic server overloading,

uncontrolled increase of response times and, in the worst case, service unavailability.

Figure 1.8: 1998 FIFA World Cup trace over ten consecutive days : Burstiness of traffic arrival

6. Virtualization : In shared virtualized hosting platforms, performance of hosted services relies on effective management of each virtual machine's resource configuration. VMs should be dynamically

resized in response to the change in application demands. Dynamic Internet workloads can possibly

make prior good VM configurations no longer suitable and result in significant performance degradation. Once a VM is reconfigured, there is a possibility of delay before the performance can stabilize.

The difficulty in evaluating the immediate output of management decisions makes the modeling of

application performance even harder [116].

In addition to the performance of the hosted services, it is important to optimize the system-wide

resource utilization. Although server virtualization helps realize performance isolation to some extent,

in practice, VMs still have chances to interfere with each other. It is possible that one rogue hosted

service could adversely affect the others [18].
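As a concrete illustration of the bottleneck detection used in the Table 1.1 experiment (challenge 4 above), the following sketch labels the bottleneck tier(s) for each sampling interval from per-tier utilization samples. The 60% threshold matches the experiment; the utilization values are made-up examples.

```python
# Sketch of the bottleneck-tier labeling used in the Table 1.1 experiment
# (60% threshold as in the experiment; the utilization samples are made-up values).

def bottleneck_tiers(per_tier_utilization, threshold=60.0):
    """Return the list of tiers whose utilization exceeds the threshold."""
    return [tier for tier, util in per_tier_utilization.items() if util > threshold]

samples = [
    {"web": 35.0, "app": 28.0, "db": 22.0},   # lightly loaded interval
    {"web": 72.0, "app": 65.0, "db": 61.0},   # all tiers saturated
    {"web": 68.0, "app": 40.0, "db": 30.0},   # web tier is the bottleneck
    {"web": 45.0, "app": 50.0, "db": 74.0},   # bottleneck shifts to the database tier
]

for interval, sample in enumerate(samples, start=1):
    print(interval, bottleneck_tiers(sample) or ["None"])
```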

1.4 Statistical Learning for Multi-tier Internet Service Management

Statistical learning is concerned with the design and development of algorithms that allow computers to evolve behaviors based on knowledge gained from dynamic observation. The learning strategies are classified into two types : supervised and unsupervised. Supervised learning typically operates in two phases, training and prediction. While the training phase is used to gain generalized knowledge about the system under consideration, the prediction phase is used to predict the system behavior using the knowledge gained. Unsupervised techniques, on the other hand, operate independent of training data. Reinforcement learning, a form of unsupervised learning, is the process of learning by interacting with dynamic environments without requiring explicit offline models. In this thesis, we apply both types of statistical learning and their combination towards management of multi-tier Internet services.

Statistical learning is our choice of technical approach due to the following advantages :

1. Due to the complex nature of the multi-tier Internet services and high variability of the Internet workloads, it is extremely difficult, if not impossible, to derive a concrete analytical model of a multi-tier

system that effectively captures the complete system dynamics. Statistical learning as a modeling tool

provides an attractive alternative solution, where the behavioral dynamics of the multi-tier service can

be “learned” based on observing its operation, without requiring a priori application-specific knowledge.

2. Dynamic observation of a multi-tier Internet service during the training phase captures its behavior for

a large number of distinct workloads. Capturing a large volume of behavior allows us to take advantage

of the “law of large numbers” and apply statistical techniques with reasonable confidence [55].

3. By employing statistical learning approaches, a multi-tier Internet service operating in a dynamic environment gains the ability to learn and adapt to workload variations. Therefore an administrator can

effectively manage hosted multi-tier Internet services without having to foresee and provide solutions

for all possible workload scenarios.
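To make the supervised/unsupervised distinction described at the start of this section more concrete, the sketch below shows a single step of generic tabular Q-learning, the textbook form of reinforcement learning. It is only illustrative: the reinforcement learning approach developed in Chapter 5 extends this basic scheme with cascade neural networks, and the states, actions and reward used here are assumptions.

```python
# Generic tabular Q-learning update (the textbook form of reinforcement learning).
# The RL approach in Chapter 5 extends this with cascade neural networks; the
# states, actions and reward below are illustrative assumptions only.

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward the observed reward plus discounted best future value."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Example: coarse utilization levels as states, VM CPU-share adjustments as actions.
actions = ("increase_cpu", "decrease_cpu", "no_op")
Q = {}
q_update(Q, state="high_util", action="increase_cpu", reward=1.0,
         next_state="medium_util", actions=actions)
print(Q)  # {('high_util', 'increase_cpu'): 0.1}
```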

Table 1.2 identifies the various statistical learning techniques utilized throughout this dissertation. A detailed description of each learning technique and its role in managing multi-tier Internet services is provided in later chapters.

Table 1.2: Statistical learning techniques

Learning Technique        | Learning Type             | Description
Bayesian Network          | Supervised                | A probabilistic graphical model that depicts a probability distribution over a set of problem domain model variables.
Statistical Regression    | Supervised                | A framework to gain knowledge of a problem domain, construct system models and make quantitative predictions based on the models.
Reinforcement Learning    | Unsupervised              | A reinforcement learner interacts with its environment and receives a reward or penalty for each of its actions. It aims to generate policies that optimize a long-term goal.
Cascading Neural Networks | Supervised / Unsupervised | Self-organizing and self-learning neural networks that capture complex non-linear relationships between multiple problem domain variables.

1.5 Research Objectives

Our primary research goal is effective and efficient management of multi-tier Internet services for improved performance and QoS provisioning, while realizing efficient, balanced and scalable services. Towards this goal, we explore statistical learning based approaches for three critical management mechanisms : admission control, dynamic resource provisioning and service differentiation, in the context of hosted virtualized multi-tier Internet services. We argue for and justify a novel relative performance metric, Session Slowdown, that is favorable for session-based Internet workloads. Finally, we design and develop a user-interface based Multi-tier Internet Service Management and Monitoring Console, which can be used by a database administrator to monitor and manage hosted multi-tier Internet services in a shared virtualized platform.

A brief discussion of the topics that comprise our research follows.

1. Session based admission control : When a hosted multi-tier Internet service experiences transient

high user demand, restricting its availability to users is necessary to avoid complete service breakdown.

Admission control protects the service by preventing resource overload through policing and selective

acceptance of incoming traffic. Admission control employed at the request level can certainly protect

the service from overload. However it might result in resource wastage in the form of aborted sessions.

To improve the effective session throughput of a multi-tier Internet service, admission control ought

to be employed at session level. A multi-tier Internet service equipped with an admission controller is

depicted in Figure 1.9.

Figure 1.9: A multi-tier Internet service with admission control.

2. A session oriented relative performance metric - Session slowdown: The service quality and responsiveness of a request based system are typically measured by the absolute performance metrics, response time and queueing delay. However, these metrics do not take into consideration the varying demands posed by the different requests. It is known that clients are more likely to anticipate short

delays for “small” requests and more willing to tolerate long delays for “large” requests [47]. Request

slowdown is the relative ratio of a request’s queueing delay to its service time. Because the slowdown

metric directly translates to user-perceived relative performance and system load, it has been accepted

as an important performance metric on servers [47, 127, 128].

In addition to the traditionally accepted request oriented performance metrics, we consider the performance of the session in its entirety an important QoS goal. The session length, which is the total

number of requests of the session, is unknown at the time of session origination. Due to dynamic session length, it is not practical to guarantee absolute session completion time and session delay of a user

session. Instead, a relative performance metric that is independent of session length is appropriate for

session based Internet services [128]. We promote a new session-oriented relative performance metric, session slowdown, which captures user perceived performance at the session level (an illustrative computation sketch is given after this list).

3. Dynamic resource provisioning for performance assurance : Dynamic resource provisioning

enables on-the-fly resource management to effectively handle dynamic workloads experienced by a

hosted Internet service. Ideally the service should be assigned a necessary and sufficient amount of resources to handle its current load [100]. A high level view of dynamic resource provisioning to provide

QoS assurances in a multi-tier Internet service is depicted in Figure 1.10. As the observed performance

of the service deteriorates, additional resources are added to satisfy the session slowdown guarantees.

Conversely, as the allocated resources become under-utilized they are dynamically removed from the

Internet service.

Figure 1.10: Dynamic resource provisioning in a multi-tier Internet service.

We consider two levels of granularity for adaptive resource provisioning. One, the number of virtual machines allocated to the web, application and database tiers of a multi-tier Internet service. Second,

multiple resources (cpu, memory, disk space) of the three dedicated virtual machines hosting the web,

application and database tiers of the multi-tier service.

4. Service differentiation : An Internet service can support multiple classes of customers such as premium members, regular subscription members and free users. Traffic from the premium members contributes more to system revenue than traffic from the other classes. Therefore, the priority of serving

the sessions initiated by the premium members always precedes that of serving other classes. Service differentiation, a key QoS requirement for multi-tier Internet services [34, 35, 69], is to provide

differentiated QoS treatment of multiple customer classes.

Service differentiation is applicable not only to multiple classes of a single application, but also to

multiple Internet services hosted in a single shared platform. Provisioning differentiated treatment

to co-hosted Internet services is necessary due to their different subscription payments to the hosting

platform [30, 35, 40, 69, 87]. Service differentiation is to provide different service quality levels to

meet changing system configuration and resource availability and to satisfy different requirements and

expectations of applications and users.

5. Monitoring and Managing hosted services : Success of an e-commerce business depends on its web

site being available and responsive to its customers. Optimizing the performance and availability of

such Internet services is a crucial task. Monitoring the service’s resources and viewing the service from

an end user’s perspective is critical to ensure that the service is executing with acceptable performance

with regards to the customer service level agreements (SLA). In-depth monitoring can pinpoint existing and potential problems, helping detect them before they impact the end users. Ideally, an

administrator should be alerted to outages, error conditions and threshold violations before they affect

the end users.
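As referenced in research objective 2 above, the following sketch illustrates the slowdown idea. Request slowdown is the ratio of a request's queueing delay to its service time; for the session-level metric we assume here one natural session-length-independent aggregation (total queueing delay over total service time) purely for illustration, since the precise definition used in this thesis is given in Chapter 4. The threshold-style provisioning check mirrors the spirit of Figure 1.10; all numbers are made up.

```python
# Illustrative sketch only: request slowdown is queueing delay / service time.
# The session-level aggregation below (total delay over total service time) is an
# assumption for illustration; the exact definition is given in Chapter 4.

def request_slowdown(queueing_delay, service_time):
    return queueing_delay / service_time

def session_slowdown(requests):
    """requests: list of (queueing_delay, service_time) pairs for one session."""
    total_delay = sum(d for d, _ in requests)
    total_service = sum(s for _, s in requests)
    return total_delay / total_service

def provisioning_action(observed_slowdown, slowdown_threshold, utilization, low_util=0.3):
    """Threshold-style decision in the spirit of Figure 1.10 (values are assumptions)."""
    if observed_slowdown > slowdown_threshold:
        return "add a virtual server to the bottleneck tier"
    if utilization < low_util:
        return "remove an under-utilized virtual server"
    return "keep current allocation"

# A short session of three requests: (queueing delay, service time) in seconds
session = [(0.10, 0.05), (0.40, 0.20), (0.06, 0.12)]
print(session_slowdown(session))             # ~1.51, independent of session length
print(provisioning_action(1.51, 1.0, 0.65))  # -> add a virtual server to the bottleneck tier
```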

1.6 Research Contributions

Our research contributions are as follows :

1. A statistical learning based coordinated session based admission control strategy for a hosted Internet

service. It employs a Bayesian Network model of the multi-tier service to achieve inter-tier coordination for a balanced service and improved session throughput.

2. A novel session oriented relative performance metric, session slowdown, which is independent of the

session length. We promote the use of session slowdown to capture the user perceived QoS for session

based Internet workloads.

3. A statistical regression based adaptive resource provisioning strategy for session slowdown assurances

and effective resource utilization in a multi-tier Internet service under dynamic workloads. It considers

provisioning at the granularity of virtual machines, that is, allocating and removing the virtual machines

at the various tiers of a hosted multi-tier Internet service.

4. A reinforcement learning based coordinated combination of adaptive resource management and session based admission control for simultaneous service differentiation and performance maximization

of multiple multi-tier services co-hosted in a virtualized platform. A basic reinforcement learning based

strategy is enhanced with cascading neural networks to facilitate scalability and agility. Provisioning

is conducted at the granularity of virtual machine resources, that is, adaptively managing cpu, memory

and disk resources of host virtual machines.

5. A user interface based desktop tool, Multi-tier Internet Service Management and Monitoring Console,

intended to be used by a data center administrator to monitor and manage multiple multi-tier Internet

services in a shared virtualized platform.

1.7 Thesis Roadmap

The remainder of this thesis is organized as follows.

Chapter 2 provides an overview of published research literature related to Internet services, multi-tier models, virtualization and data centers. We discuss the state-of-the-art admission control, dynamic resource provisioning and service differentiation techniques for single and multi-tier systems. We further discuss representative efforts that adopted statistical learning approaches in the domain of multi-tier Internet services.

In Chapter 3, we propose a novel statistical learning based 'Coordinated Session-based Admission Control (CoSAC)' algorithm, which uses a Bayesian network to correlate the utilization states of all tiers. We conduct performance evaluation of the proposed admission control strategy using an e-commerce simulator and TPC-W benchmark workloads.

Chapter 4 provides a statistical regression based approach for session-oriented performance guarantees in a multi-tier Internet service. Regression models to capture service dynamic behavior are learned during training, which are further used to achieve session slowdown guarantees under dynamic workloads. We conduct performance evaluation using an e-commerce simulator and industry standard TPC-W benchmark to demonstrate the effectiveness and efficiency of the regression based adaptive resource provisioning.

A model-independent coordinated combination of adaptive resource management and admission control for multi-tier Internet service differentiation and performance improvement of individual hosted services is designed in Chapter 5. We develop reinforcement learning based approaches for virtual machine auto-configuration and session based admission control. We implement the integrated approach in a virtualized blade server system hosting multi-tier RUBiS applications and conduct extensive performance evaluation.

A user interface based desktop tool, 'Multi-tier Internet Service Management and Monitoring Console', is presented in Chapter 6. The application allows a data center administrator to monitor and manage multiple multi-tier Internet services hosted in a shared virtualized platform. The administrator can fine-tune the performance of different statistical learning based strategies used for managing the services.

Chapter 7 concludes the thesis with a summarization of the proposed multi-tier Internet service management techniques and evaluation results, a discussion of identified drawbacks and directions for future work.

Chapter 2

Related Work

Multi-tier Internet services and their management is an active research topic with literature contributions from academic institutions and corporate research facilities alike. Data centers that typically host the multi-tier services and the underlying paradigms of virtualization and cloud computing have been extensively studied in the recent past. There exists an enormous body of research related to analytic models of multi-tier architectures, admission control, capacity planning, adaptive resource management and service differentiation in the domain of Internet services, see [43] for a survey. In this chapter we categorize and discuss representative related research efforts.

2.1 Multi-tier Systems : Analytical Models

Understanding complete system dynamics in a multi-tier system is a complex and non-trivial task. Several research efforts attempt to model multi-tier services and target specific QoS goals [34, 120, 93, 119].

Urgaonkar et al. in [104] proposed an analytic model for session-based multi-tier services using a network of queues, where each queue represents a different tier of the service. Their queuing model can handle services with an arbitrary number of tiers and account for service idiosyncrasies such as replication at tiers, load imbalances across replicas, caching effects, and concurrency limits at each tier. It proposes a Mean-Value Analysis (MVA) algorithm for closed-queuing networks to compute the response time experienced by a request in a network of queues. A very similar work in [28] models multi-station queues to capture the multi-thread architecture and handle the concurrency limits.
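For reference, the sketch below implements the standard exact MVA iteration for a closed network of queueing tiers with client think time. It is the generic textbook algorithm, not the extended version in [104] that handles replication, caching effects and concurrency limits; the per-tier service demands and think time are made-up values.

```python
# Generic Mean-Value Analysis (MVA) for a closed queueing network of tiers.
# This is the standard textbook iteration, not the extended algorithm of [104];
# the service demands and think time below are made-up example values.

def mva(service_demands, num_sessions, think_time):
    """service_demands: per-tier service demand (seconds per request);
    returns (throughput, per-tier mean queue lengths, end-to-end response time)."""
    queue = [0.0] * len(service_demands)          # mean queue length at each tier
    throughput, response = 0.0, 0.0
    for n in range(1, num_sessions + 1):
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        response = sum(residence)                 # mean response time of a request
        throughput = n / (think_time + response)  # closed-network throughput
        queue = [throughput * r for r in residence]
    return throughput, queue, response

# Example: web, application and database tiers with 50 concurrent sessions
X, Q, R = mva([0.005, 0.020, 0.010], num_sessions=50, think_time=1.0)
print(round(X, 2), [round(q, 2) for q in Q], round(R, 4))
```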

Diao et al. in [34, 35] propose another queuing network representation of multi-tier systems to enable differentiated services in web services. It combines layered queuing approaches with a few key approximation-based simplifications. Dependencies between multiple tiers, per-tier concurrency limits and resource contention are captured using an M/M/1 queueing model.

Queuing models were also used to represent a virtualization scheme in [5], where physical resources were partitioned into multiple virtual resources. Each virtual machine is represented by an analytical M/G/1 open queuing model, which is further used to predict relevant performance metrics.

In [120] an analytic model based on a network of queues was designed for evaluating multi-tier system performance. This model is capable of modeling diverse workloads with changing transaction mix over time.

It’s effectiveness is based on a regression based methodology to approximate the CPU demands of customers transactions on a given hardware along all the tiers in the system.

Bhulai et al. in [13] modelled a multi-tiered Internet service within a queueing theoretical framework.

Their model consists of N+1 nodes with a single entry node which receives the requests and sends requests to the other N nodes in a deterministic order. The request arrival distribution is assumed to be a Poisson process. Using this model, they derive expressions for the mean end-to-end latency and approximations to its variance.

A multi-tier web application is abstracted as an M/GI/1 Processor Sharing queue (M/GI/1/PS) in [68].

It uses a feed-forward queueing model predictor to maintain the system state near an equilibrium operation point, in spite of changes in the workload arrival process. Combined with adaptive control feedback, the queueing model predictor enforces admission control of the incoming requests to ensure that the desired response time target is met.

Layered queuing models are also applied in [50] to provide an end-to-end solution for fine-grain dynamic resource management in virtualized, consolidated server environments hosting multiple multi-tier applications.

It presents a novel hybrid approach for automatic generation of adaptation policies that uses a combination of offline queuing model evaluation, optimization, and offline rule generation using decision tree learning.

In a multi-tier system, the user perceived performance is the result of a complex interaction of complex workloads in a very complex underlying system [78]. Workload flows to a data center are often characterized as bursty. Recent studies [79, 97] have seen highly dynamic workloads of Internet applications that fluctuate over multiple time scales, which can have a significant impact on the processing demands imposed on servers.

While analytical models exist for multi-tier systems, each model aims for a specific QoS goal, such as bounded response time or request delay. Extending a QoS metric oriented model for another metric, like session slowdown, is a non-trivial challenge. Moreover, queuing based analytical models often fail to capture the dynamic bottleneck shift and session workload dynamics on multi-tier systems.

In this thesis, we use statistical learning approaches to gain insight into the complex dynamics of multi-tier environments and build scalable Internet services that provide high performance along with request and session oriented QoS guarantees under unpredictable workloads.

2.2 Admission Control for Internet Services

Admission control for e-commerce services is a non-trivial challenge and receives extensive attention from the research community [12, 22, 25, 67, 130].

Zhou et al. in [121] propose a load shedding mechanism for busy Internet services for overload protection.

The work recommends a selective early request termination mechanism to actively detect and abort overdue long requests to improve system throughput. However, the focus of this work is a single web server and it aims to improve the request throughput. In this thesis, we consider the multi-tiered architecture of modern Internet services with the goal of improving session throughput.

The work in [39] deploys an admission controller between the service tier and the bottleneck database tier in a three-tier web site. It identifies different types of servlets and performs overload protection and preferential request scheduling in the form of shortest job first. It assumes that the database tier is the bottleneck and overload control is applied to protect it. We argue that assuming a static bottleneck tier is very simplistic and does not represent the true dynamics of a multi-tier system. Effective overload management of a multi-tier system should take into consideration the dynamic bottleneck shift based on the workload experienced.

Kamra et al. in [51] present Yaksha, a control-theoretic approach for admission control in multi-tiered Web sites that both prevents overload and enforces absolute client response times, while still maintaining high throughput under load. It utilizes a self-tuning proportional integral controller and does not require parameterization of controller weights. Its focus, however, is still the response time guarantees at the request level and request based throughput. Improved throughput of requests does not correspond to improved session throughput.

The work in [52] uses an on-line feedback loop with an adaptive controller to selectively accept storage access requests in data centers to ensure that the available system throughput is shared among workloads according to their performance goals and their relative importance. The controller considers the system as a black box and adapts automatically to system and workload changes. The controller is distributed to ensure high availability under overload conditions, and can be applied for both block and file access protocols.

The work in [42] focuses on session-based admission control for secure dynamic web contents. It recognizes the fact that the cost of establishing a new Secure Socket Layer (SSL) connection is much greater than that of a resumed SSL connection. An admission control approach was designed that prioritizes the resumed SSL connection for performance improvement.

SBAC is an innovative work on session-based admission control on e-commerce websites [29]. With a simulation model, it shows that an overloaded web server can experience a severe loss of throughput measured as a number of completed sessions compared against the server throughput measured in completed requests.

It is able to provide a fair guarantee of session completion for any accepted session, independent of the session length. However, SBAC was designed for overload control in a single web server and is not effective in a multi-tier server system, as the bottleneck tier shifts among tiers when access patterns change dynamically.

Overload control and service differentiation in multi-tier web services under bursty workloads is addressed in [70] using autonomic session-based admission control. It proposes AWAIT, which differentiates between the requests of new and already accepted sessions. Requests of already accepted sessions are buffered in a blocking queue instead of being rejected. The blocking queue capacity is adjusted in response to workload burstiness.

While admission control has been extensively documented in the literature, these mechanisms do not provide any coordination between multiple tiers for session-based admission control. One of the goals of this thesis is to improve session throughput in multi-tier Internet services by employing a statistical learning based coordinated admission control.

2.3 Dynamic Resource Provisioning of Internet Services

Dynamic resource provisioning for Internet services and adaptive resource management of data centers has been an active research topic in the recent past [5, 19, 34, 49, 51, 76, 103, 104, 106, 109, 112].

Bennani et al. in [10] address server provisioning in large data centers that host several service environments and are subjected to workloads whose intensity varies widely and unpredictably. They use analytic performance models to design efficient controllers that dynamically switch servers from one service environment to another as needed.

Along the same lines, resource allocation for various service environments in generic large-scale utility computing infrastructures is addressed by Costa et al. in [33]. They provide a decentralized resource selection service where each compute node is directly responsible for providing accurate and timely information about its resources. Eliminating a centralized resource delegation service represents a simple solution to both implement efficient lookups and support large dimensional data.

In [57], Lama and Zhou proposed an efficient server provisioning scheme based on an end-to-end resource allocation optimization model and a model-independent fuzzy controller for the average and 90th-percentile request delay guarantees. They designed the model-independent fuzzy controller to address the lack of an accurate performance model.

A middleware for controlling performance and availability of cluster-based multi-tier systems, MoKa, is proposed in [7]. It uses an analytic model to predict the performance, availability and cost of cluster-based multi-tier services. A utility function based capacity planning algorithm calculates the optimal service configuration that guarantees performance and availability objectives while minimizing operating cost.

A similar argument presented in [108] states that the way the application tier is provisioned can significantly impact a provider's profit margin. It designs queueing-theoretic methods to provision servers in the application tier with a profit optimization model.

Spietkamp et al. in [99] proposed a capacity planning method for virtualized IT infrastructures that employs a combination of data preprocessing and an optimization model. Their approach can solve large-scale server consolidation problems in minutes based on simplifying heuristics. They show that the size of the optimization problem to be solved is significantly impacted by the resource demands experienced by the services hosted in the virtualized platforms.

The challenge of dynamic provisioning of database servers in multi-tier e-commerce applications in cloud computing platforms is addressed in [23]. It proposes Dolly, a database provisioning system based on VM cloning and cost models to adapt the provisioning policy to the cloud infrastructure specifics and application requirements. They show that the VM cloning replication time depends only on the VM disk size and is independent of the database size, schema complexity and database engine. Using this information, they propose models to accurately estimate the database replica spawning time.

The work in [111] proposes a probabilistic performance model between the virtualized resources and the hosted application performance. The relationships between CPU allocation, CPU contention, and application response time are further used to enable autonomic controllers to satisfy service level objectives while effectively utilizing the available resources. The model is based on the probability distributions of performance metrics, in terms of percentiles.

Xu et al. in [116] present a unified reinforcement learning approach, URL, to automate the configuration of both the VMs and the applications hosted in the VMs. Through this combined configuration, their approach is flexible enough to make good trade-offs between the resource utilization and application performance goals.

To accelerate the learning process in large scale systems and improve the data efficiency, they employ model based reinforcement learning.

Rao et al. in [86] design a reinforcement learning strategy for virtual machine auto-configuration called VCONF. It automates the VM configuration and dynamically reallocates the resources allocated to VMs in response to changes in service demands or resource supply. The same authors propose iBalloon [85], a distributed learning mechanism that facilitates self-adaptive virtual machine resource provisioning. In this system, each VM acts as an autonomous agent and requests resources for its own benefit. Thus the resource allocation is treated as a distributed learning task.

In this thesis, we explore dynamic resource provisioning for multi-tier Internet services in virtualized shared hosting platforms. We use threshold based strategies to determine when to do the dynamic provisioning and statistical learning to determine how many virtual machines to provision. We uniquely aim for session oriented performance guarantees in addition to traditionally accepted request oriented guarantees. Additionally, we also consider resource provisioning at a finer granularity: the resources of each hosting virtual machine. Using advanced statistical learning approaches, we explore model-independent adaptive resource provisioning strategies that can operate without the need to model the complexities of multi-tier Internet services and their workloads.

2.4 Service Differentiation in Internet Services

Several research efforts [130, 128, 126, 123, 124, 122] addressed service differentiation in Internet services.

Service differentiation provisioning was extensively studied in the context of single-tier Web servers [8, 12, 67].

Generally, there are two approaches to resource allocation for ensuring the service quality level of requests or preserving the quality spacings between request classes: one is priority-based and the other is rate-based. Admission control, feedback control, and content adaptation are often combined with these two general approaches; see [3, 71, 128] for representative techniques and [129] for a survey.

Lee et al. in [66] proposed using traffic shaping and admission control to achieve proportional service differentiation for web servers. Maximum waiting time requirements of multiple clients are effectively met using two distinct admission control algorithms, one client based and the other server based.

A web server that can provide differentiated services to clients with different quality of service requirements is considered in [65]. It proposes using efficient admission control strategies to achieve effective service differentiation for the clients and to reduce usage costs of the server without violating the QoS requirements.

Queueing based analytical models for request slowdown differentiation on single-tier Internet servers are proposed in [127, 128]. In [127], Zhou et al. derived a closed form expression of the expected request slowdown in an M/G/1 FCFS queue with a bounded Pareto service time distribution.

The problem of quantitative service differentiation on cluster-based delay-sensitive servers is addressed by [125]. The work formulates the problem of quantitative service differentiation as a generalized resource allocation optimization towards the minimization of system delay, defined as the sum of weighted delay of client requests. It derives a closed-form expression of the expected slowdown of a popular heavy-tailed workload model with respect to resource allocation on a server cluster.

Garcia et al. discussed the main requirements that quality-of-service (QoS) control mechanisms must fulfill [40]. However, the proposed admission control based mechanism only works for absolute service guarantees on a single tier (the application server tier), rather than on a multi-tier architecture. There is no quantitative service differentiation between multiple applications on a shared platform.

There are recent studies on service differentiation provisioning for multi-tier applications [24, 34, 35, 40, 69]. Diao et al. focused on modeling service differentiation in multi-tier Internet applications based on M/M/1 queueing and the mean value analysis [34, 35]. However, the simple queueing model does not reflect the real Internet application workload characteristics [21].

Liu et al. developed a multivariate control approach for dynamic capacity management for differentiated treatment of multi-tier applications in a shared hosting platform [69]. Assuming a linear relationship between the CPU resource entitlement and response time of a virtual machine, it achieves differentiated request response time guarantees. It depends on model predictive control, relying on an accurate workload model that is difficult to obtain, particularly in the face of highly dynamic and bursty workloads. The approach may require modification to the virtualization layer's CPU scheduling mechanism, while a mechanism independent of the underlying virtualization and system software is favorable [40]. It does not address performance maximization of multiple applications with limited resources in a shared platform.

In a recent study, Rao et al. designed a QoS provisioning approach that can provide absolute service differentiation of multiple classes but it cannot control the relative quality spacings between different applications [87]. Overload control and service differentiation in multi-tier web services under bursty workloads is addressed using autonomic session-based admission control [70].

A study in [16] addresses the differentiation of multiple hosted services in terms of their availability guarantees. It proposes a self-managed key-value store that dynamically allocates cloud resources to several applications in a fair, cost-efficient fashion. Using a combination of VM migration and replication, it achieves multiple dynamic differentiated availability guarantees for each application despite node failures.

Padala et al. in [82] propose AutoControl, which provides service differentiation according to the hosted application priorities during resource contention. AutoControl is a combination of an online model estimator and a novel multi-input, multi-output (MIMO) resource controller. The model estimator captures the complex relationship between application performance and resource allocations, while the MIMO controller allocates the right amount of multiple virtualized resources to achieve application SLOs.

Sharing the network and disk I/O among the hosted applications and performance isolation of VMs in cloud platforms are addressed in [54]. It introduces the notion of Differential Virtual Time (DVT), which can provide service differentiation with performance isolation for VM guest OS resource management mechanisms. DVT is realized within a proportional share I/O scheduling framework without any changes to guest OSs. DVT is applied to message-based I/O, but is also applicable to subsystems like disk I/O.

While the works discussed so far use either admission control or dynamic resource management to achieve service differentiation, they do not consider a combination of the two for controlling the quality spacings between the hosted service classes. Almeida et al. addressed the problem of combined resource allocation and admission control in virtualized servers and provided a self-managing optimization technique to improve application response time [4]. Mazzucco et al. proposed heuristic server allocation and admission control algorithms for autonomic service provisioning systems [72], which can provide differentiated services to multiple customers and maximize the average revenue received per unit time. A similar combination was used to maximize provider profits in SaaS cloud systems [6].

While these few efforts explored the combination of admission control and dynamic resource management, they offer no coordination between the two. We design a model-independent service differentiation approach that utilizes a coordinated combination of adaptive VM auto-configuration and session-based admission control. It aims to provide both relative service differentiation and performance improvement of multi-tier applications.

2.5 Related Issues in Internet Service Management

Besides admission control, resource provisioning and service differentiation, there are several other issues that affect management of hosted multi-tier Internet services. In this section, we briefly discuss a few representative research efforts that address these related issues.

2.5.1 Network Virtualization

In addition to the VM specific resources (CPU, memory, disk size), another important shared resource of a data center is its network. Network virtualization is described as a networking environment that allows multiple service providers to compose heterogeneous virtual networks that co-exist in isolation from each other, and to deploy customized end-to-end services on those virtual networks by effectively sharing and utilizing underlying network resources provided by infrastructure providers [31]. A service-oriented network virtualization architecture was developed in [38], which consists of a physical infrastructure layer, a virtual network layer, and a service network layer from bottom to top. Analytical modeling and analysis techniques for evaluating end-to-end QoS in service-oriented network virtualization have also been developed in [37].

The authors of [95] show that simply relying on TCP's congestion control in production data centers can lead to performance interference and denial of service attacks. As a solution, they present Seawall, a network bandwidth allocation scheme that divides network capacity according to policies specified by network administrators. Seawall supports dynamic policy changes and scales to a large number of hosted applications and varying resource demands.

VL2 proposed in [41] is a network architecture for data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics. It uses flat addressing to allow service instances to be placed anywhere in the network, load balancing to spread traffic uniformly across network paths and end system based address resolution to scale to large server pools without introducing complexity to the network control plane.

The authors of [44] propose a data center network virtualization architecture called SecondNet to enable the abstraction of Virtual Data Centers while achieving high network utilization and low time complexity. It achieves scalability by distributing all the virtual-to-physical mapping, routing, and bandwidth reservation state in server hypervisors. The port-switching based source routing (PSSR) technique makes it applicable to arbitrary network topologies using commodity servers and switches.

2.5.2 Server Consolidation, VM Placement and VM Migration

Server consolidation is addressed in [27] with the goal of assigning VMs to as few physical servers as possible. The authors propose a novel VM sizing approach called effective sizing. Effective sizing decides a VM's resource demand through statistical multiplexing principles. It takes into consideration various factors impacting the aggregated resource demand of a host to decide where the VM should be placed. The consolidation is posed as a stochastic bin packing problem. Bobroff et al. in [15] propose an approach to identify the servers that are good candidates for dynamic placement, and present a mechanism for dynamic migration of VMs based on predicted loads. Along the same lines, Meng et al. in [77] propose a joint-VM provisioning approach in which multiple VMs are consolidated and provisioned based on an estimation of their aggregate resource need. Power efficiency is the main goal in another VM placement strategy proposed in [107]. The work in [118] addresses VM migration with the goal of integrated power and performance management. It provides data center administrators a UI to configure and control the system, so that performance costs and power costs are minimized while reducing the number of VM migrations performed.

2.5.3 Storage Virtualization

Virtualization of data center storage involves virtualizing physical storage in the enterprise storage area network (SAN) into virtual disks that can then be used by multiple applications. Storage virtualization also supports live migration of data in which a virtual disk can be migrated from one physical storage subsystem to another without any downtime.

An integrated server storage virtualization technique to scale to large amounts of storage is proposed in [110]. Local storage on each physical machine is partitioned into two parts, a persistent cache for locally hosted VMs and a contribution to a pool of distributed storage shared by the cluster. These two tasks are provided as a service running in an isolated “Parallax VM”. Parallax is deployed inside the Virtual Machine Monitor (VMM) using host based storage virtualization, which requires changes to the server virtualization environment for deployment.

A storage virtualization solution that is independent of the VMM is proposed in [96]. The authors propose HARMONY, an integrated solution for storage and server virtualization in a data center. They propose a load balancing algorithm called VectorDot for handling the hierarchical and multi-dimensional resource constraints in large data centers. They pose the problem as a multi-dimensional knapsack problem.

2.5.4 Security

Multi-tier Internet services are used in many important contexts and their security is a critical concern. The work in [17] proposes a redundant authentication as a strategy for achieving robust multi-tier security. It limits the privileges of higher tiers and requires that the higher tiers produce non-repudiable evidence that their principals have recently authenticated themselves to the service.

Security of the database tier, specifically protecting the database from SQL injection attacks, is addressed in [56]. The authors argue that one of the major security weak points of a website is a vulnerable database.

This may result in the loss of data, damage to the database, unauthorized access, bypassing of the authentication mechanisms or denial of services. They propose a framework for building secure and anti-theft web applications that operates in four stages. In each stage the user input data is analyzed for the possibility of an attack.

Security vulnerabilities due to misconfiguration of cloud services, such as faulty network security configurations, are discussed in [14]. It presents a novel approach to the security assessment of the end-user configuration of multi-tier architectures deployed on infrastructure clouds such as Amazon EC2. They propose a query and policy language for the analysis which can be used to obtain insights into the configuration and to specify desired and undesired configurations.

2.5.5 Distributed Data Centers

Managing hosted applications in distributed data centers is addressed in [2]. It considers both stateful and stateless Internet applications, with the goal of building cost effective data centers. The problem is solved using a combination of mixed integer programming and heuristic algorithms. Cost effectiveness is achieved by relaxing the delay requirement for a small fraction of users.

The work in [20] discusses job migration between data centers. The authors of this work argue against injudicious job migration, which might increase the overall operation cost due to the bandwidth costs of transferring application state and data over the wide-area network. It proposes novel online algorithms for migrating batch jobs between data centers to effectively handle the tradeoff between energy and bandwidth costs. The algorithms consider multiple factors such as current availability, cost of (possibly multiple) energy sources and future variability and uncertainty.

Rao et al. in [92, 91] tackle the problem of minimizing the total electricity cost across multiple Internet data centers. They consider the joint cyber and physical management capabilities of data centers, and exploit both central load balancing and server-level power control in a unified scheme. The problem is formulated as a constrained mixed integer program based on the Generalized Benders Decomposition (GBD) technique.

2.6 Statistical Learning Techniques for Internet Service Management

Statistical learning techniques are recently gaining popularity in the Internet services domain. As multi-tier systems grow in complexity, empirical models built using statistical learning have great potential in overcoming the scalability and complexity challenges [19, 89].

Statistical machine learning techniques have been used to measure the capacity of Internet websites [89].

It uses a Bayesian network to correlate low level instrumentation data such as system and user CPU time, available memory size, and I/O status that are collected at run-time to high level system states in each tier of a multi-tier web site. A decision tree is induced over a group of coordinated Bayesian models in different tiers to identify the bottleneck dynamically when the system is overloaded.

The work in [26] applies the K-nearest-neighbors (KNN) machine learning approach for adding database replicas in dynamic content Web server clusters. Their KNN based proactive scheme is effective in reducing both the frequency and peak level of service-level-agreement violations compared to the traditional reactive schemes.

Statistical learning techniques are also effectively used in configuration and tuning of system parameters.

A machine learning agent in combination with predefined rules is used in [115] to dynamically reconfigure high-end servers that can be partitioned into logical subsystems. This allows on-the-fly reconfiguration of distributed systems online to optimize for dynamically changing workloads.

The work in [19] uses a reinforcement learning approach for autonomic configuration and reconfiguration of multi-tier web systems. The proposed technique effectively adapts the performance parameter settings not only to the change of workload, but also to the change of virtual machine (VM) configurations. A reinforcement learning strategy is also used for virtual machine auto-configuration by VCONF [86]. It automates the VM configuration and dynamically reallocates the resources allocated to VMs in response to changes in service demands or resource supply.

Tesauro et al. proved the feasibility of statistical learning in autonomic resource allocation [101, 103] and power management [102]. A distributed learning mechanism in [85] facilitates self-adaptive virtual machine resource provisioning. In this system, each VM acts as an autonomous agent and requests resources for its own benefit. Thus the resource allocation is treated as a distributed learning task. These studies have demonstrated the effectiveness of using statistical learning techniques in system performance detection.

The work in [11] aims for an energy-efficient data center through an intelligent consolidation of turning on/off machines, power-aware consolidation algorithms, and machine learning techniques to deal with uncertain information while maximizing performance. Their machine learning framework uses models learned from previous system behaviors in order to predict power consumption levels, CPU loads, and SLA timings, and improve scheduling decisions.

2.7 Summary

In this chapter, we provided a brief discussion of a few representative research efforts in the domain of virtualized data centers managing multiple multi-tier Internet services. We summarize the literature reviewed in this chapter, along with a few additional works mentioned throughout this thesis, in Table 2.1 and Table 2.2.

Table 2.1: Related research summary - I

Citation, Year | Motivation | Technical Approach
[64], 2013 | Performance assurances in a data center through autonomic server provisioning | Neural fuzzy control
[63], 2012 | Performance isolation and reduced energy consumption in virtualized data center | Control theoretic framework + adaptive machine learning
[61], 2012 | Automation of heterogeneous Cloud resources | Two-phase machine learning and optimization framework based on support vector machines
[62], 2012 | Virtual server provisioning for quality-of-service assurance in multi-tier Internet applications | Model-independent fuzzy controller
[45], 2012 | Automated and agile server parameter tuning for maximizing effective throughput of multi-tier Internet applications | Neural fuzzy control
[46], 2012 | Coordinated performance control and power management in virtualized machines | A genetic algorithm with multi-agent reinforcement learning
[37], 2012 | Modelling of end-to-end network delivery in virtualized environments | Analytical models
[116], 2012 | Automated configuration of virtualized machines and appliances running in the virtual machines | Reinforcement learning
[2], 2012 | Cost effective distributed data centers | Mixed integer programming + Heuristics
[60], 2011 | Simultaneous power and performance guarantees in data centers | Fuzzy logic + MIMO control + Artificial neural network
[59], 2011 | Tradeoff between performance and resource allocation efficiency in data center | Multi-objective stress-strain curves
[83], 2011 | Power and performance management in virtualized server clusters | Optimization strategy based on mixed integer programming model
[85], 2011 | Self-adaptive virtual machines resource provisioning in cloud environments | Distributed multi agent machine learning
[87], 2011 | Response time assurances of web servers in virtualized environments | Self-tuning fuzzy control
[18], 2011 | Coordinated configuration of virtual machines and their applications in virtualized environments | Hybrid reinforcement learning
[23], 2011 | Dynamic database provisioning in cloud computing platforms | Virtual machine cloning using KNN machine learning
[30], 2011 | QoS differentiated provisioning of services | Optimization by solving constraint satisfaction problem
[27], 2011 | Server consolidation | Statistical multiplexing + bin packing problem
[58], 2010 | Autonomic server provisioning for performance assurance in data centers | Model-independent and self-adaptive neural fuzzy control
[4], 2010 | Joint resource allocation and admission control for QoS in service oriented architectures | Non-linear programming + heuristic decomposition
[7], 2010 | Performance and availability of cluster-based multi-tier systems | Analytical model and utility functions
[11], 2010 | Energy efficient data center | Machine learning
[16], 2010 | Differentiated availability guarantees in data clouds | Self managed key-value store
[21], 2010 | Minimize power usage in data centers under bursty workloads | Mathematical expressions for Index of dispersion
[38], 2010 | Network virtualization in data centers | Service-oriented layered hierarchical model
[97], 2010 | Dynamic provisioning technique for non-stationarity in the workload of Internet data centers | K-means clustering algorithm
[99], 2010 | Server consolidation in virtualized data centers | Heuristics based mathematical programming
[111], 2010 | Relationship between application performance and virtualized resource allocation | Probabilistic models
[44], 2010 | Data center network virtualization architecture | Port switching based routing
[14], 2010 | Security in user-configurable cloud services | Query and Policy language
[92], 2010 | Power aware data centers | Mixed integer programming based on Generalized Benders Decomposition
[91], 2010 | Minimize power costs across distributed data centers | Mixed integer programming based on Generalized Benders Decomposition
[19], 2009 | Autonomic configuration and reconfiguration of multi-tier web systems | Reinforcement learning
[31], 2009 | Network virtualization in Internet | Survey of the state-of-art
[33], 2009 | Resource selection in utility computing infrastructure | Centralized and local network query routing
[40], 2009 | Differentiation between distinct categories of service consumers and protection against server overloads | Control theory
[41], 2009 | Scalable network architecture in large data centers | Flat addressing + Valiant load balancing + End-system based address resolution
[57], 2009 | Dynamic server provisioning for quality-of-service assurance in multi-tier Internet applications | Model-independent fuzzy controller
[79], 2009 | Inject burstiness into TPC-W benchmark workloads | Mathematical expressions for Index of dispersion
[82], 2009 | Adaptive resource control for application SLOs in virtualized data centers | Online model estimator + MIMO controller
[86], 2009 | Automation of VM auto-configuration | Reinforcement learning
[112], 2009 | Workload balance in virtualized cluster systems | Symmetric multiprocessing (SMP) based tuning algorithms

Table 2.2: Related research summary - II

Citation, Year | Motivation | Technical Approach
[96], 2008 | Storage virtualization in large scale data centers | Multi-dimensional knapsack problem
[24], 2008 | Varying service levels in computing systems for trade-offs between output quality and performance | Constrained optimization problem
[50], 2008 | Adaptive configuration policies for virtualized consolidated server environments | Layered queuing models + Bin packing and gradient search
[68], 2008 | Automated performance control of Web applications | Queueing-Model-Based Adaptive Control
[78], 2008 | Detecting burstiness symptoms in multi-tier systems | Index of dispersion of the service process at a server
[89], 2008 | Capacity measurement in multi-tier website | Machine learning
[102], 2008 | Simultaneous online management of performance and power consumption in large-scale IT systems | Reinforcement learning
[106], 2008 | Dynamic provisioning for multi-tier Internet applications | Queueing models
[125], 2008 | Quantitative service differentiation on cluster-based delay-sensitive servers | Optimization for system delay
[13], 2007 | Modelling response time variability in multi-tier internet applications | Queueing theoretical framework
[28], 2007 | SLA guarantees for 3-tier e-commerce application in a virtualized environment | Queueing network models
[49], 2007 | Power management in server farms | Dynamic voltage scaling
[69], 2007 | Differentiated services in shared hosting platform | MIMO control
[108], 2007 | Dynamic provisioning of servers in application tier | M/G/1/PS queueing models
[109], 2007 | Virtual-appliance-based autonomic resource provisioning framework for large virtualized data centers | Non-linear constrained optimization model
[115], 2007 | Online reconfiguration of distributed systems | Machine learning
[119], 2007 | Performance prediction models of enterprise multi-tier applications | Statistical regression + analytical models
[120], 2007 | Dynamic provisioning of multi-tier Internet applications | Statistical regression + analytical models
[43], 2007 | Performance management for Internet applications | Survey
[5], 2006 | Quality of service in service oriented architectures | Optimization problem
[26], 2006 | Autonomic database provisioning in dynamic web servers | KNN Machine learning
[34], 2006 | Service differentiation in multi-tier web applications | Queuing models
[35], 2006 | Modeling differentiated services of multi-tier web applications | Queuing models
[74], 2006 | Utility function optimization in virtualized environments | Analytical models
[121], 2006 | Overload control in Internet services | Threshold based monitoring of request lengths
[10], 2005 | Autonomous resource allocation in data centers | Analytical modelling
[42], 2005 | Session-based adaptive overload control in web sites | SSL connection differentiation
[93], 2005 | Differentiated QoS in web servers | Analytical models
[101], 2005 | Online resource allocation in a distributed multi-application computing environment | Reinforcement learning
[104], 2005 | Modelling multi-tier Internet services | Analytical models
[122], 2005 | Proportional service differentiation in Internet service clusters | Closed form expression for slowdown differentiation
[124], 2005 | Responsiveness differentiation in Web servers | Queuing theory + feedback controller
[39], 2004 | Admission control for e-commerce web sites | External profiling measurements
[51], 2004 | Performance management of multi-tier web sites | Control theory
[52], 2004 | Performance isolation and differentiation for storage systems in data centers | On-line feedback loop with adaptive controller
[66], 2004 | Admission control and proportional differentiation in web servers | Non cooperative gaming technique
[123], 2004 | Proportional differentiation for scalable web servers | Queuing models
[126], 2004 | Two dimensional service differentiation for online transactions | M/G/1 queuing model
[71], 2003 | Delay guarantees in web servers | Queueing theory + feedback control
[3], 2002 | Performance guarantees in web servers | Control theory
[22], 2002 | Session-based admission control in web servers | Non linear optimization
[25], 2002 | Overload control in web servers | Dynamic weighted fair sharing scheduling
[29], 2002 | Session-based admission control in web servers | Exponential moving averages
[130], 2001 | Service differentiation in cluster-based network servers | Dynamic scheduling
[8], 2000 | Differentiation in cluster based network servers | Constraint optimization
[67], 2000 | Request based admission control and differentiation in web servers | Exponential moving average based bandwidth allocation

Chapter 3

Coordinated Session Based Admission Control In Multi-tier Internet Services

3.1 Introduction

Admission control is critical for peak load management in Internet services and becomes crucial when the service is operating at or close to its maximum available capacity. We argue that an admission control mechanism should accept new clients only when the service can provide acceptable quality and can guarantee successful transaction completion. Internet services are typically session-based. A session is a sequence of individual requests of different types made by a customer during a single visit to a website [75]. Deferring clients at the very beginning of their transactions rather than in the middle is essential to minimize resource wastage due to aborted sessions [29].

Cherkasova et al. present a pioneering session-based admission control (SBAC) approach for e-commerce websites [29]. The work originally proposes to use the effective session throughput, defined as the number of completed sessions, instead of request throughput to evaluate web server performance. SBAC, though proven to improve the effective session throughput, was designed with single-tier web servers in mind. It is not effective for peak load management in a multi-tier architecture. It is non-trivial and, more importantly, ineffective to extend admission control strategies designed for single-tier systems to multi-tier systems. This is due to the dynamic bottleneck tier shift in a multi-tier website as client access patterns change.

Further challenges result from requests of different session types imposing different resource consumptions at the different tiers. For example, studies found that browsing related requests tend to put more pressure on the backend database server while ordering related requests tend to put the least pressure on the database tier [88, 89]. Therefore, designing the admission control simply based on the bottleneck tier is not effective. The admission control strategy should take into consideration the coordinated state of all tiers involved in processing the session to its successful completion.

In this chapter, we propose two new session-based admission control approaches for multi-tier Internet services with the goal of improving the effective session throughput. We aim to improve throughput by realizing a balanced multi-tier service.

We first propose a multi-tier measurement based admission control (MBAC), which proportionally accepts different session mixes based on the utilization of multiple tiers. The motivation is that saturation of the system in processing one type of requests may not necessarily mean that the system cannot handle other types of requests. By accepting a mixture of different types of sessions, the multi-tier service is more balanced, resulting in improved effective session throughput.

Next, we propose a statistical learning based coordinated session-based admission control approach (CoSAC). It achieves coordination among all service tiers by modeling the multi-tier system as a Bayesian network. The probability with which a session is admitted is determined by the probabilistic inference of the network after applying the evidence in terms of utilization and processing time at each tier to the Bayesian network. This results in a coordinated admission decision that takes into account the state of all tiers at the time of admission.

We compare MBAC and CoSAC with each other and also with a black-box approach tailored from SBAC.

We choose the SBAC strategy for performance comparison because it is a prevalent approach for session-based admission control in Web servers. We evaluate the approaches using the TPC-W workload benchmark in a three-tier e-Commerce simulator. Experimental results demonstrate the superior performance of CoSAC.

It can improve the effective session throughput by about 50% compared to the black-box approach in most scenarios, while MBAC can improve the effective session throughput by about 20%.

Table 3.1: Session based admission control strategies : notations

Symbol(s) | Description
u_web, u_app, u_db | Observed web, application and database tier utilizations
u_web^pred, u_app^pred, u_db^pred | Predicted web, application and database tier utilizations
u_web^tmin, u_app^tmin, u_db^tmin | Minimum utilization thresholds for web, application and database tiers
u_web^tmax, u_app^tmax, u_db^tmax | Maximum utilization thresholds for web, application and database tiers

3.2 Session Based Admission Control : Algorithms

3.2.1 A Blackbox Approach

The blackbox approach is a straightforward extension of SBAC, a widely accepted session-based admission control approach [29] that was designed for a single web server. The blackbox admission control approach extends SBAC to a multi-tier website as follows. The utilization of the various tiers is periodically monitored at sampling intervals. Based on the measured utilizations in the recent past intervals, the predicted utilization for each tier for the next interval is calculated using the exponential moving average method. The predicted utilizations are compared with the pre-configured tier-specific utilization thresholds. Based on the comparison results, the admission control decision is made whether to accept or reject new sessions in the next interval.

As soon as the predicted utilization of any tier exceeds the maximum threshold, new incoming user sessions will be rejected. New sessions will be accepted again when the predicted utilizations of all tiers fall below their thresholds in subsequent intervals. Essentially, the admission controller treats the multi-tier website as a blackbox. The admission decision is based on the utilization of the bottleneck tier, whichever it is in a multi-tier Internet service. Note that once a session is accepted, all of its requests are admitted regardless of the predicted tier utilizations.
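The exponential moving average prediction and threshold test of the blackbox controller can be illustrated with a short Java sketch. This is a minimal sketch, not the simulator's actual implementation; the class name, the smoothing factor value and the array-based tier indexing are assumptions introduced for illustration.

// Sketch of the blackbox prediction step (assumed names and values).
public class BlackboxAdmissionController {
    private static final double ALPHA = 0.7;            // assumed EMA smoothing factor
    private final double[] maxThreshold;                // per-tier maximum utilization thresholds
    private final double[] predicted = new double[3];   // web, app, db

    public BlackboxAdmissionController(double[] maxThreshold) {
        this.maxThreshold = maxThreshold;
    }

    /** Update the per-tier predictions from the utilizations measured in the last interval. */
    public void updatePredictions(double[] measured) {
        for (int tier = 0; tier < predicted.length; tier++) {
            predicted[tier] = ALPHA * measured[tier] + (1 - ALPHA) * predicted[tier];
        }
    }

    /** New sessions are admitted only if no tier is predicted to exceed its maximum threshold. */
    public boolean admitNewSessions() {
        for (int tier = 0; tier < predicted.length; tier++) {
            if (predicted[tier] >= maxThreshold[tier]) {
                return false;
            }
        }
        return true;
    }
}

Requests of already accepted sessions bypass this check, as noted above; only the decision for new sessions depends on the predicted utilizations.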

The admission control strategy is given by Algorithm 1. Table 3.1 summarizes key notations used.

Algorithm 1 Blackbox approach for session based admission control
repeat
    if (u_web^pred < u_web^tmax and u_app^pred < u_app^tmax and u_db^pred < u_db^tmax) then
        Accept requests from previously accepted sessions.
        Accept new incoming sessions.
    else
        Accept requests from previously accepted sessions.
        Reject new incoming sessions.
    end if
until (ALL SESSIONS PROCESSED)

3.2.2 The MBAC Approach

In a multi-tier e-commerce website, processing a request involves multiple system components in different tiers. Saturation of the system in processing one type of requests may not necessarily mean it cannot handle other requests. With the Blackbox approach, in any given interval, either all of the new sessions are accepted or none of them are accepted. This can lead to under-utilization of the system resources at certain tiers, when only one of the tiers is overloaded and the other tiers are operating at a normal load. We propose a multi-tier measurement based admission control (MBAC) approach, which aims to overcome this limitation by accepting different traffic mixes based on the utilizations of the individual tiers.

We consider the session-based workloads provided by the industry standard TPC-W benchmark specification. A TPC-W workload session belongs to one of three distinct traffic mixes: Browsing, Shopping and Ordering. Each workload mix is characterized by different probability based navigational patterns. TPC-W workloads are further detailed in Section 3.4. An interesting study conducted in [88, 89] shows that the Browsing mix tends to be database tier intensive, the Shopping mix tends to be application tier intensive and the Ordering mix tends to be web tier intensive.

The idea of MBAC is to pro-actively accept different session mixes in proportion to the predicted utilizations at the different tiers. By accepting a mixture of different sessions, MBAC achieves more balanced utilization of the tier resources and improves the effective session throughput (completed sessions). To facilitate this, two utilization thresholds per tier, a minimum and a maximum, are maintained. Once the minimum utilization threshold of one tier is exceeded, incoming sessions belonging to different traffic mixes are proportionally accepted to keep the tier utilizations in balance and improve session throughput.

The basic procedure is as follows. At periodic sampling intervals, the utilization of each tier is measured. Using the exponential moving average method, the predicted utilizations of all tiers (u_web^pred, u_app^pred and u_db^pred) for the next interval are computed based on the measured utilizations in a number of previous intervals. At the end of each interval, the admission control decision for the next interval is made based on the predicted utilizations as below :

• If the predicted utilizations of all tiers are below their minimum threshold values, all new sessions are admitted in the next interval.

• If the predicted utilizations of all tiers are above their maximum threshold values, no new sessions will be admitted in the next interval.

• If the predicted utilizations of all tiers are in the range of the minimum and maximum threshold values, new sessions belonging to different mixes will be accepted in the next interval in proportion to the predicted utilization ratio of the tiers, u_web^pred : u_app^pred : u_db^pred.

• If one of the predicted tier utilizations is above its maximum threshold, new sessions belonging to different traffic mixes will be accepted in proportion to the other two tier utilizations. For example, if the predicted utilization of the web tier exceeds the maximum threshold and the predicted utilizations of the application and database tiers are below the maximum thresholds, no sessions from the Ordering mix will be accepted but sessions from the Browsing and Shopping mixes will be accepted in proportion to the predicted utilizations of the database and application tiers, u_db^pred : u_app^pred.

• If two of the predicted tier utilizations are above their maximum thresholds, new sessions of only one type are admitted. The session type to admit is either Browsing, Shopping, or Ordering, depending on which tier utilization is below the threshold.

As in the blackbox approach, the requests from already accepted sessions are always admitted. The admission control strategy is given by Algorithm 2. Table 3.1 summarizes key notations used.

Algorithm 2 MBAC for session based admission control
repeat
    if (u_web^pred < u_web^tmin and u_app^pred < u_app^tmin and u_db^pred < u_db^tmin) then
        Accept all new sessions.
    else if (u_web^pred > u_web^tmax and u_app^pred > u_app^tmax and u_db^pred > u_db^tmax) then
        Reject all new incoming sessions.
    else if (u_web^tmin < u_web^pred < u_web^tmax and u_app^tmin < u_app^pred < u_app^tmax and u_db^tmin < u_db^pred < u_db^tmax) then
        Accept new Ordering, Shopping and Browsing sessions in the proportion u_web^pred : u_app^pred : u_db^pred
    else if (u_web^pred > u_web^tmax and u_app^pred < u_app^tmax and u_db^pred < u_db^tmax) then
        Accept only new Shopping and Browsing sessions in the proportion u_app^pred : u_db^pred
    else if (u_web^pred < u_web^tmax and u_app^pred > u_app^tmax and u_db^pred < u_db^tmax) then
        Accept only new Ordering and Browsing sessions in the proportion u_web^pred : u_db^pred
    else if (u_web^pred < u_web^tmax and u_app^pred < u_app^tmax and u_db^pred > u_db^tmax) then
        Accept only new Ordering and Shopping sessions in the proportion u_web^pred : u_app^pred
    else if (u_web^pred > u_web^tmax and u_app^pred > u_app^tmax and u_db^pred < u_db^tmax) then
        Accept only new Browsing sessions
    else if (u_web^pred > u_web^tmax and u_app^pred < u_app^tmax and u_db^pred > u_db^tmax) then
        Accept only new Shopping sessions
    else if (u_web^pred < u_web^tmax and u_app^pred > u_app^tmax and u_db^pred > u_db^tmax) then
        Accept only new Ordering sessions
    end if
until (ALL SESSIONS PROCESSED)
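As an illustration of the proportional acceptance rule in Algorithm 2, the Java sketch below decides whether to accept an arriving session of a given type based on the predicted tier utilizations. It interprets the proportional rule probabilistically and maps each session type to the tier it stresses most (Ordering to web, Shopping to application, Browsing to database), following the workload characterization above; the class, method and parameter names are assumptions, not part of the simulator.

import java.util.Random;

// Sketch of MBAC's proportional session-mix acceptance (assumed names).
public class MbacMixSelector {
    private final Random rng = new Random();

    /**
     * Decide whether to accept a new session whose dominant tier is sessionTier
     * (0 = web for Ordering, 1 = application for Shopping, 2 = database for Browsing).
     */
    public boolean accept(int sessionTier, double[] predicted, double[] tmin, double[] tmax) {
        boolean allBelowMin = true, allAboveMax = true;
        for (int t = 0; t < predicted.length; t++) {
            if (predicted[t] >= tmin[t]) allBelowMin = false;
            if (predicted[t] <= tmax[t]) allAboveMax = false;
        }
        if (allBelowMin) return true;                                  // lightly loaded: accept all
        if (allAboveMax) return false;                                 // fully overloaded: reject all
        if (predicted[sessionTier] > tmax[sessionTier]) return false;  // its own tier is saturated

        // Accept in proportion to the predicted utilizations of the non-saturated tiers.
        double sum = 0.0;
        for (int t = 0; t < predicted.length; t++) {
            if (predicted[t] <= tmax[t]) sum += predicted[t];
        }
        double share = predicted[sessionTier] / sum;
        return rng.nextDouble() < share;
    }
}

When two tiers are saturated, sum reduces to the single remaining tier's utilization and share becomes 1.0, which matches the last three cases of Algorithm 2 where only one session type is admitted.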

Figure 3.1: CoSAC architecture.

3.3 Coordinated Session Based Admission Control : A Statistical Learning Approach

The Blackbox and the MBAC admission control approaches discussed in the previous section have the following disadvantages :

• The admission control decision is executed at the interval edges. If a decision to reject new sessions is made at the start of the interval, no sessions will be accepted even if some tiers become no longer overloaded during the interval. This may lead to reduced session throughput and resource under-utilization. On the other hand, if the decision is to accept new sessions, sessions will not be rejected even if some tiers become overloaded during the interval. This may lead to resource wastage in the form of aborted sessions.

• The length of the admission control sampling interval is essentially a tradeoff between the decision making overhead and prediction accuracy.

• The admission control decision is completely based on measured values of the utilizations at individual tiers. There is no coordination between the states of the various tiers.

We design a coordinated admission control approach that utilizes a statistical learning technique, the Bayesian network, to overcome these shortcomings. Figure 3.1 illustrates the system architecture. Instead of making the admission control decision at a specific interval, the decision is made each time a new session enters the system. Coordination between the various tiers is achieved by modeling the system as a Bayesian network.

The probability with which a session is accepted or rejected is determined by the probabilistic inference of the network after applying the evidence in terms of utilization and processing time at each tier to the network.

This leads to a coordinated admission decision, by taking into account the state of all three tiers at the time of decision.

3.3.1 Bayesian Networks - Background

A Bayesian network is a high-level representation of a probability distribution over a set of variables that represent a problem domain model. It is a probabilistic graphical model that represents a set of variables as nodes and their conditional probabilistic dependencies as arcs between them, in a parent-child hierarchy. The quantitative relationships between the parent and child nodes are captured by conditional probability tables (CPT). The CPT of each child node captures a collection of probability distributions over the child node, one for each different parental configuration, thus quantifying the parent-child dependency. The network thus becomes a complete model for the variables and their relationships and can be used to answer probabilistic queries about them. The network can be used to find updated knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference.

Bayesian networks offer the following benefits.

• They provide a compact representation of complex problem domains. Since Bayesian networks are models of the problem domain probability distribution, they can be used for computing the predictive distribution on the outcomes of possible actions.

• The models have been found to be very robust in the sense that small alterations in the model do not affect the performance of the system dramatically. As such, maintaining and updating existing models is easy since the functioning of the system changes smoothly as the model is being modified.

• Bayesian modeling allows for combining expert knowledge with statistical data in a very practical way. Expert domain knowledge can be coded as prior distributions, that is, probability distributions that can be defined independently of processing any sample data. Moreover, all the parameters in Bayesian networks have an understandable semantic interpretation, so they can be constructed directly by using domain expert knowledge, without a time-consuming learning process.

• Probabilistic models can handle several different types of variables at the same time, whereas many alternative model technologies are designed for some single specific type of variable (continuous, discrete, etc.).

Figure 3.2: Bayesian network model of a multi-tier Internet service.

3.3.2 Multi-tier Internet Service : A Bayesian Network Model

We develop a Bayesian network representation of the multi-tier Internet service as illustrated in Figure 3.2.

Each oval is a Bayesian network node that represents a parameter of the Internet service. For each oval, the top portion specifies the name and the bottom portion shows the mutually exclusive and exhaustive valid states of the Bayesian network node.

Each tier of a multi-tier Internet service has a utilization parameter and a processing time parameter, which leads to a more comprehensive representation of the workload states of the tier. For example, the web tier state is represented by a combination of network nodes, WebTierUtilization and WebTierProcessingTime. The utilization measured at the web tier during multi-tier service operation is represented by WebTierUtilization, which can be in one of three states: Below Threshold (BT), Normal Range (NR) or Above Threshold (AT). Similarly, the processing time parameter measured at the web tier is represented by the network node WebTierProcessingTime, which can be in the BT, NR or AT states.

There are seven top level nodes in the Bayesian network, i.e., WebTierUtilization, WebTierProcessingTime, AppTierUtilization, AppTierProcessingTime, DbTierUtilization, DBTierProcessingTime and SessionType. Determining the state of the network nodes is an essential part of the probabilistic inference of a Bayesian network. The state of each tier-specific utilization and processing time node is determined programmatically. For top level nodes, the state is based on the configurable threshold values and the observed utilization and processing times at each tier during Internet service operation. For example, if u_web < u_web^tmin, the WebTierUtilization node is in the ‘BT’ state.

The session type node represents the traffic mix (Browsing, Shopping or Ordering) of the incoming user session. Its state is determined by the type of the incoming session.

3.3.3 Conditional Probability Tables (CPTs) and Bayesian Network Training

In addition to the seven top level nodes, the Bayesian network model of the multi-tier Internet service consists of five other nodes, WebTierState, AppTierState, DBTierState, SessionsToAccept and AdmitSession. Conditional Probability Tables associated with these nodes play an important role in determining their state. A CPT is defined based on the node's inputs and output. The state of a node is determined by processing the applied inputs according to the probabilities defined in its CPT.

We use a simple and efficient offline training of the Bayesian network in order to determine the CPTs.

The training process involves conducting experiments with several variations of CPTs for different values of the threshold utilizations. The workload used for training consists of an equal number of Browsing, Shopping and Ordering sessions. The system performance with various CPTs, with respect to the system throughput and the number of aborted sessions, is analyzed. The CPTs that consistently resulted in high throughput and few aborted sessions are adopted.

Table 3.2: CPT for WebTierState Node

WTProcessingTime | WTUtilization | Normal | Underloaded | Overloaded
BT | BT | 0 | 100 | 0
BT | NR | 50 | 50 | 0
BT | AT | 0 | 50 | 50
NR | BT | 50 | 50 | 0
NR | NR | 100 | 0 | 0
NR | AT | 50 | 0 | 50
AT | BT | 0 | 50 | 50
AT | NR | 50 | 0 | 50
AT | AT | 0 | 0 | 100

One such CPT determined by the training process is provided in Table 3.2. The table represents the CPT of the WebTierState node. The first two columns are the inputs to this node and the third, fourth and fifth columns show the probabilities of the node being in the “Normal”, “Underloaded” and “Overloaded” states. The CPT shows that if the WTProcessingTime and WTUtilization nodes are NR and BT, respectively, the probability of WebTierState being in either the “Normal” or “Underloaded” state is 0.5. Similar CPTs are determined for the five non top-level nodes of the Bayesian network.
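To make the CPT mechanics concrete, the Java sketch below computes the belief over the WebTierState node from the beliefs of its two parents using the probabilities of Table 3.2. This is a minimal hand-rolled illustration rather than the Netica-based implementation used in this work; the array layout and state ordering (BT, NR, AT for the parents; Normal, Underloaded, Overloaded for the child) are assumptions.

// Illustrative computation of a child node's belief from parent beliefs and its CPT.
public class WebTierStateBelief {
    // CPT rows indexed by (processingTime, utilization), each in order BT=0, NR=1, AT=2.
    // Columns: probability of Normal, Underloaded, Overloaded (Table 3.2 values, in percent).
    private static final double[][][] CPT = {
        { {0, 100, 0}, {50, 50, 0}, {0, 50, 50} },   // processing time BT
        { {50, 50, 0}, {100, 0, 0}, {50, 0, 50} },   // processing time NR
        { {0, 50, 50}, {50, 0, 50}, {0, 0, 100} }    // processing time AT
    };

    /** Marginalize over the parent beliefs to obtain P(Normal), P(Underloaded), P(Overloaded). */
    public static double[] belief(double[] pTimeBelief, double[] utilBelief) {
        double[] out = new double[3];
        for (int pt = 0; pt < 3; pt++) {
            for (int u = 0; u < 3; u++) {
                double weight = pTimeBelief[pt] * utilBelief[u];
                for (int s = 0; s < 3; s++) {
                    out[s] += weight * CPT[pt][u][s] / 100.0;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Example: processing time observed as NR, utilization observed as BT.
        double[] result = belief(new double[]{0, 1, 0}, new double[]{1, 0, 0});
        System.out.printf("Normal=%.2f Underloaded=%.2f Overloaded=%.2f%n",
                result[0], result[1], result[2]);   // prints 0.50 0.50 0.00, matching the text
    }
}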

3.3.4 CoSAC : Operation

When a new session arrives, the utilization and processing time parameters of each tier are measured and applied as evidence to the appropriate nodes of the multi-tier Internet service Bayesian network model. The measured value of the web tier utilization is applied as evidence for the WebTierUtilization node, the measured application tier utilization as evidence for the AppTierUtilization node, and so on. Additional evidence applied is the traffic mix of the incoming session. An example of such evidence applied to the network is shown in Table 3.3. The numerical values shown in Figure 3.2 are the probabilities of each node being in the corresponding state when this specific evidence is applied to the network.

Table 3.3: Evidences applied to the Bayesian network.

Multi-tier System Parameter Value

Utilization of the web tier 83%

Processing time of the web tier 85 milliseconds

Utilization of the application tier 48%

Processing time of the application tier 27 milliseconds

Utilization of the database tier 95%

Processing time of the database tier 89 milliseconds

Incoming session type Shopping

The evidences applied to the top-level nodes are propagated through the Bayesian network, which executes the probabilistic inference taking into account the coordinated states of the entire multi-tier system. Based on the evidence applied to the top-level network nodes, the WebTierUtilization node state is inferred as AT with 100% probability. Similarly, the states of the other nodes and their probabilities are determined by the probabilistic inference. Because of the probabilistic dependencies defined by the CPTs of the WebTierState and DBTierState nodes, both these nodes now have the Overloaded state with 100% probability. Similarly, the AppTierState node has two states, i.e., Underloaded and Normal, each with 50% probability. This leads to the state of Shopping for the node SessionsToAccept. The type of the incoming session is applied as evidence to the node SessionType, which gives the node the state of Shopping. The inference process finishes as the state of the AdmitSession node is inferred as Admit with 100% probability. This results in the incoming session of Shopping type being accepted. If the incoming session were of any other type, it would be rejected.

We note that in this case the probability of admitting the session is 100%, but this may not be the case for other values of the evidence. As long as the probability of the AdmitSession node being in the Admit state is above a threshold value (say 60%), the incoming session will be accepted.
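For illustration, the following Java sketch shows how the probabilistic dependency encoded in Table 3.2 can be looked up and how a threshold-based admit decision of the kind described above might be expressed; the class and method names are illustrative assumptions and are not part of the CoSAC implementation or the Netica-J API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a CPT lookup and a threshold-based admit decision.
// State names and the 60% threshold follow the text; class/method names are hypothetical.
public class CptAdmitSketch {

    // CPT for WebTierState from Table 3.2: key = "<WTProcessingTime>,<WTUtilization>",
    // value = probabilities of {Normal, Underloaded, Overloaded}.
    static final Map<String, double[]> WEB_TIER_CPT = new HashMap<>();
    static {
        WEB_TIER_CPT.put("BT,BT", new double[]{0.0, 1.0, 0.0});
        WEB_TIER_CPT.put("BT,NR", new double[]{0.5, 0.5, 0.0});
        WEB_TIER_CPT.put("BT,AT", new double[]{0.0, 0.5, 0.5});
        WEB_TIER_CPT.put("NR,BT", new double[]{0.5, 0.5, 0.0});
        WEB_TIER_CPT.put("NR,NR", new double[]{1.0, 0.0, 0.0});
        WEB_TIER_CPT.put("NR,AT", new double[]{0.5, 0.0, 0.5});
        WEB_TIER_CPT.put("AT,BT", new double[]{0.0, 0.5, 0.5});
        WEB_TIER_CPT.put("AT,NR", new double[]{0.5, 0.0, 0.5});
        WEB_TIER_CPT.put("AT,AT", new double[]{0.0, 0.0, 1.0});
    }

    // Distribution over {Normal, Underloaded, Overloaded} given the two parent states.
    static double[] webTierStateDistribution(String processingTime, String utilization) {
        return WEB_TIER_CPT.get(processingTime + "," + utilization);
    }

    // Admission rule from the text: admit if the Admit probability exceeds a threshold (e.g., 60%).
    static boolean admit(double probabilityOfAdmit, double threshold) {
        return probabilityOfAdmit >= threshold;
    }

    public static void main(String[] args) {
        // Example from the text: WTProcessingTime = NR, WTUtilization = BT.
        double[] dist = webTierStateDistribution("NR", "BT");
        System.out.printf("Normal=%.2f Underloaded=%.2f Overloaded=%.2f%n",
                dist[0], dist[1], dist[2]);
        System.out.println("Admit? " + admit(1.0, 0.60));
    }
}
```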

Table 3.4: TPC-W transactions

Browsing Type       Ordering Type
Home                Shopping Cart
New Products        Customer Registration
Best Sellers        Buy Request
Product Detail      Buy Confirm
Search Request      Order Inquiry
Execute Search      Order Display
                    Admin Request
                    Admin Confirm

3.4 Performance Evaluation

3.4.1 Experimental Setup

3.4.1.1 Simulator

To evaluate the session-based admission control strategies, we developed a Java-based multi-threaded simulation model of a three-tier e-commerce site. It consists of a customer generator, a session and request generator, multiple web servers, application servers and database servers. Because the simulator abstracts away implementation details, it can evaluate the performance of the admission control strategies without being affected by implementation methods.

3.4.1.2 TPC-W Workloads

The session-based workload processed by the e-commerce simulator is generated according to the guidelines provided by the TPC-W benchmark specification. The TPC-W benchmark [98] is an industry-standard transactional web benchmark. TPC-W defines 14 different transactions. These transactions can be roughly classified as Browsing or Ordering type, as shown in Table 3.4.

TPC-W defines three standard traffic mixes, Browsing, Shopping and Ordering, based on the weight of each transaction type in the particular traffic mix, as shown in Table 3.5.

Table 3.5: Request compositions in TPC-W.

                    Browsing mix   Shopping mix   Ordering mix
Browsing requests   95%            80%            50%
Ordering requests    5%            20%            50%

Each workload mix is characterized by probability-based navigational patterns. A session is created as a sequence of interactions for the same customer. For each session of a specific mix, the next interaction is determined by a state transition matrix that specifies the probability of moving from one interaction to another. Typically, a user session starts with a Home transaction request. The session time and the think time between interactions are generated by exponential distributions with given means [29].
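As a rough illustration of this workload generation process, the Java sketch below samples successive interactions from a state transition matrix and draws exponentially distributed think times; the interaction subset and transition probabilities shown are placeholders rather than the actual TPC-W matrices.

```java
import java.util.Random;

// Sketch of probability-based session generation: the next interaction is sampled from a
// state transition matrix, and think times are exponential. Matrix values are placeholders.
public class SessionGeneratorSketch {
    static final String[] INTERACTIONS = {"Home", "Search Request", "Product Detail", "Shopping Cart"};

    // transition[i][j] = probability of moving from interaction i to interaction j (rows sum to 1).
    static final double[][] TRANSITION = {
        {0.10, 0.50, 0.30, 0.10},
        {0.05, 0.20, 0.60, 0.15},
        {0.10, 0.30, 0.30, 0.30},
        {0.40, 0.20, 0.20, 0.20},
    };

    static final Random RNG = new Random();

    // Sample the next interaction index from the row of the current interaction.
    static int nextInteraction(int current) {
        double u = RNG.nextDouble(), cumulative = 0.0;
        for (int j = 0; j < TRANSITION[current].length; j++) {
            cumulative += TRANSITION[current][j];
            if (u <= cumulative) return j;
        }
        return TRANSITION[current].length - 1;
    }

    // Exponentially distributed think time with the given mean (inverse-CDF sampling).
    static double thinkTime(double meanSeconds) {
        return -meanSeconds * Math.log(1.0 - RNG.nextDouble());
    }

    public static void main(String[] args) {
        int state = 0; // sessions typically start with a Home request
        for (int k = 0; k < 5; k++) {
            System.out.printf("%s (think %.2fs)%n", INTERACTIONS[state], thinkTime(7.0));
            state = nextInteraction(state);
        }
    }
}
```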

The resource demand and the processing time of each interaction are derived from the WIRT (Web Interaction Response Time), based on the observation that different mixes pose varying load on the tiers. For example, for the requests of Browsing sessions, the database tier is the most intensive and the web tier is the least intensive. As in [88, 89], we assume that the Browsing mix is database tier intensive, the Shopping mix is application tier intensive and the Ordering mix is web tier intensive.

3.4.1.3 Modeling Bayesian Network

The Bayesian network representation of the multi-tier Internet service is modeled with the Netica software [32]. Netica is a powerful and easy-to-use tool for working with belief networks and influence diagrams. It uses the fastest known algorithm for exact general probabilistic inference in a compiled Bayesian network, known as “message passing in a junction tree of cliques”. Netica-J is the Java API that can be used in conjunction with the Netica software.

3.4.2 Impact of Multi-tier Architecture on Session based Admission Control

We start our performance evaluation by comparing the first two admission control strategies proposed, the

Blackbox and MBAC approaches. The purpose of this experiment is to demonstrate the impact of a multi-tier architecture on the effectiveness of session-based admission control. The workload for the experiment consists of an equal number of Browsing, Shopping and Ordering sessions.

[Figure 3.3: Predicted utilizations of a three-tier website. (a) Predicted utilizations with Blackbox. (b) Predicted utilizations with MBAC.]

We capture the predicted utilizations for the web, application and database tiers at the interval edges.

Figure 3.3 shows the tier utilizations due to the Blackbox approach and the MBAC approach, captured for a period of 10 intervals during the experiment.

At each interval, the admission control decision is made by comparing the predicted utilizations to pre-configured threshold utilization values. For this experiment, the threshold utilization values for the web, application and database tiers are set at 80%. The admission control decision determines the type of the sessions that can be accepted for the next interval. For the Blackbox approach, if the predicted utilization of any of the tiers exceeds 80%, no new sessions will be accepted in the next interval. For the MBAC approach, if the predicted utilizations of all of the tiers exceed 80%, no new sessions will be accepted in the next interval; otherwise, sessions of different traffic mixes will be accepted based on the predicted utilizations of the different tiers. Table 3.6 gives the session types that were accepted for the 10 intervals due to the Blackbox and MBAC approaches.
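The two decision rules can be summarized by the following sketch, which assumes the mix-to-tier intensiveness mapping stated in Section 3.4.1.2 (Ordering is web tier intensive, Shopping is application tier intensive, Browsing is database tier intensive); the method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the interval-edge admission decisions described in the text.
// The 80% threshold and the mix-to-tier mapping follow the experiment; names are illustrative.
public class IntervalAdmissionSketch {
    static final double THRESHOLD = 0.80; // threshold utilization for every tier

    // Blackbox: accept new sessions only if NO tier's predicted utilization exceeds the threshold.
    static boolean blackboxAcceptAny(double web, double app, double db) {
        return web <= THRESHOLD && app <= THRESHOLD && db <= THRESHOLD;
    }

    // MBAC: reject everything only if ALL tiers exceed the threshold; otherwise accept
    // the session types whose demand-intensive tier is still below the threshold.
    static List<String> mbacAcceptedTypes(double web, double app, double db) {
        List<String> accepted = new ArrayList<>();
        if (web > THRESHOLD && app > THRESHOLD && db > THRESHOLD) return accepted; // none
        if (web <= THRESHOLD) accepted.add("Ordering");   // Ordering mix is web tier intensive
        if (app <= THRESHOLD) accepted.add("Shopping");   // Shopping mix is application tier intensive
        if (db <= THRESHOLD) accepted.add("Browsing");    // Browsing mix is database tier intensive
        return accepted;
    }

    public static void main(String[] args) {
        // Interval 3 in Table 3.6: only the web tier is below the threshold.
        double web = 0.70, app = 0.85, db = 0.90;
        System.out.println("Blackbox accepts any: " + blackboxAcceptAny(web, app, db)); // false
        System.out.println("MBAC accepts: " + mbacAcceptedTypes(web, app, db));         // [Ordering]
    }
}
```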

Results clearly show that by the time the admission control decision is made at interval 1, the predicted utilizations of all the tiers exceed the threshold values. Thus, in both strategies, the admission control decision is to reject new sessions. At the edge of interval 2, the predicted utilizations of all the tiers fall below the threshold value, so new sessions of all traffic types will be accepted in the interval. So far the behaviors of the Blackbox and MBAC strategies are similar. However, at interval 3, only the web tier's predicted utilization is below the threshold.

Table 3.6: Accepted sessions by Blackbox and MBAC.

Interval By Blackbox By MBAC

1 None None

2 Any Any

3 None Ordering

4 None Ordering

5 Any Shopping

6 Any Any

7 None Browsing

8 Any Any

9 None Browsing

10 None Any

As a result, the Blackbox approach rejects all new sessions, while the

MBAC approach accepts the Ordering sessions. No Shopping and Browsing sessions will be accepted by

MBAC approach as the utilizations of the application tier and database tier are above the threshold. A similar observation can be made for intervals 4, 7 and 9: the MBAC approach accepts sessions of certain types while the Blackbox approach rejects all new sessions in those intervals. Compared to the Blackbox approach, the MBAC approach is able to accept more sessions, leading to a more balanced utilization of the multiple tiers of the website.

3.4.3 Why not accept Ordering sessions only

One may argue that under heavy load conditions only Ordering sessions should be accepted, since they are more likely than other sessions to result in economic benefits for the e-commerce site. We executed experiments to support our counter-intuitive argument that accepting only Ordering sessions under a heavy load condition might not be the right way to utilize the resources in a multi-tier website.

For this experiment, we consider two different workloads. The first workload consists of just Ordering sessions and the second workload consists of an equal number of Browsing, Shopping and Ordering sessions.

[Figure 3.4: Throughput comparison with different traffic mixes. (a) Accepted sessions. (b) Rejected sessions. (c) Acceptance improvement. (d) Completed sessions. (e) Aborted sessions. (f) Completion improvement.]

We apply the workloads to the multi-tier service with MBAC admission control and compare the results. We represent the workload as the number of arriving sessions per second instead of a utilization percentage. This is due to the fact that there is no single utilization percentage in a multi-tier architecture and the bottleneck tier shifts from time to time.

Figures 3.4(a)(b)(d)(e) show the comparison of the number of accepted sessions, rejected sessions, completed sessions, and aborted sessions, respectively. The results show that the workload consisting of different session types leads to more sessions being accepted, which results in a significant increase of the throughput.

Figures 3.4(c)(f) depict the percentage improvement in the accepted sessions and completed sessions, respectively. As many as 50% more sessions are accepted and completed when the workload consists of a mix of different session types. The rationale is that accepting just one type of session will overload one of the tiers sooner, leading to system imbalance. On the other hand, accepting a mixture of different sessions places a more balanced resource demand across all three tiers. Thus the effective session throughput is increased significantly.

As shown in Figure 3.4(e), the number of aborted sessions with the second workload is slightly higher.

This translates to a small increase in the wastage of the multi-tier service resources. We note that the increased resource wastage is negligible when compared to the throughput improvement gained.

3.4.4 Impact of CoSAC on Throughput

Next, we evaluate the impact of the Bayesian network model based CoSAC on the session throughput and compare it with the Blackbox and MBAC approaches. For this experiment, we consider several workloads consisting of an equal number of Browsing, Shopping and Ordering sessions. Each workload consists of sessions arriving at a different arrival rate: 10 sessions/sec, 20 sessions/sec, ···, 100 sessions/sec.

Figures 3.5(a)(b)(d)(e) show the comparison of the accepted, rejected, completed, and aborted sessions, respectively. Results demonstrate that CoSAC is able to accept and complete significantly more sessions than the MBAC and Blackbox approaches. When the overall session arrival rate is greater than 80 sessions/sec, the saturation point of the website is reached. Using the results of the Blackbox approach as the baseline, Figure 3.5(c) shows that CoSAC is able to accept as many as 45% more sessions and MBAC is able to accept about 20% more sessions. Figure 3.5(f) shows that CoSAC is able to complete as many as 50% more sessions and MBAC is able to complete as many as 22% more sessions. This demonstrates the significance of multi-tier admission control approaches. In particular, the admission coordination between multiple tiers via the Bayesian network representation leads to significant session throughput improvement.

[Figure 3.5: Impact of admission control strategies on session throughput (with 1:1:1 workload mix). (a) Accepted sessions. (b) Rejected sessions. (c) Acceptance improvement. (d) Completed sessions. (e) Aborted sessions. (f) Completion improvement.]

[Figure 3.6: Impact of admission control strategies on session throughput (with 3:2:1 workload mix). (a) Accepted sessions. (b) Rejected sessions. (c) Acceptance improvement. (d) Completed sessions. (e) Aborted sessions. (f) Completion improvement.]

Next we repeat our experiment with workloads consisting of an unequal number of Browsing, Shopping and Ordering sessions. Once again each workload consists of sessions arriving at a different arrival rate: 10 sessions/sec, 20 sessions/sec, ···, 100 sessions/sec. Figure 3.6 shows the results due to the use of a workload consisting of a 3:2:1 ratio of Browsing, Shopping and Ordering sessions. The plots have similar shapes as those in Figure 3.5, but the overall session throughput is slightly lower. We believe this is because with workloads consisting of an equal number of sessions of different types, the load is more uniformly distributed to the different tiers of the system. With sessions of different types in unequal ratios, some tiers are more loaded than the others, leading to more rejected and aborted sessions. Nevertheless, CoSAC outperforms the other two approaches significantly with respect to accepted sessions, leading to an improved effective session throughput measured as the number of completed sessions.

The previous two experiments use several workloads that contain sessions arriving at a constant session arrival rate. Next, we study the performance of the admission control strategies under a dynamic workload. The dynamically increasing session arrival rate of the workload is depicted in Figure 3.7(a). The overall session arrival rate changes from 10 sessions/sec during the first 10 seconds to 50 sessions/sec in the last 10 seconds.

Figures 3.7(b)(d)(e) show the comparison of the number of accepted, aborted, and completed sessions for the different admission control strategies. Figures 3.7(c)(f) show the percentage improvement of the CoSAC and

MBAC strategies over the Blackbox. CoSAC is able to complete as many as 80% more sessions and MBAC is able to complete as many as 40% more sessions. This demonstrates the superiority of the CoSAC with dynamic workloads.

3.4.5 Choosing the Admission Control Interval

The length of the admission control interval is a tradeoff between admission control overhead and resource utilization efficiency. On one hand, executing the admission control decision requires system resources, and running it too frequently imposes unnecessary load on the system. On the other hand, executing the admission control decision too infrequently may result in under-utilization of the resources in some situations. For instance, if at the beginning of an interval a decision is made not to admit new sessions, a long interval may result in subsequent new sessions being rejected even when one or more tiers return to their normal state during the interval.

[Figure 3.7: Impact of admission control strategies on session throughput (with dynamic workload). (a) Dynamic session arrivals. (b) Accepted sessions. (c) Acceptance improvement. (d) Aborted sessions. (e) Completed sessions. (f) Completion improvement.]

We conducted experiments to observe the effect of different interval length values. Based on the experimental results we used a fixed interval length of 3 sec for all our evaluations. It provided a reasonable tradeoff between the two factors. We note that the interval length is an important factor that has a direct effect on the performance of the admission control strategies. Setting it to a static value is a simple approach. Dynamic interval length in the context of admission control deserves further study.

3.5 Summary And Discussion

We developed two novel session-based admission control approaches for multi-tier Internet services, MBAC and CoSAC. Both recognize the fact that the bottleneck in a multi-tier website shifts among tiers as client access patterns change dynamically. Blackbox admission control approaches based on the bottleneck tier are thus not effective, as different client sessions impose different resource consumptions at the different tiers.

The MBAC approach is based on the dynamic measurement of utilizations at the different tiers, and it makes admission decisions proactively based on the session mix characteristics. It demonstrates that the prevalent session-based admission control approaches that are effective in single-tier Web systems are no longer effective in multi-tier environments. CoSAC novelly uses a statistical learning technique, a Bayesian network, to coordinate the admission decision based on the utilization states of multiple tiers.

The admission control strategies developed in this chapter do not impose any functionality changes on the multi-tier Internet services. Services that are already deployed can therefore be enhanced with an admission controller component. Administrators can externally monitor the tier-specific metrics and feed them to the admission controller. Developers of a new multi-tier service can also choose to provide an integrated solution by including the monitoring and reporting component within the service. These techniques achieve effective session policing without imposing additional latency on the user sessions. The overhead of using Bayesian networks as in the CoSAC strategy is negligible, since only a three-level network is used and the CPTs were already determined during the training process. Bayesian networks add another advantage of being flexible and robust to changes, so that the multi-tier Internet service model can be easily extended for management strategies beyond admission control.

Extensive simulation results based on TPC-W benchmark workloads demonstrate the superior performance of the new approaches, particularly CoSAC, which employs a Bayesian network model of the multi-tier Internet service, in achieving improved effective session throughput. CoSAC can improve the effective session throughput by about 50% compared to the black-box approach in most scenarios, while

MBAC can improve the effective session throughput by about 20%. While these admission control strategies focused on an e-commerce Internet service, they can be generalized to other types of applications that employ a multi-tier architecture.

The two admission control strategies, however, are dependent on knowledge of the traffic type of the incoming sessions. User profiling tracks user behavior based on registration and previous shopping patterns. User profiles assist in determining the type of the user session when the service is accessed [75, 128].

While profiling can be used to probabilistically determine the session type for the purpose of admission control, Internet service management should preferably be independent of such dependencies. Additionally, the success of the admission control strategy depends on the accuracy of the offline training conducted. This is a typical drawback of supervised statistical learning techniques. The mandatory lengthy training process is a big impediment to the adoption of statistical learning approaches in managing real-life dynamic systems.

We address these two limitations and explore a novel reinforcement learning based admission control strategy independent of the user's session type in Chapter 5.

Chapter 4

Dynamic Server Provisioning for Multi-tier Internet Service Performance Guarantees

4.1 Introduction

Dynamic resource provisioning is critical for quality-of-service provisioning in multi-tier Internet services.

It enables on-the-fly resource management of a multi-tier Internet service to effectively handle dynamic and bursty workloads. Ideally the service will be assigned a necessary and sufficient amount of resources to handle its current load [100]. As the observed performance of the service deteriorates, additional resources should be dynamically provisioned to the service. Conversely, as the resources become under-utilized, they ought to be removed from the Internet service.

Dynamic resource management techniques have been proposed for request based performance guarantees in multi-tier servers [34, 57, 104, 120]. However, there is a lack of techniques that provide session based performance guarantees. Response time and queueing delay are the major absolute performance metrics in evaluating request based system responsiveness and service quality. However, they are not only unsuitable

for comparing requests that have very different resource demands, but also not applicable for session based workloads because of the dynamic session length. Session length, which is the number of requests in a session, is dynamic and unknown at the time of session origination. Because of the unpredictability of the session length, one cannot guarantee the absolute response time of a session. A relative metric independent of session length is favorable and mandatory for performance guarantee of session based Internet services [128].

Slowdown is the relative ratio of a request's queueing delay to its actual processing time [47, 48]. It is known that clients are more likely to anticipate short delays for “small” requests like browsing, and more willing to tolerate long delays for “large” requests like search [128]. Using slowdown as the QoS metric facilitates meeting that anticipation. Because the slowdown metric directly translates to user-perceived relative performance and system load, it has been accepted as an important performance metric on servers [47, 48, 94, 125, 127, 128].

We promote the use of session slowdown for performance guarantee in multi-tier servers. Session slowdown is the relative ratio of the total queueing delay of the requests of one session to the total processing time of the requests. It is compelling for session-based performance measurement because it is user-perceived service quality at the session level and it is independent of the session length [128]. However, providing session slowdown guarantees on multi-tier servers, while important, is also challenging.

There are queueing analytical models for request slowdown differentiation on single-tier Internet servers [125, 127, 128]. Zhou et al. derived a closed-form expression of the expected request slowdown in an M/G/1 queue with a bounded Pareto service time distribution [127]. However, the extension of the analytical model to the multi-tier architecture is greatly challenging, if not impossible, due to the inter-tier dependencies, per-tier replication and caching policies, and concurrency limits [34]. An extended model, even if feasible, may not accurately capture the dynamics of session-based workloads on multi-tier servers, resulting in performance guarantee violations. Additionally, in a multi-tier Internet service, the resource demand posed by user sessions on different tiers is dynamic in nature, leading to the bottleneck tier shift issue [19, 78, 80].

In this chapter, we use statistical learning to model the multi-tier Internet service dynamics and provide session slowdown guarantees by dynamically provisioning servers (virtual machines) at multiple tiers. Statistical regression models that effectively capture the dynamic behavior of a multi-tier Internet service under dynamic and complex session-based workloads are learned offline. The learned regression models are used by a novel dynamic server provisioning strategy to determine the servers required at each tier to satisfy the workloads. Two distinct statistical regression models are utilized to control the upper and lower bounds of resources allocated to the multi-tier Internet service. We consider resource provisioning at the granularity of servers/virtual machines.

We evaluate the effectiveness and efficiency of the proposed approach using the industry-standard TPC-W benchmark workloads in a typical 3-tier e-commerce environment. Simulation results demonstrate that the approach adapts to workload variations when it is subjected to a workload different from the training workloads. It achieves session slowdown guarantees for various dynamic workloads while the service resources are efficiently utilized.

4.2 Statistical Regression Analysis : Background

Statistical regression provides a framework to gain knowledge of a computer system, construct system models and make predictions based on the models. The problem domain is represented by a set of dependent and independent variables. The observed values of the variables are plotted to identify the general data trend without having to necessarily match the individual data points. The general trend determines the specific regression analysis, such as linear or exponential, to be performed on its mathematical representation. Regression analysis results in a quantitative model representation of the relation between the two sets of variables.

The quality of the regression model is quantified by statistical measures. One popular technique is to use the correlation coefficient of the model to quantify the “goodness” of the observed data fit to the model.

The correlation coefficient is a statistical measure of the interdependence of two or more random variables and its values vary between -1 and +1. A correlation coefficient of +1 reflects a perfect fit with a positive slope between variables, -1 reflects a perfect fit with a negative slope, and 0 indicates that the variables are independent of each other. Regression analysis models with qualified fitness can further be used to predict the dependent variable value from one or more measured independent variables.

We use regression analysis as a modeling tool to capture the multi-tier Internet service behavior. The multi-tiered environment is complex due to inter-tier dependencies, per-tier replication and caching constraints, and bottleneck tier shift. It is further complicated by highly dynamic and bursty Internet workloads. It is extremely difficult, if not impossible, to derive a concrete model of a multi-tier system that can effectively capture the complete system dynamics. Regression analysis as a modeling tool provides an attractive alternative solution. Instead of capturing the complete system dynamics, regression analysis captures the patterns and trends of a multi-tier Internet service behavior as simple quantitative models. Regression models can effectively capture Internet service parameter relationships that reflect its behavior when subjected to dynamic workloads.

Two important performance metrics of interest are : session slowdown and resource utilization efficiency.

Regression analysis is used to capture the session slowdown behavior patterns with respect to the Internet service capacity when subjected to dynamic resource demands. We use the statistical regression model of the

“allocated resources - session slowdown” relationship to make accurate resource allocation predictions based on observed value of session slowdown to provide session slowdown guarantees. Similarly, we use regression analysis to quantitatively capture the “allocated resources - resource utilization” relation as a regression model that can be used for resource removal predictions to assure efficient utilization of the resources allocated to service. We thus use two distinct statistical regression models to control the upper and lower bounds of the resources allocated to the multi-tier Internet service.

4.3 Terms and Definitions

We now present several important terms that are integral to our statistical regression based dynamic server provisioning strategy.

4.3.1 Session Slowdown

A typical e-commerce application consists of three tiers: a front-end Web tier that is responsible for HTTP request processing, a middle application tier that implements core application functionality, say based on the Java

Enterprise platform, and a backend database that stores product catalogs and user orders. Figure 4.1 illustrates the architecture of a three-tier e-commerce server cluster.

A session is a sequence of individual requests of different types made by a customer during a single visit to a web site.

[Figure 4.1: A typical three-tier Internet service architecture: clients, a Web tier, an Application tier and a Database tier (clustered or not clustered), with per-tier delays and processing times (d_i, p_i).]

In this context, an incoming user request undergoes HTTP processing, application server processing, and triggers queries or transactions at the database. In an n-tier architecture, let d_{ij} and p_{ij} denote the queueing delay and processing time of a request j of a session at tier i, respectively. The session slowdown is defined as the relative ratio of the total queueing delay of the session requests to the total processing time of the session requests. The rationale is that the session-oriented performance metric should describe the perceived performance at the session level, not at the individual request level. As in [57, 58, 104, 106], this work assumes that one request at one tier of the multi-tier system does not spawn more than one request to the downstream tier. Thus, a request's total queueing delay is the sum of its queueing delays at the individual tiers and a request's total processing time is the sum of its processing times at the individual tiers. The session slowdown s is calculated as

s = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} d_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}}   (4.1)

where m is the number of requests in the session.

Note that Eq. (4.1) takes into consideration that all requests of a session need not visit all the tiers. For a tier that a request does not visit, d_{ij} = p_{ij} = 0.

4.3.2 Tier Session Slowdown Ratio

Sessions belonging to different mixes visit the tiers a varying number of times with different workloads. Requests of different types impose varying resource demands on the different tiers. To capture the different resource demands at individual tiers, we define the tier session slowdown as the ratio of the total queueing delay of the requests of the session at a tier to the total processing time of the requests at that tier. That is,

s_i = \frac{\sum_{j=1}^{m} d_{ij}}{\sum_{j=1}^{m} p_{ij}}.   (4.2)

A tier session slowdown is affected by the dynamic resource demand on the tier and the resources allocated to the tier. Note that according to the definitions in Eq. (4.1) and Eq. (4.2), the tier session slowdowns at individual tiers do not add up to the session slowdown at the multi-tier service level.

One may argue that session slowdown should be modeled at the tier level to represent the resource demand variations. However, modeling the session slowdown at the tier level is not practical because of the inter-tier dependencies of the multi-tier architecture. The session slowdown at an individual tier is dependent not only on the resources allocated at that tier, but also on the resources allocated at the preceding or succeeding tiers. Those dependencies are dynamic in nature. Even if the session slowdown could be modeled at the tier level, it could only be used to provide a guarantee of tier-level session slowdown instead of the user-perceived multi-tier session slowdown.

The normalized tier session slowdown at a tier i is calculated as

sr_i = \frac{s_i}{s}.   (4.3)

We define the tier session slowdown ratio of an n-tier service as the ratio of the normalized tier session slowdowns at the individual tiers. That is

sr_1 : sr_2 : \cdots : sr_n.   (4.4)

While a tier session slowdown reflects the resource demand at a tier, the ratio of the normalized tier session slowdowns reflects weighted proportional resource demands on the individual tiers of a multi-tier service. We utilize the ratio to distribute provisioned resources to the individual tiers.

4.3.3 Resource Utilization

We use the term “resource” as an abstract representation of a computing entity with fixed capacity that processes session-based workloads. For example, a virtual machine or a server assigned to the multi-tier Internet service is considered a resource [74]. In this chapter, we use the term “resource” to indicate a “virtual server” and use the two terms interchangeably. As in [26, 57, 106], we assume that the resources are homogeneous with the same capacity and can be assigned to any tier. Resource utilization is the percentage of the resource capacity that is utilized to serve the sessions.

4.3.4 A Behavior Model

A behavior model represents the learned behavior of the multi-tier Internet service when subjected to a specific workload. Two important Internet service parameter relations are “allocated resources - session slowdown” and “allocated resources - resource utilization”. A behavior model captures the two parameter relations as quantitative statistical regression models. Along with the regression models, the correlation coefficients that quantify the quality of the models are included in the behavior model.

The workload characteristics, session traffic mix and session arrival rate, impact the quantitative values of the statistical regression models. Different workloads result in unique quantitative representations of the

“allocated resources - session slowdown” and “allocated resources - resource utilization” relations. The workload characteristics are integral to a behavior model that captures these relations and are used to distinguish one behavior model from another. A behavior model also captures the tier session slowdown ratio, which indicates weighted proportional resource demands of the workload on the individual tiers.

4.4 Dynamic Resource Provisioning : A Statistical Regression Approach

Our statistical regression approach combines extensive offline training and online monitoring of a multi-tier Internet service for dynamic server provisioning with session slowdown guarantees. Offline training is conducted to learn and model the multi-tier Internet service behavior dynamics when subjected to dynamic workloads. During the training phase, regression analysis is used to quantitatively model the Internet service parameter behavior patterns. The statistical regression models along with the workload characteristics are encapsulated by a behavior model. An extensive set of behavior models is learned during the training phase using a diverse set of workloads. The learned behavior models collectively represent the multi-tier Internet service behavior when subjected to dynamic workloads.

Table 4.1: Training workload characteristics.

TPC-W Session Mix Session Arrival Rate (per second) Allocated Resources

Browsing 5, 10, 15, ... 100 3, 4, 5, ... 100

Shopping 5, 10, 15, ... 100 3, 4, 5, ... 100

Ordering 5, 10, 15, ... 100 3, 4, 5, ... 100


When the multi-tier Internet service is subjected to real-time workloads, the session slowdown and resource utilization metrics are monitored and compared with predefined thresholds at periodic time intervals.

If a threshold violation is observed, the statistical regression models of a behavior model are used to predict the resource allocation and resource removal requirements of the Internet service. Resources are allocated to or removed from different tiers of the multi-tier service by taking the dynamic resource demands at the individual tiers into consideration.

We next discuss the training and online phases of the statistical regression based resource provisioning in detail.

4.4.1 The Training Phase : Learning Behavior Models

The training phase is used to observe and quantitatively capture the multi-tier Internet service behavior. An extensive set of behavior models is derived using diverse workloads. We generate workloads for the training purpose according to the industry-standard TPC-W benchmark specification. The workload characteristics and the resource variations used during the training phase are summarized in Table 4.1. Each possible combination of the session type, session arrival rate and allocated resources results in a unique behavior model.

We use statistical regression analysis to derive regression model representations of the “allocated resources - session slowdown” and “allocated resources - resource utilization” relations. To demonstrate the statistical regression analysis performed to model the relations, in the following we consider a workload consisting of TPC-W browsing sessions arriving at 20 sessions/second. The workload is applied to the multi-tier Internet service multiple times as the number of allocated virtual servers varies from 3 to 100. Note that at least one virtual server is needed at each tier.

[Figure 4.2: “Allocated resources - session slowdown” and “allocated resources - resource utilization” relations fitted to negative exponential curves. (a) Session slowdown fit. (b) Resource utilization fit.]


4.4.1.1 Regression model of “allocated resources - session slowdown”

Figure 4.2(a) depicts the session slowdown behavior with the number of virtual servers allocated to the multi-tier system. Because there is no significant change in the session slowdown when the number of allocated virtual servers is greater than 25, those results are omitted in Figure 4.2(a). The results reveal a negative exponential growth trend. The negative exponential growth relationship is expressed by Eq. (4.5), where the variables x and y correspond to the number of virtual servers allocated to the multi-tier Internet service and the observed values of the session slowdown, respectively.

y = a_1 e^{-b_1 x}.   (4.5)

As the relationship trend is observed to be exponential in nature, statistical exponential regression analysis is performed on Eq. (4.5). Exponential regression analysis involves linearizing an exponential equation and performing linear regression analysis of the linearized equation. Eq. (4.5) is linearized by taking its natural logarithm. It yields

\ln y = \ln a_1 - b_1 x \ln e.   (4.6)

Performing linear regression analysis on Eq. (4.6) results in the following expressions for the coefficients a_1 and b_1:

\ln a_1 = \frac{\sum \ln y_i + b_1 \sum x_i}{n}   (4.7)

b_1 = \frac{n \sum x_i \ln y_i - \sum x_i \sum \ln y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}   (4.8)

where (x_i, y_i) are the individual data points and n is the total number of data points plotted in Figure 4.2(a).

The numerical values of a_1 and b_1 substituted into Eq. (4.5) result in

y = 65.4833 e^{-0.067x}.   (4.9)

It represents the quantitative exponential regression model of the “allocated resources - session slowdown” relation.
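A minimal sketch of this fitting procedure is given below: it linearizes y = a e^{-bx} by taking logarithms, applies ordinary least squares as in Eqs. (4.7) and (4.8), and reports the correlation coefficient of the linearized fit. The data points are synthetic, and the code is an illustration of the offline analysis rather than the simulator implementation.

```java
// Sketch of fitting y = a * exp(-b * x) by linearization (ln y = ln a - b x) and
// ordinary least squares, as done during the offline training phase. Illustrative only.
public class ExponentialFitSketch {

    // Returns {a, b, r}: the model coefficients and the correlation coefficient of the
    // linearized fit (x vs. ln y). The OLS slope of ln y on x equals -b.
    static double[] fitNegativeExponential(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumLnY = 0, sumXLnY = 0, sumX2 = 0;
        for (int i = 0; i < n; i++) {
            double lnY = Math.log(y[i]);
            sumX += x[i];
            sumLnY += lnY;
            sumXLnY += x[i] * lnY;
            sumX2 += x[i] * x[i];
        }
        double slope = (n * sumXLnY - sumX * sumLnY) / (n * sumX2 - sumX * sumX);
        double b = -slope;                        // y = a e^{-bx}
        double lnA = (sumLnY + b * sumX) / n;     // as in Eq. (4.7)
        double a = Math.exp(lnA);

        // Correlation coefficient of the linearized data, in the spirit of Eq. (4.10).
        // For a decaying relation the slope (and hence r) is negative; its magnitude
        // indicates the quality of the fit.
        double meanX = sumX / n, meanLnY = sumLnY / n;
        double cov = 0, varX = 0, varLnY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = Math.log(y[i]) - meanLnY;
            cov += dx * dy;
            varX += dx * dx;
            varLnY += dy * dy;
        }
        double r = cov / (Math.sqrt(varX) * Math.sqrt(varLnY));
        return new double[]{a, b, r};
    }

    public static void main(String[] args) {
        // Synthetic "allocated servers - session slowdown" observations (not measured data).
        double[] servers  = {3, 5, 8, 10, 15, 20, 25};
        double[] slowdown = {53.5, 46.8, 38.3, 33.5, 23.9, 17.1, 12.2};
        double[] fit = fitNegativeExponential(servers, slowdown);
        System.out.printf("y = %.4f * exp(-%.4f x), r = %.4f%n", fit[0], fit[1], fit[2]);
    }
}
```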

The quality of the regression model is quantified by the correlation coefficient r. The correlation coefficient for the linearized Eq. (4.6) using least square error analysis is

r = \frac{\sum (x_i - \bar{x})(\log y_i - \overline{\log y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (\log y_i - \overline{\log y})^2}}.   (4.10)

(x_i, y_i) are the individual data points plotted in Figure 4.2(a); \bar{x} and \overline{\log y} represent the means of x and \log y, respectively.

The calculated correlation coefficient r for the data plotted in Figure 4.2(a) is 0.9838. It indicates that the negative exponential relation is a high quality fit for the observed session slowdown data.

Figure 4.2(a) also depicts a visual representation of the data compliance to the regression model. It shows the observed session slowdown values relative to the negative exponential curve. The session slowdown values fit the curve very closely, with most data points being on or very close to the curve.

4.4.1.2 Regression model of “resources allocated - resource utilization”

Figure 4.2(b) depicts the resource utilization behavior with the number of virtual servers allocated to the multi-tier system. As there is no significant change in the resource utilization when the number of allocated virtual servers is greater than 25, those results are omitted in the figure. The results reveal a negative exponential growth relation between the two parameters. The relation is similar to the “allocated resources - session slowdown” relation, but with quantitative differences. The negative exponential growth relationship is expressed by Eq. (4.11), where the variables x and y correspond to the number of virtual servers allocated to the multi-tier Internet service and the observed values of the resource utilization, respectively.

y = a_2 e^{-b_2 x}.   (4.11)

Eq. (4.11) is linearized by taking its natural logarithm. It yields

\ln y = \ln a_2 - b_2 x \ln e.   (4.12)

Performing linear regression analysis on Eq. (4.12) results in the following expressions for the coefficients a_2 and b_2:

\ln a_2 = \frac{\sum \ln y_i + b_2 \sum x_i}{n}   (4.13)

b_2 = \frac{n \sum x_i \ln y_i - \sum x_i \sum \ln y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}   (4.14)

where (x_i, y_i) are the individual data points and n is the total number of data points plotted in Figure 4.2(b).

The numerical values of a_2 and b_2 substituted into Eq. (4.11) lead to a quantitative exponential regression model of the “allocated resources - resource utilization” relation. That is,

y = 76.2381 e^{-0.098x}.   (4.15)

The correlation coefficient of the model calculated by applying the data plotted in Figure 4.2(b) to the

Eq. (4.10) results in 0.9458, indicating a very good quality fit. The data fit to the negative exponential curve is also presented in Figure 4.2(b), which shows the observed resource utilization values relative to the negative exponential curve.

4.4.1.3 Tier session slowdown ratio

The tier session slowdown ratio reflects the proportional resource demands posed by the workload on the individual tiers. To determine the tier session slowdown ratio, the queueing delay and processing time of the individual requests at each tier are first measured. The request-level measurements are then used to calculate the multi-tier service level session slowdown, the tier session slowdowns, and the normalized tier session slowdowns using Eqs. (4.1), (4.2), and (4.3), respectively. Finally the tier session slowdown ratio is calculated using

Eq. (4.4).
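Assuming per-request measurements (d_ij, p_ij) are available, the computation pipeline just described can be sketched as follows; the array layout and names are illustrative.

```java
// Sketch of computing session slowdown (Eq. 4.1), tier session slowdowns (Eq. 4.2),
// normalized tier slowdowns (Eq. 4.3) and their ratio (Eq. 4.4) from measurements.
// delay[i][j] and processing[i][j] hold d_ij and p_ij for tier i and request j;
// requests that do not visit a tier contribute zeros, as noted for Eq. (4.1).
public class SlowdownMetricsSketch {

    static double sessionSlowdown(double[][] delay, double[][] processing) {
        double totalDelay = 0, totalProcessing = 0;
        for (int i = 0; i < delay.length; i++) {
            for (int j = 0; j < delay[i].length; j++) {
                totalDelay += delay[i][j];
                totalProcessing += processing[i][j];
            }
        }
        return totalDelay / totalProcessing;                    // Eq. (4.1)
    }

    static double tierSlowdown(double[] delay, double[] processing) {
        double d = 0, p = 0;
        for (int j = 0; j < delay.length; j++) { d += delay[j]; p += processing[j]; }
        return d / p;                                           // Eq. (4.2)
    }

    // Normalized tier slowdowns sr_i = s_i / s, whose ratio sr_1 : ... : sr_n is Eq. (4.4).
    static double[] normalizedTierSlowdowns(double[][] delay, double[][] processing) {
        double s = sessionSlowdown(delay, processing);
        double[] sr = new double[delay.length];
        for (int i = 0; i < delay.length; i++) {
            sr[i] = tierSlowdown(delay[i], processing[i]) / s;  // Eq. (4.3)
        }
        return sr;
    }

    public static void main(String[] args) {
        // Three tiers, two requests; the numbers are illustrative (milliseconds).
        double[][] d = {{10, 20}, {5, 15}, {40, 60}};
        double[][] p = {{20, 30}, {25, 35}, {30, 40}};
        System.out.println("Session slowdown s = " + sessionSlowdown(d, p));
        double[] sr = normalizedTierSlowdowns(d, p);
        System.out.printf("Normalized tier slowdown ratio = %.2f : %.2f : %.2f%n", sr[0], sr[1], sr[2]);
    }
}
```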

Table 4.2: A behavior model.

Session arrival rate 20 sessions/sec

Session type TPC-W Browsing

“resources - session slowdown” regression model  y = 65.4833 e^{-0.067x}

“resources - session slowdown” correlation coefficient  0.9938

“resources - resource utilization” regression model  y = 76.2381 e^{-0.0989x}

“resources - resource utilization” correlation coefficient  0.9458

Tier session slowdown ratio 0.25 : 0.16 : 0.59

The behavior model learned from conducting the training with the specific workload is summarized in the

Table 4.2. Training is repeated with the workloads summarized in Table 4.1. For each workload instance two regression models are learned. The correlation coefficients of the resulting “resources - session slowdown” regression models range from 0.7951 to 0.9982 with a mean value of 0.87. For the “resources - resource utilization” regression models learned, the correlation coefficients range from 0.7328 to 0.9546 with a mean of 0.82.

4.4.2 The Online Phase : Dynamic Server Provisioning

The regression-based dynamic resource provisioning strategy aims to effectively meet the session slowdown guarantee of a multi-tier Internet service under dynamic workloads while ensuring efficient resource utilization.

Knowledge of the behavioral dynamics of the multi-tier Internet service is critical for effective and efficient resource management for session slowdown guarantee. The provisioning strategy gains the knowledge from an extensive set of behavior models captured during the training phase.

The resource provisioning process is divided into a sequence of intervals. In each interval, the average session slowdown and resource utilization are measured and compared to predefined thresholds. When a threshold violation is observed, the resource requirements of the service are predicted using a single learned behavior model.

The workload characteristics observed in the interval determine the behavior model used for predictions. A behavior model with the same session type as the dominant observed session type and with the session arrival rate closest to the observed session arrival rate is selected. If more than one behavior model meets the criteria, the model with the higher correlation coefficients is selected.

Table 4.3: Regression-based dynamic resource provisioning strategy: notations

Symbol                Description
s^{thr}               Session slowdown threshold
u^{thr}               Resource utilization threshold
s^{thr}_{violation}   Session threshold violation (true/false)
u^{thr}_{violation}   Resource utilization threshold violation (true/false)
s^{avg}               Average session slowdown
u^{avg}               Average resource utilization

The selected behavior model represents the session slowdown and resource utilization behaviors as regression models.

We develop a threshold-based policy that uses a session slowdown threshold and a resource utilization threshold for efficient resource allocation. The session slowdown threshold is set below the session slowdown bound. A threshold violation indicates a possible risk of a session slowdown guarantee violation. The “allocated resources - session slowdown” exponential regression model of the selected behavior model is used to predict the additional resources required to keep the session slowdown under the threshold in the subsequent intervals. The additional resources are allocated to the individual tiers in proportion to the tier session slowdown ratio of the behavior model.

When there is a resource utilization threshold violation, the “allocated resources - resource utilization” exponential regression model of the behavior model is used to predict the number of virtual servers to be removed. The virtual servers are removed from the individual tiers in inverse proportion to the tier session slowdown ratio of the behavior model. Fewer virtual servers will therefore be removed from a tier with relatively higher resource demand.

The provisioning strategy is given by Algorithm 3. Table 4.3 summarizes the key notations.

Algorithm 3 Regression-based dynamic resource provisioning strategy: description

repeat
    s^{thr}_{violation} ← (s^{avg} ≥ s^{thr}) ? true : false
    u^{thr}_{violation} ← (u^{avg} ≤ u^{thr}) ? true : false
    if (s^{thr}_{violation} || u^{thr}_{violation}) then
        Select a representative behavior model from the set of learned behavior models.
        if (s^{thr}_{violation}) then
            Predict resources to add using the “allocated resources - session slowdown” regression model.
            Divide resources among multiple tiers in proportion to the tier session slowdown ratio.
            Allocate additional resources to the various tiers.
        else if (u^{thr}_{violation}) then
            Predict resources to remove using the “allocated resources - resource utilization” regression model.
            Divide resources among multiple tiers in inverse proportion to the tier session slowdown ratio.
            Remove resources from the various tiers.
        end if
    end if
until (ALL SESSIONS PROCESSED)
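A simplified sketch of one provisioning interval is given below. It inverts the selected behavior model's exponential regression y = a e^{-bx} to obtain the number of virtual servers needed to bring the metric to its threshold, and splits the change across tiers using the tier session slowdown ratio; the class names, the rounding policy and the splitting rule are illustrative assumptions rather than the exact implementation.

```java
// Sketch of one online provisioning step driven by a learned regression model
// (metric = a * exp(-b * servers)), inverted to obtain a target server count.
// Names, rounding and the per-tier splitting rule are illustrative assumptions.
public class ProvisioningStepSketch {

    // Servers needed so that the modeled metric does not exceed the given target value.
    static int serversForTarget(double a, double b, double target) {
        return (int) Math.ceil(Math.log(a / target) / b);
    }

    // Split a total change across tiers in proportion to the given weights.
    static int[] splitByWeights(int total, double[] weights) {
        double sum = 0;
        for (double w : weights) sum += w;
        int[] perTier = new int[weights.length];
        int assigned = 0;
        for (int i = 0; i < weights.length; i++) {
            perTier[i] = (int) Math.round(total * weights[i] / sum);
            assigned += perTier[i];
        }
        perTier[0] += total - assigned;   // absorb any rounding leftover in the first tier
        return perTier;
    }

    public static void main(String[] args) {
        // Behavior model from Table 4.2 and a session slowdown threshold of 3.5.
        double a = 65.4833, b = 0.067, slowdownThreshold = 3.5;
        int currentServers = 30;
        double[] tierSlowdownRatio = {0.25, 0.16, 0.59};

        int needed = serversForTarget(a, b, slowdownThreshold);
        if (needed > currentServers) {
            int[] add = splitByWeights(needed - currentServers, tierSlowdownRatio);
            System.out.printf("Add %d servers: web=%d, app=%d, db=%d%n",
                    needed - currentServers, add[0], add[1], add[2]);
        }
    }
}
```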

Table 4.4: Experimental workload characteristics.

Workload Session Mix Type Demand Intensive Tier

Workload-B TPC-W Browsing Database

Workload-S TPC-W Shopping Application

Workload-O TPC-W Ordering Web

4.5 Performance Evaluation

4.5.1 Experimental Setup

The experimental setup used to evaluate the dynamic provisioning strategy is similar to the one used to evaluate the session based admission control strategies (Chapter 3, Section 4). Our three-tier simulation model of an e-commerce site consists of a workload generator, web servers, application servers and database servers. The session based workloads are generated according to the guidelines provided by the TPC-W benchmark specification.

We evaluate the proposed regression based dynamic resource provisioning strategy using workloads different from the training workloads. Offline training is conducted using a set of workloads, where each training workload consists of a single traffic mix arriving at a constant session arrival rate. For evaluation, we use workloads consisting of combinations of different traffic mixes with dynamically varying session arrival rates. To generate dynamic workloads with varying session arrival rates, the workload generator takes as input the session arrival rate for each interval. It then determines the number of sessions to launch within the interval.

Table 4.4 summarizes three workloads used in our experiments.

4.5.2 Session Slowdown Guarantee

The first experiment is to show that the regression-based dynamic resource provisioning approach effectively provisions resources to meet the session slowdown guarantees of the multi-tier application. It further allocates resources to the appropriate tiers to satisfy different types of workload sessions.

We use the three workload models in Table 4.4 to examine the performance of the regression-based dynamic resource provisioning strategy.

[Figure 4.3: Session slowdown due to regression-based dynamic provisioning. (a) A step-change dynamic workload. (b) Session slowdown.]

Figure 4.3(a) shows a step-change dynamic workload. It is highly dynamic as the session arrival rate increases from 10 sessions/sec to 50 sessions/sec. The total number of virtual servers available for the multi-tier application is 45. The session slowdown bound is set to 5. The session slowdown threshold is set to 3.5. The rationale for selecting the session slowdown threshold value is provided in Section 4.5.4.

Figure 4.3(b) shows the observed average session slowdown values for the three workload models. Results show that the proposed approach is effective in achieving the session slowdown guarantees of all three workload models for the majority of the time. When the session arrival rate reaches 50 sessions/second, higher values of session slowdown are observed and violations start to happen. We note that by this time all available virtual servers have been allocated to the multi-tier application. When the overload occurs, our session-based admission control for multi-tier applications designed in Chapter 3 can be extended and applied. We explore a coordinated combination of admission control and dynamic resource provisioning in Chapter 5.

Figure 4.4(a) shows the number of virtual servers provisioned to the multi-tier service for the three workload models. There are no considerable differences for the different workloads. This is due to the fact that in all three cases, workloads of similar session arrival rates are applied to the multi-tier service, and the regression-based resource provisioning approach effectively distributed virtual servers to the appropriate tiers as the three workload models impose different resource demands on different tiers.

Next we examine the resource allocation at the individual tiers for each workload, where the differences become apparent.

[Figure 4.4: Resource allocation at the overall Internet service and at the individual tiers. (a) Overall virtual server allocation. (b) Per-tier server allocations: Workload-B. (c) Per-tier server allocations: Workload-S. (d) Per-tier server allocations: Workload-O.]

[Figure 4.5: Impact of using a resource utilization threshold on the session slowdown. (a) A dynamic workload. (b) Without the threshold use. (c) With the threshold use.]

Figure 4.4(b) shows the number of virtual servers allocated at the different tiers with Workload-B. It shows that more virtual servers are allocated to the database tier than the web and application tiers. This agrees with the assumption that the browsing sessions impose more resource demand on the database tier than the other two tiers.

Figure 4.4(c) shows the number of virtual servers allocated at different tiers with Workload-S. It shows that more virtual servers are allocated to the application tier than other tiers. Figure 4.4(d) shows that for

Workload-O more virtual servers are allocated to the web tier than other tiers. The results demonstrate that the trained regression model accurately captures the workload dynamics and is effectively utilized by the dynamic resource allocation provisioning approach for session slowdown guarantee.

4.5.3 Efficiency in Per-Tier Resource Allocation

We demonstrate that the dynamic provisioning strategy effectively meets the session slowdown guarantees while ensuring efficient resource utilization by the use of a resource utilization threshold.

Figure 4.5(a) shows the session arrival rate of the workload used in the experiment. It is highly dynamic as the workload is a random combination of sessions from the three workload models. The session arrival rate dynamically varies from 10 sessions/sec to 40 sessions/sec. The total number of virtual servers available for the multi-tier application is 45. The session slowdown bound is set to 5. The session slowdown threshold is set to 3.5.

We use the same workload trace to execute the provisioning strategy two times, without and with the resource utilization threshold. Figures 4.5(b) and 4.5(c) show the session slowdown values observed in the two scenarios.

[Figure 4.6: Impact of using a resource utilization threshold on the virtual server allocations. (a) Without the threshold use. (b) With the threshold use. (c) Resource utilizations.]

In both scenarios, there are very few session slowdown guarantee violations. However, the resource utilization efficiency is very different.

Figure 4.6(a) shows the number of virtual servers allocated to the multi-tier application. When no resource utilization threshold is used, the virtual servers once allocated to the multi-tier application are not removed when there is a decrease in the session arrival rate. Next, a resource utilization threshold (70%) is used. In this scenario, the virtual servers are allocated and removed dynamically from the multi-tier application according to the variations in the session arrival rates. Figure 4.6(b) shows the number of resources allocated to the multi-tier application. Figure 4.6(c) compares the resource utilization in the two scenarios. Apparently, using a threshold achieves much better resource utilization. The experiment demonstrates that the threshold-based resource provisioning strategy is capable of achieving session slowdown guarantee while efficiently using the allocated resources.

4.5.4 Impact of Session Slowdown Threshold on Performance

The session slowdown threshold can significantly affect the performance of the dynamic resource provision- ing strategy. A threshold set far below the session slowdown bound may lead to more threshold violations.

This results in more resources provisioned than needed to meet the session slowdown guarantee. On the other hand, if the threshold is set too close to the bound, when threshold violations occur it may be too late to avoid session slowdown guarantee violations by provisioning additional resources.

We conduct an experiment to study the effect of the session slowdown threshold on the session slowdown guarantee and on resource utilization efficiency. We change the session arrival rate from 10 sessions/sec to 80 sessions/sec.


(a) Impact on session slowdown violation. (b) Impact on virtual server allocation.

Figure 4.7: Impact of session slowdown threshold on slowdown guarantee and resource allocation.

The workload is dynamic and consists of a random combination of sessions from the three workload models. The total number of virtual servers available for the multi-tier application is 100. The session slowdown bound is set to 5. We repeatedly execute the experiment using the same workload as the session slowdown threshold is varied from 10% to 90% of the session slowdown bound in increments of 10%.

Figure 4.7(a) shows the number of session slowdown guarantee violations. Figure 4.7(b) shows the number of virtual servers allocated to the multi-tier application. The results show that there is no session slowdown guarantee violation when the threshold is below 70% of the bound. However, more virtual servers are allocated to the multi-tier application. As the threshold increases, fewer virtual servers are required to maintain the session slowdown guarantee. However, session slowdown violations start to happen when the threshold is set to 70% of the bound, and more violations quickly build up as the threshold further increases.

This is a tradeoff between resource utilization efficiency and the performance guarantee. Adaptive threshold tuning based on statistical learning and control deserves further study in our future work.

4.5.5 Impact of Resource Utilization Threshold on Performance

The efficiency of the regression-based dynamic provisioning approach is affected by the resource utilization threshold setup. A low threshold may result in infrequent threshold violations and fewer resource removals, leading to inefficient utilization of the allocated virtual servers. On the other hand, a high resource utilization threshold may lead to premature removal of allocated virtual servers that may have to be reallocated in subsequent intervals.


Figure 4.8: Impact of the resource utilization threshold on resource allocation efficiency.

Frequent resource provisioning oscillations, which are successive virtual server removals followed by reallocations, are undesirable. They negatively affect the service performance as each virtual server allocation and removal is associated with a cost in terms of time and processing overhead.

We examine the effect of the resource utilization threshold on the total number of virtual server allocations and removals. The workload is the same as the workload used for analyzing the impact of the session slowdown threshold in Section 4.5.4. We repeatedly execute the experiment using the same workload as the resource utilization threshold is varied from 10% to 90% in increments of 10%.

Figure 4.8 shows the total number of virtual server allocations and removals as the resource utilization threshold is varied. The results show that when the threshold is below 70%, the total number of virtual server allocations and removals remains fairly low. As the threshold is increased above 70%, this number increases drastically, indicating that the allocated virtual servers are removed too frequently and are reallocated in the subsequent intervals.

We also observe the session slowdown guarantee violations for specific session slowdown thresholds. We note that the violations observed are similar to those plotted in Figure 4.7(a). Using a combination of the session slowdown threshold and the resource utilization threshold, the regression-based dynamic resource provisioning strategy can effectively achieve session slowdown guarantee while minimizing the resource provisioning oscillations.
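As a concrete illustration, the following minimal Python sketch outlines the per-interval decision logic that combines the two thresholds. It is not the simulator code; the predictor functions and parameter names are assumptions standing in for the offline-trained regression models described in Section 4.4.

# Minimal sketch of the interval-based, two-threshold provisioning loop.
# predict_servers_for_slowdown() and predict_servers_for_utilization() stand in
# for the "allocated resources - session slowdown" and "allocated resources -
# resource utilization" regression models (hypothetical names).

SLOWDOWN_BOUND = 5.0
SLOWDOWN_THRESHOLD = 3.5       # 70% of the bound
UTILIZATION_THRESHOLD = 70.0   # percent

def provisioning_step(measured_slowdown, measured_utilization, allocated,
                      predict_servers_for_slowdown, predict_servers_for_utilization):
    """Return the per-tier virtual server allocation for the next interval."""
    if measured_slowdown > SLOWDOWN_THRESHOLD:
        # Slowdown threshold violated: predict the allocation that brings the
        # slowdown back under the threshold and add servers to the needed tiers.
        return predict_servers_for_slowdown(SLOWDOWN_THRESHOLD)
    if measured_utilization < UTILIZATION_THRESHOLD:
        # Resources are under-utilized: predict a smaller allocation that keeps
        # utilization near the threshold, releasing surplus virtual servers.
        return predict_servers_for_utilization(UTILIZATION_THRESHOLD)
    return allocated  # both thresholds satisfied; keep the current allocation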


Figure 4.9: Impact of interval length on session slowdown.

4.5.6 Impact of Online Monitoring Interval

Each interval of an interval-based provisioning process entails overhead of comparing measured performance metrics to thresholds and predicting resource requirements in case of threshold violations. The length of the provisioning interval is a trade off between the overhead and the resource requirement prediction accuracy.

Short intervals require frequent predictions that lead to increased overheads. Longer intervals may lead to delayed response to the threshold violations, increasing the risk of session slowdown guarantee violations.

To study the effect of the interval length, we use several workload traces with different session arrival rates and consisting of random combination of three different session types. The interval lengths are set to

30, 60 and 90 seconds. Session slowdown bound is set to 5 and session slowdown threshold is 3.5. The total number of virtual servers available for the multi-tier application is 100.

Figure 4.9 shows the observed session slowdown for each combination of workload session arrival rate and interval length. Results show that the session slowdown guarantee is met for all workloads when the interval length is set to 30 or 60 seconds. However, the 30-second interval length leads to more overhead. When the interval length is increased to 90 seconds, violations occur for workloads with higher session arrival rates, because threshold violations are detected and new resource allocations are triggered only at the end of an interval.

4.5.7 Validation of Regression Models and Comparison with an Analytical Model

4.5.7.1 Validating the statistical regression relations

The training phase described in section 4.4.1 utilized TPC-W workloads to derive the “allocated resources - session slowdown” and “allocated resources - resource utilization” statistical regression models. We examine the validity of the derived statistical regression models using an alternative representation of the multi-tier

Internet service.

We consider a queueing-based analytical model of a multi-tier Internet service proposed in [104], which represents an n-tier application as a network of n queues processing session based workloads. It proposes a

Mean-Value Analysis (MVA) algorithm for closed queuing networks to compute the response time experienced by a request in a network of queues. The algorithm takes as inputs the average request service time at each tier \bar{S}_n, the average think time of a session \bar{Z}, and the number of concurrent sessions N. It calculates the average queuing delay of requests at each tier \bar{R}_n, the average response time of a request \bar{R} and the throughput τ as each session is introduced to the queuing network.

We extend and tailor the algorithm to compute the session slowdown and resource utilization. In accor- dance with Eq. (4.1), the session slowdown is calculated as the relative ratio of the total queuing delay of the requests of the session to the total service time of those requests. That is,

\mathrm{slowdown} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} \bar{R}_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{m} \bar{S}_{ij}} \qquad (4.16)

where m is the number of requests in the session.

The resource utilization is calculated according to the utilization law for a queuing system, which states that S = ρ/τ, where S, ρ and τ are the service time, queue utilization and throughput, respectively. That is, the resource utilization is calculated as

\rho = \tau \sum_{i=1}^{n} \bar{S}_i. \qquad (4.17)

We apply the extended MVA-based algorithm to a typical 3-tier Internet service, where each tier contains multiple queues that represent the virtual servers allocated to that tier. For different values of the input parameters, we compute the two outputs, session slowdown and resource utilization, using the extended MVA-based algorithm. In the interest of clarity of the figures, we plot the outputs only for the two input parameter sets detailed in Table 4.5.

Table 4.5: Inputs for Mean-Value Analysis algorithm.

Parameter                                        Set 1    Set 2

Average web tier request service time            5 ms     12 ms

Average application tier request service time    10 ms    17 ms

Average database tier request service time       5 ms     15 ms

Average user session think time                  1 sec    5 sec

Number of concurrent sessions                    25       50


For both input parameter sets, the per-tier request service times are arbitrary, as the MVA algorithm does not make any assumption about the service time distributions and the proposed queuing model is sufficiently general to handle workloads with arbitrary service time requirements [104]. As in [104], the user think time of a session is chosen using an exponential distribution whose mean is chosen uniformly at random from the set {1 sec, 5 sec}. For the two input parameter sets, we choose the two extreme values,

1 second and 5 seconds as the average user session think times. Choosing sessions with two widely different think times ensures variability in the workload imposed by individual sessions [104].
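To make the extension concrete, the following is a minimal Python sketch of the extended MVA recursion under simplifying assumptions: one aggregate queue per tier (rather than one queue per allocated virtual server) and identical per-request averages so that the number of requests m cancels in Eq. (4.16). It is illustrative only, not the implementation used to produce the results.

# Minimal sketch of the extended MVA recursion (Eqs. 4.16 and 4.17).
def extended_mva(service_times, think_time, num_sessions):
    """service_times: average per-tier request service times S_i (seconds);
    think_time: average session think time Z (seconds);
    num_sessions: number of concurrent sessions N.
    Returns (session_slowdown, resource_utilization, throughput)."""
    n = len(service_times)
    queue_len = [0.0] * n                      # Q_i, average customers at tier i
    residence = list(service_times)
    throughput = 0.0

    for k in range(1, num_sessions + 1):       # introduce sessions one by one
        # Residence time at tier i: own service plus queueing behind Q_i others.
        residence = [service_times[i] * (1.0 + queue_len[i]) for i in range(n)]
        throughput = k / (think_time + sum(residence))       # tau
        queue_len = [throughput * residence[i] for i in range(n)]

    # Eq. (4.16): total queueing delay relative to total service time.
    queueing_delay = sum(residence[i] - service_times[i] for i in range(n))
    slowdown = queueing_delay / sum(service_times)
    # Eq. (4.17): utilization law, rho = tau * sum_i S_i.
    utilization = throughput * sum(service_times)
    return slowdown, utilization, throughput

# Example with parameter Set 1 of Table 4.5 (service times in seconds).
print(extended_mva([0.005, 0.010, 0.005], 1.0, 25))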

Figure 4.10 shows the session slowdown and resource utilization computed for the two input parameter sets as the total number of virtual servers (queues) allocated to the multi-tier service is varied. The two plots reveal a negative exponential trend exhibited by the session slowdown and resource utilization metrics with respect to the virtual servers allocated to the service. The plots also show the observed session slowdown values relative to a fitted negative exponential curve. They reveal that while the observed trend is negative exponential, the values computed by the MVA algorithm do not fit the regression curve very closely.

4.5.7.2 Impact of concurrent sessions on session slowdown

We apply the extended MVA-based algorithm to the typical 3-tier Internet service. The number of allocated virtual servers at each tier is five. The number of concurrent sessions is varied from 25 to 500 in increments of 25. The workload consists of a random mix of TPC-W Browsing, Shopping and Ordering sessions. The user think time and tier-specific service times are chosen uniformly at random within the range of the two sets of parameters specified in Table 4.5.


(a) Session slowdown trend. (b) Resource utilization trend.

Figure 4.10: “Allocated virtual servers - session slowdown” and “Allocated virtual servers - resource utilization” relations for a queuing model of a multi-tier Internet service.


Figure 4.11: Session slowdown behavior with the extended MVA algorithm.

We measure the average session slowdown for all completed sessions. Figure 4.11 shows the average session slowdown with the number of concurrent sessions. The results show that the session slowdown consistently increases with increase in the workload until the system is saturated at 375 concurrent sessions.

We note that at this saturation point, the fixed capacity of the multi-tier Internet service results in sessions being aborted due to the overload.


(a) A dynamic workload. (b) With regression approach.

(c) With MVA approach. (d) Deviation and violations.

(e) With regression approach. (f) With MVA approach.

Figure 4.12: Performance comparison of regression and MVA based approaches.

4.5.7.3 Comparison with MVA-based dynamic capacity provisioning

We compare the performance of the regression-based resource provisioning approach with the extended MVA algorithm for session slowdown guarantee in the three-tier architecture. The workload used for the experiment consists of random combination of three session types with highly dynamic session arrival rates. The session arrival rate is varied every 30 seconds as shown in Figure 4.12(a). Session slowdown bound is set to 5. For the regression based approach, session slowdown threshold is 3.5 and resource utilization threshold is set at

70%.

The MVA algorithm is typically used for steady-state workloads. We divide the MVA-based dynamic capacity provisioning process into periodic intervals. The interval length is set to 60 seconds. Within each interval, the MVA algorithm treats the workload as steady state. This was practiced by Urgaonkar et al. [104] to conduct MVA-based dynamic capacity planning for multi-tier Internet services. Similarly, in our approach, at the end of each interval, the MVA algorithm is applied to determine the number of servers needed at each tier to satisfy the session slowdown guarantees. The MVA algorithm takes the number of simultaneous sessions to be served and the session slowdown target as its inputs. At each interval edge, the number of simultaneous sessions (N) to be served in the next interval is derived from the peak session arrival rate (λ) and the average session duration (d) in the current interval using Little's Law (N = λ × d).
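A hedged sketch of this interval-edge step is given below. It reuses extended_mva() from the earlier sketch and approximates k virtual servers per tier by scaling the effective per-tier service time by 1/k; this scaling is an illustrative simplification, not the exact multi-queue model of [104].

# Illustrative per-interval MVA-based capacity planning step.
def servers_needed(peak_arrival_rate, avg_session_duration, service_times,
                   think_time, slowdown_bound, max_servers_per_tier=100):
    # Little's Law: concurrent sessions expected in the next interval.
    num_sessions = max(1, int(peak_arrival_rate * avg_session_duration))
    for k in range(1, max_servers_per_tier + 1):
        scaled = [s / k for s in service_times]   # k servers per tier (approximation)
        slowdown, _, _ = extended_mva(scaled, think_time, num_sessions)
        if slowdown <= slowdown_bound:
            return k                              # smallest allocation meeting the bound
    return max_servers_per_tier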

Figures 4.12(b) and 4.12(c) show the session slowdown values achieved with the regression and MVA based approaches, respectively. We observe that the regression-based approach is much more robust to workload variations when providing the session slowdown guarantee. A couple of spikes in the session slowdown are observed due to the sudden changes in the applied workload. However, the regression-based approach is responsive in assuring the session slowdown guarantee, resulting in far fewer violations than the MVA-based approach.

Figure 4.12(d) shows the standard deviation of the resulting session slowdown values and the total number of violations for the two resource provisioning approaches. The regression-based approach achieves the session slowdown guarantee with a lower standard deviation and fewer violations than the MVA-based approach does. We attribute the robustness to the regression-based prediction and the use of the session slowdown threshold.

Figures 4.12(e) and 4.12(f) show the number of virtual servers allocated to the multi-tier system with the regression and MVA based approaches, respectively. Results show that the regression based approach performs timely virtual server allocation and removal in accordance with the workload changes. The use of the resource utilization threshold results in more efficient resource utilization. Note that the MVA based approach does not consider the removal of virtual servers [104].

4.5.8 Impact of TPC-W Workload Burstiness on Performance

Finally, we evaluate the effectiveness of the regression based dynamic server provisioning for session slowdown guarantee under a bursty workload. An interesting recent study [79] discussed a model of bursty e-commerce workloads. It uses a two-state Markovian arrival process (MAP) to inject burstiness into the

TPC-W benchmark. A single parameter, the index of dispersion (I), is used to control the degree of burstiness. The index of dispersion is used to dynamically modify the think times of a user between submissions of consecutive requests within one session.
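For illustration, a minimal two-state think-time modulator in the spirit of this MAP-based burstiness injection is sketched below; the state-switching probabilities and think-time values are assumed for the example and are not the parameters of [79] or of our experiments.

import random

# Illustrative two-state (normal/bursty) think-time modulation.
def bursty_think_times(num_requests, normal_mean=7.0, bursty_mean=0.5,
                       p_enter_burst=0.05, p_exit_burst=0.2):
    state, think_times = "normal", []
    for _ in range(num_requests):
        # Markovian switching between the normal and bursty states.
        if state == "normal" and random.random() < p_enter_burst:
            state = "bursty"
        elif state == "bursty" and random.random() < p_exit_burst:
            state = "normal"
        mean = bursty_mean if state == "bursty" else normal_mean
        think_times.append(random.expovariate(1.0 / mean))  # exponential think time
    return think_times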

Using the algorithm proposed in [79] with the dispersion index I set to 300, we generate bursty TPC-W workloads consisting of a random combination of the three traffic mixes. Figure 4.13 shows the generated bursty workload. For this experiment, the session slowdown bound is set to 5, the session slowdown threshold is set to 3.5 and the resource utilization threshold is set to 70%. The initial number of virtual servers used is 6, with

2 virtual servers allocated per tier. The maximum number of virtual servers available is set to 60. The online monitoring interval length is set to 20 seconds.

Figure 4.14(a) shows the observed session slowdown values. During the first interval, due to the workload spikes the session slowdown guarantee is violated. In the subsequent intervals, the regression based approach effectively predicts the number of virtual servers required to handle the workload changes. The subsequent session slowdown values observed are under the session slowdown bound.

However, after about 900 seconds, the system observes higher values of session slowdown. Starting at the

1000th second, session slowdown violations occur. We note that at the 1000th second, all available virtual servers have been allocated to the system and further capacity increase of the system is not feasible. When overload occurs, our session-based admission control for multi-tier applications designed in Chapter 3 can be applied.


Figure 4.13: Bursty TPC-W workload.


(a) Session slowdown (b) Resource allocations

Figure 4.14: Performance evaluation with bursty TPC-W workload.


Figure 4.14(b) shows the number of virtual servers allocated in each interval to process the bursty workload. Note that during the last few intervals, there are many residual requests belonging to sessions that entered in the previous intervals. Thus, more virtual servers are allocated. While the session slowdown guarantee is satisfied after the first interval, bursty workloads result in frequent provisioning oscillations. A provisioning oscillation is an allocation of virtual servers in an interval followed by a removal of virtual servers in the next interval, or vice versa. Frequent resource allocations add overhead to resource management and switching delay to session requests. We explore a coordinated admission control and dynamic provisioning in Chapter 5.

4.6 Summary and Discussion

We propose a novel session-based performance metric, session slowdown, for multi-tier Internet services.

Session slowdown is a compelling performance metric of session-based Internet services because it directly measures user-perceived relative performance. In this chapter, we proposed a statistical regression based approach for effective resource management of a multi-tier Internet service for session slowdown guarantee.

We used statistical regression analysis to learn the session slowdown behavior with respect to the Internet service resources. We designed a novel regression-based dynamic resource provisioning strategy that utilizes learned models to predict and manage the resource requirements of the Internet service. Extensive simulation results using TPC-W benchmark workloads have demonstrated the superior performance of the new resource provisioning approach. The regression-based approach adaptively and efficiently provisions resources to appropriate individual tiers for session slowdown guarantee of the multi-tier service, taking the dynamic resource demand and resource utilization into account.

The regression based dynamic provisioning also suffers from limitations similar to those of the admission control (CoSAC) strategy discussed in Chapter 3. The session type of the incoming user session is an integral part of the behavior models learned offline, and its knowledge is critical for the online phase of the provisioning process. As in CoSAC, the offline training phase is mandatory for successful provisioning. Despite these shortcomings, Chapters 3 and 4 established the feasibility of applying supervised statistical learning to the domain of multi-tier Internet service management. In the next chapter, we explore an unsupervised learning technique that is independent of training, towards a coordinated combination of admission control and dynamic resource provisioning.

Chapter 5

Multi-tier Internet Service Differentiation

5.1 Introduction

Modern data centers apply server virtualization to host popular multi-tier Internet services that share the underlying high density server resources in the same platform. Provisioning relative service differentiation among hosted services is necessary due to their different subscription payments to the hosting platform [30,

35, 40, 69, 87]. Service differentiation is to provide different service quality levels to meet changing system configuration and resource availability and to satisfy different requirements and expectations of applications and users.

Two approaches to service differentiation are absolute and relative differentiation. In absolute differentiation, a high priority hosted service always receives the desired performance from the shared platform, even in the presence of traffic from other low priority hosted services. This may lead to high priority services monopolizing the shared resources, resulting in low priority service starvation.

In relative differentiation, by contrast, a high priority service is only guaranteed better performance than a low priority service. While relative differentiation avoids low-priority service starvation, it can lead to performance degradation with increased demand from other services. Due to fairness and practicability, the

proportional differentiation model [36] has been widely used for relative service differentiation [69, 122, 129].

The model states that the achieved service quality levels should be proportional to the predefined service differentiation parameters, independent of the workload dynamics.

Provisioning service differentiation in single Web servers was well studied using request scheduling, control and processing rate allocation with queueing models, feedback control and content adaptation [3, 71,

122, 129]. However, multi-tier service differentiation provisioning is a very hard problem in practice. In a shared virtualized hosting platform, the user-perceived service quality results from complex interactions between dynamic workloads and a complex underlying server system [21, 57, 79, 81, 120].

The multi-tier Internet service architecture imposes challenging performance issues such as inter-tier performance dependencies, concurrency limit per tier and dynamic bottleneck tier shifting. Furthermore, multiple multi-tier Internet applications often share a virtualized infrastructure of high density servers. Such a hosting platform has become so complicated that it is even non-trivial to get a good understanding of the entire system dynamic behavior. The dynamic and bursty nature of the Internet workloads further magnifies the complexities.

Dynamic resource management and admission control explored in the previous two chapters are critical for quality-of-service provisioning and load management in shared platforms. However, the statistical learn- ing based dynamic resource management and admission control approaches designed in the previous two chapters have some limitations. They are closely coupled with the session type of the incoming sessions. For

CoSAC, the incoming session type is one of the evidences applied to the Bayesian Network. For regression based dynamic provisioning, the session type is part of the behavior models learned during the training phase.

While user profiling can be used to probabilistically determine the session type, Internet service management should preferably be independent of such close coupling. Yet another limitation is the mandatory training required for the supervised statistical learning techniques employed.

Additionally, the regression based dynamic provisioning can only manage the resources at the individual server level, that is, the number of virtual machines assigned to the web, application and database tiers. The shared resources of a data center can also be managed at a finer granularity. The resources of each virtual machine (CPU, memory, ···) should be adaptively provisioned to satisfy the dynamic workloads experienced by the hosted Internet services.

In this chapter, we propose a coordinated combination of adaptive resource management and admission control for proportional multi-tier service differentiation and performance improvement of hosted services in a shared virtualized platform. We develop a reinforcement learning module for VM auto-configuration with respect to CPU, memory and disk storage resources. We further develop a reinforcement learning module for session based admission control to police the application workloads. Coordination between the two is achieved through a shared reward. We enhance the reinforcement learning models with cascading neural networks to integrate the model-independence of reinforcement learning with the self-learning and self-construction of neural networks for system scalability and agility.

We implement the service differentiation approach in a virtualized HP ProLiant blade server hosting multiple multi-tier RUBiS applications. We adopt two popular performance metrics, the average request response time [59, 106] and the relative session throughput, that is, the percentage of sessions completed [45].

Experimental results demonstrate that our approach can accurately meet the differentiation targets between co-hosted applications while improving the response times and relative session throughput of the applications by effectively utilizing the shared resources. The approach reacts to highly dynamic and bursty workloads in an agile and scalable manner.

5.2 Reinforcement Learning and Neural Networks: Background

In reinforcement learning, the learner is a decision-making agent that takes actions in an environment and receives reward (or penalty) for its actions in trying to solve a problem. The objective is for the agent to choose actions so as to maximize the expected reward over some period of time. After a set of trial-and-error runs, the agent learns the best policy, which is the sequence of actions that maximize the total reward.

Figure 5.1(a) illustrates the basic interaction in the normal operation of reinforcement learning. Assuming time progresses in discrete steps, each interaction consists of

• observing the system’s current state s_t ∈ S at time t

• performing some legal action a_t ∈ A in state s_t


(a) Reinforcement Learning. (b) Neural Network.

Figure 5.1: Reinforcement Learning and Neural Networks

• receiving a reward r_{t+1}, followed by a transition to a new state s_{t+1}.

The policy Π defines the agent’s behavior and is a mapping from the environment states to actions: Π : S → A. The policy defines the action to be taken in any state s_t: a_t = Π(s_t). The value of the policy Π, V^Π(s_t), is the expected cumulative reward that will be received while the agent follows the policy, starting from state s_t:

V^{\Pi}(s_t) = E[r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots] \qquad (5.1)

where 0 ≤ γ < 1 is the discount rate that keeps the value finite. Note that a reward received k steps into the future is discounted by a factor of γ^{k−1}. Thus, to maximize the expected value, positive rewards should be accrued as soon as possible and negative outcomes postponed as long as possible.
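As background, a minimal sketch of the standard tabular Q-learning (temporal difference) update and ε-greedy action selection is shown below; it is a generic illustration, not the dissertation's enhanced learner, which replaces the table with neural network approximators as described later.

import collections
import random

Q = collections.defaultdict(float)   # Q[(state, action)], initially zero

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    # Temporal-difference update toward the discounted one-step target.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def choose_action(state, actions, epsilon=0.1):
    # Epsilon-greedy exploration over the legal action set.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])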

Reinforcement learning offers two main advantages over supervised learning techniques such as Bayesian networks and statistical regression. First, it does not require an explicit model of either the system being managed or of the process that generates the workload. This not only eliminates the need for designing a multi-tier

Internet service model, but also eliminates the task of capturing complex dynamics of the Internet traffic.

Second, due to its basis in Markov Decision Process (MDP) theory, it takes into consideration both the immediate and all future rewards of an action. Thus RL could potentially outperform methods that approximate or completely ignore the dynamic effects or cast the decision making problem as a series of unrelated instantaneous optimizations [102].

However, there are two major challenges in applying the reinforcement learning paradigm to multi-tier service differentiation and performance improvement, i.e., scalability and slow online learning. Reinforce- ment learning employs temporal difference based Q-value learning, which is a tabular approach that persists all states, intermediate actions and rewards in a table. This may result in poor scalability due to exponentially increased state space when the system scales up supporting many applications.

Furthermore, in the look-up table based policy learning, the intermediate values are stored separately without interactions. The convergence of the optimal policy requires that each table entry is visited at least once. In practice, the time required to collect samples to populate the Q table is prohibitively long. Due to this, when used for managing online systems, interaction based reinforcement learning suffers from lengthy trial-and-error iterations and leads to slow online adaptation [9].

To address the scalability and agility challenges, we enhance the reinforcement learning with neural networks. A neural network is a mathematical model that simulates the structure and/or functional aspects of biological neural networks. Neural networks are non-linear statistical data modeling and decision making tools. They can be used to model complex input-output relationships and to find patterns in data.

A multilayer perceptron (MLP), the most widely used type of neural network, is shown in Figure 5.1(b).

An MLP consists of one input layer, one output layer and one or more hidden layers. It has n inputs and generates m outputs, thereby mapping an n-dimensional space into an m-dimensional space.

Neural networks can approximate any non-linear function and capture complex non-linear input-output parameter relationships, without prior knowledge of the parameters involved and without making any assumptions. A neural network effectively captures the non-linear relationship among a large number of variables.

Incremental training of neural networks requires fewer data samples and results in faster online learning when integrated with reinforcement learning.

Figure 5.2: A platform hosting multi-tier applications on virtualized servers.

5.3 Multi-tier Service Differentiation and Performance Improvement

5.3.1 Problem Statement

We consider a hosting platform with a fixed amount of resources shared by multiple multi-tier Internet services based on server virtualization technology [83]. The goal is to provide relative service differentiation between co-hosted services while improving the performance of individual applications through effective utilization of the shared platform resources. Figure 5.2 illustrates a platform with a resource pool, which is a collection of a

fixed amount of CPU, memory and disk storage resources.

Relative service differentiation should satisfy two basic requirements: predictability and controllability [36]. Requests are categorized into different classes based on the specific applications. Predictability requires that higher priority classes receive better or no worse service quality than lower classes, independent of the class load distributions. Controllability requires that a number of controllable parameters be available, which are adjustable for the control of quality spacings between classes. An additional requirement is fairness, which is a quantitative extension of predictability and describes how much better the service quality received by one class is compared with that received by another class.

Due to fairness and practicability, the proportional differentiation model [36] has been widely used for relative service differentiation [69, 122, 129]. The model states that the achieved service quality levels should be proportional to the predefined service differentiation parameters, independent of the workload dynamics.

Two performance metrics of interest are the average request response time and the relative session throughput of the hosted applications. For improved application performance, the gap between target and observed request response times should be reduced and the relative session throughput should be increased. Intuitively, to increase session throughput more sessions should be accepted. However, this may lead to aborted sessions under excessive loads. Sessions aborted in the middle of their transactions utilize resources but do not contribute to the overall throughput, resulting in resource wastage. We aim for proportional differentiation and performance improvement through effective utilization of the shared resources while reducing the resource wastage due to aborted sessions.

5.3.2 Problem Formulation

We consider a shared platform with M hosted applications, each being an N-tier architecture. In a typical modern e-commerce application, N = 3. Each tier of an individual application is hosted inside a dedicated

VM. The M × N VMs share the resource pool of the virtualized platform. We consider a combination of two complementary mechanisms, adaptive VM auto-configuration and session-based admission control, to achieve our two-folded objective.

5.3.2.1 Objective 1: Proportional service differentiation

Let R be a vector representing resources in the resource pool. We consider three shared resources, i.e., CPU, memory and disk space. That is,

R = {cpu, mem, disk}. (5.2)

Let RM×N be a vector representing the resource configuration of M × N VMs. That is,

R_{M \times N} = \{C_{11}, M_{11}, D_{11}, \cdots, C_{MN}, M_{MN}, D_{MN}\}. \qquad (5.3)

where Cmn, Mmn, and Dmn represent the CPU, memory and disk space of the VM hosting the nth tier of the mth application, respectively.

Proportional service differentiation aims to ensure that the service quality spacing between application i and application j is proportional to their pre-specified differentiation parameters δi and δj; that is,

\frac{q_i}{q_j} = \frac{\delta_i}{\delta_j}, \quad 1 \le i, j \le M, \qquad (5.4)

where q_i and q_j are the service quality factors of application i and application j, respectively. For differentiation between co-hosted applications, we use the average end-to-end response time in a multi-tier architecture as the primary performance metric [106]. Our developed approach can also use the relative session throughput as the performance metric. Without loss of generality, we assume that application 1 is the highest priority application and we have 0 < δ_1 < δ_2 < ... < δ_M.

Each application has a service-level agreement on the target average response time, denoted as Ti, where

Ti/Tj = δi/δj. Provisioning relative service differentiation aside, we want to ensure that the user-perceived average response time of each application is as close to its target as possible. That is to minimize the gap between the target and observed average response times for each hosted application.

minimize |Ti − qi| ∀i ∈ {1 ··· M}. (5.5)

The VM auto-configuration problem is how to allocate available CPU, memory and disk resources in the shared platform to the RM×N vector so that Eq. (5.4) and Eq. (5.5) are satisfied at the same time. It is up to the data center administrator and application providers to select appropriate service quality levels in terms of the differentiation parameters and the target service quality level that best meet their requirements, cost, and constraints.
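As an illustration of how the differentiation objective can be checked at run time, the short sketch below computes the relative deviation of the achieved quality ratios from the target ratios of Eq. (5.4); it is an illustrative helper, not part of the implemented system.

# Relative deviation of achieved quality ratios q_i/q_{i+1} from the target
# ratios delta_i/delta_{i+1}, for applications ordered by priority.
def differentiation_error(quality, delta):
    errors = []
    for i in range(len(quality) - 1):
        achieved = quality[i] / quality[i + 1]
        target = delta[i] / delta[i + 1]
        errors.append(abs(achieved - target) / target)
    return max(errors) if errors else 0.0

# Example: target ratio 1:2, achieved response times 1100 ms and 2050 ms.
print(differentiation_error([1100.0, 2050.0], [1.0, 2.0]))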

5.3.2.2 Objective 2: System throughput maximization

Let SM be a vector representing the number of accepted and rejected sessions of the M applications. That is

SM = {A1,R1, ··· ,AM ,RM } (5.6)

where A_i and R_i represent the number of accepted and rejected sessions of application i, respectively.

We want to maximize the number of completed sessions and minimize the number of aborted sessions for each application so as to improve the relative session throughput and reduce wastage of the shared resources.

That is,

maximize Coi ∀i ∈ {1 ··· M} (5.7)

minimize Abi ∀i ∈ {1 ··· M} (5.8)

where Coi and Abi represent the number of completed and aborted sessions of application i, respectively.

The admission control problem is how to selectively admit user sessions to determine the SM vector so that

Eqs. (5.7) and (5.8) are satisfied simultaneously.

Note that although the problem is formulated to address per-application service differentiation and per- formance improvement of multiple applications, the modeling and the designed approach are also applicable for per-class service differentiation and performance maximization in one application.

5.4 The System Design for Scalability and Agility

Figure 5.3 illustrates the system design. It consists of an admission control module, a performance monitor, and a VM auto-configuration module. The admission control and VM auto-configuration modules oscillate between idle and active modes, with their initial mode being idle. When a new session arrives, the admission control module switches to active mode, decides to either accept or reject the session, and switches back to idle mode. The decision is applied to all sessions arrived while the module is in active mode. The performance monitor periodically interacts with the virtualization infrastructure to measure the average response time of the hosted applications and feeds them to the VM auto-configuration module, switching it into active mode.

The VM auto-configuration module determines new VM resource configurations and switches back to idle mode. The VM resource changes are actuated via the VM management API.

Admission control and VM auto-configuration modules are reinforcement learning based, and coordinate with each other using a shared reward. A reinforcement learner interacts with its environment, which is expressed as a set of states, and receives reward or penalty for its actions. It aims to learn the policy that captures the best sequence of actions to maximize the total reward. It can operate without an explicit offline model of the system being managed.

[Figure 5.3 components: an admission control module and a VM auto-configuration module, each comprising a reinforcement learner, an environmental-model neural network and a function-approximator neural network; a performance monitor; the hosted applications; and the virtualization management module on top of the virtualization infrastructure.]

Figure 5.3: The diagram of the system design.

This not only eliminates the need to obtain an accurate multi-tier performance model, but also the need to capture complex dynamics of the Internet workloads.

However, there are two major challenges in applying the reinforcement learning paradigm to multi-tier service differentiation and performance improvement, i.e., scalability and slow online learning. Reinforce- ment learning employs temporal difference based Q-value learning, which is a tabular approach that persists all states, intermediate actions and rewards in a table. This may result in poor scalability due to exponentially increased state space when the system scales up supporting many applications.

Furthermore, in the look-up table based policy learning, the intermediate values are stored separately without interactions. The convergence of the optimal policy requires that each table entry is visited at least once. In practice, the time required to collect samples to populate the Q table is prohibitively long. Due to this, when used for managing online systems, interaction based reinforcement learning suffers from lengthy trial-and-error iterations and leads to slow online adaptation [9, 19].

We improve the scalability and agility of the reinforcement learning approach by enhancing it with two neural networks. Neural networks can approximate any non-linear function and capture complex non-linear input-output parameter relationships, without a priori knowledge of the parameters involved and without making any assumptions.

The first neural network enhancement acts as an environmental model to facilitate faster reward determination. It captures the relationship between the workload characteristics, state, actions and reward. Since there is no need to collect an exhaustive set of samples to populate the Q table, the reward prediction is agile.

By including workload characteristics as model input, it makes the reward prediction adaptive to dynamic workloads.

The second neural network enhancement acts as a non-linear function approximator and replaces the tabular policy learning. It captures the non-linear relationship between the state-action pairs and the approx- imated Q value. An enhanced reinforcement learner operating in a large scale environment can now persist relationship between a large number of states, actions and intermediate Q-values as weights of a single neural network and is therefore scalable.

Integration of reinforcement learning with neural networks is non-trivial, however. While reinforcement learning is an online incremental algorithm, advanced neural network training algorithms are often offline batch algorithms. They require the complete training data to be available before training can start, which conflicts with the online nature of reinforcement learning. We train the two neural networks using an advanced training technique, cascade correlation, which was designed to address the problems inherent to backpropagation training. Cascading allows not only online training but also a controlled growth of a neural network. The two networks start with zero hidden layers and neurons and grow based on the knowledge gained at run time. Hence, the need to design and train the networks offline is eliminated.

5.5 Algorithms

5.5.1 Basic Reinforcement Learning for VM Auto-Configuration

We use a reinforcement learner to determine resource configurations of the VMs for proportional service differentiation among multi-tier applications while minimizing the gap between the target and observed response times of the applications. For reinforcement learning, the VM reconfiguration is defined in terms of a state set, an action set and a reward function.

State Set: The state set for the VM reconfiguration problem captures the configuration of multiple resources of the VMs hosting N tiers of M applications, which is described as

s_t^{vc} = \{C_{11}, M_{11}, D_{11}, \cdots, C_{NM}, M_{NM}, D_{NM}\} \qquad (5.9)

where C_{nm}, M_{nm}, and D_{nm} represent the CPU, memory and disk space of the VM hosting the nth tier of the mth application, respectively.

Action Set: For each configurable resource of a VM, possible operations are increase(+1), decrease(-1) or nochange(0). All possible combinations of the three actions applied to |RM×N | configurable parameters of all VMs result in a complete action set. A sample action

a_t^{vc} = \{C_{11}(0), M_{11}(+1), D_{11}(0), \cdots, C_{NM}(0), M_{NM}(0), D_{NM}(0)\} \qquad (5.10)

indicates an increase in the memory of the VM hosting tier 1 of application 1.

With each action, only a single resource is modified. This resembles the natural trial-and-error method closely and searches the state space exhaustively. An action is invalid if it would cause the total resources allocated to the VMs to exceed the available shared platform resources. When the available resources are exhausted,

VM resource configurations either remain unchanged or their capacities are reduced since only decrease and nochange actions are valid.

Reward: The reward function takes into consideration two factors, service quality of different applications relative to each other and service quality of each individual application. We define a utility function for application i as

U_i^{vc} = e^{(T_i - q_i)} + e^{\left(\frac{\delta_i}{\delta_{i+1}} - \frac{q_i}{q_{i+1}}\right)} \qquad (5.11)

where q_i and q_{i+1} are the average response times of applications i and i+1, respectively, and T_i is the target response time of application i.

The utility function increases when the application performs close to its target service quality and satisfies the relative differentiation ratio with respect to the next priority application. The exponential utility function has an inherent momentum feature. As better performance is achieved, the application utility achieved for a given action grows quickly, thus promoting better and faster convergence toward better service quality provisioning.

The module reward is then defined as cumulative utility of multiple applications, that is

r_t^{vc} = \sum_{i=1}^{M} U_i^{vc}. \qquad (5.12)

Each time the VM auto-configuration module is invoked, it executes multiple iterations to determine the action sequence that maximizes its reward, until the Q-value converges. Each iteration captures an action that updates a single VM resource. A sequence of actions results in updates to multiple resources in multiple

VMs.
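A minimal sketch of this reward computation (Eqs. 5.11 and 5.12) is shown below, assuming response times and targets are expressed in a common, suitably normalized unit; the function and variable names are illustrative, not the implementation.

import math

# Reward of the VM auto-configuration module (Eqs. 5.11-5.12).
# targets[i] = T_i, observed[i] = q_i, deltas[i] = delta_i, with applications
# ordered from highest priority (index 0) to lowest.
def vm_reward(targets, observed, deltas):
    reward = 0.0
    M = len(observed)
    for i in range(M):
        utility = math.exp(targets[i] - observed[i])      # closeness to the target
        if i + 1 < M:
            # Relative spacing with respect to the next priority application.
            utility += math.exp(deltas[i] / deltas[i + 1]
                                - observed[i] / observed[i + 1])
        reward += utility
    return reward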

5.5.2 Basic Reinforcement Learning for Session-Based Admission Control

Admission control selectively admits the application workloads to maximize the relative session throughput of the hosted applications while minimizing the resource wastage due to aborted sessions. For reinforcement learning, session based admission control is defined in terms of state set, action set and a reward function.

State Set: The state set captures the number of accepted and rejected sessions of M applications. It is described by

s_t^{ac} = (A_1, R_1, \cdots, A_M, R_M) \qquad (5.13)

where Ai, Ri represent the number of accepted and rejected sessions of the application i respectively.

Action Set: The possible actions for admission control are increasing the number of accepted or rejected sessions of each application, which correspond to accepting or rejecting a new session. Note that it is impractical to decrease the number of accepted and rejected sessions or not to act on an incoming session.

Reward: The reward function takes into consideration three factors: the completed and aborted sessions of the applications and the latest reward from the VM auto-configuration module. We define a utility function for application i as

U_i^{ac} = e^{(M+1-i) \cdot \frac{Co_i - Ab_i}{A_i + R_i}} \qquad (5.14)

where Coi and Abi represent the completed and aborted sessions of the application i, respectively.

The application utility increases as the number of completed sessions increases and the number of aborted sessions decreases. Additionally, the utility of a higher priority class increases faster than the lower priority class does. The module reward is defined as a combination of cumulative utility of multiple applications and the reward from the VM auto-configuration module, that is

r_t^{ac} = \sum_{i=1}^{M} U_i^{ac} + r_t^{vc}. \qquad (5.15)

By including the application priority into the utility function, the admission control decision contributes to the differentiated treatment of the applications. Including the VM auto-configuration module’s reward in

Eq. (5.15) facilitates the coordination between the two modules. It enables the admission control module to accept or reject new sessions to maximize the relative session throughput and to minimize the resource wastage while maintaining the proportional differentiation achieved by dynamic VM reconfiguration.
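Under the same assumptions, the admission-control reward (Eqs. 5.14 and 5.15) could be computed as sketched below; with 0-based indexing, the priority weight (M + 1 − i) of the 1-based formulation becomes (M − i).

import math

# Reward of the admission control module (Eqs. 5.14-5.15), coordinated with the
# VM auto-configuration module through its latest reward r_vc.
def ac_reward(completed, aborted, accepted, rejected, r_vc):
    M = len(completed)
    reward = 0.0
    for i in range(M):                       # i = 0 is the highest priority application
        total = accepted[i] + rejected[i]
        ratio = (completed[i] - aborted[i]) / total if total else 0.0
        reward += math.exp((M - i) * ratio)  # priority-weighted utility, Eq. (5.14)
    return reward + r_vc                     # shared reward couples the two modules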

5.5.3 Cascade Neural Network Enhancements

The VM auto-configuration and admission control reinforcement learners are each enhanced with two cascading neural networks, an environmental model and a function approximator.

Inputs to the VM auto-configuration environmental model are: the VM resource configurations (s_t^{vc}), the actions applicable to each resource (a_t^{vc}) and the request arrival rates of the application workloads. The output of the network is the reward (r_t^{vc}) corresponding to the inputs. By including workload characteristics as input to the neural network, the reward prediction is adaptive to the workload conditions. Inputs to the VM auto-configuration function approximator are: the reward value from the environmental model (r_t^{vc}), the VM resource configurations (s_t^{vc}), and the applicable actions (a_t^{vc}). The output is the approximated Q value of the reinforcement learner. It is used to derive the policy that specifies the VM resource configuration changes.

Inputs to the admission control environmental model are: the accepted and rejected sessions of the multiple applications (s_t^{ac}) and the accept or reject actions (a_t^{ac}). The output of the network is the reward (r_t^{ac}) corresponding to the inputs. Inputs to the admission control function approximator are: the reward value from the environmental model (r_t^{ac}), the accepted and rejected sessions of the multiple applications (s_t^{ac}) and the applicable actions (a_t^{ac}). The output is the approximated Q value of the reinforcement learner. It is used to derive the policy that specifies the admit or reject decision for a session.

The four cascading neural networks are trained online using the sliding window cache based Cascade algorithm as shown in Algorithm 4. It provides an efficient way to conduct online training of the neural networks. It alternates between adding new candidates to the network and training for the network output. The neural network structure is learned and trained incrementally for the initial n time-steps and batch training is applied after the n time-steps. With online incremental growth and updates to the trained networks, the neural network enhancements collectively determine the best possible VM resource configuration and admission control decision to effectively meet the differentiation targets while maximizing the performance of hosted applications.
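To illustrate how the two networks could be used together at decision time, the following hedged sketch scores each candidate action by first predicting its reward with the environmental model and then approximating its Q value with the function approximator; env_model and q_approx are hypothetical wrappers around the trained cascade networks, not the FANN API.

# Hedged sketch of action selection with the neural-network-enhanced learner.
def best_action(state, candidate_actions, workload, env_model, q_approx):
    """state, actions and workload are flat feature lists; returns the action
    with the highest approximated Q value."""
    best, best_q = None, float("-inf")
    for action in candidate_actions:
        # Environmental model: predicted reward for (state, action, workload).
        reward = env_model.predict(state + action + workload)
        # Function approximator: Q value replacing the tabular lookup.
        q_value = q_approx.predict([reward] + state + action)
        if q_value > best_q:
            best, best_q = action, q_value
    return best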

5.6 Testbed Implementation

We developed a testbed platform in a virtualized HP ProLiant BL460C G6 blade server. It is equipped with

Intel Xeon E5530 2.4 GHz quad-core processor and 32 GB PC3 memory. Virtualization of the cluster is enabled by VMware vSphere 5.0. We created a resource pool from the virtualized server cluster to host multiple multi-tier applications. The resource pool is configured with 7500 MHz CPU and 1500 MB of memory and 600 GB disk space.

We implemented a three-tier server architecture with Apache 2.2.14, PHP 5.3.2 and MySQL 5.1 servers for the web, application and database tiers. Each tier server is hosted inside a dedicated VM. The guest operating system in each VM is Ubuntu Linux version 10.04. The vSphere module controls the CPU, memory and disk space allocated to the VMs. It provides an API to support remote management of VMs.

The initial resource configuration of all VMs is set to 256 MHz CPU, 64 MB memory and 5 GB disk. The granularity of resource provisioning is 16 MHz for CPU, 8 MB for memory and 1 GB for disk. We choose the granularity based on an empirical study.

Algorithm 4 Cascade neural network training algorithm.

  Clear sliding window cache.
  training-phase ← output-training.
  repeat
    if (cache is filled) then
      if (training-phase = output-training) then
        Train outputs for one epoch.
        if (output training has stagnated) then
          training-phase ← network-training.
        end if
      else if (training-phase = network-training) then
        Train whole network for one epoch.
        if (training has stagnated) then
          Train candidates and install a candidate in the network.
          training-phase ← output-training.
        end if
      end if
    end if
  until (network trained)

The enhanced reinforcement learning based resource allocation module interacts with the VM manager (VMM) through the vSphere Management API. It dynamically allocates

CPU, memory and disk space resources to each VM for provisioning proportional service differentiation and minimizing the gap between the target and measured response times of applications.

Like others [45, 63, 106], we use the open-source multi-tier benchmark application RUBiS. RUBiS implements the core functionality of an eBay-like auction site: browsing and bidding. RUBiS sessions have an average duration of 15 minutes and the average think time is 5 seconds. It defines two workload mixes: a browsing mix made up of only read-only interactions and a bidding mix that includes 15% read-write interactions. We instrument the RUBiS clients to generate workloads of different mixes as well as workloads of time-varying intensity and burstiness [79]. Each RUBiS client provides a sensor that measures the client-perceived average response time. The RUBiS application is implemented with Apache 2.2.14, PHP 5.3.2 and

MySQL 5.1 servers for the web, application and database tiers.

For the implementation of the integrated reinforcement learning and cascading neural network algorithms, we used the open-source library Fast Artificial Neural Network (FANN) [?]. We empirically determined the learning rate α, exploration rate ε and discount rate γ of reinforcement learning, and the cache size and comment interval of the cascade neural networks. Choosing appropriate learning parameter values is further discussed in Section 5.7.5.

5.7 Performance Evaluation

5.7.1 Effectiveness of VM Auto-configuration - Stationary Workloads

We first evaluate the effectiveness of the enhanced reinforcement learning based VM auto-configuration technique in achieving proportional differentiation between two hosted RUBiS applications while minimizing the average request response time of each application under stationary workloads. Application A has a workload of a browsing mix with 500 concurrent users. Application B has a workload of a bidding mix with 1000 concurrent users. We set the relative service differentiation target as δA/δB = 1 : 2. The response time targets for application A and application B are set to 1000 ms and 2000 ms, respectively. Performance monitoring and dynamic VM auto-configuration occur at 60-second intervals.


(a) Proportional service differentiation. (b) Average response time.

Figure 5.4: Relative service differentiation between two RUBiS applications (stationary workload).

In this experiment, we disable the admission control module so as to study the impact of the VM auto-configuration on performance.

Figure 5.4(a) shows the achieved service differentiation ratio between the two applications and Figure 5.4(b) shows the average request response time of each application. In the beginning, the achieved differentiation ratio and the average response times are higher than the desired targets. As the neural network structure grows, the system stabilizes with fast online learning. It improves the prediction accuracy of intermediate Q-values and the reward fed to the reinforcement learner. This enables the reinforcement learner to accurately determine the VM configurations. Just after four control intervals, the differentiation predictability and proportionality are effectively achieved. The average response times of the two applications are also very close to their targets. The limited fluctuations are due to the session-based workloads. Although the number of users remains the same, the number of requests arriving at the system is still dynamic.

5.7.2 Effectiveness of VM Auto-configuration - Dynamic Workloads

We next evaluate the performance of the enhanced reinforcement learning based VM auto-configuration technique under highly dynamic step-change workloads similar to those used in [58, 106]. Figure 5.5 shows that the number of concurrent users of the two applications changes every 10 minutes. Application A has a workload of a browsing mix and application B has a workload of a bidding mix. The differentiation target ratio is set as δA/δB = 1/3. The request response time targets of applications A and B are set to 1000 ms and 3000 ms, respectively.

Figure 5.5: A highly dynamic step-change workload.

(a) Proportional service differentiation. (b) Average response time.

Figure 5.6: Relative service differentiation between two RUBiS applications (dynamic workloads).

Performance monitoring and VM reconfiguration occur at 60-second intervals. In this experiment, the admission control module is disabled to study the impact of VM auto-configuration on performance.

Figure 5.6(a) shows the achieved service differentiation ratio between the two applications. Each time there is a sudden step change in the workload intensity, the achieved ratio deviates from the desired target. However, due to fast online reinforcement learning, the developed approach adapts to highly dynamic workload changes in an agile manner. The system stabilizes rapidly and achieves a differentiation ratio close to the target in just a few intervals. The request response times of the individual applications are shown in Figure 5.6(b).

A comparison of the system stability with stationary and dynamic workloads in Figure 5.4(b) and Figure 5.6(b) reveals counter-intuitive results.

The response time with stationary workloads appears more volatile than the response time with dynamic workloads. We explain this behavior as follows: with the dynamic workloads, there are fewer active users in any given interval than in the experiment with stationary workloads. In addition, the response time target for application B is higher (3000 ms) than the target in the stationary workload experiment (2000 ms). Once the differentiation target stabilizes after a workload change, the resource pool is sufficient to satisfy the differentiation ratio and to maintain steady application response times. In the stationary workload experiment, on the other hand, due to the larger number of active users and the lower response time target of application B, resources are shuffled between the two applications to maintain the stabilized differentiation target. This results in fluctuation of the response times of both applications.

(a) CPU usage. (b) Memory usage. (c) Disk space usage.

Figure 5.7: Resource allocation and utilization of the VM hosting the web tier of application A.

We further examine the provisioned resources of the VMs hosting the individual tiers of the two applications. Figures 5.7(a,b,c) show the allocation and utilization of CPU, memory and disk space of a single VM hosting the web tier of application A. It shows that the VM auto-configuration technique efficiently provisions resources to the VMs. All allocated resources are effectively utilized. The resource provisioning is done in a self-adaptive manner according to the dynamic workload variations. Similar results are observed for the VMs hosting the other tiers of application A and those of application B.

To demonstrate the merit of co-provisioning multiple resources in VMs, we modify the state and action sets of the VM auto-configuration reinforcement learner to consider reconfiguration of only a single resource, CPU. We repeat the experiment with the same workload traces. Figure 5.8(a) shows the achieved service differentiation ratio. Figure 5.8(b) shows the observed response time of the two applications. Compared to the results in Figure 5.6(b), it demonstrates that auto-configuration of only the CPU resource leads to longer average response times than auto-configuration of multiple resources.

(a) Service differentiation. (b) Average response time.

Figure 5.8: Relative service differentiation between two RUBiS applications (CPU-only allocation).

5.7.3 Effectiveness of Coordinated VM Auto-configuration and Admission Control

We next evaluate the coordinated combination of VM auto-configuration and session-based admission control. Using the same workload traces, monitoring interval and targets as the previous experiment, we compare the performance of the coordinated approach with that of the VM auto-configuration technique alone.

Figures 5.9(a) and 5.9(b) show the achieved service differentiation ratio between the two applications and the individual application request response times with the coordinated approach. A comparison with Figures 5.6(a) and 5.6(b) shows that the coordinated approach achieves slightly better application response times and that the differentiation ratio stabilizes to the target in fewer iterations.

Figures 5.10(a,b) compare the completed and aborted sessions of the two applications under the two approaches. The benefits of the coordinated approach become very apparent. While VM auto-configuration accepts all sessions, the coordinated approach selectively accepts about 85% of application A's sessions and 77% of application B's sessions. However, it completes about 9% more sessions for application A and 2% more sessions for application B. Compared with the VM auto-configuration only approach, the coordinated approach also reduces the number of aborted sessions drastically for both applications, from 30% to 6% for application A and from 56% to 31% for application B.

(a) Service differentiation. (b) Average response time.

Figure 5.9: Coordinated VM auto-configuration and admission control for service differentiation.

(a) Application A. (b) Application B.

Figure 5.10: Coordinated VM auto-configuration and admission control for performance improvement.

The coordinated approach can simultaneously meet differentiation targets, improve effective system throughput, and significantly reduce the resource wastage due to aborted sessions.

5.7.4 Impact of Bursty Workloads

We evaluate the robustness of the coordinated approach under bursty workloads. Figure 5.11(a) shows the bursty workload, which we generate for application B using the algorithm proposed in [79] with the index of dispersion set to 4000 and the maximum number of concurrent users set to 500.

Application A has a stationary workload of a browsing mix of 500 concurrent users. The differentiation target ratio is set as δA/δB = 1/2. The request response time targets for applications A and B are set to 1000 ms and 2000 ms, respectively. Performance monitoring and dynamic VM reconfiguration occur at 30-second intervals.

(a) B's bursty workload. (b) Service differentiation. (c) Average response time.

Figure 5.11: Coordinated VM auto-configuration and admission control for service differentiation (bursty workload).

(a) Effective throughput. (b) Response times. (c) Provisioning oscillations.

Figure 5.12: Comparison of VM auto-configuration and coordinated approaches for service differentiation, application response time and provisioning oscillations (bursty workload).

Figures 5.11(b,c) show the achieved differentiation ratio and average request response times of the two applications. Due to its fast online learning, the enhanced reinforcement learning approach is robust to the workload variations.

Further, we compare the effectiveness of the VM auto-configuration approach and the coordinated approach in adapting to bursty workloads. We choose application B for the case study. Figure 5.12(a) reveals a 6% improvement in completed sessions and a 41% reduction in aborted sessions with the coordinated approach. Figure 5.12(b) compares the average response time of application B under the two approaches. It shows that the coordinated approach achieves a better response time for the application, which again demonstrates the merit of coordinated admission control and VM auto-configuration.

We also compare the effectiveness of the VM auto-configuration and the coordinated approach in the context of resource provisioning oscillations. We consider a provisioning oscillation to be a succession of an increase in a VM resource, followed by a decrease of the resource, followed by another increase of the same resource in three consecutive control intervals. A provisioning oscillation also occurs when there is a succession of decrease, increase and decrease of the same VM resource in consecutive intervals. Frequent resource provisioning oscillations are undesirable and negatively affect performance due to processing overheads.
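As an illustration of this definition, the following sketch counts oscillations in a per-interval allocation trace for one resource of one VM; the function name and trace format are our own and not part of the testbed instrumentation.

```python
def count_oscillations(allocations):
    """Count up-down-up or down-up-down patterns across three consecutive
    control intervals in one resource's allocation trace."""
    deltas = [b - a for a, b in zip(allocations, allocations[1:])]
    count = 0
    for d1, d2, d3 in zip(deltas, deltas[1:], deltas[2:]):
        if (d1 > 0 > d2 and d3 > 0) or (d1 < 0 < d2 and d3 < 0):
            count += 1
    return count

# e.g., CPU shares of 1000, 1200, 1000, 1200 MHz over four intervals -> 1 oscillation
print(count_oscillations([1000, 1200, 1000, 1200]))
```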

We choose application B with a bursty workload for the case study. We observe the total number of provisioning oscillations for the CPU, memory and disk resources of the three VMs hosting the application.

The total number of provisioning oscillations observed with the VM auto-configuration and the coordinated approach are plotted in Figure 5.12(c). Results indicate that with the coordinated approach, resource provisioning oscillations for all three resources are either reduced or remain the same. In the coordinated approach, the admission controller reduces the bursty fluctuations faced by the application and thereby reduces the resource provisioning oscillations. Specifically, provisioning oscillations for CPU are reduced from 4 to 3, those for memory are reduced from 3 to 1, and the disk provisioning oscillations remain the same.

While the coordinated approach does not achieve a significant improvement in terms of reduced resource oscillations, its true benefit lies in achieving improved application throughput and reducing the resource wastage due to aborted sessions, while satisfying the differentiation targets and improving the application response time.

5.7.5 Sensitivity Analysis of the Learning Algorithms

The effectiveness of a statistical learning technique is heavily dependent on its learning parameter values. The VM auto-configuration and admission control modules use a combination of two learning techniques, reinforcement learning and cascade neural networks. For reinforcement learning, three important learning parameters are the learning rate α, the discount factor γ and the exploration rate ε.

The value of the learning rate α determines to what extent the agent continues to learn from new observations. While a low learning rate will cause the agent to learn almost nothing new, a high learning rate will result in the agent considering only the most recent information.

(a) Learning Rate. (b) Exploration Rate. (c) Discount Rate.

Figure 5.13: Effect of reinforcement learning parameter values.

The discount factor γ determines the importance of future rewards. A low discount factor results in an "opportunistic" agent that considers only current rewards, while a higher discount factor makes the agent strive for a long-term high reward. The exploration rate ε balances the exploitation and exploration of the agent. We use an ε-greedy policy, which allows the agent to explore random actions with a small probability ε and follow the best policy found for the rest of the time. Similarly, for a cascading neural network the two learning parameters, cache size and cache commit interval, strongly influence the network growth. A larger cache size results in more candidate neurons being added to the network due to a smaller change in mean square error from step to step. Similarly, a smaller cache commit interval means that the network is trained more often and candidate neurons are added at a higher rate.
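For concreteness, the following minimal sketch shows where these three parameters enter a tabular Q-learning update with an ε-greedy policy. It is only illustrative: the actual modules replace the Q-table with a cascade neural network approximator, and the reward, state and action encodings are simplified here (reward_fn and next_state_fn are hypothetical callables).

```python
import random

def q_learning_step(Q, state, actions, reward_fn, next_state_fn,
                    alpha=0.3, gamma=0.9, epsilon=0.1):
    """One Q-learning update; alpha, gamma and epsilon play the roles discussed above."""
    # epsilon-greedy: explore a random action with probability epsilon
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q.get((state, a), 0.0))

    reward = reward_fn(state, action)          # e.g., based on differentiation error
    next_state = next_state_fn(state, action)  # e.g., observed VM configuration/load

    # alpha weights how strongly new observations overwrite old estimates;
    # gamma controls how much future rewards contribute to the update target.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return next_state
```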

In this experiment, we empirically determine the learning parameter values to use with the VM auto-configuration and admission control modules. We consider a single hosted application with dynamic workloads as shown in Figure 5.5. We modify the VM auto-configuration module to minimize the gap between the observed and target response time of the application, and the admission control module to maximize the application throughput. The learning parameter values for the reinforcement learning modules (α, γ, ε) are in the range [0, 1]. For the cascading neural networks, the cache size is varied in the range [500, 3000] and the commit interval in the range [10, 100]. We experiment with different values of each learning parameter while the other learning parameters are fixed at a value in the middle of their corresponding ranges.

Figure 5.13(a) shows the application performance with different values of α, with fixed values of γ = 0.5, ε = 0.5, cache size = 1250 and commit interval = 45. It shows that the best application performance in terms of both request response time and throughput is achieved with α = 0.3. Similarly, Figure 5.13(b) shows that the lowest response time and highest throughput are achieved for the exploration rate ε = 0.1. However, the discount rates plotted in Figure 5.13(c) reveal that γ = 0.9 gives the lowest response time for the VM auto-configuration module, whereas γ = 0.8 gives the best throughput for the admission control module. Similarly, the best values for the cache size and commit interval parameters are observed in Figures 5.14(a) and 5.14(b), respectively.

(a) Cache Size. (b) Cache Commit Interval.

Figure 5.14: Effect of cascade neural network learning parameters.

(a) Service differentiation. (b) Average response time. (c) Agility of auto-configuration.

Figure 5.15: Agility of the enhanced reinforcement learning approach with cascading neural network.

5.7.6 Agility Analysis of the Learning Algorithms

To demonstrate the agility of the enhanced reinforcement learning approach, we consider reinforcement learning based VM auto-configuration with and without the enhancements. We consider two applications with the dynamic step-change workloads shown in Figure 5.5. The target differentiation ratio is set as δ1/δ2 = 1/3 and the control interval length is 60 seconds.

Figures 5.15(a) and 5.15(b) show the differentiation ratio and response times of the two applications with and without the cascading neural network enhancements. Without the enhancements, it takes longer for the service differentiation to stabilize. For a case study, we consider the 10th minute of the experiment, when both applications have increased workloads. For a fair comparison, in both cases the reinforcement learner executes multiple iterations, where each iteration corresponds to the update step of Q-learning. In basic reinforcement learning, each iteration updates the Q-value table, and in enhanced reinforcement learning the cascade neural network is updated.

We compare the number of iterations needed to achieve the differentiation target with the basic and the enhanced reinforcement learning approaches. Due to the sudden step-change workloads, the achieved differentiation ratio of the two applications fluctuates. Figure 5.15(c) shows that the enhanced reinforcement learning approach stabilizes and reaches the differentiation target in about 50 iterations, while the basic reinforcement learning approach takes about 80 iterations. Due to the neural network enhancements, the reward of the reinforcement learner is adaptive to workload changes and thus the differentiation target is achieved in many fewer iterations.

5.7.7 Scalability Analysis of the Learning Algorithms

Scalability is a critical challenge in adopting the reinforcement learning paradigm for real-time dynamic problems, due to the growth of the Q-table with increasing problem scale. For the VM auto-configuration problem, the number of Q-table entries is given by (M × N × |R| × 3)^2, where M is the total number of hosted applications, N is the number of tiers per application, |R| is the number of resources managed per VM, and 3 represents the increase, decrease and no-change actions for each VM resource. With basic reinforcement learning, as either the number of hosted applications or the number of resources managed per VM increases, the number of Q-table entries increases and the table needs to be completely populated. To achieve better scalability, we enhance the reinforcement learner by replacing the Q-table with a cascading neural network function approximator.
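To make the growth concrete, the short calculation below evaluates this Q-table size expression for N = 3 tiers and |R| = 3 resources per VM as the number of hosted applications increases (purely illustrative).

```python
def q_table_entries(M, N=3, R=3):
    # (M x N x |R| x 3)^2 entries: M applications, N tiers, |R| resources, 3 actions each
    return (M * N * R * 3) ** 2

for M in (2, 5, 10):
    print(M, q_table_entries(M))   # -> 2916, 18225, and 72900 entries
```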

To demonstrate the improved scalability of the enhanced reinforcement learning approach, we experiment with an increased number of hosted applications. We incrementally host additional multi-tier applications, resulting in 3, 4, ..., 10 hosted applications. The workload of each hosted application dynamically changes at 10-minute intervals, in the same step-change pattern as shown in Figure 5.5.

(a) Service differentiation. (b) Average response time. (c) Scalability.

Figure 5.16: Scalability of the enhanced reinforcement learning approach under dynamic workloads.

The target differentiation ratio between two successive applications is set to 1/2, leading to reference average request response times for the hosted applications of 1000, 2000, ..., 512000 ms, respectively.

Figures 5.16(a) and 5.16(b) show the achieved service differentiation and average response time for the first three applications under the enhanced reinforcement learning approach. For a case study, at a specific time (the 20th minute) we examine the approach with and without the cascading neural network enhancement. As Figure 5.16(c) shows, when the number of hosted applications is 2, 3, ..., 10, the basic reinforcement learning approach without the enhancement takes more iterations to achieve the desired performance than the enhanced reinforcement learning approach. Specifically, when the number of hosted applications increases to 10, the number of iterations required is reduced by 52%. Due to the neural network function approximator, the enhanced reinforcement learner determines the updated VM resource configurations necessary to stabilize the differentiation ratio. It does not need to completely populate the enlarged Q-table, as the basic reinforcement learning approach does. This demonstrates that the enhanced approach is more scalable.

5.7.8 Comparison with Statistical Regression Based Resource Provisioning

We next compare the reinforcement learning based VM auto-configuration with the statistical regression based dynamic resource management strategy proposed in Chapter 4. For a fair comparison, the two approaches dynamically configure the same resources and consider the same performance metric. We choose request response time as the performance metric and enhance the regression-based approach from Chapter 4 to dynamically configure the CPU, memory and disk size of three dedicated tier-specific virtual machines.

5.7.8.1 Statistical Regression based VM auto-configuration

The statistical regression based VM auto-configuration strategy employs a combination of offline learning and online monitoring. We first introduce a few important terms and then discuss the learning and monitoring phases in detail.

Allocated Resources represent the total resources allocated to a hosted application. That is, allocated CPU is the total CPU of the three VMs hosting the web, application and database tiers of the hosted application. Similarly, allocated memory and allocated disk size represent the total memory and disk size of the three VMs.

Resource Utilization represents the utilization of the resources allocated to the hosted application. It is defined as the geometric mean of the CPU, memory and disk utilizations of the three VMs.

Tier Request Response Ratio

Let r be the total response time of a request and r_i be the response time of the request at tier i. The normalized tier response time at tier i is calculated as

rr_i = r_i / r.    (5.16)

We define the tier request response ratio of a three-tier application as the ratio of the normalized tier request response times at the individual tiers, that is,

rr_1 : rr_2 : rr_3.    (5.17)
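A small sketch of these two definitions follows; the example per-tier times are chosen only so that they reproduce the ratio reported later in Table 5.2.

```python
def tier_response_ratio(tier_times):
    """Normalized tier response times rr_i = r_i / r (Eq. 5.16); their sequence
    forms the tier request response ratio rr_1 : rr_2 : rr_3 (Eq. 5.17)."""
    r = sum(tier_times)                    # total request response time
    return [r_i / r for r_i in tier_times]

# e.g., per-tier times of 50 ms, 32 ms and 118 ms give a ratio of 0.25 : 0.16 : 0.59
print(tier_response_ratio([50, 32, 118]))
```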

A Behavior Model represents the learned behavior of a hosted application when subjected to a specific workload. Two important application parameter relations are "allocated resources - request response time" and "allocated resources - resource utilization". A behavior model captures the two parameter relations as quantitative statistical regression models. Along with the regression models, the correlation coefficients that quantify the quality of the regression models are also included in the behavior model.

Table 5.1: Training workload characteristics.

RUBiS Session Mix   Number of concurrent users   Allocated CPU (MHz)   Allocated Memory (MB)   Allocated Disk (GB)
Bidding             50, 100, ..., 1000           256, 272, ..., 1024   64, 72, ..., 512        5, 6, ..., 50
Selling             50, 100, ..., 1000           256, 272, ..., 1024   64, 72, ..., 512        5, 6, ..., 50

The workload characteristics are integral to a behavior model and impact the quantitative values of the statistical regression models. Different workloads result in different quantitative representations of the "allocated resources - request response time" and "allocated resources - resource utilization" relations. A behavior model also captures the tier request response time ratio, which indicates weighted proportional resource demands on individual tiers.

Offline Training

During the training phase, an extensive set of behavior models is derived using diverse RUBiS workloads. The workload characteristics and the resource variations used during the training phase are summarized in Table 5.1. Each possible combination of session mix, number of concurrent users and allocated resources results in a unique behavior model. A behavior model represents the hosted application behavior when subjected to a specific workload. We use statistical regression analysis to derive the "allocated resources - request response time" and "allocated resources - resource utilization" regression models.

To demonstrate the statistical regression analysis performed to model these relations, in the following we consider a workload consisting of RUBiS bidding sessions with 50 concurrent users. The workload is applied to the hosted application multiple times as the CPU, memory and disk size of the three host VMs are varied according to Table 5.1.

The "allocated resources - request response time" relation is captured as a multi-variate multi-variable regression model

y_1 = a_1 x_1^{b_1} x_2^{b_2} x_3^{b_3}    (5.18)

and the "allocated resources - resource utilization" relation is captured by

y_2 = a_2 x_1^{b_4} x_2^{b_5} x_3^{b_6}    (5.19)

where y_1 is the request response time, y_2 is the resource utilization, and x_1, x_2, x_3 represent the allocated CPU, memory and disk size of the hosted application.

Determining the constants a_1, a_2, b_1, ..., b_6 involves linearizing Eqns. (5.18, 5.19) and using the least squares method to minimize the sum of squared errors. Applying these two techniques to Eqn. (5.18) results in Eqns. (5.20-5.23), which are further solved for the constants using Gauss elimination.

Σ y_1i = n a_1 + b_1 Σ x_1i + b_2 Σ x_2i + b_3 Σ x_3i    (5.20)

Σ y_1i x_1i = a_1 Σ x_1i + b_1 Σ x_1i^2 + b_2 Σ x_2i x_1i + b_3 Σ x_3i x_1i    (5.21)

Σ y_1i x_2i = a_1 Σ x_2i + b_1 Σ x_1i x_2i + b_2 Σ x_2i^2 + b_3 Σ x_3i x_2i    (5.22)

Σ y_1i x_3i = a_1 Σ x_3i + b_1 Σ x_1i x_3i + b_2 Σ x_2i x_3i + b_3 Σ x_3i^2    (5.23)

The multiple regression correlation coefficient, R, is given by

R = sqrt( 1 − (SSE / (n − p − 1)) / (SST / (n − 1)) )    (5.24)

where n is the total number of observations and p is the number of independent variables.

SSE, the sum of squares due to error, is given by

SSE = Σ (y_1i − y'_1i)^2    (5.25)

where y'_1i is the estimated value and y_1i is the observed value.

SST, the total sum of squares, is

SST = Σ (y_1i − ȳ_1)^2    (5.26)

where ȳ_1 is the mean of the observed y_1i values.

Table 5.2: A behavior model.

Session arrival rate                                          50 sessions/sec
Session type                                                  RUBiS bidding
"resources - request response time" regression model         y_1 = 14.3 x_1^2.3 x_2^5.5 x_3^1.2
"resources - request response time" correlation coefficient  0.9538
"resources - resource utilization" regression model          y_2 = 7.2 x_1^8.5 x_2^4 x_3^3.6
"resources - resource utilization" correlation coefficient   0.8958
Tier request response time ratio                              0.25 : 0.16 : 0.59

Constants a_2, b_4, b_5 and b_6 are derived from Eq. (5.19) using the same procedure. A behavior model resulting from training with this specific workload is given in Table 5.2.

Online Monitoring

The VM auto-configuration process is divided into a sequence of intervals. In each interval, the average request response time and resource utilization are measured and compared to predefined thresholds. When a threshold violation is observed, the resource requirements of the hosted application are predicted using a single learned behavior model. The workload characteristics observed in the interval determine the behavior model used for prediction. A behavior model with the same session mix as the dominant observed session mix and with the number of concurrent users closest to the observed number of concurrent users is selected. If more than one behavior model meets the criteria, the model with the highest correlation coefficients is selected. The selected behavior model represents the response time and resource utilization behaviors as regression models.

We develop a threshold-based policy that uses a request response time threshold and a resource utilization threshold for efficient resource allocation. The request response time threshold is set below the request response time target; a threshold violation indicates a possible risk of violating the request response time guarantee. The "allocated resources - request response time" multi-variate multi-variable regression model of the selected behavior model is used to predict the additional resources required to keep the request response time under the threshold in subsequent intervals. From the multi-variable regression model, a new value for a single resource is predicted by holding the other two resources constant. Once new values for the three resources are determined, they are allocated to the individual tiers in proportion to the tier request response time ratio of the behavior model.

When a resource utilization threshold violation occurs, the "allocated resources - resource utilization" regression model of the behavior model is used to predict the new resource configurations. Resources are removed from the individual tiers in inverse proportion to the tier request response time ratio of the behavior model. Fewer resources will therefore be removed from a tier with relatively higher resource demand.
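The following sketch summarizes one control interval of this policy. The data structures and the behavior-model fields (a1/b1 for the response-time regression, a2/b2 for the utilization regression, tier_ratio for Eq. 5.17) are hypothetical simplifications of the description above; the behavior-model selection step is assumed to have already happened, and a utilization violation is assumed to mean utilization falling below the threshold, since that is the case in which resources are removed.

```python
TIERS = ("web", "app", "db")
RES = ("cpu", "mem", "disk")

def invert_power_law(a, b, totals, res, target):
    # Solve target = a * prod_j x_j^b_j for the single resource `res`,
    # holding the other two resource totals constant.
    others = 1.0
    for j, r in enumerate(RES):
        if r != res:
            others *= totals[r] ** b[j]
    i = RES.index(res)
    return (target / (a * others)) ** (1.0 / b[i])

def control_interval(model, obs, rt_thresh, util_thresh, alloc):
    """alloc: {tier: {resource: amount}}; obs: {"response_time": ..., "utilization": ...}."""
    totals = {r: sum(alloc[t][r] for t in TIERS) for r in RES}

    if obs["response_time"] > rt_thresh:
        # Predict per-resource totals that bring the response time back under the
        # threshold, then add the extra resources to tiers in proportion to the
        # tier request response time ratio.
        new = {r: invert_power_law(model["a1"], model["b1"], totals, r, rt_thresh)
               for r in RES}
        for t, w in zip(TIERS, model["tier_ratio"]):
            for r in RES:
                alloc[t][r] += w * max(0.0, new[r] - totals[r])
    elif obs["utilization"] < util_thresh:
        # Under-utilization: predict reduced totals from the utilization regression
        # and remove resources in inverse proportion to the tier ratio, so tiers
        # with relatively higher demand lose fewer resources.
        new = {r: invert_power_law(model["a2"], model["b2"], totals, r, util_thresh)
               for r in RES}
        inv = [1.0 / w for w in model["tier_ratio"]]
        s = sum(inv)
        inv = [v / s for v in inv]
        for t, w in zip(TIERS, inv):
            for r in RES:
                alloc[t][r] -= w * max(0.0, totals[r] - new[r])
    return alloc
```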

5.7.8.2 Comparison of regression and reinforcement learning based VM auto-configuration approaches.

Next, we compare the two VM auto-configuration approaches in terms of request response times and provisioning oscillations. The application chosen for the case study faces the bursty workload shown in Figure 5.11(a). Figure 5.17(a) shows the request response times achieved with the two approaches. It reveals that with the regression-based approach, slightly higher request response times are observed than with the reinforcement learning based VM auto-configuration. Similarly, the resource provisioning oscillations plotted in Figure 5.17(b) are also higher.

Results reveal that the enhanced reinforcement learning based VM auto-configuration achieves comparable, and even slightly better, performance than the regression-based approach. The true advantage of enhanced reinforcement learning is the ability to achieve improved performance in dynamic environments without needing extensive offline training. Furthermore, with the regression-based approach, when the number of hosted applications or the number of managed resources increases, a different set of regression models must be learned and new behavior models must be persisted. The enhanced reinforcement learning technique, however, is capable of handling an increased number of hosted applications and managed resources without any additional changes.

(a) Request Response Time. (b) Resource Provisioning Oscillations.

Figure 5.17: Comparison of regression and reinforcement learning based VM auto-configuration.

5.8 Summary and Discussion

Differentiated treatment of multiple multi-tier Internet applications hosted in a shared virtualized platform is an important but challenging problem. In this chapter, we developed a novel and practical reinforcement learning approach that uses a coordinated combination of VM auto-configuration and session-based admission control for provisioning service differentiation in multi-tier Internet services. The approach is enhanced with cascade neural networks. The new approach simultaneously provides proportional service differentiation between multi-tier applications, minimizes the gap between the target and the achieved average response time, and improves the relative session throughput of each application. Experimental results demonstrate the effectiveness, agility and scalability of the developed approach in the face of highly dynamic and bursty workloads.

We further provide a comparison between supervised and unsupervised statistical learning for adaptive VM auto-configuration. Results show that enhanced reinforcement learning achieves performance comparable to that of supervised statistical regression. The true advantage of enhanced reinforcement learning is the ability to manage hosted services in dynamic environments without needing extensive offline training to capture the complex behavior of multi-tier Internet services facing dynamic and bursty workloads.

Chapter 6

Multi-tier Internet Service Management and Monitoring Console

E-commerce is the norm for modern-day businesses, allowing them to reap the benefits of the ubiquitous nature of the Internet. Hosting a multi-tier Internet service in a data center allows a business to maintain an online presence without incurring heavy IT infrastructure and maintenance costs. Business continuity along with high performance are crucial goals for businesses and, in turn, for the host data center.

Data centers are rapidly adopting virtualization, which offers several advantages such as resource consolidation, high resource utilization and performance isolation. However, it leads to a highly dynamic hosting environment layered on a static physical environment, increasing data center complexity. The supporting data center infrastructure can be very complex, involving a wide range of servers, networks, databases, operating systems, and third-party web services. Virtualization thus introduces new challenges to both physical infrastructure management and hosted service management.

Meeting availability and performance SLAs is crucial for businesses, which requires that data center resources remain healthy and are effectively managed. Unexpected failures and downtime have a severe negative impact on businesses. A USA Today survey of 200 data center managers found that over 80% of these managers reported that their downtime costs exceeded $50,000 per hour. For over 25%, downtime cost exceeded $500,000 per hour.


Real-time performance monitoring and management of both the data center resources and the deployed services is at the heart of ensuring that the delivered results justify the investment. A monitoring and management tool is thus a critical component in the adoption of virtualized data centers.

An exhaustive monitoring and management console that conducts real-time monitoring of all data center infrastructure components is a huge undertaking that stretches beyond the scope of a single research task of a PhD thesis. Instead, we focus on developing a "Multi-tier Internet Service Management and Monitoring Console (MISMC)" that closely aligns with our other research tasks. While most commercial management tools focus on data center infrastructure, we include management and monitoring of the applications deployed in the data center as well.

MISMC focuses on management and monitoring of the hosted multi-tier Internet applications and the host virtual machines by a data center administrator. The main goal of MISMC is to facilitate the data center administrator in configuring and fine-tuning the statistical learning based admission control, server provisioning and service differentiation strategies applied to the hosted multi-tier applications. It also allows the administrator to fine-tune the SLA parameters of individual applications. Extensive monitoring of both the hosted applications and the hosting virtual machines is also a feature. An administrator can choose to be notified about several topics of interest, through the console and/or email. An administrator can further configure and receive periodic reports about the application and virtual machine metrics.

We next discuss the MISMC features in detail and provide screenshots of different views of the management, monitoring and reporting features.

6.1 MISMC Features

MISMC is a desktop tool consisting of five major views.

6.1.1 Dashboard

Figure 6.1 shows the dashboard view, which provides the data center administrator with an overview of the deployed multi-tier applications and the host virtual machines. For each deployed application, the dashboard displays the average effective session throughput, average request response time and average resource utilization.

Figure 6.1: Dashboard

Similarly, for each virtual machine, along with the application running on the virtual machine, the average CPU, memory and disk utilization metrics are displayed. The dashboard also informs the administrator about the number of active and inactive virtual machines in the data center. It further indicates the number of notifications generated for the administrator. Additionally, the dashboard provides easy access to more details of each application and the host virtual machine.

6.1.2 Applications

The applications view provides an administrator a way to manage and monitor the individual applications deployed in the data center. The administrator can perform scheduled outages by stopping a deployed application, performing necessary maintenance and re-starting the application. Application configuration involves setting the different threshold values that affect the performance of the statistical learning based strategies for session admission control, resource provisioning and service differentiation. Additionally, the administrator can fine-tune the SLA metrics for the request response times and differentiation ratio. The application monitoring feature provides an administrator with a real-time view of utilization and performance metrics. Utilization metrics include the CPU, memory and disk utilizations of the application, and the performance metrics include request response time, session throughput, aborted sessions and rejected sessions. In addition to real-time monitoring, the utilization and performance trends can be viewed over a period of several minutes, hours, days or weeks.

6.1.3 Virtual Machines

Similar to the deployed applications, the host virtual machines can be managed and monitored by the data center administrator. The virtual machine management and monitoring views are shown in Figure 6.4 and Figure 6.5, respectively. Each host virtual machine can be started and stopped through the management view. The administrator can perform scheduled outages by stopping the virtual machines, performing necessary maintenance and re-starting the VMs. Monitoring of the virtual machines is done via a real-time view of CPU, memory and disk utilizations. In addition to real-time monitoring, the utilization and performance trends can be viewed over a period of several minutes, hours, days or weeks.

Figure 6.2: Application management

Figure 6.3: Application monitoring

Figure 6.4: Virtual machine management

Figure 6.5: Virtual machine monitoring

6.1.4 Notifications

The notification view provides an administrator the ability to both configure the notifications to receive and view the notifications received.

Notification configuration as depicted in Figure 6.6 involves selecting an event of interest, assigning the event a severity (Low, Medium, High) and specifying the notification delivery mechanism (Console/Email).

For instance, an administrator can choose to trigger a ’Medium’ severity notification when a virtual machine’s memory utilization is greater than 75% and a ’High’ severity notification when a hosted service’s request response time SLA is violated. For each of the notifications triggered, the administrator can further choose to either view the notification in the console or receive an email. For instance, the ’Low’ severity notifications may be only viewed in the console and for all ’High’ severity notifications an email is sent to email addresses provided by the administrator.
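As a purely hypothetical illustration of such a rule set (the field names are ours, not MISMC's actual configuration schema), the two examples above might be expressed as:

```python
# Hypothetical notification rules mirroring the two examples above.
notification_rules = [
    {"event": "vm_memory_utilization_above", "threshold": 0.75,
     "severity": "Medium", "deliver": ["console"]},
    {"event": "request_response_time_sla_violation",
     "severity": "High", "deliver": ["console", "email"],
     "email_to": ["admin@example.com"]},
]
```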

To see the notifications delivered in the console, the administrator navigates to the Notifications "View" as shown in Figure 6.7. The page lists all the notifications available for the administrator. After reviewing the notifications, the administrator can choose to delete some or all of them.

6.1.5 Reports

The reports view provides an administrator with the ability to both configure the reports to be generated and view the reports generated.

Report configuration, as shown in Figure 6.8, involves selecting the report period and choosing the metrics to be included in the report.

Figure 6.6: Notification configuration.

Figure 6.7: Notification view.

Reports can be generated daily, weekly or monthly. Metrics that can be included in the report are the utilization and performance metrics for each hosted application and the virtual machine utilization metrics. Periodic reports are automatically generated by the console based on the report configuration. An administrator can choose to view the reports within the management console or receive them via email.

Reports visible in the console are categorized into daily, weekly and monthly reports. The administrator can choose a report to view, which opens the report in PDF format. Once viewed, the report can be deleted as well.

6.2 Future Enhancements

MISMC is a simplistic management and monitoring console that closely aligns with the overall goals of managing multi-tier Internet services. It is, however, not intended to be a full-fledged monitoring or management solution. We foresee the following enhancements to take it a step closer to being a comprehensive product.

• The console is currently a desktop tool. A web-based monitoring solution that can be accessed from different locations and different browsers is desirable.

• Currently the monitoring is limited to the hosted applications and hosting virtual machines. There are several other infrastructure components, such as the network, storage devices and so on, that can benefit from real-time management and monitoring.

Figure 6.8: Report configuration and view

• Automatic discovery of all the involved components and hosted applications is another possible extension.

• Discovery, management and monitoring of assets across multiple locations of the data center is another desirable extension.

Chapter 7

Conclusion

7.1 Contributions and Accomplishments

Multi-tier Internet services are ubiquitous in modern society. With their growing popularity comes the need for improving performance, satisfying QoS guarantees and meeting customer expectations. Evolving technological trends such as virtualization and cloud platforms facilitate economical options for multi-tier Internet service hosting. In this thesis, we undertake the challenge of managing hosted multi-tier Internet services with the goal of building high-performing, balanced, scalable and agile services. We concentrate on session-based admission control and adaptive resource management for performance improvement of individual services and service differentiation among multiple services hosted in virtualized hosting platforms.

Multi-tier Internet service management is a non-trivial challenge. The tiered architecture results in complex inter-tier dependencies and dynamic bottleneck shift. Additional complications are posed by the virtualization techniques employed by the hosting platforms. These challenges are further magnified by highly dynamic and bursty workloads. In a shared virtualized hosting platform, the user-perceived service quality is the result of complex interactions between dynamic workloads and a complex underlying server system.

Statistical learning techniques gain knowledge of complex environments through dynamic observation without requiring a priori application-specific knowledge. Statistical learning based management approaches can thus address the challenges posed by tiered architectures, complex hosting environments and bursty workloads through a combination of learning and adaptation.

In this dissertation, we examine the applicability of different types of statistical learning, supervised and unsupervised, for managing multi-tier Internet services. While supervised learning requires two distinct phases for training and prediction, unsupervised techniques can operate independently of training data.

The main roadblock in adopting supervised learning techniques for managing real-time dynamic systems is the long training period required to build simulation models. Reinforcement learning, a form of unsupervised learning, uniquely learns by interacting with dynamic environments without requiring explicit offline models.

When a hosted Internet service is operating at or close to its maximum capacity, it is necessary to police the incoming user sessions in order to protect the service from overload and avoid sessions being aborted in the middle of their transactions. We studied session-based admission control in a hosted multi-tier Internet service with the goal of improving the session throughput of the service. We designed a coordinated session-based admission control (CoSAC) strategy that models a multi-tier Internet service as a Bayesian network. It considers multiple tier states to arrive at a coordinated decision to either accept or reject a new incoming session. Evaluation using an e-commerce simulator and industry-standard TPC-W benchmark workloads shows that CoSAC can improve the effective session throughput by 50% compared with admission control decisions that treat the multi-tier service as a black box.

A virtualized hosting platform adaptively provisions shared resources to the hosted multi-tier Internet services for QoS guarantees. We design a statistical regression based provisioning strategy that dynamically allocates and removes virtual machines for a multi-tier Internet service to satisfy session-oriented QoS performance guarantees. We propose a novel performance metric, session slowdown, which is the relative ratio of the total queueing delay of the requests of a session to the total processing time of those requests. We use two distinct statistical regression models to capture the service behavior with respect to session slowdown and resource utilization. Simulation results using the TPC-W benchmark show that the regression-based approach adaptively and efficiently provisions resources to the appropriate individual tiers for session slowdown guarantees of the multi-tier service, taking the dynamic resource demand and resource utilization into account.

While the first two studies focused on a single hosted service, we next explored performance improvement of and differentiation among multiple hosted multi-tier services. In this study, we considered a coordinated combination of admission control and adaptive resource management for providing relative service differentiation in virtualized shared platforms. We explored dynamic provisioning at a finer granularity, that is, on-demand provisioning of dedicated virtual machines with CPU, memory and disk size.

We developed a novel and practical reinforcement learning approach that uses a coordinated combination of VM auto-configuration and session-based admission control for provisioning service differentiation in multi-tier Internet services. We enhance the reinforcement learning models with cascading neural networks to integrate the strengths of the model-independence of reinforcement learning and the self-learning and self-construction of neural networks for system scalability and agility. The new approach simultaneously provides proportional service differentiation between multi-tier applications, minimizes the gap between the target and the achieved average response time, and improves the relative session throughput of each application. Implementation in a RUBiS based testbed demonstrated the effectiveness, agility and scalability of the developed approach in the face of highly dynamic and bursty workloads.

Finally, we developed an Internet service management and monitoring console that is intended to be used by a data center administrator. The desktop-based console focuses on management and monitoring of the hosted multi-tier Internet applications and the host virtual machines. It facilitates the data center administrator in configuring and fine-tuning the statistical learning based admission control, server provisioning and service differentiation strategies applied to the hosted multi-tier applications. It also allows the administrator to fine-tune the SLA parameters of individual applications. Extensive monitoring of both the hosted applications and the hosting virtual machines is also a feature. An administrator can choose to be notified about several topics of interest, through the console and/or email. An administrator can further configure and receive periodic reports about the application and virtual machine metrics.

Our work has been peer-reviewed and accepted by the research community. We have multiple publications in major IEEE conferences and top-ranked journals. A list of publications is provided in Chapter 8.

7.2 Future Work

Data centers and the underlying paradigms of virtualization and cloud computing are active research topics, receiving increasing attention from the academic research community and corporations alike. While this dissertation explored a subset of the challenges in this area, there remain several avenues along which this research can be extended.

A possible extension is to consider the effect of heterogeneous workloads on the performance of the statistical learning strategies for multi-tier Internet service management. While we conducted extensive performance evaluation using a combination of simulation and implementation, the workloads used during the evaluation are homogeneous. For evaluating the independent Bayesian network based admission control and regression based dynamic server provisioning strategies, we used session-based workloads generated in accordance with the TPC-W benchmark specification. For the evaluation of the reinforcement learning based coordinated combination of admission control and adaptive VM resource management, we used multiple RUBiS applications hosted in a virtualized testbed. It would be interesting to evaluate the management strategies with different benchmarks, such as RUBiS, TPC-W and TPC-C, to study the effect of heterogeneous workloads on their performance.

Our regression based dynamic server provisioning assumes that the virtual machines being dynamically provisioned are homogeneous and that their performance is also homogeneous. However, real cloud systems such as Amazon's EC2 offer multiple types of virtual machines, each type with a different configuration. The performance of different types of virtual machines is obviously heterogeneous, and it has also been shown that the performance of virtual machines belonging to the same type is not identical. The concept of workload affinity suggests that different instances of the same type of virtual machine may be more suitable for processing different types of workloads. Evaluating the performance of our dynamic provisioning strategies with heterogeneous resources would be a worthwhile exercise.

Furthermore, our adaptive resource management techniques consider only provisioning virtual machines and the resources (CPU, memory and disk) of the virtual machines. In addition to these resources, the network plays a crucial role in a data center. Experimenting with different statistical learning techniques to understand the effect of network bandwidth as a shared resource of multiple hosted services is a relevant extension to our work. Strategies for effective network sharing among multiple hosted services that result in better performance isolation will improve the adoption of virtualized public cloud platforms.

While our research focused on a single-location data center, exploring and evaluating the statistical learning approaches in distributed data centers is yet another possible extension to our work. Service-oriented applications are commonly deployed in different data centers for fault tolerance and to deliver good quality of service to users in different locations around the globe. Distributed data centers facilitate improved scalability and survivability. They require both good scheduling policies and adaptive resource allocation so that SLAs can be honored across geographically distributed servers and user-perceived latency is reduced.

Finally, the MISMC discussed in Chapter 6 is a simplistic management and monitoring tool that can benefit from several enhancements such as automatic discovery of assets, monitoring of all infrastructure components and enabling web-based access.

Publications

Conference proceedings

1. CoSAC: Coordinated Session-based Admission Control for Multi-tier Internet Applications, Sireesha Muppala and Xiaobo Zhou, Proc. IEEE Intl Conf. on Computer Communications and Networks (ICCCN), 2009, 6 pages, acceptance rate 30%.

2. Regression Based Multi-tier Resource Provisioning for Session Slowdown Guarantees, Sireesha Muppala, Xiaobo Zhou and Liqiang Zhang, Proc. IEEE Intl Conf. on Performance, Computing, and Communications (IPCCC), 2010, 8 pages, acceptance rate 29%.

3. Multi-tier Service Differentiation: Coordinated Resource Provisioning and Admission Control, Sireesha Muppala, Xiaobo Zhou, and Guihai Chen, Proc. of the 18th IEEE International Conference on Parallel and Distributed Systems (ICPADS), Singapore, December 2012, 8 pages, acceptance rate 29%.

Journal articles

4. Coordinated Session-based Admission Control with Statistical Learning for Multi-tier Internet Applications, Sireesha Muppala and Xiaobo Zhou, Journal of Network and Computer Applications, Elsevier, 2011, 34(1): 20-29.

5. Regression Based Resource Provisioning for Session Slowdown Guarantees in Multi-tier Internet Servers, Sireesha Muppala, Xiaobo Zhou, Liqiang Zhang, and Guihai Chen, Journal of Parallel and Distributed Computing, Elsevier, 2012, Vol. 72(3): 362-375.

Submitted (Under peer review)

1. Multi-tier Service Differentiation by Coordinated Learning-based Resource Provisioning and Admission Control, Sireesha Muppala, Guihai Chen, Xiaobo Zhou. Submitted to Journal of Parallel and Distributed Computing, Elsevier.

Bibliography

[1] Amazon auto scaling. http://aws.amazon.com/autoscaling.

[2] Z. Abbasi, T. Mukherjee, G. Varsamopoulos, and S. K. S. Gupta. Dahm: A green and dynamic web application hosting manager across geographically distributed data centers. J. Emerg. Technol. Comput. Syst., 8(4), 2012.

[3] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance guarantees for Web server end-systems: a control-theoretical approach. IEEE Trans. on Parallel and Distributed Systems, 13(1):80–96, 2002.

[4] J. Almeida, V. Almeida, D. Ardagna, Í. Cunha, C. Francalanci, and M. Trubian. Joint admission control and resource allocation in virtualized servers. Journal of Parallel and Distributed Computing, 70(4):344–362, 2010.

[5] J. Almeida, V. Almeida, D. Ardagna, C. Francalanci, and M. Trubian. Resource management in the autonomic service-oriented architecture. In Proc. IEEE Int'l Conference on Autonomic Computing (ICAC), 2006.

[6] D. Ardagna, C. Ghezzi, B. Panicucci, and M. Trubian. Service Provisioning on the Cloud: Distributed Algorithms for Joint Capacity Allocation and Admission Control. Springer Berlin / Heidelberg, 2010.

[7] J. Arnaud and S. Bouchenak. Adaptive internet services through performance and availability control. In Proc. of ACM Symposium on Applied Computing, pages 444–451, 2010.

[8] M. Aron, P. Druschel, and W. Zwaenepoel. Cluster reserves: a mechanism for resource management in cluster-based network servers. In Proc. ACM SIGMETRICS, pages 90–101, 2000.

[9] C. G. Atkeson and J. C. Santamaria. A comparison of direct and model-based reinforcement learning. In Proc. Intl. Conference on Robotics and Automation, pages 3557–3564, 1997.

[10] M. N. Bennani and D. A. Menasce. Resource allocation for autonomic data centers using analytic performance models. In Proc. IEEE Int'l Conference on Autonomic Computing (ICAC), 2005.


[11] J. L. Berral, I. Goiri, R. Nou, F. Julià, J. Guitart, R. Gavaldà, and J. Torres. Towards energy-aware scheduling in

data centers using machine learning. In Proc. of the 1st International Conference on Energy-Efficient Computing

and Networking, 2010.

[12] N. Bhatti and R. Friedrich. Web server support for tiered services. IEEE Network, 13(5):64–71, 1999.

[13] S. Bhulai, S. Sivasubramanian, R. Van Der Mei, and M. Van Steen. Modeling and predicting end-to-end response

times in multi-tier internet applications. In Proc. of the Int’l Teletraffic Conference (ITC), pages 519–532, 2007.

[14] S. Bleikertz, M. Schunter, C. W. Probst, D. Pendarakis, and K. Eriksson. Security audits of multi-tier virtual infras-

tructures in public infrastructure clouds. In Proc. of the ACM workshop on Cloud computing security workshop,

pages 93–102, 2010.

[15] N. Bobroff, A. Kochut, and K. Beaty. Dynamic placement of virtual machines for managing sla violations. In

IEEE International Symposium on Integrated Network Management, pages 119–128, 2007.

[16] N. Bonvin, T. G. Papaioannou, and K. Aberer. Cost-efficient and differentiated data availability guarantees in data

clouds. In Proc. of IEEE Int’l Conference on Data Engineering (ICDE), pages 980–983, 2010.

[17] J. P. Boyer, R. Hasan, L. E. Olson, N. Borisov, C. A. Gunter, and D. Raila. Improving multi-tier security using

redundant authentication. In Proc. of ACM workshop on Computer security architecture, pages 54–62, 2007.

[18] X. Bu, J. Rao, and C. Xu. A model-free learning approach for coordinated configuration of virtual machines and

appliances. In Proc. IEEE Int’l Symposium on Modelling, Analysis, and Simulation of Computer and Telecommu-

nication Systems(MASCOTS), 2011.

[19] X. Bu, J. Rao, and C.-Z. Xu. A reinforcement learning approach to online web system auto-configuration. In Proc.

IEEE Int’l Conference on Distributed Computing Systems (ICDCS), 2009.

[20] N. Buchbinder, N. Jain, and I. Menache. Online job-migration for reducing the electricity bill in the cloud. In

Proc. of Intl IFIP conference on Networking, pages 172–185, 2011.

[21] A. Caniff, L. Lu, N. Mi, L. Cherkasova, and E. Smirni. Fastrack for taming burstiness and saving power in

multi-tiered systems. In Proc. of the Int’l Teletraffic Congress (ITC), 2010.

[22] J. Carlstrom and R. Rom. Application aware admission control and scheduling in web servers. In Proc. IEEE Int’l

Conference on Computer Communications (INFOCOM), 2002.

[23] E. Cecchet, R. Singh, U. Sharma, and P. Shenoy. Dolly: virtualization-driven database provisioning for the cloud.

In Proc. ACM SIGPLAN/SIGOPS Int'l Conf. on Virtual execution environments, pages 51–62, 2011.

[24] S. Chaitanya, B. Urgaonkar, and A. Sivasubramaniam. QDSL: a queuing model for systems with differential service

levels. ACM SIGMETRICS Performance Evaluation Review, 36(1):289–300, 2008.

[25] H. Chen and P. Mohapatra. Session-based overload control in QoS-aware Web servers. In Proc. IEEE INFOCOM,

pages 516–524, 2002.

[26] J. Chen, G. Soundararajan, and C. Amza. Autonomic provisioning of backend databases in dynamic content Web

servers. In Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), 2006.

[27] M. Chen, H. Zhang, Y. Su, X. Wang, G. Jiang, and K. Yoshihira. Effective vm sizing in virtualized data centers.

In Integrated Network Management, pages 594–601, 2011.

[28] Y. Chen, S. Iyer, X. Liu, D. Milojicic, and A. Sahai. Sla decomposition: Translating service level objectives to

system level thresholds. In Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), 2007.

[29] L. Cherkasova and P. Phaal. Session-based admission control: A mechanism for peak load management of com-

mercial web sites. IEEE Trans. on Computers, 51(6):669–685, 2002.

[30] M. B. Chhetri, Q. V. Bao, and R. Kowalczyk. A flexible policy framework for the QoS differentiation provisioning

of services. In Proc. IEEE/ACM Int’l Symposium on Clusters, Cloud and Grid Computing (CCGrid), 2011.

[31] N. M. M. K. Chowdhury and R. Boutaba. Network virtualization: state of the art and research challenges.

IEEE Communications, 47(7):20–26, 2009.

[32] Norsys Software Corporation. Netica-J: Java Netica API. http://www.norsys.com/netica-j.html.

[33] P. Costa, J. Napper, G. Pierre, and M. V. Steen. Autonomous resource selection for decentralized utility computing.

In Proc. of Int’l Conference on Distributed Computing Systems, 2009.

[34] Y. Diao, J. L. Hellerstein, S. Parekh, H. Shaikh, and M. Surendra. Controlling quality of service in multi-tier Web

applications. In Proc. IEEE Int’l Conference on Distributed Computing Systems (ICDCS), 2006.

[35] Y. Diao, J. L. Hellerstein, S. Parekh, H. Shaikh, M. Surendra, and A. Tantawi. Modeling differentiated services

of multi-tier web applications. In Proc. IEEE Int'l Symposium on Modeling, Analysis, and Simulation of Computer

and Telecommunication Systems (MASCOTS), 2006.

[36] C. Dovrolis, D. Stiliadis, and P. Ramanathan. Proportional differentiated services: Delay differentiation and packet

scheduling. IEEE/ACM Trans. on Networking, 10(1):12–26, 2002.

[37] Q. Duan. End-to-end modelling and performance analysis for network virtualisation in the next generation internet.

Journal of Communication Network Distribution Systems, 8(2):53–69, 2012.

[38] M. El Barachi, N. Kara, and R. Dssouli. Towards a service-oriented network virtualization architecture. In Inno-

vations for Future Networks and Services, pages 1–7, 2010.

[39] S. Elnikety, E. Nahum, J. Tracey, and W. Zwaenepoel. A method for transparent admission control and request

scheduling in e-commerce web sites. In Proc. ACM WWW, pages 276–286, 2004.

[40] D. F. Garcia, J. Garcia, J. Entrialgo, M. Garcia, P. Valledor, R. Garcia, and A. M. Campos. A qos control mechanism

to provide service differentiation and overload protection to internet scalable servers. IEEE Trans. on Services

Computing, 2(1):3–16, 2009.

[41] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Vl2:

a scalable and flexible data center network. In Proc. of the ACM SIGCOMM conference on Data communication,

2009.

[42] J. Guitart, D. Carrera, V. Beltran, J. Torres, and E. Ayguade. Session-based adaptive overload control for secure

dynamic web applications. In Proc. ICPP, 2005.

[43] J. Guitart, J. Torres, and E. Ayguadé. A survey on performance management for internet applications. Concur-

rency: Practice and Experience, 22(1):68–106, 2010.

[44] C. Guo, G. Lu, H. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. Secondnet: a data center network

virtualization architecture with bandwidth guarantees. In Proceedings of the 6th International Conference, pages

1–12, 2010.

[45] Y. Guo, P. Lama, and X. Zhou. Automated and agile server parameter tuning with learning and control. In Proc.

IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 656–667, 2012.

[46] Y. Guo and X. Zhou. Coordinated vm resizing and server tuning: Throughput, power efficiency and scalability. In

Proc. IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication

Systems (MASCOTS), pages 289–297, 2012.

[47] M. Harchol-Balter. Task assignment with unknown duration. Journal of the ACM, 49(2):260–288, 2002.

[48] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based scheduling to improve Web performance.

ACM Trans. on Computer Systems, 21(2):207–233, 2003.

[49] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-

end delay control. IEEE Trans. on Computers, 56(4):444–458, 2007.

[50] G. Jung, K. Joshi, M. Hiltunen, R. Schlichting, and C. Pu. Generating adaptation policies for multi-tier applications

in consolidated server environments. In Proc. IEEE Int'l Conference on Autonomic Computing (ICAC), pages

23–32, 2008.

[51] A. Kamra, V. Misra, and E. M. Nahum. Yaksha: a self-tuning controller for managing the performance of 3-tiered

web sites. In Proc. Int’l Workshop on Quality of Service (IWQoS), 2004.

[52] M. Karlsson, C. Karamanolis, and X. Zhu. Triage: performance isolation and differentiation for storage systems.

In Proc. Int’l Workshop on Quality of Service (IWQOS), 2004.

[53] J. O. Kephart and D. M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, 2003.

[54] M. Kesavan, A. Gavrilovska, and K. Schwan. Differential virtual time (dvt): rethinking i/o service differentiation

for virtual machines. In Proc. of the 1st ACM symposium on Cloud computing, 2010.

[55] E. Kiciman. Using statistical monitoring to detect failures in internet services. PhD thesis, Stanford University,

2005.

[56] P. Kumar. The multi-tier architecture for developing secure website with detection and prevention of sql-injection

attacks. International Journal of Computer Applications, pages 30–36, 2013.

[57] P. Lama and X. Zhou. Efficient server provisioning for end-to-end delay guarantee on multi-tier clusters. In Proc.

IEEE Int’l Workshop on Quality of Service (IWQoS), 2009.

[58] P. Lama and X. Zhou. Autonomic provisioning with self-adaptive neural fuzzy control for end-to-end delay guar-

antee. In Proc. IEEE Int’l Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication

Systems (MASCOTS), 2010.

[59] P. Lama and X. Zhou. AMOSS: Automated multi-objective server provisioning with stress-strain curving. In Proc.

IEEE Int’l Conference on Parallel Processing (ICPP), 2011.

[60] P. Lama and X. Zhou. PERFUME: Power and performance guarantee with fuzzy mimo control in virtualized

servers. In Proc. IEEE International Workshop on Quality of Service (IWQoS), pages 1–9, 2011.

[61] P. Lama and X. Zhou. AROMA: Automated resource allocation and configuration of mapreduce environment in

the cloud. In Proc. ACM International Conference on Autonomic Computing (ICAC), pages 63–72, 2012.

[62] P. Lama and X. Zhou. Efficient server provisioning with control for end-to-end delay guarantee on multi-tier

clusters. IEEE Transactions on Parallel and Distributed Systems, 23(1):78–86, 2012.

[63] P. Lama and X. Zhou. NINEPIN: Non-invasive and energy efficient performance isolation in virtualized servers. In

Proc. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 1–12, 2012.

[64] P. Lama and X. Zhou. Autonomic provisioning with self-adaptive neural fuzzy control for percentile-based delay

guarantee. ACM Transactions on Autonomous and Adaptive Systems, 2013.

[65] S. C. M. Lee, J. C. S. Lui, and D. K. Y. Yau. Admission control and dynamic adaptation for a proportional-delay

DiffServ-enabled Web server. In Proc. ACM SIGMETRICS, 2002.

[66] S. C. M. Lee, J. C. S. Lui, and D. K. Y. Yau. A proportional-delay diffserv-enabled Web server: admission control

and dynamic adaptation. IEEE Trans. on Parallel and Distributed Systems, 15(5):385–400, 2004.

[67] K. Li and S. Jamin. A measurement-based admission-controlled Web server. In Proc. IEEE INFOCOM, 2000.

[68] X. Liu, J. Heo, L. Sha, and X. Zhu. Queueing-model-based adaptive control of multi-tiered web applications.

IEEE Transactions on Network and Service Management, 5(3):157–167, 2008.

[69] X. Liu, X. Zhu, P. Padala, Z. Wang, and S. Singhal. Optimal multivariate control for differentiated services on a

shared hosting platform. In Proc. IEEE Conference on Decision and Control (CDC), 2007.

[70] L. Lu, L. Cherkasova, V. de Nitto Personè, N. Mi, and E. Smirni. AWAIT: Efficient overload management for

busy multi-tier web services under bursty workloads. In Proceedings of the 10th international conference on Web

engineering, pages 81–97, 2010.

[71] Y. Lu, T. F. Abdelzaher, C. Lu, L. Sha, and X. Liu. Feedback control with queueing-theoretic prediction for relative

delay guarantees in web servers. In Proc. Real-Time and Embedded Technology and Applications Symposium,

pages 208–217, 2003.

[72] M. Mazzucco, I. Mitrani, M. Fisher, and P. McKee. Allocation and admission policies for service streams. In

Modeling, Analysis and Simulation of Computers and Telecommunication Systems, 2008.

[73] Z. Mehbood, Z. Didar, and D. Lowe. Supporting integrated dependency model for change impact analysis in web

systems, 2008.

[74] D. A. Menascé and M. N. Bennani. Autonomic virtualized environments. In Proc. IEEE Int'l Conference on

Autonomic Computing (ICAC), 2006.

[75] D. A. Menascé, R. Fonseca, V. A. F. Almeida, and M. A. Mendes. Resource management policies for E-commerce

servers. ACM SIGMETRICS Performance Evaluation Review, 27(4):27–35, 2000.

[76] S. Meng, S. R. Kashyap, C. Venkatramani, and L. Liu. Remo: Resource-aware application state monitoring for

large-scale distributed systems. In Proc. of Int’l Conference on Distributed Computing Systems (ICDCS), 2009.

[77] X. Meng, C. Isci, J. Kephart, L. Zhang, E. Bouillet, and D. Pendarakis. Efficient resource provisioning in compute

clouds via vm multiplexing. In Proc. of the Intl conference on Autonomic computing, pages 11–20, 2010.

[78] N. Mi, G. Casale, L. Cherkasova, and E. Smirni. Burstiness in multi-tier applications: Symptoms, causes, and new

models. In Proc. ACM/IFIP/USENIX Int’l Middleware Conference (Middleware), 2008.

[79] N. Mi, G. Casale, L. Cherkasova, and E. Smirni. Injecting realistic burstiness to a traditional client-server bench-

mark. In Proc. IEEE Int’ Conference on Autonomic Computing (ICAC), 2009.

[80] S. Muppala and X. Zhou. CoSAC: Coordinated session-based admission control for multi-tier internet applications.

In Proc. IEEE Int’l Conference on Computer Communications and Networks (ICCCN), 2009.

[81] S. Muppala, X. Zhou, and L. Zhang. Regression based multi-tier resource provisioning for session slowdown

guarantees. In IEEE Int’l Performance Computing and Communications Conference (IPCCC), 2010.

[82] P. Padala, K.-Y. Hou, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, and A. Merchant. Automated control of

multiple virtualized resources. In Proc. of the EuroSys Conference (EuroSys), pages 13–26, 2009.

[83] V. Petrucci, E. V. Carrera, O. Loques, J. C. B. Leite, and D. Mosse. Optimized management of power and perfor-

mance for virtualized heterogeneous server clusters. In Proc. IEEE/ACM Int’l Symposium on Clusters, Cloud and

Grid Computing (CCGrid), 2011.

[84] N. Poggi, D. Carrera, R. Gavalda, J. Torres, and E. Ayguade. Characterization of workload and resource consump-

tion for an online travel and booking site. In Proc. IEEE Int'l Symposium on Workload Characterization (IISWC),

2010.

[85] J. Rao, X. Bu, C. Xu, and K. Wang. A distributed self-learning approach for elastic provisioning of virtualized

cloud resources. In Proc. IEEE Int'l Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 45–54, 2011.

[86] J. Rao, X. Bu, C. Xu, L. Wang, and G. Yin. Vconf: a reinforcement learning approach to virtual machines

auto-configuration. In Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), 2009.

[87] J. Rao, Y. Wei, J. Gong, and C.-Z. Xu. DynaQoS: Model-free self-tuning fuzzy control of virtualized resources for

QoS provisioning. In Proc. of the Int'l Workshop on Quality of Service (IWQoS), 2011.

[88] J. Rao and C. Xu. CoSL: a coordinated statistical learning approach to measuring the capacity of multi-tier

Websites. In Proc. IEEE Int'l Parallel and Distributed Processing Symposium (IPDPS), 2008.

[89] J. Rao and C. Xu. Online measurement of the capacity of multi-tier websites using hardware performance counters.

In Proc. IEEE Int’l Conference on Distributed Computing Systems (ICDCS), 2008.

[90] L. Rao. J.P. Morgan: Global e-commerce revenue. http://techcrunch.com/2011/01/03/j-p-morgan-global-e-commerce-revenue-to-grow-by-19-percent-in-2011-to-680b (Date of access: Apr 7, 2011).

[91] L. Rao, X. Liu, M. Ilic, and J. Liu. Mec-idc: joint load balancing and power control for distributed internet data

centers. In Proc. of ACM/IEEE Intl Conference on Cyber-Physical Systems, pages 188–197, 2010.

[92] L. Rao, X. Liu, L. Xie, and W. Liu. Minimizing electricity cost: Optimization of distributed internet data centers

in a multi-electricity-market environment. In Proc. IEEE INFOCOM, pages 1–9, 2010.

[93] M. M. Rashid, A. S. Alfa, E. Hossain, and M. Maheswaran. An analytical approach to providing controllable

differentiated quality of service in web servers. IEEE Trans. on Parallel and Distributed Systems, 16(11):1022–

1033, 2005.

[94] A. Riska, E. Smirni, and G. Ciardo. ADAPTLOAD: effective balancing in clustered Web servers under transient

load conditions. In Proc. IEEE Int’l Conference on Distributed Computing Systems (ICDCS), pages 104–111,

2002.

[95] A. Shieh, S. Kandula, A. Greenberg, C. Kim, and B. Saha. Sharing the data center network. In Proc. of the 8th

USENIX conference on Networked systems design and implementation, 2011.

[96] A. Singh, M. Korupolu, and D. Mohapatra. Server-storage virtualization: integration and load balancing in data

centers. In Proc. of the ACM/IEEE conference on Supercomputing, 2008.

[97] R. Singh, U. Sharma, E. Cecchet, and P. Shenoy. Autonomic mix-aware provisioning for non-stationary data center

workloads. In Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), pages 21–30, 2010.

[98] W. D. Smith. TPC-W: Benchmarking an Ecommerce solution. http://www.tpc.org/tpcw (Date of access: Nov 16,

2004).

[99] B. Speitkamp and M. Bichler. A mathematical programming approach for server consolidation problems in virtu-

alized data centers. IEEE Trans. on Services Computing, 3, 2010.

[100] C. Taton, S. Bouchenak, N. Palma, and D. Hagimont. Self-optimization of internet services with dynamic resource

provisioning, 2008.

[101] G. Tesauro. Online resource allocation using decompositional reinforcement learning. In Proc. 20th national

conference on Artificial intelligence (AAAI), pages 886–891, 2005.

[102] G. Tesauro, R. Das, H. Chan, J. O. Kephart, C. Lefurgy, D. W. Levine, and F. Rawson. Managing power consump-

tion and performance of computing systems using reinforcement learning. In Advances in Neural Information

Processing Systems 20, 2008.

[103] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid reinforcement learning approach to autonomic

resource allocation. In Proc. IEEE Int’ll Conference on Autonomic Computing (ICAC), 2006.

[104] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi. An analytical model for multi-tier Internet

services and its applications. In Proc. ACM SIGMETRICS, 2005.

[105] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning of multi-tier Internet applications. In

Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), 2005.

[106] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and T. Wood. Agile dynamic provisioning of multi-tier Internet

applications. ACM Trans. on Autonomous and Adaptive Systems, 3(1):1–39, 2008.

[107] A. Verma, P. Ahuja, and A. Neogi. pmapper: power and migration cost aware application placement in virtualized

systems. In Proc. ACM/IFIP/USENIX Int’l Middleware Conference (Middleware), pages 243–264, 2008.

[108] D. Villela, P. Pradhan, and D. Rubenstein. Provisioning servers in the application tier for e-commerce systems.

ACM Trans. on Internet Technology, 7(1):1–23, 2007.

[109] X. Wang, D. Lan, G. Wang, X. Fang, Y. Meng, Y. Chen, and Q. Wang. Appliance-based autonomic provisioning

framework for virtualized outsourcing data center. In Proc. IEEE Int'l Conference on Autonomic Computing

(ICAC), 2007.

[110] A. Warfield, R. Ross, K. Fraser, C. Limpach, and S. Hand. Parallax: Managing storage for a million machines. In

Proc. of the Workshop on Hot Topics in Operating Systems, 2005.

[111] B. J. Watson, M. Marwah, D. Gmach, Y. Chen, M. Arlitt, and Z. Wang. Probabilistic performance modeling of

virtualized resource allocation. In Proc. IEEE Int'l Conference on Autonomic Computing (ICAC), pages

99–108, 2010.

[112] C. Weng, M. Li, Z. Wang, and X. Lu. Automatic performance tuning for the virtualized cluster system. In Proc.

of Int’l Conference on Distributed Computing Systems (ICDCS), 2009.

[113] Wikipedia. Client-server model. http://en.wikipedia.org/wiki/Client-server-model (Date of access: March 7,

2011).

[114] Wikipedia. Internet. http://en.wikipedia.org/wiki/Internet (Date of access: Feb 2, 2011).

[115] J. Wildstrom, P. Stone, E. Witchel, and M. Dahlin. Machine learning for on-line hardware reconfiguration. Inter-

national Joint Conferences on Artificial Intelligence, 14(2):108–136, 2007.

[116] C. Xu, J. Rao, and X. Bu. URL: A unified reinforcement learning approach for autonomic cloud management. J.

Parallel Distrib. Comput., 72(2):95–105, 2012.

[117] C. Z. Xu. Scalable and Secure Internet Services and Architecture. CRC Press, 2005.

[118] H. Zhang, K. Yoshihira, Y. Su, G. Jiang, M. Chen, and X. Wang. iPOEM: a GPS tool for integrated management in

virtualized data centers. In Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), pages 41–50, 2011.

[119] Q. Zhang, L. Cherkasova, G. Mathews, W. Greene, and E. Smirni. R-capriccio: a capacity planning and anomaly

detection tool for enterprise services with live workloads. In Proc. ACM/IFIP/USENIX Int’l Middleware Confer-

ence (Middleware), pages 244–265, 2007.

[120] Q. Zhang, L. Cherkasova, and E. Smirni. A regression-based analytic model for dynamic resource provisioning of

multi-tier Internet applications. In Proc. IEEE Int’l Conference on Autonomic Computing (ICAC), 2007.

[121] J. Zhou and T. Yang. Selective early request termination for busy internet services. In Proc. ACM WWW, 2006.

[122] X. Zhou, Y. Cai, E. Chow, and M. Augusteijn. Two-tier resource allocation for slowdown differentiation on server

clusters. In Proc. IEEE Int’l Conference on Parallel Processing (ICPP), pages 31–38, 2005.

[123] X. Zhou, Y. Cai, G. K. Godavari, and C. E. Chow. An adaptive process allocation strategy for proportional

responsiveness differentiation on Web servers. In Proc. IEEE Int’l Conference on Web Services (ICWS), pages

142–149, 2004.

[124] X. Zhou, Y. Cai, J. Wei, and C.-Z. Xu. An integrated application-level approach to responsiveness differentiation.

Proc. IEEE Int’l Conference on Web Services (ICWS), 2005.

[125] X. Zhou and D. Ippoliti. Resource allocation optimization for quantitative service differentiation on server clusters.

Journal of Parallel and Distributed Computing, 68(9):1250–1262, 2008.

[126] X. Zhou, J. Wei, and C.-Z. Xu. Modeling and analysis of 2D service differentiation on e-Commerce servers. In

Proc. IEEE Int’l Conference on Distributed Computing Systems (ICDCS), pages 740–747, 2004.

[127] X. Zhou, J. Wei, and C.-Z. Xu. Processing rate allocation for proportional slowdown differentiation on Internet

servers. In Proc. IEEE Int’l Parallel and Distributed Processing Symposium (IPDPS), pages 88–97, 2004.

[128] X. Zhou, J. Wei, and C.-Z. Xu. Resource allocation for session-based two-dimensional service differentiation on

e-commerce servers. IEEE Trans. on Parallel and Distributed Systems, 17(8):838–850, 2006.

[129] X. Zhou, J. Wei, and C.-Z. Xu. Quality-of-service differentiation on the internet: A taxonomy. Journal of Network

and Computer Applications, Elsevier, 30(1):354–383, 2007.

[130] H. Zhu, H. Tang, and T. Yang. Demand-driven service differentiation for cluster-based network servers. In Proc.

IEEE INFOCOM, pages 679–688, 2001.