Quantifying the Impact of Platform Configuration Space for Elasticity Benchmarking

Study Thesis

Nikolas Roman Herbst

At the Department of Informatics, Institute for Program Structures and Data Organization (IPD), Informatics Innovation Center (IIC)

Reviewer: Prof. Ralf Reussner
Advisor: Dr.-Ing. Michael Kuperberg
Second advisor: Dipl.-Inform. Nikolaus Huber

Duration: April 14th, 2011 – August 31st, 2011

KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association www.kit.edu


Disclaimer

The measurements and the results presented in this study thesis have been obtained using prototypic implementations of research ideas, deployed in a non-productive experimental environment. They are neither representative for the performance of IBM System z, nor can they be used for comparison or reference purposes. Neither IBM nor any other mentioned hardware/software vendors have sanctioned or verified the information contained in these slides. Any reproduction, citation or discussion of the results contained herein must be accompanied by this disclaimer in complete and untranslated form. Usage of these results for marketing or commercial purposes is strictly prohibited.

Declaration of Originality

I declare that I have developed and written the enclosed Study Thesis completely by myself, and have not used sources or means that do not belong to my intellectual property without declaration in the text.

Karlsruhe, 2011-08-26

Contents

Abstract

1. Introduction
   1.1. Context
   1.2. Contribution

2. Scalability
   2.1. Problem Description
   2.2. Definition of Scalability

3. Elasticity
   3.1. A Definition of Elasticity
   3.2. Elasticity Metrics
   3.3. Direct and Indirect Measuring of Elasticity Metrics
   3.4. Extraction of Provisioning Time using Dynamic Time Warping Algorithm (DTW) for Cause Effect Mapping
   3.5. Interpretation of Provisioning Time Values based on Extraction by DTW Algorithm
   3.6. A Single-valued Elasticity Metric?

4. Elasticity Benchmark for Thread Pools
   4.1. Variable Workload Generation
   4.2. Experiment Setup
   4.3. Extraction of Elasticity Metrics
   4.4. Experiments and Platforms
   4.5. Results and Elasticity Metric Illustrations
        4.5.1. Experiment 1 - Platform 1
        4.5.2. Experiment 3 - Platform 1
        4.5.3. Experiment 5 - Platform 1
   4.6. Observations and Experiment Result Discussion

5. Elasticity Benchmark for Scaling Up of z/VM Virtual Machines
   5.1. Experiment Setup
   5.2. Workload Generation
   5.3. Results

6. Future Work
   6.1. Scale Up Experiment using CPU Cores as Resource
   6.2. Elasticity Benchmark for Scaling out the Number of z/VM Virtual Machine Instances in a Performance Group

7. Conclusion

8. Acknowledgement

Bibliography

Appendix
   A. Experiment 2 and 4 on Platform 1
      A.1. Experiment 2 - Platform 1
      A.2. Experiment 4 - Platform 1
   B. Experiments 1-5 on Platform 2
      B.1. Experiment 1 - Platform 2
      B.2. Experiment 2 - Platform 2
      B.3. Experiment 3 - Platform 2
      B.4. Experiment 4 - Platform 2
      B.5. Experiment 5 - Platform 2

Abstract

Elasticity is the ability of a software system to dynamically adapt the amount of resources it provides to clients as their workloads increase or decrease. In the context of cloud computing, automated resizing of a virtual machine's resources can be considered a key step towards optimising a system's cost and energy efficiency. Existing work on cloud computing is limited to the technical view of implementing elastic systems, and definitions of scalability have not been extended to cover elasticity. This study thesis presents a detailed discussion of elasticity, proposes metrics as well as measurement techniques, and outlines next steps for enabling comparisons between cloud computing offerings on the basis of elasticity. I discuss results of our work on measuring the elasticity of thread pools provided by the Java virtual machine, as well as an experiment setup for elastic CPU time slice resizing in a virtualized environment. An experiment setup is presented as future work for dynamically adding and removing z/VM Linux virtual machine instances to and from a performance-relevant group of virtualized servers.

1. Introduction

The technical report Defining and Quantifying Elasticity of Resources in Cloud Computing and Scalable Platforms [1] was an early result of the research work of our team, with Jóakim von Kistowski, Michael Kuperberg and myself as authors. Some contents and figures that have already been published in this technical report are reused in sections 1-4 of this study thesis.

1.1. Context

In cloud computing [2, 3], resources and services are made available over a network, with the physical location, size and implementation of resources and services being transparent. With its focus on flexibility, dynamic demands and consumption-based billing, cloud computing enables on-demand infrastructure provisioning and "as-a-service" offerings of applications and execution platforms.

Cloud computing is powered by virtualization, which enables an execution platform to provide several concurrently usable (and independent) instances of virtual execution platforms, often called virtual machines (VMs). Virtualization itself is a decades-old technique [2, 4], and it has matured over many generations of hardware and software, e.g. on IBM System z mainframes [5, 6]. Virtualization comes in many types and shapes: hardware virtualization (e.g. PR/SM), OS virtualization (e.g. Xen, VMware ESX), middleware virtualization (e.g. JVM) and many more. Virtualization has become very popular in industry and academia, leading to a large number of new products, business models and publications.

Mature cloud computing platforms promise to approximate performance isolation: a struggling VM with saturated resources (e.g. 100% CPU load) should only minimally affect the performance of VMs on the same native execution platform. To implement this behaviour, a virtual execution platform can make use of a predefined maximum allowed share of the native platform's resources. Some platforms even provide facilities for run time adaptation of these shares, and cloud computing platforms which feature run time and/or demand-driven adaptation of provided resources are often called elastic.

Platform elasticity is the feature of automated, dynamic, flexible and frequent resizing of the resources that are provided to an application by the execution platform. Elasticity can be considered a key benefit of cloud computing. Elasticity carries the potential for optimizing system productivity and utilization, while maintaining service level agreements (SLAs) and quality of service (QoS) as well as saving energy and costs.

While the space of choices between cloud computing providers in each domain (public, private and hybrid clouds) is getting wider, the means for comparing their features and service qualities are not yet fully developed. Several security issues for virtualised systems running on the same physical hardware remain open, as does the question of guaranteeing performance independence from so-called "noisy neighbours".

Scalability metrics have already been proposed by M. Woodside et al. in 2000 [7], based on evaluating a system's productivity at different levels of scale. For assessing how "elastic" a system is, these proposed metrics are insufficient because they do not take the temporal aspect of automated scaling actions into account. The viewpoint of dynamic scalability demands further observations and metrics on how often, how fast and how significantly scaling of a system can be executed.

The topicality and relevance of this research topic is confirmed by the Gartner study "Five Refining Attributes of Public and Private Cloud Computing" from May 2009 [8], where elasticity of virtual resources is stated as one central highlight of modern cloud computing technology. The term "elasticity" itself is already widely used in the advertising of cloud infrastructure providers; Amazon even names its infrastructure "Elastic Compute Cloud - EC2".
As an example of automated resizing actions, Amazon offers an automated scaling API to its EC2 clients, which is described in detail in [9]. The client can control the number of virtual machine instances via policies that observe the average CPU usage in a scaling group of virtual machine instances. The high utility of these features is demonstrated in a number of use cases within this manual. However, Amazon has not yet published any figures on how fast scaling actions are executed, and the scaling automatism is restricted to controlling the number of VMs in a group by observing CPU usage. A finer granularity of automated, programmable scaling actions, such as resizing a VM's virtual resources at run time, could be useful to Amazon's cloud clients.
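To illustrate the kind of policy described above, the following Java sketch shows a simple threshold-based scaling loop. The CloudClient interface and all names are hypothetical placeholders and do not correspond to Amazon's actual Auto Scaling API; the thresholds are invented for the example.

```java
// Hypothetical sketch of a threshold-based scaling policy loop. The CloudClient
// interface and its methods are illustrative placeholders, not a provider API.
public final class ThresholdScalingPolicy {

    public interface CloudClient {
        double averageCpuUtilisation();   // averaged over the scaling group, 0.0 - 1.0
        int instanceCount();
        void setInstanceCount(int target);
    }

    private final CloudClient client;
    private final double upperThreshold;  // e.g. 0.80 -> add an instance
    private final double lowerThreshold;  // e.g. 0.30 -> remove an instance
    private final int minInstances;
    private final int maxInstances;

    public ThresholdScalingPolicy(CloudClient client, double upper, double lower,
                                  int min, int max) {
        this.client = client;
        this.upperThreshold = upper;
        this.lowerThreshold = lower;
        this.minInstances = min;
        this.maxInstances = max;
    }

    /** One evaluation step; intended to be called periodically, e.g. once per minute. */
    public void evaluate() {
        double cpu = client.averageCpuUtilisation();
        int current = client.instanceCount();
        if (cpu > upperThreshold && current < maxInstances) {
            client.setInstanceCount(current + 1);   // scale out by one instance
        } else if (cpu < lowerThreshold && current > minInstances) {
            client.setInstanceCount(current - 1);   // scale in by one instance
        }
    }
}
```

Such a rule only controls the instance count; it says nothing about how quickly the provider actually carries out the resizing, which is exactly the temporal aspect discussed in this thesis.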

1.2. Contribution

In this study thesis I outline the basic idea of elasticity in the context of cloud computing. I discuss the term scalability as the enabling feature of a system for elastic behaviour. Furthermore, I explain several key metrics that characterize resource elasticity, and discuss possible ways towards direct and indirect measurements of these elasticity metrics.

An appropriate definition of resource elasticity in the context of cloud computing could not yet be found in the research literature, for example in [10, 11, 7, 12]. A clear definition of resource elasticity is provided within this study thesis, as well as a detailed outline of key metrics characterizing the elastic behaviour of a system. The basic concepts of resource elasticity metrics are transferable to other (also non-IT) contexts where changing resource demands are supposed to be matched by provided resources.

As validation of the concept of elasticity metrics, the following three experiments, which are based on different elastic resource pools, are presented in detail, followed by an interpretation of the measurement results. Configuration parameters that directly influence the elasticity behaviour are highlighted.

Experiments:

• Dynamic growing and shrinking of Java thread pools
• Variable CPU time slice distribution to virtual machines
• Dynamically adding/removing virtual machine instances to a performance relevant group of virtualized servers

The conducted experiments aim towards the development of a dedicated benchmark to measure resource elasticity metrics. By validating the approach on a reference cloud environment, namely an IBM System z, resource elasticity could be added as an intuitively comparable parameter for differentiation between cloud computing offerings.

2. Scalability

An execution platform that is not scalable within certain boundaries cannot be elastic at all. Scalability is the basis for the elasticity of a system. For this reason, scalability is introduced and discussed first in this chapter, and elasticity is then defined in the following chapter.

2.1. Problem Description

Scalability is a term that can be applied both to applications and to execution platforms. In this section, I introduce a more precise terminology for scalability and discuss existing approaches for measuring it.

Scalability of computing systems is a crucial concern already at the design phase of software components. A lot of research has been done on how to achieve good scalability of software or hardware systems. This includes approaches for avoiding bottlenecks in the system's architecture that can slow a system down even when more resources are assigned to it.

As outlined in the technical report [1], a system is considered scalable when the response time and throughput of jobs processed by the system lie within certain acceptable intervals, even if the number of users increases over time. A scalable system that is adapted by a system administrator can achieve this only if the workload of the system is somehow predictable. For a scalable system, productivity should stay more or less constant even if the number of users varies significantly. The system's overhead for organizing more sessions should grow at most linearly with the number of users, and not in a higher polynomial or even exponential function class. Due to hardware or software design issues, scalability can only be provided within certain boundaries. Being aware of these boundaries is important when scaling a system automatically.

For example, when a web page becomes popular, the number of users can vary by orders of magnitude, and at high speed in both directions. In this case, manual resizing of a system would be ineffective due to high hardware investments and administration costs. Amazon's EC2 offers an Auto Scaling API [9] to its clients that automatically adapts the number of virtual machine instances that share the same workloads. Amazon has not yet published figures on how quickly these changes are executed, and just states an upper bound of 21 virtual machines as the limit where communication and synchronisation overheads start to increase. Means for describing the temporal aspects of automated scaling behaviour may be helpful when comparing different cloud platforms.

2.2. Definition of Scalability

The following definitions have been presented in the technical report [1], section 2.

Application scalability is a property which means that the application maintains its performance goals/SLAs even when its workload increases (up to a certain upper bound). This upper workload bound defines a scalability range and highlights the fact that scalability is not endless: above a certain workload, the application will not be able to maintain its performance goals or SLAs.

Application scalability is limited by the application's design and by the application's use of execution platform resources. In addition to a performance-aware implementation (efficient resource use and reuse, minimization of waiting and thrashing, etc.), application scalability means that the application must be able to make use of additional resources (e.g. CPU cores, network connections, threads in a thread pool) as the demand-driven workload increases. For example, an application which is only capable of actively using 2 CPU cores will not scale to an 8-core machine, as long as the CPU core is the limiting ("bottleneck") factor. Of course, to exercise its scalability, the application must be "supported" by the execution platform, which must be able to provide the resources needed by the application. This means that when establishing metrics for quantifying scalability, the scalability metric values are valid for a given amount of resources and a given amount/range of service demand. The speedup for the same service demand with additional resources can then be determined, as well as the efficiency, as a measure of how well the application uses the additionally provided resources.

Correspondingly, platform scalability is the ability of the execution platform to provide as many (additional) resources as needed (or explicitly requested) by an application. In our terminology, an execution platform comprises the hardware and/or software layers that an application needs to run. An example application can be an execution system (where the execution platform comprises hardware and possibly a hypervisor) or a web shop (where the execution platform comprises middleware, operating system and hardware). Other examples of execution platforms include IBM System z (for running z/OS and other operating systems), a cloud environment (e.g. IBM CloudBurst [13]) or a "stack" encompassing hardware and software (e.g. LAMP [14]). The execution platform can be visible as a single large "black box" (whose internal composition is not visible from the outside) or as a set of distinguishable layers/components.

There are two "dimensions" of platform scalability, and a system can be scalable in none, one or both of them:

Scale vertically or scale up means adding more resources to a given platform node, such as additional CPU cores, bigger CPU time slice shares or memory, so that the platform node can handle a larger workload. By scaling up a single platform node, physical limits that impact bandwidth, computational power etc. are often reached quite fast.

Scale horizontally or scale out means adding new nodes (e.g. virtual machine instances or physical machines) to a cluster or distributed system so that the entire system can handle bigger workloads.

Depending on the type of application, the high I/O performance demands of single instances that work on shared data often increase communication overheads and prevent substantial performance gains, especially when adding nodes at larger cluster sizes. In some scenarios, scaling horizontally may even result in performance degradation.

Note that this definition of platform scalability does not include the temporal aspect: while scaling means that additional resources are requested and used, the definition of scalability does not specify how fast, how often and how significantly the needed resources are provisioned. Additionally, scalability is not a constant property: the state of the execution platform and the state of the application (and its workload) are not considered. In fact, scalability can depend on the amount of already provided resources, on the utilization of these resources, and on the amount of service demand (such as the number of requests per second).

Fig. 2.1 sketches a simplified, synthetic example of how a fixed allocation of available resources (i.e. no platform scalability) means that the response time rises with increasing workload intensity and diminishes when the workload intensity does the same. In Fig. 2.1, the response time rises monotonically with the workload intensity (and falls monotonically while the workload intensity diminishes). The reality may be more complicated.

Figure 2.1.: Schematic example of a scenario where the execution platform is not scalable: the (idealized) correlation between application workload intensity and application response time

Fig. 2.2 in turn sketches how, on a scalable execution platform and with a fixed workload, the performance metric (response time) should be affected positively by additional resources. Note that in Fig. 2.2, the response time decreases monotonically with additional resources.

Woodside and Jogalekar [7] established a scalability metric based on productivity. To define scalability, they use a scale factor k (which is not further explained in [7]) and observe the three following metrics:

• λ(k): throughput in responses/second at scale k,
• f(k): average value of each response, calculated from its quality of service at scale k,
• C(k): costs at scale k, expressed as a running cost (per second, to be uniform with λ)

Figure 2.2.: Schematic example of a scenario with a fixed application workload and a scalable execution platform: the (idealized) correlation between the amount of resources provided by the platform and the application response time

Productivity F(k) is calculated in [7] as the value delivered per second, divided by the cost per second:

F(k) = λ(k) · f(k) / C(k)

Then, Woodside and Jogalekar postulate that "if productivity of a system is maintained as the scale factor k changes, the system is regarded as scalable". Finally, the scalability metric ψ relating systems at two different scale factors is defined as the ratio of their productivity figures:

ψ(k₁, k₂) = F(k₂) / F(k₁)

While this definition of scalability allows one to compare scalability from the workload (or application) view, it is not possible to compare execution platforms, as the metric is specific to a given workload. Additionally, the definition assumes a "stable state" and does not consider the actual process of scaling, where resource allocations are adapted. Therefore, the provided definition of elasticity will not be based on the scalability definition from [7]. In the next chapter, a platform-centric definition of resource elasticity is presented which considers temporal and quantitative aspects of execution platform scaling.
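To make this metric concrete, the following small Java example computes F(k) for two scale factors and the resulting ψ; all numbers are invented purely for illustration and are not taken from any measurement.

```java
// Worked example of the productivity-based scalability metric from [7];
// the numbers are invented purely for illustration.
public final class ScalabilityMetricExample {

    /** Productivity F(k) = lambda(k) * f(k) / C(k). */
    static double productivity(double throughput, double valuePerResponse, double costPerSecond) {
        return throughput * valuePerResponse / costPerSecond;
    }

    public static void main(String[] args) {
        // Scale k1: 100 responses/s, QoS value 0.9 per response, cost 10 units/s.
        double f1 = productivity(100, 0.9, 10);   // 9.0
        // Scale k2: 190 responses/s, QoS value 0.85 per response, cost 20 units/s.
        double f2 = productivity(190, 0.85, 20);  // 8.075
        // Scalability metric psi(k1, k2) = F(k2) / F(k1).
        double psi = f2 / f1;                     // ~0.897: productivity is not fully maintained
        System.out.printf("F(k1)=%.3f  F(k2)=%.3f  psi=%.3f%n", f1, f2, psi);
    }
}
```

In this invented case ψ is below 1, i.e. productivity degrades slightly at the larger scale; note that nothing in the calculation captures how fast or how often the system can change its resource allocation.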

3. Elasticity

The following definitions can also be found, in similar form, in our publication [1]. When service demand increases, elastic cloud platforms dynamically add resources (or make more resources available to a task). Thus, elasticity adds a dynamic component to scalability - but what does this dynamic component look like?

On an ideal elastic platform, as application workload intensity increases, the distribution of the response times of an application service should remain stable as additional resources are made available to the application. Such an idealistic view is shown by the synthetic example in Fig. 3.1, which also includes dynamic un-provisioning of resources as the application workload decreases. Note that the dynamic adaptation is a continuous (non-discrete) process in Fig. 3.1.

Figure 3.1.: Schematic example of an (unrealistically) ideal elastic system with immediate and fully-compensating elasticity

However, in reality, resources are measured and provisioned in larger discrete units (i.e. one processor core, processor time slices, one page of main memory, etc.), so a continuous idealistic scaling/elasticity cannot be achieved. On an elastic cloud platform, the performance metric (here: response time) will rise as workload intensity increases until a certain threshold is reached at which the cloud platform will provide additional resources. Once the application detects the additional resources and starts making use of them, the performance will recover and improve - for example, response times will drop. This means that in an elastic cloud environment with changing workload intensity, the response time is in fact not as stable as it was in Fig. 3.1.

Now that we have determined a major property of elastic systems, the next question is: how can the elasticity of a given execution platform be quantified? The answer to this question is provided by Fig. 3.2 - notice that it reflects the previously mentioned fact that the performance increases at certain discrete points. To define and to measure elasticity, it will be necessary to quantify the temporal and quantitative properties of those points at which performance is increased.

Figure 3.2.: Schematic Example of an Elastic System

3.1. A Definition of Elasticity

Changes in resource demands or explicit scaling requests trigger run time adaptations of the amount of resources that an execution platform provides to applications. The magnitude of these changes depends on the current and previous state of the execution platform, and also on the current and previous behaviour of the applications running on that platform. Consequently, elasticity is a multi-valued metric that depends on several run time factors. This is reflected by the following definitions, which are illustrated by Fig. 3.3:

Elasticity of execution platforms consists of the temporal and quantitative properties of runtime resource provisioning and un-provisioning, performed by the execution platform; execution platform elasticity depends on the state of the platform and on the state of the platform-hosted applications.

Reconfiguration point is a time point at which a platform adaptation (resource provisioning or un-provisioning) is processed by the system.

Note that a reconfiguration point is different from (and later than) an eventual triggering point (which is in most cases system-internal and starts the reconfiguration phase). Also take into consideration that the effects of the reconfiguration may become visible some time after the reconfiguration point, since an application needs time to adapt to the changed resource availability. Besides this, it is important to know that while reconfiguration points and the time point of visibility of effects may be measurable, the triggering points may not be directly observable.

Figure 3.3.: Three aspects of the Proposed Elasticity Metric

3.2. Elasticity Metrics

There are several characteristics of resource elasticity, which (as already discussed above) are parametrised by the platform state/history, application state/history and workload state/history:

Effect of reconfiguration is quantified by the amount of added/removed resources and thus expresses the granularity of possible reconfigurations/adaptations.

Temporal distribution of reconfiguration points describes the density of reconfiguration points over a possible interval of a resource's usage amounts, or over a time interval, in relation to the density of changes in workload intensity.

Provisioning time or reaction time is the time interval from the instant when a reconfiguration has been triggered/requested until the adaptation has been completed. An example of provisioning time would be the time between the request for an additional thread and the instant of actually holding it.

An example matrix describing elasticity characteristics of an execution platform is given in Fig. 3.4 for a hypothetical cloud platform. Each resource is represented by a vector of the three aforementioned key elasticity characteristics.

Figure 3.4.: Elasticity Matrix for an Example Platform
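As a minimal illustration of such a matrix, one row per resource type could be represented as follows; the class and field names are my own and purely illustrative, not part of any existing tool.

```java
// Minimal sketch of the elasticity matrix from Fig. 3.4: one row (vector) per
// resource type, holding the three key elasticity characteristics.
// Class and field names are illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;

public final class ElasticityMatrix {

    /** One row of the matrix: the elasticity characteristics of a single resource. */
    public static final class ElasticityVector {
        final double reconfigurationEffect;             // resource units added/removed per resizing
        final double reconfigurationsPerWorkloadChange; // density of reconfiguration points
        final double provisioningTimeMillis;            // reaction time of a single resizing

        ElasticityVector(double effect, double density, double provisioningTimeMillis) {
            this.reconfigurationEffect = effect;
            this.reconfigurationsPerWorkloadChange = density;
            this.provisioningTimeMillis = provisioningTimeMillis;
        }
    }

    private final Map<String, ElasticityVector> rows = new LinkedHashMap<>();

    public void put(String resourceName, ElasticityVector vector) {
        rows.put(resourceName, vector);
    }

    public ElasticityVector get(String resourceName) {
        return rows.get(resourceName);
    }
}
```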

3.3. Direct and Indirect Measuring of Elasticity Metrics

In general, the effects of scalability are visible to the user/client via changing response times or throughput values at a certain scaling level of the system. On the other hand, the elasticity, namely the resource resizing actions, may not be directly visible to the client due to their shortness or due to the client's limited access to an execution platform's state and configuration.

Therefore, measuring resource elasticity from a client's perspective by observing throughput and response times requires indirect measurements and approximations, whereas measuring resource elasticity on the "server side" (i.e. directly on the execution platform) can be more exact, if the execution platform provides information about held resources and about resource provisioning and un-provisioning. If the execution platform is not aware of its currently held resources, it is necessary to develop tools for measuring them in a fine-granular way. The ways of obtaining the amount of resources a system is holding at a given moment in time can differ strongly between platforms and types of resources. This is one reason why a portable elasticity benchmark is hard to develop.

For elasticity measurements on any elastic system, it is necessary to load the system with a variable intensity of workloads. The workload itself consists of small independent workload elements that are supposed to run concurrently and are designed to stress mainly one specific resource type (like a Fibonacci calculation for CPU or an array sort for memory). "Independent workload element" means in this case that there is no interdependency between the workload elements that would require communication or synchronisation and therefore induce overheads. It is necessary to stress mainly the "resource under test", to avoid bottlenecks elsewhere in the system. Before starting the measurement on a specific platform, the basic workload elements should be calibrated without any concurrency on the resources, in such a way that one workload element needs approximately one resource entity for a certain time span. This workload element calibration is also necessary to provide comparability between different systems later on. While measuring, we keep track of the workload intensity as well as of the number of resource entities provided by the system. Having these values logged for a single experiment run, we calculate further indirect values representing the key elasticity metrics.

As the concepts of resource elasticity are validated in the following chapter using Java thread pools as virtual resources provided by a Java Virtual Machine, I would like to introduce them already as a running example. Java thread pools are designed to grow and shrink dynamically in size, while still trying to reuse idle resources. In general, we differentiate between the following orders of values (each of which is also applied to the Java thread pool example):

1st order values - client side are all workload attributes and workload events that can be measured and logged directly and precisely even on the client side, like timestamps, the number of workload elements submitted and finished, and their waiting and processing times. For the example of Java thread pools, client-side 1st order values are:

• attributes: a workload element's start time, waiting time and processing time
• events: time stamp and values of any change in the numbers of submitted or finished workload elements

1st order values - execution system side include the above mentioned events and attributes that are visible at the client side too, and in addition the direct, exact measurement of provided resource entities at a certain point in time. The execution system should be able to provide these values accurately; otherwise an additional measurement tool must run on the execution platform. In addition to the above mentioned client-side 1st order values, we measure:

• resource events: time stamp and values of any change in the numbers of provided thread resources

2nd order values - execution system side are values that can be calculated directly from the 1st order values. They include the amount and time point of increases/decreases of workload or resource entities, as well as the rate of reconfigurations in relation to the rate of workload changes. Even though these values are not directly visible to a client, they can be considered as precise due to their direct deduction from measurable values. For the Java thread pool example, 2nd order values are (a small sketch of how they can be derived follows after this list):

• workload intensity: time stamp and number of workload elements currently in execution, as the difference of submitted minus finished ones
• resource events: time interval between two successive measurement samples and amount of added or removed thread resources
• resizing ratio: number of changes in thread pool size in relation to changes in workload intensity within a certain time interval

3rd order values - execution system side are values that cannot be calculated directly from 1st or 2nd order values. These values need to be approximated in a suitable way. The system's provisioning time is a delay between an internal trigger event and the visibility of a resource reallocation. The time points of internal trigger events and information about the execution of such an event are system-internal and often not logged. Suitable ways of approximation are discussed in detail in the next section. For the Java thread pool example, 3rd order values are:

• approximation of the provisioning time of a thread pool size change

4th order values - execution system side are values that can be derived from system-internal log files that represent the state of the system; they can contain the time point and reason for a reconfiguration trigger event, information on resource provisioning actions or workload rejections. When using these values, intrinsic knowledge about the system implementation is necessary, so portability of those metrics is not given. Access to these values would nevertheless be important for a trustworthy validation of our approach. For the Java thread pool example, 4th order values are:

• time points of trigger events and thread pool state

Since we know details about the Java thread pool implementation, we know that a trigger event for a new thread resource is thrown if the waiting queue is full and no thread is idle. A thread resource stays idle for a given timeout parameter and dies after that time if it could not be reused.
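The following simplified Java sketch shows how the 2nd order values mentioned above (workload intensity and resizing ratio) can be derived from logged 1st order events; the Event record is an invented, minimal stand-in for the real measurement log format.

```java
// Simplified sketch of deriving 2nd order values from logged 1st order events.
// The Event record is an invented, minimal stand-in for the real measurement log.
import java.util.List;

public final class SecondOrderValues {

    /** A single logged 1st order event (invented minimal form). */
    public record Event(long timestampMillis, int submittedTasks, int finishedTasks, int poolSize) {}

    /** Workload intensity at an event: tasks currently in execution or waiting. */
    static int workloadIntensity(Event e) {
        return e.submittedTasks() - e.finishedTasks();
    }

    /** Resizing ratio: number of pool-size changes relative to workload-intensity changes. */
    static double resizingRatio(List<Event> log) {
        int workloadChanges = 0;
        int resourceChanges = 0;
        for (int i = 1; i < log.size(); i++) {
            if (workloadIntensity(log.get(i)) != workloadIntensity(log.get(i - 1))) {
                workloadChanges++;
            }
            if (log.get(i).poolSize() != log.get(i - 1).poolSize()) {
                resourceChanges++;
            }
        }
        return workloadChanges == 0 ? 0.0 : (double) resourceChanges / workloadChanges;
    }
}
```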

The response times of the basic workload elements should be logged and split up into waiting and processing times. As explained before, the response time should stay within an acceptance interval, given for example by an SLA.

We define the waiting time of a workload element as the duration from feeding a workload task to the execution system until the system finally schedules the workload task and begins its processing. This definition of waiting time does not include any waiting times caused by interruptions due to hardware contention or system scheduling. Waiting and processing times are intuitively interpretable if the size of the tasks is not too small, so that the processing time values do not show high variability due to system scheduling influences. Processing times should not show high variations when a workload element is at least one order of magnitude larger than the scheduling time slices and the workload's maximum concurrency level can still be mapped directly to physically concurrent resources. The waiting and processing times of the workload tasks mainly depend on the recent workload intensity, the system's scheduling and the actually available level of physical concurrency of the resource under test.
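For illustration, this decomposition of a workload element's response time can be expressed as a small helper; the timestamp names are illustrative and assume that submission, start and finish times are recorded as described above.

```java
// Small helper illustrating the decomposition of a workload element's response
// time into waiting time and processing time; field names are illustrative.
public record TaskTiming(long submitNanos, long startNanos, long finishNanos) {

    /** Time from submission until the system begins processing the task. */
    public long waitingNanos() {
        return startNanos - submitNanos;
    }

    /** Pure processing time of the task. */
    public long processingNanos() {
        return finishNanos - startNanos;
    }

    /** Response time as seen by the client: waiting plus processing time. */
    public long responseNanos() {
        return finishNanos - submitNanos;
    }
}
```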

When taking response time as a performance metric, it is necessary to exercise elasticity measurements only on elastic platforms where the level of concurrency within the workload can be covered by physically provided resources with no hardware contention at the maximal possible system configuration (upper bound for scaling).

This approach is illustrated in an idealized way on the left side of Fig. 3.5. A suitable performance metric for the resource under test (like response time for CPU-bound tasks) declines, the system automatically reacts by provisioning more resources, and then the performance metric recovers again. The provisioning time can be approximated by the time interval from the point where the performance metric declines until the point where the system reacts.

Figure 3.5.: Two different Approaches for Measuring Provisioning Times [15]

If we measure the elasticity of a virtual resource that shares physical resources with other tasks, there is no longer a precise view of the correlation between cause and effect when trying to interpret the characteristics of waiting and processing times. In this case, observing response times does not allow a direct and exact extraction of elasticity metrics.

The provisioning times of an elastic system, which were defined as the delay between a trigger event and the visibility of a resource reallocation, cannot be measured directly without having access to system-internal log files and relating them to the measured resource amounts.

If a new workload task is added and cannot be served directly by the system, we assume that a trigger event for resizing is created within the execution platform. Not every trigger event results in a resource reallocation. In addition, the measurement logging on the execution platform side must be able to keep track of any changes in resource amounts.

As illustrated on the right side of Fig. 3.5, it is possible to find a mapping between a trigger event and its effect manually and intuitively. In reality, this graph can look much more complicated, as we will see in the results section of the following chapter. Manually finding mappings can still be done intuitively in most cases, but several special cases occur, like resource reuse (visualized in the left part of Fig. 3.6). If the execution system works with waiting queues and these queues are longer than two elements, trigger events and effects can overlap. This results in ambiguous mappings, as can be seen on the right side of Fig. 3.6.

Figure 3.6.: Finding the right correlation between Trigger and Effect event for provisioning time extraction

This leads to the need for an exact algorithmic approach that finds a plausible cause-effect mapping by establishing several rules that must be fulfilled, e.g. any reallocation of x resource entities must have at least x trigger events in advance for which no change has been executed since the trigger's release. This algorithmic approach is discussed in more detail in Jóakim von Kistowski's bachelor thesis [15].

We also researched another approach for finding a suitable approximation for a cause-effect mapping by using already existing algorithms: Dynamic Time Warping (DTW) algorithms minimize the sum of distances between two time series [16, 17]. A time series is a table with two columns; the first column contains a time stamp, the second a value that has been measured at that time. Our measurement log file contains a time stamp, a resource amount value as well as a workload intensity value per entry. Therefore we can consider these data as two time series, where it is important to have only one entry per change in the number of tasks in execution for the first time series, and per change in the number of resources for the second. The workload time series influences the resource amount time series: we consider only changes in workload intensity as triggers for changes in the resource amount time series. We are looking for a cause-effect mapping that enables us to extract provisioning times from these two time series filled with change events. How this can possibly be done by the DTW algorithm is discussed in the following section.

3.4. Extraction of Provisioning Time using Dynamic Time Warping Algorithm (DTW) for Cause Effect Mapping

The Dynamic Time Warping (DTW) algorithm offers the feature of calculating a similarity metric by computing the minimum sum of distances between two time series. The two time series may vary in time or speed in a non-linear way. The DTW algorithm outputs the induced mapping of measurement samples (time points) and the sum of distances between these mappings as a distance/similarity metric. Having this similarity metric (called DTW distance) is very useful in the field of automated speech recognition, to match differently stressed words even at variable levels of talking speed.

The DTW algorithm is an example of an algorithm using dynamic programming and backtracking [16, 17]. This algorithm is able to solve such a one-dimensional optimisation problem in polynomial time (normally with a complexity of O(n²) in memory and time). A second dimension can be added to a time series by having two y-values per measurement sample. For two-dimensional time series, the described optimisation problem of finding the smallest distance between them is already NP-complete (also called planar warping) [16, 17].

Note that the DTW algorithm takes only the two time series themselves into account and no additional values that may be related to a time stamp. In the case of elasticity measurements, every time stamp relates to a change event of the workload intensity or a change in the number of resource entities.

Fig. 3.7 from [17] illustrates an example DTW mapping. The x-axis shows the number of measurement samples; the y-axis is not shown in this illustration. To understand the illustration, it helps to imagine two different y-axes, where the second y-axis has a constant offset to make the mapping visible. Unlike in our time series, there are still measurement samples where the y-value does not change until the following sample. Note that the time stamp values themselves are irrelevant to the result of the DTW algorithm; only the y-values of the measurement points matter. There is no restriction that the two time series have to contain the same number of samples, even though they do in the case illustrated in Fig. 3.7. The mappings do not cross other mappings. The time series have to be of finite length for the DTW algorithm to be applicable. Under this assumption, every sample of one time series is mapped to at least one sample of the other time series. The DTW algorithm is deterministic.

Figure 3.7.: Illustration of a Dynamic Time Warping Result [17]

Figure 3.8.: A cost matrix for Fig. 3.7 with the minimum-distance warp path traced through it [17]

To explain Fig. 3.8 I would like to cite from [17], section 2.1: "[The figure] shows an example of a cost matrix and a minimum-distance warp path traced through it from D(1,1) to D(|X|,|Y|). The cost matrix and the warp path [. . . ] are for the same two time series [. . . ]. The warp path is W = (1,1), (2,1), (3,1), (4,2), (5,3), (6,4), [. . . ] (15,15), (16,16). If the warp path passes through a cell D(i, j) in the cost matrix, it means that the ith point in time series X is warped to the jth point in time series Y. Notice that where there are vertical sections of the warp path, a single point in time series X is warped to multiple points in time series Y, and the opposite is also true where the warp path is a horizontal line. Since a single point may map to multiple points in the other time series, the time series do not need to be of equal length. If X and Y were identical time series, the warp path through the matrix would be a straight diagonal line."

In [16, 17] it is also explained in detail how linear complexity can be achieved when just a good approximation of the minimal distance, and not the very best mapping in every case, is required. The normal DTW algorithm calculates the minimal distance with a complexity of O(n²). The algorithm achieving linear complexity is called the fast DTW algorithm (fDTW) and is useful if the data sets are massive or if the processing should not consume too much energy or time. The fDTW algorithm takes a radius as an additional input parameter; the algorithm searches for the best mapping only within this radius of measurement samples. By having the radius as a constant input that does not grow with the input size, the linear complexity is achieved, whereas the optimum may not be found within this radius. The fDTW algorithm is still deterministic.

Now we need to discuss whether a DTW or even an fDTW algorithm can be appropriate for our case of extracting a mapping where one series represents the causes and the other the effects. A normal DTW algorithm can take a distance function as an input parameter, which adds more functionality. The DTW algorithm by itself cannot differentiate between a cause and an effect series. This can lead to senseless "backward" mappings. It is important that the input values for each of the time series are change events, on the resource time series side as well as on the workload side - not just points of measurement where nothing happened. Normally, a measurement log file contains a new data set every time a single value changes. These time series must therefore be stripped of all "meaningless no-change" time stamps before being passed to the DTW algorithm.
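For reference, the following is a minimal, self-contained Java sketch of the classic O(n·m) DTW recurrence on two one-dimensional value series (assumed to contain change events only, as required above). It uses the absolute difference as the local cost and returns only the accumulated minimum distance; the warp path would be recovered by backtracking through the matrix. It is not the fDTW implementation used in the experiments.

```java
// Minimal sketch of the classic dynamic-programming DTW recurrence for
// one-dimensional series; computes only the accumulated minimum distance.
public final class SimpleDtw {

    /** Returns the DTW distance between two value series, using |x - y| as local cost. */
    static double distance(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double[][] d = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                d[i][j] = Double.POSITIVE_INFINITY;
            }
        }
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(x[i - 1] - y[j - 1]);
                d[i][j] = cost + Math.min(d[i - 1][j - 1],   // match
                                 Math.min(d[i - 1][j],       // sample of x warped to several of y
                                          d[i][j - 1]));     // sample of y warped to several of x
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        // Invented change-event series: workload intensity vs. provided resources.
        double[] workload  = {2, 3, 5, 7, 9, 7, 5, 3, 2};
        double[] resources = {2, 2, 4, 6, 9, 9, 6, 4, 2};
        System.out.println("DTW distance: " + distance(workload, resources));
    }
}
```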

3.5. Interpretation of Provisioning Time Values based on Extraction by DTW Algorithm

The mapping of measurement points given by DTW enables the approximation of provisioning times. These provisioning times are based on a DTW approximation and therefore must be interpreted very carefully. It is not yet validated that a DTW mapping is a good mapping for our cause-effect mapping problem. Not every given mapping must correspond to a real provisioning action of the system that took place during the measurement. Several trigger events can have just a single visible effect.

When applying DTW, we assume a reactive system in which any effect takes place after it was triggered, and in which all triggers are system-internal events. A system with external triggers that induce foreseeable changes in resource demands is also thinkable; we do not know about such triggers by default, because they are domain-specific (e.g. the opening hours of banking terminals). We observed that an elastic system can overreact and provide more resources than actually needed. For a reactive elastic execution platform, this is only possible if the granularity of resizings is too coarse. It is also possible that an elastic system has features implemented that try to intelligently foresee workload intensity changes (which then insert internal trigger events that are normally unknown). In this case, the system is not just a reactive system anymore.

DTW possibly outputs negative provisioning times. These values could be interpreted as the above mentioned intelligent behaviour of workload foreseeing, as external triggering, or just as a system's overreaction. Such backward mappings are not covered by the concept of cause-effect mappings, so it cannot be said explicitly that negative provisioning times are meaningful, and they should be interpreted very carefully. If a high negative provisioning time is in fact meaningful for an intelligent or externally triggered elastic system, it would not result in observably worse response times, but in lower resource efficiency due to a lower utilisation rate. High positive values for provisioning times, in contrast, are linked to high utilisation of resources and slower performance/response times. Taking these thoughts into account, one could say that provisioning times are better the smaller their absolute distance to zero is. A provisioning time of exactly zero would only be possible in a synthetic, idealised elastic system.

3.6. A Single-valued Elasticity Metric?

It is still an open question whether a single metric that captures the three aforementioned key elasticity characteristics of Fig. 3.4 is possible and meaningful. One challenge of defining such a metric would be to embed the state dependencies, as well as to select a value range that would maintain the applicability of the metric to future, more powerful and more complex systems. A unified (single-valued) metric for the elasticity of execution platforms could possibly be achieved by a weighted product of the three characteristics. To get intuitively understandable values, elasticity could be measured and compared on the basis of percentage values.

0% resource elasticity would stand for no existing reconfiguration points, high or infinite provisioning times (which would mean purely manual reconfiguration) and very large effects of resizing actions, e.g. new instantiations only when performance problems have already been discovered or reported by SLA monitoring.

Near 100% resource elasticity would hint towards a high density of reconfiguration points, small provisioning times and small effects. In the optimal case, the virtualized system's usage appears to be constant, while the number of users and the resource demand vary. This could achieve optimal productivity concerning costs or energy consumption in a cloud environment.

If such a metric can be established, the elasticity of cloud platforms could easily be compared. A proactive system which implements intelligent resource provisioning techniques as described in [18, 19, 20, 21, 22] should exhibit better values of resource elasticity than a simpler reactive system which triggers resource requests or releases by events and which does not perform workload prediction or analysis.
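Purely as an illustration of the weighted-combination idea sketched above, the following Java fragment maps the three characteristics to a single percentage; the weights, scales and normalisation function are invented for the example and do not constitute a validated metric.

```java
// Illustrative sketch only: a combination of the three elasticity characteristics
// into a single percentage. Scales and normalisation are invented, not validated.
public final class SingleValuedElasticity {

    /** Maps a "smaller is better" raw value into (0, 1]; scale sets where the score reaches 0.5. */
    static double score(double rawValue, double scale) {
        return 1.0 / (1.0 + Math.abs(rawValue) / scale);
    }

    /**
     * @param meanProvisioningTimeMs    average provisioning time (smaller is better)
     * @param meanReconfigurationEffect average resources added/removed per resizing (smaller is better)
     * @param reconfigurationDensity    reconfigurations per workload change, ideally close to 1
     */
    static double elasticityPercent(double meanProvisioningTimeMs,
                                    double meanReconfigurationEffect,
                                    double reconfigurationDensity) {
        double timeScore    = score(meanProvisioningTimeMs, 100.0);          // 100 ms -> 0.5
        double effectScore  = score(meanReconfigurationEffect - 1.0, 1.0);   // effect of 1 unit -> 1.0
        double densityScore = score(reconfigurationDensity - 1.0, 0.5);      // density of 1 -> 1.0
        return 100.0 * timeScore * effectScore * densityScore;               // equal weighting
    }
}
```

A system that never reconfigures (infinite provisioning time) would score near 0%, while a system with small, frequent and fast adaptations would approach 100%, matching the intuition described above.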

4. Elasticity Benchmark for Thread Pools

4.1. Variable Workload Generation

Intelligently designed workloads are extremely important when trying to observe elastic effects. A constant workload, for example, will never produce elastic effects on its own, as long as the execution platform's usage is not changed by other workloads. In order to observe elastic effects, workloads have to push the boundary of what the already provisioned resources can offer them. Only then will a drop in performance, followed by an improvement after resource addition, be visible. One important aspect of designing workloads for elasticity benchmarking is to understand in which way the targeted execution platform scales and how the triggering events are released which then may lead to elasticity. Until now, existing benchmarks do not provide variable workloads designed specifically to force resource reallocations. How we achieve flexible workloads for elasticity measurements, and the set of parameters for designing or influencing such a workload, is explained in detail in Jóakim von Kistowski's bachelor thesis [15].

4.2. Experiment Setup

The evaluation concept of resource elasticity can be applied to various kinds of resources, even to virtualized ones that are not mapped 1:1 to hardware resources. As a proof-of-concept, we researched the elasticity behaviour of Java thread pools in depth. Thread pools are an implementation of the pooling pattern, which is based on a collection (pool) of same-typed, interchangeable resources that are maintained continuously. In a resource pool, even when a given resource instance in the pool is not needed, it is not released immediately, because it is assumed that the instance may be needed in the near future: the runtime costs of releasing and (later) re-acquiring that resource are significantly higher than the costs of keeping an idle instance. Resource pools enable resource reuse to minimize the initialisation costs of new resources. The resources in a pool have to be managed efficiently so as not to induce higher system management costs than initialisation and release would cost for the same amount of resources. After a certain time interval, a resource can be released when it has not been used - this "disposal delay" can often be set in implementations of the pool pattern. Beyond the "disposal delay", further configuration options are the minimum pool size (often called "core size"), the maximum pool size and the length of the queue that resides "in front of" the pool.

Figure 4.1.: Illustration of the thread pool pattern [15]

Thread pools are heavily used in databases, application servers and other middleware applications handling many concurrent requests. Similar to thread pools, connection pools (in DBMS drivers, such as JDBC) implement the pooling pattern with the same rationale.

For this experiment, we use the default thread pool implementation provided by the Java SE platform API, and the threads have to perform non-communicating, CPU-bound, computation-only tasks in parallel. These tasks form a configurable, multi-phased workload which is described in [15]. The tasks consist of carefully-defined Fibonacci computations with randomly-chosen starting values (to prevent function inlining of constant values), and with evaluation of the computation result (to prevent dead code elimination). We are interested in situations with a "fully-busy" thread pool where the arrival of a new task and the finish of another task's execution are temporally close, to see whether the pool immediately "overreacts" by allocating a new thread instance and accepting the task from the queue, rather than waiting for a (short) time in the hope that another task will be finished.

In line with the elasticity metrics defined in Sec. 3.1, we measure the decreases/increases of the thread pool size (effects of reconfiguration and their distribution) and approximate the temporal aspects of the adaptation (delays, provisioning duration, etc.). We use the term "jump size distribution" in the measurement results for the distribution of reconfiguration effects. At the beginning of each measurement, a warmup period of the task and measurement methods is executed to avoid effects of Java Just-In-Time compiler (JIT) optimisations and method inlining. We permitted the benchmark to allocate up to 900 MB of Java heap memory to minimize interference, and used the standard thread pool implementation of the Java platform API, found in the java.util.concurrent package.

The core size of the thread pool is not equal to the minimum or initial pool size: even if the core size is set to 2, the JVM may initialize the pool to have 1 active thread. The option of prestarting the core pool changes this. For threads below the core pool size, no timeout feature for killing these threads is given in the default implementation. The official Java platform API documentation explains that when "a new task is submitted [...], and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle. If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full" [23]. In the experiment implementation we use a core size of 1 and do not prestart the pool.

Our tooling includes the functionality to calibrate the problem size of a single Fibonacci task so that it takes a certain duration (e.g. 50 ms) when running in isolation with no interruptions. The calibration allows us to run the same workload on several platforms with (approximately) the same task duration. The workload is composed of tasks by specifying inter-task arrival times, as well as by grouping tasks into batches (with separate specification of intra-batch and inter-batch arrival times).
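A CPU-bound task of the kind described above could be sketched as follows; this is a simplified illustration, not the actual benchmark code, and the iteration count stands in for the calibrated problem size.

```java
// Simplified sketch of a CPU-bound Fibonacci task: a randomly chosen starting
// value prevents constant folding, and consuming the result prevents dead code
// elimination. Not the exact benchmark implementation.
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public final class FibonacciTask implements Callable<Long> {

    private final int iterations;   // calibrated per platform so the task runs e.g. ~50 ms in isolation

    public FibonacciTask(int iterations) {
        this.iterations = iterations;
    }

    @Override
    public Long call() {
        long a = ThreadLocalRandom.current().nextLong(1, 10);  // random start value
        long b = a + 1;
        for (int i = 0; i < iterations; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return b;   // returning the result keeps the loop from being optimised away
    }
}
```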
The measured workload is a configurable series of batches and is designed as follows: the initial batch size is 2 tasks, with an inter-task wait time of 10 ms. After an inter-batch wait time of 40 ms, the second batch with three tasks and the same inter-task wait time is started. For each subsequent batch, the number of tasks increases by one, the inter-batch wait time increases by 4 ms, and the inter-task wait time stays the same. The workload intensity reaches its peak when the batch contains 12 tasks, and decreases again afterwards. Each decrease is characterised by reducing the batch size by one and decreasing the inter-batch wait time by 4 ms. The last batch of the workload again contains two tasks. Recall that by design of the thread pool, the number of actively processed tasks is equal to or smaller than the number of threads in the pool. The design and configuration of the workload used for the following experiments is illustrated in more detail in Jóakim von Kistowski's bachelor thesis [15]. We implemented a feeder class that passes the tasks in the described way to the thread pool executor.

A Java thread pool executor from the java.util.concurrent package offers the use of three different kinds of queues:

• Direct queues directly dispatch any task that is queued up. Since with this choice resources are allocated immediately, there is no room for variations in elasticity. Thread resources are allocated as if they did not induce any system management overheads.

• Infinite queues have the ability to hold a theoretically endless number of tasks. With this kind of queue, a fixed number of threads (namely the core size parameter) processes tasks concurrently. By using infinite queues, the reuse of thread resources is forced to the maximum; the thread pool is, on the other hand, inhibited from showing any resource elasticity.

• Finite queues can hold a fixed maximum number of tasks. When using a finite queue, every task is queued (unless a core thread is available) and taken from the queue in first-come-first-served order (FCFS or FIFO) when an idle thread is available. If a task cannot be enqueued, it is held in an additional buffer and a new thread is requested (trigger event). Tasks have to wait for variable durations to begin their execution, but elastic behaviour can be seen when using finite queues.

The queue length is a property of the waiting queue the thread pool executor is initialized with, and it has a direct influence on the measured elasticity values. The Java thread pool is set up with an array blocking queue from the java.util.concurrent package (which is an implementation of a finite queue). The queue length is varied between 2 and 10 for different runs. The "disposal delay" of idle threads, also called keep-alive time, was set to three different values between 10 ms and 250 ms. These two parameters directly influence the elasticity behaviour, whereas the core pool size of 2 threads and the maximum pool size of 100 are held constant because they have no direct influence. Once the maximum thread pool size has been reached, incoming tasks will be rejected if all instances in the pool are busy performing work and the queue is full. In our implementation, rejections of tasks are logged and taken into account when interpreting measurement results. For modern CPUs, the maximum pool size of 100 should be big enough not to experience task rejections, as long as the CPU is not under load from other tasks. A minimal sketch of this setup is given below.
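The following simplified sketch illustrates the described setup: a ThreadPoolExecutor with a finite ArrayBlockingQueue and a feeder that submits batches of growing and then shrinking size with the wait times given above. It is an illustration only, not the actual measurement harness, and it reuses the hypothetical FibonacciTask from the previous sketch.

```java
// Simplified sketch of the experiment setup: a ThreadPoolExecutor with a finite
// ArrayBlockingQueue and a feeder that submits batches of growing, then shrinking
// size. Illustration of the configuration described in the text, not the harness.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class ThreadPoolElasticityExperiment {

    private static final int PROBLEM_SIZE = 50_000_000;   // would be calibrated per platform

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                  // core pool size (held constant)
                100,                                // maximum pool size (held constant)
                250, TimeUnit.MILLISECONDS,         // "disposal delay" (keep-alive) of idle threads
                new ArrayBlockingQueue<>(5));       // finite queue; length varied 2..10 between runs

        final int interTaskWaitMs = 10;             // constant within the whole workload
        int interBatchWaitMs = 40;

        // Growing phase: batch size 2 .. 12, inter-batch wait grows by 4 ms per batch.
        for (int batchSize = 2; batchSize <= 12; batchSize++) {
            submitBatch(pool, batchSize, interTaskWaitMs);
            Thread.sleep(interBatchWaitMs);
            interBatchWaitMs += 4;
        }
        // Shrinking phase: batch size 11 .. 2, inter-batch wait shrinks by 4 ms per batch.
        for (int batchSize = 11; batchSize >= 2; batchSize--) {
            interBatchWaitMs -= 4;
            submitBatch(pool, batchSize, interTaskWaitMs);
            Thread.sleep(interBatchWaitMs);
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    private static void submitBatch(ThreadPoolExecutor pool, int batchSize, int interTaskWaitMs)
            throws InterruptedException {
        for (int i = 0; i < batchSize; i++) {
            pool.submit(new FibonacciTask(PROBLEM_SIZE));   // FibonacciTask from the sketch above
            Thread.sleep(interTaskWaitMs);
        }
    }
}
```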
We measure and log task arrival times and task finish times using a fine-granular wall-clock timer (selected in a platform-specific way using [24]). The high-resolution time provided by the sun.misc.Perf class, which has an accuracy of 1 ns, can be used on supported Java VMs (whereas this timer is supported in Oracle JDK 6, it cannot be used with the IBM Java 6 64bit Virtual Machine running under z/OS). A separate "measurer" thread outside of the thread pool runs with maximum priority and records the state of the thread pool and the state of its queue. While the measurer runs in a loop without wait or sleep method calls, only changes in the thread pool state are recorded, keeping the logs small. One of the challenges consists in capturing the changes in the thread pool state - this requires a tight measurer loop and short (or no) intervals between measurements, while eliminating longer pauses (e.g. those caused by Java garbage collection). When measuring, we made sure that our benchmark was the single major performance-demanding workload during those time periods by keeping only a small base load on the machine. We observed platform-specific behaviour of the method Thread.sleep(ms, ns): while in a JVM running on Mac OS X the thread woke up quite precisely after the specified time of ns (below 1 ms), the same code executed in a JVM on a Windows system did not meet the specified time. Values below 1 ms could not be achieved there and were therefore not short enough for our measurements. For any JVM on a Windows system we decided to use a busy-waiting approach that polls for changes as fast as possible to achieve the most accurate measurements.
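The measurer loop could be sketched as follows (hypothetical class name; System.nanoTime() stands in here for the platform-specific timer selection via [24]):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadPoolExecutor;

// Sketch of a busy-polling measurer: it runs at maximum priority and records
// only *changes* of the pool state, which keeps the log small despite the tight loop.
public class PoolStateMeasurer extends Thread {

    private final ThreadPoolExecutor pool;
    private final List<long[]> log = new ArrayList<long[]>(); // {timestamp, poolSize, queueSize}
    private volatile boolean running = true;

    public PoolStateMeasurer(ThreadPoolExecutor pool) {
        this.pool = pool;
        setPriority(Thread.MAX_PRIORITY);
    }

    public void run() {
        int lastPoolSize = -1;
        int lastQueueSize = -1;
        while (running) {                          // busy waiting, no sleep() calls
            int poolSize = pool.getPoolSize();
            int queueSize = pool.getQueue().size();
            if (poolSize != lastPoolSize || queueSize != lastQueueSize) {
                log.add(new long[] { System.nanoTime(), poolSize, queueSize });
                lastPoolSize = poolSize;
                lastQueueSize = queueSize;
            }
        }
    }

    public void shutdown() { running = false; }

    public List<long[]> getLog() { return log; }   // read only after shutdown()
}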

4.3. Extraction of Elasticity Metrics

After every experiment run, we extract the aforementioned elasticity metrics from the measurement log file. Second-order values can be calculated directly. All resource resizings are counted and set into relation to the absolute number of changes in workload intensity, which yields the "resizing ratio". Every occurring jump size (differentiated by jump direction, up or down) is accumulated to finally obtain a distribution of jump sizes.

For visualisation of the measurement data, we use JFreeChart to plot x-y-charts and histograms. The plotted x-y-charts show the course of selected discrete values on the time line in the unit of milliseconds. The individual measurements ("dots") have been connected by a line to improve readability, even though the measurements are of course non-continuous. In the first chart of each experiment run, the number of tasks in the executor and the corresponding resource amount are plotted. This plot is intuitively understandable and illustrates the elasticity behaviour. The second plot shows the difference between tasks and provided resources to give an impression of the area between the time lines of task numbers and resource entities. Values above 0 occur when resources cover the demand of the workload at that point in time, whereas values below 0 hint towards a congestion of resources. This plot visualizes the speed of changes from up- to down-scaling or the other way around (frequency of crossing the x-axis) as well as the system's characteristic of whether it tends to save resources or provides new ones more generously (amplitude and symmetry of deflections). JFreeChart histograms are plotted for the distribution of waiting times and response times with a bin size of 10 ms.
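A sketch of this second-order extraction, assuming the log has already been reduced to the sequence of recorded pool sizes (the representation and names are illustrative):

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MetricExtraction {

    // Ratio of resource resizings to workload intensity changes, e.g. 67/239 = 28%.
    static double resizingRatio(int resourceResizings, int workloadChanges) {
        return (double) resourceResizings / workloadChanges;
    }

    // Distribution of jump sizes, keyed by the signed jump (+1, -1, -2, ...).
    static Map<Integer, Integer> jumpSizeDistribution(List<Integer> poolSizes) {
        Map<Integer, Integer> distribution = new TreeMap<Integer, Integer>();
        for (int i = 1; i < poolSizes.size(); i++) {
            int jump = poolSizes.get(i) - poolSizes.get(i - 1);
            if (jump == 0) continue;               // only actual resizings count
            Integer count = distribution.get(jump);
            distribution.put(jump, count == null ? 1 : count + 1);
        }
        return distribution;
    }
}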

To extract approximations for provisioning times, we use a Java implementation of the fDTW algorithm provided by the authors of [17] on a Google Code server: "http://code.google.com/p/fastdtw/", in version 1.0.1 from February 2011. We pass two time series to the fDTW time series constructor that contain only the change events in resource amount or in the number of concurrent tasks in the executor. As we are using a fast DTW algorithm, we calculate a suitable DTW radius as the maximum distance between resource and task entities. The algorithm then outputs the calculated DTW distance, which was defined as the minimum sum of distances between the mappings. Since this metric captures the similarity of two time series, we obtain another metric that quantifies elasticity. Due to the fact that, by workload design, we know the constant number of workload intensity changes, the overall count of resource resizings has to be smaller than or equal to the number of workload intensity changes. Therefore the number of summands of the minimum sum is equal even for runs on different platforms. This fact helps to make the DTW distance metric portable for our concerns.

fDTW also outputs the warp path, which is the mapping of measurement points. Every mapping is a vector [x, y], with x referring to the xth measurement point in the first time series and y to the yth measurement point in the second time series. We look up the time stamps that belong to these indexes and calculate the differences. For every mapping output by fDTW we obtain a single approximate time for a provisioning action. These provisioning times are then plotted as a JFreeChart histogram, again with a bin size of 10 ms. For better illustration of the mapping results, we additionally pass the time series to a DTW implementation in the statistics tool R, which is independent from the used Java implementation. To plot them we use the following R script:

library(dtw)

tasks   <- read.csv(file="[Path]dtwInTasks[dateTime].csv",   header=TRUE, sep=";")
threads <- read.csv(file="[Path]dtwInThreads[dateTime].csv", header=TRUE, sep=";")

thseries <- threads$X2_newTPSize
taseries <- tasks$X2_taskChange

plot(dtw(taseries, thseries, k = TRUE),
     type = "two", off = 100, match.lty = 2,
     match.indices = 150, xlab = "count of changes",
     ylab = "Resource amount / WL intensity", main = "DTW")

dtwPlotThreeWay(dtw(taseries, thseries, k = TRUE), xts = NULL, yts = NULL,
                type.align = "l", type.ts = "l", match.indices = NULL, margin = 4,
                inner.margin = 0.2, title.margin = 1.5, xlab = "Workload Intensity",
                ylab = "Provided Resources", main = "DTW time series alignment")

This script reads in the simplified and prepared time series and plots the DTW results in the way already explained in the DTW section.
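Independently of the concrete fastdtw API, the step from warp path to provisioning time approximations described above can be sketched as follows (the index-pair representation and names are illustrative):

import java.util.ArrayList;
import java.util.List;

public class ProvisioningTimes {

    // warpPath holds the [x, y] index pairs output by the DTW algorithm;
    // taskTimestamps and resourceTimestamps hold the wall-clock times (ms) of the
    // corresponding change events.
    static List<Long> fromWarpPath(List<int[]> warpPath,
                                   long[] taskTimestamps,
                                   long[] resourceTimestamps) {
        List<Long> provisioningTimes = new ArrayList<Long>();
        for (int[] mapping : warpPath) {
            long taskChangeTime = taskTimestamps[mapping[0]];         // cause
            long resourceChangeTime = resourceTimestamps[mapping[1]]; // effect
            provisioningTimes.add(resourceChangeTime - taskChangeTime);
        }
        return provisioningTimes; // plotted as a histogram with 10 ms bins
    }
}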

4.4. Experiments and Platforms

The Java thread pool experiment has been conducted on the following two platforms:

1st Platform: z/OS - IBM z10 BC mainframe server:

• CPU: 5 CPs @ 3.5 GHz, non-dedicated
• OS: IBM z/OS 1.12 running in a logical partition (LPAR)
• RAM: 5120 MB
• Java: IBM JRE 6 64bit

2nd Platform: Windows 7 - Sony z Series VPCZ1:

• CPU: Intel Core i7 M620 dual-core processor @ 2.67 GHz, with Hyper-Threading providing 4 logical cores and a frequency boost of up to 3.2 GHz
• OS: Microsoft Windows 7 Ultimate SP1 64bit
• RAM: 8 GB
• Java: Oracle JRE 6.26

For each platform the experiment has been run in 5 different configurations of the thread pool executor:

Experiments 1-5 and Configuration

• 1st: queue size 2 - stay alive time 10 ms
• 2nd: queue size 2 - stay alive time 250 ms
• 3rd: queue size 4 - stay alive time 50 ms
• 4th: queue size 10 - stay alive time 10 ms
• 5th: queue size 10 - stay alive time 250 ms

4.5. Results and Elasticity Metric Illustrations

In this section, the most significant results of measuring the elasticity of the thread pool resource are presented. On the following pages, experiments 1, 3 and 5 on platform 1 can be found. Experiments 2 and 4, as well as experiments 1-5 for the second platform, can be found in the appendix.

4.5.1. Experiment 1 - Platform 1

• Executor configuration:
  – Queue length 2
  – Stay alive time 10 ms

• Total experiment duration: 2800 ms
• Maximum thread pool size: 21
• Resizing ratio: 67/239 = 28% as density of reconfigurations
• Mean DTW-approximated provisioning time: 15 ms
• Jump sizes as distribution of reconfiguration effects:
  – 1 up: 37 times
  – 1 down: 21 times
  – 2 down: 7 times
  – 3 down: 1 time
• fDTW distance: 156

Figure 4.2.: Experiment 1 Platform 1: Thread Pool Size vs. Workload Intensity

Figure 4.3.: Experiment 1 Platform 1: Difference between Thread Pool Size and Workload Intensity

Figure 4.4.: Experiment 1 Platform 1: Response Time Histogram

Figure 4.5.: Experiment 1 Platform 1: Wait Time Histogram

Figure 4.6.: Experiment 1 Platform 1: fDTW Provisioning Time Approximation Histogram

Figure 4.7.: Experiment 1 Platform 1: R DTW Mapping with the lower line as workload intensity as cause for changes in resource amount (upper line)

Figure 4.8.: Experiment 1 Platform 1: R DTW Cost Matrix

4.5.2. Experiment 3 - Platform 1

• Executor configuration:
  – Queue length 4
  – Stay alive time 50 ms

• Total experiment duration: 2840 ms
• Maximum thread pool size: 22
• Resizing ratio: 40/245 = 16% as density of reconfigurations
• DTW-approximated provisioning times: peaks at 50 ms, 130 ms, 210 ms
• Jump sizes as distribution of reconfiguration effects:
  – 1 up: 21 times
  – 1 down: 16 times
  – 3 down: 2 times
• fDTW distance: 335

Figure 4.9.: Experiment 3 Platform 1: Thread Pool Size vs. Workload Intensity

Figure 4.10.: Experiment 3 Platform 1: Difference between Thread Pool Size and Workload Intensity

Figure 4.11.: Experiment 3 Platform 1: Response Time Histogram

Figure 4.12.: Experiment 3 Platform 1: Wait Time Histogram

Figure 4.13.: Experiment 3 Platform 1: fDTW Provisioning Time Approximation Histogram

Figure 4.14.: Experiment 3 Platform 1: R DTW Mapping with the lower line as workload intensity as cause for changes in resource amount (upper line)

Figure 4.15.: Experiment 3 Platform 1: R DTW Cost Matrix

4.5.3. Experiment 5 - Platform 1

• Executor configuration:
  – Queue length 10
  – Stay alive time 250 ms

• Total experiment duration: 3050 ms
• Maximum thread pool size: 20
• Resizing ratio: 33/248 = 13% as density of reconfigurations
• Mean DTW-approximated provisioning time: 250 ms
• Jump sizes as distribution of reconfiguration effects:
  – 1 up: 19 times
  – 1 down: 18 times
  – 2 down: 1 time
• fDTW distance: 4280

Figure 4.16.: Experiment 5 Platform 1: Thread Pool Size vs. Workload Intensity

Figure 4.17.: Experiment 5 Platform 1: Difference between Thread Pool Size and Workload Intensity

Figure 4.18.: Experiment 5 Platform 1: Response Time Histogram

Figure 4.19.: Experiment 5 Platform 1: Wait Time Histogram

Figure 4.20.: Experiment 5 Platform 1: fDTW Provisioning Time Approximation Histogram

Figure 4.21.: Experiment 5 Platform 1: R DTW Mapping with the lower line as workload intensity as cause for changes in resource amount (upper line)

Figure 4.22.: Experiment 5 Platform 1: R DTW Cost Matrix

4.6. Observations and Experiment Result Discussion

Experiment 1 is configured to have the best elasticity among the experiments, experiment 3 is configured to lie in the middle, whereas experiment 5 has the longest waiting queue and stay alive time. Note that in all experiments, the workload is the same. The maximum reached number of threads is 21 +/-1 on platform 1 for all configurations. Another value that stays almost equal across the different runs is the total time needed for the amount of work. (The small differences in experiment run time are due to the different thread stay alive times; the work itself finished every time at about 2800 milliseconds, due to a constant level of physical concurrency of cores.) This could imply that the absolute amount of resources needed to cover the demand is independent of the elasticity behaviour.

The jump size distribution appears to be almost equal within experiments 1, 3 and 5. On the 2nd platform they behave differently. The distribution of jump sizes can characterize the system's intrinsic scaling accuracy. It cannot be assured that bigger jumps in resource amounts are caused by measurement delays.

The resizing ratio would be 100% in the hypothetical optimal case and 0% in the non-elastic case. For the three experiments it starts in the best (first) case with 28%, is 16% for experiment 3 and 13% for experiment 5. This ratio correctly quantifies the density of resizings of a system for that workload, which is a crucial aspect of elasticity behaviour.

The fDTW distance, as the minimum sum of distances between the two time series, consistently grows for worse elasticity. I would propose the fDTW distance metric as a unified relative elasticity metric. The fDTW distance depends mainly on the workload. For systems that are able to run the same workload, and therefore are comparable, the fDTW distance could be the means to quantify the relative difference in elasticity.

If we compare the first "Thread Pool Size vs. Workload Intensity" charts, or the directly following charts presented second for each experiment that capture the difference between the time series, we intuitively get an impression of the size and number of areas between resource and demand amounts. In the more elastic cases the difference is quickly alternating, whereas in the 5th configuration, showing worse elasticity, we see two areas: the first when the resources cannot cover the demand, and the second when too many resource entities are provided.

The response and wait time histograms let us compare the mean and variance of the response times of workload elements. In experiment 1 the mean response time is 220 ms (for a 50 ms task); response times between 80 ms and 350 ms occur, and most tasks had to wait for 10 ms or less until they started. In experiment 3 the mean response time is around 240 ms, with the interval spreading from 80 ms to 400 ms. The distribution shows a slightly higher variance than for experiment 1. Most tasks still had to wait for 10 ms or less, but a higher number of tasks had to wait longer than in experiment 1. In experiment 5 the mean response time is about 380 ms (!), with the interval spreading from 100 ms to 550 ms. The distribution shows a higher variance due to the fact that a higher number of tasks had to wait up to 275 ms until they were scheduled. By comparing the wait and response time histograms of the experiments, we see that the elasticity has a direct influence on the response times of small tasks, but not on the overall time consumption of the workload.
The tasks are just scheduled in a better way.

Comparing the fDTW provisioning time approximation histograms, we can validate the approximation approach of the fDTW algorithm, since we know which un-provisioning times we had configured. In the first experiment we see a significant peak from 0 ms to 25 ms; the configured stay alive time was 10 ms, combined with a short queue. For experiment 3 the fDTW provisioning time histogram shows three peaks, the first at 50 ms, which was the configured value for the stay alive time. The two following peaks at 130 ms and 200 ms are results of the longer queue. For experiment 5 we see a peak at 380 ms due to the queue length of 10, and a smaller peak at 250 ms (due to the smaller number of only 20 downsizing events). In the other experiments, which have a short queue and a relatively long stay alive time, we get a higher variance, but still peaks at the configured values. This indirect approach already works properly as an approximation of provisioning times. It could be further optimized by a distinction between upsizing mappings and downsizing mappings (by comparison of the values at those points). Due to the fact that we take workload change events and not the trigger events directly (in the thread pool case this would be a "queue full" event), we see a higher variation in the provisioning time approximations. But for validation of the approximation approach, we did not want to assume this intrinsic implementation knowledge.

When comparing the DTW time series alignments produced with the statistics tool R, we directly see that in experiment 1 the alignment line deviates only slightly from the optimal diagonal line. For experiment 5 a significantly higher deviation from the diagonal line can be detected. The experiment results on platform 2 (see appendix) show higher variances due to the processor architecture, which uses frequency over-boost and hyper-threading. The longer experiment run times are due to the lower level of physical concurrency.

To conclude the validation section, it can be said that the proposed elasticity metrics characterize the system's elasticity behaviour in a plausible way. Future work can be done to bring more portability and comparability into these values. The fDTW distance metric could serve as a unified relative metric that captures elasticity and enables an ordering of elastic systems that can execute the same workloads.

5. Elasticity Benchmark for Scaling Up of z/VM Virtual Machines

The concepts of the proposed elasticity metrics have been validated by the thread pool experiment. In addition, we set up experiments that exercise and measure the elasticity of resources other than threads. Be aware that if several virtual machines are running on the same physical hardware, these machines compete with each other for the physical resources. A very important aspect in the cloud computing context is the performance isolation of competing virtual machines that do not belong together, meaning they are not in the same performance-relevant cluster and are hosted for different customers. Performance isolation between virtual machines and elastic behaviour seem to be contradictory by concept. For future work it has to be researched whether elasticity and performance isolation can be combined in a transparent way. Approaches solving this problem could be of very high use to cloud providers and clients. One example focusing on the IBM Workload Manager (WLM) application can be found in [6]: When a low-priority process p1 is working alone on the execution platform, it can consume (almost) all available resources. Yet when a process p2 with a higher priority begins execution and demands resources currently used by process p1, the execution platform will re-assign resources to p2. Thus, an elasticity benchmark needs to explore the interplay between processes with different priorities. Additionally, as the resource demands of p2 rise, p1 will also have to release more and more of its resources to be assigned to p2. In this case we can witness elastic effects, as p2's service demand and performance rise at the same time when new resources are provided to p2. In this chapter I present an experiment which scales up a virtual machine in terms of CPU time slice resources, squeezing out another low-priority virtual machine.

5.1. Experiment Setup

We run two virtual machines, both "SUSE Linux Enterprise Server 11 for System z", within an IBM z/VM instance. The z/VM itself runs in a logical partition (LPAR) on a z10 BC mainframe. The LPAR has three dedicated IFL processors (which means it is the only one that can make use of them) and 8 GB of RAM. Both Linux virtual machines have 3 virtual cores assigned, so that they can make full use of the 3 provided IFLs when running alone in the z/VM instance. By explicitly dedicating the IFL processors, we minimize the impact of the first virtualization layer, namely the LPAR. The physical resources of the LPAR are directly mapped to the z/VM. The z/VM hypervisor distributes time slices of each IFL processor to the z/VM guests. The z/VM is set up with relative shares, without hard limits or absolute shares. Relative shares are not meant to provide performance isolation; they regulate resource access in case of hardware contention. We dynamically adapt these relative CPU shares of the two z/VM guests by using the z/VM module "Virtual Machine Resource Manager (VMRM)", which is set up with simple rules defined by us via a configuration file. VMRM can change the relative shares every 60 seconds. VM 1 has a very low priority and runs a constant workload that completely consumes the processing time of all three provided cores in case of no contention. VM 2 is set up with a high priority and starts a peak-shaped workload while VM 1 has already been running its constant workload for several minutes, consuming all provided CPU resources. We expect to see VM 1 squeezed out gradually as VM 2 starts its peak-shaped workload. This can be seen in the relative number of time slices VM 1 is obtaining. The constant workload running in VM 1 consists of small tasks having a duration of 10 ms. By observing the response times of these small tasks, we calculate the CPU time that VM 1 is forced by the hypervisor to give away to VM 2.
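Under the assumption that such a task consumes exactly 10 ms of CPU time when undisturbed, this indirect calculation can be sketched as follows (illustrative names; the actual calculation is part of the measurement tooling):

public class CpuShareEstimate {

    // CPU time a task needs when VM 1 runs undisturbed (Sec. 5.1).
    static final double ISOLATED_TASK_MS = 10.0;

    // Fraction of processing time VM 1 effectively received while the task ran.
    static double retainedCpuShare(double observedResponseTimeMs) {
        return ISOLATED_TASK_MS / observedResponseTimeMs;
    }

    // Wall-clock time during the task in which the CPU was handed over to VM 2.
    static double cpuTimeGivenAwayMs(double observedResponseTimeMs) {
        return observedResponseTimeMs - ISOLATED_TASK_MS;
    }

    public static void main(String[] args) {
        // At VM 2's peak the 10 ms tasks show response times of roughly 400 ms
        // (Sec. 5.3), i.e. VM 1 retains only about 2.5% of the processing time.
        System.out.println(retainedCpuShare(400.0));
        System.out.println(cpuTimeGivenAwayMs(400.0));
    }
}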

Figure 5.1.: Illustration of Virtual Machine Scale Up Experiment Setup [15]

5.2. Workload Generation

The workload is generated by the Ginpex measurement framework. More details on this can be found in Jóakim von Kistowski's bachelor thesis [15].

Figure 5.2.: Workload Design for Virtual Machine Scale Up Experiment [15]

5.3. Results

The results in Fig. 5.3 clearly show that VM 1 almost immediately gives away its processing resources when the peak workload increases. The 10 ms tasks in VM 1 have response times of about 400 ms when VM 2 is at its peak. VM 1 regains its resources and again shows response times of 10 ms after VM 2 has finished its workload. We were not yet able to apply elasticity metrics to this behaviour. As shown in [25], relative share settings tend to penalize low-priority VM guests. Our experiment and the platform setup aimed at almost squeezing out the low-priority guest. We observed that this guest recovered quickly after the contention dispersed.

Figure 5.3.: Time Slice Average vs. Workload Intensity

6. Future Work

6.1. Scale Up Experiment using CPU Cores as Resource

The Linux kernel provides the ability to disable and enable installed virtual cores dynamically at runtime, without a reboot. For this experiment, only a single virtual machine may run within the z/VM instance. This virtual machine has all cores assigned to the z/VM installed, but only one core is set to be online. To build an elastic system for dynamic CPU core scaling, we need a monitoring module that observes the CPU resource usage, and a second module that implements simple rules defining when to scale up and when to scale down. If one of these rules fires, a trigger event is thrown. This event should then be processed by a third module that is able to change the state of the installed cores from online to offline and vice versa. Having these modules implemented, a peak-shaped workload could exercise the elasticity of CPU cores as a resource. From the measurement results, elasticity metrics can be extracted and possibly compared between different Linux distributions and environments.
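Such a module could, for example, toggle cores through the sysfs interface /sys/devices/system/cpu/cpu<N>/online, assuming the guest kernel exposes that file for the cores in question (a minimal sketch with illustrative names; running it requires root privileges, and cpu0 can typically not be taken offline):

import java.io.FileWriter;
import java.io.IOException;

public class CpuCoreScaler {

    // Writes "1" or "0" to the sysfs control file of the given core.
    static void setCoreOnline(int core, boolean online) throws IOException {
        FileWriter writer = new FileWriter("/sys/devices/system/cpu/cpu" + core + "/online");
        try {
            writer.write(online ? "1" : "0");
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        setCoreOnline(1, true);   // scale up: bring a second core online
        setCoreOnline(1, false);  // scale down again
    }
}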

6.2. Elasticity Benchmark for Scaling out the Number of z/VM Virtual Machines Instances in a Performance Group

This experiment is supposed to measure the time from requesting a virtual machine instance until finally holding it. Scaling out means adding virtual machine instances that are identical in resources and configuration to a cluster in which a load balancer distributes incoming requests. Having the "Extreme Cloud Administration Toolkit (xCAT)" installed on an administration machine with access to the z/VM instance enables us to clone a virtual machine with a single command from xCAT. xCAT is open source and available at "http://xcat.sourceforge.net/". The z/VM guest to be cloned is shut down, cloned and started again. Within the VM clone, the host name and MAC address are adapted before the clone is started. The duration of this cloning process depends on the size of the virtual machine to be cloned and on the underlying hardware features. For example, one can make use of the IBM FlashCopy feature the z10 BC has built in and activated. With such acceleration features the duration for cloning a virtual machine can be minimized.

As in the scale-up experiment using CPU cores, it would be necessary to have a monitoring module implemented. This monitoring could, for example, be done by a Ganglia monitor. Secondly, a rule-based system has to be implemented that triggers an event and initializes the cloning or shutdown of a virtual machine. In contrast to the scale-up experiment using CPU cores, where the Linux scheduler distributes the workload tasks to the resources, we need to enable the workload and measurement side to balance the workload between all available virtual machines. Running workload tasks may not be interrupted by the cloning or stopping of a virtual machine. Having this scale-out experiment implemented, one would be able to exercise the elasticity of virtual machines as resources. The provisioning time, which is the time the system needs to clone a VM, may be interesting when comparing cloud infrastructure providers.

7. Conclusion

This study thesis outlined the idea of scalability and defined the term elasticity as well as elasticity characteristics. Conceptual ways to measure and approximate metrics that quantify elasticity have been discussed. These concepts were then validated by the extensive experiment on Java thread pools, exercising the virtual resource of threads. All presented elasticity metrics have been extracted in this experiment in a consistent way. Values for provisioning times have been approximated by indirect measurement and DTW algorithms. I proposed the DTW distance metric as a suitable approach for a unified relative elasticity metric, enabling comparison between and ordering of platforms that are able to run the same workloads. In a subsequent experiment, we exercised the resource of CPU time slices and saw how a hypervisor can dynamically distribute computation time between high- and low-priority guests under hardware contention. For future work and further validation of the presented concepts, I briefly outlined an experiment setup that exercises the elasticity of CPU cores, and another experiment setup for varying the number of virtual machine instances that belong to a performance-relevant cluster and share the same workloads.

8. Acknowledgement

I would like to thank all people who helped to make this study thesis research work possible.

• My advisor Dr. Michael Kuperberg for outstanding supervision - endless ideas, energy, uncountable hours of time and helpful feedback.
• My research colleague Jóakim von Kistowski, who wrote his bachelor thesis in a closely related research field, for the perfect collaboration at all times.
• My second advisor Nikolaus Huber for his feedback and time.
• Michael Hauck, Philipp Kern and Philipp Merkle for help with implementation and configuration.
• The IBM Research and Development Lab in Böblingen for making the IIC possible, and especially Uwe Denneler, Elisabeth Puritscher and Erich Amrehn for their technical support and advice.

Bibliography

[1] Kuperberg, M., Herbst, N.R., von Kistowski, J.G., Reussner, R.: Defining and Quantifying Elasticity of Resources in Cloud Computing and Scalable Platforms. Technical report, Informatics Innovation Center, Karlsruhe Institute of Technology, Karlsruhe, Germany (June 2011)
[2] Cafaro, M., Aloisio, G.: Grids, Clouds and Virtualization. Computer Communications and Networks. Springer-Verlag London Limited, London (2011)
[3] Baun, C., Kunze, M., Nimis, J., Tai, S.: Cloud Computing: Web-basierte dynamische IT-Services. Informatik im Fokus, SpringerLink: Bücher. Springer-Verlag Berlin Heidelberg, Berlin, Heidelberg (2011)
[4] Smith, J., Nair, R.: Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, San Francisco, CA (2005)
[5] Spruth, W.G.: System z and z/OS unique Characteristics. Technical report, Universität Tübingen (2010) http://tobias-lib.uni-tuebingen.de/volltexte/2010/4710, last consulted on May 23rd, 2011.
[6] Teuffel, M., Vaupel, R.: Das Betriebssystem z/OS und die zSeries. Oldenbourg Wissenschaftsverlag GmbH (2004)
[7] Jogalekar, P., Woodside, M.: Evaluating the Scalability of Distributed Systems. IEEE Trans. Parallel Distrib. Syst. 11 (June 2000) 589–603
[8] Gartner: Five Refining Attributes of Public and Private Cloud Computing (2009)
[9] Amazon Web Services: Auto Scaling Developer Guide (API Version 2010-08-01). (2010) http://docs.amazonwebservices.com/AutoScaling/latest/DeveloperGuide/.
[10] Chiu, D.: Elasticity in the cloud. Crossroads 16 (March 2010) 3–4
[11] Cáceres, J., Vaquero, L.M., Rodero-Merino, L., Polo, A., Hierro, J.J.: Service Scalability Over the Cloud. In Furht, B., Escalante, A., eds.: Handbook of Cloud Computing. Springer US (2010) 357–377
[12] Woodside, M.: Scalability Metrics and Analysis of Mobile Agent Systems. In Wagner, T., Rana, O., eds.: Infrastructure for Agents, Multi-Agent Systems, and Scalable Multi-Agent Systems. Volume 1887 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2001) 234–245
[13] IBM: IBM CloudBurst V2.1 on Power Systems helps IT organizations rapidly provide dynamic cloud services (2010) http://www-03.ibm.com/systems/power/solutions/cloud/cloudburst/, last consulted on May 23rd, 2011.
[14] Lee, J.B., Ware, B.: Open Source Web Development with LAMP: Using Linux, Apache, MySQL, Perl, and PHP. Addison-Wesley, Boston, Mass. (2003)

[15] von Kistowski, J.G.: Defining and Measuring Workloads for Elasticity Benchmarking. Bachelor Thesis at Informatics Innovation Center Karlsruhe, Karlsruhe Institute of Technology (August 2011)
[16] Keogh, E.J., Pazzani, M.J.: Scaling up Dynamic Time Warping to Massive Datasets. In: Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery. PKDD '99, London, UK, Springer-Verlag (1999) 1–11
[17] Salvador, S., Chan, P.: Toward Accurate Dynamic Time Warping in Linear Time and Space. Intell. Data Anal. 11 (October 2007) 561–580
[18] Chapman, C., Emmerich, W., Márquez, F.G., Clayman, S., Galis, A.: Software Architecture Definition for On-demand Cloud Provisioning. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. HPDC '10, New York, NY, USA, ACM (2010) 61–72
[19] Hu, Y., Wong, J., Iszlai, G., Litoiu, M.: Resource Provisioning for Cloud Computing. In: Proceedings of the 2009 Conference of the Center for Advanced Studies on Collaborative Research. CASCON '09, New York, NY, USA, ACM (2009) 101–111
[20] Kim, H., Kim, W., Kim, Y.: Predictable Cloud Provisioning Using Analysis of User Resource Usage Patterns in Virtualized Environment. In: Grid and Distributed Computing, Control and Automation. Volume 121 of Communications in Computer and Information Science. Springer Berlin Heidelberg (2010) 84–94
[21] Zhu, Q., Agrawal, G.: Resource Provisioning with Budget Constraints for Adaptive Applications in Cloud Environments. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. HPDC '10, New York, NY, USA, ACM (2010) 304–307
[22] Meng, X., Isci, C., Kephart, J., Zhang, L., Bouillet, E., Pendarakis, D.: Efficient Resource Provisioning in Compute Clouds via VM Multiplexing. In: Proceedings of the 7th International Conference on Autonomic Computing. ICAC '10, New York, NY, USA, ACM (2010) 11–20
[23] Oracle: Documentation of the class java.util.concurrent.ThreadPoolExecutor (2010) http://download.oracle.com/-javase/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html; last consulted on May 23rd, 2011.
[24] Kuperberg, M., Krogmann, M., Reussner, R.: TimerMeter: Quantifying Accuracy of Software Timers for System Analysis. In: Proceedings of the 6th International Conference on Quantitative Evaluation of SysTems (QEST) 2009. (2009)
[25] van der Heij, R.: Why Relative Share Does Not Work. http://www.velocitysoftware.com/relshare.pdf (March 2010)

Appendix

A. Experiment 2 and 4 on Platform 1

A.1. Experiment 2 - Platform 1

• Executor configuration: queue length 2, stay alive time 250 ms
• Experiment duration: 3050 ms
• Resizing ratio: 42/234 = 18%
• Maximum number of threads reached: 21
• Jump size distribution: 1: 20 times UP and 21 times DOWN
• fDTW distance: 240

Figure A.1.: Experiment 2 Platform 1: Thread Pool Size vs. Workload Intensity

Figure A.2.: Experiment 2 Platform 1: Difference between Thread Pool Size and Workload Intensity

Figure A.3.: Experiment 2 Platform 1: Response Time Histogram

Figure A.4.: Experiment 2 Platform 1: Wait Time Histogram

Figure A.5.: Experiment 2 Platform 1: fDTW Provisioning Time Approximation Histogram

Figure A.6.: Experiment 2 Platform 1: R DTW Mapping

Figure A.7.: Experiment 2 Platform 1: R DTW Cost Matrix

A.2. Experiment 4 - Platform 1

• Executor configuration: queue length 10, stay alive time 10 ms
• Experiment duration: 2850 ms
• Resizing ratio: 33/248 = 13%
• Maximum number of threads reached: 21
• Jump size distribution:
  – 1: 20 times UP and 9 times DOWN
  – 2: 2 times DOWN
  – 8: 1 time DOWN
• fDTW distance: 4280

Figure A.8.: Experiment 4 Platform 1: Thread Pool Size vs. Workload Intensity

Figure A.9.: Experiment 4 Platform 1: Difference between Thread Pool Size and Workload Intensity

Figure A.10.: Experiment 4 Platform 1: Response Time Histogram

Figure A.11.: Experiment 4 Platform 1: Wait Time Histogram

Figure A.12.: Experiment 4 Platform 1: fDTW Provisioning Time Approximation Histogram

Figure A.13.: Experiment 4 Platform 1: R DTW Mapping

Figure A.14.: Experiment 4 Platform 1: R DTW Cost Matrix

B. Experiments 1-5 on Platform 2

B.1. Experiment 1 - Platform 2

• Executor configuration: queue length 2, stay alive time 10 ms
• Experiment duration: 5300 ms
• Resizing ratio: 75/204 = 36%
• Maximum number of threads reached: 28
• Jump size distribution:
  – 1: 46 times UP and 17 times DOWN
  – 2: 1 time UP and 4 times DOWN
  – 3: 4 times DOWN
  – 4: 1 time UP
  – 7: 1 time DOWN
  – 8: 1 time DOWN
• fDTW distance: 259

Figure B.15.: Experiment 1 Platform 2: Thread Pool Size vs. Workload Intensity

Figure B.16.: Experiment 1 Platform 2: Difference between Thread Pool Size and Workload Intensity

Figure B.17.: Experiment 1 Platform 2: Response Time Histogram

Figure B.18.: Experiment 1 Platform 2: Wait Time Histogram

Figure B.19.: Experiment 1 Platform 2: fDTW Provisioning Time Approximation Histogram

Figure B.20.: Experiment 1 Platform 2: R DTW Mapping

Figure B.21.: Experiment 1 Platform 2: R DTW Cost Matrix

B.2. Experiment 2 - Platform 2

• Executor configuration: queue length 2, stay alive time 250 ms
• Experiment duration: 5300 ms
• Resizing ratio: 44/110 = 40%
• Maximum number of threads reached: 34
• Jump size distribution:
  – 1: 26 times UP and 12 times DOWN
  – 2: 2 times DOWN
  – 4: 1 time UP
  – 6: 1 time UP
  – 21: 1 time DOWN
• fDTW distance: 329

Figure B.22.: Experiment 2 Platform 2: Thread Pool Size vs. Workload Intensity

Figure B.23.: Experiment 2 Platform 2: Difference between Thread Pool Size and Workload Intensity

Figure B.24.: Experiment 2 Platform 2: Response Time Histogram

Figure B.25.: Experiment 2 Platform 2: Wait Time Histogram

Figure B.26.: Experiment 2 Platform 2: fDTW Provisioning Time Approximation Histogram

Figure B.27.: Experiment 2 Platform 2: R DTW Mapping

Figure B.28.: Experiment 2 Platform 2: R DTW Cost Matrix

B.3. Experiment 3 - Platform 2

• Executor configuration: queue length 4, stay alive time 50 ms
• Experiment duration: 5850 ms
• Resizing ratio: 54/190 = 28%
• Maximum number of threads reached: 18
• Jump size distribution:
  – 1: 28 times UP and 19 times DOWN
  – 2: 1 time UP and 3 times DOWN
  – 3: 2 times DOWN
• fDTW distance: 649

Figure B.29.: Experiment 3 Platform 2: Thread Pool Size vs. Workload Intensity

Figure B.30.: Experiment 3 Platform 2: Difference between Thread Pool Size and Workload Intensity

Figure B.31.: Experiment 3 Platform 2: Response Time Histogram

Figure B.32.: Experiment 3 Platform 2: Wait Time Histogram

Figure B.33.: Experiment 3 Platform 2: fDTW Provisioning Time Approximation Histogram

Figure B.34.: Experiment 3 Platform 2: R DTW Mapping

Figure B.35.: Experiment 3 Platform 2: R DTW Cost Matrix

B.4. Experiment 4 - Platform 2

• Executor configuration: queue length 10, stay alive time 10 ms
• Experiment duration: 5600 ms
• Resizing ratio: 57/194 = 29%
• Maximum number of threads reached: 30
• Jump size distribution:
  – 1: 35 times UP and 17 times DOWN
  – 3: 2 times DOWN
  – 6: 1 time DOWN
  – 7: 1 time DOWN
• fDTW distance: 2901

thread pool size vs. number of tasks in executor 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15

thread pool size vs. number of tasks in executor 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 250 500 750 1.000 1.250 1.500 1.750 2.000 2.250 2.500 2.750 3.000 3.250 3.500 3.750 4.000 4.250 4.500 4.750 5.000 5.250 5.500 5.750 wall-clock time in milliseconds

4_threads_pool_size_at_runtime 8_tasks_in_executor

Figure B.36.: Experiment 4 Platform 2: Thread Pool Size vs. Workload Intensity 78 Appendix

(Plot of the difference between tasks in executor and thread pool size over wall-clock time in milliseconds; series 10_difference_4-8.)

Figure B.37.: Experiment 4 Platform 2: Difference between Thread Pool Size and Workload Intensity

(Histogram of frequency over responseTime [ms].)

Figure B.38.: Experiment 4 Platform 2: Response Time Histogram

(Histogram of frequency over waitTime [ms].)

Figure B.39.: Experiment 4 Platform 2: Wait Time Histogram

(Histogram of frequency over fDTW provisioning time [ms]; series 3_DTW_provisioningtime.)

Figure B.40.: Experiment 4 Platform 2: fDTW Provisioning Time Approximation Histogram

(Plot of the DTW mapping between resource amount and workload intensity over the count of changes.)

Figure B.41.: Experiment 4 Platform 2: R DTW Mapping

(Plot of the DTW time series alignment and cost matrix: provided resources vs. workload intensity.)

Figure B.42.: Experiment 4 Platform 2: R DTW Cost Matrix

B.5. Experiment 5 - Platform 2

• executor configuration: queue length 10, stay alive time 250 ms
• experiment duration: 5750 ms
• resizing ratio: 28/219 = 13%
• maximum number of threads reached: 15
• jump size distribution:
  1: 14 times UP and 11 times DOWN
  2: 2 times DOWN
• fDTW distance: 3528

(Plot of thread pool size and number of tasks in executor over wall-clock time in milliseconds; series 4_threads_pool_size_at_runtime and 8_tasks_in_executor.)

Figure B.43.: Experiment 5 Platform 2: Thread Pool Size vs. Workload Intensity

(Plot of the difference between tasks in executor and thread pool size over wall-clock time in milliseconds; series 10_difference_4-8.)

Figure B.44.: Experiment 5 Platform 2: Difference between Thread Pool Size and Workload Intensity

(Histogram of frequency over responseTime [ms].)

Figure B.45.: Experiment 5 Platform 2: Response Time Histogram

(Histogram of frequency over waitTime [ms].)

Figure B.46.: Experiment 5 Platform 2: Wait Time Histogram

(Histogram of frequency over fDTW provisioning time [ms]; series 3_DTW_provisioningtime.)

Figure B.47.: Experiment 5 Platform 2: fDTW Provisioning Time Approximation Histogram

(Plot of the DTW mapping between resource amount and workload intensity over the count of changes.)

Figure B.48.: Experiment 5 Platform 2: R DTW Mapping

(Plot of the DTW time series alignment and cost matrix: provided resources vs. workload intensity.)

Figure B.49.: Experiment 5 Platform 2: R DTW Cost Matrix

List of Figures

2.1. Schematic example of a scenario where execution platform is not scalable: the (idealized) correlation between application workload intensity and application response time ...... 9
2.2. Schematic Example of a Scenario with a fixed Application Workload and scalable execution platform: the (idealized) correlation between amount of resources provided by the platform and application response time ...... 10

3.1. Schematic Example of an (unrealistically) ideal elastic System with immediate and fully-compensating elasticity ...... 11
3.2. Schematic Example of an Elastic System ...... 12
3.3. Three aspects of the Proposed Elasticity Metric ...... 13
3.4. Elasticity Matrix for an Example Platform ...... 13
3.5. Two different Approaches for Measuring Provisioning Times [15] ...... 16
3.6. Finding the right correlation between Trigger and Effect event for provisioning time extraction ...... 17
3.7. Illustration of a Dynamic Time Warping Result [17] ...... 18
3.8. A Cost Matrix for Fig. 3.7 with the Minimum-Distance Warp Path traced through it [17] ...... 18

4.1. Illustration of the Thread Pool Pattern [15] ...... 22
4.2. Experiment 1 Platform 1: Thread Pool Size vs. Workload Intensity ...... 28
4.3. Experiment 1 Platform 1: Difference between Thread Pool Size and Workload Intensity ...... 29
4.4. Experiment 1 Platform 1: Response Time Histogram ...... 29
4.5. Experiment 1 Platform 1: Wait Time Histogram ...... 30
4.6. Experiment 1 Platform 1: fDTW Provisioning Time Approximation Histogram ...... 30
4.7. Experiment 1 Platform 1: R DTW Mapping with the lower line as workload intensity as cause for changes in resource amount (upper line) ...... 31
4.8. Experiment 1 Platform 1: R DTW Cost Matrix ...... 31
4.9. Experiment 3 Platform 1: Thread Pool Size vs. Workload Intensity ...... 33
4.10. Experiment 3 Platform 1: Difference between Thread Pool Size and Workload Intensity ...... 34
4.11. Experiment 3 Platform 1: Response Time Histogram ...... 34
4.12. Experiment 3 Platform 1: Wait Time Histogram ...... 35
4.13. Experiment 3 Platform 1: fDTW Provisioning Time Approximation Histogram ...... 35
4.14. Experiment 3 Platform 1: R DTW Mapping with the lower line as workload intensity as cause for changes in resource amount (upper line) ...... 36
4.15. Experiment 3 Platform 1: R DTW Cost Matrix ...... 36
4.16. Experiment 5 Platform 1: Thread Pool Size vs. Workload Intensity ...... 38

4.17. Experiment 5 Platform 1: Difference between Thread Pool Size and Workload Intensity ...... 39
4.18. Experiment 5 Platform 1: Response Time Histogram ...... 39
4.19. Experiment 5 Platform 1: Wait Time Histogram ...... 40
4.20. Experiment 5 Platform 1: fDTW Provisioning Time Approximation Histogram ...... 40
4.21. Experiment 5 Platform 1: R DTW Mapping with the lower line as workload intensity as cause for changes in resource amount (upper line) ...... 41
4.22. Experiment 5 Platform 1: R DTW Cost Matrix ...... 41

5.1. Illustration of Virtual Machine Scale Up Experiment Setup [15] ...... 46
5.2. Workload Design for Virtual Machine Scale Up Experiment [15] ...... 47
5.3. Time Slice Average vs. Workload Intensity ...... 48

A.1. Experiment 2 Platform 1: Thread Pool Size vs. Workload Intensity ...... 57
A.2. Experiment 2 Platform 1: Difference between Thread Pool Size and Workload Intensity ...... 58
A.3. Experiment 2 Platform 1: Response Time Histogram ...... 58
A.4. Experiment 2 Platform 1: Wait Time Histogram ...... 59
A.5. Experiment 2 Platform 1: fDTW Provisioning Time Approximation Histogram ...... 59
A.6. Experiment 2 Platform 1: R DTW Mapping ...... 60
A.7. Experiment 2 Platform 1: R DTW Cost Matrix ...... 60
A.8. Experiment 4 Platform 1: Thread Pool Size vs. Workload Intensity ...... 61
A.9. Experiment 4 Platform 1: Difference between Thread Pool Size and Workload Intensity ...... 62
A.10. Experiment 4 Platform 1: Response Time Histogram ...... 62
A.11. Experiment 4 Platform 1: Wait Time Histogram ...... 63
A.12. Experiment 4 Platform 1: fDTW Provisioning Time Approximation Histogram ...... 63
A.13. Experiment 4 Platform 1: R DTW Mapping ...... 64
A.14. Experiment 4 Platform 1: R DTW Cost Matrix ...... 64
B.15. Experiment 1 Platform 2: Thread Pool Size vs. Workload Intensity ...... 65
B.16. Experiment 1 Platform 2: Difference between Thread Pool Size and Workload Intensity ...... 66
B.17. Experiment 1 Platform 2: Response Time Histogram ...... 66
B.18. Experiment 1 Platform 2: Wait Time Histogram ...... 67
B.19. Experiment 1 Platform 2: fDTW Provisioning Time Approximation Histogram ...... 67
B.20. Experiment 1 Platform 2: R DTW Mapping ...... 68
B.21. Experiment 1 Platform 2: R DTW Cost Matrix ...... 68
B.22. Experiment 2 Platform 2: Thread Pool Size vs. Workload Intensity ...... 69
B.23. Experiment 2 Platform 2: Difference between Thread Pool Size and Workload Intensity ...... 70
B.24. Experiment 2 Platform 2: Response Time Histogram ...... 70
B.25. Experiment 2 Platform 2: Wait Time Histogram ...... 71
B.26. Experiment 2 Platform 2: fDTW Provisioning Time Approximation Histogram ...... 71
B.27. Experiment 2 Platform 2: R DTW Mapping ...... 72
B.28. Experiment 2 Platform 2: R DTW Cost Matrix ...... 72
B.29. Experiment 3 Platform 2: Thread Pool Size vs. Workload Intensity ...... 73

B.30. Experiment 3 Platform 2: Difference between Thread Pool Size and Workload Intensity ...... 74
B.31. Experiment 3 Platform 2: Response Time Histogram ...... 74
B.32. Experiment 3 Platform 2: Wait Time Histogram ...... 75
B.33. Experiment 3 Platform 2: fDTW Provisioning Time Approximation Histogram ...... 75
B.34. Experiment 3 Platform 2: R DTW Mapping ...... 76
B.35. Experiment 3 Platform 2: R DTW Cost Matrix ...... 76
B.36. Experiment 4 Platform 2: Thread Pool Size vs. Workload Intensity ...... 77
B.37. Experiment 4 Platform 2: Difference between Thread Pool Size and Workload Intensity ...... 78
B.38. Experiment 4 Platform 2: Response Time Histogram ...... 78
B.39. Experiment 4 Platform 2: Wait Time Histogram ...... 79
B.40. Experiment 4 Platform 2: fDTW Provisioning Time Approximation Histogram ...... 79
B.41. Experiment 4 Platform 2: R DTW Mapping ...... 80
B.42. Experiment 4 Platform 2: R DTW Cost Matrix ...... 80
B.43. Experiment 5 Platform 2: Thread Pool Size vs. Workload Intensity ...... 81
B.44. Experiment 5 Platform 2: Difference between Thread Pool Size and Workload Intensity ...... 82
B.45. Experiment 5 Platform 2: Response Time Histogram ...... 82
B.46. Experiment 5 Platform 2: Wait Time Histogram ...... 83
B.47. Experiment 5 Platform 2: fDTW Provisioning Time Approximation Histogram ...... 83
B.48. Experiment 5 Platform 2: R DTW Mapping ...... 84
B.49. Experiment 5 Platform 2: R DTW Cost Matrix ...... 84