Resource Contention Detection and Management for Consolidated Workloads

J. Mukherjee, Dept. of ECE, Univ. of Calgary, Canada. Email: [email protected]
D. Krishnamurthy, Dept. of ECE, Univ. of Calgary, Canada. Email: [email protected]
J. Rolia, HP Labs, Palo Alto, CA, USA. Email: [email protected]
C. Hyser, HP Labs, Palo Alto, CA, USA. Email: chris [email protected]

Abstract—Public and private cloud computing environments typically employ virtualization methods to consolidate application workloads onto shared servers. Modern servers typically have one or more sockets each with one or more computing cores, a multi-level caching hierarchy, a memory subsystem, and an interconnect to the memory of other sockets. While resource management methods may manage application performance by controlling the sharing of processing time and input-output rates, there is generally no management of contention for virtualization kernel resources or for the caching, memory, and interconnect subsystems. Yet such contention can have a significant impact on application performance. Hardware platform specific counters have been proposed for detecting such contention. We show that such counters are not always sufficient for detecting contention. We propose a software probe based approach for detecting contention for shared platform resources and demonstrate its effectiveness. We show that the probe imposes a low overhead and is remarkably effective at detecting performance degradations due to inter-VM interference over a wide variety of workload scenarios. Our approach supports the management of workload placement on shared servers and pools of shared servers.

I. INTRODUCTION

Large scale virtualized resource pools such as private and public clouds are being used to host many kinds of applications. In general, each application is deployed to a virtual machine (VM) which is then assigned to a server in the pool. Performance management for applications can be a challenge in such environments. Each VM may be assigned a certain share of the processing and input/output capacity of the server. However, servers still have other resources that are shared by VMs but that are not managed directly.
These include virtual machine monitor (VMM) resources and the server's memory hierarchy. VMs may interfere with each others' performance as they use these resources. Performance degradation from contention for such resources can be significant and especially problematic for interactive applications which may have stringent end-user response time requirements. We present a method for detecting contention for such resources that is appropriate for both interactive and batch style applications. The method supports runtime management by continually reporting on contention so that actions can be taken to overcome problems that arise.

Detecting contention for shared but unmanaged resources is a key challenge. In cloud environments, management systems typically do not have access to application metrics such as response times. Even if they did it would be difficult, over short time scales, to infer whether variations in an application's response times are normal fluctuations due to the nature of the application or whether they are due to interference on the server. Others have reported success at detecting such contention by using physical host level metrics, e.g., CPU utilization, Clock Cycles per Instruction (CPI), and hit rates, to predict performance violations of applications running within VMs [5], [6], [3], [14]. However, these approaches focus on scientific and batch applications. Such applications do not have the high degree of request concurrency or OS-intensive activity that is typical for interactive applications, e.g., high volume Internet services. We show that the existing approaches are not always effective for recognizing contention among VMs in environments that host interactive applications.

We propose and evaluate an alternative approach for detecting contention. This approach makes use of a probe VM running a specially designed low overhead application. The probe executes sections of code capable of recognizing contention for various unmanaged resources shared among the VMs. Baseline execution times are collected for these sections of code while executing the probe VM in isolation on a given server. These values can be continuously compared with those gathered when the probe runs alongside other VMs on the server. Any statistically significant deviation between these sets of measures can be reported to a management controller with information on the type of resource contention that is present, thus allowing a management controller to initiate actions to remedy the problem such as migrating a VM.

In this paper, we implement a probe designed to identify problems related to high TCP/IP connection arrival rates and memory contention. A method is given for automatically customizing the probe for a host by taking into account the host's hardware and software characteristics. Using applications drawn from the well-known RUBiS [6] and DACAPO [7] benchmark suites, we show that the probe is remarkably effective at detecting performance degradations due to inter-VM interference. Furthermore, the probe imposes very low overheads. In the worst case scenario, it caused 1.5% additional per-core CPU utilization and a 7% degradation in application response time performance.

The remainder of the paper is organized as follows. Section II gives a brief introduction to modern server architectures and VMMs. Section III describes the design of the probe. Section IV demonstrates the effectiveness of the approach. Related work is discussed in Section V. Section VI offers summary and concluding remarks.

II. SERVER ARCHITECTURES AND VIRTUAL MACHINE MONITORS

Modern processors from Intel and AMD have adopted socket based architectures with non-uniform memory access (NUMA) between sockets. Each socket itself is a physical package with a connection to its own local, i.e., "near", memory subsystem and an inter-socket communication connection possibly to other sockets and "far" memory, i.e., remote memory, attached to other sockets in the system. It contains multiple processing cores all sharing a large L3 cache to limit transactions leaving the socket. The cores in Intel processors may additionally implement hyper-threading by creating two logical cores (separate architectural state, but shared execution units), introducing additional shared resources subject to contention.

Modern operating systems such as Linux and Windows have scheduling and memory allocation support for NUMA hardware. Virtual machine monitors (VMM) such as the Kernel-based Virtual Machine (KVM), being based on the Linux kernel, provide a similar level of support as well. The key characteristic of such mechanisms is the ability to pin processes and VMs to specific cores. Furthermore, a process or a VM can be configured using these mechanisms to use only the local memory. This minimizes contention on the inter-socket communication connections due to L3 cache miss traffic. From a management perspective, this motivates the need to pin a VM to a single socket and assign resources to it from within that socket.
We note that VMM kernel resources and caches represent unmanaged resources since there are typically no mechanisms to share these in a controlled manner between VMs.

III. PROBE DESIGN

The probe we introduce has two parts. The first part is the sensor application that runs in a VM and makes use of shared server resources. We denote this as the probe sensor. We require one probe sensor VM per socket. The second part is a probe load generator. It causes the probe sensor to execute at specific times and measures its response times. The load generator is deployed on a separate host. The load generator is implemented using an instance of the httperf tool [11]. Measurements on our test setup reveal that httperf can support 10000 connections per second (cps) per core. Based on our final probe design outlined in Section IV-D3, this suggests that each core on a load generator host can support the monitoring of up to 400 probe sensors and hence managed sockets.

For this study, we narrow our focus to problems arising from high TCP/IP connection arrival rates and inter-VM memory contention. Accordingly, the probe sensor alternates between two phases of processing. The first phase focuses on sensing contention at the VMM related to establishing TCP/IP connections. The second phase deals with contention for the memory hierarchy. We refer to these as the connection and memory phases respectively. The probe load generator controls which phase the probe sensor is in and is able to communicate any inferred performance degradations to management controllers.

In the connection phase, an Apache Web Server instance within the probe sensor VM is subjected to a burst of c Web cps over a period of t1 seconds. The Apache instance services a connection request by executing a CPU-bound PHP script. In the memory phase, the probe sensor VM executes a script that traverses an array of integers of size n for t2 seconds, where n is related to the L3 cache size for the socket. Specifically, the value of n is chosen so as to cause an L3 miss rate of 0.1. The values of c, t1, n, and t2 need to be carefully tuned for each type of server in a resource pool such that the probe imposes low overhead while having the ability to identify contention among VMs.

To tune the probe for a particular server, we employ static information about the host socket cache size to choose n, and a set of micro-benchmark applications. Each micro-benchmark is known to stress a particular shared host resource. To tune one of the probe's phases, the number of its corresponding micro-benchmark VMs is increased until performance degradation is observed. This process is repeated again but with the probe in place. Parameters for the probe, i.e., c, t1, and t2, are optimized in a one factor at a time manner until the probe has low resource utilization yet correctly reports the degradations incurred by the micro-benchmark VMs. Details of the tuning process are given in Section IV-D3.
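To make the memory phase concrete, the sketch below is one way such a sensor script could be written. It is a minimal illustration under stated assumptions, not the paper's implementation: the stride is an assumption we introduce, while the array length of 5000000 and the 25 second phase length are the tuned values reported later in the case study.

```python
import array
import time

N = 5_000_000   # array length n; with 8-byte elements on a 64-bit Linux guest this
                # is roughly 40 MB, several times the 6 MB shared L3, so traversals
                # spill out of the cache
STRIDE = 16     # elements per step (128 bytes); an assumed stride so that
                # consecutive accesses touch different cache lines

def memory_phase(t2_seconds=25.0):
    """Traverse the array repeatedly for t2 seconds; return mean ms per pass."""
    data = array.array('l', range(N))
    start = time.monotonic()
    deadline = start + t2_seconds
    passes = 0
    checksum = 0
    while time.monotonic() < deadline:
        for i in range(0, N, STRIDE):
            checksum += data[i]
        passes += 1
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return elapsed_ms / max(passes, 1)
```

Longer per-pass times relative to the in-isolation baseline would indicate contention for the socket's memory hierarchy.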

IV. CASE STUDY

Our experimental design has four parts. The first part demonstrates the importance of socket affinity for workload management. The second part focuses on OS and hardware counters. It assesses their effectiveness at detecting contention for shared but unmanaged resources. The third part demonstrates the effectiveness of the probe approach at detecting such contention for the micro-benchmark applications and the tuning of the probe using the micro-benchmark applications. Finally, the fourth part demonstrates the effectiveness of the tuned probe when used with well known benchmarks of interactive and batch workloads that are more representative of those applications likely to be executed in cloud environments. The measurement setup we employ is described in Section IV-A. Sections IV-B through IV-E present the results of the four parts of the study, respectively.

A. Measurement Setup and Micro-benchmarks

For all our experiments, we use a 2 socket, 4 cores per socket AMD Shanghai Opteron 2378 Processor server. The server has private per-core L1 (256 KB) and L2 (2 MB) caches and a shared per-socket L3 cache (6 MB). It has an 8 GB NUMA memory node attached to each socket. The server has an Intel 82576 Gigabit NIC with two 1 Gbps ports. The two sockets are connected through a HyperTransport 3.0 link.

We present experiments where the server is used with and without virtualization. For virtualization we used KVM version qemu-kvm-0.12.5. We configured each VM with 1024 MB of memory, 1 VCPU, 12 GB disk space, and a "Bridge" type network interface. The physical host as well as the VMs executed the Linux Ubuntu 10.10 (kernel version 2.6.35-22-generic) OS. The Linux-Apache-MySQL-PHP (LAMP) stack is used with Apache version 2.0.64 and MySQL version 5.1.49-1 in both the non-virtualized and virtualized experiments. We tuned Apache, MySQL, the guest OSs and the VMM to support tests with a large number of concurrent TCP/IP connections. In particular, the connection pool sizes of Apache and MySQL were configured to handle up to 1024 concurrent connections. We also used Linux IRQ affinity mechanisms to ensure that network interrupt processing is balanced across the cores in the system.

The server is connected to 6 load generator PCs through a 48 port non-blocking Fast Ethernet switch that provides dedicated 1 Gbps connectivity between any two machines in the setup. The load generator PCs are non-virtualized. Each load generator is responsible for generating synthetic requests to one application VM involved in an experiment. Our experiments involve both interactive and memory-intensive applications. For an interactive application, an instance of the httperf tool is executed on a load generator to submit multiple concurrent requests to the application and record request response times. For a memory-intensive application, a custom script is executed to collect application execution times. We also collect several OS-level and hardware-level measurements for the server under study. The Linux collectl tool was used to measure CPU and disk utilization of the Web server at a 1-second granularity. To get deeper insights into the results, we used the Oprofile tool to monitor hardware events such as the misses at various levels of the cache hierarchy.

We consider two Web server micro-benchmarks and one memory micro-benchmark. The Web server micro-benchmarks involve an Apache Web server that serves statistically-identical requests. In one case the request processing is very light weight – we refer to this as the kernel-intensive workload. It mimics a workload that causes significant TCP/IP kernel activity by serving a single static 4KB file. In another case the request processing causes additional user space processor usage – we refer to this as the application-intensive workload. Each request invokes a PHP script on the web server that causes an exponentially distributed CPU service demand of 0.02 ms. It has approximately ten times the user space level processing per connection as the kernel-intensive workload. The memory micro-benchmark we use is the RAMspeed micro-benchmark of the open-source Phoronix benchmark suite [8], which is designed to cause significant cache misses.

B. Management by Socket

The first part of our study considers the relationship between workload performance and workload placement within a multi-socket server. For these experiments we do not employ a VMM. We consider three scenarios: host affinity, socket affinity, and core affinity. For the scenarios, the Linux CPU Hotplug utility is used to enable and disable cores on sockets. Host affinity uses the default Linux scheduler to manage process scheduling across cores enabled on two sockets. Socket affinity restricts a workload to a specific socket but allows it to freely migrate among the cores in that socket. Core affinity binds each workload to its own distinct core using the Linux taskset utility.

A first experiment showed that core affinity outperformed host affinity for both kernel and application intensive workloads by enabling up to 40% more application level throughput. These are expected results given that host affinity can incur near-far memory access latencies. A second experiment compares core affinity with socket affinity for four statistically-identical instances of the application intensive workload. One socket is used and all four cores are enabled. Figure 1 shows that socket affinity outperforms core affinity.

Fig. 1. Socket Vs Core Affinity for Application-Intensive Workloads

Other results, which we are unable to show due to space limitations, indicate that the difference is even more extreme for the kernel-intensive workload than for the application-intensive workload. These results suggest that workloads with resource requirements that can be satisfied by a single socket will perform best with socket affinity.
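For reference, the sketch below shows how socket and core affinity of the kind used in these experiments can be applied programmatically on Linux. The core-to-socket mapping is an assumption for this 2-socket, 4-core-per-socket host, and the paper itself used the taskset and CPU Hotplug utilities rather than this code.

```python
import os

# Assumed mapping for a 2-socket, 4-cores-per-socket host; on a real system it
# should be read from /sys/devices/system/cpu/cpu*/topology/physical_package_id.
SOCKET_CORES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def pin_to_socket(pid, socket_id):
    """Socket affinity: restrict a process (e.g. a VM's qemu-kvm process)
    to the cores of one socket while letting it migrate among them."""
    os.sched_setaffinity(pid, SOCKET_CORES[socket_id])

def pin_to_core(pid, core_id):
    """Core affinity: bind a process to a single core, as taskset -c would."""
    os.sched_setaffinity(pid, {core_id})

if __name__ == "__main__":
    pin_to_socket(0, 0)   # pid 0 refers to the calling process itself
```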
C. Effectiveness of Counters at Detecting Contention

This second part of the study assesses the effectiveness of commonly used hardware and OS counters [5] at detecting contention among interactive applications sharing a physical server. We consider cycles per instruction (CPI), L1, L2, and L3 cache miss rates, and CPU utilization values. Sections IV-C1 and IV-C2 consider scenarios without and with the use of virtualization.

1) Without Virtualization: This section considers several different assignments of Apache instances to cores on the server. We show that the different assignments can result in different performance behavior and that commonly used hardware and OS counters do not provide significant insight into the differing behavior. We host up to 6 Apache instances on the server. The Apache instances support statistically identical workloads. Each configuration is denoted by (x,y) where x and y represent the number of instances in socket 0 and socket 1, respectively.

Fig. 2. Kernel-Intensive Workload Performance
Fig. 3. L3 Miss Rate for Kernel-Intensive Workload

Socket affinity is employed. Figure 2 shows the mean response times of the Apache instances for the kernel-intensive workload. At low loads, response times are not impacted significantly by the number of instances sharing a socket. For example, mean response times are flat up to 5000 cps. However, significant response time degradations occur at higher arrival rates as more and more Apache instances are added to the socket. For example, in Figure 2 the maximum achievable per-instance throughput for the (1,1) scenario is 10,000 cps whereas it reduces to 9,500 cps for the (2,0) scenario. Next, we use counter data to try to recognize such behaviour.

First we consider CPI. The CPI values that correspond to the (2,2) and (4,0) cases of Figure 2 begin at 1.65 for 1000 cps, drop to 1.4 for 4500 cps, and then rise to 1.6 for 8000 cps. This behavior is the same whether we use four cores in the same or different sockets. CPI does not explain response time degradation or even the differences in response times for these two cases. The system does not appear to suffer from a CPI related issue, e.g., contention for processor functional units.

Next, we consider the hardware counter values for L1, L2 and L3 cache miss rates. The L1 and L2 misses were the same in all experiments even when there were considerable differences in performance. However, as shown in Figure 3, the L3 cache misses increase along with the increase in the number of Apache instances in the system for the kernel-intensive workload. Yet, in Figure 2, while the response times of the different cases only start to differ and increase at 5000 cps, the L3 misses for the different cases always differ and do not increase significantly until the maximum cps is achieved. The system incurs different L3 miss rates but does not appear to suffer from an L3 cache contention issue. Similar results were observed for the application-intensive workload.

Finally, we find that the CPU utilization value at which response times increase depends on the number of Apache instances and the type of workload. The greater the number of instances the lower the achievable utilization. For 6 instances, the kernel and application intensive workloads can sustain per-core utilizations of 70% and 65%, respectively, before experiencing performance degradation.
They experience competition for server resources differently so they have different per-core utilization thresholds. As a result, using a CPU utilization threshold as an indicator of performance is also problematic. To conclude, these counters do not directly predict or explain the differences in behavior seen in Figure 2.

2) With Virtualization: Our next experiment considers the case where virtualization is employed. For this experiment we ran up to four identical VMs in one socket using socket affinity. Each VM executed an application-intensive micro-benchmark workload. We did not detect any problems for the 1 and 2 VM cases. For the 1 and 2 VM cases, we were able to support up to 700 cps per VM, which is only a slight drop from the 750 cps per instance achieved with no virtualization. However, dramatic performance problems can occur when more VMs are deployed. Figure 4 shows response time results for 4 VMs for increasing cps. Up to a load of 160 cps per VM there is no significant change in the mean response time of VMs. At a load of 165 cps per VM there is a performance discontinuity. In contrast, the mean response time for the similar (4,0) non-virtualized configuration at a load of 165 cps per VM is only around 1.5 ms. When using the VMM with many VMs, the maximum sustainable aggregate connection throughput over all VMs appears to be 660 cps. We note that hardware resources such as the disks and the network were very lightly utilized and that connection establishments to the Apache instances, as observed by the load generators, were taking an inordinate amount of time. This suggested that there were not enough Apache worker processes. However, the maximum number of concurrent connections encountered by any VM was less than the configured Apache limit of 1024. These observations together point to a VMM-level software bottleneck as being responsible for this sudden performance discontinuity. This is another shared server resource that must be managed in virtualized environments.

As with the non-virtualized environment, we assessed whether the commonly used counters are helpful for recognizing this behavior. The CPI value decreases almost linearly from 1.6 to 1.14 over the range of cps shown in Figure 4.

This suggests a performance improvement, in contrast to the 2000% increase in response time observed. The L3 miss rate varies from 1.9 × 10^-3 to 1.6 × 10^-3 over the range of cps but is not strongly correlated with the increasing response times or the performance discontinuity. Finally, the per-core utilizations for the 4 VM case at 160 cps and 165 cps were 0.75 and 0.78, respectively. The CPU utilization values do not provide a robust indicator for the performance discontinuity.

To summarize, there may be many complex performance issues that are incurred by VMs that may include contention for VMM kernel, OS, and shared hardware resources. A robust method is needed to recognize such contention.

D. A Probe to Predict Contention for Shared Server Resources

This third part of the study employs micro-benchmarks to develop our proposed probe for a virtualized environment. These micro-benchmarks are used to stress server bottlenecks, demonstrate the effectiveness of the probe at recognizing contention for shared server resources, and to tune the probe's parameters. The application intensive Web workload is used as a micro-benchmark that stresses a server's ability to handle high cps. Additionally, we also use the RAMspeed micro-benchmark to stress a socket's caches.

1) Probe for Connections Per Second: We first consider the connection phase of the probe to monitor the impact of increasing connections per second. VMs that execute the application-intensive workload are used as micro-benchmarks. Measurement runs were for 100 seconds and were repeated five times so that confidence intervals for response times could be obtained. We consider the connection phase of the probe with c=16 and t1=100. Since the socket has 4 cores, utilization is between 0 and 4. Table I shows the mean response times in ms (RT as referred to in the table) of the Apache VMs running without and with the probe for various per-VM connection arrival rates and various numbers of VMs. If an application's mean response time falls outside of its 95% confidence interval as compared to the corresponding 1 VM case then a degradation due to contention for shared server resources has occurred. Such results are reported in bold face.

Fig. 4. Performance of 4 VMs in One Socket

TABLE I
VM RESPONSE TIME WITHOUT AND WITH PROBE (WITHOUT, WITH)

CPS   1 VM RT       2 VM RT       3 VM RT       4 VM RT
100   9.2, 9.2      9.5, 9.6      9.6, 9.7      9.7, 10.1
140   11.1, 11.2    11.5, 11.8    11.8, 11.6    12.1, 12.2
155   13.3, 14.0    14.4, 14.5    14.9, 14.8    15.4, 15.7
160   14.9, 15.2    15.3, 15.5    15.4, 16.7    19.3, 18.5
165   15.2, 15.4    15.8, 15.7    16.3, 17.4    323.6, 1363.8

Figure 5 shows the socket's aggregate core utilization for all the cases. From Figure 5, adding VMs to the socket has an approximately linear effect on the socket's utilization. For example, at 160 cps, the utilization with 4 VMs is about 4 times that of the 1 VM case. However, mean response times exhibit a different trend. From Table I, the addition of a 4th VM causes a modest increase in mean response time with respect to the 1 VM response time for up to 140 cps per VM. For 165 cps per VM, even the addition of a 3rd VM causes a mean response time increase with respect to the corresponding 1 VM case. As described in the previous section, the addition of a 4th VM causes a dramatic performance discontinuity at 165 cps per VM. The socket core utilization data shown in Figure 5 does not capture these trends.

We now consider the overhead of the probe and its effectiveness at recognizing contention for shared server resources. The probe increased the socket's utilization by approximately 0.12, which amounts to increasing the utilization of each core by 3%. Table I also shows the mean VM response times with the probe running as an additional VM. Executing the probe only marginally increases the response times of VMs. The worst case increase in mean response time due to the addition of the probe is 6.7% for the 3 VMs case at 165 cps per VM.

Table II shows the mean response times for the probe. As with the micro-benchmark, cases where the probe undergoes a statistically significant response time degradation are shown in bold. Comparing the values of Table I and Table II, the probe does not capture the relatively subtle increases in VM response times observed for the 4 VMs cases for up to 160 cps and the 3 VMs cases for up to 165 cps. For example, from Table I, at 160 cps per VM, the mean response time with 4 VMs is 21% higher than that with 1 VM and this difference is statistically significant – the confidence intervals for these two cases do not overlap. The probe's mean response time for both these scenarios is 3.8 ms so it does not recognize this subtle response time degradation. However, the probe is very effective in detecting the sudden performance discontinuity with 4 VMs at 165 cps per VM. The probe's mean response time increases by approximately a factor of 240 when the performance discontinuity occurs. It should also be noted that the probe did not indicate any false negatives, i.e., the probe did not indicate the presence of a performance problem in the VMs when there wasn't one.

2) Probe for Memory Subsystem Contention: Next we consider VMs that execute the RAMspeed micro-benchmark and the memory phase of the probe. The micro-benchmark executes for 20 minutes. The test is repeated five times so that confidence intervals can be obtained. We progressively increase the number of VMs concurrently executing the benchmark. The probe was run on the socket with n=5000000 and t2=25. We discuss automated selection of these probe parameter values in Section IV-D3.
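The degradation rule used for the tables in this section can be written down directly. The sketch below is a minimal version: it builds a 95% confidence interval from five repetitions of the corresponding baseline (1 VM or in-isolation) case and flags a mean that falls outside it. The t critical value is hard-coded for five samples and the function names are ours, not from the paper's tooling.

```python
import statistics

T_95_DF4 = 2.776   # two-sided 95% t critical value for 5 repetitions (4 d.o.f.)

def confidence_interval(samples, t_value=T_95_DF4):
    """Return the (low, high) 95% confidence interval for the sample mean."""
    mean = statistics.mean(samples)
    half_width = t_value * statistics.stdev(samples) / (len(samples) ** 0.5)
    return mean - half_width, mean + half_width

def degraded(baseline_runs, observed_mean):
    """True when the observed mean response time lies outside the baseline CI."""
    low, high = confidence_interval(baseline_runs)
    return observed_mean < low or observed_mean > high

# Illustrative numbers only (not measurements from the paper):
# degraded([9.2, 9.3, 9.1, 9.2, 9.4], 9.7)  -> True
```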

TABLE II
PROBE RESPONSE TIME FOR APPLICATION-INTENSIVE WEB WORKLOAD

CPS   1 VM RT   2 VM RT   3 VM RT   4 VM RT
100   3.8       3.8       3.9       3.9
140   3.9       3.9       3.9       4.0
155   3.8       3.8       3.9       3.8
160   3.8       3.8       3.9       3.8
165   3.9       3.8       4.0       924.5

TABLE III
PHORONIX MICRO-BENCHMARK WITH PROBE

No. of VMs   X        Probe RT   U      L3 MR
1            3393.2   545        0.94   0.56
2            3012.1   630        1.87   0.63
3            2113.6   742        2.82   0.79
4            1803.3   869        3.85   0.88

Fig. 5. Socket Core Utilization

Table III shows the results. We found that the probe introduced negligible overheads. The probe's in-isolation response time was 520 ms with a 95% confidence interval of 520 ± 6 ms. The probe utilized around 6% of the socket, approximately 1.5% for each core, and caused an L3 miss rate of 0.1 when it ran alone.

From Table III (X = Mean Throughput (MB/s), U = Socket Utilization, MR = Miss Rate), the socket utilization increases almost linearly with every added VM. From the table, the mean throughput of the micro-benchmark degrades for the 2, 3, and 4 VMs cases. As shown in the table, statistically significant degradations in the probe's mean response times were observed for all the cases. Interestingly, the probe's execution time for the 1 VM case is about 5% higher than its in-isolation execution time. The probe VM's response time increases by 63% from 1 VM to 4 VMs, thereby demonstrating good promise for our approach. In contrast to the web-based micro-benchmark results, the probe is able to detect both subtle as well as major performance degradations. It should be noted that the L3 miss rate is a good indicator of performance degradation in this scenario. However, as demonstrated by the web micro-benchmark results, the probe has a more general applicability in that it can succeed in scenarios where hardware counters are ineffective.

3) Tuning the Probe: The previous experiments used micro-benchmarks that expose only one type of bottleneck at a time. Furthermore, the probe executed as long as the application VMs. This subsection relaxes these assumptions by automatically tuning the probe's parameters so that the probe alternates between its connection and memory phases.

Specifically, for tuning c and t1, we set up controlled experiments involving the application-intensive web micro-benchmark. Without using the probe, we identify two scenarios with one VM per core, one with an arrival rate per VM that does not have a performance discontinuity and one with an arrival rate that does. We then set c to be a large value, e.g., a cps that when added to the first case will cause the discontinuity. t1 is set to be the entire duration of the micro-benchmark's execution. The probe is executed for the scenario without the discontinuity. The c value is progressively decreased until the probe does not introduce any performance degradation for the VMs. Next, the probe is run with this setting for the other scenario with the performance discontinuity. We now progressively decrease the value of t1 up to the point the probe is still able to identify the problem. A similar approach of using the RAMspeed micro-benchmark for selecting n first followed by t2 later is adopted for tuning the memory phase of the probe.

Using the above process, we deduce that for the server under consideration the probe settings are c=24, t1=15, n=5000000, and t2=25. The connection phase uses 24 cps out of the maximum of approximately 600 cps for the socket, which is a monitoring overhead of 4%. The probe when running alone consumed around 5-6% of the entire socket, which is less than 1.5% per core. The in-isolation response time of the connection phase of the probe was 3.7 ms with a 95% CI of 3.7 ± 0.1 ms. The in-isolation response time of the memory phase of the probe was 525 ms with a 95% CI of 525 ± 3 ms. We note that the tuned probe runs continuously, alternating phases, with 15 seconds in its connection phase and 25 seconds in its memory phase. It can report to management controllers after each phase.

E. Validation with Realistic Benchmark Workloads

Finally, this fourth part of the study uses the tuned probe to detect problems in scenarios involving benchmark applications that are more representative of real workloads. As mentioned previously, we used the eBay-like RUBiS application as well as memory-intensive Java programs drawn from the DACAPO suite. For RUBiS, we implemented a customized version of a workload generator using httperf. The httperf implementation allowed us to make more efficient use of workload generation resources. It also allowed us to control the connection rate to the server. We used the default browsing mix in all experiments. With DACAPO, we selected the Jython and Antlr programs which have been shown to be memory-intensive by others [2]. Due to space constraints, we only report results for Jython. Jython is a Java tool that interprets the pybench benchmark used for evaluating Python implementations.
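As an illustration of how a fixed connection rate can be driven with httperf, whether for a customized RUBiS workload generator or for the probe's connection phase, the sketch below shells out to httperf using only its standard options; the host name, URI, and parameter values are placeholders rather than the paper's configuration.

```python
import subprocess

def run_httperf(server, uri, rate, duration_s, port=80):
    """Open connections at a fixed rate for roughly duration_s seconds and
    return httperf's text report, from which response-time statistics can be
    parsed. For example, rate=24 for 15 s mirrors the tuned connection
    phase (c=24, t1=15)."""
    cmd = [
        "httperf",
        "--server", server,
        "--port", str(port),
        "--uri", uri,
        "--rate", str(rate),
        "--num-conns", str(rate * duration_s),
        "--timeout", "5",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Placeholder target; in the testbed this would be a probe sensor or RUBiS VM.
    print(run_httperf("probe-sensor-vm.example", "/probe.php", rate=24, duration_s=15))
```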

TABLE V
PERFORMANCE OF MEMORY PHASE OF PROBE FOR JYTHON VMS

No. of VMs   VM RT (No Probe)   VM RT (With Probe)   Probe RT   U      L3 MR
1            192039             208829               527        0.92   0.23
2            218829             219834               563        1.85   0.32
3            234465             234854               601        2.82   0.39
4            250819             251432               721        3.82   0.46

TABLE VI
VM PERFORMANCE FOR 3 RUBIS, 1 JYTHON

CPS   RUBiS VM RT   Jython VM RT   U      L3 MR
180   2.6           201103         2.20   0.31
225   1045.3        201056         2.32   0.32

TABLE VII
VM PERFORMANCE FOR 3 JYTHON, 1 RUBIS

CPS   RUBiS VM RT   Jython VM RT   U      L3 MR
180   2.6           235130         3.40   0.58
225   2.8           235661         3.46   0.62

TABLE VIII
VM PERFORMANCE FOR 3 RUBIS, 2 JYTHON

CPS   RUBiS VM RT   Jython VM RT   U      L3 MR
180   2.8           220866         3.19   0.63
225   988.6         221042         3.23   0.68

1) Homogeneous Workloads: First we present results for homogeneous scenarios involving only RUBiS workloads. Similar to the micro-benchmark experiments, VMs serving statistically similar workloads were progressively added to the server. The probe placed negligible overheads on the system. The maximum response time degradation due to the probe was 6%. Each VM executes for 200 seconds. Table IV shows the results of this experiment (we use the notations N and Y, where N means the probe detects no problem and Y means the probe detects a problem). As done previously, cases where VMs experienced a performance degradation are marked in bold. The VMs start encountering a performance discontinuity when the aggregate connection rate is 600 cps or greater. As with the micro-benchmark workloads, there is a performance discontinuity that affects the VMs after they have executed for approximately 120 seconds and this problem lasts for around 80 seconds. As shown in Table IV, both connection phases of the probe that coincided with this period were able to detect the problem. The response times of the memory phase of the probe were unchanged from the response time when the probe executes in isolation, suggesting the absence of any memory-related problems. We note that the results in this case benefited from the last two connection phases occurring after the onset of the discontinuity problems. The entire 15 seconds of these phases coincided with the problem. From our tuning results, it is likely that overlaps of less than 15 seconds might cause intermittent transient problems to go undetected.

Table V shows the execution times of a homogeneous scenario containing Jython VMs. As with the RUBiS scenario, there was no significant overhead due to the probe. The connection phases of the probe did not suggest any problems. However, all the memory phases suggested increases to the per-VM execution times with 2, 3, and 4 VMs on the socket. The execution time of Jython increased by 31% from 1 VM to 4 VMs. Based on the execution times of the memory phases of the probe, our approach suggests a performance degradation of 36% for the same change. These results suggest that the tuned probe places negligible overheads while identifying both subtle performance problems and sharp performance discontinuities in homogeneous scenarios.

2) Heterogeneous Workloads: Next we consider three scenarios consisting of both Jython and RUBiS workloads. First we ran 3 RUBiS VMs and 1 Jython VM. We specified two connection rates for the RUBiS VMs, one at which the VMs run without any problem and the other at which the VMs encountered the connection related performance discontinuity problem. Secondly, we ran 3 Jython VMs and 1 RUBiS VM so as to cause only memory problems. Finally, we ran 3 RUBiS and 2 Jython VMs in the socket to have both kinds of problems. The challenge was to see if both kinds of VMs can co-exist in the socket with one another and whether or not the probe could detect the right kind of problems. All VMs ran for 200 seconds along with the probe, which had 5 alternating memory (25 seconds) and connection (15 seconds) phases.

For the first case, the RUBiS VMs ran at two rates, 180 and 225 cps respectively. At 180 cps there was no performance problem. At 225 cps, however, the VMs ran into the performance discontinuity problem as observed earlier. The results are illustrated in Table VI. The Jython VM ran without encountering any issues. The same experiment was then executed along with the probe VM running in the socket. Again, the probe did not cause much impact to VM performance. The memory phases of the probe did not indicate any problem but the latter connection phases of the probe were able to detect the performance discontinuity problem for the 225 cps per VM case (Table IX).

The second experiment was conducted with 3 Jython VMs and 1 RUBiS VM. The Jython VMs ran into memory-related problems and these were identified by the memory phases of the probe. The connection phases of the probe did not suggest any problem. The results for the performance of the VMs and the probe are shown in Tables VII and IX respectively. A final experiment was done with 3 RUBiS VMs and 2 Jython VMs. This experiment was designed so as to run into both kinds of problems concurrently. The two phases of the probe reported both the problems when they occurred, as seen in Tables VIII and IX.
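The alternating schedule used in these runs (five cycles of a 25 second memory phase followed by a 15 second connection phase over the 200 second experiment, with a report after each phase) can be sketched as follows; measure_memory_phase, measure_connection_phase, and report are hypothetical stand-ins for the probe sensor calls and the controller interface.

```python
import time

def run_probe_cycles(measure_memory_phase, measure_connection_phase, report,
                     cycles=5, t_mem=25, t_conn=15):
    """Alternate memory and connection phases, reporting after each phase.

    Each measure_* callable is expected to run its phase for the given number
    of seconds and return a response-time summary; report() forwards a
    (phase, result, timestamp) record to the management controller."""
    for _ in range(cycles):
        result = measure_memory_phase(t_mem)
        report("memory", result, time.time())
        result = measure_connection_phase(t_conn)
        report("connection", result, time.time())
```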

TABLE IV
PERFORMANCE OF CONNECTION PHASE OF PROBE FOR RUBIS VMS

CPS   1 VM RT   2 VM RT   3 VM RT   4 VM RT   1 VM Probe   2 VM Probe   3 VM Probe   4 VM Probe
100   1.5       1.5       1.6       1.8       N            N            N            N
125   1.6       1.6       1.8       2.0       N            N            N            N
150   1.8       2.1       2.2       2.3       N            N            N            N
180   2.0       2.2       2.5       226.4     N            N            N            Y
225   2.3       2.5       1045.3    2243.1    N            N            Y            Y

TABLE IX
PROBE PERFORMANCE FOR HETEROGENEOUS WORKLOADS

CPS   Phase1(Mem)  Phase1(I/O)  Phase2(Mem)  Phase2(I/O)  Phase3(Mem)  Phase3(I/O)  Phase4(Mem)  Phase4(I/O)  Phase5(Mem)  Phase5(I/O)
3 RUBiS, 1 Jython
180   N   N   N   N   N   N   N   N   N   N
225   N   N   N   N   N   N   N   Y   N   Y
3 Jython, 1 RUBiS
180   Y   N   Y   N   Y   N   Y   N   Y   N
225   Y   N   Y   N   Y   N   Y   N   Y   N
3 RUBiS, 2 Jython
180   Y   N   Y   N   Y   N   Y   N   Y   N
225   Y   N   Y   N   Y   N   Y   Y   Y   Y

V. BACKGROUND AND RELATED WORK

A significant body of research exists on managing application performance in consolidated environments [9], [4], [10], [12], [1]. For example, application-level approaches have been proposed where application response time measures are used at runtime to initiate requests to obtain or relinquish VMs from a virtualized resource pool in response to workload fluctuations [9], [4], [13]. However, we want to infer the impact of the interference due to competition for unmanaged server resources upon multiple application VMs. Our problem arises at shorter time scales where such application measures do not provide sufficient information. Several studies exist that consider detecting at runtime the impact of contention among shared batch workloads [5], [6], [3], [14], [15] and enterprise workloads with limited concurrency, i.e., where the number of software threads is less than or equal to the number of cores in a host [6], [3]. In contrast, Kousiouris et al. develop an offline, pre-deployment approach that relies on an artificial neural network model to decide whether a given set of applications can be consolidated on the same server [10]. All of these approaches rely on metrics such as CPU utilization, CPI and cache miss rates to detect adverse performance. We have shown these measures are not always effective for interactive Web server workloads that are typical for cloud computing environments. Furthermore, these studies only considered contention for hardware resources and did not focus on VMM-level software bottlenecks.

VI. SUMMARY AND CONCLUSIONS

The probe approach permits an operator of a virtualized resource pool to detect contention for shared server resources such as VMM software resources and memory hierarchies at runtime without relying solely on VMM and hardware performance counters. Hardware counters do not always capture the impact of shared resource contention on all types of workloads. Moreover, hardware counters are difficult to use as their availability and usage model can change between architectural revisions from the same manufacturer and greatly differ between the architectures of different manufacturers. Our position is that a probe is a significantly more portable and lightweight approach that provides direct feedback on contention, regardless of the root cause, and that it is sufficient to recognize when a host can no longer sustain the variety of workloads in VMs assigned to it. We proposed a tuning approach that can reduce the effort to deploy the probe in different environments.

Our approach is complementary to workload placement and other management control systems. The probe approach focuses on recognizing contention for shared resources that can affect application performance in complex ways. If a problem arises then a management system can be notified. It can then initiate the migration of a VM from one socket to another or to another host to reduce the impact of the contention on applications. To properly employ our method, thresholds are still needed to decide whether a problem exists. The connection and memory phases of the probe rely on different kinds of thresholds. A cloud provider may work with customers to determine that the memory sensor's response time should not be, say, more than p percent higher than when run in isolation, as a measure of what level of cache interference is acceptable.

Future work can focus on studying how our results generalize to other types of servers, applications, and virtualization software. We repeated a subset of experiments on an Intel Xeon E5620 server and observed similar results, which indicates good promise for our approach. We note that since the probe runs on each socket in a resource pool, our approach can also work for multi-tier applications involving many interacting VMs. Our next steps include deploying the approach in a cloud environment to determine how often it recognizes contention for shared server resources and then integrating it with management systems to overcome such problems.

REFERENCES

[1] Lydia Y. Chen, Danilo Ansaloni, Evgenia Smirni, Akira Yokokawa, and Walter Binder. Achieving application-centric performance targets via consolidation on multicores: myth or reality? In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pages 37–48, New York, NY, USA, 2012. ACM.
[2] Xianglong Huang et al. The garbage collection advantage: improving program locality. In Proc. of OOPSLA '04, pages 69–80. ACM, 2004.
[3] Lingjia Tang et al. The impact of memory subsystem resource sharing on datacenter applications. SIGARCH Comput. Archit. News, 39(3):283–294, June 2011.
[4] Rahul Singh et al. Autonomic mix-aware provisioning for non-stationary data center workloads. In Proc. of ICAC '10, pages 21–30. ACM, 2010.
[5] Ramesh Illikkal et al. PIRATE: QoS and performance management in CMP architectures. SIGMETRICS Perform. Eval. Rev., 37(4):3–10, March 2010.
[6] Sergey Blagodurov et al. Contention-aware scheduling on multicore systems. ACM Trans. Comput. Syst., 28(4):8:1–8:45, December 2010.
[7] Stephen M. Blackburn et al. The DaCapo benchmarks: Java benchmarking development and analysis. SIGPLAN Not., 41(10):169–190, October 2006.
[8] Todd Deshane et al. Quantitative comparison of Xen and KVM. In Xen Summit, Berkeley, CA, USA, June 2008. USENIX Association.
[9] Waheed Iqbal et al. Adaptive resource provisioning for read intensive multi-tier applications in the cloud. Future Generation Computer Systems, 27(6):871–879, 2011.
[10] George Kousiouris, Tommaso Cucinotta, and Theodora Varvarigou. The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks. J. Syst. Softw., 84(8):1270–1291, August 2011.
[11] David Mosberger and Tai Jin. httperf: a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev., 26(3):31–37, December 1998.
[12] Bryan Veal and Annie Foong. Performance scalability of a multi-core web server. In Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS '07, pages 57–66, New York, NY, USA, 2007. ACM.
[13] Zhikui Wang, Yuan Chen, D. Gmach, S. Singhal, B. J. Watson, W. Rivera, Xiaoyun Zhu, and C. D. Hyser. AppRAISE: application-level performance management in virtualized server environments. IEEE Transactions on Network and Service Management, 6(4):240–254, December 2009.
[14] Jing Xu and José Fortes. A multi-objective approach to virtual machine management in datacenters. In Proc. of ICAC '11, pages 225–234. ACM, 2011.
[15] Jing Xu and José Fortes. A multi-objective approach to virtual machine management in datacenters. In Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC '11, pages 225–234, New York, NY, USA, 2011. ACM.
