Mitigating Resource Contention and Heterogeneity in Public Clouds for Scientific Modeling Services

Wes Lloyd
Institute of Technology, University of Washington, Tacoma, Washington USA
[email protected]

Shrideep Pallickara 1, Olaf David 1,2
1 Department of Computer Science, 2 Dept. of Civil and Env. Engineering
Colorado St. Univ., Ft. Collins, USA
shrideep.pallickara, olaf.david, [email protected]

Mazdak Arabi
Dept. of Civil and Env. Engineering, Colorado State University, Fort Collins, Colorado USA

Ken Rojas
USDA-Natural Resource Conservation Service, Fort Collins, Colorado USA
[email protected]

Abstract — Abstraction of physical hardware using provisioned; and Determining WHERE infrastructure should infrastructure-as-a-service (IaaS) clouds leads to the simplistic be provisioned? view that resources are homogeneous and that infinite scaling is Determining WHAT server infrastructure to provision for possible with linear increases in performance. Hosting scientific application hosting concerns two dimensions: type of resources modeling services using IaaS clouds requires awareness of (vertical sizing), and quantity (horizontal sizing) of virtual application resource requirements and careful management of cloud-based infrastructure. In this paper, we present multiple machines (VM) resources. Vertical sizing involves selecting methods to improve public cloud infrastructure management to the appropriate resource configuration “type” considering: the support hosting scientific model services. We investigate public required number of CPU cores, memory allocation, disk cloud VM-host heterogeneity and noisy neighbor detection to capacity, and network bandwidth. On public clouds such as inform VM trial-and-better selection to favor worker VMs with Amazon EC2, specific VM resource configurations are known better placements in public clouds. We present a cpuSteal noisy as “instance types”. Each instance type, for example m3.large neighbor detection method (NN-Detect) which harnesses the or c3.xlarge, describes a given VM configuration with a fixed cpuSteal CPU metric to identify worker VMs with resource amount of RAM, hard disk space, and network bandwidth. contention from noisy neighbors. We evaluate potential Horizontal sizing involves determining the appropriate quantity performance improvements provided from leveraging these of VMs for workload hosting and then load balancing the techniques in support of providing modeling-as-a-service for two workload across this VM pool. Instance types such as the environmental science models. 
Amazon EC2 m1.large VM have several physical implementations and research has demonstrated a 30% Keywords Resource Management and Performance; IaaS; performance variance for web application hosting [1] [2]. Virtualization; Multi-Tenancy; WHERE server resources are provisioned within a specific datacenter is abstracted by IaaS clouds. Representing VMs as I. INTRODUCTION tuples and using them to pack physical machines (PMs) can be Public cloud environments abstract the physical hardware thought of as an example of the multidimensional bin-packing implementation of resources providing users with only limited problem shown to be NP-hard [3]. In public cloud settings, details regarding the actual physical hardware VM placement is abstracted and difficult to discern [4]. implementations. This abstraction hides resource Previous efforts using heuristics to infer VM co-residency by heterogeneity and restricts users’ ability to make well informed launching probe VMs for exploration can be expensive and deployment decisions based on real knowledge about the state only partially effective at determining VM locations [5] [17] of the system. Resource heterogeneity includes differences in [18] [19] [20]. Public clouds do support features including the physical hardware used to implement identically named Virtual Private Clouds (VPCs) and “placement groups” to resources, and also performance heterogeneity resulting from ensure locality of VM placements for application hosting. contention for shared CPU, disk, and network resources. This These features help ensure data center and rack locality for lack of information forces users to rely on the good will of application VMs --- but they do not ensure resource quality. cloud hosts to make resource deployments. 
But to what degree Though application resource placements are localized, the can we assume that cloud hosts will protect users from physical hosts may be overprovisioned, or host “noisy potential performance degradation and increased hosting costs neighbor” VMs, resource hungry VMs which consume an resulting from heterogeneous resources? unusual share of CPU, disk, or network resources. This Infrastructure-as-a-Service (IaaS) cloud resource resource contention has been shown to degrade performance of management challenges can be broken down into three primary scientific applications hosted in public clouds [4] [6]. concerns: Determining WHEN infrastructure should be In this paper, we investigate the performance implications provisioned; Determining WHAT infrastructure should be of public cloud infrastructure abstraction for hosting scientific modeling web services to provide scientific “modeling-as-a-

1

service” for ubiquitous clients. Specifically, we investigate the the implications for average service execution time for implications of WHAT infrastructure gets provisioned (resource USDA model-as-a-service workloads from heterogeneous implementation), and WHERE this infrastructure is provisioned VM type implementations. (resource state). We first quantify performance degradation II. BACKGROUND AND RELATED WORK resulting from public cloud resource type heterogeneity (WHAT ), and second from resource contention ( WHERE ). We A. Scientific Modeling on Public Clouds benchmark the performance by hosting two US Department of Ostermann et al. provided an early assessment of public Agriculture environmental modeling web service applications clouds for scientific modeling in [7]. They assessed the ability on Amazon’s public EC2 cloud. Secondly, we propose two of 1st generation Amazon EC2 VMs (e.g. m1.*-c1.*) to host approaches which help mitigate this performance degradation HPC-based scientific applications. They identified that EC2 through intelligent public cloud resource provisioning. performance, particularly network latency, required an order of A. Research Questions magnitude improvement to be practical and suggested that scientific applications should be tuned for operation in We investigate the following research questions: virtualized environments. Other early efforts highlighted RQ-1: How common is public cloud VM-type similar challenges regarding performance for HPC applications implementation heterogeneity? [8] [9] [10]. To increase ubiquity of scientific models and RQ-2: What performance implications result from this applications, and to support dissemination of research, many heterogeneity for hosting web services application organizations now build and host scientific web services [11] workloads? [12]. 
The Cloud Services Integration Platform (CSIP), a Java- based programming framework, has been developed by RQ-3: How effective is cpuSteal at identifying worker VMs Colorado State University in cooperation with the US with high resource contention due to multi-tenancy Department of Agriculture. CSIP supports development and (e.g. noisy neighbor VMs) in a public cloud? deployment of web services for a number of USDA scientific RQ-4: What are the performance implications of hosting modeling applications implemented in languages such as C, scientific modeling workloads on worker VMs with FORTRAN, Python, and others. Infrastructure-as-a-Service consistently high cpuSteal measurements in a public cloud computing offers an ideal platform for CSIP services cloud? which mix legacy code with modern framework CpuSteal provides a processor metric that helps quantify instrumentation to provide modeling-as-a-service, on demand, overprovisioning of the physical hardware [15] [21]. When a to diverse client applications [13] [14]. VM CPU core is ready to execute but the physical host CPU B. Public Cloud Resource Contention Detection core is unavailable because it is performing other work, a CpuSteal tick is registered. Herein we describe our Schad et al. [6] demonstrated the unpredictability of investigation on the feasibility of harnessing this CPU metric to Amazon EC2 VM performance caused by contention for identify resource contention from overprovisioned hosts in physical machine resources and provisioning variation of VMs. public clouds. More recently researchers have developed techniques to identify resource contention from coresident VMs running on B. Contributions shared hardware [15] [16] [17] [18] [19] [20]. This paper reports on our case study which develops and Mukherjee et al. 
developed a software-based resource applies two approaches to improve performance of service contention probe based on techniques not specific to hardware oriented workloads hosted using public clouds. The primary or OS specific counters to provide a more portable technique contributions of this paper include: [17]. They monitored TCP/IP connection rates and memory 1. We investigate the prevalence of resource contention contention from misses, but their work was restricted to among VMs on a shared host using the cpuSteal CPU a small private cluster. Novakovic et al. developed a multi- metric. We measure cpuSteal for several hours while level approach for detecting and mitigating resource contention running compute intensive workloads across eight- in private clusters [18]. They used an initial lightweight different VM types from Amazon’s EC2 cloud. We warning detector to infer potential contention which would develop an approach called Noisy-Neighbor-Detect (NN- trigger a more complex inference analyzer to perform in depth Detect) which harnesses the cpuSteal metric to identify analysis. Their inference analyzer predicts whether VM resource contention from noisy neighbor VMs running in a migration can alleviate resource contention by predicting shared cluster. Using this approach we identify VMs system load after migration, as well as where VMs should which exhibit degraded performance from resource migrate. Zhang et al. and Varadarajan et al. both exploit side contention, and quantify the potential impact on channels to help determine coresidency of VMs, work done application performance for hosting USDA scientific from the perspective of demonstrating how cloud systems are model-as-a-service workloads. potentially insecure. Zhang et al. developed a single and multi- probe classifier which was quite successful at detecting co-

2. We investigate the prevalence of host heterogeneity for 12 located VMs [19]. However, their approach required kernel VM-types spanning 3 generations in 2 regions on modifications, extensive benchmarking and training, and Amazon’s EC2 cloud. We apply the trial-and-better doesn’t work in a virtual private cloud (VPC) configuration approach described in [1] to create VM pools with fixed where VMs run on isolated VLANs behind a firewall. VM implementations with a given CPU type. We quantify Varadarajan et al. used a side channel requiring less analysis,

2 by performing atomic addition across a CPU cache-line programming framework supporting the development of web boundary, to detect co-located VMs on the Google Compute services hosted using web containers such as Apache Tomcat Engine, Amazon EC2, and Microsoft Azure public clouds [20]. [29]. CSIP has been developed by Colorado State University Though novel and effective at detecting VM coresidency a and the US Department of Agriculture (USDA) to support potential security concern, their approach does not detect environmental modeling web service development. resource contention among coresident VMs. We investigated implications for scaling workloads for two Liu offers a novel approach to characterize CPU utilization soil erosion models: the Revised Universal Soil Loss Equation across a public cloud by analyzing CPU temperature [16]. – Version 2 (RUSLE2) [22], and the Wind Erosion Prediction Liu’s approach averages thermal measurements of the CPU System (WEPS) [23]. RUSLE2 focuses on modeling soil from small VMs which are context switched across the erosion due to rainfall and runoff, while WEPS predicts soil physical host’s CPU cores for extended periods to approximate erosion due to wind. Both model services represent diverse CPU die temperature which correlates with CPU utilization. applications with varying computational requirements. Using this approach Liu observed CPU utilization using 20 RUSLE2 and WEPS represent the US Department of m1.small VMs on Amazon EC2 in 2011 for 1 week and Agriculture–Natural Resource Conservation Service agency estimated average CPU utilization to be around 7.3%. Liu did standard models used by over 3,000 county level field offices not consider if temperature can help detect resource contention across the United States to predict soil erosion. RUSLE2 and application performance degradation. 
Delimitrou and models soil erosion primarily by parameterizing the RUSLE Kozyrakis have contributed extensive work tasks equation, whereas WEPS is a daily -based simulation across datacenters by characterizing resource requirements and model. making intelligent scheduling decisions to improve job RUSLE2 was originally developed as a Windows-based performance and cluster utilization. These efforts have focused Microsoft Visual C++ desktop application. WEPS was on private clusters or reserved sets of isolated VMs on public originally developed as a desktop Windows application using clouds where all of the workloads could be quantified [30]. Fortran95 and Java. The RUSLE2 and WEPS services require A few efforts have looked at using cpuSteal to detect legacy language support, a mixture of Linux and Windows contention among VMs [15] [21]. Casale et al. considered the support, as well as relational, geospatial, and NoSQL use of cpuSteal to look for contention between hyperthreads of databases. RUSLE2 and WEPS model service components are the same CPU-core using private servers [21]. Ayodele et al. described in Table I. IaaS cloud computing is key to providing demonstrated a link between application performance and the infrastructure required to provide RUSLE2 and WEPS as cpuSteal using SPEC CPU2006 benchmarks [15]. They scalable web services. verified this link on both a private cluster and with the m3.medium Amazon EC2 VM type. This effort did not further develop the cpuSteal-performance link into a mechanism to support contention avoidance and performance improvement. C. Public Cloud Heterogeneity Farley et al. demonstrated that Amazon EC2 instance types had heterogeneous hardware implementations in [2]. Their investigation focused on the m1.small instance type and demonstrated potential for cost savings by discarding VMs with lower performance CPUs. Ou et al. 
extended their work by demonstrating that heterogeneous implementations impact several Amazon and Rackspace VM types [1]. They found that the m1.large EC2 instance had four different hardware implementations (variant CPU types) and different Xen CPU sharing configurations. They demonstrated ~20% performance variation on benchmarks for m1.large VM Fig. 1: RUSLE2 vs. WEPS implementations. This technique enabled the development of a Model Execution Time Quartile Box Plot “trial-and-better” approach where the backing CPU type of RUSLE2 and WEPS have distinctly different workload VM instances upon launch are checked, and those with lower profiles. A box plot depicts the average service execution performing CPU types are terminated and relaunched. This times for RUSLE2 and WEPS in Figure 1. Average cpuUser, technique provided cost savings through performance cpuKernel, and cpuIdle time for cloud hosted workloads are improvements when using m1.large EC2 instances for 10 or shown in Figure 2 with RUSLE2 (37.8% | 7.1% | 54.9%) vs. more hours. WEPS (83.8% | 4.2% | 12.0%). CpuIdle is time spent waiting for disk or network I/O operations to complete, or time spent III. SCIENCE APPLICATIONS performing context switches where the CPU does not actively To study public cloud performance implications from noisy execute instructions. The RUSLE2 model executes quickly as neighbor VMs and resource type heterogeneity, we harnessed the model primarily parametrizes the RUSLE equation. two environmental modeling services implemented using the Parameterization involves mostly I/O, while the model run is Cloud Services Integration Platform (CSIP) framework computationally simple. Conversely, WEPS is a CPU-bound [13][14]. CSIP provides a common REST/JSON Java-based process based simulation model which runs for at least 30-

Identify applicable sponsor/s here. If no sponsors, delete this text box (sponsors ). 3 years using a daily time step. Their diversity makes RUSLE2 Upon initialization VM-Scaler probes the host cloud and and WEPS good candidates to benchmark cloud infrastructure collects metadata including location and state information for performance. all PMs and VMs. An agent installed on all VMs/PMs sends resource utilization statistics to VM-Scaler at fixed intervals. Collected resource utilization statistics are described in [26][27]. Our development of VM-Scaler extends and enables this prior work investigating the use of resource utilization statistics to guide cloud application deployment. VM-Scaler supports creation of virtual machine pools to support scaling and management of VMs hosting a specific tier of an application. Pools are defined using JSON files and can be created using dedicated or spot instances on Amazon EC2. VM-scaler supports managing a pool as a cohesive unit with functions to execute a command or deploy new data across all of the members of a pool in parallel. Pool management also includes enforcement of VM type heterogeneity. The forceCpuType attribute specifies the desired backing CPU type in Amazon’s public cloud. When a pool is created, non- matching VMs are immediately terminated and replaced until an entire pool of matching VMs is created. This way it is possible to filter out heterogeneous VM types and ensure Fig. 2: RUSLE2 vs. WEPS CPU Profile Comparison of: Kernel time, User mode time, Idle time homogeneity of the backing CPU in a manner similar to the “trial-and-better” approach [1]. TABLE I. RUSLE 2/WEPS APPLICATION COMPONENTS VM-Scaler supports profiling the resource requirements of Component RUSLE2 WEPS distributed cloud workloads running across a set of VM-scaler M Model Apache Tomcat, Wine 1, Apache Tomcat, WEPS managed VMs [28]. 
A resource utilization sensor is installed RUSLE2 [22], CSIP [23], CSIP [13] on each VM to send resource utilization data to the central [13] VM-Scaler web service at an adjustable interval. By D Database Postgresql, PostGIS, Postgresql, PostGIS, harnessing the Linux network time protocol (ntp) resource soils data (1.7 million soils data (4.3 million shapes), management shapes), climate/wind sampling can be synchronized across a pool to a resolution of data (98k shapes), data (850 shapes), 17GB 1-second. This supports collection of accurate workload climate data (31k total. profiles for workloads running across large VM pools. In this shapes), 4.6 GB total F paper, we exercise this feature using pools with up to 60 VMs File server nginx file server, nginx file server, 291k to collect cpuSteal statistics. VM-scaler workload profiling is 57k XML files (305MB), files (1.4 GB), parameterizes RUSLE2 parameterizes WEPS similar to Amazon CloudWatch metrics except our model runs. model runs. implementation supports sampling at a 1-second resolution for L Logger Redis distributed cache Redis distributed cache free vs. 60-seconds with CloudWatch. VM-scaler provides server server raw profiling data for researcher consumption to analyze To load balance model service requests we used HAProxy, performance and build models. a high performance load balancer [8], to redirect modeling V. HETEROGENEITY OF VM IMPLEMENTATIONS requests across the active pool of M worker VMs. HAProxy, installed on a PM, provides public service endpoints. A. Background and Methodology IV. THE VIRTUAL MACHINE SCALER Previous research has demonstrated that hardware implementations of public cloud VM types change over time To investigate infrastructure management techniques and [1] [2]. Today, Amazon still hosts the original first generation support hosting of scientific modeling web services we instance types initially offered in 2007. 
As the original server developed the Virtual Machine (VM) Scaler, a REST/JSON- hardware has long since been retired, new hardware was tasked based web services application [24]. VM-Scaler harnesses the with providing the equivalent VM types. This hardware Amazon EC2 API to support application scaling and cloud replacement has led to a variety of implementations in Amazon management and currently supports Amazon’s public elastic EC2, especially for older generation VM types. Several compute cloud (EC2), and Eucalyptus 3/4 clouds. VM-Scaler hardware implementations of the same VM type may be provides cloud control while abstracting the underlying IaaS offered at once, each with different performance cloud and is extensible to any EC2 compatible cloud. VM- characteristics. When hosting scientific modeling workloads Scaler provides a platform for conducting IaaS cloud research on public clouds we are interested in understanding the by supporting experimentation with hotspot detection schemes, performance implications of VM type heterogeneity. Can we VM management/placement, job scheduling, and load exploit this heterogeneity to improve performance of scientific balancing of service requests. VM-Scaler supports horizontal modeling workloads? scaling of application infrastructure by provisioning VMs when application hotspots are detected, similar to Amazon auto- To investigate implications of VM host heterogeneity, VM- scaling groups [25]. Scaler provides host CPU type enforcement as described

4 earlier using the “forceCpuType” attribute. CPU type Sandy Bridge) and three different microcode releases enforcement incurs the additional expense of launching and (Nahalem, Westmere, and Ivy Bridge). This hardware terminating unmatching VM instances. In Amazon EC2, heterogeneity is almost certain to provide variable discarded VMs are billed for 1-hour of usage; we used EC2 performance under the label “m1.large”. spot instances to minimize these costs. Our tests revealed implementation host heterogeneity for B. Experimental Setup 1st and 2nd generation VMs. Where we did not observe VM- host heterogeneity this does not imply it doesn’t exist, nor does In [1], VM host heterogeneity is observed for m1.small, it imply if it will exist in the future. As CPUs continue to m1.large, and m1.xlarge VMs across all Amazon EC2 east evolve, and legacy hardware ages, we suspect to see host subregions. To extend earlier work, we tested host heterogeneity for 3 rd , 4 th and future VM generations. Migration heterogeneity for 12 VM types: 1st generation VM types to core dense, lower power CPUs in cloud data centers stands (m1.medium, m1.large, m1.xlarge, c1.medium, c1.xlarge), 2nd to save energy and lower datacenter costs for all public cloud generation types (m2.xlarge, m2.2xlarge, and m2.4xlarge), and providers. 3rd generation types (c3.large, c3.xlarge c3.2xlarge, m3.large). To investigate the prevalence of VM-host heterogeneity (RQ- Compared to Ou et al.’s results [1], we observed that some 1) we first launched 50 VMs for each of the 12 VM types. CPU types found in 2011 and 2012 have been replaced with When heterogeneity was detected, we launched 50 more VMs lower power, core-dense CPUs. For example we observed for a total of 100. Tests were performed using two Amazon m1.xlarge VM implementations using the 12-core Intel Xeon regions: us-east-1c and us-east-1d. Host heterogeneity was E5-2651 CPU. 
Higher core density CPUs should help save determined by VM-Scaler by inspecting Linux’s /proc/cpuinfo energy, and reduce resource contention, while enabling server proc-file. real estate to expand. Previously m1.large VMs implemented using AMD Opteron’s consumed as much as 42.5 watts per To quantify the effect of host heterogeneity on model core! Implementations using the Intel Xeon E5-2651 v2 require service performance (RQ-2), we studied the two VM types only 8.75 watts per core. This amounts to just ~20% of the exhibiting the highest degree of heterogeneity. We completed previous power requirement. batch model service workloads using WEPS and RUSLE2 on pools of 5 VMs for each implementation type. We performed 10 trials of 100 WEPS runs, and 10 trials of 660 RUSLE2 runs using these implementation-specific pools to compute average model service execution time. We did not investigate VM host heterogeneity for very small VMs which do not receive 100% of a full CPU core allocation such as the bursting VMs (e.g. t1.small) and 1-core VMs (e.g. m1.small, m3.medium). These VM types provide minimal throughput for service hosting. TABLE II. AMAZON VM HOST HETEROGENEITY

VM type Region Backing CPU Backing CPU Intel E5-2650 v0 Intel Xeon E5645 m1.medium us-east-1c 8c,95w,96% 6c,80w,4% Intel Xeon X5550 Intel Xeon E5-2665 v0 m2.xlarge us-east-1c 4c, 95w, 48% 8c, 115w, 42% Intel Xeon E5-2650 v0 Intel Xeon E5-2651 v2 Fig. 3. Normalized Model Service Execution Time m1.large us-east-1d 8c,95w,74% 12c,105w,19% with Heterogeneous VM Implementations Intel Xeon E5645 m1.large us-east-1d -- Model service performance implications are shown in 6c,80w,7% figure 3. For the m1.large VM implementations, model Intel Xeon E5-2665 v0 Intel Xeon X5550 m2.xlarge us-east-1d performance was slower when the VM was backed by the Intel 8c, 115w,78% 4c, 95w, 22% Xeon E5-2650-v0. Average normalized execution time was C. Experimental Results 108% for RUSLE2, and 109% for WEPS versus VMs backed We tested 12 VM types across 3 generations, and found by the Intel Xeon E5645 CPU. For the m2.xlarge VM significant VM host heterogeneity for 3 of the types (25%) implementations, model performance was slower when the VM from 2 generations while testing on two different Amazon was backed with the Intel Xeon E5-2665-v0. Average regions (us-east-1c and us-east-1d). We discovered 7 normalized execution time was 114% for RUSLE2, and 104% implementations of these 3 VM types. On average, a given for WEPS versus VMs backed by the older Intel Xeon X5550. hardware configuration was found to support 43.3% of the VM Ironically, newer CPUs provided lower performance than their instances for the m1.medium, m1.large, and m2.xlarge VM legacy counterparts in both cases. types as described in Table II. Five different Intel CPUs VI. IDENTIFYING RESOURCE CONTENTION WITH CPU STEAL spanning two microarchitectures and four different sets of microcode were found to provide these implementations. We A. 
Background found that m1.large VMs were implemented with CPUs from Resource contention in a public cloud can lead to two different Intel microarchitecture generations (Nehalem and performance variability and degradation in a shared hosting

5 environment [4] [6]. CpuSteal registers processor ticks when a cpuSteal vm ≥ 10 ticks VM’s CPU core is ready to execute but the physical host CPU When a worker VM has fewer than 10 cpuSteal ticks this is core is busy performing other work. The core may be considered too low to discern from noise. In these instances, unavailable because the hypervisor (e.g. Xen dom0) is this minor amount of resource contention, potentially from the executing native kernel mode instructions, or user mode hypervisor, is deemed insignificant and too low to degrade instructions for other VMs. High cpuSteal time can be a application performance. We validated these application symptom of over provisioning of the physical servers hosting agnostic thresholds by benchmarking representative workloads VMs. On the Amazon EC2 public cloud, which uses a variant and qualitatively observing relationships between WEPS of the Xen hypervisor, we observe a number of factors that model performance and cpuSteal. Our approach using produce CpuSteal time. These include: CpuSteal does not require the use of separate probe VMs as in 1. Processors are shared by too many VMs, and those [17] as workload execute and profiling occur simultaneously VMs are busy. on the same VMs. We describe the results of our evaluation of 2. The hypervisor kernel (Xen dom0) is occupying the NN-Detect using the WEPS model as the computational CPU. workload below. 3. The VM’s CPU time share allocation is less than C. Experimental Setup 100% for one or more cores, though 100% is needed To investigate the utility of cpuSteal for detecting resource to execute a CPU intensive workload. contention from VM multi-tenancy (RQ-3) we evaluated NN- In the case of 3, we observe high cpuSteal time when Detect using WEPS model service batch workloads as it is executing workloads on Amazon EC2 VMs which under more CPU bound than RUSLE2 (figure 2). For step 1, we ran allocate CPU cores. 
A specific example is the m1.small and m3.medium VMs. We observed that the m3.medium VM type is allocated approximately 60% of a single core of the 10-core Xeon E5-2670 v2 CPU at 2.5 GHz. Consequently, even an idle m3.medium VM produces cpuSteal. This observation explains the high cpuSteal values for m3.medium VMs studied in [15]. Because of this under-allocation, all workloads executing at 100% on m3.medium VMs exhibit high cpuSteal: they must burst and use unallocated CPU time to reach 100%. These burst cycles are granted only when spare cycles are available; otherwise cpuSteal ticks are registered.

B. CpuSteal Noisy Neighbor Detection Method

Noisy neighbors are busy, co-located VMs whose competition for shared resources can adversely impact performance. We investigate the utility of using cpuSteal to detect resource contention from these "noisy neighbor" VMs on the Amazon EC2 public cloud. To detect noisy neighbors we use the CSIP-WEPS modeling service, which features an average CPU utilization of 88%. We propose and investigate the following CpuSteal Noisy Neighbor Detection method (NN-Detect):

Step 1. Execute a processor-intensive batch workload (WEPS) across a pool of worker VMs several times.

Step 2. For each batch run, capture cpuSteal for each worker VM for the duration of the workload.

Step 3. Calculate the average (cpuSteal_avg) and standard deviation of cpuSteal across the VM pool for the batch runs.

Step 4. Identify the presence of noisy neighbor VMs by applying application-agnostic and application-specific thresholds to VM cpuSteal averages.

To detect when noisy neighbors share the same physical hosts, we screened for outliers using the application-agnostic threshold:

cpuSteal_vm ≥ cpuSteal_avg + (2 ∙ stdev)

To classify VMs as having noisy neighbors we applied a second, application-specific threshold.

For step 1, we executed WEPS batch workloads on the VM pools described in table III. To enforce hardware homogeneity of our test environment we used the "forceCpuType" attribute of VM pools supported by VM-Scaler to guarantee that every VM would be implemented with the same backing CPU. We launched 50 VMs for each VM type listed in table III, except for the 8-core c1.xlarge (25 VMs) and the 1-core m1.medium and m3.medium (60 VMs). We repeated 4 batch runs of approximately 1,000 randomly generated WEPS model runs each. Model runs were distributed evenly across all VMs in the pool using HAProxy round-robin load balancing. The 4 batch runs required a total of approximately 5 hours to complete.

For step 2, we totaled individual VM cpuSteal ticks for each VM in the worker pools for each WEPS batch job. For step 3, we calculated the average and standard deviation of cpuSteal for every VM across batch runs. We then looked for patterns in cpuSteal behavior for each of the 9 different VM pools from Table III. For step 4, we applied the thresholds described above to the data from step 3 to identify VMs with noisy neighbors. To measure the performance implications of noisy neighbors (RQ-4) we created sub-pools of 5 VMs from the existing VMs: one sub-pool consisted of 5 VMs where cpuSteal exceeded the noisy-neighbor thresholds (step 4), and the other consisted of 5 randomly selected VMs that did not exceed the thresholds. We then completed 10 batches of 100 WEPS runs and 10 batches of 660 RUSLE2 runs to compare performance.

D. Experimental Results

To address (RQ-3) we compared individual VM cpuSteal values from each batch run and used linear regression to test whether one batch's cpuSteal behavior could predict cpuSteal for future batch runs. Statistically, we wanted to know how much variance is explained by comparing individual VM cpuSteal values across batch runs. The averaged R² values from these comparisons appear in table IV. R² is a measure of prediction quality that describes the percentage of variance explained by the regression. The important discovery here is that over 5 hours, early observations of cpuSteal were helpful for predicting the future presence of cpuSteal. Using this approach we can detect resource contention patterns that persist.

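Step 2 of NN-Detect requires capturing cpuSteal on each worker VM for the duration of a workload. On a Linux guest the cumulative steal counter is the eighth value on the aggregate "cpu" line of /proc/stat; a minimal sketch of such a sampler follows (the function names are illustrative, not part of CSIP or VM-Scaler):

```python
# Sketch of capturing cpuSteal on a Linux guest (illustrative names).
# /proc/stat's aggregate "cpu" line lists cumulative USER_HZ ticks as:
#   user nice system idle iowait irq softirq steal guest guest_nice

def parse_steal(stat_text):
    """Return cumulative steal ticks from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":   # aggregate line covering all cores
            return int(fields[8])           # 8th value after "cpu" is steal
    raise ValueError("no aggregate cpu line found")

def read_steal(path="/proc/stat"):
    with open(path) as f:
        return parse_steal(f.read())

def steal_during(run_workload, read=read_steal):
    """Steal ticks accumulated while run_workload() executes."""
    before = read()
    run_workload()
    return read() - before
```

Because the counter is cumulative, differencing two readings isolates the steal accrued during a batch run, which is exactly the per-VM total NN-Detect aggregates.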

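Steps 3 and 4 with the application-agnostic threshold reduce to a simple outlier screen over per-VM steal totals. A sketch, with hypothetical pool data (the application-specific second threshold is not reproduced here):

```python
from statistics import mean, pstdev

def detect_noisy_neighbors(steal_by_vm, k=2.0):
    """Flag worker VMs whose cpuSteal total is an outlier for the pool,
    i.e. cpuSteal_vm >= cpuSteal_avg + k * stdev (the application-agnostic
    threshold with k = 2)."""
    totals = list(steal_by_vm.values())
    cutoff = mean(totals) + k * pstdev(totals)
    return sorted(vm for vm, steal in steal_by_vm.items() if steal >= cutoff)
```

Note that a two-standard-deviation screen is only meaningful for reasonably large pools: with a population standard deviation, no single point in a pool of five can exceed a z-score of (n-1)/√n ≈ 1.79, so the screen applies to full pools of tens of VMs (such as the ~50-VM pools here), not to the 5-VM sub-pools.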
We observe predictable patterns of VM cpuSteal behavior on Amazon EC2 for 4 VM types (m1.large, m2.xlarge, m1.xlarge, m1.medium) at R² > .44, and as high as R² = .94 for m1.xlarge.

To evaluate (RQ-4) we next calculated average and standard deviation scores of cpuSteal for the VM pools to determine threshold values for identifying worker VMs with noisy neighbors. When no worker VMs exceeded these thresholds for a given type (e.g. c3.large), cpuSteal was not useful for detecting potential performance degradation from noisy neighbors. Table III shows the "% Noisy Neighbors" identified by NN-Detect. VMs without noisy neighbors tended to be hosted on physical hosts backed by higher-density 10 and 12-core CPUs (e.g. c3.large, m3.large, m3.medium, c1.xlarge). Modern 10 and 12-core Intel Xeon CPUs all feature hyper-threading, and new servers often utilize 2 or 4 CPUs per server. By increasing the number of available hyper-threads in a dual-CPU server from 16 (X5550) to 48 (E5-2651), the likelihood of resource contention from multi-tenancy should drop to ¼ of the original amount. Such advances in hardware should help mitigate public cloud resource contention until demand increases to consume this additional capacity.

To measure the performance implications of NN resource contention we applied NN-Detect thresholds to select 5 VMs from complete pools of ~50 VMs. NN-Detect thresholds identified resource contention for ~10% of the VMs across tests. One pool consisted of VMs with noisy neighbors identified using NN-Detect, and the other pool consisted of randomly chosen worker VMs without noisy neighbors. We compared the performance of these mini-pools by executing WEPS and RUSLE2 batch runs as described earlier. The observed performance degradation in model service execution time with noisy neighbor VMs is shown in table IV. Performance is normalized vs. VM pools without noisy neighbors to show the increase in execution time resulting from noisy neighbors. Across 4 VM types we observe average performance degradation up to 18% for WEPS and 25% for RUSLE2 using m1.large VMs. Three VM types (m1.large, m2.xlarge, and m1.medium) produced statistically significant (p < .05) performance degradation for the repeated batch runs. The average performance degradation for the RUSLE2 and WEPS model services executing on VM pools with noisy neighbors was 9% across all tests.

TABLE IV. EC2 NOISY NEIGHBOR MODEL SERVICE PERFORMANCE DEGRADATION¹

VM type (backing CPU)       Region      WEPS                               RUSLE2
m1.large (E5-2650v0/8c)     us-east-1c  117.68% (df=9.866, p=6.847·10^-8)  125.42% (df=9.003, p=.016)
m2.xlarge (X5550/4c)        us-east-1c  107.3%  (df=19.159, p=.05232)      102.76% (df=25.34, p=1.73·10^-11)
m1.xlarge (E5-2651v2/12c)   us-east-1c  100.73% (df=9.54, p=.1456)         102.91% (n.s.)
m1.medium (E5-2650v0/8c)    us-east-1d  111.6%  (df=13.459, p=6.25·10^-8)  104.32% (df=9.196, p=1.173·10^-5)

¹ Normalized model service performance for worker VMs is 100%.

To summarize our key findings: (1) a complete set of WEPS batch runs required up to 5 hours to complete; throughout this time, trends in cpuSteal remained consistent enough to produce the statistically significant model service performance degradation shown in table IV. And (2), in cases where cpuSteal did not identify resource contention from noisy neighbors, it likely did not exist; in these cases the host hardware tended to be more core-dense. To conclude, our results suggest that resource contention and application performance degradation are more likely in public clouds when VMs are hosted on hardware with fewer CPU cores.

Next steps for this research involve integrating our NN-Detect approach into VM-Scaler to perform a trial-and-better test of VMs upon launch. We will replace the WEPS workload with a general compute-bound workload that is easy to deploy and has fewer software dependencies. Upon VM launch we will benchmark cpuSteal for a fixed amount of time. We plan to investigate the tradeoff between time spent benchmarking resource contention and the accuracy of identifying noisy neighbors. Our goal is to identify the minimum benchmarking time required to detect NN resource contention so that underperforming VMs can be terminated and replaced; we would like to avoid committing to an underperforming VM as early as possible. By integrating the contention benchmark into VM-Scaler, we can collect a large amount of resource utilization data. Analysis of this data can help improve and refine the thresholds and heuristics used to identify noisy neighbors with the cpuSteal metric.
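The launch-time screening envisioned for future work could take roughly the following shape. Here `launch_vm`, `benchmark_steal`, and `terminate_vm` are placeholders for cloud-API and benchmarking hooks, not actual VM-Scaler calls:

```python
def screen_pool(launch_vm, benchmark_steal, terminate_vm,
                pool_size, steal_threshold, max_retries=3):
    """Hypothetical trial-and-better screen for noisy-neighbor placements.

    Benchmark cpuSteal on each freshly launched worker VM for a fixed
    window; if observed steal exceeds the contention threshold, terminate
    the VM and try a new placement (up to max_retries probes per slot,
    after which the last placement is accepted as-is)."""
    pool = []
    for _ in range(pool_size):
        vm = launch_vm()
        for _ in range(max_retries):
            if benchmark_steal(vm) < steal_threshold:
                break                 # placement looks quiet; keep this VM
            terminate_vm(vm)          # resource contention suspected
            vm = launch_vm()          # trial-and-better: try a new placement
        pool.append(vm)
    return pool
```

Because a relaunch usually lands on a different physical host, rejecting a high-steal VM early avoids committing an entire batch workload to a contended placement.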

TABLE III. AMAZON EC2 CPU STEAL ANALYSIS

VM type        Backing CPU     Average R²     Average cpuSteal   % with Noisy
                               linear reg.    per core           Neighbors
us-east-1c
c3.large-2c    E5-2680v2/10c   .1753          2.35               0%
m3.large-2c    E5-2670v2/10c   -              1.58               0%
m1.large-2c    E5-2650v0/8c    .5568          15.24              10%
m2.xlarge-2c   X5550/4c        .4490          953.25             6%
m1.xlarge-4c   E5-2651v2/12c   .9431          29.01              4%
m3.medium-1c   E5-2670v2/10c   .0646          17683.2¹           n/a
c1.xlarge-8c   E5-2651v2/12c   .3658          1.86               0%
us-east-1d
m1.medium-1c   E5-2650v0/8c    .4545          6.2                7%
m2.xlarge-2c   E5-2665v0/8c    .0911          3.14               0%

¹ Less than 100% CPU core allocation produces significant cpuSteal.

VII. CONCLUSIONS

Public cloud environments abstract the physical hardware implementation of resources, providing users with limited details regarding the actual physical cluster implementations. For public cloud hosting of model services, the trial-and-better approach can improve model service performance and lower hosting costs. We tested 12 VM types provided by Amazon EC2 and found that 25% of the types had more than one hardware implementation (RQ-1). By leveraging the trial-and-better approach in VM-Scaler we demonstrated potential for a 14% RUSLE2 model performance improvement (m2.xlarge) and a 9% WEPS performance improvement (m1.large) by harnessing VM-host heterogeneity (RQ-2).

We developed an approach (NN-Detect) to find noisy neighbor VMs that cause unwanted resource contention in a public cloud by harnessing the cpuSteal CPU metric. We tested


9 VM types across 2 regions in Amazon EC2 and found that 4 VM types had VMs which measured increased levels of cpuSteal that persisted over several hours (RQ-3). We applied our NN-Detect method to isolate pools of VMs with and without resource contention from noisy neighbors to measure performance degradation. Running batch workloads on pools consisting of VMs with high resource contention produced model service performance degradation up to 18% for WEPS and 25% for RUSLE2 (RQ-4).

Abstraction of physical hardware using IaaS clouds leads to the simplistic view that cloud resources are homogeneous and that scaling will infinitely provide linear increases in performance because all resources are identical. Our results demonstrate how trial-and-better evaluation of VMs in public clouds can help mitigate resource contention and address hardware heterogeneity to deliver performance improvements. We quantify the extent of hardware heterogeneity and resource contention in Amazon EC2's public cloud, and quantify the performance implications for service-oriented application workloads. Our results provide simple and promising techniques that can be leveraged to improve workload performance, leading to potential cost savings in public clouds.

ACKNOWLEDGEMENTS

This research is supported by the U.S. Department of Homeland Security [D15PC00279] and the US National Science Foundation's Computer Systems Research Program (CNS-1253908).

REFERENCES

[1] Z. Ou, H. Zhuang, A. Lukyanenko, J. Nurminen, P. Hui, V. Mazalov, A. Yla-Jaaski, Is the Same Instance Type Created Equal? Exploiting Heterogeneity of Public Clouds, IEEE Trans. on Cloud Computing, vol. 1, no. 2, July-Dec 2013, pp. 201-214.
[2] B. Farley et al., More for Your Money: Exploiting Performance Heterogeneity in Public Clouds, Proc. 3rd ACM Int. Symp. on Cloud Computing (SoCC '12), San Jose, CA, USA, Oct 14-17, 2012, 14 p.
[3] A. Beloglazov, R. Buyya, Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in Cloud data centers, Concurrency and Computation: Practice and Experience, v. 24, n. 13, Sept. 2012, pp. 1397-1420.
[4] M. Rehman, M. Sakr, Initial Findings for Provisioning Variation in Cloud Computing, Proc. 2nd IEEE Int. Conf. on Cloud Computing Technology and Science (CloudCom '10), Indianapolis, IN, USA, Nov 30-Dec 3, 2010, pp. 473-479.
[5] T. Ristenpart, E. Tromer, H. Shacham, S. Savage, "Hey, You, Get Off My Cloud! Exploring Information Leakage in Third-Party Compute Clouds", Proc. 16th ACM Conf. on Computer and Communications Security (CCS '09), Chicago, IL, USA, Nov 9-13, 2009, pp. 199-212.
[6] J. Schad, J. Dittrich, J. Quiane-Ruiz, Runtime measurements in the cloud: observing, analyzing, and reducing variance, Proc. of the VLDB Endowment, v. 3, n. 1-2, Singapore, Sept. 2010, pp. 460-471.
[7] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, D. Epema, A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Proc. 1st Int. Conf. on Cloud Computing (CloudComp '09), Munich, Germany, Oct 19-21, 2009, pp. 115-131.
[8] E. Walker, Benchmarking Amazon EC2 for High-Performance Scientific Computing, USENIX ;login:, vol. 33, no. 5, 2008, pp. 18-23.
[9] K. Jackson et al., Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud, Proc. 2nd IEEE Int. Conf. on Cloud Computing Technology and Science (CloudCom '10), Indianapolis, IN, USA, Nov 30-Dec 3, 2010, pp. 159-168.
[10] Y. Zhai, Cloud versus In-House Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications, Proc. State of Practice Reports (SC '11), Seattle, WA, USA, Nov 12-18, 2011, 10 p.
[11] P. Saripalli, C. Oldenburg, B. Walters, N. Radheshyam, Implementation and Usability Evaluation of a Cloud Platform for Scientific Computing as a Service (SCaaS), Proc. 4th IEEE Int. Conf. on Utility and Cloud Computing (UCC 2011), Melbourne, Australia, Nov 5-8, 2011, pp. 345-354.
[12] M. Villamizar, H. Castro, D. Mendez, E-Clouds: A SaaS Marketplace for Scientific Computing, Proc. 5th IEEE/ACM Int. Conf. on Utility and Cloud Computing (UCC 2012), Chicago, IL, USA, Nov 5-8, 2012, 8 p.
[13] W. Lloyd et al., The Cloud Services Innovation Platform - Enabling Service-Based Environmental Modeling Using IaaS Cloud Computing, iEMSs 2012 Int. Congress on Environmental Modeling and Software, Germany, Jul 2012, 8 p.
[14] O. David et al., Model as a Service (MaaS) using the Cloud Services Innovation Platform (CSIP), iEMSs 2014 Int. Congress on Environmental Modeling and Software, San Diego, CA, USA, Jun 2014, 8 p.
[15] A. Ayodele, J. Rao, T. Boult, Performance Measurement and Interference Profiling in Multi-tenant Clouds, Proc. IEEE Conf. on Cloud Computing (Cloud 2015), New York, NY, 2015, pp. 941-949.
[16] H. Liu, A Measurement Study of Server Utilization in Public Clouds, Proc. 9th IEEE Int. Conf. on Cloud and Green Computing (CAG '11), Sydney, Australia, Dec 2011, pp. 435-442.
[17] J. Mukherjee, D. Krishnamurthy, J. Rolia, C. Hyser, Resource Contention Detection and Management for Consolidated Workloads, Proc. IFIP/IEEE Int. Symposium on Integrated Network Management (IM 2013), Ghent, Belgium, 2013, pp. 294-302.
[18] D. Novakovic, N. Vasic, S. Novakovic, D. Kostic, R. Bianchini, DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments, 2013 USENIX Annual Technical Conference (USENIX ATC '13), San Jose, CA, pp. 219-230.
[19] Y. Zhang, A. Juels, A. Oprea, M. Reiter, HomeAlone: Co-Residency Detection in the Cloud via Side-Channel Analysis, 2011 IEEE Symposium on Security and Privacy (SP 2011), Oakland, CA, pp. 313-328.
[20] V. Varadarajan, Y. Zhang, T. Ristenpart, M. Swift, A Placement Vulnerability Study in Multi-Tenant Public Clouds, 24th USENIX Security Symposium, Washington, DC, USA, pp. 913-928.
[21] G. Casale, C. Ragusa, P. Parpas, A Feasibility Study of Host-Level Contention Detection by Guest Virtual Machines, Proc. 5th IEEE Int. Conf. on Cloud Computing Technology & Science (CloudCom 2013), Bristol, UK, 2013, pp. 152-157.
[22] U.S. Department of Agriculture - Agricultural Research Service, Revised Universal Soil Loss Equation Ver. 2 (RUSLE2), http://www.ars.usda.gov/SP2UserFiles/Place/64080510/RUSLE/RUSLE2_Science_Doc.pdf
[23] L. Hagen, "A wind erosion prediction system to meet user needs", J. of Soil and Water Conservation, Mar/Apr 1991, v. 46(2), pp. 105-111.
[24] W. Lloyd et al., The Virtual Machine (VM) Scaler: An Infrastructure Manager Supporting Environmental Modeling on IaaS Clouds, Proc. iEMSs 2014 Int. Congress on Environmental Modeling and Software, San Diego, CA, USA, June 16-19, 2014, 8 p.
[25] AWS Documentation: Concepts - Auto Scaling, 2013, http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AS_Concepts.html
[26] W. Lloyd et al., Performance implications of multi-tier application deployments on IaaS clouds: Towards performance modeling, Future Generation Computer Systems, v. 29, n. 5, 2013, pp. 1254-1264.
[27] W. Lloyd et al., Performance Modeling to Support Multi-Tier Application Deployment to IaaS Clouds, Proc. 5th IEEE/ACM Int. Conf. on Utility and Cloud Computing (UCC 2012), Chicago, IL, USA, Nov 5-8, 2012, 8 p.
[28] W. Lloyd, S. Pallickara, O. David, M. Arabi, K. Rojas, T. Wible, J. Ditty, Demystifying the Clouds: Harnessing Resource Utilization Models for Cost Effective Infrastructure Alternatives, IEEE Transactions on Cloud Computing, 14 pp., to appear, 2017, available online at: http://dx.doi.org/10.1109/TCC.2015.2430339
[29] Apache Tomcat - Welcome, http://tomcat.apache.org/
[30] C. Delimitrou, C. Kozyrakis, Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon, IEEE Micro, v. 34, n. 3, May 2014, pp. 17-30.
