The Who, What, Why, and How of High Performance Computing in the Cloud

Abhishek Gupta, Laxmikant V. Kale
University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{gupta59, kale}@illinois.edu

Filippo Gioachin, Verdi March, Chun Hui Suen, Bu-Sung Lee
HP Labs, Singapore

Paolo Faraboschi, Richard Kaufmann, Dejan Milojicic
HP Labs, Palo Alto, CA, USA

Abstract—Cloud computing is emerging as an alternative to supercomputers for some of the high-performance computing (HPC) applications that do not require a fully dedicated machine. With cloud as an additional deployment option, HPC users face the challenge of dealing with highly heterogeneous resources, where the variability spans a wide range of processor configurations, interconnects, virtualization environments, and pricing rates and models.

In this paper, we take a holistic viewpoint to answer the question – why and who should choose cloud for HPC, for what applications, and how should cloud be used for HPC? To this end, we perform a comprehensive performance evaluation and analysis of a set of benchmarks and complex HPC applications on a range of platforms, varying from supercomputers to clouds. Further, we demonstrate HPC performance improvements in cloud using alternative lightweight virtualization mechanisms – thin VMs and OS-level containers – and hypervisor- and application-level CPU affinity. Next, we analyze the economic aspects and business models for HPC in clouds, an important area that we believe has not been sufficiently addressed by past research. Overall results indicate that current public clouds are cost-effective only at small scale for the chosen HPC applications when considered in isolation, but can complement supercomputers using business models such as cloud burst and application-aware mapping.

Keywords—HPC, Cloud, Performance Analysis, Economics

I. INTRODUCTION

Increasingly, some academic and commercial HPC users are looking at clouds as a cost-effective alternative to dedicated HPC clusters [1].
Renting rather than owning a cluster avoids the up-front and operating expenses associated with a dedicated infrastructure. Clouds offer the additional advantages of a) elasticity – on-demand provisioning – and b) virtualization-enabled flexibility, customization, and resource control.

Despite these advantages, it remains unclear whether, and when, clouds can become a feasible substitute for, or complement to, supercomputers. HPC is performance-oriented, whereas clouds are cost- and resource-utilization-oriented. Furthermore, clouds have traditionally been designed to run business and web applications. Previous studies have shown that commodity interconnects and the overhead of virtualization on network and storage performance are major barriers to the adoption of cloud for HPC [1–4]. While the outcome of these studies paints a rather pessimistic view of HPC clouds, recent efforts towards HPC-optimized clouds, such as Magellan [1] and Amazon's EC2 Cluster Compute [5], point to a promising direction for overcoming some of the fundamental inhibitors.

Unlike previous works [1–4, 6–9] on benchmarking clouds for science, we take a more holistic and practical viewpoint. Rather than limiting ourselves to the problem – what is the performance achieved on cloud vs. supercomputer – we address the bigger and more important question – why and who should choose (or not choose) cloud for HPC, for what applications, and how should cloud be used for HPC? In our efforts to answer this research question, we make the following contributions in this work.

• We evaluate the performance of HPC applications on a range of platforms varying from supercomputer to cloud, analyze bottlenecks and the correlation between application characteristics and observed performance, and identify what applications are suitable for cloud. (§III, §IV)
• We analyze the impact of virtualization on HPC applications and propose techniques, specifically thin hypervisors, OS-level containers, and hypervisor- and application-level CPU affinity, to mitigate virtualization overhead and noise, addressing how to use cloud for HPC. (§V)
• We investigate the economic aspects of running in cloud vs. supercomputer and discuss why it is challenging to make a profitable business for cloud providers for HPC compared to traditional cloud applications. We also show that small/medium-scale HPC users are the most likely candidates who can benefit from an HPC-cloud. (§VI)
• Instead of considering cloud as a substitute for a supercomputer, we investigate the co-existence of supercomputer and cloud, addressing how to use cloud for HPC. (§VII)

We believe that it is important to consider the views of both HPC users and cloud providers, who sometimes have conflicting objectives: users must see tangible benefits (in cost or performance), while cloud providers must be able to run a profitable business. The insights from comparing the execution of HPC applications on different platforms are useful for both. HPC users can better quantify the benefits of moving to a cloud and identify which applications are better candidates for the transition from in-house to cloud. Cloud providers can optimize the allocation of applications to their infrastructure to maximize utilization, while offering best-in-class cost and quality of service.

II. EVALUATION METHODOLOGY

In this section, we describe the platforms which we compared and the applications which we chose for this study.

A. Experimental Testbed

We selected platforms with different interconnects, operating systems, and virtualization support to cover the dominant classes of infrastructures available today to an HPC user. Table I shows the details of each platform. In the case of cloud, a node refers to a virtual machine and a core refers to a virtual core; for example, "2 × QEMU Virtual CPU @2.67 GHz" means each VM has 2 virtual cores.

TABLE I: Testbed

Resource             | Ranger                     | Taub                          | Open Cirrus                   | Private Cloud                | Public Cloud
Processors in a Node | 16×AMD Opteron QC @2.3 GHz | 12×Intel Xeon X5650 @2.67 GHz | 4×Intel Xeon E5450 @3.00 GHz  | 2×QEMU Virtual CPU @2.67 GHz | 4×QEMU Virtual CPU @2.67 GHz
Memory               | 32 GB                      | 48 GB                         | 48 GB                         | 6 GB                         | 16 GB
Network              | Infiniband (1 GB/s)        | QDR Infiniband                | 10GigE internal, 1GigE x-rack | Emulated 1GigE               | Emulated 1GigE
OS                   | Linux                      | Sci. Linux                    | Ubuntu 10.04                  | Ubuntu 10.04                 | Ubuntu 10.10

Ranger [10] at TACC was a supercomputer with a theoretical peak performance of 579 TeraFLOPS, and Taub at UIUC is an HPC-optimized cluster. Both use Infiniband as the interconnect. Taub uses Scientific Linux as its OS, and its QDR Infiniband provides a bandwidth of 40 Gbps. We used physical nodes with a commodity interconnect at the Open Cirrus testbed at the HP Labs site [11]. The final two platforms are clouds – a private cloud set up using Eucalyptus [12], and a public cloud. We use KVM [13] for virtualization since it has been shown to be a good candidate for HPC virtualization [14].

In clouds, the most common form of multi-tenancy is not the sharing of individual physical cores; sharing happens at the node or an even coarser level, all the more so as the number of cores per server increases. Hence, our cloud experiments involve physical nodes (not cores) which were shared by VMs from external users, providing a multi-tenant environment.

Another dedicated physical cluster, at HP Labs Singapore (HPLS), is used for controlled tests of the effects of virtualization (see Table II). This cluster is connected with a Gigabit Ethernet network on a single switch. Every server has two CPU sockets, each populated with a six-core CPU, resulting in 12 physical cores per node. The experiment on the HPLS cluster involved benchmarking on four configurations: physical machines (bare), LXC containers [15], VMs configured with the default emulated network (plain VM), and VMs with pass-through networking (thin VM). VMs run on top of the KVM hypervisor. In the thin VM setup, we enable the Input/Output Memory Management Unit (IOMMU) on the Linux hosts to allow VMs to directly access the Ethernet hardware, thus improving the network I/O performance [16].

TABLE II: Virtualization Testbed

Resource                | Phy., Container               | Thin VM                       | Plain VM
Processors in a Node/VM | 12×Intel Xeon X5650 @2.67 GHz | 12×QEMU Virtual CPU @2.67 GHz | 12×QEMU Virtual CPU @2.67 GHz
Memory                  | 120 GB                        | 100 GB                        | 100 GB
Network                 | 1GigE                         | 1GigE                         | Emulated 1GigE
OS                      | Ubuntu 11.04                  | Ubuntu 11.04                  | Ubuntu 11.04
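As an illustration of the application-level CPU affinity mentioned in the contributions above (one of the techniques evaluated in §V), a process running inside a guest can pin itself to a specific core to reduce OS scheduling noise. The following is a minimal C sketch, assuming a Linux guest and the glibc sched_setaffinity(2) interface; the pinned core (0) and the standalone main() are illustrative, not the paper's configuration:

    /* Minimal sketch: pin the calling process to core 0 via the Linux
       sched_setaffinity(2) interface. In an HPC run, each rank would pin
       itself to its own core; core 0 here is an arbitrary choice. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t mask;
        CPU_ZERO(&mask);                /* start with an empty CPU set */
        CPU_SET(0, &mask);              /* allow execution on core 0 only */
        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to core 0\n");
        return 0;
    }

Hypervisor-level affinity is applied analogously on the host, by pinning each VM's virtual-CPU thread to a dedicated physical core.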
B. Benchmarks and Applications

To gain insight into the performance of the selected platforms over a range of applications, we chose benchmarks and applications from different scientific domains, which differ in the nature, amount, and pattern of inter-processor communication. Moreover, we selected benchmarks written in two different parallel programming environments – MPI [17] and CHARM++ [18]. Similarly to previous work [2, 3, 6, 9], we used the NAS Parallel Benchmarks (NPB) class B [19] (the MPI version, NPB3.3-MPI), which exhibit a good variety of computation and communication requirements. Moreover, we chose the following additional benchmarks and real-world applications (a sketch of the first kernel's structure follows this list):

• Jacobi2D – A 5-point stencil kernel to average values in a 2-D grid. Such stencil kernels are common in scientific simulations, numerical linear algebra, numerical solutions of partial differential equations (PDEs), and image processing.
• NAMD [20] – A highly scalable molecular dynamics application, representative of the complex real-world applications used ubiquitously on supercomputers. We used the ApoA1 input (92k atoms) for our experiments.
• ChaNGa [21] (Charm N-body GrAvity solver) – A cosmological simulation application which performs collisionless N-body interactions using a Barnes-Hut tree for calculating forces. We used a 300,000-particle system.
• Sweep3D [22] – A particle transport code widely used for evaluating HPC architectures. Sweep3D is the heart of a real Accelerated Strategic Computing Initiative (ASCI) application and exploits parallelism via a wavefront process. We ran the MPI-Fortran77 code in weak scaling mode, maintaining 5 × 5 × 400 cells per processor with …
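To make the communication structure of the Jacobi2D kernel concrete, the following is a minimal MPI C sketch of a 5-point stencil with a 1-D row decomposition and halo exchange. The grid size N, iteration count, and decomposition are illustrative assumptions, not the configuration used in our experiments:

    /* Minimal sketch of a Jacobi2D-style 5-point stencil: each rank owns a
       block of rows plus one ghost row above and below, exchanged with its
       neighbors every iteration. Sizes are illustrative assumptions. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N     1024   /* global grid is N x N (assumption) */
    #define ITERS 100    /* iteration count (assumption) */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int rows = N / size;   /* rows per rank (assume N divisible by size) */
        /* local block plus ghost rows 0 and rows+1; calloc gives a zero
           boundary condition */
        double *cur  = calloc((size_t)(rows + 2) * N, sizeof(double));
        double *next = calloc((size_t)(rows + 2) * N, sizeof(double));
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int it = 0; it < ITERS; it++) {
            /* halo exchange: the nearest-neighbor pattern that makes
               stencil codes sensitive to interconnect latency */
            MPI_Sendrecv(&cur[1 * N], N, MPI_DOUBLE, up, 0,
                         &cur[(rows + 1) * N], N, MPI_DOUBLE, down, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&cur[rows * N], N, MPI_DOUBLE, down, 1,
                         &cur[0], N, MPI_DOUBLE, up, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* average each interior point with its four neighbors */
            for (int i = 1; i <= rows; i++)
                for (int j = 1; j < N - 1; j++)
                    next[i * N + j] = 0.2 * (cur[i * N + j] +
                                             cur[(i - 1) * N + j] +
                                             cur[(i + 1) * N + j] +
                                             cur[i * N + j - 1] +
                                             cur[i * N + j + 1]);
            double *tmp = cur; cur = next; next = tmp;  /* swap buffers */
        }
        free(cur); free(next);
        MPI_Finalize();
        return 0;
    }

The per-iteration nearest-neighbor exchange is exactly the kind of fine-grained communication that stresses commodity and emulated networks, which is why stencil kernels are a useful probe of cloud interconnect performance.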
