Enabling Automated HPC / Database Deployment via the AppScale Hybrid Platform

Chris Bunch and Chandra Krintz
Computer Science Department
University of California, Santa Barbara
{cgb, ckrintz}@cs.ucsb.edu

1. ABSTRACT
In this paper, we discuss a prevalent issue facing the HPC community today: the lack of automation in the installation, deployment, and integration of HPC and database software. As a result, scientists today must play a dual role as researchers and as system administrators. The time required for scientists to become proficient with software stacks is significant and has increased with the complexity of modern systems such as cloud-based platforms and infrastructures. However, cloud computing offers many potential benefits to HPC software developers. It facilitates dynamic acquisition of computing and storage resources and access to scalable services. Moreover, cloud platforms such as AppScale abstract away the underlying system and automate deployment and control of supported software and services. As part of this project, we have extended AppScale with domain specific language support called Neptune that gives developers straightforward control over automatic configuration and deployment of cloud applications. Neptune also extends cloud support beyond web services to HPC applications, components, and libraries. We discuss AppScale and Neptune, and how they can be extended via more intelligent database usage to provide a better solution for the next generation of cloud-based HPC and data-intensive applications.

Categories and Subject Descriptors
D.3.2 [Programming Languages]: Software Engineering - Language Classifications (Extensible Languages); C.2.4 [Computer Systems Organization]: Computer-Communication Networks - Distributed Systems (Distributed Applications)

General Terms
Design, Languages, Performance

Keywords
Cloud Platform, Service Placement, Domain Specific Language

2. INTRODUCTION
High Performance Computing (HPC) aims to solve problems normally intractable on a single computer via the use of large numbers of machines (e.g., clusters, datacenters, or supercomputers). Over several decades, different types of machine architectures, software frameworks, and programming languages have been developed to aid the HPC community in using computing to work on scientific problems. Throughout this period, one problem has remained persistent for HPC users: the high barrier-to-entry imposed by the lack of automation in software packages.

The lack of automation has meant that scientists looking to use HPC resources have had to become system administrators to understand how to use the large numbers of machines typically involved in HPC applications. This unnecessarily takes time away from their research and adds a barrier-to-entry for scientists who want to use HPC resources but lack the advanced technical skills of a system administrator. Further time must be dedicated to sharing data produced by scientific experiments, and the inability to do so trivially stifles the ability of others to verify and reproduce their work.

Social networking software created over the last decade has proven that automated data sharing is possible. Like the users of social networking applications, HPC users need a simple service to which they can save the outputs of their programs and from which they can retrieve new inputs. This service should act as a replicated, distributed, fault-tolerant data storage mechanism that provides easy access for scientists, the community at large, and other HPC programs.

Distributed file systems and databases have therefore been natural choices for this job: file systems like NFS and Lustre [11] have been used by the HPC community with varying levels of success, while databases like MySQL and, more recently, NoSQL datastores like HBase [9] and Cassandra [3] offer easier interfaces and richer data access mechanisms (e.g., structured data formats, query languages) for users of various programming languages.

Some of these software packages can be installed with a single command-line instruction, as their installation process is largely automated, but many cannot, requiring scientists who want to use these highly tuned software packages to learn processes like kernel recompilation, which lie well beyond the knowledge base that is comfortable for scientists at large.

Furthermore, for software packages that do provide automated installation (which is not the majority of those in use by the HPC community), deploying the software in a distributed setting is often equally difficult. For databases, this can require scientists to learn the optimal parameters for settings such as the block size of the underlying virtual file system (in the case of HBase and Hypertable), the maximum number of clients that the database should support (a common database configuration option), or the desired level of consistency for the data itself. Setting any of these variables (as well as the dozens more that each database exposes to users) incorrectly can significantly degrade the performance of applications using it, and a scientist who has not used these technologies before is unlikely to know how to configure these databases for optimal use.

The problem of installation and optimal deployment is exacerbated in the case of newer technologies, such as the NoSQL datastores, which have been around for less time than databases like MySQL. This is not to advocate against their use: it is simply the case that fewer "best practices" are available for newer technologies, and as these software packages mature, experimentation will reveal how to tune them to work with HPC codes in an acceptable manner. For example, the Cassandra datastore aims to yield higher performance to users at the cost of strong consistency, but to acquire these performance gains, programmers must rewrite their applications to keep in mind that the data they receive might not be completely up-to-date. This different mindset is difficult for seasoned programmers to accommodate, and it is reasonable to assume that it is just as difficult (if not more so) for scientists.

We see a solution to these problems in the cloud computing realm. Here, resources can dynamically (and inexpensively) be acquired and released as needed for computation and storage by users via web interfaces or within their programs themselves. Cloud computing is typically broken down into three layers of abstraction. The lowest level of abstraction, infrastructure-as-a-service (IaaS), provides access to scalable computation and storage resources. Amazon Web Services, through their Elastic Compute Cloud offering, gives users rapid access to large numbers of virtualized compute instances that can be spread out across multiple geographic locations if needed. At a higher level of abstraction exists platform-as-a-service (PaaS), which exposes scalable APIs that developers write programs against. Google App Engine is a prominent force in this field, allowing users to write web applications in Python, Java, or Go, and upload them to Google to utilize their resources.

It is our belief that the greatest amount of promise lies at the PaaS layer. This middle ground builds on top of the IaaS layer and thus can customize the virtual machines it provides with any software package that end users may require, exposing only the APIs required to get the job done. It also has more flexibility than the SaaS layer, which only provides a single software package. It is therefore viable to offer a PaaS that integrates HPC and database software packages, and does so in a way that allows HPC software packages to publish their outputs to a distributed, fault-tolerant database. The platform can then make the data available automatically to scientists or other programs. This automation is crucial, and allows the public to utilize the data in their own experiments.

3. CLOUD INFRASTRUCTURES FOR HPC
The system we propose is not tied to anything cloud-specific, but utilizing cloud resources will enable a greater class of scientists to use it. This is largely a matter of access: most scientists working at research facilities have some access to shared cluster resources, but oftentimes find an inadequate number of resources available for the work they need to run experiments on. Additionally, this work fits in nicely with the "bursty" style of access that the cloud specializes in: scientists we have encountered often need only a few resources at all times to develop their code on, and many resources only when they are ready to run large-scale experiments. This use case also makes better use of cloud resources from a monetary standpoint than statically acquiring a large number of machines for a grid and leaving them idle for most of the time.

Cloud infrastructure providers such as Amazon Web Services [1] have seen some use by scientists to date. However, the consensus among the HPC community appears to be that since infrastructure providers employ virtualization, which necessarily involves a performance degradation depending on the workload involved (e.g., CPU, memory, disk I/O, or network I/O) and the virtualization technology employed (e.g., Xen, KVM, VMware), virtualization is an impediment to solving larger problems rather than a furtherance. This is exacerbated by the opaqueness of the cloud itself: because resources are meant to be interchangeable, users often cannot specify that resources should reside in close physical proximity to one another. This is in great contrast to the grid computing model HPC scientists are familiar with, in which they can assume that the machines they are operating with are in the same datacenter and enjoy a low latency with one another (often being connected with high-speed network technologies such as Infiniband).

Cloud infrastructure providers have made some progress towards removing opacity in exchange for greater performance. Amazon enables users to pay for access to "Cluster Compute Instances", a special type of resource that is guaranteed to be colocated within close range of other instances of the same type (and thus enjoys low network latencies) and boasts more CPU and memory for HPC jobs, in exchange for a price more than an order of magnitude higher than that of the low-end "Small" instance type. Yet the ultimate customization is offered by private cloud infrastructures like Eucalyptus [12], which facilitate the use of any type of hardware to construct a cloud infrastructure. This use case is a prevalent one to date within scientific projects, as machines that a group or lab operates can now be repurposed to be served and managed automatically without a system administrator. The downside of this approach is that it requires a dedicated cloud system administrator instead. If the cloud deployment is sufficiently large, the cost may be justifiable, but it may not be feasible for small private cloud deployments.

Another worry that prevents scientists from more actively seeking the use of cloud technologies is that the infrastructure is simply out of their control. If the machines fail, they may be unavailable for several hours, and the vendor may simply not release any information about the downtime until long after it has occurred. These downtimes have been seen across cloud infrastructure providers such as Amazon [13] as well as platform providers like Google App Engine [8] and Heroku.

In the case of Heroku, their resources were hosted in a single datacenter offered by Amazon, so when Amazon's infrastructure failed, Heroku's services failed as well, leaving Heroku and its customers with no way to restore service to their users [10]. The effect of these failures on scientists is highly variable: if no large-scale experiments are being run, the price of failure is low, but in the rare "bursty" periods when the resources are needed, a failure or datacenter outage can have a significant negative effect on those who have come to rely upon them.

4. ENTER THE CLOUD PLATFORM
Our solution to the problems with utilizing the cloud for HPC has been a consistent one: we advocate the use of an open platform-as-a-service that can harness multiple cloud infrastructures. This gives us the ability to customize the infrastructure to automatically install HPC software and databases, tune them as needed, and be resilient in the face of failures in a single cloud. We have developed an open cloud platform, known as AppScale, that automatically deploys Google App Engine applications over a variety of different datastores, including HBase, Cassandra, and MySQL. A key feature of AppScale is its automation: when used over a cloud infrastructure, software is installed and configured automatically for use over as many machines as the user can afford. When not used over a cloud infrastructure (e.g., when in use over virtualized instances), automation is still supported: users need only specify the IP addresses where their machines are located, and AppScale acts identically to the cloud scenario.

In its initial implementation [4], AppScale offered automation for web applications, but not HPC applications. We thus expanded upon this with Neptune [2], a domain specific language that allows scientists to automatically deploy certain HPC software packages via AppScale. Users write code in Neptune, a superset of the Ruby [14] programming language, and need only tell it where their code is located, what type of code it is (MPI, MapReduce, and so on), and how many machines are needed (a sketch of such a job script appears at the end of this section). Neptune then instructs AppScale to start all the services needed to run that computation (e.g., spawning that many nodes and configuring them accordingly) and then runs the user's code. Automation is preserved: the scientist does not need to know where the machines are, nor how to recompile kernels or deal with eventual consistency (to cite previous examples).

Database inclusion is also automatic: for codes that produce output via standard out, the output is automatically saved to any of the nine databases supported by AppScale or to Amazon S3. As Google Storage and Eucalyptus' Walrus implementation are API-compatible with Amazon S3, users can also utilize Neptune with these storage backends.

Neptune has also shown itself to be extensible: in addition to serving general purpose HPC software packages such as MPI and UPC, Neptune has also been adapted for use by computational scientists seeking automation for their codes. Specifically, their codes are implementations of the Stochastic Simulation Algorithm (SSA) [7], which employ Monte Carlo methods to simulate biological processes. To enjoy a reasonable amount of accuracy in their simulations, a large number of them must be run with differing seed values, so Neptune is employed as a coordinator. It acquires as many nodes as the scientist asks for, splits up the work evenly, and merges the final results. Scientists require no system administrator experience to use the system, and a preferred storage system is Amazon S3, due to the large number of tools (such as the Firefox plugin ElasticFox) that seamlessly integrate with it to retrieve their results (here, a series of MATLAB graphs depicting the results). Neptune supports the specialized SSA algorithms provided by the Diffusive Finite State Projection algorithm [6] and the doubly-weighted SSA [5], as well as the general purpose SSA implementation provided by StochKit [15].
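To make the interface described above concrete, the sketch below shows what a minimal Neptune job script might look like for an MPI code, assuming Neptune is installed as a Ruby gem. It is illustrative only: the :type, :code, :nodes_to_use, and :output parameters follow the style of the Neptune DSL [2], but the exact keywords and defaults accepted by a given Neptune release may differ, and the paths shown are placeholders.

    # Hypothetical Neptune job script (plain Ruby). The scientist states what
    # to run, how many machines to use, and where the output should be stored;
    # AppScale acquires and configures the nodes and then runs the code.
    require 'rubygems'
    require 'neptune'

    neptune :type => "mpi",                    # kind of computation to dispatch
            :code => "/code/powermethod",      # location of the compiled binary
            :nodes_to_use => 4,                # machines to acquire for the job
            :output => "/outputs/powermethod"  # remote path for the results

Because the output path names a location in the platform's storage layer rather than on any particular machine, the intent is that the same script runs unchanged whether AppScale is deployed over Eucalyptus, Amazon EC2, or a set of virtualized instances.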
5. HPC & DB, MEET HYBRID CLOUD
AppScale and Neptune do currently operate over hybrid cloud deployments. However, there are new directions we can take to better serve the intersection of HPC and databases. We propose two approaches. First, we can utilize hybrid clouds to minimize the downtime caused by a single provider failing (or a datacenter failing within a provider). Hybrid clouds have been explored previously, but only for interoperation between multiple cloud infrastructure providers. We propose utilizing multiple cloud platform providers to greater effect. This allows an application written once to run in multiple providers, which handle scaling automatically, without user intervention. Cloud platforms like Google App Engine allow programs to be written in Python, Java, or Go, and then be deployed over Google's hardware. Our proposal is to deploy these applications over both Google's platform and the AppScale platform, which can then utilize resources via Amazon or Eucalyptus. Application-level logic present within AppScale can then allow the two sets of resources to act in unison, and the application (running over either platform) can present a web interface by which new jobs can be started or the results of old jobs viewed.

The second approach harnesses AppScale's automated placement support, by which users can specify which nodes fulfill which roles in the system and have them be automatically configured as needed. We propose to use this support more intelligently in hybrid cloud scenarios: for example, we could place a single database node in each cloud and use AppScale's automated configuration support to ensure that the database instances all communicate with one another without user intervention. We can automatically utilize Cassandra's variable read and write policy settings to speed up write access at the cost of reads for applications favoring it, and utilize it as the underlying database in AppScale. For users who do not know the read/write frequency of their application, we can default to a neutral strategy that favors neither, or adaptively switch the strategy, which is possible since we have complete database access (a sketch of this selection logic follows).
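The policy selection itself is straightforward to express. The following is a minimal sketch, in plain Ruby, of how the platform might choose Cassandra read and write consistency levels from an observed read/write ratio. The helper name, the thresholds, and the :one/:quorum/:all symbols standing in for Cassandra's ONE, QUORUM, and ALL levels are our own illustrative choices, and the actual wiring into AppScale's datastore layer is not shown.

    # Illustrative policy chooser: favor cheap writes for write-heavy workloads,
    # cheap reads for read-heavy ones, and a balanced quorum strategy otherwise.
    # Thresholds and names are hypothetical.
    def choose_consistency(reads, writes)
      total = reads + writes
      return { :read => :quorum, :write => :quorum } if total == 0  # neutral default

      write_fraction = writes.to_f / total
      if write_fraction > 0.75
        # Write-heavy: acknowledge writes at one replica, pay the cost on reads.
        { :read => :all, :write => :one }
      elsif write_fraction < 0.25
        # Read-heavy: serve reads from one replica, pay the cost on writes.
        { :read => :one, :write => :all }
      else
        { :read => :quorum, :write => :quorum }
      end
    end

    # The platform could re-evaluate this periodically as request counters change:
    choose_consistency(900, 100)  # => { :read => :one, :write => :all }
    choose_consistency(100, 900)  # => { :read => :all, :write => :one }

In this sketch, the levels in each branch are chosen so that reads and writes still overlap on at least one replica; a deployment that prefers raw speed over consistency could instead drop both to ONE, as in the eventual-consistency configuration described below.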

An example use case of this approach is shown in Figure 1. Here, AppScale is deployed in a hybrid cloud strategy across three clouds: one in Amazon EC2's West Coast availability zone, a second in Amazon EC2's East Coast availability zone, and a third in a local, privately managed Eucalyptus cloud. Each cloud contains one database node and one or more compute nodes to ensure that reads and writes can always be performed via a node in the same cloud. Our example utilizes Cassandra's eventual consistency model, in which only a single node needs to be contacted for reads and writes, to minimize the impact HPC clients face when accessing the database. Database nodes are connected across clouds and update one another via their internal protocols (e.g., the Gossip protocol in the case of Cassandra). In the unlikely event of a failure or outage in a single cloud, nodes still survive in other clouds, and if the underlying framework and computational model is fault-tolerant, as is the case for Hadoop MapReduce, then this failure will not result in the failure of the entire computation.

[Figure 1 depicts three clouds (Cloud 1: EC2 West Coast; Cloud 2: EC2 East Coast; Cloud 3: a private Eucalyptus deployment), each containing a DB Node and Compute Nodes.]

Figure 1: An example of how AppScale's automated placement support can be used to provide better interaction between HPC applications and databases in hybrid cloud deployments. Here, a single Eucalyptus deployment is used when a small number of machines are needed, and two public clouds (availability zones in Amazon EC2) are used when more machines are needed.

Two points are critical to make in this example: the automation and the potential cost savings. With AppScale and Neptune, this system can be deployed automatically, and does not require the scientist to be an expert user of Amazon EC2, Eucalyptus, Cassandra, and Hadoop MapReduce to be able to run their scientific codes. Furthermore, the scientist can develop and test their code locally against the Eucalyptus deployment, and only when more machines are needed are they acquired via Amazon EC2.

6. CONCLUSIONS
We see the use of an open cloud platform as a viable means to solve pressing issues facing the HPC community today, especially with respect to automation. Utilizing cloud platforms provides scientists with automated access to HPC resources, including their installation and deployment over independent cloud providers, and alleviates scientists of the burden of knowing how to perform these non-trivially difficult tasks. Furthermore, since a cloud platform could automatically publish and make public the results of scientific jobs, data would be readily accessible via a database that provides data replication and fault tolerance even if a datacenter fails or becomes inaccessible. Finally, the data stored here is portable: Neptune provides a single interface by which it can store or retrieve data stored in the databases it is compatible with, so users can copy their data out of an AppScale-backed deployment running Cassandra and place it in a public cloud-based repository a la Amazon S3 or Google Storage, a private cloud-based repository a la Eucalyptus Walrus, or even another AppScale-backed deployment (with the ten different databases it supports). Additionally, any other databases that the open source community implements support for within AppScale can automatically be utilized by scientists within their codes, without requiring expert database knowledge.

7. ACKNOWLEDGEMENTS
We thank the anonymous reviewers for their insightful comments. This work was funded in part by Google, IBM, and NSF grants CNS-CAREER-0546737 and CNS-0905237.

8. REFERENCES
[1] Amazon Web Services. http://aws.amazon.com/.
[2] C. Bunch, N. Chohan, C. Krintz, and K. Shams. Neptune: A Domain Specific Language for Deploying HPC Software on Cloud Platforms. In 2nd ACM Workshop on Scientific Cloud Computing, 2011.
[3] Cassandra. http://incubator.apache.org/cassandra/.
[4] N. Chohan, C. Bunch, S. Pang, C. Krintz, N. Mostafa, S. Soman, and R. Wolski. AppScale: Scalable and Open AppEngine Application Development and Deployment. In ICST International Conference on Cloud Computing, Oct. 2009.
[5] B. J. Daigle, M. K. Roh, D. T. Gillespie, and L. R. Petzold. Automated estimation of rare event probabilities in biochemical systems. J. Phys. Chem., 2011.
[6] B. Drawert, M. J. Lawson, L. Petzold, and M. Khammash. The diffusive finite state projection algorithm for efficient simulation of the stochastic reaction-diffusion master equation. J. Phys. Chem., 132(7), 2010.
[7] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem., 81(25):2340–2361, 1977.
[8] Java App Engine Outage, July 14, 2011. http://googleappengine.blogspot.com/2011/07/java-app-engine-outage-july-14-2011.html.
[9] HBase. http://hadoop.apache.org/hbase/.
[10] Heroku Learns from Amazon EC2 Outage. http://searchcloudcomputing.techtarget.com/news/1378426/Heroku-learns-from-Amazon-EC2-outage.
[11] Lustre. http://www.lustre.org/.
[12] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov. The Eucalyptus Open-source Cloud-computing System. In IEEE International Symposium on Cluster Computing and the Grid, 2009. http://open.eucalyptus.com/documents/ccgrid2009.pdf.
[13] Amazon Web Services Reports Outage in the U.S. Late Monday. http://www.pcworld.com/businesscenter/article/237588/amazon_web_services_reports_outage_in_the_us_late_monday.html.
[14] Ruby language. http://www.ruby-lang.org.
[15] K. R. Sanft, S. Wu, M. Roh, J. Fu, R. K. Lim, and L. R. Petzold. StochKit2: Software for Discrete Stochastic Simulation of Biochemical Systems with Events. Bioinformatics, 2011.
