View from the Cloud
Editor: George Pallis • [email protected]

The AppScale Cloud Platform
Enabling Portable, Scalable Web Application Deployment

Chandra Krintz • University of California, Santa Barbara

AppScale is an open source distributed software system that implements a cloud platform as a service (PaaS). AppScale makes cloud applications easy to deploy and scale over disparate cloud fabrics, implementing a set of APIs and an architecture that also make apps portable across the services they employ. AppScale is API-compatible with Google App Engine (GAE) and thus executes GAE applications on-premise or over other cloud infrastructures, without modification.
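The decoupling described in the abstract, where an app codes to one API while the platform plugs in a different service implementation per deployment, can be sketched in a few lines. This is a minimal, hypothetical Python illustration under stated assumptions, not AppScale's code and not the GAE Datastore API: the names `DatastoreBackend`, `InMemoryBackend`, `SQLiteBackend`, `PLUGINS`, `make_backend`, and `app_record_visit` are invented for this sketch, a Python dict stands in for a NoSQL store, and the standard-library `sqlite3` module stands in for a SQL database.

```python
import json
import sqlite3
from typing import Dict, Optional, Protocol

# Hypothetical names throughout; this is NOT the GAE Datastore API.


class DatastoreBackend(Protocol):
    """App-facing interface: apps depend only on put/get."""

    def put(self, key: str, entity: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryBackend:
    """Plug-in keeping JSON-encoded entities in a dict (stand-in for a NoSQL store)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, entity: dict) -> None:
        self._data[key] = json.dumps(entity)

    def get(self, key: str) -> Optional[dict]:
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


class SQLiteBackend:
    """Plug-in storing entities in a relational table (stand-in for a SQL store)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entities (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key: str, entity: dict) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO entities (k, v) VALUES (?, ?)",
                (key, json.dumps(entity)),
            )

    def get(self, key: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT v FROM entities WHERE k = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])


# Deployment-time registry: the operator, not the app, picks the plug-in.
PLUGINS = {"memory": InMemoryBackend, "sqlite": SQLiteBackend}


def make_backend(name: str) -> DatastoreBackend:
    return PLUGINS[name]()


def app_record_visit(store: DatastoreBackend, user: str) -> int:
    """App code: written once against the interface, never against a backend."""
    entity = store.get(user) or {"visits": 0}
    entity["visits"] += 1
    store.put(user, entity)
    return entity["visits"]


if __name__ == "__main__":
    for plugin in ("memory", "sqlite"):
        store = make_backend(plugin)
        app_record_visit(store, "alice")
        print(plugin, app_record_visit(store, "alice"))  # prints "memory 2", then "sqlite 2"
```

Swapping the plug-in changes only the name passed to `make_backend`; `app_record_visit` is untouched. That is the property that lets a deployment choose, for example, Cassandra or MySQL Cluster behind the same Datastore API. A real plug-in layer would also need queries, transactions, and data migration, which this sketch omits.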

As compute power, disk storage, and high-end network communication costs plummet, cloud computing has emerged to provide intuitive, utility-style access to vast pools of resources (compute, storage, networking, and software services). Although such processing power is cheap and readily available, accessing it from cloud infrastructure providers via infrastructure as a service (IaaS) currently requires significant expertise, experience, and time to customize, configure, deploy, and manage virtual machines (VMs).

Recent advances in platform-level cloud computing (platform as a service, or PaaS) have significantly simplified cloud use by giving developers complete software/runtime stacks (versus the self-service VMs of IaaS) on which to execute their Web-accessible applications (apps) and services. PaaS systems offer programmatic access to scalable, distributed, and fault-tolerant cloud services, which eliminates the need for developers to write or deploy their own and lets them focus on innovation.

Cloud platform application services typically include key-value, relational, object, or blob data storage, data caching, email and messaging, authentication, monitoring, resource/service acquisition, background tasking, and data analytics technologies, among others. Extant PaaS systems automatically configure, deploy, and scale, either fully or partially, the apps and services they execute. Unfortunately, given the current state of the art in PaaS systems, a key barrier to their widespread use remains: lock-in to a particular cloud system or app service implementation. We can address this portability problem with a new cloud platform called AppScale.

The Portability Problem

Public PaaS systems such as Google App Engine (GAE), Azure, Amazon Elastic Beanstalk, and VMWare CloudFoundry all offer similar cloud service ecosystems for app use. Unfortunately, they do so via different APIs and language bindings, scale and service-level guarantees, performance levels, pricing models, and standards, rules, and restrictions with which apps must comply. The sheer multitude of offerings and options makes it challenging for new and expert software developers alike to determine which set of services is best for their apps, for some definition of “best” (price, performance, scale, configurability, familiarity, ease of use, and so on). Moreover, once users choose a platform, code their app to its service interfaces, and configure it for that system, they become locked in to both the public cloud fabric and the service implementation the platform chooses to export. This lock-in occurs because changing, even to a similar service or platform from a different provider, requires the developer to exert significant porting effort (that is, code changes in the app).

Surprisingly, this lack of portability across popular app services occurs even for open source technologies. With the popularity of cloud services, a vast diversity of open source alternatives that implement service functionality has emerged. For example, the immensely popular key-value (informally dubbed NoSQL) datastores, initially described and used by Google and Amazon for their internal “big data” concerns, are now widely available as open source. Currently, more than 200 NoSQL options offer a similar overall service for apps, but they differ in one or more characteristics, such as their programming interfaces (APIs), query language and query support, deployment topology (master/slave versus peer-to-peer), performance and scaling behavior, data consistency policies, replication policies, and programming language bindings, among others. Moreover, each alternative has its own methodology for configuration and deployment in a distributed setting. This imposes overheads and learning curves on developers and system administrators, because each alternative is a complex system, its performance is sensitive to configuration, and its software updates are frequent.

A similar diversity of offerings is available for other app services, including authentication, MapReduce data analytics, full text search, multitasking and distributed coordination, caching, SQL databases, and messaging. Although open source app services offer developers more implementation choices than public cloud systems (which choose, limit, or restrict implementations for scaling and management purposes), the choice of any single implementation also leads to code lock-in. As with cloud fabrics and proprietary systems, it’s difficult to know in advance which option is best for a particular app or workload, and moving between service implementations, even if the app is designed to do so, consumes time, lines of code, and programmer focus that could instead be spent on the core innovation.

Our work at the University of California, Santa Barbara, addresses these portability limitations that extant cloud systems and app services impose with new research and technology. In particular, we attempt to reduce lock-in and to encourage and facilitate broader use (and thus investigation) of these systems for deploying current and future Web-based and data-intensive apps. Toward this end, we’ve designed, developed, and released as open source the AppScale cloud platform (http://appscale.cs.ucsb.edu). AppScale is a software infrastructure that implements a PaaS cloud to exploit the benefits that such systems offer: ease of use through a simplified deployment and execution model, and automatic deployment, configuration, and elastic scalability.

AppScale differs from other PaaS offerings in three primary ways. First, AppScale executes automatically over multiple IaaS clouds (on-premise and public), providing “write-once, deploy-anywhere” functionality for the cloud applications that execute over it. Second, AppScale implements the programming model and APIs of the de facto PaaS public cloud standard: GAE (http://code.google.com/appengine/docs/whatisgoogleappengine.html). Applications that execute over App Engine also execute over AppScale without modification, extending app portability to multiple IaaS and PaaS cloud fabrics. Third, the AppScale software architecture integrates (and automatically configures, deploys, and scales) multiple alternatives for each service or API its apps use within a given cloud instance. Such choice precludes application/code lock-in to any particular cloud system (IaaS or PaaS) or service implementation (NoSQL, SQL, tasking, messaging, authentication, and so on). Moreover, AppScale lets users experiment with these different technologies with a low barrier to entry, and gives cloud researchers a platform that facilitates the investigation of cloud technologies using a diversity of real apps and integrated technologies.

AppScale

Figure 1 depicts the design of the AppScale system. Our approach is unique in that we fully emulate GAE by implementing the App Engine programming model and APIs, both to bring successful public cloud technologies on-premise and under developer control, and to export these APIs via a plug-in software architecture that decouples the app from its service implementations, so that a service implementation (and the underlying cloud fabric) can be swapped out without changing the app.

[Figure 1: The AppScale distributed cloud platform. Components shown: AppScale platform APIs (GAE++); load balancing; fault tolerance and elastic scaling; configuration and deployment; API implementation/cloud and service integration; plug-in adaptors. AppScale plug-ins shown: data management (NoSQL, SQL, objects: Cassandra, MySQL, HBase, S3); analytics (search, MapReduce: Solr, Cloudera, Lucene, Google); clouds (on-premise, AWS, GAE, Azure).]

Figure 1. Design of the AppScale cloud platform. AppScale implements a multitier distributed Web service stack with automatic deployment, load balancing, and scaling, along with API adaptors for alternatives for each service API. Developers or systems operators deploy AppScale over virtualized cluster resources or cloud infrastructures and then upload their apps to the platform using an AppScale Web service or command line toolset.

GAE is a public cloud platform (distributed Web service stack) that hosts more than 1 million active apps today within Google’s data centers (http://googleappengine.blogspot.com/2011/05/year-ahead-for-google-app-engine.html). This widespread use and uptake has resulted from the programming model that App Engine implements, which reflects a set of best practices for intuitive and expedited Web service development, as observed by Google engineers. The programming model extracts and provides “as-a-service” application support technologies that are common across a wide range of Web applications. Developers incorporate the services (data storage, communications, authentication, tasking, and so on) into their apps using a simple and intuitive API for each. This model lets developers focus on app innovation rather than on the ancillary support on which their apps rely. GAE automatically instantiates, scales, and manages faults for the app as well as its service ecosystem, and isolates apps using sandbox support via high-level languages (Java, Python, and Go) and their runtimes.

Programming Model and APIs

AppScale implements the App Engine programming model for Web-based application development by implementing each API that GAE defines and supporting all the GAE programming languages. That is, AppScale is API-compatible with GAE and emulates GAE’s fully distributed behavior (on-premise or over an IaaS cloud instead of on Google’s resources), so that any app that executes over GAE also executes over AppScale, without code modification.

API compatibility with GAE lets us engender a large and growing user community, provide developers with access to extant applications, and investigate potential implementations of, and extensions to, public cloud systems using open source technologies. Developers code their applications to this set of APIs to access each of the ancillary services the app requires. AppScale, like App Engine, implements each of these APIs and then automatically deploys, manages, and scales the apps along with their service ecosystems.

To implement the services the GAE APIs export, AppScale provides a software framework into which we plug multiple, competing implementations of each service (open source or proprietary). Each AppScale cloud instance plugs in a single service for each API, but users can choose from multiple alternatives upon deployment. This support lets users easily compare and contrast different service alternatives with their apps and workloads. For example, for the Datastore API, AppScale integrates more than a dozen different plug-in alternatives, including those for Cassandra, HBase, Hypertable, Redis, MySQL Cluster (which we employ as a key-value store), and SimpleDB (an Amazon public cloud datastore service). Apps that use the Datastore API can use any one of these plug-ins simply by executing over a different AppScale cloud instance (AppScale supports moving data if needed). As part of this plug-in support, the AppScale platform automatically configures, deploys, and scales the service plug-ins and manages their faults, relieving both developers and cloud administrators of this significant burden.

Additional Features

The AppScale plug-in architecture also lets us export many other features and cloud technologies to app developers. In particular, AppScale efficiently profiles and extends its integrated services in a technology-independent way. For example, we provide a limited form of ACID transaction semantics across Datastore API plug-ins. We also integrate and export public cloud services as API plug-ins from popular cloud fabrics, including Amazon Web Services (AWS), Google cloud technologies, and Azure. We currently integrate these technologies via plug-in implementations for NoSQL, SQL, and unstructured storage. We also go beyond the GAE APIs and leverage the AppScale architecture’s extensibility to provide access to VM control (start, stop, and monitoring), background tasking of arbitrary programs or scripts, and support for other services that are increasingly important for data-intensive apps, such as MapReduce (via Hadoop) and statistical analytics.

The AppScale platform also provides the scalability, ease of use, and high availability that users have come to expect from public cloud platforms and infrastructures. This includes elasticity and fault detection/recovery,1 authentication and user control, monitoring and logging, cross-cloud data and application migration,2 hybrid cloud multitasking,3 and offline analytics and disaster recovery.2,4 In particular, we couple elasticity and fault tolerance to start and stop platform components within and across VMs, and we ultimately rely on Apache Zookeeper, which we employ for distributed coordination and state management, for system survivability.

AppScale Deployment and Use

We release AppScale as a single VM image and an easy-to-use Web-based toolkit, which automatically deploys an AppScale cloud using one or more VM instances. Each instance implements one or more AppScale components and services. An AppScale cloud integrates and automatically deploys various open source technologies that facilitate its functionality, including Apache Zookeeper (for fault-tolerant distributed coordination), RabbitMQ (for distributed multitasking), ejabberd/Strophe.js (for messaging and channel communication), Lucene (for full text search), sendmail, memcached (for distributed caching), Hadoop MapReduce, R analytics, the network file system, and the Hadoop distributed file system, among others.

Cloud configuration and elasticity are automatic but can be informed by user preferences if desired. As mentioned previously, our tools deploy AppScale over extant IaaS fabrics; we currently support on-premise deployments over Eucalyptus5 and public cloud deployments over Amazon Elastic Compute Cloud (EC2). Cloud administrators supply their credentials to the AppScale tools to initiate automatic deployment and then manage developers’ cloud use. Developers log in to an AppScale cloud to upload their apps. The AppScale cloud then executes and automatically scales the apps on the developers’ behalf. Because AppScale can also run as a single VM instance, developers can run AppScale locally over virtualization to test and debug their applications prior to cloud deployment (on AppScale or App Engine).

In summary, AppScale is an extensible and freely available distributed cloud platform that facilitates simplified development, automated deployment, and empirical investigation of cloud apps and their service ecosystems. AppScale enables applications written in high-level languages to execute over different cloud fabrics and to employ a vast diversity of application service implementations without modification. Such portability enables novice and expert developers alike to quickly and easily develop GAE apps that implement interesting Web service and data analytic applications, and to use extant and emerging cloud systems without requiring them to become experts in the underlying technologies or to be locked in to any particular cloud or service implementation their apps use. Additional information on AppScale and directions for download and use are available at http://appscale.cs.ucsb.edu.

Acknowledgments

This work wouldn’t be possible without an incredible group of students, led by Chris Bunch and Navraj Chohan, with whom I have had the privilege to work on this project over the past three years. This work was funded in part by Google, IBM, US National Science Foundation grants CNS-0546737, CNS-0905237, and CNS-1218808, and US National Institutes of Health grant 1R01EB014877-01.

References

1. C. Bunch et al., “A Pluggable Autoscaling Service for Open Cloud PaaS Systems,” Proc. IEEE/ACM Int’l Conf. Utility and Cloud Computing, 2012.
2. N. Chohan et al., “North by Northwest: Infrastructure Agnostic and Datastore Agnostic Live Migration of Private Cloud Platforms,” Proc. 4th Usenix Workshop Hot Topics in Cloud Computing (HotCloud 12), Usenix Assoc., 2012; www.cs.ucsb.edu/~ckrintz/papers/hotcloud12.pdf.
3. C. Bunch et al., “Language and Runtime Support for Automatic Configuration and Deployment of Scientific Computing Software over Cloud Fabrics,” J. Grid Computing, vol. 10, no. 1, 2012, pp. 23–46.
4. N. Chohan et al., “Hybrid Cloud Support for Large-Scale Analytics and Web Processing,” Proc. 3rd Usenix Conf. Web Application Development (WebApps 12), Usenix Assoc., June 2012.
5. D. Nurmi et al., “The Eucalyptus Open-Source Cloud-Computing System,” Proc. IEEE Int’l Symp. Cluster Computing and the Grid, 2009; http://open.eucalyptus.com/documents/ccgrid2009.pdf.

Chandra Krintz is a professor of computer science at the University of California, Santa Barbara, and is CTO and cofounder of AppScale Systems. Her research interests include cloud computing, compilers and runtime systems, dynamic and adaptive optimization, high-performance computing, resource-aware Internet and embedded systems, and broadening participation in computing. Krintz has a PhD in computer science from the University of California, San Diego. Contact her at [email protected].

Published by the IEEE Computer Society • 1089-7801/13/$31.00 © 2013 IEEE • IEEE Internet Computing, March/April 2013 • www.computer.org/internet/
