Next Stop, the Cloud: Understanding Modern Web Service Deployment in EC2 and Azure
Keqiang He, Alexis Fisher, Liang Wang, Aaron Gember, Aditya Akella, Thomas Ristenpart
University of Wisconsin – Madison
{keqhe,afisher,liangw,agember,akella,rist}@cs.wisc.edu

ABSTRACT
An increasingly large fraction of Internet services are hosted on a cloud computing system such as Amazon EC2 or Windows Azure. But to date, no in-depth study of cloud usage by Internet services has been performed. We provide a detailed measurement study to shed light on how modern web service deployments use the cloud and to identify ways in which cloud-using services might improve these deployments. Our results show that: 4% of the Alexa top million use EC2/Azure; there exist several common deployment patterns for cloud-using web service front ends; and services can significantly improve their wide-area performance and failure tolerance by making better use of existing regional diversity in EC2. Driving these analyses are several new datasets, including one with over 34 million DNS records for Alexa websites and a packet capture from a large university network.

Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: General; C.4 [Performance of Systems]: Metrics – performance measures

General Terms
Measurement

Keywords
Web Service, Cloud Computing, EC2, Azure, DNS, trace analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
IMC'13, October 23–25, 2013, Barcelona, Spain.
Copyright 2013 ACM 978-1-4503-1953-9/13/10 ...$15.00.
http://dx.doi.org/10.1145/2504730.2504740.

1. INTRODUCTION
Up until a few years ago, web services were hosted in heterogeneous server clusters or co-location centers that were widely distributed across different network providers and geographic regions. Today, web services are increasingly being deployed in infrastructure-as-a-service (IaaS) clouds such as Amazon EC2, Windows Azure, and Rackspace. Industry and the media claim that over 1% of Internet traffic goes to EC2 [31], and outages in EC2 are reputed to hamper a huge variety of services [4, 6, 24, 35].

Despite the popularity of public IaaS clouds, we are unaware of any in-depth measurement study exploring the current usage patterns of these environments. Prior measurement studies have quantified the compute, storage, and network performance these clouds deliver [29, 30], evaluated the performance and usage patterns of specific services that are hosted in these clouds, e.g., Dropbox [23], or examined cloud usage solely in terms of traffic volume [28].

We present the first in-depth empirical study of modern IaaS clouds that examines IaaS cloud usage patterns and identifies ways in which cloud tenants could better leverage IaaS clouds. We focus specifically on web services hosted within IaaS clouds, which our study (unsurprisingly) indicates is a large and important use case for IaaS.

We first examine who is using public IaaS clouds. We generate a dataset of cloud-using domains using extensive DNS probing in order to compare the IPs associated with websites on Alexa's top 1 million list [1] against published lists of cloud IP ranges. This identifies that ≈40K popular domains (4% of the Alexa top million) have a subdomain running atop Amazon EC2 or Windows Azure, two of the largest public clouds. We extract an additional ≈13K cloud-using domains from a one-week packet capture from a large university network, and we use this capture to characterize the network traffic patterns of cloud-hosted web services. These results indicate that a large fraction of important web services are already hosted within public IaaS clouds.

We proceed to dissect how these services are using the cloud. EC2 and Azure both have a veritable potpourri of features, including virtual machines, load balancers, platform-as-a-service (PaaS) environments, content-distribution networks (CDNs), and domain name services. They also give tenants the choice of deploying their services in several different regions (i.e., geographically distinct data centers), and EC2 provides several different "availability zones" within each region. We couple analysis of DNS records with two different cloud cartography techniques [34] to identify which features, regions and zones web services use. We identify several common front end deployment patterns and report estimates of the percentages of Alexa subdomains using each of the patterns. In particular, we find that about 4% of EC2-using web services use load balancers and 8% of them leverage PaaS. Only 5% of the DNS servers used by cloud-using subdomains run on VMs inside EC2 or Azure. We also show that 97% of the subdomains hosted on EC2 and 92% of the subdomains hosted on Azure are deployed in only a single region. Counted among these are the subdomains of most of the top 10 (by Alexa rank) cloud-using domains. Services deployed in EC2 also appear to make limited use of different availability zones: our measurements estimate that only 66% of subdomains use more than one zone and only 22% use more than two. This lack of redundancy means that many (even highly ranked Alexa) services will not tolerate single-region or even single-zone failures.

Finally, we use a series of PlanetLab-based [17] active measurements and simulations to estimate the impact of wide-area route outages and the potential for wide-area performance improvement. We find that expanding a deployment from one region to three could yield 33% lower average latency for globally distributed clients, while also substantially reducing the risk of service downtime due to downstream Internet routing failures.

The remainder of our paper is organized as follows. We first provide background on Amazon EC2 and Windows Azure and discuss the primary datasets we use in our study (§2). We then examine who is using the cloud (§3), and which cloud features, regions, and zones are used by the cloud-using web services we identified (§4). Based on these observations, we proceed to estimate the wide-area-failure tolerance of current web service deployments and the potential for performance improvement (§5). Finally, we discuss related work (§6) before concluding (§7).

2. MEASUREMENT SCOPE & DATASETS
Public IaaS clouds, such as Amazon EC2, Windows Azure, and Rackspace, allow tenants to dynamically rent virtual machine (VM) instances with varying CPU, network, and storage capacity. Cloud tenants have the option of renting VMs in one or more geographically distinct data centers, or regions.

[...] (PaaS) node (P3). With P2, the client request is subsequently directed to a VM¹. Tenants using P1–P3 may also rely on additional VMs or systems (dashed lines) to handle a client's request; these additional components may or may not be in the same region or availability zone (indicated by the gray boxes). An object returned to a client (e.g., a web page) may sometimes require the client to obtain additional objects (e.g., a video) from a content-distribution network (P4).

Figure 1: Deployment patterns for web services. (a) P1: VM front end; (b) P2: Load balancer front end; (c) P3: PaaS front end; (d) P4: Leverage CDN.

We focus on studying the front end portions of web service deployments within the above four deployment patterns (indicated by the thicker lines in Figure 1). These portions are encountered within the initial few steps of a client making a request. We leave an exploration of deployment/usage patterns covering the later steps (e.g., back-end processing) for future work.

2.1 Datasets
We use two primary datasets: (i) a list of cloud-using subdomains derived from Alexa's list of the top 1 million websites, and (ii) packet traces captured at the border of the UW-Madison campus network. Both datasets leverage the fact that EC2 [12] and Azure [8] publish a list of the public IPv4 address ranges associated with their IaaS cloud offerings. Below, we provide details on our Alexa subdomains and packet capture datasets. We augment these datasets with additional traces and active measurements to aid specific analyses; we describe these at the appropriate places in subsequent sections.

Top Cloud-Using Subdomains Dataset. Our first dataset is a list of subdomains which use EC2 or Azure and are associated with domains on Alexa's list of the top 1 million websites [1]. We consider a subdomain to use EC2 or Azure if a DNS record for that subdomain contains an IP address that falls within EC2 or Azure's public IP address ranges.

To construct this dataset, we first identified the subdomains associated with each domain on Alexa's list of the top 1 million websites. We started with Alexa's top 1 million list from February 6, 2013 and attempted to issue a DNS zone transfer (i.e., a DNS query of type AXFR) for each domain on the list. The query was successful for only about 80K of the domains. For the remaining domains, we used dnsmap [16] to identify subdomains by brute force. Dnsmap uses a pre-defined word list, which we augmented with
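The criterion used for the Top Cloud-Using Subdomains Dataset (a subdomain counts as cloud-using if any of its DNS records contains an IP address inside the published EC2 or Azure ranges) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the tooling used in the study: the CIDR blocks are RFC 5737 documentation prefixes standing in for the real published ranges, and the names `CLOUD_RANGES`, `build_index`, `clouds_for_ip`, `classify_subdomains`, and the sample records are all hypothetical.

```python
import ipaddress

# ASSUMPTION: placeholder prefixes (RFC 5737 documentation ranges),
# standing in for the published EC2 and Azure IPv4 ranges.
CLOUD_RANGES = {
    "EC2": ["192.0.2.0/24"],
    "Azure": ["198.51.100.0/24"],
}


def build_index(ranges):
    """Parse CIDR strings into network objects, keyed by cloud name."""
    return {
        cloud: [ipaddress.ip_network(cidr) for cidr in cidrs]
        for cloud, cidrs in ranges.items()
    }


def clouds_for_ip(ip, index):
    """Return the set of clouds whose ranges contain the given IP."""
    addr = ipaddress.ip_address(ip)
    return {
        cloud
        for cloud, networks in index.items()
        if any(addr in network for network in networks)
    }


def classify_subdomains(dns_records, index):
    """Map each cloud-using subdomain to the clouds it resolves into.

    dns_records: dict of subdomain -> list of IPv4 strings taken from
    that subdomain's DNS records. Subdomains with no IP inside any
    cloud range are omitted from the result.
    """
    result = {}
    for subdomain, ips in dns_records.items():
        clouds = set()
        for ip in ips:
            clouds |= clouds_for_ip(ip, index)
        if clouds:
            result[subdomain] = clouds
    return result


INDEX = build_index(CLOUD_RANGES)

# Hypothetical DNS answers for three subdomains.
SAMPLE_RECORDS = {
    "www.example.com": ["192.0.2.17"],
    "mail.example.com": ["203.0.113.5"],
    "cdn.example.org": ["198.51.100.200", "203.0.113.9"],
}

CLOUD_USING = classify_subdomains(SAMPLE_RECORDS, INDEX)
```

With these placeholder inputs, `www.example.com` is attributed to EC2 and `cdn.example.org` to Azure, while `mail.example.com` is dropped because none of its addresses fall inside a listed range. A linear scan over the ranges is adequate for a sketch; at the scale of millions of DNS records, a prefix trie or sorted interval lookup would likely be preferable.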