Cloud Performance from the End User

A CloudOps Analysis of Client-Side Latency and Availability on Cloud Platforms

EXECUTIVE SUMMARY

We analyzed several days' worth of data collected worldwide from visitors to specially instrumented sites. The data totaled nearly three hundred million individual tests, conducted by 1.2 million individual browser visits, measuring the performance and availability of nine cloud providers. Some quick lessons learned:

  • Regional effects matter tremendously, and the difference between countries can be significant. Directing a regional client to the correct regionalized zone of a cloud provider does improve performance.
  • The average time to complete an HTTP request and receive a response, worldwide, was 426.4 milliseconds. The average availability worldwide was 97.69%.
  • While most cloud providers had roughly similar average availability, a closer analysis of percentiles shows that the worst-served visitors fared very differently. This underscores the importance of proper analysis of performance data.
  • End user information is an important complement to synthetic testing. While the results aren't as consistent, they provide insight into the conditions of far-flung end users across a broad spectrum of networks and countries.
  • Given the variance in cloud performance and availability by region and by day, it makes sense for serious cloud users to hedge their bets and find ways to arbitrage cost and service quality across providers.

PAGE 2 > CLOUD PERFORMANCE FROM THE END USER www.cloudops.com

INTRODUCTION

The risk and uncertainty of cloud computing platforms are often cited as major obstacles to adoption by a broader audience, particularly mainstream businesses. In this report, we analyze data provided by Cedexis to estimate the performance of several leading cloud providers from the end users' point of view.
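Worldwide averages like the 426.4 ms figure above combine measurements from many networks, and (as described later in the report) the data arrives pre-aggregated per network, with each row carrying a measurement count and an average. Recombining such rows requires a count-weighted mean, not a plain average of averages. A minimal sketch, with invented field names and values (this is not Cedexis' actual schema or code):

```javascript
// Hypothetical sketch of recombining per-network aggregate rows into one figure.
// Field names and values are invented for illustration.
var rows = [
  { provider: 'EC2 EU', isp: 'France Telecom', count: 810065, avgMs: 420 },
  { provider: 'EC2 EU', isp: 'Tiny ISP',       count: 12,     avgMs: 900 }
];

// Weight each row's average by its measurement count, so a row built
// from 810,065 tests counts 810,065 times more than a row built from 12.
function weightedAvg(rows) {
  var totalCount = 0, weightedSum = 0;
  rows.forEach(function (r) {
    totalCount += r.count;
    weightedSum += r.count * r.avgMs;
  });
  return weightedSum / totalCount;
}

// A naive "average of averages" would report (420 + 900) / 2 = 660 ms;
// the count-weighted mean stays close to 420 ms.
console.log(weightedAvg(rows).toFixed(2)); // "420.01"
```

The same weighting applies whenever aggregates are rolled up further, e.g. from per-ISP rows to a per-country or worldwide figure.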
Because the information comes from actual browsers, we learn not only about the performance of the clouds, but also about the countries and service providers involved.

How the data was collected

Cedexis captures performance information from visitors' browsers when those browsers are idle. Customers insert a small JavaScript snippet into their pages:

    <script type="text/javascript">
    (function(d, w) {
      var requestorId = '1-10001',
          onWindowLoaded = function() {
            var a = document.createElement('script');
            a.type = 'text/javascript';
            a.src = 'radar_config.js';
            d.body.appendChild(a);
          };
      if (w.addEventListener) {
        w.addEventListener('load', onWindowLoaded, false);
      } else if (w.attachEvent) {
        w.attachEvent('onload', onWindowLoaded);
      }
    })(document, window);
    </script>

Once the page has finished loading, the browser executes the script. The browser receives a weighted list of providers, with URLs for small and large objects for each provider. The client randomly picks some number of these providers to measure each time. For each provider, the browser then requests two objects: a 50-byte object called cdx10b.js and a 100-Kbyte object called cds10b-100KB.js. The smaller object is requested twice, each time with a different random URI parameter, in order to prevent any intermediate caches from skewing the results. That means requests look like this:

    http://wac.0de9.edgecastcdn.net/800DE9/www.cedexis.com/cdx10b.js?rnd=7009653189

Cedexis uses a dynamic application to generate the response, rather than a static object. For some providers, static objects are served by a different infrastructure from dynamic ones; dynamically generating the response bypasses this external shell of infrastructure and instead tests the cloud itself.

Cedexis captures a relatively small amount of data on each cloud each time a page loads. This includes:

  • Availability: whether the page loads or not.
  • HTTP Connect Time: how long it takes for the browser to establish a connection with the server, including DNS time and the establishment of the initial TCP and HTTP sessions to the provider.
  • HTTP Response Time: how long it takes for the server to respond to a subsequent request, once all of the noise of establishing a connection is complete. This is a relatively close approximation of TCP round-trip time (RTT) from the browser to the provider.
  • Throughput: the data rate of the connection, in kilobits per second, as measured from the retrieval of the 100-Kbyte object.

The system doesn't collect other information, such as OS or browser, that could be used to further segment the results, or other sources of delay such as DNS lookup time and SSL negotiation time (though these can be inferred from the difference between the first and second requests of the small object).

What client-side data looks like

An actual message from a client contains the following fields (example values shown):

    Field                       Example
    Count                       1
    total_ms                    100
    time                        2011-02-24 21:00:00
    market_num                  3
    country_num                 1
    autosys_num                 6752
    provider_owner_zone_id      0
    provider_owner_customer_id  0
    provider_id                 13
    probe_type_num              3

Cedexis did several things to the data before providing it to CloudOps Research. First, they limited it to cloud provider test data. Through its customers, Cedexis collects data on a wide range of providers [1]; we focused only on nine clouds. Second, they replaced certain fields' numbers with their text equivalents (the service provider, the cloud provider, and the country, for example). Third, they removed some of the data that doesn't aggregate well, leaving us with the HTTP response time and availability tests; we did not analyze the HTTP Connect Time or Throughput metrics. Fourth, they aggregated it by day to make it feasible to process, which meant we had one row per cloud provider, per service provider, per day. This means that, for every ISP in the world, we had a minimum of 63 rows (7 days times 9 cloud providers), assuming at least one visitor from that ISP on a given day. As we'll see below, this loses a significant amount of information, but is still valuable for overall analysis.

[1] Measurements are collected for cloud providers (EC2 APAC, EC2 EU, EC2 US-East, EC2 US-West, GoGrid, Google App Engine, Joyent, Rackspace CloudServer, Windows Azure); cloud storage and delivery networks (CacheFly, CloudFlare, Cloudfront, MaxCDN, Nirvanix, Rackspace CloudFiles, S3-EU, S3-US, Voxel); traditional CDN/ADN providers (Akamai, Akamai HD, BitGravity, CDNetworks, Cotendo, Edgecast, Highwinds, Internap, Level3, Limelight, CDNVideo, ChinaCache, Internode, Ngenix, Prime Networks, Yacast); and major web destinations (Amazon.com, AOL.com, Bing.com, Blogger.com, Facebook.com, MSN.com, Twitter.com, WordPress.com, Yahoo.com, YouTube.com). Cedexis clients may also test their own providers or data centers.

The resulting data, which we received from Cedexis in raw format, looked like this:

    Value                    What it means
    EC2 APAC                 The provider that handled the request, in this case Amazon's EC2 zone in Asia Pacific.
    Egypt                    The country from which the request came.
    Bibliotheca Alexandrina  The service provider from which the request came, in this case the Bibliotheca Alexandrina.
    33782                    The Autonomous System (AS) number of the network from which the request came.
    2010-12-15 00:00:00      The timestamp of the request. Cedexis truncated this to individual days.
    HTTP Connect Time        The metric being measured, in this case the time to connect to a server via HTTP.
    2                        The number of measurements made, in this case two.
    883                      The average HTTP Connect Time across these two measurements.

Data aggregation

It's well understood among statisticians, performance experts, and site operators that averages mask problems. Average across too many data points, and it's hard to identify incidents that are otherwise hidden by the smoothing effect of averages; at the same time, a few outliers can skew an otherwise good experience.

In its day-to-day use, Cedexis needs granularity finer than one day. Once a minute, all of the measurements collected from the same provider network (e.g. "Bibliotheca Alexandrina in Egypt") for the same test metric (e.g. "HTTP response time of cdx10b.js") are stored as a row in the company's database. That means Cedexis has views like the following into the performance of much of the Internet's backbone, as seen by browser users around the world.

For this report, we're dealing with the daily aggregate of the data. That means that, in an extreme case, one day of activity on France Telecom's network yielded 810,065 data points about Amazon's European availability zone, because there were 810,065 unique tests of Amazon Europe done by browsers on that network. This produces a very reliable picture of network availability, but it also hides the variance across those individual browsers. This limitation comes down to a simple lack of time, tools, and computing horsepower on our part; one-minute summaries are likely to be relevant for Cedexis' core business of making good service provider selections as the Internet's backbones change.

We used percentiles to provide better insight into the end user experience. This is important because we often care about how the worst-served 5% experienced an application. To do this, we had to build histograms by provider, day, country, and service provider, which consumes more computing resources but ultimately gives us a clearer means of comparing providers.

There's another important source of bias in the results: as a result of its early beta and rollout, Cedexis' largest customers are popular French media sites.
This means it's unwise to make sweeping generalizations from this data; as Cedexis' customer base becomes more global, the data will become easier to apply to providers worldwide.

The two data sets

Cedexis provided us with two sets of data. The first, described above, was analyzed using Google's Fusion Tables product. It consisted of 1.2 million rows of information, spanning five days and representing 296,265,830 individual tests conducted by visiting browsers.
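The percentile approach described in the previous section can be made concrete with a minimal sketch. The latency values below are invented for illustration, and this nearest-rank calculation is our own simplification, not the actual analysis code; it shows how an average can look tolerable while the 95th percentile exposes what the worst-served visitors experienced:

```javascript
// Minimal sketch: averages vs. percentiles on response times.
function mean(samples) {
  return samples.reduce(function (sum, v) { return sum + v; }, 0) / samples.length;
}

// Nearest-rank percentile: sort ascending, then take the value at
// rank ceil(p/100 * n), clamped to the valid range.
function percentile(samples, p) {
  var sorted = samples.slice().sort(function (a, b) { return a - b; });
  var rank = Math.max(1, Math.min(sorted.length, Math.ceil((p / 100) * sorted.length)));
  return sorted[rank - 1];
}

// Nine fast responses and one five-second outlier (invented values):
var latencies = [100, 100, 100, 100, 100, 100, 100, 100, 100, 5000];

console.log(mean(latencies));           // 590  -- looks tolerable
console.log(percentile(latencies, 95)); // 5000 -- what the worst-served visitor saw
```

Building per-provider histograms, as the report describes, lets the same percentile question be answered without keeping every raw sample in memory.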
