A PEER-TO-PEER MEASUREMENT PLATFORM AND ITS APPLICATIONS IN CONTENT DELIVERY NETWORKS

BY

SIPAT TRIUKOSE

Submitted in partial fulfillment of the requirements

for the degree of Doctor Of Philosophy

DISSERTATION ADVISOR: DR. MICHAEL RABINOVICH

DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

CASE WESTERN RESERVE UNIVERSITY

JANUARY 2014

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

SIPAT TRIUKOSE candidate for the Doctor of Philosophy degree *.

MICHAEL RABINOVICH

TEKIN OZSOYOGLU

SHUDONG JIN

VIRA CHANKONG

MARK ALLMAN

(date) December 1st, 2010

*We also certify that written approval has been obtained for any proprietary material contained therein.

Contents

List of Tables
List of Figures
List of Abbreviations
Abstract

1 Introduction
  1.1 Internet Measurements
  1.2 Content Delivery Network (CDN)
    1.2.1 Akamai and Limelight
    1.2.2 Coral
  1.3 Outline
  1.4 Acknowledgement

2 Related Work
  2.1 On-demand Network Measurements
  2.2 Content Delivery Network (CDN) Research
    2.2.1 Performance Assessment
    2.2.2 Security
    2.2.3 Performance Improvement

3 DipZoom: Peer-to-Peer Internet Measurement Platform
  3.1 System Overview
  3.2 The DipZoom Measuring Point (MP)
    3.2.1 MP-Loader, MP-Class, and MP Configurations
    3.2.2 Authentication
    3.2.3 Keep Alive
    3.2.4 Measurement
  3.3 The DipZoom Client and API
  3.4 Security
  3.5 Performance
    3.5.1 Scalability: Measuring Point Fan-Out
    3.5.2 Scalability: Client Fan-Out
    3.5.3 Demonstration Experiments
  3.6 Conclusion

4 A Large Scale Performance Study of a Commercial CDN
  4.1 Introduction
  4.2 Methodology
    4.2.1 Edge Server Discovery
    4.2.2 Overriding CDN Edge Server Selection
    4.2.3 Controlling Edge Server Caching
    4.2.4 Assessing Client-Side Caching Bias
    4.2.5 Measuring Edge Server Performance
  4.3 Performance of Edge Discovery
  4.4 Performance of Akamai CDN
    4.4.1 Does a CDN Enhance Performance?
    4.4.2 How Good Is Akamai Server Selection?
  4.5 Performance of Consolidated Akamai CDN
    4.5.1 Consolidation
    4.5.2 Impact of Incomplete Edge Server Discovery
    4.5.3 DipZoom Experiment
    4.5.4 A Live Study
  4.6 Conclusion

5 Security Issues in Commercial CDNs
  5.1 Introduction
  5.2 The Attack Components
    5.2.1 Harvesting Edge Servers
    5.2.2 Overriding CDN's Edge Server Selection
    5.2.3 Penetrating CDN Caching
    5.2.4 Amplifying the Attack: Decoupled File Transfers
    5.2.5 Verification
  5.3 End-to-End Attack
    5.3.1 The Setup
    5.3.2 A Sustained Attack
    5.3.3 A Burst Attack
    5.3.4 Discussion: Extrapolation to Commercial CDNs
  5.4 Implication for CDN Security
  5.5 Mitigation
    5.5.1 Defense by Content Provider
    5.5.2 Mitigation by CDN
  5.6 Conclusion
  5.7 Acknowledgement

6 Client-Centric Content Delivery Network
  6.1 Introduction
  6.2 Architectural Approaches
  6.3 The Effect of Infrequent Server Selection
  6.4 Performance Improvement
    6.4.1 Data Set
    6.4.2 The Improvement Simulation
    6.4.3 The Replay Experiment
  6.5 Discussion
    6.5.1 Realization of this approach
  6.6 Conclusion
  6.7 Acknowledgement

7 Conclusion

Bibliography

List of Tables

3.1 Detail of MP INFO field
3.2 TCP/UDP destination ports summary
3.3 Security threats in DipZoom and counter-measures

4.1 Initial vs. repeat download performance of an object with an appended random search string
4.2 The difference of RTT distance (in milliseconds) from clients to the nearest data center in a given consolidated platform and to the Akamai-selected server in the current platform (live clients)

5.1 The throughput of a cached object download (KB/s). Object requests have no appended random string
5.2 Initial vs. repeat download throughput for Akamai (KB/s). Requests include appended random strings
5.3 Initial vs. repeat download throughput for Limelight (KB/s). Requests include appended random strings
5.4 The download throughput (KB/s) of the monitor client. The monitor request is sent 0.5s after the probing request
5.5 Average traffic increase during the attack period

6.1 Pearson correlation of all trial pairs

6.2 Performance of the client-centric CDN vs the current practice (best case scenario)
6.3 Replay times (in seconds) of ext-our-replay and ext-current-replay
6.4 Replay times (in seconds) of int-our-replay and int-current-replay
6.5 TCP connection utilization in the network with no local Akamai edge server deployed
6.6 TCP connection utilization in the network with a local Akamai edge server deployed

List of Figures

1.1 Content delivery network

3.1 DipZoom measuring point overview
3.2 Measuring thread on measuring point (MP)
3.3 An example of MP configuration file dipzoom_mp.conf in XML format
3.4 Custom measurement plug-in management system: login screen
3.5 Custom measurement plug-in management system: new measurement sign-up
3.6 Custom measurement plug-in management system: available measurement list
3.7 Custom measurement plug-in management system: MP configuration tool
3.8 Custom measurement plug-in management system: generated MP configuration
3.9 Custom measurement plug-in management system: preference
3.10 Burst MP logins test
3.11 Average MP login successes
3.12 The sustained rate of MP operations involved in a measurement
3.13 Classified King/DipZoom ratios

4.1 Active DipZoom measurement points on 5/09/09

4.2 Relation between download throughput and RTT
4.3 Cumulative Akamai's edge server discovery against CNAMEs
4.4 Akamai's edge servers discovery
4.5 Progressive discovery of Akamai edge servers with time
4.6 The performance benefits of Akamai delivery
4.7 The residential client performance benefits of Akamai delivery
4.8 The comparison of no-cache download through Akamai and download from server
4.9 The fraction of servers outperformed by the Akamai-selected server
4.10 Download throughput difference between Akamai-selected and an alternative edge server
4.11 The implication of incomplete platform discovery: A client may be redirected to a more distant location
4.12 The performance of a consolidated Akamai platform with different number of data centers
4.13 The performance of a consolidated Akamai platform with different target object size
4.14 The performance of a consolidated Akamai platform with different download speed
4.15 The performance of a consolidated Akamai platform for residential speed links

5.1 Decoupled file transfers experiment
5.2 DoS attack with Coral CDN
5.3 The effects of a sustained DoS attack
5.4 The effects of a burst DoS attack

6.1 Download performance ratio of reused edge server over Akamai selected edge server throughput
6.2 Distribution of requests/connections to Akamai edge servers per client
6.3 Distribution of time gap between consecutive Akamai requests from the same client
6.4 Utilization (requests/connection) in both approaches
6.5 The ratio of per-client total TCP handshake time spent in the current approach over our approach

List of Abbreviations

Abbreviation  Description
API           Application Programming Interface
CDN           Content Delivery Network
DipZoom       Deep Internet Performance Zoom
DNS           Domain Name System
DoS           Denial of Service
GUI           Graphical User Interface
MP            DipZoom Measuring Point

A Peer-to-Peer Internet Measurement Platform and Its Applications in Content Delivery Networks

Abstract

by SIPAT TRIUKOSE

Network measurement is crucial for ensuring the Internet's effective operation, security, and continued development. However, collecting representative measurements in a complex infrastructure like the Internet is extremely challenging. To address this challenge, we propose a novel approach to providing focused, on-demand Internet measurements called DipZoom (for Deep Internet Performance Zoom). Unlike prior approaches, which face difficulty in building a measurement platform with sufficiently diverse measurements and measuring hosts, DipZoom implements a matchmaking service that uses P2P concepts to bring together experimenters in need of measurements and external measurement providers. Further, to demonstrate the utility of DipZoom as a tool for real-world research, we use it to answer some challenging questions regarding Internet operation. Specifically, we use DipZoom to conduct an extensive study of content delivery networks (CDNs), which are among the key components of today's Internet infrastructure, in the aspects of performance, security, and potential improvement. First, we conduct a large-scale performance study of the CDN platform operated by a leading CDN service provider. The study shows that the number of worldwide data centers in the CDN platform could be significantly reduced without affecting content delivery performance. Therefore, system designers can decide on the number of data centers to meet their other objectives without having to worry about performance degradation.

Second, we used some of the measuring techniques developed for the above performance study to uncover a significant security vulnerability in CDNs. We showed that several CDNs, including commercial ones, not only leave their customers vulnerable to application-level denial-of-service attacks, but are themselves susceptible to being recruited to amplify such attacks. Finally, based on insights gained in our CDN studies, we propose an approach to improve content delivery performance from the client perspective without a need to modify the CDN platform. We quantify the performance gains a client site would experience by adopting our approach and show that it can significantly improve the efficiency of users' Web browsing.

Chapter 1

Introduction

1.1 Internet Measurements

Network measurements play a fundamental role in network systems research and development. We measure networks and systems to understand them better, and these insights are in turn converted into better network and system designs. Measurements are also used for diagnosing network systems and evaluating their performance. Examples of measurements include the network latency between a pair of Internet hosts, packet loss rate, bottleneck bandwidth, Web content delivery throughput from a given site to a given client, and so on. Due to their significance, several network measurement platforms (that is, measurement infrastructures that offer a multitude of vantage points for measurement collection, e.g. [25, 91, 62, 58, 24, 77, 17, 3, 33]) and tools (e.g. [37, 42, 71, 10, 28, 14, 39, 27, 52, 47, 82, 56, 7, 92, 23]) have been developed. However, the large scale and diversity of the Internet [1, 19] give rise to significant challenges in establishing representative yet accurate measurements. Diversity in this context includes locations, connectivities, operating systems, networks, and so forth.

Several current research platforms are deployed on well-connected servers such as PlanetLab [17, 63], while many devices connect to the Internet via other channels (e.g. DSL, dial-up, cable modem, satellite, cellular broadband, Wi-Fi). This lack of diverse connectivity on measurement platforms makes measurement results not very representative, especially for residential and cellular networks.

Another issue that makes network measurement challenging is measurement staging. Network measurements can be complex and involve multiple steps. For example, to measure the quality of the server selection of a Web content delivery network (CDN), one might first use an nslookup [5] measurement to discover the set of CDN-selected servers for each given measuring point, and then use a curl [23] measurement from each measuring point to compare the page download time from the CDN-selected server and from other servers [79]. Coordinating a complex measurement experiment often involves acquiring control over measuring points, obtaining permission from the network operators where the measuring points reside, installing and executing measurement scripts at the measuring points, and collecting measurement results. This requires tremendous work on the part of the experimenter and, in many cases, personal connections to operators. As a result, there is a high threshold for obtaining high-quality and large-scale measurements.

We propose an on-demand measurement approach that addresses the above challenges. The challenge of collecting representative measurements often entails the need to access a large number of vantage points. Supporting and scaling this campaign can be a daunting task for a single entity. We address this challenge by utilizing resources made available by the Internet users who join DipZoom. We build a matchmaking service, which is DipZoom's core, and provide a peer software bundle and easy means for any new users to join and contribute their resources to the platform. Another challenge on a large measurement platform involves the manipulation of a large number of measuring points and the collection of their results. We handle this challenge by providing a coherent interface for interacting with measuring points. The software bundle includes a default measuring point (which can be extended) and an API that gives a Java developer access to, and control over, the entire pool of measuring points.

With the above concepts for tackling these difficulties, we have launched DipZoom (for Deep Internet Performance Zoom), a measurement platform that supports focused (or zoomed-in) and on-demand measurements. An experimenter can query the system for the measuring points corresponding to his or her specific needs and can request measurements from the selected measuring points. The detailed architecture and performance of our DipZoom system are described in Chapter 3. Our measurement software suite comprises a client GUI application, a client library, and a measuring point agent supporting MS Windows, Mac OS, and Linux systems. This interoperability not only facilitates adoption by users but also makes DipZoom a heterogeneous ecosystem. In addition, we have utilized DipZoom in various studies on content delivery networks.

With its peer-to-peer concept, DipZoom essentially outsources the measurements to the crowd. This strategy allows the platform to scale easily without excessive investment in measuring point resources. To take full advantage of this strategy, DipZoom needs the following components: 1) a scalable core, which is a centralized system that provides a matchmaking service between measurement requesters and providers; and 2) an effortless way for new participants to join DipZoom and become measurement providers, i.e., operate measuring points – vantage points from which others can collect measurements. We evaluate the scalability of the core in Section 3.5. Any client running Mac OS, Windows, or Linux can join DipZoom simply by installing the DipZoom software, with no configuration needed.

The DipZoom ecosystem not only offers a scalable platform and the capability to achieve diversity, but also facilitates measurement staging and data collection. DipZoom's library and API provide experimenters with a coherent view of the resources on the entire platform and allow access to them. A user can simply write a Java program on their local machine to request certain measurements from a selected set of measuring points. DipZoom then delivers those requests to the targeted measuring points, collects the measurement results, and returns them to the user's machine. This allows a user to conduct complex, multi-step studies in which the input of the current measurement is determined by the result of the previous one. A number of peer-reviewed studies [83, 90, 85] have been conducted using DipZoom.

In its early stage, DipZoom was used in-house to conduct various network measurements and to test new systems and protocols. By 2007, DipZoom had become mature enough to be opened to public participants. As an example of DipZoom in real-world research, we use it to conduct a large-scale measurement study of the performance implications of two major CDN architectures – those adopted by the Akamai and Limelight CDNs. Since most of the studies in this thesis revolve around CDNs, we give background on what a CDN is and how it works in Section 1.2.

1.2 Content Delivery Network (CDN)

A content delivery network (CDN) is a crucial component of today's Internet infrastructure. It is a shared infrastructure comprising a large number of so-called edge servers that are deployed across the Internet for efficient delivery of third-party Web content to Internet users. By sharing its vast resources among a large number of diverse customer Web sites, a CDN derives an economy of scale: because different sites experience demand peaks ("flash crowds") at different times, the same slack capacity can be used to absorb unexpected demand for multiple sites.

Most CDNs utilize the domain name system (DNS) [55] to redirect user requests from the origin Web sites hosting the content to the so-called edge servers operated by the CDN. The basic mechanism is illustrated in Figure 1.1 and involves the following steps.

Figure 1.1: Content delivery network

When a client wants to fetch content (e.g., an image) from firm-x.com, the client first sends a query to its local DNS to obtain the IP address of images.firm-x.com (step 1 in the figure). If the content provider firm-x.com wants to accelerate delivery of content from images.firm-x.com through a CDN, the provider configures its DNS server to respond to queries for images.firm-x.com not with the IP address of its own server but with a so-called canonical name, e.g., "images.firm-x.com.cdn.com" (step 2). The user's local DNS now has to resolve the canonical name, with a query that arrives at the DNS server responsible for the cdn.com domain (step 3). This DNS server is operated by the CDN; it can therefore select an appropriate edge server for this client and respond to the query with the selected server's IP address. Note that the content provider can selectively outsource some content delivery to a CDN while retaining responsibility for the remaining content. For example, the content provider can outsource all URLs with hostname "images.firm-x.com" as described above while delivering content with URL hostname "www.firm-x.com" from its own origin site directly. Upon receiving the IP address from the CDN's DNS, the local DNS forwards it to the client (step 4). The client then sends a request for the content directly to the selected server's IP address (step 5). When an edge server receives an HTTP request from a client, it fetches the indicated object from the origin site, unless it already has the object in its cache, and forwards the object to the client. The edge server also caches the object and satisfies subsequent requests for this object locally, without contacting the origin site. It is through caching that a CDN protects the origin Web site from excessive load, in particular from application-level DoS attacks. By serving the requested object from an edge server near the client, a CDN can also improve overall content delivery performance compared to delivery from the origin site. In Section 4.4.1, we discuss whether the Akamai CDN actually enhances the performance of its customers' origin sites.
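To make the DNS-based redirection concrete, the following minimal sketch inspects the DNS answer for an outsourced hostname and prints the CNAME and the address it ultimately resolves to. It uses the JDK's JNDI DNS provider; the hostname images.firm-x.com is the illustrative name from the example above, not a real customer domain, and a real client would simply let its resolver follow the chain.

    import java.util.Hashtable;
    import javax.naming.directory.Attribute;
    import javax.naming.directory.Attributes;
    import javax.naming.directory.InitialDirContext;

    public class CdnRedirectionProbe {
        public static void main(String[] args) throws Exception {
            // Use the JDK's JNDI DNS provider to look at the raw DNS answer.
            Hashtable<String, String> env = new Hashtable<>();
            env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
            InitialDirContext dns = new InitialDirContext(env);

            // Steps 1-2: the content provider's DNS answers the query for the
            // outsourced hostname with a canonical name in the CDN's domain.
            String hostname = "images.firm-x.com";   // placeholder from the example above
            Attributes answer = dns.getAttributes(hostname, new String[] {"CNAME", "A"});

            Attribute cname = answer.get("CNAME");
            if (cname != null) {
                // Step 3: the canonical name is resolved by the CDN-operated DNS,
                // which selects an edge server for this client.
                System.out.println(hostname + " -> " + cname.get());
            }
            Attribute a = answer.get("A");
            if (a != null) {
                // Step 4: the address ultimately returned is that of the selected edge server.
                System.out.println("edge server address: " + a.get());
            }
        }
    }

Running such a probe from different vantage points typically yields different edge server addresses, which is precisely the behavior the measurement studies in the following chapters rely on.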

1.2.1 Akamai and Limelight

Akamai [8] and Limelight [50] are two leading CDN providers representing two basic approaches to content delivery. Akamai attempts to increase the likelihood of finding a nearby edge server for most clients and thus deploys its servers in a large number of network locations. Its platform comprises over 100,000 servers deployed in over 75 countries [8], and Akamai claims to be delivering 20% of all Web traffic [9]. Limelight concentrates its resources in fewer "massively provisioned" data centers (around 18 according to its map) and connects each data center to a large number of access networks. In this way, it claims direct connectivity to 589 last-mile access networks around the world [4]. The two companies also differ in their approach to DNS scalability, with Akamai utilizing a multi-level distributed DNS system and Limelight employing a flat collection of DNS servers with IP anycast [60] to distribute load among them. Most importantly, both companies employ vast numbers of edge servers, which, as we will see, can be recruited to amplify a denial of service attack on behalf of a malicious host.

1.2.2 Coral

Besides Akamai and Limelight, we utilize Coral CDN [34, 22] in this thesis. Coral CDN is a free content distribution network deployed largely on the PlanetLab nodes.

It allows any Web site to utilize its services by simply appending the string ".nyud.net" to the hostname of objects' URLs. Coral servers use a peer-to-peer approach to share their cached objects with each other. Thus, Coral will process a request without contacting the origin site if a cached copy of the requested object exists anywhere within its platform. Coral currently has around 260 servers world-wide [22]. In Chapter 5, we demonstrate that a particular vulnerability in the Coral platform can be exploited to launch a DoS attack against servers outsourcing their content through the Coral CDN.
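As a concrete illustration of this URL rewriting, the short sketch below "Coralizes" a URL by appending ".nyud.net" to its host name; the sample URL is a placeholder, and the sketch ignores any port conventions Coral may use.

    import java.net.URI;
    import java.net.URISyntaxException;

    public class Coralize {
        // Rewrite a URL so that the object is fetched through Coral CDN by
        // appending ".nyud.net" to its host name.
        static URI coralize(URI original) throws URISyntaxException {
            return new URI(original.getScheme(), original.getUserInfo(),
                           original.getHost() + ".nyud.net", original.getPort(),
                           original.getPath(), original.getQuery(), original.getFragment());
        }

        public static void main(String[] args) throws URISyntaxException {
            URI u = new URI("http://www.example.com/images/logo.png");   // placeholder URL
            System.out.println(coralize(u));
            // prints: http://www.example.com.nyud.net/images/logo.png
        }
    }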

1.3 Outline

This thesis is mainly concerned with the performance and measurement of networks. In brief, we develop a new measurement tool and use it heavily in a series of CDN studies.

First, we develop a new P2P measurement platform called DipZoom using a matchmaking approach. A working DipZoom prototype has been implemented and publicly available since 2007. Chapter 3 covers the design and operational details of DipZoom. This tool greatly facilitates our studies of CDNs in different aspects.

We first study the delivery performance implications of two major CDN approaches – network-core and co-location. While the co-location approach tries to create a presence at the edge of networks in as many locations as possible (e.g. Akamai, Digital Island), the network-core approach deploys fewer but larger data centers (e.g. Limelight, AT&T). Our question is whether the network-core approach can offer the performance delivered by the co-location approach. This is not just an abstract problem, because major CDN players pursue each of the two approaches; the answer therefore contributes directly to the actual CDN business and its customers. We find that a CDN can vary the number of its cache server locations over a wide range without noticeable degradation in overall delivery performance. Chapter 4 discusses this study in detail.

During the above study, we discovered a major security flaw shared across different commercial and experimental CDNs. We find that one can utilize CDN resources to launch an amplified DoS attack against the origin servers of the CDN's customers. In Chapter 5, we demonstrate that, using our attack even with conservative amplification, a malicious node with a dial-up link can deny a Web site with a 5 Mbps link the ability to serve legitimate user requests. We also discuss both short-term and long-term solutions to this problem.

Stemming from our years of experience with CDNs, we have insight into aspects of today's CDNs that can be improved. Therefore, we propose a simple but effective approach to improve content delivery performance from the client perspective without any need for CDN platform modification. In more detail, we reduce the DNS query and TCP handshake overheads, and we lower the chance that a TCP session undergoes TCP slow-start and congestion window timeout events. To demonstrate how much impact our approach can make in practice, we replay the HTTP requests for Akamai content, collected from actual users in a large campus network, against our and the current content delivery approaches. As envisioned, our results show a good improvement over content delivery in the current network configuration.

Hence, we believe the methodologies developed for our CDN study and the conclusions drawn from our experimental results will be useful to other researchers in the CDN arena.

1.4 Acknowledgement

This material is based upon work supported by the U.S. National Science Foundation under Grants No. CNS-0520105 and CNS-0721890.

Chapter 2

Related Work

2.1 On-demand Network Measurements

Network measurement is vital for the advancement of next-generation network systems. The best way to design future systems is to learn from the past and understand the present. Today, the world is interconnected, and measurement at a large or global scale becomes ever more essential. One of the key challenges is to have vantage points across a large number of networks or geographic locations. However, it is not easy for a single entity to have enough resources to cover a large-scale network such as the Internet. Therefore, several existing systems leverage Internet users to fulfill their goals. Resource requirements vary by the type of study and measurement. Seti@home [75] focuses on crowdsourcing CPU cycles from users for large-scale scientific computation. The DIMES and Lip6 projects [24, 82] harvest users' network resources to conduct particular measurements of their interest. While the majority of existing projects crowdsource users' resources through software agents, some projects such as RIPE Atlas [6] coordinate specific measurements through their give-away USB sticks.

DipZoom is a peer-to-peer measurement platform that offers on-demand measurements initiated and organized by any participant. Commercial performance monitoring services such as Gomez [65] and Keynote [3] also offer on-demand measurements through their networks of measuring points. However, they are closed systems that select measuring points, measurement types, and measurement targets based on their customers' requests. DipZoom, on the other hand, is an open system that functions as a matchmaking service and a coordinator between experimenters and measurement providers. In fact, it gives experimenters full access to the available resources – the measurement types offered on active measuring points – and allows the experimenters to execute scriptable complex or longitudinal experiments from their personal computers.

Several existing research platforms support on-demand measurement. Examples include NIMI [62], Scriptroute [77], PlanetLab [17], and commercial systems like Keynote [3]. DipZoom has been designed to be easy for everyone, including a non-technical person, to install and run. Therefore, it can potentially offer a richer set of measurement types and greater diversity of measuring points. Measurement requests are conveniently submitted either programmatically through the API or graphically through our GUI application.

There exists a wide range of measurement tools, including per-hop bandwidth [39], available/bottleneck bandwidth [40, 47, 15], TCP bandwidth [53, 42], latency [56, 7], packet loss [81, 71], and Web page download [92, 23]. DipZoom covers a fair portion of these measurements and currently includes wget, curl, ping, traceroute, tcptraceroute, tracepath, host, iplookup, dig, and nslookup as part of DipZoom's standard installation package. We engineered the measuring point (MP) to support a "measurement plug-in" mechanism, allowing an MP to incorporate arbitrary new measurement tools. However, the MP requires consent from the DipZoom core before a plug-in measurement can be seen and used by DipZoom clients. At this point, we have a prototype of the automatic process that allows MPs to submit consent requests to the DipZoom core; cf. Section 3.2.1.3. A full deployment of this aspect into the production system is left for future work.

2.2 Content Delivery Network (CDN) Research

2.2.1 Performance Assessment

A number of CDNs offer acceleration services today [72, 8, 50, 38, 67]. The performance study of a CDN in Chapter 4 could be useful to them as they decide on their infrastructure investment priorities. Several papers have studied the performance of CDNs. Krishnamurthy et al. compared the performance of a large number of CDNs in existence at the time of their study (in 1999) [45]. That study provided the first indication that size may not translate directly into performance. We study this issue directly by focusing on a single CDN and analyzing the performance implications of consolidating its data centers.

Previously, Johnson et al. [41] and Krishnamurthy et al. [45] considered the quality of CDN server selection, albeit at a limited scale; the former study used only three client hosts and the latter "approximately two dozen". In contrast, the DipZoom platform allows anyone to utilize hundreds of client hosts and discover many more edge servers. We therefore revisit this question on a larger scale. Canali et al. [13] studied the user-perceived performance of CDN delivery, focusing on longitudinal aspects, by monitoring CDN delivery performance from three locations for two years. Xu et al. [93] used a dataset collected from smartphones and server logs to conduct a comprehensive characterization of four major cellular data networks in the USA. They concluded that the routing of cellular data traffic is quite restricted compared to wired Internet traffic. Following this, they discussed how this finding could impact mobile content placement and content server selection. We, on the other hand, concentrate on regular Web traffic delivery by Akamai, which is engineered differently from streaming.

Biliris et al. studied the performance implications of accelerating the same content through different CDNs for different users [12], showing that no single CDN provides adequate coverage for all Internet users. Coupled with our current result that Akamai would not suffer performance degradation from consolidating its data centers, this suggests (if Biliris et al.'s findings still hold today) that CDNs may be able to optimize their facility locations. Su et al. [79] investigated the possibility of leveraging Akamai's server selection for finding high-quality overlay routes on the Internet. In the process, the authors considered various performance aspects of Akamai and observed that Akamai's server selection reflects the current network conditions of the paths between a client and the edge servers. This and several other studies require the discovery of Akamai's servers. Our observations and guidelines in this regard should prove useful for future investigations of this kind.

2.2.2 Security

Most prior work considering security issues in CDNs has focused on the vulnerability and self-protection of the CDN infrastructure and on the level of protection it affords to its customers [88, 78, 48, 43]. In particular, Wang et al. considered the protection of edge servers against break-ins [88], and Su and Kuzmanovic discovered vulnerabilities in Akamai's streaming infrastructure [78]. The vulnerability that we have uncovered in Chapter 5 can be exploited to mount an application-level DoS attack against not only the CDN but also its customer Web sites.

Lee et al. proposed a mechanism to improve the resiliency of edge servers to SYN floods, which in particular prevents a client from sending requests to unintended edge servers [48]. Thus, it would in principle offer some mitigation against our attack (at least in terms of detection avoidance), as a result of disallowing the attacking host to connect to more than one edge server. Unfortunately, this mechanism requires the CDN to know the client's IP address when it selects the edge server; this information is not available in DNS-level redirection.

Jung et al. investigated the degree of a CDN's protection of a Web site against a flash crowd and found that cache misses from a large number of edge servers at the onset of the flash event can overload the origin site [43]. Their solution – dynamic formation of caching hierarchies – will not help against our attack, since our attack penetrates caching. Andersen [11] mentioned the possibility of a DoS attack that includes the amplification aspect but is otherwise the same as the flash crowds considered in [43] (since repeated requests do not penetrate CDN caches). Thus, the solution from [43] applies to that attack as well. Again, our attack is immune to this solution due to its ability to penetrate CDN caches.

The attacker-laundering capability of our attack is similar to that of reflector attacks [61]. However, while reflector attacks typically co-opt third-party hosts to attack an unrelated victim, in our case the CDN is recruited to attack its own customer. The amplification aspect of our attack takes advantage of the fact that HTTP responses are much larger than requests. A similar property of the DNS protocol has been exploited for DNS-based amplification attacks [87, 73]. Some of the measures we suggest as mitigation, namely abort forwarding and connection throttling, have previously been suggested in the context of improving the benefits of forward Web proxies [32]. We show that these techniques can be useful for the edge servers in content distribution networks as well.

2.2.3 Performance Improvement

Since today's Internet relies heavily on content delivery networks to deliver static, dynamic, and streaming content to end-users worldwide, improving the delivery performance of CDNs would benefit Internet users as a whole. Our work in Chapter 6 focuses on improving user-perceived delivery performance. A number of studies [41, 45, 13, 80, 79] have tried to understand and/or suggest possible improvements to CDNs, especially in performance-related aspects. Poese et al. [64] proposed a solution to optimize server selection using distance information available only to the ISP. With this help, CDN providers are more likely to select servers that offer higher content delivery performance for each particular client. This work, like ours, suggests that CDNs take local factors into account when it comes to performance improvement.

Several approaches have utilized social networks to improve the performance of CDNs. Scellato et al. [74] showed how geographic information extracted from social cascades of YouTube videos can be utilized to improve the caching of multimedia files in a CDN. They used Twitter to track social cascades. Ruhela et al. [70, 69] studied the temporal growth and popularity decay of topics on Twitter, and used these observations to derive heuristics that could lead to CDN performance improvement.

Our approach tries to increase the efficiency of content delivery by reducing the TCP handshake and DNS resolution overheads and by avoiding sub-optimal situations such as slow-start periods. In turn, it yields better delivery performance. We achieve this by essentially "pinning" all downloads from the CDN for a given client site to the same edge server, regardless of which Web site the clients are accessing. The methodology and benefits are discussed in Chapter 6.

Another technique to reduce DNS resolution overhead is DNS prefetching. The implications of DNS prefetching have been studied [46, 16, 76, 20], and most major Web browsers today offer DNS prefetching as a standard feature to improve the Web browsing experience. Our approach differs from DNS prefetching in that we do not perform DNS resolution in anticipation of future use and therefore do not run the risk of unnecessary resolutions. SPDY [2] proposed a collection of measures to improve latency at the transport layer; it tries to increase the utilization of a TCP connection to make today's Web browsing faster. While SPDY and our approach share the idea of increasing TCP connection utilization, no changes to the protocol stack are required to realize the benefit of our approach. An additional implication of the higher utilization is a higher chance of avoiding sub-optimal situations such as slow-start periods and TCP congestion window timeouts.

Chapter 3

DipZoom: Peer-to-Peer Internet Measurement Platform

3.1 System Overview

The DipZoom platform comprises three components: measuring points (MPs), clients, and the core. A measuring point is a host that runs an MP daemon and offers a certain amount of its resources for measurements requested by DipZoom clients. A client is an experimenter who requests measurements from a set of MPs through the DipZoom API. The core matches a request from a client with the offers from MPs and supervises the delivery of measurement results. System security is a joint responsibility of all components.

The platform operation is briefly described as follows. Once the core is online, MPs verify themselves with the core and advertise to it their configurations – offered measurements, resource consumption constraints, and their profiles. The core maintains a database of currently active MPs and makes it accessible to clients. Clients query the database for MPs that meet their requirements (geographical locations, network connection types, AS numbers, etc.1) and submit measurement requests to the selected MPs via the core. The core performs security and integrity checks on requests and passes legitimate requests to the target MPs. Upon completion, MPs pass the measurement results back to clients through the core.

1 All this information is derived from the MP profile; see Section 3.2 for details.

Following the peer-to-peer paradigm, where one can utilize the platform's resources only when one is willing to offer one's own resources to the system, all DipZoom clients need to contribute as MPs as well. Although the DipZoom MP and client libraries are separate pieces of software, they are bundled together in a DipZoom package to ease the download and installation tasks. The MP runs as a daemon and optionally starts automatically each time the host machine boots. The client library checks that an MP is running locally before it allows any communication with the core.

Note that while DipZoom exhibits peer-to-peer behavior in terms of requiring peers to barter their services to each other, all communication between the peers is tunneled through the core. In this latter aspect, DipZoom is unlike most traditional peer-to-peer systems, which allow peers to communicate directly. The continued presence of the core during peer interactions enables the enforcement of security protections for MPs and measurement targets.

During a DipZoom package download, the system dynamically generates an MP software instance with a unique global ID (MPID) and a unique secret key. All interactions between the MP instance and the core are encrypted using this key. During the installation, users can adjust the following default resource constraints – the minimum inter-measurement time interval, the per-measurement bandwidth consumption, and the number of outstanding measurement requests allowed.

The client side includes a client library for Java and a graphical front-end application. The client library provides access to the well-defined DipZoom platform API. Through this API, experimenters can develop Java applications for their experiments, ranging from a simple measurement to a complex multi-step one. For instance, an application can query the core for the MPs suiting specific requirements (choosing from location, autonomous system number, bandwidth, operating system, supported measurements, and measurement capacity as attributes) and then select the MPs to which to submit measurement requests. After receiving and analyzing the results, the application can decide on the next step of the experiment based on the current measurement outcomes. Basically, experimenters have all the flexibility of a Java program and the ability to control a global measurement platform at single-measurement granularity. The graphical front-end is just an example of such a Java program built on top of the DipZoom client library. It instantly allows users to interact with the system in an explorative mode: discover active MPs, request measurements from selected MPs, analyze the results, and decide on a next step.

While some existing measurement platforms, PlanetLab [17] and NIMI [62], also offer programmatic control of global measurements, DipZoom has the following advantages:

• It simplifies experiment scripting by removing the need to explicitly communicate with each MP. In particular, NIMI requires the experimenter's client machine to get on the access control lists (ACLs) of every MP before sending them any requests. While this process is facilitated by the ability of a NIMI node's owner to delegate ACL management, this delegation does not fundamentally change the fragmented nature of the platform. Instead of negotiating with individual node owners, the experimenter has to negotiate with each delegate. Thus, even if a delegate aggregates a few nodes, the mode of operation does not fundamentally change, especially in a large-scale system as envisioned by DipZoom's design.

• It removes the requirement that an experimenter know in advance the list of MPs. Rather, the experimenter can have his or her measurement application submit queries to the core for the dynamically changing set of MPs and specify query parameters according to the measurement's needs. Again, in NIMI, the experimenter can query a delegate for the MPs under the delegate's umbrella, but in a large-scale system this is still cumbersome because one would have to discover a large number of delegates and query them individually.

• Having a centralized core, the system can be made more robust to nefarious behaviour of the players. For example, DipZoom effectively guards against a denial of service attack on a measurement target (e.g., when an over-zealous experimenter asks a large number of NIMI nodes to measure the performance of a page download from a Web site such as cnn.com). Similarly, the core can enforce a limit on the load placed on any individual MP by the totality of all clients. In distributed platforms, this task falls on individual MPs, and some load limits can be difficult to enforce from that vantage point (e.g., preventing incoming measurement requests from overloading the MP's network).

At the same time, DipZoom relies on the core being always connected, and the experimenter can only work with those MPs that are able to communicate with the core. In some measurements, e.g., a study of routing connectivity, this can lead to measurement bias.

There are situations where experimenters may need to craft their own measurement tools. For example, in a packet-train bandwidth assessment, an experimenter needs to send a train of packets in an unorthodox manner that does not readily fit DipZoom's pre-defined set of measurement tools such as ping or wget. DipZoom's plug-in mechanism allows an experimenter to introduce new measurement tools. We cover the details of this mechanism in Section 3.2.1.3.
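To illustrate the scripting model described above, the sketch below shows the shape of a multi-step experiment (edge server discovery followed by a download comparison, as in the CDN example of Section 1.1). The interfaces at the top are illustrative stand-ins, not the actual DipZoom client API.

    import java.util.List;

    // Illustrative stand-ins for the DipZoom client API; these are NOT the real
    // DipZoom classes, just enough structure to show the shape of an experiment.
    interface Result { String output(); String firstAnswer(); }
    interface MeasuringPoint { String id(); }
    interface DipZoomClient {
        static DipZoomClient connect() { throw new UnsupportedOperationException("stub"); }
        List<MeasuringPoint> findMeasuringPoints(String... criteria);
        Result request(MeasuringPoint mp, String measurement, String target);
    }

    public class CdnSelectionExperiment {
        public static void main(String[] args) {
            DipZoomClient core = DipZoomClient.connect();

            // Query the core for measuring points matching the experiment's needs.
            List<MeasuringPoint> mps = core.findMeasuringPoints(
                    "location=US", "connection=residential", "measurements=nslookup,curl");

            for (MeasuringPoint mp : mps) {
                // Step 1: discover the CDN-selected edge server from this vantage point.
                String edge = core.request(mp, "nslookup", "images.firm-x.com").firstAnswer();

                // Step 2: compare download performance from the selected edge server
                // and from a fixed alternative server (placeholder address).
                Result fromEdge = core.request(mp, "curl", "http://" + edge + "/object");
                Result fromAlt  = core.request(mp, "curl", "http://203.0.113.10/object");
                System.out.println(mp.id() + ": edge=" + fromEdge.output()
                                   + ", alternative=" + fromAlt.output());
            }
        }
    }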

3.2 The DipZoom Measuring Point (MP)

This section discusses details on DipZoom measuring point (MP) operation and its interactions with the core.

Figure 3.1: DipZoom measuring point overview

The DipZoom measuring point (MP) can be any computing host or device with Internet connectivity that is capable of running the measuring point software implemented in Java. Measurement requests from DipZoom clients are forwarded to the target MPs via the core. MPs download requests from the core, perform the requested measurements, and upload the results back to the core. The core then notifies clients to pick up their results, or deletes the results after a certain timeout. Since the MP software runs on a Java virtual machine, it can easily be ported to various platforms. Currently, the MP software works on Mac OS X, Microsoft Windows, and Linux.

The MP software suite, referred to as the MP, consists of two components, an MP-Loader and an MP-Class. The MP-Loader is a Java program that starts and stops the MP software suite. Since it is a small piece of code containing only the crucial functions needed to load the MP-Class, the loader is likely to remain unchanged as the overall MP software moves to later versions. Any new features and functions are introduced in the MP-Class. The MP-Loader contains an embedded ID and secret key and is therefore unique across the DipZoom platform. The MP-Class is a collection of Java classes in a JAR file, and it handles most of the MP operations. All development and security updates are carried out by modifying this MP-Class. While the MP-Loader is unique for each client, the MP-Class is the same across the platform. The reason behind this two-piece software design is to enable self-upgrade of the MP software. The MP-Loader checks with the core for an available upgrade during the authentication step. If a self-upgrade is required, the MP-Loader simply downloads a new version of the MP-Class from the platform's central repository and reloads the new MP-Class into memory.

Figure 3.1 gives a high-level view of how the MP works. To begin with, the MP-Loader is started in a Java virtual machine. For security purposes, the MP-Loader verifies the MP-Class certificate for its authenticity. We use key and certificate information from a keystore to generate digital signatures for dipzoom_mp.jar, which contains both the MP-Loader and the MP-Class. The keystore is a database of our private keys and their associated X.509 certificate chains authenticating the corresponding public keys. Upon successful verification, the MP-Loader loads the class into main memory. Then, the MP loads all the customizable configurations and probes the current MP environment for variables such as the operating system and connectivity type. The DipZoom core uses a GeoIP database [54] to infer the locations of MPs from their IP addresses. To roughly estimate the connectivity type, an MP downloads sample files from our three landmark servers and considers the one with the best average throughput as its perceived connectivity speed. In essence, this procedure attempts to measure the perceived last-mile TCP throughput. We discuss the details of these class-loading steps and configurations in Section 3.2.1.

In order to come online, an MP has to authenticate itself with the core. An MP starts the authentication process by opening a TCP connection to the core and communicating its identity and status to the core. In this process, the core not only verifies the MP's identity but also receives information describing the current MP characteristics, such as bandwidth, IP address, and OS type. Here, the MP also checks with the core whether a newer version of the MP software is available. If so, the MP performs a self-update to get new features and/or security patches. The details of the authentication process and self-update are discussed in Sections 3.2.2 and 3.2.2.2.1, respectively. Once authenticated, the MP is fully operational and the core includes it in the active MP list. From then on, MP operations are handled by two threads, KeepAlive() and Measuring(), running in parallel. The KeepAlive() thread (Section 3.2.3) is responsible for keeping the core informed about the MP's status through periodic heartbeat messages. The Measuring() thread (Section 3.2.4) is responsible for receiving and processing the measurement requests and delivering the results to the core.

Upon termination, an MP sends an offline notification to the core to remove itself from the active MP list. This notification facilitates management of the list. If the MP goes offline abruptly or the notification gets lost, the core will eventually remove the MP from the active list after an absence of MP heartbeats for a certain period of time. The MP sends no offline notification only when it receives a self-shutdown notification from the core; see Section 3.2.4.2 for further details about the self-shutdown message. At this point, we have a brief understanding of how the MP operates; the rest of this section discusses each step in detail.
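A minimal sketch of the connectivity probing described above is shown below: it times the download of a sample file from each landmark server and keeps the best observed throughput as the perceived last-mile speed. The landmark URLs are placeholders, not DipZoom's actual landmark servers.

    import java.io.InputStream;
    import java.net.URL;

    public class ConnectivityProbe {
        // Placeholder landmark URLs; the real MP uses three DipZoom landmark servers.
        static final String[] LANDMARKS = {
            "http://landmark1.example.org/sample.bin",
            "http://landmark2.example.org/sample.bin",
            "http://landmark3.example.org/sample.bin"
        };

        // Download each sample file and report the best observed throughput
        // (bytes per second), taken as the perceived connectivity speed.
        public static void main(String[] args) {
            double best = 0;
            for (String landmark : LANDMARKS) {
                try (InputStream in = new URL(landmark).openStream()) {
                    long start = System.nanoTime();
                    byte[] buf = new byte[8192];
                    long bytes = 0;
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        bytes += n;
                    }
                    double seconds = (System.nanoTime() - start) / 1e9;
                    best = Math.max(best, bytes / seconds);
                } catch (Exception e) {
                    // A failed landmark is simply skipped.
                }
            }
            System.out.printf("perceived connectivity: %.0f bytes/s%n", best);
        }
    }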

Figure 3.2: Measuring thread on measuring point (MP)

3.2.1 MP-Loader, MP-Class, and MP Configurations

Once installed, the MP software comprises two components – the MP-Loader and the MP-Class – and two configuration files – dipzoom_mp.conf and measures.xml. The following describes these components in detail.

3.2.1.1 MP-Loader

The loader is a very small piece of code that does just enough to verify the MP-Class certificate and perform dynamic class loading. Each MP-Loader instance has a unique ID and a 128-bit AES key embedded in it. The main purpose of having a separate MP-Loader and MP-Class is to allow self-upgrade of the MP software suite. The loader has only a few functions and needs no changes down the software development road. Most of the functions needed for MP operation are implemented in the MP-Class and are loaded dynamically at runtime. At runtime, the loader dynamically loads the newest version of the MP-Class available locally. For security purposes, the loader verifies the MP-Class's certificate for authenticity and integrity before loading it into memory. This design not only allows self-upgrade but also yields another benefit: since the loader is so small, we are able to test it extensively and have high confidence in the correctness of the code because of its simplicity.
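The sketch below illustrates the general pattern of such a loader: verify the signed JAR and then load a class from it dynamically. It is a simplified illustration; the class name edu.case.dipzoom.MP and the entry-point convention are assumptions, and a production loader would additionally pin the expected signer certificate rather than merely force signature verification.

    import java.io.File;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.Enumeration;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    public class LoaderSketch {
        // Reading every entry of a signed JAR with verification enabled forces the
        // signature check; a tampered entry raises a SecurityException.
        static void verifySignedJar(String path) throws Exception {
            try (JarFile jar = new JarFile(path, true)) {   // true = verify signatures
                byte[] buf = new byte[8192];
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    try (InputStream in = jar.getInputStream(entry)) {
                        while (in.read(buf) != -1) { /* consume to trigger verification */ }
                    }
                    // A real loader would also check that entry.getCertificates()
                    // chains to the expected DipZoom signing certificate.
                }
            }
        }

        // Load the verified MP-Class JAR dynamically and start it via reflection.
        // The class and method names below are illustrative, not DipZoom's actual ones.
        static void loadAndStart(String path) throws Exception {
            verifySignedJar(path);
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { new File(path).toURI().toURL() },
                    LoaderSketch.class.getClassLoader());
            Class<?> mpClass = Class.forName("edu.case.dipzoom.MP", true, loader);
            mpClass.getMethod("main", String[].class).invoke(null, (Object) new String[0]);
        }
    }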

3.2.1.2 MP-Class

The MP-Class is a collection of Java classes packed into a JAR file. All development and bug fixes of the MP software suite are carried out by modifying this MP-Class. Once a release is ready, the new MP-Class file is signed for security purposes and uploaded to the dipzoom.case.edu website for download. Then, we advance the current MP version variable at the core.

During the authentication step (Section 3.2.2), the MP checks with the core for the current MP version and performs a self-upgrade if its own version is older. The version of the MP code currently running is embedded in the filename of the MP-Class, which has the format dipzoom_mp-version.jar. When there are multiple local MP-Class files, the loader extracts the version information from each file's name and chooses to run the newest one. The reason we still keep the older versions locally is to allow falling back when the new version has a major problem.
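A small sketch of this newest-version selection, assuming the dipzoom_mp-<version>.jar naming with a purely numeric version (the exact version syntax is not specified in the text):

    import java.io.File;

    public class NewestMpClass {
        // Pick the newest MP-Class file; older files are kept for fall-back.
        static File newest(File dir) {
            File best = null;
            int bestVersion = -1;
            File[] candidates = dir.listFiles(
                    (d, name) -> name.startsWith("dipzoom_mp-") && name.endsWith(".jar"));
            if (candidates == null) return null;
            for (File f : candidates) {
                int v = version(f.getName());
                if (v > bestVersion) {
                    bestVersion = v;
                    best = f;
                }
            }
            return best;
        }

        // Extract the version embedded in the file name, e.g. "dipzoom_mp-12.jar" -> 12.
        static int version(String name) {
            String v = name.substring("dipzoom_mp-".length(), name.length() - ".jar".length());
            try {
                return Integer.parseInt(v);
            } catch (NumberFormatException e) {
                return -1;   // unparsable versions lose to any valid one
            }
        }
    }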

3.2.1.3 MP Configurations and Custom Measurement Plug-In

MP configurations are in XML format and are stored in two files, dipzoom_mp.conf and measures.xml. The owner of the MP can modify dipzoom_mp.conf to enable or disable a certain measurement type and also to fine-tune how many resources are allowed for each measurement type through three parameters: mx_interval, max_byte, and max_mx. These resource constraint parameters allow the user to set the minimum inter-measurement time interval, the per-measurement data transfer volume, and the number of outstanding measurement requests allowed, respectively.

Figure 3.3: An example of MP configuration file dipzoom_mp.conf in XML format (showing one network interface – its interface name, MAC address, and IP address – and the per-measurement resource limits for ping and curl)

Figure 3.3 is an example of the MP configuration in XML format from the dipzoom_mp.conf file. This example shows a single-interface configuration, but an MP can have multiple interfaces, each described by a pair of enclosing interface tags. The first elements describe the interface with its interface name (e.g. eth0 in Linux), the MAC address of the NIC, and the IP address assigned to this interface at runtime. An MP automatically updates the list of interfaces in the dipzoom_mp.conf file when it detects changes in the system's interfaces. For example, when a user reconnects his or her notebook to the home network in the evening, its wireless interface gets a different IP address, and the MP then automatically updates the configuration file. The pair of IP address and MAC address allows the core to distinguish between two nodes behind the same NAT box that run the same MP with the same ID.

Each enclosing measurement block describes a certain measurement type. A name element indicates the measurement tool's name. The mx_interval parameter enforces a minimum inter-measurement time (in milliseconds) between two consecutive measurements. By increasing this parameter, the MP processes measurement requests less frequently and therefore reduces its resource consumption. The max_mx parameter indicates the maximum number of outstanding measurement requests allowed for this particular MP. The core will not assign more requests to this MP than indicated by this parameter until some previously assigned requests have been completed. The max_byte parameter indicates the maximum byte transfer allowed per measurement of this type. The MP will terminate an ongoing measurement once it has received or sent this number of bytes over the network. For each measurement, an MP starts a monitoring thread in parallel with the measurement thread to monitor the data transfer volume of the measurement thread and to terminate the measurement thread if the transfer volume exceeds the limit specified by the MP. Currently, the monitoring thread enforces limits only for curl and wget, and it works as follows. When curl and wget start, they save the ongoing download into a temporary file, which is deleted once the measurement is done or terminated, and the monitoring thread sends a stop signal to the measurement thread if the file grows larger than the limit. In brief, this configuration file allows the user to set resource consumption limits through tunable parameters for each particular measurement offered by this MP.

The MP software also has an option that allows MP owners to add their own custom measurement tools through a measurement plug-in. To enable this option, a user simply modifies the measures.xml file to add the custom measurement tools. However, consent from the core is required before the MP can advertise this measurement to the platform and accept requests from clients.

To help improve the user experience, we have designed and developed a proof-of-concept plug-in management system. Figures 3.4, 3.5, 3.6, 3.7, 3.8, and 3.9 are screenshots of the system. In order to manage a registered measurement plug-in, a user first needs to log in to the system. Figure 3.4 shows the login screen. As detailed below, once the password is verified, the user is able to a) view the plug-in details, b) generate an MP configuration for the plug-in, and c) modify the plug-in preference2. A new measurement plug-in can be registered via a sign-up page (see Figure 3.5). Figure 3.6 shows the currently available measurement plug-in list. Once registered for a custom measurement plug-in, the user has to modify his or her MP configuration, measures.xml, accordingly. The system provides a tool that helps generate a proper configuration for each particular registered plug-in. Figure 3.7 shows a screenshot of this tool. The bottom part of the screen advises the user on what to give as inputs to the tool. Once the user has entered all the required information, the user can click "Generate Configuration" and the system will generate the corresponding configuration, as shown in Figure 3.8. In the generated configuration, one element specifies the measurement's name, and another instructs the MP on how to execute this plug-in, that is, which command to invoke.

2 Preference is a profile of the plug-in. The system will incorporate part of the preference into a configuration generated for the plug-in.

28 ... is generated uniquely for each plug-in by the DipZoom core. The core uses this code to authenticate the plug-in during the MP login (c.f. section 3.2.2). Also, when the plug-in owner wants to allow other MPs to run the plug-in, the owner gives this generated configuration to them as a ticket; A method to automatically deliver the plug-in ticket to MPs is yet to be developed. The owner should also have the right to revoke this ticket (to be implemented in the future). The ... tells the MP how to exe- cute multiple requests of the same measurement; It has two execution modes, loop or parameter. The ”loop” mode is for a tool that does not accept a parameter to repeat measurement; in this mode, an MP executes a requested measurement N times. For example, if a client requests three nslookup from an MP, the MP will run nslookup three times. The ”parameter” mode is for a tool that accept a parameter to repeat measurement; in this mode, an MP passes a number of measurement as a parameter to the tool and execute it only once. For example, if a client requests an MP to ping a target three times, the MP will run ping with parameter ”-c 3”. The core applies the standard regular expression on the input parameters string in the request from the DipZoom’s client to filter out unwanted input parameters or to limit a set of acceptable input parameters. This is a per measurement type restriction and can be configured at the core. For example, ping cannot use a flood option. Additional regular expressions for restriction on the target and input parameters can be added by an MP via ”” and ”” respectively. The user is also allowed to modify their plug-in preference as shown in figure 3.9. The prototype implementing the measurement plug-ins is at the moment associ- ating with the backup copy of the DipZoom database. To enable this system, one would need to a) point the system to the operating database b) modify the core to be aware of the plug-in’s authentication code c) enhance the security of the system especially in a sign-up process for the author of a new measurement tool attempting

to make it available (e.g., implementing an email confirmation feature to increase accountability). We leave this as well as other improvements to the system as future work. Currently, the plug-in registration and its source code examination are done manually by DipZoom staff. Consequently, due to lack of manpower, the plug-in system is disabled in the operational platform.
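As an illustration of the two execution modes described above ("loop" and "parameter"), the following minimal Java sketch shows how an MP could invoke an external tool for a request of N repetitions. The class and method names, and the way the repeat flag is passed, are hypothetical and are not taken from the actual MP source code.

    // Hypothetical sketch: running a plug-in tool under the two execution modes.
    // "loop" mode re-invokes the tool N times; "parameter" mode passes N to the tool once.
    import java.io.IOException;

    class PluginRunner {
        // e.g., command = "nslookup", mode = "loop"                      -> run the tool N times
        //       command = "ping", mode = "parameter", repeatFlag = "-c"  -> run once with "-c N"
        static void run(String command, String mode, String repeatFlag,
                        String target, int n) throws IOException, InterruptedException {
            if (mode.equals("parameter")) {
                exec(command, repeatFlag, Integer.toString(n), target);  // single invocation
            } else {                                                     // "loop" mode
                for (int i = 0; i < n; i++) {
                    exec(command, target);
                }
            }
        }

        static void exec(String... args) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(args).inheritIO().start();
            p.waitFor();
        }
    }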

Figure 3.4: Custom measurement plug-in management system: login screen

3.2.2 Authentication

In the authentication process (or MP login process), an MP notifies the core of its online status and updates the core with its current environment variables. Once the core verifies that the MP is genuine, the MP is made visible to all DipZoom clients and measurement requests to this MP are allowed. If the login fails, the MP sleeps for a randomly selected time within the 5-10 second interval (for reasons explained below) and repeats the login process. If the login fails more than 300 times consecutively, the MP sleeps for 10 minutes and starts the login process again. The reason the MP keeps repeating the login until success is to make the MP robust against intermittent interruptions such as the network connection being down or the DipZoom core being unresponsive. Normally, a machine running as a server might restart only a few times a year at most. Even on a notebook computer, a user could keep using a sleep mode instead of shutdown, and months may pass between restarts. Since the MP only starts when the computer boots up, we do not want the MP to terminate because of a temporary network or service outage and then wait for a month or so to be back online again. Therefore, the MP keeps trying to log in and backs off after a while before trying again. The random inter-retrial time helps spread re-logins of multiple MPs over a period of time. For example, when a network with multiple MPs goes down and later comes back online, the random inter-retrial time protects the core from a burst of simultaneous logins from all these MPs. From our experience, a randomly selected time within the 5-10 second interval is good enough to absorb this burst of logins at the core side. However, this is a tunable parameter that we can fine-tune as needed.

Figure 3.5: Custom measurement plug-in management system: new measurement sign-up
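The retry-with-backoff policy just described can be summarized by the following minimal sketch. The method names are hypothetical placeholders; only the delay values (5-10 seconds, 10 minutes after 300 consecutive failures) come from the text above.

    // Hypothetical sketch of the MP login retry policy: random 5-10 s backoff between
    // attempts, and a 10-minute pause after 300 consecutive failures.
    import java.util.Random;

    class LoginRetry {
        static void loginUntilSuccess() throws InterruptedException {
            Random rnd = new Random();
            int consecutiveFailures = 0;
            while (!tryLogin()) {                            // tryLogin() is a placeholder
                consecutiveFailures++;
                if (consecutiveFailures > 300) {
                    Thread.sleep(10 * 60 * 1000L);           // back off for 10 minutes
                    consecutiveFailures = 0;
                } else {
                    Thread.sleep(5000 + rnd.nextInt(5000));  // 5-10 seconds, randomized
                }
            }
        }

        static boolean tryLogin() { return false; }          // placeholder for the real login exchange
    }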

Figure 3.6: Custom measurement plug-in management system: available measurement list

The authentication process can be described in three phases – verification, information update, and finalization – as follows.

3.2.2.1 Phase I (Verification)

To begin a login process, an MP opens a TCP connection (destination port 2626) to the core and sends a message "MP ID;<MP ID, FLAG, Version, Rand#1>", referred to as a HELLO message. The MP ID in the first field is in plain text and is used by the core to identify to which MP this message belongs. The string enclosed in <...> is a cipher-text encrypted with the AES 128-bit algorithm. Only the core and the corresponding MP can decrypt this message, using a symmetric key embedded in the MP-Loader. The FLAG is set to ONLINE NOTIFY in this message to inform the core that this MP is coming online; the Version is the current MP software version; the Rand#1 is a random number that is included for security purposes. Without this random number, the login cipher-texts from different MPs would differ only in the first field (MP ID), which also appears in plain text. This could make the protocol prone to a decipher attempt by a malicious player. Once the message arrives, the core retrieves the corresponding cipher KEY for this MP ID from the database and uses the key to decrypt the cipher-text inside <...>. If the KEY lookup for this ID fails, the ID is invalid; the core replies with an AUTH FAIL message and terminates the authentication process. If the lookup succeeds, the core compares the MP ID in the first field, which is in plain text, with the MP ID decrypted from the cipher-text. If the two IDs match, the message is legitimate and can be trusted. The core creates an object for this MP in the main memory and uses this object to store the current MP status and environment. The core replies to the MP with a message "<FLAG, UpdateString, Rand#1, Rand#2>". The FLAG is set to VALID MP and UpdateString contains the current released version of the MP software and the URL of this released MP-Class. The core also includes the Rand#1 with this message and adds the newly generated Rand#2 number. The principle is that each end sends a random number and echoes the most recently received random number to prevent replay. Note that, from this point on, all the messages from both sides will be completely encrypted. Unlike the first HELLO message that included the MP ID in the clear to allow the core to identify the right key, these subsequent messages encrypt even the MP ID because the core will now use the same key for all messages arriving on the current TCP connection.

Figure 3.7: Custom measurement plug-in management system: MP configuration tool
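To make the HELLO construction concrete, the sketch below builds the plain-text prefix and the encrypted field with a fresh nonce. The text above only specifies AES with a 128-bit key embedded in the MP-Loader; the field delimiters, padding mode, Base64 encoding, and the ONLINE_NOTIFY flag spelling used here are assumptions for illustration.

    // Hypothetical sketch of building a HELLO message: plain-text MP ID, followed by an
    // AES-encrypted field containing the MP ID, flag, version, and nonce Rand#1.
    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;
    import java.util.Base64;

    class HelloMessage {
        static String build(String mpId, String version, byte[] key128) throws Exception {
            long rand1 = new SecureRandom().nextLong();           // nonce to prevent replay
            String payload = mpId + "," + "ONLINE_NOTIFY" + "," + version + "," + rand1;

            Cipher aes = Cipher.getInstance("AES");               // AES with a 128-bit key
            aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key128, "AES"));
            byte[] cipherText = aes.doFinal(payload.getBytes(StandardCharsets.UTF_8));

            // the plain-text MP ID lets the core pick the right key before decrypting
            return mpId + ";<" + Base64.getEncoder().encodeToString(cipherText) + ">";
        }
    }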

Figure 3.8: Custom measurement plug-in management system: generated MP configuration

List Of Data In MP INFO:

• MP Identification Number
• Network Interface Name
• MAC Address Of NIC
• IP Address Of Network Interface
• Operating System Type
• List Of Measurement Tools & Parameters3
• Effective Bandwidth4

Table 3.1: Detail of MP INFO field

3.2.2.2 Phase II (Information)

The MP receives the reply from the core and decrypts the message. If the FLAG is not VALID MP, the MP records the reason in the log file and terminates the authentication process. To prevent a replay attack, the MP checks if Rand#1 matches the number used in the HELLO message. The MP checks the UpdateString and starts a self-update process if the released version is higher than its current version. If no self-update is needed, the MP sends a message containing information about its current status and a list of offered measurements to the core. This message (the second to arrive following the HELLO message) is called the INFO message and its format is "<FLAG, MP Info, Rand#2, Rand#3>". As usual, the MP includes Rand#2 and adds Rand#3 to the message to prevent replay as well as other security threats. A possible range of FLAG values is discussed in Section 3.2.4. The MP Info field contains the MP's profile (i.e., MAC address, IP address, and operating system) and a list of offered measurements along with their resource constraints in XML format. The list of offered measurements is customizable by altering the MP's configuration. Table 3.1 shows an example of the MP Info field. The core will update its state for this MP. Therefore, if the user changes the MP configuration, the new configuration will only be in effect after the next MP login. In practice, the user simply restarts the MP after the modification.

Figure 3.9: Custom measurement plug-in management system: preference

3.2.2.2.1 Self-Update Process During the authentication, the MP checks for an available update and starts a self-update process if there exists a new version of the MP. This process starts by downloading the current version of the MP-Class from the URL specified in the UpdateString field of the reply message in the second phase of authentication. Once the download is completed, the MP-Loader verifies that the MP-Class is properly signed. If the verification fails, the MP stops the self-update process and goes on with the authentication process using the current MP-Class. Upon a successful verification, the MP terminates the authentication process and reloads the new MP-Class into memory. Figure 3.1 shows details of this reload, which starts with the MP-Loader unloading the current MP-Class from memory and proceeding to the "LOAD MP-Class" step to dynamically load the new MP-Class along with all configurations into the system memory. Then, the MP restarts the authentication with the core.

3.2.2.3 Phase III (Finalization)

The core receives the INFO message from the MP and decrypts it. The core checks Rand#2 to make sure this is not a replay message. The core then extracts the MP information from the MP INFO field and stores it in the MP object in its main memory. The core determines the autonomous system number and geographical location of this MP using the GeoIP database from MaxMind [54]. To avoid confusion in cases when an MP is behind a network address translation (NAT) device, the core extracts the IP address from the MP INFO field and not from the IP header of the message. This information is also stored in the MP object.

To finalize the authentication process, the core replies with a "<FLAG, SEQ, AdsMsg, Rand#3>" message. The FLAG is set to NONE this time and Rand#3 is included as usual. The SEQ is the current sequence number for the UDP KeepAlive message system. Details of the UDP KeepAlive system will be discussed in the next section (Section 3.2.3). The core uses the AdsMsg text to communicate to the MP owner any DipZoom news, notifications, and so on. The MP receives the message and decrypts it. As usual, the MP checks if Rand#3 is correct. If so, the MP extracts SEQ and uses it as the initial sequence number in its KeepAlive message system. Finally, the MP prints out the AdsMsg on the screen and finishes the authentication process.

3.2.3 Keep Alive

The DipZoom KeepAlive message is a UDP message that serves two purposes – 1) as a heartbeat to keep the core informed about the MP operation status, and 2) to keep a hole in the NAT/Firewall open so that UDP messages from the core can traverse back to the MP when the MP is behind a NAT/Firewall. The KeepAlive messages are sent by a special KeepAlive thread within the MP. The KeepAlive thread keeps the core informed about the MP's status by sending three duplicated UDP messages (with the same sequence number) to the core every 60 seconds. This redundancy helps increase platform resiliency under certain adverse situations such as packet drops due to network congestion. The sequence number is advanced over time and the core always ignores UDP messages with obsolete sequence numbers. To make the core robust against a potentially excessive churn of online MPs, however unlikely (especially since MPs are not under the core's control and could maliciously perform a rapid succession of departures and joins), we implemented the following approach to processing MP departures. Having not received a KeepAlive message from an MP for more than three minutes (or three consecutive unique KeepAlive messages, as an MP sends out the message every minute), the core sets the MP's status to OFFLINE, but still keeps the information of the MP in the main memory until the next MP clean-up cycle. At the end of every cycle, which is 10 minutes by default, the core performs a special garbage collection to free records of inactive MPs in the main memory. There are several situations where an MP goes offline temporarily and comes back online again shortly – e.g., the client is rebooted, or the client's network connection is congested/disconnected. In such cases, without implementing the clean-up cycle, the core would have to spend CPU cycles cleaning up the MP profile and reconstructing the MP's profile in the main memory when it appears online again. With the clean-up cycle, if the inactive MP comes back online again before the next clean-up cycle, the core simply changes the status of the MP back to online. This function helps reduce the workload on the core at the price of a larger memory footprint. When the core receives a KeepAlive message from an MP and finds that this MP's status is not online, the core sends a ReAUTH message over UDP (see Section 3.2.4.3) to force the MP to start the authentication process over again. This situation mostly happens after a period of network connection downtime on the MP side. Once the MP comes back online, re-authentication is done not only for security purposes but also to allow the MP to update the core with its current status and configurations. Note that the outstanding measurement requests and/or results on the MP are saved locally before the re-authentication and the Measuring() thread is resumed after the authentication. Essentially, DipZoom offers best-effort measurement and experimenters have to use measurement results carefully. For instance, if an MP receives a measurement request and then goes offline before executing it, the request is saved locally and will be executed later when the MP comes back online again, which could be the next minute or the next day. In the latter case, the measurement result might no longer be meaningful to the measurement campaign. It is the responsibility of the experimenter to filter out the non-meaningful results.

Besides heartbeat information, periodic KeepAlive messages help maintain a mapping in the NAT/Firewall that may shield the MP from the rest of the Internet. This hole allows UDP traversal from the core to MPs behind NATs/Firewalls. The core uses this UDP channel to alert MPs about incoming requests and to enforce certain activities such as ReAuth, Self-Shutdown, and so forth.
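A minimal sketch of the KeepAlive sender follows: three duplicate datagrams with the same sequence number, once per minute, to the core's UDP port 2626. The core hostname is a placeholder, and the message contents are simplified (the real KeepAlive message is encrypted).

    // Hypothetical sketch of the KeepAlive thread: every 60 seconds, send three
    // duplicate UDP datagrams carrying the current sequence number to the core (port 2626).
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    class KeepAliveSender implements Runnable {
        private final String mpId;
        private long seq;                                   // initialized from SEQ received in Phase III

        KeepAliveSender(String mpId, long initialSeq) { this.mpId = mpId; this.seq = initialSeq; }

        public void run() {
            try (DatagramSocket socket = new DatagramSocket()) {
                InetAddress core = InetAddress.getByName("core.dipzoom.example");  // placeholder hostname
                while (!Thread.currentThread().isInterrupted()) {
                    byte[] msg = (mpId + ":" + seq).getBytes();
                    DatagramPacket pkt = new DatagramPacket(msg, msg.length, core, 2626);
                    for (int i = 0; i < 3; i++) socket.send(pkt);   // redundancy against packet loss
                    seq++;                                          // advance; the core ignores stale numbers
                    Thread.sleep(60_000);
                }
            } catch (Exception e) { /* thread exits; the MP will re-authenticate later */ }
        }
    }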

3.2.4 Measurement

After successful authentication, the MP is able to handle measurement requests and submit measurement results back to the core. These activities are performed by the Measuring() thread. This thread runs continuously until the MP receives a Self-Shutdown message (UDP) from the core or is terminated by the user. We design the DipZoom platform to use UDP messages to notify MPs of incoming requests and to control certain MP behaviors. We choose UDP because of its light weight and small resource footprint. However, a UDP channel doesn't work in every situation (as explained in Section 3.2.4.1) and we use TCP as a fallback channel whenever UDP notifications fail to arrive. Figure 3.2 is a flow-chart describing how this Measuring() function works. To begin with, the CheckUpdateViaTCP timer is reset and the timeout is randomly selected from a range of 300-600 seconds. If no UDP notifications arrive within the timeout interval, the MP will itself open a TCP connection to the core to check for missed updates. Then, the MP listens for any UDP messages 5.
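The receive-with-timeout structure of this loop could look roughly like the sketch below. Decryption, ticket handling, and error recovery are omitted, and the method names are hypothetical; only the 300-600 second randomized fallback window comes from the text above.

    // Hypothetical sketch of the Measuring() loop: wait for a UDP notification, but fall
    // back to polling the core over TCP if nothing arrives within a randomized 300-600 s window.
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.util.Random;

    class MeasuringLoop {
        void run(DatagramSocket udp) throws Exception {
            Random rnd = new Random();
            byte[] buf = new byte[1500];
            while (true) {
                int timeoutMs = (300 + rnd.nextInt(301)) * 1000;   // 300-600 s, randomized
                udp.setSoTimeout(timeoutMs);                        // the CheckUpdateViaTCP timer
                try {
                    DatagramPacket pkt = new DatagramPacket(buf, buf.length);
                    udp.receive(pkt);                               // blocks until a notification arrives
                    handleNotification(pkt);                        // TKAlert, ReAuth, Self-Shutdown, ACK
                } catch (java.net.SocketTimeoutException e) {
                    checkUpdatesViaTcp();                           // UDP unreachable: poll the core over TCP
                }
            }
        }
        void handleNotification(DatagramPacket pkt) { /* decrypt, dispatch on FLAG */ }
        void checkUpdatesViaTcp() { /* open a TCP connection to the core */ }
    }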

The UDP notification message format is "ID:<ID, FLAG, RAND>". ID is the MP identification number. FLAG indicates the UDP message type – TKAlert, ReAuth, Self-Shutdown, or ACK. RAND is a random number added for security purposes. The notation <...> indicates that the information inside is encrypted. Once the MP receives a UDP message, it checks the message integrity by comparing the plain-text ID from the first field with the ID obtained by decrypting the cipher-text. If both IDs are the same, the message is authentic. The MP then checks the type of the message from the FLAG field and handles the four possible message types as follows.

5Table 3.2

3.2.4.1 Acknowledgement (ACK)

The core sends an ACK message (UDP) for every third arrival of a KeepAlive message 6 from the MP. When the MP receives the ACK message (or any UDP message7), it resets the CheckUpdateViaTCP timer. The timeout of this timer varies from 300 to 600 seconds to avoid possible bursts of TCP sessions at the core. We pick this range arbitrarily and it has been working well. The MP uses the timer to determine whether this MP is reachable by the core. When the timer goes off, the MP assumes that the core cannot communicate with this MP via UDP. Therefore, as a fallback, the MP actively opens a TCP connection 8 to the core and checks for any updates and/or outstanding measurement requests. This situation sometimes happens when the MP is behind some variation of NATs/Firewalls.

3.2.4.2 Self-Shutdown

Upon receiving the Self-Shutdown message, the MP just stops all the threads and terminates itself. The core uses this message to force a certain MP to go offline. For example, when two MPs with the same ID attempt to come online, the core enforces the invariant of having unique MPs by forcing the one that went online first to go offline by sending it a Self-Shutdown message. The core uses IP and MAC addresses to distinguish between two instances of MPs with the same ID, and assumes that the most recent online instance is the more relevant one. This prevents one from copying the same MP instance and running it from multiple machines. In fact, knowing the MPs' IP and MAC addresses, the core also prevents one from operating multiple MP instances (with distinct MP IDs) on the same machine. Again, the core assumes that the most recent MP instance is the proper one and sends the Self-Shutdown message to any earlier instances operated by a host with the same IP and MAC addresses, while marking these instances offline.

6Described in Section 3.2.3. 7How the MP reacts to other types of UDP messages is described in Section 3.2.4. 8Table 3.2

3.2.4.3 Re-Authentication Process (ReAuth)

Upon receiving the ReAuth message, the MP stops the KeepAlive() thread and starts the authentication process again. After the successful authentication, the MP restarts the KeepAlive() thread and the Measuring() thread goes back to listening for UDP messages from the core.

3.2.4.4 Ticket Alert (TKAlert)

When a DipZoom client sends a measurement request to the core, the core generates a corresponding ticket for each request. Each ticket contains information describing how to perform the measurement (e.g., the measurement type, measurement target, and other input parameters) and the requester (client) information. Then, the core sends a special UDP message, called a Ticket Alert (TKAlert) message, to the corresponding MP to inform it of the availability of the measurement request ticket. Upon receiving the TKAlert message from the core, the MP opens a TCP connection with destination port 4646 to the core to check for any updates and available outstanding measurement request tickets. The MP processes each ticket and performs the corresponding measurement sequentially. The MP pauses for a certain customizable period of time, described in Section 3.2.1, before it proceeds to the next ticket. Once all the tickets are processed, the MP opens another TCP connection to the core and uploads all the measurement results. Also, over the same TCP connection, the MP downloads new outstanding tickets if available. The MP then processes the downloaded tickets in the same fashion. This process continues until no more outstanding tickets are available. Then, the MP goes back to listening for a UDP message again.

Activity Descriptions                          Listening Ports
Authentication, Re-Authentication              TCP destination port 2626 on the core
Measurement requests retrieval,                TCP destination port 4646 on the core
  measurement results submission
UDP KeepAlive messages                         UDP destination port 2626 on the core
UDP notifications from the core                UDP destination port on an MP is allocated by
                                               the OS on the MP at runtime; the core extracts
                                               the port number from KeepAlive messages

Table 3.2: TCP/UDP destination ports summary

Please note that all the TCP connections concern ticket retrieval, result submission, and other updates. In the absence of measurement requests, all control messages are exchanged over UDP in the common case. We use UDP instead of TCP because of its smaller resource consumption.
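The ticket-processing cycle described above can be summarized by the sketch below. The wire protocol, ticket format, and method names are hypothetical simplifications; only the overall fetch-run-upload-repeat structure and the per-ticket pause come from the text.

    // Hypothetical sketch of ticket processing: fetch outstanding tickets over TCP, run them
    // sequentially with a configurable pause, upload the results, and repeat until none remain.
    import java.util.List;

    class TicketProcessor {
        void processOutstandingTickets(long pauseMillis) throws Exception {
            List<Ticket> tickets = fetchTickets();              // TCP connection to the core
            while (!tickets.isEmpty()) {
                List<Result> results = new java.util.ArrayList<>();
                for (Ticket t : tickets) {
                    results.add(runMeasurement(t));             // execute the requested tool
                    Thread.sleep(pauseMillis);                  // per-ticket pacing (Section 3.2.1)
                }
                tickets = uploadResultsAndFetchMore(results);   // upload over the same TCP connection
            }
        }
        static class Ticket {}
        static class Result {}
        List<Ticket> fetchTickets() { return List.of(); }
        Result runMeasurement(Ticket t) { return new Result(); }
        List<Ticket> uploadResultsAndFetchMore(List<Result> r) { return List.of(); }
    }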

3.3 The DipZoom Client and API

The client side is where researchers request DipZoom to perform their measurements. Therefore, we want to make this experience as simple as it can be and yet flexible enough to accommodate a complex multi-step large-scale measurement. The DipZoom client provides the following functionality.

• Measuring point (MP) and measurement type lookup A client can request a list of currently active MPs that support a certain measurement type and have certain characteristics (based on the applied filters)

• Measurement request A client can send a set of measurement requests to a set of active MPs via the core

• Measurement results retrieval A client is notified when the measurement results are ready and then can retrieve them from the Core

We developed a Java library that allows developers to access platform resources through a simple API. A DipZoom measurement experience would simply be running Java code on one's laptop and then waiting for the platform to perform global measurements, gather all the results, and deliver them back to one's local machine. Further technical and design details on the client operation and API can be found in [89]9 [26].

9The DipZoom is a joint work led by Sipat Triukose and Zhihua Wen. The client API and its corresponding operation within the core is mainly developed by Zhihua Wen.

3.4 Security

DipZoom is by design an open system: any host can join as an MP and/or a client. As in any open system, security becomes an important issue. Table 3.3 lists the main security threats in the DipZoom environment, and our counter-measures.

Security Threat                                  Counter-measure
Induced DoS attack against measurement target    Per-target rate limiting
DoS attack against an MP                         Per-MP rate limiting (MP-enforced)
Measurement side-effects                         "Trial" measurement by the core
Measurement pollution                            Enforcement of single MP instance per host
Fake measurement                                 Individually generated MP instances with
                                                 unique embedded keys
Replayed measurement                             Using nonce in DipZoom protocol
DoS attack against DipZoom core                  A combination of customary defensive measures

Table 3.3: Security threats in DipZoom and counter-measures

Induced DoS Attack This is an attack where a large set of MPs are requested to perform certain measurements on the same target host in order to consume resources or interrupt services on this host. The DipZoom core has an aggregate view of how many measurements target each particular host; this allows the core to limit the total number of outstanding measurements (across all MPs) targeting each host by declining additional requests from clients. Details on how the core declines a request from a client are discussed in [89]. Currently, this limit is a conservative global limit for all types of measurements, but per-measurement-type limits can be added.

DoS Attack against an MP In this attack, an attacker tries to continuously request a large number of measurements from a particular MP in order to consume its resources and degrade its performance. The DipZoom core rate-limits the number of outstanding measurement requests for each MP according to restrictions specified by the MP itself in its configuration file.

Basically, the denial of service attacks against a measurement target and an MP are dealt with in a straightforward way through rate limiting.

Measurement Side-Effect This is an interesting threat where an MP can be used as a proxy to access certain restricted resources and/or services. For example, some devices with a Web-based management interface can be configured by invoking a certain URL. An attacker can reconfigure such a device even if it is behind a firewall by requesting an MP behind the same firewall to perform a download of the URL. To prevent this, for URL access measurement requests (such as curl and wget), the DipZoom core will attempt to perform the same measurement by itself once (using an HTTP HEAD request to improve the performance). As long as DipZoom can perform the measurement from outside the MP's network, letting the MP do the same does not increase the vulnerability.
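The "trial measurement" counter-measure amounts to the core issuing a single HEAD request for the URL before allowing the MP to fetch it. A minimal sketch of such a check follows; the class, method name, timeouts, and the response-code heuristic are assumptions, not the core's actual policy.

    // Hypothetical sketch of the core's "trial measurement": before letting an MP fetch a URL,
    // the core tries an HTTP HEAD on the same URL itself. If the core, sitting outside the MP's
    // network, can reach it, the MP's download adds no new exposure.
    import java.net.HttpURLConnection;
    import java.net.URL;

    class TrialMeasurement {
        static boolean reachableFromCore(String url) {
            try {
                HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
                c.setRequestMethod("HEAD");                  // HEAD keeps the trial cheap
                c.setConnectTimeout(5000);
                c.setReadTimeout(5000);
                return c.getResponseCode() < 500;            // any non-error response: URL is reachable
            } catch (Exception e) {
                return false;                                // unreachable from outside the MP's network
            }
        }
    }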

Measurement Pollution An attacker may attempt to skew measurement results by polluting the set of available MPs with a large number of MP instances that run on a specially configured host that produces desirable measurement results. The core counters this threat by enforcing the condition that at most one MP instance can log in from any given IP and MAC address. Note that a multi-homed host can legitimately run multiple MP instances because different interfaces may have different performance characteristics. However, this measure is not effective in the corner case where a malicious MP can a) fake MAC addresses and b) obtain multiple IP addresses from the network.

Fake Measurement A malicious MP may also fake measurement results or lie to the core in announcing some of its characteristics, such as its operating system. A definitive defense against this threat can only be provided by a trusted computing platform supporting program attestation, which Microsoft and Intel are working on [31].

Replayed Measurement An attacker could replay its own or someone else's messages with measurement results to the core. DipZoom uses a standard solution of including a nonce in every message between the core and an MP and reflecting the received nonce in the next message in the opposite direction (that is, every message carries a reflected nonce received from the other party and a new nonce). Since all interactions between an MP and the core are encrypted, only a legitimate MP/core can reflect the incoming nonce. A nonce is an arbitrary number that is generated and used only once. The system keeps a list of nonces for the outstanding interactions; therefore, an attempt to replay an outdated message will be detected and rejected.

DoS Attack against DipZoom Core Finally, there is a question of a denial of service attack against the DipZoom core. Guarding the core against a DoS attack is the same as guarding any public service server. An approach could be a combination of a firewall, an IDS, and other tools. As with other public services, this issue is an ongoing arms race.

3.5 Performance

Earlier in this chapter, we discussed how the DipZoom platform works, how all components interact, and how the platform ensures security. One of the keys to the success of DipZoom is its scalability. We show platform scalability in Sections 3.5.1 and 3.5.2, and demonstrate the practicality of DipZoom on an actual research problem in Section 3.5.3.

3.5.1 Scalability: Measuring Point Fan-Out

Our first set of tests concerns the number of MPs the core can support. We study the number of MP login operations and the operations involved in a measurement that the core can support. For these tests, we disable the core feature that limits each host to only one MP instance and start a number of instances on the load-generating machine. In our experiment, we use a typical low-end server machine (Sun Fire X2100) as the core and a more powerful machine (Penguin Altus 1400 with Opteron 275 and 4GB memory) to generate workloads. Our preliminary results show that the core, even when run by a single low-end server, will be able to handle the load for a substantial number of participants. To study the login load the DipZoom core can handle, we conduct two experiments, the burst test and the continuous test. The burst test examines the limit of concurrent MPs that can log in to the core. The continuous test considers the average sustained rate of MP login successes at the core. The burst test is conducted by instructing a number of MPs to log in to the core simultaneously and counting the percentage of successful logins and the time to complete the last login operation. In all the experiments (with up to 1000 simultaneous logins), all logins succeeded. Figure 3.10 shows the time it took the core to process all logins in the burst. It shows that even a backlog of 1000 logins is cleared in less than 12 seconds, which seems acceptable

because this is not an overly time-sensitive operation. For the continuous test, we start 350 MPs on the load-generating machine, with each MP logging in and out every second. The test continues until 70,000 MP login successes have been reached. Figure 3.11 is a histogram showing the number of login successes observed in each second of the experiment. We see a sustained rate of around 150 logins/sec, with the tail showing the clearance of the backlog of the extra operations. Note that each login recorded in this test actually corresponds to two operations, a login and a logout (we do not report them as separate operations because a login is twice as expensive). Thus, on average the core can support around 300 logins and logouts per second. Assume that on average a regular user turns on his computer 3 times per day (once in the morning at work, another time in the afternoon after lunch, and the last time at home). Then 150 logins/sec could support up to 4.3 million MPs assuming their logins are uniformly distributed over a 24 hour period. Although uniform logins are unrealistic, clearly the core can support a large number of logins. Now let us turn to the interactions between the MP and the core to perform a measurement. There are two operations involved: getting the tickets and sending back the results. For this test, we start 300 stub MPs, which log in, then repeatedly get a ticket, immediately respond with a pre-recorded result, and sleep for one second. We then record the number of successful operations (not counting initial logins) at the core. Note that both getting a ticket and reporting the result involve opening a new TCP connection. Figure 3.12 shows the number of operations completed in each second (we ran the test until every MP completed 200 iterations). Ideally, we should see 600 operations per second. However, Figure 3.12 shows a much lower rate. Still, it shows a sustained rate of over 100 operations per second.
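For reference, the arithmetic behind the 4.3 million figure above (assuming logins spread uniformly over the day and 3 logins per MP per day) is simply:

    150 logins/sec × 86,400 sec/day ÷ 3 logins per MP per day ≈ 4.32 × 10^6 MPs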

Figure 3.10: Burst MP logins test

3.5.2 Scalability: Client Fan-Out

DipZoom scalability with respect to clients was studied by Zhihua Wen and has been documented in [89].

3.5.3 Demonstration Experiments

To draw lessons in using the DipZoom platform in an actual measurement, we conducted an experiment testing the accuracy of a proposed tool for measuring the distance between a pair of hosts: King [37]. The evaluation reported in [37] describing this tool was limited to well-connected servers as one of the hosts in each pair, because the authors only had publicly available traceroute servers at their disposal. With DipZoom, we were able to use residential measuring points, thanks to the simplicity of joining the DipZoom platform, using MPs that our colleagues downloaded at our request. Thus, we were able to test the accuracy of this tool separately for well-connected servers (using MPs that we deployed on PlanetLab nodes) and for residential hosts.

Figure 3.11: Average MP login successes

We implemented the experiment as Java programs on top of the DipZoom client library. King measures the distance between an arbitrary pair of hosts by cleverly tricking their respective DNS servers into querying each other and approximating the distance between the hosts by the distance between their DNS servers. In our experiments, we select one of the hosts in each pair to be a computer running a DipZoom MP and then compare the distance from this computer to the other host as estimated by King with the distance measured directly by that MP. For the well-connected MPs, we selected three PlanetLab nodes in Italy, Ohio, and New York. For the residential MPs connected by DSL lines, we used three hosts in Ohio, Texas, and Michigan. We also performed this measurement using an MP connected via high-speed DSL (6Mbps), and an MP using a recently introduced Verizon Broadband Cellular connection (advertised bandwidth of 500Kbps).

50 600 500 400 300 200 Successful Operations Per Seconds 100 0

0 50 100 150 200 250 300 Time in Seconds

Figure 3.12: The sustained rate of MP operations involved in a measurement

For the other host in the pairs, we selected 1000 pingable nodes from a set of 200,000 IP addresses from a Gnutella network snapshot [35][36]. For each pair, we measure the distance using King four times, and then directly using DipZoom's ping measurements 10 times. Following the format of the King paper, we represent the accuracy of King measurements as the ratio of the King result to DipZoom's ping. Figure 3.13 presents the CDF of the ratios for all MPs (All Users), as well as separate CDFs for the well-connected MPs, residential DSL-connected MPs, and the MP using cellular wireless. The more the CDF curves jump vertically at the ratio of 1, the more accurate the King estimates are. The values of the CDF function left of the x=1 point show the percentages of under-estimated results. Figure 3.13 shows that, considering all MPs together, over 80 percent of King results tend to underestimate the distance (matching closely the corresponding result from the King study) and almost half of all estimations have values less than half of the actual latency.

The latter result is significantly worse than that measured in the King study. We find an explanation for the discrepancy by considering individual classes of MPs. The well-connected MPs show very similar accuracy to that reported in the King study. Thus, the discrepancy is due to the slower-connected residential MPs. At the extreme, the cellular MP shows extreme inaccuracy (under-estimation) in King's estimations. We should mention that, during the experiment, we found that King is unable to measure distances between many pairs of IPs. Approximately 20 percent of all residential IPs we tried consistently cannot be measured by King. We mostly find this problem prevailing in nodes from Europe and East Asia. In summary, our experiments confirm the accuracy of King as a distance estimator for well-connected nodes. At the same time, DipZoom also allows us to show the nuanced differences in King accuracy for hosts with diverse connectivity.

Figure 3.13: Classified King/DipZoom ratios

Recently, TurboKing [49] has tried to improve the accuracy of the original King. However, the improvement has to be traded off against the more complex nature of this proposed tool. This experiment is a concise example of how DipZoom can be used in an actual network measurement. More comprehensive examples are covered in Chapters 4, 5, and 6, where we utilize DipZoom in our CDN studies. Of course, DipZoom is an alternative to existing platforms such as ScriptRoute and PlanetLab. However, key advantages of DipZoom are as follows.

• The DipZoom matchmaking approach enables the platform to grow with a low investment cost because no dedicated infrastructure beyond the core is strictly necessary.

• DipZoom can facilitate diversity of MPs due to its P2P nature, platform independence, and orientation toward running the MP on users' end devices.

• The DipZoom API lowers the barrier to entry for conducting large-scale Internet measurements, as one can implement a global multi-step experiment by simply coding and running a local Java program.

We conclude that DipZoom is a promising alternative with a potential to go beyond the limitations of currently available Internet measurement tools.

3.6 Conclusion

This chapter presents our implementation and initial experiences with DipZoom, a system for facilitating diverse on-demand network measurements. The system addresses several key needs of network researchers and designers, as well as IT specialists. First, it addresses the challenge of creating a measurement platform that would be representative of the scale and diversity of today's Internet. Instead of deploying its own platform, DipZoom makes it easy for Internet users to join the network of measurement providers. Second, it makes it simpler for measurement requesters to stage an experiment. Indeed, an experimenter has a consistent view of the totality of all the measuring points currently available and has a coherent interface to discover the MPs that suit her needs. Further, one can script a complex and long-running experiment, launch it, and collect the results from one's own computer using a general programming language (Java). By simplifying the creation of measuring points and improving the accessibility of measurements, we hope DipZoom will increase the use of sound measurements in the networking arena and thus reduce the scope for decisions based on guesswork and intuition.

Chapter 4

A Large Scale Performance Study of a Commercial CDN

In this chapter, we present a performance study of the leading content delivery provider Akamai. We measure the performance of the current Akamai platform and consider a key architectural question faced by both CDN designers and their prospective customers: whether the co-location approach to CDN platforms adopted by Akamai, which tries to deploy servers in numerous Internet locations, brings performance benefits over the more consolidated data center approach pursued by other influential CDNs such as Limelight.

4.1 Introduction

Two main “selling points” of a CDN service are (a) that it supplies on-demand capacity to content providers and (b) that it improves the performance of accessing the content from the user perspective because it delivers the content from a nearby location. This study focuses on the second aspect, and considers the question of how widely dispersed a CDN platform needs to be to provide proximity benefits to the users.

This study is motivated by the ongoing active discussion on the two main approaches in CDN design that have emerged over the years. One approach, exemplified by Akamai and Digital Island (currently owned by ), aims at creating relatively limited-capacity points of presence at as many locations as possible. For example, the Akamai platform in 2007 spanned more than 3,000 locations in over 750 cities in over 70 countries, with each location having on average less than ten servers, and the platform has grown further since then (see http://www.akamai.com/hdwp p.2). The other approach utilizes much larger data centers, comprising thousands of servers, but in many fewer locations. Examples of providers pursuing this approach include Limelight and AT&T. Limelight currently lists 20 data centers on its web site [50]. In practice, there may be complex reasons contributing to this design choice. On one hand, Akamai attempts to obtain free deployment of its cache servers at some ISPs in return for reducing the ISPs' upstream traffic, thus reducing the cost of running its platform. On the other hand, the consolidated platforms pursued by Limelight and AT&T can be more manageable and often are deployed in data centers owned rather than rented by the CDN. Still, a large number of locations is often cited as directly translating to improved client proximity and content delivery performance. We therefore focus on the technical factor and address the question: how many locations is enough from the client-observed performance perspective? In 2000, there were over 10K ISPs in the world [18], and some large ISPs surely warrant presence in more than one location. So is 30 locations enough, or 600, or 6000? When does one hit the diminishing return in terms of improving client proximity by increasing the number of locations? Note that this issue is orthogonal to the overall CDN capacity. By provisioning enough network connectivity, power supply, and servers at a given data center, one can assemble a very large aggregate CDN capacity at a relatively small number of data centers.

For example, from the statement on Limelight's website that “Each Limelight Networks Delivery Center houses thousands of servers” [50], one can infer that Limelight has at least 20,000 servers across its 20 data centers, which is at worst only a factor of two fewer than Akamai, despite having two orders of magnitude fewer locations. One would assume that content delivery networks would have done this study themselves a long time ago. This might be true – we will never know. However, proprietary research is not open to the public (and public scrutiny) and is often driven by vested interests. In any case, the drastic differences in the number of locations used by different CDNs indicate that any proprietary research they might have conducted has not resolved this general question adequately. This study attempts to answer the above question by examining Akamai performance. We chose Akamai because it is the dominant CDN provider, both in terms of market share and size. Our general approach is to investigate how the performance of Akamai-accelerated content delivery would suffer if it were done from fewer data centers. Obviously, this approach can only find the point of diminishing return if it happens to be below Akamai's number of locations. The study results indicate that this is indeed the case. An abstract of our preliminary results appeared in [84]. This study presents the complete work. This work makes several contributions:

• Edge server discovery. CDNs and their performance have been a subject of a number of studies [41, 45, 13, 79]. Most of them, including all the studies cited above, require the discovery of edge servers. Previously, this required one to gain access to and communicate with a large number of hosts in diverse geographical locations. Our study raises awareness of a simple and effective edge server discovery procedure using a generic measurement platform, which can be accomplished by anyone by writing what looks like a simple local Java application.

• CDN performance improvement. CDNs offer capacity on-demand, and hence overload protection, to subscribing content providers. But do CDNs improve user experience during normal load? Krishnamurthy et al. [45] compared the performance of different CDNs, but not the performance improvements that a given CDN delivers to a given content provider. We consider how the Akamai-selected cache improves download performance, and the quality of Akamai's server selection when it selects a cache for a download.

• The number of locations. We address the question of whether a large number of points of presence improves CDN performance. While considering just one aspect – performance – of this issue, this contributes to the debate on the merits of the co-location vs. in-core approaches to CDN design from the customer perspective.

• Security. In the course of this study, we discover a significant hole in Akamai’s protection against application-level denial of service attacks. We further study this problem in detail and present the complete work on this issue in Chapter 5.

In addition to the above performance insights, we hope the methodology we develop to obtain them will be useful for others conducting research in this area.

4.2 Methodology

This section describes the major experimental approaches we used to conduct our study.

4.2.1 Edge Server Discovery

Most CDN investigations involve the discovery of the CDN’s edge servers [45, 13, 79]. The basic technique for edge server discovery is well established and involves

simply identifying and resolving an outsourced hostname. For example, nslookup on "images.amazon.com" or its canonical name "a1794.l.akamai.net" will usually return at least two IP addresses of edge servers. The challenge, however, arises when one needs to harvest a large collection of edge servers, since this requires hostname resolutions from different geographical locations. To avoid the complexities of gaining access to and communicating with multiple hosts, we utilized DipZoom, a peer-to-peer Internet measurement platform, for this purpose (cf. Chapter 3). DipZoom has a large number of measurement points around the world, and it allows global experiments to be implemented as local Java applications, without the need to explicitly interact with the individual measurement points. While the available DipZoom measurement points vary in time, there are typically more than 400 active MPs available, mostly on PlanetLab nodes; about 5-10% are academic and residential hosts. As an indication of geographical coverage, Figure 4.1 shows a typical map snapshot of active DipZoom peers cropped from [25]. In our discovery process, we compiled canonical names outsourced to Akamai by downloading and examining the Web page sources of the 95 Akamai customers listed

Figure 4.1: Active DipZoom measurement points on 5/09/09.

on Akamai's Web site. We then periodically (twice a week) performed DNS resolution of these names from a large number of network locations over a period of 13 weeks. As a result, we harvested just under 12,000 Akamai edge servers, of which 10,231 servers were pingable at the end of the discovery process. By clustering these edge servers by city1 and autonomous system, we conservatively estimate that we discovered at least 308 locations. The real number is likely higher, as the discovered edge servers represented 864 distinct /24 prefixes. We attempted to discover Akamai's edge servers twice. In the first attempt, we launched a small-scale discovery aiming to understand the implications of the number of vantage points and the number of outsourced addresses on the discovery rate; details are described in Section 4.3. Having gained an understanding of the process, we launched a larger-scale discovery and thereby obtained about 12,000 edge servers, as described in the previous paragraph. We used the set of edge servers from our second attempt in our performance study of a commercial CDN.
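At its core, each discovery step is just a DNS resolution of an outsourced name from some vantage point. Outside the DipZoom client library, the resolution step by itself looks like the following sketch (the hostname is the Akamai canonical name used as an example above; the accumulation logic is ours for illustration):

    // Minimal sketch of one discovery step: resolve an Akamai-outsourced name and record
    // every address returned. Repeated from many vantage points and over time, this
    // accumulates the set of discovered edge server IPs.
    import java.net.InetAddress;
    import java.util.LinkedHashSet;
    import java.util.Set;

    class EdgeDiscovery {
        public static void main(String[] args) throws Exception {
            Set<String> edgeServers = new LinkedHashSet<>();
            for (InetAddress a : InetAddress.getAllByName("a1794.l.akamai.net")) {
                edgeServers.add(a.getHostAddress());     // usually at least two per lookup
            }
            System.out.println(edgeServers);
        }
    }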

4.2.2 Overriding CDN Edge Server Selection

In assessing the CDN's performance, our performance metric is the effective throughput of the page download as reported by the curl tool [23]. To measure download performance from a particular edge server rather than the server of Akamai's choosing, we need to connect to the desired edge server directly using its raw IP address rather than the DNS hostname from the URL. We found that to trick an arbitrary Akamai edge server into processing the request, it is sufficient to simply include the HTTP host header that would have been submitted with a request using the proper DNS hostname. For example, the following invocation will successfully download the object from a given Akamai edge server (206.132.122.75) by supplying the expected host header through the "-H" command argument:

1We used the GeoIP database from MaxMind [54] (commercial version) to obtain servers' geographical information. GeoIP was able to map 98.13% of our discovered edge servers.

curl -H Host:ak.buy.com \
     "http://206.132.122.75/db_assets/large_images/093/207502093.jpg"

4.2.3 Controlling Edge Server Caching

Some experiments require forcing the HTTP requests to be fulfilled from the origin server and not from the edge server cache. Normally, requesting a cache to obtain an object from the origin server could be done by using HTTP's Cache-Control header. However, as we will show shortly, Akamai's edge servers do not seem to honor the Cache-Control header in client requests. To manipulate the edge server into obtaining the content from the origin, we exploit the following observation. On the one hand, modern caches use the entire URL string, including the search string (the optional portion of a URL after "?"), as the cache key. In particular, a request for foo.jpg?randomstring will be forwarded to the origin server because the cache is unlikely to have previously stored an object with this URL. On the other hand, origin servers ignore unexpected search strings in otherwise valid URLs. Thus, the above request will return the valid foo.jpg image from the origin server. To verify this technique, we performed a series of downloads from "planetlab1.iii.u-tokyo.ac.jp" of an Amazon object "http://g-ec2.images-amazon.com/images/G/01/nav2/gamma/n2CoreLibs/n2CoreLibs-utilities-12475.js" using edge server 60.254.185.89. By selecting a client and edge server that are close to each other but likely distant from the origin server (both the client and the edge server are in Japan while the origin is likely to be in the US), we hope to be able to distinguish downloads from the cache vs. from the origin by the performance difference. Table 4.1 lists the throughput of the download series (the long object URL is

replaced with "foo.js" for convenience). Download 1 is a non-cache download and shows poor performance. Downloads 2 and 3 are downloads with the same random string; they exhibit distinctly higher performance, reflecting the fact that these requests are fulfilled from the cache, which stored this URL as the result of the first download. Downloads 4-6 confirm this behavior with a different random string. However, downloads 7 and 8, using the previously seen URLs with a no-cache header, produce similarly high performance, indicating they are processed from the cache despite the no-cache directive. Yet download 9, with the no-cache directive and a new random string, exhibits performance comparable to downloads 1 and 4 and indicative of a non-cache download. This shows that the No-Cache directive of the HTTP Cache-Control header is not honored by Akamai. In summary, we can precisely control the caching behavior of Akamai's edge servers by appending search strings to image URLs. The first appearance of a string will cause the download to occur from the origin server while subsequent downloads of the same string from the same edge server will occur from the edge server cache. A series of retrievals of the same content with different random strings will cause the edge server to obtain the original copy every time.

       Target & Parameter          Throughput
    1  /foo.js?rand1                  43 KB/s
    2  /foo.js?rand1                7750 KB/s
    3  /foo.js?rand1                5300 KB/s
    4  /foo.js?rand2                  61 KB/s
    5  /foo.js?rand2                8000 KB/s
    6  /foo.js?rand2                4850 KB/s
    7  no-cache, /foo.js?rand2      5070 KB/s
    8  no-cache, /foo.js?rand2      6780 KB/s
    9  no-cache, /foo.js?rand3        56 KB/s

Table 4.1: Initial vs. repeat download performance of an object with an appended random search string.
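The cache-control technique verified in Table 4.1 boils down to two requests that differ only in whether the edge server has already seen the random search string. A minimal sketch follows; the object URL is the example from above, and the timing comparison and error handling are left out.

    // Sketch of the cache-busting technique: append a fresh random search string to force
    // a fetch from the origin, then repeat the identical URL to hit the edge server cache.
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.UUID;

    class CacheBuster {
        public static void main(String[] args) throws Exception {
            String base = "http://g-ec2.images-amazon.com/images/G/01/nav2/gamma/"
                        + "n2CoreLibs/n2CoreLibs-utilities-12475.js";
            String url = base + "?" + UUID.randomUUID();       // a previously unseen search string

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();

            client.send(req, HttpResponse.BodyHandlers.ofByteArray());  // miss: fetched via the origin
            client.send(req, HttpResponse.BodyHandlers.ofByteArray());  // hit: served from the edge cache
        }
    }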


Figure 4.2: Relation between download throughput and RTT.

4.2.4 Assessing Client-Side Caching Bias

Most of our experiments below involve comparing download performance from different Akamai locations. A crucial question is a possible bias introduced by transparent caching potentially used by the ISPs through which our measuring points connect to the Internet. Indeed, a cached download would then be performed from the cache regardless of which Akamai location we apparently direct the HTTP request to. To estimate the extent of transparent cache use by the ISPs utilized by DipZoom MPs, we consider how HTTP download performance from a given Akamai edge server depends on its ping round-trip latency. If transparent caching is widely used, we would see no dependency. To study these correlations, we perform a set of four curl downloads, five pings, and a traceroute from each measuring point to various edge servers. For HTTP download performance, we average the download throughputs reported by the last three curls (the first curl ensures that the object is in the edge server cache);

for ping RTT, we average the RTTs from the five pings. We use an outsourced cacheable 437,688-byte Amazon object for the curl downloads. We use 385 measuring points and 20 widely distributed edge servers in this experiment. To ensure that the target edge servers are widely distributed across the world, we cluster all the discovered edge servers into twenty clusters based on the estimated distance between the servers (see Section 4.5) and select the centers of the clusters as our targets. Figure 4.2 plots the values of the ping RTT and curl download throughput in each trial, sorted in the order of increasing throughput. (The total number of trials is less than 385 × 20 because not all measurements were successful.) The figure shows a clear dependency of the download performance on ping distance. Quantitatively, the two metrics show a Spearman's rank correlation coefficient of −0.8074, indicating a strong dependency and thus no significant use of client-side ISP caching. In fact, we later discovered a direct method to detect a transparent cache. We deploy our own server with a cacheable object and request it twice from every measuring point. We then examine the server's access log. If the downloads from a given MP are filtered through an ISP cache, the log would contain only one request from this MP (the other would be terminated by the ISP cache); otherwise the log would contain both requests. We test 527 measurement points (as many as we could assemble) and find only three2 with a single request in our log. Thus, the overwhelming majority (99.43%) of MPs had no transparent caches between them and the Web servers. While this test was conducted a few months later than the rest of our study, the two experiments described here give us a high level of confidence that client-side caching does not bias our findings.

2These MPs were 137.132.250.12 (nusnet-250-12.dynip.nus.edu.sg, ), 132.252.152.193 (planetlab1.exp-math.uni-essen.de, Germany, Essen), and 147.126.95.167 (unknown hostname, Chicago).

4.2.5 Measuring Edge Server Performance

In section 4.4 and section 4.5.1, we use download performance as the “goodness metric” of a given edge server, and we express it as effective throughput, which is the object size divided by the download time. Given that the performance of downloads of different-sized objects is predominantly affected by different network characteristics (bandwidth for large objects and RTT for small objects), we verify our main findings on objects of a wide size range (from 10K to 400K). In addition, section 4.5.4 uses round-trip time from the client as the goodness metric of a given server.

4.3 Performance of Edge Discovery

Discovering CDN edge servers is a typical precursor of CDN performance studies. This section describes the first attempt of our discovery process and investigates, using Akamai, how quickly one can discover edge servers with various techniques.

Figure 4.3: Cumulative Akamai’s edge server discovery against CNAMEs

Recall from section 4.2.1 that content providers engage CDN delivery by outsourcing some hostnames to the CDN's DNS service. Some CDN customers outsource multiple hostnames, in which case resolving different hostnames may lead to discovering different edge servers. Compiling these hostnames, however, requires crawling and parsing the customer Web sites, so our first question is how the number of hostnames used affects the discovery. We collected a total of 169 outsourced CNAMEs from the Web sites of 95 Akamai customers and resolved these names from around 200 DipZoom measuring points. Figure 4.3 depicts the discovery progress. To produce this figure, we first sort all CNAMEs so that the names belonging to the same customer appear next to each other. We then number all the CNAMEs and all the customers. The dotted line shows how the customer numbers grow with the CNAME numbers. The plateaus in this curve indicate customers with multiple CNAMEs. The solid line shows the cumulative number of edge servers discovered using a given number of CNAMEs. For example, just over 1000 edge servers were discovered using the first 60 CNAMEs. The plateaus on this curve show periods of no progress, when additional CNAMEs discovered few new servers. Comparing both curves shows that the CNAME number intervals producing plateaus on the customer curve generally also produce plateaus on the discovery curve, although the latter has additional plateau segments. This indicates that additional CNAMEs from the same customer do not contribute much to server discovery.

Figure 4.4: Akamai's edge servers discovery


Figure 4.5: Progressive discovery of Akamai edge servers with time

In particular, for the remaining experiments in this section, we randomly select only one CNAME from each customer. Two other ways to discover edge servers involve resolving CNAMEs belonging to different customers, or performing the resolution from multiple vantage points. We resolved one randomly selected CNAME from each of the 95 customers from each of 214 DipZoom measuring points. To avoid the influence of changing DNS resolutions over time, each CNAME was resolved only once from each measuring point (we study the effect of repeated resolutions over time separately below). Figure 4.4 presents the results of this discovery process. To produce this figure, we number all measuring points and customers. Each point on the 3D graph shows how many edge servers can be cumulatively discovered using the corresponding number of measuring points and customers. As seen from this graph, both measuring points and customers contribute measurably to the discovery of edge servers. In particular, it shows that one has to include a large number of customers into the discovery process. This indicates that Akamai commonly partitions rather than shares its edge servers among its customers.

Finally, we investigate the contribution of time to the discovery process. Su et al. found rather high variability in Akamai server selection [79]. This suggests that more servers can be discovered by resolving the same CNAMEs from the same measuring points repeatedly over time. To investigate this aspect of server discovery, we repeated the experiment corresponding to Figure 4.4 every week. Figure 4.5 shows the discovery progress with the number of experiments. The vertical lines show how many edge servers were discovered in each experiment alone, and the solid ascending line presents the cumulative discovery across all the experiments. Generally, each experiment discovers roughly 3000 servers. However, repeated experiments result in a growing number of discovered servers; in fact, while the speed of discovery varies between the experiments, we have not reached the saturation point in 13 runs. Furthermore, we confirmed that this phenomenon is not caused by server replacement: an overwhelming majority of servers discovered in early runs were still operational at the end of the series. Overall, the only shortcut in large-scale edge server discovery we found is the elimination of multiple CNAMEs from the same customer. All other dimensions – measurement locations, customers, and time – are essential in server discovery. Based on what we learned from the performance study of our first attempt to discover Akamai's edge servers, we conducted the later discovery attempt by performing DNS resolutions from a larger number of vantage points and executing these resolutions more frequently. We used the results of this second discovery in the rest of this study.

4.4 Performance of Akamai CDN

This section considers how well the existing Akamai platform fulfills its mission of accelerating content delivery. First, we study how it affects the end-user download performance, then consider the quality of its server selection.

4.4.1 Does a CDN Enhance Performance?

In general, a content delivery network can improve Web performance in two ways: by providing overload protection to Web sites and by delivering content to users from nearby locations. In this study, we focus on the latter aspect. To answer this question precisely, one would need to compare download performance from a CDN with the direct download of the same object from the origin server. Unfortunately, we found that the origin servers hosting Akamai-delivered content are usually hidden from Internet users: our attempts to request this content from the origin sites that host the container pages were not successful. Thus, we resort to two estimates of the download performance from the origin servers. First, we estimate it by the download time of the Akamai-delivered content when we prevent caching at the edge servers. This actually measures the Akamai cache-miss performance and hence adds the miss penalty to the true value, but it can still indicate the origin performance, assuming the Akamai infrastructure is highly optimized. Second, we estimate the origin server performance by the download performance of a different object of similar size, which is not outsourced to Akamai delivery, from the same Web site. We use the methodology of section 4.2.3 to control CDN caching when measuring the download speed with and without caching; we use a regular HTTP download of the container page with curl to measure content download from the origin server. We collect these download measurements from 377 measuring points.


Figure 4.6: The performance benefits of Akamai delivery.

For the CDN-delivered content, we use an outsourced Amazon object of size 50K. For the download from the origin, we found a non-outsourced static page3 of 55K bytes on the Amazon website. (We assume the page is static because multiple downloads from different vantage points returned the same content.) We compare the CDN performance with direct download by considering the download throughput ratios:

Cache Download / NonCache Download      and      Cache Download / Origin Download

In both ratios, values over 1 mean the CDN improves performance over direct delivery (under the corresponding estimate of the direct delivery performance) and values below 1 indicate the opposite. Figure 4.6 plots the cumulative distribution functions of the cache-to-no-cache and cache-to-origin throughput ratios over all the collected measurements.

3http://72.21.206.5/gp/help/customer/display.html


Figure 4.7: The residential client performance benefits of Akamai delivery.

The figure indicates that a CDN promises a very significant performance improvement under both estimates of the direct delivery performance. CDN delivery outperforms no-cache and origin delivery in 98% and 96% of cases, respectively. Furthermore, in 67% and 41% of the cases, the CDN delivery is at least five times faster than no-cache and origin delivery, respectively. A possible reason for such a dramatic improvement is that most of our measuring points are well connected to the Internet and do not experience a bandwidth bottleneck. Therefore, to see the benefits for residential clients, we reproduce, in Figure 4.7, the same results but only for the 133 measuring points whose maximum download bandwidth is less than 1.5 Mbps – typical residential connectivity. For these measuring points, CDN delivery still outperforms no-cache and origin delivery in 95.5% and 91% of cases, respectively, but the performance benefits drop significantly. CDN delivery is now at least five times faster than origin delivery in only 2.3% of the cases. Nonetheless, CDN delivery still improves the content download performance of residential clients.
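The cached-versus-miss comparison can be sketched as follows. This is a minimal illustration rather than the exact measurement harness: the object URL is a placeholder, and it uses a random query string to defeat edge caching (the cache-control methodology of section 4.2.3, discussed in detail in Chapter 5).

import time
import urllib.request
import uuid

# Hypothetical CDN-delivered object; the study used a ~50 KB Akamai-outsourced
# Amazon image and a ~55 KB non-outsourced page for the origin estimate.
URL = "http://ak.example.com/images/object.jpg"

def throughput(url):
    """Download url once and return the throughput in KB/s."""
    start = time.time()
    body = urllib.request.urlopen(url, timeout=30).read()
    return len(body) / 1024 / (time.time() - start)

throughput(URL)                          # warm the edge server cache
cached = throughput(URL)                 # cached (hit) delivery

# A fresh random query string penetrates the edge cache, so this download
# approximates the cache-miss (no-cache) delivery through the CDN.
miss = throughput(URL + "?" + uuid.uuid4().hex)

print(f"cache / no-cache throughput ratio: {cached / miss:.2f}")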


Figure 4.8: The comparison of no-cache download through Akamai and download from origin server.

Finally, we consider the Akamai global cache-miss penalty: how much overhead the edge servers add to a request when the requested object has to be fetched from the origin server4. We estimate this penalty by comparing the direct delivery from the origin with the no-cache delivery through Akamai. The two delivery modes are compared in Figure 4.8, which plots the CDF of the no-cache-to-origin throughput ratio. It shows that in 70% of the cases, origin downloads are faster than no-cache downloads, and in 50% of the cases they are at least twice as fast. This indicates that the Akamai miss penalty can be significant. A potential limitation of the experiment in this section is that it considers the performance of a single Web object and not entire page downloads. Pages typically include multiple embedded objects, which may affect direct and CDN-accelerated downloads differently. Indeed, depending on the setup of the CDN service, the browser can download the initial container HTML object directly from the origin site and the embedded objects from the CDN's edge server.

4We refer to this as a global miss because of the possibility that an edge server may obtain the object from another edge server's cache rather than from the origin.

Thus, the CDN delivery could entail an extra DNS resolution (to resolve the host names of embedded URLs) and an extra TCP connection establishment (to establish connections to both the origin and the edge server). However, this should not materially affect our conclusions because these effects are amortized over the embedded objects on a page and often over several pages in a session (if the user accesses these pages in short succession so that cached DNS responses and persistent connections remain valid between page accesses). Akamai allows clients to cache DNS responses for 20 seconds (although many use them far longer [59]). We also probed around 1000 Akamai edge servers for persistent connection support and found that they maintain connections for an unusually long time – 500 seconds after a request (compared to Apache's default of 15 s). This provides ample opportunity for connection reuse.
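The persistent-connection probing mentioned above can be sketched as follows. This is an assumed reconstruction rather than the tool actually used; the edge server address and hostname are placeholders.

import http.client
import time

def keeps_alive(server_ip, host_header, idle_seconds, path="/"):
    """Return True if the server accepts a second request on the same TCP
    connection after idle_seconds of inactivity."""
    conn = http.client.HTTPConnection(server_ip, timeout=10)
    headers = {"Host": host_header, "Connection": "keep-alive"}
    conn.request("GET", path, headers=headers)
    conn.getresponse().read()            # complete the first exchange
    time.sleep(idle_seconds)             # stay idle on the open connection
    try:
        conn.request("GET", path, headers=headers)
        conn.getresponse().read()
        return True
    except (http.client.HTTPException, ConnectionError, OSError):
        return False
    finally:
        conn.close()

# Example: probing a placeholder edge server at increasing idle times.
for idle in (15, 60, 240, 480, 520):
    print(idle, keeps_alive("192.0.2.1", "ak.example.com", idle))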

4.4.2 How Good Is Akamai Server Selection?

When a client initiates a download of content cached by Akamai, the best edge server for this particular client and content is selected dynamically. This section investigates how good the Akamai selection algorithm is from the client performance perspective. To answer this question, we selected one customer, Amazon, and measured the download throughput of an Akamai-delivered object of 50K bytes from 391 measuring points. Out of the 10,231 edge servers we discovered, 430 were discovered for Amazon. At each of the 391 measuring points, 80 edge servers are randomly selected from these 430 servers, and the object is downloaded from the edge server selected by Akamai for this measuring point and from each of the 80 alternative edge servers, using the methodology of section 4.2.2 to download from a particular edge server. Because the entire experiment from a given measuring point takes considerable time, we always perform a pair of measurements one right after the other: one for the Akamai-selected server, and one for a given alternative edge server.


Figure 4.9: The fraction of servers outperformed by the Akamai-selected server.

Note that the Akamai-selected server may change from one pair to the next. Finally, we use the techniques of section 4.2.3 to ensure that the edge server delivers the object from its cache in each case. Figure 4.9 shows, for each measuring point, the fraction of the alternative servers that the Akamai-selected server outperformed, in increasing order. For example, for measuring point number 98, it shows that 50% of the 80 edge servers were outperformed by the Akamai-selected server. At measuring point number 200, the Akamai-selected server outperforms an alternative 80% of the time. The remaining 191 measuring points, numbers 201–391, indicated even more successful server selection. Overall, this experiment confirms an earlier study [41] that CDNs rarely select the best edge server but successfully avoid the worst ones. Indeed, in roughly 75% of the MPs, the Akamai-selected server outperformed half of the alternatives.
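A minimal sketch of the pairwise comparison performed at one measuring point is shown below. It assumes a hypothetical outsourced hostname and object path, and it overrides Akamai's server selection by connecting to a chosen edge server's raw IP address while supplying the expected Host header (the technique of section 4.2.2).

import time
import urllib.request

HOST = "ak.example.com"          # hypothetical outsourced hostname
PATH = "/images/object.jpg"      # hypothetical ~50 KB object

def edge_throughput(edge_ip):
    """Download PATH from a specific edge server, bypassing DNS-based server
    selection by using its raw IP and the expected Host header (KB/s)."""
    req = urllib.request.Request(f"http://{edge_ip}{PATH}",
                                 headers={"Host": HOST})
    start = time.time()
    body = urllib.request.urlopen(req, timeout=30).read()
    return len(body) / 1024 / (time.time() - start)

def fraction_outperformed(selected_ip, alternative_ips):
    """Fraction of alternatives beaten by the Akamai-selected server, with
    each pair measured back to back as in the experiment of Figure 4.9."""
    wins = sum(edge_throughput(selected_ip) > edge_throughput(alt)
               for alt in alternative_ips)
    return wins / len(alternative_ips)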


Figure 4.10: Download throughput difference between the Akamai-selected and an alternative edge server.

While Figure 4.9 shows how many times Akamai-selected servers perform better or worse than the alternatives, it does not say by how much. To provide this information, we plot in Figure 4.10 the average difference, the maximum gain, and the maximum loss of throughput of the Akamai-selected server vs. the alternatives. The average difference is taken over all 80 alternative servers; negative values indicate that the alternative servers performed better on average, and positive values correspond to the advantage of the Akamai-selected server. Maximum gain and maximum loss are the throughput differences of the Akamai-selected server over, respectively, the slowest and fastest alternative server. If positive, the maximum gain shows the advantage of the Akamai-selected server over the worst alternative. Similarly, if negative, the maximum loss shows the maximum penalty vs. the best alternative. The "average difference" curve shows that there are 147 measuring points where the Akamai-selected server delivers inferior performance on average, by at most 255 KB/s for measuring point 1. However, Akamai-selected servers have superior performance

on average on 244 measuring points, with up to a 3.38 MB/s average advantage. Interestingly, although Akamai-selected servers have superior performance on average, the "maximum loss" curve shows that, in a number of cases, the best alternative servers outperform the Akamai-selected servers significantly, by up to 6.35 MB/s. In brief, Akamai usually makes good server selection decisions, but there may be substantial room for further improvement.

4.5 Performance of Consolidated Akamai CDN

Akamai attempts to deliver content to users from nearby servers by placing edge servers in a large number of network locations. The question we pose is: how much would it cost in performance if the number of data centers were reduced significantly? Krishnamurthy et al. compared the performance of different CDNs, which happened to have drastically different numbers of data centers [45]. One could infer from that study that performance did not directly correlate with the number of data centers. Our study, in contrast, directly examines the performance implications of data center consolidation within the same CDN. We first describe our technique for data center consolidation, then present a study using measurements from DipZoom measurement points, and conclude with a live study involving real Internet users. Our DipZoom study uses download throughput as the performance metric and the live study uses latency5. Both studies show that a considerable consolidation is possible without a noticeable effect on performance.

4.5.1 Data Center Consolidation

Our methodology for studying a hypothetical consolidated CDN is as follows. We first group edge servers that we believe are close to each other into hypothetical

5As described later, Javascript limitations prevented us from comparing download throughputs from arbitrary edge servers in the live study.

consolidated data centers, which we refer to as big clusters. We then "place" each big cluster at the location of a central server in the cluster, called the representative of the cluster. To this end, for a given client, we replace the server selected by Akamai, S_akam, with the representative of the cluster to which S_akam belongs.

In other words, we assume that all clients that would have been sent to any server in a given big cluster in the existing platform will be served from the cluster center in the consolidated case. We then consider the performance implications of this replacement by comparing the performance of the downloads from both servers. We would like to stress again that our study only considers the implication of CDN consolidation on the proximity of clients to data centers: the aggregate CDN capacity, both in terms of network bandwidth and server capacity, is orthogonal to the number of data centers, as fewer locations can be compensated by higher processing and connectivity capacity at each data center. Because our probes impose trivial load on Akamai servers, the performance of our downloads reflects the proximity between servers and clients. In fact, any server load differences in individual server-pair comparisons should work against platform consolidation because Akamai avoids overloaded servers in its server selection while we use cluster centers regardless of their load. To cluster edge servers, we start by estimating the pair-wise network distances between all the servers using a recently proposed dynamic triangles method [90], and then group nearby servers into a predefined number of big clusters by applying a hierarchical clustering algorithm to the resulting distance matrix. (We use the so-called hierarchical clustering with complete linkage method, followed by the cut-the-tree procedure, both provided by the R software [68], for this computation.) We could equally use other techniques, such as network-aware clustering [44]. However, our goal is to show that we can consolidate large numbers of servers into fewer data centers without loss of performance, and how we select servers for consolidation is immaterial as long as we find the performance of the consolidated platform comparable.

(a) Complete Platform (b) Partial Platform

Figure 4.11: The implication of incomplete platform discovery: a client may be redirected to a more distant location.

In other words, our consolidation technique is conservative: better clustering could lead to a better-constructed consolidated platform with even fewer data centers.
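The clustering step could be reproduced, for example, with SciPy instead of the R functions used in the study. The sketch below assumes a symmetric matrix of estimated pairwise network distances and groups the servers with complete-linkage hierarchical clustering followed by a cut of the tree into a predefined number of big clusters; the distance values are made up for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Symmetric matrix of estimated pairwise distances between edge servers
# (e.g., latency estimates); a tiny toy example with four servers.
dist = np.array([[0.,  5., 40., 42.],
                 [5.,  0., 38., 41.],
                 [40., 38., 0.,  3.],
                 [42., 41., 3.,  0.]])

# Complete-linkage hierarchical clustering, then "cut the tree" into a
# predefined number of big clusters (two in this toy example).
tree = linkage(squareform(dist), method="complete")
labels = fcluster(tree, t=2, criterion="maxclust")

# Pick a representative per cluster: the member with the smallest total
# distance to the other members (a simple stand-in for the cluster center).
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    center = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
    print(f"cluster {c}: members {members.tolist()}, representative {center}")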

4.5.2 Impact of Incomplete Edge Server Discovery

While we discovered a large number of Akamai edge servers and locations, they still represent an incomplete view of the entire Akamai platform. However, the incomplete discovery only makes our findings conservative. To illustrate this point, compare Figure 4.11a, which shows a hypothetical CDN with nine locations, with Figure 4.11b, which shows the same CDN where only six of those locations have been discovered. To consolidate both the full and incomplete platforms by the same fraction (by two-thirds in the example), the incomplete platform would need to bring more distant servers into the same data center, causing users to be served from more distant locations. In other words, the incomplete discovery does not undermine the validity of our findings regarding the consolidation of the locations that we discovered.

4.5.3 DipZoom Experiment

To judge the performance of the hypothetical consolidated Akamai platform, we compare the performance of downloads from the current and consolidated platforms using DipZoom measurement points. We had 412 measurement points in this experiment. From each measurement point, we downloaded an object of a given size from the Akamai-selected server and from the center of its cluster in the consolidated platform. The center is represented by a randomly selected available server among the five servers closest, in terms of latency, to the center of the cluster6. To avoid bias from changing network conditions, we perform each pair of downloads in immediate succession. Further, we pre-request each object from both servers to ensure that each server is delivering the object from its cache, thus excluding the possibility of skewing results by the cache-miss penalty. Having placed the object into the servers' caches, we perform three downloads from either server and take the average as its download performance. We compare the performance of the existing and consolidated Akamai platforms by the ratio of the download throughput in the two platforms. A performance ratio of more than 1 means Akamai's existing configuration yields better download performance than the consolidated configuration, and vice versa. We grouped the 10,231 edge servers into 150, 100, 60, 40, and 20 data centers, thus considering varying degrees of consolidation. To check whether our conclusions might depend on the size of downloaded objects, we controlled the size of the downloads precisely by finding a large (over 400K) object7 and specifying an appropriate "Range" header in our HTTP request. We verified that Akamai servers honor range requests. Figure 4.12 presents the CDF of the download throughput ratios of the existing-to-consolidated configurations. The figure reflects data points obtained by downloading

6We choose one of five closest servers instead of always using the centroid node to reduce an undue effect of a single server on the entire experiment.
7http:///images/G/01/digital/video/preview/Player. V16565163 .swf

an outsourced Amazon object of size 150K, 100K, 50K, and 10K from 412 measurement points worldwide. The curves reflect 5522 ratios (some downloads were unsuccessful).
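Controlling the download size with the Range header, as described above, can be sketched as follows; the object URL is a placeholder, and a server that honors the range replies with status 206 (Partial Content).

import urllib.request

# Hypothetical large (>400 KB) CDN-delivered object.
URL = "http://ak.example.com/media/large-object.swf"

def ranged_download(url, nbytes):
    """Fetch only the first nbytes of url via an HTTP Range request and
    return (status code, number of bytes received)."""
    req = urllib.request.Request(url,
                                 headers={"Range": f"bytes=0-{nbytes - 1}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
        return resp.status, len(body)

for size in (10_000, 50_000, 100_000, 150_000):
    print(ranged_download(URL, size))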


Figure 4.12: The performance of a consolidated Akamai platform with different numbers of data centers.

As seen in Figure 4.12, consolidating edge servers into 150, 100, and 60 data centers does not cause noticeable performance degradation. Only when we get down to 40 and 20 data centers does the original platform start outperforming the consolidated configuration – 60% and 70% of the time for 40 and 20 data centers, respectively. Furthermore, as Figure 4.13 shows, these results are largely independent of the target object size. The above experiment reflects the mix of measurement points, which is skewed towards well-connected hosts. To see the performance implication for users with different connectivity, we consider the performance of the consolidated configurations separately for measurement points with different download bandwidth. Specifically, we group the measurement points by the maximum download throughput observed

in the course of the experiment of Figure 4.12. The results, shown in Figure 4.14, indicate that the existing Akamai configuration with a large number of data centers favors well-connected users. The numbers of measurement points within the (0, 256 Kbps], (256, 512 Kbps], (512 Kbps, 1.5 Mbps], (1.5, 3 Mbps], (3, 6 Mbps], and (6 Mbps, ∞) ranges are 14, 54, 58, 65, 122, and 86, respectively. For measurement points with over 6 Mbps bandwidth, the existing configuration outperformed the consolidated configurations once they got down to 60 data centers. However, the lower the bandwidth of the measurement points, the smaller the advantage of the existing configuration. Given this trend, and because CDNs are often used by high-volume consumer-oriented sites, a pertinent question is how the consolidation may affect typical residential users. Thus, we compare, in Figure 4.15, the performance of the existing vs. consolidated configurations for the 126 measurement points with bandwidth below 1.5 Mbps, which is a typical download bandwidth for DSL users. Figure 4.15 shows that these clients would not see a noticeable performance difference even if the servers were further consolidated into 40 data centers. Overall, we conclude that one could consolidate the Akamai platform to 60 data centers, and for typical residential users, even to 40 data centers, without a noticeable performance penalty. Even compared with our conservative approximation of 308 original data centers, this represents significant potential savings. However, as we mentioned, the real number of Akamai locations is likely much higher, which would make our findings even more significant.

4.5.4 A Live Study

The study of the previous subsection measured real downloads but performed them from DipZoom measurement points. In this section, we perform live measurements from a larger number of vantage points and from real Internet users, but measure the latency between the users and the edge servers.

Configuration            Best in 10        Best in 50         Best in 100
Sample Median                 16.00             −5.00              −18.40
Sample Mean                  −15.66            −54.43              −81.53
Bootstrap Median          15.954417         −5.013368          −18.260679
Bootstrap Mean           −15.695077        −54.313225         −81.697586
95% conf. interval    [−33.816798,      [−72.209521,       [−99.081914,
of the mean             −5.129345]        −43.798741]        −70.692385]

Table 4.2: The difference of RTT distance (in milliseconds) from clients to the nearest data center in a given consolidated platform and to the Akamai-selected server in the current platform (live clients).

We built a Web page8 with an AJAX application that, when loaded, measures the latency to the server that Akamai selects for this browser as well as to a list of other Akamai edge servers, and reports the results to us. We asked a mid-size commercial company to embed our special page into a zero-sized frame, and we also embedded it into our own Web pages. As clients access these Web sites, we collect our measurement results. We picked a CNAME (a1694.g.akamai.net, utilized by pcworld.com) which we found is mapped by Akamai to a large number – 979 – of edge servers. These servers represent 343 different /24 networks; using the city+AS heuristic (see Section 4.2.1), we estimate them to represent 168 locations. To compare the current configuration with consolidated configurations, we partitioned the 979 servers into 10, 50, and 100 clusters using our clustering approach. For the 100-cluster configuration, we used the centroid nodes from each cluster as the cluster representatives and measured the latency from clients to these servers from the AJAX application. To cut down on the number of measurements, for the configurations with fewer (and hence larger) clusters, we selected cluster representatives among these 100 servers as well (even if they were not the closest to the cluster center). Note that, in our clustering, a larger cluster is built as a union of smaller clusters, so we could always find an appropriate server this way. Also note that this does not undermine our results because we have discretion over where to place our consolidated data centers.

8Available at http://haddock.case.edu:8000/realdistquery

We could not find a way to pass a custom Host header to requests sent from Javascript, which is necessary to obtain a real object from a non-Akamai-selected edge server (section 5.2.2). Thus, we submit a bogus URL that returns Bad Request in all cases. Obtaining this response by the client involves two RTTs, allowing us to measure the latency between the client and the edge server involved. We have collected 24,079 measurements from 2,926 client IP addresses at the time of this writing. According to the GeoIP database, these vantage points cover a wide area, representing 47 US states and 43 foreign countries, and as our statistical analysis will show, are sufficient to derive a meaningful result. We consider the differences between the measured latencies from each client to its closest consolidated data center and to the Akamai-selected server. A negative value indicates that the consolidated platform performed better than the existing Akamai platform in the corresponding observation, while a positive value reflects an advantage of the existing Akamai platform. This assumes that the consolidated configuration performs perfect data center selection and thus should be viewed as an upper bound of the consolidation benefits. However, we note that fewer, farther-apart locations make server selection easier. To avoid using correlated measurements in the analysis, we average all the measurements from the same client, so that each client contributes only one data point to our analysis. Further, to remove possibly correlated measurements from the same network in the same locale, we use only one randomly selected client from all the clients with the same city and autonomous system according to GeoIP9. This reduced the number of data points for our analysis from 2,926 to 2,029. To assess the significance of the results, we build confidence intervals for the reported means.

9Note that the observations should not be averaged in this case because we do not know if in fact these measurements are all correlated.

Because the distribution of the observations is unknown, we use the non-parametric bootstrap method to estimate the population mean and median and to build the confidence interval for the mean. In particular, we used the bootstrap bias-corrected accelerated (BCa) interval method [30] with 10,000 re-sampling sets and relied on Matlab-provided functions that implement the core of the method. Table 4.2 summarizes the results of these measurements, with confidence intervals for the means built for a confidence probability of 95%. The results show that with both 100 and 50 consolidated data centers, we could still find a closer data center than the Akamai-selected server for a majority of clients, and the average distance to the nearest consolidated data center across all clients is also lower. In fact, the entire confidence interval for the mean distance difference is negative, indicating that the above conclusion is statistically significant, and even the upper limit of the confidence interval indicates an RTT difference of over 40 ms for 50 data centers and 70 ms for 100 data centers. Only with 10 data centers do we see a mixed result: while the entire confidence interval for the average distance difference is still negative (indicating that the distance from clients to the nearest consolidated data center is still smaller on average than to the Akamai-selected server in the current platform), the median difference indicates an advantage of the current platform, meaning that the current Akamai platform would perform better for a majority of clients. We also note that the median difference between the Akamai and consolidated platforms is much smaller than the average in all configurations. This says that the average difference is skewed against Akamai by occasional poor server selections, confirming a finding from Section 4.4.2. Overall, our results show that even with significant consolidation, better server selection can more than make up for any performance impact from consolidation.
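An equivalent bootstrap analysis could be carried out, for instance, with SciPy instead of the Matlab routines used here. The sketch below applies the BCa method to hypothetical per-client RTT differences (one averaged value per client, negative values favoring the consolidated platform).

import numpy as np
from scipy.stats import bootstrap

# Hypothetical per-client RTT differences (ms); in the study there was one
# averaged data point per client, 2,029 points in total.
rng = np.random.default_rng(0)
per_client = rng.normal(loc=-15, scale=120, size=2029)

res = bootstrap((per_client,), np.mean, n_resamples=10_000,
                confidence_level=0.95, method="BCa", random_state=rng)

print("sample mean:", per_client.mean())
print("sample median:", np.median(per_client))
print("95% BCa confidence interval for the mean:", res.confidence_interval)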

4.6 Conclusion

This chapter presents a large-scale performance study of Akamai's CDN platform. To our knowledge, this is the first study to provide an independent direct estimate of the performance improvements of Akamai-accelerated downloads. We further investigated the quality of Akamai server selection at a large scale. Central to this work is the evaluation of the possibility of consolidating Akamai's platform into fewer large data centers. We found that quite significant consolidation is possible without appreciably degrading the platform performance.

Figure 4.13: The performance of a consolidated Akamai platform with different target object sizes. (a) 60 data centers; (b) 40 data centers.

Figure 4.14: The performance of a consolidated Akamai platform with different download speeds. (a) 100 data centers; (b) 60 data centers; (c) 40 data centers; (d) 20 data centers.


Figure 4.15: The performance of a consolidated Akamai platform for residential speed links.

Chapter 5

Security Issues in Commercial CDNs

Content Delivery Networks (CDNs) are commonly believed to offer their customers protection against application-level denial of service (DoS) attacks. Indeed, a typical CDN with its vast resources can absorb these attacks without noticeable effect. This chapter uncovers a vulnerability which not only allows an attacker to penetrate a CDN's protection, but to actually use the content delivery network to amplify the attack against a customer Web site. We show that the leading commercial CDNs Akamai and Limelight and an influential research CDN, Coral, can be recruited for this attack. By mounting an attack against our own Web site, we demonstrate an order of magnitude attack amplification through leveraging the Coral CDN. We present measures that both content providers and CDNs can take to defend against our attack. We believe it is important that CDN operators and their customers be aware of this attack so that they can protect themselves accordingly.

5.1 Introduction

CDNs typically deploy a large number of servers across the Internet. By doing this, CDNs offer their customers (i.e., content providers) large capacity on demand and a better end-user experience. CDNs are also believed to offer their customers protection against application-level denial of service (DoS) attacks. In an application-level attack, the attacker sends regular requests to the server with the purpose of consuming resources that would otherwise be used to satisfy legitimate end-users' requests. These attacks are particularly dangerous because they are often hard to distinguish from legitimate requests. Since CDNs have a much larger aggregate pool of resources than typical attackers, CDNs are supposed to be able to absorb DoS attacks without affecting the availability of their subscribers' Web sites. However, in this study, we describe mechanisms that attackers can utilize not only to defeat the protection against application-level attacks provided by CDNs but to leverage their vast resources to amplify the attack. The key mechanisms that are needed to realize this attack are as follows.

• Scanning the CDN platform to harvest edge server IP addresses. There are known techniques for discovering CDN edge servers, based on resolving host names of CDN-delivered URLs from a number of network locations [79, 84]. Recent research focusing on the performance of CDNs reports discovering large numbers of edge servers. Our parallel work, described in chapter 4, discovers around 11,000 Akamai edge servers and includes an analysis of the discovery process [84].

• Obtaining HTTP service from an arbitrary edge server. While a CDN performs edge server selection and directs HTTP requests from a given user to a particular server, we show an easy way to override this selection. Thus, the attacker can send HTTP requests to a large number of edge servers from a single machine.

• Penetrating through the edge server cache. We describe a technique with which the attacker can command an edge server to obtain a fresh copy of a file from the origin even if the edge server has a valid cached copy. This can be achieved by appending a random query string to the requested URL (after the "?" delimiter). Thus, the attacker can ensure that its requests reach the origin site.

• Reducing the attacker’s bandwidth expenditure. We demonstrate that at least the CDNs we considered transfer files from the origin to the edge server and from the edge server to the user over decoupled TCP connections. Thus, by throttling or dropping its own connection to the edge server, the attacker can conserve its own bandwidth without affecting the bandwidth consumption at the origin site.

Some of the above techniques, such as overriding server selection and cache penetration, were developed during our CDN performance study in chapter 4 and are re-used here for different purposes. Combining these mechanisms, the attacker can use a CDN to amplify its attack. To this end, the attacker only needs to know the URL of one sizable object that the victim content provider delivers through a CDN. Then, the attacking host sends a large number of requests for this object, each with a different random query string appended to the URL, to different edge servers from this CDN. Different query strings for each request prevent the possibility of edge servers fetching the content from each other [43] and thus reducing the strength of the attack. After establishing each TCP connection and sending the HTTP request, the attacker drops its connection to conserve its bandwidth. For the CDNs we study, every edge server will forward every request to the origin server and obtain the object at full speed. With enough edge servers, the attacker can easily saturate the origin site while expending only a small amount of bandwidth of its own. Furthermore, because the attacker spreads its requests among the edge servers,

it can exert damage with only a low request rate to any given edge server. From the origin's perspective, all its requests would come from the edge servers, which are known to be trusted hosts. Thus, without special measures, the attacker will be hidden from the origin behind the edge servers and will not raise suspicion at any individual edge server due to the low request rate. Aggregating per-customer request rates across all the edge servers could in principle detect the attacker, but doing this in a timely manner would be challenging in a large globally distributed CDN. Hence, it could help in a post-mortem analysis but not in preventing an attack. Even then, the attacker can use a botnet to evade traceability. While our attack primarily targets the origin server and not the CDN itself (modulo the cache pollution threat to the CDN discussed in section 5.4), it is likely to disrupt the users' access to the Web site. Indeed, a Web page commonly consists of a dynamic container HTML object and embedded static content – images, multimedia, style sheets, scripts, etc. A typical CDN delivers just the embedded content, whereas the origin server provides the dynamic container objects. Thus, by disrupting access to the container object, our attack will disable the entire page. This study makes the following main contributions:

• We present a DoS attack against CDN customers that penetrates CDN caches and exploits them for attack amplification. We show that customers of three popular content delivery networks (two leading commercial CDNs – Akamai and Limelight – and an influential research CDN – Coral) can be vulnerable to the described attack.

• We demonstrate the danger of this vulnerability by mounting an end-to-end attack against our own Web site that we deployed specially for this purpose. By attacking our site through the Coral CDN, we achieve an order of magnitude attack amplification as measured by the bandwidth consumption at the attacking host and the victim.

92 • We present a design principle for content providers’ sites that offers a definitive protection against our attack. With this principle, which we refer to as “no strings attached”, a site can definitively protect itself against our attack at the expense of a restrictive CDN setup. In fact, Akamai provides an API that can facilitate the implementation of this principle by a subscriber [51].

• For the cases where these restrictions prevent a Web site from following the “no strings attached” principle, we discuss steps that could be used by the CDN to mitigate our attack.

As is typical, once one realizes a threat, it is possible to devise a protection, and we describe steps that a CDN and its subscribers can take to defend against our attack. In fact, Akamai already provides an API allowing a subscriber to implement our mitigation [51]. However, implementing these steps has significant implications for the Web site design and how it sets up its CDN service (as discussed in section 5.5). With a growing number of young CDN firms on the market and the crucial role of CDNs in the modern Web infrastructure (indeed, Akamai alone claims to be delivering 20% of the entire Web traffic [9]), we believe it is important that CDNs and their subscribers be aware of this threat so that they can protect themselves accordingly.

5.2 The Attack Components

This section describes the key mechanisms comprising our attack and our methodology to verify that the CDNs under study support these mechanisms.

5.2.1 Harvesting Edge Servers

CDN edge server discovery is based on resolving hostnames of CDN-delivered URLs from a number of network locations. Researchers have used public platforms such

as PlanetLab to assemble a sizable list of edge servers for CDN performance studies [79, 84]. An attacker can employ a botnet for this purpose. We previously utilized the DipZoom measurement platform [25] to harvest around 11,000 Akamai edge servers for a separate study (see Chapter 4 and [84]). For the present study, we used the same technique to discover Coral edge servers. We first manually compile a list of URLs cached by Coral CDN. We then randomly select one URL and resolve its hostname into an IP address from every DipZoom measurement point around the world. We repeat this process over several hours and discover 263 unique IPs of Coral cache servers. Since, according to the Coral website, there are around 260 servers, we believe we essentially discovered the entire set.

5.2.2 Overriding CDN’s Edge Server Selection

To recruit a large number of edge servers for the attack, the attacker needs to submit HTTP requests to these servers, overriding the CDN's server selection for this host. In other words, the attacker needs to bypass the DNS lookup, i.e., to connect to the desired edge server directly using its raw IP address rather than the DNS hostname from the URL. We found that to trick this edge server into processing the request, it is sufficient to simply include the HTTP host header that would have been submitted with a request using the proper DNS hostname. One can verify this technique by using curl – a command-line tool for HTTP downloads. For example, the following invocation will successfully download the object from a given Akamai edge server (206.132.122.75) by supplying the expected host header through the "-H" command argument:

curl -H Host:ak.buy.com http://206.132.122.75/.../207502093.jpg

We verified that this technique for bypassing CDN’s server selection is effective for all three CDNs we consider.

5.2.3 Penetrating CDN Caching

The key component of our attack is to force the attacker's HTTP requests to be fulfilled from the origin server instead of the edge server cache. Normally, requesting a cache server to obtain an object from its origin could be done by using the HTTP Cache-Control header. However, we were unable to force Akamai to fetch a cached object from the origin this way: adding the Cache-Control header did not noticeably affect the download performance of a cached object. As an alternative, we exploit the following observation. On one hand, modern caches use the entire URL string, including the search string (the optional portion of a URL after "?"), as the cache key. For example, a request for foo.jpg?randomstring will be forwarded to the origin server because the cache is unlikely to have a previously stored object with this URL. On the other hand, origin servers ignore unexpected search strings in otherwise valid URLs. Thus, the above request will return the valid foo.jpg image from the origin server.

Verification. To verify this technique, we first check that we can download a valid object through the CDN even if we append a random search string to its URL, e.g., "ak.buy.com/db assets/ large images/093/207502093.jpg?random". We observed this to be the case with all three CDNs. Next, we measure the throughput of downloading a cached object from a given edge server. To this end, we first issue a request to an edge server for a regular URL (without random strings) and then measure the download throughput of repeated requests to the same edge server for the same URL. Since the first request would place the object into the edge server's cache, the performance of subsequent downloads indicates the performance of cached delivery. Finally, to verify that requests with random strings are indeed forwarded to the origin site, we compare the performance of the first download of a URL with a given random string (referred to as the "initial download" below) with repeated downloads from the same edge server using the same random string (the "repeat download") and with the cached download of the same object.

Trial      Limelight         Akamai
Number     208.111.168.6     192.5.110.40
1          775               1295
2          1028              1600
3          1063              1579
4          1009              1506
5          958               1584
6          1025              1546
7          941               1558
8          1029              1570
9          1019              1539
10         337               1557
Average    918               1533

Table 5.1: The throughput of a cached object download (KB/s). Object requests have no appended random string.

The repeat download would presumably be satisfied from the edge server cache. Therefore, if changing the random string leads to distinctly worse download performance, while repeat downloads show throughput similar to the cached download, it would indicate that the initial requests with random strings are processed by the origin server. We select one object cached by each CDN: a 47K image from Akamai1 and a 57K image from Limelight2. (The open nature of Coral allows direct verification, which we describe later.) Using a client machine in our lab (129.22.150.231), we resolve the hostname from each URL to obtain the IP address of the edge server selected by each CDN for our client. These edge servers, 192.5.110.40 for Akamai and 208.111.168.6 for Limelight, were used for all the downloads in this experiment. Table 5.1 shows the throughput of ten repeated downloads of the selected object from each CDN, using its regular URL. These results indicate the cached download performance.

1"ak.buy.com/db assets/large images/093/207502093.jpg"
2"modelmayhm-8.vo.llnwd.net/d1/photos/081120/17/4925ea2539593.jpg"

String     Initial      Repeat
Number     Download     Download
1          130          1540
2          156          1541
3          155          1565
4          155          1563
5          156          1582
6          155          1530
7          156          1522
8          147          1536
9          151          1574
10         156          1595
Average    152          1555

Table 5.2: Initial vs. repeat download throughput for Akamai (KB/s). Requests include appended random strings.

Tables 5.2 and 5.3 present the throughput of initial and repeat downloads of the same objects with ten different random strings. The results show a clear difference in download performance between initial and repeat downloads. The repeat download is over 10 times faster in the Akamai case and almost 7 times faster for Limelight. Furthermore, no download with a fresh random string, in any of the tests, approaches the performance of any repeat download. At the same time, the performance of the repeat downloads with random strings is very similar to the cached downloads. This confirms that a repeat download with a random string is served from the cache, while appending a new random string defeats edge server caching in both Akamai and Limelight. In the case of Coral CDN, we verify its handling of random search strings directly as follows. We set up our private Web server on host saiyud.case.edu (129.22.150.231) whose only content is an object http://saiyud.case.edu/pic01.jpg. Given the open nature of Coral CDN, a client can now download this object through Coral by accessing the URL http://saiyud.case.edu.nyud.net/pic01.jpg. Next, we obtain the edge server selected by Coral for our client by resolving the hostname saiyud.case.edu.nyud.net. Then, we use this server (155.246.12.164) explicitly for this experiment with the technique from section 5.2.2.

String     Initial      Repeat
Number     Download     Download
1          141          611
2          111          876
3          20           749
4          192          829
5          196          736
6          125          933
7          166          765
8          128          1063
9          18           847
10         140          817
Average    124          828

Table 5.3: Initial vs. repeat download throughput for Limelight (KB/s). Requests include appended random strings.

To check that Coral caches our object, we requested pic01.jpg from the above edge server three times without a random search string and verified that the log on our web server recorded only one access of pic01.jpg. This means the other downloads were served from the edge server cache. Then, we again issued three requests for pic01.jpg to this edge server, but now with a different random search string in each request. This time, our Web server log recorded three accesses of pic01.jpg from the edge server. We conclude that appending a random string causes a Coral edge server to fetch the file from the origin regardless of the state of its cache, as was the case with Akamai and Limelight.
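The Coral verification logic above can be scripted against one's own origin. The following is a hypothetical reconstruction: it assumes the script runs on the origin host, the edge server address, Coralized hostname, and log path are placeholders, and origin accesses are counted by scanning the local Apache access log.

import urllib.request
import uuid

EDGE_IP = "192.0.2.1"                        # placeholder edge server address
HOST = "example.org.nyud.net"                # placeholder Coralized hostname
PATH = "/pic01.jpg"
ACCESS_LOG = "/var/log/apache2/access.log"   # origin server's access log

def fetch(query=""):
    """Request PATH from the edge server, optionally with a query string."""
    req = urllib.request.Request(f"http://{EDGE_IP}{PATH}{query}",
                                 headers={"Host": HOST})
    urllib.request.urlopen(req, timeout=30).read()

def origin_hits():
    """Count how many requests for PATH reached the origin server."""
    with open(ACCESS_LOG) as log:
        return sum(PATH in line for line in log)

before = origin_hits()
for _ in range(3):
    fetch()                                  # same URL: should hit the cache
print("origin accesses without random strings:", origin_hits() - before)

before = origin_hits()
for _ in range(3):
    fetch("?" + uuid.uuid4().hex)            # fresh random string each time
print("origin accesses with random strings:   ", origin_hits() - before)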

5.2.4 Amplifying the Attack: Decoupled File Transfers

We showed in Section 5.2.3 that one can manipulate a CDN edge server to download the file from the origin server regardless of the content of its cache and therefore penetrate CDN’s protection of a Web site against a DoS attack. We now show that the attacker can actually recruit an edge server to consume bandwidth resources from the origin site without expending much of the attacker’s own bandwidth.

98 Figure 5.1: Decoupled file transfers experiment

In particular, we will show that edge servers download files from the origin and upload them to the client over decoupled TCP connections, so that the file transfer speed from the origin is largely independent of the transfer speed to the client3. In fact, this is a natural implementation of an edge server, which could also be rationalized by the desire to have the file available in the cache for future requests as soon as possible. Unfortunately, as we will see, it also has serious security implications.

5.2.5 Verification

To demonstrate the independence of the two file transfers, we set up two client computers, a prober and a monitor, as shown in Figure 5.1. The prober has the ability to shape its bandwidth or cut its network connection right after sending the HTTP request. The monitor runs the regular Linux network stack. The prober requests a CDN-accelerated object from an edge server E with an appended random string to ensure that E obtains a fresh copy from the origin server. The prober shapes its bandwidth to be very low, or cuts the connection altogether after sending the HTTP request. While the prober is making slow (if any) progress

3We do not claim these are completely independent: there could be some interplay at the edge server between the TCP receive buffer on the origin-facing connection and the TCP send buffer on the client-facing side. These details are immaterial to the current study because they do not prevent the attack amplification we are describing.

in downloading the file, the monitor sends a request for the same URL with the same random string to E and measures its download throughput. If the throughput is comparable to the repeat download throughput from section 5.2.3, it means the edge server processed the monitor's request from its cache. Thus, the edge server must have completed the file transfer from the origin as the result of the prober's access even though the prober has hardly downloaded any content yet. On the other hand, if the throughput is comparable to that of the initial download from section 5.2.3, then the edge server has not acquired the file and is serving it from the origin. This would indicate that the edge server matches in some way the speed of its file download from the origin to the speed of its file upload to the requester. Because edge servers may react differently to different client behavior, we have experimented with the prober (a) throttling its connection, (b) going silent (not sending any acknowledgements) after sending the HTTP request, and (c) cutting the connection altogether, by sending a reset TCP segment to the edge server in response to its first data segment. We found that none of the three CDNs modify their file download behavior in response to any of the above measures. Thus, we present the results for the most aggressive bandwidth-saving technique by the requester, which includes setting the input TCP buffer to only 256 bytes – so that the edge server will only send a small initial amount of data (this cuts the payload in the first data segment from 1460 bytes to at most 256 bytes) – and cutting the TCP connection with a reset after transmitting the HTTP request (so that the edge server will not attempt to re-transmit the first data segment after timeouts). The experiments from the previous subsection showed that both Akamai and Limelight transferred their respective objects from the origin with a throughput of between 100 and 200 KB/s (an occasional outlier in the case of Limelight notwithstanding). Given that either object is roughly 50K in size, we issue the monitoring request 0.5s after the probing request, so that if our hypothesis of the full-throttle download is correct, each edge server will have transferred the entire object into its cache by the time the monitoring request arrives.

String     Limelight            Akamai
Number     (208.111.168.6)      (192.232.17.8)
1          1058                 1564
2          1027                 1543
3          721                  1560
4          797                  1531
5          950                  1562
6          759                  1589
7          943                  1591
8          949                  1600
9          935                  1583
10         928                  1544
Average    907                  1567

Table 5.4: The download throughput (KB/s) of the monitor client. The monitor request is sent 0.5s after the probing request.

The results are shown in Table 5.4. It shows that the download throughputs measured by the monitor closely match those for repeat downloads from section 5.2.3. Thus, the monitor obtained its object from the edge server cache. Because the edge server could process this request from its cache only due to the download caused by the request from the prober, and the latter downloaded only a negligible amount of content, we have shown that, with the help of the edge server, the prober can consume (object size)/0.5s, or roughly 100 KB/s, of the origin's bandwidth while expending very little bandwidth of its own. With a choice of a larger object size, one could generate a much larger consumption.

5.3 End-to-End Attack

This section demonstrates the end-to-end attack that brings together the vulnerabilities described in the previous section. To do so, we set up our own web server as a victim and use the Coral CDN to launch the amplified DoS attack against this server.

101 Figure 5.2: DoS attack with Coral CDN

This way, we can show the effectiveness of our attack without affecting any existing Web site; further, due to elaborate per-node and per-site rate controls imposed by Coral [22], we do not affect the Coral platform either. In fact, our experiments generate only roughly 18 Kbps of traffic on each Coral server during the sustained attack and under 1 Mbps during the burst attack – hardly a strain for a well-provisioned node. Our results show that even this modest attempt resulted in over an order of magnitude attack amplification and a two to three orders of magnitude service degradation of the web site. We should note that after a number of attacks, Coral apparently was able to correlate our request pattern across their nodes and block our subnet from further attacks. This, however, happened only after a number of successful attacks. The mitigation methods we describe in section 5.5 would allow one to prevent these attacks before they occur. Furthermore, a real attacker could use a botnet to change the attacking host at will, complicating even post-mortem detection. We discuss data-mining-based protection in more detail in section 5.3.4.

102 5.3.1 The Setup

Figure 5.2 shows our experimental setup. The victim web server hosts a single 100K target object. The attacker host issues a number of requests for this object with different random strings to each of the Coral cache servers. To reduce the load imposed by the return traffic, the attacker sets an artificially small input TCP buffer of 256 bytes for its HTTP downloads and terminates its connections upon the arrival of the first data packet. The monitor acts as a regular client. It downloads the object directly from the victim web site once a second to measure the performance that would be experienced by an end-user visiting the site. We use identical machines for both the victim web server and the attacker: a dual-core AMD Opteron 175 CPU with 2 GB memory and a 1 Gbps ethernet link. The Web server is Apache 2.2.10 with the number of concurrent clients set to 1000 to increase parallelism. The monitor is a desktop with an Intel P4 3.2 GHz CPU, 1 GB memory, and a 1 Gbps ethernet link. We use a set of 263 Coral cache servers to amplify the attack in our experiment.

5.3.2 A Sustained Attack

To show the feasibility of sustaining an attack over a long period of time, we let the attacker send 25 requests to each of the 263 Coral cache servers every two minutes, repeating this cycle 20 times. Thus, this is an attempt to create a 40-minute-long attack. The effects of this attack are shown in Figure 5.3. Figures 5.3(a) and 5.3(b) depict the in-bound and out-bound per-second traffic on the web server and the attacker before, during, and after the attack. Table 5.5 shows the average increase of traffic during the attack on the server and the attacker. As seen from this table, the attack increases overall traffic at the origin site by 555,728 Byte/s (4.45 Mbps), or almost half of the 10Base Ethernet link bandwidth. Moreover, this load is imposed at the cost of only a 45,666 Byte/s traffic increment to the attacker,

103 or a quarter of a T1 link bandwidth. Thus, the attacker was able to use a CDN to amplify its attack by an order of magnitude over the entire duration of the attack.

           In-Bound (B/s)    Out-Bound (B/s)    Total (B/s)
Server         40,528            515,200          555,728
Attacker       13,907             31,759           45,666

Table 5.5: Average traffic increase during the attack period.

Figure 5.3(c) shows the dynamics of the download performance (measured as throughput) as seen by the monitor, representing a regular user of our web site. The figure indicates a dramatic degradation of user-observed performance during the attack period. The download throughput of the monitor host dropped by a factor of 71.67 on average over the entire 40-minute attack period, from 8824.2 KB/s to 123.13 KB/s.4 In summary, our attack utilized a CDN to fill half of the 10Base Ethernet link of its customer Web site at the cost of a quarter of a T1 link's bandwidth for 40 minutes. A more aggressive attack (using more edge servers and a larger target file) would result in an even larger amplification.

5.3.3 A Burst Attack

A CDN may attempt to employ data mining over the arriving requests to detect and block our attack. While we discuss in section 5.3.4 why this would be challenging to do in a timely manner, we also wanted to see what damage the attacker could inflict with a single burst of requests to minimize a chance of detection. Consequently, in this experiment, the attacker sends a one-time burst of 100 requests to each of the 263 Coral servers. This apparently tripped Coral’s rate limiting, and only around a third of the total requests made their way to the victim Web server. However, as we will see below, these requests were more than enough to cause damage.

4We should note that the absolute performance numbers regarding the web server performance should be taken with a grain of salt because they depend on server tuning. Tuning a web server, however, is not a focus of this study, and our measurements reflect a typical configuration.

The dynamics of this attack are shown in Figure 5.4. We should mention that, for simplicity, this experiment was performed with the attacker host going completely silent instead of resetting the connection right after receiving the first data packet. With this setup, the Coral servers performed multiple re-transmission attempts for the unacknowledged first data packet of the response. This led to a slight increase of the attacker's bandwidth consumption. However, even with this increase, the attacker achieves an effective attack amplification, by more than a factor of 50 at its peak. As one can see from Figure 5.4, a single burst attack can have a long-lasting effect on the web site. Its bandwidth consumption increased by an order of magnitude or more for 85 seconds. The attack amplification of at least an order of magnitude lasted for almost two minutes (114 seconds). The average download performance seen by the monitor dropped three orders of magnitude, from an average of 8.6 MB/s during the normal period to 8.4 KB/s for over three minutes. These long-lasting effects are caused by the pending requests accumulated at the server, which take a long time to resolve and prolong the attack. We conclude that a burst attack can cause a significant temporary disruption of a Web site. By repeating burst attacks from random botnet nodes at random times, the attacker can cause intermittent availability and erratic performance of its victim site.

5.3.4 Discussion: Extrapolation to Commercial CDNs

We have shown above the end-to-end effect of our attack with the Coral CDN. Since we can only assess the effect by observing degraded performance, we cannot perform a similar demonstration with commercial CDNs without launching a DoS attack against the affected content provider. We then considered trying to degrade the performance of the content provider "just a bit", but realized that such degradation would either be lost in the noise or be noticeable. If the degradation appears as noise, our demonstration would

be inconclusive. On the other hand, noticeable degradation implies a DoS attack unless the content provider consented to our experiment. Although we cannot safely replicate our Coral attack on commercial CDNs, we conclusively showed that an attacker could make the origin site consume almost 1 Mbps of its bandwidth (i.e., transmit a file of roughly 50K in at most 0.5 s – see Section 5.2.5) at the expense of negligible bandwidth of its own. Simply replicating this action, using different random strings and different edge servers, would allow the attacker to saturate the content provider's bandwidth or other resources. In theory, one could imagine a CDN using some clever data mining to detect and block an attacker that replicates these actions. However, such data mining would be challenging and at best only provide partial protection. Indeed:

• It cannot protect against a burst attack. Because the attack consumes very little resources on the attacking host, the attacker can send a large number of requests to a large number of edge servers almost instantaneously. As we saw in Section 5.3.3, because of queuing of pending requests, a single burst can affect the content provider for a long time.

• A CDN cannot perform this data mining at individual edge servers or even data centers because each server will only see a very low request rate from the attacker. For example, to saturate a T3 line, the attacker needs to send only 45 requests per second (fewer if an object larger than 50K were used in the attack). Assuming a CDN with 500 locations, this translates into less than one request per ten seconds to each data center. Thus, the data mining by a CDN has to be centralized.

• Performing centralized data mining over global request rates requires transferring large amounts of data, in real time, to the central location. Although CDNs do provide global usage reports to their customers, detecting our attack

requires data at the fine granularity of individual clients' requests to individual URLs. As an example, Akamai's EdgeSuite service provides usage reports only at 1-minute granularity and with aggregated information, such as the numbers of clients accessing various Akamai locations and their overall request rates to the subscriber's content. The timeliness with which they can "drill down" to individual clients and URLs is unclear.

• Even if real-time centralized data mining were possible, the attacker can further complicate the detection by using a botnet and/or employing multiple objects in the attack.

In summary, while data mining detection of a sustained attack is theoretically possible, we believe (a) a far better protection is to prevent amplified malicious requests and/or provide enough data to subscribers to let them perform their own site-specific detection (see Section 5.5), and (b) content delivery networks and their subscribers must be aware of this dangerous attack regardless, to make sure they are protected.

5.4 Implication for CDN Security

Although this study focuses on the threat to CDN customers, the vulnerabilities we describe also pose security issues for the CDN itself. We demonstrated in Section 5.2.3 that edge servers view each URL with an appended random string as a unique URL and cache it independently. Thus, by requesting an object with multiple random strings, the attacker can consume cache space multiple times. Furthermore, by overriding the CDN's edge server selection (Section 5.2.2), the attacker can employ a botnet both to target strategically selected edge servers and to complicate the detection. Constructing its requests from several base URLs can further complicate the detection of this attack.

In principle, the attacker can attempt to pollute the CDN cache even without the random strings, simply by requesting a large number of distinct CDN-accelerated URLs. However, unlike forward caches, edge servers accelerate only a well-defined set of content belonging to their customers, limiting the degree of cache pollution that could be done with legitimate URLs. The random string vulnerability removes this limit. A detailed evaluation of this attack is complicated and is outside the scope of this study. We only note that the countermeasure described in Section 5.5.1 protects against this threat as well.

5.5 Mitigation

The described attack involves several vulnerabilities, and different defensive measures can address different vulnerabilities. In this section, we describe a range of measures that content providers and CDNs can take to prevent or mitigate our attack. However, we view our most important contribution to be in identifying the attack. Even the simple heuristic of dropping URLs in which query strings follow file extensions that indicate static files, such as ".html", ".gif", or ".pdf", would go a long way towards reducing the applicability of our attack. Indeed, these URLs should not require query strings.

5.5.1 Defense by Content Provider

Our attack crucially relies on the random string vulnerability, which allows the attacker to penetrate the protective shield of edge servers and reach the origin. Content providers can effectively protect themselves against this vulnerability by changing the setup of their CDN service as described below. We will also see that some types of CDN services are not amenable to this change; in these cases, the content provider

cannot protect itself unilaterally and must either forgo these services or rely on the CDN's mitigation described in the next subsection. To protect against the random string vulnerability, a content provider can set up its CDN service so that only URLs without argument strings are accelerated by the CDN. Then, it can configure the origin server to always return an error to any request from an edge server that contains an argument string. Returning the static error message is done from main memory and consumes few resources from either the server or the network. In fact, some CDNs let customers control how their URLs are processed by edge servers. In particular, Akamai allows a customer to specify URL patterns to be dropped or ignored [51]. The content provider could use this feature to configure edge servers to drop any requests with argument strings, thus eliminating our attack entirely. The only exception could be query strings with a small fixed set of legitimate values, which could be enumerated at the edge servers. We refer to this approach of setting up a CDN service as "no-strings-attached". The details of how no-strings-attached could be implemented depend on the individual Web site. To illustrate the general idea, consider a Web site, foo.com, that has some dynamic URLs that do require seemingly random parameters. A possible setup involves concentrating the objects whose delivery is outsourced to the CDN in one sub-domain, say, outsourced.foo.com, and objects requiring argument strings in another, such as self.foo.com. Referring back to Figure 1.1, foo.com's DNS server would return a CNAME record pointing to the CDN network only for queries for the former hostname and respond directly with the origin's IP address to queries for the latter hostname. Note that the no-strings-attached approach stipulates a so-called "origin-first" CDN setup [66] and eliminates the option of the popular "CDN-first" setup. Thus, the no-strings-attached approach clearly limits the flexibility of the CDN setup but allows content providers to implement a definitive protection against our attack.
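To make the origin-side half of this setup concrete, the fragment below is a minimal, hypothetical sketch (Python/WSGI is used purely for illustration; the address prefix used to recognize edge-server requests and the wrapping itself are assumptions, not part of any deployment we tested). It rejects, with a cheap static error, any request that arrives from an edge server and carries a non-empty query string.

    # Sketch of the "no-strings-attached" origin-side check (illustrative only).
    EDGE_SERVER_PREFIXES = ("192.0.2.",)   # placeholder for the CDN's address ranges

    def no_strings_attached(app):
        """Wrap a WSGI app; reject edge-server requests carrying query strings."""
        def wrapper(environ, start_response):
            from_edge = environ.get("REMOTE_ADDR", "").startswith(EDGE_SERVER_PREFIXES)
            if from_edge and environ.get("QUERY_STRING"):
                # Static error served from memory; negligible origin resources.
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Query strings are not accepted on CDN-accelerated URLs.\n"]
            return app(environ, start_response)
        return wrapper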

5.5.2 Mitigation by CDN

Although the no-strings-attached approach protects against our attack, it limits the flexibility of a CDN setup. Moreover, some CDN services are not amenable to the no-strings-attached approach. For example, Akamai offers content providers an edge-side includes (ESI) service, which assembles HTTP responses at the edge servers from dynamic and static fragments [29]. ESI reduces bandwidth consumption at the origin servers, which transmit to edge servers only the dynamic fragments rather than entire responses. However, requests for these objects usually do contain parameters, and thus no-strings-attached does not apply. In the absence of no-strings-attached, a CDN can take the following steps to mitigate our attack. To prevent the attacker from hiding behind the CDN, the edge server can pass the client's IP address to the origin server any time it forwards a request to the origin. This can be done by adding an optional HTTP header to the request. This information facilitates the identification of, and refusal of service to, attacking hosts at the origin server. Of course, the attacker can still attempt to hide by coming through its own intermediaries, such as a botnet or public Web proxies. However, our suggestion removes the additional CDN-facilitated means of hiding. Coral CDN already provides this information in its x-codemux-client header. We believe every CDN must follow this practice. Further, the CDN can prevent being used for attack amplification by throttling its file transfer from the origin server depending on the progress of its own file transfer to the client. At the very least, the edge servers can adopt so-called abort forwarding [32], that is, stop the file download from the origin whenever the client closes its connection. This would prevent the most aggressive attack amplification we demonstrated in this study, although it would still allow the attacker to achieve significant amplification by slowing down its transfer. More elaborate connection throttling is not such a clear-cut recommendation at this point. On one hand, it would minimize

the attack amplification with respect to bandwidth consumption. On the other hand, it would tie up other server resources (e.g., server memory, a process or thread, etc.) for the duration of the download and delay the availability of the file to future requests. We leave a full investigation of the implications of connection throttling for future work.
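As an illustration of the first suggestion above, the sketch below shows how an origin site might act on the client-IP header forwarded by the edge server. The header name x-codemux-client is the one Coral uses; the blocklist and the Python/WSGI wrapping are hypothetical and shown only to make the idea concrete.

    BLOCKED_CLIENTS = {"203.0.113.7"}   # hypothetical list of attacking hosts

    def refuse_blocked_clients(app):
        """Reject forwarded requests whose reported end-client is blocklisted."""
        def wrapper(environ, start_response):
            # Coral reports the end client in x-codemux-client; other CDNs
            # could expose the same information in a similar optional header.
            client = environ.get("HTTP_X_CODEMUX_CLIENT", "")
            if client in BLOCKED_CLIENTS:
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Forbidden.\n"]
            return app(environ, start_response)
        return wrapper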

5.6 Conclusion

This study describes a denial of service attack against Web sites that utilize a content delivery network (CDN). We show that not only may a CDN fail to protect its subscribers from a DoS attack, but it can actually be recruited to amplify the attack. We demonstrate this attack by using the Coral CDN to attack our own web site with an order of magnitude attack amplification. While we could not replicate this experiment on commercial CDNs without launching an actual attack, we showed that two leading commercial CDNs, Akamai and Limelight, both exhibit all the vulnerabilities required for this attack. In particular, we showed how an attacker can (a) send a request to an arbitrary edge server within the CDN platform, overriding the CDN's server selection, (b) penetrate CDN caching to reach the origin site with each request, and (c) use an edge server to consume the full bandwidth required for processing a request from the origin site while expending hardly any bandwidth of its own. We describe practical steps that CDNs and their subscribers can employ to protect against our attack. Content delivery networks play a critical role in the modern Web infrastructure. The number of CDN vendors is growing rapidly, with most of them being young firms. We hope that our work will be helpful to these CDNs and their subscribers in avoiding a serious security pitfall.

5.7 Acknowledgement

This work first appeared in [83]. We thank Mark Allman for an interesting discussion of the ideas presented here. He in particular pointed out the cache pollution implication of our attack.

Figure 5.3: The effects of a sustained DoS attack. (a) Traffic on the Web server; (b) traffic on the attacker; (c) the Web server performance observed by the end user. Each panel plots traffic or download speed (Byte/s, log scale) against time (seconds).

Figure 5.4: The effects of a burst DoS attack. (a) Traffic at the Web server; (b) traffic at the attacker host; (c) the Web server performance observed by the end user. Each panel plots traffic or download speed (Byte/s, log scale) against time (seconds).

Chapter 6

Client-Centric Content Delivery Network

Content delivery networks (CDNs) are responsible for a significant portion of today's web traffic. Improving the content delivery performance of CDNs would benefit Internet users as a whole. We propose a scalable client-centric approach to improve the content delivery performance of a CDN with minimal alteration of the current CDN platform. We then quantify the improvement gained by our approach on actual web usage from a large organizational network, comprising 8.2 million HTTP downloads from over 21,000 clients. Our simulation shows that, in the best case scenario, our approach can increase TCP connection utilization by two orders of magnitude and save 51 minutes of aggregate DNS resolution time during the 2.5-hour trace period.1 Our replay experiment shows that our approach can bring around 22%-36% overall performance improvement for individual HTTP downloads. Therefore, this approach would be worthwhile for CDNs seeking to improve their performance. Finally, we discuss steps required for the realization of our approach in practice, and benefits

1To be clear, the reduction includes cases where we shrink the DNS resolution time of HTTP downloads that were running in parallel. Therefore, the 51 minutes of saved time do not translate directly into a reduction of user-visible delays over the 2.5-hour trace period.

of our approach over higher-latency networks such as cellular networks.

6.1 Introduction

Content delivery networks have become an integral part of the Internet infrastructure. Just the leading CDN provider – Akamai – claims to be delivering 20% of all Web traffic [9], and there are dozens of other CDNs as well. CDNs are content-provider centric: they provide service to subscribing content providers and they answer to their subscribers regarding the performance of the outsourced content delivery. In particular, as part of this business relationship, the CDN and a subscribing content provider agree on the (sub)set of edge servers and locations to be used for delivering their content. Because different subscribers may be assigned to different edge servers, users from the same organizational network, or even the same user, may be directed to different edge servers for different downloads. This work explores the potential performance benefits of a somewhat different operation model, where instead of allocating edge servers to groups of content providers, the CDN "assigns" edge servers to (groups of) organizational or metropolitan networks to which the users – the consumers of the content – belong. This brings the usage of an edge server closer to that of a client-side forward proxy, except that the edge server is still utilized only for the content from subscribers and is physically part of the CDN platform. In other words, users from a given organizational network would be directed to the same edge server to download content from all the CDN subscribers, while at the same time the content-provider-centric nature of the CDN does not change. Our idea was inspired by an observation from our previous work (Chapters 4 and 5) that a client can already download any CDN-outsourced object from any edge server, whether or not this edge server belongs to the server set assigned to the object's

content provider, and that this edge server will cache the object for future use [83]. We further observe in this work that a server that provides good download performance to a given client is likely to remain a good choice for an extended period of time. These features suggest a number of ways to potentially optimize a client's performance: (a) the client could download content from the best-performing edge server whether or not this server is assigned to the content provider; (b) by going to the same server regardless of the content being accessed, the client has a better chance to reuse an existing TCP connection and avoid the TCP connection start-up overhead; and (c) "binding" to the same server as long as it provides good performance may remove much of the overhead of frequent DNS queries. This chapter presents our study to quantify these potential benefits and sketches architectural approaches that could realize them. Our approach is related to several past ideas that blurred the boundaries between forward proxies and CDNs, such as the Content Bridge alliance [21] and an Akamai patent disclosing a possibility of deploying an edge server inside an enterprise Intranet [57]. Unlike these efforts, we propose to improve the download performance of CDN-accelerated content simply by changing the way edge servers are selected. Our contributions are in (a) quantifying the potential for performance optimization from user-centric edge server allocation, and (b) describing the changes needed at the organizational network to enact our approach and the steps to realize this approach in the wild.

6.2 Architectural Approaches

CDN subscribers outsource their content delivery to the CDN by replacing hostnames in their content URLs with hostnames belonging to the CDN domain. (We refer to these hostnames as “CDN-outsourced” hostnames.) Then, user DNS queries to

resolve the outsourced hostnames arrive at the CDN's authoritative DNS servers, which select the appropriate edge servers for each query and return their IP addresses to the users. Our basic approach is for an organizational network to direct, most of the time, all its users to the same edge server for all content. Specifically, there are two variations of our approach, which we describe using Akamai, the leading CDN provider, as an example. In one variant, the organization would modify its local DNS server (LDNS) to monitor a set of Akamai edge servers, which the LDNS can do, e.g., by periodic DNS queries for Akamai-outsourced hostnames. When processing DNS queries from the users, the LDNS would recognize queries for Akamai-outsourced hostnames and, instead of forwarding these queries in the usual way, simply respond with the IP address of an edge server of its own choosing, regardless of the specific hostname being resolved (a sketch of this logic appears at the end of this section). Normally the same edge server would be returned for all queries unless its performance degrades or it becomes unavailable (both of which could be detected and reported to the LDNS by users' browsers). In our study, we chose to evaluate this variant to show the feasibility and benefit of our client-centric solution to improving the CDN's performance. In another variant, the organization would deploy a specialized HTTP proxy that accepts user requests for all Akamai-accelerated content and downloads this content from a self-selected edge server (again, forgoing proper DNS resolution). In fact, this proxy could even cache the objects locally just as regular forward proxies do, although it is easier to avoid any potential bottlenecks by simply splicing TCP connections between the user and the edge server and forwarding datagrams between both sides at line speed. Both variants virtually eliminate DNS resolution overhead for outsourced hostnames. Both variants increase persistent HTTP connection utilization, although the

second approach does so to a higher degree because it can multiplex all user requests into the same TCP connection to the edge server. On the other hand, the first variant only requires a modified LDNS server and no extra network components, while the second needs a specialized HTTP server, and care to avoid making it another bottleneck. There are two further ways to realize either flavor of our approach. One alternative is for the organizational network to simply override the CDN's edge server selection with its own mechanism, and we verified that existing CDNs allow such user-imposed edge server selection [83]. However, CDNs can fairly easily implement mechanisms to block the selection of an unintended edge server. Instead, the other alternative is for the CDN itself to embrace this approach and make the necessary software modules2 available to organizations. Installing these modules would be beneficial for both the organizations and the CDN: the organization's users will experience better download performance when accessing content from the CDN's subscribers, and the CDN will deliver better performance to its subscribing content providers.
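The following is a minimal sketch, in Python, of the decision logic behind the LDNS-based variant referred to above. It is not tied to any particular DNS server implementation; the pinned edge server address, the list of outsourced-hostname suffixes (in reality the LDNS would recognize outsourced names, e.g., via the CNAME chain), and the re-probe interval are assumptions made purely for illustration.

    import time

    # Hypothetical configuration for the modified LDNS (all values are placeholders).
    OUTSOURCED_SUFFIXES = (".akamai.net", ".akamaiedge.net", ".edgesuite.net")
    REPROBE_INTERVAL = 2.5 * 3600          # re-evaluate the pinned server every 2.5 hours (see Section 6.3)
    pinned_edge_ip = "192.0.2.10"          # edge server chosen and monitored by the LDNS
    last_probe = time.time()

    def resolve(qname, forward_query, probe_for_best_edge):
        """Answer a client's A query.

        forward_query(qname)  -- the LDNS's normal recursive resolution path
        probe_for_best_edge() -- periodic measurement that picks a well-performing edge server
        """
        global pinned_edge_ip, last_probe
        if not qname.rstrip(".").endswith(OUTSOURCED_SUFFIXES):
            return forward_query(qname)             # non-CDN names resolve as usual
        if time.time() - last_probe > REPROBE_INTERVAL:
            pinned_edge_ip = probe_for_best_edge()  # infrequent re-selection
            last_probe = time.time()
        return pinned_edge_ip                       # same edge server for all outsourced names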

6.3 The Effect of Infrequent Server Selection

The side effect of reusing the DNS query answer and overriding the CDN's server selection algorithm is that the client loses the ability to take advantage of current network conditions. Therefore, client performance could suffer in some situations, such as traffic congestion, a routing change, or, in the worst case, a server failure. However, in our experiment, we reused the DNS query answer across many downloads for more than two hours and never found this side effect to reduce the overall improvement. To mitigate this side effect, a client could rebuild the lookup table periodically. According to the results discussed in this section, our client

2These modules would include a browser plug-in to detect and report to the LDNS edge server performance degradation or unavailability, and an appropriately modified LDNS to act upon these notifications and to periodically probe the CDN for a better edge server.

could redo the domain name resolution once every 2.5 hours. As a result, the client executes the resolution only around 10 times a day regardless of how many downloads are performed. We leave a thorough investigation of the optimal reuse duration for practical use in the real world as future work.

Figure 6.1: Download performance ratio of the reused edge server over the Akamai-selected edge server throughput (CDF over measuring points, one curve per trial, Trials 1-6).

In this section, we consider how well an edge server keeps its performance level against the fluctuations of the Internet over an extended period of 2.5 hours. To answer this question, we conduct an experiment on the DipZoom measurement platform as follows. First, we choose a 47,727-byte object3 from buy.com that buy.com outsourced to be cached by Akamai. At the beginning, we ask 260 DipZoom measuring points (MPs) world-wide to perform domain name resolution of "ak.buy.com" to obtain the Akamai-selected edge server IP address for each MP. We then store these MP-to-edge-server mappings in a lookup table for later use in this experiment. For the first trial, we send two measurement requests to the same set of MPs. The first measurement request, referred to as the "Edge-Reuse Download", asks each MP to request the buy.com object directly from the edge server IP address without

3http://ak.buy.com/db_assets/large_images/093/207502093.jpg

spending time on a DNS query. The MP looks up the corresponding target edge server IP address in the lookup table. The second measurement request, referred to as the "Normal Download", asks the MP to perform a download directly from the edge server selected by Akamai at the time of the download. The latter is obtained by a normal DNS query to Akamai just prior to the download. We make sure that the requested content is cached on the edge server prior to both downloads to avoid potential cache misses skewing our results. Then, we wait for 30 minutes before repeating the trial, and do so for up to 6 trials. We use the following performance ratio to compare the download performance of the Edge-Reuse and Normal downloads:

Performance ratio = Edge-Reuse Download Speed / Normal Download Speed

Then, we compute cumulative distribution function (CDF) curves of all ratios for each trial and present them in Figure 6.1. For a given point X on the X-axis, the corresponding point Y on the Y-axis shows the fraction of MPs with a download performance ratio less than or equal to X. A ratio larger than 1 indicates that the Edge-Reuse download outperforms the Normal download. The experimental results show that the Edge-Reuse download outperforms the Normal download around 50% of the time, and vice versa. The average (across all MPs) ratios of trials 1 to 6 are 1.12, 1.22, 1.16, 1.16, 1.26, and 1.12 respectively. Not only does this show that the Edge-Reuse and Normal download performance are similar, since the ratios cluster around 1, it also shows that content delivery performance from the same edge server is consistent over the extended period of 2.5 hours. More specifically, to verify the time-invariant behavior of these 6 trials visually apparent in Figure 6.1, we calculate the correlation coefficients of all trial pairs and present them in Table 6.1. All the numbers on the diagonal are 1 because they are auto-correlations, and the rest are cross-correlation coefficients of all pairs.

  trial      1        2        3        4        5        6
    1     1.0000   0.9902   0.9961   0.9936   0.9822   0.9960
    2              1.0000   0.9852   0.9847   0.9742   0.9930
    3                       1.0000   0.9960   0.9903   0.9868
    4                                1.0000   0.9961   0.9874
    5                                         1.0000   0.9726
    6                                                  1.0000

Table 6.1: Pearson correlation coefficients of all trial pairs.

Since all the correlation coefficients are close to 1, this confirms that the 6 trials of downloads within the 2.5-hour period are highly correlated. It suggests that re-evaluating network conditions every 20 seconds through repeated DNS resolutions, as Akamai does, might not be the only option.
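For reproducibility, the per-trial averages and the correlation matrix of Table 6.1 amount to a few array operations; the sketch below assumes the per-MP ratios are available as one row per trial (the data shown is illustrative, not our measurements).

    import numpy as np

    # ratios[t][m]: Edge-Reuse speed / Normal speed for trial t, measuring point m.
    # Illustrative example with 2 trials and 4 MPs; the experiment had 6 trials over ~260 MPs.
    ratios = np.array([
        [1.05, 0.93, 1.40, 1.10],
        [1.12, 0.95, 1.35, 1.20],
    ])

    mean_per_trial = ratios.mean(axis=1)   # per-trial average ratios (cf. 1.12, 1.22, ...)
    corr = np.corrcoef(ratios)             # Pearson correlation between trial pairs (cf. Table 6.1)
    print(mean_per_trial, corr)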

6.4 Performance Improvement

Our approach is a simple yet elegant solution to help improve the delivery performance of a CDN provider. According to the results in Section 6.3, we can reuse the domain name resolution result for an extended period of time to reduce the resolution time overhead per request. Similarly, an existing persistent TCP connection can be reused across requests for multiple objects to amortize TCP startup costs over a larger number of requests. Reducing these two overheads results in better responsiveness from the client perspective. The download performance is enhanced by several factors. First, fewer TCP connections for the same number of requests can lead to less time spent in slow-start periods, in which throughput is sub-optimal. Second, when a TCP connection is idle for more than the re-transmission timeout, the congestion window becomes invalid and TCP re-adjusts its congestion window value. How TCP adjusts its congestion window varies from implementation to

implementation. Throughout this thesis, we use the term "congestion window timeout" to refer to an idle period that leads to invalidation of the congestion window. In general, TCP tends to set the window to some smaller value; in the extreme case, TCP could go back to slow-start after the congestion window timeout. Therefore, a subsequent request after this timeout would suffer from this performance factor. Consolidating multiple overlapping connections to different edge servers into a single connection can reduce congestion window timeout events. Avoiding these sub-optimal throughput conditions promises better content delivery performance. To demonstrate the content delivery performance gains from adopting our approach, we consider potential performance benefits for clients using the example of the CWRU campus network and the Akamai CDN. In the demonstration, we show the performance improvement of the DNS-based variant of our approach over the current operation of Akamai on the CWRU campus network. Throughout this study, unless stated otherwise, we refer to the "DNS-based variant of our approach" as "our approach" and the "current operation of Akamai on the CWRU campus network" as the "current approach". We conduct two studies to quantify the benefits of our approach. First, we simulate how much overhead can be reduced by adopting our approach; this study compares our and the current approaches at the scale of the entire campus network. Next, we replay transactions from randomly selected clients on the CWRU campus using the current and our approaches and then compare the results; this is an end-to-end verification of the gains from adopting our approach.

6.4.1 Data Set

This study makes use of two data sets: an HTTP packet trace collected at line speed from the gateway of the CWRU campus network, and DNS query measurements we collected during this study as described later.

6.4.1.1 HTTP Traffic

We use an anonymized packet trace of all HTTP traffic entering or leaving the Case campus network for our study. The trace spans 2.5 hours during the high traffic period (in the afternoon) of September 14, 2010 and comprises over 8.2 million HTTP downloads from 21,616 clients. Akamai-accelerated content was responsible for over 28.22% (2.3 million) of all HTTP requests, confirming the important role Akamai plays in the modern Internet infrastructure and in line with their above-mentioned 20% claim, for our client population at the time of the trace. The requests in the trace occur over 4 million TCP connections, of which 0.78 million were to Akamai servers. Today, Akamai already deploys edge servers locally within some large networks, and the CWRU campus network is one of them. We refer to those edge servers as "local edge servers". While the trace contains external clients accessing Case Web servers, the majority of Akamai traffic is from internal users because Case web sites do not use Akamai. We extracted the Akamai traffic pattern of Case local clients from this trace and used it to evaluate our approach. The Akamai traffic from our trace has the following characteristics. There are 5,725 Case clients that generate 2.3 million HTTP requests to Akamai edge servers over 0.78 million TCP connections. Of these, 1.6 million requests, with a total content size of 51 GB, go to local Akamai edge servers over 0.52 million TCP connections, and 0.7 million requests, with a total content size of 35 GB, go to off-campus Akamai edge servers over 0.26 million TCP connections. Figure 6.2 illustrates the distributions of Akamai-cached object requests and connections to Akamai edge servers on a per-client basis. We are interested in these two distributions for the following reasons. (a) A large number of TCP connections per client suggests possible room for improvement using our approach: a client could consolidate its connections to benefit from higher utilization and lower overhead per request. (b) A large number of requests per client implies a smaller chance of an

inter-request time gap bigger than the HTTP keep-alive timeout. Such a gap causes the client to reopen another connection, hence decreasing connection utilization and increasing overhead. From the graph, around half of the client population sends at least 100 HTTP requests and opens at least 36 TCP connections. In other words, the median utilization is 100/36 = 2.78 HTTP requests per connection. The simulation of our approach shows that it increases the average utilization of the clients to 273 HTTP requests per connection. Therefore, a significant number of clients could potentially benefit from our approach. Section 6.4.2 discusses this issue in detail.

Figure 6.2: Distribution of requests/connections to Akamai edge servers per client (CDF, P(X<=x) vs. number of HTTP requests or TCP connections per client).

6.4.1.2 DNS Queries

We are interested in the DNS query time for Akamai-outsourced domains from the perspective of clients on the CWRU campus network, especially in the case of misses in the local DNS cache. Since our trace contains no DNS transactions, we conduct an additional experiment to collect the query answers as follows. First, we compile a list of 5,218 Akamai domain names from our HTTP traffic trace. Then, from a machine on the CWRU campus network, we perform a resolution of every domain name on the list and collect the query time and the DNS response (including A and CNAME records and

their TTL). Thereafter, we use this DNS query dataset to assess the average DNS query time of Akamai domains in the cache-miss scenario. We extract only the answers whose TTL is 20 seconds. Since the TTL of an Akamai query answer is 20 seconds, the query times of these extracted records reflect the resolution time from Akamai's authoritative DNS system and also include the local DNS cache-miss penalty. After this filtering, we have 4,092 query times. Next, we weight each query time by how often its corresponding domain appears in the HTTP traffic trace. As a result, the average DNS query time in the cache-miss scenario is 22.38 milliseconds. We did this probing within a month of the trace collection.
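The weighting step can be written down directly; the sketch below assumes the filtered cache-miss query times and the per-domain request counts from the trace are held in dictionaries (the variable and function names are illustrative only).

    # query_time_ms[domain]: cache-miss resolution time for that Akamai domain (TTL == 20 s answers)
    # request_count[domain]: how often the domain appears in the HTTP trace
    def weighted_mean_query_time(query_time_ms, request_count):
        total_weight = sum(request_count.get(d, 0) for d in query_time_ms)
        weighted_sum = sum(t * request_count.get(d, 0) for d, t in query_time_ms.items())
        return weighted_sum / total_weight if total_weight else 0.0

    # With our data this evaluates to roughly 22.38 ms.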

6.4.2 The Improvement Simulation

In this section, we simulate the situation where all the Akamai traffic from our data set takes place in a network that adopts our DNS-based approach. Then, we calculate the overhead reduction and TCP connection utilization gain in comparison with what happens in the actual traffic on the CWRU campus network today.

6.4.2.1 The Simulation Method

We use the simulation to compare our approach and the current approach in two aspects: DNS overhead and TCP connection utilization. This section explains the details of the simulation. We begin with a discussion of DNS and its overhead. In our approach, each client incurs only a single DNS resolution to get an Akamai-selected server over the entire trace period and tries to maximize the utilization of the established TCP connection. The performance penalty of this infrequent server selection over the entire trace period was shown to be negligible in Section 6.3. Therefore, in the best case scenario, it suffices to perform only a single DNS resolution with Akamai's

authoritative DNS server and reuse the answer for the entire trace period instead of issuing repeated DNS queries as in the current practice. As already mentioned, our traffic trace contains only HTTP traffic and no DNS traffic. Therefore, we use the simulation to calculate the DNS overhead in the current practice. Since the TTL of an Akamai DNS query answer is 20 seconds, the first query arriving more than 20 seconds after the previous one causes the local DNS to re-obtain a new query result from the authoritative DNS. We simulate this effect at the local DNS level of the campus network as follows. First, we sort all the Akamai-cached object requests to a given target domain name across all clients by timestamp. The timestamp of the first request is selected as a reference point. Assuming that this reference DNS query causes a cache miss at the local DNS server, the cache-miss count is increased by 1. All the requests within 20 seconds after this reference point are ignored because we assume that in this period the local DNS answers all queries for this domain name from its own cache. Then, the first request after these 20 seconds is selected as a new reference point; we increase the cache-miss counter by one and progress in time for another 20 seconds in a similar fashion. This process continues until the last request, and we repeat it for each of the 5,218 Akamai-outsourced domain names. On the high-speed campus network, we assume that the local DNS spends a very small amount of time answering a query from its own cache. Therefore, for simplicity and to favor the current practice, the local query time is excluded from the simulation. In practice, web browsers or operating systems might have their own heuristics that cache DNS query answers longer than the assigned TTLs. Therefore, our simulation could be viewed as an upper bound of the DNS query overhead in the current practice.
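In code, the cache-miss count for a single domain reduces to one pass over its sorted request timestamps; the sketch below is a straightforward, hypothetical rendering of the procedure just described (timestamps in seconds, 20-second TTL).

    def count_ldns_cache_misses(timestamps, ttl=20.0):
        """Count local-DNS cache misses for one Akamai domain.

        timestamps: request times (seconds) for this domain across all clients, any order.
        Each miss opens a ttl-long window during which further requests hit the cache.
        """
        misses = 0
        window_end = float("-inf")
        for t in sorted(timestamps):
            if t >= window_end:        # first request after the cached answer expired
                misses += 1
                window_end = t + ttl
            # requests inside the window are answered from the LDNS cache
        return misses

    # total_misses = sum(count_ldns_cache_misses(ts) for ts in per_domain_timestamps.values())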

For the TCP utilization, since we already have the HTTP traffic trace, we can simply analyze the trace to obtain the TCP connection utilization in the current practice. The utilization in our approach is simulated as follows. Since Akamai sets the TCP keep-alive to 500 seconds [85], a set of requests whose inter-request gaps do not exceed 500 seconds, regardless of which connections they belong to in the trace, utilizes the same TCP connection in our simulation. Once a larger gap is found, the TCP connection is considered terminated and another connection is opened. We will show in the next section that our approach not only significantly reduces the amount of time spent on TCP handshakes, but also increases the utilization by two orders of magnitude.
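The connection count under our approach follows similarly from one pass over a client's request timestamps with the 500-second keep-alive threshold; a sketch, under the simplifying assumption that all of a client's Akamai requests share one connection unless such a gap occurs:

    def simulate_connections(request_times, keepalive=500.0):
        """Return (connections, requests_per_connection) for one client under our approach."""
        times = sorted(request_times)
        if not times:
            return 0, 0.0
        connections = 1
        for prev, cur in zip(times, times[1:]):
            if cur - prev > keepalive:   # idle longer than the keep-alive: the connection was closed
                connections += 1
        return connections, len(times) / connections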

6.4.2.2 Performance Improvement Result

                                               Current     Client-Centric
  Number of TCP connections                    781,717     9,436
  Total TCP handshake time (seconds)           11,130      134.4
  Utilization (average requests/connection)    3.44        273.48
  Local DNS cache misses                       138,237     1
  Total DNS query time (seconds)               3,093.74    0.022

Table 6.2: Performance of the client-centric CDN vs the current practice (best case scenario)


Figure 6.3: Distribution of time gap between consecutive Akamai requests from the same client


Figure 6.4: Utilization (requests/connection) in both approaches

Here we discuss and compare the simulation results of our approach with the actual situation from the collected trace. Table 6.2 compares different aspects of both approaches in the best case scenario, when only one DNS query is sent to the authoritative DNS server. While the current approach opens 495,213 TCP connections to handle all the Akamai requests on the CWRU campus over 2.5 hours, our approach would need only 9,436 connections to handle the same demand. Figure 6.3 shows the cumulative distribution function of the inter-request time gap of requests across the Akamai-accelerated content of all customers combined. Most of the gaps are quite small: the gap is wider than 1 second only 15% of the time and wider than 500 seconds only 0.6% of the time. Accordingly, clients are unlikely to reopen many TCP connections, which leads to higher connection utilization. Unlike the current approach, our approach can multiplex requests for the content of several Akamai customers into a single connection and can therefore take advantage of this higher utilization. Since the average TCP handshake time in the trace for connections to Akamai edge servers is 14.24 milliseconds, clients in a network that adopts our approach would spend only 2.24 minutes in total on TCP handshakes, instead of over 3 hours in the current setup, where requests for content from different content providers have to go through different connections.

As a result, clients can utilize a single connection for 273.48 Akamai object requests on average, compared to only approximately 3.44 requests in the current approach. This is an improvement of two orders of magnitude. Figure 6.4 is a CDF graph of per-client connection utilization for both approaches. It shows that current clients utilize the same connection for more than 10 requests only 3.5% of the time, whereas in our approach about 72% of connections are utilized by more than 10 requests. With the higher utilization, not only can we amortize the TCP handshake overhead across a larger number of requests, but we can also reduce the frequency of slow-start and congestion window timeout events. While this simulation doesn't capture slow-start and congestion window timeout effects, the replay experiment in the next section takes them into account. In particular, the CWRU network assigns private IP addresses to all wireless clients and funnels them through a single wireless gateway. Therefore, in our trace, we see traffic from all wireless clients as coming from a single client, the wireless gateway, and in our simulation this gateway is seen as a single client node. Since the gateway handles simultaneous transmissions of many wireless clients, the chances of a large inter-request time gap are low. The simulation result confirms that this special client would use only a single TCP connection to handle 0.77 million requests (out of 2.3 million) in our approach. Therefore, the simulation overestimates the improvement gain for this special case. At the same time, this gateway could act as a TCP proxy, in effect implementing the forward-proxy variant of our approach for the wireless clients. The improvement in DNS query overhead is significant as well. In our approach, the local DNS forwards a query for an Akamai-outsourced domain name to the authoritative DNS only once to get the Akamai-selected servers, and then reuses this answer for the entire trace period. According to our simulation, the local DNS in the current approach could experience up to 138,237 cache misses for Akamai domain name

queries during the 2.5-hour trace period. Multiplied by the 22.38 ms average DNS query time from Section 6.4.1.2, this means the current approach spends over 51 more minutes on DNS resolutions than our approach (138,237 × 22.38 ms ≈ 3,094 seconds ≈ 51.6 minutes, matching Table 6.2). As a quick recap, the simulation results show that the TCP handshake and DNS query time overheads can be significantly reduced by adopting our approach. Moreover, it allows clients to improve the utilization of TCP connections. The higher utilization also has some effect on throughput; although the simulation doesn't take these effects into consideration, the replay experiment in the next section does.

6.4.3 The Replay Experiment

The simulation in Section 6.4.2 investigates the benefit of our approach at the scale of a large campus network; essentially, it captures how much the overheads can be reduced. However, the impact of higher utilization on TCP throughput remains unaccounted for. The replay experiment in this section aims to capture this effect, as well as all the other benefits introduced by our approach, from the clients' perspective.

6.4.3.1 The Replay Method

We perform an end-to-end comparison of the performance of the current approach and our approach by replaying actual Akamai traffic from the trace under both approaches and observing the implications in various aspects. This replay method tries to capture four effects impacted by our approach: DNS query, TCP handshake, TCP slow-start, and TCP re-transmission timeout. Since it is infeasible to replay the entire traffic from our trace, we randomly select 30 clients accessing Akamai-accelerated sites from our trace and extract the Akamai traffic generated by these clients. This subset of the Akamai trace represents 4,717 HTTP requests over 1,852 TCP

connections and constitutes 549 MB of content; there are 3,186 requests to local Akamai edge servers over 1,251 TCP connections and 1,531 requests to off-campus Akamai edge servers over 601 TCP connections; the internal and external requests constitute 533 MB and 16 MB of Akamai content respectively. We use this extracted set of Akamai traffic throughout Sections 6.4.3.1 and 6.4.3.2. We evaluate our approach when used both by an organization that has local Akamai servers deployed and by an organization that does not. To this end, we conduct two experiments to measure the performance gains from adopting our approach. In each experiment, we compare the time spent downloading the same amount of content under our approach and under the current approach. Each experiment performs and compares two replays, our-replay and current-replay. Our-replay refers to the replay of the extracted trace in a network that adopts our DNS-based approach; current-replay refers to the replay in a network using the current Akamai practice. For simplicity, our replay program does not support HTTP pipelining; therefore, when we replay a set of HTTP requests, the program replays one HTTP request and waits until the corresponding content retrieval is done before proceeding to the next request. Thus, two overlapping downloads are serialized as two successive back-to-back requests in the order of their starting timestamps in the trace. At the same time, consecutive non-overlapping downloads in the trace are replayed while preserving the original inter-request intervals in the trace. We developed our replay software in ANSI C with the libcurl [23] library. In our-replay, for each client, we first perform the domain name resolution for the first Akamai-accelerated URL and measure the time to get the IP address of the Akamai-selected edge server. Then, we open a TCP connection to our designated edge server and attempt to utilize this connection for requests across all customers' content. For each subsequent request, we calculate a new timestamp using the difference between its timestamp in the actual trace and the minimum timestamp of

Akamai requests from this client in the actual trace, and we dispatch each request based on its new timestamp. In case the connection is terminated by the remote host, we re-open another connection to the same selected edge server. During the replay, we record the domain name resolution time, every connection establishment time, and the content download time. We perform current-replay as follows. For each client, we group requests for Akamai content by the TCP connection to which they belong. For each group of requests, the client opens a TCP connection to our selected edge server and sends all these requests through this connection. For each request, we calculate a new timestamp using the difference between its timestamp in the actual trace and the minimum timestamp of the requests in the same group, and we dispatch each request within its connection based on this new timestamp. For the same client, we continue replaying each group of requests sequentially in immediate succession. In the current approach, we replay the DNS queries to measure the query time separately from the content delivery replay, as follows. For each client, we compile a target list from the first request of each group of requests. Then, we calculate a new timestamp for each target using the difference between the timestamp of the corresponding request in the actual trace and the original timestamp of the earliest request in this client's target list. We perform the domain name resolution of each target in the list based on its new timestamp, which preserves both the order and the inter-request time intervals of the original DNS resolutions. Akamai sets the TTL of its DNS query answers to 20 seconds. A consecutive query arriving less than 20 s after the previous one makes the local DNS answer the query from its cache. On the other hand, a consecutive query arriving 20 s or more after the previous one makes the local DNS obtain the answer from Akamai, which will likely take more time than getting the answer from its cache. We want to make sure that our replay captures this effect. In the HTTP request replay, we serialize requests

for simplicity, and this could skew some gaps between DNS queries. If we replayed the DNS queries inline, the time they take might require adjusting the times at which subsequent TCP connections are started; if we just blindly replayed the TCP connections at the right times, they might overlap with DNS queries that are not yet finished, which is impossible in the real world. Therefore, instead of performing the DNS query before every TCP connection in the current-approach HTTP request replay, we replay the DNS queries separately to ensure that the spacing between DNS queries is the same as in the real trace. In the first experiment, referred to as the internal performance gain experiment, we use the local Akamai edge server on the Case campus as the selected edge server in the replays. We refer to the two replays in this experiment as int-our-replay and int-current-replay. This experiment shows the performance gain in a network that has a local Akamai edge server deployed. In the second experiment, we choose the most popular external Akamai edge server used by the Case community as our selected edge server. We refer to the two replays in this experiment as ext-our-replay and ext-current-replay. This experiment shows the performance gain in a network that has no local Akamai edge server deployed. In each experiment, we compare the two replays in different aspects to understand whether our approach brings any performance benefits and by how much. The results of the comparisons are discussed in Section 6.4.3.2. In int-current-replay, we perform the DNS replay from the same Case client that we used to replay the Akamai HTTP requests. For ext-current-replay, we cannot replay the DNS queries with a Case client because there is a local Akamai DNS server on the Case campus. Therefore, we perform this DNS replay in another network with no Akamai presence. We find by manual inspection that a PlanetLab node at the University of Chicago shows no evidence of a local Akamai edge server or a local Akamai DNS server. Therefore, we perform the DNS query replay for the current

approach under the no-local-Akamai-platform scenario from this PlanetLab node. Note that we used a client on the Case campus in all HTTP replays.
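Our replay tool itself was written in ANSI C with libcurl; purely for illustration, the sketch below re-expresses the core of the our-replay dispatch loop in Python. The trace format, the pinned edge server address, and the absence of timing instrumentation are simplifications; the essential elements are the timestamp rebasing, the single persistent connection to the chosen edge server, and the per-request Host header.

    import http.client, time

    def our_replay(requests, edge_ip="192.0.2.10"):
        """requests: list of (trace_timestamp, hostname, path) for one client, in trace order."""
        conn = http.client.HTTPConnection(edge_ip)        # one persistent connection, reused
        t0_trace = min(ts for ts, _, _ in requests)
        t0_wall = time.time()
        for ts, host, path in requests:
            # Rebase: dispatch at the same offset from the start as in the original trace.
            delay = (ts - t0_trace) - (time.time() - t0_wall)
            if delay > 0:
                time.sleep(delay)
            try:
                conn.request("GET", path, headers={"Host": host})
                conn.getresponse().read()                 # the real tool records timings here
            except (http.client.HTTPException, OSError):
                conn.close()                              # remote host closed the connection:
                conn = http.client.HTTPConnection(edge_ip)  # re-open to the same edge server
                conn.request("GET", path, headers={"Host": host})
                conn.getresponse().read()
        conn.close()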

6.4.3.2 Performance Improvement Result

                            Current      Our        Our as % of Current
  DNS time                  18.08        0.24       1.33
  TCP handshake time        1,192.31     63.38      5.32
  Download time             3,885.31     3,200.21   82.37
  Total time                5,095.70     3,263.83   64.05

Table 6.3: Replay times (in seconds) of ext-our-replay and ext-current-replay.

                            Current      Our        Our as % of Current
  DNS time                  24.81        0.41       1.67
  TCP handshake time        827.98       42.06      5.08
  Download time             3,267.67     3,179.17   97.29
  Total time                4,120.47     3,221.64   78.19

Table 6.4: Replay times (in seconds) of int-our-replay and int-current-replay.

                                               Current    Our
  Number of connections                        1,857      77
  Utilization (average requests/connection)    2.5        61.3

Table 6.5: TCP connection utilization in the network with no local Akamai edge server deployed.

In this section, we present the experimental results showing how much content delivery improvement can be gained by adopting our approach. We choose 192.5.110.39 and 192.232.17.17 as the selected edge servers in the replays, as the internal and external Akamai servers respectively. As a reminder, this allows us to measure the improvement gained by adopting our approach in a network with and without a local Akamai edge server deployed. We perform int-our-replay and int-current-replay against the internal Akamai server and ext-our-replay and ext-current-replay against the external Akamai server. Then, we measure the time spent on each component during the replay.

                                               Current    Our
  Number of connections                        1,856      91
  Utilization (average requests/connection)    2.5        51.8

Table 6.6: TCP connection utilization in the network with a local Akamai edge server deployed.


Figure 6.5: The ratio of per-client total TCP handshake time spent in the current approach over our approach.

We summarize the comparison of the total download time of the same content in the current network configuration and in our proposed configuration in Tables 6.3 and 6.4. As seen from these tables, it takes around 69 and 84 minutes to deliver all the content in the current configuration, as opposed to around 53 and 54 minutes when adopting our approach, for the network with and without a local Akamai edge server deployed respectively. Therefore, our approach offers a 21.81 and 35.95 percent improvement respectively. We then dissect the improvement into three components: DNS query time, TCP handshake time, and download time. By amortizing the time spent on a single DNS query across all requests, our approach virtually eliminates the DNS query time overhead of the current approach. Although the DNS query time even in the current approach is relatively small compared to the time spent on the entire delivery process, exempting a client from a DNS query on every click during web browsing could contribute to a better user experience.

Our approach also amortizes the time spent on a relatively small number of TCP handshakes across all the requests. Unlike DNS query time, a TCP handshake is more expensive in terms of delay and resource consumption, so it contributes a significant portion of the entire delivery time. According to our results, the total TCP handshake time in our approach constitutes only 5.08% and 5.32% of the handshake time in the current approach with and without local edge servers respectively. In further detail, Figure 6.5 shows the cumulative distribution of the ratio of per-client total TCP handshake time spent in the current approach over our approach. A ratio greater than 1 means that a particular client in aggregate spends more time on handshakes in the current approach than in our approach. We can see that in networks both with and without local Akamai servers, over 90% of the clients spend more handshake time in the current approach than in our approach. In fact, 40% and 60% of the clients in networks with and without a local Akamai server respectively spend at least 10 times more on handshakes in the current approach. Our approach attempts to reduce the download time by increasing the average throughput of the download. To increase the throughput, we try to avoid TCP slow-start periods and TCP congestion window timeout occurrences. Every new TCP connection starts transferring packets in slow-start mode, which is a period of sub-optimal throughput. Our approach tries to utilize the existing TCP connection for HTTP requests as much as possible; therefore, it results in fewer TCP connections for the same number of requests and less time spent in slow-start mode. When a TCP connection is idle for a certain period of time, the congestion window is re-adjusted. The re-adjustment varies by implementation and can range from reducing the window to going back to slow-start; in most cases, it decreases the throughput. By increasing connection utilization, our approach lowers the chances of this re-adjustment.

These improvements in TCP dynamics lead to the 17.63% reduction in the download time component in a network without local Akamai edge servers (see the "Download Time" line in Table 6.3). Even in a network with locally deployed Akamai servers, our approach still shows a small 2.71% improvement in download time. Note that in the replay experiment, within each flow, we make sure that each request is replayed at the same timestamp relative to the beginning of its flow as in the real trace. Consequently, idle periods are preserved, and the effect of TCP congestion window timeouts can be captured. Thus, our replay experiments also capture the performance implications of the TCP congestion window timeout in both the current and our approaches. The key to this throughput improvement in both cases is the higher utilization of TCP connections. As seen from Tables 6.5 and 6.6, our approach achieves more than 20 and 24 times higher connection utilization than the current approach in the network with and without a local Akamai edge server respectively. This translates into improvements in several components of HTTP download performance, including DNS resolution, TCP handshake, and download time, as shown in Tables 6.3 and 6.4. Overall, our replay experiment indicates that our approach can bring around a 36% improvement in the total time spent by clients during Web accesses for a typical network where no Akamai edge server is deployed locally. However, even in a network where Akamai has optimized its content delivery with a locally deployed edge server, our approach still brings almost a 22% improvement. Therefore, we conclude that our easy-to-adopt DNS-based approach is able to bring forth performance benefits to CDNs.

6.5 Discussion

6.5.1 Realization of this approach

In this study, we identify an opportunity to improve the current operation of co-location CDNs and propose a solution that requires only small changes in the client network to realize this improvement. Our simulations and replay experiments quantify how much improvement Akamai could have gained if this approach had been deployed on the CWRU campus network. According to our study, from the client perspective, Akamai would achieve a substantial improvement with our approach. Although our approach requires no change to the mechanisms of the Akamai platform, it affects how the platform manages its resources, load, and other operational variables. How should Akamai adjust its internal system management to embrace the benefits of our approach? How does our proposed mechanism, which improves the download experience of users in one organization, affect the performance of the platform as a whole? These questions need to be answered before this approach is adopted, but they cannot be answered without knowledge of how Akamai internally allocates its resources, load-balances its traffic, and manages its global platform. We have suggested an improvement strategy and quantified the gain from the client's point of view; an accurate evaluation of the platform-wide issues requires collaboration with a CDN provider. Since CDNs are responsible for a large portion of web traffic today, the improvements promised by our approach would also benefit Internet users in general.

6.6 Conclusion

Content delivery networks have become a crucial part of the modern Internet infrastructure. The leading CDN provider, Akamai, alone claims to deliver 20% of all web traffic [9]. Improving content delivery performance would therefore benefit a wide swath of the Internet. We propose a feasible solution that improves content delivery performance from the user's perspective without requiring any modification to the content delivery platform. We verify our approach in two ways. First, we simulate what would happen if the same requests observed on the CWRUnet campus were issued from a hypothetical network that adopts our approach; the simulation shows a clear improvement at the scale of a large campus. Second, we perform an end-to-end verification by replaying the Akamai traffic pattern collected from a random subset of actual Internet users on a large campus network, comparing the current approach with our proposed one. Comparing the replay results from both approaches, we observe an encouraging improvement in content delivery performance under our approach, even in a network with local Akamai edge servers deployed. A proof-of-concept implementation and a live evaluation with actual users would be the final step before wide adoption of our approach can be recommended; we leave this investigation as future work.

6.7 Acknowledgement

The study described in this chapter would not have been possible without support from the CWRU Information Technology Services staff who deployed our measurement instrumentation. In particular, we would like to thank Roger Bielefeld, Chet Ramey, Kevin Chan, and Jim Nauer for their help. We are especially grateful to Jim Nauer for administering the trace-collecting host and collecting the actual traces, as well as for his expertise and patience in answering all our questions and requests throughout this collaboration.

Chapter 7

Conclusion

This thesis proposes and makes the case for a novel approach to a perennial problem in Internet measurements: how to acquire measurements that are representative of the Internet's scale in coverage and diversity. Instead of attempting to build a global-scale measurement platform, our approach implements a matchmaking service that uses P2P concepts to bring together experimenters in need of measurements with external measurement providers. In the peer-to-peer spirit, platform participants offer measurements in exchange for being able to access measurements offered by other participants. With this approach in mind, we proposed the DipZoom measurement platform, a system for facilitating diverse on-demand network measurements. The system addresses several key needs of network researchers and designers, as well as IT specialists. First, DipZoom makes it easy for Internet users to join the network of measurement providers. Even IT specialists who do not require large-scale measurements can use DipZoom to troubleshoot or diagnose their systems; for example, DipZoom can be used to check whether a particular server or service is accessible from target networks or countries of interest. Second, it significantly lowers the bar for measurement requesters to stage an experiment. An experimenter has a consistent

view of the totality of all the measuring points currently available and a coherent interface to discover the MPs that suit her needs. Further, one can script a complex and long-running experiment, launch it, and collect the results from one's own computer using a general-purpose programming language (Java). To sum up, DipZoom makes the following main contributions.

• DipZoom attempts to lower the bar for the creation of new measuring points and hence simplifies the recruiting of new MPs. DipZoom's MPs can easily be installed by downloading a file and a self-extracting installation script from a Web server. By simplifying the creation of MPs and making them maintenance-free, we hope to attract diverse measuring points and facilitate more accurate measurements than are currently possible.

• DipZoom makes measurements accessible to a casual experimenter by offering a coherent, simple interface to the entire platform.

• DipZoom makes it possible for the user to discover and request measurements from suitable MPs, either through a graphical DipZoom client or directly from a Java program; this feature benefits both casual and expert experimenters.

• DipZoom's programmatic access to the system makes it simple to stage complex measurements. A complex measurement is just a Java program that uses API calls defined by the DipZoom client library to interact with the system. During the experiment, the measurement is coordinated from the program running on the experimenter's computer, which can go through arbitrarily complex steps of discovering MPs, obtaining measurements from them, and obtaining more measurements based on the analysis of the results (see the sketch after this list).
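
To illustrate the flavor of such a program, the following is a hypothetical Java sketch of a multi-stage experiment script. The interfaces and method names (MeasurementClient, findMeasuringPoints, ping, httpGet) are stand-ins invented for this illustration and are not the actual DipZoom client API; running the sketch would require binding these stubs to the real client library.

    import java.util.List;

    // Hypothetical stand-ins for a DipZoom-style client library (not the real API).
    interface MeasuringPoint {
        String id();
        String country();
        double ping(String target) throws Exception;   // assumed: round-trip time in ms
        String httpGet(String url) throws Exception;   // assumed: fetch a URL, return a summary
    }

    interface MeasurementClient {
        List<MeasuringPoint> findMeasuringPoints(String country) throws Exception;
    }

    public class ExampleExperiment {
        // Stage 1: discover MPs in a country of interest.
        // Stage 2: ping a target host from each discovered MP.
        // Stage 3: based on the ping results, fetch a URL only from the nearby MPs.
        static void run(MeasurementClient client) throws Exception {
            for (MeasuringPoint mp : client.findMeasuringPoints("DE")) {
                double rtt = mp.ping("www.example.com");
                System.out.println(mp.id() + " (" + mp.country() + ") rtt=" + rtt + " ms");
                if (rtt < 100) {                       // follow-up measurement driven by earlier results
                    System.out.println(mp.httpGet("http://www.example.com/"));
                }
            }
        }
    }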

Having designed and implemented DipZoom, we made it publicly available in 2007. Since then, DipZoom has been utilized in several measurement studies by our

group as well as by outside researchers such as Valancius et al. [86]. We used DipZoom to conduct a series of studies on content distribution networks; more specifically, we looked at the performance, security, and architectural-improvement aspects of CDNs. First, we investigated a key architectural dilemma in CDN design: whether a CDN with a network-core design can offer content delivery performance comparable to a CDN with a co-location design. Central to this study is the evaluation of the possibility of consolidating Akamai's platform into fewer large data centers. From the study, we concluded that quite significant consolidation is possible without appreciably degrading the platform's performance. In addition, to our knowledge, our study is the first to provide an independent, direct estimate of the performance improvement of Akamai-accelerated downloads. We further investigated the quality of Akamai's server selection at a large scale. Our second study concerns the security implications of CDNs. In addition to improving performance, CDNs are believed to protect their customers against a surge of traffic during a flash crowd and, in this way, against denial of service attacks as well. In our study of CDN security, we found a security hole that allows an attacker to launch a denial of service attack against Web sites. A key finding is that not only may a CDN fail to protect its subscribers from a DoS attack, it can actually be recruited to amplify the attack. We demonstrated this attack by using the Coral CDN to attack our own web site with an order-of-magnitude attack amplification. While we could not replicate this experiment on commercial CDNs without launching an actual attack, we showed that two leading commercial CDNs, Akamai and Limelight, both exhibit all the vulnerabilities required for this attack. In particular, we showed how an attacker can (a) send a request to an arbitrary edge server within the CDN platform, overriding the CDN's server selection, (b) penetrate CDN caching so that each request reaches the origin site, and (c) use an edge server to consume the full bandwidth required for processing a request from the origin site while expending hardly any bandwidth of its own.
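
The following minimal Java sketch illustrates components (a) and (b): it connects directly to a chosen edge server, bypassing the CDN's DNS-based server selection, and requests the customer's content with a randomized query string as one way to force a cache miss so that the request is forwarded to the origin. The edge IP address and hostname are placeholders, and the sketch is illustrative rather than a reproduction of our measurement code.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    // Sketch: talk to a specific edge server and defeat its cache with a unique query string.
    public class EdgeProbe {
        public static void main(String[] args) throws Exception {
            String edgeIp = "192.0.2.10";                     // placeholder edge server address
            String customerHost = "www.example.com";          // placeholder CDN-accelerated site
            String path = "/image.jpg?" + UUID.randomUUID();  // unique suffix -> likely cache miss

            try (Socket s = new Socket(edgeIp, 80)) {
                OutputStream out = s.getOutputStream();
                String req = "GET " + path + " HTTP/1.1\r\n"
                           + "Host: " + customerHost + "\r\n" // tells the edge whose content to serve
                           + "Connection: close\r\n\r\n";
                out.write(req.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                System.out.println(in.readLine());            // status line, e.g. "HTTP/1.1 200 OK"
            }
        }
    }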

Finally, we proposed a feasible solution to improve content delivery performance from the user's perspective without a need to modify the content delivery platform. Central to this study is a quantification of the benefit of the proposed approach. We verified our approach in two different ways. First, we simulated what would happen if the same requests observed on the CWRU campus network were issued from a hypothetical network that adopts our approach; the simulation shows a significant performance improvement. This experiment evaluates our approach at the scale of the entire campus, but relies on simulation to do so. Second, we performed an end-to-end verification by replaying the Akamai traffic pattern collected from a random subset of actual Internet users on a large campus network, comparing the current approach with our proposed one. Comparing the replay results from both approaches, we observe an encouraging improvement in content delivery performance under our approach, even in a network with local Akamai edge servers deployed. In summary, our CDN studies have demonstrated the utility of DipZoom in conducting large-scale network measurements. DipZoom allowed us to gain immediate access to a large number of measurement points, including some residential measurement points that would not be available on PlanetLab, and to quickly script complex multi-stage measurements. These conveniences facilitated several significant insights into the design, performance, and security of CDNs, which have emerged as a key component of the global Internet infrastructure.

Bibliography

[1] Akamai: State of the internet report. http://www.akamai.com/stateoftheinternet/.

[2] The chromium projects.

[3] Keynote – the mobile and internet performance authority. http://www.keynote.com.

[4] Limelight’s technology. http://www.limelight.com/technology/.

[5] Nslookup. http://en.wikipedia.org/wiki/Nslookup.

[6] RIPE Atlas. https://atlas.ripe.net.

[7] Traceroute. http://en.wikipedia.org/wiki/Traceroute.

[8] http://www.akamai.com/html/technology/index.html.

[9] Akamai Technologies. http://www.akamai.com/html/perspectives/index.html.

[10] Kostas G. Anagnostakis, Michael B. Greenwald, and Raphael S. Ryger. Cing: Measuring network-internal delays using only existing infrastructure. In INFOCOM'03, April 2003.

[11] D. G. Andersen. Mayday: Distributed Filtering for Internet Services. In 4th Usenix Symp. on Internet Technologies and Sys., Seattle, WA, March 2003.

[12] A. Biliris, C. Cranor, F. Douglis, M. Rabinovich, S. Sibal, O. Spatscheck, and W. Sturm. CDN brokering. In 6th Int. Web Caching Workshop, June 2001.

[13] C. Canali, V. Cardellini, M. Colajanni, and R. Lancellotti. Evaluating user-perceived benefits of content distribution networks. In Int'l Symp. on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2004), 2004.

[14] Robert Carter and Mark Crovella. Measuring bottleneck link speed in packet-switched networks. Technical Report 1996-006, 15, 1996.

[15] Robert Carter and Mark Crovella. Dynamic server selection using bandwidth probing in wide-area networks. In IEEE Infocom, 1997.

[16] The Chromium Projects – DNS prefetching. http://dev.chromium.org/developers/design-documents/dns-prefetching.

[17] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. Planetlab: an overlay testbed for broad-coverage services. SIGCOMM Comput. Commun. Rev., 33(3):3–12, 2003.

[18] CIA The World Factbook. https://www.cia.gov/library/publications/the-world-factbook/print/xx.html.

[19] k. claffy. Internet traffic characterization. PhD thesis, UC San Diego, 1994.

[20] Edith Cohen and Haim Kaplan. Proactive caching of dns records: Addressing a performance bottleneck. In Proceedings of the Symposium on Applications and the Internet, pages 85–94. IEEE, 2001.

[21] Adero and America Online announce strategic content and technology distribution agreement. http://www.timewarner.com/corp/newsroom/pr/0,20812,666754,00.html.

[22] The Coral content distribution network. http://www.coralcdn.org/.

[23] cURL - tool for transferring files with url syntax. http://curl.haxx.se/.

[24] Dimes (distributed internet measurements & simulations). http://www.netdimes.org/.

[25] DipZoom - Deep Internet Performance Zoom. http://dipzoom.case.edu.

[26] DipZoom API Documents. http://dipzoom.case.edu/documentation.html.

[27] Constantinos Dovrolis, Parameswaran Ramanathan, and David Moore. What do packet dispersion techniques measure? In INFOCOM, pages 905–914, 2001.

[28] Allen B. Downey. Using pathchar to estimate internet link characteristics. In Measurement and Modeling of Computer Systems, pages 222–223, 1999.

[29] ESI Language Specification 1.0. http://www.w3.org/TR/esi-lang, August 2001.

[30] Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall/CRC, 1993.

[31] Paul England, Butler Lampson, John Manferdelli, Marcus Peinado, and Bryan Willman. A trusted open platform. Computer, 36(7):55–62, 2003.

[32] A. Feldmann, R. Cáceres, F. Douglis, G. Glass, and M. Rabinovich. Performance of web proxy caching in heterogeneous bandwidth environments. In INFOCOM, pages 107–116, 1999.

[33] Paul Francis, Sugih Jamin, Cheng Jin, Yixin Jin, Danny Raz, Yuval Shavitt, and Lixia Zhang. IDMaps: a global internet host distance estimation service. IEEE/ACM Trans. Netw., 9(5):525–540, 2001.

[34] Michael J. Freedman, Eric Freudenthal, and David Mazières. Democratizing content publication with Coral. In NSDI, pages 239–252, 2004.

[35] Gnutella. http://www.gnutella.com.

[36] Gnutella Topology Snapshots. http://mirage.cs.uoregon.edu/P2P/info.cgi.

[37] K.P. Gummadi, S. Saroiu, and S.D. Gribble. King: Estimating latency between arbitrary internet end hosts. In Proceedings of the Second SIGCOMM Internet Measurement Workshop, 2002.

[38] Intelligent content distribution service. AT&T Inc. http://www.business.att.com/service fam overview.jsp?repoid=ProductSubCategory&repoitem=eb intelligent content distribution&serv port=eb hosting storage and it&serv fam=eb intelligent content distribution&segment=ent biz.

[39] V. Jacobson. Pathchar. ftp://ftp.ee.lbl.gov/pathchar.

[40] Manish Jain and Constantinos Dovrolis. End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput. In Proceedings of SIGCOMM, Pittsburgh, PA, August 2002.

[41] K. Johnson, J. Carr, M. Day, and M. Kaashoek. The measured performance of content distribution networks. In 5th Web Caching Workshop, 2000.

[42] R. Jones. Netperf. http://www.netperf.org/.

[43] J. Jung, B. Krishnamurthy, and M. Rabinovich. Flash crowds and denial of service attacks: characterization and implications for CDNs and web sites. In WWW, pages 293–304, 2002.

[44] Balachander Krishnamurthy and Jia Wang. On Network-Aware Clustering of Web Clients. In The ACM SIGCOMM, August 2000.

[45] Balachander Krishnamurthy, Craig Wills, and Yin Zhang. On the use and performance of content distribution networks. In The First ACM SIGCOMM Internet Measurement Workshop (IMW-01), pages 169–182, 2001.

[46] Srinivas Krishnan and Fabian Monrose. DNS prefetching and its privacy implications: When good things go bad.

[47] Kevin Lai and Mary Baker. Nettimer: A tool for measuring bottleneck link bandwidth. In USITS, 2001.

[48] K.-W. Lee, S. Chari, A. Shaikh, S. Sahu, and P.-C. Cheng. Improving the resilience of content distribution networks to large scale distributed denial of service attacks. Computer Networks, 51(10):2753–2770, 2007.

[49] D. Leonard and D. Loguinov. Turbo King: Framework for large-scale internet delay measurements. In INFOCOM, 2008.

[50] Limelight networks. http://www.limelightnetworks.com/network.htm.

[51] Bruce Maggs. Personal communication, 2008.

[52] B. Mah. Estimating bandwidth and other network properties. In Internet Statistics and Metrics Analysis Workshop, 2000.

[53] M. Mathis. Diagnosing internet congestion with a transport layer performance tool. In INET'96, 1996.

[54] MaxMind. http://www.maxmind.com.

[55] P. Mockapetris. RFC 1035 Domain Names - Implementation and Specification. Internet Engineering Task Force, November 1987.

[56] Mike Muuss. Ping. http://directory.fsf.org/network/misc/ping.html.

[57] Charles J. Neerdaels. Extending an Internet content delivery network into an enterprise. Akamai Technologies U.S. Patent #7,096,266.

[58] T. Ng and H. Zhang. Towards global network positioning. In ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, November 2001.

[59] Jeffrey Pang, Aditya Akella, Anees Shaikh, Balachander Krishnamurthy, and Srinivasan Seshan. On the responsiveness of dns-based network control. In The ACM SIGCOMM Internet Measurement Conference, pages 21–26, 2004.

[60] C. Partridge, T. Mendez, and W. Milliken. RFC 1546: Host anycasting service, November 1993.

[61] V. Paxson. An analysis of using reflectors for distributed denial-of-service attacks. SIGCOMM Comput. Commun. Rev., 31(3):38–47, 2001.

[62] Vern Paxson, Jamshid Mahdavi, Andrew Adams, and Matt Mathis. An architecture for large-scale internet measurements. IEEE Communications, 36(8):48–54, 1998.

[63] Planetlab. http://www.planet-lab.org.

[64] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. Improving content delivery using provider-aided distance information. In Proceedings of the 10th annual conference on Internet measurement, IMC ’10, pages 22–34, New York, NY, USA, 2010. ACM.

[65] http://www.porivo.com/.

[66] M. Rabinovich and O. Spatscheck. Web Caching and Replication. Addison- Wesley, 2001.

[67] http://www.rapidedgecdn.com/.

[68] The R project for statistical computing. http://www.r-project.org.

[69] Amit Ruhela, Rudra Tripathy, Sipat Triukose, Sebastien Ardon, Amitabha Bagchi, and Aaditeshwar Seth. Towards the use of online social networks for efficient internet content distribution. In Proceedings of the 5th IEEE International conference on Advanced Networks and Telecommunication Systems, ANTS'12, 2012.

[70] Amit Ruhela, Sipat Triukose, Sebastien Ardon, Amitabha Bagchi, Aaditeshwar Seth, and Anirban Mahanti. The scope for online social network aided caching in web cdns. In ACM/IEEE Symposium on Architectures for Networking and Communications Systems, San Jose, CA, USA, October 2013.

[71] Stefan Savage. Sting: a TCP-based network measurement tool. In USITS, 1999.

[72] Savvis Inc. http://www.savvis.net/corp/Products+Services/Digital+Content+Services/Content+Delivery+Services/.

[73] F. Scalzo. Recent DNS reflector attacks. http://www.nanog.org/mtg-0606/pdf/frank-scalzo.pdf, 2006.

[74] Salvatore Scellato, Cecilia Mascolo, Mirco Musolesi, and Jon Crowcroft. Track globally, deliver locally: improving content delivery networks by tracking geographic social cascades. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 457–466, New York, NY, USA, 2011. ACM.

[75] Seti@home. http://www.seti.org/.

[76] Hao Shang and Craig E. Wills. Piggybacking related domain names to improve DNS performance.

[77] Neil Spring, David Wetherall, and Tom Anderson. Scriptroute: A public internet measurement facility. In Usenix Symp. on Internet Technologies and Systems, 2003.

[78] A.-J. Su and A. Kuzmanovic. Thinning Akamai. In ACM IMC, pages 29–42, 2008.

[79] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In SIGCOMM, pages 435–446, 2006.

[80] Ao-Jan Su and Aleksandar Kuzmanovic. Thinning akamai. In The 8th ACM SIGCOMM Internet Measurement Conference, pages 29–42, October 2008.

[81] A. Tirumala, F. Qin, J. Dugan, and J. Ferguson. Iperf, 2002. http://dast.nlanr.net/Projects/Iperf/.

[82] Traceroute@home, Laboratoire lip6 – CNRS. http://tracerouteathome.net/.

[83] S. Triukose, Z. Al-Qudah, and M. Rabinovich. Content delivery networks: Protection or threat? In ESORICS, pages 371–389, 2009.

[84] S. Triukose, Z. Wen, and M. Rabinovich. Content delivery networks: How big is big enough? (poster paper). In ACM SIGMETRICS, Seattle, WA, June 2009.

[85] Sipat Triukose, Zhihua Wen, and Michael Rabinovich. Measuring a commercial content delivery network. In Proceedings of the 20th international conference on World wide web, WWW ’11, pages 467–476, New York, NY, USA, 2011. ACM.

[86] Vytautas Valancius, Nikolaos Laoutaris, Laurent Massoulié, Christophe Diot, and Pablo Rodriguez. Greening the internet with nano data centers. In Proceedings of the 5th international conference on Emerging networking experiments and technologies, pages 37–48. ACM, 2009.

[87] R. Vaughn and G. Evron. DNS amplification attacks. http://www.isotf.org/news/, 2006.

[88] L. Wang, K. Park, R. Pang, V. S. Pai, and L. Peterson. Reliability and security in the CoDeeN content distribution network. In USENIX, pages 171–184, 2004.

[89] Zhihua Wen. Providing Coherent Internet Measurement Interface and Its Application in Network Distance Estimation. PhD thesis, Case Western Reserve University, 2009.

[90] Zhihua Wen and Michael Rabinovich. Network distance estimation with dynamic landmark triangles. In 2008 ACM SIGMETRICS Int. Conf. on Measurement and Modeling of Comp. Sys., pages 433–434, 2008.

[91] Zhihua Wen, Sipat Triukose, and Michael Rabinovich. Facilitating focused internet measurements. In SIGMETRICS '07, pages 49–60, 2007.

[92] GNU Wget. http://www.gnu.org/software/wget.

[93] Qiang Xu, Junxian Huang, Zhaoguang Wang, Feng Qian, Alexandre Gerber, and Zhuoqing Morley Mao. Cellular data network infrastructure characterization and implication on mobile content placement. In Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, SIGMETRICS '11, pages 317–328, New York, NY, USA, 2011. ACM.
