Content Delivery Networks (CDNs) currently utilise the Domain Name System (DNS) to direct end users towards optimal CDN replica servers. However, current CDN redirection mechanisms do not adequately take into account DNS caching performed by end users. Specifically, end users may cache DNS answers that later lead to suboptimal CDN nodes, possibly degrading the browsing experience. This document proposes a CDN redirection mechanism for ISP operated CDNs that overcomes this problem. By utilising SDN techniques, it is possible for ISPs to perform end-user redirection at a TCP flow-based level, without relying on the end user having up-to-date DNS information. This method is shown to improve several web-browsing performance parameters, such as access-delay time and page-load time.

Thesis title: Request rerouting for ISP operated CDNs
Topic number: TM125

A. Problem statement ISP operated CDNs and CDN-ISP collaborations are becoming increasingly common. These CDNs are in a unique position to be able to apply better request rerouting mechanisms, as well as utilise cloud and virtualised servers to rapidly create additional infrastructure as needed. There are currently no request rerouting mechanisms that can transparently reroute a user from one CDN node to another, if they have stale DNS information. Stale DNS information may arise when users cache DNS information, or if network topology changes occur that render their current information stale.

B. Objective
• Perform an analysis of the effects of client side DNS caching on current DNS-based redirection mechanisms.
• Design and implement a request rerouting mechanism that can reroute users who hold stale DNS information, and that can make use of new CDN replica servers created via cloud services.

C. My solution
• A DNS request rerouting mechanism already exists and is proven to work, so there is little reason to replace it.
• We propose augmenting the current DNS request rerouting mechanism with SDN techniques.
• Using OpenFlow flow rules, detect whether users are sending traffic towards an IP address belonging to a CDN we know about.
• Check whether these users are going to the ideal CDN replica server mapped for their location (which we know, since we are their ISP).
• If they are not going to the optimal CDN node, perform a DNAT on their outgoing packets and a SNAT on packets coming back from the CDN.
• If they are going to the right CDN node, forward and route as usual.

D. Contributions (at most one per line, most important first)
• We have successfully created a request rerouting mechanism that augments the current DNS redirection mechanism and achieves the goals listed above.
• We performed a first-of-its-kind analysis of the effects of client side DNS caching on CDN performance.

E. Suggestions for future work
• Analyze the scalability of the mechanism proposed by this thesis.
• Develop a request rerouting algorithm that works efficiently with the mechanism proposed by this thesis.

14 Problem Statement
14 Objective

Theory (up to 5 most relevant ideas)
10 Current CDN redirection mechanisms have limitations
12 Software defined networking allows me to achieve my design goals
15 Analysis of possible ways to achieve my design goals
17 The design and theory behind my chosen solution

Method of solution (up to 5 most relevant points)
19 Description of my solution
20 Description of the setup of my solution
21 Description of the testing framework for testing my solution

Contributions (most important first)
35 Demonstrate that my controller works to the design goals
28 Create testing procedures to demonstrate the effects of DNS caching on CDN performance

My work
21, 28 Testing procedures described, testing framework described
41 Appendix contains my code for my controller
Description of procedure (e.g. for experiments)

Results
31 Succinct presentation of results
35 Analysis
35 Significance of results

Conclusion
38 Statement of whether the outcomes met the objectives
38 Suggestions for future research

Literature: (up to 5 most important references)
3 “Pushing CDN-ISP collaboration to the limit”
5 “The Akamai network: A platform for high-performance applications”

Abbreviations

API   Application Programming Interface
AS    Autonomous System
BGP   Border Gateway Protocol
CDN   Content Delivery Network
CLI   Command Line Interface
DNS   Domain Name System
EDNS  Extension Mechanisms for DNS
HTTP  HyperText Transfer Protocol
IP    Internet Protocol
ISP   Internet Service Provider
Mbps  Megabits per second
NAT   Network Address Translation
OS    Operating System
PoP   Point of Presence
QoE   Quality of Experience
QoS   Quality of Service
REST  Representational State Transfer
RTT   Round Trip Time
SDN   Software Defined Networking
TCP   Transmission Control Protocol
TTL   Time To Live
URL   Uniform Resource Locator

Contents

1 Introduction

2 Background
  2.1 Content Delivery Networks
    2.1.1 ISP operated CDNs
  2.2 Request rerouting
    2.2.1 Current request rerouting mechanisms
    2.2.2 Limitations of current request rerouting mechanisms
  2.3 Software Defined Networking
  2.4 Literature review

3 Request Rerouting Mechanisms for ISP Operated CDNs
  3.1 Problem statement & design goals
  3.2 Analysis of possible mechanisms
    3.2.1 Anycasting CDN nodes
    3.2.2 Using DNS augmented with NAT
  3.3 Experimental set up of a DNS request rerouting mechanism augmented with NAT
    3.3.1 Testing framework
    3.3.2 Testing procedures

4 Results & Performance
  4.1 Request rerouting after link failure
  4.2 Request rerouting during link congestion
  4.3 Request rerouting with the introduction of a new CDN replica server

5 Evaluation
  5.1 Results
  5.2 Discussion

6 Conclusion
  6.1 Future Work

Bibliography

Appendix 1
  .1 An example of a DNS request for a CDN hosted ...
  .2 HTML for ...
  .3 The Ryu controller for my thesis
  .4 The mininet topology for my thesis

List of Figures

2.1 A visit to a single website yields HTTP requests to various CDN hosts, as seen in the left-hand column.
3.1 Our network topology diagram
3.2 A trace demonstrating the typical NAT procedure that happens to a TCP flow.
1 A Wireshark trace of a DNS request for ...

List of Tables

2.1 Browser usage by market share
3.1 Design requirements fulfilled by utilising an IP Anycast method
3.2 Table miss flow that catches everything. Used as part of learning switch functionality.
3.3 Table miss flow that catches traffic from h1 to the CDN replica servers.
3.4 An example forwarding flow inserted into the switch in the case that there is no need to further redirect the end user. Note that IP destination and TCP source port values are context dependent; the values here are just examples.
3.5 Example flows inserted into the switch in case there is a need to further redirect the end user. Note that as with the previous table, the IP address and TCP port values are arbitrary in this case, but remain relevant between the DNAT and SNAT rules.
4.1 Time taken to fully traverse and load the test pages, in seconds.
4.2 Time taken to fully traverse and load the test pages, in seconds. Also shown is the time delay until the client connects to the optimal CDN server over the congested one.
4.3 Time taken to fully traverse and load the test pages, in seconds.

Chapter 1

Introduction

Content Delivery Networks (CDNs) are seeing increasing use in today’s internet landscape. In response to a growing demand for high definition content, Internet Service Providers (ISPs) are also starting to provide and host their own CDNs [1, 2]. Large CDN providers such as Akamai have also started collaborating with ISPs to provide better services to their end users [3]. These ISP operated CDNs are in a unique position in that they control the edge routers that sit at the last hop to their users. Because of this, ISP operated CDNs¹ have greater control over how their end users are redirected towards CDN nodes.

Currently, hostnames for CDN hosted content are resolved using DNS services that provide end users with an IP address that maps to the optimal CDN node for that user at the time of request [4]. However, end user caching of DNS information can lead to an end user accessing a suboptimal CDN node. To illustrate, we show in this thesis how a website’s Quality of Experience (QoE) is affected by end user caching of DNS information in modern web browsers and operating systems. The results of this testing indicate that end user DNS caching does have a negative effect on website QoE. DNS caching also inhibits the rapid startup and application of cloud services to adapt to changes in network topology and congestion, possibly inhibiting future ISP CDN collaboration [3].

By utilising newly emerging Software Defined Networking (SDN) techniques, we demonstrate that it is possible for ISP operated CDNs to overcome the issues caused by end user caching of DNS information. Using a topologically aware network controller, NAT rules are applied to end user TCP flows, providing transparent redirection in cases where users are relying on stale DNS information. This method is applicable not only to wholly owned ISP CDNs, but also to ISP CDN collaborations.
In summary, the contributions of this thesis to the area of CDN redirection are:

• We demonstrate the problems associated with client side caching of DNS responses to requests for CDN information.

• We design and implement a novel redirection mechanism for ISP operated CDNs, involving the use of newly emerging Software Defined Networking techniques to augment traditional DNS mechanisms with NAT operations.

¹ For the purpose of this document, the term ‘ISP operated CDN’ refers to both CDNs wholly owned and run by an ISP, as well as ISP CDN collaborations.

Chapter 2 provides the background and literature review for this document. Chapter 3 explores possible request rerouting mechanisms available to ISP operated CDNs, and describes the method and testing procedure used for the request rerouting mechanism proposed by this thesis. Chapter 4 shows the results from testing and executing the proposed CDN redirection method. Chapter 5 evaluates and discusses these results. Chapter 6 provides a summary of the findings of this thesis, and suggests areas of future research and development.

Chapter 2

Background

2.1 Content Delivery Networks

CDNs have become the primary providers of content in today’s internet environment [5, 6]. With rapid changes and increases in end user content consumption, CDNs have been increasing their network footprints to keep up with traffic requirements. Current CDNs aim to serve content from content providers to end users in an efficient manner. While the principles of operation may differ between CDNs, their overall structure is the same: replicas of content are pushed from the content provider’s origin server to replica servers across the CDN’s network. When an end user makes a request for that CDN hosted content, it is served by a replica server belonging to the CDN. This replica server is determined by a variety of factors such as current network load, content availability, geographical proximity, and latency. This provides many benefits for both the end user and the content provider, such as load-balancing and timeliness of delivery [7].

Primarily, an end user accesses CDN hosted content by visiting a website. With current CDN models, websites are either completely hosted by the CDN, or contain links to CDN hosted content¹. CDNs are now used to host a majority of content on the web. In 2012, a survey of the 1000 most popular sites was performed by John et al. [4], which revealed that all the sites in the top 10 and over 70% of those in the top 1000 relied on CDNs. With the prevalence of cross-service APIs and resources, and links for ‘sharing’ content onto social media sites such as Twitter and Facebook, a single site can create HTTP requests for multiple CDNs. This can be seen in Figure 2.1 below, where requests are made not only to abc’s CDN, but also to CDNs for advertising, social media, and analytics networks².

¹ For example, text on the website may be hosted on the origin server, but URLs for images and videos may lead to CDN replica servers.
² These do not necessarily go to individual CDN providers. Two of the requested hosts are both served by Akamai, for example.

Figure 2.1: A visit to a single website yields HTTP requests to various CDN hosts, as seen in the left-hand column.

2.1.1 ISP operated CDNs

As user demands for internet hosted content continue to grow [8, 6], ISPs have started to roll out their own CDNs to keep up [2, 1]. Previously, ISP operated caches were operated for the same effect [9], but not to the extent and sophistication of a CDN. Aside from customer Quality of Experience, there is a networking motivation for ISPs to operate their own CDNs: lower inter-network traffic reduces the cost and requirements of building and maintaining inter-network infrastructure [10]. ISPs have also started collaborating with larger commercial CDN providers such as Akamai [3] and Netflix [11] to achieve the same benefits.

ISP operated CDNs have various qualities that make them distinct from other CDN providers. Because ISPs operate as the ‘last mile’ network for their customers, they can control the specifics of how their clients are routed towards CDN replica servers with a higher level of control than other CDN providers. Also, being the network operator, they have access to network statistics that are unavailable to other CDN providers, such as link congestion, peak traffic, and the exact network location of their customers. Because of these qualities, ISP operated CDNs can utilise request routing mechanisms that are otherwise unavailable to traditional CDNs. These ISP operated CDNs can also better utilise technologies such as cloud services to rapidly and dynamically create replica servers as demand requires [3].

2.2 Request rerouting

In order for CDNs to provide clients with the content they request, clients must first be directed to the server most appropriate for them. This is referred to as ‘Request Rerouting’. A request rerouting system is composed of two parts: a request rerouting algorithm that decides which server address to provide for user requests, and a request rerouting mechanism that actually provides users with said addresses [12]. This thesis focuses on the request rerouting mechanisms for ISP operated CDNs.
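The split between algorithm and mechanism can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Replica` fields, the load threshold, the hostname and the addresses are hypothetical, and the selection rule is only one possible algorithm, not the one used by any particular CDN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """A CDN replica server with hypothetical health metrics."""
    address: str
    latency_ms: float  # measured latency to the client's region
    load: float        # fraction of capacity in use, 0.0 to 1.0


def choose_replica(replicas, max_load=0.9):
    """Rerouting algorithm: lowest-latency replica that is not overloaded."""
    usable = [r for r in replicas if r.load < max_load]
    # Fall back to every replica if all of them exceed the load threshold.
    candidates = usable or list(replicas)
    return min(candidates, key=lambda r: r.latency_ms)


def answer_dns_query(hostname, replicas_by_host, ttl=20):
    """Rerouting mechanism: hand the chosen address to the client as a DNS answer."""
    replica = choose_replica(replicas_by_host[hostname])
    return {"name": hostname, "type": "A", "ttl": ttl, "address": replica.address}


# Hypothetical data: the closer replica is overloaded, so the other one is chosen.
replicas_by_host = {
    "cdn.example.com": [
        Replica("192.0.2.10", latency_ms=12.0, load=0.95),
        Replica("192.0.2.20", latency_ms=30.0, load=0.40),
    ]
}
answer = answer_dns_query("cdn.example.com", replicas_by_host)
print(answer["address"])  # 192.0.2.20
```

Keeping the two functions separate mirrors the distinction above: the selection rule can change without touching how the answer reaches the client, and the mechanism can change without touching the rule.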

2.2.1 Current request rerouting mechanisms

Augmented DNS

While multiple approaches to CDN request rerouting currently exist, companies predominantly use augmented DNS servers or HTTP-based redirects [12, 13]. CDNs that use DNS based request rerouting mechanisms utilise augmented DNS servers that act as the authoritative domain for their customers³. With this method of request rerouting, when an end user’s browser visits a website hosted by the CDN, it must first make a DNS request for the IP address corresponding to that hostname. For example, when an end user wishes to visit a website, their browser first makes a DNS request in order to create an IP address/hostname mapping⁴. When the authoritative DNS name server for that website receives the request⁵, it uses parameters such as latency, server load, and the geographical location of the end user to determine which replica node can provide the best QoE to the end user. By altering the responses given by their DNS servers, CDNs utilising this request rerouting mechanism can choose which content servers provide content to end users.

URL redirection

DNS is also used in a request redirection mechanism known as URL redirection. In this mechanism, when a browser initially resolves a URL, it makes an initial GET request towards the IP address that maps to that URL. The server at that IP address then replies in a manner that causes the client to be directed to another URL, either by an HTTP status code, a JavaScript redirect, or other methods [14]. For example, when a user visits one URL in their browser, they may be provided with an HTTP “301 Moved Permanently” status code along with a second URL, and the browser automatically redirects to the new URL. Unfortunately, this method incurs extra access delay for each element that is accessed, as the initial GET request is essentially wasted: it only serves to provide a redirect towards another URL. Additionally, another DNS request has to be performed to resolve the IP address of this new URL⁶. This is acceptable for large blocks of content such as videos, as it only incurs a one-off delay, but undesirable for smaller content such as images.

IP Anycast

Another request rerouting mechanism involves allocating an anycast IP address [15] to multiple CDN replica servers. In this mechanism, all DNS servers reply with the same IP address for CDN content servers. When a client makes a content request using that IP address, however, they are routed to the closest CDN replica server due to the nature of IP anycasting.

³ Akamai is an example of a CDN provider that does this [5].
⁴ A Wireshark trace of a DNS request chain is provided in Appendix .1.
⁵ This assumes that the client’s original DNS resolver did not return a cached result. More detail is provided in Section 3.2.2.

2.2.2 Limitations of current request rerouting mechanisms

While current mechanisms of request rerouting work, they invariably suffer from inherent limitations due to both the nature of their implementation and a lack of network metrics available to traditional CDN operators [3].

DNS caching

Specifically, a DNS based request rerouting mechanism relies on end users actually making DNS requests, in order to provide them with an IP address mapping to an optimal CDN replica server. While this works in principle, it does not take into consideration caching of DNS information by end users. For example, if a user visits a CDN hosted website and caches the DNS information provided by the CDN’s DNS server, then future requests for content hosted on that website will not generate a DNS request until the cache entry times out. This can cause issues if a more optimal CDN replica server becomes available in the meantime. Currently, CDN operators assign very low TTLs to their DNS entries in order to work around this [5], typically under a minute⁷. However, nearly all internet users are currently in situations where these TTLs are not honoured. For September 2014, the w3schools organisation reports these statistics [16] on browser usage:

⁶ The HTML for the example page is provided in Appendix .2.
⁷ For example, performing a DNS request for a CDN hosted hostname reveals a TTL of about 20 seconds on the responses.

Browser              Market share
Chrome               59.6%
Firefox              24.0%
Internet Explorer    9.9%
Safari               3.6%
Opera                1.6%

Table 2.1: Browser usage by market share

Out of these browsers:

• Chrome caches DNS entries for 60 seconds or the DNS TTL, whichever is lowest [17]

• Firefox caches DNS entries for 60 seconds [18]

• Internet Explorer caches DNS entries for 30 minutes [19]

• Opera caches DNS entries for 10 minutes [20]
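To make the consequence of these cache lifetimes concrete, the short Python sketch below computes how long a cached answer can outlive the TTL chosen by the CDN. The cache durations are the ones listed above; the 20 second TTL is an assumed example value, and the function names are hypothetical.

```python
# Cache lifetimes in seconds, taken from the browser list above.
FIXED_CACHE_SECONDS = {
    "Firefox": 60,
    "Internet Explorer": 30 * 60,
    "Opera": 10 * 60,
}
CHROME_CAP_SECONDS = 60


def effective_cache_seconds(browser, ttl):
    """How long the browser keeps a DNS answer issued with the given TTL."""
    if browser == "Chrome":
        # Chrome keeps the entry for 60 seconds or the TTL, whichever is lower.
        return min(CHROME_CAP_SECONDS, ttl)
    return FIXED_CACHE_SECONDS[browser]


def stale_window_seconds(browser, ttl):
    """Time during which the cached answer outlives the TTL set by the CDN."""
    return max(0, effective_cache_seconds(browser, ttl) - ttl)


ttl = 20  # assumed CDN TTL in seconds
for browser in ["Chrome", "Firefox", "Internet Explorer", "Opera"]:
    print(browser, stale_window_seconds(browser, ttl))
# Chrome 0
# Firefox 40
# Internet Explorer 1780
# Opera 580
```

Under these assumptions, any browser with a fixed cache lifetime longer than the TTL can keep using an address well after the CDN intended it to expire.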

From the referenced technical documentation, it is evident that the vast majority of DNS requests for CDN hosted content have their responses cached. If any other DNS caches are in use, the effect is amplified even further.

Chrome, Firefox, and Internet Explorer also perform ‘link prefetching’ [21, 22, 23], where DNS requests are made before a user actually follows those links. A browser achieves this by looking at the links on a page and performing the DNS requests for their hostnames. These browsers will also perform DNS requests during startup, to reduce the overall load time of a user’s most commonly accessed websites.

The two effects of DNS caching and link prefetching compound to create the following scenario: the IP address of a hostname can change while an end user is still using the ‘old’ IP address. For example, if a user launches their browser, generating a few DNS requests via link prefetching, and only visits the prefetched websites a minute later, it is possible that the IP address they are using is stale. This can lead to QoE degradation for the user: content may be out of date, or the CDN replica server may be congested, for example. Because of this, any serious attempt at improving the CDN redirection mechanism has to take into account user end DNS caching and link prefetching. Failure to do so implies that any change to the CDN can take a minute or more before its effects are felt by all users of the CDN.

Problems with anycast

The anycast method for directing users to CDN nodes, while largely immune to the problems with DNS caching described above⁸, still suffers from issues that make it inappropriate for ISP operated CDNs. Because IP anycasting routes users to the closest network that advertises a BGP route for that IP address, it fails to take into consideration other parameters that are important for a good QoE, such as network congestion. This can lead to additional network bottlenecks [3], and cause extra congestion on already congested links. In order for current anycasting mechanisms to redirect clients from a congested server to another, less congested server, the clients must either be redirected from their originally requested server (which adds to server load and latency), or changes to BGP routing tables must occur, which can take a relatively long time to propagate across a network [15].

Additionally, the configuration of an anycast mechanism can be different for each network. BGP routes have to be configured, and available IP spaces considered. Indeed, even industry-leading CDN operators such as MaxCDN [24] and Cloudflare [25] claim that “It is not easy to setup a true Anycasted network” and that IP anycasting is “complex”.

⁸ Since DNS requests for anycasted CDN nodes all point to the same anycast IP address, these IP addresses rarely, if ever, change.

2.3 Software Defined Networking

Software defined networking is a new networking paradigm that decouples the control plane in a network from the data plane. Instead of routing decisions being made by a layer 2/3 device, these decisions are made by a controller that has a view of the entire network. This overarching controller then installs flow rules into switches on the network, which allows it to perform networking functions similar to those of current networks. This allows network operators to make rapid, adaptable changes to a network as needed.

Currently, the most popular SDN protocol used for this purpose is OpenFlow [26], which defines the communication between an SDN controller and the switches that it governs. By using OpenFlow, network researchers can perform networking experiments and achieve network functionality in an easily reproducible and flexible manner. Because OpenFlow provides software abstractions and APIs for hardware functions, it becomes easier to programmatically apply network rules and functionality from a centralised location.

In this thesis, SDN and OpenFlow are used to dynamically create the redirection rules that direct end users to more optimal CDN nodes. A controller using the open source Ryu framework [27] communicates with an OpenFlow capable switch to push flow rules as needed.
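The kind of decision such a controller makes for each new TCP flow can be sketched in plain Python. This is an illustrative sketch only: it does not use the Ryu or OpenFlow APIs, the rules are represented as ordinary dictionaries, and the addresses, subnet mapping and function names (`optimal_for`, `plan_flow`) are hypothetical.

```python
import ipaddress

# Hypothetical controller state: addresses of known CDN replicas and the
# replica currently considered optimal for each client subnet.
CDN_REPLICAS = {"203.0.113.10", "203.0.113.20"}
OPTIMAL_REPLICA = {ipaddress.ip_network("10.0.1.0/24"): "203.0.113.20"}


def optimal_for(client_ip):
    """Return the optimal replica for a client, or None if the client is unknown."""
    addr = ipaddress.ip_address(client_ip)
    for subnet, replica in OPTIMAL_REPLICA.items():
        if addr in subnet:
            return replica
    return None


def plan_flow(client_ip, client_port, dst_ip):
    """Decide which rules to install for a new TCP flow from a client."""
    target = optimal_for(client_ip)
    outgoing = {"ip_src": client_ip, "tcp_src": client_port, "ip_dst": dst_ip}
    if dst_ip not in CDN_REPLICAS or target is None or target == dst_ip:
        # Not CDN traffic, or already heading to the optimal replica.
        return [{"match": outgoing, "actions": ["forward"]}]
    return [
        # DNAT: rewrite the destination of outgoing packets to the optimal replica.
        {"match": outgoing, "actions": [("set_ip_dst", target), "forward"]},
        # SNAT: make replies appear to come from the address the client used.
        {"match": {"ip_src": target, "ip_dst": client_ip, "tcp_dst": client_port},
         "actions": [("set_ip_src", dst_ip), "forward"]},
    ]


# A client holding a stale address for replica .10 is steered to replica .20.
for rule in plan_flow("10.0.1.5", 40000, "203.0.113.10"):
    print(rule)
```

Matching on the client address and TCP source port keeps the pair of rewrite rules tied to a single flow, so the client keeps seeing the address it originally resolved.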

2.4 Literature review

Much work has been done on CDNs [5, 7] in the last decade, due to the large effect they have on the effectiveness of content delivery. However, despite the fact that DNS caching has been around since CDNs were first established, the effects of DNS caching by end users have not yet been explored in depth in published research.

A recently established working group in the IETF is developing a set of standards, called CDNI [28], that define interconnections and collaborations between CDNs. Work has been done within this set of standards to provide CDNI request routing using SDN methods [29]. However, it focuses on the request routing between upstream and downstream CDN providers in a CDNI, and not between an end user and CDN replica nodes.

As discussed in Section 2.2.1, a current request redirection mechanism for CDNs is IP anycasting. Current research by Yulei Wu et al. [30] demonstrates an SDN implementation of IP anycast that overcomes IP anycast’s main issue of being unaware of network conditions such as congestion: a so-called ‘load-aware’ anycast solution. While this implementation would work for a CDN that is wholly owned and operated by an ISP, it remains inflexible and unscalable for ISP CDN collaborations. Further analysis on this matter is provided in Section 3.2.1.

OpenCache [31], an SDN caching platform, has recently been developed and is currently under research. However, it is more focused on providing a caching solution for high quality streaming video. Its authors claim that OpenCache can be used to enhance CDN ISP collaborations, and indeed, work from this thesis could be combined with OpenCache to provide tools for a unified CDN collaboration/ISP CDN framework. Also, their implementation involves flow granularity at the IP level, whereas our request rerouting mechanism is based on TCP sessions.

Work done by Akamai [3] on CDN ISP collaboration has shown that a DNS based system can adequately perform user server assignment in ISP CDNs. It also discusses the viability of rapidly ‘spinning up’ virtual cloud servers, in less than a minute, to act as CDN replica servers. This confirms that a request redirection mechanism that can actively redirect users to more optimal CDN replica nodes in under a minute is a desirable outcome.

An example of an existing ISP operated CDN is the one operated by Comcast [32], which uses a combination of DNS and URL redirection techniques to direct customers to suitable caches for their requests. However, its DNS mechanism does not avoid the limitations associated with user side caching. Also, the network is focused on utilising static caches in pre-established locations, with no mention of the possibility of utilising a virtualised replica node.

Chapter 3

Request Rerouting Mechanisms for ISP Operated CDNs

3.1 Problem statement & design goals

In order for a request rerouting mechanism to be successful, it should fulfill the following criteria:

• Be transparent to the end user and require no extra configuration client-side

• Successfully provide clients with addresses to content servers

• Provide clients with access to optimal servers for their content requests

• Be failure resilient

These are the base requirements that all current request rerouting mechanisms aim to achieve [12, 3]. Also, as detailed in Section 2.2.2, current CDN mechanisms:

• Are unable to provide rapid server changes to end users due to the effects of DNS caching (In the case of augmented DNS)

• Are unable to fully utilise network information available to a network operator (In the case of IP anycasting)

• Require complicated configuration to set up (In the case of IP anycasting)

As such, any request rerouting mechanism created as the result of work performed under this thesis should also fulfill the following requirements:

• In the case of DNS caching and/or prefetching and network changes that may happen during that caching time-frame, be able to continually and transparently reroute users to the optimal CDN replica node for their request.

• Be able to fully utilise information available to the network operator

• Minimise the configuration necessary to implement our proposed mechanism, so as to maximise ease of ISP-CDN collaboration and ISP-CDN establishment.

Because the scope of this thesis is to provide a request rerouting mechanism for ISPs, it is also desirable to define a set of design goals that enable a simpler, more attractive implementation of our proposed mechanisms for ISPs:

• Be incrementally implementable, to lower implementation costs and reduce network downtime

• Be able to utilise and act on the ability to rapidly create virtual CDN replica servers using cloud services¹

• Be scalable for network and service expansion.

• Optionally, be able to provide access to the CDN for end users that are not customers of the ISP.

These requirements will be used as a guideline throughout the rest of this document in design, execution, and results evaluation.

3.2 Analysis of possible mechanisms

This section analyses possible methods of performing request rerouting for ISP CDNs. A NAT-based augmentation of DNS services is the method proposed by this thesis, and a detailed justification of that choice is provided here.

3.2.1 Anycasting CDN nodes

One of the initial considerations when beginning this thesis project was to utilise IP anycasting to perform CDN rerouting. However, further analysis of this mechanism revealed that it would be unsuitable for our required goals. What follows is the analysis that led to this decision:

Anycasting allows for a one-to-many association of a hostname to CDN replica server endpoints. That is, when a browser makes a DNS request for CDN hosted content, it will always receive an anycasted IP address that leads to the closest replica server that responds to that address. This makes the issues caused by DNS caching non-existent, as the IP address in an anycasting setup should ideally never change. However, this is also a drawback: because IP anycasting only directs requests towards the closest server replica, it does not consider server load. On the other hand, this also means that clients using remote DNS services² will still be directed to a CDN replica server close to them without requiring the use of EDNS.
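The drawback described above can be shown with a small Python sketch that compares a closest-replica choice with a load-aware one. The replica names, hop counts, load figures and the 0.9 load limit are hypothetical values chosen for illustration, and the functions model only the selection outcome, not how BGP actually routes anycast traffic.

```python
# Hypothetical replicas: hop count from one client and current load (0.0 to 1.0).
replicas = {
    "A": {"hops": 10, "load": 0.30},
    "B": {"hops": 7, "load": 0.95},
}


def anycast_choice(replicas):
    """Plain IP anycast: the topologically closest replica wins, load is ignored."""
    return min(replicas, key=lambda name: replicas[name]["hops"])


def load_aware_choice(replicas, max_load=0.9):
    """Load-aware alternative: closest replica among those below the load limit."""
    usable = {n: r for n, r in replicas.items() if r["load"] < max_load}
    # If every replica is above the limit, fall back to the closest one.
    return anycast_choice(usable or replicas)


print(anycast_choice(replicas))     # B
print(load_aware_choice(replicas))  # A
```

With these figures, plain anycast sends the client to the nearly saturated replica B simply because it is three hops closer, while a selection that can see load would prefer replica A.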

¹ As per recommendations in Akamai’s ISP collaboration paper [3].
² Services such as OpenDNS, as opposed to the DNS servers provided by an ISP.

Additionally, the nature of IP anycasting creates a potential to pollute the global routing system, as it relies on BGP to function. This can lead to major network incidents that affect uptime. One example occurred in October 2014, when a Latin American ISP accidentally caused traffic destined for Cloudflare to concentrate on a single data center, quickly overwhelming it [33]. This incident was caused by a false BGP announcement, where the Latin American ISP announced “via BGP that their network, instead of ours, handled traffic for CloudFlare”. It is important to note that failures of this type lead to a complete loss of access to the CDN datacenter for the duration of the failure. In this incident, access to Cloudflare’s services was lost for 49 minutes in not just Latin America, but also other parts of the world: “Traffic to CloudFlares customers sites dropped by 50% in North America and 12% in Europe.” This is a catastrophic failure for a service that aims to increase the availability of the content it hosts.

This reliance on BGP means that an IP anycasted network will also require custom configuration and set up per network. As stated in industry blogs for leading CDNs that utilise IP anycasting themselves, this configuration can be complex and hard to set up [24, 25], which may be off-putting for ISPs looking to lower costs.

With an increasing trend in utilising cloud services [3], the use of cloud services to rapidly establish CDN replica servers is becoming a distinct possibility. This can cause issues with IP anycasting, however: because IP anycasting directs packets to the nearest server that responds to the anycasted IP address, a newly created server will ‘suck in’ requests from nearby clients.
For example, if a client is streaming video from server A that is 10 hops away, and server B is then brought up at a location 7 hops away, further requests from that client will be forwarded to server B, causing their video streaming session to close suddenly and decreasing QoE for the client. Past research by Hitesh Ballani et al. [34] indicates that this behaviour is to be expected in current anycast systems. While current CDN operators that implement IP anycast may have techniques to avoid this form of route flapping, they are currently unavailable to the author, and so will not be considered for this facet of IP anycasting.

In summary, IP anycasting in its current state has the following advantages and disadvantages³:

Advantages:

• Not affected by client-side DNS caching

• Not affected by the use of remote DNS servers

Disadvantages:

• Catastrophic failure modes⁴

• Unable to utilise cloud services due to route flapping

• Unable to utilise network information, such as congestion

• Requires a lot of configuration⁴

• Possibly affected by BGP pollution/route leakage

³ Ignoring the inherent advantages and disadvantages that come with using a CDN, such as increased availability of content.
⁴ Compared to a DNS based mechanism, below.

Using an IP anycasting mechanism fulfils the following design goals⁵:

Able to work around DNS caching/prefetching: Yes
Able to fully utilise network information: Yes [30]
Easy to configure: No
Incrementally implementable: No
Able to utilise and act on the ability to rapidly create virtual CDN replica servers: No
Scalable for network and service expansion: Yes
Able to provide access to the CDN for end users that are not customers of the ISP: Yes

Table 3.1: Design requirements fulfilled by utilising an IP Anycast method

While an SDN implementation of IP anycast could work around a few of the disadvantages, such as the inability to utilise network information [30], the other inherent disadvantages of IP anycasting heavily outweigh the advantages that it provides. Namely, the inability to utilise cloud services to rapidly create extra network resources as necessary 6, together with the heavy configuration requirements, means that IP anycasting in its current state is not suitable for the rapidly expanding content delivery requirements [8] of future ISP networks in the context of ISP-CDN collaborations.

3.2.2 Using DNS augmented with NAT

The request rerouting mechanism proposed by this thesis builds on current DNS-based CDN rerouting mechanisms, but augments them with NAT to overcome the issues of client-side DNS caching. This subsection contains the analysis that led to the decision to use this method, as well as a detailed explanation of the method itself. It also examines whether creating such a request rerouting mechanism is worthwhile.

As mentioned in Chapter 2, a currently popular method for performing CDN rerouting is with augmented DNS servers. Current implementations of DNS-based rerouting mechanisms already fulfill most of the design goals mentioned at the start of this chapter; indeed, some current ISP CDN implementations utilise augmented DNS [32] to perform request rerouting, indicating that CDN rerouting with DNS is at least ‘good enough’ for production ISP CDN networks. Knowing that DNS-based rerouting is already a viable option for ISP CDN request rerouting, it is now important to determine whether an improved request rerouting mechanism is worth both developing and implementing in ISP operated CDNs. The main limitation of current DNS request rerouting mechanisms identified in this thesis is that clients perform DNS caching that does not honour the TTLs set by the CDN’s DNS resolver. This limitation implies that, for an ISP wishing to utilise rapid establishment of CDN replica servers, their users will potentially only see the improvements of such an action

5 Ignoring the base requirements needed for a successful CDN rerouting mechanism, as outlined in the beginning of this chapter
6 Which Akamai encourages [3] for future CDN collaborations

minutes after it was put into effect. Similarly, this also means that it may take over a minute simply to reroute users to other CDN replica nodes that already exist in the network. This can lead to windows in which end users are not provided with the best QoS that the network can achieve, despite those resources existing.

Given that one of the aims of current ISP CDNs is to deliver video content such as Live TV and Video On Demand [2, 32], degraded QoS attributes such as access delay time and bitrate can severely lower customer satisfaction and engagement in ISP networks. A study by Florin Dobrian et al. [35] indicates that a 1% increase in buffering ratio 7 can reduce user engagement by more than 3 minutes for a 90-minute live video event. As user expectations for network performance increase over the years [8], this will become an even more important factor in customer satisfaction.

Scenarios where an ISP may want to rapidly establish extra CDN replica servers, or otherwise reroute users away from the node they are currently connected to, include:

• Flash floods, where a sudden influx of users (say, to watch a live sports event) can potentially overload a CDN cache location

• Sudden link/datacenter failure

• Sudden availability of more current content

• Wanting to redistribute link congestion across the network (maybe to ensure SLAs for other ISP customers)

As a real-life example of the importance of being able to handle the first point above, an outage on the HBO GO network during a season premiere of Game of Thrones in April 2014 affected nearly a million people [36]. The amount of public negativity that occurred after this event demonstrates how a lack of accessibility during key moments can have a large impact on a network’s reputation 8. Thus, it would be ideal to be able to minimise the possible negative effects of flash floods on CDNs. To that end, it is clear that a CDN rerouting mechanism that can accommodate a rapid introduction of new CDN replica nodes has value in an ISP CDN.

Research done by Akamai [3] indicates that it is possible to “add, remove, and migrate live servers on demand in less than a minute”, indicating that it would be possible for an ISP CDN network to overcome incidents like the one above by allocating extra CDN replica servers as demand increases. Recent work by Justine Sherry et al. [37] indicates that it is practical and feasible for enterprise networks to outsource middlebox processing to the cloud. They also indicate that “To allow redirection using multiple cloud PoPs, we can rely on DNS-based redirection similar to its use in CDNs”.

So, knowing that: a) it is important to keep QoS high for the types of content that an ISP CDN is likely to provide; b) it is possible to keep QoS high by rapidly using cloud services to spin up virtual machines that run CDN replica servers; and c) DNS caching means that any changes to network topologies can potentially take over a minute to be

7 The fraction of a viewing session spent idle and buffering, as opposed to playing the video
8 This is becoming increasingly true with the advent of social media platforms such as Twitter, where dissatisfied patrons can vent in public

fully noticed by an end user, it is therefore both desirable and effective to have a request rerouting mechanism that can overcome the effects of DNS caching by end users.

Because current augmented DNS mechanisms for request rerouting are so widespread [12] and are effective at rerouting users to CDN replica servers, a decision was made to develop a mechanism that additionally augments the current DNS rerouting methods in such a way that the issues associated with DNS caching are avoided. As such, we propose a CDN redirection mechanism in which request rerouting happens via the regular augmented DNS method. However, if the network determines that an end user is requesting content from an unoptimal node, that request is transparently redirected to the optimal node for that end user. This redirection occurs at one of the ISP’s edge routers that serves that user, and is achieved by using Software Defined Networking techniques to install flow rules that perform the redirection on these packets at the edge router.

In this context, there is a network controller device with access to network statistics from the ISP CDN network. Using this information, the controller decides whether or not to push flow rules to particular edge switches so that they perform redirection on packets that pass through them. Transparent user redirection is achieved by performing destination NAT (DNAT) on the user’s outgoing packets, redirecting them towards the appropriate CDN replica server. Packets replying from the CDN replica server have source NAT (SNAT) applied to them at the edge router before they are sent to the end user. The controller logic can be explained with the following rules:

• The controller knows the location of all customers in the ISP network. Similarly, it also knows the location of all current CDN replica nodes.

• The controller therefore knows which CDN replica nodes are optimal for each set of customers that are served by an edge router in the ISP network. For example, CDN replica server A should serve the customers in the suburb of Ashfield, who are all connected to the rest of the ISP network via a single edge router at their local exchange.

• If and when network conditions change, the mappings set out above may also change. For example, CDN replica server A may be currently under high congestion, so the suburb of Ashfield should instead be directed to CDN replica server B.

• If and when these mappings of servers to customers change, the controller shall install routing rules into the edge routers that serve these customers, which transparently redirect them towards the more optimal server. That way, if they are using stale CDN information, they still get directed to the optimal CDN replica server for them. If they are using the new CDN information, routing happens as usual to the expected destination.
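The rules above can be illustrated with a minimal Python sketch. It is not the controller code from the Appendix; the class name, the router and replica names, and the simple "first uncongested replica in preference order" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ReplicaMapper:
    """Tracks which CDN replica the customers behind each edge router should use."""

    # Preference-ordered replicas per edge router (hypothetical names).
    preferences: Dict[str, List[str]]
    congested: Set[str] = field(default_factory=set)

    def optimal(self, edge_router: str) -> str:
        """Return the first preferred replica that is not congested."""
        candidates = self.preferences[edge_router]
        for replica in candidates:
            if replica not in self.congested:
                return replica
        # Every replica is congested: fall back to the first preference.
        return candidates[0]

    def set_congested(self, replica: str, is_congested: bool) -> List[str]:
        """Update a replica's state; return the edge routers whose mapping changed."""
        before = {router: self.optimal(router) for router in self.preferences}
        if is_congested:
            self.congested.add(replica)
        else:
            self.congested.discard(replica)
        return [r for r in self.preferences if self.optimal(r) != before[r]]


# Ashfield's edge router prefers replica A, with replica B as the alternative.
mapper = ReplicaMapper({"ashfield-edge": ["replica-A", "replica-B"]})
changed = mapper.set_congested("replica-A", True)
```

In this sketch, `changed` lists the edge routers that would need new redirection rules pushed to them; routers whose mapping did not change are left alone.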

Following this controller logic, the CDN request rerouting process for an end user in this scenario is as follows:

1. A client types a URL into their browser, wishing to view content that is hosted on the ISP CDN.

2. The client’s browser performs a DNS request to resolve the hostname in the URL to an IP address.

(a) Either this request is fulfilled by the browser/OS DNS cache, and a result is returned instantly, or

(b) The requested hostname is not in the cache, and is resolved by an external DNS server.

3. The client’s browser creates a new TCP session to communicate with the CDN replica server. This session can be identified by its source and destination IP addresses and its source and destination TCP ports (for web traffic, the destination port is typically 80 for HTTP or 443 for HTTPS).

4. The initial packet for this TCP session reaches an edge router belonging to the ISP.

(a) If the IP destination for this TCP session is the same as the optimal CDN server that has been mapped for this customer, then no redirection needs to occur. A new flow that routes this packet towards the CDN server is created, and the packet is routed as per normal.

(b) If the IP destination for this TCP session does not match the IP of the optimal CDN server that has been mapped for this customer, then redirection needs to occur. A new flow is created that (i) performs a destination NAT, changing the destination IP address of packets that belong to this TCP session from the old CDN server address to the new optimal CDN address; and (ii) routes the packet towards the optimal CDN node, as a regular router would. A ‘reverse’ flow is also created that performs source NAT on packets returning from the CDN replica server.

Because new flows are created only when new TCP sessions pass through an ISP edge router, existing sessions are never redirected mid-flow, and scenarios where end user TCP flows are suddenly and unexpectedly terminated are thus avoided.
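Step 4 of the process above can be sketched as a small decision function. This is an illustrative model only: the dictionary layout, the action names, and the IP addresses (taken from the documentation address ranges) are assumptions, not the flow format used by our controller or by OpenFlow.

```python
from typing import Dict, List, NamedTuple


class Session(NamedTuple):
    """The four values that identify a TCP session."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def flows_for_new_session(s: Session, optimal_ip: str) -> List[Dict]:
    """Return the flow entries to create for the first packet of a session."""
    match = {"ip_src": s.src_ip, "tcp_src": s.src_port,
             "ip_dst": s.dst_ip, "tcp_dst": s.dst_port}
    if s.dst_ip == optimal_ip:
        # Step 4(a): already heading to the optimal replica, route as usual.
        return [{"match": match, "actions": ["route"]}]
    # Step 4(b): rewrite the destination on the way out (DNAT) ...
    forward = {"match": match,
               "actions": [("set_ip_dst", optimal_ip), "route"]}
    # ... and restore the address the client expects on the way back (SNAT).
    reverse = {"match": {"ip_src": optimal_ip, "tcp_src": s.dst_port,
                         "ip_dst": s.src_ip, "tcp_dst": s.src_port},
               "actions": [("set_ip_src", s.dst_ip), "route"]}
    return [forward, reverse]


# Hypothetical addresses: a client, a stale replica, and the optimal replica.
session = Session("192.0.2.10", 18445, "198.51.100.2", 80)
no_redirect = flows_for_new_session(session, "198.51.100.2")
redirect = flows_for_new_session(session, "198.51.100.3")
```

Because both entries match on the full session tuple, a later session from the same client is evaluated afresh rather than inheriting the earlier decision.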

3.3 Experimental set up of a DNS request rerouting mechanism augmented with NAT

In this section, we describe the framework and programs that were used to create, implement, and test the request rerouting mechanism outlined above.

3.3.1 Testing framework

The network topology

We use Mininet [38], a network virtualisation tool, to create a virtual network topology in which we perform our testing. Mininet provides an API that allows programmers to quickly and easily create a network topology, populate it with hosts, switches and controllers, and run network tests. A copy of the source code used to create the topology is included in the Appendix .4. In our tests with Mininet, we also use Open vSwitch 2.02 for our virtualised switches. Our topology contains the following devices:

• h1: A host used for testing our ‘improved’ CDN rerouting mechanism. It acts as our host that exists within the ISP’s network. It has a 20Mbps connection to s1 to simulate an ADSL2+ high speed subscriber line.

• h2, h3: Webservers acting as our CDN replica servers. They both serve identical websites and content. When uncongested, they have a 100Mbps connection to s1. When we are simulating them as being congested, we lower that connection speed down to 1Mbps.

• h4: A host used for contrasting standard CDN rerouting mechanisms with our ‘improved’ mechanism. It acts as our host that exists outside the ISP’s network, and thus does not receive the effects of our proposed request rerouting mechanism. It has a 20Mbps connection to s1 to simulate an ADSL2+ high speed subscriber line.

• h5: A DNS server

• s1: An OpenFlow enabled switch

• c0: Our Ryu controller.

All links have been configured to add a 10ms delay to packets passing through them, to simulate network locations that are all close to an ISP edge router. This network topology was run under the Ubuntu 14.04 operating system. It is important to note that the Ubuntu operating system does not operate an inbuilt DNS cache. Figure 3.1 shows how our topology was linked together.
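To make the link parameters concrete, the following sketch records the bandwidths and delays listed above and derives the round-trip delay and bottleneck bandwidth between a client and a replica. It is plain Python rather than the Mininet script from the Appendix, and it assumes the 10ms delay applies once per link in each direction; h5 is omitted because its link bandwidth is not stated.

```python
# Link parameters taken from the device list above; every link adds 10 ms.
LINKS = {
    ("h1", "s1"): {"bw_mbps": 20, "delay_ms": 10},
    ("h2", "s1"): {"bw_mbps": 100, "delay_ms": 10},
    ("h3", "s1"): {"bw_mbps": 100, "delay_ms": 10},
    ("h4", "s1"): {"bw_mbps": 20, "delay_ms": 10},
}
CONGESTED_BW_MBPS = 1  # replica link speed when simulating congestion


def path_properties(client, server, congested=False):
    """Return (round-trip delay in ms, bottleneck bandwidth in Mbps) via s1."""
    client_link = LINKS[(client, "s1")]
    server_link = dict(LINKS[(server, "s1")])  # copy so LINKS is not modified
    if congested:
        server_link["bw_mbps"] = CONGESTED_BW_MBPS
    hops = [client_link, server_link]
    one_way_ms = sum(hop["delay_ms"] for hop in hops)
    return 2 * one_way_ms, min(hop["bw_mbps"] for hop in hops)


uncongested = path_properties("h1", "h2")
congested = path_properties("h1", "h2", congested=True)
```

Under these assumptions the subscriber line is the bottleneck when the replica is uncongested, and the replica's link becomes the bottleneck when it is congested.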

Figure 3.1: Our network topology diagram

The SDN controller

Ryu [27] was selected as the controller framework for this thesis. It was mainly chosen because the author was proficient with Python, the language that the Ryu framework uses. This framework provides a rapid prototyping platform for testing SDN OpenFlow controller applications, and an in-built REST API. We use the OpenFlow 1.3 spec to communicate between our controller and network switch. While OpenFlow 1.4 is the most current version at the time of writing, Ryu’s last stable release during the course of this thesis supported OpenFlow 1.3. Network changes are notified to the controller through its REST API, which is called from shell scripts. The source code for our controller can be found in the Appendix .3.

On startup, the controller installs two table-miss flow rules on the switch: one that matches on all traffic, with priority 0 (as seen in Table 3.2), and one that matches on TCP HTTP traffic coming from h1 and headed towards the CDN subnet 9, with priority 10 (as seen in Table 3.3). If a packet matches the rule with priority 0, then there are no other rules on the switch that match that packet. This causes the switch to forward the packet to the controller for processing.

9 This subnet could be of any size or value; the one used here was chosen for convenience

When this packet arrives at the controller, its source MAC address is mapped to its ingress port. If its destination MAC address is known to the controller, a flow rule is created for this particular source MAC/destination MAC pairing to prevent further packet-ins. Otherwise, the packet is flooded out from the switch. In this manner, our controller causes the switch to act exactly as a learning switch.
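The learning-switch behaviour described above can be modelled in a few lines. This is a plain Python sketch of the logic only; it does not use the Ryu API, and the class and attribute names are assumptions.

```python
FLOOD = "FLOOD"


class LearningSwitch:
    """Models how the controller handles packets that hit the priority-0 rule."""

    def __init__(self):
        self.mac_to_port = {}  # learned MAC address -> switch port
        self.flows = []        # (source MAC, destination MAC, output port)

    def packet_in(self, src_mac, dst_mac, in_port):
        """Learn the sender's port, then forward or flood the packet."""
        self.mac_to_port[src_mac] = in_port
        out_port = self.mac_to_port.get(dst_mac)
        if out_port is None:
            # Destination not seen yet: flood out of the switch.
            return FLOOD
        # Destination known: record a flow so later packets of this
        # MAC pairing no longer need to reach the controller.
        self.flows.append((src_mac, dst_mac, out_port))
        return out_port


switch = LearningSwitch()
first = switch.packet_in("aa:aa", "bb:bb", in_port=1)   # bb:bb unknown
reply = switch.packet_in("bb:bb", "aa:aa", in_port=2)   # aa:aa was learned
```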

Priority | Ingress Port | Protocol | IP Source Address | IP Destination | Source Port | Destination Port | Actions
0 | * | * | * | * | * | * | Send packet to controller

Table 3.2: Table miss flow that catches everything. Used as part of learning switch functionality

Priority | Ingress Port | Protocol | IP Source Address | IP Destination | Source Port | Destination Port | Actions
10 | * | TCP | h1 | CDN subnet | * | 80 | Send packet to controller

Table 3.3: Table miss flow that catches traffic from h1 to the CDN replica servers.

The second table-miss flow entry, however, matches on HTTP traffic with the source IP of h1 and a destination IP in the CDN subnet. Traffic that matches this rule at the switch is also sent to the controller. Depending on the controller and network state, one of two things then happens. If there have been no changes to the network topology, then h1’s traffic that is directed towards the CDN should have a destination IP address that maps to the ideal CDN node for that customer. In this scenario, a regular flow rule (as seen in Table 3.4) is created that matches on the HTTP TCP session of the packet that was sent to the controller. This means any further packets from h1 to the CDN that belong to this TCP session will not hit the table-miss flow, and will be forwarded normally.

Priority | Ingress Port | Protocol | IP Source Address | IP Destination | Source Port | Destination Port | Actions
20 | * | TCP | h1 | CDN replica | 18445 | 80 | Send out on switch port corresponding to destination IP address

Table 3.4: An example forwarding flow inserted into the switch in the case that there is no need to further redirect the end user. Note that the IP destination and TCP source port values are context dependent; the values here are just examples.

If there have been topology changes however, and h1 is sending packets to a nonoptimal CDN replica server, then the controller detects that the destination IP of this packet does not match the IP address of the optimal CDN replica server for h1. In this scenario, a flow

that performs a DNAT action on the packets belonging to this TCP session is created. Correspondingly, a flow that performs an SNAT action on the returning packets of this session is also created. These flows can be seen in Table 3.5. To notify the controller of these topology changes, a REST API is used to update the controller with the current network status.

Priority | Ingress Port | Protocol | IP Source | IP Destination | Source Port | Destination Port | Actions
20 | * | TCP | h1 | stale CDN replica | 18445 | 80 | DNAT destination IP from the stale replica to the optimal replica; send out on switch port corresponding to destination IP address
20 | * | TCP | optimal CDN replica | h1 | 80 | 18445 | SNAT source IP from the optimal replica to the stale replica; send out on switch port corresponding to destination IP address

Table 3.5: Example flows inserted into the switch in case there is a need to further redirect the end user. Note that as with the previous table, the IP address and TCP port values are arbitrary in this case, but remain relevant between the DNAT and SNAT rules.

A trace of typical HTTP TCP sessions under this request rerouting framework is provided below in Figure 3.2.

Figure 3.2: A Trace demonstrating the typical NAT procedure that happens to a TCP flow.

The packets are as follows:

1. DNS request for

2. DNS response for, IP of

3. TCP-SYN from to as seen from h1’s point of view

4. TCP-SYN from to, after it has been DNAT’d at the switch s1 (Packet now TCP-SYN from to

5. TCP-SYN, ACK from to before SNAT

6. TCP-SYN, ACK from to after SNAT (Packet now TCP-SYN, ACK from to

7. The packets afterwards follow the same pattern of DNAT and SNAT.
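The address rewriting in the trace above can be reproduced with a small model of the two NAT actions. The IP addresses below are hypothetical values from the documentation address ranges, not the addresses used in our topology, and the functions only model the rewrite that the switch performs.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    flags: str


def dnat(pkt, stale_ip, optimal_ip):
    """Outbound: replace the stale replica address with the optimal one."""
    return pkt._replace(dst_ip=optimal_ip) if pkt.dst_ip == stale_ip else pkt


def snat(pkt, stale_ip, optimal_ip):
    """Inbound: make the reply appear to come from the address the client used."""
    return pkt._replace(src_ip=stale_ip) if pkt.src_ip == optimal_ip else pkt


# Hypothetical addresses for the client, the stale replica, and the optimal one.
CLIENT, STALE, OPTIMAL = "192.0.2.10", "198.51.100.2", "198.51.100.3"

syn = Packet(CLIENT, 18445, STALE, 80, "SYN")              # as sent by the client
syn_at_server = dnat(syn, STALE, OPTIMAL)                  # after DNAT at s1
syn_ack = Packet(OPTIMAL, 80, CLIENT, 18445, "SYN,ACK")    # reply before SNAT
syn_ack_at_client = snat(syn_ack, STALE, OPTIMAL)          # after SNAT at s1
```

From the client's point of view, the session is with the stale address throughout, which is what keeps the redirection transparent.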

These flow rules, which the controller installs on the switch, therefore redirect users on a per-TCP-session basis. This way, end users are not suddenly disconnected by a TCP session being redirected mid-flow. More analysis on this facet of our request rerouting mechanism is provided in Chapter 5. The code from our Ryu controller is provided in the Appendix .3. It uses some code snippets from the open source Ryu examples.

OpenFlow enabled switch

The switch is an instance of Open vSwitch run in the Mininet topology. We can inspect the flow tables of this switch at any time by issuing commands at the CLI. The switch is made to run the OpenFlow 1.3 spec. It operates as a learning switch with some extra flow rules that allow it to perform SNAT and DNAT operations on TCP flows that pass through it.

DNS server

Because this thesis is primarily concerned with the request rerouting mechanism, and not the request rerouting algorithm [12], we keep our DNS server simple. It is implemented on h5 via the BIND DNS nameserver [39]. When we want to simulate a network topology change (the addition of, or a change to, the optimal CDN replica server that the hostname maps to), we update the A record provided by h5 via a shell script. In our testing framework, this DNS server acts as the authoritative nameserver for the test website’s hostname, and sets the TTL on its responses to 30 seconds.

CDN replica servers

As above, because we are primarily concerned with the request rerouting mechanism, we keep our web servers simple. Our webservers at h2 and h3 both point to the same directory to serve their webpages; the content they serve is identical. This is functionally similar to a CDN provider that has replicas of content across multiple servers. We run our webservers using a simple Python script that serves files from the current working directory on port 80. Both h2 and h3 act as the webserver for the test website.

The test website consists of a landing page with links to other articles, and a link to firstpage.html, which is 36 megabits in size. This page links to secondpage.html, which is 31 megabits in size. Finally, this page links to thirdpage.html, which is 40 megabits in size. The content on each page consists of 10 high resolution pictures. There are also scripts on each page that automate the actions of the end users h1 and h4 that visit it. The landing page has a script that waits for 10 seconds, and then directs the browser to firstpage.html. On firstpage.html, there is a script that loads secondpage.html after firstpage.html has been fully loaded. Similar behaviour occurs on secondpage.html and thirdpage.html; thirdpage.html leads back to the landing page, allowing the cycle to continue again.
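As a rough sanity check on the page sizes, the sketch below estimates an idealised lower bound for one full cycle through the testpages. It assumes the sizes of 36, 31 and 40 megabits belong to firstpage.html, secondpage.html and thirdpage.html in that order, counts only the 10 second landing-page wait plus raw transfer time at the bottleneck bandwidth, and ignores the landing page's own size, link delay and TCP behaviour.

```python
# Page sizes in megabits, as read from the description of the test website.
PAGE_SIZES_MEGABITS = {
    "firstpage.html": 36,
    "secondpage.html": 31,
    "thirdpage.html": 40,
}
LANDING_WAIT_S = 10  # scripted wait on the landing page


def ideal_cycle_seconds(bottleneck_mbps):
    """Lower bound on one landing -> first -> second -> third page cycle."""
    transfer_s = sum(PAGE_SIZES_MEGABITS.values()) / bottleneck_mbps
    return LANDING_WAIT_S + transfer_s


fast = ideal_cycle_seconds(20)      # limited by the 20Mbps subscriber line
congested = ideal_cycle_seconds(1)  # limited by the 1Mbps congested link
```

These bounds, roughly 15.4 seconds and 117 seconds, sit below the measured reference load times of 22.8 seconds and 123.8 seconds reported in Chapter 4, as would be expected for a bound that ignores protocol overhead.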

The end users

h1 and h4 act as the end users in our network. They are simple Mininet hosts running web browsers. In our tests, two browsers are used: Mozilla Firefox and Google Chrome. This should provide a fair representation of a large proportion of the current browser market share: w3schools estimates that 83.6% of internet users use these browsers combined [16]. For the purposes of our testing, our controller only applies the proposed request rerouting mechanism to h1; this allows us to compare against h4, which only uses the standard DNS request rerouting mechanism. It is important to note that both of these browsers perform link prefetching and DNS caching of at least one minute.

The end users are configured to use h5 as their primary DNS server. End user automation is achieved via a series of JavaScript commands embedded in the test pages, which cause the end user to automatically move through pages at set intervals and after certain events, such as a page finishing loading. On both Chrome and Firefox, the content cache has been disabled (but not the DNS cache) in order to force the web browsers to fully download each page every time they visit it. Other tools such as AutoHotKey and shell scripting are used to further automate the timing, execution, and recording of the testing procedures.
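The interaction between the 30 second TTL set by h5 and the browsers' own DNS caching can be illustrated with a simple model. It assumes that a client keeps using a cached answer for whichever is longer, the record's TTL or the browser's minimum cache time; the function name and this max() rule are assumptions for illustration, not measured browser behaviour.

```python
def stale_window_s(lookup_t, change_t, ttl_s=30, browser_cache_s=60):
    """Seconds after change_t during which the client may still use the old record.

    lookup_t: time (s) at which the client resolved the hostname
    change_t: time (s) at which the A record was updated on the DNS server
    """
    expiry = lookup_t + max(ttl_s, browser_cache_s)
    return max(0, expiry - change_t)


# Client resolves at t=0; the A record changes 5 seconds later.
with_browser_cache = stale_window_s(0, 5)                # browser caches for 60 s
ttl_only = stale_window_s(0, 5, browser_cache_s=0)       # client honours the TTL
```

Under this model, a client that honoured the TTL would pick up the change within 25 seconds, while a browser that caches for a minute keeps the stale address for 55 seconds.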

3.3.2 Testing procedures

This subsection describes the testing procedures we perform, and the methodology and reasoning behind them. We also outline the goals of our testing procedures.

To recap, these are the design goals we aim to achieve with our request rerouting mechanism:

1. Able to work around DNS caching/prefetching

2. Able to fully utilise network information

3. Easy to configure

4. Incrementally implementable

5. Able to utilise and act on the ability to rapidly create virtual CDN replica servers

6. Scalable for network and service expansion

7. Able to provide access to the CDN for end users that are not customers of the ISP

From these, the ones we can directly test are goals 1, 2, and 5. The testing procedures we thus create must therefore be able to not only test for goals 1, 2, and 5, but also ideally test for the system performance in achieving those goals 10. To that end, the following test procedures were created:

• Request rerouting after link failure

• Request rerouting during link congestion

• Request rerouting with the introduction of a new CDN replica server

Within each of these procedures, we also define two types of tests: passive user and active user. Because the ultimate goal of a CDN is to increase QoE and QoS for its end users, it is desirable to model ‘expected’ user behaviour in times of duress. As such, we model one scenario where the user tries to access content by simply typing the URL into their browser and waiting (passive), and another where the user refreshes the page if congestion is experienced or the load time is greater than 20 seconds (active).

Request rerouting after link failure

In this test, we observe the effectiveness of our proposed request rerouting mechanism after a CDN replica server experiences a link failure. Here, we are primarily concerned with content access delay. First, the end user visits the test website, which points to CDN replica server h2. Doing so causes them to perform a DNS request that resolves to h2. They are assumed to stay on the landing

10 Bitrate, content access delay, etc.

page of the test website for 10 seconds while they read some of the news. During this time period, a shell script is called that disables the link between h2 and s1, and notifies the controller via the REST API that h2 has gone down and that requests for h2 should be redirected towards h3. At the same time, the DNS server at h5 updates its A record to reflect that fact. After the end user has stayed on the landing page for 10 seconds, they click on a hyperlink to the first testpage, which contains more content. We then time how long it takes for that page to successfully load. In the passive user case, if the page takes longer than one minute to load, we end the test. In the active user case, the user attempts to refresh the page if there are no signs of activity after 20 seconds.

Request rerouting during link congestion

In this test, we observe the effectiveness of our proposed request rerouting mechanism when our CDN replica server is under heavy congestion. In this scenario, we simulate the link between s1 and h2 (the switch and a CDN replica server) as being congested. This is achieved by lowering the bandwidth on the link between s1 and h2 to 1Mbps. As with the previous test, the test website initially points to h2. The test begins when a user visits the landing page. The user stays on the landing page for 10 seconds, and is then redirected to the first testpage. During this 10 second period, a shell script is executed that notifies the controller (c0) that CDN replica h3 is available and uncongested. Correspondingly, the DNS server at h5 updates its A record for the test website to point to CDN replica h3. We then time how long it takes for the user to complete a full trip around the testpages. After a full trip, when the user is back on the landing page, we time how much longer it takes for the user to connect to the new CDN replica server (h3), if they have not already done so.

Request rerouting with the introduction of a new CDN replica server

This test can be seen as an extension of the previous test. The purpose of this test is to examine the impact that the proposed request rerouting mechanism might have on users who are not located in the ISP’s network, yet are still using the CDN service 11. This test also provides insight into how the rapid establishment of a new CDN server in response to server demand can affect not only the ISP’s customers, but also visitors to the ISP’s CDN network. In this scenario, both h1 and h4 (an end user in the ISP network, and an end user not in the ISP network) are initially connected to h2. We constrain the link between h2 and s1 to 1Mbps, so as to more easily simulate the effects of congestion. As with previous tests, both hosts initially visit the landing page of the test website, and then wait for 10 seconds. They then visit the first testpage, causing the link between s1 and h2 to become congested. At the same time that this link becomes congested, a 30 second timer is started. At the end of this 30 second timer, h3 is ‘brought online’, and the controller is notified that there is now a new, more optimal CDN replica server available. Throughout these tests, we time how long it takes for h1 and h4 to reach

11 For example, they may be logging into a content portal while at work

the initial landing page again, compared to the case where we don’t use this thesis’ proposed method of request rerouting.

Chapter 4

Results & Performance

This chapter contains the results and performance of our proposed request rerouting mechanism. Observations relating to the testing process are also made.

4.1 Request rerouting after link failure

The results of 50 tests are shown in Table 4.1 below:

Test case | Firefox mean | Firefox std | Chrome mean | Chrome std
Reference mean load time | 22.8 | 0.05 | 22.7 | 0.08
With improved request rerouting mechanism | 27.2 | 2.2 | 26.8 | 2.4
Without improved request rerouting mechanism; active user | 75.3 or 115.9 | 1.52 or 0.6 | 144.3 | 4.196
Without improved request rerouting mechanism; passive user | N/A | N/A | N/A | N/A

Table 4.1: Time taken to fully traverse and load the testpages, in seconds.

The reference time is the time taken to load the website directly from the CDN replica server, without any form of redirection. With both Firefox and Chrome, when the browsers were left alone as described in the passive testing mode, both failed to load a page after over a minute, and so both are counted as timeouts.

Interesting behaviour was observed for the active testing mode in both Firefox and Chrome. In Firefox, two distinct but internally consistent sets of results were gathered: one with a mean total load time of 75.3 seconds, and another with a mean total load time of 115.9 seconds, a difference of about 40 seconds. This was caused by two different behaviours during active testing:

• A refresh was performed after 20 seconds of inactivity, as outlined in the testing procedures. Upon receiving no result, another refresh was done at 25 seconds, 30 seconds, etc. until the website finally started loading. Activity was consistently observed at 100 seconds after the test website was initially loaded. This behaviour corresponds to the 115.9 second mean load time.

• A refresh was performed after 50 seconds of inactivity (60 seconds total since the start of the testing procedure). Activity was consistently observed immediately after this reload. This behaviour corresponds to the 75.3 mean load time.

As shown in the table, the mean load time for Chrome is at least 30 seconds higher than that for Firefox. The active behaviour was similar: refresh at 20 seconds, then 25, 30, etc. until activity is observed. In this case, the page access delay was a minute; when the page was refreshed a minute after the browser first visited the test website, the first testpage appeared to start loading. However, the page did not fully load: only seven of the ten images would load on firstpage.html. After 20 seconds, the page was refreshed and loaded fine, but the same behaviour occurred on secondpage.html as well. This behaviour appeared consistently, and contributed majorly to the extra time observed in Table 4.1.

The results from this testing procedure indicate that our proposed request rerouting system can successfully reroute end users after a link failure. However, there is an observable performance cost, with the mean load time being around 4 to 5 seconds higher with our proposed mechanism than the reference. Nevertheless, after the 10 second delay on the landing page, the access delay until firstpage.html started loading was negligible.

4.2 Request rerouting during link congestion

The results of 50 tests are shown in Table 4.2 below:

Test case | Firefox mean | Firefox std | Chrome mean | Chrome std
Reference mean load time (Fast) | 22.8 | 0.05 | 22.7 | 0.08
Reference mean load time (Congested) | 123.8 | 0.06 | 123.8 | 0.06
Load time with improved request rerouting mechanism | 27.25 | 0.7 | 27.4 | 0.14
Time until connected to new CDN server, with improved request rerouting mechanism | 11.7 | 0.54 | 11.65 | 0.22
Load time without improved request rerouting mechanism | 85.73 | 0.05 | 83.575 | 1.51
Time until connected to new CDN server, without improved request rerouting mechanism | 81.2 | 0.15 | 62.4 | 0.19

Table 4.2: Time taken to fully traverse and load the testpages, in seconds. Also shown is time delay until the client connects to the optimal CDN server over the congested one.

There are two reference times in this test. The fast one is identical to the one presented in the previous test. The slow one is a reference for the amount of time it takes to fully load the website and its testpages across the congested link with no request redirection.

In the tests without the improved request rerouting mechanism, both browsers beat the congested reference mean load time. This is because, between the second and third testpage, they would change connections from the congested h2 to the uncongested h3. However, as shown in the table, there is a large discrepancy between Chrome and Firefox in the delay until the uncongested h3 server was connected to. Chrome would connect to h3 half-way through loading the second testpage: one of the GET requests performed while loading the second testpage would be sent to h3 instead of h2. Firefox, however, would only connect to h3 once the second testpage had finished loading; that is, Firefox would only switch to h3 between page loads. This difference did not cause much of a discrepancy in the mean load time for the entire set of testpages, as Chrome only had the chance to load one extra image from the uncongested h3 server compared with Firefox.

As with the previous test, our improved request rerouting mechanism shows that it is capable of providing request rerouting in a way that utilises current network information. As before, the mean load time is around 5 seconds higher than the reference time. Users that are successfully rerouted by our rerouting system must have their first few TCP packets pass through the controller, in order to determine whether the CDN node they are originally destined for is the one that is optimal for them. This handling and processing of packets by the controller may be what is causing that 5 second gap.
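As a quick check of the "around 5 seconds" figure, the overhead can be computed directly from the means in Table 4.2. The short Python sketch below uses only values reported in the table; the dictionary and function names are illustrative and are not part of the test harness.

```python
# Mean load times in seconds, copied from Table 4.2.
reference_fast = {"Firefox": 22.8, "Chrome": 22.7}
reference_congested = {"Firefox": 123.8, "Chrome": 123.8}
with_rerouting = {"Firefox": 27.25, "Chrome": 27.4}


def overhead(browser):
    """Extra load time of the rerouted run relative to the fast reference."""
    return round(with_rerouting[browser] - reference_fast[browser], 2)


def saving(browser):
    """Load time saved by rerouting relative to the congested reference."""
    return round(reference_congested[browser] - with_rerouting[browser], 2)


for browser in reference_fast:
    print(browser, "overhead:", overhead(browser), "saving:", saving(browser))
```

The overhead comes to 4.45 seconds for Firefox and 4.7 seconds for Chrome, which is small next to the roughly 96 seconds saved relative to the congested reference.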

4.3 Request rerouting with the introduction of a new CDN replica server

The results of 50 tests are shown in Table 4.3 below:

Measurement                               Host(s)       mean     std
Reference mean load time (together)       h1 and h4     236      2.26
Mean load time with new CDN introduced    h1            56.9     17.4
Mean load time with new CDN introduced    h4            135.2    16.7

Table 4.3: Time taken to fully traverse and load the testpages, in seconds.

The single reference load time in this test indicates how long it took on average for both hosts, h1 and h4, to finish loading the testpages when both hosts were accessing the congested CDN replica server h2 simultaneously. It is quite surprising to see that the standard deviation is comparatively low for this measurement. While this measurement was being taken, the two hosts would appear to load the website in random chunks, and ‘take turns’ in downloading content from h2. This measurement covers the entire time that h2 is under load, so even if one host has finished the testpages, the measurement continues until the other host finishes its run.

The large standard deviations seen in the other measurements here are caused by h1. If h1 was occupied with GET request responses from h2 at the 30 second mark, then it could take ten to twenty seconds for h1 to finish those downloads and then get rerouted by our proposed mechanism. This had a flow-on effect on h4, whereby it would sometimes have to ‘wait’ for h1 to move to another link before h4 could utilise the now uncongested link.

As can be seen, the mean time for h1 to finish loading the testpages is less than half that of h4. Similarly, the mean time for h4 to finish loading the testpages is a little over half the reference time. This indicates that our proposed ISP CDN request rerouting system can utilise the introduction of rapidly established CDN replica servers to increase QoE not only for users of the ISP’s network, but also for users outside the ISP’s network, by indirectly reducing the load on their links.
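The ratios quoted above can be reproduced from the means in Table 4.3. The following Python sketch is only a worked check of that arithmetic; the variable names are illustrative.

```python
# Mean load times in seconds, copied from Table 4.3.
reference_together = 236.0
h1_with_new_replica = 56.9
h4_with_new_replica = 135.2


def ratio(value, baseline):
    """Fraction of the baseline time that the given time represents."""
    return round(value / baseline, 2)


print("h1 relative to h4:       ", ratio(h1_with_new_replica, h4_with_new_replica))
print("h4 relative to reference:", ratio(h4_with_new_replica, reference_together))
print("h1 relative to reference:", ratio(h1_with_new_replica, reference_together))
```

On these figures h1 finishes in about 42% of the time h4 needs, h4 in about 57% of the reference time, and h1 in about 24% of the reference time.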

Chapter 5


For the sake of convenience, we will reiterate the design goals specified in Chapter 3 here. Our proposed design should be:

1. Able to work around DNS caching/prefetching

2. Able to fully utilise network information

3. Easy to configure

4. Incrementally implementable

5. Able to utilise and act on the ability to rapidly create virtual CDN replica servers

6. Scalable for network and service expansion

7. Able to provide access to the CDN for end users that are not customers of the ISP

We will now assess the effectiveness of our proposed request rerouting mechanism for ISP operated CDNs by discussing and evaluating the results achieved in the previous chapter.

5.1 Results

The experiments outlined in Section 3.3.2 were designed specifically to test if our design met goals 1, 2, and 5. As the results in Chapter 4 indicate, our proposed implementation of an ISP CDN’s request rerouting mechanism via NAT-augmented DNS is successful in those areas.

The simplicity of the table-miss flow rules employed by our Ryu controller as defined in Section, and the usage of simple shell scripts and REST APIs, indicate that our system should be easy to integrate and configure in an ISP’s network, thus fulfilling goal 3. Because the proposed request rerouting mechanism is to be used as an addition to traditional DNS routing mechanisms, it is possible for an ISP to implement that first, and then incrementally introduce OpenFlow-enabled switches throughout their network as demand requires. This fulfills goal 4. Goal 7 is similarly fulfilled: users outside of the ISP network can simply use the traditional DNS request rerouting method; they just won’t get the benefits demonstrated in this thesis. The only goal not covered by the results of our experiments or mechanism design is goal 6: whether or not our proposed request redirection mechanism is scalable.

The results of our testing also confirm our claims about the ill effects of client side DNS caching. For example, in our first test procedure, browsers simply timed out, or had users refreshing for over a minute in order to get any kind of response.

5.2 Discussion

Our proposed request rerouting mechanism for ISP operated CDNs ended up being quite visibly successful. Load times were often cut in half¹ or more, compared to a request rerouting system such as traditional DNS. Granted, the situations our tests described were edge cases that do not happen very often in current enterprise networks. But as Akamai and Justine Sherry et al. [3, 37] claim, the way forward for ISPs and CDNs involves the usage of cloud services to rapidly adapt to changes in the networking landscape and demands. This means that situations where our proposed request rerouting mechanism is beneficial will only become more common in the future.

The results of the investigation into the effects of client side DNS caching are a bit surprising. Issues such as how the timing of a page refresh affects how soon the page successfully reloads are still not fully understood by the author. While we certainly expected some errors to occur due to DNS caching, such as pages timing out, we did not expect timing-dependent errors. Unexpected behaviour such as this is exactly why such an investigation had to be undertaken, however.

Unfortunately, a rigorous examination of the scalability of our proposed mechanism has not been undertaken. Because of the current infancy of SDN technology compared to legacy networks, published performance and scalability research is not yet widely available, especially for a purely reactive² system like the one we have proposed. Indeed, the purely reactive design may be the cause of the 5 second gap between the reference timing and the successfully routed timing in our first two networking experiments. Adam Zarek [40] has done some research into the impact of flow timeout length on SDN performance, and indicates that a heuristic matching on TCP patterns is desirable. Given that the flow rules added by our controller are all TCP, and are designed to be short lived³, future work on determining the scalability of our proposed mechanism looks promising.

A potential solution to possible future scalability problems is to move our proposed request rerouting mechanism away from the ISP’s edge router, and instead into the home routers and devices of home owners. This way, instead of a single device having to handle thousands of flows, each household’s device handles tens of flows. Similar networking tools and APIs have already been researched [41], which indicates that future work on this request rerouting tool could indeed be brought into the home networking space.

¹ Note that this depends on network context; rerouting someone from a 10Mbit connection to a 5Mbit connection will obviously increase their load time. In the context of CDN request rerouting, however, the reason you’re performing rerouting in the first place is to provide them with a faster connection to content.
² A reactive SDN system is one in which all new flows are sent to the controller for inspection. In our set up, all previously unseen TCP flows are sent to the controller to determine whether they are going to the right CDN node or not.
³ Our flow rules are only of use when there’s a possibility of a user utilising cached DNS information, or if we want to temporarily balance network congestion.
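The per-flow check performed by a reactive controller of this kind can be illustrated independently of any OpenFlow library. The following is a minimal sketch, assuming the controller holds a set of known CDN replica addresses and a mapping from each client to its optimal replica; `NatRewrite`, `reroute_decision`, both data structures and all IP addresses are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class NatRewrite:
    """Address rewrites to install for one TCP flow (hypothetical helper type)."""
    dnat_to: str   # new destination for packets leaving the client
    snat_as: str   # source address restored on packets returning to the client


def reroute_decision(client_ip: str, dst_ip: str,
                     cdn_replicas: Set[str],
                     optimal_replica: Dict[str, str]) -> Optional[NatRewrite]:
    """Return the NAT rewrite for a new flow, or None to forward it unchanged."""
    if dst_ip not in cdn_replicas:
        return None                      # not CDN traffic: route as usual
    best = optimal_replica.get(client_ip)
    if best is None or best == dst_ip:
        return None                      # already heading to the right replica
    # Stale DNS answer: send the flow to the optimal replica, and make replies
    # appear to come from the address the client originally contacted.
    return NatRewrite(dnat_to=best, snat_as=dst_ip)


# Illustrative addresses from the documentation ranges, not from the testbed.
replicas = {"192.0.2.2", "192.0.2.3"}
optimal = {"198.51.100.1": "192.0.2.3"}

print(reroute_decision("198.51.100.1", "192.0.2.2", replicas, optimal))
print(reroute_decision("198.51.100.1", "192.0.2.3", replicas, optimal))
```

Only the first call yields a rewrite; a flow that is already bound for its optimal replica, or that is not CDN traffic at all, is forwarded unchanged. This is why the cost of the controller round trip is paid once per new flow rather than once per packet.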

Chapter 6


In this thesis we have successfully demonstrated a request rerouting mechanism for ISP operated CDNs. It is able to transparently reroute users away from non-optimal CDN replica servers to optimal ones when the user has stale DNS information due to client side DNS caching, or when network conditions require it. It utilises SDN techniques to achieve this, using the OpenFlow 1.3 specification on an OpenFlow-enabled switch running in a Mininet virtualised network. A controller built on the Ryu framework drives the request rerouting mechanism by installing DNAT and SNAT rules that act on TCP flows.

Such a request rerouting mechanism can find much use both in ISP operated CDNs and in ISPs that are collaborating with CDNs to provide a unified service. Current research shows that future requirements and design goals for ISPs include the ability to rapidly implement cloud-based services. Our proposed request rerouting mechanism offers an easily configured, incrementally implementable solution that can perform exactly that role.

We have also performed a first-of-its-kind analysis of the effects of client side DNS caching on DNS based CDN request rerouting mechanisms. This analysis revealed that client side DNS caching can lead to a degradation in the Quality of Experience and Quality of Service experienced by the end user performing that caching.

6.1 Future Work

Because this thesis dealt mainly with the request rerouting mechanism, and not the request rerouting algorithm to go with it, the natural extension of this work is to develop a request rerouting algorithm that can utilise the request rerouting mechanism proposed by this thesis efficiently. Research could also be done on the scalability of the request rerouting mechanism outlined in this thesis, as tests have so far only been performed on a virtualised network on a desktop PC.

Appendix 1

.1 An example of a DNS request for, a CDN hosted website.

Figure 1: A Wireshark trace of a DNS request for

.2 HTML for

301 Moved Permanently

Moved Permanently

The document has moved here .

.3 The Ryu controller for my thesis.

import json
import logging

from operator import attrgetter
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, DEAD_DISPATCHER, CONFIG_DISPATCHER
from ryu.controller.handler import set_ev_cls
from ryu.lib import hub
from ryu.lib import dpid as dpid_lib
from webob import Response
from ryu.app.wsgi import ControllerBase, WSGIApplication, route
from ryu.lib.packet import ethernet, tcp
from ryu.lib.packet import packet

simple_switch_instance_name = 'simple_switch_api_app'
url = '/simpleswitch/add/{dpid}'
url1 = '/simpleswitch/remove/{dpid}'
# Global toggle set through the REST API: 1 enables rerouting, 0 disables it.
flag = 0


class ThesisController(app_manager.RyuApp):
    _CONTEXTS = {'wsgi': WSGIApplication}

    def __init__(self, *args, **kwargs):
        super(ThesisController, self).__init__(*args, **kwargs)
        self.datapaths = {}
        self.mac_to_port = {}
        wsgi = kwargs['wsgi']
        wsgi.register(SimpleSwitchController,
                      {simple_switch_instance_name: self})
        self.switches = {}
        self.flows = {}
        self.logger.info("Loading Thesis Controller ...\n")

    # Initialiser. Adds the table-miss flow entry.
    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        datapath = ev.msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        self.switches[datapath.id] = datapath
        self.mac_to_port.setdefault(datapath.id, {})
        self.flows.setdefault(datapath.id, {})
        # install table-miss flow entry
        #
        # We specify NO BUFFER to max_len of the output action due to
        # OVS bug. At this moment, if we specify a lesser number, e.g.,
        # 128, OVS will send Packet-In with invalid buffer_id and
        # truncated packet data. In that case, we cannot output packets
        # correctly.
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        self.add_flow(datapath, 0, match, actions)
        # This is the TCP table-miss flow entry. Set to higher priority.
        match = parser.OFPMatch(eth_type=0x0800, ip_proto=6, ipv4_src="",
                                tcp_dst=80, ipv4_dst=("", ""))
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        self.add_flow(datapath, 10, match, actions)

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def _packet_in_handler(self, ev):
        global flag
        msg = ev.msg
        datapath = msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        in_port = msg.match['in_port']
        pkt = packet.Packet(msg.data)
        eth = pkt.get_protocols(ethernet.ethernet)[0]
        if (pkt.get_protocol(tcp.tcp) and
                eth.src == "00:00:00:00:00:01"):
            # we got an unmatched TCP packet from h1
            tc = pkt.get_protocols(tcp.tcp)[0]
            dstport = tc.dst_port
            srcport = tc.src_port
            dst = eth.dst
            src = eth.src
            dpid = datapath.id
            self.logger.info("Received TCP packet\n srcport: %d\t dstport: %d",
                             srcport, dstport)
            if flag:
                self.logger.info("Flag set!\n")
            match = parser.OFPMatch(tcp_src=srcport, tcp_dst=dstport,
                                    in_port=in_port, ip_proto=6,
                                    eth_type=0x0800)

            if dst in self.mac_to_port[dpid]:
                out_port = self.mac_to_port[dpid][dst]
            else:
                out_port = ofproto.OFPP_FLOOD

            if flag and eth.dst == "00:00:00:00:00:02":
                # Rerouting is enabled and the flow is headed for h2:
                # rewrite the destination (DNAT) so that it reaches h3.
                out_port = 3
                actions = [parser.OFPActionSetField(eth_dst="00:00:00:00:00:03"),
                           parser.OFPActionSetField(ipv4_dst=""),
                           parser.OFPActionOutput(out_port)]
                self.add_flow(datapath, 20, match, actions)
                # Return path (SNAT). Separate names keep `actions` pointing
                # at the forward path for the PacketOut sent below.
                reverse_match = parser.OFPMatch(tcp_src=dstport,
                                                tcp_dst=srcport,
                                                in_port=out_port, ip_proto=6,
                                                eth_type=0x0800)
                # eth.dst is the CDN destination from the received packet that
                # the host thinks they're going to
                reverse_actions = [
                    parser.OFPActionSetField(eth_src=eth.dst),
                    parser.OFPActionSetField(ipv4_src=""),  # !!! SET THIS LATER
                    parser.OFPActionOutput(in_port)]
                self.add_flow(datapath, 20, reverse_match, reverse_actions)
            else:
                actions = [parser.OFPActionOutput(out_port)]
                self.add_flow(datapath, 20, match, actions)
        else:
            dst = eth.dst
            src = eth.src
            dpid = datapath.id
            self.mac_to_port.setdefault(dpid, {})

            self.logger.info("packet in %s %s %s %s", dpid, src, dst, in_port)

            # learn a mac address to avoid FLOOD next time.
            self.mac_to_port[dpid][src] = in_port

            if dst in self.mac_to_port[dpid]:
                out_port = self.mac_to_port[dpid][dst]
            else:
                out_port = ofproto.OFPP_FLOOD

            actions = [parser.OFPActionOutput(out_port)]

            # install a flow to avoid packet_in next time
            if out_port != ofproto.OFPP_FLOOD:
                match = parser.OFPMatch(in_port=in_port, eth_dst=dst)
                self.add_flow(datapath, 1, match, actions)

        data = None
        if msg.buffer_id == ofproto.OFP_NO_BUFFER:
            data = msg.data

        out = parser.OFPPacketOut(datapath=datapath, buffer_id=msg.buffer_id,
                                  in_port=in_port, actions=actions, data=data)
        datapath.send_msg(out)

    def add_flow(self, datapath, priority, match, actions):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                             actions)]

        mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
                                match=match, instructions=inst)
        datapath.send_msg(mod)


# This is the start of the REST api.
class SimpleSwitchController(ControllerBase):
    def __init__(self, req, link, data, **config):
        super(SimpleSwitchController, self).__init__(req, link, data, **config)
        self.simpl_switch_spp = data[simple_switch_instance_name]

    # GET /simpleswitch/add/{dpid}: enable rerouting, return the MAC table.
    @route('simpleswitch', url, methods=['GET'],
           requirements={'dpid': dpid_lib.DPID_PATTERN})
    def list_mac_table(self, req, **kwargs):
        global flag
        flag = 1
        simple_switch = self.simpl_switch_spp
        dpid = dpid_lib.str_to_dpid(kwargs['dpid'])
        if dpid not in simple_switch.mac_to_port:
            return Response(status=404)

        mac_table = simple_switch.mac_to_port.get(dpid, {})
        body = json.dumps(mac_table)
        return Response(content_type='application/json', body=body)

    # GET /simpleswitch/remove/{dpid}: disable rerouting, return the flows.
    @route('simpleswitch', url1, methods=['GET'],
           requirements={'dpid': dpid_lib.DPID_PATTERN})
    def list_flow_table(self, req, **kwargs):
        global flag
        flag = 0
        simple_switch = self.simpl_switch_spp
        dpid = dpid_lib.str_to_dpid(kwargs['dpid'])

        if dpid not in simple_switch.flows:
            return Response(status=404)

        flow_table = simple_switch.flows.get(dpid, {})
        body = json.dumps(flow_table)
        return Response(content_type='application/json', body=body)

.4 The mininet topology for my thesis

from mininet.cli import CLI
from mininet.link import Link

from mininet.link import TCLink
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.term import makeTerm

if '__main__' == __name__:
    net = Mininet(controller=RemoteController, autoStaticArp=True,
                  autoSetMacs=True)

    c0 = net.addController('c0')
    s1 = net.addSwitch('s1')
    h1 = net.addHost('h1')
    h2 = net.addHost('h2')
    h3 = net.addHost('h3')
    h4 = net.addHost('h4')
    h5 = net.addHost('h5')
    # One link per host to s1; bw is in Mbit/s (h2 is the congested replica).
    TCLink(h1, s1, port2=1, bw=10, delay='10ms')
    TCLink(h2, s1, port2=2, bw=1, delay='10ms')
    TCLink(h3, s1, port2=3, bw=100, delay='10ms')
    TCLink(h4, s1, port2=4, bw=10, delay='10ms')
    TCLink(h5, s1, port2=5, delay='10ms')
    c0.start()
    s1.start([c0])
    s1.sendCmd('ovs-vsctl set bridge s1 protocols=OpenFlow13')

    CLI(net)
    net.stop()
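With the topology running, the rerouting flag in the controller listing is toggled by a GET request to one of its two REST routes. The sketch below shows one way such a request could be issued from Python. It rests on assumptions not stated in the thesis: that the controller's REST server listens on 127.0.0.1 port 8080, and that switch s1 has datapath ID 1. The helper names `toggle_url` and `toggle_rerouting` are hypothetical.

```python
from urllib.request import urlopen


def toggle_url(action, dpid=1, host="127.0.0.1", port=8080):
    """Build the URL for the 'add' (enable) or 'remove' (disable) route.

    host, port and the default dpid are assumptions for illustration.
    """
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    # The route pattern expects the datapath ID as 16 lowercase hex digits.
    return "http://%s:%d/simpleswitch/%s/%016x" % (host, port, action, dpid)


def toggle_rerouting(action, **kwargs):
    """Send the GET request and return the HTTP status code."""
    with urlopen(toggle_url(action, **kwargs), timeout=5) as response:
        return response.status


print(toggle_url("add"))
print(toggle_url("remove"))
```

Calling `toggle_rerouting("add")` would enable rerouting and `toggle_rerouting("remove")` would disable it again; the same requests could equally be issued from a shell script.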