ROUTING-INDEPENDENT ANYCAST FOR IPV6 CONTENT DELIVERY NETWORKS

by

YINHANG CHENG

Submitted in partial fulfillment of the requirements for the degree of

Master of Science

Department of Electrical Engineering and Computer Science

CASE WESTERN RESERVE UNIVERSITY

August 2019

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis of

Yinhang Cheng

candidate for the degree of Master of Science

Committee Chair

Michael Rabinovich

Committee Member

Michael Rabinovich

Committee Member

Vincenzo Liberatore

Committee Member

An Wang

Date of Defense

06/21/2019

*We also certify that written approval has been obtained

for any proprietary material contained therein.

Table of Contents

List of Tables

List of Figures

Abstract

1. Introduction

2. Background

2.1 DNS-based Request Routing in CDN

2.2 IPv6 Mobility

3. Related Work

3.1 Anycast Request Routing in CDN

3.2 Load-aware Anycast Routing

3.3 ECS Extension to DNS

3.4 TCP Connection Forwarding

3.5 IPv6 Anycast for CDNs

4. Overview of the Routing-Independent Anycast for TCP Communication

4.1 TCP Connection Establishment

4.2 Ongoing Connection

4.3 Binding Management and Connection Termination

4.4 Security Consideration

4.5 Benefits

5. Implementation

5.1 Anycast Server

5.2 Server

5.2.1 Control State

5.2.2 Unicast Server Operation

5.3 Client

5.3.1 Control State

5.3.2 Client Operation

5.4 An Alternative for Handling SYN-ACK Packets

5.5 An Alternative for Supporting Multiple Anycast Servers

5.6 Corner Cases

6. Evaluation

6.1 Environment

6.2 Overhead of RIA under Ideal Conditions

6.3 Comparison with Other Request Routing Methods

6.4 Experimental Results

6.4.1 Comparison with DNS-based Request Routing

6.4.2 Comparison with HTTP Redirect

7. Summary

Bibliography

List of Tables

Table 1. The overhead of the prototype

Table 2. Time for downloading a file with certain size using RIA (ms)

Table 3. Time for downloading a file with certain size using DNS-based request routing (ms)

Table 4. Time for downloading a file with certain size using HTTP redirect (ms)

List of Figures

Figure 1. An example of DNS-based request routing

Figure 2. Home address option header format

Figure 3. Format of mobility header

Figure 4. Format of type 2 routing header

Figure 5. An example of Anycast request routing

Figure 6. Establishment of TCP connection using Routing-Independent Anycast mechanism

Figure 7. Packet manipulation for an established connection

Figure 8. Select structures in unicast server

Figure 9. Format of Binding Update message

Figure 10. Select structures in client

Figure 11. Handling SYN-ACK in the prototype and its potential problem

Figure 12. An alternative for handling SYN-ACK

Figure 13. Testbed

Figure 14. Network topology of our testbed

Figure 15. Operations for testing DNS-based request routing

Figure 16. Operations for testing HTTP redirect

Figure 17. Comparison among RIA, DNS-based request routing and HTTP redirect under different conditions. (a)-(d): Bandwidth = 10Mbps. (e)-(h): Bandwidth = 20Mbps. (i)-(l): Bandwidth = 100Mbps.

Routing-Independent Anycast For IPv6 Content Delivery Networks

Abstract

by

YINHANG CHENG

Content delivery networks (CDNs) have become increasingly popular due to the demand for high-performance web content delivery. The key to improving user experience is directing each user request to the 'best' edge server, typically the one nearest the user. CDNs employ several request routing methods, such as DNS-based request routing, anycast, and HTTP redirect, but each of these approaches has its own limitations. In this thesis, we propose Routing-Independent Anycast (RIA) for CDN request routing, which addresses the drawbacks of traditional request routing solutions.

Our mechanism is based on IPv6 Mobility. We describe the RIA design, implement a prototype, and use this prototype to evaluate the performance of our approach. Our evaluation reveals inefficiencies in our implementation but shows that even our unoptimized prototype can outperform DNS-based request routing and HTTP redirect in some circumstances.

1. Introduction

The increasing demand for web content delivery with high performance and high availability has led to the popularity of content delivery networks (CDNs). A CDN is a geographically distributed network of servers designed to provide optimal online experiences for users [1]. With the rapid growth of web traffic such as media and live streaming, content providers increasingly employ CDNs to improve availability and reduce response time. According to [2], over 52% of the top one million websites utilize a CDN, and among the top 10,000 websites the ratio rises to 81%. Without doubt, CDNs have become a key component of the overall Internet infrastructure.

The CDN servers, also called edge servers, cache web content from the content providers' origin servers and deliver it to users. The performance of a CDN largely depends on its ability to redirect client requests to the 'best' edge server, a process known as 'request routing' [3]. To route a client to the 'best' server, the CDN needs to consider multiple factors such as network conditions, load balancing, and the distance between the client and the edge server. An efficient request routing system significantly improves users' online experience.

The most commonly used request routing mechanism is DNS-based request routing, which utilizes the Domain Name System (DNS) to redirect client requests. DNS is a fundamental Internet infrastructure that translates host names to IP addresses, and almost every user request starts with a DNS query. Normally, the CDN's name server receives the DNS query from a user's local DNS (LDNS) server and replies with the IP address of an appropriate edge server. The LDNS forwards this result to the user, who can then request web services from this IP address. However, DNS-based request routing has several limitations [4]. First, the routing decision is based on the location of the LDNS, so the selected edge server may not be the 'best' one for the actual client. Furthermore, this mechanism operates at domain granularity. The LDNS caches the DNS response from the CDN for a period of time called the time to live (TTL), which is included with the response. The DNS response may also contain multiple IP addresses for a given domain name, which the LDNS is supposed to shuffle when responding to clients. Thus, a redirection sends all the users behind a single LDNS to the same set of IP addresses for the duration of the TTL, which may lead to load balancing problems. Besides, the CDN's name servers need to set short TTL values in their DNS responses in order to respond quickly to abrupt changes in network conditions, which tends to significantly increase the number of DNS queries.

Another widely used request routing strategy is anycast [5]. To perform anycast, a set of replica servers announces the same anycast IP address, and requests from a user are directed to one of these servers by the BGP routing mechanism [6]. Anycast simplifies the procedure of finding the 'closest' server, and the redirection is based on the location of the actual user rather than the LDNS. However, anycast also has several drawbacks. First, anycast is unaware of network conditions and server load: if an edge server is overloaded, anycast may not be able to gradually redirect traffic to another server, although there has been some progress on load-aware anycast [20, 7]. Another limitation is that a BGP route change can divert packets to another edge server in the middle of an ongoing TCP connection, terminating the connection. This may be acceptable for short-lived TCP connections, but for long TCP flows it is a serious problem.

A well-known challenge facing the whole Internet is the shortage of IPv4 addresses. To address this problem, a new version of the Internet Protocol (IP), called IPv6, was designed to replace IPv4. IPv6 uses 128-bit addresses while IPv4 has only 32-bit addresses. Although the complete transition from IPv4 to IPv6 is not easy, the trend will continue.

This thesis presents Routing-Independent Anycast (RIA) for the Transmission Control Protocol (TCP) and describes a prototype implementation of this mechanism. Our approach overcomes the limitations of both DNS-based request routing and IPv4 anycast while addressing potential security concerns. Furthermore, our method is forward-looking, as it takes advantage of the ongoing transition to IPv6 in the face of IPv4 address exhaustion.

We utilize several features of IPv6 Mobility to implement our architecture. The next section presents background information on the technologies on which our approach is built. Section 3 describes related work: traditional anycast, load-aware anycast, the ECS extension to DNS, TCP connection forwarding, and two IPv6 anycast mechanisms. Section 4 introduces our Routing-Independent Anycast approach. Section 5 provides the implementation details of our prototype. In Section 6, we describe experiments with our prototype and evaluate the performance of our approach by comparing it to other request routing methods. Section 7 summarizes the thesis.

2. Background

2.1 DNS-based Request Routing in CDN

Figure 1 shows a typical example of DNS-based request routing in a CDN. Users typically initiate communication by first sending a DNS query to their local DNS server (LDNS); for example, the client asks the LDNS for the IP address of foo.com. The job of the LDNS is to return the IP address for the requested domain name. To obtain it, the LDNS contacts an Authoritative DNS server (ADNS) responsible for the requested domain, which is controlled by the content provider. If the content provider employs a CDN to deliver its web service, its ADNS redirects the LDNS to the CDN's ADNS by replying with a CNAME DNS record, which maps foo.com to foo.cdn.com. The LDNS then asks the CDN's ADNS for the IP address of foo.cdn.com. The CDN's ADNS selects the edge server that will serve the user and returns this server's IP address to the user's LDNS. The CDN's ADNS therefore acts as a request router: it ultimately routes the client's requests to the selected edge server. The request router chooses an edge server for the client based on several factors, including proximity to the client, information on network conditions, and feedback from edge servers about their load [4]. The request router then returns the IP address of the selected server, 1.2.3.4 in Figure 1, to the client's LDNS. Upon receiving the DNS response, the LDNS caches the mapping between the domain name and the IP address and forwards the response to the client. Eventually, the client uses this IP address to access the selected edge server.

Figure 1. An example of DNS-based request routing
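The resolution chain of Figure 1 can be sketched as a toy lookup. This is purely illustrative: the names foo.com and foo.cdn.com and the address 1.2.3.4 come from the example above, and the CDN's server selection, which in reality weighs proximity, load, and network conditions, is reduced to a hard-coded answer.

```python
# Records held by the content provider's ADNS: foo.com is aliased
# into the CDN's namespace via a CNAME record.
provider_adns = {"foo.com": ("CNAME", "foo.cdn.com")}

def cdn_adns(name, client_hint=None):
    """The CDN's ADNS acts as the request router: it would pick the
    'best' edge server for the client; here the choice is hard-coded."""
    assert name == "foo.cdn.com"
    return ("A", "1.2.3.4")

def ldns_resolve(name):
    """Recursive resolution as performed by the LDNS: follow CNAMEs
    until an address record is obtained, then hand it to the client."""
    rtype, value = provider_adns[name]
    while rtype == "CNAME":
        rtype, value = cdn_adns(value)
    return value

print(ldns_resolve("foo.com"))  # -> 1.2.3.4
```

Note that the client never appears in the exchange: the answer is computed for, and cached by, the LDNS, which is exactly the source of the limitations discussed next.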

DNS-based request routing is easy to deploy and operate. It does not require any changes to existing protocols or clients, and it makes use of the DNS system, which is already ubiquitous on the Internet. However, DNS-based request routing has several significant limitations. The request router is unaware of the location of the actual client because of the intermediate LDNS; an edge server that is a good choice for the LDNS may therefore be a poor choice for the actual user. Moreover, an ideal request resolution system would make a decision for each individual user, not for all the users behind an LDNS: clients behind the same LDNS are indistinguishable to the request router, so when a redirection happens, all of them send their requests to the new server and may overload it. Furthermore, CDNs use short time-to-live values (TTL, the amount of time the ADNS allows the response to be reused from the LDNS's cache) in their DNS responses in order to react quickly to changing network conditions, and short TTLs increase the load on the DNS.

2.2 IPv6 Mobility

We were inspired by IPv6 Mobility (MIPv6) when designing RIA, and our approach reuses much of MIPv6's functionality, so it is natural to present our method in terms of IPv6. RIA could also be used with IPv4 if similar functionality were implemented in IPv4 options. In this thesis, however, we describe and implement our approach only on IPv6, because many routers accept packets with IPv6 Mobility extensions, while packets with unknown IPv4 options are typically dropped.

IPv6 Mobility (MIPv6) is a protocol that allows a mobile node to remain reachable through its home address (the permanent address of the node within its home network) while moving around the IPv6 Internet [8]. The mobile node (MN) has a permanent home address belonging to its home network. While the mobile node visits foreign networks, it acquires care-of addresses through conventional IPv6 mechanisms. To maintain the association between the mobile node's home address and its care-of address, a special router in the home network, called the home agent, is used. Every time the mobile node moves to a new foreign network, it informs the home agent of its current care-of address. The home agent thus maintains the "binding" between the home address and the care-of address.

Any node communicating with a mobile node is called a correspondent node (CN) in MIPv6. Initially, the CN reaches the mobile node through the home address, with the home agent forwarding the communication to the mobile node. Subsequently, to communicate with the mobile node directly, the correspondent node needs to learn the binding between the home address and the current care-of address from the mobile node. The mobile node informs the CN of the binding through a return routability procedure followed by a binding registration. To start the return routability procedure, the mobile node sends out a Home Test Init (HoTI) and a Care-of Test Init (CoTI) message. While the CoTI message travels to the correspondent node directly, the HoTI message goes through the home agent. Upon receiving the HoTI and CoTI messages, the correspondent node responds with a Home Test (HoT) and a Care-of Test (CoT) message, respectively; each carries one of two keygen tokens. As before, the HoT message is sent through the home agent, and the CoT message is sent directly to the mobile node. When the mobile node has received both the HoT and CoT messages, it combines the two tokens to form the binding management key (Kbm) and uses Kbm to construct an Authenticator. The mobile node is now able to create a verifiable Binding Update (BU) message. Besides the authentication information, the Binding Update message contains a new IPv6 Home Address destination option carrying the home address. The Home Address option indicates the binding between the home address and the care-of address (which appears in the source address field of the IPv6 main header). The mobile node transmits the BU message to the correspondent node directly. The correspondent node verifies the binding in the BU message and stores it in its Binding Cache. Finally, the CN sends a Binding Acknowledgement (BA) message to indicate that the binding has been accepted, completing the binding registration.

Now the correspondent node can communicate with the mobile node directly. For the packets it sends, the CN replaces the home address with the care-of address in the destination address field, according to the record in its Binding Cache, and adds a type 2 routing header carrying the home address. These packets travel directly to the mobile node. Upon receiving them, the mobile node restores the home address as the destination address and strips off the type 2 routing header. For packets from the mobile node to the correspondent node, the mobile node changes the source IP address from the home address to the care-of address and includes a Home Address option in the packets. On the CN side, the CN changes the source IP address from the care-of address back to the home address and removes the Home Address option. Because this packet rewriting occurs at the IP layer, the applications above the IP layer on both the CN and the MN are oblivious to the fact that the MN is moving around the IPv6 Internet.
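The CN-to-MN direction of this rewriting can be illustrated with a toy packet model (plain dictionaries standing in for IPv6 headers; all addresses are made up). The MN-to-CN direction, using the Home Address option on the source address, is symmetric.

```python
HOME = "2001:db8:home::1"       # hypothetical home address
CAREOF = "2001:db8:care::7"     # hypothetical care-of address
binding_cache = {HOME: CAREOF}  # Binding Cache kept by the CN

def cn_send(pkt):
    """CN, outgoing: swap in the care-of address as the destination and
    record the home address in a type 2 routing header field."""
    if pkt["dst"] in binding_cache:
        pkt = dict(pkt, dst=binding_cache[pkt["dst"]], rh2_home=pkt["dst"])
    return pkt

def mn_receive(pkt):
    """MN, incoming: restore the home address as the destination and
    strip the routing header before handing the packet to TCP."""
    if "rh2_home" in pkt:
        pkt = dict(pkt)
        pkt["dst"] = pkt.pop("rh2_home")
    return pkt

wire = cn_send({"src": "2001:db8:cn::9", "dst": HOME, "payload": b"GET /"})
assert wire["dst"] == CAREOF          # routed via the care-of address
delivered = mn_receive(wire)
assert delivered["dst"] == HOME       # TCP on the MN sees only the home address
```

The point of the round trip is that the transport layer on both ends only ever sees the home address, which is precisely the property RIA exploits later for its anycast address.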

Octets 0-3:  Next Header | Hdr Ext Len | Option Type | Option Length
Octets 4-19: Home Address (16 octets)

Figure 2. Home address option header format

MIPv6 defines the Home Address option, which can be included in the Destination Options header (Dst Opt). The Home Address option is used to inform the correspondent node of the MN's home address. The Dst Opt is identified by a Next Header value of 60 in the preceding header. Figure 2 shows the format of the Destination Options header with the Home Address option. The first 8 bits indicate either the type of the next extension header or the upper-layer protocol. The "Hdr Ext Len" field gives the length of this extension header (including options) in 8-octet units. The option type is set to 201, and the option length must be 16, the length of an IPv6 address (the option length covers the option data only). The last 16 bytes contain the home address.
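Packing the layout of Figure 2 makes the field sizes concrete. One detail worth noting: on the wire, RFC 6275 also requires 4 bytes of PadN padding before the option, so that the option type lands at an 8n+6 offset and the whole header is a multiple of 8 octets; Figure 2 omits this padding for simplicity, but the sketch below includes it so the lengths come out right.

```python
import struct
from ipaddress import IPv6Address

def pack_hao_dstopt(next_header, home_address):
    """Destination Options header carrying a Home Address option."""
    home = IPv6Address(home_address).packed        # 16 octets
    padn = struct.pack("!BB2x", 1, 2)              # PadN: type 1, len 2 (4 octets total)
    hao = struct.pack("!BB", 201, 16) + home       # Option Type 201, Option Length 16
    total = 2 + len(padn) + len(hao)               # 2 fixed octets + options = 24
    # Hdr Ext Len counts 8-octet units beyond the first 8 octets
    return struct.pack("!BB", next_header, total // 8 - 1) + padn + hao

dstopt = pack_hao_dstopt(6, "2001:db8::1")  # Next Header 6 = TCP follows
assert len(dstopt) == 24 and dstopt[1] == 2  # Hdr Ext Len = 2
```

The address 2001:db8::1 is a documentation-prefix placeholder; any IPv6 home address would do.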

Octets 0-3: Payload Proto | Header Len | MH Type | Reserved
Octets 4-5: Checksum
Octets 6-…: Message Data (variable length)

Figure 3. Format of mobility header

IPv6 Mobility uses the Mobility Header for many message types, including the HoTI, CoTI, HoT, CoT, BU, and BA messages. The format of the Mobility Header is shown in Figure 3. The Payload Proto is an 8-bit field that identifies the type of the following header. The Header Len field indicates the length of the Mobility Header. Multiple Mobility Header types are defined in [8] for the different kinds of messages used in MIPv6. The Checksum field protects the integrity of the Mobility Header. Finally, message data is appended according to the MH Type.
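As an example of the layout in Figure 3, the sketch below packs a minimal Binding Update (MH Type 5) per RFC 6275. Mobility options such as the authenticator are omitted, and the checksum, which on the wire is computed over the IPv6 pseudo-header like other upper-layer checksums (the Mobility Header is protocol 135), is left zero here.

```python
import struct

def pack_mh_bu(seq, lifetime, ack=True):
    """Minimal Binding Update in the Mobility Header layout of Figure 3."""
    mh_type = 5                    # MH Type 5 = Binding Update
    flags = 0x8000 if ack else 0   # 'A' flag: request a Binding Acknowledgement
    # BU message data: sequence number, flags, lifetime (in 4-second units)
    data = struct.pack("!HHH", seq, flags, lifetime)
    data += struct.pack("!BB2x", 1, 2)      # PadN option: pad to 8-octet boundary
    header_len = (6 + len(data)) // 8 - 1   # 8-octet units beyond the first 8
    # Payload Proto 59 = "no next header"; Reserved and Checksum left zero
    return struct.pack("!BBBBH", 59, header_len, mh_type, 0, 0) + data

bu = pack_mh_bu(seq=1, lifetime=100)
assert len(bu) == 16 and bu[2] == 5 and bu[1] == 1
```

In RIA, the analogous Binding Update carries the anycast/unicast binding rather than a mobility event, but the wire format is the same family of messages.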

Octets 0-3:  Next Header | Hdr Ext Len | Routing Type | Segments Left
Octets 4-7:  Reserved
Octets 8-23: Home Address (16 octets)

Figure 4. Format of type 2 routing header

Another extension header introduced by MIPv6 is the type 2 routing header, whose format is shown in Figure 4. The Next Header field identifies the type of the following header. The Hdr Ext Len gives the length of the routing header, excluding the first 8 bytes, in 8-octet units; in a type 2 routing header it must be set to 2. The Routing Type field indicates the type of the routing header, which is 2 here. Segments Left represents the number of intermediate nodes the packet still needs to visit and is set to 1 in a type 2 routing header. The last 16 bytes contain the home address.
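Since every field except the next-header code and the home address is fixed by the specification, the type 2 routing header of Figure 4 packs down to one line of constants (the address below is a documentation-prefix placeholder):

```python
import struct
from ipaddress import IPv6Address

def pack_rh2(next_header, home_address):
    """Type 2 routing header: 8 fixed octets followed by the home address."""
    return (struct.pack("!BBBB4x",
                        next_header,  # e.g. 6 if a TCP segment follows
                        2,            # Hdr Ext Len: always 2
                        2,            # Routing Type: always 2
                        1)            # Segments Left: always 1
            + IPv6Address(home_address).packed)

rh2 = pack_rh2(6, "2001:db8::1")
assert len(rh2) == 24 and rh2[2] == 2 and rh2[3] == 1
```

A receiver can therefore validate a type 2 routing header by checking three constant bytes before consuming the embedded home address.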

The return routability procedure is crucial for securing MIPv6. The HoTI and CoTI messages include cookies that must be returned in the HoT and CoT messages, enabling the mobile node to verify that the HoT and CoT came from the intended correspondent node. Furthermore, the HoTI and CoTI messages protect the home agent from denial-of-service attacks. For instance, suppose an attacker knows the home address and sends packets carrying that home address and fake care-of addresses to the correspondent node. Without HoTI and CoTI, the correspondent node might consider these packets valid and reply with HoT and CoT messages, and the home agent would likely be flooded with HoT messages.

In addition, the return routability procedure assures the correspondent node that it is communicating with the right mobile node. Without this procedure, an attacker could hijack the communication by sending the CN a forged binding between the home address and the attacker's address. The combination of HoT and CoT prevents forged Binding Update messages and ensures that the CN is talking to the mobile node actually associated with the home address.

3. Related Work

3.1 Anycast Request Routing in CDN

With anycast, multiple machines share the same IP address, referred to as the anycast IP address. Typically, when a user's browser tries to access a web service, the user gets the same anycast IP address from the ADNS server no matter where the user is. Figure 5 shows a typical scenario of anycast request routing in a CDN. The client asks its LDNS for the IP address of foo.com, and the LDNS sends a DNS query to foo.com's ADNS. Because foo.com employs a CDN to deliver its web services, its ADNS responds to the LDNS with a host name of the CDN, foo.cdn.com. The LDNS then sends another DNS query for foo.cdn.com and eventually receives an anycast IP address from the CDN's ADNS. All the edge servers in this CDN share the same anycast IP address, 1.2.3.4.

Figure 5. An example of Anycast request routing

The user can only send requests to this anycast IP address, which is advertised by a set of edge servers located in multiple data centers. The routers choose the shortest path, based on the BGP protocol, to forward the request; normally the selected path traverses the fewest autonomous systems. The client request therefore tends to reach the 'closest' edge server [9].

In anycast, there is no request router making decisions; the routing is performed by the routers via the BGP protocol, which simplifies redirection. Moreover, redirection in anycast is based on the location of the actual user, rather than the LDNS as in DNS-based request routing. However, while mechanisms have been proposed to add server load awareness to anycast within an autonomous system [20], anycast in general is not sensitive to network conditions and server load. Thus, it cannot route clients around network congestion or away from overloaded servers.

3.2 Load-aware Anycast Routing

One limitation of traditional anycast is its unawareness of network conditions and server load, and several load-aware anycast mechanisms have been proposed to address it [20, 7]. In [20], a route controller is deployed within the CDN network. The route controller acquires knowledge of the network conditions and server load from provider edge routers (PEs) and CDN servers. User requests are first directed to ingress PEs; the route controller then selects an egress PE to forward each request, which is further tunneled to a CDN node. This approach avoids the drawback mentioned above, but it only works within a single autonomous system, whereas CDNs may get connectivity from numerous ISPs, so their platforms are split among many autonomous systems.

Another load-aware anycast routing mechanism is FastRoute [7]. This approach introduces a FastRoute node, which contains a proxy that serves users and an ADNS that answers user DNS queries. FastRoute nodes are spread across several layers of the platform, and each layer has a distinct anycast IP address. A DNS query from the user is forwarded to one of the FastRoute nodes in the outermost layer by traditional anycast routing. The ADNS inside this FastRoute node normally replies with the anycast IP address of the outermost layer, so the user sends the request to this anycast address and communicates with the proxy inside some FastRoute node in the outermost layer. The proxy and the ADNS inside a FastRoute node know only the load within their own node. When a FastRoute node in the outermost layer is overloaded, its ADNS can redirect its user queries to an inner layer by replying with CNAME DNS responses. However, not all user traffic is controllable. For example, a user can obtain the anycast address from the ADNS inside FastRoute node1 yet send the request to the proxy inside FastRoute node2. If FastRoute node2 is overloaded, this user's traffic cannot be redirected to the inner layer: only the ADNS inside FastRoute node2 is aware of the overload, and the DNS responses it sends out to redirect traffic do not affect this user. Another possible limitation of this approach is that the TTL on the DNS responses delays the redirection.

3.3 ECS Extension to DNS

DNS-based request routing has several limitations, as mentioned above. The primary one is that the ADNS is unaware of the location of the actual users because of intermediate recursive resolvers: users behind the same resolver cannot be distinguished by the ADNS. To deal with this problem, an extension to DNS called EDNS-Client-Subnet (ECS) has been proposed. ECS allows recursive resolvers to provide the client subnet to the ADNS [23]. The ADNS can then select a server based on the client subnet, which normally yields a better routing decision than traditional DNS.

ECS does not change the overall flow of a DNS query. To employ ECS, however, the recursive resolver includes in the DNS query an ECS option that carries the client subnet, which by default consists of the first three octets of the client's IP address. Upon receiving a DNS query with the ECS option, the ADNS can select an edge server based on the client subnet. In its response, the ADNS includes an ECS option containing a "scope prefix length" field to indicate the length of the client subnet actually used for server selection. The recursive resolver should cache this DNS response and reuse it only for clients covered by the prefix of the scope prefix length.
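The option itself is compact. A sketch of its wire format, per RFC 7871 (option code 8, a 2-byte address family, the source and scope prefix lengths, then only as many address octets as the prefix covers; the client address below is a documentation placeholder):

```python
import struct
from ipaddress import ip_network

def pack_ecs(client_ip, source_prefix_len=24):
    """ECS option as it appears in the OPT record of a query."""
    net = ip_network(f"{client_ip}/{source_prefix_len}", strict=False)
    family = 1 if net.version == 4 else 2          # 1 = IPv4, 2 = IPv6
    # Only ceil(prefix/8) octets of the (truncated) address are sent
    addr = net.network_address.packed[:(source_prefix_len + 7) // 8]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries)
    body = struct.pack("!HBB", family, source_prefix_len, 0) + addr
    return struct.pack("!HH", 8, len(body)) + body  # OPTION-CODE 8 = ECS

opt = pack_ecs("198.51.100.77")
assert opt[:4] == struct.pack("!HH", 8, 7)  # 4 fixed octets + 3 address octets
assert opt[8:] == bytes([198, 51, 100])     # host octet .77 is never sent
```

Truncating to a /24 is what bounds the privacy exposure discussed below: the ADNS learns the client's subnet, not its full address.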

Normally, a client subnet is closer to the actual client than the recursive resolver is, especially in the case of large open resolvers; the ECS extension therefore tends to improve the performance of DNS-based request routing. However, there are privacy and security concerns for users of ECS-enabled resolvers [24]. In ECS-enabled DNS, anyone on the path between a recursive resolver and an ADNS can observe the client subnet, providing a convenient channel for surveillance systems to monitor users. Moreover, an on-path attacker with access to the client subnet information can selectively redirect user traffic by forging DNS responses: the attacker places a target IP address in a malicious DNS response, the recursive resolver may cache the result for that client subnet, and future DNS queries from clients in the subnet are then affected.

3.4 TCP Connection Forwarding

TCP connection forwarding is a load balancing mechanism that spreads user traffic across different servers. Unlike DNS-based request routing, which uses the DNS to direct user requests, TCP connection forwarding focuses on the mapping between an IP address and a server. One TCP connection forwarding approach is the Network Dispatcher [25], a TCP connection router that forwards user traffic to a set of edge servers. Packets from the user arrive at the connection router and are then tunneled to one of the edge servers. Packets from the edge server to the client do not go through the connection router, which reduces its load. The connection router acts as a centralized load balancer, and a limitation of any centralized connection router is that if the device goes down, the whole system stops working. To address this issue, a number of distributed TCP connection forwarding mechanisms have been proposed; one of them is Distributed Packet Rewriting (DPR) [26]. Unlike the Network Dispatcher, DPR employs a set of distributed TCP connection routers co-located with the servers that provide the services. In DPR, user traffic arrives at one of the servers acting as a TCP router, referred to as the rewriter. The rewriter rewrites the packets and forwards them to another server, called the destination, that actually serves the user. As with the Network Dispatcher, the destination in DPR delivers the web content directly to the user. With distributed TCP connection routers, DPR avoids the drawbacks of a centralized system and improves the scalability of the whole system. All these TCP connection forwarding mechanisms focus on the mapping between an IP address and a server, which is similar to RIA. However, in TCP connection forwarding, all the packets from the user must go through the TCP connection router; the user cannot communicate with the assigned edge server directly. Thus, these approaches are suitable for routing a client's traffic among edge servers when all the edge servers (as well as the request router) are in the same location (the same data center).

3.5 IPv6 Anycast for CDNs

We are aware of two existing approaches that employ IPv6 Mobility to implement IPv6 anycast request routing in CDNs [10, 11]. Both propose a scheme to redirect client traffic from an anycast address to a selected unicast address.

The first method casts the request router as the home agent and the CDN server as the mobile node in MIPv6. The client initiates the connection by sending a SYN packet to the request router, which tunnels it to a CDN server that the request router selects. Seeing that the destination of the SYN packet is the address of the request router, the CDN server responds with a SYN-ACK packet carrying a destination options header that contains the IP address of the request router as the home address. In addition, a Binding Update is included in a mobility header of the SYN-ACK to tell the IP layer to create a binding between the address of the request router and the address of the selected CDN edge server. This system uses the fewest possible steps to route a request and establish a connection, nearly the same as the TCP three-way handshake. However, the authors do not describe any means to secure the request routing. As mentioned earlier, without the return routability procedure, an attacker can hijack the traffic with a forged SYN-ACK packet containing a binding between the request router and the attacker's address.

The second approach presents a versatile anycast using IPv6 Mobility [11]. This mechanism employs IPv6 Mobility to transparently redirect client traffic from the home agent to a so-called contact node, and finally to the donor or acceptor (mobile node). There are two important parts in their system: MIPv6 handoff and TCP handoff. The approach uses the full return routability procedure to handle the MIPv6 handoff; for TCP handoff, it employs the TCPCP tool to move the server-side TCP socket from the donor to the acceptor [21]. This architecture addresses the security issues and achieves the same security level as MIPv6, but its complex handoff procedure may impose significant performance overhead.

4. Overview of the Routing-Independent Anycast for TCP Communication

IPv6 Mobility allows a correspondent node to communicate, through a persistent home address, with a mobile node that moves around the IPv6 network. Our system borrows from IPv6 Mobility to achieve anycast for TCP connections. TCP is the most prevalent transport-layer protocol in CDNs, especially in the communication between clients and edge servers. Because our discussion is centered at the IP layer, we refer to TCP segments as packets throughout the thesis. The idea behind our approach was first proposed by another member of our lab [22]. We corrected and refined his design to produce a practical mechanism1, implemented a working prototype and demonstrated its transparency to applications, and conducted a performance evaluation of the mechanism.

A CDN operates a set of distributed servers. To deploy RIA, at least one of these servers is designated as an anycast server and is advertised under the anycast address, which plays a role similar to the home address in MIPv6. A user's request first arrives at one of the anycast servers because only the anycast address is known to the user. The remaining servers have their own unicast addresses (care-of addresses in MIPv6 terms) and are regarded as unicast servers in RIA. Clients are not aware of the unicast addresses. The web services are deployed on the unicast servers. Thus, the user's request is ultimately forwarded to one of the unicast servers after it arrives at the anycast server. The unicast server is selected by the anycast server based on client proximity. In the end, the client is able to communicate with the selected unicast server directly. In our basic design and in the prototype, we assume that the services on a unicast server are bound to its unicast address, which differs from what happens on a mobile node. However, it is also practicable for the unicast servers in RIA to bind the services to the anycast address, as we discuss in Section 5.5. In Routing-Independent Anycast, the client is able to communicate with one of the unicast servers through a persistent anycast address without knowing its unicast address.

1 Among the key changes, this thesis introduces holding the SYN-ACK packet within the network layer until the completion of binding verification, a mechanism to redirect clients to new unicast servers for new connections while preserving the old bindings for ongoing connections, and allowing for the possibility of multiple anycast IP addresses operating concurrently.

4.1 TCP Connection Establishment

Normally, when a client needs to request a web service, it sends out a DNS query. In a Routing-Independent Anycast CDN, the client ultimately receives a response with an anycast address in it. Figure 6 shows the basic packet flow for a TCP connection establishment in Routing-Independent Anycast. The client starts the connection by sending a SYN packet to the anycast address. This SYN packet is forwarded to one of the anycast servers through the conventional anycast mechanism.

Besides the standard packet contents, the client includes in the SYN packet an indicator that signals its support for our mechanism. This allows the anycast server to handle the SYN packet differently depending on the indicator. If the client has no support for our system, the anycast server does not forward the SYN packet to any unicast server; the packet might be accepted by another application deployed on the anycast server or simply dropped.

Upon receiving the SYN packet, the anycast server first checks whether the packet carries the indicator required by our approach. If it does, the anycast server selects a unicast server to handle the request and tunnels the SYN packet to this unicast server. The unicast server replies with a SYN-ACK packet to acknowledge the connection. This SYN-ACK packet includes the anycast address in a Destination Option (Dst Opt) header and is transmitted to the client directly.

The client first checks whether the IP address in the Dst Opt header equals the destination IP address of its SYN packet, to verify that the unicast server acts as an end-point of the anycast service. Then, the client constructs and sends out HoT and CoT messages just like the correspondent node does in MIPv6. Each message contains half of the test data. The HoT message is tunneled to the unicast server through the anycast server, while the CoT message is forwarded directly to the unicast server. The HoT message includes the unicast address in its Dst Opt header to indicate its ultimate destination. Finally, the client removes the Dst Opt header from the SYN-ACK packet and replaces the source address (the unicast address) with the anycast address. The client does not pass the SYN-ACK packet to the TCP layer until receiving a corresponding Binding Update (BU) message. An alternative for handling the SYN-ACK packet is discussed in Section 5.4.

Figure 6. Establishment of a TCP connection using the Routing-Independent Anycast mechanism

With both the HoT and CoT messages, the unicast server is able to prove that it is the right server, addressable by both the unicast address and the anycast address. To create this proof, the unicast server generates a binding management key (Kbm) from the test data included in the HoT and CoT messages. Then, the unicast server places the proof in a Binding Update (BU) message and sends it to the client. The BU contains a mobility header which in turn carries three extension options. The first is the Binding Authorization Data option, which includes an Authenticator derived from the Kbm. The second is the Nonce Indices option, which the client uses to reconstruct the Authenticator. The final option is the Home Address Destination option, which contains the anycast address and indicates to the client the binding between the anycast address and the unicast address. The unicast server sends out this Binding Update message and waits for the Binding Acknowledgement (BA).

Upon receiving the BU message, the client verifies the Authenticator and then combines the anycast address from the Dst Opt and the source IP address (the unicast address) to form a binding. This binding is stored in the Binding Cache, similarly to MIPv6. After that, the client can reply with a BA message and pass the SYN-ACK packet to the TCP layer. The TCP layer responds with an ACK packet to finish the TCP three-way handshake. This completes the connection establishment at the client, allowing the client to send data packets to the unicast server through the binding. On the server side, the BA allows the unicast server to activate the binding. Once the TCP layer within the unicast server receives the ACK, the connection is established at the server side.

4.2 Ongoing Connection

For the data packets of an ongoing connection, all the traffic from the client is tunneled to the unicast server directly. This is transparent to the layers above IP in the client, because the transformation of the packets occurs entirely at the IP layer. Figure 7 illustrates the modification of the packets for an established TCP connection in Routing-Independent Anycast. When outgoing packets reach the IP layer from the application on the client side, the client checks the Binding Cache and replaces the destination IP address with the unicast address. For incoming packets from the server, the client strips off the Dst Opt header (carrying the anycast address) and changes the source address from the server's unicast address to the anycast address at the IP layer. Therefore, the TCP client is oblivious to the redirection and believes it is still communicating with the anycast server.

On the server side, there is no address replacement at the IP layer because the connection is bound to the unicast address. For incoming packets from the client, the unicast server simply passes them to the TCP layer. For outgoing packets, the IP layer within the unicast server adds a Dst Opt header with the anycast address in it.

Figure 7. Packet manipulation for an established connection
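As a rough illustration of the client-side rewriting described above, consider the following sketch. The helper names and IPv6 addresses are hypothetical; a real implementation would operate on raw packets intercepted by Netfilter hooks rather than on tuples.

```python
# Hypothetical sketch of the client's IP-layer address translation for an
# established RIA connection. Addresses are plain strings; a real
# implementation would rewrite raw IPv6 packets via Netfilter hooks.

# Per-connection binding: TCP 4-tuple (with the anycast address as the
# destination) -> unicast address of the assigned edge server.
connection_binding = {
    ("2001:db8::c1", 50000, "2001:db8::a1", 80): "2001:db8::u7",
}

def rewrite_outgoing(src, sport, dst, dport):
    """Replace the anycast destination with the bound unicast address."""
    unicast = connection_binding[(src, sport, dst, dport)]
    return (src, sport, unicast, dport)

def rewrite_incoming(src, sport, dst, dport, dst_opt_addr):
    """Strip the Dst Opt (anycast) address and restore it as the source,
    so the TCP layer believes it still talks to the anycast server."""
    return (dst_opt_addr, sport, dst, dport)

out = rewrite_outgoing("2001:db8::c1", 50000, "2001:db8::a1", 80)
inc = rewrite_incoming("2001:db8::u7", 80, "2001:db8::c1", 50000,
                       dst_opt_addr="2001:db8::a1")
```

Because both rewrites are pure lookups and substitutions, the transport layer never observes the unicast address.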

4.3 Binding Management and Connection Termination

Routing-Independent Anycast employs the standard TCP four-way handshake to terminate a connection. However, the client needs to record whether a connection is closed or not, because the binding management depends on this information. Below we describe the binding management in RIA and how it is affected by connection termination.

One of the disadvantages of conventional IPv4 anycast is that ongoing connections are disrupted when a routing change sends user traffic to a new anycast end-point. Our approach overcomes this problem by allowing new connections to be routed to a new edge server while keeping ongoing connections bound to the old servers. To accomplish this, our approach sets up two binding structures inside the client. The first is a common Binding Cache just like in MIPv6, which records the bindings between anycast addresses and unicast addresses. This binding allows the client to open multiple connections to the same unicast server without involving the anycast server every time. The second is a per-connection binding structure which remembers the unicast addresses for specific TCP connections. The connection binding maps the TCP 4-tuple (source address, source port, destination address, destination port), which on the client side uses the anycast address as the destination address, to the edge server's unicast address. Every time a connection is established, the client creates a binding entry not only in the common Binding Cache but also in the connection binding structure. For all the subsequent data packets, the client looks up the connection binding to modify the IP addresses.

With these two structures, we are able to separate the connection state from the Binding Cache. For instance, if the anycast server decides to direct a user to another unicast server, or the lifetime of a certain binding expires, the client simply removes the corresponding entry from the Binding Cache. New connections will go to the new unicast server, while old connections continue using the unicast servers to which they were assigned during the handshake, since their connection binding structure remains unmodified. Now, when the client sends out a SYN packet to initiate a new connection, the anycast server can choose a different unicast server to handle the request, and a new binding is created in the client's Binding Cache. Because ongoing connections that still need the old binding are covered by the connection binding structure, they are not interrupted.
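The effect of this separation can be seen in a small sketch (addresses are placeholders). Removing or replacing the Binding Cache entry only changes where new connections go; the per-connection entry keeps the old assignment alive:

```python
# Sketch of why separating the Binding Cache from the per-connection
# bindings preserves ongoing connections across a redirection.

binding_cache = {"2001:db8::a1": "2001:db8::u1"}        # anycast -> unicast
connection_binding = {                                   # 4-tuple -> unicast
    ("2001:db8::c1", 50000, "2001:db8::a1", 80): "2001:db8::u1",
}

# The anycast server redirects the client to a new unicast server: the old
# Binding Cache entry is dropped and replaced for future connections.
del binding_cache["2001:db8::a1"]
binding_cache["2001:db8::a1"] = "2001:db8::u2"

new_server = binding_cache["2001:db8::a1"]               # used by new SYNs
old_server = connection_binding[                         # used by old flows
    ("2001:db8::c1", 50000, "2001:db8::a1", 80)]
```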

The principle of managing the connection binding structure is straightforward: the client creates the connection binding record once the connection is established and deletes the entry after the connection is terminated. While the timing for creating an entry is explicit (when the client responds to a SYN-ACK with an ACK packet), when to remove a connection binding needs to be considered carefully. Several situations lead our system to consider a connection closed.

Obviously, the common scenario is that a connection is terminated through the TCP four-way handshake. In this case, either the client or the unicast server can initiate termination by sending the first FIN packet, or they may send FIN packets simultaneously. Either way, there should be at least two FIN packets and two corresponding ACK packets within the connection. Therefore, RIA within the client can delete the connection binding after it detects these four packets and the TCP layer waits for a certain time (the TIME_WAIT state). The TIME_WAIT state is a two-maximum-segment-lifetime (2 MSL) wait state and is by default set to 120 s in Linux [12]. The second indication that we can use to consider a connection closed is a RST packet in either direction. The client removes the connection binding entry 120 s (the same timeout as in the TIME_WAIT state) after it sees an RST packet within the connection. The last situation we need to handle is a connection that remains idle for a long time, for which a timeout should be set. Judging by the default timeouts in the Azure load balancer [13] and in Cisco NAT boxes [14], the timeout for an idle connection varies from 30 min to 60 min. In our system, we choose the relatively conservative timeout of 60 min. Thus, an idle connection is considered expired after 60 min, at which point the client removes the corresponding connection binding entry.
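The three removal rules above can be sketched as a single decision function. The entry layout is illustrative (we use a single `last_packet` timestamp rather than the Idle_time counter of the actual structures), but the timeouts match those stated in the text:

```python
# Sketch of the client's rules for deciding when a connection binding
# entry may be removed: four-way close or RST followed by a 120 s
# quarantine, or a 60-minute idle timeout. Field names are illustrative.

TIME_WAIT = 120          # seconds, matching the cited Linux TIME_WAIT value
IDLE_TIMEOUT = 60 * 60   # 60 minutes for idle connections

def binding_removable(entry, now):
    """entry holds FIN/ACK/RST counters and the last-packet timestamp."""
    # Normal four-way close: at least two FINs and two ACKs were seen,
    # plus the TIME_WAIT quarantine.
    if entry["FIN_count"] >= 2 and entry["ACK_count"] >= 2:
        return now - entry["last_packet"] >= TIME_WAIT
    # Abortive close: an RST in either direction, same quarantine.
    if entry["RST_count"] >= 1:
        return now - entry["last_packet"] >= TIME_WAIT
    # Otherwise the connection must have been idle for a long time.
    return now - entry["last_packet"] >= IDLE_TIMEOUT
```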

4.4 Security Consideration

Routing-Independent Anycast is able to achieve the same security level as MIPv6, because our approach fully employs the MIPv6 return routability procedure except for the HoTI and CoTI messages. There are two possible attacks against our approach, similar to the threats against MIPv6. The first is a denial-of-service (DoS) attack on the anycast server. MIPv6 uses HoTI and CoTI to address this issue, as introduced in Section 2.2. However, the lack of HoTI and CoTI does not undermine our approach, because RIA uses the SYN-ACK packet with a Dst Opt to trigger the HoT and CoT. To perform the attack, the attacker has to forge a well-timed SYN-ACK packet with a correct 32-bit sequence number in response to the SYN packet. Knowing when to send the forged packet while guessing the correct sequence number within the short window (2 seconds in Linux by default) during which the client would accept it is virtually impossible. Unexpected SYN-ACK packets are not accepted by the client. The client usually retransmits the SYN packet (3 times in Linux by default) when there is no matching SYN-ACK, but this still leaves the attacker little time to forge a correct SYN-ACK packet. In addition, even if the attacker manages to generate a correct SYN-ACK, the client replies with only one HoT packet for each TCP connection it initiates; such a small number of bogus HoT packets would not overwhelm the anycast server. The second threat is that an attacker could hijack the communication by sending a forged binding between the client's address and an address chosen by the attacker. The attacker could also mount a DoS attack on a victim by directing unexpected user traffic to it. However, RIA deals with this problem just like MIPv6: the combination of HoT, CoT and the authenticated Binding Update ensures that the client talks to the right unicast server, addressable by both the unicast address and the anycast address.

An increasing share of web content uses the SSL/TLS security standards, the building blocks of the HTTPS protocol [15]. CDNs like Akamai follow this trend and integrate these security standards in their networks. SSL/TLS and HTTPS run on top of the TCP protocol, so they can be used directly on top of our approach. In summary, Routing-Independent Anycast achieves the same security features as MIPv6 and existing CDNs.

4.5 Benefits

We designed Routing-Independent Anycast primarily for request routing in CDNs. Our approach achieves several improvements compared with DNS-based and traditional anycast request routing. Our anycast server is capable of selecting the “best” edge server based on the identity of the client, instead of the LDNS as in DNS-based CDNs. Moreover, a client's ongoing connections are not interrupted when the anycast server decides to tunnel its traffic to another unicast server. Furthermore, the anycast servers can be aware of the condition of the network and the load of the unicast servers; with this information, our system does not select the unicast server purely based on the routing path, as traditional anycast does. We do require the client to install software to use our approach, but we believe this is acceptable: it is not uncommon to download add-ons or extensions to achieve better performance in today's Internet.

5. Implementation

We implemented a prototype of Routing-Independent Anycast on the Linux operating system. Linux is one of the most widely used systems that support IPv6 Mobility. Also, the Linux kernel includes Netfilter, a framework we can use to intercept incoming and outgoing packets in each node [16]. In fact, our approach can be deployed on any system as long as it supports IPv6, and according to [17], a majority of operating systems support IPv6. We do not utilize the entire MIPv6 functionality to implement our method; for instance, the HoTI and CoTI messages are not used in our approach. We re-implemented the part of the MIPv6 functionality that we need, while preserving all the types and formats of the Mobility header.

Our implementation involves three components of the CDN architecture we envision: an anycast server that receives the initial request from the client and forwards it to a selected unicast server, a unicast server that actually provides the web service to the client, and a client. While the unicast server has its own IP address not shared with any other node, its role is analogous to an anycast end-point: the upper-layer protocols (from TCP and up) have the illusion of communicating with the same anycast address although they actually obtain service from one of a number of unicast servers. All the nodes in our approach must support IPv6. We now describe our implementation of each of the three components.

In our prototype, all the custom packet processing is done using Netfilter to intercept packets within the network layer and hand them off to custom modules in user space. In a real system, RIA functionality would be implemented within the network layer directly. Our prototype is written in the Python programming language, but it could be implemented in C for better performance.

5.1 Anycast Server

The anycast server acts as the request router of the CDN. All the SYN packets from clients are directed to the anycast server and intercepted at the IP layer. Before forwarding a request to a unicast server, the anycast server checks whether the SYN packet carries an indicator specifying that the user supports RIA; we set the reserved bits within the TCP header to 001 for this purpose. If there is no indicator, the request is not meant to be processed by our approach, and the anycast server simply accepts the connection and acts as a regular edge server in a traditional DNS-based CDN. Otherwise, the anycast server removes the indicator by setting the reserved bits to 000 and selects the most appropriate unicast server to forward the packet to. The algorithm for choosing an edge server is not part of our prototype, so the anycast server selects a random edge server. In practice, an arbitrary edge-server selection mechanism based on the true identity (IP address) of the client can be plugged in here. The anycast server forwards the SYN packet to the selected server by replacing the destination IP address of the packet from its own (anycast) address to the unicast address of the selected edge server.

Another task required of the anycast server is to direct HoT messages. Each HoT message contains a Dst Opt header which includes the corresponding unicast address. Thus, the anycast server does not need to maintain any per-connection information between the SYN and the HoT packets: it can forward the HoT message to the unicast server by simply changing the destination IP address to the unicast address taken from the Dst Opt header.

After a connection is established, the traffic of this connection no longer goes through the anycast server. Hence, the anycast server does not need to maintain any connection state. Further, as mentioned earlier, the anycast server also keeps no state across packets during connection establishment: all its processing is done on a per-packet basis. Thus, the anycast server is likely to scale to a large number of concurrent connections, which improves the performance of the whole system.
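The stateless, per-packet nature of the anycast server can be sketched as follows. Packets are modeled as dictionaries and `select_edge_server` stands in for the pluggable selection algorithm (the prototype picks randomly); all names and addresses are illustrative:

```python
# Sketch of the anycast server's stateless per-packet processing: strip
# the RIA indicator from SYN packets and rewrite the destination, and
# redirect HoT messages using the unicast address found in the Dst Opt.

import random

EDGE_SERVERS = ["2001:db8::u1", "2001:db8::u2", "2001:db8::u3"]

def select_edge_server(client_addr):
    # Placeholder for the real selection algorithm; the prototype
    # chooses a random edge server.
    return random.choice(EDGE_SERVERS)

def handle_packet(pkt):
    if pkt["kind"] == "SYN":
        if pkt["tcp_reserved"] != 0b001:    # no RIA indicator: handle locally
            return pkt
        pkt["tcp_reserved"] = 0b000         # remove the indicator
        pkt["dst"] = select_edge_server(pkt["src"])
        return pkt
    if pkt["kind"] == "HoT":
        # The Dst Opt names the unicast server, so no state is needed.
        pkt["dst"] = pkt["dst_opt"]
        return pkt
    return pkt

syn = handle_packet({"kind": "SYN", "src": "2001:db8::c1",
                     "dst": "2001:db8::a1", "tcp_reserved": 0b001})
hot = handle_packet({"kind": "HoT", "src": "2001:db8::c1",
                     "dst": "2001:db8::a1", "dst_opt": "2001:db8::u2"})
```

Note that neither branch consults any table built from earlier packets, which is what makes the anycast server trivially scalable.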

5.2 Unicast Server

Unicast servers are CDN edge servers, located around the world, that provide web services to clients. The anycast server assigns a client to a unicast server by forwarding the client's SYN packet to it. The unicast server then informs its clients of the binding between the anycast address and the unicast address. Once the connection is established, the client is able to talk to the unicast server directly using the binding.

5.2.1 Control State

We describe the structures recording the control state in this subsection and their use during unicast server operation in the next. There are four structures maintained by a unicast server. First of all, the unicast server uses the SYN-RCV set (an unordered Python collection of unique elements) to remember the source address of each SYN packet tunneled by the anycast server. With SYN-RCV, the IP layer is able to recognize the corresponding SYN-ACK packets passed down from the TCP layer. After processing and sending out the SYN-ACK packet, the unicast server moves the record from SYN-RCV to another set called SYN-ACK-SENT.

Key                Value
-----------------  --------------------
Client Address     Anycast Address
                   Unicast Address
                   Lifetime
                   Home Keygen Token
                   Care-of Keygen Token
                   Home Nonce Index
                   Care-of Nonce Index
                   Status
                   Authenticator
                   Sequence Number
                   Created_time

(a) Format of Binding Update map

Key                               Value
--------------------------------  ---------------
TCP 4-Tuple = (Client Address,    Anycast Address
sport, Unicast Address, dport)    Status
                                  FIN_count
                                  ACK_count
                                  Idle_time
                                  RST_count

(b) Format of Connection Binding map

Figure 8. Select structures in the unicast server

The unicast server does not have to modify packet addresses at the IP layer like the client does. However, it needs to inform the client of which anycast server the client originally accessed. The unicast server does this by including a Home Address Destination option in each outgoing packet; this option can be carried in the Dst Opt header. Furthermore, every unicast server maintains a Binding Update (BU) map, a key-value structure that records the information of each Binding Update the unicast server sends to clients. The BU map also records the test information from the HoT and CoT messages. The structure of a Binding Update record is displayed in Figure 8 (a).

Finally, there is a Connection Binding map that records the binding information for each connection (Figure 8 (b)). With the Connection Binding map, the unicast server is able to preserve the state of TCP connections even when the corresponding entry in the Binding Update map expires (exceeds the lifetime specified by the server).

5.2.2 Unicast Server Operation

Once it receives a SYN packet tunneled by the anycast server, the unicast server checks whether the source address of the SYN packet (the client address) is already in the Binding Update map. If so, the unicast server reuses this binding instead of sending a new Binding Update to the client. Otherwise, the unicast server records the source address in SYN-RCV and passes the SYN packet to the TCP stack. The TCP server responds with a SYN-ACK packet. The IP layer within the unicast server checks whether the destination IP address matches any record in SYN-RCV. A match means that the unicast server needs to trigger the procedure to create a new binding. Therefore, the unicast server inserts into the SYN-ACK packet a Dst Opt header that contains the anycast address and sends it to the client. The unicast server then moves the client address from SYN-RCV to SYN-ACK-SENT and waits for the HoT and CoT messages from the client.
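The SYN-RCV / SYN-ACK-SENT bookkeeping can be sketched as below. The packet model and addresses are illustrative; in the prototype this logic sits behind Netfilter hooks that intercept SYN-ACKs coming out of the local TCP stack:

```python
# Sketch of the unicast server's connection-establishment bookkeeping:
# remember clients whose SYN was handed to TCP, then tag the resulting
# SYN-ACK with the anycast address and move the client to SYN-ACK-SENT.

ANYCAST_ADDR = "2001:db8::a1"   # illustrative anycast address
syn_rcv = set()                 # clients whose SYN was passed to TCP
syn_ack_sent = set()            # clients to whom a tagged SYN-ACK was sent

def on_tunneled_syn(client_addr):
    syn_rcv.add(client_addr)    # remember, then hand the SYN to the TCP stack

def on_syn_ack_from_tcp(dst_addr, pkt):
    """Intercept SYN-ACKs produced by the local TCP stack."""
    if dst_addr in syn_rcv:
        pkt["dst_opt"] = ANYCAST_ADDR   # tell the client which anycast
        syn_rcv.discard(dst_addr)       # address it originally contacted
        syn_ack_sent.add(dst_addr)
    return pkt

on_tunneled_syn("2001:db8::c1")
pkt = on_syn_ack_from_tcp("2001:db8::c1", {"flags": "SA"})
```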

Octet   0                            1           2           3
0       (Part of the Mobility header)            Sequence #
4       A|H|L|K    Reserved                      Lifetime
8+      Mobility options (variable length)

Figure 9. Format of the Binding Update message

The HoT message first arrives at the anycast server and is then redirected to the unicast server, while the CoT message is delivered to the unicast server directly. The HoT and CoT messages contain the two halves of a puzzle, called the home keygen token and the care-of keygen token respectively. Both messages also include so-called nonce indices (introduced in Section 5.3), which parameterize the puzzle computation. When the first of the HoT and CoT messages arrives at the unicast server, the unicast server creates an inactive entry in the Binding Update map to store the information contained in the test messages. Upon receiving both the HoT and CoT messages, the unicast server first makes sure that they are part of a valid binding creation procedure, i.e., that the source address exists in SYN-ACK-SENT. If so, the unicast server combines the tokens from the HoT and CoT to form a Kbm and further computes a 96-bit Authenticator. Following that, the unicast server constructs a BU message and creates an entry in the Connection Binding map. The format of the BU message is shown in Figure 9. The Binding Update message includes three options: a Binding Authorization Data option, a Home Address option and a Nonce Indices option. The 96-bit Authenticator is placed in the Binding Authorization Data option; the client uses this field to verify return routability by recalculating the Authenticator and checking that it matches the one from the BU message. The Home Address option contains the anycast address and indicates the anycast server that the client originally communicated with. The nonce indices are included in the Nonce Indices option, with which the client is able to find the original parameters and recompute the Authenticator consistently. The lifetime field represents the time units remaining before the binding is considered expired; the unicast server is responsible for assigning the lifetime, and one time unit is 4 seconds. The sequence number field is used by the unicast server to match the corresponding Binding Acknowledgement (BA) message that the client will send after verifying the BU message.
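The Kbm and Authenticator computation follows the MIPv6 return routability scheme, where the key is a SHA-1 hash over the concatenated keygen tokens and the Authenticator is the first 96 bits of an HMAC-SHA-1 under that key. The sketch below uses placeholder token values, and the HMAC input is a simplification of the real mobility-header data:

```python
# Sketch of the binding management key (Kbm) and 96-bit Authenticator
# computation, modeled on MIPv6 return routability:
#   Kbm = SHA-1(home keygen token | care-of keygen token)
#   Authenticator = first 96 bits of HMAC-SHA-1(Kbm, BU data)
# The HMAC input below stands in for the actual addresses and BU fields.

import hashlib
import hmac

def compute_kbm(home_token: bytes, careof_token: bytes) -> bytes:
    return hashlib.sha1(home_token + careof_token).digest()

def compute_authenticator(kbm: bytes, message: bytes) -> bytes:
    return hmac.new(kbm, message, hashlib.sha1).digest()[:12]  # 96 bits

kbm = compute_kbm(b"home-token", b"careof-token")
auth = compute_authenticator(kbm, b"anycast|unicast|bu-fields")
```

Only a party that has seen both tokens — i.e., both the HoT and the CoT — can produce a matching Authenticator.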

As mentioned above, before sending out the Binding Update message, the unicast server has already created entries in the Binding Update map and the Connection Binding map. However, the unicast server cannot activate these two entries before receiving the BA message from the client. Therefore, the “status” field in each entry indicates whether or not the entry is active. After the entries for the current connection are created, the status field is set to False, marking the entries as inactive. Now the unicast server transmits the Binding Update message to the client and waits for the BA message to activate the binding and for the ACK packet to finish the three-way handshake.

Upon receiving the BA message, the unicast server verifies it by checking that the sequence number field matches the original number in the Binding Update map. In addition, a status field in the BA indicates whether the binding was accepted by the client. If the status field implies the binding was accepted (the value is less than 128), the unicast server activates the entries in both the Binding Update map and the Connection Binding map by setting their status fields to True.2 Meanwhile, the ACK packet may arrive at the unicast server; it is passed to the TCP layer once the binding is activated. If the ACK arrives before the BA, it waits at the IP layer until the connection binding is activated. There is a possible exception in which the BA message fails to arrive at the unicast server; we discuss the solution to this issue in Section 5.6.

This completes the connection establishment on the server side (which also means that the client has completed the connection establishment on its side as well – see Section 5.3 below). For outgoing data packets received from the TCP layer, the unicast server retrieves the TCP 4-tuple from the IPv6 and TCP headers of each packet and looks up the corresponding anycast address in the Connection Binding map. The anycast address is placed in the Dst Opt header of the outgoing data packets. For incoming data packets, the unicast server simply checks that the corresponding connection binding exists and passes them to the TCP layer.

2 Our prototype performs two recalculations and retransmissions of the BU message in order to deal with a BA message indicating that the binding is not acceptable. However, we now believe this is not useful, because the recalculation does not change the BU message.
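The server-side data path for established connections amounts to one map lookup per outgoing packet. A minimal sketch, with illustrative addresses and a dictionary standing in for the Connection Binding map:

```python
# Sketch of the unicast server's data path: each outgoing packet gets a
# Dst Opt header carrying the anycast address found by looking up the
# connection's 4-tuple in the Connection Binding map.

connection_binding = {
    # (client address, sport, unicast address, dport) -> anycast address
    ("2001:db8::c1", 50000, "2001:db8::u1", 80): "2001:db8::a1",
}

def on_outgoing(pkt):
    # The map is keyed from the client's perspective, so swap the
    # endpoints of this outgoing (server -> client) packet.
    key = (pkt["dst"], pkt["dport"], pkt["src"], pkt["sport"])
    pkt["dst_opt"] = connection_binding[key]
    return pkt

out = on_outgoing({"src": "2001:db8::u1", "sport": 80,
                   "dst": "2001:db8::c1", "dport": 50000})
```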

5.3 Client

The client is a node that requests the service from the server. It originally sends the request to the anycast server and then creates a binding between the anycast address and the unicast address. With the binding, the client is able to communicate with the unicast server directly by changing the server address from anycast to unicast and back. Note that all the address replacement and binding management occur at the IP layer; the replacement is therefore transparent to the upper layers (the transport layer and the application layer).

5.3.1 Control State

The client needs to manage five structures: the SYN-SENT set, the SYN-ACK-RCV set, the Binding Cache, the Connection Binding map and the Nonce list. The SYN-SENT set maintains the TCP 4-tuple of each SYN packet sent by the client. A record in SYN-SENT is later moved to the SYN-ACK-RCV set after the client receives the corresponding SYN-ACK packet. The Binding Cache is a key-value collection that maps an anycast address to the corresponding unicast address. The structure of the Binding Cache is shown in Figure 10 (a); almost every field is similar to the Binding Update map in the unicast server, except that the key is the anycast address. Similarly, the structure of the Connection Binding map (Figure 10 (b)) is analogous to the one in the unicast server except for the key field. The Nonce list consists of multiple random numbers that are used to generate the tests in the return routability procedure. Each nonce is identified by its nonce index within the Nonce list.

Key                Value
-----------------  ---------------
Anycast Address    Unicast Address
                   Lifetime
                   Status
                   Sequence Number
                   Created_time

(a) Format of Binding Cache

Key                               Value
--------------------------------  ---------------
TCP 4-Tuple = (Client Address,    Unicast Address
sport, Anycast Address, dport)    Status
                                  FIN_count
                                  ACK_count
                                  Idle_time
                                  RST_count

(b) Format of Connection Binding map

Figure 10. Select structures in the client

5.3.2 Client Operation

The client starts the communication by sending a SYN packet to the anycast address. Before the SYN packet goes out, the client includes in it an indicator to notify the server that the client supports RIA. Here we make use of the reserved bits in the TCP header, setting one of them to 1 to indicate RIA support. The client also records the TCP 4-tuple in SYN-SENT; the elements of the 4-tuple are easily obtained from the IPv6 and TCP headers of the packet.

When receiving the SYN-ACK packet, the client does not pass it to the TCP layer directly. Instead, the client notices that the source address of the SYN-ACK packet differs from the address (the anycast address) included in the Dst Opt header of the packet. With this information, the client realizes that it has reached a RIA service. Now the client checks whether the TCP 4-tuple from the SYN-ACK matches any record in SYN-SENT. If there is no match, the client does not waste time requesting a binding from the unicast server and simply replies with a RST packet to reject the connection. Otherwise, the client moves the record from SYN-SENT to SYN-ACK-RCV and starts the return routability procedure by sending HoT and CoT messages to the unicast server. Each test message contains its own nonce index, which is the index of a nonce within the Nonce list, and a token generated from that nonce. In addition, the HoT message includes the unicast address in a Dst Opt header. Therefore, the CoT message arrives at the unicast server directly, while the HoT message is sent to the anycast server and then tunneled to the unicast server. The client then creates an inactive entry in the Binding Cache. Note that the SYN-ACK packet is still held at the IP layer; if it stays there longer than 4 seconds, it is dropped and the connection establishment attempt is considered to have failed.

Normally, the unicast server responds to the client with a Binding Update message that contains the binding and security information. Upon receiving the Binding Update message, the client must regenerate the home keygen token and the care-of keygen token from the information contained in the packet. It then generates the Kbm and uses it to verify the Authenticator field. After verifying the Authenticator, the client activates the binding in the Binding Cache and creates a new active entry in the Connection Binding map. The lifetime of the entry in the Binding Cache is specified by the lifetime field in the Binding Update message; any item in the Binding Cache is deleted after its lifetime expires. The Connection Binding map is keyed by the TCP 4-Tuple and does not have a lifetime field; its items expire and are removed only after the connection terminates. This allows ongoing connections to continue using their unicast servers even after the corresponding binding entries expire, or are even replaced with new bindings that map the anycast IP address to another unicast server.
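The token and key derivation that RIA inherits from the MIPv6 return routability procedure can be sketched as follows. This is a simplified model after RFC 3775, where a keygen token is the first 64 bits of an HMAC-SHA1 over the address and nonce, Kbm is the SHA-1 hash of the two tokens, and the Authenticator is the first 96 bits of an HMAC-SHA1 keyed by Kbm; the function names are ours:

```python
import hashlib
import hmac

def keygen_token(kcn: bytes, addr: bytes, nonce: bytes, flag: int) -> bytes:
    # First(64, HMAC_SHA1(Kcn, addr | nonce | flag)); flag 0 = home, 1 = care-of
    return hmac.new(kcn, addr + nonce + bytes([flag]), hashlib.sha1).digest()[:8]

def kbm(home_token: bytes, careof_token: bytes) -> bytes:
    # Kbm = SHA1(home keygen token | care-of keygen token)
    return hashlib.sha1(home_token + careof_token).digest()

def authenticator(kbm_key: bytes, mobility_data: bytes) -> bytes:
    # First(96, HMAC_SHA1(Kbm, mobility data))
    return hmac.new(kbm_key, mobility_data, hashlib.sha1).digest()[:12]

def verify_bu(kbm_key: bytes, mobility_data: bytes, received_auth: bytes) -> bool:
    """Client-side check of the Authenticator field in a Binding Update."""
    return hmac.compare_digest(authenticator(kbm_key, mobility_data), received_auth)
```

The client recomputes both tokens from the nonce indices in the BU, derives Kbm, and accepts the binding only if `verify_bu` succeeds.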

Once the above structures are set up, the client creates a Binding Acknowledgement (BA) message and sends it to the unicast server. The values of the sequence number and lifetime fields are copied from the BU. The status field indicates whether the client accepted the binding. After all this work, the client can finally pass the SYN-ACK packet to the TCP layer, which replies with an ACK packet to finish the TCP three-way handshake.

Now the connection and the binding are established within the client, and the client can exchange data packets with the server. Since the upper layers (above the IP layer) are oblivious to our RIA mechanism, they still address data packets to the anycast address. However, the IP layer at the client intercepts each packet and checks whether its TCP 4-Tuple matches any key in the Connection Binding map. If so, the destination address (the anycast address) is replaced with the unicast address according to the Connection Binding entry. If there is a need to support multiple anycast servers with distinct IP addresses, the client also needs to insert a type 2 routing header (see Sec. 5.5 for further details). This ensures that the data packet reaches the unicast server directly. For incoming packets, the client strips off the Dst Opt header and replaces the source address (the unicast address) with the anycast address from the Dst Opt.
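The per-packet rewriting just described can be sketched as a pair of pure functions over the Connection Binding map. This is a simplified stand-in for the Netfilter hook in our prototype; the addresses and names are illustrative:

```python
# Connection Binding map: TCP 4-Tuple -> unicast server address.
connection_binding = {
    # (client_addr, sport, anycast_addr, dport) -> unicast_addr
    ("2001:db8::c", 51000, "2001:db8::a", 80): "2001:db8::u1",
}

def rewrite_outgoing(src, sport, dst, dport):
    """Replace the anycast destination with the bound unicast address;
    packets with no matching binding pass through unchanged."""
    unicast = connection_binding.get((src, sport, dst, dport))
    if unicast is None:
        return (src, sport, dst, dport)
    return (src, sport, unicast, dport)

def rewrite_incoming(src, sport, dst, dport, anycast_from_dst_opt):
    """Replace the unicast source with the anycast address carried in the
    packet's Dst Opt header, so TCP sees the address it connected to."""
    return (anycast_from_dst_opt, sport, dst, dport)
```

In the prototype the same lookup also decides whether a type 2 routing header must be inserted for the multi-anycast-server variant.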

5.4 An Alternative for Handling SYN-ACK Packets

There is another option for handling the SYN-ACK packet within the client, compared with the one in our prototype. We describe both options below and discuss their pros and cons.

In our prototype, the client holds on to the SYN-ACK packet at the IP layer and creates a new entry in the Binding Cache before passing it to the TCP layer. Figure 11 shows the basic packet flow in this option and its potential problems. Specifically, upon receiving a SYN-ACK packet with a Home Address Destination option, the client extracts the anycast address from the Dst Opt and checks if the 4-Tuple exists in the SYN-SENT set (created when the client sends out SYN packets).

Figure 11. Handling SYN-ACK in the prototype and its potential problem

If the answer is no, the packet is not supposed to be handled by our system and is simply passed to the TCP layer. When the packet does have a corresponding record in the SYN-SENT set, the client checks if the anycast address matches any key in the Binding Cache. A matching entry means that the binding for this anycast address has not expired. The client then reuses the binding by replacing the source address with the anycast address and passing the packet to the TCP layer. Otherwise, the client needs to create a new binding for the anycast address. To acquire the binding, the client sends a CoT message directly, and a HoT message through the anycast server, to the unicast server. Upon accepting the Binding Update message, the client creates a new entry in both the Binding Cache and the Connection Binding map and replies with a BA message. Only at this point does the client pass the SYN-ACK packet to the TCP layer. Normally, the upper layer then responds with an ACK packet.

Now both the TCP connection and the binding are established within the client.

Figure 12. An alternative for handling SYN-ACK

Compared with the method in the prototype, the other option does not require the client to hold on to the SYN-ACK packet. Instead, the client passes the SYN-ACK directly to the TCP layer above, but intercepts the corresponding ACK packet and requests the binding from the unicast server while holding the ACK. Figure 12 presents the steps of this alternative and its potential problems.

In this alternative, once the client receives the SYN-ACK packet, it replaces the source address with the anycast address from the Dst Opt and removes the Dst Opt header from the packet. Unlike in the first approach, the client passes the SYN-ACK to the upper layer directly, without a new binding. The corresponding ACK packet is intercepted by the IP layer on its way out from the TCP layer. The client then sends out the HoT and CoT messages along with the ACK in order to obtain a new binding and finish the three-way handshake. The CoT message piggybacks the ACK from the client acknowledging the SYN-ACK packet. Furthermore, the client creates an inactive binding in both the Binding Cache and the Connection Binding map. Now the TCP connection is established within the client. However, all the data packets (subsequent packets after the three-way handshake) for this connection are not permitted to pass through the IP layer because the binding is not activated yet. After receiving the HoT and CoT, the unicast server strips off the Dst Opt header from the CoT/ACK packet and passes the resulting ACK to the TCP layer above. In addition, the unicast server computes the Authenticator from the HoT and CoT messages and responds to the client with a Binding Update message. Once receiving the BU, the client activates the binding in both structures and replies with a BA message. With the binding now active, any data packets accumulated in the IP layer can be sent out to the unicast server. Note that the first such data packet can be piggybacked on the BA message.

The primary drawback of the first method is that if the client refuses the SYN-ACK, all the work for creating the binding is wasted. However, the client verifies the TCP 4-Tuple of the SYN-ACK (namely, that it matches a previously sent SYN) before passing it to the TCP stack, which reduces the likelihood of rejected connections. In addition, the first method needs an extra packet for the ACK to complete the TCP three-way handshake. Even if the first method piggybacked the final ACK of the handshake on the BA message, the second method could still save a packet if both options did away with the BA message. The BA message is only used to notify the unicast server that the client has accepted the binding, so that the unicast server can activate it. The unicast server could instead activate the binding immediately after sending out the BU, assuming the client will accept it. Even if the binding is rejected by the client, the unicast server can delete the binding once the connection stays idle for a certain time.

There are two main shortcomings of the second option. First, this alternative may require more space within the client than the first option does. Indeed, the TCP layer at the client can start sending data packets as soon as it sends the final ACK of the handshake. In the second option, this ACK is sent before the binding is activated, so any data packets the client sends from this point on, until the binding is finalized, have to be held within the IP layer of the client. Thus, the client needs to allocate a buffer to store these waiting packets.

Second, this alternative involves much more complexity, as it needs to essentially duplicate TCP's congestion control mechanism. Indeed, the client cannot send out all the accumulated data packets at once when the binding becomes active, because that would violate the TCP slow start restriction. Rather, the network layer would have to mimic TCP slow start in sending out the packets. Furthermore, any ACKs coming back from the server would presumably be passed up to the TCP stack immediately. This would open up the congestion window within the TCP stack, causing TCP to potentially hand an exponentially growing number of packets to the network layer. In fact, since handing off packets between layers on the same machine is likely faster than transmitting packets into the network, the required buffer space within the network layer may keep growing. More generally, the performance effects of the interaction between the duplicate congestion control mechanisms at the TCP and network layers are unclear.

With the method we chose for our prototype, the data packets do not have to wait for the binding. They can be released immediately because the binding is activated before the connection is established within the client. There is no need for an extra buffer or for duplicated congestion control within the network layer. These benefits seem worth the one extra packet the client needs to send during connection establishment.

5.5 An Alternative for Supporting Multiple Anycast Servers

A CDN usually employs multiple request routers to receive requests, since a single request router may not be able to handle all the user requests. In addition, the request routers should be spread across various geographical locations so that a user can communicate with a request router that is not too far away, reducing the connection delay.

In RIA, we assume there are multiple anycast servers that share the same anycast address. All the anycast servers act as request routers to handle incoming user requests, and a user request is forwarded to one of the anycast servers based on BGP routing. We now present an alternative that supports multiple anycast servers (request routers) with different IP addresses, where each anycast address represents a set of unicast servers. This option has several benefits. First, it allows the CDN to perform load balancing in a controlled manner, because the CDN can forward user requests to a particular anycast server by specifying its distinct IP address, instead of relying on BGP routing. Furthermore, the CDN may wish to partition the search space for unicast server selection. For example, Akamai uses subsets of edge servers for different websites, so it may decide to use different anycast IP addresses, each routing requests to a different set of unicast servers, to handle different websites.

Multiple anycast servers with different IP addresses cannot simply fit into our prototype. Assume that a client starts two connections with two different anycast servers (A1 and A2), and both anycast servers forward the requests to the same unicast server. Assume also that both connections are for the same server port number, e.g., port 80. Now there are two connections between the client and one unicast server, and the client needs to create two entries in the Connection Binding map. Because these two connections are established through different anycast servers, there is a chance that the client uses the same source port number for both connections. For example, the keys of the two entries in the Connection Binding map can be (Client_Address, Src_Port1, A1_Address, Dst_Port) and (Client_Address, Src_Port1, A2_Address, Dst_Port), respectively. These are two distinct TCP 4-Tuples and are acceptable within the client. However, things are more complex within the unicast server. In our prototype, the unicast server also maintains a Connection Binding map, but there the key is defined as (Client_Address, Src_Port, Unicast_Address, Dst_Port), which does not contain the anycast address. Thus, in the situation described above, there would be a duplicated key in the Connection Binding map within the unicast server, and the unicast server would not be able to tell which connection a packet belongs to.
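The collision can be demonstrated directly with the two map keys (the addresses are illustrative):

```python
# Client-side keys include the anycast address, so the two connections
# are distinguishable at the client.
client_key_1 = ("2001:db8::c", 51000, "2001:db8::a1", 80)  # via anycast A1
client_key_2 = ("2001:db8::c", 51000, "2001:db8::a2", 80)  # via anycast A2

def server_key(client_key, unicast_addr):
    """Server-side key in the prototype: the anycast address (the only
    distinguishing field) is replaced by the shared unicast address."""
    client, sport, _anycast, dport = client_key
    return (client, sport, unicast_addr, dport)
```

Mapping both client keys through `server_key` with the same unicast address yields one identical tuple, i.e., a duplicated key in the server's Connection Binding map.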

This problem does not affect our prototype, which assumes that all anycast servers share the same IP address. Because all anycast servers share the same anycast address, a client cannot use the same source port number for different connections to it.

One possible side effect is that a HoT message might be directed to an anycast server that is different from the one that handled its corresponding SYN packet. However, all the HoT messages contain a Dst Opt header which further includes a unicast address. The anycast server can simply forward a HoT message to the right unicast server according to the information in the Dst Opt.

If the CDN wants to employ multiple anycast servers with different IP addresses, there is a way to solve this problem. The problem is caused by the fact that the client and the service have different definitions of the same connection: while the service regards the unicast address as the destination IP address, the client considers the anycast address the destination. Thus, the solution is to give the client and the service identical knowledge about their connections. Specifically, if there is a connection between a client and a unicast server, we want the service in the unicast server and the client to identify this connection by the same TCP 4-Tuple. To fulfill this purpose, we need the web services in the unicast servers to be bound to their corresponding anycast addresses. A connection can then be defined as (Client_Address, Src_Port, Anycast_Address, Dst_Port) in both the client and the server. In addition, the unicast server is required to do address translations at the IP layer, as the client does, and the client needs to include a type 2 routing header carrying the anycast address in outgoing data packets, as the correspondent node does in MIPv6.

Specifically, when receiving a data packet from the client, the unicast server must replace the destination IP address with the anycast address inserted by the client. When transmitting a packet to the client, the unicast server must change the source IP address to the unicast address. The key of the Connection Binding map within the unicast server then becomes exactly the same as the one in the client, and the knowledge difference between the client and the unicast server no longer exists.

5.6 Corner Cases

There are several corner cases we need to consider in our prototype. For example, packets might be lost in transmission. When these exceptions happen, unusable state may remain in the structures. Our prototype takes advantage of TCP retransmission to deal with some of these exceptions, as retransmissions can negate the effect of a packet loss. To handle more persistent packet losses, our prototype runs a separate background task that cleans the structures within both the client and the unicast server. The cleaning task runs every four seconds. On the unicast server side, it checks whether the inactive entries in the Binding Update map need to be removed (i.e., have exceeded their lifetime). If so, the corresponding records in the Binding Update map, the SYN-RCV set and the SYN-ACK-SENT set are deleted. On the client side, the cleaning task is similar, except that the targeted structures are the Binding Cache, the SYN-SENT set and the SYN-ACK-RCV set. In addition, the cleaning tasks on both the client and the unicast server manage the Connection Binding map when connections are terminated. The anycast server does not maintain any connection state, so there is no structure management in the anycast server.
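The client-side cleaning task can be sketched as follows. This is a simplified model; the structure layouts and field names are our assumptions:

```python
import time

CLEAN_INTERVAL = 4  # seconds, as in the prototype

# Hypothetical in-memory structures. A Binding Cache entry records its
# status, lifetime and creation time; the SYN-SENT and SYN-ACK-RCV sets
# hold TCP 4-Tuples whose third element is the anycast address.
binding_cache = {}   # anycast_addr -> {"status", "lifetime", "created"}
syn_sent = set()     # {(client, sport, anycast, dport), ...}
syn_ack_rcv = set()

def sweep(now):
    """One pass of the cleaning task: drop expired inactive Binding Cache
    entries together with their related SYN-SENT / SYN-ACK-RCV records."""
    for anycast, entry in list(binding_cache.items()):
        if entry["status"] == "inactive" and now - entry["created"] > entry["lifetime"]:
            del binding_cache[anycast]
            for s in (syn_sent, syn_ack_rcv):
                for key in [k for k in s if k[2] == anycast]:
                    s.discard(key)

def cleaning_loop():
    """Background task: sweep every CLEAN_INTERVAL seconds."""
    while True:
        sweep(time.time())
        time.sleep(CLEAN_INTERVAL)
```

The unicast-server task has the same shape, operating on the Binding Update map, the SYN-RCV set and the SYN-ACK-SENT set instead.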

Our prototype cannot cover all the exceptions with the cleaning task and TCP retransmission alone. We list the remaining exceptions within the unicast server and discuss their solutions as follows:

a) The TCP layer within the unicast server fails to reply with a SYN-ACK. The corresponding entry in the SYN-RCV set should not be orphaned forever. Our prototype simply ignores this issue, because the client tends to retransmit the SYN packet if there is no corresponding SYN-ACK, and this SYN packet will reuse the original entry in the SYN-RCV set. However, if an attacker sends many SYN packets from forged non-existent source IP addresses, the SYN-RCV set might be overwhelmed by these false IP addresses. Our prototype does not deal with this issue. A practical solution is to put a creation time stamp in every record in the SYN-RCV set; the cleaning task could then remove any defunct record from the SYN-RCV set after a certain timeout.

b) Neither the HoT nor the CoT message arrives at the unicast server. Our prototype ignores this problem, as in a) above. In this situation, the IP layer within the client receives the SYN-ACK packet, but the SYN-ACK is never passed to the TCP layer. Without an acceptable binding, the client drops the SYN-ACK after a timeout (4 seconds, see Section 5.3.2). The TCP layer within the client will retransmit the SYN packet since it never received the SYN-ACK, and the original entry in the SYN-ACK-SENT set within the unicast server can then be reused. As in a) above, the SYN-ACK-SENT set can be flooded with non-existent IP addresses; again, a possible solution is to add a creation time stamp to every record in the SYN-ACK-SENT set, so that any defunct record is eventually removed.

c) Only one of the HoT and CoT messages arrives at the unicast server, or a BA message indicating that the binding was not accepted never arrives at the unicast server. Again, our prototype relies on TCP retransmission to address this issue. In this case, the SYN-ACK is not passed to the client TCP layer. The client retransmits the SYN packet and the unicast server replies with a SYN-ACK. This retransmitted SYN-ACK triggers the client to send the HoT and CoT messages again, and in response the unicast server sends the BU message again. In addition, when receiving one of the HoT and CoT messages, the unicast server creates an inactive entry in the Binding Update map; if the binding ultimately fails to be activated, this inactive entry is removed by the cleaning task.

d) The client accepts the binding and sends out the BA message, but the BA message never arrives at the unicast server. This exception cannot be handled by our current prototype. There are two possible solutions. The first is for the unicast server to retransmit the BU message after a timeout (up to four times), as is done in IPv6 Mobility. The second is for the unicast server, upon receiving the ACK from the client, to treat this ACK as an implicit BA and at that point activate the binding in the Binding Update map and the Connection Binding map.

On the client side, there are also several possible exceptions. We list these issues and their solutions as follows:

a) The client does not receive the SYN-ACK, which can be caused by a crashed unicast server or a SYN/SYN-ACK lost in transmission. In our prototype, the record in the SYN-SENT set is reused when the client retransmits the SYN packet. However, if the SYN-ACK never arrives at the client, the unusable record remains in the SYN-SENT set. To overcome this problem, we should record a creation time stamp for every item in the SYN-SENT set, which is not implemented in our prototype. With the creation time stamp, the cleaning task could remove defunct records from the SYN-SENT set after a certain timeout.

b) After receiving the SYN-ACK, the client faces several possible issues. First, the HoT, CoT or BU messages may be lost in delivery. Second, the unicast server may crash after sending out the SYN-ACK. Third, the Binding Update message may be rejected by the client. The client will not pass the SYN-ACK to the TCP layer until it receives an acceptable Binding Update message. Thus, our prototype takes advantage of TCP retransmission to solve these three problems. When one of these exceptions happens before the binding is accepted by the client, the client retransmits the SYN packet. The retransmitted SYN-ACK packet then triggers the client to send the HoT and CoT messages again, and the unicast server can retransmit the BU message. Also, once receiving the SYN-ACK, the client creates an inactive binding in the Binding Cache, which helps the cleaning task clean up defunct entries: as mentioned above, the cleaning task checks whether the inactive entries in the Binding Cache need to be removed, and if so, the corresponding entries in the Binding Cache, the SYN-SENT set and the SYN-ACK-RCV set are deleted. Thus, any defunct entry caused by the issues above can be removed.

The TCP connection may also not terminate properly. On both the client and the unicast server side, the cleaning task also handles TCP connection termination. The cleaning task checks every entry in the Connection Binding map to determine whether it needs to be removed. For each active entry, if the connection has been terminated by the TCP four-way handshake (FIN_count >= 2 and ACK_count >= 2) or by an RST packet (RST_count >= 1), and its waiting time exceeds 120 seconds, the cleaning task deletes the entry. Otherwise, the cleaning task checks whether the idle time of the connection exceeds 60 minutes (i.e., current_time - Idle_time > 60 min, where Idle_time records the time the last packet was seen on the connection). If so, the connection is considered terminated and the corresponding entry in the Connection Binding map is removed. For each inactive entry, the cleaning task only checks whether its idle time is longer than 60 minutes. In this way, the cleaning task can remove unusable records in the Connection Binding map caused by exceptions during connection establishment and termination.
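The removal rule above can be sketched as a single predicate. Field names are our assumptions, and Idle_time here stands in for both the idle time and the post-termination waiting time, since both are measured from the last packet seen:

```python
IDLE_LIMIT = 60 * 60   # 60 minutes
LINGER = 120           # seconds a terminated connection lingers

def should_remove(entry, now):
    """Decide whether a Connection Binding map entry can be deleted,
    following the cleaning-task rules described above (a sketch)."""
    if entry["status"] == "active":
        terminated = ((entry["fin_count"] >= 2 and entry["ack_count"] >= 2)
                      or entry["rst_count"] >= 1)
        if terminated and now - entry["idle_time"] > LINGER:
            return True
        # active but silent: treat as terminated after 60 idle minutes
        return now - entry["idle_time"] > IDLE_LIMIT
    # inactive entries: only the 60-minute idle check applies
    return now - entry["idle_time"] > IDLE_LIMIT
```

The cleaning task applies this predicate to every entry on each 4-second pass.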

6. Evaluation

In this section, we evaluate the performance of the RIA request routing mechanism and compare it with two other request routing approaches, DNS-based request routing and HTTP redirect.

6.1 Environment

A simple testbed, shown in Figure 13, was deployed for the experiments. It consists of three nodes and a network switch. The network switch is a multiport network bridge that uses hardware addresses to process and forward data at the data link layer (layer 2) of the OSI model. The bandwidth of all links within the testbed is 1000 Mbps. In this testbed, the switch connects the client node to a CDN consisting of one anycast server and one unicast server. We assume that the two servers share and announce the same anycast address, which is the IPv6 address of the anycast server. The IPv6 address of the unicast server is unknown to the client. Therefore, the anycast server acts as a request router that receives all the requests from the client, and the unicast server acts as the edge server that delivers the web content. Also, we assume the link between the anycast server and the unicast server is protected, as in MIPv6.

Figure 13. Testbed

The two servers are identical machines: Dell PowerEdge 1950 with four 2.33 GHz CPUs and 4 GiB RAM. The client machine is a Samsung NP940Z5L with eight 2.60 GHz CPUs and 8 GiB RAM. All the machines run Linux with kernel version 4.15, which is IPv6-enabled. We also used the Linux Traffic Control (tc) utility to emulate wide-area latency and limit bandwidth; tc is a standard Linux tool for traffic shaping [18].

To measure the overhead, we developed a web service within the anycast server and the unicast server, as well as a client application. The client application opens a connection to the IP address of the anycast server and requests the web service. While the user downloads the web content directly from the anycast server in DNS-based CDNs, it obtains the web service from the unicast server in CDNs that use RIA or HTTP redirect. In all cases, a file of a certain size is downloaded from the server.

6.2 Overhead of RIA under Ideal Conditions

In our anycast approach, the client sends a SYN packet to the anycast address to initiate a connection. The anycast server receives this SYN packet and tunnels it to the unicast server. Then a binding is created and the connection is established between the client and the unicast server. Eventually, the client communicates with the unicast server directly through the binding.

In order to measure the baseline overhead generated by our implementation, we conducted an experiment under ideal conditions: no wide-area latency, minimal data size and no bandwidth limit. The RIA overhead can be divided into two parts: connection establishment and binding lookup during data packet exchange.

To measure the latency of connection establishment, the client simply records time stamps before and after the call to socket.connect(), the function the client uses to establish a connection. After this function returns, the client is able to send data packets to the server.
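This measurement can be sketched as follows. The snippet is a self-contained stand-in that times socket.connect() against a local IPv4 listener rather than the actual IPv6 testbed:

```python
import socket
import threading
import time

def measure_connect(host, port):
    """Record time stamps around socket.connect(), as the client
    application does, and return the delay in milliseconds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    t0 = time.monotonic()
    s.connect((host, port))
    delay_ms = (time.monotonic() - t0) * 1000
    s.close()
    return delay_ms

# Local listener standing in for the server side of the testbed.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(5)
port = srv.getsockname()[1]
threading.Thread(target=lambda: srv.accept(), daemon=True).start()

delay = measure_connect("127.0.0.1", port)
```

In the experiments this measurement is repeated 100 times and the reported delays are averaged.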

To check the latency caused by the binding lookups and the address replacements, we measure the round-trip time for sending a small data packet and receiving the ACK for this packet. However, we cannot measure this RTT through the socket API because the ACK packet is invisible to it. Thus, we record time stamps within the Netfilter hooks to evaluate the RTT. Although the time for passing the packets to Netfilter is not captured, this inaccuracy is negligible compared to the overall latency.

We configure the testbed with no extra latency and no bandwidth limit. The client starts the connection and sends a request to the server. Upon receiving the request, the web service replies with an ACK packet. The sizes of the request and ACK packets are very small, so their transmission time is negligible given the bandwidth. We run the client application 100 times and average the reported delays. The results are shown in Table 1.

               Connection Establishment Delay                RTT
             AVG      Range        Standard      AVG      Range       Standard
             (ms)     (Min-Max)    Deviation     (ms)     (Min-Max)   Deviation
Normal TCP    0.43    0.35-0.60      0.03         0.45    0.40-0.54     0.03
RIA          18.15   11.06-28.02     4.61         3.32    2.44-3.90     0.21

Table 1. The overhead of the prototype

The average delay in our implementation to establish a connection is around 18 ms. This is relatively long because the establishment involves test messages, verification and creation of a binding. However, if a user is directed to the same unicast server again while the binding has not expired, the establishment cost is amortized through binding reuse. The average overhead our approach adds to a round trip is about 3 ms, which includes four Netfilter traversals that look up the bindings and rewrite the packets. Moreover, our prototype implements the Netfilter processing in Python, a relatively inefficient language; a real implementation in a faster language such as C should perform better.

6.3 Comparison with Other Request Routing Methods

In this section, we describe experiments comparing our mechanism with two other request routing methods, DNS-based request routing and HTTP redirect, under various conditions. We vary three network characteristics to emulate diverse conditions: latency, bandwidth and file size.

The network parameters of our testbed are listed in Figure 14. RTTca, RTTcu and RTTau are the RTTs between each pair of machines. In our experiments, the unicast server should be closer to the client than the anycast server is, i.e., RTTca >= RTTcu. The reason is that the unicast server is selected based on the location of the actual user, while the anycast server is chosen either by anycast routing (in RIA) or by the DNS according to the user's LDNS (in DNS-based request routing). Since anycast routing cannot account for network conditions, we conservatively test high latencies between the client and the anycast server in RIA. We also assume that the unicast server is very close to the client; however, the RTT between the client and its gateway still needs to be considered. Normally, this initial latency is around ten to twenty milliseconds. To emulate this delay, we configure RTTcu at several levels: 10 ms, 20 ms and 30 ms. The RTT between the unicast server and its anycast server also has a significant influence on our approach; however, within the same CDN we can assume this link to be highly optimized, and we set RTTau to 10 ms.

Figure 14. Network topology of our testbed

In a DNS-based CDN, the client request is redirected to an edge server by the DNS response. With the IP address of the assigned edge server, the client requests the web content directly from this edge server, which is the anycast server in our testbed; no unicast server is involved. However, the anycast server is selected based on the location of the LDNS, so the choice may not be the 'best' for the actual user. The basic packet flow for testing DNS-based request routing is shown in Figure 15.

A server-side HTTP redirection is a method of redirecting users to another URL using an HTTP status code [19]. This method is also used in CDNs for request routing. Typically, the user is directed to one of the CDN servers and establishes a connection to it. This server is configured to reply with a 3XX redirection HTTP status code and a new URL that points directly to the edge server the CDN wants the user to access for the content download. Upon receiving the redirection response, the user requests the service from the new URL. Thus, HTTP redirection requires the user to establish two TCP connections to complete the redirection.

To emulate HTTP redirect, we set up an Apache web server on the anycast server and configure it to redirect to the unicast server. Normally, the redirection destination would be a domain name, which would involve a DNS lookup; however, we conservatively set the destination to the IP address of the unicast server. The basic packet flow for testing HTTP redirect is shown in Figure 16.
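A minimal Python stand-in for this Apache configuration is sketched below: the "anycast" server answers every request with a 301 whose Location points at a placeholder unicast-server address (192.0.2.10 is a documentation address, not our testbed's):

```python
import http.client
import http.server
import threading

UNICAST_URL = "http://192.0.2.10/"  # placeholder for the unicast server

class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with a 3XX status and the URL of the edge server.
        self.send_response(301)
        self.send_header("Location", UNICAST_URL + self.path.lstrip("/"))
        self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]

# The client would open a second TCP connection to the Location target;
# here we just inspect the redirect itself (http.client does not follow it).
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
conn.request("GET", "/index.html")
resp = conn.getresponse()
location = resp.getheader("Location")
conn.close()
srv.shutdown()
```

This mirrors the two-connection cost of HTTP redirect: one connection to receive the 3XX, and a second one to the edge server named in Location.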

Figure 15. Operations for testing DNS-based request routing

Figure 16. Operations for testing HTTP redirect

6.4 Experimental Results

There are four parameters we need to configure to emulate different situations: RTTca, RTTcu, bandwidth and file size. For each configuration, we conduct 50 measurements and average the results to obtain the total time for downloading a given file. The experimental results for our RIA mechanism are shown in Table 2; Table 3 presents the results for DNS-based request routing, and Table 4 those for HTTP redirect. To show the differences among the three request routing methods, we plot several charts in Figure 17.

RTTca  File size        RTTcu = 10 ms           RTTcu = 20 ms           RTTcu = 30 ms
(ms)   (KB)           Bandwidth (Mbps)        Bandwidth (Mbps)        Bandwidth (Mbps)
                      10     20     100       10     20     100       10     20     100
 30    10             89     88      86      111    107    108      130    125    128
 30    100           175    154     151      215    216    210      261    261    270
 30    1000          946    627     617      986    691    700     1023    873    892
 30    10000        8709   5640    5599     8745   5581   5657     8791   5881   5879
 50    10            108    105     105      129    124    125      148    147    148
 50    100           193    170     172      235    234    227      283    290    290
 50    1000          964    643     634     1007    705    747     1055    895    901
 50    10000        8727   5637    5633     8766   5712   5798     8810   5817   5859
 80    10            139    135     134      161    155    158      180    176    176
 80    100           221    201     201      258    260    262      310    321    319
 80    1000          996    691     681     1032    748    783     1081    896    877
 80    10000        8756   5744    5596     8793   5712   5713     8837   5907   5764

Table 2. Time for downloading a file of a given size using RIA (ms)

RTTca  File size   Bandwidth (Mbps)
(ms)   (KB)        10    20    100
30     10          68    65    62
       100         170   153   153
       1000        933   534   339
       10000       8565  4350  1046
50     10          108   104   102
       100         253   254   253
       1000        1014  638   414
       10000       8647  4455  1212
80     10          168   165   162
       100         404   403   403
       1000        1165  823   655
       10000       8799  4634  1728

Table 3. Time for downloading a file of a given size using DNS-based request routing (ms)

RTTca  File size   RTTcu = 10ms       RTTcu = 20ms       RTTcu = 30ms
(ms)   (KB)        Bandwidth (Mbps)   Bandwidth (Mbps)   Bandwidth (Mbps)
                   10    20    100    10    20    100    10    20    100
30     10          91    88    86     111   108   105    131   128   125
       100         168   130   111    197   171   151    233   221   190
       1000        918   506   193    947   545   273    983   591   372
       10000       8462  4283  971    8490  4323  1034   8524  4370  1103
50     10          131   127   126    151   148   146    171   168   166
       100         208   170   151    237   211   191    273   262   231
       1000        958   546   233    987   585   313    1024  631   412
       10000       8503  4324  1011   8529  4363  1072   8565  4410  1143
80     10          191   188   186    211   208   206    231   228   226
       100         268   230   211    297   271   251    333   322   291
       1000        1018  606   293    1047  645   373    1083  691   472
       10000       8562  4384  1071   8589  4424  1133   8626  4469  1208

Table 4. Time for downloading a file of a given size using HTTP redirect (ms)

[Figure 17, panels (a)-(l), omitted]

Figure 17. Comparison among RIA, DNS-based request routing and HTTP redirect under different conditions. (a)-(d): Bandwidth = 10Mbps. (e)-(h): Bandwidth = 20Mbps. (i)-(l): Bandwidth = 100Mbps

Figure 17 shows that the parameters influence the three request routing mechanisms differently. We discuss the impact of each parameter below.

Bandwidth

Bandwidth is the maximum rate of data transfer across a given link, so more bandwidth speeds up data transfer, especially for large files. According to Figure 17 (h) and (l), the time to download a 10MB file drops significantly for both DNS-based request routing and HTTP redirect when the bandwidth is increased from 20Mbps to 100Mbps. However, the improvement for our RIA method is much less noticeable. This is caused by the delay within the Netfilters. All packets are queued and pass through a Netfilter one by one; the next packet cannot enter the Netfilter until the previous one leaves. As presented in Section 6.2, the overhead the Netfilters add to a single packet is around 3ms. When there are many packets, each packet waits in the queue for the previous one to leave the Netfilter, so its RTT grows. Outgoing packets from the unicast server go through one Netfilter. Since four Netfilters add around 3ms of overhead, going through one Netfilter takes roughly 0.75ms (3ms/4). Thus, when transferring a large file, the unicast server can only send a packet every 0.75ms, which limits the effective throughput to around 16Mbps for 1500-byte packets (determined by the current MTU, maximum transmission unit): 1500 bytes * 8 bits / 0.75ms = 16Mbps. So the 100Mbps bandwidth cannot be fully utilized in RIA. Our unoptimized prototype therefore cannot take advantage of high bandwidth. However, for small file transfers the impact of bandwidth is not crucial, so our prototype still performs well on small file downloads.
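The throughput cap above follows directly from the per-packet Netfilter delay; the snippet below reproduces the arithmetic (the 1500-byte packet size and the 0.75ms per-packet delay are the figures stated in the text).

```python
# Back-of-the-envelope check of the Netfilter-imposed throughput cap:
# one outgoing packet per 0.75 ms, 1500-byte packets (the MTU).
packet_bytes = 1500
per_packet_delay_ms = 0.75  # ~3 ms across four Netfilters, divided by 4

bits_per_packet = packet_bytes * 8
# bits / ms * 1000 = bits per second; / 1e6 = megabits per second
throughput_mbps = bits_per_packet / per_packet_delay_ms * 1000 / 1e6
print(throughput_mbps)  # 16.0
```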

File Size

Normally, if the downloaded file is small, the time to establish a connection has a significant impact on the overall performance. Otherwise, data transfer is much more time-consuming than connection establishment, which is, after all, a one-time cost. Downloading large files is unfavorable to our RIA mechanism. The primary reason is that, in RIA, packets are queued and pass through the Netfilter one by one, as discussed under Bandwidth above. The number of packets grows with the file size, leading to longer queues. In addition, once dequeued, each data packet must go through two Netfilters, for the binding lookup and the packet crafting, which adds further Netfilter overhead. Besides, the unicast server adds a 24-byte extra header (the Destination Options header) to each outgoing data packet. The maximum segment size (MSS) in the other two request routing methods is 1440 bytes; in RIA, the MSS becomes 1416 bytes. Thus, this extra header adds about 1.7% (24/1440) overhead per packet. The queuing for Netfilter processing is the most significant factor delaying large file transfers in our RIA request routing; the other two factors also degrade RIA's performance. For small files, however, the overhead caused by the Netfilters and the extra header does not dominate the overall performance.

Difference between RTTca and RTTcu

The difference between RTTca and RTTcu is critical to the performance of our RIA request routing approach. Compared with the other two methods, RIA is more sensitive to this difference. For instance, in Figure 17, when the difference is large (RTTcu = 10ms and RTTca = 80ms), RIA always has the best performance for small data delivery (10KB or 100KB) among the three request routing mechanisms. In DNS-based request routing, the user requests the web service directly from the anycast server, so high latency between the anycast server and the client clearly weakens DNS-based request routing. In both HTTP redirect and RIA, the user receives the web content from the unicast server after the connection is established, so these two methods tend to be similarly affected by the difference between RTTca and RTTcu.

6.4.1 Comparison with DNS-based Request Routing

Compared with DNS-based request routing, RIA downloads the web content from the unicast server instead of the anycast server. If the difference between RTTca and RTTcu is large, RIA tends to have a performance advantage over the DNS-based mechanism. For example, in Figure 17 (b), RTTca is set to 50ms. When the latency between the unicast server and the client is 30ms, RIA shows a longer download time than the DNS-based approach. After changing RTTcu to 20ms, RIA achieves a lower download time than the DNS-based method.

Normally, this advantage should grow with the file size: a larger file means that data transfer influences performance more than connection establishment does, and in RIA more packets are transferred over the short path, RTTcu. The transition from Figure 17 (a) to (b) supports this statement. However, as the file size becomes larger, the overhead within the Netfilters hurts RIA's performance. Consequently, the experimental results indicate that our implementation of RIA has the worst performance among the three methods when tested with a 10MB file.

In short, compared with DNS-based request routing, our prototype of RIA is better suited to medium-size data delivery with a large difference between RTTca and RTTcu.

6.4.2 Comparison with HTTP Redirect

HTTP redirect has some similarities to our approach: both let the user communicate with the unicast server directly after connection establishment. However, HTTP redirect should perform better at downloading the web content itself, because it does not need to check bindings and rewrite packets. Thus, HTTP redirect always outperforms RIA during the data transfer phase.

For the TCP connection establishment, HTTP redirect requires the user to receive a message containing the redirection URL from the anycast server before it can start a new connection with the unicast server. The total time to establish a connection with HTTP redirect is therefore 2*RTTca+RTTcu. In RIA, the latency for connection establishment is RTTca+RTTau+RTTcu, derived as follows: the SYN packet from the client to the unicast server takes RTTca/2 + RTTau/2, and the SYN-ACK takes RTTcu/2; then the HoT message from the client to the unicast server takes RTTca/2 + RTTau/2, while the time to transfer the CoT message is normally covered by the time to transfer the HoT; finally, delivering the Binding Update message takes RTTcu/2. Thus, the total delay is RTTca/2+RTTau/2+RTTcu/2+RTTca/2+RTTau/2+RTTcu/2 = RTTca+RTTau+RTTcu. The difference in connection establishment time between the two methods is RTTca-RTTau. In our experiments, we assumed that the link between the anycast server and the unicast server is highly optimized, since both are controlled by the same CDN, and fixed RTTau at 10ms. Thus, if the latency between the anycast server and the client is high and the amount of data is not large, RIA can be more efficient than HTTP redirect. For instance, in Figure 17 (f), we configured RTTca to 80ms, RTTcu to 10ms, and the bandwidth to 20Mbps. If a user requests a 100KB file under these conditions, RIA request routing has the advantage over HTTP redirect.
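The connection-establishment formulas above can be checked numerically. The sketch below plugs in the RTT settings from Figure 17 (f) (RTTca = 80ms, RTTau = 10ms, RTTcu = 10ms) and confirms that the gap between the two methods equals RTTca - RTTau.

```python
# Worked comparison of connection-establishment latency (ms),
# using the formulas derived in the text.

def http_redirect_setup(rtt_ca, rtt_cu):
    # connect to anycast server and receive the 3XX (2*RTTca),
    # then connect to the unicast server (RTTcu)
    return 2 * rtt_ca + rtt_cu

def ria_setup(rtt_ca, rtt_au, rtt_cu):
    # SYN (RTTca/2 + RTTau/2) + SYN-ACK (RTTcu/2)
    # + HoT (RTTca/2 + RTTau/2; CoT overlaps with HoT)
    # + Binding Update (RTTcu/2)
    return rtt_ca + rtt_au + rtt_cu

rtt_ca, rtt_au, rtt_cu = 80, 10, 10  # ms, the Figure 17 (f) setting
redirect_ms = http_redirect_setup(rtt_ca, rtt_cu)
ria_ms = ria_setup(rtt_ca, rtt_au, rtt_cu)
print(redirect_ms, ria_ms, redirect_ms - ria_ms)  # 170 100 70
# The gap (70 ms) equals RTTca - RTTau, as derived in the text.
```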

7. Summary

This thesis proposes Routing-Independent Anycast (RIA), an IPv6 anycast mechanism for request routing in CDNs. RIA is designed to overcome limitations of DNS-based request routing and the traditional anycast method. In contrast to DNS-based request routing, our approach selects the edge server based on the location of the actual user, so the choice tends to be the ‘best’ for the user. Moreover, redirecting traffic to a selected edge server no longer sends all users behind the same LDNS to that edge server, which could result in server overloading. In contrast to traditional anycast, which is unaware of the condition of the network and the servers, RIA can choose the edge server according to network conditions.

RIA is designed on top of IPv6 Mobility. MIPv6 allows a mobile node to remain reachable while moving around the IPv6 network [8]. The mobile node keeps its home agent informed of its current address, and other nodes can reach the mobile node through a binding between its home address and its current address. In RIA, the edge server, also called the unicast server, is analogous to the mobile node, while the request router (anycast server) in the CDN acts as the home agent in MIPv6. In RIA, all the anycast servers share and advertise the same anycast IP address. Users open their TCP connections to the anycast server, which forwards the connections to selected unicast servers. Our approach also uses the MIPv6 protocol to create and verify the binding between the anycast address and the unicast address. Thus, after the connection establishment, clients communicate directly with their unicast servers through these bindings.

We implemented a prototype of RIA and evaluated its performance by comparing it with two other widely used request routing methods, DNS-based request routing and HTTP redirect. We conducted several experiments on all three request routing approaches to show their performance under various network conditions. The results indicate that our approach, as currently implemented, is not suitable for large file downloads and cannot fully exploit high bandwidth due to the delay within the Netfilters. However, our approach can be optimized in several ways, such as using C instead of Python as the programming language and implementing the approach within the kernel. In fact, even our unoptimized prototype outperforms the existing request routing approaches in some scenarios. Compared with DNS-based request routing, our RIA implementation is more efficient when the client is much nearer to the unicast server than to the anycast server and requests a file of medium size. HTTP redirect shares some similarity with our approach and performs better in the data transfer phase, but if the distance between the client and the anycast server is relatively long and the file size is not large, RIA, and our prototype implementation of it, gains an advantage over HTTP redirect. In summary, RIA request routing outperforms HTTP redirection for small data delivery, and it outperforms DNS-based request routing when the client's LDNS provides a poor approximation of the client location, so that the client is mapped to a distant edge server while a closer edge server is available. While our prototype implementation only wins for medium-size files in the latter case, we have identified the sources of inefficiency in the prototype; a production-strength implementation should significantly widen the range of conditions under which RIA holds an advantage over existing request routing approaches.

Bibliography

[1] Content delivery network. (2019). Retrieved from https://en.wikipedia.org/wiki/Content_delivery_network.

[2] Content Delivery Network Usage Statistics. (2019). Retrieved from https://trends.builtwith.com/CDN/Content-Delivery-Network.

[3] Niven-Jenkins, B., & van Brandenburg, R. (2016). Request Routing Redirection Interface for Content Delivery Network (CDN) Interconnection (No. RFC 7975).

[4] Barbir, A., Cain, B., Nair, R., & Spatscheck, O. (2003). Known content network (CN) request-routing mechanisms (No. RFC 3568).

[5] How Anycast Works - An Introduction to Networking - KeyCDN Support. (2018, October 4). Retrieved from www.keycdn.com/support/anycast.

[6] Schluting, C. (2014, May 28). Networking 101: Understanding BGP Routing. Retrieved from www.enterprisenetworkingplanet.com/netsp/article.php/3615896/Networking-101-Understanding-BGP-Routing.html.

[7] Flavel, A., Mani, P., Maltz, D. A., Holt, N., Liu, J., Chen, Y., & Surmachev, O. (2015, May). FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation (pp. 381-394).

[8] Johnson, D., Perkins, C., & Arkko, J. (2004). Mobility support in IPv6 (No. RFC 3775).

[9] Prince, M. (2011, October 21). A Brief Primer on Anycast [Web log post]. Retrieved from https://blog.cloudflare.com/a-brief-anycast-primer.

[10] Acharya, A., & Shaikh, A. (2002, August). Using Mobility Support for Request-Routing in IPv6 CDNs. In 7th Web Caching Workshop.

[11] Szymaniak, M., Pierre, G., Simons-Nikolova, M., & van Steen, M. (2007). Enabling service adaptability with versatile anycast. Concurrency and Computation: Practice and Experience, 19(13), 1837-1863.

[12] Bernat, V. (2014, February 24). Coping with the TCP TIME-WAIT state on busy Linux servers [Web log post]. Retrieved from https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux.

[13] Kumud, D., Hu, J., & Madureira, J. (2017, September). Configure Load Balancer TCP idle timeout in Azure. Retrieved from https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-idle-timeout.

[14] Brown, I., & Dooley, K. (2006). Adjusting NAT Timers. Cisco IOS Cookbook, 2nd Edition. Retrieved from www.oreilly.com/library/view/cisco-ios-cookbook/0596527225/ch21s11.html.

[15] Secure CDN. Retrieved from www.akamai.com/us/en/what-we-do/intelligent-platform/secure-cdn.jsp.

[16] The netfilter.org project. Retrieved from www.netfilter.org.

[17] Comparison of IPv6 support in operating systems. (2019). Retrieved from https://en.wikipedia.org/wiki/Comparison_of_IPv6_support_in_operating_systems.

[18] Kuznetsov, A. N. Tc(8) - Linux man page. Retrieved from https://linux.die.net/man/8/tc.

[19] Server-side redirect. (2019). Retrieved from https://en.wikipedia.org/wiki/Server-side_redirect.

[20] Alzoubi, H. A., Lee, S., Rabinovich, M., Spatscheck, O., & Van Der Merwe, J. (2011). A practical architecture for an anycast CDN. ACM Transactions on the Web (TWEB), 5(4), 17.

[21] TCP Connection Passing. (2005, April 19). Retrieved from http://tcpcp.sourceforge.net.

[22] Hussein, A. A. (2015). Request Routing in Content Delivery Networks (Doctoral dissertation, Case Western Reserve University).

[23] Contavalli, C., Van Der Gaast, W., Lawrence, D., & Kumari, W. (2016). Client Subnet in DNS Queries (No. RFC 7871).

[24] Kintis, P., Nadji, Y., Dagon, D., Farrell, M., & Antonakakis, M. (2016, July). Understanding the privacy implications of ECS. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (pp. 343-353).

[25] Hunt, G. D., Goldszmidt, G. S., King, R. P., & Mukherjee, R. (1998). Network Dispatcher: A connection router for scalable Internet services. Computer Networks and ISDN Systems, 30(1-7), 347-357.

[26] Bestavros, A., Crovella, M., Liu, J., & Martin, D. (1998, October). Distributed packet rewriting and its application to scalable server architectures. In Proceedings of the Sixth International Conference on Network Protocols (pp. 290-297).