Can ISPs Take the Heat from Overlay Networks?

Ram Keralapura (1, 2), Nina Taft (2), Chen-Nee Chuah (1), Gianluca Iannaccone (2)
(1) Univ. of California at Davis, (2) Intel Research

ABSTRACT

ISPs manage performance of their networks in the presence of failures or congestion by employing common traffic engineering techniques such as link weight settings, load balancing and routing policies. Overlay networks attempt to take control over routing in the hope that they might achieve better performance for such failures or high load episodes. In this paper, we examine some of the interaction dynamics between the two layers of control from an ISP’s view. With the help of simple examples, we illustrate how an uncoordinated effort of the two layers to recover from failures may cause performance degradation for both overlay and non-overlay traffic. We also show how current traffic engineering techniques are inadequate to deal with emerging overlay network services.

1. INTRODUCTION

Overlay networks have emerged as a promising platform to provide customizable and reliable services at the application layer to support multicast (e.g., SplitStream [4]), content delivery (e.g., Akamai [1]), resilient connectivity (e.g., RON [3]), and distributed hash table services [12, 14], among others.

Overlay networks typically consist of pre-selected nodes, located in one or multiple network domains, connected to one another through application-layer routing. One of the underlying paradigms of overlay networks is to give applications more control over routing decisions that would otherwise be carried out solely at the IP layer. The advent of a wide variety of active measurement techniques has made this possible. An overlay network typically monitors multiple paths between pairs of nodes and selects one based on its own requirements of end-to-end delay, loss rate, or throughput.

Allowing routing control at both the application and the IP layers could have profound implications on how Internet Service Providers (ISPs) design, maintain, and run their networks. The network architecture, along with the set of algorithms and traffic engineering (TE) policies that ISPs use, are based on certain assumptions about how their customers and traffic behave. Overlay networks could call into question some of these assumptions, thus rendering it more difficult for ISPs to achieve their goals. For example, it is very important for an ISP to perform load balancing across all of its links to ensure good performance and also to limit the impact of failures. We hypothesize that the co-existence of multiple overlays could severely hamper an ISP’s ability to achieve this goal.

Another TE issue stems from an ISP’s need to estimate its traffic matrix (TM). ISPs need to understand their TMs because they are a critical input to many TE tasks such as capacity planning, reliability analysis, and link weight assignment. We believe that overlay networks could make an ISP’s job of estimating its TM more difficult, and the errors in TM estimation will propagate to errors in the above TE tasks.

In addition, routing decisions at two independent layers could lead to short-term or long-term traffic oscillations. If overlay networks react to events in the IP network (e.g., failures or congestion) independently of an ISP, race conditions could occur and lead to traffic oscillations. Such traffic oscillations not only affect the overlay traffic but also impact the background (non-overlay) traffic in the network. We also show that such reactions by overlay networks that span multiple domains could threaten the effectiveness of BGP in isolating different domains in the Internet.

These critical issues raise an important question: Can overlay networks and underlying IP networks form a synergistic co-existence? Given the increasing popularity of overlay networks, it is critical to address issues that arise from the interaction between the two layers. We hypothesize that it could be problematic to have routing control in two layers, when each layer is unaware of things happening in the other layer. ISPs may be aware of neither which nodes are participating in an overlay nor their routing strategy. Overlay networks are not aware of the underlying ISP’s topology, load balancing schemes, or timer values for failure reaction.

Qiu et al. [11] describe the interactions between overlay networks and ISPs after the routing control mechanisms reach the Nash equilibrium point. In this paper, we explore some of the issues that arise due to dynamic interactions in the presence of unexpected or unplanned events such as network failures. We believe that the dynamic network behavior in the presence of failures (that are common, everyday events for ISPs [7]) is very important for ISPs and needs to be addressed.

Using simple illustrations, we identify numerous issues that result in potentially harmful interactions and make the case for future research in this direction. Although it is unclear today what portion of the Internet traffic will be overlay traffic many years from now, the potential is large, and thus we believe that it is important to study the impact of such a trend before it happens. Our focus is on understanding how the network management techniques currently deployed by ISPs are impacted by overlay networks.

2. SIMULATING ROUTING DYNAMICS

To quantify the interactions between the overlay layer and the underlying IP layer, we built a Java-based control plane simulator to analyze (a) the conflicts in decisions made by two different layers and (b) the impact of such decisions on the data traffic.

2.1 Overlay Network Dynamics

While different overlay networks designed for a wide range of applications may differ in their implementation details (e.g., choice of topologies or performance goals), most of them provide the following common set of functionalities: path/performance monitoring, failure detection and restoration. In our simulation model, we attempt to capture the most generic properties of an overlay network:

- The routing strategy in most overlay networks is to select the path between a source and a destination with the best performance based on end-to-end delay, throughput, and/or packet loss. We assume that the overlay network will select the path with the shortest end-to-end delay.

- Most overlay networks monitor the paths that they are using by sending frequent probes to make sure the path adheres to acceptable performance bounds. In our simulation we consider the probe interval to be x time units (we use generic time units because we are more concerned about the ratio and relative values of timers rather than network-specific values).

- If an overlay’s probe detects a performance problem on a path (due to failures/congestion/etc. in the IP network), then the overlay network sends probes at a higher rate to confirm the problem before selecting an alternate path. In our model, if a probe does not receive a response within a given timeout value, then the overlay network queries the path at a higher rate (every y time units) to ensure that the path is bad. If a path remains bad after n such high-frequency probes, the overlay network will then find an alternate path (in this case, the next best path) between the source and destination nodes. As soon as an alternate path is found, [...]

In any IP network, the link utilization level determines the delay, throughput, and losses experienced by traffic flows traversing the link. To simulate realistic link delays, we use a monotonically increasing piecewise linear convex function similar to [6] and [11]. Note that in all our test scenarios, we assume that a load of 50 units on a link is the threshold value beyond which it exhibits very high delay values.

Overlay nodes typically encrypt the information about the true final destination in the data packets via encapsulation that specifies intermediate nodes as the current destination. In our simulator, overlay traffic is generated by overlay source nodes, but it is important to realize that at layer-3 (the carrier’s point of view), overlay and non-overlay traffic are indistinguishable.

3. TRAFFIC ENGINEERING CHALLENGES

ISPs apply TE mainly in reaction to changes in the topology (due to link failures [6, 10]) or traffic demands (due to flash crowd events or BGP failures). In these scenarios, a common way for ISPs to manage the traffic is by changing the IGP link weights. ISPs make two assumptions while using this technique: (i) traffic demands do not vary significantly over short timescales, and (ii) changes in the path within a domain do not impact traffic demands. Overlay network routing defeats these assumptions as illustrated below.

3.1 Traffic Matrix Estimation

ISPs employ techniques such as [13] to compute their TM, i.e., a matrix that specifies the traffic demand from origin nodes to destination nodes in a network. As final destinations are altered by overlay nodes via packet encapsulation as mentioned above, the IP layer is unaware of the ultimate final destination within its domain. Traffic between two overlay nodes could traverse multiple overlay hops. At each hop, traffic exits the IP network and reaches the overlay node, which deciphers the next overlay hop information and inserts the traffic back into the IP network. Consider the network in Figure 1(a), which has multiple overlay nodes (at A, B, D and G) in an ISP domain. Suppose the traffic between nodes A and D is 10 units. If IP routing is used then the TM entry for the source-destination pair AD is 10.

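The link-delay model described earlier (monotonically increasing, piecewise linear, convex, with very high delay beyond a load of 50 units) can be sketched as follows. Only the threshold of 50 units comes from the text; the other breakpoint, the slopes, and the name `link_delay` are assumptions chosen for illustration.

```python
def link_delay(load, segments=((0, 1.0), (25, 2.0), (50, 50.0))):
    """Piecewise-linear convex delay as a function of link load.

    `segments` holds (start_load, slope) pairs with increasing start loads.
    Positive slopes make the function monotonically increasing, and
    increasing slopes make it convex. The steep slope after 50 units models
    the "very high delay" regime; all slope values are assumed.
    """
    delay = 0.0
    for i, (start, slope) in enumerate(segments):
        if load <= start:
            break
        # End of this segment: the next breakpoint, or unbounded for the last.
        end = segments[i + 1][0] if i + 1 < len(segments) else float("inf")
        delay += slope * (min(load, end) - start)
    return delay


for load in (10, 25, 50, 60):
    print(load, link_delay(load))
```

With these assumed slopes, pushing a link from 50 to 60 units adds far more delay than the first 50 units combined, so even a modest overlay reroute onto a nearly full link can change which path looks best.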
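The probe-and-confirm behaviour described in Section 2.1 (normal probes every x time units, confirmation probes every y time units, reroute after n failed confirmation probes) can be sketched as a small Python function. This is a simplified model and not the authors' simulator: the name `time_to_switch`, the `path_is_up` callback, and the choice to treat a probe timeout as instantaneous are all assumptions.

```python
def time_to_switch(path_is_up, x, y, n, horizon):
    """Return the time at which the overlay abandons the path, or None.

    path_is_up(t) -> bool is a hypothetical callback: True if a probe sent
    at time t would be answered. Probe timeouts are treated as instantaneous
    (a simplifying assumption).
    """
    t = 0
    while t <= horizon:
        if path_is_up(t):
            t += x  # healthy path: keep the normal probe interval
            continue
        # A normal probe failed: confirm with up to n high-frequency probes.
        confirmed = True
        for _ in range(n):
            t += y
            if path_is_up(t):
                confirmed = False  # path recovered, resume normal probing
                break
        if confirmed:
            return t  # path declared bad; the overlay reroutes at this time
        t += x
    return None


# Assumed timer values: x = 10, y = 2, n = 3.
permanent = time_to_switch(lambda t: t < 25, x=10, y=2, n=3, horizon=100)
transient = time_to_switch(lambda t: not (25 <= t < 33), x=10, y=2, n=3, horizon=100)
print(permanent, transient)
```

In the first call the link fails at time 25 and stays down, so the failure is noticed by the normal probe at time 30 and confirmed by three fast probes. In the second call the outage ends before the confirmation probes finish, so the overlay never reroutes. The ratio of these timers to the IP layer's own failure-reaction timers is what determines which layer reacts first.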