<<

BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing

Alok Kumar Sushant Jain Uday Naik Anand Raghuraman Nikhil Kasinadhuni Enrique Cauich Zermeno C. Stephen Gunn Jing Ai Björn Carlin Mihai Amarandei-Stavila Mathieu Robin Aspi Siganporia Stephen Stuart Amin Vahdat

Google Inc. [email protected]

ABSTRACT Keywords WAN bandwidth remains a constrained resource that is eco- Bandwidth Allocation; Wide-Area Networks; Soware- nomically infeasible to substantially overprovision. Hence, Deûned Network; Max-Min Fair it is important to allocate capacity according to service pri- ority and based on the incremental value of additional allo- 1. INTRODUCTION cation. For example, it may be the highest priority for one service to receive Ï«Gb/s of bandwidth but upon reaching TCP-based bandwidth allocation to individual ows con- such an allocation, incremental priority may drop sharply tending for bandwidth on bottleneck links has served the In- favoring allocation to other services. Motivated by the ob- ternet well for decades. However, this model of bandwidth servation that individual ows with ûxed priority may not allocation assumes all ows are of equal priority and that all be the ideal basis for bandwidth allocation, we present the ows beneût equally from any incremental share of available design and implementation of Bandwidth Enforcer (BwE), bandwidth. It implicitly assumes a client-server communi- a global, hierarchical bandwidth allocation infrastructure. cation model where a TCP ow captures the communication BwE supports: i) service-level bandwidth allocation follow- needs of an application communicating across the . ing prioritized bandwidth functions where a service can rep- his paper re-examines bandwidth allocation for an im- resent an arbitrary collection of ows, ii) independent alloca- portant, emerging trend, distributed computing running tion and delegation policies according to user-deûned hier- across dedicated private WANs in support of cloud comput- archy, all accounting for a global view of bandwidth and fail- ing and service providers. housands of simultaneous such ure conditions, iii) multi-path forwarding common in traõc- applications run across multiple global data centers, with engineered networks, and iv) a central administrative point thousands of processes in each data center, each potentially to override (perhaps faulty) policy during exceptional con- maintaining thousands of individual active connections to ditions. BwE has delivered more service-eõcient bandwidth remote servers. WAN traõc engineering means that site-pair utilization and simpler management in production for mul- communication follows diòerent network paths, each with tiple years. diòerent bottlenecks. Individual services have vastly diòer- ent bandwidth, latency, and loss requirements. We present a new WAN bandwidth allocation mechanism CCS Concepts supporting distributed computing and data transfer. BwE provides work-conserving bandwidth allocation, hierarchi- •Networks → Network resources allocation; Network man- agement; cal fairness with exible policy among competing services, and Service Level Objective (SLO) targets that independently account for bandwidth, latency, and loss. BwE’s key insight is that routers are the wrong place to map Permission to make digital or hard copies of part or all of this work for personal policy designs about bandwidth allocation onto per-packet or classroom use is granted without fee provided that copies are not made or behavior. Routers cannot support the scale and complex- distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of ity of the necessary mappings, oen because the semantics this work must be honored. For all other uses, contact the owner/author(s). 
of these mappings cannot be captured in individual packets. SIGCOMM ’15 August 17-21, 2015, London, United Kingdom Instead, following the End-to-End Argument[†Ê], we push © 2015 Copyright held by the owner/author(s). all such mapping to the source host machines. Hosts rate ACM ISBN 978-1-4503-3542-3/15/08. limit their outgoing traõc and mark packets using the DSCP DOI: http://dx.doi.org/Ï«.ÏÏ/†;Ê•E.†;Ê;;Ê ûeld. Routers use the DSCP marking to determine which

1 path to use for a packet and which packets to drop when congested. We use global knowledge of and link utilization as input to a hierarchy of bandwidth en- forcers, ranging from a global enforcer down to enforcers on each host. Bandwidth allocations and packet marking pol- icy ows down the hierarchy while measures of demand ow up, starting with end hosts. he architecture allows us to de- couple the aggregate bandwidth allocated to a ow from the handling of the ow at the routers. BwE allocates bandwidth to competing applications based bandwidth functions on exible policy conûgured by cap- Figure Ï: WANNetwork Model. turing application priority and incremental utility from ad- ditional bandwidth in diòerent bandwidth regions. BwE 2. BACKGROUND supports hierarchical bandwidth allocation and delegation We begin by describing our WAN environment and high- among services while simultaneously accounting for multi- light the challenges we faced with existing bandwidth alloca- path WAN communication. BwE is the principal bandwidth tion mechanisms. housands of individual applications and allocation mechanism for one of the largest private WANs services run across dozens of wide area sites each containing and has run in production for multiple years across hundreds multiple clusters. Host machines within a cluster share a com- of thousands of end points. he systems contributions of our S S mon LAN. Figure Ï shows an example WAN with sites Ï, † work include: S CÏ C† S and t; Ï and Ï are clusters within site Ï. We host a combination of interactive web services, e.g. ● Leveraging concepts from Soware Deûned Network- ing, we build a uniûed, hierarchical control plane for search and web mail, streaming video, batch-style data pro- bandwidth management extending to all end hosts. In cessing, e.g., MapReduce [Ït], and large-scale data transfer particular, hosts report per-user and per-task demands services, e.g., index copy from one site to another. Cluster to the control plane and rate shape a subset of ows. management soware maps services to hosts independently; we cannot leverage IP address aggregation/preûx to identify ● We integrate BwE into existing WAN traõc engineer- a service. However, we can install control soware on hosts ing (TE) [Ï;, ÏÏ, φ] mechanisms including MPLS Auto- and leverage a control protocol running outside of routers. Bandwidth [††] and a custom SDN infrastructure. BwE We started with traditional mechanisms for bandwidth al- takes WAN pathing decisions made by a TE service location such as TCP, QoS and MPLS tunnels. However these and re-allocates the available site-to-site capacity, split proved inadequate for a variety of reasons: across multiple paths, among competing applications. At the same time, we beneût from the reverse integra- ● Granularity and Scale: Our network and service capac- tion: using BwE measures of prioritized application de- ity planners need to reason with bandwidth allocations mand as input to TE pathing algorithms (Section .t.Ï). at diòerent aggregation levels. For example, a prod- uct group may need a speciûed minimum of site-to-site ● We implement hierarchical max-min fair bandwidth al- bandwidth across all services within the product area. location to exibly-deûned FlowGroups contending for In other cases, individual users or services may require resources across multiple paths and at diòerent levels of a bandwidth guarantee between a speciûc pair of clus- network abstraction. he bandwidth allocation mech- ters. 
We need to scale bandwidth management to thou- anism is both work-conserving and exible enough to sands of individual services, and product groups across implement a range of network sharing policies. dozens of sites each containing multiple clusters. We need a way to classify and aggregate individual ows In sum,BwE delivers a number of compelling advantages. into arbitrary groups based on conûgured policy. TCP First, it provides isolation among competing services, deliv- fairness is at a -tuple ow granularity. On a congested ering plentiful capacity in the common case while maintain- link, an application gets bandwidth proportional to the ing required capacity under failure and maintenance scenar- number of active ows it sends across the links.Our ios. Capacity available to one service is largely independent of services require guaranteed bandwidth allocation inde- the behavior of other services. Second, administrators have a pendent of the number of active TCP ows. Router QoS single point for specifying allocation policy. While pathing, and MPLS tunnels do not scale to the number of service RTT, and capacity can shi substantially,BwE continues to classes we must support and they do not provide suõ- allocate bandwidth according to policy. Finally,BwE enables cient exibility in allocation policy (see below). the WAN to run at higher levels of utilization. By tightly inte- grating loss-insensitive ûle transfer protocols running at low ● Multipath Forwarding: For eõciency, wide area packet priority with BwE, we run many of our WAN links at •«ì forwarding follows multiple paths through the net- utilization. work, possibly with each path of varying capac- ity. Routers hash individual service ows to one of the available paths based on packet header content.

2 Any bandwidth allocation from one site to another must simultaneously account for multiple source/des- tination paths whereas existing bandwidth allocation mechanisms—TCP, router QoS, MPLS tunnels—focus on diòerent granularity (ows, links, single paths re- spectively). ● Flexible and Hierarchical Allocation Policy: We found Figure †: Reduction in TCP packet loss aer BwE was de- simple weighted bandwidth allocation to be inade- ployed. Y-axis denotes packet loss in percentage. Diòerent quate. For example, we may want to give a high pri- lines correspond to diòerent QoS classes (BEÏ denoting best ority user a weight of Ï«.« until it has been allocated Ï eòort, and AFÏ/AF† denoting higher QoS classes.) Gb/s, a weight of Ï.« until it is allocated † Gb/s and a weight of «.Ï for all bandwidth beyond that. Further, 3. ABSTRACTIONS AND CONCEPTS bandwidth allocation should be hierarchical such that bandwidth allocated to a single product group can be 3.1 Traffic Aggregates or FlowGroups subdivided to multiple users, which in turn may be hi- Individual services or users run jobs consisting of multiple erarchically allocated to applications, individual hosts tasks and ûnally ows. Diòerent allocation policies should . Each task may contain multiple Linux processes and be available at each level of the hierarchy. runs in a Linux container that provides resource accounting, isolation and information about user_name, job_name and ● Delegation or Attribution: Applications increasingly task_name. We modiûed the Linux networking stack to mark leverage computation and communication from a vari- the per-packet socket buòer structure to uniquely identify the ety of infrastructure services. Consider the case where originating container running the task. his allows BwE to a service writes data to a storage service, which in distinguish between traõc from diòerent tasks running on turn replicates the content to multiple WAN sites for the same host machine. availability. Since the storage service acts on behalf of BwE further classiûes task traõc based on destination clus- thousands of other services, its bandwidth should be ter address. Optionally, tasks use setsockopt() to indicate charged to the originating user. Bandwidth delegation other classiûcation information, e.g. for bandwidth delega- provides diòerential treatment across users sharing a tion. Delegation allows a task belonging to a shared infras- service, avoids head of line blocking across traõc for tructure service to attribute its bandwidth to the user whose diòerent users, and ensures that the same policies are request caused it to generate traõc. For example, a copy ser- applied across the network for a user’s traõc. vice can delegate bandwidth charges for a speciûc ûle transfer to the user requesting the transfer. We designed BwE to address the challenges and require- For scalability, baseline TCP regulates bandwidth for most ments described above around the principle that bandwidth application ows. BwE dynamically selects the subset of ows allocation should be extended all the way to end hosts. While accounting for most of the demand to enforce. Using TCP historically we have looked to routers with increasingly so- as the baseline also provides a convenient fallback for band- phisticated ASICs and control protocols for WAN bandwidth width allocation in the face of a BwE system failure. 
allocation, we argue that this design point has resulted sim- BwE allocates bandwidth among FlowGroups at various ply from lack of control over end hosts on the part of net- granularities, deûned below. work service providers. Assuming such access is available, we ûnd that the following functionality can be supported with ● Task FlowGroup or task-fg: . his FlowGroup is the ûnest unit groups, ii) exibly sub-dividing aggregate bandwidth alloca- of bandwidth measurement and enforcement. tions back to individual ows, iii) accounting for delegation job-fg of resource charging from one service to another, and iv) ex- ● Job FlowGroup or : Bandwidth usage across all task-fgs belonging to the same job is aggregated into pressing and enforcing exible max-min bandwidth sharing job-fg policies. On the contrary, existing routers must inherently a : . to map packets to one of a small number of service classes or ● User FlowGroup or user-fg: Bandwidth usage across tunnels. all job-fgs belonging to the same user is aggre- Figure † shows an instance of very high loss in multiple gated into a user-fg: . congestion control was not eòective and the loss remained high until we turned on admission control on hosts. ● Cluster FlowGroup or cluster-fg: Bandwidth usage across all user-fg belonging to same user_aggregate and belonging to same cluster-pair is combined into a cluster-fg:

3 f g f g destination_cluster>. he user_aggregate corresponds (a) Ï (b) † to an arbitrary grouping of users, typically by business Allocation Weight Bandwidth Allocation Weight Bandwidth group or product. his mapping is deûned in BwE con- Level (Gbps) Level (Gbps) Guaranteed « « Guaranteed Ï« Ï« ûguration (Section t.†). †« Ï« Best-Effort Ï« ∞ Best-Effort  ∞ ● Site FlowGroup or site-fg: Bandwidth usage for cluster- fgs belonging to the same site-pair is combined into a Table Ï:BwEConûguration Example. site-fg: . BwE creates a set of trees of FlowGroups with parent-child 25 25 relationships starting with site-fg at the root to cluster-fg, user- 20 20 15 15 fg, job-fg and eventually task-fg at the leaf. We measure band- 10 10 width usage at task-fg granularity in the host and aggregate site-fg 5 5 to the level. BwE estimates demand (Section E.Ï) for (Gbps) Bandwidth 0 (Gbps) Bandwidth 0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 each FlowGroup based on its historical usage. BwE allocates Fair Share Fair Share bandwidth to site-fgs, which is redistributed down to task-fgs (a) f g (b) f g and enforced in the host kernel. Beyond rate limiting, the hi- Ï † erarchy can also be used to perform other actions on a ow group such as DSCP remarking. All measurements and rate Figure t:Example Bandwidth Functions. limiting are done on packet transmit. BwE policies are deûned at site-fg and user-fg level. Mea- he BwE conûguration maps users to user_aggregates. user-fg site-fg surement and enforcement happen at task-fg level. Other lev- Mapping from to can be derived from this. he site-fg els are required to scale the system by enabling distributed BwE conûguration policies describe how s share the user-fg site-fg execution of BwE across multiple machines in Google data- network and also describe how s within a share site-fg centers. bandwidth allocated to the . For all FlowGroups in a level of hierarchy, the BwE conûguration deûnes: Ï) band- 3.2 Bandwidth Sharing Policies width for each allocation level and †) within each allocation level, weight of the FlowGroup that can change based on allo- 3.2.1 Requirements cated bandwidth. An example of a BwE conûguration for the f g f g Our WAN(Figure Ï) is divided in two levels, the inter- relative priority for two FlowGroups, Ï and † is shown site network and the inter-cluster network. he links in the in Table Ï. l l l inter-site network ( ;, Ê and • in the ûgure) are the most expensive. Aggregated demands on these links are easier 3.2.3 Bandwidth Functions to predict.Hence, our WAN is provisioned at the inter- he conûgured sharing policies are represented inside BwE site network. Product groups (user_aggregates) create band- as bandwidth functionsÏ. A bandwidth function [Ï;] speciûes width requirements for each site-pair. For a site-pair, de- the bandwidth allocation to a FlowGroup as a function of pending on the network capacity and its business priority, its relative priority on an arbitrary, dimensionless measure of each user_aggregate gets approved bandwidth at several allo- available fair share capacity, which we call fair share. fair share cation levels. Allocation levels are in strict priority order, ex, is an abstract measure and is only used for internal computa- Guaranteed allocation level should be fully satisûed before al- tion by the allocation algorithm. Based on the conûg, every locating to Best-Eòort allocation level. 
Allocated bandwidth site-fg and user-fg is assigned a piece-wise linear monotonic of a user_aggregate for a site-pair is further divided to all its bandwidth function (e.g. Figure t). It is capped at the dynamic member users. estimated demand (Section E.Ï) of the FlowGroup. hey can Even though provisioning and sharing of the inter-site net- also be aggregated to create bandwidth functions at the higher work is the most important, several links not in the inter-site levels (Section t.†.). network may also get congested and there is a need to share he fair share dimension can be partitioned into regions their bandwidth fairly during congestion. We assign weights (corresponding to allocation levels in the BwE conûgura- to the users that are used to subdivide their user_aggregate’s tion) of strict priority. Within each region, the slope† of a allocated bandwidth in the inter-cluster network. To allow FlowGroup’s bandwidth function deûnes its relative priority more ûne grained control, we allow weights to change based or weight. Once the bandwidth reaches the maximum ap- on allocated bandwidth as well as to be overridden to a non- proved for the FlowGroup in a region, the bandwidth function default value for some cluster-pairs. attens (« slope) until the start of the next region. Once the 3.2.2 Configuration ÏBandwidth functions are similar to utility functions [Ê, E] Network administrators conûgure BwE sharing policies except that these are derived from static conûgured pol- icy (Section t.†) indicating network fair share rather than through centralized conûguration. BwE conûguration spec- application-speciûed utility as a function of allocated band- iûes a ûxed number of strict priority allocation levels, e.g., width. there may be two levels corresponding to Guaranteed and †Slope can be a multiple of weight as long as the same multi- Best-Eòort traõc. ple is used for all FlowGroups.

4 25.0 5.0 fg1(left) 22.5 4.5 fg2(left) fair share(right) 20.0 4.0

17.5 3.5

15.0 3.0

12.5 2.5

10.0 2.0 Fair Share Fair 7.5 1.5

5.0 1.0

2.5 0.5 Allocated Bandwidth (Gbps) 0.0 0.0 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 22.5 25.0 27.5 30.0 32.5 35.0 37.5 40.0 Link Available Bandwidth (Gbps)

Figure : Bandwidth Sharing on a Bottleneck Link.

Figure :BwEArchitecture. bandwidth function reaches the FlowGroup’s estimated de- chy (job-fgs and task-fgs) equally (no weights) based on their mand, it becomes at from that point for all the following estimated demands. regions. Figure t shows example bandwidth functions for two Flow- 3.2.4 Bandwidth Function Aggregation Groups, f g and f g , based on BwE conûguration as deûned Ï † Bandwidth Functions can be aggregated from one Flow- in Table Ï. here are two regions of fair share: Guaranteed («- Group level to another higher level. We require such aggre- †) and Best-Eòort (†- ). he endpoints for each region are ∞ gation when input conûguration deûnes a bandwidth func- system-level constants deûned in BwE conûguration. BwE’s tion at a ûner granularity, but the BwE algorithm runs over estimated demand of f g is ÏGbps and hence, its bandwidth Ï coarser granularity FlowGroups. For example, BwE’s input function attens past that point. Similarly, f g ’s estimated de- † conûguration provides bandwidth function at user-fg level, mand is †«Gbps. while BwE (Section .Ï) runs across cluster-fgs. In this case, We present a scenario where f g and f g are sharing one Ï † we aggregate user-fgs bandwidth functions to create a cluster- constrained link in the network. he goal of the BwE algo- fg bandwidth function. We create aggregated bandwidth func- rithm is to allocate the bandwidth of the constrained link tions for a FlowGroup by adding bandwidth value for each such the following constraints are satisûed: Ï) f g and f g get Ï † value of fair share for all its children. maximum possible but equal fair share, and †) sum of their al- fair share located bandwidth corresponding to the allocated 4. SYSTEM DESIGN is less than or equal to the available bandwidth of the link. Figure  shows the output of the BwE allocation algorithm BwE consists of a hierarchy of components that aggregate (Section .t) with varying link’s available bandwidth shown network usage statistics and enforce bandwidth allocations. on the x-axis. he allocated fair share to the FlowGroups is BwE obtains topology and other network state from a net- shown on the right y-axis and the corresponding bandwidth work model server and bandwidth sharing policies from an allocated to the FlowGroups is shown on the le y-axis. Note administrator-speciûed conûguration. Figure  shows the that the constraints above are always satisûed at each snap- functional components in BwE. shot of link’s available bandwidth. One can verify using this 4.1 Host Enforcer graph that the prioritization as deûned by Table Ï is respected. One of BwE’s principal responsibilities is to dynamically At the lowest level of the BwE hierarchy, the Host Enforcer determine the level of contention for a particular resource runs as a user space daemon on end hosts. Every ûve sec- (bandwidth) and to then assign the resource to all compet- onds, it reports bandwidth usage of local application’s tasks- ing FlowGroups based on current contention. Higher val- fgs to the Job Enforcer. In response, it receives bandwidth ues of fair share indicate lower levels of resource contention allocations for its task-fgs from the Job Enforcer. he Host and correspondingly higher levels of bandwidth that can po- Enforcer collects measurements and enforces bandwidth al- tentially be assigned to a FlowGroup. Actual consumption locations using the HTB (Hierarchical Token Bucket) queu- is capped by current FlowGroup estimated demand, making ing discipline in Linux. 
the allocation work-conserving (do not waste any available bandwidth if there is demand). 4.2 Job Enforcer he objective of BwE is the max-min fair [E] allocation of Job Enforcers aggregate usages from task-fgs to job-fgs and fair share to competing site-fgs and then the max-min fair report job-fgs’ usages every Ï« seconds to the Cluster En- allocation of fair share to user-fgs within a site-fg. For each forcer. In response, the Job Enforcer receives job-fgs’ band- user-fg, maximize the utilization of the allocated bandwidth width allocations from the Cluster Enforcer. he Job Enforcer to the user-fg by subdividing it to the lower levels of hierar- ensures that for each job-fg, bandwidth usage does not ex-

5 ceed its assigned allocation. To do so, it redistributes the as- ters to multiple other clusters, with each cluster pair utilizing signed job-fg allocation among the constituent task-fgs using multiple paths.Hence, the bandwidth allocation must simul- the WaterFill algorithm (Section .). taneously account for multiple potential bottlenecks. Here, we present an adaptation of the traditional max-min 4.3 Cluster Enforcer fairness objective for FlowGroups sharing a bottleneck link he Cluster Enforcer manages two levels of FlowGroup ag- to multipath cluster-to-cluster communication. We designed MultiPath Fair Allocation (MPFA) gregation - user-fgs to job-fgs and cluster-fgs to user-fgs.It a centralized algorithm to aggregates usages from job-fgs to user-fgs and computes user- determine global max-min fairness. We present a simpler fgs’ bandwidth functions based on input from a conûguration version of the problem with a single layer of FlowGroups ûle.It aggregates the user-fgs’ bandwidth functions (capped (Section .†) and then extend it to multiple layers of Flow- at their estimated demand) to cluster-fg bandwidth functions Groups with diòerent network abstractions in hierarchical MPFA (Section t.†.), reporting them every Ï seconds to the Global (Section .). cluster- Enforcer. In response, the Cluster Enforcer receives 5.1 Inputs and Outputs fgs’ bandwidth allocations, which it redistributes among user- task-fg band- fgs and subsequently to job-fgs (Section .). Inputs to the BwE algorithm are s’ demands, width functions of user-fgs and site-fgs and network paths for 4.4 Network Model Server cluster-fgs and site-fgs. We aggregate task-fgs’ demands all the site-fg user-fg bandwidth function he Network Model Server builds the abstract network way up to s and aggregate s’ s to cluster-fgs’ bandwidth functions (Section t.†.). We run model for BwE.Network information is collected by stan- MPFA site-fg cluster- dard monitoring mechanisms (such as SNMP). Freshness is global hierarchical (Section .) on s and fgs that results in cluster-fgs’ allocations. hen, we distribute critical since paths change dynamically. BwE targets getting cluster-fg task-fg an update every t« seconds. he consistency of the model is s’ allocations to s (Section .), which are en- veriûed using independent mechanisms such as traceroute. forced at the hosts. 4.5 Global Enforcer 5.2 MPFA Problem Inputs for MPFA are: he Global Enforcer sits at the top of the Bandwidth En- forcer hierarchy.It divides available bandwidth capacity on Ï. Set of n FlowGroups, F = { fi , ∀i S Ï ≤ i ≤ n} where the network between diòerent clusters. he Global Enforcer FlowGroups are deûned in Section t.Ï. Each fi has an bandwidth function bandwidth function B f B f takes the following inputs: i) s from the associated (Section t.†.t), i . i Cluster Enforcers summarizing priority across all users at maps fair share to bandwidth for fi .If fi is allocated fair cluster-fg level, ii) global conûguration describing the shar- share of s, then it should be allocated bandwidth equal site-fg B f s ing policies at level, and iii) network topology, link ca- to i ( ). pacity, link utilization and drop statistics from the network m lk k k m lk model server. A small fraction of ows going over a link may †. Set of links, L = { , ∀ S Ï ≤ ≤ }. Each link has an associated allocatable capacity cl . not be under BwE control. To handle this, for every link we k dark bandwidth f also compute . his is the amount of traõc n f fi p i t. Set of i paths for each . 
Each path, j , has an as- going over the link which BwE is unaware of. his may be due f w i j n f to packet header overhead (particularly tunneling in network sociated weight, j , where Ï ≤ ≤ i and for each f f i i routers) or various failure conditions where BwE has incom- fi j n w p , ∑Ï≤ ≤ f j = Ï. Each path, j , is a set of links, i.e, f i plete information. Dark bandwidth is the smoothed value of i p j . (actual link usage - BwE reported link usage), and link al- ⊆ L locatable capacity is (link capacity - dark bandwidth). BwE We deûne the fraction of fi that traverse lk as FR( fi , lk ). f f reported link usage is computed by taking the set of ows i i his is calculated as the sum of weights, w j , for all paths, p j , (and their current usage) reported to the Global Enforcer by f f l p i Cluster Enforcers, and mapping them to the paths and links for the FlowGroup, i , such that k ∈ j . f for those ows. Given these inputs, the Global Enforcer runs FR fi lk w i MPFA cluster-fg ( , ) = Q j hierarchical (Section .t) to compute s’ band- ≤ j≤n Sl ∈p fi Ï fi k j width allocations and sends these allocations to Cluster En- he output of MPFA is the max-min fair share alloca- forcers. s f fi tion i to each FlowGroup, , such that ascending sorted s f s f s f ( Ï , † , ... , n ) is maximized in lexicographical order. Such 5. BWE ALLOCATION ALGORITHM maximization is subject to the constraint of satisfying capac- l One of the challenges we faced was deûning the optimiza- ity constraints for all links, k . FR fi lk B f s f cl Q ( , ) × i ( i ) ≤ k tion objective for bandwidth allocation to individual ows. f ∀ i First, we did not wish to allocate bandwidth among compet- ing -tuple ows but rather to competing FlowGroups. Sec- 5.3 MPFA Algorithm ond, services do not compete for bandwidth at a single bottle- he MPFA algorithm (Algorithm Ï) can be described in the neck link because services communicate from multiple clus- following high-level steps:

6 Input: f i i n FlowGroups, F ∶ { i , ∀ S Ï ≤ ≤ }; l k k m Links, L ∶ { k , ∀ S Ï ≤ ≤ }; l c k k m Allocatable capacities for ∀ k : { l , ∀ S Ï ≤ ≤ }; bandwidth function f B k for i : fi ; f l f l // ∀ i , ∀ k , Fraction of i traversing link, k FR f l Function, ( i , k ) : Output is a fraction ≤ Ï; Output: f s i i n Allocated fair share for ∀ i :{ fi , ∀ S Ï ≤ ≤ }; Figure E: MPFA Example. b Bottleneck Links , L ← ∅; l bandwidth func- Frozen FlowGroups, F f ← ∅; Ï. For each link, k , calculate the link’s f s tion B bandwidth functions foreach i do fi ← ∞; , l , by aggregating of all l k // Calculate bandwidth function for each k non-frozent FlowGroups, fi , in appropriate fractions, l s B s FR f l B s foreach k do ∀ , l ( ) ← ∀ ( i , k ) × f ( ) ; FR f l B fair share s k ∑ fi i ( i , k ). l maps , , to allocation on the l l b f f f ) l k l while (∃ k S k ∉ L ) ∧ (∃ i S i ∉ F do link k when all FlowGroups traversing k are allocated Bottleneck link, l null; fair share s b ← the of . Min Bottleneck fair share, smin ← ∞; b foreach l b do †. Find the bottleneck fair share, sl , for each remaining k ∉ L k sb c B sb Find S l = l ( ); (not bottlenecked yet) link, lk , by ûnding the fair share lk k k lk sb smin smin sb l l cl bandwidth if lk < then ← lk ; b ← k ; corresponding to its capacity, k in the link’s function Bl bandwidth function l null l b , k . Since is a piece-wise if b ≠ then Add b to L ; linear monotonic function, ûnding fair share for a given else break; f l capacity can be achieved by a binary search of the in- // Freeze i taking the bottleneck link, b f FR f l f f teresting points (points where the slope of the function foreach i S ( i , b ) > « ∧ i ∉ F do f f s smin changes). Add i to F ; fi ← ; // Remove allocated bandwidth from B fi t. he link, lb, with the minimum bottleneck fair share, s, B s max «, B s B smin ; min ∀ fi ( ) ← ( fi ( ) − fi ( )) s // Subtract B from B for all its links is the next bottleneck. If the minimum bottleneck fi lk fair share l FR f l l b equals ∞, then terminate. foreach k S ( i , k ) > « ∧ k ∉ L do s B s B s FR f l B s ∀ , lk ( ) ← lk ( ) − ( i , k ) × fi ( ); . Mark the link, lb, as a bottleneck link. Freeze all Flow- Groups, fi , with non-zero fraction, FR fi , lb , on lb. ( ) Algorithm Ï: MPFA Algorithm Frozen FlowGroups are not considered to ûnd further bottleneck links. Subtract frozen FlowGroups’ band- min width functions beyond the bottleneck fair share,s from all remaining links. ⎧ †.;s « s • «.; min ÏÊ, s ⎪ ∶ ≤ < Bl s ( ( )) ⎪ s s † ( ) = ‹ s  = ⎨ «.; + ÏÊ ∶ • ≤ < ÏÊ . If any link is not a bottleneck, continue to step †. + min(ÏÊ, † ) ⎪ s ⎩⎪ tÏ. ∶ ≥ ÏÊ Figure E shows an example of the allocation algorithm with «.†s « s ÏÊ l l l Bl s s ∶ ≤ < three links, Ï, † and t, with capacity Ït, Ït and  respectively. t ( ) = «.†(min(ÏÊ, )) = › . ∶ s ≥ ÏÊ Assume all bandwidth numbers are in Gbps for this exam- b f l Next, we ûnd bottleneck fair share, sl for each link, lk , ple. here are three FlowGroups: Ï) Ï takes two paths ( † and k l l f B sb c sb sb Ï → t) with weights «.; and «.† respectively, †) † takes such that l ( l ) = l . his results in l = , l ≈ .;†, l f l k k k Ï † one path ( †), and t) t taking one path ( Ï). All FlowGroups b sl ÏE. his makes l the bottleneck link and freezes both f f f t = Ï have demand of ÏÊ. Assume Ï, † and t have weights of Ï, † f f fair share l and t respectively, corresponding bandwidth functions of the Ï and t at of . Ï will not further participate in MPFA f fair share Bl Bl B f s s B f s s . 
Since Ï is frozen at of , † and t need to FlowGroups are: Ï ( ) = min(ÏÊ, ), † ( ) = min(ÏÊ, † ) B f fair share B f s s be updated to not account for Ï beyond of . he and t ( ) = min(ÏÊ, t ). f updated functions are: Based on that paths, fraction of FlowGroups( i ) travers- s s l FR f l FR f l ⎪⎧ †.; ∶ « ≤ <  ing Links( k ) are: ( Ï, Ï) = «.†, ( Ï, †) = «.;, ⎪ Bl s ⎪ s s FR f l FR f l FR f l † ( ) = ⎨ † + t ∶  ≤ < • ( Ï, t) = «.†, ( †, †) = Ï and ( t, Ï) = Ï. ⎪ †Ï s • We calculate bandwidth function for links as: ⎩⎪ ∶ ≥ ⎧ s s s ⎪ t.† ∶ « ≤ < E «.†s « s  «.†(min(ÏÊ, )) ⎪ Bl s ∶ ≤ < Bl s s s t ( ) = › s Ï ( ) = ‹ s  = ⎨ «.† + ÏÊ ∶ E ≤ < ÏÊ Ï  + min(ÏÊ, t ) ⎪ s ∶ ≥ ⎪ ††. ∶ ≥ ÏÊ b b ⎩ We recalculate sl and sl based on the new values for Bl t † t † A frozen FlowGroup is a FlowGroup that is already bottle- b b MPFA and Bl . his results in sl  and sl . l is the next bot- necked at a link and does not participate in the algo- t † = t = ∞ † fair share f fair share rithm run any further. tleneck with of . † is now frozen at the of

7 . Since all FlowGroups are frozen, MPFA terminates. he û- f f f fair share nal allocation to ( Ï, †, t) in is (, , ), translating to (Gbps, Ï«Gbps, φGbps) using the corresponding band- width functions l l . his allocation ûlls bottleneck links, Ï and † completely and fair share allocation (, , ) is max-min fair with the given pathing constraints. No FlowGroup’s alloca- tion can be increased without penalizing other FlowGroups with lower or equal fair share. 5.3.1 Interaction with Traffic Engineering (TE) Figure ;:Allocation Using WaterFill. he BwE algorithm takes paths and their weights as input. A separate system, TE[Ï;, ÏÏ, φ], is responsible for ûnding up the bandwidth allocation for each user-fg using its band- optimal pathing that improves BwE allocation. Both BwE width function. and TE are trying to optimize network in a fair Bandwidth distribution from a user-fg to job-fgs and from way and input ows are known in advance. However, the a job-fg to task-fgs is simple max-min fair allocation of one key diòerence is that in BwE problem formulation, paths and resource to several competing FlowGroups using a WaterFill their weights are input constraints, where-as for TE[Ï;, ÏÏ, as shown in Figure ;. WaterFill calculates the water level cor- φ], paths and their weights are output. In our network, we responding to the maximum allocation to any FlowGroup. treat TE and BwE as independent problems. he allocation to each child FlowGroup is TE has more degrees of freedom and hence can achieve min(demand, waterlevel).If there is excess band- higher fairness. In the above example, the ûnal allocation width still remaining aer running the WaterFill, it is f l can be more max-min fair if Ï only uses the path †. In this divided among the FlowGroups as bonus bandwidth. Since case, MPFA will allocate fair share to ow groups ≈ (.tt, .tt, some (or a majority) of the FlowGroups will not use the ,tt) with corresponding bandwidth of (.ttGbps,Ê.EEGbps, bonus assigned to them, the bonus is over-allocated by a ÏtGbps). Hence, a good traõc engineering solution results in conûgurable scaling factor. better (more max-min fair) BwE allocations. We run TE[Ï;] and BwE independently because they work at diòerent time-scales and diòerent topology granularity. 5.5 Hierarchical MPFA Since TE is more complex, we aggregate topology to site- Next, we describe hierarchical MPFA, which reconciles the level where-as for BwE, we are able to run at a more gran- complexity between site-fg and cluster-fg level allocation. he ular cluster-level topology. TE re-optimizes network less of- fairness goal is to allocate max-min fair share to site-fg re- ten because changing network paths may result in packet re- specting bandwidth functions and simultaneously observing ordering, transient loss [ÏE] and resulting changes inter-site and intra-site topological constraints (Figure Ï). Be- may add signiûcant load to network routers. Separation of cause not all cluster-fgs within a site-fg share the same WAN TE and BwE also gives us operational exibility. he fact paths, individual cluster-fgs within a site-fg may bottleneck that both systems have the same higher level objective func- on diòerent intra-site links. tion helps ensure that their decisions are aligned and eõcient. We motivate hierarchical fairness using an example based l Even though in our network we run these independently the on Figure Ï. All links have Ï««Gbps capacity, except Ï l site-fg s f S possibility of having a single system to do both can not be (Gbps) and • («Gbps). 
here are two s, Ï from Ï S s f S S s f cluster-fg c f ruled out in future. to t and † from † to t. Ï consists of s: Ï CÏ CÏ c f C† CÏ s f cluster- from Ï to t and † from Ï to t. † consists of a 5.4 Allocation Distribution fg c f CÏ CÏ site-fg : t from † to t. All s have equal weights and for MPFA site-fg cluster-fg c f allocates bandwidth to the highest level of aggrega- each , all its member s have equal weights. Ï site-fg c f c f tions, s. his allocation needs to be distributed to lower and t have Ï««Gbps of demand while † has a Gbps de- cluster- MPFA site-fg s f s f levels of aggregation. Distribution of allocation from mand.If we run naively on s, then Ï and † fg l to lower levels is simpler since the network abstraction will be allocated †«Gbps each due to the bottleneck link, •. s f c f does not change and the set of paths remains the same during However, when we further subdivide Ï’s †«Gbps among Ï c f c f l de-aggregation. We describe such distributions in this sec- and †, Ï only receives Gbps due to the bottleneck link Ï site-fg cluster-fg c f c f s f tion. he distribution from to is more com- while † only has demand of Gbps. t receives all of †’s plex since the network abstraction changes from site-level to †«Gbps allocation. MPFA l cluster-level (Figure Ï), requiring an extension of to With this naive approach, the ûnal total allocation on • is MPFA c f Hierarchical (Section .) to allocate bandwidth di- t«Gbps wasting Ï«Gbps, where t could have used the extra rectly to cluster-fgs while honoring fairness and network ab- Ï«Gbps. Allocation at the site level must account for indepen- stractions at site-fg and cluster-fg level. dent bottlenecks in the topology one layer down. Hence, we To distribute allocation from a cluster-fg to user-fgs, we cal- present an eõcient hierarchical MPFA to allocate max-min culate the aggregated bandwidth functions for the cluster-fgs fair bandwidth among site-fgs while accounting for cluster- u (Section t.†.) and determine the fair share, s , correspond- level topology and fairness among cluster-fgs. u ing to the cluster-fg’s bandwidth allocation. We use s to look he goals of hierarchical MPFA are:

8 e fair share site-fg Bcf (Input) B (Output) ● Ensure max-min fairness of across 1 cf1 site-fg bandwidth function 30.0 30.0 0 based on s’ s. 25.0 1 0.2 25.0 22.5 0.067 22.5 20.0 20.0 site-fg fair share Apply TÏ 0.167 Within a , ensure max-min fairness of 15.0 on fair share 15.0 ● 0.75 across cluster-fgs using cluster-fgs’ bandwidth functions. 2 Ô⇒

Bandwidth (Gbps) Bandwidth 0.0 0.0 (Gbps) Bandwidth 0 0 10 15 20 20 50

he algorithm should be work-conserving. 7.5

● 100 110 12.5 87.5 Fair Share Fair Share ● he algorithm should not over-allocate any link in the network, hence, should enforce capacity constraints of + e Bcf (Input) B (Output) intra-site and inter-site links. 2 cf2 30.0 30.0 0 0 25.0 25.0 MPFA MPFA cluster- 0.133 For hierarchical , we must run on all 20.0 20.0 Apply TÏ 0.167 fg 15.0 on fair share 15.0 s to ensure that bottleneck links are fully utilized and en- 2 Ô⇒ 0.75 forced. To do so, we must create eòective bandwidth func- tion cluster-fg site-fg (Gbps) Bandwidth 0.0 0.0 (Gbps) Bandwidth

s for s such that the fairness among s and 0 0 10 15 20 20 50 7.5 100 110 site-fg 12.5 87.5 fairness within a are honored. Fair Share Fair Share We enhance MPFA in the following way. In addition to bandwidth function Bc f cluster-fg c fi = , i , for , , we further con- a B (Input) B sf (Calculated) sf1 bandwidth function Bs f site-fg s fx 1 sider the , for , . Using 55.0 Calculate TÏ: 55.0 x 1 0 ∀ 50.0 47.5 Bandwidth i Bc f x Bs f bandwidth func- 3 0.2 ∀ , i and ∀ , x , we derive the eòective map fair share e 40.0 a → 0.33 40.0 tion B c f Bs f Bs fÏ , c f , for i . 30.0 Ï 30.0 i T (;.) = †« e 4 Ï 1.5 B Bc f fair share TÏ(Ï«) = « We create c f by transforming i along the di- i TÏ(φ.) = Ê;.

c f (Gbps) Bandwidth T (Ï) = Ï«« (Gbps) Bandwidth mension while preserving the relative priorities of i with 0.0 Ï 0.0 0 TÏ(> Ï) = ∞ 0 10 15 20 20 50 7.5 100 110 12.5 87.5 respect to each other. We call bandwidth values of diòerent Fair Share Fair Share c fi as equivalent if they map to the same fair share based on their respective bandwidth functions. To preserve rela- Figure Ê: Bandwidth Function Transformation Example tive priorities of ∀c fi ∈ s fx , the set of equivalent bandwidth bandwidth values should be identical before and aer the Again, just applying the transformation at the inter- functions transformation. Any transformation applied in fair share esting points (points where the slope of the function should preserve this property as long as the same trans- changes) is suõcient. formation is applied to all c fi ∈ s fx . Allocated bandwidth to each c fi on a given available capacity (e.g. Figure ) should An example of creating eòective bandwidth function is be unchanged due to such transformation. In addition, we shown in Figure Ê. MPFA algorithm as described in Sec- must ûnd a transformation such that when all c fi ∈ s fx use tion .t is run over cluster-fgs as FlowGroups with their ef- e bandwidth function their eòective (transformed) bandwidth functions, Bc f , they fective s to achieve hierarchical fairness. i e MPFA can together exactly replace s fx . his means that when Bc f When we run hierarchical in the topology shown e i in Figure Ï, the allocation to c f increases to t«Gbps, fully are added together, it equals Bs f . s, ∀iSc f ∈s f Bc f s t x ∀ ∑ i x i ( ) = l c f using bottleneck link •. However, if † has higher demand Bs f s x ( ). c f Be (say Ï««Gbps), then it will not receive beneût of Ï being he steps to create c f are: s f i bottlenecked early and Ï will not receive its full fair share bandwidth function Ï. For each site-fg, s fx , create aggregated bandwidth func- of †«Gbps. To resolve this, we rerun the a site-fg cluster- tion, Bs f (Section t.†.): transformation for a when any of its member x fgs is frozen due to an intra-site bottleneck link. a s Bs f s Bc f s ∀ , x ( ) = Q i ( ) ∀c f ∈s f i x 6. SYSTEM IMPLEMENTATION a his section describes various insights, design and imple- †. Find a transformation function of fair share from Bs f x mentation considerations that made BwE a practical and use- to Bs f . he transformation function, Tx is deûned as: x ful system. a Tx s s B s Bs f s ( ) = ¯ S s f ( ) = x (¯) x 6.1 Demand Estimation Note that since bandwidth function is piece-wise linear Estimating demand correctly is important for fair alloca- monotonic function, just ûnd Tx (s) for values for in- a tion and high network utilization. Estimated demand should teresting points (where slope changes in either Bs f or x be greater than current usage to allow each FlowGroup to Bs f ). x ramp its bandwidth use. But high estimated demand (com- t. For each c fi s fx , apply Tx on fair share dimension of pared to usage) of a high priority FlowGroup can waste band- e∈ Bc f Bc f width. In our experience, asking users to estimate their de- i to get i . e mand is untenable because user estimates are wildly inaccu- Bc f Tx s Bc f s i ( ( )) = i ( ) rate.Hence, BwE employs actual, near real-time measure-

9 ments of application usage to estimate demand. BwE es- timates FlowGroup demand by maintaining usage history: Demand usage scale min demand = max(max∆t( ) × , _ ) We take the peak of a FlowGroup’s usage across ∆t time interval, multiply it with a factor scale > Ï and take the max with min_demand. Without the concept of min_demand, small ows (few Kbps) would ramp to their real demand too slowly. Empirically, we found that ∆t = φ«s, scale = Ï.Ï and min_demand = Ï«Mbps works well for user-fg for our net- work applications. We use diòerent values of min_demand at diòerent levels of the hierarchy. Figure •: Improving Network Utilization 6.2 WaterFill Allocation For Bursty Flows fgs. Because multiple such runs across all FlowGroups does he demands used in our WaterFill algorithm (Section .) not scale, we run one instance of the global algorithm and are based on peak historical usage and diòerent child Flow- pass to Cluster Enforcers the bandwidth function for the most Groups can peak at diòerent times. his results in demand constrained link for each FlowGroup. Assuming the most over-estimation and subsequently the WaterFill allocations constrained link does not change, the Cluster Enforcer can can be too conservative. To account for burstiness and the re- eõciently calculate allocation for a FlowGroup with ∞ de- sulting statistical , we estimate a burstiness factor mand in the constrained link, assuming it becomes the bot- (≥ Ï) for each FlowGroup based on its demand and sum of its tleneck. children’s demand: estimated demand burstiness f actor ∑∀chil dren 6.4 Improving Network Utilization = parent′s estimated demand BwE allows network administrators to increase link uti- Since estimated demand is based on peak historical usage lization by deploying high throughput NETBLT [Ï«]-like (Section E.Ï), the burstiness factor of a FlowGroup is a mea- protocols for copy traõc. BwE is responsible for determin- sure of sum of peak usages of children divided by peak of sum ing the ow transmission rate for these protocols. We mark of usages of the children. We multiply a FlowGroup’s alloca- packets for such copy ows with low priority DSCP values so tion by its burstiness factor before running the WaterFill. his that they absorb most of the transient loss. allows its children to burst as long as they are not bursting To ensure that the system achieves high utilization (>•«ì) together.If a FlowGroup’s children burst at uncoordinated without aòecting latency/loss sensitive ows such as web and times, then the burstiness factor is high, otherwise the value video traõc, the BwE Global Enforcer supports two rounds will be close to Ï. of allocation. 6.3 Fair Allocation for Satisfied Flow- Groups ● In the ûrst round, link capacities are set conservatively (for example at •«ì of actual capacity). All traõc types Satisûed FlowGroup A is one whose demand is less than are allowed to participate in this round of allocation. or equal to its allocation. Initially, we throttled each satis- ûed FlowGroup strictly to its estimated demand. However ● In the second round, the Global Enforcer allocates only we found that latency sensitive applications could not ramp copy traõc, but it scales up the links aggressively, e.g., fast enough to their fair share on a congested link. We next to Ï«ì of link capacity. eliminated throttling allocations for all satisûed FlowGroups. 
However, this lead to oscillations in system behavior as a ● We also adjust link scaling factors depending on loss FlowGroup switched between throttled and unthrottled each on the link. If a link shows loss for higher QoS classes, time its usage increased. we reduce the scaling factor. his allows us to better Our current approach is to assign satisûed FlowGroups a achieve a balance between loss and utilization on a link. stable allocation that reects the fair share at inûnite demand. his allocation is a FlowGroup’s allocation if its demand grew Figure • shows link utilization increasing from Ê«ì to •Êì to inûnity while demand for other FlowGroups remained as we adjust the link capacity. he corresponding loss for the same. When a high priority satisûed FlowGroup’s usage copy traõc also increases to an average †ì loss with no in- increases, it will ramp almost immediately to its fair share. creases in loss for loss-sensitive traõc. Other low-priority FlowGroups will be throttled at the next iteration of the BwE control loop. his implies that the ca- 6.5 Redundancy and Failure Handling pacity of a constrained link is oversubscribed and can result For scale and fault tolerance, we run multiple replicas at in transient loss if a FlowGroup’s usage suddenly increases. each level of the BwE hierarchy. here are N live and M cold he naive approach for implementing user-fg allocation standby Job Enforcers in each cluster. Hosts report all task-fgs involves running our global allocation algorithm multiple belonging to the same job-fg to the same Job Enforcer, shard- times for each FlowGroup, assigning inûnite demand to the ing diòerent job-fgs across Job Enforcers by hashing .

10 Cluster Enforcers run as master/hot standby pairs. Job En- forcers report all information to both. Both instances in- 1.6M 180.0M 150.0M dependently run the allocation algorithm and return band- 1.2M 120.0M 800.0k 90.0M width allocations to Job Enforcers. he Job Enforcers enforce 60.0M 400.0k 30.0M the bandwidth allocations received from the master.If the 50.0k 0.0 task flow groups master is unreachable Job Enforcers switch to the allocations Feb 2012 Jul 2012 Jan 2013 Jul 2013 Dec 2013 user/job/cluster flowgroups user flow groups (left) cluster flow groups (left) received from the standby. We employ a similar redundancy job flow groups (left) task flow groups (right) approach between Global Enforcers and Cluster Enforcers. Communication between BwE components is high prior- Figure Ï«:FlowGroup Counts ity and is not enforced. However, there can be edge scenar- 700.0M 550 ios where BwE components are unable to communicate with 600.0M 500 500.0M 450 each other. Some examples are: BwE job failures (e.g. binaries 400.0M 400 350 300.0M Bits/s

300 Cores go into a crash loop) causing hosts to stop receiving updated 200.0M 250 100.0M 200 bandwidth allocations, network routing failures preventing 0.0 150 Cluster Enforcers from receiving allocation from the Global Feb 2012 Jul 2012 Jan 2013 Jul 2013 Dec 2013 Cores (right) Control Traffic - Total Enforcers, or the network model becoming stale. Control Traffic - WAN he general strategy for handling these failures is that we continue to use last known state (bandwidth allocations or Figure ÏÏ: Resource overhead (Control System) capacity) for several minutes. For longer/sustained failures, in most cases we eliminate allocations and rely on QoS and ûne-grained bandwidth allocation and decreasing reporting TCP congestion management. For some traõc patterns such interval for enforcement frequency. as copy-traõc we set a low static allocation. We have found Figure Ï« shows growth in FlowGroups over time. As ex- this design pattern of defense by falling back to sub-optimal pected, as we go up in the hierarchy the number of Flow- but still operable baseline systems invaluable to building ro- Groups drops signiûcantly, allowing us to scale global com- bust network infrastructure. ponents. Figure ÏÏ shows the amount of resources used by our distributed deployment (excepting per-host overhead). It 7. EVALUATION also shows the communication overhead of the control plane. We can conclude that the overall cost is very small relative to enforcing traõc generated by hundreds of thousands of cores 7.1 Micro-benchmarks on Test Jobs using terabits/sec of bandwidth capacity. We begin with some micro-benchmarks of the live BwE Table † shows the number of FlowGroups on a congested system to establish its baseline behavior. Figure φ(a) demon- link at one point in time relative to all outgoing ow groups strates BwE fairness across users running diòerent number from a major cluster enforcer.It gives an idea of overall scale of TCP connections. Two users send traõc across a network in terms of the number of competing entities. here are por- conûgured to have Ï««Mbps of available capacity between the tions of the network with millions of competing FlowGroups. source/destination clusters. UserÏ has two connections and a Table t shows our algorithm run time at various levels in the weight of one. We vary the number of connections for User† hierarchy. We show max and average (across multiple in- (shown on the x-axis) and its BwE assigned weight. he graph stances) for each level except global. Overall, our goal is to shows the throughput ratio is equivalent to the users weight enforce large ows in a few minutes, which we are able to ratio independent of the number of competing TCP ows. achieve. he table also shows that the frequency of collecting Next we show how quickly BwE can enforce bandwidth al- and distributing data is a major contributing factor to reac- locations with and without the inûnite demand feature (Sec- tion time. tion E.t). In this scenario there are † users on a simulated Ï«« Mbps link. Initially, UserÏ has weight of t and User† has site cluster user job task One Congested Link ttE t.«k «k ««k φ;Ïk weight of Ï. At φ«s, we change the weight of User† to φ. In Largest Cluster Enforcer  t.k Etk ÏEk ÏEE«k Figure φ(b), where the inûnite demand feature is disabled, Avg across cluster enforcers t Ï.k ††k E«k E•Ek we observe that BwE converges at Ê«s. 
In Figure φ(c), where Global Ï• ;.k Eʆk Ïʆk ϕ«ÊÊk *-fg inûnite demand feature is enabled, we observe it converges at Table †: Number of s (at various levels in BwE) for a con- ÏE«s. his demonstrates BwE can enforce bandwidth alloca- gested link and for a large cluster enforcer. tions and converge in intervals of tens of seconds. his delay Algo Run-time Algo Reporting is reasonable for our production WAN network since large Max(s) Mean(s) Interval(s) Interval(s) bandwidth consumers are primarily copy traõc. Global Enforcer t - Ï« Ï« Cluster Enforcer .ÏE .Ï  Ï« Job Enforcer <«.«Ï <«.«Ï   7.2 System Scale Table t:Algorithm run time and feedback cycle in seconds. A signiûcant challenge for BwE deployment is the system’s Algorithm interval is how frequently algorithm is invoked sheer scale. Apart from organic growth to ows and network and Reporting interval is the duration between two reports scale other reasons that aòect system scale were supporting from the children in BwE hierarchy.

11 (a) BwE Fairness (b) Allocation Capped at Demand (c) Allocation not Capped at Demand

Figure φ:BwE compliance

Min Max Mean exploring abstractions where we can provide fair share to all Number of user-fg tì ÏÏì ;ì Number of job-fg tì Ï«ì Eì users across all bottleneck links irrespective of the number Usage •ì ••ì •;ì and placement of communicating cluster pairs.Other areas Table : Percentage of FlowGroups enforced. of research include improving the reaction time of the control system while scaling to a large number of FlowGroups, pro- user-fg job-fg viding fairness at longer timescales (hours or days) and in- BwE tracks about ««k s and millions of s, but cluding ow deadlines in the objective function for fairness. only enforces a small fraction of these.Processing for unen- forced FlowGroups is lightweight at higher levels of the BwE 8.1 Broader Applicability hierarchy allowing the system to scale. Table  shows the frac- Our work is motivated by the observation that per-ow tion of enforced ows and the fraction of total usage they rep- bandwidth allocation is no longer the ideal abstraction for resent. BwE only enforces Ï«ì of the ows but these ows ac- emerging WAN use cases. We have seen substantial bene- count for •ì of the traõc. We also found that for congested ût of BwE to applications for WAN distributed computing links, those with more than Ê«ì utilization, less than Ïì of and believe that it is also applicable to a number of emerg- the utilization belonged to unenforced ows. his indicates ing application classes in the broader Internet. For example, BwE is able to focus its work on the subset of ows that most video streaming services for collaboration or entertainment contribute to utilization and any congestion. increasingly dominate WAN communication. hese appli- We introduced a number of system optimizations to cations have well-understood bandwidth requirements with address growth along the following dimensions: Ï) Flow step functions in additional utility from incremental band- Groups: Organic growth and increase in speciûcity (for e.g., width allocation. Consider that a Ê«P video stream may delegation). For example, the overall design has been serv- receive no incremental beneût from an additional Ï««Kbps ing us through a growth of Ï«x from BwE’s inception (†«M to of bandwidth; only suõcient additional bandwidth to enable †««M). †)Paths: Traõc Engineering introduced new paths in ;†«P streaming is useful. Finally, homes and businesses are the system. t) Links: Organic network Growth ) Reporting trending toward multiple simultaneous video streams with frequency: there is a balance between enforcement accuracy known relative priority and incremental bandwidth utility, all and resource overhead. ) Bottleneck links: he number of sharing a single bottleneck with known capacity. bottleneck links aòects the overall run time of the algorithm Next, consider the move toward an Internet of hings [†•] on the Global Enforcer. where hundreds of devices in a home or business may have varying wide-area communication requirements. hese ap- 8. DISCUSSION AND FUTURE WORK plications may range from home automation, to security, BwE requires accurate network modeling since it is a key health monitoring, to backup. For instance, home security input to the allocation problem. his is challenging in an en- may have moderate bandwidth requirements but be of the vironment where devices fail oen and new technologies are highest priority. Remote backup may have substantial, sus- being introduced rapidly. In many cases, we lack standard tained bandwidth requirements. 
We introduced a number of system optimizations to address growth along the following dimensions: 1) FlowGroups: organic growth and an increase in specificity (e.g., delegation); the overall design has served us through a 10x growth since BwE's inception (20M to 200M). 2) Paths: Traffic Engineering introduced new paths into the system. 3) Links: organic network growth. 4) Reporting frequency: there is a balance between enforcement accuracy and resource overhead. 5) Bottleneck links: the number of bottleneck links affects the overall run time of the algorithm on the Global Enforcer.

8. DISCUSSION AND FUTURE WORK

BwE requires accurate network modeling since it is a key input to the allocation problem. This is challenging in an environment where devices fail often and new technologies are introduced rapidly. In many cases, we lack standard APIs to expose network topology, routing, and pathing information. With Software Defined Networking, we hope to see improvements in this area. Another challenge is that, for scalability, we often combine a symmetric full mesh topology into a single abstract link. This assumption however breaks during network failures, and handling these edge cases continues to be a challenge.
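The aggregation step can be illustrated with a minimal sketch, assuming a simple representation of parallel links between a site pair (the data structures and numbers below are invented for illustration, not taken from BwE's topology model):

    # Minimal sketch: collapse parallel physical links between a site pair
    # into a single abstract link. Names and capacities are assumptions.

    from dataclasses import dataclass

    @dataclass
    class PhysicalLink:
        src: str
        dst: str
        capacity_gbps: float
        healthy: bool = True

    def abstract_capacity(links, src, dst):
        """Abstract link capacity = sum of healthy parallel links for (src, dst).
        Failures make the mesh asymmetric, which is why the text calls out
        failure handling as a continuing challenge."""
        return sum(l.capacity_gbps
                   for l in links
                   if l.src == src and l.dst == dst and l.healthy)

    mesh = [PhysicalLink("A", "B", 100.0), PhysicalLink("A", "B", 100.0),
            PhysicalLink("A", "B", 100.0, healthy=False)]
    print(abstract_capacity(mesh, "A", "B"))  # 200.0 after one member fails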
Our FlowGroup abstraction is limited in that it allows users sending from multiple source/destination cluster pairs over the same bottleneck link to gain an advantage. We are exploring abstractions where we can provide a fair share to all users across all bottleneck links irrespective of the number and placement of communicating cluster pairs. Other areas of research include improving the reaction time of the control system while scaling to a large number of FlowGroups, providing fairness at longer timescales (hours or days), and including flow deadlines in the objective function for fairness.

8.1 Broader Applicability

Our work is motivated by the observation that per-flow bandwidth allocation is no longer the ideal abstraction for emerging WAN use cases. We have seen substantial benefit of BwE to applications for WAN distributed computing and believe that it is also applicable to a number of emerging application classes in the broader Internet. For example, video streaming services for collaboration or entertainment increasingly dominate WAN communication. These applications have well-understood bandwidth requirements with step functions in the additional utility from incremental bandwidth allocation. Consider that a 480P video stream may receive no incremental benefit from an additional 100 Kbps of bandwidth; only sufficient additional bandwidth to enable 720P streaming is useful. Finally, homes and businesses are trending toward multiple simultaneous video streams with known relative priority and incremental bandwidth utility, all sharing a single bottleneck with known capacity.
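Step-shaped requirements of this kind are a natural fit for the kind of bandwidth-function policy BwE allocates against. The sketch below is a hypothetical encoding of the video example as a step utility function; the bitrates and utility values are assumed for illustration and do not come from any BwE configuration:

    # Illustrative step "bandwidth function" for a video stream.
    # The (bandwidth, utility) steps are assumed example values.

    STEPS = [            # (required Mbps, utility gained at that step)
        (1.0, 10),       # enough for 480P
        (3.0, 20),       # enough for 720P
        (6.0, 25),       # enough for 1080P
    ]

    def utility(alloc_mbps):
        """Utility is flat between steps: extra bandwidth that does not
        reach the next step buys nothing, mirroring the 480P/720P example."""
        return sum(gain for need, gain in STEPS if alloc_mbps >= need)

    assert utility(1.1) == utility(2.9)   # a few hundred Kbps beyond 480P adds nothing
    assert utility(3.0) > utility(2.9)    # only reaching the 720P step helps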
Next, consider the move toward an Internet of Things [29], where hundreds of devices in a home or business may have varying wide-area communication requirements. These applications may range from home automation and security to health monitoring and backup. For instance, home security may have moderate bandwidth requirements but be of the highest priority. Remote backup may have substantial, sustained bandwidth requirements. However, the backup does not have a hard deadline and is largely insensitive to packet loss. Investigating BwE-based mechanisms for fair allocation based on an understanding of relative application utility in response to additional bandwidth is an interesting area of future work.

9. RELATED WORK

This paper focuses on allocating bandwidth among users in emerging multi-datacenter WAN environments. Given the generality of the problem, we necessarily build on a rich body of related efforts, including utility functions [8], weighted fair sharing [2, 11, 12, 8] and host-based admission control [9]. We extend existing Utility max-min approaches [8] for multi-path routing and hierarchical fairness.

Weighted queuing is one common bandwidth allocation paradigm. While a good starting point, we find that weights are insufficient for delivering user guarantees. Relative to weighted queuing, we focus on rate limiting based on demand estimation. BwE control is centralized and protocol agnostic, i.e., general to TCP and UDP. This is in contrast to DRL [25], which solves the problem via distributed rate control for TCP while not accounting for network capacity explicitly.

NetShare [20] and Seawall [30] also use weighted bandwidth allocation mechanisms. Seawall in particular achieves per-link proportional fairness. We have found max-min fairness to be more practical because it provides better isolation.
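For concreteness, the textbook water-filling computation below shows the single-link max-min fair allocation referred to here. It is a generic illustration with made-up demands, not BwE's multi-path, hierarchical allocator:

    # Textbook water-filling for max-min fair allocation on one link.
    # Demands and capacity are made-up example numbers.

    def max_min_fair(capacity, demands):
        """Return per-user allocations that are max-min fair on a single link."""
        alloc = {u: 0.0 for u in demands}
        active = set(demands)
        cap = capacity
        while active:
            share = cap / len(active)                     # split what is left evenly
            capped = {u for u in active if demands[u] <= share}
            if not capped:                                # everyone bottlenecked here
                for u in active:
                    alloc[u] = share
                break
            for u in capped:                              # satisfy small demands fully
                alloc[u] = demands[u]
                cap -= demands[u]
            active -= capped
        return alloc

    print(max_min_fair(10.0, {"A": 2.0, "B": 5.0, "C": 8.0}))
    # {'A': 2.0, 'B': 4.0, 'C': 4.0}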
Gatekeeper [26] also employs host-based hierarchical token buckets to share bandwidth among data center tenants, emphasizing work-conserving allocation. Gatekeeper however assumes a simplified topology for every tenant. BwE considers complex topologies, multi-path forwarding, centralized bandwidth allocation, and a range of flexible bandwidth allocation mechanisms. SecondNet [15] provides pair-wise bandwidth guarantees but requires accurate user-provided demands and is not work conserving.

Oktopus [4] proposes a datacenter tree topology with specified edge capacity. While suitable for datacenters, it is not a natural fit for the WAN, where user demands vary based on source-destination pairs. The BwE abstraction is more fine-grained, with associated implementation and configuration challenges. Oktopus also ties the problem of bandwidth allocation to VM placement. Our work however does not affect computation placement but rather takes the source of demand as fixed. We believe there are opportunities to apply such joint optimization to BwE. Datacenter bandwidth sharing efforts such as ElasticSwitch [24], FairCloud [23] and EyeQ [18] focus on a hose model for tenants. EyeQ uses ECN to detect congestion at the edge and assumes a congestion-free core. In contrast, our flow-wise bandwidth sharing model allows aggregation across users and is explicitly focused on a congested core.

Recent efforts such as SWAN [16] and B4 [17] are closely related but focus on the network and routing infrastructure to effectively scale and utilize emerging WAN environments. In particular, they focus on employing Software Defined Networking constructs for controlling and efficiently scaling the network. Our work is complementary and focuses on enforcing policies given an existing multipath routing configuration. Jointly optimizing network routing and bandwidth allocation is an area for future investigation. TEMPUS [19] focuses on optimizing network utilization by accounting for deadlines of long-lived flows. Our bandwidth sharing model applies to non-long-lived flows as well and does not require deadlines to be known ahead of time. Flow deadlines open up the possibility of further optimization (for example, by smoothing bandwidth allocation over a longer period of time), and that remains an area of future work for us.

Work in RSVP, Differentiated Services and Traffic Engineering [14, 27, 7, 22, 21, 1, 5] overlaps in terms of goals. However, these approaches are network centric, assuming that host control is not possible. In some sense, we take the opposite approach, considering an orthogonal hierarchical control infrastructure that leverages host-based demand measurement and enforcement.

Congestion Manager [3] is an inspiration for our work on BwE, enabling a range of flexible bandwidth allocation policies for individual flows based on an understanding of application requirements. However, Congestion Manager still manages bandwidth at the granularity of individual hosts, whereas we focus on the infrastructure and algorithms for bandwidth allocation in a large-scale distributed computing WAN environment.

10. CONCLUSIONS

In this paper, we present Bandwidth Enforcer (BwE), our mechanism for WAN bandwidth allocation. BwE allocates bandwidth to competing applications based on flexible policy configured by bandwidth functions. BwE supports hierarchical bandwidth allocation and delegation among services while simultaneously accounting for multi-path WAN communication.

Based on multiple years of production experience, we summarize a number of important benefits to our WAN. First, BwE provides isolation among competing services, delivering plentiful capacity in the common case while maintaining required capacity under failure and maintenance scenarios. Second, we provide a single point for specifying allocation policy to administrators. While pathing, RTT, and capacity can shift substantially, BwE continues to allocate bandwidth according to policy. Finally, BwE enables the WAN to run at higher levels of utilization than before. By tightly integrating new loss-insensitive file transfer protocols running at low priority with BwE, we run many of our WAN links at 90% utilization.

Acknowledgements

Many teams within Google collaborated towards the success of the BwE project. We would like to acknowledge the BwE development, test and operations groups including Aaron Racine, Alex Docauer, Alex Perry, Anand Kanagala, Andrew McGregor, Angus Lees, Deepak Nulu, Dmitri Nikulin, Eric Yan, Jon Zolla, Kai Song, Kirill Mendelev, Mahesh Kallahalla, Matthew Class, Michael Frumkin, Michael O'Reilly, Ming Xu, Mukta Gupta, Nan Hua, Nikhil Panpalia, Phong Chuong, Rajiv Ranjan, Richard Alimi, Sankalp Singh, Subbaiah Venkata, Vijay Chandramohan, Xijie Zeng, Yuanbo Zhu, Aamer Mahmood, Ben Treynor, Bikash Koley and Urs Hölzle for their significant contributions to the project. We would also like to thank our shepherd, Srikanth Kandula, and the anonymous SIGCOMM reviewers for their useful feedback.

11. REFERENCES

[1] Wikipedia: Differentiated services. http://en.wikipedia.org/wiki/Differentiated_services.

[2] Allalouf, M., and Shavitt, Y. Centralized and Distributed Algorithms for Routing and Weighted Max-Min Fair Bandwidth Allocation. IEEE/ACM Trans. Networking 16, 5 (2008), 1015–1024.
[3] Balakrishnan, H., Rahul, H. S., and Seshan, S. An integrated congestion management architecture for Internet hosts. In Proc. ACM SIGCOMM (1999), pp. 175–187.
[4] Ballani, H., Costa, P., Karagiannis, T., and Rowstron, A. Towards predictable datacenter networks. In SIGCOMM (2011).
[5] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W. An Architecture for Differentiated Service. RFC 2475 (Informational), December 1998. Updated by RFC 3260.
[6] Boudec, J.-Y. Rate adaptation, congestion control and fairness: A tutorial, 2000.
[7] Braden, R., Zhang, L., Berson, S., Herzog, S., and Jamin, S. Resource ReSerVation Protocol (RSVP) – Version 1 Functional Specification. RFC 2205 (Proposed Standard), September 1997. Updated by RFCs 2750, 3936, 4495.
[8] Cao, Z., and Zegura, E. W. Utility max-min: An application-oriented bandwidth allocation scheme. In INFOCOM (1999).
[9] Choi, B.-K., and Bettati, R. Endpoint admission control: network based approach. In Distributed Computing Systems, 2001. 21st International Conference on (Apr 2001), pp. 227–235.
[10] Clark, D. D., Lambert, M. L., and Zhang, L. NETBLT: A high throughput transport protocol. In Proceedings of the ACM Workshop on Frontiers in Computer Communications Technology (New York, NY, USA, 1988), SIGCOMM '87, ACM, pp. 353–359.
[11] Danna, E., Hassidim, A., Kaplan, H., Kumar, A., Mansour, Y., Raz, D., and Segalov, M. Upward Max Min Fairness. In INFOCOM (2012), pp. 837–845.
[12] Danna, E., Mandal, S., and Singh, A. A Practical Algorithm for Balancing the Max-min Fairness and Throughput Objectives in Traffic Engineering. In Proc. INFOCOM (March 2012), pp. 846–854.
[13] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107–113.
[14] Fortz, B., Rexford, J., and Thorup, M. Traffic Engineering with Traditional IP Routing Protocols. IEEE Communications Magazine 40 (2002), 118–124.
[15] Guo, C., Lu, G., Wang, H. J., Yang, S., Kong, C., Sun, P., Wu, W., and Zhang, Y. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In CoNEXT (2010).
[16] Hong, C.-Y., Kandula, S., Mahajan, R., Zhang, M., Gill, V., Nanduri, M., and Wattenhofer, R. Have Your Network and Use It Fully Too: Achieving High Utilization in Inter-Datacenter WANs. In Proc. SIGCOMM (August 2013).
[17] Jain, S., Kumar, A., Mandal, S., Ong, J., Poutievski, L., Singh, A., Venkata, S., Wanderer, J., Zhou, J., Zhu, M., Zolla, J., Hölzle, U., Stuart, S., and Vahdat, A. B4: Experience with a Globally-Deployed Software Defined WAN. In Proceedings of ACM SIGCOMM 2013 (2013), ACM, pp. 3–14.
[18] Jeyakumar, V., Alizadeh, M., Mazières, D., Prabhakar, B., Kim, C., and Greenberg, A. EyeQ: Practical network performance isolation at the edge. In Proc. of NSDI (2013), USENIX Association, pp. 297–312.
[19] Kandula, S., Menache, I., Schwartz, R., and Babbula, S. R. Calendaring for wide area networks. In Proc. SIGCOMM (August 2014).
[20] Lam, T., Radhakrishnan, S., Vahdat, A., and Varghese, G. NetShare: Virtualizing data center networks across services. Tech. rep., 2010.
[21] Minei, I., and Lucek, J. MPLS-Enabled Applications: Emerging Developments and New Technologies. Wiley Series on Communications Networking & Distributed Systems. Wiley, 2008.
[22] Osborne, E., and Simha, A. Traffic Engineering with MPLS (Paperback). Networking Technology Series. Cisco Press, 2002.
[23] Popa, L., Krishnamurthy, A., Ratnasamy, S., and Stoica, I. FairCloud: Sharing the network in cloud computing. In Proceedings of the 10th ACM Workshop on Hot Topics in Networks (New York, NY, USA, 2011), HotNets-X, ACM, pp. 22:1–22:6.
[24] Popa, L., Yalagandula, P., Banerjee, S., Mogul, J. C., Turner, Y., and Santos, J. R. ElasticSwitch: Practical work-conserving bandwidth guarantees for cloud computing. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (New York, NY, USA, 2013), SIGCOMM '13, ACM, pp. 351–362.
[25] Raghavan, B., Vishwanath, K., Ramabhadran, S., Yocum, K., and Snoeren, A. C. Cloud control with distributed rate limiting. In SIGCOMM (2007).
[26] Rodrigues, H., Santos, J., Turner, Y., Soares, P., and Guedes, D. Gatekeeper: Supporting bandwidth guarantees for multi-tenant datacenter networks. In Workshop on I/O Virtualization (2011).
[27] Roughan, M., Thorup, M., and Zhang, Y. Traffic Engineering with Estimated Traffic Matrices. In Proc. IMC (2003), pp. 248–258.
[28] Saltzer, J. H., Reed, D. P., and Clark, D. D. End-to-end arguments in system design. ACM Trans. Comput. Syst. 2, 4 (November 1984), 277–288.
[29] Sarma, S., Brock, D. L., and Ashton, K. The networked physical world – proposals for engineering the next generation of computing, commerce & automatic identification. White Paper, Auto-ID Center, MIT (2000).
[30] Shieh, A., Kandula, S., Greenberg, A., Kim, C., and Saha, B. Sharing the data center network. In NSDI (2011).
