How to Split a Flow ?

Tzvika Hartman∗, Avinatan Hassidim∗, Haim Kaplan∗†, Danny Raz∗‡, Michal Segalov∗ ∗Google, Inc. Israel R&D Center †Tel Aviv University ‡Technion, Israel tzvika, avinatan, haimk, razdan, msegalov @google.com { }

Abstract —Many practically deployed flow algorithms produce 4 4 2 2 a b c d the output as a set of values associated with the network links. 1 However, to actually deploy a flow in a network we often need to i e f g h represent it as a set of paths between the source and destination 1 1 2 2 nodes. In this paper we consider the problem of decomposing a flow into a small number of paths. We show that there is some fixed Fig. 1. A flow. We can decompose this flow into 4 flow-paths e,f,g,h of constant β > 1 such that it is NP-hard to find a decomposition value 1, a,b,g,h of value 1, a, b, i of value 1, and a, b, c, d of value 2. A in which the number of paths is larger than the optimal by smaller decomposition consists of the paths e,f,i of value 1, a,b,g,h of a factor of at most β. Furthermore, this holds even if arcs value 2, a, b, c, d of value 2. are associated only with three different flow values. We also show that straightforward greedy algorithms for the problem can produce much larger decompositions than the optimal one, on certain well tailored inputs. On the positive side we present a new they may differ substantially in the number of paths used, their that decomposes all but an -fraction of latency, and in other parameters. Our focus in this paper is on the flow into at most O(1/2) times the smallest possible number algorithms that decompose a flow into a minimum number of of paths. paths. (If we are given a multicommodity flow, we just apply We compare the decompositions produced by these algorithms the algorithm to each flow separately.) on real production networks and on synthetically generated data. Our results indicate that the dependency of the decomposition Minimizing the number of paths is desirable for many size on the fraction of flow covered is exponential. Hence, covering reasons. When engineering traffic in a network each path the last few percent of the flow may be costly, so if the application consumes entries in tables of the routers that it goes through. allows, it may be a good idea to decompose most but not all the Each path also requires periodic software maintenance to flow. The experiments also reveal the fact that while for realistic check its available bandwidth and other quality parameters data the greedy approach works very well, our novel algorithm which has a provable worst case guarantee, typically produces [16], [4]. Therefore, reducing the number of paths saves only slightly larger decompositions. resources and makes network management more efficient, both for automatic software and network operators. I.INTRODUCTION Evidently, the number of paths is not the only important Often when tackling network design and routing prob- quality criteria of the decomposition. Other parameters are lems, we obtain a flow (or a multicommodity flow) between also important. Among them are the maximum and average a source-sink pair (or pairs) by running a maximum flow latency of the paths, and the maximum number of paths going algorithm, a minimum cost flow algorithm, or solving a through a single link or a single node. Optimizing more than linear program that captures the problem.1 These algorithms one parameter is not naturally well defined, and thereby more represent and output a flow by associating a flow value with complicated. We focus on the number of paths as our main each arc. The value of each arc is at most the arc capacity, and objective criterion, but we do address other criteria in our in each node the total incoming flow equals the total outgoing experimental study. flow, except for the source and the sink. See Figure 1 for an Flow algorithms use the notion of residual graph which example. contains all nonsaturated arcs on which we can push more To deploy a flow in a network, say a traffic engineered flow. Initially the residual is identical to the original graph. MPLS network [7], [15], or an open-flow network [13], we An augmenting path based algorithm finds a directed path often need to represent the flow as the union of paths from from the source to the sink in the residual graph (called an the source to the sink, such that each path is coupled with the augmenting path), pushes the maximum possible flow along amount of flow that it carries. In general, there may be many it and updates the residual graph. It stops when there is no ways to represent a flow as a union of source-sink paths (see directed path from the source to the sink in the residual graph. Figure 1). While all these representations route the same flow Different algorithms choose different augmenting paths, and their running time is affected by the number of paths they 1Linear programming or even convex programming is typically the method of choice for more complicated multicommodity flow and network design use. problems. Why not take the paths produced by such an algorithm as our decomposition? There are two problems with this naive [18] does not indicate what is the complexity of our problem approach. First, the augmenting paths used by these algorithms in these cases, which are common in real networks. are paths in the residual graph which contains arcs that are not We show that even when there are only three different flow in the flow, therefore these paths may not be contained in the values on the arcs (and even if these values are only 1, 2, or original flow. See Figure 2 for such an example. Second, the 4) then the problem is NP-hard. Furthermore, our reduction number of these augmenting paths may be large, since even also shows that it is hard to approximate the problem better the fastest augmenting path algorithms use a quadratic (in the than some fixed constant, i.e. there is some fixed constant β size of the graph) number of paths. such that no algorithm can guarantee a decomposition with less than β OPT paths in polynomial time unless P = NP . e f · d g In contrast, we show that if there are only two flow values x a b c s t and y, such that x divides y, then there is a simple algorithm that finds an optimal decomposition. We implemented the greedy algorithms and our new ap- p1 p2 proximation algorithms and compared their performance on e f g d data extracted from Google’s backbone network and on syn- a b c s 0 0 0 t thetically generated layered networks, representing intra data center networks. For the Google network, we used a dataset consisting p1 p2 of a few hundred demands observed at a particular time Fig. 2. In this flow f the capacities of all the arcs are 1. The dashed edges period. For each such a demand, we generated a fractional represent long paths. Assume that we run an augmenting path maximum flow multicommodity flow using an LP solver. We decomposed algorithm, like, for example the algorithm of Edmonds and Karp on G(f). The first augmenting path can be a, b, c. (Augmenting path algorithm augment various fractions of each of the flows using these algorithms flow on shortest residual paths.) After pushing one unit of flow on a, b, c we and compared the size of the decomposition they produce and obtain the residual graph at the bottom of the figure. The next augmenting the average and maximum latency of the paths. Our findings path is d, e, b0,f,g which uses the arc b0 which is opposite to b and is not in the original flow. The last augmenting path would be the concatenation of are as follows: p1, b and p2. 1) The greedy width algorithm produces the most compact decompositions on the data we used, with the bicriteria width algorithm lagging behind by a few percent. This A. Our Results indicates that flows on which the greedy width algorithm We consider two natural greedy algorithms for the problem, decomposes badly are unlikely to occur in real data. also studied by [18]. One picks the widest flow-path, removes The greedy length and the bicriteria length algorithms it and recurses, we call it greedy width. The other picks a perform 10-20% worse than their width counterparts. shortest (by latency) path, removes it and recurse, we call it To eliminate the risk of getting a bad decomposition greedy length. while using the greedy width algorithm it may make We show that in the worst case these greedy algorithms sense to run both the greedy width algorithm and our produce a large decomposition compared to the optimal one. new bicirteria width algorithm and select the smaller Specifically, the number of paths which they produce could be decomposition. as large as Ω(√m)OPT where m is the number of arcs and 2) The maximum and average latency of the paths produced OPT is the size of the optimal decomposition. Furthermore, by the greedy width and the bicriteria width algo- this bad performance occurs even if we want to decompose rithms were comparable. Somewhat counter-intuitively only a constant fraction of the flow. the maximum and average latency of the greedy length We give a new algorithm similar to the greedy algorithms and the bicriteria length algorithms were slightly larger. mentioned above, and prove that for any  it decomposes This is a side affect of the fact that the decomposition OP T (1 ) fraction of the flow into at most O( 2 ) paths. produced by these algorithms were considerably larger −  In particular, for an appropriate choice of parameters, it can This shows that on the data we tested, optimizing the decompose 1/3 of the flow into at most OPT paths (see number of paths does not sacrifice latency substantially. Section V). We consider two versions of this algorithm, one 3) The number of paths required increases exponentially which we call the bicriteria width algorithm which is similar with the fraction of the flow we decompose (again on to the greedy width algorithm and the other which we call the tested data sets). the bicriteria length algorithm which is similar to the greedy length algorithm. B. Related work It is known that the problem of decomposing a flow into the Vatinlen et al. considered exactly the same problem as we minimum number of paths is strongly NP-hard by a reduction do, i.e. minimizing the number of paths. They gave an example from 3-partition [18]. But notice that the 3-partition problem showing that counter-intuitively, by eliminating cycles from is easy if all the flow values on the arcs are powers of 2, or if the flow, the size of the optimal solution can increase. They there are constantly many different values. So the reduction of defined a decomposition to be saturating if we can obtain it by taking a source-sink path, pushing as much flow as we can of paths. In contrast, recall, that in our setting the flow is along it, removing it from the flow and repeating this process. given as the input and we cannot change it. Kleinberg studied In particular the greedy width and the greedy length algorithms a single-source version of the problem where we are given a that we defined above produce saturating decompositions. single source and many terminals each with demand associated Vatinlen et al. show that there may not be a saturating with it. The problem is to decide whether the demand to optimal solution. They also show that the size of a saturating each terminal can be routed on a single path so that capacity decomposition is at most m n +2 where m is the number constraints are satisfied. Kleinberg gave approximation algo- of arcs and n is the number− of nodes,2 and consists of O(n) rithms for some optimization versions of this problem. Many more paths than OPT. variations of the problem were introduced since Kleinberg’s Vatinlen et al. also defined and implemented the greedy work, including unsplittable multicommodity flow, variants width and the greedy length algorithms. They compared their in which we allow more than one path per commodity, and performance on random networks and showed that for large varying optimization criteria. (for recent work see [12], [2] enough networks with relatively large average degree the and the references there) greedy width is slightly better (< 3%), and they both were Mirrokni et al. [14] studied a network planning problem close to the upper bound of m n +2. in which one goal is to minimize the number of paths in Hendel and Kubiak [9] in an− unpublished manuscript con- multi-path routing setting such as an MPLS network. They sider a similar problem in which the input flow is acyclic, formulate the problem as a multicommodity flow problem and there is a nonnegative cost associated with each arc, and the derive from the formulation that an optimal solution can be goal is to find a decomposition in which the maximum cost decomposed into at most m+k paths, where k is the number of of a path is minimum. The networking interpretation of this commodities and m is the number of routes. They further try to objective is to minimize the maximum latency of a path in reduce the number of routes by using an the decomposition. They give several results classifying the formulations. They show that their approach is effective on complexity of this problem (for details see their manuscript). various simulated topologies. The problem considered by Hendel and Kubiak is easier if the lengths of all arcs are the same. In this case we II. PRELIMINARIES want to minimize the number of hops of the path with The input to our algorithm is an st-flow f represented as a the largest number of hops in the decomposition. One can graph G(f) with two special vertices s and t called the source construct a polynomial algorithm for this problem using linear and the sink, respectively. Each arc e has a nonnegative flow- programming. We write an LP for the maximum possible flow value f(e) associated with it. For every vertex v other than s on paths of at most k hops. This LP has a variable for each and t the sum of the flow-values on the arcs incoming to v path of at most k hops so it may have exponentially many equals the sum of the flow values on the arcs outgoing from variables. The dual of this LP has a constraint per path of at v. The total flow on arcs outgoing from s minus the total flow most k hops which requires that the sum of the dual variables on arcs incoming to s is the value of f. This is equal to the associated with the arcs on the path is at least 1. There is a total flow incoming to t minus the total flow outgoing from t. polynomial separation oracle for this dual so we can solve it A flow-path in f is a path p from s to t in G(f) together using the ellipsoid algorithm [10], [8]. To find the smallest k with a value which is smaller than the flow-value on all the such that all the given flow can be covered by paths with at arcs along p. most k hops we perform binary search on k solving an LP as A decomposition of an st-flow f is a collection of flow- above in each iteration of the search. paths with the following property: If we take the union of Another possible objective is to minimize the average these paths, and associate with each arc e in this union the weighted latency of the paths in the decomposition (where the sum of the values of the paths containing e we get an st-flow 0 0 weight of each path is the amount of flow it covers). However, f . The flow f is contained in f and its value is equal to the 0 it is not hard to see that all the decompositions of a given flow value of f.3 Note that the difference between f and f is a have the same average weighted latency. This average equals collection of flow cycles and if f is acyclic then f must be 0 to the sum over the arcs of the latency of the arc times its equal to f . flow value. Our goal is to find a decomposition of f with the smallest Minimizing the nonweighted average latency and the maxi- possible number of paths. We point out that removing cycles mum number of paths through each vertex are also interesting from f may change the size of the smallest solution as shown objectives which we do not consider in this paper and as far in [18]. as we know have not been addressed. Let G be a graph with capacity c(e) associated with each A problem related to ours is the maximum unsplittable flow arc e and two special vertices s and t. An st-flow in G is an problem introduced by Kleinberg [11]. In this problem we try st-flow f such that G(f) is a subgraph of G and for each arc to find a flow which can be decomposed into a small number e, f(e) c(e). A maximum st-flow in G in an st-flow in G with maximum≤ value. 2In fact they showed this for a more general family of independent decompositions, where the incidence vectors of the paths, in the space where 3A flow f 0 is contained in f if each arc in f 0 is also in f and its flow-value each coordinate is an arc, are independent. in f 0 is smaller than its flow-value in f. To simplify the presentation, we will use flow instead of decompose a flow. Each such algorithm finds the best flow- st-flow in the rest of the paper. path according to some criteria, add it to the decomposition, We will use the following basic lemma. The proof is straight removes it from the flow and continue decomposing the forward and omitted. remaining flow. Maybe the most natural among these greedy algorithms are Lemma II.1. Let G be a graph in which all capacities are the greedy length algorithm and the greedy width algorithm. multiples of x, and let f be a maximum flow in G of value F . Let G(f) be the graph of the current flow f. (Note that G(f) Then F is a multiple of x and we can decompose f into exactly changes as we subtract flow-paths that we accumulate in our F/x paths each routing x flow. Furthermore, any collection decomposition.) In the greedy length we find a shortest path of paths each routing x flow can be completed to such a from s to the t in G(f), according to some latency measure decomposition of the flow. on the arcs such as RTT (Round Trip Time). We route the III. TWO VALUE CAPACITIES maximum possible flow along this path, so that when we remove this path, at least one arc is completely removed from A simpler version of the flow decomposition problem is the flow. when all the capacities in have one of two possible values, G The greedy width finds the “thickest” path from s to t in and , such that divides , i.e. for some integer x y x y y = kx G(f), which is a path that can carry more flow than any other . k> 1 path from s and t. Again, we route the maximum possible flow We can find an optimal decomposition in this case with along this path, so that at least one arc is completely removed the following algorithm. We first form the subgraph Gy of from the flow when we add the path to the decomposition. G (which is not necessarily a flow) induced by all arcs of We implement greedy length and the greedy width by capacity y. We find a maximum flow Fy from s to t in Gy. running a version of Dijskstra’s single source shortest path We decompose Fy into paths of value y, and add these paths algorithm in each iteration [5]. For the greedy length we run into our decomposition. This is possible by Lemma II.1. Then 0 the standard version of Dijskstra’s algorithm using RTTs as we subtract Fy from G and get a flow G . Notice that since the weights of the arcs. With the greedy width we need a x divides y all capacities in G0 are multiples of x. So using 0 modified version of Dijskstra for the bottleneck shortest path Lemma II.1 again we decompose G into paths of flow value problem. Here the weight of each arc is the flow that it carries and add these paths into our decomposition. We now prove x in G(f). The modified version of Dijskstra for the bottleneck that this algorithm indeed find the optimal solution. uses the width of the thickest flow-path Theorem III.1. The algorithm described above produces an from s to v as the key of v in the heap (rather than the length optimal solution. of this path in the standard version). Note that asymptotically faster algorithms for the bottleneck shortest path problem are Proof: Let F1 be the part of F that OPT routes on paths known [6] and also for the standard single source shortest of value > x. Note that each such path routes at most y flow. path problem when we assume integer weights [17]. These So F F of the flow OPT routes on paths of value < x. It − 1 algorithms are more complex and may be practical only for follows that very large graphs. F1 F F1 OPT + − (1) Both greedy approaches remove at least one arc from G(f) ≥ y x for each path they pick, so they will stop after at most m To minimize the right hand side of Equation (1) we want iterations (where m is the number of arcs in the flow we started to have F1 as large as possible. However since all paths of out with). Furthermore in the decomposition they produce we value > x that OPT uses must be in the subgraph induced have at most m paths. by the arcs of flow value y, we get that F1 cannot be larger than the maximum flow in that subgraph. By the definition of A. The worst case performance of the greedy algorithms the algorithm this maximum flow equals to #(y) y, where The next intriguing question that one should understand · #(y) is the number of paths of value y used by the algorithm. before using these greedy algorithms is how large could Substituting this into equation 1 we get that be the decomposition which they produce compared to the optimal one. We note that the greedy length may produce F1 F F1 OPT + − a decomposition with paths of rather small latency and we ≥ y x ≥ will consider this in Section VII. But our main objective in #(y) y F #(y) y · + − · = #(y) + #(x)= ALG this paper is to minimize the number of flow-paths so we y x first consider the performance of the greedy algorithms with which concludes the proof. respect to the number of paths which they produce. Lets look at the example shown in Figure 3. In this flow IV. GREEDY APPROACHES there are only 3 possible flow values on the arcs which are x, It is easy to verify that when we take a flow-path and 2x and 1. There are k +1 arcs of flow 2x, namely a a , i → i+1 subtract it from the flow the result is still a flow. Based on for 1 i k 1, s a1, and ak t, that together form this fact one can come up with many greedy algorithms to a path≤ from≤ s to− t. There→ are k/2 arcs→ from s to every node . . . 3 3 e 3 2 3 ≤ in in / / he / / 2 2 0 2 t 2 e t t 3 F/ / / paths G ) 2 2 to t e e ( 3 s c / . Dividing 2 paths that by a factor , so we get t +1) 3 3 / 3 3 k that we have such that the / 2 / units of flow. ( units of flow. to the largest even if greedy 2 t so by Lemma 2 . 3 t . We claim that F/ 3 ) 3 3 is small relative 3 is at least G ≤ 3 3 kt / d / / OPT / / 2 m units of flow are t 2 F/ 2 2 x 2 0 2 t arcs on which the F/ + 1) 2 t G / 2 so G √ G the algorithm which 3 G ) )= k in the resulting graph e we know that in the ( 3 e m as in ( Ω( F ( be the capacity of in we have that F/ with capacities divided 3 0 paths each routing c 3 t 3 ≤ c ) F/ e / of k / 3 is also a multiple of 3 e 2 , ) 2 F/ / / to ( t 3 e 2 ≥ 2 3 c G ≤ G / ( first and thereby produce a s / G 0 0 2 c PPROXIMATION G we know that there is some 0 2 /t 2 containing all arcs of capacity 0 . Since our algorithm uses at F / x G A of value G into at most 3 ≤ at least e 2 F / G 3 route at least . Let 2 f 3 units of flow. / f in 3 f / 0 2 / G and call the resulting graph 2 + 1) 0 3 t 2 F 3 k 3 F kt G ( / F/ 2 , F/ t in 1 . CRITERIA be the maximum value of 3 - , and the claim follows. e be the capacity of I ≥ ) . ≥ d 3 e ) 3 paths the theorem follows. There are flows / F/ k ( / e 2 0 Given a flow e ( 0 2 . This would imply that we can route the flow t c such that that we routed in 0 The value of the maximum flow from 3 2 V. B c G / By the definition of / 1 3 2 If we scale down the capacities in ≤ all capacities are multiples of ) t OPT e be the subgraph of 2 3 ≥ 3 ( F/ . Let / by its definition. / t c t ) k 0 2 F/ 3 e G is at least and let / d ( ≥ G also in . But by the definition of Proof: Proof: then we obtain a flow of value 2 c 3 3 3 ) 2 / / / G 2 e From the definition of Consider an arc In If the latency of the arcs of flow-value Let We round down all the capacities in 2 2 0 2 ( 0 integer the upper bound by two we obtain that that Since for every kt c by most G of value optimal decomposition of routed using pathsTherefore each carrying at most in simply by scaling down the flow of value and can be decomposedunits into of flow. Byof Lemma the V.1, decomposition of So we get the following theorem. Theorem V.2. we described decomposes together route at least of II.1 the maximum flow length is required to decompose only a constant fraction of t width is required to decomposeflow, only for a any constant fixed fraction constant. of th to the latency ofalso the pick other the arcslarge then path the decomposition of greedy of width lengthexample would in this which example. thegreedy latency We length of fails have all infollowing a arcs a theorem. different is similar the fashion. So sameTheorem we but the also IV.2. have the approximation ratio of greedy length is flow. at least value of the maximum flow from possible multiple of We have the following lemma. Lemma V.1. To simplify notation we refer to G 2 x w → . with each 1 x kx/ using OPT a t i unit of · 2 t example , we get a ) 1 using the 2 → → k 2+2 → t s we also get k/ → i +1 units of flow Ω( i units of flow 2 1 even if greedy x x → 2 kx/ a x − a ) . One can easily x i i 2 arcs (and vertices 2 ... m → a arcs on which the → a ) 2 √ paths to decompose 2 i 1 a . Finally, there are 2 paths. It will make a k to − t m a i Ω( 1 2 paths. → to a of − , each carrying → i 1 i i 2 OPT = Θ( a 2 2 s 2+1 a G +1 · → which is half the maximum a a ) m x → t s 1 1 k x kx/ × s × and x → Ω( x 2+ 1 2 1 paths − x k/ i a a ... 2 paths 2 2 ..... units of flow on the path a going from s t x x k 4 x x x → 2 x 3 k/ x a 2 a 2 2 1 a 2 to some value larger than say, 2 1 x a units of flow of the arcs x There are flows . This can be done by routing arcs going from paths. x → 2 x 2 +1 s k/ x . Now, it will have to route the remaining . So even if we compare the number of paths that t kx/ 1 ; and finally, 2+ → x , and 2 k/ 1 − ... The greedy width will use Furthermore, with this rather large value of Now if we set The minimal number of paths this flow can be decomposed i 2 2 Fig. 3. The performance of the is bad on this flow. Altogether, we get mistake and route a the flow. The resulting flowif has we eliminate the parallel arcs by subdividingthat them). most of theof flow value is routed by the greedy width on paths greedy width needs to carry only a constant fraction of the flo parallel arcs between units of flow on paths of the form that the greedy width uses for any fixedpaths. constant The then following we theorem get summarizes that thisTheorem result. it IV.1. uses approximation ratio of greedy width is carrying only one unitwill of take flow. So routing the remaining flow the remaining to is verify that this is indeed a legal flow of value on the path parallel arcs of flow possible flow weon can each send one along of this the path; flow a We can generalize this result as follows. Proof: For a variable x let o(x) be the number of occur- 0 Let t1− be the maximum value of t such that the value of rences of the literal x, let o (x) be the number of occurrences 0 the maximum flow from s to t in G is at least (1 )F . To of x, and let om(x) = max o(x), o (x) . t − { } simplify notation we refer to Gt1− as G1−. Given an instance Φ of 3SAT with a set Z of N variables We introduce a new constant 0 δ 1 and round down and a set C of M clauses we construct a flow F with ≤ ≤ all the capacities in G1− to the largest possible multiple of n = M +2+ x∈Z(3+4om(x)) vertices and m = 4M + δ δt1− and call the resulting graph G1−. The following lemma x∈Z (15om(xP)+12) arcs such that there is a decomposition generalizes Lemma V.1 (which is the case where δ = 1 and Pof F into x∈vars 5+3om(x) paths if and only if Φ is  =1/3). satisfiable. P For each variable x we construct the gadget shown in Figure Lemma V.3. The value of the maximum flow from s to t in δ 1− 4. We have two vertices s(x) and t(x) which we connect with G1− is at least 1+ F . δ two parallel paths each containing om(x) pairs of nodes ui(x) 0 0 The proof follows the same lines of V.1 and is omitted. and ui(x), 1 i om(x) in one path and wi(x) and wi(x), δ ≤ ≤ In G1− all capacities are multiples of δt1− so by Lemma 1 i om(x) in the other path. There is an arc from s(x) 0 δ ≤ ≤ 0 0 II.1 the maximum flow F in G1− is also a multiple of δt1−. to u1(x), from ui(x) to ui(x) and from ui(x) to ui+1(x) 0 0 0 By Lemma II.1 we can decompose F into F /δt − paths for every 1 i om(x) 1, and from u (x) to t(x). 1  ≤ ≤ − om(x) each routing δt1− flow. Furthermore, any set of paths each Similarly, there is an arc from s(x) to w1(x), from wi(x) to 0 0 routing δt1− flow can be completed by additional paths each wi(x) and from wi(x) to wi+1(x) for every 1 i om(x) 1, 0 0 ≤ ≤ − routing δt1− flow so that all-together we route F flow. By and from wM (x) to t(x). 0 1− (1−)F Lemma V.3, F F so by using paths of The source s is connected to all vertices s(xj ), for each ≥ 1+δ d (1+δ)δt1− e 0 (1−)F variable xj , 1 j N, and all vertices t(xj ), 1 j N the decomposition of F we route at least (1+δ) flow. The ≤ ≤ ≤ ≤ following theorem summarizes the properties of our algorithm. are connected to t. The flow on all arcs we specified so far is 4, and these would be the only arcs with flow 4 in F . δ 1− (1−)F Lemma V.4. In G − we can route F flow in In addition we have four arcs with flow 1 from s to s(x ), 1  1+δ d (1+δ)δt1− e j paths each routing δt1−. Furthermore, starting with any set of 1 j N, and from t(xj ), 1 j N to t. We also have a δ ≤ ≤ ≤ ≤ 0 paths in G1− each routing δt1− flow we can find additional pair of parallel arcs with flow 1 from ui(x) to ui(x) and from 0 paths each routing δt1− flow so that all paths together route wi(x) to wi(x) for 1 i om(x). 1− ≤ ≤ 1+δ F flow. For each variable x we have an additional vertex a(x). We connect to with arcs of flow and The following theorem follows from Lemma V.4. s a(x) om(x) 2 2om(x) arcs of flow 1. We also connect a(x) to ui(x) and wi(x) for Theorem V.5. Given a flow f of value F we can decompose 1 i om(x) with an arc of flow 2. 1+ ≤ ≤ f into at most OPT paths. Each path routes δt1− For each clause c we have a vertex which we also call c. d (1+δ)δ e · (1−)F Each such vertex c is connected to t by four arcs, two of flow units of flow and together they route at least 1+δ units of flow. 1 and two of flow 2. Last we have to specify the connection between the variable Proof: By the definition of G1− we know that in the 0 gadget and the clause vertices. We connect each vertex ui(x) optimal decomposition of f at least F units of flow are routed for 1 i o(x) to a clause vertex corresponding to a clause ≤ ≤ using paths each carrying at most t1− units of flow. Therefore containing x. We make these connections such that each of OPT F/t1− . By Lemma V.4 our algorithm uses at most these clause vertices is connected to a single vertex u0 (x). (1−≥d)F e i paths therefore the ratio between the number of 0 0 1− Similarly, we connect each vertex w (x) for 1 i o (x) to a d (1+δ)δt  e i paths that we use and the optimal number of paths is clause vertex corresponding to a clause containing≤ ≤x, such that (1−)F (1−)F each of these clause vertices is connected to a single vertex (1+δ)δt1− (1+δ)δt1− 0 0 d e = d e wi(x). If o(x) < om(x) we connect each vertex ui(x) for F/t − F/t − ≤ 0 d 1 e d 1 e o(x)+1 i om(x). If o (x) < om(x) we connect each vertex w0(≤x) for≤ o0(x)+1 i o (x) to t. All arcs defined (1−) F i ≤ ≤ m (1+δ)δ t1− (1 ) in this paragraph have flow 2. l m ·d e = − ≤ F/t1− (1 + δ)δ  We now show that Φ is satisfiable if and only if we can d e decompose F to (5+3o (x)) paths. as required. x∈vars m We first assumeP that Φ is satisfiable and show a decom- VI. HARDNESS RESULTS position of F into x∈vars(5+3om(x)) paths. Consider an assignment that satisfies Φ. In this section we prove the following strong hardness result. P Each path in our decomposition saturates an edge from s.

Theorem VI.1. Let F be a flow such that on each arc the The number of these arcs is exactly x∈vars 5+3om(x). For flow value is either 1, 2 or 4, and let k be an integer. Then it a variable x which equals 1 there wouldP be a pair of paths of is NP-complete to decide if there exists a decomposition of F value 1 to each clause vertex containing the literal x, and a into at most k paths. path of value 2 to each clause vertex containing the literal x. , t ) ) ) ) ) ) ) ) g 2 ) x x x x x x x x x ( ( ( ( ( ( ( ( and ( 0 0 i i i such o a 0 i from . We m m paths o ) u u w ) u o 2 0 x − of flow ) x to − ( then we to ( to ) i to x ) < o ≤ ) ( s x , and then ) m 0 ) ) w x and i x ) ) ( ) , for every ( o o ( x x  > x . x x 0 i ) x ( ( x ( 2 m ( ( ≤ to 4 ( i m u x 0 a o 0 i a of size at most ( m o ≤ u o ) , and then to a 1 i from reaching it each w ) to x , and four paths , u i F ( x ) ) 2 1 ) ( . If a x < o x 0 i ) x to from ( ( ≤ and from ( t 0 i x w ) ) i to paths of value 1 ( ) u . Each of these paths x 0 ESULTS x 1 u we take . ) ) ( x s ( to o ) R ( o x ) x a i and x ) ( ( ) it uses an arc of flow x ≤ ( and arriving to a clause vertex o x w ( )+1 x ) to m i ( 0 m ) x then we take x 2 o m . If from s ( o ( x to outgoing from the vertex. m ) ) ≤ i ( o o 2 i x ) , and if the path goes through a . Then there is an x 2 ≤ ≤ . Each of these paths uses an arc w 1 ( ( < o 4 4 u x , and then to a clause containing ) o paths of value i ≤ ( i ) outgoing from this vertex. Since x ) outgoing from this clause vertex. m for from to a we take every clause has at least one literal x ( i to ) for x or ≤ ( paths of value 2 2 ≤ t ( ) 2 ≤ x 0 i m ) x be a flow such that on each arc the to ) Φ ( o x ≤ 2 < o o paths of value u i x ) )) o ( x ( =0 s , XPERIMENTAL ) ( 1 2 x x a of flow F s 0 ≤ 1 ( ( x )) ≤ x , . If o , and then to a clause vertex containing ) ( o o i and x ) ) 0 ) to to x ( 1 o x x x ) ( − Let o ≤ ( from s ( 0 ( i s 0 x ) i 0 i paths of value o ( . This literal corresponds to two paths of value , for w 2 VII. E x − u i w ) t , for 1 . If ( t ≤ ) u . ) we take x , for to m )+1 ( x x i to from from x 0 o ( ) x ( , and then to and the other two use the arc of flow to 0 and and o ( ) ) x 1 1 0 o ≤ m 2( ) ( 1 x x ) o ) = 1 o i OPT ( . Two of these paths use the arcs from ( x 1 x 0 x ) i 0 i ≤ ) ( , and then to a clause vertex containing w ( (  2( x ) u a i x i i w for ( x u w t ( t To show that this decomposition is well defined we have We omit the other direction in this version due to lack of Symmetrically, if We conducted experiments on two types of networks. The Google’s backbone network is one of the largest networks A variation of the proof of Theorem VI.1 using a reduction If arriving to the clause vertex. Therefore, the correspondin to ≤ 0 i , for ) it uses an arc of flow , and if the path goes through a clause vertex (which must x and to argue that eachcan path use of value an arc of flow clause vertex has at mostusing two one paths of of value the arcs of flow space. the assignment satisfies which equals 1 of value s additional paths of value we take that it is NP-hard to find a decomposition of first one issecond the one is Google aresembles backbone synthetically the production generated topology layeredcluster network. inside of network The a interconnected tha data machines). center (which isin a the big world today, recently mentioned as the second largest to clause vertex (which mustx correspond to a clause containing from a variantgives of the following MAX-3-SAT stronger that result. isTheorem hard VI.2. to approximate flow value is either (1 + to of flow 1 from to paths of value to and then to the literal ships two units of flow on an arc from take w and also take 4 correspond to a clauseoutgoing containing from thisof clause value vertex. We alsoclause vertex take containing . , , ) ) ) x ne x x x ( ( ( s 0 i t . Arcs t .1. Arc w , and a to to ming arcs x ) s . x and t ( . Arcs outgoing ) s m nk x 2 2 2 o ( from i 1

w 4 ≤

w 4 4 4 4 4 ... i to 0 2 0 1 1 1 ≤ 2 1 w m ) w o m w 1 1 0 1 o x 1 w ( w 1 , s ∗ ) 4 4 2 ) x 1 2 m 2 to ( o 1 0 i there would be a pair of paths (2 1 we take a single path of value s 2 1 u ) 0 x 1 1 1 ( 2 4 ) 2 t ∗ 4 x 2 =1 and . ( m from 2 t o a and the other two use the arc of flow 2 ) x 2 1 4 x 1 4 2 2 ( . Two of these paths use the arcs from if are outgoing of the global source i ) m m 0 o 1 1 1 ) 2 o u x x u 0 1 0 2 1 1 u u x 1 ( u u , either enter a clause gadget or enter the sink ( t and therefore at least one pair of paths of 0 i to ... a w 4 which equals 4 1 4 to each clause vertex containing the literal 4 1 ) to of flow 4 u x x and enter the sink ) 2 ) ( 2 2 and 2 ) x ) x we take a single path of value s ( x 0 i ( x 0 ( i to each clause vertex containing the literal ( u t s m to w 1 o entering it. Therefore we can route these paths through s = 0 to ≤ 1 x ) i x ( For each variable from ≤ i . If 1 w 4 of value 4 the clause vertices without splitting them. For a variable literal which is value and four paths of value arrive from the variable gadgets. Outgoing arcs enter the si Since the assignment is satisfiable each clause has at least o outgoing of Fig. 5. A clause gadget in the reduction of Theorem VI.1. Inco path of value Fig. 4. A variable gadget in the reduction proving Theorem VI incoming to of the nodes ISP backbone in terms of overall traffic in Arbors list of the topology [1], where the number of the layers depends on the top 10 carriers of Internet traffic4. size of the cluster. Our first set of experiments were conducted on this network The edge-tier consists of the machines themselves, each using a real topology and the actual set of pop-level high node (leaf) in this layer corresponds to a rack with a few priority demands. We used a linear program (LP) solver with dozens machines. The next layer is an aggregation layer the objective of maximizing overall throughput to generate consisting of switches where each rack is connected to a several hundred flows between the source-destination pairs. constant number d1 of switches in the aggregation layer and This models a traffic engineering setup in an MPLS network each switch of the aggregation layer is connected to a constant (as many large backbone networks are) where an ISP is number d of racks. Typically d d , so the number 2 1  2 seeking to optimize network utilization. of switches in the aggregation layer is d1/d2 fraction of The LP solver outputs for each commodity the amount of the number of racks. There may be several such layers and flow through each arc in the network. As discussed before, we the last layer is the core or the root layer. It consists of a need to decompose these flows into distinct paths which then similar number of switches as the aggregation layers, and can be set as LSPs (Labeled Switched Paths) in the relevant interconnects switches in the previous layer. Data-centers in routers. Routers have constraints on the number of LSPs which the same geographic location are further connected using a they can support, both due to table size restrictions in the similar tree-like layered network. router hardware, as well as CPU load when updating the tables Several routing strategies have been suggested for data as paths change. centers including a traffic engineering approach based on open We decomposed the flows obtained from the LP into paths flow [3]. The topology of the data center forces each flow from using 4 algorithms: Greedy width, greedy length, bicriteria a leaf node in one data-center to a leaf node in the same cluster length and bicriteria width. Specifically, we decomposed (1 to look as a layered network as well. Each layer of the flow ) of each flow separately for  ranging from 0.5 to 0.001, and− consists of the switches that the flow goes through either on compared the aggregated (sum of all flows) number of paths the way up from the source leaf, or on its way down to the used by each algorithm to decompose all the flows. destination leaf. Our results show that on this input, greedy width finds the To test the decomposition algorithms in this setting, we minimum number of paths amongst the algorithms compared. generated a network that consists of 10 layers, each containing Bicriteria width achieves similar results. Bicriteria length lags 5 switches. Two adjacent layers are connected as a full behind and the greedy length finds almost twice as many paths, bipartite graph. All arcs between adjacent layers have the depending on . See the left plot in Figure 7. same latency. To generate a specific flow between 2 racks we For all the algorithms, the number of paths increases expo- combined 100 paths that connect the source and the destination nentially with the fraction of flow that they cover. To route each having a uniformly distributed flow value into a single 70% of the flows, greedy width uses an average of 3.34 paths. flow. This ensures us that an optimal decomposition uses at To cover 90% it needs an average of 5.8 paths, and to cover most 100 paths. 99.99% of the flows almost 12 paths are used on the average The results for this scenario were similar to the results for per demand. The bicriteria algorithms behave similarly. The the Google backbone network. The right plot in Figure 7 shows number of paths used by the greedy length also increase with the average number of paths found by the different algorithms the fraction of flow covered, but less rapidly. This can be due to for 10 random layered networks as above. Again, the greedy the fact that this is the only algorithm that ignores the width of width found the smallest number of paths and bicriteria width the paths completely (bicriteria length decomposes a subgraph performed slightly worse. Greedy length and bicriteria length of the original flow graph consisting of wide arcs). performed 3 to 6 times worse than greedy width, depending The maximum and average latency of the decompositions on . The bad behavior of greedy length and bicriteria length found by each of the algorithms are depicted in Figure 6 . in this case is due to the fact that all the source-destination We found that the latencies for all algorithms are comparable. paths have the same latency. This causes greedy length and When looking at the average latencies, the greedy length and bicriteria length to effectively choose paths arbitrarily when bicriteria length perform slightly worse than the greedy width decomposing this flow. and bicriteria width. This seems counter intuitive, but can be For greedy width and bicriteria width, similarly to the explained by the fact that the number of paths found by the Google backbone network, we see an exponential increase greedy width and bicriteria width was smaller than the number in the number of paths with the fraction of the flow routed. of paths found by the greedy length and bicriteria length. This increase is less rapid and closer to linear for the greedy The second network we tested models an intra cluster topol- length and bicriteria length, which is again a side effect ogy. The topology inside a data-center is typically organized of the arbitrary choice of paths of these algorithms when as a tree-like layered network to allows one to can get a decomposing this flow. large scale communication network. Common architectures We also notice that the number of paths found by the bicri- today consist of several layers of switches forming a tree-like teria algorithms does not necessarily monotonically increase as a function of the amount of flow routed. For instance, the 4See http://asert.arbornetworks.com/2010/03/how-big-is-google/. bicriteria width algorithm routes 85% of the flow using 40 1 1

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6

0.5 0.5

0.4 0.4 fraction of flows fraction of flows

0.3 0.3

0.2 0.2 greedy-width greedy-width 0.1 greedy-length 0.1 greedy-length bicriteria-width bicriteria-width bicriteria-length bicriteria-length 0 0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 average latency max latency

Fig. 6. Cumulative distribution of maximum and average latency of the paths that cover 70% of the flows in the Google network.

12 200 greedy-width greedy-width 11 greedy-length 180 greedy-length bicriteria-width bicriteria-width 10 bicriteria-length 160 bicriteria-length 9 140 8 120 7 100 6 80 number of paths number of paths 5 60 4 40 3 20 0.75 0.8 0.85 0.9 0.95 1 0.75 0.8 0.85 0.9 0.95 1 fraction of flow fraction of flow

Fig. 7. The number of paths in the decomposition for different values of . The left figure is for an aggregation over few hundred commodities in the Google backbone network. The right figure is for a graph consisting of 10 layers with 5 vertices in each, representing intra data center network. For each flow there is a decomposition with at most 100 paths.

paths, but routes 86% of the flow with less than 40 paths. This [10] L. Khachiyan, “A polynomial time algorithm in ,” is due to the fact that when increasing the amount of flow that Soviet Math. Dokl., vol. 20, pp. 191–195, 1979. [11] J. M. Kleinberg, “Single-source unsplittable flow,” in FOCS, 1996, pp. we have to cover we increase the subgraph that the bicreteria 68–77. algorithms pick. A larger subgraph contains more paths and [12] S. G. Kolliopoulos, “Edge-disjoint paths and unsplittable flow,” in Hand- maybe counter-intuitively may have a smaller decomposition. book of Approximation Algorithms and , ser. Chapman and Hall/CRC, T. F. Gonzalez, Ed., 2007. REFERENCES [13] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: enabling innovation [1] M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity data in campus networks,” SIGCOMM Comput. Commun. Rev., vol. 38, pp. center network architecture,” in SIGCOMM, 2008. 69–74, March 2008. [2] G. Baier, E. K¨ohler, and M. Skutella, “The k-splittable flow problem,” [14] V. S. Mirrokni, M. Thottan, H. Uzunalioglu, and S. Paul, “A simple Algorithmica, vol. 42, pp. 231–248, 2005. polynomial time framework for reduced-path decomposition in multipath [3] T. Benson, A. Anand, A. Akella, and M. Zhang, “The case for fine- routing,” in INFOCOM, 2004. grained traffic engineering in data centers,” in Proceedings of the 2010 [15] E. Osborne and A. Simha, Traffic Engineering with MPLS. Pearson internet network management conference on Research on enterprise Education, 2002. networking (INM/WREN’10). USENIX Association, 2010. [16] A. Premji, Using MPLS Auto-bandwidth in MPLS Networks, Juniper [4] MPLS Traffic Engineering (TE) – Automatic Bandwidth Adjustment for Networks, Sunnyvale, CA 94089 USA. TE Tunnels, Cisco. [17] M. Thorup, “Integer priority queues with decrease key in constant time [5] E. W. Dijkstra, “A note on two problems in connection with graphs,” and the single source shortest paths problem,” J. Comput. Syst. Sci., Numerische Mathematik, vol. 1, pp. 269–271, 1959. vol. 69, no. 3, pp. 330–353, 2004. [6] H. N. Gabow and R. E. Tarjan, “Algorithms for two bottleneck optimiza- [18] B. Vatinlen, F. Chauvet, P. Chr´etienne, and P. Mahey, “Simple bounds tion problems,” J. Algorithms, vol. 9, pp. 411–417, September 1988. and greedy algorithms for decomposing a flow into a minimal set of [7] L. D. Ghein, MPLS Fundamentals. Cisco Press, 2006. paths,” European Journal of Operational Research, vol. 185, pp. 1390– [8] M. Gr¨otschel, L. Lov´asz, and A. Schrijver, “The and 1401, 2008. its consequences in combinatorial optimization,” Combinatorica, vol. 1, no. 2, pp. 169–197, 1981. [9] Y. Hendel and W. Kubiak, “Decomposition of flow into paths to minimize their length.”