7. NETWORK FLOW III

‣ assignment problem ‣ input-queued switching

Lecture slides by Kevin Wayne Copyright © 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos

Last updated on 7/25/17 11:04 AM 7. NETWORK FLOW III

‣ assignment problem ‣ input-queued switching Assignment problem

Input. Weighted, complete G = (X ∪ Y, E) with | X | = | Y |. Goal. Find a perfect of min weight.

0 15 0' 7 3

5 1 6 1' 2

9 4 2 1 2'

X Y

3 Assignment problem

Input. Weighted, complete bipartite graph G = (X ∪ Y, E) with | X | = | Y |. Goal. Find a perfect matching of min weight.

0 15 0' 7 3

min-cost perfect matching 5 M = { 0-2', 1-0', 2-1' } 1 6 1' cost(M) = 3 + 5 + 4 = 12 2

9 4 2 1 2'

X Y

4 Princeton writing seminars

Goal. Given m seminars and n = 12m students who rank their top 8 choices, assign each student to one seminar so that: ・Each seminar is assigned exactly 12 students. ・Students tend to be “happy” with their assigned seminar.

Solution. ・Create one node for each student i and 12 nodes for each seminar j. ・Solve assignment problem where cij is some function of the ranks:

f(rank(i, j)) B7 i `MFb j cij = B7 i /Q2b MQi `MF j

5 Locating objects in space

Goal. Given n objects in 3d space, locate them with 2 sensors.

Solution. ・Each sensor computes line from it to each particle. ・Let cij = distance between line i from censor 1 and line j from sensor 2. ・Due to measurement errors, we might have cij > 0. ・Solve assignment problem to locate n objects.

6 Kidney exchange

If a donor and recipient have a different blood type, they can exchange their kidneys with another donor and recipient pair in a similar situation.

Can also be done among multiple pairs (or starting with an altruistic donor).

7 Kidney exchange

1 3 2 8 3 1 3 2 8 3

6 2 5 9 4 6 2 5 9 4

4 7 5 1 6 4 7 5 1 6

weight = 3 + 5 + 7 + 8 + 4 = 27 weight = 2 + 5 + 8 + 6 + 1 + 9 = 31

8 Applications

Natural applications. ・Match jobs to machines. ・Match personnel to tasks. ・Match PU students to writing seminars.

Non-obvious applications. ・Vehicle routing. ・Kidney exchange. ・Signal processing. ・Earth-mover’s distance. ・Multiple object tracking. ・Virtual output queueing. ・Handwriting recognition. ・Locating objects in space. ・Approximate string matching. ・Enhance accuracy of solving linear systems of equations.

9 Bipartite matching

Bipartite matching. Can solve via reduction to maximum flow.

Flow. During Ford–Fulkerson, all residual capacities and flows are 0–1; flow corresponds to edges in a matching M.

1 Residual graph GM simplifies to: 1 1 If (x, y) ∉ M, then (x, y) is in GM. ・ s t If , then is in . ・ (x, y) ∈ M (y, x) GM

X Y Augmenting path simplifies to: ・Edge from s to an unmatched node x ∈ X, ・Alternating sequence of unmatched and matched edges, ・Edge from unmatched node y ∈ Y to t.

10 Alternating path

Def. An alternating path P with respect to a matching M is an alternating sequence of unmatched and matched edges, starting from an unmatched node x ∈ X and going to an unmatched node y ∈ Y.

Key property. Can use P to increase by one the cardinality of the matching. Pf. Set M ʹ = M ⊕ P.

symmetric difference

y y y

x x x

matching M alternating path P matching M′

11 Assignment problem: successive shortest path algorithm

Cost of alternating path. Pay c(x, y) to match x–y; receive c(x, y) to unmatch.

1 10 1'

6 P = 2 → 2' → 1 → 1'

cost(P) = 2 - 6 + 10 = 6 7 2 2 2'

Shortest alternating path. Alternating path from any unmatched node x ∈ X to any unmatched node y ∈ Y with smallest cost.

Successive shortest path algorithm. ・Start with empty matching. ・Repeatedly augment along a shortest alternating path.

12 Finding the shortest alternating path

Shortest alternating path. Corresponds to minimum cost s↝t path in GM.

1 10 1' 0 -6 s t 0 7

2 2 2'

Concern. Edge costs can be negative.

Fact. If always choose shortest alternating path, then GM contains no negative cycles ⇒ can compute using Bellman-Ford.

Our plan. Use duality to avoid negative edge costs (and negative cycles) ⇒ can compute using Dijkstra.

13 Equivalent assignment problem

Duality intuition. Adding a constant p(x) to the cost of every edge incident to node x ∈ X does not change the min-cost perfect matching(s).

Pf. Every perfect matching uses exactly one edge incident to node x. ▪

original costs c(x, y) modified costs c′(x, y) p(0) = 3 0 15 0' 0 18 0' 7 10 3 6 add 3 to all edges incident to node 0 5 5 1 6 1' 1 6 1' 2 2

9 9 4 4 2 1 2' 2 1 2'

X Y X Y 14 Equivalent assignment problem

Duality intuition. Subtracting a constant p(y) to the cost of every edge incident to node y ∈ Y does not change the min-cost perfect matching(s).

Pf. Every perfect matching uses exactly one edge incident to node y. ▪

original costs c(x, y) modified costs c′(x, y)

0 15 0' p(0') = 5 0 10 0' 7 7 3 subtract 5 from all edges 3 incident to node 0' 5 0 1 6 1' 1 6 1' 2 2

9 4 4 4 2 1 2' 2 1 2'

X Y X Y 15 Reduced costs

Reduced costs. For x ∈ X, y ∈ Y, define cp(x, y) = p(x) + c(x, y) – p(y).

Observation 1. Finding a min-cost perfect matching with reduced costs is equivalent to finding a min-cost perfect matching with original costs.

original costs c(x, y) reduced costs cp(x, y) p(0) = 0 0 15 0' p(0') = 11 0 4 0' 7 1 3 0

5 0 p(1) = 6 1 6 1' p(1') = 6 1 6 1' 2 5

9 cp(1, 2') = p(1) + 2 – p(2') 0 4 0 p(2) = 2 2 1 2' p(2') = 3 2 0 2'

X Y X Y 16 Compatible prices

Compatible prices. For each node v ∈ X ∪ Y, maintain prices p(v) such that: ・cp(x, y) ≥ 0 for all (x, y) ∉ M. ・cp(x, y) = 0 for all (x, y) ∈ M.

Observation 2. If prices p are compatible with a perfect matching M, then M is a min-cost perfect matching.

reduced costs cp(x, y) Pf. Matching M has 0 cost. ▪ 0 4 0' 1 0

0 1 6 1' 5

0 0 2 0 2'

X Y 17 Successive shortest path algorithm

SUCCESSIVE-SHORTEST-PATH (X, Y, c)

______

M ← ∅. prices p are compatible with M FOREACH v ∈ X ∪ Y : p(v) ← 0. cp(x, y) = c(x, y) ≥ 0

WHILE (M is not a perfect matching) d ← shortest path distances using costs cp. P ← shortest alternating path using costs cp. M ← updated matching after augmenting along P. FOREACH v ∈ X ∪ Y : p(v) ← p(v) + d(v).

RETURN M.

______

18 Successive shortest path algorithm

Initialization. ・M = ∅. ・For each v ∈ X ∪ Y : p(v) ← 0.

original costs c(x, y) p(0) = 0 p(0') = 0

0 15 0' 7 3

p(1) = 0 p(1') = 0 5 s 1 6 1' t 2

9 4 2 1 2'

p(2) = 0 p(2') = 0 19 Successive shortest path algorithm

Initialization. ・M = ∅. ・For each v ∈ X ∪ Y : p(v) ← 0.

reduced costs cp(x, y) p(0) = 0 p(0') = 0

0 15 0' 7 3

p(1) = 0 p(1') = 0 5 s 1 6 1' t 2

9 4 2 1 2'

p(2) = 0 p(2') = 0 20 Successive shortest path algorithm

Step 1. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

shortest path distances d(v) d(0) = 0 d(0') = 5

0 15 0' 7 3

d(s) = 0 d(1) = 0 d(1') = 4 d(t) = 1 5 s 1 6 1' t 2

9 4 2 1 2'

d(2) = 0 d(2') = 1 21 Successive shortest path algorithm

Step 1. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

alternating path d(0) = 0 d(0') = 5

0 15 0' 7 3

d(s) = 0 d(1) = 0 d(1') = 4 d(t) = 1 5 s 1 6 1' t 2

9 matching 4 2-2' 2 1 2'

d(2) = 0 d(2') = 1 22 Successive shortest path algorithm

Step 1. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

reduced costs cp(x, y) p(0) = 0 p(0') = 5

0 10 0' 3 2

p(1) = 0 p(1') = 4 0 s 1 2 1' t 1

4 matching 0 2-2' 2 0 2'

p(2) = 0 p(2') = 1 23 Successive shortest path algorithm

Step 2. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

shortest path distances d(v) d(0) = 0 d(0') = 0

0 10 0' 3 2

d(s) = 0 d(1) = 0 d(1') = 1 d(t) = 0 0 s 1 2 1' t 1

4 matching 0 2-2' 2 0 2'

d(2) = 1 d(2') = 1 24 Successive shortest path algorithm

Step 2. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

shortest path distances d(v) d(0) = 0 d(0') = 0

0 10 0' 3 2

d(s) = 0 d(1) = 0 d(1') = 1 d(t) = 0 0 s 1 2 1' t 1

4 matching 0 2-2' 1-0' 2 0 2'

d(2) = 1 d(2') = 1 25 Successive shortest path algorithm

Step 2. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

reduced costs cp(x, y) p(0) = 0 p(0') = 5

0 10 0' 2 1

p(1) = 0 p(1') = 5 0 s 1 1 1' t 0

5 matching 0 2-2' 1-0' 2 0 2'

p(2) = 1 p(2') = 2 26 Successive shortest path algorithm

Step 3. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

shortest path distances d(v) d(0) = 0 d(0') = 6

0 10 0' 2 1

d(s) = 0 d(1) = 6 d(1') = 1 d(t) = 1 0 s 1 1 1' t 0

5 matching 0 2-2' 1-0' 2 0 2'

d(2) = 1 d(2') = 1 27 Successive shortest path algorithm

Step 3. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

shortest path distances d(v) d(0) = 0 d(0') = 6

0 10 0' 2 1

d(s) = 0 d(1) = 6 d(1') = 1 d(t) = 1 0 s 1 1 1' t 0

5 matching 0 1-0' 0-2' 2-1' 2 0 2'

d(2) = 1 d(2') = 1 28 Successive shortest path algorithm

Step 3. ・Compute shortest path distances d(v) from s to v using cp(x, y). ・Update matching M via shortest path from s to t. ・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

reduced costs cp(x, y) p(0) = 0 p(0') = 11

0 4 0' 1 0

p(1) = 6 p(1') = 6 0 s 1 6 1' t 5

0 matching 0 1-0' 0-2' 2-1' 2 0 2'

p(2) = 2 p(2') = 3 29 Successive shortest path algorithm

Termination. ・M is a perfect matching. ・Prices p are compatible with M.

reduced costs cp(x, y) p(0) = 0 p(0') = 11

0 4 0' 1 0

p(1) = 6 p(1') = 6 0 1 6 1' 5

0 matching 0 1-0' 0-2' 2-1' 2 0 2'

p(2) = 2 p(2') = 3 30 Maintaining compatible prices

Lemma 1. Let p be compatible prices for M. Let d be shortest path p p+d distances in GM with costs c . All edges (x, y) on shortest path have c (x, y) =

0. forward or reverse edges

Pf. Let (x, y) be some edge on shortest path. ・If (x, y) ∈ M, then (y, x) on shortest path and d(x) = d(y) – cp(x, y); If (x, y) ∉ M, then (x, y) on shortest path and d(y) = d(x) + cp(x, y). ・In either case, d(x) + cp(x, y) – d(y) = 0. ・By definition, cp(x, y) = p(x) + c(x, y) – p(y). ・Substituting for cp(x, y) yields (p(x) + d(x)) + c(x, y) – (p(y) + d(y)) = 0. ・In other words, cp+d(x, y) = 0. ▪

Given prices p, the reduced cost of edge (x, y) is cp(x, y) = p(x) + c(x, y) – p(y).

31 Maintaining compatible prices

Lemma 2. Let p be compatible prices for M. Let d be shortest path p distances in GM with costs c . Then pʹ = p + d are also compatible prices for M.

Pf. (x, y) ∈ M is the only edge entering in . Thus, on shortest path. ・(y, x) x GM (y, x) ・By LEMMA 1, cp+d(x, y) = 0.

Pf. (x, y) ∉ M is an edge in p . ・(x, y) GM ⇒ d(y) ≤ d(x) + c (x, y) ・Substituting cp(x, y) = p(x) + c(x, y) – p(y) ≥ 0 yields (p(x) + d(x)) + c(x, y) – (p(y) + d(y)) ≥ 0. ・In other words, cp+d(x, y) ≥ 0. ▪

Prices p are compatible with matching M: ・cp(x, y) ≥ 0 for all (x, y) ∉ M. ・cp(x, y) = 0 for all (x, y) ∈ M. 32 Maintaining compatible prices

Lemma 3. Let p be compatible prices for M and let M ʹ be matching obtained by augmenting along a min cost path with respect to cp+d. Then pʹ = p + d are compatible prices for Mʹ.

Pf. ・By LEMMA 2, the prices p + d are compatible for M. ・Since we augment along a min-cost path, the only edges (x, y) that swap into or out of the matching are on the min-cost path. ・By LEMMA 1, these edges satisfy cp+d(x, y) = 0. ・Thus, compatibility is maintained. ▪

Prices p are compatible with matching M: ・cp(x, y) ≥ 0 for all (x, y) ∉ M. ・cp(x, y) = 0 for all (x, y) ∈ M. 33 Successive shortest path algorithm: analysis

Invariant. The algorithm maintains a matching M and compatible prices p. Pf. Follows from LEMMA 2 and LEMMA 3 and initial choice of prices. ▪

Theorem. The algorithm returns a min-cost perfect matching. Pf. Upon termination M is a perfect matching, and p are compatible prices. Optimality follows from OBSERVATION 2. ▪

Theorem. The algorithm can be implemented in O(n3) time. Pf. ・Each iteration increases the cardinality of M by 1 ⇒ n iterations. ・Bottleneck operation is computing shortest path distances d. Since all costs are nonnegative, each iteration takes O(n2) time using (dense) Dijkstra. ▪

34 Weighted bipartite matching

Weighted bipartite matching. Given a weighted bipartite graph with n nodes and m edges, find a maximum cardinality matching of minimum weight.

Theorem. [Fredman–Tarjan 1987] The successive shortest path algorithm solves the problem in O(n2 + m n log n) time using Fibonacci heaps.

Theorem. [Gabow–Tarjan 1989] There exists an O(m n1/2 log(nC)) time algorithm for the problem when the costs are integers between 0 and C.

SIAM J. COMPUT. ()1989 Society for Industrial and Applied Mathematics Vol. 18, No. 5, pp. 1013-1036, October 1989 011

FASTER SCALING ALGORITHMS FOR NETWORK PROBLEMS* HAROLD N. GABOW AND ROBERT E. TARJAN$

Abstract. This paper presents algorithms for the assignment problem, the transportation problem, and the minimum-cost flow problem of operations research. The algorithms find a minimum- cost solution, yet run in time close to the best-known bounds for the corresponding problems without costs. For example, the assignment problem (equivalently, minimum-cost matching in a bipartite graph) can be solved in O(v/’rn log(nN)) time, where n, m, and N denote the number of vertices, number of edges, and largest magnitude of a cost; costs are assumed to be integral. The algorithms work by scaling. As in the work of Goldberg and Tarjan, in each scaled problem an approximate optimum solution is found, rather than an exact optimum.

Key words, graph theory, networks, assignment problem, matching, scaling

subject classifications. 68Q20, 68Q25, 68R10, 05C70 AMS(MOS) 35 1. Introduction. Many problems in operations research involve minimizing a cost function defined on a bipartite or directed graph. A simple but fundamental example is the assignment problem. This paper gives algorithms for such problems that run almost as fast as the best-known algorithms for the corresponding problems without costs. For the assignment problem, the corresponding problem without costs is maximum cardinality bipartite matching. The results are achieved by scaling the costs. This requires the costs to be integral-valued. Further, for the algorithms to be efficient, costs should be polynomi- ally bounded in the number of vertices, i.e., at most n(1). These requirements are satisfied by a large number of problems in both theoretical and practical applications. Table 1 summarizes the results of the paper. The parameters describing the input are specified in the caption and defined more precisely below. The first column gives the problem and the best-known strongly polynomial time bound. Such a bound comes from an algorithm with running time independent of the size of the numbers (assuming the uniform cost model of computation [AHU]). The second column gives the time bounds achieved in this paper by scaling. The table shows that significant speedups can be achieved through scaling. Further, it will be seen that the scaling algorithms are simple to program. Now we discuss the specific results. The assignment problem is to find a minimum-cost perfect matching in a bipartite graph. The strongly polynomial algorithm is the Hungarian algorithm [K55], [K56] implemented with Fibonacci heaps [FT]. This algorithm can be improved significantly when all costs are zero. Then the problem amounts to finding a perfect matching in a bipartite graph. The best-known cardinality matching algorithm, due to Hopcroft and Sarp, runs in time O(v/-m) [HK]. The new time bound for the assignment problem is

*Received by the editors August 18, 1987; accepted for publication (in revised form) November 4, 1988. Department of Computer Science, University of Colorado, Boulder, Colorado 80309. The research of this author was supported in part by National Science Foundation grant DCR-851191 and AT&T Bell Laboratories. :[:Computer Science Department, Princeton University, Princeton, New Jersey 08544 and AT&T Bell Laboratories, Murray Hill, New Jersey 07974. The research of this author was supported in part by National Science Foundation grant DCR-8605962 and Office of Naval Research contract N00014-87-K-0467. 1013 History

Thorndike 1950. Formulated in a modern way by a psychologist.

Assign individuals to jobs to maximize average success of all individuals.

36 History

Thorndike 1950. Formulated in a modern way by a psychologist.

anticipated theory of computational complexity!

37 History

Kuhn 1955. First poly-time algorithm; named “Hungarian” algorithm to honor two Hungarian mathematicians (Kőnig and Egerváry).

Munkres 1957. Reviewed algorithm; observed O(n4) implementation.

Edmonds–Karp, Tomizawa 1971. Improved to O(n3).

anticipated development of combinatorial optimization

38 History

Jacobi (1804–1851). Introduces a bound on the order of a system of m ordinary differential equations in m unknowns and reduces it to….

Looking for the order of a system of arbitrary ordinary diferential equations

39 16 C.G.J. Jacobi

[ .3.] § History About the resolution of the problem of inequalities on which the research of the order of the system of arbi- trary di↵erential equations is supported4. [Considering a table, we define a canon. An arbitrary canon being Jacobi (1804–1851). The assignment problem! Moreover,given, we find ahe simplest provides one.] a polynomial-time algorithm. ywhatprecedes,the research of the order of a system of ordinary di↵erential equations is reduced to the following problem of inequalities, whichB is also worth to be considered for itself:

Problem. (i) We dispose nn arbitrary quantities hk in a square table in such a way that we have n horizontal series and n vertical series having each one n terms. Among these quantities, to chose n being transversal, that is all disposed in di↵erent horizontal and vertical series, which may be done in 1.2 ...n ways; and among these ways, to research one Looking for the order of a system of arbitrary ordinary di↵erential equations 17 that gives the maximum of the sum of the n chosen numbers.

[. . . it may occur that all combinationsh10 leadh20 to... the same hn0 sum. For example if, as it happens for the isoperimetrical problem,h100 h this200 table... hn00 ...... 2m1 m1 + m2 ... m1 + mn h(n) h(n) ... h(n), m2 + m1 2m1 2 2 ... m2n+ mn we can add to each...... term of the same horizontal series a same quantity, and we call `(i) the quantitymn + m added1 mn to+ m the2 terms... 2m ofn, the ith horizontal series. This

is given.being If done,m1 is the each greatest of the of 1. the2 ...n quantitiestransversalm1, m sums2,..., amongmn, the which terms we of each need to verticalsfind awill maximum be made is equal increased by increasing by the respectively same quantity the lines of the horizontal series by the positive values 0, m m ,...,m (n)m ,...]5 `10 + `002+ + 1` =n L, ··· because, in order(i to) form these sums, we need to pick a term in each hori- 6The quantities h being disposed in a square figure zontal series. Hence,k if we pose 4 This was a 23 in the manuscript. h(i) + `(i) = p(i) 5The remaining§ of [II/13 b) fo 2200] hask been ruledk out by Jacobi. I extract this part which shows how the ideas developped in the last section of the paper(i) could have arisen and that the maximal transversal sum of the terms hk is from his work on the isoperimetrical problem. T.N. 6Cohn’s transcription continues(i1 here) with(i2) a second fragment(in) of a 19 entitled by Jacobi h1 + h2 + + hn = H, About the di↵erentiations and eliminations by which··· the shortest reduction§ (see [Jacobi 2]) (i) to canonicalthis makes form that is done. the A value problem of the of inequalities maximalsum that must formed be solved with to the performpk is this reduction. T.N. p(i1) + p(i2) + + p(in) = H + L, 1 2 ··· n and reciprocally. So that finding the proposed maximum for the quanti- ties h(i) or p(i) is equivalent. Jacobi formulated the assignment problem; proposed andk analyzedk the Hungarian algorithm (n) Let us bring it about that the quantities `0, `00,...,` be determined in (i) such a way that, the quantities pk being disposed in square in the same 40 (i) way as the quantities hk and chosing a maximum in each vertical series, (ik) these maxima be placed in all di↵erent horizontal series. If we call pk the maximum of terms (n) pk0 ,pk00,...,pk , the sum p(i1) + p(i2) + + p(in) 1 2 ··· n will be the maximum among all the transversal sums formed with the quan- (i) tities pk . [. . . ] Indeed, in this case, we have without trouble the maximal (i) transversal sum formed with the proposed quantities hk h(i1) + h(i2) + + h(in). 1 2 ··· n (n) So that we solve the proposed problem when we find quantities `0, `00,...,` [satisfying the given condition]. 7. NETWORK FLOW III

‣ assignment problem ‣ input-queued switching Input-queued switching

Input-queued switch. ・n input ports and n output ports in an n-by-n crossbar layout. ・At most one cell can depart an input at a time. ・At most one cell can arrive at an output at a time. ・Cell arrives at input x and must be routed to output y.

Application. High-bandwidth switches.

x1

inputs x2

x3

y1 y2 y3

outputs 42 FIFO queuing

FIFO queueing. Each input x maintains one queue of cells to be routed.

Head-of-line blocking (HOL). A cell can be blocked by a cell queued ahead of it that is destined for a different output.

x y2 y1 y2 1

x FIFO y2 2

y1 y3 y3 x3

y1 y2 y3

outputs 43 FIFO queuing

FIFO queueing. Each input x maintains one queue of cells to be routed.

Head-of-line blocking (HOL). A cell can be blocked by a cell queued ahead of it that is destined for a different output.

Fact. FIFO can limit throughput to 58% even when arrivals are uniform i.i.d.

Input Versus Output Queueing on a Space-Division Input Versus Output Queueing on aPack& Space-Division Switch Pack& Switch

Abstract-Two simple models of queueing on an N X N space-division ficonnections may be contending for use of the same center packetswitch are examined. Abstract-Two The switch operates simple synchronously models with of queueinglink. The onuse anof Nablocking X N space-divisionnetwork as apacket fi connections switch is may be contending for use of the same center fixed-length packets; duringpacket each timeswitch slot,are packets examined.may arrive on Theany switchfeasible only operates under light synchronously loads or, alternatively, with link. if it is The possible use of ablocking network as apacket switch is inputs addressed to any outputs.fixed-length Because packet packets; arrivals toduring the switch each are timeto run slot, the switchpackets substantially may arrive faster on than any the inputfeasible and output only under light loads or, alternatively, if it is possible unscheduled, more than one packet mayarrive for the same output trunks. during the same time slot, inputsmaking addressedqueueing unavoidable. to any outputs.Mean queue Because In this packet .paper, arrivals we consider to the onlyswitch nonblocking are to networks.run the A switch substantially faster than the input and output lengths are always greater forunscheduled, queueing on inputs more than than for queueing one packet on simple may examplearrive offor a nonblocking the same switch output fabric trunks. isthe crossbar outputs, and the output queuesduring saturate the only same as the timeutilization slot, approaches making interconnectqueueing unavoidable. with switch points Mean (Fig.queue1). Here Initis thisalways .paper, we consider only nonblocking networks. A unity. Input queues, on the other hand, saturate at a utilization that possible to establish a connectionbetween any idle input- depends on N, but is approximatelylengths (2 are - &) always = 0.586 greater when Nis for large. queueing If output on pair. inputs Examples than offor other queueing nonblocking on switchsimple fabrics example are of a nonblocking switch fabric isthe crossbar output trunk utilization is outputs,the primary and consideration, the output itis queuespossible saturate to given only in [3].as the Even utilization with anonblocking approachesinterconnect, interconnectsome with switchpoints (Fig.1). Here itis always slightly increase utilization ofunity. the output Input trunks-up queues, to (1 on- e-’) the = 0.632 other queueing hand, saturate in a packet at switch a utilizationis unavoidable, that simplypossible because theto establish a connectionbetween any idle input- as N --t --by dropping interfering packets at the end of each time slot, switch acts as a statistical multiplexor; that is, packet arrivals ratherthan storing them depends in the input on queues. N, but This is approximately improvement is to(2 the- &)switch = are 0.586 unscheduled. when Nis If morelarge. than If oneoutput packet arrivespair. Examples of other nonblocking switch fabrics are possible, however, only whenoutput the utilization trunk of utilizationthe input trunks is exceeds the primarya for the consideration, same output at itais given possible time,queueing to givenis required. in [3]. Even with anonblocking interconnect, some second critical threshold-approximately In (1 + A) = 0.881 for large Depending on the speed of the switch fabric and its particular = queueing in a packet switchis unavoidable, simply because the N. slightly increase utilization of the outputarchitecture, trunks-up there tomay (1 be - ae-’) choice 0.632as towhere the queueing is as N --t --by dropping interferingdone: packets for atexample, the end on ofthe each input time trunk, slot, on the outputswitch trunk, acts or as a statistical multiplexor; that is, packet arrivals I.rather INTRODUCTIONthan storing them in the inputat an internal queues. node. This improvement is to theswitch are unscheduled. If more than one packet arrives We assumethat the switch operates synchronously with PACE-DIVISION packetpossible, switching however, is emerging only whenas a thekey utilizationfixed-length of thepackets, input and trunks that during exceeds each a time for s1ot;packets the same output at a given time,queueing is required. Scomponent in thetrend secondtoward high-performance critical threshold-approximately may arriveIn (1 on + any A) inputs = 0.881 addressed for tolarge any outputsDepending (Fig. 2). If on the speed of the switch fabric and its particular integrated communicationnetworks for data,voice, image, N. the switch fabric runs N times as fast as the inputarchitecture, and output there may be a choice as towhere the queueing is and video [l], 121 andmultiprocessor interconnects for trunks, all the packets thatarrive during a particular input time building highly parallelcomputer systems [3], [4]. Unlike slot can traverse the switch before the next inputdone: slot, but for there example, onthe input trunk, on the output trunk, or present-day packetswitch architectures with throughputs I. INTRODUCTIONwill still be queueing at the outputs [Fig. l(a)].at This an queueing internal node. measured in 1’s or at most 10’s of Mbits/s, a space-division really has nothing to dowith the switch architecture,We but assumeis due that the switch operates synchronously 44with packet switch can have throughputsPACE-DIVISION measured in l’s, packet lo’s, or switching to the simultaneous is emerging arrival of moreas a thankey one input packet for even 100’s of Gbitsls. These capacities are attained through the same output. If, on theother hand, the switchfixed-length fabric runsat packets, and that during each time s1ot;packets the use of a highly parallelS componentswitch fabric coupled in withthetrend towardsimple high-performance the same speed as the inputs and outputs, only mayone packet arrive can on any inputs addressed to any outputs (Fig. 2). If per packet processingdistributed integratedamong communication many high-speednetworks be accepted for by data,any givenvoice, outputimage, line during a time slot, and VLSI circuits. other packets addressed to the same output mustthe queue switch on the fabric runs N times as fast as the input and output Conceptually, a space-divisionand video packet [l],switch is121 a boxandmultiprocessor with input linesinterconnects [Fig. l(b)]. For simplicity, forwe do nottrunks, consider all the the packets thatarrive during a particular input time N inputs and N outputsbuilding that routes the highly packets parallel arrivingcomputer on its intermediatesystems case where[3],some [4]. Unlikepackets canbe slotqueued can traverseat the switch before the next input slot, but there inputs to the appropriatepresent-day outputs. At any packet given time,switch internalarchitectures internal nodes, aswith in the Banyanthroughputs topology. switch points can be set to establish certain paths from inputs It seems intuitively reasonable that the meanwill queue still lengths, be queueing at the outputs [Fig. l(a)]. This queueing to outputs; the routing informationmeasured used in 1’sto establishor at most input- 10’sand ofhence Mbits/s, the mean waitinga space-division times, will be greaterreally for queueing has nothing to dowith the switch architecture, but is due output paths is often containedpacket inswitch the header can of haveeach arriving throughputs on inputs measured than for queueing in l’s, on outputs.lo’s, orWhen toqueueing the simultaneous is done arrival of more than one input packet for packet.Packets may haveeven to 100’sbebuffered of Gbitsls.within the switchThese capacitieson inputs, a packet are attainedthat could throughtraverse the .switch to an idle until appropriate connections are available; the location of the output during the current time slot may have tothe wait same in queue output. If, on theother hand, the switch fabric runsat buffers and the amountthe of buffering use of required a highly depend parallel on the switch behind fabric a packet coupled whose output with is currentlysimple busy.the The same intuition speed as the inputs and outputs, only one packet can switch architecture andper the statisticspacket of processingthe offered traffic.distributed that,among if possible, itmany is better high-speedto queue on the outputsbe accepted than the by any given output line during a time slot, and Clearly, congestion canVLSI occur circuits. if the switch is a blocking inputs of a space-division packet switch also pertains to the network, that is, if there are not enoughswitch points to following situation. Consider a single road leadingother to packets both a addressed to the same output must queue on the providesimultaneous, independent Conceptually,paths between a space-division arbitrary spot-@ packet arena switchanda store is a [Fig. box 3(a)]. with Even ifinput there linesare no [Fig. l(b)]. For simplicity, we do not consider the pairsof inputs and outputs.A Banyan switch [3]-[5], for N inputs and N outputs that routescustomers thewaiting packets for arrivingservice in onthestore, its intermediatesomeshoppers case wheresome packets canbe queued at example, is ablocking network. inputs toIn the a Banyan appropriate switch,even outputs. might At be stuckany ingiven stadium time, traffic. internal A simple bypassroad around when every input is assigned to a different output, as many as the stadium is the obvious solution [Fig. 3(b)].internal nodes, as in the Banyan topology. switch points can be set to establishThis paper certain quantifies paths the from performance inputs improvements It seems pro- intuitively reasonable that the mean queue lengths, Paper approved by the Editorto outputs; for Local Area theNetworks routing of information the IEEE vided used by outputqueueingto establish for the input-following simple andmodel. hence the mean waiting times, will be greater for queueing Communications Society. Manuscriptoutput received paths August is often8, 1986; containedrevised May Independent, in the header statistically of each identical arriving traffic arrives oneach input 14, 1987. This paper was presented atGLOBECOM’86, Houston, TX, trunk. In any given time slot, the probability thaton a inputspacket will than for queueing on outputs. When queueing is done December 1986. packet.Packets may have to arrivebebuffered ona particular within input the p.is Thus,switch p represents on theinputs, average a packet that could traverse the .switch to an idle M. J. Karol is with ATLT Bell Laboratories, Holmdel, NJ 07733. utilization of each input. Each packet has equal probability 1/ M. G. Hluchyjwas with AT&Tuntil Bell appropriate Laboratories, Holmdel, connections NJ 07733. He are available; the location of the output during the current time slot may have to wait in queue is now with the Codex Corporation,buffers Canton, and MA the 02021. amount of bufferingN of being requiredaddressed depend to- any given onoutput, the and successive S. P. Morgan is with AT&T Bell Laboratories, Murray Hill, NJ 07974. packets are independent. behind a packet whose output is currently busy. The intuition IEEE Log Number 8717486.switch architecture and the statisticsWith output of the queueing, offered all arrivingtraffic. packets inthat, a time if slotpossible,are it is better to queue on the outputs than the Clearly, congestion can occur if the switch is a blocking inputs of a space-division packet switch also pertains to the 0090-6778/87/1200-1347$01.00 0 1987 IEEE network, that is, if there are not enoughswitch points to following situation. Consider a single road leading to both a providesimultaneous, independent paths between arbitrary spot-@ arena anda store [Fig. 3(a)]. Even if there are no pairsof inputs and outputs.A Banyan switch [3]-[5], for customerswaiting for service in thestore, someshoppers example, is ablocking network. In a Banyan switch,even might be stuck in stadium traffic. A simple bypassroad around when every input is assigned to a different output, as many as the stadium is the obvious solution [Fig. 3(b)]. This paper quantifies the performance improvements pro- Paper approved by the Editor for Local AreaNetworks of the IEEE vided by outputqueueing for thefollowing simple model. Communications Society. Manuscript received August 8, 1986; revised May Independent, statistically identical traffic arrives oneach input 14, 1987. This paper was presented atGLOBECOM’86, Houston, TX, trunk. In any given time slot, the probability that a packet will December 1986. arrive ona particular input p.is Thus, p represents the average M. J. Karol is with ATLT Bell Laboratories, Holmdel, NJ 07733. 1/ M. G. Hluchyjwas with AT&T Bell Laboratories, Holmdel, NJ 07733. He utilization of each input. Each packet has equal probability is now with the Codex Corporation, Canton, MA 02021. N of beingaddressed to- anygiven output, and successive S. P. Morgan is with AT&T Bell Laboratories, Murray Hill, NJ 07974. packets are independent. IEEE Log Number 8717486. With output queueing, allarriving packets in a time slotare

0090-6778/87/1200-1347$01.00 0 1987 IEEE Virtual output queueing

Virtual output queueing (VOQ). Each input x maintains n queues of cells, one for each output y.

Maximum-size matching. Find a max cardinality matching. Fact. VOQ achieves 100% throughput when arrivals are uniform i.i.d. but can starve input-queues when arrivals are nonuniform.

y1

y2 y2 y2 x1 y3 y3

x2 VOQ y2 y2 y2

y3 y3 y3 y3

x3

y2 y2 y2

y3

y1 y2 y3

outputs 45 Input-queued switching

Maximum-weight matching. Find a min cost perfect matching between inputs x and outputs y, where c(x, y) equals: ・[LQF] The number of cells waiting to go from input x to output y. ・[OCF] The waiting time of the cell at the head of VOQ from x to y.

Theorem. LQF and OCF achieve 100% throughput if arrivals are independent (even if not uniform).

1260 1260 IEEE TRANSACTIONS ON COMMUNICATIONS,IEEE VOL. TRANSACTIONS 47, NO. 8, AUGUST ON COMMUNICATIONS, 1999 VOL. 47, NO. 8, AUGUST 1999 Achieving 100%Achieving Throughput 100% Throughput in an Input-Queuedin an SwitchInput-Queued Switch

Nick McKeown, Senior Member,Nick IEEE, McKeown,Adisak Mekkittikul,Senior Member,Member, IEEE, IEEE,Adisak Mekkittikul, Member, IEEE, Practice. Venkat Anantharam, Fellow, IEEEVenkat, and Anantharam,Jean Walrand,Fellow,Fellow, IEEE IEEE, and Jean Walrand, Fellow, IEEE Assignment problem too slow in practice. Abstract— It is well known that head-of-lineAbstract— blockingIt is well limits known that head-of-line blocking limits ・ the throughput of an input-queued switchthe throughput with first-in–first-out of an input-queued switch with first-in–first-out (FIFO) queues. Under certain conditions,(FIFO) the queues. throughput Under can certain be conditions, the throughput can be shown to be limited to approximatelyshown 58.6%. to be It islimited also known to approximately 58.6%. It is also known Difficult to implement in hardware. that if non-FIFO queueing policies arethat used, if non-FIFO the throughput queueing can policies are used, the throughput can be increased. However, it has not beenbe previously increased. shown However, that it if has a not been previously shown that if a ・ suitable queueing policy and scheduling algorithm are used, then suitable queueing policy and scheduling algorithm are used, then it is possible to achieve 100% throughputit is possible for all to independent achieve 100% throughput for all independent arrival processes. In this paper we provearrival this processes. to be the case In this using paper we prove this to be the case using Provides theoretical framework: a simple argumenta simple and quadratic linear programming Lyapunov argument and quadratic Lyapunov ・ function. In particular, we assumefunction. that each In input particular, maintains we assume that each input maintains a separate FIFO queue for each outputa separate and that FIFO the queue switch for is each output and that the switch is scheduled using a maximum weight bipartitescheduled matching using a algorithm. maximum weight bipartite matching algorithm. use maximal (weighted) matching. We introduce two maximum weight matchingWe introduce algorithms: two maximum longest weight matching algorithms: longest queue first (LQF) and oldest cell firstqueue (OCF). first Both (LQF) algorithms and oldest cell first (OCF). Both algorithms achieve 100% throughput for all independentachieve 100% arrival throughput processes. forFig. all independent 1. Components arrival of an input-queuedprocesses. cell-switch.Fig. 1. Components of an input-queued cell-switch. LQF favors queues with larger occupancy,LQF favors ensuring queues that with larger larger occupancy, ensuring that larger queues will eventually be served. However,queues we will find eventually that LQF be can served. However, we find that LQF can lead to the permanent starvation of shortlead queues. to the permanent OCF overcomes starvation of4) short arriving queues. packets OCF overcomes are of fixed and4) arriving equal length, packets called are of fixed and equal length, called this limitation by favoring cells with large waiting times. this limitation by favoring cells with large waiting times. cells; cells; Index Terms—Arbitration, ATM, input-queuedIndex Terms— switch,Arbitration, input- ATM,5) input-queuedis large. switch, input- 5) is large. queueing, packet switch, queueing networks, scheduling algo- queueing, packet switch, queueing networks, scheduling algo- When conditions 1) and 2) are true weWhen shall conditions say that arrivals 1) and 2) are true we shall say that arrivals rithm. rithm. are independent, and when conditionare 3)independent is true we, andshall when say condition 3) is true we shall say that arrivals are uniform. that arrivals are uniform. I. INTRODUCTION I. INTRODUCTIONThe throughput is limited because aThe cell throughput can be held is up limited by because a cell can be held up by INCE Karol et al.’s paper was publishedINCE Karol in 1986et [11], al.’s paperit another was published cell ahead in of 1986 it in [11], line it thatanother is destined cell for ahead a different of it in line that is destined for a different Shas become well known that anShas becomeport input-queued well known thatoutput. an This phenomenonport input-queued is knownoutput. as head-of-line This46 phenomenon (HOL) is known as head-of-line (HOL) switch with first-input–first-output (FIFO)switch with queues first-input–first-output can have a blocking. (FIFO) queues can have a blocking. throughput limited to just throughput%. limitedThe conditions to just It is well documented%. The conditions that this resultIt isapplies well documented only to input- that this result applies only to input- for this to be true are that: for this to be true are that: queued switches with FIFO queuesqueued. And so switches many techniqueswith FIFO queues. And so many techniques have been suggested for reducing HOL blocking using non- 1) arrivals at each input are independent1) arrivals and at identically each input arehave independent been suggested and foridentically reducing HOL blocking using non- FIFO queues, for example, by examining the first cells distributed (i.i.d.); distributed (i.i.d.); FIFO queues, for example, by examining the first cells in a FIFO queue, where [5], [8], [10]. In fact, HOL 2) arrival processes at each input2) are arrival independent processes of ar- at eachin ainput FIFO are queue, independent where of ar- [5], [8], [10]. In fact, HOL blocking can be eliminated entirely by using a simple buffering rivals at other inputs; rivals at other inputs; blocking can be eliminated entirely by using a simple buffering strategy at each input port. Rather than maintain a single FIFO 3) all arrival processes have the3) same all arrival processesrate and havestrategy the at same each arrivalinput port. rate Rather and than maintain a single FIFO queue for all cells, each input maintains a separate queue destinations are uniformly distributeddestinations over all outputs; are uniformlyqueue distributed for all overcells, all each outputs; input maintains a separate queue for each output [1], [9], [16]–[19],for as each shown output in Fig. [1], 1. [9], This [16]–[19], as shown in Fig. 1. This Paper approved by P. E. Rynes, the EditorPaper for Switching approved Systems by P. E. of Rynes, the the Editor for Switching Systems of the queuing discipline is often referred to as virtual output queuing IEEE Communications Society. Manuscriptqueuing received discipline July 29, is often 1997;referred revised to as virtual output queuing IEEE Communications Society. Manuscript received July 29, 1997; revised (VOQ). HOL blocking is eliminated because a cell cannot be October 5, 1998. This paper was presentedOctober in part 5, at 1998. INFOCOM’96, This paper San was presented(VOQ). in HOL partat blocking INFOCOM’96, is eliminated San because a cell cannot be Francisco, CA. Francisco, CA. held up by a cell queued ahead ofheld it that up by is destined a cell queued for a ahead of it that is destined for a N. McKeown is with the Department of ElectricalN. McKeown Engineering, is with Stanford the Department of Electrical Engineering, Stanford different output. The implementation of VOQ is slightly more University, Stanford, CA 94305-9030different USA (e-mail: output. [email protected]). The implementation of VOQ is slightly more University, Stanford, CA 94305-9030 USA (e-mail: [email protected]). complex, requiring FIFO’s to be maintained by each input A. Mekkittikul was with the DepartmentA. of Mekkittikul Electrical was Engineering, with the Departmentcomplex, requiring of Electrical Engineering,FIFO’s to be maintained by each input Stanford University, Stanford, CA 94305-9030Stanford USA. University, He is Stanford,now with CAbuffer. 94305-9030 But no USA. additional He is now speedup with isbuffer. required; But at no most additional one cell speedup is required; at most one cell Berkeley Concept Research Corporation, Berkeley,Berkeley CA Concept 94704 Research USA (e-mail: Corporation, Berkeley, CA 94704 USA (e-mail: can arrive and depart from each input in a slot. During each [email protected]). can arrive and depart from each input in a slot. During each [email protected]). slot, a scheduling algorithm decides the configuration of the V. Anantharam is with the Department ofV. Electrical Anantharam Engineering is with the and Departmentslot, a schedulingof Electrical Engineeringalgorithm decidesand the configuration of the Computer Sciences, University of CaliforniaComputer at Berkeley, Sciences, Berkeley, University CA ofswitch California by at finding Berkeley, a matching Berkeley, CA on aswitch bipartite by graph finding (described a matching on a bipartite graph (described 94720 USA (e-mail: [email protected]).94720 USA (e-mail: [email protected]). below). A number of different techniques have been used for J. Walrand is with Odyssia Systems,below). Berkeley, A number CA 94710 of USA different (e-mail: techniques have been used for J. Walrand is with Odyssia Systems, Berkeley, CA 94710 USA (e-mail: finding such a matching, for example, using neural networks [email protected]). [email protected]). finding such a matching, for example, using neural networks Publisher Item Identifier S 0090-6778(99)06305-9.Publisher Item Identifier S 0090-6778(99)06305-9.[2], [4], [22] or iterative algorithms[2], [1], [4],[14], [22] [15]. or These iterative algo- algorithms [1], [14], [15]. These algo-

0090–6778/99$10.00 © 1999 IEEE 0090–6778/99$10.00 © 1999 IEEE