<<

7. NETWORK FLOW III

‣ assignment problem
‣ input-queued switching

SECTION 7.13 Lecture slides by Kevin Wayne Copyright © 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos

Last updated on Apr 3, 2013 6:16 PM

Assignment problem

Input. Weighted, complete bipartite graph G = (X ∪ Y, E) with | X | = | Y |.
Goal. Find a perfect matching of min weight.

[Figure: complete bipartite graph with X = {0, 1, 2} and Y = {0', 1', 2'}; edge costs 0-0' 15, 0-1' 7, 0-2' 3, 1-0' 5, 1-1' 6, 1-2' 2, 2-0' 9, 2-1' 4, 2-2' 1.]

min-cost perfect matching: M = { 0-2', 1-0', 2-1' }, cost(M) = 3 + 5 + 4 = 12.
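As a sanity check on the example above, a brute-force search over all perfect matchings can be sketched as follows. The cost matrix is an assumption: it is the slide's figure as best it can be read off, with rows for nodes 0, 1, 2 of X and columns for nodes 0', 1', 2' of Y. The function name is illustrative only.

```python
from itertools import permutations

# Edge costs c[x][y], assumed to match the slide's 3-by-3 example
# (rows: nodes 0, 1, 2 of X; columns: nodes 0', 1', 2' of Y).
COST = [
    [15, 7, 3],
    [5, 6, 2],
    [9, 4, 1],
]

def brute_force_assignment(cost):
    """Return (min_cost, matching), where matching[x] is the y matched to x."""
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):  # perm[x] = y
        total = sum(cost[x][perm[x]] for x in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)

best_cost, matching = brute_force_assignment(COST)
print(best_cost, matching)  # 12 [2, 0, 1], i.e. M = { 0-2', 1-0', 2-1' }
```

This tries all n! matchings, so it is only useful for checking tiny instances.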

Princeton writing seminars

Goal. Given m seminars and n = 12m students who rank their top 8 choices, assign each student to one seminar so that:
・Each seminar is assigned exactly 12 students.
・Students tend to be "happy" with their assigned seminar.

Solution.
・Create one node for each student i and 12 nodes for each seminar j.
・Solve assignment problem where cij is some function of the ranks:

cij = f(rank(i, j)) if i ranks j
cij = ∞ if i does not rank j

Locating objects in space

Goal. Given n objects in 3d space, locate them with 2 sensors.

Solution.
・Each sensor computes line from it to each particle.
・Let cij = distance between line i from sensor 1 and line j from sensor 2.
・Due to measurement errors, we might have cij > 0.
・Solve assignment problem to locate n objects.

Kidney exchange

If a donor and recipient have a different blood type, they can exchange their kidneys with another donor and recipient pair in a similar situation.

Can also be done among multiple pairs (or starting with an altruistic donor).

Applications

Natural applications.
・Match jobs to machines.
・Match personnel to tasks.
・Match PU students to writing seminars.

Non-obvious applications.
・Vehicle routing.
・Kidney exchange.
・Signal processing.
・Multiple object tracking.
・Virtual output queueing.
・Handwriting recognition.
・Locating objects in space.
・Approximate string matching.
・Enhance accuracy of solving linear systems of equations.

Bipartite matching

Bipartite matching. Can solve via reduction to maximum flow.

Flow. During Ford-Fulkerson, all residual capacities and flows are 0-1; flow corresponds to edges in a matching M.

Residual graph GM simplifies to:
・If (x, y) ∉ M, then (x, y) is in GM.
・If (x, y) ∈ M, then (y, x) is in GM.

Augmenting path simplifies to:
・Edge from s to an unmatched node x ∈ X,
・Alternating sequence of unmatched and matched edges,
・Edge from unmatched node y ∈ Y to t.

Alternating path

Def. An alternating path P with respect to a matching M is an alternating sequence of unmatched and matched edges, starting from an unmatched node x ∈ X and going to an unmatched node y ∈ Y.

Key property. Can use P to increase by one the cardinality of the matching.
Pf. Set M ' = M ⊕ P (symmetric difference).

[Figure: matching M, alternating path P, matching M'.]
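The key property M ' = M ⊕ P can be illustrated with Python sets. This is a minimal sketch; the node names and the helper `augment` are hypothetical and not taken from the slides.

```python
def augment(matching, path):
    """Return M' = M xor P for an alternating path P given as a list of nodes.

    matching is a set of (x, y) pairs; path alternates x, y, x, y, ...
    starting at an unmatched x and ending at an unmatched y.
    """
    path_edges = set()
    for i in range(len(path) - 1):
        u, v = path[i], path[i + 1]
        # Store every edge as (x, y); even positions of the path are X nodes.
        path_edges.add((u, v) if i % 2 == 0 else (v, u))
    return matching ^ path_edges  # symmetric difference

M = {("x2", "y1")}                # hypothetical current matching
P = ["x1", "y1", "x2", "y2"]      # unmatched, matched, unmatched edges
M_prime = augment(M, P)
print(sorted(M_prime))            # [('x1', 'y1'), ('x2', 'y2')]
```

The matched edge x2-y1 leaves the matching and the two unmatched edges enter it, so the cardinality grows by one.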

Assignment problem: successive shortest path algorithm

Cost of alternating path. Pay c(x, y) to match x-y; receive c(x, y) to unmatch.

[Figure: graph on nodes 1, 2, 1', 2' with matching { 1-2' }; P = 2 → 2' → 1 → 1', cost(P) = 2 - 6 + 10 = 6.]

Shortest alternating path. Alternating path from any unmatched node x ∈ X to any unmatched node y ∈ Y with smallest cost.

Successive shortest path algorithm.
・Start with empty matching.
・Repeatedly augment along a shortest alternating path.

Finding the shortest alternating path

Shortest alternating path. Corresponds to minimum cost s↝t path in GM.

[Figure: residual graph GM for the same example, with source s, sink t, and the matched edge reversed with cost -6.]

Concern. Edge costs can be negative.

Fact. If always choose shortest alternating path, then GM contains no negative cycles ⇒ can compute using Bellman-Ford.

Our plan. Use duality to avoid negative edge costs (and negative cycles) ⇒ can compute using Dijkstra.
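To make the Bellman-Ford remark concrete, here is a small sketch that finds the minimum cost s↝t path in a residual graph with one negative edge. The edge list is an assumption reconstructed from the slide's 2-by-2 example (costs 1-1' 10, 1-2' 6, 2-1' 7, 2-2' 2, with 1-2' currently matched); the function itself is a textbook Bellman-Ford, not code from the slides.

```python
def bellman_ford(nodes, edges, source):
    """Shortest path distances from source; edges are (u, v, cost) triples.

    Assumes no negative cycle is reachable from source.
    """
    INF = float("inf")
    dist = {v: INF for v in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):      # at most |V| - 1 rounds of relaxation
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

nodes = ["s", "1", "2", "1'", "2'", "t"]
edges = [
    ("s", "2", 0),       # 2 is the only unmatched node in X
    ("1", "1'", 10),     # unmatched edges point from X to Y
    ("2", "1'", 7),
    ("2", "2'", 2),
    ("2'", "1", -6),     # matched edge 1-2' reversed, cost negated
    ("1'", "t", 0),      # 1' is the only unmatched node in Y
]
dist = bellman_ford(nodes, edges, "s")
print(dist["t"])  # 6, the cost of P = 2 -> 2' -> 1 -> 1'
```

The direct path s → 2 → 1' → t costs 7, so the alternating path that unmatches 1-2' is the cheaper one.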

Equivalent assignment problem

Duality intuition. Adding a constant p(x) to the cost of every edge incident to node x ∈ X does not change the min-cost perfect matching(s).

Pf. Every perfect matching uses exactly one edge incident to node x. ▪

[Figure: original costs c(x, y) and modified costs c'(x, y) with p(0) = 3; adding 3 to all edges incident to node 0 changes 0-0' from 15 to 18, 0-1' from 7 to 10, and 0-2' from 3 to 6.]

Equivalent assignment problem

Duality intuition. Subtracting a constant p(y) from the cost of every edge incident to node y ∈ Y does not change the min-cost perfect matching(s).

Pf. Every perfect matching uses exactly one edge incident to node y. ▪

[Figure: original costs c(x, y) and modified costs c'(x, y) with p(0') = 5; subtracting 5 from all edges incident to node 0' changes 0-0' from 15 to 10, 1-0' from 5 to 0, and 2-0' from 9 to 4.]
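The duality intuition can be checked numerically on the running example. The sketch below applies both modifications from the two slides at once (an illustrative choice, the slides show them separately) and confirms that every perfect matching's cost shifts by the same constant, so the minimizer does not change. The cost matrix is assumed to match the slide's figure.

```python
from itertools import permutations

# Costs assumed from the slide's example (rows: 0, 1, 2; columns: 0', 1', 2').
COST = [[15, 7, 3], [5, 6, 2], [9, 4, 1]]

def matching_costs(cost):
    """Map each perfect matching (perm[x] = y) to its total cost."""
    n = len(cost)
    return {perm: sum(cost[x][perm[x]] for x in range(n))
            for perm in permutations(range(n))}

# Add p(0) = 3 to every edge incident to node 0 and
# subtract p(0') = 5 from every edge incident to node 0'.
modified = [row[:] for row in COST]
modified[0] = [c + 3 for c in modified[0]]
for row in modified:
    row[0] -= 5

before = matching_costs(COST)
after = matching_costs(modified)
shifts = {after[m] - before[m] for m in before}   # one shared shift: 3 - 5
best_before = min(before, key=before.get)
best_after = min(after, key=after.get)
print(shifts, best_before, best_after)  # {-2} (2, 0, 1) (2, 0, 1)
```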

Reduced costs

Reduced costs. For x ∈ X, y ∈ Y, define cp(x, y) = p(x) + c(x, y) – p(y).

Observation 1. Finding a min-cost perfect matching with reduced costs is equivalent to finding a min-cost perfect matching with original costs.

[Figure: original costs c(x, y) and reduced costs cp(x, y) for prices p(0) = 0, p(1) = 6, p(2) = 2, p(0') = 11, p(1') = 6, p(2') = 3; for example, cp(1, 2') = p(1) + 2 – p(2') = 5.]

Compatible prices

Compatible prices. For each node v ∈ X ∪ Y, maintain prices p(v) such that:
・cp(x, y) ≥ 0 for all (x, y) ∉ M.
・cp(x, y) = 0 for all (x, y) ∈ M.

Observation 2. If prices p are compatible with a perfect matching M, then M is a min-cost perfect matching.

Pf. With respect to the reduced costs, matching M has 0 cost and every edge has nonnegative cost, so no perfect matching is cheaper. By Observation 1, M is also optimal for the original costs. ▪
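The definitions of reduced costs and compatible prices translate directly into a few lines of code. This is a sketch under the assumption that the cost matrix and prices below are the ones in the slide's figure; the function names are illustrative.

```python
# Costs and prices assumed from the slide's example.
COST = [[15, 7, 3], [5, 6, 2], [9, 4, 1]]
PX = [0, 6, 2]     # p(0), p(1), p(2)
PY = [11, 6, 3]    # p(0'), p(1'), p(2')

def reduced_costs(cost, px, py):
    """cp(x, y) = p(x) + c(x, y) - p(y) for every pair (x, y)."""
    n = len(cost)
    return [[px[x] + cost[x][y] - py[y] for y in range(n)] for x in range(n)]

def is_compatible(cost, px, py, matching):
    """True if cp = 0 on matched edges and cp >= 0 on all other edges."""
    cp = reduced_costs(cost, px, py)
    n = len(cost)
    return all(
        cp[x][y] == 0 if (x, y) in matching else cp[x][y] >= 0
        for x in range(n) for y in range(n)
    )

M = {(0, 2), (1, 0), (2, 1)}   # 0-2', 1-0', 2-1'
cp = reduced_costs(COST, PX, PY)
print(cp)                                # [[4, 1, 0], [0, 6, 5], [0, 0, 0]]
print(is_compatible(COST, PX, PY, M))    # True
```

Since M is perfect and the prices are compatible with it, Observation 2 certifies that M is a min-cost perfect matching without comparing it to any other matching.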

Successive shortest path algorithm

Initialization.
・M = ∅.
・For each v ∈ X ∪ Y : p(v) ← 0.

Prices p are compatible with M, since cp(x, y) = c(x, y) ≥ 0.

SUCCESSIVE-SHORTEST-PATH (X, Y, c)
______

M ← ∅.
FOREACH v ∈ X ∪ Y : p(v) ← 0.

WHILE (M is not a perfect matching)
   d ← shortest path distances using costs cp.
   P ← shortest alternating path using costs cp.
   M ← updated matching after augmenting along P.
   FOREACH v ∈ X ∪ Y : p(v) ← p(v) + d(v).

RETURN M.
______

[Figure: original costs c(x, y) with source s and sink t; all prices p(v) = 0.]
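The pseudocode above can be sketched in Python as follows. This is one possible rendering, not the slides' own implementation: it assumes a square matrix of nonnegative costs, folds s and t into the initialization and the choice of the path's endpoint, and uses a dense O(n²) Dijkstra per iteration. All names are illustrative.

```python
def successive_shortest_path(cost):
    """Min-cost perfect matching for a square matrix of nonnegative costs.

    Returns (total_cost, match_x, px, py) with match_x[x] = y.
    """
    n = len(cost)
    INF = float("inf")
    px, py = [0] * n, [0] * n                      # prices p(x), p(y)
    match_x, match_y = [None] * n, [None] * n

    for _ in range(n):                             # one augmentation per pass
        # Dense Dijkstra from s on the residual graph with reduced costs.
        dx = [0 if match_x[x] is None else INF for x in range(n)]
        dy = [INF] * n
        parent = [None] * n                        # parent[y] = x before y
        done_x, done_y = [False] * n, [False] * n
        while True:
            best, side, node = INF, None, None
            for x in range(n):
                if not done_x[x] and dx[x] < best:
                    best, side, node = dx[x], "x", x
            for y in range(n):
                if not done_y[y] and dy[y] < best:
                    best, side, node = dy[y], "y", y
            if node is None:
                break
            if side == "x":
                done_x[node] = True
                for y in range(n):
                    if match_x[node] != y:         # unmatched edges go x -> y
                        nd = dx[node] + px[node] + cost[node][y] - py[y]
                        if nd < dy[y]:
                            dy[y], parent[y] = nd, node
            else:
                done_y[node] = True
                x = match_y[node]                  # matched edge goes y -> x
                if x is not None and dy[node] < dx[x]:
                    dx[x] = dy[node]               # its reduced cost is 0

        # t is entered from the closest unmatched y; augment along that path.
        y = min((v for v in range(n) if match_y[v] is None), key=lambda v: dy[v])
        while y is not None:
            x = parent[y]
            prev_y = match_x[x]
            match_x[x], match_y[y] = y, x
            y = prev_y

        for v in range(n):                         # p(v) <- p(v) + d(v)
            px[v] += dx[v]
            py[v] += dy[v]

    total = sum(cost[x][match_x[x]] for x in range(n))
    return total, match_x, px, py

COST = [[15, 7, 3], [5, 6, 2], [9, 4, 1]]          # assumed from the slides
total, match_x, px, py = successive_shortest_path(COST)
print(total, match_x, px, py)  # 12 [2, 0, 1] [0, 6, 2] [11, 6, 3]
```

On the slide's example the sketch ends with the matching { 0-2', 1-0', 2-1' } and the same final prices as the worked trace.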

Successive shortest path algorithm

Each step does the following:
・Compute shortest path distances d(v) from s to v using cp(x, y).
・Update matching M via shortest path from s to t.
・For each v ∈ X ∪ Y: p(v) ← p(v) + d(v).

Initialization.
・M = ∅.
・For each v ∈ X ∪ Y : p(v) ← 0.
・Reduced costs cp(x, y) equal the original costs: 0-0' 15, 0-1' 7, 0-2' 3, 1-0' 5, 1-1' 6, 1-2' 2, 2-0' 9, 2-1' 4, 2-2' 1.

Step 1.
・Shortest path distances: d(s) = 0; d(0) = 0, d(1) = 0, d(2) = 0; d(0') = 5, d(1') = 4, d(2') = 1; d(t) = 1.
・Alternating path: s → 2 → 2' → t. Matching: 2-2'.
・New prices: p(0) = 0, p(1) = 0, p(2) = 0; p(0') = 5, p(1') = 4, p(2') = 1.
・New reduced costs: 0-0' 10, 0-1' 3, 0-2' 2, 1-0' 0, 1-1' 2, 1-2' 1, 2-0' 4, 2-1' 0, 2-2' 0.

Step 2.
・Shortest path distances: d(s) = 0; d(0) = 0, d(1) = 0, d(2) = 1; d(0') = 0, d(1') = 1, d(2') = 1; d(t) = 0.
・Alternating path: s → 1 → 0' → t. Matching: 2-2', 1-0'.
・New prices: p(0) = 0, p(1) = 0, p(2) = 1; p(0') = 5, p(1') = 5, p(2') = 2.
・New reduced costs: 0-0' 10, 0-1' 2, 0-2' 1, 1-0' 0, 1-1' 1, 1-2' 0, 2-0' 5, 2-1' 0, 2-2' 0.

Step 3.
・Shortest path distances: d(s) = 0; d(0) = 0, d(1) = 6, d(2) = 1; d(0') = 6, d(1') = 1, d(2') = 1; d(t) = 1.
・Alternating path: s → 0 → 2' → 2 → 1' → t. Matching: 1-0', 0-2', 2-1'.
・New prices: p(0) = 0, p(1) = 6, p(2) = 2; p(0') = 11, p(1') = 6, p(2') = 3.
・New reduced costs: 0-0' 4, 0-1' 1, 0-2' 0, 1-0' 0, 1-1' 6, 1-2' 5, 2-0' 0, 2-1' 0, 2-2' 0.

Successive shortest path algorithm

Termination.
・M is a perfect matching.
・Prices p are compatible with M.

[Figure: final reduced costs cp(x, y) with matching 1-0', 0-2', 2-1' and prices p(0) = 0, p(1) = 6, p(2) = 2, p(0') = 11, p(1') = 6, p(2') = 3.]

Maintaining compatible prices

Lemma 1. Let p be compatible prices for M. Let d be shortest path distances in GM with costs cp. All edges (x, y) on shortest path (forward or reverse edges) have cp+d(x, y) = 0.

Pf. Let (x, y) be some edge on shortest path.
・If (x, y) ∈ M, then (y, x) on shortest path and d(x) = d(y) – cp(x, y);
  If (x, y) ∉ M, then (x, y) on shortest path and d(y) = d(x) + cp(x, y).
・In either case, d(x) + cp(x, y) – d(y) = 0.
・By definition, cp(x, y) = p(x) + c(x, y) – p(y).
・Substituting for cp(x, y) yields (p(x) + d(x)) + c(x, y) – (p(y) + d(y)) = 0.
・In other words, cp+d(x, y) = 0. ▪

Maintaining compatible prices

Lemma 2. Let p be compatible prices for M. Let d be shortest path distances in GM with costs cp. Then p' = p + d are also compatible prices for M.

Pf. (x, y) ∈ M
・(y, x) is the only edge entering x in GM. Thus, (y, x) on shortest path.
・By LEMMA 1, cp+d(x, y) = 0.

Pf. (x, y) ∉ M
・(x, y) is an edge in GM ⇒ d(y) ≤ d(x) + cp(x, y).
・Substituting cp(x, y) = p(x) + c(x, y) – p(y) ≥ 0 yields (p(x) + d(x)) + c(x, y) – (p(y) + d(y)) ≥ 0.
・In other words, cp+d(x, y) ≥ 0. ▪

Maintaining compatible prices

Lemma 3. Let p be compatible prices for M and let M ' be matching obtained by augmenting along a min cost path with respect to cp+d. Then p' = p + d are compatible prices for M'.

Pf.
・By LEMMA 2, the prices p + d are compatible for M.
・Since we augment along a min-cost path, the only edges (x, y) that swap into or out of the matching are on the min-cost path.
・By LEMMA 1, these edges satisfy cp+d(x, y) = 0.
・Thus, compatibility is maintained. ▪
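Lemmas 2 and 3 can be spot-checked on the second iteration of the worked example. The sketch below assumes the cost matrix, prices, and distances from the slides' Step 2 (prices after Step 1, distances computed in Step 2); the helper name is illustrative.

```python
# Costs, prices after Step 1, and Step 2 distances, assumed from the slides.
COST = [[15, 7, 3], [5, 6, 2], [9, 4, 1]]
PX, PY = [0, 0, 0], [5, 4, 1]        # p before the update
DX, DY = [0, 0, 1], [0, 1, 1]        # shortest path distances d

def is_compatible(cost, px, py, matching):
    """cp(x, y) = 0 on matched edges and cp(x, y) >= 0 on all others."""
    n = len(cost)
    return all(
        (px[x] + cost[x][y] - py[y] == 0) if (x, y) in matching
        else (px[x] + cost[x][y] - py[y] >= 0)
        for x in range(n) for y in range(n)
    )

M = {(2, 2)}                         # matching before augmenting: 2-2'
M_prime = {(2, 2), (1, 0)}           # after augmenting along s -> 1 -> 0' -> t

new_px = [p + d for p, d in zip(PX, DX)]
new_py = [p + d for p, d in zip(PY, DY)]

print(is_compatible(COST, PX, PY, M))                # True: p compatible with M
print(is_compatible(COST, new_px, new_py, M))        # True: Lemma 2
print(is_compatible(COST, new_px, new_py, M_prime))  # True: Lemma 3
```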

Successive shortest path algorithm: analysis

Invariant. The algorithm maintains a matching M and compatible prices p.
Pf. Follows from LEMMA 2 and LEMMA 3 and initial choice of prices. ▪

Theorem. The algorithm returns a min-cost perfect matching.
Pf. Upon termination M is a perfect matching, and p are compatible prices. Optimality follows from OBSERVATION 2. ▪

Theorem. The algorithm can be implemented in O(n³) time.
Pf.
・Each iteration increases the cardinality of M by 1 ⇒ n iterations.
・Bottleneck operation is computing shortest path distances d. Since all costs are nonnegative, each iteration takes O(n²) time using (dense) Dijkstra. ▪

Weighted bipartite matching

Weighted bipartite matching. Given a weighted bipartite graph with n nodes and m edges, find a maximum cardinality matching of minimum weight.

Theorem. [Fredman-Tarjan 1987] The successive shortest path algorithm solves the problem in O(m n + n² log n) time using Fibonacci heaps.

Theorem. [Gabow-Tarjan 1989] There exists an O(m n^(1/2) log(nC)) time algorithm for the problem when the costs are integers between 0 and C.

[Figure: first page of H. N. Gabow and R. E. Tarjan, "Faster scaling algorithms for network problems," SIAM J. Comput., Vol. 18, No. 5, pp. 1013-1036, October 1989.]

History

Thorndike 1950. Formulated in a modern way by a psychologist. Assign individuals to jobs to maximize average success of all individuals.

[Figure: excerpt from Thorndike's paper; annotation: anticipated theory of computational complexity!]

History

Kuhn 1955. First poly-time algorithm; named "Hungarian" algorithm to honor two Hungarian mathematicians (Kőnig and Egerváry).

Munkres 1957. Reviewed algorithm; observed O(n⁴) implementation.

Edmonds-Karp, Tomizawa 1971. Improved to O(n³).

[Figure annotation: anticipated development of combinatorial optimization.]

History

Jacobi (1804-1851). Introduces a bound on the order of a system of m ordinary differential equations in m unknowns and reduces it to….

[Figure: title page of the English translation of Jacobi's manuscript, "Looking for the order of a system of arbitrary ordinary differential equations."]

History

Jacobi (1804-1851). The assignment problem! Moreover, he provides a polynomial-time algorithm.

[Figure: excerpt from the English translation of C.G.J. Jacobi's manuscript. It poses the problem of arranging n² arbitrary quantities in a square table with n horizontal and n vertical series and choosing n of them, all in different horizontal and vertical series (a transversal), so that the sum of the n chosen numbers is maximum.]

Jacobi formulated the assignment problem; proposed and analyzed the Hungarian algorithm.

7. NETWORK FLOW III

‣ assignment problem
‣ input-queued switching
Input-queued switching

Input-queued switch.
・n input ports and n output ports in an n-by-n crossbar layout.
・At most one cell can depart an input at a time.
・At most one cell can arrive at an output at a time.
・Cell arrives at input x and must be routed to output y.

Application. High-bandwidth switches.

[Figure: 3-by-3 crossbar with inputs x1, x2, x3 and outputs y1, y2, y3.]

FIFO queueing

FIFO queueing. Each input x maintains one queue of cells to be routed.

Head-of-line blocking (HOL). A cell can be blocked by a cell queued ahead of it that is destined for a different output.

[Figure: 3-by-3 crossbar with one FIFO queue of cells at each input x1, x2, x3, each cell labeled with its destination output.]

FIFO queueing

Fact. FIFO can limit throughput to 58% (2 – √2 ≈ 0.586 for large n) even when arrivals are uniform i.i.d.

[Figure: first page of M. J. Karol, M. G. Hluchyj, and S. P. Morgan, "Input Versus Output Queueing on a Space-Division Packet Switch," IEEE, 1987.]

Virtual output queueing

Virtual output queueing (VOQ). Each input x maintains n queues of cells, one for each output y.

Maximum size matching. Find a max cardinality matching.

Fact. VOQ achieves 100% throughput when arrivals are uniform i.i.d. but can starve input-queues when arrivals are nonuniform.

[Figure: 3-by-3 crossbar with a separate queue at each input x1, x2, x3 for each output y1, y2, y3.]

Input-queued switching

Max weight matching. Find a max-weight perfect matching between inputs x and outputs y, where the weight w(x, y) equals:
・[LQF] The number of cells waiting to go from input x to output y.
・[OCF] The waiting time of the cell at the head of the VOQ from x to y.

This is an assignment problem: setting c(x, y) = −w(x, y) turns it into a min-cost perfect matching.

Theorem. LQF and OCF achieve 100% throughput if arrivals are independent (even if not uniform).
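To make the LQF rule concrete, here is a minimal sketch that picks the schedule for one time slot by brute force over all perfect matchings. The function name `lqf_schedule` and the occupancy matrix `voq` are hypothetical, chosen only for illustration; a real switch would not enumerate all N! permutations.

```python
from itertools import permutations

def lqf_schedule(voq):
    """Return (perm, weight) for a max-weight perfect matching.

    voq[x][y] is the number of cells queued at input x for output y
    (the LQF weight). perm[x] is the output assigned to input x.
    Brute force over all N! perfect matchings: illustration only.
    """
    n = len(voq)
    best_weight, best_perm = -1, None
    for perm in permutations(range(n)):
        weight = sum(voq[x][perm[x]] for x in range(n))
        if weight > best_weight:
            best_weight, best_perm = weight, perm
    return list(best_perm), best_weight

# Hypothetical 3x3 switch: rows are inputs, columns are outputs.
voq = [[5, 4, 0],
       [4, 0, 0],
       [0, 0, 1]]
schedule, weight = lqf_schedule(voq)
print(schedule, weight)  # [1, 0, 2] 9
```

In this example the longest single queue (input 0 to output 0, with 5 cells) is not served: connecting 0-1' and 1-0' frees two queues of 4 cells each, for a total weight of 9 instead of 6. For OCF, the same routine applies with `voq[x][y]` replaced by the waiting time of the head-of-line cell.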

Practice.
・Assignment problem too slow in practice.
・Difficult to implement in hardware.
・Provides theoretical framework: use maximal (weighted) matching.

[Figure: first page of N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, "Achieving 100% Throughput in an Input-Queued Switch," IEEE Transactions on Communications, vol. 47, no. 8, August 1999, p. 1260. © 1999 IEEE.]
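One way to read "maximal (weighted) matching" is a greedy heuristic: scan the nonempty queues from heaviest to lightest and connect an input to an output whenever both are still free. The sketch below assumes that reading; the name `greedy_maximal_matching` and the sample matrix are hypothetical, and this is not presented as the scheduler used in any particular switch.

```python
def greedy_maximal_matching(voq):
    """Greedy maximal weighted matching on a VOQ occupancy matrix.

    voq[x][y] is the number of cells queued at input x for output y.
    Returns a dict mapping each matched input to its output.
    Empty queues (weight 0) are never matched.
    """
    n = len(voq)
    # Heaviest edges first; ties broken by (input, output) index.
    edges = sorted(
        ((voq[x][y], x, y) for x in range(n) for y in range(n) if voq[x][y] > 0),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    used_inputs, used_outputs = set(), set()
    match = {}
    for _, x, y in edges:
        if x not in used_inputs and y not in used_outputs:
            match[x] = y
            used_inputs.add(x)
            used_outputs.add(y)
    return match

# Hypothetical 3x3 switch: rows are inputs, columns are outputs.
voq = [[5, 4, 0],
       [4, 0, 0],
       [0, 0, 1]]
print(greedy_maximal_matching(voq))  # {0: 0, 2: 2}
```

Here the greedy schedule has weight 5 + 1 = 6, while the maximum-weight matching (0-1', 1-0', 2-2') has weight 9. The greedy result is maximal (no further input-output pair with a nonempty queue can be added) but not maximum; in general this greedy rule achieves at least half the maximum weight, and it needs only a sort plus one pass over the queues.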