I. Sorting Networks Thomas Sauerwald
Easter 2015 Outline
Introduction to Sorting Networks
Batcher’s Sorting Network
Counting Networks
Load Balancing on Graphs
I. Sorting Networks Introduction to Sorting Networks 2 Overview: Sorting Networks
(Serial) Sorting Algorithms
  we already know several (comparison-based) sorting algorithms: Insertion sort, Bubble sort, Merge sort, Quick sort, Heap sort
  execute one operation at a time
  can handle arbitrarily large inputs
  sequence of comparisons is not set in advance

Sorting Networks
  only perform comparisons
  can only handle inputs of a fixed size
  sequence of comparisons is set in advance
  comparisons can be performed in parallel

This allows sorting n numbers in sublinear time!
Simple concept, but surprisingly deep and complex theory!
Comparison Networks

Comparison Network
A comparison network consists solely of wires and comparators:
  a comparator is a device which, given two inputs $x$ and $y$, returns two outputs $x' = \min(x, y)$ and $y' = \max(x, y)$; it operates in $O(1)$ time
  a wire connects the output of one comparator to the input of another
  special wires: $n$ input wires $a_1, a_2, \ldots, a_n$ and $n$ output wires $b_1, b_2, \ldots, b_n$

Sorting Network
A sorting network is a comparison network which works correctly (that is, it sorts every input).

Convention: use the same name for both a wire and its value.
[Figure 27.1: (a) A comparator with inputs $x$ and $y$ and outputs $x' = \min(x, y)$ and $y' = \max(x, y)$. (b) The same comparator, drawn as a single vertical line. Inputs $x = 7$, $y = 3$ and outputs $x' = 3$, $y' = 7$ are shown.]

A comparison network is composed solely of wires and comparators. A comparator, shown in Figure 27.1(a), is a device with two inputs, $x$ and $y$, and two outputs, $x'$ and $y'$, that performs the following function:

$$x' = \min(x, y), \qquad y' = \max(x, y).$$

Because the pictorial representation of a comparator in Figure 27.1(a) is too bulky for our purposes, we shall adopt the convention of drawing comparators as single vertical lines, as shown in Figure 27.1(b). Inputs appear on the left and outputs on the right, with the smaller input value appearing on the top output and the larger input value appearing on the bottom output. We can thus think of a comparator as sorting its two inputs. We shall assume that each comparator operates in $O(1)$ time. In other words, we assume that the time between the appearance of the input values $x$ and $y$ and the production of the output values $x'$ and $y'$ is a constant.

A wire transmits a value from place to place. Wires can connect the output of one comparator to the input of another, but otherwise they are either network input wires or network output wires. Throughout this chapter, we shall assume that a comparison network contains $n$ input wires $a_1, a_2, \ldots, a_n$, through which the values to be sorted enter the network, and $n$ output wires $b_1, b_2, \ldots, b_n$, which produce the results computed by the network. Also, we shall speak of the input sequence $\langle a_1, a_2, \ldots, a_n \rangle$ and the output sequence $\langle b_1, b_2, \ldots, b_n \rangle$, referring to the values on the input and output wires. That is, we use the same name for both a wire and the value it carries. Our intention will always be clear from the context.

Figure 27.2 shows a comparison network, which is a set of comparators interconnected by wires. We draw a comparison network on $n$ inputs as a collection of $n$ horizontal lines with comparators stretched vertically. Note that a line does not represent a single wire, but rather a sequence of distinct wires connecting various comparators. The top line in Figure 27.2, for example, represents three wires: input wire $a_1$, which connects to an input of comparator A; a wire connecting the top output of comparator A to an input of comparator C; and output wire $b_1$, which comes from the top output of comparator C. Each comparator input is connected ...
Example of a Comparison Network (Figure 27.2)

[Figure 27.2: a comparison network with input wires $a_1, \ldots, a_4$, output wires $b_1, \ldots, b_4$ and comparators A, B, C, D, E. Sample values along the four horizontal lines, from input to output: 9, 5, 2, 2; 5, 9, 6, 5; 2, 2, 5, 6; 6, 6, 9, 9. Depth labels: 0 (inputs), 1, 1, 2, 2, 3.]

A horizontal line represents a sequence of distinct wires.
Interconnections between comparators must be acyclic: tracing back a path must never cycle back on itself and go through the same comparator twice.
Depth of a wire:
  Input wire has depth 0
  If a comparator has two inputs of depths $d_x$ and $d_y$, then outputs have depth $\max\{d_x, d_y\} + 1$
Maximum depth of an output wire equals total running time.
This network is in fact a sorting network!
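The example network and the depth rule can be traced with a few lines of Python. This is a minimal sketch: the comparator positions below (A on wires 1 and 2, B on 3 and 4, C on 1 and 3, D on 2 and 4, E on 2 and 3, written 0-based) are an assumption chosen to reproduce the sample values of Figure 27.2, and the names `run_network` and `network_depth` are illustrative.

```python
# Comparators as (top, bottom) wire indices, 0-based. The positions are an
# assumption chosen to match the sample values shown in Figure 27.2.
FIG_27_2 = [(0, 1), (2, 3),   # A, B
            (0, 2), (1, 3),   # C, D
            (1, 2)]           # E

def run_network(comparators, values):
    """Apply each comparator in order: min goes to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

def network_depth(comparators, n):
    """Input wires have depth 0; comparator outputs have depth max(dx, dy) + 1."""
    depth = [0] * n
    for top, bottom in comparators:
        d = max(depth[top], depth[bottom]) + 1
        depth[top] = depth[bottom] = d
    return max(depth)

print(run_network(FIG_27_2, [9, 5, 2, 6]))  # [2, 5, 6, 9]
print(network_depth(FIG_27_2, 4))           # 3
```

Listing comparators in any order that respects the wiring gives the same result, because the depth is computed per wire rather than per list position.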
Zero-One Principle
Zero-One Principle: A sorting network works correctly on arbitrary inputs if it works correctly on binary inputs.
Lemma 27.1
If a comparison network transforms the input $a = \langle a_1, a_2, \ldots, a_n \rangle$ into the output $b = \langle b_1, b_2, \ldots, b_n \rangle$, then for any monotonically increasing function $f$, the network transforms $f(a) = \langle f(a_1), f(a_2), \ldots, f(a_n) \rangle$ into $f(b) = \langle f(b_1), f(b_2), \ldots, f(b_n) \rangle$.
[Figure 27.4: The operation of the comparator in the proof of Lemma 27.1. On inputs $f(x)$ and $f(y)$ the outputs are $\min(f(x), f(y)) = f(\min(x, y))$ and $\max(f(x), f(y)) = f(\max(x, y))$. The function $f$ is monotonically increasing.]

To prove the claim, consider a comparator whose input values are $x$ and $y$. The upper output of the comparator is $\min(x, y)$ and the lower output is $\max(x, y)$. Suppose we now apply $f(x)$ and $f(y)$ to the inputs of the comparator, as is shown in Figure 27.4. The operation of the comparator yields the value $\min(f(x), f(y))$ on the upper output and the value $\max(f(x), f(y))$ on the lower output. Since $f$ is monotonically increasing, $x \le y$ implies $f(x) \le f(y)$. Consequently, we have the identities

$$\min(f(x), f(y)) = f(\min(x, y)), \qquad \max(f(x), f(y)) = f(\max(x, y)).$$

Thus, the comparator produces the values $f(\min(x, y))$ and $f(\max(x, y))$ when $f(x)$ and $f(y)$ are its inputs, which completes the proof of the claim.

We can use induction on the depth of each wire in a general comparison network to prove a stronger result than the statement of the lemma: if a wire assumes the value $a_i$ when the input sequence $a$ is applied to the network, then it assumes the value $f(a_i)$ when the input sequence $f(a)$ is applied. Because the output wires are included in this statement, proving it will prove the lemma.

For the basis, consider a wire at depth 0, that is, an input wire $a_i$. The result follows trivially: when $f(a)$ is applied to the network, the input wire carries $f(a_i)$. For the inductive step, consider a wire at depth $d$, where $d \ge 1$. The wire is the output of a comparator at depth $d$, and the input wires to this comparator are at a depth strictly less than $d$. By the inductive hypothesis, therefore, if the input wires to the comparator carry values $a_i$ and $a_j$ when the input sequence $a$ is applied, then they carry $f(a_i)$ and $f(a_j)$ when the input sequence $f(a)$ is applied. By our earlier claim, the output wires of this comparator then carry $f(\min(a_i, a_j))$ and $f(\max(a_i, a_j))$. Since they carry $\min(a_i, a_j)$ and $\max(a_i, a_j)$ when the input sequence is $a$, the lemma is proved.

As an example of the application of Lemma 27.1, Figure 27.5(b) shows the sorting network from Figure 27.2 (repeated in Figure 27.5(a)) with the monotonically increasing function $f(x) = \lceil x/2 \rceil$ applied to the inputs. The value on every wire is $f$ applied to the value on the same wire in Figure 27.2. When a comparison network is a sorting network, Lemma 27.1 allows us to prove the following remarkable result.
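Lemma 27.1 can be checked numerically on the example network with $f(x) = \lceil x/2 \rceil$. The sketch below is illustrative only: the comparator positions are an assumption matching the sample values of Figure 27.2, and `run_network` and `f` are hypothetical helper names.

```python
import math

# Comparators (top, bottom), 0-based; wire positions are an assumption
# chosen to match the sample values of Figure 27.2.
NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def run_network(comparators, values):
    """Route values through the comparators: min to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

def f(x):
    """Monotonically increasing map f(x) = ceil(x / 2)."""
    return math.ceil(x / 2)

a = [9, 5, 2, 6]
b = run_network(NETWORK, a)                            # output on input a
f_of_b = [f(x) for x in b]                             # f applied to the output
network_on_f_a = run_network(NETWORK, [f(x) for x in a])  # output on input f(a)
print(f_of_b == network_on_f_a)  # True
```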
Theorem 27.2 (Zero-One Principle) If a comparison network with $n$ inputs sorts all $2^n$ possible sequences of 0’s and 1’s correctly, then it sorts all sequences of arbitrary numbers correctly.
Proof of the Zero-One Principle
Proof:
For the sake of contradiction, suppose the network does not correctly sort some input.
Let $a = \langle a_1, a_2, \ldots, a_n \rangle$ be an input with $a_i < a_j$ for which the network places $a_j$ before $a_i$ in the output.
Define a monotonically increasing function $f$ as:

$$f(x) = \begin{cases} 0 & \text{if } x \le a_i, \\ 1 & \text{if } x > a_i. \end{cases}$$

Since the network places $a_j$ before $a_i$, by the previous lemma $f(a_j)$ is placed before $f(a_i)$.
But $f(a_j) = 1$ and $f(a_i) = 0$, which contradicts the assumption that the network sorts all sequences of 0’s and 1’s correctly.
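In practice the zero-one principle turns verification of a sorting network into a finite check over $2^n$ binary inputs. The following is a small sketch of that check; the function names are illustrative, and the example network uses comparator positions assumed to match Figure 27.2.

```python
from itertools import product

def run_network(comparators, values):
    """Route values through the comparators: min to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

def sorts_all_binary_inputs(comparators, n):
    """Zero-one test: try all 2^n sequences of 0's and 1's."""
    for bits in product((0, 1), repeat=n):
        if run_network(comparators, bits) != sorted(bits):
            return False
    return True

# Assumed wiring of the Figure 27.2 network (0-based wire indices).
good = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
# The same network without its last comparator.
broken = good[:-1]

print(sorts_all_binary_inputs(good, 4))    # True
print(sorts_all_binary_inputs(broken, 4))  # False
```

By Theorem 27.2, a `True` result here certifies the network for arbitrary inputs, at the cost of a check that grows exponentially with $n$.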
I. Sorting Networks Introduction to Sorting Networks 7 Bubble Sort
Insertion Sort
Some Basic (Recursive) Sorting Networks
1 2 3 4 n-wire Sorting Network ??? 5
n − 1 n n + 1 These are Sorting Networks, but with depth Θ(n).
1 2 3 4 n-wire Sorting Network ??? 5
n − 1 n n + 1
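To see the linear depth concretely, one can write down a bubble-sort-style network and measure it. This is a sketch under an assumption: the layout below (repeated passes of adjacent comparators) is one common way to draw such a network and is not necessarily the exact wiring of the figure; `bubble_network` is a hypothetical name.

```python
def bubble_network(n):
    """Adjacent comparators (i, i+1): each pass pushes the largest remaining key down."""
    return [(i, i + 1) for last in range(n - 1, 0, -1) for i in range(last)]

def run_network(comparators, values):
    """Route values through the comparators: min to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

def network_depth(comparators, n):
    """Input wires have depth 0; comparator outputs have depth max(dx, dy) + 1."""
    depth = [0] * n
    for top, bottom in comparators:
        d = max(depth[top], depth[bottom]) + 1
        depth[top] = depth[bottom] = d
    return max(depth)

net = bubble_network(4)
print(run_network(net, [9, 5, 2, 6]))  # [2, 5, 6, 9]
print(network_depth(net, 4))           # 5
```

For this particular layout the measured depth comes out as 2n − 3 for small n, in line with the Θ(n) bound stated above.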
Outline
Introduction to Sorting Networks
Batcher’s Sorting Network
Counting Networks
Load Balancing on Graphs
Bitonic Sequences
Bitonic Sequence
A sequence is bitonic if it monotonically increases and then monotonically decreases, or can be circularly shifted to become monotonically increasing and then monotonically decreasing.

Sequences of one or two numbers are defined to be bitonic.

Examples:
  $\langle 1, 4, 6, 8, 3, 2 \rangle$ ✓
  $\langle 6, 9, 4, 2, 3, 5 \rangle$ ✓
  $\langle 9, 8, 3, 2, 4, 6 \rangle$ ✓
  $\langle 4, 5, 7, 1, 2, 6 \rangle$ ✗ (not bitonic)
  binary sequences: $0^i 1^j 0^k$, or, $1^i 0^j 1^k$, for $i, j, k \ge 0$.
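The definition can be tested mechanically. One way, sketched below, is to walk around the sequence circularly and count how often the direction (up or down) changes; a sequence is bitonic in the sense above when there are at most two such changes. The function name `is_bitonic` and the treatment of equal neighbours (ignored, i.e. non-strict monotonicity) are assumptions of this sketch.

```python
def is_bitonic(seq):
    """True if seq is a circular shift of an increasing-then-decreasing sequence."""
    n = len(seq)
    signs = []
    for i in range(n):
        diff = seq[(i + 1) % n] - seq[i]   # circular successor
        if diff != 0:                      # equal neighbours do not change direction
            signs.append(diff > 0)
    # Count direction changes around the circle (signs[-1] wraps to signs[0]).
    changes = sum(signs[i] != signs[i - 1] for i in range(len(signs)))
    return changes <= 2

print(is_bitonic([1, 4, 6, 8, 3, 2]))  # True
print(is_bitonic([6, 9, 4, 2, 3, 5]))  # True
print(is_bitonic([9, 8, 3, 2, 4, 6]))  # True
print(is_bitonic([4, 5, 7, 1, 2, 6]))  # False
```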
Towards a Bitonic Sorting Network
Half-Cleaner A half-cleaner is a comparison network of depth 1 in which input wire i is compared with wire i + n/2 for i = 1, 2,..., n/2.
We always assume that n is even.

Figure 27.7 shows HALF-CLEANER[8], the half-cleaner with 8 inputs and 8 outputs. When a bitonic sequence of 0’s and 1’s is applied as input to a half-cleaner, the half-cleaner produces an output sequence in which smaller values are in the top half, larger values are in the bottom half, and both halves are bitonic. In fact, at least one of the halves is clean (consisting of either all 0’s or all 1’s), and it is from this property that we derive the name “half-cleaner.” (Note that all clean sequences are bitonic.) The next lemma proves these properties of half-cleaners.

[Figure 27.7: The comparison network HALF-CLEANER[8]. Two different sample zero-one input and output values are shown. The input is assumed to be bitonic. A half-cleaner ensures that every output element of the top half is at least as small as every output element of the bottom half. Moreover, both halves are bitonic, and at least one half is clean.]

Lemma 27.3
If the input to a half-cleaner is a bitonic sequence of 0’s and 1’s, then the output satisfies the following properties:
  both the top half and the bottom half are bitonic,
  every element in the top half is at least as small as every element of the bottom half,
  at least one half is clean.
Proof The comparison network HALF-CLEANER[n] compares inputs $i$ and $i + n/2$ for $i = 1, 2, \ldots, n/2$. Without loss of generality, suppose that the input is of the form 00...011...100...0. (The situation in which the input is of the form 11...100...011...1 is symmetric.) There are three possible cases depending upon the block of consecutive 0’s or 1’s in which the midpoint $n/2$ falls, and one of these cases (the one in which the midpoint occurs in the block of 1’s) is further split into two cases. The four cases are shown in Figure 27.8. In each case shown, the lemma holds.

This suggests a recursive approach, since it now suffices to sort the top and bottom half separately.
Proof of Lemma 27.3

W.l.o.g. assume that the input is of the form $0^i 1^j 0^k$, for some $i, j, k \ge 0$.

[Figure 27.8: The possible comparisons in HALF-CLEANER[n], drawn in three stages (divide, compare, combine). The input sequence is assumed to be a bitonic sequence of 0’s and 1’s, and without loss of generality, we assume that it is of the form 00...011...100...0. Subsequences of 0’s are white, and subsequences of 1’s are gray. We can think of the $n$ inputs as being divided into two halves such that for $i = 1, 2, \ldots, n/2$, inputs $i$ and $i + n/2$ are compared. (a)–(b) Cases in which the division occurs in the middle subsequence of 1’s. (c)–(d) Cases in which the division occurs in a subsequence of 0’s. For all cases, every element in the top half of the output is at least as small as every element in the bottom half, both halves are bitonic, and at least one half is clean.]
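For a fixed small size, Lemma 27.3 can also be confirmed exhaustively. The sketch below builds a half-cleaner for lists of even length and tests the three properties on every zero-one bitonic input of length 8; all function names are illustrative.

```python
from itertools import product

def half_cleaner(values):
    """Compare position i with position i + n/2 (0-based), min on top."""
    out = list(values)
    half = len(out) // 2
    for i in range(half):
        if out[i] > out[i + half]:
            out[i], out[i + half] = out[i + half], out[i]
    return out

def is_binary_bitonic(seq):
    """0^i 1^j 0^k or 1^i 0^j 1^k: at most two changes between neighbours."""
    return sum(1 for x, y in zip(seq, seq[1:]) if x != y) <= 2

def lemma_27_3_holds(values):
    out = half_cleaner(values)
    half = len(out) // 2
    top, bottom = out[:half], out[half:]
    both_bitonic = is_binary_bitonic(top) and is_binary_bitonic(bottom)
    top_below_bottom = max(top) <= min(bottom)
    one_half_clean = len(set(top)) == 1 or len(set(bottom)) == 1
    return both_bitonic and top_below_bottom and one_half_clean

inputs = [s for s in product((0, 1), repeat=8) if is_binary_bitonic(s)]
print(half_cleaner([0, 0, 1, 1, 1, 0, 0, 0]))     # [0, 0, 0, 0, 1, 0, 1, 1]
print(all(lemma_27_3_holds(s) for s in inputs))   # True
```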
The Bitonic Sorter
[Figure 27.9: The comparison network BITONIC-SORTER[n], shown here for $n = 8$. (a) The recursive construction: HALF-CLEANER[n] followed by two copies of BITONIC-SORTER[n/2] that operate in parallel. (b) The network after unrolling the recursion. Each half-cleaner is shaded. Sample zero-one values are shown on the wires.]

Henceforth we will always assume that n is a power of 2.

Recursive formula for depth $D(n)$:

$$D(n) = \begin{cases} 0 & \text{if } n = 1, \\ D(n/2) + 1 & \text{if } n = 2^k. \end{cases}$$

BITONIC-SORTER[n] has depth $\log n$ and sorts any zero-one bitonic sequence.

The bitonic sorter

By recursively combining half-cleaners, as shown in Figure 27.9, we can build a bitonic sorter, which is a network that sorts bitonic sequences. The first stage of BITONIC-SORTER[n] consists of HALF-CLEANER[n], which, by Lemma 27.3, produces two bitonic sequences of half the size such that every element in the top half is at least as small as every element in the bottom half. Thus, we can complete the sort by using two copies of BITONIC-SORTER[n/2] to sort the two halves recursively. In Figure 27.9(a), the recursion has been shown explicitly, and in Figure 27.9(b), the recursion has been unrolled to show the progressively smaller half-cleaners that make up the remainder of the bitonic sorter. The depth $D(n)$ of BITONIC-SORTER[n] is given by the recurrence

$$D(n) = \begin{cases} 0 & \text{if } n = 1, \\ D(n/2) + 1 & \text{if } n = 2^k \text{ and } k \ge 1, \end{cases}$$

whose solution is $D(n) = \lg n$.

Thus, a zero-one bitonic sequence can be sorted by BITONIC-SORTER, which has a depth of $\lg n$. It follows by the analog of the zero-one principle given as Exercise 27.3-6 that any bitonic sequence of arbitrary numbers can be sorted by this network.
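The recursive construction translates almost directly into code. The following sketch generates the comparators of BITONIC-SORTER[n] for $n$ a power of 2 and measures the depth with the wire-depth rule; `bitonic_sorter`, `run_network` and `network_depth` are illustrative names, and wires are numbered from 0.

```python
def bitonic_sorter(lo, n):
    """Comparators of BITONIC-SORTER[n] on wires lo, ..., lo + n - 1 (n a power of 2)."""
    if n == 1:
        return []
    half = n // 2
    # HALF-CLEANER[n]: compare wire i with wire i + n/2.
    stage = [(lo + i, lo + i + half) for i in range(half)]
    # Two copies of BITONIC-SORTER[n/2] on the top and bottom halves.
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def run_network(comparators, values):
    """Route values through the comparators: min to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

def network_depth(comparators, n):
    """Input wires have depth 0; comparator outputs have depth max(dx, dy) + 1."""
    depth = [0] * n
    for top, bottom in comparators:
        d = max(depth[top], depth[bottom]) + 1
        depth[top] = depth[bottom] = d
    return max(depth)

net = bitonic_sorter(0, 8)
print(run_network(net, [0, 0, 1, 1, 1, 0, 0, 0]))  # [0, 0, 0, 0, 0, 1, 1, 1]
print(run_network(net, [1, 4, 6, 8, 7, 3, 2, 0]))  # [0, 1, 2, 3, 4, 6, 7, 8]
print(network_depth(net, 8))                       # 3
```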
Exercises
27.3-1 How many zero-one bitonic sequences of length n are there?

Merging Networks
Merging Networks
  can merge two sorted input sequences into one sorted output sequence
  will be based on a modification of BITONIC-SORTER[n]

Basic Idea:
  consider two given sequences $X = 00000111$, $Y = 00001111$
  concatenating $X$ with $Y^R$ (the reversal of $Y$) ⇒ 0000011111110000

This sequence is bitonic!

Hence in order to merge the sequences $X$ and $Y$, it suffices to perform a bitonic sort on $X$ concatenated with $Y^R$.
Construction of a Merging Network (1/2)

Given two sorted sequences $\langle a_1, a_2, \ldots, a_{n/2} \rangle$ and $\langle a_{n/2+1}, a_{n/2+2}, \ldots, a_n \rangle$
We know it suffices to bitonically sort $\langle a_1, a_2, \ldots, a_{n/2}, a_n, a_{n-1}, \ldots, a_{n/2+1} \rangle$
Recall: first half-cleaner of BITONIC-SORTER[n] compares $i$ and $n/2 + i$
⇒ First part of MERGER[n] compares inputs $i$ and $n - i + 1$ for $i = 1, 2, \ldots, n/2$
Remaining part is identical to BITONIC-SORTER[n]

[Figure 27.10: Comparing the first stage of MERGER[n] with HALF-CLEANER[n], for $n = 8$. (a) The first stage of MERGER[n] transforms the two monotonic input sequences $\langle a_1, a_2, \ldots, a_{n/2} \rangle$ and $\langle a_{n/2+1}, a_{n/2+2}, \ldots, a_n \rangle$ into two bitonic sequences $\langle b_1, b_2, \ldots, b_{n/2} \rangle$ and $\langle b_{n/2+1}, b_{n/2+2}, \ldots, b_n \rangle$. (b) The equivalent operation for HALF-CLEANER[n]. The bitonic input sequence $\langle a_1, a_2, \ldots, a_{n/2-1}, a_{n/2}, a_n, a_{n-1}, \ldots, a_{n/2+2}, a_{n/2+1} \rangle$ is transformed into the two bitonic sequences $\langle b_1, b_2, \ldots, b_{n/2} \rangle$ and $\langle b_n, b_{n-1}, \ldots, b_{n/2+1} \rangle$.]

Lemma 27.3 still applies, since the reversal of a bitonic sequence is bitonic.

We can construct MERGER[n] by modifying the first half-cleaner of BITONIC-SORTER[n]. The key is to perform the reversal of the second half of the inputs implicitly. Given two sorted sequences $\langle a_1, a_2, \ldots, a_{n/2} \rangle$ and $\langle a_{n/2+1}, a_{n/2+2}, \ldots, a_n \rangle$ to be merged, we want the effect of bitonically sorting the sequence $\langle a_1, a_2, \ldots, a_{n/2}, a_n, a_{n-1}, \ldots, a_{n/2+1} \rangle$. Since the first half-cleaner of BITONIC-SORTER[n] compares inputs $i$ and $n/2 + i$, for $i = 1, 2, \ldots, n/2$, we make the first stage of the merging network compare inputs $i$ and $n - i + 1$. Figure 27.10 shows the correspondence. The only subtlety is that the order of the outputs from the bottom of the first stage of MERGER[n] is reversed compared with the order of outputs from an ordinary half-cleaner. Since the reversal of a bitonic sequence is bitonic, however, the top and bottom outputs of the first stage of the merging network satisfy the properties in Lemma 27.3, and thus the top and bottom can be bitonically sorted in parallel to produce the sorted output of the merging network.

The resulting merging network is shown in Figure 27.11. Only the first stage of MERGER[n] is different from BITONIC-SORTER[n]. Consequently, the depth of MERGER[n] is $\lg n$, the same as that of BITONIC-SORTER[n].
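A short sketch of MERGER[n] along these lines: the first stage compares wire $i$ with wire $n - i + 1$ (written 0-based in the code as $i$ and $n - 1 - i$), and the rest reuses the bitonic sorter on each half. The helper names are illustrative, and the example inputs $X = 00000111$, $Y = 00001111$ are the ones from the basic idea above.

```python
def bitonic_sorter(lo, n):
    """Comparators of BITONIC-SORTER[n] on wires lo, ..., lo + n - 1."""
    if n == 1:
        return []
    half = n // 2
    stage = [(lo + i, lo + i + half) for i in range(half)]
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def merger(lo, n):
    """Comparators of MERGER[n]: modified first stage, then two bitonic sorters."""
    if n == 1:
        return []
    half = n // 2
    # 1-based inputs i and n - i + 1 become 0-based wires i and n - 1 - i.
    first = [(lo + i, lo + n - 1 - i) for i in range(half)]
    return first + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def run_network(comparators, values):
    """Route values through the comparators: min to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

X = [0, 0, 0, 0, 0, 1, 1, 1]
Y = [0, 0, 0, 0, 1, 1, 1, 1]
print(run_network(merger(0, 16), X + Y) == sorted(X + Y))          # True
print(run_network(merger(0, 8), [1, 4, 7, 9, 2, 3, 8, 10]))        # [1, 2, 3, 4, 7, 8, 9, 10]
```

Note that no explicit reversal of the second input is needed; the changed first stage performs it implicitly.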
Construction of a Merging Network (2/2)

[Figure 27.11: A network that merges two sorted input sequences into one sorted output sequence. The network MERGER[n] can be viewed as BITONIC-SORTER[n] with the first half-cleaner altered to compare inputs $i$ and $n - i + 1$ for $i = 1, 2, \ldots, n/2$. Here, $n = 8$. (a) The network decomposed into the first stage followed by two parallel copies of BITONIC-SORTER[n/2]. (b) The same network with the recursion unrolled. Sample zero-one values are shown on the wires, and the stages are shaded.]

Exercises

27.4-1 Prove an analog of the zero-one principle for merging networks. Specifically, show that a comparison network that can merge any two monotonically increasing sequences of 0’s and 1’s can merge any two monotonically increasing sequences of arbitrary numbers.

27.4-2 How many different zero-one input sequences must be applied to the input of a comparison network to verify that it is a merging network?

27.4-3 Show that any network that can merge 1 item with $n - 1$ sorted items to produce a sorted sequence of length $n$ must have depth at least $\lg n$.

27.4-4 ⋆ Consider a merging network with inputs $a_1, a_2, \ldots, a_n$, for $n$ an exact power of 2, in which the two monotonic sequences to be merged are $\langle a_1, a_3, \ldots, a_{n-1} \rangle$ and $\langle a_2, a_4, \ldots, a_n \rangle$. Prove that the number of comparators in this kind of merging network is $\Omega(n \lg n)$. Why is this an interesting lower bound? (Hint: Partition the comparators into three sets.)

27.4-5 ⋆ Prove that any merging network, regardless of the order of inputs, requires $\Omega(n \lg n)$ comparators.

Construction of a Sorting Network
Main Components
1. BITONIC-SORTER[n]
   sorts any bitonic sequence
   depth $\log n$
2. MERGER[n]
   merges two sorted input sequences
   depth $\log n$

Batcher’s Sorting Network
SORTER[n] is defined recursively:
  If $n = 2^k$, use two copies of SORTER[n/2] to sort two subsequences of length $n/2$ each. Then merge them using MERGER[n].
  If $n = 1$, network consists of a single wire.
can be seen as a parallel version of merge sort
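Putting the two components together gives a compact generator for SORTER[n]. The sketch below is one possible rendering of the recursive definition, with illustrative function names and 0-based wires; it checks the result with the zero-one principle and measures the depth directly.

```python
from itertools import product

def bitonic_sorter(lo, n):
    """Comparators of BITONIC-SORTER[n] on wires lo, ..., lo + n - 1."""
    if n == 1:
        return []
    half = n // 2
    stage = [(lo + i, lo + i + half) for i in range(half)]
    return stage + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def merger(lo, n):
    """Comparators of MERGER[n]: first stage pairs wire i with wire n - 1 - i (0-based)."""
    if n == 1:
        return []
    half = n // 2
    first = [(lo + i, lo + n - 1 - i) for i in range(half)]
    return first + bitonic_sorter(lo, half) + bitonic_sorter(lo + half, half)

def sorter(lo, n):
    """SORTER[n]: two copies of SORTER[n/2], then MERGER[n]; a single wire for n = 1."""
    if n == 1:
        return []
    half = n // 2
    return sorter(lo, half) + sorter(lo + half, half) + merger(lo, n)

def run_network(comparators, values):
    """Route values through the comparators: min to the top wire, max to the bottom."""
    wires = list(values)
    for top, bottom in comparators:
        if wires[top] > wires[bottom]:
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires

def network_depth(comparators, n):
    """Input wires have depth 0; comparator outputs have depth max(dx, dy) + 1."""
    depth = [0] * n
    for top, bottom in comparators:
        d = max(depth[top], depth[bottom]) + 1
        depth[top] = depth[bottom] = d
    return max(depth)

net = sorter(0, 8)
ok = all(run_network(net, bits) == sorted(bits) for bits in product((0, 1), repeat=8))
print(ok)                     # True
print(network_depth(net, 8))  # 6
```

For $n = 8$ the measured depth is $1 + 2 + 3 = 6$, one merger of depth $\lg m$ for each size $m = 2, 4, 8$.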
Unrolling the Recursion (Figure 27.12)

[Figure 27.12: The sorting network SORTER[n] constructed by recursively combining merging networks. (a) The recursive construction. (b) Unrolling the recursion. (c) Replacing the MERGER boxes with the actual merging networks. The depth of each comparator is indicated, and sample zero-one values are shown on the wires.]

Recursion for $D(n)$:

$$D(n) = \begin{cases} 0 & \text{if } n = 1, \\ D(n/2) + \log n & \text{if } n = 2^k. \end{cases}$$

Solution: $D(n) = \Theta(\log^2 n)$.

SORTER[n] has depth $\Theta(\log^2 n)$ and sorts any input.

27.5-2 Show that the depth of SORTER[n] is exactly $(\lg n)(\lg n + 1)/2$.

27.5-3 Suppose that we have $2n$ elements $\langle a_1, a_2, \ldots, a_{2n} \rangle$ and wish to partition them into the $n$ smallest and the $n$ largest. Prove that we can do this in constant additional depth after separately sorting $\langle a_1, a_2, \ldots, a_n \rangle$ and $\langle a_{n+1}, a_{n+2}, \ldots, a_{2n} \rangle$.

27.5-4 ⋆ Let $S(k)$ be the depth of a sorting network with $k$ inputs, and let $M(k)$ be the depth of a merging network with $2k$ inputs. Suppose that we have a sequence of $n$ numbers to be sorted and we know that every number is within $k$ positions of its correct position in the sorted order. Show that we can sort the $n$ numbers in depth $S(k) + 2M(k)$.

A Glimpse at the AKS Network
Ajtai, Komlós, Szemerédi (1983) There exists a sorting network with depth O(log n).
Quite elaborate construction, and it involves huge constants.
Perfect Halver A perfect halver is a comparator network that, given any input, places the n/2 smaller keys in b1,..., bn/2 and the n/2 larger keys in bn/2+1,..., bn.
Perfect halvers of depth log₂ n exist; this yields sorting networks of depth Θ((log n)²).
Approximate Halver An (n, ε)-approximate halver, ε < 1, is a comparator network that for every k = 1, 2,..., n/2 places at most εk of its k smallest keys in bn/2+1,..., bn and at most εk of its k largest keys in b1,..., bn/2.
We will prove that such networks can be constructed in constant depth!
Expander Graphs

A bipartite (n, d, µ)-expander is a graph G with:
• G has n vertices (n/2 on each side)
• the edge-set is the union of d matchings
• for every subset S ⊆ V contained in one part, |N(S)| ≥ min{µ · |S|, n/2 − |S|}
[Figure: bipartite graph with parts L and R.]

Expander Graphs:
• probabilistic construction “easy”: take d (disjoint) random matchings
• explicit construction is a deep mathematical problem with ties to number theory, group theory, combinatorics etc.
• many applications in networking, complexity theory and coding theory
From Expanders to Approximate Halvers
[Figure: a bipartite graph with parts L and R on vertices 1 to 10, drawn next to the corresponding comparator network on wires 1 to 10.]
Existence of Approximate Halvers

Proof:
• X := wires with the k smallest inputs
• Y := wires in lower half with k smallest outputs
• For every u ∈ N(Y): ∃ comparator (u, v)
• Let u_t, v_t be their keys after the comparator
• Let u_d, v_d be their keys at the output
• Note that v_d ∈ Y ⊆ X
• Further: u_d ≤ u_t ≤ v_t ≤ v_d ⇒ u_d ∈ X
• Since u was arbitrary: |Y| + |N(Y)| ≤ k.
• Since G is a bipartite (n, d, µ)-expander: |Y| + |N(Y)| ≥ |Y| + min{µ|Y|, n/2 − |Y|} = min{(1 + µ)|Y|, n/2}.
• Combining the two bounds above yields: (1 + µ)|Y| ≤ k. (Here we used that k ≤ n/2.)
• The same argument shows that at most ε · k, ε := 1/(µ + 1), of the k largest input keys are placed in b1,..., bn/2.
[Figure: a comparator (u, v) with keys u_t, v_t directly after the comparator and u_d, v_d at the output.]
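The construction can be tried out numerically. The following Python sketch is only an illustration under stated assumptions: the helper names (`random_matchings`, `apply_halver`, `misplaced_small_keys`) are hypothetical, and the d random matchings are drawn independently without enforcing disjointness, so the graph is not guaranteed to be an expander. Each matching becomes one layer of comparators between the top half (wires u) and the bottom half (wires v).

```python
import random

def random_matchings(n, d, rng):
    """Draw d random perfect matchings between top wires [0, n/2) and bottom wires [n/2, n)."""
    half = n // 2
    matchings = []
    for _ in range(d):
        bottom = list(range(half, n))
        rng.shuffle(bottom)
        matchings.append(list(zip(range(half), bottom)))
    return matchings

def apply_halver(keys, matchings):
    """One comparator layer per matching: the smaller key goes to the top wire u."""
    out = list(keys)
    for matching in matchings:
        for u, v in matching:
            if out[u] > out[v]:
                out[u], out[v] = out[v], out[u]
    return out

def misplaced_small_keys(out, k):
    """Number of the k smallest keys that ended up in the bottom half (keys assumed distinct)."""
    half = len(out) // 2
    threshold = sorted(out)[k - 1]
    return sum(1 for key in out[half:] if key <= threshold)

rng = random.Random(0)
n, d = 16, 4
matchings = random_matchings(n, d, rng)
keys = list(range(n))
rng.shuffle(keys)
out = apply_halver(keys, matchings)
print(misplaced_small_keys(out, k=4))
```

The chain u_d ≤ u_t ≤ v_t ≤ v_d from the proof shows up here as an invariant: for every comparator (u, v), the final key on u is at most the final key on v, because top wires only ever receive smaller keys and bottom wires only ever receive larger ones.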
AKS network vs. Batcher’s network

Richard J. Lipton (Georgia Tech): “The AKS sorting network is galactic: it needs that n be larger than 2^78 or so to finally be smaller than Batcher’s network for n items.”

Donald E. Knuth (Stanford): “Batcher’s method is much better, unless n exceeds the total memory capacity of all computers on earth!”
Siblings of Sorting Network

Sorting Networks
• sorts any input of size n
• special case of Comparison Networks
• building block: comparator (inputs 7, 2 become outputs 2, 7)

Switching (Shuffling) Networks
• creates a random permutation of n items
• special case of Permutation Networks
• building block: switch (inputs 7, 2 become outputs ?, ?)

Counting Networks
• balances any stream of tokens over n wires
• special case of Balancing Networks
• building block: balancer (inputs 7, 2 become outputs 5, 4)
Counting Network

Distributed Counting
Processors collectively assign successive values from a given range. Values could represent addresses in memories or destinations on an interconnection network.

Balancing Networks
• constructed in a similar manner to sorting networks
• instead of comparators, they consist of balancers
• balancers are asynchronous flip-flops that forward tokens from their inputs to one of their two outputs alternately (top, bottom, top,...)
• the number of tokens on the two outputs of a balancer differs by at most one
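A single balancer can be modelled as a toggle. The sketch below is a minimal illustration, not part of the slides; the class name `Balancer` and its method `traverse` are hypothetical.

```python
class Balancer:
    """Toggle that forwards tokens alternately to its top (0) and bottom (1) output."""

    def __init__(self):
        self.next_output = 0      # the first token leaves on the top wire
        self.counts = [0, 0]      # tokens seen so far on each output

    def traverse(self):
        """Route one token (arriving on either input) and return its output wire."""
        wire = self.next_output
        self.counts[wire] += 1
        self.next_output = 1 - wire
        return wire

balancer = Balancer()
exits = [balancer.traverse() for _ in range(7)]
print(exits, balancer.counts)
```

After seven tokens the top output has received four and the bottom output three, so the two counts differ by at most one regardless of which inputs the tokens arrived on.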
Bitonic Counting Network

Counting Network (Formal Definition)
1. Let x1, x2,..., xn be the number of tokens (ever received) on the designated input wires
2. Let y1, y2,..., yn be the number of tokens (ever received) on the designated output wires
3. In a quiescent state: $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$
4. A counting network is a balancing network with the step-property: 0 ≤ yi − yj ≤ 1 for any i < j.
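The step property is easy to check mechanically. The following Python helper is a small sketch; the function name `has_step_property` is an assumption made for illustration.

```python
def has_step_property(y):
    """True iff 0 <= y[i] - y[j] <= 1 for every pair of positions i < j."""
    return all(0 <= y[i] - y[j] <= 1
               for i in range(len(y))
               for j in range(i + 1, len(y)))

print(has_step_property([2, 2, 1, 1]))   # counts form a single downward step
print(has_step_property([2, 1, 2, 1]))   # a later wire holds more tokens than an earlier one
```

A sequence with the step property is determined by its sum alone: the first few wires carry one token more than the remaining ones.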
Correctness of the Bitonic Counting Network

Facts
Let x1,..., xn and y1,..., yn have the step property. Then:
1. We have $\sum_{i=1}^{n/2} x_{2i-1} = \left\lceil \frac{1}{2} \sum_{i=1}^{n} x_i \right\rceil$, and $\sum_{i=1}^{n/2} x_{2i} = \left\lfloor \frac{1}{2} \sum_{i=1}^{n} x_i \right\rfloor$
2. If $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$, then xi = yi for i = 1,..., n.
3. If $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i + 1$, then ∃! j = 1, 2,..., n with xj = yj + 1 and xi = yi for i ≠ j.

Key Lemma
Consider a MERGER[n]. Then if the inputs x1,..., xn/2 and xn/2+1,..., xn have the step property, then so does the output y1,..., yn.

Proof (by induction on n)
• Case n = 2 is clear, since MERGER[2] is a single balancer
• n > 2: Let $z_1, \ldots, z_{n/2}$ and $z'_1, \ldots, z'_{n/2}$ be the outputs of the MERGER[n/2] subnetworks
• IH ⇒ $z_1, \ldots, z_{n/2}$ and $z'_1, \ldots, z'_{n/2}$ have the step property
• Let $Z := \sum_{i=1}^{n/2} z_i$ and $Z' := \sum_{i=1}^{n/2} z'_i$
• F1 ⇒ $Z = \left\lceil \frac{1}{2} \sum_{i=1}^{n/2} x_i \right\rceil + \left\lfloor \frac{1}{2} \sum_{i=n/2+1}^{n} x_i \right\rfloor$ and $Z' = \left\lfloor \frac{1}{2} \sum_{i=1}^{n/2} x_i \right\rfloor + \left\lceil \frac{1}{2} \sum_{i=n/2+1}^{n} x_i \right\rceil$
• Case 1: If $Z = Z'$, then F2 implies the output of MERGER[n] is $y_i = z_{1+\lfloor (i-1)/2 \rfloor}$ ✓
• Case 2: If $|Z - Z'| = 1$, F3 implies $z_i = z'_i$ for i = 1,..., n/2 except a unique j with $z_j \neq z'_j$. The balancer between $z_j$ and $z'_j$ will ensure that the step property holds.

[Figure: MERGER[8] with the outputs z1,..., z4 and z′1,..., z′4 of the two MERGER[4] subnetworks feeding the final layer of balancers.]
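The recursive structure used in the proof can be simulated on token counts. The sketch below is an illustration under assumptions: it works only with quiescent states (a balancer that received t tokens has emitted ⌈t/2⌉ on top and ⌊t/2⌋ on the bottom), and the wiring of the two MERGER[n/2] subnetworks (odd-indexed wires of the first half with even-indexed wires of the second half, and vice versa) is the one suggested by the use of Fact 1 above. The function names are hypothetical.

```python
def balancer(a, b):
    """Quiescent output counts of one balancer that received a + b tokens."""
    total = a + b
    return (total + 1) // 2, total // 2        # top gets the ceiling, bottom the floor

def merger(x):
    """MERGER[n] on counts; both halves of x are assumed to have the step property."""
    n = len(x)
    if n == 2:
        return list(balancer(x[0], x[1]))
    first, second = x[:n // 2], x[n // 2:]
    z = merger(first[0::2] + second[1::2])         # x1, x3, ... with the even-indexed wires of the second half
    z_prime = merger(first[1::2] + second[0::2])   # x2, x4, ... with the odd-indexed wires of the second half
    y = []
    for a, b in zip(z, z_prime):                   # final layer: one balancer per pair (z_i, z'_i)
        y.extend(balancer(a, b))
    return y

def bitonic(x):
    """Bitonic counting network on counts: recurse on both halves, then merge."""
    n = len(x)
    if n == 1:
        return list(x)
    return merger(bitonic(x[:n // 2]) + bitonic(x[n // 2:]))

def has_step_property(y):
    return all(0 <= y[i] - y[j] <= 1
               for i in range(len(y)) for j in range(i + 1, len(y)))

print(bitonic([1, 1, 3, 1]))
```

For input counts (1, 1, 3, 1) the simulation distributes the six tokens as (2, 2, 1, 1), which has the step property.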
Bitonic Counting Network in Action

[Figure: a bitonic counting network on four wires. Tokens enter on x1 (token 4), x2 (token 2), x3 (tokens 5, 3, 1) and x4 (token 6); they leave on y1 (tokens 1, 5), y2 (tokens 2, 6), y3 (token 3) and y4 (token 4).]

Counting can be done as follows: Add a local counter to each output wire i, to assign consecutive numbers i, i + n, i + 2 · n,...
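The local-counter rule can be sketched in a few lines of Python. The function name `assign_values` is hypothetical; wires are numbered from 1 as on the slide.

```python
def assign_values(output_counts):
    """Values handed out on output wire i (1-based): i, i + n, i + 2n, ..."""
    n = len(output_counts)
    return [[i + r * n for r in range(count)]
            for i, count in enumerate(output_counts, start=1)]

values = assign_values([2, 2, 1, 1])
print(values)
```

With output counts (2, 2, 1, 1) the wires hand out 1 and 5, 2 and 6, 3, and 4. Because the counts have the step property, the assigned values are exactly 1,..., 6 with no gaps.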
A Periodic Counting Network [Aspnes, Herlihy, Shavit, JACM 1994]

[Figure: periodic counting network with inputs x1,..., x8 and outputs y1,..., y8.]

Consists of log n BLOCK[n] networks, each of which has depth log n
From Counting to Sorting

Counting vs. Sorting
If a network is a counting network, then it is also a sorting network. The converse is not true!

Proof.
• Let C be a counting network, and S be the corresponding sorting network
• Consider an input sequence a1, a2,..., an ∈ {0, 1}^n to S
• Define an input x1, x2,..., xn ∈ {0, 1}^n to C by xi = 1 iff ai = 0.
• C is a counting network ⇒ all ones will be routed to the lower-indexed wires
• S corresponds to C ⇒ all zeros will be routed to the lower-indexed wires
• By the Zero-One Principle, S is a sorting network.

[Figure: the counting network C on a zero-one input next to the corresponding sorting network S on the complemented input.]
Communication Models: Diffusion vs. Matching

[Figure: a cycle on vertices 1,..., 6, shown once for the diffusion model with its matrix M (nonzero entries 1/3) and once for the matching model with its matrix M^(t) (entries 1/2 on matched pairs).]
Smoothness of the Load Distribution

• let x ∈ R^n be a load vector
• x̄ denotes the average load

Metrics
• ℓ2-norm: $\Phi^t = \sqrt{\sum_{i=1}^{n} (x_i^t - \bar{x})^2}$
• makespan: $\max_{i=1}^{n} x_i^t$
• discrepancy: $\max_{i=1}^{n} x_i^t - \min_{i=1}^{n} x_i^t$

[Figure: example graph with loads 3, 3, 6.5, 2, 2, 3.5, 1.5, 2.5.]

For this example:
• $\Phi^t = \sqrt{0^2 + 0^2 + 3.5^2 + 0.5^2 + 1^2 + 1^2 + 1.5^2 + 0.5^2} = \sqrt{17}$
• $\max_{i=1}^{n} x_i^t = 6.5$
• $\max_{i=1}^{n} x_i^t - \min_{i=1}^{n} x_i^t = 5$
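The three metrics can be recomputed directly. In the Python sketch below the function name `smoothness_metrics` is hypothetical, and the load vector is the multiset of loads read off the example figure (the assignment of loads to particular nodes does not affect any of the metrics).

```python
import math

def smoothness_metrics(loads):
    """Return (l2 deviation from the average, makespan, discrepancy)."""
    average = sum(loads) / len(loads)
    phi = math.sqrt(sum((x - average) ** 2 for x in loads))
    return phi, max(loads), max(loads) - min(loads)

loads = [3, 3, 6.5, 2, 2, 3.5, 1.5, 2.5]      # average load is 3
phi, makespan, discrepancy = smoothness_metrics(loads)
print(phi, makespan, discrepancy)
```

The deviations from the average are 0, 0, 3.5, −1, −1, 0.5, −1.5, −0.5, whose squares sum to 17, in agreement with the values on the slide.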
Diffusion Matrix

Given an undirected, connected graph G = (V, E) and a diffusion parameter α > 0, the diffusion matrix M is defined as follows:
• $M_{ij} = \alpha$ if (i, j) ∈ E,
• $M_{ij} = 1 - \alpha \deg(i)$ if i = j, where deg(i) is the number of neighbors of i,
• $M_{ij} = 0$ otherwise.

How to choose α for a d-regular graph?
• α = 1/d may lead to oscillation (if the graph is bipartite)
• α = 1/(d+1) ensures convergence
• α = 1/(2d) ensures convergence (and all eigenvalues of M are non-negative)

Further let $\gamma(M) := \max_{\mu_i \neq 1} |\mu_i|$, where µ1 = 1 ≥ µ2 ≥ · · · ≥ µn ≥ −1 are the eigenvalues of M.

This can be also seen as a random walk on G!

First-Order Diffusion: Load vector $x^t$ satisfies $x^t = M \cdot x^{t-1}$.
γ(M) ∈ (0, 1] measures connectivity of G

• 1D grid: γ(M) ≈ 1 − 1/n²
• 2D grid: γ(M) ≈ 1 − 1/n
• 3D grid: γ(M) ≈ 1 − 1/n^(2/3)
• Hypercube: γ(M) ≈ 1 − 1/log n
• Random Graph: γ(M) < 1
• Complete Graph: γ(M) ≈ 0
Diffusion on a Ring

[Figure: a ring of 8 nodes with the load of each node after successive iterations. The frames for the initial state, iteration 1 and iteration 20 are listed below, clockwise starting from the top node.]

initial loads: 0, 0, 10, 0, 0, 5, 0, 0
after iteration 1: 0, 3.33, 3.33, 3.33, 1.67, 1.67, 1.67, 0
after iteration 20: 1.86, 1.88, 1.90, 1.90, 1.88, 1.86, 1.85, 1.85
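The ring example can be replayed with a few lines of Python. This is a sketch under an assumption: the slides do not state α for this example, and α = 1/3 (that is, 1/(d+1) with d = 2) is chosen here because it reproduces the loads shown after iteration 1. The function names are hypothetical.

```python
def diffusion_step(loads, alpha):
    """One round x^t = M x^(t-1) on a ring: keep 1 - 2*alpha, receive alpha from each neighbour."""
    n = len(loads)
    return [(1 - 2 * alpha) * loads[i]
            + alpha * (loads[(i - 1) % n] + loads[(i + 1) % n])
            for i in range(n)]

def diffuse(loads, alpha, iterations):
    for _ in range(iterations):
        loads = diffusion_step(loads, alpha)
    return loads

x0 = [0, 0, 10, 0, 0, 5, 0, 0]         # clockwise from the top node
x1 = diffuse(x0, 1 / 3, 1)
x20 = diffuse(x0, 1 / 3, 20)
print([round(v, 2) for v in x1])
print([round(v, 2) for v in x20])
```

The total load of 15 is preserved in every round, and after 20 rounds every node is close to the average load 15/8 = 1.875.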
Convergence of the Quadratic Error (Upper Bound)

Lemma
Let $\gamma(M) := \max_{\mu_i \neq 1} |\mu_i|$, where µ1 = 1 ≥ µ2 ≥ · · · ≥ µn ≥ −1 are the eigenvalues of M. Then for any iteration t,
$\Phi^t \le \gamma(M)^{2t} \cdot \Phi^0$.

Proof:
• Let $e^t = x^t - \bar{x}$, where $\bar{x}$ is the column vector with all entries set to the average load
• Express $e^t$ through the orthogonal basis given by the eigenvectors of M:
$e^t = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = \sum_{i=2}^{n} \alpha_i v_i$,
where the last step holds since $e^t$ is orthogonal to $v_1$
• For the diffusion scheme,
$e^{t+1} = M e^t = M \cdot \left( \sum_{i=2}^{n} \alpha_i v_i \right) = \sum_{i=2}^{n} \alpha_i \mu_i v_i$.
• Taking norms and using that the $v_i$’s are orthogonal,
$\|e^{t+1}\|_2^2 = \|M e^t\|_2^2 = \sum_{i=2}^{n} \alpha_i^2 \mu_i^2 \|v_i\|_2^2 \le \gamma^2 \sum_{i=2}^{n} \alpha_i^2 \|v_i\|_2^2 = \gamma^2 \cdot \|e^t\|_2^2$
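The bound can be checked numerically on a ring. The sketch below makes several assumptions: Φ is taken as the squared ℓ2 error, as in the proof; the ring has 8 nodes with α = 1/3; and γ(M) is obtained from the closed-form eigenvalues 1 − 2α + 2α·cos(2πk/n) of the ring's circulant diffusion matrix. All function names are hypothetical.

```python
import math

def diffusion_step(loads, alpha):
    n = len(loads)
    return [(1 - 2 * alpha) * loads[i]
            + alpha * (loads[(i - 1) % n] + loads[(i + 1) % n])
            for i in range(n)]

def ring_gamma(n, alpha):
    """Largest absolute eigenvalue other than the eigenvalue 1 (which occurs at k = 0)."""
    eigenvalues = [1 - 2 * alpha + 2 * alpha * math.cos(2 * math.pi * k / n)
                   for k in range(n)]
    return max(abs(mu) for mu in eigenvalues[1:])

def potential(loads):
    """Squared l2 distance of the load vector from the average load."""
    average = sum(loads) / len(loads)
    return sum((x - average) ** 2 for x in loads)

alpha = 1 / 3
loads = [0, 0, 10, 0, 0, 5, 0, 0]
gamma = ring_gamma(len(loads), alpha)
phi0 = potential(loads)
bound_holds = True
for t in range(1, 21):
    loads = diffusion_step(loads, alpha)
    # small slack for floating-point rounding
    bound_holds = bound_holds and potential(loads) <= gamma ** (2 * t) * phi0 + 1e-9
print(gamma, bound_holds)
```

For this ring γ(M) ≈ 0.805, so the squared error shrinks by a factor of at least about 0.65 per round.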
Convergence of the Quadratic Error (Lower Bound)

Lemma
For any eigenvalue $\mu_i$, 1 ≤ i ≤ n, there is an initial load vector $x^0$ so that
$\Phi^t = \mu_i^{2t} \cdot \Phi^0$.

Proof:
• Let $x^0 = \bar{x} + v_i$, where $v_i$ is the eigenvector corresponding to $\mu_i$
• Then
$e^t = M e^{t-1} = M^t e^0 = M^t v_i = \mu_i^t v_i$,
and
$\Phi^t = \|e^t\|_2^2 = \mu_i^{2t} \|v_i\|_2^2 = \mu_i^{2t} \Phi^0$.
Outlook: Idealised versus Discrete Case

Idealised Case
• $x^t = M \cdot x^{t-1} = M^t \cdot x^0$
• Linear System
• corresponds to Markov chain
• well-understood
• Given any load vector $x^0$, the number of iterations until $x^t$ satisfies $\Phi^t \le \epsilon$ is at most $\frac{\log(\Phi^0/\epsilon)}{1 - \gamma(M)}$.

Discrete Case
• Here load consists of integers that cannot be divided further.
• $y^t = M \cdot y^{t-1} + \Delta^t = M^t \cdot y^0 + \sum_{s=1}^{t} M^{t-s} \cdot \Delta^s$, where $\Delta^t$ is the rounding error
• Non-Linear System
• rounding of a Markov chain
• harder to analyze
• How close can it be made to the idealised case?
II. Matrix Multiplication Thomas Sauerwald
Easter 2015 Outline
Introduction
Serial Matrix Multiplication
Reminder: Multithreading
Multithreaded Matrix Multiplication
4.2 Strassen’s algorithm for matrix multiplication

Matrix Multiplication

Remember: If A = (aij) and B = (bij) are square n × n matrices, then the matrix product C = A · B is defined by
$c_{ij} = \sum_{k=1}^{n} a_{ik} \cdot b_{kj} \quad \forall i, j = 1, 2, \ldots, n.$ (4.8)

We must compute n² matrix entries, and each is the sum of n values. The following procedure takes n × n matrices A and B and multiplies them, returning their n × n product C. We assume that each matrix has an attribute rows, giving the number of rows in the matrix.

SQUARE-MATRIX-MULTIPLY(A, B)
1 n = A.rows
2 let C be a new n × n matrix
3 for i = 1 to n
4     for j = 1 to n
5         cij = 0
6         for k = 1 to n
7             cij = cij + aik · bkj
8 return C

The SQUARE-MATRIX-MULTIPLY procedure works as follows. The for loop of lines 3–7 computes the entries of each row i.

This definition suggests that n · n² = n³ arithmetic operations are necessary.
SQUARE-MATRIX-MULTIPLY(A, B) takes time Θ(n³).
Divide & Conquer: First Approach

Assumption: n is always an exact power of 2.

Divide & Conquer: Partition A, B, and C into four n/2 × n/2 matrices:
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$

Hence the equation C = A · B becomes:
$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \cdot \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$

This corresponds to the four equations:
C11 = A11 · B11 + A12 · B21
C12 = A11 · B12 + A12 · B22
C21 = A21 · B11 + A22 · B21
C22 = A21 · B12 + A22 · B22

Each equation specifies two multiplications of n/2 × n/2 matrices and the addition of their products.
Divide & Conquer: First Approach (Pseudocode)

SQUARE-MATRIX-MULTIPLY-RECURSIVE(A, B)
1 n = A.rows
2 let C be a new n × n matrix
3 if n == 1
4     c11 = a11 · b11
5 else partition A, B, and C as in equations (4.9)
6     C11 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A11, B11) + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A12, B21)
7     C12 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A11, B12) + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A12, B22)
8     C21 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A21, B11) + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A22, B21)
9     C22 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A21, B12) + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A22, B22)
10 return C

• Line 5: Handle submatrices implicitly through index calculations instead of creating them.
• Lines 6–9: 8 multiplications, 4 additions and partitioning
• Goal: Reduce the number of multiplications

Let T(n) be the runtime of this procedure. Then:
T(n) = Θ(1) if n = 1,
T(n) = 8 · T(n/2) + Θ(n²) if n > 1.
Solution: T(n) = Θ(8^(log₂ n)) = Θ(n³). No improvement over the naive algorithm!

This pseudocode glosses over one subtle but important implementation detail. How do we partition the matrices in line 5? If we were to create 12 new n/2 × n/2 matrices, we would spend Θ(n²) time copying entries. In fact, we can partition the matrices without copying entries. The trick is to use index calculations. We identify a submatrix by a range of row indices and a range of column indices of the original matrix. We end up representing a submatrix a little differently from how we represent the original matrix, which is the subtlety we are glossing over. The advantage is that, since we can specify submatrices by index calculations, executing line 5 takes only Θ(1) time (although we shall see that it makes no difference asymptotically to the overall running time whether we copy or partition in place).

Now, we derive a recurrence to characterize the running time of SQUARE-MATRIX-MULTIPLY-RECURSIVE. Let T(n) be the time to multiply two n × n matrices using this procedure. In the base case, when n = 1, we perform just the one scalar multiplication in line 4, and so
T(1) = Θ(1). (4.15)
The recursive case occurs when n > 1. As discussed, partitioning the matrices in line 5 takes Θ(1) time, using index calculations. In lines 6–9, we recursively call SQUARE-MATRIX-MULTIPLY-RECURSIVE a total of eight times. Because each recursive call multiplies two n/2 × n/2 matrices, thereby contributing T(n/2) to the overall running time, the time taken by all eight recursive calls is 8T(n/2). We also must account for the four matrix additions in lines 6–9. Each of these matrices contains n²/4 entries, and so each of the four matrix additions takes Θ(n²) time.

Divide & Conquer (Second Approach)
Idea: Make the recursion tree less bushy by performing only 7 recursive multiplications of n/2 × n/2 matrices.
Strassen’s Algorithm (1969)
1. Partition each of the matrices into four n/2 × n/2 submatrices
2. Create 10 matrices S1, S2,..., S10. Each is n/2 × n/2 and is the sum or difference of two matrices created in the previous step.
3. Recursively compute 7 matrix products P1, P2,..., P7, each n/2 × n/2
4. Compute n/2 × n/2 submatrices of C by adding and subtracting various combinations of the Pi.

Time for steps 1, 2, 4: Θ(n²), hence T(n) = 7 · T(n/2) + Θ(n²) ⇒ T(n) = Θ(n^(log₂ 7)).
Details of Strassen’s Algorithm

The 10 Submatrices and 7 Products
P1 = A11 · S1 = A11 · (B12 − B22)
P2 = S2 · B22 = (A11 + A12) · B22
P3 = S3 · B11 = (A21 + A22) · B11
P4 = A22 · S4 = A22 · (B21 − B11)
P5 = S5 · S6 = (A11 + A22) · (B11 + B22)
P6 = S7 · S8 = (A12 − A22) · (B21 + B22)
P7 = S9 · S10 = (A11 − A21) · (B11 + B12)

Claim
$\begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix} = \begin{pmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2 \\ P_3 + P_4 & P_5 + P_1 - P_3 - P_7 \end{pmatrix}$

Proof:
P5 + P4 − P2 + P6 = A11B11 + A11B22 + A22B11 + A22B22 + A22B21 − A22B11 − A11B22 − A12B22 + A12B21 + A12B22 − A22B21 − A22B22 = A11B11 + A12B21

The other three blocks can be verified similarly.
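The seven products and the four combinations from the claim translate directly into code. The following Python sketch is an illustration only: it assumes n is a power of 2, copies submatrices instead of using index calculations, and uses hypothetical helper names (`add`, `sub`, `split`, `strassen`, `naive`).

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def split(M):
    """Return the four n/2 x n/2 blocks M11, M12, M21, M22 (copies, for simplicity)."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    P1 = strassen(A11, sub(B12, B22))
    P2 = strassen(add(A11, A12), B22)
    P3 = strassen(add(A21, A22), B11)
    P4 = strassen(A22, sub(B21, B11))
    P5 = strassen(add(A11, A22), add(B11, B22))
    P6 = strassen(sub(A12, A22), add(B21, B22))
    P7 = strassen(sub(A11, A21), add(B11, B12))
    C11 = add(sub(add(P5, P4), P2), P6)
    C12 = add(P1, P2)
    C21 = add(P3, P4)
    C22 = sub(sub(add(P5, P1), P3), P7)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom

def naive(A, B):
    """Reference implementation following the definition of the matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Comparing against the naive triple loop on small integer matrices is a quick way to confirm all four blocks of the claim, not only the one verified in the proof.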
Current State-of-the-Art

Conjecture: Does a quadratic-time algorithm exist?

Asymptotic Complexities:
• O(n^3), naive approach
• O(n^2.808), Strassen (1969)
• O(n^2.796), Pan (1978)
• O(n^2.522), Schönhage (1981)
• O(n^2.517), Romani (1982)
• O(n^2.496), Coppersmith and Winograd (1982)
• O(n^2.479), Strassen (1986)
• O(n^2.376), Coppersmith and Winograd (1989)
• O(n^2.374), Stothers (2010)
• O(n^2.3728642), V. Williams (2011)
• O(n^2.3728639), Le Gall (2014)
• ...
Memory Models

Distributed Memory
• Each processor has its private memory
• Access to memory of another processor via messages
[Figure: processors 1 to 6, each with its own memory.]

Shared Memory
• Central location of memory
• Each processor has direct access
[Figure: processors 1 to 6 attached to one shared memory.]
Dynamic Multithreading

• Programming a shared-memory parallel computer is difficult
• Use a concurrency platform which coordinates all resources (scheduling jobs, communication protocols, load balancing etc.)

Functionalities:
• spawn: (optional) prefix to a procedure call statement; the procedure is executed in a separate thread
• sync: wait until all spawned threads are done
• parallel: (optional) prefix to the standard loop for; each iteration is called in its own thread

Only logical parallelism, but not actual! Need a scheduler to map threads to processors.
Computing Fibonacci Numbers Recursively (Fig. 27.1)

[Figure 27.1: The tree of recursive procedure instances when computing FIB(6), with root FIB(6), children FIB(5) and FIB(4), and so on down to the leaves FIB(1) and FIB(0). Each instance of FIB with the same argument does the same work to produce the same result, providing an inefficient but interesting way to compute Fibonacci numbers.]

FIB(n)
    if n<=1 return n
    else x=FIB(n-1)
         y=FIB(n-2)
         return x+y

Very inefficient – exponential time!

You would not really want to compute large Fibonacci numbers this way, because this computation does much repeated work. Figure 27.1 shows the tree of recursive procedure instances that are created when computing F6. For example, a call to FIB(6) recursively calls FIB(5) and then FIB(4). But, the call to FIB(5) also results in a call to FIB(4). Both instances of FIB(4) return the same result (F4 = 3). Since the FIB procedure does not memoize, the second call to FIB(4) replicates the work that the first call performs.

Let T(n) denote the running time of FIB(n). Since FIB(n) contains two recursive calls plus a constant amount of extra work, we obtain the recurrence
T(n) = T(n − 1) + T(n − 2) + Θ(1).
This recurrence has solution T(n) = Θ(Fn), which we can show using the substitution method. For an inductive hypothesis, assume that T(n) ≤ aFn − b, where a > 1 and b > 0 are constants.
Computing Fibonacci Numbers in Parallel (Fig. 27.2)

P-FIB(n)
    if n<=1 return n
    else x=spawn P-FIB(n-1)
         y=spawn P-FIB(n-2)
         sync
         return x+y

• Without spawn and sync same pseudocode as before
• spawn does not imply parallel execution (depends on scheduler)

[Figure: recursion tree of P-FIB(4) with children P-FIB(3) and P-FIB(2), down to the leaves P-FIB(1) and P-FIB(0). Total work ≈ 17 nodes, longest path: 8 nodes.]

Computation Dag G = (V, E)
• V set of threads (instructions/strands without parallel control)
• E set of dependencies
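The spawn and sync keywords can be imitated with operating-system threads. The sketch below is only an analogy under assumptions: it uses Python's `threading` module, where `Thread.start` plays the role of spawn and `Thread.join` the role of sync, whereas a real concurrency platform schedules logical threads onto processors far more cheaply. The name `p_fib` and the result dictionary are illustrative choices.

```python
import threading

def p_fib(n):
    """Fibonacci with both recursive calls spawned, then a sync before adding."""
    if n <= 1:
        return n
    results = {}

    def run(name, argument):
        results[name] = p_fib(argument)

    x_thread = threading.Thread(target=run, args=("x", n - 1))   # x = spawn P-FIB(n-1)
    y_thread = threading.Thread(target=run, args=("y", n - 2))   # y = spawn P-FIB(n-2)
    x_thread.start()
    y_thread.start()
    x_thread.join()                                              # sync
    y_thread.join()
    return results["x"] + results["y"]

print(p_fib(10))
```

Removing the thread handling and calling `p_fib` directly for both arguments gives back the serial procedure, which mirrors the remark that the pseudocode without spawn and sync is the same as before.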
Computing Fibonacci Numbers in Parallel (DAG Perspective)

[Figure: computation dag of P-FIB(4); the nodes are labelled 4, 3, 2, 1, 0.]
Performance Measures

Work
Total time to execute everything on single processor.

Span
Longest time to execute the threads along any path.
If each thread takes unit time, span is the length of the critical path.

[Figure: example computation dag with thread running times 4, 3, 6, 5, 2, 5, 1, 4; annotations Σ = 26, Σ = 18 and #nodes = 5.]
Work Law and Span Law

• T1 = work, T∞ = span
• P = number of (identical) processors
• TP = running time on P processors

Running time actually also depends on scheduler etc.!

Work Law
TP ≥ T1/P
Time on P processors can’t be shorter than if all processors work all the time.

Span Law
TP ≥ T∞
Time on P processors can’t be shorter than time on ∞ processors.

• Speed-Up: T1/TP. Maximum Speed-Up bounded by P!
• Parallelism: T1/T∞. Maximum Speed-Up for ∞ processors!

[Figure: example dag annotated with T1 = 8, P = 4 and T∞ = 5.]
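Work, span and the two laws can be evaluated for P-FIB. The sketch below rests on a modelling assumption taken from the textbook figure: every non-base call contributes three unit-time strands (before the spawn, between the spawn and the sync, and after the sync) and every base case one strand. The function names are hypothetical.

```python
def work(n):
    """Number of unit-time strands in the dag of P-FIB(n)."""
    if n <= 1:
        return 1
    return 3 + work(n - 1) + work(n - 2)

def span(n):
    """Number of strands on a longest path in the dag of P-FIB(n)."""
    if n <= 1:
        return 1
    through_spawned_child = 2 + span(n - 1)   # first strand, child, strand after sync
    through_called_child = 3 + span(n - 2)    # first strand, middle strand, child, strand after sync
    return max(through_spawned_child, through_called_child)

def lower_bound(t1, t_inf, p):
    """Combine the work law T_P >= T_1 / P with the span law T_P >= T_inf."""
    return max(t1 / p, t_inf)

print(work(4), span(4), work(4) / span(4))
print(lower_bound(8, 5, 4))
```

For P-FIB(4) this model gives work 17 and span 8, matching the counts quoted for the recursion tree, so the parallelism is 17/8. With the slide's numbers T1 = 8, T∞ = 5 and P = 4, the span law is the binding constraint.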
Warmup: Matrix Vector Multiplication

Remember: Multiplying an n × n matrix A = (aij) and n-vector x = (xj) yields an n-vector y = (yi) given by
$y_i = \sum_{j=1}^{n} a_{ij} x_j \quad \text{for } i = 1, 2, \ldots, n.$

Parallel loops
Many algorithms contain loops all of whose iterations can operate in parallel. As we shall see, we can parallelize such loops using the spawn and sync keywords, but it is much more convenient to specify directly that the iterations of such loops can run concurrently. Our pseudocode provides this functionality via the parallel concurrency keyword, which precedes the for keyword in a for loop statement. We can perform matrix-vector multiplication by computing all the entries of y in parallel as follows:

MAT-VEC(A, x)
1 n = A.rows
2 let y be a new vector of length n
3 parallel for i = 1 to n
4     yi = 0
5 parallel for i = 1 to n
6     for j = 1 to n
7         yi = yi + aij xj
8 return y

The parallel for-loops can be used since different entries of y can be computed concurrently.
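The row-parallel structure of MAT-VEC can be sketched with a thread pool. This is an analogy under assumptions, not the concurrency platform of the slides: `concurrent.futures.ThreadPoolExecutor` stands in for the parallel for loop, and the names `mat_vec` and `row_times_x` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def mat_vec(A, x):
    """Compute y = A x, with one task per row handed to a thread pool."""

    def row_times_x(row):
        total = 0                      # y_i = 0
        for a_ij, x_j in zip(row, x):  # serial inner loop over j
            total += a_ij * x_j
        return total

    with ThreadPoolExecutor() as pool:
        # pool.map returns the results in row order, so y[i] belongs to row i
        return list(pool.map(row_times_x, A))

print(mat_vec([[1, 2], [3, 4]], [5, 6]))
```

Each task only reads A and x and produces its own entry of y, so the tasks do not interfere with each other.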
In this code, the parallel for keywords in lines 3 and 5 indicate that the itera- tions of theHow respective can a compiler loops may implement be run concurrently. the parallel A compiler for can-loop? implement each parallel for loop as a divide-and-conquer subroutine using nested parallelism. For example, the parallel for loop in lines 5–7 can be implemented with the call MAT-VEC-MAINII.-L MatrixOOP Multiplication.A; x; y; n; 1; n/ Multithreaded,wherethecompilerproducestheauxil- Matrix Multiplication 19 iary subroutine MAT-VEC-MAIN-LOOP as follows: 786 Chapter 27 Multithreaded Algorithms
Implementing parallel for based on Divide-and-Conquer

MAT-VEC-MAIN-LOOP(A, x, y, n, i, i′)
1 if i == i′
2     for j = 1 to n
3         y_i = y_i + a_ij x_j
4 else mid = ⌊(i + i′)/2⌋
5     spawn MAT-VEC-MAIN-LOOP(A, x, y, n, i, mid)
6     MAT-VEC-MAIN-LOOP(A, x, y, n, mid + 1, i′)
7     sync

[Figure 27.4: binary recursion tree whose nodes are labelled with the index ranges 1,8; then 1,4 and 5,8; then 1,2, 3,4, 5,6 and 7,8; and finally the leaves 1,1 to 8,8.]

Figure 27.4 A dag representing the computation of MAT-VEC-MAIN-LOOP(A, x, y, 8, 1, 8). The two numbers within each rounded rectangle give the values of the last two parameters (i and i′ in the procedure header) in the invocation (spawn or call) of the procedure. The black circles represent strands corresponding to either the base case or the part of the procedure up to the spawn of MAT-VEC-MAIN-LOOP in line 5; the shaded circles represent strands corresponding to the part of the procedure that calls MAT-VEC-MAIN-LOOP in line 6 up to the sync in line 7, where it suspends until the spawned subroutine in line 5 returns; and the white circles represent strands corresponding to the (negligible) part of the procedure after the sync up to the point where it returns.

This code recursively spawns the first half of the iterations of the loop to execute in parallel with the second half of the iterations and then executes a sync, thereby creating a binary tree of execution where the leaves are individual loop iterations, as shown in Figure 27.4.

To calculate the work T1(n) of MAT-VEC on an n × n matrix, we simply compute the running time of its serialization, which we obtain by replacing the parallel for loops with ordinary for loops. Thus, we have T1(n) = Θ(n²), because the quadratic running time of the doubly nested loops in lines 5–7 dominates.

T1(n) = Θ(n²)
Work is equal to the running time of its serialization; the overhead of recursive spawning does not change the asymptotics.

T∞(n) = Θ(log n) + max_{1≤i≤n} iter∞(i) = Θ(n)
Span is the depth of the recursive calls plus the maximum span of any of the n iterations.

Naive Algorithm in Parallel
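The divide-and-conquer expansion of the parallel for loop can be sketched in Python. This is only an illustration under assumptions: the function names `mat_vec` and `mat_vec_main_loop` are hypothetical, indices are 0-based rather than 1-based, and `threading.Thread` stands in for spawn (with `join` standing in for sync). In CPython the global interpreter lock means this shows the structure of the computation rather than a real speedup.

```python
import threading


def mat_vec_main_loop(A, x, y, lo, hi):
    """Handle loop iterations lo..hi (inclusive): y[i] += sum_j A[i][j] * x[j]."""
    if lo == hi:
        # Base case: a single loop iteration, executed serially.
        for j in range(len(x)):
            y[lo] += A[lo][j] * x[j]
    else:
        mid = (lo + hi) // 2
        # "spawn": the first half may run in parallel with the second half.
        child = threading.Thread(target=mat_vec_main_loop, args=(A, x, y, lo, mid))
        child.start()
        mat_vec_main_loop(A, x, y, mid + 1, hi)
        # "sync": wait for the spawned half before returning.
        child.join()


def mat_vec(A, x):
    """Matrix-vector product; each y[i] is written by exactly one leaf, so no races."""
    n = len(A)
    y = [0] * n
    if n > 0:
        mat_vec_main_loop(A, x, y, 0, n - 1)
    return y
```

For example, `mat_vec([[1, 2], [3, 4]], [5, 6])` computes the two entries of y in separate leaves of the recursion tree.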
P-SQUARE-MATRIX-MULTIPLY(A, B)
1 n = A.rows
2 let C be a new n × n matrix
3 parallel for i = 1 to n
4     parallel for j = 1 to n
5         c_ij = 0
6         for k = 1 to n
7             c_ij = c_ij + a_ik · b_kj
8 return C
To analyze this algorithm, observe that since the serialization of the algorithm is just SQUARE-MATRIX-MULTIPLY, the work is therefore simply T1(n) = Θ(n³), the same as the running time of SQUARE-MATRIX-MULTIPLY. The span is T∞(n) = Θ(n), because it follows a path down the tree of recursion for the parallel for loop starting in line 3, then down the tree of recursion for the parallel for loop starting in line 4, and then executes all n iterations of the ordinary for loop starting in line 6, resulting in a total span of Θ(lg n) + Θ(lg n) + Θ(n) = Θ(n). Thus, the parallelism is Θ(n³)/Θ(n) = Θ(n²). Exercise 27.2-3 asks you to parallelize the inner loop to obtain a parallelism of Θ(n³/lg n), which you cannot do straightforwardly using parallel for, because you would create races.

P-SQUARE-MATRIX-MULTIPLY(A, B) has work T1(n) = Θ(n³) and span T∞(n) = Θ(n).
The first two nested for-loops parallelise perfectly.
A divide-and-conquer multithreaded algorithm for matrix multiplication
As we learned in Section 4.2, we can multiply n × n matrices serially in time Θ(n^{lg 7}) = O(n^{2.81}) using Strassen’s divide-and-conquer strategy, which motivates us to look at multithreading such an algorithm. We begin, as we did in Section 4.2, with multithreading a simpler divide-and-conquer algorithm.
Recall from page 77 that the SQUARE-MATRIX-MULTIPLY-RECURSIVE procedure, which multiplies two n × n matrices A and B to produce the n × n matrix C, relies on partitioning each of the three matrices into four n/2 × n/2 submatrices:

\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}. \]

Then, we can write the matrix product as

\[ \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11} & A_{11}B_{12} \\ A_{21}B_{11} & A_{21}B_{12} \end{pmatrix} + \begin{pmatrix} A_{12}B_{21} & A_{12}B_{22} \\ A_{22}B_{21} & A_{22}B_{22} \end{pmatrix}. \tag{27.6} \]

Thus, to multiply two n × n matrices, we perform eight multiplications of n/2 × n/2 matrices and one addition of n × n matrices. The following pseudocode implements this divide-and-conquer strategy using nested parallelism. Unlike the SQUARE-MATRIX-MULTIPLY-RECURSIVE procedure on which it is based, P-MATRIX-MULTIPLY-RECURSIVE takes the output matrix as a parameter to avoid allocating matrices unnecessarily.

The Simple Divide&Conquer Approach in Parallel
P-MATRIX-MULTIPLY-RECURSIVE(C, A, B)
 1 n = A.rows
 2 if n == 1
 3     c11 = a11 b11
 4 else let T be a new n × n matrix
 5     partition A, B, C, and T into n/2 × n/2 submatrices A11, A12, A21, A22; B11, B12, B21, B22; C11, C12, C21, C22; and T11, T12, T21, T22; respectively
 6     spawn P-MATRIX-MULTIPLY-RECURSIVE(C11, A11, B11)
 7     spawn P-MATRIX-MULTIPLY-RECURSIVE(C12, A11, B12)
 8     spawn P-MATRIX-MULTIPLY-RECURSIVE(C21, A21, B11)
 9     spawn P-MATRIX-MULTIPLY-RECURSIVE(C22, A21, B12)
10     spawn P-MATRIX-MULTIPLY-RECURSIVE(T11, A12, B21)
11     spawn P-MATRIX-MULTIPLY-RECURSIVE(T12, A12, B22)
12     spawn P-MATRIX-MULTIPLY-RECURSIVE(T21, A22, B21)
13     P-MATRIX-MULTIPLY-RECURSIVE(T22, A22, B22)
14     sync
15     parallel for i = 1 to n
16         parallel for j = 1 to n
17             c_ij = c_ij + t_ij

Line 3 handles the base case, where we are multiplying 1 × 1 matrices. We handle the recursive case in lines 4–17. We allocate a temporary matrix T in line 4, and line 5 partitions each of the matrices A, B, C, and T into n/2 × n/2 submatrices. (As with SQUARE-MATRIX-MULTIPLY-RECURSIVE on page 77, we gloss over the minor issue of how to use index calculations to represent submatrix sections of a matrix.) The recursive call in line 6 sets the submatrix C11 to the submatrix product A11B11, so that C11 equals the first of the two terms that form its sum in equation (27.6). Similarly, lines 7–9 set C12, C21, and C22 to the first of the two terms that equal their sums in equation (27.6). Line 10 sets the submatrix T11 to the submatrix product A12B21, so that T11 equals the second of the two terms that form C11’s sum. Lines 11–13 set T12, T21, and T22 to the second of the two terms that form the sums of C12, C21, and C22, respectively. The first seven recursive calls are spawned, and the last one runs in the main strand. The sync statement in line 14 ensures that all the submatrix products in lines 6–13 have been computed before lines 15–17 add T into C.

P-MATRIX-MULTIPLY-RECURSIVE has work T1(n) = Θ(n³) (the same as before) and span T∞(n) = Θ(log² n), which follows from the recurrence T∞(n) = T∞(n/2) + Θ(log n).

Strassen’s Algorithm in Parallel
Strassen’s Algorithm (parallelised)
1. Partition each of the matrices into four n/2 × n/2 submatrices.
   This step takes Θ(1) work and span by index calculations.
2. Create 10 matrices S1, S2, ..., S10. Each is n/2 × n/2 and is the sum or difference of two matrices created in the previous step.
   Can create all 10 matrices with Θ(n²) work and Θ(log n) span using doubly nested parallel for loops.
3. Recursively compute 7 matrix products P1, P2, ..., P7, each n/2 × n/2.
   Recursively spawn the computation of the seven products.
4. Compute n/2 × n/2 submatrices of C by adding and subtracting various combinations of the Pi.
   Using doubly nested parallel for loops, this takes Θ(n²) work and Θ(log n) span.

T1(n) = Θ(n^{log 7})
T∞(n) = Θ(log² n)
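The span and work bounds for the parallel version of Strassen's algorithm can be checked by unrolling the recurrences. The following is a sketch, assuming n is a power of 2 and logarithms are base 2; constants hidden in the Θ-notation are ignored, and the work bound uses the master theorem with log 7 > 2.

```latex
\begin{align*}
T_\infty(n) &= T_\infty(n/2) + \Theta(\log n) \\
            &= \sum_{k=0}^{\log n - 1} \Theta\bigl(\log (n/2^k)\bigr) + \Theta(1) \\
            &= \Theta\Bigl(\sum_{k=0}^{\log n - 1} (\log n - k)\Bigr)
             = \Theta(\log^2 n), \\[4pt]
T_1(n)      &= 7\,T_1(n/2) + \Theta(n^2) = \Theta\bigl(n^{\log 7}\bigr).
\end{align*}
```

The same span recurrence applies to P-MATRIX-MULTIPLY-RECURSIVE, since in both algorithms the recursive products run in parallel and each level adds only Θ(log n) span for its parallel for loops.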
Matrix Multiplication and Matrix Inversion
Speedups for Matrix Inversion by an equivalence with Matrix Multiplication.
Theorem 28.1 (Multiplication is no harder than Inversion)
If we can invert an n × n matrix in time I(n), where I(n) = Ω(n²) and I(n) satisfies the regularity condition I(3n) = O(I(n)), then we can multiply two n × n matrices in time O(I(n)).

Proof: Define a 3n × 3n matrix D by:

\[ D = \begin{pmatrix} I_n & A & 0 \\ 0 & I_n & B \\ 0 & 0 & I_n \end{pmatrix} \quad\Rightarrow\quad D^{-1} = \begin{pmatrix} I_n & -A & AB \\ 0 & I_n & -B \\ 0 & 0 & I_n \end{pmatrix}. \]

Matrix D can be constructed in Θ(n²) = O(I(n)) time, and we can invert D in O(I(3n)) = O(I(n)) time.
⇒ We can compute AB in O(I(n)) time, by reading off the upper-right n × n block of D⁻¹.
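The reduction in the proof can be tried out directly. The sketch below is an illustration, not part of the lecture: `invert` is a plain Gauss-Jordan routine over exact rationals that stands in for "any inversion algorithm", and `multiply_via_inversion` is a hypothetical name for the construction of D followed by reading off the upper-right block of its inverse.

```python
from fractions import Fraction


def invert(M):
    """Gauss-Jordan inversion over the rationals (stand-in for any inversion routine)."""
    n = len(M)
    # Augment M with the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiply_via_inversion(A, B):
    """Compute AB by inverting the 3n x 3n matrix D from the proof."""
    n = len(A)
    D = [[0] * (3 * n) for _ in range(3 * n)]
    for i in range(3 * n):
        D[i][i] = 1                      # identity blocks on the diagonal
    for i in range(n):
        for j in range(n):
            D[i][n + j] = A[i][j]        # block (1, 2) holds A
            D[n + i][2 * n + j] = B[i][j]  # block (2, 3) holds B
    D_inv = invert(D)
    # The upper-right n x n block of D^{-1} is AB.
    return [row[2 * n:] for row in D_inv[:n]]
```

Since D is upper triangular with ones on the diagonal, it is always invertible, so the reduction works for arbitrary A and B.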
The Other Direction
Allows us to use Strassen’s Algorithm to invert a matrix!
Theorem 28.2 (Inversion is no harder than Multiplication) Suppose we can multiply two n × n real matrices in time M(n) and M(n) satisfies the two regularity conditions M(n + k) = O(M(n)) for any 0 ≤ k ≤ n and M(n/2) ≤ c · M(n) for some constant c < 1/2. Then we can compute the inverse of any real nonsingular n×n matrix in time O(M(n)).
The proof of this direction is much harder (see CLRS); it relies on properties of symmetric positive-definite (SPD) matrices.
III. Linear Programming Thomas Sauerwald
Easter 2015 Outline
Introduction
Standard and Slack Forms
Formulating Problems as Linear Programs
Simplex Algorithm
Finding an Initial Solution
III. Linear Programming Introduction 2 Introduction
Linear Programming (informal definition)
maximize or minimize an objective, given limited resources and competing constraints
constraints are specified as (in)equalities

Example: Political Advertising
Imagine you are a politician trying to win an election.
Your district has three different types of areas: urban, suburban and rural, each with, respectively, 100,000, 200,000 and 50,000 registered voters.
Aim: at least half of the registered voters in each of the three regions should vote for you.
Possible Actions: Advertise on one of the primary issues, which are (i) building more roads, (ii) gun control, (iii) farm subsidies and (iv) a gasoline tax dedicated to improving public transit.
Political Advertising Continued

policy            urban   suburban   rural
build roads        −2        5         3
gun control         8        2        −5
farm subsidies      0        0        10
gasoline tax       10        0        −2

The effects of policies on voters. Each entry describes the number of thousands of voters who could be won over by spending $1,000 on advertising in support of a policy on a particular issue.

Possible Solution:
$20,000 on advertising to building roads
$0 on advertising to gun control
$4,000 on advertising to farm subsidies
$9,000 on advertising to a gasoline tax
Total cost: $33,000
What is the best possible strategy?
Towards a Linear Program

policy            urban   suburban   rural
build roads        −2        5         3
gun control         8        2        −5
farm subsidies      0        0        10
gasoline tax       10        0        −2

The effects of policies on voters. Each entry describes the number of thousands of voters who could be won over (or lost, for negative entries) by spending $1,000 on advertising in support of a policy on a particular issue.
x1 = number of thousands of dollars spent on advertising on building roads
x2 = number of thousands of dollars spent on advertising on gun control
x3 = number of thousands of dollars spent on advertising on farm subsidies
x4 = number of thousands of dollars spent on advertising on gasoline tax

Constraints:
−2x1 + 8x2 + 0x3 + 10x4 ≥ 50
5x1 + 2x2 + 0x3 + 0x4 ≥ 100
3x1 − 5x2 + 10x3 − 2x4 ≥ 25

Objective: Minimize x1 + x2 + x3 + x4
The Linear Program

Linear Program for the Advertising Problem
minimize x1 + x2 + x3 + x4
subject to
−2x1 + 8x2 + 0x3 + 10x4 ≥ 50
5x1 + 2x2 + 0x3 + 0x4 ≥ 100
3x1 − 5x2 + 10x3 − 2x4 ≥ 25
x1, x2, x3, x4 ≥ 0

The solution of this linear program yields the optimal advertising strategy.
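A candidate strategy can be checked against the constraints mechanically. The sketch below is illustrative only: the names `EFFECT`, `REQUIRED`, `is_feasible` and `cost` are hypothetical, and the second candidate uses the values that CLRS reports as optimal for this example (quoted here as an assumption; the code verifies feasibility and cost, not optimality).

```python
from fractions import Fraction

# Rows: urban, suburban, rural. Columns: roads, gun control, farm subsidies, gasoline tax.
EFFECT = [
    [-2, 8, 0, 10],
    [5, 2, 0, 0],
    [3, -5, 10, -2],
]
# Thousands of voters needed in each region (half of 100, 200 and 50 thousand).
REQUIRED = [50, 100, 25]


def is_feasible(x):
    """Check non-negativity and the three 'at least half the voters' constraints."""
    if any(v < 0 for v in x):
        return False
    return all(sum(a * v for a, v in zip(row, x)) >= need
               for row, need in zip(EFFECT, REQUIRED))


def cost(x):
    """Objective value: total spending in thousands of dollars."""
    return sum(x)


slide_strategy = [20, 0, 4, 9]
# Assumption: values reported in CLRS as the optimal solution.
clrs_strategy = [Fraction(2050, 111), Fraction(425, 111), Fraction(0), Fraction(625, 111)]
```

Both candidates satisfy all constraints; the first costs 33 (thousand dollars), the second 3100/111, which is roughly 27.93.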
Formal Definition of Linear Program
Given a1, a2,..., an and a set of variables x1, x2,..., xn, a linear function f is defined by
f (x1, x2,..., xn) = a1x1 + a2x2 + ··· + anxn.
Linear Equality: f(x1, x2, ..., xn) = b
Linear Inequality: f(x1, x2, ..., xn) ≤ b or f(x1, x2, ..., xn) ≥ b
Linear equalities and linear inequalities are together called Linear Constraints.
Linear-Programming Problem: either minimize or maximize a linear function subject to a set of linear constraints
A Small(er) Example

maximize x1 + x2
subject to
4x1 − x2 ≤ 8
2x1 + x2 ≤ 10
5x1 − 2x2 ≥ −2
x1, x2 ≥ 0

[Figure: the feasible region in the (x1, x2)-plane, bounded by the lines 4x1 − x2 ≤ 8, 2x1 + x2 ≤ 10, 5x1 − 2x2 ≥ −2, x1 ≥ 0 and x2 ≥ 0, together with the level lines x1 + x2 = 0, 1, ..., 8.]

Any setting of x1 and x2 satisfying all constraints is a feasible solution.
Graphical Procedure: Move the line x1 + x2 = z as far up as possible.
While the same approach also works in higher dimensions, we need a more systematic and algebraic procedure.
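For a two-variable program like this one, the graphical procedure can be imitated by brute force: intersect every pair of constraint lines, keep the feasible intersection points, and pick the one with the largest objective value. The sketch below assumes the program is bounded and feasible; all names (`CONSTRAINTS`, `intersection`, `best_vertex`) are hypothetical, and every constraint is written in the form a·x1 + b·x2 ≤ r.

```python
from fractions import Fraction
from itertools import combinations

# Each entry ((a, b), r) encodes a*x1 + b*x2 <= r.
CONSTRAINTS = [
    ((4, -1), 8),    # 4x1 - x2 <= 8
    ((2, 1), 10),    # 2x1 + x2 <= 10
    ((-5, 2), 2),    # 5x1 - 2x2 >= -2, negated
    ((-1, 0), 0),    # x1 >= 0
    ((0, -1), 0),    # x2 >= 0
]
OBJECTIVE = (1, 1)   # maximize x1 + x2


def intersection(c1, c2):
    """Intersection point of the two boundary lines, or None if they are parallel."""
    (a1, b1), r1 = c1
    (a2, b2), r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule with exact rational arithmetic.
    return (Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det))


def feasible(p):
    return all(a * p[0] + b * p[1] <= r for (a, b), r in CONSTRAINTS)


def best_vertex():
    """Enumerate feasible intersection points and return the best one."""
    vertices = []
    for c1, c2 in combinations(CONSTRAINTS, 2):
        p = intersection(c1, c2)
        if p is not None and feasible(p):
            vertices.append(p)
    return max(vertices, key=lambda p: OBJECTIVE[0] * p[0] + OBJECTIVE[1] * p[1])
```

On this instance the best vertex is (2, 6) with objective value 8, the intersection of 2x1 + x2 = 10 and 5x1 − 2x2 = −2. The number of candidate intersections grows quickly with the number of constraints and variables, which is one reason a more systematic procedure is needed.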
Standard and Slack Forms

Standard Form

\[ \text{maximize} \quad \sum_{j=1}^{n} c_j x_j \qquad \text{(Objective Function)} \]
\[ \text{subject to} \quad \sum_{j=1}^{n} a_{ij} x_j \le b_i \quad \text{for } i = 1, 2, \ldots, m \]
\[ \phantom{\text{subject to}} \quad x_j \ge 0 \quad \text{for } j = 1, 2, \ldots, n \qquad \text{(Non-Negativity Constraints)} \]

In total there are n + m constraints.

Standard Form (Matrix-Vector-Notation)

maximize cᵀx (inner product of two vectors)
subject to
Ax ≤ b (matrix-vector product)
x ≥ 0
Converting Linear Programs into Standard Form

Reasons for a LP not being in standard form:
1. The objective might be a minimization rather than maximization.
2. There might be variables without nonnegativity constraints.
3. There might be equality constraints.
4. There might be inequality constraints with ≥ instead of ≤.
Goal: Convert linear program into an equivalent program which is in standard form
Equivalence: a correspondence (not necessarily a bijection) between solutions so that their objective values are identical.
When switching from maximization to minimization, sign of objective value changes.
Converting into Standard Form (1/5)

Reasons for a LP not being in standard form:
1. The objective might be a minimization rather than maximization.

minimize −2x1 + 3x2
subject to
x1 + x2 = 7
x1 − 2x2 ≤ 4
x1 ≥ 0

Negate objective function

maximize 2x1 − 3x2
subject to
x1 + x2 = 7
x1 − 2x2 ≤ 4
x1 ≥ 0
Converting into Standard Form (2/5)

Reasons for a LP not being in standard form:
2. There might be variables without nonnegativity constraints.

maximize 2x1 − 3x2
subject to
x1 + x2 = 7
x1 − 2x2 ≤ 4
x1 ≥ 0

Replace x2 by two non-negative variables x2′ and x2″ (that is, x2 = x2′ − x2″)

maximize 2x1 − 3x2′ + 3x2″
subject to
x1 + x2′ − x2″ = 7
x1 − 2x2′ + 2x2″ ≤ 4
x1, x2′, x2″ ≥ 0
Converting into Standard Form (3/5)

Reasons for a LP not being in standard form:
3. There might be equality constraints.

maximize 2x1 − 3x2′ + 3x2″
subject to
x1 + x2′ − x2″ = 7
x1 − 2x2′ + 2x2″ ≤ 4
x1, x2′, x2″ ≥ 0

Replace each equality by two inequalities.

maximize 2x1 − 3x2′ + 3x2″
subject to
x1 + x2′ − x2″ ≥ 7
x1 + x2′ − x2″ ≤ 7
x1 − 2x2′ + 2x2″ ≤ 4
x1, x2′, x2″ ≥ 0
Converting into Standard Form (4/5)

Reasons for a LP not being in standard form:
4. There might be inequality constraints with ≥ instead of ≤.

maximize 2x1 − 3x2′ + 3x2″
subject to
x1 + x2′ − x2″ ≥ 7
x1 + x2′ − x2″ ≤ 7
x1 − 2x2′ + 2x2″ ≤ 4
x1, x2′, x2″ ≥ 0

Negate respective inequalities.

maximize 2x1 − 3x2′ + 3x2″
subject to
x1 + x2′ − x2″ ≤ 7
−x1 − x2′ + x2″ ≤ −7
x1 − 2x2′ + 2x2″ ≤ 4
x1, x2′, x2″ ≥ 0
Converting into Standard Form (5/5)

Rename variable names (for consistency).

maximize 2x1 − 3x2 + 3x3
subject to
x1 + x2 − x3 ≤ 7
−x1 − x2 + x3 ≤ −7
x1 − 2x2 + 2x3 ≤ 4
x1, x2, x3 ≥ 0
It is always possible to convert a linear program into standard form.
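The four conversion steps can be combined into one routine. The sketch below is a minimal illustration under assumptions: the function name `to_standard_form` and its input encoding (a sense string, coefficient lists, and a list of booleans marking which variables are already non-negative) are hypothetical choices, not notation from the lecture.

```python
def to_standard_form(sense, objective, constraints, nonneg):
    """
    Convert a linear program into standard form.

    sense       -- "min" or "max"
    objective   -- list of objective coefficients
    constraints -- list of (coefficients, op, rhs) with op in {"<=", ">=", "="}
    nonneg      -- nonneg[j] is True if variable j has a constraint x_j >= 0

    Returns (c, A, b) for: maximize c.x subject to A x <= b, x >= 0.
    """
    # Step 1: turn minimization into maximization by negating the objective.
    obj = list(objective) if sense == "max" else [-v for v in objective]

    # Step 2: replace each unrestricted variable x_j by x_j' - x_j''.
    def expand(coeffs):
        out = []
        for v, is_nonneg in zip(coeffs, nonneg):
            out.append(v)
            if not is_nonneg:
                out.append(-v)
        return out

    c = expand(obj)
    A, b = [], []
    for coeffs, op, rhs in constraints:
        row = expand(coeffs)
        # Step 3: an equality becomes a "<=" and a ">=" constraint.
        if op in ("<=", "="):
            A.append(row)
            b.append(rhs)
        # Step 4: a ">=" constraint is negated into a "<=" constraint.
        if op in (">=", "="):
            A.append([-v for v in row])
            b.append(-rhs)
    return c, A, b
```

Applied to the running example (minimize −2x1 + 3x2 with x1 + x2 = 7, x1 − 2x2 ≤ 4 and only x1 ≥ 0), the routine returns the objective 2x1 − 3x2 + 3x3 and the three "≤" constraints with right-hand sides 7, −7 and 4.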
Converting Standard Form into Slack Form (1/3)

Goal: Convert standard form into slack form, where all constraints except for the non-negativity constraints are equalities.

For the simplex algorithm, it is more convenient to work with equality constraints.

Introducing Slack Variables
Let \( \sum_{j=1}^{n} a_{ij} x_j \le b_i \) be an inequality constraint.
Introduce a slack variable s by

\[ s = b_i - \sum_{j=1}^{n} a_{ij} x_j, \qquad s \ge 0. \]

s measures the slack between the two sides of the inequality.
Denote the slack variable of the ith inequality by x_{n+i}.
Converting Standard Form into Slack Form (2/3)

maximize 2x1 − 3x2 + 3x3
subject to
x1 + x2 − x3 ≤ 7
−x1 − x2 + x3 ≤ −7
x1 − 2x2 + 2x3 ≤ 4
x1, x2, x3 ≥ 0

Introduce slack variables

maximize 2x1 − 3x2 + 3x3
subject to
x4 = 7 − x1 − x2 + x3
x5 = −7 + x1 + x2 − x3
x6 = 4 − x1 + 2x2 − 2x3
x1, x2, x3, x4, x5, x6 ≥ 0
Converting Standard Form into Slack Form (3/3)

maximize 2x1 − 3x2 + 3x3
subject to
x4 = 7 − x1 − x2 + x3
x5 = −7 + x1 + x2 − x3
x6 = 4 − x1 + 2x2 − 2x3
x1, x2, x3, x4, x5, x6 ≥ 0

Use variable z to denote objective function and omit the nonnegativity constraints.

z = 2x1 − 3x2 + 3x3
x4 = 7 − x1 − x2 + x3
x5 = −7 + x1 + x2 − x3
x6 = 4 − x1 + 2x2 − 2x3
This is called slack form.
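The conversion from standard form to slack form is mechanical, as the following sketch shows. The function name `to_slack_form` and the dictionary-based encoding are assumptions made for illustration; variables are numbered from 1, and the i-th inequality receives the slack variable with index n + i.

```python
def to_slack_form(c, A, b):
    """
    Build a slack form from a standard form (maximize c.x, A x <= b, x >= 0).

    Returns (N, B, a, b_vec, c_vec, v) where
      N     -- indices of the non-basic variables (the original variables)
      B     -- indices of the basic variables (the slack variables)
      a     -- a[i][j] is the coefficient in x_i = b_i - sum_j a[i][j] * x_j
      b_vec -- b_vec[i] is the constant term of the equation for x_i
      c_vec -- c_vec[j] is the objective coefficient of x_j
      v     -- constant term of the objective, initially 0
    """
    n, m = len(c), len(A)
    N = list(range(1, n + 1))
    B = list(range(n + 1, n + m + 1))
    a = {B[i]: {N[j]: A[i][j] for j in range(n)} for i in range(m)}
    b_vec = {B[i]: b[i] for i in range(m)}
    c_vec = {N[j]: c[j] for j in range(n)}
    return N, B, a, b_vec, c_vec, 0
```

For the running example this yields B = {4, 5, 6} and constant terms 7, −7 and 4. Setting all non-basic variables to 0 would give x5 = −7, so this particular slack form does not directly provide a feasible starting point.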
Basic and Non-Basic Variables

z = 2x1 − 3x2 + 3x3
x4 = 7 − x1 − x2 + x3
x5 = −7 + x1 + x2 − x3
x6 = 4 − x1 + 2x2 − 2x3

Basic Variables: B = {4, 5, 6}
Non-Basic Variables: N = {1, 2, 3}

Slack Form (Formal Definition)
Slack form is given by a tuple (N, B, A, b, c, v) so that

\[ z = v + \sum_{j \in N} c_j x_j \]
\[ x_i = b_i - \sum_{j \in N} a_{ij} x_j \quad \text{for } i \in B, \]
and all variables are non-negative. Variables on the right hand side are indexed by the entries of N.
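Slack Form (Example)

z = 28 − x3/6 − x5/6 − 2x6/3
x1 = 8 + x3/6 + x5/6 − x6/3
x2 = 4 − 8x3/3 − 2x5/3 + x6/3
x4 = 18 − x3/2 + x5/2

Slack Form Notation
B = {1, 2, 4}, N = {3, 5, 6}

\[ A = \begin{pmatrix} a_{13} & a_{15} & a_{16} \\ a_{23} & a_{25} & a_{26} \\ a_{43} & a_{45} & a_{46} \end{pmatrix} = \begin{pmatrix} -1/6 & -1/6 & 1/3 \\ 8/3 & 2/3 & -1/3 \\ 1/2 & -1/2 & 0 \end{pmatrix} \]

\[ b = \begin{pmatrix} b_1 \\ b_2 \\ b_4 \end{pmatrix} = \begin{pmatrix} 8 \\ 4 \\ 18 \end{pmatrix}, \qquad c = \begin{pmatrix} c_3 & c_5 & c_6 \end{pmatrix}^{T} = \begin{pmatrix} -1/6 & -1/6 & -2/3 \end{pmatrix}^{T} \]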
v = 28
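To see how the tuple (N, B, A, b, c, v) determines the equations, the example can be evaluated directly. The sketch below hard-codes the example data; the function name `evaluate` is hypothetical. It takes values for the non-basic variables (missing ones default to 0) and returns the objective value together with the full assignment.

```python
from fractions import Fraction as F

N = [3, 5, 6]
B = [1, 2, 4]
A = {
    1: {3: F(-1, 6), 5: F(-1, 6), 6: F(1, 3)},
    2: {3: F(8, 3), 5: F(2, 3), 6: F(-1, 3)},
    4: {3: F(1, 2), 5: F(-1, 2), 6: F(0)},
}
b = {1: 8, 2: 4, 4: 18}
c = {3: F(-1, 6), 5: F(-1, 6), 6: F(-2, 3)}
v = 28


def evaluate(nonbasic):
    """Return (z, x) for the given non-basic values; x maps every index to its value."""
    x = {j: F(nonbasic.get(j, 0)) for j in N}
    for i in B:
        # x_i = b_i - sum over j in N of a_ij * x_j
        x[i] = b[i] - sum(A[i][j] * x[j] for j in N)
    # z = v + sum over j in N of c_j * x_j
    z = v + sum(c[j] * x[j] for j in N)
    return z, x
```

With all non-basic variables at 0 this gives the basic solution (8, 4, 0, 18, 0, 0) with z = 28. Raising x6 to 3 instead gives x1 = 7, x2 = 5 and z = 26, which illustrates that every objective coefficient in this slack form is negative.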
The Structure of Optimal Solutions

Definition
A point x is a vertex if it cannot be represented as a strict convex combination of two other points in the feasible set.
The set of feasible solutions is a convex set.

Theorem
If there exists an optimal solution, it occurs at a vertex of the polygon.

Proof:
Let x be an optimal solution which is not a vertex
⇒ ∃ vector d ≠ 0 so that x − d and x + d are feasible
Since A(x + d) = b and Ax = b ⇒ Ad = 0
W.l.o.g. assume cᵀd ≥ 0 (otherwise replace d by −d)
Consider x + λd as a function of λ ≥ 0

[Figure: the points x − d, x and x + d on a line segment inside the feasible region in the (x1, x2)-plane.]

Case 1: There exists j with d_j < 0
Increase λ from 0 to λ′ until a new entry of x + λd becomes zero
x + λ′d is feasible, since A(x + λ′d) = Ax = b and x + λ′d ≥ 0
cᵀ(x + λ′d) = cᵀx + λ′cᵀd ≥ cᵀx, so x + λ′d is also optimal and has more zero entries than x; repeating the argument eventually reaches a vertex
Case 2: For all j, d_j ≥ 0
x + λd is feasible for all λ ≥ 0: A(x + λd) = b and x + λd ≥ x ≥ 0
If cᵀd > 0 and λ → ∞, then cᵀ(x + λd) → ∞
⇒ This contradicts the assumption that there exists an optimal solution.
(If cᵀd = 0, replace d by −d, which has a negative entry, and apply Case 1.)
Shortest Paths

Single-Pair Shortest Path Problem
Given: directed graph G = (V, E) with edge weights w : E → R⁺, pair of vertices s, t ∈ V
Goal: Find a path of minimum weight from s to t in G, that is, a path p = (v_0 = s, v_1, ..., v_k = t) such that

\[ w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i) \]

is minimized.

[Figure: example directed graph on the vertices s, a, b, c, d, e, f, t with weighted edges.]

Shortest Paths as LP

maximize d_t
subject to
d_v ≤ d_u + w(u, v) for each edge (u, v) ∈ E,
d_s = 0.

Recall: When BELLMAN-FORD terminates, all these inequalities are satisfied.
Note that this is a maximization problem!
The solution d̄ satisfies d̄_v = min_{u : (u,v) ∈ E} { d̄_u + w(u, v) }.
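The link between BELLMAN-FORD and the LP constraints can be checked on a small instance. The sketch below is illustrative only: the graph is a made-up example (not the one from the figure), and `bellman_ford` and `satisfies_lp_constraints` are hypothetical names. It assumes there are no negative-weight cycles.

```python
def bellman_ford(vertices, edges, s):
    """Shortest-path distances from s; edges is a list of (u, v, weight) triples."""
    inf = float("inf")
    d = {v: inf for v in vertices}
    d[s] = 0
    # Relax every edge |V| - 1 times.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    return d


def satisfies_lp_constraints(d, edges, s):
    """Check d_s = 0 and d_v <= d_u + w(u, v) for every edge (u, v)."""
    return d[s] == 0 and all(d[v] <= d[u] + w for u, v, w in edges)


# Hypothetical example graph.
VERTICES = ["s", "a", "b", "t"]
EDGES = [("s", "a", 2), ("s", "b", 5), ("a", "b", 1), ("a", "t", 6), ("b", "t", 2)]
```

Here the distances are d_s = 0, d_a = 2, d_b = 3 and d_t = 5, and they satisfy every LP constraint. Raising d_t to 6 violates the constraint for the edge (b, t), which is why maximizing d_t subject to the constraints yields the shortest-path distance.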
Maximum Flow

Maximum Flow Problem
Given: directed graph G = (V, E) with edge capacities c : E → R⁺, pair of vertices s, t ∈ V
Goal: Find a maximum flow f : V × V → R from s to t which satisfies the capacity constraints and flow conservation

[Figure: example flow network on the vertices s, 2, 3, 4, 5, t with each edge labelled flow/capacity; the flow shown has value |f| = 19.]

Maximum Flow as LP

\[ \text{maximize} \quad \sum_{v \in V} f_{sv} - \sum_{v \in V} f_{vs} \]
\[ \text{subject to} \quad f_{uv} \le c(u, v) \quad \text{for each } u, v \in V, \]
\[ \phantom{\text{subject to}} \quad \sum_{v \in V} f_{vu} = \sum_{v \in V} f_{uv} \quad \text{for each } u \in V \setminus \{s, t\}, \]
\[ \phantom{\text{subject to}} \quad f_{uv} \ge 0 \quad \text{for each } u, v \in V. \]
Minimum-Cost Flow

Generalization of the Maximum Flow Problem

Minimum-Cost-Flow Problem
Given: directed graph G = (V, E) with capacities c : E → R⁺, pair of vertices s, t ∈ V, cost function a : E → R⁺, flow demand of d units
Goal: Find a flow f : V × V → R from s to t with |f| = d while minimising the total cost \( \sum_{(u,v) \in E} a(u, v) f_{uv} \) incurred by the flow.

Optimal Solution with total cost:
\[ \sum_{(u,v) \in E} a(u, v) f_{uv} = (2 \cdot 2) + (5 \cdot 2) + (3 \cdot 1) + (7 \cdot 1) + (1 \cdot 3) = 27 \]

[Figure 29.3: (a) flow network on the vertices s, x, y, t with each edge labelled by its capacity c and cost a; (b) the same network with each edge labelled flow/capacity.]

Figure 29.3 (a) An example of a minimum-cost-flow problem. We denote the capacities by c and the costs by a. Vertex s is the source and vertex t is the sink, and we wish to send 4 units of flow from s to t. (b) A solution to the minimum-cost flow problem in which 4 units of flow are sent from s to t. For each edge, the flow and capacity are written as flow/capacity.
... constraint that the value of the flow be exactly d units, and with the new objective function of minimizing the cost:

\[ \text{minimize} \quad \sum_{(u,v) \in E} a(u, v) f_{uv} \tag{29.51} \]
\[ \text{subject to} \quad f_{uv} \le c(u, v) \quad \text{for each } u, v \in V, \]
\[ \phantom{\text{subject to}} \quad \sum_{v \in V} f_{vu} - \sum_{v \in V} f_{uv} = 0 \quad \text{for each } u \in V - \{s, t\}, \]
\[ \phantom{\text{subject to}} \quad \sum_{v \in V} f_{sv} - \sum_{v \in V} f_{vs} = d, \]
\[ \phantom{\text{subject to}} \quad f_{uv} \ge 0 \quad \text{for each } u, v \in V. \tag{29.52} \]

Multicommodity flow
As a final example, we consider another flow problem. Suppose that the Lucky Puck company from Section 26.1 decides to diversify its product line and ship not only hockey pucks, but also hockey sticks and hockey helmets. Each piece of equipment is manufactured in its own factory, has its own warehouse, and must be shipped, each day, from factory to warehouse. The sticks are manufactured in Vancouver and must be shipped to Saskatoon, and the helmets are manufactured in Edmonton and must be shipped to Regina. The capacity of the shipping network does not change, however, and the different items, or commodities, must share the same network.
This example is an instance of a multicommodity-flow problem. In this problem, we are again given a directed graph G = (V, E) in which each edge (u, v) ∈ E has a nonnegative capacity c(u, v) ≥ 0. As in the maximum-flow problem, we implicitly assume that c(u, v) = 0 for (u, v) ∉ E, and that the graph has no antipar-

Minimum-Cost Flow as a LP
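Minimum Cost Flow as LP

\[ \text{minimize} \quad \sum_{(u,v) \in E} a(u, v) f_{uv} \]
\[ \text{subject to} \quad f_{uv} \le c(u, v) \quad \text{for each } u, v \in V, \]
\[ \phantom{\text{subject to}} \quad \sum_{v \in V} f_{vu} - \sum_{v \in V} f_{uv} = 0 \quad \text{for each } u \in V \setminus \{s, t\}, \]
\[ \phantom{\text{subject to}} \quad \sum_{v \in V} f_{sv} - \sum_{v \in V} f_{vs} = d, \]
\[ \phantom{\text{subject to}} \quad f_{uv} \ge 0 \quad \text{for each } u, v \in V. \]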
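The constraints of this LP can be checked against the solution of Figure 29.3. The sketch below is an illustration under assumptions: the assignment of capacities and costs to the individual edges is a reading of the figure (in particular which of the two edges leaving s has capacity 5), and the names `EDGES`, `FLOW`, `is_feasible` and `total_cost` are hypothetical.

```python
# (u, v): (capacity c, cost a). Edge labels are an assumed reading of Figure 29.3(a).
EDGES = {
    ("s", "x"): (5, 2),
    ("s", "y"): (2, 5),
    ("x", "y"): (1, 3),
    ("x", "t"): (2, 7),
    ("y", "t"): (4, 1),
}
# Flow values as in Figure 29.3(b).
FLOW = {("s", "x"): 2, ("s", "y"): 2, ("x", "y"): 1, ("x", "t"): 1, ("y", "t"): 3}


def net_out(flow, u):
    """Flow leaving u minus flow entering u."""
    out_flow = sum(f for (a, _), f in flow.items() if a == u)
    in_flow = sum(f for (_, b), f in flow.items() if b == u)
    return out_flow - in_flow


def is_feasible(flow, edges, s, t, d):
    """Capacity constraints, flow conservation, and flow value exactly d."""
    capacity_ok = all(0 <= flow[e] <= edges[e][0] for e in edges)
    vertices = {u for e in edges for u in e}
    conserved = all(net_out(flow, u) == 0 for u in vertices - {s, t})
    return capacity_ok and conserved and net_out(flow, s) == d


def total_cost(flow, edges):
    """Objective value: sum of a(u, v) * f_uv over all edges."""
    return sum(edges[e][1] * flow[e] for e in edges)
```

Under these assumptions the flow from the figure is feasible for d = 4 and costs 27, matching the total stated above. A different feasible flow, such as sending 3 units over (s, x) and 1 unit over (s, y), costs 30.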
Real power of Linear Programming comes from the ability to solve new problems!
Simplex Algorithm: Introduction

Simplex Algorithm
classical method for solving linear programs (Dantzig, 1947)
usually fast in practice although worst-case runtime not polynomial
iterative procedure somewhat similar to Gaussian elimination

Basic Idea:
Each iteration corresponds to a “basic solution” of the slack form
All non-basic variables are 0, and the basic variables are determined from the equality constraints
Each iteration converts one slack form into an equivalent one while the objective value will not decrease (in that sense, it is a greedy algorithm)
Conversion (“pivoting”) is achieved by switching the roles of one basic and one non-basic variable
Extended Example: Conversion into Slack Form

maximize 3x1 + x2 + 2x3
subject to
x1 + x2 + 3x3 ≤ 30
2x1 + 2x2 + 5x3 ≤ 24
4x1 + x2 + 2x3 ≤ 36
x1, x2, x3 ≥ 0

Conversion into slack form

z = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
Extended Example: Iteration 1
z = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
Basic solution: (x1, x2,..., x6) = (0, 0, 0, 30, 24, 36)
This basic solution is feasible. Objective value is 0.
Increasing the value of x1 would increase the objective value.
The third constraint is the tightest and limits how much we can increase x1.
Switch roles of x1 and x6:
Solving for x1 yields:
x1 = 9 − x2/4 − x3/2 − x6/4.
Substitute this into x1 in the other three equations
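The pivoting step can be written out as a short routine and checked against the example. This is a sketch under assumptions: the encoding (each equation stored as a constant plus a dictionary of coefficients, with signs exactly as displayed on the slides) and the names `pivot` and `substitute` are illustrative choices, not the lecture's notation.

```python
from fractions import Fraction


def substitute(row, entering, solved):
    """Replace x_entering in row by the expression 'solved' (const, {var: coef})."""
    const, coefs = row
    coefs = dict(coefs)
    k = coefs.pop(entering, 0)          # coefficient of the entering variable
    s_const, s_coefs = solved
    const = const + k * s_const
    for j, coef in s_coefs.items():
        coefs[j] = coefs.get(j, 0) + k * coef
    return const, coefs


def pivot(rows, objective, leaving, entering):
    """
    rows[i] = (const, {j: coef}) encodes x_i = const + sum_j coef * x_j.
    Swap the roles of the basic variable 'leaving' and the non-basic 'entering'.
    """
    const, coefs = rows[leaving]
    k = Fraction(coefs[entering])       # must be non-zero
    # Solve the leaving equation for the entering variable.
    solved_coefs = {leaving: 1 / k}
    for j, coef in coefs.items():
        if j != entering:
            solved_coefs[j] = -coef / k
    solved = (-const / k, solved_coefs)
    # Substitute into all other equations and into the objective.
    new_rows = {entering: solved}
    for i, row in rows.items():
        if i != leaving:
            new_rows[i] = substitute(row, entering, solved)
    return new_rows, substitute(objective, entering, solved)


# Slack form of the extended example (iteration 1).
ROWS = {
    4: (30, {1: -1, 2: -1, 3: -3}),
    5: (24, {1: -2, 2: -2, 3: -5}),
    6: (36, {1: -4, 2: -1, 3: -2}),
}
OBJECTIVE = (0, {1: 3, 2: 1, 3: 2})
```

Calling `pivot(ROWS, OBJECTIVE, leaving=6, entering=1)` reproduces x1 = 9 − x2/4 − x3/2 − x6/4 and an objective with constant term 27. Exact fractions are used so that the coefficients can be compared with the slides without rounding.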
Extended Example: Iteration 2
Increasing the value of x3 would increase the objective value.
z = 27 + x2/4 + x3/2 − 3x6/4
x1 = 9 − x2/4 − x3/2 − x6/4
x4 = 21 − 3x2/4 − 5x3/2 + x6/4
x5 = 6 − 3x2/2 − 4x3 + x6/2
Basic solution: (x1, x2,..., x6) = (9, 0, 0, 21, 6, 0) with objective value 27
The third constraint is the tightest and limits how much we can increase x3.
Switch roles of x3 and x5:
Solving for x3 yields:
x3 = 3/2 − 3x2/8 − x5/4 + x6/8.
Substitute this into x3 in the other three equations
Extended Example: Iteration 3
Increasing the value of x2 would increase the objective value.
z = 111/4 + x2/16 − x5/8 − 11x6/16
x1 = 33/4 − x2/16 + x5/8 − 5x6/16
x3 = 3/2 − 3x2/8 − x5/4 + x6/8
x4 = 69/4 + 3x2/16 + 5x5/8 − x6/16

Basic solution: (x1, x2,..., x6) = (33/4, 0, 3/2, 69/4, 0, 0) with objective value 111/4 = 27.75
The second constraint is the tightest and limits how much we can increase x2.
Switch roles of x2 and x3:
Solving for x2 yields:
x2 = 4 − 8x3/3 − 2x5/3 + x6/3.
Substitute this into x2 in the other three equations
Extended Example: Iteration 4
All coefficients are negative, and hence this basic solution is optimal!
z = 28 − x3/6 − x5/6 − 2x6/3
x1 = 8 + x3/6 + x5/6 − x6/3
x2 = 4 − 8x3/3 − 2x5/3 + x6/3
x4 = 18 − x3/2 + x5/2
Basic solution: (x1, x2,..., x6) = (8, 4, 0, 18, 0, 0) with objective value 28
Extended Example: Visualization of SIMPLEX

[Figure: the feasible region in (x1, x2, x3)-space, with vertices labelled by their objective value: (0, 0, 0): 0; (9, 0, 0): 27; (8.25, 0, 1.5): 27.75; (8, 4, 0): 28; (0, 12, 0): 12; (0, 0, 4.8): 9.6.]
Extended Example: Alternative Runs (1/2)
z = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
Switch roles of x2 and x5:

z = 12 + 2x1 − x3/2 − x5/2
x2 = 12 − x1 − 5x3/2 − x5/2
x4 = 18 − x3/2 + x5/2
x6 = 24 − 3x1 + x3/2 + x5/2

Switch roles of x1 and x6:

z = 28 − x3/6 − x5/6 − 2x6/3
x1 = 8 + x3/6 + x5/6 − x6/3
x2 = 4 − 8x3/3 − 2x5/3 + x6/3
x4 = 18 − x3/2 + x5/2
Extended Example: Alternative Runs (2/2)
z = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
Switch roles of x3 and x5
z = 48/5 + 11x1/5 + x2/5 − 2x5/5
x4 = 78/5 + x1/5 + x2/5 + 3x5/5
x3 = 24/5 − 2x1/5 − 2x2/5 − x5/5
x6 = 132/5 − 16x1/5 − x2/5 + 2x5/5

Switch roles of x1 and x6:

z = 111/4 + x2/16 − x5/8 − 11x6/16
x1 = 33/4 − x2/16 + x5/8 − 5x6/16
x3 = 3/2 − 3x2/8 − x5/4 + x6/8
x4 = 69/4 + 3x2/16 + 5x5/8 − x6/16

Switch roles of x2 and x3:

z = 28 − x3/6 − x5/6 − 2x6/3
x1 = 8 + x3/6 + x5/6 − x6/3
x2 = 4 − 8x3/3 − 2x5/3 + x6/3
x4 = 18 − x3/2 + x5/2
The Pivot Step Formally

The final solution to a linear program need not be integral; it is purely coincidental that this example has an integral solution.

Pivoting

We now formalize the procedure for pivoting. The procedure PIVOT takes as input a slack form, given by the tuple (N, B, A, b, c, v), the index l of the leaving variable x_l, and the index e of the entering variable x_e. It returns the tuple (N̂, B̂, Â, b̂, ĉ, v̂) describing the new slack form. (Recall again that the entries of the m × n matrices A and Â are actually the negatives of the coefficients that appear in the slack form.)

PIVOT(N, B, A, b, c, v, l, e)
 1  // Compute the coefficients of the equation for new basic variable x_e.
 2  let Â be a new m × n matrix
 3  b̂_e = b_l / a_le
 4  for each j ∈ N − {e}
 5      â_ej = a_lj / a_le
 6  â_el = 1 / a_le
 7  // Compute the coefficients of the remaining constraints.
 8  for each i ∈ B − {l}
 9      b̂_i = b_i − a_ie b̂_e
10      for each j ∈ N − {e}
11          â_ij = a_ij − a_ie â_ej
12      â_il = −a_ie â_el
13  // Compute the objective function.
14  v̂ = v + c_e b̂_e
15  for each j ∈ N − {e}
16      ĉ_j = c_j − c_e â_ej
17  ĉ_l = −c_e â_el
18  // Compute new sets of basic and nonbasic variables.
19  N̂ = N − {e} ∪ {l}
20  B̂ = B − {l} ∪ {e}
21  return (N̂, B̂, Â, b̂, ĉ, v̂)

Annotations:
  Lines 3–6: Rewrite the "tight" equation for the entering variable x_e. Need that a_le ≠ 0!
  Lines 8–12: Substituting x_e into the other equations.
  Lines 14–17: Substituting x_e into the objective function.
  Lines 19–20: Update non-basic and basic variables.

PIVOT works as follows. Lines 3–6 compute the coefficients in the new equation for x_e by rewriting the equation that has x_l on the left-hand side to instead have x_e on the left-hand side. Lines 8–12 update the remaining equations by substituting the right-hand side of this new equation for each occurrence of x_e. Lines 14–17 do the same substitution for the objective function, and lines 19 and 20 update the sets of nonbasic and basic variables.

Effect of the Pivot Step
Lemma 29.1
Consider a call to PIVOT(N, B, A, b, c, v, l, e) in which a_le ≠ 0. Let the values returned from the call be (N̂, B̂, Â, b̂, ĉ, v̂), and let x̄ denote the basic solution after the call. Then

1. x̄_j = 0 for each j ∈ N̂.

2. x̄_e = b_l / a_le.

3. x̄_i = b_i − a_ie b̂_e for each i ∈ B̂ \ {e}.

Proof:
1. holds since the basic solution always sets all non-basic variables to zero.
2. When we set each non-basic variable to 0 in a constraint

   x_i = b̂_i − Σ_{j ∈ N̂} â_ij x_j,

   we have x̄_i = b̂_i for each i ∈ B̂. Hence x̄_e = b̂_e = b_l / a_le.
3. After substituting into the other constraints, we have

   x̄_i = b̂_i = b_i − a_ie b̂_e.
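The pivot step and the statement of Lemma 29.1 can be checked on the first pivot of the extended example (x1 entering, x6 leaving). The following is a minimal sketch, assuming the slack form is stored in dictionaries keyed by variable index; this representation and the names `A_new`, `b_new`, `c_new` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction as F

def pivot(N, B, A, b, c, v, l, e):
    """Pivot on the slack form x_i = b[i] - sum_j A[i][j]*x_j, z = v + sum_j c[j]*x_j."""
    A_new, b_new = {}, {}
    # Rewrite the equation of the leaving variable x_l so that x_e is on the left.
    b_new[e] = b[l] / A[l][e]
    A_new[e] = {j: A[l][j] / A[l][e] for j in N if j != e}
    A_new[e][l] = 1 / A[l][e]
    # Substitute x_e into the remaining constraints.
    for i in B - {l}:
        b_new[i] = b[i] - A[i][e] * b_new[e]
        A_new[i] = {j: A[i][j] - A[i][e] * A_new[e][j] for j in N if j != e}
        A_new[i][l] = -A[i][e] * A_new[e][l]
    # Substitute x_e into the objective function.
    v_new = v + c[e] * b_new[e]
    c_new = {j: c[j] - c[e] * A_new[e][j] for j in N if j != e}
    c_new[l] = -c[e] * A_new[e][l]
    return (N - {e}) | {l}, (B - {l}) | {e}, A_new, b_new, c_new, v_new

# Initial slack form of the extended example.
N, B = {1, 2, 3}, {4, 5, 6}
A = {4: {1: F(1), 2: F(1), 3: F(3)},
     5: {1: F(2), 2: F(2), 3: F(5)},
     6: {1: F(4), 2: F(1), 3: F(2)}}
b = {4: F(30), 5: F(24), 6: F(36)}
c = {1: F(3), 2: F(1), 3: F(2)}

N2, B2, A2, b2, c2, v2 = pivot(N, B, A, b, c, F(0), l=6, e=1)
```

After the call, `b2` holds the basic solution values 9, 21 and 6 for x1, x4 and x5, and `v2` is 27, matching the slack form shown for Iteration 2.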
Formalizing the Simplex Algorithm: Questions

Questions:
  How do we determine whether a linear program is feasible?
  What do we do if the linear program is feasible, but the initial basic solution is not feasible?
  How do we determine whether a linear program is unbounded?
  How do we choose the entering and leaving variables?
Example before was a particularly nice one!
The formal procedure SIMPLEX

In Section 29.5, we shall show how to determine whether a problem is feasible, and if so, how to find a slack form in which the initial basic solution is feasible. Therefore, let us assume that we have a procedure INITIALIZE-SIMPLEX(A, b, c) that takes as input a linear program in standard form, that is, an m × n matrix A = (a_ij), an m-vector b = (b_i), and an n-vector c = (c_j). If the problem is infeasible, the procedure returns a message that the program is infeasible and then terminates. Otherwise, the procedure returns a slack form for which the initial basic solution is feasible. The procedure SIMPLEX takes as input a linear program in standard form, as just described. It returns an n-vector x̄ = (x̄_j) that is an optimal solution to the linear program described in (29.19)–(29.21).

SIMPLEX(A, b, c)
 1  (N, B, A, b, c, v) = INITIALIZE-SIMPLEX(A, b, c)
 2  let Δ be a new vector of length n
 3  while some index j ∈ N has c_j > 0
 4      choose an index e ∈ N for which c_e > 0
 5      for each index i ∈ B
 6          if a_ie > 0
 7              Δ_i = b_i / a_ie
 8          else Δ_i = ∞
 9      choose an index l ∈ B that minimizes Δ_i
10      if Δ_l == ∞
11          return "unbounded"
12      else (N, B, A, b, c, v) = PIVOT(N, B, A, b, c, v, l, e)
13  for i = 1 to n
14      if i ∈ B
15          x̄_i = b_i
16      else x̄_i = 0
17  return (x̄_1, x̄_2, ..., x̄_n)

Annotations:
  Line 1: Returns a slack form with a feasible basic solution (if it exists).
  Main Loop (lines 3–12): terminates if no coefficient in the objective function is positive.
  Line 4 picks an entering variable x_e with positive coefficient.
  Lines 6–9 pick the tightest constraint, associated with x_l.
  Line 11 returns "unbounded" if no constraint limits the increase of x_e.
  Line 12 calls PIVOT, switching roles of x_l and x_e.
  Lines 13–17: Return corresponding solution.

The SIMPLEX procedure works as follows. In line 1, it calls the procedure INITIALIZE-SIMPLEX(A, b, c), described above, which either determines that the linear program is infeasible or returns a slack form for which the basic solution is feasible. The while loop of lines 3–12 forms the main part of the algorithm. If no coefficient in the objective function is positive, then the while loop terminates. Otherwise, line 4 selects a variable x_e, whose coefficient in the objective function is positive, as the entering variable. Although we may choose any such variable as the entering variable, we assume that we use some prespecified deterministic rule. Next, lines 5–9 check each constraint and pick the one that most severely limits the amount by which we can increase x_e without violating any of the nonnegativity constraints.

Lemma 29.2

Suppose the call to INITIALIZE-SIMPLEX in line 1 returns a slack form for which the basic solution is feasible. Then if SIMPLEX returns a solution, it is a feasible solution. If SIMPLEX returns "unbounded", the linear program is unbounded.

Proof is based on the following three-part loop invariant:
1. the slack form is always equivalent to the one returned by INITIALIZE-SIMPLEX,
2. for each i ∈ B, we have b_i ≥ 0,
3. the basic solution associated with the (current) slack form is feasible.
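The main loop can be sketched in a few lines of Python. This is an illustrative sketch under two assumptions that are not part of the slides: INITIALIZE-SIMPLEX is skipped (every b_i is assumed non-negative, so the initial basic solution is feasible), and ties are broken by smallest index. Because of this pivoting rule, the run may visit other vertices than the run shown in the extended example.

```python
from fractions import Fraction as F

def pivot(N, B, A, b, c, v, l, e):
    """Exchange the leaving variable x_l and the entering variable x_e."""
    A2, b2 = {}, {e: b[l] / A[l][e]}
    A2[e] = {j: A[l][j] / A[l][e] for j in N if j != e}
    A2[e][l] = 1 / A[l][e]
    for i in B - {l}:
        b2[i] = b[i] - A[i][e] * b2[e]
        A2[i] = {j: A[i][j] - A[i][e] * A2[e][j] for j in N if j != e}
        A2[i][l] = -A[i][e] * A2[e][l]
    c2 = {j: c[j] - c[e] * A2[e][j] for j in N if j != e}
    c2[l] = -c[e] * A2[e][l]
    return (N - {e}) | {l}, (B - {l}) | {e}, A2, b2, c2, v + c[e] * b2[e]

def simplex(rows, rhs, obj):
    """Maximise obj.x subject to rows.x <= rhs, x >= 0, assuming every rhs entry is >= 0."""
    m, n = len(rhs), len(obj)
    N, B = set(range(1, n + 1)), set(range(n + 1, n + m + 1))
    A = {n + i + 1: {j + 1: F(rows[i][j]) for j in range(n)} for i in range(m)}
    b = {n + i + 1: F(rhs[i]) for i in range(m)}
    c = {j + 1: F(obj[j]) for j in range(n)}
    v = F(0)
    while any(c[j] > 0 for j in N):
        e = min(j for j in N if c[j] > 0)            # entering variable (smallest index)
        ratios = {i: b[i] / A[i][e] for i in B if A[i][e] > 0}
        if not ratios:                                # no constraint limits x_e
            return "unbounded"
        l = min(ratios, key=lambda i: (ratios[i], i))  # tightest constraint
        N, B, A, b, c, v = pivot(N, B, A, b, c, v, l, e)
    x = [b[i] if i in B else F(0) for i in range(1, n + 1)]
    return x, v

x, z = simplex([[1, 1, 3], [2, 2, 5], [4, 1, 2]], [30, 24, 36], [3, 1, 2])
```

On the extended example this returns (x1, x2, x3) = (8, 4, 0) with objective value 28. Exact fractions are used so that the intermediate slack forms can be compared with the ones on the slides.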
Termination
Degeneracy: One iteration of SIMPLEX leaves the objective value unchanged.
z = x1 + x2 + x3
x4 = 8 − x1 − x2
x5 = x2 − x3
Pivot with x1 entering and x4 leaving
z = 8 + x3 − x4
x1 = 8 − x2 − x4
x5 = x2 − x3
Pivot with x3 entering and x5 leaving

z = 8 + x2 − x4 − x5

x1 = 8 − x2 − x4

x3 = x2 − x5

Cycling: Slack forms at two iterations are identical, and SIMPLEX fails to terminate!
Termination and Running Time

Cycling: SIMPLEX may fail to terminate. It is theoretically possible, but very rare in practice.

Anti-Cycling Strategies
1. Bland's rule: Choose entering variable with smallest index
2. Random rule: Choose entering variable uniformly at random
3. Perturbation: Perturb the input slightly so that it is impossible to have two solutions with the same objective value. Replace each b_i by b̂_i = b_i + ε_i, where ε_i ≫ ε_{i+1} are all small.

Lemma 29.7
Assuming INITIALIZE-SIMPLEX returns a slack form for which the basic solution is feasible, SIMPLEX either reports that the program is unbounded or returns a feasible solution in at most (n+m choose m) iterations.

Every set B of basic variables uniquely determines a slack form, and there are at most (n+m choose m) unique slack forms.
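To get a feeling for the bound, the number of possible basic sets can be computed directly. The helper name `max_slack_forms` is hypothetical and used only for this illustration.

```python
from math import comb

def max_slack_forms(n, m):
    """Upper bound on the number of distinct slack forms: choose m basic variables out of n + m."""
    return comb(n + m, m)

# The extended example has n = 3 original variables and m = 3 constraints.
bound = max_slack_forms(3, 3)
```

For the extended example the bound is 20 slack forms, while the runs shown above need only two or three pivots. The bound grows exponentially, for instance `max_slack_forms(50, 50)` has 30 decimal digits.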
Outline
Introduction
Standard and Slack Forms
Formulating Problems as Linear Programs
Simplex Algorithm
Finding an Initial Solution
Finding an Initial Solution

maximize 2x1 − x2
subject to
  2x1 − x2 ≤ 2
  x1 − 5x2 ≤ −4
  x1, x2 ≥ 0

Conversion into slack form

z = 2x1 − x2
x3 = 2 − 2x1 + x2
x4 = −4 − x1 + 5x2
Basic solution (x1, x2, x3, x4) = (0, 0, 2, −4) is not feasible!
Geometric Illustration

maximize 2x1 − x2
subject to
  2x1 − x2 ≤ 2
  x1 − 5x2 ≤ −4
  x1, x2 ≥ 0

Questions:
  How to determine whether there is any feasible solution?
  If there is one, how to determine an initial basic solution?

[Figure: the feasible region in the (x1, x2)-plane, bounded by the constraints 2x1 − x2 ≤ 2 and x1 − 5x2 ≤ −4.]
Formulating an Auxiliary Linear Program

Linear program L in standard form:

maximize Σ_{j=1}^{n} c_j x_j
subject to
  Σ_{j=1}^{n} a_ij x_j ≤ b_i for i = 1, 2,..., m,
  x_j ≥ 0 for j = 1, 2,..., n

Auxiliary linear program Laux:

maximize −x0
subject to
  Σ_{j=1}^{n} a_ij x_j − x0 ≤ b_i for i = 1, 2,..., m,
  x_j ≥ 0 for j = 0, 1,..., n

Lemma 29.11

Let Laux be the auxiliary LP of a linear program L in standard form. Then L is feasible if and only if the optimal objective value of Laux is 0.

Proof.

"⇒": Suppose L has a feasible solution x̄ = (x̄1, x̄2,..., x̄n).
Then x̄0 = 0 combined with x̄ is a feasible solution to Laux with objective value 0.
Since x0 ≥ 0 and the objective is to maximize −x0, this is optimal for Laux.

"⇐": Suppose that the optimal objective value of Laux is 0.
Then x̄0 = 0, and the remaining solution values (x̄1, x̄2,..., x̄n) satisfy L.
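The construction of Laux is purely mechanical, as the following sketch suggests. The function name `auxiliary_lp` and the convention of putting x0 in the first column are assumptions made for this illustration.

```python
def auxiliary_lp(A, b):
    """Build L_aux from A x <= b: add the column of x0 (coefficient -1) and maximise -x0.

    Variables are ordered (x0, x1, ..., xn).
    """
    n = len(A[0])
    A_aux = [[-1] + list(row) for row in A]
    c_aux = [-1] + [0] * n
    return A_aux, list(b), c_aux

def is_feasible(A, b, x):
    """Check A x <= b and x >= 0 for a candidate point x."""
    rows_ok = all(sum(a * xi for a, xi in zip(row, x)) <= bi for row, bi in zip(A, b))
    return rows_ok and all(xi >= 0 for xi in x)

# The linear program: maximize 2x1 - x2 s.t. 2x1 - x2 <= 2, x1 - 5x2 <= -4.
A, b = [[2, -1], [1, -5]], [2, -4]
A_aux, b_aux, c_aux = auxiliary_lp(A, b)
```

The point (x1, x2) = (0, 0) violates the second constraint of L, but (x0, x1, x2) = (4, 0, 0) is feasible for Laux: choosing x0 large enough always yields a feasible point, which is why Laux is a convenient starting point.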
INITIALIZE-SIMPLEX

We now describe our strategy to find an initial basic feasible solution for a linear program L in standard form:

INITIALIZE-SIMPLEX(A, b, c)
 1  let k be the index of the minimum b_i
 2  if b_k ≥ 0    // is the initial basic solution feasible?
 3      return ({1, 2,..., n}, {n + 1, n + 2,..., n + m}, A, b, c, 0)
 4  form Laux by adding −x0 to the left-hand side of each constraint and setting the objective function to −x0
 5  let (N, B, A, b, c, v) be the resulting slack form for Laux
 6  l = n + k
 7  // Laux has n + 1 nonbasic variables and m basic variables.
 8  (N, B, A, b, c, v) = PIVOT(N, B, A, b, c, v, l, 0)
 9  // The basic solution is now feasible for Laux.
10  iterate the while loop of lines 3–12 of SIMPLEX until an optimal solution to Laux is found
11  if the optimal solution to Laux sets x̄0 to 0
12      if x̄0 is basic
13          perform one (degenerate) pivot to make it nonbasic
14      from the final slack form of Laux, remove x0 from the constraints and restore the original objective function of L, but replace each basic variable in this objective function by the right-hand side of its associated constraint
15      return the modified final slack form
16  else return "infeasible"

Annotations:
  Lines 1–3: Test solution with N = {1, 2,..., n}, B = {n + 1, n + 2,..., n + m}, x̄_i = b_i for i ∈ B, x̄_i = 0 otherwise.
  Line 6: ℓ will be the leaving variable so that x_ℓ has the most negative value.
  Line 8: Pivot step with x_ℓ leaving and x0 entering.
  Line 13: This pivot step does not change the value of any variable.
Example of INITIALIZE-SIMPLEX (1/3)

maximize 2x1 − x2
subject to
  2x1 − x2 ≤ 2
  x1 − 5x2 ≤ −4
  x1, x2 ≥ 0

Formulating the auxiliary linear program

maximize −x0
subject to
  2x1 − x2 − x0 ≤ 2
  x1 − 5x2 − x0 ≤ −4
  x1, x2, x0 ≥ 0

Converting into slack form

z = −x0
x3 = 2 − 2x1 + x2 + x0
x4 = −4 − x1 + 5x2 + x0

Basic solution (0, 0, 0, 2, −4) not feasible!
Example of INITIALIZE-SIMPLEX (2/3)

z = −x0
x3 = 2 − 2x1 + x2 + x0
x4 = −4 − x1 + 5x2 + x0

Pivot with x0 entering and x4 leaving

z = −4 − x1 + 5x2 − x4
x0 = 4 + x1 − 5x2 + x4
x3 = 6 − x1 − 4x2 + x4

Basic solution (4, 0, 0, 6, 0) is feasible!

Pivot with x2 entering and x0 leaving

z = −x0
x2 = 4/5 − x0/5 + x1/5 + x4/5
x3 = 14/5 + 4x0/5 − 9x1/5 + x4/5

Optimal solution has x0 = 0, hence the initial problem was feasible!
Example of INITIALIZE-SIMPLEX (3/3)

z = −x0
x2 = 4/5 − x0/5 + x1/5 + x4/5
x3 = 14/5 + 4x0/5 − 9x1/5 + x4/5

Set x0 = 0 and express the objective function by non-basic variables:

2x1 − x2 = 2x1 − (4/5 − x0/5 + x1/5 + x4/5)

z = −4/5 + 9x1/5 − x4/5
x2 = 4/5 + x1/5 + x4/5
x3 = 14/5 − 9x1/5 + x4/5

Basic solution (0, 4/5, 14/5, 0), which is feasible!
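The returned basic solution can be verified against the original linear program by plugging it into the slack form from the beginning of this section. This is a small sanity check only; the function name `slack_values` is made up for the illustration.

```python
from fractions import Fraction as F

def slack_values(x1, x2):
    """Slack variables of the original program: x3 = 2 - 2x1 + x2, x4 = -4 - x1 + 5x2."""
    return 2 - 2 * x1 + x2, -4 - x1 + 5 * x2

x1, x2 = F(0), F(4, 5)
x3, x4 = slack_values(x1, x2)
feasible = min(x1, x2, x3, x4) >= 0   # all variables must be non-negative
objective = 2 * x1 - x2
```

The slacks come out as x3 = 14/5 and x4 = 0, so the point is feasible, and the objective value −4/5 agrees with the constant term of z in the final slack form.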
Lemma 29.12 If a linear program L has no feasible solution, then INITIALIZE-SIMPLEX returns “infeasible”. Otherwise, it returns a valid slack form for which the basic solution is feasible.
Fundamental Theorem of Linear Programming

Theorem 29.13
Any linear program L, given in standard form, either
1. has an optimal solution with a finite objective value,
2. is infeasible, or
3. is unbounded.
If L is infeasible, SIMPLEX returns “infeasible”. If L is unbounded, SIMPLEX returns “unbounded”. Otherwise, SIMPLEX returns an optimal solution with a finite objective value.
Linear Programming and Simplex: Summary

Linear Programming
  extremely versatile tool for modelling problems of all kinds
  basis of Integer Programming, to be discussed in later lectures

Simplex Algorithm
  In practice: usually terminates in polynomial time, typically within O(m + n) iterations
  In theory: even with anti-cycling may need exponential time
  Research Problem: Is there a pivoting rule which makes SIMPLEX a polynomial-time algorithm?

Polynomial-Time Algorithms
  Interior-Point Methods: traverse the interior of the feasible set of solutions (not just vertices!)

[Figures: a polytope in (x1, x2, x3)-space, once with a path along its vertices (Simplex) and once with a path through its interior (Interior-Point Methods).]
IV. Approximation Algorithms: Covering Problems Thomas Sauerwald
Easter 2015 Outline
Introduction
Vertex Cover
The Set-Covering Problem
IV. Covering Problems Introduction 2 Motivation
Many fundamental problems are NP-complete, yet they are too important to be abandoned.

Examples: HAMILTON, 3-SAT, VERTEX-COVER, KNAPSACK, ...

Strategies to cope with NP-complete problems
1. If inputs (or solutions) are small, an algorithm with exponential running time may be satisfactory.
2. Isolate important special cases which can be solved in polynomial-time.
3. Develop algorithms which find near-optimal solutions in polynomial-time. We will call these approximation algorithms.
Performance Ratios for Approximation Algorithms

Approximation Ratio
An algorithm for a problem has approximation ratio ρ(n) if, for any input of size n, the cost C of the returned solution and the optimal cost C∗ satisfy:

max(C/C∗, C∗/C) ≤ ρ(n).

Maximization problem: C∗/C ≥ 1
Minimization problem: C/C∗ ≥ 1

This covers both maximization and minimization problems.

For many problems: tradeoff between runtime and approximation ratio.

Approximation Schemes
An approximation scheme is an approximation algorithm which, given any input and ε > 0, is a (1 + ε)-approximation algorithm.
It is a polynomial-time approximation scheme (PTAS) if for any fixed ε > 0, the runtime is polynomial in n. For example, O(n^(2/ε)).
It is a fully polynomial-time approximation scheme (FPTAS) if the runtime is polynomial in both 1/ε and n. For example, O((1/ε)^2 · n^3).
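The symmetric definition of the ratio can be illustrated with a tiny helper. The function name `approximation_ratio` is chosen for this sketch only, and the numbers are made-up costs.

```python
def approximation_ratio(cost, optimal_cost):
    """Return max(C/C*, C*/C), which is >= 1 for minimization and maximization alike."""
    if cost <= 0 or optimal_cost <= 0:
        raise ValueError("costs are assumed to be positive")
    return max(cost / optimal_cost, optimal_cost / cost)

# Minimization: the algorithm returns cost 6 while the optimum is 3.
ratio_min = approximation_ratio(6, 3)
# Maximization: the algorithm returns value 5 while the optimum is 10.
ratio_max = approximation_ratio(5, 10)
```

Both calls give ratio 2, which is why a single bound ρ(n) suffices for both kinds of problems.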
Outline
Introduction
Vertex Cover
The Set-Covering Problem
The Vertex-Cover Problem

Vertex Cover Problem
Given: Undirected graph G = (V, E)
Goal: Find a minimum-cardinality subset V′ ⊆ V such that if (u, v) ∈ E(G), then u ∈ V′ or v ∈ V′.

We are covering edges by picking vertices!
This is an NP-hard problem.

[Figure: an example graph on the vertices a, b, c, d, e.]

Applications:
  Every edge forms a task, and every vertex represents a person/machine which can execute that task
  Perform all tasks with the minimal amount of resources
  Extensions: weighted edges or hypergraphs
An Approximation Algorithm based on Greedy

APPROX-VERTEX-COVER(G)
1  C = ∅
2  E′ = G.E
3  while E′ ≠ ∅
4      let (u, v) be an arbitrary edge of E′
5      C = C ∪ {u, v}
6      remove from E′ every edge incident on either u or v
7  return C

Figure 35.1 illustrates how APPROX-VERTEX-COVER operates on an example graph. The variable C contains the vertex cover being constructed. Line 1 initializes C to the empty set. Line 2 sets E′ to be a copy of the edge set G.E of the graph. The loop of lines 3–6 repeatedly picks an edge (u, v) from E′, adds its endpoints to C, and removes from E′ every edge incident on u or v.

Figure 35.1 The operation of APPROX-VERTEX-COVER. (a) The input graph G, which has 7 vertices and 8 edges. (b) The edge (b, c), shown heavy, is the first edge chosen by APPROX-VERTEX-COVER. Vertices b and c, shown lightly shaded, are added to the set C containing the vertex cover being created. Edges (a, b), (c, e), and (c, d), shown dashed, are removed since they are now covered by some vertex in C. (c) Edge (e, f) is chosen; vertices e and f are added to C. (d) Edge (d, g) is chosen; vertices d and g are added to C. (e) The set C, which is the vertex cover produced by APPROX-VERTEX-COVER, contains the six vertices b, c, d, e, f, g. (f) The optimal vertex cover for this problem contains only three vertices: b, d, and e.

Edges chosen and removed from E′, in order:
1. {b, c}
2. {e, f}
3. {d, g}

APPROX-VERTEX-COVER produces a set of size 6.

The optimal solution has size 3.
Analysis of Greedy for Vertex Cover

Theorem 35.1
APPROX-VERTEX-COVER is a poly-time 2-approximation algorithm.

Proof:
  Running time is O(V + E) (using adjacency lists to represent E′).
  Let A ⊆ E denote the set of edges picked in line 4.
  Every optimal cover C∗ must include at least one endpoint of each edge in A, and edges in A do not share a common endpoint: |C∗| ≥ |A|.
  Every edge in A contributes 2 vertices to |C|: |C| = 2|A| ≤ 2|C∗|.

We can bound the size of the returned solution without knowing the (size of an) optimal solution!
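The greedy algorithm and the bound |C| ≤ 2|C∗| can be tried out on the example graph. The sketch below uses a plain edge list instead of adjacency lists, so it does not achieve the O(V + E) running time; the edge list is an assumed reconstruction of the 8-edge example graph that is consistent with the figure caption, and the function name is illustrative.

```python
def approx_vertex_cover(edges):
    """Greedy 2-approximation: repeatedly pick an uncovered edge and take both endpoints."""
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]                       # an arbitrary edge of E'
        cover |= {u, v}
        # Remove every edge incident on u or v.
        remaining = [(x, y) for (x, y) in remaining
                     if x not in (u, v) and y not in (u, v)]
    return cover

def is_vertex_cover(edges, cover):
    return all(u in cover or v in cover for u, v in edges)

# Assumed edge list of the example; the order makes the algorithm pick
# {b, c}, {e, f}, {d, g} as on the slide.
edges = [("b", "c"), ("e", "f"), ("d", "g"), ("a", "b"),
         ("c", "d"), ("c", "e"), ("d", "e"), ("d", "f")]
cover = approx_vertex_cover(edges)
optimal = {"b", "d", "e"}
```

With this edge order the returned cover has the six vertices b, c, d, e, f, g, exactly twice the size of the optimal cover {b, d, e}, so the factor 2 in the analysis is attained on this instance.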
Solving Special Cases

Strategies to cope with NP-complete problems
1. If inputs are small, an algorithm with exponential running time may be satisfactory.
2. Isolate important special cases which can be solved in polynomial-time.
3. Develop algorithms which find near-optimal solutions in polynomial-time.
Vertex Cover on Trees
There exists an optimal vertex cover which does not include any leaves.
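Exchange argument: Replace any leaf in the cover by its parent. The parent covers the only edge incident to the leaf, so the result is still a cover and is no larger.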
Solving Vertex Cover on Trees
There exists an optimal vertex cover which does not include any leaves.
VERTEX-COVER-TREES(G)
C = ∅
while ∃ leaves in G
    Add the parents of all leaves to C
    Remove all leaves and their parents from G
return C
Clear: Running time is O(V), and the returned solution is a vertex cover.
Solution is also optimal. (Use inductively the existence of an optimal vertex cover without leaves.)
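The leaf-stripping idea can be sketched in Python as follows. This is an illustrative variant under stated assumptions, not the slide's exact procedure: it handles one leaf at a time through a queue instead of removing all leaves in a round, which avoids adding both endpoints when a component has shrunk to a single edge. The function name `vertex_cover_tree` and the adjacency-dictionary input format are hypothetical.

```python
from collections import deque


def vertex_cover_tree(adj):
    """Minimum vertex cover of a forest by repeatedly taking a leaf's parent.

    adj: dict mapping each vertex to the set of its neighbours.
    Runs in time linear in the size of the forest.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cover = set()
    leaves = deque(v for v, ns in adj.items() if len(ns) == 1)
    while leaves:
        leaf = leaves.popleft()
        if len(adj[leaf]) != 1:
            continue  # stale entry: the vertex lost its last edge earlier
        (parent,) = adj[leaf]
        cover.add(parent)
        # Remove the parent together with all its incident edges;
        # neighbours that drop to degree 1 become new leaves.
        for w in adj[parent]:
            adj[w].discard(parent)
            if len(adj[w]) == 1:
                leaves.append(w)
        adj[parent] = set()
    return cover


# A path a - b - c - d - e: the leaves a and e force their parents b and d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
print(sorted(vertex_cover_tree(path)))  # ['b', 'd']
```

Each vertex enters the queue at most once and each edge is deleted once, which is where the O(V) bound for trees comes from.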
The problem can also be solved on bipartite graphs, using Max-Flows and Min-Cuts.
Execution on a Small Example
After iteration 1
After iteration 2
Exact Algorithms
Strategies to cope with NP-complete problems
1. If inputs (or solutions) are small, an algorithm with exponential running time may be satisfactory. Such algorithms are called exact algorithms.
2. Isolate important special cases which can be solved in polynomial time.
3. Develop algorithms which find near-optimal solutions in polynomial time.
Focus on instances where the minimum vertex cover is small, that is, smaller than some given integer k.
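To see why a small k helps, consider the most naive exact method: try every vertex subset of size at most k. The Python sketch below is only an illustration of that brute-force idea and is not necessarily the exact algorithm the lecture goes on to present. The name `small_vertex_cover`, its parameters, and the choice of "at most k" as the size bound are assumptions. With n vertices it inspects O(n^k) subsets, which is exponential in k only.

```python
from itertools import combinations


def small_vertex_cover(vertices, edges, k):
    """Return a minimum vertex cover if one of size at most k exists, else None.

    Brute force: candidate subsets are tried in order of increasing size,
    so the first cover found is a smallest one.
    """
    for size in range(k + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return None


# Assumed edge list consistent with the caption of Figure 35.1.
vertices = "abcdefg"
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("d", "f"), ("d", "g"), ("e", "f")]
print(small_vertex_cover(vertices, edges, 2))  # None
print(len(small_vertex_cover(vertices, edges, 3)))  # 3
```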