I. Networks Thomas Sauerwald

Easter 2015 Outline

Introduction to Sorting Networks

Batcher’s

Counting Networks

Load Balancing on Graphs

I. Sorting Networks Introduction to Sorting Networks 2 Overview: Sorting Networks

(Serial) Sorting we already know several (comparison-based) sorting algorithms: Insertion , , , Quick sort, sort execute one operation at a time can handle arbitrarily large inputs sequence of comparisons is not set in advance

Sorting Networks only perform comparisons can only handle inputs of a fixed size sequence of comparisons is set in advance Allows to sort n numbers Comparisons can be performed in parallel in sublinear time!

Simple concept, but surprisingly deep and complex theory!

I. Sorting Networks Introduction to Sorting Networks 3 Comparison Networks

A sorting network is a comparison network which Comparison Network works correctly (that is, it sorts every input) A comparison network consists solely of wires and comparators: comparator is a device with, on given two inputs, x and y, returns two operates in O(1) outputs x0 and y 0 wire connect output of one comparator to the input of another special wires: n input wires a1, a2,..., an and n output wires b1, b2,..., bn

Convention:27.1 Comparison use networks the same name for both a wire and its value. 705

7 3 x x! min(x, y) x x ! min(x, y) comparator = = 3 7 y y! max(x, y) y y! max(x, y) = = (a) (b)

Figure 27.1 (a) Acomparatorwithinputsx and y and outputs x! and y!. (b) The same comparator, drawn as a single vertical line. Inputs x 7, y 3andoutputsx 3, y 7areshown. = = ! = ! =

Acomparisonnetworkiscomposedsolelyofwiresandcomparators. A compara- tor,showninFigure27.1(a),isadevicewithtwoinputs,I. Sorting Networks Introduction to Sorting Networks x and y,andtwooutputs,4 x! and y!,thatperformsthefollowingfunction:

x! min(x, y), = y! max(x, y). = Because the pictorial representation of a comparator in Figure 27.1(a) is too bulky for our purposes, we shall adopt the convention of drawing comparators as single vertical lines, as shown in Figure 27.1(b). Inputs appear on the left and outputs on the right, with the smaller input value appearing on the top output and the larger input value appearing on the bottom output. We can thus think of a comparator as sorting its two inputs. We shall assume that each comparator operates in O(1) time. In other words, we assume that the time between the appearance of the input values x and y and the production of the output values x! and y! is a constant. A wire transmits a value from place to place. Wires can connect the output of one comparator to the input of another, but otherwise they are either network input wires or network output wires. Throughout this chapter, we shall assume that a comparison network contains n input wires a1, a2,...,an,throughwhich the values to be sorted enter the network, and n output wires b1, b2,...,bn,which produce the results computed by the network. Also, we shall speak of the input sequence a1, a2,...,an and the output sequence b1, b2,...,bn ,referringto the values" on the input and# output wires. That is, we use" the same name# for both a wire and the value it carries. Our intention will always be clear from the context. Figure 27.2 shows a comparison network,whichisasetofcomparatorsinter- connected by wires. We draw a comparison network on n inputs as a collection of n horizontal lines with comparators stretched vertically. Note that a line does not represent a single wire, but rather a sequence of distinct wires connecting vari- ous comparators. The top line in Figure 27.2, for example, represents three wires: input wire a1,whichconnectstoaninputofcomparatorA;awireconnectingthe top output of comparator A to an input of comparator C;andoutputwireb1,which comes from the top output of comparator C.Eachcomparatorinputisconnected X

Interconnections between comparators must be acyclic 9 5 2 2

5 9 6 5

F D F 2 2 F 5 6

D 6 6 9 9

depth 0 1 1 2 2 3 TracingThis backnetwork a path is in must fact a neverMaximum sorting cycle network! depth back ofon an output Depth of a wire: itself and go through the samewire comparator equals total twice. running time Input wire has Depth 0

If a comparator has two inputs of depths dx and dy , then outputs have depth max{dx , dy } + 1

Example of a Comparison Network (Figure 27.2)

A horizontal line represents a sequence of distinct wires

a1 b1 A C

a2 b2 E

a3 b3 B D

a4 b4

I. Sorting Networks Introduction to Sorting Networks 5 A horizontal line represents a sequence of distinctX wires 9 5 2 2

5 9 6 5

D F 2 2 F 5 6

D 6 6 9 9

depth 0 1 1 2 2 3 TracingThis backnetwork a path is in must fact a neverMaximum sorting cycle network! depth back ofon an output Depth of a wire: itself and go through the samewire comparator equals total twice. running time Input wire has Depth 0

If a comparator has two inputs of depths dx and dy , then outputs have depth max{dx , dy } + 1

Example of a Comparison Network (Figure 27.2)

Interconnections between comparators must be acyclic

a1 b1 A C

a2 b2 F E

a3 b3 B D

a4 b4

I. Sorting Networks Introduction to Sorting Networks 5 A horizontal line represents a sequence of distinct wires 9 5 2 2

5 9 6 5

F D 2 2 F 5 6

D 6 6 9 9

depth 0 1 1 2 2 3 TracingThis backnetwork a path is in must fact a neverMaximum sorting cycle network! depth back ofon an output Depth of a wire: itself and go through the samewire comparator equals total twice. running time Input wire has Depth 0

If a comparator has two inputs of depths dx and dy , then outputs have depth max{dx , dy } + 1

Example of a Comparison Network (Figure 27.2)

Interconnections between comparators must be acyclic X

a1 b1 A C

a2 b2 D F E

a3 b3 B

a4 b4

I. Sorting Networks Introduction to Sorting Networks 5 A horizontal line represents a sequence of distinctX wires 9 5 2 2

5 9 6 5

F D F 2 2 5 6

D 6 6 9 9

depth 0 1 1 2 2 3 This network is in fact aMaximum sorting network! depth of an output Depth of a wire: wire equals total running time Input wire has Depth 0

If a comparator has two inputs of depths dx and dy , then outputs have depth max{dx , dy } + 1

Example of a Comparison Network (Figure 27.2)

Interconnections between comparators must be acyclic

a1 b1 A C

a2 b2 E F a3 b3 B D

a4 b4

Tracing back a path must never cycle back on itself and go through the same comparator twice.

I. Sorting Networks Introduction to Sorting Networks 5 X

InterconnectionsA horizontal between line represents comparators a sequencemust be of acyclic distinct wires

F D F F

D

depth 0 1 1 2 2 3 Tracing back a path must neverMaximum cycle depth back ofon an output Depth of a wire: itself and go through the samewire comparator equals total twice. running time Input wire has Depth 0

If a comparator has two inputs of depths dx and dy , then outputs have depth max{dx , dy } + 1

Example of a Comparison Network (Figure 27.2)

9 5 2 2 a1 b1 A C 5 9 6 5 a2 b2 E 2 2 5 6 a3 b3 B D 6 6 9 9 a4 b4

This network is in fact a sorting network!

I. Sorting Networks Introduction to Sorting Networks 5 X

InterconnectionsA horizontal between line represents comparators a sequencemust be of acyclic distinct wires

F D F F

D

TracingThis backnetwork a path is in must fact a never sorting cycle network! back on itself and go through the same comparator twice.

Example of a Comparison Network (Figure 27.2)

9 5 2 2 a1 b1 A C 5 9 6 5 a2 b2 E 2 2 5 6 a3 b3 B D 6 6 9 9 a4 b4 depth 0 1 1 2 2 3 Maximum depth of an output Depth of a wire: wire equals total running time Input wire has Depth 0

If a comparator has two inputs of depths dx and dy , then outputs have depth max{dx , dy } + 1

I. Sorting Networks Introduction to Sorting Networks 5 Zero-One Principle

Zero-One Principle: A sorting networks works correctly on arbitrary in- puts if it works correctly on binary inputs.

Lemma 27.1

If a comparison network transforms the input a = ha1, a2,..., ani into the output b = hb1, b2,..., bni, then for any monotonically increasing function f , the network transforms f (a) = hf (a1), f (a2),..., f (an)i into f (b) = hf (b1), f (b2),..., f (bn)i. 710 Chapter 27 Sorting Networks

f (x) min( f (x), f (y)) f (min(x, y)) = f (y) max( f (x), f (y)) f (max(x, y)) =

Figure 27.4 The operation of the comparator in the proof of Lemma 27.1. Thefunction f is monotonically increasing.

To prove the claim, consider a comparator whose input values are x and y.The upper output of the comparator is min(x, y) and the lower output is max(x, y). I. Sorting Networks Introduction to Sorting Networks 6 Suppose we now apply f (x) and f (y) to the inputs of the comparator, as is shown in Figure 27.4. The operation of the comparator yields the value min( f (x), f (y)) on the upper output and the value max( f (x), f (y)) on the lower output. Since f is monotonically increasing, x y implies f (x) f (y).Consequently,wehave the identities ≤ ≤ min( f (x), f (y)) f (min(x, y)) , = max( f (x), f (y)) f (max(x, y)) . = Thus, the comparator produces the values f (min(x, y)) and f (max(x, y)) when f (x) and f (y) are its inputs, which completes the proof of the claim. We can use induction on the depth of each wire in a general comparison network to prove a stronger result than the statement of the lemma: if awireassumesthe value ai when the input sequence a is applied to the network, then it assumes the value f (ai ) when the input sequence f (a) is applied. Because the output wires are included in this statement, proving it will prove the lemma. For the basis, consider a wire at depth 0, that is, an input wire ai .Theresult follows trivially: when f (a) is applied to the network, the input wire carries f (ai ). For the inductive step, consider a wire at depth d,whered 1. The wire is the output of a comparator at depth d,andtheinputwirestothiscomparatorareata≥ depth strictly less than d.Bytheinductivehypothesis,therefore,iftheinputwires to the comparator carry values ai and a j when the input sequence a is applied, then they carry f (ai ) and f (a j ) when the input sequence f (a) is applied. By our earlier claim, the output wires of this comparator then carry f (min(ai , a j )) and f (max(ai , a j )).Sincetheycarrymin(ai , a j ) and max(ai , a j ) when the input sequence is a,thelemmaisproved.

As an example of the application of Lemma 27.1, Figure 27.5(b)showsthesort- ing network from Figure 27.2 (repeated in Figure 27.5(a)) with the monotonically increasing function f (x) x/2 applied to the inputs. The value on every wire is f applied to the value on= the# same$ wire in Figure 27.2. When a comparison network is a sorting network, Lemma 27.1 allows us to prove the following remarkable result. Zero-One Principle

Zero-One Principle: A sorting networks works correctly on arbitrary in- puts if it works correctly on binary inputs.

Lemma 27.1

If a comparison network transforms the input a = ha1, a2,..., ani into the output b = hb1, b2,..., bni, then for any monotonically increasing function f , the network transforms f (a) = hf (a1), f (a2),..., f (an)i into f (b) = hf (b1), f (b2),..., f (bn)i.

Theorem 27.2 (Zero-One Principle) If a comparison network with n inputs sorts all 2n possible sequences of 0’s and 1’s correctly, then it sorts all sequence of arbitrary numbers correctly.

I. Sorting Networks Introduction to Sorting Networks 6 Proof of the Zero-One Principle

Theorem 27.2 (Zero-One Principle) If a comparison network with n inputs sorts all 2n possible sequences of 0’s and 1’s correctly, then it sorts all sequence of arbitrary numbers correctly.

Proof: For the sake of contradiction, suppose the network does not correctly sort.

Let a = ha1, a2,..., ani be the input with ai < aj , but the network places aj before ai in the output Define a monotonitcally increasing function f as: ( 0 if x ≤ a , f (x) = i 1 if x > ai .

Since the network places aj before ai , by the previous lemma ⇒ f (aj ) is placed before f (ai )

But f (aj ) = 1 and f (ai ) = 0, which contradicts the assumption that the network sorts all sequences of 0’s and 1’s correctly

I. Sorting Networks Introduction to Sorting Networks 7 Bubble Sort

Insertion Sort

Some Basic (Recursive) Sorting Networks

1 2 3 4 n-wire Sorting Network ??? 5

n − 1 n n + 1 These are Sorting Networks, but with depth Θ(n).

1 2 3 4 n-wire Sorting Network ??? 5

n − 1 n n + 1

I. Sorting Networks Introduction to Sorting Networks 8 Outline

Introduction to Sorting Networks

Batcher’s Sorting Network

Counting Networks

Load Balancing on Graphs

I. Sorting Networks Batcher’s Sorting Network 9 Bitonic Sequences

Bitonic Sequence A sequence is bitonic if it monotonically increases and then monoton- ically decreases, or can be circularly shifted to become monotonically increasing and then monotonically decreasing.

Sequences of one or two numbers are defined to be bitonic.

Examples: h1, 4, 6, 8, 3, 2i X h6, 9, 4, 2, 3, 5i X h9, 8, 3, 2, 4, 6i X ((( (h4,(5,(7, 1, 2, 6i binary sequences: 0i 1j 0k , or, 1i 0j 1k , for i, j, k ≥ 0.

I. Sorting Networks Batcher’s Sorting Network 10 Towards a Bitonic Sorting Networks

Half-Cleaner A half-cleaner is a comparison network of depth 1 in which input wire i is compared with wire i + n/2 for i = 1, 2,..., n/2.

We always assume that n is even. Lemma 27.3 If the input to a half-cleaner is a bitonic sequence of 0’s and 1’s, then the output satisfies the following properties: both the top half and the bottom half are bitonic, every element in the top is not larger than any element in the bottom, at least one half is clean. 27.3 A bitonic sorting network 713

0 0 0 0 0 0 bitonic, 0 0 bitonic 1 0 clean 1 1 1 0 1 0 bitonic bitonic 1 1 1 1 0 0 1 1 bitonic, bitonic 0 1 1 1 clean 0 1 0 1

I. Sorting Networks Batcher’s Sorting Network 11 Figure 27.7 The comparison network HALF-CLEANER[8]. Two different sample zero-one input and output values are shown. The input is assumed to be bitonic. A half-cleaner ensures that ev- ery output element of the top half is at least as small as every output element of the bottom half. Moreover, both halves are bitonic, and at least one half is clean.

even.) Figure 27.7 shows HALF-CLEANER[8], the half-cleaner with 8 inputs and 8outputs. When a bitonic sequence of 0’s and 1’s is applied as input to a half-cleaner, the half-cleaner produces an output sequence in which smaller values are in the top half, larger values are in the bottom half, and both halves arebitonic.Infact,at least one of the halves is clean—consisting of either all 0’s or all 1’s—and it is from this property that we derive the name “half-cleaner.” (Note that all clean sequences are bitonic.) The next lemma proves these properties of half-cleaners.

Lemma 27.3 If the input to a half-cleaner is a bitonic sequence of 0’s and 1’s, then the output satisfies the following properties: both the top half and the bottom half are bitonic, every element in the top half is at least as small as every element of the bottom half, and at least one half is clean.

Proof The comparison network HALF-CLEANER[n]comparesinputsi and i n/2fori 1, 2,...,n/2. Without loss of generality, suppose that the in- put+ is of the form= 00 ...011 ...100 ...0. (The situation in which the input is of the form 11 ...100 ...011 ...1issymmetric.)Therearethreepossiblecasesde- pending upon the block of consecutive 0’s or 1’s in which the midpoint n/2falls, and one of these cases (the one in which the midpoint occurs in the block of 1’s) is further split into two cases. The four cases are shown in Figure 27.8. In each case shown, the lemma holds. This suggests a recursive approach, since it now suffices to sort the top and bottom half separately.

Proof of Lemma 27.3 714 Chapter 27 Sorting Networks W.l.o.g. assume that the input is of the form 0i 1j 0k , for some i, j, k ≥ 0.

divide compare combine

0 top top bitonic, 0 1 1 clean 0 bitonic 0 0 1 0 1 1 1 0 bitonic 0 bottom bottom 1 (a)

0 0 top top 1 bitonic 0 0 1 0 bitonic 1 1 1 1 0 0 1 bitonic, bottom bottom clean 0 (b)

top top bitonic, 0 0 0 0 clean bitonic I. Sorting0 Networks1 Batcher’s0 Sorting1 Network 12 0 0 0 1 1 bitonic bottom bottom 0 0 (c)

0 top top bitonic, 1 0 0 0 clean bitonic 1 0 0 1 0 0 0 0 1 bitonic bottom bottom 0 (d)

Figure 27.8 The possible comparisons in HALF-CLEANER[n]. The input sequence is assumed to be a bitonic sequence of 0’s and 1’s, and without loss of generality, we assume that it is of the form 00 ...011 ...100 ...0. Subsequences of 0’s are white, and subsequences of 1’s are gray. We can think of the n inputs as being divided into two halves such that for i 1, 2,...,n/2, inputs i = and i n/2arecompared. (a)–(b) Cases in which the division occurs in the middle subsequence + of 1’s. (c)–(d) Cases in which the division occurs in a subsequence of 0’s. Forallcases,every element in the top half of the output is at least as small as every element in the bottom half, both halves are bitonic, and at least one half is clean. 714 Chapter 27 Sorting Networks

divide compare combine

0 top top bitonic, 0 1 1 clean 0 bitonic 0 0 1 0 1 1 1 0 bitonic 0 bottom bottom 1 (a)

0 0 top top 1 bitonic 0 0 1 0 bitonic 1 1 1 1 Proof of Lemma 27.3 0 0 1 bitonic, bottom bottom clean 0 i j k W.l.o.g. assume that the input(b) is of the form 0 1 0 , for some i, j, k ≥ 0.

top top bitonic, 0 0 0 0 clean bitonic 0 1 0 1 0 0 0 1 1 bitonic bottom bottom 0 0 (c)

0 top top bitonic, 1 0 0 0 clean bitonic 1 0 0 1 0 0 0 0 1 bitonic bottom bottom 0 (d)

This suggests a recursive approach, since it now Figure 27.8 The possible comparisonssuffices in HALF to-CLEANER sort the[n]. top The and input bottom sequence halfis assumed separately. to be a bitonic sequence of 0’s and 1’s, and without loss of generality, we assume that it is of the form 00 ...011 ...100 ...0. Subsequences of 0’s are white, and subsequences of 1’s are gray. We can think of the n inputsI. Sorting as Networks being divided into Batcher’s two halvesSorting Network such that for i 1, 2,...,n/2, inputs12 i = and i n/2arecompared. (a)–(b) Cases in which the division occurs in the middle subsequence + of 1’s. (c)–(d) Cases in which the division occurs in a subsequence of 0’s. Forallcases,every element in the top half of the output is at least as small as every element in the bottom half, both halves are bitonic, and at least one half is clean. The Bitonic27.3 Sorter A bitonic sorting network 715

0 0 0 0 0 0 BITONIC- 0 0 SORTER[n/2] 1 0 0 0 0 0 HALF- 1 0 bitonic sorted CLEANER[n] 1 1 1 0 0 0 BITONIC- 0 1 SORTER[n/2] 0 1 1 1 0 1 1 1 (a) (b)

Figure 27.9 The comparison network BITONIC-SORTER[n], shown here for n 8. (a) The re- = cursive construction: HALF-CLEANER[n]followedbytwocopiesofBITONIC-SORTER[n/2] that operate in parallel. (b) The network after unrolling the recursion. Each half-cleaner is shaded. Sam- ple zero-one values are shown on the wires. Henceforth we will always The assume that n is a power of 2. Recursive FormulaBy recursively for depth combiningD( half-cleaners,n): as shown in Figure 27.9, we can build a bitonic sorter,whichisanetworkthatsortsbitonicsequences.Thefirstst( age of BITONIC-SORTER[n]consistsofH0ALF-CLEANER if n =[n],1, which, by Lemma 27.3, produces twoD bitonic(n) = sequences of half the size such thatk every element in the top half is at least as smallD as(n every/2) element + 1 in if then = bottom2 . half. Thus, we can complete the sort by using two copies of BITONIC-SORTER[n/2] to sort the two halves recursively. In Figure 27.9(a), the recursion has been shown explicitly, and BITONIC-SORTERin Figure[n] 27.9(b),has depth the recursion log hasn and been unrolled sorts toany show zero-one theprogressivelysmaller bitonic sequence. half-cleaners that make up the remainder of the bitonic sorter. The depth D(n) of BITONIC-SORTER[n]isgivenbytherecurrence 0ifn 1 , D(nI.) Sorting Networks= Batcher’s Sorting Network 13 = D(n/2) 1ifn 2k and k 1 , ! + = ≥ whose solution is D(n) lg n. = Thus, a zero-one bitonic sequence can be sorted by BITONIC-SORTER,which has a depth of lg n.Itfollowsbytheanalogofthezero-oneprinciplegivenas Exercise 27.3-6 that any bitonic sequence of arbitrary numbers can be sorted by this network.

Exercises

27.3-1 How many zero-one bitonic sequences of length n are there? Merging Networks

Merging Networks can merge two sorted input sequences into one sorted output sequences will be based on a modification of BITONIC-SORTER[n]

Basic Idea: consider two given sequences X = 00000111, Y = 00001111 concatenating X with Y R (the reversal of Y ) ⇒ 0000011111110000

This sequence is bitonic!

Hence in order to merge the sequences X and Y , it suf- fices to perform a bitonic sort on X concatenated with Y R .

I. Sorting Networks Batcher’s Sorting Network 14 Construction of a Merging Network (1/2)

Given two sorted sequences ha1, a2,..., an/2i and han/2+1, an/2+2,..., ani We know it suffices to bitonically sort ha1, a2,..., an/2, an, an−1,..., an/2+1i Recall: first half-cleaner of BITONIC-SORT[n] compares i and n/2 + i ⇒ First part of MERGER[n] compares inputs i and n − i for i = 1, 2,..., n/2 Remaining27.4 A part merging is network identical to BITONIC-SORT[n] 717

0 0 0 0 a1 b1 a1 b1 0 0 0 0 a2 b2 a2 b2 sorted 1 0 bitonic 1 0 bitonic a3 b3 a3 b3 1 0 1 0 a4 b4 a4 b4 0 1 bitonic 1 1 a5 b5 a8 b8 0 1 0 0 a6 b6 a7 b7 sorted 0 0 bitonic 0 1 bitonic a7 b7 a6 b6 1 1 0 1 a8 b8 a5 b5 (a) (b) Lemma 27.3 still applies, since the reversal of a bitonic sequence is bitonic.

Figure 27.10 Comparing the first stage of MERGER[n]withHALF-CLEANER[n], for n 8. = (a) The first stage of MERGER[n]transformsthetwomonotonicinputsequences a , a ,...,a ! 1 2 n/2" and an/2 1, an/2 2,...,an into two bitonic sequences b1, b2,...,bn/2 and bn/2 1, bn/2 2, ! + + " ! " ! + + ..., bn . (b) The equivalent operation for HALF-CLEANER[n]. The bitonic input sequence " a1, a2,...,an/2 1, an/2, an, an 1,...,an/2 2, an/2 1 is transformed into the two bitonic se- ! − − + + " quences b1, b2,...,bn/2 and bn, bn 1,...,bn/2 1 . ! " ! − + "

We canI. Sorting construct Networks M ERGER[n Batcher’s]bymodifyingthefirsthalf-cleanerofB Sorting Network ITONIC15 - SORTER[n]. The key is to perform the reversal of the second half of the inputs implicitly. Given two sorted sequences a1, a2,...,an/2 and an/2 1, an/2 2, ..., a to be merged, we want the effect! of bitonically" sorting! the+ sequence+ n" a1, a2,...,an/2, an, an 1,...,an/2 1 .Sincethefirsthalf-cleanerofBITONIC- ! − + " SORTER[n]comparesinputsi and n/2 i,fori 1, 2,...,n/2, we make the first stage of the merging network compare+ inputs =i and n i 1. Figure 27.10 shows the correspondence. The only subtlety is that the orde−roftheoutputsfrom+ the bottom of the first stage of MERGER[n]arereversedcomparedwiththeorder of outputs from an ordinary half-cleaner. Since the reversalofabitonicsequence is bitonic, however, the top and bottom outputs of the first stage of the merging network satisfy the properties in Lemma 27.3, and thus the topandbottomcanbe bitonically sorted in parallel to produce the sorted output of the merging network. The resulting merging network is shown in Figure 27.11. Only the first stage of MERGER[n]isdifferentfromBITONIC-SORTER[n]. Consequently, the depth of MERGER[n]islgn,thesameasthatofBITONIC-SORTER[n].

Exercises

27.4-1 Prove an analog of the zero-one principle for merging networks. Specifically, show that a comparison network that can merge any two monotonically increasing se- Construction of a Merging Network (2/2)

718 Chapter 27 Sorting Networks

0 0 0 0 0 0 BITONIC- 0 0 sorted SORTER[n/2] 1 1 1 0 1 0 0 1 sorted 0 1 1 1 1 1 BITONIC- 1 1 sorted SORTER[n/2] 1 1 1 1 1 1 1 1 (a) (b)

Figure 27.11 Anetworkthatmergestwosortedinputsequencesintoonesorted output sequence. The network MERGER[n]canbeviewedasBITONIC-SORTER[n]withthefirsthalf-cleaneralteredto compare inputs i and n i 1fori 1, 2,...,n/2. Here, n 8. (a) The network decomposed into − + = = the first stage followed by two parallel copies of BITONIC-SORTER[n/2]. (b) The same network with the recursion unrolled. Sample zero-one values are shown on the wires, and the stages are shaded.

quences of 0’s and 1’s can merge any two monotonically increasing sequences of arbitrary numbers. I. Sorting Networks Batcher’s Sorting Network 16 27.4-2 How many different zero-one input sequences must be applied to the input of a comparison network to verify that it is a merging network?

27.4-3 Show that any network that can merge 1 item with n 1sorteditemstoproducea sorted sequence of length n must have depth at least lg− n.

27.4-4 ! Consider a merging network with inputs a1, a2,...,an ,forn an exact power of 2, in which the two monotonic sequences to be merged are a1, a3,...,an 1 and " − # a2, a4,...,an .Provethatthenumberofcomparatorsinthiskindofmerging network" is "(n#lg n).Whyisthisaninterestinglowerbound?(Hint: Partition the comparators into three sets.)

27.4-5 ! Prove that any merging network, regardless of the order of inputs, requires "(n lg n) comparators. Construction of a Sorting Network 27.3 A bitonic sorting network 715

0 0 0 0 0 0 Main Components BITONIC- 0 0 SORTER[n/2] 1 0 0 0 0 0 HALF- 1 0 1.B ITONIC-SORT[n] 1. bitonic sorted CLEANER[n] 1 1 1 0 0 0 sorts any bitonic sequence BITONIC- 0 1 SORTER[n/2] 1 1 depth log n 718 Chapter 27 Sorting Networks 0 1 0 1 1 1 2.M ERGER[n] (a) (b) 0 0 0 0 0 0 merges two sorted input sequences BITONIC- 0 0 sorted Figure 27.9SORTERThe[n/2] comparison network BITONIC-SORTER[n], shown1 here for1n 8. (a) The re- 1 = 0 depth log n cursive construction: HALF-CLEANER[n]followedbytwocopiesofBITONIC-SORTER[n/2] that 1 0 0 1 operate in parallel. (b) The network after unrolling the recursion. Each half-cleaner is shaded. Sam- 2. 1 1 sorted ple zero-one values are shown on the wires. 0 1 1 1 BITONIC- 1 1 sorted SORTER[n/2] 1 1 The bitonic sorter 1 1 1 1 1 1 By(a) recursively combining half-cleaners, as shown in Figure(b) 27.9, we can build a bitonic sorter,whichisanetworkthatsortsbitonicsequences.Thefirststage 720 Chapterof BITONIC 27 Sorting-SORTER Networks[n]consistsofHALF-CLEANER[n], which, by Lemma 27.3, Batcher’s Sorting Network producesFigure 27.11 twoAnetworkthatmergestwosortedinputsequencesintoonesor bitonic sequences of half the size such that everyted element output sequence. in the topThenetwork half is M atERGER least[n as]canbeviewedasB small as everyITONIC element-SORTER in the[n]withthefirsthalf-cleaneralteredto bottom half. Thus, we can compare inputs i and n i 1fori 1, 2,...,n/2. Here, n 8. (a) The network decomposed into complete the sort by− using+ two= copiesMERGER of B[2]ITONIC=-SORTER[n/2] to sort the two ORTER[ ] the first stage followed by two parallel copies of BITONIC-SORTER[n/2]. (b) The same network with S n is defined recursively: SORTER[n/2] MERGER[4] halvesthe recursion recursively. unrolled.In Sample Figure zero-one 27.9(a), values the are recursion shown on the has wires, been and shown the stages explicitly, are shaded. and If n = 2k , use two copies of SORTER[n/2] to in Figure 27.9(b), the recursion has beenMERGER unrolled[2] to show theprogressivelysmaller half-cleanersM thatERGER make[n] up the remainder of the bitonic sorter. TheMERGER depth[8] D(n) of quences of 0’s and 1’s can merge any two monotonically increasing sequences of sort two subsequences of length n/2 each. BITONIC-SORTER[n]isgivenbytherecurrenceMERGER[2] arbitrary numbers. Then merge them using MERGER[n]. SORTER[n/2] MERGER[4] 0ifn 1 , MERGER[2] D27.4-2(n) = k If n = 1, network consists of a single wire. = D(n/2) 1ifn 2 and k 1 , How many(a)! different+ zero-one= input sequences≥ must be(b) applied to the input of a whosecomparison solution network is D(n to) verifylg n. that it is a merging network? = Thus, a zero-one bitonic sequence can be sorted by BITONIC-SORTER,which can be seen as a parallel version of merge sort 1 0 has27.4-3 a0 depth of lg n.Itfollowsbytheanalogofthezero-oneprinciplegivenas0 0 1 ExerciseShow0 that 27.3-6 any network that any that bitonic can0 merge sequence 1 item of witharbitraryn 1sorteditemstoproducea numbers can be sorted by − 1 0 thissorted network.1 sequence of length n must0 have depth at least lg n. 1 1 I. Sorting Networks Batcher’s Sorting Network 0 0 0 27.4-40 !17 1 Exercises 0 0 1 Consider0 a merging network with1 inputs a1, a2,...,an ,forn an exact power of 2, 0 0 in which0 the two monotonic1 sequences to be merged are a1, a3,...,an 1 and 27.3-1 " − # 0 0 a , a1 ,...,a .Provethatthenumberofcomparatorsinthiskindofmerging1 How" 2 many4 zero-onen# bitonic sequences of length n are there? 
depth 1 2 2network3 4 is4"(4n lg4 n5).Whyisthisaninterestinglowerbound?(5 6 Hint: Partition the comparators(c) into three sets.)

27.4-5 ! FigureProve 27.12 thatThe any sorting merging network network, SORTER[n regardless]constructedbyrecursivelycombiningmergingnet- of the order of inputs, requires works."(n(a)lg nThe) comparators. recursive construction. (b) Unrolling the recursion. (c) Replacing the MERGER boxes with the actual merging networks. The depth of each comparator is indicated, and sample zero-one values are shown on the wires.

27.5-2 Show that the depth of SORTER[n]isexactly(lg n)(lg n 1)/2. + 27.5-3 Suppose that we have 2n elements a1, a2,...,a2n and wish to partition them into the n smallest and the n largest. Prove! that we can" do this in constant additional depth after separately sorting a1, a2,...,an and an 1, an 2,...,a2n . ! " ! + + " 27.5-4 ! Let S(k) be the depth of a sorting network with k inputs, and let M(k) be the depth of a merging network with 2k inputs. Suppose that we have a sequence of n numbers to be sorted and we know that every number is within k positions of its correct position in the sorted order. Show that we can sort the n numbers in depth S(k) 2M(k). + 720 Chapter 27 Sorting Networks

720720Unrolling Chapter Chapterthe 27 Recursion 27 Sorting Sorting Networks Networks (Figure 27.12) MERGER[2] SORTER[n/2] MERGER[4]

MERGERMERGER[2] [2]MERGER[2] SORTERSORTER[n/2][n/2] MERGER[n] MERGERMERGER[4] [4] MERGER[8] MERGERMERGER[2] [2] MERGER[2] MERGERMERGER[n] [n] MERGERMERGER[8] [8] SORTER[n/2] MERGER[4] MERGERMERGER[2] [2] MERGER[2] SORTERSORTER[n/2][n/2] MERGERMERGER[4] [4] MERGER[2] (a) MERGER[2] (b) (a) (a) (b) (b)

1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 Recursion for D(n): 0 0 1 1 0 0 0 0 1 0 1 0 1 0 0 1 1 0 1 1 1 0 ( 0 00 1 1 1 1 0 0 0 0 if n = 1, 0 0 D(n) = 1 11 0 0 0 0 0 0 0 k 1 0 D(n/2) + log n if n = 2 . 0 00 1 1 0 0 1 1 1 0 00 0 0 0 0 00 1 1 1 2 0 0 1 1 Solution: D(n) = Θ(log n). 0 00 0 1 1 1 1 depth 1 2 2 3 4 4 4 4 5 5 6 depthdepth1 1 2 2 23 2 34 4 44 44 54 5 46 5 5 6 (c) (c) (c) SORTER[n] has depth Θ(log2 n) and sorts any input. FigureFigure 27.12 27.12TheThe sorting sorting network network SORTER SORTER[n]constructedbyrecursivelycombiningmergingnet-[n]constructedbyrecursivelycombiningmergingnet- works.works.Figure(a) The(a) 27.12The recursive recursiveThe construction. sortingconstruction. network(b) Unrolling(b) Unrolling SORTER the recursion. the[n]constructedbyrecursivelycombiningmergingnet- recursion.(c) Replacing(c) Replacing the M theERGER MERGERboxesboxes withwith theworks. actual the actual(a) mergingThe merging recursive networks. networks. construction. The Thedepth depth of each(b) of eachUnrolling comparat comparator the isor indicated, recursion. is indicated, and(c) sampleandReplacing sample zero-one zero-one the MERGER boxes valuesvalueswith areI. shown arethe Sorting shown actual on Networks the on merging wires. the wires. networks. Batcher’s The Sorting depth Network of each comparator is indicated, and18 sample zero-one values are shown on the wires. 27.5-227.5-2 ShowShow that that the thedepth depth of S ofORTER SORTER[n]isexactly[n]isexactly(lg n(lg)(lgn)(nlg n1)/2.1)/2. 27.5-2 + + 27.5-327.5-3Show that the depth of SORTER[n]isexactly(lg n)(lg n 1)/2. SupposeSuppose that that we havewe have 2n elements 2n elementsa , a ,...,, a ,...,a aandand wish wish to partition to partition+ them them into into 1 ! 21 2 2n 2n" the then27.5-3smallestn smallest and and the then largest.n largest. Prove! Prove that that we wecan" can do this do this in constant in constant additional additional depthdepthSuppose after after separately separatelythat we sorting have sorting 2an1,elementsa21,...,, a2,...,an aand1n, aand2a,...,n 1a,nan1a,22an,...,n and2,...,a2 wishn a.2n to. partition them into ! ! " " ! +! + + + " " the n smallest and the n largest. Prove! that we can" do this in constant additional 27.5-427.5-4! ! depth after separately sorting a1, a2,...,an and an 1, an 2,...,a2n . LetLetS(k)S(bek) thebe the depth depth of a of sorting a sorting network! network with withk inputs,k" inputs, and! and+ let letM(+kM) (bek) thebe the" depthdepth of a of merging a merging network network with with 2k inputs. 2k inputs. Suppose Suppose that that we havewe have a sequence a sequence of n of n numbersnumbers27.5-4 to be to! sorted be sorted and and we weknow know that that every every number number is within is withink positionsk positions of its of its correctcorrectLet positionS position(k) be in the thein thesorted depth sorted order. of order. a Show sorting Show that that networkwe canwe can sort with sort the thnkenumbersinputs,n numbers in and depth in letdepthM(k) be the S(k)Sdepth(k)2M2( ofkM).( ak merging). network with 2k inputs. Suppose that we have a sequence of n numbers+ + to be sorted and we know that every number is within k positions of its correct position in the sorted order. Show that we can sort the n numbers in depth S(k) 2M(k). + A Glimpse at the AKS Network

Ajtai, Komlós, Szemerédi (1983) There exists a sorting network with depth O(log n).

Quite elaborate construction, and involves huges constants.

Perfect Halver A perfect halver is a comparator network that, given any input, places the n/2 smaller keys in b1,..., bn/2 and the n/2 larger keys in bn/2+1,..., bn.

2 Perfect halver of depth log2 n exist yields sorting networks of depth Θ((log n) ).

Approximate Halver An (n, )-approximate halver,  < 1, is a comparator network that for every k = 1, 2,..., n/2 places at most k of its k smallest keys in bn/2+1,..., bn and at most k of its k largest keys in b1,..., bn/2.

We will prove that such networks can be constructed in constant depth!

I. Sorting Networks Batcher’s Sorting Network 19 Expander Graphs

Expander Graphs A bipartite (n, d, µ)-expander is a graph with: G has n vertices (n/2 on each side) the edge-set is the union of d matchings For every subset S ⊆ V being in one part,

|N(S)| ≥ min{µ · |S|, n/2 − |S|}

L R

Expander Graphs: probabilistic construction “easy”: take d (disjoint) random matchings explicit construction is a deep mathematical problem with ties to number theory, group theory, combinatorics etc. many applications in networking, complexity theory and coding theory

I. Sorting Networks Batcher’s Sorting Network 20 From Expanders to Approximate Halvers

1

2

3 1 6 4 2 7 3 8 5 4 9 6 5 10 7

L R 8

9

10

I. Sorting Networks Batcher’s Sorting Network 21 From Expanders to Approximate Halvers

1

2

3 1 6 4 2 7 3 8 5 4 9 6 5 10 7

L R 8

9

10

I. Sorting Networks Batcher’s Sorting Network 21 From Expanders to Approximate Halvers

1

2

3 1 6 4 2 7 3 8 5 4 9 6 5 10 7

L R 8

9

10

I. Sorting Networks Batcher’s Sorting Network 21 From Expanders to Approximate Halvers

1

2

3 1 6 4 2 7 3 8 5 4 9 6 5 10 7

L R 8

9

10

I. Sorting Networks Batcher’s Sorting Network 21 Here we used that k ≤ n/2

Existence of Approximate Halvers

Proof: X := wires with the k smallest inputs Y := wires in lower half with k smallest outputs For every u ∈ N(Y ): ∃ comparator (u, v) Let ut , vt be their keys after the comparator ut ud Let ud , vd be their keys at the output Note that vd ∈ Y ⊆ X Further: ud ≤ ut ≤ vt ≤ vd ⇒ ud ∈ X Since u was arbitrary: |Y | + |N(Y )| ≤ k. Since G is a bipartite (n, d, µ)-expander: |Y | + |N(Y )| ≥ |Y | + min{µ|Y |, n/2 − |Y |} = min{(1 + µ)|Y |, n/2}. v v Combining the two bounds above yields: t d (1 + µ)|Y | ≤ k. The same argument shows that at most  · k,  := 1/(µ + 1), of the k largest input keys are placed in b1,..., bn/2.

I. Sorting Networks Batcher’s Sorting Network 22 AKS network vs. Batcher’s network

Richard J. Lipton (Georgia Tech) Donald E. Knuth (Stanford) “The AKS sorting network is “Batcher’s method is much : it needs that n be better, unless n exceeds the galactic larger than 278 or so to finally total memory capacity of all be smaller than Batcher’s computers on earth!” network for n items.”

I. Sorting Networks Batcher’s Sorting Network 23 Siblings of Sorting Network

comparator Sorting Networks 7 2 < sorts any input of size n = special case of Comparison Networks 2 > 7

switch Switching (Shuffling) Networks 7 ? creates a random of n items special case of Permutation Networks 2 ?

Counting Networks balancer 7 5 balances any stream of tokens over n wires special case of Balancing Networks 2 4

I. Sorting Networks Batcher’s Sorting Network 24 Outline

Introduction to Sorting Networks

Batcher’s Sorting Network

Counting Networks

Load Balancing on Graphs

I. Sorting Networks Counting Networks 25 Number of tokens differs by at most one

Counting Network

Distributed Counting Processors collectively assign successive values from a given range.

Values could represent addresses in memories or destinations on an interconnection network

Balancing Networks constructed in a similar manner like sorting networks instead of comparators, consists of balancers balancers are asynchronous flip-flops that forward tokens from its inputs to one of its two outputs alternately (top, bottom, top,...)

I. Sorting Networks Counting Networks 26 Counting Network

Distributed Counting Processors collectively assign successive values from a given range.

Values could represent addresses in memories or destinations on an interconnection network

Balancing Networks constructed in a similar manner like sorting networks instead of comparators, consists of balancers balancers are asynchronous flip-flops that forward tokens from its inputs to one of its two outputs alternately (top, bottom, top,...)

Number of tokens differs by at most one

I. Sorting Networks Counting Networks 26 Bitonic Counting Network

Counting Network (Formal Definition)

1. Let x1, x2,..., xn be the number of tokens (ever received) on the designated input wires

2. Let y1, y2,..., yn be the number of tokens (ever received) on the designated output wires Pn Pn 3. In a quiescent state: i=1 xi = i=1 yi 4. A counting network is a balancing network with the step-property:

0 ≤ yi − yj ≤ 1 for any i < j.

I. Sorting Networks Counting Networks 27 0 0 Let z1,..., zn/2 and zn/2+1,..., zn be the outputs of the MERGER[n/2] subnetworks

0 Balancer between zj and zj will ensure that the step property holds.

n > 2: 0 0 IH ⇒ z1,..., zn/2 and zn/2+1,..., zn have the step property Pn/2 0 Pn/2 0 Let Z := i=1 zi and Z := i=1 zi F1 ⇒ Z = d 1 Pn/2 x e + b 1 Pn x c and Z 0 = b 1 Pn/2 x c + d 1 Pn x e 2 i=1 i 2 i=n/2+1 i 2 i=1 i 2 i=n/2+1 i 0 Case 1: If Z = Z , then F2 implies the output of MERGER[n] is yi = z1+b(i−1)/2c X 0 0 0 Case 2: If |Z − Z | = 1, F3 implies zi = zi for i = 1,..., n/2 except a unique j with zj 6= zj .

Correctness of the Bitonic Counting Network

Facts

Let x1,..., xn and y1,..., yn have the step property. Then: Pn/2  1 Pn  Pn/2  1 Pn  1. We have i=1 x2i−1 = 2 i=1 xi , and i=1 x2i = 2 i=1 xi Pn Pn 2. If i=1 xi = i=1 yi , then xi = yi for i = 1,..., n. Pn Pn 3. If i=1 xi = i=1 yi + 1, then ∃! j = 1, 2,..., n with xj = yj + 1 and xi = yi for j 6= i.

Key Lemma

Consider aM ERGER[n]. Then if the inputs x1,..., xn/2 and xn/2+1,..., xn have the step property, then so does the output y1,..., yn.

Proof (by induction on n) Case n = 2 is clear, since MERGER[2] is a single balancer

I. Sorting Networks Counting Networks 28 0 Balancer between zj and zj will ensure that the step property holds.

z0 1

z0 2 z0 3

z0 4

0 0 IH ⇒ z1,..., zn/2 and zn/2+1,..., zn have the step property Pn/2 0 Pn/2 0 Let Z := i=1 zi and Z := i=1 zi F1 ⇒ Z = d 1 Pn/2 x e + b 1 Pn x c and Z 0 = b 1 Pn/2 x c + d 1 Pn x e 2 i=1 i 2 i=n/2+1 i 2 i=1 i 2 i=n/2+1 i 0 Case 1: If Z = Z , then F2 implies the output of MERGER[n] is yi = z1+b(i−1)/2c X 0 0 0 Case 2: If |Z − Z | = 1, F3 implies zi = zi for i = 1,..., n/2 except a unique j with zj 6= zj .

Correctness of the Bitonic Counting Network

Facts

Let x1,..., xn and y1,..., yn have the step property. Then: Pn/2  1 Pn  Pn/2  1 Pn  1. We have i=1 x2i−1 = 2 i=1 xi , and i=1 x2i = 2 i=1 xi Pn Pn 2. If i=1 xi = i=1 yi , then xi = yi for i = 1,..., n. Pn Pn 3. If i=1 xi = i=1 yi + 1, then ∃! j = 1, 2,..., n with xj = yj + 1 and xi = yi for j 6= i.

z1

z2

z3

z4

Proof (by induction on n) Case n = 2 is clear, since MERGER[2] is a single balancer 0 0 n > 2: Let z1,..., zn/2 and zn/2+1,..., zn be the outputs of the MERGER[n/2] subnetworks

I. Sorting Networks Counting Networks 28 0 Balancer between zj and zj will ensure that the step property holds.

0 0 IH ⇒ z1,..., zn/2 and zn/2+1,..., zn have the step property Pn/2 0 Pn/2 0 Let Z := i=1 zi and Z := i=1 zi F1 ⇒ Z = d 1 Pn/2 x e + b 1 Pn x c and Z 0 = b 1 Pn/2 x c + d 1 Pn x e 2 i=1 i 2 i=n/2+1 i 2 i=1 i 2 i=n/2+1 i 0 Case 1: If Z = Z , then F2 implies the output of MERGER[n] is yi = z1+b(i−1)/2c X 0 0 0 Case 2: If |Z − Z | = 1, F3 implies zi = zi for i = 1,..., n/2 except a unique j with zj 6= zj .

Correctness of the Bitonic Counting Network

Facts

Let x1,..., xn and y1,..., yn have the step property. Then: Pn/2  1 Pn  Pn/2  1 Pn  1. We have i=1 x2i−1 = 2 i=1 xi , and i=1 x2i = 2 i=1 xi Pn Pn 2. If i=1 xi = i=1 yi , then xi = yi for i = 1,..., n. Pn Pn 3. If i=1 xi = i=1 yi + 1, then ∃! j = 1, 2,..., n with xj = yj + 1 and xi = yi for j 6= i.

z1 z0 1 z2 z0 2 z0 3 z3 z0 4 z4

Proof (by induction on n) Case n = 2 is clear, since MERGER[2] is a single balancer 0 0 n > 2: Let z1,..., zn/2 and zn/2+1,..., zn be the outputs of the MERGER[n/2] subnetworks

I. Sorting Networks Counting Networks 28 Correctness of the Bitonic Counting Network

Facts

Let x1,..., xn and y1,..., yn have the step property. Then: Pn/2  1 Pn  Pn/2  1 Pn  1. We have i=1 x2i−1 = 2 i=1 xi , and i=1 x2i = 2 i=1 xi Pn Pn 2. If i=1 xi = i=1 yi , then xi = yi for i = 1,..., n. Pn Pn 3. If i=1 xi = i=1 yi + 1, then ∃! j = 1, 2,..., n with xj = yj + 1 and xi = yi for j 6= i.

z1 z0 1 z2 z0 2 z0 3 z3 z0 4 z4

Proof (by induction on n) Case n = 2 is clear, since MERGER[2] is a single balancer 0 0 n > 2: Let z1,..., zn/2 and zn/2+1,..., zn be the outputs of the MERGER[n/2] subnetworks 0 0 IH ⇒ z1,..., zn/2 and zn/2+1,..., zn have the step property Pn/2 0 Pn/2 0 Let Z := i=1 zi and Z := i=1 zi F1 ⇒ Z = d 1 Pn/2 x e + b 1 Pn x c and Z 0 = b 1 Pn/2 x c + d 1 Pn x e 2 i=1 i 2 i=n/2+1 i 2 i=1 i 2 i=n/2+1 i 0 Case 1: If Z = Z , then F2 implies the output of MERGER[n] is yi = z1+b(i−1)/2c X 0 0 0 Case 2: If |Z − Z | = 1, F3 implies zi = zi for i = 1,..., n/2 except a unique j with zj 6= zj . 0 Balancer between zj and zj will ensure that the step property holds. I. Sorting Networks Counting Networks 28 2 26 1 5

42 1265

4 15 2 6

236 415

35 15 4 3

6513 43

36 3 4

Counting can be done as follows: Add local counter to each output wire i, to assign consecutive numbers i, i + n, i + 2 · n,...

Bitonic Counting Network in Action

4 x1 y1

2 x2 y2

5 3 1 x3 y3

6 x4 y4

I. Sorting Networks Counting Networks 29 4 2 26

42 1265

2 4 15

236 415

5 3 135 15 4

6513 43

6 36 3

Bitonic Counting Network in Action

x1 y1 1 5

x2 y2 2 6

x3 y3 3

x4 y4 4

Counting can be done as follows: Add local counter to each output wire i, to assign consecutive numbers i, i + n, i + 2 · n,...

I. Sorting Networks Counting Networks 29 A Periodic Counting Network [Aspnes, Herlihy, Shavit, JACM 1994]

x1 y1

x2 y2

x3 y3

x4 y4

x5 y5

x6 y6

x7 y7

x8 y8

Consists of log n BLOCK[n] networks each of which has depth log n

I. Sorting Networks Counting Networks 30 From Counting to Sorting The converse is not true! Counting vs. Sorting If a network is a counting network, then it is also a sorting network.

Proof. Let C be a counting network, and S be the corresponding sorting network n Consider an input sequence a1, a2,..., an ∈ {0, 1} to S n Define an input x1, x2,..., xn ∈ {0, 1} to C by xi = 1 iff ai = 0. C is a counting network ⇒ all ones will be routed to the lower wires S corresponds to C ⇒ all zeros will be routed to the lower wires By the Zero-One Principle, S is a sorting network.

0 1 1 1 1 1 0

1 0 0 1 1 0 0

C 1 1 1 0 0 0 1 S

0 0 0 0 0 1 1

I. Sorting Networks Counting Networks 31 Outline

Introduction to Sorting Networks

Batcher’s Sorting Network

Counting Networks

Load Balancing on Graphs

I. Sorting Networks Load Balancing on Graphs 32 Communication Models: Diffusion vs. Matching

1 2 1 2

6 3 6 3

5 4 5 4

 1 1 1   1 1  3 3 0 0 0 3 2 2 0 0 0 0 1 1 1 1 1  3 3 3 0 0 0  2 2 0 0 0 0  1 1 1  0 0 0 0 0 0 0 3 3 3 0 0 (t)   M =   M =  1 1  0 0 1 1 1 0 0 0 0 0 3 3 3  2 2   1 1 1  0 0 0 1 1 0 0 0 0 3 3 3   2 2  1 1 1 0 0 0 0 0 0 3 0 0 0 3 3

I. Sorting Networks Load Balancing on Graphs 33 Smoothness of the Load Distribution

n let x ∈ R be a load vector x denotes the average load

Metrics q t Pn t 2 3 3 `2-norm: Φ = (x − x) i=1 i 6.5 n t makespan: max x 2 i=1 i 2 n t n 3.5 discrepancy: maxi=1 xi − mini=1 xi . 1.5 2.5

For this example: √ √ Φt = 02 + 02 + 3.52 + 0.52 + 12 + 12 + 1.52 + 0.52 = 17 n t maxi=1 xi = 6.5 n t n t maxi=1 xi − mini=1 xi = 5

I. Sorting Networks Load Balancing on Graphs 34 Diffusion Matrix

How to choose α for a d-regular graph? α = 1 d may yield to oscillation (if graph is bipartite) α = 1 d+1 ensures convergence α = 1 2d ensures convergence (and all eigenvalues of M are non-negative) Diffusion Matrix Given an undirected, connected graph G = (V , E) and a diffusion pa- rameter α > 0, the diffusion matrix M is defined as follows:  α if (i, j) ∈ E,  Mij = 1 − α deg(i) if i = j, 0 otherwise. # neighbors of i

Further let γ(M) := maxµi 6=1 |µi |, where µ1 = 1 ≥ µ2 ≥ · · · ≥ µn ≥ −1 are the eigenvalues of M. This can be also seen as a random walk on G!

First-Order Diffusion: Load vector x t satisfies

x t = M · x t−1.

I. Sorting Networks Load Balancing on Graphs 35 1D grid 2D grid 3D grid

γ(M) ≈ 1 − 1 γ( ) ≈ − 1 γ(M) ≈ 1 − 1 n2 M 1 n n2/3

Hypercube Random Graph Complete Graph

1 γ(M) ≈ 1 − log n γ(M) < 1 γ(M) ≈ 0 γ(M) ∈ (0, 1] measures connectivity of G

I. Sorting Networks Load Balancing on Graphs 36 afterafter iteration iteration 20: 1:2:4:3:5:

1.111.301.481.561.860

1.110.560.931.281.850 2.223.332.102.061.88

1.111.671.231.341.85 3.332.782.602.431.90

1.711.671.661.86 2.473.332.782.601.90

2.221.672.161.88

Diffusion on a Ring

0

0 0

0 10

5 0

0

I. Sorting Networks Load Balancing on Graphs 37 afterafter iteration iteration 20: 2:4:3:5:

1.111.301.481.561.860

1.110.560.931.281.850 2.222.102.061.880

1.111.231.341.850 2.782.602.431.9010

1.711.661.865 2.472.782.601.900

2.222.161.880

Diffusion on a Ring

after iteration 1:

0

0 3.33

1.67 3.33

1.67 3.33

1.67

I. Sorting Networks Load Balancing on Graphs 37 after iteration 1:2:4:3:5:

1.111.301.481.560

1.110.560.931.280 2.223.332.102.060

1.111.671.231.340 3.332.782.602.4310

1.711.671.665 2.473.332.782.600

2.221.672.160

Diffusion on a Ring

after iteration 20:

1.86

1.85 1.88

1.85 1.90

1.86 1.90

1.88

I. Sorting Networks Load Balancing on Graphs 37 Convergence of the Quadratic Error (Upper Bound)

Lemma

Let γ(M) := maxµi 6=1 |µi |, where µ1 = 1 ≥ µ2 ≥ · · · ≥ µn ≥ −1 are the eigenvalues of M. Then for any iteration t,

Φt ≤ γ(M)2t · Φ0.

Proof: Let et = x t − x, where x is the column vector with all entries set to x Express et through the orthogonal basis given by the eigenvectors of M: n t X e = α1 · v1 + α2 · v2 + ··· + αn · vn = αi · vi . i=2 t For the diffusion scheme, e is orthogonal to v1 n ! n t+1 t X X e = Me = M · αi vi = αi µi vi . i=2 i=2

Taking norms and using that the vi ’s are orthogonal, n n t+1 t X 2 2 2 X 2 2 t ke k2 = kMe k2 = ci µi kvi k2 ≤ γ ci kvi k2 = γ · ke k2 i=2 i=2

I. Sorting Networks Load Balancing on Graphs 38 Convergence of the Quadratic Error (Lower Bound)

Lemma 0 For any eigenvalue µi , 1 ≤ i ≤ n, there is an initial load vector x so that

t 2t 0 Φ = µi · Φ .

Proof: 0 Let x = x + vi , where vi is the eigenvector corresponding to µi Then

t t−1 t 0 t t e = Me = M e = M vi = µi vi ,

and

t t 2t 2t 0 Φ = ke k2 = µi kvi k2 = µi Φ .

I. Sorting Networks Load Balancing on Graphs 39 Outlook: Idealised versus Discrete Case

Here load consists of integers that cannot be divided further.

Idealised Case Discrete Case Rounding Error

x t = M · x t−1 y t = M · y t−1 +∆ t t 0 t = M · x X − = Mt · y 0 + Mt s · ∆s s=1

Linear System Non-Linear System corresponds to Markov chain rounding of a Markov chain well-understood harder to analyze

Given any load vector x0, the num- How close can it be made ber of iterations until xt satisfies log(Φ0/) Φt ≤  is at most . to the idealised case? γ(M)

I. Sorting Networks Load Balancing on Graphs 40 II. Matrix Multiplication Thomas Sauerwald

Easter 2015 Outline

Introduction

Serial Matrix Multiplication

Reminder: Multithreading

Multithreaded Matrix Multiplication

II. Matrix Multiplication Introduction 2 4.2 Strassen’s for matrix multiplication 75

ray is 0.Howwouldyouchangeanyofthealgorithmsthatdonotallowempty subarrays to permit an empty subarray to be the result?

4.1-5 Use the following ideas to develop a nonrecursive, linear-time algorithm for the maximum-subarray problem. Start at the left end of the array, and progress toward the right, keeping track of the maximum subarray seen so far. Knowing a maximum subarray of AŒ1 : : j ,extendtheanswertofindamaximumsubarrayendingatin- dex j 1 by using the following observation: a maximum subarray of AŒ1 : : j 1 C C is either a maximum subarray of AŒ1 : : j  or a subarray AŒi : : j 1,forsome C 1 i j 1.DetermineamaximumsubarrayoftheformAŒi : : j 1 in Ä Ä C C constant time based on knowing a maximum subarray ending at index j .

4.2 Strassen’s algorithm for matrix multiplication

If you have seen matrices before, then you probably know how to multiply them. (Otherwise, you should read Section D.1 in Appendix D.) If A .aij / and D MatrixB .bij Multiplication/ are square n n matrices, then in the product C A B,wedefinethe D " D # entry cij ,fori;j 1; 2; : : : ; n,by D n cij aik bkj : (4.8) Remember:D If# A = (aij ) and B = (bij ) are square n × n matrices, then the k 1 matrixX productD C = A · B is defined by n2 n We must compute matrix entries,n and each is the sum of values. The following procedure takes n n matricesX A and B and multiplies them, returning their n n " cij = aik · bkj ∀i, j = 1, 2,..., n. " product C .Weassumethateachmatrixhasanattributerows,givingthenumber k=1 of rows in the matrix.

2 3 SQUARE-MATRIX-MULTIPLY.A; B/ This definition suggests that n · n = n 1 n A:rows arithmetic operations are necessary. D 2letC be a new n n matrix " 3 for i 1 to n D 4 for j 1 to n D 5 cij 0 D 6 for k 1 to n D 7 cij cij aik bkj D C # 8 return C

The SQUARE-MATRIX-MULTIPLY procedure works as follows. The3 for loop SQUARE-MATRIX-MULTIPLY(A, B) takes time Θ(n ). of lines 3–7 computes the entries of each row i,andwithinagivenrowi,the

II. Matrix Multiplication Introduction 3 Outline

Introduction

Serial Matrix Multiplication

Reminder: Multithreading

Multithreaded Matrix Multiplication

II. Matrix Multiplication Serial Matrix Multiplication 4 Divide & Conquer: First Approach

Assumption: n is always an exact power of 2.

Divide & Conquer: Partition A, B, and C into four n/2 × n/2 matrices: A A  B B  C C  A = 11 12 , B = 11 12 , C = 11 12 . A21 A22 B21 B22 C21 C22

Hence the equation C = A · B becomes: C C  A A  B B  11 12 = 11 12 · 11 12 C21 C22 A21 A22 B21 B22

This corresponds to the four equations: C = A · B + A · B 11 11 11 12 21 Each equation specifies C12 = A11 · B12 + A12 · B22 two multiplications of

C21 = A21 · B11 + A22 · B21 n/2×n/2 matrices and the addition of their products. C22 = A21 · B12 + A22 · B22

II. Matrix Multiplication Serial Matrix Multiplication 5 8 Multiplications Goal:4 Additions Reduce and the Partitioning number of multiplications

Divide4.2 Strassen’s & algorithmConquer: for matrix First multiplication Approach (Pseudocode) 77

SQUARE-MATRIX-MULTIPLY-RECURSIVE.A; B/ 1 n A:rows D Line 5: Handle submatrices implicitly through 2letC be a new n n matrix ! 3 if n == 1 index calculations instead of creating them. 4 c11 a11 b11 D " 5 else partition A, B,andC as in equations (4.9) 6 C11 SQUARE-MATRIX-MULTIPLY-RECURSIVE.A11;B11/ D SQUARE-MATRIX-MULTIPLY-RECURSIVE.A12;B21/ C 7 C12 SQUARE-MATRIX-MULTIPLY-RECURSIVE.A11;B12/ D SQUARE-MATRIX-MULTIPLY-RECURSIVE.A12;B22/ C 8 C21 SQUARE-MATRIX-MULTIPLY-RECURSIVE.A21;B11/ D SQUARE-MATRIX-MULTIPLY-RECURSIVE.A22;B21/ C 9 C22 SQUARE-MATRIX-MULTIPLY-RECURSIVE.A21;B12/ D SQUARE-MATRIX-MULTIPLY-RECURSIVE.A22;B22/ C 10 return C

This pseudocode glosses over one subtle but important implementation detail. LetHowT do(n we) be partition the runtime the matrices of in this line5? procedure. If we were to Then: create 12 new n=2 n=2 ‚.n2/ ! matrices, we would spend time( copying entries. In fact, we can partition the matrices without copying entries.Θ( The1) trick is to use index calculations.if n = 1, We T (n) = identify a submatrix by a range of row8 · indicesT (n/ and2) + a range Θ(n2 of) columnif n > indices1. of the original matrix. We end up representing a submatrix a little differently from how we represent the original matrix, which is the subtlety we are glossing over. Solution:The advantageT (n is) that, = Θ( since8log we2 n) can =specify Θ(n3) submatricesNo improvement by index calculations, over the naive algorithm! executing line 5 takes only ‚.1/ time (although we shall see that it makes no difference asymptotically to the overall running time whether we copy or partition II. Matrix Multiplication Serial Matrix Multiplication in place). 6 Now, we derive a recurrence to characterize the running time of SQUARE- MATRIX-MULTIPLY-RECURSIVE.LetT.n/ be the time to multiply two n n ! matrices using this procedure. In the base case, when n 1,weperformjustthe D one scalar multiplication in line 4, and so T.1/ ‚.1/ : (4.15) D The recursive case occurs when n>1.Asdiscussed,partitioningthematricesin line 5 takes ‚.1/ time, using index calculations. In lines 6–9, we recursively call SQUARE-MATRIX-MULTIPLY-RECURSIVE atotalofeighttimes.Becauseeach recursive call multiplies two n=2 n=2 matrices, thereby contributing T.n=2/ to ! the overall running time, the time taken by all eight recursive calls is 8T .n=2/.We also must account for the four matrix additions in lines 6–9. Each of these matrices contains n2=4 entries, and so each of the four matrix additions takes ‚.n2/ time. Since the number of matrix additions is a constant, the total time spent adding ma- Divide & Conquer (Second Approach)

Idea: Make the recursion tree less bushy by performing only 7 recursive multiplications of n/2 × n/2 matrices.

Strassen’s Algorithm (1969) 1. Partition each of the matrices into four n/2 × n/2 submatrices

2. Create 10 matrices S1, S2,..., S10. Each is n/2 × n/2 and is the sum or difference of two matrices created in the previous step.

3. Recursively compute 7 matrix products P1, P2,..., P7, each n/2 × n/2 4. Compute n/2 × n/2 submatrices of C by adding and subtracting various combinations of the Pi .

Time for steps 1,2,4: Θ(n2), hence T (n) = 7 · T (n/2) + Θ(n2) ⇒ T (n) = Θ(nlog 7).

II. Matrix Multiplication Serial Matrix Multiplication 7 Other three blocks can be verified similarly.

Details of Strassen’s Algorithm

The 10 Submatrices and 7 Products

P1 = A11 · S1 = A11 · (B12 − B22) P2 = S2 · B22 = (A11 + A12) · B22 P3 = S3 · B11 = (A21 + A22) · B11 P4 = A22 · S4 = A22 · (B21 − B11) P5 = S5 · S6 = (A11 + A22) · (B11 + B22) P6 = S7 · S8 = (A12 − A22) · (B21 + B22)

P7 = S9 · S10 = (A11 − A21) · (B11 + B12)

Claim A B + A B A B + A B  P + P − P + P P + P  11 11 12 21 11 12 12 21 = 5 4 2 6 1 2 A21B11 + A22B21 A21B12 + A22B22 P3 + P4 P5 + P1 − P3 − P7

Proof:      P5 + P4 − P2 + P6 = A11B11 +A11B22 +A22B11 +A22B22 +A22B21 −A22B11      −A11B22 −A12B22 + A12B21 +A12B22 −A22B21 −A22B22 = A11B11 + A12B21

II. Matrix Multiplication Serial Matrix Multiplication 8 Current State-of-the-Art

Conjecture: Does a quadratic-time algorithm exist?

Asymptotic Complexities: O(n3), naive approach O(n2.808), Strassen (1969) O(n2.796), Pan (1978) O(n2.522), Schönhage (1981) O(n2.517), Romani (1982) O(n2.496), Coppersmith and Winograd (1982) O(n2.479), Strassen (1986) O(n2.376), Coppersmith and Winograd (1989) O(n2.374), Stothers (2010) O(n2.3728642), V. Williams (2011) O(n2.3728639), Le Gall (2014) ...

II. Matrix Multiplication Serial Matrix Multiplication 9 Outline

Introduction

Serial Matrix Multiplication

Reminder: Multithreading

Multithreaded Matrix Multiplication

II. Matrix Multiplication Reminder: Multithreading 10 Memory Models

Distributed Memory Each processor has its private memory Access to memory of another processor via messages

1 2 3 4 5 6

Shared Memory Central location of memory Each processor has direct access

Shared Memory

1 2 3 4 5 6

II. Matrix Multiplication Reminder: Multithreading 11 Dynamic Multithreading

Programming shared-memory parallel computer difficult Use concurrency platform which coordinates all resources

Scheduling jobs, communication protocols, load balancing etc.

Functionalities: spawn (optional) prefix to a procedure call statement procedure is executed in a separate thread sync wait until all spawned threads are done parallel (optinal) prefix to the standard loop for each iteration is called in its own thread

Only logical parallelism, but not actual! Need a scheduler to map threads to processors.

II. Matrix Multiplication Reminder: Multithreading 12 Computing Fibonacci Numbers Recursively (Fig. 27.1) 27.1 The basics of dynamic multithreading 775

FIB.6/

FIB.5/ FIB.4/

FIB.4/ FIB.3/ FIB.3/ FIB.2/

FIB.3/ FIB.2/ FIB.2/ FIB.1/ FIB.2/ FIB.1/ FIB.1/ FIB.0/

FIB.2/ FIB.1/ FIB.1/ FIB.0/ FIB.1/ FIB.0/ FIB.1/ FIB.0/

FIB.1/ FIB.0/ Very inefficient – exponential time!

Figure 27.1 The tree of recursive procedure instances when computing FIB.6/.Eachinstanceof FIB with the same argument does the same work to produce the same result, providing an inefficient 0: FIB(n)but interesting way to compute Fibonacci numbers. 1: if n<=1 return n 2: elseFIB.n/ x=FIB(n-1) 3: 1 ify=FIB(n-2)n 1 Ä n 4: 2 returnreturn x+y 3 else x FIB.n 1/ D " 4 y FIB.n 2/ II. Matrix MultiplicationD " Reminder: Multithreading 13 5 return x y C You would not really want to compute large Fibonacci numbers this way, be- cause this computation does much repeated work. Figure 27.1 shows the tree of recursive procedure instances that are created when computing F6.Forexample, acalltoFIB.6/ recursively calls FIB.5/ and then FIB.4/.But,thecalltoFIB.5/ also results in a call to FIB.4/.BothinstancesofFIB.4/ return the same result (F4 3). Since the FIB procedure does not memoize, the second call to FIB.4/ D replicates the work that the first call performs. Let T.n/ denote the running time of FIB.n/.SinceFIB.n/ contains two recur- sive calls plus a constant amount of extra work, we obtain the recurrence T.n/ T.n 1/ T.n 2/ ‚.1/ : D " C " C This recurrence has solution T.n/ ‚.Fn/,whichwecanshowusingthesubsti- D tution method. For an inductive hypothesis, assume that T.n/ aFn b,where Ä " a>1and b>0are constants. Substituting, we obtain P-FIB(4)

P-FIB(3) P-FIB(2)

P-FIB(2) P-FIB(1) P-FIB(1) P-FIB(0) Computation Dag G = (V , E) • V set of threads (instructions/strands without parallel control) •P-FIB(1)E set of dependenciesP-FIB(0) Total work ≈ 17 nodes, longest path: 8 nodes

0: P-FIB(n) 1: if n<=1 return n 2: else x=spawn P-FIB(n-1) 3: y=spawn P-FIB(n-2) 4: sync 5: return x+y

Computing Fibonacci Numbers in Parallel (Fig. 27.2)

• Without spawn and sync same pseudocode as before • spawn does not imply parallel execution (depends on scheduler)

0: P-FIB(n) 1: if n<=1 return n 2: else x=spawn P-FIB(n-1) 3: y=spawn P-FIB(n-2) 4: sync 5: return x+y

II. Matrix Multiplication Reminder: Multithreading 14 P-FIB(4)

P-FIB(3) P-FIB(2)

P-FIB(2) P-FIB(1) P-FIB(1) P-FIB(0)

• Without spawn and sync same pseudocode as before • spawnP-FIB(1)does not implyP-FIB(0) parallelTotal execution work ≈ (depends17 nodes, on longest scheduler path:) 8 nodes

Computing Fibonacci Numbers in Parallel (Fig. 27.2)

Computation Dag G = (V , E) • V set of threads (instructions/strands without parallel control) • E set of dependencies

0: P-FIB(n) 1: if n<=1 return n 2: else x=spawn P-FIB(n-1) 3: y=spawn P-FIB(n-2) 4: sync 5: return x+y

II. Matrix Multiplication Reminder: Multithreading 14 Computation Dag G = (V , E) • Without• V setspawn of threadsand (instructions/strandssync same pseudocode without as parallel before control) • spawn• E setdoes of dependencies not imply parallel execution (depends on scheduler)

Computing Fibonacci Numbers in Parallel (Fig. 27.2)

P-FIB(4)

P-FIB(3) P-FIB(2)

P-FIB(2) P-FIB(1) P-FIB(1) P-FIB(0)

P-FIB(1) P-FIB(0) Total work ≈ 17 nodes, longest path: 8 nodes

0: P-FIB(n) 1: if n<=1 return n 2: else x=spawn P-FIB(n-1) 3: y=spawn P-FIB(n-2) 4: sync 5: return x+y

II. Matrix Multiplication Reminder: Multithreading 14 Computing Fibonacci Numbers in Parallel (DAG Perspective)

4 4 4

2 2 2

0

1

3 3 3

1

2 2 2

0

1

II. Matrix Multiplication Reminder: Multithreading 15 #nodesP = 18= 5

Span Longest time to execute the threads along any path.

If each thread takes unit time, span is the length of the critical path.

Performance Measures

P = 26 Work Total time to execute everything on single processor. 4

3 6 5

2 5

1

4

II. Matrix Multiplication Reminder: Multithreading 16 #nodesP = 26= 5

If each thread takes unit time, span is the length of the critical path.

Performance Measures

P = 18 Work Total time to execute everything on single processor. 4

3 6 5 Span

Longest time to execute the threads along any path. 2 5

1

4

II. Matrix Multiplication Reminder: Multithreading 16 P = 2618

4

3 6 5

2 5

1

4

Performance Measures

#nodes = 5 Work Total time to execute everything on single processor.

Span Longest time to execute the threads along any path.

If each thread takes unit time, span is the length of the critical path.

II. Matrix Multiplication Reminder: Multithreading 16 T∞ = 5

Span Law

TP ≥ T∞ Time on P processors can’t be shorter than time on ∞ processors

Speed-Up: T1 Maximum Speed-Up bounded by P! TP Parallelism: T1 T∞ Maximum Speed-Up for ∞ processors!

Work Law and Span Law

T1 = work, T∞ = span P = number of (identical) processors

TP = running time on P processors T1 = 8, P = 4

Running time actually also depends on scheduler etc.! Work Law

T1 TP ≥ P Time on P processors can’t be shorter than if all work all time

II. Matrix Multiplication Reminder: Multithreading 17 T1 = 8, P = 4

Speed-Up: T1 Maximum Speed-Up bounded by P! TP Parallelism: T1 T∞ Maximum Speed-Up for ∞ processors!

Work Law and Span Law

T1 = work, T∞ = span P = number of (identical) processors

TP = running time on P processors T∞ = 5

Running time actually also depends on scheduler etc.! Work Law

T1 TP ≥ P Time on P processors can’t be shorter than if all work all time

Span Law

TP ≥ T∞ Time on P processors can’t be shorter than time on ∞ processors

II. Matrix Multiplication Reminder: Multithreading 17 T1 =T∞8,=P5= 4

Work Law and Span Law

T1 = work, T∞ = span P = number of (identical) processors

TP = running time on P processors

Running time actually also depends on scheduler etc.! Work Law

T1 TP ≥ P Time on P processors can’t be shorter than if all work all time

Span Law

TP ≥ T∞ Time on P processors can’t be shorter than time on ∞ processors

Speed-Up: T1 Maximum Speed-Up bounded by P! TP Parallelism: T1 T∞ Maximum Speed-Up for ∞ processors!

II. Matrix Multiplication Reminder: Multithreading 17 Outline

Introduction

Serial Matrix Multiplication

Reminder: Multithreading

Multithreaded Matrix Multiplication

II. Matrix Multiplication Multithreaded Matrix Multiplication 18 27.1 The basics of dynamic multithreading 785

value for n suffices to achieve near perfect linear speedup for P-FIB.n/,because this procedure exhibits considerable parallel slackness.

Parallel loops Many algorithms contain loops all of whose iterations can operate in parallel. As we shall see, we can parallelize such loops using the spawn and sync keywords, but it is much more convenient to specify directly that the iterations of such loops can run concurrently. Our pseudocode provides this functionality via the parallel Warmup:concurrency Matrixkeyword, Vector which precedes Multiplication the for keyword in a for loop statement. As an example, consider the problem of multiplying an n n matrix A .aij / ! D Remember:by an n-vector Multiplyingx .xj /.Theresulting an n × n matrixn-vectorA =y (a ).yandi / isn given-vector by thex = equation (x ) yields D Dij j an n-vectorn y = (yi ) given by y a x ; i D ij j n j 1 X XD yi = aij xj for i = 1, 2,..., n. for i 1; 2; : : : ; n.Wecanperformmatrix-vectormultiplicationbycomputingallj=1 D the entries of y in parallel as follows:

MAT-VEC.A; x/ 1 n A:rows D 2lety be a new vector of length n 3 parallel for i 1 to n D 4 yi 0 D The parallel for-loops can be used since 5 parallel for i 1 to n D different entries of y can be computed concurrently. 6 for j 1 to n D 7 yi yi aij xj D C 8 return y

In this code, the parallel for keywords in lines 3 and 5 indicate that the itera- tions of theHow respective can a compiler loops may implement be run concurrently. the parallel A compiler for can-loop? implement each parallel for loop as a divide-and-conquer subroutine using nested parallelism. For example, the parallel for loop in lines 5–7 can be implemented with the call MAT-VEC-MAINII.-L MatrixOOP Multiplication.A; x; y; n; 1; n/ Multithreaded,wherethecompilerproducestheauxil- Matrix Multiplication 19 iary subroutine MAT-VEC-MAIN-LOOP as follows: 786 Chapter 27 Multithreaded Algorithms

1,8 27.1 The basics of dynamic multithreading 785

value for n suffices to achieve near perfect linear speedup for P-FIB.n/,because 1,4 5,8 this procedure exhibits considerable parallel slackness.

Parallel loops 1,2 3,4 5,6 7,8 Implementing786parallel Chapter 27 Multithreaded for AlgorithmsbasedMany on algorithms Divide-and-Conquer contain loops all of whose iterations can operate in parallel. As we shall see, we can parallelize such loops using the spawn and sync keywords, but it is much more convenient to specify directly that the iterations of such loops can run concurrently. Our pseudocode provides this functionality via the parallel 1,1 2,2 3,3 4,4 5,5 6,6 7,7 8,8 1,8 concurrency keyword, which precedes the for keyword in a for loop statement. As an example, consider the problem of multiplying an n n matrix A .aij / ! D by an n-vector x .xj /.Theresultingn-vector y .yi / is given by the equation Figure 27.4 AdagrepresentingthecomputationofM1,4AT-VEC-MAIN-LOOP.A; x; y; 8; 1; 5,8 8/.TheD D two numbers within each rounded rectangle give the values of the last two parametersn (i and i0 in the procedure header) in the invocation (spawn or call) of the procedure. They black circlesa repre-x ; i D ij j sent strands corresponding to either1,2 the base case or the 3,4 part of the procedure 5,6 up toj the1 spawn of 7,8 MAT-VEC-MAIN-LOOP in line 5; the shaded circles represent strands correspondingX toD the part of the procedure that calls MAT-VEC-MAIN-LOOP in line 6 up to the sync in linefor 7, wherei 1; it suspends 2; : : : ; n.Wecanperformmatrix-vectormultiplicationbycomputingall until the spawned subroutine in line 5 returns; and the white circles represent strandsD corresponding 1,1 2,2 3,3 4,4 5,5the entries 6,6 of y in 7,7 parallel 8,8 as follows: to the (negligible) part of the procedure after the sync up to the point where it returns. .A; x/ Figure 27.4 AdagrepresentingthecomputationofMMATAT-V-VECEC-MAIN-LOOP.A; x; y; 8; 1; 8/.The .A; x; y; n; i; i / i i MAT-VEC-MAIN-LOOP two numbers0 within each rounded rectangle give the1 valuesn of theA: lastrows two parameters ( and 0 in the procedure header) in the invocation (spawn or call) of theD procedure. The black circles repre- 1 if i == i 0 sent strands corresponding to either the base case2let or the party of thebe procedure a new up vector to the spawn of length of n j 1 n MAT-VEC-MAIN-LOOP in line 5; the shaded circles represent strands corresponding to the part of 2 for to the procedure that calls MAT-VEC-MAIN-LOOP in3 line 6parallel up to the sync forin linei 7, where1 itto suspendsn D D 3 yi yi aij xj until the spawned subroutine in line 5 returns; and the4 white circlesyi represent0 strands corresponding D C to the (negligible) part of the procedure after the sync up to the point whereD it returns. 4 else mid .i i 0/=2 5 parallel for i 1 to n D b C c .A; x; y; n; i; / D 5 spawn MAT-VEC-MAIN-LMATOOP-VEC-MAIN-LOOP.A;mid x; y; n; i; i 0/ 6 for j 1 to n D 6MAT-VEC-MAIN-LOOP.A; x;i y;i n;0 mid 1; i 0/ yi yi aij xj 1 if == C 7 2 for j 1 to n D C 7 sync D 8 return y 3 yi yi aij xj D C 4 else mid .i i 0/=2 D b C c This code recursively2 spawns theWork5 first halfspawn is ofM equal theAT-V iterationsEC-M toAIN-L running ofOOP the.A;In x; loop this y; n;toi; timecode,mid execute/ the ofparallel its serialization; for keywords in lines overhead 3 and 5 indicate that the itera- 6MAT-VEC-MAIN-LOOP.A; x; y; n; mid 1; i 0/ inT parallel1(n) with = Θ( then second) half of the iterations and then executestions a syncC of the,thereby respective loops may be run concurrently. A compiler can implement 7 ofsync recursive spawning does not change asymptotics. 
creating a of execution where the leaves are individualeach loopparallel iterations, for loop as a divide-and-conquer subroutine using nested parallelism. as shown in Figure 27.4. This code recursively spawns the first halfFor of the example, iterations of the theparallel loop to execute for loop in lines 5–7 can be implemented with the call To calculate the work T1.n/ of MinAT parallel-VEC withon the an secondn n halfmatrix, of the iterations weM simplyAT and-VEC then compute-M executesAIN-L a syncOOP,thereby.A; x; y; n; 1; n/,wherethecompilerproducestheauxil- T∞(n) = Θ(log n) + maxcreating aiter binary( treen) of! executionSpan where the is leaves the are individual depth loop of iterations, recursive callings plus the running time of its serialization, which we obtain by replacingiary the subroutineparallel for MAT-VEC-MAIN-LOOP as follows: 1as≤ showni≤n in Figure 27.4. 2 To calculate the workT1.n/T1.n/ ofthe‚.n MAT-V maximum/EC on an n n matrix, span we simply of compute any of the n iterations. loops with ordinary for loops. Thus, we have ,becausethequa-! the running time of its serialization,D which we obtain by replacing the parallel for dratic running time of the doubly nested loops in lines 5–7 dominates. This analysis2 = Θ(n). loops with ordinary for loops. Thus, we have T1.n/ ‚.n /,becausethequa- D dratic running time of the doubly nested loops in lines 5–7 dominates. This analysis II. Matrix Multiplication Multithreaded Matrix Multiplication 20 Naive Algorithm in Parallel

27.2 Multithreaded matrix multiplication 793

P-SQUARE-MATRIX-MULTIPLY.A; B/ 1 n A:rows D 2letC be a new n n matrix ! 3 parallel for i 1 to n D 4 parallel for j 1 to n D 5 cij 0 D 6 for k 1 to n D 7 cij cij aik bkj D C " 8 return C

To analyze this algorithm, observe that since the serialization of the algorithm is 3 3 P-SQUAREjust SQUARE-MATRIX-M-MATRIXULTIPLY-MULTIPLY(A, B),theworkisthereforesimplyhas work T1(n) = Θ(n ) andT1.n/ span T‚.n∞(n/), = Θ(n). D the same as the running time of SQUARE-MATRIX-MULTIPLY.Thespanis T .n/ ‚.n/,becauseitfollowsapathdownthetreeofrecursionforthe 1 D The first two nested for-loops parallelise perfectly. parallel for loop starting in line 3, then down the tree of recursion for the parallel for loop starting in line 4, and then executes all n iterations of the ordinary for loop starting in line 6, resulting in a total span of ‚.lg n/ ‚.lg n/ ‚.n/ ‚.n/. 3 2 C C D Thus, the parallelism is ‚.n /=‚.n/ ‚.n /.Exercise27.2-3asksyoutopar- D 3 allelize the innerII. loop Matrix Multiplication to obtain a parallelism Multithreaded of Matrix‚.n Multiplication= lg n/,whichyoucannotdo21 straightforwardly using parallel for,becauseyouwouldcreateraces.

Adivide-and-conquermultithreaded algorithm for matrix multiplication As we learned in Section 4.2, we can multiply n n matrices serially in time 7 2:81 ! ‚.nlg / O.n / using Strassen’s divide-and-conquer strategy, which motivates D us to look at multithreading such an algorithm. We begin, as we did in Section 4.2, with multithreading a simpler divide-and-conquer algorithm. Recall from page 77 that the SQUARE-MATRIX-MULTIPLY-RECURSIVE proce- dure, which multiplies two n n matrices A and B to produce the n n matrix C , ! ! relies on partitioning each of the three matrices into four n=2 n=2 submatrices: ! A A B B C C A 11 12 ;B 11 12 ;C 11 12 : D A21 A22 D B21 B22 D C21 C22 Â Ã Â Ã Â Ã Then, we can write the matrix product as

C11 C12 A11 A12 B11 B12 C21 C22 D A21 A22 B21 B22 Â Ã Â ÃÂ Ã A11B11 A11B12 A12B21 A12B22 : (27.6) D A21B11 A21B12 C A22B21 A22B22 Â Ã Â Ã Thus, to multiply two n n matrices, we perform eight multiplications of n=2 n=2 ! ! matrices and one addition of n n matrices. The following pseudocode implements ! 794 Chapter 27 Multithreaded Algorithms

this divide-and-conquer strategy using nested parallelism. Unlike the SQUARE- MATRIX-MULTIPLY-RECURSIVE procedure on which it is based, P-MATRIX- TheMULTIPLY Simple-RECURSIVE Divide&Conquertakes the output matrix as aApproach parameter to avoid in allocating Parallel matrices unnecessarily.

P-MATRIX-MULTIPLY-RECURSIVE.C; A; B/ 1 n A:rows D 2 if n == 1 3 c11 a11b11 D 4 else let T be a new n n matrix ! 5partitionA, B, C ,andT into n=2 n=2 submatrices ! A11;A12;A21;A22; B11;B12;B21;B22; C11;C12;C21;C22; and T11;T12;T21;T22;respectively 6 spawn P-MATRIX-MULTIPLY-RECURSIVE.C11;A11;B11/ 7 spawn P-MATRIX-MULTIPLY-RECURSIVE.C12;A11;B12/ 8 spawn P-MATRIX-MULTIPLY-RECURSIVE.C21;A21;B11/ 9 spawn P-MATRIX-MULTIPLY-RECURSIVE.C22;A21;B12/ 10 spawn P-MATRIX-MULTIPLY-RECURSIVE.T11;A12;B21/ 11 spawn P-MATRIX-MULTIPLY-RECURSIVE.T12;A12;B22/ 12 spawn P-MATRIX-MULTIPLY-RECURSIVE.T21;A22;B21/ 13 P-MATRIX-MULTIPLY-RECURSIVE.T22;A22;B22/ 14 sync 15 parallel for i 1 to n D 16 parallel for j 1 to n D 17 cij cij tij D C The same as before. Line 3 handles the base case, where we are multiplying 1 1 matrices. We handle ! the recursive case in lines 4–17. We allocate a temporary matrix T in line 4,3 and 2 P-MATRIXline 5 partitions-MULTIPLY each of-R theECURSIVE matrices A, B,hasC ,and workT intoTn=21(n)n=2 =submatrices. Θ(n ) and span T∞(n) = Θ(log n). ! (As with SQUARE-MATRIX-MULTIPLY-RECURSIVE on page 77, we gloss over the minor issue of how to use index calculations to represent submatrix sections C of a matrix.) The recursive call in line 6 sets the submatrixT∞11 (ton the) = submatrixT∞(n/2) + Θ(log n) product A11B11,sothatC11 equals the first of the two terms that form its sum in equation (27.6). Similarly, lines 7–9 set C12, C21,andC22 to the first of the two terms that equal theirII. sums Matrix in Multiplication equation (27.6). Line Multithreaded 10 sets the Matrix submatrix MultiplicationT11 to 22 the submatrix product A12B21,sothatT11 equals the second of the two terms that form C11’s sum. Lines 11–13 set T12, T21,andT22 to the second of the two terms that form the sums of C12, C21,andC22,respectively.Thefirstsevenrecursive calls are spawned, and the last one runs in the main strand. The sync statement in line 14 ensures that all the submatrix products in lines 6–13 have been computed, Strassen’s Algorithm in Parallel

Strassen’s Algorithm (parallelised) 1. Partition each of the matrices into four n/2 × n/2 submatrices This step takes Θ(1) work and span by index calculations.

2. Create 10 matrices S1, S2,..., S10. Each is n/2 × n/2 and is the sum or difference of two matrices created in the previous step.

Can create all 10 matrices with Θ(n2) work and Θ(log n) span using doubly nested parallel for loops.

3. Recursively compute 7 matrix products P1, P2,..., P7, each n/2 × n/2 Recursively spawn the computation of the seven products.

4. Compute n/2 × n/2 submatrices of C by adding and subtracting various combinations of the Pi . log 7 Using doubly nested parallel for T1(n) = Θ(n ) 2 2 this takes Θ(n ) work and Θ(log n) span. T∞(n) = Θ(log n)

II. Matrix Multiplication Multithreaded Matrix Multiplication 23 Matrix Multiplication and Matrix Inversion

Speedups for Matrix Inversion by an equivalence with Matrix Multiplication.

Theorem 28.1 (Multiplication is no harder than Inversion) If we can invert an n × n matrix in time I(n), where I(n) = Ω(n2) and I(n) satisfies the regularity condition I(3n) = O(I(n)), then we can multiply two n × n matrices in time O(I(n)).

Proof: Define a 3n × 3n matrix D by:     In A 0 In −AAB −1 D = 0 In B ⇒ D = 0 In −B . 0 0 In 0 0 In

Matrix D can be constructed in Θ(n2) = O(I(n)) time, and we can invert D in O(I(3n)) = O(I(n)) time. ⇒ We can compute AB in O(I(n)) time.

II. Matrix Multiplication Multithreaded Matrix Multiplication 24 The Other Direction

Theorem 28.1 (Multiplication is no harder than Inversion) If we can invert an n × n matrix in time I(n), where I(n) = Ω(n2) and I(n) satisfies the regularity condition I(3n) = O(I(n)), then we can multiply two n × n matrices in time O(I(n)).

Allows us to use Strassen’s Algorithm to invert a matrix!

Theorem 28.2 (Inversion is no harder than Multiplication) Suppose we can multiply two n × n real matrices in time M(n) and M(n) satisfies the two regularity conditions M(n + k) = O(M(n)) for any 0 ≤ k ≤ n and M(n/2) ≤ c · M(n) for some constant c < 1/2. Then we can compute the inverse of any real nonsingular n×n matrix in time O(M(n)).

Proof of this directon much harder (CLRS) – relies on properties of SPD matrices.

II. Matrix Multiplication Multithreaded Matrix Multiplication 25 III. Linear Programming Thomas Sauerwald

Easter 2015 Outline

Introduction

Standard and Slack Forms

Formulating Problems as Linear Programs

Simplex Algorithm

Finding an Initial Solution

III. Linear Programming Introduction 2 Introduction

Linear Programming (informal definition) maximize or minimize an objective, given limited resources and competing constraint constraints are specified as (in)equalities

Example: Political Advertising Imagine you are a politician trying to win an election Your district has three different types of areas: Urban, suburban and rural, each with, respectively, 100,000, 200,000 and 50,000 registered voters Aim: at least half of the registered voters in each of the three regions should vote for you Possible Actions: Advertise on one of the primary issues which are (i) building more roads, (ii) gun control, (iii) farm subsidies and (iv) a gasoline tax dedicated to improve public transit.

III. Linear Programming Introduction 3 Political Advertising Continued

policy urban suburban rural build roads −2 5 3 gun control 8 2 −5 farm subsidies 0 0 10 gasoline tax 10 0 −2 The effects of policies on voters. Each entry describes the number of thousands of voters who could be won over by spending $1,000 on advertising support of a policy on a particular issue.

Possible Solution: $20,000 on advertising to building roads $0 on advertising to gun control $4,000 on advertising to farm subsidies $9,000 on advertising to a gasoline tax Total cost: $33,000

What is the best possible strategy?

III. Linear Programming Introduction 4 Towards a Linear Program

policy urban suburban rural build roads −2 5 3 gun control 8 2 −5 farm subsidies 0 0 10 gasoline tax 10 0 −2 The effects of policies on voters. Each entry describes the number of thousands of voters who could be won (lost) over by spending $1,000 on advertising support of a policy on a particular issue.

x1 = number of thousands of dollars spent on advertising on building roads

x2 = number of thousands of dollars spent on advertising on gun control

x3 = number of thousands of dollars spent on advertising on farm subsidies

x4 = number of thousands of dollars spent on advertising on gasoline tax Constraints:

−2x1 + 8x2 + 0x3 + 10x4 ≥ 50 Objective: Minimize x1 + x2 + x3 + x4 5x1 + 2x2 + 0x3 + 0x4 ≥ 100

3x1 − 5x2 + 10x3 − 2x4 ≥ 25

III. Linear Programming Introduction 5 The Linear Program Linear Program for the Advertising Problem

minimize x1 + x2 + x3 + x4 subject to −2x1 + 8x2 + 0x3 + 10x4 ≥ 50 5x1 + 2x2 + 0x3 + 0x4 ≥ 100 3x1 − 5x2 + 10x3 − 2x4 ≥ 25 x1, x2, x3, x4 ≥ 0 The solution of this linear program yields the optimal advertising strategy.

Formal Definition of Linear Program

Given a1, a2,..., an and a set of variables x1, x2,..., xn, a linear function f is defined by

f (x1, x2,..., xn) = a1x1 + a2x2 + ··· + anxn.

Linear Equality: f (x1, x2,..., xn) = b ≥ Linear Constraints Linear Inequality: f (x1, x2,..., xn) ≤ b Linear-Progamming Problem: either minimize or maximize a linear function subject to a set of linear constraints

III. Linear Programming Introduction 6 x 1 + x x 1 + 2 = x 8 x 2 = 1 + x 7 1 x + 2 = x x 1 6 + 2 = x x 2 5 1 + = x x 4 2 1 + = x x 3 1 + 2 = x 1 x 2 Graphical Procedure: Move the line + 2 = x 1 x1 + x2 = z as far up as possible. 2 = 0

While the same approach also works for higher-dimensions, we need to take a more systematic and algebraic procedure.

A Small(er) Example

x2 2

≥ − 8 2 x 2 ≤

2 − x

1 x − maximize x1 + x2 5 1 x

subject to 4 4x1 − x2 ≤ 8 2x1 + x2 ≤ 10 5x − 2x ≥ −2 x1 ≥ 0 1 2 2 x x1, x2 ≥ 0 1 +

x 2 ≤ Any setting of x1 and x2 satisfying all constraints is a feasible solution 10 x1 x2 ≥ 0

III. Linear Programming Introduction 7 x 1 + x x 2 = 1 + x 7 1 x + 2 = x x 1 6 + 2 = x x 2 5 1 + = x x 4 2 1 + = x x 3 1 + 2 = x 1 x 2 2 Any setting of x1 and x2 satisfying + = x 1 all constraints is a feasible solution 2 = 0

A Small(er) Example

x2 2

x 1 ≥ − + 8 2 x x ≤ 2 = 2 2 8 − x 1 x − maximize x1 + x2 5 1 x

subject to 4 4x1 − x2 ≤ 8 2x1 + x2 ≤ 10 5x − 2x ≥ −2 x1 ≥ 0 1 2 2 x x1, x2 ≥ 0 1 +

x 2 Graphical Procedure: Move the line ≤ 10 x1 + x2 = z as far up as possible. x1 x2 ≥ 0

While the same approach also works for higher-dimensions, we need to take a more systematic and algebraic procedure.

III. Linear Programming Introduction 7 Outline

Introduction

Standard and Slack Forms

Formulating Problems as Linear Programs

Simplex Algorithm

Finding an Initial Solution

III. Linear Programming Standard and Slack Forms 8 Standard and Slack Forms

Standard Form

n X maximize cj xj Objective Function j=1 subject to n X aij xj ≤ bi for i = 1, 2,..., m n + m Constraints j=1

xj ≥ 0 for j = 1, 2,..., n

Non-Negativity Constraints

Standard Form (Matrix-Vector-Notation)

maximize cT x Inner product of two vectors subject to Ax ≤ b Matrix-vector product x ≥ 0

III. Linear Programming Standard and Slack Forms 9 Converting Linear Programs into Standard Form

Reasons for a LP not being in standard form: 1. The objective might be a minimization rather than maximization. 2. There might be variables without nonnegativity constraints. 3. There might be equality constraints 4. There might be inequality constraints (with ≤ instead of ≥)

Goal: Convert linear program into an equivalent program which is in standard form

Equivalence: a correspondence (not necessarily a bijection) between solutions so that their objective values are identical.

When switching from maximization to minimization, sign of objective value changes.

III. Linear Programming Standard and Slack Forms 10 Converting into Standard Form (1/5)

Reasons for a LP not being in standard form: 1. The objective might be a minimization rather than maximization.

minimize −2x1 + 3x2 subject to x1 + x2 = 7 x1 − 2x2 ≤ 4 x1 ≥ 0

Negate objective function

maximize 2x1 − 3x2 subject to x1 + x2 = 7 x1 − 2x2 ≤ 4 x1 ≥ 0

III. Linear Programming Standard and Slack Forms 11 Converting into Standard Form (2/5)

Reasons for a LP not being in standard form: 2. There might be variables without nonnegativity constraints.

maximize 2x1 − 3x2 subject to x1 + x2 = 7 x1 − 2x2 ≤ 4 x1 ≥ 0

Replace x2 by two non-negative 0 00 variables x2 and x2

0 00 maximize 2x1 − 3x2 + 3x2 subject to 0 00 x1 + x2 − x2 = 7 0 00 x1 − 2x2 + 2x2 ≤ 4 0 00 x1, x2, x2 ≥ 0

III. Linear Programming Standard and Slack Forms 12 Converting into Standard Form (3/5)

Reasons for a LP not being in standard form: 3. There might be equality constraints

0 00 maximize 2x1 − 3x2 + 3x2 subject to 0 00 x1 + x2 − x2 = 7 0 00 x1 − 2x2 + 2x2 ≤ 4 0 00 x1, x2, x2 ≥ 0

Replace each equality by two inequalities.

0 00 maximize 2x1 − 3x2 + 3x2 subject to 0 00 x1 + x2 − x2 ≥ 7 0 00 x1 + x2 − x2 ≤ 7 0 00 x1 − 2x2 + 2x2 ≤ 4 0 00 x1, x2, x2 ≥ 0

III. Linear Programming Standard and Slack Forms 13 Converting into Standard Form (4/5)

Reasons for a LP not being in standard form: 4. There might be inequality constraints (with ≤ instead of ≥)

0 00 maximize 2x1 − 3x2 + 3x2 subject to 0 00 x1 + x2 − x2 ≥ 7 0 00 x1 + x2 − x2 ≤ 7 0 00 x1 − 2x2 + 2x2 ≤ 4 0 00 x1, x2, x2 ≥ 0

Negate respective inequalities.

0 00 maximize 2x1 − 3x2 + 3x2 subject to 0 00 x1 + x2 − x2 ≥ 7 0 00 −x1 − x2 + x2 ≥ −7 0 00 x1 − 2x2 + 2x2 ≤ 4 0 00 x1, x2, x2 ≥ 0

III. Linear Programming Standard and Slack Forms 14 Converting into Standard Form (5/5)

Rename variable names (for consistency).

maximize 2x1 − 3x2 + 3x3 subject to x1 + x2 − x3 ≥ 7 −x1 − x2 + x3 ≥ −7 x1 − 2x2 + 2x3 ≤ 4 x1, x2, x3 ≥ 0

It is always possible to convert a linear program into standard form.

III. Linear Programming Standard and Slack Forms 15 Converting Standard Form into Slack Form (1/3)

Goal: Convert standard form into slack form, where all constraints except for the non-negativity constraints are equalities.

For the simplex algorithm, it is more con- venient to work with equality constraints.

Introducing Slack Variables Pn Let j=1 aij xj ≤ bi be an inequality constraint Introduce a slack variable s by

n X s = b − a x s measures the slack between i ij j j=1 the two sides of the inequality. s ≥ 0.

Denote slack variable of the ith inequality by xn+i

III. Linear Programming Standard and Slack Forms 16 Converting Standard Form into Slack Form (2/3)

maximize 2x1 − 3x2 + 3x3 subject to x1 + x2 − x3 ≥ 7 −x1 − x2 + x3 ≥ −7 x1 − 2x2 + 2x3 ≤ 4 x1, x2, x3 ≥ 0

Introduce slack variables

maximize 2x1 − 3x2 + 3x3 subject to x4 = 7 − x1 − x2 + x3 x5 = −7 + x1 + x2 − x3 x6 = 4 − x1 + 2x2 − 2x3 x1, x2, x3, x4, x5, x6 ≥ 0

III. Linear Programming Standard and Slack Forms 17 Converting Standard Form into Slack Form (3/3)

maximize 2x1 − 3x2 + 3x3 subject to x4 = 7 − x1 − x2 + x3 x5 = −7 + x1 + x2 − x3 x6 = 4 − x1 + 2x2 − 2x3 x1, x2, x3, x4, x5, x6 ≥ 0 Use variable z to denote objective function and omit the nonnegativity constraints.

z = 2x1 − 3x2 + 3x3 x4 = 7 − x1 − x2 + x3 x5 = −7 + x1 + x2 − x3 x6 = 4 − x1 + 2x2 − 2x3

This is called slack form.

III. Linear Programming Standard and Slack Forms 18 Basic and Non-Basic Variables

z = 2x1 − 3x2 + 3x3 x4 = 7 − x1 − x2 + x3 x5 = −7 + x1 + x2 − x3 x6 = 4 − x1 + 2x2 − 2x3

Basic Variables: B = {4, 5, 6} Non-Basic Variables: N = {1, 2, 3}

Slack Form (Formal Definition) Slack form is given by a tuple (N, B, A, b, c, v) so that X z = v + cj xj j∈N X xi = bi − aij xj for i ∈ B, j∈N

and all variables are non-negative. Variables on the right hand side are indexed by the entries of N.

III. Linear Programming Standard and Slack Forms 19 Slack Form (Example)

x 2x z = 28 − 3 − x5 − 6 6 6 3 x x x = 8 + 3 + x5 − 6 1 6 6 3 8x x x = 4 − 3 − 2x5 + 6 2 3 3 3 x x = 18 − 3 + x5 4 2 2 Slack Form Notation B = {1, 2, 4}, N = {3, 5, 6}

    a13 a15 a16 −1/6 −1/6 1/3 A = a23 a25 a26 =  8/3 2/3 −1/3 a43 a45 a46 1/2 −1/2 0

  b1  T T b = b2 = 8 4 18 , c = c3 c5 c6 = −1/6 −1/6 −2/3 b3

v = 28

III. Linear Programming Standard and Slack Forms 20 The Structure of Optimal Solutions

Definition A point x is a vertex if it cannot be represented as a strict convex combi- nation of two other points in the feasible set. The set of feasible solutions is a convex set. Theorem If there exists an optimal solution, it occurs at a vertex of the polygon.

Proof: Let x be an optimal solution which is not a vertex x2 ⇒ ∃ vector d so that x − d and x + d are feasible Since A(x + d) = b and Ax = b ⇒ Ad = 0 W.l.o.g. assume cT d ≤ 0 (otherwise replace d by −d) Consider x + λd as a function of λ ≥ 0

Case 1: There exists j with dj < 0 x + d Increase λ from 0 to λ0 until a new entry of x + λd x becomes zero x − d x + λ0d feasible, since A(x + λ0d) = Ax = b and x + λ0d ≥ 0 cT (x + λ0d) = cT x + cT λ0d ≤ cT x x1

III. Linear Programming Standard and Slack Forms 21 The Structure of Optimal Solutions

Definition A point x is a vertex if it cannot be represented as a strict convex combi- nation of two other points in the feasible set. The set of feasible solutions is a convex set. Theorem If there exists an optimal solution, it occurs at a vertex of the polygon.

Proof: Let x be an optimal solution which is not a vertex x2 ⇒ ∃ vector d so that x − d and x + d are feasible Since A(x + d) = b and Ax = b ⇒ Ad = 0 W.l.o.g. assume cT d ≤ 0 (otherwise replace d by −d) Consider x + λd as a function of λ ≥ 0

Case 2: For all j, dj ≥ 0 x + d x + λd is feasible for all λ ≥ 0: A(x + λd) = b and x x + λd ≥ x ≥ 0 x − d If λ → ∞, then cT (x + λd) → ∞ ⇒ This contradicts the assumption that there exists an optimal solution. x1

III. Linear Programming Standard and Slack Forms 21 Outline

Introduction

Standard and Slack Forms

Formulating Problems as Linear Programs

Simplex Algorithm

Finding an Initial Solution

III. Linear Programming Formulating Problems as Linear Programs 22 Shortest Paths

a 5 d Single-Pair Shortest Path Problem 6 4 1 Given: directed graph G = (V , E) with + s b 4 e t edge weights w : E → R , pair of 2 3 vertices s, t ∈ V −2 Goal: Find a path of minimum weight 2 from s to t in G 2 5 1 c f p = (v0 = s, v1,..., vk = t) such that Pk w(p) = i=1 w(vk−1, vk ) is minimized.

Shortest Paths as LP Recall: When BELLMAN-FORD terminates,

maximize dt all these inequalities are satisfied. subject to dv ≤ du + w(u, v) for each edge (u, v) ∈ E, d = 0. this is a maxi- s mization problem! n o Solution d satisfies d v = minu :(u,v)∈E d u + w(u, v)

III. Linear Programming Formulating Problems as Linear Programs 23 0/4

0/8 0/10 0/10 0/2 0/6 0/10 0/9 0/10

Maximum Flow

Maximum Flow Problem + Given: directed graph G = (V , E) with edge capacities c : E → R , pair of vertices s, t ∈ V Goal: Find a maximum flow f : V × V → R from s to t which satisfies the capacity constraints and flow conservation

4/4 2 4 |f | = 19 6/8 9/10 10/10 0/2 5/6 9/10 9/9 10/10 s 3 5 t

Maximum Flow as LP P P maximize v∈V fsv − v∈V fvs subject to fuv ≤ c(u, v) for each u, v ∈ V , P P v∈V fvu = v∈V fuv for each u ∈ V \{s, t}, fuv ≥ 0 for each u, v ∈ V .

III. Linear Programming Formulating Problems as Linear Programs 24 Minimum-Cost Flow

Generalization of the Maximum Flow Problem Minimum-Cost-Flow Problem + Given: directed graph G = (V , E) with capacities c : E → R , pair of + vertices s, t ∈ V , cost function a : E → R , flow demand of d units Goal: Find a flow f : V × V → R from s to t with |f | = d while P minimising the total cost (u,v)∈E a(u, v)fuv incurrred by the flow. Optimal Solution with total cost: P 862 Chapter 29 Linear Programming(u,v)∈E a(u, v)fuv = (2·2)+(5·2)+(3·1)+(7·1)+(1·3) = 27

c x = 2 x 1/2 c = 5 a = 7 2/5 a = 7 a = 2 a = 2 c = 1 1/1 s t s t a = 3 a = 3 c = 2 2/2 a = 5 y c = 4 a = 5 y 3/4 a = 1 a = 1 (a) (b)

Figure 29.3 (a) An example of a minimum-cost-flow problem. We denote the capacities by c and the costs by a.Vertexs is the source and vertex t is the sink, and we wish to send 4 units of flow from s to t. (b) Asolutiontotheminimum-costflowprobleminwhich4 units of flow are sent from s to t.Foreachedge,theflowandcapacityarewrittenasflow/capacity.

III. Linear Programming Formulating Problems as Linear Programs 25 straint that the value of the flow be exactly d units, and with the new objective function of minimizing the cost:

minimize a.u; !/fu! (29.51) .u;!/ E subject to X2 fu! c.u;!/ for each u; ! V; Ä 2 f!u fu! 0 for each u V s; t ; " D 2 " f g ! V ! V X2 X2 f f d; s! " !s D ! V ! V X2 X2 fu! 0 for each u; ! V: (29.52) # 2

Multicommodity flow As a final example, we consider another flow problem. Suppose that the Lucky Puck company from Section 26.1 decides to diversify its product line and ship not only hockey pucks, but also hockey sticks and hockey helmets. Each piece of equipment is manufactured in its own factory, has its own warehouse, and must be shipped, each day, from factory to warehouse. The sticks are manufactured in Vancouver and must be shipped to Saskatoon, and the helmets are manufactured in Edmonton and must be shipped to Regina. The capacity of the shipping network does not change, however, and the different items, or commodities,mustsharethe same network. This example is an instance of a multicommodity-flow problem.Inthisproblem, we are again given a directed graph G .V; E/ in which each edge .u; !/ E D 2 has a nonnegative capacity c.u;!/ 0.Asinthemaximum-flowproblem,weim- # plicitly assume that c.u;!/ 0 for .u; !/ E,andthatthegraphhasnoantipar- D 62 Minimum-Cost Flow as a LP

Minimum Cost Flow as LP P minimize (u,v)∈E a(u, v)fuv subject to

fuv ≤ c(u, v) for each u, v ∈ V , P P v∈V fvu − v∈V fuv = 0 for each u ∈ V \{s, t}, P P v∈V fsv − v∈V fvs = d , fuv ≥ 0 for each u, v ∈ V .

Real power of Linear Programming comes from the ability to solve new problems!

III. Linear Programming Formulating Problems as Linear Programs 26 Outline

Introduction

Standard and Slack Forms

Formulating Problems as Linear Programs

Simplex Algorithm

Finding an Initial Solution

III. Linear Programming Simplex Algorithm 27 Simplex Algorithm: Introduction

Simplex Algorithm classical method for solving linear programs (Dantzig, 1947) usually fast in practice although worst-case runtime not polynomial iterative procedure somewhat similar to Gaussian elimination

Basic Idea: Each iteration corresponds to a “basic solution” of the slack form All non-basic variables are 0, and the basic variables are determined from the equality constraints Each iteration converts one slack form into an equivalent one while the objective value will not decrease In that sense, it is a greedy algorithm. Conversion (“pivoting”) is achieved by switching the roles of one basic and one non-basic variable

III. Linear Programming Simplex Algorithm 28 Extended Example: Conversion into Slack Form

maximize 3x1 + x2 + 2x3 subject to x1 + x2 + 3x3 ≤ 30 2x1 + 2x2 + 5x3 ≤ 24 4x1 + x2 + 2x3 ≤ 36 x1, x2, x3 ≥ 0

Conversion into slack form

z = 3x1 + x2 + 2x3 x4 = 30 − x1 − x2 − 3x3 x5 = 24 − 2x1 − 2x2 − 5x3 x6 = 36 − 4x1 − x2 − 2x3

III. Linear Programming Simplex Algorithm 29 All coefficientsIncreasingIncreasingare theIncreasing negative,value the of value thex2 andwould value of hencex1 of increasewouldx this3 would increasebasic the increase objective solution the objective the is value.optimal objective value.! value.

BasicTheThe second solution: third constraint constraint(x1, x2,..., is is the thex6 tightest) tightest = (339, 0 and, and03, 21 limits69 limits, 6, 0 how) howwith much much objective we we can can value increase increase 27111 x13x.2. BasicBasic solution: solution:(x1,(xx21,...,, x2,...,x6)x =6) ( =4 , (08,, 42 , 04,,180,,00), 0with) with objective objective value value4 28= 27.75

Switch roles of x132 and x635:

Solving for x132 yields:

3 83xx22 2xx3x55 xx6x66 xx2x31===49−−− −− −−+ ... 2 348 234 483

Substitute this into x321 in the other three equations

Extended Example: Iteration 1

z = 3x1 + x2 + 2x3

x4 = 30 − x1 − x2 − 3x3

x5 = 24 − 2x1 − 2x2 − 5x3

x6 = 36 − 4x1 − x2 − 2x3

Basic solution: (x1, x2,..., x6) = (0, 0, 0, 30, 24, 36)

This basic solution is feasible Objective value is 0.

III. Linear Programming Simplex Algorithm 30 All coefficientsIncreasingare theIncreasing negative,value of thex2 andwould value hence of increasex this3 would basic the increase objective solution the is value.optimal objective! value.

BasicTheBasicThe second solution: third solution: constraint constraint(x1(,xx12,,...,x is2 is,..., the thex6 tightest)x tightest6 =) (=339, (0 and,, and03,,210 limits69, limits30, 6,,024 how) how,with36 much) much objective we we can can value increase increase 27111 x3x.2. BasicBasic solution: solution:(x1,(xx21,...,, x2,...,x6)x =6) ( =4 , (08,, 42 , 04,,180,,00), 0with) with objective objective value value4 28= 27.75

This basicSwitch solution roles is feasible of x32 and x35Objective: value is 0.

Solving for x32 yields:

3 83xx22 2xx55 xx66 xx23 == 4 −− −− −+ .. 2 38 34 83

Substitute this into x32 in the other three equations

Extended Example: Iteration 1

Increasing the value of x1 would increase the objective value.

z = 3x1 + x2 + 2x3

x4 = 30 − x1 − x2 − 3x3

x5 = 24 − 2x1 − 2x2 − 5x3

x6 = 36 − 4x1 − x2 − 2x3

The third constraint is the tightest and limits how much we can increase x1.

Switch roles of x1 and x6:

Solving for x1 yields:

x2 x3 x6 x1 = 9 − − − . 4 2 4

Substitute this into x1 in the other three equations

III. Linear Programming Simplex Algorithm 30 All coefficientsIncreasingIncreasingare the negative,value the of valuex2 andwould of hencex1 increasewould this increasebasic the objective solution the objective is value.optimal value.!

TheBasicThe second third solution: constraint constraint(x1, x is2 is,..., the the tightestx tightest6) =33 (0 and, and03, 0 limits69, limits30, 24 how how, 36 much) much we we can can increase increase111 x13x.2. BasicBasic solution: solution:(x1,(xx21,...,, x2,...,x6)x =6) ( =4 , (08,, 42 , 04,,180,,00), 0with) with objective objective value value4 28= 27.75 Objective value is 0. This basicSwitch solution roles is feasible of x132 and x635:

Solving for x132 yields:

3 83xx22 2xx3x55 xx6x66 xx2x31===49−−− −− −−+ ... 2 348 234 483

Substitute this into x321 in the other three equations

Extended Example: Iteration 2

Increasing the value of x3 would increase the objective value.

x x 3x z = 27 + 2 + 3 − 6 4 2 4 x x x x = 9 − 2 − 3 − 6 1 4 2 4 3x 5x x x = 21 − 2 − 3 + 6 4 4 2 4 3x x x = 6 − 2 − 4x + 6 5 2 3 2

Basic solution: (x1, x2,..., x6) = (9, 0, 0, 21, 6, 0) with objective value 27

III. Linear Programming Simplex Algorithm 30 All coefficientsIncreasingIncreasingare theIncreasing negative,value the of value thex2 andwould value of hencex1 of increasewouldx this3 would increasebasic the increase objective solution the objective the is value.optimal objective value.! value.

BasicTheBasicThe second solution: third solution: constraint constraint(x1(,xx12,,...,x is2 is,..., the thex6 tightest)x tightest6 =) (=339, (0 and,, and03,,210 limits69, limits30, 6,,024 how) how,with36 much) much objective we we can can value increase increase 27111 x1x.2. BasicBasic solution: solution:(x1,(xx21,...,, x2,...,x6)x =6) ( =4 , (08,, 42 , 04,,180,,00), 0with) with objective objective value value4 28= 27.75 Objective value is 0. This basicSwitch solution roles is feasible of x12 and x63:

Solving for x12 yields:

8x2 2x3x5 x6x6 x2x1==49−− − −+ . . 34 23 43

Substitute this into x21 in the other three equations

Extended Example: Iteration 2

x x 3x z = 27 + 2 + 3 − 6 4 2 4 x x x x = 9 − 2 − 3 − 6 1 4 2 4 3x 5x x x = 21 − 2 − 3 + 6 4 4 2 4 3x x x = 6 − 2 − 4x + 6 5 2 3 2

The third constraint is the tightest and limits how much we can increase x3.

Switch roles of x3 and x5:

Solving for x3 yields:

3 3x2 x5 x6 x3 = − − − . 2 8 4 8

Substitute this into x3 in the other three equations

III. Linear Programming Simplex Algorithm 30 All coefficientsIncreasing areIncreasing negative, the value the and value of hencex1 ofwouldx this3 would increasebasic increase solution the objective the is optimal objective value.! value.

BasicTheBasicThe second solution: third solution: constraint constraint(x1(,xx12,,...,x is2 is,..., the thex6 tightest)x tightest6 =) (=9, (0 and,, and0,,210 limits, limits30, 6,,024 how) how,with36 much) much objective we we can can value increase increase 27 x13x.2. Basic solution: (x1, x2,..., x6) = (8, 4, 0, 18, 0, 0) with objective value 28

Objective value is 0. This basicSwitch solution roles is feasible of x132 and x635:

Solving for x132 yields:

3 83xx22 2xx3x55 xx6x66 xx2x31===49−−− −− −−+ ... 2 348 234 483

Substitute this into x321 in the other three equations

Extended Example: Iteration 3

Increasing the value of x2 would increase the objective value.

x 11x z = 111 + 2 − x5 − 6 4 16 8 16 x 5x x = 33 − 2 + x5 − 6 1 4 16 8 16 3x x x = 3 − 2 − x5 + 6 3 2 8 4 8 3x x x = 69 + 2 + 5x5 − 6 4 4 16 8 16

33 3 69 111 Basic solution: (x1, x2,..., x6) = ( 4 , 0, 2 , 4 , 0, 0) with objective value 4 = 27.75

III. Linear Programming Simplex Algorithm 30 All coefficientsIncreasingIncreasingare theIncreasing negative,value the of value thex2 andwould value of hencex1 of increasewouldx this3 would increasebasic the increase objective solution the objective the is value.optimal objective value.! value.

BasicBasicThe solution: third solution: constraint(x1(,xx12,,...,x is2,..., thex6 tightest)x6 =) (=339, (0 and,,03,,210 limits69, 30, 6,,024 how) ,with36 much) objective we can value increase 27111 x13. BasicBasic solution: solution:(x1,(xx21,...,, x2,...,x6)x =6) ( =4 , (08,, 42 , 04,,180,,00), 0with) with objective objective value value4 28= 27.75 Objective value is 0. This basicSwitch solution roles is feasible of x13 and x65:

Solving for x13 yields:

3 3xx22 xx35 xx66 xx31== 9−− −− −− .. 2 48 24 48

Substitute this into x31 in the other three equations

Extended Example: Iteration 3

x 11x z = 111 + 2 − x5 − 6 4 16 8 16 x 5x x = 33 − 2 + x5 − 6 1 4 16 8 16 3x x x = 3 − 2 − x5 + 6 3 2 8 4 8 3x x x = 69 + 2 + 5x5 − 6 4 4 16 8 16

The second constraint is the tightest and limits how much we can increase x2.

Switch roles of x2 and x3:

Solving for x2 yields:

8x2 2x5 x6 x2 = 4 − − + . 3 3 3

Substitute this into x2 in the other three equations

III. Linear Programming Simplex Algorithm 30 IncreasingIncreasing theIncreasing value the of value thex2 would value of x1 of increasewouldx3 would increase the increase objective the objective the value. objective value. value.

BasicTheBasicThe second solution: third solution: constraint constraint(x1(,xx12,,...,x is2 is,..., the thex6 tightest)x tightest6 =) (=339, (0 and,, and03,,210 limits69, limits30, 6,,024 how) how,with36 much) much objective we we can can value increase increase 27111 x13x.2. Basic solution: (x1, x2,..., x6) = ( 4 , 0, 2 , 4 , 0, 0) with objective value 4 = 27.75 Objective value is 0. This basicSwitch solution roles is feasible of x132 and x635:

Solving for x132 yields:

3 83xx22 2xx3x55 xx6x66 xx2x31===49−−− −− −−+ ... 2 348 234 483

Substitute this into x321 in the other three equations

Extended Example: Iteration 4

All coefficients are negative, and hence this basic solution is optimal!

x 2x z = 28 − 3 − x5 − 6 6 6 3 x x x = 8 + 3 + x5 − 6 1 6 6 3 8x x x = 4 − 3 − 2x5 + 6 2 3 3 3 x x = 18 − 3 + x5 4 2 2

Basic solution: (x1, x2,..., x6) = (8, 4, 0, 18, 0, 0) with objective value 28

III. Linear Programming Simplex Algorithm 30 Extended Example: Visualization of SIMPLEX

x3

x2 (0, 12, 0) 12 (0, 0, 4.8) 9.6

(0, 0, 0) (8, 4, 0) 0 (8.25, 0, 1.5) 28 27.75

x1 (9, 0, 0) 27

III. Linear Programming Simplex Algorithm 31 Extended Example: Alternative Runs (1/2)

z = 3x1 + x2 + 2x3

x4 = 30 − x1 − x2 − 3x3

x5 = 24 − 2x1 − 2x2 − 5x3

x6 = 36 − 4x1 − x2 − 2x3

Switch roles of x2 and x5 x x z = 12 + 2x − 3 − 5 1 2 2 5x x x = 12 − x − 3 − 5 2 1 2 2 x x x = 18 − x − 3 + 5 4 2 2 2 x x x = 24 − 3x + 3 + 5 6 1 2 2

Switch roles of x1 and x6

x x 2x z = 28 − 3 − 5 − 6 6 6 3 x x x x = 8 + 3 + 5 − 6 1 6 6 3 8x 2x x x = 4 − 3 − 5 + 6 2 3 3 3 x x x = 18 − x − 3 + 5 4 2 2 2

III. Linear Programming Simplex Algorithm 32 Extended Example: Alternative Runs (2/2)

z = 3x1 + x2 + 2x3

x4 = 30 − x1 − x2 − 3x3

x5 = 24 − 2x1 − 2x2 − 5x3

x6 = 36 − 4x1 − x2 − 2x3

Switch roles of x3 and x5

11x x 2x z = 48 + 1 + 2 − 5 5 5 5 5 x x 3x x = 78 + 1 + 2 + 5 4 5 5 5 5 2x 2x x x = 24 − 1 − 2 − 5 3 5 5 5 5 16x x 2x x = 132 − 1 − 2 + 3 6 5 5 5 5

Switch roles of x1 and x6 Switch roles of x2 and x3

111 x2 x5 11x6 x x 2x z = + − − z = 28 − 3 − 5 − 6 4 16 8 16 6 6 3 x x 5x x x x x = 33 − 2 + 5 − 6 x = 8 + 3 + 5 − 6 1 4 16 8 16 1 6 6 3 3x x x 8x 2x x x = 3 − 2 − 5 + 6 x = 4 − 3 − 5 + 6 3 2 8 4 8 2 3 3 3 69 3x2 5x5 x6 x3 x5 x = + + − x4 = 18 − + 4 4 16 8 16 2 2

III. Linear Programming Simplex Algorithm 33 29.3 The simplex algorithm 869

necessarily integral. Furthermore, the final solution to a linear program need not be integral; it is purely coincidental that this example has an integral solution.

Pivoting

We now formalize the procedure for pivoting. The procedure P IVOT takes as in- put a slack form, given by the tuple .N; B; A; b; c; !/,theindexl of the leav- ing variable xl ,andtheindexe of the entering variable xe.Itreturnsthetuple .N;B;A; b;c;!/ describing the new slack form. (Recall again that the entries of y y y y y y the m n matrices A and A are actually the negatives of the coefficients that appear The Pivot! Step Formallyy in the slack form.)

PIVOT.N; B; A; b; c; !;l;e/ 1 // Compute the coefficients of the equation for new basic variable xe. 2letA be a new m n matrix y ! 3 be bl =ale y D 4 for each j N e Need that ale 6= 0! Rewrite “tight” equation 2 " f g 5 aej alj =ale for enterring variable xe. y D 6 ael 1=ale y D 7 // Compute the coefficients of the remaining constraints. 8 for each i B l 2 " f g 9 bi bi aiebe y D " y Substituting xe into 10 for each j N e 2 " f g other equations. 11 aij aij aieaej y D " y 12 ail aieael y D " y 13 // Compute the objective function. 14 ! ! cebe y D C y 15 for each j N e Substituting xe into 2 " f g 16 cj cj ceaej objective function. y D " y 17 cl ceael y D " y 18 // Compute new sets of basic and nonbasic variables. 19 N N e l Update non-basic y D " f g [ f g 20 B B l e y D " f g [ f g and basic variables 21 return .N;B;A; b;c;!/ y y y y y y PIVOT works as follows. Lines 3–6 compute the coefficients in the new equation for xe by rewritingIII. the Linear equation Programming that has xl on Simplex the left-hand Algorithm side to instead have xe 34 on the left-hand side. Lines 8–12 update the remaining equations by substituting the right-hand side of this new equation for each occurrence of xe.Lines14–17 do the same substitution for the objective function, and lines 19 and 20 update the Effect of the Pivot Step

Lemma 29.1

Consider a call to PIVOT(N, B, A, b, c, v, l, e) in which ale 6= 0. Let the values returned from the call be (Nb, Bb, Ab, b, bc, vb), and let x denote the basic solution after the call. Then

1. x j = 0 for each j ∈ Nb.

2. x e = bl /ale.

3. x i = bi − aiebe for each i ∈ Bb \{e}.

Proof: 1. holds since the basic solution always sets all non-basic variables to zero. 2. When we set each non-basic variable to 0 in a constraint X xi = bi − baij xj , j∈Nb

we have x i = bi for each i ∈ Bb. Hence x e = be = bl /ale. 3. After the substituting in the other constraints, we have

x i = bi = bi − aiebe.

III. Linear Programming Simplex Algorithm 35 Formalizing the Simplex Algorithm: Questions

Questions: How do we determine whether a linear program is feasible? What do we do if the linear program is feasible, but the initial basic solution is not feasible? How do we determine whether a linear program is unbounded? How do we choose the entering and leaving variables?

Example before was a particularly nice one!

III. Linear Programming Simplex Algorithm 36 Proof is based on the following three-part loop invariant: 1. the slack form is always equivalent to the one returned by INITIALIZE-SIMPLEX,

2. for each i ∈ B, we have bi ≥ 0, 3. the basic solution associated with the (current) slack form is feasible.

Lemma 29.2

Suppose the call to INITIALIZE-SIMPLEX in line 1 returns a slack form for which the basic solution is feasible. Then if SIMPLEX returns a solution, it is a feasible solution. If SIMPLEX returns “unbounded”, the linear program is unbounded.

29.3 The simplex algorithm 871

In Section 29.5, we shall show how to determine whether a problem is feasible, and if so, how to find a slack form in which the initial basic solution is feasible. Therefore, let us assume that we have a procedure INITIALIZE-SIMPLEX.A; b; c/ that takes as input a linear program in standard form, that is, an m n matrix ! A .aij /,anm-vector b .bi /,andann-vector c .cj /.Iftheproblemis D D D infeasible, the procedure returns a message that the program is infeasible and then terminates. Otherwise, the procedure returns a slack form for which the initial basic solution is feasible. The procedure SIMPLEX takes as input a linear program in standard form, as just described. It returns an n-vector x .xj / that is an optimal solution to the linear N D N Theprogram formal described procedure in (29.19)–(29.21). SIMPLEX

SIMPLEX.A; b; c/ Returns a slack form with a 1 .N; B; A; b; c; !/ INITIALIZE-SIMPLEX.A; b; c/ D feasible basic solution (if it exists) 2let be a new vector of length n 3 while some index j N has cj >0 2 Main Loop: 4chooseanindexe N for which ce >0 2 5 for each index i B terminates if all coefficients in 2 6 if aie >0 objective function are negative 7 i bi =aie D Line 4 picks enterring variable 8 else i xe with negative coefficient D1 9chooseanindexl B that minimizes i 2 Lines 6 − 9 pick the tightest 10 if l == 1 constraint, associated with xl 11 return “unbounded” Line 11 returns “unbounded” if 12 else .N; B; A; b; c; !/ PIVOT.N; B; A; b; c; !;l;e/ D there are no constraints 13 for i 1 to n D 14 if i B Line 12 calls PIVOT, switching 2 roles of xl and xe 15 xi bi N D 16 else xi 0 N D 17 return .x1; x2;:::;xn/ N N N Return corresponding solution. The SIMPLEX procedure works as follows. In line 1, it calls the procedure INITIALIZE-SIMPLEX.A; b; c/,describedabove,whicheitherdeterminesthatthe linear program is infeasible or returns a slack form for which the basic solution is feasible. The while loop of lines 3–12 forms the main part of the algorithm. If all coefficients in the objective function are negative, then the while loop terminates. Otherwise, line 4 selects a variable xe,whosecoefficientintheobjectivefunction is positive, as the enteringIII. Linear Programming variable. Although Simplex we may Algorithm choose any such variable as 37 the entering variable, we assume that we use some prespecified deterministic rule. Next, lines 5–9 check each constraint and pick the one that most severely limits the amount by which we can increase xe without violating any of the nonnegativ- Returns a slack form with a feasible basic solution (if it exists)

Main Loop: terminates if all coefficients in objective function are negative Line 4 picks enterring variable xe with negative coefficient Lines 6 − 9 pick the tightest constraint, associated with xl Line 11 returns “unbounded” if there are no constraints

Line 12 calls PIVOT, switching roles of xl and xe

Return corresponding solution.

29.3 The simplex algorithm 871

In Section 29.5, we shall show how to determine whether a problem is feasible, and if so, how to find a slack form in which the initial basic solution is feasible. Therefore, let us assume that we have a procedure INITIALIZE-SIMPLEX.A; b; c/ that takes as input a linear program in standard form, that is, an m n matrix ! A .aij /,anm-vector b .bi /,andann-vector c .cj /.Iftheproblemis D D D infeasible, the procedure returns a message that the program is infeasible and then terminates. Otherwise, the procedure returns a slack form for which the initial basic solution is feasible. The procedure SIMPLEX takes as input a linear program in standard form, as just described. It returns an n-vector x .xj / that is an optimal solution to the linear N D N Theprogram formal described procedure in (29.19)–(29.21). SIMPLEX

SIMPLEX.A; b; c/ 1 .N; B; A; b; c; !/ INITIALIZE-SIMPLEX.A; b; c/ D 2let be a new vector of length n 3 while some index j N has cj >0 2 4chooseanindexe N for which ce >0 2 5 for each index i B 2 6 if aie >0 7 i bi =aie D 8 else i D1 9chooseanindexl B that minimizes i 2 10 if l == 1 11 return “unbounded” 12 else .N; B; A; b; c; !/ PIVOT.N; B; A; b; c; !;l;e/ Proof is based on the followingD three-part loop invariant: 13 for i 1 to n D 1. the14 slackif i formB is always equivalent to the one returned by INITIALIZE-SIMPLEX, 2 15 xi bi 2. for each i ∈NBD, we have bi ≥ 0, 16 else xi 0 N D 3. the17 basicreturn solution.x1; x2;:::; associatedxn/ with the (current) slack form is feasible. N N N The SLemmaIMPLEX 29.2procedure works as follows. In line 1, it calls the procedure .A; b; c/ INITIALIZESuppose-S theIMPLEX call to INITIALIZE,describedabove,whicheitherdeterminesthatthe-SIMPLEX in line 1 returns a slack form for which linearthe basic program solution is infeasible is feasible. or returns Then a slack if S formIMPLEX for whichreturns the abasic solution, solution it is is a feasible feasible.solution. The Ifwhile SIMPLEXloop ofreturns lines 3–12 “unbounded”, forms the main the linearpart of program the algorithm. is unbounded. If all coefficients in the objective function are negative, then the while loop terminates. Otherwise, line 4 selects a variable xe,whosecoefficientintheobjectivefunction is positive, as the enteringIII. Linear Programming variable. Although Simplex we may Algorithm choose any such variable as 37 the entering variable, we assume that we use some prespecified deterministic rule. Next, lines 5–9 check each constraint and pick the one that most severely limits the amount by which we can increase xe without violating any of the nonnegativ- Termination

Degeneracy: One iteration of SIMPLEX leaves the objective value unchanged.

z = x1 + x2 + x3

x4 = 8 − x1 − x2

x5 = x2 − x3

Pivot with x1 entering and x4 leaving

z = 8 + x3 − x4

x1 = 8 − x2 − x4

x5 = x2 − x3

Cycling: Slack forms at two iterations are identical, and SIMPLEX fails to terminate!

Pivot with x3 entering and x5 leaving

z = 8 + x2 − x4 − x5

x1 = 8 − x2 − x4

x3 = x2 − x5

III. Linear Programming Simplex Algorithm 38 Termination and Running Time

Cycling: SIMPLEX may fail to terminate. This is theoretically possible, but very rare in practice.

Anti-Cycling Strategies
1. Bland's rule: Choose the entering variable with the smallest index
2. Random rule: Choose the entering variable uniformly at random
3. Perturbation: Perturb the input slightly so that it is impossible to have two solutions with the same objective value

Replace each bi by b̂i = bi + εi, where the εi ≫ εi+1 are all small.

Lemma 29.7
Assuming INITIALIZE-SIMPLEX returns a slack form for which the basic solution is feasible, SIMPLEX either reports that the program is unbounded or returns a feasible solution in at most (n+m choose m) iterations.

Every set B of basic variables uniquely determines a slack form, and there are at most (n+m choose m) unique slack forms.

III. Linear Programming Simplex Algorithm 39 Outline

Introduction

Standard and Slack Forms

Formulating Problems as Linear Programs

Simplex Algorithm

Finding an Initial Solution

III. Linear Programming Finding an Initial Solution 40 Finding an Initial Solution

maximize 2x1 − x2 subject to 2x1 − x2 ≤ 2 x1 − 5x2 ≤ −4 x1, x2 ≥ 0

Conversion into slack form

z = 2x1 − x2 x3 = 2 − 2x1 + x2 x4 = −4 − x1 + 5x2

Basic solution (x1, x2, x3, x4) = (0, 0, 2, −4) is not feasible!

III. Linear Programming Finding an Initial Solution 41 Geometric Illustration

maximize 2x1 − x2 subject to 2x1 − x2 ≤ 2, x1 − 5x2 ≤ −4, x1, x2 ≥ 0

Questions:
How to determine whether there is any feasible solution?
If there is one, how to determine an initial basic solution?

[Figure: the (x1, x2)-plane with the two constraints 2x1 − x2 ≤ 2 and x1 − 5x2 ≤ −4; the feasible region does not contain the origin]

III. Linear Programming Finding an Initial Solution 42 Formulating an Auxiliary Linear Program

maximize Σ_{j=1}^n cj xj
subject to Σ_{j=1}^n aij xj ≤ bi for i = 1, 2,..., m,
xj ≥ 0 for j = 1, 2,..., n

Formulating an Auxiliary Linear Program

maximize −x0
subject to Σ_{j=1}^n aij xj − x0 ≤ bi for i = 1, 2,..., m,
xj ≥ 0 for j = 0, 1,..., n

Lemma 29.11

Let Laux be the auxiliary LP of a linear program L in standard form. Then L is feasible if and only if the optimal objective value of Laux is 0.

Proof.

“⇒”: Suppose L has a feasible solution x = (x1, x2,..., xn)

x0 = 0 combined with x is a feasible solution to Laux with objective value 0. Since x0 ≥ 0 and the objective is to maximize −x0, this is optimal for Laux “⇐”: Suppose that the optimal objective value of Laux is 0

Then x0 = 0, and the remaining solution values (x1, x2,..., xn) satisfy L. III. Linear Programming Finding an Initial Solution 43

INITIALIZE-SIMPLEX
We now describe our strategy to find an initial basic feasible solution for a linear program L in standard form. First test the basic solution with N = {1, 2,..., n}, B = {n + 1, n + 2,..., n + m}, x̄i = bi for i ∈ B and x̄i = 0 otherwise.

INITIALIZE-SIMPLEX(A, b, c)
 1  let k be the index of the minimum bi
 2  if bk ≥ 0                                        // is the initial basic solution feasible?
 3      return ({1, 2,..., n}, {n + 1, n + 2,..., n + m}, A, b, c, 0)
 4  form Laux by adding −x0 to the left-hand side of each constraint and setting the objective function to −x0
 5  let (N, B, A, b, c, v) be the resulting slack form for Laux
 6  l = n + k
 7  // Laux has n + 1 nonbasic variables and m basic variables.
 8  (N, B, A, b, c, v) = PIVOT(N, B, A, b, c, v, l, 0)
 9  // The basic solution is now feasible for Laux.
10  iterate the while loop of lines 3–12 of SIMPLEX until an optimal solution to Laux is found
11  if the optimal solution to Laux sets x̄0 to 0
12      if x̄0 is basic
13          perform one (degenerate) pivot to make it nonbasic
14      from the final slack form of Laux, remove x0 from the constraints and restore the original objective function of L, but replace each basic variable in this objective function by the right-hand side of its associated constraint
15      return the modified final slack form
16  else return "infeasible"

Annotations: ℓ = n + k is the leaving variable, chosen so that xℓ has the most negative value; line 8 is a pivot step with xℓ leaving and x0 entering; the degenerate pivot in line 13 does not change the value of any variable.
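A tiny Python sketch (assumption: the standard form is stored as plain lists) of how the auxiliary LP Laux from Lemma 29.11 can be formed: add a new variable x0, subtract it from every constraint, and maximize −x0.

# Forming L_aux; x0 is stored as the last column.
def make_auxiliary_lp(A, b, c):
    """A: m x n matrix (list of rows), b: length-m vector, c: length-n vector."""
    n = len(c)
    A_aux = [row[:] + [-1] for row in A]   # each constraint gains "- x0"
    b_aux = b[:]                           # right-hand sides are unchanged
    c_aux = [0] * n + [-1]                 # objective becomes: maximize -x0
    return A_aux, b_aux, c_aux

# Example from the slides: maximize 2x1 - x2 s.t. 2x1 - x2 <= 2, x1 - 5x2 <= -4
A = [[2, -1], [1, -5]]; b = [2, -4]; c = [2, -1]
print(make_auxiliary_lp(A, b, c))
# ([[2, -1, -1], [1, -5, -1]], [2, -4], [0, 0, -1])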

III. Linear Programming Finding an Initial Solution 44 Example of INITIALIZE-SIMPLEX (1/3)

maximize 2x1 − x2 subject to 2x1 − x2 ≤ 2 x1 − 5x2 ≤ −4 x1, x2 ≥ 0

Formulating the auxiliary linear program

maximize −x0 subject to 2x1 − x2 − x0 ≤ 2 x1 − 5x2 − x0 ≤ −4 x1, x2, x0 ≥ 0 Basic solution (0, 0, 0, 2, −4) not feasible! Converting into slack form

z = − x0 x3 = 2 − 2x1 + x2 + x0 x4 = −4 − x1 + 5x2 + x0

III. Linear Programming Finding an Initial Solution 45 Example of INITIALIZE-SIMPLEX (2/3)

z = − x0 x3 = 2 − 2x1 + x2 + x0 x4 = −4 − x1 + 5x2 + x0

Pivot with x0 entering and x4 leaving

z = −4 − x1 + 5x2 − x4 x0 = 4 + x1 − 5x2 + x4 x3 = 6 − x1 − 4x2 + x4

Basic solution (4, 0, 0, 6, 0) is feasible! Pivot with x2 entering and x0 leaving

z = −x0
x2 = 4/5 − x0/5 + x1/5 + x4/5
x3 = 14/5 + 4x0/5 − 9x1/5 + x4/5

Optimal solution has x0 = 0, hence the initial problem was feasible!

III. Linear Programming Finding an Initial Solution 46 Example of INITIALIZE-SIMPLEX (3/3)

z = −x0
x2 = 4/5 − x0/5 + x1/5 + x4/5
x3 = 14/5 + 4x0/5 − 9x1/5 + x4/5

Set x0 = 0 and express the objective function by non-basic variables:

2x1 − x2 = 2x1 − (4/5 − x0/5 + x1/5 + x4/5)

z = −4/5 + 9x1/5 − x4/5
x2 = 4/5 + x1/5 + x4/5
x3 = 14/5 − 9x1/5 + x4/5

Basic solution (0, 4/5, 14/5, 0), which is feasible!

Lemma 29.12 If a linear program L has no feasible solution, then INITIALIZE-SIMPLEX returns “infeasible”. Otherwise, it returns a valid slack form for which the basic solution is feasible.

III. Linear Programming Finding an Initial Solution 47 Fundamental Theorem of Linear Programming

Theorem 29.13 Any linear program L, given in standard form, either 1. has an optimal solution with a finite objective value, 2. is infeasible, or 3. is unbounded.

If L is infeasible, SIMPLEX returns “infeasible”. If L is unbounded, SIMPLEX returns “unbounded”. Otherwise, SIMPLEX returns an optimal solution with a finite objective value.

III. Linear Programming Finding an Initial Solution 48 Linear Programming and Simplex: Summary

Linear Programming extremely versatile tool for modelling problems of all kinds basis of Integer Programming, to be discussed in later lectures

Simplex Algorithm
In practice: usually terminates after a polynomial number of iterations, typically around O(m + n)
In theory: even with anti-cycling it may need exponential time
Research Problem: Is there a pivoting rule which makes SIMPLEX a polynomial-time algorithm?

Polynomial-Time Algorithms
Interior-Point Methods: traverse the interior of the feasible set of solutions (not just vertices!)

[Figure: a polytope in (x1, x2, x3)-space, contrasting the vertex-following path of SIMPLEX with a path through the interior]

III. Linear Programming Finding an Initial Solution 49 IV. Approximation Algorithms: Covering Problems Thomas Sauerwald

Easter 2015 Outline

Introduction

Vertex Cover

The Set-Covering Problem

IV. Covering Problems Introduction 2 Motivation

Many fundamental problems are NP-complete, yet they are too important to be abandoned.

Examples: HAMILTON, 3-SAT, VERTEX-COVER,KNAPSACK,...

Strategies to cope with NP-complete problems 1. If inputs (or solutions) are small, an algorithm with exponential running time may be satisfactory. 2. Isolate important special cases which can be solved in polynomial-time. 3. Develop algorithms which find near-optimal solutions in polynomial-time. We will call these approximation algorithms.

IV. Covering Problems Introduction 3 Performance Ratios for Approximation Algorithms

Approximation Ratio An algorithm for a problem has approximation ratio ρ(n), if for any input of size n, the cost C of the returned solution and optimal cost C∗ satisfy:

max(C/C∗, C∗/C) ≤ ρ(n).

(Maximization problem: C∗/C ≥ 1. Minimization problem: C/C∗ ≥ 1.)

This covers both maximization and minimization problems.

For many problems: tradeoff between runtime and approximation ratio.

Approximation Schemes
An approximation scheme is an approximation algorithm which, given any input and ε > 0, is a (1 + ε)-approximation algorithm.
It is a polynomial-time approximation scheme (PTAS) if for any fixed ε > 0 the runtime is polynomial in n. For example, O(n^(2/ε)).
It is a fully polynomial-time approximation scheme (FPTAS) if the runtime is polynomial in both 1/ε and n. For example, O((1/ε)^2 · n^3).

IV. Covering Problems Introduction 4 Outline

Introduction

Vertex Cover

The Set-Covering Problem

IV. Covering Problems Vertex Cover 5

The Vertex-Cover Problem

We are covering edges by picking vertices!

Vertex Cover Problem
Given: Undirected graph G = (V, E)
Goal: Find a minimum-cardinality subset V′ ⊆ V such that if (u, v) ∈ E(G), then u ∈ V′ or v ∈ V′.

This is an NP-hard problem.

[Figure: a small example graph on the vertices a, b, c, d, e]

Applications: Every edge forms a task, and every vertex represents a person/machine which can execute that task Perform all tasks with the minimal amount of resources Extensions: weighted edges or hypergraphs

IV. Covering Problems Vertex Cover 6 Edges removed from E 0: 1. {b, c} b c d 2. {e, f } 3. {d, g}

e f g

APPROX-VTheERTEX optimal-COVER solutionproduces has size a set 3. of size 6.

35.1 The vertex-cover problem 1109

bcd bcd

ae fg ae fg (a) (b)

bcd bcd

ae fg ae fg (c) (d)

bcd bcd

ae fg ae fg (e) (f)

Figure 35.1 The operation of APPROX-VERTEX-COVER. (a) The input graph G,whichhas7 vertices and 8 edges. (b) The edge .b; c/,shownheavy,isthefirstedgechosenbyAPPROX-VERTEX- COVER.Verticesb and c,shownlightlyshaded,areaddedtothesetC containing the vertex cover being created. Edges .a; b/, .c; e/,and.c; d/,showndashed,areremovedsincetheyarenowcovered by some vertex in C . (c) Edge .e; f / is chosen; vertices e and f are added to C . (d) Edge .d; g/ is chosen; vertices d and g are added to C . (e) The set C ,whichisthevertexcoverproducedby APPROX-VERTEX-COVER,containsthesixverticesb;c; d;e; f;g. (f) The optimal vertex cover for Anthis problem Approximation contains only three Algorithm vertices: b, d,and basede. on Greedy

APPROX-VERTEX-COVER.G/ 1 C D; 2 E0 G:E D 3 while E0 ¤; 4let.u; !/ be an arbitrary edge of E0 5 C C u; ! D [ f g 6removefromE0 every edge incident on either u or ! 7 return C

Figure 35.1 illustrates how APPROX-VERTEX-COVER operates on an example graph. The variable C contains the vertex cover being constructed. Line 1 ini- tializesb C to the emptyc set. Line 2d sets E0 to be a copy of the edge set G:E of the graph. The loop of lines 3–6 repeatedly picks an edge .u; !/ from E0,addsits

a e f g


Analysis of Greedy for Vertex Cover

We can bound the size of the returned solution without knowing the (size of an) optimal solution!

Theorem 35.1
APPROX-VERTEX-COVER is a poly-time 2-approximation algorithm.

Proof:
Running time is O(V + E) (using adjacency lists to represent E′).
Let A ⊆ E denote the set of edges picked in line 4.
Every optimal cover C∗ must include at least one endpoint of each edge in A, and edges in A do not share a common endpoint: |C∗| ≥ |A|.

Every edge in A contributes 2 vertices to |C|: |C| = 2|A| ≤ 2|C∗|.
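Below is a short Python sketch (not the book's code) of APPROX-VERTEX-COVER on an edge list; iterating the edges in any order and skipping already-covered ones is one way to "pick an arbitrary edge of E′". The example edge list is assumed from the figure caption (7 vertices, 8 edges).

# Greedy 2-approximation: pick an uncovered edge, add both endpoints.
def approx_vertex_cover(edges):
    cover = set()
    for u, v in edges:                      # any uncovered edge will do
        if u not in cover and v not in cover:
            cover.update((u, v))            # both endpoints enter the cover
    return cover

edges = [('a','b'), ('b','c'), ('c','d'), ('c','e'), ('d','e'),
         ('d','f'), ('d','g'), ('e','f')]
print(approx_vertex_cover(edges))           # a cover of size at most 2 * OPT (here: 6)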

IV. Covering Problems Vertex Cover 8 Solving Special Cases

Strategies to cope with NP-complete problems 1. If inputs are small, an algorithm with exponential running time may be satisfactory. 2. Isolate important special cases which can be solved in polynomial-time. 3. Develop algorithms which find near-optimal solutions in polynomial-time.

IV. Covering Problems Vertex Cover 9 Vertex Cover on Trees

There exists an optimal vertex cover which does not include any leaves.

Exchange-Argument: Replace any leaf in the cover by its parent.

IV. Covering Problems Vertex Cover 10 Solving Vertex Cover on Trees

There exists an optimal vertex cover which does not include any leaves.

VERTEX-COVER-TREES(G)
1: C = ∅
2: while ∃ leaves in G
3:     Add all parents of leaves to C
4:     Remove all leaves and their parents from G
5: return C

Clear: Running time is O(V ), and the returned solution is a vertex cover.

Solution is also optimal. (Use inductively the existence of an optimal vertex cover without leaves.)
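The following Python sketch implements the same idea bottom-up on a rooted tree (an assumption: the tree is given as an adjacency dict and we root it at an arbitrary vertex); whenever the edge to the parent is still uncovered, the parent (never the leaf) is taken.

def vertex_cover_tree(adj, root=None):
    """adj: adjacency dict of a tree. Returns an optimal vertex cover as a set."""
    if root is None:
        root = next(iter(adj))
    parent, order, stack, seen = {root: None}, [], [root], {root}
    while stack:                              # DFS to get parents and an order
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w); parent[w] = v; stack.append(w)
    cover = set()
    for v in reversed(order):                 # children are handled before parents
        p = parent[v]
        if p is not None and v not in cover and p not in cover:
            cover.add(p)                      # edge (v, p) uncovered: take the parent
    return cover

# Path a - b - c - d - e: an optimal cover is {b, d}
adj = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c', 'e'}, 'e': {'d'}}
print(vertex_cover_tree(adj))                 # {'b', 'd'}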

IV. Covering Problems Vertex Cover 11

Execution on a Small Example

VERTEX-COVER-TREES(G)
1: C = ∅
2: while ∃ leaves in G
3:     Add all parents of leaves to C
4:     Remove all leaves and their parents from G
5: return C

[Figure: a small tree, shown before the first iteration, after iteration 1 and after iteration 2]

The problem can also be solved on bipartite graphs, using Max-Flows and Min-Cuts.

IV. Covering Problems Vertex Cover 12

IV. Covering Problems Vertex Cover 12 Exact Algorithms

Such algorithms are called exact algorithms.

Strategies to cope with NP-complete problems 1. If inputs (or solutions) are small, an algorithm with exponential running time may be satisfactory 2. Isolate important special cases which can be solved in polynomial-time. 3. Develop algorithms which find near-optimal solutions in polynomial-time.

Focus on instances where the minimum vertex cover is small, that is, smaller than some given integer k.

A simple brute-force search would take ≈ (n choose k) = Θ(n^k) time.

IV. Covering Problems Vertex Cover 13

Towards a more efficient Search

Substructure Lemma Consider a graph G = (V , E), edge (u, v) ∈ E(G) and integer k ≥ 1. Let Gu be the graph obtained by deleting u and its incident edges (Gv is defined similarly). Then G has a vertex cover of size k if and only if Gu or Gv (or both) have a vertex cover of size k − 1.

Reminiscent of Dynamic Programming. Proof:

⇐ Assume Gu has a vertex cover Cu of size k − 1. Adding u yields a vertex cover of G which is of size k ⇒ Assume G has a vertex cover C of size k, which contains, say u. Removing u from C yields a vertex cover of Gu which is of size k − 1.


IV. Covering Problems Vertex Cover 14 A More Efficient Search Algorithm

VERTEX-COVER-SEARCH(G, k) 1: Pick an arbitrary edge (u, v) ∈ E 2: S1 = VERTEX-COVER-SEARCH(Gu, k − 1) 3: S2 = VERTEX-COVER-SEARCH(Gv , k − 1) 4: if S1 6= ∅ return S1 ∪ {u} 5: if S2 6= ∅ return S2 ∪ {v} 6: return ∅

Correctness follows by the Substructure Lemma and induction.

Running time:
Depth at most k, branching factor 2 ⇒ total number of calls is O(2^k)
O(E) work per recursive call
Total runtime: O(2^k · E) — exponential in k, but much better than Θ(n^k) (i.e., still polynomial for k = O(log n))
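A Python sketch of the bounded search above (the return convention differs slightly from the pseudocode: None signals "no cover of size ≤ k", rather than ∅).

def vertex_cover_search(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists."""
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]                              # an arbitrary edge (u, v)
    for w in (u, v):                             # either u or v must be in the cover
        rest = [e for e in edges if w not in e]  # the graph G_w (w and its edges deleted)
        sub = vertex_cover_search(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None

edges = [('a','b'), ('b','c'), ('c','d'), ('c','e'), ('d','e'),
         ('d','f'), ('d','g'), ('e','f')]
print(vertex_cover_search(edges, 3))   # e.g. {'b', 'd', 'e'}
print(vertex_cover_search(edges, 2))   # None: no cover of size 2 exists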

IV. Covering Problems Vertex Cover 15 Outline

Introduction

Vertex Cover

The Set-Covering Problem

IV. Covering Problems The Set-Covering Problem 16 The Set-Covering Problem

Set Cover Problem
Given: set X of size n and family of subsets F
Goal: Find a minimum-size subset C ⊆ F such that X = ∪_{S∈C} S.
(We minimise the number of sets, not the number of elements. Only solvable if ∪_{S∈F} S = X!)

[Figure: a set X of 12 elements and subsets S1, S2,..., S6]

Remarks: generalisation of the vertex-cover problem and hence also NP-hard. models resource allocation problems

IV. Covering Problems The Set-Covering Problem 17

Greedy

Strategy: Pick the set S that covers the largest number of uncovered elements.

GREEDY-SET-COVER(X, F)
1  U = X
2  C = ∅
3  while U ≠ ∅
4      select an S ∈ F that maximizes |S ∩ U|
5      U = U − S
6      C = C ∪ {S}
7  return C

GREEDY-SET-COVER can easily be implemented to run in time polynomial in |X| and |F|: the number of iterations of the loop in lines 3–6 is at most min(|X|, |F|), and the loop body can be implemented to run in time O(|X| · |F|).

In the example, greedy chooses S1, S4, S5 and then S3 (or S6), which is a cover of size 4.
The optimal cover is C = {S3, S4, S5}.
How good is the approximation ratio?

IV. Covering Problems The Set-Covering Problem 18
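A compact Python sketch of GREEDY-SET-COVER. The concrete instance below is hypothetical (chosen so that greedy needs 4 sets while {S3, S4, S5} is an optimal cover of size 3, in the spirit of the slide's figure); it is not the figure's exact data.

def greedy_set_cover(X, F):
    """X: set of elements, F: dict name -> set. Returns the chosen set names."""
    U, C = set(X), []
    while U:
        best = max(F, key=lambda name: len(F[name] & U))  # most uncovered elements; ties arbitrary
        C.append(best)
        U -= F[best]
    return C

X = set(range(1, 13))
F = {'S1': {1, 2, 3, 4, 5, 6},  'S2': {5, 6, 8, 9},
     'S3': {1, 4, 12},          'S4': {2, 5, 7, 8, 9},
     'S5': {3, 6, 10, 11},      'S6': {11, 12}}
print(greedy_set_cover(X, F))   # ['S1', 'S4', 'S5', 'S3'] -- 4 sets vs. optimum 3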

Approximation Ratio of Greedy

Theorem 35.4
GREEDY-SET-COVER is a polynomial-time ρ(n)-approximation algorithm, where

ρ(n) = H(max{|S| : S ∈ F}) ≤ ln(n) + 1.

Here H(k) := Σ_{i=1}^k 1/i ≤ ln(k) + 1 denotes the k-th harmonic number (with H(0) := 0).

Idea: Distribute cost of 1 for each added set over the newly covered elements.

Definition of cost

If an element x is covered for the first time by set Si in iteration i, then
cx := 1 / |Si \ (S1 ∪ S2 ∪ ··· ∪ Si−1)|.

IV. Covering Problems The Set-Covering Problem 19 Illustration of Costs

[Figure: the 12 elements with the costs assigned by the greedy run — the six elements first covered by S1 each get cost 1/6, three elements get cost 1/3, two elements get cost 1/2, and the last element gets cost 1]

1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/3 + 1/3 + 1/3 + 1/2 + 1/2 + 1 = 4

IV. Covering Problems The Set-Covering Problem 20 Proof of Theorem 35.4 (1/2)

Definition of cost
If x is covered for the first time by a set Si, then cx := 1 / |Si \ (S1 ∪ S2 ∪ ··· ∪ Si−1)|.

Proof. Each step of the algorithm assigns 1 unit of cost, so
|C| = Σ_{x∈X} cx.    (1)
Each element x ∈ X is in at least one set in the optimal cover C∗, so
Σ_{S∈C∗} Σ_{x∈S} cx ≥ Σ_{x∈X} cx.    (2)

Combining (1) and (2) gives

|C| ≤ Σ_{S∈C∗} Σ_{x∈S} cx ≤ Σ_{S∈C∗} H(|S|) ≤ |C∗| · H(max{|S| : S ∈ F}).

Key Inequality: Σ_{x∈S} cx ≤ H(|S|).

IV. Covering Problems The Set-Covering Problem 21 Proof of Theorem 35.4 (2/2)

Proof of the Key Inequality Σ_{x∈S} cx ≤ H(|S|)

Remaining uncovered elements in S Sets chosen by the algorithm

For any S ∈ F and i = 1, 2,..., |C| = k let ui := |S \ (S1 ∪ S2 ∪ ··· ∪ Si)|
⇒ u0 ≥ u1 ≥ ··· ≥ u|C| = 0, and ui−1 − ui counts the items covered for the first time by Si.

Σ_{x∈S} cx = Σ_{i=1}^k (ui−1 − ui) · 1/|Si \ (S1 ∪ S2 ∪ ··· ∪ Si−1)|

Further, by the definition of GREEDY-SET-COVER:

|Si \ (S1 ∪ S2 ∪ ··· ∪ Si−1)| ≥ |S \ (S1 ∪ S2 ∪ ··· ∪ Si−1)| = ui−1.

Combining the last inequalities gives:

Σ_{x∈S} cx ≤ Σ_{i=1}^k (ui−1 − ui) · 1/ui−1 = Σ_{i=1}^k Σ_{j=ui+1}^{ui−1} 1/ui−1   (each of the ui−1 − ui inner terms equals 1/ui−1)
          ≤ Σ_{i=1}^k Σ_{j=ui+1}^{ui−1} 1/j = Σ_{i=1}^k (H(ui−1) − H(ui)) = H(u0) − H(uk) = H(|S|).

IV. Covering Problems The Set-Covering Problem 22 Set-Covering Problem (Summary)

The same approach also gives an approximation ratio of O(ln n) if there is a cost function c : F → Z⁺ on the sets (weighted set cover).

Theorem 35.4
GREEDY-SET-COVER is a polynomial-time ρ(n)-approximation algorithm, where

ρ(n) = H(max{|S| : S ∈ F}) ≤ ln(n) + 1.

Is the bound on the approximation ratio tight? Is there a better algorithm?

Lower Bound Unless P=NP, there is no c · ln(n) approximation algorithm for set cover for some constant 0 < c < 1.

IV. Covering Problems The Set-Covering Problem 23 T1

T2

S1 S2 S3 S4

Solution of Greedy consists of k sets. Optimum consists of 2 sets.

Example where Greedy is a (1/2) · log2 n factor off

Instance Given any integer k ≥ 3 There are n = 2k+1 − 2 elements overall

Sets S1, S2,..., Sk are pairwise disjoint and each set contains 2, 4,..., 2k elements

Sets T1, T2 are disjoint and each set contains half of the elements of each set S1, S2,..., Sk

k = 4:

T1

T2

S1 S2 S3 S4

IV. Covering Problems The Set-Covering Problem 24 T1

T2

Optimum consists of 2 sets.

Example where Greedy is a (1/2) · log2 n factor off

Instance Given any integer k ≥ 3 There are n = 2k+1 − 2 elements overall

Sets S1, S2,..., Sk are pairwise disjoint and each set contains 2, 4,..., 2k elements

Sets T1, T2 are disjoint and each set contains half of the elements of each set S1, S2,..., Sk

k = 4:

T1

T2

S1 S2 S3 S4

Solution of Greedy consists of k sets.

IV. Covering Problems The Set-Covering Problem 24 S1 S2 S3 S4

Solution of Greedy consists of k sets.

Example where Greedy is a (1/2) · log2 n factor off

Instance Given any integer k ≥ 3 There are n = 2k+1 − 2 elements overall

Sets S1, S2,..., Sk are pairwise disjoint and each set contains 2, 4,..., 2k elements

Sets T1, T2 are disjoint and each set contains half of the elements of each set S1, S2,..., Sk

k = 4:

T1

T2

S1 S2 S3 S4

Optimum consists of 2 sets.

IV. Covering Problems The Set-Covering Problem 24 V. Approximation Algorithms via Exact Algorithms Thomas Sauerwald

Easter 2015 Outline

The Subset-Sum Problem

Parallel Machine Scheduling

V. Approximation via Exact Algorithms The Subset-Sum Problem 2 x3 + x4 + x5 = 12

The Subset-Sum Problem

The Subset-Sum Problem

Given: Set of positive integers S = {x1, x2,..., xn} and positive integer t
Goal: Find a subset S′ ⊆ S which maximizes Σ_{i : xi ∈ S′} xi subject to Σ_{i : xi ∈ S′} xi ≤ t.

This problem is NP-hard

t = 13 tons

x1 = 10

x2 = 4

x3 = 5 x1 + x5 = 11

x4 = 6

x5 = 1

V. Approximation via Exact Algorithms The Subset-Sum Problem 3 x1 + x5 = 11


V. Approximation via Exact Algorithms The Subset-Sum Problem 3 can be shown by induction on n

Correctness: Ln contains all sums of {x1, x2,..., xn}
Runtime: O(2^1 + 2^2 + ··· + 2^n) = O(2^n) — there are 2^i subsets of {x1, x2,..., xi}. Better runtime if t and/or |Li| are small.

An Exact (Exponential-Time) Algorithm

Dynamic Programming: Compute bottom-up all possible sums ≤ t.

EXACT-SUBSET-SUM(S, t)
1  n = |S|
2  L0 = ⟨0⟩
3  for i = 1 to n
4      Li = MERGE-LISTS(Li−1, Li−1 + xi)
5      remove from Li every element that is greater than t
6  return the largest element in Ln

Notation: S + x := {s + x : s ∈ S}. MERGE-LISTS(L, L′) returns the merged list (in sorted order and without duplicates) and is implementable in time O(|L| + |L′|) (like the merge step of Merge-Sort).

Example: S = {1, 4, 5}
L0 = ⟨0⟩
L1 = ⟨0, 1⟩
L2 = ⟨0, 1, 4, 5⟩
L3 = ⟨0, 1, 4, 5, 6, 9, 10⟩

Since the length of Li can be as much as 2^i, EXACT-SUBSET-SUM is an exponential-time algorithm in general, although it is a polynomial-time algorithm in the special cases in which t is polynomial in |S| or all the numbers in S are bounded by a polynomial in |S|.

V. Approximation via Exact Algorithms The Subset-Sum Problem 4
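A short Python sketch of EXACT-SUBSET-SUM; a set union plays the role of MERGE-LISTS (sorted, duplicate-free).

def merge_lists(L1, L2):
    """Merge two lists into a sorted list without duplicates."""
    return sorted(set(L1) | set(L2))

def exact_subset_sum(S, t):
    L = [0]
    for x in S:
        L = merge_lists(L, [y + x for y in L])   # L_i = merge of L_{i-1} and L_{i-1} + x_i
        L = [y for y in L if y <= t]             # drop sums exceeding t
    return max(L)

print(exact_subset_sum([1, 4, 5], 10))                 # lists grow to <0,1,4,5,6,9,10>; answer 10
print(exact_subset_sum([104, 102, 201, 101], 308))     # 307 = 104 + 102 + 101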


Towards a FPTAS

Idea: We don't need to maintain two values in L which are close to each other — we obtain a fully polynomial-time approximation scheme by "trimming" each list Li after it is created.

Trimming a List
Given a trimming parameter 0 < δ < 1, trimming L yields a minimal sublist L′ such that for every y ∈ L there exists a z ∈ L′ with
y / (1 + δ) ≤ z ≤ y.

Example (δ = 0.1):
L  = ⟨10, 11, 12, 15, 20, 21, 22, 23, 24, 29⟩
L′ = ⟨10, 12, 15, 20, 23, 29⟩
(The deleted value 11 is represented by 10, the deleted values 21 and 22 by 20, and the deleted value 24 by 23.)

TRIM(L, δ)
1  let m be the length of L
2  L′ = ⟨y1⟩
3  last = y1
4  for i = 2 to m
5      if yi > last · (1 + δ)        // yi ≥ last because L is sorted
6          append yi onto the end of L′
7          last = yi
8  return L′

TRIM trims the list in time Θ(m), provided L is given in monotonically increasing (sorted) order: a number is appended to L′ only if it is the first element or if it cannot be represented by the most recent number placed into L′.

V. Approximation via Exact Algorithms The Subset-Sum Problem 5
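A direct Python sketch of TRIM, reproducing the slide's example.

def trim(L, delta):
    """L must be sorted; keep an element only if it is more than a (1+delta)
    factor above the last element kept."""
    out, last = [L[0]], L[0]
    for y in L[1:]:
        if y > last * (1 + delta):   # y cannot be represented by `last`
            out.append(y)
            last = y
    return out

print(trim([10, 11, 12, 15, 20, 21, 22, 23, 24, 29], 0.1))
# [10, 12, 15, 20, 23, 29]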

Illustration of the Trim Operation

δ = 0.1
L = ⟨10, 11, 12, 15, 20, 21, 22, 23, 24, 29⟩

After the initialization (lines 1–3): L′ = ⟨10⟩
TRIM then scans the elements of L in increasing order and appends an element only if it cannot be represented by the most recent element placed into L′.
The returned list: L′ = ⟨10, 12, 15, 20, 23, 29⟩

V. Approximation via Exact Algorithms The Subset-Sum Problem 6

The FPTAS

APPROX-SUBSET-SUM computes each list Li by merging Li−1 with Li−1 + xi (line 4), trimming the result with parameter δ (line 5), and removing every element larger than t (line 6); thus Li is a sorted list containing a suitably trimmed version of the set Pi of all subset sums of {x1,..., xi}.

Input: S = ⟨104, 102, 201, 101⟩, t = 308, ε = 0.40  ⇒  trimming parameter δ = ε/(2n) = ε/8 = 0.05

line 2: L0 = ⟨0⟩
line 4: L1 = ⟨0, 104⟩
line 5: L1 = ⟨0, 104⟩
line 6: L1 = ⟨0, 104⟩
line 4: L2 = ⟨0, 102, 104, 206⟩
line 5: L2 = ⟨0, 102, 206⟩
line 6: L2 = ⟨0, 102, 206⟩
line 4: L3 = ⟨0, 102, 201, 206, 303, 407⟩
line 5: L3 = ⟨0, 102, 201, 303, 407⟩
line 6: L3 = ⟨0, 102, 201, 303⟩
line 4: L4 = ⟨0, 101, 102, 201, 203, 302, 303, 404⟩
line 5: L4 = ⟨0, 101, 201, 302, 404⟩
line 6: L4 = ⟨0, 101, 201, 302⟩

Returned solution z∗ = 302, which is within 2% of the optimum 307 = 104 + 102 + 101.

V. Approximation via Exact Algorithms The Subset-Sum Problem 8
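A self-contained Python sketch of APPROX-SUBSET-SUM (merge, trim and cut-off in one loop); on the input above it reproduces the lists and the returned value 302.

def approx_subset_sum(S, t, eps):
    n = len(S)
    delta = eps / (2 * n)                            # trimming parameter
    L = [0]
    for x in S:
        L = sorted(set(L) | {y + x for y in L})      # merge L and L + x
        trimmed, last = [L[0]], L[0]                 # trim with parameter delta
        for y in L[1:]:
            if y > last * (1 + delta):
                trimmed.append(y)
                last = y
        L = [y for y in trimmed if y <= t]           # remove elements above t
    return max(L)

print(approx_subset_sum([104, 102, 201, 101], 308, 0.40))   # 302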

Analysis of APPROX-SUBSET-SUM

Theorem 35.8
APPROX-SUBSET-SUM is a FPTAS for the subset-sum problem.

Proof (Approximation Ratio):
The returned solution z∗ is a valid solution. Let y∗ denote an optimal solution.

For every possible sum y ≤ t of {x1,..., xi}, there exists an element z ∈ Li such that
y / (1 + ε/(2n))^i ≤ z ≤ y.        (can be shown by induction on i)

Applying this with y = y∗ and i = n gives
y∗ / (1 + ε/(2n))^n ≤ z ≤ y∗   ⇒   y∗/z ≤ (1 + ε/(2n))^n = (1 + (ε/2)/n)^n,

and now using the fact that (1 + (ε/2)/n)^n increases towards e^(ε/2) as n → ∞ yields

y∗/z ≤ e^(ε/2) ≤ 1 + ε/2 + (ε/2)^2 ≤ 1 + ε.        (Taylor approximation of e^x)

V. Approximation via Exact Algorithms The Subset-Sum Problem 9 Analysis of APPROX-SUBSET-SUM

Theorem 35.8 APPROX-SUBSET-SUM is a FPTAS for the subset-sum problem.

Proof (Running Time):

Strategy: Derive a bound on |Li| (the running time is polynomial in |Li|).
After trimming, two successive elements z and z′ satisfy z′/z ≥ 1 + ε/(2n).

⇒ Possible values after trimming are 0, 1, and up to ⌈log_{1+ε/(2n)} t⌉ additional values. Hence,

|Li| ≤ log_{1+ε/(2n)} t + 2 = ln t / ln(1 + ε/(2n)) + 2
     ≤ 2n(1 + ε/(2n)) · ln t / ε + 2        (using ln(1 + x) ≥ x/(1 + x) for x > −1)
     < 3n · ln t / ε + 2.

This bound on |Li| is polynomial in the size of the input and in 1/ε.

Need log(t) bits to represent t and n bits to represent S.

V. Approximation via Exact Algorithms The Subset-Sum Problem 9 Concluding Remarks

The Subset-Sum Problem

Given: Set of positive integers S = {x1, x2,..., xn} and positive integer t
Goal: Find a subset S′ ⊆ S which maximizes Σ_{i : xi ∈ S′} xi subject to Σ_{i : xi ∈ S′} xi ≤ t.

Theorem 35.8 APPROX-SUBSET-SUM is a FPTAS for the subset-sum problem.

A more general problem than Subset-Sum. The Knapsack Problem

Given: Items i = 1, 2,..., n with weights wi and values vi, and an integer t
Goal: Find a subset S′ ⊆ {1,..., n} which
1. maximizes Σ_{i∈S′} vi
2. satisfies Σ_{i∈S′} wi ≤ t

Algorithm very similar to APPROX-SUBSET-SUM. Theorem There is a FPTAS for the Knapsack problem.

V. Approximation via Exact Algorithms The Subset-Sum Problem 10 Outline

The Subset-Sum Problem

Parallel Machine Scheduling

V. Approximation via Exact Algorithms Parallel Machine Scheduling 11 M2 J4 J3 J1

M1 J2

Parallel Machine Scheduling

Machine Scheduling Problem

Given: n jobs J1, J2,..., Jn with processing times p1, p2,..., pn, and m identical machines M1, M2,..., Mm Goal: Schedule the jobs on the machines minimizing the makespan Cmax = max1≤j≤n Cj , where Ck is the completion time of job Jk .

J1: p1 = 2

J2: p2 = 12

J3: p3 = 6

J4: p4 = 4

M2 J4 J3

M1 J1 J2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

V. Approximation via Exact Algorithms Parallel Machine Scheduling 12 M2 J4 J3

M1 J1 J2

Parallel Machine Scheduling

Machine Scheduling Problem

Given: n jobs J1, J2,..., Jn with processing times p1, p2,..., pn, and m identical machines M1, M2,..., Mm Goal: Schedule the jobs on the machines minimizing the makespan Cmax = max1≤j≤n Cj , where Ck is the completion time of job Jk .

J1: p1 = 2

J2: p2 = 12

J3: p3 = 6

J4: p4 = 4

M2 J4 J3 J1

M1 J2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

V. Approximation via Exact Algorithms Parallel Machine Scheduling 12 NP-Completeness of Parallel Machine Scheduling

Lemma Parallel Machine Scheduling is NP-complete even if there are only two machines.

Proof Idea: Polynomial time reduction from NUMBER-PARTITIONING.

M2 J4 J3 J1

M1 J2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Equivalent to the following Online Algorithm [CLRS]: Whenever a machine is idle, schedule any job that has not yet been scheduled.

LIST SCHEDULING(J1, J2,..., Jn, m) 1: while there exists an unassigned job 2: Schedule job on the machine with the least load

How good is this most basic Greedy Approach?
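Before analysing it, here is a small Python sketch (an assumption: jobs are given as a list of processing times) of list scheduling using a min-heap over machine loads.

import heapq

def list_scheduling(p, m):
    """p: processing times in arbitrary order, m: number of machines.
    Returns (makespan, assignment) where assignment[j] is job j's machine."""
    heap = [(0, i) for i in range(m)]        # (current load, machine index)
    heapq.heapify(heap)
    assignment = [None] * len(p)
    for j, pj in enumerate(p):
        load, i = heapq.heappop(heap)        # machine with the least load
        assignment[j] = i
        heapq.heappush(heap, (load + pj, i))
    return max(load for load, _ in heap), assignment

print(list_scheduling([2, 12, 6, 4], 2))     # (12, [0, 1, 0, 0]): J2 alone on one machine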

V. Approximation via Exact Algorithms Parallel Machine Scheduling 13 List Scheduling Analysis (Observations)

Ex 35-5 a.&b. a. The optimal makespan is at least as large as the greatest processing time, that is,

C∗max ≥ max_{1≤k≤n} pk.

b. The optimal makespan is at least as large as the average machine load, that is,

C∗max ≥ (1/m) · Σ_{k=1}^n pk.

Proof:
a. Some machine has to process the job with the largest processing time.
b. The total processing time of all n jobs equals Σ_{k=1}^n pk, and one machine must have a load of at least (1/m) · Σ_{k=1}^n pk.

V. Approximation via Exact Algorithms Parallel Machine Scheduling 14 List Scheduling Analysis (Final Step)

Ex 35-5 d. (Graham 1966) For the schedule returned by the greedy algorithm it holds that

Cmax ≤ (1/m) · Σ_{k=1}^n pk + max_{1≤k≤n} pk.

Hence list scheduling is a poly-time 2-approximation algorithm.

Proof:
Let Ji be the last job scheduled on the machine Mj with Cmax = Cj.
When Ji was scheduled to machine Mj, Cj − pi ≤ Ck for all 1 ≤ k ≤ m.
Averaging over k yields:

Cj − pi ≤ (1/m) · Σ_{k=1}^m Ck = (1/m) · Σ_{k=1}^n pk
⇒ Cmax ≤ (1/m) · Σ_{k=1}^n pk + max_{1≤k≤n} pk ≤ 2 · C∗max        (using Ex 35-5 a. & b.)

Mj Ji

0 Cj − pi Cmax

V. Approximation via Exact Algorithms Parallel Machine Scheduling 15 Improving Greedy

The problem of the List-Scheduling Approach were the large jobs

Analysis can be shown to be almost tight. Is there a better algorithm?

LONGEST PROCESSING TIME (LPT)(J1, J2,..., Jn, m)
1: Sort jobs decreasingly by their processing times
2: for i = 1 to m
3:     Ci = 0
4:     Si = ∅
5: end for
6: for j = 1 to n
7:     i = argmin_{1≤k≤m} Ck
8:     Si = Si ∪ {j}, Ci = Ci + pj
9: end for
10: return S1,..., Sm

Runtime: O(n log n) for sorting, O(n log m) for repeatedly extracting the minimum load (using a priority queue, for example).
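A Python sketch of the LPT rule (sort decreasingly, then list-schedule), run on the tight family from the analysis with m = 2.

import heapq

def lpt(p, m):
    order = sorted(range(len(p)), key=lambda j: -p[j])   # longest processing time first
    heap = [(0, i) for i in range(m)]
    heapq.heapify(heap)
    S = [[] for _ in range(m)]                           # S[i]: jobs assigned to machine i
    for j in order:
        load, i = heapq.heappop(heap)                    # least loaded machine
        S[i].append(j)
        heapq.heappush(heap, (load + p[j], i))
    return max(load for load, _ in heap), S

# Tight instance for m = 2: jobs 3, 3, 2, 2, 2 -> LPT gives makespan 7, optimum is 6
print(lpt([3, 3, 2, 2, 2], 2))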

V. Approximation via Exact Algorithms Parallel Machine Scheduling 16 Analysis of Improved Greedy

Graham 1966 The LPT algorithm has an approximation ratio of 4/3 − 1/(3m).

This can be shown to be tight (see next slide).
Proof (of approximation ratio 3/2):
Observation 1: If there are at most m jobs, then the solution is optimal.
Observation 2: If there are more than m jobs, then C∗max ≥ 2 · pm+1.
As in the analysis for list scheduling, we have

Cj = (Cj − pi) + pi ≤ C∗max + (1/2) · C∗max = (3/2) · C∗max.

This is for the case i ≥ m + 1 (otherwise, an even stronger inequality holds).

Mj Ji

0 Cj − pi Cmax

V. Approximation via Exact Algorithms Parallel Machine Scheduling 17 19 20 1 15 = 15 − 15

LPT gives Cmax = 19 ∗ Optimum is Cmax = 15

5 7 5 7 5 8 6 7 ∗ 8 6 7 Cmax = 15 Cmax = 19 9 5 6 9 5 6 5

Tightness of the Bound for LPT

Graham 1966 The LPT algorithm has an approximation ratio of 4/3 − 1/(3m).

Proof of an instance which shows tightness: m machines n = 2m + 1 jobs of length 2m − 1, 2m − 2,..., m and one job of length m

m = 5, n = 11 :

9 9 M5 8 8 7 7 M4 6 6 5 5 5 M3

M2

M1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

V. Approximation via Exact Algorithms Parallel Machine Scheduling 18 19 20 1 15 = 15 − 15

∗ Optimum is Cmax = 15

9 9 5 5 58 8 7 7 8 7 6 6 ∗ 5 5 5 8 7 Cmax = 15 9 6 9 6

Tightness of the Bound for LPT

Graham 1966 The LPT algorithm has an approximation ratio of 4/3 − 1/(3m).

Proof of an instance which shows tightness: m machines n = 2m + 1 jobs of length 2m − 1, 2m − 2,..., m and one job of length m

m = 5, n = 11 : LPT gives Cmax = 19

M5 7 7

M4 8 6

M3 8 6 Cmax = 19

M2 9 5

M1 9 5 5 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

V. Approximation via Exact Algorithms Parallel Machine Scheduling 18 9 9 7 7 8 8 7 7 8 6 6 6 5 5 5 8 6 Cmax = 19 9 5 9 5 5

Tightness of the Bound for LPT

Graham 1966 The LPT algorithm has an approximation ratio of 4/3 − 1/(3m).

19 = 20 − 1 Proof of an instance which shows tightness: 15 15 15 m machines n = 2m + 1 jobs of length 2m − 1, 2m − 2,..., m and one job of length m

m = 5, n = 11 : LPT gives Cmax = 19 ∗ Optimum is Cmax = 15

M5 5 5 5

M4 8 7 ∗ M3 8 7 Cmax = 15

M2 9 6

M1 9 6 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

V. Approximation via Exact Algorithms Parallel Machine Scheduling 18 A PTAS for Parallel Machine Scheduling

Basic Idea: For a (1 + ε)-approximation, we don't have to work with the exact pk's.

SUBROUTINE(J1, J2,..., Jn, m, T)
1: Either: Return a solution with Cmax ≤ (1 + ε) · max{T, C∗max}
2: Or: Report that there is no solution with makespan < T

Key Lemma
SUBROUTINE can be implemented in time n^O(1/ε²).        (We will prove this on the next slides.)

Theorem (Hochbaum, Shmoys'87)
There exists a PTAS for Parallel Machine Scheduling which runs in time O(n^O(1/ε²) · log P), where P := Σ_{k=1}^n pk.        (polynomial in the size of the input)

Proof (using Key Lemma): Since 0 ≤ C∗max ≤ P and C∗max is integral, the binary search below terminates after O(log P) steps.

PTAS(J1, J2,..., Jn, m)
1: Do binary search to find the smallest T s.t. Cmax ≤ (1 + ε) · max{T, C∗max}.
2: Return the solution computed by SUBROUTINE(J1, J2,..., Jn, m, T)

V. Approximation via Exact Algorithms Parallel Machine Scheduling 19 Implementation of Subroutine

SUBROUTINE(J1, J2,..., Jn, m, T)
1: Either: Return a solution with Cmax ≤ (1 + ε) · max{T, C∗max}
2: Or: Report that there is no solution with makespan < T

Observation
Divide jobs into two groups: Jsmall = {Ji : pi ≤ ε · T} and Jlarge = J \ Jsmall. Given a solution for Jlarge only with makespan (1 + ε) · T, greedily placing Jsmall yields a solution with makespan (1 + ε) · max{T, C∗max}.

Proof: Let Mj be the machine with the largest load. If there are no jobs from Jsmall, then the makespan is at most (1 + ε) · T. Otherwise, let i ∈ Jsmall be the last job added to Mj.

Cj − pi ≤ (1/m) · Σ_{k=1}^n pk   ⇒   Cj ≤ pi + (1/m) · Σ_{k=1}^n pk        (the "well-known" formula)
        ≤ ε · T + C∗max
        ≤ (1 + ε) · max{T, C∗max}

V. Approximation via Exact Algorithms Parallel Machine Scheduling 20 T ≤ T + b · ≤ (1 + ) · T . b2

0 Cmax ≤ T + b · max (pi − pi ) i∈Jlarge

2 2 Number of table entries is at most nb , hence filling all entries takes nO(b ) 0 If f (nb, nb+1,..., nb2 ) ≤ m (for the jobs with p ), then return yes, otherwise no. 0 ≥ T ≤ As every machine is assigned at most b jobs (pi b ) and the makespan is T ,

Proof of Key Lemma

Use Dynamic Programming to schedule Jlarge with makespan (1 + ε) · T. (We can assume there are no jobs with pj ≥ T.)

Let b be the smallest integer with 1/b ≤ ε. Define rounded processing times p′i = ⌊pi · b²/T⌋ · T/b².
⇒ Every p′i = α · T/b² for some α ∈ {b, b + 1,..., b²}.

Let C be the set of all (sb, sb+1,..., sb²) with Σ_{j=b}^{b²} sj · j · T/b² ≤ T   (the possible assignments to one machine with makespan ≤ T).

Let f(nb, nb+1,..., nb²) be the minimum number of machines required to schedule all jobs with makespan ≤ T:
f(0, 0,..., 0) = 0
f(nb, nb+1,..., nb²) = 1 + min_{(sb,...,sb²) ∈ C} f(nb − sb, nb+1 − sb+1,..., nb² − sb²)
(assign some jobs to one machine, and then use as few machines as possible for the rest).

The number of table entries is at most n^{b²}, hence filling all entries takes time n^{O(b²)}.
If f(nb, nb+1,..., nb²) ≤ m (for the jobs with rounded times p′), then return yes, otherwise no.
As every machine is assigned at most b jobs (since p′i ≥ T/b) and the makespan of the rounded instance is ≤ T,

Cmax ≤ T + b · max_{i ∈ Jlarge} (pi − p′i) ≤ T + b · T/b² ≤ (1 + ε) · T.

[Figure: the processing times of Jlarge (example with ε = 0.5, b = 2) before and after rounding down to multiples of T/b²]

V. Approximation via Exact Algorithms Parallel Machine Scheduling 21 Final Remarks

Graham 1966 List scheduling has an approximation ratio of 2.

Graham 1966 The LPT algorithm has an approximation ratio of 4/3 − 1/(3m).

Theorem (Hochbaum, Shmoys'87)
There exists a PTAS for Parallel Machine Scheduling which runs in time O(n^O(1/ε²) · log P), where P := Σ_{k=1}^n pk.

Can we find a FPTAS (for polynomially bounded processing times)? No!

Because for a sufficiently small approximation ratio 1 + ε, the computed solution has to be optimal.

V. Approximation via Exact Algorithms Parallel Machine Scheduling 22 VI. Approximation Algorithms: Travelling Salesman Problem Thomas Sauerwald

Easter 2015 Outline

Introduction

General TSP

Metric TSP

VI. Travelling Salesman Problem Introduction 2 3 + 2 + 1 + 3 = 9

The Traveling Salesman Problem (TSP)

Given a set of cities along with the cost of travel between them, find the cheapest route visiting all cities and returning to your starting point.

Formal Definition
Given: A complete undirected graph G = (V, E) with a nonnegative integer cost c(u, v) for each edge (u, v) ∈ E
Goal: Find a hamiltonian cycle of G with minimum cost.

Solution space consists of n! possible tours! (Actually the right number is (n − 1)!/2.)

[Figure: two example tours on four cities, one of cost 3 + 2 + 1 + 3 = 9 and a better one of cost 2 + 4 + 1 + 1 = 8.]

Special Instances
Metric TSP: costs satisfy the triangle inequality: ∀u, v, w ∈ V : c(u, w) ≤ c(u, v) + c(v, w). Even this version is NP-hard (Ex. 35.2-2).

Euclidean TSP: cities are points in the Euclidean space, costs are equal to their Euclidean distance

VI. Travelling Salesman Problem Introduction 3 History of the TSP problem (1954)

Dantzig, Fulkerson and Johnson found an optimal tour through 42 cities.

http://www.math.uwaterloo.ca/tsp/history/img/dantzig_big.html

VI. Travelling Salesman Problem Introduction 4


VI. Travelling Salesman Problem Introduction 5

The Dantzig-Fulkerson-Johnson Method

1. Create a linear program (variable x(u, v) = 1 iff the tour goes between u and v).
2. Solve the linear program. If the solution is integral and forms a tour, stop. Otherwise find a new constraint to add (a cutting plane) and repeat.

[Figure: cutting planes on a small LP with constraints 4x1 + 9x2 ≤ 36 and 2x1 − 9x2 ≤ −27. The fractional optimum (2.25, 3) is cut off by the additional constraint x2 ≤ 3, which cuts the solution space of the LP and yields (2, 3); more cuts are needed to find an integral solution.]

VI. Travelling Salesman Problem Introduction 5 Outline

Introduction

General TSP

Metric TSP

VI. Travelling Salesman Problem General TSP 6


VI. Travelling Salesman Problem General TSP 7

Hardness of Approximation

Theorem 35.3 If P 6= NP, then for any constant ρ ≥ 1, there is no polynomial-time ap- proximation algorithm with approximation ratio ρ for the general TSP.

Proof: Idea: Reduction from the hamiltonian-cycle problem.

Let G = (V, E) be an instance of the hamiltonian-cycle problem.
Let G′ = (V, E′) be the complete graph on V with costs, for each (u, v) ∈ E′:
c(u, v) = 1 if (u, v) ∈ E, and c(u, v) = ρ|V| + 1 otherwise.
(Representations of G′ and c can be created in time polynomial in |V| and |E|; the large weight renders the edges outside E useless.)
If G has a hamiltonian cycle H, then (G′, c) contains a tour of cost |V|.
If G does not have a hamiltonian cycle, then any tour T must use some edge ∉ E,
⇒ c(T) ≥ (ρ|V| + 1) + (|V| − 1) = (ρ + 1)|V|.
This gap of a factor ρ + 1 between tours which use only edges in G and those which don't means that a ρ-approximation of TSP in G′ computes a hamiltonian cycle in G (if one exists).

[Figure (Reduction): a graph G = (V, E) on four vertices with unit edge costs, and the complete graph G′ = (V, E′) in which the edges of G keep cost 1 and the remaining edges get cost ρ · 4 + 1.]
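The cost function of this reduction is easy to spell out explicitly. The following Python sketch (not part of the lecture; it assumes vertices are labelled 0, ..., n−1) builds the cost matrix of G′ from a hamiltonian-cycle instance.

def tsp_hardness_instance(n, edges, rho):
    """Cost matrix of Theorem 35.3: cost 1 for edges of G, rho*n + 1 otherwise.
    A rho-approximate TSP tour on this instance has cost n iff G is hamiltonian."""
    E = {frozenset(e) for e in edges}
    c = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            c[u][v] = c[v][u] = 1 if frozenset((u, v)) in E else rho * n + 1
    return c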

VI. Travelling Salesman Problem General TSP 7 Proof of Theorem 35.3 from a higher perspective

General Method to prove inapproximability results!

[Figure: a map f from instances of Hamilton to instances of TSP. Every instance x with a hamiltonian cycle is mapped to an instance f(x) of cost ≤ k, while every instance y without one is mapped to an instance f(y) of cost ≥ ρ · k.]

VI. Travelling Salesman Problem General TSP 8 Outline

Introduction

General TSP

Metric TSP

VI. Travelling Salesman Problem Metric TSP 9

For a subset A ⊆ E let c(A) = Σ_{(u,v)∈A} c(u, v). In many practical situations, the least costly way to go from a place u to a place w is to go directly; cutting out an intermediate stop never increases the cost. We formalise this by saying that the cost function c satisfies the triangle inequality if, for all vertices u, v, w ∈ V,
c(u, w) ≤ c(u, v) + c(v, w).
The triangle inequality is automatically satisfied in several applications, e.g. if the vertices of the graph are points in the plane and the cost of travelling between two vertices is their euclidean distance. As Exercise 35.2-2 shows, the travelling-salesman problem is NP-complete even if we require the cost function to satisfy the triangle inequality, so we should not expect a polynomial-time exact algorithm; instead we look for good approximation algorithms.

Idea: First compute an MST, and then create a tour based on the tree.

We first compute a minimum spanning tree, whose weight is a lower bound on the length of an optimal tour, and then use it to create a tour whose cost is at most twice the tree's weight, provided the cost function satisfies the triangle inequality. The following algorithm implements this approach, calling the minimum-spanning-tree algorithm MST-PRIM from Section 23.2 as a subroutine. The parameter G is a complete undirected graph, and the cost function c satisfies the triangle inequality.

APPROX-TSP-TOUR(G, c)
1  select a vertex r ∈ G.V to be a "root" vertex
2  compute a minimum spanning tree T for G from root r using MST-PRIM(G, c, r)
3  let H be a list of vertices, ordered according to when they are first visited in a preorder tree walk of T
4  return the hamiltonian cycle H

Runtime is dominated by MST-PRIM, which is Θ(V²).
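The algorithm is short enough to spell out in full. Here is a minimal Python sketch (not from the lecture), assuming cost is a symmetric n × n matrix satisfying the triangle inequality; it runs Prim's algorithm from vertex 0 and returns the preorder listing of the MST.

import heapq

def approx_tsp_tour(cost):
    """2-approximation for metric TSP: build an MST with Prim's algorithm
    from root 0, then list the vertices in preorder of the tree."""
    n = len(cost)
    parent, dist, in_tree = [None] * n, [float("inf")] * n, [False] * n
    dist[0] = 0
    children = [[] for _ in range(n)]
    heap = [(0, 0)]
    while heap:
        _, u = heapq.heappop(heap)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and cost[u][v] < dist[v]:
                dist[v], parent[v] = cost[u][v], u
                heapq.heappush(heap, (dist[v], v))
    # Preorder walk of the MST gives the tour (implicitly returning to the root).
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour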

VI. Travelling Salesman Problem Metric TSP 10


Run of APPROX-TSP-TOUR

[Figure: eight points a, b, c, d, e, f, g, h in the plane.]

1. Compute MST ✓
2. Perform preorder walk on MST ✓
3. Return list of vertices according to the preorder tree walk ✓

The computed solution has cost ≈ 19.704 and is not optimal; a better (but still not optimal) tour exists, and the optimal solution has cost ≈ 14.715.

VI. Travelling Salesman Problem Metric TSP 11


VI. Travelling Salesman Problem Metric TSP 12 exploiting that all edge costs are non-negative!

∗ Walk W Tour= (solutionminimumaH, b=, c (,abH,, spanningbhof,,cb A,,PPROXha,,dd,, treeee-T,,ff,SP,Tge,,ag), e, d, a) spanning tree as a subset of H

Proof of the Approximation Ratio Theorem 35.2 APPROX-TSP-TOUR is a polynomial-time 2-approximation for the traveling-salesman problem with the triangle inequality.

Proof: Consider the optimal tour H∗ and remove one edge ⇒ yields a spanning tree and therefore c(T ) ≤ c(H∗) Let W be the full walk of the spanning tree T (including repeated visits) ⇒ Full walk traverses every edge exactly twice, so c(W ) = 2(T ) ≤ 2c(H∗) exploiting triangle inequality! Deleting duplicate vertices from W yields a tour T with smaller cost: c(H) ≤ c(W ) ≤ 2c(H∗)

a d a d

e e

b f g b f g

c c

h h

Walk W = (a, b, c,¡b, h,¡b, a£, d, e, f ,¡e, g,¡e, ¡d, a) optimal solution H∗ VI. Travelling Salesman Problem Metric TSP 12 Christofides Algorithm

Theorem 35.2 APPROX-TSP-TOUR is a polynomial-time 2-approximation for the traveling-salesman problem with the triangle inequality.

Can we get a better approximation ratio?

CHRISTOFIDES(G, c)
1: select a vertex r ∈ G.V to be a "root" vertex
2: compute a minimum spanning tree T for G from root r using MST-PRIM(G, c, r)
3: compute a perfect matching M with minimum weight in the complete graph over the odd-degree vertices in T
4: let H be a list of vertices, ordered according to when they are first visited in an Eulerian circuit of T ∪ M
5: return H
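A self-contained Python sketch of these four steps is given below (not from the lecture). To stay dependency-free, the minimum-weight perfect matching on the odd-degree vertices is found by brute force, so the sketch is only meant for small instances; cost is a symmetric matrix obeying the triangle inequality.

def christofides(cost):
    """Sketch of the Christofides 3/2-approximation for metric TSP."""
    n = len(cost)
    # 1. Minimum spanning tree (Prim, O(n^2)); adj becomes the multigraph T ∪ M.
    in_tree, parent, dist = [False] * n, [None] * n, [float("inf")] * n
    dist[0] = 0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: dist[v])
        in_tree[u] = True
        if parent[u] is not None:
            adj[u].append(parent[u]); adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and cost[u][v] < dist[v]:
                dist[v], parent[v] = cost[u][v], u
    # 2. Minimum-weight perfect matching on the odd-degree vertices (brute force).
    odd = [v for v in range(n) if len(adj[v]) % 2 == 1]
    def best_matching(vs):
        if not vs:
            return 0, []
        u, rest = vs[0], vs[1:]
        best = (float("inf"), [])
        for i, v in enumerate(rest):
            w, m = best_matching(rest[:i] + rest[i + 1:])
            if cost[u][v] + w < best[0]:
                best = (cost[u][v] + w, m + [(u, v)])
        return best
    for u, v in best_matching(odd)[1]:
        adj[u].append(v); adj[v].append(u)
    # 3. Eulerian circuit of T ∪ M (Hierholzer), 4. shortcut repeated vertices.
    stack, circuit = [0], []
    while stack:
        v = stack[-1]
        if adj[v]:
            u = adj[v].pop(); adj[u].remove(v); stack.append(u)
        else:
            circuit.append(stack.pop())
    seen, tour = set(), []
    for v in circuit:
        if v not in seen:
            seen.add(v); tour.append(v)
    return tour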

Theorem (Christofides’76)
There is a polynomial-time 3/2-approximation algorithm for the travelling salesman problem with the triangle inequality.

VI. Travelling Salesman Problem Metric TSP 13


Run of CHRISTOFIDES

[Figure: the same eight points a, b, c, d, e, f, g, h as before.]

1. Compute MST ✓
2. Add a minimum-weight perfect matching M of the odd vertices in T ✓
3. Find an Eulerian Circuit ✓
4. Transform the Circuit into a Hamiltonian Cycle ✓

Solution has cost ≈ 15.54 - within 10% of the optimum!

VI. Travelling Salesman Problem Metric TSP 14 Concluding Remarks

Theorem (Christofides’76)
There is a polynomial-time 3/2-approximation algorithm for the travelling salesman problem with the triangle inequality.

Theorem (Arora’96, Mitchell’96)
There is a PTAS for the Euclidean TSP Problem.
(Both received the Gödel Prize 2010 for this result.)

“Christos Papadimitriou told me that the traveling salesman problem is not a problem. It’s an addiction.”

Jon Bentley 1991

VI. Travelling Salesman Problem Metric TSP 15 VII. Approximation Algorithms: Randomisation and Rounding Thomas Sauerwald

Easter 2015 Outline

Randomised Approximation

MAX-3-CNF

Weighted Vertex Cover

Weighted Set Cover

VII. Randomisation and Rounding Randomised Approximation 2

Performance Ratios for Randomised Approximation Algorithms

Approximation Ratio
A randomised algorithm for a problem has approximation ratio ρ(n) if, for any input of size n, the expected cost C of the returned solution and the optimal cost C* satisfy

max{ C/C*, C*/C } ≤ ρ(n)

(for a maximization problem C*/C ≥ 1, for a minimization problem C/C* ≥ 1).

Call such an algorithm randomized ρ(n)-approximation algorithm.

(This extends in the natural way to randomized algorithms.)

Approximation Schemes
An approximation scheme is an approximation algorithm which, given any input and ε > 0, is a (1 + ε)-approximation algorithm.
It is a polynomial-time approximation scheme (PTAS) if for any fixed ε > 0 the runtime is polynomial in n. For example, O(n^{2/ε}).
It is a fully polynomial-time approximation scheme (FPTAS) if the runtime is polynomial in both 1/ε and n. For example, O((1/ε)² · n³).

VII. Randomisation and Rounding Randomised Approximation 3 Outline

Randomised Approximation

MAX-3-CNF

Weighted Vertex Cover

Weighted Set Cover

VII. Randomisation and Rounding MAX-3-CNF 4 MAX-3-CNF Satisfiability

Assume that no literal (including its negation) appears more than once in the same clause. MAX-3-CNF Satisfiability

Given: 3-CNF formula, e.g.: (x1 ∨ x3 ∨ x4) ∧ (x2 ∨ x3 ∨ x5) ∧ · · · Goal: Find an assignment of the variables that satisfies as many clauses as possible.

Relaxation of the satisfiability problem: we want to compute how "close" the formula is to being satisfiable.

Example:

(x1 ∨ x3 ∨ x4) ∧ (x1 ∨ x3 ∨ x5) ∧ (x2 ∨ x4 ∨ x5) ∧ (x1 ∨ x2 ∨ x3)

x1 = 1, x2 = 0, x3 = 1, x4 = 0 and x5 = 1 satisfies 3 (out of 4 clauses)

Idea: What about assigning each variable independently at random?

VII. Randomisation and Rounding MAX-3-CNF 5 Analysis

Theorem 35.6

Given an instance of MAX-3-CNF with n variables x1, x2,..., xn and m clauses, the randomised algorithm that sets each variable independently at random is a randomized 8/7-approximation algorithm.

Proof: For every clause i = 1, 2,..., m, define a random variable:

Yi = 1{clause i is satisfied}.
Since each literal (including its negation) appears at most once in clause i, the three literals are set independently, and
Pr[ clause i is not satisfied ] = (1/2) · (1/2) · (1/2) = 1/8
⇒ Pr[ clause i is satisfied ] = 1 − 1/8 = 7/8
⇒ E[ Yi ] = Pr[ Yi = 1 ] · 1 = 7/8.
Let Y := Σ_{i=1}^{m} Yi be the number of satisfied clauses. Then, by linearity of expectation,
E[ Y ] = E[ Σ_{i=1}^{m} Yi ] = Σ_{i=1}^{m} E[ Yi ] = (7/8) · m.
Since the maximum number of satisfiable clauses is m, the expected approximation ratio is at most 8/7.
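The random assignment itself is a one-liner. A minimal Python sketch (not from the lecture; it assumes clauses are encoded as lists of signed integers, where k > 0 stands for x_k and k < 0 for its negation):

import random

def random_assignment_max3cnf(clauses, n):
    """Set each of the n variables True/False independently with probability 1/2
    and count the satisfied clauses."""
    assignment = [None] + [random.random() < 0.5 for _ in range(n)]
    satisfied = sum(
        any(assignment[abs(l)] == (l > 0) for l in clause) for clause in clauses
    )
    return assignment, satisfied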

VII. Randomisation and Rounding MAX-3-CNF 6 Interesting Implications

Theorem 35.6

Given an instance of MAX-3-CNF with n variables x1, x2,..., xn and m clauses, the randomised algorithm that sets each variable independently at random is a polynomial-time randomized 8/7-approximation algorithm.

Corollary
For any instance of MAX-3-CNF, there exists an assignment which satisfies at least 7/8 of all clauses.
(Probabilistic Method: a powerful tool to show the existence of a non-obvious property; there is ω ∈ Ω such that Y(ω) ≥ E[ Y ].)

Corollary Any instance of MAX-3-CNF with at most 7 clauses is satisfiable.

Follows from the previous Corollary.

VII. Randomisation and Rounding MAX-3-CNF 7 Algorithm: Assign x1 so that the conditional expectation is maximized and recurse.

Expected Approximation Ratio

Theorem 35.6

Given an instance of MAX-3-CNF with n variables x1, x2,..., xn and m clauses, the randomised algorithm that sets each variable independently at random is a polynomial-time randomized 8/7-approximation algorithm.

One could prove that the probability to satisfy (7/8) · m clauses is at least 1/(8m)

E[ Y ] = (1/2) · E[ Y | x1 = 1 ] + (1/2) · E[ Y | x1 = 0 ].
(Y is defined as in the previous proof.) At least one of the two conditional expectations is at least E[ Y ]!

GREEDY-3-CNF(φ, n, m)
1: for j = 1, 2, ..., n
2:     compute E[ Y | x1 = v1, ..., x_{j−1} = v_{j−1}, xj = 1 ]
3:     compute E[ Y | x1 = v1, ..., x_{j−1} = v_{j−1}, xj = 0 ]
4:     let xj = vj so that the conditional expectation is maximized
5: return the assignment v1, v2, ..., vn
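The derandomisation by conditional expectations is easy to implement because each conditional expectation factorises over the clauses. A minimal Python sketch (not from the lecture; same signed-integer clause encoding as above):

def greedy_3cnf(clauses, n):
    """Fix the variables one by one, always taking the value with the larger
    conditional expectation of the number of satisfied clauses."""
    def cond_exp(fixed):
        total = 0.0
        for clause in clauses:
            p_unsat = 1.0
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var in fixed:
                    if fixed[var] == want:
                        p_unsat = 0.0          # clause already satisfied
                else:
                    p_unsat *= 0.5             # unfixed literal false w.p. 1/2
            total += 1.0 - p_unsat
        return total

    fixed = {}
    for j in range(1, n + 1):
        fixed[j] = True
        e_true = cond_exp(fixed)
        fixed[j] = False
        e_false = cond_exp(fixed)
        fixed[j] = e_true >= e_false
    return fixed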

VII. Randomisation and Rounding MAX-3-CNF 8 X

X

Analysis of GREEDY-3-CNF(φ, n, m)

This algorithm is deterministic.
Theorem
GREEDY-3-CNF(φ, n, m) is a polynomial-time 8/7-approximation.

Proof:
Step 1: polynomial-time algorithm.
In iteration j = 1, 2, ..., n, Y = Y(φ) averages over 2^{n−j+1} assignments. A smarter way is to use linearity of (conditional) expectations:
E[ Y | x1 = v1, ..., x_{j−1} = v_{j−1}, xj = 1 ] = Σ_{i=1}^{m} E[ Yi | x1 = v1, ..., x_{j−1} = v_{j−1}, xj = 1 ],
where each term is computable in O(1).
Step 2: the assignment satisfies at least (7/8) · m clauses.
Due to the greedy choice in each iteration j = 1, 2, ..., n,
E[ Y | x1 = v1, ..., x_{j−1} = v_{j−1}, xj = vj ] ≥ E[ Y | x1 = v1, ..., x_{j−1} = v_{j−1} ]
  ≥ E[ Y | x1 = v1, ..., x_{j−2} = v_{j−2} ]
  ⋮
  ≥ E[ Y ] = (7/8) · m.

VII. Randomisation and Rounding MAX-3-CNF 9 ???? 8.75

Run of GREEDY-3-CNF(φ, n, m)

(x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x2 ∨ x4) ∧ (x1 ∨ x2 ∨ x4) ∧ (x1 ∨ x3 ∨ x4) ∧ (x1 ∨ x2 ∨ x4) ∧ (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x3 ∨ x4) ∧ (x2 ∨ x3 ∨ x4)

[Figure: the decision tree over x1, x2, x3, x4 with the conditional expectation E[ Y | · ] at each node. The root ???? has value 8.75; its children 0??? and 1??? have values 8.625 and 8.875; below them 00??, 01??, 10??, 11?? have values 8, 9.25, 9, 8.75; the nodes 000?, ..., 111? have values 8, 8, 9, 9.5, 9, 9, 9, 8.5; the leaves 0000, ..., 1111 have values 8, 8, 9, 7, 9, 9, 10, 9, 9, 9, 9, 9, 9, 9, 8, 9.]

The algorithm follows the child with the larger conditional expectation: it sets x1 = 1 (8.875 ≥ 8.625), after which the formula simplifies to
1 ∧ 1 ∧ 1 ∧ (x3 ∨ x4) ∧ 1 ∧ (x2 ∨ x3) ∧ (x2 ∨ x3) ∧ (x2 ∨ x3) ∧ 1 ∧ (x2 ∨ x3 ∨ x4),
then x2 = 0 (9 ≥ 8.75), giving
1 ∧ 1 ∧ 1 ∧ (x3 ∨ x4) ∧ 1 ∧ 1 ∧ (x3) ∧ 1 ∧ 1 ∧ (x3 ∨ x4),
and after fixing x3 and x4 the clauses evaluate to 1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 1 ∧ 0 ∧ 1 ∧ 1 ∧ 1.

Returned solution satisfies 9 out of 10 clauses, but the formula is satisfiable (the leaf 0110 has value 10).

VII. Randomisation and Rounding MAX-3-CNF 10 MAX-3-CNF: Concluding Remarks

Theorem 35.6

Given an instance of MAX-3-CNF with n variables x1, x2,..., xn and m clauses, the randomised algorithm that sets each variable independently at random is a randomized 8/7-approximation algorithm.

Theorem GREEDY-3-CNF(φ, n, m) is a polynomial-time 8/7-approximation.

Theorem (Håstad’97)
For any ε > 0, there is no polynomial-time (8/7 − ε)-approximation algorithm for MAX-3-SAT unless P = NP.

Roughly speaking, there is nothing smarter than just guessing.

VII. Randomisation and Rounding MAX-3-CNF 11 Outline

Randomised Approximation

MAX-3-CNF

Weighted Vertex Cover

Weighted Set Cover

VII. Randomisation and Rounding Weighted Vertex Cover 12 b

e

c

The Weighted Vertex-Cover Problem

Vertex Cover Problem
Given: an undirected, vertex-weighted graph G = (V, E)
Goal: Find a minimum-weight subset V′ ⊆ V such that if (u, v) ∈ E(G), then u ∈ V′ or v ∈ V′.

This is (still) an NP-hard problem.

[Figure: example graph on vertices a, b, c, d, e with weights w(a) = 4, w(b) = 3, w(c) = 3, w(d) = 1, w(e) = 2.]

Applications: Every edge forms a task, and every vertex represents a person/machine which can execute that task Weight of a vertex could be salary of a person Perform all tasks with the minimal amount of resources

VII. Randomisation and Rounding Weighted Vertex Cover 13 b c d e

The Greedy Approach from (Unweighted) Vertex Cover

APPROX-VERTEX-COVER(G)
1  C = ∅
2  E′ = G.E
3  while E′ ≠ ∅
4      let (u, v) be an arbitrary edge of E′
5      C = C ∪ {u, v}
6      remove from E′ every edge incident on either u or v
7  return C

[Figure 35.1: the operation of APPROX-VERTEX-COVER on an (unweighted) input graph with 7 vertices and 8 edges. The edges (b, c), (e, f) and (d, g) are chosen in turn, giving the cover C = {b, c, d, e, f, g}, whereas an optimal vertex cover contains only the three vertices b, d and e.]

On weighted instances this approach can fail badly: in the example, vertex a has weight 100 and the vertices b, c, d, e have weight 1 each; the computed solution has weight 101, while the optimal solution has weight 4.

VII. Randomisation and Rounding Weighted Vertex Cover 14 a

b


VII. Randomisation and Rounding Weighted Vertex Cover 14 Invoking an (Integer) Linear Program

Idea: Round the solution of an associated linear program.

0-1 Integer Program
minimize   Σ_{v∈V} w(v) x(v)
subject to x(u) + x(v) ≥ 1   for each (u, v) ∈ E
           x(v) ∈ {0, 1}     for each v ∈ V

Linear Program (relaxation; its optimum is a lower bound on the optimal weight of a minimum-weight cover)
minimize   Σ_{v∈V} w(v) x(v)
subject to x(u) + x(v) ≥ 1   for each (u, v) ∈ E
           x(v) ∈ [0, 1]     for each v ∈ V

Rounding Rule: if x(v) ≥ 1/2 then round up, otherwise round down.
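The LP plus rounding rule fits in a few lines of code. A minimal Python sketch (not from the lecture; it assumes scipy is available and uses vertices 0, ..., n−1, an edge list and a weight list as inputs):

import numpy as np
from scipy.optimize import linprog

def approx_min_weight_vc(n, edges, w):
    """Solve the LP relaxation of weighted vertex cover and round up every
    variable with value >= 1/2."""
    A_ub = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        A_ub[i, u] = A_ub[i, v] = -1.0      # x_u + x_v >= 1  <=>  -x_u - x_v <= -1
    res = linprog(c=w, A_ub=A_ub, b_ub=-np.ones(len(edges)),
                  bounds=[(0.0, 1.0)] * n, method="highs")
    return {v for v in range(n) if res.x[v] >= 0.5}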

VII. Randomisation and Rounding Weighted Vertex Cover 15 The Algorithm


APPROX-MIN-WEIGHT-VC(G, w)
1  C = ∅
2  compute x̄, an optimal solution to the linear program in lines (35.17)–(35.20)
3  for each v ∈ V
4      if x̄(v) ≥ 1/2
5          C = C ∪ {v}
6  return C

The APPROX-MIN-WEIGHT-VC procedure works as follows. Line 1 initialises the vertex cover to be empty. Line 2 formulates the linear program in lines (35.17)–(35.20) and then solves it; an optimal solution gives each vertex v an associated value x̄(v) with 0 ≤ x̄(v) ≤ 1. We use this value to guide the choice of which vertices to add to the vertex cover C in lines 3–5: if x̄(v) ≥ 1/2 we add v to C, otherwise we do not. In effect, we are "rounding" each fractional variable in the solution to the linear program to 0 or 1 in order to obtain a solution to the 0-1 integer program in lines (35.14)–(35.16). Finally, line 6 returns the vertex cover C. (The algorithm is polynomial-time because we can solve the linear program in polynomial time.)

Theorem 35.7
Algorithm APPROX-MIN-WEIGHT-VC is a polynomial-time 2-approximation algorithm for the minimum-weight vertex-cover problem.

VII. Randomisation and Rounding Weighted Vertex Cover 16

Proof: Because there is a polynomial-time algorithm to solve the linear program in line 2, and because the for loop of lines 3–5 runs in polynomial time, APPROX-MIN-WEIGHT-VC is a polynomial-time algorithm. For the approximation ratio, let C* be an optimal vertex cover and z* the value of an optimal LP solution; since an optimal cover is a feasible LP solution, z* ≤ w(C*), and rounding yields a set C that is a vertex cover with w(C) ≤ 2z*. (The details follow below.)

Example of APPROX-MIN-WEIGHT-VC

´! w.C !/: (35.21) Ä Next, we claim that by rounding the fractional values of the variables x.!/,we N produce a set C that is a vertex cover and satisfies w.C/ 2´!.ToseethatC is Ä avertexcover,consideranyedge.u; !/ E.Byconstraint(35.18),weknowthat 2 x.u/ x.!/ 1,whichimpliesthatatleastoneofx.u/ and x.!/ is at least 1=2. C ! N N Therefore, at least one of u and ! is included in the vertex cover, and so every edge is covered. Now, we consider the weight of the cover. We have Example of APPROX-MIN-WEIGHT-VC

x(a) = x(b) = x(c) = 1/2, x(d) = 1, x(e) = 0   →   (after rounding)   x(a) = x(b) = x(c) = 1, x(d) = 1, x(e) = 0

[Figure: the example graph shown three times: the fractional solution of the LP with weight = 5.5, the rounded solution of the LP with weight = 10, and the optimal solution with weight = 6.]

VII. Randomisation and Rounding Weighted Vertex Cover 17 Approximation Ratio

Proof (Approximation Ratio is 2):
Let C* be an optimal solution to the minimum-weight vertex cover problem, and let z* be the value of an optimal solution to the linear program, so z* ≤ w(C*).
Step 1: The computed set C covers all edges. Consider any edge (u, v) ∈ E, which imposes the constraint x(u) + x(v) ≥ 1 ⇒ at least one of x(u) and x(v) is at least 1/2 ⇒ C covers edge (u, v).
Step 2: The computed set C satisfies w(C) ≤ 2z*:

w(C*) ≥ z* = Σ_{v∈V} w(v) x(v) ≥ Σ_{v∈V : x(v) ≥ 1/2} w(v) · (1/2) = (1/2) · w(C).


VII. Randomisation and Rounding Weighted Vertex Cover 18 Outline

Randomised Approximation

MAX-3-CNF

Weighted Vertex Cover

Weighted Set Cover

VII. Randomisation and Rounding Weighted Set Cover 19 S Only solvable if S∈F S = X!

The Weighted Set-Covering Problem

Set Cover Problem
Given: a set X, a family F of subsets of X, and a cost function c : F → R⁺
Goal: Find a minimum-cost subset C ⊆ F such that X = ∪_{S∈C} S
(the cost of C is the sum over the costs of all sets in C).

[Figure: a universe X covered by six sets S1, ..., S6 with costs c(S1), ..., c(S6) = 2, 3, 3, 5, 1, 2.]

Remarks:
generalisation of the weighted vertex-cover problem
models resource allocation problems

VII. Randomisation and Rounding Weighted Set Cover 20 Setting up an Integer Program

0-1 Integer Program
minimize   Σ_{S∈F} c(S) y(S)
subject to Σ_{S∈F : x∈S} y(S) ≥ 1   for each x ∈ X
           y(S) ∈ {0, 1}             for each S ∈ F

Linear Program
minimize   Σ_{S∈F} c(S) y(S)
subject to Σ_{S∈F : x∈S} y(S) ≥ 1   for each x ∈ X
           y(S) ∈ [0, 1]             for each S ∈ F

VII. Randomisation and Rounding Weighted Set Cover 21 Back to the Example

[Figure: the same instance as before.]

S :    S1   S2   S3   S4   S5   S6
c :     2    3    3    5    1    2
y(·):  1/2  1/2  1/2  1/2   1   1/2

Cost equals 8.5.

The strategy employed for Vertex-Cover would take all 6 sets!

Even worse: If all y’s were below 1/2, we would not even return a valid cover!

VII. Randomisation and Rounding Weighted Set Cover 22 Randomised Rounding

S :    S1   S2   S3   S4   S5   S6
c :     2    3    3    5    1    2
y(·):  1/2  1/2  1/2  1/2   1   1/2

Idea: Interpret the y-values as probabilities for picking the respective set.

Lemma
Let C ⊆ F be a random subset with each set S being included independently with probability y(S).
The expected cost satisfies E[ c(C) ] = Σ_{S∈F} c(S) · y(S).
The probability that an element x ∈ X is covered satisfies Pr[ x ∈ ∪_{S∈C} S ] ≥ 1 − 1/e.

VII. Randomisation and Rounding Weighted Set Cover 23 X

X

Proof of Lemma

Lemma
Let C ⊆ F be a random subset with each set S being included independently with probability y(S).
The expected cost satisfies E[ c(C) ] = Σ_{S∈F} c(S) · y(S).
The probability that x is covered satisfies Pr[ x ∈ ∪_{S∈C} S ] ≥ 1 − 1/e.

Proof:
Step 1: The expected cost of the random set C.
E[ c(C) ] = E[ Σ_{S∈C} c(S) ] = Σ_{S∈F} Pr[ S ∈ C ] · c(S) = Σ_{S∈F} y(S) · c(S).
Step 2: The probability for an element to be (not) covered.
Pr[ x ∉ ∪_{S∈C} S ] = Π_{S∈F : x∈S} Pr[ S ∉ C ] = Π_{S∈F : x∈S} (1 − y(S))
  ≤ Π_{S∈F : x∈S} e^{−y(S)}            (using 1 + x ≤ e^x for any x ∈ R)
  = e^{− Σ_{S∈F : x∈S} y(S)} ≤ e^{−1}    (since y solves the LP, Σ_{S∈F : x∈S} y(S) ≥ 1).

VII. Randomisation and Rounding Weighted Set Cover 24 The Final Step

Lemma Let C ⊆ F be a random subset with each set S being included indepen- dently with probability y(S). P The expected cost satisfies E [ c(C) ] = S∈F c(S) · y(S). 1 The probability that x is covered satisfies Pr [ x ∈ ∪S∈CS ] ≥ 1 − e .

Problem: Need to make sure that every element is covered!

Idea: Amplify this probability by taking the union of Ω(log n) random sets C.

WEIGHTED SET COVER-LP(X, F, c)
1: compute y, an optimal solution to the linear program
2: C = ∅
3: repeat 2 ln n times
4:     for each S ∈ F
5:         let C = C ∪ {S} with probability y(S)
6: return C

(clearly runs in polynomial time!)
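Given the LP solution, the rounding step is trivial to implement. A minimal Python sketch (not from the lecture; it assumes y[i] is the fractional LP value of sets[i] and returns the indices of the chosen sets):

import math, random

def weighted_set_cover_lp_round(X, sets, costs, y):
    """Repeat the randomised rounding 2 ln n times and return the union of the
    picked sets; this covers X with probability at least 1 - 1/n."""
    n = len(X)
    C = set()
    for _ in range(max(1, math.ceil(2 * math.log(n)))):
        for i in range(len(sets)):
            if random.random() < y[i]:
                C.add(i)
    return C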

VII. Randomisation and Rounding Weighted Set Cover 25 By Markov’s inequality, Pr [ c(C) ≤ 4 ln(n) · c(C∗)] ≥ 1/2. X 1 1 1 probability could be further Hence with probability at least 1 − n − 2 > 3 , solution is within a factor of 4 ln(n) of the optimum. increased by repeating

Typical Approach for Designing Approximation Algorithms based on LPs

X

Analysis of WEIGHTED SET COVER-LP Theorem 1 With probability at least 1 − n , the returned set C is a valid cover of X. The expected approximation ratio is 2 ln(n).

Proof: Step 1: The probability that C is a cover By previous Lemma, an element x ∈ X is covered in one of the 2 ln n 1 iterations with probability at least 1 − e , so that  1 2 ln n 1 Pr [ x 6∈ ∪S∈CS ] ≤ = . e n2 This implies for the event that all elements are covered:   [ Pr [ X = ∪S∈CS ] = 1 − Pr  {x 6∈ ∪S∈CS}  x∈X X 1 1 Pr [ A ∪ B ] ≥ Pr [ A ] + Pr [ B ] ≥ 1 − Pr [ x 6∈ ∪S∈CS ] ≥ 1 − n · = 1 − . 2 n x∈X n Step 2: The expected approximation ratio P By previous lemma, the expected cost of one iteration is S∈F c(S) · y(S). P ∗ Linearity ⇒ E [ c(C)] ≤ 2 ln(n) · S∈F c(S) · y(S) ≤ 2 ln(n) · c(C )

VII. Randomisation and Rounding Weighted Set Cover 26 X

Analysis of WEIGHTED SET COVER-LP

Theorem
With probability at least 1 − 1/n, the returned set C is a valid cover of X. The expected approximation ratio is 2 ln(n).

Proof:
Step 1: The probability that C is a cover.
By the previous Lemma, an element x ∈ X is covered in one of the 2 ln n iterations with probability at least 1 − 1/e, so that
Pr[ x ∉ ∪_{S∈C} S ] ≤ (1/e)^{2 ln n} = 1/n².
This implies for the event that all elements are covered (using the union bound Pr[ A ∪ B ] ≤ Pr[ A ] + Pr[ B ]):
Pr[ X = ∪_{S∈C} S ] = 1 − Pr[ ∪_{x∈X} { x ∉ ∪_{S∈C} S } ] ≥ 1 − Σ_{x∈X} Pr[ x ∉ ∪_{S∈C} S ] ≥ 1 − n · 1/n² = 1 − 1/n.
Step 2: The expected approximation ratio.
By the previous lemma, the expected cost of one iteration is Σ_{S∈F} c(S) · y(S).
Linearity ⇒ E[ c(C) ] ≤ 2 ln(n) · Σ_{S∈F} c(S) · y(S) ≤ 2 ln(n) · c(C*).

By Markov's inequality, Pr[ c(C) ≤ 4 ln(n) · c(C*) ] ≥ 1/2.
Hence with probability at least 1 − 1/n − 1/2 > 1/3, the solution is a valid cover within a factor of 4 ln(n) of the optimum. (This probability could be further increased by repeating.)

Typical Approach for Designing Approximation Algorithms based on LPs

VII. Randomisation and Rounding Weighted Set Cover 26 VIII. Approximation Algorithms: MAX-CUT Problem Thomas Sauerwald

Easter 2015 Outline

Simple Algorithms for MAX-CUT

A Solution based on Semidefinite Programming

Summary

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 2 Max-Cut

MAX-CUT Problem
Given: an undirected graph G = (V, E)
Goal: Find a subset S ⊆ V such that |E(S, V \ S)| is maximized.

Weighted MAX-CUT: every edge e ∈ E has a non-negative weight w(e); maximize the weight of the edges crossing the cut, i.e. maximize w(S) := Σ_{{u,v}∈E(S, V\S)} w({u, v}).

[Figure: example graph on vertices a, b, c, d, e, g, h with edge weights; for S = {a, b, g} the cut has weight w(S) = 18.]

Applications: cluster analysis, VLSI design

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 3 Random Sampling

Ex 35.4-3 Suppose that for each vertex v, we randomly and independently place v in S with probability 1/2 and in V \ S with probability 1/2. Then this algorithm is a randomized 2-approximation algorithm.

We could employ the same derandomisation used for MAX-3-CNF. Proof: We express the expected weight of the random cut (S, V \ S) as:

E[ w(S, V \ S) ] = E[ Σ_{{u,v}∈E(S,V\S)} w({u, v}) ]
  = Σ_{{u,v}∈E} Pr[ {u ∈ S, v ∈ V \ S} ∪ {u ∈ V \ S, v ∈ S} ] · w({u, v})
  = Σ_{{u,v}∈E} (1/4 + 1/4) · w({u, v})
  = (1/2) · Σ_{{u,v}∈E} w({u, v}) ≥ (1/2) · w*.
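The random sampling algorithm is only a few lines of code. A minimal Python sketch (not from the lecture; it assumes weighted_edges is a list of triples (u, v, w)):

import random

def random_cut(vertices, weighted_edges):
    """Place every vertex in S independently with probability 1/2 and return
    the cut together with its weight (a randomised 2-approximation in expectation)."""
    S = {v for v in vertices if random.random() < 0.5}
    return S, sum(w for u, v, w in weighted_edges if (u in S) != (v in S))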

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 4 Local Search

Local Search: Switch side of a vertex if it increases the cut.

LOCAL SEARCH(G, w)
 1: Let S be an arbitrary subset of V
 2: do
 3:     flag = 0
 4:     if ∃u ∈ S with w(S \ {u}, (V \ S) ∪ {u}) > w(S, V \ S) then
 5:         S = S \ {u}
 6:         flag = 1
 7:     end if
 8:     if ∃u ∈ V \ S with w(S ∪ {u}, (V \ S) \ {u}) > w(S, V \ S) then
 9:         S = S ∪ {u}
10:         flag = 1
11:     end if
12: while flag = 1
13: return S
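A direct implementation flips a vertex whenever its weight towards its own side exceeds its weight across the cut. A minimal Python sketch (not from the lecture; vertices are assumed to be 0, ..., n−1 and weighted_edges a list of triples):

def local_search_max_cut(n, weighted_edges):
    """Move a vertex to the other side as long as this strictly increases the cut."""
    w = {}
    for u, v, wt in weighted_edges:
        w[(u, v)] = w[(v, u)] = wt
    S = set()
    improved = True
    while improved:
        improved = False
        for u in range(n):
            # gain of flipping u = (weight to its own side) - (weight across the cut)
            same = sum(wt for (a, b), wt in w.items()
                       if a == u and ((b in S) == (u in S)))
            cross = sum(wt for (a, b), wt in w.items()
                        if a == u and ((b in S) != (u in S)))
            if same > cross:
                S ^= {u}          # toggle u's side
                improved = True
    return S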

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 5 b a

e g

d

Illustration of Local Search

[Figure: a graph on the vertices a, b, c, d, e, f, g, h, i; the set S is built up step by step.]

Initially Cut = 0.
Step 1: Move a into S, Cut = 5.
Step 2: Move g into S, Cut = 8.
Step 3: Move d into S, Cut = 10.
Step 4: Move b into S, Cut = 11.
Step 5: Move a into V \ S, Cut = 12 (local search terminates).
A better solution can be found: Cut = 13.

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 6 Analysis of Local Search (1/2) Theorem The cut returned by LOCAL-SEARCH satisfies W ≥ (1/2)W ∗.

Proof: At the time of termination, for every vertex u ∈ S:
Σ_{v∈V\S, v∼u} w({u, v}) ≥ Σ_{v∈S, v∼u} w({u, v}),   (1)
and similarly, for any vertex u ∈ V \ S:
Σ_{v∈S, v∼u} w({u, v}) ≥ Σ_{v∈V\S, v∼u} w({u, v}).   (2)
Adding up equation (1) for all vertices in S and equation (2) for all vertices in V \ S gives
w(S) ≥ 2 · Σ_{u,v∈S, u∼v} w({u, v})   and   w(S) ≥ 2 · Σ_{u,v∈V\S, u∼v} w({u, v}).
Adding up these two inequalities and dividing by 2 yields
w(S) ≥ Σ_{u,v∈S, u∼v} w({u, v}) + Σ_{u,v∈V\S, u∼v} w({u, v}).
Every edge appears on one of the two sides or crosses the cut, so adding w(S) to both sides gives 2 · w(S) ≥ Σ_{{u,v}∈E} w({u, v}) ≥ W*.

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 7 Analysis of Local Search (2/2)

Theorem The cut returned by LOCAL-SEARCH satisfies W ≥ (1/2)W ∗.

What is the running time of LOCAL-SEARCH?

Unweighted Graphs: the cut increases by at least one in each iteration ⇒ at most n² iterations.
Weighted Graphs: could take exponential time in n (not obvious...).

VIII. MAX-CUT Problem Simple Algorithms for MAX-CUT 8 Outline

Simple Algorithms for MAX-CUT

A Solution based on Semidefinite Programming

Summary

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 9 Max-Cut Problem

High-Level-Approach: 1. Describe the Max-Cut Problem as a quadratic optimisation problem 2. Solve a corresponding semidefinite program that is a relaxation of the original problem 3. Recover an approximation for the original problem from the approximation for the semidefinite program

Label the vertices by 1, 2, ..., n and express the weight function etc. as an n × n matrix.

Quadratic program
maximize   (1/2) · Σ_{(i,j)∈E} w_{i,j} · (1 − y_i y_j)
subject to y_i ∈ {−1, +1},   i = 1, ..., n.

This models the MAX-CUT problem: S = {i ∈ V : y_i = +1}, V \ S = {i ∈ V : y_i = −1}.

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 10 Relaxation

Quadratic program 1 X maximize wi,j · (1 − yi yj ) 2 (i,j)∈E

subject to yi ∈ {−1, +1}, i = 1,..., n.

Any solution of the original program can be recovered by setting vi = (yi , 0, 0,..., 0)!

Vector Programming Relaxation
maximize   (1/2) · Σ_{(i,j)∈E} w_{i,j} · (1 − v_i · v_j)
subject to v_i · v_i = 1,   v_i ∈ Rⁿ,   i = 1, ..., n.

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 11 Positive Definite Matrices

Definition
A matrix A ∈ R^{n×n} is positive semidefinite iff y^T · A · y ≥ 0 for all y ∈ Rⁿ.

Remark
1. A is symmetric and positive semidefinite iff there exists an n × n matrix B with B^T · B = A.
2. If A is symmetric and positive semidefinite, then the matrix B above can be computed in polynomial time (using the Cholesky decomposition).

Examples:
A = (4 −1; 1 2) · (4 1; −1 2) = (17 2; 2 5) is of the form B^T · B with B = (4 1; −1 2), so A is symmetric positive semidefinite.
A = (1 2; 2 1) is not positive semidefinite, since (1, −1) · A · (1, −1)^T = −2.
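Recovering B from A is exactly what a Cholesky factorisation does. A minimal Python/numpy sketch (not from the lecture; it uses an arbitrary small positive definite matrix as an example):

import numpy as np

# numpy's Cholesky factorisation returns a lower-triangular L with L L^T = A
# (it requires A to be positive definite), so B = L^T satisfies B^T B = A.
A = np.array([[17.0, 2.0], [2.0, 5.0]])
L = np.linalg.cholesky(A)
B = L.T
print(np.allclose(B.T @ B, A))   # True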

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 12 Reformulating the Quadratic Program as a Semidefinite Program Vector Programming Relaxation 1 X maximize wi,j · (1 − vi vj ) 2 (i,j)∈E

subject to vi · vi = 1 i = 1,..., n. n vi ∈ R

Reformulation:
Introduce n² variables a_{i,j} = v_i · v_j, which give rise to a matrix A.
If V is the matrix whose columns are the vectors v_1, v_2, ..., v_n, then A = V^T · V is symmetric and positive semidefinite.

Semidefinite Program (solve this, which can be done in polynomial time, and recover V using the Cholesky decomposition)
maximize   (1/2) · Σ_{(i,j)∈E} w_{i,j} · (1 − a_{i,j})
subject to A = (a_{i,j}) is symmetric and positive semidefinite,
           a_{i,i} = 1 for all i = 1, ..., n.

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 13 Rounding the Vector Program

Vector Programming Relaxation
maximize   (1/2) · Σ_{(i,j)∈E} w_{i,j} · (1 − v_i · v_j)
subject to v_i · v_i = 1,   v_i ∈ Rⁿ,   i = 1, ..., n.

Rounding by a random hyperplane :

1. Pick a random vector r = (r1, r2,..., rn) by drawing each component from N (0, 1)

2. Put i ∈ S if v_i · r ≥ 0 and i ∈ V \ S otherwise

Lemma 1
The probability that two vectors v_i, v_j ∈ Rⁿ are separated by the (random) hyperplane given by r equals arccos(v_i · v_j) / π.
(Follows by projecting onto the plane spanned by v_i and v_j.)
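The rounding step itself is tiny. A minimal Python/numpy sketch (not from the lecture; it assumes V is an n × n array whose rows are the unit vectors v_i recovered from the semidefinite program):

import numpy as np

def hyperplane_rounding(V):
    """Goemans-Williamson rounding: a random Gaussian vector r defines the
    hyperplane, and vertex i joins S iff v_i lies on the non-negative side."""
    n = V.shape[0]
    r = np.random.normal(size=n)
    return {i for i in range(n) if V[i] @ r >= 0}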

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 14 Illustration of the Hyperplane

[Figure: unit vectors v1, ..., v5 on the sphere; the random vector r defines a hyperplane whose two sides determine S and V \ S.]

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 15 A second (technical) Lemma

Lemma 2
For any x ∈ [−1, 1],  (1/π) · arccos(x) ≥ 0.878 · (1/2) · (1 − x).

[Figure: plots of (1/π) · arccos(x) and (1/2) · (1 − x) on [−1, 1], and of their ratio f(x), which stays above 0.878.]

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 16 Putting Everything Together

Theorem (Goemans, Willamson’96)

The algorithm has an approximation ratio of 1/0.878 ≈ 1.139.

Proof: Define an indicator variable
X_{i,j} = 1 if (i, j) ∈ E are on different sides of the hyperplane, and 0 otherwise.
Hence for the (random) weight of the computed cut,
E[ w(S) ] = E[ Σ_{{i,j}∈E} w_{i,j} · X_{i,j} ] = Σ_{{i,j}∈E} w_{i,j} · E[ X_{i,j} ]
  = Σ_{{i,j}∈E} w_{i,j} · Pr[ {i, j} ∈ E is in the cut ]
  = Σ_{{i,j}∈E} w_{i,j} · (1/π) · arccos(v_i · v_j)                        (by Lemma 1)
  ≥ 0.878 · (1/2) · Σ_{{i,j}∈E} w_{i,j} · (1 − v_i · v_j) = 0.878 · z* ≥ 0.878 · W*.   (by Lemma 2)

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 17 MAX-CUT: Concluding Remarks

Theorem (Goemans, Willamson’96) There is a randomised polynomial-time 1.139-approximation algorithm for MAX-CUT.

can be derandomized Similar approach can be applied to MAX-3-CNF (with some effort) and yields an approximation ratio of 1.345

Theorem (Håstad’97)
Unless P=NP, there is no ρ-approximation algorithm for MAX-CUT with ρ ≤ 17/16 = 1.0625.

Theorem (Khot, Kindler, Mossel, O’Donnell’04) Assuming the so-called Unique Games Conjecture holds, unless P=NP there is no ρ-approximation algorithm for MAX-CUT with

ρ ≤ max_{−1≤x≤1} [ (1/2) · (1 − x) ] / [ (1/π) · arccos(x) ] ≤ 1.139.

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 18 Other Approximation Algorithms for MAX-CUT

Theorem (Mathieu, Schudy’08)
For any ε > 0, there is a randomised algorithm with running time O(n²) · 2^{O(1/ε²)} so that the expected value of the output deviates from the maximum cut value by at most O(ε · n²). This is an additive approximation!

Algorithm (1):
1. Take a sample S of x = O(1/ε²) vertices chosen uniformly at random
2. For each of the 2^x possible cuts of S, go through the vertices in V \ S in random order and place each on the side of the cut which maximizes the crossing edges
3. Output the best cut found

Theorem (Trevisan’08) There is a randomised 1.833-approximation algorithm for MAX-CUT which runs in O(n2 · polylog(n)) time.

Exploits relation between the smallest eigenvalue and the structure of the graph.

VIII. MAX-CUT Problem A Solution based on Semidefinite Programming 19 Outline

Simple Algorithms for MAX-CUT

A Solution based on Semidefinite Programming

Summary

VIII. MAX-CUT Problem Summary 20 Spectrum of Approximations

MAX-CLIQUE

SET-COVER

VERTEX-COVER, MAX-3-CNF, MAX-CUT

SCHEDULING

KNAPSACK SUBSET-SUM

FPTAS PTAS APX log-APX poly-APX

VIII. MAX-CUT Problem Summary 21