
CMPT 307 Notes on Greedy Algorithms (Should be read in conjunction with the lecture notes and the text.)

1 Introduction

A greedy algorithm builds a solution to a problem in steps. At each iteration, it adds a part of the solution. Which part to add next is determined by a greedy rule: among all the possibilities, choose the best one. The algorithm never backtracks or changes past choices. Depending on the problem, the greedy strategy may or may not work. However, there are many important problems that can be solved using the greedy strategy. We consider the following problems.

1. Coin Changing
2. Huffman Codes (Chapter 16.3)
3. Minimum-cost Spanning Trees (Chapter 23)
4. Single-Source Shortest Paths (Chapter 24)

A greedy algorithm proceeds step by step. Initially, the selected part of the solution is empty. At each step some candidate is added to the partial solution already obtained. This addition is guided by a selection function which uses a greedy rule. If, after the addition, the extended partial solution is no longer feasible, the candidate just added is removed, and it is never considered again. If the extended partial solution is still feasible, the added candidate stays. The process is repeated until a solution to the original problem is obtained. The generic function Greedy can be described as follows:

    function Greedy(C: set): set
      { C is the set of all candidates for the solution }
      S ← ∅                         { S stores the partial solution }
      while not solution(S) and C ≠ ∅ do
        x ← an element (or elements) of C chosen using the greedy rule select(x)
        remove x from C             { in some cases new elements get added to C as well }
        if feasible(S ∪ {x}) then S ← S ∪ {x}
      if solution(S) then return S
      else return “There is no solution”
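For readers who prefer running code, here is the same template as a Python sketch. The callback names mirror the pseudocode; this skeleton is our own rendering, not from the text, and it uses sets, so it fits problems whose candidates are distinct items.

    def greedy(candidates, solution, feasible, select):
        """Generic greedy skeleton; the callbacks supply the problem-specific parts."""
        C = set(candidates)
        S = set()
        while not solution(S) and C:
            x = select(C)            # greedy rule: pick the best remaining candidate
            C.remove(x)              # x is never considered again
            if feasible(S | {x}):
                S = S | {x}
        return S if solution(S) else None   # None plays the role of "no solution"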

2 Coin Changing

We are given an unlimited number of coins of denominations 1, 5, 10, 25. We want to give change to a customer using the smallest possible number of coins. The following greedy strategy is applied:

• candidate set: an unlimited set of coins of denominations 1, 5, 10, 25.
• solution: the total value of the chosen set of coins is exactly the amount we have to pay.
• feasible set: the total value of the chosen set does not exceed the amount to be paid.
• selection function (greedy rule): choose the highest-denomination coin whose value does not exceed the balance of the change.
• objective function: the number of coins used in the solution.
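To make the rule concrete, here is a minimal Python sketch of this strategy (the function name and interface are ours, not from the text):

    def greedy_change(amount, denominations=(25, 10, 5, 1)):
        """Make change with the greedy rule: always take the largest coin that fits."""
        coins = []
        for d in sorted(denominations, reverse=True):
            while amount >= d:
                coins.append(d)          # take one coin of denomination d
                amount -= d
        if amount != 0:                  # cannot happen while 1 is a denomination
            raise ValueError("no exact change possible")
        return coins

    print(greedy_change(67))             # [25, 25, 10, 5, 1, 1] -- six coins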

Theorem 1: The greedy algorithm is optimal for denominations 1, 5, 10, 25.

Proof: We use induction to prove that, to make change for an amount A, the greedy output and an optimal solution use the same number of coins. The cases A = 1, 2, 3, . . . , 24 can be readily verified. Suppose the greedy algorithm gives an optimal solution for every amount k < A, and suppose A ≥ 25. Let Opt be an optimal solution, and let b25, b10, b5 and b1 denote the respective numbers of coins of denominations 25, 10, 5 and 1 in Opt. We claim that Opt must use a coin of denomination 25, and we prove this by contradiction. Suppose that b25 = 0. Clearly b10 ≤ 2, b5 ≤ 1, and b1 ≤ 4; otherwise it is possible to improve the optimal solution (three 10s can be replaced by 25 + 5, two 5s by a 10, and five 1s by a 5). It is also not possible to have the combination b10 = 2 and b5 = 1; otherwise we could replace the two 10-denomination coins and the 5-denomination coin by a single 25-denomination coin. Therefore, the possible combinations for the 10 and 5 denomination coins are: b10 = 2, b5 = 0; b10 = 1, b5 = 1; b10 = 1, b5 = 0; b10 = 0, b5 = 1; and b10 = 0, b5 = 0. In all these cases, 1·b1 + 5·b5 + 10·b10 is at most 24, which is less than 25 ≤ A. Therefore, none of the above combinations can make change for A ≥ 25. Hence the assumption b25 = 0 is false. Since Opt uses a 25-denomination coin and the greedy algorithm also takes one, the greedy strategy gives optimal change for A − 25 by the induction hypothesis. Therefore, adding a coin of denomination 25 to the greedy output for A − 25 yields the greedy output for A. Thus, the greedy solution and the optimal solution use the same number of coins. □

Note: The above strategy does not work if there also exists a coin of denomination 12: for A = 29 the greedy algorithm outputs 25 + 1 + 1 + 1 + 1 (five coins), while 12 + 12 + 5 uses only three. We now describe a dynamic programming approach that solves the coin change problem for any list of k coins (d1, d2, . . . , dk) with d1 = 1 and di < di+1 for all i.

Problem: Given a list of k coins (d1, d2, . . . , dk) and a number n, we want to find nonnegative integers (b_{d1}, b_{d2}, . . . , b_{dk}) such that n = Σ_{i=1}^{k} d_i·b_{d_i} and Σ_{i=1}^{k} b_{d_i} is minimal.

Our subproblems consist of the optimal change for the amounts 1 through n. To keep track of the optimal solution for each subproblem we use an array bSum indexed by subproblem (i.e. bSum[i] contains the least number of coins needed to make change for i). The recurrence relation for bSum[i] can be described as follows:

bSum[d1] = bSum[d2] = · · · = bSum[dk] = 1
bSum[i] = min_{1 ≤ j ≤ k} { bSum[i − dj] + 1 }

In the above, we ignore the case i − dj < 0. The top-down and bottom-up algorithms should be easy to write. (Make sure that you know this.) There are O(n) subproblems, and each subproblem takes O(k) time to solve. Therefore, the dynamic programming solution of the coin change problem for any set of denominations runs in O(nk) time. The storage space requirement is O(n).
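Here is a bottom-up sketch in Python. The array name bSum follows the notes; the extra convention bSum[0] = 0 (our assumption) makes the base cases bSum[dj] = 1 fall out of the recurrence automatically:

    import math

    def min_coins(n, denoms):
        """bSum[i] = least number of coins that make change for i (bottom up)."""
        bSum = [0] + [math.inf] * n      # bSum[0] = 0: zero coins for amount 0
        for i in range(1, n + 1):
            for d in denoms:
                if d <= i:               # ignore the case i - d < 0
                    bSum[i] = min(bSum[i], bSum[i - d] + 1)
        return bSum[n]

    # Greedy fails for amount 29 when a 12-coin exists; the DP finds 12+12+5.
    print(min_coins(29, [1, 5, 10, 12, 25]))    # 3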

2.1 Problems

1. Show that the greedy strategy for the coin set (1, 5, 10, 25) is optimal for any amount less than 25. (In the induction proof we mentioned this is easy to verify.)

2. Determine whether greedy strategy is optimal for each of the following coin sets. If it is optimal give an argument to support your answer. If greedy strategy is not optimal, give a counterexample.

(a) (1,4,10) (b) (1,5,10,25,50) (c) (1,5,14,18)

3. Show that the greedy strategy is optimal for the coin set (d1, d2, . . . , dk), d1 < d2 < . . . < dk, and di−1 divides di, for i = 2, . . . , k.

3 Huffman Code

We are given an alphabet (a set of characters, e.g. English characters) and a string made up of these characters. The objective is to find binary codes for the characters that minimize the total length of the encoded string. The general strategy is to encode frequently used characters with short binary codes and use longer binary codes for infrequently appearing characters. Suppose we use the following encoding: B = 110, C = 010, D = 010110. When the string “010110” is decoded, we cannot say whether the string is “CB” or “D”. Therefore, the additional requirement on the encoding is that no code can be a prefix of another code. In the above, the code for C is a prefix of the code for D. The encoding could be of fixed length or of variable length. Consider the following table:

character              a     b     c     d     e     f
frequency             45    13    12    16     9     5
probability          .45   .13   .12   .16   .09   .05
variable-length code   0   101   100   111  1101  1100

For a string of length n, the expected length of the encoding is n·(1·.45 + 3·.13 + 3·.12 + 3·.16 + 4·.09 + 4·.05) = 2.24n bits.

Huffman’s algorithm. The idea is to build a binary tree T, with the leaves storing the characters, such that

Σ_{c ∈ alphabet} Pr(c) · d_T(c)

is minimum, where Pr(c) denotes the probability of the character c appearing in the text and d_T(c) denotes the depth of the leaf node storing c in T. The algorithm can be informally described as follows:

• The algorithm starts with a forest of |C| trees, each consisting of a single node labelled with a character and a weight (= character’s probability).

• While there is more than one tree in the forest do:

– Choose two trees, X and Y, with least weight.
– Construct a new tree Z with X and Y as its left and right children, respectively. Make the weight of Z the sum of the weights of X and Y.
– Delete X and Y from the forest; add Z.

• (Building of the tree is complete) Label the left links with “0” and the right links with “1”.

• Encode each character by concatenating the labels of the links on the path from the root to the leaf storing the character.
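Here is a compact Python sketch of the forest-merging loop, using heapq for EXTRACT-MIN. An insertion counter breaks ties between equal weights, so the codes produced may differ from the table above, but the expected length is the same. The names and representation are ours:

    import heapq

    def huffman_codes(freq):
        """freq maps character -> weight; returns character -> binary code string."""
        # Each heap entry is (weight, tiebreak, tree); a tree is a character
        # (leaf) or a pair (left, right) of trees (internal node).
        heap = [(w, i, c) for i, (c, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            w1, _, x = heapq.heappop(heap)      # two trees X, Y of least weight
            w2, _, y = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, counter, (x, y)))   # new tree Z
            counter += 1
        codes = {}
        def walk(tree, code):
            if isinstance(tree, tuple):         # left links "0", right links "1"
                walk(tree[0], code + "0")
                walk(tree[1], code + "1")
            else:
                codes[tree] = code or "0"       # lone-character alphabet edge case
        walk(heap[0][2], "")
        return codes

    freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    codes = huffman_codes(freq)
    total = sum(freq.values())
    print(sum(freq[c] * len(codes[c]) for c in freq) / total)   # 2.24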

It is easy to see that the algorithm produces a prefix code; this follows from the fact that the characters are all stored at the leaf nodes. Decoding is done by following the path dictated by the given code. The algorithm to construct T is formally described in the text (HUFFMAN(C), page 388). A priority queue is needed to implement EXTRACT-MIN(Q).

The correctness of the above greedy algorithm follows from the lemma below.

Lemma 1: Let C be an alphabet in which every character c ∈ C has a weight w[c]. Let x and y be two characters in C having the smallest weights. Then there exists an optimal prefix code of C in which the codewords for x and y have the same length and differ only in the last bit.

(The proof is given in the text, Lemma 16.2, page 388. We give a sketch below; our notation is slightly different. Please refer to Figure 16.6 on page 390 of the text: a and b are sibling leaves of maximum depth in an optimal tree T, the tree T′ is obtained from T by exchanging a with x, and T″ is obtained from T′ by exchanging b with y.) Let A(T) denote the expected length of the encoding of a character of the alphabet C using the prefix code given by T:

A(T) = Σ_{c ∈ C} (length of the encoding of c) × (probability of c in the text).

Now

A(T′) = A(T) − w(a)·d_T(a) − w(x)·d_T(x) + w(a)·d_{T′}(a) + w(x)·d_{T′}(x).

Since d_T(x) = d_{T′}(a) and d_T(a) = d_{T′}(x),

A(T′) = A(T) − (w(a) − w(x))·(d_T(a) − d_T(x)).

Therefore A(T) − A(T′) ≥ 0, since w(a) ≥ w(x) and d_T(a) ≥ d_T(x). Similarly, we can show that A(T′) − A(T″) ≥ 0. Therefore A(T) − A(T″) ≥ 0, so T″ also gives an optimal prefix code, and in T″ the leaves x and y are siblings. □

3.1 Problems

1. 16.3-1, 16.3-2, 16.3-7.

2. Prof. B needs to store text made up of the characters A with frequency 6, B with frequency 2, C with frequency 3, D with frequency 2, and E with frequency 8. Prof. B suggests using the variable-length codes A: “1”, B: “00”, C: “01”, D: “10”, E: “0”, which, he argues, store the text in less space than that used by an optimal Huffman code. Is the professor correct? Explain.

4 Minimum-cost spanning tree (MST)

Let G = (V, E) be a connected undirected graph with a cost associated with each edge. A spanning tree of G is an undirected tree that connects all the vertices in V. The cost of a spanning tree is the sum of the costs of its edges. Our objective is to find a spanning tree of G with minimum cost. The following two lemmas are important for any algorithm that computes an MST of G.

Lemma 1: Let G = (V, E) be a connected undirected graph and GT = (V, T ) a spanning tree of G. Then

1. ∀v1, v2 ∈ V , the path between v1 and v2 in GT is unique, and

2. if any edge in E − T is added to GT , a unique cycle results.

Proof: The first part of the lemma is easy: GT is a tree, and more than one path between a pair of vertices would result in a cycle. For the second part, if e = (v1, v2) ∈ E − T is added, there exist two paths between v1 and v2, resulting in a cycle. □

Lemma 2: Let G = (V, E) be a connected, undirected, weighted graph. Let A be a subset of E that is included in some minimum-cost spanning tree of G. Let (S, V − S) be any cut (or partition) of G that respects A (i.e. both endpoints of each edge in A lie in S, or both lie in V − S). Let e = (u, v) be a lowest-cost edge of E with one endpoint in S and the other in V − S. Then A ∪ {(u, v)} is included in some minimum-cost spanning tree of G.

Proof: Let T be a minimum-cost spanning tree that includes the edges of A. Suppose, to the contrary, that GT = (V, T) does not include the lowest-cost edge e = (u, v) between S and V − S. By the previous lemma, the addition of e = (u, v) to GT forms a cycle. See Figure 23.3 of the text (page 565). The cycle must contain an edge e′ = (x, y) ∈ T crossing S and V − S. Clearly, e′ is not an element of A. The weight of (x, y), w((x, y)), is greater than or equal to w((u, v)) (why?). Consider the graph G_{T′} = (V, T′), where T′ = T − {e′} ∪ {e}. G_{T′} is a tree, and

w(G_{T′}) = w(G_T) − w(e′) + w(e) ≤ w(G_T), since w(e′) ≥ w(e).

Since T is a minimum-cost spanning tree, this forces w(T′) = w(T). Therefore, T′ is also a minimum-cost spanning tree, and it contains the edges of A ∪ {(u, v)}. □

4.1 Kruskal’s Algorithm

Let G = (V, E) be a connected weighted graph. There are two standard ways to represent a graph: as a collection of adjacency lists or as an adjacency matrix. The adjacency-list representation is usually preferred if the graph is sparse (|E| is much less than |V|²). The adjacency-matrix representation may be preferred when the graph is dense (|E| is close to |V|²). (Read Section 22.1 of the text for more details.)

The set T of edges is initially empty. In each iteration of the algorithm an edge is selected and added to T. At every instant the partial graph formed by the vertices of G and the edges in T is a forest (several connected components). When T is empty, each vertex of G forms a distinct trivial connected component. The elements of T that belong to a connected component form a minimum-cost spanning tree for the vertices of that component. At the end of the algorithm only one connected component remains, so T is an MST of G. We consider the edges of G in order of increasing cost. If the selected edge joins two nodes in different connected components, we add the edge to T; the two components are then combined into one. If the selected edge joins two nodes in the same component, it is not added to T; otherwise, a cycle would be created.

The algorithm MST-KRUSKAL is described in the text (page 569). The description uses the functions MAKE-SET, FIND-SET and UNION. MAKE-SET(v) creates a component containing only the node v; FIND-SET(v) finds the component that contains v; UNION(u, v) merges the two components, one containing u and the other containing v. These operations require disjoint-set structures, which are discussed in Chapter 21 (we have not covered it). I will describe a data structure in class which allows an efficient implementation of the three operations: each FIND-SET operation costs O(log n) time, and each UNION operation takes O(1) time.

Analysis: How long does Kruskal’s algorithm take? Let n = |V| and m = |E|. Since the graph is connected, we may assume m ≥ n − 1. Observe that it takes O(m log m) time to sort the edges. The for loop is iterated m times, and each iteration involves a constant number of accesses to the union-find data structure on a collection of n items. Therefore Kruskal’s algorithm can be implemented in O((m + n) log n) time. Although this does not change the worst-case analysis, it is preferable to keep the edges in a min-heap; building the heap takes O(m) time. Edges are then extracted in sorted order one at a time, and as soon as the number of components is reduced to one, the algorithm stops.
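Here is a Python sketch of the whole procedure, with a simple union-find (union by rank and path compression, in the spirit of Chapter 21). All names here are ours:

    def kruskal(n, edges):
        """Vertices are 0..n-1; edges is a list of (cost, u, v). Returns T."""
        parent = list(range(n))          # MAKE-SET for every vertex
        rank = [0] * n

        def find(x):                     # FIND-SET with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        def union(x, y):                 # UNION by rank
            rx, ry = find(x), find(y)
            if rank[rx] < rank[ry]:
                rx, ry = ry, rx
            parent[ry] = rx
            if rank[rx] == rank[ry]:
                rank[rx] += 1

        T = []
        for cost, u, v in sorted(edges):        # edges in order of increasing cost
            if find(u) != find(v):              # endpoints in different components
                union(u, v)
                T.append((cost, u, v))
                if len(T) == n - 1:             # one component left: stop early
                    break
        return T

    print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
    # [(1, 0, 1), (2, 1, 2), (4, 2, 3)] -- the edge (3, 0, 2) would close a cycle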

4.2 Problems

1. What can you say about the time required by Kruskal’s algorithm if, instead of providing a list of edges, the user supplies a matrix of distances, leaving to the algorithm the job of working out which edges exist?

2. Let G be a connected, weighted graph, and let v be a vertex of G. Suppose that the weights of the edges incident on v are distinct. Let e be the edge of minimum weight incident on v. Show that e is contained in every minimum-cost spanning tree of G.

3. 23.2-1, 23.2-2, 23.2-8.

5 Prim’s algorithm

Prim’s algorithm is another greedy algorithm. Unlike Kruskal’s algorithm, whose partial solutions are not necessarily connected, a partial solution of Prim’s algorithm is a tree. Prim’s algorithm begins with a start vertex s. Consider the partition (S, V − S) where S = {s}. We then find the lowest-cost edge with one endpoint in S (in the current tree) and the other endpoint in V − S. This edge is added to T, and S is updated to contain all the vertices touched by the edges in T. The process is repeated until T has |V| − 1 edges (i.e. S = V). The edges in T then determine a minimum-cost spanning tree. The correctness follows from Lemma 2, described earlier. Page 572 of the text formally describes Prim’s algorithm.

The key questions in an efficient implementation of Prim’s algorithm are how to update the partition (or cut) efficiently, and how to determine the minimum-cost cross edge quickly. To do this we make use of a priority queue data structure. The priority queue supports three operations.

insert(u,key): Insert u with the key value key in Q.

extractMin(): Extract the item with the minimum key value in Q.

decreaseKey(u, newKey): Decrease the value of u’s key to newKey.

A heap data structure can be used for our purpose. All three operations can be performed in O(log n) time, where n is the number of items in the heap. What do we store in the priority queue? Initially, we might think that we should store the edges that cross the cut, since that is what we select at each step of the algorithm. The problem is that when a vertex, say v, is moved from one side of the cut to the other side, this results in a complicated sequence of updates: we would need to delete from the priority queue all edges between v and S (they are no longer cross edges) and insert all edges between v and V − S (they have just become cross edges). There is a much more elegant solution, and that is what makes Prim’s algorithm so nice. With each vertex u ∈ V − S (not yet part of the current spanning tree) we associate a key value key[u], which is the weight of the lightest edge going from u to any vertex in S. We also store in π[u] the endpoint of this edge in S. If there is no edge from u to a vertex in S, we set key[u] = +∞. We will also need to know which vertices are in S and which are not. We do this by coloring the vertices in S black, which can be implemented using a color array.

MST-PRIM-Modified(G, w, r)
  for each u ∈ V do             { initialization }
    key[u] ← +∞
    color[u] ← white
  key[r] ← 0                    { start at the root }
  π[r] ← nil
  Q ← new PriQueue(V)           { put all the vertices in Q }
  while Q.nonEmpty() do         { until all the vertices are processed }
    u ← Q.extractMin()          { vertex with the lightest cross edge }
    for each v ∈ Adj[u] do
      if color[v] = white and w(u, v) < key[v] then
        key[v] ← w(u, v)        { new lighter edge out of v }
        Q.decreaseKey(v, key[v])
        π[v] ← u
    color[u] ← black
  { The array π defines the MST as an inverted tree rooted at r }
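Here is a Python sketch of MST-PRIM-Modified. Python’s heapq has no decreaseKey, so the sketch pushes the improved key and skips stale entries on extraction (lazy deletion), which gives the same O((n + m) log n) bound; the names are ours:

    import heapq

    def prim(adj, r):
        """adj maps u -> list of (v, w) pairs. Returns the predecessor array pi."""
        key = {u: float("inf") for u in adj}
        pi = {u: None for u in adj}
        black = set()                       # vertices already in the tree
        key[r] = 0
        Q = [(0, r)]                        # (key, vertex); duplicates allowed
        while Q:
            _, u = heapq.heappop(Q)
            if u in black:                  # stale entry: key was decreased later
                continue
            black.add(u)
            for v, w in adj[u]:
                if v not in black and w < key[v]:
                    key[v] = w              # new lighter edge crossing the cut
                    pi[v] = u
                    heapq.heappush(Q, (w, v))    # stands in for decreaseKey
        return pi

    adj = {0: [(1, 4), (2, 1)], 1: [(0, 4), (2, 2)], 2: [(0, 1), (1, 2)]}
    print(prim(adj, 0))                     # {0: None, 1: 2, 2: 0}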

To analyze Prim’s algorithm, we account for the time spent on each vertex as it is extracted from the priority queue. Here we use n = |V| and m = |E|. It takes O(log n) time to extract a vertex u from Q. For each incident edge, we spend potentially O(log n) time decreasing the key of the adjacent vertex. Thus the time per vertex u is O((1 + degree of u) · log n). Since the total degree in G is 2m, the total running time is O((n + m) log n).

5.1 Problems

1. What happens (a) in the case of Prim’s algorithm, (b) in the case of Kruskal’s algorithm, if we allow edges with negative lengths? Is it still sensible to talk about minimal spanning trees if edges with negative lengths are allowed?

2. A graph may have several different minimal spanning trees. Where is this possibility reflected in the algorithms?

3. Are there graphs where Prim’s algorithm is slower than Kruskal’s algorithm?

4. 23-1 (page 575)

5. Is the path between a pair of vertices in a min-cost spanning tree of an undirected graph necessarily a min-cost path? Is this true if the min-cost spanning tree is unique?

6. Suppose Prim’s algorithm and Kruskal’s algorithm choose the lexicographically first edge first. That is, if a choice must be made between two distinct edges, e1 = (u1, v1) and e2 = (u2, v2), then the following strategy is used. Suppose the vertices are numbered 1, 2, . . . , n. Choose e1 if u1 < u2, or u1 = u2 and v1 < v2. Otherwise choose e2. Prove that under these conditions, both algorithms construct the same minimum-cost spanning tree.

7. Design an algorithm for the max-cost spanning tree.

6 Single-Source Shortest Paths

Problem: Given a weighted graph G = (V, E) (directed or undirected), find shortest paths from a single source vertex s to all other vertices of G. As in the all-pairs shortest path problem, we allow edges with negative weights, but negative cycles in G are not allowed. Let δ(u, v) denote the shortest-path weight from u to v. Then

δ(u, v) = min { w(P) : P is a path from u to v },  or ∞ if there is no such path,

where the weight of a path P, w(P), is the sum of the weights of the edges on the path P. There are many variants of the shortest path problem:

• Single-destination shortest path problem This is just the reverse of the single source shortest path problem.

• Single-pair shortest path problem

• All-pairs shortest path problem

All shortest path problems have the following optimal substructure property: if a shortest path from u to v goes via the vertex z, then δ(u, v) = δ(u, z) + δ(z, v). Dijkstra’s algorithm is a greedy algorithm. (Remember the generic Greedy algorithm.) The sets C and S are, respectively, the set of available candidate nodes (still to be selected) and the set of nodes already chosen. At every moment S contains those nodes whose shortest-path distances from the source s are already known. At the outset S contains only s, and C = V − S. At each step of the algorithm we choose the node in C whose distance from the source is least, and we add it to S.

Shortest Path Tree

• A shortest path tree from source s is a subgraph G′ = (V′, E′) such that:

  1. V′ is the set of vertices reachable from s.
  2. G′ is a tree rooted at s.
  3. The unique path from s to a vertex v in G′ is a shortest path in G.

• The shortest path tree may be given by predecessors π[v]. The shortest path from s to v can be constructed from the predecessor array (the path is given in reverse order) as ⟨v, π[v], π[π[v]], π[π[π[v]]], . . . , s⟩.

We say that a path from s to some node is special if all the intermediate nodes along the path belong to S. At each step of the algorithm, an array d contains the length of the shortest special path to each node of the graph. The initialization step is:

Initialize-Single-Source(G, s)
  for each v ∈ V do
    d[v] ← ∞
    π[v] ← Nil
  d[s] ← 0

Dijkstra’s algorithm is given in the text (page 595). We describe below a version that avoids using RELAX.

DIJKSTRA(G, s)
 1. Initialize-Single-Source(G, s)
 2. C ← V
 3. S ← ∅                          { greedy loop follows }
 4. while S ≠ V do
 5.   u ← a vertex in C with minimum d[u]
 6.   C ← C − {u}
 7.   S ← S ∪ {u}
 8.   for each v ∈ C with (u, v) ∈ E do    { cross edges of the type (u, ∗) }
 9.     d[v] ← min(d[v], d[u] + w(u, v))
10.     π[v] ← u if d[v] changed its value in line 9 (i.e. the predecessor vertex changed)
11. return d and π
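A Python sketch of the algorithm follows; as in the Prim sketch, pushing updated distances and skipping stale heap entries stands in for selecting the minimum in line 5. A small helper (ours) also rebuilds a path from π as described earlier; note that vertices unreachable from s keep d = ∞:

    import heapq

    def dijkstra(adj, s):
        """adj maps u -> list of (v, w) with w >= 0. Returns (d, pi)."""
        d = {u: float("inf") for u in adj}
        pi = {u: None for u in adj}
        d[s] = 0
        done = set()                        # the set S of finished vertices
        Q = [(0, s)]
        while Q:
            _, u = heapq.heappop(Q)
            if u in done:                   # stale entry
                continue
            done.add(u)
            for v, w in adj[u]:
                if d[u] + w < d[v]:         # line 9: d[v] = min(d[v], d[u] + w)
                    d[v] = d[u] + w
                    pi[v] = u
                    heapq.heappush(Q, (d[v], v))
        return d, pi

    def path(pi, s, v):
        """Rebuild <v, pi[v], ..., s> and reverse it; None if v is unreachable."""
        p = []
        while v is not None:
            p.append(v)
            v = pi[v]
        return p[::-1] if p[-1] == s else None

    adj = {"s": [("a", 2), ("b", 5)], "a": [("b", 1)], "b": []}
    d, pi = dijkstra(adj, "s")
    print(d["b"], path(pi, "s", "b"))       # 3 ['s', 'a', 'b']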

Analysis of the algorithm: Suppose G has n vertices and m edges. The initialization step takes O(n) time. If d and π are stored in arrays, each iteration of the while loop takes O(n) time to select u. Since the edges incident on u are looked at only once (when u moves from V − S to S), Dijkstra’s algorithm can be implemented in a straightforward way in O(n²) time. This implementation is efficient if the graph is dense (i.e. m ∈ Θ(n²)). If m ∈ o(n²) (the graph is sparse), we can maintain the elements of d in a heap. Each extraction of the minimum value of the heap and each decrease of the value of an element of the heap can be performed in O(log n) time. In this way Dijkstra’s algorithm can be implemented in O((m + n) log n) time. If the graph is connected then m ≥ n − 1, and in this case Dijkstra’s algorithm is implementable in O(m log n) time.

What happens when some vertices are not reachable from the source vertex? In this case the shortest path cost from s to a non-reachable vertex is ∞. The algorithm formally described above indicates that the while loop is executed until S = V, even when some of the vertices are not reachable from s. What happens to the algorithm then?

Correctness of Dijkstra’s algorithm: Dijkstra’s algorithm is correct; that is, it finds shortest paths from a source vertex to all of the other vertices.

Proof: We use induction to show that at each iteration of Dijkstra’s algorithm, when u is removed from V − S, d[u] is the length of a shortest path from s to u. When we begin, u = s, d[u] = 0, and the length of a shortest path from s to s is zero, so the basis step is correct. We now assume that u has just moved from V − S to S and that for all vertices i previously moved to S, d[i] is the length of a shortest path from s to i, i.e. d[i] = δ(s, i).

First we show that if there exists a vertex z with a path from s of length less than d[u], then z was previously moved to S. Suppose, by way of contradiction, that z is still in V − S. Consider a shortest path P from s to z. Let z′ be the first vertex of V − S encountered on P as one moves from s to z. (The figure is drawn in class.) Let z″ be the predecessor of z′ on P. Then z″ is not in V − S, so d[z″] = δ(s, z″) by the induction hypothesis. Therefore,

d[z′] ≤ d[z″] + w(z″, z′) ≤ w(P) < d[u].

This shows that u is not the vertex of V − S with minimum d value (why?). This contradiction completes the proof that, if there is a path from s to a vertex z whose length is less than d[u], then z is already in S (i.e. already chosen).

The above claim establishes the fact that any path from s to a vertex in V − S has length at least d[u]. By construction, there is a path from s to u of cost d[u], so this is a shortest path from s to u. This completes the proof. □

6.1 Problems

1. Show, by giving an example, that if the edge lengths are allowed to be negative, Dijkstra’s algorithm does not always work correctly. I am assuming that there is no negative cycle in the graph.

2. Show that the single-source shortest paths constructed by Dijkstra’s algorithm on a connected undirected graph form a spanning tree.

3. Is the spanning tree formed by Dijkstra’s algorithm on a connected undirected graph a min-cost spanning tree?

4. Suppose we want to solve the single-source longest path problem. Can we modify Dijkstra’s algorithm to solve this problem by changing minimum to maximum? If so, then prove your algorithm is correct. If not, provide a counterexample.

5. Problems 24.3-2, 24.3-4, 24.3-8 of the text (page 601).
