

CSE 548: (Design and) Analysis of Algorithms
Greedy Algorithms
R. Sekar

Overview

Greedy is one of the strategies used to solve optimization problems: multiple solutions exist, and we want to pick one of low (or least) cost.

Greedy strategy: make a locally optimal choice, or simply, what appears best at the moment.

Often, local optimality does not imply global optimality, so use greedy with a great deal of care: you always need to prove optimality. If it is this unpredictable, why use it? Because it simplifies the task!


Making change

Given coins of denominations 25¢, 10¢, 5¢, and 1¢, make change for x cents (0 < x < 100) using the minimum number of coins.

Greedy solution (a runnable sketch appears at the end of this section):

    makeChange(x)
      if (x = 0) return
      Let y be the largest denomination that satisfies y ≤ x
      Issue ⌊x/y⌋ coins of denomination y
      makeChange(x mod y)

Show that it is optimal. Is it optimal for arbitrary denominations?

When does a greedy work?

Greedy choice property: the greedy (i.e., locally optimal) choice is always consistent with some (globally) optimal solution. What does this mean for the coin change problem?

Optimal substructure: the optimal solution contains optimal solutions to subproblems. This implies that a greedy algorithm can invoke itself recursively after making a greedy choice.
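Here is a minimal Python sketch of makeChange above (iterative rather than recursive; the function name is mine):

    def make_change(x, denominations=(25, 10, 5, 1)):
        """Greedy change-making: repeatedly issue the largest coin that fits."""
        coins = []
        for y in sorted(denominations, reverse=True):
            count, x = divmod(x, y)      # issue floor(x/y) coins of denomination y
            coins.extend([y] * count)
        return coins

    print(make_change(67))   # [25, 25, 10, 5, 1, 1]: 6 coins
    # Greedy can fail for arbitrary denominations, e.g. {1, 3, 4} and x = 6:
    # greedy gives 4+1+1 (3 coins) but 3+3 (2 coins) is optimal.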


Greedy algorithms

A game like chess can be won only by thinking ahead: a player who is focused entirely on immediate advantage is easy to defeat. But in many other games, such as Scrabble, it is possible to do quite well by simply making whichever move seems best at the moment and not worrying too much about future consequences. This sort of myopic behavior is easy and convenient, making it an attractive algorithmic strategy. Greedy algorithms build up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. Although such an approach can be disastrous for some computational tasks, there are many for which it is optimal. Our first example is that of minimum spanning trees.

Fractional Knapsack

A sack can hold a maximum of x lbs. You have a choice of items you can pack in the sack. Maximize the combined "value" of the items in the sack.

    item      calories/lb  weight
    bread     1100         5
    butter    3300         1
    tomato    80           1
    cucumber  55           2

0-1 knapsack: take all of an item or none at all.
Fractional knapsack: fractional quantities acceptable.

Greedy choice: pick the item that maximizes calories/lb. Will a greedy algorithm work, with x = 5? (A sketch of this greedy follows below.)

Does greedy work here?

Greedy choice property: proof by contradiction. Start with the assumption that there is an optimal solution that does not include the greedy choice, and show a contradiction.

Optimal substructure: after taking as much as possible of the item with the jth-maximal value/weight, suppose the knapsack can hold y more lbs. Then the optimal solution for the problem includes the optimal choice of how to fill a knapsack of size y with the remaining items.

This greedy does not work for the 0-1 knapsack because the greedy choice property does not hold. The 0-1 knapsack problem is NP-hard, but a pseudo-polynomial algorithm is available.
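Here is a minimal Python sketch of the fractional-knapsack greedy just described (the slides give no code; names and structure are mine):

    def fractional_knapsack(items, capacity):
        """items: list of (value_per_lb, weight). Returns total value packed."""
        total = 0.0
        # Greedy choice: consider items in decreasing value density.
        for density, weight in sorted(items, reverse=True):
            take = min(weight, capacity)   # fractional quantities are allowed
            total += density * take
            capacity -= take
            if capacity == 0:
                break
        return total

    items = [(1100, 5), (3300, 1), (80, 1), (55, 2)]   # the slide's table
    print(fractional_knapsack(items, capacity=5))      # all of butter, then 4 lbs of bread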

Spanning Tree

A subgraph of a graph G = (V, E) that includes: all the vertices V in the graph, and a subset of E such that these edges form a tree. We consider connected undirected graphs, where the second condition can be replaced by any of:

- a maximal subset of E such that the subgraph has no cycles;
- a subset of E with |V| − 1 edges such that the subgraph is connected;
- a subset of E such that there is a unique path between any two vertices in the subgraph.

Minimal Spanning Tree (MST)

A spanning tree with minimal cost. Formally:
Input: an undirected graph G = (V, E), a cost function w : E → R.
Output: a tree T = (V, E′) such that E′ ⊆ E that minimizes Σ_{e∈E′} w(e).

Suppose you are asked to network a collection of computers by linking selected pairs of them. This translates into a graph problem in which nodes are computers, undirected edges are potential links, and the goal is to pick enough of these edges that the nodes are connected. But this is not all; each link also has a maintenance cost, reflected in that edge's weight. What is the cheapest possible network?

[Figure: an example graph on vertices A-F with edge weights.]

One immediate observation is that the optimal set of edges cannot contain a cycle, because removing an edge from this cycle would reduce the cost without compromising connectivity:

Property 1. Removing a cycle edge cannot disconnect a graph.

So the solution must be connected and acyclic: undirected graphs of this kind are called trees. The particular tree we want is the one with minimum total weight, known as the minimum spanning tree. Here is its formal definition.

Input: An undirected graph G = (V, E); edge weights w_e.

Output: A tree T = (V, E′), with E′ ⊆ E, that minimizes

    weight(T) = Σ_{e∈E′} w_e.

In the preceding example, the minimum spanning tree has a cost of 16.

However, this is not the only optimal solution. Can you spot another?

5.1.1 A greedy approach

Kruskal's minimum spanning tree algorithm starts with the empty graph and then selects edges from E according to the following rule:

    Repeatedly add the next lightest edge that doesn't produce a cycle.

In other words, it constructs the tree edge by edge and, apart from taking care to avoid cycles, simply picks whichever edge is cheapest at the moment. This is a greedy algorithm: every decision it makes is the one with the most obvious immediate advantage.

Figure 5.1 shows an example. We start with an empty graph and then attempt to add edges in increasing order of weight (ties are broken arbitrarily):

    B-C, C-D, B-D, C-F, D-F, E-F, A-D, A-B, C-E, A-C.

The first two succeed, but the third, B-D, would produce a cycle if added. So we ignore it and move along. The final result is a tree with cost 14, the minimum possible; Kruskal's adds the edges B-C, C-D, C-F, A-D, E-F.

The correctness of Kruskal's method follows from a certain cut property, which is general enough to also justify a whole slew of other minimum spanning tree algorithms.

[Figure 5.1: the minimum spanning tree found by Kruskal's algorithm on the example graph; the tree edges have total cost 14.]

Kruskal's algorithm

    MST(V, E, w)
      X = φ
      Q = priorityQueue(E)   // from min to max weight
      while Q is nonempty
        e = deleteMin(Q)
        if e connects two disconnected components in (V, X)
          X = X ∪ {e}

Kruskal's: Correctness (by induction)

Induction hypothesis: the first i edges selected by Kruskal's algorithm are included in some minimal spanning tree T.

Base case: trivial; the empty set of edges is contained in any MST.

Induction step: show that the (i+1)th edge chosen by Kruskal's is in the MST T from the induction hypothesis, i.e., prove the greedy choice property.

Let e = (v, w) be the edge chosen at the (i+1)th step of Kruskal's. T is a spanning tree, so it must include a unique path from v to w. At least one edge e′ on this path is not in X, the set of edges chosen in the first i steps by Kruskal's. (Otherwise, v and w would already be connected in X and so e wouldn't be chosen by Kruskal's.) Since neither e nor e′ is in X, and Kruskal's chose e, we have w(e′) ≥ w(e). Replace e′ by e in T to get another spanning tree T′. Either

- w(T′) < w(T), a contradiction to the assumption that T is minimal; or
- w(T′) = w(T), and we have another MST T′ consistent with X ∪ {e}.

In both cases, we have completed the induction step.

Kruskal's: Runtime complexity

    MST(V, E, w)
      X = φ
      Q = priorityQueue(E, w)   // from min to max weight
      while Q is nonempty
        e = deleteMin(Q)
        if e connects two disconnected components in (V, X)
          X = X ∪ {e}

Priority queue: O(log |E|) = O(log |V|) per operation.
Connectivity test: O(log |V|) per check using a disjoint-set structure.
Thus, for |E| iterations, we have a runtime of O(|E| log |V|). (A runnable version follows below.)

MST: Applications

Network design: communication networks, transportation networks, electrical grids, oil/water pipelines, ...
Clustering: an application of minimum spanning forests (stop when |X| = |V| − k to get k clusters).
Broadcasting: spanning tree protocol in Ethernets.
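Below is a compact Python version of the MST pseudocode above, with a union-find (disjoint-set) structure implementing the connectivity test. The implementation is a sketch of mine; the edge weights are hypothetical, chosen only to be consistent with the edge order and cost-14 result quoted from the text:

    import heapq

    class DisjointSet:
        """Union-find with path compression, for the O(log|V|) connectivity test."""
        def __init__(self, vertices):
            self.parent = {v: v for v in vertices}
        def find(self, v):
            while self.parent[v] != v:
                self.parent[v] = self.parent[self.parent[v]]  # path compression
                v = self.parent[v]
            return v
        def union(self, u, v):
            ru, rv = self.find(u), self.find(v)
            if ru == rv:
                return False      # already connected: adding (u, v) would create a cycle
            self.parent[ru] = rv
            return True

    def kruskal(vertices, edges):
        """edges: list of (weight, u, v). Returns the list of MST edges."""
        ds, tree = DisjointSet(vertices), []
        heap = list(edges)
        heapq.heapify(heap)                   # priorityQueue(E, w)
        while heap:
            w, u, v = heapq.heappop(heap)     # deleteMin
            if ds.union(u, v):                # connects two disconnected components
                tree.append((u, v, w))
        return tree

    # Hypothetical weights matching the narrated run (tree cost 14):
    edges = [(1, 'B', 'C'), (2, 'C', 'D'), (3, 'B', 'D'), (3, 'C', 'F'),
             (4, 'D', 'F'), (4, 'E', 'F'), (4, 'A', 'D'), (5, 'A', 'B'),
             (5, 'C', 'E'), (6, 'A', 'C')]
    print(kruskal('ABCDEF', edges))   # B-C, C-D, C-F, A-D, E-F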

Shortest Paths

Input: a directed graph G = (V, E), a cost function l : E → R assigning non-negative costs, source and destination vertices s and t.
Output: the shortest cost path from s to t in G.
Note: single-source shortest paths (find shortest paths from s to every vertex) can be solved using the same algorithm, with the same complexity!
The algorithm constructs a spanning tree called the shortest path tree (SPT).
Applications: routing protocols (OSPF, BGP, RIP, ...), map routing (flights, cars, mass transit), ...

Dijkstra's Algorithm: Outline

Base case: start with explored = {s}.
Inductive step:
  Optimal substructure: we have already computed the shortest path to all vertices in explored.
  Greedy choice: extend explored with a vertex v that can be reached using one edge e from some u ∈ explored such that dist(u) + l(e) is minimized.
Finish: when explored = V.

Greedy approach. Maintain a set of explored nodes S for which the algorithm has determined the shortest-path distance d(u) from s to u. Initialize S = {s}. Repeatedly choose an unexplored node v which minimizes

    π(v) = min over edges e = (u, v) with u ∈ S of d(u) + l(e),

i.e., the shortest path to some node u in the explored part, followed by a single edge (u, v); add v to S, and set d(v) = π(v).

[Figure: the explored set S containing s and u, with edge e = (u, v) extending a shortest path to the unexplored node v.]

Dijkstra's: High-level intuition

The blue-colored region represents explored, i.e., the vertices whose shortest paths we have already computed. In each iteration, we extend explored to include the vertex v that is the closest to any vertex in explored.

[Figure 4.9: a complete run of Dijkstra's algorithm, with node A as the starting point, showing the dist values at each step and the final shortest-path tree.]

Dijkstra's Algorithm

    ShortestPathTree(V, E, l, s)
      for v in V do
        dist(v) = ∞, prev(v) = nil
      dist(s) = 0
      H = priorityQueue(V, dist)
      while H is nonempty
        v = deleteMin(H)   // Note: explored = V − H
        for ⟨v, w⟩ ∈ E do
          if dist(w) > dist(v) + l(⟨v, w⟩)
            dist(w) = dist(v) + l(⟨v, w⟩)
            prev(w) = v
            decreaseKey(H, w)
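Below is a runnable Python rendering of ShortestPathTree. Python's heapq module has no decreaseKey, so this sketch uses the standard lazy-deletion workaround (push a fresh entry and skip stale ones); the graph encoding and example weights are my own:

    import heapq

    def shortest_path_tree(graph, s):
        """graph: dict mapping v -> list of (w, length). Returns (dist, prev)."""
        dist = {v: float('inf') for v in graph}
        prev = {v: None for v in graph}
        dist[s] = 0
        heap = [(0, s)]                    # H = priorityQueue(V, dist)
        explored = set()
        while heap:
            d, v = heapq.heappop(heap)     # deleteMin
            if v in explored:
                continue                   # stale entry: stands in for decreaseKey
            explored.add(v)
            for w, length in graph[v]:
                if dist[w] > dist[v] + length:
                    dist[w] = dist[v] + length
                    prev[w] = v
                    heapq.heappush(heap, (dist[w], w))
        return dist, prev

    # Hypothetical directed graph with non-negative edge costs:
    graph = {'A': [('B', 4), ('C', 2)], 'B': [('D', 2), ('E', 3)],
             'C': [('B', 1), ('D', 4), ('E', 5)], 'D': [], 'E': [('D', 1)]}
    print(shortest_path_tree(graph, 'A'))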


Dijkstra's Algorithm: Correctness

Let V_i = V − H, and E_i = {prev(v) | v ∈ V_i}. Note that T_i = (V_i, E_i).

Base case: start with explored = ∅, so the hypothesis holds vacuously.

Induction hypothesis: the tree T_i constructed so far (after i steps of Dijkstra's) is a subtree of an SPT T (optimal substructure).

Induction step: by contradiction, similar to the MST proof.

Dijkstra's Algorithm: Correctness (2)

Note that the v ∈ H chosen to be added to explored has the lowest dist in H. This means its dist must have been updated previously, and it must have prev(v) set to some u ∈ explored. Note that T_{i+1} = (V_i ∪ {v}, E_i ∪ {(u, v)}). We need to show (u, v) ∈ T.

Since T is a tree, it must have a unique path P from s to v. P must have an edge (u′, v′) with u′ ∈ V_i and v′ ∈ H that bridges V_i and H. If v′ = v and u′ = u, we are done. Otherwise:

- If v′ ≠ v, then note that dist(v′) ≥ dist(v) (by how v was selected), and hence the so-called shortest path in T to v is longer than that in T_{i+1}, a contradiction. (Assuming l(x, y) > 0 for all x, y ∈ V.)
- If u′ ≠ u, then there is still a contradiction if dist(u′) + l(u′, v) > dist(u) + l(u, v). Otherwise, the two sides must be equal, in which case we can obtain another SPT T′ from T by replacing (u′, v) with (u, v).

This completes the induction step, as we have constructed an SPT consistent with T_{i+1}.

Dijkstra's Algorithm: Runtime

    while H is nonempty
      v = deleteMin(H)
      for ⟨v, w⟩ ∈ E do
        if dist(w) > dist(v) + l(⟨v, w⟩)
          dist(w) = dist(v) + l(⟨v, w⟩)
          prev(w) = v
          decreaseKey(H, w)

O(|V|) iterations of deleteMin: O(|V| log |V|).
The inner loop executes O(|E|) times in total, and each iteration (a decreaseKey) takes O(log |V|) time.
So, the total time is O((|E| + |V|) log |V|).

Information Theory and Coding

Information content: for an event e that occurs with probability p, its information content is given by I(e) = − log p.

This is a "surprise factor": a low-probability event conveys more information; an event that is almost always likely (p ≈ 1) conveys no information.

Information content adds up: for two events e1 and e2, their combined information content is −(log p1 + log p2).

Information theory: Entropy

Information entropy: for a discrete random variable X that can take a value x_i with probability p_i, its entropy is defined as the expectation ("weighted average") over the information content of the x_i:

    H(X) = E[I(X)] = − Σ_{i=1}^{n} p_i log p_i

Optimal code length

Shannon's source coding theorem: a random variable X denoting characters in an alphabet Σ = {x_1, ..., x_n}
- cannot be encoded in fewer than H(X) bits;
- can be encoded using at most H(X) + 1 bits.

The first part of this theorem sets a lower bound, regardless of how clever the encoding is.

Entropy is a measure of uncertainty. It plays a fundamental role in many areas, including coding theory and machine learning.

Surprisingly simple proof for such a fundamental theorem! (See Wikipedia.)

Huffman coding: an algorithm that achieves this bound.
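As a quick check of these definitions, a few lines of Python computing I(e) and H(X); the distribution is the {A, B, C, D} example used in the next section:

    from math import log2

    def information_content(p):
        return -log2(p)                        # I(e) = -log p, in bits

    def entropy(ps):
        return -sum(p * log2(p) for p in ps)   # H(X) = -sum p_i log p_i

    ps = [0.55, 0.02, 0.15, 0.28]
    print(information_content(0.02))  # rare event B: about 5.6 bits of "surprise"
    print(entropy(ps))                # about 1.51 bits: the coding lower bound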


Variable-length encoding

Let Σ = {A, B, C, D} with probabilities 0.55, 0.02, 0.15, 0.28.

If we use a fixed-length code, each character will use 2 bits. Alternatively, use a variable-length code: let us use as many bits as the information content of a character, so A uses 1 bit, B uses 6 bits, C uses 3 bits, and D uses 2 bits. The average coding size is

    0.55 · 1 + 0.02 · 6 + 0.15 · 3 + 0.28 · 2 = 1.68 bits,

an average saving of about 15% over the fixed-length code.

Lower bound (entropy):

    −(0.55 log2 0.55 + 0.02 log2 0.02 + 0.15 log2 0.15 + 0.28 log2 0.28) ≈ 1.51 bits.

Now let us try fixing the codes, not just their lengths: A = 0, D = 11, C = 101, B = 100. (Figure 5.10 shows this prefix-free encoding as a tree, with frequencies in square brackets.) Note that it is enough to assign 3 bits to B, not 6, so the average coding size reduces to 1.62 bits.

    Symbol  Codeword
    A       0
    B       100
    C       101
    D       11

Prefix encoding: no code is a prefix of another. This property is necessary to enable decoding. Every such encoding can be represented using a full binary tree (either 0 or 2 children for every node).
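To see why the prefix-free property makes decoding unambiguous, here is a small Python sketch (names mine) that decodes a bit string by accumulating bits until they match a codeword:

    code = {'A': '0', 'B': '100', 'C': '101', 'D': '11'}
    decode_table = {v: k for k, v in code.items()}

    def decode(bits):
        out, buf = [], ''
        for b in bits:
            buf += b
            if buf in decode_table:             # no codeword is a prefix of another,
                out.append(decode_table[buf])   # so the first match is the only match
                buf = ''
        return ''.join(out)

    print(decode('0' + '100' + '101' + '11' + '0'))   # 'ABCDA'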

In general, how do we find the optimal coding tree, given the frequencies f_1, f_2, ..., f_n of n symbols? To make the problem precise, we want a tree whose leaves each correspond to a symbol and which minimizes the overall length of the encoding,

    cost of tree = Σ_{i=1}^{n} f_i · (depth of ith symbol in tree)

(the number of bits required for a symbol is exactly its depth in the tree).

There is another way to write this cost function that is very helpful. Although we are only given frequencies for the leaves, we can define the frequency of any internal node to be the sum of the frequencies of its descendant leaves; this is, after all, the number of times the internal node is visited during encoding or decoding. During the encoding process, each time we move down the tree, one bit gets output for every nonroot node through which we pass. So the total cost, i.e., the total number of bits which are output, can also be expressed thus:

The cost of a tree is the sum of the frequencies of all leaves and internal nodes, except the root.

The first formulation of the cost function tells us that the two symbols with the smallest frequencies must be at the bottom of the optimal tree, as children of the lowest internal node (this internal node has two children since the tree is full). Otherwise, swapping these two symbols with whatever is lowest in the tree would improve the encoding.

This suggests that we start constructing the tree greedily: find the two symbols with the smallest frequencies, say i and j, and make them children of a new node, which then has frequency f_i + f_j. To keep the notation simple, let's just assume these are f_1 and f_2. By the second formulation of the cost function, any tree in which f_1 and f_2 are sibling-leaves has cost f_1 + f_2 plus the cost for a tree with n − 1 leaves of frequencies (f_1 + f_2), f_3, f_4, ..., f_n.
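As a sanity check on the two formulations, here is a small Python computation on the Figure 5.10 tree; the leaf frequencies (70, 3, 20, 37) are as best I can read them from the figure, so treat the numbers as illustrative:

    # Figure 5.10's tree: leaves A:70, B:3, C:20, D:37, with depths 1, 3, 3, 2.
    freq  = {'A': 70, 'B': 3, 'C': 20, 'D': 37}
    depth = {'A': 1, 'B': 3, 'C': 3, 'D': 2}

    first = sum(freq[s] * depth[s] for s in freq)    # sum of f_i * depth_i
    internal = [3 + 20, (3 + 20) + 37]               # non-root internal nodes
    second = sum(freq.values()) + sum(internal)      # all nodes except the root

    print(first, second)   # 213 213: both formulations agree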

Huffman encoding

Build the prefix tree bottom-up:
- Start with a node whose children are the codewords c1 and c2 that occur least often.
- Remove c1 and c2 from the alphabet, and replace them with a new codeword c that occurs with frequency f1 + f2.

[Figure: merging the two least-frequent leaves f1 and f2 under a new internal node of frequency f1 + f2.]

- Recurse.

The latter problem is just a smaller version of the one we started with. So we pull f_1 and f_2 off the list of frequencies, insert (f_1 + f_2), and loop. The resulting algorithm can be described in terms of priority queue operations and takes O(n log n) time if a binary heap is used.

    procedure Huffman(f)
    Input: An array f[1...n] of frequencies
    Output: An encoding tree with n leaves
      let H be a priority queue of integers, ordered by f
      for i = 1 to n: insert(H, i)
      for k = n+1 to 2n − 1:
        i = deletemin(H), j = deletemin(H)
        create a node numbered k with children i, j
        f[k] = f[i] + f[j]
        insert(H, k)

How to make this algorithm fast? What is its complexity? (A runnable version appears at the end of this section.)

Huffman encoding: Example

Consider Lee Sallows' self-descriptive sentence: "This sentence contains three a's, three c's, two d's, twenty-six e's, five f's, three g's, eight h's, thirteen i's, two l's, sixteen n's, nine o's, six r's, twenty-seven s's, twenty-two t's, two u's, five v's, eight w's, four x's, five y's, and only one z."

After 19 merges, all 20 characters have been merged together. The record of merges gives us our code tree. The algorithm makes a number of arbitrary choices; as a result, there are actually several different Huffman codes. In one such code, the code for A is 110000 and the code for S is 00. If we use this code, the encoded message starts like this:

    1001 0100 1101 00 00 111 011 1001 111 011 110001 111 110001 10001 011 1001 110000 1101
    T    H    I    S  S  E   N   T    E   N   C      E   C      O     N   T    A      I    ...

This uses about 650 bits, versus about 850 for a fixed-length (5-bit) code. Here is the list of frequencies and depths for each character in the example message, along with that character's contribution to the total length of the encoded message:

    char.  A  C  D  E  F  G  H  I  L  N  O  R  S  T  U  V  W  X  Y  Z
    freq.  3  3  2  26 5  3  8  13 2  16 9  6  27 22 2  5  8  4  5  1
    depth  6  6  7  3  5  6  4  4  7  3  4  4  2  4  7  5  4  6  5  7
    total  18 18 14 78 25 18 32 52 14 48 36 24 54 88 14 25 32 24 25 7

Altogether, the encoded message is 646 bits long. Different Huffman codes would assign different codes, possibly with different lengths, to various characters, but the overall length of the encoded message is the same for any Huffman code: 646 bits. (Images from Jeff Erickson's "Algorithms".)

Huffman Coding: Applications

Document compression. Signal encoding. As part of other compression algorithms (MP3, gzip, PKZIP, JPEG, ...).

Huffman encoding: Optimality

Given the simple structure of Huffman's algorithm, it's rather surprising that it produces an optimal prefix-free binary code: encoding Lee Sallows' sentence using any prefix-free code requires at least 646 bits! Fortunately, the recursive structure makes this claim easy to prove using an exchange argument. Returning to our toy example: can you tell if the tree of Figure 5.10 is optimal?

Lemma 5. Let x and y be the two least frequent characters (breaking ties between equally frequent characters arbitrarily). There is an optimal code tree in which x and y are siblings. (The proof actually establishes a stronger statement: there is an optimal code in which x and y are siblings and have the largest depth of any leaf.)

Crux of the proof: the greedy choice property, via the familiar exchange argument. Suppose the optimal prefix tree does not use the longest path for the two least frequent codewords c1 and c2. Show that by exchanging c1 with the codeword using the longest path in the optimal tree, you can reduce the cost of the "optimal code", a contradiction.

The same argument holds for c2.
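Here is a runnable Python version of procedure Huffman(f) above, using a binary heap as the priority queue; the tuple-based tree representation and tie-breaking counter are my own additions:

    import heapq
    from collections import Counter

    def huffman_code(freq):
        """freq: dict symbol -> frequency. Returns dict symbol -> codeword."""
        # Heap entries: (frequency, tiebreak, tree); a tree is a symbol or a (left, right) pair.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees...
            f2, _, t2 = heapq.heappop(heap)
            count += 1
            heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # ...merged under a new node
        code = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + '0')
                walk(tree[1], prefix + '1')
            else:
                code[tree] = prefix
        walk(heap[0][2], '')
        return code

    text = "this sentence contains three as three cs two ds"  # any sample text
    print(sorted(huffman_code(Counter(c for c in text if c.isalpha())).items()))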


Lossless Compression

How much compression can we get using Huffman? It depends on what we mean by a codeword!

If codewords are English characters, the effect is relatively small; if they are English words, or better, sentences, then much higher compression is possible. To use words or sentences as codewords, we probably need to construct a document-specific codebook. Larger alphabet sizes imply larger codebooks! We need to consider the combined size of the codebook plus the encoded document. Can the codebook be constructed on the fly? This leads to the Lempel-Ziv compression algorithms (gzip).

gzip Algorithm [Lempel-Ziv 1977]

Key idea: use the preceding W bytes as the codebook (a "sliding window", up to 32KB in gzip).

Encoding:
- Strings previously seen in the window are replaced by the pair (offset, length).
- Need to find the longest match for the current string.
- Matches should have a minimum length, or else they are emitted as literals.
- Encode offset and length using Huffman encoding.

Decoding: interpret (offset, length) using the same window of W bytes of preceding text. (Much faster than encoding.)
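Here is a toy Python sketch of the sliding-window idea: emit literals, and replace repeats with (offset, length) back-references. Real gzip/DEFLATE uses hash-chain match search and Huffman-codes the output; none of that is shown, and all names are mine:

    def lz77_encode(data, window=32 * 1024, min_match=3):
        """Returns a list of tokens: ('lit', byte) or ('ref', offset, length)."""
        out, i = [], 0
        while i < len(data):
            best_len, best_off = 0, 0
            start = max(0, i - window)          # only look back W bytes
            for j in range(start, i):           # naive longest-match search
                length = 0
                while (i + length < len(data)
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - j
            if best_len >= min_match:
                out.append(('ref', best_off, best_len))
                i += best_len
            else:
                out.append(('lit', data[i]))    # too short: emit a literal
                i += 1
        return out

    print(lz77_encode(b'abcabcabcabd'))
    # [('lit', 97), ('lit', 98), ('lit', 99), ('ref', 3, 8), ('lit', 100)]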


Greedy Algorithms: Summary

Greedy is one of the strategies used to solve optimization problems. Frequently, locally optimal choices are NOT globally optimal, so use greedy with a great deal of care: you always need to prove optimality. The proof typically relies on the greedy choice property, usually established by an "exchange" argument, and on optimal substructure.

Examples:
- MST and clustering
- Shortest paths
- Huffman encoding
