Greedy Algorithms
Introduction: Analysis of Algorithms

In this lecture we begin the actual "analysis of algorithms" by examining greedy algorithms, which are considered among the easiest algorithms to describe and implement. Before doing so, we describe the different aspects of an algorithm that require analysis.

Correctness It must be established that the algorithm performs as advertised. In other words, for each legal input, called a problem instance, the algorithm produces the desired output.

Complexity Bounds must be provided on either the amount of time required to execute the algorithm, or the amount of space (e.g., memory) that is consumed by the algorithm, as a function of the size of an input instance.

Implementation Appropriate data structures must be identified that allow the algorithm to achieve a desired time or space complexity.

The correctness of an algorithm is sometimes immediate from its description, while the correctness of others may require clever mathematical proofs. As for complexity, in this course we are primarily concerned with the big-O growth of the worst-case running time of an algorithm. Of course, the running time T will be a function of the size of an input to the algorithm, and will have a big-O growth that is directly proportional to the number of algorithm steps required to process the given input. Here we define the size of a problem instance as the minimum number of bits needed to represent the instance. The convention is to represent the size of an instance using one or more size parameters that increase in proportion to the problem size. For example, if the size of an input is represented using the single size parameter n, then T(n) will represent the worst-case running time of the algorithm. By "worst case" we mean that T(n) represents the running time of the size-n input that requires the most time to run.

Example 1. Consider an algorithm that sorts an array of integers. Suppose each integer can be represented using k bits. Then the size of a problem instance is equal to k times the number of integers to be sorted. Let n be a parameter that represents the number of integers in the array. Then the problem size is equal to nk, and the running time function can be written as T(n, k). Now, if the algorithm is independent of the number of bits used to represent each integer, then we may drop the k and simply let n denote the problem size. In this case T is written as T(n). For example, the Quicksort algorithm makes use of the swap operation, which is independent of the number of bits representing each integer (assuming that each integer is stored in a single register or memory word). Thus T(n) seems more appropriate than T(n, k). On the other hand, Radix Sort works directly on the bits of each integer. In this case T(n, k) seems more appropriate.

Example 2. Recall that a graph is a pair of sets G = (V, E), where V is the vertex set, and E is the edge set whose members have the form (u, v), where u, v ∈ V. A graph algorithm takes as input a graph, and computes some property of the graph. In this case the worst-case running time is expressed as T(m, n), using size parameters n = |V|, called the order of G, and m = |E|, called the size of G. Notice that size parameters for representing a single vertex or edge are not included, since the algorithm steps are usually independent of the nature of each vertex and edge. In other words, regardless of whether each vertex is an integer or an Employee data structure, the algorithm steps, and hence the big-O growth of the running time, remain the same, as the sketch below illustrates.
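To make the point of Example 2 concrete, here is a minimal Python sketch (our own illustration; the function name and edge-list format are not from the lecture) of a routine that runs in time O(m). The code, and hence the growth of its running time, is identical whether the vertices are integers or richer objects.

```python
from collections import defaultdict

def degree_counts(edges):
    """Count the degree of each vertex of an undirected graph,
    given as a list of (u, v) edge pairs.

    The loop does a constant amount of work per edge, so the
    running time is O(m) with m = len(edges), independent of
    how each vertex is represented -- only hashability is used.
    """
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# The same code runs unchanged on integer vertices ...
print(degree_counts([(1, 2), (2, 3), (1, 3)]))        # {1: 2, 2: 2, 3: 2}
# ... and on string (or Employee-like) vertices.
print(degree_counts([("ann", "bob"), ("bob", "cal")]))
```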
Example 3. Consider an algorithm that takes as input a positive integer n, and determines whether or not n is prime. Since a positive integer n can be represented using ⌊log n⌋ + 1 bits, we use the size parameter m = log n to represent the problem size. Thus, the worst-case running time is written as T(m) or, equivalently, T(log n).

Note that we use the growth terminology from the big-O lecture to describe the running time of an algorithm. For example, if an algorithm has running time T(n) = O(n), then the algorithm is said to have a linear running time. In general we may write

Ω(f(n)) ≤ T(n) ≤ O(g(n)),

where f(n) represents the best-case running time, and g(n) denotes the worst case. In case f(n) = Θ(g(n)), we may write T(n) = Θ(g(n)), meaning that the algorithm always requires on the order of g(n) steps, regardless of which size-n input is given. Another time complexity measure of interest is the average-case running time T_ave, which is obtained by taking the average of the running times over all inputs of a given size.

Finally, the running time of an algorithm is dependent on its implementation. For example, a graph algorithm may have running time T(m, n) = O(m^2) using one implementation, and T(m, n) = O(m log n) using another. For this reason complexity analysis is inseparable from implementation analysis, and it often requires the use of both basic and advanced data structures for achieving a desired running time. On the other hand, correctness analysis is usually independent of implementation.

Greedy Algorithms

A greedy algorithm is characterized by the following two properties:

1. the algorithm works in stages, and during each stage a choice is made that is locally optimal;

2. the sum totality of all the locally optimal choices produces a globally optimal solution.

Note that if a greedy procedure does not always lead to a globally optimal solution, then we will refer to it as a heuristic, or a greedy heuristic. Heuristics often provide a "shortcut" to a solution, but not necessarily to an optimal solution. Henceforth, we will use the term "algorithm" for a method that always produces a correct/optimal solution, and "heuristic" to describe a procedure that may not always produce the correct or optimal solution.

The following are some problems that can be solved using a greedy algorithm. Their algorithms can be found in this lecture, and in the exercises.

Minimum Spanning Tree finding a spanning tree for a graph whose edge weights sum to a minimum value

Fractional Knapsack selecting a subset of items to load in a container in order to maximize profit

Task Selection finding a maximum set of non-overlapping tasks (each with a fixed start and finish time) that can be completed by a single processor

Huffman Coding finding a code for a set of items that minimizes the expected code length

Unit Task Scheduling with Deadlines finding a completion schedule for unit-length tasks on a single processor in order to maximize the total earned profit

Single-Source Distances in a Graph finding the distance from a source vertex in a graph to every other vertex in the graph

Like all families of algorithms, greedy algorithms tend to follow a similar analysis pattern.

Greedy Correctness Correctness is usually proved through some form of induction. For example, assume there is an optimal solution that agrees with the first k choices of the algorithm. Then show that there is an optimal solution that agrees with the first k + 1 choices. A sketch of one such greedy algorithm, for the Task Selection problem, is given below.
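As an illustration of the two greedy properties, here is a minimal Python sketch of the classic greedy algorithm for Task Selection, in which each stage makes the locally optimal choice of the compatible task that finishes earliest; the function name and (start, finish) tuple format are our own choices, not fixed by the lecture.

```python
def select_tasks(tasks):
    """Greedy task selection: return a maximum-size set of
    non-overlapping tasks, each given as a (start, finish) pair.

    Each stage takes the earliest-finishing task compatible with
    the tasks chosen so far.  The initial sort dominates the cost,
    giving an O(n log n) running time.
    """
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(tasks, key=lambda t: t[1]):
        if start >= last_finish:   # compatible with all chosen so far
            selected.append((start, finish))
            last_finish = finish
    return selected

print(select_tasks([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8),
                    (5, 9), (6, 10), (8, 11)]))
# -> [(1, 4), (5, 7), (8, 11)]
```

Note that the proof that this procedure is an algorithm, and not merely a heuristic, follows the inductive exchange pattern described above: any optimal solution can be modified to agree with the earliest-finishing choice without reducing its size.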
Greedy Complexity The running time of a greedy algorithm is determined by the ease of maintaining an ordering of the candidate choices in each round. This is usually accomplished via a static or dynamic sorting of the candidate choices.

Greedy Implementation Greedy algorithms are usually implemented with the help of a static sorting algorithm, such as Quicksort, or with a dynamic sorting structure, such as a heap. Additional data structures may be needed to efficiently update the candidate choices during each round.

Huffman Coding

Huffman coding represents an important tool for compressing information. For example, consider a 1 MB text file that consists of a sequence of ASCII characters from the set {'A', 'G', 'T'}. Moreover, suppose half the characters are A's, one quarter are G's, and one quarter are T's. Then instead of having each byte encode a single letter, we let each byte encode a sequence of zeros and ones. We do this by assigning the codeword 0 to 'A', 10 to 'G', and 11 to 'T'. Then rather than the sequence AGATAA requiring 6 bytes of memory, we instead store it as the single byte 01001100. Moreover, the average length of a codeword is

(1)(0.5) + (2)(0.25) + (2)(0.25) = 1.5 bits,

which yields a compression percentage of 81%, meaning that the file size has been reduced to 0.19 MB. For example, bit length 1 is weighted by 0.5 since half the characters are A's, while 2 is weighted by 0.25 since one quarter of the characters are G's. With a moment's thought one can see that 1.5 is the least attainable average for any binary prefix code that encodes the three characters with respect to the given frequencies, where a prefix code means that no codeword is a prefix of any other codeword. For example, C = {0, 1, 11} is not a prefix code, since 1 is a prefix of 11.
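The codeword assignment above is what Huffman's greedy algorithm produces: each stage merges the two lowest-frequency subtrees (the locally optimal choice), and a heap maintains the candidate ordering, so the n - 1 merges cost O(n log n) in total. The following is a minimal Python sketch of the construction; the function name and dict-based tree representation are our own illustration, not fixed by the lecture.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a binary prefix code from a {symbol: frequency} dict
    via Huffman's greedy algorithm.

    Each heap entry is (total frequency, tiebreaker, codes), where
    codes maps each symbol in the subtree to its codeword so far.
    Merging two subtrees prepends 0 to the codewords of one and
    1 to the codewords of the other.
    """
    tiebreak = count()  # avoids comparing dicts when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, code1 = heapq.heappop(heap)  # two lowest-frequency
        f2, _, code2 = heapq.heappop(heap)  # subtrees are merged
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"A": 0.5, "G": 0.25, "T": 0.25}
code = huffman_code(freqs)
print(code)  # {'A': '0', 'G': '10', 'T': '11'} (up to relabeling 0/1)
print(sum(freqs[s] * len(c) for s, c in code.items()))  # 1.5 bits
```

Running the sketch on the A/G/T frequencies reproduces the codewords 0, 10, and 11, and hence the 1.5-bit average computed above.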