The Mathematics of Algorithm Design
Total Page:16
File Type:pdf, Size:1020Kb
The Mathematics of about all these algorithms without recourse to spe- Algorithm Design cific computing devices or computer programming languages, instead expressing them using the lan- Jon Kleinberg guage of mathematics. In fact, the notion of an algorithm as we now think of it was formalized Cornell University, Ithaca NY USA. in large part by the work of mathematical logi- cians in the 1930s, and algorithmic reasoning is 1 The Goals of Algorithm Design implicit in the past several millenia of mathemati- cal activity. (For example, equation-solving meth- When computer science began to emerge as a sub- ods have always tended to have a strong algorith- ject at universities in the 1960s and 1970s, it drew mic flavor; the geometric constructions of the an- some amount of puzzlement from the practitioners cient Greeks were inherently algorithmic as well.) of more established fields. Indeed, it is not initially Today, the mathematical analysis of algorithms clear why computer science should be viewed as a occupies a central position in computer science; distinct academic discipline. The world abounds reasoning about algorithms independently of the with novel technologies, but we don't generally specific devices on which they run can yield in- create a separate field around each one; rather, sight into general design principles and fundamen- we tend to view them as by-products of existing tal constraints on computation. branches of science and engineering. What is spe- At the same time, computer science research cial about computers? struggles to keep two diverging views in focus: Viewed in retrospect, such debates highlighted this more abstract view that formulates algorithms an important issue: computer science is not so mathematically, and the more applied view that much about the computer as a specific piece of the public generally associates with the field, the technology as it is about the more general phe- one that seeks to develop applications such as In- nomenon of computation itself, the design of pro- ternet search engines, electronic banking systems, cesses that represent and manipulate information. medical imaging software, and the host of other Such processes turn out to obey their own inherent creations we have come to expect from computer laws, and they are performed not only by comput- technology. The tension between these two views ers but by people, by organizations, and by sys- means that the field’s mathematical formulations tems that arise in nature. We will refer to these are continually being tested against their imple- computational processes as algorithms. For the mentation in practice; it provides novel avenues purposes of our discussion in this article, one can for mathematical notions to influence widely used think of an algorithm informally as a step-by-step applications; and it sometimes leads to new mathe- sequence of instructions, expressed in a stylized matical problems motivated by these applications. language, for solving a problem. The goal of this short article is to illustrate this This view of algorithms is general enough to cap- balance between the mathematical formalism and ture both the way a computer processes data and the motivating applications of computing. We be- the way a person performs calculations by hand. gin by building up to one of the most basic defi- For example, the rules for adding and multiplying nitional questions in this vein: how should we for- numbers that we learn as children are algorithms; mulate the notion of efficient computation? the rules used by an airline company for schedul- ing flights constitute an algorithm; and the rules used by a search engine like Google for ranking 2 Two Representative Problems Web pages constitute an algorithm. It is also fair to say that the rules used by the human brain to To make the discussion of efficiency more concrete, identify objects in the visual field constitute a kind and to illustrate how one might think about an of algorithm, though we are currently a long way issue like this, we first discuss two representative from understanding what this algorithm looks like problems | both fundamental in the study of algo- or how it is implemented on our neural hardware. rithms | that are similar in their formulation but A common theme here is that one can reason very different in their computational difficulty. 1 2 a) b) 24 12 12 8 8 20 10 14 10 7 22 11 11 Figure 1: Solutions to instance of (a) the Traveling Salesman Problem and (b) the Minimum Spanning Tree Problem, on the same set of points. The dark lines indicate the pairs of cities that are connected by the respective optimal solutions, and the lighter lines indicate all pairs that are not connected. The first in this pair is the Traveling Salesman in a \good" order; it has been used for problems Problem (TSP), and it is defined as follows. We that run from planning the motion of robotic arms imagine a salesman contemplating a map with n drilling holes on printed circuit boards (where the cities (he is currently located in one of them). The \cities" are the locations where the holes must be map gives the distance between each pair of cities, drilled) to ordering genetic markers on a chromo- and the salesman wishes to plan the shortest pos- some in a linear sequence (with the markers con- sible tour that visits all n cities and returns to the stituting the cities, and the distances derived from starting point. In other words, we are seeking an probabilistic estimates of proximity). The MST is algorithm that takes as input the set of all dis- a basic issue in the design of efficient communi- tances among pairs of cities, and produces a tour cation networks; this follows the motivation given of minimum total length. Figure 1(a) depicts the above, with fiber-optic cable acting in the role of optimal solution to a small instance of the TSP; \roads." The MST also plays an important role in the circles represent the cities, the dark line seg- the problem of clustering data into natural group- ments (with lengths labeling them) connect cities ings. Note for example how the points on the left that the salesman visits consecutively on the tour, side of Figure 1(b) are joined to the points on the and the light line segments connect all the other right side by a relatively long link; in clustering ap- pairs of cities, which are not visited consecutively. plications, this can be taken as evidence that the A second problem is the Minimum Spanning left and right points form natural groupings. Tree Problem (MST). Here we imagine a construc- It is not hard to come up with an algorithm for tion firm with access to the same map of n cities, solving the TSP. We first list every possible way but with a different goal in mind. They wish to of ordering the cities (other than the starting city, build a set of roads connecting certain pairs of the which is fixed in advance). Each ordering defines cities on the map, so that after these roads are a tour { the salesman could visit the cities in this built there is a route from each of the n cities to order and then return to the start { and for each each other. (A key point here is that each road ordering we could compute the total length of the must go directly from one city to another.) Their tour, by traversing the cities in this order and sum- goal is to build such a road network as cheaply as ming the distances from each city to the next. As possible | in other words, using as little total road we perform this calculation for all possible orders, material as possible. Figure 1(b) depicts the op- we keep track of the order that yields the smallest timal solution to the instance of the MST defined total distance, and at the end of the process we by the same set of cities used for part (a). return this tour as the optimal solution. Both of these problems have a wide range of While this algorithm does solve the problem, it practical applications. The TSP is a basic problem is extremely inefficient. There are n 1 cities other concerned with sequencing a given set of objects than the starting point, and any possible− sequence 3. COMPUTATIONAL EFFICIENCY 3 of them defines a tour, so we need to consider needs to sort the pairs of cities by their distances, (n 1)(n 2)(n 3) (3)(2)(1) = (n 1)! possi- and then make a single pass through this sorted ble−tours.−Even for− n·=· · 30 cities, this is− an astro- list to decide which roads to build. nomically large quantity; on the fastest computers This discussion has provided us with a fair we have today, running this algorithm to comple- amount of insight into the nature of the TSP tion would take longer than the life expectancy and MST problems. Rather than experimenting of the Earth. The difficulty is that the algorithm with actual computer programs, we described al- we have just described is performing brute-force gorithms in words, and made claims about their search: the \search space" of possible solutions to performance that could be stated and proved as the TSP is very large, and the algorithm is doing mathematical theorems. But what can we abstract nothing more than plowing its way through this from these examples, if we want to talk about com- entire space, considering every possible solution. putational efficiency in general? For most problems, there is a comparably inef- ficient algorithm that simply performs brute-force 3 Computational Efficiency search.