

Experimental Analysis of Algorithms

Catherine C. McGeoch

As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality. —Albert Einstein

In theory there is no difference between theory and practice. In practice there is. —Yogi Berra

The algorithm and the program. The abstract and the physical. If you want to understand the fundamental and universal properties of a computational process, study the abstract algorithm and prove a theorem about it. If you want to know how the process really works, implement the algorithm as a program and measure the running time (or another quantity of interest).

This distinction between the abstract model and the physical artifact exists in the study of computational processes just as in every other area of mathematical modeling. But algorithmic problems have some unusual features. For example, we usually build models to serve as handy representations of natural phenomena that cannot be observed or manipulated directly. But programs and computers are completely accessible to the researcher and are far more manipulable than, say, weather patterns. They are also much easier to understand: hypothetically, one could obtain complete information about the behavior of a program by consulting technical documents and code. And finally, algorithms are usually invented before programs are implemented, not the other way around.

This article surveys problems and opportunities that lie in the interface between theory and practice in a relatively new research area that has been called experimental algorithmics, experimental analysis of algorithms, or algorithm engineering. Much work in this area is directed at finding and evaluating state-of-the-art implementations for the given applications. Another effort focuses on using experiments to extend and improve the kinds of results obtained in traditional algorithm analysis. That is, rather than having a goal of measuring programs, we develop experiments in order to better understand algorithms, which are abstract mathematical objects. In this article we concentrate on examples from this latter type of research in experimental algorithmics.

It is natural to wonder whether such an effort is likely to bear fruit. If the ultimate goal of algorithm analysis is to produce better programs, wouldn't we be better off studying programs in their natural habitats (computers) rather than performing experiments on their abstractions? Conversely, if the goal is to advance knowledge in an area of mathematical research (algorithm analysis), are we wise to study abstract objects using imperfect, finite, and ever-changing measurement tools?

One answer to the first question is that reliable program measurement is not as easy as it sounds. Running times, for example, depend on complex interactions between a variety of products that comprise the programming environment, including the CPU chip (perhaps Intel Inside), memory sizes and configurations, the operating system (such as Windows 98 or Unix), the programming language (maybe Java or C), and the brand of compiler (like CodeWarrior).¹ These products are sophisticated, varied in design, and, especially when used in combination, extremely difficult to model.

¹Intel Inside™ is a registered trademark of the Intel Corporation. Windows™ 98 is a registered trademark of Microsoft Corporation. Unix™ is a registered trademark of The Open Group. Java™ is a registered trademark of Sun Microsystems, Inc. C is not a trademark. CodeWarrior™ is a trademark of Metrowerks, Inc.

Catherine C. McGeoch is an associate professor of computer science at Amherst College. She has been active in the development of an experimental tradition in algorithmic studies. Her e-mail address is [email protected].


1. Array A[1...n] contains n distinct numbers. We want the r-th smallest. Set lo ← 1 and hi ← n.
2. Partition. Set x ← A[lo]. Rearrange the numbers in A[lo...hi] into three groups around some index p such that:
   a. For lo ≤ i < p, A[i] < x.
   b. A[p] = x.
   c. For p < i ≤ hi, A[i] > x.
3. Check Indices. If p = r, stop and report the element A[p]. Otherwise, if p < r, set lo ← p + 1 and go to Step 2; if p > r, set hi ← p - 1 and go to Step 2.

Figure 1: Selection Algorithm S. The algorithm reports the r-th-smallest number from array A[1...n].
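For readers who prefer running code, here is a minimal Python sketch of Algorithm S. The partitioning routine is a standard two-pointer method in the spirit of Figure 3 later in this article; it is written for clarity and does not reproduce the exact comparison counts assumed in the analysis below.

    def select(A, r):
        """Algorithm S: return the r-th smallest element of A (r is 1-indexed)."""
        lo, hi = 0, len(A) - 1                # 0-indexed stand-ins for lo = 1, hi = n
        while True:
            p = partition(A, lo, hi)          # A[p] = x is now in its final position
            if p == r - 1:
                return A[p]
            elif p < r - 1:
                lo = p + 1                    # the element we want lies to the right of p
            else:
                hi = p - 1                    # the element we want lies to the left of p

    def partition(A, lo, hi):
        """Rearrange A[lo..hi] around x = A[lo]; return x's final index p."""
        x = A[lo]
        l, h = lo + 1, hi
        while l <= h:
            if A[l] < x:
                l += 1
            elif A[h] > x:
                h -= 1
            else:
                A[l], A[h] = A[h], A[l]       # move the out-of-place pair apart
                l, h = l + 1, h - 1
        A[lo], A[h] = A[h], A[lo]             # h now marks the rightmost value < x
        return h

    print(select([5, 2, 9, 1, 7], 3))         # prints 5, the 3rd smallest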

There is no known general method for making accurate predictions of performance in one programming environment based on observations of running times in another. Experimental analysis of the abstraction allows us to have more control over the trade-off between generality and accuracy when making predictions about performance.

An answer to the second question is straight from the mathematician: algorithms and their analyses are beautiful and fundamental, and they deserve study by any means available, including experimentation.

Certainly algorithms existed long before the first computing machine was a gleam in Charles Babbage's eye. For example, Euclid's Elements, circa 300 B.C., contains an algorithm for finding the greatest common divisor of two numbers. While many algorithms have been discovered and published over the centuries, it is only with the advent of computers that the notion of analyzing algorithmic efficiency has been formalized.

Perhaps the first real surprise in algorithm analysis occurred with Strassen's discovery in 1968 of a new method for multiplying two n × n matrices. While the classic algorithm we all learned in high school requires 2n^3 - n^2 scalar arithmetic operations, Strassen's algorithm uses fewer than 7n^{log_2 7} - 6n^2 operations, where log_2 7 < 2.808. Therefore, this new algorithm uses fewer operations than the classic method when n is greater than 654. Strassen's discovery touched off an intensive search for asymptotically better matrix multiplication algorithms. The current champion requires no more than cn^{2.376} scalar operations for a known constant c. It is an open question whether better algorithms exist. (See any algorithms textbook, such as [1], for more about matrix multiplication.)
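The crossover claim is easy to check numerically: the short Python sketch below tabulates the two operation counts quoted above and reports the first n at which Strassen's bound is smaller.

    import math

    classic = lambda n: 2 * n**3 - n**2                    # high-school method
    strassen = lambda n: 7 * n**math.log2(7) - 6 * n**2    # Strassen's bound

    n = 2
    while strassen(n) >= classic(n):   # find the first n where Strassen is cheaper
        n += 1
    print(n)   # prints 655: Strassen uses fewer operations once n exceeds 654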
These fancy algorithms are not much use in practice, however. It would be a daunting prospect even to write an error-free program for one of them, and the extra computation costs imposed by their complexity make them unlikely to outperform the classic method in typical scenarios.

Algorithm analysis is a vigorous and vital subdiscipline of computer science, providing deep new insights into the fundamental power and limitations of computation, fodder for the development of new mathematical techniques (mostly combinatorial), and, not infrequently, efficient algorithms that can have substantial impact on practice. It is clear, however, that our analytical techniques are far too weak to answer all our questions about algorithms. Computational experiments are being used to suggest new directions for research, to support or disprove conjectures, and to guide the development of new analytical tools and proof strategies.

This article presents examples from two broad research efforts in experimental algorithmics: first, to develop accurate models of computation that allow closer predictions of performance, and second, to extend abstract analyses beyond traditional questions and assumptions. We start with a short tutorial on the notations and typical results obtained in algorithm analysis.

A Tutorial on Algorithm Analysis

The selection problem is to report the r-th-smallest number in a collection of n numbers. For example, r = 1 refers to the minimum of the collection, and r = (n+1)/2 is the median when n is odd. For convenience we assume that no duplicate numbers appear in the collection.

Figure 1 presents a well-known selection algorithm. The n numbers are placed in an array called A in no particular order. Each number has some position from 1 to n in the array: the notation A[i] refers to the number in position i, and i is called an index. The notation ℓ ← lo means "set ℓ equal to the value of lo".

The main operation of algorithm S is to choose a partition element x from a contiguous subarray of A defined by indices lo and hi and to partition the subarray by rearranging its contents so that numbers smaller than x are to its left and numbers larger are to its right. This puts x at some location p; that is, A[p] = x. After partitioning, we know that x is the p-th-smallest number in A. We are looking for the r-th-smallest number: depending on the relationship of p to r, the algorithm either stops or repeats the process on one side or the other of p.

To analyze the algorithm, we derive a cost function that relates input size to the cost of the computational resources used by the algorithm. The precise meanings of "input size" and "cost" depend upon the model of computation being assumed.


Here we shall use the simple RAM (random access machine) model, under which all scalar numbers have unit size and all basic operations on scalar values (such as arithmetic, comparison, and copying of values) have unit cost. Therefore the input size is n.

For our purposes it is sufficient to define cost as the number of times x must be compared to elements from A[lo...hi] during the partitioning step. A partitioning method is known (described later in this article) that uses hi - lo comparisons to partition subarray A[lo...hi] of size hi - lo + 1. The only problem remaining is to count up total costs.

Obviously, the total cost depends on which partition element x = A[lo] is used each time: we may get lucky and find p = r after just one partitioning operation, or we may have to repeat the process several times. The worst-case cost, C_w(n), is the maximal number of comparisons over all arrays of size n and all r: a worst-case scenario holds, for example, when r = n and p becomes 1, 2, ..., n in successive partitioning stages. Letting t denote the cost of a single comparison of x to an array element, we have

    C_w(n) = \sum_{i=1}^{n} t(n - i).

The average-case cost, C_a(n, r), is found by averaging over some probability distribution on arrays of size n. Assume here that every number in the collection is equally likely to be in position lo and selected as the partition element x. Then we have

    C_a(1, 1) = 0,
    C_a(n, r) = t(n - 1) + \frac{1}{n} \left[ \sum_{p=1}^{r-1} C_a(n - p, r - p) + \sum_{p=r+1}^{n} C_a(p - 1, r) \right].

Note that establishing the correctness of this recursive formula requires a proof (which exists) that the partitioning operation preserves the property that each number in the subset is equally likely to end up in position A[lo] and be chosen as the partition element at a later stage.

Note also that the cost of this, and any, algorithm is described by a collection of cost functions that correspond to different scenarios. Another cost function might be obtained with other probabilistic assumptions, or a different definition of cost might be used under some other model of computation.

The first goal in algorithm analysis is the classification of cost functions according to complexity classes as defined below. Throughout, we assume that f(n) and g(n) map nonnegative integers to positive real numbers.

Definition: f(n) ∈ O(g(n)) if there exist positive constants c and n_0 such that f(n) ≤ c g(n) for all n ≥ n_0.

Definition: f(n) ∈ Ω(g(n)) iff g(n) ∈ O(f(n)).

The statements below are typical of the kinds of results obtained in classic algorithm analysis.

• Any correct selection algorithm must at least examine every element of A. Therefore, any algorithm that solves the selection problem must have a worst-case cost function in Ω(n). This is called a lower bound on the problem of selection.

• It is easy to see that C_w(n) ∈ O(n²). We have a complexity gap between the O(n²) worst-case bound and the Ω(n) lower bound on the problem. Complexity gaps generate research questions: Does an O(n) algorithm exist? Or should the lower bound be raised to, say, Ω(n²), because selection is fundamentally harder than the first lower bound indicates? Or is the lower bound somewhere in between, perhaps at Ω(n log n)?

• In fact, a variation on algorithm S is known that has O(n) worst-case cost, thus closing the complexity gap. The variation uses an elaborate strategy for choosing the partition element at each stage and is generally considered too complicated and expensive to be useful in practice.

The goal of the asymptotic analyses above is to place cost functions into complexity classes, while the goal of exact analysis is to find closed forms that retain constant factors in the leading terms. A closed form for C_w is easy to obtain. Also, it can be shown that

    C_a(n, r) = t \left[ 2 \left[ (n+1) H_n - (n+2-r) H_{n+1-r} - (r+1) H_r + n + 5/3 \right] - \delta_{rn}/3 - \delta_{r1}/3 - 2 \delta_{rn} \delta_{r1}/3 \right],

where H_n is the n-th harmonic number and δ_{rn} is the Kronecker delta. This implies that C_a(n, r) ∈ O(n) (see Knuth [5], Exercise 5.2.2.31, for details).
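The recurrence can also be evaluated numerically. The Python sketch below fixes the comparison cost t at 1 and adds memoization for speed; the printed ratios C_a(n, r)/n for the median level off as n grows, consistent with C_a(n, r) ∈ O(n). (For the median the leading constant is known to approach 2 + 2 ln 2 ≈ 3.39.)

    import sys
    from functools import lru_cache

    sys.setrecursionlimit(100000)

    @lru_cache(maxsize=None)
    def C_a(n, r):
        """Average comparison count of algorithm S, per the recurrence (t = 1)."""
        if n <= 1:
            return 0.0
        left = sum(C_a(n - p, r - p) for p in range(1, r))       # p < r: recurse right
        right = sum(C_a(p - 1, r) for p in range(r + 1, n + 1))  # p > r: recurse left
        return (n - 1) + (left + right) / n

    for n in (25, 50, 100, 200):
        print(n, round(C_a(n, (n + 1) // 2) / n, 3))   # ratios level off: O(n) growth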
Memory-Sensitive Models of Computation

In the good old days (say around 1970) it was possible to obtain quite accurate predictions of program running time by inserting appropriate time units for the constants (like t above) in an exact analysis using the RAM model of computation. Nowadays, however, the RAM model is inadequate to the task.

For example, consider the seemingly innocuous instruction x ← A[lo] in Step 2 of our example algorithm. The actual work to carry out this instruction is performed in a computer by the Central Processing Unit (CPU). The instruction involves about three CPU operations: (1) calculate the memory address of A[lo], (2) fetch the value from that memory address, and then (3) store the value at the memory address associated with x. (Normally, array addresses need to be calculated, but scalar addresses do not.)


A fast modern CPU is capable of performing each operation in about a nanosecond. But the CPU may be forced to wait upon a slow memory, and the additional time needed for the fetch and store operations can vary by enormous factors, depending on exactly where A[lo] and x reside in the memory hierarchy. Figure 2 gives a simplified diagram of a memory hierarchy ranging from a small, fast, and expensive register set to a huge, slow, and inexpensive secondary memory.

[Figure 2 diagram: the CPU at the top, then registers, Level I cache, Level II cache, main memory, and secondary memory.]
Figure 2: The memory hierarchy, simplified. This drawing is not to scale.

The idea is to keep values that the CPU needs close to it in the fastest memory, but of course not everything will fit. Therefore, memories are designed to retain values that will be needed soon and to evict unneeded values by moving them back to slower memory. It is not always possible to predict which values will be needed next, so most memory levels operate under some kind of principle of locality, which says that values that have been recently accessed, or that are near values recently accessed, are likely to be needed soon. There may be a different version of this principle at each level of the hierarchy, and wide variations exist in computing environments with respect to number of levels, sizes of memories, and the eviction policies adopted.

Consider now the cost of fetching the value A[lo]. If that value has been used recently, it may be in a register, and the fetching cost is negligible. Decisions about which elements get to stay in registers are made by the compiler (a task we shall return to later in this article).

If A[lo] is not in a register, it may be in the Level I or Level II cache. Typical cache access times on workstations are around 5 to 10 nanoseconds. Cache sizes and eviction policies are part of the overall design of the computer. If not in a cache, A[lo] is likely to be in main memory, with access times around 50 to 100 nanoseconds. But it is possible that A[lo] has been placed in secondary memory, which may have access times greater than 10^6 nanoseconds. Main-to-secondary eviction policies are incorporated into the operating system.

Whereas the abstract RAM model assigns unit cost to one access of A[lo], true memory access times can vary by factors as large as a million. Without a decent model of the memory hierarchy, we cannot predict whether a given program will take one minute or two years to run. (Of course, this worst-case scenario is not typical, but predictions about program running times that are off by three or more orders of magnitude are not unusual.)

Several threads of research in experimental algorithmics are concerned with developing new models of computation that incorporate aspects of the memory hierarchy, especially with respect to caches and to policies at the main-to-secondary memory levels.

In 1996, for example, LaMarca and Ladner introduced a RAM model with a two-layer memory (cache and main memory). They and several other researchers have since been able to reanalyze classic algorithms and data structures under the new model. These results, obtained through a combination of analytical and experimental approaches, have produced new theorems and new analysis techniques, better predictions of program performance, faster programs, and clear indications that our "common sense" understanding of what contributes to program efficiency is due for revision.

To get a feeling for the type of mathematics involved, consider reanalyzing our selection algorithm under the new two-layer model. We need to be more explicit about the partition step: Figure 3 shows one well-known partitioning method that performs hi - lo array-element comparisons for subarray A[lo...hi] of size hi - lo + 1.

The new model organizes main memory and the cache into contiguous blocks of scalar values. Main memory contains M blocks named m_0 through m_{M-1}, and the cache holds C blocks named c_0 to c_{C-1}, with C < M. Each memory block can occupy only one cache location: when block m_x is accessed, it is copied into cache block c_{x mod C}, evicting that cache block's current occupant. An access to a value whose block is currently cached costs t_in; an access that must go out to main memory costs t_out, where t_in is much smaller than t_out.


Partition.
2.1. Set x ← A[lo]. Set ℓ ← lo + 1 and h ← hi. Iterate the following three steps:
1. Use ℓ to scan up from the left, comparing each A[ℓ] to x. Stop when either A[ℓ] > x or ℓ = h.
2. Use h to scan down from the right, comparing each number A[h] to x. Stop when either A[h] < x or h = ℓ.
3. If ℓ < h, exchange A[ℓ] with A[h] and resume the scans. Otherwise, exchange A[lo] with the rightmost number that is smaller than x; this places x at its final position p. Stop.

Figure 3. Partitioning A[lo...hi] around x = A[lo].

Now suppose subarray A[i...j] resides in memory block m_x. The access cost for a particular array element A[k], with i ≤ k ≤ j, is therefore
• t_out if this is the first access of A[k] or any of its block neighbors A[i...j];
• t_out if an array element from any other block y with y ≡ x (mod C) has been accessed more recently than any element from block x (causing an eviction);
• t_in otherwise.

Homework Problem. Write new formulas for C_w(n) and C_a(n, r) under this model. Perform an exact analysis on both formulas. Would an alternative partitioning method, say one that traverses array elements from left-to-right rather than both-ends-toward-the-middle, give a lower memory access cost?
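While the exact analysis is left as homework, the model is easy to simulate. The sketch below is a minimal cost-accounting simulation under made-up parameters (block size 8, C = 3 cache blocks, t_in = 1, t_out = 100, chosen only for illustration); it compares a left-to-right scan with an idealized both-ends scan that alternates between the two pointers.

    def access_cost(trace, B=8, C=3, t_in=1, t_out=100):
        """Charge each array-index access under the two-level model: index k
        lives in memory block k // B, which may occupy only cache block
        (k // B) % C; a hit costs t_in, a miss (first touch or eviction) t_out."""
        cache = [None] * C                 # cache[s] = memory block in slot s
        cost = 0
        for k in trace:
            block = k // B
            slot = block % C
            if cache[slot] == block:
                cost += t_in
            else:
                cost += t_out              # first access, or a conflict eviction
                cache[slot] = block
        return cost

    # Index traces for one partitioning pass over A[0..n-1], two scan orders.
    n = 64
    left_to_right = list(range(1, n))
    both_ends, l, h = [], 1, n - 1
    while l <= h:                          # alternate the two scanning pointers
        both_ends.append(l); l += 1
        if l <= h:
            both_ends.append(h); h -= 1

    print(access_cost(left_to_right))      # sequential scan: one miss per block
    print(access_cost(both_ends))          # the two scans can evict each other

Under these parameters the both-ends scan is costlier: partway through, the two pointers occupy blocks that map to the same cache slot and repeatedly evict each other.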
Clearly even this simple two-level model greatly complicates the analysis task. But the tighter analysis does appear to be well worth the effort in many cases. Jon Bentley (personal communication) reports that simple program modifications can reduce cache effects and improve running times by factors as large as 16. For a nice introduction to memory-sensitive models of computation, see LaMarca and Ladner's [6] analysis of four sorting algorithms under the cacheing model. They draw conclusions about efficiency that flatly contradict a common lore based on traditional analyses and obtain much tighter predictions of program running times than had been possible before.

Another research effort in memory-sensitive analysis concerns algorithms for data sets that must reside in secondary memory because they are too large to fit in main memory. Important applications for such algorithms include Web search engines that peruse directories containing hundreds of millions of entries, many-body simulation algorithms for research problems in the natural sciences, and algorithms for image processing and scientific computing. The RAM model can also be extended to models that account for data transfers between main and secondary memory (which follow different rules from caches). New algorithms and analyses of old algorithms under these new models can result in huge improvements to program running times. Reductions of running times from several weeks to a few hours have been reported in the literature.

Heuristics for NP-Hard Problems

We now turn to another research thread in experimental algorithmics. The most interesting question about any solvable problem is whether it is tractable or intractable. A problem is tractable if it can be solved in polynomial time; that is, there exists an algorithm for the problem that has worst-case cost in O(n^k) for some constant k. A problem is intractable if no such algorithm exists. For convenience we define intractable problems to be those having exponential lower bounds in Ω(c^n) for some constant c > 1 and ignore the fact that there exist functions like n^{log n} that are neither polynomial nor exponential.

Perhaps the most important technical idea to arise from the study of algorithms and problem complexity is the identification of problems that are NP-complete and NP-hard. Hundreds of practical problems that arise naturally in all spheres of industry, commerce, science, and government work have been identified as NP-complete and/or NP-hard. The technical definitions of these two problem classes are rather too involved to fit into this article, but the point is this: it is an open question whether these problems are tractable or intractable. That is, for every problem in these classes there is a huge complexity gap between an exponential worst-case bound on the best algorithm known and a polynomial lower bound on the problem complexity. Do efficient algorithms exist for these problems, or are they fundamentally too hard to be solved in polynomial time?

For technical reasons, "NP-complete" refers to decision problems, which are always phrased as yes-or-no questions (Does input instance I have property P?). "NP-hard" can refer to more general types of problems, such as optimization problems that involve minimizing or maximizing some quantity associated with the desired solution. The two classes are related as follows: Let X and Y be NP-complete problems, and let Z be NP-hard but not a decision problem. Then (1) a polynomial-time algorithm for X exists if and only if one exists for Y; (2) if a polynomial-time algorithm to produce optimal solutions for Z exists, then polynomial-time algorithms exist for X and Y.


Thus, discovery of a polynomial-time algorithm for any NP-complete or NP-hard problem would have profound implications about the complexity of several hundred other problems of great practical interest, besides earning the discoverer a cool million dollars. (To learn more about the million-dollar Millennium Prize Problems announced by the Clay Mathematics Institute, visit www.claymath.org/prize_problems/.) Proof of an exponential lower bound for any NP-complete problem would reverberate similarly. To learn more about the theory of NP-completeness, see the classic text by Garey and Johnson [3]. Three famous NP-hard problems are described below.

Traveling Salesperson. You are given a graph or digraph G = (V, E) with positive-valued weights on its edges. A Hamiltonian tour of G is a closed path over vertices and edges that visits each vertex of G exactly once. The cost of the tour is the sum of the weights of edges traversed on the path. The problem is to find a minimum-cost Hamiltonian tour of any given graph G.

This problem is of interest to traveling salespersons who wish to tour all cities in a sales region while minimizing total travel cost. It is also of interest to any organization that must schedule regular tours of delivery trucks, like the U.S. Post Office and your favorite grocery store chain.

Bin Packing. Given a list of n weights, all from the real number range (0, 1), the problem is to organize the weights into unit-capacity bins so as to minimize the number of bins used. For example, the weight list containing 0.4, 0.3, 0.6, 0.5 could be grouped into (0.4, 0.3), (0.6), (0.5), which uses three bins, or into (0.4, 0.6), (0.3, 0.5), which uses only two bins.
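For a concrete taste, here is a sketch of the classic first-fit heuristic for bin packing (the heuristic itself is not discussed in the text); on the example list above it produces the three-bin grouping rather than the optimal two-bin one.

    def first_fit(weights):
        """Place each weight into the first bin with room, opening a new
        unit-capacity bin when none of the existing bins can take it."""
        bins = []
        for w in weights:
            for b in bins:
                if sum(b) + w <= 1.0:
                    b.append(w)
                    break
            else:
                bins.append([w])
        return bins

    print(first_fit([0.4, 0.3, 0.6, 0.5]))   # [[0.4, 0.3], [0.6], [0.5]]: 3 bins
    # The optimal grouping, (0.4, 0.6) and (0.3, 0.5), needs only 2 bins.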
This problem is of interest to anyone walking into a lumberyard with a list of board lengths needed for a construction project who must group the lengths so as to minimize the number of standard 8-foot boards that must be purchased. It is of interest to any construction supply firm that must cut stock from unit-sized pieces. The problem is also of interest to anyone who wants to back up files by copying onto unit-sized floppy disks while minimizing the total number of disks needed. A two-dimensional version of the problem (where weights come in (x, y) pairs) is of interest to any printer or pattern cutter who must lay out parts on unit-sized rectangular sheets so as to minimize the total number of sheets needed.

Graph-Coloring. Given a graph G, a coloring of G is an assignment of colors to vertices under the constraint that no two vertices sharing an edge can have the same color. The graph-coloring problem is, for any given graph G, to find a coloring that minimizes the total number of distinct colors used. Graph-coloring is a generalization of the famous map-coloring problem to nonplanar graphs.

Graph-coloring is of interest to a university registrar who assigns times (colors) to courses (nodes), under the constraint that no two courses with the same professor can meet at the same time. It is of interest to anyone who needs to create work schedules or timetables that avoid certain kinds of conflicts. And graph-coloring arises in the context of the compiler task, mentioned earlier in this article, of assigning values to registers so as to minimize memory access costs.

Recall that the CPU can work only on values that are in registers. Memory costs are incurred when a value must be transferred from another memory to a register (here we assume that this transfer cost is constant).

Consider the sequence of instructions that occurs at the beginning of our selection algorithm S:

(1) lo ← 1
(2) hi ← n
(3) x ← A[lo]
(4) ℓ ← lo + 1
(5) h ← hi

Two values interfere if one must be accessed between two accesses of the other. When two values interfere, we want to place them in different registers so as to avoid the cost of evicting one to make room for the other. For example, lo and hi interfere because accesses of hi on line (2) come between accesses of lo on lines (1) and (3) and also because lo is accessed between lines (2) and (5). If both lo and hi were assigned to register R1, then these values would incur the following memory transfer costs: (1) fetch lo; (2) evict lo, fetch hi; (3) evict hi, fetch lo; (5) evict lo, fetch hi; for a total of seven memory transfers. These two values could coexist peacefully if placed in registers R1 and R2, although this might cause interference difficulties with other values.
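The seven-transfer tally can be reproduced mechanically. In the sketch below, the trace of accessed values and the cost accounting are my own encoding of the example above, not code from the article.

    def transfer_count(trace, assignment):
        """Count fetch and evict operations when each value in `trace` is
        accessed through the register chosen by `assignment[value]`."""
        registers = {}
        ops = 0
        for v in trace:
            r = assignment[v]
            if registers.get(r) != v:
                if r in registers:
                    ops += 1          # evict the register's current occupant
                ops += 1              # fetch v into the register
                registers[r] = v
        return ops

    # Accesses of lo and hi during instructions (1)-(5) above.
    trace = ["lo", "hi", "lo", "lo", "hi"]
    print(transfer_count(trace, {"lo": "R1", "hi": "R1"}))   # 7, as in the text
    print(transfer_count(trace, {"lo": "R1", "hi": "R2"}))   # 2: one fetch each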


More generally, given a sequence of instructions in a program, we can build an interference graph G where nodes represent program values and an edge is placed between two nodes if their values interfere. The problem is to assign a minimum number of registers (colors) to values (nodes) such that interfering values receive distinct register assignments.

As is the case with all NP-hard problems, no polynomial-time algorithm for finding minimal colorings is known, and it is not known whether such an algorithm exists. While the theoreticians struggle with the general problem, the compiler writers must settle for coloring algorithms that use a "small number" of colors, if not the absolute minimum. Dozens of algorithms have been proposed; they fall into the broad categories sketched in the list below. (Colors are assumed to be numbered 1 through k, for some appropriate k.)

Greedy Algorithms. A greedy algorithm may iterate over nodes, assigning to each the least-numbered color that does not violate any edge constraint. Or an algorithm may iterate over colors, assigning each color to a large independent set (finding a maximum independent set is an NP-hard problem). Greedy algorithms are often repeated with different node or color orderings each time, and the best solution found is saved (a minimal sketch of this strategy follows the list). These algorithms are fast but may produce colorings using many more colors than necessary.

Branch-and-Bound. Systematically generate colorings of G, avoiding redundant colorings and unproductive starts, and save the best one found within a given time limit. Note that even on moderate-sized graphs only a tiny fraction of colorings can be generated from the enormous space of possibilities; various strategies for generating the best ones first may be considered.

Heuristic Search Methods. Start with a valid coloring. "Step" to another valid coloring by randomly modifying the current coloring in some small way (perhaps by changing the color of one node). Continue stepping until a good coloring is found. Steps toward better colorings are of course preferred, but it is important occasionally to allow steps toward worse colorings to avoid being trapped in a local minimum. A rich variety of stepping rules and strategies for controlling the stepping process have been proposed.
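Here is the promised sketch of the first (greedy) strategy: first-fit coloring repeated under different node orderings, keeping the best coloring found. The five-node interference graph is invented for illustration; real register-interference graphs are built from actual instruction traces, and practical codes try a handful of heuristic orderings rather than all permutations.

    import itertools

    def greedy_coloring(graph, order):
        """First-fit: give each node the least-numbered color (1, 2, ...)
        not already used by one of its neighbors."""
        color = {}
        for v in order:
            taken = {color[u] for u in graph[v] if u in color}
            c = 1
            while c in taken:
                c += 1
            color[v] = c
        return color

    # A made-up interference graph on the five values of algorithm S.
    G = {
        "lo": {"hi", "x"}, "hi": {"lo", "x"},
        "x":  {"lo", "hi", "l", "h"},
        "l":  {"x", "h"},  "h":  {"x", "l"},
    }

    # Rerun the greedy pass under every node ordering; keep the best coloring.
    best = min((greedy_coloring(G, order) for order in itertools.permutations(G)),
               key=lambda col: max(col.values()))
    print(best)   # a 3-color (3-register) assignment for this graph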
These general strategies can be modified to fit many varieties of NP-hard problems, including Traveling Salesperson and Bin Packing. In some cases (like these two), strategies that exploit special problem structures are known that outperform the general techniques. Note that the analysis of algorithms for NP-hard problems usually involves finding functions and complexity classes for two interesting quantities: algorithm efficiency and the quality of solution produced by the algorithm.

In the case of heuristic algorithms like those sketched above, average-case analyses are extremely difficult to obtain even under the simplest computational and probabilistic models. Analytical statements about how a coloring algorithm performs on, say, random graphs with edge probability p are rare, and statements about performance on "graph structures typical of register interference problems" are far beyond our analytical capabilities.

Computational experiments can be used to enrich our understanding of these heuristic algorithms for NP-hard problems. Experiments have been used to direct the discovery of new algorithms and new analyses, to allow prediction of algorithmic responses to variations in input properties, and to sort out which approaches work best in which kinds of scenarios.

Indeed, graph-coloring has been the focus of what might be termed a new mode of research in algorithmic studies: a coordinated, multiparticipant effort to identify the state-of-the-art in algorithmic performance for some given problem. Since 1992 the DIMACS Center (an NSF- and corporate-funded center for research in discrete mathematics and theoretical computer science, located at Rutgers University) has sponsored an annual Implementation Challenge to encourage and promote experimental research on algorithms for specific problems. The focus on a particular problem allows a great deal more coordination and interaction than might otherwise be possible by independent researchers.

The second DIMACS Challenge (1993) [4] concerned algorithms for three NP-hard problems: graph-coloring, finding large cliques in graphs, and finding satisfying truth assignments for Boolean formulas. While a great deal of progress was made, one conclusion drawn at that workshop was that huge gaps remain in our understanding of the "best algorithms" for particular applications. Much more work remains to be done.

Besides the DIMACS Challenges, several other venues for experimental research on algorithms have appeared in the past decade. Two annual conferences (WAE in Europe and ALENEX in North America), the electronic Journal on Experimental Algorithmics, and numerous small workshops have all been developed to provide forums for researchers to report on their experimental observations.

Questions of Methodology

Research in experimental algorithmics produces new understanding of models, problems, algorithms, and program efficiency. Another important component of experimental algorithmics has been the development of methods and techniques for performing experiments on algorithms. As suggested in the beginning of this article, the relationship between algorithms and programs and the usual types of questions asked about algorithms are not typical of those that arise in mathematical modeling. Consequently, new problems of designing and conducting experiments and of developing appropriate statistical tools can be identified. Some research questions concerning methodologies for experimental analysis of algorithms are sketched below. See [7] for more discussion.

How to Escape the Artifact? We study a perfect abstract object (the algorithm) using an imperfect measuring device (the computer). Difficulties associated with numerical precision, nonrandomness in pseudo-random number generators, and inaccurate measuring tools can produce spurious and erroneous results.


gram running times are notoriously quirky, and in while the O(n) with huge coeffi- some contexts it can be quite difficult to find a de- cients took only 32 minutes. finition of “cost” that is useful, measurable, and This and other experiences reported in the lit- generalizable. New techniques are needed for iden- erature suggest some rough guidelines about the tifying and minimizing the impact of artifact on relative importance of algorithmic and environ- observation. mental factors: In typical scenarios changes in en- Sampling from Large and Infinite Spaces. It is vironment (such as switching computers, compiler not possible to measure the performance of a optimization level, or operating system; or im- given graph-coloring algorithm over the space of proving programmer skills) can affect running all graphs of a given size n, nor is it always feasi- times by factors in ranges between 2 and 100. Im- ble to sample from the space of all colorings of a provements on a larger scale can be obtained given graph. We need good methods for defining through asymptotic improvements to algorithms, and generating input classes and input samples to by factors of O(n) or more, and algorithmic im- provements by factors below, say, O(log n) are not support observations that can be generalized out- likely to have much real impact on performance. side the realm of observation. For example, very lit- Continuing research in the new and rapidly de- tle is known about methods for extrapolating from veloping field of algorithm experimentation will a finite data set to obtain the kind of asymptotic allow us to refine these guidelines, to produce bet- statements about complexity classes (e.g., does 2 ter understanding of environmental effects, and to the data come from a function that is O(n )?) that generate better algorithms and tighter analyses. are of interest to computer scientists. Acknowledgment. I thank the editor, Harold New Data Analysis Techniques. Computational Boas, and an anonymous reviewer for several sug- experiments on algorithms can produce enormous gestions for improving the clarity of this article. data sets. Some sets are too massive to be amenable to standard statistical techniques, so we must References either make do with fewer measurements or [1] S. BAASE and A. VAN GELDER, Computer Algorithms: In- develop faster techniques. Computational experi- troduction to Design and Analysis, Third Edition, Ad- ments can be very interactive and iterative, and it dison-Wesley, 2000. A very readable undergraduate seems likely that new methods of data analysis can textbook. be developed to take advantage of these [2] J. L. BENTLEY, Programming Pearls, Second Edition, ACM Press and Addison-Wesley, 2000. Short articles properties. about the interface between theory and practice in . How Important Are Algorithms? [3] M. R. GAREY and D. S. JOHNSON, Computers and In- As the discussion in this article suggests, suffi- tractability: A Guide to the Theory of NP-Complete- ciently precise results for all but the simplest al- ness, W. H. Freeman and Company, 1979. A classic gorithms, input models, and models of computa- introduction to the field. tion are quite difficult to obtain by purely analytical [4] D. S. JOHNSON and M. TRICK, eds., Cliques, Coloring, methods. With our current techniques, asymptotic and Satisfiability: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics analyses of algorithms are just barely adequate for and Theoretical Computer Science, vol. 26, Amer. 
making very rough predictions about program run- Math. Soc., 1996. Papers presented at the workshop. ning times. Some researchers have wondered if [5] D. E. KNUTH, The Art of Computer Programming, Vol- we would be better off discarding our analytical ume 3: Sorting and Searching, Addison-Wesley, 1973. models and building new empirical models based Classic results in algorithm analysis. entirely on observations. [6] A. LAMARCA and R. E. LADNER, The influence of caches In an experiment to test the relevance of big-oh on the performance of sorting, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algo- algorithm analysis to practice, Bentley [2] describes rithms, ACM, 1997, pp. 370–379. A very readable a race between two algorithms implemented under technical paper. extreme conditions. He implemented an O(n) al- [7] C. MCGEOCH, Towards an experimental method for gorithm (under the RAM model) using 1980’s algorithm simulation, INFORMS J. Comput. 8 (Win- low-end technology, which required 19.5n ter 1995). A survey of methodological issues, with milliseconds to run, and an O(n3) algorithm in an extensive bibliography. high-end 1999 technology, which required 0.58n3 nanoseconds to run. At small n the cubic algorithm dominates, and at large n the linear algorithm dominates: Bentley found that even with this max- imal difference in computing environments, the crossover point was low, near n =5, 800, where both implementations required around 2 minutes of running time. At n =105 the O(n3) implemen- tation with small coefficients required 7 days,
