Average-Case Analysis of Algorithms and Data Structures L’Analyse En Moyenne Des Algorithmes Et Des Structures De Donn�Ees

(page i) Average-Case Analysis of Algorithms and Data Structures L'analyse en moyenne des algorithmes et des structures de donn¶ees Je®rey Scott Vitter 1 and Philippe Flajolet 2 Abstract. This report is a contributed chapter to the Handbook of Theoretical Computer Science (North-Holland, 1990). Its aim is to describe the main mathematical methods and applications in the average-case analysis of algorithms and data structures. It comprises two parts: First, we present basic combinatorial enumerations based on symbolic methods and asymptotic methods with emphasis on complex analysis techniques (such as singularity analysis, saddle point, Mellin transforms). Next, we show how to apply these general methods to the analysis of sorting, searching, tree data structures, hashing, and dynamic algorithms. The emphasis is on algorithms for which exact \analytic models" can be derived. R¶esum¶e. Ce rapport est un chapitre qui para^³t dans le Handbook of Theoretical Com- puter Science (North-Holland, 1990). Son but est de d¶ecrire les principales m¶ethodes et applications de l'analyse de complexit¶e en moyenne des algorithmes. Il comprend deux parties. Tout d'abord, nous donnons une pr¶esentation des m¶ethodes de d¶enombrements combinatoires qui repose sur l'utilisation de m¶ethodes symboliques, ainsi que des techniques asymptotiques fond¶ees sur l'analyse complexe (analyse de singularit¶es, m¶ethode du col, transformation de Mellin). Ensuite, nous d¶ecrivons l'application de ces m¶ethodes g¶enerales a l'analyse du tri, de la recherche, de la manipulation d'arbres, du hachage et des algorithmes dynamiques. L'accent est mis dans cette pr¶esentation sur les algorithmes pour lesquels existent des hh modeles analytiques ii exacts. 1 Dept. of Computer Science, Brown University, Providence, R. I. 02912, USA. Research was also done while the author was on sabbatical at INRIA in Rocquencourt, France, and at Ecole Normale Sup¶erieure in Paris. Support was provided in part by NSF research grant DCR{84{ 03613, by an NSF Presidential Young Investigator Award with matching funds from an IBM Faculty Development Award and an AT&T research grant, and by a Guggenheim Fellowship. 2 INRIA, Domaine de Voluceau, Rocquencourt, B. P. 105, 78153 Le Chesnay, France. i (page ii) Table of Contents 0. Introduction . 1 1. Combinatorial Enumerations . 5 1.1. Overview . 5 1.2. Ordinary Generating Functions . 7 1.3. Exponential Generating Functions . 10 1.4. From Generating Functions to Counting . 13 2. Asymptotic Methods . 15 2.1. Generalities . 15 2.2. Singularity Analysis . 17 2.3. Saddle Point Methods . 20 2.4. Mellin Transforms . 22 2.5. Limit Probability Distributions . 25 3. Sorting Algorithms . 28 3.1. Inversions . 28 3.2. Insertion Sort . 31 3.3. Shellsort . 31 3.4. Bubble Sort . 37 3.5. Quicksort . 39 3.6. Radix-Exchange Sort . 39 3.7. Selection Sort and Heapsort . 40 3.8. Merge Sort . 41 4. Recursive Decompositions and Algorithms on Trees . 43 4.1. Binary Trees and Plane Trees . 43 4.2. Binary Search Trees . 51 4.3. Radix-Exchange Tries . 57 4.4. Digital Search Trees . 60 ii 5. Hashing and Address Computation Techniques . 62 5.1 Bucket Algorithms and Hashing by Chaining . 63 5.2 Hashing by Open Addressing . 76 6. Dynamic Algorithms . 81 6.1. Integrated Cost and the History Model . 81 6.2. Size of Dynamic Data Structures . 83 6.3. Set Union-Find Algorithms . 87 References . 90 Index and Glossary . 97 iii (page 1) Average-Case Analysis of Algorithms and Data Structures Je®rey Scott Vitter and Philippe Flajolet 0. Introduction Analyzing an algorithm means, in its broadest sense, characterizing the amount of com- putational resources that an execution of the algorithm will require when applied to data of a certain type. Many algorithms in classical mathematics, primarily in number theory and analysis, were analyzed by eighteenth and nineteenth century mathematicians. For instance, Lam¶e in 1845 showed that Euclid's GCD algorithm requires at most logÁ n di- vision steps (where Á is the \golden ratio" (1 + p5 )=2) when applied to numbers¼ bounded by n. Similarly, the well-known quadratic convergence of Newton's method is a way of describing its complexity/accuracy tradeo®. This chapter presents analytic methods for average-case analysis of algorithms, with special emphasis on the main algorithms and data structures used for processing nonnu- merical data. We characterize algorithmic solutions to a number of essential problems, such as sorting, searching, pattern matching, register allocation, tree compaction, retrieval of multidimensional data, and e±cient access to large ¯les stored on secondary memory. The ¯rst step required to analyze an algorithm is to de¯ne an input data model and a complexity measure: A 1. Assume that the input to is data of a certain type. Each commonly used data type carries a natural notion ofAsize: the size of an array is the number of its elements; the size of a ¯le is the number of its records; the size of a character string is its length; and so on. An input model is speci¯ed by the subset In of inputs of size n and by a probability distribution over In, for each n. For example, a classical input model for comparison-based sorting algorithms is to assume that the n inputs are real numbers independently and uniformly distributed in the unit interval [ 0; 1]. An equivalent model is to assume that the n inputs form a random permutation of 1; 2; : : : ; n . f g 2. The main complexity measures for algorithms executed on sequential machines are time utilization and space utilization. These may be either \raw" measures (such as the time in nanoseconds on a particular machine or the number of bits necessary for storing temporary variables) or \abstract" measures (such as the number of comparison operations or the number of disk pages accessed). 2 / Average-Case Analysis of Algorithms and Data Structures Figure 1. Methods used in the average-case analysis of algorithms. Let us consider an algorithm with complexity measure ¹. The worst-case and best-case complexities of algorithm A over I are de¯ned in an obvious way. Determining A n the worst-case complexity requires constructing extremal con¯gurations that force ¹n, the restriction of ¹ to In, to be as large as possible. The average-case complexity is de¯ned in terms of the probabilistic input model: ¹ [ ] = E ¹ [ ] = k Pr ¹ [ ] = k ; n A f n A g f n A g Xk where E : denotes expected value and Pr : denotes probability with respect to the f g f g probability distribution over In. Frequently, In is a ¯nite set and the probabilistic model assumes a uniform probability distribution over I . In that case, ¹ [ ] takes the form n n A 1 ¹n[ ] = kJnk; A In Xk where In = In and Jnk is the number of inputs of size n with complexity k for algorithm . Average-casej analysisj then reduces to combinatorial enumeration. A The next step in the analysis is to express the complexity of the algorithm in terms of standard functions like n®(log n)¯(log log n) , where ®, ¯, and are constants, so that the analytic results can be easily interpreted. This involves getting asymptotic estimates. The following steps give the route followed by many of the average-case analyses that appear in the literature (see Figure 1): 1. RECUR: To determine the probabilities or the expectations in exact form, start by setting up recurrences that relate the behavior of algorithm on inputs of size n to its behavior on smaller (and similar) inputs. A 2. SOLVE: Solve previous recurrences explicitly using classical algebra. 3. ASYMPT: Use standard real asymptotics to estimate those explicit expressions. Section 0. Introduction / 3 An important way to solve recurrences is via the use of generating functions: 4. GENFUN: Translate the recurrences into equations involving generating functions. The coe±cient of the nth term of each generating function represents a particular probability or expectation. In general we obtain a set of functional equations. 5. EXPAND: Solve those functional equations using classical tools from algebra and analysis, then expand the solutions to get the coe±cients in explicit form. The above methods can often be bypassed by the following more powerful methods, which we emphasize in this chapter: 6. SYMBOL: Bypass the use of recurrences and translate the set-theoretic de¯nitions of the data structures or underlying combinatorial structures directly into functional equations involving generating functions. 7. COMPLEX: Use complex analysis to translate the information available from the functional equations directly into asymptotic expressions of the coe±cients of the generating functions. The symbolic method (SYMBOL) is often direct and has the advantage of characterizing the special functions that arise from the analysis of a natural class of related algorithms. The COMPLEX method provides powerful tools for direct asymptotics from generating functions. It has the intrinsic advantage in many cases of providing asymptotic estimates of the coe±cients of functions known only implicitly from their functional equations. In Sections 1 and 2 we develop general techniques for the mathematical analysis of algorithms, with emphasis on the SYMBOL and COMPLEX methods. Section 1 is devoted to exact analysis and combinatorial enumeration. We present the primary methods used to obtain counting results for the analysis of algorithms, with emphasis on symbolic methods (SYMBOL). The main mathematical tool we use is the generating function associated with the particular class of structures. A rich set of combinatorial constructions translates directly into functional relations involving the generating functions. In Section 2 we discuss asymptotic analysis. We briey review methods from elementary real analysis and then concentrate on complex analysis techniques (COMPLEX ).

Average-Case Analysis of Algorithms and Data Structures L’Analyse En Moyenne Des Algorithmes Et Des Structures De Donn�Ees

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support