Pearls of Functional Algorithm Design

Total Page:16

File Type:pdf, Size:1020Kb

Pearls of Functional Algorithm Design This page intentionally left blank PEARLS OF FUNCTIONAL ALGORITHM DESIGN In Pearls of Functional Algorithm Design Richard Bird takes a radically new approach to algorithm design, namely design by calculation. The body of the text is divided into 30 short chapters, called pearls, each of which deals with a partic- ular programming problem. These problems, some of which are completely new, are drawn from sources as diverse as games and puzzles, intriguing combinatorial tasks and more familiar algorithms in areas such as data compression and string matching. Each pearl starts with the statement of the problem expressed using the functional programming language Haskell, a powerful yet succinct language for capturing algorithmic ideas clearly and simply. The novel aspect of the book is that each solution is calculated from the problem statement by appealing to the laws of functional programming. Pearls of Functional Algorithm Design will appeal to the aspiring functional programmer, students and teachers interested in the principles of algorithm design, and anyone seeking to master the techniques of reasoning about programs in an equational style. RICHARD BIRD is Professor of Computer Science at Oxford University and Fellow of Lincoln College, Oxford. PEARLS OF FUNCTIONAL ALGORITHM DESIGN RICHARD BIRD University of Oxford CAMBRIDGE UNIVERSITY PRESS Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo Cambridge University Press The Edinburgh Building, Cambridge CB2 8RU, UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521513388 © Cambridge University Press 2010 This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2010 ISBN-13 978-0-511-90044-0 eBook (EBL) ISBN-13 978-0-521-51338-8 Hardback Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate. Dedicated to my wife, Norma. Contents Preface page ix 1 The smallest free number 1 2 A surpassing problem 7 3 Improving on saddleback search 12 4 A selection problem 21 5 Sorting pairwise sums 27 6 Making a century 33 7 Building a tree with minimum height 41 8 Unravelling greedy algorithms 50 9 Finding celebrities 56 10 Removing duplicates 64 11 Not the maximum segment sum 73 12 Ranking suffixes 79 13 The Burrows–Wheeler transform 91 14 The last tail 102 15 All the common prefixes 112 16 The Boyer–Moore algorithm 117 17 The Knuth–Morris–Pratt algorithm 127 18 Planning solves the Rush Hour problem 136 19 A simple Sudoku solver 147 20 The Countdown problem 156 21 Hylomorphisms and nexuses 168 22 Three ways of computing determinants 180 23 Inside the convex hull 188 vii viii Contents 24 Rational arithmetic coding 198 25 Integer arithmetic coding 208 26 The Schorr–Waite algorithm 221 27 Orderly insertion 231 28 Loopless functional algorithms 242 29 The Johnson–Trotter algorithm 251 30 Spider spinning for dummies 258 Index 275 Preface In 1990, when the Journal of Functional Programming (JFP) was in the stages of being planned, I was asked by the then editors, Simon Peyton Jones and Philip Wadler, to contribute a regular column to be called Func- tional Pearls. The idea they had in mind was to emulate the very successful series of essays that Jon Bentley had written in the 1980s under the title “Programming Pearls” in the Communications of the ACM. Bentley wrote about his pearls: Just as natural pearls grow from grains of sand that have irritated oysters, these programming pearls have grown from real problems that have irritated program- mers. The programs are fun, and they teach important programming techniques and fundamental design principles. I think the editors had asked me because I was interested in the specific task of taking a clear but inefficient functional program, a program that acted as a specification of the problem in hand, and using equational reasoning to calculate a more efficient one. One factor that stimulated growing interest in functional languages in the 1990s was that such languages were good for equational reasoning. Indeed, the functional language GOFER, invented by Mark Jones, captured this thought as an acronym. GOFER was one of the languages that contributed to the development of Haskell, the language on which this book is based. Equational reasoning dominates everything in this book. In the past 20 years, some 80 pearls have appeared in the JFP, together with a sprinkling of pearls at conferences such as the International Confer- ence of Functional Programming (ICFP) and the Mathematics of Program Construction Conference (MPC). I have written about a quarter of them, but most have been written by others. The topics of these pearls include interesting program calculations, novel data structures and small but elegant domain-specific languages embedded in Haskell and ML for particular applications. ix x Preface My interest has always been in algorithms and their design. Hence the title of this book is Pearls of Functional Algorithm Design rather than the more general Functional Pearls. Many, though by no means all, of the pearls start with a specification in Haskell and then go on to calculate a more efficient version. My aim in writing these particular pearls is to see to what extent algorithm design can be cast in a familiar mathematical tradition of calculating a result by using well-established mathematical principles, the- orems and laws. While it is generally true in mathematics that calculations are designed to simplify complicated things, in algorithm design it is usually the other way around: simple but inefficient programs are transformed into more efficient versions that can be completely opaque. It is not the final program that is the pearl; rather it is the calculation that yields it. Other pearls, some of which contain very few calculations, are devoted to trying to give simple explanations of some interesting and subtle algorithms. Explain- ing the ideas behind an algorithm is so much easier in a functional style than in a procedural one: the constituent functions can be more easily separated, they are brief and they capture powerful patterns of computation. The pearls in this book that have appeared before in the JFP and other places have been polished and repolished. In fact, many do not bear much resemblance to the original. Even so, they could easily be polished more. The gold standard for beauty in mathematics is Proofs from The Book by Aigner and Ziegler (third edition, Springer, 2003), which contains some per- fect proofs for mathematical theorems. I have always had this book in mind as an ideal towards which to strive. About a third of the pearls are new. With some exceptions, clearly indi- cated, the pearls can be read in any order, though the chapters have been arranged to some extent in themes, such as divide and conquer, greedy algor- ithms, exhaustive search and so on. There is some repetition of material in the pearls, mostly concerning the properties of the library functions that we use, as well as more general laws, such as the fusion laws of various folds. A brief index has been included to guide the reader when necessary. Finally, many people have contributed to the material. Indeed, several pearls were originally composed in collaboration with other authors. I would like to thank Sharon Curtis, Jeremy Gibbons, Ralf Hinze, Geraint Jones and Shin-Cheng Mu, my co-authors on these pearls, for their kind generosity in allowing me to rework the material. Jeremy Gibbons read the final draft and made numerous useful suggestions for improving the presentation. Some pearls have also been subject to close scrutiny at meetings of the Algebra of Programming research group at Oxford. While a number of flaws and errors have been removed, no doubt additional ones have been introduced. Apart Preface xi from those mentioned above, I would like to thank Stephen Drape, Tom Harper, Daniel James, Jeffrey Lake, Meng Wang and Nicholas Wu for many positive suggestions for improving presentation. I would also like to thank Lambert Meertens and Oege de Moor for much fruitful collaboration over the years. Finally, I am indebted to David Tranah, my editor at Cambridge University Press, for encouragement and support, including much needed technical advice in the preparation of the final copy. Richard Bird 1 The smallest free number Introduction Consider the problem of computing the smallest natural number not in a given finite set X of natural numbers. The problem is a simplification of a common programming task in which the numbers name objects and X is the set of objects currently in use. The task is to find some object not in use, say the one with the smallest name. The solution to the problem depends, of course, on how X is represented. If X is given as a list without duplicates and in increasing order, then the solution is straightforward: simply look for the first gap. But suppose X is given as a list of distinct numbers in no particular order. For example, [08, 23, 09, 00, 12, 11, 01, 10, 13, 07, 41, 04, 14, 21, 05, 17, 03, 19, 02, 06] How would you find the smallest number not in this list? It is not immediately clear that there is a linear-time solution to the problem; after all, sorting an arbitrary list of numbers cannot be done in linear time.
Recommended publications
  • AFP Assignments (Version: October 16, 2015)
    AFP Assignments (version: October 16, 2015) Atze Dijkstra (and Andres Loh,¨ Doaitse Swierstra, and others) Summer - Fall 2015 Contents 1 Beginners exercises 9 1.1 Beginners training, brief (*) ..........................9 1.1.1 Hello, world! . .9 1.1.2 Interaction with the outside world . 10 1.1.3 The exercise . 12 1.2 Beginners training, extensive (*) ....................... 13 1.2.1 Getting started with GHCi . 13 1.2.2 Basic arithmetic . 14 1.2.3 Booleans . 15 1.2.4 Strings . 16 1.2.5 Types . 18 1.2.6 Lists . 19 1.2.7 Tuples . 22 1.2.8 Currying . 22 1.2.9 Overloading . 25 1.2.10 Numeric types and their classes . 26 1.2.11 Printing values . 28 1.2.12 Equality . 29 1.2.13 Enumeration . 30 1.2.14 Defining new functions . 31 1.2.15 Anonymous functions . 32 1.2.16 Higher-order functions . 34 1.2.17 Operator sections . 36 1.2.18 Loading modules . 37 2 Smaller per topic exercises 39 2.1 Tooling (*) .................................... 39 2.2 Programming . 39 2.3 Monads . 47 2.4 Programming jointly with types and values . 50 2.5 Programming with classes . 53 2.6 Type extensions . 56 2.7 Performance . 58 2.8 Observing: performance, testing, benchmarking . 59 2.9 Reasoning (inductive, equational) . 61 3 Contents 2.10 IO, Files, Unsafety, and the rest of the world (∗∗) ............. 62 2.10.1 IO Unsafety (1) . 62 2.10.2 Server . 63 2.10.3 IO Unsafety (2) . 63 2.11 Generic Programming (***) ......................... 64 2.12 Lists (*) ..................................... 64 2.13 Trees (*) ....................................
    [Show full text]
  • Arxiv:1908.11105V1 [Cs.DS] 29 Aug 2019 the first Two Types of Operations Are Called Updates and the Last One Is a Query
    FunSeqSet: Towards a Purely Functional Data Structure for the Linearisation Case of Dynamic Trees Problem ? Juan Carlos S´aenz-Carrasco The University of Sheffield, S10 2TN UK [email protected] Abstract. Dynamic trees, originally described by Sleator and Tarjan, have been studied deeply for non persistent structures providing O(log n) time for update and lookup operations as shown in theory and practice by Werneck. However, discussions on how the most common dynamic trees operations (i.e. link and cut) are computed over a purely functional data structure have not been studied. Even more, asking whether vertices u and v are connected (i.e. within the same forest) assumes that corre- sponding indices or locations for u and v are taken for granted in most of the literature, and not performed as part of the whole computation for such a question. We present FunSeqSet, based on the primitive version of finger trees, i.e. the de facto sequence data structure for the purely functional programming language Haskell, augmented with variants of the collection (i.e. sets) data structures in order to manage efficiently k-ary trees for the linearisation case of the dynamic trees problem. Dif- ferent implementations are discussed, and the performance is measured. Keywords: purely functional data structures · finger trees · dynamic trees · Euler-tour trees · Haskell 1 Introduction A dynamic tree allows three kinds of (basic) operations : { Insert an edge. { Delete an edge. { Answer a question related to the maintained forest property. arXiv:1908.11105v1 [cs.DS] 29 Aug 2019 The first two types of operations are called updates and the last one is a query.
    [Show full text]
  • Algebras for Tree Algorithms
    t r l o L 0' :.;:. ."'\ If) OJ O. _\ II L U •• 0 o U fj n .- - CL _ :03:'" 0 0" :J 'E ,l;! d Algebras for Tree Algorithms ] eremy Gibbons Linacre College, Oxford A thesis submitted in p"rtia] fulfilment of the requirements for the degree of Doctor of Philosophy "t the University of Oxford September 1991 '\cchnical Monogr<lph PRG-94 ISBN 0-902928,72-4 Programming Research Group Oxford University Computing Laboratory 11 Keblc Road Oxford OX I 3QD England Copyright © 199\ Jeremy Gibbons Author's current address: Department of CompLller Science Univcrsily ofAuckland Privale Bag 92019 Auckland New Zealand Electronic mail: [email protected] 'I\1irJnda' is a trademark of Research Software Ltd. Abstract This thesis presents an investigation into the properties ofvarious alge- bras of trees. In particular, we study the influence that the structure of a tree algebr<l has on the solution ofalgorithmic problems about trees in that algebra. The investigation 1S conducted within the framework pro- vided by the nird-Mccrtcns formalism, a calculus for the construction ofprogT<lms by equation.al reasoning from their specifications. \Vc present three difTcrcnt tree algebras: two kinds ofbinary tree ami a kind of general tree. One of the binary tree algebras, called 'hip trees', is nc'-\'. Instead ofbeing: built with a single ternary operator, hip trees are built with two bjnary operators which respectively add left and right children to trees which do not already have them; these operators enjoy a kind of associativity property. Each of these algebras brings with it with a class of 'structure- respecting' [unctions called catamorphisms; the definition ofa catamor- phism and a number ofits properties come for free from the definition of the algcl)l'a, bcca usc the algebra is chosen to be initial in a class of algebras induced by a (cocontilluous) functor.
    [Show full text]
  • On the Implementation of Purely Functional Data Structures for the Linearisation Case of Dynamic Trees
    On the Implementation of Purely Functional Data Structures for the Linearisation case of Dynamic Trees By: Juan Carlos Sáenz-Carrasco A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy The University of Sheffield Faculty of Engineering Department of Computer Science October 2019 Abstract Dynamic trees, originally described by Sleator and Tarjan, have been studied in detail for non persistent structures providing O(log n) time for update and lookup operations as shown in theory and practice by Werneck. However, there are two gaps in current theory. First, how the most com- mon dynamic tree operations (link and cut) are computed over a purely functional data structure has not been studied in detail. Second, even in the imperative case, when checking whether two vertices u and v are connected (i.e. in the same component), it is taken for granted that the corresponding location indices (i.e. pointers, which are not allowed in purely functional pro- gramming) are known a priori and do not need to be computed, yet this is rarely the case in practice. In this thesis we address these omissions by formally introducing two new data structures, Full and Top, which we use to represent trees in a functionally efficient manner. Based on a primitive version of finger trees – the de facto sequence data structure for the purely lazy-evaluation program- ming language Haskell – they are augmented with collection (i.e. set-based) data structures in order to manage efficiently k-ary trees for the so-called linearisation case of the dynamic trees problem.
    [Show full text]
  • Discovering Non-Binary Hierarchical Structures with Bayesian Rose Trees
    1 Discovering Non-binary Hierarchical Structures with Bayesian Rose Trees Charles Blundell,y Yee Whye Teh,y and Katherine A. Hellerx yGatsby Unit, UCL, UK. xDepartment of Engineering, University of Cambridge, UK. 1.1 Introduction Rich hierarchical structures are common across many disciplines, making the discovery of hierarchies a fundamental exploratory data analysis and unsupervised learning problem. Applications with natural hierarchical structure include topic hierarchies in text (Blei et al. 2010), phylogenies in evolutionary biology (Felsenstein 2003), hierarchical community structures in social networks (Girvan and Newman 2002), and psychological taxonomies (Rosch et al. 1976). A large variety of models and algorithms for discovering hierarchical structures have been proposed. These range from the traditional linkage algorithms based on distance metrics between data items (Duda and Hart 1973), to maximum parsimony and maximum likelihood methods in phylogenetics (Felsenstein 2003), to fully Bayesian approaches that compute posterior distributions over hierarchical structures (e.g. Neal 2003). We will review some of these in Section 1.2. A common feature of many of these methods is that their hypothesis spaces are restricted to binary trees, where each internal node in the hierarchical structure has exactly two children. This restriction is reasonable under certain circumstances, and is a natural output of the popular agglomerative approaches to discovering hierarchies, where each step involves the merger of two clusters of data items
    [Show full text]
  • Red-Black Trees Important: There Are Two Questions N:O 6 – One About AVL Trees and One About Red-Black Trees
    Data structures and algorithms Re-examination Solution suggestions Thursday, 2020-08-20, 8:30–12:30, in Canvas and Zoom Examiner(s) Peter Ljunglöf, Nick Smallbone, Pelle Evensen. Allowed aids Course literature, other books, the internet. You are not allowed to discuss the problems with anyone! Submitting Submit your answers in one single PDF or Word or OpenOffice document. ​ ​ You can write your answers directly into the red boxes in this file. Pen & paper Some questions are easier to answer using pen and paper, which is fine! In that case, take a photo of your answer and paste into your answer document. Make sure the photo is readable! Assorted notes You may answer in English or Swedish. Excessively complicated answers might be rejected. Write legibly – we need to be able to read and understand your answer! Exam review If you want to discuss the grading, please contact Peter via email. There are 6 basic, and 3 advanced questions, and two points per question. So, the highest possible mark is 18 in total (12 from the basic and 6 from the advanced). Here is what you need for each grade: ● To pass the exam, you must get 8 out of 12 on the basic questions. ​ ​ ● To get a four, you must get 9 out of 12 on the basic questions, plus 2 out of 6 on the advanced. ​ ​ ● To get a five, you must get 10 out of 12 on the basic questions, plus 4 out of 6 on the advanced. ​ ​ ● To get VG (DIT181), you must get 9 out of 12 on the basic, plus 3 out of 6 on the advanced.
    [Show full text]
  • Strike Legal Battle Stalled for Week-End
    M1DDLET0WN- BAYSHORE EDITION VOLUME LXXHI NO. 4» KMl MIDDLETOWN, N. J., FRIDAY, OCTOBER 33. 1»M 7c PER COPY PAGE ONI Objectors Hint Law Suits Strike Legal If Eatontown Adopts Code Battle Stalled EATOMTOWN- law No* New, Amymmy mr tto tf propond No Rt. 35 Widening t a public hairing last night to- For Week-end rW *U Mittttflff 01 dOM w9 Nodedtioawitreactodoatto PHILADELPHIA (AP>-Tto HetJ strike, already measure but a final vote kt ex- w of dollars, moved mto it* Mitt day today aa pected Tuesday when me toar- • mtattty of tto government*! bach to work h> Seen in Red Bank _ wUI to continued at an ad- Ha widening a» Rt II ia Red ouraed meeting at I p. m. ia Baak ia contemplated la tte aetr Cloeftg Dtough HalL Tto legal truce, prompted by arguments yesterday oa me af future, a State Highway Depart- Lawyers _ _ _ _ unctton appeal, At 2 A. M. Sunday owners ia tto araa around tto earry next week. 1 Rt JMtt. II traffic circle draw "Mtpla Aw. (aa it ia kamm Hue year ! Dagftgtt Saving Tto meh> •mm•_—jv- PVJtTtmtfm^mtmA J mmV mmmmVmimmmtmmMt BJtjwajmam* , Castro ia Had Baak). without putt* ttoy calltd tto ordt- is captito ol carryiag four KmMMva at 1 a. as. ammty. •••• "iMtrrlm amsimmmsl BBfam mm mam mmVmmmfl BmmCmC of traffic If, aad when. H lutory" Md '•unreteonibie" aad MAKfN* A POINT— Freeholder Director Joteph C Irwin «a«hire( at he imwm aeceteary." thha k to carry their cheats' Speech without wiosni&g. farther tt tto ordl- a question at lait nlght't candidate*' meeting In Shrawtmiry.
    [Show full text]
  • COSC 3015: Lecture 16
    COSC 3015: Lecture 16 Lectured and scribed by Sunil Kothari 21 October 2008 1 Review We looked at a couple of trees and their corresponding datatypes. For instance, the binary trees are given as: data Btree a = Leaf a | Fork (Btree a) (Btree a) and the binary search trees are given as: data (Ord a) => Stree a = Null | Fork (Stree a) a (Stree a) The induction principle for the Stree is: (P (Null) ^ 8x : a; 8t1; t2 : (Stree a);P (t1) ^ P (t2) ) P (F ork t1 x t2)) ) 8t : Stree a; P (t). Note that we haven't taken into account the constraint on the types. But if we do consider the constraint, the induction principle is: 8 a : T ype; (Ord a) ) [(P (Null) ^ 8x : a; 8t1; t2 : (Stree a);P (t1) ^ P (t2) ) P (F ork t1 x t2))] ) 8t : Stree a; P (t). Today, we will look at two different types of trees: • Binary heap tree • Rose tree 1.1 Binary Heap Tree The binary search tree is given by: data (Ord a) => Htree a = Null | Fork a (Htree a) (Htree a) This looks familiar. In fact, it is very similar to the way a Stree is defined. The only difference is the position of the label in the Stree. In Htree, the label on a non-null node is given before the two subtrees. 1 The book says that the Htree has to satisfy the condition that the label on a given node cannot be greater than label on any subtree of the node. Actually, depending upon the order there are two kinds of binary heap trees: min and max.
    [Show full text]