Functional Data Structures for Typed Racket

Functional Data Structures for Typed Racket Hari Prashanth K R Sam Tobin-Hochstadt Northeastern University Northeastern University [email protected] [email protected] Abstract supports untagged rather than disjoint unions. Thus, most data Scheme provides excellent language support for programming in structures presented here are implemented as unions of several dis- a functional style, but little in the way of library support. In this tinct structure types. paper, we present a comprehensive library of functional data structures, drawing from several sources. We have implemented the li- 1.2 Interoperation with External Code brary in Typed Racket, a typed variant of Racket, allowing us to Most Scheme programs are neither purely functional nor typed. maintain the type invariants of the original definitions. That does not prevent them from benefiting from the data structures presented in this paper, however. Typed Racket automatically 1. Functional Data Structures for a Functional supports interoperation between typed and untyped code, allowing any program to use the data structures presented here, regardless of Language whether it is typed. Typed Racket does however enforce its type in- Functional programming requires more than just lambda; library variants via software contracts, which can reduce the performance support for programming in a functional style is also required. of the structures. In particular, efficient and persistent functional data structures are Additionally, using these data structures in no way requires needed in almost every program. programming in a purely functional style. An mostly-functional Scheme does provide one supremely valuable functional data Scheme program that does not mutate a list can replace that list structure—the linked list. This is sufficient to support many forms with a VList without any problem. Using functional data structures of functional programming (Shivers 1999) (although lists are sadly often adds persistence and performance without subtracting func- mutable in most Schemes), but not nearly sufficient. To truly sup- tionality. port efficient programming in a functional style, additional data structures are needed. Fortunately, the last 15 years have seen the development of 2. An Introduction to Functional Data Structures many efficient and useful functional data structures, in particular Purely functional data structures, like all data structures, come in by Okasaki (1998) and Bagwell (2002; 2000). These data structures many varieties. For this work, we have selected a variety that pro- have seen wide use in languages such as Haskell and Clojure, but vide different APIs and performance characteristics. They include have rarely been implemented in Scheme. several variants of queues, double-ended queues (or deques), pri- In this paper, we present a comprehensive library of efficient ority queues (or heaps), lists, hash lists, tries, red-black trees, and functional data structures, implemented in Typed Racket (Tobin- streams. All of the implemented data structures are polymorphic in Hochstadt and Felleisen 2008), a recently-developed typed dialect their element type. of Racket (formerly PLT Scheme). The remainder of the paper is The following subsections describe each of these data struc- organized as follows. We first present an overview of Typed Racket, tures, many with a number of different implementations with dis- and describe how typed functional datastructures can interoperate tinct performance characteristics. Each subsection introduces the with untyped, imperative code. Section 2 describes the data struc- data structure, specifies its API, provides examples, and then dis- tures, with their API and their performance characteristics. In sec- cusses each implementation variant and their performance charac- tion 3, we present benchmarks demonstrating that our implemena- teristics. tions are viable for use in real code. We then detail the experience of using Typed Racket for this project, both positive and negative. 2.1 Queues Finally, we discuss other implementations and conclude. Queues are simple “First In First Out” (FIFO) data structures. We 1.1 An Overview of Typed Racket have implemented a wide variety of queues, each with the interface given below. Each queue implementation provides a polymorphic Typed Racket (Tobin-Hochstadt and Felleisen 2008; Tobin-Hochstadt type (QueueA) , as well as the following functions: 2010) is an explicitly typed dialect of Scheme, implemented in Racket (Flatt and PLT 2010). Typed Racket supports both integra- • queue : (8 (A)A* ! (QueueA)) tion with untyped Scheme code as well as a typechecker designed to work with idiomatic Scheme code. Constructs a queue with the given elements in order. In the While this paper presents the API of the functional data struc- queue type signature, 8 is a type constructor used for poly- tures, rather than their implementation, we begin with a brief de- morphic types, binding the given type variables, here A. The scription of a few key features of the type system. function type constructor ! is written infix between arguments First, Typed Racket supports explicit polymorphism, which is and results. The annotation * in the function type specifies that used extensively in the functional data structure library. Type argu- queue accepts arbitrarily many elements of type A, producing ments to polymorphic functions are automatically inferred via lo- a queue of type (QueueA) . cal type inference (Pierce and Turner 2000). Second, Typed Racket • enqueue : (8 (A)A(QueueA) ! (QueueA)) Inserts the given element (the first argument) into the given size is used to implement a data structure which can handle data of queue (the second argument), producing a new queue. unbounded size. Bootstrapped Queues give a worst-case running ∗ 1 • head : (8 (A)(QueueA) ! A) time of O(1) for the operation head and O(log n) for tail and enqueue. Our implementation of Bootstrapped Queues uses Returns the first element in the queue. The queue is unchanged. Real-Time Queues for bootstrapping. • tail : (8 (A)(QueueA) ! (QueueA)) Hood-Melville Queue Hood-Melville Queues are similar to the Removes the first element from the given queue, producing a Real-Time Queues in many ways, but use a different and more com- new queue. plex technique, called global rebuilding, to eliminate amortization from the complexity analysis. In global rebuilding, rebalancing is >(define que(queue -101234)) done incrementally, a few steps of rebalancing per normal opera- > que tion on the data structure. Hood-Melville Queues have worst-case - : (Queue Fixnum) running times of O(1) for the operations head, tail and en- #<Queue> queue. >(head que) - : Fixnum 2.2 Deque -1 >(head(tail que)) Double-ended queues are also known as deques. The difference - : Fixnum between the queues and the deques lies is that new elements of 0 a deque can be inserted and deleted from either end. We have >(head(enqueue 10 que)) implemented several deque variants, each discussed below. All the - : Fixnum deque data structures implement following interface and have the -1 type (DequeA) . Banker’s Queues The Bankers Queues (Okasaki 1998) are amor- • deque : (8 (A)A* ! (DequeA)) tized queues obtained using a method of amortization called the Constructs a double ended queue from the given elements in Banker’s method. The Banker’s Queue combines the techniques order. of lazy evaluation and memoization to obtain good amortized run- • enqueue : (8 (A)A(DequeA) ! (DequeA)) ning times. The Bankers Queue implementation internally uses streams (see section 2.4.4) to achieve lazy evaluation. The Banker’s Inserts the given element to the rear of the deque. Queue provides a amortized running time of O(1) for the opera- • enqueue-front : (8 (A)A(DequeA) ! (DequeA)) head tail enqueue tions , and . Inserts the given element to the front of the deque. Physicist’s Queue The Physicist’s Queue (Okasaki 1998) is a • head : (8 (A)(DequeA) ! A) amortized queue obtained using a method of amortization called the Returns the first element from the front of the deque. Physicist’s method. The Physicist’s Queue also uses the techniques of lazy evaluation and memoization to achieve excellent amortized • last : (8 (A)(DequeA) ! A) running times for its operations. The only drawback of the Physi- Returns the first element from the rear of the deque. cist’s method is that it is much more complicated than the Banker’s • tail (8 (A)(DequeA) ! (DequeA)) method. The Physicist’s Queue provides an amortized running time : of O(1) for the operations head, tail and enqueue. Removes the first element from the front of the given deque, producing a new deque. Real-Time Queue Real-Time Queues eliminate the amortization • ( (A)(DequeA) (DequeA)) of the Banker’s and Physicist’s Queues to produce a queue with ex- init : 8 ! cellent worst-case as well as amortized running times. Real-Time Removes the first element from the rear of the given deque, Queues employ lazy evaluation and a technique called schedul- producing a new deque. ing (Okasaki 1998) where lazy components are forced systemati- cally so that no suspension takes more than constant time to ex- >(define dque(deque -101234)) ecute, assuring ensures good asymptotic worst-case running time > dque for the operations on the data structure. Real-Time Queues have an - : (Deque Fixnum) O(1) worst-case running time for the operations head, tail and #<Deque> enqueue. >(head dque) - : Fixnum Implicit Queue Implicit Queues are a queue data structure im- -1 plemented by applying a technique called implicit recursive slow- >(last dque) down (Okasaki 1998). Implicit recursive slowdown combines lazi- - : Fixnum ness with a technique called recursive slowdown developed by Ka- 4 plan and Tarjan (1995). This technique is simpler than pure recur- >(head(enqueue-front 10 dque)) sive slow-down, but with the disadvantage of amortized bounds on - : Fixnum the running time. Implicit Queues provide an amortized running 10 time of O(1) for the operations head, tail and enqueue. >(last(enqueue 20 dque)) - : Fixnum Bootstrapped Queue The technique of bootstrapping is applica- 20 ble to problems whose solutions require solutions to simpler in- >(head(tail dque)) stances of the same problem.

Load more