An Efficient Programming Model for Memory-Intensive Recursive Algorithms using Parallel Disks

Vlad Slavici†∗, Daniel Kunkle‡, Gene Cooperman†∗, Stephen Linton♯
† Northeastern University, Boston, MA
‡ Google Inc., New York
♯ University of St. Andrews, St. Andrews, Scotland
[email protected], [email protected], [email protected], [email protected]

∗ This work was partially supported by the National Science Foundation under Grant CCF 0916133.

ABSTRACT

In order to keep up with the demand for solutions to problems with ever-increasing data sets, both academia and industry have embraced commodity computer clusters with locally attached disks or SANs as an inexpensive alternative to supercomputers. With the advent of tools for parallel disk programming, such as MapReduce, STXXL and Roomy, which allow the developer to focus on higher-level algorithms, programmer productivity for memory-intensive programs has increased many-fold. However, such parallel tools were primarily targeted at iterative programs.

We propose a programming model for migrating recursive RAM-based legacy algorithms to parallel disks. Many memory-intensive symbolic algebra algorithms are most easily expressed as recursive algorithms. In this case, the programming challenge is multiplied, since the developer must re-structure such an algorithm with two criteria in mind: converting a naturally recursive algorithm into an iterative algorithm, while simultaneously exposing any potential data parallelism (as needed for parallel disks).

This model alleviates the large effort going into the design phase of an external memory algorithm. Research in this area over the past 10 years has focused on per-problem solutions, without providing much insight into the connection between legacy algorithms and out-of-core algorithms. Our method shows how legacy algorithms employing recursion and non-streaming memory access can be more easily translated into efficient parallel disk-based algorithms.

We demonstrate the ideas on the largest computation of its kind: the determinization via subset construction, and minimization, of very large nondeterministic finite state automata (NFA). To our knowledge, this is the largest subset construction reported in the literature. Determinization of large NFA has long been a major computational hurdle in the study of permutation classes defined by token passing networks. The programming model was used to design and implement an efficient NFA determinization algorithm that solves the next stage in analyzing token passing networks representing two stacks in series.

1 Introduction

In the past 10 years, migrating traditional single-CPU computations to clusters has become important, as developers are faced with the task of providing solutions to problems handling increasingly larger data sets. The expense of supercomputers precludes their wider use. Hence, the majority of academic and commercial institutions have embraced commodity computer clusters as a less expensive alternative, albeit with less RAM than supercomputers.

To get the most benefit out of commodity clusters, developers need to use the aggregate disks of the cluster in order to solve problems as large as possible. A typical 32-node commodity cluster might have an aggregate of 32 terabytes of disk, compared to an aggregate of only 512 GB of RAM, which is a consequence of disk being two orders of magnitude less expensive than RAM.

However, the modern out-of-core program developer still has a challenging task ahead: migrating traditional, RAM-based algorithms from the CPU to external memory on a per-algorithm/per-problem basis. This work addresses that challenge by introducing a general programming model and a method of migrating traditional recursive RAM-based algorithms to external memory programming.

Research Contributions There are two main research contributions in this paper:

• A general programming model and method for migrating recursive algorithms to parallel disk-based computing.
• A general method of migrating random access (which is at the heart of most RAM-based algorithms for sequential programming) to parallel streaming access.

Although many papers have already been written on a variety of parallel disk-based algorithms (e.g. see [27, 28] and their citations), the algorithms they describe appear quite different from their RAM-based counterparts, and seem to mostly cater to the expert, i.e. the researcher already familiar with parallel disk-based computing. Moreover, these existing algorithms provide little insight on how to tackle future problems not already described. The research presented here attempts to alleviate these obstacles by providing a general theory of migrating recursive RAM-based algorithms to parallel disks.

There exist a number of tools for parallel disk programming, such as MapReduce [12], Roomy [17] or STXXL [7], which allow the developer to focus primarily on the algorithm, while the runtime library takes care of the underlying details, such as working with the filesystem and sending data over the network. While all these tools increase productivity, they do not address the design task of transforming a RAM-based algorithm into a parallel disk-based one.

We propose a programming model that is independent of the programming language or extension used. Unlike previous approaches, which focused on a single programming language (MapReduce, STXXL, HaLoop [10]), the proposed programming model can be implemented on top of many tools for parallel disk-based computing, as described in Section 2.2.

As validation of this programming model, we implement examples in C, using the Roomy extension library. Roomy was chosen because it provides a broad range of primitive data structures including bit arrays, hash tables, unordered sets (RoomyList), and other data structures important for symbolic algebra. For particular programs, there exist extensions of the map-reduce platform (HaLoop [10], Pregel [21], and others) that are also general enough to serve as an implementation platform for the proposed programming model. The programming model tries to be agnostic about the choice of implementation platform.

The ideas of this model were motivated by the particular needs for recursion in symbolic algebra. An important part of many symbolic computations is working with very large graph-like data structures that only fit on parallel disks: groups [25], binary decision diagrams [19] and others. Very large finite state automata are useful in many areas, within symbolic computation and outside of it, as discussed in Section 3.

When working with the above-mentioned large graph-like data structures, techniques similar to dynamic programming are often employed (for example, see [19, 25]). Migrating dynamic programming to parallel disks is described in Section 2.1 and in our parallel disk-based implementation of the 0-1 Knapsack problem in Section 4.

In the next few paragraphs we discuss possible applications in symbolic computation that are likely to benefit from the programming model proposed in this paper.

Possible Applications in Symbolic Computations In computational group theory, a hugely important example is the matrix recognition project. The matrix recognition project uses Aschbacher's classification of the subgroups of the general linear group into 9 categories [3]. For each of the non-simple classes, the project provides an algorithm to identify a normal subgroup. The matrix recognition algorithm then recursively calls itself to identify both the normal subgroup and the quotient of the original group over the given normal subgroup. Naturally, large groups lead to computations requiring large memory.

In the case of permutation groups, the classic algorithms for base and strong generating sets, and for partition backtrack (group intersection, centralizer, normalizer, etc.), both offer natural examples of recursion in which memory requirements can grow (especially for large-base finite permutation groups).

Another important example of recursion lies in algorithms for a sparse representation of multivariate polynomials in many variables over a finite field. For example, in polynomial multiplication the natural recursion leads to multiplications of polynomials in fewer variables.

Given the wealth of topics, we chose the problem of NFA determinization (subset construction) for extremely large finite automata because of a long-standing challenge problem in this area (see "Token Passing Networks" in Section 3). One outcome of the work was to produce a DFA consisting of almost two billion states.

A second, smaller example is provided for dynamic programming and the knapsack problem (Section 4). This example is included to demonstrate the breadth of applicability of the programming model.

In the rest of this paper, Section 2 presents the general programming model and method for migrating recursive RAM-based algorithms to parallel disk-based algorithms, together with a discussion of how this method fits with tools such as Roomy, MapReduce and STXXL. Section 3 illustrates how the theoretical method is applied in practice by looking at the well-known recursive legacy algorithm for converting a non-deterministic finite automaton (NFA) to a deterministic finite automaton (DFA) via subset construction, followed by minimizing the DFA. The NFA-to-minimal-DFA software is then validated experimentally by running it on very large inputs obtained from an application in the field of Token Passing Networks [4]. Section 4 presents a parallel disk-based implementation of the Knapsack problem. Section 5 presents related work.

2 A General Method for Migrating Recursive Algorithms from the CPU to Parallel Disks

A recursive algorithm starts out with the initial problem, and it generates sub-problems whose results are used to solve the initial, larger problem. The generated sub-problems form a callgraph, which is a directed acyclic graph (DAG). The graph has an initial node, called the root, which describes the problem (the level 0 sub-problem), and the edges point from a sub-problem to the smaller sub-problems it depends upon.

For migrating such a traditional recursive algorithm to parallel disks, we propose a general method based on traversing the problem callgraph.

Figure 1a presents the callgraph of a simple C(n, k) computation, for n = 6 and k = 3. Even though there exists a simple closed formula for the binomial coefficient C(n, k), for the sake of explaining how the method works, C(n, k) is here computed by recursion:

    C(n, k) = C(n-1, k) + C(n-1, k-1),  for n > k > 0
    C(n, 1) = n,  C(n, 0) = 1,  C(n, n) = 1,  for n > 0

The C(n, k) algorithm is a simple, binary recursion. On the CPU, such an algorithm is straightforward to implement: most programming languages support recursive functions. Recursion is implemented by the programming language either by using the process stack, or by using a custom stack implementation. The implicit use of a stack leads to a depth-first exploration of the callgraph. Using the same approach for an out-of-core implementation is impractical, because depth-first search (DFS) relies heavily on random data access. It is well known that streaming access should be used in a disk-based implementation instead of random access [25].

However, in order to solve the problem correctly, it is enough to explore its callgraph completely in any manner, not necessarily by DFS. For parallel disk-based programming, a breadth-first search (BFS) of the callgraph is more appropriate, since BFS can emphasize streaming access. Moreover, since a cluster has multiple nodes driving the locally attached disks, a parallel version of BFS can be used (Parallel BFS) [20], which is the most widely used pattern in computing with parallel disks.
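For concreteness, the binary recursion above can be written in a few lines of C. The implicit process stack makes the exploration depth-first, which is exactly the access pattern that does not translate to disks:

    #include <stdio.h>

    /* Binary recursion for C(n, k): each call spawns two
     * sub-problems, which the process stack explores depth-first. */
    long long C(int n, int k) {
        if (k == 0 || k == n) return 1;
        if (k == 1) return n;
        return C(n - 1, k) + C(n - 1, k - 1);  /* two recursive calls */
    }

    int main(void) {
        printf("C(6,3) = %lld\n", C(6, 3));    /* prints 20 */
        return 0;
    }

Note that, without memoization, overlapping sub-problems such as C(4, 2) in Figure 1b are recomputed; merging them, as in Figure 1c, is what the BFS formulation with duplicate detection accomplishes on disk.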

Figure 1: Simple recursive example: C(n, k). (a) Callgraph for n = 6 and k = 3. (b) Highlighting overlapping sub-problems in the callgraph. (c) Callgraph after merging overlapping sub-problems.

The methodology proposed here is based on converting DFS to Parallel BFS. The important research contribution here is that we view the callgraph of a problem as an implicit graph to be explored. Whereas graph traversals are common in disk-based programming, they are performed on object graphs (the graph traversal itself is the solution, as is often the case in group enumerations [18, 24]). In our case, while the solution might be an object graph in some cases, the method is concerned with traversing the subject graph: a problem callgraph, which is not the answer to the problem, but a control and data flow graph. In the case of computing C(n, k), the answer is a number, so there is no object graph, but the method makes use of the traversal of the sub-problem callgraph.

Historically, the conversion of a complex binary recursive algorithm to one based on breadth-first search goes back at least to Ochi et al. [23], in their paper on the manipulation of very large binary decision diagrams (BDDs). Kunkle et al. [19] converted the algorithm further to one based on latency-tolerant Parallel BFS, well suited for parallel disks. However, at that time, the considered algorithm was not known to be just an instance of a more general method.

A generalization of binary recursive functions is the class of n-ary recursive functions (also called exponentially recursive). Such a function makes n recursive calls to itself. This type of recursion corresponds to a callgraph in which a sub-problem can be solved by aggregating the results of n sub-sub-problems. Translating n-ary recursion to Parallel BFS can be summarized as follows:

• The first BFS frontier (represented as a list) contains an encoding of the problem.
• For each sub-problem on the current BFS frontier, generate all n sub-sub-problems and add them to the next frontier.
• Delayed duplicate detection is applied on the next frontier with respect to all previous frontiers, to see if any problems in the next frontier had already been discovered.
• Duplicates are eliminated from the next frontier, but not before updating their parent nodes in the callgraph, so that parents point to the duplicates' representatives in the next frontier.
• After the BFS finishes (which happens when the next frontier is empty), scan each frontier, bottom-up, and for each node in the current frontier, merge information from the node's children.
• The algorithm finishes when the bottom-up scan reaches the root.

Some problems may allow representations in which nodes in the next frontier need not be compared to nodes from the previous frontiers [19]. In these problems, only sub-problems at the same BFS level can be duplicates of one another. The fewer frontiers that need to be checked at any one time during the program execution, the more efficient the program will be.

2.1 A Programming Model for Implementing Recursion on Parallel Disks

To further explain how recursive programs can be adapted into efficient parallel disk-based programs, we propose a programming model adapted from the Cilk [9] shared-memory model. In the Cilk model any task can create sub-tasks without blocking. These sub-tasks can be performed in parallel. The task (also called a predecessor) is complemented by a successor task, which does not start executing until all sub-tasks of the predecessor have completed. Except for the restriction placed on successors, any task can be executed concurrently with any other task. The programming model proposed here has many features in common with the original Cilk model, but important restrictions and additions were formulated so that it fits parallel secondary storage computations. These significant changes are:

• The directed acyclic graph (DAG) created by task dependencies is only explored by Parallel BFS. This means that the dependency callgraph has to be generated in a top-down fashion, and that, generally, a bottom-up scan of the successor task graph is necessary in order to execute the successors.
• Tasks are generated and processed in batch, to alleviate the high-latency penalty of random accesses to disk.
• As opposed to the original Cilk model, in the adapted model sub-tasks cannot always be generated immediately. Some of the parameters needed to generate the sub-tasks reside on parallel disks, thus introducing the necessity of delayed batch access to obtain these parameters.

A graphical representation of the proposed programming model is presented in Figure 2. Parallel BFS scans the predecessor sub-graph (generating the next BFS frontier at each step), while also generating the successor sub-graph frontier by frontier in batch, to avoid random access. As part of generating the predecessor sub-tasks, parameters might need to be accessed from parallel disks (denoted in Figure 2 by p1 and p2). These are also accessed in batch. Once all predecessors have been generated, Parallel BFS is used to scan the successors bottom-up.

Figure 2: Representation of the proposed programming model for adapting recursive algorithms to parallel disks. Tasks marked with t are top-down tasks (also called predecessors), while tasks marked with s are bottom-up tasks, also known as successors. Parameters already on parallel disks are denoted p1, p2, and there can be as many of them as necessary.

A single recursive step in the programming model for disk-based recursion is presented in Figure 3 and consists of 5 phases (a schematic C rendering of one such step is sketched at the end of this section):

1. Send a batch request to obtain the necessary parameters (denoted by p1, p2, and so on) for generating the next BFS level of predecessor tasks from parallel disks.
2. Receive a batched answer, containing the p1, p2, and so on, that were requested in phase 1.
3. Use the information encapsulated in the current level of predecessor tasks, together with the parameters p1, p2, ..., to generate the data, and possibly the code, for the next BFS level of predecessor tasks.
4. Generate the current BFS level of successor tasks; their purpose will be to aggregate the next BFS level of successor tasks, which has yet to be generated.
5. Connect the successor tasks in the current BFS level to their respective successor task parents, which were generated in the previous recursive step.

Figure 3: Representation of one recursive step in the proposed disk-based programming model. One recursive step consists of 5 phases.

Table 1 presents a brief comparison of the data structures and programming patterns generally used, at a high level, depending on the memory environment that the program data resides in. The table also shows which programming patterns are the most efficient for each of the possible memory environments.

Performance Enhancements This section addresses the problem of converting the performance enhancements of in-RAM data structures and algorithms to parallel disk-based programs. A few general performance enhancements are used to increase the efficiency of RAM-based algorithms, the most important being memoization. Memoization means saving the results of already-solved sub-problems, so that if they are encountered again during the computation, their result is readily available.

On the CPU, the efficiency of memoization lies in the efficiency of in-RAM random access: looking up a stored result is fast. On parallel disks, the same technique is not applicable because of the high latency of a random data access. For parallel disk-based programs, this problem is solved by delayed duplicate detection [15, 16].

It is worth noting that memoization only helps with detecting sub-problems that are duplicates of previously generated sub-problems. Memoization cannot detect distinct sub-problems that have the same result. Detecting equal sub-problems is done by pointer equality, while detecting distinct sub-problems with equal answers is done by deep value equality. For the latter, disk-based parallel programs need to perform an additional scan of the dependency graph, usually bottom-up, so that equivalent sub-problems can be detected (see Table 1).

Theoretical Performance of the Programming Model An n-ary recursion has branching factor n, but overlapping sub-problems account for a significant reduction in the observed branching factor: the real branching factor r is often much smaller than n. For example, in the largest subset construction that we present in "Experimental Results" of Section 3, n = 14, but r = 2.09.

So the total number of generated predecessor tasks is r^l × n, where l is the number of BFS levels. The number of generated successor tasks is at most r^l, which is also the size of the top-down callgraph.

In most data-intensive applications, in-RAM processing is much faster than disk access, so, for simplicity, we can ignore the time it takes to do CPU processing. For the most general case of duplicate detection, new tasks have to be deduplicated against all previously discovered tasks. Assume we have a cluster with P nodes, D disk bandwidth and N network bandwidth. All tasks have to be written to and read from disk at least once (in practice it could be more), and most of them have to be sent over the network once. This brings the computation time to

    k·r^l·(n+1) / (P·min(D, N)) + j·r^l / (P·D),

where k and j are small constant factors which include the total data that needs to be read locally from disk without being sent over the network, such as in deduplication.

Another factor that one needs to be aware of is the synchronization time between BFS rounds, which depends on the cluster hardware and workload distribution.
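As a rough illustration of why the observed branching factor matters, take the parameters of the largest subset construction of Section 3 (n = 14, r = 2.09) and suppose, hypothetically, l = 10 BFS levels. Without merging overlapping sub-problems, the callgraph could reach n^l = 14^10 ≈ 2.9 × 10^11 nodes; with merging, it has on the order of r^l = 2.09^10 ≈ 1.6 × 10^3 nodes. That gap of roughly eight orders of magnitude is what delayed duplicate detection buys.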

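To make the whole pattern concrete before turning to specific tools, the following self-contained C program explores the C(6, 3) callgraph exactly as the model prescribes: a top-down, frontier-by-frontier expansion with duplicate elimination, followed by a bottom-up aggregation pass. Plain in-RAM arrays stand in for disk-resident parallel lists, and the per-frontier duplicate scan stands in for delayed duplicate detection; none of the names below come from any library API.

    #include <stdio.h>

    /* In-RAM stand-in for the disk-based model: explore the C(n, k)
     * callgraph top-down by BFS (predecessor tasks), eliminating
     * duplicates within each frontier, then aggregate results with a
     * bottom-up scan (successor tasks).  On parallel disks, each
     * frontier would be a streamed, distributed list. */

    #define MAXL 64     /* max BFS levels */
    #define MAXF 256    /* max sub-problems per frontier */

    typedef struct { int n, k; } Prob;

    static long long value[MAXL + 1][MAXL + 1]; /* result per sub-problem */

    static int is_base(Prob p) { return p.k == 0 || p.k == p.n || p.k == 1; }
    static long long base_val(Prob p) { return p.k == 1 ? p.n : 1; }

    int main(void) {
        static Prob levels[MAXL][MAXF];
        int size[MAXL] = {0};
        levels[0][0] = (Prob){6, 3};    /* root: C(6, 3) */
        size[0] = 1;
        int depth = 0;

        /* Top-down BFS (sequential here): expand each frontier. */
        for (int l = 0; size[l] > 0; l++, depth = l) {
            for (int i = 0; i < size[l]; i++) {
                Prob p = levels[l][i];
                if (is_base(p)) continue;
                Prob kids[2] = {{p.n - 1, p.k}, {p.n - 1, p.k - 1}};
                for (int c = 0; c < 2; c++) {
                    int dup = 0;   /* duplicate detection in next frontier */
                    for (int j = 0; j < size[l + 1]; j++)
                        if (levels[l + 1][j].n == kids[c].n &&
                            levels[l + 1][j].k == kids[c].k)
                            dup = 1;
                    if (!dup)
                        levels[l + 1][size[l + 1]++] = kids[c];
                }
            }
        }

        /* Bottom-up scan of successors: merge children's results. */
        for (int l = depth - 1; l >= 0; l--)
            for (int i = 0; i < size[l]; i++) {
                Prob p = levels[l][i];
                value[p.n][p.k] = is_base(p) ? base_val(p)
                    : value[p.n - 1][p.k] + value[p.n - 1][p.k - 1];
            }

        printf("C(6,3) = %lld\n", value[6][3]);  /* prints 20 */
        return 0;
    }

For C(n, k), a node's level determines its first parameter, so duplicates can only occur within a single frontier; this is precisely the situation, noted above, in which a frontier need not be checked against previous frontiers.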
Environment           | Algorithm                               | Data structure                     | Dup. detection (pointer equality) | Dup. detection (deep equality)
RAM-based, sequential | recursion (DFS); iteration (BFS)        | stack; queue                       | memoization table                 | immediate (DFS); delayed (BFS)
RAM-based, parallel   | recursion (MT DFS); iteration (MT BFS)  | per-thread stack; per-thread queue | synchronized memoization table    | immediate (MT DFS); delayed (MT BFS)
Disk-based, parallel  | latency-tolerant Parallel BFS           | latency-tolerant parallel queue    | delayed duplicate detection       | delayed, bottom-up scan

Table 1: Fundamental algorithms, techniques and data structures for various environments: RAM-based sequential, RAM-based parallel, disk-based parallel. MT stands for multi-threaded.

2.2 Integrating the Programming Model with Practical Tools

This section describes how the proposed programming model for migrating recursive algorithms to parallel disks can be integrated with practical tools like Roomy, MapReduce or STXXL.

Integrating with Roomy Roomy [17] is a library for computing with parallel disks that is minimally invasive to the original, traditional algorithm. It provides a few high-level collection data structures (lists, arrays and hash tables) that the low-level runtime distributes to parallel disks, allowing the user to focus on the high-level algorithm.

Moreover, Roomy provides a smooth transition for the user from RAM-based random access to parallel disk-based streaming access. This is accomplished by allowing random access in the programming model, but having the runtime intelligently batch and organize multiple random accesses so that they are presented to the file system in a delayed, streaming manner. Along with delayed random access, Roomy allows mapping a user-defined function over the elements of a collection.

Let us assume that we use a Roomy hash table to represent our current frontier in the callgraph, while a different hash table represents the next frontier, at this time empty. We can map a sub-problem generator function over the current frontier, which contains the current-level sub-problems. The sub-problem generator function will first access, in a delayed, batched, random fashion, all additional parameters (p1, p2, ...) needed for computing the child sub-problems. Once these parameters have been accessed, the child sub-problems are computed and added to the next frontier. This corresponds to a batched recursive step.

The main recursive application presented in this paper (a package for converting an NFA to a minimal DFA; see Section 3) was implemented in Roomy, due to its expressiveness and the fact that it requires fewer changes to the subset construction algorithm, in comparison to MapReduce or STXXL.

Integrating with MapReduce MapReduce [12] is probably the most popular software for implementing parallel disk-based programs. Based on the idea of LISP loops, it provides two programming primitives: map, which allows mapping a user-defined function over large collections of data, and reduce, which aggregates the outputs of map, again by using a user-defined function.

MapReduce does not allow random access in the programming model. To the first-time user this might seem a major drawback, but it is offset by the ability to use map and reduce to define a join of two collections [8]. Join is a term borrowed from databases, which means combining two collections of objects A = {a1, a2, ..., an} and B = {b1, b2, ..., bm} into a collection of pairs C = {(ai1, bj1), (ai2, bj2), ..., (aik, bjk)} based on some criterion. The join operation in MapReduce is equivalent to delayed, batched random access in Roomy.

In order to allow for efficient migration of RAM-based programs to parallel disks, in addition to join, one could build a map-to-many operation on top of map and reduce (map-to-many can be simulated by a simple map emitting a vector of elements).

HaLoop [10] is a notable extension of MapReduce in the sense that it provides built-in iterative support for programs, which is essential for migrating recursive programs to parallel disks. Extension frameworks to MapReduce, such as HaLoop or the theoretical ones based on Datalog [1], offer support for a limited class of recursions (the ones that can be reduced to the Transitive Closure problem). Our proposed model also deals with recursions that cannot easily be converted to Transitive Closures (see the dynamic programming application in Section 4).

Integrating with Pregel Pregel [21] is a framework for large-scale graph computations, which operates in supersteps. The user implements functions which specify the behavior of a vertex in a superstep. A vertex can receive messages sent to it in the previous superstep, its contents can be modified, and it can send messages to other vertices, which will be received in the next superstep.

Pregel was developed at Google to address the efficiency shortcomings of large graph processing in MapReduce. To quote from the Pregel paper [21]: "MapReduce, however, is essentially functional, so expressing a graph algorithm as a chained MapReduce requires passing the entire state of the graph from one stage to the next—in general requiring much more communication and associated serialization overhead."

Pregel passes data and control between vertices in synchronized rounds. In this regard it is close to Roomy, which passes data and control between elements in large data structures, also in synchronized rounds. Roomy and Pregel are probably, at present, the best fits for symbolic computations, due to their flexibility and expressiveness.

Integrating with STXXL STXXL [7] is an implementation of the C++ standard template library (STL) for external memory. While it supports multi-core parallelism, it does not yet support distributed computing across a cluster (such support is still under investigation), which makes it, for the moment, unsuitable for cluster computations. However, it supports multiple locally attached disks on the same compute node, which makes it a candidate for parallel disk-based computing. One has to keep in mind that there is usually a small practical limit to the number of disks that can be attached to the same machine, which is, in many cases, a severe limitation on the maximum size of the problems that can be approached using STXXL.

Since it is an extension of STL, it allows users to work with very familiar tools. However, users have to translate their algorithms into an asynchronous pipelining model. There is detailed information on how this can be done for various types of access patterns (diamond flow graphs, etc.). However, asynchronous pipelining is a much more restricted model compared to n-ary recursion.

STXXL could potentially be used to solve complex n-ary recursions like the NFA subset construction, which we implemented in Roomy and present in Section 3, but additional effort has to be invested into converting general n-ary callgraphs to asynchronous pipelined parallelism.

3 Application: From NFA to Minimal DFA

The first application of the proposed programming model is an algorithm for NFA determinization via subset construction. The implementation of the subset construction algorithm was validated on a series of challenge problems from the field of forbidden permutations.

Finite state automata (FSA) are usually the most computationally tractable form in which to analyze the regular languages that arise in many areas. That analysis requires efficient algorithms both for determinization of NFA (conversion of NFA to DFA) and minimization of DFA.

Recall that a deterministic finite state automaton (DFA) consists of a finite set of states with labelled, directed edges between pairs of states. The labels are drawn from an associated alphabet. For each state, there is at most one outgoing edge labelled by a given letter from the alphabet. So, a transition from a state dictated by a given letter is deterministic. There is an initial state, and certain of the states are called accepting. The DFA accepts a word if the letters of the word determine transitions from the initial state to an accepting state. The set of words accepted by a DFA is called a language.

A non-deterministic finite state automaton (NFA) is similar, except that there may be more than one outgoing edge with the same label for a given state. Hence, the transition dictated by the specified label is non-deterministic. The NFA accepts a word if there exists a choice of transitions from the initial state to some accepting state.

Recall that the subset construction allows one to transform an NFA into a corresponding DFA that accepts the same words. Each state of the DFA is identified with a subset of the NFA states. Given a state A of the DFA and an edge with label α, the destination state B consists of the subset of all NFA states having an incoming edge labelled by α whose source state is a member of the subset A.

Subset construction for large NFAs Subset construction can be viewed as a dynamic programming problem, which is traditionally implemented by recursion. Algorithm 1 describes the RAM-based recursive approach for subset construction.

Algorithm 1 RAM-based, Recursive Subset Construction
Input: Initial NFA, with initial state si and accepting states As
Output: DFA, represented as a list, equivalent to the NFA
1: Create subset I ← {si}, and an integer Id for it (IdI)
2: Insert the pair (I, IdI) in a hash table of visited subsets, visited
3: Create DFA — a list that will store the resulting DFA, containing 3-tuples of the form (Id, t, Idnext), meaning transition t takes the state with Id to the state with Idnext
4:
5: Call subset_construct(I, IdI).
6:
7: func subset_construct(subset S, IdS):
8:   for each NFA transition t do
9:     Create empty subset Snext
10:    for each state s in subset S do
11:      t takes s to snext in the NFA
12:      Add snext to Snext
13:    if Snext is already in visited then
14:      Get IdSnext from visited and add the 3-tuple (IdS, t, IdSnext) to DFA
15:    else
16:      Create a new Id for Snext, add the pair (Snext, IdSnext) to visited and add (IdS, t, IdSnext) to DFA
17:      Recursively call subset_construct(Snext, IdSnext)
18: end func

Algorithm 2 Parallel Disk-based Subset Construction
Input: Initial NFA, with initial state si and accepting states As
Output: DFA, represented as a list, equivalent to the NFA
1: Create subset I ← {si}, and an integer Id for it (IdI)
2: Insert the pair (I, IdI) in a hash table of visited subsets, visited
3: Create DFA — a list that will store the resulting DFA, containing 3-tuples of the form (Id, t, Idnext), meaning transition t takes the state with Id to the state with Idnext
4: Add the pair (I, IdI) into current_frontier and create next_frontier, initially empty.
5:
6: Call pardisk_subset_construct().
7:
8: func pardisk_subset_construct():
9:   while current_frontier is not empty do
10:    Call neigh(S, IdS) for each pair (S, IdS) in current_frontier, which returns a BATCH of Snext subsets.
11:    Perform delayed duplicate detection on the BATCH of Snext subsets with respect to visited; for already visited subsets, obtain their Id, while for newly discovered subsets, create new Ids.
12:    Insert all new pairs (Snext, IdSnext) in visited.
13:    Add all (IdS, t, IdSnext) to DFA, for all t, regardless of whether Snext is new or not.
14:    Add all new Snext to next_frontier.
15:    Remove the contents of current_frontier from the parallel disks
16:    Rename next_frontier to current_frontier
17:    Create an empty list with name next_frontier
18: end func
19:
20: func neigh(subset S, IdS) returns neighbor subsets:
21:   Create an ordered set of subsets SET, initially empty.
22:   for each NFA transition t do
23:     Create empty subset Snext
24:     for each state s in subset S do
25:       t takes s to snext in the NFA
26:       Add snext to Snext
27:     Add Snext to SET.
28:   Return SET.
29: end func
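As a concreteness check, here is one possible in-RAM C rendering of Algorithm 1, for NFAs small enough that a subset of states fits in a 64-bit bitset. The example NFA, the linear-scan visited table and all sizes are illustrative assumptions only; a real implementation would use a hash table, and Algorithm 2 replaces exactly the visited lookup and the recursive call with frontier batches and delayed duplicate detection.

    #include <stdio.h>
    #include <stdint.h>

    #define NSTATES 3    /* states of the example NFA (<= 64)    */
    #define NTRANS  2    /* alphabet size                        */
    #define MAXSUB  1024 /* max distinct subsets (DFA states)    */

    static uint64_t delta[NTRANS][NSTATES]; /* delta[t][s] = successor bitset */
    static uint64_t visited[MAXSUB];        /* subset -> id is the array index */
    static int nvisited = 0;

    static int lookup_or_add(uint64_t sub, int *is_new) {
        for (int i = 0; i < nvisited; i++)
            if (visited[i] == sub) { *is_new = 0; return i; }
        visited[nvisited] = sub;
        *is_new = 1;
        return nvisited++;
    }

    /* Algorithm 1: one DFA state (= subset of NFA states) per call;
     * the process stack drives a depth-first exploration. */
    static void subset_construct(uint64_t S, int idS) {
        for (int t = 0; t < NTRANS; t++) {
            uint64_t Snext = 0;
            for (int s = 0; s < NSTATES; s++)
                if (S & (1ULL << s))
                    Snext |= delta[t][s];  /* t takes s to its successors */
            if (Snext == 0) continue;      /* no edge on t: omit dead state
                                              (a simplification here)     */
            int is_new, idNext = lookup_or_add(Snext, &is_new);
            printf("DFA: %d --%d--> %d\n", idS, t, idNext);
            if (is_new)
                subset_construct(Snext, idNext); /* line 17 of Algorithm 1 */
        }
    }

    int main(void) {
        /* tiny example NFA: two successors make it nondeterministic */
        delta[0][0] = 0x3;  /* state 0, letter 0 -> {0,1} */
        delta[1][1] = 0x4;  /* state 1, letter 1 -> {2}   */
        delta[0][2] = 0x4;  /* state 2, letter 0 -> {2}   */
        int is_new, id0 = lookup_or_add(1ULL << 0, &is_new); /* I = {si} */
        subset_construct(1ULL << 0, id0);
        return 0;
    }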

Note that the recursive call from line 17 of Algorithm 1 is made for each newly discovered subset, which makes the branching factor of the recursion variable.

Using the method for migrating RAM-based, recursive programs to parallel disks from Section 2, the programming model of Section 2.1 and the performance enhancements (such as delayed duplicate detection) summarized in Table 1 of Section 2.1, we migrated Algorithm 1 to Algorithm 2, which we implemented in Roomy and ran on a computer cluster with 29 nodes. Note that visited, DFA and the other data structures representing collections (current_frontier, next_frontier) are now parallel disk-based structures, in our case provided by Roomy. Experimental results, which to our knowledge represent the largest successfully carried out subset construction reported in the literature, are presented in Table 2 of this section.

In Algorithm 2, line 10 accounts for steps 1–3 of the programming model presented in Section 2.1. In this particular case, the parameters p1, p2, and so on, needed to generate the next BFS frontier are stored in RAM, since the NFA fits in memory. Line 11 accounts for the disk-based performance enhancement of delayed duplicate detection, presented in Table 1, which is the disk-based equivalent of immediate in-RAM duplicate detection via a hash table. Line 13 implements step 4 of the disk-based recursive programming model: the creation of successor data that will contribute to the result. Note that, in this particular example, a bottom-up scan of the successor data is not needed, since the answer is the successor data itself (the tuples of the DFA).

Finding the Unique Minimal DFA Besides providing an example of the proposed method for converting recursive, RAM-based programs to parallel disks, a secondary goal was to provide a package for very large finite automata. In order for such a package to be useful in practice, the very large DFA obtained from subset construction needs to be minimized to a canonical form.

The algorithm chosen for computing the minimal DFA on parallel disks is based on a parallel RAM-based algorithm used on supercomputers in the late 1990s and early 2000s [14]. We call this the forward refinement algorithm. The central idea of the algorithm is to iteratively partition the states (to refine partitions of the states) of the given DFA, which is proven to converge to a stable set of partitions. Upon convergence, the set of partitions, together with the transitions between partitions, forms a graph which is isomorphic to the minimal DFA. Initially, the DFA states are split into two partitions: the accepting states and the non-accepting states. For each state s, a 3-tuple is created: (s, p, {pnext}), in which p is the partition of s and {pnext} is the ordered set of partitions of next states, obtained from s by following a fixed order of all transitions t. For each 3-tuple, the pair (p, {pnext}) defines a new partition, in which s will be placed. The iterative process continues with the new set of partitions, until a refinement step yields no new partitions. Since this is an iterative algorithm working on constant amounts of data, it is easier to convert to parallel disks via a simple Parallel BFS. Experimental results for DFA minimization are provided in Table 2.

Token Passing Networks A token passing network is a directed graph with designated input and output vertices. Numbered tokens are considered to enter the graph one at a time at the input vertex, and travel along edges in the appropriate direction. At most one token is permitted at any vertex at any time. The tokens leave the graph one at a time at the output vertex. A permutation π ∈ Sn is called achievable for a given network if it is possible for tokens to enter in the order 1, ..., n and leave in the order 1π, ..., nπ.

The problem of the permutations achievable by two stacks in series can be modelled as a finite token passing network, and their behavior studied using the techniques of [5]. These techniques allow the classes of achievable permutations, and the forbidden patterns that describe them, to be encoded by regular languages and manipulated as finite state automata, using a collection of GAP programs developed by the fourth author and M. Albert.

In previous work, the fourth author explored the cases of a stack of depth 2 and a stack of depth k, for a range of values of k, and observed that for large enough k the sets of minimal forbidden patterns appeared to converge to a set of just 20 patterns of lengths between 5 and 9, which were later proved [13] to describe the case of a 2-stack and an infinite stack.

The application that motivates the calculations in this paper is a step towards extending this result to a 3-stack and an infinite stack, by way of the slightly simpler case of a 3-buffer (a data structure which can hold up to three items and output any of them).

Computations had been completed on various sequential computers for a 3-buffer and a k-stack for k ≤ 8, but this was not sufficient to observe convergence. The examples considered in this paper are critical steps in the computations for k = 9, k = 10, k = 11 and k = 12. Based on the results of these computations we are now able to conjecture with some confidence a minimal set of 12,636 forbidden permutations, of lengths between 7 and 18, for a 3-buffer and an infinite stack.

Experimental Results Parallel disk-based computations were carried out on a 29-node cluster, each node's processor being a 4-core Intel Xeon CPU 5130 running at 2 GHz. Nodes on the cluster had either 8 or 16 GB of RAM and at least 200 GB of free disk storage, and ran Red Hat Linux kernel version 2.6.9.

Table 2 presents the sizes of the intermediate DFAs produced by subset construction and the sizes of the minimal DFAs produced by the minimization process, for the four considered token passing network problems, as well as the timings for both subset construction and DFA minimization.

Problem instance | NFA size (#states) | Intermediate DFA size (#states) | Peak disk (GB) | Time       | Minimal DFA size (#states) | Peak disk (GB) | Time
1                | 167,143            | 49,722,541                      | 24             | 9 min      | 32,561                     | 6              | 38 min
2                | 537,294            | 175,215,168                     | 90             | 29 min     | 95,647                     | 22             | 2 h 42 min
3                | 1,667,428          | 587,547,014                     | 327            | 3 h 40 min | 274,752                    | 81             | 9 h 40 min
4                | 5,035,742          | 1,899,715,733                   | 1,136          | 1 day 12 h | 774,172                    | 295            | 1 day 8 h

Table 2: Solutions for the four NFA → DFA → minimal DFA problems considered.

4 Application: 0–1 Knapsack Problem

To help demonstrate the generality of the approach presented here, we apply these methods to the knapsack problem, which is usually defined recursively and solved using dynamic programming.

Problem definition Given: n items, each with a positive weight wi and positive value vi; and a knapsack capacity W. Maximize Σ_{i=1..n} vi·xi subject to Σ_{i=1..n} wi·xi ≤ W, where xi ∈ {0, 1} represents whether item i is placed in the knapsack.

Recursive solution Define M[i, w] as the maximum possible value when choosing from the first i items, given a knapsack of capacity w. M[i, w] is defined recursively as:

• M[0, w] = M[i, 0] = 0
• M[i, w] = M[i − 1, w], if wi > w
• M[i, w] = max(M[i − 1, w], M[i − 1, w − wi] + vi), if wi ≤ w

Parallel breadth-first algorithm M can be computed bottom-up using dynamic programming, using the parallel breadth-first approach given in Algorithm 3.

Algorithm 3 Parallel breadth-first dynamic programming solution for the 0–1 knapsack problem.
Input: Item values v1, ..., vn, weights w1, ..., wn, and knapsack capacity W.
Output: Matrix M, where M[i, w] is the maximum value achievable using the first i items and a knapsack of capacity w.
1: Set M[0, w] = 0, for 0 ≤ w ≤ W.
2: for i = 1 to n do
3:   for all 0 ≤ w ≤ W in parallel do
4:     set M[i, w] = max(M[i − 1, w], M[i − 1, w − wi] + vi)

(In line 4, the first case of the recurrence applies when wi > w, i.e. the second argument of max is omitted.)
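Since row i of M depends only on row i−1, the inner loop of Algorithm 3 is data-parallel across w, and a disk-based implementation streams row i−1 while writing row i. A minimal sequential in-RAM C version of the same row-by-row computation might look as follows; all sizes and the example items are illustrative only:

    #include <stdio.h>
    #include <stdlib.h>

    /* Row-by-row 0-1 knapsack DP (Algorithm 3), sequential sketch.
     * Row i depends only on row i-1, so only two rows are kept; the
     * disk-based version instead streams row i-1 from parallel disks
     * while writing row i, with the loop over cap split across nodes. */

    static long max2(long a, long b) { return a > b ? a : b; }

    long knapsack(int n, int W, const int *w, const int *v) {
        long *prev = calloc(W + 1, sizeof(long));   /* M[i-1][.], row 0 = 0 */
        long *cur  = calloc(W + 1, sizeof(long));   /* M[i][.]              */
        for (int i = 1; i <= n; i++) {              /* one BFS round per item */
            for (int cap = 0; cap <= W; cap++)      /* parallel in Algorithm 3 */
                cur[cap] = (w[i-1] > cap) ? prev[cap]
                         : max2(prev[cap], prev[cap - w[i-1]] + v[i-1]);
            long *tmp = prev; prev = cur; cur = tmp; /* advance to next row */
        }
        long best = prev[W];
        free(prev); free(cur);
        return best;
    }

    int main(void) {
        int w[] = {3, 4, 5}, v[] = {4, 5, 6};
        printf("best = %ld\n", knapsack(3, 8, w, v)); /* items 1 and 3: 10 */
        return 0;
    }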
Experimental results Algorithm 3 was implemented using Roomy [17]. It was tested for a variety of problem sizes using a shared-memory machine with four quad-core 1.8 GHz AMD Opteron processors, 128 GB of RAM and 5 locally attached disks adding up to 1.7 terabytes, running Ubuntu SMP Linux 2.6.31-16-server; the code was compiled with GCC 4.4.1.

Items | Capacity | Size of M      | Run-time (s)
16 K  | 16 K     | 256 M (2 GB)   | 1097
16 K  | 32 K     | 512 M (4 GB)   | 1668
16 K  | 64 K     | 1024 M (8 GB)  | 2724
16 K  | 128 K    | 2048 M (16 GB) | 5000
32 K  | 16 K     | 512 M (4 GB)   | 2966
64 K  | 16 K     | 1024 M (8 GB)  | 9582
128 K | 16 K     | 2048 M (16 GB) | 31081

Table 3: Experimental results for the 0–1 knapsack problem.

Table 3 shows the running times for various size knapsack problems. The smallest example has 16 K items and a knapsack capacity of 16 K, resulting in a solution matrix with 256 M entries; this took 1097 seconds to complete. The next three rows show the effects of increasing capacity, which adds additional columns to M. In this case, runtime increases sub-linearly, as the synchronization overhead that occurs between computing each row is amortized over the larger rows. Finally, the last three rows show the effect of increasing the number of items. In this case, runtime increases super-linearly, as the synchronization overhead is increased.

5 Related Work

Converting sequential programs to parallel ones has been an active topic of research for at least the past 25 years. The following two general results form the basis of the conversion from a sequential, RAM-based, recursive computation (which uses a stack) to a parallel, external memory, non-recursive computation (which uses the concept of a latency-tolerant parallel queue, described for the case of a binary decision diagram package in [19, Section 4.2]):

• Primitive recursive functions can be converted into iterations [6, 22]. Tail recursion can be converted into an iteration without using a queue. Binary or n-ary recursion can be converted to an iteration by replacing the process stack with an explicit stack (see the sketch at the end of this section).
• Any program that uses the process stack can be modified to use the heap instead by employing continuation-passing style (CPS) [26], which makes programs tail-recursive.

Significantly relevant to this research is the work of Arvind and Nikhil [2], who show that there is an equivalence between dataflow diagrams and CPS.

The idea of converting the process stack to a queue led to a successful shared-memory parallel language based on work-stealing: Cilk [9]. A task does not wait for sub-tasks to return; instead, it spawns a successor thread to receive the sub-tasks' results. The tasks are self-contained computations, also known as closures. To solve the problem that a task might need arguments that are not readily available, all missing arguments are treated as continuations.

Cormen and Davidson [11] proposed FG, a framework for generating efficient secondary storage cluster computations. Their research aims at hiding the effects of high-latency secondary storage accesses for computations whose data flow fits a pipeline structure, later extended to computations that fit fork-join patterns and directed acyclic graph (DAG) patterns. The dataflow paths in the DAG are determined dynamically, but the DAG structure is fixed and cannot be modified dynamically by the program. STXXL [7] shares many ideas with FG.

There has been significant work recently on implementing some classes of recursive algorithms on parallel disks [1, 10, 21], but these approaches are less general, and usually tied to programming in a specific programming language or extension.
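To illustrate the first bullet above on the paper's running example, here is a minimal C sketch of the binary C(n, k) recursion turned into an iteration over an explicit stack. Because this particular recurrence only adds the two sub-results, each popped base case can contribute directly to an accumulator; recurrences that post-process their children's results need a successor record (a continuation) instead, which is the CPS view of the second bullet.

    #include <stdio.h>

    typedef struct { int n, k; } Frame;

    /* Binary recursion converted to iteration: an explicit stack of
     * pending sub-problems replaces the process stack. */
    long long C_iterative(int n, int k) {
        Frame stack[1024];                /* illustrative fixed capacity */
        int top = 0;
        long long acc = 0;
        stack[top++] = (Frame){n, k};
        while (top > 0) {
            Frame f = stack[--top];
            if (f.k == 0 || f.k == f.n) acc += 1;
            else if (f.k == 1)          acc += f.n;
            else {
                stack[top++] = (Frame){f.n - 1, f.k};     /* first call  */
                stack[top++] = (Frame){f.n - 1, f.k - 1}; /* second call */
            }
        }
        return acc;
    }

    int main(void) {
        printf("C(6,3) = %lld\n", C_iterative(6, 3));  /* prints 20 */
        return 0;
    }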
6 References

[1] F. N. Afrati, V. Borkar, M. Carey, N. Polyzotis, and J. D. Ullman. Map-reduce extensions and recursive queries. In Proc. 14th Intl. Conf. on Extending Database Technology, EDBT/ICDT '11. ACM.
[2] K. Arvind and R. S. Nikhil. Executing a program on the MIT tagged-token dataflow architecture. IEEE Trans. Comput., 39:300–318, March 1990.
[3] M. Aschbacher. On the maximal subgroups of the finite classical groups. Inventiones Mathematicae, 76:469–514, 1984.
[4] M. D. Atkinson. Generalized stack permutations. Comb. Probab. Comput., 7:239–246, September 1998.
[5] M. D. Atkinson, M. J. Livesey, and D. Tulley. Permutations generated by token passing in graphs. Theor. Comput. Sci., 178:103–118, May 1997.
[6] P. Axt. Iteration of primitive recursion. Mathematical Logic Quarterly, 11(3):253–255, 1965.
[7] A. Beckmann, R. Dementiev, and J. Singler. Building a parallel pipelined external memory algorithm library. In Proc. 2009 IEEE Intl. Symp. on Par. & Dist. Processing, IPDPS '09. IEEE Computer Society.
[8] S. Blanas, J. M. Patel, V. Ercegovac, J. Rao, E. J. Shekita, and Y. Tian. A comparison of join algorithms for log processing in MapReduce. In Proc. 2010 Intl. Conf. on Management of Data, SIGMOD '10. ACM.
[9] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput., 37:55–69, August 1996.
[10] Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: efficient iterative data processing on large clusters. Proc. VLDB Endow., 3:285–296, Sept. 2010.
[11] T. H. Cormen and E. R. Davidson. FG: A framework generator for hiding latency in parallel programs running on clusters. In ISCA PDCS 2004.
[12] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In Proc. 6th Symp. on Operating Systems Design & Impl. - Volume 6. USENIX Association, 2004.
[13] M. Elder. Permutations generated by a stack of depth 2 and an infinite stack in series. Electron. J. Combin., 13(1), 2006.
[14] J. F. JáJá and K. W. Ryu. An efficient parallel algorithm for the single function coarsest partition problem. In Proc. Fifth Annual ACM Symp. on Par. Alg. and Architectures, SPAA '93. ACM.
[15] R. E. Korf. Best-first frontier search with delayed duplicate detection. In Proc. 19th Natl. Conf. on AI, AAAI '04. AAAI Press.
[16] R. E. Korf. Delayed duplicate detection: extended abstract. In Proc. 18th Intl. Joint Conf. on AI. Morgan Kaufmann Publishers Inc., 2003.
[17] D. Kunkle. Roomy: A C/C++ library for parallel disk-based computation, 2010. http://roomy.sourceforge.net/.
[18] D. Kunkle and G. Cooperman. Twenty-six moves suffice for Rubik's cube. In Proc. 2007 Intl. Symp. on Symb. and Algebraic Computation, ISSAC '07. ACM.
[19] D. Kunkle, V. Slavici, and G. Cooperman. Parallel disk-based computation for large, monolithic binary decision diagrams. In Proc. 4th Intl. Workshop on Par. and Symb. Comput., PASCO '10. ACM.
[20] C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In Proc. 22nd ACM Symp. on Par. in Alg. and Arch., SPAA '10. ACM.
[21] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proc. 2010 Intl. Conf. on Management of Data, SIGMOD '10. ACM.
[22] A. R. Meyer and D. M. Ritchie. The complexity of loop programs. In Proceedings of the 1967 22nd National Conference, ACM '67. ACM.
[23] H. Ochi, K. Yasuoka, and S. Yajima. Breadth-first manipulation of very large binary-decision diagrams. In Proc. 1993 IEEE/ACM Intl. Conf. on Computer-Aided Design, ICCAD '93. IEEE Computer Society Press.
[24] E. Robinson. Large implicit state space enumeration: overcoming memory and disk limitations. PhD thesis, Northeastern University, Boston, MA, USA, 2008. Adviser: Gene Cooperman.
[25] E. Robinson, D. Kunkle, and G. Cooperman. A comparative analysis of parallel disk-based methods for enumerating implicit graphs. In Proc. 2007 Intl. Workshop on Par. Symb. Comput., PASCO '07. ACM.
[26] G. J. Sussman and G. L. Steele, Jr. Scheme: An interpreter for extended lambda calculus. Memo 349, MIT AI Lab, 1975.
[27] J. S. Vitter. External memory algorithms and data structures: dealing with massive data. ACM Comput. Surv., 33:209–271, June 2001.
[28] J. S. Vitter and E. A. Shriver. Algorithms for parallel memory I: Two-level memories. Technical report, Providence, RI, USA, 1992.