An Efficient and General Implementation of Futures on Large Scale Shared-Memory Multiprocessors

An Efficient and General Implementation of Futures on Large Scale Shared-Memory Multiprocessors

A Dissertation Presented to The Faculty of the Graduate School of Arts and Sciences, Brandeis University, Department of Computer Science

James S. Miller, advisor

In Partial Fulfillment of the Requirements of the Degree of Doctor of Philosophy

by Marc Feeley

April [...]

This dissertation, directed and approved by the candidate's committee, has been accepted and approved by the Graduate Faculty of Brandeis University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean, Graduate School of Arts and Sciences

Dissertation Committee:
Dr. James S. Miller (chair), Digital Equipment Corporation
Prof. Harry Mairson
Prof. Timothy Hickey
Prof. David Waltz
Dr. Robert H. Halstead, Jr., Digital Equipment Corporation

Copyright by Marc Feeley

Abstract

An Efficient and General Implementation of Futures on Large Scale Shared-Memory Multiprocessors

A dissertation presented to the Faculty of the Graduate School of Arts and Sciences of Brandeis University, Waltham, Massachusetts, by Marc Feeley

This thesis describes a high-performance implementation technique for Multilisp's "future" parallelism construct. This method addresses the non-uniform memory access (NUMA) problem inherent in large scale shared-memory multiprocessors. The technique is based on lazy task creation (LTC), a dynamic task partitioning mechanism that dramatically reduces the cost of task creation and consequently makes it possible to exploit fine grain parallelism. In LTC, idle processors get work to do by "stealing" tasks from other processors. A previously proposed implementation of LTC is the shared-memory (SM) protocol. The main disadvantage of the SM protocol is that it requires the stack to be cached suboptimally on cache-incoherent machines. This thesis proposes a new implementation technique for LTC that allows full caching of the stack: the message-passing (MP) protocol. Idle processors ask for work by sending "work request" messages to other processors. After receiving such a message, a processor checks its private stack and task queue and sends back a task if one is available. The message-passing protocol has the added benefits of a lower task creation cost and simpler algorithms.

Extensive experiments evaluate the performance of both protocols on large shared-memory multiprocessors: a [...] processor GP and a [...] processor TC. The results show that the MP protocol is consistently better than the SM protocol. The difference in performance is as high as a factor of two when a cache is available and a factor of [...] when a cache is not available.

In addition, the thesis shows that the semantics of the Multilisp language does not have to be impoverished to attain good performance. The laziness of LTC can be exploited to support, at virtually no cost, several programming features including the Katz-Weise continuation semantics with legitimacy, dynamic scoping, and fairness.

Acknowledgements

This thesis is dedicated to my grandparents, Rose and Emile Monna, for the love I have for them.

I wish to thank my family, my friends, and colleagues, without whom this thesis would not have been possible. Special thanks go to Jim Miller, my thesis advisor, for giving me the freedom to explore my ideas at my own pace. He has gone beyond the call of duty to see me through with my degree. Bert Halstead's words of encouragement gave me the confidence that my ideas were interesting and worth writing about. Thank you, Bert. Sabine Bergler deserves special thanks for taking care of me. To Chris, Mauricio, Harry, Emmanuel, Don, Shyam, Larry, Xiru, Mary, and Paulo: thank you for making my stay at Brandeis so enjoyable.

Finally, I wish to thank the National Science and Engineering Research Council of Canada and the Université de Montréal for financial support, and Michigan State University, Argonne National Laboratory, Lawrence Livermore National Laboratory, and the MIT AI Laboratory for the use of their computers.

Contents

Introduction
Motivation
Why Multilisp
Fundamental Issues
Architecture
Shared-Memory MIMD Computers
Non-Uniform Memory Access
Sharing Data
Caches
Memory Consistency
The GP and TC Computers
Memory Management
Dynamic Partitioning
Eager Task Creation
Lazy Task Creation
Overview
Background
Scheme's Legacy
First-Class Continuations
Continuation Passing Style
Programming with Continuations
Multilisp's Model of Parallelism
FUTURE and TOUCH
Placeholders
Spawning Trees
Types of Parallelism
Pipeline Parallelism
Fork-Join Parallelism
Divide and Conquer Parallelism
Implementing Eager Task Creation
The Work Queue
FUTURE and TOUCH
Scheme Encoding
Chasing vs. No Chasing
Critical Sections
Centralized vs. Distributed Work Queue
Fairness of Scheduling
Dynamic Scoping
Continuation Semantics
Original Semantics
MultiScheme Semantics
Katz-Weise Continuations
Katz-Weise Continuations with Legitimacy
Implementing Legitimacy
Speculation Barriers
The Cost of Supporting Legitimacy
Benchmark Programs
abisort
allpairs
fib
mm
mst
poly
qsort
queens
rantree
scan
sum
tridiag
The Performance of ETC
Lazy Task Creation
Overview of LTC Scheduling
Task Stealing Behavior
Task Suspension Behavior
Continuations for Futures
Procedure Calling Convention
Unlimited Extent Continuations
Continuation Heapification
Parsing Continuations
Implementing First-Class Continuations
The LTC Mechanism
The Lazy Task Queue
Pushing and Popping Lazy Tasks
Stealing Lazy Tasks
The Dynamic Environment Queue
The Problem of Overflow
The Heavyweight Task Queue
Supporting Weaker Continuation Semantics
Synchronizing Access to the Task Stack
The Shared-Memory Protocol
Avoiding Hardware Locks
Cost of a Future on GP
Impact of Memory Hierarchy on Performance
The Message-Passing Protocol
Really Lazy Task Creation
Communicating Steal Requests
Potential Problems with the MP Protocol
Code Generated for SM and MP Protocols
Summary
Polling Efficiently
The Problem of Procedure Calls
Code Structure
Call-Return Polling
Short Lived Procedures
Balanced Polling
Subproblem Calls
Reduction Calls
Minimal Polling
Handling Join Points
Polling in Gambit
Results
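
As a rough illustration of the message-passing protocol summarized in the abstract (an idle processor sends a work request, and the recipient checks its private task stack and replies with a task if one is available), the following single-threaded Python simulation may help. It is only a sketch under stated assumptions, not the thesis's implementation: the names Processor, poll, step and run are hypothetical, tasks are plain closures rather than continuations, the victim is simply the next processor in round-robin order, and a stolen task is run as soon as the reply arrives.

```python
from collections import deque


class Processor:
    """One simulated processor (hypothetical model, not the thesis's code)."""

    def __init__(self, pid):
        self.pid = pid
        self.tasks = deque()    # private task stack: only the owner touches it
        self.mailbox = deque()  # messages sent by other processors
        self.waiting = False    # True while a steal request is outstanding
        self.done = []          # results of the tasks this processor ran

    def poll(self, processors):
        """Handle all pending messages; the owner calls this between tasks."""
        while self.mailbox:
            kind, payload = self.mailbox.popleft()
            if kind == "steal_request":
                # payload is the thief's id; hand over the oldest task, if any.
                task = self.tasks.popleft() if self.tasks else None
                processors[payload].mailbox.append(("steal_reply", task))
            else:  # "steal_reply"
                self.waiting = False
                if payload is not None:
                    # Assumption: the thief runs the stolen task right away.
                    self.done.append(payload())

    def step(self, processors):
        """One scheduling step: poll, then run local work or ask for work."""
        self.poll(processors)
        if self.tasks:
            task = self.tasks.pop()  # newest task first, like a stack
            self.done.append(task())
        elif not self.waiting:
            # Assumption: the victim is the next processor, round-robin.
            victim = processors[(self.pid + 1) % len(processors)]
            victim.mailbox.append(("steal_request", self.pid))
            self.waiting = True


def run(num_processors, tasks, max_steps=1000):
    """Put every task on processor 0 and step all processors in turn."""
    processors = [Processor(i) for i in range(num_processors)]
    processors[0].tasks.extend(tasks)
    for _ in range(max_steps):
        for p in processors:
            p.step(processors)
        if sum(len(p.done) for p in processors) == len(tasks):
            break
    return processors


if __name__ == "__main__":
    procs = run(2, [lambda i=i: i * i for i in range(8)])
    for p in procs:
        print(p.pid, p.done)
```

In this sketch the task stack is never touched by another processor: all transfers go through the mailbox, which is the property the abstract credits with allowing the stack to be fully cached. The victim only notices a request when it polls, which is why the simulation calls poll at the start of every step.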