
Linear Logic and Coordination for Parallel Programming

Flávio Manuel Fernandes Cruz

CMU-CS-15-148

March 2016

Computer Science Department
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, USA

In consortium with:
Universidades do Minho, Aveiro e Porto, Portugal

Thesis Committee:
Frank Pfenning, Carnegie Mellon University (Co-Chair)
Seth Goldstein, Carnegie Mellon University (Co-Chair)
Ricardo Rocha, University of Porto (Co-Chair)
Umut Acar, Carnegie Mellon University
Luís Barbosa, University of Minho
Carlos Guestrin, University of Washington

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2016 Flávio Cruz

Support for this research was provided by the European Regional Development Fund (ERDF) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) and by the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), through the Carnegie Mellon Portugal Program, under grant SFRH/BD/51566/2011 and within projects POCI-01-0145-FEDER-006961 and FCOMP-01-0124-FEDER-037281. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, government or any other entity.

Keywords: Programming Languages, Parallel Programming, Coordination, Linear Logic, Logic Programming, Implementation

Abstract

Parallel programming is known to be difficult to apply, exploit and reason about. Programs written using low level parallel constructs tend to be problematic to understand and debug. Declarative programming is a step in the right direction because it moves the details of parallelization from the programmer to the runtime system. However, this paradigm leaves the programmer with few opportunities to coordinate the execution of the program, resulting in suboptimal declarative programs. We propose a new declarative programming language, called Linear Meld (LM), that provides a solution to this problem by supporting data-driven dynamic coordination mechanisms that are semantically equivalent to regular computation.

LM is a logic programming language designed for programs that operate on graphs and supports coordination and structured manipulation of mutable state. Coordination is achieved through two mechanisms: (i) coordination facts, which allow the programmer to control how computation is scheduled and how data is laid out, and (ii) thread facts, which allow the programmer to reason about the state of the underlying parallel architecture. The use of coordination allows the programmer to combine the inherent implicit parallelism of LM with a form of declarative explicit parallelism provided by coordination, allowing the development of complex coordinated programs that run faster than regular programs. Furthermore, since coordination is indistinguishable from regular computation, it allows the programmer to reason both about the problem at hand and also about parallel execution.

We have written several graph algorithms, search algorithms and machine learning algorithms in LM. For some programs, we have written informal proofs of correctness to show that programs are easily proven correct. We have also engineered a compiler and runtime system that is able to run LM programs on multi core architectures with decent performance and scalability.

Acknowledgments

I would like to thank my advisors for their help and encouragement during the development of this thesis. I thank Prof. Ricardo Rocha for his help during implementation and debugging, for his suggestions and for his immense attention to detail. I thank Prof. Frank Pfenning for helping me understand proof theory and for instilling a sense of rigor about my work. I thank Prof. Seth Goldstein for his great insights and for giving me the encouragement to pursue the ideas I had during the last 4 years. I have certainly become a better researcher because of you all.

Besides my advisors, I would like to express my gratitude to the following persons. A special acknowledgment to Prof. Fernando Silva, for encouraging me to apply to the PhD program and for believing that I could do it. To Michael Ashley-Rollman, for helping me understand his work on ensemble programming. This thesis would not be possible without his initial contribution.

I am thankful to the Fundação para a Ciência e a Tecnologia (FCT) for the research grant SFRH/BD/51566/2011, to the Center for Research and Advanced Computing Systems (CRACS), and to the Computer Science Department at CMU for their financial support.

To my fellow friends from DCC-FCUP, especially Miguel Areias, João Santos, and Joana Corte-Real, for their friendship and the excellent work environment, which certainly helped me during the time I spent in Porto. To my roommate Salil Joshi, for his friendship during my time at CMU. To all my other friends at CMU for their intellectually stimulating conversations.

My deepest gratitude to my parents and my sisters, for their support and for teaching me the value of hard work and persistence. Finally, I would also like to thank my wife Telma, for her affection and encouragement.

Flávio Cruz

Contents

List of Figures xi

List of Tables xvii

List of Equations xix

1 Introduction 1
1.1 Thesis Statement ...... 2

2 Parallelism, Declarative Programming, Coordination and Provability 5
2.1 Explicit Parallel Programming ...... 5
2.2 Implicit Parallel Programming ...... 6
2.2.1 Functional Programming ...... 6
2.2.2 Logic Programming ...... 7
2.2.3 Data-Centric Languages ...... 9
2.2.4 Granularity Problem ...... 10
2.3 Coordination ...... 10
2.4 Provability ...... 12
2.5 Chapter Summary ...... 12

3 Linear Meld: The Base Language 13
3.1 A Taste Of LM ...... 13
3.1.1 First Example: Message Routing ...... 14
3.1.2 Second Example: Key/Value Dictionary ...... 16
3.1.3 Third Example: Graph Visit ...... 17
3.2 Types and Locality ...... 21
3.3 Operational Semantics ...... 22
3.4 LM Abstract Syntax ...... 23
3.4.1 Selectors ...... 25
3.4.2 Exists Expressions ...... 25
3.4.3 Comprehensions ...... 26
3.4.4 Aggregates ...... 26
3.4.5 Directives ...... 27
3.5 Applications ...... 27
3.5.1 Bipartiteness Checking ...... 27
3.5.2 Synchronous PageRank ...... 29
3.5.3 Quick-Sort ...... 30
3.6 Related Work ...... 34
3.6.1 Graph-Based Programming Models ...... 34
3.6.2 Sensor Network Programming Languages ...... 35
3.6.3 Constraint Handling Rules ...... 36
3.6.4 Graph Transformation Systems ...... 36
3.7 Chapter Summary ...... 36

4 Logical Foundations: Abstract Machine 39
4.1 Linear Logic ...... 39
4.1.1 Sequent Calculus ...... 40
4.1.2 From The Sequent Calculus To LM ...... 41
4.2 High Level Dynamic Semantics ...... 43
4.2.1 Step ...... 44
4.2.2 Application ...... 44
4.2.3 Match ...... 45
4.2.4 Derivation ...... 45
4.3 Low Level Dynamic Semantics ...... 46
4.3.1 Application ...... 47
4.3.2 Continuation Frames ...... 48
4.3.3 Structure of Continuation Frames ...... 48
4.3.4 Match ...... 51
4.3.5 Backtracking ...... 53
4.3.6 Derivation ...... 54
4.3.7 Aggregates ...... 55
4.3.8 State well-formedness ...... 61
4.4 Soundness Proof ...... 63
4.4.1 Soundness Of Matching ...... 63
4.4.2 Soundness Of Derivation ...... 66
4.4.3 Aggregate Soundness ...... 66
4.4.4 Soundness Of Derivation ...... 72
4.4.5 Wrapping-up ...... 74
4.5 Related Work ...... 75
4.6 Chapter Summary ...... 75

5 Local Computation: Data Structures and Compilation 77
5.1 Implementation Overview ...... 77
5.2 Node Data Structure ...... 79
5.2.1 Indexing Engine ...... 81
5.3 Rule Engine ...... 82
5.4 Compilation ...... 84
5.4.1 Ordinary Rules ...... 84
5.4.2 Persistence Checking ...... 87
5.4.3 Updating Facts ...... 89
5.4.4 Enforcing Linearity ...... 89
5.4.5 Comprehensions ...... 90
5.4.6 Aggregates ...... 92
5.5 Chapter Summary ...... 93

6 Multi Core Implementation 95
6.1 Parallelism ...... 95
6.2 Runtime Data Structures And Garbage Collection ...... 101
6.3 Memory Allocation ...... 102
6.4 Experimental Evaluation ...... 103
6.4.1 Sequential Performance ...... 104
6.4.2 Memory Usage ...... 107
6.4.3 Dynamic Indexing ...... 108
6.4.4 Scalability ...... 109
6.4.5 Threaded allocator versus malloc ...... 117
6.4.6 Threaded allocator versus node-local allocator ...... 117
6.5 Related Work ...... 119
6.5.1 Virtual Machines ...... 119
6.5.2 Execution ...... 120
6.5.3 CHR implementations ...... 122
6.6 Chapter Summary ...... 122

7 Coordination Facts 123
7.1 Motivation ...... 123
7.2 Types of Facts ...... 126
7.3 Scheduling Facts ...... 127
7.4 Partitioning Facts ...... 128
7.5 Global Directives ...... 129
7.6 Implementation ...... 129
7.6.1 Node State Machine ...... 131
7.6.2 Coordination Instructions ...... 131
7.7 Coordinating SSSP ...... 132
7.8 Applications ...... 135
7.8.1 MiniMax ...... 138
7.8.2 N-Queens ...... 141
7.8.3 Heat Transfer ...... 147
7.8.4 Remarks ...... 148
7.9 Related Work ...... 149
7.10 Chapter Summary ...... 150

8 Thread-Based Facts 153
8.1 Motivating Example: Graph Search ...... 153
8.1.1 Language Changes ...... 155
8.1.2 Graph Of Threads ...... 156
8.1.3 Reachability With Thread Facts ...... 157
8.2 Implementation Changes ...... 161
8.2.1 Compiler ...... 161
8.2.2 Runtime ...... 161
8.3 Applications ...... 163
8.3.1 Binary Search Trees: Caching Results ...... 163
8.3.2 PowerGrid Problem ...... 167
8.3.3 Splash Belief Propagation ...... 173
8.4 Modeling the Operational Semantics in LM ...... 180
8.5 Related Work ...... 182
8.6 Chapter Summary ...... 184

9 Conclusions 185
9.1 Main Contributions ...... 185
9.2 Drawbacks of LM ...... 187
9.3 Future Work ...... 188
9.4 Final Remark ...... 189

A Sequent Calculus 191

B High Level Dynamic Semantics 193
B.1 Step ...... 193
B.2 Application ...... 193
B.3 Match ...... 193
B.4 Derivation ...... 194

C Low Level Dynamic Semantics 195
C.1 Application ...... 195
C.2 Match ...... 195
C.3 Continuation ...... 196
C.4 Derivation ...... 196
C.5 Aggregates ...... 197
C.5.1 Match ...... 197
C.5.2 Stack Transformation ...... 198
C.5.3 Backtracking ...... 198
C.5.4 Derivation ...... 199

D LM Directives 201

E Intermediate Representation 203
E.1 Iteration Instructions ...... 203
E.1.1 Return Instructions ...... 204
E.2 Conditional Instructions ...... 204
E.3 Arithmetic and Boolean Instructions ...... 205
E.4 Data Instructions ...... 206
E.5 List Instructions ...... 206
E.6 Derivation and Removal Instructions ...... 206

F Further Examples 209
F.1 Asynchronous PageRank ...... 209

G Program Proofs 215
G.1 Single Source Shortest Path ...... 215
G.2 MiniMax ...... 216
G.3 N-Queens ...... 219
G.4 Binary Search Tree ...... 221

Bibliography 225

List of Figures

3.1 LM code for routing messages in a graph. ...... 14
3.2 Initial facts for the message routing program. There is only one message ("hello world") to route through nodes @3 and @4. ...... 15
3.3 An execution trace for the message routing program. The message "hello world" travels from node @1 to node @4. ...... 15
3.4 LM program for replacing a key's value in a BST dictionary. ...... 16
3.5 Initial facts for replacing a key's value in a BST dictionary. ...... 17
3.6 An execution trace for the binary tree dictionary program. The first argument of each fact was dropped and the address of the node was placed beside it. ...... 18
3.7 LM code for the graph visit program. ...... 18
3.8 Initial facts for the graph visit program. Nodes reachable from node @1 will be marked as visited. ...... 19
3.9 A possible execution trace for the visit program. Note that the edge facts were omitted for simplicity. ...... 20
3.10 Bipartiteness Checking program. ...... 28
3.11 Synchronous PageRank program. ...... 31
3.12 Quick-Sort program written in LM. ...... 32

4.1 Generating the PageRank aggregate...... 56

5.1 Compilation of a LM program into an executable. The compiler transforms a LM program into a C++ file, file.cpp, with compiled rules, and into a data file with the graph structure and initial facts, file.data. The virtual machine, which includes code for managing multi threaded execution and the database of facts, is then linked with both files to create a stand-alone executable that can be run in parallel. ...... 78
5.2 The node state machine. ...... 79
5.3 Layout of the node data structure. ...... 80
5.4 Hash tree data structure for a p(node, int, int) predicate containing the following facts: p(@1, 0, 20), p(@1, 1, 20), p(@1, 1, 3), p(@1, 2, 13), and p(@1, 2, 43). Facts are indexed by computing arg2 mod 10 where arg2 is the third argument of the fact. The node argument is not stored. ...... 81

5.5 Example program and corresponding rule engine data structures. The initial state is represented in (a), where the rules scheduled to run are 1, 2 and 4. After attempting rule 1, bit 0 is unset from the Rule Queue, resulting in (b). Figure (c) is the result of applying rule 2, a -o c, which marks rule 5 in the Rule Queue since the rule is now available in the Rule Counter. ...... 83
5.6 Compiling LM rules into C++ code. ...... 88
5.7 Executing the add rule. First, the two iterators point to the first and second facts and the former is updated while the latter is consumed. The second iterator then moves to the next fact and the first fact is updated again, now to the value 6, the expected result. ...... 91

6.1 The thread state machine as represented by the State flag. During the lifetime of a program, each thread goes through different states as specified by the state machine. ...... 96
6.2 Thread work loop: threads process active nodes from the work queue until no more active nodes are available. Node stealing using a steal half strategy is employed when the thread has no more active nodes. ...... 97
6.3 Layout of the virtual machine. Each thread has a work queue that contains active nodes (nodes with facts to process) that are processed one by one by the thread. Communication between threads happens when nodes send facts to nodes located in other threads. ...... 98
6.4 Pseudo-code for the process node procedure. ...... 99
6.5 Synchronization code for sending a fact to another node. ...... 100
6.6 Synchronization code for node stealing (source thread steals nodes from target thread). ...... 101
6.7 Threaded allocator: each thread has 3 memory pools for objects and each pool contains a set of memory chunks. Each pool also contains an ordered array for storing deallocated objects of a particular size. In this example, the pool for small objects has an ordered array for sizes 4 and 8, where size 4 has 3 free objects and size 8 has just 1. ...... 102
6.8 Overall scalability for the benchmarks presented in this section. The average speedup is the weighted harmonic mean using the single threaded run time as the weight of each benchmark. ...... 111
6.9 Scalability results of the Belief Propagation program. ...... 111
6.10 Scalability results for the Heat Transfer program. Because the Heat Transfer program performs poorly when compared to the C++ program, the LM version needs at least 15 threads to reach the run time of the C++ program (for the 120x120 dataset). ...... 112
6.11 Scalability of the Greedy Graph Coloring program. ...... 112
6.12 Scalability results for the N-Queens program. The 13 configuration has 73712 solutions, while the 14 configuration has 365596 solutions. In terms of graph size, the 13 configuration has 13×13 nodes while the 14 configuration has 14×14 nodes. ...... 113
6.13 Scalability for the MiniMax program. Although the Big configuration has more work available and could in principle scale better than Small, the high memory requirements of Big make it scale poorly. ...... 114
6.14 Scalability for the MSSD program. The LM system scales better when there is more work to do per node, with the US Power Grid dataset showing the most scalability. ...... 115
6.15 Scalability results for the asynchronous PageRank program. The superior scalability of the Google Plus dataset may be explained by its higher density of edges. ...... 116
6.16 Comparing the threaded allocator described in Section 6.3 against the malloc allocator provided by the C library. ...... 118
6.17 Fact allocator: each node has a pool of memory pages for allocating logical facts. Each page contains: (i) several linked lists of free facts of the same size (free array); (ii) a reference count of used facts (refcount); (iii) a ptr pointer that points to unallocated space in the page. In this figure, predicates f and g have several deallocated facts that are ready to be used when a new fact needs to be acquired. ...... 119
6.18 Comparing the threaded allocator described in Section 6.3 against the node-local allocator. ...... 120
6.19 Comparing the threaded allocator described in Section 6.3 against the fact allocator without reference counting. ...... 121

7.1 Single Source Shortest Path program code. ...... 124
7.2 Initial facts for the SSSP program. ...... 124
7.3 Graphical representation of an example dataset for the SSSP program: (a) represents the program's state after propagating the initial distance at node @1, followed by (b) where the first rule is applied in node @2, and (c) represents the final state of the program, where all the shortest paths have been computed. ...... 125
7.4 Priorities with sub-graph partitioning. Priorities are used on a per-thread basis, therefore thread T0 schedules @0 to execute, while T1 schedules node @4. ...... 128
7.5 Moving node @2 to thread T1 using set-thread(@2, T1). ...... 129
7.6 Thread's work queue and its interaction with other threads: the dynamic queue contains nodes that can be stolen while the static queue contains nodes that cannot be stolen. Both queues are implemented with one standard queue and one priority queue. ...... 130
7.7 Node state machine extended with the new coordinating state. ...... 131
7.8 Shortest Path Program taking advantage of the set-priority coordination fact. ...... 132
7.9 Graphical representation of the new SSSP program in Fig. 7.8. (a) represents the program after propagating the initial distance at node @1, followed by (b) where the first rule is applied in node @3. (c) represents the result after retracting all the relax facts at node @2 and (d) is the final state of the program where all the shortest paths have been computed. ...... 133
7.10 Scalability comparison for the MSSD program when enabling coordination facts. ...... 136
7.11 Scalability for the MSSD program when not coalescing coordination directives. ...... 137
7.12 LM code for the MiniMax program. ...... 139
7.13 Scalability for the MiniMax program when using coordination. ...... 141
7.14 Propagation of states using the send-down fact. ...... 142
7.15 N-Queens problem solved in LM. ...... 143
7.16 Final database for the 8-Queens program. ...... 144
7.17 Scalability for the N-Queens 13 program when using different scheduling and partitioning policies. ...... 145
7.18 Scalability for the N-Queens 14 program when using different scheduling and partitioning policies. ...... 146
7.19 Coordination code for the Heat Transfer program. ...... 147
7.20 Scalability for the Heat Transfer program when using coordination. We used the datasets used in Section 6.4.1. ...... 148
7.21 To improve locality, we add an extra constraint to the second rule to avoid sending small δ values if the target node is in another thread. ...... 148
7.22 Scalability for the Heat Transfer program when using coordination and avoiding some communication between nodes of different threads. ...... 149

8.1 LM code to perform reachability checking on a graph. ...... 154
8.2 Performing reachability checks on a graph using nodes @1 (Id = 0) and @6 (Id = 1). Search with Id = 0 wants to reach nodes @1, @2, and @3 from node @1. Since @1 is part of the target nodes, the fact do-search propagated to neighbor nodes does not include 1. ...... 155
8.3 An example program being executed with three threads. Note that each thread has a running fact that stores the node currently being executed. ...... 156
8.4 Coordinated version of the reachability checking program. Note that @threads represents the number of threads in the system. ...... 159
8.5 Measuring the performance of the graph reachability program when using thread facts. ...... 160
8.6 Thread work loop updated to take into account thread-based facts. New or modified code is underlined. ...... 162
8.7 LM program for performing lookups in a BST extended to use a thread cache. ...... 165
8.8 Scalability for the Binary Search Tree program when using thread based facts. ...... 166
8.9 Configuration of a power grid with 6 consumers, 4 generators and 2 threads, with each thread responsible for 2 generators. ...... 167
8.10 LM code for the regular PowerGrid program. ...... 168
8.11 Initial facts for a PowerGrid configuration of 6 consumers and 4 generators. ...... 169
8.12 Final facts for a PowerGrid configuration of 6 consumers and 4 generators. ...... 169
8.13 LM code for the optimized PowerGrid program. ...... 171
8.14 Measuring the performance of the PowerGrid program when using thread facts. ...... 172
8.15 LBP communication patterns. new-neighbor-belief facts are sent to the neighborhood when the node's belief value is updated. If the new value sent to the neighbor differs significantly from the value sent before the current round, then an update fact is also sent (to the node above and below in this case). ...... 173
8.16 LM code for the asynchronous version of the Loopy Belief Propagation problem. ...... 176
8.17 Extending the LBP program with priorities. ...... 177
8.18 Creating splash trees using two threads. The graph is partitioned into two regions and each thread is able to build separate splash trees starting from the highest priority node. ...... 178
8.19 LM code for the Splash Belief Propagation program. ...... 179
8.20 Comparing the scalability of LM against GraphLab's fifo and multiqueue schedulers. ...... 180
8.21 Evaluating the performance of SBP over LBP in LM and GraphLab. ...... 181
8.22 Modeling the operational semantics as a LM program. The underlined code represents how an example rule node-fact(A, Y), other-fact(A, B) -o remote-fact(B), local-fact(A) needs to be translated for modeling the semantics. ...... 183
8.23 Modeling the operational semantics for coordination facts as a LM program. ...... 184

F.1 Asynchronous PageRank program...... 210

List of Tables

3.1 Core abstract syntax of LM...... 23

4.1 Connectives from linear logic and their use in LM...... 42

6.1 Experimental results comparing different programs against hand-written versions in C++. For the C++ programs, we show the execution time in seconds (C++ Time (s)). In columns LM and Other, we show the speedup ratio of C++ by dividing the run time of the target system by the run time of the C++ program. Numbers greater than 1 mean that the C++ program is faster. ...... 105
6.2 Memory statistics for LM programs. The meaning of each column is as follows: column Average represents the average memory use of the program over time; Final represents the memory usage after the program completes; # Facts represents the number of facts in the database after the program completes; Each is the result of dividing Final by # Facts and represents the average memory required per fact. ...... 108
6.3 Average and final memory usage of each C++ program. The columns C / LM compare the memory usage of C programs against the memory usage of LM programs for the average and final memory usage, respectively (higher numbers are better). ...... 109
6.4 Measuring the impact of dynamic indexing and related data structures. Column Run Time shows the slow down ratio of the unoptimized version (numbers greater than 1 show indexing improvements). Column Average Memory is the result of dividing the average memory of the optimized version by the unoptimized version (large numbers indicate that more memory is needed when using indexing data structures). ...... 110

7.1 Complexity of queue operations for both the standard queue and the priority queue. ...... 131
7.2 Fact statistics for the MSSD program when run on 1 thread. For each dataset, we gathered the number of facts derived, number of facts deleted, number of facts sent to other nodes and total number of facts in the database when the program terminates. Columns # Derived, # Sent and # Deleted show the number of facts for the regular version (first column) and then the ratio of facts for the coordinated version over the regular version. Percentage values less than 100% mean that the coordinated version produces fewer facts. ...... 134

7.3 Comparing the memory usage of the regular and coordinated MiniMax programs. ...... 140

8.1 Fact statistics for the Regular version and the Cached version. The first two Cached columns show the ratio of the Cached version over the Regular version. Percentages less than 100% mean that the Cached version produces fewer facts. ...... 164
8.2 Measuring the reduction in derived facts when using thread-based facts. The first two Threads columns show the ratio of the Threads version over the Regular version. Percentages less than 100% mean that the Threads version produces fewer facts. ...... 175

List of Equations

3.1.1 Definition (Visit graph) ...... 19
3.1.1 Invariant (Node state) ...... 19
3.1.2 Definition (Node sets) ...... 20
3.1.2 Invariant (Visited set) ...... 20
3.1.1 Lemma (Edge visits) ...... 21
3.1.3 Definition (Connected graph) ...... 21
3.1.1 Theorem (Graph visit correctness) ...... 21
3.5.1 Invariant (Node state) ...... 27
3.5.1 Variant (Bipartiteness Convergence) ...... 28
3.5.1 Theorem (Bipartiteness Correctness) ...... 28
3.5.2 Theorem (Bipartiteness Failure) ...... 29
3.5.1 Lemma (Split lemma) ...... 33
3.5.3 Theorem (Sort theorem) ...... 33

4.3.1 Theorem (Match equivalence) ...... 49
4.3.1 Definition (Split contexts) ...... 49
4.3.2 Theorem (Split equivalence) ...... 50
4.3.2 Definition (Well-formed frame) ...... 50
4.3.3 Definition (Well-formed stack) ...... 51
4.3.4 Definition (Well-formed LHS match) ...... 61
4.3.5 Definition (Well-formed backtracking) ...... 61
4.3.6 Definition (Well-formed aggregate match) ...... 62
4.3.7 Definition (Well-formed aggregate backtracking) ...... 62
4.3.8 Definition (Well-formed stack update) ...... 62
4.3.9 Definition (Well-formed aggregate derivation) ...... 62
4.4.1 Theorem (Rule transitions preserve well-formedness) ...... 63
4.4.1 Lemma (LHS match result) ...... 64
4.4.2 Lemma (Aggregate LHS match) ...... 66
4.4.2 Theorem (From update to derivation) ...... 67
4.4.1 Corollary (Match to derivation) ...... 68
4.4.3 Theorem (Aggregate derivation soundness) ...... 68
4.4.4 Theorem (Multiple aggregate derivation) ...... 69
4.4.3 Lemma (All aggregates) ...... 71
4.4.4 Lemma (RHS derivation soundness) ...... 72

4.4.5 Theorem (Soundness) ...... 74

F.1.1 Invariant (Page invariant) ...... 210
F.1.1 Lemma (Neighbor rank lemma) ...... 211
F.1.2 Lemma (Update lemma) ...... 211
F.1.3 Lemma (Pagerank update lemma) ...... 211
F.1.2 Invariant (New neighbor rank equality) ...... 212
F.1.3 Invariant (Neighbor rank equality) ...... 212
F.1.1 Theorem (Pagerank convergence) ...... 213

G.1.1 Invariant (Distance) ...... 215
G.1.1 Lemma (Relaxation) ...... 215
G.1.1 Theorem (Correctness) ...... 215
G.2.1 Lemma (Play Lemma) ...... 216
G.2.2 Lemma (Children Lemma) ...... 217
G.2.3 Lemma (Maximize Lemma) ...... 218
G.2.4 Lemma (Minimize Lemma) ...... 218
G.2.1 Theorem (Score Theorem) ...... 218
G.2.1 Corollary (MiniMax) ...... 219
G.3.1 Lemma (test-y lemma) ...... 219
G.3.2 Lemma (test-diag-left lemma) ...... 219
G.3.3 Lemma (test-diag-right lemma) ...... 220
G.3.1 Theorem (State validation) ...... 220
G.3.4 Lemma (Propagate left lemma) ...... 220
G.3.5 Lemma (Propagate right lemma) ...... 221
G.3.2 Theorem (States theorem) ...... 221
G.4.1 Invariant (Immutable keys) ...... 221
G.4.2 Invariant (Thread state) ...... 222
G.4.1 Theorem (Valid replace) ...... 222

Chapter 1

Introduction

Multi core architectures have become more widespread recently and are forcing the development of new software methodologies that enable developers to take advantage of increasing processing power through parallelism. However, parallel programming is difficult, usually because programs are written in imperative and stateful programming languages that make use of low level synchronization primitives such as locks, mutexes and barriers. This tends to make the task of managing multi threaded execution complicated and error-prone, resulting in race hazards and deadlocks. In the future, many-core processors will make this task even more daunting.

Past developments in parallel and distributed programming have given birth to several programming models. At one end of the spectrum are the lower-level programming abstractions such as message passing (e.g., MPI [GFB+04]) and shared memory (e.g., Pthreads [But97] or OpenMP [CJvdP07]). While such abstractions give a lot of control to the programmer and provide excellent performance, these APIs are hard to use and debug, which makes it difficult to prove that a program is correct, for instance. On the opposite end, we have many declarative programming models [Ble96] that can exploit some form of implicit parallelism. Implicit parallelism allows the runtime system to automatically exploit parallelism by deciding which tasks to run in parallel. However, it is not always obvious which tasks to run in parallel and which tasks to run sequentially. This important issue, known as the granularity problem, needs to be handled well because creating too many parallel tasks will make the program run very slowly, since there is a significant overhead for creating parallel tasks. It is therefore fundamental that there is a good mapping between tasks and available processors.

A different, but related, problem is that declarative programming paradigms offer little to no control to the programmer over how parallel execution is scheduled or how data is laid out, making it hard to improve efficiency. Even if the runtime system reasonably solves the granularity problem, there is a lack of specific information about the program that a compiler cannot easily deduce. Furthermore, the program's data layout is also critical for performance, since poor data locality will degrade performance even if the task granularity is optimal. If the programmer could provide such information, then execution would improve in terms of run time, memory usage, or scalability.

In the context of the Claytronics project [GCM05], Ashley-Rollman et al. [ARLG+09, ARRS+07] created Meld, a logic programming language suited to program massively distributed systems made of modular robots with a dynamic topology. Meld programs can derive actions that are used by the robots to act on the outside world. The distribution of computation is done by first

partitioning the program state across the robots and then by making the computation local to the robot. Because Meld programs are sets of logical clauses, they are more amenable to proofs than programs written using lower-level programming abstractions.

In this thesis, we present Linear Meld (LM), a new language for parallel programming over graph data structures that extends the original Meld programming language with linear logic and coordination. Linear logic gives the language a structured way to manage state, allowing the programmer to assert and retract logical facts. While the original Meld sees a running program as an ensemble of robots, LM sees the program as a graph of node data structures, where each node performs computation independently of other nodes and is able to communicate with its neighborhood of nodes. Using the graph as the main program abstraction, LM also addresses the granularity problem by allowing nodes to be grouped into tasks that can be run in a single thread of control.

LM introduces a new mechanism, called coordination, that is semantically equivalent to regular computation and allows the programmer to reason about parallel execution. Coordination introduces the concept of coordination facts, which are logical facts used for scheduling and data partitioning purposes, and thread facts, which allow the programmer to reason about the state of the underlying parallel architecture. The use of these new facilities moves the LM language from the paradigm of implicit parallelism to some form of declarative explicit parallelism, but without the pitfalls of imperative parallel programming. In turn, this makes LM a novel declarative language that allows the programmer to optionally control how execution and data are managed by the runtime system.

Our main goal with LM is to efficiently execute provably correct declarative graph-based programs on multi core machines. To show this, we wrote many graph-based algorithms, proved program correctness for some programs and developed a compiler and runtime system where we have seen good experimental results. Finally, we have also used coordination in some programs and we were able to see interesting improvements in terms of run time, memory usage and scalability.

1.1 Thesis Statement

In this thesis, we describe a new linear logic programming language, called Linear Meld (LM), designed to write declarative parallel graph-based programs for multi core architectures. Our goal with LM is to prove the following thesis:

Linear logic and coordination provide a suitable basis for a programming language designed to express parallel graph-based programs which, without a significant loss of efficiency, are scalable and easy to reason about.

LM is a novel programming language that makes coordination a first-class programming construct that is semantically equivalent to computation. Coordination allows the program- mer to write declarative code that reasons about the underlying parallel architecture in order to improve the run time and scalability of programs. Since LM is based on logical foundations, programs are amenable to reasoning, even in the presence of coordination.

We support our thesis through seven major contributions:

Linear Logic We integrated linear logic into our language, so that program state can be encoded naturally. The original Meld was fully based on classical logic, where everything that is derived is true forever. Linear logic turns some facts into resources that can be consumed when a rule is applied. To the best of our knowledge, LM is the first linear logic based language implementation that attempts to solve real world problems.

Coordination Facts LM offers execution control to the programmer through the use of coordination facts. These coordination facts change how the runtime system schedules computation and partitions data, and are semantically equivalent to standard computation facts. We can increase the priority of certain nodes according to the state of the computation and to the state of the runtime in order to make better scheduling decisions so that programs can be more scalable and run faster.

Thread-Based Facts While LM can be classified as a programming language with implicit parallelism, it introduces thread facts to support some form of declarative explicit parallelism. Thread facts allow the programmer to reason about the state of the underlying parallel architecture, namely the execution thread, by managing facts about each thread. Thread facts can be used along with regular facts belonging to nodes and coordination facts, allowing for the implementation of customized parallel scheduling algorithms. We show how some programs take advantage of this facility to create optimal scheduling algorithms that are not possible in the limited context of implicit parallelism.

Provability We leveraged the logical foundations of LM to show how to prove the correctness of programs. We also show that coordination does not change those correctness proofs but only improves run time, scalability or memory usage.

Efficient Runtime and Compilation We have implemented a runtime system with support for efficient data structures for handling linear facts and a compiler that is designed to transform inference rules into C++ code that makes use of those data structures. We have achieved a sequential performance that is less than one order of magnitude slower than hand-written sequential C++ programs and is competitive with some more mature frameworks such as GraphLab [LGK+10] or Ligra [SB13].

Multi core Parallelism The logical facts of the program are partitioned across the graph of nodes. Since the logical rules only make use of facts from a single node, computation can be performed locally, independently of other nodes of the graph. We view applications as a communicating graph data structure where each processing unit performs work on a different subset of the graph, thus enabling concurrency. This allows LM to solve the granularity problem by allowing computation to be grouped into sub-graphs that can be processed by different processing cores. Even if there is work imbalance between cores, it is possible to easily move nodes between cores to achieve better load balancing and scheduling.

Experimental Results We have implemented a compiler and a virtual machine prototype from scratch that executes on multi core machines. We have implemented programs such as belief propagation, belief propagation with residual splash, PageRank, graph coloring, N-Queens, shortest path, diameter estimation, MapReduce, game of life, quick-sort, neural

network training, among others. Our experiments, performed on a machine with 32 cores, show that our implementation provides good scalability with up to 32 threads of execution.

Chapter 2

Parallelism, Declarative Programming, Coordination and Provability

In this chapter, we introduce background concepts related to parallel programming, declarative programming with a focus on logic programming, coordination and provability of parallel programs. Some sections of this chapter may be skipped if the reader is already familiar with the topics at hand.

2.1 Explicit Parallel Programming

A popular paradigm for implementing parallelism is explicit parallel programming, where parallelism must be explicitly defined and all the mechanisms for coordinating parallel execution must be specified by the programmer. Explicit parallel programming is typically made possible through the use of conventional programming languages extended with either a multi-threaded or a message passing model of programming.

Multi-Threaded Model In multi-threaded parallelism, the programmer must coordinate the execution of multiple threads of control by sharing state through a shared memory area. The existence of a shared memory area makes it easy to share data between workers; however, access to data from multiple workers needs to be protected, otherwise it might become inconsistent. Many constructs are available to ensure mutual exclusion, such as locks [SGG08], semaphores [Dij02], mutexes [SGG08], and condition variables [Hoa74] (a small sketch of lock-based mutual exclusion follows this list).

Message Passing In message passing, communication is not done through a shared memory area, but by allowing, as the name says, messages to be sent between threads of control or processes. Message passing is well suited for programming clusters of computers, where it is not possible to have a shared memory area; however, message processing is more costly than shared memory due to the extra work required to send and serialize messages. The most well known framework for message passing is the Message Passing Interface (MPI) [For94].
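As an illustration of the multi-threaded model, the following Python sketch shows several threads updating shared state under a lock. It is a minimal analogue of the mutual exclusion constructs mentioned above, not a fragment of any of the systems cited.

```python
import threading

counter = 0                # shared state
lock = threading.Lock()    # protects the shared counter

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, this read-modify-write could interleave
        # with another thread and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: the lock makes each update atomic
```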

Since explicit parallel programming is hard to get right due to the non-deterministic nature of parallel execution, other, less explicit, parallel programming models were devised in order to alleviate some of the problems inherent to the model. One such model is the fork/join model, which was made popular by the Cilk [BJK+95] programming language. In Cilk, the programmer writes C programs that can spawn parallel tasks and then explicitly join (or wait) until the spawned tasks are complete. An advantage of Cilk over more pure explicit parallel programming is that the parallel control is performed by the compiler and runtime system, allowing for automatic control over the threads of control through the use of techniques such as work stealing, where spawned tasks can be stolen by other processors. However, the programmer still must explicitly indicate where procedures can be spawned in parallel and where they should join.

Another model that uses the fork/join model is the X10 [CGS+05] programming language, where spawned procedures are called activities. However, X10 innovates by introducing the concept of places, which are processes that run on different machines and do not share any memory area. This is also commonly called the partitioned global address space (PGAS) model, since the memory area is partitioned among processes in order to increase locality. While X10 solves many of the communication and synchronization issues between threads of control and processes, it is still the job of the programmer to effectively use the parallel programming constructs in order to take advantage of parallelism.
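The fork/join pattern can be sketched in Python using a thread pool as a stand-in for Cilk's spawn and sync keywords; the depth cutoff, a common idiom, falls back to sequential execution so that tiny subproblems do not pay the task creation overhead. This is a simplified analogue, not Cilk itself.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(pool, n, depth=0):
    if n < 2:
        return n
    # Below the cutoff, recurse sequentially: spawning a task for a
    # tiny subproblem would cost more than the work itself.
    if depth >= 3:
        return fib(pool, n - 1, depth) + fib(pool, n - 2, depth)
    # "Spawn": run the first recursive call as a parallel task...
    future = pool.submit(fib, pool, n - 1, depth + 1)
    right = fib(pool, n - 2, depth + 1)
    # ..."join": wait for the spawned task to complete.
    return future.result() + right

with ThreadPoolExecutor(max_workers=4) as pool:
    print(fib(pool, 20))  # 6765
```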

2.2 Implicit Parallel Programming

Another form of parallel programming is implicit parallel programming, a style of programming commonly present in declarative programming languages. The declarative style of programming allows the programmer to concentrate more on what the problem is, rather than telling the computer the steps it needs to do to solve the problem. In turn, this gives freedom to the language implementation to work out the details of parallelism, solving many of the issues that arise when writing explicit parallel programs. The programmer writes code without having to deal with parallel programming constructs and the compiler automatically parallelizes the program in order to take advantage of multiple threads of execution. Unfortunately, this comes at a cost of low programmer control over how computation is performed, which may not be optimal under every circumstance. However, declarative programming remains attractive because formal reasoning is performed at a higher level and does not have to deal with low level parallel programming constructs such as locks or message passing.

2.2.1 Functional Programming

Functional programming languages are considered declarative programming languages and are based on the lambda calculus, a model for function application and abstraction. In pure functional programming languages, the side effect free nature of computation allows multiple expressions to evaluate safely in parallel [LRS+03]. This has been realized in languages such as Id [Nik93] and SISAL [GDF+01] with relative success. However, this kind of parallelism still remains elusive in the functional programming community, since practical functional programs have, in general, a high number of fine-grained parallel tasks, which makes it harder for a compiler to schedule computations efficiently [SJ09]. Alternative approaches such

as data parallelism [Ble96, CLJ+07, JLKC08], semi-explicit parallelism [MML+10, FRRS10], and explicit parallelism [HMPJH05] have been shown to be more effective in practice.

Data parallelism attempts to partition data among a set of processing units and then apply the same operation on the data. The simplest form of data parallelism is flat data parallelism, where the data is flat (a list or an array) and is easily partitioned. However, functional applications are composed of many recursive data manipulation operations, which are not a good fit for flat data parallelism. In the NESL language [Ble96], a new compiler transformation was proposed that could take a program using nested data parallelism and turn it into a program using flat data parallelism, which is easier to parallelize. This approach has later been implemented in more modern languages such as Haskell [CLJ+07]. The main advantage of this approach is that it remains true to the original goal of implicit parallelism.

In semi-explicit parallelism, the programmer uses an API to tell the runtime system which computations should be carried out in parallel, thus avoiding the fine-grain granularity problem. An example of such an API are the so-called sparked computations [MML+10], which have been made available in the Haskell programming language and express the possibility of performing speculative computations which are going to be needed in the future. In a sense, sparked computations can be seen as lazy futures [BH77]. This type of construct was also realized in the Manticore language [FRRS10], which introduces the pval binding form for creating potentially parallel computations. Interestingly, Manticore also combines support for parallel arrays in the style of NESL with explicit parallelism in the form of a spawn keyword that allows the programmer to create threads that communicate using message passing.
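A minimal illustration of flat data parallelism, sketched in Python: the runtime partitions a flat array into chunks and applies the same function to every element. NESL-style flattening reduces nested data parallelism to exactly this form.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    data = list(range(1_000_000))
    # The flat array is partitioned into chunks and the same
    # operation is applied to every element, one chunk per worker.
    with Pool(processes=4) as pool:
        result = pool.map(square, data, chunksize=250_000)
    print(result[:5])  # [0, 1, 4, 9, 16]
```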

2.2.2 Logic Programming

Another example of declarative programming that is suitable for implicit parallel programming is logic programming. Logic programming is usually based on classical logics, such as the predicate and propositional calculus, or on non-classical logics such as constructive or intuitionistic logic, where the goal is to construct different models of logical truth.

Logical systems define their own meaning of truth and a set of consistent rules to manipulate truth. Proofs about statements in a logical system are then constructed by using the rules appropriately. For instance, the system of propositional logic contains the modus ponens rule, where truth is manipulated as follows: if P implies Q and P is known to be true, then Q must also be true. For example, if we know that "it is raining" and that "if it is raining then the grass is wet", we can prove that "the grass is wet" by using the modus ponens rule.

Logic programming arises when the proof search strategy in a logical system is fixed. In the case of propositional logic, the following programming model can be devised:

• A set R of implications of the form "P implies Q" represents the program;
• A set D of truths of the form "P" represents the database of facts;
• Computation proceeds by proof search where implications in R are used to derive new truths using facts from D;
• Every new truth generated using modus ponens is added back to D.

This particular proof search mechanism is called forward-chaining (or bottom-up) logic programming, since it starts from the initial facts and then uses inference rules (modus ponens) to derive new truths.
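The saturation loop just described can be sketched in a few lines of Python, assuming rules with a single premise; db plays the role of D and rules the role of R.

```python
def forward_chain(rules, facts):
    """Saturate the database: apply "P implies Q" whenever P holds."""
    db = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Modus ponens: P in the database and "P implies Q"
            # together derive Q, which goes back into the database.
            if premise in db and conclusion not in db:
                db.add(conclusion)
                changed = True
    return db  # quiescence: no rule can derive anything new

R = [("it is raining", "the grass is wet")]
D = {"it is raining"}
print(forward_chain(R, D))  # both propositions are now true
```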

The proof search may then stop if the search has found a particular proposition to be true or when a state of quiescence is reached (it is not possible to derive any more truths).

An alternative proof search strategy is backward-chaining (or top-down) logic programming. In this style of programming, we want to know if a particular proposition is true and then work backwards using the inference rules. For instance, consider the implications: (i) "if it is raining then the grass is wet" and (ii) "if the weather forecast for today is rain then it is raining", and the proposition (iii) "the weather forecast for today is rain". If we want to know if (iv) "the grass is wet", we select the implication (i) and attempt to prove "it is raining", since it is required by (i) to prove (iv). Next, the goal proposition becomes "it is raining"; the conclusion of implication (ii) matches and thus we have to prove "the weather forecast for today is rain", which can be proved immediately using proposition (iii).

Prolog [CR93] was one of the first logic programming languages to become available, yet it is still one of the most popular logic programming languages in use today. Prolog is based on First Order Logic, a logical system that extends propositional logic with predicates and variables. Prolog is a backward-chaining logic programming language where a program is composed of a set of rules of the form "P implies Q" that can be activated by inputting a query. Given a query q(x̂), a Prolog interpreter will work backwards by matching q(x̂) against the head Q of a rule. If found, the interpreter will then try to match the body P of the rule, recursively, until it finds the program's base facts. If the search procedure succeeds, the interpreter finds a valid substitution for the x̂ variables in the given query.

Datalog [RU93, Ull90] is a forward-chaining logic programming language originally designed for deductive databases. Datalog has been traditionally used in deductive databases, but is now being increasingly used in other fields such as sensor nets [CPT+07], cloud computing [ACC+10], or social networks [SPSL13]. In Datalog, the program is composed of a database of facts and a set of rules. Datalog programs first populate the database with the starting facts and then saturate the database using rule inference. In Datalog, logical facts are persistent and once a fact is derived, it cannot be deleted. However, there has been a growing interest in integrating linear logic [Gir87] into bottom-up logic programming, allowing for both fact assertion and retraction [CCP03, LPPW05a, SP08, CRGP14], which is one of the topics of this thesis.

In logic programming languages such as Prolog, researchers took advantage of the non-determinism of proof search to evaluate subgoals in parallel. The most famous models are OR-parallelism and AND-parallelism [GPA+01]. When performing proof search with two implications of the form "P implies Q" and "R implies Q", we have OR-parallelism because the proof search can select "P implies Q" and try to prove P, but can also select "R implies Q" and try to prove R. In AND-parallelism, there is an implication of the form "P and R implies Q" and, to prove Q, it is possible to prove P and R at the same time.
AND-parallelism becomes more complicated when P and R actually depend on each other, for example, if P = prop1(x̂) and R = prop2(x̂, ŷ), then the variables x̂ must be the same in the two propositions. This issue does not arise in OR-parallelism; however, AND-parallelism may be better when the program is more deterministic (i.e., it contains fewer alternative implications), since rule bodies have more subgoals than rule heads.

In Datalog programs, parallelism arises naturally because new logical facts may activate multiple inference rules and thus generate more facts [GST90, SL91, WS88]. A trivial parallelization

can be done by splitting the rules among processing units; however, this may require sharing of logical facts depending on the rule partitioning [WS88]. Another alternative is to partition the logical facts among the machines [ZWC91, LCG+06], where rules are restricted in order to facilitate fact partitioning and communication. The LM language presented in this thesis follows this latter approach.

Origins of LM

LM is a direct descendant of Meld by Ashley-Rollman et al. [ARLG+09, ARRS+07], a logic programming language developed in the context of the Claytronics project [GCM05]. Meld is a language suited for programming massively dynamic distributed systems made of modular robots. While mutable state is not supported by Meld, Meld performs state management on the persistent facts by keeping a consistent database of facts whenever there is a change in the starting facts. If an axiom is no longer true, everything derived from that axiom is retracted. Likewise, when a fact becomes true, the database is immediately updated to take the new logical fact into account. To take advantage of these state management facilities, Meld supports sensing and action facts. Sensing facts are derived from the state of the world (e.g., temperature, new neighbor node) and action facts have an effect on the world (e.g., movement, light emission).

Meld was inspired by the P2 system [LCG+06], which includes a logic programming language called NDlog [Loo06] for writing network algorithms declaratively. Many ideas about state management were already present in NDlog. NDlog is essentially a Datalog based language with a few extensions for declarative networking.

2.2.3 Data-Centric Languages

Recently, there has been increasing interest in declarative data-centric languages. MapReduce [DG08], for instance, is a popular data-centric programming model that is optimized for large clusters. The scheduling and data sharing model is simple: in the map phase, data is transformed at each node and the result reduced to a final result in the reduce phase. In order to facilitate the writing of programs over large datasets, SQL-like languages such as PigLatin [ORS+08] have been developed. PigLatin builds on top of MapReduce and allows the programmer to write complex data-flow graphs, raising the abstraction and ease of programmability of MapReduce programs. An alternative to PigLatin/MapReduce is Dryad [IBY+07], which allows programmers to design arbitrary computation patterns using DAG abstractions. It combines computational vertices with communication channels (edges) that are automatically scheduled to run on multiple computers/cores.

LM, the language presented in this thesis, also follows the data-centric approach since it is designed for writing programs over graph data structures. In Section 3.6, we discuss data-centric languages in more detail and present some other programming systems for data-centric parallel computation and how they compare to LM.
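The MapReduce scheduling and data sharing model can be illustrated with the canonical word-count example, here sketched sequentially in Python; on a real cluster, the map and reduce phases would run distributed over many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: turn each input record into a list of (key, value) pairs.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Reduce: combine all values that share the same key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(pairs))  # {'the': 3, 'quick': 1, 'brown': 1, ...}
```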

2.2.4 Granularity Problem

In order for implicit parallelism to be effective in practice, its implementation must know which computations should be done in parallel and which computations are better performed sequentially due to the overhead of task synchronization. This is commonly known as the granularity problem and it is a crucial issue for good parallel performance. If the implementation uses coarse grained tasks, then this may lead to poor load balancing and thus poor scalability, since the tasks take too long. On the other hand, if fine grained tasks are used, then scalability will suffer due to the overhead of synchronization.

In functional programming, every expression can potentially be computed in parallel. If such parallelism is explored eagerly, it will rapidly lead to an enormous overhead due to the amount of fine grained parallelism. The same problem arises in logic programming [TZ93], where multiple subgoals can be proven in parallel at each point in the program, generating large proof trees. It is then fundamental that declarative programming languages target the right level of granularity in order to achieve good scalability and performance.

Several solutions have been proposed and implemented to tackle the granularity problem. As seen in the previous sections, some languages provide explicit keywords that allow the programmer to give hints about which parts of the program can be parallelized. A more general solution has been devised by Acar et al. [ACR11] with the Parallel Algorithm Scheduling Library (PASL), a library that attempts to provide a general solution to task scheduling and granularity control by providing a combination of user-provided asymptotic cost functions and execution time profiling. PASL comes with a novel work stealing algorithm [ACR13] based on private deques that further reduces the overhead of task creation and task sharing. LM also addresses the granularity problem, by modeling the application as a graph of nodes where nodes can compute independently of other nodes, allowing nodes, which represent tasks, to be grouped together into subgraphs to provide the right level of granularity.
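The granularity trade-off can be made concrete with a small Python sketch: instead of submitting one task per item (fine grained), items are grouped into chunks (coarser tasks), which is loosely analogous to how LM groups nodes into subgraphs. The chunk size is the tuning knob.

```python
from concurrent.futures import ProcessPoolExecutor

def work(x):
    return x * x

def run_chunk(chunk):
    # One coarse-grained task: the scheduling overhead is amortized
    # over many units of work.
    return [work(x) for x in chunk]

if __name__ == "__main__":
    items = list(range(100_000))
    # Too small: per-task overhead dominates. Too large: poor load
    # balancing, since a few big tasks cannot be split across cores.
    chunk_size = 25_000
    chunks = [items[i:i + chunk_size]
              for i in range(0, len(items), chunk_size)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = [r for part in pool.map(run_chunk, chunks) for r in part]
    print(len(results))  # 100000
```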

2.3 Coordination

Although declarative languages give limited opportunities to coordinate programs, there are other programming languages that directly support different kinds of coordination [PA98]. Most of the following languages cannot be easily classified as either imperative or declarative languages, but tend to be a mix of both. Coordination programming languages tend to divide execution into two parts: computation, where the actual computation is performed, and coordination, which deals with communication and cooperation between processing units. This paradigm attempts to clearly distinguish between these two parts by providing abstractions for coordination as a way to provide architecture- and system-independent forms of communication.

We can identify two main types of coordination models:

Data-Driven: In a data-driven model, the state of the computation depends on both the data being received or transmitted by the available processes and the current configuration of the coordinated processes. The coordinated process is not only responsible for reading and manipulating the data but is also responsible for coordinating itself and/or other processes.

Each process must intermix the coordination directives provided by the coordination model with the computation code. While these directives have a clear interface, it is the programmer's responsibility to use them correctly.

Task-Driven: In this model, the coordination code is more cleanly separated from the computation code. While in data-driven models the content of the data exchanged by the processes will affect how the processes coordinate with each other, in a task-driven model the process behavior depends only on the coordination patterns that are set up beforehand. This means that the computation component is defined as a black box and there are clearly defined interfaces for input/output. These interfaces are usually defined as a full-fledged coordination language and not as simple directives present in the data-driven models.

Linda [ACG86] is probably the most well-known coordination model. Linda implements a data-driven coordination model and features a tuple space that can be manipulated using the following coordination directives: out(t) writes a tuple t into the tuple space; in(t) removes a tuple matching the template t from the tuple space; rd(t) retrieves a copy of a tuple matching t from the tuple space; and eval(p) puts a process p in the tuple space and executes it in parallel. Linda processes do not need to know the identity of other processes because processes only communicate through the tuple space. Linda can be implemented on top of many popular languages by simply creating a communication and storage mechanism for the tuple space and then adding the directives as a language library. A sketch of these directives appears at the end of this section.

Another early coordination language is Delirium [LS90]. Unlike Linda, which is embedded into another language, Delirium actually embeds operators written in other languages inside the Delirium language. The advantages of Delirium are improved abstraction and easier debugging because sequential operators are isolated from the coordination language.

The original Meld [ARLG+09] can also be seen as a kind of data-driven coordination language. The important distinction is that in Meld there are no explicit coordination directives. When Meld rules are activated they may derive facts that need to be sent to a neighboring robot. In turn, this will activate computation on the neighbor. Robot communication is implemented by localizing the program rules and then by creating communication rules. The LM language also implements communication rules, but it goes further because some facts, e.g., coordination facts, can change how the processing units schedule computation, which may in turn change the program's final results or execution time. This results in a more complete inter-play between coordination code and data.

There are also several important programming models, which are not programming languages in their own right, but provide ways for a programmer to specify how computation should be coordinated. One example is the Galois [PNK+11] system, which is a graph-based programming model that applies operators over the nodes of a graph. Operator activities may be scheduled through runtime coordination, where activities are ordered in a specific fashion. In the context of Galois, Nguyen et al. [NP11] expanded the concept of runtime coordination with the introduction of an approach to specify scheduling algorithms. The scheduling language specifies three base schedulers that can be composed and synthesized without requiring users to write complex concurrent code.
Another language that builds on top of Galois is Elixir [PMP12], which allows the user to specify how operator application should be scheduled from a high-level specification that is then compiled into low-level code. In Section 7.9, we delve more deeply into these and other systems.


2.4 Provability

Many techniques and formal systems have been devised to help reason about parallel programs. One such example is the Owicki-Gries [OG76] deductive system for proving properties about imperative parallel programs (deadlock detection, termination, etc.). It extends Hoare logic [Hoa69] with a stronger set of axioms covering parallel execution, critical sections and auxiliary variables. The system can be successfully used on small imperative programs, although using it on languages such as C is difficult, since these do not restrict the use of shared variables.

Some formal systems do not build on top of a known programming language, but instead create an entirely new formal system for describing concurrent systems. Process calculi such as the π-calculus [Mil99] are a good example of this. The π-calculus describes the interactions between processes through the use of channels for communication. Interestingly, channels can also be transmitted as messages, allowing for changes in the network of processes. The π-calculus is powerful enough to prove that two processes behave the same through the use of bisimulation equivalence.

Mobile UNITY [RM97] is a proof logic designed for applications with mobility of processes. The basic UNITY model [Mis89] assumes that statements can be executed nondeterministically in order to create parallelism. Mobile UNITY extends UNITY by adding locations to processes and removing the nondeterministic aspect from local processes. Processes can then communicate or move between locations.

The Meld language, as a logic programming language, has been used to produce proofs of correctness. A Meld program is amenable to mechanized analysis via theorem checkers such as Twelf [PS99], a logic system designed for analyzing program logics and logic program implementations. For instance, a meta-module based shape planner program was proven to be correct under the assumption that actions derived by the program are always successfully applied in the outside world [DARR+08, ARLG+09]. While the fault tolerance aspect is lax, the planner will always reach the target shape in finite time.

In this thesis, we limit ourselves to informal correctness proofs for LM programs. Unfortunately, there are no available meta-reasoning tools for linear logic based programs, and proof mechanization is beyond the scope of this work.

2.5 Chapter Summary

In this chapter, we presented the background concepts of this thesis, including parallel programming, declarative programming, coordination, and provability. In the next chapters, we delve more deeply into some of these concepts; each chapter includes a separate section about recent work related to the material it covers.

Chapter 3

Linear Meld: The Base Language

Linear Meld (LM) is a forward chaining logic programming language where a program is defined by a database of facts and by a set of derivation rules. LM supports structured manipulation of mutable state through the use of linear logic [Gir87] by supporting two kinds of logical facts, namely, persistent facts (which cannot be retracted) and linear facts (which can be retracted).

LM programs are evaluated as follows. Initially, we populate the database with the program's initial facts and then determine which derivation rules can be applied using the facts from the database. Once a rule is applied, we derive new facts, which are then added to the database. If a rule uses linear facts, they are consumed and thus deleted from the database. The program stops when we reach quiescence, that is, when we can no longer obtain new facts through rule derivation.

LM views the database of facts as a graph data structure where each node contains the facts belonging to that node. The database of facts can also be seen as a collection of node objects, where each node contains several attributes, represented as facts, that describe it. Derivation rules are restricted so that they are only allowed to manipulate facts belonging to the same node. This restriction allows nodes of the graph to compute independently of other nodes, since they can only perform computation with their current attributes. Communication between nodes arises when a rule derives a fact that belongs to another node. These design decisions allow LM to be concurrent and also to avoid the granularity problem that is usually present in declarative languages: since computation is performed locally at each node of the graph, it is possible to group nodes into computation units that can be processed independently by the same processing thread, which allows the implementation to easily control the granularity of the program.

3.1 A Taste Of LM

In order to understand how LM programs are written, we now present and discuss three LM programs.

3.1.1 First Example: Message Routing

Figure 3.1 shows the first LM program, a message routing program that simulates message transmission through a network of nodes. Lines 1-3 declare the predicates used in the program's rules. Note that the first argument of every predicate must be typed as node, because the first argument indicates where the fact lives in the graph. Predicate edge is a persistent predicate, while message and processed are linear predicates. Persistent predicates model facts that are never retracted from the database, while linear predicates model linear facts, which are retracted when used in rules. To improve the readability of LM rules, persistent predicates are preceded with a ! symbol. Predicate edge represents the connections between nodes, predicate message contains the message content and the route list, and predicate processed keeps count of the number of messages routed at each node. Along with the type, a predicate argument can also be named (e.g., see Neighbor and Content) for documentation purposes.

1 type edge(node, node Neighbor).  // Predicate declaration
2 type linear message(node, string Content, list node Routing).
3 type linear processed(node, int Total).
4
5 message(A, Content, [B | L]),    // Rule 1
6 !edge(A, B),
7 processed(A, N)
8    -o processed(A, N + 1),
9       message(B, Content, L).
10
11 message(A, Content, []),        // Rule 2
12 processed(A, N)
13    -o processed(A, N + 1).

Figure 3.1: LM code for routing messages in a graph.

The message routing program in Fig. 3.1 implements two rules. An LM rule has the form L_1, ..., L_n ⊸ R_1, ..., R_m, where L_1, ..., L_n is the left-hand side (LHS) of the rule and R_1, ..., R_m is the right-hand side (RHS) of the rule. The meaning of a rule is as follows: if facts L_1, ..., L_n are present in the database, then consume all the facts L_i that are linear facts and derive the facts R_1, ..., R_m of the RHS. Note that the LHS of each rule uses only facts from the same node (represented by A in both rules of Fig. 3.1), but the rule's RHS may derive facts that belong to other nodes (the case of B in the first rule) if the variable is instantiated in the LHS.

The first rule (lines 5-9) grabs the head node B of the route list (the third argument of message, using the syntax [B | L] to represent the head and tail of a list) and ensures that a communication edge exists (through !edge(A, B)). If so, the number of processed messages is increased by consuming processed(A, N) and deriving processed(A, N + 1), along with a new message fact at the head node B. When the route list is empty, the message has reached its destination and is simply consumed (second rule, lines 11-13).

The initial facts of the program are presented in Fig. 3.2. The edge facts describe the structure of the graph, where the node in the first argument has a direct connection to the node in the second argument. Node literals are represented using the syntax @N, where N is a non-negative number. We also declare the message that needs to be routed with the content "hello world"

and a route list with the nodes @3 and @4.

!edge(@1, @2). !edge(@2, @3).
!edge(@3, @4). !edge(@1, @3).
processed(@1, 0). processed(@2, 0).
processed(@3, 0). processed(@4, 0).
message(@1, "hello world", [@3, @4]).

Figure 3.2: Initial facts for the message routing program. There is only one message ("hello world") to route through nodes @3 and @4.

In Fig. 3.3 we present an execution trace of the message routing program. The database is represented as a graph structure where the edges represent the edge initial facts. To simplify the figure, we dropped the first argument of each fact since the first argument corresponds to the node where the fact is placed. In Fig. 3.3(a) the database is initialized with the program’s initial facts. Note that the initial message fact is instantiated at node @1. After applying rule 1, we get the database represented in Fig. 3.3(b), where the message has been derived at node @3. After applying rule 1 again, the message is then routed to node @4 (Fig. 3.3(c)) where it will be consumed (Fig. 3.3(d)).

[Figure: database snapshots for the routing trace. (a) Initial database. (b) After applying rule 1 at node @1. (c) After applying rule 1 at node @3. (d) After applying rule 2 at node @4.]

Figure 3.3: An execution trace for the message routing program. The message "hello world" travels from node @1 to node @4.

The attentive reader will wonder how much concurrency is available in this particular routing implementation. For a single message, there is no concurrency because the message follows a pre-determined path as it travels from node to node. However, concurrency arises when the program needs to route multiple messages. The amount of concurrency then depends on how little the messages' routing lists overlap: if messages travel along different paths, then there is more concurrency because more nodes are routing messages at the same time.
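Routing more than one message exposes this concurrency. As a hypothetical extension of the initial facts in Fig. 3.2 (the second message is invented for illustration), the two messages below can be routed by different nodes at the same time once they leave node @1:

message(@1, "hello world", [@3, @4]).
message(@1, "another message", [@2, @3]).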

3.1.2 Second Example: Key/Value Dictionary

1 type left(node, node Child).   // Predicate declaration
2 type right(node, node Child).
3 type linear value(node, int Key, string Value).
4 type linear replace(node, int Key, string Value).
5
6 replace(A, K, New),            // Rule 1: we found our key
7 value(A, K, Old)
8    -o value(A, K, New).
9
10 replace(A, RKey, RValue),     // Rule 2: go left
11 value(A, Key, Value),
12 RKey < Key,
13 !left(A, B)
14    -o value(A, Key, Value),
15       replace(B, RKey, RValue).
16
17 replace(A, RKey, RValue),     // Rule 3: go right
18 value(A, Key, Value),
19 RKey > Key,
20 !right(A, B)
21    -o value(A, Key, Value),
22       replace(B, RKey, RValue).

Figure 3.4: LM program for replacing a key’s value in a BST dictionary.

Our second example, shown in Fig. 3.4, implements the key update operation for a binary search tree (BST) represented as a key/value dictionary. Each LM node represents a binary tree node. We first declare all the predicates in lines 1-4. Predicates left and right represent the child nodes of each BST node. Linear predicate value stores the key/value pair of a node, while the linear predicate replace represents an update operation, where the key in the second argument is to be updated to the value in the third argument.

The algorithm uses three rules for the three possible cases of updating a key's value. The first rule (lines 6-8) performs the update by removing replace(A, K, New) and value(A, K, Old) and deriving value(A, K, New). The second rule (lines 10-15) recursively picks the left branch for the update operation by deleting replace(A, RKey, RValue) and re-deriving it at node B. Similarly, the third rule (lines 17-22) recursively descends the right branch. The derivation of replace facts at node B can also be seen as a form of implicit message passing from A to B, since B is a different node than A and the LHS of each rule can only manipulate facts from the same node.

The initial facts of the program are presented in Fig. 3.5 and describe the initial binary tree configuration, including keys and values, and the replace(@1, 6, "x") fact instantiated at the root node @1 that manifests the intent to change the value of key 6 to "x". Figure 3.6 represents the trace of the algorithm. The program database is partitioned by the seven nodes using the first argument of each fact. In Fig. 3.6(a), we present the database filled with the program's initial facts. Next, we follow the right branch using rule 3, since 6 > 3 (Fig. 3.6(b)). We use the same rule again in Fig. 3.6(c), where we finally reach key 6. Here, we apply rule 1 and value(@7, 6, "g") is updated to value(@7, 6, "x").

!left(@1, @2).
!right(@1, @3).
!left(@2, @4).
!right(@2, @5).
!left(@3, @6).
!right(@3, @7).

value(@1, 3, "a").
value(@2, 1, "b").
value(@3, 5, "c").
value(@4, 0, "d").
value(@5, 2, "e").
value(@6, 4, "f").
value(@7, 6, "g").

// Update key 6 to value "x".
replace(@1, 6, "x").

Figure 3.5: Initial facts for replacing a key’s value in a BST dictionary.

Like the message routing example shown in the previous section, the amount of concurrency present in this example depends on the number of replace facts. If there are many replace operations to perform on different parts of the BST, then there is more concurrency. On the other hand, if only a few BST nodes are being updated, then concurrency is reduced. This is the same behavior one would expect from a parallel implementation of a BST data structure in an imperative language.
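As a hypothetical illustration (the second replace fact below is not part of Fig. 3.5), two update operations that target different subtrees can descend the tree concurrently after leaving the root:

replace(@1, 6, "x").  // follows @1 -> @3 -> @7
replace(@1, 0, "y").  // follows @1 -> @2 -> @4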

3.1.3 Third Example: Graph Visit

Our third example, shown in Fig. 3.7, presents another LM program that, for a given graph of nodes, performs a visit to all nodes reachable from node @1. The first rule of the program (lines 6-9) visits node A for the first time: fact visited(A) is derived and a comprehension construct is used to go through all the edge facts and derive visit(B) at each neighbor node B. The second rule of the program (lines 11-13) applies when the node has already been visited: we keep the visited fact and delete visit. The initial facts shown in Fig. 3.8 use the visit(@1) fact to start the program.

Fig. 3.9 shows a possible execution trace for the visit program. After applying the first rule at node @1, we get the database in Fig. 3.9(b). Note that node @1 is now visited and nodes @2 and @4 now have a visit fact. At this point, we could apply rule 1 either at node @2 or at node @4. For this specific trace, we apply the rule at node @2, resulting in Fig. 3.9(c). Node @4 now has two visit facts, thus we can apply rule 1 followed by rule 2, consuming both visit facts and deriving visited. In addition, we can also apply rule 1 at node @3 to reach the state of Fig. 3.9(d).

The graph visit has the potential for plenty of concurrency. Once the visit starts from the initial node, it expands to other nodes, allowing several nodes to derive rules concurrently. At some point, the amount of concurrency is reduced because more and more nodes have been visited.

[Figure: database snapshots for the BST update trace. (a) Initial database. (b) After applying rule 3 at node @1. (c) After applying rule 3 at node @3. (d) After applying rule 1 at node @7.]

Figure 3.6: An execution trace for the binary tree dictionary algorithm. The first argument of each fact was dropped and the address of the node was placed beside it.

1 type edge(node, node).   // Predicate declaration
2 type linear visit(node).
3 type linear unvisited(node).
4 type linear visited(node).
5
6 visit(A),                 // Rule 1: visit node
7 unvisited(A) -o
8    visited(A),
9    {B | !edge(A, B) -o visit(B)}.
10
11 visit(A),                // Rule 2: node already visited
12 visited(A)
13    -o visited(A).

Figure 3.7: LM code for the graph visit program.

The level of concurrency depends on the structure of the graph. In the worst case, if the graph is a chain of nodes, then the program becomes effectively sequential. In the best case, if the graph is densely connected (each node connects to most nodes in the graph), then it is possible to run the program on most nodes at the same time.

!edge(@1, @2).
!edge(@1, @4).
!edge(@2, @3).
!edge(@2, @4).
!edge(@2, @1).
!edge(@3, @2).
!edge(@4, @1).
!edge(@4, @2).

unvisited(@1).
unvisited(@2).
unvisited(@3).
unvisited(@4).

visit(@1).

Figure 3.8: Initial facts for the graph visit program. Nodes reachable from node @1 will be marked as visited.

The goal of introducing higher-level declarative languages is to facilitate reasoning about the properties of a program. In the case of the visit program, the most important property to prove is that, if the graph is connected, then all the nodes will become visited, regardless of the order in which we apply the rules. First, we define a visit graph:

Definition 3.1.1 (Visit graph) A visit graph is an ordered pair G = (N,E) comprising a set N of nodes together with a set E of edges. Set E contains pairs (A, B) that correspond to facts !edge(A, B). For every pair (A, B) ∈ E there is also a pair (B,A) ∈ E, representing an undirected edge.

To prove the correctness property of the program, we first define the main invariant of the program:

Invariant 3.1.1 (Node state) A node is either visited or unvisited but not both.

Proof. From the initial facts, we check that all nodes start as unvisited. This is clearly true for the previous example. Rule 1 changes a node from unvisited to visited, while rule 2 keeps the node visited, proving the invariant.

Invariants are conditions that never change, no matter which rules are derived during execution. With this invariant, it is now possible to classify the nodes of graph G according to their state:

[Figure: database snapshots for the visit trace. (a) Initial database. (b) After applying rule 1 at node @1. (c) After applying rule 1 at node @2. (d) After applying rules 1 and 2 at node @4 and rule 1 at node @3.]

Figure 3.9: A possible execution trace for the visit program. Note that the edge facts were omitted for simplicity.

Definition 3.1.2 (Node sets) visited nodes are contained in set V , while unvisited nodes are in set U. From the node state invariant, we know that V ∪ U = N and V ∩ U = ∅.

We can now prove an important invariant about sets V and U:

Invariant 3.1.2 (Visited set) After each rule derivation, the visited set V increases or stays the same size, while the unvisited set U decreases or stays the same size.

Proof. Initially, V = ∅ and U = N. Rule 1 moves a node from U to V, increasing the size of V by 1 and decreasing the size of U by 1. With rule 2, set membership remains unchanged.

Since set membership only changes from U to V, we can now prove the following:

Lemma 3.1.1 (Edge visits) The program generates at most one visit fact per directed edge: when a node a ∈ N receives its first visit fact, exactly one visit fact is generated at every b ∈ N with (a, b) ∈ E.

Proof. From the visited set invariant, we know that once a node becomes a member of set V, it never returns to set U; therefore rule 1 applies at most once per node. This rule generates one visit fact per neighbor node.

In order to prove that all the nodes in the graph are visited, we need to make sure that the graph is connected.

Definition 3.1.3 (Connected graph) A connected graph is a graph where every pair of nodes has a path between them.

Finally, we prove that all nodes will become visited.

Theorem 3.1.1 (Graph visit correctness) If graph G is connected, set V will eventually include all nodes in N, while U = ∅.

Proof. Proof by induction.

• Base case: the initial fact visit(@1) adds node @1 to V. By Lemma 3.1.1, a visit fact is generated for all edges of @1.
• Inductive case: assume a visited set V′ and an unvisited set U′. Since the graph is connected, there must be a node a ∈ V′ that is connected to a node b ∈ U′. By the edge visits lemma, a visit(b) fact is generated, moving b from U′ to V′.

Eventually, set V will include all nodes in N. Otherwise, there would be unreachable nodes in the graph, contradicting the assumption that the graph is connected.

3.2 Types and Locality

Each fact is an association between a predicate and a tuple of values. A predicate is a pair with a name and a tuple of types (the argument types). LM rules are type-checked using the predicate declarations in the header of the program. LM has a simple type system that includes the following basic types: node, int, float, string, bool. The following structured types are also supported: list X, for lists of type X; struct X1, ..., Xn, for composite values made of n elements; and array X, for arrays of type X.

LM allows the definition of new type names from simpler types using the declaration type simple-type new-type in the header of the program. The type new-type can then be used as any other type. Note that LM uses structural equivalence to check if two types are the same; therefore simple-type and new-type are type equivalent.

Type checking LM programs is straightforward due to the simple type system and the mandatory predicate declarations. For each rule, the variables found in the LHS are mapped to types based on their use in atomic proposition arguments. Some constraints of the form X = expression, which force an equality between X and expression, may actually represent an assignment if X is not defined by any LHS atomic proposition. In this case, all the variables in expression must be typed and X is assigned the value of expression at run time. Any variable used in the RHS of the rule must be defined in the LHS, because otherwise derived facts would not be grounded, that is, some arguments would be undefined or uncomputable. For comprehensions (and aggregates), type checking is identical; however, the LHS of each construct must explicitly declare the variables in its scope.

Another important component of type checking is locality checking. The first argument of each atomic proposition in the LHS must use the same variable in order to enforce locality and allow concurrency. This home variable is always typed as a node and represents a node in the graph. In the rule's RHS, other home variables are allowed, as long as they have been defined in the LHS. For comprehensions, the LHS must use the same home argument as the rule's LHS.
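The following sketch illustrates both the type alias and the assignment constraint; the predicates total and doubled are invented for this illustration and do not appear elsewhere in this document:

type int quantity.                 // quantity is structurally equivalent to int
type linear total(node, quantity).
type linear doubled(node, int).

total(A, T),
D = T * 2                          // assignment: D is not bound by any LHS proposition
-o doubled(A, D).                  // D is grounded, so the derived fact is well-defined

Because LM uses structural equivalence, the quantity argument matched by T may be used as a plain int in the expression T * 2.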

3.3 Operational Semantics

Nodes are selected for computation non-deterministically (i.e., any node can be picked to run). This means that the programmer cannot expect that facts coming from different nodes will be considered as a whole, since the process is non-deterministic. The operational semantics promises that rule derivations are performed atomically; therefore, if a rule derives many facts belonging to a node, then that node will receive them all at once. Under these restrictions, computation can be parallelized by processing nodes concurrently.

Each rule in LM has a defined priority that is inferred from its position in the source file: rules at the beginning of the file have higher priority. At the node level, we consider all the new facts that have not been considered yet to create a queue of candidate rules. The queue of candidate rules is then applied (in priority order) and updated as new facts are derived or consumed. As an example, consider the following three rules:

f(A), g(A) -o f(A).

h(A) -o g(A).

g(A) -o 1.

If the database contains the facts h(@1) and f(@1), then the second rule is applied, retracting h(@1) and deriving g(@1). Next, the first and third rules are candidate rules, but since the first rule has higher priority, it is applied first, resulting in a database with a single fact f(@1), where no rule can be applied. Later, in Section 5.3, we give more details about how our implementation manages the set of candidate rules.

3.4 LM Abstract Syntax

Previously, we presented the LM syntax and usage through three example programs. We now delve more deeply into LM by presenting the underlying abstract syntax of the language in Table 3.1. An LM program Prog consists of a list of derivation rules Σ and a database D. A database fact l(t̂) is an association between a predicate l and a list of literals t̂. Literals t can be a number number(N), a node node(N), a list list(t̂), among other literals such as strings, arrays, booleans, etc.

Each derivation rule R can be written as LHS ⊸ RHS, with the meaning described in Section 3.1.1. Rules without an LHS are called initial facts. All the variables in the rule's scope must be introduced explicitly in the abstract syntax using the ∀x.R production. However, when the programmer writes LM programs, those variables are introduced implicitly. A rule can also use a selector of the form [ S ⇒ y; LHS ] ⊸ RHS, which allows the programmer to force a specific ordering during rule derivation. Selectors are described in more detail in Section 3.4.1.

Program                   Prog ::= Σ, D
List Of Rules             Σ    ::= · | Σ, R
Database                  D    ::= Γ; ∆
Known Linear Facts        ∆    ::= · | ∆, l(t̂)
Known Persistent Facts    Γ    ::= · | Γ, !p(t̂)
Literal List              t̂   ::= · | t, t̂
Literal                   t    ::= number(N) | node(N) | list(t̂) | ...
Rule                      R    ::= LHS ⊸ RHS | ∀x.R | [ S ⇒ y; LHS ] ⊸ RHS
LHS Expression            LHS  ::= L | P | C | LHS, LHS | ∃x.LHS | 1
Selector Operation        S    ::= asc | desc | random
RHS Expression            RHS  ::= L | P | C | RHS, RHS | EE | CE | AE | 1
Linear Atomic Proposition L    ::= l(x̂)
Persistent Atomic Prop.   P    ::= !p(x̂)
Term List                 x̂   ::= · | x, x̂ | t, x̂
Constraint                C    ::= e O e
Expression                e    ::= x | t | fun(ê) | e M e | e O e
Expression List           ê   ::= · | e, ê
Math Operation            M    ::= + | × | / | − | %
Boolean Operation         O    ::= = | <> | > | ≥ | < | ≤
Exists Expression         EE   ::= exists x̂. SRHS
Comprehension             CE   ::= { x̂ | SLHS ⊸ SRHS }
Aggregate                 AE   ::= [ A ⇒ y; x̂ | SLHS ⊸ SRHS1 ; SRHS2 ]
Aggregate Operation       A    ::= min | max | sum | count | collect
Sub-LHS                   SLHS ::= L | P | SLHS, SLHS | ∃x.SLHS
Sub-RHS                   SRHS ::= L | P | SRHS, SRHS | 1

Table 3.1: Core abstract syntax of LM.

The LHS of a rule may contain linear (L) and persistent (P) atomic propositions and constraints (C). Atomic propositions are template facts that instantiate variables from facts in the database (see the example in line 11 of Fig. 3.7). Variables can be used again in the LHS for matching and also in the RHS when instantiating facts. Constraints C are boolean expressions that must be true in order for the rule to be derived. Each constraint starts with a boolean operation e O e, where each expression e may be a literal, a variable, a function call fun(ê) or a mathematical operation e M e.

The RHS of a rule contains linear (L) and persistent (P) atomic propositions, which are uninstantiated facts. The RHS can also have exists expressions (EE), comprehensions (CE) and aggregates (AE). All those expressions may use all the variables instantiated in the rule's LHS and are explained in Section 3.4.2. To introduce variables in the scope of the RHS, it is possible to use the ∃x.LHS production, which can be used for sub-computations when instantiating the atomic propositions of the RHS. This production is heavily used by the compiler to move variables that are defined in the rule's LHS but only used in the RHS; however, it is still possible for the programmer to define the RHS's variables explicitly using an equality constraint of the form x = e (represented by C in the syntax).

In order to understand how LM rules are translated into the abstract syntax, consider again the two rules in the graph visit program shown in Fig. 3.7:

visit(A), unvisited(A) -o visited(A), {B | !edge(A, B) -o visit(B)}.

visit(A), visited(A) -o visited(A).

First, we have to de-sugar the code and introduce the variable A, which is not explicitly quantified, as follows:

∀A. visit(A), unvisited(A) ⊸ visited(A), { B | !edge(A, B) ⊸ visit(B) }   (3.1)
∀A. visit(A), visited(A) ⊸ visited(A)   (3.2)

The initial facts are translated as rules where the LHS is 1:

1 ⊸ !edge(@1, @2)   (3.3)
1 ⊸ !edge(@2, @3)   (3.4)
1 ⊸ !edge(@1, @4)   (3.5)
1 ⊸ !edge(@2, @4)   (3.6)
1 ⊸ unvisited(@1)   (3.7)
1 ⊸ unvisited(@2)   (3.8)
1 ⊸ unvisited(@3)   (3.9)
1 ⊸ unvisited(@4)   (3.10)
1 ⊸ visit(@1)   (3.11)

3.4.1 Selectors

During rule derivation, the facts to be used in the LHS of the rule are picked non-deterministically. While our system uses an implementation-dependent order for efficiency reasons, sometimes it is important to sort facts by one of the arguments. The abstract syntax for this construct is [ S ⇒ y; LHS ] ⊸ RHS, where S is the selector operation and y is the variable in LHS that represents the value to be sorted according to S. An example using the concrete syntax is as follows:

[asc => W | !edge(A, B, W), select(A)] -o best-neighbor(A, B, W).

In this case, we sort the edge facts by W in ascending order and then try to match them. The other available operations are desc and random (to force no pre-defined order at the implementation level).

3.4.2 Exists Expressions

Exists constructs (EE) are based on the linear logic construct of the same name and are used to create fresh node addresses. We can use the new node address to instantiate new facts. As an example, consider extending the key/value dictionary example described in Fig. 3.4 with an insertion operation for a node that has no left branch:

insert(A, IKey, IValue),
value(A, Key, Value),
no-left-branch(A),
IKey < Key
-o value(A, Key, Value),
   exists B. (value(B, IKey, IValue), !left(A, B)).

The exists construct creates a new node B containing the linear fact value(B, IKey, IValue) (the newly inserted key/value pair), while the persistent fact !left(A, B) that connects A to B is added to node A.
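A symmetric rule would handle insertion when the right branch is missing; this is a sketch written by analogy with the rule above, assuming an analogous no-right-branch predicate (invented here for illustration):

insert(A, IKey, IValue),
value(A, Key, Value),
no-right-branch(A),
IKey > Key
-o value(A, Key, Value),
   exists B. (value(B, IKey, IValue), !right(A, B)).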

3.4.3 Comprehensions

When consuming a linear fact, we might want to generate several new facts depending on the contents of the database. To this end, we use comprehensions, which are sub-rules that are applied with all possible combinations of facts from the database. In a comprehension { x̂ | SLHS ⊸ SRHS }, x̂ is a list of variables in the scope of SLHS and SRHS, where SLHS is the comprehension's left-hand side and SRHS is the comprehension's right-hand side. SLHS is used to generate all possible combinations for SRHS, according to the list of variables x̂ and to the facts in the database. We have already seen an example of a comprehension in the graph visit program (Fig. 3.7, line 9):

visit(A), unvisited(A) -o
   visited(A),
   {B | !edge(A, B) -o visit(B)}.

In this example, the comprehension matches !edge(A, B) using all the combinations available in the database for node B and, for each combination, derives visit(B).

3.4.4 Aggregates

Another useful feature is the ability to reduce several facts into a single fact. LM features aggregates (AE), a special kind of sub-rule that works somewhat like comprehensions. In the abstract syntax [ A ⇒ y; x̂ | SLHS ⊸ SRHS1 ; SRHS2 ], A is the aggregate operation, x̂ is the list of variables introduced in SLHS, SRHS1 and SRHS2, and y is the variable in SLHS that represents the values to be aggregated using A. Like comprehensions, we use x̂ to try all the combinations of SLHS, but, in addition to deriving SRHS1 for each combination, we aggregate the values represented by y into a new y variable that is used to derive SRHS2.

To understand how aggregates work, let's consider the following rule:

count-neighbors(A) -o
   [count => T; B | !edge(A, B) -o 1 -> num-neighbors(A, T)].

The rule uses a count-neighbors proposition to iterate over all !edge facts (the SLHS) of node A. Since the SRHS1 of the aggregate is 1, nothing is derived for each !edge. Since we use a count aggregate, for each !edge fact, the variable T is incremented by one and the total result is used to derive a single num-neighbors(A, T) fact (the SRHS2).

To further understand aggregates, let's consider a rule from the PageRank program to be presented in Section 3.5.2:

update(A), !numInbound(A, T) -o
   [sum => V; B, Val, Iter |
      neighbor-pagerank(A, B, Val, Iter), V = Val/float(T) -o
      neighbor-pagerank(A, B, Val, Iter) -> sum-ranks(A, V)].

The rule aggregates the values V by iterating over neighbor-pagerank(A, B, Val, Iter) (the SLHS) and by re-deriving the fact neighbor-pagerank(A, B, Val, Iter) (the SRHS1). Once all values are inspected, the atomic proposition sum-ranks(A, V) present in SRHS2 is derived once, with V representing the sum of all the neighbor values Val/float(T). LM provides several aggregate operations, including min (minimum value), max (maximum value), sum, count (count combinations) and collect (collect items into a list).
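As one more sketch, following the structure of the sum rule above, a min aggregate could select the smallest distance among a node's neighbors; the predicates elect, neighbor-distance and min-distance are invented for this example:

elect(A) -o
   [min => D; B, Dist | neighbor-distance(A, B, Dist), D = Dist -o
      neighbor-distance(A, B, Dist) -> min-distance(A, D)].

As in the sum example, each neighbor-distance fact is re-derived so that the database is left unchanged, and min-distance(A, D) is derived once with D bound to the minimum.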

3.4.5 Directives

LM also supports a small set of extra-logical directives that are not represented by the abstract syntax but may be used by the programmer to change the compilation and runtime behavior of the program. The full list is presented in Appendix D.

3.5 Applications

In this section, we present more LM solutions to well-known problems. We start with straightforward graph-based problems, such as bipartiteness checking and the PageRank program. Next, we present a version of the Quick-Sort algorithm which, at first sight, might not be expected to fit well under the programming paradigm offered by LM. Informal correctness and termination proofs are also included to further show that it is easy to reason about LM programs.

3.5.1 Bipartiteness Checking

The problem of checking if a graph is bipartite can be seen as a 2-color graph coloring problem. The code for this algorithm is shown in Fig. 3.10. The code declares five predicates, namely: edge, to specify the structure of the graph; uncolored, to mark nodes as uncolored; colored, to mark nodes as colored with the node's color; fail, to mark an invalid bipartite graph; and visit, to perform the coloring of the graph. Initially, all nodes in the graph start as uncolored, because they do not have a color yet. The initial fact visit(@1, 1) is instantiated at node @1 (line 19) in order to start the coloring process by assigning it color 1.

If a node is uncolored and needs to be marked with a color P, then the rule in lines 9-11 is applied. We consume the uncolored fact and derive a colored(A, P) fact to effectively color the node with P. We also derive visit(B, next(P)) at the neighbor nodes to color them with the other color. The coloring can fail if a node is already colored with a color P and needs to be colored with a different color (line 15) or if it has already failed (line 17).

In order to show that the code in Fig. 3.10 works as intended, we first set up some invariants that hold throughout the execution of the program. Assume that the set of nodes in the graph is represented as N.

Invariant 3.5.1 (Node state) The set of nodes N is partitioned into 4 different sets that represent the 4 possible states that a node can be in, namely:

• U (uncolored nodes)
• F (fail nodes)
• C1 (colored(A, 1) nodes)
• C2 (colored(A, 2) nodes)

Proof. Initially, all nodes start in set U (line 20 of Fig. 3.10). Each of the four rules of the program either keeps a node in the same set or exchanges it with another set.

1 type edge(node, node).   // Predicate declaration
2 type linear uncolored(node).
3 type linear colored(node, int).
4 type linear fail(node).
5 type linear visit(node, int).
6
7 fun next(int X) : int = if X <> 1 then 1 else 2 end.   // Function declaration
8
9 visit(A, P), uncolored(A)   // Rule 1: coloring a node
10    -o {B | !edge(A, B) -o visit(B, next(P))},
11       colored(A, P).
12
13 visit(A, P), colored(A, P) -o colored(A, P).   // Rule 2: node is already colored
14
15 visit(A, P1), colored(A, P2), P1 <> P2 -o fail(A).   // Rule 3: graph is not bipartite
16
17 visit(A, P), fail(A) -o fail(A).   // Rule 4: graph is still not bipartite
18
19 visit(@1, 1).   // Initial facts
20 uncolored(A).

Figure 3.10: Bipartiteness Checking program.
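As a concrete illustration, the following hypothetical edge facts (not part of the program in Fig. 3.10) describe a 4-node cycle, which is bipartite; running the program colors @1 and @3 with color 1 and @2 and @4 with color 2:

!edge(@1, @2). !edge(@2, @1).
!edge(@2, @3). !edge(@3, @2).
!edge(@3, @4). !edge(@4, @3).
!edge(@4, @1). !edge(@1, @4).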

A bipartite graph is one where, for every edge (a, b), there is a valid assignment that makes node a a member of set C1 or C2 and node b a member of C2 or C1, respectively.

Variant 3.5.1 (Bipartiteness Convergence) We now reason about the application of the program rules. After each application of an inference rule, one of the following events will happen:

1. Set U will decrease and set C1 or C2 will increase, with a potential increase in the number of visit facts.
2. Set C1 or C2 will stay the same, while the number of visit facts will be reduced.
3. Set C1 or C2 will decrease and set F will increase, while the number of visit facts will be reduced.
4. Set F will stay the same, while the number of visit facts decreases.

Proof. Trivially from the rules.

From this variant, it can be inferred that set U never increases in size and that the database of facts may only grow when a node transitions from uncolored to colored. For every other rule application, the database of facts always decreases. This also means that the program will eventually terminate, since it is limited by the number of visit facts that can be generated.

Theorem 3.5.1 (Bipartiteness Correctness) If the graph is connected and bipartite, then the nodes will be partitioned into sets C1 and C2, while sets F and U will be empty.

Proof. By induction, we prove that uncolored nodes become part of either C1 or C2 and that, if there is an edge between nodes in the two sets, then they have different colors.

In the base case, we start with empty sets but node @1 is made a member of C1. Rule 1 sends visit facts to the neighbors of @1, forcing them to become members of C2.

In the inductive case, we have sets C1′ and C2′ with some nodes already colored. From Variant 3.5.1, we know that U always decreases. Since the graph is bipartite, events 3 and 4 never happen, because there is a possible partitioning of the nodes. With event 1, we have set C1′ = C1 ∪ {n} (or the corresponding update to C2′), where n is the new colored node. With event 2, the sets remain the same. Since the graph is connected, every node will be colored; therefore event 1 will happen for every node of the graph.

Theorem 3.5.2 (Bipartiteness Failure) If the graph is connected but not bipartite then some nodes will be part of set F .

Proof. Assume that the algorithm completely partitions the nodes into sets C1 and C2, leaving F empty. Since the graph is connected, we know that the algorithm tries to build a valid partitioning represented by C1 and C2. This is a contradiction because the graph is not bipartite (by definition), and thus at least one node will become part of set F through rule 3.

3.5.2 Synchronous PageRank

PageRank [Pag01] is a well-known graph algorithm used to compute the relative relevance of web pages. The standard formulation of the PageRank algorithm uses three n × n matrices (where n is the number of pages):

• An adjacency matrix A, where A_ij is 1 if page i has an outgoing link to page j;

• A transition matrix P, where P_ij = A_ij / deg(i) and deg(i) is the number of outgoing links of page i;
• A 'Google matrix' G, defined as G = αP + (1 − α)I, where α is the damping factor and I is an n × n matrix where every I_ij is 1. The damping factor is the probability of a user jumping to a random page instead of following the links of the current page.

The PageRank vector x of size n × 1 is the solution to the following linear system:

x = Gx (3.12)

To solve this problem, it is possible to use an iterative method: start with an initial vector x(0), where all pages have the same value (the values add up to 1), and then perform the following computation:

x(t + 1) = Gx(t)   (3.13)

To compute the PageRank value of a single page, it is possible to use the following formula:

x_i(t + 1) = G_i x(t)   (3.14)

where G_i is the row of the G matrix that corresponds to the inbound links of page i. Note that the full x(t) vector is not required, since most elements of G_i are 0.

The LM code for this iterative computation is shown in Fig. 3.11. The code starts by declaring six predicates, namely: outbound, which specifies the outbound links of a page, where the third argument represents a value in the transition matrix P; numInbound, with the number of inbound links; pagerank, to represent the PageRank of a page; accumulator, to accumulate the incoming neighbors' PageRank values; neighbor-pagerank, to represent an incoming PageRank value; and start, to initialize a node. Since this program is synchronous, the PageRank values of all nodes must be computed before the next PageRank iteration is computed.

We use constant definitions provided by LM (lines 8-10) to refer to constant values used throughout the program. The damping constant is the damping factor α used in the PageRank calculations. The constant iterations reads the number of iterations to execute from the program's input arguments, and pages is assigned to @world, a special constant that evaluates to the number of nodes in the program (in this case, the number of pages).

The initial PageRank, representing x_i(0), is initialized in the first rule (lines 12-13) along with the accumulator. All the initial PageRank values form the initial x(0) vector. The second rule of the program (lines 15-17) propagates a newly computed PageRank value to all neighbors and represents a step in the iterative method for a column of the G matrix. The fact neighbor-pagerank informs the neighbor node about the PageRank value of node A for iteration Iter + 1. In every iteration, each node accumulates all the neighbor-pagerank facts into the accumulator fact (lines 19-21). When all inbound neighbor PageRank values have been accumulated, the fourth rule (lines 23-27) is derived and a PageRank value is generated for iteration Iter.

The synchronous version of the PageRank algorithm has a large amount of concurrency. First, the program starts on all nodes of the graph, which makes the program trivial to parallelize. However, because this is a synchronous algorithm, there is a data dependency between PageRank iterations, i.e., nodes can only compute their next iteration after they receive all the neighbors' PageRank values from the previous iteration. Appendix F.1 further expands this section with the asynchronous version of PageRank and also includes a proof of correctness.
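Reading rules 2-4 of Fig. 3.11 together, each page effectively performs the following per-node update; this is a paraphrase of the code (not a formula from the text), where in(i) denotes the inbound neighbors of page i and W_ji is the weight stored in the corresponding !outbound fact:

x_i(t+1) = \text{damping} + (1 - \text{damping}) \sum_{j \in \text{in}(i)} W_{ji} \, x_j(t)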

3.5.3 Quick-Sort

The Quick-Sort algorithm is a divide and conquer sorting algorithm that works by splitting a list of items into two sublists and then recursively sorting the two sublists. To split a list, it picks a pivot element and puts the items that are smaller than the pivot into the first sublist and the items greater than the pivot into the second sublist.

1 type outbound(node, node, float).   // Predicate declaration
2 type numInbound(node, int).
3 type linear pagerank(node, float Rank, int Iteration).
4 type linear accumulator(node, float Acc, int Left, int Iteration).
5 type linear neighbor-pagerank(node, node Neighbor, float Rank, int Iteration).
6 type linear start(node).
7
8 const damping = 0.85.   // probability of user following a link in the current page
9 const iterations = str2int(@arg1).   // iterations to compute
10 const pages = @world.   // number of pages in the graph.
11
12 start(A), !numInbound(A, T)   // Rule 1: initialize pagerank and accumulator
13    -o accumulator(A, 0.0, T, 1), pagerank(A, 1.0 / float(pages), 0).
14
15 pagerank(A, V, Iter),   // Rule 2: propagate pagerank
16 Iter < iterations
17    -o {B, W | !outbound(A, B, W) -o neighbor-pagerank(B, A, V * W, Iter + 1)}.
18
19 neighbor-pagerank(A, B, V, Iter),   // Rule 3: accumulate neighbor's value
20 accumulator(A, Acc, T, Iter)
21    -o accumulator(A, Acc + V, T - 1, Iter).
22
23 accumulator(A, Acc, 0, Iter),   // Rule 4: generate new pagerank
24 !numInbound(A, T),
25 V = damping + (1.0 - damping) * Acc,
26 Iter <= iterations
27    -o pagerank(A, V, Iter), accumulator(A, 0.0, T, Iter + 1).
28
29 start(A).   // Initial facts

Figure 3.11: Synchronous PageRank program.

The Quick-Sort algorithm is interesting because it does not map immediately to the graph-based model of LM and demonstrates that LM supports applications with dynamic graphs. The LM program starts with a single node where the initial unsorted list is located. The list is then split as usual and two nodes are created that will recursively sort the sublists. Interestingly, this looks similar to a call tree in a functional programming language.

Figure 3.12 presents the code for the Quick-Sort algorithm in LM. The code uses six predicates, described as follows: down represents a list that needs to be sorted; up is the result of sorting a down list; sorted represents a sorted sublist; back connects a node that is sorting a sublist to its parent node; split is used for splitting a list into two sublists using a pivot element; and waitpivot waits for two sorted sublists. For each sublist, we start with a down fact that must eventually be transformed into an up fact with the sublist sorted. In line 36 we start with the initial list at node @0. If the list has a small number of items (two or fewer), then the rules in lines 8-14 sort it immediately; otherwise the rule in line 16 is applied to split the list into two sublists. The split fact first splits the list

using the pivot Pivot (rules in lines 23-27). When there is nothing else to split, the rule in lines 19-21 uses an exists construct to create nodes B and C, and the sublists are then sent to nodes B and C using down facts. back facts are also derived so that the sorted lists can later be sent back to the parent node using the rule in line 34. When the two sublists are sorted, two sorted facts are derived that must be matched against waitpivot in the rule in lines 29-32. The sorted sublists are appended and sent up to the parent node via the derivation of an up fact (line 32).

1 type linear down(node, list int).   // Predicate declaration
2 type linear up(node, list int).
3 type linear sorted(node, node, list int).
4 type linear back(node, node).
5 type linear split(node, int, list int, list int, list int).
6 type linear waitpivot(node, int, node, node).
7
8 down(A, []) -o up(A, []).   // Rule 1: empty list
9
10 down(A, [X]) -o up(A, [X]).   // Rule 2: single element list
11
12 down(A, [X, Y]), X < Y -o up(A, [X, Y]).   // Rule 3: two element list
13
14 down(A, [X, Y]), X >= Y -o up(A, [Y, X]).   // Rule 4: two element list
15
16 down(A, [Pivot | Xs])   // Rule 5: lists with more than two elements
17    -o split(A, Pivot, Xs, [], []).
18
19 split(A, Pivot, [], Smaller, Greater) -o   // Rule 6: create nodes to sort sublists
20    exists B, C. (back(B, A), back(C, A),
21       down(B, Smaller), down(C, Greater), waitpivot(A, Pivot, B, C)).
22
23 split(A, Pivot, [X | Xs], Smaller, Greater), X <= Pivot   // Rule 7: split case 1
24    -o split(A, Pivot, Xs, [X | Smaller], Greater).
25
26 split(A, Pivot, [X | Xs], Smaller, Greater), X > Pivot   // Rule 8: split case 2
27    -o split(A, Pivot, Xs, Smaller, [X | Greater]).
28
29 waitpivot(A, Pivot, NodeSmaller, NodeGreater),   // Rule 9: merge sublists
30 sorted(A, NodeSmaller, Smaller),
31 sorted(A, NodeGreater, Greater)
32    -o up(A, Smaller ++ [Pivot | Greater]).
33
34 up(A, L), back(A, B) -o sorted(B, A, L).   // Rule 10: send list to parent
35
36 down(@0, initial_list).   // Initial facts

Figure 3.12: Quick-Sort program written in LM.

The use of the exists construct allows the programmer to create new nodes where facts can be derived. In the case of Quick-Sort, it allows the program to create a tree of nodes where sorting can take place concurrently.

The amount of concurrency available in the Quick-Sort program depends on the quality of the selected pivot. If the pivot splits the list into equal parts, then there is more concurrency because it is possible to work on the two halves of the list concurrently. If a bad pivot is selected, then we may end up in situations where the pivot is the smallest (or largest) element of the list, splitting the list into an empty list and a list with n − 1 elements. It is clear that the amount of work required to sort the empty list is much smaller than the work required to sort the larger list. Repeatedly choosing a bad pivot will effectively turn Quick-Sort into a sequential algorithm. This is not surprising, since it is directly related to Quick-Sort's average and worst case complexity, O(n log n) and O(n²), respectively.

The proof of correctness for Quick-Sort follows a different style than the proofs done so far. Instead of proving invariants, we prove what happens to the database given the presence of certain logical facts.

Lemma 3.5.1 (Split lemma) If a split(A, Pivot, L, Small, Great) fact exists, then it will be consumed to derive a split(A, Pivot, [], Small′ ++ Small, Great′ ++ Great) fact, where the elements of Small′ are less than or equal to Pivot and the elements of Great′ are greater than Pivot.

Proof. By induction on the size of L.

Theorem 3.5.3 (Sort theorem) If a down(A, L) fact exists, then it will be consumed and an up(A, L′) fact will be derived, where L′ is the sorted version of L.

Proof. By induction on the size of L. The base cases are proven trivially (rules 1-4). In the inductive case, only rule 5 applies:

down(A, [Pivot | Xs]) -o split(A, Pivot, Xs, [], []).

which necessarily derives a split(A, Pivot, Xs, [], []) fact. By applying the split lemma, a split(A, Pivot, [], Smaller, Greater) fact is generated, from which only rule 6 can be used:

split(A, Pivot, [], Smaller, Greater) -o
   exists B, C. (back(B, A), back(C, A),
      down(B, Smaller), down(C, Greater), waitpivot(A, Pivot, B, C)).

which necessarily derives back(B, A), back(C, A), down(B, Smaller), down(C, Greater) and also a waitpivot(A, Pivot, B, C) fact. The semantics of LM ensure that B and C are fresh node addresses; therefore those new facts will be derived at nodes with no other facts. The lists Smaller and Greater are both smaller (in size) than L, so, by the induction hypothesis, an up(B, Smaller′) and an up(C, Greater′) fact are derived. These last two facts will be used in the following rule:

up(A, L), back(A, B) -o sorted(B, A, L).

which generates sorted(A, B, Smaller′) and sorted(A, C, Greater′) facts. In the continuation, there is only one rule that accepts sorted and waitpivot facts:

waitpivot(A, Pivot, NodeSmaller, NodeGreater),
sorted(A, NodeSmaller, Smaller),
sorted(A, NodeGreater, Greater)
   -o up(A, Smaller ++ [Pivot | Greater]).

returning up(A, Smaller′ ++ [Pivot | Greater′]). We know that Smaller′ ++ [Pivot | Greater′] is sorted, since Smaller′ contains the sorted elements less than or equal to Pivot and Greater′ the sorted elements greater than Pivot.

3.6 Related Work

3.6.1 Graph-Based Programming Models

Many programming systems have been designed for writing graph-based programs. Some good examples are the Dryad, Pregel, GraphLab, Ligra, Grace, Galois and SociaLite systems.

The Dryad system [IBY+07] combines computational vertices with communication channels (edges) to form a data-flow graph. The program is scheduled to run on multiple processors or cores and data is partitioned during runtime. Routines that run on computational vertices are sequential, with no synchronization. Dryad is better suited for a wider range of problem domains than LM; however, Dryad is not a language but a framework upon which languages such as LM could be built.

The Pregel system [MAB+10] is also graph based, although programs have a stricter structure: they must be represented as a sequence of iterations, where each iteration is composed of computation and message passing. Pregel is especially suited for large graphs since it aims to scale to large architectures. LM does not impose iteration restrictions on programs; however, it is not as well suited for large graphs as Pregel.

GraphLab [LGK+10] is a C++ framework for developing parallel algorithms. While Pregel uses message passing, GraphLab allows nodes to have read/write access to different scopes through different concurrent access models in order to balance performance and data consistency. While some programs only require access to the local node's data, others may need to update edge information. Each consistency model provides different guarantees that are better adapted to some algorithms. GraphLab also provides different schedulers that dictate the order in which nodes are computed. The downside is that GraphLab has a steep learning curve and programs must be written in C++, requiring a lot of boilerplate code. LM programs tend to be smaller and easier to reason about, since they are written at a higher abstraction level.

Ligra [SB13] is a lightweight framework for large scale graph processing on a single multi core machine. Ligra exploits the fact that most huge graph datasets available today can be made to fit in the main memory of commodity servers. Ligra is a simple framework that exposes two

main interfaces: EdgeMap and VertexMap. The former applies a function to a subset of the edges of the graph, while the latter applies a function to a subset of the vertices. The functions passed as arguments are applied to either a single edge or a single vertex, and the user must ensure that the function can be executed in parallel. The framework allows the use of Compare-and-Swap (CAS) instructions when implementing functions in order to avoid race conditions.

Grace [WXDG13] is another graph-based framework for multi core machines. Unlike Ligra, Grace programs are implemented from the point of view of a vertex. Each vertex and edge can be customized with different data depending on the target application. By default, programs are executed iteratively: in each iteration, the vertex program reads incoming edge messages, performs computation and sends messages to the outbound edges. Since iterative programs require synchronization after each iteration, Grace allows the user to relax these constraints and implement customizable execution policies by writing code that describes which vertices are allowed to run and in which order. The order is dictated by assigning a scheduling priority value.

Galois [PNK+11] is a parallel programming model for irregular applications based on graphs, trees and sets. A Galois parallel algorithm is viewed as a parallel application of an operator over an irregular data structure, which generates activities on the data structure. Such an operator may, for instance, be applied to a node of the graph in order to change its data or change the structure of its neighborhood, allowing for data structure changes. In Galois, active elements are nodes of the graph where computation needs to be performed. Each operator performed on an active element needs to acquire the neighborhood, i.e., the nodes connected to the active element that are involved in the operator's computation. Furthermore, operators are selected for execution according to some specific ordering. From the point of view of the programmer, the active elements are represented in a work-list, while operators can be implemented on top of the work-list's iterators. Galois supports speculative execution by allowing operator rollback when an operator requires a node that is held by another operator.

Ligra, Grace and Galois are not programming languages but frameworks built on top of other programming languages such as C and Java. Naturally, programs written in these frameworks need to take into account several implementation details such as compare-and-swap instructions, work-lists, and scheduling order. LM programs are more abstract, since reasoning is performed around logical rules and logical facts, which are much closer to the problem domain of graph-based programs.

SociaLite [SPSL13] is a Datalog-based language for writing algorithms for social-network analysis. SociaLite takes Datalog programs and compiles them to efficient distributed Java code. The language places some restrictions on rules in order to make distribution possible; however, the programmer is free to tell the system how to shard the database's facts. Like LM, these restrictions deal with the first argument of each fact; however, the first argument is not related to a graph abstraction but is instead related to fact distribution, allowing programmers to optimize how messages are sent between machines.

3.6.2 Sensor Network Programming Languages

Many programming languages have been designed to help developers write programs that run on sensor networks, which are networks that can be represented as graphs. Programming languages such as Hood [WSBC04], TinyDB [MFHH05] or Regiment [NMW07] have been proposed, providing

support for data collection and aggregation over the network. These systems assume that the network remains static and that nodes stay in place. Other languages such as Pleiades [KGMG07], LDP [RGL+08] or Proto [BB06] go beyond static networks and support dynamic reconfiguration. In Pleiades, the programmer writes an application from the point of view of the whole sensor network and the compiler transforms it into code that can be run on each individual node. LDP is a language derived from a method for distributed debugging that efficiently detects conditions on variably-sized groups of nodes. It is based on the tick model, generating a new set of condition matchers throughout the ensemble on each tick. Like Pleiades and LDP, Proto also compiles global programs into locally executed code. Finally, the original Meld [ARLG+09] is also a language designed for dynamic networks, namely ensembles of robots. As we have seen before, Meld is a logic programming language where programs are sets of logical rules that infer over the state of the ensemble. Meld supports action facts and sensing facts, which allow the robots to act on the world or sense the world, respectively. Like Pleiades programs, Meld programs are written from the point of view of the whole ensemble.

3.6.3 Constraint Handling Rules

Since LM is a bottom-up linear logic programming language, it also shares similarities with Constraint Handling Rules (CHR) [BF05, BF13]. CHR is a concurrent committed-choice constraint language used to write constraint solvers. A CHR program is a set of rules and a set of constraints, and constraints can be consumed or generated during the application of rules. Unlike LM, CHR has no concept of rule priorities, although there is an extension to CHR that supports them [DKSD07]. Another CHR extension adds persistent constraints and has been proven sound and complete [BRF10] with respect to the standard formulation of CHR.

3.6.4 Graph Transformation Systems Graph Transformation Systems (GTS) [EP04], commonly used to model distributed systems, perform rewriting of graphs through a set of graph productions. GTS also introduces concepts of concurrency, where it may be possible to apply several transformations at the same time. In principle, it should be possible to model LM programs as graph transformations: we directly map the LM graph of nodes to GTS’s initial graph and consider logical facts as nodes that are connected to LM’s nodes. Each LM rule is then a graph production that manipulates the node’s neighbors (the database) or sends new facts to other nodes. On the other hand, it is also possible to embed GTS inside CHR [RF11].

3.7 Chapter Summary

In this chapter, we gave an overview of the LM language, including its syntax and operational semantics. We also explained how to write programs using all the facilities provided by LM,

including linear facts, comprehensions, and aggregates. Finally, we explained how to informally prove the correctness of several LM programs.

Chapter 4

Logical Foundations: Abstract Machine

This chapter provides an overview of the proof theoretic basis behind LM and the dynamic semantics of the language. First, we present the subset of linear logic on which LM is built. Second, we present the high level dynamic semantics, i.e., how rules are evaluated and how node communication is performed, followed by the low level dynamics, a close representation of how the virtual machine runs. We then prove that the low level dynamic semantics are sound in relation to the high level dynamic semantics. The presence of rule priorities, comprehensions and aggregates makes the low level dynamic semantics not complete, since these features are not represented directly by the underlying linear logic system. This is the same approach used in Prolog, which includes useful features such as the cut operator and a depth-first search strategy that make the language sound but not complete. Note that the semantics presented in this chapter do not take into account coordination or thread-based facts. This chapter may be skipped without loss of understanding of the main thesis of the dissertation.

4.1 Linear Logic

Logic, as classically understood, treats true propositions as persistent truths. When a persistent proposition is needed to prove other propositions, it can be reused as many times as we wish because it is true indefinitely. This is also true in the constructive or intuitionistic school of logic. Linear logic is a substructural logic (it lacks weakening and contraction) developed by Girard [Gir95] that extends persistent logic with linear propositions, which can be understood as ephemeral resources that can be used only once to prove other propositions. Due to this resource interpretation, linear logic presents a good basis for implementing a structured way of managing state in programming languages [Mil85]. Linear logic has also been used in game semantics [LS91, Bla92], concurrent programming [LPPW05b, MZ10, PCPT12], knowledge representation [Bos11], and narrative generation [MFBC14, MBFC13]. In the context of the Curry-Howard correspondence [How80], linear logic has been applied in programming languages as a mechanism to implement linear types. Linear types force objects to be used exactly once. Surprisingly, such types add mutable state to functional languages because they enforce a linear view of state, allowing the language to naturally support concurrency, input/output and data structure updates. Arguably, the most popular language that features

uniqueness types is the Clean programming language [AP95]. Monads [Wad97], made popular by the Haskell programming language, are another interesting way to add state to functional languages. Monads tend to be more powerful than linear types, as they also ensure equational reasoning in the presence of mutable data structures and I/O effects. Linear logic programming is a different approach from either monads or linear types. While the latter are mechanisms that enhance functional programming with state, the former uses state as a foundation, since computation is driven forward through the manipulation of state. Traditional forward-chaining logic programming languages like Datalog only use persistent logic. Many ad hoc extensions [Liu98, LHL95] have been devised to support state updates, but most are extra-logical, which makes the resulting programs harder to reason about. LM uses linear logic as its foundation, therefore state updates are natural to the language. In linear logic, truth is treated as a resource that is consumed once it is used. For instance, in the graph visit program in Fig. 3.7, the unvisited(A) and visit(A) linear facts are consumed in order to prove visited(A). If those facts were persistent, then the rule would make no sense, because the node would be visited and unvisited at the same time.
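As a minimal illustration of this consume-and-rederive style of state update, consider the following hypothetical one-rule program (ours, not one of the thesis examples). The rule retracts the current count fact and asserts a new one with an incremented value, giving the effect of a destructive update while staying entirely within the logic:

   count(A, N) -o count(A, N + 1).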

4.1.1 Sequent Calculus

We now describe the linear logic fragment used as a basis for LM. Note that in this thesis we follow the intuitionistic approach and use the sequent calculus [Gen35] to specify the logic. Our initial sequent is written as Ψ; Γ; ∆ ⊢ C and can be read as "assuming persistent resources Γ and linear resources ∆, then C is true". More specifically, Ψ is the typing context which contains unique variables, Γ is a multi-set of persistent resources, and ∆ is a multi-set of linear resources, while C is the proposition we want to prove. The sequent can also be decomposed into a succedent (C) and the antecedents (the contexts that appear before ⊢). We now present the connectives and their associated rules for the linear logic fragment. First, we have the simultaneous conjunction A ⊗ B that packages linear resources together. In the right rule, A ⊗ B is true if both A and B are true, and, in the left rule, it is possible to split A ⊗ B apart.

$$\frac{\Psi;\Gamma;\Delta \vdash A \qquad \Psi;\Gamma;\Delta' \vdash B}{\Psi;\Gamma;\Delta,\Delta' \vdash A \otimes B}\;\otimes R \qquad\qquad \frac{\Psi;\Gamma;\Delta,A,B \vdash C}{\Psi;\Gamma;\Delta,A \otimes B \vdash C}\;\otimes L$$

Note that the inference rules above can be decomposed into premises (the sequents above the separator line) and a conclusion (the sequent below the line). Next, we have the additive conjunction A & B that allows us to select between A or B. In the right rule we must prove A and B using the same resources, while in the left rules we can select one of the resources.

$$\frac{\Psi;\Gamma;\Delta,A \vdash C}{\Psi;\Gamma;\Delta,A \,\&\, B \vdash C}\;\&L_1 \qquad \frac{\Psi;\Gamma;\Delta,B \vdash C}{\Psi;\Gamma;\Delta,A \,\&\, B \vdash C}\;\&L_2 \qquad \frac{\Psi;\Gamma;\Delta \vdash A \qquad \Psi;\Gamma;\Delta \vdash B}{\Psi;\Gamma;\Delta \vdash A \,\&\, B}\;\&R$$

To express inference, we introduce the linear implication connective, written A ⊸ B. For the right rule, we prove A ⊸ B by assuming A and then proving B, while in the left rule, we obtain B by using some linear resources to prove A.

$$\frac{\Psi;\Gamma;\Delta,A \vdash B}{\Psi;\Gamma;\Delta \vdash A \multimap B}\;\multimap R \qquad\qquad \frac{\Psi;\Gamma;\Delta \vdash A \qquad \Psi;\Gamma;\Delta',B \vdash C}{\Psi;\Gamma;\Delta,\Delta',A \multimap B \vdash C}\;\multimap L$$

Next, we introduce persistent resources, written !A. For the right rule, we prove !A by proving it without any linear resources. Likewise, to use a persistent resource, we simply drop the !. There is also a copy rule that moves persistent resources from Γ to ∆. Remember that Γ contains persistent resources.

$$\frac{\Psi;\Gamma;\cdot \vdash A}{\Psi;\Gamma;\cdot \vdash\; !A}\;!R \qquad\qquad \frac{\Psi;\Gamma,A;\Delta \vdash C}{\Psi;\Gamma;\Delta,!A \vdash C}\;!L \qquad\qquad \frac{\Psi;\Gamma,A;\Delta,A \vdash C}{\Psi;\Gamma,A;\Delta \vdash C}\;copy$$

Another useful connective is the multiplicative unit of the ⊗ connective. It is written as 1 and is best understood as something that does not need any resource to be proven.

$$\frac{}{\Psi;\Gamma;\cdot \vdash 1}\;1R \qquad\qquad \frac{\Psi;\Gamma;\Delta \vdash C}{\Psi;\Gamma;\Delta,1 \vdash C}\;1L$$

Next, we introduce the quantification connectives, namely universal quantification ∀n:τ .A and existential quantification ∃n:τ .A (n : τ means that n has type τ). These connectives use the typing context Ψ to introduce and read term variables. The right and left rules of those two connectives are dual.

$$\frac{\Psi,m:\tau;\Gamma;\Delta \vdash A\{m/n\}}{\Psi;\Gamma;\Delta \vdash \forall_{n:\tau}.A}\;\forall R \qquad\qquad \frac{\Psi \vdash M:\tau \qquad \Psi;\Gamma;\Delta,A\{M/n\} \vdash C}{\Psi;\Gamma;\Delta,\forall_{n:\tau}.A \vdash C}\;\forall L$$

$$\frac{\Psi \vdash M:\tau \qquad \Psi;\Gamma;\Delta \vdash A\{M/n\}}{\Psi;\Gamma;\Delta \vdash \exists_{n:\tau}.A}\;\exists R \qquad\qquad \frac{\Psi,m:\tau;\Gamma;\Delta,A\{m/n\} \vdash C}{\Psi;\Gamma;\Delta,\exists_{n:\tau}.A \vdash C}\;\exists L$$

The judgment Ψ ⊢ M : τ introduces a new term M with type τ that does not depend on Γ or ∆ but may depend on the variables in Ψ. In rules ∀R and ∃L, the new variable m introduced in Ψ must always be fresh. We complete the linear logic system with the cut rules and the identity rule:

$$\frac{\Psi;\Gamma;\Delta \vdash A \qquad \Psi;\Gamma;\Delta',A \vdash C}{\Psi;\Gamma;\Delta,\Delta' \vdash C}\;cut_A \qquad \frac{\Psi;\Gamma;\cdot \vdash A \qquad \Psi;\Gamma,A;\Delta \vdash C}{\Psi;\Gamma;\Delta \vdash C}\;cut!_A \qquad \frac{}{\Psi;\Gamma;A \vdash A}\;id_A$$
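As a small worked example (ours, not from the text), the sequent Ψ; Γ; A, B ⊢ A ⊗ B is derivable: rule ⊗R splits the linear context into the two parts ∆ = A and ∆′ = B, and each branch closes with the identity rule:

$$\frac{\dfrac{}{\Psi;\Gamma;A \vdash A}\;id_A \qquad \dfrac{}{\Psi;\Gamma;B \vdash B}\;id_B}{\Psi;\Gamma;A,B \vdash A \otimes B}\;\otimes R$$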

4.1.2 From The Sequent Calculus To LM

The connection between LM and the sequent calculus fragment is presented in Table 4.1. In the table, we show how each connective is translated into LM's abstract syntax and then into LM programs. In order to understand how LM rules are related to the sequent calculus, consider the first rule of the graph visit program shown in Fig. 3.7:

Connective | Description | LM Syntax | LM Place | LM Example
fact(x̂) | Linear atomic propositions. | fact(x̂) | LHS or RHS | path(A, P)
!fact(x̂) | Persistent atomic propositions. | !fact(x̂) | LHS or RHS | !edge(X, Y, W)
1 | Represents rules with an empty RHS. | 1 | RHS | 1
A ⊗ B | Connects two expressions. | A, B | LHS and RHS | path(A, P), edge(A, B, W)
∀x.A | Represents variables defined inside the rule. | (please see A ⊸ B) | Rule | path(A, B) -o reachable(A, B)
∃x.A | Instantiates new node variables. | exists x̂.B | RHS | exists A.(path(A, P))
A ⊸ B | ⊸ means "linearly implies"; A is the rule's LHS and B is the RHS. | A -o B | Rule | path(A, B) -o reachable(A, B)
!C | Constraint. | A = B | LHS | A = B
R_comp^(V̂,M) | For comprehensions (M is not used) and for aggregates (M accumulates); V̂ captures rule variables. | { x̂ | A -o B } | RHS | {B | !edge(A, B) -o visit(B)}

Table 4.1: Connectives from linear logic and their use in LM.

visit(A), unvisited(A)
   -o visited(A),
      {B | !edge(A, B) -o visit(B)}.

This rule is translated to a sequent calculus proposition, as follows:

$$\forall A.(visit(A) \otimes unvisited(A) \multimap visited(A) \otimes R^{(A)}_{comp}) \qquad (4.1)$$

First, the rule's variable A is included using the ∀ connective. The rule's LHS and RHS are connected using the ⊸ connective. The comprehension is transformed into R_comp^(A), which is a recursive term that is assigned to a unique name, namely comp. This name is related to the following persistent term:

$$!\forall A.(R^{(A)}_{comp} \multimap (1 \;\&\; (\forall B.(!edge(A,B) \multimap visit(B)) \otimes R^{(A)}_{comp}))) \qquad (4.2)$$

Notice that the enclosing ∀ includes all the arguments of the unique name in order to pass variables from outside the definition of the comprehension, in this case variable A. The persistent term allows the implication of the comprehension to be derived as many times as needed. However, the argument list can also be used to implement aggregates. Recall the PageRank aggregate example shown before:

update(A), !numInbound(A, T)
   -o [sum => V; B, Val, Iter |
         neighbor-pagerank(A, B, Val, Iter), V = Val/float(T)
         -o neighbor-pagerank(A, B, Val, Iter)
         -> sum-ranks(A, V)].

This rule is translated into a linear logic proposition as shown next:

$$\forall A.\forall T.(update(A)\; \otimes\; !numInbound(A,T) \multimap R^{(A,T,0)}_{agg}) \qquad (4.3)$$

The persistent term for agg is defined as follows:

$$!\forall A.\forall T.\forall S.(R^{(A,T,S)}_{agg} \multimap sum\text{-}ranks(A,S)\;\&\;(\forall V.\forall B.\forall Val.\forall Iter.(neighbor\text{-}pagerank(A,B,Val,Iter)\; \otimes\; !(V = Val/float(T)) \multimap neighbor\text{-}pagerank(A,B,Val,Iter) \otimes R^{(A,T,S+V)}_{agg}))) \qquad (4.4)$$

The argument S of R_agg accumulates the PageRank values of the neighborhood by consuming neighbor-pagerank and re-deriving a new R_agg with S + V. Once the aggregate is complete, we simply select sum-ranks(A, S) instead. As an aside, note how the constraint is translated into a persistent term of the form !(V = Val/float(T)), since it does not require any fact to be proven true. This recursive mechanism is inspired by Baelde's work on fixed points [BM07, Bae12], which allows the introduction of recursive definitions into a consistent fragment of linear logic.
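To see the accumulation at work, suppose (hypothetically) that node A has T = 2 inbound edges and the database holds two neighbor-pagerank facts contributing values V_1 and V_2, with V_i = Val_i/float(T). Unfolding the persistent term (4.4) once per fact and then selecting the left-hand side of & yields:

$$R^{(A,T,0)}_{agg} \Longrightarrow R^{(A,T,V_1)}_{agg} \Longrightarrow R^{(A,T,V_1+V_2)}_{agg} \Longrightarrow sum\text{-}ranks(A,\ V_1+V_2)$$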

4.2 High Level Dynamic Semantics

In this section, we present the high level dynamic (HLD) semantics of LM. HLD formalizes the mechanism of matching rules and deriving new facts. The HLD semantics present a simplified overview of the dynamics of the language that is closer to the sequent calculus (presented in Section 4.1.1) than to the implementation principles of the virtual machine. The low level dynamic (LLD) semantics are much closer to a real implementation and represent the operational semantics of the language. Both the HLD and LLD semantics model local computation at the node level. Note that neither HLD nor LLD model the use of variable bindings when matching facts from the database. The formalization of bindings tends to complicate the formal system and is not necessary for a good understanding of the system. Therefore, when we write an atomic proposition as p, we assume a proposition of the form p(x̂), where x̂ is the list of terms for that proposition (either values or variables), which is used for matching during the derivation process. Starting from the sequent calculus, we consider Γ and ∆ the database of our program: Γ contains the database of persistent facts while ∆ contains the database of linear facts. We assume that the rules of the program are persistent linear implications of the form !(A ⊸ B) that can be used several times. However, we do not put the rules in the Γ context but in a separate context Φ. The persistent terms associated with each comprehension and aggregate are put in the Π dictionary that maps the unique name of the comprehension/aggregate to the persistent term. The main idea of the dynamic semantics is to ignore the right side of the sequent calculus and then use focusing [And92] on the implications in Φ so that we only have atomic facts (i.e., the database of facts). Focusing is a proof search mechanism that prunes the proof search by constructing proofs using two alternating phases called chaining and inversion. During inversion, propositions are decomposed into smaller propositions using invertible rules (sequent rules where the premises are derivable whenever the conclusion is derivable). In chaining, a proposition is in focus and non-invertible rules are applied to it. A proposition has negative or positive polarity depending on whether the right rule of its connective is invertible or not, respectively. Atomic propositions can be either negative or positive. For our case, we make them positive so that the succedent of the sequent is ignored and chaining

proceeds by focusing on implications (rules) A ⊸ B ∈ Φ whose antecedents A must already be in ∆.

4.2.1 Step

Operationally, LM proceeds in steps. A step happens at some node i and proceeds by picking one rule to apply, matching the rule's LHS against the database, removing all the matched facts and then deriving all the constructs in the rule's RHS. We assume the existence of n nodes in the program and that Γ and ∆ are split into Γ_1, ..., Γ_n and ∆_1, ..., ∆_n, respectively. After each step, the database of each node is updated accordingly. Steps are defined as Γ; ∆; Φ ⟹_step Γ′; ∆′, where Γ′ and ∆′ are the new database contexts and Φ are the derivation rules of the program. The step rule is as follows:

$$\frac{run^{\Gamma_i;\Pi}\ \Delta_i;\Phi \to \Xi'; [\Gamma'_1;\cdots;\Gamma'_n]; [\Delta'_1;\cdots;\Delta'_n]}{[\Gamma_1;\cdots;\Gamma_i;\cdots;\Gamma_n];[\Delta_1;\cdots;\Delta_i;\cdots;\Delta_n];\Phi \overset{step}{\Longrightarrow} [\Gamma_1,\Gamma'_1;\cdots;\Gamma_i,\Gamma'_i;\cdots;\Gamma_n,\Gamma'_n];[\Delta_1,\Delta'_1;\cdots;(\Delta_i - \Xi'),\Delta'_i;\cdots;\Delta_n,\Delta'_n]}\;step$$

A step is then a local derivation of some rule at a given node i. The effects of a step result in the addition and retraction of facts at node i and also the assertion of facts at remote nodes. In the rule above, this is represented by the contexts Γ′_j and ∆′_j.
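As an illustration (our instantiation, not from the text), consider the graph visit rule of Fig. 3.7 firing at a node i whose only outgoing edge leads to node j, so that Γ_i ⊇ {!edge(i, j)} and ∆_i = {visit(i), unvisited(i)}. One step at node i produces:

$$\Xi' = visit(i),\ unvisited(i) \qquad \Delta'_i = visited(i) \qquad \Delta'_j = visit(j)$$

with all other contexts left unchanged: the matched linear facts are retracted at i, visited(i) is derived locally, and visit(j) is asserted at the remote node j.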

4.2.2 Application

A step is performed through run^Γ;Π ∆; Φ → Ξ′; Γ′; ∆′. Γ, ∆, Φ and Π have the meaning explained before, while Ξ′, Γ′ and ∆′ are output multi-sets from applying one of the rules in Φ and are sometimes written as O for conciseness. Ξ′ is the multi-set of retracted linear facts, Γ′ is the set of derived persistent facts and ∆′ is the multi-set of derived linear facts. Note that in the HLD semantics there is no concept of rule priority, therefore a rule is picked non-deterministically. The judgment app_Ψ^Γ;Π ∆; R → O applies one derivation rule R. First, it non-deterministically chooses the correct term values for the ∀ variables and then splits the ∆ context into ∆_1 and ∆_2, namely the multi-set of linear facts consumed to match the rule's LHS (∆_1) and the remaining linear facts (∆_2). The Ψ context is used to store a substitution that maps variables to values. In the next section, we will see how the LLD semantics deterministically calculate ∆_1 and Ψ.

$$\frac{m^\Gamma\ \Delta_1 \to A \qquad der^{\Gamma;\Pi}\ \Delta_2;\Delta_1;\cdot;\cdot;B \to O}{app^{\Gamma;\Pi}_\Psi\ \Delta_1,\Delta_2;\ A \multimap B \to O}\;app\ rule$$

$$\frac{app^{\Gamma;\Pi}_{\Psi,m:M:\tau}\ \Delta;\ A \to O \qquad \Psi \vdash M : \tau}{app^{\Gamma;\Pi}_\Psi\ \Delta;\ \forall_{m:\tau}.A \to O}\;app\ \forall$$

$$\frac{app^{\Gamma;\Pi}_{\cdot}\ \Delta;\ R \to O}{run^{\Gamma;\Pi}\ \Delta;\ R,\Phi \to O}\;run\ rule$$

4.2.3 Match

The m^Γ ∆ → C judgment uses the right (R) rules of the sequent calculus in order to match (prove) the term C using Γ and ∆. We must consume all the linear facts in the multi-set ∆ when matching C. The context Γ may be used to match persistent terms in C, but such facts are never retracted since they are persistent.

$$\frac{}{m^\Gamma\ \cdot \to 1}\;m1 \qquad\qquad \frac{}{m^\Gamma\ p \to p}\;mp \qquad\qquad \frac{}{m^{\Gamma,p}\ \cdot \to\ !p}\;m!p$$

$$\frac{m^\Gamma\ \Delta_1 \to A \qquad m^\Gamma\ \Delta_2 \to B}{m^\Gamma\ \Delta_1,\Delta_2 \to A \otimes B}\;m\otimes$$
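For instance (our example), taking Γ = q and ∆ = p, the term p ⊗ !q is matched by splitting ∆ so that p is consumed by mp while !q is proven from the persistent context by m!p with an empty linear context:

$$\frac{\dfrac{}{m^{q}\ p \to p}\;mp \qquad \dfrac{}{m^{q}\ \cdot \to\ !q}\;m!p}{m^{q}\ p \to p\ \otimes\ !q}\;m\otimes$$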

4.2.4 Derivation

After successfully matching a rule's LHS, we next derive the RHS. The derivation judgment has the form der^Γ;Π ∆; Ξ; Γ_1; ∆_1; Ω → O with the following meaning: Γ the multi-set of persistent resources in the database; Π dictionary of persistent terms for comprehensions and aggregates; ∆ the multi-set of linear resources in the database not yet retracted; Ξ the multi-set of linear resources that have been retracted while matching the rule's LHS, matching comprehensions or aggregates;

Γ1 the multi-set of persistent facts that have been derived using the current rule;

∆1 the multi-set of linear facts that have been derived using the current rule; Ω an ordered list which contains the propositions of the rule's RHS that need to be asserted into the database; O the output contexts, including consumed facts and derived persistent and linear facts. The following derivation rules are a direct translation from the sequent calculus:

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;p,\Delta_1;\Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;p,\Omega \to O}\;der\ p$$

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1,p;\Delta_1;\Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;!p,\Omega \to O}\;der\ !p$$

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;A,B,\Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;A \otimes B,\Omega \to O}\;der\ \otimes$$

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;\Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;1,\Omega \to O}\;der\ 1$$

$$\frac{}{der^{\Gamma;\Pi}\ \Delta;\Xi';\Gamma';\Delta';\cdot \to \Xi';\Gamma';\Delta'}\;der\ end$$

The main rule for deriving aggregates is der agg1. It looks into Π for the appropriate persistent term, applies R_agg to the implication and then selects the recursive case. The rule der agg2 is identical, but instead decides to derive the final part of the aggregate by selecting the additive conjunction's left-hand side. The HLD semantics do not take into account the contents of the database to determine how many times a comprehension should be applied.

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;\ \forall_{\hat{x},\sigma}.(A \multimap B \otimes R^{(\hat{v},\Sigma+\sigma)}_{agg})\{\hat{V}/\hat{v}\}\{\Sigma/\Sigma'\},\ \Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;\ R^{(\hat{V},\Sigma)}_{agg},\ \Omega \to O}\;der\ agg_1$$
where $\Pi(agg) = \forall_{\hat{v},\Sigma'}.(R^{(\hat{v},\Sigma')}_{agg} \multimap ((\lambda x.Cx)\Sigma'\ \&\ (\forall_{\hat{x},\sigma}.(A \multimap B \otimes R^{(\hat{v},\Sigma'+\sigma)}_{agg}))))$

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;\ C\{\hat{V}/\hat{v}\}\{\Sigma/\Sigma'\},\ \Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;\ R^{(\hat{V},\Sigma)}_{agg},\ \Omega \to O}\;der\ agg_2$$
where $\Pi(agg) = \forall_{\hat{v},\Sigma'}.(R^{(\hat{v},\Sigma')}_{agg} \multimap ((\lambda x.Cx)\Sigma'\ \&\ (\forall_{\hat{x},\sigma}.(A \multimap B \otimes R^{(\hat{v},\Sigma'+\sigma)}_{agg}))))$

We do not include comprehensions here because they are a special case of aggregates. Finally, because both comprehensions and aggregates create implications A ⊸ B and use the ∀ connective, we add derivation rules der ⊸ and der ∀.

$$\frac{m^\Gamma\ \Delta_a \to A \qquad der^{\Gamma;\Pi}\ \Delta_b;\Xi,\Delta_a;\Gamma_1;\Delta_1;B,\Omega \to O}{der^{\Gamma;\Pi}\ \Delta_a,\Delta_b;\Xi;\Gamma_1;\Delta_1;A \multimap B,\Omega \to O}\;der\multimap$$

$$\frac{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;A\{V/x\},\Omega \to O}{der^{\Gamma;\Pi}\ \Delta;\Xi;\Gamma_1;\Delta_1;\forall_x.A,\Omega \to O}\;der\ \forall$$

4.3 Low Level Dynamic Semantics

The Low Level Dynamic (LLD) semantics remove all the non-deterministic choices of the previous dynamics and make them deterministic. The new semantics do the following:
• Match rules by priority order;
• Determine the set of linear facts needed to match either the rule's LHS or the LHS of comprehensions/aggregates without guessing;
• Apply as many comprehensions as the database allows;
• Apply as many aggregates as the database allows.
While the implementation presented in Chapter 5 follows the LLD semantics, there are several optimizations not implemented in LLD, such as:

• Indexing: the implementation uses indexing for looking up facts using a specific argument;
• Better candidate rules: when selecting a rule to execute, the implementation filters out rules which do not have enough facts to be derived;
• Multiple rule derivation: the LLD semantics only execute one rule at a time, while the implementation is able to derive a rule multiple times when there are no conflicting rules;
• Matching and substitution: in the implementation, matching is done implicitly using variables and comparisons, while LLD uses the Ψ context to hold substitutions.
The complete set of inference rules for the LLD semantics is presented in Appendix C. LLD is specified as an abstract machine and is represented as a sequence of state transitions of the form S1 ↦ S2. HLD had many different proof trees for a given triplet Γ; ∆; Φ because HLD allows choices to be made during the inference rules. For instance, in HLD any rule could be selected to be executed. In LLD there is only one state sequence possible for a given Γ; ∆; Φ since there is no guessing involved. The LLD semantics present the complete step-by-step mechanism that is needed to correctly evaluate an LM program. For instance, when LLD tries to apply a rule, it checks whether there are enough facts in the database and backtracks until a rule can be applied.

4.3.1 Application

LLD shares exactly the same inputs and outputs as HLD. The inputs correspond to the Γ and ∆ fact contexts and the list of rules Φ, while the outputs correspond to the newly asserted facts in Γ′ and ∆′ and the retracted facts, which are put in the Ξ′ context. The first difference between LLD and HLD starts when picking a rule to derive. Instead of guessing, LLD treats the list of rules as a stack and picks the first rule R1 to execute (the rule with the highest priority). The remaining rules are stored as a continuation. If R1 cannot be matched because there are not enough facts in the database, we backtrack and use the rule continuation to pick the next rule, and so on, until one rule can be successfully applied. The machine starts with a database (Γ; ∆) and a list of rules Φ. The initial state is always infer ∆; Φ; Γ. We start by picking the first rule R1 from Φ:

infer ∆; R1, Φ; Γ ↦ apply ·; ∆; Π; Γ; R1   (select rule)

If, after trying all the rules, there are no remaining candidate rules, the machine enters the next state, which means that no more rules can fire at this node and the machine should perform local computation on another node.

infer ∆; ·; Γ ↦ next Γ; ∆   (fail)

In order to try a particular rule, we either unfold the ∀ connective, by adding its variable to the Ψ context, or initiate the matching process when reaching the ⊸ connective. The variables in the Ψ context, which are initially assigned to an unknown value, will later be assigned a concrete value as the matching process goes forward.

apply Ψ; ∆; Π; Γ; ∀_x:τ.A ↦ apply Ψ, x : _ : τ; ∆; Π; Γ; A   (open rule)

apply Ψ; ∆; Π; Γ; A ⊸ B ↦ I_A⊸B[Ψ; m^Γ · → 1] (∆; Φ); ·; Γ; ∆; A   (init rule)

4.3.2 Continuation Frames

The most interesting aspects introduced by the LLD machine are the continuation frame and the continuation stack. A continuation frame acts as a choice point that is created during rule matching whenever we try to match an atomic proposition against the database. The frame considers all the facts relevant to the proposition given the current context Ψ. The frame contains enough state to resume the matching process at the time of its creation, therefore we can easily backtrack to the choice point and select the next candidate fact from the database. We keep the continuation frames in a continuation stack for backtracking purposes. If, at some point, there are no candidate facts because the current variable assignments are not usable, we update the top frame to try the next candidate fact. If all candidates are exhausted, we pop the top frame and continue with the next available frame. By using this match mechanism, we determine which facts need to be used to match a rule. Our LM implementation works like LLD, by iterating over the available facts at each choice point and then committing to the rule if the matching process succeeds. However, while the implementation only attempts to match rules when the database has all the facts required by the rule's LHS, LLD is more naïve in this aspect because it tries all rules in order.
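To make the mechanism concrete, consider the following hypothetical rule (ours, for illustration only; reached is an invented predicate):

   !edge(A, B), visit(B) -o reached(B).

Matching !edge(A, B) pushes a persistent continuation frame holding all the edge facts of the current node; matching visit(B) then uses the binding for B chosen by that frame. If no visit(B) fact matches the current binding, the machine backtracks to the edge frame, selects the next candidate edge and retries; when the frame runs out of candidates, it is popped and the next frame (or the rule continuation) takes over.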

4.3.3 Structure of Continuation Frames We have two continuation frame types, depending on the type of the candidate facts.1

Linear continuation frames

Linear frames use the form (∆; ∆″; p(x̂); Ω; Ψ)_{m^Γ ∆′→Ω′}, where:

p(x̂) atomic proposition that created this frame; the predicate for the proposition is p;
∆ multi-set of linear facts that are not of predicate p, plus all the other candidate facts of predicate p we have already tried, including the current candidate fact;
∆″ facts of predicate p that match p(x̂) which we have not tried yet (a multi-set of linear facts);
Ω ordered list of remaining terms needed to match;
∆′ multi-set of linear facts we have consumed to reach this point;
Ω′ terms matched already using ∆′ and Γ;
Ψ substitution of variable assignments (includes variable and value).

[1] All continuation frames have an implicit Ψ context that models variable assignments, including variable names, values and their locations in the terms. This is important if we want to model variable assignments and matchings.

Persistent continuation frame

Persistent frames are slightly different since they only need to keep track of the remaining persistent candidates. They are structured as [Γ″; ∆; !p(x̂); Ω; Ψ]_{m^Γ ∆′→Ω′},[2] where:

!p(x̂) persistent atomic proposition that created this frame;
Γ″ remaining candidate facts that match !p(x̂);
∆ multi-set of linear facts not consumed yet;
Ω ordered list of terms needed to match past this frame;
∆′ multi-set of linear facts consumed up to this frame;
Ω′ terms matched up to this point using ∆′ and Γ;
Ψ substitution of variable assignments (includes variable and value).

We now introduce some definitions which help define what it means for a continuation frame to be well-formed.

Term equivalence

The first definition specifies equality between two multi-sets of terms. Two multi-sets A and B are equal, A ≡^θ B, when there is a substitution θ that allows Aθ and Bθ to have the same constituent atoms. Each continuation frame builds a substitution Ψ which can be used to determine if two terms are equal.

$$\frac{A \equiv^\theta B \qquad p\theta \equiv q\theta}{p,A \equiv^\theta q,B}\;\equiv p \qquad \frac{A \equiv^\theta B \qquad !p\theta \equiv\; !q\theta}{!p,A \equiv^\theta\; !q,B}\;\equiv\, !p \qquad \frac{A \equiv^\theta B}{1,A \equiv^\theta B}\;\equiv 1\,L$$

$$\frac{A \equiv^\theta B}{A \equiv^\theta 1,B}\;\equiv 1\,R \qquad \frac{}{\cdot \equiv^\theta \cdot}\;\equiv \cdot \qquad \frac{A,B,C \equiv^\theta D}{A \otimes B,C \equiv^\theta D}\;\equiv \otimes\,L \qquad \frac{A \equiv^\theta B,C,D}{A \equiv^\theta B \otimes C,D}\;\equiv \otimes\,R$$

Theorem 4.3.1 (Match equivalence)
If two multi-sets are equivalent, A_1, ..., A_n ≡^θ B_1, ..., B_m, and we can match A_1 ⊗ ··· ⊗ A_n in HLD such that m^Γ ∆ → (A_1 ⊗ ··· ⊗ A_n)θ, then m^Γ ∆ → (B_1 ⊗ ··· ⊗ B_m)θ is also true.

Proof. By straightforward induction on the first assumption.

Definition 4.3.1 (Split contexts)
split(Ω) is defined as split(Ω) = times(flatten(Ω)), where:

[2] We will sometimes use p in place of p(x̂) for brevity.

flatten(·) = ·   (4.5)
flatten(1, Ω) = flatten(Ω)   (4.6)
flatten(A ⊗ B, Ω) = flatten(A), flatten(B), flatten(Ω)   (4.7)
flatten(p, Ω) = p, flatten(Ω)   (4.8)
flatten(!p, Ω) = !p, flatten(Ω)   (4.9)

And times(A1,...,An) = A1 ⊗ · · · ⊗ An.
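For example (ours), taking Ω = (p ⊗ 1), !q, equations (4.6)-(4.8) give flatten(Ω) = p, !q, and therefore:

$$split((p \otimes 1),\ !q) = times(p,\ !q) = p\ \otimes\ !q$$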

Theorem 4.3.2 (Split equivalence)
split(Ω) ≡^θ Ω.

Proof. Induction on the structure of Ω.

Well-Formed Continuation Frames

We now define the concept of a well-formed frame given initial linear and persistent contexts and a term A that needs to be matched.

Definition 4.3.2 (Well-formed frame)
Consider a triplet A; Γ; ∆_I where A is a term, Γ is a set of persistent resources and ∆_I a multi-set of linear resources. A frame f is well-formed iff:

1. Linear frame f = (∆, p1; ∆″; p; Ω; Ψ)_{m^Γ ∆′→Ω′}:
(a) p, Ω, Ω′ ≡^Ψ A (the remaining terms and the already matched terms are equivalent to the initial LHS A);
(b) ∆′ = ∆′_1, ..., ∆′_n and Ω′ = Ω′_1 ⊗ ··· ⊗ Ω′_n;
(c) ∆, ∆″, ∆′, p1 = ∆_I (available facts, candidate facts for p, consumed facts and the linear fact used for p, respectively, are the same as the initial ∆_I);
(d) m^Γ ∆′ → Ω′ is valid.

2. Persistent frame f = [Γ″; ∆; !p; Ω; Ψ]_{m^Γ ∆′→Ω′}:
(a) !p, Ω, Ω′ ≡^Ψ A;
(b) ∆′ = ∆′_1, ..., ∆′_n and Ω′ = Ω′_1 ⊗ ··· ⊗ Ω′_n;
(c) ∆, ∆′ = ∆_I;
(d) Γ″ ⊂ Γ (remaining candidates are a subset of Γ);
(e) m^Γ ∆′ → Ω′ is valid.

Definition 4.3.3 (Well-formed stack)
A continuation stack C is well-formed iff every frame is well-formed. For two consecutive linear frames f1 = (∆, ∆_q, q1, p1; ∆_p; p; Ω1)_{m^Γ ∆′1→Ω′1} and f2 = (∆, ∆_p, q1; ∆_q; q; Ω2)_{m^Γ ∆′1,p1→Ω′1⊗p}, where f2 is on top of f1, we also know that Ω1 ≡^Ψ Ω2, p. Identical relationships exist between different pairs of frames.

4.3.4 Match

The matching state for the LLD machine uses the continuation stack to try different combinations of facts until a match is achieved. The state is structured as I_A⊸B[Ψ; m^Γ ∆′→Ω′] R; C; Γ; ∆; Ω (we write the state's annotations inside square brackets), where:

A ⊸ B rule being matched: A is the rule's LHS and B the RHS;
R rule continuation to be used if the current rule fails; it contains the original ∆_I and the rest of the rules Φ;
C ordered list of frames representing the continuation stack used for matching A;
∆ multi-set of linear facts still available to complete the matching process;
Ω ordered list of deconstructed LHS terms to match;
∆′ multi-set of linear facts from the original ∆_I that were already consumed (∆′, ∆ = ∆_I);
Ω′ parts of A already matched; they are in the form P1 ⊗ ··· ⊗ Pn.

The idea is to use term equivalence and the fact that Ω, Ω′ ≡^Ψ A to justify m^Γ ∆′ → A when the matching process completes. Not shown in the matching state is the context Ψ that maps variables to values. At the start of matching, the x̂ variables are set as undefined. Matching then uses facts from ∆ and Γ to match the terms of the rule's LHS, represented as Ω. During the matching process, continuation frames are pushed into C and, if matching fails, we use C to restore the process using different candidate facts. New facts also update the variables in the Ψ context by assigning them concrete values.

Linear atomic propositions

The first two state transitions are used when the head of Ω is a linear atomic proposition p(x̂). In the first transition, we find p1 and ∆″ as facts from the database that match p(x̂)'s hidden and partially initialized arguments. Context ∆″ is stored in the second argument of the new continuation frame but is passed along with ∆, since those facts have not been consumed yet (just the fact p1). The second transition deals with the case where there are no candidate facts, and thus a different machine state is used for enabling backtracking. Note that the proposition p1, ∆″ ≺ p(x̂) indicates that the facts ∆″, p1 satisfy the constraints of p(x̂), while ∆ ⊀ p(x̂) indicates that no fact in ∆ satisfies p(x̂). In the first rule, the substitution context Ψ is extended with a new set of variable assignments θ which take into account the new linear proposition.

I_A⊸B[Ψ; m^Γ ∆′→Ω′] R; C; Γ; ∆, p1, ∆″; p(x̂), Ω ↦ I_A⊸B[extend(Ψ, θ); m^Γ ∆′, p1 → Ω′ ⊗ p(x̂θ)] R; (∆, p1; ∆″; p(x̂); Ω; extend(Ψ, θ))_{m^Γ ∆′→Ω′}, C; Γ; ∆, ∆″; Ω   (p1, ∆″ ≺ p(x̂); ∆ ⊀ p(x̂))   (match p ok)

I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; p(x̂), Ω ↦ /_A⊸B R; C; Γ   (∆ ⊀ p(x̂))   (match p fail)

Persistent atomic propositions

The next two state transitions are used when the head of Ω contains a persistent atomic proposition !p(x̂). They are identical to the previous two transitions, but they deal with the persistent context Γ.

I_A⊸B[Ψ; m^Γ ∆′→Ω′] R; C; Γ; ∆; !p(x̂), Ω ↦ I_A⊸B[extend(Ψ, θ); m^Γ ∆′ → Ω′ ⊗ !p(x̂θ)] R; [Γ″; ∆; !p(x̂); Ω; extend(Ψ, θ)]_{m^Γ ∆′→Ω′}, C; Γ, p1, Γ″; ∆; Ω   (!p1, Γ″ ≺ !p(x̂); Γ ⊀ !p(x̂))   (match !p ok)

I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; !p(x̂), Ω ↦ /_A⊸B R; C; Γ   (Γ ⊀ !p(x̂))   (match !p fail)

Other propositions

Finally, we have the transitions that deconstruct the other LHS terms and a final transition to initiate the RHS derivation.

I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; 1, Ω ↦ I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; Ω   (match 1)

I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; X ⊗ Y, Ω ↦ I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; X, Y, Ω   (match ⊗)

I_A⊸B[m^Γ ∆′→Ω′] R; C; Γ; ∆; · ↦ y[∆′; ·; ·] Γ; ∆; B   (match end)

4.3.5 Backtracking

The backtracking state of the machine reads the top of the continuation stack C and restores the matching process with a different candidate fact from the continuation frame. The state is written as /_A⊸B R; C; Γ, where:

A ⊸ B the rule being matched;
R next available rules if the current rule does not match (the rule continuation);
C the continuation stack for matching A;

Linear continuation frames

The next two state transitions handle linear continuation frames on the top of the continuation stack. The first transition selects the next candidate fact p1 from the second argument of the linear frame and updates the frame. Otherwise, if we have no more candidate facts, we pop the continuation frame and backtrack to the remaining continuation stack.

/_A⊸B R; (∆; p2, ∆″; p; Ω)_{m^Γ ∆′→Ω′}, C; Γ ↦ I_A⊸B[m^Γ ∆′, p2 → Ω′ ⊗ p] R; (∆, p2; ∆″; p; Ω)_{m^Γ ∆′→Ω′}, C; Γ; ∆; Ω   (next p)

/_A⊸B R; (∆; ·; p; Ω)_{m^Γ ∆′→Ω′}, C; Γ ↦ /_A⊸B R; C; Γ   (next frame)

Persistent continuation frames

We also have the same two kinds of inference rules for persistent continuation frames.

/_A⊸B R; [!p2, Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; Γ ↦ I_A⊸B[m^Γ ∆′ → Ω′ ⊗ !p2] R; [Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; Γ; ∆; Ω   (next !p)

/_A⊸B R; [·; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; Γ ↦ /_A⊸B R; C; Γ   (next !frame)

Empty continuation stack

Finally, if the continuation stack is empty, we simply force execution to try the next inference rule in Φ.

/_A⊸B (∆; Φ); ·; Γ ↦ infer ∆; Φ; Γ   (rule fail)

4.3.6 Derivation

Once the list of terms Ω of the LHS is exhausted, we derive the rule's RHS. The derivation state simply iterates over B, the rule's RHS, and derives terms into the corresponding new contexts. The state is represented as y[Ξ; Γ_N; ∆_N] Γ; ∆; Ω with the following meaning:

Γ set of persistent facts;
∆ multi-set of remaining linear facts;
Ξ multi-set of linear facts consumed up to this point;
Γ_N set of persistent facts derived;
∆_N multi-set of linear facts derived;
Ω remaining terms to derive as an ordered list; we start with B if the original rule is A ⊸ B.

Atomic propositions

When deriving either p or !p we have the following two inference rules:

y[Ξ; Γ1; ∆1] Γ; ∆; p, Ω ↦ y[Ξ; Γ1; ∆1, p] Γ; ∆; Ω   (new p)

y[Ξ; Γ1; ∆1] Γ; ∆; !p, Ω ↦ y[Ξ; Γ1, !p; ∆1] Γ; ∆; Ω   (new !p)

RHS deconstruction

The following two inference rules deconstruct the RHS list Ω from terms created using either 1 or ⊗.

y[Ξ; Γ1; ∆1] Γ; ∆; 1, Ω ↦ y[Ξ; Γ1; ∆1] Γ; ∆; Ω   (new 1)

y[Ξ; Γ1; ∆1] Γ; ∆; A ⊗ B, Ω ↦ y[Ξ; Γ1; ∆1] Γ; ∆; A, B, Ω   (new ⊗)

Aggregates

We also have a transition for aggregates. The aggregate starts with a set of values V̂ and an accumulator initialized as ·. The second state initiates the matching process of the LHS A of the aggregate (explained in the next section).

y[Ξ; Γ1; ∆1] Γ; ∆; R_agg^(V̂,·), Ω_I ↦ I_agg;·[Ω_I; ∆; Ξ; Γ1; ∆1; m^Γ · → 1] ·; ·; Γ; ∆; A   (new agg)
where Π(agg) = ∀_{v̂,Σ′}.(R_agg^(v̂,Σ′) ⊸ ((λx.Cx)Σ′ & (∀_{x̂,σ}.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))))

Successful rule

Finally, if the ordered list Ω is exhausted, then the whole execution process is done. Note how the output arguments match the input arguments of the der_LLD judgment.

y[Ξ; Γ1; ∆1] Γ; ∆; · ↦ Ξ; Γ1; ∆1   (rule finished)

4.3.7 Aggregates

The most intricate part of the derivation process is processing comprehensions and aggregates. For both of them, we need to perform as many derivations as the database allows, therefore we need to deterministically check the contents of the database until no more derivations are possible. The matching process is then similar to the process used for matching a rule's LHS as presented in Section 4.3.4; however, we use two continuation stacks, C and P. In P, we put all the initial persistent frames and in C we put the first linear frame and then everything else. In order to reuse the stacks C and P, we need to update them by removing all the frames in C pushed after the first linear continuation frame. If we tried to use those frames, we would assume that the linear facts used by the other frames were still in the database, but that is not true because they have been consumed during the first application of the comprehension. For example, if the LHS is !a(X) ⊗ b(X) ⊗ c(X) and the continuation stack has three frames (one per fact), we cannot backtrack to the frame of c(X) because, at that point, the matching process was assuming that the previous b(X) linear fact was still available. Moreover, we also need to remove the consumed linear facts from the frames of b(X) and !a(X) in order to make the stack fully consistent with the new database. We will see later on how to do that.

Example

As an example, consider the following snippet of code inspired by the PageRank program shown in Fig. F.1:

update(A), !numInbound(A, T)
   -o [sum => V; B, W, Val |
         !edge(A, B), neighbor-pagerank(A, B, Val), V = Val/float(T)
         -o neighbor-pagerank(A, B, Val)
         -> sum-ranks(A, V)].

Let's assume that the rule above was successfully matched with A = 1 and T = 2 and that the database contains the following facts: !edge(1, 2), !edge(1, 3), neighbor-pagerank(1, 2, 0.5) and neighbor-pagerank(1, 3, 0.5). Figure 4.1 shows how the aggregate is computed using the continuation stack. An initial frame is created for !edge(1, B), which includes the two edge facts, and !edge(1, 2) is selected to continue the process. Since B = 2, the frame for neighbor-pagerank(1, 2, Val) includes only neighbor-pagerank(1, 2, 0.5), which completes the first application of the aggregate, and the same linear fact is re-derived using the first aggregate's RHS. Computation then proceeds by backtracking to the first linear frame, the frame of neighbor-pagerank(1, 2, Val), but there are no more available candidates, therefore the frame is removed and the next candidate !edge(1, 3) of frame !edge(1, B) is selected. Here, a frame is created for B = 3 with neighbor-pagerank(1, 3, 0.5) and the second

application of the aggregate is completed. The process backtracks again twice, but now there are no more candidates and the second aggregate's RHS derives sum-ranks(1, 1.0), because V = 0.5 + 0.5, completing the aggregate computation.


Figure 4.1: Generating the PageRank aggregate.

Matching

The matching state for aggregates is I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; Ω, where:

Ω_I ordered list of remaining terms of the rule's RHS to be derived;
∆_I multi-set of linear facts that were still available after matching the rule's LHS and all the previous aggregates (note that ∆, Ξ = ∆_I);
Ξ multi-set of linear facts used during the matching process of the rule's LHS and all the previous aggregates;
Γ_N set of persistent facts derived up to this point in the rule's RHS;
∆_N multi-set of linear facts derived up to this point in the rule's RHS;
∆′ multi-set of linear facts consumed up to this point;
Ω′ terms matched using ∆′ up to this point;
agg aggregate that is being matched;
Σ the list of aggregated values;
C continuation stack that contains both linear and persistent frames, where the first frame must be linear;
P initial part of the continuation stack with only persistent frames;
∆ multi-set of linear facts remaining up to this point in the matching process;
Ω ordered list of terms that need to be matched for the aggregate to be applied.

Since aggregates accumulate values (from specific variables), we retrieve the value from the Ψ context. Remember that Ψ is used for the quantification connectives in the sequent calculus and, in LLD, is used to store the current variable bindings.

Linear atomic propositions

The following two transitions deal with the case when there is a linear atomic proposition in the aggregate's LHS.

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆, p1, ∆″; p, Ω ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′, p1 → Ω′ ⊗ p] (∆, p1; ∆″; p; Ω)_{m^Γ ∆′→Ω′}, C; P; Γ; ∆, ∆″; Ω   (agg match p ok)

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; p, Ω ↦ /_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] C; P; Γ   (agg match p fail)

Persistent atomic propositions

The transitions for dealing with persistent facts are similar to the previous ones.

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] ·; P; Γ, p1, Γ″; ∆; !p, Ω ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′ → Ω′ ⊗ !p] ·; [Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, P; Γ, p1, Γ″; ∆; Ω   (agg match !p ok P)

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ, p1, Γ″; ∆; !p, Ω ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′ → Ω′ ⊗ !p] [Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; P; Γ, p1, Γ″; ∆; Ω   (agg match !p ok C)

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; !p, Ω ↦ /_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] C; P; Γ   (agg match !p fail)

LHS Deconstruction

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; X ⊗ Y, Ω ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; X, Y, Ω   (agg match ⊗)

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; 1, Ω ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; Ω   (agg match 1)

Successful match

When the aggregate's LHS finally matches, we retrieve the term for variable x (the aggregate variable) and add it to the list Σ.

I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; · ↦ ⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] C; P; Γ   (agg match end)

Continuation stack update

As we said before, to update the continuation stacks we need to remove all the frames except the first linear frame, and remove the consumed linear facts from the remaining frames so that they are still valid for the next application of the aggregate. The judgment that updates the stack has the form ⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] C; P; Γ, where:

Ω_I ordered list of remaining terms of the rule's RHS to be derived;
∆ multi-set of linear facts that were still available after matching the rule's LHS and the aggregate's LHS;
Ξ multi-set of linear facts used during the matching process of the rule's LHS and all the previous aggregates;
∆′ multi-set of linear facts consumed by the aggregate's LHS;
Γ_N set of persistent facts derived by the rule's RHS and all the previous aggregates;
∆_N multi-set of linear facts derived by the rule's RHS and all the previous aggregates;
agg the current aggregate;
Σ list of accumulated values;
C, P continuation stacks for the comprehension;
Γ set of usable persistent facts.

Remove linear continuation frames

To remove all linear continuation frames except the first one, we simply go through all the frames in the stack C until only one frame remains. This last frame and the stack P are then updated by removing ∆′ from their contexts.

⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] f′, f, C; P; Γ ↦ ⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] f, C; P; Γ   (agg fix rec)

⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] f; P; Γ ↦ y_agg;V::Σ[Ω_I; ∆; Ξ, ∆′; Γ_N; ∆_N] f′; P′; Γ; B{Ψ(x̂), V/x̂, σ}   (agg fix end1)
where Π(agg) = ∀_{v̂,Σ′}.(R_agg^(v̂,Σ′) ⊸ ((λx.Cx)Σ′ & (∀_{x̂,σ}.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′))))), f′ = remove(f, ∆′), P′ = remove(P, ∆′) and V = Ψ(σ)

⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] ·; P; Γ ↦ y_agg;V::Σ[Ω_I; ∆; Ξ, ∆′; Γ_N; ∆_N] ·; P′; Γ; B{Ψ(x̂), V/x̂, σ}   (agg fix end2)
where Π(agg) is as above, P′ = remove(P, ∆′) and V = Ψ(σ)

Aggregate backtracking

If the aggregate match fails, we need to backtrack to the next candidate fact. The backtracking state has the form /_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] C; P; Γ, where:

Ω_I ordered list of remaining terms of the rule's RHS to be derived;
∆_I multi-set of linear facts that were still available after matching the rule's LHS and the aggregate's LHS;
Ξ multi-set of linear facts used during the matching process of the rule's LHS and all the previous aggregates;
Γ_N set of persistent facts derived by the rule's RHS and all the previous aggregates;
∆_N multi-set of linear facts derived by the rule's RHS and all the previous aggregates;
agg the current aggregate;
Σ list of accumulated values;
C, P continuation stacks for the comprehension;
Γ set of usable persistent facts.

Using the C stack

The following four state transitions use the C stack, the stack where the first continuation frame is linear, to perform backtracking.

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] (∆; p1, ∆″; p; Ω)_{m^Γ ∆′→Ω′}, C; P; Γ ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′, p1 → Ω′ ⊗ p] (∆, p1; ∆″; p; Ω)_{m^Γ ∆′→Ω′}, C; P; Γ; ∆; Ω   (agg next p C)

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] [p1, Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; P; Γ ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′ → Ω′ ⊗ !p] [Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; P; Γ; ∆; Ω   (agg next !p C)

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] (∆; ·; p; Ω)_{m^Γ ∆′→Ω′}, C; P; Γ ↦ /_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] C; P; Γ   (agg next frame C)

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] [·; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, C; P; Γ ↦ /_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] C; P; Γ   (agg next !frame C)

Using the P stack

The following two state transitions use the P stack instead, the stack where all continuation frames are persistent.

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] ·; [p1, Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, P; Γ ↦ I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′ → Ω′ ⊗ !p] ·; [Γ″; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, P; Γ; ∆; Ω   (agg next !p P)

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] ·; [·; ∆; !p; Ω]_{m^Γ ∆′→Ω′}, P; Γ ↦ /_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] ·; P; Γ   (agg next !frame P)

Aggregate done

If both the C and P stacks are empty, backtracking is impossible and the aggregate is done. The final aggregate's RHS is then derived along with the rest of the rule's RHS.

/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] ·; ·; Γ ↦ y[Ξ; Γ_N; ∆_N] Γ; ∆_I; (λx.C{Ψ(v̂)/v̂}x)Σ, Ω_I   (agg end)
where Π(agg) = ∀_{v̂,Σ′}.(R_agg^(v̂,Σ′) ⊸ ((λx.Cx)Σ′ & (∀_{x̂,σ}.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))))

Aggregate Derivation

After updating the continuation stacks, the first aggregate's RHS is derived. The derivation state has the form y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; Ω, where:

Ω_I ordered list of remaining terms of the rule's RHS to be derived;
∆ multi-set of remaining linear facts that can be used for the next aggregate applications;
Ξ multi-set of linear facts consumed both by the rule's LHS and previous aggregate applications;
Γ_N set of persistent facts derived by the rule's RHS, previous aggregates and the current derivation;
∆_N multi-set of linear facts derived by the rule's RHS, previous aggregates and the current derivation;
agg current aggregate symbol;
Σ accumulated list of values of the aggregate;
C, P new continuation stacks;
Γ set of persistent facts;
Ω ordered list of terms to derive.

y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; p, Ω ↦ y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N, p] C; P; Γ; Ω   (agg new p)

y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; !p, Ω ↦ y_agg;Σ[Ω_I; ∆; Ξ; Γ_N, p; ∆_N] C; P; Γ; Ω   (agg new !p)

y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; X ⊗ Y, Ω ↦ y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; X, Y, Ω   (agg new ⊗)

y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; 1, Ω ↦ y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; Ω   (agg new 1)

y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; · ↦ /_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ   (agg next)

This completes the specification of the LLD semantics.

4.3.8 State well-formedness

Given all the state transitions shown above, we now define what it means for a given machine state to be well-formed.

Definition 4.3.4 (Well-formed LHS match)
The state I_A⊸B[Ψ; m^Γ ∆′→Ω′] R; C; Γ; ∆; Ω is well-formed in relation to a triplet A; Γ; ∆_I iff:
• ∆, ∆′ = ∆_I;
• Ω′, Ω ≡^Ψ A;
• m^Γ ∆′ → Ω′ is valid;
• C is well-formed in relation to A; Γ; ∆_I and:
  – If C = ·, then Ω ≡^Ψ A.
  – If C = (∆_a, p1; ∆_b; p; Ω_a; Ψ)_{m^Γ ∆_c→Ω_b}, C′, then Ω_a ≡^Ψ Ω, ∆ = ∆_a, ∆_b and ∆′ = ∆_c, p1.
  – If C = [Γ″; ∆_a; !p; Ω_a; Ψ]_{m^Γ ∆′_a→Ω_b}, C′, then Ω ≡^Ψ Ω_a, ∆ = ∆_a and ∆′ = ∆′_a.

Definition 4.3.5 (Well-formed backtracking)

/_A⊸B R; C; Γ is well-formed in relation to a triplet A; Γ; ∆_I iff C is well-formed in relation to A; Γ; ∆_I.

Definition 4.3.6 (Well-formed aggregate match)
The matching state I_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N; m^Γ ∆′→Ω′] C; P; Γ; ∆; Ω is well-formed in relation to a triplet A; Γ; ∆_I iff:
• P is composed solely of persistent frames;
• C is composed of either linear or persistent frames, but the first frame is linear;
• m^Γ ∆′ → Ω′ is valid;
• ∆, ∆′ = ∆_I;
• Ω, Ω′ ≡^Ψ A;
• C and P are well-formed in relation to A; Γ; ∆_I and follow the same rules presented before in "Well-formed LHS match" as a stack C, P.

Definition 4.3.7 (Well-formed aggregate backtracking)
/_agg;Σ[Ω_I; ∆_I; Ξ; Γ_N; ∆_N] C; P; Γ is well-formed in relation to a triplet A; Γ; ∆_I iff:
• P is composed solely of persistent frames;
• C is composed of either linear or persistent frames, but the first frame is linear;
• C and P are well-formed in relation to A; Γ; ∆_I.

Definition 4.3.8 (Well-formed stack update)
⋈_agg;Σ[Ω_I; ∆; Ξ; ∆′; Γ_N; ∆_N] C; P; Γ is well-formed in relation to a triplet A; Γ; ∆_I iff:
• P is composed solely of persistent frames;
• C is composed of either linear or persistent frames, but the first frame is linear;
• C and P are well-formed in relation to A; Γ; ∆_I;
• ∆, ∆′ = ∆_I.

Definition 4.3.9 (Well-formed aggregate derivation)
y_agg;Σ[Ω_I; ∆; Ξ; Γ_N; ∆_N] C; P; Γ; Ω is well-formed in relation to a triplet A; Γ; ∆_I iff:
• P is composed solely of persistent frames;
• C is empty or has a single linear frame;
• C and P are well-formed in relation to A; Γ; ∆_I.

For the theorems that follow, we always assume that the states are well-formed in relation to their main contexts and matching LHS, be it either a rule LHS or an aggregate LHS.

4.4 Soundness Proof

Now that we have presented both the HLD and LLD semantics, we are in a position to start building our soundness theorem. The soundness theorem proves that if a rule was successfully derived in the LLD semantics then it can also be derived in the HLD semantics. Since the HLD semantics are so close to linear logic, this shows that our language has a determined, correct proof search behavior when executing programs. However, the completeness theorem cannot be proven, since LLD lacks the non-determinism inherent in HLD. First and foremost, we need to establish some auxiliary theorems and definitions that will be used in the soundness theorem.

4.4.1 Soundness Of Matching

The soundness theorem will be proven in two main steps. First, we prove that performing a rule match in LLD is sound in relation to HLD, and then we prove that the derivation of the rule's RHS is also sound. In order to prove the soundness of matching, we want to reconstitute a valid match m^Γ ∆ → A in HLD from machine steps in LLD. Our machine specification already includes a built-in m^Γ ∆ → A judgment that can be used to prove soundness immediately. However, we need to prove that every state transition preserves the well-formedness of the machine given in the previous definitions.

Theorem 4.4.1 (Rule transitions preserve well-formedness)
Given a rule A ⊸ B, consider a triplet T = A; Γ; ∆_N. If a state s1 is well-formed in relation to T and s1 ↦ s2, then s2 is also well-formed.

Proof. Case by case analysis.
• match p ok: simple manipulation of multi-set equality and use of equivalence rules.
• match p fail: trivial.
• match !p ok: multi-set manipulation and use of equivalence rules.
• match !p fail: trivial.
• match 1: use of term equivalence rules.
• match ⊗: use of term equivalence rules.
• next p: simple multi-set manipulation.
• next frame: trivial.
• next !frame: trivial.

Given this result, we now need to prove that from an initial matching state we end up in a final matching state. The final matching state will include the soundness result, since all state transitions preserve well-formedness. However, LLD may fail during matching, therefore the match lemma needs to handle unsuccessful matches. In order to be able to use induction, we must assume a general matching state that already contains some continuation frames in stack C. The lemma also needs to relate the matching state with the backtracking state, since there is a need to backtrack during the matching process. Apart from an unsuccessful match, we deal with two situations during a successful match: (1) we succeed without needing to backtrack to a frame in stack C, or (2) we need to backtrack to a frame in C. The complete lemma is stated and proven below.

Lemma 4.4.1 (LHS match result)
Given a rule A ⊸ H, consider a triplet T = A; Γ; ∆_N and a context ∆_N = ∆1, ∆2, Ξ. If s1 = I_A⊸H[Ψ; m^Γ ∆′ → Ω′Ψ] R; C; Γ; ∆1, ∆2; Ω is well-formed in relation to T and s1 ↦* s2, then one of the following holds:

• Match succeeds with no backtracking to frames of stack C:
s2 = I_A⊸H[extend(Ψ, Ψ2); m^Γ ∆′, ∆2 → Ω′Ψ ⊗ split(Ω)Ψ2] R; C″, C; Γ; ∆1; ·

• Match fails:
s2 = /_A⊸H R; ·; Γ

• Match succeeds with backtracking to a linear frame:
s2 = I_A⊸H[extend(Ψ_f, Ψ2); m^Γ ∆′_f, p2, ∆″_f → Ω_fΨ_f ⊗ (p ⊗ split(Ω′_f))Ψ2] R; C‴, f′, C″; Γ; ∆_c; ·
where C = C′, f, C″ and the frame f = (∆_a; ∆_b1, p2, ∆_b2; p; Ω_f; Ψ_f)_{m^Γ ∆′_f→Ω′_f} turns into f′ = (∆_a, ∆_b1, p2; ∆_b2; p; Ω_f; Ψ_f)_{m^Γ ∆′_f→Ω′_f}

• Match succeeds with backtracking to a persistent frame:
s2 = I_A⊸H[extend(Ψ_f, Ψ2); m^Γ ∆′_f, ∆_c2 → Ω_fΨ_f ⊗ (!p ⊗ split(Ω′_f))Ψ2] R; C‴, f′, C″; Γ; ∆_c1; ·
where C = C′, f, C″ and the frame f = [Γ1, p2, Γ2; ∆_c1, ∆_c2; !p; Ω_f; Ψ_f]_{m^Γ ∆′_f→Ω′_f} turns into f′ = [Γ2; ∆_c1, ∆_c2; !p; Ω_f; Ψ_f]_{m^Γ ∆′_f→Ω′_f}

If s1 = /_A⊸H R; C; Γ is well-formed in relation to T and s1 ↦* s2, then either the match fails or it succeeds with backtracking to a linear or persistent frame, with s2 exactly as in the corresponding cases above.

Proof. By lexicographic induction on the state transitions: first on the size of Ω, then on the size of the second argument (or the first argument) of the linear frame, and then on the size of the stack C. Sub-cases:

• match p ok: induction on the state with a new frame (Ω is smaller). Trivial if the match fails; otherwise it succeeds by adding new frames (including the new frame) or by backtracking.
• match p fail: the state gets smaller (see next).
• match !p ok: use the strategy used for match p ok.
• match !p fail: the state gets smaller.
• match 1: trivial because split removes 1.
• match ⊗: same.
• next p: the frame gets smaller, so we can use the induction hypothesis:
  – Match fails: trivial.
  – Match succeeds with no backtracking: the frame that was updated is the successful frame to backtrack to.
  – Match succeeds with backtracking: the frame f ∈ C is the frame we need.
• next frame: the stack gets smaller.
• next !frame: the stack gets smaller (see above).

For the induction hypothesis to be applicable in Lemma 4.4.1 there must be a relation between the machine states. We can define a lexicographic ordering s1 ≺ s2, meaning that s1 has a smaller number of remaining steps than state s2. The specific ordering is as follows:

1. ⊳_{A⊸B} R; C; Γ ≺ ⊳_{A⊸B} R; C′, C; Γ
The continuation must use the top of the stack C′ before using C;

2. ⊳_{A⊸B} R; (∆, ∆1; ∆2; p; Ω), C; Γ ≺ ⊳_{A⊸B} R; (∆; ∆1, ∆2; p; Ω), C; Γ, where both frames are annotated with m^Γ ∆f → Ωf
A continuation frame with more candidates has more steps to do than a frame with fewer candidates;

3. ⊳_{A⊸B} R; [Γ1; ∆; !p; Ω]; Γ ≺ ⊳_{A⊸B} R; [Γ1, Γ2; ∆; !p; Ω]; Γ, where both frames are annotated with m^Γ ∆f → Ωf
Same as the previous one;

4. ⊳_{A⊸B} R; C; Γ ≺ m^Γ ∆′ → Ω′ ⇒_{A⊸B} R; C′, C; Γ; ∆; Ω

5. m^Γ ∆′ → Ω′ ⇒_{A⊸B} R; C; Γ; ∆; Ω ≺ ⊳_{A⊸B} R; C′, C; Γ

6. m^Γ ∆1′ → Ω1′ ⇒_{A⊸B} R; C′, C; Γ; ∆1; Ω1 ≺ m^Γ ∆2′ → Ω2′ ⇒_{A⊸B} R; C; Γ; ∆2; Ω2
Adding continuation frames to the stack makes the proof smaller as long as Ω is also smaller;

7. m^Γ ∆1′ → Ω1′ ⇒_{A⊸B} R; C′, (∆f, ∆f1; ∆f2; p; Ωf), C; Γ; ∆1; Ω1 ≺ m^Γ ∆2′ → Ω2′ ⇒_{A⊸B} R; C″, (∆f; ∆f1, ∆f2; p; Ωf), C; Γ; ∆2; Ω2, where both frames are annotated with m^Γ ∆f′ → Ωf′

8. m^Γ ∆1′ → Ω1′ ⇒_{A⊸B} R; C′, [Γ1; ∆f; !p; Ωf], C; Γ; ∆1; Ω1 ≺ m^Γ ∆2′ → Ω2′ ⇒_{A⊸B} R; C″, [Γ1, Γ2; ∆f; !p; Ωf], C; Γ; ∆2; Ω2, where both frames are annotated with m^Γ ∆f′ → Ωf′

4.4.2 Soundness Of Derivation

Proving that the derivation of the rule's RHS is sound is trivial, except for comprehensions and aggregates. LLD deterministically computes the number of available aggregates to apply, while HLD "guesses" the number of required derivations. In the next sections, we show how to prove the soundness of aggregates; the strategy for proving comprehensions is identical due to their inherent similarities.

4.4.3 Aggregate Soundness The proof that deriving an aggregate in LLD is sound in relation to HLD is built from four results: (1) proving that matching the aggregate’s LHS is sound in relation to HLD; (2) proving that updating the continuation stacks makes them suitable for use in the next aggregate applications; (3) proving that deriving the aggregate’s RHS is sound in relation to HLD; (4) proving that we can apply as many aggregates as the database allows.

Lemma 4.4.2 (Aggregate LHS match)
Consider an aggregate agg, where Π(agg) = ∀v̂,Σ′.(R_agg^(v̂,Σ′) ⊸ ((λx.C x)Σ′ & ∀x̂,σ.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))), a triplet T = A; Γ; ∆I and a context ∆I = ∆1, ∆2, ∆N.
If s1 = m^Γ ∆′ → Ω′ ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C; P; Γ; ∆1, ∆2; Ω is well-formed in relation to T and s1 ↦* s2, then either:

• Match succeeds with no backtracking to frames of stack C or P (C ≠ ·):
s2 = m^Γ ∆′, ∆2 → Ω′ ⊗ split(Ω) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C′, C; P′, P; Γ; ∆1; ·

• Match fails:
s2 = ⇓^{ΓN;∆N}_Ξ Γ; ∆I; (λx.C{Ψ(v̂)/v̂}x)Σ, ΩN

• Match succeeds with backtracking to a linear continuation frame in stack C (C ≠ ·):
s2 = m^Γ ∆f′, p2, ∆f″ → Ωf ⊗ p ⊗ split(Ωf′) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C‴, f′, C″; P; Γ; ∆c; ·, where C = C′, f, C″ and the frame
f = (∆a; ∆b1, p2, ∆b2; p; Ωf), annotated with m^Γ ∆f′ → Ωf′, turns into
f′ = (∆a, ∆b1, p2; ∆b2; p; Ωf), annotated with m^Γ ∆f′ → Ωf′.

• Match succeeds with backtracking to a persistent continuation frame in stack C (C ≠ ·):
s2 = m^Γ ∆f′, ∆c2 → Ωf ⊗ !p ⊗ split(Ωf′) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C‴, f′, C″; P; Γ; ∆c1; ·, where C = C′, f, C″ and the frame
f = [Γ1, p2, Γ2; ∆c1, ∆c2; !p; Ωf], annotated with m^Γ ∆f′ → Ωf′, turns into
f′ = [Γ2; ∆c1, ∆c2; !p; Ωf], annotated with m^Γ ∆f′ → Ωf′.

• Match succeeds with backtracking to a persistent continuation frame in stack P (C = ·):
s2 = m^Γ ∆f′, ∆c2 → Ωf ⊗ !p ⊗ split(Ωf′) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C′; P‴, f′, P″; Γ; ∆c1; ·, where P = P′, f, P″ and the frame
f = [Γ1, p2, Γ2; ∆c1, ∆c2; !p; Ωf], annotated with m^Γ ∆f′ → Ωf′, turns into
f′ = [Γ2; ∆c1, ∆c2; !p; Ωf], annotated with m^Γ ∆f′ → Ωf′.

If s1 = ⊳^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C; P; Γ is well-formed in relation to T and s1 ↦* s2, then either:

• Match fails:
s2 = ⇓^{ΓN;∆N}_Ξ Γ; ∆I; (λx.C{Ψ(v̂)/v̂}x)Σ, ΩN

• Match succeeds with backtracking to a linear continuation frame in stack C (C ≠ ·), with s2, C and the frame transformation exactly as in the corresponding case above.

• Match succeeds with backtracking to a persistent continuation frame in stack C (C ≠ ·), with s2, C and the frame transformation exactly as in the corresponding case above.

• Match succeeds with backtracking to a persistent continuation frame in stack P (C = ·), with s2, P and the frame transformation exactly as in the corresponding case above.

We prove this particular lemma by reapplying the strategy used in Lemma 4.4.1.³ Next, we need to prove that, when matching succeeds, the continuation stack is corrected for the next application of the aggregate. Note that the aggregate value is appended to Σ after the stack is corrected.

Theorem 4.4.2 (From update to derivation)
Consider an aggregate agg, where Π(agg) = ∀v̂,Σ′.(R_agg^(v̂,Σ′) ⊸ ((λx.C x)Σ′ & ∀x̂,σ.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))), a triplet T = A; Γ; ∆I, and assume that ∆I = ∆, ∆′. A well-formed stack update ⋈^{ΓN;∆N}_{ΩI;∆;Ξ;∆′;agg;Σ} C; P; Γ implies ⇓^{ΓN;∆N}_{ΩI;∆;Ξ,∆′;agg;V::Σ} C′; P′; Γ; B{Ψ(x̂), V/x̂, σ}, where:

• If C = · then C′ = ·.
• If C = C″, (∆a; ∆b; p; Ω), annotated with m^Γ · → Ω′, then C′ = (∆a − ∆′; ∆b − ∆′; p; Ω), annotated with m^Γ · → Ω′.
• P′ is the transformation of stack P, where every frame f ∈ P of the form [Γ′; ∆I; !p; Ω], annotated with m^Γ · → Ω′, turns into f′ = [Γ′; ∆I − ∆′; !p; Ω], annotated with m^Γ · → Ω′.

³We have omitted the Ψ context for brevity.

Proof. Use induction on the size of the stack C.

Corollary 4.4.1 (Match to derivation)
Consider an aggregate agg, where Π(agg) = ∀v̂,Σ′.(R_agg^(v̂,Σ′) ⊸ ((λx.C x)Σ′ & ∀x̂,σ.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))), a triplet T = A; Γ; ∆I, and assume that ∆I = ∆, ∆′.
A well-formed m^Γ ∆′ → Ω′ ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C; P; Γ; ∆; · implies ⇓^{ΓN;∆N}_{ΩI;∆;Ξ,∆′;agg;V::Σ} C′; P′; Γ; B{Ψ(x̂), V/x̂, σ}, where:

• If C = · then C′ = ·.
• If C = C″, (∆a; ∆b; p; Ω), annotated with m^Γ · → Ω′, then C′ = (∆a − ∆′; ∆b − ∆′; p; Ω), annotated with m^Γ · → Ω′.
• P′ is the transformation of stack P, where every frame f ∈ P of the form [Γ′; ∆I; !p; Ω], annotated with m^Γ · → Ω′, turns into f′ = [Γ′; ∆I − ∆′; !p; Ω], annotated with m^Γ · → Ω′.

Proof. Invert the assumption and then apply Theorem 4.4.2.

Aggregate derivation. We have just seen that, after a single aggregate application, we add a value V to the Σ context and that the continuation stacks are now valid. Now, we need to prove that deriving the aggregate's RHS is sound in relation to HLD by using the new stacks.

Theorem 4.4.3 (Aggregate derivation soundness)
If ⇓^{ΓN;∆N}_{ΩI;∆I;ΞI;agg;Σ} C; P; Γ; Ω1, ..., Ωn then:

• ⇓^{ΓN,Γ1,...,Γn;∆N,∆1,...,∆n}_{ΩI;∆I;ΞI;agg;Σ} C; P; Γ; ·
• If der^{Γ;Π} ∆I; ΞI; ΓN, Γ1, ..., Γn; ∆N, ∆1, ..., ∆n; Ωx → O then der^{Γ;Π} ∆I; ΞI; ΓN; ∆N; Ω1, ..., Ωn, Ωx → O

Proof. Straightforward induction on Ω1,..., Ωn.

Multiple aggregate derivation. We now prove that it is possible to apply an aggregate several times in order to get multiple values (one per application). In turn, we also conclude important results for the soundness of the aggregate computation mechanism.

Theorem 4.4.4 (Multiple aggregate derivation)
Consider an aggregate agg, where Π(agg) = ∀v̂,Σ′.(R_agg^(v̂,Σ′) ⊸ ((λx.C x)Σ′ & ∀x̂,σ.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))), and a triplet T = A; Γ; ∆I. Assume that there exist n ≥ 0 applications of agg, where the ith application is related to the following information:

• ∆i: context of derived linear facts;
• Γi: context of derived persistent facts;
• Ξi: context of consumed linear facts;
• Vi: a value representing the aggregate application;
• Ψi: context representing new variable bindings for the aggregate.

Since each application consumes Ξi, the initial context is ∆I = ∆, Ξ1, ..., Ξn. We now define the two main implications of the theorem.

• Assume that ∆I = ∆a, ∆b with ∆b = ∆b′, p1, and that there is a frame f = (∆a, p1; ∆b′; p; Ω), annotated with m^Γ · → Ωf.
If s1 = m^Γ p1 → Ωf ⊗ p ⇒^{ΓN;∆N}_{ΩI;∆,Ξ1,...,Ξn;Ξ;agg;Σ} f; P; Γ; ∆a, ∆b′; Ω (well-formed in relation to T) and s1 ↦* s2 then:
  – n values Vi (Σ′ = Vn :: ··· :: V1 :: Σ);
  – the n aggregate applications are derived:
    s2 = ⇓^{ΓN,Γ1,...,Γn;∆N,∆1,...,∆n}_{Ξ,Ξ1,...,Ξn} Γ; ∆; (λx.C{Ψ(v̂)/v̂}x)Σ′, ΩI
  – n soundness proofs for the n aggregate matches:
    m^Γ Ξ1 → A    ...    m^Γ Ξn → A
  – n derivation implications for HLD:
    if der^{Γ;Π} ∆, Ξi+1, ..., Ξn; Ξ, Ξ1, ..., Ξi; ΓN, Γ1, ..., Γi; ∆N, ∆1, ..., ∆i; Ωx → O
    then der^{Γ;Π} ∆, Ξi+1, ..., Ξn; Ξ, Ξ1, ..., Ξi; ΓN, Γ1, ..., Γi−1; ∆N, ∆1, ..., ∆i−1; B, Ωx → O

• Assume that there is a frame f = [Γ′; ∆I; !p; Ω], annotated with m^Γ · → Ωf.
If s1 = m^Γ · → !p ⊗ Ωf ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} ·; f, P; Γ; ∆I; Ω (well-formed in relation to T) and s1 ↦* s2, then the same four conclusions hold: the n values Vi, the derived state s2, the n soundness proofs m^Γ Ξi → A, and the n derivation implications for HLD as stated above.

Proof. By mutual induction, first on either the size of ∆b′ (the second argument of the linear continuation frame) or Γ′ (the second argument of the persistent frame in P) and then on the size of C, P. We only show how to prove the first implication, since the second implication is proven in a similar way.

m^Γ p1 → Ωf ⊗ p ⇒^{ΓN;∆N}_{ΩI;∆,Ξ1,...,Ξn;Ξ;agg;Σ} f; P; Γ; ∆a, ∆b′; Ω ↦* s2   (1) assumption

By applying Lemma 4.4.2 to (1), we get:

• Failure:
s2 = ⇓^{ΓN;∆N}_Ξ Γ; ∆I; (λx.C{Ψ(v̂)/v̂}x)Σ, ΩN   (2) from the lemma, thus n = 0.

• Success with no backtracking to frames of stack C or P:
s2 = m^Γ p1, Ξ1′ → Ωf′ ⊗ p ⊗ split(Ω) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C′, f; P; Γ; ∆, Ξ2, ..., Ξn; ·   (2) from the lemma
Ξ1 = Ξ1′, p1   (3) by definition
A ≡Ψ Ωf′ ⊗ p ⊗ split(Ω)   (4) by well-formedness
m^Γ Ξ1 → A   (5) from the match equivalence theorem and split equivalence on (4)
f = (∆a, p1; ∆b′; p; Ω), annotated with m^Γ · → Ωf′   (6) definition

Now, we apply Corollary 4.4.1 on (2):

f′ = (∆a, p1 − Ξ1; ∆b′ − Ξ1; p; Ω), annotated with m^Γ · → Ωf′   (7) from the Corollary
⇓^{ΓN;∆N}_{ΩI;∆,Ξ2,...,Ξn;Ξ,Ξ1;agg;V1::Σ} f′; P′; Γ; B{Ψ(x̂), V/x̂, σ}   (8) from the Corollary
⇓^{ΓN,Γ1;∆N,∆1}_{ΩI;∆,Ξ2,...,Ξn;Ξ,Ξ1;agg;V1::Σ} f′; P′; Γ; ·   (9) applying Theorem 4.4.3 on (8)
If der^{Γ;Π} ∆, Ξ2, ..., Ξn; Ξ, Ξ1; ΓN, Γ1; ∆N, ∆1; Ωx → O then der^{Γ;Π} ∆, Ξ2, ..., Ξn; Ξ, Ξ1; ΓN; ∆N; B{Ψ(x̂), V/x̂, σ}, Ωx → O   (10) from Theorem 4.4.3 on (8)
⊳^{ΓN;∆N}_{ΩI;∆,Ξ2,...,Ξn;Ξ;agg;V1::Σ} f′; P′; Γ   (11) next state of (9)

By executing the next transition on (11) we either fail, because there are no more candidates or no more frames, and thus n = 1, or we have a new match from which we can apply the inductive hypothesis (smaller number of candidates and/or frames) to get the remaining n − 1 aggregate values.

• Success with backtracking to the linear continuation frame of stack C:
s2 = m^Γ ∆f′, p2, Ξ1′ → Ωf′ ⊗ p ⊗ split(Ω) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;Σ} C′, f′; P; Γ; ∆, Ξ2, ..., Ξn; ·   (2) from the lemma
Ξ1 = ∆f′, p2, Ξ1′   (3) by definition
A ≡Ψ Ωf′ ⊗ p ⊗ split(Ω)   (4) by well-formedness
m^Γ Ξ1 → A   (5) from the match equivalence theorem and split equivalence on (4)
f = (∆a, p1; ∆b‴, p2, ∆b″; p; Ω), annotated with m^Γ · → Ωf′   (6) frame to backtrack to
f turns into f′ = (∆a, p1, ∆b‴, p2; ∆b″; p; Ω), annotated with m^Γ · → Ωf′   (7) resulting frame
Use the same approach as the case with no backtracking.

• Success with backtracking to a persistent continuation frame of stack P: Use the same approach as before.

This last theorem proves that, from a certain initial continuation stack, we are able to apply the aggregate multiple times (until the stack is exhausted). The results of the theorem allow us to rebuild the proof tree in HLD, since we get the HLD matching and derivation propositions. What remains to be done is to prove that we can do the same for an empty continuation stack.

Lemma 4.4.3 (All aggregates)
Consider an aggregate agg, where Π(agg) = ∀v̂,Σ′.(R_agg^(v̂,Σ′) ⊸ ((λx.C x)Σ′ & ∀x̂,σ.(A ⊸ B ⊗ R_agg^(v̂,σ::Σ′)))), and a triplet T = A; Γ; ∆I. Assume that there exist n ≥ 0 applications of agg, where the ith application is related to the following information:

• ∆i: context of derived linear facts;
• Γi: context of derived persistent facts;
• Ξi: context of consumed linear facts;
• Vi: a value representing the aggregate application;
• Ψi: context representing new variable bindings for the aggregate.

Since each application consumes Ξi, the initial context is ∆I = ∆, Ξ1, ..., Ξn.
If s1 = m^Γ · → 1 ⇒^{ΓN;∆N}_{ΩI;∆,Ξ1,...,Ξn;Ξ;agg;·} ·; ·; Γ; ∆, Ξ1, ..., Ξn; A (well-formed in relation to T) and s1 ↦* s2 then:

• n values Vi (Σ′ = Vn :: ··· :: V1 :: ·);
• the n aggregate applications are derived:
s2 = ⇓^{ΓN,Γ1,...,Γn;∆N,∆1,...,∆n}_{Ξ,Ξ1,...,Ξn} Γ; ∆; (λx.C{Ψ(v̂)/v̂}x)Σ′, ΩI
• n soundness proofs for the n aggregate matches:
m^Γ Ξ1 → A    ...    m^Γ Ξn → A
• n derivation implications for HLD:
if der^{Γ;Π} ∆, Ξi+1, ..., Ξn; Ξ, Ξ1, ..., Ξi; ΓN, Γ1, ..., Γi; ∆N, ∆1, ..., ∆i; Ωx → O
then der^{Γ;Π} ∆, Ξi+1, ..., Ξn; Ξ, Ξ1, ..., Ξi; ΓN, Γ1, ..., Γi−1; ∆N, ∆1, ..., ∆i−1; B, Ωx → O

Proof. Apply Lemma 4.4.2 to get two sub-cases:

• Match fails:
s2 = ⇓^{ΓN;∆N}_Ξ Γ; ∆I; (λx.C{Ψ(v̂)/v̂}x)Σ, ΩN   (2) from the lemma, thus n = 0.

• Match succeeds:
s2 = m^Γ Ξ1 → 1 ⊗ split(A) ⇒^{ΓN;∆N}_{ΩI;∆I;Ξ;agg;·} C; P; Γ; ∆, Ξ2, ..., Ξn; ·   (2) from the lemma
A ≡Ψ 1 ⊗ split(A)   (3) by well-formedness
m^Γ Ξ1 → A   (4) from the match equivalence theorem and split equivalence on (3)

Now, we apply Corollary 4.4.1 on (2):

⇓^{ΓN;∆N}_{ΩI;∆,Ξ2,...,Ξn;Ξ,Ξ1;agg;V1::·} C′; P′; Γ; B{Ψ(x̂), V/x̂, σ}   (5)
⇓^{ΓN,Γ1;∆N,∆1}_{ΩI;∆,Ξ2,...,Ξn;Ξ,Ξ1;agg;V1::·} C′; P′; Γ; ·   (6) applying Theorem 4.4.3 on (5)
If der^{Γ;Π} ∆, Ξ2, ..., Ξn; Ξ, Ξ1; ΓN, Γ1; ∆N, ∆1; Ωx → O then der^{Γ;Π} ∆, Ξ2, ..., Ξn; Ξ, Ξ1; ΓN; ∆N; B{Ψ(x̂), V/x̂, σ}, Ωx → O   (7) from Theorem 4.4.3 on (5)
⊳^{ΓN;∆N}_{ΩI;∆,Ξ2,...,Ξn;Ξ;agg;V1::·} C′; P′; Γ   (8) next state of (6)

When executing the next transition of state (8) we either get n = 1 application of the aggregate or we apply Theorem 4.4.4 to get the remaining n − 1 applications.

4.4.4 Soundness Of Derivation

We are finally ready to prove that the derivation of the rule's RHS is sound in relation to HLD.

Lemma 4.4.4 (RHS derivation soundness)
If s1 = ⇓^{ΓN;∆N}_Ξ Γ; ∆I; Ω then s1 ↦* Ξ, Ξ*; ΓN, Γ*; ∆N, ∆* and der^{Γ;Π} ∆I; Ξ; ΓN; ∆N; Ω → Ξ, Ξ*; ΓN, Γ*; ∆N, ∆*.

Proof. Induction on Ω. Most of the sub-cases can be proven using the induction hypothesis or by straightforward rule inference. The sub-case for aggregates is more complicated and is proved below.

Aggregates. Apply Lemma 4.4.3 on the assumption to get n applications of the aggregate. Assume that ∆I = ∆, Ξ1, ..., Ξn, where Ξi are the facts consumed and Γi, ∆i the facts derived by the ith application. The lemma proves the following:

• n values Σ = Vn :: ··· :: V1 :: ·
• the n applications are derived (final state):
⇓^{ΓN,Γ1,...,Γn;∆N,∆1,...,∆n}_{Ξ,Ξ1,...,Ξn} Γ; ∆; (λx.C{Ψ(v̂)/v̂}x)Σ, ΩN   (1)
• n propositions m^Γ Ξi → A   (2)
• n implications:
if der^{Γ;Π} ∆, Ξi+1, ..., Ξn; Ξ, Ξ1, ..., Ξi; ΓN, Γ1, ..., Γi; ∆N, ∆1, ..., ∆i; Ωx → O
then der^{Γ;Π} ∆, Ξi+1, ..., Ξn; Ξ, Ξ1, ..., Ξi; ΓN, Γ1, ..., Γi−1; ∆N, ∆1, ..., ∆i−1; B, Ωx → O   (3)
• n contexts Ψ1, ..., Ψn for variable bindings   (4)

From (1) we apply the inductive hypothesis, since C is smaller than the original aggregate:

der^{Γ;Π} ∆; Ξ, Ξ1, ..., Ξn; ΓN, Γ1, ..., Γn; ∆N, ∆1, ..., ∆n; (λx.C{Ψ(v̂)/v̂}x)Σ, Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*

Since we are building the proof tree backwards, starting from the final derivation result, we first need to derive R_agg^(Φ(v̂),Σ) by applying rule der agg2:

der^{Γ;Π} ∆; Ξ, Ξ1, ..., Ξn; ΓN, Γ1, ..., Γn; ∆N, ∆1, ..., ∆n; R_agg^(Φ(v̂),Σ), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*   (5)

If n = 0 then this is all we need, otherwise we need to rebuild the matching and derivation tree of the nth aggregate. Using the nth implication (3) on (5):

der^{Γ;Π} ∆, Ξn; Ξ, Ξ1, ..., Ξn−1; ΓN, Γ1, ..., Γn−1; ∆N, ∆1, ..., ∆n−1; B, R_agg^(Φ(v̂),Σ), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*   (6)

Using der ⊸ and the matching proposition (2) on (6), the A ⊸ B implication is reconstructed:

der^{Γ;Π} ∆, Ξn; Ξ, Ξ1, ..., Ξn−1; ΓN, Γ1, ..., Γn−1; ∆N, ∆1, ..., ∆n−1; A ⊸ B, R_agg^(Φ(v̂),Σ), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*

Next, we package the implication and the aggregate using der ⊗:

der^{Γ;Π} ∆, Ξn; Ξ, Ξ1, ..., Ξn−1; ΓN, Γ1, ..., Γn−1; ∆N, ∆1, ..., ∆n−1; (A ⊸ B) ⊗ R_agg^(Φ(v̂),Σ), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*

Now, we apply der ∀ to include the whole term and deconstruct Σ into σ :: Vn−1 :: ··· :: V1 :: ·, since Vn is the σ variable. The x̂ terms included in the aggregate's LHS are replaced using information constructed in Ψn:

der^{Γ;Π} ∆, Ξn; Ξ, Ξ1, ..., Ξn−1; ΓN, Γ1, ..., Γn−1; ∆N, ∆1, ..., ∆n−1; ∀x̂,σ.((A′ ⊸ B′) ⊗ R_agg^(Ψ(v̂),σ::Vn−1::···::V1::·)), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*

This last expression can be folded into R_agg^(Ψ(v̂),Vn−1::···::V1::·):

der^{Γ;Π} ∆, Ξn; Ξ, Ξ1, ..., Ξn−1; ΓN, Γ1, ..., Γn−1; ∆N, ∆1, ..., ∆n−1; R_agg^(Ψ(v̂),Vn−1::···::V1::·), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*

The last 5 steps are then applied n − 1 times to get:

der^{Γ;Π} ∆, Ξ1, ..., Ξn; Ξ; ΓN; ∆N; R_agg^(Ψ(v̂),·), Ω → Ξ, Ξ1, ..., Ξn, Ξ*; ΓN, Γ1, ..., Γn, Γ*; ∆N, ∆1, ..., ∆n, ∆*

This completes the sub-case for aggregates.

4.4.5 Wrapping-up

In order to bring everything together, we need to use the RHS derivation soundness lemma (Lemma 4.4.4) and the LHS match result lemma (Lemma 4.4.1). We first prove that if the LLD machine is able to reach the final state, then there exists one rule where matching was successful. Then, we prove that the application of such a rule is sound in relation to HLD.

Theorem 4.4.5 (Soundness)
If s1 = infer ∆I; Φ; Γ then either s1 ↦* Ξ; ΓN; ∆N and ∃R∈Φ. app^{Γ;Π}_Ψ ∆I; Π → Ξ; ΓN; ∆N, or s1 ↦* next Γ; ∆I.

Proof. If Φ = · then the second conclusion applies immediately; otherwise use induction on the size of Φ. Assume Φ = A ⊸ B, Φ′ and R = (∆I; Φ′).

infer ∆I; Φ; Γ   (1) first state of the assumption

Applying Lemma 4.4.1 (LHS match result) to the state after two transitions of (1) leads to two sub-cases:

• Match fails:

⊳_{A⊸B} R; ·; Γ   (2)
infer ∆I; Φ′; Γ   (3) state after (2)
∃R′∈Φ′. app^{Γ;Π}_Ψ ∆I; Π → Ξ; ΓN; ∆N   (4) i.h. on (3), where R′ ∈ Φ′ if Φ′ is not empty
next Γ; ∆I   (5) if Φ′ = ·

• Match succeeds:

m^Γ ∆ → split(A)Ψ ⇒_{A⊸B} R; C; Γ; ∆; ·   (1) where ∆I = ∆′, ∆
split(A) ≡Ψ A   (2) well-formedness of state (1)
m^Γ ∆ → AΨ   (3) match equivalence for (2) and (1)
⇓^{·;·}_{∆′} Γ; ∆; B   (4) state after (1)
⇓^{·;·}_{∆′} Γ; ∆; B ↦* Ξ; ΓN; ∆N   (5) applying Lemma 4.4.4 to (4)
der^{Γ;Π} ∆; Ξ; ΓN; ∆N; B → ∆′, Ξ*; ΓN; ∆N   (6) from the same Lemma
app^{Γ;Π}_Ψ ∆, ∆′; Π → ∆′, Ξ*; ΓN; ∆N   (7) using the app rule on (6) and (3)

4.5 Related Work

Linear logic has been used in the past as a basis for logic-based programming languages [Mil85], including bottom-up and top-down programming languages. Lolli, a programming language presented in [HM94], is based on a fragment of intuitionistic linear logic and proves goals by lazily managing the context of linear resources during top-down proof search. LolliMon [LPPW05a] is a concurrent linear logic programming language that integrates both bottom-up and top-down search, where top-down search is done sequentially but bottom-up computations, which are encapsulated inside a monad, can be performed concurrently. Programs start by performing top-down search, but this can be suspended in order to perform bottom-up search. This concurrent bottom-up search proceeds until a fix-point is achieved, after which top-down search is resumed. LolliMon is derived from the concurrent logical framework called CLF [WCPW04, CPWW02, WCPW03].

4.6 Chapter Summary

In this chapter we presented the proof theoretic foundations of LM. First, we introduced the sequent calculus of the linear logic fragment that supports LM. We then presented HLD, the high level dynamic semantics that was created by interpreting the linear logic fragment using focusing and chaining. Next, we designed LLD, an abstract machine for the operational semantics that mimics the execution of rules in our virtual machine, apart from small details. Finally, we proved that LLD is sound in relation to HLD, thus showing a connection from LLD to the linear logic fragment.

Chapter 5

Local Computation: Data Structures and Compilation

In Chapter 3, we presented different programs that showcased the declarative core of the LM language. In this chapter, we focus on the implementation of LM by describing how local computation is achieved, with a focus on compilation and rule execution. Next, in Chapter 6, we describe the multi threaded implementation that allows LM programs to run on multi core architectures. The coordination aspects of LM, which we introduce later in this dissertation along with the implementation details, are discussed in depth in Chapters 7 and 8. This chapter is organized as follows. We start with a brief overview of LM's implementation, including its compiler and virtual machine. Next, we present the node data structure, including a detailed description of the data structures used to efficiently manipulate logical facts. We also describe how rules are scheduled locally from newly derived facts stored in the database. Finally, we give a description of the compilation algorithm that turns logical rules into efficient compiled code.

5.1 Implementation Overview

The implementation of the LM language includes a compiler and a virtual machine (VM). Figure 5.1 presents an overview of the compilation process for LM programs. The two main boxes represent the two major components of the system, namely, the compiler and virtual machine. The virtual machine contains supporting data structures for managing the database of facts and to schedule the execution of rules. The parallel engine described in Chapter 6 is also a major part of the virtual machine and is responsible for managing multi threaded execution by launching threads, managing communication and scheduling concurrent execution. The compiler transforms LM files into C++ code that uses the virtual machine facilities to implement the program logic. The compiled code, placed in a file named file.cpp, implements the inference rules of the program and uses the API of the virtual machine to derive and retract facts and to schedule the execution of rules. Since LM does not have traditional input/output facilities, the compiler also creates a separate file, named file.data, that contains the program's initial facts and graph structure and is loaded by the runtime when the program is executed.


Figure 5.1: Compilation of a LM program into an executable. The compiler transforms a LM program into a C++ file, file.cpp, with compiled rules, and into a data file, file.data, with the graph structure and initial facts. The virtual machine, which includes code for managing multi threaded execution and the database of facts, is then linked with both files to create a stand alone executable that can be run in parallel.

To complete the compilation process, we use a C++ compiler to compile the virtual machine files and file.cpp into object files that are then linked along with file.data. At the end, we have a stand alone executable that allows the user to set the number of threads to use and the program arguments, and that provides facilities for measuring run time and for printing the database.

Alternatively, the programmer may also decide to compile a more general version of the virtual machine that is able to run byte-code files generated by the compiler. This allows faster development since the programmer only needs to recompile the LM program and not the whole runtime stack. However, LM programs will run slower since the byte-code must be interpreted by the virtual machine. This imposes a significant penalty on programs with many mathematical operations, especially floating point computations.

5.2 Node Data Structure

The main characteristic of LM rules is that they are constrained by the first argument. Rule derivation uses only facts from the same node, therefore there is no need to synchronize with other nodes to derive facts. However, when nodes derive non-local facts (owned by other nodes), then the implementation must synchronize and send the facts to the target node. From the point of view of the receiving node, these are called incoming facts. Note that this is related to the parallel aspects of the virtual machine and more details are given in the next chapter. During the lifetime of a program, each node goes through different states as specified by the state machine. Figure 5.2 presents the state machine with the valid state transitions. In the running state, the node is applying rules. In the inactive state, the node has no new facts to be considered and all candidate rules have been tried. In the active state, the node has new facts to be considered but is waiting to be executed by a thread. Finally, in the stealing state, the node is currently being stolen by some thread.


Figure 5.2: The node state machine.
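The state machine of Figure 5.2 can be pictured as a small enumeration guarded by the node's State Lock (described below). The C++ fragment that follows is a minimal sketch of how a transition into the active state might be guarded; the names NodeState and try_make_active are hypothetical and only illustrate the locking discipline, not the VM's actual code.

#include <mutex>

// A hypothetical encoding of the node states of Figure 5.2.
enum class NodeState { Running, Active, Inactive, Stealing };

struct NodeStateSlot {
    std::mutex state_lock;                 // the node's State Lock
    NodeState state = NodeState::Inactive;

    // Called when new facts arrive for this node. An inactive node
    // becomes active so that some thread will eventually run its rules.
    // Returns true when the caller must also enqueue the node.
    bool try_make_active() {
        std::lock_guard<std::mutex> guard(state_lock);
        if (state == NodeState::Inactive) {
            state = NodeState::Active;
            return true;
        }
        return false; // already running, active or being stolen
    }
};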

As shown in Fig. 5.3, each node contains the following attributes:

• State: the node state flag.
• State Lock: a lock that protects the following node attributes: State, Owner and Incoming Fact Buffer.
• DB Lock: a lock that protects Linear DB, Persistent DB and Rule Engine. The lock is held when the node is applying rules or when incoming facts from other nodes need to be added to the database data structures.
• Rule Engine: a data structure that detects which rules are candidates and should be derived. The data structure is fully explained in Section 5.3.
• Owner: a pointer to the thread responsible for executing this node.
• Incoming Fact Buffer: a data structure that holds incoming facts that could not be added to the database data structures because the node is currently applying rules.
• Linear DB: the database of linear facts, as a set of structures for storing linear facts for each linear predicate.
• Persistent DB: the database of persistent facts.


Figure 5.3: Layout of the node data structure.

The database of facts must be implemented efficiently because during matching of rules we need to restrict the facts using join constraints, which fix arguments of predicates to instantiated values. A database fact is made up of 2 pointers (prev and next) and a variable number of arguments. The first (node) argument of each fact is not stored in memory since facts are already indexed by node. Additionally, facts are indexed by predicate and use one of the following data structures:

• Trie Data Structures are used to store persistent facts. Tries are trees where facts are indexed by common prefix arguments. The prev and next pointers are used to navigate through all the facts stored in the trie.

• Array Data Structures are used to store persistent facts that are used in the LHS of rules that do not require matching (e.g., no join constraints) and exist only as initial facts. Facts stored in this data structure do not have the prev and next pointers because they are already chained by being part of a contiguous memory area. The compiler performs static analysis of the program's rules in order to choose between trie and array data structures for a particular persistent predicate.

• Doubly Linked List Data Structures are used to store linear facts. We use a doubly linked list because adding and removing facts are efficient operations. The prev and next pointers are used to chain the facts of the linked list.

• Hash Tree Data Structures are used to improve lookup when linked lists are too long and when we need to do searches filtered by a fixed argument. The virtual machine decides which arguments are best to index (see Section 5.2.1) and then uses a hash table indexed by the appropriate argument. If we need to go through all the facts, we just iterate through all the facts in the table. Facts are hashed by using the indexed argument and each item in the table corresponds to a value that contains a list of facts with such an argument. When a hash bucket has too many values, the bucket is expanded from a linked list into another hash table, creating a hash tree [Bag01]. Figure 5.4 shows an example of a hash tree data structure for a p(node, int, int) predicate with 5 linear facts indexed by the third argument. Note that for each bucket, facts are indexed by either a list or another hash table, recursively.

Figure 5.4: Hash tree data structure for a p(node, int, int) predicate containing the following facts: p(@1, 0, 20), p(@1, 1, 20), p(@1, 1, 3), p(@1, 2, 13), and p(@1, 2, 43). Facts are indexed by computing arg2 mod 10, where arg2 is the third argument of the fact. The node argument is not stored.
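To make the fact layout described in this section concrete, the sketch below shows one possible C++ rendering of a database fact: the two chaining pointers followed by a variable number of arguments, with the node argument omitted. This is an illustration under stated assumptions (64-bit argument slots and inline allocation), not the VM's exact definition.

#include <cstdint>
#include <cstdlib>

// Hypothetical layout of a stored fact: prev/next pointers for the
// doubly linked list plus the remaining arguments stored inline.
// The first (node) argument is never stored, since facts are already
// indexed by node.
struct Fact {
    Fact* prev;
    Fact* next;
    uint64_t args[1]; // arguments 1..n, allocated inline (assumes n >= 1)

    static Fact* create(size_t num_args) {
        Fact* f = static_cast<Fact*>(
            std::malloc(sizeof(Fact) + (num_args - 1) * sizeof(uint64_t)));
        f->prev = f->next = nullptr;
        return f;
    }
};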

5.2.1 Indexing Engine

During rule derivation, facts need to be searched and filtered to match the rule constraints. One of the most common constraints is to ensure that some arguments of facts are equal to an arbitrary value. In order to avoid iterating over all the facts in such cases, the VM employs a dynamic mechanism that decides, heuristically, which argument may be optimal to index. When a predicate is indexed, it uses a hash tree data structure instead of a doubly linked list for faster lookup. The indexing algorithm is executed along with normal computation and empirically tries to assess, for each predicate, the argument that most equally spreads the database across the values of the argument. The indexing algorithm is performed in three main steps and executes only in one thread of the VM, along with normal computation. First, the algorithm gathers lookup statistics by keeping a counter for each predicate's argument. Every time a fact search is performed where arguments are fixed to a value, the counter of such arguments is incremented. This phase is performed during rule execution for the first 100 nodes (or 1% of the nodes of the graph, when there are more than 100000 nodes) that are executed by the thread running the indexing algorithm.

The second step of the algorithm selects the candidate arguments of each predicate. If a search for a predicate required no fixed arguments, then the predicate will not be indexed. If only one argument was fixed, then that argument immediately becomes the indexing argument. Otherwise, the top 2 arguments are selected for the third phase, where entropy statistics are collected dynamically. During the third phase, each candidate argument has an entropy score. Before a node transitions to the running state, the facts of the predicate are used in the following formula, applied to the two arguments:

Entropy(F, A) = − Σ_{v ∈ values(F,A)} ( count(F, A = v) / total(F) ) · log₂ ( count(F, A = v) / total(F) )

where A is the target argument, F is the set of linear facts for the target predicate, values(F, A) is the set of values of the argument A, count(F, A = v) counts the number of linear facts where argument A is equal to v, and total(F) counts the number of linear facts in F. The entropy value is a good metric because it tells us how much information is needed to describe an argument. If more information is needed, then that must be the best argument to index. For each argument, the Entropy(F, A) value is then multiplied by the number of times it has been used for lookup. The argument with the best score is selected and then a global variable called indexing epoch is updated to completed. When nodes are executed and the global indexing epoch variable has changed to completed, the node data structures are rebuilt by transforming doubly linked lists into hash tree data structures. Note that the transformation is only done if the linked list has a large number of facts (at least eight). The VM dynamically augments hash trees if necessary. When a hash tree bucket has too many elements, a child hash tree node is created. On the other hand, if the number of elements in the hash node gets too small, then the hash node is removed and replaced with a doubly linked list. We have seen good results with this scheme. The overhead of dynamic indexing is negligible since the whole algorithm is performed only once. The programmer can also add indexes statically, if needed, using the directive index pred/arg, where pred is the predicate name and arg is the argument number to index.
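As a self-contained illustration of the entropy formula above, the following C++ sketch computes the entropy of one candidate argument from its observed values and multiplies it by the lookup count to obtain the final score. The function names are hypothetical; the real VM gathers these statistics incrementally during rule execution.

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Entropy(F, A): how evenly argument A spreads the facts F of a predicate.
double entropy(const std::vector<int64_t>& values_of_A) {
    std::unordered_map<int64_t, size_t> count; // count(F, A = v)
    for (int64_t v : values_of_A)
        ++count[v];
    const double total = static_cast<double>(values_of_A.size()); // total(F)
    double h = 0.0;
    for (const auto& entry : count) {
        const double p = static_cast<double>(entry.second) / total;
        h -= p * std::log2(p);
    }
    return h;
}

// Final score for a candidate argument: entropy weighted by the number of
// lookups that fixed this argument; the highest-scoring argument is indexed.
double index_score(const std::vector<int64_t>& values_of_A, size_t lookups) {
    return entropy(values_of_A) * static_cast<double>(lookups);
}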

5.3 Rule Engine

The rule engine decides which rules may need to be executed while respecting the rule priority semantics presented in Section 3.3. The rule engine is composed of 3 data structures:

• Rule Queue is a bitmap representing the rules currently scheduled to run. When bit i in the Rule Queue is set, it means that rule i is scheduled to be run. A rule i is set to run when all the facts that satisfy the rule's LHS are available. Rules are executed by fetching the least significant bit of the bitmap, unsetting that bit and then executing the corresponding rule. This operation is accomplished by using the bit scan forward (bsf) assembly instruction available on x86/x86-64 machines (a sketch of this selection loop is given at the end of this section);

a, e(1) -o b.  // Rule 1.
a -o c.        // Rule 2.
b -o d.        // Rule 3.
e(0) -o f.     // Rule 4.
c -o e(1).     // Rule 5.

a. e(0).

                       (a)              (b)              (c)
Rule:              5 4 3 2 1        5 4 3 2 1        5 4 3 2 1
Rule Queue         0 1 0 1 1        0 1 0 1 0        1 1 0 0 0
Rule Counter       0 1 0 1 2        0 1 0 1 2        1 1 0 0 1
Predicate:       f e d c b a      f e d c b a      f e d c b a
Predicate Bitmap 0 1 0 0 0 1      0 1 0 0 0 1      0 1 0 1 0 0

Figure 5.5: Example program and corresponding rule engine data structures. The initial state is represented in (a), where the rules scheduled to run are 1, 2 and 4. After attempting rule 1, bit 0 is unset from the Rule Queue, resulting in (b). Figure (c) is the result of applying rule 2, a -o c, which marks rule 5 in the Rule Queue since the rule is now available in the Rule Counter.

• Rule Counter counts the number of predicates that exist in the database and that are needed by a rule. For instance, the rule a, e(1) -o b needs predicates a and e, which means that when the count in the rule counter goes up to two, the rule is scheduled to run. Likewise, if the counter is decremented to one, the rule is removed from the Rule Queue.

• Predicate Bitmap is a bitmap representing the existence of facts for a given predicate in the database. When a predicate becomes available, the Rule Counter is updated to take into account the existence of new facts.

To better understand how our rule engine works, Fig. 5.5 shows an example program and the corresponding rule engine data structures. When the program starts, the facts for predicates a and e exist, therefore the Rule Counter starts with rules 1, 2 and 4, where the values are 2, 1, and 1, respectively. Since these rules have the required counter to be applied, the Rule Queue bitmap starts with the same three rules (Fig. 5.5(a)). In order to pick rules for execution, we take the rule corresponding to the least significant bit from the Rule Queue bitmap, initially the first rule a, e(1) -o b. However, since we don't have fact e(1), this rule fails and its bit in the Rule Queue must be set to 0. Figure 5.5(b) shows the rule engine data structures at that point. The next rule in the Rule Queue is the second rule a -o c. Because this rule succeeds, fact a is consumed and fact c is derived. We thus update the Predicate Bitmap accordingly, decrease the counters for the first and second rules in the Rule Counter since such rules are no longer applicable (a was consumed), and increase the counter for the fifth rule since c was derived. Finally, to update the Rule Queue, we must schedule the fifth rule since its counter has been increased to the required number (we have all predicates). Figure 5.5(c) shows the rule engine data structures at that point. In the continuation, the rule engine will schedule the fourth and fifth rules to run. Note that every node in the program requires the set of data structures presented in Fig. 5.5. We use 64-bit integers to implement the 2 bitmaps and an array of 16-bit integers for the Rule Counter.

For persistent facts, we do a small optimization to reduce the number of derivations. We divide the program rules into two sets: persistent rules and non-persistent rules. Persistent rules are rules where only persistent facts are involved. We compile such rules incrementally, i.e., we attempt to fire all rules when a new persistent fact is derived. This is called the pipelined semi-naive evaluation and it originated in the P2 system [LCG+06]. This evaluation method avoids excessive re-derivations of the same fact. The order of derivation does not matter for those rules, since only persistent facts are used.
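The Rule Queue manipulation described above is cheap because it reduces to a couple of bit instructions. Below is a minimal, hypothetical C++ sketch of the selection loop for a program with at most 64 rules; __builtin_ctzll is the GCC/Clang intrinsic that compiles down to bsf/tzcnt on x86-64, and execute_rule stands in for the compiled rule code.

#include <cstdint>

void execute_rule(int rule); // placeholder for the compiled code of rule `rule`

// Repeatedly run the scheduled rule with the highest priority
// (lowest bit index), clearing its bit before executing it.
// Executing a rule may set further bits in the queue.
void run_scheduled_rules(uint64_t& rule_queue) {
    while (rule_queue != 0) {
        const int rule = __builtin_ctzll(rule_queue); // index of least significant set bit
        rule_queue &= rule_queue - 1;                 // unset that bit
        execute_rule(rule);
    }
}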

5.4 Compilation

As an intermediate step, our compiler first transforms rules into high level instructions that are then transformed into C++. In Appendix E we present an overview of the high level instructions, which can be used as a reference to the operations that are required by the compiler. In this section, we present the main algorithm of the compiler and its key optimizations. To make our presentation more readable, we present our examples using pseudo-code instead of C++ code.

5.4.1 Ordinary Rules

After a rule is compiled, it must respect the fact constraints (facts must exist in the database) and the join constraints, which can be represented by variable constraints and/or boolean expressions. For instance, consider the second rule of the bipartiteness checking program presented in Fig. 3.10:

visit(A, P), colored(A, P) -o colored(A, P).

The fact constraints include the facts required to trigger the rule, namely visit(A, P) and colored(A, P), and the join constraints include the implicit constraint that the second argument of visit must be equal to the second argument of colored, because the variable P is used for both arguments. However, rules may also have explicit constraints, such as:

visit(A, P1), colored(A, P2), P1 <> P2 -o fail(A).

The data structures presented in Section 5.2 support iteration over facts and, for linear facts, they also support deletion. Iteration is provided through iterators. For linked lists, the iterator points to the linear fact in the list and uses the next field to traverse the list. For hash tables, it is possible to iterate through the whole table or through a single bucket. For tries, while iteration goes through every fact, the operation can be customized with join constraints in order to prune the search. To illustrate how rules are compiled, consider again the rule:

visit(A, P), colored(A, P) -o colored(A, P).

The compiler transforms the rule into two nested while loops, as follows:

colored_list <- linked_list("colored")
visit_list <- linked_list("visit")
visit <- visit_list.head()
while(visit is valid)
{
   colored <- colored_list.head() // Retrieve first element of the list.
   while(colored is valid)
   {
      if(visit.get_int(1) == colored.get_int(1)) { // Equal arguments?
         // New fact is derived.
         new_colored <- create_fact("colored") // New fact for predicate colored.
         new_colored.set_int(1, visit.get_int(1)) // Set arguments.

         colored_list.add(new_colored) // Add new fact to the linked list.

         // Deleting used up facts.
         colored <- colored_list.delete_and_next(colored)
         visit <- visit_list.delete_and_next(visit)
         goto next
      }
      colored <- colored.next()
   }
   visit <- visit.next()
next:
   continue
}

The compilation algorithm iterates through the atomic propositions of the rule's LHS and creates nested loops to try all the possible combinations of facts. For this rule, all the pairs of facts colored and visit must be searched until the implicit constraint is true. First, the visit linked list is iterated over by first retrieving the head fact and then using the next pointer of each fact. Inside this outer loop, a nested while loop is created for predicate colored. This inner loop includes a check for the constraint. If the constraint expression is true, then the rule matches: a new colored fact is derived and the two used linear facts are consumed by deleting them from the linked lists. After the rule is derived, the visit and colored facts are deleted from the linked lists and the pointers adjusted to the next elements of each list. Afterwards, the goto next statement is used to jump to the outer loop, which will use the new fact pointers. This forces the procedure to try all the combinations of the rule from the database. Furthermore, we must jump to the first linear loop because we cannot use the next fact from the deepest loop, since we may have constraints between the first linear loop and the deepest loop that were validated using deleted facts. If the implicit constraint fails, another colored fact is checked by assigning colored to colored.next(). Likewise, if the initial visit fact fails for all colored facts, then the next

visit fact in the list is tried by following the next pointer. Note that, for this particular example, we ignore some obvious optimizations, such as avoiding the deletion and then re-derivation of colored facts. In order to understand how rule priorities affect compilation, consider a slightly different bipartiteness checking program where the second and third rules are swapped:

1  visit(A, P),
2  uncolored(A)
3     -o {B | !edge(A, B) -o visit(B, next(P))},
4        colored(A, P).
5
6  visit(A, P1),
7  colored(A, P2),
8  P1 <> P2
9     -o fail(A).
10
11 visit(A, P),
12 colored(A, P)
13    -o colored(A, P).
14
15 visit(A, P),
16 fail(A)
17    -o fail(A).

The procedure for the rule being compiled cannot be the same, since the derived colored fact is used in the LHS of a higher priority rule, namely, the rule in line 9. Once the colored fact is derived, the procedure must return in order to schedule the higher priority rule, as follows:

colored_list <- linked_list("colored")
visit_list <- linked_list("visit")
colored <- colored_list.head()
while(colored is valid)
{
   visit <- visit_list.head() // Retrieve first element of the list.
   while(visit is valid)
   {
      if(visit.get_int(1) == colored.get_int(1)) { // Equal arguments?
         new_colored <- create_fact("colored") // New fact for predicate colored.
         new_colored.set_int(1, visit.get_int(1)) // Set arguments.

         // New fact is derived.
         colored_list.add(new_colored) // Add new fact to the linked list.

         // Deleting facts.
         colored_list.delete_and_next(colored)
         visit_list.delete_and_next(visit)
         return
      }
      visit <- visit.next()
   }
   colored <- colored.next()
}

This enforces the priority semantics of the language described in Section 3.3. When a rule derives facts that are used as input to a higher priority rule, the initial rule is only fired once, because the newly derived facts may trigger the application of a higher priority rule. Figure 5.6 presents the algorithm for compiling rules into C++ code. First, we split the rule's LHS into atomic propositions and constraints. Fact expressions map directly to iterators, while fact constraints map to if expressions. A possible compilation strategy is to first compile all the atomic propositions and then compile the constraints. However, this may require unneeded database lookups since some constraints may fail early. Therefore, our compiler introduces a constraint as soon as all the variables in the constraint are included in the already compiled atomic propositions. The order in which fact propositions are selected for compilation does not interfere with the correctness of the compiled code, thus our compiler selects the atomic proposition (RemoveBestProposition) that needs the highest number of join constraints, in order to prune the search and avoid undesirable database lookups. If two atomic propositions have the same number of new constraints, then the compiler always picks the persistent proposition, since persistent facts are not deleted. Derivation of new facts belonging to the local node requires only that the new facts are added to the local node data structure. Facts that belong to other nodes are sent using an appropriate runtime API.

5.4.2 Persistence Checking

Not all linear facts need to be deleted. For instance, in the compiled rule above, the fact colored(A, P) is re-derived in the rule's RHS. Our compiler is able to turn linear loops into persistent loops for linear facts that are consumed and then asserted. The rule is then compiled as follows:

colored_list <- linked_list("colored")
visit_list <- linked_list("visit")
colored <- colored_list.head()
while(colored is valid)
{
   visit <- visit_list.head() // Retrieve first element of the list.
   while(visit is valid)
   {
      if(visit.get_int(1) == colored.get_int(1)) { // Equal arguments?
         // Delete visit.
         visit <- visit_list.delete_and_next(visit) // Get next visit fact.
         goto next
      }
      visit <- visit.next()
   next:
      continue
   }
   colored <- colored.next()
}

Data: Rule R1, Rules
Result: Compiled Code
LHSProps <- LHSAtomicPropositionsFromRule(R1);
Constraints <- ConstraintsFromRule(R1);
Code <- CreateFunctionForRule();
FactIterators <- [];
CompiledProps <- [];
while LHSProps not empty do
   Prop <- RemoveBestProposition(LHSProps);
   CompiledProps.push(Prop);
   Iterator <- Code.InsertIteration(Prop);
   FactIterators.push(Iterator);
   // Select constraints that are covered by CompiledProps.
   NextConstraints <- RemoveConstraints(Constraints, CompiledProps);
   Code.InsertConstraints(NextConstraints);
end
RHSProps <- RHSAtomicPropositionsFromRule(R1);
while RHSProps not empty do
   Prop <- RemoveFact(RHSProps);
   Code.InsertDerivation(Prop);
end
for Iterator in FactIterators do
   if IsLinear(Iterator) then
      Code.InsertRemove(Iterator);
   end
end
// Enforce rule priorities.
if FactsDerivedUsedBefore(Rules, R1) then
   Code.InsertReturn();
else
   Code.InsertGoto(FirstLinear(FactIterators));
end
return Code

Figure 5.6: Compiling LM rules into C++ code.

In this new version of the code, only the visit facts are deleted, while the colored facts remain untouched. In the bipartiteness checking program, each node has one colored fact and this compiled code simply filters out the visit facts with the same color. Please note that the colored facts are now iterated in the outer loop in order to make the goto statement jump to the inner loop. This is now possible because the colored fact is not deleted during rule derivation.

5.4.3 Updating Facts

Many inference rules consume and then derive the same predicate but with different arguments. The compiler recognizes those cases and, instead of consuming the fact from its linked list or hash table, it updates the fact in-place. As an example, consider the following rule:

new-neighbor-pagerank(A, B, New),
neighbor-pagerank(A, B, Old)
   -o neighbor-pagerank(A, B, New).

Assuming that neighbor-pagerank is stored in a hash table and indexed by the second argument, the code for the rule above is as follows:

new_neighbor_pagerank_list <- linked_list("new-neighbor-pagerank")
neighbor_pagerank_table <- hash_table("neighbor-pagerank")
new_neighbor_pagerank <- new_neighbor_pagerank_list.head()
while(new_neighbor_pagerank is valid)
{
   // The hash table for neighbor-pagerank is indexed by the second argument,
   // therefore we search for the bucket using the second argument
   // of new-neighbor-pagerank.
   neighbor_pagerank <- neighbor_pagerank_table.lookup(new_neighbor_pagerank.get_node(1))
   while(neighbor_pagerank is valid)
   {
      if(new_neighbor_pagerank.get_node(1) == neighbor_pagerank.get_node(1))
      {
         // Update fact argument.
         neighbor_pagerank.set_float(2, new_neighbor_pagerank.get_float(2))
         new_neighbor_pagerank <- new_neighbor_pagerank_list.delete_and_next(new_neighbor_pagerank)
         goto next
      }
      neighbor_pagerank <- neighbor_pagerank.next()
   }
   new_neighbor_pagerank <- new_neighbor_pagerank.next()
next:
   continue
}

Note that the neighbor-pagerank fact is updated using set_float. The rule also does not return, since it is the highest priority rule. If there were a higher priority rule using neighbor-pagerank, then the code would have to return, since the act of updating a fact is equivalent to deriving a new one.

5.4.4 Enforcing Linearity

We have already introduced the goto statement as a mechanism to avoid reusing deleted linear facts. However, this is not enough in order to enforce the linearity of facts. Consider the following inference rule:

add(A, N1), add(A, N2) -o add(A, N1 + N2).

Using the standard compilation algorithm, two nested loops are created, one for each add fact. However, notice that there is an implicit constraint (the two add linear facts must be different) when creating the iterator for add(A, N2), since this fact cannot be the same as the first one. That would invalidate linearity, since a single linear fact would be used to prove two linear facts. This is easily solved by adding a constraint to the inner loop that checks whether the second fact is the same as the first one.

add_list <- linked_list("add")
add1 <- add_list.head()
while(add1 is valid)
{
   add2 <- add_list.head()
   while(add2 is valid)
   {
      if(add1 != add2)
      {
         add1.set_int(1, add1.get_int(1) + add2.get_int(1))
         add2 <- add_list.delete_and_next(add2)
         goto next
      }
      add2 <- add2.next()
   }
   add1 <- add1.next()
next:
   continue
}

Figure 5.7 presents the steps for executing this rule when the database contains three facts. The fact variables never point to the same fact.

5.4.5 Comprehensions

Consider the comprehension used in the first rule of the bipartiteness checking program in Fig. 3.10:

visit(A, P),
uncolored(A)
   -o {B | !edge(A, B) -o visit(B, next(P))},
      colored(A, P).

The attentive reader will remember that comprehensions are sub-rules, and therefore they should be compiled like normal rules. Comprehensions must also derive all possible combinations. However, the rule itself must return if the comprehension's RHS derives a fact that is used by a higher priority rule. The example rule does not need to return, since it has the highest priority and the visit facts derived in the comprehension are contained in other nodes. The code for the rule is shown below:


Figure 5.7: Executing the add rule. First, the two iterators point to the first and second facts and the former is updated while the latter is consumed. The second iterator then moves to the next fact and the first fact is updated again, now to the value 6, the expected result.

colored_list <- linked_list("colored")
visit_list <- linked_list("visit")
uncolored_list <- linked_list("uncolored")
visit <- visit_list.head()
while(visit is valid)
{
   uncolored <- uncolored_list.head()
   while(uncolored is valid)
   {
      // Comprehension code.
      edge_trie <- trie("edge")
      edge <- edge_trie.first()
      while(edge is valid)
      {
         new_visit <- create_fact("visit") // New visit fact.
         new_visit.set_int(1, next(visit.get_int(1)))
         // Send fact to B.
         send_fact(new_visit, edge.get_node(1))
         edge <- edge.next()
      }
      new_colored <- create_fact("colored")
      new_colored.set_int(1, visit.get_int(1))
      colored_list.add(new_colored)
      visit <- visit_list.delete_and_next(visit)
      uncolored <- uncolored_list.delete_and_next(uncolored)
      goto next
   }
   uncolored <- uncolored.next()
next:
   continue
}

Special care must be taken when the comprehension's sub-rule uses the same predicates that are derived by the main rule. Rule inference must be atomic in the sense that, after a rule matches, the comprehensions in the rule's RHS can use the facts that were present before the rule was matched. Consider a rule with n comprehensions or aggregates, where each CLHSi and CRHSi are the LHS and RHS of the comprehension/aggregate i, respectively, and RHSP represents the atomic propositions found in the rule's RHS. The formula used by the compiler to detect conflicts between predicates is the following:

⋃ᵢⁿ [CLHSᵢ ∩ RHSP] ∪ ⋃ᵢⁿ [CLHSᵢ ∩ ⋃ⱼⁿ CRHSⱼ]

If the result of the formula is not empty, then the compiler disables optimizations for the conflicting predicates and derives the corresponding facts into a temporary data structure before adding them to the database data structures. As an example, consider the following rule:

update(A), !edge(A, B)
   -o {P | points(A, B, P) -o update(B)},
      points(A, B, 1).

5.4.6 Aggregates Aggregates are similar to comprehensions. They are also sub-rules but a value is accumulated for each combination of the sub-rule. After all the combinations are derived, a final RHS term is derived with the accumulated value. Consider the following rule that computes a PageRank value by aggregating all neighbors PageRank values: update(A), pagerank(A, OldRank) -o [sum => V; B | neighbor-pagerank(A, B, V) -o neighbor-pagerank(A, B, V) -> pagerank(A, damping/P + (1.0 - damping) * V)]. The variable V is initialized to 0.0 and sums all the PageRank values of the neighbors as seen in the code below. The aggregate value is then used to update the second argument of the initial pagerank fact.

pagerank_list <- linked_list("pagerank")
update_list <- linked_list("update")
neighbor_pagerank_list <- linked_list("neighbor-pagerank")
pagerank <- pagerank_list.head()
while(pagerank is valid)
{
   update <- update_list.head()
   while(update is valid)
   {
      V <- 0.0
      neighbor_pagerank <- neighbor_pagerank_list.head()
      while(neighbor_pagerank is valid)
      {
         V <- V + neighbor_pagerank.get_float(2)
         neighbor_pagerank <- neighbor_pagerank.next()
      }
      // RHS of the aggregate.
      pagerank.set_float(1, damping / P + (1.0 - damping) * V)
      update <- update_list.delete_and_next(update)
      goto next
   }
   pagerank <- pagerank.next()
next:
   continue
}

5.5 Chapter Summary

In this chapter, we gave an overview of LM’s implementation, including the compiler and the virtual machine. We focused on the sequential aspects of the implementation, namely, the data structures used for manipulating facts and how the compiler turns logical rules into efficient code. In the next chapter, we focus on the parallel aspects of the virtual machine and present an evaluation of the performance and scalability of the implementation.

Chapter 6

Multi Core Implementation

This chapter describes the multi core aspects of LM's virtual machine and provides a detailed experimental evaluation of the performance and scalability of the system. First, we describe how local node computation and parallelism are integrated, with a focus on locking and memory allocation, and then we evaluate the performance, memory usage and scalability of our implementation. The mechanisms related to coordination and their implementation will be described in Chapter 7.

6.1 Parallelism

A key goal of our parallel design is to keep the threads as busy as possible and to reduce the overhead of inter-thread communication. Initially, the VM partitions the application graph of N nodes into T subgraphs (where T is the number of threads) and then each thread works on its own subgraph. During execution, threads can steal nodes from other threads to keep themselves busy. The load balancing aspect of the system is performed by our work scheduler, which is based on a simple work stealing algorithm. Since LM uses an asynchronous and graph-based model of computation where nodes of the graph can compute independently of other nodes, the granularity problem, common in other declarative languages, can be solved by dynamically grouping nodes of the graph into subgraphs that are processed by a single executing thread. This avoids the creation of too many threads of control and allows the dynamic assignment of nodes to threads through the use of work stealing.

In our VM, each thread uses a Work Queue that contains active nodes, i.e., nodes that have new facts to process and are part of the thread's subgraph. The work queue is implemented as a singly linked list and is embedded in the node data structures (a minimal sketch of such a queue is given below). Initially, the work queue is filled with the nodes in the thread's subgraph in order to derive the initial facts. A thread may go through different states during its lifetime. The state is tracked in the State flag, which may have one of the following values:

• active: The thread has a non-empty Work Queue or is currently executing rules for a node (which is not in the Work Queue).
• stealing: The thread has no active nodes in the Work Queue and is attempting to steal nodes from other threads.
• idle: The thread is trying to synchronize with other threads to terminate the program.
• terminated: The thread (and all the other threads) have terminated and the program is finished.

Figure 6.1 presents the valid transitions for the thread state flag. The dashed line from idle to stealing indicates that the transition is made in a non-deterministic fashion.

[Figure: state machine over the four thread states ACTIVE, STEALING, IDLE and TERMINATED, with transitions labeled "Steal nodes", "Success", "Failure", "New active nodes", "No more work" and "Program finished".]

Figure 6.1: The thread state machine as represented by the State flag. During the lifetime of a program, each thread goes through different states as specified by the state machine.

The pseudo-code for the main thread loop is shown in Fig. 6.2. In each iteration, a thread inspects its Work Queue and, while there are active nodes, procedure process_node() (complete pseudo-code in Fig. 6.4) performs local computation on them. When a thread's Work Queue is empty, it attempts to steal half of the nodes from another thread. Starting from a random thread, it cycles through all the threads to find one active thread from which it will try to steal half of its nodes. If the thread fails to steal work, it goes idle and periodically attempts to steal work from another active thread. Eventually, all threads will fail to steal work, since there is no more work to do, and they will go idle. A global atomic counter is used to detect termination: once a thread goes idle, it decrements the global counter and changes its flag to idle. Since every idle thread busy-waits on the global counter, all threads eventually observe a zero value, stop executing, and transition to the terminated state.

Figure 6.3 ties everything together and presents the layout of our virtual machine for a program with six nodes and two running threads. In the figure, we represent each thread's Work Queue and the thread's logical space, where the thread's subgraph is located. We also show the internals of node @1 and the thread operations that force threads to interact with the node data structures.

In order to understand how threads interact with each other, we now review the node data structure that was presented in Section 5.2. The node lock DB Lock protects the data structures of the database, including the array, trie, linked list and hash table data structures, as well as the Rule Engine. The State Lock protects everything else, especially the State flag and the temporary set of facts represented by the Incoming Fact Buffer. The Incoming Fact Buffer holds logical facts that cannot be added immediately to the database data structures.

Data: Thread TH
while true do
    TH.work_queue.lock();
    node <- TH.work_queue.pop_node();
    TH.work_queue.unlock();
    if node then
        process_node(node);
    else
        // Attempt to steal some nodes.
        if !TH.steal_nodes() then
            TH.become_idle();
            while len(TH.work_queue) == 0 do
                // Try to terminate.
                if TH.synchronize_termination() then
                    terminate;
                end
                if TH.steal_nodes() then
                    // Thread is still in the stealing state.
                    break;
                end
            end
            // There are new nodes in the queue.
            TH.become_active();
        end
    end
end

Figure 6.2: Thread work loop: threads process active nodes from the work queue until no more active nodes are available. Node stealing using a steal half strategy is employed when the thread has no more active nodes.
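The loop above depends on two helpers, TH.steal_nodes() and TH.synchronize_termination(). The C++ sketch below shows one plausible realization of both, under the assumptions stated in the comments; the Thread type, the counter name and the thread count are illustrative stand-ins, not the VM's actual interface.

#include <atomic>
#include <cstdlib>

// Hedged sketch of the two helpers used by the work loop of Fig. 6.2.
constexpr int NUM_THREADS = 32;                  // assumed thread count
static std::atomic<int> active_threads{NUM_THREADS};

struct Thread {
    bool active = false;                   // stands in for the State flag
    bool steal_half_from(Thread& victim);  // locking protocol of Fig. 6.6

    // Starting from a random thread, cycle through all threads and try to
    // steal half of the nodes of the first active victim found.
    bool steal_nodes(Thread* threads, int n) {
        const int start = std::rand() % n;
        for (int i = 0; i < n; ++i) {
            Thread& victim = threads[(start + i) % n];
            if (&victim == this || !victim.active)
                continue;
            if (steal_half_from(victim))
                return true;               // got nodes: stay active
        }
        return false;                      // no active victim found
    }

    // Termination detection with a global atomic counter: a thread that
    // goes idle leaves the active set; once the counter reaches zero,
    // every busy-waiting idle thread observes it and terminates.
    void become_idle()   { active_threads.fetch_sub(1); }
    void become_active() { active_threads.fetch_add(1); }
    bool synchronize_termination() const {
        return active_threads.load() == 0;
    }
};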

[Figure: two threads, each with its Work Queue and subgraph of nodes; the internals of node @1 show the State Lock, the State and Owner fields, the Incoming Fact Buffer, the DB Lock, the Rule Engine, and the linear and persistent databases (array, trie, and doubly linked list / hash table); arrows mark the operations "Activate node", "Steal node", "New fact (1)" and "New fact (2)".]

Figure 6.3: Layout of the virtual machine. Each thread has a work queue that contains active nodes (nodes with facts to process) that are processed one by one by the thread. Communication between threads happens when nodes send facts to nodes located in other threads.

The Owner field points to the thread responsible for processing the node, and the Rule Engine schedules local computation.

Whenever a new fact is derived through rule derivation, we need to update the data structures of the corresponding node. If the node is currently being executed by the deriving thread (a local send), then the fact is added to the node data structures directly, since the DB Lock is already held while the node is being executed. If that is not the case, then we have to synchronize, since multiple threads might be updating the same data structures. For example, in Fig. 6.3, when thread 2 derives a fact for node @1 (owned by thread 1), it first locks node @1 using the State Lock and then attempts to lock the DB Lock, which gives thread 2 full access to the node. In this case, thread 2 adds the new fact to the database (New fact (2) in Fig. 6.3) and to the Rule Engine. However, if the DB Lock could not be acquired because node @1 is currently being processed, then the new fact is added to the Incoming Fact Buffer instead (New fact (1) in Fig. 6.3). The facts stored in the buffer are then processed whenever the corresponding node is processed again. When a thread interacts with another thread to send a fact, it also needs to make sure that the target node is made active (see Fig. 5.2) and that it is placed in the target thread's Work Queue (Activate node in Fig. 6.3).

Data: Node N
start:
    N.state_lock.lock();
    N.db_lock.lock();
    // Add facts from the Incoming Fact Buffer into the database.
    N.DB.merge(N.incoming_fact_buffer);
    N.state_lock.unlock();
    N.rule_engine.run_rules();
    N.db_lock.unlock();
    // Check if node N is done for now.
    N.state_lock.lock();
    if N.rule_engine.has_candidate_rules() or N.incoming_fact_buffer.has_facts() then
        N.state_lock.unlock();
        // Start from the beginning.
        goto start;
    end
    N.state <- inactive;
    N.state_lock.unlock();

Figure 6.4: Pseudo-code for the process_node() procedure.

To handle concurrency issues, each thread's Work Queue is protected by a lock, called the Queue Lock, that is held while the Work Queue is being operated on. As an example, consider again the situation in which thread 2 sends a new fact to node @1. If node @1 is not active, then thread 2 also needs to activate node @1 by pushing it to the Work Queue of thread 1. After this synchronization point, the target thread is guaranteed to be active and to have a new node to process.

We now summarize the synchronization hot-spots of the VM by describing how the locks are used to implement the different operations:

• Deriving New Facts: We acquire the node's State Lock and then attempt to lock the DB Lock. If the DB Lock cannot be acquired, then the new facts are added to the Incoming Fact Buffer; otherwise, the node database structures are updated with the new facts. If the target node is not currently in any work queue, we lock the destination work queue, add the node, and change the state of the node to active. Finally, if the thread that owns the target node is idle, we activate it by updating its state flag to active. The complete pseudo-code for this operation is shown in Fig. 6.5.

• Node Stealing: For node stealing, we acquire the lock of the target thread's queue and then copy the stolen node pointers to a temporary buffer. For each node, we use the State Lock to update its Owner attribute and then add it to the stealing thread's work queue. Figure 6.6 presents the pseudo-code for this synchronization mechanism. Note that when using pop_half(stealing), the Work Queue will (unsafely) change the state of the nodes to stealing. When adding coordination to the language, as presented in Chapter 7, the node stealing loop in the pseudo-code must include an extra check for the cases where the node's owner is updated using coordination facts.

Data: Target Node N, Fact F
if N.db_lock.try_lock() then
    // New fact (2) in Fig. 6.3: add the fact directly to the database.
    N.DB.add_fact(F);
    N.rule_engine.new_fact(F);
    N.db_lock.unlock();
    // Still need to check if node is active.
    N.state_lock.lock();
else
    // New fact (1) in Fig. 6.3: buffer the fact.
    N.state_lock.lock();
    N.incoming_fact_buffer.add_fact(F);
end
// Activate node.
TTH <- N.owner;
if N.state == inactive then
    TTH.work_queue.lock();
    TTH.work_queue.push(N);
    TTH.work_queue.unlock();
    N.state <- active;
    if TTH.state == idle then
        TTH.become_active();
    end
end
N.state_lock.unlock();

Figure 6.5: Synchronization code for sending a fact to another node.

Data: Source Thread TH, Target Thread TTH
TTH.work_queue.lock();
nodes <- TTH.work_queue.pop_half(stealing);
TTH.work_queue.unlock();
for node in nodes do
    node.state_lock.lock();
    if node.state != stealing then
        // Node was put back into the queue, therefore we give up.
        node.state_lock.unlock();
        continue;
    end
    node.owner <- TH;
    node.state <- active;
    node.state_lock.unlock();
end
TH.work_queue.push(nodes);

Figure 6.6: Synchronization code for node stealing (source thread steals nodes from target thread).

• Node Computation: The DB Lock is acquired before any rule is executed on the node, since node computation manipulates the database. The lock is released when all candidate rules have been executed. The initial section of the process_node() procedure in Fig. 6.4 shows this synchronization protocol.

• Node Completion: Once node computation is completed and all candidate rules have been executed, the State Lock is acquired in order to change the state flag. Note that if newer facts have been derived by other nodes in the meantime, computation is resumed on the current node instead of picking another node from the Work Queue. The final section of process_node() in Fig. 6.4 shows this synchronization protocol.

All the locks in the VM are implemented using Mellor-Crummey and Scott (MCS) spin-locks [MCS91], which are fair and contention-free spin-locks that use a FIFO queue to implement the spin-lock operations. To manipulate the State flag of each thread (see Section 6.1), we do not use locks; instead, we manipulate the state flag using lock-free compare-and-set operations that implement the thread state machine specified in Fig. 6.1.
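As an illustration of these lock-free transitions, the C++ sketch below flips a thread's state flag with a compare-and-set operation; the enum values and names are hypothetical stand-ins for the VM's actual code.

#include <atomic>

// Hedged sketch of the lock-free thread-state machine of Fig. 6.1.
enum class ThreadState { Active, Stealing, Idle, Terminated };

struct ThreadStateFlag {
    std::atomic<ThreadState> state{ThreadState::Active};

    // Attempt one transition of the state machine. The CAS fails if another
    // thread changed the flag concurrently (e.g., a fact arrived and flipped
    // an idle thread back to active), in which case the caller re-reads.
    bool transition(ThreadState from, ThreadState to) {
        return state.compare_exchange_strong(
            from, to, std::memory_order_acq_rel);
    }
};

// Example: a thread that found no steal victims moves from stealing to idle:
//   flag.transition(ThreadState::Stealing, ThreadState::Idle);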

6.2 Runtime Data Structures And Garbage Collection

LM supports recursive types such as lists, arrays and structures. These compound data structures are immutable and shared between multiple facts. Such structures are stored in the heap of the VM and are managed through reference counting. For instance, each list is a cons cell with 3 fields: tail, the pointer to the next element of the list; head, the element stored by this cell; and refs, which counts the number of pointers to this cell in the VM. The cell is deleted from the heap whenever refs is decremented to zero.

Nodes are also subject to garbage collection. If the database of a node becomes empty and there are no references to the node from other logical facts, then the node is deleted from the program. We keep around a small number of freed nodes that can be reused immediately if another node is created. We avoid garbage collection schemes based on tracing, since objects are created and discarded at specific points of the virtual machine. A reference counting mechanism is thus more appropriate than a parallel tracing garbage collector, which would entail pausing the execution of the program to collect the unused objects.
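A minimal C++ sketch of such a reference-counted cons cell, assuming 64-bit elements and atomic counts because cells can be shared across threads (the VM's actual field layout may differ):

#include <atomic>
#include <cstdint>

// Sketch of a reference-counted cons cell: tail, head and refs, as described
// above. Illustrative only; the VM's real cells store typed arguments.
struct Cons {
    Cons* tail;                          // pointer to the next cell
    std::uint64_t head;                  // element stored by this cell
    std::atomic<std::uint32_t> refs{1};  // pointers to this cell in the VM

    static void acquire(Cons* c) {
        c->refs.fetch_add(1, std::memory_order_relaxed);
    }
    static void release(Cons* c) {
        // Dropping the last reference frees the cell and releases its tail,
        // possibly freeing a whole suffix of the list.
        while (c && c->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            Cons* t = c->tail;
            delete c;
            c = t;
        }
    }
};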

6.3 Memory Allocation

Memory allocation in the VM is extremely important because logical facts need to be allocated and deallocated repeatedly. Logical facts tend to be small memory objects, since they are composed of two pointers (16 B) plus a variable number of arguments (8 B each), which may lead to fragmentation if allocation is not done carefully. Moreover, since the VM is multi threaded, allocation also needs to be scalable; the standard malloc facility provided by the POSIX standard is therefore not preferred, since each operating system uses a different implementation that may or may not scale well in multi threaded environments.
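As a point of reference, the sizes quoted above correspond to a layout like the following. This is a hypothetical rendering, shown with the flexible-array-member extension supported by GCC and Clang, not the VM's actual definition:

#include <cstdint>

// Hypothetical fact layout matching the quoted sizes: two pointers (16 B on
// a 64-bit machine) followed by 8 B per argument, in one contiguous block.
struct Fact {
    Fact* prev;            // intrusive list pointers used by the
    Fact* next;            // per-predicate fact lists
    std::uint64_t args[];  // flexible array member: one slot per argument
};
// A fact with n arguments occupies 16 + 8*n bytes, so allocation is dominated
// by many small, similarly-sized objects.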

[Figure: diagram of the threaded allocator, with three memory pools (small, medium and large); the small pool shows two chunks, a next pointer, and a free array with entries for sizes 4 and 8.]

Figure 6.7: Threaded allocator: each thread has 3 memory pools for objects and each pool contains a set of memory chunks. Each pool also contains an ordered array for storing deallocated objects of a particular size. In this example, the pool for small objects has an ordered array for sizes 4 and 8, where size 4 has 3 free objects and size 8 has just 1.

In order to solve these two issues, we decided to implement a memory allocator inspired by the SLAB allocator [Bon94]. SLAB allocation is a memory management technique created for the Solaris 5.4 kernel to efficiently allocate kernel objects. Its advantages include reduced fragmentation and improved reuse of deallocated data structures, since these are reused in newer allocations.

We pre-allocate three pools of memory chunks: one pool for small objects (at most 128 B), one pool for medium objects (at most 1 KB) and another pool for large objects (more than 1 KB). A memory pool is composed of multiple chunks (or arenas), which are large contiguous areas of memory. Each pool has a next pointer that points to the position in the current chunk that has not been allocated yet. The third and final element of the memory pool is the free array, an array ordered by size where each position points to a free object of a given size. When an object is freed, a binary search is performed on the sorted array and the position for that object's size is updated to point to the deleted object. As expected, freed objects are reused whenever an object of the target size is requested by the system. When allocating a particular object, we compute the object size and use the appropriate pool. First, we check if there are any free objects in the free list for that particular size; if there is none, we use the next pointer to rapidly allocate space in the current chunk of the memory pool. Whenever the next pointer reaches the end of the latest chunk, a bigger chunk is allocated and next is reset to the first position of the new chunk. The objects in the free list are chained together, so that it is possible to mark multiple objects as deallocated. An example is presented in Fig. 6.7.

In order to reduce thread contention in the allocator, each thread uses a separate instance of the threaded allocator. When a thread wants to allocate an object, it asks its own allocator for a new object. When deallocating, a thread may deallocate an object that is part of another thread's chunk. This is not an issue, since chunks are not garbage collected from the system; therefore, both allocation and deallocation require no synchronization between threads.

Our threaded allocator does not attempt to place facts of the same node physically together. When firing rules, this may become an issue, since the thread may need to read multiple chunks which were allocated by different threads, increasing the number of cache line misses. In Section 6.4.6, we explore an alternative allocator design and compare it against the allocator presented in this section.
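The sketch below illustrates the allocation fast path described above, reduced to a single size class for brevity; the chunk size and growth policy are assumptions, and chunks are deliberately never freed, matching the description above. It assumes every size class is at least sizeof(void*) bytes, so freed objects can be chained in place.

#include <algorithm>
#include <cstddef>
#include <new>

// Simplified sketch of one memory pool: reuse a freed object if possible,
// otherwise bump-allocate from the current chunk, growing it on demand.
struct Pool {
    char*  next = nullptr;       // first unallocated byte in current chunk
    char*  chunk_end = nullptr;  // end of the current chunk
    void** free_list = nullptr;  // freed objects of this size, chained

    void* allocate(std::size_t size) {
        if (free_list) {                      // 1) reuse a freed object
            void* obj = free_list;
            free_list = static_cast<void**>(*free_list);
            return obj;
        }
        if (next == nullptr || next + size > chunk_end) {
            // 2) chunk exhausted: allocate a bigger one (old chunks are
            //    kept alive, as in the description above).
            std::size_t n = std::max<std::size_t>(64 * 1024, size);
            next = static_cast<char*>(::operator new(n));
            chunk_end = next + n;
        }
        void* obj = next;                     // 3) bump allocation
        next += size;
        return obj;
    }

    void deallocate(void* obj) {              // chain onto the free list
        *static_cast<void**>(obj) = free_list;
        free_list = static_cast<void**>(obj);
    }
};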

6.4 Experimental Evaluation

In this section, we evaluate the performance and scalability of the VM. The main goals of our evaluation are to:

• Compare the performance of LM programs against hand-written imperative C++ programs;

• Evaluate the scalability of LM programs when using up to 32 cores in parallel;

• Understand the impact and effectiveness of our dynamic indexing algorithm and its indexing data structures used for logical facts (namely, hash tables);

• Understand the impact of the memory allocator on scalability and multi core performance, and compare it against alternative designs.

For our experimental setup, we used a computer with a 32 core AMD Opteron(tm) Processor 6274 HE @ 1400 MHz with 32 GB of RAM, running Linux kernel 3.18.6-100.fc20.x86_64. The C++ compiler used is GCC 4.8.3 (g++) with the following CXXFLAGS: -O3 -std=c++11 -march=x86-64. All experiments were executed 3 times and the running times were averaged.

6.4.1 Sequential Performance

To understand the absolute performance of LM programs, we compared their execution time, using a multi threaded capable virtual machine running on a single thread, against hand-written sequential C++ programs [1]. All C++ programs were compiled with the same compilation flags used for LM, for fairness. Arguably, compiled C/C++ programs are a good standard for comparing the performance of new programming languages, since they tend to offer the best performance on several popular benchmarks [lan]. Python is a scripting programming language that is much slower than compiled C/C++ programs and thus can be seen as a good lower bound in terms of performance. The goal of the evaluation is to understand the effectiveness of our compilation strategy and the effectiveness of our dynamic indexing algorithms, including the data structures (hash trees) used to index logical facts. We used the following programs in our experiments [2]:

• Belief Propagation: a machine learning algorithm to de-noise images. The program is presented in Section 8.3.3.

• Heat Transfer: an asynchronous program that solves the heat transfer equation using an implicit method for transferring heat between nodes. The program is presented in Section 7.8.3.

• Multiple Single Shortest Distance (MSSD): a program that computes the shortest distance from a subset of nodes of the graph to all the nodes in the graph. A modified version is later presented in Section 7.1.

• MiniMax: the AI algorithm for selecting the best player move in a Tic-Tac-Toe game. The initial board was augmented in order to provide a longer running benchmark. The program is presented in Section 7.8.1.

• N-Queens: the classic puzzle for placing queens on a chess board so that no two queens threaten each other. The program is presented in Section 7.8.2.

Table 6.1 presents the comparison between LM and sequential C++ programs. Comparisons to other systems are shown under the Other column, which presents the speedup ratio of the C++ program over the target system (numbers greater than 1 indicate that C++ is faster). Note that all systems are executed with a single thread. Since we also want to assess the VM's scalability, we use different dataset sizes for each program.

The Belief Propagation experiment is the program where LM performs best when compared to the C++ version. We found that the mathematical operations required to update the belief values of the nodes are expensive and make up a huge part of the total computation time.

[1] All C++ programs available at http://github.com/flavioc/misc.
[2] All LM programs available at http://github.com/flavioc/meld/benchs/progs.

Program              Size              C++ Time (s)   LM     Other
Belief Propagation   50x50             3.16           1.27   1.08 (GraphLab)
                     200x200           49.36          1.36   1.25 (GraphLab)
                     300x300           135.56         1.35   1.25 (GraphLab)
                     400x400           169.99         1.35   1.27 (GraphLab)
Heat Transfer        80x80             4.62           7.28   -
                     120x120           20.29          7.07   -
MSSD                 US 500 Airports   0.69           2.76   13.44 (Python), 0.25 (Ligra)
                     OCLinks           7.00           7.35   16.10 (Python), 0.34 (Ligra)
                     EU Email          13.47          2.08   9.80 (Python), 0.32 (Ligra)
                     Twitter           27.22          8.58   8.28 (Python), 0.27 (Ligra)
                     US Power Grid     55.33          5.64   10.99 (Python), 0.30 (Ligra)
                     Live Journal      221.91         4.38   5.75 (Python), 0.20 (Ligra)
                     Orkut             281.59         1.66   4.23 (Python), 0.23 (Ligra)
MiniMax              Small             2.89           7.58   27.43 (Python)
                     Big               21.47          8.81   -
N-Queens             11                0.28           1.81   20.36 (Python)
                     12                1.42           3.09   24.30 (Python)
                     13                7.90           4.99   27.85 (Python)
                     14                47.90          6.42   31.52 (Python)

Table 6.1: Experimental results comparing different programs against hand-written versions in C++. For the C++ programs, we show the execution time in seconds (C++ Time (s)). In columns LM and Other, we show the speedup ratio over C++, obtained by dividing the run time of the target system by the run time of the C++ program. Numbers greater than 1 mean that the C++ program is faster.

This is clearly shown by the low overhead numbers. We also compared our performance against GraphLab, and LM is only slightly slower, which is also a point in LM's favor. The Heat Transfer program behaves somewhat like Belief Propagation, but LM is almost an order of magnitude slower than the C++ version. We think this is because the heat transfer computation is small, which tends to expose the overhead of the VM.

For the MSSD program, we used seven different datasets:

• US 500 Airports [CPSV07, Ops15], a graph of the 500 top airports in the US with around 5000 connections. The shortest distance is calculated for all nodes;

• OCLinks [Ops15, OP09], a Facebook-like social network with around 2000 nodes and 20000 edges. The shortest distance is calculated for all nodes;

• US Power Grid [Ops15, WS98], the power grid for the western US with around 5000 nodes and 13000 edges. The shortest distance is calculated for all nodes;

• Twitter [LK14, LM12], a graph with 81306 nodes and 1768149 edges. The shortest distance is calculated for 40 nodes;

• EU Email [LK14, LKF07], a graph with 265000 nodes and 420000 edges. The shortest distance is calculated for 100 nodes;

• Orkut [LK14, YL12], a large graph representing data from the Orkut social network. The graph contains 3072441 nodes and 117185083 edges;

• Live Journal [LK14, Bac06], a large graph representing data from the Live Journal social network. The graph contains 4847571 nodes and 68993773 edges.

The C++ MSSD version applies the Dijkstra algorithm for each node we want to compute the shortest distance from. While the Dijkstra algorithm has a better complexity than the algorithm used in LM, LM's algorithm is able to process distances from multiple sources at the same time. Our experiments show that the C++ program effectively beats LM's version by a large margin, but that gap is reduced when using larger graphs such as EU Email. The Python version also uses the Dijkstra algorithm and is one order of magnitude slower than the C++ version and usually slower than LM.

We also wrote the MSSD program using the Ligra system [SB13] [3], an efficient and scalable framework for writing graph-based programs. Our experiments show that Ligra is, on average, three times as fast as our C++ program, which means that LM is around 15 times slower than Ligra. We also experimented with Valgrind's cachegrind tool [NS07] and measured the number of cache misses for LM, C++, and Ligra. For instance, in the OCLinks dataset, Ligra's MSSD program has only a 5% cache miss rate (for 700 million memory operations), while the C++ version has a 9.4% miss rate (for 2 billion memory operations), which may explain the differences in performance between C++ and Ligra. LM shows a 6% cache miss rate but also a much higher number of memory accesses (seven times more than the C++ program). Note that we compiled Ligra without parallel support.

For the MiniMax program, we used two different starting states, Small and Big, where the tree generated by Big is ten times larger than the one generated by Small. The C++ version of the MiniMax program uses a single recursive function that updates a single state as it is recursively called to generate the best score and corresponding move. The LM version is seven to eight times slower due to the memory requirements and costs of creating new nodes using the exists construct. In Chapter 7, we will show how to improve the space complexity of the MiniMax program and the corresponding run time.

LM's N-Queens program shows some scalability issues, since the overhead ratio increases as the size of the problem increases. However, the same behavior is seen in the Python program. The C++ version uses a backtracking strategy to try out all the possibilities and uses a vector to store board states. Since there are at most N (the size of the board) vectors at the same time, it shows better behavior than all the other programs. However, we should note that a 3 to 6-fold slowdown is a good trade-off for a higher-level program that will execute faster when using more threads, as we are going to observe next.

From these results, it is possible to conclude that LM's virtual machine offers decent performance when compared to hand-written C++ programs. We think these results originate from four main aspects of our system: efficient indexing, good locality with array data structures for persistent facts, an efficient memory allocator, and a good compilation scheme. As noted in [MIM15], scalability should not be the sole focus of a parallel/distributed system such as LM. This is even more important in declarative languages, which are known to suffer from performance issues when compared to programming languages such as C++.

[3] Available at http://github.com/flavioc/ligra.

6.4.2 Memory Usage

To complete our comparison against the C++ programs, we have measured and compared the average memory used by both systems. Table 6.2 presents the memory statistics for LM programs, while Table 6.3 presents statistics for the C++ programs and a comparison to LM's memory requirements. In this thesis, memory usage is measured as the memory occupied by all the live objects in the virtual machine. The Final column shows the total memory usage after the program finishes firing the last rule, and the Average column indicates the average memory usage of the program throughout its lifetime [4].

In the Belief Propagation program, LM uses 2 to 3 times more memory than the corresponding C++ version. This is because the nodes in the LM program keep a copy of the belief values of neighbor nodes, while the C++ program uses shared memory and the stack to read and compute the neighbors' belief values. The LM version of Heat Transfer has a slightly larger gap when compared to C++, since heat values are represented as integers, while the belief values in Belief Propagation are represented as arrays, which is a more compact representation when compared to the memory required to store linear facts.

When the MiniMax program completes, there are only two facts in the database, which indicate the best player move. The MiniMax program is also the only program in this experiment that dynamically generates a (tree) graph, which is destroyed once the best move is found. The VM's garbage collector that collects empty nodes is able to delete the tree nodes (except the root), and the Final memory usage reflects that, since it is much smaller than the Average statistic. However, because the garbage collector retains a small number of freed nodes that may be reused later, the average size per fact is 15KB, which also includes those freed nodes. Note that the memory usage of the C++ program is much smaller because it uses function calls to represent the tree structure of the MiniMax algorithm.

The MSSD program shows that the LM VM requires about 2 to 5 times more memory than the corresponding C++ program. The ratio is larger when the graph and computed distances are smaller, due to the extra data structures required by the VM (i.e., the node data structure). For the Live Journal and Orkut datasets, LM's memory usage is about the same as or smaller than the C++ program. We think this is because the VM uses hash tree data structures, which use less memory than the hash table data structures in the C++ program. In terms of average memory per fact, we see that MSSD requires on average 100B, of which a big part is the hash table data structures used for indexing. For the Orkut dataset, where 100 million facts are derived, the average memory per fact is exactly 30B, which is about the 32B required to store one linear fact representing one shortest distance.

For the N-Queens program, the results show why there are scalability issues when using a larger N, since the memory usage increases significantly. On the positive side, the average memory usage per fact remains the same for all data sets. With respect to the C++ program, it is expected that it should consume almost no memory because it uses the stack to compute the solutions.

[4] The Average values were computed by taking memory usage measurements every 100 milliseconds, from which an average was computed.

Program              Size              Average    Final     # Facts   Each
Belief Propagation   50x50             3.7MB      3.6MB     34.4K     0.11KB
                     200x200           58.5MB     57.6MB    557.6K    0.11KB
                     300x300           131.1MB    129.6MB   1256.4K   0.11KB
                     400x400           233.3MB    230.5MB   2.2M      0.11KB
Heat Transfer        80x80             8.7MB      8.2MB     63.3K     0.13KB
                     120x120           19.6MB     18.4MB    143.4K    0.13KB
MSSD                 US 500 Airports   27.8MB     19.3MB    164.3K    0.12KB
                     OCLinks           502.1MB    231.9MB   2.2M      0.11KB
                     EU Email          485.4MB    401.8MB   3.8M      0.13KB
                     Twitter           1125.7MB   362.9MB   4.5M      0.08KB
                     US Power Grid     2.7GB      2.6GB     24.4M     0.11KB
                     Live Journal      5.5GB      3.3GB     71.5M     0.05KB
                     Orkut             3.6GB      2.9GB     100.6M    0.03KB
MiniMax              Small             1667.9MB   30KB      2         15.00KB
                     Big               13.5GB     30KB      2         15.00KB
N-Queens             11                2.8MB      667KB     3.2K      0.20KB
                     12                13.4MB     2.9MB     14.9K     0.20KB
                     13                71MB       14.5MB    74.5K     0.20KB
                     14                403.7MB    73.1MB    366.5K    0.20KB

Table 6.2: Memory statistics for LM programs. The meaning of each column is as follows: column Average represents the average memory use of the program over time; Final represents the memory usage after the program completes; # Facts represents the number of facts in the database after the program completes; Each is the result of dividing Final by # Facts and represents the average memory required per fact.

We conclude that the LM VM has high memory requirements due to the relatively large size of the node data structures (including the rule engine and indexing data structures). The high memory usage tends to degrade the performance of programs when compared to more efficient languages and systems.

6.4.3 Dynamic Indexing

In the previous experiments, we used the full functionality of the virtual machine, including dynamic indexing and indexing data structures. We now evaluate the impact of the dynamic indexing algorithm by running the previous experiments without dynamic indexing. Table 6.4 shows the comparison between the versions with and without indexing. Column Run Time shows that the MSSD program benefits from indexing, because the version without indexing is around 2 to 100 times slower than the version with indexing enabled. Since MSSD computes the shortest distance to multiple nodes, its rules require searching for the shortest distance facts of arbitrary nodes. All the other programs do not require significant indexing, but also do not display any significant slowdown from using dynamic indexing. In terms of memory usage, the version with indexing uses slightly more memory, especially for the MSSD program, which requires, on average, 50% more memory due to the hash trees used to support indexing.

Program              Size              Average    C / LM   Final      C / LM
Belief Propagation   50x50             2.7MB      73%      2.7MB      45%
                     200x200           45.1MB     77%      45.1MB     35%
                     300x300           99.7MB     76%      99.8MB     38%
                     400x400           181.3MB    77%      181.4MB    35%
Heat Transfer        80x80             2.3MB      26%      2.3MB      20%
                     120x120           5.4MB      27%      5.4MB      23%
MSSD                 US 500 Airports   7.8MB      28%      15.7MB     23%
                     OCLinks           76.6MB     15%      151.9MB    14%
                     EU Email          267.3MB    55%      298.2MB    44%
                     Twitter           245.3MB    21%      333.1MB    15%
                     US Power Grid     744.5MB    35%      1491.7MB   32%
                     Live Journal      5.4GB      99%      5.4GB      64%
                     Orkut             7.5GB      206%     7.6GB      109%
MiniMax              Small             1KB        0%       0KB        0%
                     Big               1KB        0%       0KB        0%
N-Queens             11                1KB        0%       528KB      8%
                     12                1KB        0%       2.5MB      6%
                     13                1KB        0%       16.9MB     13%
                     14                1KB        0%       75.9MB     9%

Table 6.3: Average and final memory usage of each C++ program. The columns C / LM compare the memory usage of the C++ programs against the memory usage of the LM programs for the average and final memory usage, respectively (higher numbers are better).

6.4.4 Scalability

In this section, we analyze the scalability of several LM programs using up to 32 threads. Note that we are studying declarative LM programs that do not make use of coordination. We also compare the run times against the run time of the hand-written C++ programs when such run times are available. We used the programs of the previous section with a few additions:

• PageRank: an asynchronous version of the PageRank program without synchronization between iterations. Every time a node sends a new PageRank value to its neighbors and the change is significant, then the neighbors are scheduled to recompute their PageRanks.

• Greedy Graph Coloring (GGC): an algorithm that colors nodes in a graph so that no two adjacent nodes have the same color. We start with a small number of colors and then we expand the number of colors when we cannot color the graph.

Program              Size              Run Time   Average Memory
Belief Propagation   50x50             0.88       0.03
                     200x200           1.00       0.02
                     300x300           1.08       0.02
                     400x400           1.01       0.03
Heat Transfer        80x80             1.01       1.00
                     120x120           1.12       1.00
MSSD                 US 500 Airports   34.57      1.65
                     OCLinks           108.93     1.36
                     EU Email          3.86       1.45
                     Twitter           1.97       1.19
MiniMax              Small             0.98       1.00
                     Big               0.95       1.00
N-Queens             11                0.98       1.00
                     12                1.00       1.00
                     13                1.00       1.00
                     14                1.00       1.00

Table 6.4: Measuring the impact of dynamic indexing and related data structures. Column Run Time shows the slowdown ratio of the unoptimized version (numbers greater than 1 show indexing improvements). Column Average Memory is the result of dividing the average memory of the optimized version by the unoptimized version (large numbers indicate that more memory is needed when using indexing data structures).

These two programs were not used in the previous section because we did not implement the corresponding C++ programs. We executed all the programs using 17 configurations: the base configuration uses 1 thread and the other 16 configurations use an even number of threads, up to a maximum of 32 threads.

Figure 6.8 presents the overall scalability, computed as the weighted harmonic mean over all the benchmarks presented in this section. The weight used in the harmonic mean formula is the run time of each program when executed with one thread. The single-threaded version executes with all the synchronization mechanisms enabled, just like the multi-threaded executions. The experimental results show that, on average, our benchmarks achieve a 15-fold speedup when using 32 threads.

We now analyze each program separately. For the plots presented next, the x axis represents the number of threads used; the left y axis represents the run time (in milliseconds) for that particular configuration, using a logarithmic scale; and the right y axis represents the speedup, computed as T1/Ti, where Ti represents the run time of the program using i threads. In some plots, there is also a horizontal black line that represents the run time of the C++ program and is used to understand how many threads are needed for an LM program to run faster than the C++ program. Note that all run times are the average of three runs.

Figure 6.8: Overall scalability for the benchmarks presented in this section. The average speedup is the weighted harmonic mean using the single-threaded run time as the weight of each benchmark.
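Spelled out, using Ti(n) for the run time of benchmark i on n threads (our notation, not the thesis's), the weighted harmonic mean of the per-benchmark speedups si(n) = Ti(1)/Ti(n), with weights wi = Ti(1), reduces to a ratio of total run times:

\[
S(n) \;=\; \frac{\sum_i w_i}{\sum_i w_i / s_i(n)} \;=\; \frac{\sum_i T_i(1)}{\sum_i T_i(n)},
\qquad s_i(n) = \frac{T_i(1)}{T_i(n)},\quad w_i = T_i(1).
\]

This weighting keeps short-running benchmarks from dominating the average: the aggregate speedup is simply the total single-threaded time divided by the total n-thread time.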

The scalability results for the Belief Propagation program are presented in Fig. 6.9. This program has the best scalability, with a 60-fold speedup for 32 threads. This is because the default ordering used for 1 thread is not optimal, while the same ordering works better for multiple threads. Additionally, the existence of multiple threads increases the amount of up-to-date information from neighbor nodes, resulting in super-linear speedups. Note that we also used the same computation ordering in the C++ program, which is also the one used in the GraphLab framework.

Figure 6.9: Scalability results of the Belief Propagation program.

The results for the Heat Transfer program are shown in Fig. 6.10. We note that LM requires around 15 threads to reach the run time of the C++ program. This results from the poor absolute performance of the LM program that was mentioned in the previous section. Although we use a grid dataset, the work available in the graph is not equally distributed due to different initial heat values; therefore, an almost linear speedup should not be expected. When comparing the two datasets used, the 80x80 configuration has less scalability than the 120x120 dataset due to its smaller size. For the 120x120 dataset, LM has a 12-fold speedup for 32 threads, which makes the LM version run almost twice as fast as the C++ version.

Figure 6.10: Scalability results for the Heat Transfer program. Because the Heat Transfer program performs poorly when compared to the C++ program, the LM version needs at least 15 threads to reach the run time of the C++ program (for the 120x120 dataset).

Figure 6.11 presents the results for the Greedy Graph Coloring (GGC) program. In GGC, we use two datasets: Google Plus [LK14], a graph representing social circles in Google Plus with 107614 nodes and 13673453 edges, and Twitter, a graph representing Twitter networks with 81306 nodes and 1768149 edges (note that Twitter was also used before in the MSSD program). The Twitter and Google Plus datasets have almost the same number of nodes, but Google Plus has ten times more edges, which makes coloring more time consuming. Since Twitter is a much smaller dataset, its speedup is much smaller than Google Plus, as expected. However, while the speedup of Twitter starts to drop after 16 threads, the LM system is still able to make it faster by adding more threads. For Google Plus, a 20-fold speedup is achieved for 32 threads, which is a reasonable scalability considering that GGC is not a program where work is equally distributed.

Figure 6.11: Scalability of the Greedy Graph Coloring program.

The results for the N-Queens program are shown in Fig. 6.12. We decided to use just the 13 and 14 configurations, since those take longer to run. The 13-Queens configuration has 169 (13x13) nodes, while the 14-Queens configuration has 196 (14x14) nodes. The LM program considers the chess board as a graph, and the bottom rows have more work to perform since the valid queen placements are built from the top to the bottom row. For a more in-depth explanation of the N-Queens program, please see Section 7.8.2.

Figure 6.12: Scalability results for the N-Queens program. The 13 configuration has 73712 solutions, while the 14 configuration has 365596 solutions. In terms of graph size, the 13 configuration has 13 × 13 nodes while the 14 configuration has 14 × 14 nodes.

Due to the small number of nodes in the N-Queens program and the fact that more work is performed at the bottom rows, the N-Queens program is hard to parallelize. Furthermore, the solutions for the puzzle are built by constructing lists which are shared between threads. This introduces some potential false sharing, because the reference count of each list data structure needs to be updated when new solutions are constructed. The 14-Queens configuration has good efficiency using 16 to 20 threads, while the efficiency of the 13-Queens configuration starts to drop after 14 threads. One can argue that the best number of threads is correlated with the size of the board, because the second half of execution is better balanced and partitioned when the number of threads matches the size of the board.

The speedup results for the MiniMax program are shown in Fig. 6.13. This is an interesting program because the graph is constructed during run time. Furthermore, due to the LM default scheduling order, the tree is explored in a breadth-first fashion, requiring the program to hold the complete MiniMax tree in memory. The Big configuration, for instance, uses, on average, 14GB of memory and a total of 30GB of memory before collecting the tree. Since the machine used for the experiments has only 32GB, the Small configuration scales better due to its smaller memory requirements. In Section 7.8.1, we will show how this program can be improved by changing the default scheduling order of LM and improving the memory usage of the program.

Figure 6.13: Scalability for the MiniMax program. Although the Big configuration has more work available and could in principle scale better than Small, the high memory requirements of Big make it scale poorly.

The scalability results for the MSSD program are shown in Fig. 6.14. The amount of work required to compute the results of each dataset depends on the size of the graph and the number of nodes from which the distance must be computed. The first conclusion is that more work implies more scalability, and the US Power Grid dataset has a 20-fold speedup for 32 threads, the best from the 6 datasets used. The second conclusion is that all datasets are able to, at least, reach the execution speed of the corresponding C++ program. However, some datasets, such as EU Email or Orkut, are noticeably faster than C++ when using more than 5 threads, which is an impressive result. We think this is because the number of source nodes is small. Furthermore, LM has a slight advantage over the C++ program because, in LM, distances to different nodes are propagated at once, while in the C++ version, the Dijkstra algorithm runs in iterations, one for each node from which we want to calculate the distance.

Figure 6.14: Scalability for the MSSD program. The LM system scales better when there is more work to do per node, with the US Power Grid dataset showing the most scalability. Panels: (a) graph with 265000 nodes and 420000 edges; the shortest distance is calculated for 100 nodes. (b) Graph with around 2000 nodes and 20000 edges; the shortest distance is calculated for all nodes. (c) Graph with 81306 nodes and 1768149 edges; the shortest distance is calculated for 40 nodes. (d) Graph with around 5000 nodes and 13000 edges; the shortest distance is calculated for all nodes. (e) Graph with 3072441 nodes and 117185083 edges; the shortest distance is calculated for two nodes. (f) Graph with around 4847571 nodes and 68993773 edges; the shortest distance is calculated for two nodes.

The final scalability results are shown for PageRank in Fig. 6.15. We used two datasets, Google Plus and Pokec. Google Plus [LK14] has 107614 nodes and 13673453 edges and is based on a Google Plus social network, while Pokec [LK14] has 1632803 nodes and 30622564 edges and represents data from a popular social network website in Slovakia. Even though Pokec is the larger graph, Google Plus is denser, with an average number of edges per node of 127, compared to an average of 18 for Pokec. This may explain why Google Plus is more scalable, with a 14-fold speedup for 32 threads.

Figure 6.15: Scalability results for the asynchronous PageRank program. The superior scalability of the Google Plus dataset may be explained by its higher density of edges. Panels: (a) the Google Plus graph has a high average of 127 edges per node; (b) the Pokec graph has a lower average of 18 edges per node.

6.4.5 Threaded allocator versus malloc

In order to understand the role that our threaded allocator plays in the performance of programs, we evaluate and compare it against a version of the LM system that uses the malloc function provided by the operating system. Figure 6.16 presents several programs comparing malloc with the threaded allocator. The programs Belief Propagation, MiniMax, and N-Queens have poor scalability when using malloc, due to high thread contention when calling malloc, since those programs need to allocate many small objects. The PageRank program shows good scalability with malloc, but only because the configuration with 1 thread runs slowly when compared to the other configurations. All the remaining programs show lower overall performance and scalability when compared to the threaded allocator.

As these results show, the performance of the memory allocator is crucial for good performance and scalability. Linear logic programs spend a significant amount of time asserting and retracting linear facts, which requires that memory allocation be done efficiently and without starving other threads. Reuse of deallocated linear facts also makes it more likely that threads will hit hot cache lines, and thus improves speed significantly.

Figure 6.16: Comparing the threaded allocator described in Section 6.3 against the malloc allocator provided by the C library.

6.4.6 Threaded allocator versus node-local allocator

While our default threaded allocator performs and scales well, we also explored some alternative designs. In this section, we explore a design where we move the allocation decisions from the thread to the node itself, in order to store the facts of the same node close together and thus increase locality. However, this design requires locking, since multiple threads may allocate facts from the same node. Our expectation is that such costs will be offset by the increased locality and reduced cache misses when deriving facts.

The node-local allocator allocates pages of memory from the threaded allocator, which are then used to allocate facts for that particular node. When a thread needs to allocate or deallocate a fact, it acquires the allocator lock of the target node's allocator and performs the allocation operation. There is a doubly-linked list of memory pages, and each page contains facts of different sizes (predicates). Figure 6.17 presents an example state of a node-local allocator. The node has 3 memory pages, all connected using the next and prev pointers. Each page also has a reference count (refcount) of the objects allocated in the page. If the reference count ever drops to zero, then the memory page is deallocated. Deallocated facts are kept in an ordered array for different sizes, using the mechanism we implemented for the threaded allocator. We have decided to reference count objects in the node-local allocator because it is more difficult to share objects between nodes, and maintaining a reference count helps reduce memory usage.

Figure 6.17: Fact allocator: each node has a pool of memory pages for allocating logical facts. Each page contains: (i) several linked lists of free facts of the same size (free array); (ii) a reference count of used facts (refcount); (iii) a ptr pointer that points to unallocated space in the page. In this figure, predicates f and g have several deallocated facts that are ready to be used when a new fact needs to be acquired.

Figure 6.18 presents the comparison between the threaded and the node-local allocator for the most relevant datasets (the performance was similar for the other datasets). The first major observation is that the threaded allocator has better sequential performance than the node-local allocator for most programs. This is probably the result of better locality for the threaded allocator, because it does not need to create pages for each node and instead uses the thread pages to maintain all facts, reducing cache line misses. Overall, the performance of the threaded allocator is better than or similar to the node-local allocator, for both single threaded and multi threaded execution (see GGC for a clear example).

Figure 6.18: Comparing the threaded allocator described in Section 6.3 against the node-local allocator.

The second observation is in the N-Queens program, where the threaded allocator beats the node-local allocator by a high margin in terms of scalability and performance. Note that the N-Queens program allocates many lists, which are not allocated on a per node basis but on a thread basis; therefore, the extra work required to maintain each node-local allocator is not offset by the increased locality, because the threads need to traverse lists to find valid board states.

To make our comparison more interesting, we also deactivated the reference counting mechanisms of the node-local allocator and compared it against the threaded allocator. The goal is to understand how reference counting affects the performance of the node-local allocator. The complete comparison is shown in Fig. 6.19. In a nutshell, this version performs almost the same as the full featured node-local allocator, except for programs which require more memory, such as MiniMax and Heat Transfer. In the case of MiniMax, we were unable to completely execute the Big dataset, since the VM required more memory than the available memory, resulting in long execution times. This is the result of not collecting unused memory pages, which results in high memory usage.

Figure 6.19: Comparing the threaded allocator described in Section 6.3 against the fact allocator without reference counting.

Overall, the performance of both allocators is similar; therefore, we have decided to use the threaded allocator by default, since it is simpler and offers good performance across the board. Furthermore, the threaded allocator requires less allocated memory because it enables more sharing between nodes.
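For concreteness, the sketch below gives a minimal C++ rendering of the page layout of Fig. 6.17; the field names mirror the figure, but the exact layout is an assumption, not the VM's actual definition.

#include <cstddef>

// Minimal sketch of one node-local memory page as described in Fig. 6.17.
struct Page {
    Page*       next;      // doubly-linked list of pages owned by the node
    Page*       prev;
    std::size_t refcount;  // live facts in this page; page freed at zero
    char*       ptr;       // first unallocated byte in the page
    char*       end;       // end of the page's usable space
    // the per-size free lists ("free array") would follow here
};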

6.5 Related Work

6.5.1 Virtual Machines

Virtual machines are a popular technique for implementing interpreters for high level programming languages. Due to the increased availability of parallel and distributed architectures, several machines have been developed with parallelism in mind [KDGN97]. One good example is the Parallel Virtual Machine (PVM) [Sun90], which serves as an abstraction to program heterogeneous computers as a single machine. Another important machine is the Threaded Abstract Machine (TAM) [CGSvE93, Gol94], which defines a self-scheduled machine language of parallel threads, where a program is represented using conventional control flow.

Prolog, the most prominent logic programming language, has a rich history of virtual machine research centered on extending and adapting the Warren Abstract Machine (WAM) [War83]. Because Prolog programs tend to be naturally parallel, much research has been done to either parallelize the WAM or create new parallel machines. Two types of parallelism are possible in Prolog: OR-parallelism, where several clauses for the same goal can be executed, and AND-parallelism, where goals in the same clause are tried in parallel. For OR-parallelism, we have several models, such as the SRI model [War87], the Argonne model [BDL+88], the MUSE model [AK90] and the BC machine [Ali88]. For AND-parallelism, different implementations were built on top of the WAM [Her86, LK88]; however, more general models such as the Andorra model [HJ90] were developed, which allow both AND and OR-parallelism. In contrast to LM, where parallelism arises from computing on different nodes with a partitioned database of facts, AND and OR-parallelism arise from proving different facts independently.

6.5.2 Datalog Execution

The Datalog language is a forward-chaining logic programming language and therefore requires different evaluation strategies than those used by Prolog, which is a backward-chaining logic programming language. Arguably, the most well-known strategy for Datalog programs with recursive rules is the Semi-Naïve Fixpoint algorithm [BR87], where the computation is split into iterations and the facts generated in the previous iteration are used as inputs to derive the facts in the next iteration. The advantage of this mechanism is that no redundant computations are performed, which could happen in the case of recursive rules but is avoided when the next iteration only uses new facts derived by the previous iteration.

In the context of the P2 system [LCG+06], which was created for developing declarative networking programs, the previous strategy is not suitable since it is centralized; therefore, a new strategy, called Pipelined Semi-Naïve (PSN) evaluation, was developed for distributed computation. PSN evaluation evaluates one fact at a time by firing any rule that is derivable using the new fact and older facts. If the new fact is already in the database, then the fact is not used for derivation, since it would derive redundant facts. LM's evaluation strategy is similar to PSN; however, it considers the whole database of facts when firing rules, due to the existence of rule priorities. For instance, when a node fires a rule, the new facts added for the local node are considered as a whole in order to select the next inference rule. In the case of PSN, each new fact would be considered separately. Still, PSN and LM are both asynchronous strategies which take advantage of the nature of distributed and parallel architectures, respectively. Finally, an important distinction between PSN and LM is that PSN targets logic programs with only persistent facts, which results in deterministic results; because LM uses linear facts, it follows a don't care or committed choice non-determinism, which may produce different results depending on the order of computations.

6.5.3 CHR implementations

Many basic optimizations used in the LM compiler, such as join optimizations and the use of different data structures for indexing facts, were inspired by work done on CHR [HdlBSD04]. Wuille et al. [WSD07] have described a CHR-to-C compiler that follows some of the ideas presented in this chapter, and De Koninck et al. [DKSD08] showed how to compile CHR programs with dynamic priorities into Prolog. The novelty of our work lies in supporting a combination of comprehensions, aggregates and rule priorities.

6.6 Chapter Summary

This chapter provided a full description of the multi core implementation of LM, with a focus on thread management, work stealing and memory allocation. We explained how the virtual machine is organized to provide scalable multi threaded execution and provided experiments on a wide range of problems to demonstrate its applicability and scalability. We also studied the importance of good memory allocators for improved scalability and execution.

Chapter 7

Coordination Facts

In Chapter 6, we saw that an LM program operates on a graph data structure G = (V,E) with nodes V and edges E. Concurrent computation is performed non-deterministically by a set of execution threads T that work on subsets of G. Our implementation allows threads to steal nodes from other threads in order to improve load balancing. However, there are many scheduling details that are left for our implementation to decide and which the programmer has no control over. How should a thread schedule the computation of its sub-graph? Is node stealing beneficial to all programs? What is the best sub-graph partitioning for a given LM program? In order to allow custom node scheduling and partitioning, we introduce coordination facts, a mechanism that is indistinguishable from regular computation and that allows the programmer to control how the runtime system executes programs. This is an important first step in allowing the programmer to declaratively control execution. In Chapter 8, we further extend the language with thread-based facts to allow reasoning over the state of the execution threads.

7.1 Motivation

To motivate the need for coordination, we use the Single Source Shortest Path (SSSP) program, a concise program that can take advantage of custom scheduling policies to improve its performance. Figure 7.1 and Figure 7.2 show a possible implementation in LM, and Fig. 7.3 shows an instance of the program on a particular dataset. Note that in Section 6.4.1 we presented the MSSD program, which is essentially the SSSP program modified to compute multiple shortest distances.

The SSSP program starts with the declaration of the predicates (lines 1-3). As usual, the edge predicate is a persistent predicate that describes the edges of the graph, where the third argument is the weight of each edge. Predicate shortest represents a valid distance and path from node @1 to the node in the first argument. Predicate relax also represents a valid path and is used to improve the distance in shortest. The program computes the shortest distance from node @1 by improving the distance in the shortest facts, until the shortest distance is reached in all nodes. In Fig. 7.2, we declare an example set of initial facts for the program: edge facts describe the graph; shortest(A, +00, []) is the initial shortest distance (infinity) for all nodes; and relax(@1, 0, [@1]) starts the algorithm by setting the distance from @1 to @1 to be 0.

1 type edge(node, node, int). // Predicate declaration
2 type linear shortest(node, int, list int).
3 type linear relax(node, int, list int).
4
5 shortest(A, D1, P1), D1 > D2, relax(A, D2, P2) // Rule 1: newly improved path
6    -o shortest(A, D2, P2),
7       {B, W | !edge(A, B, W) -o relax(B, D2 + W, P2 ++ [B])}.
8
9 shortest(A, D1, P1), D1 <= D2, relax(A, D2, P2) // Rule 2: longer path
10   -o shortest(A, D1, P1).

Figure 7.1: Single Source Shortest Path program code.

1 !edge(@1, @2, 3). !edge(@1, @3, 1).
2 !edge(@3, @2, 1). !edge(@3, @4, 5).
3 !edge(@2, @4, 1).
4 shortest(A, +00, []).
5 relax(@1, 0, [@1]).

Figure 7.2: Initial facts for the SSSP program.

The first rule of the program (lines 5-7) reads as follows: if the current shortest path P1 with distance D1 is larger than a new path relax with distance D2, then replace the current shortest path with D2, delete the new relax path and propagate new paths to the neighbors (comprehension at line 7). The comprehension iterates over the edges of node A and derives a new relax fact for each node B with the distance D2 + W, where W is the weight of the edge. For example, in Fig. 7.3(a) we apply rule 1 in node @1 and two new relax facts are derived at nodes @2 and @3. Fig. 7.3(b) is the result after applying the same rule but at node @2. The second rule of the program (lines 9-10) retracts a relax fact that has a longer distance than the current shortest distance stored in shortest. There are many opportunities for custom scheduling in the SSSP program. For instance, after applying rule 1 in Fig. 7.3(a), it is possible to apply rules in either node @2 or node @3. This decision depends largely on implementation factors such as node partitioning and the number of threads in the system. Still, it is easy to prove that, independent of the particular scheduling used, the final result presented in Fig. 7.3(c) is always achieved. The SSSP program is concise and declarative but its performance depends on the order in which nodes are executed. If nodes with greater distances are prioritized over other nodes, the program will generate more relax facts and it will take longer to reach the shortest distances. From Fig. 7.3, it is clear that the best scheduling is the following: @1, @3, @2 and then @4, where only 4 relax facts are generated. If we had decided to process nodes in the order @1, @2, @4, @3, @4, @2, then 6 relax facts would have been generated. The optimal solution for SSSP is to schedule the node with the shortest distance, which is essentially the Dijkstra shortest path algorithm [Dij59]. Note how it is possible to change the complexity of the algorithm by simply changing the order of node computation, but still retain the same declarative nature of the program.
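For reference, the node ordering that the coordinated version of Section 7.7 approximates is exactly the one a textbook Dijkstra implementation obtains from its priority queue. The following C++ sketch is added here only for illustration; the adjacency-list representation is our own choice and not part of the LM system.

#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Textbook Dijkstra: always expand the unprocessed node with the smallest
// tentative distance. adj[n] holds (neighbor, weight) pairs.
std::vector<int> dijkstra(const std::vector<std::vector<std::pair<int, int>>> &adj, int src) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(adj.size(), INF);
    using Entry = std::pair<int, int>;  // (distance, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
    dist[src] = 0;
    queue.push({0, src});
    while (!queue.empty()) {
        auto [d, n] = queue.top();
        queue.pop();
        if (d != dist[n]) continue;      // stale entry, distance already improved
        for (auto [m, w] : adj[n])
            if (d + w < dist[m]) {       // the "relax" step of rule 1 in Fig. 7.1
                dist[m] = d + w;
                queue.push({dist[m], m});
            }
    }
    return dist;
}

Each pop yields the unprocessed node with the smallest tentative distance, which is precisely the scheduling decision that set-priority will later encode declaratively.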

[Figure 7.3 diagrams: stages (a), (b) and (c) of the SSSP run, showing the shortest and relax facts stored at nodes @1-@4 as the program progresses.]

Figure 7.3: Graphical representation of an example dataset for the SSSP program: (a) represents the program’s state after propagating the initial distance at node @1, followed by (b) where the first rule is applied in node @2, and (c) represents the final state of the program, where all the shortest paths have been computed.

7.2 Types of Facts

LM introduces the concept of coordination, which allows the programmer to write code that changes how the runtime system schedules and partitions the set of available nodes across the threads of execution. Beyond the distinction between linear and persistent facts, LM further classifies facts into three categories: computation facts, structural facts and coordination facts. Predicates are also classified accordingly. Computation facts are regular facts used to represent the program state. In Fig. 7.1, relax and shortest are both computation facts. Structural facts describe information about the connections between the nodes in the graph. In Fig. 7.1, edge facts are structural since the corresponding edge is used for communication between nodes. Note that structural facts can also be seen as computation facts since they are heavily used in the program's logic.

Coordination facts are classified into scheduling facts and partitioning facts, and allow the programmer to change how the runtime schedules nodes and how it partitions the nodes among threads of execution, respectively. Coordination facts can be used in the LHS, in the RHS, or in both. This allows scheduling and partitioning decisions to be made based on the state of the program and on the state of the underlying machine. In this fashion, we keep the language declarative because we reason logically about the state of execution, without the need to introduce extra-logical operators into the language, which would generate significant difficulties when proving properties about the programs.

Both scheduling and partitioning facts can be further classified into two kinds of facts: sensing facts and action facts. Sensing facts are used to sense information about the underlying runtime system, such as the placement of nodes in the CPU and scheduling information.1 Action facts are used to apply coordination operations on the runtime system. Action facts are linear facts which are consumed when the corresponding action is performed.2 We use them to change the order in which nodes are evaluated in the runtime system and to make partitioning decisions (assign nodes to threads). It is possible to give hints to the virtual machine in order to prioritize the computation of some nodes.

With sensing and action facts, we can write rules that sense the state of the runtime system and then apply decisions in order to improve execution speed or change partitioning information. In most situations, this set of rules can be added to the program without modifying the meaning of the original rules. When sensing facts are used in rules, the meaning of the programs will change, and the programmer needs to be aware of the semantics of coordination and how the virtual machine transitions from one state to another using built-in linear logic rules (see Section 8.4 for a detailed discussion).

1 In the original Meld [ARLG+09], sensing facts were used to get information about the outside world, like temperature, touch data, neighborhood status, etc.
2 Like sensing facts, action facts were also introduced in the original Meld and were used to make the robots perform actions in the outside world (e.g., moving, changing speed).

7.3 Scheduling Facts

In order to allow different scheduling strategies, we introduce the concept of node priority by assigning a priority value to every node in the program and by introducing coordination facts that manipulate the priority values. By default, nodes have the same priority value and thus can be picked in any order. In our implementation, we use a FIFO approach because older nodes tend to have a higher number of unexamined facts, from which they can derive subsequent new facts.

We have two kinds of priorities: a temporary priority and a default priority. When a temporary priority exists, it supersedes the default priority. Once a node is processed and becomes inactive, the temporary priority is removed and the default priority is used. In a nutshell, the priority of a node is equal to the temporary priority if there is one, or the default priority otherwise. Initially, all nodes have a default priority of −∞.

The following list presents the action facts available to manipulate the scheduling decisions of the system:
• update-priority(node A, float F): Changes the temporary priority of node A to F.
• set-priority(node A, float F): Sets the temporary priority of node A to F if F is better than the current priority (either default or temporary). The programmer can decide (see Section 7.5) if priorities are to be ordered in ascending or descending order; thus, if node A has priority G, we only change the temporary priority to F if F > G (ascending order) or F < G (descending order).
• add-priority(node A, float F): Changes the temporary priority of node A to F + G, where G is the current priority of the node. A negative F is also allowed.
• remove-priority(node A): Removes the temporary priority from node A.
• schedule-next(node A): Changes the temporary priority of node A to +∞ if priorities are descending, or −∞ if priorities are ascending.
• set-default-priority(node A, float F): Sets the default priority of node A to F.
• stop-program(): Immediately stops the execution of the whole program.

LM also provides two sensing facts:
• priority(node A, float P): Represents the priority of node A. Value P is equal to the temporary priority, or to the default priority if A has no temporary priority assigned to it.
• default-priority(node A, float P): Value P represents the default priority of node A.

Sensing facts can only be used in the LHS of rules and are exempt from the constraint that forces every fact used in the body to have the same first argument. Note that when sensing facts are used to prove new facts, they must be re-derived so that the coordination model remains consistent. Furthermore, when the programmer uses action facts such as set-priority and set-default-priority, the runtime system implicitly updates and re-derives the associated sensing facts, without any programmer interaction.
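To make the interplay between temporary and default priorities concrete, here is a small C++ sketch of how a runtime could resolve and update a node's priority. It assumes descending priority order (larger values are better); the struct and all member names are hypothetical illustrations, not the actual fields of the LM virtual machine.

#include <limits>

// Hypothetical per-node priority bookkeeping, assuming descending order.
struct NodePriority {
    double default_priority = -std::numeric_limits<double>::infinity();
    double temp_priority = 0.0;
    bool has_temp = false;

    // The value the scheduler uses: a temporary priority, when present,
    // supersedes the default priority.
    double priority() const {
        return has_temp ? temp_priority : default_priority;
    }

    // set-priority(A, F): update only if F is better than the current value.
    void set_priority(double f) {
        if (f > priority()) {
            temp_priority = f;
            has_temp = true;
        }
    }

    // add-priority(A, F): bump the current priority by F (F may be negative).
    void add_priority(double f) {
        temp_priority = priority() + f;
        has_temp = true;
    }

    // remove-priority(A): fall back to the default priority.
    void remove_priority() { has_temp = false; }

    // set-default-priority(A, F).
    void set_default_priority(double f) { default_priority = f; }
};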

[Figure 7.4 diagram: a graph of nodes @0-@6 with weighted edges, partitioned into two sub-graphs owned by threads T0 and T1.]

Figure 7.4: Priorities with sub-graph partitioning. Priorities are used on a per-thread basis, therefore thread T0 schedules @0 to execute, while T1 schedules node @4.

The priorities assigned to nodes follow a partial ordering since each thread selects the highest priority node from its sub-graph and not from the whole graph. Figure 7.4 shows an example of a graph being processed by two threads, T0 and T1. The order for T0 will be @0, @1, @3, @2 and for thread T1 it will be @4, @6, @5. Note that priorities of nodes can be set from any node in the graph, even if those nodes live on different threads. Of course, this requires communication between threads.

7.4 Partitioning Facts

We provide several coordination facts for dealing with node partitioning among the running threads. Since each node is placed in some thread, the partitioning facts revolve around thread placement. In terms of action facts, we have the following:
• set-thread(node A, thread T): Places node A in thread T until the program terminates or a set-moving(A) fact is derived.
• set-affinity(node A, node B): Places node B in the thread of node A until the program terminates or a set-moving(B) fact is derived.
• set-moving(node A): Allows node A to move freely between threads.

As an example of set-thread, consider again the graph in Fig. 7.4. If a coordination fact set-thread(@2, T1) is derived, then node @2 will become part of the sub-graph of thread T1. The result is shown in Fig. 7.5.

Sensing facts retrieve information about node placement and are specified as follows:
• thread-id(node A, thread T): Linear fact that maps node A to the thread T which A belongs to. Fact set-thread implicitly updates fact thread-id.
• is-static(node A): Fact available at node A if A is not allowed to move between threads.

[Figure 7.5 diagram: the graph of Fig. 7.4 after set-thread(@2, T1) is derived; node @2 now belongs to the sub-graph of thread T1.]

Figure 7.5: Moving node @2 to thread T1 using set-thread(@2, T1).

• is-moving(node A): Fact available at node A if A is allowed to move between threads.
• just-moved(node A): Linear fact derived by the set-thread action if, at that moment, the node A is running on the target thread.
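As a rough illustration of what an action fact such as set-thread might do inside the runtime, consider the following C++ sketch, which pins a node to a target thread's static queue. All types, field names and locking details are simplified assumptions of ours rather than the virtual machine's real code.

#include <algorithm>
#include <deque>
#include <mutex>

struct Node;

// Hypothetical thread structure: a pinned (static) queue and a stealable
// (dynamic) queue, protected here by a single lock for simplicity.
struct Thread {
    std::mutex queue_lock;
    std::deque<Node*> static_queue;
    std::deque<Node*> dynamic_queue;

    void remove(Node *n) {
        std::lock_guard<std::mutex> g(queue_lock);
        auto drop = [n](std::deque<Node*> &q) {
            q.erase(std::remove(q.begin(), q.end(), n), q.end());
        };
        drop(static_queue);
        drop(dynamic_queue);
    }
};

struct Node {
    std::mutex state_lock;       // the per-node "State Lock" of Section 7.6.2
    Thread *owner = nullptr;
    bool moving_allowed = true;  // is-moving vs. is-static
};

// Sketch of the set-thread(A, T) action: pin node A to thread T.
void set_thread(Node *n, Thread *target) {
    std::lock_guard<std::mutex> g(n->state_lock);
    if (n->owner != nullptr && n->owner != target)
        n->owner->remove(n);     // drop the node from its current owner
    n->owner = target;
    n->moving_allowed = false;   // pinned until a set-moving(A) fact is derived
    std::lock_guard<std::mutex> q(target->queue_lock);
    target->static_queue.push_back(n);  // not eligible for stealing anymore
}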

7.5 Global Directives

We also provide a few global coordination statements that cannot be specified as sensing or action facts but are still important:
• priority @order ORDER: ORDER can be either asc or desc and defines if priorities are to be selected by the smallest or the greatest value, respectively.
• priority @default P: Informs the runtime system that all nodes must start with default priority P. Alternatively, the programmer can define a set-default-priority(A, P) initial fact.
• priority @base P: Informs the runtime system that the base priority should be P. The base priority is, by default, 0 when the priority ordering is desc and +∞ when the ordering is asc, and represents the smallest priority value possible.
• priority @initial P: Informs the runtime system that all nodes must start with temporary priority P. Alternatively, the programmer can define a set-priority(A, P) initial fact. By default, the initial priority value is equal to +∞ (for desc) or 0 (for asc).

7.6 Implementation

In order to support priorities and node partitioning, we transformed the singly linked list work queue of each thread, as presented in Fig. 6.3, into two pairs of queues: two standard queues implemented as doubly linked lists and two priority queues implemented as min/max heaps.

[Figure 7.6 diagram: each thread's work queue is split into a dynamic queue and a static queue; each contains a standard queue (doubly linked list) and a priority queue (min/max heap), and other threads steal half of the dynamic queue's nodes.]

Figure 7.6: Thread’s work queue and its interaction with other threads: the dynamic queue contains nodes that can be stolen while the static queue contains nodes that cannot be stolen. Both queues are implemented with one standard queue and one priority queue.

These four queues are decomposed into the static queue and the dynamic queue, where each queue has a standard and a priority queue. The static queue contains nodes that cannot be stolen and the dynamic queue contains nodes which can be stolen by other threads. The standard queue contains nodes without priorities (i.e., nodes with base priorities) and supports push into the tail, remove from the head, remove an arbitrary node, and remove the first half of the nodes. The priority queue contains nodes with priorities (ordered by the priority sensing fact) and is implemented as a binary heap array. It supports the following operations: push into the heap, remove the min node, remove an arbitrary node, remove half of the nodes (vertical split), and priority update. The operations for removing half of the queue are implemented in order to support node stealing, while the operations to remove arbitrary nodes or update priorities allow threads to change the priority of nodes. In the standard queue, the first n/2 elements are removed from the front of the list and, in the priority queue, the binary heap is split vertically for improved distribution of priorities.

Figure 7.6 presents an overview of the dynamic queue and how the remove half operations are implemented in the sub-queues in order to support node stealing. Note that the static queue is not represented but it also includes the same two sub-queues. Table 7.1 shows the complexity of the queue operations and compares the standard queue against the priority queue. Except for the remove half operation, priority queue operations are more expensive. The next and prev pointers of the standard queue are part of the node structure in order to save space. These pointers are also used in the priority queue, but for storing the current priority and the node's index in the binary heap array. Note that, in order to minimize inter-thread communication, node priorities are implemented at the thread level, i.e., when a thread picks the highest priority node from the priority queue, it picks the highest priority node from its subgraph of nodes and not the highest priority node from the whole graph.
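To give a feel for the vertical split, the following C++ sketch shows one simple way to remove half of a binary heap array for a thief: hand over every other element, so that both halves receive a mix of high and low priorities instead of the thief getting only the worst half. This O(N) version is purely an illustration under assumed data structures; the virtual machine's actual split is more efficient, as Table 7.1 below indicates.

#include <algorithm>
#include <vector>

// Simplified "remove half" with a vertical flavor: alternate elements go to
// the thief, then the heap property is restored on both halves.
std::vector<double> steal_half(std::vector<double> &heap) {
    std::vector<double> stolen, kept;
    for (std::size_t i = 0; i < heap.size(); ++i)
        (i % 2 == 1 ? stolen : kept).push_back(heap[i]);
    std::make_heap(kept.begin(), kept.end());      // re-heapify what we keep
    std::make_heap(stolen.begin(), stolen.end());  // re-heapify the stolen half
    heap = std::move(kept);
    return stolen;
}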

Operation         Standard Queue   Priority Queue
Push              O(1)             O(log N)
Pop               O(1)             O(log N)
Remove            O(1)             O(log N)
Remove Half       O(N)             O(log N)
Priority Update   -                O(log N)

Table 7.1: Complexity of queue operations for both the standard queue and the priority queue.

7.6.1 Node State Machine

To accommodate the new coordination facilities, we added a new node state to the state machine presented previously in Fig. 5.2. Figure 7.7 reviews the new set of states of the state machine.
• running: The node is deriving facts.
• inactive: The node is inactive, i.e., it has no new facts and it is not in any queue for processing.
• active: The node has new facts and it is in some queue waiting to be processed.
• stealing: The node has just been stolen and it is in the process of being moved to another thread.
• coordinating: The node moves from one queue to another or inside the priority queue when changing its priority.

[Figure 7.7 diagram: the node state machine with states running, active, inactive, stealing and coordinating, and transitions such as "start executing", "executed", "new facts", "node stolen" and "priority change".]

Figure 7.7: Node state machine extended with the new coordinating state.

7.6.2 Coordination Instructions

Coordination facts are implemented as API calls to the virtual machine which implement the appropriate operations. Sensing facts read information about the state of the virtual machine, while action facts lock and update the appropriate underlying data structures, such as the node data structure or the thread's queues.

When a thread T1 performs coordination operations on a node owned by a thread T2, it needs to synchronize with T2 and, for that, the State Lock in the target node needs to be acquired. Optionally, we may also need to acquire the locks of T2's queues if the priority is being updated. Note that both the standard queue and the priority queue have separate locks in order to allow concurrent manipulation of the two data structures.

For the coordination facts set-priority and add-priority, when multiple operations are directed to the same node during a single node execution, we coalesce the multiple operations into a single operation. As an example, if a node derives two set-priority facts that change the priority of node @1 to 1 and 2, the virtual machine coalesces the two instructions into a single set-priority that is applied with value 2 (the higher priority) after all the candidate rules of the node are executed. The reason for this optimization is that nodes may run for some number of steps during the lifetime of the program, therefore immediately applying each and every coordination instruction is not cost effective. We aim to reduce the amount of locking and movement of nodes inside the queues due to priority updates.
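The following C++ sketch illustrates the coalescing idea: set-priority requests issued during one node execution are buffered per target node, only the best value survives (assuming descending order), and a single update per target is flushed after the candidate rules finish. The NodeId alias and apply_set_priority function are hypothetical stand-ins of ours, not the virtual machine's real API.

#include <unordered_map>

using NodeId = int;  // hypothetical node identifier

// Stand-in for the real locked queue update performed by the VM.
void apply_set_priority(NodeId target, double priority) {
    (void)target;
    (void)priority;  // in the real VM: lock queues and move the node
}

// Buffer set-priority requests during one node execution and keep only the
// best value per target (descending order: larger is better).
struct PriorityCoalescer {
    std::unordered_map<NodeId, double> pending;

    void request(NodeId target, double priority) {
        auto [it, inserted] = pending.try_emplace(target, priority);
        if (!inserted && priority > it->second)
            it->second = priority;
    }

    // Called after all candidate rules of the node have executed: one locked
    // update per target instead of one per derived coordination fact.
    void flush() {
        for (const auto &[target, priority] : pending)
            apply_set_priority(target, priority);
        pending.clear();
    }
};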

7.7 Coordinating SSSP

Now that we have presented the coordination facts for LM, we present an example of their usage in Fig. 7.8 by extending the SSSP program presented before. The coordinated version of SSSP uses a global program directive to order priorities in ascending order (line 5) and the coordination fact set-priority (line 11). The proof of correctness for this program is presented in Appendix G.1.

1 type edge(node, node, int). // Predicate declaration
2 type linear shortest(node, int, list int).
3 type linear relax(node, int, list int).
4
5 priority @order asc.
6
7 shortest(A, D1, P1), D1 > D2, relax(A, D2, P2) // Rule 1: newly improved path
8    -o shortest(A, D2, P2),
9       {B, W | !edge(A, B, W) -o
10         relax(B, D2 + W, P2 ++ [B]),
11         set-priority(B, float(D2 + W))}.
12
13 shortest(A, D1, P1), D1 <= D2, relax(A, D2, P2) // Rule 2: longer path
14    -o shortest(A, D1, P1).
15
16 shortest(A, +00, []). // Initial facts
17 relax(@1, 0, [@1]).

Figure 7.8: Shortest Path Program taking advantage of the set-priority coordination fact.

Figure 7.9 presents the four steps of computation for the new SSSP program when executed with one thread. In every step, a new shortest path is computed at a different node, starting from the shorter paths up to the longer paths.

[Figure 7.9 diagrams: four stages (a)-(d) of the coordinated SSSP run, showing the shortest and relax facts stored at nodes @1-@4.]

Figure 7.9: Graphical representation of the new SSSP program in Fig. 7.8: (a) represents the program after propagating the initial distance at node @1, followed by (b) where the first rule is applied in node @3; (c) represents the result after retracting all the relax facts at node @2 and (d) is the final state of the program, where all the shortest paths have been computed.

This is exactly the same behavior as Dijkstra's algorithm [Dij59]. For multiple threads, the scheduling may not be optimal since threads pick the node with the shortest distance from their own subset of nodes. However, we have increased parallelism (no global synchronization) and threads are still able to locally avoid unnecessary work.

In order to experiment with the coordinated SSSP program, we extend it to support the computation of shortest distances from multiple sources. This modified version, named MSSD, was already used in Section 6.4.1 for measuring the performance of LM's virtual machine. MSSD is not only more interesting than SSSP but also allows us to better understand the advantages of using coordination. Table 7.2 presents fact statistics of the regular and coordinated versions of MSSD when executed using one thread. We gathered the number of facts derived, the number of facts deleted, the number of facts sent (to other nodes) and the final number of facts. Note that the number of facts sent is a subset of the number of facts derived, while the final number of facts is the number of

facts derived minus the number of facts deleted plus the number of initial facts. As expected, there is a reduction in the number of facts derived in all datasets. There is also a clear correlation between the reduction in facts processed and the run time reduction achieved using coordination.

Dataset           # Derived      # Sent         # Deleted      # Final
US 500 Airports   2.9M     83%   2.9M     83%   2.7M     82%   164.3K
OCLinks           64.9M    81%   64.9M    81%   62.8M    80%   2.2M
EU Email          35.4M    67%   34.9M    67%   32.7M    61%   3.8M
Twitter           357.5M   20%   357.4M   20%   354.7M   19%   4.5M
US Power Grid     207.6M   65%   207.6M   65%   183.2M   60%   24.4M
Live Journal      776.5M   21%   765.9M   20%   768.6M   20%   71.5M
Orkut             251.3M   40%   247.9M   39%   248.4M   39%   100.6M

Table 7.2: Fact statistics for the MSSD program when run on 1 thread. For each dataset, we gathered the number of facts derived, number of facts deleted, number of facts sent to other nodes and total number of facts in the database when the program terminates. Columns # Derived, # Sent and # Deleted show the number of facts for the regular version (first column) and then the ratio of facts for the coordinated version over the regular version. Percentage values less than 100% mean that the coordinated version produces fewer facts.

Figure 7.10 shows the comparison between the regular version (without coordination) and the coordinated version of the MSSD program. We use the datasets already described in Section 6.4.1. In each plot we present the following lines: Regular, which represents the run time of the regular version; Coordinated, which represents the run time of the coordinated version; Coordinated(1)/Regular(t), which represents the speedup of the regular version using the single threaded coordinated version as the baseline; Coordinated(1)/Coordinated(t), which represents the speedup of the coordinated version using the run time of the coordinated version with 1 thread as the baseline; C++, which represents the run time of the MSSD C++ program already presented in Section 6.4.1; and Ligra, which represents the run time of the MSSD program written in the Ligra framework [SB13]. Ligra's program computes the multiple shortest distance computations using the BellmanFord program provided by the source distribution3, but modified to compute the distance from a single source node separately. Note that the run time axis (vertical axis) uses a logarithmic scale.

Overall, there is a clear improvement in almost every dataset used. The datasets that see the best run time reductions are Twitter, Orkut and LiveJournal, which are the datasets where the distance is calculated for fewer nodes. When calculating the distance from multiple sources, the set-priority fact is less effective because, although it selects the node with the shortest distance to some source node, other distance computations (from other source nodes) are also propagated to neighbor nodes, which may not be optimal. However, a potential solution to this problem is to add new coordination facts and then change the underlying coordination machinery to support fact-based priorities (instead of node-based). This shows the beauty of our coordination model, since this would not require changes to the semantics of the language. Furthermore,

3 A modified version of the program is available at http://github.com/flavioc/ligra

the shortest distance algorithm would still remain declarative and correct due to the invariants that always hold when applying the program's rules.

There is another interesting trend in our results, which relates the improvement offered by the coordinated version over the regular version to the number of threads used. Even as the number of threads goes up, the coordinated version is still able to keep about the same run time reduction ratio over the regular version. This means that even if higher priority nodes are picked from each thread's subgraph, the coordinated version is still able to reduce run time. In other words, optimal (global) node scheduling is not required to see good results.

Ligra's version of the MSSD program performs badly when using multiple source nodes (datasets EU Email, OCLinks and US Power Grid). We found Ligra's primitives (edgeMap and vertexMap) unsuitable for simultaneous computation of shortest distances. It would have been better to drop Ligra and use a task-parallel approach, where tasks, represented as source nodes, are assigned to threads and each task would use a sequential shortest distance algorithm. LM shines in such cases since it is a declarative language that can be used to solve a wider range of problems. Ligra beats LM in the Twitter, Orkut and LiveJournal datasets, where Ligra is, on average, 9, 3, and 10 times faster than LM, respectively. In terms of scalability, LM has similar scalability in the Twitter and Orkut datasets, but worse scalability in the LiveJournal dataset. It should also be noted that the datasets EU Email and OCLinks are too small for Ligra, therefore it is hard to make a scalability comparison for those datasets.

In Section 7.6.2, we described the coalescing optimization where priority action facts are coalesced into single operations. The MSSD program computes the shortest distances to multiple nodes and thus can take advantage of this optimization. Consider a MSSD program that is computing the shortest distances from nodes @1 and @2. If a node @3 needs to propagate the shortest distance d1 to another node @4 (distance from @1) and a distance d2 to node @4 (distance from @2), then it would need to change the priority twice, first to d1, and then to d2. With the coalescing optimization, it only changes the priority to the best of both. It is then advantageous to apply only a single coordination operation in order to reduce queue operations and overall locking.

In order to observe the effects of the coalescing optimization, we disabled its support in the VM and then ran the same experiments with priority updates applied immediately. The results are presented in Fig. 7.11 and are represented using the UnOptimized label. From the results, we can see that some datasets, such as EU Email and US Power Grid, maintain their overall scalability, especially the speedup (Coordinated(1)/Coordinated(t)) for 32 threads (t = 32). For the OCLinks, Twitter and Live Journal datasets, there is some performance degradation since their overall scalability drops without the optimization. This shows that coordination coalescing is an effective optimization for improving the run time of coordinated programs.

7.8 Applications

To further understand how coordination facts work, we present other programs that take advantage of them.

(a) Graph with 265000 nodes and 420000 edges. The shortest distance is calculated for 100 nodes. (b) Graph with around 2000 nodes and 20000 edges. The shortest distance is calculated for all nodes.

(c) Graph with 81306 nodes and 1768149 edges. The shortest distance is calculated for 40 nodes. (d) Graph with around 5000 nodes and 13000 edges. The shortest distance is calculated for all nodes.

(e) Graph with 3072441 nodes and 117185083 edges. The shortest distance is calculated for two nodes. (f) Graph with around 4847571 nodes and 68993773 edges. The shortest distance is calculated for two nodes.

Figure 7.10: Scalability comparison for the MSSD program when enabling coordination facts.

(a) Graph with 265000 nodes and 420000 edges. The shortest distance is calculated for 100 nodes. (b) Graph with around 2000 nodes and 20000 edges. The shortest distance is calculated for all nodes.

(c) Graph with 81306 nodes and 1768149 edges. The shortest distance is calculated for 40 nodes. (d) Graph with around 5000 nodes and 13000 edges. The shortest distance is calculated for all nodes.

(e) Graph with 3072441 nodes and 117185083 edges. The shortest distance is calculated for two nodes. (f) Graph with around 4847571 nodes and 68993773 edges. The shortest distance is calculated for two nodes.

Figure 7.11: Scalability for the MSSD program when not coalescing coordination directives.

7.8.1 MiniMax

The MiniMax algorithm is a decision rule algorithm for minimizing the possible loss for a worst case (maximum loss) scenario in a zero-sum game for two (or more) players who play in turns. The algorithm builds a game tree, where each tree node represents a game state and the children represent the possible game moves that can be made by either player 1 or player 2. An evaluation function is used to compute the score of the game for each leaf of the tree. A node is a leaf when the game state can no longer be expanded. Finally, the algorithm recursively minimizes or maximizes the scores of each node. To select the best move for player 1, the algorithm picks the move maximized at the root node.

Figure 7.12 shows LM's code for the MiniMax program using coordination facts. The following predicates are specified: parent connects nodes to represent a tree; score(A, Score, Play) and new-score(A, Score, Play) represent the score of a node, along with the game move in the Play argument; minimize and maximize are counters that either maximize or minimize the scores coming from the children nodes; expand expands and creates children in the current node; and play starts the MiniMax expansion and computation at the current node.

The LM program starts with a root node (with the initial game state) that is expanded with the available moves at each level. The graph of the program is dynamic since nodes are created and then deleted once they are no longer needed. The latter happens when the leaf scores are computed or when a node fully minimizes or maximizes the children scores. When the program ends, only the root node has facts in its database.

The first two rules in lines 14-20 check if the current game is final, namely, if a player has won or the game drew: the first rule generates the score for the final state while the second expands the game state by generating all the possible plays for player NextPlayer.

The expansion rules create the children for the current node and are implemented in lines 22-40. The first two rules create either a maximize or minimize fact that will either maximize or minimize the scores of the children nodes. The third and fourth expansion rules simulate a player move and, for that, create a new node B using the exists language construct. We link B with A using parent(B, A) and kickstart the recursive expansion of node B by deriving a play fact. Finally, the rule in lines 39-40 handles the case when the current player cannot play in the current game position.

At the end of execution, there should be a single fact, namely score(@1, Score, Play), where Score is the MiniMax score for the best play Play that can be made by the main player for the initial game state. Note that the full proof of correctness of the MiniMax program is presented in Appendix G.2.

As noted in Section 7.3, LM's default scheduler uses a FIFO approach, which results in a breadth-first expansion of the MiniMax tree. This results in O(n) space complexity, where n is the number of nodes in the tree, since the tree must be fully expanded before the scores at the leaves are actually computed. With coordination, we set the priority of a node to be its depth (lines 30 and 35) so that the tree is expanded in a depth-first fashion, leading to O(dt) memory complexity, where d is the depth of the tree and t is the number of threads. Since threads prioritize deeper nodes, the scores of the first leaves are immediately computed and then sent to the parent node.
At this point, the leaves are deleted and reused for other nodes in the tree, resulting in minimal memory usage.
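For reference, here is a textbook recursive minimax over an explicit tree in C++; the LM program in Fig. 7.12 computes the same scores, but expands the tree as graph nodes and folds child scores through the maximize and minimize counter facts. The GameNode structure is an illustrative stand-in of our own.

#include <algorithm>
#include <climits>
#include <vector>

// Illustrative game tree node: leaves carry the evaluation function's score.
struct GameNode {
    int score = 0;
    std::vector<GameNode> children;  // empty when the state is final
};

// Classic recursive minimax: maximize on the root player's turns and
// minimize on the opponent's turns.
int minimax(const GameNode &n, bool maximizing) {
    if (n.children.empty())
        return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const GameNode &c : n.children)
        best = maximizing ? std::max(best, minimax(c, false))
                          : std::min(best, minimax(c, true));
    return best;
}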

1 type list int game. // Type declaration
2 type linear parent(node, node). // Predicate declaration
3 type linear score(node, int, int).
4 type linear new-score(node, int, int).
5 type linear minimize(node, int, int, int).
6 type linear maximize(node, int, int, int).
7 type linear expand(node, game FirstPart, game SecondPart, int Descendants, int Player, int Play, int Depth).
8 type linear play(node, game Game, int Player, int Play, int Depth).
9
10 const root-player = 1. const initial-game = [...]. // Constant declaration: player and initial game state
11 fun next(P : int) : int = if P = 1 then 2 else 1 end. // Function declaration: select next player
12 external fun minimax_score(game, int, int). // Returns the score of a board
13
14 play(A, Game, NextPlayer, LastPlay, Depth), // Rule 1: ending game state
15    Score = minimax_score(Game, NextPlayer, root-player), Score > 0
16    -o score(A, Score, LastPlay).
17
18 play(A, Game, NextPlayer, LastPlay, Depth), // Rule 2: expand state
19    0 = minimax_score(Game, NextPlayer, root-player) // minimax_score is an external function
20    -o expand(A, [], Game, 0, NextPlayer, LastPlay, Depth).
21
22 expand(A, Game, [], N, Player, Play, Depth), Player = root-player // Rule 3: maximize node
23    -o maximize(A, N, -00, 0).
24
25 expand(A, Game, [], N, Player, Play, Depth), Player <> root-player // Rule 4: minimize node
26    -o minimize(A, N, +00, 0).
27
28 expand(A, First, [0 | Xs], N, Player, Play, Depth), Depth >= 5 // Rule 5: create static child node
29    -o exists B. (set-affinity(A, B),
30       set-default-priority(B, float(Depth + 1)),
31       play(B, First ++ [Player | Xs], next(Player), length(First), Depth + 1), parent(B, A),
32       expand(A, First ++ [0], Xs, N + 1, Player, Play, Depth)).
33
34 expand(A, First, [0 | Xs], N, Player, Play, Depth), Depth < 5 // Rule 6: create child node
35    -o exists B. (set-default-priority(B, float(Depth + 1)),
36       play(B, First ++ [Player | Xs], next(Player), length(First), Depth + 1), parent(B, A),
37       expand(A, First ++ [0], Xs, N + 1, Player, Play, Depth)).
38
39 expand(A, First, [C | Xs], N, Player, Play, Depth), C <> 0 // Rule 7: next game play
40    -o expand(A, First ++ [C], Xs, N, Player, Play, Depth).
41
42 score(A, Score, BestPlay), parent(A, B) -o new-score(B, Score, BestPlay). // Rule 8: sending score to parent node
43
44 new-score(A, Score, Play), minimize(A, N, Current, BestPlay), Current > Score // Rule 9: select new best
45    -o minimize(A, N - 1, Score, Play).
46
47 new-score(A, Score, Play), minimize(A, N, Current, BestPlay), Current <= Score // Rule 10: keep current best
48    -o minimize(A, N - 1, Current, BestPlay).
49
50 minimize(A, 0, Score, BestPlay) -o score(A, Score, BestPlay). // Rule 11: score minimized
51
52 new-score(A, Score, Play), maximize(A, N, Current, BestPlay), Current < Score // Rule 12: select new best
53    -o maximize(A, N - 1, Score, Play).
54
55 new-score(A, Score, Play), maximize(A, N, Current, BestPlay), Current >= Score // Rule 13: keep current best
56    -o maximize(A, N - 1, Current, BestPlay).
57
58 maximize(A, 0, Score, BestPlay) -o score(A, Score, BestPlay). // Rule 14: score maximized
59
60 play(@0, initial-game, root-player, 0, 1). // Initial fact

Figure 7.12: LM code for the MiniMax program.

As an example, consider a system with two threads, T1 and T2, where T1 first expands the root node and then expands the first child. Since T2 is idle, it steals half of the root's children nodes and starts expanding one of the nodes in a depth-first fashion. We also take advantage of memory locality by using set-affinity (line 29), so that nodes after a certain level are not stolen by other threads. While this is not critical for performance in shared memory systems where node stealing is efficient, we expect such coordination to be critical in distributed systems. The rest of the program contains rules for maximizing and minimizing scores (lines 44-58), through the retraction of incoming new-score facts.

Average Final # Malloc Dataset Threads Regular Coord Regular Coord Regular Coord 1 1667.9MB 52KB 30KB 31KB 42 11 2 816.4MB 48KB 60KB 61KB 79 22 4 375.3MB 45KB 119KB 121KB 149 44 Small 8 189.7MB 43KB 236KB 240KB 281 88 16 81.3MB 40KB 472KB 478KB 531 172 24 26.6MB 39KB 707KB 712KB 734 252 32 24.4MB 37KB 937KB 953KB 932 332 1 13.5GB 62KB 30KB 31KB 48 12 2 6.7GB 54KB 60KB 61KB 92 23 4 2.4GB 52KB 119KB 121KB 171 44 Big 8 1545.6MB 50KB 236KB 240KB 331 88 16 624.2MB 47KB 472KB 479KB 613 177 24 400.5MB 45KB 707KB 717KB 886 265 32 222.7MB 44KB 942KB 955KB 1145 353

Table 7.3: Comparing the memory usage of the regular and coordinated MiniMax programs.

In Table 7.3 we compare the memory usage of the coordinated MiniMax program against the regular MiniMax program. The Average column presents the average memory usage during the lifetime of the program, the Final column indicates the amount of memory used at the end of the program, while the # Malloc column represents the number of calls made to the malloc function by the threaded allocator. The coordinated version uses significantly less memory than the regular version. Both the Big and the Small datasets use about the same memory when using coordination, which is an excellent result since the Big dataset is ten times larger than the Small dataset. As the number of threads increases, the average memory usage of the regular version decreases because there are more threads that are able to complete different parts of the tree at the same time. The same happens for the coordinated version, but at a much smaller scale since threads are exploring the tree in a depth-first fashion. Finally, as expected, there is also a large reduction in the number of calls to malloc made by the threaded allocator in the coordinated version.

The run time and scalability results are presented in Fig. 7.13. There is a clear performance improvement when coordinating the MiniMax program due to the low memory usage. The Big dataset needs only 5 threads to reach the performance of the sequential C++ program and, when using 32 threads, the LM program is more than four times faster than the C++ version. The use of coordination facts allows us to improve the execution of this declarative program and make it run significantly faster than it would be otherwise.

Figure 7.13: Scalability for the MiniMax program when using coordination.

7.8.2 N-Queens

The N-Queens puzzle is the problem of placing N chess queens on an NxN chessboard so that no pair of queens attack each other [HLM69]. The specific challenge of finding all the distinct solutions to this problem is a good benchmark for designing parallel algorithms. Most popular parallel implementations of the N-Queens problem distribute the search space of the problem by assigning incomplete boards as tasks to threads. Our approach is unusual since we see the squares of the board as LM's nodes, where each square can communicate with four other squares: the adjacent right and the adjacent left on the same row, and the first non-diagonal square to the right and to the left on the row below. To represent a partial/valid board state, we use a list of integers, where each integer represents the column in which a queen is placed. For example, [2, 0] means that a queen is placed in square (0, 0) and another in square (1, 2). At any given time, many partial board states may use the same squares and each square can belong to several board states. An example final database for the 8-Queens problem is presented in Fig. 7.16, where only 8 of the possible 92 valid states are shown.

Figure 7.15 presents our solution. The predicates left, right, down-left and down-right represent each square's four connections to the squares on the same row and to the row below. Predicate coord represents the coordinates of the given square. Predicates propagate-left and propagate-right propagate a given board state to the left or to the right in the current row of the board. Predicates test-y, test-diag-left and test-diag-right take one valid board state (last argument) and then determine if the current square is able to add itself to that state by checking the three restrictions of the N-Queens puzzle. Predicate new-state represents a new valid board state in the current square, while send-down forces a square to send a new state to the squares in the row below. Finally, predicate final-state represents a full board state. Once the program completes, the last row of squares of the board should have several final-state facts which represent all the distinct solutions to the puzzle.

In terms of facts and rules, we have an initial fact that represents an empty state (line 51),

which is instantiated in the top-left square and must be propagated to all squares in the same row (rule in lines 16-17). If a square S receives a new state L, it checks if S can be incorporated into L. For that, it checks if there is no queen on S's column (rules in lines 21-27), if there is no queen on S's left diagonal (rules in lines 29-35) and if there is no queen on S's right diagonal (rules in lines 37-43).
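For intuition, the three tests amount to the following self-contained C++ check, which we include only as an illustration; the state layout mirrors the LM lists, with the head of the list being the row directly above the current square.

#include <cstddef>
#include <vector>

// Illustrative check: may a queen be placed in column y of the current row?
// state[0] is the column of the queen on the row directly above; state[i]
// is the one i+1 rows above, matching the lists built by the LM program.
bool can_place(const std::vector<int> &state, int y) {
    for (std::size_t i = 0; i < state.size(); ++i) {
        int qy = state[i];
        int d = static_cast<int>(i) + 1;  // vertical distance to that queen
        if (qy == y) return false;        // same column (test-y)
        if (qy == y - d) return false;    // left diagonal (test-diag-left)
        if (qy == y + d) return false;    // right diagonal (test-diag-right)
    }
    return true;
}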

Figure 7.14: Propagation of states using the send-down fact.

If there is any conflict, we do not derive anything and for that we use the language expression 1 (lines 25, 33 and 41). If there are no conflicts, it is possible to add a queen to the current state (line 39). The fact send-down is used to either complete the computation of a valid state (lines 45-46) or to propagate the state to the row below (lines 47-49), as shown in Fig. 7.14. The full proof of correctness for the N-Queens program is presented in Appendix G.3.

Since computation goes from the top row to the bottom row, not all static placements of nodes to threads will perform equally well. This is especially true because the bottom rows perform the most work. The best placement is then to split the board vertically with the initial fact set-thread(A, vertical(X, Y, side, side)), so that each thread gets the same number of columns, where vertical is a function that accepts the coordinates X and Y and the size of the board and then returns the thread for node A. Our experiments show that, on a shared memory system, it does not matter much if we start with a bad placement, since node stealing overcomes those issues by load balancing dynamically.

However, the N-Queens program incrementally builds and shares lists representing valid board states that are transmitted from top to bottom. In order to improve memory locality, we should manipulate board states that share a significant number of elements, because each board state needs to be iterated over before being extended with a new position. To accomplish this, we used the coordination fact set-default-priority(A, X) (not shown in Fig. 7.15) to experiment with two main configurations: Top, where top rows are prioritized; and Bottom, where squares closer to the bottom are computed first. We also decided to optionally pin nodes to threads, as mentioned earlier, to see if memory locality affects the multi-threaded performance of the program. These configurations are referred to as Static.

The complete results are shown in Fig. 7.17 (for the 13-Queens problem) and Fig. 7.18 (for the 14-Queens problem). We present the 4 possible configurations side by side and compare their run time against the version without coordination. The line Regular(1)/Config(t) represents the speedup of the configuration Config against the run time of the regular version (no coordination) using 1 thread.

1 type list int state. // Type declaration
2 type left(node, node). type right(node, node). // Predicate declaration
3 type down-left(node, node). type down-right(node, node).
4 type coord(node, int, int).
5 type linear propagate-left(node, state).
6 type linear propagate-right(node, state).
7 type linear test-y(node, int, state, state).
8 type linear test-diag-left(node, int, int, state, state).
9 type linear test-diag-right(node, int, int, state, state).
10 type linear new-state(node, state).
11 type linear send-down(node, state).
12 type linear final-state(node, state).
13
14 propagate-left(A, State) // Rule 1: propagate to the left
15    -o {L | !left(A, L), L <> A -o propagate-left(L, State)}, new-state(A, State).
16 propagate-right(A, State) // Rule 2: propagate to the right
17    -o {R | !right(A, R), R <> A -o propagate-right(R, State)}, new-state(A, State).
18
19 new-state(A, State), !coord(A, X, Y) -o test-y(A, Y, State, State). // Rule 3: check if queen can be added
20
21 // check if there is no queen on the same column
22 test-y(A, Y, [], State), !coord(A, OX, OY) // Rule 4: check vertical
23    -o test-diag-left(A, OX - 1, OY - 1, State, State).
24 test-y(A, Y, [Y1 | RestState], State), Y = Y1 // Rule 5: check vertical
25    -o 1. // fail
26 test-y(A, Y, [Y1 | RestState], State), Y <> Y1 // Rule 6: check vertical
27    -o test-y(A, Y, RestState, State).
28
29 // check if there is no queen on the left diagonal
30 test-diag-left(A, X, Y, _, State), X < 0 || Y < 0, !coord(A, OX, OY) // Rule 7: check left diagonal
31    -o test-diag-right(A, OX - 1, OY + 1, State, State).
32 test-diag-left(A, X, Y, [Y1 | RestState], State), Y = Y1 // Rule 8: check left diagonal
33    -o 1. // fail
34 test-diag-left(A, X, Y, [Y1 | RestState], State), Y <> Y1 // Rule 9: check left diagonal
35    -o test-diag-left(A, X - 1, Y - 1, RestState, State).
36
37 // check if there is no queen on the right diagonal
38 test-diag-right(A, X, Y, [], State), X < 0 || Y >= size, !coord(A, OX, OY) // Rule 10: check right diagonal
39    -o send-down(A, [OY | State]). // add new queen
40 test-diag-right(A, X, Y, [Y1 | RestState], State), Y = Y1 // Rule 11: check right diagonal
41    -o 1. // fail
42 test-diag-right(A, X, Y, [Y1 | RestState], State), Y <> Y1 // Rule 12: check right diagonal
43    -o test-diag-right(A, X - 1, Y + 1, RestState, State).
44
45 send-down(A, State), !coord(A, size - 1, _) // Rule 13: final state
46    -o final-state(A, State).
47 send-down(A, State), !coord(A, CX, _), CX <> size - 1 // Rule 14: propagate the state down
48    -o {B | !down-right(A, B), B <> A -o propagate-right(B, State)},
49       {B | !down-left(A, B), B <> A -o propagate-left(B, State)}.
50
51 propagate-right(@0, []). // Initial fact

Figure 7.15: N-Queens problem solved in LM.

The first observation is that using the Top configuration with a small number of threads results in better run times (see Fig. 7.18(a) for an example); however, that advantage is lost when adding more threads. We argue that Top allows threads to build all the valid states on row i and then process them all at once on row i + 1, improving overall memory locality since LM rules run for all possible board configurations of row i. However, when the number of threads increases, this results in reduced scalability, since there is less work available per thread, as row i must be processed before row i + 1.

1 final-state(@57, [0, 4, 7, 5, 2, 6, 1, 3]). // States at square (7, 0)
2 final-state(@57, [0, 5, 7, 2, 6, 3, 1, 4]).
3 final-state(@57, [0, 6, 3, 5, 7, 1, 4, 2]).
4 final-state(@57, [0, 6, 4, 7, 1, 3, 5, 2]).
5
6 (...)
7
8 final-state(@63, [7, 1, 3, 0, 6, 4, 2, 5]). // States at square (7, 7)
9 final-state(@63, [7, 1, 4, 2, 0, 6, 3, 5]).
10 final-state(@63, [7, 2, 0, 5, 1, 4, 6, 3]).
11 final-state(@63, [7, 3, 0, 2, 5, 1, 6, 4]).

Figure 7.16: Final database for the 8-Queens program.

In the Bottom configuration, there is little change when compared to the uncoordinated version of N-Queens. However, interesting things happen in the Bottom Static configuration when the number of rows is equal to the number of threads. For an example, observe Fig. 7.18(d), where using 14 threads takes almost the same run time as the regular version using 32 threads. In this case, each thread is assigned a single column of the chess board and is able to process only the states for that particular column. This behavior was investigated with Valgrind's Cachegrind tool, which showed that this particular combination has the smallest number of cache misses. This indicates that it is helpful to match the problem to the number of available CPUs in the system in order to increase memory locality.


Figure 7.17: Scalability for the N-Queens 13 program when using different scheduling and partitioning policies.


Figure 7.18: Scalability for the N-Queens 14 program when using different scheduling and partitioning policies.

7.8.3 Heat Transfer

In the Heat Transfer (HT) algorithm, we have a graph where heat values are exchanged between nodes. The program stops when, for every node, the new heat value Hi differs only by a small ε from the old value Hi−1, that is, when δ = |Hi − Hi−1| ≤ ε. The algorithm works asynchronously, i.e., heat values are updated using information as it arrives from neighboring nodes. This increases concurrency since nodes do not need to synchronize between iterations.

Figure 7.19 shows the HT rules that send new heat values to neighbor nodes. In the first rule we added an add-priority action fact to increase the priority of the neighbor nodes for the case when the current node has a large δ. The idea is to prioritize the computation of the heat values of nodes (using update) that have a neighbor that changed significantly. Multiple add-priority facts will increase the priority of a node, so that nodes with multiple large deltas will have higher priority.

1 new-heat(A, New, Old),
2 Delta = fabs(New - Old), Delta > epsilon
3    -o {B | !edge(A, B) -o
4       new-neighbor-heat(B, A, New),
5       update(B), add-priority(B, Delta)}.
6
7 new-heat(A, New, Old),
8 fabs(New - Old) <= epsilon
9    -o {B | !edge(A, B) -o
10      new-neighbor-heat(B, A, New)}.

Figure 7.19: Coordination code for the Heat Transfer program.
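As a minimal illustration of the convergence test described above, the same δ/ε check can be written as follows in C++; the names are ours and do not come from the LM runtime.

#include <cmath>

// Per-node convergence test: returns true (and the delta) when the heat
// change is significant enough to propagate updates and priority bumps.
bool significant_change(double new_heat, double old_heat,
                        double epsilon, double &delta) {
    delta = std::fabs(new_heat - old_heat);
    return delta > epsilon;
}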

Fig. 7.20 presents the scalability results for the regular and coordinated versions. The dataset used in the experiments was already used in Section 6.4.1 and is a square grid with an inner square of high heat nodes. Comparing the coordinated version with the regular version, with 1 thread there is a 50% reduction in run time, while for 32 threads there is, on average, a 15% reduction. The increasing number of threads makes selecting the higher priority nodes less likely, since each thread is only able to select the higher priority node from its own (ever-decreasing) subgraph.

To further improve locality, we modified the second rule to avoid sending small δ values if the target node is in another thread (Fig. 7.21). We used thread-id to retrieve the thread T of node A and match the thread-id of each neighbor B against T. The comprehension in lines 10-11 only generates new-neighbor-heat facts if B is in the same thread. Note that in this version, the heat values computed by the program will have increased errors since they are computed with less accurate information. However, it is possible to write more complicated rules where nodes accumulate incoming heat values and then compute and propagate new heat values when appropriate. This would also increase locality, but without increasing the statistical error.

The results for the improved version are presented in Fig. 7.22. There is a clear improvement when compared to the previous version: when using 32 threads, there is a 25% run time reduction for the coordinated version. For the same number of threads, the line Coordinated(1)/Coordinated(t) reaches a 10-fold speedup, while the old version achieves a 9-fold speedup. However, this comes at the price of reduced accuracy in the computed heat values.

(a) 80x80 grid dataset. (b) 120x120 grid dataset.

Figure 7.20: Scalability for the Heat Transfer program when using coordination. We used the datasets from Section 6.4.1.

1 new-heat(A, New, Old),
2 Delta = fabs(New - Old), Delta > epsilon
3    -o {B | !edge(A, B) -o
4       new-neighbor-heat(B, A, New),
5       update(B), add-priority(B, Delta)}.
6
7 new-heat(A, New, Old),
8 fabs(New - Old) <= epsilon,
9 thread-id(A, T)
10    -o {B, T | !edge(A, B), thread-id(B, T) -o
11       new-neighbor-heat(B, A, New), thread-id(B, T)},
12       thread-id(A, T).

Figure 7.21: To improve locality, we add an extra constraint to the second rule to avoid sending small δ values if the target node is in another thread.

7.8.4 Remarks

We showed four applications that make use of the new coordination facilities of LM. Coordination makes it possible for programmers to fine-tune, if necessary, the scheduling and load balancing of declarative programs in the same way that regular LM computation is accomplished. In the Single Source Shortest Path program, we demonstrated that coordination makes it easy to avoid redundant computations and thus significantly reduce the run time on several, relatively large, datasets. In the MiniMax program, we used default priorities to minimize the memory usage of the program and allow the declarative program to be far more competitive with the corresponding program implemented in C++.

In the N-Queens program, we used partitioning facts to pin columns of the chess board to threads and thus improve the scalability of the program for configurations where the number of columns is equal to the number of threads.

(a) 80x80 grid dataset. (b) 120x120 grid dataset.

Figure 7.22: Scalability for the Heat Transfer program when using coordination and avoiding some communication between nodes of different threads.

The partitioning mechanism allows the programmer to improve the locality of the program by deciding where nodes and computation should be placed during execution. Finally, in the Heat Transfer program, we made use of both scheduling and partitioning facts to improve performance and reduce inter-thread communication. The sensing facts for partitioning make it trivial to selectively perform computation by reasoning about the state of the underlying parallel system.

7.9 Related Work

As mentioned in Section 2.3, Linda [ACG86] and Delirium [LS90] are two programming languages that support coordination. When compared to LM, Linda and Delirium are limited in the sense that the programmer can only coordinate the scheduling of processing units, while placement of data is left to the implementation. LM differs from those languages because coordination acts on data instead of processing units. Coordination facts as used in this chapter raise the abstraction by considering data and algorithmic aspects of the program instead of focusing on how processing units are used. Furthermore, LM is both a coordination language and a computation language, and the distinction between the two components is small.

There are also several programming models that support some kind of coordination primitives allowing explicit scheduling and load balancing of work between the available processing units, but which are not considered proper programming languages.

The Galois [PNK+11] programming model implements autonomous scheduling by default, where activities may be rolled back in case of conflict. However, it is possible to employ a concrete scheduling strategy for coordinating parallel execution in order to improve execution and avoid conflicts. First, there is compile-time coordination, where the scheduling order is computed during compilation and is pre-defined before the program is executed. Second, there is runtime coordination, where the order of activities is computed during execution. The execution of the algorithm proceeds in rounds: first, a set of non-conflicting activities is computed and

then executed by applying the operator; conflicting activities are postponed to the next round. The third and last scheduling strategy is just-in-time coordination, where the order of activities is defined by the underlying data structure on which the operator is applied (for instance, computing on a graph may depend on its topology).

In the context of the Galois model, Nguyen et al. [NP11] expanded the concept of runtime coordination with the introduction of a flexible approach to specify scheduling policies for Galois programs. This approach was motivated by the fact that some algorithms can be executed faster if computations use better scheduling strategies. The scheduling language specifies three main scheduler types: FIFO (First-In First-Out), LIFO (Last-In First-Out) and OrderedByMetric (order activities by some metric). These schedulers can be composed and synthesized without requiring users to write complex concurrent code.

Elixir [PMP12] is a domain specific language that builds on top of Galois and allows easy specification of scheduling strategies. The main idea behind Elixir is that the user should be able to specify how operator application is scheduled, and the framework will compile this high level specification to low level code using the provided scheduling specification. One of the motivating examples is the Single Source Shortest Path program, which can be specified using multiple scheduling specifications, generating different well-known shortest path algorithms such as the Dijkstra or Bellman-Ford algorithms. Unlike the work of Nguyen et al. [NP11], Elixir does not allow graph mutations.

Halide [RKBA+13] is a language and compiler for image processing pipelines with the goal of optimizing parallelism, locality and re-computation. Halide decouples the algorithm definition from its execution strategy, allowing the compiler to find which execution strategy may be the best for optimizing for locality and parallelism. The language allows the programmer to specify the scheduling strategy, deciding the order of computations, what intermediate results need to be stored, how to split the data among processing units and how to use vectorization and the well-known sliding window mechanism. The compiler is also able to use stochastic search to find good schedules for Halide pipelines. Notably, experimental results indicate that automatic search sometimes leads to better execution than hand-written code.

In contrast to the previous systems, LM stands alone in making coordination (both scheduling and partitioning) a first-class programming construct that is semantically equivalent to computation. Furthermore, LM distinguishes itself by supporting data-driven dynamic coordination, particularly for irregular data structures. Elixir and Galois do not support coordination for data partitioning, and, in Elixir, the coordination specification is separated from computation, limiting the programmability of coordination. Compared to LM, Halide is targeted at regular applications and therefore only supports compile-time coordination.

7.10 Chapter Summary

In this chapter, we presented coordination facts, a new declarative mechanism for coordinating declarative parallel programs. Coordination facts are implemented as sensing and action facts and allow the programmer to write derivation rules that change how the runtime system schedules computation and partitions the data in the parallel system, thus improving execution time. In terms of programming language design, our coordination mechanisms are

unique in the sense that they are treated like regular computation, which allows for complex run-time coordination policies that are declarative and are part of the main program's logic.

Chapter 8

Thread-Based Facts

In the previous chapter, we introduced coordination facts, a mechanism that the programmer can use to coordinate execution. While these facilities retain the implicit parallelism of the language, they do not allow the programmer to fully reason about the underlying parallel architecture, since the only reasoning allowed relates to node partitioning and data placement. In principle, it should be advantageous to reason about thread state, that is, to perform rule inference over facts stored on each thread and to allow threads to communicate and coordinate among themselves depending on their current state. This introduces a kind of explicit parallelism into the implicit parallel model of LM. However, this explicit parallelism remains declarative, so that the programmer is able to reason about thread coordination as well as regular computation.

8.1 Motivating Example: Graph Search

Consider the problem of checking if a set of nodes S in a graph G is reachable from an arbitrary node N. An obvious solution to this problem is to start at N, gather all the neighbor nodes into a list and then recursively visit all those reachable nodes, until S is covered. This reduces to performing a breadth-first or depth-first search on graph G. However, this solution is sequential and exposes little concurrency. An alternative solution is to recursively propagate the search to all neighbors and aggregate the results in the node where the search started. The code for this latter solution is shown in Fig. 8.1, where nodes are searched by analyzing their unique value facts. Each distinct reachability search is represented by a number (Id) and an initial search fact. Associated with each search Id is a list of nodes to reach. The predicate visited marks nodes that have already participated in a search, while predicate do-search is used to propagate a specific search. The first rule (lines 12-14) starts a particular search by deriving a do-search and a lookup fact. The lookup fact is used as an accumulator and is stored in the starting node. The third rule (lines 19-21) avoids visiting the same node twice in the presence of a visited fact. This visited fact is derived in the next two rules (lines 25 and 32). If the node has a value in the reachability set (ToReach), then we remove the node's value from the list and propagate the search to the neighbor nodes (line 27). Otherwise, the search is propagated but no value is removed from ToReach.

 1 type int id.                                   // Type declaration
 2 type list int reach-list.
 3
 4 type edge(node, node).                         // Predicate declaration
 5 type value(node, int).
 6 type linear search(node, id, reach-list).
 7 type linear do-search(node, id, node, reach-list).
 8 type linear lookup(node, id, reach-list, int Val).
 9 type linear new-lookup(node, id, int Val).
10 type linear visited(node, id).
11
12 search(A, Id, ToReach)                         // Rule 1: initialize search
13    -o do-search(A, Id, A, ToReach),
14       lookup(A, Id, ToReach, []).
15
16 lookup(A, Id, ToReach, Found), new-lookup(A, Id, Val) // Rule 2: new reachable node found
17    -o lookup(A, Id, remove(ToReach, Val), [Val | Found]).
18
19 do-search(A, Id, Node, ToReach),               // Rule 3: node has already seen this search
20 visited(A, Id)
21    -o visited(A, Id).
22
23 do-search(A, Id, Node, ToReach),               // Rule 4: node found and propagate search
24 !value(A, Val), Val in ToReach                 // New node was found.
25    -o visited(A, Id),
26       new-lookup(Node, Id, Val),
27       {B | !edge(A, B) -o do-search(B, Id, Node, remove(ToReach, Val))}.
28
29 do-search(A, Id, Node, ToReach),               // Rule 5: node not found and propagate search
30 !value(A, Val), ~ Val in ToReach               // Not the node we are looking for.
31    -o {B | !edge(A, B) -o do-search(B, Id, Node, ToReach)},
32       visited(A, Id).

Figure 8.1: LM code to perform reachability checking on a graph.

As an example, consider Fig. 8.2, which shows two reachability checks on a graph with 10 nodes. For instance, the search with Id = 0 starts at node @1 and checks if nodes @1, @2, and @3 are reachable from @1. Since @1 is the starting node, 1 is immediately removed from the reachable list, both in the propagated do-search facts and in the lookup fact that is stored at node @1. Once do-search reaches node @3, the value 3 is removed from the list and a new do-search is propagated to node @1 (not shown in the figure) and @2. At the same time, node @2 receives the list [2,3], removes 2 and propagates [3] to node @3 and @1. Node @1 receives two new-lookup facts, one from @3 and another from @2, due to successful searches, and the lookup fact becomes lookup(@1,0,[],[1,2,3]). The attentive reader will notice that node @1 already knows that all the nodes have been reached, yet nodes @7 and @4 will needlessly continue to check if [2,3] are reachable. This issue arises because the programmer has valued concurrency, increasing redundancy and reducing communication between nodes. It would be expensive to share reachability information between nodes. An alternative solution is to store the results of the search on the

thread performing the search and then coordinate the results with other threads, since the number of threads is usually smaller than the number of nodes. Before showing how the reachability program is solved using thread-based facts, we first present the changes to the language.


Figure 8.2: Performing reachability checks on a graph using nodes @1 (Id = 0) and @6 (Id = 1). Search with Id = 0 wants to reach nodes @1, @2, and @3 from node @1. Since @1 is part of the target nodes, the fact do-search propagated to neighbor nodes does not include 1.

8.1.1 Language Changes

In the previous chapter, we presented coordination facts such as thread-id and set-thread that already bring some awareness about the underlying parallel system. Furthermore, such facts also introduce the thread type for predicate arguments, which refers to a thread in the system that corresponds to a core in a multicore processor. We now introduce the concept of thread facts, which are logical facts stored at the thread level, meaning that each thread is now an entity with its own logical facts. The type thread is also the type of the first argument of thread predicates, indicating that facts of that predicate are related to, and stored in, a specific thread. We also view the available threads as forming a separate graph from the data graph: a graph of the processing units which are operating on the data graph.

The introduction of thread facts increases the expressiveness of the system in the sense that it is now possible to write inference rules that reason about the state of the threads. This creates optimization opportunities, since we can now write algorithms with global information stored in the thread, while keeping the LM language fully declarative. Moreover, threads are now allowed to explicitly communicate with each other and, in conjunction with coordination predicates, enable the writing of complex scheduling policies.

We discriminate between two new types of inference rules. The first type is the thread rule and has the form a(T), b(T) -o c(T); it can be read as: if thread T has facts a(T) and b(T), then derive fact c(T). The second type is the mixed rule and has the form a(T), d(N) -o e(N); it can be read as: if thread T is executing node N and has the fact a(T), and node N has the fact d(N), then derive e(N) at node N. Thread rules reason solely at the thread level, while mixed rules allow reasoning about both thread and node facts. Logically, the mixed rule uses an extra fact running(T, N), which indicates that thread T is currently executing node N. The running fact is implicitly retracted and asserted every time the thread selects a different node for execution. This makes our implementation efficient, since a thread does not need to look for nodes that match mixed rules; instead, the scheduling of the program drives the matching of such rules.
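To make the two rule types concrete, consider the following minimal sketch, which uses hypothetical predicates (task and tasks-done are illustrative and not part of any program in this chapter):

   type linear task(node).              // node fact: a pending unit of work
   type linear tasks-done(thread, int). // thread fact: per-thread counter

   // Thread rule: matches only facts stored on thread T.
   tasks-done(T, N), N >= 100 -o tasks-done(T, 0).

   // Mixed rule: while thread T executes node A (an implicit running(T, A)
   // fact), consume a task fact on A and increment T's counter.
   task(A), tasks-done(T, N) -o tasks-done(T, N + 1).

The first rule never inspects node facts, while the second fires only when the node currently scheduled on T happens to hold a task fact, exactly as described above.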

8.1.2 Graph Of Threads

Figure 8.3 represents a schematic view of the two graph data structures of a program with three threads: thread T1 is executing node @5, T2 is executing node @4, and T3 is executing node @3. Note that every thread has access to its own facts and to the node facts.


Figure 8.3: An example program being executed with three threads. Note that each thread has a running fact that stores the node currently being executed.

We added several helpful predicates that allow the programmer to inspect the graph of threads and reason about the state of computation as it relates to threads:

• !thread-list(T, L): Fact instantiated in all threads, where L is a list of all threads executing in the system.

• !other-thread(T1, T2): Connects thread T1 to all the other threads T2 executing in the system. Note that in Fig. 8.3, we use the !other-thread fact to specify the graph of threads.

• !leader-thread(T, TLeader): Fact instantiated in all threads, where TLeader refers to a selected thread (the first thread in the L of !thread-list(T, L)).

• running(T, A): Used to retrieve the current node A running on thread T.

With the exception of running, every other fact is added at the beginning of the program as a persistent fact.
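As an example of how these predicates can be used, the following hypothetical mixed rule (the done and report predicates are illustrative, not part of LM's runtime) forwards a node-level completion fact to the leader thread:

   type linear done(node).
   type linear report(thread, node).

   // The executing thread T reads its persistent !leader-thread fact
   // and derives a report fact on the leader thread.
   done(A), !leader-thread(T, TLeader) -o report(TLeader, A).

Since !leader-thread is persistent, it is not consumed, and the rule simply routes information from the data graph into the graph of threads.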

8.1.3 Reachability With Thread Facts

We now update the graph reachability program presented in Fig. 8.1 to use thread facts in order to avoid needless searches on the graph. The search process is still performed concurrently as before, but the search state is now stored in each thread, allowing the thread to store partial results and coordinate with other threads. The code for this new version is shown in Fig. 8.4.

Lines 1-4 start the search process by assigning a thread Owner to search Id using the persistent fact !thread-list, which contains the list of all available threads in the system. Next, in line 3, a fact thread-search is created for all threads using a comprehension. We use predicate do-search to propagate the search through the graph and a predicate visited to mark nodes already processed for a specific search. The two rules in lines 14-27 propagate the search process to the neighbor nodes and check if the current node is part of the list of nodes we want to reach. An interesting property of this version is that each owner thread responsible for a search keeps track of the remaining nodes that need to be reached. In line 18, we derive remove-thread-search in order to inform owner threads about new reachable nodes. Once an owner thread detects that all nodes have been reached (lines 33-36), all the other threads will learn of this and update their search state accordingly (lines 42-44). When every thread knows that all nodes were reached, they will consume do-search facts (lines 6-8), effectively pruning the search space.

An alternative implementation could force every thread to share its reached nodes with all the other threads in the system. However, this would generate a lot of traffic between threads, which would actually make the program perform worse. Our final solution is a good trade-off, since it only forces threads to coordinate when pruning can actually happen.

Figure 8.5 presents experimental results of the graph reachability program using four different datasets. Except for Random, all the datasets were already used in the MSSD program and were presented before. The Random dataset is a randomly generated graph with 50000 nodes and about a million edges. In the plots, we show the run time of the version without thread facts (Regular) and the version using thread facts, called Threads. We also show the speedup of the Threads and Regular versions against the Regular version with 1 thread. Our results indicate that using thread facts produces a significant reduction in run time. This is especially true for datasets with a large number of edges, since fewer facts are produced and propagated in the graph once the threads know that the search has been completed. The results also show that, in the case of the Facebook dataset, in which the number of queries is the same as the number of nodes, the use of thread facts does not produce great improvements, due to the costs of managing the reachability results on each thread's database. These costs are related to the need to index and look up thread-search facts on the Id argument every time a node is inspected.

The graph reachability program shows how to introduce complex coordination policies between threads by reasoning about the state of each thread. In addition, the use of linear logic programming makes it easier to prove properties of the program, since computation is done by applying controlled changes to the state.

 1 search(A, Id, ToReach),                            // Rule 1: initialize search
 2 !thread-list(T, L), Owner = nth(L, Id % @threads)  // Allocate search to a thread
 3    -o {T2 | !other-thread(T, T2) -o thread-search(T2, Id, ToReach, Owner)},
 4       do-search(A, Id).
 5
 6 thread-search(T, Id, [], Owner),                   // Rule 2: search completed
 7 do-search(A, Id)
 8    -o thread-search(T, Id, [], Owner).
 9
10 do-search(A, Id),                                  // Rule 3: node already visited
11 visited(A, Id)
12    -o visited(A, Id).
13
14 do-search(A, Id),                                  // Rule 4: node found
15 thread-search(T, Id, ToReach, Owner),
16 !value(A, Val), Val in ToReach
17    -o thread-search(T, Id, remove(ToReach, Val), Owner),
18       remove-thread-search(Owner, Id, Val),        // Tell owner thread about it.
19       {B | !edge(A, B) -o do-search(B, Id)},
20       visited(A, Id).
21
22 do-search(A, Id),                                  // Rule 5: node not found but propagate search
23 thread-search(T, Id, ToReach, Owner),
24 !value(A, Val), ~ Val in ToReach
25    -o thread-search(T, Id, ToReach, Owner),
26       visited(A, Id),
27       {B | !edge(A, B) -o do-search(B, Id)}.
28
29 remove-thread-search(T, Id, Val), thread-search(T, Id, ToReach, Owner) // Rule 6: node found
30    -o thread-search(T, Id, remove(ToReach, Val), Owner),
31       check-results(T, Id).
32
33 check-results(T, Id),                              // Rule 7: search is completed
34 thread-search(T, Id, [], Owner)
35    -o thread-search(T, Id, [], Owner),
36       {B | !other-thread(T, B) -o signal-thread(B, Id)}.
37
38 check-results(T, Id),                              // Rule 8: search not completed yet
39 thread-search(T, Id, ToReach, Owner), ToReach <> []
40    -o thread-search(T, Id, ToReach, Owner).
41
42 signal-thread(T, Id),                              // Rule 9: thread knows search is done
43 thread-search(T, Id, ToReach, Owner)
44    -o thread-search(T, Id, [], Owner).

Figure 8.4: Coordinated version of the reachability checking program. Note that @threads represents the number of threads in the system.

(a) Facebook has 2000 nodes and 20000 edges. The dataset makes 2000 graph queries to 5% of the graph's nodes.

(b) Twitter has 81306 nodes and 1768149 edges. The dataset makes 100 graph queries to 1% of the graph's nodes.

(c) Random is a graph with 50000 nodes and 1052674 edges. The dataset makes 20 graph queries to 5% of the graph's nodes.

(d) Pokec has 1632803 nodes and 30622564 edges. The dataset makes 3 graph queries to 1% of the graph's nodes.

Figure 8.5: Measuring the performance of the graph reachability program when using thread facts.

8.2 Implementation Changes

To support thread-based facts, both the compilation and runtime system described in Chapter 6 require some changes.

8.2.1 Compiler

The compiler needs to recognize rules that use thread facts. For thread rules, the compiler checks that the rule's body uses facts from the same thread by checking the first argument of each fact. For mixed rules, the rule's body may use a thread T and a node A: all node facts have to use A as the first argument, while all thread facts must use T. If the programmer needs to relate the thread and the node of the current computation, she may use running(T, A). Once rules are type checked, the iteration code for thread-based facts needs to be adapted. When a rule requires facts from the thread, it must use the data structures from the thread. The runtime API used for inserting thread facts is also different, since such facts have to be added to the thread's database.
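As an illustration of these checks, the following hypothetical mixed rule (with illustrative predicates f, d, and e) type-checks because every thread fact uses T as its first argument and every node fact uses A:

   type linear f(thread, int).
   type linear d(node, int).
   type linear e(node, int).

   // All thread facts use T and all node facts use A, so this mixed
   // rule passes the compiler's first-argument check.
   f(T, X), d(A, Y) -o f(T, X), e(A, X + Y).

A body that mixed facts from two different nodes (or two different threads) would be rejected.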

8.2.2 Runtime

In the runtime system, thread-based facts are implemented just like regular facts. Each thread has its own database of facts and uses exactly the same data structures for facts, as presented before for regular nodes. The major difference between a regular node and a thread node is that a thread node is never put into the work queue of its thread. As shown in the updated work loop presented in Fig. 8.6, the thread node executes alongside the regular node when TH.process_node is called. It is also important to note that, before a thread becomes idle, it may have candidate thread rules that are now derivable because another thread has derived thread facts on the current thread. In particular, it is entirely possible to have programs that only deal with thread facts (a small sketch is given after this discussion).

Thread-based facts also introduce new synchronization logic in the runtime system. For instance, when a rule derives a new thread fact on another thread, it needs to synchronize with that thread (using the appropriate thread node locks) to add the facts to the thread's database. When a thread is executing its own node or a regular node, it also must lock the thread node's DB Lock in order to protect its data structures from being manipulated by other threads concurrently.

Matching rules that use thread facts requires special care, since they may require both facts from the regular node and from the thread's node. Before a node is executed, the rule engine (Section 5.3) of the regular node is updated to take into account the facts of the thread's node, so that mixed rules (rules that use both thread and regular facts) execute. In this scheme, mixed rules may repeatedly fail to fire until a node with matching facts gets to execute. Since the LHS of mixed rules uses an implicit running(T, N) fact, it is enough for a different node to be running to fire mixed rules, as long as the running node has the required facts. Without the implicit running fact, the system would need to search for a regular node that would successfully activate a given mixed rule. This would be expensive, since some programs may have millions of regular nodes.
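To see why a program may deal only with thread facts, consider this hypothetical two-rule sketch (the ping and pong predicates are illustrative): no node facts appear at all, so the work loop in Fig. 8.6 must still give the thread node a chance to run even when the work queue is empty:

   type linear ping(thread).
   type linear pong(thread).

   // A thread holding a ping forwards a pong to some other thread.
   ping(T), !other-thread(T, T2) -o pong(T2).
   // Receiving a pong turns it back into a ping, continuing the exchange.
   pong(T) -o ping(T).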

Data: Thread TH
while true do
   TH.work_queue.lock();
   node ← TH.work_queue.pop_node();
   TH.work_queue.unlock();
   if node then
      TH.process_node(node, TH.thread_node);
   else
      // The thread's node may have candidate rules using incoming thread facts
      TH.process_node(nil, TH.thread_node);
      // Attempt to steal some nodes.
      if !TH.steal_nodes() then
         TH.become_idle();
         while len(TH.work_queue) == 0 do
            // Try to terminate
            if TH.synchronize_termination() then
               terminate;
            end
            if TH.steal_nodes() then
               // Thread is still in the stealing state
               break;
            end
         end
         // There's new nodes in the queue.
         TH.become_active();
      end
   end
end

Figure 8.6: Thread work loop updated to take into account thread-based facts. New or modified code is underlined.

8.3 Applications

This section presents more applications that demonstrate the usefulness and power of thread-based facts.

8.3.1 Binary Search Trees: Caching Results

In Section 3.1.2, we presented an algorithm for replacing a key's value in a BST dictionary. To make the program more interesting, we consider a sequence of n lookup or replace operations for different keys in the BST (which may or may not be repeated). A single lookup or replace has worst-case time complexity O(h), where h is the height of the BST, therefore performing n operations takes O(h × n) time. In order to reduce the execution time of the new program, we can cache the search and replace operations so that repeated operations become faster. Instead of traversing the entire height of the BST, we look in the cache and send the operation immediately to the node where the key is located. Without thread facts, we might have cached the results at the root node; however, that is not a scalable approach, as it would introduce a serious bottleneck.

Figure 8.7 shows the updated BST code with a thread cache. We just added two more predicates, cache and cache-size, which are facts placed in the thread and represent cached keys and the total size of the cache, respectively. We also added three new rules that handle the following cases:

1. A key is found and is also in the cache (lines 8-12 in Fig. 8.7);

2. A key is found but is not in the cache (lines 14-19 in Fig. 8.7);

3. A key is in the cache, therefore a replace fact is derived in the target node (lines 21-24).

In Appendix G.4 we prove several properties about the extended BST program. Also note that it is quite easy to extend the cache mechanism to use an LRU-type approach in order to limit the size of the cache (a sketch of such an eviction rule is shown after the dataset description below).

In order to understand the effectiveness of the new cached version of the program, we created synthetic datasets that stress the algorithm using a complete binary search tree and an arbitrary number of (repeated) tree operations such as replace and lookup. The tree operations apply to a subset of the tree and each node is associated with a list of operations that need to be performed. We have auxiliary nodes (not connected to the tree) that coordinate the tree operations for a given node. An auxiliary node will start the first operation by sending it to the tree root node. Once the operation is performed on the target node, the auxiliary node will start the next operation in the list for that target node. By using auxiliary nodes, we are able to experiment with operations that target the same node and also allow for some parallelism when doing tree operations. In our experiments, we generated two trees with the following characteristics:

• 19 levels: a tree with 19 levels (2^20 − 1 nodes) and two million operations. Each targeted node has, on average, 40 operations.

• 21 levels: a tree with 21 levels (2^22 − 1 nodes) and five million operations. Each targeted node has, on average, 20 operations.

Note that in both datasets, the operations target 10% of the tree nodes.
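Returning to the LRU remark above: a minimal sketch of a size-bounding eviction rule, assuming a hypothetical maxcache program constant (a true LRU policy would also track access order, which we omit here):

   // When the cache grows beyond maxcache, evict one cached entry.
   // Because cache facts are linear, consuming the fact without
   // re-deriving it removes the entry from the thread's database.
   cache-size(T, Total), Total > maxcache,
   cache(T, TargetNode, Key)
      -o cache-size(T, Total - 1).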

Table 8.1 presents fact statistics for the Regular version (without a cache) and the Cached version (with a cache). Figure 8.8 presents the run time scalability of both versions using Regular with 1 thread as the baseline. In terms of facts, the Cached version sees an 80% reduction in the number of derived facts for 1 thread. When the number of threads increases, more facts are derived, since there is less opportunity to cache tree nodes when more threads are doing more work at the same time. For instance, when using 32 threads, the reduction is only about 30%. When comparing these reduction numbers to the reduction in run time, we see the same pattern, but the reduction is not quite as large. For instance, when using 1 thread, the Cached version is only able to reduce the run time by about 50%, and for 32 threads, only a 12% reduction is seen. Threads need to index and look up many cache items using the node's key. This process is somewhat expensive, since threads will look up keys as each operation travels down the tree looking for a node. For instance, if a thread does not have a given node in the cache, it may need to perform about h lookups (where h is the height of the tree) into the database hash tree before the operation arrives at its destination. However, a given tree operation is usually not handled by just one thread as it travels through the tree, and another thread may have the target node in its cache.

                         # Derived           # Deleted           # Final
Dataset      Threads   Regular  Cached     Regular  Cached     Regular  Cached
19 levels    1         45.6M    23%        42.4M    17%        3.1M     3.2M
             2         45.6M    30%        42.4M    24%        3.1M     3.2M
             4         45.6M    35%        42.4M    30%        3.1M     3.2M
             8         45.6M    38%        42.4M    33%        3.1M     3.2M
             16        45.6M    51%        42.4M    48%        3.1M     3.2M
             24        45.6M    64%        42.4M    62%        3.1M     3.2M
             32        45.6M    70%        42.4M    67%        3.1M     3.2M
21 levels    1         126.5M   24%        116.9M   18%        9.6M     9.8M
             2         126.5M   28%        116.9M   22%        9.6M     9.8M
             4         126.5M   34%        116.9M   28%        9.6M     9.8M
             8         126.5M   41%        116.9M   36%        9.6M     9.8M
             16        126.5M   50%        116.9M   46%        9.6M     9.8M
             24        126.5M   58%        116.9M   54%        9.6M     9.8M
             32        126.5M   66%        116.9M   63%        9.6M     9.8M

Table 8.1: Fact statistics for the Regular version and the Cached version. The first two Cached columns show the ratio of the Cached version over the Regular version. Percentages less than 100% mean that the Cached version produces fewer facts.

 1 type left(node, node).              // Predicate declaration
 2 type right(node, node).
 3 type linear value(node, int Key, string Value).
 4 type linear replace(node, int Key, string Value).
 5 type linear cache(thread, node, int).
 6 type linear cache-size(thread, int).
 7
 8 replace(A, Key, RValue),            // Rule 1: key exists and is also in the cache
 9 value(A, Key, Value),
10 cache(T, A, Key)
11    -o value(A, Key, RValue),
12       cache(T, A, Key).
13
14 replace(A, Key, RValue),            // Rule 2: key exists and is not in the cache
15 value(A, Key, Value),
16 cache-size(T, Total)
17    -o value(A, Key, RValue),
18       cache-size(T, Total + 1),
19       cache(T, A, Key).
20
21 replace(A, RKey, RValue),           // Rule 3: cached by the thread
22 cache(T, TargetNode, RKey)
23    -o replace(TargetNode, RKey, RValue),
24       cache(T, TargetNode, RKey).
25
26 replace(A, RKey, RValue),           // Rule 4: go left
27 value(A, Key, Value),
28 !left(A, B),
29 RKey < Key
30    -o value(A, Key, Value),
31       replace(B, RKey, RValue).
32
33 replace(A, RKey, RValue),           // Rule 5: go right
34 value(A, Key, Value),
35 !right(A, B),
36 RKey > Key
37    -o value(A, Key, Value),
38       replace(B, RKey, RValue).
39
40 cache-size(T, 0).                   // Initial cache size of each thread

Figure 8.7: LM program for performing lookups in a BST extended to use a thread cache.

Figure 8.8: Scalability for the Binary Search Tree program when using thread-based facts.

8.3.2 PowerGrid Problem

Consider a power grid with C consumers and G generators. We are interested in connecting each consumer to a single generator, but each generator has a limited capacity and each consumer draws a certain amount of power from its generator. A valid power grid is built in such a way that all consumers are serviced by a generator and no generator is overdrawn by too many consumers. Although consumers and generators may be connected through a complex network, we analyze the simple case where any consumer is able to attach to any generator.

A straightforward distributed implementation for the PowerGrid problem requires that each consumer is able to connect to any generator. Once a generator receives a connection request, it may or may not accept it. If the generator has no power available for the new consumer, it will disconnect from it and the consumer must select another generator. This randomized algorithm works but may take a long time to converge, depending on the amount of power available in the generators. Figure 8.10 shows the LM code for this solution.

Consumer and generator node types are declared in lines 1-2 using the node declaration, allowing us to have different predicates for consumers and generators. The consumer and generator types become subtypes of node, that is, consumer <: node and generator <: node. These subtypes allow us to declare initial facts that only apply to either the consumer or generator subtype.

Figure 8.9: Configuration of a power grid with 6 consumers, 4 generators and 2 threads, with each thread responsible for 2 generators.

An example PowerGrid configuration with its initial facts is presented in Fig. 8.11. Consumers have a persistent fact !power(A, P), where P is the amount of power required by the consumer. Consumers also start with a reconnect fact that is used in lines 39-42 to randomly select a generator from the list L in the !generators(A, L) fact. Generators have a connected-to-list(A, L) fact that manages the list of connected consumers. The generator fact capacity(A, Total, Used) stores the Total capacity of the generator and the amount of power currently being Used (Used ≤ Total at any point in the program).

Consumers and generators complete a connection when the generator receives a connect fact, which is handled in lines 28-32 when the generator has enough power for the new consumer. When there is not enough power (Used + Power > Total), the generator disconnects the consumer in lines 34-37. Note that each generator maintains a fails fact that counts the number of times consumers have failed to connect. If there are too many failures, then the generator decides to disconnect one already connected consumer in lines 16-26, allowing for different combinations

 1 node generator.                                  // Type declaration
 2 node consumer.
 3 type linear capacity(generator, int Total, int Used). // Predicate declaration
 4 type linear connected-to(generator, consumer, int).
 5 type linear connected-to-list(generator, list consumer).
 6 type power(consumer, int).
 7 type linear disconnected(consumer).
 8 type linear connected(consumer, generator).
 9 type generators(consumer, list generator).
10 type linear fails(generator, int).
11 type linear random-reconnect(generator).
12 type linear reconnect(consumer).
13 type linear connect(generator, consumer, int).
14 type linear disconnect(consumer, generator).
15
16 fails(G, Fails), Fails > maxfails                // Rule 1: disconnect one consumer
17    -o random-reconnect(G).
18
19 capacity(G, Total, Used), random-reconnect(G),   // Rule 2: disconnect one consumer
20 connected-to-list(G, L), L <> [], C = nth(L, randint(length(L))),
21 connected-to(G, C, Power)
22    -o fails(G, 0), capacity(G, Total, Used - Power),
23       connected-to-list(G, remove(L, C)), disconnect(C, G).
24
25 capacity(G, Total, Used), random-reconnect(G)    // Rule 3: unable to disconnect one consumer
26    -o capacity(G, Total, Used), fails(G, 0).
27
28 connect(G, C, Power), capacity(G, Total, Used),  // Rule 4: connect consumer
29 fails(G, Fails), connected-to-list(G, L), Used + Power <= Total
30    -o capacity(G, Total, Used + Power),
31       fails(G, max(Fails - 1, 0)), connected-to(G, C, Power),
32       connected-to-list(G, [C | L]).
33
34 connect(G, C, Power), capacity(G, Total, Used),  // Rule 5: unable to connect consumer
35 Used + Power > Total, fails(G, Fails)
36    -o capacity(G, Total, Used), disconnect(C, G),
37       fails(G, Fails + 1).
38
39 !generators(C, L), !power(C, Power),             // Rule 6: connect to a generator
40 reconnect(C), disconnected(C),
41 G = nth(L, randint(num-generators))
42    -o connected(C, G), connect(G, C, Power).
43
44 disconnect(C, G), connected(C, G)                // Rule 7: finish disconnection
45    -o disconnected(C), reconnect(C).
46
47 connected-to-list(G, []). fails(G, 0).           // Initial facts
48 disconnected(C). reconnect(C). !generators(C, all-generators).

Figure 8.10: LM code for the regular PowerGrid program.

to happen. In Fig. 8.12 we present the final database of the example PowerGrid configuration, which shows that all consumers have been able to find a suitable generator. Note that in this configuration the total generator capacity (15 + 15 + 10 + 5 = 45) exactly matches the total consumer demand (5 + 10 + 5 + 10 + 10 + 5 = 45), which is why every generator ends up fully used.

 1 const generators = [@7, @8, @9, @10].
 2
 3 reconnect(@1). !generators(@1, generators). !power(@1, 5).
 4 reconnect(@2). !generators(@2, generators). !power(@2, 10).
 5 reconnect(@3). !generators(@3, generators). !power(@3, 5).
 6 reconnect(@4). !generators(@4, generators). !power(@4, 10).
 7 reconnect(@5). !generators(@5, generators). !power(@5, 10).
 8 reconnect(@6). !generators(@6, generators). !power(@6, 5).
 9
10 connected-to-list(@7, []). capacity(@7, 15, 0). fails(@7, 0).
11 connected-to-list(@8, []). capacity(@8, 15, 0). fails(@8, 0).
12 connected-to-list(@9, []). capacity(@9, 10, 0). fails(@9, 0).
13 connected-to-list(@10, []). capacity(@10, 5, 0). fails(@10, 0).

Figure 8.11: Initial facts for a PowerGrid configuration of 6 consumers and 4 generators.

 1 connected(@1, @7). !power(@1, 5).
 2 connected(@2, @7). !power(@2, 10).
 3 connected(@3, @8). !power(@3, 5).
 4 connected(@4, @8). !power(@4, 10).
 5 connected(@5, @9). !power(@5, 10).
 6 connected(@6, @10). !power(@6, 5).
 7
 8 connected-to-list(@7, [@1, @2]). connected-to(@7, @1, 5). connected-to(@7, @2, 10).
 9 connected-to-list(@8, [@3, @4]). connected-to(@8, @3, 5). connected-to(@8, @4, 10).
10 connected-to-list(@9, [@5]). connected-to(@9, @5, 10).
11 connected-to-list(@10, [@6]). connected-to(@10, @6, 5).
12 capacity(@7, 15, 15).
13 capacity(@8, 15, 15).
14 capacity(@9, 10, 10).
15 capacity(@10, 5, 5).

Figure 8.12: Final facts for a PowerGrid configuration of 6 consumers and 4 generators.

The issue with the initial implementation presented in Fig. 8.10 is that it lacks a global view of the problem, which introduces inefficiencies and extra communication between consumers and generators. A better algorithm would require a more sophisticated communication pattern between the nodes. As we have seen before, thread-local facts are an excellent mechanism to introduce a global view of the problem without complicating the original algorithm written in a declarative style. For our solution, we partition the set of generators G among the threads in the system and make each thread assume ownership of its generators. Each thread can then process consumers with a global view over its set of generators, allowing the immediate assignment of consumers to generators. Figure 8.9 shows how the configuration presented previously is adjusted to take into account the number of available threads.

The LM code using thread facts is shown in Fig. 8.13. It uses four thread predicates: predicates thread-connected-to and thread-connected-to-list assign consumers to the generators

owned by the thread; thread-capacity stores the capacity of each generator assigned to the thread; and predicate thread-total-capacity provides a capacity overview of all the generators owned by the thread. The program starts with the rule in lines 7-8 by moving generators to their corresponding threads using set-thread. Once the generator is executing on the proper thread, the rule in lines 10-12 is derived with the just-moved coordination fact and the state of the thread is initialized. Consumers connect to the thread's generators with the rule in lines 37-43 by selecting the first generator with enough power and then updating the state of the thread. Otherwise, if the thread does not have a suitable generator, a generator is randomly selected using the method described before. For such cases, the thread assigned to the selected generator will derive the rules in lines 26-35 and the thread state is updated accordingly.

The experimental results for the PowerGrid program are presented in Fig. 8.14. In the plots, we show the run time of the version without thread facts, named Regular, and the version using thread facts, called Threads. We also show the speedup of the Threads and Regular versions against the Regular version with 1 thread. We experimented with four datasets:

1. 0.5M C / 2000 G: Half a million consumers connected to 2000 generators. This is the baseline dataset.

2. 0.5M C / 64 G: Half a million consumers connected to 64 generators. This is similar to the previous dataset, but the number of generators is much smaller.

3. 1M C / 4000 G / Low Variability: 1 million consumers connected to 4000 generators. The capacity of each generator has low variability, so that each generator has a similar capacity.

4. 1M C / 4000 G / High Variability: 1 million consumers connected to 4000 generators. The capacity of each generator has high variability, so that some generators have no capacity while others have up to three times the average capacity.

All the datasets were generated randomly so that the total capacity of the generators is about 2% more than the required consumer capacity. The capacity of each generator is generated randomly and, except for the last two datasets, the capacity is between 0 and twice the average capacity (total capacity divided by the number of generators).

The first important observation relates to the first two datasets. In 0.5M C / 2000 G, the Threads version has more than a 150-fold speedup for 32 threads, while the 0.5M C / 64 G dataset only reaches a 70-fold speedup. Having only 64 generators instead of 2000 makes it harder to scale the program, since there are only a few generators to distribute among threads. However, it must be noted that the 0.5M C / 64 G dataset performs proportionally faster when using 1 thread than the 2000 G dataset, since that one thread has a small number of generators to choose from when processing consumers. The rule that assigns generators to consumers (lines 26-32 in Fig. 8.13) needs to iterate through all the generators to find one with enough power to handle the consumer; therefore, a smaller number of generators is a clear advantage over having many generators when using just 1 thread.

The second important observation relates to the last two datasets, where we experimented with a variable capacity for the generators.
In the Low Variability dataset, the generators have similar capacities, while in the High Variability dataset, generators have more variable capacities, which should make it harder for the algorithm to find a valid generator/consumer assignment. Our results show exactly that: the Low Variability dataset shows a small difference between the

 1 type linear thread-connected-to(thread, generator, consumer, int). // Predicate declaration
 2 type linear thread-connected-to-list(thread, generator, list consumer).
 3 type linear thread-capacity(thread, generator, int, int).
 4 type linear thread-total-capacity(thread, int, int).
 5 type linear start(generator).
 6
 7 start(G), !generator-id(G, Id)
 8    -o set-thread(G, Id % @threads).
 9
10 just-moved(G), capacity(G, Total, Used), thread-total-capacity(T, TotalCapacity, TotalUsed)
11    -o thread-connected-to-list(T, G, []), thread-capacity(T, G, Total, Used),
12       thread-total-capacity(T, Total + TotalCapacity, Used + TotalUsed).
13
14 fails(G, Fails), Fails > maxfails
15    -o random-reconnect(G).
16
17 thread-capacity(T, G, Total, Used), thread-total-capacity(T, TotalCapacity, TotalUsed),
18 random-reconnect(G), thread-connected-to-list(T, G, L), L <> [], C = nth(L, randint(length(L))),
19 thread-connected-to(T, G, C, Power), NewUsed = Used - Power
20    -o fails(G, 0), thread-capacity(T, G, Total, NewUsed),
21       thread-total-capacity(T, TotalCapacity, TotalUsed - Power),
22       thread-connected-to-list(T, G, remove(L, C)), disconnect(C, G).
23
24 random-reconnect(G) -o fails(G, 0).
25
26 connect(G, C, Power), thread-total-capacity(T, TotalCapacity, TotalUsed),
27 thread-capacity(T, G, Total, Used), fails(G, Fails), thread-connected-to-list(T, G, L),
28 NewUsed = Used + Power, NewUsed <= Total
29    -o thread-capacity(T, G, Total, NewUsed),
30       thread-total-capacity(T, TotalCapacity, TotalUsed + Power),
31       fails(G, max(Fails - 1, 0)), thread-connected-to(T, G, C, Power),
32       thread-connected-to-list(T, G, [C | L]).
33
34 connect(G, C, Power), thread-capacity(T, G, Total, Used), Used + Power > Total, fails(G, Fails)
35    -o thread-capacity(T, G, Total, Used), disconnect(C, G), fails(G, Fails + 1).
36
37 !power(C, Power), reconnect(C), disconnected(C), thread-total-capacity(T, TotalCapacity, TotalUsed),
38 TotalNewUsed = TotalUsed + Power, TotalNewUsed <= TotalCapacity,
39 thread-capacity(T, G, Total, Used), thread-connected-to-list(T, G, ConnectList),
40 NewUsed = Used + Power, NewUsed <= Total
41    -o connected(C, G), thread-capacity(T, G, Total, NewUsed),
42       thread-connected-to-list(T, G, [C | ConnectList]), thread-connected-to(T, G, C, Power),
43       thread-total-capacity(T, TotalCapacity, TotalNewUsed).
44
45 !generators(C, L), !power(C, Power), reconnect(C), disconnected(C), G = nth(L, randint(num-generators))
46    -o connected(C, G), connect(G, C, Power).
47
48 disconnect(C, G), connected(C, G)
49    -o disconnected(C), reconnect(C).
50
51 thread-total-capacity(T, 0, 0).                  // Initial facts
52 fails(G, 0). disconnected(C). reconnect(C). !generators(C, all-generators). start(G).

Figure 8.13: LM code for the optimized PowerGrid program.

Regular and Threads versions, while in the High Variability dataset, the Threads version is much faster than the Regular version. However, for the High Variability dataset, we were expecting a speedup closer to that of the 0.5M C / 2000 G dataset, since the number of generators is much higher. Furthermore, if we compare the run times of the Regular and Threads versions when using 1 thread, we notice that the Threads version is actually slower. As noted before, this may be due to the fact that the LM rule for assigning generators to consumers needs to perform a


Figure 8.14: Measuring the performance of the PowerGrid program when using thread facts.

linear scan over the available generators to find a suitable one, which negatively impacts performance. This is a clear drawback of the logic programming model that could potentially be solved by maintaining a sorted list of thread-capacity facts.

In Table 8.2, we present several fact statistics that compare the Regular version with the Threads version when executing with multiple threads. The # Derived column indicates the number of derived facts, # Deleted indicates the number of retracted facts, while # Final is the number of facts in the database after the program terminates. The results clearly show that using thread-based facts leads to a decrease in the number of generated facts, which is more significant in the 0.5M C / 2000 G dataset (10-fold reduction). The table also explains why this dataset performs much better than the 1M C / 4000 G High Variability dataset, which only sees a 2-fold reduction in derived facts. When comparing the number of facts derived for different numbers of threads, the overall trend indicates that having more threads slightly increases the number of derived facts. This is especially true for the 0.5M C / 64 G dataset, where twice as many facts are generated when using 32 threads compared to 1 thread.

8.3.3 Splash Belief Propagation

Approximation algorithms can obtain significant benefits from customized scheduling policies, since they follow important statistical properties and thus can trade correctness for faster convergence. An example of such an algorithm is Loopy Belief Propagation (LBP) [MWJ99]. LBP is an approximate inference algorithm used in graphical models with cycles, which employs a sum-product message passing scheme where nodes exchange messages with their immediate neighbors and apply computations to the messages received.


Figure 8.15: LBP communication patterns. new-neighbor-belief facts are sent to the neighborhood when the node's belief value is updated. If the new value sent to a neighbor differs significantly from the value sent before the current round, then an update fact is also sent (to the node above and below in this case).

LBP is an algorithm that maps well to the graph-based model of LM. The original algorithm computes the belief of all nodes using several iterations, with synchronization between iterations. However, it is possible to avoid the synchronization step if we take advantage of the fact that LBP

will converge even when using an asynchronous approach. So, instead of computing the belief iteratively, we keep track of all messages sent/received (and overwrite them when we receive a new one) and recompute the belief asynchronously. Figure 8.15 presents the communication patterns of the program, while Fig. 8.16 presents the LM code for the asynchronous version of LBP.

                                   # Derived           # Deleted           # Final
Dataset             Threads   Regular  Threads    Regular  Threads    Regular  Threads
0.5M C / 2000 G     1         49.7M    8%         48.2M    5%         2.5M     2.5M
                    2         48.5M    8%         47.7M    5%         2.5M     2.5M
                    4         49.4M    8%         47.9M    5%         2.5M     2.5M
                    8         49.4M    8%         47.9M    5%         2.5M     2.5M
                    16        50.8M    8%         49.3M    5%         2.5M     2.5M
                    24        49.8M    8%         48.3M    5%         2.5M     2.5M
                    32        49.3M    9%         47.8M    6%         2.5M     2.5M
0.5M C / 64 G       1         20.2M    20%        18.5M    13%        2.5M     2.5M
                    2         19.9M    21%        18.4M    14%        2.5M     2.5M
                    4         20.7M    22%        18.5M    15%        2.5M     2.5M
                    8         19.9M    28%        18.4M    22%        2.5M     2.5M
                    16        19.8M    34%        18.3M    28%        2.5M     2.5M
                    24        19.7M    42%        18.2M    38%        2.5M     2.5M
                    32        19.9M    45%        18.4M    40%        2.5M     2.5M
1M C / 4000 G       1         9.4M     80%        6.4M     70%        5.1M     5.1M
Low Variability     2         9.4M     79%        6.4M     70%        5.1M     5.1M
                    4         9.4M     79%        6.4M     70%        5.1M     5.1M
                    8         9.4M     80%        6.4M     71%        5.1M     5.1M
                    16        9.4M     80%        6.4M     71%        5.1M     5.1M
                    24        9.4M     81%        6.3M     72%        5.1M     5.1M
                    32        9.3M     80%        6.3M     71%        5.1M     5.1M
1M C / 4000 G       1         16.4M    53%        13.4M    43%        5.1M     5.1M
High Variability    2         16.5M    53%        13.5M    43%        5.1M     5.1M
                    4         16.5M    53%        13.4M    43%        5.1M     5.1M
                    8         16.5M    54%        13.5M    43%        5.1M     5.1M
                    16        16.5M    53%        13.5M    43%        5.1M     5.1M
                    24        16.5M    54%        13.5M    44%        5.1M     5.1M
                    32        16.5M    54%        13.5M    44%        5.1M     5.1M

Table 8.2: Measuring the reduction in derived facts when using thread-based facts. The first two Threads columns show the ratio of the Threads version over the Regular version. Percentages less than 100% mean that the Threads version produces fewer facts.

 1 type list float belief.                          // Type declaration.
 2
 3 type potential(node, belief).                    // Predicate declaration
 4 type edge(node, node).
 5 type linear neighbor-belief(node, node, belief).
 6 type linear new-neighbor-belief(node, node, belief).
 7 type linear sent-neighbor-belief(node, node, belief).
 8 type linear check-residual(node, float, node).
 9 type linear belief(node, belief).
10 type linear update-messages(node, belief).
11 type linear update(node).
12
13 neighbor-belief(A, B, Belief),                   // Rule 1: update neighbor belief value
14 new-neighbor-belief(A, B, NewBelief)
15    -o neighbor-belief(A, B, NewBelief).
16
17 check-residual(A, Residual, B),                  // Rule 2: check residual
18 Residual > bound
19    -o update(B).
20
21 check-residual(A, _, _) -o 1.                    // Rule 3: check residual
22
23 update-messages(A, NewBelief)                    // Rule 4: compute belief to be sent to a neighbor node
24    -o {B, OldIn, OldOut, Cavity, Convolved, OutMessage, Residual |
25        !edge(A, B),
26        neighbor-belief(A, B, OldIn),
27        sent-neighbor-belief(A, B, OldOut),
28        Cavity = normalize(divide(NewBelief, OldIn)),
29        Convolved = normalize(convolve(global-potential, Cavity)),
30        OutMessage = damp(Convolved, OldOut, damping),
31        Residual = residual(OutMessage, OldOut)
32           -o check-residual(A, Residual, B),
33              new-neighbor-belief(B, A, OutMessage),
34              neighbor-belief(A, B, OldIn),
35              sent-neighbor-belief(A, B, OutMessage)}.
36
37
38 update(A), update(A) -o update(A).               // Rule 5: prune redundant update operations
39
40 update(A),                                       // Rule 6: initiate update operation
41 !potential(A, Potential),
42 belief(A, MyBelief)
43    -o [sum Potential => Belief; B, Belief |
44        neighbor-belief(A, B, Belief) -o
45        neighbor-belief(A, B, Belief) ->
46        Normalized = normalizestruct(Belief),
47        update-messages(A, Normalized), belief(A, Normalized)].

Figure 8.16: LM code for the asynchronous version of the Loopy Belief Propagation problem.

Belief values are arrays of floats and are represented by belief/2 facts. The first rule (lines 13-15) updates a given neighbor belief whenever a new belief value is received. This is the highest priority rule, since we want to update the neighbor beliefs before doing anything else. In order to store the belief values of the neighbor nodes, we use neighbor-belief/3 facts, where the second argument is the neighbor address and the third argument is the belief value.

The last two rules (lines 37-47) update the belief value of a node. An update fact starts the process. The first rule (line 38) simply removes redundant update facts and the second rule (lines 40-47) performs the belief update by aggregating all the neighbor belief values. The aggregate in lines 43-47 also derives copies of the neighbors' beliefs, which need to be consumed in order to compute the belief value that is going to be sent to the target neighbor. The aggregate uses a custom accumulator that takes two arrays and adds the floating point numbers at each index.

The rule in lines 23-35 iterates through the neighbor belief values and sends new belief values by performing the appropriate computations on the new belief value of the current node and on the belief value sent previously. For each neighbor update, we also check in lines 17-21 if the change in belief values is greater than bound (a program constant) and, if so, force the neighbor nodes to update their belief values by deriving update(B). This allows neighbor nodes to use updated neighbor values and recompute their own belief values using more up-to-date information. The computation of belief values will then converge to the true belief values, independently of the node scheduling used. However, if we prioritize nodes that receive new neighbor belief values with a larger Residual, then we may converge faster. Figure 8.17 shows the fourth rule modified with an add-priority fact, which increases the priority of neighbor nodes when the source node has large changes in its belief value.

 1 update-messages(A, NewBelief)                    // Rule 4: compute belief to be sent to a neighbor node
 2    -o {B, OldIn, OldOut, Cavity, Convolved, OutMessage, Residual |
 3        !edge(A, B),
 4        neighbor-belief(A, B, OldIn),
 5        sent-neighbor-belief(A, B, OldOut),
 6        Cavity = normalize(divide(NewBelief, OldIn)),
 7        Convolved = normalize(convolve(global-potential, Cavity)),
 8        OutMessage = damp(Convolved, OldOut, damping),
 9        Residual = residual(OutMessage, OldOut)
10           -o check-residual(A, Residual, B),
11              new-neighbor-belief(B, A, OutMessage),
12              neighbor-belief(A, B, OldIn),
13              add-priority(B, Residual),
14              sent-neighbor-belief(A, B, OutMessage)}.

Figure 8.17: Extending the LBP program with priorities.

The proposed asynchronous approach has been shown to be an improvement over the synchronous version because it leads to faster convergence. An improved evaluation strategy is Splash Belief Propagation (SBP) [GLG09], where belief values are computed asynchronously by first building a tree and then updating the beliefs of each node twice, first from the leaves to the root and then from the root to the leaves. These splash trees are built by starting at a node whose

belief changed the most in the last update. The trees must be built iteratively until convergence is achieved.

In an environment with T threads, it is possible to build T splash trees concurrently. First, we partition the nodes into T regions and then assign each region to a thread. A thread is then responsible for iteratively building splash trees on that region until convergence is reached. Figure 8.18 shows a grid of nodes that has been partitioned into two regions where splash trees will be built. To build a splash tree, a thread starts from the highest priority node (the tree's root) in its region and then performs a breadth-first search from that node to construct the rest of the tree. The belief values are then computed in order.


Figure 8.18: Creating splash trees using two threads. The graph is partitioned into two regions and each thread is able to build separate splash trees starting from the highest priority node.

The LM implementation for SBP is shown in Fig. 8.19. First, in lines 14-18, we partition the nodes into regions using set-thread and then we start the creation of the first splash tree (line 18) by deriving start-tree(T). The remaining phases of the algorithm are as follows.

Tree building: Starts after the rule in lines 20-21 is derived. Since the thread always picks the highest priority node, we start by adding that node to the list that represents the tree. In lines 24-27, we use an aggregate to gather all the neighbor nodes that have a positive priority (due to a new belief update) and are in the same thread. Nodes are collected into list L and appended to list Next (line 27).

First phase: When the number of nodes in the tree reaches a certain limit, a first-phase fact is generated to update the beliefs of all nodes in the tree (line 30). As the nodes are updated, starting from the leaves and ending at the root, an update fact is derived to update the belief values (lines 37 and 39).

Second phase: Performs the computation of beliefs from the root to the leaves, and the belief values are updated a second time (lines 42 and 44).

SBP is also implemented in GraphLab [LGK+10], a C++ framework for writing machine learning algorithms. GraphLab provides different schedulers that change how machine learning algorithms are computed, and one of the available schedulers is the splash scheduler.

 1 type list node tree.
 2
 3 type linear partitioning(thread, int).           // Number of nodes to receive.
 4 type linear start-tree(thread).
 5 type linear new-tree(thread, tree, tree).
 6 type linear expand-tree(thread, tree).
 7 type linear first-phase(thread, tree, tree).
 8 type linear second-phase(thread, tree).
 9 type linear start(node).
10
11 start(A).
12 partitioning(T, @world / @threads).              // Move @world/@threads nodes.
13
14 !coord(A, X, Y), start(A)                        // Moving this node.
15    -o set-thread(A, grid(X, Y)).
16 just-moved(A), partitioning(T, Left)             // Thread received another node.
17    -o partitioning(T, Left - 1).
18 partitioning(T, 0) -o start-tree(T).
19
20 start-tree(T), priority(A, P), P > 0.0           // Tree building
21    -o priority(A, P), expand-tree(T, [A], []).
22 expand-tree(T, [A | All], Next)
23    -o thread-id(A, Id),
24       [collect => L ; | !edge(A, L), ~ L in All, ~ L in Next, priority(L, P), P > 0.0,
25        thread-id(L, Id2), Id = Id2 -o priority(L, P), thread-id(L, Id2) ->
26        new-tree(T, [A | All],
27           if len(All) + 1 >= maxnodes then [] else Next ++ L end)].
28
29 new-tree(T, [A | All], [])
30    -o schedule-next(A), first-phase(T, reverse([A | All]), [A | All]).
31 new-tree(T, All, [B | Next])
32    -o schedule-next(B), expand-tree(T, [B | All], Next).
33
34 first-phase(T, [A], [A]), running(T, A)          // First phase
35    -o running(T, A), update(A), remove-priority(A), start-tree(T).
36 first-phase(T, [A, B | Next], [A]), running(T, A)
37    -o running(T, A), update(A), schedule-next(B), second-phase(T, [B | Next]).
38 first-phase(T, All, [A, B | Next]), running(T, A)
39    -o running(T, A), update(A), schedule-next(B), first-phase(T, All, [B | Next]).
40
41 second-phase(T, [A]), running(T, A)              // Second phase
42    -o running(T, A), update(A), remove-priority(A), start-tree(T).
43 second-phase(T, [A, B | Next]), running(T, A)
44    -o running(T, A), update(A), schedule-next(B), second-phase(T, [B | Next]).

Figure 8.19: LM code for the Splash Belief Propagation program.

The splash scheduler implements the scheduling strategy described above. For comparison purposes with LM, we also experimented with two other GraphLab schedulers: fifo, a first-in first-out scheduler, and multiqueue, a first-in first-out scheduler that also allows for work stealing (we used 1 queue per thread). The default scheduling policy of LM is somewhat similar to both the fifo and the multiqueue schedulers.

(a) Comparing the scalability of LM and of GraphLab's fifo scheduler.

(b) Comparing the scalability of LM and of GraphLab's multiqueue scheduler.

Figure 8.20: Comparing the scalability of LM against GraphLab's fifo and multiqueue schedulers.

However, since LM also implements node stealing by default, the multiqueue scheduler will be our main focus. We measured the run time of LBP and SBP for both LM and GraphLab. For SBP, we used splash trees of 100 nodes in both systems.

Fig. 8.20 compares the performance of the LBP program in LM against both the fifo and multiqueue schedulers. The results show that GraphLab's fifo scheduler performance deteriorates with more than 10 threads, while both LM and multiqueue tend to scale well and have similar behavior. For 1 thread, LM is about 1.5 times slower than GraphLab, but that ratio increases to about 2 as the number of threads increases. We think this is because the LBP program is sensitive to scheduling policies and the multiqueue scheduler is better than LM's default scheduling policy.

The scalability and run time of the SBP program are compared in Fig. 8.21(a). Compared to LM, GraphLab is about twice as fast, and that advantage is constant across the number of threads, since both systems have similar scalability. Finally, in Fig. 8.21(b), we compare the performance of the LBP program against the performance of SBP by calculating LBP(t)/SBP(t), where LBP(t) is the run time of LBP for t threads and SBP(t) is the run time of SBP for t threads. For LM, SBP improves noticeably over LBP, but that advantage is reduced as the number of threads increases. The same behavior is also seen in GraphLab's multiqueue scheduler, but since this scheduler works so well in a shared memory setting, the advantages of the splash scheduler are reduced. LM is able to improve its performance with SBP because it is a slower language, so reducing the number of facts derived yields a larger relative improvement.

8.4 Modeling the Operational Semantics in LM

The introduction of thread-based facts allows for explicit, but declarative, parallelism in a language that previously used mostly implicit parallelism. This introduces issues when attempting to prove the correctness of programs, because the behavior of threads and the scheduling strategy are now also part of the program's logic. Some of this behavior is hidden from programs because it is part of

(a) Measuring and comparing the performance of SBP in LM and the same program using GraphLab's splash scheduler. (b) Comparing GraphLab's splash scheduler against the fifo and the multiqueue schedulers.

Figure 8.21: Evaluating the performance of SBP over LBP in LM and GraphLab.

how coordination facts and thread scheduling work in the virtual machine. Consider the SBP program in Fig. 8.19, where, in lines 14-18, the graph of nodes is partitioned into regions. In order to prove that the partitioning is correct, we need to know how the VM initially assigns nodes to threads at random and also how the coordination facts set-thread and just-moved are used by the VM. Fortunately, since linear logic is the foundation of LM, it is possible to model the semantics of LM using LM rules. In Chapter 6, we have seen that threads and nodes transition between different states during execution, and now we are going to model those transitions. We first define the following node facts:
• inactive(node A): Fact present on nodes that are not currently running on a thread. Facts running(T, A) and inactive(A) are mutually exclusive.
• owner(node A, thread T): Fact that indicates the thread T that currently owns node A.
• available-work(node A, bool F): Fact that indicates whether node A has new facts to be processed.
In terms of thread facts we have the following:
• active(thread T): Fact that exists if thread T is currently active.
• idle(thread T): Fact that exists if thread T is currently idle. Facts idle(T) and active(T) are mutually exclusive.
Figure 8.22 presents how the operational semantics for a given LM program is modeled using the LM language itself. First, we define the initial facts: owner(A, T) on line 11, which assigns a node to a thread; available-work(A, F) on line 12, where F = true if node A has initial facts, otherwise F = false; is-moving(A) on line 13, so that all nodes can move between threads; and active(T) on line 14, to mark each thread as active. Each program rule is translated as shown in lines 16-21. The original rule was node-fact(A, Y), other-fact(A, B) -o remote-fact(B), local-fact(A), so we have a local derivation of local-fact(A) and a remote derivation of remote-fact(B). In the translation, we update

available-work of node B to true because there is a new derivation for B. The fact running(T, A) is used to ensure that thread T is running on node A. Note that for thread rules we do not need to use running(T, A) on the rule's LHS and the thread running the rule does not even need to have active(T). This enforces the non-deterministic semantics of thread rules. After the program rules are translated, we have the rule in lines 23-27, which forces thread T to stop running on node A. Here, we use the coordination fact default-priority to update the priority of node A. The thread's state switches to idle(T), while the node's state changes to inactive(A). Note that this rule must appear after the program's rules because rule priorities are exploited in order to force thread T to derive all the candidate rules for A.

If a thread is idle, then it is able to derive the rule in lines 29-34 in order to select another node for execution. We use a max selector to select the node A with the highest priority Prio. If there is such a node, the node changes to running(T, A) and thread T changes to active(T). Finally, the rule in lines 36-41 allows threads to steal nodes owned by other threads. If a node is not currently being executed (inactive(A)), can be moved (is-moving(A)), and is owned by another thread (owner(A, Other)), then the owner is updated, potentially allowing the previous rule to execute.

We now model several coordination facts presented in Chapter 7 using LM rules. We focus on set-thread, set-priority, just-moved, and schedule-next. The rules are presented in Fig. 8.23 and should be the highest priority rules in LM programs. We start with the axioms priority(A, initial-priority) and default-priority(A, default-priority) (lines 7 and 8), which define the initial priorities of nodes. In line 10 we have the rule for the schedule-next coordination fact, which simply re-derives a set-priority fact with an infinite priority. Fact set-priority is processed in lines 12-16 by updating the priority values in the priority facts. As explained in Chapter 7, only higher priorities are taken into account. For the set-thread coordination fact, we have lines 18-28. The first rule applies when the node is currently executing on some thread, forcing the thread to stop executing the node and to derive just-moved(A). In the second rule, node A is not being executed and the owner fact is simply updated to the new thread. Note that the rules for updating the coordination sensing facts do not require the running predicate in the rule's body, therefore it should not matter which thread performs the update as long as it is done. In the VM, and for efficiency reasons, the update is always done by the thread that derives the coordination fact.
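To make the translation scheme of lines 16-21 concrete, consider a hypothetical one-rule program visit(A), !edge(A, B) -o visit(B); the predicates visit and edge are illustrative and not part of the model itself. Following the encoding of Fig. 8.22, the rule would be modeled as:

   visit(A), !edge(A, B),
   running(T, A), available-work(B, _)
      -o visit(B), running(T, A),
         available-work(B, true).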

8.5 Related Work

As already seen in the previous chapter, there are several programming models such as Galois [NP11], Elixir [PMP12] and Halide [RKBA+13] which allow the programmer to apply different scheduling policies to programs. Unfortunately, these models only reason about the data or program being computed and not about the parallel architecture. In the logic programming community, there have been some attempts at exposing a low level programming interface in Prolog programs to permit explicit programmer control. An example is the proposal by Casas et al. [CCH07], which exposes execution primitives for AND-parallelism,

1 type linear running(thread, node).
2 type linear inactive(node).
3 type linear priority(node, float).
4 type linear default-priority(node, float).
5 type linear available-work(node, bool).
6 type linear active(thread).
7 type linear idle(thread).
8 type linear owner(node, thread).
9 type linear is-moving(node).
10
11 owner(A, T). // Initial node assignment.
12 available-work(A, F). // Some nodes have available work.
13 is-moving(A). // All nodes can be stolen.
14 active(T). // All threads are active.
15
16 node-fact(A, Y), // Program rules go here.
17 other-fact(A, B),
18 running(T, A), available-work(B, _)
19    -o remote-fact(B), local-fact(A),
20       running(T, A),
21       available-work(B, true).
22
23 active(T), running(T, A), priority(A, Prio), // Switching to another node.
24 default-priority(A, DefPrio), available-work(A, _)
25    -o inactive(A), priority(A, DefPrio),
26       default-priority(A, DefPrio),
27       available-work(A, false), idle(T).
28
29 [desc => Prio | // Select next node to be processed.
30    idle(T), owner(A, T),
31    priority(A, Prio), available-work(A, true)]
32    -o active(T), owner(A, T),
33       running(T, A), available-work(A, false),
34       priority(A, Prio).
35
36 idle(T), !other-thread(T, Other), // Attempt to steal a node.
37 owner(A, Other), inactive(A),
38 available-work(A, true),
39 is-moving(A)
40    -o idle(T), owner(A, T), is-moving(A),
41       inactive(A), available-work(A, true).

Figure 8.22: Modeling the operational semantics as an LM program. The underlined code represents how an example rule node-fact(A, Y), other-fact(A, B) -o remote-fact(B), local-fact(A) needs to be translated for modeling the semantics.

allowing for different scheduling policies. Compared to LM, this approach offers more fine-grained control over parallelism but has limited support for reasoning about thread state.

1 type linear is-static(node).
2 type linear is-moving(node).
3 type linear set-priority(node, float).
4 type linear just-moved(node).
5 type linear move-to-thread(node, thread).
6
7 priority(A, initial-priority).
8 default-priority(A, default-priority).
9
10 schedule-next(A) -o set-priority(A, +∞).
11
12 set-priority(A, P1), priority(A, P2), P2 < P1
13    -o priority(A, P1).
14
15 set-priority(A, P1), priority(A, P2), P2 >= P1
16    -o priority(A, P2).
17
18 running(T, A), set-thread(A, T),
19 available-work(A, _), is-moving(A)
20    -o available-work(A, true),
21       inactive(A), is-static(A),
22       just-moved(A).
23
24 inactive(A), set-thread(A, T),
25 owner(A, TOld), is-moving(A),
26 available-work(A, _)
27    -o is-static(A), owner(A, T), just-moved(A),
28       available-work(A, true).

Figure 8.23: Modeling the operational semantics for coordination facts as an LM program.

8.6 Chapter Summary

In this chapter, we have extended the LM language with a declarative mechanism for reasoning about the underlying parallel architecture. LM programs can first be written in a data-driven fashion and then optimized by reasoning about the state of threads, enabling the move from implicit parallelism to a form of declarative explicit parallelism. We have presented four programs that showcase the potential of the new mechanism and several experimental results that validate our approach.

Chapter 9

Conclusions

In this chapter, we summarize the main contributions of this thesis and suggest several directions for further research.

9.1 Main Contributions

The goal of our thesis was to show the potential of using a forward-chaining linear logic programming language to exploit parallelism in a declarative, efficient and scalable way. For this, we designed and implemented LM, a programming language suitable for writing concurrent algorithms over graph data structures. LM programs are composed of a set of inference rules that apply over a database of logical facts. Since LM is based on linear logic, facts used in rules may be retracted, making it possible for the programmer to declaratively manage structured state.

Concurrency is achieved by partitioning the database across a graph data structure and then forcing rule derivation to happen at the node level. The use of graphs allows LM to solve the task granularity problem that is common in implicitly parallel languages. By localizing computation to nodes, it is possible to group nodes together as sub-graphs that can be processed independently. Even if a certain sub-graph partitioning proves to have poor task balance, nodes of sub-graphs can be moved into other sub-graphs, allowing for easy load balancing between processing cores.

We introduced coordination facts in LM to allow the programmer to fine-tune and optimize their declarative programs. This is possible due to the existence of linear facts, which make it possible for different scheduling decisions to have an effect on program computation. Without them, a logic program would always compute the same result. Coordination facts are also another way to combat the granularity problem since they allow the programmer to control how nodes are grouped together.

We also introduced explicit parallelism in LM by adding support for thread facts. Thread facts are facts stored at the thread level and allow the programmer to write rules that reason about thread state. This opens new opportunities for program optimization and improved parallelism because programs are aware of the existence of threads. Furthermore, the availability of both thread and coordination facts makes it convenient to implement more sophisticated parallel scheduling algorithms. We now highlight in more detail the main contributions of this dissertation:

Structured State Since LM is based on linear logic, LM enables programs to manage state in a structured manner. Due to the restrictions over the inference rules, rules are derived independently on different nodes of the graph data structure, which makes it possible to run LM programs in parallel.

Implementation Writing efficient implementations of declarative languages is challenging, especially when adding support for parallel execution. In this thesis, we have shown a compilation strategy and memory layout organization for LM programs, which allows programs to run less than one order of magnitude slower than hand-written C++ programs. We also described how the LM runtime supports multi core execution by partitioning the graph among threads. To the best of our knowledge, LM is the fastest linear logic based programming language available.

Semantics and Abstract Machine We showed how LM semantics are specified in order to allow concurrent programs. We demonstrated how the language is based upon the sequent calculus of linear logic and we specified a low level abstract machine that closely resembles the real implementation. We also proved the soundness of the abstract machine.

Coordination We presented new coordination features that improve the expressive power of the language by making coordination a first class programming construct that is semantically equivalent to regular computation. In turn, this allows the programmer to specify how declarative programs are scheduled by the runtime system, therefore improving overall run time and scalability. Coordination facts are divided into scheduling and partitioning facts. Scheduling facts change how nodes are scheduled by the system, while partitioning facts change how nodes are assigned to threads and how they move between threads. Both types of facts are further divided into sensing facts (with information about the state of the runtime system) and action facts (which perform actions on the runtime system). The interplay between regular facts, sensing facts and action facts results in faster execution time and improved parallelism because regular facts affect how action facts are derived and, conversely, action facts may affect which regular facts are derived. Coordination facts do not affect the correctness of programs and are well integrated into proofs, since they do not change how rules are scheduled but how nodes are scheduled during execution.

Explicit Parallelism We also introduced the concept of thread facts, which enable LM programs to exploit the underlying architecture by making it possible to reason about the state of threads. Thread facts allow the programmer to escape the default implicit parallelism of LM and allow the implementation of structured scheduling algorithms which require explicit communication between threads. To the best of our knowledge, this is the first time that such a paradigm is available in a logic programming language, and we are not aware of competing systems that allow the programmer to reason directly about thread state in a structured fashion.

Experimentation We compared LM's sequential and multi threaded execution to hand-written sequential C++ programs and against frameworks such as GraphLab and Ligra. We showed how well the LM runtime is able to scale using different programs and datasets. We measured the run time, scalability and memory usage effects of using coordination facts and their overheads. For thread facts, we analyzed different applications and measured the performance improvements of using explicit parallelism.

In our experiments, we noted that the memory layout of applications, especially the memory allocator, tends to have a significant effect on overall performance. In modern architectures, good memory locality is as important as having efficient algorithms, and in LM this is no different. We experimented with two allocators to analyze how performance and scalability may be affected by different strategies. It is our belief that it is important to focus on faster sequential execution, even at the expense of scalability, in order to make declarative parallel languages more competitive with sequential programs written in languages such as C++.

9.2 Drawbacks of LM

While LM provides answers to several known problems in implicitly parallel languages, it suffers from several disadvantages:

Graph Of Nodes Even though graph data structures are flexible and can be used to model many interesting problems, they are not always the best abstraction. Using graphs as an abstraction for concurrency requires some problems to be adapted to run on graphs in non-intuitive ways. For instance, in the N-Queens problem in Section 7.8.2, we had to map the chess board onto a grid of nodes and then parallelize the program by considering each square as a unit that builds solutions to the problem. In general, the graph model of LM is suitable for irregular algorithms but less suitable for regular algorithms, such as numerical algorithms or algorithms that do not use graphs as their main data structure.

Thread Reasoning Coordination facts and thread-based facts come with a cost when used to create complex scheduling algorithms, because reasoning about coordinated programs requires that the programmer understand the semantics of the underlying virtual machine. However, we think it will be impossible to avoid this issue unless the runtime system performs all the optimizations, leaving no control to the programmer, since optimization always requires some knowledge about the underlying hardware and software. In LM, the programmer pays a lower cost because the semantics of coordination provide a higher level of abstraction. Lastly, it remains to be shown whether LM provides the right kind of abstraction for writing such programs.

Modularity Program composition is still an unsolved problem in linear logic programming, since it is hard to compose programs that manipulate the same state. Martens [Mar15] introduces the concept of staged logic programming, which allows sets of logical rules to be grouped into stages. Each stage is computed separately and up to quiescence, allowing stages to have precedence over others. However, it still remains to be studied how separate rules can be grouped together without introducing conflicts.

9.3 Future Work

While much progress has been achieved with this thesis, many new research avenues have been opened by this work. We now enumerate further research goals that should be interesting to pursue.

Faster Implementation LM is still not competitive enough to replace efficient parallel frameworks such as Ligra. LM is a programming language in its own right and thus requires more engineering effort to be competitive with frameworks implemented in languages such as C or C++. Better compilation and runtime systems will be required in order to reduce the overhead even further, especially as it relates to memory usage. Aggressive code analysis should be employed to prove invariants about predicates and introduce more specialized data structures for storing logical facts. The goal should be to recover more of the imperative flavor that is present in linear logic programs in order to make them more efficient. The restrictions on LM rules that make concurrency possible make this task harder, since there is an inherent tension between concurrency and execution speed because concurrency implies communication. However, local node computation still has some room for improvement.

Provability We need automated tools for reasoning about the correctness and termination of programs. While we have shown that writing informal proofs is relatively easy because programs tend to be small, automated proof systems would increase our confidence that programs work correctly. Ideally, the programmer should be able to write invariants about the program and the compiler should be able to prove whether such invariants are or are not met by the given inference rules.

Expressiveness Although LM programs are expressive, some work must be done in order to reduce the restrictions on LM rules and allow for more programmer freedom. LM currently only allows rules where the LHS refers to the same node; however, it should be possible to allow rules that use facts from different nodes (see the sketch after this list). The use of linear logic facts makes this hard because we need to ensure that a linear fact is used only once, therefore the compiler should generate code to enforce this, probably through the use of transactions. In the CHR community, Lam et al. [LC13] have developed an encoding for distributed rules that use at most one immediate neighbor into rules that run locally. It should be relatively straightforward to provide a similar encoding for LM and then assess how performance is affected by such an encoding.

Implicit and Explicit Parallelism We need more applications that take advantage of the mixed parallelism that is available with thread facts. We feel that this paradigm needs to be further explored in order to make it possible to write new scheduling and parallel algorithms that could be compiled to efficient code in other application areas. Furthermore, mechanized proofs about such algorithms could then be automatic, improving the correctness and reliability of parallel algorithms.
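To make the Expressiveness limitation concrete, the following hypothetical rule is currently rejected by the compiler because its LHS matches linear facts at two different nodes A and B (the predicates token, waiting, active and edge are assumed names used for illustration only):

   token(A), !edge(A, B), waiting(B)
      -o active(B).

Supporting such rules would require the runtime to consume token(A) and waiting(B) atomically, which is why a transaction-based implementation is suggested above.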

9.4 Final Remark

We argue that our work makes LM the ideal framework for prototyping new (correct) graph algorithms, since LM programs tend to be relatively short and the programmer only needs to reason about the state of the graph, without having to understand how the framework must be used to express the intended algorithms. Furthermore, the addition of coordination facts and thread facts helps the programmer exploit the underlying parallel architecture in order to create better programs that take advantage of those architectures without radically changing the underlying parallel algorithm. Finally, the good performance of the LM system allows programs to run reasonably fast when executed on multi core systems.

Appendix A

Sequent Calculus

\[
\frac{\Psi; \Gamma; \Delta \vdash A \qquad \Psi; \Gamma; \Delta' \vdash B}{\Psi; \Gamma; \Delta, \Delta' \vdash A \otimes B}\;\otimes R
\qquad
\frac{\Psi; \Gamma; \Delta, A, B \vdash C}{\Psi; \Gamma; \Delta, A \otimes B \vdash C}\;\otimes L
\]

\[
\frac{\Psi; \Gamma; \Delta, A \vdash B}{\Psi; \Gamma; \Delta \vdash A \multimap B}\;\multimap R
\qquad
\frac{\Psi; \Gamma; \Delta \vdash A \qquad \Psi; \Gamma; \Delta', B \vdash C}{\Psi; \Gamma; \Delta, \Delta', A \multimap B \vdash C}\;\multimap L
\]

\[
\frac{\Psi; \Gamma; \Delta, A \vdash C}{\Psi; \Gamma; \Delta, A \with B \vdash C}\;\with L_1
\qquad
\frac{\Psi; \Gamma; \Delta, B \vdash C}{\Psi; \Gamma; \Delta, A \with B \vdash C}\;\with L_2
\qquad
\frac{\Psi; \Gamma; \Delta \vdash A \qquad \Psi; \Gamma; \Delta \vdash B}{\Psi; \Gamma; \Delta \vdash A \with B}\;\with R
\]

\[
\frac{\Psi; \Gamma; \cdot \vdash A}{\Psi; \Gamma; \cdot \vdash\, !A}\;!R
\qquad
\frac{\Psi; \Gamma, A; \Delta \vdash C}{\Psi; \Gamma; \Delta, !A \vdash C}\;!L
\qquad
\frac{\Psi; \Gamma, A; \Delta, A \vdash C}{\Psi; \Gamma, A; \Delta \vdash C}\;\mathit{copy}
\]

\[
\frac{}{\Psi; \Gamma; \cdot \vdash \mathbf{1}}\;\mathbf{1}R
\qquad
\frac{\Psi; \Gamma; \Delta \vdash C}{\Psi; \Gamma; \Delta, \mathbf{1} \vdash C}\;\mathbf{1}L
\]

\[
\frac{}{\Psi; \Gamma; A \vdash A}\;\mathit{id}_A
\]

\[
\frac{\Psi, m{:}\tau; \Gamma; \Delta \vdash A\{m/n\}}{\Psi; \Gamma; \Delta \vdash \forall_{n:\tau}.A}\;\forall R
\qquad
\frac{\Psi \vdash M : \tau \qquad \Psi; \Gamma; \Delta, A\{M/n\} \vdash C}{\Psi; \Gamma; \Delta, \forall_{n:\tau}.A \vdash C}\;\forall L
\]

\[
\frac{\Psi \vdash M : \tau \qquad \Psi; \Gamma; \Delta \vdash A\{M/n\}}{\Psi; \Gamma; \Delta \vdash \exists_{n:\tau}.A}\;\exists R
\qquad
\frac{\Psi, m{:}\tau; \Gamma; \Delta, A\{m/n\} \vdash C}{\Psi; \Gamma; \Delta, \exists_{n:\tau}.A \vdash C}\;\exists L
\]

\[
\frac{\Psi; \Gamma; \Delta \vdash A \qquad \Psi; \Gamma; \Delta', A \vdash C}{\Psi; \Gamma; \Delta, \Delta' \vdash C}\;\mathit{cut}_A
\qquad
\frac{\Psi; \Gamma; \cdot \vdash A \qquad \Psi; \Gamma, A; \Delta \vdash C}{\Psi; \Gamma; \Delta \vdash C}\;\mathit{cut!}_A
\]
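As a small worked example of how these rules compose (an illustration added here, not part of the original rule set), the following derivation combines two linear resources with ⊗R, closing each branch with the identity rule:

\[
\frac{\dfrac{}{\Psi; \Gamma; A \vdash A}\;\mathit{id}_A \qquad \dfrac{}{\Psi; \Gamma; B \vdash B}\;\mathit{id}_B}{\Psi; \Gamma; A, B \vdash A \otimes B}\;\otimes R
\]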

Appendix B

High Level Dynamic Semantics

B.1 Step

\[
\frac{\mathit{run}^{\Gamma_i;\Pi}\;\Delta_i; \Phi \rightarrow \Xi'; [\Gamma'_1; \cdots; \Gamma'_n]; [\Delta'_1; \cdots; \Delta'_n]}
{[\Gamma_1; \cdots; \Gamma_i; \cdots; \Gamma_n]; [\Delta_1; \cdots; \Delta_i; \cdots; \Delta_n]; \Phi \stackrel{\mathit{step}}{\Longrightarrow} [\Gamma_1, \Gamma'_1; \cdots; \Gamma_i, \Gamma'_i; \cdots; \Gamma_n, \Gamma'_n]; [\Delta_1, \Delta'_1; \cdots; (\Delta_i - \Xi'), \Delta'_i; \cdots; \Delta_n, \Delta'_n]}\;\mathit{step}
\]

B.2 Application

\[
\frac{m^{\Gamma}\;\Delta_1 \rightarrow A \qquad \mathit{der}^{\Gamma;\Pi}\;\Delta_2; \Delta_1; \cdot; \cdot; B \rightarrow O}{\mathit{app}^{\Gamma;\Pi}_{\Psi}\;\Delta_1, \Delta_2; A \multimap B \rightarrow O}\;\mathit{app\ rule}
\]

\[
\frac{\mathit{app}^{\Gamma;\Pi}_{\Psi, m:M:\tau}\;\Delta; A \rightarrow O \qquad \Psi \vdash M : \tau}{\mathit{app}^{\Gamma;\Pi}_{\Psi}\;\Delta; \forall_{m:\tau}.A \rightarrow O}\;\mathit{app}\;\forall
\]

\[
\frac{\mathit{app}^{\Gamma;\Pi}_{\cdot}\;\Delta; R \rightarrow O}{\mathit{run}^{\Gamma;\Pi}\;\Delta; R, \Phi \rightarrow O}\;\mathit{run\ rule}
\]

B.3 Match

\[
\frac{}{m^{\Gamma}\;\cdot \rightarrow \mathbf{1}}\;m\mathbf{1}
\qquad
\frac{}{m^{\Gamma}\;p \rightarrow p}\;mp
\qquad
\frac{}{m^{\Gamma,p}\;\cdot \rightarrow\, !p}\;m!p
\]

\[
\frac{m^{\Gamma}\;\Delta_1 \rightarrow A \qquad m^{\Gamma}\;\Delta_2 \rightarrow B}{m^{\Gamma}\;\Delta_1, \Delta_2 \rightarrow A \otimes B}\;m\otimes
\]
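For instance (an illustrative derivation, not from the original text), a body p ⊗ !q matches a linear context holding p whenever q is available in the persistent context:

\[
\frac{\dfrac{}{m^{\Gamma,q}\;p \rightarrow p}\;mp \qquad \dfrac{}{m^{\Gamma,q}\;\cdot \rightarrow\, !q}\;m!q}{m^{\Gamma,q}\;p \rightarrow p\, \otimes\, !q}\;m\otimes
\]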

B.4 Derivation

\[
\frac{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; p, \Delta_1; \Omega \rightarrow O}{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; p, \Omega \rightarrow O}\;\mathit{der}\;p
\qquad
\frac{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1, p; \Delta_1; \Omega \rightarrow O}{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; !p, \Omega \rightarrow O}\;\mathit{der}\;!p
\]

\[
\frac{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; A, B, \Omega \rightarrow O}{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; A \otimes B, \Omega \rightarrow O}\;\mathit{der}\otimes
\qquad
\frac{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; \Omega \rightarrow O}{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; \mathbf{1}, \Omega \rightarrow O}\;\mathit{der}\;\mathbf{1}
\]

\[
\frac{}{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi'; \Gamma'; \Delta'; \cdot \rightarrow \Xi'; \Gamma'; \Delta'}\;\mathit{der\ end}
\]

\[
\frac{\Pi(\mathit{agg}) = \forall_{\widehat{v},\Sigma'}.(R^{(\widehat{v},\Sigma')}_{\mathit{agg}} \multimap ((\lambda x.Cx)\Sigma' \with (\forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\Sigma'+\sigma)}_{\mathit{agg}}))))
\qquad
\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; \forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\Sigma+\sigma)}_{\mathit{agg}})\{\widehat{V}/\widehat{v}\}\{\Sigma/\Sigma'\}, \Omega \rightarrow O}
{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; R^{(\widehat{V},\Sigma)}_{\mathit{agg}}, \Omega \rightarrow O}\;\mathit{der\ agg}_1
\]

\[
\frac{\Pi(\mathit{agg}) = \forall_{\widehat{v},\Sigma'}.(R^{(\widehat{v},\Sigma')}_{\mathit{agg}} \multimap ((\lambda x.Cx)\Sigma' \with (\forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\Sigma'+\sigma)}_{\mathit{agg}}))))
\qquad
\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; C\{\widehat{V}/\widehat{v}\}\{\Sigma/\Sigma'\}, \Omega \rightarrow O}
{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; R^{(\widehat{V},\Sigma)}_{\mathit{agg}}, \Omega \rightarrow O}\;\mathit{der\ agg}_2
\]

\[
\frac{m^{\Gamma}\;\Delta_a \rightarrow A \qquad \mathit{der}^{\Gamma;\Pi}\;\Delta_b; \Xi, \Delta_a; \Gamma_1; \Delta_1; B, \Omega \rightarrow O}{\mathit{der}^{\Gamma;\Pi}\;\Delta_a, \Delta_b; \Xi; \Gamma_1; \Delta_1; A \multimap B, \Omega \rightarrow O}\;\mathit{der}\multimap
\]

\[
\frac{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; A\{V/x\}, \Omega \rightarrow O}{\mathit{der}^{\Gamma;\Pi}\;\Delta; \Xi; \Gamma_1; \Delta_1; \forall_{x}.A, \Omega \rightarrow O}\;\mathit{der}\;\forall
\]

Appendix C

Low Level Dynamic Semantics

C.1 Application

\[
\mathit{infer}\;\Delta;\, R_1, \Phi;\, \Gamma \;\mapsto\; \mathit{apply}\;\cdot;\, \Delta;\, \Pi;\, \Gamma;\, R_1 \quad (\textit{select rule})
\]

\[
\mathit{infer}\;\Delta;\, \cdot;\, \Gamma \;\mapsto\; \mathit{next}\;\Gamma;\, \Delta \quad (\textit{fail})
\]

\[
\mathit{apply}\;\Psi;\, \Delta;\, \Pi;\, \Gamma;\, \forall_{x:\tau}.A \;\mapsto\; \mathit{apply}\;\Psi, x{:}\_{:}\tau;\, \Delta;\, \Pi;\, \Gamma;\, A \quad (\textit{open rule})
\]

\[
\mathit{apply}\;\Psi;\, \Delta;\, \Pi;\, \Gamma;\, A \multimap B \;\mapsto\; {}^{\Psi}m^{\Gamma}_{A \multimap B}[\cdot \rightarrow \mathbf{1}]\;(\Delta; \Phi);\, \cdot;\, \Gamma;\, \Delta;\, A \quad (\textit{init rule})
\]

C.2 Match

\[
{}^{\Psi}m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta, p_1, \Delta'';\, p(\widehat{x}), \Omega \;\mapsto\;
{}^{\mathit{extend}(\Psi,\theta)}m^{\Gamma}_{A \multimap B}[\Delta', p_1 \rightarrow \Omega' \otimes p(\widehat{x}\theta)]\;R;\, (\Delta, p_1; \Delta''; p(\widehat{x}); \Omega; \mathit{extend}(\Psi,\theta)), C;\, \Gamma;\, \Delta, \Delta'';\, \Omega
\]
\[
(p_1, \Delta'' \prec p(\widehat{x}),\; \Delta \succ p(\widehat{x})) \quad (\textit{match p ok})
\]

\[
m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, p(\widehat{x}), \Omega \;\mapsto\; /_{A \multimap B}\;R;\, C;\, \Gamma \quad (\Delta \nsucc p(\widehat{x})) \quad (\textit{match p fail})
\]

\[
{}^{\Psi}m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, !p(\widehat{x}), \Omega \;\mapsto\;
{}^{\mathit{extend}(\Psi,\theta)}m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega' \otimes !p(\widehat{x}\theta)]\;R;\, [\Gamma''; \Delta; !p(\widehat{x}); \Omega; \mathit{extend}(\Psi,\theta)], C;\, \Gamma, p_1, \Gamma'';\, \Delta;\, \Omega
\]
\[
(!p_1, \Gamma'' \prec\, !p(\widehat{x}),\; \Gamma \succ\, !p(\widehat{x})) \quad (\textit{match !p ok})
\]

\[
m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, !p(\widehat{x}), \Omega \;\mapsto\; /_{A \multimap B}\;R;\, C;\, \Gamma \quad (\Gamma \nsucc\, !p(\widehat{x})) \quad (\textit{match !p fail})
\]

\[
m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, \mathbf{1}, \Omega \;\mapsto\; m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{match 1})
\]

\[
m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, X \otimes Y, \Omega \;\mapsto\; m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, X, Y, \Omega \quad (\textit{match}\;\otimes)
\]

\[
m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma;\, \Delta;\, \cdot \;\mapsto\; \Downarrow^{\cdot;\cdot}_{\Delta'}\;\Gamma;\, \Delta;\, B \quad (\textit{match end})
\]

C.3 Continuation

\[
/_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, (\Delta; p_2, \Delta''; p; \Omega), C;\, \Gamma \;\mapsto\; m^{\Gamma}_{A \multimap B}[\Delta', p_2 \rightarrow \Omega' \otimes p]\;R;\, (\Delta, p_2; \Delta''; p; \Omega), C;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{next p})
\]

\[
/_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, (\Delta; \cdot; p; \Omega), C;\, \Gamma \;\mapsto\; /_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma \quad (\textit{next frame})
\]

\[
/_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, [!p_2, \Gamma''; \Delta; !p; \Omega], C;\, \Gamma \;\mapsto\; m^{\Gamma}_{A \multimap B}[\Delta' \rightarrow \Omega' \otimes !p_2]\;R;\, [\Gamma''; \Delta; !p; \Omega], C;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{next !p})
\]

\[
/_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, [\cdot; \Delta; !p; \Omega], C;\, \Gamma \;\mapsto\; /_{A \multimap B}[\Delta' \rightarrow \Omega']\;R;\, C;\, \Gamma \quad (\textit{next !frame})
\]

\[
/_{A \multimap B}\;(\Delta; \Phi);\, \cdot;\, \Gamma \;\mapsto\; \mathit{infer}\;\Delta;\, \Phi;\, \Gamma \quad (\textit{rule fail})
\]

C.4 Derivation

\[
\Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, p, \Omega \;\mapsto\; \Downarrow^{\Gamma_1;\Delta_1,p}_{\Xi}\;\Gamma;\, \Delta;\, \Omega \quad (\textit{new p})
\]

\[
\Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, !p, \Omega \;\mapsto\; \Downarrow^{\Gamma_1,!p;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, \Omega \quad (\textit{new !p})
\]

\[
\Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, \mathbf{1}, \Omega \;\mapsto\; \Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, \Omega \quad (\textit{new 1})
\]

\[
\Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, A \otimes B, \Omega \;\mapsto\; \Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, A, B, \Omega \quad (\textit{new}\;\otimes)
\]

\[
\Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, R^{(\widehat{V},\cdot;\cdot)}_{\mathit{agg}}, \Omega \;\mapsto\; {}^{\Gamma_1;\Delta_1}_{\Omega_I;\Delta;\Xi}\,m^{\Gamma}[\cdot \rightarrow \mathbf{1}]_{\mathit{agg};\cdot}\;\cdot;\, \cdot;\, \Gamma;\, \Delta;\, A \quad (\textit{new agg})
\]
\[
\Pi(\mathit{agg}) = \forall_{\widehat{v},\Sigma'}.(R^{(\widehat{v},\Sigma')}_{\mathit{agg}} \multimap ((\lambda x.Cx)\Sigma' \with (\forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\sigma::\Sigma')}_{\mathit{agg}}))))
\]

\[
\Downarrow^{\Gamma_1;\Delta_1}_{\Xi}\;\Gamma;\, \Delta;\, \cdot \;\mapsto\; \Xi;\, \Gamma_1;\, \Delta_1 \quad (\textit{rule finished})
\]

C.5 Aggregates

C.5.1 Match

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta, p_1, \Delta'';\, p, \Omega \;\mapsto\;
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta', p \rightarrow \Omega' \otimes p]_{\mathit{agg};\Sigma}\;(\Delta, p_1; \Delta''; p; \Omega), C;\, P;\, \Gamma;\, \Delta, \Delta'';\, \Omega \quad (\textit{agg match p ok})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, p, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma \quad (\textit{agg match p fail})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;\cdot;\, P;\, \Gamma, p_1, \Gamma'';\, \Delta;\, !p, \Omega \;\mapsto\;
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega' \otimes !p]_{\mathit{agg};\Sigma}\;\cdot;\, [\Gamma''; \Delta; !p; \Omega], P;\, \Gamma, p_1, \Gamma'';\, \Delta;\, \Omega \quad (\textit{agg match !p ok P})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma, p_1, \Gamma'';\, \Delta;\, !p, \Omega \;\mapsto\;
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega' \otimes !p]_{\mathit{agg};\Sigma}\;[\Gamma''; \Delta; !p; \Omega], C;\, P;\, \Gamma, p_1, \Gamma'';\, \Delta;\, \Omega \quad (\textit{agg match !p ok C})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, !p, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma \quad (\textit{agg match !p fail})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, X \otimes Y, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, X, Y, \Omega \quad (\textit{agg match}\;\otimes)
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, \mathbf{1}, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{agg match 1})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega']_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Delta;\, \cdot \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi;\Delta'}\bowtie_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma \quad (\textit{agg match end})
\]

C.5.2 Stack Transformation

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi;\Delta'}\bowtie_{\mathit{agg};\Sigma}\;f', f, C;\, P;\, \Gamma \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi;\Delta'}\bowtie_{\mathit{agg};\Sigma}\;f, C;\, P;\, \Gamma \quad (\textit{agg fix rec})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi;\Delta'}\bowtie_{\mathit{agg};\Sigma}\;f;\, P;\, \Gamma \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi;\Delta'}\Downarrow_{\mathit{agg};V::\Sigma}\;f';\, P';\, \Gamma;\, B\{\Psi(\widehat{x}), V/\widehat{x}, \sigma\} \quad (\textit{agg fix end}_1)
\]
\[
\Pi(\mathit{agg}) = \forall_{\widehat{v},\Sigma'}.(R^{(\widehat{v},\Sigma')}_{\mathit{agg}} \multimap ((\lambda x.Cx)\Sigma' \with (\forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\sigma::\Sigma')}_{\mathit{agg}}))))
\qquad f' = \mathit{remove}(f, \Delta') \qquad P' = \mathit{remove}(P, \Delta') \qquad V = \Psi(\sigma)
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi;\Delta'}\bowtie_{\mathit{agg};\Sigma}\;\cdot;\, P;\, \Gamma \;\mapsto\; {}^{\Gamma_{N1};\Delta_{N1}}_{\Omega_I;\Delta;\Xi,\Delta'}\Downarrow_{\mathit{agg};V::\Sigma}\;\cdot;\, P';\, \Gamma;\, B\{\Psi(\widehat{x}), V/\widehat{x}, \sigma\} \quad (\textit{agg fix end}_2)
\]
\[
\Pi(\mathit{agg}) = \forall_{\widehat{v},\Sigma'}.(R^{(\widehat{v},\Sigma')}_{\mathit{agg}} \multimap ((\lambda x.Cx)\Sigma' \with (\forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\sigma::\Sigma')}_{\mathit{agg}}))))
\qquad P' = \mathit{remove}(P, \Delta') \qquad V = \Psi(\sigma)
\]

C.5.3 Backtracking

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;(\Delta; p_1, \Delta''; p; \Omega), C;\, P;\, \Gamma \;\mapsto\;
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta', p_1 \rightarrow \Omega' \otimes p]_{\mathit{agg};\Sigma}\;(\Delta, p_1; \Delta''; p; \Omega), C;\, P;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{agg next p C})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;[p_1, \Gamma''; \Delta; !p; \Omega], C;\, P;\, \Gamma \;\mapsto\;
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega' \otimes !p]_{\mathit{agg};\Sigma}\;[\Gamma''; \Delta; !p; \Omega], C;\, P;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{agg next !p C})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;(\Delta; \cdot; p; \Omega), C;\, P;\, \Gamma \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma \quad (\textit{agg next frame C})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;[\cdot; \Delta; !p; \Omega], C;\, P;\, \Gamma \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma \quad (\textit{agg next !frame C})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;\cdot;\, [p_1, \Gamma''; \Delta; !p; \Omega], P;\, \Gamma \;\mapsto\;
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,m^{\Gamma}[\Delta' \rightarrow \Omega' \otimes !p]_{\mathit{agg};\Sigma}\;\cdot;\, [\Gamma''; \Delta; !p; \Omega], P;\, \Gamma;\, \Delta;\, \Omega \quad (\textit{agg next !p P})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;\cdot;\, [\cdot; \Delta; !p; \Omega], P;\, \Gamma \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;\cdot;\, P;\, \Gamma \quad (\textit{agg next !frame P})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta_I;\Xi}\,/_{\mathit{agg};\Sigma}\;\cdot;\, \cdot;\, \Gamma \;\mapsto\; \Downarrow^{\Gamma_{N1};\Delta_{N1}}_{\Xi}\;\Gamma;\, \Delta_I;\, (\lambda x.C\{\Psi(\widehat{v})/\widehat{v}\}x)\Sigma, \Omega_N \quad (\textit{agg end})
\]
\[
\Pi(\mathit{agg}) = \forall_{\widehat{v},\Sigma'}.(R^{(\widehat{v},\Sigma')}_{\mathit{agg}} \multimap ((\lambda x.Cx)\Sigma' \with (\forall_{\widehat{x},\sigma}.(A \multimap B \otimes R^{(\widehat{v},\sigma::\Sigma')}_{\mathit{agg}}))))
\]

C.5.4 Derivation

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, p, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N,p}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Omega \quad (\textit{agg new p})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, !p, \Omega \;\mapsto\; {}^{\Gamma_N,p;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Omega \quad (\textit{agg new !p})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, X \otimes Y, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, X, Y, \Omega \quad (\textit{agg new}\;\otimes)
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \mathbf{1}, \Omega \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \Omega \quad (\textit{agg new 1})
\]

\[
{}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\Downarrow_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma;\, \cdot \;\mapsto\; {}^{\Gamma_N;\Delta_N}_{\Omega_I;\Delta;\Xi}\,/_{\mathit{agg};\Sigma}\;C;\, P;\, \Gamma \quad (\textit{agg next})
\]

Appendix D

LM Directives

In this appendix, we list all the extra-logical directives available in LM programs. The following directives are related to coordination:
• priority @order ORDER (ORDER is either asc or desc): Defines whether priorities are to be selected by the smallest or the greatest value, respectively.
• priority @default P: Informs the runtime system that all nodes must start with default priority P. Alternatively, the programmer can define a set-default-priority(A, P) initial fact.
• priority @base P: Informs the runtime system that the base priority should be P. The base priority is, by default, 0 when the priority ordering is desc and +∞ when the ordering is asc, and represents the smallest priority value possible.
• priority @initial P: Informs the runtime system that all nodes must start with temporary priority P. Alternatively, the programmer can define a set-priority(A, P) fact. By default, the initial priority value is equal to +∞ (for desc) or 0 (for asc).
For predicate indexing, we have the following directive:
• index pred/arg: Informs the compiler that predicate pred should be indexed by argument number arg. The runtime system will use hash tree data structures for facts of this predicate, indexed by the requested argument.
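As a usage sketch, the directives appear at the top of an LM source file. The following hypothetical header assumes a PageRank-like program with a neighbor-pagerank predicate; the terminating periods follow the convention of LM statements and are an assumption here:

   priority @order desc.
   priority @default 0.5.
   index neighbor-pagerank/2.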

Appendix E

Intermediate Representation

In this appendix, we present the instructions implemented by our virtual machine. We consider that the machine has an arbitrary number of registers (or variables) that can be assigned to values or facts and a database of facts indexed by program predicates.

Values Values can be assigned to registers or to fact arguments, which are stored in the database. The following values are allowed:
• Boolean values;
• Integer numbers;
• Floating point numbers;
• Node pointers;
• Thread pointers;
• String pointers;
• List pointers;
• Array pointers.

E.1 Iteration Instructions

The iteration instructions perform searches on the database of facts. When inferring a rule, each iteration instruction iterates over a predicate in the body of the rule in order to derive all combinations of facts that match the rule.

Match Specification Every iteration instruction points to a match specification that filters facts from the database. The specification maps argument indexes to value templates. Value templates include the values described above and the following variable templates:
• Register: the fact argument must be equal to the value stored in the register;
• Fact argument: the fact argument must be equal to the value stored in another fact argument;
• List template: describes the structure of a list, namely, the head and tail value templates.

Iteration Each iteration instruction contains a body, a list of instructions that must be executed for each fact found in the database. The body also returns a status indicating whether rule inference was successful or not.
• persistent iterate [pred : predicate, dest : register, m : match, body : instructions] Iterate over persistent facts of predicate pred that match m. For each iteration, we assign dest the next candidate fact and execute the instructions body.
• linear iterate [pred : predicate, dest : register, m : match, body : instructions] Same as before, but iterates over linear facts.
• reuse linear iterate [pred : predicate, dest : register, m : match, body : instructions] Iterates over linear facts but, if the rule has been successful, does not remove the linear fact stored in dest, since the same fact has been re-derived in the head of the rule.

E.1.1 Return Instructions

Return instructions terminate the current execution flow and force the machine to backtrack to a previous state. In most cases, they are the return values used by iterate instructions and force the iterate instruction to try the next candidate fact.
• return Return without a return value. Used at the end of rules to complete rule inference or in rules where a higher priority rule is available.
• return derived The rule was successfully derived.
• return next The matching has failed and the iterate instruction must use the next candidate fact.

E.2 Conditional Instructions

Conditional instructions are used to conditionally execute certain instructions.
• if [reg : register, body : instructions] If the boolean value stored in reg is true, then execute body.
• if [reg : register, body1 : instructions, body2 : instructions] If the boolean value stored in reg is true, then execute body1, otherwise execute body2.
• jump [offset : integer] Used as a goto at the end of body1 in the instruction above. Jumps a certain number of instructions.
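To illustrate how the two forms relate, the two-armed if can be lowered to the one-armed if whose body ends with a jump that skips the else branch. This is a hypothetical lowering; the actual encoding is chosen by the compiler:

   if [r0, body : {
      ... body1 instructions ...
      jump [N]   // skip the N instructions of body2 below
   }]
   ... body2 instructions ...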

E.3 Arithmetic and Boolean Instructions

We briefly describe the most important arithmetic and boolean instructions.

Operations Available operations are as follows:
• Addition +;
• Subtraction -;
• Multiplication *;
• Division /;
• Integer remainder %;
• Equality =;
• Inequality <>;
• Greater >;
• Greater or equal >=;
• Lesser <;
• Lesser or equal <=;
• Logical or or;
• Logical and and.

Instructions These operations are used as follows:
• test nil [reg : register, dest : register] If the list stored in reg is null, then set dest to true.
• not [reg : register, dest : register] Perform a boolean not operation.
• int op [op1 : register, op2 : register, dest : register] Perform a binary arithmetic operation on two integer numbers op1 and op2 and store the result in dest. Operations include: +, -, *, /, %, =, <>, <, <=, >=, and >.
• float op [op1 : register, op2 : register, dest : register] Perform a binary arithmetic operation on two floating point numbers op1 and op2 and store the result in dest. Operations include: +, -, *, /, =, <>, <, <=, >=, and >.
• node op [op1 : register, op2 : register, dest : register] Perform an operation on two node addresses op1 and op2 and store the result in dest. Operations include: = and <>.
• thread op [op1 : register, op2 : register, dest : register] Perform an operation on two thread addresses op1 and op2 and store the result in dest. Operations include: = and <>.
• bool op [op1 : register, op2 : register, dest : register] Perform an operation on two boolean values op1 and op2 and store the result in dest. Operations include: =, <>, and, and or.

E.4 Data Instructions

Data instructions include instructions which move values between registers and fact arguments and instructions which call external functions written in C++ code.
• move-value-to-reg [val : value, reg : register] Moves any value presented above to register reg.
• move-arg-to-reg [fact : register, arg : integer, reg : register] Moves the arg-th argument of the fact stored in fact to register reg.
• move-reg-to-arg [reg : register, fact : register, arg : integer] Moves the value stored in reg to the arg-th argument of the fact stored in fact.
• call [id : integer, gc : boolean, dest : register, n : integer, arg1 : register, arg2 : register, ...] Call the external function with number id with n arguments arg1 up to argn and store the return value in dest. If gc is true, then garbage collect the return value at the end of the rule.

E.5 List Instructions

List instructions manipulate and access list addresses.
• cons [head : register, tail : register, gc : boolean, dest : register] Create a new list and store its address in dest. The value of tail must be a list address or the null list. If gc is true, then the list must be garbage collected after the rule executes.
• head [ls : register, dest : register] Retrieve the head value from the list address ls and store it in dest.
• tail [ls : register, dest : register] Retrieve the tail value from the list address ls and store it in dest.
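For instance, building the list [1, 2] and reading it back might use the following sequence. This is a hypothetical sketch: register names are arbitrary and the null-list literal nil is an assumption:

   move-value-to-reg [nil, r0]   // r0 := [] (the null list)
   move-value-to-reg [2, r1]
   cons [r1, r0, true, r2]       // r2 := [2]
   move-value-to-reg [1, r1]
   cons [r1, r2, true, r3]       // r3 := [1, 2]
   head [r3, r4]                 // r4 := 1
   tail [r3, r5]                 // r5 := [2]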

E.6 Derivation and Removal Instructions

Derivation instructions create facts and add them to the corresponding node databases. We also include instructions that remove facts from the database.
• alloc [pred : predicate, dest : register] Allocate a new fact for predicate pred and assign it to dest.
• add linear [fact : register] Add the linear fact stored in fact to the current node's database.
• add persistent [fact : register] Add the persistent fact stored in fact to the current node's database.
• send [fact : register, dest : register] Send the fact stored in fact to the node address stored in dest.
• run action [fact : register] Execute the action fact stored in fact and then deallocate the fact. This instruction is used for coordination instructions.
• remove [fact : register] Removes the linear fact stored in fact from the database. The fact has been retrieved by one of the iterate instructions.
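Putting the pieces together, a simple rule such as visit(A), !edge(A, B) -o visit(B) might compile to the sketch below. This is hypothetical output: the exact instruction selection, match specifications and register allocation are the compiler's choice, and visit/edge are assumed predicates.

   linear iterate [pred : visit, dest : r0, m : {}, body : {
      persistent iterate [pred : edge, dest : r1, m : {}, body : {
         alloc [pred : visit, dest : r2]
         move-arg-to-reg [r1, 1, r3]   // r3 := B, the second argument of !edge(A, B)
         move-reg-to-arg [r3, r2, 0]   // build the fact visit(B)
         send [r2, r3]                 // send visit(B) to the node address in r3
         return derived                // the rule fired; visit(A) is consumed
      }]
      return next                      // no matching edge: try the next visit fact
   }]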

Appendix F

Further Examples

F.1 Asynchronous PageRank

The asynchronous version of the PageRank algorithm avoids synchronization between iterations, thus trading precision for convergence speed. The formulation is similar to the one presented in Equation 3.14:

\[
x_i(t + 1) = G_i\,[x_0(t^i_0(t)), \cdots, x_n(t^i_n(t))]^T \tag{F.1}
\]

The major difference is that we now multiply $G_i$ (the row of inbound links for page $i$) with a transposed vector of PageRank values that are not necessarily from iteration $t$. The expression $t^i_j(t)$ refers to the iteration before $t$ at which page $i$ last received the PageRank value of page $j$. The value $x_j(t^i_j(t))$ then refers to the most up-to-date PageRank value of $j$ received at page $i$ that is going to be used to compute the PageRank value of $i$ at $t + 1$.

Figure F.1 shows the LM code for this particular version. The program uses seven predicates, which are described as follows: outbound represents an outbound page link; numInbound is the number of inbound page links; pagerank represents the current PageRank value of the node; neighbor-pagerank is the PageRank value of an inbound page; new-neighbor-pagerank represents a new PageRank value of an inbound page; sum-ranks is a temporary predicate used for computing new PageRanks; and update re-computes the PageRank value from the neighbors' PageRank values. The rules in lines 9-17 update the neighbor-pagerank values; the rule in lines 29-32 aggregates the neighbor values into a sum-ranks fact; and the rule in lines 19-25 asynchronously updates the current PageRank value and informs the neighbor nodes about the newly computed value by deriving multiple new-neighbor-pagerank facts. Note that in this last rule we use the function fabs, which computes the absolute value of a floating point number.

To build the proof of correctness, we must again prove several program invariants. In what follows, we will prove that this particular program corresponds to a computation on a nonnegative matrix of unit spectral radius, which has been proven to converge [KGS06, LM86].
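As a concrete reading of this notation (an illustrative example, not from the original text): if the global iteration is $t = 5$ but page $i$ last received page $j$'s value when $j$ was at iteration 3, then $t^i_j(5) = 3$ and the computation of $x_i(6)$ uses $x_j(3)$ rather than $x_j(5)$.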

1 type outbound(node, node, float). // Predicate declaration.
2 type numInbound(node, int).
3 type linear pagerank(node, float, int).
4 type linear neighbor-pagerank(node, node Neighbor, float Rank, int Iteration).
5 type linear new-neighbor-pagerank(node, node Neighbor, float Rank, int Iteration).
6 type linear sum-ranks(node, float).
7 type linear update(node).
8
9 new-neighbor-pagerank(A, B, New, Iteration), // Rule 1: update neighbor value.
10 neighbor-pagerank(A, B, Old, OldIteration),
11 Iteration > OldIteration
12    -o neighbor-pagerank(A, B, New, Iteration).
13
14 new-neighbor-pagerank(A, B, New, Iteration), // Rule 2: update neighbor value.
15 neighbor-pagerank(A, B, Old, OldIteration),
16 Iteration <= OldIteration
17    -o neighbor-pagerank(A, B, Old, OldIteration).
18
19 sum-ranks(A, Acc), // Rule 3: propagate new pagerank.
20 NewRank = damping/float(pages) + (1.0 - damping) * Acc,
21 pagerank(A, OldRank, Iteration)
22    -o pagerank(A, NewRank, Iteration + 1),
23       {B, W, Delta | !outbound(A, B, W), Delta = fabs(NewRank - OldRank) * W
24          -o new-neighbor-pagerank(B, A, NewRank, Iteration + 1),
25             if Delta > bound then update(B) end}.
26
27 update(A), update(A) -o update(A). // Rule 4: prune update facts.
28
29 update(A), // Rule 5: start update process.
30 !numInbound(A, T)
31    -o [sum => V; B, Val, Iter | neighbor-pagerank(A, B, Val, Iter),
32        V = Val/float(T) -o neighbor-pagerank(A, B, Val, Iter) -> sum-ranks(A, V)].
33
34 pagerank(A, 1.0 / float(pages), 0). // Initial facts.
35 update(A).
36 neighbor-pagerank(A, B, 1.0 / float(pages), 0). // pagerank of B is ...

Figure F.1: Asynchronous PageRank program.

Invariant F.1.1 (Page invariant) Each page/node has a single pagerank(A, Value, Iteration) fact and:
• for each outbound link, a single !outbound(A, B, W) fact;
• for each inbound link, a single neighbor-pagerank(A, B, V, Iter) fact;
• for each !outbound(A, B, W), a single neighbor-pagerank(B, A, V, Iter) fact.

Proof. All initial facts validate the three conditions of the invariant. Note that the third condition is also validated by the initial facts, although not all facts are shown in the code. In relation to rule application:
• Rule 1: inbound link re-derived.
• Rule 2: inbound link re-derived.
• Rule 3: pagerank re-derived.
• Rule 4: nothing happens.
• Rule 5: inbound links re-derived in the comprehension.

Lemma F.1.1 (Neighbor rank lemma) Given a fact neighbor-pagerank(A, B, V, Iter) and a set of facts new-neighbor-pagerank(A, B, New, Iter2), we end up with a single neighbor-pagerank(A, B, V', Iter') fact, where Iter' is the greatest of Iter and the Iter2 iterations.

Proof. By induction on the number of new-neighbor-pagerank facts. Base case: neighbor-pagerank remains. Inductive case: given one new-neighbor-pagerank fact:
• Rule 1: the stored iteration is older, thus neighbor-pagerank is replaced by the new value. By applying induction, we know that we will select either this new iteration or a better iteration from the remaining set of new-neighbor-pagerank facts.
• Rule 2: the new iteration is not newer, so we keep the old neighbor-pagerank fact. By induction, we select the best from either the current iteration or some other iteration (from the set).

Lemma F.1.2 (Update lemma) Given a new update fact, rule 5 will run.

Proof. By induction on the number of update facts. Base case: rule 5 will run for the first update fact. Inductive case: rule 4 runs first because it has a higher priority, reducing the number of update facts by one. By induction, we know that by using the remaining update facts, rule 5 will run.

Lemma F.1.3 (Pagerank update lemma)
(1) Given at least one update fact, the pagerank(A, $V_I$, I) fact will be updated to become pagerank(A, $V_{I+1}$, I + 1), where $V_{I+1} = damping/P + (1.0 - damping) \sum_{B,I} Val_{I,B} W_B$, with $P$ the number of pages, $W_B = 1.0/T$ (where T is the number of outbound links of B) and $Val_{I,B}$ taken from neighbor-pagerank(A, B, $Val_{I,B}$, I).
(2) For all outbound nodes B (from !outbound(A, B, W)), a new-neighbor-pagerank(B, A, $V_{I+1}$, I + 1) fact is generated.
(3) For all outbound nodes B (from !outbound(A, B, W)), an update(B) fact is generated if $fabs(V_{I+1} - V_I) * W > bound$.

Proof. Using the Update lemma, rule 5 will necessarily run, deriving sum-ranks(A, $\sum_{B,I} (Val_{I,B} W_B)$). The sum-ranks fact will then necessarily activate rule 3, computing $V_{I+1}$ and updating pagerank, which fulfills (1). (2) and (3) are fulfilled through the comprehension of rule 3.

Invariant F.1.2 (New neighbor rank equality) All new-neighbor-pagerank(A, B, V, I) facts are generated from a corresponding pagerank(B, V, I) fact, therefore the iteration of any new-neighbor-pagerank fact is at most the iteration of the current PageRank fact.

Proof. There are no initial facts to prove.
• Rule 3: true, a new fact is generated together with the new pagerank fact.
• Rule 4: the fact is kept.

Invariant F.1.3 (Neighbor rank equality) All neighbor-pagerank(A, B, V, I) facts have one corresponding pagerank(B, V, I) fact, and the iteration of the neighbor-pagerank fact is at most the current iteration of the corresponding pagerank fact.

Proof. By analyzing the initial facts and the rules. Axioms: true. Rule cases:
• Rule 1: uses a new-neighbor-pagerank fact (apply the new neighbor rank equality invariant).
• Rule 2: the same fact is re-derived.

Theorem F.1.1 (Pagerank convergence)
The program will compute the PageRank of all nodes within bound error of an asynchronous PageRank computation.

Proof. Using the program's initial facts, we start with the same PageRank value for all nodes. The !outbound(A, B, W) facts form the n × n square matrix (where n is the number of nodes) that is the Google matrix. All the initial PageRank values can be seen as a vector that adds up to 1. The PageRank computation from the Pagerank update lemma computes $V_{I+1} = damping/P + (1.0 - damping) \sum_{B,I'} (W_B\, Val_{I',B})$, where $I' \leq I$ (from the Neighbor rank equality invariant). Consider that each node contains a column $G_i$ of the Google matrix. The PageRank computation can then be represented as:

\[
V_{I+1} = G_i\,\mathit{fix}([Val_{I_1,B_1}, \ldots, Val_{I_p,B_p}]) \tag{1}
\]

where $p$ is the number of inbound links and $Val_{I_j,B_j}$ is the value of the fact neighbor-pagerank(A, $B_j$, $Val_{I_j,B_j}$, $I_j$). The fix() function takes the neighbor vector and expands it with zeros corresponding to nodes that are not inbound links. This is the expected formulation for the asynchronous PageRank computation [KGS06], as shown in Equation F.1.

From [KGS06, LM86] we know that equation (1) will improve (converge) the PageRank value, given that some new neighbor PageRank values are sent to node $i$ and given that $G_i$ is a nonnegative matrix of unit spectral radius. Let us use induction by assuming that there is at least one update fact that schedules a node to improve its PageRank. We want to prove that such a fact will not only improve the node's PageRank but also the PageRank vector. If the PageRank vector is close enough (within bound) to its real value, then the program will terminate.
• Base case: since we have an update fact as an axiom, rule 5 will compute a new PageRank (Pagerank update lemma) for all nodes that is improved (from equation (1)). From these updates, new update facts are generated for the nodes that have inbound links from the source node and need to update their own PageRank. These update facts may not be generated if the PageRank vector is already close enough to its real value.
• Inductive case: the induction hypothesis tells us that there is at least one node that has an update fact. From the Pagerank update lemma, this generates new-neighbor-pagerank facts if the new value differs significantly from the older value. When this happens, and using the Neighbor rank lemma, the target node will update its neighbor-pagerank fact with the newest iteration and then execute a valid PageRank computation that brings the PageRank vector closer to its solution.

Appendix G

Program Proofs

G.1 Single Source Shortest Path

The most interesting property of the SSSP program presented in Fig. 7.8 is that it remains provably correct, even though it applies rules using a different ordering. We now show the complete proof of correctness.

Invariant G.1.1 (Distance) relax(A, D, P) represents a valid distance D and a valid path P from node @1 to node A. If the shortest distance to @1 is D', then D ≥ D'. shortest(A, D, P) represents a valid distance D and a valid path P from node @1 to node A. If the shortest distance to @1 is D', then D ≥ D'. The shortest fact may also represent an invalid distance if D = ∞, in which case P = [].

Proof. By mutual induction. All the initial facts are valid, and the first (lines 7-11) and second (lines 13-14) rules of the program validate the invariant using the inductive hypothesis.

Lemma G.1.1 (Relaxation) Shorter distances are propagated to the neighbor nodes exactly once.

Proof. By the first rule, we know that for a new shorter distance, we keep the shorter distance and propagate it. By the second rule, longer distances are ignored.

Theorem G.1.1 (Correctness) Assume a graph G = (V, E) where w(a, b) ≥ 0 for each (a, b) ∈ E (positive weights). Consider that there is a set of nodes S ⊆ V where the shortest distance has been computed and a set U ⊆ V where the shortest distance has not been computed yet. Sets S and U are disjoint. At any given point, Σ is the sum of all current distances. For the distance ∞ we assign the value Σ' + 1, where Σ' is the largest shortest distance of any node reachable from @1. Every rule inference will either:
• maintain the size of S and reduce the total number of facts in the database;
• increase the size of S, reduce Σ and potentially increase the number of facts in the database; or
• maintain the size of S, reduce Σ and potentially increase the number of facts in the database.
Eventually, S = V and every shortest(A, D, P) fact will represent the shortest distance from A to @1, with P its corresponding path.

Proof. By nested induction on Σ and on the number of facts in the database. In the base case, we have relax(@1, 0, [@1]), which gives us the shortest distance for node @1, therefore S = {@1} and Σ is reduced. In the inductive case, we have a set S' where the shortest distance has been reached and relax distances may have been propagated (Relaxation lemma). Now consider the two rules of the program:
• The first rule will only apply at nodes in U. If the shortest relax is selected, then the node is added to S; otherwise the node stays in U but improves its shortest path, reducing Σ, and relax facts are generated (Relaxation lemma).
• The second rule is applied at nodes of either S or U. For both sets, the rule retracts the relax fact.
The case where the first rule derives the shortest relax distance at node t ∈ U happens when some node s ∈ S which minimizes argmin_t d(s) + w(s, t) is selected, where d(s) is the shortest distance to node @1 and w(s, t) is the weight of the edge (s, t). Using set-priority increases the probability of that node being selected, but it does not matter if it is not, since the program always makes progress and the shortest distances will eventually be computed.

G.2 MiniMax

To prove that the MiniMax code shown in Fig. 7.12 computes the best score along with the best possible move that the root-player can make, we have to generalize the proof and inductively prove that each node selects the best score depending on whether it is a minimizing or a maximizing node. We start with several useful lemmas.

Lemma G.2.1 (Play lemma) If a fact expand(A, FirstGame, RestGame, ND, NextPlayer, Play, Depth) exists, then there is a number n of available plays that NextPlayer can make on the remaining game state RestGame. The initial expand fact is retracted and generates n children B with facts play(B, Game', OPlayer, Play', Depth + 1) and parent(B, A), where OPlayer = next-player(NextPlayer), Play' is the index of an empty position in RestGame plus the length of FirstGame, and Game' represents the concatenation of FirstGame and RestGame where an empty position has been modified. The initial expand fact eventually derives either the fact maximize(A, ND + Empty, −∞, 0) (if NextPlayer = root-player) or minimize(A, ND + Empty, ∞, 0) (otherwise). The value Empty indicates the number of empty positions in the state RestGame.

Proof. By induction on the size of the list RestGame. There are 5 possible rules.
Rule 1 (lines 22-23): RestGame = [], thus a maximize fact is derived as expected.
Rule 2 (lines 25-26): RestGame = [], thus a minimize fact is derived as expected.
Rule 3 (lines 28-32): in this case we have a non-null RestGame and an empty position at index 0. As expected, we create a new B node with the expected facts and a new expand fact with a smaller RestGame. Apply the induction hypothesis to get the remaining n − 1 potential plays for NextPlayer.
Rule 4 (lines 34-37): same reasoning as rule 3. Note that the presence of coordination facts does not change the proof because they are not used in the rule's LHS.
Rule 5 (lines 39-40): no free space at the first position of RestGame. We derive another expand fact with a reduced RestGame and use the induction hypothesis.

Lemma G.2.2 (Children lemma) The fact play(A, Game, NextPlayer, LastPlay, Depth) is consumed by a program rule to realize one of the following scenarios:
• A new fact score(A, Score, LastPlay) is derived.
• There is a number n of available plays for NextPlayer in Game, which generate n new children B, where each B will have facts play(B, Game', OPlayer, Play, Depth + 1) and parent(B, A). Furthermore, from the use of the initial play fact, one of the following facts is derived: maximize(A, N, −∞, 0) (if NextPlayer = root-player) or minimize(A, N, ∞, 0) (otherwise). Note that Game' is an updated Game state where an empty position is played by NextPlayer.

Proof. A linear fact play(A, Game, NextPlayer, LastPlay, Depth) can only be used in either the first or the second rule. If the rule in lines 14-16 executes (minimax score returns a score), it means that Game is a final game state and there are no available plays for NextPlayer. Otherwise, the second rule in lines 18-20 applies and the Play lemma is used to complete the proof.

We now prove that the minimization and maximization rules work given the right number of scores available.

Lemma G.2.3 (Maximize lemma) Given a fact maximize(A, N, BScore, BPlay) and N facts new-score(A, OScore, OPlay), the program's rules will retract them all and generate a single score(A, BScore', BPlay') fact, where BScore' is the highest score among BScore and the OScore values and BPlay' is the corresponding play.

Proof. By induction on the number of new-score facts.
Case 1 (N = 0): trivial by applying the rule in line 58.
Case 2 (N > 0): pick an arbitrary new-score(A, OScore, OPlay) fact.
• Sub-case 2.1 (lines 52-53): if BScore < OScore, then we derive maximize(A, N − 1, OScore, OPlay). Use induction to get the final score fact.
• Sub-case 2.2 (lines 55-56): if BScore ≥ OScore, then we derive maximize(A, N − 1, BScore, BPlay). Use induction to get the final score fact.

Lemma G.2.4 (Minimize lemma) Given a fact minimize(A, N, BScore, BPlay) and N new-score(A, OScore, OPlay) facts, the program's rules will retract all those facts and derive a single score(A, BScore', BPlay') fact, where BScore' is the lowest score among BScore and the N OScore values and BPlay' is the corresponding play.

Finally, we are in a position to prove that a play fact is retracted and eventually produces a score fact that indicates the best score and best play for the player playing at the given node.

Theorem G.2.1 (Score theorem) For every play(A, Game, NextPlayer, LastPlay, Depth) fact, we either get a score fact (leaf case) or a recursive alternate maximization or minimization (depending on whether NextPlayer = root-player or not) over the children nodes. This maximization/minimization also results in a score(A, Score, BPlay) fact, where Score is the maximum/minimum score and BPlay is the corresponding play for that score.

Proof. By applying the Children lemma and using induction on the number of free positions in the state Game.
Case 1 (no free positions or the game is final): direct score.
Case 2 (available positions): n children nodes B are created with two facts: parent(B, A) and play(B, Game', next-player(NextPlayer), Play, Depth + 1), where Game' has position X filled in. We apply induction on the play fact of each child B to get a score(B, Score, Play) fact. Since we also derived a parent(B, A) fact, the rule in line 42 eventually executes, deriving new-score(A, Score, Play). We also derived a maximize(A, N, −∞, 0) fact (or minimize) and n new-score facts from the n children nodes, from which we apply the Maximize lemma (or the Minimize lemma) to get score(A, BScore, BPlay).

Corollary G.2.1 (MiniMax) Starting from a play(@0, initial-game, root-player, 0, 1) fact, the fact score(@0, BScore, BPlay) is eventually derived, where BPlay represents the best play which player root-player is able to make, minimizing the possible loss for a worst case scenario given initial-game.

G.3 N-Queens

We prove that the N-Queens program finds all the distinct solutions for the puzzle. In the following lemmas, we assume that (X0, Y0) is the coordinate of square/node A.

Lemma G.3.1 (test-y lemma) From a fact test-y(A, Y, State, OrigState), there are two possible scenarios:
• The test-y fact and everything derived from it is retracted and there is a Y ∈ State.

• The test-y fact is retracted and is used to derive a test-diag-left(A, X0 − 1, Y0 − 1, OrigState, OrigState) fact, since there is no such Y ∈ State.

Proof. Induction on the size of State. First rule: immediately the second scenario. Second rule: immediately the first scenario. Third rule: by induction.
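In LM, the three rules this proof steps through can be sketched as follows, assuming a persistent !coord(A, OX, OY) fact that stores the coordinate (X0, Y0) of square A (the declarations and the !coord predicate are assumptions about the listing, not quoted from it):

    type linear test-y(node, int, list int, list int).
    type linear test-diag-left(node, int, int, list int, list int).

    // First rule: State exhausted without a collision; move on to the
    // left-diagonal test (second scenario of the lemma).
    test-y(A, Y, [], OrigState), !coord(A, OX, OY)
       -o test-diag-left(A, OX - 1, OY - 1, OrigState, OrigState).

    // Second rule: the candidate column Y already appears in State, so
    // everything is consumed and nothing is derived (first scenario).
    test-y(A, Y, [Y1 | Rest], OrigState), Y = Y1 -o 1.

    // Third rule: no match at the head; keep scanning (inductive case).
    test-y(A, Y, [Y1 | Rest], OrigState), Y <> Y1
       -o test-y(A, Y, Rest, OrigState).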

Lemma G.3.2 (test-diag-left lemma) From a fact test-diag-left(A, X, Y, State, OrigState), there are two possible scenarios:
• The test-diag-left fact and everything derived from it is retracted and there is a y′ ∈ State, where y′ = Y − a and a is a non-negative number representing the index of y′ in State.

• The test-diag-left fact is retracted and is used to derive a test-diag-right(A, X0 − 1, Y0 + 1, OrigState, OrigState) fact, since there is no y′ ∈ State as specified above.

Proof. Induction on the size of State. First rule: immediately the second scenario. Second rule: immediately the first scenario. Third rule: by induction.
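The left-diagonal test walks the state list while decrementing Y, which is how the condition y′ = Y − a is realized operationally; a sketch in the same style as above:

    // A queen at the head of the list sits on A's left diagonal
    // (first scenario).
    test-diag-left(A, X, Y, [Y1 | Rest], OrigState), Y = Y1 -o 1.

    // Otherwise step one row up and one column to the left.
    test-diag-left(A, X, Y, [Y1 | Rest], OrigState), Y <> Y1
       -o test-diag-left(A, X - 1, Y - 1, Rest, OrigState).

    // List exhausted: no attack, continue with the right diagonal.
    test-diag-left(A, X, Y, [], OrigState), !coord(A, OX, OY)
       -o test-diag-right(A, OX - 1, OY + 1, OrigState, OrigState).

The rules for test-diag-right mirror this sketch, with Y + 1 in the recursive step and send-down(A, [OY | OrigState]) derived in the final rule.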

Lemma G.3.3 (test-diag-right lemma) From a fact test-diag-right(A, X, Y, State, OrigState), there are two possible scenarios:
• The test-diag-right fact and everything derived from it is retracted and there is a y′ ∈ State, where y′ = Y + a and a is a non-negative number representing the index of y′ in State.

• The test-diag-right fact is retracted and the fact send-down(A, [Y0|OrigState]) is derived.

Proof. Induction on the size of State. First rule: immediately the second scenario. Second rule: immediately the first scenario. Third rule: by induction.

Theorem G.3.1 (State validation) From a fact test-y(A, Y0, State, State), where (X0, Y0) is the coordinate of A and the length of State is X0, there are two possible scenarios:
• The test-y fact and anything derived from it is retracted.

• The test-y fact is retracted and a send-down(A, [Y0|State]) fact is derived, where [Y0|State] is a valid board state.

Proof. Use the previous three lemmas.

Lemma G.3.4 (Propagate left lemma) If a propagate-left(A, State) fact exists, then every cell to the left of A, including A itself, will derive a new-state fact carrying State. The original propagate-left fact and any other fact derived from it (except new-state) are retracted.

Proof. By induction on the number of cells to the left of A and using the only rule that uses propagate-left.

Lemma G.3.5 (Propagate right lemma) If a propagate-right(A, State) fact exists, then every cell to the right of A, including A itself, will derive a new-state fact carrying State. The original propagate-right fact and any other fact derived from it (except new-state) are retracted.

Proof. By induction on the number of cells to the right of A and using the only rule that uses propagate-right.
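Both propagation lemmas rest on one rule each; assuming persistent !left(A, B) and !right(A, B) routing facts between horizontally adjacent squares (again an assumption about the listing), the left version can be sketched as:

    // Record the state at the current square and forward it to the left
    // neighbor, if one exists (the comprehension is empty at the border).
    propagate-left(A, State)
       -o new-state(A, State),
          {B | !left(A, B) | propagate-left(B, State)}.

The propagate-right rule is identical with !right(A, B) in the comprehension.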

Theorem G.3.2 (States theorem) For a given row, we compute several send-down(A, State) facts that represent valid states that include that row and the rows above.

Proof. By induction on the number of rows. For row 0, we use the initial fact propagate-right(@0, []), that will be propagated to all nodes in row 0 (propagate right lemma). By using the state validation theorem, we know that every node will derive send-down(A, [Y]), which are all the valid states. By induction, we know that row X0 has derived every send-down fact possible. Such facts will be sent downwards to row X = X0 + 1 using the last rule in the program, deriving propagate-right or propagate-left that, in turn, will derive a new-state fact at each right or left square. Nothing is derived at the square below or the diagonal cells since they are not valid. From the new-state fact, a test-y fact is derived, which will be checked using the state validation theorem, filtering all new valid states and then finally deriving a new send-down fact.

G.4 Binary Search Tree

In this section, we prove several properties of the Binary Search Tree (BST) program as presented in Fig. 8.7. First, we assume that the initial facts of the program represent a valid binary search tree; therefore, for every node with a given key k, the left branch contains keys that are less than or equal to k, and the right branch contains keys that are greater than k.

Invariant G.4.1 (Immutable keys) Every node has a value(A, Key, Value) fact and the Key never changes.

Proof. By induction. In the base case, every node has an initial value fact. In the inductive case, each rule that uses a value fact also re-derives it and the Key argument never changes.

Invariant G.4.2 (Thread state) Every thread T in the program has a cache-size(T, Total) fact that represents the number of valid cache items, which take the form of cache(T, Node, Key) facts. Furthermore, there are no two facts cache(T, Node1, Key1) and cache(T, Node2, Key2) where Node1 = Node2.

Proof. By induction on each rule application. In the base case, all threads start with a cache-size(T, 0) fact and no cache(T, Node, Key) facts. In the inductive case, we must analyze each rule separately.

• Rule 1: the cache(T, A, Key) fact is kept in the database.
• Rule 2: this rule adds a new valid cache(T, A, Key) fact and correctly increments the cache-size counter. However, we must ensure that the cache item is unique. For that, note that this rule matches the same replace and value facts as rule 1; therefore, if there were already a cache item in the database, rule 1 would run before rule 2, since rule 1 has a higher priority. In a nutshell, rule 1 deals with cases where there is already a cache item, while rule 2 adds a new item if it does not exist. A sketch of both rules follows this list.
• Rule 3: the cache(T, TargetNode, RKey) fact is kept in the database.
• Rules 4 and 5: they do not derive or consume any cache-related facts.
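As a concrete sketch of rules 1 and 2 (declarations and argument order are assumptions about Fig. 8.7; textual order again stands in for rule priority):

    type linear replace(node, int, int).
    type linear value(node, int, int).
    type linear cache(thread, node, int).
    type linear cache-size(thread, int).

    // Rule 1: the target node is already cached by thread T; update the
    // value and re-derive the cache item unchanged.
    replace(A, RKey, RValue), value(A, RKey, Value), cache(T, A, RKey)
       -o value(A, RKey, RValue), cache(T, A, RKey).

    // Rule 2: target node found but not cached; update the value, add a
    // cache item and bump the thread's counter.
    replace(A, RKey, RValue), value(A, RKey, Value), cache-size(T, Total)
       -o value(A, RKey, RValue), cache(T, A, RKey),
          cache-size(T, Total + 1).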

Theorem G.4.1 (Valid replace) Assume a valid BST and a set of threads with valid caches. Given a replace(A, RKey, RValue) fact at node A, the node N with key RKey will replace its value(N, RKey, Value) fact with value(N, RKey, RValue).

Proof. First consider the BST rooted at node A and then use induction on the size of that smaller BST. In the base case, where A = N, rules 1 and 2 apply. Rule 1 is applied if the executing thread already has node N in its cache. Rule 2 applies when the executing thread does not have node N in its cache. In the inductive case, there are two sub-cases:

• The executing thread has a valid cache item for key RKey (from Invariant G.4.2); therefore rule 3 is applied and a replace fact is derived at node N.

• Without a valid cache item, rule 4 or 5 is applied. Rule 4 applies if the key is in the left branch and rule 5 applies if the key is in the right branch. Use the induction hypothesis on the selected branch.
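Rules 3 to 5 can be sketched in the same style, assuming persistent !left(A, B) and !right(A, B) facts for the tree edges (an illustration, not the listing of Fig. 8.7):

    // Rule 3: a cache hit for RKey at some node lets the thread jump
    // straight to the target node.
    replace(A, RKey, RValue), cache(T, TargetNode, RKey)
       -o cache(T, TargetNode, RKey), replace(TargetNode, RKey, RValue).

    // Rule 4: RKey belongs to the left branch (rules 1 and 2 fire first
    // when RKey = Key at A itself, since they have higher priority).
    replace(A, RKey, RValue), value(A, Key, Value), !left(A, B),
    RKey <= Key
       -o value(A, Key, Value), replace(B, RKey, RValue).

    // Rule 5: RKey belongs to the right branch.
    replace(A, RKey, RValue), value(A, Key, Value), !right(A, B),
    RKey > Key
       -o value(A, Key, Value), replace(B, RKey, RValue).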
