The University of Chicago Abstract Trace Analysis

Total Page:16

File Type:pdf, Size:1020Kb

The University of Chicago Abstract Trace Analysis THE UNIVERSITY OF CHICAGO ABSTRACT TRACE ANALYSIS FOR CONCURRENT PROGRAMS A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE BY YINGQI XIAO CHICAGO, ILLINOIS AUGUST 2010 To my parents, Meng and Mingying, and my wife, Jing. TABLE OF CONTENTS LIST OF FIGURES . vi ACKNOWLEDGEMENTS . viii ABSTRACT . ix Chapter 1 INTRODUCTION . 1 1.1 The Challenges . 2 1.1.1 A Simple Concurrent Language . 2 1.1.2 A Simple Example . 4 1.1.3 A Pipeline Example . 6 1.2 This Dissertation . 9 1.3 Organization of This Dissertation . 12 2 BACKGROUND AND RELATED WORK . 14 2.1 Data-Flow Analysis . 14 2.1.1 An Example: Reaching-Definition Analysis for Channels . 15 2.2 Control-Flow Analysis . 16 2.3 π-calculus . 18 2.4 Related Work . 20 2.4.1 Input and Output Only Channels . 21 2.4.2 Linear Types for Concurrent Languages . 22 2.4.3 Nielson and Nielson’s Type and Effect System . 26 2.4.4 Session Types . 29 2.4.5 Colby’s Abstract Interpretation Approach . 30 2.4.6 Our Previous Work . 31 iii 3 ABSTRACT TRACE ANALYSIS . 35 3.1 Dynamic Semantics . 35 3.1.1 Properties of Traces . 39 3.2 Abstract Trace Analysis Overview . 40 3.3 Type-Sensitive Control-Flow Analysis for CML . 41 3.3.1 Introduction of 0-CFA . 41 3.3.2 Abstract Values . 43 3.3.3 Type-Sensitive CFA . 44 3.3.4 Properties . 45 3.4 The Extended CFG . 45 3.5 Abstract Trace Analysis . 47 4 APPLICATIONS OF ANALYSIS . 51 4.1 Detecting Special Communication Pattern . 51 4.1.1 Specialized Channel Operations . 52 4.1.2 An Example . 53 4.1.3 Communication Topology Analysis . 56 4.2 Concurrency Analysis . 65 4.2.1 Overview . 66 4.2.2 Ordering Enforced by Control Flow . 67 4.2.3 Ordering Enforced by Synchronization . 71 4.2.4 Refinement . 73 4.2.5 Automatic Monitor Recognition and Transformation . 73 4.2.6 Monitor Communication Pattern Recognition . 77 4.2.7 Monitor Thread Transformation . 81 5 ANALYSIS CORRECTNESS . 84 5.1 Communication Topology . 84 5.2 Concurrency Analysis . 97 6 IMPLEMENTATION . 103 6.1 The Type-Sensitive CFA Algorithm . 103 6.2 Abstract Trace Analysis Algorithm . 108 6.3 Implementation of Specialized CML Primitives . 110 6.3.1 Preliminaries . 111 6.3.2 The General Parallel Implementation . 114 6.3.3 A Reference Implementation of Fan-In and Fan-Out Channels . 114 iv 7 PERFORMANCE . 122 7.1 One-To-One Benchmark . 123 7.2 Primes Benchmark . 124 7.3 Fan-Out and Fan-In Benchmarks . 126 7.4 Monitor Benchmark . 129 8 CONCLUSION . 132 8.1 Summary . 132 8.2 Future Work . 133 REFERENCES . 135 v LIST OF FIGURES 1.1 A simple concurrent language . 3 1.2 A simple example in SCL . 4 1.3 The flow graph of the simple example . 5 1.4 Simple pipeline . 7 1.5 The flow graph of the simple pipeline example . 8 1.6 One possible history of five processes . 10 2.1 Syntax of π-calculus . 18 2.2 Structural congruence of the π-calculus . 19 2.3 Reduction rules of the π-calculus . 19 2.4 Syntax of tagged processes . 22 2.5 Typing rules . 23 2.6 Syntax for linear type channels . 23 2.7 Reduction rules of linear types channels . 24 2.8 Typing rules . 25 2.9 Typing rules . 27 2.10 Concurrent evaluation rules . 28 2.11 Analyzing the CFG G for channel c .................... 33 3.1 Sequential evaluation . 37 3.2 The join of two abstract values . 44 3.3 Approximation of channel send and receive sites . 45 3.4 Simple service . 49 4.1 A simple service with an abstract client-server protocol . 54 4.2 Data-flow of the server’s request channel . 55 4.3 A version of Figure 4.1 with specialized communication operations . 56 4.4 Standard data-flow equations . 68 4.5 A self-concurrent example . 68 4.6 The self-concurrent example in ECFG . 69 4.7 Data-flow equations for concurrency analysis . 71 4.8 A simple example with synchronization edge . 73 4.9 The self-concurrent example in ECFG . 74 4.10 An example of monitor tasks . 75 4.11 Simple monitor . 76 4.12 The optimized version of simple monitor . 77 4.13 The simple monitor after transformation . 83 vi 5.1 The definition of abs ............................ 87 6.1 CFA for expressions Part I . 104 6.2 CFA for expressions Part II . 105 6.3 Algorithm for abstract traceflow analysis . 109 6.4 Algorithm for testing for a valid trace . 109 6.5 Algorithm for redundant trace . 110 6.6 The reference implementation of general send . 113 6.7 The reference implementation of general recv . 113 6.8 The specialized implementation of send for fan-in channels . 118 6.9 The specialized implementation of recv for fan-in channels . 119 6.10 The specialized implementation of send for fan-out channels . 120 6.11 The specialized implementation of recv for fan-out channels . 121 7.1 Primes benchmark results . 125 7.2 Primes speed-up curves . 125 7.3 Fan-out benchmark results . 126 7.4 Fan-out speed-up curves . 127 7.5 Fan-in benchmark results . 128 7.6 Fan-in speed-up curves . 128 7.7 Monitor benchmark results . 130 7.8 Monitor speed-up curves . 131 vii ACKNOWLEDGEMENTS First of all, I would like to thank my advisor, Professor John H. Reppy, for his support and guidance throughout my graduate research at the University of Chicago and for his help with the proof reading of this dissertation. Without his insight and guidance, I would never be able to accomplish this dissertation. I would also like to thank other members of my committee, Dave MacQueen and Xinyu Feng, for reading my dissertation and providing feedback on it. I would like to give my heartfelt thanks to my parents, who raised me with their endless love and support. I would also like to give my heartfelt thanks to my wife, who has accom- panied me with her love, understanding and encouragement. I dedicate this dissertation to my parents and my wife. Finally, I would like to.
Recommended publications
  • Fast and Correct Load-Link/Store-Conditional Instruction Handling in DBT Systems
    Fast and Correct Load-Link/Store-Conditional Instruction Handling in DBT Systems Abstract Atomic Read Memory Dynamic Binary Translation (DBT) requires the implemen- Operation tation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When Equals targeting e.g. x86 host systems, LL/SC guest instructions are YES NO typically emulated using atomic Compare-and-Swap (CAS) expected value? instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between Write new value LL/SC and CAS semantics. In this paper, we demonstrate that to memory this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, Exchanged = true Exchanged = false which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emula- tion schemes: (1) A purely software based scheme, which uses the DBT system’s page translation cache for correctly se- Figure 1: The compare-and-swap instruction atomically reads lecting between fast, but unsynchronized, and slow, but fully from a memory address, and compares the current value synchronized memory accesses, and (2) a hardware acceler- with an expected value. If these values are equal, then a new ated scheme that leverages hardware transactional memory value is written to memory. The instruction indicates (usually (HTM) provided by the host. We have implemented these two through a return register or flag) whether or not the value was schemes in the Synopsys DesignWare® ARC® nSIM DBT successfully written.
    [Show full text]
  • Concepts of Concurrent Programming Summary of the Course in Spring 2011 by Bertrand Meyer and Sebastian Nanz
    Concepts of Concurrent Programming Summary of the course in spring 2011 by Bertrand Meyer and Sebastian Nanz Stefan Heule 2011-05-28 Licence: Creative Commons Attribution-Share Alike 3.0 Unported (http://creativecommons.org/licenses/by-sa/3.0/) Contents 1 Introduction .......................................................................................................................................... 4 1.1 Ambdahl’s Law .............................................................................................................................. 4 1.2 Basic Notions ................................................................................................................................. 4 1.2.1 Multiprocessing ..................................................................................................................... 4 1.2.2 Multitasking .......................................................................................................................... 4 1.2.3 Definitions ............................................................................................................................. 4 1.2.4 The Interleaving Semantics ................................................................................................... 5 1.3 Transition Systems and LTL ........................................................................................................... 6 1.3.1 Syntax and Semantics of Linear-Time Temporal Logic.......................................................... 7 1.3.2 Safety and Liveness Properties
    [Show full text]
  • Implementing a Transformation from BPMN to CSP+T with ATL: Lessons Learnt
    Implementing a Transformation from BPMN to CSP+T with ATL: Lessons Learnt Aleksander González1, Luis E. Mendoza1, Manuel I. Capel2 and María A. Pérez1 1 Processes and Systems Department, Simón Bolivar University PO Box 89000, Caracas, 1080-A, Venezuela 2 Software Engineering Department, University of Granada Aynadamar Campus, 18071, Granada, Spain Abstract. Among the challenges to face in order to promote the use of tech- niques of formal verification in organizational environments, there is the possi- bility of offering the integration of features provided by a Model Transforma- tion Language (MTL) as part of a tool very used by business analysts, and from which formal specifications of a model can be generated. This article presents the use of MTL ATLAS Transformation Language (ATL) as a transformation artefact within the domains of Business Process Modelling Notation (BPMN) and Communicating Sequential Processes + Time (CSP+T). It discusses the main difficulties encountered and the lessons learnt when building BTRANSFORMER; a tool developed for the Eclipse platform, which allows us to generate a formal specification in the CSP+T notation from a business process model designed with BPMN. This learning is valid for those who are interested in formalizing a Business Process Modelling Language (BPML) by means of a process calculus or another formal notation. 1 Introduction Business Processes (BP) must be properly and formally specified in order to be able to verify properties, such as scope, structure, performance, capacity, structural consis- tency and concurrency, i.e., those properties of BP which can provide support to the critical success factors of any organization. Formal specification languages and proc- ess algebras, which allow for the exhaustive verification of BP behaviour [17], are used to carry out the formalization of models obtained from Business Process Model- ling (BPM).
    [Show full text]
  • Bisimulations in the Join-Calculus
    Bisimulations in the Join-Calculus C´edricFournet a Cosimo Laneve b,1 aMicrosoft Research, 1 Guildhall Street, Cambridge, U.K. b Dipartimento di Scienze dell’Informazione, Universit`adi Bologna, Mura Anteo Zamboni 7, 40127 Bologna, Italy. Abstract We develop a theory of bisimulations in the join-calculus. We introduce a refined operational model that makes interactions with the environment explicit, and dis- cuss the impact of the lexical scope discipline of the join-calculus on its extensional semantics. We propose several formulations of bisimulation and establish that all formulations yield the same equivalence. We prove that this equivalence is finer than barbed congruence, but that both relations coincide in the presence of name matching. Key words: asynchronous processes; barbed congruence; bisimulation; chemical semantics; concurrency; join-calculus; locality; name matching; pi-calculus. 1 Introduction The join-calculus is a recent formalism for modeling mobile systems [15,17]. Its main motivation is to relate two crucial issues in concurrency: distributed implementation and formal semantics. To this end, the join-calculus enforces a strict lexical scope discipline over the channel names that appear in processes: names can be sent and received, but their input capabilities cannot be affected by the receivers. This is the locality property. 2 Locality yields a realistic distributed model, because the communication prim- itives of the calculus can be directly implemented via standard primitives of 1 This work is partly supported by the ESPRIT CONFER-2 WG-21836 2 The term locality is a bit overloaded in the literature; here, names are locally defined inasmuch as no external definition may interfere; this is the original meaning of locality in the chemical semantics of Banˆatre et al.
    [Show full text]
  • Mastering Concurrent Computing Through Sequential Thinking
    review articles DOI:10.1145/3363823 we do not have good tools to build ef- A 50-year history of concurrency. ficient, scalable, and reliable concur- rent systems. BY SERGIO RAJSBAUM AND MICHEL RAYNAL Concurrency was once a specialized discipline for experts, but today the chal- lenge is for the entire information tech- nology community because of two dis- ruptive phenomena: the development of Mastering networking communications, and the end of the ability to increase processors speed at an exponential rate. Increases in performance come through concur- Concurrent rency, as in multicore architectures. Concurrency is also critical to achieve fault-tolerant, distributed services, as in global databases, cloud computing, and Computing blockchain applications. Concurrent computing through sequen- tial thinking. Right from the start in the 1960s, the main way of dealing with con- through currency has been by reduction to se- quential reasoning. Transforming problems in the concurrent domain into simpler problems in the sequential Sequential domain, yields benefits for specifying, implementing, and verifying concur- rent programs. It is a two-sided strategy, together with a bridge connecting the Thinking two sides. First, a sequential specificationof an object (or service) that can be ac- key insights ˽ A main way of dealing with the enormous challenges of building concurrent systems is by reduction to sequential I must appeal to the patience of the wondering readers, thinking. Over more than 50 years, more sophisticated techniques have been suffering as I am from the sequential nature of human developed to build complex systems in communication. this way. 12 ˽ The strategy starts by designing —E.W.
    [Show full text]
  • Deadlock Analysis of Wait-Notify Coordination Laneve Cosimo, Luca Padovani
    Deadlock Analysis of Wait-Notify Coordination Laneve Cosimo, Luca Padovani To cite this version: Laneve Cosimo, Luca Padovani. Deadlock Analysis of Wait-Notify Coordination. The Art of Modelling Computational Systems: A Journey from Logic and Concurrency to Security and Privacy - Essays Dedicated to Catuscia Palamidessi on the Occasion of Her 60th Birthday, Nov 2019, Paris, France. hal-02430351 HAL Id: hal-02430351 https://hal.archives-ouvertes.fr/hal-02430351 Submitted on 7 Jan 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Deadlock Analysis of Wait-Notify Coordination Cosimo Laneve1[0000−0002−0052−4061] and Luca Padovani2[0000−0001−9097−1297] 1 Dept. of Computer Science and Engineering, University of Bologna { INRIA Focus 2 Dipartimento di Informatica, Universit`adi Torino Abstract. Deadlock analysis of concurrent programs that contain co- ordination primitives (wait, notify and notifyAll) is notoriously chal- lenging. Not only these primitives affect the scheduling of processes, but also notifications unmatched by a corresponding wait are silently lost. We design a behavioral type system for a core calculus featuring shared objects and Java-like coordination primitives. The type system is based on a simple language of object protocols { called usages { to determine whether objects are used reliably, so as to guarantee deadlock freedom.
    [Show full text]
  • A Separation Logic for Refining Concurrent Objects
    A Separation Logic for Refining Concurrent Objects Aaron Turon ∗ Mitchell Wand Northeastern University {turon, wand}@ccs.neu.edu Abstract do { tmp = *C; } until CAS(C, tmp, tmp+1); Fine-grained concurrent data structures are crucial for gaining per- implements inc using an optimistic approach: it takes a snapshot formance from multiprocessing, but their design is a subtle art. Re- of the counter without acquiring a lock, computes the new value cent literature has made large strides in verifying these data struc- of the counter, and uses compare-and-set (CAS) to safely install the tures, using either atomicity refinement or separation logic with new value. The key is that CAS compares *C with the value of tmp, rely-guarantee reasoning. In this paper we show how the owner- atomically updating *C with tmp + 1 and returning true if they ship discipline of separation logic can be used to enable atomicity are the same, and just returning false otherwise. refinement, and we develop a new rely-guarantee method that is lo- Even for this simple data structure, the fine-grained imple- calized to the definition of a data structure. We present the first se- mentation significantly outperforms the lock-based implementa- mantics of separation logic that is sensitive to atomicity, and show tion [17]. Likewise, even for this simple example, we would prefer how to control this sensitivity through ownership. The result is a to think of the counter in a more abstract way when reasoning about logic that enables compositional reasoning about atomicity and in- its clients, giving it the following specification: terference, even for programs that use fine-grained synchronization and dynamic memory allocation.
    [Show full text]
  • The Beacon Calculus: a Formal Method for the flexible and Concise Modelling of Biological Systems
    bioRxiv preprint doi: https://doi.org/10.1101/579029; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. The Beacon Calculus: A formal method for the flexible and concise modelling of biological systems Michael A. Boemo1∗ Luca Cardelli2 Conrad A. Nieduszynski3 1Department of Pathology, University of Cambridge 2Department of Computer Science, University of Oxford 3Genome Damage and Stability Centre, University of Sussex Abstract Biological systems are made up of components that change their actions (and interactions) over time and coordinate with other components nearby. Together with a large state space, the complexity of this behaviour can make it difficult to create concise mathematical models that can be easily extended or modified. This paper introduces the Beacon Calculus, a process algebra designed to simplify the task of modelling interacting biological components. Its breadth is demonstrated by creating models of DNA replication dynamics, the gene expression dynamics in response to DNA methylation damage, and a multisite phosphorylation switch. The flexibility of these models is shown by adapting the DNA replication model to further include two topics of interest from the literature: cooperative origin firing and replication fork barriers. The Beacon Calculus is supported with the open-source simulator bcs (https://github.com/MBoemo/bcs.git) to allow users to develop and simulate their own models. Author summary Simulating a model of a biological system can suggest ideas for future experiments and help ensure that conclusions about a mechanism are consistent with data.
    [Show full text]
  • Lock-Free Programming
    Lock-Free Programming Geoff Langdale L31_Lockfree 1 Desynchronization ● This is an interesting topic ● This will (may?) become even more relevant with near ubiquitous multi-processing ● Still: please don’t rewrite any Project 3s! L31_Lockfree 2 Synchronization ● We received notification via the web form that one group has passed the P3/P4 test suite. Congratulations! ● We will be releasing a version of the fork-wait bomb which doesn't make as many assumptions about task id's. – Please look for it today and let us know right away if it causes any trouble for you. ● Personal and group disk quotas have been grown in order to reduce the number of people running out over the weekend – if you try hard enough you'll still be able to do it. L31_Lockfree 3 Outline ● Problems with locking ● Definition of Lock-free programming ● Examples of Lock-free programming ● Linux OS uses of Lock-free data structures ● Miscellanea (higher-level constructs, ‘wait-freedom’) ● Conclusion L31_Lockfree 4 Problems with Locking ● This list is more or less contentious, not equally relevant to all locking situations: – Deadlock – Priority Inversion – Convoying – “Async-signal-safety” – Kill-tolerant availability – Pre-emption tolerance – Overall performance L31_Lockfree 5 Problems with Locking 2 ● Deadlock – Processes that cannot proceed because they are waiting for resources that are held by processes that are waiting for… ● Priority inversion – Low-priority processes hold a lock required by a higher- priority process – Priority inheritance a possible solution L31_Lockfree
    [Show full text]
  • Multithreading for Gamedev Students
    Multithreading for Gamedev Students Keith O’Conor 3D Technical Lead Ubisoft Montreal @keithoconor - Who I am - PhD (Trinity College Dublin), Radical Entertainment (shipped Prototype 1 & 2), Ubisoft Montreal (shipped Watch_Dogs & Far Cry 4) - Who this is aimed at - Game programming students who don’t necessarily come from a strict computer science background - Some points might be basic for CS students, but all are relevant to gamedev - Slides available online at fragmentbuffer.com Overview • Hardware support • Common game engine threading models • Race conditions • Synchronization primitives • Atomics & lock-free • Hazards - Start at high level, finish in the basement - Will also talk about potential hazards and give an idea of why multithreading is hard Overview • Only an introduction ◦ Giving a vocabulary ◦ See references for further reading ◦ Learn by doing, hair-pulling - Way too big a topic for a single talk, each section could be its own series of talks Why multithreading? - Hitting power & heat walls when going for increased frequency alone - Go wide instead of fast - Use available resources more efficiently - Taken from http://www.karlrupp.net/2015/06/40-years-of-microprocessor- trend-data/ Hardware multithreading Hardware support - Before we use multithreading in games, we need to understand the various levels of hardware support that allow multiple instructions to be executed in parallel - There are many more aspects to hardware multithreading than we’ll look at here (processor pipelining, cache coherency protocols etc.) - Again,
    [Show full text]
  • CS 241 Honors Concurrent Data Structures
    CS 241 Honors Concurrent Data Structures Bhuvan Venkatesh University of Illinois Urbana{Champaign March 7, 2017 CS 241 Course Staff (UIUC) Lock Free Data Structures March 7, 2017 1 / 37 What to go over Terminology What is lock free Example of Lock Free Transactions and Linerazability The ABA problem Drawbacks Use Cases CS 241 Course Staff (UIUC) Lock Free Data Structures March 7, 2017 2 / 37 Atomic instructions - Instructions that happen in one step to the CPU or not at all. Some examples are atomic add, atomic compare and swap. Terminology Transaction - Like atomic operations, either the entire transaction goes through or doesn't. This could be a series of operations (like push or pop for a stack) CS 241 Course Staff (UIUC) Lock Free Data Structures March 7, 2017 3 / 37 Terminology Transaction - Like atomic operations, either the entire transaction goes through or doesn't. This could be a series of operations (like push or pop for a stack) Atomic instructions - Instructions that happen in one step to the CPU or not at all. Some examples are atomic add, atomic compare and swap. CS 241 Course Staff (UIUC) Lock Free Data Structures March 7, 2017 3 / 37 int atomic_cas(int *addr, int *expected, int value){ if(*addr == *expected){ *addr = value; return 1;//swap success }else{ *expected = *addr; return 0;//swap failed } } But this all happens atomically! Atomic Compare and Swap? CS 241 Course Staff (UIUC) Lock Free Data Structures March 7, 2017 4 / 37 But this all happens atomically! Atomic Compare and Swap? int atomic_cas(int *addr, int *expected,
    [Show full text]
  • Proof-Relevant Π-Calculus
    Proof-relevant π-calculus Roly Perera James Cheney University of Glasgow University of Edinburgh Glasgow, UK Edinburgh, UK [email protected] [email protected] Formalising the π-calculus is an illuminating test of the expressiveness of logical frameworks and mechanised metatheory systems, because of the presence of name binding, labelled transitions with name extrusion, bisimulation, and structural congruence. Formalisations have been undertaken in a variety of systems, primarily focusing on well-studied (and challenging) properties such as the theory of process bisimulation. We present a formalisation in Agda that instead explores the theory of concurrent transitions, residuation, and causal equivalence of traces, which has not previously been formalised for the π-calculus. Our formalisation employs de Bruijn indices and dependently- typed syntax, and aligns the “proved transitions” proposed by Boudol and Castellani in the context of CCS with the proof terms naturally present in Agda’s representation of the labelled transition relation. Our main contributions are proofs of the “diamond lemma” for residuation of concurrent transitions and a formal deVnition of equivalence of traces up to permutation of transitions. 1 Introduction The π-calculus [18, 19] is an expressive model of concurrent and mobile processes. It has been investigated extensively and many variations, extensions and reVnements have been proposed, including the asynchronous, polyadic, and applied π-calculus (among many others). The π-calculus has also attracted considerable attention from the logical frameworks and meta-languages community, and formalisations of its syntax and semantics have been performed using most of the extant mechanised metatheory techniques, including (among others) Coq [13, 12, 15], Nominal Isabelle [2], Abella [1] (building on Miller and Tiu [26]), CLF [6], and Agda [21].
    [Show full text]