Measuring and Understanding Consistency at Facebook
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Memory Consistency
Lecture 9: Memory Consistency Parallel Computing Stanford CS149, Winter 2019 Midterm ▪ Feb 12 ▪ Open notes ▪ Practice midterm Stanford CS149, Winter 2019 Shared Memory Behavior ▪ Intuition says loads should return latest value written - What is latest? - Coherence: only one memory location - Consistency: apparent ordering for all locations - Order in which memory operations performed by one thread become visible to other threads ▪ Affects - Programmability: how programmers reason about program behavior - Allowed behavior of multithreaded programs executing with shared memory - Performance: limits HW/SW optimizations that can be used - Reordering memory operations to hide latency Stanford CS149, Winter 2019 Today: what you should know ▪ Understand the motivation for relaxed consistency models ▪ Understand the implications of relaxing W→R ordering Stanford CS149, Winter 2019 Today: who should care ▪ Anyone who: - Wants to implement a synchronization library - Will ever work a job in kernel (or driver) development - Seeks to implement lock-free data structures * - Does any of the above on ARM processors ** * Topic of a later lecture ** For reasons to be described later Stanford CS149, Winter 2019 Memory coherence vs. memory consistency ▪ Memory coherence defines requirements for the observed Observed chronology of operations on address X behavior of reads and writes to the same memory location - All processors must agree on the order of reads/writes to X P0 write: 5 - In other words: it is possible to put all operations involving X on a timeline such P1 read (5) that the observations of all processors are consistent with that timeline P2 write: 10 ▪ Memory consistency defines the behavior of reads and writes to different locations (as observed by other processors) P2 write: 11 - Coherence only guarantees that writes to address X will eventually propagate to other processors P1 read (11) - Consistency deals with when writes to X propagate to other processors, relative to reads and writes to other addresses Stanford CS149, Winter 2019 Coherence vs. -
Memory Consistency: in a Distributed Outline for Lecture 20 Memory System, References to Memory in Remote Processors Do Not Take Place I
Memory consistency: In a distributed Outline for Lecture 20 memory system, references to memory in remote processors do not take place I. Memory consistency immediately. II. Consistency models not requiring synch. This raises a potential problem. Suppose operations that— Strict consistency Sequential consistency • The value of a particular memory PRAM & processor consist. word in processor 2’s local memory III. Consistency models is 0. not requiring synch. operations. • Then processor 1 writes the value 1 to that word of memory. Note that Weak consistency this is a remote write. Release consistency • Processor 2 then reads the word. But, being local, the read occurs quickly, and the value 0 is returned. What’s wrong with this? This situation can be diagrammed like this (the horizontal axis represents time): P1: W (x )1 P2: R (x )0 Depending upon how the program is written, it may or may not be able to tolerate a situation like this. But, in any case, the programmer must understand what can happen when memory is accessed in a DSM system. Consistency models: A consistency model is essentially a contract between the software and the memory. It says that iff they software agrees to obey certain rules, the memory promises to store and retrieve the expected values. © 1997 Edward F. Gehringer CSC/ECE 501 Lecture Notes Page 220 Strict consistency: The most obvious consistency model is strict consistency. Strict consistency: Any read to a memory location x returns the value stored by the most recent write operation to x. This definition implicitly assumes the existence of a global clock, so that the determination of “most recent” is unambiguous. -
Towards Shared Memory Consistency Models for Gpus
TOWARDS SHARED MEMORY CONSISTENCY MODELS FOR GPUS by Tyler Sorensen A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Bachelor of Science School of Computing The University of Utah March 2013 FINAL READING APPROVAL I have read the thesis of Tyler Sorensen in its final form and have found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place. Date Ganesh Gopalakrishnan ABSTRACT With the widespread use of GPUs, it is important to ensure that programmers have a clear understanding of their shared memory consistency model i.e. what values can be read when issued concurrently with writes. While memory consistency has been studied for CPUs, GPUs present very different memory and concurrency systems and have not been well studied. We propose a collection of litmus tests that shed light on interesting visibility and ordering properties. These include classical shared memory consistency model properties, such as coherence and write atomicity, as well as GPU specific properties e.g. memory visibility differences between intra and inter block threads. The results of the litmus tests are determined by feedback from industry experts, the limited documentation available and properties common to all modern multi-core systems. Some of the behaviors remain unresolved. Using the results of the litmus tests, we establish a formal state transition model using intuitive data structures and operations. We implement our model in the Murphi modeling language and verify the initial litmus tests. -
Sysml, the Language of MBSE Paul White
Welcome to SysML, the Language of MBSE Paul White October 8, 2019 Brief Introduction About Myself • Work Experience • 2015 – Present: KIHOMAC / BAE – Layton, Utah • 2011 – 2015: Astronautics Corporation of America – Milwaukee, Wisconsin • 2001 – 2011: L-3 Communications – Greenville, Texas • 2000 – 2001: Hynix – Eugene, Oregon • 1999 – 2000: Raytheon – Greenville, Texas • Education • 2019: OMG OCSMP Model Builder—Fundamental Certification • 2011: Graduate Certification in Systems Engineering and Architecting – Stevens Institute of Technology • 1999 – 2004: M.S. Computer Science – Texas A&M University at Commerce • 1993 – 1998: B.S. Computer Science – Texas A&M University • INCOSE • Chapters: Wasatch (2015 – Present), Chicagoland (2011 – 2015), North Texas (2007 – 2011) • Conferences: WSRC (2018), GLRCs (2012-2017) • CSEP: (2017 – Present) • 2019 INCOSE Outstanding Service Award • 2019 INCOSE Wasatch -- Most Improved Chapter Award & Gold Circle Award • Utah Engineers Council (UEC) • 2019 & 2018 Engineer of the Year (INCOSE) for Utah Engineers Council (UEC) • Vice Chair • Family • Married 14 years • Three daughters (1, 12, & 10) 2 Introduction 3 Our Topics • Definitions and Expectations • SysML Overview • Basic Features of SysML • Modeling Tools and Techniques • Next Steps 4 What is Model-based Systems Engineering (MBSE)? Model-based systems engineering (MBSE) is “the formalized application of modeling to support system requirements, design, analysis, verification and validation activities beginning in the conceptual design phase and continuing throughout development and later life cycle phases.” -- INCOSE SE Vision 2020 5 What is Model-based Systems Engineering (MBSE)? “Formal systems modeling is standard practice for specifying, analyzing, designing, and verifying systems, and is fully integrated with other engineering models. System models are adapted to the application domain, and include a broad spectrum of models for representing all aspects of systems. -
Plantuml Language Reference Guide (Version 1.2021.2)
Drawing UML with PlantUML PlantUML Language Reference Guide (Version 1.2021.2) PlantUML is a component that allows to quickly write : • Sequence diagram • Usecase diagram • Class diagram • Object diagram • Activity diagram • Component diagram • Deployment diagram • State diagram • Timing diagram The following non-UML diagrams are also supported: • JSON Data • YAML Data • Network diagram (nwdiag) • Wireframe graphical interface • Archimate diagram • Specification and Description Language (SDL) • Ditaa diagram • Gantt diagram • MindMap diagram • Work Breakdown Structure diagram • Mathematic with AsciiMath or JLaTeXMath notation • Entity Relationship diagram Diagrams are defined using a simple and intuitive language. 1 SEQUENCE DIAGRAM 1 Sequence Diagram 1.1 Basic examples The sequence -> is used to draw a message between two participants. Participants do not have to be explicitly declared. To have a dotted arrow, you use --> It is also possible to use <- and <--. That does not change the drawing, but may improve readability. Note that this is only true for sequence diagrams, rules are different for the other diagrams. @startuml Alice -> Bob: Authentication Request Bob --> Alice: Authentication Response Alice -> Bob: Another authentication Request Alice <-- Bob: Another authentication Response @enduml 1.2 Declaring participant If the keyword participant is used to declare a participant, more control on that participant is possible. The order of declaration will be the (default) order of display. Using these other keywords to declare participants -
Sysml Distilled: a Brief Guide to the Systems Modeling Language
ptg11539604 Praise for SysML Distilled “In keeping with the outstanding tradition of Addison-Wesley’s techni- cal publications, Lenny Delligatti’s SysML Distilled does not disappoint. Lenny has done a masterful job of capturing the spirit of OMG SysML as a practical, standards-based modeling language to help systems engi- neers address growing system complexity. This book is loaded with matter-of-fact insights, starting with basic MBSE concepts to distin- guishing the subtle differences between use cases and scenarios to illu- mination on namespaces and SysML packages, and even speaks to some of the more esoteric SysML semantics such as token flows.” — Jeff Estefan, Principal Engineer, NASA’s Jet Propulsion Laboratory “The power of a modeling language, such as SysML, is that it facilitates communication not only within systems engineering but across disci- plines and across the development life cycle. Many languages have the ptg11539604 potential to increase communication, but without an effective guide, they can fall short of that objective. In SysML Distilled, Lenny Delligatti combines just the right amount of technology with a common-sense approach to utilizing SysML toward achieving that communication. Having worked in systems and software engineering across many do- mains for the last 30 years, and having taught computer languages, UML, and SysML to many organizations and within the college setting, I find Lenny’s book an invaluable resource. He presents the concepts clearly and provides useful and pragmatic examples to get you off the ground quickly and enables you to be an effective modeler.” — Thomas W. Fargnoli, Lead Member of the Engineering Staff, Lockheed Martin “This book provides an excellent introduction to SysML. -
On the Coexistence of Shared-Memory and Message-Passing in The
On the Co existence of SharedMemory and MessagePassing in the Programming of Parallel Applications 1 2 J Cordsen and W SchroderPreikschat GMD FIRST Rudower Chaussee D Berlin Germany University of Potsdam Am Neuen Palais D Potsdam Germany Abstract Interop erability in nonsequential applications requires com munication to exchange information using either the sharedmemory or messagepassing paradigm In the past the communication paradigm in use was determined through the architecture of the underlying comput ing platform Sharedmemory computing systems were programmed to use sharedmemory communication whereas distributedmemory archi tectures were running applications communicating via messagepassing Current trends in the architecture of parallel machines are based on sharedmemory and distributedmemory For scalable parallel applica tions in order to maintain transparency and eciency b oth communi cation paradigms have to co exist Users should not b e obliged to know when to use which of the two paradigms On the other hand the user should b e able to exploit either of the paradigms directly in order to achieve the b est p ossible solution The pap er presents the VOTE communication supp ort system VOTE provides co existent implementations of sharedmemory and message passing communication Applications can change the communication paradigm dynamically at runtime thus are able to employ the under lying computing system in the most convenient and applicationoriented way The presented case study and detailed p erformance analysis un derpins the -
Consistency Models • Data-Centric Consistency Models • Client-Centric Consistency Models
Consistency and Replication • Today: – Consistency models • Data-centric consistency models • Client-centric consistency models Computer Science CS677: Distributed OS Lecture 15, page 1 Why replicate? • Data replication: common technique in distributed systems • Reliability – If one replica is unavailable or crashes, use another – Protect against corrupted data • Performance – Scale with size of the distributed system (replicated web servers) – Scale in geographically distributed systems (web proxies) • Key issue: need to maintain consistency of replicated data – If one copy is modified, others become inconsistent Computer Science CS677: Distributed OS Lecture 15, page 2 Object Replication •Approach 1: application is responsible for replication – Application needs to handle consistency issues •Approach 2: system (middleware) handles replication – Consistency issues are handled by the middleware – Simplifies application development but makes object-specific solutions harder Computer Science CS677: Distributed OS Lecture 15, page 3 Replication and Scaling • Replication and caching used for system scalability • Multiple copies: – Improves performance by reducing access latency – But higher network overheads of maintaining consistency – Example: object is replicated N times • Read frequency R, write frequency W • If R<<W, high consistency overhead and wasted messages • Consistency maintenance is itself an issue – What semantics to provide? – Tight consistency requires globally synchronized clocks! • Solution: loosen consistency requirements -
Challenges for the Message Passing Interface in the Petaflops Era
Challenges for the Message Passing Interface in the Petaflops Era William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp What this Talk is About The title talks about MPI – Because MPI is the dominant parallel programming model in computational science But the issue is really – What are the needs of the parallel software ecosystem? – How does MPI fit into that ecosystem? – What are the missing parts (not just from MPI)? – How can MPI adapt or be replaced in the parallel software ecosystem? – Short version of this talk: • The problem with MPI is not with what it has but with what it is missing Lets start with some history … Argonne National Laboratory 2 Quotes from “System Software and Tools for High Performance Computing Environments” (1993) “The strongest desire expressed by these users was simply to satisfy the urgent need to get applications codes running on parallel machines as quickly as possible” In a list of enabling technologies for mathematical software, “Parallel prefix for arbitrary user-defined associative operations should be supported. Conflicts between system and library (e.g., in message types) should be automatically avoided.” – Note that MPI-1 provided both Immediate Goals for Computing Environments: – Parallel computer support environment – Standards for same – Standard for parallel I/O – Standard for message passing on distributed memory machines “The single greatest hindrance to significant penetration of MPP technology in scientific computing is the absence of common programming interfaces across various parallel computing systems” Argonne National Laboratory 3 Quotes from “Enabling Technologies for Petaflops Computing” (1995): “The software for the current generation of 100 GF machines is not adequate to be scaled to a TF…” “The Petaflops computer is achievable at reasonable cost with technology available in about 20 years [2014].” – (estimated clock speed in 2004 — 700MHz)* “Software technology for MPP’s must evolve new ways to design software that is portable across a wide variety of computer architectures. -
An Overview on Cyclops-64 Architecture - a Status Report on the Programming Model and Software Infrastructure
An Overview on Cyclops-64 Architecture - A Status Report on the Programming Model and Software Infrastructure Guang R. Gao Endowed Distinguished Professor Electrical & Computer Engineering University of Delaware [email protected] 2007/6/14 SOS11-06-2007.ppt 1 Outline • Introduction • Multi-Core Chip Technology • IBM Cyclops-64 Architecture/Software • Cyclops-64 Programming Model and System Software • Future Directions • Summary 2007/6/14 SOS11-06-2007.ppt 2 TIPs of compute power operating on Tera-bytes of data Transistor Growth in the near future Source: Keynote talk in CGO & PPoPP 03/14/07 by Jesse Fang from Intel 2007/6/14 SOS11-06-2007.ppt 3 Outline • Introduction • Multi-Core Chip Technology • IBM Cyclops-64 Architecture/Software • Programming/Compiling for Cyclops-64 • Looking Beyond Cyclops-64 • Summary 2007/6/14 SOS11-06-2007.ppt 4 Two Types of Multi-Core Architecture Trends • Type I: Glue “heavy cores” together with minor changes • Type II: Explore the parallel architecture design space and searching for most suitable chip architecture models. 2007/6/14 SOS11-06-2007.ppt 5 Multi-Core Type II • New factors to be considered –Flops are cheap! –Memory per core is small –Cache-coherence is expensive! –On-chip bandwidth can be enormous! –Examples: Cyclops-64, and others 2007/6/14 SOS11-06-2007.ppt 6 Flops are Cheap! An example to illustrate design tradeoffs: • If fed from small, local register files: 64-bit FP – 3200 GB/s, 10 pJ/op unit – < $1/Gflop (60 mW/Gflop) (drawn to a 64-bit FPU is < 1mm^2 scale) and ~= 50pJ • If fed from global on-chip memory: Can fit over 200 on a chip. -
Constraint Graph Analysis of Multithreaded Programs
Journal of Instruction-Level Parallelism 6 (2004) 1-23 Submitted 10/03; published 4/04 Constraint Graph Analysis of Multithreaded Programs Harold W. Cain [email protected] Computer Sciences Dept., University of Wisconsin 1210 W. Dayton St., Madison, WI 53706 Mikko H. Lipasti [email protected] Dept. of Electrical and Computer Engineering, University of Wisconsin 1415 Engineering Dr., Madison, WI 53706 Ravi Nair [email protected] IBM T.J. Watson Research Laboratory P.O. Box 218, Yorktown Heights, NY 10598 Abstract This paper presents a framework for analyzing the performance of multithreaded programs using a model called a constraint graph. We review previous constraint graph definitions for sequentially consistent systems, and extend these definitions for use in analyzing other memory consistency models. Using this framework, we present two constraint graph analysis case studies using several commercial and scientific workloads running on a full system simulator. The first case study illus- trates how a constraint graph can be used to determine the necessary conditions for implementing a memory consistency model, rather than conservative sufficient conditions. Using this method, we classify coherence misses as either required or unnecessary. We determine that on average over 30% of all load instructions that suffer cache misses due to coherence activity are unnecessarily stalled because the original copy of the cache line could have been used without violating the mem- ory consistency model. Based on this observation, we present a novel delayed consistency imple- mentation that uses stale cache lines when possible. The second case study demonstrates the effects of memory consistency constraints on the fundamental limits of instruction level parallelism, com- pared to previous estimates that did not include multiprocessor constraints. -
Case No COMP/M.4747 Œ IBM / TELELOGIC REGULATION (EC)
EN This text is made available for information purposes only. A summary of this decision is published in all Community languages in the Official Journal of the European Union. Case No COMP/M.4747 – IBM / TELELOGIC Only the English text is authentic. REGULATION (EC) No 139/2004 MERGER PROCEDURE Article 8(1) Date: 05/03/2008 Brussels, 05/03/2008 C(2008) 823 final PUBLIC VERSION COMMISSION DECISION of 05/03/2008 declaring a concentration to be compatible with the common market and the EEA Agreement (Case No COMP/M.4747 - IBM/ TELELOGIC) COMMISSION DECISION of 05/03/2008 declaring a concentration to be compatible with the common market and the EEA Agreement (Case No COMP/M.4747 - IBM/ TELELOGIC) (Only the English text is authentic) (Text with EEA relevance) THE COMMISSION OF THE EUROPEAN COMMUNITIES, Having regard to the Treaty establishing the European Community, Having regard to the Agreement on the European Economic Area, and in particular Article 57 thereof, Having regard to Council Regulation (EC) No 139/2004 of 20 January 2004 on the control of concentrations between undertakings1, and in particular Article 8(1) thereof, Having regard to the Commission's decision of 3 October 2007 to initiate proceedings in this case, After consulting the Advisory Committee on Concentrations2, Having regard to the final report of the Hearing Officer in this case3, Whereas: 1 OJ L 24, 29.1.2004, p. 1 2 OJ C ...,...200. , p.... 3 OJ C ...,...200. , p.... 2 I. INTRODUCTION 1. On 29 August 2007, the Commission received a notification of a proposed concentration pursuant to Article 4 and following a referral pursuant to Article 4(5) of Council Regulation (EC) No 139/2004 ("the Merger Regulation") by which the undertaking International Business Machines Corporation ("IBM", USA) acquires within the meaning of Article 3(1)(b) of the Council Regulation control of the whole of the undertaking Telelogic AB ("Telelogic", Sweden) by way of a public bid which was announced on 11 June 2007.