A Programmer-Oriented Approach to Safe Concurrency

Total Page:16

File Type:pdf, Size:1020Kb

A Programmer-Oriented Approach to Safe Concurrency A Programmer-Oriented Approach to Safe Concurrency Aaron Greenhouse May 2003 CMU-CS-03-135 Computer Science Department School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Thesis Committee: William L. Scherlis, Chair Thomas Gross, Co-Chair Guy E. Blelloch John Boyland, University of Wisconsin–Milwaukee Copyright c 2003 Aaron Greenhouse Effort sponsored in part through the High Dependability Computing Program from NASA Ames cooperative agree- ment NCC-2-1298, in part by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Lab- oratory (AFRL), Air Force Materiel Command, USAF, under agreement number F30602-99-2-0522, and in part through the Carnegie Mellon School of Computer Science Siebel Scholars Program. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsement, either expressed or implied, of NASA, DARPA, AFRL, or the U.S. Government. Keywords: Java, concurrency, multi-threading, analysis, annotation, software engineering tools, program understanding, assurance, thread-safety For Irachka. Abstract Assuring and evolving concurrent programs requires understanding the concurrency- related design decisions used in their implementation. In Java-style shared-memory programs, these decisions include which state is shared, how access to it is regulated, and the policy that distinguishes desired concurrency from race conditions. Source code often does not reveal these design decisions because they rarely have purely local manifestations in the code, or because they cannot be inferred from code. Many pro- grammers believe it is too difficult to explicate the models in ordinary practice. As a result, this design intent is usually not expressed, and it is therefore generally infeasible to assure that concurrent programs are free of race conditions. This thesis is about a practicable approach to capturing and expressing design in- tent, and, through the use of annotations and composable static analyses, assuring con- sistency of code and intent as both evolve. We use case studies from production Java code and a prototype analysis tool to explore the costs and benefits of a new annotation- based approach for expressing design intent. Our annotations express “mechanical” properties that programmers must already be considering, such as lock–state associ- ations, uniqueness of references, and conceptual aggregations of state. Our analyses reveal race conditions in a variety of case study samples which were drawn from li- brary code and production open source projects. We developed a prototype tool that embodies static analysis techniques for assuring consistency between code and models (expressed as code annotations). Our experience with the tool provides some preliminary evidence of the practicability of our approach for ordinary programmers on deadlines. The dominant design consideration for the tool was adherence to the principle of “early gratification”—some assurance can be obtained with minimal or no annotation effort, and additional increments of annotation are rewarded with additional increments of assurance. The novel technical features of this approach include (1) regions as flexible ag- gregations of state that can cross object boundaries, (2) a region-based object-oriented effects system; (3) analysis to track the association of locks with regions, (4) policy descriptions for allowable method interleavings, and (5) an incremental process for inserting, validating, and exploiting annotations. Acknowledgements It’s been a longer journey than I originally intended, but I’m finally done with my dissertation. This was not a solitary journey, and I owe thanks to the many people who gave me support along the way. Obviously, I would like to thank my advisor, Bill Scherlis, for his invaluable advice, guidance, encouragement, and enthusiasm. I’d like to thank my co-advisor, Thomas Gross, and the rest of my committee for their time and for the helpful feedback they have provided. My research wasn’t performed in a vacuum, and without the research and engineering results of the other members of the Fluid Group, this dissertation would never have been possible. Thank you, John Boyland, Edwin Chan, Tim Halloran, Elissa Newman, Dean Sutherland, and everyone else. I’d like to thank my wife, Irene, for her unfailing support in all things. Her exceptional patience and emotional support during the past few months have been invaluable to the completion of this dissertation and preservation of my well-being. I’d like to thank my parents, Anna and Gerald Greenhouse, who besides bringing me into the world—a big win for me—also instilled in me the appreciation for learning and education that got me into this mess to begin with. I’d like to thank my newly acquired in-laws, Alla and Victor Sorokorensky, for their support. I’d like to thank my only long-term office-mate, Orna Raz, for putting up with me. I hope our discussions have been as useful for your research as they have been for mine. On a more serious note, I would like to thank the taxpayers of the United States of America. Without their support my work could never have been funded. I would also like to thank Siebel Systems for their support of the Siebel Scholars Program. On a less serious note, I would like the thank the fortune cookie that I received earlier this week for its vote of encouragement: “Soon you will be sitting on top of the world.” So thank you, tasty vii snack treat, and know that you were consumed to support a worthwhile activity! Finally, I should point out that examples used in this dissertation are taken from copyrighted sources: • The Apache/Jakarta Log4j logging library is Copyright c 1999 The Apache Software Foun- dation. All Rights Reserved. • The W3C Jigsaw web server is Copyright c 1998 World Wide Web Consortium, (Mas- sachusetts Institute of Technology, Institut National de Recherche en Informatique et en Au- tomatique, Keio University). All Rights Reserved. • The Java 2 SDK, Standard Edition, Version 1.4.1 is Copyright c 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All Rights Reserved. viii Contents 1 Introduction 1 1.1 Missing Design Information ............................. 1 1.1.1 Kinds of Software Information . ....................... 3 1.1.2 Recording Design Intent ........................... 5 1.1.3 Intent vs. Accident . ............................. 6 1.1.4 Assuring Consistency Between Intent and Code . ............ 7 1.2 Concurrency-Related Design Intent . ....................... 8 1.2.1 Establishing Safe Concurrency . ....................... 9 1.3 Example: Missing Models . ............................. 10 1.3.1 Class BoundedFIFO ............................. 11 1.3.2 The State of BoundedFIFO ......................... 13 1.3.3 Protecting BoundedFIFO .......................... 13 1.3.4 Evolution and Misunderstood Intent . .................. 15 1.3.5 Summary . .................................. 16 1.4 Evolution and Unknown Intent ............................ 18 1.5 Locking Design and Representation Invariants . .................. 20 1.6 Towards a Generative Approach ........................... 22 1.6.1 Source-level Program Transformation . .................. 23 1.6.2 The Generative Approach . ....................... 24 1.6.3 Tool Support ................................. 25 1.7 Outline ........................................ 25 2 Concurrency and Java 27 2.1 Shared-Memory Concurrent Programming ...................... 28 2.2 Lock-Based Concurrency Management ....................... 29 x CONTENTS 2.2.1 Missing Intent . ............................. 30 2.3 Condition Variables .................................. 31 2.4 Monitors ........................................ 32 2.5 Additional Risks of Concurrency ........................... 32 3 Recording Design Intent 33 3.1 Analysis and Assurance of Design Intent ....................... 33 3.2 The FLUIDJAVA Language . ............................. 34 3.2.1 The Language ................................. 34 3.2.2 Language Predicates ............................. 35 3.2.3 Typing Rules ................................. 37 3.3 Labeling Expressions ................................. 40 3.4 Binding Context Analysis . ............................. 41 4 An Object-Oriented Effects System 45 4.1 Regions Identify State ................................. 47 4.1.1 The Region Hierarchy ............................ 47 4.1.2 An Example ................................. 48 4.1.3 Regions and Subclasses ........................... 50 4.2 Targets and State Aliasing . ............................. 50 4.2.1 Kinds of Targets . ............................. 51 4.2.2 Targets and Method Effects . ....................... 52 4.3 Effects . ........................................ 53 4.3.1 Computing Effects . ............................. 54 4.3.2 Comparing Effects . ............................. 54 4.3.3 Checking Declared Effects . ....................... 55 4.4 Example: Class AtomicInteger .......................... 56 4.5 Example: Class BoundedFIFO ............................ 58 4.5.1 Annotating BoundedFIFO .......................... 59 4.6 State Aggregation through Uniqueness . ....................... 60 4.7 State Aggregation
Recommended publications
  • Introduction to Multi-Threading and Vectorization Matti Kortelainen Larsoft Workshop 2019 25 June 2019 Outline
    Introduction to multi-threading and vectorization Matti Kortelainen LArSoft Workshop 2019 25 June 2019 Outline Broad introductory overview: • Why multithread? • What is a thread? • Some threading models – std::thread – OpenMP (fork-join) – Intel Threading Building Blocks (TBB) (tasks) • Race condition, critical region, mutual exclusion, deadlock • Vectorization (SIMD) 2 6/25/19 Matti Kortelainen | Introduction to multi-threading and vectorization Motivations for multithreading Image courtesy of K. Rupp 3 6/25/19 Matti Kortelainen | Introduction to multi-threading and vectorization Motivations for multithreading • One process on a node: speedups from parallelizing parts of the programs – Any problem can get speedup if the threads can cooperate on • same core (sharing L1 cache) • L2 cache (may be shared among small number of cores) • Fully loaded node: save memory and other resources – Threads can share objects -> N threads can use significantly less memory than N processes • If smallest chunk of data is so big that only one fits in memory at a time, is there any other option? 4 6/25/19 Matti Kortelainen | Introduction to multi-threading and vectorization What is a (software) thread? (in POSIX/Linux) • “Smallest sequence of programmed instructions that can be managed independently by a scheduler” [Wikipedia] • A thread has its own – Program counter – Registers – Stack – Thread-local memory (better to avoid in general) • Threads of a process share everything else, e.g. – Program code, constants – Heap memory – Network connections – File handles
    [Show full text]
  • Lecture 4 Resource Protection and Thread Safety
    Concurrency and Correctness – Resource Protection and TS Lecture 4 Resource Protection and Thread Safety 1 Danilo Piparo – CERN, EP-SFT Concurrency and Correctness – Resource Protection and TS This Lecture The Goals: 1) Understand the problem of contention of resources within a parallel application 2) Become familiar with the design principles and techniques to cope with it 3) Appreciate the advantage of non-blocking techniques The outline: § Threads and data races: synchronisation issues § Useful design principles § Replication, atomics, transactions and locks § Higher level concrete solutions 2 Danilo Piparo – CERN, EP-SFT Concurrency and Correctness – Resource Protection and TS Threads and Data Races: Synchronisation Issues 3 Danilo Piparo – CERN, EP-SFT Concurrency and Correctness – Resource Protection and TS The Problem § Fastest way to share data: access the same memory region § One of the advantages of threads § Parallel memory access: delicate issue - race conditions § I.e. behaviour of the system depends on the sequence of events which are intrinsically asynchronous § Consequences, in order of increasing severity § Catastrophic terminations: segfaults, crashes § Non-reproducible, intermittent bugs § Apparently sane execution but data corruption: e.g. wrong value of a variable or of a result Operative definition: An entity which cannot run w/o issues linked to parallel execution is said to be thread-unsafe (the contrary is thread-safe) 4 Danilo Piparo – CERN, EP-SFT Concurrency and Correctness – Resource Protection and TS To Be Precise: Data Race Standard language rules, §1.10/4 and /21: • Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
    [Show full text]
  • Clojure, Given the Pun on Closure, Representing Anything Specific
    dynamic, functional programming for the JVM “It (the logo) was designed by my brother, Tom Hickey. “It I wanted to involve c (c#), l (lisp) and j (java). I don't think we ever really discussed the colors Once I came up with Clojure, given the pun on closure, representing anything specific. I always vaguely the available domains and vast emptiness of the thought of them as earth and sky.” - Rich Hickey googlespace, it was an easy decision..” - Rich Hickey Mark Volkmann [email protected] Functional Programming (FP) In the spirit of saying OO is is ... encapsulation, inheritance and polymorphism ... • Pure Functions • produce results that only depend on inputs, not any global state • do not have side effects such as Real applications need some changing global state, file I/O or database updates side effects, but they should be clearly identified and isolated. • First Class Functions • can be held in variables • can be passed to and returned from other functions • Higher Order Functions • functions that do one or both of these: • accept other functions as arguments and execute them zero or more times • return another function 2 ... FP is ... Closures • main use is to pass • special functions that retain access to variables a block of code that were in their scope when the closure was created to a function • Partial Application • ability to create new functions from existing ones that take fewer arguments • Currying • transforming a function of n arguments into a chain of n one argument functions • Continuations ability to save execution state and return to it later think browser • back button 3 ..
    [Show full text]
  • CSC 553 Operating Systems Multiple Processes
    CSC 553 Operating Systems Lecture 4 - Concurrency: Mutual Exclusion and Synchronization Multiple Processes • Operating System design is concerned with the management of processes and threads: • Multiprogramming • Multiprocessing • Distributed Processing Concurrency Arises in Three Different Contexts: • Multiple Applications – invented to allow processing time to be shared among active applications • Structured Applications – extension of modular design and structured programming • Operating System Structure – OS themselves implemented as a set of processes or threads Key Terms Related to Concurrency Principles of Concurrency • Interleaving and overlapping • can be viewed as examples of concurrent processing • both present the same problems • Uniprocessor – the relative speed of execution of processes cannot be predicted • depends on activities of other processes • the way the OS handles interrupts • scheduling policies of the OS Difficulties of Concurrency • Sharing of global resources • Difficult for the OS to manage the allocation of resources optimally • Difficult to locate programming errors as results are not deterministic and reproducible Race Condition • Occurs when multiple processes or threads read and write data items • The final result depends on the order of execution – the “loser” of the race is the process that updates last and will determine the final value of the variable Operating System Concerns • Design and management issues raised by the existence of concurrency: • The OS must: – be able to keep track of various processes
    [Show full text]
  • Towards Gradually Typed Capabilities in the Pi-Calculus
    Towards Gradually Typed Capabilities in the Pi-Calculus Matteo Cimini University of Massachusetts Lowell Lowell, MA, USA matteo [email protected] Gradual typing is an approach to integrating static and dynamic typing within the same language, and puts the programmer in control of which regions of code are type checked at compile-time and which are type checked at run-time. In this paper, we focus on the π-calculus equipped with types for the modeling of input-output capabilities of channels. We present our preliminary work towards a gradually typed version of this calculus. We present a type system, a cast insertion procedure that automatically inserts run-time checks, and an operational semantics of a π-calculus that handles casts on channels. Although we do not claim any theoretical results on our formulations, we demonstrate our calculus with an example and discuss our future plans. 1 Introduction This paper presents preliminary work on integrating dynamically typed features into a typed π-calculus. Pierce and Sangiorgi have defined a typed π-calculus in which channels can be assigned types that express input and output capabilities [16]. Channels can be declared to be input-only, output-only, or that can be used for both input and output operations. As a consequence, this type system can detect unintended or malicious channel misuses at compile-time. The static typing nature of the type system prevents the modeling of scenarios in which the input- output discipline of a channel is discovered at run-time. In these scenarios, such a discipline must be enforced with run-time type checks in the style of dynamic typing.
    [Show full text]
  • Java Concurrency in Practice
    Java Concurrency in practice Chapters: 1,2, 3 & 4 Bjørn Christian Sebak ([email protected]) Karianne Berg ([email protected]) INF329 – Spring 2007 Chapter 1 - Introduction Brief history of concurrency Before OS, a computer executed a single program from start to finnish But running a single program at a time is an inefficient use of computer hardware Therefore all modern OS run multiple programs (in seperate processes) Brief history of concurrency (2) Factors for running multiple processes: Resource utilization: While one program waits for I/O, why not let another program run and avoid wasting CPU cycles? Fairness: Multiple users/programs might have equal claim of the computers resources. Avoid having single large programs „hog“ the machine. Convenience: Often desirable to create smaller programs that perform a single task (and coordinate them), than to have one large program that do ALL the tasks What is a thread? A „lightweight process“ - each process can have many threads Threads allow multiple streams of program flow to coexits in a single process. While a thread share process-wide resources like memory and files with other threads, they all have their own program counter, stack and local variables Benefits of threads 1) Exploiting multiple processors 2) Simplicity of modeling 3) Simplified handling of asynchronous events 4) More responsive user interfaces Benefits of threads (2) Exploiting multiple processors The processor industry is currently focusing on increasing number of cores on a single CPU rather than increasing clock speed. Well-designed programs with multiple threads can execute simultaneously on multiple processors, increasing resource utilization.
    [Show full text]
  • Enhanced Debugging for Data Races in Parallel
    ENHANCED DEBUGGING FOR DATA RACES IN PARALLEL PROGRAMS USING OPENMP A Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for the Degree Master of Science By Marcus W. Hervey December 2016 ENHANCED DEBUGGING FOR DATA RACES IN PARALLEL PROGRAMS USING OPENMP Marcus W. Hervey APPROVED: Dr. Edgar Gabriel, Chairman Dept. of Computer Science Dr. Shishir Shah Dept. of Computer Science Dr. Barbara Chapman Dept. of Computer Science, Stony Brook University Dean, College of Natural Sciences and Mathematics ii iii Acknowledgements A special thank you to Dr. Barbara Chapman, Ph.D., Dr. Edgar Gabriel, Ph.D., Dr. Shishir Shah, Ph.D., Dr. Lei Huang, Ph.D., Dr. Chunhua Liao, Ph.D., Dr. Laksano Adhianto, Ph.D., and Dr. Oscar Hernandez, Ph.D. for their guidance and support throughout this endeavor. I would also like to thank Van Bui, Deepak Eachempati, James LaGrone, and Cody Addison for their friendship and teamwork, without which this endeavor would have never been accomplished. I dedicate this thesis to my parents (Billy and Olivia Hervey) who have always chal- lenged me to be my best, and to my wife and kids for their love and sacrifice throughout this process. ”Our greatest weakness lies in giving up. The most certain way to succeed is always to try just one more time.” – Thomas Alva Edison iv ENHANCED DEBUGGING FOR DATA RACES IN PARALLEL PROGRAMS USING OPENMP An Abstract of a Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for the Degree Master of Science By Marcus W.
    [Show full text]
  • Reconciling Real and Stochastic Time: the Need for Probabilistic Refinement
    DOI 10.1007/s00165-012-0230-y The Author(s) © 2012. This article is published with open access at Springerlink.com Formal Aspects Formal Aspects of Computing (2012) 24: 497–518 of Computing Reconciling real and stochastic time: the need for probabilistic refinement J. Markovski1,P.R.D’Argenio2,J.C.M.Baeten1,3 and E. P. de Vink1,3 1 Eindhoven University of Technology, Eindhoven, The Netherlands. E-mail: [email protected] 2 FaMAF, Universidad Nacional de Cordoba,´ Cordoba,´ Argentina 3 Centrum Wiskunde & Informatica, Amsterdam, The Netherlands Abstract. We conservatively extend an ACP-style discrete-time process theory with discrete stochastic delays. The semantics of the timed delays relies on time additivity and time determinism, which are properties that enable us to merge subsequent timed delays and to impose their synchronous expiration. Stochastic delays, however, interact with respect to a so-called race condition that determines the set of delays that expire first, which is guided by an (implicit) probabilistic choice. The race condition precludes the property of time additivity as the merger of stochastic delays alters this probabilistic behavior. To this end, we resolve the race condition using conditionally-distributed unit delays. We give a sound and ground-complete axiomatization of the process the- ory comprising the standard set of ACP-style operators. In this generalized setting, the alternative composition is no longer associative, so we have to resort to special normal forms that explicitly resolve the underlying race condition. Our treatment succeeds in the initial challenge to conservatively extend standard time with stochastic time. However, the ‘dissection’ of the stochastic delays to conditionally-distributed unit delays comes at a price, as we can no longer relate the resolved race condition to the original stochastic delays.
    [Show full text]
  • C/C++ Thread Safety Analysis
    C/C++ Thread Safety Analysis DeLesley Hutchins Aaron Ballman Dean Sutherland Google Inc. CERT/SEI Email: [email protected] Email: [email protected] Email: [email protected] Abstract—Writing multithreaded programs is hard. Static including MacOS, Linux, and Windows. The analysis is analysis tools can help developers by allowing threading policies currently implemented as a compiler warning. It has been to be formally specified and mechanically checked. They essen- deployed on a large scale at Google; all C++ code at Google is tially provide a static type system for threads, and can detect potential race conditions and deadlocks. now compiled with thread safety analysis enabled by default. This paper describes Clang Thread Safety Analysis, a tool II. OVERVIEW OF THE ANALYSIS which uses annotations to declare and enforce thread safety policies in C and C++ programs. Clang is a production-quality Thread safety analysis works very much like a type system C++ compiler which is available on most platforms, and the for multithreaded programs. It is based on theoretical work analysis can be enabled for any build with a simple warning on race-free type systems [3]. In addition to declaring the flag: −Wthread−safety. The analysis is deployed on a large scale at Google, where type of data ( int , float , etc.), the programmer may optionally it has provided sufficient value in practice to drive widespread declare how access to that data is controlled in a multithreaded voluntary adoption. Contrary to popular belief, the need for environment. annotations has not been a liability, and even confers some Clang thread safety analysis uses annotations to declare benefits with respect to software evolution and maintenance.
    [Show full text]
  • Lecture 18 18.1 Distributed and Concurrent Systems 18.2 Race
    CMPSCI 691ST Systems Fall 2011 Lecture 18 Lecturer: Emery Berger Scribe: Sean Barker 18.1 Distributed and Concurrent Systems In this lecture we discussed two broad issues in concurrent systems: clocks and race detection. The first paper we read introduced the Lamport clock, which is a scalar providing a logical clock value for each process. Every time a process sends a message, it increments its clock value, and every time a process receives a message, it sets its clock value to the max of the current value and the message value. Clock values are used to provide a `happens-before' relationship, which provides a consistent partial ordering of events across processes. Lamport clocks were later extended to vector clocks, in which each of n processes maintains an array (or vector) of n clock values (one for each process) rather than just a single value as in Lamport clocks. Vector clocks are much more precise than Lamport clocks, since they can check whether intermediate events from anyone have occurred (by looking at the max across the entire vector), and have essentially replaced Lamport clocks today. Moreover, since distributed systems are already sending many messages, the cost of maintaining vector clocks is generally not a big deal. Clock synchronization is still a hard problem, however, because physical clock values will always have some amount of drift. The FastTrack paper we read leverages vector clocks to do race detection. 18.2 Race Detection Broadly speaking, a race condition occurs when at least two threads are accessing a variable, and one of them writes the value while another is reading.
    [Show full text]
  • Programming Clojure Second Edition
    Extracted from: Programming Clojure Second Edition This PDF file contains pages extracted from Programming Clojure, published by the Pragmatic Bookshelf. For more information or to purchase a paperback or PDF copy, please visit http://www.pragprog.com. Note: This extract contains some colored text (particularly in code listing). This is available only in online versions of the books. The printed versions are black and white. Pagination might vary between the online and printer versions; the content is otherwise identical. Copyright © 2010 The Pragmatic Programmers, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. The Pragmatic Bookshelf Dallas, Texas • Raleigh, North Carolina Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trade- marks of The Pragmatic Programmers, LLC. Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein. Our Pragmatic courses, workshops, and other products can help you and your team create better software and have more fun.
    [Show full text]
  • Breadth First Search Vectorization on the Intel Xeon Phi
    Breadth First Search Vectorization on the Intel Xeon Phi Mireya Paredes Graham Riley Mikel Luján University of Manchester University of Manchester University of Manchester Oxford Road, M13 9PL Oxford Road, M13 9PL Oxford Road, M13 9PL Manchester, UK Manchester, UK Manchester, UK [email protected] [email protected] [email protected] ABSTRACT Today, scientific experiments in many domains, and large Breadth First Search (BFS) is a building block for graph organizations can generate huge amounts of data, e.g. social algorithms and has recently been used for large scale anal- networks, health-care records and bio-informatics applica- ysis of information in a variety of applications including so- tions [8]. Graphs seem to be a good match for important cial networks, graph databases and web searching. Due to and large dataset analysis as they can abstract real world its importance, a number of different parallel programming networks. To process large graphs, different techniques have models and architectures have been exploited to optimize been applied, including parallel programming. The main the BFS. However, due to the irregular memory access pat- challenge in the parallelization of graph processing is that terns and the unstructured nature of the large graphs, its large graphs are often unstructured and highly irregular and efficient parallelization is a challenge. The Xeon Phi is a this limits their scalability and performance when executing massively parallel architecture available as an off-the-shelf on off-the-shelf systems [18]. accelerator, which includes a powerful 512 bit vector unit The Breadth First Search (BFS) is a building block of with optimized scatter and gather functions.
    [Show full text]