Concurrency and Synchronisation Learning Outcomes Textbook Inter- Thread and Process Communication Critical Region

Total Page:16

File Type:pdf, Size:1020Kb

Concurrency and Synchronisation Learning Outcomes Textbook Inter- Thread and Process Communication Critical Region Learning Outcomes • Understand concurrency is an issue in operating systems and multithreaded applications Concurrency and • Know the concept of a critical region . • Understand how mutual exclusion of critical Synchronisation regions can be used to solve concurrency issues – Including how mutual exclusion can be implemented correctly and efficiently. • Be able to identify and solve a producer consumer bounded buffer problem. • Understand and apply standard synchronisation primitives to solve synchronisation problems. 1 2 Making Single-Threaded Code Multithreaded Textbook • Sections 2.3 & 2.4 Conflicts between threads over the use of a 3 global variable 4 Inter - Thread and Process Communication Critical Region • We can control access to the shared resource by controlling access to the code that accesses the resource. ⇒ A critical region is a region of code where We have a shared resources are accessed. race condition – Variables, memory, files, etc… • Uncoordinated entry to the critical region results in a race condition Two processes want to access shared memory at same ⇒ Incorrect behaviour, deadlock, lost work,… time 5 6 1 Critical Regions Example critical sections struct node { void insert(struct *item) int data; { struct node *next; item->next = head; }; head = item; struct node *head; } void init(void) struct node *remove(void) { { head = NULL; struct node *t; } t = head; if (t != NULL) { head = head->next; • Simple last-in-first-out queue } implemented as a linked list. return t; } Mutual exclusion using critical regions 7 8 Example critical sections Critical Regions struct node { void insert(struct *item) int data; { Also called critical sections struct node *next; item->next = head; }; head = item; struct node *head; } Conditions required of any solution to the critical void init(void) struct node *remove(void) region problem { { Mutual Exclusion: head = NULL; struct node *t; No two processes simultaneously in critical region } t = head; if (t != NULL) { No assumptions made about speeds or numbers of head = head->next; CPUs • Critical sections } return t; Progress } No process running outside its critical region may block another process Bounded No process must wait forever to enter its critical region 9 10 A solution? A solution? • A lock variable while(TRUE) { while(TRUE) { while(lock == 1); while(lock == 1); – If lock == 1, lock = 1; lock = 1; • somebody is in the critical section and we must critical(); critical(); wait lock = 0 lock = 0 – If lock == 0, non_critical(); non_critical(); } } • nobody is in the critical section and we are free to enter 11 12 2 A problematic execution Observation sequence while(TRUE) { • Unfortunately, it is usually easier to show while(TRUE) { while(lock == 1); something does not work, than it is to while(lock == 1); prove that it does work. lock = 1; – Ideally, we’d like to prove, or at least lock = 1; critical(); critical(); informally demonstrate, that our solutions lock = 0 work. non_critical(); } lock = 0 non_critical(); } 13 14 Mutual Exclusion by Taking Turns Mutual Exclusion by Taking Turns • Works due to strict alternation – Each process takes turns • Cons – Busy waiting – Process must wait its turn even while the other process is doing something else. • With many processes, must wait for everyone to have a turn – Does not guarantee progress if a process no longer needs a turn. • Poor solution when processes require the critical section at Proposed solution to critical region problem differing rates (a) Process 0. (b) Process 1. 15 16 Mutual Exclusion by Disabling Peterson’s Solution Interrupts • See the textbook • Before entering a critical region, disable interrupts • After leaving the critical region, enable interrupts • Pros – simple • Cons – Only available in the kernel – Blocks everybody else, even with no contention • Slows interrupt response time – Does not work on a multiprocessor 17 18 3 Hardware Support for mutual Mutual Exclusion with Test-and-Set exclusion • Test and set instruction – Can be used to implement lock variables correctly • It loads the value of the lock • If lock == 0, – set the lock to 1 – return the result 0 – we acquire the lock • If lock == 1 – return 1 – another thread/process has the lock – Hardware guarantees that the instruction executes atomically. • Atomically: As an indivisible unit. Entering and leaving a critical region using the 19 TSL instruction 20 Test-and-Set Tackling the Busy-Wait Problem • Pros – Simple (easy to show it’s correct) • Sleep / Wakeup – Available at user-level – The idea • To any number of processors • To implement any number of lock variables • When process is waiting for an event, it calls sleep to block, instead of busy waiting. • Cons • The the event happens, the event generator – Busy waits (also termed a spin lock) (another process) calls wakeup to unblock the • Consumes CPU • Livelock in the presence of priorities sleeping process. – If a low priority process has the lock and a high priority process attempts to get it, the high priority process will busy-wait forever. • Starvation is possible when a process leaves its critical section and more than one process is waiting. 21 22 The Producer-Consumer Issues Problem • We must keep an accurate count of items in buffer • Also called the bounded buffer problem – Producer • can sleep when the buffer is full, • A producer produces data items and stores the • and wakeup when there is empty space in the buffer items in a buffer – The consumer can call wakeup when it consumes the first entry of the full buffer • A consumer takes the items out of the buffer and – Consumer consumes them. • Can sleep when the buffer is empty Producer • And wake up when there are items available – Producer can call wakeup when it adds the first item to the buffer X X X Producer Consumer X X X 23 Consumer 24 4 Pseudo-code for producer and Problems consumer int count = 0; con() { int count = 0; con() { #define N 4 /* buf size */ while(TRUE) { #define N 4 /* buf size */ while(TRUE) { prod() { if (count == 0) prod() { if (count == 0) while(TRUE) { sleep(); while(TRUE) { sleep(); item = produce() remove_item(); item = produce() remove_item(); if (count == N) count--; if (count == N) count--; sleep(); if (count == N-1) sleep(); if (count == N-1) insert_item(); wakeup(prod); insert_item(); wakeup(prod); count++; } count++; } if (count == 1) } if (count == 1) } Concurrent wakeup(con); wakeup(con); uncontrolled access to the } } buffer } } 25 26 Problems Proposed Solution int count = 0; con() { #define N 4 /* buf size */ while(TRUE) { • Lets use a locking primitive based on test- prod() { if (count == 0) and-set to protect the concurrent access while(TRUE) { sleep(); item = produce() remove_item(); if (count == N) count--; sleep(); if (count == N-1) insert_item(); wakeup(prod); count++; } if (count == 1) } Concurrent wakeup(con); uncontrolled access to the } counter } 27 28 Problematic execution sequence Proposed solution? con() { while(TRUE) { if (count == 0) int count = 0; con() { prod() { #define N 4 /* buf size */ while(TRUE) { while(TRUE) { item = produce() prod() { if (count == 0) while(TRUE) { if (count == N) wakeup without a sleep(); item = produce() sleep(); matching sleep is acquire_lock() acquire_lock() if (count == N) lost remove_item(); insert_item(); sleep(); count++; acquire_lock() count--; release_lock() insert_item(); release_lock(); if (count == 1) count++; if (count == N-1) wakeup(con); release_lock() sleep(); wakeup(prod); acquire_lock() if (count == 1) } remove_item(); wakeup(con); } count--; } release_lock(); } if (count == N-1) wakeup(prod); 29 } 30 } 5 Problem Semaphores • The test for some condition and actually • Dijkstra (1965) introduced two primitives going to sleep needs to be atomic that are more powerful than simple sleep and wakeup alone. • The following does not work – P(): proberen, from Dutch to test. acquire_lock() if (count == N) – V(): verhogen, from Dutch to increment. sleep(); release_lock() – Also called wait & signal, down & up. The lock is held while asleep ⇒ count will never change 31 32 How do they work Semaphore Implementation • If a resource is not available, the corresponding • Define a semaphore as a record semaphore blocks any process wait ing for the resource typedef struct { • Blocked processes are put into a process queue int count; maintained by the semaphore (avoids busy waiting!) struct process *L; • When a process releases a resource, it signal s this by } semaphore; means of the semaphore • Signalling resumes a blocked process if there is any • Assume two simple operations: • Wait and signal operations cannot be interrupted – sleep suspends the process that invokes it. • Complex coordination can be implemented by multiple – wakeup( P) resumes the execution of a blocked semaphores process P. 33 34 • Semaphore operations now defined as wait (S): Semaphore as a General S.count--; if (S.count < 0) { Synchronization Tool add this process to S.L; sleep; • Execute B in Pj only after A executed in Pi } • Use semaphore count initialized to 0 signal (S): • Code: S.count++; Pi Pj if (S.count <= 0) { MM remove a process P from S.L; wakeup(P); A wait (flag ) } • Each primitive is atomic signal (flag ) B 35 36 6 Semaphore Implementation of a Solving the producer-consumer Mutex problem with semaphores • Mutex is short for Mutual Exclusion #define N = 4 – Can also be called a lock semaphore mutex = 1; semaphore mutex; mutex.count = 1; /* initialise mutex */ /* count empty slots */ wait(mutex); /* enter the critcal region */ semaphore empty
Recommended publications
  • Introduction to Multi-Threading and Vectorization Matti Kortelainen Larsoft Workshop 2019 25 June 2019 Outline
    Introduction to multi-threading and vectorization Matti Kortelainen LArSoft Workshop 2019 25 June 2019 Outline Broad introductory overview: • Why multithread? • What is a thread? • Some threading models – std::thread – OpenMP (fork-join) – Intel Threading Building Blocks (TBB) (tasks) • Race condition, critical region, mutual exclusion, deadlock • Vectorization (SIMD) 2 6/25/19 Matti Kortelainen | Introduction to multi-threading and vectorization Motivations for multithreading Image courtesy of K. Rupp 3 6/25/19 Matti Kortelainen | Introduction to multi-threading and vectorization Motivations for multithreading • One process on a node: speedups from parallelizing parts of the programs – Any problem can get speedup if the threads can cooperate on • same core (sharing L1 cache) • L2 cache (may be shared among small number of cores) • Fully loaded node: save memory and other resources – Threads can share objects -> N threads can use significantly less memory than N processes • If smallest chunk of data is so big that only one fits in memory at a time, is there any other option? 4 6/25/19 Matti Kortelainen | Introduction to multi-threading and vectorization What is a (software) thread? (in POSIX/Linux) • “Smallest sequence of programmed instructions that can be managed independently by a scheduler” [Wikipedia] • A thread has its own – Program counter – Registers – Stack – Thread-local memory (better to avoid in general) • Threads of a process share everything else, e.g. – Program code, constants – Heap memory – Network connections – File handles
    [Show full text]
  • CSC 553 Operating Systems Multiple Processes
    CSC 553 Operating Systems Lecture 4 - Concurrency: Mutual Exclusion and Synchronization Multiple Processes • Operating System design is concerned with the management of processes and threads: • Multiprogramming • Multiprocessing • Distributed Processing Concurrency Arises in Three Different Contexts: • Multiple Applications – invented to allow processing time to be shared among active applications • Structured Applications – extension of modular design and structured programming • Operating System Structure – OS themselves implemented as a set of processes or threads Key Terms Related to Concurrency Principles of Concurrency • Interleaving and overlapping • can be viewed as examples of concurrent processing • both present the same problems • Uniprocessor – the relative speed of execution of processes cannot be predicted • depends on activities of other processes • the way the OS handles interrupts • scheduling policies of the OS Difficulties of Concurrency • Sharing of global resources • Difficult for the OS to manage the allocation of resources optimally • Difficult to locate programming errors as results are not deterministic and reproducible Race Condition • Occurs when multiple processes or threads read and write data items • The final result depends on the order of execution – the “loser” of the race is the process that updates last and will determine the final value of the variable Operating System Concerns • Design and management issues raised by the existence of concurrency: • The OS must: – be able to keep track of various processes
    [Show full text]
  • Static Analysis of a Concurrent Programming Language by Abstract Interpretation
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Concordia University Research Repository STATIC ANALYSIS OF A CONCURRENT PROGRAMMING LANGUAGE BY ABSTRACT INTERPRETATION Maryam Zakeryfar A thesis in The Department of Computer Science and Software Engineering Presented in Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy (Computer Science) Concordia University Montreal,´ Quebec,´ Canada March 2014 © Maryam Zakeryfar, 2014 Concordia University School of Graduate Studies This is to certify that the thesis prepared By: Mrs. Maryam Zakeryfar Entitled: Static Analysis of a Concurrent Programming Language by Abstract Interpretation and submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science) complies with the regulations of this University and meets the accepted standards with respect to originality and quality. Signed by the final examining committee: Dr. Deborah Dysart-Gale Chair Dr. Weichang Du External Examiner Dr. Mourad Debbabi Examiner Dr. Olga Ormandjieva Examiner Dr. Joey Paquet Examiner Dr. Peter Grogono Supervisor Approved by Dr. V. Haarslev, Graduate Program Director Christopher W. Trueman, Dean Faculty of Engineering and Computer Science Abstract Static Analysis of a Concurrent Programming Language by Abstract Interpretation Maryam Zakeryfar, Ph.D. Concordia University, 2014 Static analysis is an approach to determine information about the program without actually executing it. There has been much research in the static analysis of concurrent programs. However, very little academic research has been done on the formal analysis of message passing or process-oriented languages. We currently miss formal analysis tools and tech- niques for concurrent process-oriented languages such as Erasmus .
    [Show full text]
  • Sleeping Barber
    Sleeping Barber CSCI 201 Principles of Software Development Jeffrey Miller, Ph.D. [email protected] Outline • Sleeping Barber USC CSCI 201L Sleeping Barber Overview ▪ The Sleeping Barber problem contains one barber and a number of customers ▪ There are a certain number of waiting seats › If a customer enters and there is a seat available, he will wait › If a customer enters and there is no seat available, he will leave ▪ When the barber isn’t cutting someone’s hair, he sits in his barber chair and sleeps › If the barber is sleeping when a customer enters, he must wake the barber up USC CSCI 201L 3/8 Program ▪ Write a solution to the Sleeping Barber problem. You will need to utilize synchronization and conditions. USC CSCI 201L 4/8 Program Steps – main Method ▪ The SleepingBarber will be the main class, so create the SleepingBarber class with a main method › The main method will instantiate the SleepingBarber › Then create a certain number of Customer threads that arrive at random times • Have the program sleep for a random amount of time between customer arrivals › Have the main method wait to finish executing until all of the Customer threads have finished › Print out that there are no more customers, then wake up the barber if he is sleeping so he can go home for the day USC CSCI 201L 5/8 Program Steps – SleepingBarber ▪ The SleepingBarber constructor will initialize some member variables › Total number of seats in the waiting room › Total number of customers who will be coming into the barber shop › A boolean variable indicating
    [Show full text]
  • Towards Gradually Typed Capabilities in the Pi-Calculus
    Towards Gradually Typed Capabilities in the Pi-Calculus Matteo Cimini University of Massachusetts Lowell Lowell, MA, USA matteo [email protected] Gradual typing is an approach to integrating static and dynamic typing within the same language, and puts the programmer in control of which regions of code are type checked at compile-time and which are type checked at run-time. In this paper, we focus on the π-calculus equipped with types for the modeling of input-output capabilities of channels. We present our preliminary work towards a gradually typed version of this calculus. We present a type system, a cast insertion procedure that automatically inserts run-time checks, and an operational semantics of a π-calculus that handles casts on channels. Although we do not claim any theoretical results on our formulations, we demonstrate our calculus with an example and discuss our future plans. 1 Introduction This paper presents preliminary work on integrating dynamically typed features into a typed π-calculus. Pierce and Sangiorgi have defined a typed π-calculus in which channels can be assigned types that express input and output capabilities [16]. Channels can be declared to be input-only, output-only, or that can be used for both input and output operations. As a consequence, this type system can detect unintended or malicious channel misuses at compile-time. The static typing nature of the type system prevents the modeling of scenarios in which the input- output discipline of a channel is discovered at run-time. In these scenarios, such a discipline must be enforced with run-time type checks in the style of dynamic typing.
    [Show full text]
  • Bounded Model Checking for Asynchronous Concurrent Systems
    Bounded Model Checking for Asynchronous Concurrent Systems Manitra Johanesa Rakotoarisoa Department of Computer Architecture Technical University of Catalonia Advisor: Enric Pastor Llorens Thesis submitted to obtain the qualification of Doctor from the Technical University of Catalonia To my mother Contents Abstract xvii Acknowledgments xix 1 Introduction1 1.1 Symbolic Model Checking...........................3 1.1.1 BDD-based approach..........................3 1.1.2 SAT-based Approach..........................6 1.2 Synchronous Versus Asynchronous Systems.................8 1.2.1 Synchronous systems..........................8 1.2.2 Asynchronous systems......................... 10 1.3 Scope of This Work.............................. 11 1.4 Structure of the Thesis............................. 12 2 Background 13 2.1 Transition Systems............................... 13 2.1.1 Definitions............................... 13 2.1.2 Symbolic Representation........................ 15 2.2 Other Models for Concurrent Systems.................... 17 2.2.1 Kripke Structure............................ 17 2.2.2 Petri Nets................................ 21 2.2.3 Automata................................ 22 2.3 Linear Temporal Logic............................. 24 2.4 Satisfiability Problem............................. 26 2.4.1 DPLL Algorithm............................ 27 2.4.2 Stålmarck’s Algorithm......................... 28 2.4.3 Other Methods for Solving SAT................... 32 2.5 Bounded Model Checking........................... 34 2.5.1 BMC Idea...............................
    [Show full text]
  • Enhanced Debugging for Data Races in Parallel
    ENHANCED DEBUGGING FOR DATA RACES IN PARALLEL PROGRAMS USING OPENMP A Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for the Degree Master of Science By Marcus W. Hervey December 2016 ENHANCED DEBUGGING FOR DATA RACES IN PARALLEL PROGRAMS USING OPENMP Marcus W. Hervey APPROVED: Dr. Edgar Gabriel, Chairman Dept. of Computer Science Dr. Shishir Shah Dept. of Computer Science Dr. Barbara Chapman Dept. of Computer Science, Stony Brook University Dean, College of Natural Sciences and Mathematics ii iii Acknowledgements A special thank you to Dr. Barbara Chapman, Ph.D., Dr. Edgar Gabriel, Ph.D., Dr. Shishir Shah, Ph.D., Dr. Lei Huang, Ph.D., Dr. Chunhua Liao, Ph.D., Dr. Laksano Adhianto, Ph.D., and Dr. Oscar Hernandez, Ph.D. for their guidance and support throughout this endeavor. I would also like to thank Van Bui, Deepak Eachempati, James LaGrone, and Cody Addison for their friendship and teamwork, without which this endeavor would have never been accomplished. I dedicate this thesis to my parents (Billy and Olivia Hervey) who have always chal- lenged me to be my best, and to my wife and kids for their love and sacrifice throughout this process. ”Our greatest weakness lies in giving up. The most certain way to succeed is always to try just one more time.” – Thomas Alva Edison iv ENHANCED DEBUGGING FOR DATA RACES IN PARALLEL PROGRAMS USING OPENMP An Abstract of a Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for the Degree Master of Science By Marcus W.
    [Show full text]
  • Reconciling Real and Stochastic Time: the Need for Probabilistic Refinement
    DOI 10.1007/s00165-012-0230-y The Author(s) © 2012. This article is published with open access at Springerlink.com Formal Aspects Formal Aspects of Computing (2012) 24: 497–518 of Computing Reconciling real and stochastic time: the need for probabilistic refinement J. Markovski1,P.R.D’Argenio2,J.C.M.Baeten1,3 and E. P. de Vink1,3 1 Eindhoven University of Technology, Eindhoven, The Netherlands. E-mail: [email protected] 2 FaMAF, Universidad Nacional de Cordoba,´ Cordoba,´ Argentina 3 Centrum Wiskunde & Informatica, Amsterdam, The Netherlands Abstract. We conservatively extend an ACP-style discrete-time process theory with discrete stochastic delays. The semantics of the timed delays relies on time additivity and time determinism, which are properties that enable us to merge subsequent timed delays and to impose their synchronous expiration. Stochastic delays, however, interact with respect to a so-called race condition that determines the set of delays that expire first, which is guided by an (implicit) probabilistic choice. The race condition precludes the property of time additivity as the merger of stochastic delays alters this probabilistic behavior. To this end, we resolve the race condition using conditionally-distributed unit delays. We give a sound and ground-complete axiomatization of the process the- ory comprising the standard set of ACP-style operators. In this generalized setting, the alternative composition is no longer associative, so we have to resort to special normal forms that explicitly resolve the underlying race condition. Our treatment succeeds in the initial challenge to conservatively extend standard time with stochastic time. However, the ‘dissection’ of the stochastic delays to conditionally-distributed unit delays comes at a price, as we can no longer relate the resolved race condition to the original stochastic delays.
    [Show full text]
  • Using TOST in Teaching Operating Systems and Concurrent Programming Concepts
    Advances in Science, Technology and Engineering Systems Journal Vol. 5, No. 6, 96-107 (2020) ASTESJ www.astesj.com ISSN: 2415-6698 Special Issue on Multidisciplinary Innovation in Engineering Science & Technology Using TOST in Teaching Operating Systems and Concurrent Programming Concepts Tzanko Golemanov*, Emilia Golemanova Department of Computer Systems and Technologies, University of Ruse, Ruse, 7020, Bulgaria A R T I C L E I N F O A B S T R A C T Article history: The paper is aimed as a concise and relatively self-contained description of the educational Received: 30 July, 2020 environment TOST, used in teaching and learning Operating Systems basics such as Accepted: 15 October, 2020 Processes, Multiprogramming, Timesharing, Scheduling strategies, and Memory Online: 08 November, 2020 management. TOST also aids education in some important IT concepts such as Deadlock, Mutual exclusion, and Concurrent processes synchronization. The presented integrated Keywords: environment allows the students to develop and run programs in two simple programming Operating Systems languages, and at the same time, the data in the main system tables can be monitored. The Concurrent Programming paper consists of a description of TOST system, the features of the built-in programming Teaching Tools languages, and demonstrations of teaching the basic Operating Systems principles. In addition, some of the well-known concurrent processing problems are solved to illustrate TOST usage in parallel programming teaching. 1. Introduction • Visual OS simulators This paper is an extension of work originally presented in 29th The systems from the first group (MINIX [2], Nachos [3], Xinu Annual Conference of the European Association for Education in [4], Pintos [5], GeekOS [6]) run on real hardware and are too Electrical and Information Engineering (EAEEIE) [1].
    [Show full text]
  • Lecture 18 18.1 Distributed and Concurrent Systems 18.2 Race
    CMPSCI 691ST Systems Fall 2011 Lecture 18 Lecturer: Emery Berger Scribe: Sean Barker 18.1 Distributed and Concurrent Systems In this lecture we discussed two broad issues in concurrent systems: clocks and race detection. The first paper we read introduced the Lamport clock, which is a scalar providing a logical clock value for each process. Every time a process sends a message, it increments its clock value, and every time a process receives a message, it sets its clock value to the max of the current value and the message value. Clock values are used to provide a `happens-before' relationship, which provides a consistent partial ordering of events across processes. Lamport clocks were later extended to vector clocks, in which each of n processes maintains an array (or vector) of n clock values (one for each process) rather than just a single value as in Lamport clocks. Vector clocks are much more precise than Lamport clocks, since they can check whether intermediate events from anyone have occurred (by looking at the max across the entire vector), and have essentially replaced Lamport clocks today. Moreover, since distributed systems are already sending many messages, the cost of maintaining vector clocks is generally not a big deal. Clock synchronization is still a hard problem, however, because physical clock values will always have some amount of drift. The FastTrack paper we read leverages vector clocks to do race detection. 18.2 Race Detection Broadly speaking, a race condition occurs when at least two threads are accessing a variable, and one of them writes the value while another is reading.
    [Show full text]
  • Breadth First Search Vectorization on the Intel Xeon Phi
    Breadth First Search Vectorization on the Intel Xeon Phi Mireya Paredes Graham Riley Mikel Luján University of Manchester University of Manchester University of Manchester Oxford Road, M13 9PL Oxford Road, M13 9PL Oxford Road, M13 9PL Manchester, UK Manchester, UK Manchester, UK [email protected] [email protected] [email protected] ABSTRACT Today, scientific experiments in many domains, and large Breadth First Search (BFS) is a building block for graph organizations can generate huge amounts of data, e.g. social algorithms and has recently been used for large scale anal- networks, health-care records and bio-informatics applica- ysis of information in a variety of applications including so- tions [8]. Graphs seem to be a good match for important cial networks, graph databases and web searching. Due to and large dataset analysis as they can abstract real world its importance, a number of different parallel programming networks. To process large graphs, different techniques have models and architectures have been exploited to optimize been applied, including parallel programming. The main the BFS. However, due to the irregular memory access pat- challenge in the parallelization of graph processing is that terns and the unstructured nature of the large graphs, its large graphs are often unstructured and highly irregular and efficient parallelization is a challenge. The Xeon Phi is a this limits their scalability and performance when executing massively parallel architecture available as an off-the-shelf on off-the-shelf systems [18]. accelerator, which includes a powerful 512 bit vector unit The Breadth First Search (BFS) is a building block of with optimized scatter and gather functions.
    [Show full text]
  • Rapid Prototyping of High-Performance Concurrent Java Applications
    Rapid Prototyping of High-Performance Concurrent Java Applications K.J.R. Powell Master of Science School of Computer Science School of Informatics University of Edinburgh 2002 (Graduation date:December 2002) Abstract The design and implementation of concurrent applications is more challenging than that of sequential applications. The aim of this project is to address this challenge by producing an application which can generate skeleton Java systems from a high-level PEPA modelling language description. By automating the process of translating the design into a Java skeleton system, the result will maintain the performance and be- havioural characteristics of the model, providing a sound framework for completing the concurrent application. This method accelerates the process of initial implementation whilst ensuring the system characterisics remain true to the high-level model. i Acknowledgements Many thanks to my supervisor Stephen Gilmore for his excellent guidance and ideas. I’d also like to express my gratitude to L.L. for his assistance above and beyond the call of duty in the field of editing. ii Declaration I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. (K.J.R. Powell) iii Table of Contents 1 Introduction 1 1.1 Principal Goal . 1 1.2 PEPA and Java . 2 1.3 Project Objective . 2 2 Performance Evaluation Process Algebra 4 2.1 Overview of PEPA and its role in design . 4 2.2 The Syntax of PEPA .
    [Show full text]