Nowa: a Wait-Free Continuation-Stealing Concurrency Platform
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
8A. Going Parallel
2019/2020 UniPD - T. Vardanega 02/06/2020 Terminology /1 • Program = static piece of text 8a. Going parallel – Instructions + link-time data • Processor = resource that can execute a program – In a “multi-processor,” processors may – Share uniformly one common address space for main memory – Have non-uniform access to shared memory – Have unshared parts of memory – Share no memory as in “Shared Nothing” (distributed memory) Where we take a bird’s eye look at what architectures changes (again!) when jobs become • Process = instance of program + run-time data parallel in response to the quest for best – Run-time data = registers + stack(s) + heap(s) use of (massively) parallel hardware Parallel Lang Support 505 Terminology /2 Terms in context: LWT within LWP • No uniform naming of threads of control within process –Thread, Kernel Thread, OS Thread, Task, Job, Light-Weight Process, Virtual CPU, Virtual Processor, Execution Agent, Executor, Server Thread, Worker Thread –Taskgenerally describes a logical piece of work – Element of a software architecture –Threadgenerally describes a virtual CPU, a thread of control within a process –Jobin the context of a real-time system generally describes a single execution consequent to a task release Lightweight because OS • No uniform naming of user-level lightweight threads does not schedule threads –Task, Microthread, Picothread, Strand, Tasklet, Fiber, Lightweight Thread, Work Item – Called “user-level” in that scheduling is done by code outside of Exactly the same logic applies to user-level lightweight the kernel/operating-system threads (aka tasklets), which do not need scheduling Parallel Lang Support 506 Parallel Lang Support 507 Real-Time Systems 1 2019/2020 UniPD - T. -
Correct and Efficient Work-Stealing for Weak Memory Models
Correct and Efficient Work-Stealing for Weak Memory Models Nhat Minh Lê Antoniu Pop Albert Cohen Francesco Zappa Nardelli INRIA and ENS Paris Abstract Lev’s deque for the ARM architectures, and prove its correctness Chase and Lev’s concurrent deque is a key data structure in shared- against the memory semantics defined in [12] and [7]. Our second memory parallel programming and plays an essential role in work- contribution is a systematic study of the performance of several stealing schedulers. We provide the first correctness proof of an implementations of Chase–Lev on relaxed hardware. In detail, we optimized implementation of Chase and Lev’s deque on top of the compare our optimized ARM implementation against a standard POWER and ARM architectures: these provide very relaxed mem- implementation for the x86 architecture and two portable variants ory models, which we exploit to improve performance but consider- expressed in C11: a reference sequentially consistent translation ably complicate the reasoning. We also study an optimized x86 and of the algorithm, and an aggressively optimized version making a portable C11 implementation, conducting systematic experiments full use of the release–acquire and relaxed semantics offered by to evaluate the impact of memory barrier optimizations. Our results C11 low-level atomics. These implementations of the Chase–Lev demonstrate the benefits of hand tuning the deque code when run- deque are evaluated in the context of a work-stealing scheduler. We ning on top of relaxed memory models. consider diverse worker/thief configurations, including a synthetic benchmark with two different workloads and standard task-parallel Categories and Subject Descriptors D.1.3 [Programming Tech- kernels. -
Package 'Slurmr'
Package ‘slurmR’ September 3, 2021 Title A Lightweight Wrapper for 'Slurm' Version 0.5-1 Description 'Slurm', Simple Linux Utility for Resource Management <https://slurm.schedmd.com/>, is a popular 'Linux' based software used to schedule jobs in 'HPC' (High Performance Computing) clusters. This R package provides a specialized lightweight wrapper of 'Slurm' with a syntax similar to that found in the 'parallel' R package. The package also includes a method for creating socket cluster objects spanning multiple nodes that can be used with the 'parallel' package. Depends R (>= 3.3.0), parallel License MIT + file LICENSE BugReports https://github.com/USCbiostats/slurmR/issues URL https://github.com/USCbiostats/slurmR, https://slurm.schedmd.com/ Encoding UTF-8 RoxygenNote 7.1.1 Suggests knitr, rmarkdown, covr, tinytest Imports utils VignetteBuilder knitr Language en-US NeedsCompilation no Author George Vega Yon [aut, cre] (<https://orcid.org/0000-0002-3171-0844>), Paul Marjoram [ctb, ths] (<https://orcid.org/0000-0003-0824-7449>), National Cancer Institute (NCI) [fnd] (Grant Number 5P01CA196569-02), Michael Schubert [rev] (JOSS reviewer, <https://orcid.org/0000-0002-6862-5221>), Michel Lang [rev] (JOSS reviewer, <https://orcid.org/0000-0001-9754-0393>) Maintainer George Vega Yon <[email protected]> Repository CRAN Date/Publication 2021-09-03 04:20:02 UTC 1 2 expand_array_indexes R topics documented: expand_array_indexes . .2 JOB_STATE_CODES . .3 makeSlurmCluster . .4 new_rscript . .6 opts_slurmR . .7 parse_flags . .9 random_job_name . .9 read_sbatch . 10 slurmR . 11 slurmr_docker . 11 slurm_available . 12 Slurm_clean . 15 Slurm_collect . 16 Slurm_env . 17 Slurm_EvalQ . 18 slurm_job . 19 Slurm_log . 21 Slurm_Map . 22 snames . 25 sourceSlurm . 25 status . 28 the_plan . -
TEE Internal Core API Specification V1.1.2.50
GlobalPlatform Technology TEE Internal Core API Specification Version 1.1.2.50 (Target v1.2) Public Review June 2018 Document Reference: GPD_SPE_010 Copyright 2011-2018 GlobalPlatform, Inc. All Rights Reserved. Recipients of this document are invited to submit, with their comments, notification of any relevant patents or other intellectual property rights (collectively, “IPR”) of which they may be aware which might be necessarily infringed by the implementation of the specification or other work product set forth in this document, and to provide supporting documentation. The technology provided or described herein is subject to updates, revisions, and extensions by GlobalPlatform. This documentation is currently in draft form and is being reviewed and enhanced by the Committees and Working Groups of GlobalPlatform. Use of this information is governed by the GlobalPlatform license agreement and any use inconsistent with that agreement is strictly prohibited. TEE Internal Core API Specification – Public Review v1.1.2.50 (Target v1.2) THIS SPECIFICATION OR OTHER WORK PRODUCT IS BEING OFFERED WITHOUT ANY WARRANTY WHATSOEVER, AND IN PARTICULAR, ANY WARRANTY OF NON-INFRINGEMENT IS EXPRESSLY DISCLAIMED. ANY IMPLEMENTATION OF THIS SPECIFICATION OR OTHER WORK PRODUCT SHALL BE MADE ENTIRELY AT THE IMPLEMENTER’S OWN RISK, AND NEITHER THE COMPANY, NOR ANY OF ITS MEMBERS OR SUBMITTERS, SHALL HAVE ANY LIABILITY WHATSOEVER TO ANY IMPLEMENTER OR THIRD PARTY FOR ANY DAMAGES OF ANY NATURE WHATSOEVER DIRECTLY OR INDIRECTLY ARISING FROM THE IMPLEMENTATION OF THIS SPECIFICATION OR OTHER WORK PRODUCT. Copyright 2011-2018 GlobalPlatform, Inc. All Rights Reserved. The technology provided or described herein is subject to updates, revisions, and extensions by GlobalPlatform. -
Powerview Command Reference
PowerView Command Reference TRACE32 Online Help TRACE32 Directory TRACE32 Index TRACE32 Documents ...................................................................................................................... PowerView User Interface ............................................................................................................ PowerView Command Reference .............................................................................................1 History ...................................................................................................................................... 12 ABORT ...................................................................................................................................... 13 ABORT Abort driver program 13 AREA ........................................................................................................................................ 14 AREA Message windows 14 AREA.CLEAR Clear area 15 AREA.CLOSE Close output file 15 AREA.Create Create or modify message area 16 AREA.Delete Delete message area 17 AREA.List Display a detailed list off all message areas 18 AREA.OPEN Open output file 20 AREA.PIPE Redirect area to stdout 21 AREA.RESet Reset areas 21 AREA.SAVE Save AREA window contents to file 21 AREA.Select Select area 22 AREA.STDERR Redirect area to stderr 23 AREA.STDOUT Redirect area to stdout 23 AREA.view Display message area in AREA window 24 AutoSTOre .............................................................................................................................. -
Xshell 6 User Guide Secure Terminal Emualtor
Xshell 6 User Guide Secure Terminal Emualtor NetSarang Computer, Inc. Copyright © 2018 NetSarang Computer, Inc. All rights reserved. Xshell Manual This software and various documents have been produced by NetSarang Computer, Inc. and are protected by the Copyright Act. Consent from the copyright holder must be obtained when duplicating, distributing or citing all or part of this software and related data. This software and manual are subject to change without prior notice for product functions improvement. Xlpd and Xftp are trademarks of NetSarang Computer, Inc. Xmanager and Xshell are registered trademarks of NetSarang Computer, Inc. Microsoft Windows is a registered trademark of Microsoft. UNIX is a registered trademark of AT&T Bell Laboratories. SSH is a registered trademark of SSH Communications Security. Secure Shell is a trademark of SSH Communications Security. This software includes software products developed through the OpenSSL Project and used in OpenSSL Toolkit. NetSarang Computer, Inc. 4701 Patrick Henry Dr. BLDG 22 Suite 137 Santa Clara, CA 95054 http://www.netsarang.com/ Contents About Xshell ............................................................................................................................................... 1 Key Functions ........................................................................................................... 1 Minimum System Requirements .................................................................................. 3 Install and Uninstall .................................................................................................. -
Lock-Free Programming
Lock-Free Programming Geoff Langdale L31_Lockfree 1 Desynchronization ● This is an interesting topic ● This will (may?) become even more relevant with near ubiquitous multi-processing ● Still: please don’t rewrite any Project 3s! L31_Lockfree 2 Synchronization ● We received notification via the web form that one group has passed the P3/P4 test suite. Congratulations! ● We will be releasing a version of the fork-wait bomb which doesn't make as many assumptions about task id's. – Please look for it today and let us know right away if it causes any trouble for you. ● Personal and group disk quotas have been grown in order to reduce the number of people running out over the weekend – if you try hard enough you'll still be able to do it. L31_Lockfree 3 Outline ● Problems with locking ● Definition of Lock-free programming ● Examples of Lock-free programming ● Linux OS uses of Lock-free data structures ● Miscellanea (higher-level constructs, ‘wait-freedom’) ● Conclusion L31_Lockfree 4 Problems with Locking ● This list is more or less contentious, not equally relevant to all locking situations: – Deadlock – Priority Inversion – Convoying – “Async-signal-safety” – Kill-tolerant availability – Pre-emption tolerance – Overall performance L31_Lockfree 5 Problems with Locking 2 ● Deadlock – Processes that cannot proceed because they are waiting for resources that are held by processes that are waiting for… ● Priority inversion – Low-priority processes hold a lock required by a higher- priority process – Priority inheritance a possible solution L31_Lockfree -
An Efficient Work-Stealing Scheduler for Task Dependency Graph
2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS) An Efficient Work-Stealing Scheduler for Task Dependency Graph Chun-Xun Lin Tsung-Wei Huang Martin D. F. Wong Department of Electrical Department of Electrical Department of Electrical and Computer Engineering, and Computer Engineering, and Computer Engineering, University of Illinois Urbana-Champaign, The University of Utah, University of Illinois Urbana-Champaign, Urbana, IL, USA Salt Lake City, Utah Urbana, IL, USA [email protected] [email protected] [email protected] Abstract—Work-stealing is a key component of many parallel stealing scheduler is not an easy job, especially when dealing task graph libraries such as Intel Threading Building Blocks with task dependency graph where the parallelism could be (TBB) FlowGraph, Microsoft Task Parallel Library (TPL) very irregular and unstructured. Due to the decentralized ar- Batch.Net, Cpp-Taskflow, and Nabbit. However, designing a correct and effective work-stealing scheduler is a notoriously chitecture, developers have to sort out many implementation difficult job, due to subtle implementation details of concur- details to efficiently manage workers such as deciding the rency controls and decentralized coordination between threads. number of steals attempted by a thief and how to mitigate This problem becomes even more challenging when striving the resource contention between workers. The problem is for optimal thread usage in handling parallel workloads even more challenging when considering throughput and with complex task graphs. As a result, we introduce in this paper an effective work-stealing scheduler for execution of energy efficiency, which have emerged as critical issues in task dependency graphs. -
Intel Threading Building Blocks
Praise for Intel Threading Building Blocks “The Age of Serial Computing is over. With the advent of multi-core processors, parallel- computing technology that was once relegated to universities and research labs is now emerging as mainstream. Intel Threading Building Blocks updates and greatly expands the ‘work-stealing’ technology pioneered by the MIT Cilk system of 15 years ago, providing a modern industrial-strength C++ library for concurrent programming. “Not only does this book offer an excellent introduction to the library, it furnishes novices and experts alike with a clear and accessible discussion of the complexities of concurrency.” — Charles E. Leiserson, MIT Computer Science and Artificial Intelligence Laboratory “We used to say make it right, then make it fast. We can’t do that anymore. TBB lets us design for correctness and speed up front for Maya. This book shows you how to extract the most benefit from using TBB in your code.” — Martin Watt, Senior Software Engineer, Autodesk “TBB promises to change how parallel programming is done in C++. This book will be extremely useful to any C++ programmer. With this book, James achieves two important goals: • Presents an excellent introduction to parallel programming, illustrating the most com- mon parallel programming patterns and the forces governing their use. • Documents the Threading Building Blocks C++ library—a library that provides generic algorithms for these patterns. “TBB incorporates many of the best ideas that researchers in object-oriented parallel computing developed in the last two decades.” — Marc Snir, Head of the Computer Science Department, University of Illinois at Urbana-Champaign “This book was my first introduction to Intel Threading Building Blocks. -
Waste Transfer Stations: a Manual for Decision-Making Acknowledgments
Waste Transfer Stations: A Manual for Decision-Making Acknowledgments he Office of Solid Waste (OSW) would like to acknowledge and thank the members of the Solid Waste Association of North America Focus Group and the National Environmental Justice Advisory Council Waste Transfer Station Working Group for reviewing and providing comments on this draft document. We would also like to thank Keith Gordon of Weaver Boos & Gordon, Inc., for providing a technical Treview and donating several of the photographs included in this document. Acknowledgements i Contents Acknowledgments. i Introduction . 1 What Are Waste Transfer Stations?. 1 Why Are Waste Transfer Stations Needed?. 2 Why Use Waste Transfer Stations? . 3 Is a Transfer Station Right for Your Community? . 4 Planning and Siting a Transfer Station. 7 Types of Waste Accepted . 7 Unacceptable Wastes . 7 Public Versus Commercial Use . 8 Determining Transfer Station Size and Capacity . 8 Number and Sizing of Transfer Stations . 10 Future Expansion . 11 Site Selection . 11 Environmental Justice Considerations . 11 The Siting Process and Public Involvement . 11 Siting Criteria. 14 Exclusionary Siting Criteria . 14 Technical Siting Criteria. 15 Developing Community-Specific Criteria . 17 Applying the Committee’s Criteria . 18 Host Community Agreements. 18 Transfer Station Design and Operation . 21 Transfer Station Design . 21 How Will the Transfer Station Be Used? . 21 Site Design Plan . 21 Main Transfer Area Design. 22 Types of Vehicles That Use a Transfer Station . 23 Transfer Technology . 25 Transfer Station Operations. 27 Operations and Maintenance Plans. 27 Facility Operating Hours . 32 Interacting With the Public . 33 Waste Screening . 33 Emergency Situations . 34 Recordkeeping. 35 Environmental Issues. -
Lecture 15-16: Parallel Programming with Cilk
Lecture 15-16: Parallel Programming with Cilk Concurrent and Mul;core Programming Department of Computer Science and Engineering Yonghong Yan [email protected] www.secs.oakland.edu/~yan 1 Topics (Part 1) • IntroducAon • Principles of parallel algorithm design (Chapter 3) • Programming on shared memory system (Chapter 7) – OpenMP – Cilk/Cilkplus – PThread, mutual exclusion, locks, synchroniza;ons • Analysis of parallel program execuAons (Chapter 5) – Performance Metrics for Parallel Systems • Execuon Time, Overhead, Speedup, Efficiency, Cost – Scalability of Parallel Systems – Use of performance tools 2 Shared Memory Systems: Mul;core and Mul;socket Systems • a 3 Threading on Shared Memory Systems • Employ parallelism to compute on shared data – boost performance on a fixed memory footprint (strong scaling) • Useful for hiding latency – e.g. latency due to I/O, memory latency, communicaon latency • Useful for scheduling and load balancing – especially for dynamic concurrency • Relavely easy to program – easier than message-passing? you be the judge! 4 Programming Models on Shared Memory System • Library-based models – All data are shared – Pthreads, Intel Threading Building Blocks, Java Concurrency, Boost, Microso\ .Net Task Parallel Library • DirecAve-based models, e.g., OpenMP – shared and private data – pragma syntax simplifies thread creaon and synchronizaon • Programming languages – CilkPlus (Intel, GCC), and MIT Cilk – CUDA (NVIDIA) – OpenCL 5 Toward Standard Threading for C/C++ At last month's meeting of the C standard committee, WG14 decided to form a study group to produce a proposal for language extensions for C to simplify parallel programming. This proposal is expected to combine the best ideas from Cilk and OpenMP, two of the most widely-used and well-established parallel language extensions for the C language family. -
System Calls & Signals
CS345 OPERATING SYSTEMS System calls & Signals Panagiotis Papadopoulos [email protected] 1 SYSTEM CALL When a program invokes a system call, it is interrupted and the system switches to Kernel space. The Kernel then saves the process execution context (so that it can resume the program later) and determines what is being requested. The Kernel carefully checks that the request is valid and that the process invoking the system call has enough privilege. For instance some system calls can only be called by a user with superuser privilege (often referred to as root). If everything is good, the Kernel processes the request in Kernel Mode and can access the device drivers in charge of controlling the hardware (e.g. reading a character inputted from the keyboard). The Kernel can read and modify the data of the calling process as it has access to memory in User Space (e.g. it can copy the keyboard character into a buffer that the calling process has access to) When the Kernel is done processing the request, it restores the process execution context that was saved when the system call was invoked, and control returns to the calling program which continues executing. 2 SYSTEM CALLS FORK() 3 THE FORK() SYSTEM CALL (1/2) • A process calling fork()spawns a child process. • The child is almost an identical clone of the parent: • Program Text (segment .text) • Stack (ss) • PCB (eg. registers) • Data (segment .data) #include <sys/types.h> #include <unistd.h> pid_t fork(void); 4 THE FORK() SYSTEM CALL (2/2) • The fork()is one of the those system calls, which is called once, but returns twice! Consider a piece of program • After fork()both the parent and the child are ..