Warm-Up: Computations Using Multiple Cores on a Single Machine


Distributed computing with Julia (Day 2)
May 23rd, 2018, 09:00-11:00, UNISA
Przemek Szufel (https://szufel.pl/)
Materials for this course: https://szufel.pl/unisa/

Day 2 Agenda

• Parallelizing Julia on a single machine.
  • SIMD in Julia
  • Threading
    • Configuring the threading mechanism in Julia
    • Multithreaded code efficiency issues
  • Multiprocessing
    • Local multiprocessing
    • Parallelizing loops
    • Introduction to interprocess communication issues

JuliaBox: the easiest way to start (pure cloud, https://juliabox.com)

Learning more about Julia

• Website: https://julialang.org/
• Learning materials: https://julialang.org/learning/
• Where it is taught: https://julialang.org/teaching/
• Blogs about Julia: https://www.juliabloggers.com/
• Julia forum: https://discourse.julialang.org/
• Q&A for Julia: https://stackoverflow.com/questions/tagged/julia-lang

Parallelization options in programming languages

• Single instruction, multiple data (SIMD)
• Green threads
• Multi-threading
  • Language
  • Libraries
• Multi-processing
  • single machine
  • distributed (cluster)
  • distributed (cluster) via external tools

SIMD

• Single instruction, multiple data (SIMD) describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data-level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment.
Source: https://en.wikipedia.org/wiki/SIMD

[Figure: SIMD illustration; image source: https://en.wikipedia.org/wiki/SIMD]

Data-level parallelism (1_dot/dot_simd.jl)

    # plain loop: the compiler may not vectorize the reduction
    function dot1(x, y)
        s = 0.0
        for i in 1:length(x)
            @inbounds s += x[i]*y[i]
        end
        s
    end

    # @simd allows the compiler to reorder the reduction and use SIMD units
    function dot2(x, y)
        s = 0.0
        @simd for i in 1:length(x)
            @inbounds s += x[i]*y[i]
        end
        s
    end

Dot product: output

    $ julia 1_dot/dot_simd.jl
    dot1 elapsed time: 0.832743291 seconds
    dot2 elapsed time: 0.303591816 seconds

Green threading

• In computer programming, green threads are threads that are scheduled by a runtime library or virtual machine (VM) instead of natively by the underlying operating system. Green threads emulate multithreaded environments without relying on any native OS capabilities, and they are managed in user space instead of kernel space, enabling them to work in environments that do not have native thread support.
Source: https://en.wikipedia.org/wiki/Green_threads

A simple web server with green threading (2_web/webserver.jl)

    @async begin
        server = listen(8080)
        while true
            sock = accept(server)
            # each connection is handled in its own green thread (task)
            @async begin
                data = readline(sock)
                print("Got request\n", data, "\n")
                header = "\nHTTP/1.1 200 OK\nContent-Type: text/html\n\n"
                message = string("<html><body>Hello from Julia at ", now(), "</body></html>")
                write(sock, string(header, message))
                close(sock)
            end
        end
    end

Comparison of parallelism types

Threading:
• Single process (cheap)
• Shared memory
• Number of threads running simultaneously limited by number of processors
• Possible issues with locking and false sharing

Multiprocessing:
• Multiple processes
• Separate memory
• Number of processes running simultaneously limited by cluster size
• Possible issues if inter-process communication is needed

Threading: simple example (3_sum/sum_thread.jl)

Single-threaded:

    function ssum(x)
        r, c = size(x)
        y = zeros(c)
        for i in 1:c
            for j in 1:r
                y[i] += x[j, i]
            end
        end
        y
    end

Multithreaded:

    function tsum(x)
        r, c = size(x)
        y = zeros(c)
        Threads.@threads for i in 1:c
            for j in 1:r
                y[i] += x[j, i]
            end
        end
        y
    end

Sum: output

    $ 3_sum/run_sum_thread.sh
    threads: 1
    1.147527 seconds (4.71 k allocations: 420.445 KiB)
    1.132901 seconds (6 allocations: 156.484 KiB)
    1.207195 seconds (10.22 k allocations: 696.149 KiB)
    1.179634 seconds (7 allocations: 156.531 KiB)
    threads: 2
    1.147714 seconds (4.71 k allocations: 420.445 KiB)
    1.133718 seconds (6 allocations: 156.484 KiB)
    0.620536 seconds (10.22 k allocations: 696.149 KiB)
    0.592958 seconds (7 allocations: 156.531 KiB)
    threads: 16
    1.147191 seconds (4.71 k allocations: 420.445 KiB)
    1.132812 seconds (6 allocations: 156.484 KiB)
    0.175705 seconds (10.22 k allocations: 696.149 KiB)
    0.084011 seconds (7 allocations: 156.531 KiB)

(In each block the first two timings are ssum, the last two tsum; the delta between the first and second call of each function is compilation time.)

Threading: synchronization (4_locking/locking.jl)

Increment x 10^7 times using threads, comparing:
• Atomic operations
• SpinLock (busy waiting)
• Mutex (OS-provided lock)

    $ julia 4_locking/run_locking.sh

Locking: output on c4.4xlarge (16 vCPU)

1 thread:

    f_bad
    10000000
    0.498997 seconds (10.01 M allocations: 153.318 MiB, 49.89% gc time)
    10000000
    0.198711 seconds (10.00 M allocations: 152.580 MiB, 3.04% gc time)
    f_atomic
    10000000
    0.082628 seconds (7.54 k allocations: 403.376 KiB)
    10000000
    0.059487 seconds (11 allocations: 288 bytes)
    f_spin
    10000000
    0.286315 seconds (10.01 M allocations: 153.074 MiB, 2.25% gc time)
    10000000
    0.257490 seconds (10.00 M allocations: 152.580 MiB, 1.52% gc time)
    f_mutex
    10000000
    0.557977 seconds (10.01 M allocations: 153.260 MiB, 1.17% gc time)
    10000000
    0.491197 seconds (10.00 M allocations: 152.580 MiB, 1.02% gc time)

16 threads:

    f_bad
    950043
    0.449196 seconds (1.63 M allocations: 27.759 MiB)
    630661
    0.922549 seconds (1.52 M allocations: 26.963 MiB, 61.86% gc time)
    f_atomic
    10000000
    0.217921 seconds (7.54 k allocations: 403.376 KiB)
    10000000
    0.187748 seconds (12 allocations: 688 bytes)
    f_spin
    10000000
    2.238537 seconds (10.01 M allocations: 153.074 MiB, 15.81% gc time)
    10000000
    1.602330 seconds (10.00 M allocations: 152.581 MiB, 19.85% gc time)
    f_mutex
    10000000
    4.862945 seconds (10.01 M allocations: 153.260 MiB, 3.67% gc time)
    10000000
    4.662214 seconds (10.00 M allocations: 152.580 MiB)

(Note that with 16 threads the unsynchronized f_bad loses updates and returns the wrong count.)
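The locking.jl file itself is not reproduced in the slides. A minimal sketch of the four counters compared above could look like this; the structure is an assumption (only the names f_bad, f_atomic, f_spin and f_mutex come from the output), written against the Julia 0.6 threading API used in this course:

    # Each function increments a counter 10^7 times across all threads.
    function f_bad()
        x = 0
        Threads.@threads for i in 1:10^7
            x += 1                      # unsynchronized read-modify-write:
        end                             # with >1 thread, updates get lost
        x
    end

    function f_atomic()
        x = Threads.Atomic{Int}(0)
        Threads.@threads for i in 1:10^7
            Threads.atomic_add!(x, 1)   # hardware atomic increment
        end
        x[]
    end

    function f_spin()
        l = Threads.SpinLock()          # busy-waiting lock
        x = 0
        Threads.@threads for i in 1:10^7
            lock(l); x += 1; unlock(l)
        end
        x
    end

    function f_mutex()
        l = Threads.Mutex()             # OS-provided lock
        x = 0
        Threads.@threads for i in 1:10^7
            lock(l); x += 1; unlock(l)
        end
        x
    end

The output above reflects this structure: f_bad returns the wrong count with 16 threads, the spin lock wastes CPU under heavy contention, and the mutex additionally pays for kernel calls.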
Threading: false sharing (5_falsesharing/falsesharing.jl)

• Calculate the sum of 12 × 10^8 ones (the result should be 12 × 10^8)
• False sharing: threads modify independent variables that share the same cache line
• Caution:
  • threading performance is sometimes hard to predict
  • adding cores does not always help

    $ 5_falsesharing/run_falsesharing.sh

[Figure: "False sharing: output on c4.4xlarge (16 vCPU)"; runtimes (roughly 0 to 2.5 seconds) for 1, 2, 3 and 4 threads plotted against the benchmark parameter taking the values 1, 2, 4, ..., 4096.]
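falsesharing.jl itself is not shown in the slides. The effect can be demonstrated with a pattern like the following sketch; the function name and the spacing parameter are illustrative assumptions, not taken from the course file:

    # Each thread accumulates into its own slot of a shared array.
    # With space = 1 the slots are adjacent and share a cache line, so the
    # threads continually invalidate each other's caches (false sharing);
    # with a large spacing each accumulator sits on its own cache line.
    function sum_ones(n, space)
        nt = Threads.nthreads()
        acc = zeros(Int, nt * space)
        Threads.@threads for t in 1:nt
            slot = (t - 1) * space + 1
            for i in 1:div(n, nt)
                acc[slot] += 1          # repeated writes to one slot
            end
        end
        sum(acc)                        # equals n when nt divides n
    end

    sum_ones(12 * 10^8, 1)              # adjacent slots: heavy false sharing
    sum_ones(12 * 10^8, 1024)           # padded slots: no false sharing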
Exercise I (6_sums/sums.jl)

Parallelize the code summing 10^8 random numbers. Compare the performance with the built-in sum function.

sums: output on c4.4xlarge (16 vCPU)

    $ sh ~/fields/2_3/5_sums/solution/run_sums.sh
    threads: 1
    sum 0.070023 seconds (1 allocation: 16 bytes)
    s   0.116564 seconds (1 allocation: 16 bytes)
    s2  0.117371 seconds (3 allocations: 144 bytes)
    s3  0.071282 seconds (1 allocation: 16 bytes)
    s4  0.069734 seconds (47 allocations: 1.891 KiB)
    threads: 2
    sum 0.069787 seconds (1 allocation: 16 bytes)
    s   0.116970 seconds (1 allocation: 16 bytes)
    s2  0.061721 seconds (3 allocations: 144 bytes)
    s3  0.071210 seconds (1 allocation: 16 bytes)
    s4  0.039898 seconds (48 allocations: 1.953 KiB)
    threads: 16
    sum 0.080793 seconds (1 allocation: 16 bytes)
    s   0.113512 seconds (1 allocation: 16 bytes)
    s2  0.020983 seconds (3 allocations: 256 bytes)
    s3  0.078284 seconds (1 allocation: 16 bytes)
    s4  0.019935 seconds (52 allocations: 2.578 KiB)

Example – multiprocessing (7_rand/rand_process.jl)

Serial version:

    using BenchmarkTools

    function s_rand()
        n = 10^4
        x = 0.0
        for i in 1:n
            x += sum(rand(10^4))
        end
        x / n
    end

    @time s_rand()
    @time s_rand()

Parallel version:

    using BenchmarkTools

    function p_rand()
        n = 10^4
        x = @parallel (+) for i in 1:n
            sum(rand(10^4))
        end
        x / n
    end

    @time p_rand()
    @time p_rand()

Run with one worker process per core:

    $ julia -p $(nproc) rand_process.jl

Parallelizing Julia code

• @parallel
• @spawnat
• @everywhere
• @async
• @sync
• fetch()

(A short sketch showing how these primitives fit together follows the output below.)

Rand: output

    $ 3_rand/run_rand_process.sh
    0.381071 seconds (46.21 k allocations: 765.124 MiB, 37.20% gc time)
    0.161149 seconds (20.00 k allocations: 763.703 MiB, 9.64% gc time)
    1.661893 seconds (230.81 k allocations: 12.494 MiB, 0.15% gc time)
    0.092413 seconds (1.89 k allocations: 155.766 KiB)

(The first two timings are s_rand, the last two p_rand; the delta between the two p_rand calls is compilation and process spawning time.)
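The slides list these primitives without a usage example. The sketch below is an illustration written for this summary, not course code, in the Julia 0.6 syntax used throughout this course (since Julia 1.0 these primitives live in the Distributed standard library and @parallel has become @distributed):

    addprocs(4)                  # start 4 local worker processes

    @everywhere f(x) = x^2       # define f on the master and on every worker

    r = @spawnat 2 f(10)         # run f(10) on worker 2; returns a Future
    fetch(r)                     # block until the result is ready -> 100

    # @parallel with a reducer splits the loop range across the workers:
    s = @parallel (+) for i in 1:100
        f(i)
    end

    # @sync blocks until all enclosed @async/@spawnat tasks have finished:
    @sync begin
        @async println("green thread on the master process")
        @spawnat 3 f(42)
    end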
Full example – Asian option pricing (8_asianoption/*)

• An Asian option (or average value option) is a special type of option contract: the payoff is determined by the average underlying price over some pre-set period of time.
• An asset has a known price $X_0$ at time 0; let $X_t$ denote the asset price at time $t$.
• We have to calculate the value $v$ of an Asian option exercisable at time $T$:

    $v = E\left[\exp(-rT)\,\max(\bar{X} - K, 0)\right]$, where $\bar{X} = \frac{1}{T}\int_0^T X_t\,dt$

What is geometric Brownian motion (GBM)?

• Formally: $\ln\frac{X_p}{X_q}$ has a normal distribution.
• Intuitively: the percentage price change is normally distributed.

Numerical approximation of $v$

• Replace $\bar{X}$ by its approximation over $m$ discrete periods:

    $\hat{x} = \frac{1}{m}\sum_{i=1}^{m} X_{i\Delta}$, with $\Delta = T/m$

• Assume the process $X_t$ is a geometric Brownian motion with drift $r$ and volatility $\sigma^2$:

    $X_{(i+1)\Delta} = X_{i\Delta}\,\exp\left(\left(r - \frac{\sigma^2}{2}\right)\Delta + \sigma\sqrt{\Delta}\,Z_i\right)$, $Z_i \sim N(0,1)$

• Average $n$ independent samples of $\exp(-rT)\,\max(\hat{x} - K, 0)$.

Example implementations (8_asianoption/*)

• asianoption.jl: single CPU
• asianoption_thread.jl: threaded
• asianoption_parallel.jl: pmap (see the sketch at the end of this section)
• asianoption_parallel2.jl: @parallel

Single process (Julia):

    function v_asian_sample(T, r, K, σ, X₀, m::Integer)
        X = X₀
        hatx = zero(X)
        Δ = T / m
        for i in 1:m
            X *= exp((r - σ^2/2)*Δ + σ*√Δ*randn())
            hatx += X
        end
        exp(-r*T)*max(hatx/m - K, 0)
    end

    function v_asian(T, r, K, σ, X₀, m, n)
        mean(v_asian_sample(T, r, K, σ, X₀, m) for i in 1:n)
    end

Julia: using all cores of your cluster by adding a single @parallel command:

    function v_asian(T, r, K, σ, X₀, m, n)
        res = @parallel (+) for i in 1:n
            X = X₀
            hatx = zero(X)
            Δ = T / m
            for i in 1:m
                X *= exp((r - σ^2/2)*Δ + σ*√Δ*randn())
                hatx += X
            end
            exp(-r*T)*max(hatx/m - K, 0)
        end
        res / n
    end

Asian option: output

    $ 4_asianoption/run_asianoption.sh
    one CPU      => 3.065627163 (2.042850249101069)
    threads: 1   => 3.017402595 (2.124517036279159)
    pmap: 1      => 3.025395902 (2.105147813902863)
    @parallel 1  => 3.368927703 (2.074390346543304)
    threads: 4   => 0.75593904  (2.086799284142133)
    pmap: 4      => 0.772638171 (2.10637071660848)
    @parallel 4  => 0.844037915 (2.0170305511851563)
    threads: 16  => 0.339850353 (2.0950297040301478)
    pmap: 16     => 0.340659614 (2.0348591986978257)
    @parallel 16 => 0.3597416   (1.9945444607076115)

Threading is sensitive to details: try commenting out lines 12 to 14 in the asianoption_thread.jl file.
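asianoption_parallel.jl (the pmap variant listed above) is not reproduced in the slides. One way it might be structured, sketched here as an assumption rather than the actual course file, with an illustrative chunks parameter:

    # pmap distributes work items over the worker processes. Sampling is
    # grouped into chunks so each pmap task does enough work to amortize
    # the inter-process communication overhead.
    @everywhere function v_asian_sample(T, r, K, σ, X₀, m::Integer)
        X = X₀
        hatx = zero(X)
        Δ = T / m
        for i in 1:m
            X *= exp((r - σ^2/2)*Δ + σ*√Δ*randn())
            hatx += X
        end
        exp(-r*T)*max(hatx/m - K, 0)
    end

    function v_asian_pmap(T, r, K, σ, X₀, m, n; chunks=100)
        per_chunk = div(n, chunks)
        partial = pmap(1:chunks) do k
            s = 0.0
            for i in 1:per_chunk
                s += v_asian_sample(T, r, K, σ, X₀, m)
            end
            s / per_chunk            # mean of this chunk's samples
        end
        mean(partial)                # mean of the chunk means
    end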