TDDA69 Data and Program Structure
Concurrent Computing
Cyrille Berger

List of lectures
1. Introduction and Functional Programming
2. Imperative Programming and Data Structures
3. Parsing
4. Evaluation
5. Object Oriented Programming
6. Macros and decorators
7. Virtual Machines and Bytecode
8. Garbage Collection and Native Code
9. Concurrent Computing
10. Declarative Programming
11. Logic Programming
12. Summary
Lecture content
- Parallel Programming
- Multithreaded Programming
- The States Problems and Solutions
- Atomic actions
- Language and Interpreter Design Considerations
- Single Instruction, Multiple Threads Programming
- Distributed programming
- Message Passing
- MapReduce

Concurrent computing
- In concurrent computing, several computations are executed at the same time
- In parallel computing, all computation units have access to shared memory (for instance in a single process)
- In distributed computing, computation units communicate through message passing
Benefits of concurrent computing
- Faster computation
- Responsiveness: interactive applications can perform two tasks at the same time (rendering, spell checking...)
- Availability of services: load balancing between servers
- Controllability: tasks can be suspended, resumed and stopped

Disadvantages of concurrent computing
- Concurrency is hard to implement properly
- Safety: it is easy to corrupt memory
- Deadlock: tasks can wait indefinitely for each other
- Non-determinism
- Not always faster! Memory bandwidth and CPU cache are limited
Concurrent computing programming

Four basic approaches to computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing

Stream Programming in Functional Programming
- No global state
- Functions only act on their input; they are reentrant
- Functions can then be executed in parallel, as long as they do not depend on the output of another function
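This stream idea can be sketched in runnable Python: because a pure function's result depends only on its arguments, independent calls can be handed to a pool of workers in any order. The `square` function below is a made-up illustration, not from the lecture:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Pure function: no global state, the result depends only on the
    # input, so independent calls can safely run in parallel.
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() farms the independent calls out to worker threads;
    # results come back in input order regardless of execution order.
    results = list(pool.map(square, range(8)))
```

Because no call depends on another's output, the result is deterministic no matter how the calls are scheduled.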
Parallel Programming

In parallel computing, several computations are executed at the same time and have access to shared memory.

[Diagram: several computation units connected to a single shared memory]
SIMD, SIMT, SMT (1/2)
- SIMD: Single Instruction, Multiple Data
  Elements of a short vector (4 to 8 elements) are processed in parallel
- SIMT: Single Instruction, Multiple Threads
  The same instruction is executed by multiple threads (from 128 to 3048 or more in the future)
- SMT: Simultaneous Multithreading
  General purpose; different instructions are executed by different threads

SIMD, SIMT, SMT (2/2)

SIMD:

    PUSH [1, 2, 3, 4]
    PUSH [4, 5, 6, 7]
    VEC_ADD_4

SIMT:

    execute([1, 2, 3, 4], [4, 5, 6, 7],
            lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))

SMT:

    a = [1, 2, 3, 4]
    b = [4, 5, 6, 7]
    ...
    Thread.new(lambda: a = a + b)
    Thread.new(lambda: c = c * b)
Why the need for the different models?
- Flexibility: SMT > SIMT > SIMD
- Performance: SIMD > SIMT > SMT
- Less flexibility gives higher performance, unless the lack of flexibility prevents accomplishing the task

Multithreaded Programming
Singlethreaded vs Multithreaded

[Diagram: a single-threaded program runs only "main"; a multithreaded program forks "main" into sub 0 ... sub n and joins them back]

Multithreaded Programming Model
- Start with a single root thread (main)
- Fork: create concurrently executing sub-threads
- Join: synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously; they may or may not execute on different processors
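The fork/join model above can be sketched with Python's standard `threading` module; the `computation` function and the thread names are made up for illustration:

```python
import threading

finished = []

def computation(name):
    # ... do some computation ...
    finished.append(name)   # list.append is atomic in CPython

# Fork: create concurrently executing sub-threads
thread1 = threading.Thread(target=computation, args=("sub 0",))
thread2 = threading.Thread(target=computation, args=("sub 1",))
thread1.start()
thread2.start()

# Join: synchronize; the main thread waits for both sub-threads
thread1.join()
thread2.join()
# Both computations are guaranteed to have finished here.
```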
A multithreaded example

    thread1 = new Thread(function() { /* do some computation */ });
    thread2 = new Thread(function() { /* do some computation */ });
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();

The States Problems and Solutions
Global States and multi-threading

Example:

    var a = 0;
    thread1 = new Thread(function() { a = a + 1; });
    thread2 = new Thread(function() { a = a + 1; });
    thread1.start();
    thread2.start();

What is the value of a? This is called a (data) race condition.

Atomic actions
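This race is easy to reproduce in real Python, because `a = a + 1` is a read-modify-write rather than one atomic step. In the sketch below the `sleep` is artificial, added only to widen the window in which the threads can interleave:

```python
import threading
import time

a = 0

def unsafe_increment():
    global a
    # Read-modify-write is NOT atomic: the other thread can run
    # between the read and the write, losing an update.
    current = a
    time.sleep(0.05)
    a = current + 1

t1 = threading.Thread(target=unsafe_increment)
t2 = threading.Thread(target=unsafe_increment)
t1.start()
t2.start()
t1.join()
t2.join()
# a is usually 1 here (one update was lost), but may be 2:
# the result depends on the thread interleaving.
```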
Atomic operations
- An operation is said to be atomic if it appears to happen instantaneously
- Examples: read/write, swap, fetch-and-add...
- test-and-set: set a value and return the old one
- To implement a lock: while (test-and-set(lock, 1) == 1) {}
- To unlock: lock = 0

Mutex
- Mutex is short for mutual exclusion
- A technique to prevent two threads from accessing a shared resource at the same time

Example:

    var a = 0;
    var m = new Mutex();
    thread1 = new Thread(function() {
      m.lock();
      a = a + 1;
      m.unlock();
    });
    thread2 = new Thread(function() {
      m.lock();
      a = a + 1;
      m.unlock();
    });
    thread1.start();
    thread2.start();

Now a = 2.
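The same counter can be protected with Python's `threading.Lock`, its mutex type. This is a sketch of the pattern above, keeping the artificial delay from the race demonstration to show that no update is lost once the critical section is locked:

```python
import threading
import time

a = 0
m = threading.Lock()   # Python's mutex

def safe_increment():
    global a
    with m:                # equivalent to m.lock() ... m.unlock()
        current = a
        time.sleep(0.05)   # even with a forced delay, no update is lost
        a = current + 1

t1 = threading.Thread(target=safe_increment)
t2 = threading.Thread(target=safe_increment)
t1.start()
t2.start()
t1.join()
t2.join()
# Now a == 2, deterministically.
```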
Dependency

Example:

    var a = 1;
    var m = new Mutex();
    thread1 = new Thread(function() {
      m.lock();
      a = a + 1;
      m.unlock();
    });
    thread2 = new Thread(function() {
      m.lock();
      a = a * 3;
      m.unlock();
    });
    thread1.start();
    thread2.start();

What is the value of a? 4 or 6?

Condition variable
- A condition variable is a set of threads waiting for a certain condition

Example:

    var a = 1;
    var m = new Mutex();
    var cv = new ConditionVariable();
    thread1 = new Thread(function() {
      m.lock();
      a = a + 1;
      cv.notify();
      m.unlock();
    });
    thread2 = new Thread(function() {
      cv.wait();
      m.lock();
      a = a * 3;
      m.unlock();
    });
    thread1.start();
    thread2.start();

Now a = 6.
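The same scheme in runnable Python with `threading.Condition`. One detail this sketch adds: a `notify()` delivered before anyone waits is lost, so the waiter re-checks a flag in a loop (the `incremented` flag is our own addition, not from the slide):

```python
import threading

a = 1
cv = threading.Condition()   # bundles a mutex with a set of waiting threads
incremented = False

def incrementer():
    global a, incremented
    with cv:
        a = a + 1
        incremented = True
        cv.notify()          # wake a waiting thread

def tripler():
    global a
    with cv:
        while not incremented:   # re-check: a notify sent before wait() is lost
            cv.wait()            # releases the lock while waiting
        a = a * 3

t2 = threading.Thread(target=tripler)
t1 = threading.Thread(target=incrementer)
t2.start()
t1.start()
t1.join()
t2.join()
# The wait/notify ordering forces a = (1 + 1) * 3 = 6.
```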
Deadlock

What might happen:

    var a = 0;
    var b = 2;
    var ma = new Mutex();
    var mb = new Mutex();
    thread1 = new Thread(function() {
      ma.lock();
      mb.lock();
      b = b - 1;
      a = a - 1;
      ma.unlock();
      mb.unlock();
    });
    thread2 = new Thread(function() {
      mb.lock();
      ma.lock();
      b = b - 1;
      a = a + b;
      mb.unlock();
      ma.unlock();
    });
    thread1.start();
    thread2.start();

thread1 waits for mb, thread2 waits for ma.

Advantages of atomic actions
- Very efficient
- Less overhead, faster than message passing
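The deadlock above (thread1 holds ma and waits for mb, thread2 the reverse) has a standard cure: a global lock order. If every thread acquires `ma` before `mb`, the circular wait can never form. A Python sketch of the corrected program:

```python
import threading

a = 0
b = 2
ma = threading.Lock()
mb = threading.Lock()

def worker1():
    global a, b
    with ma:        # both workers acquire ma first, then mb:
        with mb:    # one global lock order makes a wait cycle impossible
            b = b - 1
            a = a - 1

def worker2():
    global a, b
    with ma:        # same order as worker1 (not mb then ma)
        with mb:
            b = b - 1
            a = a + b

t1 = threading.Thread(target=worker1)
t2 = threading.Thread(target=worker2)
t1.start()
t2.start()
t1.join()
t2.join()
# The program always terminates; b is always 0, while a still
# depends on which worker ran first.
```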
Disadvantages of atomic actions
- Blocking: some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread
- A common source of programming errors

Language and Interpreter Design Considerations
Common mistakes
- Forgetting to unlock a mutex
- Race conditions
- Deadlocks
- Granularity issues: too much locking will kill the performance

Forget to unlock a mutex

Most programming languages have either:
- A guard object that unlocks a mutex upon destruction
- A synchronization statement:

    some_rlock = threading.RLock()
    with some_rlock:
        print("some_rlock is locked while this executes")
Race condition
- Can we detect potential race conditions during compilation?
- In the Rust programming language:
  - Objects are owned by a specific thread
  - Types can be marked with the Send trait, to indicate that the object can be moved between threads
  - Types can be marked with the Sync trait, to indicate that the object can be accessed by multiple threads safely

Safe Shared Mutable State in rust (1/3)

    let mut data = vec![1, 2, 3];
    for i in 0..3 {
        thread::spawn(move || {
            data[i] += 1;
        });
    }

Gives an error: "capture of moved value: `data`"
Safe Shared Mutable State in rust (2/3)

    let mut data = Arc::new(vec![1, 2, 3]);
    for i in 0..3 {
        let data = data.clone();
        thread::spawn(move || {
            data[i] += 1;
        });
    }

Arc adds reference counting and is movable and syncable.
Gives an error: cannot borrow immutable borrowed content as mutable

Safe Shared Mutable State in rust (3/3)

    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    for i in 0..3 {
        let data = data.clone();
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            data[i] += 1;
        });
    }

It now compiles and it works.
Single Instruction, Multiple Threads Programming

With SIMT, the same instruction is executed by multiple threads on different registers.
Single instruction, multiple flow paths (1/2)

Using a masking system, it is possible to support if/else blocks:
- Threads always execute the instructions of both parts of the if/else blocks

    data = [-2, 0, 1, -1, 2], data2 = [...]
    function f(thread_id, data, data2)
    {
      if(data[thread_id] < 0)
      {
        data[thread_id] = data[thread_id] - data2[thread_id];
      } else if(data[thread_id] > 0)
      {
        data[thread_id] = data[thread_id] + data2[thread_id];
      }
    }

Single instruction, multiple flow paths (2/2)

Benefits:
- Multiple flows are needed in many algorithms

Drawbacks:
- Only one flow path is executed at a time; non-running threads must wait
- Randomized memory access: elements of a vector are not accessed sequentially
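The masking mechanism can be simulated in plain Python: every lane evaluates both branches, and a per-lane mask decides which result is kept, mirroring how an SIMT machine keeps all lanes in lock-step. The contents of `data2` are made up here, since the slide elides them:

```python
data = [-2, 0, 1, -1, 2]
data2 = [10, 10, 10, 10, 10]   # made-up values for illustration

# Masks recording which "threads" (lanes) take which branch
is_neg = [x < 0 for x in data]
is_pos = [x > 0 for x in data]

# Every lane computes BOTH branch results, as on SIMT hardware...
minus = [x - y for x, y in zip(data, data2)]
plus = [x + y for x, y in zip(data, data2)]

# ...and the mask selects, per lane, which result is written back.
result = [m if n else (p if q else x)
          for x, n, q, m, p in zip(data, is_neg, is_pos, minus, plus)]
```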
Programming Language Design for SIMT
- OpenCL and CUDA are the most common: very low-level, C/C++ derivatives
- General-purpose programming languages are not suitable
- Some work has been done to write code in Python and run it on a GPU with CUDA:

    @jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
    def add_matrix(A, B, C):
        A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]

  with limitations on the standard functions that can be called

Distributed programming
Distributed Programming (1/4)

In distributed computing, several computations are executed at the same time and communicate through message passing.

[Diagram: several computation units, each with its own memory, connected by a network]

Distributed programming (2/4)

A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers
- Information can be restricted to certain computers
- Redundancy and geographic diversity improve reliability
Distributed programming (3/4)

Characteristics of distributed computing:
- Computers are independent — they do not share memory
- Coordination is enabled by messages passed across a network
- Individual programs have differentiating roles

Distributed programming (4/4)

Distributed computing for large-scale data processing:
- Databases respond to queries over a network
- Data sets can be partitioned across multiple machines
Message Passing
- Messages are (usually) passed through sockets
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer
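A minimal sketch of synchronous message passing through sockets, using `socket.socketpair()` so it runs in a single process; the two threads stand in for two communicating programs:

```python
import socket
import threading

# A connected pair of sockets stands in for a network connection.
server_sock, client_sock = socket.socketpair()

def server():
    msg = server_sock.recv(1024)          # blocking (synchronous) receive
    server_sock.sendall(b"got: " + msg)   # reply to the sender

t = threading.Thread(target=server)
t.start()
client_sock.sendall(b"hello")             # send a message
reply = client_sock.recv(1024)            # wait for the reply
t.join()
server_sock.close()
client_sock.close()
```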
Python's Global Interpreter Lock (1/2)

CPython can only interpret one single thread at a given time:
- Single-threaded execution is faster (no need to lock in memory management)
- Many C libraries used as extensions are not thread safe

To eliminate the GIL, Python developers have the following requirements:
- Simplicity
- Backward compatibility
- Prompt and ordered destruction
- It must actually improve performance

Python's Global Interpreter Lock (2/2)

The lock is released when:
- The current thread is blocking for I/O
- Every 100 interpreter ticks

True multithreading is not possible with CPython.
Python's Multiprocessing module

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It implements transparent message passing, allowing Python objects to be exchanged between processes.

Python's Message Passing (1/2)

Example of message passing:

    from multiprocessing import Process

    def f(name):
        print('hello', name)

    if __name__ == '__main__':
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()

Output: hello bob
Python's Message Passing (2/2)

Example of message passing with pipes:

    from multiprocessing import Process, Pipe

    def f(conn):
        conn.send([42, None, 'hello'])
        conn.close()

    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p = Process(target=f, args=(child_conn,))
        p.start()
        print(parent_conn.recv())
        p.join()

Output: [42, None, 'hello']

Serialization

A serialized object is an object represented as a sequence of bytes that includes the object's data, its type and the types of data stored in the object. Transparent message passing is possible thanks to serialization.
pickle

In Python, serialization is done with the pickle module:
- It can serialize user-defined classes
  - The class definition must be available before deserialization
- It works with different versions of Python
- By default, it uses an ASCII format
- It can serialize:
  - Basic types: booleans, numbers, strings
  - Containers: tuples, lists, sets and dictionaries (of picklable objects)
  - Top-level functions and classes (only the name)
  - Objects whose __dict__ or __getstate__() are picklable

Example:

    pickle.loads(pickle.dumps(10))

Shared memory

Memory can be shared between Python processes with a Value or an Array:

    from multiprocessing import Process, Value, Array

    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))
        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()
        print(num.value)
        print(arr[:])

And of course, you would need to use a mutex to avoid race conditions.
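To make the pickle rules above concrete, here is a round-trip with a user-defined class (the `Point` class is a made-up example):

```python
import pickle

class Point:
    # The class definition must be available (importable) on the
    # receiving side before deserialization.
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
data = pickle.dumps(p)    # serialize: object -> bytes
q = pickle.loads(data)    # deserialize: bytes -> an equivalent object
```

Note that `loads` builds a new, equal object; it is not the same object that was dumped.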
MapReduce

Big Data Processing (1/2)

MapReduce is a framework for batch processing of big data.
- Framework: a system used by programmers to build applications
- Batch processing: all the data is available at the outset, and results are not used until processing completes
- Big data: used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis
Big Data Processing (2/2)

The MapReduce idea:
- Data sets are too big to be analyzed by one machine
- Using multiple machines has the same complications, regardless of the application/analysis
- Pure functions enable an abstraction barrier between data processing logic and the coordination of a distributed application

MapReduce Evaluation Model (1/2)

Map phase: apply a mapper function to all inputs, emitting intermediate key-value pairs.
- The mapper takes an iterable value containing inputs, such as lines of text
- The mapper yields zero or more key-value pairs for each input
MapReduce Evaluation Model (2/2)

Reduce phase: for each intermediate key, apply a reducer function to accumulate all values associated with that key.
- The reducer takes an iterable value containing intermediate key-value pairs
- All pairs with the same key appear consecutively
- The reducer yields zero or more values, each associated with that intermediate key

MapReduce Execution Model (1/2)
MapReduce Execution Model (2/2)

The keys are shuffled and assigned to reducers.

MapReduce example

From a database of 1.1 billion people (Facebook?), we want to know the average number of friends per age.

In SQL:

    SELECT age, AVG(friends) FROM users GROUP BY age

In MapReduce, the total set of users is split into different users_set:

    function map(users_set) {
      for(user in users_set) {
        send(user.age, user.friends.size);
      }
    }
    function reduce(age, friends) {
      var r = 0;
      for(friend in friends) {
        r += friend;
      }
      send(age, r / friends.size);
    }
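The same job can be sketched as a single-process Python program (with made-up user data); the three stages correspond directly to the execution model above: map emits key-value pairs, the shuffle groups them by key, and reduce averages each group:

```python
from collections import defaultdict

# Made-up sample data: (age, number_of_friends) for each user
users = [(20, 150), (20, 250), (25, 300), (25, 100), (30, 50)]

def mapper(users_set):
    # Map phase: emit one (age, friend_count) pair per user
    for age, friends in users_set:
        yield age, friends

def reducer(age, counts):
    # Reduce phase: average all values that share the same key
    return age, sum(counts) / len(counts)

# Shuffle: group intermediate pairs by key, as the framework would
groups = defaultdict(list)
for key, value in mapper(users):
    groups[key].append(value)

result = dict(reducer(age, counts) for age, counts in groups.items())
```

In a real framework the groups would be partitioned across machines, but the mapper and reducer would be written the same way.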
MapReduce Assumptions

Constraints on the mapper and reducer:
- The mapper must be equivalent to applying a deterministic pure function to each input independently
- The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key

Benefits of functional programming:
- When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
- Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program

In MapReduce, these functional programming ideas allow:
- Consistent results, however the computation is partitioned
- Re-computation and caching of results, as needed

MapReduce Benefits
- Fault tolerance: a machine or hard drive might crash; the MapReduce framework automatically re-runs failed tasks
- Speed: some machine might be slow because it is overloaded; the framework can run multiple copies of a task and keep the result of the one that finishes first
- Network locality: data transfer is expensive; the framework tries to schedule map tasks on the machines that hold the data to be processed
- Monitoring: will my job finish before dinner?! The framework provides a web-based interface describing jobs
Summary
- Parallel programming
- Multithreading, and how to help reduce programmer error
- Distributed programming and MapReduce