Lecture 7: Parallelism

Fall 2018 Jason Tang

Slides based upon Operating System Concepts slides, http://codex.cs.yale.edu/avi/os-book/OS9/slide-dir/index.html Copyright Silberschatz, Galvin, and Gagne, 2013

Topics

• Overview of Threading

• Multithreading Models

• Pthreads

• Implicit Threading

Why Threading?

• Modern programs, including kernels, are usually multithreaded

• Example: Update display, while

• Handling keystrokes, while

• Sending network messages

• Process creation is heavy-weight; thread creation is light-weight

• Can simplify code and increase efficiency

Single and Multithreaded Processes

Multithreading Benefits

• Responsiveness - continue processing while waiting for other I/O, especially in GUIs

• Resource Sharing - because threads share the process’s resources, they may be more efficient than having different processes communicate via messages

• Economy - thread switching usually faster than process context switching

• Scalability - a process’s threads can run in parallel across multiple processors

Concurrency vs. Parallelism

• Concurrency - Multiple processes making progress

• If only one CPU, no true parallelism; OS rapidly switches between processes to give illusion of parallelism

• Parallelism - Multiple processes simultaneously making progress

• Modern computers have multiple cores, with true parallelism

• The challenge is writing programs that exploit this parallelism, which is a very hard problem

Concurrency vs. Parallelism

• Concurrent execution on a single-core system:

• Concurrency and parallelism on a multicore system:

Types of Parallelism

• Data Parallelism - distributes subsets of the same data across multiple cores; the same operation runs on each core (see the sketch after this list)

• Examples: ray tracer, search engine, Bitcoin miner

• Task Parallelism - distributes threads across cores, with each thread performing unique operations

• Examples: web server, database server

• In modern software, programs tend to have both data and task parallelism
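As a concrete illustration of data parallelism, here is a minimal sketch (all names are hypothetical, and it uses the Pthreads API introduced later in this lecture): each thread applies the same summing operation to its own slice of an array.

#include <stdio.h>
#include <pthread.h>

#define N 1000000
#define NUM_THREADS 4

static int data[N];
static long partial[NUM_THREADS];   /* one result slot per thread, no sharing */

static void *sum_slice(void *arg) {
    long id = (long) arg;
    long start = id * (N / NUM_THREADS);
    long end = start + (N / NUM_THREADS);
    long s = 0;
    for (long i = start; i < end; i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t tids[NUM_THREADS];
    long total = 0;

    for (long i = 0; i < N; i++)
        data[i] = 1;

    /* same operation (summing) applied to a different subset of the data per thread */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&tids[i], NULL, sum_slice, (void *) i);
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < NUM_THREADS; i++)
        total += partial[i];
    printf("Total is %ld\n", total);
    return 0;
}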

Amdahl’s Law

• Programs have a mix of serial (non-parallel) and parallel components

• Let S = fraction of the program that must be serial, and N = number of cores; then the maximum speedup is 1 / (S + (1 - S)/N)

• Example: if process foo is 75% parallel and 25% serial, then by going from 1 to 2 cores, the expected speedup is 1 / (0.25 + 0.75/2) = 1.6x

• As N approaches infinity, speedup approaches 1/S (the sketch below evaluates this limit)
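To see where these numbers come from, this minimal sketch (not from the slides) evaluates Amdahl’s formula for the 25%-serial example at several core counts; the printed speedups climb toward, but never exceed, 1/S = 4x.

#include <stdio.h>

/* Amdahl's Law: speedup(N) = 1 / (S + (1 - S) / N) */
static double speedup(double S, int N) {
    return 1.0 / (S + (1.0 - S) / N);
}

int main(void) {
    double S = 0.25;                       /* 25% of foo must run serially */
    int cores[] = { 1, 2, 4, 16, 1024 };

    for (int i = 0; i < 5; i++)
        printf("N = %4d  ->  speedup = %.2fx\n", cores[i], speedup(S, cores[i]));
    /* as N grows, speedup approaches 1/S = 4x */
    return 0;
}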

User Threads and Kernel Threads

• User Threads - threading handled entirely by user-level threading library; kernel is unaware of any threading

• Examples: Java “green” threads, Tcl

• Kernel Threads - kernel supported threading

• Supported by nearly all modern OSes (Windows, macOS, Unix, …)

Many-to-One Multithreading Model

• Many user-level threads mapped to single kernel thread

• If one thread blocks, entire process blocks

• Only model possible if no kernel support

• Threading library handles scheduling

• Faster to switch threads (no kernel context switch needed)

• Unable to take advantage of multiple processors

One-to-One Multithreading Model

• Each thread created by user maps to a kernel thread

• OS schedules each thread independently

• Each thread could run on separate processor

• OS may limit number of threads per process

• Used by Windows, Linux, etc.

Many-to-Many (Hybrid) Multithreading Model

• Many user-level threads map to many kernel threads

• Threading library handles scheduling

• OS creates kernel threads based upon number of processors available

• Much more complex to implement

• Available in Windows 7, older versions of FreeBSD, and in experimental OSes

Thread Libraries

• Thread Library - provides an API for creating and managing threads

• Threads could be an entirely user-space construct

• Or library invokes system calls to request OS to create threads

• POSIX Threads (Pthreads) - standard API for threading, used by Linux, macOS, and other Unix-like systems

About Pthreads

• Pthreads library specifies thread creation, management, and synchronization (next lecture)

• Add #include <pthread.h> to programs

• On Linux, compile and link with -pthread flag

gcc -o foo foo.c -pthread

• See man 7 pthreads for an overview

• On Ubuntu, make sure you have installed the manpages-dev package (sudo apt-get install manpages-dev)

Thread Creation

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg)

• Use to create a thread

• First parameter is address to store new thread’s unique ID

• Second parameter gives thread settings, or set to NULL for default settings

• Third parameter gives starting address for thread

• Fourth parameter is for arbitrary data to be passed into thread

• Return value is 0 on success, or a positive error number on failure (checked in the sketch below)
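For example, a caller might check that return value as follows (a minimal sketch; foo_func is a hypothetical start routine):

#include <stdio.h>
#include <string.h>
#include <pthread.h>

static void *foo_func(void *arg) { return NULL; }   /* hypothetical start routine */

int main(void) {
    pthread_t tid;
    int err = pthread_create(&tid, NULL, foo_func, NULL);
    if (err != 0) {
        /* pthread functions return an error number instead of setting errno */
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return 1;
    }
    pthread_join(tid, NULL);
    return 0;
}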

Starting Routine

void *thread_func(void *arg) { … }

• Function where newly created thread will start running

• Fourth parameter of pthread_create() passed as this function’s parameter

• When this function returns, thread will terminate

• OS stores function’s return value until later retrieved

• In Pthreads, the new thread will begin in the ready state (assuming thread creation was successful)

• In Windows, two-step process (create thread, and then start running it)

Joining Threads

int pthread_join(pthread_t thread, void **retval);

• Just as with processes, Linux stores a thread’s return status, until something later reaps the thread

• In Linux, threads and processes are implemented uniformly as tasks

• This function blocks until target thread terminates

• First parameter is target thread to join

• Second parameter gives location to store the thread’s return status, or NULL to ignore it (see the sketch below)
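A minimal sketch (not from the slides) of retrieving a thread’s return value through that second parameter:

#include <stdio.h>
#include <pthread.h>

static int answer = 42;

static void *worker(void *arg) {
    /* returning a pointer here is equivalent to calling pthread_exit(&answer) */
    return &answer;
}

int main(void) {
    pthread_t tid;
    void *retval;

    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, &retval);          /* blocks until worker terminates */
    printf("Thread returned %d\n", *(int *) retval);
    return 0;
}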

Example Threading Code #1

#include <stdio.h>
#include <pthread.h>

static void *example1(void *args) {
    printf("In thread\n");
    return NULL;
}

int main(void) {
    pthread_t tid;

    pthread_create(&tid, NULL, example1, NULL);
    printf("In main thread\n");
    pthread_join(tid, NULL);
    printf("After join\n");
    return 0;
}

Example Threading Code #2

#include <stdio.h>
#include <pthread.h>

static void *example2(void *args) {
    int my_ID = *(int *)(args);
    printf("I am thread %d\n", my_ID);
    return NULL;
}

int main(void) {
    int i, data[4];
    pthread_t tids[4];

    for (i = 0; i < 4; i++) {
        data[i] = i;
        pthread_create(tids + i, NULL, example2, data + i);
    }
    for (i = 0; i < 4; i++) {
        pthread_join(tids[i], NULL);
    }
    printf("After join\n");
    return 0;
}

Example Threading Code #3

#include <stdio.h>
#include <pthread.h>

static void *example2(void *args) {
    int my_ID = *(int *)(args);
    printf("I am thread %d\n", my_ID);
    return NULL;
}

int main(void) {
    int i, data[4];
    pthread_t tids[4];

    for (i = 0; i < 4; i++) {
        /* unlike example #2, every thread receives the address of the same
           variable i, so the IDs printed depend upon timing (a data race) */
        pthread_create(tids + i, NULL, example2, &i);
    }
    for (i = 0; i < 4; i++) {
        pthread_join(tids[i], NULL);
    }
    printf("After join\n");
    return 0;
}

Implicit Threading

• Creation and management of threads automatically done by compilers and run-time libraries, instead of by the programmer

• Examples:

• Thread pools

• OpenMP

• Grand Central Dispatch

Thread Pools

• Pre-create many threads that wait in a pool for work (a minimal sketch follows this list)

• Usually faster to service request when thread already exists, rather than creating a new thread each time

• Number of threads can be customized based upon application workload, amount of RAM, number of CPUs

• Starting with Windows Vista, a process can push work into a thread pool

• Within Linux kernel code, lower-priority tasks run within workqueues
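The sketch below is a deliberately minimal, hypothetical illustration of the idea, not a production pool: a fixed set of workers is created once, and each worker repeatedly claims the next task index from a shared atomic counter until the work runs out. A real pool would instead keep idle workers blocked waiting for newly arriving requests, which requires the synchronization primitives covered in the next lecture.

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_WORKERS 4
#define NUM_TASKS   16

static atomic_int next_task = 0;      /* index of the next unclaimed task */

static void handle_task(int task) {
    printf("handling task %d\n", task);
}

/* each pre-created worker loops, grabbing tasks until none remain */
static void *worker(void *arg) {
    for (;;) {
        int task = atomic_fetch_add(&next_task, 1);
        if (task >= NUM_TASKS)
            return NULL;
        handle_task(task);
    }
}

int main(void) {
    pthread_t pool[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++)      /* create the pool once, up front */
        pthread_create(&pool[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(pool[i], NULL);
    return 0;
}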

OpenMP

• Compiler directives that will automatically parallelize loops

• Creates threads when compiler detects parallel regions

• Available for C, C++, Fortran, running on modern operating systems

• Used in scientific computing

#include <stdio.h>
#include <math.h>

int main(void) {
    int i;
    double vals[1000000], sum = 0;

#pragma omp parallel for
    for (i = 0; i < 1000000; i++) {
        vals[i] = sqrt(i) + logf(i + 1);
    }

    /* this loop cannot be optimized */
    for (i = 0; i < 1000000; i++) {
        sum += vals[i];
    }
    printf("Sum is %f\n", sum);
    return 0;
}
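On Linux, a program like this is typically compiled with GCC’s OpenMP flag and linked against the math library (the file name omp_example.c here is just a placeholder):

gcc -o omp_example omp_example.c -fopenmp -lm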

Grand Central Dispatch

• Designed by Apple for macOS and iOS, but has been ported to other OSes

• Programmer designates parts of their program that can be run in parallel

• GCD automatically dispatches that code into thread(s)

• GCD abstracts thread creation and joining

• OS determines how many threads to create based upon system load (see the sketch below)
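A minimal, hypothetical sketch of that style (on macOS this compiles with just clang foo.c, since blocks and libdispatch are built in):

#include <stdio.h>
#include <dispatch/dispatch.h>

int main(void) {
    /* a concurrent queue managed by GCD; GCD decides how many threads back it */
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    for (long i = 0; i < 4; i++) {
        /* the ^{ ... } block is the unit of work handed to GCD */
        dispatch_group_async(group, queue, ^{
            printf("block %ld running\n", i);
        });
    }

    /* wait for every block in the group to finish (similar to joining) */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    return 0;
}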
