Programming with Shared Memory PART I


HPC Fall 2012 (2/23/17), Prof. Robert van Engelen

Overview

- Shared memory machines
- Programming strategies for shared memory machines
- Allocating shared data for IPC
- Processes and threads
- MT-safety issues
- Coordinated access to shared data
  - Locks
  - Semaphores
  - Condition variables
  - Barriers
- Further reading

Shared Memory Machines

- Single address space
- Shared memory
  - Single bus (UMA): a shared memory UMA machine with a single bus
  - Interconnect with memory banks (NUMA): a shared memory NUMA machine with memory banks
  - Cross-bar switch: a shared memory multiprocessor with a cross-bar switch
- Distributed shared memory (DSM)
  - Logically shared, physically distributed

Programming Strategies for Shared Memory Machines

- Use a specialized programming language for parallel computing (for example: HPF, UPC)
- Use compiler directives to supplement a sequential program with parallel directives (for example: OpenMP)
- Use libraries (for example: ScaLAPACK, though ScaLAPACK is primarily designed for distributed memory)
- Use heavyweight processes and a shared memory API
- Use threads
- Use a parallelizing compiler to transform (part of) a sequential program into a parallel program

Heavyweight Processes

- The UNIX system call fork() creates a new process (fork-join of processes)
  - fork() returns 0 to the child process
  - fork() returns the process ID (pid) of the child to the parent
- The system call exit(n) joins the child process with its parent and passes the exit value n to it
- The parent executes wait(&n) to wait until one of its children joins, where n is set to the child's exit value
- System and user processes form a process tree

Fork-Join

An SPMD program:

```c
...
pid = fork();
if (pid == 0)
{ ... // code for child
  exit(0);
}
else
{ ... // parent code continues
  wait(&n); // join
}
... // parent code continues
...
```
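As a concrete illustration, the fork-join skeleton above can be turned into a small runnable sketch. The helper name fork_join_demo and the particular exit code are hypothetical choices for this example, not from the slides:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`,
   then join it with wait() and return the child's exit value. */
int fork_join_demo(int code)
{
    int status;
    pid_t pid = fork();
    if (pid == -1)
        return -1;            /* fork failed */
    if (pid == 0) {
        /* child branch: fork() returned 0 */
        exit(code);           /* join: passes exit value to the parent */
    }
    /* parent branch: fork() returned the child's pid */
    wait(&status);            /* wait until one of the children joins */
    return WEXITSTATUS(status);
}
```

The parent observes the child's exit value through wait(), which is exactly the join step of the skeleton above.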
After fork(), Process 2 executes a copy of the program, its data, and its file descriptors (operations by the two processes on open files are independent). Both processes run the same SPMD code: the child takes the pid == 0 branch and terminates at exit(0), while the parent continues past the if-else and joins the child at wait(&n).

Creating Shared Data for IPC

- Interprocess communication (IPC) via shared data
- Processes do not automatically share data
- Use files to share data
  - Slow, but portable
- Unix System V shmget()
  - Allocates shared pages between two or more processes
  - shmget() returns the shared memory identifier for a given key (the key is used for naming and locking)
  - shmat() attaches the segment identified by a shared memory identifier and returns the address of the memory segment
  - shmctl() with the IPC_RMID argument deletes the segment
- BSD Unix mmap()
  - Uses file-memory mapping to create shared data in memory
  - Based on the principle that files are shared between processes
  - mmap() returns the address of a mapped object described by the file descriptor returned by open()
  - munmap() deletes the mapping for a given address

shmget vs mmap

System V shared memory:

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

size_t len;       // size of data we want
void *buf;        // to point to shared data
int shmid;
key_t key = 9876; // or IPC_PRIVATE
shmid = shmget(key, len, IPC_CREAT|0666);
if (shmid == -1) ... // error
buf = shmat(shmid, NULL, 0);
if (buf == (void*)-1) ... // error
...
fork(); // parent and child use buf
...
wait(&n);
shmctl(shmid, IPC_RMID, NULL);
```

BSD mmap:

```c
#include <sys/types.h>
#include <sys/mman.h>

size_t len; // size of data we want
void *buf;  // to point to shared data
buf = mmap(NULL, len,
           PROT_READ|PROT_WRITE,
           MAP_SHARED|MAP_ANON,
           -1, // fd = -1 is unnamed
           0);
if (buf == MAP_FAILED) ... // error
...
fork(); // parent and child use buf
...
wait(&n);
munmap(buf, len);
```

Tip: use the ipcs command to display the IPC shared memory status of a system.

Threads

- Threads of control operate in the same memory space, sharing code and data
  - Data is implicitly shared
  - Consider data on a thread's stack private
- Many OS-specific thread APIs (Windows threads, Linux threads, Java threads, ...)
- POSIX-compliant Pthreads for thread creation and join:
  - pthread_create(): start a new thread
  - pthread_join(): wait for a child thread to join
  - pthread_exit(): stop the current thread

Detached Threads

- Detached threads do not join
- Use pthread_detach(thread_id)
- Detached threads are more efficient
- Make sure that all detached threads terminate before the program terminates

Process vs Threads

A process has its own address space; a process with two threads has two streams of control sharing that address space. What happens when we fork a process that executes multiple threads?
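To make the Pthreads primitives above concrete, here is a minimal create/join sketch. The worker name square_worker and the use of intptr_t to pass an integer through the void* thread interface are illustrative choices, not from the slides; compile with -pthread:

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: squares the integer passed in through the
   void* argument and returns the result the same way. */
static void *square_worker(void *arg)
{
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);     /* value retrieved later by pthread_join */
}

/* Start one thread with pthread_create() and wait for it to join. */
long run_square(long n)
{
    pthread_t tid;
    void *result;
    if (pthread_create(&tid, NULL, square_worker, (void *)(intptr_t)n) != 0)
        return -1;              /* could not start the thread */
    pthread_join(tid, &result); /* wait for the child thread to join */
    return (long)(intptr_t)result;
}
```

Because the threads share one address space, no shmget()/mmap() step is needed: any global or heap data is implicitly visible to both threads.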
Does fork duplicate only the calling thread, or all threads? (In POSIX, the child of fork() contains a copy of only the calling thread.)

Thread Pools

- Thread pooling (or process pooling) is an efficient mechanism
- One master thread dispatches jobs to worker threads
- Worker threads in the pool never terminate: they keep accepting new jobs when the old job is done
- Jobs are communicated to the workers via a shared job queue
- Best for irregular job loads

MT-Safety

- Routines must be multi-thread safe (MT-safe) when invoked by more than one thread
- Non-MT-safe routines must be placed in a critical section, e.g. using a mutex lock (see later)
- Many C library routines are not MT-safe
  - Use the libroutine_r() versions, which are "reentrant"
  - When building your own MT-safe library, use #define _REENTRANT
- Always make your routines MT-safe for reuse in a threaded application
- Use locks when necessary (see the next slides)

Use of a non-MT-safe routine:

```c
time_t clk = time(NULL);
char *txt = ctime(&clk);
printf("Current time: %s\n", txt);
```

Use of the reentrant version of ctime:

```c
time_t clk = time(NULL);
char txt[32];
ctime_r(&clk, txt);
printf("Current time: %s\n", txt);
```

Is this routine MT-safe? What can go wrong?

```c
static int counter = 0;
int count_events()
{ return counter++;
}
```
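The static counter above is suspect for the same reason the non-reentrant C library routines are: hidden static state shared by all callers. That contrast can be sketched with a hypothetical pair of routines in the style of ctime vs ctime_r; the names label and label_r are made up for this illustration:

```c
#include <stdio.h>

/* Non-reentrant: every call reuses one static buffer, so a later
   call clobbers the result of an earlier one, and two threads
   calling at once would race on `buf`. */
const char *label(int n)
{
    static char buf[32];
    snprintf(buf, sizeof(buf), "event-%d", n);
    return buf;
}

/* Reentrant variant in the style of the libc *_r routines: the
   caller supplies the buffer, so there is no hidden shared state. */
char *label_r(int n, char *buf, size_t len)
{
    snprintf(buf, len, "event-%d", n);
    return buf;
}
```

Even single-threaded code shows the hazard: a second call to label() silently changes what the pointer from the first call points at, while label_r() results are independent.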
Coordinated Access to Shared Data (Such as Job Queues)

- Reading and writing shared data by more than one thread or process requires coordination with locking
- Shared variables cannot safely be updated simultaneously by more than one thread

Without a lock:

```c
static int counter = 0;
int count_events()
{ return counter++;
}
```

With counter = 3, the increment compiles to a load, an add, and a store, which can interleave between two threads:

```
Thread 1                     Thread 2
reg1 = M[counter] = 3
reg2 = reg1 + 1 = 4          reg1 = M[counter] = 3
M[counter] = reg2 = 4        reg2 = reg1 + 1 = 4
return reg1 = 3              M[counter] = reg2 = 4
                             return reg1 = 3
```

Both threads return 3 and counter ends at 4 instead of 5: one increment is lost.

With a mutex lock:

```c
static int counter = 0;
int count_events()
{ int n;
  pthread_mutex_lock(&lock);
  n = counter++;
  pthread_mutex_unlock(&lock);
  return n;
}
```

Thread 1 acquires the lock and performs its load-add-store (counter: 3 to 4) while Thread 2 waits on the lock; after Thread 1 releases it, Thread 2 acquires the lock and updates counter from 4 to 5.

Spinlocks

- Spin locks use busy waiting until a condition is met
- Naïve implementations are almost always incorrect:

```c
// initially lock = 0
while (lock == 1)
  ;       // do nothing
lock = 1; // acquire lock
... critical section ...
lock = 0; // release lock
```

Two or more threads want to enter the critical section. What can go wrong?

This ordering works:

```
Thread 1                       Thread 2
while (lock == 1)
  ; // do nothing
lock = 1;                      while (lock == 1)
... critical section ...         ; // spins until lock = 0
lock = 0;
                               lock = 1;
                               ... critical section ...
                               lock = 0;
```

This statement interleaving leads to failure, because Thread 1's lock = 1 is not set in time before Thread 2's read:

```
Thread 1                       Thread 2
while (lock == 1)
  ; // do nothing              while (lock == 1)
lock = 1;                        ; // still reads lock == 0
... critical section ...       lock = 1;
lock = 0;                      ... critical section ...
                               lock = 0;
```

Both threads end up executing the critical section!
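A runnable version of the locked counter can be sketched as follows. The thread and iteration counts are arbitrary, and the mutex is given a concrete definition with PTHREAD_MUTEX_INITIALIZER since the slide leaves `lock` unspecified; compile with -pthread:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Locked counter: the load-add-store of counter++ cannot interleave
   with another thread while this thread holds the mutex. */
static int count_events(void)
{
    pthread_mutex_lock(&lock);
    int n = counter++;
    pthread_mutex_unlock(&lock);
    return n;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        count_events();
    return NULL;
}

/* Two threads each perform 100000 increments; with the lock,
   no increment is lost. */
int run_counter_demo(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Without the two mutex calls this demo would typically lose updates and return less than 200000, exactly the interleaving shown in the trace above.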
- Worse, the compiler may break even the "working" ordering: it may remove lock = 1 as a useless assignment, or move the assignment relative to the critical section, because the compiler optimizes the code under single-threaded assumptions!
- Atomic operations such as atomic "test-and-set" instructions must be used instead.
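A correct spinlock can be sketched with the C11 atomic test-and-set primitive from <stdatomic.h>: atomic_flag_test_and_set() atomically sets the flag and returns its previous value, so exactly one contending thread observes "previously clear" and wins the acquire. The two-thread demo harness around it is an illustrative choice; compile with -pthread:

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag flag = ATOMIC_FLAG_INIT;
static int shared = 0;

/* Busy-wait until the flag was previously clear: the winning call
   atomically leaves the flag set, excluding all other threads. */
static void spin_acquire(void)
{
    while (atomic_flag_test_and_set(&flag))
        ; /* do nothing: another thread holds the lock */
}

static void spin_release(void)
{
    atomic_flag_clear(&flag);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_acquire();
        shared++;            /* critical section */
        spin_release();
    }
    return NULL;
}

/* Two threads contend on the spinlock; unlike the naive version,
   no increment is lost and the compiler cannot optimize the
   atomic operations away. */
int run_spinlock_demo(void)
{
    pthread_t t1, t2;
    shared = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared;
}
```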
