Shared Memory Programming

PCOPP2002 PCOPP-2002 Day – 2 Classroom Lecture SharedShared MemoryMemory Programming:Programming: AnAn IntroductionIntroduction toto PthreadsPthreads June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 1 PCOPP2002 Shared Memory Programming – Pthreads Lecture Outline Following Topics will be discussed What is Thread model Designing Threaded Programs Examples of Threaded Programs Understanding Pthreads implementation Pthread functions for Synchronization, Synchronization tools, Debugging tools Pthread - Performance issues Pthread - Performance issues using Threads and Processes June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 2 PCOPP2002 Shared Memory Programming (Contd…) Parallel programming based on shared-memory model has not progressed as much as message-passing model Why? Lack of a widely accepted standard such as MPI or PVM. Shared memory programs are written in a platform-specific language for multiprocessors (mostly SMP’s) such programs are not portable. Now idea on MPPs/PVPS/Clusters. Platform independent shared memory programming models : X3H5, Pthreads, and Open MP. The SGI power C uses a small set of structured constructs to extend C to a shared memory parallel language. The X3H5 standard has not gained wide acceptance, but has influenced the design of several commercial shared June03- 06,2002 C-DAC, Pune memory languages. Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 3 PCOPP2002 ExplicitExplicit ParallelismParallelism :: SharedShared VariableVariable ModelModel It has a single address space and data resides in single shared address space, thus does not have to be explicitly allocated It is multithreading and asynchronous (Similar to message- passing model) Workload can be either explicitly or implicitly allocated Communication is done implicitly through shared reads and writes of variables. However synchronization is explicit Shared-variable model is not easier than the message passing model. Shared variable programs could incur higher interaction overhead and run more slowly than a message passing one on June03- 06,2002 C-DAC, Pune a cluster an MPP or even a SMP Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 4 PCOPP2002 Why Use Threads Over Processes? Creating a new process can be expensive. More resources and required. If the process creation triggers process rescheduling activity, the operating system’s context-switching mechanism will become involved. It takes memory (The entire process must be replicated.) The cost of inter-process communication and synchronization of shared data, which also may involve calls into the operating system kernel and is quite complicated compared to MPI etc. June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 5 PCOPP2002 Why Use Threads Over Processes? Contd.. When processes synchronize, they usually have to issue system calls, a relatively expensive operation that involves trapping into the kernel. Threads can be created without replicating an entire process. Some work (not all) of the creating thread can be done in user space rather than kernel space. Threads can synchronize by simply monitoring a variable - staying within the user address space of the program June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 6 PCOPP2002 Performance issues-Between using Threads /processes Threads and processes are alike in many respects Creation : Processes are more expensive to create, and once created, they use more resources than threads to intercommunicate Overhead : Using processes results in more overhead than using threads Synchronization : The synchronization mechanisms used by the multi-threaded server are more efficient than those used by two multiprocess server. Where the multi-threaded server uses mutex locks to control access to shared data, the mulitprocess server uses System V June03- 06,2002 C-DAC, Pune semaphores Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 7 PCOPP2002 Performance issues-Between using Threads /processes Contention :When there is little contention among threads for account data, the multithreaded server operates more efficiently because the Pthreads mutex-locking calls operate within user space. Multiprocess server’s semaphore locking calls are system calls and involve the operating system’s kernel. Cost : The multi-threaded server outperforms the multiprocess server, regardless of the number of clients. The difference between the multithreaded and multiprocess servers is in the relative costs of creating threads versus creating processes Sharing Data : Whereas threads exchange data by simply placing it in global variables in their process’s address space, processes must use pipes or special shared memory segments June03- 06,2002 C-DAC, Pune controlled by the operating system Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 8 PCOPP2002 Shared Memory Programming : Threads Threads are usually the preferred way to parallelize codes on an SMP All threads share a common address space, so communication and synchronization are much faster than possible with either explicit or implicit distributed shared memory (DSM) Because all threads are part of the same process, co-coordinating access to resources is very easy and is usually automatic. Example All threads use the same file table, so sharing a file and keeping coherency with respect to file position and synchronization I/O is automatic June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 9 PCOPP2002 Shared Memory Programming : Threads (Contd…) Threads may communicate to share work, synchronize or do other tasks Because all of the threads are in same process, and will therefore have a common address space, it is fast and easy for them to communicate with each other through global variables Critical Region Simplest mechanism for synchronization is called critical region A critical region is a block of code which at most one thread can execute at a time Critical regions can be created with mutual exclusion locks. There are four interesting mutex system calls June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 10 PCOPP2002 Shared Memory Programming : Threads (Contd…) An example of critical region is in a module where threads try to retrieve messages In general you do not want several threads trying to retrieve a given message at the same time so you would put a critical region around code to retrieve a message : • Enter critical region • Get message from message queue • Update pointers into message queue • Leave critical region (allow other threads to enter) Remark :If two threads simultaneously try to retrieve a message, one will be allowed to retrieve the message and the other will June03- 06,2002 C-DAC, Pune block at the point where the critical region is entered Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 11 PCOPP2002 Shared Memory Programming : Threads Threads are usually the preferred way to parallelize codes on an SMP. Because all threads are part of the same process, coordinating access to resources is very easy and is usually automatic. All threads share a common address space, so communication and synchronization are much faster than possible with either explicit or implicit distributed shared memory (DSM) Benefits: Major benefit of multi threaded programs over non threaded ones is in their ability to concurrently execute tasks. In providing concurrency, multithreaded programs introduce a certain amount of overhead. June03- 06,2002 C-DAC, Pune Introducing threads in an application that can’t use concurrency, you’ll add overhead without any performance benefit. Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 12 PCOPP2002 Threads Parallel Programming A process on a multithreaded system is more complex than a process on other systems Processes from other systems typically own everything associated with the execution: address space file description working directory Priority registers and everything else A multithreaded process may have many threads of central running concurrently It is difficult to have only one copy of certain resources. June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 13 PCOPP2002 Threads Parallel Programming A process is divided into three parts The highest level, called the process – contains global information that is unique within the process or must be know by all members of the process. A process may consist of one or many light weight processes (LWPs) each LWP may host one or many threads Each part of the process includes and manages part of the information and resources of which the whole process is composed. Users only need to be concerned with the structure of a process is they, are using threaded parallelism. June03- 06,2002 C-DAC, Pune Shared Memory Programming – An Introduction to Pthreads Copyright C-DAC, 2002 14 PCOPP2002 Parallel programming with threads Resources with process level scope include Address space File description Working directory Any resource that is necessarily process wide in space Resources with LWP scope include

Load more