Programming with Shared Memory PART I


Programming with Shared Memory PART I
HPC Fall 2012, Prof. Robert van Engelen

Overview
- Shared memory machines
- Programming strategies for shared memory machines
- Allocating shared data for IPC
- Processes and threads
- MT-safety issues
- Coordinated access to shared data
  - Locks
  - Semaphores
  - Condition variables
  - Barriers
- Further reading

Shared Memory Machines
- Single address space
- Shared memory
  - Single bus (UMA): a shared memory UMA machine with a single bus
  - Interconnect with memory banks (NUMA): a shared memory NUMA machine with memory banks
  - Cross-bar switch: a shared memory multiprocessor with a cross-bar switch
- Distributed shared memory (DSM)
  - Logically shared, physically distributed

Programming Strategies for Shared Memory Machines
- Use a specialized programming language for parallel computing, for example HPF or UPC
- Use compiler directives to supplement a sequential program with parallel directives, for example OpenMP
- Use libraries, for example ScaLAPACK (though ScaLAPACK is primarily designed for distributed memory)
- Use heavyweight processes and a shared memory API
- Use threads
- Use a parallelizing compiler to transform (part of) a sequential program into a parallel program

Heavyweight Processes
- The UNIX system call fork() creates a new process
  - fork() returns 0 to the child process
  - fork() returns the process ID (pid) of the child to the parent
- The system call exit(n) joins the child process with its parent and passes exit value n to it
- The parent executes wait(&n) to wait until one of its children joins, where n is set to the child's exit value
- System and user processes form a tree

Fork-join in an SPMD program:

    ...
    pid = fork();
    if (pid == 0)
    { ... // code for child
      exit(0);
    }
    else
    { ... // parent code continues
      wait(&n); // join
    }
    ... // parent code continues

After the fork there are two processes running copies of the same program: the child receives a copy of the program, the data, and the open file descriptors (operations by the processes on open files are independent). The child runs its branch and terminates at exit(0); the parent continues until wait(&n) completes the join.
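The skeleton above elides the real work with comments; as a concrete illustration, here is a minimal, self-contained fork-join program. The messages and the exit value 42 are ours, not from the slides. Note that wait() fills in a status word that encodes more than the exit value, so the exit value must be extracted with WEXITSTATUS rather than used directly.

    /* Minimal runnable fork-join sketch of the skeleton above.
       The messages and exit value 42 are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int n;
        pid_t pid = fork();
        if (pid == -1) {             /* fork failed */
            perror("fork");
            return 1;
        }
        if (pid == 0) {              /* child: fork() returned 0 */
            printf("child %d running\n", (int)getpid());
            exit(42);                /* join: pass exit value 42 to parent */
        } else {                     /* parent: fork() returned the child pid */
            wait(&n);                /* block until one child joins */
            if (WIFEXITED(n))
                printf("child exited with %d\n", WEXITSTATUS(n));
        }
        return 0;
    }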
Creating Shared Data for IPC
- Interprocess communication (IPC) via shared data
- Processes do not automatically share data
- Use files to share data
  - Slow, but portable
- Unix System V shmget()
  - Allocates shared pages between two or more processes
  - shmget() returns the shared memory identifier for a given key (the key is used for naming and locking)
  - shmat() attaches the segment identified by a shared memory identifier and returns the address of the memory segment
  - shmctl() with the IPC_RMID argument deletes the segment
- BSD Unix mmap()
  - Uses file-memory mapping to create shared data in memory
  - Based on the principle that files are shared between processes
  - mmap() returns the address of a mapped object described by the file descriptor returned by open()
  - munmap() deletes the mapping for a given address

shmget vs mmap:

    /* System V shared memory */
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    size_t len;           // size of data we want
    void *buf;            // to point to shared data
    int shmid;
    key_t key = 9876;     // or IPC_PRIVATE
    shmid = shmget(key, len, IPC_CREAT|0666);
    if (shmid == -1) ...  // error
    buf = shmat(shmid, NULL, 0);
    if (buf == (void*)-1) ...  // error
    ...
    fork();               // parent and child use buf
    ...
    wait(&n);
    shmctl(shmid, IPC_RMID, NULL);

    /* BSD mmap */
    #include <sys/types.h>
    #include <sys/mman.h>

    size_t len;           // size of data we want
    void *buf;            // to point to shared data
    buf = mmap(NULL, len,
               PROT_READ|PROT_WRITE,
               MAP_SHARED|MAP_ANON,
               -1,        // fd=-1 is unnamed
               0);
    if (buf == MAP_FAILED) ...  // error
    ...
    fork();               // parent and child use buf
    ...
    wait(&n);
    munmap(buf, len);

Tip: use the ipcs command to display the IPC shared memory status of a system.

Threads
- Threads of control operate in the same memory space, sharing code and data
  - Data is implicitly shared
  - Consider data on a thread's stack private
- Many OS-specific thread APIs: Windows threads, Linux threads, Java threads, ...
- POSIX-compliant Pthreads for thread creation and join:
  - pthread_create(): start a new thread
  - pthread_join(): wait for a child thread to join
  - pthread_exit(): stop the calling thread

Detached Threads
- Detached threads do not join
- Use pthread_detach(thread_id)
- Detached threads are more efficient
- Make sure that all detached threads terminate before the program terminates
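The deck lists the Pthreads calls without showing them in context; the following is a minimal create/join sketch. The worker function, its argument, and the messages are illustrative assumptions (compile with cc -pthread).

    /* Minimal Pthreads create/join sketch; worker() and its argument
       are illustrative. Compile with: cc -pthread */
    #include <pthread.h>
    #include <stdio.h>

    void *worker(void *arg)
    {
        int id = *(int *)arg;      /* data on this thread's stack is private */
        printf("thread %d running\n", id);
        return NULL;               /* same effect as pthread_exit(NULL) */
    }

    int main(void)
    {
        pthread_t tid;
        int id = 1;
        if (pthread_create(&tid, NULL, worker, &id) != 0)
            return 1;              /* creation failed */
        pthread_join(tid, NULL);   /* wait for the child thread to join */
        /* a detached thread would instead call pthread_detach(tid)
           and never be joined */
        return 0;
    }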
Process vs Threads
- A process has its own address space; a process with two threads has one address space shared by both threads
- What happens when we fork a process that executes multiple threads? Does fork duplicate only the calling thread or all threads?

Thread Pools
- Thread pooling (or process pooling) is an efficient mechanism
- One master thread dispatches jobs to worker threads
- Worker threads in the pool never terminate and keep accepting new jobs when the old job is done
- Jobs are communicated to the workers via a shared job queue
- Best for irregular job loads
- (A sketch of a pooled job queue appears at the end of this section)

MT-Safety
- Routines must be multi-thread safe (MT-safe) when invoked by more than one thread
- Non-MT-safe routines must be placed in a critical section, e.g. using a mutex lock (see later)

Use of a non-MT-safe routine:

    time_t clk = clock();
    char *txt = ctime(&clk);
    printf("Current time: %s\n", txt);

- Many C libraries are not MT-safe
  - Use the libroutine_r() versions that are "reentrant"
  - When building your own MT-safe library, use #define _REENTRANT

Use of the reentrant version of ctime:

    time_t clk = clock();
    char txt[32];
    ctime_r(&clk, txt);
    printf("Current time: %s\n", txt);

- Always make your routines MT-safe for reuse in a threaded application
- Use locks when necessary (see next slides)

Is this routine MT-safe? What can go wrong?

    static int counter = 0;
    int count_events()
    { return counter++;
    }

Coordinated Access to Shared Data (Such as Job Queues)

Reading and writing shared data by more than one thread or process requires coordination with locking: shared variables cannot be updated simultaneously by more than one thread. The increment counter++ compiles to a load, an add, and a store, and without a lock these can interleave between two threads so that one increment is lost:

    Thread 1                         Thread 2
    reg1 = M[counter] = 3
    reg2 = reg1 + 1   = 4            reg1 = M[counter] = 3
    M[counter] = reg2 = 4            reg2 = reg1 + 1   = 4
    return reg1 = 3                  M[counter] = reg2 = 4
                                     return reg1 = 3

Both threads return 3, and counter ends at 4 after two events. The locked version serializes the update:

    static int counter = 0;
    int count_events()
    { int n;
      pthread_mutex_lock(&lock);
      n = counter++;
      pthread_mutex_unlock(&lock);
      return n;
    }

    Thread 1                         Thread 2
    acquire lock                     acquire lock
    reg1 = M[counter] = 3            ... wait ...
    reg2 = reg1 + 1   = 4
    M[counter] = reg2 = 4
    return reg1 = 3
    release lock
                                     reg1 = M[counter] = 4
                                     reg2 = reg1 + 1   = 5
                                     M[counter] = reg2 = 5
                                     release lock

Spinlocks
- Spin locks use busy waiting until a condition is met
- Naive implementations are almost always incorrect

    // initially lock = 0
    while (lock == 1)      // acquire lock:
        ;                  // do nothing (busy wait)
    lock = 1;
    ... critical section ...
    lock = 0;              // release lock

Two or more threads want to enter the critical section; what can go wrong?

This ordering works:

    Thread 1                         Thread 2
    while (lock == 1)
        ; // do nothing
    lock = 1;                        while (lock == 1)
    ... critical section ...             ; // spins until lock = 0
    lock = 0;
                                     lock = 1;
                                     ... critical section ...
                                     lock = 0;

But this statement interleaving leads to failure: the lock is not set in time before the other thread reads it, so both threads end up executing the critical section:

    Thread 1                         Thread 2
    while (lock == 1)
        ; // do nothing              while (lock == 1)
    lock = 1;                            ; // lock still 0: falls through
    ... critical section ...         lock = 1;
    lock = 0;                        ... critical section ...
                                     lock = 0;

- The compiler can break the naive spinlock even further: the assignment lock = 1 can be moved by the compiler, and a lock = 0 followed by lock = 1 can be collapsed by removing the "useless" first assignment, because the compiler optimizes the code without knowing about other threads
- Atomic operations such as atomic "test-and-set" instructions must be used (these make the read and the write a single indivisible step)
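The excerpt breaks off at atomic test-and-set. As a hedged illustration of the idea, a correct spinlock can be built on C11's atomic_flag, whose test-and-set is a single indivisible operation; the use of <stdatomic.h> is our assumption, since the slides themselves continue elsewhere:

    /* Spinlock sketch built on C11 atomic test-and-set; <stdatomic.h>
       is our assumption, not from the slides. */
    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void)
    {
        /* atomically set the flag and get its previous value; only the
           thread that sees "was clear" enters the critical section */
        while (atomic_flag_test_and_set(&lock))
            ;  /* busy wait: another thread holds the lock */
    }

    void release(void)
    {
        atomic_flag_clear(&lock);  /* atomic release; cannot be reordered
                                      or removed like a plain lock = 0 */
    }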
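Finally, the job-queue sketch promised in the Thread Pools section. It uses a Pthreads mutex plus condition variables (condition variables appear in the overview and are treated later in the deck); the queue size, the integer "jobs", and all names are illustrative assumptions:

    /* Thread-pool sketch with a shared bounded job queue, as promised in
       the Thread Pools section. Sizes, names, and the integer jobs are
       illustrative. Compile with: cc -pthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define QSIZE    8
    #define NWORKERS 4

    static int queue[QSIZE];               /* shared job queue */
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t qlock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t nonfull  = PTHREAD_COND_INITIALIZER;

    /* master side: dispatch a job to the pool */
    static void submit(int job)
    {
        pthread_mutex_lock(&qlock);
        while (count == QSIZE)             /* queue full: wait for a worker */
            pthread_cond_wait(&nonfull, &qlock);
        queue[tail] = job;
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&nonempty);    /* wake one waiting worker */
        pthread_mutex_unlock(&qlock);
    }

    /* worker side: never terminates, keeps accepting new jobs */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&qlock);
            while (count == 0)             /* queue empty: wait for a job */
                pthread_cond_wait(&nonempty, &qlock);
            int job = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_cond_signal(&nonfull); /* a slot opened up */
            pthread_mutex_unlock(&qlock);
            printf("worker did job %d\n", job);  /* do the work unlocked */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int job = 0; job < 32; job++) /* master dispatches jobs */
            submit(job);
        sleep(1);   /* crude: let workers drain the queue before exit;
                       a real pool would shut down cleanly */
        return 0;
    }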