
Shared memory and message passing revisited in the many-core era Shared memory and Message passing revisited in the many-core era Aram Santogidis CERN Inverted CERN School of Computing, 29 February – 2 March 2016 1 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era The pioneers of concurrent programming Edsger Dijkstra Per Brinch Hansen C.A.R Hoare • Mutual exclusion • Concurrent Pascal • Communicating • Cooperating • Shared Classes Sequential Processes Sequential • The Solo OS (CSP) Processes • Distributed Processes • Monitors • Semaphores 2 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Communication is important Process 1 Process 2 Process 3 Time Shared memory Communication/ VS Synchronization Message passing 3 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Agenda of the talk . Concurrency and communication . Two basic examples of the two models . Conventional wisdom for the two models . Cache coherence and manycore processors . Emerging paradigm shift in OS architectures . The future perspective 4 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era The Shared Memory model Thread . Threads Shared data structures communicate implicitly with each other via shared data structures Shared Memory . Synchronization primitives (locks, semaphores, etc.) 5 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era The message passing model Thread . Threads communicate explicitly with each Message other by exchanging messages . Is the more fundamental class from the two . Synchronous or asynchronous communication 6 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Lets see an example for each model 1. Image processing (shared memory) 2. Simple GUI (message passing) 7 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era A shared memory-based example: Convert from colour to grayscale R: 204 G: 46 (R+G+B) / 3 = 130 B: 10 8 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era A shared memory-based example: Convert from colour to grayscale We parallelize the computation by assigning tiles (pieces) of the image to threads which execute the conversion in parallel. 9 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era A message passing-based example . A GUI with 3 widgets Many operating . Text Area system . Up scroll button designs can be . Down scroll button placed into one of two very rough . Must be interactive categories, (Immediate feedback) depending upon how they… 10 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era GUI example implementation: Message passing solution Many operating system designs can be placed into one of two very rough categories, depending upon Thread how they… UpButton Thread TextArea Thread DownButton 11 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Conventional wisdom about the characteristics of the two models 1.Performance 2.Programmability 12 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Performance comparison Shared Memory Message Passing Hardware support Extensive Limited (All popular architectures) (Only special purpose architectures) Data transfer Low High Overhead (Cache block management in (Data replication) HW) Access/Sync Sometimes high Low overhead (Critical section contention, (Local private memory NUMA effects) access) 13 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Programmability comparison Shared Memory Message Passing Communication Implicit Explicit Synchronization Explicit (locks etc.) Implicit (side-effect) Interface (API) Read/write shared Send/Receive messages, data structures, Multicast mutex primitives Hazards Race conditions, Deadlocks, Deadlocks, Starvation Starvation 14 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Towards the manycore architectures http://www.wired.com/images_blogs/gadgetlab/2009/10/tilera-wafer-1.jpg 15 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era The manycore era http://image.slideserve.com/277797/manycore-systems-design-space-n.jpg . Power limits the frequency increase of the processor. Moore’s law: The transistors keep doubling every two years . Replication: Increasing number of cores The graph is from (presentation): “Joshi, Ajay, et al. "Building manycore processor-to-DRAM networks using monolithic silicon photonics." High Performance Embedded Computing (HPEC) Workshop. 2008.” 16 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era On the duality of operating systems structures . Operating Systems are generally classified as: . Message passing oriented . Procedure-oriented (shared memory) . Each system from one category has the other category. Neither model is inherently better than the other (depends on the machine architecture). From: Lauer, Hugh C., and Roger M. Needham. "On the duality of operating system structures." ACM SIGOPS Operating Systems Review 13.2 (1979): 3-19. 17 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Non Uniform Memory Access (NUMA) Socket 0 Socket 1 P0 P1 P2 P3 P4 P5 CPU CPU CPU CPU CPU core 1 2 3 4 5 Local Local Local QPI Local Local Local Cache Cache Cache Cache Cache Cache Last Level Cache Last Level Cache RAM Domain 0 RAM Domain 1 18 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Cache coherence i: 0 j=i; i++; L1 Shared L1 Core i: 0 i: 0 Core 0 1 Cache LLC Cache Controller i: 0 Controller BUS 19 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Cache coherence i: 0 j=i; i++; L1 Modified L1 Core i: 0 i: 1 Core 0 1 Bus/ Invalid Message Invalidate Cache LLC Cache Controller i: 0 Controller BUS 20 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Cache coherence i: 0 j=i; i++; L1 Shared L1 Core i: 1 i: 1 Core 0 1 Bus Read Cache LLC Cache Controller i: 1 Controller BUS 21 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era A key question When updating shared state, which uproach is more expensive (in terms of latency), Shared memory or Message passing? 22 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era An experiment of shared memory vs message passing performance Thread Core Cache Shared state BUS CPU Hyper in Socket Updating shared state of size [1,8] cachelines, relying on cache coherent shared memory on 4x4 AMD system Transport 23 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era An experiment of shared memory vs message passing performance Server, updating S the shared state on behalf of the threads BUS CPU Hyper Updating shared state of size in Socket [1,8] cachelines, relying on synchronous Lightweight Remote Procedure Calls Transport (message passing) 24 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Messages scale better than shared memory Message passing scales better than shared memory when increasing the core count and the size of the shared state. The plot is adapted from: Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. ACM, 2009. 25 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era …some other hints that may lead to further fragmentation of coherency domains http://www.racktopsystems.com/wp-content/uploads/2013/01/sql-server-fragmentation.jpg 26 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Increasing Heterogeneity of computing platforms Multi socket Manycores, GPU . Message passing: Coprocessor Fundamental for FPGA communication in heterogeneous Dual cores, environment pthreads Single cores, . Shared memory: Concurrent OS, Hard to implement Coroutines Heterogeneity in a heterogeneous environment Time 27 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Message passing OS vs Shared memory OS Barrelfish OS (Message passing) Linux OS (Shared memory) App App App App App OSnode OSnode OSnode State State Async State Linux replica replica replica messages Kernel Driver Single arch(x86, etc.) GPU x86 ARM … GPU Architecture dependant code Adapted from : Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. ACM, 2009 28 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era What to expect ? http://tech.co/wp-content/uploads/2014/12/future-marketing.jpg 29 iCSC2016, Aram Santogidis, CERN Shared memory and message passing revisited in the many-core era Emerging concurrency paradigms New high level paradigms are being developed, based on shared memory and/or message passing constructs.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages34 Page
-
File Size-