CS 134: Operating Systems
Multiprocessing
2013-05-17
Overview

- Multiprocessing Designs
- OS Implications
- Programming Models
- Other Issues

Multiprocessing Designs: SIMD and MIMD

Multiple CPUs come in several flavors:

- SIMD: Single Instruction, Multiple Data
  - Also called a vector processor
  - Sample instruction: a[i] = b[i] + c[i] for i in a small range (e.g., 0-3)
  - Canonical example: GPUs
- MIMD: Multiple Instruction, Multiple Data
  - I.e., 2 or more (semi-)independent CPUs

We won't talk further about SIMD; from an OS point of view it's just another CPU.

Multiprocessing Designs: MIMD Approaches

MIMD can be:

- Several chips or cores with (semi-)private memories, able to access each other's memory (NUMA: Non-Uniform Memory Access)
- Several chips or cores sharing one memory (SMP: Symmetric Multiprocessing)
- Several boxes (possibly each SMP or NUMA) connected by a network (a distributed system)

OS Implications: NUMA Issues

NUMA means processes access local memory faster, so:

- Allocate a process's memory on its local CPU
- Processes should have "CPU affinity"

OS Implications: SMP Issues

SMPs still have caches, and that introduces cache coherency problems:

- Processor 0 uses compare-and-swap to set a lock nonzero
- The write goes into its local cache for speed
- Processor 1 reads the lock from its own cache and sees it's still zero

The cure is hardware coherency guarantees, but then spinlocks have very high costs, and it may be better to do a thread switch instead. A thread switch is itself expensive, but it can still be cheaper than spinning. (A spinlock sketch appears below.)
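To make the coherency cost concrete, here is a minimal spinlock sketch using C11 atomics. This is an illustration written for these notes, not code from any particular kernel: the inner read-only loop spins in the local cache, and the compare-and-swap is the write whose effect the coherency hardware must propagate to every other CPU's cached copy of the lock.

```c
/* Minimal test-and-test-and-set spinlock sketch (C11 atomics).
   Hypothetical illustration, not production or kernel code. */
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;

void spin_lock(spinlock_t *l) {
    for (;;) {
        /* Spin on a plain read first: this stays in the local cache
           and avoids flooding the interconnect with atomic writes. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* Try the compare-and-swap; on success the coherency hardware
           must invalidate every other CPU's cached copy of the lock. */
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(
                &l->locked, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

void spin_unlock(spinlock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The read-before-CAS structure is the point: while the lock is held, each waiter spins harmlessly on its own cached copy, and the expensive cross-CPU traffic happens only when the lock actually changes hands.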
OS Implications: SMP Scheduling

Threads are often related, so do we schedule them independently or together?

- Completely independent: job completion time is set by the slowest thread
- Together: some CPUs may be wasted waiting for events
- Either way, it's always good to keep thread x on the same CPU, because its cache is already filled

OS Implications: Distributed Systems

There are many ways to communicate, but the most important modern approach is... the Internet! Communicating over skinny wires introduces new problems:

- Can't move a process to another machine (or must work hard to do so)
- Locking becomes really hard
- Programming multiprocessor systems is much harder
- ...and what if the network connection goes down?

Programming Models: RPC

Programming is hard, so we need abstractions that simplify things. Remote Procedure Call (RPC) makes a distant system look like a normal function call:

1. Marshal the arguments (i.e., pack them up and serialize them)
2. Send the procedure ID and arguments to the remote system
3. Wait for the response
4. Deserialize the return value

Class exercise: what are the advantages and disadvantages? (A client-stub sketch follows this list.)
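The four steps above are exactly what a client stub does. Here is a hypothetical hand-written stub in C for a remote int add(int a, int b); the procedure ID and wire format are invented for this sketch, and real systems such as ONC RPC or gRPC generate this boilerplate from an interface description instead.

```c
/* Hypothetical client-side RPC stub for: int add(int a, int b).
   The procedure ID and wire format are made up for this sketch;
   handling of short reads/writes is omitted for brevity. */
#include <stdint.h>
#include <unistd.h>      /* read, write */
#include <arpa/inet.h>   /* htonl, ntohl: portable byte order */

#define PROC_ADD 1       /* hypothetical procedure ID */

/* 'fd' is a connected socket to the RPC server (setup not shown). */
int rpc_add(int fd, int32_t a, int32_t b, int32_t *result) {
    /* 1. Marshal: serialize the procedure ID and the arguments. */
    uint32_t msg[3] = { htonl(PROC_ADD),
                        htonl((uint32_t)a),
                        htonl((uint32_t)b) };

    /* 2. Send the request to the remote system. */
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
        return -1;

    /* 3. Wait for the response; read blocks until the reply arrives. */
    uint32_t reply;
    if (read(fd, &reply, sizeof reply) != (ssize_t)sizeof reply)
        return -1;

    /* 4. Deserialize the return value. */
    *result = (int32_t)ntohl(reply);
    return 0;
}
```

The caller just writes rpc_add(fd, 2, 3, &sum) and sees a normal-looking function call; everything the network can do wrong hides behind that innocent interface, which is a good starting point for the class exercise.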
Programming Models: DSM

RPC is nice, but it limits parallelism. SMPs can do cool things precisely because memory is shared, so why not simulate shared memory across the network? That is Distributed Shared Memory (DSM). Teeny problem: it's hard to make it work fast, and "hard" is a gross understatement.

Other Issues: Load Balancing

Suppose you have servers A, B, C, and D. A and B are currently overloaded; C and D are underloaded. A notices the situation and sends excess work to C and D. Simultaneously, B does the same! Now C and D are overloaded, and the result can be thrashing.

A common solution is to have one front-end machine whose sole job is allocating load to the others. (Random assignment also works surprisingly well.)

Other Issues: How Does Google Work?

Well, it's a secret. But basically they use the front-end approach. The obvious problem: one front end can't handle millions of requests per second, even if it does almost nothing per request. The solution: DNS round robin tricks you into picking one of many dozens of front ends (roughly at random) to talk to.

Other Issues: Example of Google's DNS Tricks

These commands were run within 15 seconds of each other:

bow:2:877> host www.google.com
www.google.com has address 74.125.224.241
www.google.com has address 74.125.224.242
www.google.com has address 74.125.224.243
www.google.com has address 74.125.224.244
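You can observe the same trick programmatically. The short C program below (written for these notes, not from the slides) asks the standard POSIX resolver for the IPv4 addresses of www.google.com and prints each one; run it a few times and the list may change, which is DNS round robin steering clients to different front ends.

```c
/* Print every IPv4 address the resolver returns for a name,
   mirroring the 'host www.google.com' output shown above. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>

int main(void) {
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 (A records) only */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address */

    if (getaddrinfo("www.google.com", NULL, &hints, &res) != 0)
        return 1;

    /* The resolver hands back a list; with DNS round robin the
       contents and order can differ from one query to the next. */
    for (p = res; p != NULL; p = p->ai_next) {
        char buf[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf);
        printf("www.google.com has address %s\n", buf);
    }
    freeaddrinfo(res);
    return 0;
}
```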