Understanding Parallelism and the Limitations of Parallel Computing

Understanding Parallelism: Sequential work
After 16 time steps: 4 cars.
(c) RRZE 2013, Scalability Laws

Understanding Parallelism: Parallel work
After 4 time steps: 4 cars ("perfect speedup").

Understanding Parallelism: Shared resources, imbalance
Workers waiting for a shared resource or for synchronization leave resources unused: a resource bottleneck combined with load imbalance.

Limitations of Parallel Computing: Amdahl's Law
  • Ideal world: all work is perfectly parallelizable.
  • Closer to reality: purely serial parts limit the maximum speedup.
  • Reality is even worse: communication and synchronization impede scalability even further.

Limitations of Parallel Computing: Calculating speedup in a simple model ("strong scaling")
  • T(1) = s + p = serial compute time (normalized to 1)
  • Purely serial part s; parallelizable part p = 1 - s
  • Parallel execution time: T(N) = s + p/N
  • General formula for speedup (Amdahl's Law, 1967; "strong scaling"):
    S(N) = T(1)/T(N) = 1 / (s + (1-s)/N)

Limitations of Parallel Computing: Amdahl's Law ("strong scaling")
  • Reality: no task is perfectly parallelizable.
    • Shared resources have to be used serially.
    • Task interdependencies must be accounted for.
    • Communication overhead (but that can be modeled separately).
  • The benefit of parallelization may be strongly limited.
  • "Side effect": limited scalability leads to inefficient use of resources.
  • Metric: parallel efficiency (what fraction of the workers/processors is used efficiently):
    ε(N) = S(N)/N
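The strong-scaling model above is easy to sketch in code. The following is a minimal illustration (not part of the original slides; the function names are our own) of S(N) = 1/(s + (1-s)/N) and the parallel efficiency ε(N) = S(N)/N:

```python
# Minimal sketch of Amdahl's Law for strong scaling:
# T(1) = s + p = 1, T(N) = s + p/N with p = 1 - s.

def amdahl_speedup(s, n):
    """Speedup S(N) = 1/(s + (1-s)/N) for serial fraction s on N workers."""
    return 1.0 / (s + (1.0 - s) / n)

def parallel_efficiency(s, n):
    """Efficiency S(N)/N; in the Amdahl case this equals 1/(1 + s*(N-1))."""
    return amdahl_speedup(s, n) / n

if __name__ == "__main__":
    # Even a 10% serial fraction caps the speedup well below the worker count.
    for n in (1, 4, 16, 64):
        print(n, round(amdahl_speedup(0.1, n), 2),
              round(parallel_efficiency(0.1, n), 2))
```

For s = 0.1, the speedup on 16 workers is only 1/0.15625 = 6.4, i.e. an efficiency of 40%, which already illustrates how quickly the serial part dominates.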
Amdahl case: ε(N) = 1 / (1 + s(N-1))

Limitations of Parallel Computing: Adding a simple communication model for strong scaling
  • T(1) = s + p = serial compute time; purely serial part s; parallelizable part p = 1 - s
  • Model assumption: non-overlapping communication, with a communication fraction k per worker for messages:
    T(N) = s + p/N + N·k
  • General formula for speedup:
    S(N) = T(1)/T(N) = 1 / (s + (1-s)/N + N·k)

Limitations of Parallel Computing: Amdahl's Law ("strong scaling"), large-N limits
  • At k = 0, Amdahl's Law predicts lim(N→∞) S(N) = 1/s, independent of N!
  • At k ≠ 0, our simple model of communication overhead yields a large-N behaviour of S(N) ≈ 1/(N·k).
  • Problems in real-world programming:
    • Load imbalance
    • Shared resources have to be used serially (e.g. I/O)
    • Task interdependencies must be accounted for
    • Communication overhead

Limitations of Parallel Computing: Amdahl's Law ("strong scaling") + communication model
[Figure: speedup S(N) for 1-10 CPUs with s = 0.01, s = 0.1, and s = 0.1 with k = 0.05.]
[Figure: speedup S(N) for up to 1000 CPUs with the same parameters; annotations mark parallel efficiencies of ~50% and <10%.]

Limitations of Parallel Computing: How to mitigate overheads
  • Communication is not necessarily purely serial.
    • Non-blocking crossbar networks can transfer many messages concurrently; the factor N·k in the denominator becomes k (a technical measure).
    • Sometimes communication can be overlapped with useful work (implementation, algorithm); the communication overhead may then show a more fortunate behavior than N·k.
  • "Superlinear speedups": the data size per CPU decreases with increasing CPU count and may fit into cache at large CPU counts.
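The effect of the N·k communication term can be made concrete with a short sketch (again a hypothetical illustration with names of our own choosing): for k > 0 the speedup peaks at some worker count and then decays roughly like 1/(N·k).

```python
# Sketch of the strong-scaling model with non-overlapping communication:
# T(N) = s + (1-s)/N + N*k, so S(N) = 1/(s + (1-s)/N + N*k).

def speedup_with_comm(s, k, n):
    """Speedup with serial fraction s and communication fraction k per worker."""
    return 1.0 / (s + (1.0 - s) / n + n * k)

if __name__ == "__main__":
    # With k = 0 this reduces to plain Amdahl; with k > 0, adding workers
    # eventually makes the run slower than moderate worker counts.
    for n in (1, 10, 100, 1000):
        print(n, round(speedup_with_comm(0.01, 0.001, n), 2))
```

With s = 0.01 and k = 0.001 the speedup rises to about 8.4 around N = 10, is already flat at N = 100, and drops back below 1 at N = 1000, which is exactly the 1/(N·k) decay the slide describes.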
Limits of Scalability: Serial and parallel fraction
The serial fraction s may depend on:
  • Program / algorithm
    • Non-parallelizable parts, e.g. recursive data setup
    • Non-parallelizable I/O, e.g. reading input data
    • Communication structure
    • Load balancing (assumed perfect so far)
    • …
  • Computer hardware
    • Processor: cache effects and memory bandwidth effects
    • Parallel library; network capabilities; parallel I/O
    • …
Determine s "experimentally": measure the speedup and fit the data to Amdahl's law. But that can be more complicated than it seems…

Scalability data on modern multi-core systems: An example
[Figure: diagram of a cluster node with multiple multi-core sockets (cores, caches, chipset, memory), illustrating scaling across cores on a socket, across sockets on a node, and across nodes.]

Scalability data on modern multi-core systems: The scaling baseline
  • Scalability presentations should be grouped according to the largest unit that the scaling is based on (the "scaling baseline").
    [Figure: intranode speedup of a memory-bound code on 1, 2, and 4 CPUs; good scaling across sockets.]
  • Amdahl model with communication: fit
    S(N) = 1 / (s + (1-s)/N + k·N)
    to inter-node scalability numbers (N = number of nodes, N > 1).

Application to "accelerated computing"
  • SIMD, GPUs, Cell SPEs, FPGAs, or just any optimization…
  • Assume the overall (serial, un-accelerated) runtime to be Ts = s + p = 1.
  • Assume p can be accelerated to run α times faster; we neglect any additional cost (communication, …).
  • To get a speedup of rα, how small must s be? Solving 1/(s + (1-s)/α) = rα for s gives:
    s = (1/r - 1) / (α - 1)
  • At α = 10 and r = 0.9 (an overall speedup of 9), we get s ≈ 0.012, i.e. you must accelerate 98.8% of the serial runtime!
  • Limited memory on accelerators may limit the achievable speedup.
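The accelerator estimate above can be checked numerically. The sketch below (an illustration under the slide's assumptions; the helper names are hypothetical) computes the overall speedup for a given serial fraction and inverts it to find the largest s that still permits a speedup of r·α:

```python
# Sketch of the "accelerated computing" estimate: the fraction p = 1 - s runs
# alpha times faster, so the overall speedup is 1/(s + (1-s)/alpha).
# Requiring a speedup of r*alpha and solving for s gives s = (1/r - 1)/(alpha - 1).

def accelerated_speedup(s, alpha):
    """Overall speedup when the fraction 1 - s is accelerated by a factor alpha."""
    return 1.0 / (s + (1.0 - s) / alpha)

def max_serial_fraction(alpha, r):
    """Largest serial fraction s that still allows an overall speedup of r*alpha."""
    return (1.0 / r - 1.0) / (alpha - 1.0)

if __name__ == "__main__":
    s = max_serial_fraction(alpha=10.0, r=0.9)
    print(round(s, 4))                              # about 0.0123
    print(round(accelerated_speedup(s, 10.0), 1))   # recovers r*alpha = 9.0
```

Plugging in α = 10 and r = 0.9 reproduces the slide's figure: s ≈ 0.012, so roughly 98.8% of the serial runtime must be accelerated to realize 90% of the tenfold speedup.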
