Locality-Conscious Parallel Algorithms

Locality-Conscious Parallel Algorithms

Motivation PEM Model PEM Algorithms GPGPU Conclusions Locality-conscious parallel algorithms Nodari Sitchinava Karlsruhe Institute of Technology University of Hawaii October 17, 2013 Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions 1990s: Faster Programs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions 1990s: Faster Programs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions 1990s: Faster Programs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions 1990s: Faster Programs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Mid 2000s – Power Dissipation Is a Problem Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Mid 2000s – Power Dissipation Is a Problem Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel computing 1970-90s: PRAM, BSP, hypercubes... Important results on the limits of parallelism Are they practical? Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Example: Orthogonal Point Location Runtime Θ(n log n) Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Example: Orthogonal Point Location Runtime per element 1e-05 9e-06 8e-06 7e-06 seconds 6e-06 Runtime 5e-06 Θ(n log n) 4e-06 3e-06 0 2e+07 4e+07 6e+07 8e+07 1e+08 1.2e+08 n Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Example: Orthogonal Point Location Runtime per element 1e-05 9e-06 8e-06 7e-06 seconds 6e-06 Runtime 5e-06 Θ(n log n) 4e-06 3e-06 0 2e+07 4e+07 6e+07 8e+07 1e+08 1.2e+08 n Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Θ notation What does Θ notation represent? Number of comparisons Number of arithmetic operations Number of memory accesses Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Θ notation What does Θ notation represent? Number of comparisons Number of arithmetic operations Number of memory accesses All of the above in RAM/PRAM models Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Example: Orthogonal Point Location Θ(n log n) memory accesses? Runtime per element DRAM accesses per element 1e-05 80 9e-06 70 60 8e-06 50 7e-06 40 seconds 6e-06 30 5e-06 20 4e-06 10 3e-06 0 0 2e+07 4e+07 6e+07 8e+07 1e+08 1.2e+08 0 2e+07 4e+07 6e+07 8e+07 1e+08 1.2e+08 n n Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Modern memory design CPU Disk Caches! L1 Cache L2 Cache RAM Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Caches in Multi-cores CPU 1 CPU 2 Disk CPU 3 CPU 4 L1 L2 RAM Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Dealing with Caches External Memory Model CPU M Cache B External memory Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions RAM PRAM CPU PE 1 PE 2 ... PE P Random Access Memory Random access memory External Memory CPU M Cache B External memory Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions RAM PRAM CPU PE 1 PE 2 ... PE P Random Access Memory Random access memory External Memory Parallel External Memory CPU PE 1 PE 2 PE P ... M M M M Cache Cache Cache Cache B B B B External memory Shared memory Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel External Memory (PEM) Model [SPAA ’08] PEM: A simple model of locality and parallelism PE 1 PE 2 PE P ... M M M Cache Cache Cache Explicit cache replacement B B B Up to P block transfers = 1 I/O Shared memory Block-level CREW access Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel External Memory (PEM) Model [SPAA ’08] PEM: A simple model of locality and parallelism PE 1 PE 2 PE P ... M M M Cache Cache Cache Explicit cache replacement B B B Up to P block transfers = 1 I/O Shared memory Block-level CREW access Complexity: Parallel I/O complexity – number of parallel block transfers Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Solutions in PEM Algorithms Sorting [SPAA’08] Orthogonal Computational Geometry [ESA’10, Independent IPDPS’11] Set List Problems on Graphs Ranking Tree [IPDPS’10] Euler Tour Connected Expression Tree MST Components Evaluation Bi−connected Batched LCA PEM Data Structures [SPAA’12] Components Ear Decomposition Parallel Buffer/Range Tree Exp. Validation [ESA’13] Orthogonal Computational Geometry Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions PEM Algorithms Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Sorting [SPAA ’08] d = min{M/B, n/P} p log (N/B) Sorting M/B d-way mergesort d-way distribution sort Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Sorting [SPAA ’08] d = min{M/B, n/P} p log (N/B) Sorting M/B d-way mergesort d-way distribution sort n n sort Θ PB logM/B B = P (n) I/Os Θ(n log n) comparisons Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Geometric Algorithms [ESA ’10, IPDPS ’11] Solved Problems Line segment intersections Rectangle intersections Range Query Orthogonal point location n n sort Θ PB logM/B B = P (n) I/Os Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p d vertical slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p d vertical slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p d vertical slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p d vertical slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p d vertical slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p O(N/P) d vertical d vertical slabs slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Parallel Distribution Sweeping d = min{M/B, n/P} p O(N/P) d vertical d vertical slabs slabs Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Example: Orthogonal Point Location Complexity n n O PB logM/B B I/Os O (n log n) comparisons Runtime per element DRAM accesses per element 1e-05 80 9e-06 70 60 8e-06 50 7e-06 40 seconds 6e-06 30 5e-06 20 4e-06 10 3e-06 0 0 2e+07 4e+07 6e+07 8e+07 1e+08 1.2e+08 0 2e+07 4e+07 6e+07 8e+07 1e+08 1.2e+08 n n logM/B (n/B) ≈ 2 Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Graph Problems [IPDPS ’10] Independent Set List Ranking Tree Euler Tour Connected Expression Tree MST Components Evaluation Bi−connected Batched LCA Components Ear Decomposition Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Pointers And Caches Don’t Get Along Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions Pointers And Caches Don’t Get Along n n n min Θ + log n , Θ logM B I/Os for list ranking. n P PB / B o Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions If you bear with me for a moment... Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions GPGPU Computing Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions GPU computing GPGPU – graphics cards for computation Hundreds of cores Thousands of threads Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms GPGPU Conclusions GPUs Global Memory (n ≤ 3GB) SM w SM w sh. mem sh. mem Shared (Local) Memory (M ≈ 48 kB) Global memory CPU memory Registers (≈ 32 kB) Nodari Sitchinava – Locality-conscious parallel algorithms Motivation PEM Model PEM Algorithms

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    60 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us