A High-Quality and Fast Maximal Independent Set for GPUs

Martin Burtscher and Sindhu Devale
Department of Computer Science

Overview
- Introduction
- Serial and parallel
- Our parallel algorithm
- Optimizations
- Results
- Summary

Maximal Independent Set
- Maximal independent set (MIS)
  - Subset of vertices of undirected graph
  - Vertices in subset are independent (not adjacent)
  - Subset is maximal (every other vertex is adjacent to a vertex in the subset)
  - Not unique
- Largest possible MIS
  - Maximum independent set
  - NP-hard to compute
[Figure: example graph with vertices a through i]
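
The two conditions in the definition can be checked mechanically. The following is a minimal Python sketch, not part of the original slides. The edge list is an assumption (the figure is not recoverable from the slide text; the edges were chosen only to be consistent with the example traces shown later), and the helper names are illustrative.

```python
# Hypothetical edge list for the nine-vertex example graph (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g"),
         ("b", "g"), ("c", "g"), ("f", "g"), ("g", "h"), ("h", "i")]

def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_independent(adj, subset):
    """True if no two vertices in subset are adjacent."""
    return all(adj[v].isdisjoint(subset) for v in subset)

def is_maximal_independent(adj, subset):
    """True if subset is independent and every other vertex has a neighbor in it."""
    if not is_independent(adj, subset):
        return False
    return all(v in subset or adj[v] & subset for v in adj)

adj = build_adjacency(EDGES)
print(is_maximal_independent(adj, {"a", "e", "h"}))  # True
print(is_maximal_independent(adj, {"a", "e"}))       # False: h could still be added
```

Both {a, e, h} and {b, c, d, f, h} pass this check on the assumed graph, which illustrates that an MIS is not unique and that different MISs can have different sizes.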

Importance of MIS
- Building block of many parallel graph algorithms
  - Maximal
  - 2-satisfiability
  - Maximal
  - Odd set cover problem
  - etc.

Importance of MIS (cont.)
- Parallelization of complex computations
  - Supports arbitrary and dynamically changing conflicts
  1. Build graph (vertices = computations, edges = conflicts)
  2. Compute MIS
  3. Run computations in MIS in parallel (w/o locks or atomics)
  4. Repeat if necessary
  - E.g., Delaunay mesh refinement
- Approach is only useful if MIS can be computed quickly in parallel and benefits from large sets
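
The four steps above can be sketched as a scheduling loop. This is a minimal Python sketch under stated assumptions: the task names and conflict pairs are hypothetical, the conflicts are static (the slides also allow dynamically changing conflicts), and a simple serial greedy MIS stands in for a fast parallel one.

```python
def greedy_mis(adj, candidates):
    """Serial greedy MIS restricted to the candidate vertices (stand-in for a parallel MIS)."""
    chosen = set()
    for v in sorted(candidates):
        if adj[v].isdisjoint(chosen):
            chosen.add(v)
    return chosen

def schedule_in_rounds(tasks, conflicts):
    """Group tasks into rounds; tasks within one round have no mutual conflicts."""
    # Step 1: build graph (vertices = computations, edges = conflicts).
    adj = {t: set() for t in tasks}
    for u, v in conflicts:
        adj[u].add(v)
        adj[v].add(u)
    pending = set(tasks)
    rounds = []
    while pending:                       # Step 4: repeat if necessary.
        batch = greedy_mis(adj, pending) # Step 2: compute MIS of the pending tasks.
        rounds.append(sorted(batch))     # Step 3: this batch could run in parallel.
        pending -= batch
    return rounds

# Hypothetical tasks: t2 conflicts with both t1 and t3.
print(schedule_in_rounds(["t1", "t2", "t3", "t4"], [("t1", "t2"), ("t2", "t3")]))
# [['t1', 't3', 't4'], ['t2']]
```

A larger MIS means more computations per round and fewer rounds, which is why set size matters for this use case.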

Highlights
- ECL-MIS algorithm for massively-parallel devices
  - Fastest MIS runtimes on modern GPUs
- Randomized permutation selection function
  - Largest set sizes among many MIS algorithms
- New optimizations
  - Enhance performance and reduce memory footprint

Serial Algorithm

Serial Algorithm
- Repeating steps
  - Visit unvisited vertex
  - Add vertex to set if no graph neighbors in set
- Example (graph with vertices a through i)
  - Start with empty set: Set = {}
  - a has no neighbor in set, add vertex a: Set = {a}
  - b has neighbor in set, discard vertex b: Set = {a}
  - c has neighbor in set, discard vertex c: Set = {a}
  - d has neighbor in set, discard vertex d: Set = {a}
  - e has no neighbor in set, add vertex e: Set = {a, e}
  - f has neighbor in set, discard vertex f: Set = {a, e}
  - g has neighbor in set, discard vertex g: Set = {a, e}
  - h has no neighbor in set, add vertex h: Set = {a, e, h}
  - i has neighbor in set, discard vertex i: MIS = {a, e, h}
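
The serial procedure traced above can be written in a few lines. This is a minimal Python sketch; the edge list is an assumption (the figure is not recoverable from the slide text) chosen so that the visit order a through i reproduces the trace.

```python
# Hypothetical edge list for the nine-vertex example graph (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g"),
         ("b", "g"), ("c", "g"), ("f", "g"), ("g", "h"), ("h", "i")]

def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def serial_mis(adj, order):
    """Visit vertices in the given order; add a vertex if none of its neighbors is in the set."""
    in_set = set()
    for v in order:
        if adj[v].isdisjoint(in_set):
            in_set.add(v)      # no neighbor in set: add vertex
        # otherwise the vertex is discarded
    return in_set

adj = build_adjacency(EDGES)
print(sorted(serial_mis(adj, "abcdefghi")))  # ['a', 'e', 'h']
```

The result depends on the visit order, so a different order can yield a different (and differently sized) MIS.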

Luby’s Random-Priority Parallel MIS Algorithm

Random-Priority Algorithm (Luby)
- Repeating steps
  - Assign random priorities
  - Add vertices with highest local priority to set
  - Remove their neighbors from graph
- Example (graph with vertices a through i)
  - Start: Set = {}
  - Round 1 priorities: a5, b7, c1, d4, e6, f3, g4, h3, i2
  - b and e have the highest local priority: Set = {b, e}
  - Neighbors of b and e are removed; c, h, and i remain
  - Round 2 priorities: c9, h5, i6
  - c and i have the highest local priority: Set = {b, c, e, i}
  - No vertices remain: MIS = {b, c, e, i}
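
The rounds above can be simulated serially. This is a minimal Python sketch, not a parallel implementation; the edge list is an assumption chosen to be consistent with the slides' trace, and breaking priority ties by vertex name is an added assumption to guarantee progress.

```python
import random

# Hypothetical edge list for the nine-vertex example graph (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g"),
         ("b", "g"), ("c", "g"), ("f", "g"), ("g", "h"), ("h", "i")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def select_local_maxima(adj, alive, prio):
    """Vertices whose priority beats every remaining neighbor (ties broken by name)."""
    return {v for v in alive
            if all((prio[v], v) > (prio[u], u) for u in adj[v] if u in alive)}

def luby_random_priority(adj, rng):
    """Each round: fresh random priorities, add local maxima, remove them and their neighbors."""
    alive, mis = set(adj), set()
    while alive:
        prio = {v: rng.random() for v in alive}   # re-randomized every round
        winners = select_local_maxima(adj, alive, prio)
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        alive -= removed
    return mis

adj = build_adjacency(EDGES)
round1 = {"a": 5, "b": 7, "c": 1, "d": 4, "e": 6, "f": 3, "g": 4, "h": 3, "i": 2}
print(sorted(select_local_maxima(adj, set(adj), round1)))  # ['b', 'e']
```

On a GPU, each vertex would evaluate the local-maximum test independently; the set comprehension stands in for that parallel step.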

Random-Permutation Parallel MIS Algorithm

Random-Permutation Algorithm
- Initialization
  - Assign random priorities
- Repeating steps
  - Add vertices with highest local priority to set
  - Remove neighbors and their edges from graph
- Example (graph with vertices a through i)
  - Start: Set = {}
  - Priorities (assigned once): a5, b7, c1, d4, e6, f3, g4, h3, i2
  - Round 1: Set = {b, e}
  - Round 2: Set = {b, c, e, h}
  - No vertices remain: MIS = {b, c, e, h}
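
The difference from the random-priority variant is that priorities are assigned once and reused in every round. This is a minimal serial Python sketch; the edge list is an assumption chosen to be consistent with the slides' trace, the priorities are the ones shown on the slides, and ties are broken by vertex name (an added assumption).

```python
import random

# Hypothetical edge list for the nine-vertex example graph (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g"),
         ("b", "g"), ("c", "g"), ("f", "g"), ("g", "h"), ("h", "i")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def random_permutation_priorities(vertices, rng):
    """Assign each vertex a distinct rank from a random permutation (done once)."""
    order = sorted(vertices)
    rng.shuffle(order)
    return {v: rank for rank, v in enumerate(order)}

def permutation_mis_rounds(adj, prio):
    """Return the vertices added in each round, using fixed priorities."""
    alive, rounds = set(adj), []
    while alive:
        winners = {v for v in alive
                   if all((prio[v], v) > (prio[u], u) for u in adj[v] if u in alive)}
        rounds.append(sorted(winners))
        removed = set(winners)
        for v in winners:
            removed |= adj[v]          # remove neighbors (and thereby their edges)
        alive -= removed
    return rounds

adj = build_adjacency(EDGES)
slide_prio = {"a": 5, "b": 7, "c": 1, "d": 4, "e": 6, "f": 3, "g": 4, "h": 3, "i": 2}
print(permutation_mis_rounds(adj, slide_prio))  # [['b', 'e'], ['c', 'h']]
```

Because the priorities never change, no random numbers are needed after initialization.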

A High-Quality and Fast MIS Algorithm 32 Luby’s Random-Selection Parallel MIS Algorithm

A High-Quality and Fast MIS Algorithm 33 Random-Selection Algorithm (Luby)  Repeating steps a  Mark vertices with probability 0.5/degree  Add marked vertices to set b c d e f if no marked neighbors  Remove their neighbors g from graph

h i  Set = {}

A High-Quality and Fast MIS Algorithm 34 Random-Selection Algorithm (Luby)  Repeating steps  Mark vertices with ao probability 0.5/degree  Add marked vertices to set bi co di eo fo if no marked neighbors  Remove their neighbors go from graph

hi ii  Set = {}

A High-Quality and Fast MIS Algorithm 35 Random-Selection Algorithm (Luby)  Repeating steps  Mark vertices with ao probability 0.5/degree  Add marked vertices to set bi co di eo fo if no marked neighbors  Remove their neighbors go from graph

hi ii  Set = {b, d}

A High-Quality and Fast MIS Algorithm 36 Random-Selection Algorithm (Luby)  Repeating steps a  Mark vertices with probability 0.5/degree(v)  Add marked vertices to set b c d e f if no marked neighbors  Remove their neighbors g from graph

h i  Set = {b, d}

A High-Quality and Fast MIS Algorithm 37 Random-Selection Algorithm (Luby)  Repeating steps a  Mark vertices with probability 0.5/degree  Add marked vertices to set b ci d e fi if no marked neighbors  Remove their neighbors g from graph

ho ii  Set = {b, d}

A High-Quality and Fast MIS Algorithm 38 Random-Selection Algorithm (Luby)  Repeating steps a  Mark vertices with probability 0.5/degree  Add marked vertices to set b ci d e fi if no marked neighbors  Remove their neighbors g from graph

ho ii  Set = {b, c, d, f, i}

A High-Quality and Fast MIS Algorithm 39 Random-Selection Algorithm (Luby)  Repeating steps a  Mark vertices with probability 0.5/degree  Add marked vertices to set b c d e f if no marked neighbors  Remove their neighbors g from graph

h i  MIS = {b, c, d, f, i}
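
The marking rounds above can be simulated serially. This is a minimal Python sketch under stated assumptions: the edge list is hypothetical (chosen to be consistent with the slides' trace), the degree is taken in the remaining graph, and remaining vertices of degree zero are always marked to avoid a division by zero.

```python
import random

# Hypothetical edge list for the nine-vertex example graph (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g"),
         ("b", "g"), ("c", "g"), ("f", "g"), ("g", "h"), ("h", "i")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def selection_round(adj, alive, marked):
    """Marked vertices that have no marked neighbor among the remaining vertices."""
    return {v for v in marked
            if not any(u in marked for u in adj[v] if u in alive)}

def luby_random_selection(adj, rng):
    alive, mis = set(adj), set()
    while alive:
        marked = set()
        for v in sorted(alive):
            deg = len(adj[v] & alive)              # degree in the remaining graph (assumption)
            if deg == 0 or rng.random() < 0.5 / deg:
                marked.add(v)
        winners = selection_round(adj, alive, marked)
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        alive -= removed
    return mis

adj = build_adjacency(EDGES)
print(sorted(selection_round(adj, set(adj), {"b", "d", "h", "i"})))  # ['b', 'd']
```

In the first round of the example, h and i are both marked but adjacent to each other, so neither is added.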

ECL-MIS: Our Permutation-Selection Parallel MIS Algorithm

Our Permutation-Selection Algorithm
- Initialization
  - Assign priorities ~ 1/deg
  - Randomize within level
- Repeating steps
  - Add vertices with highest local priority to set
  - Remove their neighbors from graph
- Example (graph with vertices a through i)
  - Start: Set = {}
  - Priorities (assigned once): a65, b87, c82, d76, e74, f73, g35, h89, i81
  - Round 1: Set = {b, c, d, h}
  - Round 2: Set = {b, c, d, f, h}
  - No vertices remain: MIS = {b, c, d, f, h}
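
To illustrate why the choice of priorities affects set size, the following minimal Python sketch runs the same local-maximum selection with two priority assignments taken from the slides: the uniform random values from the random-permutation example and the degree-aware values from the example above. The edge list is an assumption chosen to be consistent with the slides' traces, and the serial loop only simulates the parallel rounds.

```python
# Hypothetical edge list for the nine-vertex example graph (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g"),
         ("b", "g"), ("c", "g"), ("f", "g"), ("g", "h"), ("h", "i")]

# Priorities as shown on the slides.
UNIFORM = {"a": 5, "b": 7, "c": 1, "d": 4, "e": 6, "f": 3, "g": 4, "h": 3, "i": 2}
DEGREE_AWARE = {"a": 65, "b": 87, "c": 82, "d": 76, "e": 74,
                "f": 73, "g": 35, "h": 89, "i": 81}

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_max_mis(adj, prio):
    """Repeatedly add local priority maxima and remove them and their neighbors."""
    alive, mis = set(adj), set()
    while alive:
        winners = {v for v in alive
                   if all((prio[v], v) > (prio[u], u) for u in adj[v] if u in alive)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        alive -= removed
    return mis

adj = build_adjacency(EDGES)
print(sorted(local_max_mis(adj, UNIFORM)))       # ['b', 'c', 'e', 'h']
print(sorted(local_max_mis(adj, DEGREE_AWARE)))  # ['b', 'c', 'd', 'f', 'h']
```

Giving the high-degree vertex g a low priority keeps it out of the set, so fewer other vertices are excluded and the resulting set is larger in this example (5 versus 4 vertices).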

ECL-MIS Features
- Single initialization
  - Requires less work (faster)
  - Enables asynchronous implementation (faster)
- Permutation-selection function
  - Boosts set size (higher-quality result)
  - Requires only a few bits (lower memory footprint)
- Combined priority and status information
  - Reduces storage (lower memory footprint)
  - Minimizes memory accesses (faster)

Permutation-Selection Function
- Requirements
  - Has to work for all graphs
  - Needs to be proportional to 1/degree
  - Do not know highest degree (but know average)
- Our solution
  - priority(v) = avg_degree / (avg_degree + degree*(v))
  - degree* includes random fraction (e.g., 3.xyz)
  - Scaled to small integer (e.g., a byte)
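
The formula above can be evaluated directly. This is a minimal Python sketch, not the ECL-MIS code: the function name, the scaling to 0..255, the truncation to an integer, and the use of a uniform random fraction in [0, 1) are all assumptions made for illustration.

```python
import random

def permutation_priority(degree, avg_degree, fraction, max_level=255):
    """priority(v) = avg_degree / (avg_degree + degree*(v)), scaled to a small integer.

    degree* is the vertex degree plus a random fraction in [0, 1), which
    randomizes the order of vertices that share the same degree.
    Requires avg_degree > 0. The scaling to max_level is an assumption.
    """
    degree_star = degree + fraction
    raw = avg_degree / (avg_degree + degree_star)   # in (0, 1]
    return int(raw * max_level)

# Without the random fraction: degree equal to the average maps to mid-range.
print(permutation_priority(4, 4.0, 0.0))   # 127
print(permutation_priority(0, 4.0, 0.0))   # 255

# With the random fraction: two degree-3 vertices usually get different priorities.
rng = random.Random(1)
print(permutation_priority(3, 4.0, rng.random()), permutation_priority(3, 4.0, rng.random()))
```

Because degree* equal to avg_degree yields exactly one half before scaling, half of the priority range is spent on vertices below the average degree, and only the average degree needs to be known in advance.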

Permutation-Selection Function
- Wide range at low degrees
  - 50% of range is below avg degree
  - Makes ties unlikely
- Narrow range at high degrees
  - Ties are likely but unimportant

Combining Information
- Standard implementation (2 arrays)
  - 1st array: vertex state (undecided, in set, out of set)
  - 2nd array: vertex priority (random number)
- Our implementation (1 array)
  - 7 MSBs hold combined status and priority
    - Reserved highest value: in set (= higher than its neighbors)
    - Reserved lowest value: out of set (= removed from graph)
    - Remaining values = priority
  - LSB = decided/undecided (to boost performance)
[Figure: single array with entries in, prio, prio, ..., prio, prio, out]
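
One possible byte layout matching the description above is sketched below in Python. It is an illustration under stated assumptions, not the ECL-MIS code: the constant and function names are hypothetical, and the polarity of the LSB (1 = undecided) is an assumption, since the slides do not state it.

```python
PRIORITY_BITS = 7
IN_SET = ((1 << PRIORITY_BITS) - 1) << 1   # 7 MSBs all ones, LSB 0 (decided) -> 254
OUT_OF_SET = 0                             # 7 MSBs all zeros, LSB 0 (decided)
MIN_PRIORITY = 1                           # lowest non-reserved 7-bit value
MAX_PRIORITY = (1 << PRIORITY_BITS) - 2    # highest non-reserved 7-bit value (126)

def encode_undecided(priority):
    """Pack a priority into the 7 MSBs and set the LSB to mark the vertex undecided."""
    if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
        raise ValueError("priority must be in 1..126")
    return (priority << 1) | 1

def is_decided(value):
    return (value & 1) == 0

def status(value):
    if value == IN_SET:
        return "in"
    if value == OUT_OF_SET:
        return "out"
    return "undecided"

v = encode_undecided(100)
print(v, status(v), is_decided(v))        # 201 undecided False
print(OUT_OF_SET < v < IN_SET)            # True
```

With this layout, a plain integer comparison of two bytes orders the states as out < any undecided priority < in, so one array read per neighbor provides both status and priority.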

Results

Methodology
- System
  - GPUs: Titan X and K40, nvcc 8.0
  - CPUs: 2 Xeon E5-2687W v3 (20 cores, 3.1GHz), gcc 5.3
- GPU MIS codes
  - CUSP, ECL, IrGL, and Pannotia
- CPU MIS codes
  - Ligra, Ligra+, and PBBS (Cilk and OpenMP, incremental and non-deterministic)
  - PBBS (serial)

Input Graphs
- 16 graphs
  - Real-world + synth.
  - All made undirected
- Sizes
  - 66k to 24M vertices
  - 387k to 524M edges
  - 0 to 214k degrees

graph               vertices        edges  min deg  max deg  avg deg
2d-2e20.sym        1,048,576    4,190,208        2        4      4.0
amazon0601           403,394    4,886,816        1    2,752     12.1
as-skitter         1,696,415   22,190,596        1   35,455     13.1
citationCiteseer     268,495    2,313,294        1    1,318      8.6
cit-Patents        3,774,768   33,037,894        1      793      8.8
coPapersDBLP         540,486   30,491,458        1    3,299     56.4
delaunay_n24      16,777,216  100,663,202        3       26      6.0
in-2004            1,382,908   27,182,946        0   21,869     19.7
internet             124,651      387,240        1      151      3.1
kron_g500-logn21   2,097,152  182,081,864        0  213,904     86.8
r4-2e23.sym        8,388,608   67,108,846        2       26      8.0
rmat16.sym            65,536      967,866        0      569     14.8
rmat22.sym         4,194,304   65,660,814        0    3,687     15.7
uk-2002           18,520,486  523,574,516        0  194,955     28.3
USA-road-d.NY        264,346      730,100        1        8      2.8
USA-road-d.USA    23,947,347   57,708,624        1        9      2.4

Titan X Performance (Edges/Second)
- ECL-MIS is >3.9x faster on each tested graph
- >100x faster than other codes on kron
- >12x faster than other codes on average

K40 Performance (Edges/Second)
- ECL-MIS is >3.8x faster on each tested graph
- >70x faster than other codes on kron
- >9x faster than other codes on average

Set Size (Deterministic)
- ECL-MIS yields largest set on all but one graph
- 10% larger on average

Performance Optimizations
- Synchronous execution yields an 18% slowdown
- Using 16-bit values yields a 15% slowdown
- Using 32-bit values yields a 33% slowdown
- Using 2 separate arrays yields an 18% slowdown
- Not using randomization yields a 6% slowdown
- Visiting all neighbors yields a 59% slowdown
- Combination of optimizations is key

Comparison to CPU Codes (Averages)
- Set size
  - 9% to 11% larger on average
  - ECL-MIS yields largest set on 15 of 16 graphs
- Performance
  - >2.9x faster than CPU codes on average
  - ECL-MIS is fastest on all but one tested graph

Summary
- ECL-MIS maximal independent set algorithm
  - Fastest GPU implementation (due to optimizations)
  - Produces largest sets (due to permutation selection)
  - Atomic-free CUDA implementation
  - http://cs.txstate.edu/~burtscher/research/ECL-MIS/
- Acknowledgments
  - NSF grant 1406304
  - Nvidia donations
