A High-Quality and Fast Maximal Independent Set Algorithm for GPUs
Martin Burtscher and Sindhu Devale Department of Computer Science Overview Introduction Serial and parallel algorithms Our parallel algorithm Optimizations Results Summary
A High-Quality and Fast MIS Algorithm 2 Maximal Independent Set Maximal independent set ( MIS ) Subset of vertices of undirected graph Vertices in subset are independent (not adjacent) Subset is maximal (all other vertices are adjacent) Not unique a
b c d e f Largest possible MIS Maximum independent set g NP-hard to compute h i
A High-Quality and Fast MIS Algorithm 3 Importance of MIS Building block of many parallel graph algorithms Graph coloring Maximal matching a 2-satisfiability b c d e f Maximal set packing Odd set cover problem g etc. h i
A High-Quality and Fast MIS Algorithm 4 Importance of MIS (cont.) Parallelization of complex computations Supports arbitrary and dynamically changing conflicts 1. Build graph (vertices = computations, edges = conflicts) 2. Compute MIS 3. Run computations in MIS in parallel (w/o locks or atomics) 4. Repeat if necessary E.g., Delaunay mesh refinement
Approach is only useful if MIS can be computed quickly in parallel and benefits from large sets
A High-Quality and Fast MIS Algorithm 5 Highlights ECL-MIS algorithm for massively-parallel devices
Fastest MIS runtimes on modern GPUs Randomized permutation selection function
Largest set sizes among many MIS algorithms New optimizations
Enhance performance and reduce memory footprint
A High-Quality and Fast MIS Algorithm 6 Serial Algorithm
A High-Quality and Fast MIS Algorithm 7 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g Start with empty set
h i Set = {}
A High-Quality and Fast MIS Algorithm 8 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g a has no neighbor in set Add vertex a h i Set = {a}
A High-Quality and Fast MIS Algorithm 9 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g b has neighbor in set Discard vertex b h i Set = {a}
A High-Quality and Fast MIS Algorithm 10 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g c has neighbor in set Discard vertex c h i Set = {a}
A High-Quality and Fast MIS Algorithm 11 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g d has neighbor in set Discard vertex d h i Set = {a}
A High-Quality and Fast MIS Algorithm 12 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g e has no neighbor in set Add vertex e h i Set = {a, e}
A High-Quality and Fast MIS Algorithm 13 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g f has neighbor in set Discard vertex f h i Set = {a, e}
A High-Quality and Fast MIS Algorithm 14 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g g has neighbor in set Discard vertex g h i Set = {a, e}
A High-Quality and Fast MIS Algorithm 15 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g h has no neighbor in set Add vertex h h i Set = {a, e, h}
A High-Quality and Fast MIS Algorithm 16 Serial Algorithm Repeating steps a Visit unvisited vertex Add vertex to set if no graph neighbors in set b c d e f Example g i has neighbor in set Discard vertex i h i MIS = {a, e, h}
A High-Quality and Fast MIS Algorithm 17 Luby’s Random-Priority Parallel MIS Algorithm
A High-Quality and Fast MIS Algorithm 18 Random-Priority Algorithm (Luby) Repeating steps a Assign random priorities Add vertices with highest local priority to set b c d e f Remove their neighbors from graph g
Set = {} h i
A High-Quality and Fast MIS Algorithm 19 Random-Priority Algorithm (Luby) Repeating steps Assign random priorities a5 Add vertices with highest local priority to set b c d e f 7 1 4 6 3 Remove their neighbors from graph g4 Set = {} h3 i2
A High-Quality and Fast MIS Algorithm 20 Random-Priority Algorithm (Luby) Repeating steps Assign random priorities a5 Add vertices with highest local priority to set b c d e f 7 1 4 6 3 Remove their neighbors from graph g4 Set = { b, e} h3 i2
A High-Quality and Fast MIS Algorithm 21 Random-Priority Algorithm (Luby) Repeating steps a Assign random priorities Add vertices with highest local priority to set b c d e f Remove their neighbors from graph g
Set = {b, e} h i
A High-Quality and Fast MIS Algorithm 22 Random-Priority Algorithm (Luby) Repeating steps a Assign random priorities Add vertices with highest local priority to set b c d e f 9 Remove their neighbors from graph g
Set = {b, e} h5 i6
A High-Quality and Fast MIS Algorithm 23 Random-Priority Algorithm (Luby) Repeating steps a Assign random priorities Add vertices with highest local priority to set b c d e f 9 Remove their neighbors from graph g
Set = {b, c, e, i} h5 i6
A High-Quality and Fast MIS Algorithm 24 Random-Priority Algorithm (Luby) Repeating steps a Assign random priorities Add vertices with highest local priority to set b c d e f Remove their neighbors from graph g
MIS = {b, c, e, i} h i
A High-Quality and Fast MIS Algorithm 25 Random-Permutation Parallel MIS Algorithm
A High-Quality and Fast MIS Algorithm 26 Random-Permutation Algorithm Initialization a Assign random priorities Repeating steps b c d e f Add vertices with highest local priority to set g Remove neighbors and their edges from graph
h i Set = {}
A High-Quality and Fast MIS Algorithm 27 Random-Permutation Algorithm Initialization Assign random priorities a5 Repeating steps Add vertices with highest b7 c1 d4 e6 f3 local priority to set g Remove neighbors and 4 their edges from graph
h i 3 2 Set = {}
A High-Quality and Fast MIS Algorithm 28 Random-Permutation Algorithm Initialization Assign random priorities a5 Repeating steps Add vertices with highest b7 c1 d4 e6 f3 local priority to set g Remove neighbors and 4 their edges from graph
h i 3 2 Set = {b, e}
A High-Quality and Fast MIS Algorithm 29 Random-Permutation Algorithm Initialization Assign random priorities a5 Repeating steps Add vertices with highest b7 c1 d4 e6 f3 local priority to set g Remove neighbors and 4 their edges from graph
h i 3 2 Set = {b, e}
A High-Quality and Fast MIS Algorithm 30 Random-Permutation Algorithm Initialization Assign random priorities a5 Repeating steps Add vertices with highest b7 c1 d4 e6 f3 local priority to set g Remove neighbors and 4 their edges from graph
h i 3 2 Set = {b, c, e, h}
A High-Quality and Fast MIS Algorithm 31 Random-Permutation Algorithm Initialization Assign random priorities a5 Repeating steps Add vertices with highest b7 c1 d4 e6 f3 local priority to set g Remove neighbors and 4 their edges from graph
h i 3 2 MIS = {b, c, e, h}
A High-Quality and Fast MIS Algorithm 32 Luby’s Random-Selection Parallel MIS Algorithm
A High-Quality and Fast MIS Algorithm 33 Random-Selection Algorithm (Luby) Repeating steps a Mark vertices with probability 0.5/degree Add marked vertices to set b c d e f if no marked neighbors Remove their neighbors g from graph
h i Set = {}
A High-Quality and Fast MIS Algorithm 34 Random-Selection Algorithm (Luby) Repeating steps Mark vertices with ao probability 0.5/degree Add marked vertices to set bi co di eo fo if no marked neighbors Remove their neighbors go from graph
hi ii Set = {}
A High-Quality and Fast MIS Algorithm 35 Random-Selection Algorithm (Luby) Repeating steps Mark vertices with ao probability 0.5/degree Add marked vertices to set bi co di eo fo if no marked neighbors Remove their neighbors go from graph
hi ii Set = {b, d}
A High-Quality and Fast MIS Algorithm 36 Random-Selection Algorithm (Luby) Repeating steps a Mark vertices with probability 0.5/degree(v) Add marked vertices to set b c d e f if no marked neighbors Remove their neighbors g from graph
h i Set = {b, d}
A High-Quality and Fast MIS Algorithm 37 Random-Selection Algorithm (Luby) Repeating steps a Mark vertices with probability 0.5/degree Add marked vertices to set b ci d e fi if no marked neighbors Remove their neighbors g from graph
ho ii Set = {b, d}
A High-Quality and Fast MIS Algorithm 38 Random-Selection Algorithm (Luby) Repeating steps a Mark vertices with probability 0.5/degree Add marked vertices to set b ci d e fi if no marked neighbors Remove their neighbors g from graph
ho ii Set = {b, c, d, f, i}
A High-Quality and Fast MIS Algorithm 39 Random-Selection Algorithm (Luby) Repeating steps a Mark vertices with probability 0.5/degree Add marked vertices to set b c d e f if no marked neighbors Remove their neighbors g from graph
h i MIS = {b, c, d, f, i}
A High-Quality and Fast MIS Algorithm 40 ECL-MIS Our Permutation-Selection Parallel MIS Algorithm
A High-Quality and Fast MIS Algorithm 41 Our Permutation-Selection Algorithm Initialization a Assign priorities ~ 1/deg Randomize within level b c d e f Repeating steps Add vertices with highest g local priority to set Remove their neighbors from graph h i Set = {}
A High-Quality and Fast MIS Algorithm 42 Our Permutation-Selection Algorithm Initialization Assign priorities ~ 1/deg a65 Randomize within level Repeating steps b87 c82 d76 e74 f73 Add vertices with highest local priority to set g35 Remove their neighbors from graph h89 i81 Set = {}
A High-Quality and Fast MIS Algorithm 43 Our Permutation-Selection Algorithm Initialization Assign priorities ~ 1/deg a65 Randomize within level Repeating steps b87 c82 d76 e74 f73 Add vertices with highest local priority to set g35 Remove their neighbors from graph h89 i81 Set = {}
A High-Quality and Fast MIS Algorithm 44 Our Permutation-Selection Algorithm Initialization Assign priorities ~ 1/deg a65 Randomize within level Repeating steps b87 c82 d76 e74 f73 Add vertices with highest local priority to set g35 Remove their neighbors from graph h89 i81 Set = {b, c, d, h}
A High-Quality and Fast MIS Algorithm 45 Our Permutation-Selection Algorithm Initialization Assign priorities ~ 1/deg a65 Randomize within level Repeating steps b87 c82 d76 e74 f73 Add vertices with highest local priority to set g35 Remove their neighbors from graph h89 i81 Set = {b, c, d, h}
A High-Quality and Fast MIS Algorithm 46 Our Permutation-Selection Algorithm Initialization Assign priorities ~ 1/deg a65 Randomize within level Repeating steps b87 c82 d76 e74 f73 Add vertices with highest local priority to set g35 Remove their neighbors from graph h89 i81 Set = {b, c, d, f, h}
A High-Quality and Fast MIS Algorithm 47 Our Permutation-Selection Algorithm Initialization a Assign priorities ~ 1/deg Randomize within level b c d e f Repeating steps Add vertices with highest g local priority to set Remove their neighbors from graph h i MIS = {b, c, d, f, h}
A High-Quality and Fast MIS Algorithm 48 ECL-MIS Features Single initialization Requires less work (faster) Enables asynchronous implementation (faster) Permutation-selection function Boosts set size (higher-quality result) Requires only a few bits (lower memory footprint) Combined priority and status information Reduces storage (lower memory footprint) Minimizes memory accesses (faster)
A High-Quality and Fast MIS Algorithm 49 Permutation-Selection Function Requirements Has to work for all graphs Needs to be proportional to 1/degree Do not know highest degree (but know average)
Our solution priority(v) = avg_degree / (avg_degree + degree *(v)) Degree * includes random fraction (e.g., 3. xyz ) Scaled to small integer (e.g., a byte)
A High-Quality and Fast MIS Algorithm 50 Permutation-Selection Function
Wide range at low degrees
50% of range is below avg degree
Makes ties unlikely Narrow range at high degrees
Ties are likely but unimportant
A High-Quality and Fast MIS Algorithm 51 Combining Information Standard implementation ( 2 arrays ) 1st array: vertex state (undecided, in set, out of set) 2nd array: vertex priority (random number)
Our implementation ( 1 array ) in 7 MSBs hold combined status and priority prio prio Reserved highest value: in set (= higher than its neighbors) . . Reserved lowest value: out of set (= removed from graph) . Remaining values = priority prio prio LSB = decided/undecided (to boost performance) out
A High-Quality and Fast MIS Algorithm 52 Results
A High-Quality and Fast MIS Algorithm 53 Methodology System GPUs: Titan X and K40 , nvcc 8.0 CPUs: 2 Xeon E5-2687W v3 (20 cores, 3.1GHz), gcc 5.3 GPU MIS codes CUSP , ECL, IrGL , and Pannotia CPU MIS codes Ligra, Ligra+, and PBBS (Cilk and OpenMP, incremental and non-deterministic) PBBS (serial)
A High-Quality and Fast MIS Algorithm 54 Input Graphs
vertex degree vertices edges min max avg 16 graphs 2d-2e20.sym 1,048,576 4,190,208 2 4 4.0 amazon0601 403,394 4,886,816 1 2,752 12.1 Real-world + synth. as-skitter 1,696,415 22,190,596 1 35,455 13.1 citationCiteseer 268,495 2,313,294 1 1,318 8.6 All made undirected cit-Patents 3,774,768 33,037,894 1 793 8.8 coPapersDBLP 540,486 30,491,458 1 3,299 56.4 delaunay_n24 16,777,216 100,663,202 3 26 6.0 in-2004 1,382,908 27,182,946 0 21,869 19.7 Sizes internet 124,651 387,240 1 151 3.1 kron_g500-logn21 2,097,152 182,081,864 0 213,904 86.8 66k – 24M vertices r4-2e23.sym 8,388,608 67,108,846 2 26 8.0 387k – 524M edges rmat16.sym 65,536 967,866 0 569 14.8 rmat22.sym 4,194,304 65,660,814 0 3,687 15.7 0 – 214k degrees uk-2002 18,520,486 523,574,516 0 194,955 28.3 USA-road-d.NY 264,346 730,100 1 8 2.8 USA-road-d.USA 23,947,347 57,708,624 1 9 2.4
A High-Quality and Fast MIS Algorithm 55 Titan X Performance (Edges/Second)
>100x faster than other codes on kron ECL-MIS is >3.9x faster on each tested graph >12x faster than other codes on average
A High-Quality and Fast MIS Algorithm 56 K40 Performance (Edges/Second)
>70x faster than ECL-MIS is >3.8x faster other codes on kron on each tested graph
>9x faster than other codes on average
A High-Quality and Fast MIS Algorithm 57 Set Size (Deterministic)
ECL-MIS yields largest set on all but one graph 10% larger on average
A High-Quality and Fast MIS Algorithm 58 Performance Optimizations
Synchronous execution Using 16-bit values yields an 18% slowdown yields a 15% slowdown
Using 32-bit values yields a 33% slowdown Using 2 separate arrays Not using randomization yields an 18% slowdown yields a 6% slowdown
Combination of Visiting all neighbors optimizations is key yields a 59% slowdown
A High-Quality and Fast MIS Algorithm 59 Comparison to CPU Codes (Averages)
9% to 11% larger >2.9x faster than CPU on average codes on average ECL-MIS yields largest set on 15 of 16 graphs ECL-MIS is fastest on all but one tested graph
A High-Quality and Fast MIS Algorithm 60 Summary ECL-MIS maximal independent set algorithm Fastest GPU implementation (due to optimizations) Produces largest sets (due to permutation selection) Atomic-free CUDA implementation http://cs.txstate.edu/~burtscher/research/ECL-MIS/
Acknowledgments NSF grant 1406304 Nvidia donations
A High-Quality and Fast MIS Algorithm 61