Solving Applied Graph Theory Problems in the JuliaGraphs Ecosystem

James P Fairbanks Oct 24, 2018

Georgia Tech Research Institute

1 Introduction The Julia Programming Language

• 1.0 stable release in Aug 2018 • Multiple dispatch • Dynamic Type system • JIT Compiler • Metaprogramming • Single machine, GPU, and distributed parallelism • Open Source (MIT License)

2 Julia Performance Benchmarks

Figure 1: Benchmark times relative to C (smaller is better, C performance = 1.0) (Source: julialang.org)

3 My path to Julia

• Started in pure math (chalk), • Intro programming class (Java) • Grad school in CSE, Cray XMT (C) C++ was too modern • Numpy/Pandas bandwagon (Python) • Numerical Graph Algorithms (Julia)

4 Outline

• LightGraphs.jl • Spectral Clustering • 2 language problem • Fake News • Future Directions

5 LightGraphs.jl

6 LightGraphs.jl is a central vertex

7 Generic Programing in LightGraphs.jl

Interface for subtypes of AbstractGraph.

• edges • Base.eltype • has edge • has vertex • inneighbors • ne • nv • outneighbors • vertices • is directed

8 for Spectral Partitioning Spectral Clustering is Graphs + FP

Figure 2: A graph with four natural clusters. 9 Spectral Graph Matrices

Graphs and Linear Algebra  1, if vi ∼ vj Adjacency Matrix Aij = 0, otherwise

Degrees Dii = di = deg(vi ) all other entries 0.

Combinatorial Laplacian L = D − A

Normalized Laplacian Lˆ = I − D−1/2AD−1/2

Normalized Adjacency Aˆ = D−1/2AD−1/2

10 Spectral Graph Types

11 Spectral Graph Types

11 Spectral Graph Types

NonBacktracking operator

B(s,t),(u,v) = Ast ∗ Auv ∗ δtu ∗ (1 − δsv )

11 Graph Clustering is Finding Block Structure

Adjacency Matrix of Scrambled Block Graph Adjacency Matrix of Block Graph 0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500 0 0

500 500

1000 1000

1500 1500 destination destination

2000 2000

2500 2500

source source

Figure 3: A Stochastic Block Model graph showing the recovery of clusters. On left the adjacency matrix with a random labeling. On right the adjacency matrix permuted to reveal block structure.

12 The Spectral Sweep Cut Method

• Solve Lˆx = λ2x to accuracy τ − 1 • Sort D 2 x − 1 • Check all n sweep cuts of D 2 x for best partition

• Recursive bisection applies spectral sweep cuts to each part • Can achieve any number of clusters • First step is the bottleneck

• Large τ seems to work in practice • The goal is to ensure that the final partition recovers the clusters

13 The Spectral Sweep Cut Method

• Solve Lˆx = λ2x to accuracy τ − 1 • Sort D 2 x − 1 • Check all n sweep cuts of D 2 x for best partition

• Recursive bisection applies spectral sweep cuts to each part • Can achieve any number of clusters • First step is the bottleneck

• Large τ seems to work in practice • The goal is to ensure that the final partition recovers the clusters

13 The Spectral Sweep Cut Method

• Solve Lˆx = λ2x to accuracy τ − 1 • Sort D 2 x − 1 • Check all n sweep cuts of D 2 x for best partition

• Recursive bisection applies spectral sweep cuts to each part • Can achieve any number of clusters • First step is the bottleneck

• Large τ seems to work in practice • The goal is to ensure that the final partition recovers the clusters

13 Ring of Cliques

Rb,q: q blocks each of size b.  t t  Jb e1e1 0 ··· e1e1  t t  e1e Jb e1e 0   1 1   t t   0 e1e1 Jb e1e1 0     ......   . . . .     e et J e et   0 1 1 b 1 1 t t e1e1 0 e1e1 Jb

t where Jb = 1b1b − Ib.

Figure 5: The adjacency matrix Figure 4: A drawing of Rb,q of Rb,q has block circulant laid out to show structure. structure. 14 Ring of Cliques

Rb,q: q blocks each of size b.

Figure 6: A drawing of Rb,q laid out to show structure. 15 Ring of Cliques Correct Partitions

−2 −1 • φG = O b q is optimal −2 • φ∗ = O b is sufficient to find clusters with recursive bisection sweep cut

Figure 7: In a “correct” partition of Rb,q all cliques are entirely red or entirely blue.

16 Ring of Cliques Correct Partitions

−2 −1 • φG = O b q is optimal −2 • φ∗ = O b is sufficient to find clusters with recursive bisection sweep cut

Figure 7: In a “correct” partition of Rb,q all cliques are entirely red or entirely blue.

16 Eigenpairs of Rb,q normalized Adjacency Matrix

λ2=0.0016

λ4=0.0061

λ16=0.0320

0 20 40 60 80 100 120 140 160 vertex index

Figure 8: The eigenvectors of Lˆ for R10,16. The eigenvectors with eigenvalues close to 0 indicate the block structure with differing frequencies. 17 Rb,q Spectrum

λ (k) λ (k) 1 2 Xnoise 0.0 0.2 0.4 0.6 0.8 1.0 1.2

Figure 9: Spectrum of Lˆ of R34,10.

• [λ2, λq] in green ˆ 1 • λ(A) > 2 eigenvectors recover partition

b q n δ2 50 20 1000 1.2275 × 10−4 500 10 5000 5.1487 × 10−6 −2 • Spectral gap δ2 = O n • Blend gap δq = O (1) 18 Bidirectional learning

We learn about data through algorithms and about algorithms through data.

19 Legacy Languages Slow Down Research

ARPACK is the universally approved eigensolver implementation of eigs.

ArnoldiMethod.jl is a modern Julia implementation 20 Summary

• Spectral blends lead to lower accuracy requirements in spectral partitioning

  • Residual tolerance necessary for Fiedler partitions is O n−5/2  −1/2 while data clusters are resolved at O n for Rb,q

• Analyzing model problems for data mining is important for determining accuracy requirements for numerical methods

• Data Analysis structure can be revealed faster than accurate solutions to numerical problem

21 Novel Stopping Criteria for Spectral Partitioning Alternative Stopping Criteria

i i − 1 i Let x be the i-th iterate and y = D 2 x .

Name Criterion Guarantee1 √ ˆ i i Residual Lx − µx < τ 2λ2 + τ Difference |φ yi  − φ yi+1| < τ — √ Naive Cheeger φ yi  < 2µ trivial

i  p √ Accurate Cheeger φ y < 2(µ − r) 2λ2

Accurate Cheeger reduces iterations by 4X with only an increase in φ yi  of less than 2X on average over real world data sets. 1 Assuming µ − λ2 < |µ − λi | for all i 6= 2 22 Real World Networks Sizes and Eigengaps

Graph Name VE λ2 λ3 δ2 Newman/dolphins 62 318 0.03952 0.23435 0.19483 SNAP/wiki-Vote 8297 103689 0.10055 0.16859 0.06804 Newman/football 115 1226 0.13680 0.18292 0.04612 SNAP/Oregon-2 11806 65460 0.02919 0.04191 0.01272 SNAP/soc-Epinions1 75888 508837 0.00479 0.01323 0.00844 SNAP/cit-HepTh 27770 352807 0.01839 0.02391 0.00552 SNAP/soc-sign-epinions 131828 841372 0.01123 0.01575 0.00452 Newman/lesmis 77 508 0.08813 0.09222 0.00409 SNAP/soc-Slashdot0902 82168 948464 0.01174 0.01321 0.00147 SNAP/email-Enron 36692 367662 0.00353 0.00454 0.00101 Newman/cond-mat-2005 40421 351382 0.00428 0.00484 0.00056 SNAP/web-NotreDame 325729 1497134 0.00134 0.00182 0.00048 SNAP/web-Stanford 281903 2312497 0.00002 0.00008 0.00006 SNAP/web-Google 916428 5105039 0.00043 0.00044 0.00001 23 Summary

p • Stopping when φ yi  ≤ 2(µ − r) leads to good performance on real world networks • Accurate Cheeger reduces iterations by 4X with only an increase in φ yi  of less than 2X on average over real world data sets • Reduction in iterations increases as problems get harder • Analysis of numerical accuracy and graph analysis objectives leads to better performance • Learning about algorithms from data and data from algorithms

24 Julia is an ideal environment

• Hard to write graph algorithms in high level languages • Hard to inspect and modify numerical algorithm written in low level languages • Julia lets you solve problem in one language

25 Summary

• Low accuracy eigenvectors are useful for graph partitioning and easier to compute • Spectral blends are a powerful analytic tool. • Analyzing numerical accuracy in a graph analysis context leads to improved algorithms and a deeper understanding of computational phenomena

26 Integrating HPL with HPC Libraries Graph Analysis

• Applications: Cybersecurity, Social Media, Fraud Detection...

(a) Big Graphs (b) HPC

(c) Productivity 27 Types of Graph Analysis Libraries

• Purely High productivity Language with simple data structures • Low level language core with high productivity language interface.

Name High Level Interface Low Level Core Parallelism SNAP Python C++ OpenMP igraph Python, R C - graph-tool Python C++ (BGL) OpenMP NetworKit Python C++ OpenMP Stinger Julia (new) C OpenMP/Julia

Table 1: Libraries using the hybrid model

28 Why is graph analysis is harder than scientific computing?

(a) z = exp(a + b2) (b) BFS from s

Figure 11: Computations access patterns in scientific computing and graph analysis

• Less regular computation • Diverse user defined functions beyond arithmetic • Temporary allocations kill performance 29 High Productivity Languages

Feature Python R Ruby Julia

REPL XXXX Dynamic Typing XXXX Compilation × × × X Multithreading Limited × Limited X

Table 2: Comparison of features of High Productivity Languages

30 STINGER

• A complex data structure for graphs in C • Parallel primitives for graph algorithms

31 Addressing the 2 language problem using Julia

• Two languages incurs development complexity • All algorithms in Julia • Reuse only the complex STINGER data structure from C • Parallel constructs in Julia, NOT low level languages

32 Integrating Julia with STINGER

• All algorithms in Julia • Reuse only the complex STINGER data structure from C • Parallel constructs in Julia, not low level languages • Productivity + Performance!

33 Graph 500 benchmark

• Standard benchmark for large graphs • BFS on a RMAT graph • 2scale vertices • 2scale ∗ 16 edges • Comparing BFS on graphs from scale 10 to 27 in C and using StingerGraphs.jl • A multithreaded version of the BFS with up to 64 threads was also run using both libraries

34 Results Preview

Threads = 1 Threads = 6 Threads = 12 2.5 Stinger 2.0 StingerGraphs.jl

1.5

1.0

0.5 Normalized Runtime Normalized Runtime Normalized Runtime 0.0 10 12 14 16 18 20 22 24 26 10 12 14 16 18 20 22 24 26 10 12 14 16 18 20 22 24 26 Scale Threads = 24 Threads = 48 2.5

2.0

1.5

1.0

0.5 Normalized Runtime Normalized Runtime 0.0 10 12 14 16 18 20 22 24 26 10 12 14 16 18 20 22 24 26 Scale Scale

Figure 12: Graph500 Benchmark Results (Normalized to STINGER – C) 35 Moving data kills performance

Bulk transfer of memory between memory spaces is more expensive than direct iteration

Scale Exp (I) Exp (G) BFS (I) BFS (G) 10 1.03 2.43 252.17 1833.70 11 2.21 4.92 504.37 3623.40 12 4.64 10.33 1034.36 7239.56 13 9.70 21.04 2142.28 14461.98 14 20.79 44.18 4328.72 28767.98 15 58.11 107.91 12583.00 67962.16 16 127.92 225.55 27036.85 128637.68

Table 3: Iterators (I) vs Gathering successors (G) – all times in ms

36 Parallelism options in Julia

• MPI style remote processes • Cilk style Tasks that are lightweight “green” threads • OpenMP style native multithreading support - @threads

We use the @threads primitives to avoid communication costs

37 Julia Atomics

• Atomic type on which atomic ops are dispatched • Atomic{T} contains a reference to a Julia variable of type T • Extra level of indirection for a vector of atomics

Figure 13: Julia provides easy access to LLVM/Clang intrinsics

38 Unsafe Atomics

Standard atomic types give poor performance, UnsafeAtomics.jl package reduces overhead.

Figure 14: Atomic data structures in Julia 39 Unsafe Atomics Performance

Exp Exp Exp(N)/ BFS BFS BFS(N)/ Scale (N) (U) Exp(U) (N) (U) BFS(U) 10 0.13 0.1 1.3 47.23 43.27 1.10 11 0.27 0.23 1.17 98.99 91.32 1.08 12 0.62 0.47 1.32 217.44 190.74 1.14 13 1.31 0.97 1.35 505.59 420.84 1.20 14 2.7 2.17 1.24 1158.3 977.1 1.185 15 5.74 3.93 1.46 2576.18 2154.5 1.20 16 11.6 8.77 1.32 5565.87 4559.16 1.22

Table 4: Atomics: Native (N) VS Unsafe (U) (Times in ms)

40 Results: Parallel run times are comparable for Graph500 Runtimes

Threads STINGER Stinger.jl Slowdown 1 276.46 250.18 0.90x 6 169.93 237.21 1.40x 12 140.53 185.74 1.32x 24 97.73 145.83 1.49x 48 86.41 103.08 1.19x

Table 5: Total time to run Graph500 BFS benchmark for all graphs scale 10-27, in minutes

41 Results: Parallel Scaling is competitive with OpenMP

Scale 27 BFS 10000 Stinger StingerGraphs.jl 8000

6000

4000

Runtime (seconds) 2000

0 1 6 12 24 48 Threads

Figure 15: Performance scaling with threads 42 Summary

• Tight integration between high productivity and high performance languages is possible • Julia is ready for HPC graph workloads • Julia parallelism can compete with OpenMP parallelism • We can expand HPC in High Level Languages beyond traditional scientific applications

43 Fast Abstractions Open New Possibilities

• Stinger has a huge problem: inflexibility • Because it was written in C for the XMT with no abstraction • Can’t change the types of the edge weights

44 Stinger has no Abstraction

• Stinger metadata is totally rigid • MetaGraphs.jl uses dictionaries for each edge like NetworkX

45 Finding Fake News with Math Fake News Flavors

46 Fake News and the Modern Web

• MOTIVE: Clickbait revenue streams and political campaign funding incentivizes low quality articles to attract readers • MEANS: The democratization of online media allows anyone to setup a website and publish unadjudicated content • OPPORTUNITY: Social media has decentralized the enforcement of ethical norms in journalism

Manual fact-checking is valuable but suffers from slow reaction 47 times and accusations of bias. Approach

48 Bias Detection

Humans can pick up on nuanced but powerful signals of bias in terms of semantics, sentiment (tone) and content.

49 Content Model

Figure 16: Content Model Pipeline

50 Credibility Assessment: Fake or Not Fake?

51 When words are not enough. . . Source: HillaryDaily.com

• Trump Slams ‘Crooked Hillary’ for Making Excuses for Losing the Election • New Book Reveals That Obama Pushed Hillary to Concede in 2016 Election • 2016 Democratic Presidential Candidate Blasts Media for Being Against Trump ‘Right from the Beginning’ • NYT Report Suggests Trump & Russia Conspired During Election–But Then There’s the Fine Print. . . • Michelle Obama: If I Ran Against Trump I Would Have Beaten Him Easily! • Chelsea’s Husband Shutters Hedge Fund 52 After Hillary Loses Nearly All Influence Structural Analysis Example: Stars

• Sites of degree 1 with one central, common site • Commonly seen in CSS link networks

53 Figure 17: Shared CSS sourced from Big News Network

54 Structural Method

Figure 18: Structural Method Pipeline

54 Structural Method: Graph Creation

HTML Tag Description Mutually linked sites (text content) Shared CSS (visual style)