Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu

 We often think of networks as being organized into modules, clusters, communities:


[Figure: example networks — nodes and the network they form]

 Find micro-markets by partitioning the query-to-advertiser graph:
[Figure: bipartite graph of queries and advertisers]

[Andersen, Lang: Communities from seed sets, 2006]

 Clusters in a Movies-to-Actors graph:

[Andersen, Lang: Communities from seed sets, 2006]

 Discovering social circles, circles of trust:

[McAuley, Leskovec: Discovering social circles in ego networks, 2012]

 Graph is large
▪ Assume the graph fits in main memory
▪ For example, to work with a graph of 200M nodes and 2B edges one needs approx. 16 GB of RAM (2B edges × two 4-byte node IDs ≈ 16 GB)
▪ But the graph is too big for running anything more than linear-time algorithms
 We will cover a PageRank-based algorithm for finding dense clusters
▪ The runtime of the algorithm will be proportional to the cluster size (not the graph size!)

 Discovering clusters based on seed nodes
▪ Given: seed node s
▪ Compute (approximate) Personalized PageRank (PPR) around node s (teleport set = {s})
▪ Idea: if s belongs to a nice cluster, the random walk will get trapped inside the cluster

[Figure: random walk spreading out from the seed node]

[Figure: sweep plot — cluster "quality" (lower is better) vs. node rank in decreasing PPR score; dips mark good clusters]

 Algorithm outline:
▪ Pick a seed node s of interest
▪ Run PPR with teleport set = {s}
▪ Sort the nodes by decreasing PPR score
▪ Sweep over the nodes and find good clusters

 Undirected graph G(V, E) [Figure: example graph on nodes 1–6]
 Partitioning task:
▪ Divide vertices into two disjoint groups A and B = V∖A


 Question:
▪ How can we define a "good" cluster in G?

 What makes a good cluster?
▪ Maximize the number of within-cluster connections
▪ Minimize the number of between-cluster connections


 Express cluster quality as a function of the "edge cut" of the cluster
 Cut: set of edges (sum of edge weights) with only one node in the cluster:
cut(A) = Σ_{i∈A, j∉A} w_ij
Note: this works for weighted and unweighted graphs (for unweighted graphs, set all w_ij = 1).
[Figure: example graph where cut(A) = 2]

 Partition quality: cut score
▪ Quality of a cluster is the weight of connections pointing outside the cluster
 Not so uncommon case: [Figure: the minimum cut differs from the "optimal" cut]

 Problem:
▪ Only considers external cluster connections
▪ Does not consider internal cluster connectivity

[Shi-Malik]

 Criterion: Conductance — connectivity of the group to the rest of the network, relative to the density of the group:

φ(A) = cut(A) / min(vol(A), 2m − vol(A)),  where cut(A) = |{(i, j) ∈ E : i ∈ A, j ∉ A}|

 vol(A): total weight of the edges with at least one endpoint in A: vol(A) = Σ_{i∈A} d_i
▪ vol(A) = 2·#(edges inside A) + #(edges pointing out of A)
▪ m … number of edges of the graph; d_i … degree of node i; E … edge set of the graph
 Why use conductance? It produces more balanced partitions.
 Example: two cuts of the same graph, with φ = 2/4 = 0.5 and φ = 6/92 = 0.065 (see the sketch below for computing φ)

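A minimal sketch of computing conductance for an unweighted graph, assuming an illustrative adjacency representation (a dict of neighbor sets; not from the slides):

def conductance(adj, A):
    """phi(A) = cut(A) / min(vol(A), 2m - vol(A)) for an unweighted graph."""
    A = set(A)
    vol = sum(len(adj[u]) for u in A)                      # sum of degrees in A
    cut = sum(1 for u in A for v in adj[u] if v not in A)  # edges leaving A
    two_m = sum(len(adj[u]) for u in adj)                  # 2m = sum of all degrees
    return cut / min(vol, two_m - vol)

# Example: a 4-cycle split in half -> cut = 2, vol = 4, phi = 2/4 = 0.5
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(conductance(adj, {1, 2}))  # 0.5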

 Algorithm outline:
▪ Pick a seed node s of interest
▪ Run PPR with teleport set = {s}
▪ Sort the nodes by decreasing PPR score
▪ Sweep over the nodes and find good clusters
[Figure: conductance φ(A_i) vs. node rank i in decreasing PPR score; local dips mark good clusters]

 Sweep:
▪ Sort nodes by decreasing PPR score: r_1 > r_2 > … > r_n
▪ For each i compute φ(A_i), where A_i = {u_1, …, u_i}
▪ Local minima of φ(A_i) correspond to good clusters


 The whole sweep curve can be computed in linear time (see the sketch below):
▪ Iterate over the nodes in decreasing PPR order
▪ Keep a hash table of the nodes in the current set A_i
▪ To compute φ(A_{i+1}) = cut(A_{i+1}) / vol(A_{i+1}), update incrementally:
▪ vol(A_{i+1}) = vol(A_i) + d_{i+1}
▪ cut(A_{i+1}) = cut(A_i) + d_{i+1} − 2·#(edges of u_{i+1} to A_i)
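A minimal sketch of the linear-time sweep, assuming the same illustrative adjacency representation and a dict ppr of (approximate) PPR scores; it returns the best prefix by conductance:

def sweep(adj, ppr):
    """Scan prefixes of nodes sorted by decreasing PPR; return the best cut."""
    order = sorted(ppr, key=ppr.get, reverse=True)
    two_m = sum(len(adj[u]) for u in adj)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_i = float("inf"), 0
    for i, u in enumerate(order):
        d = len(adj[u])
        edges_to_set = sum(1 for v in adj[u] if v in in_set)
        vol += d                         # vol(A_{i+1}) = vol(A_i) + d_{i+1}
        cut += d - 2 * edges_to_set      # cut(A_{i+1}) = cut(A_i) + d_{i+1} - 2#(edges to A_i)
        in_set.add(u)
        denom = min(vol, two_m - vol)    # conductance denominator
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_i = cut / denom, i + 1
    return set(order[:best_i]), best_phi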

 How do we compute Personalized PageRank (PPR) without touching the whole graph?
▪ The power method won't work, since each single iteration accesses all nodes of the graph:
r^(t+1) = β M · r^(t) + (1 − β) a
▪ a is the teleport vector: a = [0 … 0 1 0 … 0]^T, with the 1 at the index of the seed node s
▪ r is the personalized PageRank vector

 Approximate PageRank (a.k.a. PageRank-Nibble) [Andersen, Chung, Lang, '07]
▪ A fast method for computing approximate Personalized PageRank (PPR) with teleport set S = {s}
▪ ApproxPageRank(s, β, ε)
▪ s … seed node
▪ β … teleportation parameter
▪ ε … approximation error parameter
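For contrast, a minimal sketch of the dense power iteration ruled out above (NumPy; M is assumed to be the column-stochastic transition matrix and s the seed index):

import numpy as np

def ppr_power(M, s, beta=0.8, iters=100):
    """Dense PPR power iteration: every step touches the whole graph."""
    n = M.shape[0]
    a = np.zeros(n)
    a[s] = 1.0                                # teleport vector: all mass on seed s
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * a   # O(#edges) work per iteration
    return r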

 Approximate PPR on an undirected graph
▪ Use a lazy random walk, a variant of a random walk that stays put with probability 1/2 at each time step, and walks to a random neighbor the other half of the time:
r_u^(t+1) = (1/2) r_u^(t) + (1/2) Σ_{(v,u)∈E} r_v^(t) / d_v   (d_i … degree of i)
▪ Keep track of the residual PPR score q_u = p_u − r_u^(t)
▪ The residual q_u measures how well the PPR score p_u of u is approximated:
▪ p_u … the "true" PageRank of node u
▪ r_u^(t) … the PageRank estimate of node u at round t
▪ If the residual of node u is too big, q_u / d_u ≥ ε, then push the walk further (distribute some of the residual q_u to all of u's neighbors along outgoing edges); else don't touch the node

 A different way to look at PageRank: [Jeh & Widom. Scaling Personalized Web Search, 2002]

p_β(a) = (1 − β) a + β · p_β(M · a)

▪ p_β(a) is the true PageRank vector with teleport parameter β and teleport vector a
▪ p_β(M · a) is the PageRank vector with teleport vector M · a and teleport parameter β
▪ where M is the stochastic PageRank transition matrix
▪ Notice: M · a is one step of a random walk

 Proving p_β(a) = (1 − β) a + β · p_β(M · a):
▪ We can break this probability into two cases: walks of length 0, and walks of length longer than 0
▪ The probability of a length-0 walk is 1 − β, and the walk ends where it started, with walker distribution a
▪ The probability of walk length > 0 is β: the walk starts at distribution a, takes one step (so it has distribution M·a), then takes the rest of the random walk, ending with distribution p_β(M·a)
▪ Note that we used the memoryless nature of the walk: once we know the second step of the walk has distribution M·a, the rest of the walk can forget where it started and behave as if it started at M·a. This proves the equation.

 Idea (r … approximate PageRank, q … its residual, q_u = p_u − r_u; a … teleport vector):
▪ Start with the trivial approximation: r = 0 and q = a
▪ Iteratively push PageRank from q to r until q is small

 Push: one step of a lazy random walk from node u:

Push(u, r, q):
    r′ = r, q′ = q
    r′_u = r_u + (1 − β) q_u          (update r; 1 − β … teleport prob.)
    q′_u = (1/2) β q_u                (stay at u with prob. ½)
    for each v such that u → v:
        q′_v = q_v + (1/2) β q_u / d_u   (spread the remaining ½ β fraction of q_u, as if a single lazy-walk step were applied to u)
    return r′, q′

 If q_u is large, this means that we have underestimated the importance of node u
 Then we want to take some of that residual q_u and give it away, since we know that we have too much of it
 So, we keep (1/2) β q_u and give away the rest to our neighbors, so that we can get rid of it
▪ This corresponds to the spreading of the (1/2) β q_u / d_u term
 Each node keeps giving away its excess PageRank until all nodes have no, or only a very small, gap of excess PageRank

ApproxPageRank(s, β, ε):
    set r = 0, q = [0 … 0 1 0 … 0]    (1 at the index of the seed s)
    while max_{u∈V} q_u / d_u ≥ ε:
        choose any u where q_u / d_u ≥ ε
        r, q = Push(u, r, q)
    return r

Notation: r … PPR vector, r_u … PPR score of u; q … residual PPR vector, q_u … residual of node u; d_u … degree of u
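A minimal runnable sketch of ApproxPageRank with the Push step above, assuming the illustrative dict-of-sets adjacency used earlier (undirected, every node of degree ≥ 1):

from collections import defaultdict

def approx_ppr(adj, s, beta=0.8, eps=1e-4):
    r = defaultdict(float)              # approximate PPR scores
    q = defaultdict(float)
    q[s] = 1.0                          # residual: all mass starts on the seed
    work = [s]                          # candidates with q_u / d_u >= eps
    while work:
        u = work.pop()
        du = len(adj[u])
        if q[u] / du < eps:             # stale entry, nothing to push
            continue
        qu = q[u]
        r[u] += (1 - beta) * qu         # move (1 - beta) q_u into the estimate
        q[u] = 0.5 * beta * qu          # lazy walk: stay at u with prob. 1/2
        if q[u] / du >= eps:
            work.append(u)
        for v in adj[u]:                # spread the other half over u's neighbors
            q[v] += 0.5 * beta * qu / du
            if q[v] / len(adj[v]) >= eps:
                work.append(v)
    return dict(r)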

Example run on a small graph with nodes s, a, b, c (vectors ordered [s, a, b, c]; here β = 0.5):

Init:          r = [0, 0, 0, 0]        q = [1, 0, 0, 0]
Push(s, r, q): r = [.5, 0, 0, 0]       q = [.25, .08, .08, .08]
Push(s, r, q): r = [.62, 0, 0, 0]      q = [.06, .10, .10, .10]
Push(a, r, q): r = [.62, .05, 0, 0]    q = [.09, .03, .10, .10]
Push(b, r, q): r = [.62, .05, .05, 0]  q = [.09, .05, .03, .10]
…
Final:         r = [0.57, 0.19, 0.14, 0.09]

[Figure: the example graph on nodes s, a, b, c]
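Putting the sketches together on an assumed toy graph, two triangles joined by a single edge (names from the snippets above):

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
ppr = approx_ppr(adj, s=1, beta=0.8, eps=1e-5)
cluster, phi = sweep(adj, ppr)
print(cluster, phi)  # expect roughly {1, 2, 3} with conductance 1/7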

 Runtime:
▪ Approximate PageRank computes PPR with residual error ≤ ε in time O(1 / (ε(1 − β)))
▪ The power method would take time O(log n / (ε(1 − β)))
 Graph cut approximation guarantee:
▪ If there exists a cut of conductance φ and volume k, then the method finds a cut of conductance O(√(φ log k))
▪ Details in [Andersen, Chung, Lang. Local graph partitioning using PageRank vectors, 2007] http://www.math.ucsd.edu/~fan/wp/localpartfull.pdf

 The smaller the ε, the farther the random walk will spread!

[Figure: PPR mass spreading around the seed node for decreasing ε]
[Andersen, Lang: Communities from seed sets, 2006]

[Figure: sweep plot — cluster "quality" (lower is better) vs. node rank in decreasing PPR score]

 Algorithm summary:
▪ Pick a seed node s of interest
▪ Run PPR with teleport set = {s}
▪ Sort the nodes by decreasing PPR score
▪ Sweep over the nodes and find good clusters

 What if we want our clustering to be based on other patterns (not edges)?

Small subgraphs (motifs, graphlets) are building blocks of networks [Milo et al., ’02]

[Figure: an example network and a motif occurring in it]

5/4/2021 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 34  Generalize cuts and volumes to motifs

▪ edges cut → motifs cut
▪ vol(S) = #(edge end-points in S) → vol_M(S) = #(motif end-points in S)

Optimize motif conductance [Benson et al., '16]

 Three basic stages:
▪ 1) Pre-processing: for motif M, build the weighted graph W^(M), where W_ij = #(times edge (i, j) participates in the motif) — see the sketch below
[Figure: graph G and the corresponding weighted graph W^(M)]
▪ 2) Approximate PageRank: same as before, but on the weighted graph W^(M)
▪ 3) Sweep: same as before
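A minimal sketch of stage 1 for the triangle motif: for a triangle, W_ij is the number of common neighbors of i and j (the representation and helper name are illustrative):

from collections import defaultdict

def triangle_motif_weights(adj):
    """W[(i, j)] = number of triangles that edge (i, j) participates in."""
    W = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            if u < v:                             # count each undirected edge once
                W[(u, v)] = len(adj[u] & adj[v])  # common neighbors = triangles
    return dict(W)

adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(triangle_motif_weights(adj))  # edge (2, 3) lies in 2 triangles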

5/4/2021 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 36 Pelagic fishes Micro- and benthic prey nutrient sources Benthic Fishes

Use multiple eigenvectors or recursive bi-partitioning to get multiple clusters

Benthic Macroinvertebrates

[Figure: neuron locations]
 The "bi-fan" motif is known to be important in neural networks [Milo et al., '02]
 Ring motor (RME*) neurons act as inputs
 Inner labial sensory (IL2*) neurons are the destinations
 URA neurons act as intermediaries

 Communities: sets of tightly connected nodes
 Define modularity Q:
▪ A measure of how well a network is partitioned into communities
▪ Given a partitioning of the network into groups s ∈ S:

Q ∝ Σ_{s∈S} [ (#edges within group s) − (expected #edges within group s) ]

Need a null model!

5/4/2021 Jure Leskovec, Stanford CS224W: Analysis of Networks 40  Given real 푮 on 풏 nodes and 풎 edges, construct rewired network 푮’

▪ Same but i random connections j ▪ Consider 푮’ as a ▪ The expected number of edges between nodes 풌 풌 풌 풊 and 풋 of degrees 풌 and 풌 equals to: 풌 ⋅ 풋 = 풊 풋 풊 풋 풊 ퟐ풎 ퟐ풎 ▪ The expected number of edges in (multigraph) G’: ퟏ 풌 풌 ퟏ ퟏ ▪ = σ σ 풊 풋 = ⋅ σ 풌 σ 풌 = ퟐ 풊∈푵 풋∈푵 ퟐ풎 ퟐ ퟐ풎 풊∈푵 풊 풋∈푵 풋 Note: ퟏ ▪ = ퟐ풎 ⋅ ퟐ풎 = 풎 ෍ 푘푢 = 2푚 ퟒ풎 푢∈푉

 Modularity of partitioning S of graph G:
▪ Q ∝ Σ_{s∈S} [ (#edges within group s) − (expected #edges within group s) ]
▪ Q(G, S) = (1/2m) Σ_{s∈S} Σ_{i∈s} Σ_{j∈s} ( A_ij − k_i k_j / 2m )
▪ A_ij = 1 if there is an edge i→j, 0 otherwise (an indicator function); 1/2m is a normalizing constant, so that −1 ≤ Q ≤ 1
 Idea: we can identify communities by maximizing modularity (a sketch of the computation follows below)
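A minimal sketch of evaluating Q(G, S) directly from the definition, for an unweighted undirected graph in the illustrative dict-of-sets representation:

def modularity(adj, communities):
    """Q = (1/2m) * sum over groups of (A_ij - k_i * k_j / 2m)."""
    two_m = sum(len(adj[u]) for u in adj)   # 2m
    Q = 0.0
    for s in communities:
        for i in s:
            for j in s:
                a_ij = 1.0 if j in adj[i] else 0.0
                Q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return Q / two_m

# Two triangles joined by one edge; the natural split has high modularity.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(modularity(adj, [{1, 2, 3}, {4, 5, 6}]))  # = 5/14, approx. 0.357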

 Louvain algorithm ["Fast unfolding of communities in large networks", Blondel et al. (2008)]
 Greedy algorithm for community detection
▪ O(n log n) run time
 Supports weighted graphs
 Provides hierarchical communities (a dendrogram)
 Widely utilized to study large networks because:
▪ Fast
▪ Rapid convergence
▪ High-modularity output (i.e., "better communities")
[Figure: network and communities; the corresponding dendrogram]

 The Louvain algorithm greedily maximizes modularity
 Each pass is made of 2 phases:
▪ Phase 1: modularity is optimized by allowing only local changes to node-community memberships
▪ Phase 2: the identified communities are aggregated into super-nodes to build a new network
▪ Go to Phase 1

The passes are repeated iteratively until no increase of modularity is possible.

 Put each node of the graph into a distinct community (one node per community)
 For each node i, the algorithm performs two calculations:
▪ Compute the modularity delta (ΔQ) when putting node i into the community of some neighbor j
▪ Move i to the community of the node j that yields the largest gain in ΔQ
 Phase 1 runs until no movement yields a gain
This first phase stops when a local maximum of the modularity is attained, i.e., when no individual node move can improve the modularity. Note that the output of the algorithm depends on the order in which the nodes are considered; research indicates that the ordering of the nodes does not have a significant influence on the overall modularity that is obtained.

What is ΔQ if we move node i to community C?

ΔQ(i → C) = [ (Σ_in + k_i,in) / 2m − ( (Σ_tot + k_i) / 2m )² ] − [ Σ_in / 2m − (Σ_tot / 2m)² − (k_i / 2m)² ]

▪ where:
▪ Σ_in … sum of link weights between nodes in C
▪ Σ_tot … sum of all link weights of nodes in C
▪ k_i,in … sum of link weights between node i and C
▪ k_i … sum of all link weights (i.e., degree) of node i
 Also need to derive ΔQ(D → i), the change from taking node i out of its community D.
 And then: ΔQ = ΔQ(i → C) + ΔQ(D → i)
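A minimal sketch of the join gain ΔQ(i → C) as a function of the four quantities above (the example numbers are made up for illustration):

def delta_q_join(sigma_in, sigma_tot, k_i_in, k_i, m):
    """Modularity gain of placing an isolated node i into community C."""
    two_m = 2 * m
    after = (sigma_in + k_i_in) / two_m - ((sigma_tot + k_i) / two_m) ** 2
    before = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (k_i / two_m) ** 2
    return after - before

# Graph with m = 10 edges; community C has sigma_in = 6, sigma_tot = 8;
# node i has degree 3 with all 3 links into C.
print(delta_q_join(sigma_in=6, sigma_tot=8, k_i_in=3, k_i=3, m=10))  # 0.03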

More in detail:
[Figure: derivation of ΔQ(i → C) by applying the modularity definition — modularity contribution after merging node i (the super-node from merging C and i, with its self-edge weight) minus the contribution before merging (modularity of C plus modularity of i), with the rest of the graph modeled as a single node]

 The communities obtained in the first phase are contracted into super-nodes, and a new network is created accordingly (see the sketch below):
▪ Super-nodes are connected if there is at least one edge between the nodes of the corresponding communities
▪ The weight of the edge between two super-nodes is the sum of the weights of all edges between their corresponding communities
 Phase 1 is then run on the super-node network
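A minimal sketch of this contraction step; the edge-dict representation is illustrative, and intra-community weight is kept as a self-loop on the super-node:

from collections import defaultdict

def aggregate(edges, comm):
    """edges: {(u, v): weight}; comm: node -> community id."""
    super_edges = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = comm[u], comm[v]
        key = (min(cu, cv), max(cu, cv))  # (c, c) = self-loop for internal edges
        super_edges[key] += w             # sum weights between community pairs
    return dict(super_edges)

edges = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (5, 6): 1}
comm = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(aggregate(edges, comm))  # {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 2.0}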

5/4/2021 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs 50 Communities contracted into super-nodes Modularity gain

Halting criterion for 2nd Phase

the weights of the edges between the new super-nodes are given by the sum of the Halting criterion for 1st Phase weights of the edges between vertices in the corresponding two communities

 Example: a network of 2M nodes
▪ Red nodes: French speakers
▪ Green nodes: Dutch speakers
[Figure: the detected communities]

 Modularity:
▪ Overall quality of the partitioning of a graph into communities
▪ Used to determine the number of communities

 Louvain modularity maximization:
▪ Greedy strategy
▪ Great performance, scales to large networks
