Network Modularity

Modularity CMSC 858L Module-detection for Function Prediction • Biological networks generally modular (Hartwell+, 1999) • We can try to find the modules within a network. • Once we find modules, we can look at over-represented functions within a module, e.g.: - If a majority of the proteins within a module have annotation A, predict annotation A for the other proteins in the module. • ⇒ Graph clustering methods - Min Multiway Cut, Graph Summarization, VI-Cut: examples we’ve already seen. - Methods often borrowed from other “community detection” applications. Modularity Modularity eii = % edges in module i eii=|{(u,v) : u ∈ Vi, v ∈ Vi, (u,v)∈E}| / |E| 2 ai = % edges with at least 1 i end in module i ai=|{(u,v) : u ∈ Vi, (u,v) ∈ E}| / |E| probability a random 3 edge would fall into Modularity is: module i k Q = e a2 ii − i i=1 High modularity ⇒ more edges probability edge within the module that you expect is in module i by chance. Examples 28 5 22 27 6 3 4 23 7 16 19 8 21 12 13 14 24 2 9 9 29 17 3 10 5 1 7 1 20 10 30 25 26 2 11 8 18 6 4 Communities Assigned 15 to a small graph Communities assigned to Note: maximizing a random graph modularity will find it’s own # of clusters Modularity Algorithm #1 • Modularity is NP-hard to optimize (Brandes, 2007) • Greedy Heuristic: (Newman, 2003) - C = trivial clustering with each node in its own cluster - Repeat: • Merge the two clusters that will increase the modularity by the largest amount • Stop when all merges would reduce the modularity. Karate Club (again) Newman-Girvan, 2004 Only 3 is in the “wrong” community. Maximizing Modularity via a Spectral Technique Another View of Modularity probability a random adjacency edge would go matrix normalization between i and j 1 k k Q = A i j 4m ij − 2m i,j in same module m = # edges in graph ki = degree(i) Consider the case of only 2 modules. Let si = 1 if node i is in module 1; -1 if node i is in module 2 1 k k Q = A i j (s s + 1) 4m ij − 2m i j i,j 1 k k = A i j s s 4m ij − 2m i j i,j Goal: Maximize modularity • Try to find ±1 vector s that maximizes the modularity. • Start with the case above: only two groups. • Then show how to extend to ≥ 2 groups. • Will use some ideas from linear algebra. 1 k k Q = A i j s s 4m ij − 2m i j i,j 1 = sT Bs s is a {-1,1} 4m membership vector “modularity” matrix Let ui (i = 1,...,n) be the eigenvectors of matrix B with eigenvalue βi for vector ui. (Assume β1 ≥ β2 ≥ β3 ≥ β4 ≥ ... ≥ βn) Write s as: where: T s = aiui ai = ui s i T s = aiui ai = ui s i 1 Q = sT Bs 4m = a uT B a u drop the (1/4m) i i j j i j = a uT B a u i i j j i j T = aiajui Buj i j Note: 1. Buj = βi uj T 2. When i ≠ j, ui Buj = 0 because ui ⊥ uj T 2 Q = (ui s) βi i To Maximize Q T 2 Q = (ui s) βi i • If we were allowed to choose any s we’d pick the one that is parallel to u1. • But: si must be +1 or -1. This is a severe restriction. • So: maximize u1⋅s, the projection of s along vector u1. • To do this: choose si = 1 if u1 > 0, and si = -1 if u1 ≤ 0. Subsequent Splits The modularity if this module The modularity of was split according to s module g as it stands now 1 1 = B s s + B B 2 ij i j 2 ij − ij i,j g i,j g i,j g ∈ ∈ ∈ g Bij = sisjδi,j Bik i,j g i,j g k g ∈ ∈ ∈ +1 -1 Karate Club Results: Exactly Right (Newman, 2006) Greedy Improvement Given a partition of the network • largest increase • Repeat: might be negative - Find the vertex that would yield the largest modularity increase if it were moved into a different community AND that has not yet been moved - Move the vertex into that new community • Return the best partitioning ever observed Similar to the Kernighan-Lin graph partitioning heuristic (details in a few slides) Additional Results Girvan-Newman Newman (betweenness) Spectral Greedy Hierarchical Newman, 2006 Krebs Political Books Nodes = political books; shape = Edges = books frequently bought conservative (squares) / liberal by the same readers on (circles) / “centrist” (triangles) Amazon.com Complexes Biological Processes “+” indicates parameters All GS predictions are Pareto optimal tuned to maximize precision Many unique predictions made by each algorithm % Modules Enriched A lower % of GS modules are enriched for some annotation, but not indicative of predictive performance. “Easy” to get legitimate statistical signiBicant enrichment. Summary: Modularity • Modularity is widely used as a measure for how good a clustering is. • Particularly popular in social network analysis, but used in other contexts as well (e.g. Brain networks). • Has a “resolution” preference: for a given network, will tend to prefer clusters of a particular size. • Often this means the clusters are too big. • A good example of where a spectral clustering technique can work..

Network Modularity

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support