Motifs in Temporal Networks
Total Page:16
File Type:pdf, Size:1020Kb
Motifs in Temporal Networks Ashwin Paranjape, Austin Benson, & Jure Leskovec {ashwinpp,arbenson,jure}@stanford.edu Code & data available at http://snap.stanford.edu/temporal-motifs/ Overview The 36 motifs we can count quickly Temporal networks model dynamic complex systems such as telecommunications, credit With our algorithms, we can count 2-node and 3-node, 3-edge motifs efficiently. card payments, and social interactions. It takes a couple hours to count all 36 of these motifs for a phone call network with 2B edges. 2 2 2 2 2 2, 3 Two common ways that people study temporal networks are 3 1, 3 3 1 1 3 1 3 1 1 1. Growth models consider how nodes and edges enter a network (e.g., how does the M1,1 M1,2 M1,3 M1,4 M1,5 M1,6 3 2 2 2 2 2, 3 2 internet infrastructure grow over time) 1, 3 3 1 1 3 1 3 1 1 2. Snapshot analysis creates a sequence of static graphs by aggregating links in coarse- M2,1 M2,2 M2,3 M2,4 M2,5 M2,6 3 3 2 1 2 2 2, 3 2 2 grained intervals (e.g., daily phone call graph) 1, 3 3 1 3 1 1 1 M3,1 M3,2 M3,3 M3,4 M3,5 M3,6 3 3 These existing analyses do not capture the rich temporal information of complex systems 1 3 1, 3 2 3 2 1 2, 3 1 2 1 2 1 2 that are constantly in motion. M4,1 M4,2 M4,3 M4,4 M4,5 M4,6 3 3 2 1, 3 2, 3 1 2 1 3 2 1 3 2 1 2 1 M5,1 M5,2 M5,3 M5,4 M5,5 M5,6 3 3 1, 2, 3 3 1, 2 1, 2 3 1, 2 3 1, 2 1, 2 Temporal network motifs M6,1 M6,2 M6,3 M6,4 M6,5 M6,6 We propose temporal network motifs, or small temporal subgraph patterns as an analytical tool for temporal networks. These are analogous to network motifs, which are small subgraph patterns, used to study static graphs. Empirical observations (! = 1 hour) c r-node, k-edge temporal network motif § Answers to questions and comments b § Email-Eu E-mails between researchers. StackOverflow 15s, 32s 1. Directed multigraph on questions and answers. 3 § Phonecall-Eu Phone call records. 31s 1 2 25s with r nodes and k edges § SMS-A Text messages. § Bitcoin transactions between addresses. 17s, 28s, 30s, 35s 2. Edge ordering § College-Msg Online private messages. § WikiTalk edits of user talk pages. a 14s d δ = 10s 3. Maximum time span δ § FBWall Facebook wall posts. Email-Eu Phonecall-Eu SMS-A CollegeMsg source destination timestamp 2 b c 1 13K 10K 4K 6K 14K 28K 0.5M 0.2M 56K 69K 0.3M 0.8M 31K 26K 2K 2K 75K 85K 0.1M 75K 3K 2K 0.1M 0.2M Counts of all 2- and 3- a d 14s Motif instance k temporal 2 1 18K 30K 3K 2K 35K 14K 1M 0.3M 57K 50K 2M 0.3M 45K 47K 1K 0.9K 97K 82K 92K 64K 2K 2K 0.1M 0.1M 32s node, 3-edge temporal c a 15s edges that match the pattern 2 25s 1 18K 12K 35K 40K 3K 6K 2M 0.2M 0.3M 1M 46K 71K 38K 35K 0.1M 91K 1K 2K 81K 85K 0.1M 0.2M 2K 3K 28s motifs in our datasets (! a c 17s that all occur within δ time 1 2 0.7M 35K 0.8M 45K 6K 6K 0.6M 0.2M 0.7M 0.3M 68K 75K 0.1M 59K 0.2M 0.1M 1K 1K 0.2M 79K 0.3M 0.1M 3K 2K a 68K 59K 28K 16K 38K 15K 0.3M 0.3M 0.3M 0.3M 0.3M 0.3M 3M 3M 0.1M 81K 89K 58K 0.2M 0.1M 0.1M 0.1M 0.1M 0.1M a b 25s 2 1 =1 hour). Along each 0.4M 59K 0.8M 40K 34K 28K 4M 0.3M 0.7M 2M 2M 0.9M 5M 3M 0.2M 90K 84K 62K 0.3M 0.2M 0.2M 0.1M 0.1M 0.2M a c 28s 1, 2 row, the first two edges a c 30s StackOverflow Bitcoin WikiTalk FBWall d c 2 are the same. Along Wrong order! 1 6M 4M 1M 1M 7M 8M 18B 10B 1B 1B 14B 23B 1M 0.1M 0.1M 0.2M 0.2M 1M 18K 22K 2K 2K 35K 27K c d 31s 2 14s 15s (c, a) before (a, c) 1 2M 2M 0.6M 0.2M 6M 2M 15B 13B 1B 2B 22B 15B 0.6M 0.3M 0.1M 18K 1M 0.2M 11K 14K 2K 1K 28K 34K each column, the third c a 32s 17s 1 2 2M 2M 2M 6M 0.3M 0.7M 19B 14B 15B 19B 2B 1B 0.6M 0.1M 1M 2M 27K 0.1M 15K 15K 19K 14K 1K 1K edge is the same. a a c 35s 1 2 3M 3M 9M 5M 0.7M 0.3M 21B 11B 25B 14B 1B 1B 0.2B 0.7M 0.4B 1M 91K 91K 13K 12K 30K 21K 2K 2K 2 1 0.4M 4M 4M 3M 3M 3M 8B 9B 14B 16B 17B 15B 0.8M 0.7M 1M 0.3M 0.8M 0.3M 0.3M 0.2M 24K 27K 22K 22K 1, 2 7M 2M 8M 4M 5M 7M 15B 9B 25B 25B 20B 23B 17M 0.7M 0.4B 2M 1M 2M 0.3M 0.2M 27K 17K 17K 22K 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 Efficient counting algorithms Given a temporal network and a motif, we want to efficiently count the number of Blocking motifs Non-blocking motifs Blocking and non-blocking communication. 1, 3 2 1, 3 2 Fraction of all 2 and 3-node, 3-edge motif instances of the motif in the temporal network. FBWall counts that correspond to two groups of SMS-A c c motifs. Motifs on the left capture “blocking” General algorithm for any motif b b 15s, 32s 3 1 Email-Eu 1 2, 3 1 2 2, 3 31s behavior (common in SMS) and motifs on 1. Ignore timestamps to get a static 25s 17s, 28s, 30s, 35s CollegeMsg a d the right capture “non-blocking” behavior graph and a static motif. a 14s d δ = 10s Phonecall-Eu (common in email). 1, 2 3 1, 2 3 b c 0.5 0.0 0.0 0.5 2. Find instances of the static motif in Fraction of counts Fraction of counts a the static graph (using known d c Cyclic triangles in Bitcoin. Bitcoin Cyclic triangles algorithms). Fraction of 3-edge temporal triangle motifs Phonecall-Eu 2 a 3 a corresponding to cyclic triangles. Bitcoin d CollegeMsg 1 3 1 2 3. For each static motif instance, fetch has a much higher fraction compared to FBWall c time-ordered temporal edges. 0.0 0.2 0.4 0.6 0.8 1.0 all other datasets. b c Fraction cyclic triangles a 4. Count temporal motif instances in (%) 100 Networks from the same domain have similar each temporal edge list using a 90 motif count distributions. 80 Email-Eu subnetworks dynamic programming algorithm 70 Stack exchange networks Variance explained by number of principal that maintains subsequence 60 Dissimilar networks components for three groups of networks. 50 counts. Runs in linear time in 1 2 3 4 Variance Explained the number of timestamps. Number of principal components Special case analysis. Runs in O(k2m) time for 2-node, k-edge motifs, 1, 3 where m is the total number of temporal edges. 2 Empirical observations (varying !) This is optimal, i.e., linear in the size of the data for constant k. 1200 Motif counts over time in CollegeMsg Fewer switches in target Faster algorithms for special cases 1000 1, 2, 3 1, 3 2 à more common 3-node, 3-edge stars 800 2, 3 1, 2 have to enumerate pairs of neighbors of high-degree nodes 1 3 1, 2 3 Problem Improvement count for all neighbors simultaneously 600 Runtime complexity is O(m), where m is the total number of edges (optimal) 400 t2n+1,...,tm u v Number of instances 200 t2n 1 Overall growth from − Early time scales t1 t3 t2 t2n 1 to 20 minutes t4 dominated by the 0 ... 1 2 3 3-node, 3-edge triangles w1 w2 wn single-destination motif 10 10 10 3 Time to complete motif (seconds) Problem a static edge with many timestamps may appear in several triangles 1 2 à O(Tm) complexity, where T is the number of static triangles Improvement simultaneously count triangles for edges with many timestamps References Runtime complexity is O(T1/2m), a significant improvement Motifs for static networks 1. Milo, Ron, et al. Network motifs: simple building blocks of complex networks. Science 298.5594 (2002): 824-827. Similar temporal network pattern counting ideas Triangle speedups Wiki. edits Stack Overflow Bitcoin Texts Phone calls 2. Kovanen, et al. Temporal motifs in time-dependent networks. JSTAT, 2011. 3. Zhao, et al. Communication motifs: a tool to characterize social communications. CIKM, 2010. # temp. edges 10M 63M 123M 800M 2.04B 4.