<<

Virality Prediction and in Social Networks

Yong-Yeol “YY” Ahn @yy

Most populated countries

1,300,000,000+

1,200,000,000+

300,000,000+ Most populated countries

1,300,000,000+

1,200,000,000+

1,100,000,000+

500,000,000+

300,000,000+ Social media can amplify messages. “Casually pepper spraying cop”

Companies are desperately trying to leverage social media to make their products and ads viral.

Original, useful - hard

“viral marketing”

‘Astroturfing’ may change election results.

“1/3 of online reviews may be fake.” - Bing Liu (UIC)

How can we understand virality?

Can we predict viral ?

Clue #1: Memes = Infectious diseases? Germs spread through the “social” network Memes, ideas and behaviors also spread through . Memes = Infectious diseases? Maybe not. “Large” world “Small” world

D. Centola, Science 2010 Which network is better at spreading quickly? “Large” world “Small” world

D. Centola, Science 2010

Multiple exposures are crucial. Social reinforcement Complex Contagion 三人成虎 三人成虎 three 三人成虎 three people construct 三人成虎 three people construct 三人成虎 three people tiger ? ? ! Complex Contagion needs multiple exposures.

&

Social contagion seems to be complex contagions. Clustering should be important. (B) Social Reinforcement Multiple Exposures Multiple

Exposures A

High Clustering Low Clustering (D) Retweet Network (E) Follower Network

Figure 1: The importance of community structure in the spreading of social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap informa- tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

10 Highly clustered structure?

Communities! Newman, 2006 Communities should enhance the spread of complex contagion. Clue #2: Homophily “Birds of a feather flock together.” Eric Fischer, Race and ethnicity Adamic & Glance, 2005

You are likely to share similar interests with your friends. You are more likely to adopt something from your social circles. Again, communities become important. A community ~ a common characteristic or shared interests Adamic & Glance, 2005 (C) Homophily

(E) Follower Network

Figure 1: The importance of community structure in the spreading of social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap informa- tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

10 Trivial example:

Language and Country (E) Follower Network

English #usa BBC News #bbc

Fox News #foxnews Arabic #usa

FigureFigureRetweet 1: 1: The The importance importance Network of community of community structure structure in theFollower in spreading the spreading of social Network of contagions. social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap informa- (A) Structural trapping: dense communities with few outgoing links naturally trap informa- tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducingadoption cascades is likely of additional to produce adoptions. more (C) multipleHomophily exposures: people than in the in the same case community of low clustering, (same in- colorducing nodes) cascades are more of likely additional to be similar adoptions. and to (C) adoptHomophily the same: people ideas. in(D) the Diffusion same community structure (same basedcolor on retweets nodes) are among more Twitter likely users to be sharing similar the and hashtag to adopt#USA the. Blue same nodes ideas. represent (D) Diffusion English structure usersbased and red on retweets nodes are among Arabic Twitter users. Node users size sharing and link the hashtagweight are#USA proportional. Blue nodes to retweet represent ac- English tivity.users (E) and Community red nodes structure are Arabic among users. Twitter Node users size sharing and link the weight hashtags are#BBC proportionaland #FoxNews to retweet. ac- Bluetivity. nodes (E) represent Community#BBC users, structure red among nodes are Twitter#FoxNews usersusers, sharing and the users hashtags who have#BBC usedand both#FoxNews. hashtagsBlue nodesare green. represent Node#BBC size isusers, proportional red nodes to usage are #FoxNews (tweet) activity,users, and links users represent who have mutual used both followinghashtags relations. are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

10 10 Both clues indicate that Communities should a crucial role in complex contagion. Communities weakly trap simple contagions.

Communities strongly trap complex contagions. Communities: traps for random walkers

Rosvall, Bergstrom, Lambiotte, ... (A) Structural Trapping Exposures

(D) Retweet Network Affects both simple and complex contagions.

Figure 1: The importance of community structure in the spreading of social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap informa- tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

10 (B) Social Reinforcement Multiple Exposures Multiple

Exposures A

High Clustering Low Clustering (D) Retweet Network (E) Follower Network Affects complex contagions.

Figure 1: The importance of community structure in the spreading of social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap informa- tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

10 (C) Homophily

(E) Follower Network Affects complex contagions.

Figure 1: The importance of community structure in the spreading of social contagions. (A) Structural trapping: dense communities with few outgoing links naturally trap informa- tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

10 Simple contagion

(A) Structural Trapping Exposures

(D) Retweet Network Complex contagion

(A) Structural Trapping (B) Social Reinforcement (C) Homophily Multiple ExposuresExposures Multiple

Exposures A

Figure 1: The importance of community structure in the spreading of social contagions. High Clustering Low Clustering (A) Structural trapping: dense communities with few outgoing links naturally trap informa- (D) Retweet Network (E)(E) Follower Follower Network Network (D) Retweet Network tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

FigureFigure 1: 1: The The importance importance of of community community structure structure in in the the spreading spreading of of social social contagions. contagions. Figure(A) Structural 1: The trapping importance: dense of community communities structure with few in outgoing the spreading links naturally of social trap10 contagions. informa- (A)(A)StructuralStructural trapping trapping: dense: dense communities communities with with few few outgoing outgoing links links naturally naturally trap trap informa- informa- tiontion flow. flow. (B) (B)SocialSocial reinforcement reinforcement:: people people who who have have adopted adopted a ameme meme (black (black nodes) nodes) trigger trigger tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiplemultiple exposures exposures to to others others (red (red nodes). nodes). In In the the presence presence of of high high clustering, clustering, any any additional additional multiple exposures to others (red nodes). In the presence of high clustering, any additional adoptionadoption is is likely likely to to produce produce more more multiple multiple exposures exposures than than in in the the case case of of low low clustering, clustering, in- in- adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducingducing cascades cascades of of additional additional adoptions. adoptions. (C) (C)HomophilyHomophily: people: people in in the the same same community community (same (same ducing cascades of additional adoptions. (C) Homophily: people in the same community (same colorcolor nodes) nodes) are are more more likely likely to to be be similar similar and and to to adopt adopt the the same same ideas. ideas. (D) (D) Diffusion Diffusion structure structure color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure basedbased on on retweets retweets among among Twitter Twitter users users sharing sharing the the hashtag hashtag#USA#USA. Blue. Blue nodes nodes represent represent English English based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English usersusers and and red red nodes nodes are are Arabic Arabic users. users. Node Node size size and and link link weight weight are are proportional proportional to to retweet retweet ac- ac- users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity.tivity. (E) (E) Community Community structure structure among among Twitter Twitter users users sharing sharing the the hashtags hashtags#BBC#BBCandand#FoxNews#FoxNews. . tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. BlueBlue nodes nodes represent represent#BBC#BBCusers,users, red red nodes nodes are are#FoxNews#FoxNewsusers,users, and and users users who who have have used used both both Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtagshashtags are are green. green. Node Node size size is is proportional proportional to to usage usage (tweet) (tweet) activity, activity, links links represent represent mutual mutual hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual followingfollowing relations. relations. following relations.

1010 10 Communities weakly trap simple contagions.

Communities strongly trap complex contagions. No concentration Strong concentration If our is correct, then we will see strong concentration. In complex contagion, The edges in the communities should transmit more information. What about simple contagion? Traversing probability of an edge from many events of simple contagion

~ that from many random walks

: transition matrix : transition matrix : transition matrix For simple contagion, we expect to see no difference between edges inside communities vs. ones between communities. In complex contagion, The edges in the communities should transmit more information. Then, why don’t we measure the concentration of memes and edge activities regarding communities? Provides data of both social networks and meme diffusion. A multiplex, - dependent network.

Following, RT, mention Tweets and Retweets

A B Following

If A follows B, B’s tweets and retweets will appear in A’s timeline tweets Oi, I’m at Oxford!

A retweets

B

follows C

If B retweets a tweet, this tweets show up in the timelines of B’s followers. tweets @B Oi!

A notification

B

If A mentions B (with @B), B gets a notification about the tweet. As an initial analysis, we constructed three networks separately. 120 million tweets (Mar 24 – Apr 25, 2012)

600k users, only reciprocal edges. Hashtag ~ Meme From 10 million hashtags, we pick only the ‘new’ hashtags (fewer than 20 tweets in the previous month). Two community detection methods

Infomap (Rosvall & Link clustering (Ahn, Bergstrom, 2008) Bagrow, Lehmann, 2010) Didn’t use edge weights when detecting communities. Table S1: Network statistics of retweet, mention, and follower networks used in the study. Node coverage measures the proportion of nodes that belong to communities with at least three nodes.

Network types Retweet Mention Follower Number of nodes 300,197 374,829 595,460 Number of edges 598,487 1,048,818 14,273,311 Avg. clustering coefficient 0.0902 0.1284 0.1972 Number of communities 14,144 14,222 6,360 InfoMap Node coverage 99.86% 99.72% 99.72% Number of communities 57,317 97,198 321,774 LinkComm Node coverage 48.42% 67.23% 47.62%

17 Are memes complex contagions? If it’s complex contagion, The edges inside communities should transmit more information. Two types of edges in the following graph.

Intra-edges: Inter-edges: For each type of edges, we measure the # of retweets and mentions through the edges. (A)

wRT w@ wRT w@

Figure 2: Meme concentration in communities. We show (A) community edge weight and (B) user community focus using box plots. Boxes cover 50% of data and whisker cover 95%. The line and triangle in a box represent the median and mean, respectively. Changes in meme concentration as a function of meme popularity are illustrated by plotting (C) meme usage dominance, (D) meme usage entropy, (E) meme adopter dominance, and (F) meme adopter entropy. Dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than the baseline model M2. Similar results for different types of networks and community methods are described in Supplementary Information.

11

Fraction of interactions for each person (B)

f RT f @ f RT f @

Figure 2: Meme concentration in communities. We show (A) community edge weight and (B) user community focus using box plots. Boxes cover 50% of data and whisker cover 95%. The line and triangle in a box represent the median and mean, respectively. Changes in meme concentration as a function of meme popularity are illustrated by plotting (C) meme usage dominance, (D) meme usage entropy, (E) meme adopter dominance, and (F) meme adopter entropy. Dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than the baseline model M2. Similar results for different types of networks and community methods are described in Supplementary Information.

11

Indeed, we see more activities within communities. How do we measure “concentration”? We need models to compare

Network Social Homophily effect reinforcement Random M1 sampling

Simple M2 O cascade

Social M3 O O reinforcement

M4 O O Homophily the proportion of tweets produced in the dominant community the proportion of tweets produced in the dominant community the proportion of users adopted in the dominant community the proportion of tweets produced in the dominant community the proportion of users adopted in the dominant community

Tweet entropy in terms of communities the proportion of tweets produced in the dominant community the proportion of users adopted in the dominant community

Tweet entropy in terms of communities

User entropy in terms of communities Normalize every one with M1

And use only the first 50 tweets. (A)

(Final # of tweets)

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularityFigure bin, with 3: popularity Meme concentration defined as number in communities. of tweets T or Changes adopters U in; error meme bars concentration indicate standard as a function of meme errors. Gray areaspopularity represent are the illustrated ranges of popularity by plotting in which relative actual (A) datausage exhibit dominance weaker,(B) concentrationadoption than dominance,(C)usage both baseline modelsentropy,M and3 and (D)Madoption4.Thee entropy↵ect of. The multiple relative social dominance reinforcement and entropy is estimated ratios are by averagedaverage across hashtags in exposures for everyeach popularity meme. The bin, exposures with popularity can be measured defined as in number terms of of (E) tweets tweetsT or or adopters (F) users.U; error Similar bars indicate standard results for di↵erenterrors. types Gray of networks areas represent and community the ranges methods of popularity are described in which in actual SI. data exhibit weaker concentration than both baseline models M and M .Thee↵ect of multiple social reinforcement is estimated by average 3 4 15 exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 (A)

Simple Contagion Baseline (Final # of tweets)

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularityFigure bin, with 3: popularity Meme concentration defined as number in communities. of tweets T or Changes adopters U in; error meme bars concentration indicate standard as a function of meme errors. Gray areaspopularity represent are the illustrated ranges of popularity by plotting in which relative actual (A) datausage exhibit dominance weaker,(B) concentrationadoption than dominance,(C)usage both baseline modelsentropy,M and3 and (D)Madoption4.Thee entropy↵ect of. The multiple relative social dominance reinforcement and entropy is estimated ratios are by averagedaverage across hashtags in exposures for everyeach popularity meme. The bin, exposures with popularity can be measured defined as in number terms of of (E) tweets tweetsT or or adopters (F) users.U; error Similar bars indicate standard results for di↵erenterrors. types Gray of networks areas represent and community the ranges methods of popularity are described in which in actual SI. data exhibit weaker concentration than both baseline models M and M .Thee↵ect of multiple social reinforcement is estimated by average 3 4 15 exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 Complex Contagion(A) Baselines

Simple Contagion Baseline (Final # of tweets)

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularityFigure bin, with 3: popularity Meme concentration defined as number in communities. of tweets T or Changes adopters U in; error meme bars concentration indicate standard as a function of meme errors. Gray areaspopularity represent are the illustrated ranges of popularity by plotting in which relative actual (A) datausage exhibit dominance weaker,(B) concentrationadoption than dominance,(C)usage both baseline modelsentropy,M and3 and (D)Madoption4.Thee entropy↵ect of. The multiple relative social dominance reinforcement and entropy is estimated ratios are by averagedaverage across hashtags in exposures for everyeach popularity meme. The bin, exposures with popularity can be measured defined as in number terms of of (E) tweets tweetsT or or adopters (F) users.U; error Similar bars indicate standard results for di↵erenterrors. types Gray of networks areas represent and community the ranges methods of popularity are described in which in actual SI. data exhibit weaker concentration than both baseline models M and M .Thee↵ect of multiple social reinforcement is estimated by average 3 4 15 exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 Complex Contagion(A) Baselines

Empirical Data

Simple Contagion Baseline (Final # of tweets)

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularityFigure bin, with 3: popularity Meme concentration defined as number in communities. of tweets T or Changes adopters U in; error meme bars concentration indicate standard as a function of meme errors. Gray areaspopularity represent are the illustrated ranges of popularity by plotting in which relative actual (A) datausage exhibit dominance weaker,(B) concentrationadoption than dominance,(C)usage both baseline modelsentropy,M and3 and (D)Madoption4.Thee entropy↵ect of. The multiple relative social dominance reinforcement and entropy is estimated ratios are by averagedaverage across hashtags in exposures for everyeach popularity meme. The bin, exposures with popularity can be measured defined as in number terms of of (E) tweets tweetsT or or adopters (F) users.U; error Similar bars indicate standard results for di↵erenterrors. types Gray of networks areas represent and community the ranges methods of popularity are described in which in actual SI. data exhibit weaker concentration than both baseline models M and M .Thee↵ect of multiple social reinforcement is estimated by average 3 4 15 exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 (A) (B)

(Final # of adopters)

Figure 3: Meme concentration in communities.Figure 3: Meme Changes concentration in meme concentration in communities. as a Changes function in of meme meme concentration as a function of meme popularity are illustrated by plottingpopularity relative (A) areusage illustrated dominance by plotting,(B)adoption relative dominance (A) usage,(C) dominanceusage,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. Theentropy relative, and dominance (D) adoption and entropy entropy ratios. The are relative averaged dominance across hashtags and entropy in ratios are averaged across hashtags in each popularity bin, with popularity definedeach popularity as number bin, of tweets with popularityT or adopters definedU; error as number bars indicate of tweets standardT or adopters U; error bars indicate standard errors. Gray areas represent the rangeserrors. of popularity Gray areas in which represent actual the data ranges exhibit of popularity weaker concentration in which actual than data exhibit weaker concentration than both baseline models M3 and M4.Theeboth baseline↵ect of multiple models M social3 and reinforcementM4.Thee↵ isect estimated of multiple by socialaverage reinforcement is estimated by average exposures for every meme. The exposuresexposures canfor be measured every meme. in terms The exposures of (E) tweets can or be (F) measured users. inSimilar terms of (E) tweets or (F) users. Similar results for di↵erent types of networksresults and community for di↵erent methods types areof networks described and in SI. community methods are described in SI. 15 15 (C) (D)

1e2 1e2

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. Figure 3: Meme concentration in communities.15 Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 Simple Contagion Baseline

(C) (D)

1e2 1e2

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. Figure 3: Meme concentration in communities.15 Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 Simple Contagion Baseline

(C) (D)

1e2 Complex Contagion1e2 Baselines

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. Figure 3: Meme concentration in communities.15 Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 Simple Contagion Baseline

(C) (D)

Empirical Data

1e2 Complex Contagion1e2 Baselines

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15

Figure 3: Meme concentration in communities. Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. Figure 3: Meme concentration in communities.15 Changes in meme concentration as a function of meme popularity are illustrated by plotting relative (A) usage dominance,(B)adoption dominance,(C)usage entropy, and (D) adoption entropy. The relative dominance and entropy ratios are averaged across hashtags in each popularity bin, with popularity defined as number of tweets T or adopters U; error bars indicate standard errors. Gray areas represent the ranges of popularity in which actual data exhibit weaker concentration than both baseline models M3 and M4.Thee↵ect of multiple social reinforcement is estimated by average exposures for every meme. The exposures can be measured in terms of (E) tweets or (F) users. Similar results for di↵erent types of networks and community methods are described in SI. 15 All memes are not equal. Unsuccessful memes behave like complex contagions

Viral memes behave like simple contagions Viral memes are literally viral. Simple contagion

(A) Structural Trapping Exposures

(D) Retweet Network Complex contagion

(A) Structural Trapping (B) Social Reinforcement (C) Homophily Multiple ExposuresExposures Multiple

Exposures A

Figure 1: The importance of community structure in the spreading of social contagions. High Clustering Low Clustering (A) Structural trapping: dense communities with few outgoing links naturally trap informa- (D) Retweet Network (E)(E) Follower Follower Network Network (D) Retweet Network tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

FigureFigure 1: 1: The The importance importance of of community community structure structure in in the the spreading spreading of of social social contagions. contagions. Figure(A) Structural 1: The trapping importance: dense of community communities structure with few in outgoing the spreading links naturally of social trap10 contagions. informa- (A)(A)StructuralStructural trapping trapping: dense: dense communities communities with with few few outgoing outgoing links links naturally naturally trap trap informa- informa- tiontion flow. flow. (B) (B)SocialSocial reinforcement reinforcement:: people people who who have have adopted adopted a ameme meme (black (black nodes) nodes) trigger trigger tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiplemultiple exposures exposures to to others others (red (red nodes). nodes). In In the the presence presence of of high high clustering, clustering, any any additional additional multiple exposures to others (red nodes). In the presence of high clustering, any additional adoptionadoption is is likely likely to to produce produce more more multiple multiple exposures exposures than than in in the the case case of of low low clustering, clustering, in- in- adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducingducing cascades cascades of of additional additional adoptions. adoptions. (C) (C)HomophilyHomophily: people: people in in the the same same community community (same (same ducing cascades of additional adoptions. (C) Homophily: people in the same community (same colorcolor nodes) nodes) are are more more likely likely to to be be similar similar and and to to adopt adopt the the same same ideas. ideas. (D) (D) Diffusion Diffusion structure structure color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure basedbased on on retweets retweets among among Twitter Twitter users users sharing sharing the the hashtag hashtag#USA#USA. Blue. Blue nodes nodes represent represent English English based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English usersusers and and red red nodes nodes are are Arabic Arabic users. users. Node Node size size and and link link weight weight are are proportional proportional to to retweet retweet ac- ac- users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity.tivity. (E) (E) Community Community structure structure among among Twitter Twitter users users sharing sharing the the hashtags hashtags#BBC#BBCandand#FoxNews#FoxNews. . tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. BlueBlue nodes nodes represent represent#BBC#BBCusers,users, red red nodes nodes are are#FoxNews#FoxNewsusers,users, and and users users who who have have used used both both Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtagshashtags are are green. green. Node Node size size is is proportional proportional to to usage usage (tweet) (tweet) activity, activity, links links represent represent mutual mutual hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual followingfollowing relations. relations. following relations.

1010 10 Viral memes

(A) Structural Trapping Exposures

(D) Retweet Network Complex contagion

(A) Structural Trapping (B) Social Reinforcement (C) Homophily Multiple ExposuresExposures Multiple

Exposures A

Figure 1: The importance of community structure in the spreading of social contagions. High Clustering Low Clustering (A) Structural trapping: dense communities with few outgoing links naturally trap informa- (D) Retweet Network (E)(E) Follower Follower Network Network (D) Retweet Network tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

FigureFigure 1: 1: The The importance importance of of community community structure structure in in the the spreading spreading of of social social contagions. contagions. Figure(A) Structural 1: The trapping importance: dense of community communities structure with few in outgoing the spreading links naturally of social trap10 contagions. informa- (A)(A)StructuralStructural trapping trapping: dense: dense communities communities with with few few outgoing outgoing links links naturally naturally trap trap informa- informa- tiontion flow. flow. (B) (B)SocialSocial reinforcement reinforcement:: people people who who have have adopted adopted a ameme meme (black (black nodes) nodes) trigger trigger tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiplemultiple exposures exposures to to others others (red (red nodes). nodes). In In the the presence presence of of high high clustering, clustering, any any additional additional multiple exposures to others (red nodes). In the presence of high clustering, any additional adoptionadoption is is likely likely to to produce produce more more multiple multiple exposures exposures than than in in the the case case of of low low clustering, clustering, in- in- adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducingducing cascades cascades of of additional additional adoptions. adoptions. (C) (C)HomophilyHomophily: people: people in in the the same same community community (same (same ducing cascades of additional adoptions. (C) Homophily: people in the same community (same colorcolor nodes) nodes) are are more more likely likely to to be be similar similar and and to to adopt adopt the the same same ideas. ideas. (D) (D) Diffusion Diffusion structure structure color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure basedbased on on retweets retweets among among Twitter Twitter users users sharing sharing the the hashtag hashtag#USA#USA. Blue. Blue nodes nodes represent represent English English based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English usersusers and and red red nodes nodes are are Arabic Arabic users. users. Node Node size size and and link link weight weight are are proportional proportional to to retweet retweet ac- ac- users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity.tivity. (E) (E) Community Community structure structure among among Twitter Twitter users users sharing sharing the the hashtags hashtags#BBC#BBCandand#FoxNews#FoxNews. . tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. BlueBlue nodes nodes represent represent#BBC#BBCusers,users, red red nodes nodes are are#FoxNews#FoxNewsusers,users, and and users users who who have have used used both both Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtagshashtags are are green. green. Node Node size size is is proportional proportional to to usage usage (tweet) (tweet) activity, activity, links links represent represent mutual mutual hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual followingfollowing relations. relations. following relations.

1010 10 Viral memes

(A) Structural Trapping Exposures

(D) Retweet Network Non-viral memes

(A) Structural Trapping (B) Social Reinforcement (C) Homophily Multiple ExposuresExposures Multiple

Exposures A

Figure 1: The importance of community structure in the spreading of social contagions. High Clustering Low Clustering (A) Structural trapping: dense communities with few outgoing links naturally trap informa- (D) Retweet Network (E)(E) Follower Follower Network Network (D) Retweet Network tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiple exposures to others (red nodes). In the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducing cascades of additional adoptions. (C) Homophily: people in the same community (same color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual following relations.

FigureFigure 1: 1: The The importance importance of of community community structure structure in in the the spreading spreading of of social social contagions. contagions. Figure(A) Structural 1: The trapping importance: dense of community communities structure with few in outgoing the spreading links naturally of social trap10 contagions. informa- (A)(A)StructuralStructural trapping trapping: dense: dense communities communities with with few few outgoing outgoing links links naturally naturally trap trap informa- informa- tiontion flow. flow. (B) (B)SocialSocial reinforcement reinforcement:: people people who who have have adopted adopted a ameme meme (black (black nodes) nodes) trigger trigger tion flow. (B) Social reinforcement: people who have adopted a meme (black nodes) trigger multiplemultiple exposures exposures to to others others (red (red nodes). nodes). In In the the presence presence of of high high clustering, clustering, any any additional additional multiple exposures to others (red nodes). In the presence of high clustering, any additional adoptionadoption is is likely likely to to produce produce more more multiple multiple exposures exposures than than in in the the case case of of low low clustering, clustering, in- in- adoption is likely to produce more multiple exposures than in the case of low clustering, in- ducingducing cascades cascades of of additional additional adoptions. adoptions. (C) (C)HomophilyHomophily: people: people in in the the same same community community (same (same ducing cascades of additional adoptions. (C) Homophily: people in the same community (same colorcolor nodes) nodes) are are more more likely likely to to be be similar similar and and to to adopt adopt the the same same ideas. ideas. (D) (D) Diffusion Diffusion structure structure color nodes) are more likely to be similar and to adopt the same ideas. (D) Diffusion structure basedbased on on retweets retweets among among Twitter Twitter users users sharing sharing the the hashtag hashtag#USA#USA. Blue. Blue nodes nodes represent represent English English based on retweets among Twitter users sharing the hashtag #USA. Blue nodes represent English usersusers and and red red nodes nodes are are Arabic Arabic users. users. Node Node size size and and link link weight weight are are proportional proportional to to retweet retweet ac- ac- users and red nodes are Arabic users. Node size and link weight are proportional to retweet ac- tivity.tivity. (E) (E) Community Community structure structure among among Twitter Twitter users users sharing sharing the the hashtags hashtags#BBC#BBCandand#FoxNews#FoxNews. . tivity. (E) Community structure among Twitter users sharing the hashtags #BBC and #FoxNews. BlueBlue nodes nodes represent represent#BBC#BBCusers,users, red red nodes nodes are are#FoxNews#FoxNewsusers,users, and and users users who who have have used used both both Blue nodes represent #BBC users, red nodes are #FoxNews users, and users who have used both hashtagshashtags are are green. green. Node Node size size is is proportional proportional to to usage usage (tweet) (tweet) activity, activity, links links represent represent mutual mutual hashtags are green. Node size is proportional to usage (tweet) activity, links represent mutual followingfollowing relations. relations. following relations.

1010 10 Another perspective Each community

~ interest group Concentrated in one community

The meme only appeals to the population Distributed throughout many communities

The meme appeals to the general population

Viral memes are literally like viruses.

Viral memes are attractive to everyone. Then, can we use this information to predict viral memes? Task: Given the network structure and early tweets, predict the final popularity. Old New Less dominant More dominant

(A) #ThoughtsDuringSchool 200 tweets

30 tweets

Early Stage Late Stage

(B) #ProperBand 65 tweets 30 tweets

Early Stage Late Stage

Figure 4: of two contrasting memes (viral vs. non-viral) in terms of commu- nity structure. We represent each community as a node, whose size is proportional to the number of tweets produced by the community. The color of a community represents the time when the hashtag is first used in the community. (A) The evolution of a viral meme (#ThoughtsDuringSchool) from the early stage (30 tweets) to the late stage (200 tweets) of diffusion. (B) The evolution of a non-viral meme (#ProperBand) from the early stage (30 tweets) to the final stage (65 tweets).

13 Old New Less dominant More dominant

(A) #ThoughtsDuringSchool 200 tweets

30 tweets

Early Stage Late Stage

(B) #ProperBand 65 tweets 30 tweets

Early Stage Late Stage

Figure 4: Evolution of two contrasting memes (viral vs. non-viral) in terms of commu- nity structure. We represent each community as a node, whose size is proportional to the number of tweets produced by the community. The color of a community represents the time when the hashtag is first used in the community. (A) The evolution of a viral meme (#ThoughtsDuringSchool) from the early stage (30 tweets) to the late stage (200 tweets) of diffusion. (B) The evolution of a non-viral meme (#ProperBand) from the early stage (30 tweets) to the final stage (65 tweets).

13 With random forest classifier

Summary

• Communities give us invaluable information about spreading patterns of memes. • We can predict viral memes by looking at communities • Non-viral memes seems to be strongly affected by social reinforcement and homophily while viral memes are not. • Viral memes spread (literally) virally. Lilian Weng Fil Menczer

Virality Prediction and Community Structure in Social Networks [.org:1306.0158]