<<

Analyzing the Impact of Events in an Online Community

Juan M. Tirado Daniel Higuero Florin Isaila Jes´us Carretero Carlos III University {jtirado,dhiguero,fisaila,jcarrete}@inf.uc3m.es

Abstract On-line social networking sites such as [21], Twit- The huge popularity of on-line social networking sites has in- ter [6] or YouTube [7] are currently among the most visited in the creased the likelihood that locally-relevant events propagate glob- . These sites integrate many Web 2.0 applications, which ally throughout the Web. Conversely, real world events captured as can be used in cooperation by communities build based on ex- digital content on the Web may influence the behavior of these dig- plicitly declared relationships such as friendships, follower, col- ital social communities. In this work we collected and analyzed league, fan etc. Given the huge popularity of these sites, the be- event-related data from LastFM, and global volume of searches havior of these communities is reflected throughout the Web. Re- from GoogleTrends, with the following objectives: (1) to study the versely, events from the real world reflected in the Web may influ- event mechanism provided by LastFM, (2) to evaluate the impact of ence the behavior of these digital communities. global and local events on system utilization, and (3) to understand One of the mechanisms that most of the social networking the event-related propagation of information over social links. sites offer is the declaration, sharing and of an event. We analyze the impact of LastFM events on the user activity The events that can be declared in advance are expected such as and interests. Our study indicates that half of LastFM events cause a concert, a political meeting, a music release, a party, etc. A an increase of the interest for an artist. However, several peaks category of events are unexpected (the sudden death of an artist, of popularity are not associated with LastFM events, while being an earthquake etc.) and are not reflected in the system before highly correlated with global volume of Internet searches provided their occurrence, unless there is some expectancy that they might by Trends. Finally, our analysis shows that the interest for happen. an artist appears to be disseminated over social links. We find out Understanding the effect of events on the community and on the that there are two factors likely to make a user influential over his system can bring several benefits. The intuition is that events have friends: the degree of interest and the number of social links. the potential of triggering activity (such as message exchange, ru- mor diffusion, content propagation) in an on-line social network Categories and Subject Descriptors H.3.5 [Online Information and even beyond it depending on the general interest and magni- Services]: Web-based services; J4 [Social and behavioral sci- tude. Events may be used as conveyors of advertising in viral mar- ences]: Sociology keting strategies [12, 14, 15]. On the other hand in an active social network, events may quickly influence users and cause increases General Terms Measurement in both traffic and server load. The knowledge of patterns of traf- fic produced by events can help service providers to forecast high Keywords Social Networks, Data Analysis, Last.FM, Google workloads and adequately distribute content or provision resources trends in order to improve user experience. In this paper we collected and analyzed data from LastFM [17], 1. Introduction a popular music Web site and Google Trends [11] for a period of 142 days. LastFM allows users to listen to selected artists, to Recent years have shown a spectacular surge in popularity of Web build bidirectional friendship relationships, to declare and share 2.0 based applications such as blogging, wikis, mashups, file shar- events such as concerts or music releases, and to post comments ing or social networking. One of the main reasons for this phe- on events. Google Trends is an indicator of the evolution of world nomenon is that these applications appear to reflect the behavior interests in certain topics given in number of searches. Our analysis and interests of users in the real world and allow participants to targets to answer the following questions. What impact do events intercommunicate in an efficient way. The ever increasing inter- declared in LastFM have on the behavior of the users and on the connectedness among the users makes Internet a ”global village” system? How does the local impact of events correlate with the in which trends, content, marketing and rumors propagate at high global impact? How can events be characterized? Have events any speed. impact on influencing behavior over social links? What are the shortcomings of the event mechanism in LastFM? How can the event mechanism in LastFM be improved? The remainder of this paper is structured as follows. Section 2 Permission to make digital or hard copies of all or part of this work for personal or presents the related work. Section 3 presents a summary of the data classroom use is granted without fee provided that copies are not made or distributed set and the results of our analysis. Finally, Section 4 concludes and for profit or commercial advantage and that copies bear this notice and the full citation presents future work. on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SNS ’11 April 10, 2011, Salzburg, Austria. Copyright c 2011 ACM 978-1-4503-0634-8/11/04. . . $10.00 2. Related work For selected artists, we used Google Trends to extract the search On-line social networking research has approached various aspects volume index for the name of the artist, daily for the period of our such as structural social network characteristics [1, 21], statistical trace. Google Trends [11] reports the share of a search term relative properties of content popularity [7, 19], growth and evolution of to the total number of searches done on Google over time. social graphs [5, 16, 20], network-level activity [22, 23]. While our study shares certain analyses and data collection methodology 3.2 Event impact with the cited works, it differs in its focus on characterization of In this section we evaluate the relevance of LastFM events based on events and their impact on workload and social interactions. the user participation in the system. The traces show that LastFM- Understanding the behavior of social networks has derived in registered events caused user activity. The average number of users different studies that propose algorithms to detect communities of announcing to attend an event of the 68 most popular artists was similar users. Community detection is achieved by different means 143. The play counts of songs of the involved artist on event’s such as analyzing the modularity [10, 29] between partitions of a day was 11,919. There was an average of 7.8 shouts per event and graph, or the overlapping cliques [9] in a graph. While these studies 1.3 reviews per event. Our results show weak correlation between are interested in discovering communities in a social network, our play counts and reviews, shouts and reviews, play counts and atten- approach differs in that we are not interested in finding commu- dance, and shouts and attendance. However, the number of reviews nities based only on social links, but taking also into account the and attendance are highly correlated (0.9), while play counts and profile of the users. shouts are moderately correlated (0.49). There are few works that examine events in social networks. Figure 1(a) shows the CDF of the ratio between the play counts Yin et al. [28] present an analysis of a particular event (Beijing on event day and the day before (pre-event impact) and Figure 1(b) 2008 Olympic games) for on Demand (VoD) workloads. the ratio between the day after and the day of the event (post-event They study user behavior patterns such as the emergence of flash impact) for the most popular 68 artists amounting to 13% of the crowds and the impact of advertising on video access patterns. play counts. On the event day 52% of the events appear not to have Becker et al. [13] present a method for clustering pictures by any impact on the system. The remaining 48% increase the play correlating Flickr upcoming events with LastFM events. In turn, we count 8.9% on average. In the day after, 50% of the events generate want to extend the understanding of event impact (1) by studying a similar average increase of 8.3%. the correlation between local social network events and global This lack of impact for the events is particularly interesting. interest (e.g. Google Trends) and (2) by analyzing the effect of Events in LastFM can be included in the system without any re- events on content utilization and propagation through social links. liability or relevance control. Although it is possible to notify prob- Several works have investigated patterns of dynamics of user lems related with duplicated or canceled events, there is no way to interaction and behavior in on-line social networks [4, 25, 27]. determine the reliability or relevance of an event. Some studies focus on the role of social links on the propagation of information and influence [8, 26]. Bakshy et al. [3] analyze contagion in Second Life on-line game by tracking the transfers 3.3 Impact of events on different popularity evolution of online artifacts over 130 days. They observe that 48% of all patterns transfers occur along social links. Last section has shown that LastFM events are qualitatively differ- ent in terms of impact. In this section, we complement the LastFM data with information extracted from Google Trends, reflecting the 3. Data set and Event characterization global interest in an artist based on the volume of Internet searches. In this section we provide a characterization of the events based on For this analysis, we filtered relevant LastFM events producing the collected data. First, we describe the methodology and provide an increase of 10% in the play count in two consecutive days general information of the data set. Second, we investigate the im- including the day of the event. Furthermore, from the 68 most pact of LastFM events based on the effect on the play count. Third, popular artists, we selected four artists with relevant popularity we evaluate the efficiency of LastFM events, by comparing explicit evolution patterns. Figure 2 shows the evolution of LastFM play events from LastFM and events inferred from Google Trends for counts, Google Trends volume search and related LastFM events. different patterns of popularity evolution. Finally, we study how Figure 2(a) shows a fast increase fast decrease pattern of play events may cause users to influence each other through social links. count evolution for ”Blink 182”. The LastFM play count is strongly correlated with Google Trends volume search index (0.91). How- th 3.1 Data set ever, the surge in popularity on February 9 , one day after the 1 group reunited for the Grammy Awards ceremony, is not repre- We crawled LastFM data analyzed in this paper through LastFM sented by any LastFM event. This event can be inferred from API by using a distributed crawler deployed in cluster over 20 Google Trends spike, which accurately corresponds to the surge machines. We traversed the friendship graph in a breadth first in the LastFM play counts. This event does not produce a consoli- search manner and extracted the profiles of a set of 250,000 users dated interest and both indicators show a fast decay. Events on May including the listened artists, daily for the period between January 15th 18th 1st 22nd and , posted in LastFM, can be inferred from play counts to May , 2009 (142 days). The extracted social graph has and Google Trends, but they produce a relative small increase com- an average degree of 14.9, a diameter of 8, an average path length pared to the earlier spike. of 4.37, and an average clustering coefficient of 0.17. Lily Allen’s popularity evolution plotted in Figure 2(b) shows The total number of artists users listened to was 2,390,970, a sluggish increase in the first days of observations when a new amounting to a total play count of 780,579,318. For a set of 68 album was promoted in different performances followed by a four most popular artists, representing 13% of the total observed play times surge coinciding with the launch of the album on February, count, we extracted the following attributes of all associated 1241 9th. In this case, the play count increases due to performances and events registered in LastFM: date, number of announced partici- an album release matched by LastFM events. Additionally, we can pants, number of comments (shouts), and number of reviews. also observe a strong correlation (0.86) between the play count in LastFM and the search volume index of Google Trends. After the 1 Traces from Last.FM are available upon request to the authors peak, the play count volume decreases slowly, with the particularity CDF CDF 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

0.8 1.0 1.2 1.4 1.6 1.8 0.8 1.0 1.2 1.4 1.6 1.8

Ratio of increase in the playcount Ratio of increase in the playcount

(a) Pre-event impact. (b) Post-event impact.

Figure 1. Impact of LastFM events on the play counts. 40 30 Trend Trend Playcount Playcount 20 10 5000 10000 20000 30000 5000 15000 25000 35000 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

01/01 01/22 02/12 03/05 03/26 04/16 05/07 01/01 01/22 02/12 03/05 03/26 04/16 05/07

(a) Fast increase - fast decrease (Artist: Blink 182). (b) Fast increase - slow decrease (Artist: Lily Allen). Trend Trend Playcount Playcount 50 100 150 200 4000 6000 8000 10000 5000 10000 15000 20000 246810

01/01 01/22 02/12 03/05 03/26 04/16 05/07 01/01 01/22 02/12 03/05 03/26 04/16 05/07

(c) Slow increase (Artist: Lady GaGa). (d) Slow decrease (Artist: Fall Out Boy).

Figure 2. Examples of four popularity evolution patterns. The continuous line indicates the LastFM play count and the dashed line indicates the Google Trends search volume index. Events from LastFM are plotted using vertical lines. that the mean play count is double after the launch of the new album posted in the system. As in previous cases, play count is strongly compared with the play count volume before. correlated with Google Trends search volume (0.74). Play count evolution also reflects how artists gain or lose popu- Figure 2(d) shows the play count evolution of ”Fall Out Boy”. In larity as they become famous. Figure 2(c) shows the slow increase this case, we can clearly observe a slowly decreasing trend, which of play count for ”Lady GaGa” as she becomes popular. The posi- shows how the artist becomes unpopular over time. The negative tive trend has a local maximum after events related with her Euro- trend starts with a play count volume of more than 10,000 per pean Tour (February 2nd − 13th). The local peak is not matched day, and ends at roughly 6,000. The last album of the artist was by any LastFM event, while the events on the European tour are presented on December, 12th, 2008. After the release of the new album, no major events were registered in the system until May, 3rd, when a United States tour starts. The list of LastFM events shown in the figure corresponds to the different major cities toured by the band, but unlike other cases, while the system registers minor peaks in play count volume, the negative trend continues. How the artist loses popularity is also reflected in Google Trends (2.8 times less search volume), matching the trend observed in LastFM (correlation 0.69).

3.4 Influence propagation over social links CDF In this section, we want to understand how events may affect in- fluence propagation over social links. For this analysis, we have Influential users (pre-event) selected for each of the four artists from Figure 2 the events as- Non influential users (pre-event) Influential users (post-event) sociated with the highest increase in play counts. The selected Non influential users (post-event) events correspond either to LastFM-declared events or events in- 0.2 0.4 0.6 0.8 1.0 1 2 5 10 20 50 100 200 500 ferred from correlated peaks of LastFM play count and Google User playcount Trends global volume search index (e.g. the peaks for Blink-182 from Figure 2(a) without an associated LastFM event). For the an- (a) Play count of potential influential and non-influential users alyzed events, we select the users listening to the associated artist on day D1, where D1 is either the pre-event day (pre-event in- and the play count for three days: pre-event day, day of the event fluence) or event day (post-event influence). Potential influen- and post-event day. For the users from this set, we define a user to tial users show a higher activity than non-influential users on D1 day. be active if: (1) the user listens to an artist two consecutive days D1 and D2 (2) D1 is the day of the event (post-event active) or D2 is the day of the event (pre-event active). We classify the active users in potential influential over other users with respect to an event if the number of friends listening to the artist on day D2 is greater than on day D1, and non-influential otherwise. A potential influential user has pre-event influence if D2 is the day of the event and post-event influence if D1 is the day of the event. A user that may have been influenced by others is CDF called infected. We use the attribute potential in order to point out that there can be other factors, which can affect the behavior of Influential users (pre-event) a friend listening to the same artist in the next day. One of these Non influential users (pre-event) 0.2 0.4 0.6 0.8 1.0 Influential users (post-event) factors is homophily, which is defined as the tendency of people Non influential users (post-event) with similar tastes to behave similarly. Influence and homophily have been shown to be hard to distinguish when present [2, 24]. 1 2 5 10 20 50 100 200 However, the absence of homophily between two users listening Average playcount of user friends to the same artist in consecutive days, might indicate a higher (b) Average play count of friends of potential influential and probability of influence propagation over the social links. non-influential users on day D2, where D2 is either event day Table 1 shows that from 2759 pre-event active users 32% may (pre-event influence) or post-event day (post-event influence). be influential over 5970 of their friends. The number of post- Friends of potential influential users show a higher activity event active users increases to 3475, out of which 24% may be compared to friends of non-influential users. influential over 5116 friends. A number of 1516 are both pre-event and post-event active users. A number of 108 pre-event infected users become post-event active, out of which 16 become post-event influential, a result suggesting the propagation of influence over the social cascade.

Pre-event Post-event

Active users 2759 3475 CDF Potential Influential 894 822 Non-influential 1865 2653 Influential users (pre-event) Infected 5970 5116 Non influential users (pre-event) Influential users (post-event) Non influential users (post-event) 0.0 0.2 0.4 0.6 0.8 1.0 Table 1. Active and potential influential users. Active users are 1 2 5 10 20 50 100 200 500 users listening to an artist two consecutive days, including the Total number of friends day of the event. Potential influential users are active users, whose (c) Social degree distribution of potential influential and non- number of friends listening to an artist increments the second day. influential users for both pre-event and post-event influence. Potential influential users have a significant higher degree than First, we want to understand if there is any factor that makes non-influential users. an active user become or not influential over other users on day D2. Figure 3. Analysis of pre-event and post-event user influence. Figure 3(a) shows the CDF of the play count of potential influential and non-influential users during D1 day for two cases: D1 is the pre-event day (pre-event influence), and D1 is the event day (post event influence). Potential influential users show a higher play count both days. The play count for non influential users slightly increases the day of the event, but it shows a significant difference with potential influential users. On the day before the event, 50% of the potential influential users listen to less than 3 songs (with an average of 13.05 songs), while one half of the non influential users listens to less than 2 songs (average 8.883). For the event day the CDF play count increases in both cases. Potential influential users listen to less than 5 songs in 50% of the cases (average 18.97), while 50% Blink-182 Lady GaGa of non influential users listen to less than 3 songs (average 10.43). Fall Out Boy Lily Allen On average, potential influential users listen to 68% and 54% more 0.0 0.2 0.4 0.6 0.8 1.0 songs than non influential users the day before the event and the 0.0 0.2 0.4 0.6 0.8 1.0 event day, respectively. suiuj Second, we study the effect that active users have on their s(u ,u ) friends on day D2. Figure 3(b) illustrates the average play count for Figure 4. i j for potential influential users the friends of potential influential and non-influential users. Friends of potential influential users listen on average to more songs than friends of non-influential users on event day: 14.65 songs (median: to differentiate between two cases: do we identify an influence 7.889) versus 11.439 songs (median: 4.536) for friends of non- propagated over the social link? or was the social link created influential users. On post-event day the average play count value because the two users share common interests (homophily)? We for friends of both potential influential and non-influential users is propose a probabilistic interpretation, which associates a higher similar (the median differs: 9.102 for potential influential and 6 for certainty of influence propagations to the pairs sharing less content, non-influential users). This is due to the fact that, even though the and a higher certainty of homophily to the pairs sharing more aggregate play count increases, the number of friends of potential content. influential users listening to the same artist also increases; for the non-influential users this number remains constant. 4. Conclusions and future work Third, we investigate how the social degree of active users might indicate the proclivity of a user to become potential influential or The analysis performed in this work suggests that events are corre- non-influential. Figure 3(c) shows that potential influential users lated with activity in LastFM, irrespective if they are or not posted have a significant higher number of friends (median 58, mean 75.6) in the local scope of the system. than non-influential users (median 24, mean 39.55). The results First, we have observed that only around 50% of the events suggest that users with a high social degree are more likely to posted in the system are associated with an increase in the play influence their friends than users with a lower social degree. count of the involved artist larger than 10%, while the other 50% Finally, we analyze the pairs of friends classified by the pre- did not appear to have any impact. On the other hand, several peaks vious analysis as potential influential and infected users, in order of popularity were not associated with any LastFM event, reducing to estimate if the observed behavior might be a result of influence the possibility of load prediction. propagation or homophily [18]. We base this analysis on a met- Second, our study shows that there is a high correlation between ric computing the distance between user consumption profiles. The the LastFM play count per artist and the volume of searches in Google Trends. This fact indicates that local LastFM activity re- consumption profile P (ui,dk) is the set of tags associated to artists flects the global interest on artists. This result suggests that the lack listened by user ui on day dk. We calculate s(ui,uj ), a content of user-declared LastFM events may be compensated by events in- sharing coefficient between users ui and uj over a period of D days before the potential influence was observed, as the average over D ferred from the global interest for an artist. This approach can be used both for advertising inside the system or predicting system of Jaccard similarity coefficient for user day profiles P (ui,dk) and load. P (uj ,dk). If two users are listening daily to the very same set of artists the value of the sharing coefficient will be 1, while if they Third, we investigate the influence of events on the propagation are listening to completely different artists it will be 0. The formula of information through social links. Our analysis shows that inter- of s is given by Equation 1. est for an artist appears to be disseminated over social links. We identify a set of active users, which listen to an artist both the day X before the event and the day of the event. Our analysis suggests 1 |P (ui,dk) ∩ P (uj ,dk)| s(ui,uj )= (1) that there are two factors likely to make an active user potentially |D| |P (ui,dk) ∪ P (uj ,dk)| d ∈D influential or non-influential. Influential users listen on average to k more songs than non-influential users in the day before event: 13 Figure 4 shows a CDF with the values of s(ui,uj ),whereui versus 9. Influential users have on average more friends than non- are potential influential users, uj the potentially infected user, and influential users: 76 versus 40. Our results also show that 16 in- D one week before an event occurs. We observe that the pairs of fected users become influential for their friends in the day after the potential influential-infected users listening to Blink-182 appear to event, suggesting the existence of a social cascade. In order to con- share less similar content than the users listening to all the other firm this hypothesis, we measure the level of homophily as the ratio artists, whereas the pairs listening to Lady Gaga share more content of shared tags between potential influential and potentially infected than the others. This indicates that the probability that a friend users for the period of our trace. Our study shows that the sharing was infected over a social link is larger for Blink-182 and smaller coefficient remains low (under 0.4) in most cases. This suggests for Lady Gaga. For example, for Blink-182 around 20% of pairs that with a larger certainty the potential users can be classified as do not share any tag 7 days before the event. This observation influential. However, in some cases the sharing coefficient can be suggests that listening to the artist by a potentially infected user as high as 1. We propose a probabilistic interpretation, associating might be the result of a influence propagating over a social link. high certainty of influence propagation with low sharing coefficient In the case of Lady Gaga, around 20% of the pairs have a sharing values and high certainty of homophily with high sharing coeffi- coefficient larger than 0.4. For these pairs of users it is complicated cient values. Finally, we make some observations and suggestions related to [9] N. Du, B. Wang, and B. Wu. Community detection in complex the LastFM event mechanism. In LastFM, users can post events networks. Journal of Science and Technology, 23(4):672– with a limited number of attributes (for instance an event can be 683, 2008. only a concert or a festival), making the automatic classification of [10] J. Duch and A. Arenas. Community detection in complex networks events difficult. Additionally there is no validation and reputation using extremal optimization. Phys.Rev.E, 72(2):027104, Aug 2005. mechanism, which should allow users to evaluate the relevance of [11] Google Trends. http://www.google.com/trends, 2009. an event. We suggest three improvements in the LastFM event man- [12] J. Hartline, V. Mirrokni, and M. Sundararajan. Optimal marketing agement system. First, a broader range of attributes should allow strategies over social networks. In Proceeding of the 17th interna- the classification of events into live performances, disc releases, tional conference on World Wide Web. ACM New York, NY, USA, TV appearances, etc. Second, we suggest a collaborative event val- 2008. idation mechanism including a reliability system to help users to [13] L. G. Hila Becker, Mor Naaman. Event identification in social media. decide by vote if an event is relevant or reliable or whether a user In ACM SIGMOD Workshop on the Web and Databases (WebDB ’09), is to be trusted. Third, we recommend a mechanism for semiauto- 2009. matic addition of events, by monitoring the global interest based on [14] D. Kempe, J. Kleinberg, and E.´ Tardos. Maximizing the spread of services such as Google Trends. This is likely to increase both user influence through a social network. pages 137–146. ACM New York, satisfaction and system popularity. NY, USA, 2003. We would like to extend this work in various directions. First, [15] J. Kleinberg. Cascading behavior in networks: Algorithmic and eco- we would like to propose mechanisms for inferring events from nomic issues. 2007. the activity and interests reflected either in related sites or globally [16] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online in the Internet. Second, we are interested in studying the relation- social networks. In KDD ’06: Proceedings of the 12th ACM SIGKDD ship between events and interaction patterns in LastFM. Third, we international conference on Knowledge discovery and data mining, would like to investigate how events can be leveraged for predicting pages 611–617, New York, NY, USA, 2006. ACM. activity in the system, in order to adequately provision resources [17] Last.FM. http://www.last.fm, 2009. during peak loads. Fourth, we wish to make a comparative study [18] M. McPherson, L. Smith-Lovin, and J. Cook. Birds of a feather: among different systems to extend our conclusions to a larger sam- Homophily in social networks. Annual review of sociology, 27:415– ple of users and systems. 444, 2001. [19] Y. G. Michael Zink, Kyoungwon Suh and J. Kurose. Watch global, Acknowledgements cache local: Youtube network traces at a campus network - measure- ments and implications. In IEEE MMCN, 2008. This research has been partially funded by the Spanish Ministry [20] A. Mislove, H. S. Koppula, K. P. Gummadi, P. Druschel, and B. Bhat- of Education under FPU program AP2007-03530 (Juan M. Tirado) tacharjee. Growth of the flickr social network. In WOSP ’08: Proceed- and by the Spanish Ministry of Science and Innovation under the ings of the first workshop on Online social networks, pages 25–30, project TIN2010-16497. New York, NY, USA, 2008. ACM. [21] A. Mislove, M. Marcon, K. Gummadi, P. Druschel, and B. Bhattachar- jee. Measurement and analysis of online social networks. In Proceed- References ings of the 7th ACM SIGCOMM conference on Internet measurement, [1] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. Analysis of pages 42–53. ACM, 2007. topological characteristics of huge online social networking services. [22] A. Nazir, S. Raza, D. Gupta, C.-N. Chuah, and B. Krishnamurthy. Net- In WWW ’07: Proceedings of the 16th international conference on work level footprints of facebook applications. In IMC ’09: Proceed- World Wide Web, pages 835–844, New York, NY, USA, 2007. ACM. ings of the 9th ACM SIGCOMM conference on Internet measurement [2] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing conference, pages 63–75, New York, NY, USA, 2009. ACM. influence-based contagion from homophily-driven diffusion in dy- [23] F. Schneider, A. Feldmann, B. Krishnamurthy, and W. Willinger. Un- namic networks. Proceedings of the National Academy of Sciences, derstanding online social network usage from a network perspective. 106(51):21544, 2009. In IMC ’09: Proceedings of the 9th ACM SIGCOMM conference on [3] E. Bakshy, B. Karrer, and L. Adamic. Social influence and the diffu- Internet measurement conference, pages 35–48, New York, NY, USA, sion of user-created content. Proceedings of the tenth ACM conference 2009. ACM. on Electronic commerce, pages 325–334, 2009. [24] C. R. Shalizi and A. C. Thomas. Homophily and contagion are [4] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida. Characterizing generically confounded in observational social network studies. arXiv, user behavior in online social networks. In IMC ’09: Proceedings of 2010. the 9th ACM SIGCOMM conference on Internet measurement confer- [25] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the ence, pages 49–62, New York, NY, USA, 2009. ACM. evolution of user interaction in facebook. In WOSN ’09: Proceedings [5] C. Canali, M. Colajanni, and R. Lancellotti. Characteristics and of the 2nd ACM workshop on Online social networks, pages 37–42, evolution of content popularity and user relations in social networks. New York, NY, USA, 2009. ACM. In and Communications (ISCC), 2010 IEEE Symposium [26] J. Weng, E. Lim, J. Jiang, and Q. He. TwitterRank: Finding Topic- on, pages 1 –7, jun. 2010. sensitive Influential Twitterers. International Conference on Web [6] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring Search and Data Mining, 2010. User Influence in Twitter: The Million Follower Fallacy. In In Pro- [27] C. Wilson, B. Boe, A. Sala, K. Puttaswamy, and B. Zhao. User ceedings of the 4th International AAAI Conference on Weblogs and interactions in social networks and their implications. In Proceedings Social Media (ICWSM). of the 4th ACM European conference on Computer systems, pages 205–218. ACM, 2009. [7] M. Cha, H. Kwak, P. Rodriguez, Y. Ahn, and S. Moon. I tube, you tube, everybody tubes: analyzing the world’s largest user generated content [28] H. Yin, X. Liu, F. Qiu, N. Xia, C. Lin, H. Zhang, V. Sekar, and G. Min. video system. In Proceedings of the 7th ACM SIGCOMM conference Inside the bird’s nest: Measurements of large-scale live vod from the on Internet measurement, page 14. ACM, 2007. 2008 olympics. In IMC’09, 2009. [8] M. Cha, A. Mislove, B. Adams, and K. Gummadi. Characterizing [29] X. S. Zhang, R. S. Wang, Y. Wang, J. Wang, Y. Qiu, L. Wang, and social cascades in Flickr. Proceedings of the first workshop on Online L. Chen. Modularity optimization in community detection of complex social networks, pages 13–18, 2008. networks. EPL (Europhysics Letters), 87(3):38002, 2009.