Finding Influencers in Social Networks

Carolina de Figueiredo Bento

Dissertation submitted to obtain the Master Degree in Information Systems and Computer Engineering

Jury
President: Prof. Dr. Mário Jorge Costa Gaspar da Silva
Supervisor: Prof. Dr. Bruno Emanuel da Graça Martins
Co-Supervisor: Prof. Dr. Pável Pereira Calado
Member: Prof. Dr. Alexandre Paulo Lourenço Francisco

November 2012

Abstract

Among the millions of users that social platforms have, the activities of a select number of users are perceived and spread through the network more rapidly than those of others. These users are the influencers. They generate trends and shape opinions in social networks, and are crucial in areas such as marketing or opinion mining.

In my MSc thesis, I studied network analysis methods to identify influencers, experimenting with different types of networks: location-based networks from services like FourSquare or Twitter, which include relationships between users and between users and the locations they have visited, and academic citation networks, i.e., networks that relate scientific papers through citations.

Within location-based networks, I estimated the most influential nodes through a set of network analysis techniques. Because there is no ground-truth list, i.e., a list containing a set of well-known, accepted influencers, the veracity of these results was assessed by comparison with traditional measures (e.g., the number of friends a user has). The majority of the influencers are not the users with the highest number of friends.

Within academic citation networks, the papers identified as most influential were indeed important publications, either because they were authored by renowned researchers and recipients of important awards, or because they are fundamental reading or recent developments on a topic. I also developed a framework to predict future influence scores and download counts through a combination of features. Accurate estimates were obtained through the use of learning methods such as RT-Rank.

Keywords: Social Networks, Network Analysis, Impact Scores, Finding Influencers

Resumo

Social networking services have millions of users; however, we notice that the activity of a select group of users is captured and propagated through the network more rapidly than that of others. We call this group the influencers. They create trends and dominate opinions in social networks, and are crucial in areas such as marketing or opinion mining.

In my thesis, I studied network analysis methods to identify influencers, analyzing two types of networks: location-based networks, from services such as FourSquare or Twitter, which include relationships between users and between users and the locations they have visited, and academic citation networks, i.e., networks relating scientific articles through citations.

In location-based networks, the most influential nodes were estimated through a set of network analysis techniques. The veracity of these results was assessed by comparison with traditional measures (e.g., a user's number of friends), given that no list of influencers exists for validation, i.e., a list containing a set of unanimously recognized influencers.

In academic citation networks, the articles obtained as most influential are indeed important publications, because they were authored by renowned scientists who have received awards, or because they are essential publications or recent developments on a specific topic. I also developed a framework that predicts future influence scores and the future total number of downloads, combining features such as previous influence scores. Through the use of learning methods such as RT-Rank, it is possible to produce accurate estimates.

Keywords: Social Networks, Network Analysis, Influence Scores, Finding Influencers

Acknowledgments

First and foremost, I have to thank my parents, sister and brother-in-law for their unconditional support and selflessness throughout these years, and especially during my MSc thesis.

I must thank my advisors, Prof. Dr. Bruno Martins and Prof. Dr. Pável Calado, for all the support, motivation, patience and availability. It is very comforting to be able to share ideas and openly discuss new ways of addressing a problem with such ease. I must also thank them for giving me the opportunity to be part of projects such as the European Digital Mathematics Library (EuDML) and the Services for Intelligent Geographical Information Systems (SInteliGIS), both funded by the Portuguese Foundation for Science and Technology (FCT) through the project grants with reference 250503 in CIP-ICT-PSP.2009.2.4 and PTDC/EIA-EIA/109840/2009, respectively.

I thank all the colleagues and close friends who have accompanied me throughout the years, and especially the ones who have filled these last couple of years with so much joy, laughter and camaraderie. So, to Ana Silva, João Lobato Dias, Luís Santos, João Amaro, Pedro Cruz, Jacqueline Jardim, Maria Rosa, Luís Luciano, Carlos Simões, Mafalda Abreu, Célia Tavares and, thankfully, many others, I express my enormous gratitude for keeping me (in)sane.

Last, but definitely not least, I must thank my boyfriend, João Fernandes, for the unconditional love, support, patience and confidence, for helping me be more creative and acute during the stressful times, and for showing me there is always a light at the end of the tunnel.

Contents

Abstract
Resumo
Acknowledgments
1 Introduction
  1.1 Hypothesis and Methodology
  1.2 Main Contributions
  1.3 Organization of the Dissertation
2 Fundamental Concepts
  2.1 Fundamental Concepts in Graph Theory
  2.2 Influencers in Social Networks
  2.3 Prestige, Popularity and Attention in Social Networks
  2.4 Recognition, Novelty, Homophily and Reciprocity
  2.5 Active versus Inactive Users, User Retention, Confounding, Social Influence and Social Correlation
  2.6 Information Cascades
  2.7 Information Diffusion Models and Measures
  2.8 Graph Centrality Measures and Bibliographic Indexes
  2.9 Unsupervised Rank Aggregation Approaches
  2.10 Supervised Learning for Rank Aggregation
  2.11 Summary
3 Related Work
  3.1 The Hyperlink-Induced Topic Search (HITS) Algorithm
  3.2 The PageRank Algorithm and its Variants
    3.2.1 Weighted PageRank
    3.2.2 Topic-Sensitive PageRank
    3.2.3 TwitterRank
  3.3 The Influence-Passivity (IP) Algorithm
  3.4 Citation and Co-Authorship Networks
  3.5 Temporal Issues in Ranking Scientific Articles
  3.6 Summary
4 Finding Influencers in Social Networks
  4.1 Available Resources for Finding Influencers
    4.1.1 Characterizing Networks
  4.2 Analysis of Location-based Social Networks
    4.2.1 Data Collection from Online Services
    4.2.2 Adaptation of the Influence-Passivity (IP) Algorithm
  4.3 Analysis of Academic Social Networks
    4.3.1 Predicting Future Influence Scores and Download Counts
    4.3.2 The Learning Approach
  4.4 Summary
5 Validation Experiments
  5.1 The Considered Datasets
  5.2 Evaluation Methodology
  5.3 The Obtained Results
    5.3.1 Finding Influencers
    5.3.2 Predicting Future PageRank Scores and Download Counts
  5.4 Summary
6 Conclusions
  6.1 Summary of Results
  6.2 Future Work
Bibliography
Appendices
A Important Awards in Computer Science

List of Tables

5.1 Characterization of the FourSquare and Twitter networks.
5.2 Characterization of the DBLP dataset.
5.3 Characterization of the DBLP network.
5.4 User influence scores for PageRank and HITS algorithms, for the User+Spot Graph, built from the FourSquare dataset.
5.5 User influence scores for PageRank and HITS algorithms, for the User Graph, built from the FourSquare dataset.
5.6 User influence scores for the IP algorithm, built from the FourSquare dataset.
5.7 Spot influence scores for PageRank and HITS algorithms (which present the exact same top-10), for the User+Spot Graph, built from the FourSquare dataset.
5.8 User influence scores for PageRank and HITS algorithms, for the User+Spot Graph, built from the Twitter dataset.
5.9 User influence scores for PageRank and HITS algorithms, for the User Graph, built from the Twitter dataset.
5.10 Spot influence scores for PageRank and HITS algorithms, for the User+Spot Graph, built from the Twitter dataset.
5.11 PageRank scores for top-10 highest ranked papers of the DBLP dataset.
5.12 Results for the prediction of impact PageRank scores for papers in the DBLP dataset.
5.13 Results for the prediction of download numbers for papers in the DBLP dataset.

List of Figures

2.1 A graph with the set of vertices $V = \{1, \ldots, 8\}$, the set of edges $E = \{(1,2), (2,4), (3,4), \ldots\}$, and encoding a path $P$ with length 6 (adapted from Diestel (2005)).
2.2 Graph with three components and two SCCs denoted by dashed lines (adapted from Easley & Kleinberg (2010) and Cormen et al. (2001)).
2.3 Flowchart for the Single Transferable Vote rule.
2.4 Learning-To-Rank (L2R) Framework (adapted from Liu (2009)).
3.5 A graph with hubs and authorities (adapted from Kleinberg (1998)).
3.6 A graph illustrating the computation of PageRank (adapted from Page et al. (1998)).
3.7 The general TwitterRank framework (adapted from Weng et al. (2010)).
4.8 Example of a location-based social network (adapted from Zheng & Zhou (2011)).
4.9 A sequence of subdivisions of the world sphere, starting from the octahedron, down to level 5 corresponding to 8192 spherical triangles.
