On Social Interaction Metrics
Social Network Crawling Based on Interestingness

Fredrik Erlandsson

Blekinge Institute of Technology Licentiate Dissertation Series No. 2014:06
Licentiate Dissertation in Computer Science
Department of Computer Science and Engineering
Blekinge Institute of Technology, Sweden
2014

Publisher: Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden
Printed by Lenanders Grafiska, Kalmar, 2014
ISBN: 978-91-7295-287-4
ISSN 1650-2140
urn:nbn:se:bth-00596

“A squirrel dying in front of your house may be more relevant to your interests right now than people dying in Africa.”
Mark Zuckerberg, 2011.

Abstract

The use of online social networks poses interesting big data challenges. With limited resources it is important to evaluate and prioritize interesting data. This thesis addresses the following aspects of social network analysis: efficient data collection, social interaction evaluation and user privacy concerns.

It is possible to collect data from most online social networks via their open APIs. However, a systematic and efficient collection of online social network data is still challenging. Results in this thesis suggest that the collection time can be reduced to 48 % by prioritizing the collection of posts.
Evaluation of social interactions requires data that covers all the interactions in a given domain. This has previously been difficult to do. In this thesis we propose a tool that is capable of extracting all social interactions from Facebook. With the extracted data it is, for instance, possible to illustrate interactions between different users that do not necessarily have to be connected. Methods using the same data to identify and cluster different opinions in online communities have been developed and evaluated.

The privacy of the content produced and the end-users’ private information provided in social networks is important to protect. Users should be aware of the privacy-related consequences of posting in online social networks. Therefore, mitigating privacy risks contributes to a secure environment, and methods to protect user privacy are presented.

The proposed tool has, over a period of 20 months, collected 38 million posts from public pages on Facebook, which include 4 billion likes and 340 million comments from 280 million users. The data collection is, to the best of our knowledge, the largest research dataset of social interactions on Facebook, enabling research in the area of social network analysis.

Preface

This thesis consists of four articles that have been submitted, peer reviewed and published in conferences. The thesis also contains a submitted journal article. The articles have been written together with colleagues from Blekinge Institute of Technology and University of California Davis. The thesis material has appeared in the following publications:

1. Fredrik Erlandsson, Martin Boldt, Henric Johnson, “Privacy Threats Related to User Profiling in Online Social Networks”, 2012 International Conference on Privacy, Security, Risk and Trust (PASSAT), pp. 838–842, 2012.
2. Roozbeh Nia, Fredrik Erlandsson, Prantik Bhattacharyya, Mohammad Rezaur Rahman, Henric Johnson, S. Felix Wu, “SIN: A Platform to Make Interactions in Social Networks Accessible”, 2012 International Conference on Social Informatics (SocialInformatics), pp. 205–214, 2012.
3. Fredrik Erlandsson, Roozbeh Nia, Henric Johnson, S. Felix Wu, “Making social interactions accessible in online social networks”, Information Services and Use, pp. 113–118, 2013.
4. Teng Wang, Keith C. Wang, Fredrik Erlandsson, S. Felix Wu, Robert Faris, “The Influence of Feedback with Different Opinions on User Continued Participation in Online Newsgroups”, 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM ’13), pp. 388–395, 2013.
5. Fredrik Erlandsson, Martin Boldt, Henric Johnson, S. Felix Wu, “Interaction metrics to support crawling prioritization in online social networks”, submitted to Information Sciences, June 2014.

Publication 1 deals with privacy issues identified by the authors, with the thesis author as the main driver. Publications 2 and 4 are related, as they form part of the motivation for the data collection process discussed in publication 5. For both publications 2 and 4, the thesis author contributed the dataset, the experiment design and the development of the SINCERE search engine shown in Section 8.3.5. The thesis author was highly involved in the writing of publication 2, together with the co-authors. Publication 3 is a pre-study for publication 5, with the thesis author as the main driver and contributor of the material. For publication 5, the thesis author was the main driver, conducting and developing experiments and tools. The thesis author is also the principal driver of the writing, together with the senior co-authors.

Contents

Abstract
Preface
1 Introduction
2 Background
  2.1 Related Work
  2.2 Terminology
3 Approach
  3.1 Aim
  3.2 Scope
  3.3 Research Questions
  3.4 Research Methodology
4 Results
  4.1 Contributions
  4.2 Discussion
  4.3 Conclusion
  4.4 Future Work
  4.5 References
5 Privacy Threats Related to User Profiling in Online Social Networks
  Fredrik Erlandsson, Martin Boldt, Henric Johnson
  5.1 Introduction
  5.2 Privacy Threats
  5.3 Proof-of-Concept
  5.4 Protection Mechanisms
  5.5 Conclusion
  5.6 References
6 SIN: A Platform to Make Interactions in Social Networks Accessible
  Roozbeh Nia, Fredrik Erlandsson, Prantik Bhattacharyya, Mohammad Rezaur Rahman, Henric Johnson, S. Felix Wu
  6.1 Introduction
  6.2 Related Work
  6.3 Social Interactions Network
  6.4 Applications
  6.5 SIN API
  6.6 Security Issues and Implementation Challenges
  6.7 Evaluation
  6.8 Future Work
  6.9 Acknowledgements
  6.10 References
7 Making social interactions accessible in online social networks
  Fredrik Erlandsson, Roozbeh Nia, Henric Johnson, S. Felix Wu
  7.1 Introduction
  7.2 Related Work
  7.3 A Platform to Make