
From Information Cascade to Knowledge Transfer: Predictive Analyses on Social Networks

by

Biru Cui

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computing and Information Sciences

PhD Program in Computing and Information Sciences B. Thomas Golisano College of Computing and Information Sciences

Rochester Institute of Technology
Rochester, New York
July 2016

From Information Cascade to Knowledge Transfer: Predictive Analyses on Social Networks

by Biru Cui

Committee Approval: We, the undersigned committee members, certify that we have advised and/or supervised the candidate on the work described in this dissertation. We further certify that we have reviewed the dissertation manuscript and approve it in partial fulfillment of the requirements of the degree of Doctor of Philosophy in Computing and Information Sciences.

______Dr. Shanchieh Jay Yang Date Dissertation Advisor

______Dr. Andres Kwasinski Date Dissertation Committee Member

______Dr. Christopher Homan Date Dissertation Committee Member

______Dr. Linwei Wang Date Dissertation Committee Member

______Dr. Victor Perotti Date Dissertation Committee Member

Certified by:

______Dr. Pengcheng Shi Date Director, Computing and Information Sciences

Acknowledgments

When I started writing this paragraph, memories refreshed and reminded me of the moments of these past years. It has been a long and short journey: a journey of puzzlement, confusion, frustration, and starting over; a journey of friendship and encouragement; and a journey that will always remind me to be positive and never give up.

I deeply appreciate the guidance I received throughout the whole course of study from my supervisor, my mentors, and my friends: Dr. Shanchieh Jay Yang, Dr. Andres Kwasinski, Dr. Linwei Wang, Dr. Chris Homan, Dr. Justin Domke, Dr. Pengcheng Shi, Richard Tolleson, Charles Gruener, Emilio Del Plato, Haitao Du, and Wenbo Wang. I could not have achieved this without your support and help.

Finally, I would like to dedicate this dissertation to my parents, my wife, my sister, and my nephew. You are the reason I am able to keep moving forward.

Abstract

From Information Cascade to Knowledge Transfer: Predictive Analyses on Social Networks

Publication No.

Biru Cui, Ph.D. Rochester Institute of Technology, 2016

Supervisor: Dr. Shanchieh Jay Yang

As social media continues to influence our daily life, much research has focused on analyzing characteristics of social networks and tracking how information flows in social media. Information cascade originated from the study of information diffusion, which focused on how decision making is affected by others depending on the network structure. An example of such study is the SIR (Susceptible, Infected, Removed) model. Current research on information cascade mainly focuses on three open questions: diffusion models, network inference, and influence maximization. Different from these studies, this dissertation aims at deriving a better understanding of the problem of who will transfer information to whom. Particularly, we want to investigate how knowledge is transferred in social media.

The process of transferring knowledge is similar to the information cascade observed in other social networks in that both processes transfer particular information from an information container to users who do not have that information. The study first works on understanding information cascade in terms of detecting information outbreaks in Twitter and the factors affecting the cascades. Then we analyze how knowledge is transferred in the sense of adopting research topics among scholars in the DBLP network. However, knowledge transfer cannot be well modeled by scholars' publications, since a "publication" action is the result of many complicated factors and is not controlled by knowledge transfer alone.

So, we turn to the Q&A forum, a different type of social media that explicitly contains the process of transferring knowledge, where knowledge transfer is embodied by the question-and-answer process. This dissertation further investigates StackOverflow, a popular Q&A forum, and models how knowledge is transferred among StackOverflow users. The knowledge transfer includes two parts: whether a question will receive answers, and whether an answer will be accepted. By investigating these two problems, it turns out that the knowledge transfer process is affected by the temporal factor and by the knowledge level, defined as the combination of the user reputation and the posted text. Taking these factors into consideration, this work proposes TKTM (Time-based Knowledge Transfer Modeling), where the likelihood that a user transfers knowledge to another is modeled as a continuous function of time and of the knowledge level transferred. TKTM is applied to solve several predictive problems: how many user accounts will be involved in a thread to provide answers and comments over time; who will provide the answer; and who will provide the accepted answer. The results are compared to NetRate, QLI, and regression methods such as RandomForest and linear regression. In all experiments, TKTM outperforms the other methods significantly.

Table of Contents

Acknowledgments iii

Abstract iv

List of Tables xi

List of Figures xiii

Chapter 1. Introduction 1
1.1 Knowledge Mining in Social Network (Static) 1
1.2 Information Cascade in Social Network (Dynamic) 3
1.2.1 What is Information Cascade 4
1.2.2 Diffusion Models 5
1.2.3 Network Inference 7
1.2.4 Influence Maximization 8
1.2.5 Influence Estimation 9
1.3 Knowledge Mining in Q&A Forum 9
1.4 Research Problem 11

Chapter 2. Information Flow Detection 13
2.1 Influence Maximization 13
2.2 Sensing Security Vulnerability by using Twitter 14
2.2.1 Problem Definition 15
2.2.2 TCAD (Twitter Critical Account Discovery) 16
Reward Functions 17
Total Reward 19
Greedy Algorithm 20
2.2.3 Experiment Design & Result 21
2.2.4 Summary 23

Chapter 3. Information Cascade Prediction 25
3.1 Introduction 26
3.2 Conflicting Observations on Cascade Size Distribution 28
3.3 Phase-Transition and GPC 30
3.4 What Affects the Cascade Size? 33
3.4.1 Temporal Factor 33
Information Attractiveness 33
3.4.2 Spatial Factors 34
Infection at Different Locations 34
Number of Infected Friends (Parents) 35
3.5 Aggregate Cascade Model 35
3.6 Summary 38

Chapter 4. Individual Knowledge Adoption Prediction 39
4.1 Introduction 39
4.2 Influence Factors for Research Topic Adoption 41
4.2.1 Direct Observation 41
4.2.2 Indirect Observation 42
Connections 42
Topic 42
4.3 Model Formulation 43
4.3.1 NetInf* 43
4.3.2 HetNetInf 44
4.4 Experiments 46
4.4.1 The Dataset and Experimental Setting 46
4.4.2 Prediction Study 47
4.5 Summary 50

Chapter 5. Knowledge Propagation Network 53
5.1 Introduction 53
5.2 Q&A forum: StackOverflow 57
5.2.1 Dataset Description 58
5.2.2 User Network 59
5.2.3 Properties 60
Reputation 61
Thread Life Time 62
5.3 Factors affecting knowledge transfer 64
5.3.1 Questioner: Whether a Question will Receive Answers 64
5.3.2 Answerer: Which Answer will be Selected? 65
5.4 TKTM (Time based Knowledge Transfer Modeling) 68
5.5 Community analysis with TKTM 74
5.6 Evaluation on TKTM 77
5.6.1 Task 1: Thread Size Prediction 77
Thread Size Estimation with Regression 78
Thread Size Prediction with Link Weight 78
Result 80
5.6.2 Task 2: Thread Size Prediction over Time 82
5.6.3 Task 3: Individual Action Prediction 84
5.6.4 Task 4: Who will Provide the Accepted Answer 86
5.6.5 TKTM & Semantic Analysis 88
5.7 Summary 91

Chapter 6. Conclusion 92

Appendix 95

Appendix 1. Datasets 96
1.1 DBLP 96
1.2 Digg Network 97
1.3 Twitter Network 99
1.4 StackOverflow 99

Appendix 2. Network Analysis 102
2.1 Terminologies 102
2.2 Random Networks 102
2.2.1 Poisson Random Network 102
2.2.2 General Degree Distribution Random Network 104
2.3 Giant Component 105
2.4 K-core size Estimation 107
2.4.1 107
2.4.2 Example 109

Bibliography 112

List of Tables

3.1 Independent cascade generation process 29
3.2 Aggregate cascade generation process 37
4.1 Topic Adoption Prediction Performance 49
4.2 Individual Features Weight 50
5.1 Q&A forum Terminologies 57
5.2 Datasets Summary 59
5.3 Statistics related to accepted answers 63
5.4 Features for predicting #answers 65
5.5 Features for classifying whether an answer will be accepted 66
5.6 MSE of estimation on Q&A thread size 81
5.7 Estimation on Answerer in MRR 85
5.8 Algorithm of estimating who will provide accepted answer 87
5.9 Estimation on Answerer in MRR 87
5.10 Prediction on Answerer in MRR with different γ 90
1.1 DBLP:Author 96
1.2 DBLP:Paper 97
1.3 DBLP:Citation 97
1.4 DBLP:Term 97
1.5 Digg:Users 98
1.6 Digg:Friendships 98
1.7 Digg:Votes 98
1.8 Twitter:Users 99
1.9 Twitter:Friendships 99
1.10 Twitter:Tweets 100
1.11 StackOverflow datasets 101
2.1 Terminology List 103
2.2 GPC(k-core) size 110
2.3 GPC(k-core) size 111

List of Figures

1.1 Cascades with different number of seeds 6
2.1 Each layer stands for a CVE topic; each CVE has a CWE type which belongs to a category. Circles in layers are tweets. Each user may publish tweets among topics. 17
2.2 Topics coverage of selected accounts in security vulnerability categories 22
2.3 Topic publish timeliness of selected accounts in security vulnerability categories 23
3.1 Cascade size distribution 30
3.2 Cascade size distribution with different uniform infection probability λ 32
3.3 Number of seeds emerged over time 34
3.4 Exponential fitting of the decay of the infection probability at different locations 35
3.5 Distribution of the infection waiting time in the Digg network dataset 36
3.6 Cascade size distribution with the aggregate cascade generation algorithm 38
4.1 Individual Popularity 43
4.2 Prediction Performance 49
4.3 Prediction Performance 49
4.4 Feature Weight Distribution 50
4.5 Feature Weight Distribution 50
5.1 Q&A thread structure 58
5.2 Degree Distribution of user network 60
5.3 Distribution of user reputation 61
5.4 Distribution of T_c^a 62
5.5 Importance of Features 67
5.6 Network with TKTM 75
5.7 Distribution of actors' reputation, closeness and activeness 76
5.8 Thread Size Distribution 82
5.9 MSE of prediction over time (answers only) 83
5.10 Prediction on Answerer in MRR with different γ 90
2.1 Poisson Random Network Giant Component Size 106
2.2 Graphic solution of u 109
2.3 Digg network out-degree distribution 110

Chapter 1

Introduction

As social media continues to influence our daily life, much research has focused on analyzing characteristics of social networks and tracking how information flows in social media. Information cascade is one important component of social network analysis. In general, these studies can be grouped into two categories: static information mining and dynamic information mining, where information cascade belongs to the latter.

1.1 Knowledge Mining in Social Network (Static)

A social network is a heterogeneous network where each node is linked into different networks based on different properties. For example, an author can be linked into a network based on co-authorship, and can also be linked into a network based on whether two authors joined the same conference. Most work in this category focuses on how to leverage the rich heterogeneous network information to mine knowledge that is impossible or difficult to derive from a homogeneous network.

Existing research can be grouped into the following categories:

• Ranking & Clustering Using a bibliography network as an example, there are links between author-paper, paper-venue, and author-venue. Sun et al. [1] used meta-paths to define the social connections. In [2], Sun et al. learned the weights of meta-paths, and further used these meta-paths for clustering. Clustering is also combined with ranking to achieve better performance. Sun et al. [3] did the ranking and clustering together in a bibliography network, based on the assumption that authors' rank distributions over different clusters should be very different. Ji et al. [4] solved the same problem with a slightly different approach, where they assume an object's rank in a class depends on its linked features' rank in the same class.

• Object Reconciliation (Similarity) Sun et al. [5] defined the similarity of two authors through the meta-paths between them. Given the author-venue adjacency matrix W_ac, the meta-path count between authors is W_ac × (W_ac)^T. In this case, authors are similar if they attend the same venues the same number of times (see the sketch after this list).

• Link Prediction Link prediction is another big topic which attracts many researchers. There are, in general, two approaches: supervised learning and unsupervised learning. Sun et al. [1] analyzed the co-author probability for two authors who never co-authored before. They derived social connection features to learn a model which best fits co-author events with maximum likelihood. Since the link recommended is the link which has a high probability of being accessed, it can be thought of as a random walk process. Backstrom et al. [6] learned a function to assign edge weights, such that a random walker is more likely to visit the nodes to which a new link will be created. This function assigns high weight to node pairs where a link is going to be created in the future, and low weight to node pairs with no link. Heterogeneous network connections are input as features to train the function.

Other unsupervised learning methods in link prediction include: CN (common neighbors) [7], AA (each CN is weighted by its degree) [8], JA (CN over all their neighbors) [8] [9], and PA (the similarity of two nodes is evaluated by the product of their degrees) [10] [7]; a small sketch of these scores follows this list. Yang et al. [11] proposed a Multi-Relational Influence Propagation (MRIP) model based on the assumption that influence propagates through all types of links. The distribution of influence on these links depends on the correlation between them. A new link is recommended as the node pair which has a high likelihood to pass the influence. Goyal et al. [12] proposed another unsupervised learning algorithm to estimate the link strength by measuring the ratio of same actions performed by two actors. Sun et al. [13] further tried to predict when a new link will be created.
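As a concrete illustration, the following minimal Python sketch (not from the dissertation; the toy author-venue matrix and graph are hypothetical) computes the meta-path similarity W_ac × (W_ac)^T and the CN, AA, JA, and PA scores discussed above:

```python
import math
import numpy as np

# Meta-path similarity: rows are authors, columns are venues (toy counts).
W_ac = np.array([[2, 0, 1],
                 [1, 1, 0],
                 [0, 3, 0]])
# (W_ac @ W_ac.T)[i, j] counts author-venue-author meta-path instances, so
# authors who attend the same venues the same number of times score high.
meta_path = W_ac @ W_ac.T

# Unsupervised link-prediction scores on an undirected toy graph.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}

def cn(u, v):  # common neighbors
    return len(adj[u] & adj[v])

def aa(u, v):  # Adamic/Adar: each common neighbor weighted by its degree
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)

def ja(u, v):  # Jaccard: common neighbors over all neighbors
    return len(adj[u] & adj[v]) / len(adj[u] | adj[v])

def pa(u, v):  # preferential attachment: product of degrees
    return len(adj[u]) * len(adj[v])

print(meta_path)
print(cn("a", "d"), aa("a", "d"), ja("a", "d"), pa("a", "d"))
```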

1.2 Information Cascade in Social Network (Dynamic)

On the other hand, another branch of research is interested in investigating how information flows in the social network, which is interchangeable with diffusion or cascading. The study focuses on solving two problems: how the information propagates over a given network structure, and how to infer the inherent network structure based on observations of information propagation.

1.2.1 What is Information Cascade

The study of diffusion models and network inference provides the foundation to analyze the information cascade. Leskovec et al. [14] first discussed the cascading behavior in blogs. They raised a fundamental question: how to model the generation of a cascade so as to understand how blogs cite and influence each other and how the popularity of old posts drops over time. A blog is the container of posts, and posts stand for the information. A cascade is defined as a tree structure with a single starting post, called the cascade initiator, that has no links to other posts. A blog has a link pointing to another blog if its post refers to (contains the URL address of) the second one's post. Each original post creates a cascade, and the information is propagated to other blogs by post referring. The observation they found in a large collection of blogs (2.5 million blogs, 21.3 million posts) is: most (97%) cascades are trivial (isolated posts); only a very small number of cascades have relatively complex topologies; and the distribution of cascade size is a power law. Similar observations were found in, but not limited to, [15], [16], [17], [18], [19], [20], [21], [22], [23], [24].

Fig. 1.1a shows a network topology, where each circle stands for a node named from letter A to Q. The directed edge from node i to j represents that j is watching i's activities, i.e., j is a fan (or follower) of i, and j designates i as its friend. If i is infected before j, i will expose its influence to j. Marked nodes represent infected nodes; the number is the time at which a node is infected. The influence network is composed of the nodes and the directed edges, and all possible cascades are supposed to be explained by this influence network. The weight of directed edges, and how to learn the directed edges in cases where the network structure is unknown, will be discussed in Chapter 4. Following the cascade extraction method of [14] and [25], if a node is infected with none of its friends infected before it, this node is selected as the initiator, or seed, of the cascade. Other members of the cascade are chosen by including infected nodes which are fans of members of the cascade and are infected later than the existing members. Fig. 1.1b shows the cascades extracted from Fig. 1.1a. Node L belongs to two cascades, originated from nodes H and Q. Though I is a fan of K, since K is infected after I, I is not influenced by K.
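The extraction rule just described can be sketched in a few lines of Python; the toy friend lists and infection times below are hypothetical, loosely mirroring Fig. 1.1:

```python
friends = {   # child -> set of parents the child watches
    "C": set(), "N": set(), "Q": set(), "H": set(),
    "G": {"C"}, "A": {"N"}, "I": {"H", "K"}, "L": {"H", "Q"}, "K": {"I"},
}
infect_time = {"N": 1, "Q": 2, "A": 3, "C": 4, "H": 5, "I": 6, "K": 7, "L": 8, "G": 9}

cascades = {}    # seed -> list of (node, time)
member_of = {}   # node -> set of seeds whose cascades the node joined
for node in sorted(infect_time, key=infect_time.get):
    earlier = {p for p in friends.get(node, set())
               if p in infect_time and infect_time[p] < infect_time[node]}
    if not earlier:  # no friend infected before it: the node starts a new cascade
        cascades[node] = [(node, infect_time[node])]
        member_of[node] = {node}
    else:            # join every cascade an earlier-infected parent belongs to
        member_of[node] = set().union(*(member_of[p] for p in earlier))
        for seed in member_of[node]:
            cascades[seed].append((node, infect_time[node]))

for seed, members in cascades.items():
    print(seed, members)   # e.g., L appears under both H's and Q's cascades
```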

1.2.2 Diffusion Models

The purpose of this set of studies is to model the information diffusion such that the cascade can be estimated and predicted. The classic influence models are the SIR and SIS models, where Kermack et al. [26] define the population transition between susceptible, infected, and recovered with immunity. Based on the similar epidemic disease propagation concept, later works model the infection using the threshold model and the cascade model. Kermack et al. [26] proposed the SIR (Susceptible, Infected, and Removed) model to describe the disease infection process among populations and to estimate the likelihood of an epidemic. Granovetter et al. [27] proposed the threshold model, where a node adopts when the weight connected to

Figure 1.1: Cascades with different number of seeds. (a) Network Topology; (b) Extracted Cascades.

it crosses a threshold. The influence each node receives is the accumulated link weight from its infected friends, and the node is infected once the received influence crosses the threshold. In [28], Goldenberg et al. define the cascade model, in which a node is independently infected by each of its neighbors with a probability. Kempe et al. [29] proved that these two models, the linear threshold model and the independent cascade (probability) model, are submodular and can be solved with a greedy algorithm that is bounded by a (1 − 1/e) approximation of the optimum. Gruhl et al. [30] characterized how topics propagate from individual to individual, and used the theory of infectious diseases to model the flow. Different from previous work, which requires knowing the social network,

Jaewon et al. [31] model the process of infection with a linear influence model, where the number of newly infected nodes after a node becomes infected is modeled as a linear function, such that the total number of infected nodes at each time point is the summation of the infection functions of all infected nodes. Leskovec et al. [14] also proposed a simple cascade generation model based on the independent cascade model. Leskovec et al. [24] developed a meme-tracking tool to capture topic threads propagated among websites. Besides the social connection features of a heterogeneous network, information adoption could also be due to external influence such as mass media; Myers et al. first discussed the external influence in [32]. The above models are discrete in time, while other works extend them to continuous time. Gomez-Rodriguez et al. [33] proposed the NetRate algorithm, where each edge is associated with an exponential time function. Du et al. [34] extended NetRate by replacing the hazard function with a linear combination of kernel functions to fit various networks.

1.2.3 Network Inference

The previous set of works assumes the influence network is known and that information is propagated through this network. For example, a retweet in Twitter explicitly shows the information spreading from the original tweeter to the retweeter through the established following relation. However, in practice, such an influence network structure may not be clear.

• Network Structure is Unknown Gomez-Rodriguez et al. [35] discovered the influence relations beneath the information diffusion network by using an approximation algorithm to best explain the time sequences in which the nodes adopt information. They selected the edges with high weights, where the edge weight is the infection probability. Myers et al. [36] studied the generalized network inference problem and solved it by transforming it into an equivalent convex optimization problem. Gomez-Rodriguez et al. [33] defined the link weight via NetRate, where it is decided by a temporal model; NetRate is learned by fitting all cascades with maximum likelihood.

• Incomplete Cascade Data Sadikov et al. [37] proposed a numerical method to estimate whole-network properties with partial information diffusion data. Similar work includes [38], where Kim et al. inferred part of the network from partially known data.

1.2.4 Influence Maximization

The above works focused on building the influence model based on intact or partial data. Another set of works discussed how to use the influence: applications based on influence propagation focus on how to maximize the effect by selecting a small number of critical nodes to start the cascade. Most applications are about how to maximize the influence. Kempe et al. [29] proposed a greedy method to find the initial seed set that reaches maximum influence, which is the fundamental work of this area. Leskovec et al. [39] proposed a greedy hill-climbing algorithm to detect information outbreaks among blogs by monitoring fewer nodes while collecting as much information as soon as possible. Chen et al. [40] analyzed the performance of the greedy algorithm and proposed improvements for influence maximization.

8 1.2.5 Influence Estimation

The study of influence maximization also introduces another problem, i.e., influence estimation. A critical step in the greedy algorithm of influence maximization is to estimate the influence of a node or a set. Kempe et al. [29] left finding the exact influence as an open question, but noted that it can be approximated by Monte Carlo simulation. However, a reasonable approximation requires a large number of simulations (more than 10,000) for each seed set. Subsequent studies investigated various approaches to speed up such estimates [39], [41], [42], [43], [44], [45], [46], [47], by either decreasing the number of simulations for Monte-Carlo-based estimation or finding an alternative algorithm. Kimura et al. [46] proposed two closely related algorithms based on shortest paths. Chen et al.'s algorithm [47] is a variant of the shortest-path approach. Aggarwal [43] proposed a steady-state spread method to estimate the influence. Yang et al. [44] proposed another way to estimate the influence by approximating the influence from in-neighbors as a linear combination. While most research worked on cascade size estimation over infinite time, Du [48] studied cascade size estimation within a time period. Cohen [49] proposed a sketch-based algorithm able to scale influence estimation to huge networks with billions of edges.

1.3 Knowledge Mining in Q&A Forum

Sec. 1.1 and Sec. 1.2 list the studies of mining knowledge in static heterogeneous networks and in dynamic information flows. Knowledge mining in a Q&A forum is the combination of these two, in that a Q&A forum is a heterogeneous network and knowledge transfer is one particular type of information propagation.

Yang et al. [50] worked on the problem of predicting whether a question will receive an answer. Anderson et al. [51] analyzed the StackOverflow dataset, focusing on predicting whether a Q&A thread will have long-lasting value, i.e., receive high or low pageviews in the future, and whether a question will receive a satisfactory answer. Asaduzzaman et al. [52] investigated unanswered questions and tried to predict how long a question will remain unanswered. Vasilescu et al. [53] investigated how people's behavior migrates from the mailing list to the Q&A forum, particularly in StackOverflow; they found users are more active if they contribute to both the mailing list and the Q&A forum, and that they answer faster in the Q&A forum than in the mailing list. Hanrahan et al. [54] modeled question difficulty by the waiting time before a question receives the accepted answer, and users' expertise by their reputation. To route questions to potential answerers, Chang et al. [55] and Li et al. [56] proposed multi-metric ranking algorithms to select potential answerers; Zhou et al. [57] proposed a classification approach to route questions and investigated the results of different combinations of features. Liu et al. [58] investigated the importance of features extracted from the question, questioner, answers, and answerers in predicting the questioner's satisfaction, and solved it as a classification problem.

Many Q&A related studies worked on whether a question can be answered and answered by whom. Classification is a common approach, and some studies investigated how to select features to improve performance. Classification is able to find the relation between the features and the result, but it ignores the inherent process of how knowledge is transferred between users.

1.4 Research Problem

As discussed above, current research on information cascade mainly focuses on network inference, diffusion models, influence estimation, and influence maximization. Different from these studies, we are more interested in solving the problem of who will adopt the information, or who will join the cascade, during the process of information propagation. In particular, we want to investigate how knowledge is transferred between persons. Compared to conventional social media, where information can be adopted by anyone and adoption is casual, depending mainly on shared interests, the action reflected in knowledge propagation also depends on the knowledge and expertise level of the person. The study is motivated to combine the merits of the studies in these two fields to model the knowledge transfer process and use the result to perform predictive analysis from a unique perspective. Thus, the research problem is how to model the process of knowledge being transferred among user accounts in social media; the study tries to model the process of knowledge transfer in a Q&A forum from the perspective of information diffusion.

The following chapters discuss these problems in detail. Chapter 2 and Chapter 3 investigate general information propagation problems, while Chapter 4 and Chapter 5 work on knowledge propagation. Chapter 2 extends influence maximization to find critical nodes in the Twitter network to detect outbreak information as soon as possible. Chapter 3 analyzes the whole information cascade's trend and investigates what factors could affect the cascading process. Chapter 4 investigates how knowledge (research topics) propagates among scholars, and predicts who will have a relevant publication by incorporating social network features among scholars. Chapter 5 works on the StackOverflow Q&A forum and develops a time-based knowledge propagation network to model how knowledge propagates from answerer to questioner. Chapter 6 gives the conclusion.

Chapter 2

Information Flow Detection

2.1 Influence Maximization

Social media is a platform which connects people all over the world into a huge network. Information can be generated at any position and flow to other parts of the network. One direct application is viral marketing, where the purpose is to spread the information (an advertisement) to as many people as possible, while keeping the cost (the number of originators) within a constraint; in other words, to maximize information influence.

Define the social network as a graph G(V, E) which contains |V| = n nodes and |E| = m edges. σ(·) is the influence function, where σ(A) is the total number of nodes activated when A is the initial set.[1] If every node v has its weight w_v, let B denote the set activated by the process with initial activation A. Then

σ(A) = Σ_{v∈B} w_v

The cost of selecting the initial set A is c(A). So, the problem is generalized as:

max_{A⊆V} σ(A) subject to c(A) ≤ L    (2.1.1)

where L is the constraint (limitation).

[1] Note: since A is a set and not a single seed, the maximum influence is for the total infection size, not the cascade size.

The problem is NP-complete, similar to the clique problem. Basically, it is to find the set A from V which gives the maximum gain. [29] is the most fundamental work in answering this question, where Kempe et al. proved the influence function σ(·) is submodular, i.e.,

σ(S ∪ {v}) − σ(S) ≥ σ(T ∪ {v}) − σ(T)    (2.1.2)

for all nodes v and all pairs of node sets S ⊆ T. This property of submodular functions guarantees that the set found by the hill-climbing greedy algorithm approximates the optimum to within a factor of (1 − 1/e). The greedy algorithm is briefly described as: start with an empty set, and repeatedly add the node that gives the maximum marginal gain. In this way, the original NP-hard problem can be solved with a greedy algorithm.
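A minimal Python sketch of this hill-climbing greedy selection, with σ(·) approximated by Monte Carlo simulation of the independent cascade model, might look as follows; the toy graph and probabilities are hypothetical, not from the dissertation:

```python
import random

def simulate_ic(graph, seeds):
    """One independent-cascade run; graph[u] = [(v, p_uv), ...]."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in graph.get(u, []):
            if v not in active and random.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

def estimate_sigma(graph, seeds, runs=1000):
    # Monte Carlo estimate of sigma(seeds)
    return sum(simulate_ic(graph, seeds) for _ in range(runs)) / runs

def greedy_max_influence(graph, nodes, k):
    seeds = set()
    for _ in range(k):
        # add the node with maximum marginal gain sigma(S + v) - sigma(S)
        best = max((v for v in nodes if v not in seeds),
                   key=lambda v: estimate_sigma(graph, seeds | {v}))
        seeds.add(best)
    return seeds

graph = {1: [(2, 0.4), (3, 0.3)], 2: [(4, 0.5)], 3: [(4, 0.2)], 4: []}
print(greedy_max_influence(graph, [1, 2, 3, 4], 2))
```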

2.2 Sensing Security Vulnerability by using Twitter

The intuition behind this part of the work is whether the influence maximization algorithm can be applied to the security-related area. The prevalence of Twitter provides timely information in a concise format. Besides sharing people's daily life or personal opinions, Twitter can also play an important role in broadcasting emergency information to the public when facing disasters. Sakaki et al. [59] first used Twitter as social sensors to detect earthquakes in Japan. Every person who has a connection to the Internet can be a potential sensor which collects information and publishes it on Twitter. Similar uses can be applied to cyber security when new vulnerabilities or exploits are discovered. Unfortunately, as the volume of tweets and the number of Twitter accounts grow, research has shown that only a small portion of tweets are worth reading [60]. The overwhelming number of retweets, duplicated data, and trivial information makes this powerful social media less useful than it should be. So the question is how to obtain the most valuable information from the fewest accounts and fewest tweets, in the most timely fashion, for cyber vulnerability related information in Twitter.

2.2.1 Problem Definition

The problem may be formally described as follows. For a specific security vulnerability category C, find the set of critical accounts S ⊂ V so that the total reward function R(S) is maximized subject to the cost function C(S) ≤ L. Later on, the work integrates the cost constraint into the total reward function and solves the multi-criteria problem heuristically.

max_{S⊆V} R(S) subject to cost(S) ≤ L    (2.2.1)

2.2.2 TCAD (Twitter Critical Account Discovery)

The first challenge in identifying critical accounts in a specific topic area, i.e., Cyber Vulnerabilities for this work, is to determine a filtering mechanism for tweets related to the topic. In general, this requires intelligent semantic processing, allowing one to determine not only the set of relevant tweets but also their context. In the context of Cyber Vulnerability, a well defined and relatively unambiguous set of keywords is the Common Vulnerabilities and Exposures index (CVE xxxx xxxx), an 8 digit number following 'CVE' - a dictionary of publicly known information security vulnerabilities [61]. Using CVE tags as keywords finds tweets that explicitly address cyber vulnerabilities, and allows this work to concentrate on developing the algorithm that finds the critical accounts from the relevant tweets. The semantic meaning of each tweet is not treated in this work and will be addressed in future work.

Every CVE tag is assigned a CWE (Common Weakness Enumeration) type by a third party organization [62], which defines a software security weakness. Several CWEs make up a security vulnerability category. For example, CWE 119 (Failure to Constrain Operations within the Bounds of a Memory Buffer) and CWE 20 (Im- proper Input Validation) belong to the same category that causes Denial-of-Service (DoS). Fig. 2.1 illustrates the structure of CVE, CWE and security vulnerability categories along with the Twitter accounts and retweet relationship using directed edges.

In this information hierarchy, each CVE tag is treated as a unique information topic within a security vulnerability category, and tweets containing this CVE tag are attached to this topic.


Figure 2.1: Each layer stands for a CVE topic; each CVE has a CWE type which belongs to a category. Circles in layers are tweets. Each user may publish tweets among topics.

Each node in Fig. 2.1 represents a tweet, either an original or a retweet. The directed edges reflect that a retweet cites the information from the original tweet. Clearly, the original tweet is more timely than its retweets, and the time stamps further differentiate the timeliness among all tweets quantitatively. Each Twitter account may post tweets on multiple topics. An account is considered to cover a topic if it publishes a tweet with that CVE tag. The more topics an account covers using the fewest and timeliest tweets, the more critical the account is. Three reward functions are defined as follows to address this multi-criteria problem.

Reward Functions

The reward functions are defined according to what we expect to gain from the selected accounts. In this case, the reward comes from three aspects:

Timeliness (R_T). Given multiple tweets about the same news, the first one to reveal the information is highly valued, with descending value for tweets thereafter.

The reward function RT for a tweet W published by account A in a CVE topic T is defined as:

R_T(A, W, T) = 1 − D(W, W_0(T)) / max_{T∈C} D(W_L(T), W_0(T))    (2.2.2)

where W_0(T) and W_L(T) are the first and last tweets in topic T; D(W, W_0(T)) defines the time difference between tweet W and W_0(T); and max_{T∈C} D(W_L(T), W_0(T)) is the longest time period among the CVE topics in category C.

Originality (R_O). Though retweets help propagate information, it is still the original tweets that are referenced, and people prefer the original tweets over the retweets. The intuition of reward R_O is that a tweet's originality is shared by all of its retweets. This is similar to the way PageRank defines the relations between web pages.

R_O(A, W̃, T) = R_O(P, W, T) / (Ñ(W) + 1)    (2.2.3)

where W̃ is a retweet of tweet W; Ñ(W) is the total number of retweets of W; and P is the account of the original tweet W. This is a recursive definition: R_O(A, W̃, T) defines the originality of A's retweet W̃, which splits the originality of its parent tweet R_O(P, W, T).

Influence (R_F). Suppose two tweets from accounts A and B contain the same information, have the same time stamp, and are both original tweets. A should be rewarded more if its tweet has been retweeted by more accounts, implying more people are influenced by the tweet from A. Thus, besides timeliness and originality, the reward of influence is defined in (2.2.4):

R_F(A, W, T) = Ñ(W) / N(T)    (2.2.4)

where N(T) is the total number of posts in topic T.
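For illustration, the three per-tweet rewards could be computed on toy data as in the following Python sketch; the records, and the base case chosen for an original tweet's R_O, are assumptions for illustration rather than values from the dissertation:

```python
tweets = {  # tweet_id -> (account, hours since topic start, retweeted parent or None)
    "w0": ("A", 0.0, None),
    "w1": ("B", 2.0, "w0"),
    "w2": ("C", 5.0, "w0"),
}
longest_span = 10.0    # max over topics of D(W_L(T), W_0(T)) in the category
n_topic = len(tweets)  # N(T): total posts in this topic

def n_retweets(w):
    return sum(1 for _, _, p in tweets.values() if p == w)

def r_timeliness(w):   # Eq. (2.2.2)
    return 1.0 - tweets[w][1] / longest_span

def r_originality(w):  # Eq. (2.2.3): a retweet splits its parent's originality
    parent = tweets[w][2]
    if parent is None:
        return 1.0     # assumed base case for an original tweet
    return r_originality(parent) / (n_retweets(parent) + 1)

def r_influence(w):    # Eq. (2.2.4)
    return n_retweets(w) / n_topic

for w in tweets:
    print(w, r_timeliness(w), r_originality(w), r_influence(w))
```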

Total Reward

With the three reward functions R_T, R_O, and R_F, the problem becomes finding the optimal set S that maximizes the reward under multiple reward functions. Note that it is possible to have two candidate sets S and S′ where R_T(S) > R_T(S′) but R_O(S) < R_O(S′). A common approach to solving such a problem is to find the Pareto-optimal set with respect to the following objective function:

max_{S⊆V} λ_1 R_T(S) + λ_2 R_O(S) + λ_3 R_F(S)    (2.2.5)

where the λ_i are weights to trade off the importance of the reward functions. As mentioned in Sec. 2.2.2, it is ambiguous to decide the reward of an account by selecting a subset of these three reward functions, and there is no reason to strengthen or weaken any one aspect. This work uses the same weight for the three rewards while normalizing each reward to [0, 1].

Recall that the optimization should be subject to the cost of the selected accounts. This work defines the cost as the total number of tweets N(A, T) published by each selected account A on a topic T. This is equivalent to finding the reward an account earns on a topic per tweet. The final total reward an account A gains on topic T per post can then be written as:

R(A, T) = ( Σ_{W∈T} R_T(A, W, T) + R_O(A, W, T) + R_F(A, W, T) ) / N(A, T)    (2.2.6)

For a set S, the total reward of this set on a topic T is:

R(S, T) = Σ_{A∈S} R(A, T) × N(A, T) / Σ_{A∈S} N(A, T)    (2.2.7)

The total reward obtained by selecting a set of accounts in a category is the summation of the rewards over all topics belonging to this category. The reward of selecting an account set S in a category C is then:

R(S, C) = Σ_{T∈C} R(S, T)    (2.2.8)

Greedy Algorithm

Using R(S, C) to select K critical accounts from N Twitter accounts for the optimal total reward could be computationally complex with a brute-force approach, which takes time exponential in K, especially when N is huge. In practice N can be significant, since Twitter had reached 500 million active accounts by July 2012 [63]. R(S, C) is not monotonic, since selecting an account who always posts outdated retweets may decrease the total R(S, C). It is also not submodular, because it does not guarantee ∀S ⊆ S′, A ∉ S′: R(S ∪ {A}, C) − R(S, C) ≥ R(S′ ∪ {A}, C) − R(S′, C). This work developed a simple greedy algorithm which exhibits outstanding performance. The algorithm, executed independently for each category C, begins by selecting one account A into S to maximize R(S, C). Then, during each iteration, it adds the account from the remaining set that maximizes R(S, C). This greedy algorithm has a complexity of O(KN): each round iterates over the remaining accounts, and there are K rounds.
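A minimal sketch of this greedy selection, assuming per-account rewards R(A, T) and tweet counts N(A, T) have been precomputed (the data here is hypothetical), could look like:

```python
def reward_set(S, topics, R, N):
    """R(S, C) per Eqs. (2.2.7)-(2.2.8)."""
    total = 0.0
    for T in topics:
        num = sum(R[A][T] * N[A][T] for A in S)
        den = sum(N[A][T] for A in S)
        total += num / den if den else 0.0
    return total

def tcad_greedy(accounts, topics, R, N, k):
    S = set()
    for _ in range(k):  # O(K*N): each round scans the remaining accounts
        best = max((a for a in accounts if a not in S),
                   key=lambda a: reward_set(S | {a}, topics, R, N))
        S.add(best)
    return S

R = {"A": {"t1": 0.9, "t2": 0.1}, "B": {"t1": 0.2, "t2": 0.8}, "C": {"t1": 0.5, "t2": 0.4}}
N = {"A": {"t1": 3, "t2": 1}, "B": {"t1": 1, "t2": 2}, "C": {"t1": 2, "t2": 2}}
print(tcad_greedy(["A", "B", "C"], ["t1", "t2"], R, N, 2))
```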

2.2.3 Experiment Design & Result

The experiment collected about 5,000 CVE-related tweets between Sep 25 and Nov 2 of 2012. These tweets come from about 1,600 accounts and cover about 900 CVE topics. Nine common security categories are selected according to [63]: GI (Gain Information), RV (Reserved), EC (Execute Code), US (Unspecified), MC (Memory Corruption), DoS (Denial of Service), DT (Directory Traversal), GP (Gain Privilege), and CSS (Cross-site Script). The collected data is divided into two sets: one for selecting critical accounts, the other for testing how the selected accounts perform on unknown data.

To evaluate the performance, TCAD is compared to two other approaches: PageRank and number of retweeters. Account A is B's retweeter if A has one or more retweets from B. The same definition is used to define the directed links between accounts for PageRank. Each approach selects the top 10 critical accounts per category. The performance is evaluated in terms of information coverage and timeliness. The information coverage is defined as the number of topics mentioned by the critical accounts over the total topics in a category; the timeliness is defined as the normalized R_T(A, W, T) for all W ∈ C. Both are normalized to [0, 1]. A higher score stands for better topic coverage or obtaining information earlier.

Figures 2.2 and 2.3 show the information coverage and timeliness achieved by the three algorithms for the two sets of data. It is clear that, by following the selected accounts, TCAD significantly outperforms the other two popularity (retweeter) based algorithms. The only category where the two popularity-based algorithms do not perform terribly is DT (Directory Traversal). The reason is that there are only 21 accounts in the collected data for DT, while there are typically hundreds of accounts in other categories. Selecting 10 accounts out of 21 guarantees reasonable information coverage and timeliness. Coincidentally, the two popularity-based approaches choose the same 10 accounts, which is why they have the same performance in DT.

Figure 2.2: Topics coverage of selected accounts in security vulnerability categories (hit rate per category for Followers Number, PageRank, and TCAD on two data sets).

Additional experiments were run to test the scenario where the two popularity-based approaches are executed in a category-agnostic manner, i.e., accounts are selected by mixing all tweets from all categories. The results were worse or the same in all categories. This suggests a different conclusion from Cha et al. [12], who found that "influential accounts hold significant influence over a variety of topics." Another interesting observation is that some of the selected critical accounts are Twitter machines, such as scripts which publish posts when there is a security alert. These accounts may not be the preferred human expert accounts, but they do provide the most valuable information the earliest.

Figure 2.3: Topic publish timeliness of selected accounts in security vulnerability categories (timeliness per category for Followers Number, PageRank, and TCAD on two data sets).

2.2.4 Summary

The work presented in this chapter connects social network analysis to security vulnerability information sensing using Twitter. A Twitter Critical Account Discovery (TCAD) algorithm is developed to find critical accounts for Cyber Vulnerabilities based on rewards from three aspects: timeliness, originality, and influence. Compared to the two other approaches, PageRank and Number of Retweeters, the TCAD algorithm outperforms both in information coverage and timeliness. The results support our hypothesis that ranking or influence models focusing on node popularity may not effectively capture the information needed. Also, the critical accounts selected for one security category can be very different from another's, suggesting that accounts popular across many categories may not be the best accounts to follow for specific needs.

Chapter 3

Information Cascade Prediction

As discussed in Sec. 1.2, the independent cascade model and the linear threshold model are two classical, well-accepted models in social network analysis. In brief, the independent cascade model is defined as follows: when a node u is infected at time t, it has a single chance to infect each uninfected neighbor v with a probability P_{u,v}. If u succeeds, v will become infected at time t+1; otherwise, u will not make any further attempts to infect v. In the linear threshold model, each node v has a threshold L_v, and the incoming edge from its infected neighbor u is weighted as W_{u,v}. The total influence node v receives is the summation of the weights from all of its infected neighbors, Σ_{u∈K} W_{u,v}. If Σ_{u∈K} W_{u,v} > L_v, node v will be infected. Given an influence network and a cascading model, a cascade is sampled by randomly selecting an initial seed; the cascade size distribution can then be derived by sampling many times, where the size of a cascade is the total number of nodes involved. A subsequent question is what the cascade size distribution looks like. Both the independent cascade model and the linear threshold model are developed from the SIR (Susceptible, Infected, Removed) disease propagation model, which is used to estimate the likelihood of an epidemic. Thus, given the influence network and the same random initial seed selection procedure, it is also interesting to see whether there is an epidemic of information propagation and what the epidemic size (maximum cascade size) is under these cascading models.
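The two update rules can be contrasted in a tiny Python illustration; the weights, threshold, and infected set below are hypothetical toy values:

```python
import random

W = {("u1", "v"): 0.3, ("u2", "v"): 0.4}  # W_uv: weight of edge u -> v
L_v = 0.6                                  # threshold of node v
infected = {"u1", "u2"}

# Linear threshold: v is infected once the accumulated weight from its
# infected in-neighbors crosses its threshold.
total = sum(w for (u, _), w in W.items() if u in infected)
lt_infected = total > L_v

# Independent cascade: each infected neighbor u independently gets one
# chance to infect v with probability P_uv (weights double as probabilities).
ic_infected = any(random.random() < w for (u, _), w in W.items() if u in infected)

print(lt_infected, ic_infected)
```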

3.1 Introduction

Leskovec et al. [14] discussed the information cascades in blogs, where a cascade is defined as a DAG structure with a single starting post called the cascade seed. Each node of the cascade is a blog post, and a child node is any post that contains the parent post's URL. A blog has a link pointing to another blog if its post refers to (contains the URL of) the second blog's post. Each seed creates a cascade, and the information is propagated to other blogs by post referring. They found that most (97%) cascades were trivial (isolated posts); only a very small number of cascades contained multiple levels; and the distribution of cascade size follows a power law, where the cascade size is the total number of nodes involved in the cascade. Similar observations were found in [17], [18], [19]. With these observations, existing works attempted to develop cascade generation models to explain the cascade process. Kermack et al. [26] proposed the SIR (Susceptible, Infected, and Removed) model to describe the disease infection process among populations and to estimate the likelihood of an epidemic. This infection concept was borrowed and applied to explain information propagation, where infection is defined as the process by which a person adopts the information after being exposed to it from her friends. Granovetter et al. [27] proposed a threshold model where each node has a predefined threshold. The influence each node receives is the accumulated link weight from its infected friends, and the node is infected once the received influence crosses the threshold. Goldenberg et al. [28] defined an independent cascade model in which a node is infected by each of its friends independently with a probability. Leskovec et al. [14] also proposed a simple cascade generation model based on this independent cascade model. Besides modeling the cascade process, other studies attempted to predict the cascade size. Given the initial cascade size, Cheng et al. [18] predicted whether the cascade size could double by including temporal and spatial features. They solved the cascade size prediction as a classification problem, but ignored the step-by-step propagation dynamics. Also, few of the aforementioned studies noticed that the simulated cascade size distribution based on the conventional cascade generation models is very different from the actual cascade size distribution.

This part of the work attempts to fill the gap by investigating the cascade size and the factors affecting the cascade size distribution. A phase-transition phenomenon is found in cascades simulated on the Digg social network using the independent cascade model, where the size of a cascade is either very small or very large. Also, the probability of a large cascade in the actual network is much smaller than in the simulated cases. The work first develops the concept of the GPC (Giant Propagation Component) to explain the phase-transition phenomenon. While the GPC exists in the social network, the phase transition is not observed in the actual cascades. To explain the difference, this work hypothesizes that the cascade size depends on the influence network structure and is also affected by temporal and spatial

27 factors. Furthermore, a non-independent cascade model is proposed to explain the information propagation through “dig/like” actions, where a node is not infected independently by each of its infected friends, but makes the infection decision upon the aggregated effects of its friends and other temporal and spatial factors.

3.2 Conflicting Observations on Cascade Size Distribution

This section first exhibits the difference between the simulated cascade size distribution and the actual one. For convenience, this work defines parent and child to represent the friend relationship as child → parent, where the child watches the parent's activities and the parent can infect the child. Table 3.1 shows the independent cascade generation process, which is similar to the cascade generation model of [14].

In Table 3.1, i is the infected parent of j, and P_ij is the probability of node i infecting node j. According to [12], P_ij can be extracted as the number of same actions done by the two nodes over the number of actions done by i. Each simulation run generates one cascade. The experiment considers the Digg network as the influence network. The simulated cascade size distribution obtained from 100,000 runs is shown in Fig. 3.1a. However, it is very different from the actual cascade size distribution[2] shown in Fig. 3.1b.

[2] The cascades are generated using the cascade extraction method specified in [14] on the Digg dataset [64].

Algorithm 1: Independent Cascade Generation Process

    Initialize empty infected and susceptible lists;
    randomly select a node s as the seed;
    add each of s's uninfected children k into the susceptible list,
        associated with a probability P_sk;
    while the susceptible list is not empty:
        check the susceptible node j against its associated probability P_ij;
        if infected:
            add j into the infected list;
            add each of j's uninfected children k' into the susceptible list,
                associated with a probability P_jk';
        remove j from the susceptible list;

Table 3.1: Independent cascade generation process
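A runnable Python version of this generation process might look as follows; the toy graph and probabilities are hypothetical:

```python
import random

def generate_cascade(children, P, nodes):
    """children[u] = list of u's children; P[(u, v)] = infection probability."""
    seed = random.choice(nodes)
    infected = {seed}
    susceptible = [(k, P[(seed, k)]) for k in children.get(seed, [])]
    while susceptible:
        j, p = susceptible.pop()
        if j in infected:
            continue
        if random.random() < p:  # check j against its associated probability
            infected.add(j)
            susceptible.extend((k2, P[(j, k2)]) for k2 in children.get(j, [])
                               if k2 not in infected)
    return infected

children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
P = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "d"): 0.5, ("c", "d"): 0.2}
sizes = [len(generate_cascade(children, P, list(children))) for _ in range(10000)]
print(sum(sizes) / len(sizes))   # mean cascade size over many simulation runs
```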

There are two differences. The first is that there is a phase transition in the simulation, but no such gap in the real cascade size distribution. Steeg et al. [17] observed a similar phenomenon, where they assumed all link weights are the same in their experiment. The second is that the probability of reaching a large cascade in the actual case is much lower, and the distribution has a power-law shape. The simulation has a much higher probability (about 0.0134; 1,338 out of 100,000 cascades) of producing a large cascade (size greater than 2,500) than the actual cascades (about 0.000008; 15 out of 1,780,522 cascades).

29 6 10 1010

104 105 2 Count 10 Count

100 100 100 101 102 103 104 100 101 102 103 104 Cascade Size Cascade Size (a) Simulation (b) Actual

Figure 3.1: Cascade size distribution

3.3 Phase-Transition and GPC

To explain the first difference, the phase-transition phenomenon, we introduce the concept of the GPC (Giant Propagation Component). The key point of the phase transition is that a cascade is either very large or very small. Intuitively, to reach a large size, the propagation needs to keep infecting new nodes at each iteration, i.e., the probability of infecting at least one node must approach 1. Based on the independent cascade model, the probability P_i of a node i infecting at least one of its children is defined as:

P_i = 1 − Π_{j∈N(i)} (1 − P_ij) ≥ 1 − ε    (3.3.1)

where P_ij is the probability of node i infecting j; N(i) is the set of children that could be infected by i; and ε is a very small number. The GPC is the maximally connected component whose nodes all satisfy (3.3.1). If a node in the GPC is infected, it will infect at least one other node with high probability, and this newly infected node will attempt to infect another new node. Thus, if any node of the GPC is infected, it will cause a number of other nodes of the GPC to be infected with high probability, and this number is proportional to the size of the GPC.[3] Therefore, the existence of the GPC causes the phase-transition scenario in that:

• If any node of the GPC is infected during the propagation process, the cascade will, with high probability, reach a large size proportional to the size of the GPC plus the number of nodes adjacent to the GPC.

• If no node of the GPC is infected during the propagation process, the cascade will die quickly and end with a small size.

The algorithm to locate the GPC continuously prunes nodes that do not satisfy (3.3.1). In this way, the GPC found in the Digg network has 310 nodes with ε = 0.01.[4] Selecting any node of the GPC as the seed will infect most nodes of the GPC (about 290 out of 310), and will eventually infect about 3,000 nodes when counting the nodes adjacent to the GPC with the relevant link infection probabilities. This numerical estimate is close to the large cascade size in the simulation result shown in Fig. 3.1a. Checking all cascades in Fig. 3.1b with size larger than 2,500, the infected nodes include most nodes of the derived GPC. This also validates the statement that the activation of the GPC is the necessary condition for a large cascade.
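The pruning step can be sketched as follows in Python; the graph and probabilities are hypothetical, and a final pass keeping only the largest connected component (required for a true "component") is omitted for brevity:

```python
def find_gpc(children, P, eps=0.01):
    """Keep nodes whose chance of infecting >= 1 remaining child is >= 1 - eps."""
    nodes = set(children)
    changed = True
    while changed:
        changed = False
        for u in list(nodes):
            miss = 1.0
            for v in children.get(u, []):
                if v in nodes:
                    miss *= 1.0 - P[(u, v)]  # prob. of infecting none of them
            if 1.0 - miss < 1.0 - eps:       # condition (3.3.1) violated
                nodes.remove(u)
                changed = True
    return nodes

children = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
P = {("a", "b"): 0.995, ("b", "c"): 0.999, ("c", "a"): 0.992, ("d", "a"): 0.5}
print(find_gpc(children, P))  # {'a', 'b', 'c'}; 'd' is pruned
```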

Since the GPC is related to the infection probability, we want to test whether changing the infection probability will remove the phase transition. The following experiment assumes all link weights have the same value λ. Fig. 3.2 shows the cascade size distribution as λ increases. With a larger λ, each node has a higher chance to infect new nodes and the GPC is larger, and thus cascades are larger. The phase-transition phenomenon disappears when λ is too small to support a GPC. Though a low infection probability can eliminate the GPC and the phase transition, it also greatly constrains the size of a cascade, so that no cascade can be large. This shows that changing λ does not provide a solution to remove the phase transition.

[3] It is possible that all children of infected nodes are already infected, which will cause the propagation to stop.
[4] The GPC size estimation can be found in Appendix 2.4.

Figure 3.2: Cascade size distribution with different uniform infection probability λ.

The primary issue that causes the GPC is the "independent" infection process. This motivates us to investigate a non-independent cascade model to explain the information propagation in the Digg network. This also accords with the "dig/like" user experience in the Digg social network and Facebook. Intuitively, people make the "dig/like" decision by reading the story and being influenced by the aggregated result of the parents who "dug/liked" the same story, rather than being infected independently by each of their parents.

32 3.4 What Affects the Cascade Size?

Besides the phase transition, the other difference is that the probability of reaching a large cascade in the actual cascade size distribution is much lower than in the simulation based on the independent cascade model. In other words, a cascade reaches a large size with only a very small chance. So what prevents a cascade from growing large in the real world? To answer this question, we hypothesize that the infection probability, i.e., the likelihood of being infected, is affected by several temporal and spatial factors, which consequently affect the cascade size. The reason for selecting temporal and spatial factors is the intuition that the propagation of information depends on whether the information is new (temporal) and whether it is widely known (spatial).

3.4.1 Temporal Factor

Each story in the Digg network has its lifetime, even the most popular news. A typical story in the Digg social network will not last for more than a week. This motivates us to consider that the attractiveness of the information may constrain a cascade from propagating further. The attractiveness can be represented as the information's infection probability, which decays over time.

Information Attractiveness

The seeds are the nodes that adopt the information without being influenced by other nodes, but solely attracted by the information itself. Thus, the number of seeds that emerge over time represents the attractiveness of the information. Imagining the “information” as a virtual node, the seeds are infected by the information itself. Fig. 3.3 shows the 5th to 95th percentile of the number of newly emerged seeds for different stories over time. The red square-marker curve is the mean value, and the green diamond-marker curve is the corresponding exponential fit. These curves show that the infection probability from the information itself to the uninfected nodes decays exponentially over time.

Figure 3.3: Number of seeds emerged over time (mean and exponential fit)
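A minimal sketch of how such an exponential decay can be fitted with scipy's curve_fit follows; the seed counts below are illustrative placeholders, not the actual Digg measurements.

import numpy as np
from scipy.optimize import curve_fit

# Illustrative mean counts of newly emerged seeds (not the actual Digg data).
hours = np.array([0.0, 25, 50, 75, 100, 150, 200])
mean_seeds = np.array([95.0, 40, 18, 8, 4, 1.5, 0.7])

def exp_decay(t, a, b):
    # attractiveness modeled as a * exp(-b * t)
    return a * np.exp(-b * t)

(a, b), _ = curve_fit(exp_decay, hours, mean_seeds, p0=(100.0, 0.05))
print(f"fitted decay of information attractiveness: {a:.1f} * exp(-{b:.3f} t)")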

3.4.2 Spatial Factors

Infection at Different Locations

Location is defined as the number of hops a node is away from the seed where the information originated. By monitoring infections that happened in the same time period, the dots in Fig. 3.4 represent the normalized infection probability (defined as the number of infected nodes over the total number of children that could be infected) at different locations. The infection probability is smaller when the infection happens farther away from the seed, and the probability decays exponentially. The curve shows the fitted exponential function.

Figure 3.4: Exponential fitting of the decay of the infection probability at different locations

Number of Infected Friends (Parents)

Besides the location where the infection happens, another possible factor is the number of infected parents. According to the independent cascade model [28], the conditional infection probability increases with the number of infected parents. However, Steeg et al. [17] observed that this is not the case in the Digg network; the infection probability remains nearly constant regardless of the number of infected parents. We repeated their experiment and verified the result. This also implies that neither the independent cascade model nor the linear threshold model may be applicable to the information propagation of “dig/like” actions.

3.5 Aggregate Cascade Model

Sections 3.3 and 3.4 discussed the causes of the two differences observed in Fig. 3.1. To overcome the limitations of the current independent cascade model, we propose an “aggregate cascade model” by considering the following factors.

• The probability a node is infected depends on the time passed since the information originated.

• The probability a node is infected depends on the node's distance from the seed where the information originated.

• The infection is not triggered independently by each infected parent; instead, every node assesses its situation to make its infection decision only once. The time when the node will assess the infection is defined by an infection waiting time model.

The infection waiting time model comes from the distribution of the infection time difference between the parent node and its child. Fig.3.5 shows the waiting time distribution extracted from the actual cascades of the Digg network.

Figure 3.5: Distribution of the infection waiting time in the Digg network dataset

The waiting time distribution follows an exponential distribution $\frac{1}{\mu}\exp(-\frac{x}{\mu})$ with $\mu = 9$ (Kolmogorov-Smirnov goodness-of-fit test, sample size 243483, critical value 0.0033, p-value < 2.2e-16). Let $\delta_l$ represent the infection waiting time at location $l$, drawn from this distribution. Table 3.2 shows the aggregate cascade generation process.

Algorithm 2 Aggregate Cascade Generation Process
  Initialize empty infected, accessed and susceptible lists;
  randomly select a node s as seed;
  add each uninfected child of s into the susceptible list with a timer t and its location L;
  While the susceptible list is not empty:
    if the timer of a susceptible node j expires:
      assess node j's infection against the probability
        $P_j = \bigl(\frac{1}{|K(j)|}\sum_{i \in K(j)} P_{ij}\bigr) \cdot \exp(a\,t + b\,L)$;
      if infected:
        add node j into the infected list;
        for each of j's uninfected children k:
          if k is not in the accessed list, add k into the susceptible list with a timer t and its location L;
      remove node j from the susceptible list and add j into the accessed list;
  End While
Table 3.2: Aggregate cascade generation process

In Table 3.2, $K(j)$ is the set of infected parents which can infect $j$ and $|K(j)|$ is the size of the set; $a = -0.03$ and $b = -0.2$ are derived from Sections 3.4.1 and 3.4.2, respectively; $t$ is the elapsed time since the moment the information originated, estimated by $\sum_{l=1}^{L}\delta_l$; $L$ is the shortest distance of node $j$ to the seed through the infected parents. Fig. 3.6 shows the simulated cascade size distribution based on the aggregate cascade algorithm. The distribution has a power-law-like shape without phase transitions, similar to the distribution in Fig. 3.1b.
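The following Python sketch mirrors the generation process of Table 3.2 under stated assumptions (an event queue keyed by assessment time, exponential waiting times with µ = 9); it illustrates the algorithm and is not the authors' implementation.

import heapq
import math
import random

def aggregate_cascade(children, parents, seed, mu=9.0, a=-0.03, b=-0.2):
    """One run of the aggregate cascade generation process (Table 3.2).
    children[u]: list of u's children; parents[v]: dict parent -> P_parent,v.
    Each susceptible node is assessed exactly once, when its timer expires."""
    infected = {seed}
    accessed = {seed}          # nodes already assessed or scheduled
    heap = []                  # entries: (assessment time t, node, location L)
    for v in children.get(seed, []):
        if v not in accessed:
            accessed.add(v)
            heapq.heappush(heap, (random.expovariate(1.0 / mu), v, 1))
    while heap:
        t, j, L = heapq.heappop(heap)
        K = [u for u in parents.get(j, {}) if u in infected]  # infected parents
        if K:
            mean_p = sum(parents[j][u] for u in K) / len(K)
            # aggregated influence, discounted in time (a*t) and space (b*L)
            if random.random() < mean_p * math.exp(a * t + b * L):
                infected.add(j)
                for k in children.get(j, []):
                    if k not in accessed:
                        accessed.add(k)
                        heapq.heappush(
                            heap, (t + random.expovariate(1.0 / mu), k, L + 1))
    return infected

Repeating this simulation over many random seeds and collecting len(infected) reproduces a cascade size distribution of the kind shown in Fig. 3.6.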

Figure 3.6: Cascade size distribution with the aggregate cascade generation algorithm

3.6 Summary

The work presented in this chapter analyzes how the simulated cascade size distribution based on the independent cascade model differs from the actual cascade size distribution. The GPC is introduced to explain the phase-transition phenomenon, and temporal and spatial factors are investigated to adjust the infection probability. Furthermore, a non-independent cascade model is proposed to better reflect the essence of the information propagation of “dig/like” actions, where people make information adoption decisions based on the aggregated effects of their parents and other temporal and spatial factors.

Chapter 4

Individual Knowledge Adoption Prediction

4.1 Introduction

Information cascade has been well studied to explain how individuals adopt information, but not as much to predict future cascades, especially when the information is new and possibly not relevant to past cascades. An individual adopts the information when she receives enough influence from her infected neighbors [29], i.e., those who already adopted the information, or randomly if the underlying decision making process is unclear [29]. A body of work was developed to infer the inherent influence network [35] [36] [33] based on a number of cascades; the influence network inferred is the one which best fits all cascades. In general, the more cascades available, the more accurate the influence network one can recover.

The inferred influence network can recover the most likely influence flows based on the distribution of collected past topic adoption cascades. However, it

is unclear whether a future topic cascade can be explained by the distribution of past ones, for two reasons. First, past cascades may be too limited to cover the relationships among all actors. Second, the new topic's propagation could be irrelevant to how past topics propagated. Though topic cascades can be volatile and in many cases there are not sufficient cascades available, the way an author is influenced to work on a new topic is relatively stable; an author is likely to be influenced by her colleagues or other researchers with social connections. These social connections contain rich information describing different relationships between authors and can be used to infer the inherent influence flows.

Existing research on heterogeneous information networks mostly focuses on ranking and clustering [4], similar object search [5] and link prediction [1]. Differently, this work tries to predict how a new topic is adopted by authors as a cascade without knowing the influence network structure, rather than predicting an additional new link over the existing network. This work proposes to leverage the rich heterogeneous bibliography network information to complement the past topic cascades for determining the inherent influence network. Besides the social connections, the popularity of the topic itself also affects the adoption process. In general, authors are more likely to follow popular topics than less widely accepted ones. To this end, this work aims at developing an algorithm that finds an influence network by optimizing over the social connections and topic popularity subject to past cascades. The influence networks are then used to predict new topic adoption. DBLP data is used to demonstrate the performance improvements of the proposed method in Mean Average Precision (MAP) and Area under ROC Curve (AUC).

4.2 Influence Factors for Research Topic Adoption

This work concerns the adoption of research topics as evidenced by authors' publications. A “topic” is defined as a popular term that represents a specific scientific concept. Adopting a topic means an author has published at least one paper that contains the term in either the title or the abstract.5 An author X “follows” another author Y if X adopts a topic after Y adopts the same topic. This following behavior is also interpreted as an infection process [35]. The following lists the factors that potentially play a role in the influence one author has on others to adopt a research topic.

4.2.1 Direct Observation

The direct observation of the adoption is the time when an author adopts the topic, and a cascade can be generated based on the chronologically ordered observations. Each topic has its own cascade. According to Gomez-Rodriguez et al. [35], recovering an influence network with K edges requires 2K~5K cascades. This inferred influence network could be used to predict new topic adoption based on the assumption that if author X followed Y in many topics before, it is also likely X will follow Y again for a new topic. Generally speaking, the closer in time X followed Y, the more influence Y has on X based on this direct observation [35].

5 Topic adoption is a complex process. Nevertheless, to some extent, we believe that publishing papers with topic terms is reflective of the nature of the adoption.

4.2.2 Indirect Observation

The direct observation helps to recover the influence network based on past topic adoption cascades, but it requires a sufficient number of relevant cascades to learn the network. This limitation motivates the consideration of other factors that indirectly imply the relationship, hence the influence one author may have on another to adopt a new topic.

Connections

The basic intuition is that authors are more likely to receive influence from their colleagues, co-authors and other socially connected peers than from unknown persons. Sun et al. [1] [13] suggested that social connections could be the main reason people establish a new relation. Sun et al. [1] use meta-paths to define the social connections. A meta-path is defined on the network schema, where nodes are object types and edges are relations between object types. For example, two authors coauthoring a paper is expressed as Author-Paper-Author. To define the social relationships among authors, eight common social connections are defined: CI: cite a peer's papers; CA: are coauthors; CV: publish in the same venue; CSA: are co-authors of the same authors; CT: write about the same topic; CICI: cite papers that cite the peer's papers; CIS: cite the same papers; SCI: are cited by the same papers.

Topic Popularity

Besides the social connections affecting the adoption probability, the research topic itself is also a factor. Not only the match between the topic and the author's own interest, but also the popularity of the topic can affect how fast and easily it is adopted. The topic popularity is defined as the average number of papers adopting the topic since the first year it appeared. Fig. 4.1 shows the popularity of four selected machine learning topics (SVM, EM, PCA, KNN) extracted from 20 conferences from 1989 to 2010.

Figure 4.1: Individual Popularity

As the figure shows, different terms have different popularity curves. Given the same social connections and the same initial authors who adopted the topic earliest, different topics could result in different adoption cascades. There also exists the case where an author adopts the topic without receiving any influence from her social peers. Based on these observations, this work hypothesizes that topic popularity can be a factor that affects the adoption process. To include the topic popularity in the model, a virtual author is added, where the connection from any author to this virtual author represents the influence that comes from the topic popularity.

4.3 Model Formulation

4.3.1 NetInf*

Traditional network inference algorithms such as NetInf [35] use the direct observation, i.e., the adoption time difference, to estimate the influence probability, where the infection probability for each link (i, j) on cascade c is calculated as:

$$Pr_{ij}^{c} = \exp\Bigl(-\frac{\Delta_{ij}}{\alpha}\Bigr) \qquad (4.3.1)$$

where $\Delta_{ij}$ is the adoption time difference between i and j, and $\alpha$ is set to 1 according to [35]. Based on that, NetInf selects the links that contribute most to the infection probability over all cascades with a greedy algorithm. However, the purpose of NetInf is only to infer the most likely influence network based on past cascades. To predict on future topics, NetInf* is developed as an extension of NetInf that estimates the influence probability for all links as:

$$Pr_{ij} = 1 - \prod_{c \in C}\bigl(1 - Pr_{ij}^{c}\bigr) \qquad (4.3.2)$$

where $Pr_{ij}$ is the probability that i will adopt after j for any topic, and the transition probability matrix can be set accordingly.
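A small sketch of how (4.3.1) and (4.3.2) combine per-cascade observations into a single link probability; the time gaps in the usage line are made up for illustration.

import numpy as np

def netinf_star_link_prob(deltas, alpha=1.0):
    """Combine the per-cascade probabilities (4.3.1) into the overall
    link probability (4.3.2) for one pair of authors (i, j)."""
    per_cascade = np.exp(-np.asarray(deltas, dtype=float) / alpha)
    return 1.0 - np.prod(1.0 - per_cascade)

# e.g. X followed Y with adoption gaps of 1, 3 and 10 time units:
print(netinf_star_link_prob([1.0, 3.0, 10.0]))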

4.3.2 HetNetInf

The second algorithm, HetNetInf, uses the indirect observations, i.e., social connections and topic popularity, to generate the influence probability. Every author can be infected (adopt a topic) by an infected neighbor with some probability. This neighbor could be an actual peer with social connections, or a virtual author which represents the topic popularity. So the problem becomes how to use the social connection features and topic popularity to represent the infection probability for each author pair, and to reflect the time order in which they adopt the topic. Given authors i and j, author i is infected at time $t_i$ and j is infected at time $t_j$ ($t_i > t_j$). Let $X_{ij}$ be the feature vector between authors i and j, $X_{ij} = [X_{ij,f_1}, X_{ij,f_2}, \ldots, X_{ij,f_N}, X_{pt}]$, where $X_{ij,f_k}$ is the $k$-th social connection feature and $X_{pt}$ is the popularity of topic t:

if j is the virtual author: $X_{ij} = [0, \ldots, 0, X_{pt}]$

if j is a normal author: $X_{ij} = [X_{ij,f_1}, X_{ij,f_2}, \ldots, X_{ij,f_N}, 0]$

The probability author j infects author i in topic cascade c is defined as:

$$Pr_{ij}^{c}(\beta) = \exp\Bigl(-\frac{1}{X_{ij} \cdot \beta}\Bigr) \qquad (4.3.3)$$

where $X_{ij}$ is the feature vector between authors i and j, and $\beta > 0$ is the weight vector which maps the features to the influence probability.

Every infected neighbor of i has a chance to infect author i. The likelihood of infecting author i in cascade c is:

$$L_i^{c}(\beta) = \Bigl[1 - \prod_{j:\,t_j^c < t_i^c}\bigl(1 - Pr_{ij}^{c}(\beta)\bigr)\Bigr]^{I(t_i^c \le t^c)}\Bigl[\prod_{j:\,t_j^c \le t^c}\bigl(1 - Pr_{ij}^{c}(\beta)\bigr)\Bigr]^{1 - I(t_i^c \le t^c)} \qquad (4.3.4)$$

where $t^c$ is the end of the observation window. If i is not infected within the window, $L_i^c$ is interpreted as the probability that i is not infected by any neighbor; otherwise, $L_i^c$ stands for the probability that i is infected.

Two concerns remain. Firstly, it is unclear whether one should use the same parameter β for all authors, which would assume the same adoption behavior, i.e., the same weight distribution, for all authors. Instead, each author has her own $\beta_i$, and HetNetInf runs the optimization for each author independently. Equation (4.3.3) is updated as:

$$Pr_{ij}^{c}(\beta_i) = \exp\Bigl(-\frac{1}{X_{ij} \cdot \beta_i}\Bigr) \qquad (4.3.5)$$

For cascade set C, the likelihood of author i being infected over all cascades c ∈ C is:

$$L_i(\beta_i) = \prod_{c \in C} L_i^{c}(\beta_i) \qquad (4.3.6)$$

Substituting (4.3.4) into (4.3.6), the log likelihood is:

$$LL_i(\beta_i) = \sum_{c \in C} \log\Bigl(\Bigl[1 - \prod_{j:\,t_j^c < t_i^c}\bigl(1 - Pr_{ij}^{c}(\beta_i)\bigr)\Bigr]^{I(t_i^c \le t^c)}\Bigl[\prod_{j:\,t_j^c \le t^c}\bigl(1 - Pr_{ij}^{c}(\beta_i)\bigr)\Bigr]^{1 - I(t_i^c \le t^c)}\Bigr) \qquad (4.3.7)$$

Secondly, because the influence network is sparse, β needs to be penalized so that there are fewer links, which is equivalent to having some links with very small weight. Taking this into consideration, the optimization is to minimize the following function:

$$\min_{\beta_i}\; -LL_i(\beta_i) + \|\beta_i\|_2 \quad \text{subject to } \beta_i > 0 \qquad (4.3.8)$$

The ridge regularization term $\|\beta_i\|_2$ is the penalty for the influence network sparsity. For each author i, ten optimizations with random initial points are run to find local optima, and the best one is selected.
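A minimal sketch of the per-author optimization (4.3.5)–(4.3.8), assuming the likelihood form reconstructed above; it uses scipy.optimize.minimize with positivity bounds and random restarts, and is not the authors' original code.

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, cascades):
    """-LL_i(beta) for one author i, following (4.3.3)-(4.3.7). Each cascade
    is (X, infected): X holds the feature vectors X_ij of the neighbors j who
    adopted first; infected says whether i adopted within the window."""
    nll = 0.0
    for X, infected in cascades:
        pr = np.exp(-1.0 / np.clip(X @ beta, 1e-8, None))  # (4.3.3)
        survive = np.prod(1.0 - pr)            # prob. no neighbor infects i
        lik = 1.0 - survive if infected else survive
        nll -= np.log(max(lik, 1e-12))
    return nll + np.linalg.norm(beta)          # penalty as in (4.3.8)

def fit_author(cascades, n_features, restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):                  # random restarts, best kept
        beta0 = rng.uniform(0.1, 1.0, n_features)
        res = minimize(neg_log_likelihood, beta0, args=(cascades,),
                       bounds=[(1e-6, None)] * n_features)
        if best is None or res.fun < best.fun:
            best = res
    return best.x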

4.4 Experiments

4.4.1 The Dataset and Experimental Setting

This work selects DBLP to evaluate how to infer the influence network and use it to predict who will adopt a new topic once it emerges. The experiment examines publications in the top 20 computer science conferences in four areas,6 and in two

6Data mining: KDD, PKDD, ICDM, SDM, PAKDD; Database: SIGMOD Conference, VLDB, ICDE, PODS, EDBT; Information Retrieval: SIGIR, ECIR, ACL, WWW, CIKM; and Machine Learning: NIPS, ICML, ECML, AAAI, IJCAI.

periods, T0 = [1991, 2000] and T1 = [2001, 2010]. All topic terms are extracted from the paper title or abstract, and filtered by the machine learning related keyword list on the Microsoft Academic website.7 A total of 111 machine-learning related topics that were first introduced in period T0 are selected for training, and an additional 57 topics from T1 are selected for testing; all terms appear in more than 10 papers. For each topic, the time an author first adopted the topic is recorded. An author is treated as not adopting the topic if either the author did not adopt the topic at all or adopted it beyond the monitored period. As mentioned in Section 4.2.2, eight social connection features are extracted from period T0. Including the topic popularity (TP), the feature vector includes nine features in total: CI, CA, CV, CSA, CT, CICI, CIS, SCI, TP.

The experiment selects 196 authors who published at least 1 paper in both periods in these conferences as low productive authors (Low Pub Set), and another set of 47 authors who published at least 3 papers in both periods as high productive authors (High Pub Set). The Low Pub Set is a superset of the High Pub Set. These authors are selected because they are active in both periods, such that the relationships among them are relatively stable.

4.4.2 Prediction Study

Each algorithm uses the 111 topic cascades that appeared in T0 as the training set to infer the influence network, and uses the 57 terms in T1 for testing. For each term, the authors

7http://academic.research.microsoft.com

who adopted the topic earliest are labeled as the initial authors. To predict who will follow the topic, both algorithms return the probability of adopting the topic for every author. Performance is then measured by comparing all authors' adoption probabilities (excluding the initial adopting authors) to the actual adopting authors in Mean Average Precision (MAP) and Area under ROC Curve (AUC).
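A short sketch of the per-topic scoring, assuming scikit-learn's average_precision_score and roc_auc_score as the MAP/AUC building blocks; averaging the per-topic scores over the test topics gives MAP and the mean AUC.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def score_topic(pred_prob, adopted, initial):
    """Score one test topic: pred_prob[i] is author i's predicted adoption
    probability, adopted[i] the ground truth, and initial[i] marks the
    earliest adopters, who are excluded from scoring."""
    keep = ~np.asarray(initial, dtype=bool)
    y = np.asarray(adopted)[keep]
    p = np.asarray(pred_prob)[keep]
    return average_precision_score(y, p), roc_auc_score(y, p)

# Averaging the two scores over the 57 test topics gives MAP and mean AUC.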

Table 4.1 shows the average MAP and AUC achieved by NetInf* and HetNetInf for the topics being tested. A high MAP value means that an author who adopted the topic is also predicted with a high probability. Looking at the “High Pub Set” column under MAP, “% Authors Adopt = 0.0621” means that roughly only 1 out of 16 authors follows the topic. NetInf* has a chance of 0.1649 of a correct prediction, while HetNetInf increases the chance to 0.2121. These numbers are significantly higher than 0.0621, and HetNetInf outperforms NetInf*. These numbers, however, are not close to 1.0, which would be ideal for perfect prediction. This is because each past cascade covers less than 10% of the authors. The MAP results are worse for the “Low Pub Set”, because the same number of cascades is used to explain more authors.

In terms of AUC, HetNetInf also exhibits better performance than NetInf* for both sets. Interestingly, both algorithms achieve better AUC for the “Low Pub Set” than for the “High Pub Set”. This is because, as the network becomes larger, though the percentage of authors adopting the topic decreases, the absolute number of authors following the topic increases. The larger number of positive samples makes the true positive rate increase more smoothly, and thus improves the AUC.

Table 4.1: Topic Adoption Prediction Performance

                  MAP                           AUC
                  High Pub Set   Low Pub Set    High Pub Set   Low Pub Set
NetInf*           0.1649         0.0970         0.5624         0.6333
HetNetInf         0.2121         0.1044         0.6188         0.6376
% Authors Adopt   0.0621         0.0305

Fig. 4.2 and Fig. 4.3 show the MAP for each topic being tested on the “High Pub Set” and “Low Pub Set”, respectively. As the figures show, the result varies from topic to topic. The topic “gene expression data” shows an example of how social connections help predict the adoption of a novel topic. In this case, authors S and M published about the topic in 2003 and 2007, respectively. In T0, M only followed S once among the 111 topics. Compared to other authors who followed S many times, the influence from S to M is relatively weak in NetInf*. However, there is a very strong social connection between them. As a result, this increases the probability for M to follow S under HetNetInf. In this case, HetNetInf has MAP = 1, while NetInf* has MAP = 0.026.

Figure 4.2: Prediction Performance (High Pub Set)        Figure 4.3: Prediction Performance (Low Pub Set)

HetNetInf also provides information about each author's preference ($\beta_i$) regarding which factors decide the adoption process. For example, independent researchers may tend to discover topics by themselves, while other researchers step into a new topic under the influence of their research community. Table 4.2 shows two authors' normalized feature weights. These two authors have very different following behaviors: author X's actions are mostly affected by other authors' work in the same area, while author Y is mostly affected by the topic popularity.

Table 4.2: Individual Features Weight

Author   CI       CA       CV       CSA      CT       CICI     CIS      SCI      TP
X        0.4471   0.6248   0.0103   0.6396   0.0104   0.0103   0.0102   0.0102   0.0103
Y        0.2686   0.2514   0.2548   0.2654   0.3155   0.2540   0.2897   0.2517   0.6465

Fig. 4.4 and 4.5 further show the normalized feature weight distributions. No single feature dominates the adoption process for all authors, and the weight distribution varies significantly from author to author. This also validates HetNetInf's approach of using a different $\beta_i$ for each author.

Figure 4.4: Feature Weight Distribution (High Pub Set)        Figure 4.5: Feature Weight Distribution (Low Pub Set)

4.5 Summary

The work presented in this chapter investigates the problem of predicting which authors will adopt novel topics, and tested two algorithms using the DBLP dataset. The basis of the solution is to find the inherent influence network from past observations. The first approach, NetInf*, is based on direct observations of past following behavior on topic adoption. Its shortcoming is that the prediction performance depends heavily on the amount of collected past cascades and on whether the novel topic cascade is relevant to these past cascades. Though all topic terms selected are in the same area (machine learning), all these terms are unique. Each topic can have its own unique cascade which could be totally different from past ones. Alternatively, HetNetInf defines the author's following behavior based on the factors affecting the adoption process, such as social connections and topic popularity. Given the adoption information of the author's socially connected peers and the topic popularity, it can estimate the probability of the author adopting the topic, even when the topic is totally new and irrelevant to past ones. The experiment using DBLP shows that HetNetInf outperforms NetInf* in predicting novel topic adoption. Furthermore, HetNetInf provides information on each individual author's preference among the influence factors.

The remaining issue in the above work is that the prediction performance is still not good enough. One reason is the limited number of cascades, such that the inferred influence network may not be able to capture all possible cascade patterns, especially for future cascades. A deeper reason could be that the exposed information's influence is not the predominant factor deciding a person's actions in the real world.

Actually, most SNA works on information propagation are limited to online SNs. The reason is that information adoption can be observed in an SN as a retweet or repost, so the information adoption equals the repost-like action itself. Also, this information adoption is largely decided by seeing the same information exposed by friends, and the relation between friends is explicitly defined. So, information propagation can be well studied in a proper context where the propagation depends mostly on the exposed information's influence. However, the same influence algorithm may not apply to other contexts where information adoption does not equal the action. For example, the awareness of a commercial product may not be transferred into a purchase behavior. The product's information is presumably still propagated through the influence network; however, this information adoption cannot be observed, and only the purchase activity can be observed. In other words, the influence network is supposed to predict whether the product information is adopted, but not whether there is a purchase. Though it can still be explained by probability, this derived probability cannot make accurate predictions since awareness of the information is not the main factor. So, the intuition is that how much an action can be explained by an information propagation algorithm depends on how likely the exposed information is to be transferred into an action. The problem is then whether the influence of exposed information on a certain type of action can be quantitatively defined, so that the applicable context of an information propagation algorithm can be delimited. For this reason, we turn to the StackOverflow Q&A forum in Chapter 5, where the knowledge propagation can mostly be represented by the action of answering a question.

Chapter 5

Knowledge Propagation Network

5.1 Introduction

A Q&A forum acts as a knowledge container that includes a large number of Q&A threads, where each thread starts from a question and includes multiple answers and comments. Owing to the richness and variability of answers, it has gradually become one of the most important ways to learn and query knowledge for both beginners and experts. The Q&A threads reflect the problems users met in different scenarios, and the different approaches to solving them. A question may receive multiple answers, and it is possible that there is more than one correct answer. Though typically only one answer will be accepted, other answers also provide knowledge and value to the thread and could be helpful to others searching for the solution. All answers, and even comments on the question, can be thought of as the body of knowledge for dealing with a particular problem. The research problem is to understand how knowledge is transferred between user accounts in a Q&A forum. The specific problems are how many people will be involved in a Q&A thread (final and over time), who will provide an answer to the question, and who will provide the accepted answer.

To predict who will provide the answer, existing methods either treat this as a pure optimization problem (e.g., regression or classification) or find the potential answerer according to the similarity between the questions answered by a candidate and the newly arrived question (e.g., semantic analysis). Classification is the most common approach utilized in these studies, and many of them investigated feature selection to improve performance. Existing classification methods are able to find the relation between the selected features and the response, such as predicting whether a question will be answered based on features of the question, but they ignore the inherent process of knowledge being transferred between actors. The same holds for semantic analysis, which only cares about language similarity. Few studies have worked from the perspective of how information diffuses or how knowledge is transferred from one actor to another. This work studies the problem of how to model the process of knowledge being transferred among actors, and uses the result to perform predictive analyses such as how many user accounts will join a thread and who will provide the answer.

Though in a typical Q&A forum each question has only one accepted answer, other answers and comments also provide knowledge toward solving the question. In this work, we assume all published answers and comments provide knowledge to solve the question. More specifically, knowledge is transferred when an answering or commenting action occurs. Certainly, determining how much knowledge each answer or comment provides would require detailed semantic analysis. This work focuses on modeling how likely one actor is to provide knowledge to another, without concerning the exact amount of knowledge.

The process of knowledge being transferred in a Q&A forum is similar to the information propagation observed in other social networks in that both processes transfer particular information from an information container to users who do not have it, but it also has its own properties. First, contrary to a conventional social network where nodes joining an information cascade are information receivers, new nodes joining a Q&A thread are knowledge providers. Second, the transfer of knowledge from knowledge provider to knowledge seeker during the answering or commenting process depends on factors such as the difficulty of the question and the reputation difference. Third, while there is no explicit constraint limiting information propagation, once a question is solved, the number of answers and comments joining the Q&A thread decreases dramatically; in many cases, the accepted answer is the last post of the Q&A thread. All these make the process of knowledge being transferred between knowledge provider and knowledge seeker different from conventional information propagation.

This study focuses on StackOverflow, a popular Q&A forum. In general, a knowledge transfer process includes two parts: on the questioner's side, whether a question will receive answers; on the answerer's side, whether an answer will be accepted. We found that the knowledge level, a combination of the questioner's reputation and the question's body text, affects whether a question will receive answers, while the temporal factor dominates whether an answer will be selected. By taking these factors into consideration, TKTM (Time based Knowledge Transfer Modeling) is proposed to model the process of knowledge being transferred, where the likelihood that a person will transfer knowledge to another is modeled as a continuous function of time and knowledge level.

The experiment evaluates TKTM by applying it to estimate how many user accounts will be involved in a thread to provide knowledge, and who will answer the question. In the first set of experiments, the performance of TKTM is compared to NetRate and to regression methods such as RandomForest and linear regression; TKTM outperforms the other methods. In the experiment on who will answer the question, we compare the performance of QLI, TKTM and TKTM with semantic analysis. TKTM alone substantially outperforms QLI. Integrating semantic analysis with TKTM provides better performance than TKTM alone, depending on the level of emphasis placed on semantics.

The rest of this chapter is organized as follows: Sec. 5.2 introduces the StackOverflow dataset and discusses its properties. Sec. 5.3 analyzes the factors that affect knowledge transfer from both the questioner's and the answerer's side. Sec. 5.4 introduces TKTM, and Sec. 5.5 analyzes the hypothesized network/community derived from TKTM. Sec. 5.6 evaluates the performance of TKTM in predicting thread size and predicting individual actions, compared to other methods such as NetRate and regression methods including RandomForest, linear regression and logistic regression; Sec. 5.7 concludes.

5.2 Q&A forum: StackOverflow

A Q&A forum is composed of a number of Q&A threads, where each thread includes one question and multiple answers and comments. An actor in the Q&A forum may have different roles based on her/his action(s) in a Q&A thread. This work uses the following terminologies:

Term          Description
Actor         user who acts in a Q&A thread
Questioner    actor who publishes the question
Answerer      actor who answers the question
Commenter     actor who comments on the question or an answer
Thread size   #actors in a Q&A thread excluding the questioner

Table 5.1: Q&A forum Terminologies

Typical relationships in the Q&A forum are ⟨questioner, answerer⟩, ⟨questioner, commenter⟩, and ⟨answerer, commenter⟩.8 A post published by an actor is referred to as either a question, an answer or a comment.

A typical Q&A thread structure is shown in Fig. 5.1, where each circle stands for a post, which could be a question (Q), an answer (A) or a comment (C). The question is followed by multiple answers, and only one answer is selected as accepted per Q&A thread. Both the question and the answers can have comments. The direction of the arcs represents how knowledge is transferred between users. In general, knowledge flows from the answerer to the questioner, since the answerer

8 The study did not include voters since voter information is not published in the StackOverflow dataset.

Figure 5.1: Q&A thread structure

provides the knowledge the questioner is looking for. A comment can be thought of as complementary to the answers or the question. Sometimes a comment also points out a missing or incorrect part of an answer or the question, which can also be thought of as providing extra knowledge. Thus, the study makes the following assumption:

Assumption 1. An answer or a comment action in a Q&A thread indicates knowledge transferred from the answerer to the questioner, the commenter to the questioner, or the commenter to the answerer.

5.2.1 Dataset Description

This study utilizes the StackOverflow data dump,9 which includes multiple topics where each topic is a dataset. Each topic dataset includes many attributes describing the questions, answers, comments, and users in XML format. This work selects Posts.xml, Comments.xml, Users.xml and Votes.xml for the analysis. The content of each XML file is as follows:

9https://archive.org/details/stackexchange

• Posts.xml: type, AcceptedAnswerId, CreationDate, Score, Body, OwnerUserId, LastEditorUserId, LastEditDate, Title, Tags, AnswerCount, CommentCount, FavoriteCount

• Comments.xml: Score, Text, CreationDate, UserId

• Users.xml: Reputation, DisplayName, WebsiteUrl, Location, Views, UpVotes, DownVotes, Age, AccountId

• Votes.xml: PostId, voteType, Date

Note that Posts.xml includes information for both the questions and the answers.
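As a sketch, the dump files can be loaded with the Python standard library. The PostTypeId filter below reflects the public StackExchange dump schema (1 = question, 2 = answer); since the chapter only lists the attribute as "type", treat it as an assumption about the exact attribute name.

import xml.etree.ElementTree as ET

def load_rows(path):
    """Each StackExchange dump file is a flat list of <row> elements whose
    fields are stored as XML attributes."""
    return [row.attrib for row in ET.parse(path).getroot()]

posts = load_rows("Posts.xml")
questions = [p for p in posts if p.get("PostTypeId") == "1"]  # 1 = question
answers = [p for p in posts if p.get("PostTypeId") == "2"]    # 2 = answer
print(len(questions), "questions,", len(answers), "answers")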

In this study, two topic datasets are selected: “mechanics” and “security”. The “mechanics” set mainly includes questions related to automotive mechanics and maintenance, while “security” includes questions related to cyber security and secure coding. The mechanics dataset represents a relatively more traditional industry, where one may hypothesize that knowledge evolves more slowly than in computer science. Table 5.2 lists the statistics for these two datasets.

Dataset     #Posts   #Comments   #Votes   #Users
mechanics   15463    19175       46975    8417
security    62901    98841       377874   53728

Table 5.2: Datasets Summary

5.2.2 User Network

Since the user actions include answering and commenting, one can define a network where there is a link from user A to user B if A answers or comments on B's question or answer.

Figure 5.2: Degree Distribution of user network (indegree and outdegree)

Fig. 5.2 shows the indegree and outdegree distributions of the user network for the “security” dataset.10 Similar to other social networks, the degree distribution follows a power law, where the solid line is the best fit. Since each link reflects an answering or commenting action between two users, the power law shown in Fig. 5.2 implies that a small number of users are very active in answering other users' questions. In other words, most knowledge is transferred by a small number of users.

5.2.3 Properties

Compared to other social networks, a Q&A forum has two unique properties: each user has a reputation, and each question thread has its lifetime.

10 A similar degree distribution is also found in the “mechanics” dataset.

Reputation

As mentioned in [51], a user's reputation in the StackOverflow system is based on whether her/his published answers received positive responses. In other words, the reputation represents a user's ability to answer questions and, to some extent, represents the user's expertise level.

Figure 5.3: Distribution of user reputation ((a) mechanics, (b) security)

Fig. 5.3 shows the distribution of user reputation in the two datasets. The reputation of questioners also follows a power law, and most questioners' reputations are small. This is consistent with the observation from the degree distribution shown in Fig. 5.2 that few users are active and hold the most knowledge. Compared to the users in “mechanics”, users' reputations in “security” are relatively higher. Since reputation is related to answering actions, this implies that the users in “security” are more active.

Thread Life Time

Another unique property of a Q&A forum is that each Q&A thread has its lifetime. There are two timestamps related to a question. The first is the creation time of the accepted answer, denoted $T_c^a$. The second is the time of the last post published on the thread, $T_e$. Fig. 5.4a and Fig. 5.4b show the distribution of $T_c^a$ for the datasets “mechanics” and “security”, respectively. In these figures, only threads that have accepted answers are selected.

Figure 5.4: Distribution of $T_c^a$ ((a) mechanics, (b) security)

Half of the questions receive their accepted answers within 2 hours, while some questions take over 4 years (for example, one question was raised in 2010/11, and its $T_c^a$ is 2015/02). The exact time a question is solved, i.e., the moment the accepted answer is selected, should be between $T_c^a$ and $T_e$. However, the StackOverflow dataset does not contain this information, only which answer is the accepted one. Table 5.3 shows the statistics related to the accepted answer, where “Answs” stands for including answers only, and “AnswsCommts” stands for including both answers and comments.

Include        Dataset     %Stop   %After
Answs          mechanics   62%     20%
AnswsCommts    mechanics   42%     33%
Answs          security    58%     22%
AnswsCommts    security    35%     36%

Table 5.3: Statistics related to accepted answers

“%Stop” is the percentage of threads that stop at $T_c^a$, i.e., the accepted answer is the last post on the thread. For the threads that still receive answers or comments after the accepted answer, “%After” is the percentage of posts published after the accepted answer. In the “security” dataset (a similar observation holds for the “mechanics” dataset), considering answers only, about 58% of threads stop at $T_c^a$; for the remaining 42% of threads, 22% of answers are published after $T_c^a$. When including both answers and comments, about 35% of threads stop at $T_c^a$, and 36% of posts are published after $T_c^a$ in the threads that did not stop at $T_c^a$. Since comments are attached to answers, users can still publish comments after the accepted answer; this is also why %Stop is lower when comments are included.

Although the exact time a question is solved is unknown (not published by StackOverflow), it is bounded by $T_c^a$ and $T_e$, and $T_c^a$ can be thought of as a marker of the Q&A thread lifetime, in that only a portion of answers and comments will be published after it. The power-law distribution in Fig. 5.4 also implies that most questions are within the knowledge scope of the community, while some difficult questions require a longer time to receive the accepted answer.

5.3 Factors affecting knowledge transfer

In order to model the knowledge transfer process, we need to know what factors affect the process. Sec. 5.2.3 discussed two properties of StackOverflow: user reputation and the temporal factor. In general, knowledge transfer from an answerer to a questioner includes two parts: on the questioner's side, whether the question will receive answers; and on the answerer's side, whether an answer will be selected.11 Besides the reputation and temporal features, the following sections investigate the factors that could affect the knowledge transfer process from the questioner's and the answerer's perspectives.

5.3.1 Questioner: Whether a Question will Receive Answers

On the questioner's side, the problem is to investigate which factors affect whether a question receives answers. Since reputation represents a user's expertise level, it also reflects the knowledge level of the questions the user publishes. For example, one may hypothesize that a question raised by a user with low reputation could be solved quicker than a question raised by a high reputation user, and that its thread size is also relatively small. On the other hand, the difficulty of a question affects who will answer it. [54] modeled question difficulty according to the waiting time before a question received its accepted answer. To determine what could affect the number of answers received, we extract the features listed in Table 5.4.

11 Knowledge transfer also happens in commenting actions. Here, we mainly analyze the answering behavior, and assume the knowledge transfer in both answering and commenting is affected by similar factors.

Feature         Description
q_reput         reputation of the questioner
questions       #questions published by the questioner
title_total     #words in the title
title_nonstop   #words in the title excluding stopwords
title_unique    #unique non-stopwords in the title
body_total      #words in the body
body_nonstop    #words in the body excluding stopwords
body_unique     #unique non-stopwords in the body
AAWTime         waiting time for the accepted answer

Table 5.4: Features for predicting #answers.

The intuition behind selecting the number of words in the title and the post body is that the more detailed the question description, the better the context of the question is conveyed, which may affect whether an actor joins the thread. Also, the complexity of the question could be reflected in the length of the title or the body. RandomForest is applied to estimate how many answers a question will receive. The regression result shows that “AAWTime”, “body_total” and “q_reput” are the most important features. Since “AAWTime” is the result of answerers' actions after the question was published, the question body text and the questioner's reputation are more suitable for prediction. Thus, we use the combination of reputation and post body text to represent the knowledge level of a question or an answer.
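A hedged sketch of this regression step with scikit-learn; the file name questions_features.csv and the answer_count column are hypothetical placeholders for however the Table 5.4 features happen to be stored.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["q_reput", "questions", "title_total", "title_nonstop",
            "title_unique", "body_total", "body_nonstop", "body_unique",
            "AAWTime"]

# Hypothetical file: one row per question with the Table 5.4 features
# plus the observed number of answers as the regression target.
df = pd.read_csv("questions_features.csv")
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(df[FEATURES], df["answer_count"])
for name, imp in sorted(zip(FEATURES, rf.feature_importances_),
                        key=lambda kv: -kv[1]):
    print(f"{name:15s} {imp:.3f}")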

5.3.2 Answerer: Which Answer will be Selected?

From the answerer's perspective, the problem is to investigate which factors determine whether an answer will be selected as the accepted one. It can be solved as a classification problem. Based on the S8 feature set of [51], we made some changes and finalized 13 features that focus more on the answer and the answerer; Table 5.5 lists the details. Note that an actor's rank is derived from the actor's PageRank [65] in the user network described in Sec. 5.2.2.

Feature            Description
answerScore        score of the answer
timeDiff           time difference between when the answer and the question were published
answererReput      reputation of the answer's author
reputDiff          reputation difference between answerer and questioner
commentsNum        number of comments on the answer
avgCommentsScore   average score of the answer's comments
answererRank       rank of the answerer in the user network
rankDiff           difference between the answerer's rank and the questioner's rank
existAnswersNum    number of existing answers
avgAnswersScore    average score of existing answers
upvotes            number of up votes
downvotes          number of down votes
votes              total number of votes

Table 5.5: Features for classifying whether an answer will be accepted

In both datasets, “upvotes” and “votes” are highly correlated since most votes are up-votes, matching the observation in [51]. Another correlation is observed among “answererReput”, “answererRank” and “upvotes”. This is because a user's reputation is related to the number of up-votes received; the reputation is also related to the answering actions from which the user network, and hence “answererRank”, is derived.

Since the dataset is unbalanced, in that each Q&A thread has only one accepted answer, we over-sample the accepted answers to obtain a balanced dataset. For each answer published, the 13 features are extracted, and we use RandomForest to perform the classification. With 10-fold cross validation, the average AUC for “mechanics” and “security” is 0.948 and 0.961, respectively.
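A minimal sketch of the over-sampling and 10-fold AUC evaluation with scikit-learn; X, y and the exact over-sampling scheme below are assumptions, since the text does not specify them.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def oversample_accepted(X, y, seed=0):
    """Duplicate accepted answers (y == 1, the minority class) at random
    until both classes have the same size."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([neg, pos, extra])
    return X[idx], y[idx]

# X: the 13 features of Table 5.5 per answer; y: 1 if the answer was accepted.
# X_bal, y_bal = oversample_accepted(X, y)
# clf = RandomForestClassifier(n_estimators=500, random_state=0)
# print(cross_val_score(clf, X_bal, y_bal, cv=10, scoring="roc_auc").mean())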

Figure 5.5: Importance of Features (mean decrease in Gini; (a) mechanics, (b) security)

Fig. 5.5 shows the importance of the features in classifying which answer will be selected on the two datasets. “timeDiff”, “avgAnswersScore”, “answersNum” and “votes” are among the most important features in both cases. Interestingly, the time difference ranks at the top in both cases. This implies that the later a user publishes an answer, the more likely the answer will be accepted, which at first sounds unreasonable. However, taking both the average score of existing answers and the time difference into consideration, it suggests that a later answerer can publish a more comprehensive answer by covering the points of existing high quality answers, and therefore has a higher probability of being accepted. Vote related features are the next most important factors. It is intuitive that the accepted answer should also receive many votes. However, since the time the answer is selected as accepted is unknown, it is unclear which votes occur before that time point and which after. Thus, we do not include the vote information as a major factor.

5.4 TKTM (Time based Knowledge Transfer Modeling)

Sec. 5.3 discussed the factors affecting knowledge transfer. In essence, modeling the process of knowledge being transferred is to learn the transfer probability. A method such as [12] can be utilized to estimate the probability based on answering/commenting actions. However, it is not able to reflect the temporal property of the Q&A thread. Also, as discussed in Sec. 5.3.2, time is an important factor: a later answer may take advantage of existing answers and is more likely to be accepted. Note that this does not mean a later answerer receives the knowledge from existing answers; on the contrary, a later answer may transfer knowledge to previous answerers, since it may provide extra or different information.

By taking the reputation, post body text and the temporal factor into account, we propose TKTM (Time based Knowledge Transfer Modeling) to model the knowledge transfer process between actors in the Q&A forum. Logically, a later answer (or comment) may transfer knowledge to all existing answers and comments, but without further information we only consider knowledge to be transferred to the actor whose question or answer is being answered or commented on. A link between two actors in TKTM does not represent friendship; instead, it represents the likelihood that one will provide knowledge to the other when they have knowledge in the same area but with different understanding, opinions or expertise levels.

Given a network of size N (N actors) and a thread set C, each thread includes a question and multiple answers and comments, tagged with time-stamps. Thus, $C = \{t^1, t^2, \cdots, t^{|C|}\}$, where $t^c = \{t_1^c, t_2^c, \cdots, t_N^c\}$ and $t_i^c$ is the time-stamp at which actor i posts an answer or comment in thread c.

The link from node i to node j represents the likelihood that node i transfers knowledge to j when i issues a post (either an answer or a comment) after j. Firstly, i can only transfer knowledge to j if i published a post after j; secondly, the difference in expertise level (represented by the reputation difference) could be a factor affecting whether and how much knowledge is transferred between nodes; thirdly, it is also affected by the knowledge level embodied in the question. As discussed in Sec. 5.3.1, the knowledge level of a question is represented by the questioner's reputation and body text. Since votes happen after the answers or comments, vote information is not included in the model. Thus, the conditional likelihood function is:

$$f_{ji}(t_i \mid t_j) = \begin{cases} f_{ji}(t_{j,i}, \delta_r(j,i), k_j^c) & t_i > t_j \\ 0 & t_i \le t_j \end{cases} \qquad (5.4.1)$$

where $k_j^c = r_j \cdot v_j^c$.

Here $t_{j,i}$ is the time difference between when j and i published their posts; $\delta_r(j,i)$ is the reputation difference between j and i; $r_j$ is the absolute reputation of j; $v_j^c$ is the number of unique words in j's post in Q&A thread c; and $k_j^c$ is the knowledge level contained in post j of thread c, the product of $r_j$ and $v_j^c$.

In general, f can take the form of an exponential function of a weighted combination of the selected features, such that

$$f_{ji}(t_{j,i}, \delta_r(j,i), k_j^c) = \exp\bigl(-(\alpha_{ji}^{(1)} t_{j,i} + \alpha_{ji}^{(2)} \delta_r(j,i) + \alpha_{ji}^{(3)} k_j^c)\bigr) \qquad (5.4.2)$$

Since the reputation difference is relatively constant,

$$1 = \int_{-\infty}^{+\infty} f_{ji}(t_{j,i}, \delta_r(j,i), k_j^c) = e^{-\alpha_{ji}^{(2)}\delta_r(j,i)} \int_{0}^{+\infty} e^{-\alpha_{ji}^{(1)} t}\,dt \int_{0}^{+\infty} e^{-\alpha_{ji}^{(3)} k}\,dk = e^{-\alpha_{ji}^{(2)}\delta_r(j,i)}\,\frac{1}{\alpha_{ji}^{(1)}}\,\frac{1}{\alpha_{ji}^{(3)}} \qquad (5.4.3)$$

So $e^{-\alpha_{ji}^{(2)}\delta_r(j,i)} = \alpha_{ji}^{(1)}\alpha_{ji}^{(3)}$, $f_{ji}$ depends only on the variables t and k, and Eq. (5.4.2) can be simplified as:

$$f_{ji}(t_{j,i}, k_j^c) = \alpha_{ji}\beta_{ji}\exp\bigl(-(\alpha_{ji} t_{j,i} + \beta_{ji} k_j^c)\bigr) \qquad (5.4.4)$$

Accordingly, the likelihood that node i does not transfer knowledge to j by the end of the thread is defined as a survival function:

$$S_{ji}(t^c - t_j) = 1 - \int_{0}^{t^c - t_j} f_{ji}(t_{j,i}, k_j^c) = 1 - \int_{0}^{t^c - t_j} \alpha_{ji}e^{-\alpha_{ji}t}\,dt \int_{0}^{+\infty} \beta_{ji}e^{-\beta_{ji}k}\,dk = e^{-\alpha_{ji}(t^c - t_j)} \qquad (5.4.5)$$

Here $t^c$ is the end time of Q&A thread c (the time elapsed from the question to the last post in the thread). The knowledge level k is positive (the product of reputation and the number of unique words of the post), so it is integrated from 0 to ∞. This survival function means that, for any question or answer j published, i does not transfer knowledge to j at any time from $t_j$ to the end time $t^c$ of the thread.

According to Assumption 1, once a post is published, it provides some knowledge to the previous ones. It is also possible that an actor publishes multiple comments/answers in a Q&A thread. In many cases, the subsequent comments published by the same actor explain her/his previous answer or comment, so the knowledge provided could be overlapping. Therefore, for each actor, only her/his first answer or comment is included. The likelihood that an actor i provides knowledge to a Q&A thread is then the likelihood that the actor published an answer or a comment on the existing answering/commenting links:

$$\ell_i^{+}(c) = f_{oi}(t_{o,i}, k_o^c)^{I(E_{oi}^c)} \prod_{j:\,j \ne o,\, t_j^c < t_i^c} f_{ji}(t_{j,i}, k_j^c)^{I(E_{ji}^c)} \qquad (5.4.6)$$

where o is the questioner, such that $t_o^c < t_i^c$ for all $i \ne o$; $I(E_{ji}^c)$ indicates whether there is an answering or commenting action from i to j in thread c; and $t_{j,i}$ is the time difference between j's and i's posts. The likelihood that a node i does not transfer any knowledge (does not publish any answer or comment) to the Q&A thread is:

$$\ell_i^{-}(c) = \prod_{j:\,t_j^c \le t^c} S_{ji}(t^c - t_j) \qquad (5.4.7)$$

It is the product of the survival functions over every node j that published a post to which i did not transfer knowledge.

Therefore, the total likelihood for a thread is:

$$\ell(c) = \prod_{i:\,t_i^c \le t^c} \ell_i^{+}(c) \prod_{i:\,t_i^c > t^c} \ell_i^{-}(c) \qquad (5.4.8)$$

and the likelihood of all threads is

$$L(C; A, B) = \prod_{c \in C} \ell(c; A, B) \qquad (5.4.9)$$

The relevant negative log likelihood is

$$-LL(C; A, B) = -\sum_{c \in C} \log \ell(c; A, B) = -\sum_{c \in C}\Bigl(\sum_{i:\,t_i^c \le t^c} \log \ell_i^{+}(c) + \sum_{i:\,t_i^c > t^c} \log \ell_i^{-}(c)\Bigr)$$
$$= -\sum_{c \in C}\Bigl(\sum_{i:\,t_i^c \le t^c} \sum_{j:\,t_j^c \le t_i^c} I(E_{ji}^c) \log f_{ji}(t_{j,i}, k_j^c) + \sum_{i:\,t_i^c > t^c} \sum_{j:\,t_j^c \le t^c} \log S_{ji}(t^c - t_j)\Bigr)$$
$$= -\sum_{c \in C}\Bigl(\sum_{i:\,t_i^c \le t^c} \sum_{j:\,t_j^c \le t_i^c} I(E_{ji}^c)\bigl(\log \alpha_{ji} + \log \beta_{ji} - \alpha_{ji} t_{j,i} - \beta_{ji} k_j^c\bigr) - \sum_{i:\,t_i^c > t^c} \sum_{j:\,t_j^c \le t^c} \alpha_{ji}(t^c - t_j)\Bigr) \qquad (5.4.10)$$

The problem is thus transformed into

$$\operatorname*{minimize}_{A,B}\; -\sum_{c \in C} \log \ell(c; A, B) \qquad (5.4.11)$$

subject to $\alpha_{j,i} \ge 0,\ \beta_{j,i} \ge 0,\ i, j = 1, \ldots, N,\ i \ne j$,

where $A := \{\alpha_{j,i} \mid i, j = 1, \ldots, N,\ i \ne j\}$ and $B := \{\beta_{j,i} \mid i, j = 1, \ldots, N,\ i \ne j\}$.

The whole optimization is computation intensive, but it can be accelerated by running the optimization for each node in parallel. Though we treat answers and comments the same during the modeling process, answers are more important than comments since most knowledge is contained in answers; the answers are also the basis of subsequent comments. Thus, each dataset is tested in two cases: including answers only, and including both answers and comments. The time stamp of the question is set to 0 (the beginning of the thread), while the time stamps of the following answers and comments are set to the time elapsed since the question was posted. All time stamps are normalized by the maximum time stamp across all threads. The reputation is normalized by the maximum actor reputation in the dataset, while the number of unique words is normalized by the maximum number of unique words in a post's body text across all posts.
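For a single directed link, the negative log-likelihood in (5.4.10) decomposes into observed-transfer terms and censored survival terms, which a generic optimizer can minimize; the sketch below assumes that per-link decomposition and uses made-up observations.

import numpy as np
from scipy.optimize import minimize

def link_nll(params, t_obs, k_obs, t_cens):
    """Negative log-likelihood of one directed link j -> i, per (5.4.10).
    t_obs, k_obs: time gaps and knowledge levels of observed transfers;
    t_cens: gaps t^c - t_j for threads where i never responded to j."""
    alpha, beta = params
    ll = (len(t_obs) * (np.log(alpha) + np.log(beta))
          - alpha * np.sum(t_obs) - beta * np.sum(k_obs)   # f_ji terms (5.4.4)
          - alpha * np.sum(t_cens))                        # survival terms (5.4.5)
    return -ll

# Made-up normalized observations for illustration:
res = minimize(link_nll, x0=[1.0, 1.0],
               args=(np.array([0.1, 0.4]), np.array([0.2, 0.05]),
                     np.array([0.8, 1.0])),
               bounds=[(1e-6, None), (1e-6, None)])
alpha_ji, beta_ji = res.x

For this per-link form the minimizer actually admits a closed form (α = n_obs/(Σ t_obs + Σ t_cens) and β = n_obs/Σ k_obs), so the optimizer above mainly illustrates the structure of the objective that is minimized per node in parallel.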

The optimal α and β from both datasets also follow a power law. This means that in the network, knowledge is transferred faster on a small portion of links; in other words, a small number of users are more active than others in answering questions. This observation matches the assumption proposed in [51] that users are organized as a reputation pyramid, where high reputation users are at the top and answer questions quicker than low reputation users at the bottom.

5.5 Community analysis with TKTM

Given the network extracted from TKTM, the relationship between two actors is represented by a link, and the weight of the link is the likelihood that one actor will transfer knowledge to the other.

Fig. 5.6 shows the network extracted with TKTM. Most actors are connected, while some accounts are isolated from the community. Since there are no links from these isolated accounts to the rest of the community, they are more likely zombie accounts that registered but were never activated, or read-only accounts whose owners only browse information from the forum but never post their own opinions.

Figure 5.6: Network with TKTM

Excluding the isolated actors who have no connection to others, Fig. 5.7 shows the distribution of actors' reputation, closeness and activeness on the two datasets. Closeness measures the distance of an actor to the rest of the network extracted with TKTM, defined as

$$C_i = \frac{1}{\sum_{j \in N} d_{ij}} \qquad (5.5.1)$$

where N is the actor set and $d_{ij}$ is the shortest path length from i to j. The larger the closeness, the more central the actor is in the network, so that it is easier to reach others. Activeness measures how many times an actor joined Q&A threads by publishing either a post or a comment. The distributions of these three metrics on the two datasets show a similar pattern: the actors are mainly grouped into 3 clusters.
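The closeness in (5.5.1) can be computed directly over the extracted network. The sketch below uses networkx with hop-count shortest paths and skips unreachable pairs; both are assumptions, as the text does not fix the distance measure.

import networkx as nx

def closeness_unnormalized(G):
    """C_i = 1 / sum_j d_ij as in (5.5.1), per node of the TKTM network.
    Hop-count shortest paths are used and unreachable pairs are skipped."""
    scores = {}
    for i in G.nodes():
        dists = nx.single_source_shortest_path_length(G, i)
        total = sum(d for j, d in dists.items() if j != i)
        scores[i] = 1.0 / total if total > 0 else 0.0
    return scores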

75 4000 5000 3500 4000 3000 2500 3000 2000 2000 1500 1000 1000 500 activeness activeness 0 0 −500 −1000 40000 200000 35000 30000 150000 25000 0.0000 20000 0.0000 100000 0.0005 15000 0.0002 50000 closeness0.0010 (centrality) 10000 closeness0.0004 (centrality) 5000 0.0006 0.0015 0.0008 0 0.0020 0 reputation 0.0010 reputation 0.0025−5000 0.0012−50000 (a) mechanics (b) security

Figure 5.7: Distribution of actors' reputation, closeness and activeness

Cluster 1 - low closeness, low reputation: these are actors who are far away from the mainstream. Cluster 2 - high closeness, low reputation: these are actors who have ways to connect to someone with high closeness, such that they can also reach others within a short distance. Another interesting observation is that some actors in this group, though they participate in many threads (high activeness), do not gain high reputation from it; essentially, they do not bring real knowledge to the community. Cluster 3 - high closeness, high reputation but low activeness: these could be the true experts in the community, who do not answer lots of questions but only answer the difficult ones with a high score in return.

Looking at the distribution of closeness alone, it ranges from 0.0002 to 0.0020. Most actors have a relatively high closeness, close to 0.0020. This observation is consistent with our finding in chapter 3 that there is a giant propagation component. Here, this component can be thought of as a component where the actors involved have a high probability of transferring knowledge to others.

The community or network derived from TKTM shows the relationships among user accounts, which provides a way to estimate the probability that a user account will transfer knowledge to another. Note that, given the TKTM network, the knowledge transfer probability could logically be derived between any two user accounts; but in a Q&A forum, we are only interested in 1-hop and 2-hop knowledge transfers, which represent the answering and commenting actions.

5.6 Evaluation on TKTM

We apply TKTM to solve several predictive analysis tasks: how many user accounts will be involved in each thread; how many user accounts will be involved in each thread over time; who will answer the question; and who will provide the accepted answer.

5.6.1 Task1: Thread Size Prediction

The first question is to estimate how many user accounts will be involved in a Q&A thread. There are mainly two types of approaches to solve this problem.

• Use regression methods to determine the relation between the properties of the initial question/questioner and the thread size;

• Learn the link weight and then estimate the thread size.

Thread Size Estimation with Regression

When an actor raises a question, the only known information is from the question and the questioner. All features from Table 5.4 except AAWTime are selected. AAWTime is excluded because it is derived from the result of received answers. Therefore, the selected features include: q reput, #questions, title total, title nostop, title unique, body total, body nostop, body unique. Linear Regression and Random Forest are utilized to train the regression models.

Thread Size Prediction with Link Weight

The second method is to learn the link weight and then estimate the thread size. We use TKTM and NetRate [66] to learn the link weight, where both methods recover the link weight as a likelihood function.

Given that j published a question, the thread size is the summation of the probabilities that each actor joins j's thread. By including answers only 12, the thread size is the summation of the probability that any actor i answers j's question:

$$TS_j = \sum_{i\ne j} P_{ji}^{(1)} \tag{5.6.1}$$

where $TS_j$ is the thread size of j's question; $P_{ji}^{(1)}$ is the probability that i answers j's question, and there is a direct link from i to j in the TKTM or NetRate learned network.

12: By including answers only, we only include answer information in the training process, such that TKTM or NetRate learns the link weights based on answering actions.

By including both answers and comments 13, the thread size is:

$$TS_j = \sum_{i\ne j}\Big(1-\big(1-P_{ji}^{(1)}\big)\big(1-P_{ji}^{(2)}\big)\Big) \tag{5.6.2}$$

In this case, $P_{ji}^{(1)}$ is the probability that i answers or comments on j's question, i.e., i can directly reach j; and $P_{ji}^{(2)}$ is the probability that i comments on answers of j's question, i.e., i is 2 hops away from j in the TKTM or NetRate learned network.

Given the network learned from TKTM or NetRate, after j published a question, if there is a direct link from i to j, the probability that i transfers knowledge to j by posting an answer or a comment by time T is:

$$P_{ji}^{(1)} = \int_0^T f_{ji}\,dt \tag{5.6.3}$$

where $f_{ji}$ is the likelihood function for the link from i to j. In TKTM, $f_{ji}$ is defined as in Eq.(5.4.4); while in NetRate, $f_{ji}=\alpha_{ji}e^{-\alpha_{ji}t}$. Similarly, if j published an answer, the probability that i posts a comment on j's answer can also be derived as in Eq.(5.6.3).
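Under the exponential forms assumed earlier, Eq.(5.6.3) admits the closed form $P^{(1)}_{ji}(T)=(1-e^{-\alpha_{ji}T})(1-e^{-\beta_{ji}k})$, and Eq.(5.6.1) is then a simple sum over incoming links. The following sketch is purely illustrative, with made-up parameter values:

```python
import numpy as np

def p_answer(alpha, beta, T, k):
    # 1-hop probability of Eq.(5.6.3) under f(t, k) = a*b*e^{-a t}*e^{-b k}
    return (1.0 - np.exp(-alpha * T)) * (1.0 - np.exp(-beta * k))

def thread_size(links, T, k):
    # Eq.(5.6.1): expected number of answerers = sum of 1-hop probabilities
    return sum(p_answer(a, b, T, k) for a, b in links)

links = [(0.5, 1.2), (2.0, 0.3), (0.1, 0.8)]  # (alpha_ji, beta_ji) per incoming link
print(thread_size(links, T=1.0, k=0.4))
```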

If i is 2-hop away from j, by time T , the probability of i joining j’s thread by

13: By including both answers and comments, TKTM or NetRate is trained based on both the answering and commenting actions.

posting a comment on an answer of j's question is:

$$
\begin{aligned}
P_{ji}^{(2)} &= 1-\prod_l\Big(1-P_{jl}^{(1)}P_{li}^{(1)}\Big)\\
&= 1-\prod_l\Big(1-\int_0^T f_{jl}(t,k_j)f_{li}(T-t,k_l)\,dt\Big)\\
&= 1-\prod_l\Big(1-\int_0^{k_j^c}\beta_{jl}e^{-\beta_{jl}k}\,dk\int_0^{k_l^c}\beta_{li}e^{-\beta_{li}k}\,dk\int_0^T\alpha_{jl}e^{-\alpha_{jl}t}\alpha_{li}e^{-\alpha_{li}(T-t)}\,dt\Big)\\
&= 1-\prod_l\Big(1-\big(1-e^{-\beta_{jl}k_j}\big)\big(1-e^{-\beta_{li}k_l}\big)\frac{\alpha_{jl}\alpha_{li}e^{-\alpha_{li}T}}{-\alpha_{jl}+\alpha_{li}}\big[e^{(-\alpha_{jl}+\alpha_{li})T}-1\big]\Big)
\end{aligned}
$$

where l is a bridge node such that there is a link from l to j and a link from i to l. For each node i two hops away from the questioner j, if the bridge node l is indeed an answerer of j's question, we apply its actual "knowledge level" computed from l's reputation and l's answer; otherwise, we randomly sample a knowledge level for $k_l$.
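As a sanity check on the last step of this derivation, the sketch below compares SciPy quadrature of the time integral with the closed form; the parameter values are arbitrary, and the closed form assumes $\alpha_{jl}\ne\alpha_{li}$.

```python
import numpy as np
from scipy.integrate import quad

a_jl, a_li, T = 0.8, 1.5, 2.0

# numerical integral of a_jl*e^{-a_jl t} * a_li*e^{-a_li (T - t)} over [0, T]
numeric, _ = quad(lambda t: a_jl * np.exp(-a_jl * t) * a_li * np.exp(-a_li * (T - t)),
                  0.0, T)
# closed form: a_jl*a_li*e^{-a_li T} * (e^{(a_li - a_jl) T} - 1) / (a_li - a_jl)
closed = a_jl * a_li * np.exp(-a_li * T) * (np.exp((a_li - a_jl) * T) - 1.0) / (a_li - a_jl)
print(numeric, closed)   # the two values should agree
```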

Result

The experiment runs 10-fold cross validation. All Q&A threads (including questions both answered and not answered) are randomly split into 10 folds. Each round selects 1 fold of threads as the testing data, and the remaining as the training data. Each method trains the regression model or learns the link weights from the training data, and tests on the testing data. The mean square error (MSE) is the difference between the prediction and the ground truth on all testing threads. Table 5.6 shows the mean MSE over all 10 rounds on "mechanics" and "security", respectively. RF and LM represent the cases of using RandomForest and LinearRegression. "Ans" means only including answers in a thread, such that the thread size is the total number of answerers in the thread; while "AnsComt" includes both answers and comments in a thread, so that the thread size is the total number of actors (answerers and commenters) in the thread. The regression models, NetRate and TKTM are trained separately for "Ans" and "AnsComt". In the "AnsComt" case, NetRate and TKTM include more edges, which come from commenting actions.

Dataset        RF     LM     NetRate  TKTM
Ans Mech       1.59   1.58   6.33     1.29
AnsComt Mech   4.37   4.39   21.30    3.86
Ans Secu       2.70   2.80   12.30    2.58
AnsComt Secu   13.00  13.27  44.55    12.90

Table 5.6: MSE of estimation on Q&A thread size
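For reference, the cross-validation protocol above can be sketched as follows for the random-forest baseline with scikit-learn; the feature matrix and thread sizes below are random stand-ins for the real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((500, 8))          # q_reput, #questions, title/body word counts, ...
y = rng.poisson(2, size=500)      # thread sizes (toy stand-in for the real data)

errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
print(np.mean(errors))            # mean MSE over the 10 folds
```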

As shown in Table 5.6, TKTM outperforms the other methods in all cases. The two regression methods perform similarly. NetRate does not work well in this situation, since it only takes into account whether an actor has posts in a thread, without leveraging the existence of explicit answering/commenting actions. In addition, NetRate's link function depends only on time, which ignores the variety of questions/answers published. For example, for the same questioner, an answerer's behavior could differ depending on the question published.

Fig.5.8 shows the histogram of thread size (answers only) in the actual ground truth data (GT), TKTM and Random Forest (RF) 14 in dataset "security".

14: The histogram when using linear regression is similar to Random Forest.

[Figure: histograms of thread size (x-axis: threadSize, y-axis: Frequency) for the three panels below]

(a) GT (b) TKTM (c) RF

Figure 5.8: Thread Size Distribution

RF and LM are biased toward the majority of small threads, such that the estimated thread size is close to the mean value. TKTM shows the same heavy-tail pattern as the ground truth. Note that the GT histogram is not exactly a power law. This is because in the dataset some of the questions have not been answered yet, i.e., their thread size is 0, while most questions receive at least one answer.

5.6.2 Task2: Thread Size Prediction over Time

The second prediction task is to estimate the thread size over time. Table 5.6 shows the MSE of thread size prediction until the end of all threads. Compared to the regression models, TKTM has the advantage that the probability is a function of time, which can show how the thread size changes over time. Given a time T, the probability that a node will join the thread by time T can be derived as discussed in Sec.5.6.1. Different from the regression method in Sec.5.6.1, which is based on question and questioner features only, to estimate the thread size over time with a regression method the temporal factor is also included. The regression model is thus trained with the features: q reput, #questions, title total, title nostop, title unique, body total, body nostop, body unique, time, and the response (thread size).

Also, different from Sec.5.6.1 where the threads are randomly split, in this test all Q&A threads are sorted in ascending order of when the question was raised. All three methods (RF, LM, TKTM) use the first 90% of threads to train the model, and the remaining 10% of threads for testing. Only answers are included in this experiment, i.e., the thread size is the number of answerers. The MSE at each time point is the difference between the estimated thread size and the actual thread size.

Fig.5.9 shows the MSE of the three methods over time. The x-axis is the normalized time, and "10" is the maximum time of all threads in the dataset.

[Figure: MSE curves over time for TKTM, RF and LM on the two datasets]

(a) mechanics (b) security

Figure 5.9: MSE of prediction over time (answers only)

In both datasets, TKTM outperforms RF and LM. TKTM gives a better estimation from the beginning to the end of the threads. One interesting observation in "mechanics" is that there is a turning point at time "4". This is because most threads received their accepted answer before time "4", such that the actual thread size increases until time "4" and stays constant after that. Though the regression method is also able to reflect the thread size change by including the time feature, TKTM's exponential function models the thread size change more smoothly. The MSE at time "10" in Fig.5.9 is also better than the overall MSE in Table 5.6. This is because all threads are sorted in time order, such that the TKTM network trained from the training threads is able to provide more guidance in predicting future testing threads.

5.6.3 Task3: Individual Action Prediction

The third task is to predict who will provide answers to a question. Since the thread size is the number of actors joining the thread, it is necessary to know how these methods perform on predicting whether each actor will join a Q&A thread. This question is the same as asking how to route a question to answerers. QLI [56] is utilized as the baseline method. QLI calculates the probability that an actor will answer a question according to the similarity of the question against all questions answered by the actor. For each questioner and its question, TKTM is applied to estimate the probability that any other actor except the questioner will join the thread. These actors are ranked by their probability of joining the thread, from the highest to the lowest. The actual actor in each thread is then identified with a rank among all actors. The higher the rank of the actual actor, the better the prediction, i.e., the closer the prediction is to the ground truth. The final result is evaluated with the Mean Reciprocal Rank (MRR) [56], which is defined as the mean of the reciprocal of the rank.
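A tiny sketch of the MRR computation just described; the ranked lists and ground-truth answerers below are toy values.

```python
def mrr(ranked_lists, truths):
    """ranked_lists: per thread, actors sorted by predicted probability (descending);
    truths: per thread, the actor who actually answered."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, truths):
        rank = ranked.index(truth) + 1        # 1-based rank of the true answerer
        total += 1.0 / rank
    return total / len(truths)

print(mrr([["u3", "u1", "u2"], ["u2", "u3", "u1"]], ["u1", "u2"]))  # (1/2 + 1)/2 = 0.75
```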

Same as in Sec.5.6.2, we use the first 90% of threads as training data, and the remaining 10% for testing. Table 5.7 shows the prediction results of TKTM and QLI on "mechanics" and "security", respectively. TKTM performs much better than the text mining method QLI.

Dataset     QLI    TKTM
Mechanics   0.013  0.232
Security    0.001  0.036

Table 5.7: Estimation on Answerer in MRR

In principle, QLI and TKTM are based on the same assumption: if an actor has specific knowledge in a field, she is more likely to answer the relevant questions. The difference is that QLI takes a text mining approach, assuming that a new question has some similarity with previous questions, while TKTM focuses on mining the likelihood that an actor transfers knowledge to another. The result implies that in these two datasets, the description of a question or the selection of words has more uncertainty, while the interests and knowledge of a person build up gradually, so that an actor's question is more likely to be answered by the same actor who answered his questions before. Comparing the performance on the two datasets, both QLI and TKTM perform worse on "security". This is because the "security" dataset has more users, making prediction more difficult. On the other hand, it also implies that the "security" dataset is more volatile, in that both the words and the actors in the community change relatively rapidly.

5.6.4 Task4: Who will Provide the Accepted Answer

The fourth task is to predict who will provide the accepted answer. As mentioned in Sec.5.3.2, question-solving is a process of accumulating answers until the final proper one is received. TKTM can estimate the probability of an actor joining a thread by a time T; conversely, it can also be used to derive when an actor will have joined the thread with a probability P. From Eq.(5.6.3),

$$T = -\frac{1}{\alpha_{qi}}\log\Big(1-\frac{P_{qi}}{1-e^{-\beta_{qi}k_q^c}}\Big) \tag{5.6.4}$$

where $k_q^c$ is the knowledge level of the question raised by q in thread c, and $P_{qi}$ is the probability that i will provide an answer to questioner q.
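A small sketch of Eq.(5.6.4): note that $P_{qi}(T)$ saturates at $1-e^{-\beta_{qi}k_q^c}$, so probabilities above that cap are unreachable (returned as infinity below). The parameter values are illustrative.

```python
import numpy as np

def time_to_answer(alpha_qi, beta_qi, k_q, prob):
    """Smallest T with P_qi(T) = prob; inf if prob exceeds the saturation cap."""
    cap = 1.0 - np.exp(-beta_qi * k_q)   # limit of P_qi(T) as T -> infinity
    if prob >= cap:
        return np.inf
    return -np.log(1.0 - prob / cap) / alpha_qi

print(time_to_answer(alpha_qi=0.8, beta_qi=1.5, k_q=0.9, prob=0.5))
```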

As Fig.5.5 has shown, whether an answer will be accepted mostly depends on when the answer is published, how many answers already exist, and the average score of the existing answers. Since the answer score is unknown during prediction, we only use the first two features to build the classifier, i.e., when an answer is published (t) and the number of existing answers (e).

Table 5.8 shows the algorithm for finding the rank of the probability that each actor will provide the accepted answer, where A is the actor set; Pr is the predefined probability used to derive the time t; and g is the ground-truth actor who provided the accepted answer to question q. The higher the rank of $p_g$, the better the performance. In this case, Pr is set to 0.5, which is to estimate when an actor will have provided an answer to the question with over a 50% chance.
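A Python rendering of the procedure of Table 5.8 below may make it concrete; here `estimate_time` is a hypothetical stand-in for Est(q, a_i, Pr) of Eq.(5.6.4), and `classifier` for the trained acceptance classifier.

```python
def rank_accepted(actors, estimate_time, classifier, prob=0.5):
    """Return {actor: probability its answer would be accepted}."""
    times = {a: estimate_time(a, prob) for a in actors}    # when each actor would answer
    scores = {}
    existing = 0
    for actor in sorted(actors, key=lambda a: times[a]):   # visit in answering order
        scores[actor] = classifier(times[actor], existing)
        existing += 1                                      # one more answer now exists
    return scores

# toy stand-ins for the learned components
est = lambda a, p: {"u1": 0.2, "u2": 0.5, "u3": 0.9}[a]
clf = lambda t, e: max(0.0, 1.0 - 0.5 * t - 0.1 * e)       # earlier + fewer answers => better
print(rank_accepted(["u1", "u2", "u3"], est, clf))
```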

We test the two datasets using the algorithm shown in Table 5.8, which is built upon TKTM.

Algorithm: Who will provide the accepted answer

Init:
    Train Classifier(T, E) for which answer will be accepted, based on when
    the answer is posted (t) and the number of existing answers (e).

Func Rank(q, Pr, A):
    for each actor a_i ∈ A:
        t_i = Est(q, a_i, Pr) according to Eq.(5.6.4)
        p_i = 0
    e = 0
    for each t_i in ascending order:
        p_i = Classifier(t_i, e)
        e = e + 1
    return probability vector P of p_i

Main:
    P = Rank(q, Pr, A)
    find rank of p_g in P

Table 5.8: Algorithm of estimating who will provide the accepted answer

The resulting MRR are 0.165 and 0.035 for "mechanics" and "security", respectively. QLI is applied as the baseline, where the rank of a user is based on the similarity between the newly posted question and the questions for which the user has previously provided the accepted answer. The result is shown in Table 5.9.

Dataset     QLI     TKTM
Mechanics   0.0013  0.165
Security    0.0003  0.035

Table 5.9: Estimation on Answerer in MRR

Compared to the results shown in Sec.5.6.3, TKTM achieves decent performance in estimating who will provide the accepted answer, which is related to the question of predicting who will provide an answer, since the latter provides a set of candidates and the selected one is one of these answerers.

5.6.5 TKTM & Semantic Analysis

The TKTM method extracts the knowledge transfer probability. Though TKTM uses the knowledge level, defined as the product of a user's reputation and the number of unique words, to evaluate whether a user's post will be attractive, it does not cover the relationship between users who have never had contact before but may transfer knowledge in the future based on their shared interests. This is also the intuition behind methods such as QLI.

Thus, in this section, we build a network based on the language similarity of any two users. We treat the language of a user as a document which includes all posts and comments she/he published, excluding stop words. The language pattern of this user is represented by TF.IDF (Term Frequency times Inverse Document Frequency) [9]. Given N documents, $f_{ij}$ is the frequency of term i in document j,

$$TF_{ij} = \frac{f_{ij}}{\max_k f_{kj}} \tag{5.6.5}$$

where $\max_k f_{kj}$ is the maximal number of occurrences of any term in document j.

$$IDF_i = \log\frac{N}{n_i} \tag{5.6.6}$$

where $n_i$ is the number of documents in which term i appears. Therefore, $TF_{ij}$ evaluates the relative frequency with which term i appears in document j, and $IDF_i$ represents whether term i is frequent across all documents: if a term is very common in all documents, its IDF will be small and the term is not able to represent the language pattern of a document.

The final TF.IDF is the product of TF and IDF

$$TF.IDF_{ij} = TF_{ij}\,IDF_i \tag{5.6.7}$$

Therefore, the similarity of two users' language is reflected by the similarity of the TF.IDF of the terms appearing in their languages (documents). Here, we use cosine similarity (other similarity metrics should also work). Given the TF.IDF vectors X and Y of two documents, the cosine similarity is:

$$\cos = \frac{X\cdot Y}{\|X\|\,\|Y\|} \tag{5.6.8}$$
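The pipeline of Eqs.(5.6.5)–(5.6.8) can be sketched compactly as follows; the three toy "documents" stand in for users' concatenated posts with stop words already removed.

```python
import math
from collections import Counter

docs = {"u1": "python regression forest model".split(),
        "u2": "python security exploit model".split(),
        "u3": "engine brake clutch".split()}

N = len(docs)
df = Counter(t for words in docs.values() for t in set(words))   # n_i per term

def tfidf(words):
    tf = Counter(words)
    peak = max(tf.values())                                      # max_k f_kj
    return {t: (c / peak) * math.log(N / df[t]) for t, c in tf.items()}

def cosine(x, y):
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    nx_ = math.sqrt(sum(v * v for v in x.values()))
    ny_ = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx_ * ny_) if nx_ and ny_ else 0.0

v = {u: tfidf(w) for u, w in docs.items()}
print(cosine(v["u1"], v["u2"]), cosine(v["u1"], v["u3"]))        # shared terms vs. none
```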

In this way, we obtain a network based on language similarity, and it is independent of TKTM. To combine TKTM and language similarity, we define the new metric:

$$R = Pr_{TKTM}(ji) + \gamma\cos_{ji} \tag{5.6.9}$$

where $Pr_{TKTM}(ji)$ is the probability that user i will transfer knowledge to j, and $\cos_{ji}$ is the language similarity between them 15. The reason we do not use a product to combine $Pr_{TKTM}(ji)$ and $\cos_{ji}$ is that $Pr_{TKTM}(ji)$ could be zero, in which case the product would cancel the effect of $\cos_{ji}$. $\cos_{ji}$ is also a value between 0 and 1, and γ is the coefficient controlling how much the prediction depends on the language similarity 16.

γ               0       0.0001  0.001   0.01    0.1     0.5     1.0
MRR(Mechanics)  0.2324  0.2652  0.2653  0.2685  0.2668  0.2379  0.1790
MRR(Security)   0.0360  0.0434  0.0429  0.0429  0.0391  0.0239  0.0240

Table 5.10: Prediction on Answerer in MRR with different γ

[Figure: log-log plot of MRR versus γ for "mechanics" and "security"]

Figure 5.10: Prediction on Answerer in MRR with different γ

Table 5.10 and Fig.5.10 show how the MRR changes with γ when predicting who will provide the answer. The optimal results occur when γ is about 0.01 and 0.0001 for the datasets "mechanics" and "security", respectively. When γ is 0, the prediction is based on TKTM only. This shows that including semantic analysis does help improve the performance over TKTM alone, depending on the level of emphasis placed on the semantics.

15: $\cos_{ji} = \cos_{ij}$. 16: Logically, γ can be any real number. In this study, we only investigate the case when γ is between 0 and 1.

5.7 Summary

The Q&A forum attracts many studies, as it has become one of the most important resources for knowledge inquiry. Most existing studies have focused on predictive analyses such as what the thread size will be and who will provide the answer. However, few studies have analyzed how knowledge is transferred among user accounts. In this chapter, we model the process of knowledge transfer in a Q&A forum, StackOverflow, by extending the concept of information diffusion. Although the process of knowledge transfer is similar to conventional information diffusion, it has its own properties such as user reputation and thread life time. This study analyzes the factors affecting knowledge transfer, and it turns out that the knowledge level of the question and questioner, together with the temporal factors, are critical to predicting whether a question will receive answers and whether an answer will be accepted. Taking these factors into account, we propose TKTM, where the knowledge transfer likelihood, or link weight, is learned from historical data and described as a continuous function of time and knowledge level. The experimental results show that TKTM outperforms the alternatives both in predicting the thread size over time and in predicting individual actions, including who will answer the question. Compared to traditional regression approaches and QLI, TKTM not only achieves better performance in predictive analysis, but also provides a unique approach that offers insights into the Q&A community. Since TKTM and semantic analysis are independent approaches, this work also investigates the possibility of integrating the two methods to achieve better performance in predicting potential answerers.

Chapter 6

Conclusion

The whole study works on the problem of how information propagates and, particularly, how knowledge is transferred. Chapter 2 analyzes the problem of how to detect information flow, and chapter 3 discusses how an information cascade forms and how factors such as temporal and spatial factors affect a cascade's growth. These observations help to model the knowledge transfer process. Indeed, the study of modeling the process of knowledge transfer among user accounts is built upon and motivated by the works on information cascade, where an information cascade is the trace along which information propagates from the originator to others. The process of knowledge transfer is similar to an information cascade in that both processes transfer particular information from those who hold it to those who do not. Chapter 4 analyzes how knowledge is transferred in the sense of adopting research topics among scholars in the DBLP network, where the likelihood of knowledge transfer between two user accounts is modeled as a function of social network spatial features. The experiment shows that prediction performance can be improved by introducing spatial social connection features. However, the improvement is marginal, because having a publication is a complex process that is not affected only by whether the knowledge has been transferred to and adopted by an author. Thus, in chapter 5, we turned to the Q&A forum, in particular the StackOverflow dataset, where the process of knowledge transfer among user accounts is mostly represented by the answering and commenting actions. The study discussed the properties of StackOverflow from three aspects: knowledge orientation, knowledge transfer over threads, and the life time of threads. To understand the life time of a thread, the study investigated the problems of when a question will be solved and which answer will be selected as the accepted one. As learned from the work in chapters 3 and 4, the process of knowledge transfer is affected by the reputation or knowledge level and by the temporal factor. Taking these factors into consideration, we proposed TKTM, where the likelihood that a person will transfer knowledge to another is modeled as a continuous function of time and knowledge level. To evaluate the performance, experiments were performed by applying TKTM to solve two problems: who will answer the question, and what the thread size is; the results were compared to NetRate and regression methods such as RandomForest, linear regression and logistic regression. In both experiments, TKTM beats all other methods. The value of TKTM is not only to obtain better results in predicting whether a user will answer a question, but to provide a platform for modeling the inherent process of how knowledge is transferred among user accounts. Also, TKTM is a function of time, so it can easily be connected to time-related problems, such as whether a user will answer a question by a given time, or extended to the problem of whether a question can be solved by a given time.

Appendix

Appendix 1

Datasets

1.1 DBLP

The Digital Bibliography and Library Project (DBLP) is a collection of bibliographic information on major computer science journals and proceedings, which can be used to build a heterogeneous information network with multi-typed objects along with rich text data. The dataset is contributed by Tang et al. [67] [68] [69] [70] [71] and contains 1,572,277 papers and 2,084,019 citation relationships as of 2011-01-08. Four databases are created to store the information of author, paper, citation and term, respectively. The structure of these four tables is listed as follows:

Table 1.1: DBLP:Author

Field      Type          Null  Key  Default  Extra
id         int(11)       No    PRI  NULL     auto_increment
author     varchar(100)  Yes   MUL  NULL
paper_id   int(11)       Yes   MUL  NULL

Table 1.2: DBLP:Paper

Field      Type          Null  Key  Default  Extra
paper_id   int(11)       No    PRI  NULL
conf       varchar(200)  Yes   MUL  NULL
citation   int(11)       Yes   MUL  NULL
year       int(11)       Yes   MUL  NULL
title      text          Yes        NULL
abstract   text          Yes        NULL

Table 1.3: DBLP:Citation

Field    Type     Null  Key  Default  Extra
id       int(11)  No    PRI  NULL     auto_increment
orig_id  int(11)  Yes   MUL  NULL
ref_id   int(11)  Yes   MUL  NULL

Table 1.4: DBLP:Term

Field       Type          Null  Key  Default  Extra
id          int(11)       No    PRI  NULL     auto_increment
term        varchar(100)  Yes   MUL  NULL
frequence   int(11)       Yes        NULL
words_num   int(11)       Yes        NULL
first_year  int(11)       Yes        NULL

1.2 Digg Network

This data set is contributed by Lerman et al. [64] and covers stories promoted to Digg's front page over a period of a month in 2009. It contains 3,018,197 votes on 3553 popular stories made by 139,409 distinct users, and 1,731,658 friendship links of 71,367 distinct users. For each story, Lerman et al. collected the list of all Digg users who voted for the story up to the time of data collection, and the time stamp of each vote. The voters' friendship links were also retrieved. The semantics of the friendship links are as follows: (user_id −→ friend_id) means that user_id is a fan of friend_id. User ids have been anonymized, but are unique in the data set: a user with a specific id in the friendship links table and a user with the same id in the votes table correspond to the same actual user. Three databases are created for users, friendships and votes.

Table 1.5: Digg:Users

Field        Type     Null  Key  Default  Extra
uid          int(11)  No    PRI  NULL     auto_increment
user_id      int(11)  Yes   MUL  NULL
friends_num  int(11)  Yes   MUL  NULL
cares_num    int(11)  Yes   MUL  NULL

Table 1.6: Digg:Friendships

Field          Type     Null  Key  Default  Extra
rid            int(11)  No    PRI  NULL     auto_increment
mutual         int(11)  Yes        NULL
create_time    int(11)  Yes        NULL
user_id        int(11)  Yes   MUL  NULL
friend_id      int(11)  Yes   MUL  NULL
prob_same_act  float    Yes        NULL
prob_netinf    float    Yes        NULL

Table 1.7: Digg:Votes

Field      Type     Null  Key  Default  Extra
vid        int(11)  No    PRI  NULL     auto_increment
vote_time  int(11)  Yes   MUL  NULL
voter_id   int(11)  Yes   MUL  NULL
story_id   int(11)  Yes   MUL  NULL

1.3 Twitter Network

This dataset is contributed by Li et al. [72]. It is a subset of Twitter, containing 284 million following relationships, 3 million user profiles and 50 million tweets. The dataset was collected in May 2011. Three databases are created for tweets, users and friendships.

Table 1.8: Twitter:Users

Field           Type         Null  Key  Default  Extra
rid             bigint(20)   No    PRI  NULL
name            varchar(45)  Yes   MUL  NULL
friend_count    bigint(20)   Yes   MUL  NULL
follower_count  bigint(20)   Yes   MUL  NULL
status_count    bigint(20)   Yes   MUL  NULL
favorite_count  bigint(20)   Yes   MUL  NULL
account_age     datetime     Yes   MUL  NULL
location        varchar(45)  Yes   MUL  NULL

Table 1.9: Twitter:Friendships

Field          Type        Null  Key  Default  Extra
rid            bigint(20)  No    PRI  NULL     auto_increment
user_id        bigint(20)  Yes   MUL  NULL
friend_id      bigint(20)  Yes   MUL  NULL
prob_same_act  float       Yes        NULL
prob_netinf    float       Yes        NULL

Table 1.10: Twitter:Tweets

Field     Type          Null  Key  Default  Extra
id        bigint(20)    No    PRI  NULL
type      varchar(45)   Yes        NULL
origin    text          Yes        NULL
text      text          Yes        NULL
url       varchar(200)  Yes        NULL
time      datetime      Yes   MUL  NULL
retcount  bigint(20)    Yes   MUL  NULL
favorite  varchar(45)   Yes        NULL
hashtag   varchar(200)  Yes   MUL  NULL

1.4 StackOverflow

Stack Overflow provides its data to the public, covering 277 different topics in programming (http://blog.stackoverflow.com/tags/cc-wiki-dump/). Each topic's raw dataset contains xml files of all posts (either a question or an answer), comments, and relevant users. The attributes of these files are:

• Posts: type, AcceptedAnswerId, CreationDate, Score, Body, OwnerUserId, LastEditorUserId, LastEditDate, Title, Tags, AnswerCount, CommentCount, FavoriteCount

• Comment: Score, Text, CreationDate, UserId

• User: Reputation, DisplayName, WebsiteUrl, Location, Views, UpVotes, DownVotes, Age, AccountId

In this study, we picked two datasets: “mechanics” and “security” 17.

17: The processing code is very general and can be directly applied to any dataset given a different directory name.

Dataset    #Users  #Threads  #Posts  #Comments
mechanics  11787   5548      19369   29114
security   69856   7730      83300   137721

Table 1.11: StackOverflow datasets

Appendix 2

Network Analysis

2.1 Terminologies

Table 2.1 shows the terminology used in the following sections.

Name                     Definition
activated                a node adopted the information
information propagation  the process of information flowing from activated nodes to others
infected                 activated
cascade                  a number of nodes activated in order, starting from an initiator; these nodes form a DAG
cascade size             the total number of nodes in the cascade
cascades size            the size of multiple cascades
seed                     the initiator of a cascade
initial set              multiple seeds involved in the same information propagation
G(V,E)                   graph
V                        node set of a graph
E                        edge set of a graph
e_{u,v}                  directional edge from v to u; information flows from u to v
σ(·)                     influence function
σ(A)                     total number of nodes activated if A is the initial set
c(A)                     cost of selecting initial set A
R(·)                     reward function
ROC                      receiver operating characteristic
AUC                      area under the ROC curve
MAP                      Mean Average Precision
P_{u,v}                  probability that node u infects v
W_{u,v}                  the weight of edge e_{u,v}
child node               a node which is watching its infected parents' behaviors
parent node              the node being watched by its child node, which can infect its child node
K(v)                     the parent set of child node v
N(u)                     the children set of parent node u

Table 2.1: Terminology List

2.2 Random Networks

Random networks are created to model real networks. In general, there are two approaches to modeling a random network.

2.2.1 Poisson Random Network

Define a network graph G(V,E) which contains |V| = n nodes and |E| = m edges. Two classic models are G(n,m) and G(n,p). G(n,m) randomly assigns m edges among the $\binom{n}{2}$ node pairs; G(n,p) includes each of the $\binom{n}{2}$ possible edges independently with the same probability p. If every m-edge network has the same probability, G(n,m) can be represented by G(n,p), where

$$P(G) = p^m (1-p)^{\binom{n}{2}-m} \tag{2.2.1}$$

$$P(m) = \binom{\binom{n}{2}}{m} p^m (1-p)^{\binom{n}{2}-m} \tag{2.2.2}$$

P(G) is the probability of obtaining a specific m-edge graph; P(m) is the probability of obtaining a graph with m edges. The mean number of edges ⟨m⟩ and the mean degree ⟨k⟩ are derived as:

$$\langle m\rangle = \sum_{m=0}^{\binom{n}{2}} m\,P(m) = \binom{n}{2}p \tag{2.2.3}$$

$$\langle k\rangle = \sum_{m=0}^{\binom{n}{2}} \frac{2m}{n}\,P(m) = (n-1)p \tag{2.2.4}$$

Let c = (n − 1)p be the average degree.

The degree distribution is:

$$P_k = \binom{n-1}{k} p^k (1-p)^{n-1-k} \approx e^{-c}\frac{c^k}{k!} \tag{2.2.5}$$

Because the degree probability follows a Poisson distribution, this random network is also called a Poisson random network.

Clustering is the probability that any two neighbors are connected. In G(n, p), the clustering is the same as p.

2.2.2 General Degree Distribution Random Network

Though the Poisson random network shows many characteristics of real networks, it lacks some of their properties, such as transitivity/clustering, community structure, and correlation between connected vertices. The degree distribution is also different. To fill the gap, the general degree distribution random network model is proposed, where a random network is generated given a specific degree distribution. A generating function is used for expressing the degree distribution.

$$g(z) = p_0 + p_1 z + p_2 z^2 + \cdots = \sum_{k=0}^{\infty} p_k z^k \tag{2.2.6}$$

For a power-law degree distribution, $p_k = Ck^{-\alpha}$ with $\alpha>0$ and $k\ge 1$, so $g(1)=\sum_{k=0}^{\infty}p_k=1$, $g'(z)=\sum_{k=0}^{\infty}kp_kz^{k-1}$, and $g'(1)=\langle k\rangle$. The power property of the generating function is: for a degree distribution $p_k$ with generating function $g(z)$, the sum of m independent variables $(k_1,\dots,k_m)$ drawn from this distribution has generating function $h(z)=[g(z)]^m$.

2.3 Giant Component

In a random network, the existence of a giant component can be proved, and the size distribution of the small components can also be derived. The giant component is the component whose size is proportional to n. When p = 0, there is no giant component and the largest component size is 1; when p = 1, the network is fully connected and the giant component size is n. Thus there is an intermediate p which separates these two phases. Define u as the probability that a vertex does not belong to the giant component (GC). Then, for a node i and any other node j, either i has no link to j, or i connects to j but j does not belong to the GC.

$$P(i\notin GC \text{ through } j) = (1-p) + up$$

$$\therefore\; u = (1-p+up)^{(n-1)} \;\Rightarrow\; u \approx e^{-c(1-u)} \tag{2.3.1}$$

Let S = 1 − u be the probability that a vertex belongs to the GC; then $S = 1-e^{-cS}$. It is hard to obtain a closed-form solution, but it can be solved graphically. As shown in Fig. 2.1, this is equivalent to finding the intersection point of $y=u$ and $y=e^{-c(1-u)}$, given different values of c (the average degree).

[Figure: the line y = S against the curves $1-e^{-cS}$ for c = 0.5, 1, 1.5]

Figure 2.1: Poisson Random Network Giant Component Size

When c > 1, there is a non-zero solution for S, i.e., a giant component exists.
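As a numerical alternative to the graphical solution, one can simply iterate the fixed-point equation; a minimal sketch:

```python
import numpy as np

def giant_component_size(c, iters=1000):
    # iterate S = 1 - exp(-c*S) to a fixed point (convergence assumed)
    S = 0.5                          # any start in (0, 1]
    for _ in range(iters):
        S = 1.0 - np.exp(-c * S)
    return S

for c in (0.5, 1.5, 3.0):
    print(c, giant_component_size(c))   # ~0 below c = 1, positive above
```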

It can also be shown that only a single giant component exists. Assume there are two giant components, that the probabilities of a node belonging to them are $S_1$ and $S_2$, and that their sizes are $S_1n$ and $S_2n$, respectively. Then there are $S_1S_2n^2$ pairs of nodes spanning the two components, and the probability that all such pairs are unconnected is:

$$
\begin{aligned}
P &= (1-p)^{S_1S_2n^2} = \Big(1-\frac{c}{n-1}\Big)^{S_1S_2n^2}\\
\ln P &= S_1S_2\lim_{n\to\infty} n^2\ln\Big(1-\frac{c}{n-1}\Big)\\
&= S_1S_2\Big(n^2\Big(-\frac{c}{n-1}+\frac{1}{2}\Big(\frac{c}{n-1}\Big)^2\Big)\Big)\\
&= S_1S_2\Big(-nc+\frac{c^2}{2}\Big)\\
P &= e^{\frac{S_1S_2c^2}{2}}\,e^{-S_1S_2nc}
\end{aligned}
\tag{2.3.2}
$$

Thus, when $n\to\infty$, $P\to 0$.

106 2.4 K-core size Estimation

2.4.1 Methodology

In a random network, the probability that a node has degree m is $p_m$. Denote by S the probability that a node belongs to the k-core (GPC); then S equals the probability that the node has at least k neighbors which also belong to the GPC:

$$S = \sum_{m=k}^{\infty} p_m \sum_{j=k}^{m} \binom{m}{j} S^j (1-S)^{m-j} \tag{2.4.1}$$

However, it is not easy to solve the above equation. Another approach is to look at the nodes which do not belong to the k-core. Randomly selecting an edge and following it to one end, the probability that the end vertex has k edges other than the followed one is called the excess degree distribution [73]:

$$q_k = \frac{(k+1)p_{k+1}}{\langle k\rangle} \tag{2.4.2}$$

where $\langle k\rangle = \sum_k k p_k$ is the mean degree of the network. The probability generating functions of $p_k$ and $q_k$ are:

$$G_0(z) = \sum_{k=0}^{\infty} p_k z^k \tag{2.4.3}$$

$$G_1(z) = \sum_{k=0}^{\infty} q_k z^k \tag{2.4.4}$$

Let u be the probability that following an edge leads to a vertex which does not belong to the GPC. This equals the probability that none of the other edges pointing to this vertex connects it to the GPC. Averaging over the excess degree distribution:

$$u = \sum_{k=0}^{\infty} q_k u^k = G_1(u) \tag{2.4.5}$$

and

$$G_1(u) = \frac{1}{\langle k\rangle}\sum_{k=0}^{\infty}(k+1)p_{k+1}u^k = \frac{1}{\langle k\rangle}\sum_{k=0}^{\infty}k p_k u^{k-1} = \frac{G_0^{(1)}(u)}{\langle k\rangle} \tag{2.4.6}$$

where $\langle k\rangle = G_0^{(1)}(1)$. Such that

$$u = G_1(u) = \frac{G_0^{(1)}(u)}{G_0^{(1)}(1)} = \frac{\sum_{k=1}^{\infty}k p_k u^{k-1}}{\sum_{k=1}^{\infty}k p_k} \tag{2.4.7}$$

Note that $G_1(0) = \frac{p_1}{\sum_{k=1}^{\infty}k p_k}$. So u can be derived from (2.4.7).

On the other hand, the probability that a vertex does not belong to the GPC is the probability that it has fewer than k edges connecting to vertices belonging to the GPC 18:

$$
\begin{aligned}
&\sum_m p_m u^m + \sum_m p_m\binom{m}{1}(1-u)u^{m-1}+\cdots+\sum_m p_m\binom{m}{k-1}(1-u)^{k-1}u^{m-k+1}\\
&= G_0(u) + (1-u)G_0^{(1)}(u)+\cdots+\frac{(1-u)^{k-1}}{(k-1)!}G_0^{(k-1)}(u)\\
&= \sum_{j=0}^{k-1}\frac{(1-u)^j}{j!}G_0^{(j)}(u)
\end{aligned}
\tag{2.4.8}
$$

So the expected size of the GPC can be calculated by plugging in the u obtained from (2.4.7):

$$S = 1 - \sum_{j=0}^{k-1}\frac{(1-u)^j}{j!}G_0^{(j)}(u) \tag{2.4.9}$$

In practice, the exact k-core can be found by repeatedly removing nodes with degree less than k.

18: A similar approach is used in [74] for estimating bicomponents.
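A numerical sketch of Eqs.(2.4.7) and (2.4.9) for an arbitrary degree distribution follows; the infinite series are truncated at a finite kmax, so the resulting values are approximations that depend on the truncation, and convergence of the fixed-point iteration is assumed.

```python
import math
import numpy as np

def gpc_size(pk, k_core, iters=2000):
    """pk[d] = probability of degree d; returns (u, S) of Eqs.(2.4.7)/(2.4.9)."""
    d = np.arange(len(pk))
    mean_k = float(np.sum(d * pk))
    u = 0.5
    for _ in range(iters):                      # fixed point of Eq.(2.4.7)
        u = float(np.sum(d[1:] * pk[1:] * u ** (d[1:] - 1)) / mean_k)
    S = 1.0
    for j in range(k_core):                     # Eq.(2.4.9), term by term
        falling = np.ones_like(d, dtype=float)  # d*(d-1)*...*(d-j+1)
        for r in range(j):
            falling = falling * np.maximum(d - r, 0)
        g0j = float(np.sum(falling * pk * u ** np.maximum(d - j, 0)))
        S -= (1.0 - u) ** j / math.factorial(j) * g0j
    return u, S

# power-law distribution of Eq.(2.4.10) with alpha = 3, truncated at kmax
kmax, alpha = 200, 3.0
pk = np.array([0.0] + [k ** -alpha for k in range(1, kmax + 1)])
pk /= pk.sum()                                  # normalize (finite-kmax zeta)
print(gpc_size(pk, k_core=3))
```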

108 2.4.2 Example

Given a random network with a power-law degree distribution where

$$p_k = \begin{cases} 0 & \text{if } k = 0\\[2pt] \dfrac{k^{-\alpha}}{\zeta(\alpha)} & \text{if } k > 0 \end{cases} \tag{2.4.10}$$

where $\zeta(\alpha)=\sum_{k=1}^{\infty}k^{-\alpha}$ and $\alpha>0$. According to Equation (2.4.7),

$$u = \frac{G_0^{(1)}(u)}{G_0^{(1)}(1)} = \frac{\sum_{k=1}^{\infty}k p_k u^{k-1}}{\sum_{k=1}^{\infty}k p_k} = \frac{\sum_{k=1}^{\infty}k\,k^{-\alpha}u^{k-1}}{\sum_{k=1}^{\infty}k\,k^{-\alpha}} \tag{2.4.11}$$

It is not easy to obtain a closed-form solution, but it can be solved graphically.

[Figure: graphical solution of Eq.(2.4.11) — the line y = u against curves for α = 1, 2, 3, 4]

Figure 2.2: Graphic solution of u

Fig. 2.2 shows the solution of u for power-law distributions with different α. There is a solution when 1.8 ≤ α ≤ 3.2, which is similar to the result of [73], which mentions that the solution of u exists when 2 < α < 3.47.

In this case, given α = 3, the approximate solution of u is about 0.85. So the GPC size is:

$$S_k = 1 - \sum_{j=0}^{k-1}\frac{(1-u)^j}{j!}G_0^{(j)}(u)\bigg|_{u=0.85} \tag{2.4.12}$$

In a 10000-node power-law network, the GPC (k-core) sizes (as fractions) for different k values are:

Table 2.2: GPC(k-core) size

     k = 1   k = 2   k = 3   k = 4   k = 5        k = 6
S_k  0.1857  0.0123  0.0031  0.0013  6.8916e-004  4.2818e-004

When k = 1, this is simply the giant component of the network.

Now, let us test on a real network. The Digg network snapshot captured in [64] has 279631 nodes; its out-degree distribution is shown in Fig. 2.3.

[Figure: log-log plot of the number of users versus out degree, with an α = 1.9 power-law fit]

Figure 2.3: Digg network out-degree distribution

The out-degree distribution of the Digg network is close to an α = 1.9 power-law degree distribution. According to the above equations, u = 0.07 and the 50-core size is about S = 0.0176. This means that in an α = 1.9 power-law network with 279631 nodes, about 0.0176 × 279631 = 4921 nodes are in the GPC. In the real Digg network, the actual 50-core includes 938 nodes. Since the real network is not exactly power law, the calculation is adjusted with the actual degree probability distribution. Plugging in the real out-degree distribution, u = 0.094, and the 50-core size is about 5593. Though there is a gap against the real network's k-core size when k is large, the estimated value is close when k is relatively small, as shown in Table 2.3. Also note, the above provides a mathematical way to estimate the GPC if the out-degree distribution is known. The exact size of the GPC can always be found by a program which keeps removing the nodes with out-degree less than k.
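The peeling procedure just mentioned takes only a few lines; the sketch below uses networkx and an undirected toy graph as a stand-in (for Digg, the peeling would use out-degree instead).

```python
import networkx as nx

def k_core_size(G, k):
    # repeatedly remove nodes whose degree falls below k until none remain
    H = G.copy()
    while True:
        low = [n for n, deg in H.degree() if deg < k]
        if not low:
            return H.number_of_nodes()
        H.remove_nodes_from(low)

G = nx.gnp_random_graph(1000, 0.01, seed=0)   # toy stand-in for the Digg graph
print(k_core_size(G, 5))                      # nx.k_core(G, 5) gives the same core
```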

Table 2.3: GPC(k-core) size

             k = 10  k = 20  k = 30  k = 40
S_k          20022   11745   8613    6795
digg k-core  16697   9358    5951    3638

Bibliography

[1] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han, “Co-author relationship prediction in heterogeneous bibliographic networks,” in Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’11, (Washington, DC, USA), pp. 121–128, IEEE Computer Society, 2011.

[2] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, “Integrating meta-path selection with user-guided object clustering in heterogeneous information networks,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’12, (New York, NY, USA), pp. 1348–1356, ACM, 2012.

[3] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, “Rankclus: integrating clustering with ranking for heterogeneous information network analysis,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09, (New York, NY, USA), pp. 565–576, ACM, 2009.

[4] M. Ji, J. Han, and M. Danilevsky, “Ranking-based classification of heterogeneous information networks,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’11, (New York, NY, USA), pp. 1298–1306, ACM, 2011.

[5] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “Pathsim: Meta path-based top-k similarity search in heterogeneous information networks,” Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.

[6] L. Backstrom and J. Leskovec, “Supervised random walks: predicting and recommending links in social networks,” in Proceedings of the fourth ACM international conference on Web search and data mining, WSDM ’11, (New York, NY, USA), pp. 635–644, ACM, 2011.

[7] M. E. Newman, “Clustering and preferential attachment in growing networks,” Physical review E, vol. 64, no. 2, p. 025102, 2001.

[8] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social networks, vol. 25, no. 3, pp. 211–230, 2003.

[9] G. Chowdhury, Introduction to modern information retrieval. Facet publishing, 2010.

[10] A.-L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, “Evolution of the social network of scientific collaborations,” Physica A: Statistical mechanics and its applications, vol. 311, no. 3, pp. 590–614, 2002.

[11] Y. Yang, N. Chawla, Y. Sun, and J. Han, “Predicting links in multi-relational and heterogeneous networks,” in 2012 IEEE 12th International Conference on Data Mining, pp. 755–764, IEEE, 2012.

[12] A. Goyal, F. Bonchi, and L. V. Lakshmanan, “Learning influence probabilities in social networks,” in Proceedings of the third ACM international conference on Web search and data mining, WSDM ’10, (New York, NY, USA), pp. 241–250, ACM, 2010.

[13] Y. Sun, J. Han, C. C. Aggarwal, and N. V. Chawla, “When will it happen?: relationship prediction in heterogeneous information networks,” in Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, (New York, NY, USA), pp. 663–672, ACM, 2012.

[14] J. Leskovec, M. McGlohon, C. Faloutsos, N. S. Glance, and M. Hurst, “Patterns of cascading behavior in large blog graphs,” in SDM, vol. 7, pp. 551–556, SIAM, 2007.

[15] J. Leskovec, A. Singh, and J. Kleinberg, “Patterns of influence in a recommendation network,” in Proceedings of the 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD’06, (Berlin, Heidelberg), pp. 380–389, Springer-Verlag, 2006.

[16] J. Leskovec, L. A. Adamic, and B. A. Huberman, “The dynamics of viral marketing,” ACM Transactions on the Web (TWEB), vol. 1, no. 1, p. 5, 2007.

[17] G. V. Steeg, R. Ghosh, and K. Lerman, “What stops social epidemics?,” in Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011, 2011.

[18] J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec, “Can cascades be predicted?,” in Proceedings of the 23rd international conference on World wide web, pp. 925–936, ACM, 2014.

[19] A.-L. Barabasi, “The origin of bursts and heavy tails in human dynamics,” Nature, vol. 435, no. 7039, pp. 207–211, 2005.

[20] J. Yang and S. Counts, “Predicting the speed, scale, and range of information diffusion in twitter,” in Proceedings of the Fourth International Conference on Weblogs and Social Media, ICWSM 2010, Washington, DC, USA, May 23-26, 2010, pp. 355–358, 2010.

[21] P. A. Dow, L. A. Adamic, and A. Friggeri, “The anatomy of large facebook cascades,” in Proceedings of the Seventh International Conference on Weblogs and Social Media, ICWSM 2013, Cambridge, Massachusetts, USA, July 8-11, 2013., 2013.

[22] A. Kupavskii, L. Ostroumova, A. Umnov, S. Usachev, P. Serdyukov, G. Gusev, and A. Kustarev, “Prediction of retweet cascade size over time,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, (New York, NY, USA), pp. 2335–2338, ACM, 2012.

[23] J. Cannarella and J. A. Spechler, “Epidemiological modeling of online social network dynamics,” arXiv preprint arXiv:1401.4208, 2014.

[24] J. Leskovec, L. Backstrom, and J. Kleinberg, “Meme-tracking and the dynamics of the news cycle,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, (New York, NY, USA), pp. 497–506, ACM, 2009.

[25] R. Ghosh and K. Lerman, “A framework for quantitative analysis of cascades on networks,” in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM ’11, (New York, NY, USA), pp. 665–674, ACM, 2011.

[26] W. Kermack and A. McKendrick, “Contributions to the mathematical theory of epidemics. ii. the problem of endemicity,” Proceedings of the Royal society of London. Series A, vol. 138, no. 834, pp. 55–83, 1932.

[27] M. Granovetter, “Threshold models of collective behavior,” American Journal of Sociology, vol. 83, no. 6, p. 1420, 1978.

[28] J. Goldenberg, B. Libai, and E. Muller, “Talk of the network: A complex systems look at the underlying process of word-of-mouth,” Marketing letters, vol. 12, no. 3, pp. 211–223, 2001.

[29] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’03, (New York, NY, USA), pp. 137–146, ACM, 2003.

[30] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, “Information diffusion through blogspace,” in Proceedings of the 13th International Conference on World Wide Web, WWW ’04, (New York, NY, USA), pp. 491–501, ACM, 2004.

[31] J. Yang and J. Leskovec, “Modeling information diffusion in implicit networks,” in International Conference on Data Mining (ICDM), pp. 599–608, 2010.

[32] S. A. Myers, C. Zhu, and J. Leskovec, “Information diffusion and external influence in networks,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’12, (New York, NY, USA), pp. 33–41, ACM, 2012.

[33] M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf, “Uncovering the temporal dynamics of diffusion networks,” in International Conference on Machine Learning, pp. 561–568, 2011.

[34] N. Du, L. Song, M. Yuan, and A. J. Smola, “Learning networks of heterogeneous influence,” in Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pp. 2780–2788, Curran Associates, Inc., 2012.

[35] M. Gomez-Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of diffusion and influence,” ACM Trans. Knowl. Discov. Data, vol. 5, pp. 21:1–21:37, Feb. 2012.

[36] S. A. Myers and J. Leskovec, “On the convexity of latent social network inference,” in Advances in Neural Information Processing Systems, pp. 1741–1749, 2010.

[37] E. Sadikov, M. Medina, J. Leskovec, and H. Garcia-Molina, “Correcting for missing data in information cascades,” in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM ’11, (New York, NY, USA), pp. 55–64, ACM, 2011.

[38] M. Kim and J. Leskovec, “The network completion problem: Inferring missing nodes and edges in networks,” in SDM, pp. 47–58, 2011.

[39] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’07, (New York, NY, USA), pp. 420–429, ACM, 2007.

[40] W. Chen, Y. Wang, and S. Yang, “Efficient influence maximization in social networks,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, (New York, NY, USA), pp. 199–208, ACM, 2009.

[41] A. Goyal, W. Lu, and L. V. Lakshmanan, “Celf++: Optimizing the greedy algorithm for influence maximization in social networks,” in Proceedings of the 20th International Conference Companion on World Wide Web, WWW ’11, (New York, NY, USA), pp. 47–48, ACM, 2011.

[42] C. Zhou, P. Zhang, J. Guo, X. Zhu, and L. Guo, “Ublf: An upper bound based approach to discover influential nodes in social networks,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 907–916, IEEE, 2013.

[43] C. C. Aggarwal, A. Khan, and X. Yan, On Flow Authority Discovery in Social Networks, ch. 45, pp. 522–533.

[44] Y. Yang, E. Chen, Q. Liu, B. Xiang, T. Xu, and S. A. Shad, “On approximation of real-world influence spread,” in Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II, ECML PKDD’12, (Berlin, Heidelberg), pp. 548–564, Springer-Verlag, 2012.

[45] Q. Liu, B. Xiang, E. Chen, H. Xiong, F. Tang, and J. X. Yu, “Influence maximization over large-scale social networks: A bounded linear approach,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM ’14, (New York, NY, USA), pp. 171–180, ACM, 2014.

[46] M. Kimura and K. Saito, “Tractable models for information diffusion in social networks,” in Proceedings of the 10th European Conference on Principle and Practice of Knowledge Discovery in Databases, PKDD’06, (Berlin, Heidelberg), pp. 259–271, Springer-Verlag, 2006.

[47] W. Chen, C. Wang, and Y. Wang, “Scalable influence maximization for prevalent viral marketing in large-scale social networks,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, (New York, NY, USA), pp. 1029–1038, ACM, 2010.

[48] N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha, “Scalable influence estimation in continuous-time diffusion networks,” in Advances in Neural Information Processing Systems 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, eds.), pp. 3147–3155, Curran Associates, Inc., 2013.

[49] E. Cohen, D. Delling, T. Pajor, and R. F. Werneck, “Sketch-based influence maximization and computation: Scaling up with guarantees,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pp. 629–638, 2014.

[50] L. Yang, S. Bao, Q. Lin, X. Wu, D. Han, Z. Su, and Y. Yu, “Analyzing and predicting not-answered questions in community-based question answering services,” in Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI’11, pp. 1273–1278, AAAI Press, 2011.

[51] A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec, “Discovering value from community activity on focused question answering sites: A case study of stack overflow,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, (New York, NY, USA), pp. 850–858, ACM, 2012.

[52] M. Asaduzzaman, A. S. Mashiyat, C. K. Roy, and K. A. Schneider, “Answering questions about unanswered questions of stack overflow,” in Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, (Piscataway, NJ, USA), pp. 97–100, IEEE Press, 2013.

[53] B. Vasilescu, A. Serebrenik, P. Devanbu, and V. Filkov, “How social q&a sites are changing knowledge sharing in open source software communities,” in Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, CSCW ’14, (New York, NY, USA), pp. 342–354, ACM, 2014.

[54] B. V. Hanrahan, G. Convertino, and L. Nelson, “Modeling problem difficulty and expertise in stackoverflow,” in Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work Companion, CSCW ’12, (New York, NY, USA), pp. 91–94, ACM, 2012.

[55] S. Chang and A. Pal, “Routing questions for collaborative answering in community question answering,” in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’13, (New York, NY, USA), pp. 494–501, ACM, 2013.

[56] B. Li and I. King, “Routing questions to appropriate answerers in community question answering services,” in Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, (New York, NY, USA), pp. 1585–1588, ACM, 2010.

[57] T. C. Zhou, M. R. Lyu, and I. King, “A classification-based approach to question routing in community question answering,” in Proceedings of the 21st International Conference on World Wide Web, WWW ’12 Companion, (New York, NY, USA), pp. 783–790, ACM, 2012.

[58] Y. Liu, J. Bian, and E. Agichtein, “Predicting information seeker satisfaction in community question answering,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, (New York, NY, USA), pp. 483–490, ACM, 2008.

[59] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users: real-time event detection by social sensors,” in Proceedings of the 19th international conference on World wide web, WWW ’10, (New York, NY, USA), pp. 851–860, ACM, 2010.

[60] B. Spice, “Press release: Quarter of tweets not worth reading, twitter users tell researchers.” http://www.cmu.edu/news/stories/archives/2012/february/feb1_twitterresearch.html, Feb 2012.

[61] “Common Vulnerabilities and Exposures.” http://cve.mitre.org/. (Sep. 2012).

[62] “Common Weakness Enumeration.” http://cwe.mitre.org/. (Oct. 2012).

[63] “Security Vulnerability Datasource.” http://www.cvedetails.com/. (Oct. 2012).

[64] K. Lerman and R. Ghosh, “Information contagion: An empirical study of the spread of news on digg and twitter social networks,” ICWSM, vol. 10, pp. 90–97, 2010.

[65] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: bringing order to the web,” 1999.

[66] M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf, “Uncovering the temporal dynamics of diffusion networks,” in Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pp. 561–568, 2011.

[67] J. Tang, D. Zhang, and L. Yao, “Social network extraction of academic researchers,” in Seventh IEEE International Conference on Data Mining (ICDM 2007), pp. 292–301, IEEE, 2007.

[68] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, “Arnetminer: extraction and mining of academic social networks,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 990–998, ACM, 2008.

[69] J. Tang, L. Yao, D. Zhang, and J. Zhang, “A combination approach to web user profiling,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 5, no. 1, p. 2, 2010.

[70] J. Tang, J. Zhang, R. Jin, Z. Yang, K. Cai, L. Zhang, and Z. Su, “Topic level expertise search over heterogeneous networks,” Machine Learning Journal, vol. 82, no. 2, pp. 211–237, 2011.

[71] J. Tang, A. C. Fong, B. Wang, and J. Zhang, “A unified probabilistic framework for name disambiguation in digital library,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 975–987, 2012.

[72] R. Li, S. Wang, H. Deng, R. Wang, and K. C.-C. Chang, “Towards social user profiling: unified and discriminative influence model for inferring home locations,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1023–1031, ACM, 2012.

[73] M. Newman, Networks: An Introduction. New York, NY, USA: Oxford University Press, Inc., 2010.

[74] M. Newman and G. Ghoshal, “Bicomponents and the robustness of networks to failure,” Physical review letters, vol. 100, no. 13, p. 138701, 2008.
