
Probabilistic Modeling of Rumour Stance and Popularity in Social Media

By: Michal Lukasik

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

The University of Sheffield
Faculty of Engineering
Department of Computer Science

Abstract

Social media tends to be rife with rumours when new reports are released piecemeal during breaking news events. One can mine the multiple reactions expressed by social media users in those situations, exploring users' stance towards rumours and ultimately enabling the flagging of highly disputed rumours as potentially false. Moreover, rumours in social media exhibit complex temporal patterns: some rumours are discussed with an increasing number of tweets per unit of time, whereas others fail to gain ground. This thesis develops probabilistic models of rumours in social media driven by two applications: rumour stance classification and modeling temporal dynamics of rumours. Rumour stance classification is the task of classifying the stance expressed in an individual tweet towards a rumour. Modeling temporal dynamics of rumours is the task of modeling how rumour prevalence changes over time. Both applications provide insights into how a rumour attracts attention from the social media community. These can assist journalists with their work on rumour tracking and debunking, and can be used in downstream applications such as systems for rumour veracity classification. In this thesis, we develop models based on probabilistic approaches. We motivate Gaussian processes and point processes as appropriate tools and show how features not considered in previous work can be included. We show that for both applications, transfer learning approaches are successful, supporting the hypothesis that there is a common underlying signal across different rumours. We furthermore introduce novel machine learning techniques which have the potential to be used in other applications: convolution kernels for streams of text over continuous time and a sequence classification algorithm based on point processes.

Acknowledgements

I am very grateful to my supervisors Kalina Bontcheva and Trevor Cohn for their support and guidance throughout my PhD. I am grateful to Kalina for introducing me to social media research and supporting me throughout my studies. Trevor shaped me as a researcher; it was wonderful to work together on so many projects and explore the many exciting ideas that we had. I would like to express my gratitude to my examiners, Andreas Vlachos and Stephen Clark, who greatly helped in improving this thesis. I was very lucky to have worked closely with Srijith P.K. It is hard to overestimate the impact he had on my PhD. Neil Lawrence and his group greatly inspired me to enter the depths of the exciting field of machine learning. I benefited tremendously from attending the Gaussian process schools and frequently interacting with the group. Thank you to my collaborators and colleagues: Arkaitz Zubiaga, Duy Vu, Zsolt Bitvai, Daniel Beck, Maria Liakata, Rob Procter, Varvara Logacheva, Tomasz Kusmierczyk, and others, who greatly impacted my research through numerous conversations, shared ideas, and code. I am grateful to Richard Zens for inviting me for an internship at Google Research and to Manuel Gomez-Rodriguez for inviting me for an internship at the Max Planck Institute for Software Systems. These experiences enriched me greatly. Thanks to all the friends I made while working on my PhD at both the University of Sheffield and the University of Melbourne. Because of you my journey was very enjoyable. Last but not least, I thank my parents, who have always supported me, and my brothers, for putting up with me.

Contents

Abstract

Acknowledgements

List of Figures

List of Tables

Nomenclature

1 Introduction
1.1 Statement of the Problem
1.2 Aims and Research Questions
1.2.1 Scope
1.3 Thesis Structure
1.4 Published Material

2 Rumours in Social Media
2.1 Rumour Definition
2.2 Rumour Stance Classification
2.3 Modeling Temporal Dynamics of Rumours
2.4 Other Related Problems
2.5 Rumour Datasets
2.5.1 England Riots Dataset
2.5.2 PHEME Dataset
2.5.3 Other Rumour Datasets
2.6 Conclusions


3 Probabilistic Models for Classification and Temporal Modeling
3.1 Gaussian Processes
3.1.1 Motivation
3.1.2 Model
3.1.3 Outputs
3.1.4 Approximate Inference
3.1.5 Kernels
3.1.6 Multi-task Learning with Gaussian Processes
3.1.7 Gaussian Processes for NLP and Social Media
3.2 Point processes
3.2.1 Motivation
3.2.2 Preliminaries
3.2.3 Poisson Processes
3.2.4 Log-Gaussian Cox Processes
3.2.5 Hawkes processes
3.3 Evaluation Metrics
3.4 Conclusions

4 Rumour Stance Classification
4.1 Introduction
4.2 Problem Definition
4.3 Model
4.4 Experiment Settings
4.5 Results
4.6 Conclusions

5 Modeling Temporal Dynamics of Rumours
5.1 Introduction
5.2 Problem definition
5.3 Model
5.4 Experiment Settings
5.5 Experiments
5.6 Conclusions

6 Convolution Kernels for Modeling Temporal Dynamics of Rumours
6.1 Introduction
6.2 Related Work on Modeling Sequences of Text over Time
6.3 Notation and Problem Formulation
6.4 Convolution time series kernels
6.4.1 Formulations
6.4.2 Proof of correctness
6.5 Experiments on Synthetic Data
6.5.1 Toy example for time
6.5.2 Toy example for time and text
6.5.3 Complex synthetic experiment
   Output variables
   Results
6.6 Experiments on Rumour Data
6.7 Conclusions

7 Temporal Dynamics for Rumour Stance Classification
7.1 Introduction
7.2 Problem Settings
7.3 Model
7.3.1 Intensity Function
7.3.2 Likelihood
7.3.3 Prediction
7.3.4 Parameter Optimization
7.4 Experiments
7.4.1 Baselines
7.4.2 Results
7.5 Conclusions

8 Conclusions
8.1 Contributions
8.2 Future Work
8.3 Final Remarks

Appendices

A Derivation of the Log Likelihood under the Hawkes Process Model

Bibliography

List of Figures

1.1 An illustrative example rumour about the ISIS flag being displayed on the cafeteria besieged in Sydney in 2014.

2.1 Time profiles of several example rumours in the Ferguson data set happening during 13/8/2014.

3.1 A graphical representation of the relation between the input x, the latent function value f, and the output y in the Gaussian process framework.
3.2 Posterior distributions from Gaussian processes with RBF kernels controlled by different hyperparameter values κ.
3.3 An illustrative example of a coregionalization matrix B.
3.4 Relations between the stochastic processes considered in this thesis.
3.5 Intensity function values over time and the corresponding: instantaneous likelihood of the first event occurrence and cumulative distribution function of the first event occurrence.
3.6 Time varying intensity functions for two inhomogeneous and one homogeneous Poisson process, and the corresponding samples of tweet arrivals.
3.7 Samples of tweets drawn from a uni-variate Hawkes process and the corresponding intensity function values over time.

4.1 Illustration of different evaluation techniques for rumour stance classification.
4.2 Macro-F1 and Micro-F1 scores for different methods over the number of tweets from the target rumour used for training on the England riots and the PHEME datasets.
4.3 Cross-classification rates for competitive methods on the England riots dataset.

5.1 Counts of tweets in consecutive 6 minute time intervals over two hours for two illustrative rumours.


5.2 Intensity functions for different methods across example Ferguson rumours in the extrapolation and interpolation setting.
5.3 Confusion matrices for four selected methods in the extrapolation setting.
5.4 Confusion matrices for four selected methods in the interpolation setting.
5.5 Intensity functions and corresponding predicted arrival times for different methods across example Ferguson rumours.

6.1 Samples from the toy classification example for time.
6.2 Samples from the toy classification example for time and text.
6.3 Comparison of accuracy between convolution kernels, showing kernels text◦time, text and time.
6.4 Confusion matrices for LGCP text◦time, LGCP TXT, LGCP ICM+TXT and LGCP ICM.
6.5 Intensity functions for different methods across example Ferguson rumours in the extrapolation setting.
6.6 Error analysis of rumour classification, showing the probability of error versus rumour lengths for GP classification with the text◦time kernel.

7.1 A sample drawn from a univariate Hawkes process and corresponding intensity function values over time.
7.2 Intensities of the Hawkes processes corresponding to the four stances for an example Ferguson rumour with parameters trained in the HP Grad. approach.
7.3 Graphical representation of the Hawkes process sequence classification model and an example instantiation of variables from the graphical model.
7.4 Cross-stance confusion rates for competitive methods on the Ottawa dataset.
7.5 Macro-F1 and Micro-F1 scores for HP methods for different hyperparameter values ω on the Ferguson dataset.

List of Tables

2.1 Counts of tweets with supporting, denying or questioning labels in each rumour collection from the England riots dataset.
2.2 Tweets pertaining to rumours about: Children's hospital, Army bank and Eye during the 2011 England Riots.
2.3 Tweets pertaining to rumours about: McDonald's, Miss Selfridge's and London zoo during the 2011 England Riots.
2.4 Tweets pertaining to rumours about Police beat girl during the 2011 England Riots.
2.5 Statistics of rumours from the PHEME datasets.
2.6 Statistics and distribution of labels for the annotated subset of the PHEME rumour datasets.
2.7 Tweets pertaining to example PHEME rumours.

4.1 Micro-F1 and Macro-F1 scores for GP based methods on the England riots and the PHEME datasets.
4.2 Micro-F1 and Macro-F1 scores for different methods and different proportions of the initial tweets annotated from the target rumour/event on the England riots and the PHEME datasets.
4.3 Per-class precision, recall and F1 scores for the best-performing classifiers on the England Riots and the PHEME datasets with 20 tweets from a target rumour available during training.

5.1 MSE between the true counts and the predicted counts and predictive log likelihood of the true counts from the point process models for test intervals over the 114 Ferguson rumours for extrapolation and interpolation settings.


5.2 ARMSE (defined in Equation (5.3)) and PRMSE (defined in Equation (5.2)) between the true event times and the predicted event times expressed in minutes over the 114 Ferguson rumours.

6.1 Accuracy across 5CV for the toy experiment on temporal time series with no text meta data.
6.2 Accuracy across 5CV for the toy experiment on temporal time series with text meta data.
6.3 Generating process of a single synthetic experiment.
6.4 Results from the synthetic experiments for a range of methods (mean ± std dev), showing classification accuracy, mean squared error for regression and Poisson log likelihood.
6.5 MSE between the true counts and the predicted counts and predictive log likelihood of the true counts from the point process models for test intervals over the 114 Ferguson rumours.
6.6 Results from the rumour classification experiments for a range of methods, showing classification accuracy ± std dev.

7.1 Micro-F1 and Macro-F1 scores for different methods and different proportions of the initial tweets annotated from the target rumour/event on the England riots and the PHEME datasets.
7.2 Per-class precision, recall and F1 scores for the best-performing classifiers (GP-ICM, CRF, HP Grad. and HP Approx.) on the PHEME datasets.
7.3 Per-class precision, recall and F1 scores for the best-performing classifiers (GP-ICM, CRF, HP Grad. and HP Approx.) on the PHEME datasets.

Nomenclature

Mathematical Notation

λ(x) A hazard/intensity function.

Rel A relation function.

v Vectors are represented by small boldfaced letters.

f(x) A function drawn from a Gaussian process.

I An identity matrix.

k(v, v′) Kernel functions are denoted with small letters.

K(X, X′) Matrices of kernel function values are denoted with capital letters corresponding to the symbol used for the kernel function, with two arguments denoting the sequences of inputs that the kernel is evaluated on (e.g. K(X, X′) denotes a matrix of kernel function evaluations, where cell (i, j) contains the kernel evaluation k(X_i·, X′_j·)).

X Matrices are represented by capital and not boldfaced letters, unless convention for a particular quantity is different in the literature.

Data Notation

p_m^n The nth post from rumour R_m.

w_m^n The text posted in the nth post from rumour R_m.

w_n The text message from the nth tweet, encoded using a vector representation.

i_n The id of the user who posted the nth tweet.

m_n The rumour id corresponding to the nth tweet.


P The set of tweets from all rumours R.

R The set of rumours.

R_n The nth rumour from the set of rumours.

T The end of the observation time interval.

t_m^n The timestamp of occurrence of post p_m^n.

t_n The timestamp of occurrence of the nth tweet.

U The set of users.

V The size of the vocabulary.

W The matrix of vector representations of tweets, such that W_nv is the count of the vth word from the vocabulary in the nth tweet. We refer to the nth row of W as w_n.

Y The set of stances expressed in tweets around rumours.
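As a concrete illustration of the data notation above, the following minimal sketch (not part of the thesis; the two example tweets are hypothetical paraphrases of Figure 1.1, and scikit-learn is an assumed dependency) builds the count matrix W, the vocabulary size V, and a row vector w_n:

```python
# A minimal sketch of constructing the bag-of-words count matrix W, where
# W[n, v] is the count of the vth vocabulary word in the nth tweet.
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "ISIS flags remain on display #7News",   # hypothetical tweet texts
    "how do you know its an ISIS flag?",
]
vectorizer = CountVectorizer()
W = vectorizer.fit_transform(tweets)   # sparse matrix of shape (num tweets, V)
V = len(vectorizer.vocabulary_)        # the vocabulary size V
w_0 = W[0].toarray().ravel()           # the vector representation w_n for n = 0
```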

Chapter 1

Introduction

Social media is a rich source of information about events occurring in the real world. People report stories from first-hand experience, often providing a quicker and more detailed description of news than traditional news websites (Kwak et al., 2010), or even describing events not present in the mainstream news (Petrovic, 2012). Citizens are often the source of information themselves, with social media serving as a platform for dissemination (Goode, 2009).

Alongside the benefits of being a rich and fast source of information, there are downsides compared to traditional journalism. Specifically, social media is very prone to misinformation, as it is hard to verify what is quickly being spread (Mendoza et al., 2010). Thus, there is an increasing need to interpret and act upon rumours (unverified pieces of information, the veracity of which is uncertain) spreading quickly through social media, especially in circumstances where their veracity is hard to establish.

One setting in which rumours often tend to emerge is unrest and disastrous events. For instance, during an earthquake in Chile, rumours spread through Twitter that a volcano had become active and that there was a tsunami warning in Valparaiso (Mendoza et al., 2010). During the riots in Ferguson in 2014 various rumours were spreading (Zubiaga et al., 2016c). Officials would have benefited from being able to identify particularly popular rumours and act to debunk them. Other examples, from the riots in England in 2011, are that rioters were going to attack Birmingham's children's hospital and that animals had escaped from the zoo (Procter et al., 2013a), both untrue.

Rumours also tend to spread in other settings. There have been many reports that during the 2016 presidential campaign in the USA misinformation acquired a worrying degree of popularity (Allcott and Gentzkow, 2017).



Figure 1.1: An illustrative example rumour about the ISIS flag being displayed on the cafeteria besieged in Sydney in 2014. Notice how tweets around the rumour occur at varying density, and their content denotes different stances with respect to the veracity of the rumour. Blue crosses denote tweets supporting the veracity of the rumour, green crosses denote questioning tweets, red crosses denote denying tweets, and the grey cross denotes a neutral tweet with respect to the truthfulness of the rumour.

Rumours tend to spread about popular products after (or even before) they are launched. For example, one of the most popular groups of rumours on a website analysing rumours spreading on-line (www.emergent.info) is about Apple and its products. Companies might be losing revenue due to rumours around their products gaining popularity.

The social media platform which serves as the source of our rumour datasets is Twitter (www.twitter.com). A rumour in Twitter is composed of tweets, each of which is described (among other attributes) by a timestamp of the tweet occurrence and the text content. In Figure 1.1 we depict an example rumour from the Sydney Siege in 2014 (Zubiaga et al., 2016c). It starts with a tweet stating the rumour that there is an ISIS flag displayed at the besieged cafeteria, and is followed by tweets around that statement. Tweets typically express stances of authors discussing the veracity of the rumour. For example, a questioning tweet asks the author of the statement how they know that it is true, and a rejecting tweet states that a supporting photo was taken before the event. Moreover, the number of tweets over different temporal intervals varies, with lower interest from the community at the beginning, and higher towards the end of the rumour lifespan.

1.1 Statement of the Problem

In this thesis we explore applications supporting journalists in tracking rumours in social media. We aim to develop tools that provide insightful information about rumours, allowing journalists to quickly analyse people's orientation towards the veracity of a rumour and/or direct their efforts to studying particularly interesting stories. One recognized application is rumour stance classification: the task of classifying a tweet about a rumour into one of the categories describing what stance it takes with respect to rumour veracity: supporting, denying, questioning or commenting (we formally define the task in Section 2.2). It provides a journalist with information on how users in social media react to rumours. It has also been conjectured to be a crucial first step towards making an informed judgement on the veracity of a rumour (Zubiaga et al., 2016c; Tolmie et al., 2015; Mendoza et al., 2010). Automating stance categorisation of tweets would be of great use in tracking rumours, flagging those that are largely denied or questioned as being more likely to be false.

A different, less well studied aspect of rumours is popularity. Future patterns of tweet arrivals convey information which may prove useful to journalists, for example by helping them direct the rumour debunking effort to rumours which are going to spread rapidly, ignoring those which are not going to attract as much attention. Rumour popularity modeling can also be motivated by other applications. For instance, the temporal dynamics of a rumour may provide additional information for veracity classification of the rumour. In particular, the tweeting behaviour of Internet trolls might exhibit a specific pattern, which may indicate a rumour is false (Ratkiewicz et al., 2011). Moreover, the ability to predict popular rumours could be useful for the purpose of engineering a popular story: a popularity prediction tool could be used to forecast how viral a given story would become.

1.2 Aims and Research Questions

This thesis has two aims: (1) exploring the rumour stance classification problem, and (2) exploring the rumour popularity prediction problem. Below we elaborate on each and state the research questions we consider.

1. The first aim of this thesis is predicting the stance of tweets regarding rumours.

Rumour stance classification is an established task in the NLP community (Zubiaga et al., 2017) which has found many applications. It provides journalists and authorities with tools for rumour tracking. Moreover, rumour stance classification has been successfully used for rumour detection (Zhao et al., 2015b) and rumour veracity classification (Derczynski et al., 2015; Liu et al., 2015).

(a) What would be a realistic and fair evaluation framework for rumour stance classification?
A rumour in social media is represented by the posts discussing it. A realistic rumour stance classification setting should mimic how a journalist or a real-world system would conduct the task. In particular, a realistic scenario would not allow for observing labels from tweets about a rumour which happened after the tweets that need to be classified. Perhaps the most realistic scenario is when no annotation is available for the target rumour. Previous work on rumour stance classification conducted evaluation via cross-validation, randomly shuffling the tweets regardless of their rumour identities or timestamps (Qazvinian et al., 2011). In this research question we seek an experimental setup respecting the time series nature of tweets, and the fact that tweets come from different rumours.

(b) Can information about stances of tweets from one rumour be useful for predicting stances of tweets from another rumour?
Here, we look at whether similar patterns for different stance categories occur across rumours, and whether stance information can be successfully transferred across rumours. Overall, we might expect that the way a rumour is rejected bears similar characteristics across rumours, e.g. linguistic cues such as the words fake and false. However, different rumours might exhibit their own characteristics regarding how tweets express stances. For example, some rumours might be sarcastic, with users indirectly expressing their denial, whereas other rumours less so. We investigate this research question by experimenting with models learning similarities between rumours and comparing them against baselines.

(c) Do tweet arrival times carry complementary information to text for the stance prediction task?
Previous work on rumour stance classification focused on features based on the text content of tweets. However, tweets about a rumour occur over time, and this temporal information might be useful for the rumour stance classification problem. For example, one might expect that tweets which happen early on during the rumour lifespan are likely to be supporting. In this research question, we look at whether the temporal dynamics of tweet arrivals convey useful information for the rumour stance classification task, and whether they can bring additional information compared to the textual content of a tweet. We inspect this research question by incorporating temporal features of tweets into a model, and investigating whether the resulting approach can outperform baselines not using temporal information.

2. The second aim is predicting rumour popularity. By the popularity of a rumour we mean how widely discussed it is. It is deliberately left unspecified here what we mean by predicting popularity, as it can be defined in different ways (in the first research question we look at what might be a useful definition of predicting the popularity of a rumour in social media). Predicting rumour popularity can be useful for multiple applications. It would provide journalists and authorities with tools helping direct attention to those rumours the community is going to find particularly interesting. Also, predictions made about the popularity of a rumour can be useful for downstream applications, such as rumour veracity classification (are popular rumours more or less likely to be true?) or rumour detection (does the popularity of rumours differ from the popularity of other types of stories, e.g. news articles?).

(a) How can the rumour popularity prediction problem be formulated?
The first question we investigate is how the rumour popularity prediction problem can be defined. This could be done in different ways, with one possible approach being to define it as binary classification into popular and non-popular rumour categories. Such a setting would require defining what a popular rumour is, and various possibilities for that exist. Should a popular rumour be defined as one with the number of tweets in the next 24 hours exceeding a threshold θ, or rather one which is still going to be discussed after a future timestamp ρ? There are many ways a popular rumour could be defined, and it would likely require specifying a parameter separating popular and non-popular rumours.
A different approach to defining the rumour popularity prediction task could be to specify it as the task of predicting how tweets about a rumour are going to arrive over time. This avoids the arbitrariness of converting the problem into binary classification, and leaves the interpretation of what a popular rumour is to a journalist looking at the output. Moreover, the information about how tweets arrive over time could be directly useful for downstream applications. For instance, the way tweets are arriving over time might reveal important information for the rumour detection and rumour veracity classification tasks.

(b) Can information about tweet arrivals from one rumour bring useful information for predicting the popularity of another rumour?
Tweet arrivals over time can exhibit complex patterns, with varying inter-arrival times between different pairs of consecutive tweets (see Figure 1.1). At the same time, one might expect that there are common patterns of tweet arrivals across different rumours. For example, two rumours which are highly plausible (or just funny) might exhibit very low inter-arrival times between consecutive tweets (or even diminishing inter-arrival times as the rumour is discussed more). On the other hand, some changes in how tweets arrive over time may be due to unpredictable phenomena, such as an external source providing additional information about a rumour, or more interesting stories emerging. Here, we investigate whether information about how tweets arrive over time around one rumour can bring useful information when making predictions about the popularity of different rumours. We investigate this research question by experimenting with models which learn similarities between rumours and comparing them against baselines which do not learn such similarities.

(c) Does the text content of tweets convey useful information for the rumour popularity prediction task?
Text is an essential part of a tweet. It has been shown to be predictive of stance with respect to rumour veracity (Qazvinian et al., 2011). Here, we investigate whether what is being tweeted about a rumour conveys information about whether the rumour is going to be popular. One motivation for trying the text feature is that supported and denied rumours might exhibit different temporal dynamics. Since text has been shown to indicate stance with respect to a rumour, the predominant stance in tweets can be established, and based on this, different predictions about rumour popularity can be made (e.g., a supported rumour might go viral, whereas a rejected rumour not so much). However, the motivation for using the text feature goes beyond the study of tweet stances regarding rumour veracity, as a rumour which is simply found funny by the community might also go viral. For this research question we investigate whether text provides useful information for the rumour popularity prediction task. We investigate this by looking at whether models employing text information from posts can outperform approaches which do not use text information.

(d) Does information about how text usage in tweets changes over time convey useful information for the rumour popularity prediction task?
Tweets posted at different times about a rumour may use different words. For example, a rumour which has been supported towards the beginning of its lifespan and rejected towards the end of its lifespan would be described by tweets using different words, depending on when a tweet was posted. Tweets from the beginning would probably use words expressing support (e.g. true, witness), and tweets posted later on could use words expressing denial (e.g. fake, false). Similarly, a rumour which had been found funny towards the beginning of its lifespan, but in which the community later lost interest, would also likely exhibit different usage of words in tweets over time. In this research question we look at whether the change in how text is used in tweets over time conveys useful information for the rumour popularity prediction task. We investigate this by comparing models which use information about language change over time against baselines which do not use such information.

1.2.1 Scope

We consider two problems related to rumours: rumour stance classification and rumour popularity prediction. There exist other applications related to rumours which are outside the scope of this thesis. In particular, instead of running rumour detection algorithms (the purpose of which is identifying rumours in the stream of social media posts; see Section 2.4 for details about rumour detection), we make use of manually curated rumour datasets. In principle, one could use an automatic rumour detection system for a fully end-to-end tool. Furthermore, even though one motivation for the rumour stance classification task is predicting rumour veracity, we are not considering an end-to-end system for this particular problem. Nevertheless, the tools we develop can assist journalists in their efforts towards rumour debunking.

Moreover, we limit ourselves to two rumour datasets coming from the Twitter social media platform: rumours from the England riots of 2011 (Procter et al., 2013a) and the PHEME rumour datasets collected around several events, mostly related to riots (Zubiaga et al., 2015a). The datasets comprise tens of thousands of tweets, and this to a large extent motivates our modeling choices. Larger datasets could be considered, however these would require adjusting the proposed models (see future work directions in Section 8.2).

1.3 Thesis Structure

The rest of this thesis is organized as follows.

• Chapter 2 provides background on rumours in social media. We describe different aspects of rumours, such as their definition and the different applications that have been studied. We focus on the two problems considered in this thesis: rumour stance classification and rumour popularity prediction, identifying the shortcomings of previous work that we address in this thesis. We also review several other related problems.

• Chapter 3 reviews the machine learning frameworks we use for modeling rumours. First, we introduce Gaussian processes (GPs). GPs have characteristics that are useful in our settings. The model is Bayesian, which helps make robust predictions in the absence of large training data. The efficient hyperparameter selection mechanism allows for avoiding extensive held-out hyperparameter tuning. Moreover, the kernelized nature of GPs allows for modeling multiple rumours via multi-task learning kernels. We make use of GPs in Chapters 4 to 6.

Next, we describe point processes, a framework for modeling point occurrences. A point process can be defined using a function over time called the hazard rate, which informally can be thought of as the likelihood of a tweet occurring at a particular point in time. The hazard rate function can be used to answer various questions, such as: how many tweets are going to occur in an interval [t_S, t_E]? and when will the next tweet occur after time t? (These two quantities are written out informally at the end of this section.) This makes point processes very appealing for complex tasks requiring predictions about how a phenomenon is going to propagate over time. Depending on which point process one chooses to work with, different kinds of temporal phenomena can be modeled. We describe two specific models from this framework: Log-Gaussian Cox processes and Hawkes processes. We make use of point processes for modeling the dynamics of rumours in Chapters 5 to 7.

• Chapter 4 addresses research questions 1a and 1b, i.e., what realistic settings for rumour stance classification are and whether stances from tweets about one rumour can be transferred to make predictions about another rumour. We move away from the cross-validation used in previous work (Qazvinian et al., 2011), and introduce two experimental scenarios which address the time series characteristic of rumours. In the first setting the test set is composed of tweets from the target rumour, and no annotated tweets from the target rumour are available in the training set. In the second setting a small number of annotated initial tweets is available in the training set. Next, we motivate Gaussian processes, and show how multi-task kernel learning can successfully leverage data from multiple rumours to make predictions for a different rumour.

• Chapter 5 addresses research questions 2a about how rumour popularity prediction can be defined, 2b about whether tweet arrivals from one rumour can be helpful to make predictions about the popularity of another rumour, and 2c about whether text from tweets provides useful information for the rumour popularity prediction task. We introduce the problem of rumour popularity prediction in a fine-grained setting, leaving flexibility to an end user regarding the interpretation of the result. Namely, the task is to predict the number of tweets in the unobserved time intervals from the rumour lifespan. We motivate point processes as an appropriate framework for solving this problem. Finally, we demonstrate how information about tweet arrivals from other rumours, as well as text from tweets, can be incorporated into the point process model, yielding improvements over baseline results.

• Chapter 6 addresses research question 2d about whether information about how language in tweets around a rumour changes over time is useful for the rumour popularity prediction task. To this end, we introduce a novel convolution kernel and show how it can be used in the point process framework for modeling rumour popularity. As a result, we show that incorporating text dynamics over time improves results compared to those obtained in Chapter 5. Moreover, we demonstrate the applicability of the kernel to a range of other problems, both on synthetic and real datasets.

• Chapter 7 comes back to the first research aim, and addresses research question 1c about whether the dynamics of tweet arrivals from a rumour can provide useful information for rumour stance classification. Inspired by the work on rumour popularity prediction from Chapters 5 and 6, we incorporate rumour dynamics over time by employing the point process framework. In particular, we consider the Hawkes process model, which explicitly models influences between tweets. We find this characteristic particularly useful when modeling rumour stance classification, as we can learn how tweets expressing different stances influence one another over time. We show how the Hawkes process based approach achieves competitive results compared to the methods from Chapter 4.

• We conclude this thesis in Chapter 8 by revisiting the research questions. We also discuss directions for future work.
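As referenced in the Chapter 3 overview above, the two questions that a hazard/intensity function λ(t) lets us answer have simple standard forms. We state them here informally; these are textbook point-process identities rather than results of this thesis, and Chapter 3 gives the full treatment. For a Poisson process the count of events in an interval is Poisson distributed, the waiting time to the next event follows from the survival function, and a Hawkes process additionally lets past events raise the intensity:

```latex
% Number of events in [t_S, t_E] under a Poisson process with intensity \lambda(t):
N([t_S, t_E]) \sim \mathrm{Poisson}\Big( \int_{t_S}^{t_E} \lambda(u)\, du \Big),
\qquad
\mathbb{E}\big[ N([t_S, t_E]) \big] = \int_{t_S}^{t_E} \lambda(u)\, du .

% Probability that no event occurs in (t, t+s], given the history H_t up to time t:
P(\text{next event after } t+s \mid \mathcal{H}_t)
  = \exp\Big( -\int_{t}^{t+s} \lambda(u \mid \mathcal{H}_t)\, du \Big).

% A univariate Hawkes process adds self-excitation to a base rate \mu, e.g. with an
% exponentially decaying kernel with magnitude \alpha and decay \omega:
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{t_i < t} \alpha\, \omega\, e^{-\omega (t - t_i)} .
```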

1.4 Published Material

Parts of this thesis have been published in conference proceedings.

• The rumour stance classification work described in Chapter 4 is an extended version of the work published as
Lukasik, M., Cohn, T., and Bontcheva, K. (2015a). Classifying tweet level judgements of rumours in social media. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Software and the dataset used for experiments can be found at https://github.com/mlukasik/rumour-classification.

The chapter has been submitted to the Journal of the Association for Information Science and Technology, and is currently under review. The preprint version has been released as
Lukasik, M., Bontcheva, K., Cohn, T., Zubiaga, A., Liakata, M., and Procter, R. (2016a). Using Gaussian processes for rumour stance classification in social media. CoRR, abs/1609.01962.

• Work described in Chapter 5 includes the work published as
Lukasik, M., Cohn, T., and Bontcheva, K. (2015b). Point process modelling of rumour dynamics in social media. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 518–523.
and
Lukasik, M., Srijith, P. K., Cohn, T., and Bontcheva, K. (2015c). Modeling tweet arrival times using log-Gaussian Cox processes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 250–255.

• Chapter 6 includes the work published as Lukasik, M. and Cohn, T. (2016). Convolution kernels for discriminative learning from streaming text. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.

• Chapter 7 includes the work published as

Lukasik, M., Srijith, P. K., Vu, D., Bontcheva, K., Zubiaga, A., and Cohn, T. (2016b). Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 393–398.
Software and the dataset used for experiments can be found at https://github.com/mlukasik/seqhawkes.

The author was the main contributor to the above publications, with collaborators supporting writing and evaluation. The author has also contributed to other publications about modeling rumours in social media. Work published as

Zubiaga, A., Kochkina, E., Liakata, M., Procter, R., and Lukasik, M. (2016a). Stance classification in rumours as a sequential task exploiting the tree structure of social media conversations. In Proceedings of the 26th International Conference on Computational Linguistics (COLING), pages 2438–2448
extends the work on rumour stance classification, exploring further features and methods, whereas work published as

Srijith, P. K., Lukasik, M., Bontcheva, K., and Cohn, T. (2017). Longitudinal modeling of social media with Hawkes process based on users and networks. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
extends the work on rumour popularity modeling, also exploring additional features and methods. The author was not the main contributor to these works, and they are not included in this thesis.

Chapter 2

Rumours in Social Media

This chapter gives an overview of research on rumours in social media. We start by introducing the formal definition of a rumour. Afterwards, we provide an overview of research on two applications regarding rumours in social media: rumour stance classification and rumour popularity prediction. We also review several other related research areas.

2.1 Rumour Definition

There have been multiple attempts at defining rumours in the literature. Most of them are complementary to one another, with slight variations depending on the context of their analyses. The core concept that most researchers agree on matches the definition that major dictionaries provide. For instance, the Oxford English Dictionary definition of a rumour as “a currently circulating story or report of uncertain or doubtful truth” (http://www.oxforddictionaries.com/definition/english/rumour) matches DiFonzo and Bordia's (2007) definition of rumours as “unverified and instrumentally relevant information statements in circulation.”

Researchers have long looked at the properties of rumours to understand their diffusion patterns and to distinguish them from other kinds of information that people habitually share (Donovan, 2007). Allport and Postman (1947) claimed that rumours spread due to two factors: people want to find meaning in things and, when faced with ambiguity, they try to find meaning by telling stories. The latter factor also explains why rumours tend to change over time by becoming shorter, sharper and more coherent. This is the case, it is argued, because such changes make rumours explain things more clearly. In another work, Rosnow (1991) claimed that there are four important factors for rumour transmission: rumours must be outcome-relevant to the listener, must increase personal anxiety, be somewhat credible and be uncertain. Furthermore, Shibutani (1969) defined rumours to be “a recurrent form of communication through which men [sic] caught together in an ambiguous situation attempt to construct a meaningful interpretation of it by pooling their intellectual resources. It might be regarded as a form of collective problem-solving”. In contrast to these theories, Guerin and Miyazaki (2006) state that a rumour is a form of relationship-enhancing talk. Building on their previous work, they recall that many ways of talking serve the purpose of forming and maintaining social relationships. Rumours, they say, can be explained by such means.

In our work, we adhere to the widely accepted view that rumours are unverified pieces of information. More specifically, similarly to Zubiaga et al. (2016c), we regard a rumour in the context of breaking news as a “circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety so as to motivate finding out the actual truth”. We deal with rumours in social media, where they are represented as sets of tweets discussing a particular rumour. Typically, as we discuss in Section 2.5, it is assumed that tweets replying to or retweeting a tweet about a rumour are about the same rumour too. In our work we do not deal with automatic rumour detection, and assume it has already been determined which tweets pertain to a rumour (in our case via manual annotation; however, in principle, this can be done using a rumour detection system; see Section 2.4 for a discussion of rumour detection).

2.2 Rumour Stance Classification

Rumour stance classification is the task of classifying tweets around rumours into categories denoting their attitude regarding the rumour's veracity: supporting, denying, questioning or commenting. The categories vary from work to work; however, the underlying concept is the same: finding individual judgements of a rumour in the community. The rumour stance classification task has recently become popular, with numerous papers (Zeng et al., 2016b; Zubiaga et al., 2016a; Liu et al., 2015; Hamidian and Diab, 2016) and a recent shared task (Derczynski et al., 2017) demonstrating interest from the research community. Below, we formally define the task and discuss different aspects of previous work, emphasizing where we seek improvements.

Definition of the Task Individual tweets may discuss the same rumour in different ways, with each user expressing their own stance towards the rumour. Within this scenario, the tweet-level rumour stance classification task is defined as that in which a classifier has to determine the stance of each tweet towards the rumour: given a tweet t_i as input, the classifier has to determine which of the set of stances Y = {supporting, denying, questioning, commenting} applies to the tweet, i.e. y(t_i) ∈ Y. Figure 1.1 depicts an example rumour about an ISIS flag being displayed on the cafeteria besieged in Sydney. Example stances with respect to the rumour are shown, and we resort to this example when discussing the definitions of stances below. The set of considered categories varies from work to work; however, the schemes do not differ significantly (i.e. they are either subsets of one another, or it is possible to convert categories across the different schemes). In Section 2.5.2 we show how a different scheme introduced by Zubiaga et al. (2016c) can be converted to the scheme of four stances used in this study.

Definition of the Stances Below we define different stances that a tweet can take with respect to a rumour. The definitions align with or generalize the definitions from previous work (Qazvinian et al., 2011; Hamidian and Diab, 2015; Zeng et al., 2016b).

Supporting A supporting tweet expresses or suggests a belief that a rumour is true. It can provide support for a rumour either by linking to supposedly factual content (a picture, a URL to a story), by providing a description of how the author or their acquaintances witnessed the rumour, or by explaining why the story seems credible. The support can also be expressed simply by expressing the feelings that the story triggered in a user. An example supporting tweet is the tweet stating the rumour: “(...) there are two gunmen and a dozen hostages inside the cafe under siege at Sydney.. ISIS flags remain on display #7News”, as shown in Figure 1.1.

Denying A denying tweet expresses disbelief in a rumour. It can undermine the rumour's credibility, or explain why a user thinks it is not credible. A tweet can provide any kind of evidence of a similar nature to that listed in the supporting case, e.g. links to websites debunking a rumour, or witnessing stories that undermine a rumour. An example denying tweet is “@u2 no she can't cos it's actually not”, as shown in Figure 1.1. Notice how negation is used to express an opinion about the rumour.

Questioning A questioning tweet (also referred to as querying (Zubiaga et al., 2017)) poses a question with respect to a rumour or expresses a doubt in its truth (however, it does not explicitly reject it as in the case of the denying category). It is not as direct as tweets from the previous categories. Questioning tweets are often replies to supporting or denying tweets. An example questioning tweet is “@u1 how do you know its an ISIS flag? Can you actually confirm that?”, as shown in Figure 1.1. Notice how questioning words and question marks denote the stance of the tweet.

Commenting A commenting tweet does not exhibit any clear stance with respect to a rumour, or even tries to change the topic without taking any stance with respect to the rumour. An example commenting tweet is “More on situation at Martin Place in Sydney, AU http://t.co/MWMQsw0IHB”, as shown in Figure 1.1. Notice how the tweet does not express any opinion about the rumour, but rather provides a link to a news article about the whole event.

Motivation Previous work analysed stances expressed in tweets around rumours, and provided motivation as to why automatic rumour stance classification is an important task. Procter et al. (2013b) conducted an analysis of a large dataset of tweets related to the riots in the UK that took place in August 2011. After grouping the tweets into topics, they were manually categorised into different classes, namely: 1. media reports, which are tweets sent by mainstream media accounts or journalists connected to media, 2. pictures, which are tweets containing a link to images, 3. rumours, which are tweets claiming or counter-claiming something without giving any source, 4. reactions, which are tweets containing responses of users to the riots or specific events related to the riots. What is interesting for the purposes of our work is that the authors observed the following four-step pattern repeatedly occurring across the collected rumours: 1. a rumour is initiated by someone claiming it may be true, 2. a rumour spreads together with its reformulations, 3. counter claims appear, 4. a consensus emerges about the credibility of the rumour. This led the authors to the conclusion that the process of “inter-subjective sense making” by Twitter users plays a key role in exposing false rumours.

In another work, Mendoza et al. (2010) manually analysed the data from the earthquake in Chile in 2010. The authors selected 7 confirmed truths and 7 false rumours, each consisting of close to 1000 tweets or more. The veracity value of the selected stories was corroborated, and each tweet from each of the news items was manually classified regarding its stance into one of the categories: affirmation, denial, questioning, unknown or unrelated. The study showed that a much higher percentage of tweets about false rumours deny the respective rumours (approximately 50%). This is in contrast to rumours later proven to be true, where only 0.3% of tweets were denials. Based on this, the authors claimed that rumour veracity can be detected using aggregate analysis of the stance expressed in tweets. Moreover, Zubiaga et al. (2016c) described a study on temporal aspects of rumour propagation and support. The authors report a tendency for users to support true rumours more than false rumours as time passes. In summary, the findings of Procter et al. (2013b), Mendoza et al. (2010) and Zubiaga et al. (2016b) provide motivation for research into automating stance classification, suggesting it to be a good indicator of rumour veracity.

Number of classes In the initial work on rumour stance classification, Qazvinian et al. (2011) merged the denying and questioning classes into a single class, resulting in a binary classification problem of supporting vs denying-or-questioning. Moreover, tweets belonging to the commenting class were removed from the dataset, with the justification that there was a small number of such tweets in the authors' dataset. Other works have also considered simplifying scenarios, with Hamidian and Diab (2015) and Zeng et al. (2016b) classifying tweets into supporting and denying classes. Narrowing down the number of classes makes the problem less realistic, because in real-world applications it is not possible to reject tweets based on their label without classifying them first. Moreover, information about the distribution of all classes might bring more in-depth information than just the supporting and denying tweets.

Evaluation Initial work on rumour stance classification considered an evaluation of the task where tweets from multiple rumours were gathered into a single dataset, which was then cross-validated to obtain the final results (Qazvinian et al., 2011). Thus, by pooling together tweets from all the rumours in their collections, both in training and test data, the separation of rumours was ignored.

Another problem with the evaluation from previous work is that temporal dependencies between tweets are ignored (Qazvinian et al., 2011; Zeng et al., 2016b). During cross-validation, later tweets may be used for training a classifier which is then applied for classifying earlier tweets. This is an example of overfitting by “reprobleming”: conducting cross-validation evaluation on time-series data ignores an important aspect of the problem. Similarly, ignoring rumour dependencies could be interpreted as overfitting by “reprobleming” (one reference to overfitting by “reprobleming” can be found at http://hunch.net/?p=22).
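To make the alternative concrete, the following is a minimal sketch of an evaluation split that respects rumour boundaries. It uses scikit-learn's LeaveOneGroupOut, and the tweets, labels and rumour ids below are hypothetical placeholders; the exact experimental settings used in this thesis are described in Chapter 4.

```python
# A minimal sketch of leave-one-rumour-out evaluation: each rumour is held out
# in turn, so no label from the target rumour leaks into the training set.
from sklearn.model_selection import LeaveOneGroupOut

tweets = ["tweet 1", "tweet 2", "tweet 3", "tweet 4"]             # hypothetical
stances = ["supporting", "denying", "supporting", "questioning"]
rumours = ["sydneysiege", "sydneysiege", "ferguson", "ferguson"]  # group per tweet

for train_idx, test_idx in LeaveOneGroupOut().split(tweets, stances, groups=rumours):
    train = [tweets[i] for i in train_idx]  # all tweets from the other rumours
    test = [tweets[i] for i in test_idx]    # all tweets from the held-out rumour
    # ... fit a stance classifier on `train`, evaluate on `test` ...
```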

Methods Different approaches have been considered for rumour stance classification. The seminal work by Qazvinian et al. (2011) considered logistic regression, and follow-up work tried to outperform this method. Zeng et al. (2016b) explored the use of three different classifiers: Random Forests, Naive Bayes and Logistic Regression, and found Random Forests to perform the best. Liu et al. (2015) introduced rule-based methods for stance classification, which were shown to outperform the approach by Qazvinian et al. (2011). Similarly, Zhao et al. (2015b) used regular expressions instead of an automated method for rumour stance classification.
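For illustration, here is a minimal sketch of the kind of text-based logistic regression baseline this line of work builds on; it is a simplification (unigrams and bigrams only, via scikit-learn), not a reimplementation of any of the cited systems, and the training examples are taken from Figure 1.1 rather than a real training set:

```python
# A minimal bag-of-words logistic regression stance classifier (a sketch, not
# the configuration used in the cited papers).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_tweets = [
    "(...) there are two gunmen and a dozen hostages inside the cafe under siege",
    "@u1 how do you know its an ISIS flag? Can you actually confirm that?",
]
train_stances = ["supporting", "questioning"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_tweets, train_stances)
print(model.predict(["@u2 no she can't cos it's actually not"]))
```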


The prior methods ignored an important characteristic of the rumour stance classification problem: the sequential nature of the task. After our work (described in Chapter 7) on using sequence classification for rumour stance classification was published as Lukasik et al. (2016b), further sequence classification methods were explored for the task. Zubiaga et al. (2016a) employed a sequence classification approach based on Conditional Random Fields (CRFs), and demonstrated improvements over baselines (this paper was the result of a project collaboration, although it is not included as a contribution to this thesis). Similarly, Recurrent Neural Networks (RNNs) have been shown to be a competitive approach in the recent RumourEval shared task (Derczynski et al., 2017).

Features Various features have been considered in research on rumour stance classification. The seminal work considered text content from tweets, from which unigrams, bigrams, POS tags and URL counts were extracted, as well as user ids (Qazvinian et al., 2011). This set of features was extended in follow-up work by Hamidian and Diab (2015), who additionally considered Twitter- and network-specific features and so-called pragmatic features. As Twitter-specific features, the authors considered: hashtag strings from a tweet, whether a tweet is a retweet or a reply, and a time feature, which is a binary indicator of whether a tweet was posted during five manually chosen days. For the pragmatic features, the authors considered: a sentiment label for a tweet coming from a deep neural network (Socher et al., 2013), emoticons, named entities, and events extracted using a tool for extracting those from Twitter (Ritter et al., 2011). Overall, the content features coming from Qazvinian et al. (2011) were found to be the most useful, with network features bringing little improvement. Moreover, in their follow-up work, Hamidian and Diab (2016) applied a latent variable model from Guo and Diab (2013) for modeling tweets in a vector space, and reported improvements over previously used features. In a different work, Zeng et al. (2016b) incorporated: n-gram features, part of speech tags, URLs, the Linguistic Inquiry and Word Count (LIWC) features (Tausczik and Pennebaker, 2010), and tweet sentiment features obtained using an external classifier. A work that followed ours, Zubiaga et al. (2016a), incorporated information from Twitter conversation threads into the task.

It is worth noting that most of the work mentioned was published after our first results (described in Chapter 4) were published in Lukasik et al. (2015a). Moreover, we focus on investigating features of rumours not considered before: the varying characteristics of stances expressed around different rumours (Lukasik et al., 2015a), and the varying characteristics of rumours over time (Lukasik et al., 2016b).

Applications The rumour stance classification task is popular in the social media community due to its usefulness. The task can be used by journalists to automatically analyze rumours, and to find particularly interesting rumours based on how they are regarded by the community. Rumour stance classification has been considered in the PHEME project, where journalists benefited from the tools for analysing rumours (Derczynski et al., 2015).

Moreover, rumour stance classification has been shown to be useful for downstream applications. Since it was shown that community stance about rumour veracity correlates with actual rumour veracity (Mendoza et al., 2010), rumour stance classification has been largely inspired by the rumour veracity classification task. Liu et al. (2015) explicitly applied rumour stance classification for classifying the veracity of rumours, and concluded that tweet stances provide important information. Researchers have shown how classifying the stance of tweets from an event can help determine whether it is a rumour or not (Zhao et al., 2015b; Ma et al., 2015). In particular, the questioning class has been conjectured to be very important for determining whether an event is a rumour. This is because uncertainty about rumour veracity is one characteristic of a rumour, and questioning tweets denote uncertainty in the community.

2.3 Modeling Temporal Dynamics of Rumours

There is a plethora of work on modeling the temporal nature of social media. However, there is not much work on modeling rumours specifically. Below we review previous work on modeling both meme and rumour popularity.

Meme Dynamics Modeling meme activity in social media is an active research field, with work dealing with various data sources and problems pertaining to them. Meme dynamics was quantitatively studied by Leskovec et al. (2009), where memes are defined as short phrases that travel through on-line text. In another work, Preotiuc-Pietro and Cohn (2013) modeled hashtag frequency in Twitter, predicting their popularity into the future. Gaussian processes, which we also employ in our experiments, have been shown to work well on this task. Dzogang et al. (2016) analyzed seasonal fluctuations in collective mood expressed in tweets, and correlated them against Wikipedia searches over time. Multiple works incorporate network information when modeling meme spread. For example, the Independent Cascade Model defines the spread of a meme over a network at discrete steps, where each newly infected node infects each of its neighbours with some edge-dependent probability (Kempe et al., 2003) (a simulation sketch is given at the end of this subsection). Other types of models have also been considered for modeling the spread of memes over both time and a network of connections, such as

Hawkes processes (Yang and Zha, 2013) and Cox processes (Zhao et al., 2015a) (we discuss Hawkes processes in Section 3.2.5 and one type of a Cox process in Section 3.2.4). A popular research direction is also network inference under observation of individual post occurrences. In particular, Gomez Rodriguez et al. (2010) considered this problem under the cascade model, whereas Yang and Zha (2013) considered it under the Hawkes process model. In this thesis we do not consider network information, and instead focus on the temporal and textual aspects of rumour propagation. Another problem pertaining to information propagation in social media is the maximization of meme visibility. This can be approached either by selecting the set of most influential users who would start the meme spread (Kempe et al., 2003), or by deciding when the posting activity should happen so that it would be seen by the neighbours in the social network (Karimi et al., 2016; Zarezade et al., 2016). In our work, we do not consider the problem of controlling the activity, but rather try to predict it. Submodular optimization has been used for tackling problems related to information diffusion in social media. For example, Kempe et al. (2003) considered selecting a subset of users in a social network that would allow for the widest spread of a meme. Leskovec et al. (2007) dealt with selecting a subset of blogs to monitor in order to maximize the exposure to memes spreading over the blogosphere. In another work, Gomez-Rodriguez and Schölkopf (2012) employed submodular optimization for network inference from observed cascades of information propagation.
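To make the cascade mechanism described above concrete, the following is a minimal sketch of one simulation of the Independent Cascade Model of Kempe et al. (2003); the graph, edge probabilities and seed set are hypothetical illustration choices, not taken from any of the cited systems.

import random

def independent_cascade(neighbours, edge_prob, seeds, seed=0):
    """Simulate one run of the Independent Cascade Model.

    neighbours: dict mapping each node to a list of its neighbours.
    edge_prob: dict mapping (u, v) edges to infection probabilities.
    seeds: the initially infected nodes.
    Returns the set of all infected nodes once the cascade stops.
    """
    rng = random.Random(seed)
    infected = set(seeds)
    frontier = list(seeds)  # nodes infected in the previous step
    while frontier:
        new_frontier = []
        for u in frontier:
            for v in neighbours.get(u, []):
                # each newly infected node gets a single chance to infect
                # each of its still-uninfected neighbours
                if v not in infected and rng.random() < edge_prob[(u, v)]:
                    infected.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return infected

# toy example: a line graph a -> b -> c with hypothetical probabilities
neighbours = {"a": ["b"], "b": ["c"], "c": []}
edge_prob = {("a", "b"): 0.9, ("b", "c"): 0.5}
print(independent_cascade(neighbours, edge_prob, seeds={"a"}))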

Rumour Dynamics There have been several descriptive studies of rumours in social media regarding their dynamics. Procter et al. (2013a) analyzed rumours in tweets about the 2011 London riots and showed that they follow similar lifecycles, thus providing a cue that this task could be approached in a machine learning paradigm. Friggeri et al. (2014) showed how Facebook constitutes a rich source of rumours and conversation threads on the topic. The authors analyzed the dynamics of rumour reshares and deletions of posts. Karlova and Fisher (2012) proposed a sociological model of information diffusion (neither formalized nor evaluated on real data, however), which tries to explain the process of misinformation and disinformation spread. In this diffusion model, before propagating the information piece, a receiver uses credibility or deception cues in order to filter out the false information. It is also noted that cues to credibility are used by deceivers in order to hide their deception from the listener. In another work, Nel et al. (2010) looked at information propagation across web pages, where referring from a web page to a source is a sign of propagation of information. The authors use various descriptors (e.g. the ratio between the number of sources and the number of links), and use them to cluster events by their publishing behavior. Even though the motivation for their study is rumour detection, they do not experiment with rumours. After our work on rumour dynamics was published as Lukasik et al. (2015b), Zeng et al. (2016a) considered modeling the retransmission rate of tweets from rumours, but did not consider predicting rumour popularity into the future (which we do in Chapters 5 and 6).

2.4 Other Related Problems

Apart from rumour stance classification and rumour popularity modeling, other related applications have been considered in the literature. Below, we briefly review a few related problems.

Rumour Detection Rumour detection is the task of identifying rumours in a stream of social media posts. It can be thought of as clustering posts, followed by classifying each cluster into rumour and non-rumour categories. One setting for rumour detection is retrieving tweets around already known rumours (Qazvinian et al., 2011; Hamidian and Diab, 2015, 2016). This is useful for dealing with long-standing rumours, and can be called rumour tracking, since already known rumours are being followed (Zubiaga et al., 2017). Another setting is where new rumours are detected from the social media stream. Zhao et al. (2015b) built a rumour detection system for Twitter based on tweets expressing uncertainty about a rumour, with five regular expressions (e.g. Is this true?) crafted for identifying such tweets. Similarly, Vosoughi (2015) based his rumour detection algorithm on speech act classification of tweets, with the tweet categories being: assertion, recommendation, expression, question, request, and miscellaneous. Zubiaga et al. (2016b) considered a different approach, where they classify tweets in a breaking news story by whether they constitute a rumour or not, without conducting explicit classification of tweets regarding their stance. The authors employed the Conditional Random Fields classification algorithm, and reported improvements over the method of Zhao et al. (2015b).

Rumour Veracity Classification Predicting rumour veracity is one of the main applications regarding rumours in social media (Derczynski et al., 2015). Rumour veracity classification can be defined as the task of classifying a rumour as true, false or unverified. There is an increasing interest in the scientific community in the problem of determining rumour veracity, as demonstrated, for example, by a special issue on rumours and social media (Papadopoulos et al., 2016). Middleton and Krivcovs (2016) described an approach for geoparsing social media posts in real time, which can help determine the veracity of rumours by tracking down the poster's location. The contribution of Hamdi et al. (2016) to rumour resolution is an automated system that rates the level of trust of users in social media, enabling the filtering out of users with low reputation. Castillo et al. (2011) considered classifying the credibility of newsworthy events in Twitter. To this end, a system composed of two steps was developed: first, an event is classified as newsworthy (or not), and afterwards its credibility is assessed. Rumour veracity classification has also been considered by Vosoughi (2015), who used linguistic features, propagation features and user ids within a Hidden Markov Model, where the latent state at each time step denotes the veracity as perceived at that time.

Stance Classification Stance classification is the task of evaluating stances expressed in text regarding a given target. Apart from being considered in the context of tweets about rumours in social media, stance classification has been considered in other domains. Ferreira and Vlachos (2016) worked on stance classification of news articles with respect to the rumours they report. Stance classification has also been considered in non-rumour applications, with examples of political leaning classification (Zhou et al., 2011) and debate stance classification (Sridhar et al., 2014; Hasan and Ng, 2013) for the detection of agreement and disagreement. Mohammad et al. (2016) define stance detection as a three-way classification problem, where the labels are: positive, negative, and neutral. The authors consider the Twitter domain, and define two SemEval Stance Detection tasks. In the first task, given labelled data about stance expressed in tweets around a few topics (Climate Change is a Real Concern, Feminist Movement, Atheism, Legalization of Abortion and Hillary Clinton), the task is to assign stances to test tweets from the same categories. In the second task, given annotation about stances in tweets about the above-mentioned topics, predictions are made on a topic unobserved in the training data: Donald Trump. Successful approaches employed neural networks and distant supervision for adding annotation in the absence of training data about the target concept (Augenstein et al., 2016). The task differs from rumour stance classification in that rumours are typically of smaller volume (as shown by our datasets described in Section 2.5), and have been shown to exhibit varying proportions of stances over short periods of time (Procter et al., 2013a), characteristics which are not necessarily shared with other social media events in general. Nevertheless, similar techniques may be used for addressing the rumour stance classification problem in different domains. In particular, we demonstrate the usefulness of a sequence classification approach in Chapter 7.

Fact Checking Fact checking is the task of evaluating the truthfulness of claims expressed in natural language, usually made by public figures. Vlachos and Riedel (2014) defined the task in an open-domain setting, not restricting it to any particular source of information (such as social media). Fact checking is not necessarily equivalent to rumour veracity classification: the claims are not necessarily rumours, as they do not necessarily attract large interest from the community, or disagreements about their truthfulness. Vlachos and Riedel (2015) limited the scope to 16 numerical properties of countries (such as population), and employed distant supervision for the identification and verification of claims. In follow-up work, the system was extended to include temporal expressions, so that the temporal context of a claim could be taken into account (Thorne and Vlachos, 2017). In another branch of work, researchers proposed methods for detecting facts in data. Hassan et al. (2014) proposed a tool for extracting and monitoring facts in a data stream. In another work, Hassan et al. (2015) considered the problem of classifying sentences into factual and non-factual, allowing journalists to focus on the check-worthy sentences.

Modeling Disinformation Disinformation is defined by the English Oxford Dictionary as "deliberately false information". Karlova and Fisher (2012) note that disinformation is very often viewed in the literature as a type of misinformation (the English Oxford Dictionary defines misinformation as "wrong or misleading information"). Gupta et al. (2013) study verbal deception (defined by the authors as disinformation happening during a discourse between two entities) and emphasize that what is false in the context of disinformation is determined by the state of beliefs of the speaker, rather than by what is objectively true. Therefore, the common-sense definition of truth does not play a big role here. This is different from the case of misinformation. As for studies on modeling disinformation, Fornaciari et al. (2013) analyzed the personality type of an author as a feature for deception detection. The authors base their personality analysis on the Big5 traits (Norman, 1963): extraversion, emotional stability/neuroticism, agreeableness, conscientiousness, and openness to experience, each of which can be measured either quantitatively on a continuous scale from -1 to 1 or on a nominal scale (Y, O, N). The authors demonstrated the usefulness of personality cues for deception detection by outperforming baselines, and conducted clustering of user personality types. In another study, Ratkiewicz et al. (2011) considered the task of detecting political astroturf (the practice of masking the sponsors of posts, which is a form of disinformation through pretending to be independent) in Twitter. The authors employed sentiment features as well as many network features (such as the number of users and edges between them, or the mean size of connected components in the user graph). Ratkiewicz et al. (2011) concluded that network features are more discriminative than sentiment features for the task of detecting political astroturf.

Rumour                #Tweets   #Supports   #Denies   #Questions
Army bank                 177          62        42           73
Children's hospital      1415         796       487          132
London Eye                632         177       295          160
McDonald's                190         177         0           13
Miss Selfridge's         3157        3150         0            7
Police beat girl          796         783         4           95
London zoo                844         616       129           99
Total                    7297        5761       957          579

Table 2.1: Counts of tweets with supporting, denying or questioning labels in each rumour collection from the England riots dataset.

2.5 Rumour Datasets

In this section we review the rumour datasets in the literature. We focus on the two datasets that we make use of in this thesis: the England riots dataset introduced by Procter et al. (2013b) and the PHEME datasets introduced by Zubiaga et al. (2016c), and follow with an overview of other datasets.

2.5.1 England Riots Dataset

The England riots dataset consists of seven rumours circulating on Twitter during the England riots in 2011 (see Table 2.1). The rumours were as follows (Procter et al., 2013b):

• The Army was being mobilised in London to deal with the rioters (Army bank).

• Rioters were gathering to attack Birmingham's Children's Hospital (Children's hospital).

• Rioters had set the London Eye on fire (London Eye).

• Rioters had broken into a McDonalds and set about cooking their own food (McDonald's).

• A store belonging to the Miss Selfridge retail group had been set on fire in Manchester (Miss Selfridge’s).

• Police had beaten a sixteen year old girl (Police beat girl).

• Rioters had attacked London Zoo and released the animals (London zoo).

Children's hospital
support:  Birmingham Children's hospital has been attacked. F***ing morons. #UKRiots
deny:     Girlfriend has just called her ward in Birmingham Children's Hospital & there's no sign of any trouble #Birminghamriots
question: Birmingham children's hospital guarded by police? Really? Who would target a childrens hospital #disgusting #Birminghamriots

Army bank
support:  Is it true the army has assembled at bank and they are now looting London zoo??? Seriously wtf #londonriots
question: Are you hearing news of the army preparing at Bank? #londonriots
deny:     In my defence I was looking at the pics on my phone! So once and for THE ARMY IS NOT IN BANK. #londonriots

London Eye
support:  breaking news: rioters have pushed the london eye over #Londonriots
support:  Oh my god! This can't be happening at London Eye! #Londonriots #Londonriot #Prayforlondon http://twitpic.com/6372vo
question: WTF ?1 is the londoneye really on fire ? #londonriots http://t.co/EpPZcwR
deny:     RT @user286: 02:28 - No reports of the London Eye being set alight. #LondonRiots

Table 2.2: Tweets pertaining to rumours about: Children's hospital, Army bank and London Eye during the 2011 England riots.

McDonald's
support:  RT @user287: teenagers in london bumrush mcdonalds to cook they're own food? #riotswag
support:  RT @user288: @user289: Shop looted and youths storm McDonald's and start cooking their own food
question: Daily Mail reporting ppl cooking their own McDonald's during the #tottenham riot. Can't your wack reporters find more meaningful details?

Miss Selfridge's
support:  RT @user289: Man who sets fire to Miss Selfridge during #manchesterriots has his home set on fire. http://t.co/AbxhdCY
support:  This is the dickhead that set fire to Miss Selfridges! #NameAndShame #ManchesterRiots http://bit.ly/o67BgS; how could you!

London zoo
support:  RT @user293 #tottenham Monkeys in a zoo. Get London Zoo in to round 'em up. Put them on show and let's have at them. Idiots
support:  RT @user294: #londonriots oh my god - reports of tigers roaming around Primrose Hill #londonzoobreakin http://t.co/j2DjbOZ
question: Can anyone confirm/dispel rumors about zoo animals on the loose at #Londonriots ? I'm prepared for it to be lies...but on the off chance...
deny:     Just heard a tiger's been released from London Zoo - but then heard that it's not true http://t.co/QGDxIPI #londonriots #zoo via @user295

Table 2.3: Tweets pertaining to rumours about: McDonald's, Miss Selfridge's and London zoo during the 2011 England riots.

Police beat girl
question: RT @user290: Is anyone ever gna show the 16 year old girl getting beaten by the police in tottenham? #riotsdebate
support:  RT @user291: Police attacking 16-year-old girl in Tottenham, setting the whole thing off. http://ow.ly/5Yy6y #londonriots #ukriots
support:  RT @user292: 16 YEAR OLD GIRL BATTERED BY #TOTTENHAM RIOT POLICE - THIS IS WHAT STARTED IT ALL!: http://tumblr.com/xjh3yqwfr3
question: #Tottenham; Is this the footage of 16yr old girl beaten by police? http://t.co/ZFD8TPk NSFW. Looks like about 10 cops setting on someone

Table 2.4: Tweets pertaining to rumours about Police beat girl during 2011 England Riots.

The dataset was collected by tracking a long set of keywords associated with the event. The dataset was analysed and annotated manually as supporting, questioning, or denying a rumour, by a team of social scientists studying the role of social media during the riots (Procter et al., 2013b). As can be seen from the dataset overview in Table 2.1, different rumours exhibit varying proportions of supporting, denying and questioning tweets, which was also observed in other studies of rumours (Mendoza et al., 2010; Qazvinian et al., 2011). These variations in the number of tweets for each class across rumours pose a challenge when building a model of rumour stances. In Tables 2.2, 2.3 and 2.4 we report sample tweets from the England riots rumours together with their stances. Each rumour is initiated by a tweet stating the rumour (in Tables 2.2 and 2.3 the first tweet we display for each rumour is the initiating tweet). Notice how supporting tweets often exhibit a strong emotional reaction, usually negative, expressed by words such as "F***ing" in the case of the Children's hospital rumour, and "wtf" in the case of the Army bank rumour. Questioning tweets express doubt about rumour veracity, with example phrases like "really?" for the Children's hospital rumour, "can anyone confirm" for the London zoo rumour, and "is this the footage of" for the Police beat girl rumour. Denying tweets often contain words expressing negation, e.g. "no reports of" for the London Eye rumour or "not true" for the London zoo rumour.3

3In the later chapters we use Brown clusters trained on a large-scale Twitter corpus as one approach to representing text. Negating words are clustered together, as shown in the on-line summary of the Brown clusters at http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html (e.g. cluster 001000).

This dataset does not contain all the tweets pertaining to these rumours. It is therefore not suitable for rumour popularity modeling because, when predicting popularity into the future, reliable evaluation requires access to information about how many tweets occurred in total (see Section 5.4 for a description of our evaluation of rumour popularity prediction). The next dataset that we consider is composed of fully observed rumours, which is partially due to how rumours are defined, and thus is applicable for rumour popularity modeling.

2.5.2 PHEME Dataset

The PHEME dataset we make use of is associated with five different events and was collected as part of the PHEME FP7 research project. The dataset is described in detail in Zubiaga et al. (2016c, 2015b). In contrast to the England riots dataset, rather than collecting tweets from the whole event and then clustering tweets into different rumours, the PHEME datasets were collected by tracking conversations initiated by rumour tweets. This was done by first collecting tweets containing a set of keywords associated with a story unfolding in the news. Next, the most retweeted tweets were selected, and the conversations around them were collected. By a conversation we mean the tree of tweets in which the root node is a seed tweet and the edges are reply-to actions. Notice that in the England riots dataset replying tweets might be missing, whereas the PHEME dataset comprises replying tweets by definition. In Table 2.5 we report basic statistics about the five largest events from the PHEME dataset. A subset of the PHEME rumour datasets has been annotated for stance (Zubiaga et al., 2016c). While the authors annotated and released 9 datasets, here we make use of the 5 largest. The data includes tweets associated with the following five events, each of which comprises multiple rumours:

• Ferguson unrest: Citizens of Ferguson (USA) protested after the fatal shooting of an 18-year-old African American, Michael Brown, by a white police officer on August 9, 2014. Example rumours around this event are: Fox News not reporting the Ferguson riots at the time, and the police not allowing anybody to enter Ferguson from the outside.

• Ottawa shooting: Shootings occurred on Ottawa's Parliament Hill in Canada, resulting in the death of a Canadian soldier on October 22, 2014. Example rumours around this event are: a soldier being fatally shot, and the soldier who was killed having a six-year-old son.

• Sydney siege: A gunman held hostage ten customers and eight employees of a

Lindt chocolate café located at Martin Place in Sydney, Australia, on December 15, 2014. Example rumours around this event are: the gunman or the hostages making contact with the media, and hostages escaping from the café.

• Charlie Hebdo shooting: Two brothers forced their way into the offices of the French satirical weekly newspaper Charlie Hebdo in Paris, killing 11 people and wounding 11 more, on January 7, 2015. Example rumours around this event are: the suspects saying they want to die as martyrs and, later on, a suspect being dead.

• Germanwings plane crash: A passenger plane from Barcelona to Düsseldorf crashed in the French Alps on March 24, 2015, killing all passengers and crew on board. The plane was ultimately found to have been deliberately crashed by its co-pilot. Example rumours around this event are: the co-pilot suffering from depression, and the co-pilot being a convert to Islam.

Zubiaga et al. (2016c) relied on a different scheme for annotating tweets for stance than the one used by Procter et al. (2013b). The authors annotated tree-structured conversation threads where a source tweet initiates a rumour and a number of replies follow responding to it. Given this structure, the source tweet of a Twitter conversation is annotated as supporting, denying or underspecified, and each subsequent tweet is annotated as agreed, disagreed, appeal for more information (questioning) or commenting with respect to the source tweet (notice this annotation is not with respect to the previous tweet in the conversation, but only to the first tweet posted about the rumour, which we call the source tweet). In order to align the stance annotations with the ones from the England riots dataset, we convert these labels into the four stances: supporting, denying, questioning and commenting. To perform this conversion, we first remove rumours where the source tweet is annotated as underspecified4, keeping the rest of the source tweets as supporting or denying. For the subsequent tweets, we keep the label as is for tweets that are questioning or commenting. To convert the tweets that agree or disagree into supporting or denying, we apply the following set of rules (a code sketch of these rules is given below): (1) if a tweet agrees with a supporting source tweet, we label it supporting; (2) if a tweet agrees with a denying source tweet, we label it denying; (3) if a tweet disagrees with a supporting source tweet, we label it denying; and (4) if a tweet disagrees with a denying source tweet, we label it supporting. We summarise the details of the datasets and the frequencies of the different stances in Table 2.6. Note that the commenting label accounts for the majority of the tweets. Table 2.7 shows tweets from example rumours together with the derived annotations. Overall, similar characteristics can be observed as in the case of the England

4It is not clear what a supporting or denying tweet with respect to an underspecified tweet should be labeled as.

Dataset          #Rumours   #Tweets   #Tweets/#Rumours
Ottawa                475      6021              12.68
Ferguson riots        291      6334              21.77
Charlie Hebdo         458      6397              13.97
Sydney siege          522      7632              14.62
Germanwings           238      2018               8.48

Table 2.5: Statistics of rumours from the PHEME datasets (Zubiaga et al., 2016c).

Dataset          #Rumours   #Tweets   #Supports   #Denies   #Questions   #Comments
Ottawa                 58       782         161        76           64         481
Ferguson riots         46      1017         161        82           94         680
Charlie Hebdo          74      1053         236        56           51         710
Sydney siege           71      1124          89       223           99         713
Germanwings            68       386         177        12           28         169

Table 2.6: Statistics and distribution of labels for the annotated subset of the PHEME rumour datasets (Zubiaga et al., 2016c). Each dataset consists of a number of rumours (reported in the #Rumours column). The remaining columns show the aggregated counts of tweets expressing different stances across all rumours within a dataset.

riots, with negation words in denying tweets (e.g. “not” in the Sydney Siege rumour), and question words used in questioning tweets (e.g. “how do you know” in the Sydney Siege rumour). Notice how the commenting tweets do not express any direct opinion about the rumour.
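The label-conversion rules described above can be condensed into a short sketch; the label strings are hypothetical choices for illustration, not the annotation scheme's literal identifiers.

def convert_label(tweet_label, source_label):
    """Map PHEME thread annotations onto the four England-riots-style stances.

    tweet_label: annotation of a (non-source) tweet with respect to the
        source tweet: "agreed", "disagreed", "questioning" or "commenting".
    source_label: stance of the source tweet, "supporting" or "denying"
        (rumours with an "underspecified" source tweet are removed upfront).
    """
    if tweet_label in ("questioning", "commenting"):
        return tweet_label  # kept as is
    if tweet_label == "agreed":
        return source_label                      # rules (1) and (2)
    if tweet_label == "disagreed":               # rules (3) and (4)
        return "denying" if source_label == "supporting" else "supporting"
    raise ValueError("unexpected label: " + tweet_label)

assert convert_label("disagreed", "denying") == "supporting"  # rule (4)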

As mentioned, contrary to the England riots dataset, the PHEME dataset comprises complete conversation threads within the observed window of time. This dataset is therefore more reliable than the England riots dataset with respect to containing all tweets pertaining to a rumour, and thus better suited to modeling rumour dynamics over time, as there are no missing tweets between those that are observed. In Figure 2.1 we depict several example rumour profiles from the Ferguson riots dataset. Notice that although there is a rich diversity of rumour profiles, several similarities are evident, such as several rumours having very high frequency at their beginning (e.g., from initial reporting or lively discussion), and most rumours having unimodal frequency profiles (e.g., only capturing community attention briefly before fading away).

Sydney Siege: ISIS flags remain on display during the Sydney Siege
support:  We understand there are two gunmen and up to a dozen hostages inside the cafe under siege at Sydney.. ISIS flags remain on display #7News
question: @u1 sorry - how do you know it's an ISIS flag? Can you actually confirm that?
question: Have you actually confirmed its an ISIS flag or are you talking sh*t
deny:     @u2 no she can't cos it's actually not

Ottawa shooting: Soldiers are back the same day guarding the war memorial after the Ottawa shooting incident
support:  These are not timid colours; soldiers back guarding Tomb of Unknown Soldier after today's shooting #StandforCanada –PICTURE–
deny:     @u1 This photo was taken this morning, before the shooting.
comment:  @u1 More on situation at Martin Place in Sydney, AU –LINK–

Charlie Hebdo: The gunmen said: You tell the media it was al-Qaeda in Yemen
support:  Witnesses say the Charlie Hebdo gunmen identified themselves as members of al-Qaida: http://t.co/WSEe7PGIdY http://t.co/fTzk4xzZIr
comment:  @u3 witnesses may have but President never will not will he utter #islam #jihad or #muslim in association with this!
support:  I strongly condemn this terrorist attack as a Turkish Muslim. Terrorists Cannot represent Islam #notinmyname

Table 2.7: Tweets pertaining to example PHEME rumours.

[Figure 2.1: line plots of the number of tweets (#Tweets, y-axis, 0-20) against time (Time [HH:MM], x-axis, 01:56-13:01) for several example rumours.]

Figure 2.1: Time profiles of several example rumours in the Ferguson data set happening during 13/8/2013.

2.5.3 Other Rumour Datasets

Related work reports experiments on multiple other rumour datasets. Qazvinian et al. (2011) gathered tweets from five long-standing rumours and annotated them for stance with respect to their veracity. Thus the characteristics of the data are quite different from those of the datasets listed above (instead of multiple short rumours, Qazvinian et al. (2011) considered a few long-running rumours). The commenting label was removed from the dataset before conducting the experiments, and the rejecting and questioning stances were collapsed together. The dataset has been used in follow-up works (Hamidian and Diab, 2015, 2016), however it is not publicly available. Zeng et al. (2016b) gathered five rumours around the Sydney siege in December 2014. Data was collected using curated sets of keywords and hashtags, and annotated for support, deny and neutral classes. The neutral class was removed from the analysis, leaving 2906 supporting and 1469 denying tweets. The dataset is not publicly available. Friggeri et al. (2014) gathered several thousand rumours from Facebook. Rumours were assigned a veracity category based on the Snopes website annotations.5 The dataset has not been annotated for stance classification, and has not been publicly released. Vosoughi (2015) gathered 938,806 tweets coming from the 2013 Boston Marathon bombings, the 2014 Ferguson unrest and the 2014 Ebola epidemic (plus other less prevalent sources), and used the dataset to experiment with rumour detection and rumour veracity classification. The dataset is not publicly available. Ferreira and Vlachos (2016) introduced a dataset of rumours described by on-line news articles. Data is gathered from the website emergent.info, which provides rumours collected by journalists. The dataset provides the stance label of each article with respect to the veracity of a rumour. The stances are: supporting,

denying and neutral. The dataset also provides the veracity of each rumour (if journalists were able to resolve it; otherwise the annotation denotes that the rumour is unsubstantiated). Even though interesting, this dataset is beyond the scope of this thesis, as we focus on rumours discussed in social media rather than in news articles.

5www.snopes.com

2.6 Conclusions

In this chapter we reviewed the literature on rumours in social media, and discussed two applications: rumour stance classification and rumour popularity prediction. We found that the previous state of research on rumour stance classification fails to consider a number of aspects of the task. First and foremost, its evaluation is not realistic, as the way the cross-validation folds were chosen ignores the temporal ordering of tweets by drawing tweets uniformly across time. Additionally, the cross-validation was conducted uniformly across tweets pooled together from multiple rumours, meaning the same rumour could appear in multiple folds. Moreover, a limited number of stances was considered in the experiments, by either ignoring some of the stances or collapsing different stances together. Also, features such as time and the rumour a tweet pertains to were not used. As for the rumour popularity prediction task, little work has been done in this direction. In this thesis, we aim to define this problem and consider appropriate models. In the next chapter we introduce the machine learning methods that we make use of when tackling the rumour stance classification and rumour popularity prediction problems.

Chapter 3

Probabilistic Models for Classification and Temporal Modeling

The previous chapter reviewed literature on rumours in social media, and motivated research in two directions: rumour stance classification and rumour popularity prediction. In this chapter we introduce and motivate the machine learning approaches, namely Gaussian processes and point processes, which we use for approaching the aforementioned rumour applications.

3.1 Gaussian Processes

In this section we describe the probabilistic model we use for solving both problems: Gaussian processes (GPs). Not only are they flexible enough to allow for approaching both applications, but they also have multiple properties which lead to them outperforming strong baselines, as we show in the experimental chapters. Our discussion and notation follow those of Rasmussen and Williams (2005). Recall that our notation is summarized in the Nomenclature on page xii.

3.1.1 Motivation

Gaussian processes are a Bayesian non-parametric machine learning framework that has been shown to work well for a range of NLP and social media problems, often beating other state-of-the-art methods (Cohn and Specia, 2013; Lampos et al., 2014; Beck et al.,


2014; Preotiuc-Pietro et al., 2015). They are most widely used for regression; however, as we are going to show, they are flexible enough to be used for other problems, including classification, Poisson regression and point process modeling. Gaussian processes exhibit many useful properties which make them appealing. As a non-parametric model, we do not need to specify a fixed parametric form of the relationship between the outputs and the inputs (in our applications it is not clear what parametric families would be appropriate, as can be seen from the example rumour frequencies in Figure 2.1). This probabilistic kernelised framework avoids the need for expensive cross-validation for hyperparameter selection,1 instead providing a way of directly optimizing the hyperparameters by maximizing the marginal likelihood of the data. See Section 3.1.5 for a discussion of how the marginal likelihood can be used for optimizing kernel hyperparameters in the GP framework. GPs average over many predictive distributions instead of taking a single prediction, which helps avoid overfitting. Also, GPs provide information about the uncertainty of their predictions, which allows a user to decide whether a prediction is reliable. Even though we do not use the uncertainty information in this work, we demonstrate that mean predictions from GPs often work better than the baselines, and in future work the uncertainty information could be used as an additional source of information.

1There exist frequentist kernel methods, such as SVMs, which additionally require extensive held-out parameter tuning.

3.1.2 Model

We are interested in modeling outputs y = {y1, . . . , yN} over inputs X = {x1, . . . , xN}, where the output yi corresponds to the input xi. We can consider different kinds of outputs yi, e.g. in the case of classification yi is a categorical scalar value (e.g. does a tweet xi support a rumour or not?), in the case of regression yi is a real scalar value (e.g. a stock price corresponding to a Twitter discussion represented as xi), and in the case of Poisson regression yi is a non-negative integer (e.g. how many re-tweets of a tweet xi are going to occur?). Similarly, different types of inputs xi may be considered (e.g. vectors representing textual information and scalar values representing time).

A Gaussian process specifies a distribution over functions f(x) ∼ GP(m(x), k(x, x′)) defined over inputs x, where m(x) is the mean function specifying the expected value of the function f at argument x, m(x) = E[f(x)], and k(x, x′) is the kernel function, specifying the covariance between a pair of arguments, k(x, x′) = E[(f(x) − m(x))(f(x′) − m(x′))]. A GP prior states that the joint distribution of the function outputs {f1 = f(x1), . . . , fN = f(xN)} corresponding to the finite set of inputs {x1, . . . , xN} is a multi-variate Normal distribution,

f ∼ N(µ = m(X), Σ = K(X, X)),   (3.1)

where

m(X) = [m(x1), . . . , m(xN)]⊤   (3.2)

denotes the mean function evaluated at the inputs, and

K(X, X) =
    [ k(x1, x1)  · · ·  k(x1, xN) ]
    [     ⋮        ⋱        ⋮     ]   (3.3)
    [ k(xN, x1)  · · ·  k(xN, xN) ]

denotes the Gram matrix, i.e., the matrix of kernel evaluations between every pair of inputs. Consequently, the joint distribution of both the latent function values f at the training inputs X, and the latent function values f∗ = {f∗1, . . . , f∗M} at the test inputs X∗ = {x∗1, . . . , x∗M}, is

[ f  ]       ( [ m(X)  ]   [ K(X, X)      K(X, X∗)  ] )
[    ]  ∼  N ( [       ] , [                        ] )   (3.4)
[ f∗ ]       ( [ m(X∗) ]   [ K(X, X∗)⊤   K(X∗, X∗)  ] )

where K(X, X∗) denotes the matrix of kernel values between the training data points (corresponding to rows) and the test data points (corresponding to columns), K(X∗, X∗) denotes the matrix of kernel values between the test data points, and K(X, X) denotes the matrix of kernel values between the training data points.

A Gaussian process can be thought of as specifying a distribution p(f|X) over multi-dimensional objects f = {f(x1), . . . , f(xN)} conditioned on inputs X = {x1, . . . , xN}. We call the distribution p(f|X) the Gaussian process prior. The GP prior is used together with a likelihood of the outputs y conditioned on the latent function values f, p(y|f), to explain the outputs y conditioned on the inputs X (different likelihoods may be useful depending on the type of the outputs, as we discuss later in Section 3.1.3). In Figure 3.1 we illustrate how x, f and y relate to one another within the GP framework.

Mean function The mean function conveys information about what values the latent function should take in expectation. One could encode prior knowledge about the reasonable values the function could take. In the absence of such knowledge, the mean function is typically set to m(x) = 0 (Bishop, 2006; Preotiuc-Pietro, 2014; Beck, 2017), which is the approach we take in this thesis.


Figure 3.1: A graphical representation of the relation between the input x, the latent function value f, and the output y in the Gaussian process framework. Both the input x and the output y are observed (denoted by the colour grey), whereas the function value f is latent (denoted by the colour white). Random variables modeled by the GP are denoted by circles, and the input (which is not modeled) is denoted by the square. p(f|x) is modeled using a GP prior, whereas p(y|f) is modeled using a likelihood function.

Kernel function The kernel function encodes prior information by specifying how the outputs covary as a function of the inputs. As shown in Equation (3.1), a kernel specifies the covariance matrix, which in turn controls how the different dimensions in the multi-variate Normal distribution are correlated. Depending on what input representation is used and what assumptions about the dependencies between function outputs at different inputs are made, different kernel functions may be appropriate. We discuss kernel functions in Section 3.1.5.
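To make Equations (3.1)-(3.3) concrete, the following minimal sketch draws functions from a zero-mean GP prior; the RBF kernel used here is defined formally in Section 3.1.5, and the hyperparameter values are arbitrary illustration choices.

import numpy as np

def rbf_kernel(X, X2, sigma=1.0, kappa=10.0):
    """RBF kernel k(x, x') = sigma * exp(-kappa * ||x - x'||^2)."""
    sq_dists = ((X[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sigma * np.exp(-kappa * sq_dists)

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 100)[:, None]       # inputs x_1, ..., x_N
K = rbf_kernel(X, X)                      # Gram matrix K(X, X), Eq. (3.3)
m = np.zeros(len(X))                      # zero mean function, Eq. (3.2)
# three draws of f ~ N(m(X), K(X, X)), Eq. (3.1); jitter aids stability
f = rng.multivariate_normal(m, K + 1e-8 * np.eye(len(X)), size=3)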

Likelihood and posterior The Gaussian process prior p(f|X) combined with the likelihood p(y|f), i.e., the probability of the outputs conditioned on the latent function values f, gives rise to the posterior distribution of the function values, p(f|y, X). Notice that the likelihood is independent of the inputs X, p(y|f, X) = p(y|f). The likelihood explains how the latent function values f lead to the observed output values y by encoding an assumption about how the observations are generated from the modeled latent function (in the case of regression, p(y|f) is often assumed to be a Gaussian distribution; we discuss different likelihoods and their role in Section 3.1.3). The GP posterior over the training inputs can be found by combining the likelihood and the prior using Bayes' rule,

p(f|y, X) = p(y|f) p(f|X) / p(y|X).

Depending on the assumed likelihood, the posterior may or may not be found in a closed form. In Section 3.1.3 we discuss different choices of likelihoods and their impact on the form of the posterior distribution p(f|y, X). The posterior over the latent function values at the training inputs can be used to compute the predictive distribution of the latent function value at a test input,

p(f∗|X, y, x∗) = ∫ p(f∗|X, x∗, f) p(f|y, X) df,   (3.5)

where f∗ = f(x∗). The latent function value at a test input x∗ can then be used for finding the predictive distribution over the test output y∗,

p(y∗|X, y, x∗) = ∫ p(y∗|f∗) p(f∗|X, y, x∗) df∗,   (3.6)

where p(y∗|f∗) is the likelihood of the test output y∗ conditioned on the latent function value f∗ modeled over the test input x∗. Gaussian processes can be applied to a range of problems, which may require employing different likelihoods. In this thesis we apply GPs to four problem types, namely regression, classification, Poisson regression and point process modeling. Depending on the type of problem, the distributions from Equations (3.5) and (3.6) can take different forms. In the case when the prior and the likelihood are conjugate (as in the case of regression) both can be found in closed form. Otherwise, one needs to resort to approximation techniques for solving the integrals in Equations (3.5) and (3.6). The predictive distribution over the output y∗ from Equation (3.6) involves a one-dimensional integral over the latent function value f∗, and so in general simple numerical techniques are sufficient (Rasmussen and Williams, 2005), such as numerical quadrature or Monte Carlo sampling. The integral in Equation (3.5) is over multiple dimensions of the latent function values f, and so is more challenging. In Section 3.1.4 we describe methods that can be used to approximate the posterior distribution p(f|y, X) as a Normal distribution, leading to tractable computations. Below we explain how GPs can be applied to regression, classification and Poisson regression, whereas the application of GPs to point process modeling is described in Section 3.2.

3.1.3 Outputs

Depending on what output is being modeled, and consequently what likelihood one chooses, inference may lead to a closed-form solution or require approximations. Here we review three types of outputs and how GPs may be applied to them: regression, classification and Poisson regression. For each output type we specify an appropriate form of the likelihood of observations given the latent function values, p(y|f) (note that different likelihoods may be applicable for the same output type, e.g. the Student-t distribution can be used instead of the Normal distribution for regression outputs, leading to more robust predictions (Rasmussen and Williams, 2005)), and discuss the inference.

Regression In single-output regression problems a continuous scalar value is modeled over the input space. Example regression problems are predicting the future stock price of a company based on indicators describing past stock movement (Bitvai and Cohn, 2015b), or predicting continuous emotion scores from text (Beck et al., 2014). In the case of regression, the likelihood is a Gaussian, p(y|f) = N(f, ρI), where ρ denotes the variance of the Gaussian likelihood and can be learnt together with the kernel hyperparameters by maximizing the evidence p(y|X) (for more details on optimizing hyperparameters via evidence maximization see Section 3.1.5).2 The Gaussian distribution is a conjugate prior for this likelihood function, meaning that the resulting posterior distribution is also Gaussian (Rogers and Girolami, 2012). Therefore, the predictive distribution from Equation (3.6) can be expressed in closed form as a Gaussian p(y∗|X, y, x∗) = N(y∗|µ∗, σ∗), with the mean given by

µ∗ = K(x∗, X)(K(X, X) + ρI)⁻¹y,   (3.7)

and the variance expressed by

σ∗ = k(x∗, x∗) − K(x∗, X)(K(X, X) + ρI)⁻¹K(x∗, X)⊤.   (3.8)
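Equations (3.7) and (3.8) translate almost line-for-line into code. The following is a minimal sketch, assuming an RBF kernel (Section 3.1.5) and synthetic data; a practical implementation would solve against a Cholesky factor of K(X, X) + ρI rather than forming an explicit inverse.

import numpy as np

def rbf(A, B, sigma=1.0, kappa=10.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma * np.exp(-kappa * d2)

def gp_regression_predict(X, y, X_star, rho=0.1):
    """GP regression predictive mean and variance, Eqs. (3.7)-(3.8).

    A sketch using an explicit inverse for clarity only.
    """
    K = rbf(X, X)                        # K(X, X)
    K_star = rbf(X_star, X)              # K(x*, X)
    A_inv = np.linalg.inv(K + rho * np.eye(len(X)))
    mu_star = K_star @ A_inv @ y                                  # Eq. (3.7)
    cov_star = rbf(X_star, X_star) - K_star @ A_inv @ K_star.T   # Eq. (3.8)
    return mu_star, np.diag(cov_star)    # pointwise predictive variances

X = np.linspace(0.1, 0.9, 20)[:, None]
y = np.sin(4 * np.pi * X[:, 0]) + 0.3 * np.random.default_rng(1).normal(size=20)
mu, var = gp_regression_predict(X, y, np.linspace(0, 1, 50)[:, None])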

Classification In single-output classification problems a categorical variable is modeled over the input space. Example classification problems are predicting the occupational class of a user based on their Twitter posts (Preotiuc-Pietro et al., 2015), predicting the stance of a tweet with respect to a rumour it is discussing (Qazvinian et al., 2011), and classifying gender based on an image of a face (Jia et al., 2016). Here we consider GPs for binary classification, the special case of classification where only two classes are considered. One way of modeling multi-class classification is by learning multiple one-vs-all binary classifiers.3 In binary classification the latent function is mapped by a squashing function Φ(f) (e.g. probit or logit) into the range [0, 1], such that the resulting value can be interpreted as the probability of the positive class, p(y = 1|x). The likelihood of N outputs y =

{y1, . . . , yN} given the corresponding N latent function values f = {f1, . . . , fN} is equal to

p(y|f) = ∏_{n=1}^{N} Φ(fn)^{yn} (1 − Φ(fn))^{1−yn},   (3.9)

the product of Bernoulli likelihoods of the classes y1, . . . , yN. The posterior distribution over the latent function values from Equation (3.5) is intractable due to the non-conjugacy of the Normal distribution and the Bernoulli likelihood.

2Notice the assumption of homoscedastic noise, meaning that the noise is independent of f. In principle, one could introduce a dependence of the noise on f.
3Another approach could be to directly model the multiple classes with the soft-max likelihood function.
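For concreteness, the log of the likelihood in Equation (3.9) can be computed as follows; a minimal sketch, assuming the probit squashing function implemented via scipy's standard Normal CDF.

import numpy as np
from scipy.stats import norm

def log_probit_likelihood(y, f):
    """log p(y|f) for binary labels y in {0, 1}, Eq. (3.9),
    with the probit squashing function Phi (standard Normal CDF)."""
    p = norm.cdf(f)  # Phi(f_n) = p(y_n = 1 | f_n)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))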

Poisson regression Poisson regression is a problem where count outputs are predicted. Example applications of Poisson regression are predicting the number of tweets in a fixed time interval or predicting the number of casualties of an earthquake event. For Poisson regression, the likelihood of outputs given the latent function values is

p(y|f) = ∏_{n=1}^{N} Poisson(yn | exp(fn)),   (3.10)

where the Poisson distribution is given by

Poisson(y|λ) = λ^y exp(−λ) / y!.   (3.11)
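The likelihood in Equations (3.10)-(3.11) is straightforward to evaluate in log space; a minimal sketch, assuming numpy and scipy (gammaln(y + 1) stands in for log y!):

import numpy as np
from scipy.special import gammaln

def log_poisson_likelihood(y, f):
    """log p(y|f) for count outputs, Eqs. (3.10)-(3.11), with rate
    lambda_n = exp(f_n)."""
    return np.sum(y * f - np.exp(f) - gammaln(y + 1.0))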

The exponential in the likelihood in Equation (3.10) is used in order to enforce a non-negative rate parameter. As in the case of classification, the posterior distribution from Equation (3.5) is intractable due to the non-conjugacy of the Normal distribution and the Poisson likelihood. Next, we describe approximation techniques which can be used for obtaining the distribution p(f∗|X, y, x∗).

3.1.4 Approximate Inference

The integral from Equation (3.5) is intractable in the classification and Poisson regression settings. One method for dealing with this problem is approximating the posterior distribution p(f|y, X) as a Gaussian, thereby leading to a tractable solution of Equation (3.5).4 In this thesis we make use of two techniques: Laplace approximation and Expectation Propagation, both of which yield a Gaussian posterior, making the computations tractable and leading to an analytical solution to the integral in Equation (3.5).

4Alternative solutions exist, such as solving the integrals via sampling (Nickisch and Rasmussen, 2008).

Laplace approximation approximates the GP posterior p(f|y, X) by a Gaussian distribution q(f|y, X) based on the first and second derivatives of the logarithm of the unnormalized posterior (Rasmussen and Williams, 2005). This is obtained by a Taylor expansion of the logarithm of the unnormalized posterior log p(f|X, y) around its mode, yielding the approximation:

q(f|y, X) = N(f | f̂, A⁻¹),   (3.12)

where

f̂ = argmax_f log p(f|y, X),   (3.13)

A = −∇∇ log p(f|y, X)|_{f=f̂}.   (3.14)

A is the Hessian of the negative log posterior at the mode f̂. The optimization problem of finding f̂ is concave as long as the likelihood is log-concave (Rasmussen and Williams, 2005), which is the case for both the probit and Poisson likelihoods. The mode can be found using Newton's method; an inverse of the covariance matrix K(X, X) is required, which is typically obtained via a Cholesky decomposition in time O(N³) (details of the algorithm can be found in Rasmussen and Williams (2005)). Once q(f|y, X) is found, the posterior distribution over latent function values at test inputs can be found in closed form. We use the Laplace approximation for point process modeling (we consider a point process with a Gaussian process prior in Section 3.2.4), as it is computationally cheaper than the Expectation Propagation method (described in the next paragraph).
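A minimal sketch of this Newton iteration for the Poisson likelihood of Equation (3.10) follows; it is a bare-bones illustration, whereas Rasmussen and Williams (2005) give the numerically stabler Cholesky-based variant used in practice.

import numpy as np

def laplace_mode_poisson(K, y, n_iter=20):
    """Newton's method for the mode f_hat of the Laplace approximation,
    Eq. (3.13), under the Poisson likelihood of Eqs. (3.10)-(3.11)."""
    N = len(y)
    f = np.zeros(N)
    for _ in range(n_iter):
        lam = np.exp(f)                  # Poisson rates exp(f_n)
        grad = y - lam                   # gradient of log p(y|f)
        W = np.diag(lam)                 # W = -Hessian of log p(y|f)
        # Newton step: f <- (K^-1 + W)^-1 (W f + grad), rewritten as
        # (I + K W)^-1 K (W f + grad) to avoid inverting K directly
        f = np.linalg.solve(np.eye(N) + K @ W, K @ (W @ f + grad))
    return f  # the covariance in Eq. (3.12) is A^-1 = (K^-1 + W)^-1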

Expectation Propagation (EP) (Minka and Lafferty, 2002) is an approximation technique in which the posterior is approximated by a fully factorised distribution. Recall that we are approximating the posterior,

p(f|y, X) ∝ p(f|X) ∏_{n=1}^{N} p(yn|fn).   (3.15)

EP approximates it as:

q(f|X, {Z̃1, . . . , Z̃N}, {µ̃1, . . . , µ̃N}, {σ̃²1, . . . , σ̃²N}) = p(f|X) ∏_{n=1}^{N} tn(fn | Z̃n, µ̃n, σ̃²n),   (3.16)

where

tn(fn | Z̃n, µ̃n, σ̃²n) = Z̃n N(fn | µ̃n, σ̃²n)   (3.17)

and Z̃n, µ̃n, σ̃²n are the parameters of the local approximation of the likelihood p(yn|fn). Each likelihood term p(yn|fn) (for classification we use the probit likelihood) is approximated as an unnormalized Gaussian over the latent function value fn (N denotes the normalized Gaussian distribution, and multiplication by Z̃n renders it unnormalized).

Henceforth, we will omit the approximation parameters from the conditioning, representing the approximated posterior as q(f|X). The parameters {Z̃n, µ̃n, σ̃²n} for n = 1, . . . , N need to be chosen so that the approximated posterior q(f|X) resembles the true posterior p(f|y, X). This is achieved by iteratively refining the approximation parameters, where at iteration i of the algorithm the parameters Z̃j, µ̃j and σ̃²j are updated (while the parameters corresponding to other data points are fixed; notice there can be multiple passes over the data, thus we use different indices for the iteration i and for the updated parameter index j) by following the steps:

1. The cavity distribution q−j(fj) is found, which is a combination of the prior and the approximated likelihood terms corresponding to the parameters not optimized at this step,

q−j(fj) = ∫ p(f|X) ∏_{n≠j} tn(fn | Z̃n, µ̃n, σ̃²n) df−j,   (3.18)

where f−j denotes all latent function values except fj.

2. An unnormalized Gaussian marginal t̂(fj) is found by minimizing the Kullback-Leibler divergence between t̂(fj) and p(yj|fj) q−j(fj); this can be achieved by matching the first and second moments of the two distributions.

3. The local approximation tj(fj | Z̃j, µ̃j, σ̃²j) (and consequently the parameters Z̃j, µ̃j and σ̃²j) is found by dividing t̂(fj) by the cavity distribution q−j(fj). This can be done in closed form, and results in a Gaussian.

The number of iterations required to reach convergence can be several times more than the number of observations N, since changing each local approximation affects the global approximation, thus affecting the other local approximations. However, there is no guarantee of convergence. Each full pass over the variables can be implemented in time O(N³) (Rasmussen and Williams, 2005). We use EP for classification, as EP has been shown to work very well for this kind of problem (Nickisch and Rasmussen, 2008); moreover, we deal with relatively small datasets in the case of classification (smaller than in the case of point process modeling), and thus can afford a more complex approximation than Laplace.

3.1.5 Kernels

The central object defining a GP is the kernel function, which specifies how the outputs covary as a function of the inputs. For a GP to form a valid distribution, the covariance matrix, which is the Gram matrix K(X, X), needs to be positive semi-definite for an arbitrary sequence of inputs X = {x1, . . . , xn},

∀c ∈ Rⁿ : c⊤K(X, X)c ≥ 0.   (3.19)

One simple way of obtaining valid kernels, without the need to prove the positive semi-definiteness condition, is by combining existing kernels via operations preserving this condition, such as summation or multiplication. In this section we describe a few popular kernels satisfying the positive semi-definiteness property, which we use as building blocks for our models.

Radial Basis Functions Kernel One example kernel that we make use of is the Radial Basis Functions (RBF) kernel given by the formula:

kRBF(x, x′) = σ exp(−(x − x′)⊤Σ(x − x′)),   (3.20)

where σ > 0 is a hyperparameter controlling the output scale, and Σ controls how kernel values vary across different inputs. Often, a simpler form of the RBF kernel is used, where Σ = κI (here, κ is a scalar value). Then, the RBF kernel takes the form

kRBF(x, x′) = σ exp(−κ‖x − x′‖²).   (3.21)

The hyperparameter κ is called the length scale, and determines the rate at which the kernel diminishes with the distance between the inputs. RBF is a stationary kernel, which means that it depends only on the difference between the inputs. The hyperparameter κ controls the smoothness of the function. Large values of κ make the kernel values sensitive to variation in the inputs, which means the inputs may covary less, favouring a closer fit to the data. Small κ makes the kernel less sensitive to changes in the input, which forces the function to be smoother. The influence of the κ hyperparameter on the shapes of the modeled functions is depicted in Figure 3.2. Notice how κ = 100 causes the mean function to pass through every point, and makes very uninformed predictions about values of the posterior in regions between the data observations. This corresponds to functions having trajectories of high variability. κ = 1 causes the mean function to largely ignore the data points, learning only the slightly decreasing overall trend of the function values. Notice how in this case the samples from the posterior distribution over the functions are very similar. κ = 10 makes for a smooth function prediction while learning the sinusoidal response. Notice how the further the input is from the data observations, the closer the function value is to the mean µ(x) of the Gaussian process, which is 0.

This is particularly apparent for κ = 10 and κ = 100, and it could also be observed for κ = 1 if the plotted range of arguments were significantly wider.

Bias Kernel The bias kernel is given by the formula:

kBias(x, x′) = b.   (3.22)

It contains a single hyperparameter b > 0, controlling the scale of the output. The bias kernel models a background similarity between outputs regardless of the input values, and is useful in combination with other kernels.

Linear Kernel Another popular kernel function is the linear kernel,

kLIN(x, x′) = γ x⊤x′.   (3.23)

It contains a single hyperparameter γ > 0, which controls the scale of the outputs. Contrary to the RBF kernel, the linear kernel is non-stationary, as it does not depend only on the differences between pairs of inputs. Notice how this may lead to unbounded kernel values for large inputs. Nevertheless, for high dimensional inputs the linear kernel is a popular choice, as projecting the feature space to higher dimensions might not bring much benefit under such scenarios.
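As noted at the beginning of this section, sums of valid kernels are again valid kernels. The following minimal sketch combines the three kernels above; the hyperparameter values are arbitrary illustration choices.

import numpy as np

def k_rbf(A, B, sigma=1.0, kappa=10.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma * np.exp(-kappa * d2)          # Eq. (3.21)

def k_bias(A, B, b=0.5):
    return b * np.ones((len(A), len(B)))        # Eq. (3.22)

def k_linear(A, B, gamma=1.0):
    return gamma * (A @ B.T)                    # Eq. (3.23)

def k_sum(A, B):
    """A sum of valid kernels is again a valid kernel."""
    return k_rbf(A, B) + k_bias(A, B) + k_linear(A, B)

X = np.random.default_rng(0).normal(size=(5, 2))
eigvals = np.linalg.eigvalsh(k_sum(X, X))
assert eigvals.min() > -1e-9  # Gram matrix is positive semi-definite, Eq. (3.19)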

Hyperparameter optimization Kernels often contain hyperparameters controlling the shape of the functions modeled by a Gaussian process, such as γ in the linear kernel from Equation (3.23), or the matrix of inter-task correlations B in the ICM kernel introduced later in Section 3.1.6 (there might also be other hyperparameters, such as the variance ρ of the Normal likelihood in the case of regression). The hyperparameter values have a large impact on what the posterior over function values is. For example, in Figure 3.2 we show how different length scale hyperparameter values in the RBF kernel lead to overfitting (Figure 3.2a), underfitting (Figure 3.2c), or a relatively good fit to the data (Figure 3.2b), illustrating how important an appropriate choice of hyperparameter values is. One approach to hyperparameter selection uses a held-out dataset, usually called the validation set. It can be used for finding the hyperparameters by iterating over different sets of values and selecting the set which maximizes the performance on the validation set (Bishop, 2006). The downside of this approach is high computational cost. For each set of values the model needs to be re-trained. Moreover, the number of sets of values to

[Figure 3.2 appears here, with three panels plotting function values over inputs in [0.0, 1.0]: (a) κ = 100, (b) κ = 10, (c) κ = 1.]

Figure 3.2: Posterior distributions from Gaussian processes with RBF kernels controlled by different hyperparameter values κ. In each subfigure, crosses denote data point observations, a solid line denotes the mean of the posterior distribution p(f|y, X), the grey area denotes the 95% confidence interval around the mean prediction, µ ± 2σ, and the dashed lines denote samples from the posterior distribution. The data was generated as follows: for 20 input values x_i equally distributed over the [0.1, 0.9] interval, we generated response values from a Gaussian Y ∼ N(sin(4 × π × x_i), 0.3). Then, we found the GP posteriors parameterized by RBF kernels with varying hyperparameter values κ = 1, 10, 100. For all GPs, the hyperparameter σ² was fixed to 10.

In the case of Gaussian processes, a common alternative to using a held-out validation set is choosing hyperparameters by maximizing the marginal likelihood of the training data p(y|X), also called the evidence (Rasmussen and Williams, 2005). The evidence is given by the equation
$$p(\mathbf{y}|X) = \int p(\mathbf{y}|\mathbf{f}, X)\, p(\mathbf{f}|X)\, d\mathbf{f}.\qquad(3.24)$$

In the case of regression the evidence can be computed in closed form, and its logarithm takes the form:

$$\log p(\mathbf{y}|X) = -\frac{\mathbf{y}^{\top}\left(K(X,X) + \rho I\right)^{-1}\mathbf{y}}{2} - \frac{\log\left|K(X,X) + \rho I\right|}{2} - \frac{N\log(2\pi)}{2}.\qquad(3.25)$$
The different terms in the log evidence can be interpreted in the following way: the first term rewards models closely fitting the data, the second term rewards simple models, and the role of the third term is normalization. The time complexity of finding the evidence is O(N³) (where N is the number of examples in the data set), since an inverse of the covariance matrix is required. In the case of a non-Gaussian likelihood, the Laplace or EP approximation can be used, which, as discussed in Section 3.1.4, require O(N³) time per iteration of the algorithm.

A gradient-based approach can be used for maximizing the evidence with respect to the hyperparameters; hyperparameter training therefore takes O(N³) times the number of steps of the gradient search.5 The optimization requires finding derivatives of the evidence with respect to the hyperparameter values, which, by the chain rule, breaks down to finding derivatives of the kernel with respect to its hyperparameters.
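The following sketch illustrates evidence maximization for GP regression using Equation (3.25). Hyperparameters are optimized in log space with a generic gradient-based optimizer using numerical gradients, rather than the analytic kernel derivatives a practical implementation would use; the RBF parameterization is the assumption carried over from the earlier sketches.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_evidence(log_params, X, y):
    # Equation (3.25): log p(y|X) for GP regression with an RBF kernel.
    # Hyperparameters are optimized in log space to keep them positive.
    kappa, sigma2, rho = np.exp(log_params)
    sq = (X[:, None] - X[None, :]) ** 2
    K = sigma2 * np.exp(-0.5 * kappa * sq) + rho * np.eye(len(X))
    L = np.linalg.cholesky(K)           # the O(N^3) step noted in the text
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2 * np.sum(np.log(np.diag(L)))
    log_ev = -0.5 * y @ alpha - 0.5 * log_det - 0.5 * len(X) * np.log(2 * np.pi)
    return -log_ev

rng = np.random.default_rng(0)
X = np.linspace(0.1, 0.9, 20)
y = np.sin(4 * np.pi * X) + rng.normal(0, 0.5, 20)
# Gradient search over log-hyperparameters (finite-difference gradients here).
res = minimize(neg_log_evidence, x0=np.zeros(3), args=(X, y))
print(np.exp(res.x))  # learned kappa, sigma^2, rho
```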

There are other approaches to learning hyperparameters of complex kernels, such as the framework of multi-kernel learning, which is often based on learning the weights in a linear combination of component kernels (Gönen and Alpaydın, 2011).

5Notice that this ignores the kernel function evaluations between every pair of data points, the complexity of which is O(k · N²), where k is the complexity of a kernel evaluation between a single pair of inputs. Usually this is assumed to be dominated by the O(N³) term, however in principle it need not be the case.

3.1.6 Multi-task Learning with Gaussian Processes

Let us consider a multi-task learning problem with M tasks, where each task consists of an input-output pair. The idea of multi-task learning is to predict an output for an input by exploiting correlations existing across different tasks. Examples include modeling multiple emotions expressed in text over different scales (Beck et al., 2014), and modeling multiple stock price movements over time (Bitvai and Cohn, 2015a). In such applications, instead of modeling each task independently, it may be beneficial to model the tasks jointly and exploit the correlations existing between them. One popular approach to modeling multiple tasks with GPs is based on the Intrinsic Coregionalization Model (ICM) (Bonilla et al., 2008; Álvarez et al., 2012). It is a method which has been successfully applied to a range of NLP tasks (Cohn and Specia, 2013; Beck et al., 2014).

In ICM we define a kernel over the joint space of both the inputs x ∈ U and the tasks m_1, . . . , m_M; thus, instead of considering kernels of the form k(x, x′) (which was the case before), we consider kernels k((x, m), (x′, m′)). The ICM kernel takes the form:

$$k_{ICM}\left((x, m), (x', m')\right) = k_U(x, x')\, k_M(m, m') = k_U(x, x')\, B_{m,m'},\qquad(3.26)$$
where B is a matrix (called the coregionalization matrix) indexed by pairs of tasks, m and m′ denote the tasks corresponding to the inputs x and x′, and k_U is a kernel comparing the inputs x and x′. The intuition is that pairs of tasks m, m′ exhibiting similar characteristics should correspond to high correlation values B_{m,m′} in the coregionalization matrix. In Figure 3.3 we depict an illustrative coregionalization matrix B modeling similarities between three different tasks corresponding to emotions: excitement, happiness and sadness. One might expect that happiness and excitement are correlated more than happiness and sadness, as illustrated in Figure 3.3.

The coregionalization matrix B is a hyperparameter, and so it is learnt jointly with the other hyperparameters by maximizing the evidence, as described in Section 3.1.5. Hyperparameter optimization is expected to drive cells of the matrix B corresponding to similar tasks to higher values than cells corresponding to dissimilar tasks.

For the kernel k_ICM to be valid, the matrix B needs to be symmetric and positive semi-definite. This can be ensured by formulating the matrix B as B = LL^⊤ + κI, which we follow in our experiments in Chapters 4 and 5. The rank of the matrix L controls the complexity of the task correlations, and κ encodes information about task independence.
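A minimal sketch of the ICM kernel from Equation (3.26) with the B = LL^⊤ + κI parameterization; the input kernel and the toy task assignment below are illustrative assumptions.

```python
import numpy as np

def icm_kernel(X, tasks, k_input, L_mat, kappa):
    # Equation (3.26): k_ICM((x, m), (x', m')) = k_U(x, x') * B[m, m'],
    # with B = L L^T + kappa * I guaranteeing positive semi-definiteness.
    n_tasks = L_mat.shape[0]
    B = L_mat @ L_mat.T + kappa * np.eye(n_tasks)
    K_input = k_input(X, X)
    # Element-wise product of two PSD matrices is PSD (Schur product theorem).
    return K_input * B[np.ix_(tasks, tasks)]

# Toy setup: 6 inputs spread over 3 tasks (e.g. three emotions), rank-1 L.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
tasks = np.array([0, 0, 1, 1, 2, 2])
L_mat = rng.normal(size=(3, 1))   # the rank of L controls correlation complexity

rbf = lambda A, C: np.exp(-0.5 * ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1))
K = icm_kernel(X, tasks, rbf, L_mat, kappa=0.1)
print(K.shape, np.all(np.linalg.eigvalsh(K) > -1e-9))
```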

[Figure 3.3 appears here: a 3 × 3 matrix with rows and columns labelled Excited, Happy and Sad.]

Figure 3.3: An illustrative example of a coregionalization matrix B. Inputs x are modeled across multiple emotions: excitement, happiness and sadness. The darker the cell in the matrix, the higher the correlation between a pair of emotions corresponding to the cell. For real world experiments on multi-task learning from multiple emotions see Beck et al. (2014).

3.1.7 Gaussian Processes for NLP and Social Media

Gaussian processes (GPs) have been applied to various applications related to language modeling. Cohn and Specia (2013) used GPs for modeling the quality of sentence-level data annotation under multiple annotators. The authors showed how a GP with a multi-task learning kernel achieves superior results over baselines. Shah et al. (2013) employed GPs for Quality Estimation of Machine Translation. They leveraged hyperparameter learning in the GP framework for feature selection, by interpreting the learnt hyperparameters as measuring the importance of the corresponding features. Bitvai and Cohn (2015b) modeled peer-to-peer lending over text documents using GPs. The authors developed specific kernels and showed how multiple sources of data, both structured and unstructured, can be combined. Gaussian processes were used for emotion analysis in Beck et al. (2014), where a multi-task learning kernel was used to learn correlations between different types of emotions, allowing for better results and more interpretability than in single-task learning. Gaussian processes have also been used for modeling Twitter data. Preotiuc-Pietro and Cohn (2013) modeled hashtag frequency time-series, directly employing GPs with a Gaussian likelihood over frequency occurrences. This ignores a central characteristic of the problem, namely that frequencies can only take non-negative integer values. Moreover, the authors discretize time, which is another limitation of their study. In Section 3.2.4 we explain how Gaussian processes can be used in the point process framework, avoiding both shortcomings. GPs were also used to run studies on Twitter users. Lampos et al. (2014) demonstrated how user impact can be successfully predicted. Preotiuc-Pietro et al. (2015) showed how GPs can be used for classification of user occupational class in Twitter. Both studies used a fairly straightforward application of GP regression or classification.

3.2 Point processes

In this section we introduce point processes (PPs), a probabilistic framework for modeling points over a space. Point processes have been successfully used to solve many problems involving tracking phenomena, e.g. disease mapping (Brix and Diggle, 2001) and conflict mapping (Zammit-Mangion et al., 2012). In general, point processes model points over a space by an intensity function λ(t), such that high values of the intensity function correspond to a high density of points, and low values of the intensity function correspond to a low density of points. We use two prominent models from the point process framework. First, the Log-Gaussian Cox Process (Møller and Syversveen, 1998), which models the intensity function λ(t) using a Gaussian process. It is a type of Poisson process, i.e., occurrences of points in disjoint subregions are independent. The second point process we consider is the Hawkes process (Hawkes and Oakes, 1974), which models points over time via an intensity function which conveys an assumption of mutual excitation, i.e., occurrences of points increase the likelihood of points occurring soon afterwards. In Figure 3.4 we depict how the two processes fit within the point process framework. In the remaining part of this section we describe point processes and how they can be used in practice for modeling social media data.

3.2.1 Motivation

Predictive tasks considering rumours in social media are typically based on extracting descriptors from rumours and then trying to predict some quantities of interest using a traditional machine learning approach, be it classification of political disinformation (Ratkiewicz et al., 2011), clustering of web sources (Nel et al., 2010) or classification of the credibility of events (Castillo et al., 2011). Such approaches typically require summarizing statistics of event dynamics, which can be challenging and prone to losing information. Instead, when modeling the post times from rumours we resort to point processes, a probabilistic framework for modeling events over time.

[Figure 3.4 appears here: a diagram relating the point process, Hawkes process, Poisson process, log-Gaussian Cox process and Gaussian process.]

Figure 3.4: Relations between stochastic processes considered in this thesis. Solid lines correspond to the is a type of relation, and the dashed line corresponds to the models its intensity function as relation.

Point processes allow modeling event dynamics without the need to extract descriptors from sets of event timestamps, instead providing an elegant way of dealing with temporal data. Moreover, PPs provide ways of answering questions like How many points will occur in a given interval of time? or When will the next point occur?, which can be challenging to tackle with other approaches. We consider these types of questions when modeling rumour popularity in Chapters 5 and 6.

3.2.2 Preliminaries

In this section we introduce the fundamental quantities describing a PP (the intensity function, the probability density function, the survival function), and derive the likelihood of a set of events occurring over an interval of time. Let us denote the number of events that occurred right before time t as N(t), and let dN(t) denote the number of events in the interval [t, t + δt). The conditional intensity function is defined as the instantaneous probability of one event occurring in the interval [t, t + δt), conditioned on the history of previous event occurrences,

$$\lambda(t|\mathcal{H}_{t^-}) = \lim_{\delta t \to 0}\frac{P(dN(t) = 1\,|\,\mathcal{H}_{t^-})}{\delta t} = \lim_{\delta t \to 0}\frac{P(t \leq T < t + \delta t\,|\,T \geq t, \mathcal{H}_{t^-})}{\delta t},\qquad(3.27)$$
where H_{t−} is the event history up to time t (excluding t) and T is a random variable denoting the time of the next event occurrence. The earliest next event occurrence is t_0, typically

determined by the arrival of the most recent event from Ht− (if there are no preceding events, we assume t0 = 0). Note that for t < t0 we get λ (t|Ht− ) = 0. The conditional intensity function can be rewritten as

$$\lambda(t|\mathcal{H}_{t^-}) = \underbrace{\lim_{\delta t \to 0}\frac{P(t \leq T < t + \delta t\,|\,\mathcal{H}_{t^-})}{\delta t}}_{p(t|\mathcal{H}_{t^-})}\;\underbrace{\frac{1}{P(T \geq t\,|\,\mathcal{H}_{t^-})}}_{1/S(t|\mathcal{H}_{t^-})},\qquad(3.28)$$
where p(t|H_{t−}) is the probability density function (pdf) of the random variable T and

S(t|H_{t−}) is the survival function, which models the likelihood of the next event happening only after time t (the name survival function comes from the study of death events, where S(t|H_{t−}) models survival until time t).

Now we are going to represent S(t|H_{t−}) and p(t|H_{t−}) as functions of λ(t|H_{t−}). Let us denote the cumulative distribution function of the random variable T as F(t|H_{t−}). Since

S(t|Ht− ) = 1 − F (t|Ht− ), we get

$$dS(t|\mathcal{H}_{t^-}) = -dF(t|\mathcal{H}_{t^-}) = -p(t|\mathcal{H}_{t^-})\,dt.\qquad(3.29)$$

From Equations (3.28) and (3.29), we get

$$\lambda(t|\mathcal{H}_{t^-})\,dt = \frac{p(t|\mathcal{H}_{t^-})\,dt}{S(t|\mathcal{H}_{t^-})} = -\frac{dS(t|\mathcal{H}_{t^-})}{S(t|\mathcal{H}_{t^-})}.\qquad(3.30)$$

Solving for S(t|Ht− ):

$$S(t|\mathcal{H}_{t^-}) = \exp\left(-\int_{t_0}^{t}\lambda(s|\mathcal{H}_{s^-})\,ds\right),\qquad(3.31)$$
where t_0 is the earliest time an event can arrive (either the time of the most recent event from H_{t−}, or 0 if there are no events in H_{t−}). This implies

$$p(t|\mathcal{H}_{t^-}) = -S'(t|\mathcal{H}_{t^-}) = \lambda(t|\mathcal{H}_{t^-})\exp\left(-\int_{t_0}^{t}\lambda(s|\mathcal{H}_{s^-})\,ds\right).\qquad(3.32)$$

Figure 3.5 shows the cumulative distribution function F(t|H_{t−}) and the probability density function p(t|H_{t−}) of the inter-arrival time until the next event occurrence for a linear intensity function λ(t|H_{t−}) = t. The probability density initially increases and then falls, reaching its maximum at around 1, with the corresponding cumulative distribution monotonically increasing.

The likelihood of N events occurring at times t_1 < ... < t_N over an interval [0, T] can

[Figure 3.5 appears here: λ(t|H_{t−}), p(t|H_{t−}) and F(t|H_{t−}) plotted over the interval from 0h to 2h.]

Figure 3.5: Intensity function values over time for λ(t|H_{t−}) = t (denoted by the solid black line), and the corresponding instantaneous likelihood of the first event occurrence p(t|H_{t−}) = t exp(−t²/2) (denoted by the dashed blue line) and cumulative distribution function of the first event occurrence F(t|H_{t−}) = 1 − exp(−t²/2) (denoted by the dotted red line). Notice the intensity function (and, consequently, the other quantities too) is independent from the history.

be formulated as
$$L(t_1, \ldots, t_N) = p\left(t_1|\mathcal{H}_{t_1^-}\right)\times p\left(t_2|\mathcal{H}_{t_2^-}\right)\times\ldots\times p\left(t_N|\mathcal{H}_{t_N^-}\right)\times S(T|\mathcal{H}_T).\qquad(3.33)$$

The last term S(T|H_T) represents the likelihood that there is no event in the interval (t_N, T]. After plugging the formula for the pdf from Equation (3.32) and the survival function from Equation (3.31) into Equation (3.33), we obtain the following form of the joint likelihood of a sequence of events under a point process:
$$L(t_1, \ldots, t_N) = \left(\prod_{n=1}^{N}\lambda(t_n|\mathcal{H}_{t_n^-})\right)\underbrace{\exp\left(-\int_{0}^{T}\lambda(s|\mathcal{H}_{s^-})\,ds\right)}_{P(E_T)}.\qquad(3.34)$$

In Equation (3.34), the first factor is the product of the intensity function values at the observed event timestamps, and the second factor is the exponential of the negative integral of the intensity function over the observation window. Henceforth, we represent the second factor as P(E_T), assuming the intensity function parametrizing it is known from the context.
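For a history-independent (Poisson) intensity, the log of the likelihood in Equation (3.34) can be sketched as follows, with the integral approximated on a grid; the linear intensity λ(t) = t is the example from Figure 3.5.

```python
import numpy as np

def pp_log_likelihood(times, intensity, T, grid_size=10_000):
    # Log of Equation (3.34): sum of log-intensities at the observed event
    # times minus the integral of the intensity over [0, T] (the integral,
    # i.e. -log P(E_T), is approximated here by a Riemann sum on a grid).
    grid = np.linspace(0.0, T, grid_size)
    integral = np.sum(intensity(grid)) * (T / grid_size)
    return np.sum(np.log(intensity(np.asarray(times)))) - integral

# Three events under the linear intensity lambda(t) = t over [0, 2].
print(pp_log_likelihood([0.5, 1.2, 1.9], lambda t: t, T=2.0))
```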

Predicting the Next Arrival Time

The probability density function from Equation (3.32) can be used for sampling timestamps of the next event occurrence. For ease of sampling, we approximate the integral in the formula for the pdf,
$$p(t|\mathcal{H}_{t^-}) \approx \lambda(t|\mathcal{H}_{t^-})\exp\left(-t\,\lambda\!\left(\tfrac{t}{2}\,\middle|\,\mathcal{H}_{\frac{t}{2}^-}\right)\right),\qquad(3.35)$$
where t/2 is the middle point of the interval [0, t], and λ(t/2|H_{t/2−}) is the intensity function value at the middle point of that interval. There exist multiple sampling schemes that can be used for obtaining the arrival time of the next event. One example is importance sampling (Gelman et al., 2003), where samples are drawn from a proposal distribution and then reweighted according to Equation (3.35). Assuming the previous event occurred at time s, we obtain the sampled arrival time of the next tweet as outlined in Algorithm 3.1. Another sampling method is Ogata's thinning algorithm (Ogata, 2006), which instead takes a thinning approach: timestamps are sampled from a homogeneous Poisson process whose intensity bounds the simulated point process' intensity function at every point of time (a homogeneous Poisson process is a point process with a constant intensity function). Timestamps are then rejected if they occur too soon according to the intensity of the PP. In the case of the Hawkes process (introduced later in Section 3.2.5) the intensity function is decreasing between events, which facilitates the application of Ogata's thinning algorithm. In the case of other PPs a sampling approach could be utilized for finding the maximum. Ogata's thinning algorithm is outlined in Algorithm 3.2. Either of the two sampling algorithms can be repeated until the end of the interval of interest to obtain a sequence of event occurrence times.

3.2.3 Poisson Processes

A Poisson process is a point process with the assumption that point occurrences in disjoint subspaces are independent. It is characterized by an intensity function independent of the history, λ(t|H_{t−}) = λ(t) > 0. A homogeneous Poisson process (HPP) assumes the intensity to be constant with respect to time, i.e., λ(t) = λ, whereas an inhomogeneous Poisson process (IPP) can model points occurring at a variable rate by considering the intensity to be a function of time, i.e. λ(t). In a Poisson process the number of tweets y

Algorithm 3.1 Importance sampling for predicting the arrival time of the next event. Arrival times are sampled from a proposal distribution which is relatively easy to sample from, and are then averaged, weighted by their importance according to the pdf of the true distribution.
Input: probability density function p(t|H_{t−}), proposal distribution q(t), number of samples N
for i = 1 to N do
    Sample u_i ∼ q(t)
    w_i = p(u_i|H_{u_i−}) / q(u_i)
end for
Predict the expected next arrival time as
    ū = (Σ_{i=1}^{N} w_i u_i) / (Σ_{j=1}^{N} w_j)
Return: ū
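A Python rendering of Algorithm 3.1; the pdf (matching the example from Figure 3.5) and the exponential proposal distribution are illustrative choices.

```python
import numpy as np

def expected_next_arrival(pdf, proposal_sampler, proposal_pdf, n_samples=10_000):
    # Algorithm 3.1: draw candidate arrival times from an easy-to-sample
    # proposal q, weight them by p(u|H) / q(u), and return the
    # self-normalized weighted average.
    u = proposal_sampler(n_samples)
    w = pdf(u) / proposal_pdf(u)
    return np.sum(w * u) / np.sum(w)

rng = np.random.default_rng(0)
# pdf p(t) = t * exp(-t^2 / 2), as in Figure 3.5; proposal q(t) = exp(-t).
pdf = lambda t: t * np.exp(-t ** 2 / 2)
est = expected_next_arrival(
    pdf,
    proposal_sampler=lambda n: rng.exponential(1.0, n),
    proposal_pdf=lambda t: np.exp(-t),
)
print(est)  # close to sqrt(pi/2) ~ 1.2533, the true expected arrival time
```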

Algorithm 3.2 Ogata's thinning algorithm for predicting the arrival time of the next event within the interval of time [0, T]. Candidate arrival times are repeatedly sampled from a homogeneous Poisson process (HPP; see Section 3.2.3 for the definition of an HPP) with an intensity β being an upper bound on the intensity λ(t|H_{t−}) of the point process, until the conditions for acceptance are satisfied.

Input: intensity function λ(t|H_{t−}), time T
β = max_{t∈[0,T]} λ(t|H_{t−})
t = 0
while t ≤ T do
    Sample s ∼ HPP(β)
    Sample u ∼ U(0, 1)
    if t + s > T then
        Return: None
    end if
    t = t + s
    if u ≤ λ(t|H_{t−}) / β then
        Return: t
    end if
end while
Return: None
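A sketch of Algorithm 3.2 in Python; as in the pseudocode above, the acceptance test evaluates the intensity at the advanced candidate time, and the linear example intensity is an illustrative assumption.

```python
import numpy as np

def ogata_next_event(intensity, T, beta, rng, t0=0.0):
    # Ogata's thinning: candidates come from a homogeneous Poisson process
    # with rate beta, an upper bound on the intensity; a candidate at time t
    # is accepted with probability lambda(t) / beta.
    t = t0
    while t <= T:
        s = rng.exponential(1.0 / beta)   # HPP inter-arrival time
        u = rng.uniform()
        if t + s > T:
            return None                   # no event before the horizon T
        t = t + s
        if u <= intensity(t) / beta:
            return t
    return None

rng = np.random.default_rng(0)
# Linear intensity lambda(t) = t on [0, 2]; beta = 2 bounds it from above.
print(ogata_next_event(lambda t: t, T=2.0, beta=2.0, rng=rng))
```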

occurring in an interval [s, e] is Poisson distributed with rate parameter equal to ∫_s^e λ(t)dt,
$$P(y|\lambda(t), [s, e]) = \mathrm{Poisson}\left(y\,\middle|\,\int_s^e\lambda(t)\,dt\right) = \frac{\left(\int_s^e\lambda(t)\,dt\right)^{y}\exp\left(-\int_s^e\lambda(t)\,dt\right)}{y!}.\qquad(3.36)$$

Figure 3.6 shows three example intensity functions of Poisson processes and events sampled from their distributions. Two of the intensity functions are non-constant (thus corresponding to IPPs). Notice how high intensity function values correspond to a high number of generated events. The last plotted intensity function corresponds to an HPP, and the pairs of neighbouring sampled timestamps are roughly equidistant.

3.2.4 Log-Gaussian Cox Processes

The doubly stochastic Poisson process, also called the Cox process, is a type of Poisson process in which the intensity function λ(t) is modelled via a stochastic process. This way there are two levels of stochasticity when modeling points coming from a Cox process: the intensity function is drawn from a stochastic process, and then the points are drawn from a Poisson process controlled by the sampled intensity function. We consider the log-Gaussian Cox process (LGCP) (Møller and Syversveen, 1998), which assumes the intensity function λ(t) is modelled using a latent function f(t) sampled from a Gaussian process (Rasmussen and Williams, 2005), such that λ(t) = exp(f(t)) (the exponent ensures positivity). Combining the Gaussian process assumption with Equation (3.32), the likelihood that a single point occurs at time t in the interval [s, t] given the latent function f is
$$p(t|f) = \exp(f(t))\exp\left(-\int_s^t\exp(f(u))\,du\right).\qquad(3.37)$$

Then, the likelihood of points E = {t1, . . . , tN } in the time interval [0,T ] given the latent function f can be obtained as

$$L(E|f) = \exp\left(-\int_0^T\exp(f(t))\,dt + \sum_{n=1}^{N}f(t_n)\right).\qquad(3.38)$$

The likelihood (3.38) is commonly approximated by assuming constant intensities in subregions of [0, T] (Møller and Syversveen, 1998; Vanhatalo et al., 2013) to overcome computational difficulties arising due to the integration. Following this, the likelihood of the arrival times E is approximated as

[Figure 3.6 appears here: three panels of intensity over time from 0h to 2h, each with sampled event timestamps.]

Figure 3.6: Time-varying intensity functions for two inhomogeneous and one homogeneous Poisson process, and the corresponding samples of tweet arrivals. The intensity functions are (starting from the top): λ₁(t) = 8 × N(4, 2) + 20 × N(10, 2) + 1, λ₂(t) = 22 × N(4, 2) + 7 × N(10, 2) + 1, λ₃(t) = 2. Here, N(µ, σ²) denotes the density of a Normal distribution with mean µ and variance σ², evaluated at t. Note that more points are sampled from regions where the intensity function takes higher values. Timestamps were sampled using Ogata's thinning algorithm (see Algorithm 3.2).

$$L(E|f) = \prod_{s=1}^{S}\mathrm{Poisson}\left(y_s\,\middle|\,l_s\exp\left(f(\dot{t}_s)\right)\right).\qquad(3.39)$$

Here, the time interval [0,T ] is divided into S intervals, t˙s is the middle point of interval s, ls is its length, and ys is the number of events from E falling into interval s.
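A sketch of the binned likelihood from Equation (3.39); the bin edges, event times and latent function values below are illustrative toy inputs.

```python
import numpy as np
from scipy.stats import poisson

def lgcp_binned_log_likelihood(event_times, f_values, bin_edges):
    # Equation (3.39): the interval [0, T] is split into S bins; the count
    # y_s in bin s is Poisson with rate l_s * exp(f(t_s)), where f is the
    # latent GP function evaluated at the bin midpoints.
    lengths = np.diff(bin_edges)
    counts, _ = np.histogram(event_times, bins=bin_edges)
    rates = lengths * np.exp(f_values)
    return np.sum(poisson.logpmf(counts, rates))

# Toy example: 10 bins over [0, 2], a flat latent function f = 0.
edges = np.linspace(0.0, 2.0, 11)
events = np.array([0.1, 0.15, 0.7, 1.3, 1.35, 1.9])
print(lgcp_binned_log_likelihood(events, np.zeros(10), edges))
```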

Inference

The posterior distribution p(f(t)|E) at an arbitrary timestamp t is calculated based on the Gaussian process prior and the Poisson likelihood. It is intractable, and approximation techniques are required. There exist various methods for calculating the posterior, with the Laplace approximation and Expectation Propagation being two popular choices (Rasmussen and Williams, 2005) (we describe both in Section 3.1.4). In the experiments we make use of the Laplace approximation, which leads to the posterior being approximated by a Gaussian distribution based on the first two moments. Then, the predictive distribution over function values at a timestamp t* is obtained using the approximated posterior. This predictive distribution is then used to obtain the intensity function value at a timestamp of interest t*,
$$\lambda(t_*|E) = \int\exp(f(t_*))\,p(f(t_*)|E)\,df.\qquad(3.40)$$

The intensity function can then be used for answering various queries about unobserved intervals of time, e.g. how many events are expected to occur within the next hour.

Related Work on Log-Gaussian Cox Processes

The log-Gaussian Cox process (LGCP) has been applied to applications involving modeling temporal phenomena, e.g. disease mapping (Brix and Diggle, 2001) and conflict mapping (Zammit-Mangion et al., 2012). LGCP has not been widely applied to modeling the spread of information through social networks (Linderman and Adams, 2014), which might be due to the independence assumption (in social networks one may want to explicitly model posts influencing one another). Linderman and Adams (2014) applied LGCP to modeling the background intensity of the Hawkes process model (see Section 3.2.5 for an introduction to Hawkes processes). Simma and Jordan (2010) developed a Poisson Cox process model for capturing the propagation of multiple cascades across a social network, although the authors do not use a Gaussian process prior.

3.2.5 Hawkes processes

A point process which explicitly models mutually exciting phenomena across multiple dimensions is the multi-variate Hawkes process (Hawkes and Oakes, 1974). It does not make an independence assumption like the Poisson process does. Instead, the Hawkes process defines an underlying intensity function explicitly modeling influences from past events. In other words, the intensity function of the Hawkes process models the self-exciting nature of the process by adding up influences from past events. In this section we give an introduction to the Hawkes process defined over the joint space of time, |U| users and |R| rumours. We follow the model definition from Yang and Zha (2013) (originally, the authors considered memes, however we adhere to our rumour use-case). Let us consider a sequence of tweets {(t_n, i_n, m_n, w_n)}_{n=1}^{N}, where each tweet is a quadruple composed of:

1. tn, the time of occurrence of the nth tweet,

2. in, the user that posted the nth tweet, represented as an integer ranging from 1 to |U|,

3. mn, the rumour the nth tweet pertains to, represented as an integer ranging from 1 to |R|,

4. wn, the text of the nth tweet (over vocabulary of size V , encoded using a vector representation).

Recall our notation is summarized in the Nomenclature on page xii.

In the multi-variate Hawkes process, an intensity function λi,m(t|Ht− ) is defined for each user i and rumour m pair, and its value at time t depends on the subset of the history

Ht− of previous events pertaining to rumour m,

$$\lambda_{i,m}(t|\mathcal{H}_{t^-}) = \mu_i\gamma_m + \sum_{\ell=1}^{N}\mathbb{I}(m_\ell = m)\,\mathbb{I}(t > t_\ell)\,\alpha_{i_\ell,i}\,\kappa(t - t_\ell),\qquad(3.41)$$
where µ_i is the base intensity for user i and γ_m is the base intensity for rumour m. The matrix α of size |U| × |U| (denoted by a small letter to adhere to the notation from previous work (Yang and Zha, 2013)) models the degrees of influence between pairs of users posting tweets. The influence users have on one another is not necessarily symmetric, and so the α matrix is asymmetric; it can be interpreted as the underlying influence network. The second term in Equation (3.41) represents the influence from the tweets that happened prior to the time of interest. The influence from each tweet decays over time and is modeled using a decay kernel κ(t − t_ℓ), typically the exponential kernel,

$$\kappa(t - t_\ell) = \omega\exp(-\omega(t - t_\ell)),\qquad(3.42)$$
where ω > 0. Figure 3.7 shows three samples from a Hawkes process (HP) with a single user tweeting about a single rumour over the interval [0, 2h]. We can see that as each consecutive tweet occurs, the intensity of the HP increases, rendering the occurrence of events soon afterwards more likely. Also, when no events occur, the intensity decays exponentially to the base intensity, making arrivals of new events less likely. Overall, notice how the three samples from the same Hawkes process differ, showing that a Hawkes process with the same set of hyperparameters may explain various trajectories of tweet arrivals.
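The intensity from Equations (3.41) and (3.42) can be sketched directly; all parameter values below are illustrative.

```python
import numpy as np

def hawkes_intensity(t, times, users, rumours, i, m, mu, gamma, alpha, omega):
    # Equation (3.41): base rate mu_i * gamma_m plus exponentially decaying
    # influence (Equation 3.42) from earlier tweets about the same rumour.
    lam = mu[i] * gamma[m]
    for t_l, i_l, m_l in zip(times, users, rumours):
        if m_l == m and t_l < t:
            lam += alpha[i_l, i] * omega * np.exp(-omega * (t - t_l))
    return lam

# Toy example: 2 users, 1 rumour, three earlier tweets.
mu, gamma = np.array([1.0, 0.5]), np.array([1.0])
alpha = np.array([[0.3, 0.2], [0.1, 0.4]])   # alpha[l, i]: influence of l on i
times, users, rumours = [0.2, 0.5, 0.9], [0, 1, 0], [0, 0, 0]
print(hawkes_intensity(1.0, times, users, rumours, i=0, m=0,
                       mu=mu, gamma=gamma, alpha=alpha, omega=1.0))
```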

Likelihood

In this section we discuss the likelihood of a set of tweets coming from multiple rumours under the multi-variate Hawkes process. Following Yang and Zha (2013) we also incorporate a linguistic component into the model. In Chapter 7 we show how inference and optimization can be performed under the Hawkes process model for a sequence classification application. The language component is modeled as a multinomial distribution conditioned on the rumour that a tweet pertains to. Encoding all text vectors in a matrix W, such that the element for word v in row n is denoted W_{nv}, the likelihood of text w_n from the nth post, discussing rumour m_n, is
$$p(w_n|m_n) = \prod_{v=1}^{V}\beta_{m_n,v}^{W_{nv}},\qquad(3.43)$$
where β is a parameter specifying the language models, V is the vocabulary size, and w_n is a vector counting the words in the nth post. Combining the likelihood of the text content with the likelihood of tweet occurrences under the Hawkes process, we obtain the following form of the complete likelihood (Yang and Zha, 2013):

$$L(t, i, m, W\,|\,\beta, \mu, \gamma, \alpha, \omega) = \left(\prod_{n=1}^{N}p(w_n|m_n)\right)\left(\prod_{l=1}^{N}p(t_l, i_l\,|\,m_l, \mathcal{H}_{t_l^-})\right)\times P(E_T).\qquad(3.44)$$

Recall the meaning of P(E_T) from Equation (3.34). It is convenient to consider the log likelihood,

[Figure 3.7 appears here: three panels of intensity over time from 0h to 2h, each with sampled tweet times.]

Figure 3.7: Samples of tweets drawn from a uni-variate Hawkes process (denoted by the crosses at the bottom of each subfigure) and the corresponding intensity function values over time. The samples have been obtained using Ogata's thinning algorithm (see Section 3.2.2). The parameters of the Hawkes process are: µ = γ = 1 and ω = 1.

$$\ell(t, i, m, W\,|\,\beta, \mu, \gamma, \alpha, \omega) = \sum_{n=1}^{N}\sum_{v=1}^{V}W_{nv}\log\beta_{m_n,v} + \sum_{n=1}^{N}\log\lambda_{i_n,m_n}(t_n|\mathcal{H}_{t_n^-}) - \sum_{i=1}^{|U|}\sum_{m=1}^{|R|}\int_0^T\lambda_{i,m}(s|\mathcal{H}_{s^-})\,ds.\qquad(3.45)$$

It can be shown (the derivation is included in Appendix A) that the integral term from Equation (3.45) can be represented in closed form,

$$\sum_{i=1}^{|U|}\sum_{m=1}^{|R|}\int_0^T\lambda_{i,m}(s|\mathcal{H}_{s^-})\,ds = T\sum_{i=1}^{|U|}\mu_i\sum_{m=1}^{|R|}\gamma_m + \sum_{i=1}^{|U|}\sum_{\ell=1}^{N}\alpha_{i_\ell,i}\,K(T - t_\ell),\qquad(3.46)$$
where K(T − t_ℓ) = 1 − exp(−ω(T − t_ℓ)) arises from the integration of κ(t − t_ℓ) (see Appendix A for details). This leads to the following form of the log-likelihood (plugging Equations (3.41) and (3.46) into Equation (3.45)):

$$\ell(t, i, m, W\,|\,\beta, \mu, \gamma, \alpha, \omega) = \sum_{n=1}^{N}\sum_{v=1}^{V}W_{nv}\log\beta_{m_n,v} + \sum_{n=1}^{N}\log\lambda_{i_n,m_n}(t_n|\mathcal{H}_{t_n^-}) - T\sum_{i=1}^{|U|}\mu_i\sum_{m=1}^{|R|}\gamma_m - \sum_{i=1}^{|U|}\sum_{\ell=1}^{N}\alpha_{i_\ell,i}\,K(T - t_\ell).\qquad(3.47)$$

The parameters of the Hawkes process can be optimized by maximizing the log-likelihood of the data from Equation (3.47). In Chapter 7 we will consider different ways of learning the parameters.
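A sketch of the log-likelihood from Equation (3.47), omitting the language-model term so that only the temporal part of the model is exercised; the parameter values below are illustrative.

```python
import numpy as np

def hawkes_log_likelihood(times, users, rumours, mu, gamma, alpha, omega, T):
    # Equation (3.47) without the beta term: log-intensities at the observed
    # tweets minus the closed-form integral from Equation (3.46).
    times = np.asarray(times)
    ll = 0.0
    for n, (t_n, i_n, m_n) in enumerate(zip(times, users, rumours)):
        lam = mu[i_n] * gamma[m_n]
        for l in range(n):
            if rumours[l] == m_n:
                lam += alpha[users[l], i_n] * omega * np.exp(-omega * (t_n - times[l]))
        ll += np.log(lam)
    # Integral term: T * sum_i mu_i * sum_m gamma_m
    #              + sum_i sum_l alpha[i_l, i] * (1 - exp(-omega * (T - t_l)))
    ll -= T * mu.sum() * gamma.sum()
    K = 1.0 - np.exp(-omega * (T - times))
    ll -= (alpha[users, :].sum(axis=1) * K).sum()
    return ll

mu, gamma = np.array([1.0, 0.5]), np.array([1.0])
alpha = np.array([[0.3, 0.2], [0.1, 0.4]])
print(hawkes_log_likelihood([0.2, 0.5, 0.9], [0, 1, 0], [0, 0, 0],
                            mu, gamma, alpha, omega=1.0, T=2.0))
```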

Related Work on Hawkes Processes

Multiple works have modeled the occurrence of posts in social media using Hawkes processes (HP). Yang and Zha (2013) applied HP to inferring the underlying network of connections based on the timestamps of observed events and the text content. The limitations of dealing with a fixed kernel were addressed by Zhou et al. (2013b) through learning the kernel function from data, and by Wang et al. (2016), who introduced a link function wrapping the sum of influences from past events. Joint modeling of information spread and text has been considered by He et al. (2015), who introduced a joint model of topics and network inference from information propagation. Du et al. (2015) developed a Dirichlet-Hawkes process model, which models the clustering of events across Hawkes processes via a Dirichlet process. Also, prior knowledge about the users and connections from a social network has been incorporated into the Hawkes process model (Zhou et al., 2013a; Kobayashi and Lambiotte, 2016; Srijith et al., 2017). Here, instead of performing network inference or clustering of tweets, we propose a novel application of an HP to a sequence classification task, demonstrating how the assumptions made by the model lead to outperforming baselines (see Chapter 7).

3.3 Evaluation Metrics

In this section we consider evaluation metrics for regression, Poisson regression and classification problems. We denote the vector of outputs from the model as ŷ, and the vector of ground truth outputs as y, assuming the lengths of both are equal to N.

Regression In regression problems continuous outputs are predicted. A popular error metric is the mean squared error (MSE),

$$MSE(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2,\qquad(3.48)$$
for a vector of N predicted values ŷ and a vector of ground truth values y. Notice the correspondence between the MSE and the log-likelihood under the Normal distribution. The log-likelihood of a sequence of observations ŷ under a multi-variate normal distribution with mean µ = y and covariance Σ = σ²I (where I stands for the identity matrix) is given by

$$LL_{Normal}\left(\hat{\mathbf{y}}\,\middle|\,\hat{\mathbf{y}} \sim \mathcal{N}(\mu = \mathbf{y}, \Sigma = \sigma^2 I)\right) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2.\qquad(3.49)$$
Setting model parameters to MLE estimates under a Normal likelihood centered at the model predictions with fixed variance (shown in Equation (3.49)) is equivalent to setting the parameters by minimizing the MSE loss function (shown in Equation (3.48)).
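The equivalence can be checked numerically: up to additive constants that do not depend on the predictions, the Normal log-likelihood in Equation (3.49) is −N/(2σ²) times the MSE from Equation (3.48). A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)                      # ground truth
y_hat = y + rng.normal(scale=0.5, size=100)   # noisy predictions
sigma2, N = 1.0, len(y)

mse = np.mean((y_hat - y) ** 2)
ll = (-N / 2 * np.log(2 * np.pi) - N / 2 * np.log(sigma2)
      - np.sum((y_hat - y) ** 2) / (2 * sigma2))
# Rewriting the log-likelihood in terms of the MSE gives the same value:
print(ll, -N / 2 * np.log(2 * np.pi * sigma2) - N * mse / (2 * sigma2))
```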

Poisson Regression

The evaluation of Poisson (count) regression problems can be conducted as for regression, using the mean squared error. However, as mentioned above, MSE corresponds to evaluating the likelihood of the predictions under a Normal distribution. For evaluating count predictions it may be more appropriate to use a distribution allowing only integer values, such as a Poisson distribution. Alternatively, the predictive distribution from a model can be used for finding the likelihood of the true count. In Chapters 5 and 6 we use both MSE and predictive distributions from models (which typically are Poisson) for evaluating Poisson regression problems.

Multi-class Classification Accuracy and F1 scores are popular metrics for evaluation of classification problems, which we make use of in our work on rumour stance classification. Both of the measures rely on precision and recall,

$$\mathrm{Precision}_k = \frac{tp_k}{tp_k + fp_k},\qquad(3.50)$$

$$\mathrm{Recall}_k = \frac{tp_k}{tp_k + fn_k},\qquad(3.51)$$
where tp_k (true positives for class k) is the number of instances correctly classified into class k, fp_k (false positives for class k) is the number of instances incorrectly classified into class k, and fn_k (false negatives for class k) is the number of instances that actually belong to class k but were not classified as such. Aggregate precision and recall for a c-class classification problem can be calculated either via microaveraging:

$$\mathrm{Precision}_{micro} = \frac{\sum_{k=1}^{c} tp_k}{\sum_{k=1}^{c} tp_k + \sum_{k=1}^{c} fp_k},\qquad(3.52)$$
$$\mathrm{Recall}_{micro} = \frac{\sum_{k=1}^{c} tp_k}{\sum_{k=1}^{c} tp_k + \sum_{k=1}^{c} fn_k},\qquad(3.53)$$
or macroaveraging:

$$\mathrm{Precision}_{macro} = \frac{\sum_{k=1}^{c}\mathrm{Precision}_k}{c},\qquad(3.54)$$
$$\mathrm{Recall}_{macro} = \frac{\sum_{k=1}^{c}\mathrm{Recall}_k}{c}.\qquad(3.55)$$

Notice that microaveraged precision is equal to microaveraged recall. This is due to the fact that in multi-class classification settings the total number of false positives is equal to the total number of false negatives.6 After computing either microaveraged or macroaveraged precision and recall, the final F1 score is computed as the harmonic mean,

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.\qquad(3.56)$$
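A sketch computing both averaged F1 scores from per-class counts via Equations (3.50)-(3.56); the counts are illustrative, chosen so that the total numbers of false positives and false negatives agree, as they must in the multi-class setting.

```python
import numpy as np

def micro_macro_f1(tp, fp, fn):
    # Equations (3.50)-(3.56) for a c-class problem, given per-class counts.
    tp, fp, fn = map(np.asarray, (tp, fp, fn))
    p_micro = tp.sum() / (tp.sum() + fp.sum())
    r_micro = tp.sum() / (tp.sum() + fn.sum())
    p_macro = np.mean(tp / (tp + fp))
    r_macro = np.mean(tp / (tp + fn))
    f1 = lambda p, r: 2 * p * r / (p + r)
    return f1(p_micro, r_micro), f1(p_macro, r_macro)

# Each misclassified instance is one false positive for the predicted class
# and one false negative for the true class, so fp.sum() == fn.sum() and
# micro-precision equals micro-recall.
print(micro_macro_f1(tp=[50, 10, 5], fp=[8, 4, 3], fn=[6, 5, 4]))
```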

3.4 Conclusions

In this chapter we have introduced the machine learning frameworks we use extensively in this thesis. First, we described Gaussian processes, a Bayesian framework for approaching a wide range of applications, including regression, classification and Poisson regression. It has multiple appealing characteristics, such as explicit modeling of uncertainty and an efficient hyperparameter optimization framework. We make use of Gaussian processes for building models of rumours in Chapters 4 to 6. Next, we moved to point processes, a framework for modeling sets of points occurring over some space (in our case, the space of time), which can be used to answer a wide range of questions, such as: How many points will occur in a given interval of time? or When will the next point occur? We described two point process models. Firstly, the log-Gaussian Cox process, which assumes that point occurrences in disjoint subspaces are independent, conditioned on the intensity function drawn from an exponentiated Gaussian process. Secondly, the Hawkes process, which makes the assumption that occurrences of points increase the likelihood of points happening soon afterwards. We use point processes for modeling the temporal dynamics of rumours in Chapters 5 to 7. In the following chapter we develop a Gaussian process model for rumour stance classification.

6If a particular instance is wrongly classified, then it contributes 1 to both the number of false positives (because it does not belong to the class it was classified into) and to the number of false negatives (because it was wrongly classified as not belonging to the true class). If a particular instance is correctly classified, it contributes 0 to both false positives and false negatives.

Chapter 4

Rumour Stance Classification

In the preceding chapters we reviewed the literature on rumours in social media, and introduced the machine learning frameworks we make use of in the experiments. In this chapter, we set out to address the first aim specified in Chapter 1: predicting the stance of tweets regarding rumours. In particular, we work towards answering the first two research questions:

1. What would be a realistic and fair evaluation framework for rumour stance classification? which we address in Section 4.2. We motivate new evaluation settings, which satisfy desirable properties.

2. Can information about stances of tweets from one rumour be useful for predicting stances of tweets from another rumour? which we address in Section 4.5. We model rumour stances across multiple rumours using a Gaussian process with a multi-task learning kernel. We find that this method outperforms the Gaussian process model with a single-task kernel, and is competitive with other single-task learning approaches.

Parts of this chapter have been published as Lukasik, M., Cohn, T., and Bontcheva, K. (2015a). Classifying tweet level judge- ments of rumours in social media. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

4.1 Introduction

There is an increasing need to interpret and act upon rumours spreading quickly through social media during breaking news, where new reports are released piecemeal and often

have an unverified status at the time of posting. Previous research has emphasized the damage that the diffusion of rumours can cause in society, and that corrections issued by news organisations or the police may not necessarily achieve the desired effect sufficiently quickly (Lewandowsky et al., 2012; Procter et al., 2013a). Being able to quickly analyze rumours is therefore crucial in these scenarios. Determining the stance of social media posts automatically has been attracting increasing interest in the scientific community in recent years, as this is a useful first step towards more in-depth rumour analysis. Rumour stance classification has been used for monitoring community reactions, as well as in supporting downstream applications: it has been used to flag rumours which are particularly doubtful (Derczynski et al., 2015; Liu et al., 2015), and to classify events into rumour and non-rumour categories (Zhao et al., 2015b). Work on automatic rumour stance classification, however, has shortcomings, with some works assuming an unrealistic evaluation scenario where all tweets are treated as coming from a single rumour, and evaluation is conducted on past tweets while training on future ones (e.g. Qazvinian et al. (2011)). Moreover, previous approaches modeled tweets from different rumours as a single rumour, i.e., no attempt was made to model the varying characteristics of tweets coming from different rumours (Qazvinian et al., 2011; Liu et al., 2015; Zhao et al., 2015b; Zeng et al., 2016b). In comparison to the previous work on rumour stance classification, in this chapter we report insights about this application in the following aspects:

1. We move away from evaluation based on a simple cross-validation ignoring the time of tweet occurrences. Instead, we perform stance classification on unseen rumours, given a training set of already annotated rumours on different topics. Additionally, we run experiments where a small number of initial tweets from the target rumour is available to the classifier during training, and evaluate it on the future tweets.

2. We report novel work using a rumour identity feature (i.e. a feature allowing the model to discriminate between tweets coming from different rumours), and show it is an important source of information. We base our model on the hypothesis that rumours exhibit similar characteristics (Procter et al., 2013b), yet may differ. When using external rumours for making predictions about a target rumour, our multi-task learning approach exploits the similarities between rumours, while filtering out rumour-specific characteristics.

3. The work described in this chapter was the first to consider a three-category problem: supporting, denying and questioning, as published in Lukasik et al. (2015a). Previous

work considered this task in a more coarse-grained, binary classification setting (Qazvinian et al., 2011).

4.2 Problem Definition

We introduced the rumour stance classification task in Section 2.2. Here, we define it more formally and discuss the evaluation scheme. Let R = {R₁, . . . , R_{|R|}} be a set of rumours, each of which consists of posts (tweets) discussing it, ∀_{m=1,...,|R|} R_m = {p_m^1, . . . , p_m^{|R_m|}}.

P = ∪_{m=1,...,|R|} R_m is the complete set of tweets from all rumours. Each tweet is classified as supporting, denying or questioning with respect to its rumour: ∀_{p∈P} y(p) ∈ Y = {supporting, denying, questioning}.¹ We defined the rumour stances in Section 2.2 (notice that here, similarly to previous work, we do not include the commenting stance; see Chapter 7 for experiments including all stances). Previous work evaluated the rumour stance classification task using cross-validation. In this approach the set of all tweets P is randomly split among K folds (Qazvinian et al. (2011) used K = 5), and iteratively each fold is used as a test set while the remaining K − 1 folds serve as a training set. In Figure 4.1a we show an illustration of one fold in this setting, with question marks denoting tweets from the test set and other symbols denoting labels from the training set. Notice how training tweets occur after the test tweets within the same rumour, a scenario which does not occur in real-world settings, where journalists are interested in obtaining the stances expressed in the most recent tweets. Ultimately, the separation of rumours and the time dependencies were ignored in the evaluation of previous work. Here, we deal with the task differently, arguing that the evaluation from previous work does not correspond to a real-world scenario. In applications one should be able to classify new, emerging rumours, which can differ from what the classifier has observed in the training set. We formulate the problem in settings which better reflect the real-world scenario. First, we consider the Leave One Out (LOO) setting, in which for each rumour R_m ∈ R we construct the test set equal to R_m and the training set equal to P \ R_m. This is the most challenging scenario, where the test set contains an entirely unseen rumour. We depict it in Figure 4.1b. Thus, we apply the method to each rumour separately. Ultimately, we consider the rumour stance classification problem as a form of transfer learning and seek to classify unseen rumours by training the classifier on previously annotated rumours. We argue that this makes for a more realistic classification scenario towards implementing a real-world rumour-tracking system. This scenario, which we introduced as published in Lukasik et al.

¹Recall our notation is summarized in the Nomenclature on page xii.

[Figure 4.1 appears here, with three panels: (a) Cross validation, (b) Leave One Out (LOO) evaluation, (c) Leave Part Out (LPO) evaluation.]

Figure 4.1: Illustration of different evaluation techniques for rumour stance classification. Different symbols correspond to tweets from one of the rumours which occurred at a specific point of time. Question marks denote the tweets that need to be classified in the test phase; other symbols denote observed classes of tweets in the training set (blue circles denote supporting tweets, yellow squares denote questioning tweets, red triangles denote rejecting tweets). The evaluation from previous work ignores rumour identities and time dependencies between tweets (Figure 4.1a), conflating all rumours into one (shown at the bottom line). In our approach, we focus on predicting labels for tweets from a left-out rumour strictly into the future (Figures 4.1b and 4.1c).

(2015a), has been adopted by the community in follow-up work on the task (Zeng et al., 2016b; Zubiaga et al., 2016a). The second setting is Leave Part Out (LPO). Here, a number of initial tweets from the target rumour R_m, {p_m^1, . . . , p_m^k}, is added to the training set, as depicted in Figure 4.1c. This scenario becomes applicable typically soon after a rumour breaks out and journalists have started monitoring and analysing the related tweet stream. In our experiments, we consider k ∈ {10, 20, 30, 40, 50}. Notice that in these settings future tweets can still be present in the training set as long as they come from reference (non-test) rumours, and as such the settings are not strictly realistic. The riot events we consider are short-lived, with rumours of short lifespans (see Figure 2.1 for a depiction of the lifespans of multiple rumours from the Ferguson riots dataset). This results in rumours overlapping in time, and so keeping only non-overlapping past rumours would leave very little reference data for training. Therefore, we keep reference rumours regardless of when they occurred (an approach adopted in the follow-up work in the community (Zeng et al., 2016b; Zubiaga et al., 2016a)). The tweet-level stance classification problem here assumes that tweets from the training set are already categorized into the rumours they discuss. This information can be acquired either via manual annotation as part of initial analysis by journalists, as is the case with our dataset, or automatically, e.g. using pattern-based rumour detection (Zhao et al., 2015b). Our method is then used to classify the stance expressed in each new tweet from the test set. A sketch of the resulting data splits is given below.
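The following sketch constructs the LOO and LPO splits described above; for simplicity the test set here is the remainder of the target rumour after the first k tweets, whereas in our experiments the test set is fixed to all but the first 50 tweets so that results are comparable across k.

```python
def loo_lpo_splits(rumours, k=0):
    # For each target rumour: train on all other rumours (LOO when k=0),
    # optionally adding the first k tweets of the target rumour (LPO),
    # and test on the target rumour's remaining tweets.
    # Each rumour is assumed to be a list of tweets sorted by time.
    for target, tweets in rumours.items():
        train = [t for name, ts in rumours.items() if name != target for t in ts]
        train += tweets[:k]
        test = tweets[k:]
        yield target, train, test

rumours = {"R1": ["t1", "t2", "t3"], "R2": ["t4", "t5"], "R3": ["t6", "t7", "t8"]}
for target, train, test in loo_lpo_splits(rumours, k=1):
    print(target, len(train), len(test))
```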

4.3 Model

Below we describe the model we use in our experiments, the Gaussian processes for classification (GPC) model. We also list the baseline classifiers that we use for comparison.

Gaussian processes for Classification

In Section 2.2 we reviewed methods that have been used in previous work on rumour stance classification, including Random Forests and Logistic Regression. Here we consider a Gaussian process model as our main approach. It is non-parametric, which means that the complexity of the model grows with the number of available training examples (recall from Section 3.1 that predictions are made based on kernel function evaluations between the test inputs and all the training inputs). This contrasts with parametric models, which are described by a fixed set of parameters. We described and motivated Gaussian processes in more detail in Section 3.1. Here, we describe the adjustments for rumour stance classification in the presence of multiple rumours. In order to conduct multi-class classification, we perform one-vs-all classification for each label and then assign the label with the highest likelihood, following previous work (Preotiuc-Pietro et al., 2015). We tune hyperparameters by maximizing the evidence of the model p(y|X), as explained in Section 3.1.5. Moreover, Gaussian processes for classification require resorting to approximation techniques. Here, we make use of Expectation Propagation, an approximation technique we described in Section 3.1.4.

Transfer Learning In the Leave-Part-Out (LPO) setting, initial labelled tweets from the target rumour are observed in addition to labelled tweets from other rumours. We propose to weight the importance of tweets from the reference rumours depending on how similar their characteristics are to those of the tweets from the target rumour available for training. To handle this with GPC, we use a multiple output model based on the Intrinsic Coregionalisation Model (ICM) (Bonilla et al., 2008), which we described in Section 3.1.6. ICM parametrizes the kernel by a matrix which represents the extent of covariance between pairs of tasks. The complete kernel takes the form

$$k\left((x, m), (x', m')\right) = k_{data}(x, x')\,B_{m,m'},$$
where B is a square coregionalisation matrix, m and m′ denote the tasks of the two inputs, and k_data is a kernel for comparing the inputs x and x′ (here, linear). Thus, the similarity function between two tweets is a product of the inter-rumour similarity (B_{m,m′}) and a tweet similarity independent of the rumour identities (k_data). This allows transfer learning by weighting the importance of annotated tweets from the reference training rumours based on how similar the characteristics of the reference rumours are to those of the test rumour. We parametrize the coregionalisation matrix as B = κI + vv^⊤, where the vector v specifies the correlations between tasks and κ controls the extent of task independence.

Model Settings We consider GPs in three settings, varying in what data the model is trained on and what kernel it uses.² GP ONLY TARGET considers only the target rumour data for training, and thus only uses the single-task learning kernel. Notice that this model setting cannot be considered in the LOO problem setting, as it would not have access to any training data.

2In the experiments we use the GPy implementation of Gaussian processes (The GPy authors, 2015). 70 CHAPTER 4. RUMOUR STANCE CLASSIFICATION

GP considers both the target rumour data and the reference rumour data (i.e. data from rumours other than the target), however it only uses the single-task learning kernel. GP-ICM considers both the target rumour data and the reference rumour data, and employs the ICM kernel for learning the similarities between different rumours.

Baselines We compare our model against the following baselines: MAJORITYVOTE, a classifier based on the training label distribution. MAXENT, Logistic Regression, which was the first method employed for rumour stance

classification (Qazvinian et al., 2011). We use ℓ₁ regularisation with the cost coefficient selected from the list [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]. The cost coefficient was found using grid search employing 3-fold cross-validation over the training set, where two folds were used for training and one for evaluation of the proposed coefficient. SVM, Support Vector Machines with the cost coefficient selected via nested cross-validation from the list of values [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]. The cost coefficient was found using grid search employing 3-fold cross-validation over the training set. RF, Random Forests, which Zeng et al. (2016b) found to be the best approach in their experiments with rumour stance classification. The authors report the value of only one hyperparameter, namely the number of trees, which they set to 30, although they do not state whether it was chosen via hyperparameter optimization. A Random Forests classifier is controlled by a number of hyperparameters, which we select via grid search over the cross product of the considered hyperparameter values (employing 3-fold cross-validation over the training set). The hyperparameters that we consider are: the splitting criterion measuring the quality of a split (chosen from [Gini impurity, entropy]), the number of trees (chosen from [10, 50, 100, 150, 200]), and the minimum number of samples in a node required to perform a split (chosen from [2, 5, 10]). We use bootstrap samples when choosing the data for each tree. We use Scikit-learn implementations of the baseline classifiers (Pedregosa et al., 2011).

4.4 Experiment Settings

This section details the features and evaluation metrics used in our experiments on tweet-level stance classification.

Features For our experiments, we use the England riots and the PHEME datasets, described in detail in Section 2.5. For both datasets, we conducted a series of preprocessing steps in order to address data sparsity. All words were converted to lowercase; stopwords were removed;³ all emoticons were replaced by words;⁴ and stemming was performed. In addition, multiple occurrences of a character were replaced with a double occurrence (Agarwal et al., 2011), to correct for misspellings and lengthenings, e.g., looool. All punctuation was also removed, except for ., ! and ?, which we hypothesize to be important for expressing emotion. Lastly, usernames were removed as they tend to be rumour-specific, i.e., very few users comment on more than one rumour. After preprocessing the text data, we either consider a bag-of-words (BOW) word representation, or replace all words with their Brown cluster ids (Brown). Brown clustering is a hard hierarchical clustering method (Liang, 2005). It clusters words by maximizing the probability of the words under a bigram language model, where words are generated based on their clusters. In our experiments, we used 1000 clusters acquired from a large-scale Twitter corpus (Owoputi et al., 2013), from which we can learn Brown clusters aimed at representing a generalisable Twitter vocabulary. More details on the Brown clusters that we used, as well as the words belonging to each cluster, are available online (Owoputi et al., 2013).⁵ Since we find Brown clusters to perform better than BOW (as shown in Table 4.1), unless explicitly noted we mean the Brown clusters feature representation when reporting experimental results.
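A minimal sketch of the preprocessing pipeline described above; the stopword list, emoticon dictionary and stemmer are assumed to be supplied (we use NLTK's English stopword list and a public emoticon dictionary, as noted in the footnotes).

```python
import re

def preprocess(tweet, stopwords, emoticon_map, stemmer):
    # Pipeline: lowercase, emoticons -> words, drop usernames, collapse
    # repeated characters, strip punctuation except . ! ?, remove stopwords,
    # and stem. The arguments are assumed to be provided by the caller.
    text = tweet.lower()
    for emo, word in emoticon_map.items():
        text = text.replace(emo, " " + word + " ")
    text = re.sub(r"@\w+", "", text)              # usernames are rumour-specific
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # looool -> lool
    text = re.sub(r"[^\w\s.!?]", "", text)        # keep ., ! and ?
    return [stemmer(w) for w in text.split() if w not in stopwords]

# Illustrative call with a trivial stemmer and a tiny emoticon dictionary.
print(preprocess("Loooool this is UNREAL!!! :)", {"this", "is"},
                 {":)": "smile"}, lambda w: w))
```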

Evaluation Metrics Preceding work on rumour stance classification considered a binary classification setting, and employed accuracy and F1 as evaluation metrics (Qazvinian et al., 2011). Here, we deal with a multi-class classification scenario, thus these metrics are not directly applicable. Moreover, as shown in the statistics of the distributions of stances in our datasets (see Table 2.1 for the England riots dataset, and Table 2.6 for the PHEME dataset), the classes are imbalanced, with varying proportions of stances in each rumour. In order to gain insights into model performance we employ multiple evaluation metrics: micro-averaged and macro-averaged F1 scores, described in Section 3.3. After computing the F1 score for each fold (corresponding to a rumour), we compute the averaged score across folds.

³We removed stopwords using the English list from Python's NLTK package.
⁴We used the dictionary from http://bit.ly/1rX1Hdk and extended it with: :o, :|, =/, :s, :S, :p.
⁵http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html

4.5 Results

In this section we report the results of the experiments on rumour stance classification. We analyze the performance of the different methods and draw conclusions about the task. We report the Micro-F1 and Macro-F1 metrics described in Section 3.3, which provide insights into two aspects of model performance: how well a model classifies tweets overall (i.e. minimizing the absolute number of errors), and how well it balances the errors across different stances. Tables 4.1 and 4.2 report the combined results in both the LOO and LPO settings. Consecutive columns correspond to an increasing number of tweets from the target rumour available during training (column 0 corresponds to the LOO setting, and the other columns correspond to the LPO setting). In Table 4.1 we show results for the GP-based models using different text representations. Notice that for both the England riots and PHEME datasets, Brown clusters make for a more robust text representation. Brown clusters always yield better results on the PHEME dataset according to both Micro-F1 and Macro-F1 scores. Moreover, on the England riots dataset, Brown clusters always lead to a better Macro-F1 score, and to competitive Micro-F1 scores. Thus, in the following analysis we report baselines using the more promising text representation employing Brown clusters. We discuss the relative performance of the different GP settings in the following sections.

Experiments on the England Riots dataset In Table 4.2a we report micro-averaged and macro-averaged F1 scores on the England riots dataset as the number of tweets from the target rumour used for training increases (this information is graphically illustrated in Figures 4.2a and 4.2b). Notice how the performance of GP Only Target is significantly lower than that of GP and GP-ICM, showing the importance of using additional data from reference rumours. We can notice that the performance of most of the methods improves as the number of annotated training examples from the target rumour increases. This phenomenon is especially noticeable for the GP-ICM method. When no annotation from the target rumour is used, its performance is poor in terms of the micro-averaged F1 score. However, it is able to make very effective use of the annotation: its performance keeps improving as the number of training instances approaches 50, and overtakes the baselines after 20 annotated examples. This shows GP-ICM is able to make use of the labelled instances from the target rumour, which the baselines struggle with. Note that 50 tweets represent, on average, less than 7% of the whole rumour, with the rest of the rumour unobserved during training. Moreover, notice how regardless of the number of labelled instances, GP-ICM yields good results in terms of the macro-averaged F1 score. This shows that GP-ICM balances the errors made for each stance better than other

(a) England riots

Macro-F1                0      10     20     30     40     50
GP Only Target Brown    N/A    0.346  0.366  0.366  0.382  0.416
GP Only Target BOW      N/A    0.314  0.346  0.379  0.388  0.402
GP Brown                0.489  0.571  0.620  0.614  0.615  0.617
GP BOW                  0.452  0.572  0.603  0.593  0.588  0.616
GP-ICM Brown            0.436  0.634  0.708  0.646  0.657  0.635
GP-ICM BOW              0.394  0.510  0.585  0.544  0.574  0.562

Micro-F1                0      10     20     30     40     50
GP Only Target Brown    N/A    0.787  0.722  0.733  0.735  0.769
GP Only Target BOW      N/A    0.781  0.710  0.760  0.751  0.775
GP Brown                0.614  0.737  0.765  0.761  0.762  0.763
GP BOW                  0.640  0.817  0.825  0.818  0.811  0.833
GP-ICM Brown            0.540  0.812  0.855  0.829  0.833  0.828
GP-ICM BOW              0.476  0.821  0.806  0.809  0.811  0.822

(b) PHEME

Macro-F1                0      10     20     30     40     50
GP Only Target Brown    N/A    0.434  0.489  0.494  0.514  0.515
GP Only Target BOW      N/A    0.356  0.382  0.399  0.415  0.439
GP Brown                0.548  0.555  0.569  0.566  0.567  0.575
GP BOW                  0.465  0.472  0.475  0.477  0.471  0.481
GP-ICM Brown            0.557  0.555  0.592  0.575  0.594  0.598
GP-ICM BOW              0.453  0.465  0.455  0.439  0.466  0.471

Micro-F1                0      10     20     30     40     50
GP Only Target Brown    N/A    0.546  0.577  0.612  0.606  0.613
GP Only Target BOW      N/A    0.591  0.548  0.554  0.546  0.558
GP Brown                0.631  0.636  0.644  0.644  0.645  0.650
GP BOW                  0.551  0.569  0.572  0.575  0.572  0.579
GP-ICM Brown            0.655  0.635  0.652  0.655  0.668  0.675
GP-ICM BOW              0.561  0.579  0.587  0.578  0.580  0.577

Table 4.1: Micro-F1 and Macro-F1 scores for GP based methods under different settings using different word representation methods (Brown clusters and BOW; denoted by rows) and different proportions of the initial tweets annotated from the target rumour/event on the England riots and the PHEME datasets (denoted by columns).

[Figure omitted: four line plots of Macro-F1 and Micro-F1 (y-axis) against the number of tweets from the target rumour used for training (x-axis, 0-50), comparing the GP, GP-ICM, Majority, MaxEnt, RF and SVM methods. Panels: (a) Macro-F1 on the England riots; (b) Micro-F1 on the England riots; (c) Macro-F1 on the PHEME; (d) Micro-F1 on the PHEME.]

Figure 4.2: Macro-F1 and Micro-F1 scores for different methods over the number of tweets from the target rumour used for training on the England riots and the PHEME datasets. The test set is fixed to all but the first 50 tweets of the target rumour, making the results comparable across the varying training size.

(a) England riots

Macro-F1         0      10     20     30     40     50
Majority         0.294  0.294  0.294  0.294  0.294  0.294
GP Only Target   N/A    0.346  0.366  0.366  0.382  0.416
GP               0.489  0.571  0.620  0.614  0.615  0.617
GP-ICM           0.436  0.634  0.708  0.646  0.657  0.635
MaxEnt           0.491  0.529  0.569  0.611  0.575  0.577
SVM              0.535  0.614  0.632  0.626  0.629  0.629
RF               0.491  0.514  0.522  0.531  0.510  0.526

Micro-F1         0      10     20     30     40     50
Majority         0.788  0.788  0.788  0.788  0.788  0.788
GP Only Target   N/A    0.787  0.722  0.733  0.735  0.769
GP               0.614  0.737  0.765  0.761  0.762  0.763
GP-ICM           0.540  0.812  0.855  0.829  0.833  0.828
MaxEnt           0.633  0.658  0.720  0.774  0.759  0.761
SVM              0.701  0.794  0.808  0.805  0.806  0.808
RF               0.757  0.771  0.775  0.790  0.775  0.784

(b) PHEME

Macro-F1         0      10     20     30     40     50
Majority         0.240  0.240  0.240  0.240  0.240  0.240
GP Only Target   N/A    0.434  0.489  0.494  0.514  0.515
GP               0.548  0.555  0.569  0.566  0.567  0.575
GP-ICM           0.557  0.555  0.592  0.575  0.594  0.598
MaxEnt           0.544  0.544  0.551  0.549  0.555  0.559
SVM              0.590  0.590  0.589  0.594  0.591  0.591
RF               0.593  0.599  0.606  0.611  0.604  0.609

Micro-F1         0      10     20     30     40     50
Majority         0.561  0.561  0.561  0.561  0.561  0.561
GP Only Target   N/A    0.546  0.577  0.612  0.606  0.613
GP               0.631  0.636  0.644  0.644  0.645  0.650
GP-ICM           0.655  0.635  0.652  0.655  0.668  0.675
MaxEnt           0.649  0.648  0.653  0.653  0.652  0.655
SVM              0.677  0.678  0.678  0.681  0.680  0.680
RF               0.692  0.694  0.698  0.702  0.696  0.702

Table 4.2: Micro-F1 and Macro-F1 scores for different methods and different proportions of the initial tweets annotated from the target rumour/event on the England riots and the PHEME datasets. All methods use Brown clusters word representations. The results are also pictorially shown in Figure 4.2.

Experiments on the PHEME dataset In Table 4.2b we report micro-averaged and macro-averaged F1 scores of the methods' performance on the PHEME dataset as the number of tweets from the target rumour used for training increases (this information is graphically illustrated in Figures 4.2c and 4.2d). Notice the results are overall lower than in the case of the England Riots dataset. The reason for this is twofold. First, in the PHEME dataset we deal with a more challenging setting, where at each fold of the evaluation we leave an event out (where an event is composed of rumours), and train on the other events. Instead, in the England riots case we were leaving one rumour out within the same event. Secondly, the PHEME dataset is largely composed of tweets that are replying to others (Zubiaga et al., 2016c), which makes them shorter. As shown in Table 2.1 and Table 2.5, tweets from the PHEME dataset are much shorter than those from the England riots dataset, and hence it is more challenging to extract meaningful features from them. Despite these difficulties, we are interested in exploring whether similar trends hold across classifiers. One difference from the England Riots results is that, in this case, the classifiers do not benefit as much from incorporating an increasing number of annotated tweets from the target rumour. This is likely due to the heterogeneity of the events in the PHEME dataset. Namely, within each event there are multiple rumours, and as the initial tweets are annotated, we are gaining insight into a diverse set of rumours as they start to unfold. All of these rumours are different, and they do not necessarily cover all rumours from the target event. By contrast, each left-out set of tweets in the England Riots pertains to a single rumour, and so annotating its initial tweets is useful, giving insight into the characteristics of that rumour. We observe that Random Forests turn out to be the best approach according to both metrics. Interestingly, the second best method differs depending on which metric we consider. According to Macro-F1, GP-ICM and SVM are both competitive, with GP-ICM being slightly better with larger supervision. However, under Micro-F1, SVM is clearly the second best approach. Similarly to the England Riots results, the performance of GP Only Target is significantly lower than that of GP and GP-ICM, showing that reference rumour annotations are crucial for the GPs to achieve competitive results (we omit GP Only Target from the graphs for better visualization of the differences between the best performing models). Moreover, GP-ICM outperforms GP, which shows that the multi-task learning kernel brings improvements within the same model.

[Figure omitted: three heatmaps of cross-classification rates over the stances support, deny and question, one each for GP-ICM, SVM and RF.]

Figure 4.3: Cross-classification rates for competitive methods on the England riots dataset. A cell i, j denotes what percentage of times the ground truth stance i is being classified as stance j. The statistics are also reported in Table 4.3.

Analysis of the Best Performing Methods Next, we analyze the results of the best-performing classifiers (GP-ICM, SVM and RF) by looking at the per-class performance. Table 4.3 reports per-class F1 scores for the three best performing classifiers on the England riots dataset and the PHEME dataset in the LPO setting where 20 tweets from a target rumour are available during training. The table also reports statistics on the misclassifications that the approaches made (the cross-stance classifications are also depicted graphically for the case of the England riots in Figure 4.3). Notice that in the case of the England riots, GP-ICM is a clear winner. It is the only approach which manages to retrieve more than 50% of denies, which is one of the two under-represented stances (denies are around 6 times less frequent than supports, whereas questions are around 10 times less frequent than supports, as reported in Table 2.1). Interestingly, the questioning stance is easier to correctly classify than the denying stance across the methods, even though it is even less frequent. In the case of the PHEME rumours, GP-ICM is again the best classifier in terms of retrieving the denies. This is an interesting property, as denying is the most challenging stance (challenging in the sense of yielding the worst results across all methods). However, for both the supporting and questioning stances RF and SVM make better predictions. The problem of misclassifying denies is due to the datasets' imbalance, which is a common problem in the rumour stance distribution, as previous studies have shown that users rarely deny or question rumours, but instead largely support rumours regardless of whether they are true or false (Zubiaga et al., 2016c).

Discussion We experimented with two rumour datasets, and adapted the introduced evaluation schemes differently to each of them. In the case of the first dataset we were making predictions for single held-out rumours from the England riots event from 2011, thus dealing with the setting where all the rumours revolve around the same background event. In the case of the second dataset, we were making predictions for each of the five different events, having access to the remaining four, making for a significantly more challenging setup. In these different settings, we made various observations regarding the relative performance of the different approaches. We observed that while a GP trained only on the target data does not achieve competitive results, the GP using reference examples (the scenario of all other baselines) achieves better performance, and its multi-task learning variant GP-ICM leads to additional improvements, outperforming the baselines in the case of the England riots dataset. Moreover, in the case of both datasets, GP-ICM manages to perform relatively well in classifying the denying stance, which turns out to be the most challenging. Another appealing aspect of GP-ICM on the England riots dataset is that it performs well despite having very few annotations from the target rumour, making better use of such training data than the baselines. However, we notice that when annotation comes from external events (which is the case for the PHEME dataset experiments), Random Forests and SVMs are competitive with the GP-ICM approach. This poses the question of whether multi-task learning variants of Random Forests or SVMs (Evgeniou and Pontil, 2004) could bring further improvements, as we found the multi-task learning Gaussian process model (GP-ICM) to consistently outperform the single-task learning GP across all settings.

England Riots

                             Performance          Cross-classification rates
Class            Classifier  P      R      F1     S      D      Q
supporting (S)   GP-ICM      0.893  0.935  0.914  0.935  0.008  0.057
                 RF          0.833  0.903  0.867  0.903  0.014  0.082
                 SVM         0.873  0.896  0.884  0.896  0.019  0.086
denying (D)      GP-ICM      0.882  0.535  0.666  0.452  0.535  0.013
                 RF          0.584  0.162  0.254  0.823  0.162  0.015
                 SVM         0.742  0.423  0.539  0.563  0.423  0.014
questioning (Q)  GP-ICM      0.496  0.602  0.544  0.352  0.045  0.602
                 RF          0.380  0.540  0.446  0.403  0.057  0.540
                 SVM         0.394  0.593  0.473  0.339  0.068  0.593

PHEME

                             Performance          Cross-classification rates
Class            Classifier  P      R      F1     S      D      Q
supporting (S)   GP-ICM      0.748  0.767  0.757  0.767  0.133  0.101
                 RF          0.714  0.899  0.796  0.899  0.053  0.047
                 SVM         0.706  0.865  0.777  0.865  0.071  0.064
denying (D)      GP-ICM      0.442  0.385  0.412  0.428  0.385  0.187
                 RF          0.570  0.277  0.373  0.594  0.277  0.129
                 SVM         0.511  0.259  0.344  0.604  0.259  0.137
questioning (Q)  GP-ICM      0.588  0.625  0.606  0.233  0.141  0.625
                 RF          0.708  0.601  0.650  0.329  0.071  0.601
                 SVM         0.676  0.618  0.646  0.318  0.064  0.618

Table 4.3: Per-class precision, recall and F1 scores for the best-performing classifiers (GP-ICM, SVM and RF) on the England Riots and the PHEME datasets with 20 tweets from a target rumour available during training. Cross-classification rates denote how often the target stance (denoted by the row) is being misclassified as another stance (denoted by the column). Figure 4.3 graphically illustrates the cross-classification rates for the top approaches on the England riots dataset.

4.6 Conclusions

This chapter investigated the problem of classifying stances expressed in tweets about rumours. We investigated realistic scenarios of evaluating the problem, as we set out in research question 1.1. First, we considered a setting where no training data from the target rumours is available (LOO). Without access to annotated examples of the target rumour the learning problem proved to be challenging. We showed that in the supervised domain adaptation setting (LPO), annotating even a small number of tweets helps to achieve better results. Moreover, we demonstrated the benefits of a multi-task learning approach by showing how GPs with a multi-task learning kernel consistently outperform GPs with a single-task learning kernel. This shows how the reference rumours can provide valuable information for making predictions about the target rumour, as we set out to investigate in research question 1.2. The next two chapters consider the second rumour application of this thesis, rumour popularity prediction. We come back to the rumour stance classification application in Chapter 7, where we investigate whether the ideas from the rumour popularity modeling can improve results on this application.

Chapter 5

Modeling Temporal Dynamics of Rumours

In the preceding chapter we considered the problem of rumour stance classification. In this chapter, we move to a different application concerning rumours in social media. Namely, we set out to address the second aim specified in Chapter 1: predicting rumour popularity. In particular, we work towards answering the first three research questions:

1. How can the rumour popularity prediction problem be formulated? which we address in Section 5.2. We define the task as predicting the number of tweets about the rumour during unobserved intervals of time from the rumour lifespan.

2. Can information about tweet arrivals from one rumour bring useful information for predicting the popularity of another rumour? which we address in Sections 5.3 and 5.5. We model rumour popularity across multiple rumours using a log-Gaussian Cox process with a multi-task learning kernel, the Intrinsic Coregionalization Model. We find that this approach yields better predictions than non-multi-task learning approaches.

3. Does the text content of tweets convey useful information for the rumour popularity prediction task? which we address in Sections 5.3 and 5.5. We explore a multi-task learning approach where similarities between rumours are parameterized by text content from tweets aggregated over the rumour. We find that employing text allows us to achieve superior results compared to the baselines.


Parts of this chapter have been published as Lukasik, M., Cohn, T., and Bontcheva, K. (2015b). Point process modelling of rumour dynamics in social media. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 518-523. and Lukasik, M., Srijith, P. K., Cohn, T., and Bontcheva, K. (2015c). Modeling tweet arrival times using log-Gaussian Cox processes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 250-255.

5.1 Introduction

The ability to model rumour dynamics helps with identifying rumours which, if not debunked early, are likely to spread very fast. One such example is the false rumour of rioters breaking into McDonald's during the 2011 England riots. An effective early warning system of this kind is of interest to government bodies and news outlets, who struggle with monitoring and verifying social media posts during emergencies and social unrest. The challenge comes from the observation that different rumours exhibit different trajectories, as in the case of the Ferguson rumours depicted in Figure 2.1. Online discussion of some rumours quickly fades, whereas for others it takes a lot longer to diminish. Two characteristics can help determine if a rumour will continue to be discussed. One is the dynamics of post occurrences, e.g. if the frequency profile decays quickly, chances are it will not attract further attention. A second factor is text from the posts themselves, where phrases such as not true, unconfirmed, or debunk help users judge veracity and thus limit rumour spread (Zhao et al., 2015b). This chapter considers the problem of modelling temporal frequency profiles of rumours by taking into account both the temporal and textual information. Since posts occur at continuous timestamps, and their density is typically a smooth function of time, we base our model on point processes, which have been shown to model such data well in epidemiology and conflict mapping (Brix and Diggle, 2001; Zammit-Mangion et al., 2012). This framework models count data in continuous time through the underlying intensity of a Poisson distribution. The posterior distribution can then be used for several inference problems, e.g. to query the expected count of posts, or to find the probability of a count of posts occurring during an arbitrary time interval. We model frequency profiles using a log-Gaussian Cox process (Møller and Syversveen, 1998), a point process where the log-intensity of the Poisson distribution is modelled via a Gaussian process (see Chapter 3 for details about Gaussian processes).

Modeling the frequency profile of a rumour based on posts is challenging, since many rumours consist of only a small number of posts and exhibit complex patterns. To overcome this difficulty, we propose multi-task learning approaches, where patterns are correlated across multiple rumours. In this way statistics over a larger training set are shared, enabling more reliable predictions for distant time periods in which no posts from the target rumour have been observed. We considered one multi-task learning approach in Chapter 4 in the context of rumour stance classification. Here, apart from employing the ICM multi-task learning model, which we considered for rumour stance classification, we also demonstrate how text from observed posts can be used to weight similarities between rumours. This way, we incorporate textual information about rumours into the rumour popularity prediction task. We also introduce the problem of predicting the times of tweets in unobserved time intervals. It turns out that this problem is closely related to modelling the frequency profiles. We evaluate the models using Twitter rumours from the 2014 Ferguson unrest, and demonstrate that they provide good predictions of both rumour popularity and inter-arrival times, beating baselines such as the homogeneous Poisson process and Gaussian process regression. This chapter makes the following contributions:

1. Introduces the problem of modelling rumour popularity, and presents a method based on a log-Gaussian Cox process;

2. Incorporates multi-task learning to generalize across disparate rumours;

3. Demonstrates how incorporating text into multi-task learning improves results.

5.2 Problem definition

First we define the rumour popularity prediction problem. One approach is to frame it as binary classification, where popular rumours and non-popular rumours are assigned to separate categories. However, in order to do that, one needs to define what a popular rumour is. Does one define a popular rumour as one that is going to attract a large number of tweets in the future, i.e., above some threshold? If so, what threshold should be chosen? Moreover, one might prefer to define a popular rumour based on how the number of tweets varies over time, rather than focusing on just a single aggregate statistic. For example, in Figure 5.1 two example rumour trajectories are shown. The total number of tweets around both is the same; however, for one the number of tweets overall increases (5.1a), whereas for the other it decreases (5.1b).

[Figure omitted: bar plots of tweet counts per 6-minute interval over two hours for two rumours. (a) A rumour with an increasing number of tweets per unit of time. (b) A rumour with a decreasing number of tweets per unit of time.]

Figure 5.1: Counts of tweets in consecutive 6-minute time intervals over two hours for two illustrative rumours. The number of tweets about one rumour increases (a), and about the other decreases (b). The total number of tweets over the two hours is equal for both rumours.

A definition of rumour popularity based on the total number of tweets ignores this characteristic, which might be important for a journalist assessing rumour popularity. Instead, we decide to leave the interpretation of what a popular rumour is to journalists, and provide them with predictions which can be interpreted according to what one might be interested in. Namely, we are going to output predictions of how the number of tweets is going to change over time in the future, without assigning any label to this spread.

Let us consider a time interval $[0, l]$ of length $l = 2$ hours and a set of $|R|$ rumours $R = \{R_m\}_{m=1}^{|R|}$, where rumour $R_m$ consists of a set of $|R_m|$ posts $R_m = \{p_m^n\}_{n=1}^{|R_m|}$. Posts are tuples $p_m^n = (w_m^n, t_m^n)$, where $w_m^n$ is text (in our case a bag of Brown clusters text representation, see Section 5.4 for more details) and $t_m^n$ is a timestamp describing when the post $p_m^n$ happened, measured in time elapsed since the first post on rumour $R_m$ (this way the post that happened first happened at time 0).¹ Posts occur at different timestamps, yielding a varying density of posts over time, which we are interested in estimating. To evaluate the predictions for a given rumour $R_m$ we hold out posts from a set of 10 non-overlapping 6-minute time intervals $T_{te} = \{[s_m^k, e_m^k]\}_{k=1}^{10}$ (where $s_m^k$ and $e_m^k$ are respectively the start and end points of interval $k$ for rumour $m$) and estimate the performance of the trained model at predicting counts in them. We consider the problem in supervised settings, where posts on this rumour outside of these intervals form the training set:

$$R_m^O = \Big\{ p_m^n : t_m^n \notin \bigcup_{k=1}^{10} [s_m^k, e_m^k] \Big\}. \qquad (5.1)$$

¹ Recall our notation is summarized in the Nomenclature on page xii.

We also consider a domain adaptation setting, where additionally posts from other rumours are observed, $P \setminus R_m$, yielding the training set $R_m^O \cup (P \setminus R_m)$ (where $P$ is the set of tweets from all rumours). More specifically, we are going to consider two instantiations of this problem formulation. The first is interpolation, where the test intervals are not ordered in any particular way. This corresponds to a situation, e.g., where a journalist analyses a rumour during short spot checks, but wants to know the prevalence of the rumour at other times, thus limiting the need for constant attention. The second formulation is extrapolation, where all observed posts occur before the test intervals. This corresponds to a scenario where the user seeks to predict the future profile of the rumour, e.g., to identify rumours that will attract further attention or wither away. Notice the similarity to the problem formulation of rumour stance classification from Section 4.2. The extrapolation setting is analogous to the LPO setting from the rumour stance classification experiments, where a number of initial tweets from a rumour are observed. The interpolation setting does not correspond to any evaluation scenario from Chapter 4, as here we allow for observation of tweets which happen after (some of) the unobserved intervals. We also introduce the problem of predicting the exact times of posts in a future unobserved time interval, which is studied as inter-arrival time prediction (note we only consider this in the extrapolation setting). In this problem we observe posts $\{p_m^n\}_{n=1}^{|R_m^O|}$ and query the model for the complete set of times $\{t_m^n\}_{n=|R_m^O|+1}^{|R_m|}$ of posts about rumour $R_m$ in the future time interval (in our case the future one hour). This can be viewed as a more fine-grained version of the previous problems, where we are not only interested in the counts in intervals, but rather in the exact times of occurrences.
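As a concrete illustration of the two evaluation settings, the sketch below builds held-out interval sets under the assumptions used later in this chapter (a two-hour lifespan split into 20 evenly spaced intervals, 10 of them held out); the function and variable names are hypothetical, not from the thesis code.

```python
import random

def split_intervals(n_intervals=20, n_test=10, mode="interpolation", seed=0):
    """Return (train, test) interval indices for one target rumour."""
    indices = list(range(n_intervals))
    if mode == "extrapolation":
        # all observed posts precede the test intervals (the second hour)
        test = indices[-n_test:]
    else:
        # interpolation: held-out intervals are scattered, e.g. spot checks
        test = sorted(random.Random(seed).sample(indices, n_test))
    train = [i for i in indices if i not in test]
    return train, test
```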

5.3 Model

We employ a log-Gaussian Cox process, a point process coupled with a GP prior, which has been described in Section 3.2.4. LGCP formulates the model as $\lambda(t) = \exp(f(t))$, such that the function $f(t)$ is a sample from a Gaussian process. This way, a non-parametric prior is put on the hazard function, allowing the expressivity of the model to grow together with the dataset. Below we describe the adjustments for the problem of rumour dynamics modeling. Namely, in order to exploit similarities across rumours we propose multi-task approaches, where each rumour represents a task. We consider two approaches, the first based on the Intrinsic Coregionalization Model (Bonilla et al., 2008; Álvarez et al., 2012), and the second based on a text parametrization of rumour similarities. In either case, all hyperparameters are optimized by maximizing the marginal likelihood of the data $L(R_m^O \mid \theta)$, where $\theta$ denotes the set of hyperparameter values, as explained in Section 3.1.5.
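To make the LGCP construction concrete, the fragment below sketches the discretized Poisson likelihood that the log-intensity $f$ induces over binned counts; it is a toy illustration under the stated discretization, not the inference scheme of Section 3.2.4.

```python
import numpy as np
from scipy.stats import poisson

def lgcp_log_likelihood(f, counts, bin_width):
    """Log p(counts | f), with counts_j ~ Poisson(exp(f_j) * bin_width)."""
    rate = np.exp(f) * bin_width  # integrated intensity per bin, lambda(t) = exp(f(t))
    return poisson.logpmf(counts, rate).sum()
```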

Multi-task learning via the Intrinsic Coregionalization Model We employ a multiple output GP based on the Intrinsic Coregionalization Model (ICM), which we described in Section 3.1.6 and employed in Chapter 4 for modeling rumour stances. ICM parametrizes the kernel by a matrix representing similarities between pairs of tasks. We expect it to find correlations between rumours exhibiting similar temporal patterns. The kernel takes the form

$$k_{ICM}((t, m), (t', m')) = k_{time}(t, t')\, B_{m,m'},$$

where $B$ is a square coregionalization matrix (rank 1, i.e. $B = \kappa I + v v^\top$), $m$ and $m'$ denote the tasks of the two inputs, $k_{time}$ is a kernel for comparing inputs $t$ and $t'$ (here RBF), and $\kappa$ is a vector of values modulating the extent of each task's independence.
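A minimal NumPy sketch of this kernel is given below; the RBF lengthscale and variance defaults are illustrative placeholders rather than the optimized hyperparameters, and the task indices are assumed to be integer arrays.

```python
import numpy as np

def rbf(t, t2, lengthscale=1.0, variance=1.0):
    """RBF time kernel k_time(t, t')."""
    d = t[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_kernel(t, m, t2, m2, kappa, v, **rbf_args):
    """k_ICM((t, m), (t', m')) = k_time(t, t') * B[m, m'], with B rank 1."""
    B = np.diag(kappa) + np.outer(v, v)  # coregionalization matrix, kappa a vector
    return rbf(t, t2, **rbf_args) * B[np.ix_(m, m2)]
```

Here `t`, `t2` are arrays of timestamps and `m`, `m2` the corresponding task (rumour) indices; in practice the entries of $\kappa$ and $v$ would be optimized alongside the other kernel hyperparameters by maximizing the marginal likelihood.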

Incorporating Text In Section 5.1 we argued that text might bring useful information for predicting rumour popularity. The question of how to incorporate text is not an easy one. One simple approach could be to augment the list of arguments of a kernel by a dimension corresponding to text, leading to a kernel of the form $k_{TXT}((t, m, text), (t', m', text'))$. Such a kernel could potentially be evaluated when comparing timestamps from fully observed intervals of rumour lifespans. One way of obtaining a text representation for timestamp $t$ from rumour $R_m$ could be collapsing text from posts around rumour $R_m$ from some region of time corresponding to timestamp $t$. Another approach to finding a text representation for time $t$ could be taking text from the closest post from rumour $R_m$. These approaches are helpful for timestamps coming from intervals of time for which posts about the rumour of interest have been observed. However, in the problem as formulated in Section 5.2, we need to make predictions for intervals of time from which no posts have been observed. Therefore, simply expanding the list of arguments by text corresponding to posts around a particular timestamp is not feasible. Instead, we propose to use text from all observed posts comprising each rumour. This way, we obtain the same text representation for a rumour across all timestamps. This approach ignores information about the timestamp when finding the text representation of a rumour. The kernel takes the form

$$k_{TXT}((t, m), (t', m')) = k_{time}(t, t') \times k_{text}\Big(\sum_{p_m^n \in R_m^O} w_m^n,\; \sum_{p_{m'}^l \in R_{m'}^O} w_{m'}^l\Big).$$

We compare text via a linear kernel with additive underlying base similarity, expressed by

$$k_{text}(w, w') = b + c\, w^\top w'.$$

We also consider $k_{ICM+TXT} = k_{ICM} + k_{TXT}$, the combination of the two multi-task learning approaches. This corresponds to the kernel fusion scheme (Wu et al., 2005), where different views of time series are combined in a weighted average (the kernel variances correspond to the weights).
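The sketch below assembles the text-based kernel from these pieces: each rumour is summarized by the sum of its observed posts' (e.g. Brown-cluster count) vectors, compared with the linear kernel above and multiplied by an RBF time kernel. The values of b, c and the RBF settings are illustrative placeholders.

```python
import numpy as np

def txt_kernel(t, m, W, b=1.0, c=1.0, lengthscale=1.0, variance=1.0):
    """k_TXT over inputs (t_i, m_i); W[m] is the summed text vector of rumour m."""
    K_text = b + c * (W @ W.T)  # linear text kernel between aggregated rumour vectors
    d = t[:, None] - t[None, :]
    K_time = variance * np.exp(-0.5 * (d / lengthscale) ** 2)
    return K_time * K_text[np.ix_(m, m)]
```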

Modelling inter-arrival time Predicting the inter-arrival time between tweets about rumours could be applied for ranking rumours according to their activity. This in turn can serve to narrow attention to only the rumours which are predicted to receive fresh posts the soonest. We are interested in predicting the next arrival time of a tweet given the time at which the previous tweet happened. As explained in Section 3.2.2, this can be achieved by sampling the inter-arrival time of occurrence of the next tweet using Equation (3.35). We use the importance sampling scheme (Gelman et al., 2003) as shown in Algorithm 3.1, where an exponential distribution is used as the proposal density (we set the scale parameter of this exponential distribution to β = 2 in order to generate points close to 0). We run this algorithm sequentially until the end of the interval of interest, for which a user wants to find the times of post occurrences.
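Since Equation (3.35) and Algorithm 3.1 are not reproduced here, the following is only a hedged sketch of one sampling-importance-resampling step with an exponential proposal; `intensity` and `cum_intensity` stand in for the posterior intensity λ(·) and its integral, both hypothetical vectorized callables.

```python
import numpy as np

def sample_next_arrival(t0, intensity, cum_intensity, beta=2.0, n_samples=1000, rng=None):
    """Draw one next-arrival time after t0 for an inhomogeneous Poisson process."""
    rng = rng or np.random.default_rng()
    tau = rng.exponential(scale=beta, size=n_samples)  # proposal draws
    # target density of the inter-arrival time:
    # lambda(t0 + tau) * exp(-(Lambda(t0 + tau) - Lambda(t0)))
    target = intensity(t0 + tau) * np.exp(-(cum_intensity(t0 + tau) - cum_intensity(t0)))
    proposal = np.exp(-tau / beta) / beta
    weights = target / proposal
    weights /= weights.sum()
    return t0 + rng.choice(tau, p=weights)  # resample one candidate
```

Running this sequentially (feeding each sampled time back in as the new t0) yields a full set of predicted arrival times up to the end of the interval of interest.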

5.4 Experiment Settings

Data We use the Ferguson rumour data set (Zubiaga et al., 2015a), as the one with the largest number of tweets per rumour among the PHEME rumour events, as shown in Table 2.5. It consists of tweets collected in August and September 2014 during the Ferguson unrest, and contains both source tweets and the conversational threads around them (where available). Since some rumours have few posts, we consider only those with at least 15 posts in the first hour as rumours of particular interest. This results in 114 rumours consisting of a total of 4098 tweets. For more details about the dataset, see Section 2.5.2. In our experiments, we consider the first two hours of each rumour lifespan, which we split into 20 evenly spaced intervals. This way, our dataset consists in total of 2280 intervals. In the experiments we iterate over rumours using a form of folded cross-validation, where in each iteration we exclude some (but not all) time intervals for a single target rumour. The excluded time intervals form the test set: either by selecting half at random (interpolation), or by taking only the second half for testing (extrapolation). We consider the problem of predicting the times of tweets in the extrapolation setting, where the posts from the first hour are observed and we predict the tweets in the second hour. See Section 5.2 for more details and the motivation behind the evaluation.

We use the same preprocessing for obtaining the text representation of tweets as reported in Section 4.4 for the experiments with rumour stance classification. In particular, we replace words with their Brown cluster ids, using 1000 clusters acquired on a large scale Twitter corpus (Owoputi et al., 2013). This approach was shown to outperform the BOW feature representation (see Section 4.5).
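A loader for this preprocessing might look as follows; the tab-separated (cluster path, word, count) layout is an assumption about the public TweetNLP cluster file, so verify it against the actual release.

```python
def load_brown_clusters(path):
    """Map each word to its Brown cluster id (a bit-string path)."""
    word2cluster = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                cluster, word = fields[0], fields[1]
                word2cluster[word] = cluster
    return word2cluster

def to_cluster_ids(tokens, word2cluster):
    """Replace tokens by cluster ids, dropping out-of-vocabulary words."""
    return [word2cluster[tok] for tok in tokens if tok in word2cluster]
```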

Evaluation metrics We use the mean squared error (MSE) averaged across rumours to measure the difference between the true counts and predicted counts in the test intervals. The higher this metric, the further the model is from the actual counts. We also provide standard deviations, showing how much the MSE varies across rumours. Since the probabilistic models return distributions over the possible outputs, we also evaluate them via the log-likelihood (LL) of the true counts under the returned distributions. This metric indicates how much probability mass a model puts on the correct prediction, with high values indicating a good prediction. We again provide standard deviations, showing how much the LL varies across rumours. For inter-arrival time prediction we use a metric based on root mean squared error, which we call the penalized root mean squared error

(PRMSE). Let the arrival times predicted by a model be $(\hat{t}_1, \ldots, \hat{t}_M)$ and let the actual arrival times be $(t_1, \ldots, t_N)$. The number of arrival times predicted by the model ($M$) and the actual number of arrival times ($N$) may differ, which in turn should be penalized by the metric. We propose an evaluation metric which takes this into account by adding to the mean squared error between the aligned first $\min(M, N)$ times the squared differences between the unaligned times and the end of the observation window $T$. The metric takes the form:

$$\mathrm{PRMSE}(t, \hat{t}) = \sqrt{\frac{1}{\min(M,N)} \sum_{i=1}^{\min(M,N)} (\hat{t}_i - t_i)^2 + \sum_{i=N+1}^{M} (T - \hat{t}_i)^2 + \sum_{i=M+1}^{N} (T - t_i)^2}. \qquad (5.2)$$

Note that if we normalized the penalization terms by their number of components, the metric would not penalize for an excessive or insufficient number of points predicted by the model. Therefore, we leave them unnormalized. We also consider the aligned root mean squared error (ARMSE), which takes the form:

$$\mathrm{ARMSE}(t, \hat{t}) = \sqrt{\frac{1}{\min(M,N)} \sum_{i=1}^{\min(M,N)} (\hat{t}_i - t_i)^2}. \qquad (5.3)$$
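Both metrics translate directly into code; the sketch below implements Equations (5.2) and (5.3), with arrival times and the window end T in the same units (minutes here).

```python
import numpy as np

def armse(t_hat, t):
    """Aligned RMSE (Eq. 5.3) over the first min(M, N) arrival times."""
    k = min(len(t_hat), len(t))
    return float(np.sqrt(np.mean((np.asarray(t_hat[:k]) - np.asarray(t[:k])) ** 2)))

def prmse(t_hat, t, T):
    """Penalized RMSE (Eq. 5.2); unaligned times are penalized against T."""
    t_hat, t = np.asarray(t_hat, float), np.asarray(t, float)
    M, N = len(t_hat), len(t)
    k = min(M, N)
    aligned = np.sum((t_hat[:k] - t[:k]) ** 2) / k
    excess = np.sum((T - t_hat[N:]) ** 2)   # model predicted more points than observed
    missing = np.sum((T - t[M:]) ** 2)      # model predicted fewer points than observed
    return float(np.sqrt(aligned + excess + missing))
```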

Baselines We use the following baselines for frequency prediction:

HPP The first is the Homogeneous Poisson Process (HPP) trained on the training set of the rumour. We select its intensity λ using the maximum likelihood estimate, i.e., the mean frequency of posts in the training intervals.

GP The second baseline is Gaussian process (GP) regression, introduced for modeling the popularity of hashtags in Twitter by Preotiuc-Pietro and Cohn (2013). The approach is based on discretising time into bins and conducting regression over counts. The model uses a Gaussian likelihood and thus ignores the count nature of the data, as it yields support for non-integer values. The Gaussian distribution is known to approximate the Poisson likelihood (which is appropriate for count data) for large counts. However, the number of posts occurring in the queried intervals in our case is not high, thus a Gaussian approximation for the point occurrence probability is not appropriate. Various kernels were considered in the original experiments, most notably periodic kernels. Periodic kernels were appropriate for that task, as various hashtags exhibit periodic behaviour (e.g. #TGIF re-occurs on Fridays). In our case it is not apparent that rumours exhibit periodic characteristics, as can be seen in Figure 2.1. We restrict our focus to the RBF kernel and leave the inspection of other types of kernels, such as periodic ones, for both GP and LGCP models for future work.

INTERPOLATE Another baseline we use is tailored for the interpolation setting (Interpolate), and uses simple interpolation by averaging over the frequencies of the closest left and right intervals, or the frequency of the closest interval for test intervals on a boundary.

0 We also consider a baseline which always predicts 0 posts in all intervals.

GPLIN For inter-arrival time prediction we employ a GP baseline with a linear kernel (GPLIN), where the time until the next event is modelled over the input being the time of the most recent post occurrence. We also considered an RBF kernel, but a model trained with this kernel tends to predict inter-arrival times close to zero as time increases, yielding a huge number of points at prediction, often not stopping in feasible time. As we show in the experiments section, this problem is not fully solved for the linear kernel.
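The two simplest baselines reduce to a few lines each; this sketch (with hypothetical function names) shows the HPP maximum likelihood intensity and the Interpolate rule of averaging the nearest observed neighbours.

```python
import numpy as np

def hpp_predict(train_counts, n_test):
    """HPP baseline: MLE intensity = mean posts per training interval."""
    return np.full(n_test, float(np.mean(train_counts)))

def interpolate_predict(counts, observed, test):
    """Average the counts of the closest observed intervals to the left and
    right, or the single closest one for test intervals on a boundary."""
    preds = []
    for i in test:
        left = [j for j in observed if j < i]
        right = [j for j in observed if j > i]
        neighbours = [counts[max(left)]] if left else []
        if right:
            neighbours.append(counts[min(right)])
        preds.append(float(np.mean(neighbours)))
    return preds
```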

5.5 Experiments

In this section we report the experiments with the rumour popularity prediction problem on the Ferguson riots rumour dataset.

                  Extrapolation                 Interpolation
                  MSE            LL             MSE            LL
HPP               7.14±10.1⋆     -23.5±10.1⋆    7.66±7.55⋆     -25.8±11.0⋆
GP                4.58±11.0⋆     -              6.13±6.57⋆     -
Interpolate       4.90±13.1⋆     -              5.29±6.06⋆     -
0                 2.76±7.81⋆     -              7.65±11.0⋆     -
LGCP              3.44±9.99⋆     -15.8±11.6†⋆   6.01±6.29⋆     -21.0±8.77†⋆
LGCP Pooled       2.50±8.62†⋆    -15.1±11.8†⋆   5.05±10.43†⋆   -18.9±11.9†⋆
LGCP ICM          2.46±7.82†⋆    -14.8±11.2†⋆   8.59±19.9⋆     -20.7±9.87†⋆
LGCP TXT          2.32±7.06†     -14.7±9.12†    3.66±5.67†     -16.9±5.91†
LGCP ICM+TXT      2.31±7.80†     -14.6±10.8†    3.92±5.20†     -16.8±5.34†

Table 5.1: MSE between the true counts and the predicted counts (lower is better) and predictive log likelihood of the true counts from the point process models (higher is better) for test intervals over the 114 Ferguson rumours for the extrapolation (left) and interpolation (right) settings, showing mean ± std. dev. Baselines are shown above the line, with LGCP models below. Models: HPP, GP, 0 and LGCP only use target rumour data, with the other models having access to reference rumours. We omit the LL evaluation for non-probabilistic models and for GP, which returns a predictive distribution over a continuous support, thus not directly comparable against the other models. Key: † denotes significantly better than the best baseline (0 in the extrapolation, and Interpolate in the interpolation setting); ⋆ denotes significantly worse than the best model (LGCP ICM+TXT in the extrapolation, and LGCP TXT in the interpolation setting), according to the Wilcoxon signed rank test, p < 0.05.

[Figure omitted: twelve bar plots of tweet counts over the two-hour rumour lifespan (0h-2h). Extrapolation panels: (a) rumour #7, (b) rumour #12, (c) rumour #18, (d) rumour #19, (e) rumour #23, (f) rumour #29. Interpolation panels: (g) rumour #2, (h) rumour #5, (i) rumour #7, (j) rumour #12, (k) rumour #27, (l) rumour #108.]

Figure 5.2: Intensity functions for different methods across example Ferguson rumours in the extrapolation (top six figures) and interpolation (bottom six figures) settings. Dark bars denote frequencies of tweets in the train intervals, and white bars denote counts of tweets in the test intervals. Predictions from LGCPTXT are denoted by blue circles, and from LGCPICM by green diamonds.

Extrapolation The left columns of Table 5.1 report the results for the extrapolation experiments, showing the mean and variance of results across the 114 rumours. GP performs relatively well compared to the other baselines in terms of MSE, but it is worse than LGCP. Notice GP models a predictive distribution with continuous support, and the results on LL are omitted, as its likelihoods are not directly comparable to the probability estimates from the LGCP models.² Amongst the approaches which only have access to target rumour data, the plain LGCP model significantly outperforms HPP and GP. This demonstrates that employing a model aware of the integer nature of the data yields a big improvement, even when a point estimate of the prediction is considered (MSE). We note that incorporating information about other rumours is useful, as LGCP Pooled yields a large improvement over LGCP. The ICM, TXT and ICM+TXT multi-task learning approaches achieve the best scores; however, only ICM+TXT significantly outperforms the LGCP Pooled method. Frequency profiles for the ICM and TXT methods in the extrapolation setting are depicted in the top six subfigures of Figure 5.2 for multiple example rumours. TXT is better than ICM on multiple examples (rumours #33, #37, #57). However, for some rumours ICM does a better job (rumours #12, #54), and for others it is not directly clear which method is better (rumour #60). Notice that often TXT is better than ICM, which frequently underestimates the counts.

Interpolation As for the interpolation setting, we notice that again HPP and GP are among the worst methods. We can observe a large improvement in results when reference rumours are used for the LGCP model; however, quite surprisingly, learning correlations in the ICM model does not bring improvements, but on the contrary leads to poor predictions. As for the multi-task methods, we notice that text is particularly useful, with TXT achieving the best results. We illustrate the differences between the TXT and ICM methods in the interpolation setting in the bottom six subfigures of Figure 5.2. First, notice that the task is easier than extrapolation, as there are observed intervals interleaving with the intervals of time that are unobserved, and so there is less uncertainty about what values one might expect in the test intervals, as typically the values do not drastically differ between neighbouring intervals (e.g. see rumour #7). Nevertheless, there are intervals for which the number of tweets is very different than in the neighbouring intervals (e.g. for rumour #27 the observed interval of time in the second hour of the rumour lifespan). Overall, notice that the text based multi-task learning approach makes better predictions about the initial peaks (first time bucket) than ICM (e.g. for rumours #5, #108). At the same time, for some rumours ICM wrongly predicts peaks of interest at the beginning (e.g. for rumours #2, #7). This shows that temporal dynamics alone are not enough to correctly discriminate between rumours which are discussed at the beginning and those which are not, and text adds complementary information to this end.

² In both the interpolation and extrapolation settings the values from the pdf of the predictive distribution are much lower than from the other models: respectively −34.6 and −90.1.

Analysis of the Best Performing Methods Next, we analyze the results of the best-performing approaches in each of the two settings by looking at the per-frequency performance. In Figures 5.3 and 5.4 we plot matrices denoting how often a particular approach classifies a ground truth frequency of tweets (corresponding to a row of the matrix) as a target frequency of posts (corresponding to a column of the matrix). Since a method can predict a non-integer count of tweets, we round each prediction to the nearest integer value. The higher the frequency values on the diagonal of a matrix, the better the performance of the model. Conversely, the higher the intensity values off-diagonal, the worse the model performance. Notice most ground truth counts take low values (0, 1 and 2), which demonstrates this is a largely imbalanced problem in terms of the prediction space. Thus, the bottom left side of each matrix largely contributes to the overall score. In Figure 5.3 we show the cross-classification matrices for the LGCP Pooled, LGCP ICM, LGCP TXT and LGCP ICM+TXT methods in the extrapolation setting. Intensities for all methods are concentrated at the left side of the matrix, meaning that the methods predict lower counts of tweets than they should. This is particularly noticeable for LGCP Pooled, for which there is a very limited number of predicted counts: it only predicts either 0 or 1 posts for every interval. This is due to the method treating all tweets as coming from a single rumour (i.e. not weighing similarities between rumours in any way), thus learning the smoothed average counts in different intervals, which are usually low. However, since low frequency is predominant in the ground truth counts, this is already a very competitive approach. Notice that compared to TXT, ICM has a tendency to make predictions which are too large (columns corresponding to 4, 5, 6 and 8). TXT makes better calibrated decisions, not overestimating the counts as often. Looking at the differences between TXT and ICM+TXT, it is not clear which is better, which concurs with their similar results reported in Table 5.1. In Figure 5.4 we show the cross-classification matrices for Interpolate, LGCP Pooled, LGCP TXT and LGCP ICM+TXT in the interpolation setting. Notice that overall the range of predicted values is wider for the interpolation setting than for the extrapolation setting, which is due to the low counts of tweets in the second hour in the data. Notice that again LGCP Pooled exhibits the most limited range of predicted values, due to treating all tweets as if coming from the same rumour. Comparing the Interpolate and LGCP Pooled methods, Interpolate shows a better overall spread over the matrix, whereas LGCP Pooled does not predict frequencies higher than 7. Nevertheless, the two methods exhibit similar performance, as shown in Table 5.1. This might be due to LGCP Pooled compensating by making better predictions for low counts (i.e., see a higher intensity at cell (1, 1) for LGCP Pooled (corresponding to count 61) than for Interpolate (corresponding to count 48)). LGCP ICM+TXT shows high over-predictions, whereas TXT suffers from this problem to a lesser extent.

[Figure omitted: four confusion-matrix heatmaps of true versus predicted tweet counts (0-24): (a) LGCP Pooled, (b) LGCP ICM, (c) LGCP TXT, (d) LGCP ICM+TXT.]

Figure 5.3: Confusion matrices for four selected methods in the extrapolation setting. Each row corresponds to the true count of tweets in an interval, and each column corresponds to the predicted number of tweets. A cell i, j denotes how many times the ground truth count of tweets i is being classified as target count of tweets j.

[Figure omitted: four confusion-matrix heatmaps of true versus predicted tweet counts (0-24): (a) Interpolate, (b) LGCP Pooled, (c) LGCP TXT, (d) LGCP ICM+TXT.]

Figure 5.4: Confusion matrices for four selected methods in the interpolation setting. Each row corresponds to the true count of tweets in an interval, and each column corresponds to the predicted number of tweets. A cell i, j denotes how many times the ground truth count of tweets i is being classified as target count of tweets j.

Method     ARMSE           PRMSE
GPLIN      20.60±22.01⋆    1279.78±903.90⋆
HPP        21.85±22.82⋆    431.4±96.5⋆
LGCP       13.31±14.28     261.26±92.97⋆
LGCPTXT    15.52±18.79     154.05±115.70

Table 5.2: ARMSE and PRMSE between the true event times and the predicted event times expressed in minutes (lower is better) over the 114 Ferguson rumours, showing mean ± std. dev. Key: ⋆ denotes significantly worse than the LGCPTXT method according to the Wilcoxon signed rank test (p < 0.05). In the case of ARMSE, LGCP is not significantly better than LGCPTXT according to the Wilcoxon test.

Inter-arrival time prediction Table 5.2 reports the results of predicting the arrival times of tweets in the second hour of the rumour lifecycle. We report both ARMSE (which only considers the aligned sub-sequences from the ground truth and the model output) and PRMSE (which does penalize for an insufficient or excessive number of predicted tweets). In terms of ARMSE, LGCP is the best method, performing better than LGCPTXT (though not statistically significantly) and outperforming the other approaches. Figure 5.5b depicts an example rumour where LGCP greatly overestimates the number of points in the interval of interest. Here, the three points from the ground truth (denoted by black crosses) and the initial three points predicted by the LGCP model (denoted by red pluses) happen to lie very close, yielding a low ARMSE error. However, LGCP predicts a large number of arrivals in this interval, whereas LGCPTXT predicts only four points (denoted by blue dots). ARMSE fails to capture this difference between the models. A metric capturing this unwanted phenomenon is PRMSE, which does penalize LGCP in this example. Overall, according to PRMSE, LGCPTXT is the most successful method, significantly outperforming all others according to the Wilcoxon signed rank test. Figure 5.5a depicts the behavior of LGCP and LGCPTXT on rumour #39, with a larger number of points from the ground truth. Again, LGCPTXT predicts fewer arrivals than LGCP. Lastly, notice that GPLIN performs very poorly according to PRMSE, which is due to this method predicting decreasing inter-arrival times, leading to large numbers of tweets being predicted.

[Figure omitted: two plots of intensity function values over the second hour (1h-2h) for (a) rumour #39 and (b) rumour #60, comparing LGCP and LGCPTXT.]

Figure 5.5: Intensity functions and corresponding predicted arrival times for different methods across example Ferguson rumours. Arrival times predicted by LGCP are denoted by red pluses, LGCPTXT by blue dots, and the ground truth by black crosses. Light regions denote the uncertainty of predictions from the underlying Gaussian processes, which however are not used for predictions. Nevertheless, they show that the underlying GP has narrower uncertainty bounds for LGCPTXT.


5.6 Conclusions

This chapter introduced the problem of modelling frequency profiles of rumours in social media. We demonstrated that jointly modelling collective data over multiple rumours using multi-task learning resulted in more accurate models that are able to recognise and predict commonly occurring temporal patterns. We showed how text data from social media posts adds important information about similarities between different rumours, which leads to better predictions. We also introduced the problem of predicting the exact times of occurrences of future tweets. We demonstrated how the best LGCP-based model for frequency prediction is also successful for this task, significantly outperforming the baselines. The most successful approaches from this chapter used multi-task learning with kernels comparing pairs of rumours. In the next chapter we further explore this approach, seeking better ways of capturing rumour similarities.

Chapter 6

Convolution Kernels for Modeling Temporal Dynamics of Rumours

In the previous chapter we started addressing the second research aim of this thesis: predicting rumour popularity. We formulated a model of rumour popularity using a kernelized point process model, a log-Gaussian Cox process (LGCP), and considered multiple formulations of LGCP with different kernel functions. The most successful approaches incorporated data from other rumours while modeling similarities between them. In this chapter, we continue addressing the same research aim, and work towards answering the research question: Does information about how text usage in tweets changes over time convey useful information for the rumour popularity prediction task? In particular, we show how rumour similarities can be measured in a way that captures how text evolves over time in different rumours, and show how such an approach outperforms alternative methods. Parts of this chapter have been published as Lukasik, M. and Cohn, T. (2016). Convolution kernels for discriminative learning from streaming text. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.

6.1 Introduction

Modeling of time series is a foundational problem in the machine learning literature, which typically attempts to model the dynamics of a signal into the future based on a history of observations. The previous chapter considered the problem of modeling the popularity of rumours represented by sequences of posts over continuous time.

Point processes were used to model the spread of multiple rumours over time via kernels defined over both time and rumours, with kernels being decomposed into a product of a kernel over time and a kernel over the rumour space. Both the temporal dynamics of the rumour spread and the text content of the posts contain important cues for determining the future spread of a rumour. However, we considered both of these sources of information in separation, i.e., we did not look at the differences between text posted about a rumour at different points in time. In this chapter we address the problem of comparing time series of text over continuous time by adapting the general idea of convolution kernels (Haussler, 1999). Convolution kernels avoid the effort of feature engineering and allow for using a rich data representation. We demonstrate that the proposed convolution kernels outperform previous approaches on the rumour popularity prediction tasks, and also demonstrate their applicability to other problems involving discriminative learning from time series of text over continuous time: classification, regression and Poisson regression. Therefore, the convolution kernels are introduced in a general setting, not restricted to the rumour popularity prediction problem. This chapter makes the following contributions:

1. Introduces a method based on convolution kernels for discriminative modeling of time series composed of text over continuous time;

2. Demonstrates the proposed approach on a range of discriminative learning problems: classification, regression, Poisson regression and point process modeling;

3. Demonstrates the efficacy of the method on three synthetic and two real datasets. In particular, we show how it outperforms methods considered in the previous chapter on the task of predicting rumour popularity.

6.2 Related Work on Modeling Sequences of Text over Time

Time series modeling is an important problem with diverse applications, from epidemiological models of population health (Bhaskaran et al., 2013) to biological models of gene expression over time (Bar-Joseph et al., 2012), and even words of text as they appear in sentences using Markov models (Brown et al., 1992). The modeling challenge is to characterise how the time series data changes over time, typically in support of predictions of future changes in the time series or to fill in missing data from the past. The aim of this chapter is to develop a method (in the form of a kernel function) for comparing sequences of posts in continuous time in such a way that predictions of rumour popularity become better. Below, we review approaches for modeling problems related to learning from sequences of text over time.

Discriminative Learning from Time Series

Time series have been studied for a long time, predominantly in settings where previous instances of a time series are used to predict the outputs for future points (Shumway and Stoffer, 2006). However, there is also a branch of work devoted to discriminative learning from time series, where the aim is to perform classification over whole time series. One considered type of time series is a sequence of symbols, where model based approaches such as Hidden Markov Models (Rabiner, 1989) have been adapted for discriminative learning (Xing et al., 2010). However, in this chapter we consider multivariate time series, which are sequences of numerical vectors (Xing et al., 2010). Recently, feature selection based methods were considered for time series. A popular approach to feature selection is the extraction of informative subsequence patterns (shapelets), which has been proposed in conjunction with decision trees (Ye and Keogh, 2011) and SVMs (Ando and Suzuki, 2014), and has been applied to a range of problems, e.g. in medicine (Xing et al., 2011). Closely related is the extraction of meta features, the user-defined recurring substructures of a time series (Kadous and Sammut, 2005). However, in this chapter we consider a kernelized approach where no explicit feature extraction is conducted. Another approach to time series classification is based on using distance functions for comparing pairs of time series in conjunction with a distance-based method such as KNN or SVM (Xing et al., 2010). In contrast, we use a Bayesian non-parametric approach based on Gaussian processes. In the case of simple time series, where the only information about each point from a time series is time, popular approaches were the Euclidean distance for time series of equal lengths and Dynamic Time Warping (DTW) for sequences of unequal lengths (Keogh and Pazzani, 2000). These are not appropriate for our problem, since we consider multivariate time series where a vector of metadata (text) is assigned to each timestamp (see Section 6.3). Another example is based on the kernel fusion scheme, in which multiple feature extraction methods are considered, and different kernels are applied for comparing the different feature representations. The kernels are then combined together by a weighted sum or some more complex combining function (Wu et al., 2005). We do consider a combination of kernels over text and over time treated as different representations of time series (see Section 6.4), which can be viewed as an instantiation of the kernel fusion scheme. In this case different views of time series are combined in a weighted average, where the weights are kernel variances and are optimized by maximizing the evidence in the Gaussian process framework (see Section 3.1). However, in our main contribution we do not combine kernels over different representations of time series, but conduct an R-convolution (see Section 6.4) between kernels over individual points from them, which is a more powerful method, as we show in the experimental sections. Note that these algorithms were not previously applied to multivariate time series of text, which is our main contribution. Moreover, they considered only classification settings, whereas we consider more scenarios in our experiments in Section 6.5, introducing regression and Poisson regression problems over the time series inputs.

6.3 Notation and Problem Formulation

In Section 5.2 we introduced the problem of rumour popularity prediction, which is the main motivation for the method developed in this chapter. However, we also demonstrate the usefulness of the proposed method on a range of other problems. Here we introduce the range of problems that our approach can be used for.

Let us consider a set of events $R = \{R_m\}_{m=1}^{|R|}$, each of which consists of a set of posts $R_m = \{p_m^n\}_{n=1}^{|R_m|}$. Posts are tuples $p_m^n = (w_m^n, t_m^n)$, where $w_m^n$ is a vector text representation and $t_m^n$ is the timestamp of post $p_m^n$.¹ This way, events are time series over vectors. One can imagine different situations for which the kernels we introduce can be easily adapted, e.g. using different text representations by applying string kernels over the text (Lodhi et al., 2002).

In this work we consider a range of discriminative problems over paired data $D = \{R_m, y_m\}_{m=1}^{|R|}$, where time series are inputs and we model an output variable, most often a numerical value but possibly more complex (as in the case of the point processes below), which can be viewed in a more general context beyond the rumour use-case. In particular, we consider the following four general classes of problems:

1. Binary classification, where the task is to model a binary valued function over events, $y_m \in \{0, 1\}$, e.g. labelling events as rumours or non-rumours;

2. Regression, where the task is to model a real valued function over events, $y_m \in \mathbb{R}$, e.g. the price of stocks described by a time series of posts about them;

3. Poisson regression, where the task is to model a non-negative integer valued function over events, $y_m \in \mathbb{N}$, e.g. based on events up to time $t$, determining the number of posts in a future time interval;

¹Recall our notation is summarized in the Nomenclature on page xii.

4. Frequency prediction, where the task is to model a non-negative integer valued function over intervals of time, i.e., based on events up to time $t$, determine the number of posts in arbitrary future time intervals. In this case, $y_m$ could be a complete set of points over the observation window.

Intuitively, frequency prediction can be viewed as a finer grained version of Poisson regression. An example frequency prediction problem is rumour popularity prediction as defined in Section 5.2, which is the main motivation of this chapter.

Our kernels are capable of modeling a broader range of problems than those listed above. For example, they could be used in a kernel-based clustering method for event detection in a social media stream. In this case, one might group together tweets based on the distance calculated using a convolution kernel over text and time.
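To make the setting concrete, the following is a minimal Python sketch (ours, not taken from the thesis code) of the data representation assumed above: an event is a time series of posts, each pairing a text vector with a timestamp. The class names are illustrative only.

```python
import numpy as np

class Post:
    """A single post: a text vector w and a timestamp t."""
    def __init__(self, w, t):
        self.w = np.asarray(w, dtype=float)  # e.g. bag-of-words counts
        self.t = float(t)                    # time, rescaled to [0, 1]

class Event:
    """An event (rumour): a time series of posts with an output y."""
    def __init__(self, posts, y=None):
        self.posts = posts  # list of Post objects, ordered by time
        self.y = y          # {0,1} for classification, a real for regression,
                            # a count for Poisson regression

# Example: a two-post event labelled as a rumour (y = 1)
event = Event([Post([1, 0, 2], 0.05), Post([0, 1, 1], 0.40)], y=1)
```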

6.4 Convolution Time Series Kernels

Convolution kernels are a framework in which kernels between structured objects are specified as a combination of kernel values between substructures (Haussler, 1999). Convolution kernels have been successfully applied to discrete objects, for example graphs (Vishwanathan et al., 2010). A compelling natural language processing example is the tree kernel, which operates on syntax trees through recursive decomposition into kernels between subtrees (Collins and Duffy, 2001). Of particular relevance are string kernels, which recursively decompose strings into substrings (Lodhi et al., 2002). These kernels treat inputs as time series over discrete time, whereas the time series we consider here are over continuous time (and possibly other dimensions, based on the metadata attached to events). This makes a direct application of such kernels impossible. However, they can serve as an inspiration for constructing convolution kernels between time series, where decomposition is conducted over parts of a time series. We start from a simple kernelised approach to comparing events, before showing how it can be generalized to convolution kernels.

6.4.1 Formulations

A kernel over rumours can be formulated in many different ways. The most obvious is concatenating the text from posts together and comparing the resulting bags-of-words, e.g. using a kernel on the vectors. Such a kernel can be expressed via the formula

$$\mathbf{k}_{\Sigma\text{text}}(R_m, R_l) = k_{\text{text}}\Bigg(\sum_{p_m^i \in R_m} w_m^i,\; \sum_{p_l^j \in R_l} w_l^j\Bigg), \tag{6.1}$$

where $w_m^i$ denotes a vector representation of the text content of post $i$ from rumour $R_m$ (in our case this is a bag of Brown cluster ids; see Section 6.6 for details about the text representation). Note that we denote kernels over rumours with a bold font $\mathbf{k}$ and kernels over their component posts with a small $k$.

A very popular kernel choice for text is the linear kernel, which takes the form

$$k_{\text{text}}(w_1, w_2) = \gamma\, w_1^{\top} w_2, \tag{6.2}$$

where $\gamma$ is a learned scaling hyper-parameter (see Section 3.1.5 for more details about the linear kernel). Combined with equation (6.1), the linear kernel simplifies to

$$\mathbf{k}_{\Sigma\text{text}}(R_m, R_l) = \sum_{p_m^i \in R_m} \sum_{p_l^j \in R_l} k_{\text{text}}\big(w_m^i, w_l^j\big). \tag{6.3}$$

The kernel in equation (6.3) is expressed as a double summation of kernels between each pair of posts. It can also be viewed as a convolution kernel between the rumours, where the decomposition is made into substructures being the posts, combined via simple summation. In this chapter we use a linear kernel for the text, although note that we could easily replace it with another kernel for comparing the text, e.g. a string kernel or an RBF kernel, in which case this ceases to be equivalent to a kernel over the concatenated text.

The first non-trivial convolution kernel we consider is the linear kernel normalized by the rumour sizes,

$$\mathbf{k}_{\text{text}}(R_m, R_l) = \frac{1}{|R_m||R_l|} \sum_{p_m^i \in R_m} \sum_{p_l^j \in R_l} k_{\text{text}}\big(w_m^i, w_l^j\big). \tag{6.4}$$

We normalize the kernel values so that time series of varying length can be reliably compared without any bias towards longer structures.

A shortcoming of $\mathbf{k}_{\text{text}}$ is that it operates on text only and ignores time. Therefore, we introduce a convolution kernel using time,

$$\mathbf{k}_{\text{time}}(R_m, R_l) = \frac{1}{|R_m||R_l|} \sum_{t_m^i \in R_m} \sum_{t_l^j \in R_l} k_{\text{time}}\big(t_m^i, t_l^j\big). \tag{6.5}$$

This kernel should be useful for comparison of raw time series without additional metadata.

For comparing time values we use an RBF kernel,

$$k_{\text{time}}(t_i, t_j) = \sigma \exp\left(-\frac{(t_i - t_j)^2}{l}\right), \tag{6.6}$$

where $\sigma > 0$ is a hyper-parameter controlling the output scale, while $l > 0$ is the length scale, determining the rate at which the kernel diminishes with distance (see Section 3.1.5 for more details about the RBF kernel). We also consider a summation of the time and text kernels,

$$\mathbf{k}_{\text{text+time}}(R_m, R_l) = \mathbf{k}_{\text{text}}(R_m, R_l) + \mathbf{k}_{\text{time}}(R_m, R_l). \tag{6.7}$$

Note that the kernel text+time can be viewed as an instantiation of the kernel fusion scheme (Wu et al., 2005), where different views of time series are combined in a weighted average with kernel variances as weights. The final convolution kernel we consider compares pairs of posts via a product of kernels over text and time,

$$\mathbf{k}_{\text{text}\circ\text{time}}(R_m, R_l) = \frac{1}{|R_m||R_l|} \sum_{p_m^i \in R_m} \sum_{p_l^j \in R_l} k_{\text{text}}\big(w_m^i, w_l^j\big)\, k_{\text{time}}\big(t_m^i, t_l^j\big). \tag{6.8}$$

In this manner, the kernel captures not only the textual similarities between posts coming from the two time series, but also how close they are in time: it effectively weights the influence coming from each pair of posts by how far apart in time they were created, and can thus focus on differences in the usage of text across time.
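As an illustration, here is a minimal numpy sketch (ours, not the thesis code) of the kernels in equations (6.4), (6.5) and (6.8). Each event is given as a matrix W of post text vectors and a vector t of timestamps; the hyper-parameters gamma, sigma and length stand in for the values learned by evidence maximization.

```python
import numpy as np

def k_text(Wm, Wl, gamma=1.0):
    # Equation (6.4): normalized sum of linear kernels over all post pairs.
    return gamma * np.sum(Wm @ Wl.T) / (len(Wm) * len(Wl))

def k_time(tm, tl, sigma=1.0, length=0.1):
    # Equation (6.5), using the RBF kernel of equation (6.6) over timestamps.
    sq = (tm[:, None] - tl[None, :]) ** 2
    return np.sum(sigma * np.exp(-sq / length)) / (len(tm) * len(tl))

def k_text_o_time(Wm, tm, Wl, tl, gamma=1.0, sigma=1.0, length=0.1):
    # Equation (6.8): pairwise products of text and time kernels, so that
    # textual similarity is weighted by how close the two posts are in time.
    K_text = gamma * (Wm @ Wl.T)
    K_time = sigma * np.exp(-((tm[:, None] - tl[None, :]) ** 2) / length)
    return np.sum(K_text * K_time) / (len(Wm) * len(Wl))
```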

6.4.2 Proof of correctness

In this section we prove that the proposed convolution kernels correspond to R-convolutions, which themselves are valid kernels (Haussler, 1999).

Theorem 1 (from Haussler (1999)). Let $R_m \in R$ be a composite structure. Let Rel be a relation such that $\text{Rel}\big(p_m^1, \ldots, p_m^Z, R_m\big)$ is true iff $p_m^1, \ldots, p_m^Z$ are "parts" of $R_m$. For brevity, $p_m^1, \ldots, p_m^Z$ are jointly denoted as $\vec{p}_m$. Let $\text{Rel}^{-1}(R_m) = \{\vec{p}_m : \text{Rel}(\vec{p}_m, R_m)\}$. Let the similarity function $\mathbf{k}(R_m, R_l)$ be defined as:

$$\mathbf{k}(R_m, R_l) = \sum_{\vec{p}_m \in \text{Rel}^{-1}(R_m)}\ \sum_{\vec{p}_l \in \text{Rel}^{-1}(R_l)}\ \prod_{z=1}^{Z} k_d\big(p_m^z, p_l^z\big) \tag{6.9}$$

where $k_d$ is a valid kernel for comparison of the corresponding parts from $\vec{p}_m$ and $\vec{p}_l$. The zero-extension of $\mathbf{k}$ to $R \times R$² (also called the R-convolution) satisfies Mercer's condition.³

Proof. See (Haussler, 1999).

Theorem 1 defines R-convolutions over sets of structured objects $R$, where the inverse of the relation Rel, $\text{Rel}^{-1}(R_m)$, generates decompositions of $R_m$. An example is when $R$ is a set of strings and $\text{Rel}^{-1}(R_m)$ is the set of prefixes of a string $R_m$. Theorem 1 states that, given $R$ and Rel, a similarity function formulated as in equation (6.9) constitutes a valid kernel.

Theorem 2. The introduced time series kernels are R-convolutions.

Proof. We demonstrate that the general form of the proposed time series kernels,

$$\mathbf{k}(R_m, R_l) = \sum_{p_m^i \in R_m} \sum_{p_l^j \in R_l} k\big(p_m^i, p_l^j\big), \tag{6.10}$$

is an R-convolution, by defining an equivalent formulation falling under the definition of R-convolution kernels given in Theorem 1. We consider a formulation without the multiplication by $\frac{1}{|R_m||R_l|}$, as it is easy to see that this normalization preserves Mercer's condition.⁴

As defined in the main text, each rumour $R_m$ is a sequence of posts $\{p_m^i\}_{i=1}^{|R_m|}$. We define a relation Rel such that $\text{Rel}(p_m^i, R_m)$ is true iff $p_m^i \in R_m$. With Rel so defined, the kernel

$$\mathbf{k}(R_m, R_l) = \sum_{p_m^i \in \text{Rel}^{-1}(R_m)}\ \sum_{p_l^j \in \text{Rel}^{-1}(R_l)} k\big(p_m^i, p_l^j\big) \tag{6.11}$$

is an equivalent form of the time series kernel shown in equation (6.10). It corresponds to the form given in the assumptions of Theorem 1 (where $Z = 1$), therefore time series kernels are R-convolutions.

Corollary 1. Given Theorems 1 and 2, the introduced time series kernels satisfy Mercer's condition and so are valid kernels.
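As a quick numeric sanity check of Corollary 1 (a sketch under our own randomly generated toy events, not a substitute for the proof), the Gram matrix of the text◦time kernel should have no eigenvalues below zero beyond numerical error:

```python
import numpy as np

def k_tot(Wm, tm, Wl, tl, length=0.1):
    # text-o-time kernel of equation (6.8) with a linear text kernel
    K = (Wm @ Wl.T) * np.exp(-((tm[:, None] - tl[None, :]) ** 2) / length)
    return K.sum() / (len(Wm) * len(Wl))

rng = np.random.default_rng(0)
events = [(rng.poisson(1.0, size=(n, 5)).astype(float), rng.random(n))
          for n in rng.integers(2, 8, size=10)]

G = np.array([[k_tot(Wm, tm, Wl, tl) for (Wl, tl) in events]
              for (Wm, tm) in events])
print(np.linalg.eigvalsh(G).min())  # >= 0 up to numerical error
```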

²The zero-extension of $\mathbf{k}$ to $R \times R$ is considered because the similarity function $\mathbf{k}$ is, by definition, only defined on $S \times S$, where $S = \{R : \text{Rel}^{-1}(R) \neq \emptyset\}$.
³Kernels satisfying Mercer's condition are positive semi-definite, and thus can be used for specifying a valid Gaussian process distribution.
⁴Multiplication by $\frac{1}{|R_m||R_l|}$ corresponds to a tensor product with the kernel $\mathbf{k}(R_m, R_l) = \big\langle \frac{1}{|R_m|}, \frac{1}{|R_l|} \big\rangle$.

6.5 Experiments on Synthetic Data

In this section we describe experiments with the introduced kernels on synthetic data. We start with two toy classification examples, constructed to illustrate when our proposed convolution kernels can be useful. Then, we move to a more extensive synthetic evaluation for classification, regression and Poisson regression outputs.

6.5.1 Toy example for time

First we consider binary classification based on purely temporal time series with no further metadata. Time series from class 0 consist of a number of timestamps drawn from a triangular distribution⁵ $T(0, 0.5, 0.5)$ and a number of subsequent timestamps drawn from a triangular distribution $T(0.5, 0.5, 1)$. Class 1 time series are defined similarly, but with the shapes of the triangular distributions swapped: timestamps come from $T(0, 0, 0.5)$ and $T(0.5, 1, 1)$. We depict the histograms of timestamps from example time series in Figure 6.1. In this problem, the expected value of a timestamp coming from either class 0 or class 1 is the same; however, the overall distributions are different. We run an experiment to compare the time convolution kernel (from Equation (6.5)) with the baseline kernel $\mathbf{k}_{\Sigma\text{time}}$,

$$\mathbf{k}_{\Sigma\text{time}}(R_m, R_l) = k_{\text{time}}\Bigg(\sum_{p_m^i \in R_m} t_m^i,\; \sum_{p_l^j \in R_l} t_l^j\Bigg). \tag{6.12}$$

For this experiment, we generate 500 timestamps for each class from each of the two triangular distributions. Then, each of the 500 timestamps is uniformly assigned to one of the 50 rumours of the dataset. We evaluate the kernels based on predictive accuracy with 5-fold cross validation on the resulting 100 rumours. Table 6.1 shows the results from the experiment. The time kernel is very effective, modelling the problem almost perfectly, whereas the prediction accuracy of the baseline kernel is no better than chance (achieving an accuracy of 0.5). Thus, unsurprisingly, simply looking at the mean over the timestamps is insufficient to distinguish between the two classes.
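A sketch of this toy data generation, under the assumptions stated above (numpy's triangular distribution takes (left, mode, right); the uniform assignment of timestamps to rumours follows the description in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class(mode_first, mode_second, n=500, n_rumours=50):
    first = rng.triangular(0.0, mode_first, 0.5, size=n)
    second = rng.triangular(0.5, mode_second, 1.0, size=n)
    ts = np.concatenate([first, second])
    ids = rng.integers(0, n_rumours, size=len(ts))   # uniform assignment
    return [np.sort(ts[ids == i]) for i in range(n_rumours)]

class0 = sample_class(0.5, 0.5)  # T(0, 0.5, 0.5) then T(0.5, 0.5, 1)
class1 = sample_class(0.0, 1.0)  # T(0, 0, 0.5) then T(0.5, 1, 1)
```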

⁵A triangular distribution is a probability distribution with a density function of a triangular shape. It is parametrized by a minimum value, a mode value and a maximum value of the distribution. Note that if the mode equals either the minimum or the maximum value, the density function has the shape of a right-angled triangle.

[Figure 6.1: Samples from the toy classification example for time. Data shown in the form of histograms of posts over time (panels: Class #0, Class #1; axes: Time vs. #Posts).]

method     mean   std
majority   0.44   0.04
Σtime      0.48   0.05
time       0.99   0.02

Table 6.1: Accuracy across 5CV for the toy experiment on temporal time series with no text metadata.

6.5.2 Toy example for time and text

Next, we move to a synthetic example incorporating additional metadata (text), for which the convolution kernel text◦time should prove useful. As before, we again consider binary classification. The time values for each timestamp are drawn from a Gaussian distribution $t \sim N(0.5, 0.1)$ (excluding values drawn from outside the interval $[0, 1]$), regardless of class identity. Text is assigned to each timestamp, and together they constitute a post. Below we describe how we generate text conditioned on time.

Our goal is to mimic a scenario where in one class of rumours text usage changes over time in one specific way (e.g. from the crowd supporting a rumour to denying it), and in the second class of rumours the text usage changes over time differently (from the crowd denying a rumour to supporting it). Moreover, we generate the text in such a way that marginally the text distributions across the two classes do not differ.

We now give the details of the process generating the example. Each post consists of 10 words over a two word vocabulary $\{w_1, w_2\}$, drawn from a Bernoulli distribution dependent on $t$. Namely, for rumours coming from class 0 the probability of word $w_1$ is $t$, whereas the probability of word $w_2$ is $1 - t$. The probabilities for class 1 are reversed. We depict example samples coming from such distributions in Figure 6.2. For each class we generate 1000 rumours, each consisting of 10 posts.

In Table 6.2 we report averaged results from 5-fold cross validation. As expected, the methods that use marginal information about text and time perform no better than random, since there is no signal in text (or time) alone. Only our convolution kernel is capable of modelling this data, achieving near perfect accuracy.
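For concreteness, a sketch of this generative process (our own rendering of the description above; representing a post by its pair of word counts is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_time():
    t = rng.normal(0.5, 0.1)
    while not 0.0 <= t <= 1.0:      # reject draws outside [0, 1]
        t = rng.normal(0.5, 0.1)
    return t

def sample_rumour(label, n_posts=10, words_per_post=10):
    posts = []
    for _ in range(n_posts):
        t = truncated_time()
        p_w1 = t if label == 0 else 1.0 - t  # class-dependent word probability
        n_w1 = rng.binomial(words_per_post, p_w1)
        posts.append((np.array([n_w1, words_per_post - n_w1]), t))
    return posts

rumours = [(sample_rumour(y), y) for y in (0, 1) for _ in range(1000)]
```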

[Figure 6.2: Samples from the toy classification example for time and text. The lower row shows histograms of posts over time, while the upper row shows the count #word₁ for each post; the count of word₂ equals 10 − #word₁ (panels: Class #0, Class #1).]

method      mean   std
majority    0.44   0.07
text        0.50   0.09
time        0.41   0.06
text◦time   0.98   0.02

Table 6.2: Accuracy across 5CV for the toy experiment on temporal time series with text metadata.

$\alpha_0 \sim U([1, 10])$
$m \sim T(1, 1.5, 2)$
$l \sim N(0.1, 0.05)$
$b^1, \ldots, b^{|R|} \sim U\big(\max(1, 2m - 2),\ \min(2, 2m - 1)\big)$
$\forall_{i=1:|R|}\quad c_1^i \sim N(b^i, \sigma_1^2)$
$\forall_{i=1:|R|}\quad c_2^i \sim N(2m - b^i, \sigma_1^2)$
$\alpha_1^1, \ldots, \alpha_1^{|R|} \sim N(\alpha_0, \sigma_2^2)$
$\alpha_2^1, \ldots, \alpha_2^{|R|} \sim N(\alpha_0, \sigma_2^2)$
$\forall_{i=1:n}\forall_{j=1:z_i}\quad t_{ij} \sim \text{Beta}(\alpha_1^i, \alpha_2^i)$   (post times for rumour $i$)
$\forall_{i=1:n}\forall_{j=1:z_i}\quad d_{ij} \sim \text{Geometric}(l)$   (number of words in post $i,j$)
$\forall_{i=1:n}\forall_{j=1:z_i}\forall_{k=1:d_{ij}}\quad w_{ijk} \sim \text{Geometric}\big(c_1^i + (c_2^i - c_1^i)t_{ij}\big)$   (words in post $i,j$)

Table 6.3: Generating process of a single synthetic experiment. Variable $z_i$ is fixed to 20 for regression and Poisson regression, whereas for classification it is randomly drawn as described in the main text.

6.5.3 Complex synthetic experiment

We now introduce a more complex synthetic experimental setting, in contrast to the previous two toy settings. Namely, we generate rumours of different temporal, textual, and joint temporal-textual characteristics. Our aim is to generate data where it is hard to perform well on the task using either time or text only, i.e., where incorporating information about how language usage changes over time is necessary for achieving good results.

In real text data, there are many different words, of which a few are frequent and most are infrequent. In order to make our synthetic experiments account for this characteristic, words in posts come from a Geometric distribution (where words correspond to positive integers). Moreover, we model the parameter of the Geometric distribution as a function of time, changing the effective frequency of each word.

The generative story Table 6.3 summarises the generative process. First, we draw parameters controlling the "background rumour" and then perturb them for each rumour. These parameters are perturbed in such a way that the rumours are similar, and careful modeling is necessary for capturing the differences between them.

First, three parameters influencing the "background rumour" are drawn: parameter $\alpha_0$ controlling the timestamps, parameter $m$ controlling the language, and parameter $l$ controlling the lengths of posts. Two additional parameters are controlled in the experiment: $\sigma_1^2$ and $\sigma_2^2$, determining how the rumours differ from the "background rumour" in the distributions of text and time.

After drawing the parameters controlling the "background rumour", parameters specific to each rumour are drawn. For each rumour $R_i$, a parameter $b^i$ is drawn controlling the dynamics of language change around a rumour. The parameter of the Geometric distribution controlling the language usage at time 0 is $c_1^i$, and the parameter of the Geometric distribution controlling the language usage at time 1 is $c_2^i$ (the parameter of the Geometric distribution controlling the language at any other timestamp is a linear function of time). Both of these parameters are drawn from Gaussian distributions with means which are functions of $b^i$. Next, for each rumour two parameters governing the rumour specific distribution of timestamps are obtained by adding Gaussian noise with small variance to $\alpha_0$. The addition of little noise poses a tough challenge in modelling the data, making it hard to learn the different temporal distributions across rumours.

At the end of the generative process, timestamps are drawn from a Beta distribution $t \sim \text{Beta}(\alpha_1^i, \alpha_2^i)$, the length of each post is drawn from a Geometric distribution controlled by the parameter $l$ (shared across the rumours), and each word in each post is drawn from a Geometric distribution dependent on both the drawn timestamp and the rumour specific parameters $c_1^i$ and $c_2^i$.

In summary, the described procedure of generating rumours introduces three sources of variation across rumours. One is the marginal difference between the timestamps of posts (different parameters for the Beta distributions governing time), another is the marginal difference between the text in posts (random perturbations of the parameters $c_1^i$ and $c_2^i$ controlling the text distributions in posts from each rumour), and the third is the difference in the dynamics of variation of text over time (the linear function over time dependent on $c_1^i$ and $c_2^i$ governing the Geometric distribution from which the text content is generated).
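A condensed sketch of one draw from Table 6.3 follows. One caveat: the drawn language parameters $c_1^i, c_2^i$ lie around $[1, 2]$, so how exactly they parameterize the Geometric distribution is not fully recoverable from the table; this sketch maps a drawn value $c$ to a success probability $1/c$, which is an assumption, and values are clipped to keep all parameters in valid ranges.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_rumours(n_rumours=100, z=20, s1=0.1, s2=0.1):
    alpha0 = rng.uniform(1, 10)                    # background time parameter
    m = rng.triangular(1, 1.5, 2)                  # background language parameter
    l = np.clip(rng.normal(0.1, 0.05), 1e-3, 1.0)  # post length parameter
    rumours = []
    for _ in range(n_rumours):
        b = rng.uniform(max(1, 2 * m - 2), min(2, 2 * m - 1))
        c1 = rng.normal(b, s1)                     # language usage at time 0
        c2 = rng.normal(2 * m - b, s1)             # language usage at time 1
        a1 = max(rng.normal(alpha0, s2), 1e-2)     # rumour-specific Beta params
        a2 = max(rng.normal(alpha0, s2), 1e-2)
        posts = []
        for _ in range(z):
            t = rng.beta(a1, a2)                   # post timestamp
            d = rng.geometric(l)                   # number of words in the post
            c = max(c1 + (c2 - c1) * t, 1.0)       # linear function of time
            words = rng.geometric(1.0 / c, size=d) # assumed parameterization
            posts.append((words, t))
        rumours.append((posts, c1, c2))
    return rumours
```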

Output variables

We consider three different problem settings: classification, regression and Poisson regression.

method      Classification (ACC)   Regression (MSE)   Poisson regression (LL)
mean        -                      0.16±0.08          -39.15±8.18
majority    0.44±0.04              -                  -
text        0.62±0.16              0.15±0.08          -39.46±9.42
time        0.44±0.04              0.15±0.07          -39.40±8.21
text+time   0.57±0.12              0.14±0.07          -37.01±8.66
text◦time   0.80±0.16              0.05±0.03          -31.00±5.63

Table 6.4: Results from the synthetic experiments for a range of methods (mean ± std dev), showing classification accuracy, mean squared error for regression and Poisson log likelihood. We ran 55 experiments of each setting, taking 80% of rumours for training and 20% for testing.

Classification In the classification setting, we draw the parameters governing the text distribution of posts twice, obtaining $c_1^0, c_2^0$ (corresponding to label 0, e.g. the true rumour) and $c_1^1, c_2^1$ (corresponding to label 1, e.g. the false rumour). We generate 1000 posts for each label and split them uniformly into 50 rumours, ending up with 100 rumours for a single experiment. We evaluate performance using predictive accuracy.

Regression For regression we draw as many pairs of parameters $c_1^i, c_2^i$ as there are rumours (100, each consisting of 20 posts). The response variable for rumour $R_i$ is set to $y_i = c_1^i - c_2^i$ (recall $c_1^i$ and $c_2^i$ control how rapidly the language changes over time). We use the mean squared error (MSE) between the predicted responses and the ground truth for evaluation.

Poisson regression As for the regression setting, in Poisson regression we draw as many pairs of parameters $c_1^i, c_2^i$ as there are rumours (100, each consisting of 20 posts). For each rumour the response variable is set to $y_i = \max\big(1, \big\lceil 1 + 10(c_1^i - c_2^i)\big\rceil\big)$. We evaluate models by calculating the log likelihood (LL) of the observations under a Poisson distribution with its mean set to the mean value predicted by the model.

Results

In Table 6.4 we report results from the three experiment settings. Note that in all settings, text◦time yields excellent results, outperforming all other methods. In contrast, the simpler kernels (text, time, text+time) did not significantly improve over the mean or majority baseline. Note that text+time can be viewed as an instantiation of the kernel fusion scheme (Wu et al., 2005), where different views of time series are combined in a weighted average. The results show that this method is outperformed by the more powerful text◦time kernel. In Figure 6.3 we analyze the influence of the parameters $\sigma_1^2$ and $\sigma_2^2$ from the generative process on the performance of the convolution kernels. Notice that as $\sigma_1^2$ or $\sigma_2^2$ increases (thus introducing bigger marginal differences across rumour distributions in time or text, due to high variance in the appropriate component of the generative process), the kernels using text or time only become more and more effective relative to the other methods.

[Figure 6.3: Comparison of accuracy between convolution kernels, showing the kernels text◦time (blue circles), text (green crosses) and time (red triangles). The left plot shows accuracy as $\sigma_1^2$ varies, while the right plot varies $\sigma_2^2$ (the other parameter is fixed to 0.02). This changes the amount of signal in the text and time components alone, respectively.]

6.6 Experiments on Rumour Data

We now evaluate our approach on two social media datasets. The two tasks we consider are related to rumour popularity modeling, following Chapter 5.

Data We conduct experiments on two social media datasets constructed from the PHEME rumour datasets described in Section 2.5.2. Notice we do not need stance annotations for tweets in the rumour popularity prediction task, and therefore are not restricted to the subsets of the PHEME datasets described in Section 2.5.2. The first dataset consists of 114 popular rumours collected in August 2014 during the Ferguson unrest and is composed of 4098 tweets. This dataset was used for rumour dynamics modeling in the previous Chapter 5; we will show how the introduced convolution kernels outperform the methods that we introduced earlier. We also consider a second dataset consisting of tweets collected in October 2014 during the Ottawa shootings and in August 2014 during the Ferguson unrest (Zubiaga et al., 2015a). The corpus consists of 288 Ferguson rumours and 470 Ottawa rumours, and is composed of 13,002 tweets. For both datasets, in order to reduce feature sparsity, we replace words with their Brown cluster ids, using bags-of-clusters in place of bags-of-words. We used 1000 clusters acquired on a large scale Twitter corpus (Owoputi et al., 2013), following the approach from Chapter 5.

method           MSE           LL
HPP              7.14±10.1⋆    -23.5±10.1⋆
GP regression    4.58±11.0⋆    -
Interpolate      4.90±13.1⋆    -
0                2.76±7.81⋆    -
LGCP             3.44±9.99⋆    -15.8±11.6†⋆
LGCP ICM         2.46±7.82†    -14.8±11.2†
LGCP TXT         2.32±7.06†⋆   -14.7±9.12†⋆
LGCP ICM+TXT     2.31±7.80†    -14.6±10.8†
LGCP text◦time   2.21±7.09†    -14.2±8.6†

Table 6.5: MSE between the true counts and the predicted counts (lower is better) and predictive log likelihood of the true counts from the point process models (higher is better) for test intervals over the 114 Ferguson rumours, showing mean ± std dev. Key: † denotes significantly better than the 0 baseline; ⋆ denotes significantly worse than LGCP text◦time, according to the Wilcoxon signed rank test, p < 0.05. All except the final result are taken from Chapter 5. We omit the LL evaluation for the non-probabilistic models and for GP, which returns a predictive distribution over a continuous support, thus not directly comparable against the other models.

Rumour Popularity Prediction The first task is rumour popularity prediction, as introduced in Chapter 5. Consider a set of rumours $R = \{R_m\}_{m=1}^{|R|}$, in which each rumour is represented by a set of posts $\{p_m^n\}_{n=1}^{|R_m|}$. The problem is as follows: given the first hour of posts from a rumour $R_m$, predict the counts of tweets in each of the 6-minute intervals from the second hour. As before, the predictions are evaluated using two metrics: mean squared error (MSE) over each test interval, and the predictive log-likelihood (LL), used for probabilistic approaches only. Evaluation uses a leave one out method, where the prediction is made for each rumour, using the first hour of tweets for all rumours as well as (depending on whether a single-task or multi-task method is considered) the full two hours of tweets for the remaining 113 rumours.

We incorporate baselines from Chapter 5: the Homogeneous Poisson Process (denoted HPP), Gaussian process regression (GP), Interpolation and a 'predict no tweets' method (0), all of which are single task methods. The most successful benchmark methods are based on log-Gaussian Cox Processes. The first approach uses an RBF kernel to independently model each target rumour (denoted LGCP). The remaining methods use multi-task learning over the collections of rumours, using an RBF kernel across time intervals combined in a product with a rumour kernel. The rumour kernel uses the rumour text (TXT) and/or explicit learning of a low-rank matrix of rumour correlations (ICM). For more information about these baselines and benchmark systems, we refer the reader to Chapter 5.

We report the results on the 114 Ferguson rumours in Table 6.5, the same dataset as previously reported in Table 5.1. We found the convolution kernel to work best without normalization of rumours by their lengths, and we report results from such an approach. LGCP text◦time outperforms all methods according to both evaluation metrics. The improvements of LGCP text◦time over LGCP TXT and LGCP ICM+TXT are however not statistically significant according to the Wilcoxon signed rank test, which shows that there are multiple rumours for which the convolution kernel does not improve the results. Notice how for both metrics the predictive variance is equal to or smaller than the best benchmark results. Thus LGCP text◦time not only outperforms the previous best method (although not significantly), but also avoids the worst mistakes.

In Figure 6.4 we show the confusion matrices comparing the text◦time approach as well as three other methods for scoring similarities between pairs of rumours that we considered in Chapter 5. Note that the value in cell (1, 0) is the smallest for text◦time, showing that compared to the other methods text◦time makes fewer mistakes in misclassifying count 1 as 0. Moreover, both cells (1, 1) and (2, 2) have higher values for text◦time than for ICM+TXT, indicating that the convolution kernel makes better predictions for the small counts. Also, compared to all other methods, the matrix corresponding to text◦time exhibits low densities in the middle range of the column corresponding to frequency 0, which means that the convolution kernel does not suffer from predicting 0 counts for non-zero intervals as much as the baselines. However, we also observe unwanted phenomena for text◦time, such as the fact that cell (3, 6) is non-zero only for this method, with only the ICM kernel yielding a bigger overprediction for intervals of count 3.
Another peculiarity of text◦time is a higher error made for the highest ground truth frequency: text◦time misclassifies 21 as 3, whereas ICM+TXT misclassifies 21 as 4, thus making a slightly lower error. Nevertheless, overall we can see better patterns in the predictions made by text◦time than by any other kernel.

Frequency profiles for the text◦time, TXT and ICM methods are depicted in Figure 6.5 for multiple example rumours. Notice that in many cases both TXT and ICM underestimate the counts more than text◦time does (see rumours #0, #14, #24, #48, #49, #53, #62, #110), even though the error often remains significant for all approaches. We also notice cases where text◦time overestimates the counts, e.g. for rumour #62 it makes a spurious prediction of an increase in the frequency of posts towards the end of the second hour, presumably due to similarities captured by the convolution kernel which did not correspond to common future rumour trajectories.

[Figure 6.4: Confusion matrices for (a) LGCP text◦time, (b) LGCP TXT, (c) LGCP ICM+TXT and (d) LGCP ICM. Each row corresponds to the true count of tweets in an interval, and each column corresponds to the predicted number of tweets. A cell (i, j) denotes how many times the ground truth count of tweets i is classified as the count j.]

[Figure 6.5: Intensity functions for different methods across example Ferguson rumours in the extrapolation setting (panels (a)-(l): rumours #0, #12, #14, #24, #48, #49, #53, #57, #60, #62, #75 and #110). Dark bars denote frequencies of tweets in the train intervals, and white bars denote counts of tweets in the test intervals. Predictions from LGCP TXT are denoted by blue circles, from LGCP ICM by green diamonds, and from LGCP text◦time by red squares.]

Rumour Classification The second problem we consider is predicting whether a rumour will be popular, framed as classification. This case study is applicable when authorities or journalists wish to track social media, e.g. for identifying rumours that are going to become popular. Here we use the rumour dataset consisting of the Ottawa and Ferguson rumours, as described in Section 2.5.2. We consider this problem to demonstrate the usefulness of the convolution kernel for different kinds of tasks, as well as to facilitate easier error analysis.

As with the earlier rumour dataset, we consider a set of rumours and, based on the first hour of posts from rumour $i$, the task is to predict if the number of posts in the second hour equals or exceeds a threshold $\tau$.⁶ This results in a binary classification problem, in which we predict which rumours are likely to become or remain popular. For most rumours we observe a rapid decrease in the counts of posts over time: the mean number of posts in the first hour is 11.08, and only 1.91 in the second hour. For this reason we set the threshold to $\tau = 2$, which results in around 30% of rumours being labelled as positive instances, i.e., popular.

We model the data using supervised GP classification, and evaluate the predictive accuracy with 5-fold cross validation, reporting our results in Table 6.6. The best accuracy is achieved by the convolution kernel text◦time, with a score of 0.748. Although this is only a small improvement over the mean prediction from the time kernel, notice that its results were much more robust than those of the other methods (the text◦time kernel has by far the lowest standard deviation). Overall this is a very difficult modeling problem, although our methods were able to improve over the majority baseline, with the best method providing a relative error reduction of 16%. Note that the fusion kernel text+time performs worse than text◦time, reinforcing our conclusion from the synthetic experiments that it is a less powerful method for capturing the dynamics of multivariate time series.

Next, we inspect how errors vary across rumour lengths for the text◦time kernel. In Figure 6.6 we plot the probability of a wrong class being assigned to a rumour according to the GP output (on the y axis) against the number of posts on a rumour (on the x axis).

⁶When discussing the research question "How can the rumour popularity prediction problem be formulated?" in Section 1.2 we argued against such a formulation of the rumour popularity prediction problem. Here we consider it to demonstrate the applicability of our convolution kernels to classification problems.

method      ACC±std dev
majority    0.701±0.021
text        0.678±0.025
time        0.740±0.015
text+time   0.738±0.011
text◦time   0.748±0.009

Table 6.6: Results from the rumour classification experiments for a range of methods, showing classification accuracy ± std dev. The results are averaged using five-fold cross validation.

[Figure 6.6: Error analysis of rumour classification, showing the probability of error versus rumour lengths (counts of posts in their first hour) for GP classification with the text◦time kernel (axes: event size vs. p(error)). The x axis has been truncated to the interval [0, 50], which covers all but 37 rumours in our test set (which were all correctly classified).]

Note that there are consistently more rumours correctly than incorrectly classified across all rumour sizes. Even though there is a group of wrongly predicted rumours of small sizes (upper-left part), there are many more small rumours that are correctly classified (lower-left part).

6.7 Conclusions

In this chapter we introduced convolution kernels for discriminative modelling of time series. We showed that the kernels work well on the rumour popularity prediction task introduced in Chapter 5. We also evaluated the kernels on synthetic datasets, demonstrating with intuitive examples where one can expect the kernels to perform well, as well as indicating the applicability of convolution kernels to other types of problems.

In this and the previous chapters we considered modeling of rumour popularity with point processes. In the next chapter we return to the rumour stance classification problem, which we previously considered in Chapter 4. We will consider whether explicitly modeling rumour dynamics via point processes can also be leveraged for that task.

Chapter 7

Temporal Dynamics for Rumour Stance Classification

Chapters 5 and 6 dealt with the problem of modeling rumour popularity. We showed that the text content from tweets around a rumour and the dynamics of the rumour spread correlate with the temporal dynamics in unobserved periods of time. In this chapter we consider the idea of employing the rumour dynamics information in the context of the first research aim, rumour stance classification. In particular, we seek an answer to the last research question we posed: Do tweet arrival times carry complementary information to text for the stance prediction task? We show how temporal information can be used within the Hawkes process framework for predicting stance labels, leading to an approach outperforming baselines not using it.

Parts of this chapter have been published as

Lukasik, M., Srijith, P. K., Vu, D., Bontcheva, K., Zubiaga, A., and Cohn, T. (2016b). Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 393-398.

7.1 Introduction

Sequence classification tasks are often associated with temporal information, where a timestamp is available for each of the data instances. For instance, in sentiment classification of reviews in forums, opinions of users are associated with a timestamp indicating the time at which they were posted. Similarly, in event detection in Twitter, tweets posted on a continuous basis need to be analysed and classified in order to detect the occurrence of some event. Nevertheless, traditional sequence classification approaches ignore the time information in these textual data sequences (Song et al., 2014; Gorrell and Bontcheva, 2016), which has also been the case for previous work on rumour stance classification (Qazvinian et al., 2011; Zeng et al., 2016b), including our own as reported in Chapter 4. In this chapter we consider continuous time information along with the textual information for classifying sequences of temporal textual data. In particular, we consider the problem of rumour stance classification in Twitter, where tweets provide temporal information associated with the textual tweet content.

We introduced the rumour stance classification problem in Chapter 4, where we took an approach based on Gaussian processes. In this chapter we propose to use Hawkes processes (Hawkes, 1971), commonly used for modelling information diffusion in social media (Yang and Zha, 2013; De et al., 2015); we introduced HP in Section 3.2.5. Hawkes processes (HP) are a self-exciting temporal point process which has been shown useful for modelling the occurrence of posts in Twitter (Zhao et al., 2015b). The model assumes that the occurrence of a tweet will influence the rate at which future tweets arrive. Figure 7.1 shows the behaviour of the intensity function associated with a Hawkes process; note the intensity spikes at the points of tweet occurrences. In applications such as stance classification, different labels can influence one another. This can be modelled effectively using the mutually exciting behaviour of a multivariate Hawkes process, as depicted in Figure 7.2, where multiple intensity functions are modeled jointly for parallel sub-streams of tweets.

[Figure 7.1: A sample drawn from a univariate Hawkes process (denoted by crosses at the bottom of the figure) and the corresponding intensity function values over time. The samples have been obtained using the Ogata thinning algorithm (Ogata, 2006), and the parameters of the Hawkes process are: $\mu = \gamma = 1$ and $\omega = 1$.]

[Figure 7.2: Intensities of the Hawkes processes corresponding to the four stances for an example Ferguson rumour, with parameters trained using the HP Grad. approach. Tweet occurrences over time are denoted at the bottom of the figure by different symbols: grey squares for commenting tweets, red circles for denying tweets, green diamonds for questioning tweets and blue triangles for supporting tweets. Notice how the intensity corresponding to the Hawkes process for the commenting stance is high throughout the rumour lifespan.]

In the end, we demonstrate how rumour dynamics convey essential information for the problem, thus showing that the information relevant to the task of stance classification is not held exclusively in the text, to which previous work (including ours from Chapter 4) was limited.

In Chapters 5 and 6 we employed the log-Gaussian Cox Process (LGCP) for modeling rumour dynamics. For modeling the dynamics of rumours for rumour stance classification we resort to a different model, the Hawkes process. Even though in principle it might be possible to apply the LGCP to this task, we claim that Hawkes processes are a more natural approach to tackle this problem, for the following reasons:

1. Hawkes processes come with an assumption suitable for this particular application: they explicitly model the causality between tweet occurrences within a rumour. The way we used LGCP for modeling rumour dynamics in other chapters was different — we were seeking similarities in how different rumours spread over time to make predictions about future trajectories.

2. Hawkes processes are more amenable to sequence classification. They incorporate the history of tweet occurrences explicitly within the shared intensity function (see Equation (7.1)). This is done by conducting a summation over the influences coming from previous tweets, thus providing a dependence on previous labelings. In the LGCP this could be simulated by expanding the training dataset of the underlying Gaussian process every time a consecutive tweet is labeled; however, this is less natural and more computationally demanding. We elaborate on how we achieve this for Hawkes processes in Section 7.3.3.

The novel contributions of this chapter are:

1. Developing a Hawkes process model for time sensitive sequence classification.

2. Demonstrating how temporal dynamics convey important information for the rumour stance classification task.

3. Broadening the set of labels considered in previous work to include a new label, commenting. This label is very common in our rumour datasets (see Table 2.6 for statistics about the distributions of the different labels).

Software used for experiments can be found at https://github.com/mlukasik/seqhawkes.

7.2 Problem Settings

We formalized the problem of rumour stance classification in Section 4.2. Here, we revisit the formulation to introduce the temporal feature of tweets, a source of information that we only include in this chapter. We consider a collection of rumours $R = \{R_1, \cdots, R_{|R|}\}$. Each rumour $R_m$ consists of tweets, $R_m = \{p_m^1, \ldots, p_m^{|R_m|}\}$. Here, for brevity we suppress the indexing on rumour id, and represent tweets as tuples $p_n = (t_n, w_n, m_n, y_n)$.

A tuple includes the following information: $t_n$ is the posting time of the tweet, $w_n$ is the text message, $m_n$ is the rumour id (i.e. conversation thread) and $y_n$ is the label, $y_n \in Y = \{\text{supporting}, \text{denying}, \text{questioning}, \text{commenting}\}$.¹ Notice the commenting label is introduced here compared to Chapter 4 and to previous work on rumour stance classification (Qazvinian et al., 2011; Zeng et al., 2016b). The definitions of all stances, including the commenting stance, can be found in Section 2.2.

LOO setting We consider the Leave One Out (LOO) setting, introduced in Section 4.2.

In the LOO setting, for each rumour Rm ∈ R we construct the test set equal to Rm and the training set equal to P \ Rm (where P is the set of posts from all rumours). The final performance scores we report are averaged across all rumours. This represents a realistic scenario where a classifier has to deal with a new, unseen rumour.

¹Recall our notation is summarized in the Nomenclature on page xii.

Data In Chapter 4 we evaluated rumour stance classification on two datasets: the England riots and PHEME. Our Hawkes process model requires observation of all tweets from a rumour within the interval of time. This assumption is not satisfied by the England riots dataset, for which some tweets are missing. However, the PHEME datasets define a rumour as a conversation thread in Twitter, which is fully observed during data collection. Therefore, it is applicable under the assumptions entailed by our Hawkes process model. Since we only consider the PHEME datasets, we treat each dataset separately, and conduct the leave one rumour out evaluation.² Each individual rumour in the PHEME datasets consists of a small number of tweets (as reported in Table 2.6), thus we focus on the LOO approach, which does not require any target rumour annotation. Note that, similarly as in the evaluation in Chapter 4, in our evaluation future rumours may be used to predict the past. Notice that here we extend the set of three labels used before (in the literature (Qazvinian et al., 2011) and in Chapter 4), adding the new label commenting. Excluding the commenting label would lead to missing tweets in the stream of posts around a rumour, violating the assumptions made by our model.

7.3 Model

Below we describe our model, which allows for incorporating two phenomena that have been ignored in previous work on rumour stance classification. First, we make use of the continuous timestamp information which describes a tweet, a feature which has not been used before for the rumour stance classification task. This information is not easy to use, as it is not clear how using the timestamp directly as a feature could help predict the label of a tweet. Instead, we make use of timestamps indirectly, by modeling the dynamics of label occurrences, allowing for an elegant incorporation of timestamp information into the model. Moreover, when predicting a label for each tweet, we make use of the labels of preceding tweets. This may yield better results, since information about the stances adjacent in time adds useful information about what stance a particular tweet may take. For example, if there are many supporting tweets in an interval of time, it might be likely that right afterwards another supporting post would be tweeted.

Hawkes processes are a probabilistic framework for modelling self-excitatory phenomena, which has been used for modelling memes and their spread across social networks (Yang and Zha, 2013). The intensity function of a Hawkes process models the self-exciting property by adding up the influence from past tweets. We use a multi-variate Hawkes process for modelling the mutually exciting phenomena between the tweet labels. We described the fundamentals of point processes and Hawkes processes in Section 3.2. In this section we describe how we apply the Hawkes process framework to rumour stance classification.

²For this reason we use the four largest datasets, excluding the Germanwings crash dataset as being too small for an independent dataset.

7.3.1 Intensity Function

The Hawkes process, a form of point process, models point occurrences over a space via an intensity function (for an introduction to the role of the intensity function in modeling point processes see Section 3.2). Here, we define an intensity function for each rumour-stance pair. In the intensity function formulation, we assume that all previous tweets associated with a rumour $m$ influence the occurrence of a new tweet. This allows us to use information from other tweets that have been posted about rumour $m$. We consider the intensity function for stance $y$ and rumour $m$ to be a summation of the base intensity and the intensities associated with all the previous tweets about the same rumour,

$$\lambda_{y,m}(t|H_{t^-}) = \mu_y + \sum_{t_\ell < t} \mathbb{I}(m_\ell = m)\,\alpha_{y_\ell, y}\,\kappa(t - t_\ell), \tag{7.1}$$

where $\mu_y$ is the base intensity of stance $y$, $\mathbb{I}(\cdot)$ is the indicator function, $\alpha_{y_\ell,y}$ measures the influence of a previous label $y_\ell$ on label $y$, and the influence of each previous tweet decays over time via an exponential kernel,

$$\kappa(t - t_\ell) = \omega \exp\left(-\omega(t - t_\ell)\right). \tag{7.2}$$
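A minimal numpy sketch of equations (7.1) and (7.2) (our illustration, not the released seqhawkes code), assuming the history is given as arrays of past timestamps, stance ids and rumour ids:

```python
import numpy as np

def intensity(t, y, m, hist_t, hist_y, hist_m, mu, alpha, omega=0.1):
    """lambda_{y,m}(t | H_{t-}): base rate plus the decayed influence of the
    earlier tweets of the same rumour, weighted by the alpha matrix."""
    prev = (hist_t < t) & (hist_m == m)
    decay = omega * np.exp(-omega * (t - hist_t[prev]))   # kappa(t - t_l)
    return mu[y] + np.sum(alpha[hist_y[prev], y] * decay)
```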

A graphical representation of the proposed Hawkes process sequence classification model is shown in Figure 7.3. Here, we consider a sequence of 4 tweets occurring at times $t_1, t_2, t_3, t_4$ with labels $y_1, y_2, y_3, y_4$. Each label $y_n$ corresponding to a tweet occurring at time $t_n$ depends on all previous labels as well as the times at which the previous labels have occurred. Hence this is a non-Markovian model, as all the preceding labels and timestamps influence the following label. The exponential kernel decay causes the effect of previous labels to diminish over time, with the most recent labels having the strongest effect on the current label. This is a reasonable assumption; in particular, in the rumour stance classification task the most recent stances would have a bigger influence than earlier stances on the stance of the current tweet. Moreover, note that the dependence between labels is weighted by their influence as measured by the parameter $\alpha$ (as specified in Equation (7.1)). Thus, even if a preceding label has occurred in the distant past but the corresponding entry in $\alpha$ is high, it will play a significant role in determining the current label.

[Figure 7.3: Graphical representation of the Hawkes process sequence classification model (left) and an example instantiation of the variables from the graphical model (right). Timestamps are denoted by variables $t_i$, labels by variables $y_i$, and text by variables $w_i$. Shaded circles correspond to observed variables, and white circles correspond to variables unobserved at test time. The example corresponds to the rumour about an ISIS flag remaining on display during the Sydney siege event (note this rumour has also been depicted in Figure 1.1). Timestamps are expressed in relative time in minutes from the beginning of the rumour spread; s denotes a supporting, q a questioning and d a denying stance. Notice that in this chapter we also consider the commenting stance.]

7.3.2 Likelihood

The parameters governing the intensity function are learnt by maximizing the likelihood of generating the tweets. The complete likelihood, as derived in Section 3.2, is given by

$$L(t, y, m, W | \beta, \mu, \gamma, \alpha, \omega) = \left(\prod_{n=1}^{N} p(w_n|y_n)\right) \times \left(\prod_{n=1}^{N} \lambda_{y_n,m_n}\big(t_n|H_{t_n^-}\big)\right) \times p(E_T), \tag{7.3}$$

where the first term provides the likelihood of generating the text given the label (we denote the $n$th row of the matrix $W$ by $w_n$). This language component is modelled as a multinomial distribution conditioned on the label,

$$p(w_n|y_n) = \prod_{v=1}^{V} \beta_{y_n v}^{W_{nv}}, \tag{7.4}$$

where $V$ is the vocabulary size and $\beta$ is a matrix of size $|Y| \times V$ specifying the language model for each stance. The second term provides the likelihood of the occurrence of tweets at times $t_1, \ldots, t_N$, and the third term provides the likelihood that no tweets happened in the intervals between tweets within the time horizon $[0, T]$. See Section 3.2 for an explanation of how the joint likelihood of points under a point process model is derived. The values of the second term over time are illustrated in Figure 7.2 for an example Ferguson riots rumour. At each point in time, each of the four intensity functions is influenced by the preceding tweets. Notice how tweets from different stances influence each of the intensity functions differently, which is due to the different cell values in the $\alpha$ parameter matrix.

7.3.3 Prediction

The prediction task we consider is to infer the stances expressed by tweets. We employ the model for this task by conducting sequence classification, where for each tweet we predict the stance maximizing the joint likelihood from equation (7.3) of the sequence of tweets ending at this particular tweet. Thus, when predicting the label for the $k$th tweet,

$$\hat{y}_k = \operatorname*{arg\,max}_{y \in Y}\ L\big((t_1, \ldots, t_k), (y_1, \ldots, y_{k-1}, y), (m_1, \ldots, m_k), (w_1, \ldots, w_k)\big). \tag{7.5}$$

The classification of a sequence of tweets is conducted in a greedy, sequential manner. This means that once a label is chosen for a tweet, it is fixed and used for finding the stances of the following tweets. Notice this is suboptimal: an approach which is optimal from the model's perspective would be to select the sequence of stances maximizing the joint likelihood, although this would be prohibitively complex due to the non-Markovian nature of the technique. Another approach could be to use a beam search (Norvig, 1992), trying to achieve a compromise between speed and the size of the search space of the inspected solutions, which is an appealing avenue for future work. Informally speaking, our approach to classification can be viewed as corresponding to a Naive Bayes classifier, where a constant label prior is replaced by a time-varying prior dependent on the previous tweets. In particular,

$$\hat{y}_k = \operatorname*{arg\,max}_{y \in Y}\ L\big((t_1, \ldots, t_k), (y_1, \ldots, y_{k-1}, y), (m_1, \ldots, m_k), (w_1, \ldots, w_k)\big) \tag{7.6}$$

$$= \operatorname*{arg\,max}_{y \in Y}\ \left(\prod_{n=1}^{k-1} p(w_n|y_n)\right) p(w_k|y) \times \left(\prod_{n=1}^{k-1} p_{y_n,m_n}\big(t_n|H_{t_n^-}\big)\right) p_{y,m_k}\big(t_k|H_{t_k^-}\big) \tag{7.7}$$

$$= \operatorname*{arg\,max}_{y \in Y}\ p(w_k|y) \times p_{y,m_k}\big(t_k|H_{t_k^-}\big), \tag{7.8}$$

where $p_{y,m}(t)$ is the pdf corresponding to stance $y$ and rumour $m$.
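A sketch of the greedy decoding, assuming trained parameters mu, alpha and beta and the `intensity` helper sketched earlier. We work in log space and compare log p(w_k|y) + log λ_{y,m_k}(t_k), dropping the factors of the pdf that are shared across the candidate stances at a fixed t_k; this is a simplification of equation (7.8), not a verbatim transcription of the thesis code.

```python
import numpy as np

def predict_sequence(ts, ms, W, mu, alpha, beta, omega=0.1):
    """Greedily assign a stance to each tweet, fixing earlier predictions."""
    labels = np.zeros(len(ts), dtype=int)
    for k in range(len(ts)):
        scores = np.empty(len(mu))
        for y in range(len(mu)):
            log_lang = W[k] @ np.log(beta[y])             # log p(w_k | y)
            lam = intensity(ts[k], y, ms[k], ts[:k], labels[:k], ms[:k],
                            mu, alpha, omega)
            scores[y] = log_lang + np.log(lam)
        labels[k] = int(np.argmax(scores))                # fix and continue
    return labels
```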

7.3.4 Parameter Optimization

We estimate the parameters of the model by maximizing the joint log-likelihood of the tweets under the Hawkes process,

$$l(t, y, m, W) = -T|R|\sum_{y=1}^{|Y|}\mu_y - \sum_{y=1}^{|Y|}\sum_{\ell=1}^{N}\alpha_{y_\ell,y}K(T - t_\ell) + \sum_{n=1}^{N}\log \lambda_{y_n,m_n}\big(t_n|H_{t_n^-}\big) + \sum_{n=1}^{N}\sum_{v=1}^{V}W_{nv}\log\beta_{y_n v}, \tag{7.9}$$

where $K(T - t_\ell) = 1 - \exp\left(-\omega(T - t_\ell)\right)$ (see Appendix A for the derivation). Note that $\beta$ is independent of the first two terms. Equating the derivative of the log-likelihood with respect to $\beta$ to 0 leads to a closed form solution, which after applying Laplacian smoothing (Manning et al., 2008) takes the form:

$$\beta_{yv} = \frac{\sum_{n=1}^{N}\mathbb{I}(y_n = y)\,W_{nv} + 1}{\sum_{n=1}^{N}\sum_{v=1}^{V}\mathbb{I}(y_n = y)\,W_{nv} + V}.$$
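A sketch of this closed-form estimate with add-one (Laplace) smoothing, assuming W is an N×V count matrix and y a vector of stance ids:

```python
import numpy as np

def estimate_beta(W, y, n_stances):
    V = W.shape[1]
    beta = np.empty((n_stances, V))
    for s in range(n_stances):
        counts = W[y == s].sum(axis=0)                # word counts for stance s
        beta[s] = (counts + 1) / (counts.sum() + V)   # Laplace smoothing
    return beta
```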

Optimization with respect to $\omega$ is non-convex. Similarly to Yang and Zha (2013), we fix the decay parameter $\omega$, in our case to 0.1, and leave the investigation of ways of optimizing $\omega$ to future work (we study the sensitivity of the model to $\omega$ in Figure 7.5). Now, the challenge in optimizing for $\mu$ and $\alpha$ comes from the log term in Equation (7.9), as it does not lead to closed form solutions.

Approximation Based Optimization (HP Approx.) In one approach to $\mu$ and $\alpha$ optimization we approximate the log term in Equation (7.9) by taking the log inside the summation terms in Equation (7.1). Notice that this approximation is equivalent to assuming that the intensity function, instead of taking the form from Equation (7.1), takes the form:

$$\hat{\lambda}_{y,m}(t|H_{t^-}) = \mu_y \times \prod_{t_\ell < t} \mathbb{I}(m_\ell = m)\,\alpha_{y_\ell,y}\,\kappa(t - t_\ell). \tag{7.10}$$

Under this approximation, the log-likelihood takes the form

$$l(t, y, m, W|\beta, \mu, \gamma, \alpha, \omega) = -T|R|\sum_{y=1}^{|Y|}\mu_y - \sum_{y=1}^{|Y|}\sum_{\ell=1}^{N}\alpha_{y_\ell,y}K(T - t_\ell) + \sum_{n=1}^{N}\Bigg(\log\mu_{y_n} + \sum_{t_\ell < t_n}\mathbb{I}(m_\ell = m_n)\log\big(\alpha_{y_\ell,y_n}\kappa(t_n - t_\ell)\big)\Bigg). \tag{7.11}$$

Equating the derivatives of this approximated log-likelihood with respect to $\mu$ and $\alpha$ to zero yields closed form solutions,

$$\mu_y = \frac{\sum_{n=1}^{N} \mathbb{I}(y_n = y)}{T|R|},$$

$$\alpha_{xy} = \frac{\sum_{n=1}^{N} \sum_{\ell=1}^{n} \mathbb{I}(m_\ell = m_n)\, \mathbb{I}(y_\ell = x)\, \mathbb{I}(y_n = y)}{\sum_{k=1}^{N} \mathbb{I}(y_k = x)\, K(T - t_k)}.$$
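A sketch of these closed-form updates (ours; whether the $\ell = n$ self term is included in the numerator of $\alpha$ is ambiguous in the formula above, and this sketch excludes it):

```python
import numpy as np

def hp_approx_updates(t, y, m, n_stances, n_rumours, T, omega=0.1):
    N = len(t)
    mu = np.bincount(y, minlength=n_stances) / (T * n_rumours)
    K = 1.0 - np.exp(-omega * (T - t))                 # K(T - t_l)
    num = np.zeros((n_stances, n_stances))
    for n in range(N):
        for l in range(n):                             # pairs l < n
            if m[l] == m[n]:
                num[y[l], y[n]] += 1.0
    denom = np.array([K[y == x].sum() for x in range(n_stances)])
    alpha = num / np.maximum(denom, 1e-12)[:, None]
    return mu, alpha
```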

Gradient Based Optimization (HP Grad.) Without any approximation, the log-likelihood of our model takes the form

$$l(t, y, m, W|\beta, \mu, \gamma, \alpha, \omega) = -T|R|\sum_{y=1}^{|Y|}\mu_y - \sum_{y=1}^{|Y|}\sum_{\ell=1}^{N}\alpha_{y_\ell,y}K(T - t_\ell) + \sum_{n=1}^{N}\log\Bigg(\mu_{y_n} + \sum_{t_\ell < t_n}\mathbb{I}(m_\ell = m_n)\,\alpha_{y_\ell,y_n}\,\kappa(t_n - t_\ell)\Bigg). \tag{7.12}$$

The optimization of the log-likelihood under the Hawkes process model as specified in Equation (7.13) is a sum of linear functions and logarithms of linear functions of the parameters. Both linear functions and logarithms of linear functions are concave, and a sum of concave functions results in a concave function (Boyd and Vandenberghe, 2004). Therefore, the log-likelihood under the Hawkes process model is jointly concave with respect to $\mu$ and $\alpha$. In our second approach to optimizing the parameters (HP Grad.) we find the parameters using joint gradient based optimization over $\mu$ and $\alpha$, using the partial derivatives of the log-likelihood, $\frac{\partial l}{\partial \mu}$ and $\frac{\partial l}{\partial \alpha}$. The gradient of the log-likelihood under the Hawkes process model with respect to $\mu$ is given by:

$$\frac{\partial l}{\partial \mu_y} = -T|R| + \sum_{n=1}^{N}\frac{\mathbb{I}(y_n = y)}{\mu_{y_n} + \sum_{t_\ell < t_n}\mathbb{I}(m_\ell = m_n)\,\alpha_{y_\ell,y_n}\,\kappa(t_n - t_\ell)}. \tag{7.14}$$

In the optimization, we represent both $\mu$ and $\alpha$ as exponentials to ensure positivity, and employ the L-BFGS approach to gradient based optimization. Note that since we do not obtain closed form solutions, this approach is more computationally complex than the HP Approx. approach (with a naive implementation of HP Approx. taking $O(N^2)$ time, and a naive implementation of HP Grad. taking $O(N^2)$ time per single step of the gradient optimization).
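A sketch of this optimization with scipy's L-BFGS, where `neg_ll_and_grad` stands in for the negative log-likelihood of equation (7.12) and its gradients (a hypothetical callback, not a function from the released code); the chain rule through the exponential reparameterization keeps µ and α positive:

```python
import numpy as np
from scipy.optimize import minimize

def fit_hp_grad(neg_ll_and_grad, n_stances, seed=0):
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n_stances + n_stances ** 2)    # log mu, log alpha

    def objective(x):
        mu = np.exp(x[:n_stances])
        alpha = np.exp(x[n_stances:]).reshape(n_stances, n_stances)
        value, g_mu, g_alpha = neg_ll_and_grad(mu, alpha)
        # chain rule through the exponential reparameterization
        grad = np.concatenate([g_mu * mu, (g_alpha * alpha).ravel()])
        return value, grad

    res = minimize(objective, x0, jac=True, method="L-BFGS-B")
    mu = np.exp(res.x[:n_stances])
    alpha = np.exp(res.x[n_stances:]).reshape(n_stances, n_stances)
    return mu, alpha
```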

7.4 Experiments

We conduct experiments using the PHEME rumour datasets. We consider our Hawkes process model described in Section 7.3 as well as a set of baseline approaches.

7.4.1 Baselines

We compare our model against the following baselines:

CLASS CONDITIONED LANGUAGE MODEL (Class LM) considers only the textual information, through the multinomial distribution defined in Equation (7.4). This is equivalent to fixing the µ and α parameters to 0.

MAJORITY VOTE classifier based on the training label distribution, which corresponds to a Hawkes process where only µ is optimized, while fixing the other parameters to 0.

NAIVE BAYES models the text using a multinomial likelihood and a prior over label frequencies (Manning et al., 2008). This is equivalent to fixing the α matrix to 0.

MAXENT Logistic Regression was the first method employed for rumour stance classification (Qazvinian et al., 2011). We use ℓ1 regularisation with the cost coefficient selected via grid search over the training set from the list of values [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]. Here and in the other baselines we perform grid search via 3-fold cross-validation over the training set, where two folds are used for training and one for evaluating the candidate coefficient.

SVM Support Vector Machines with the cost coefficient selected via grid search from the list of values [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100].

RF We showed the Random Forests classifier to be a competitive baseline in the experiments in Chapter 4. We select the hyperparameters of RF via grid search over the cross product of the considered hyperparameter values (a sketch of this search appears after this list). The hyperparameters that we consider are: the splitting criterion measuring the quality of a split (chosen from [Gini impurity, entropy]), the number of trees (chosen from [10, 50, 100, 150, 200]), and the minimum number of samples in a node required to perform a split (chosen from [2, 5, 10]). We use bootstrap samples when choosing the data for each tree.

GP-ICM Gaussian processes in conjunction with the ICM multi-task learning kernel (see Section 4.3 for its introduction) was a competitive approach in Chapter 4.

CRF Conditional Random Field (Lafferty et al., 2001) is an algorithm for structured prediction which can be seen as an extension of logistic regression to structured outputs (Sutton and McCallum, 2012). We employ linear chain CRFs over temporally ordered sequences of tweets represented using bags of Brown clusters (similarly to the other approaches). The model uses state (text-stance) and transition (bigrams of consecutive stances) features. We employ ℓ2 regularisation on the weights with the cost coefficient selected via grid search from the list of values [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100].

As in the previous chapters, we use the GPy implementation of Gaussian processes (The GPy authors, 2015), and the Scikit-learn implementations of the SVM, MaxEnt and RF classifiers (Pedregosa et al., 2011). For the experiments with CRFs we use the CRFsuite implementation (Okazaki, 2007).
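As an illustration of the grid search protocol described above, a hedged scikit-learn sketch for the RF baseline; the feature matrix and labels here are random stand-ins for the Brown-cluster features and stance annotations:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_train = np.random.rand(100, 50)        # stand-in for Brown-cluster features
y_train = np.random.randint(0, 4, 100)   # stand-in stance labels (4 classes)

# Cross product of the hyperparameter values listed above, scored by
# 3-fold cross-validation over the training set.
param_grid = {
    "criterion": ["gini", "entropy"],
    "n_estimators": [10, 50, 100, 150, 200],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(RandomForestClassifier(bootstrap=True), param_grid, cv=3)
search.fit(X_train, y_train)
best_rf = search.best_estimator_
```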

7.4.2 Results

In this section we report the results of experiments on rumour stance classification with the Hawkes process based model. The results are shown in Table 7.1. As in Chapter 4, we report Micro-F1 and Macro-F1 scores for the methods. Notice that here we consider four categories (as opposed to three in Chapter 4, as we added the commenting stance). As shown in the stance distributions in Table 2.6, the commenting class dominates, with the supporting class being the second most frequent; overall, the denying and questioning classes are in the minority (with the Sydney Siege dataset being an exception). This class imbalance poses a significant challenge.

In terms of Macro-F1 score, the Class LM turns out to be the best performing method, achieving the best result on three datasets and the second best on one. HP Grad. turns out to be competitive, as it is always among the three best performing approaches, always outperforming the CRF baseline. GP-ICM achieves the best result on the Charlie Hebdo dataset, and is the second best method on two datasets. On the Ferguson dataset it does not do as well, achieving the fourth best result.

The Micro-F1 scores are significantly higher than the Macro-F1 scores, which is a result of the class imbalance across the four stances. In particular, majority classification already leads to high Micro-F1 scores across the datasets. We can observe that in terms of Micro-F1 score HP Approx. beats all other methods on three out of four datasets, and achieves the second best score on the Sydney Siege dataset after CRF. Therefore, if one cares about minimizing the absolute number of mistakes made, HP Approx. is the method of choice. Note that Class LM is the worst method according to the Micro-F1 score, despite being very strong in terms of Macro-F1. This is due to Class LM not learning any prior about the labels, which leads it to make more mistakes overall, while not demoting the minority classes. NB leads to significant improvements over Class LM, which shows that incorporating a prior over labels into the language information is essential for making fewer mistakes and obtaining a higher Micro-F1 score.

          Model        Ottawa   Ferguson   Charlie Hebdo   Sydney Siege
Macro-F1  Majority     0.190    0.200      0.201           0.194
          GP-ICM       0.424    0.292      0.430           0.407
          MaxEnt       0.343    0.247      0.320           0.379
          SVM          0.354    0.200      0.351           0.377
          RF           0.377    0.279      0.340           0.371
          Class LM     0.427    0.344      0.428           0.415
          NB           0.406    0.313      0.397           0.386
          CRF          0.361    0.293      0.355           0.350
          HP Grad.     0.424    0.331      0.419           0.395
          HP Approx.   0.323    0.260      0.326           0.325
Micro-F1  Majority     0.615    0.669      0.674           0.634
          GP-ICM       0.627    0.648      0.707           0.663
          MaxEnt       0.652    0.668      0.703           0.647
          SVM          0.646    0.669      0.699           0.673
          RF           0.638    0.666      0.703           0.662
          Class LM     0.532    0.496      0.634           0.516
          NB           0.618    0.620      0.702           0.620
          CRF          0.665    0.677      0.680           0.705
          HP Grad.     0.634    0.632      0.718           0.630
          HP Approx.   0.678    0.684      0.729           0.686

Table 7.1: Micro-F1 and Macro-F1 scores for the different methods on the four PHEME datasets.

[Figure 7.4 appears here: four heatmaps of cross-stance confusion rates (rows and columns: support, deny, question, comment), one per classifier: GP-ICM, CRF, HP Grad. and HP Approx.]

Figure 7.4: Cross-stance confusion rates for competitive methods on the Ottawa dataset. A cell (i, j) denotes the percentage of times the ground truth stance i is classified as stance j. The statistics are also reported in Table 7.2. Note that the stances are imbalanced, with the commenting stance being predominant, as reported in Table 2.6; thus different errors contribute differently to the Micro-F1 score.

Also observe that CRF outperforms HP Grad. on all but the Charlie Hebdo dataset, being a strong method according to the Micro-F1 score.

Our research question posed in this chapter was whether incorporating temporal information may improve results on the rumour stance classification task. To this end, the comparison of the Hawkes process model against the Naive Bayes classifier is the most fair, since both use the same approach to modeling text and only differ in how they construct the prior. Namely, NB uses a frequency based prior, whereas HP Grad. incorporates temporal information and the dependence on previous labels into it. We notice that HP Grad. outperforms NB according to both metrics on all datasets. This is a very encouraging result, since it shows that incorporating temporal information into the prior boosts performance. The other baselines are competitive; in particular, GP-ICM outperforms NB according to Micro-F1 on all datasets, and according to Macro-F1 on all but the Ferguson dataset.

NB (and the Hawkes process based approaches) make a naive assumption when modeling text, namely that the text features (Brown clusters) are independent. This potentially leads to worse results. A good direction for future work is to consider a different approach to modeling text within the Hawkes process model, hopefully leading to even better results while still leveraging the temporal information.

In Figure 7.2 we show plots of the intensity function of the HP Grad. model for rumour #1 from the Ferguson dataset. Notice the self-exciting property, with spikes in the intensity functions for different labels at times when tweets occur. Moreover, spikes occur even when a tweet with a different label is posted; for example, around 1 hour and 50 minutes into the rumour lifespan a questioning tweet is posted, which causes a spike in the intensity for commenting tweets.

Ottawa

Class            Classifier        Performance          Cross-classification rates
                                P      R      F1        S      D      Q      C
supporting (S)   GP-ICM        0.468  0.366  0.411    0.367  0.044  0.012  0.578
                 CRF           0.667  0.373  0.478    0.373  0.006  0.006  0.615
                 HP Grad.      0.529  0.509  0.519    0.509  0.025  0.012  0.453
                 HP Approx.    0.806  0.360  0.498    0.360  0.012  0.000  0.627
denying (D)      GP-ICM        0.548  0.359  0.434    0.158  0.066  0.053  0.724
                 CRF           0.500  0.013  0.026    0.066  0.013  0.013  0.908
                 HP Grad.      0.097  0.039  0.056    0.197  0.040  0.013  0.750
                 HP Approx.    0.000  0.000  0.000    0.066  0.000  0.000  0.934
questioning (Q)  GP-ICM        0.185  0.066  0.097    0.016  0.063  0.359  0.563
                 CRF           0.545  0.094  0.160    0.000  0.000  0.094  0.906
                 HP Grad.      0.529  0.281  0.367    0.016  0.094  0.281  0.609
                 HP Approx.    0.000  0.000  0.000    0.000  0.000  0.000  1.000
commenting (C)   GP-ICM        0.687  0.838  0.755    0.112  0.023  0.027  0.838
                 CRF           0.667  0.942  0.781    0.052  0.000  0.006  0.942
                 HP Grad.      0.699  0.817  0.754    0.119  0.037  0.027  0.817
                 HP Approx.    0.667  0.981  0.794    0.019  0.000  0.000  0.981

Ferguson

Class            Classifier        Performance          Cross-classification rates
                                P      R      F1        S      D      Q      C
supporting (S)   GP-ICM        0.367  0.137  0.199    0.137  0.000  0.025  0.839
                 CRF           0.585  0.193  0.290    0.193  0.000  0.031  0.776
                 HP Grad.      0.436  0.317  0.367    0.317  0.031  0.037  0.615
                 HP Approx.    0.759  0.137  0.232    0.137  0.000  0.000  0.863
denying (D)      GP-ICM        0.345  0.106  0.163    0.037  0.012  0.012  0.940
                 CRF           1.000  0.012  0.024    0.012  0.012  0.024  0.951
                 HP Grad.      0.143  0.049  0.073    0.061  0.049  0.049  0.842
                 HP Approx.    0.000  0.000  0.000    0.012  0.000  0.000  0.988
questioning (Q)  GP-ICM        0.125  0.012  0.022    0.011  0.011  0.106  0.872
                 CRF           0.188  0.032  0.055    0.011  0.000  0.032  0.957
                 HP Grad.      0.190  0.085  0.118    0.032  0.011  0.085  0.872
                 HP Approx.    0.000  0.000  0.000    0.000  0.000  0.000  1.000
commenting (C)   GP-ICM        0.680  0.921  0.783    0.050  0.009  0.021  0.921
                 CRF           0.691  0.962  0.804    0.029  0.000  0.009  0.962
                 HP Grad.      0.699  0.853  0.768    0.085  0.027  0.035  0.853
                 HP Approx.    0.682  0.991  0.808    0.009  0.000  0.000  0.991

Table 7.2: Per-class precision, recall and F1 scores for the best-performing classifiers (GP-ICM, CRF, HP Grad. and HP Approx.) on the PHEME datasets. Cross-classification rates denote how often the target stance (denoted by the row) is misclassified as another stance (denoted by the column). Note that the stances are imbalanced, with the commenting stance being predominant, as reported in Table 2.6. Figure 7.4 graphically illustrates the cross-classification rates for the top approaches on the Ottawa dataset.

Charlie Hebdo

Class            Classifier        Performance          Cross-stance confusion rates
                                P      R      F1        S      D      Q      C
supporting (S)   GP-ICM        0.559  0.419  0.479    0.420  0.000  0.004  0.576
                 CRF           0.595  0.373  0.458    0.373  0.000  0.004  0.623
                 HP Grad.      0.592  0.614  0.603    0.614  0.042  0.000  0.381
                 HP Approx.    0.757  0.343  0.472    0.343  0.000  0.000  0.657
denying (D)      GP-ICM        0.552  0.314  0.400    0.143  0.018  0.054  0.786
                 CRF           0.500  0.018  0.034    0.089  0.018  0.036  0.857
                 HP Grad.      0.071  0.018  0.029    0.161  0.018  0.000  0.821
                 HP Approx.    0.000  0.000  0.000    0.071  0.000  0.000  0.929
questioning (Q)  GP-ICM        0.167  0.018  0.032    0.020  0.020  0.314  0.647
                 CRF           0.273  0.059  0.097    0.020  0.000  0.059  0.922
                 HP Grad.      0.471  0.157  0.235    0.059  0.020  0.157  0.765
                 HP Approx.    0.000  0.000  0.000    0.000  0.000  0.000  1.000
commenting (C)   GP-ICM        0.747  0.885  0.810    0.097  0.006  0.013  0.885
                 CRF           0.729  0.915  0.811    0.076  0.001  0.007  0.915
                 HP Grad.      0.775  0.848  0.810    0.124  0.015  0.013  0.848
                 HP Approx.    0.727  0.968  0.830    0.031  0.001  0.000  0.968

Sydney Siege

Class            Classifier        Performance          Cross-stance confusion rates
                                P      R      F1        S      D      Q      C
supporting (S)   GP-ICM        0.545  0.404  0.464    0.404  0.018  0.009  0.570
                 CRF           0.710  0.341  0.461    0.341  0.009  0.000  0.650
                 HP Grad.      0.550  0.570  0.559    0.570  0.036  0.009  0.386
                 HP Approx.    0.809  0.323  0.462    0.323  0.000  0.000  0.677
denying (D)      GP-ICM        0.531  0.172  0.260    0.067  0.079  0.034  0.820
                 CRF           0.222  0.022  0.041    0.034  0.022  0.022  0.921
                 HP Grad.      0.167  0.101  0.126    0.135  0.101  0.056  0.708
                 HP Approx.    1.000  0.020  0.040    0.011  0.000  0.000  0.989
questioning (Q)  GP-ICM        0.280  0.079  0.123    0.081  0.030  0.172  0.717
                 CRF           0.438  0.071  0.122    0.051  0.010  0.071  0.869
                 HP Grad.      0.204  0.111  0.144    0.081  0.051  0.111  0.758
                 HP Approx.    0.000  0.000  0.000    0.000  0.000  0.020  0.980
commenting (C)   GP-ICM        0.700  0.885  0.781    0.086  0.015  0.014  0.885
                 CRF           0.684  0.952  0.796    0.032  0.006  0.010  0.952
                 HP Grad.      0.715  0.787  0.749    0.118  0.045  0.051  0.787
                 HP Approx.    0.675  0.978  0.798    0.022  0.000  0.000  0.978

Table 7.3: Per-class precision, recall and F1 scores for the best-performing classifiers (GP-ICM, CRF, HP Grad. and HP Approx.) on the PHEME datasets. Cross-stance confusion rates denote how often the target stance (denoted by the row) is misclassified as another stance (denoted by the column). Note that the stances are imbalanced, with the commenting stance being predominant, as reported in Table 2.6.

Analysis of the Best Performing Methods Next, we analyze the results of four classifiers (GP-ICM, CRF, HP Grad. and HP Approx.), three of which (GP-ICM, HP Grad. and HP Approx.) are contributions of this thesis, by looking at their per-class performance. Tables 7.2 and 7.3 report per-class precision, recall and F1 scores for the four PHEME datasets. They also report statistics on the cross-stance confusion rates of the four methods across the four stances. The cross-stance classifications are also depicted graphically for the Ottawa dataset in Figure 7.4.

We can observe that for all the dataset-class pairs, HP Approx. misclassifies almost all denying and questioning tweets, assigning most of them to the commenting class. At the same time, HP Approx. makes the fewest mistakes on the commenting stance, which is predominant in our datasets (as shown in Table 2.6, the commenting stance is 3-4 times more frequent than the second most popular stance in each of the four PHEME datasets). Precision on the supporting stance is the highest for HP Approx.; however, recall and F1 scores on this stance are higher for HP Grad. Thus, we can see that HP Approx. makes the smallest number of mistakes (as shown in Table 7.1) by focusing on the majority class, namely the commenting stance.

Interestingly, GP-ICM performs best on the denying category (which we also found in our experiments in Chapter 4), whereas HP Grad. performs best on the questioning stance. Moreover, both GP-ICM and CRF make fewer mistakes on the commenting stance than HP Grad., which is presumably the cost of HP Grad. performing well on the supporting and questioning stances.

Notice how the high stance imbalance poses a challenge to all approaches. In particular, the two under-represented stances (denying and questioning) are in most cases misclassified as the commenting stance by the four methods (GP-ICM, CRF, HP Approx. and HP Grad.) across all datasets. This shows the need for further work on making better predictions on these minority stances.

Sensitivity analysis In Figure 7.5 we report results from the Hawkes process based models for different values of ω, which controls how historical tweets influence the intensity function value at a particular point in time, as shown in Equation (7.2). Notice how HP Approx. is largely unaffected by varying values of the hyperparameter, showing only a slight change in performance for the larger values. HP Approx. consistently performs well on the commenting stance at the cost of its performance on the underrepresented stances. The performance of HP Grad. varies for different values of ω, but also stays within the same range. We can see that there is an increase in performance for ω = 0.01, and then a drop for ω = 10; therefore it is important to set the hyperparameter to an appropriate range. Recall that ω = 0.1 was used in all previous experiments, which represents a reasonable default.

[Figure 7.5 appears here: two line plots, (a) Macro-F1 and (b) Micro-F1, showing the scores of HP Approx. and HP Grad. as ω varies over 1e-06 to 1000.]

Figure 7.5: Macro-F1 (top) and Micro-F1 (bottom) scores for the HP methods for different hyperparameter values ω on the Ferguson dataset. Notice how HP Approx. is mostly insensitive to the value of ω, whereas HP Grad. is affected to a larger extent.

7.5 Conclusions

In this chapter we proposed a novel model based on Hawkes processes for sequence classification of stances in Twitter, which takes into account temporal information in addition to text. Using Twitter datasets and experimenting on rumour stance classification of tweets, we have shown that HP is a competitive approach, which at the time of its publication (Lukasik et al., 2016b) outperformed a range of strong benchmark methods in terms of Micro-F1 score by augmenting the class-conditional model with an informative prior based on temporal dynamics. Our experiments demonstrate the importance of making use of the temporal information available in tweets, which along with the textual content provides valuable information for the model to perform well on the task.

Chapter 8

Conclusions

In this thesis we considered two applications of modeling rumours in social media. The first application is rumour stance classification, where tweets around a rumour are classified regarding their stances towards rumour veracity. We refined the experimental settings of this problem and introduced models which incorporate features ignored in previous work. The second application we considered is rumour popularity prediction, which we defined as predicting the number of tweets in time intervals where a rumour has not been observed. We motivated this definition and used point processes for modeling rumour popularity. We showed how different features can be used within the point process framework, yielding improvements over the baselines. Below, we revisit the research questions we posed at the beginning of the thesis regarding the two applications, and review the contributions. Next, we provide ideas for future research.

8.1 Contributions

We review the contributions of this thesis by revisiting the research questions we listed in the Introduction chapter.

1. The first aim of this thesis was predicting the stance of tweets regarding rumours.

(a) What would be a realistic and fair evaluation framework for rumour stance classification?
We concluded that the evaluation in previous work was unsatisfactory, as it ignored important aspects of the task. Cross-validation was conducted over tweets coming from different times and rumours (Qazvinian et al., 2011), which


did not correspond to how a journalist would require the system to work. Instead, we proposed a more realistic framework, where we classify tweets from a target rumour in either the Leave One Out fashion (with no annotation of tweets from the target rumour) or the Leave Part Out fashion (where the annotation of several initial tweets from the target rumour is available). In this way we do not allow a situation where training tweets from a rumour occurred later than the test tweets, a situation which makes the evaluation unrealistic. We also considered both evaluation settings across rumour datasets, where at each iteration a separate rumour dataset is left out, as opposed to leaving out a separate rumour within the same dataset. This evaluation proved to be even more challenging. Moreover, we incorporated the questioning and the commenting labels into the problem, which were previously ignored, but are important for getting a better picture of the collective stance towards a rumour.

(b) Can information about stances of tweets from one rumour be useful for predicting stances of tweets from another rumour?
We showed that the key to obtaining good results is using reference rumours, i.e. other rumour annotations which are available during training. We showed how a choice of multi-task learning kernels within the Gaussian process framework (GP-ICM) yields improvements compared to GPs with a single-task kernel. We also demonstrated that GP-ICM was competitive with other approaches.

(c) Do tweet arrival times carry complementary information to text for the stance prediction task?
In this work, we introduced the continuous timestamp as a new feature for rumour stance classification. We achieved this using a point process framework, by modeling the arrivals of tweets coming from different categories over time. We viewed the problem in a sequence classification setting, allowing us to use information about neighbouring tweets' labels at prediction time. Leveraging information from both textual and temporal characteristics allowed the point process model to outperform strong baselines.

2. The second aim was predicting rumour popularity.

(a) How can the rumour popularity prediction problem be formulated?
Appreciating how difficult and subjective it is to define what a popular rumour is, we decided to avoid defining a popular rumour, and instead we defined the problem in a way that allows a journalist to decide for herself whether the future

rumour trajectory corresponds to an important rumour.1 In particular, we define the rumour popularity prediction problem as that of predicting counts of tweets in future time intervals. We also consider an even more fine grained version, where future tweet arrival times from a rumour are predicted. We frame this problem in a supervised learning setting, where a number of initial tweets from the target rumour are observed, as well as tweets from historical rumours. The motivation for this task stems from the fact that capturing rumour popularity early on during propagation would be of great help to journalists and officials, who need to deal with numerous rumours circulating in social media. We motivated point processes for the task, a probabilistic framework for modeling events (tweets) over continuous space (time). We used Gaussian processes (GP) within a log-Gaussian Cox Process (LGCP) framework, which enjoys the benefits of flexible kernel specification and efficient hyperparameter optimization.

(b) Can information about tweet arrivals from one rumour bring useful information for predicting the popularity of another rumour?
We showed that when making predictions about a rumour, data from other rumours brings useful information. In particular, modeling multiple rumours via multi-task learning kernels led to significantly better results than those obtained by the baselines.

(c) Does the text content of tweets convey useful information for the rumour popularity prediction task?
The temporal dynamics of how a rumour spreads is one feature we used to predict how a rumour propagates during intervals of time when the rumour was not observed. A different source of information about a rumour is text. Users express their opinions about a rumour, providing a signal we hypothesised might be useful for predicting rumour propagation. Indeed, we observed that kernels which employ text yielded significantly better results than those which do not. This shows that text conveys important information about how a rumour will spread (or has spread during unobserved periods of time).

(d) Does information about how text usage in tweets changes over time convey useful information for the rumour popularity prediction task?
In order to capture text change over time, we introduced a convolution kernel for comparing sequences of posts over continuous time. The kernel convolves posts from the two time series, comparing pairs of tweets in terms of both their textual content and the timestamps of their occurrences. We observed a significant improvement when using this kernel compared to other kernels which do not use as much information from the time series. This demonstrates that the change of tweet content over time conveys useful information for rumour popularity prediction.

1 We also consider a classification setting of the rumour popularity prediction problem in Section 6.6 to demonstrate the applicability of our convolution kernels to classification problems.

8.2 Future Work

Below we explore avenues for future research.

Rumour stance classification Findings from previous work, such as Castillo et al. (2013) and Procter et al. (2013b), suggested that the aggregate stance of individual tweets correlates with actual rumour veracity. Previous work experimented with employing the stance expressed in the reactions of individual tweets for predicting the actual veracity of the rumour in question (Liu et al., 2015). This raises the question whether veracity classification improves when rumour stance classification is conducted with the methods introduced in this thesis.

In our rumour stance classification experiments in Chapter 4 we found Gaussian processes to be a competitive approach. However, applying Gaussian processes to large datasets may be challenging due to their time complexity. Gaussian processes require inverting a kernel matrix K of size N × N (where N is the number of tweets), which takes O(N^3) time, a prohibitive operation for large N. Nevertheless, there exist approaches for scaling up Gaussian processes. One approach, called the inducing points technique, is based on choosing a subset of training examples and using them as a proxy for what is learnt by the model (Titsias, 2009); a sketch follows below. It could be used for running experiments on larger rumour datasets.

In our work we employed Brown clusters for representing the text of tweets. It is worth exploring alternative text representation approaches, e.g. using word embeddings in vector space (Mikolov et al., 2013). Previous work indicated that non-linear kernels outperform linear kernels on a range of text regression tasks (Beck, 2017). It may be useful to explore non-linear kernels for the rumour stance classification task, moving beyond the linear kernel.

Some future work has already been done on the rumour stance classification task. In collaborative work with the author, Zubiaga et al. (2016a) incorporated the Twitter conversation thread information within a Conditional Random Field (CRF) framework, and demonstrated on the PHEME rumour datasets how the model outperforms baselines. The task keeps attracting attention from the community, e.g. through the recently organized SemEval 2017 task on rumour stance classification (Derczynski et al., 2017).
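A hedged sketch of the inducing points idea using GPy, the library employed elsewhere in this thesis; the data here are random stand-ins, and a regression model is used purely to illustrate the sparse approximation:

```python
import numpy as np
import GPy

# With M << N inducing points the dominant cost drops from O(N^3)
# to roughly O(N * M^2) (Titsias, 2009).
X = np.random.rand(2000, 10)    # stand-in tweet feature vectors
Y = np.random.rand(2000, 1)     # stand-in targets
model = GPy.models.SparseGPRegression(X, Y, GPy.kern.RBF(10), num_inducing=50)
model.optimize()
```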

Modeling rumour dynamics In our experiments we employed the temporal dynamics of tweet arrivals, the text content of tweets, and rumour ids. This set of features can be extended to also incorporate network information. Users participate in rumour discussions to different degrees, and so modeling which users participate in a discussion about a rumour might lead to more accurate predictions about future rumour popularity. For example, finding that an active user participates in a discussion about a rumour might lead to a prediction about them tweeting later on about the same rumour.

In our experiments, we considered the initial two hours of a rumour lifespan, using the first hour of tweets for making predictions about the distribution of tweets in the second hour. However, different rumours may spread over different intervals of time, with some spanning hours, and others spanning days or weeks. An approach which would account for that by making use of all observed tweets from different rumours would be desirable. The challenge here is designing an appropriate way to compare timestamps across different rumours.

In our work we employed the Laplace approximation to deal with inference in the log-Gaussian Cox process. It is possible to apply fully Bayesian inference via Markov Chain Monte Carlo methods, which could help to better model the rumour dynamics by avoiding the caveats of approximations (Adams et al., 2009).

Our method is generalizable to problems other than modeling rumour popularity. One potential application could be advertisement campaigns. Before launching a campaign one could try to predict how successful it is going to be, and modify it accordingly. Before applying our models, the characteristics of advertisement campaigns would need to be investigated (e.g., does text change indicate how popular a campaign is going to be? Are there common patterns in how advertisement campaigns propagate?).

An interesting direction for extending the work could be to try other multi-task learning methods. In particular, we assumed a single lengthscale across all rumours during training. One could consider modeling different rumours with different lengthscales.

In Chapter 7 we showed how rumour dynamics brings helpful information to the rumour stance classification task. Rumour dynamics can also be helpful for other applications. For rumour veracity classification, the arrivals of tweets over time can indicate how likely a rumour is to be fake (i.e., some arrival patterns might correspond to Internet trolls posting). Moreover, rumour detection could also benefit from modeling rumour dynamics, as tweets from a single rumour might correspond to a group of tweets happening in a relatively narrow interval of time (e.g. a tweet which happens long after a previous tweet from a rumour is likely to be about a different topic).

Convolution kernels for comparing sequences of posts over time The developed convolution kernel can be applied to applications which can be framed as modeling outputs over time series inputs. Direct modeling of time series can be viewed as a special case of such an approach, in which the response variable is the future value of the time series. However, discriminative modeling allows application to other tasks where the output variable is less closely related to the time series values. For example, we may be interested in whether a particular stream of posts in social media corresponds to a disaster event, or whether the variation of CO2 across time in a given location indicates an alarming problem for the environment. In both instances the response variable is a classification output not directly related to the time series dynamics; however, there are likely to be characteristics of the time series inputs which can be exploited in modeling the response variable. Thus, whenever one deals with discriminative problems over time series and applies a kernelized approach, the introduced kernel can be used standalone or in conjunction with other kernels (summation, multiplication and the tensor product are some of the operations preserving Mercer's condition). Some of the promising problems that our approach might be evaluated on include event detection (e.g., is a set of Twitter posts a coherent event?), event classification (e.g., is a given event a disaster?), clustering news outlets (represented by journal articles over time), or classifying blogs (represented by blog posts over time).
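Since sums and products of Mercer kernels are again Mercer kernels, such combinations are one-liners in a kernelized toolkit; a toy GPy illustration (the dimensions and kernel choices are arbitrary stand-ins):

```python
import GPy

k_time = GPy.kern.RBF(input_dim=3)      # stand-in for a kernel over time
k_text = GPy.kern.Linear(input_dim=3)   # stand-in for a kernel over text
k_sum = k_time + k_text                 # valid kernel: sum
k_prod = k_time * k_text                # valid kernel: product
```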

Sequence classification of text over continuous time The developed sequence classification algorithm can be applied to problems other than rumour stance classification. Whenever one deals with streaming text in continuous time, the algorithm might prove useful. Thus, our approach is applicable to domains other than social media, as in most corpora timestamps are assigned to text (e.g., as a time of creation). Examples where our approach might be evaluated are the classification of posts in social media regarding hate speech (users might be influenced by what has been posted before), the classification of tweets regarding the named entities they refer to (as the context of the conversation might bring useful information), the classification of books regarding their genre (different genres might enjoy varying popularity over time), and the classification of journal articles with respect to their reliability (different posting patterns may indicate the reliability of the content).

Similarly to Yang and Zha (2013), we fixed the decay parameter ω. It would be beneficial to explore ways of optimizing ω. Optimization of the likelihood with respect to ω is non-convex, which makes it a challenging task.

8.3 Final Remarks

In this thesis we investigated two rumour applications: (1) rumour stance classification, where tweets around a rumour are classified regarding their stance towards the rumour veracity, and (2) rumour popularity prediction, where counts of tweets from unobserved periods of time are predicted. We introduced realistic settings for the two applications, and investigated probabilistic models incorporating different sources of information: text, continuous timestamps, and out of domain rumours, all of which we found to improve predictions. We also introduced novel machine learning techniques: convolution kernels for streams of text over continuous time, and a point process based sequence classification algorithm. Both of the developed methods are applicable beyond Twitter data. Time is typically neglected, but, as we have shown, it provides an important yet complex signal. Harnessing time effectively is an important challenge which could have widespread impact across text processing and related fields.

Appendices

Appendix A

Derivation of the Log Likelihood under the Hawkes Process Model

Recall that in Section 3.2.5 we described the Hawkes process model with the log-likelihood given by

\[
\ell(t, i, m, W|\beta, \mu, \gamma, \alpha, \omega) = \sum_{n=1}^{N} \sum_{v=1}^{V} W_{nv} \log \beta_{m_n,v} + \sum_{n=1}^{N} \log \lambda_{i_n,m_n}\big(t_n|H_{t_n^-}\big) - \sum_{i=1}^{|U|} \sum_{m=1}^{|\mathcal{R}|} \int_0^T \lambda_{i,m}\big(s|H_{s^-}\big)\, ds, \tag{A.1}
\]

\[
\lambda_{i_n,m_n}\big(t|H_{t^-}\big) = \mu_{i_n} \gamma_{m_n} + \sum_{\ell=1}^{n-1} \mathbb{I}(m_\ell = m_n)\, \alpha_{i_\ell,i_n}\, \kappa(t - t_\ell). \tag{A.2}
\]

Here we show how the log-likelihood from Equation (A.1) can be simplified by representing the integral term in a closed form.

\[
\sum_{i=1}^{|U|} \sum_{m=1}^{|\mathcal{R}|} \int_0^T \lambda_{i,m}\big(s|H_{s^-}\big)\, ds = \sum_{i=1}^{|U|} \sum_{m=1}^{|\mathcal{R}|} \int_0^T \left( \mu_i \gamma_m + \sum_{t_\ell < s} \mathbb{I}(m_\ell = m)\, \alpha_{i_\ell,i}\, \kappa(s - t_\ell) \right) ds
\]
\[
= T \sum_{i=1}^{|U|} \sum_{m=1}^{|\mathcal{R}|} \mu_i \gamma_m + \sum_{i=1}^{|U|} \sum_{\ell=1}^{N} \alpha_{i_\ell,i} \int_{t_\ell}^{T} \kappa(t - t_\ell)\, dt = T \sum_{i=1}^{|U|} \sum_{m=1}^{|\mathcal{R}|} \mu_i \gamma_m + \sum_{i=1}^{|U|} \sum_{\ell=1}^{N} \alpha_{i_\ell,i}\, K(T - t_\ell), \tag{A.3}
\]

using $\sum_{m} \mathbb{I}(m_\ell = m) = 1$ and swapping the sums and integrals, where $K(T - t_\ell)$ denotes the integrated kernel

\[
K(T - t_\ell) = \int_{t_\ell}^{T} \kappa(t - t_\ell)\, dt. \tag{A.4}
\]

For the exponential kernel $\kappa(x) = \omega \exp(-\omega x)$, with antiderivative $L(x) = -\exp(-\omega x)$, this evaluates to

\[
K(T - t_\ell) = L(T - t_\ell) - L(0) = -\exp(-\omega(T - t_\ell)) - \big({-\exp(-\omega \cdot 0)}\big) = 1 - \exp(-\omega(T - t_\ell)). \tag{A.5}
\]

Plugging the result from Equation (A.3) into Equation (A.1) we obtain the following form for the log likelihood under the Hawkes process model:

\[
\ell(t, i, m, W|\beta, \mu, \gamma, \alpha, \omega) = \sum_{n=1}^{N} \sum_{v=1}^{V} W_{nv} \log \beta_{m_n,v} + \sum_{n=1}^{N} \log \lambda_{i_n,m_n}\big(t_n|H_{t_n^-}\big) - T \sum_{i=1}^{|U|} \sum_{m=1}^{|\mathcal{R}|} \mu_i \gamma_m - \sum_{i=1}^{|U|} \sum_{\ell=1}^{N} \alpha_{i_\ell,i}\, K(T - t_\ell). \tag{A.6}
\]
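A quick numerical sanity check of the closed form integral above, assuming the exponential kernel $\kappa(x) = \omega \exp(-\omega x)$ (the values of ω, T and t_l below are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

omega, T, t_l = 0.1, 10.0, 3.0
numeric, _ = quad(lambda t: omega * np.exp(-omega * (t - t_l)), t_l, T)
closed_form = 1.0 - np.exp(-omega * (T - t_l))
assert np.isclose(numeric, closed_form)   # K(T - t_l) = 1 - exp(-omega (T - t_l))
```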

We use this result in Chapter 7 when developing the sequence classification methods based on Hawkes processes.

Bibliography

Adams, R. P., Murray, I., and MacKay, D. J. C. (2009). Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 9–16, New York, NY, USA. ACM.

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., and Passonneau, R. (2011). Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, LSM ’11, pages 30–38.

Allcott, H. and Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–36.

Allport, G. W. and Postman, L. (1947). The psychology of rumor. Journal of Clinical Psychology.

Álvarez, M. A., Rosasco, L., and Lawrence, N. D. (2012). Kernels for vector-valued functions: A review. Found. Trends Mach. Learn., 4(3):195–266.

Ando, S. and Suzuki, E. (2014). Discriminative learning on exemplary patterns of sequential numerical data. In ICDM, pages 1–10.

Augenstein, I., Rocktäschel, T., Vlachos, A., and Bontcheva, K. (2016). Stance detection with bidirectional conditional encoding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 876–885, Austin, Texas. Association for Computational Linguistics.

Bar-Joseph, Z., Gitter, A., and Simon, I. (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nature Reviews Genetics, 13(8):552–564.

Beck, D. (2017). Gaussian Processes for Text Regression. PhD thesis, Department of Computer Science, The University of Sheffield.


Beck, D., Cohn, T., and Specia, L. (2014). Joint emotion analysis via multi-task Gaussian processes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’14, pages 1798–1803.

Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281–305.

Bhaskaran, K., Gasparrini, A., Hajat, S., Smeeth, L., and Armstrong, B. (2013). Time series regression studies in environmental epidemiology. International Journal of Epidemiology.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Bitvai, Z. and Cohn, T. (2015a). Day trading profit maximization with multi-task learning and technical analysis. Machine Learning, 101(1-3):187–209.

Bitvai, Z. and Cohn, T. (2015b). Predicting peer-to-peer loan rates using Bayesian non-linear regression. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, pages 2203–2209.

Bonilla, E., Chai, K. M., and Williams, C. (2008). Multi-task Gaussian process prediction. In Platt, J., Koller, D., Singer, Y., and Roweis, S., editors, Advances in Neural Information Processing Systems 20, pages 153–160. MIT Press, Cambridge, MA.

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press, New York, NY, USA.

Brix, A. and Diggle, P. J. (2001). Spatiotemporal prediction for log-Gaussian Cox processes. Journal of the Royal Statistical Society Series B, 63(4):823–841.

Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J., and Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18:467–479.

Castillo, C., Mendoza, M., and Poblete, B. (2011). Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, WWW ’11, pages 675–684.

Castillo, C., Mendoza, M., and Poblete, B. (2013). Predicting information credibility in time-sensitive social media. Internet Research, 23(5):560–588.

Cohn, T. and Specia, L. (2013). Modelling annotator bias with multi-task Gaussian processes: An application to machine translation quality estimation. In 51st Annual Meeting of the Association for Computational Linguistics, ACL '13, pages 32–42.

Collins, M. and Duffy, N. (2001). Convolution kernels for natural language. In Advances in Neural Information Processing Systems 14, pages 625–632. MIT Press.

De, A., Valera, I., Ganguly, N., Bhattacharya, S., and Gomez-Rodriguez, M. (2015). Modeling opinion dynamics in diffusion networks. CoRR, abs/1506.05474.

Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Wong Sak Hoi, G., and Zubiaga, A. (2017). SemEval-2017 task 8: RumourEval: determining rumour veracity and support for rumours. ArXiv e-prints.

Derczynski, L., Bontcheva, K., Lukasik, M., Declerck, T., Scharl, A., Georgiev, G., Osenova, P., Lobo, T. P., Kolliakou, A., Stewart, R., et al. (2015). Pheme: computing veracity: the fourth challenge of big social data.

DiFonzo, N. and Bordia, P. (2007). Rumor, gossip and urban legends. Diogenes, 54(1):19–35.

Donovan, P. (2007). How idle is idle talk? one hundred years of rumor research. Diogenes, 54(1):59–82.

Du, N., Farajtabar, M., Ahmed, A., Smola, A. J., and Song, L. (2015). Dirichlet-Hawkes processes with applications to clustering continuous-time document streams. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 219–228, New York, NY, USA. ACM.

Dzogang, F., Lansdall-Welfare, T., and Cristianini, N. (2016). Seasonal fluctuations in collective mood revealed by Wikipedia searches and Twitter posts. In IEEE International Conference on Data Mining Workshops, ICDM Workshops 2016, December 12-15, 2016, Barcelona, Spain., pages 931–937.

Evgeniou, T. and Pontil, M. (2004). Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 109–117.

Ferreira, W. and Vlachos, A. (2016). Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association

for Computational Linguistics: Human Language Technologies, pages 1163–1168, San Diego, California. Association for Computational Linguistics.

Fornaciari, T., Celli, F., and Poesio, M. (2013). The effect of personality type on deceptive communication style. In Intelligence and Security Informatics Conference (EISIC), 2013 European, pages 1–6.

Friggeri, A., Adamic, L., Eckles, D., and Cheng, J. (2014). Rumor cascades. In International AAAI Conference on Weblogs and Social Media.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis. Chapman and Hall/CRC.

Gomez Rodriguez, M., Leskovec, J., and Krause, A. (2010). Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pages 1019–1028, New York, NY, USA. ACM.

Gomez-Rodriguez, M. and Schölkopf, B. (2012). Submodular inference of diffusion networks from multiple trees. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012.

Goode, L. (2009). Social news, citizen journalism and democracy. New Media & Society, 11(8):1287–1305.

Gorrell, G. and Bontcheva, K. (2016). Classifying Twitter favorites: like, bookmark, or thanks? Journal of the Association for Information Science and Technology, 67(1):17–25.

Guerin, B. and Miyazaki, Y. (2006). Analyzing rumors, gossip, and urban legends through their conversational properties. The Psychological Record, 56(1), Article 2.

Guo, W. and Diab, M. (2013). Improving lexical semantics for sentential semantics: modeling selectional preference and similar words in a latent variable model. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 739–745, Atlanta, Georgia. Association for Computational Linguistics.

Gupta, S., Sakamoto, K., and Ortony, A. (2013). Telling it like it isn't: A comprehensive approach to analyzing verbal deception. In F. Paglieri, L. Tummolini, R. Falcone & M.

Miceli (Eds.), The goals of cognition: Festschrift for Cristiano Castelfranchi. London, College Publications.

Gönen, M. and Alpaydın, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12:2211–2268.

Hamdi, S., Gancarski, A. L., Bouzeghoub, A., and Yahia, S. B. (2016). Tison: trust inference in trust-oriented social networks. ACM Transactions on Information Systems (TOIS), 34(3):17.

Hamidian, S. and Diab, M. T. (2015). Rumor detection and classification for Twitter data. In Proceedings of the Fifth International Conference on Social Media Technologies, Communication, and Informatics (SOTICS).

Hamidian, S. and Diab, M. T. (2016). Rumor identification and belief investigation on Twitter. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@NAACL-HLT 2016, June 16, 2016, San Diego, California, USA, pages 3–8.

Hasan, K. S. and Ng, V. (2013). Stance classification of ideological debates: data, models, features, and constraints. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1348–1356.

Hassan, N., Li, C., and Tremayne, M. (2015). Detecting check-worthy factual claims in presidential debates. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, pages 1835–1838, New York, NY, USA. ACM.

Hassan, N., Sultana, A., Wu, Y., Zhang, G., Li, C., Yang, J., and Yu, C. (2014). Data in, fact out: Automated monitoring of facts by FactWatcher. Proc. VLDB Endow., 7(13):1557–1560.

Haussler, D. (1999). Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz, Santa Cruz, CA, USA.

Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90.

Hawkes, A. G. and Oakes, D. (1974). A cluster process representation of a self-exciting process. Journal of Applied Probability, 11(3):493–503.

He, X., Rekatsinas, T., Foulds, J., Getoor, L., and Liu, Y. (2015). HawkesTopic: a joint model for network inference and topic modeling from text-based cascades. ICML.

Jia, S., Lansdall-Welfare, T., and Cristianini, N. (2016). Gender classification by deep learning on millions of weakly labelled images. In IEEE International Conference on Data Mining Workshops, ICDM Workshops 2016, December 12-15, 2016, Barcelona, Spain., pages 462–467.

Kadous, M. W. and Sammut, C. (2005). Classification of multivariate time series and structured data using constructive induction. Mach. Learn., 58(2-3):179–216.

Karimi, M. R., Tavakoli, E., Farajtabar, M., Song, L., and Gomez-Rodriguez, M. (2016). Smart broadcasting: do you want to be seen? In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1635–1644.

Karlova, N. and Fisher, K. (2012). Plz RT: A social diffusion model of misinformation and disinformation for understanding human information behaviour. In Proceedings of an International Conference on Information Seeking in Context, ISIC ’12.

Kempe, D., Kleinberg, J., and Tardos, É. (2003). Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pages 137–146, New York, NY, USA. ACM.

Keogh, E. J. and Pazzani, M. J. (2000). Scaling up dynamic time warping for datamining applications. In Proc. of 6th KDD, pages 285–289.

Kobayashi, R. and Lambiotte, R. (2016). TiDeH: time-dependent Hawkes process for predicting retweet dynamics. CoRR, abs/1603.09449.

Kwak, H., Lee, C., Park, H., and Moon, S. (2010). What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 591–600, New York, NY, USA. ACM.

Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001). Conditional Random Fields: probabilistic models for segmenting and labeling sequence data. In Proc. of International Conference on Machine Learning (ICML), pages 282–289.

Lampos, V., Aletras, N., Preotiuc-Pietro, D., and Cohn, T. (2014). Predicting and characterising user impact on Twitter. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL'14, pages 405–413.

Leskovec, J., Backstrom, L., and Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 497–506, New York, NY, USA. ACM.

Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. (2007). Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, pages 420–429, New York, NY, USA. ACM.

Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., and Cook, J. (2012). Misinformation and its correction: continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3):106–131.

Liang, P. (2005). Semi-Supervised Learning for Natural Language. PhD thesis, Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology.

Linderman, S. W. and Adams, R. P. (2014). Discovering latent network structure in point process data. In Proceedings of the 31st International Conference on Machine Learning, Cycle 2, volume 32 of JMLR Proceedings, pages 1413–1421. JMLR.org.

Liu, X., Nourbakhsh, A., Li, Q., Fang, R., and Shah, S. (2015). Real-time rumor debunking on Twitter. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, pages 1867–1870, New York, NY, USA. ACM.

Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., and Watkins, C. (2002). Text classification using string kernels. J. Mach. Learn. Res., 2:419–444.

Lukasik, M., Bontcheva, K., Cohn, T., Zubiaga, A., Liakata, M., and Procter, R. (2016a). Using Gaussian processes for rumour stance classification in social media. CoRR, abs/1609.01962.

Lukasik, M. and Cohn, T. (2016). Convolution kernels for discriminative learning from streaming text. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.

Lukasik, M., Cohn, T., and Bontcheva, K. (2015a). Classifying tweet level judgements of rumours in social media. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Lukasik, M., Cohn, T., and Bontcheva, K. (2015b). Point process modelling of rumour dynamics in social media. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 518–523.

Lukasik, M., Srijith, P. K., Cohn, T., and Bontcheva, K. (2015c). Modeling tweet arrival times using log-Gaussian Cox processes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 250–255.

Lukasik, M., Srijith, P. K., Vu, D., Bontcheva, K., Zubiaga, A., and Cohn, T. (2016b). Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 393–398.

Ma, J., Gao, W., Wei, Z., Lu, Y., and Wong, K.-F. (2015). Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, pages 1751–1754, New York, NY, USA. ACM.

Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

Mendoza, M., Poblete, B., and Castillo, C. (2010). Twitter under crisis: Can we trust what we RT? In 1st Workshop on Social Media Analytics, SOMA’10, pages 71–79.

Middleton, S. E. and Krivcovs, V. (2016). Geoparsing and geosemantics for social media: spatio-temporal grounding of content propagating rumours to support trust and veracity analysis during breaking news. ACM Transactions on Information Systems, 34(3):1–27.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Minka, T. and Lafferty, J. (2002). Expectation-propagation for the generative aspect model. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, UAI'02, pages 352–359.

Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., and Cherry, C. (2016). SemEval-2016 task 6: detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 31–41, San Diego, California. Association for Computational Linguistics.

Møller, J. and Syversveen, A. R. (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics, pages 451–482.

Nel, F., Lesot, M.-J., Capet, P., and Delavallade, T. (2010). Rumour detection in information warfare: Understanding publishing behaviours as a prerequisite. In NATO RTO-MP-IST- 091 Information Assurance and Cyber Defence.

Nickisch, H. and Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9:2035–2078.

Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66:574–583.

Norvig, P. (1992). Paradigms of Artificial Intelligence Programming: Case Studies in Common LISP. Artificial intelligence programming languages. Morgan Kaufman Publishers.

Ogata, Y. (2006). On Lewis' simulation method for point processes. IEEE Trans. Inf. Theor., 27(1):23–31.

Okazaki, N. (2007). CRFsuite: a fast implementation of Conditional Random Fields (CRFs).

Owoputi, O., Dyer, C., Gimpel, K., Schneider, N., and Smith, N. A. (2013). Improved part-of-speech tagging for online conversational text with word clusters. In Proceedings of NAACL, pages 380–390.

Papadopoulos, S., Bontcheva, K., Jaho, E., Lupu, M., and Castillo, C. (2016). Overview of the special issue on trust and veracity of information in social media. ACM Transactions on Information Systems (TOIS), 34(3):14.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: machine learning in python. The Journal of Machine Learning Research, 12:2825–2830.

Petrovic, S. (2012). Real-time event detection in massive streams. PhD thesis, The University of Edinburgh.

Preotiuc-Pietro, D. (2014). Temporal models of streaming social media data. PhD thesis, Department of Computer Science, The University of Sheffield.

Preotiuc-Pietro, D. and Cohn, T. (2013). A temporal model of text periodicities using Gaussian processes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '13, pages 977–988.

Preotiuc-Pietro, D., Lampos, V., and Aletras, N. (2015). An analysis of the user occupational class through Twitter content. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 1754–1764.

Procter, R., Crump, J., Karstedt, S., Voss, A., and Cantijoch, M. (2013a). Reading the riots: what were the police doing on Twitter? Policing and society, 23(4):413–436.

Procter, R., Vis, F., and Voss, A. (2013b). Reading the riots on Twitter: methodological innovation for the analysis of big data. International Journal of Social Research Methodology, 16(3):197–214.

Qazvinian, V., Rosengren, E., Radev, D. R., and Mei, Q. (2011). Rumor has it: Identifying misinformation in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 1589–1599.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. In Proc. of the IEEE, pages 257–286.

Rasmussen, C. E. and Williams, C. K. I. (2005). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.

Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Flammini, A., and Menczer, F. (2011). Detecting and tracking political abuse in social media. In 5th International AAAI Conference on Weblogs and Social Media, ICWSM'11.

Ritter, A., Clark, S., Mausam, and Etzioni, O. (2011). Named entity recognition in tweets: an experimental study. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 1524–1534, Stroudsburg, PA, USA. Association for Computational Linguistics.

Rogers, S. and Girolami, M. (2012). A First Course in Machine Learning. CRC Press.

Rosnow, R. L. (1991). The psychology of rumor. American Psychologist, 46(5):484–496.

Shah, K., Cohn, T., and Specia, L. (2013). An investigation on the effectiveness of features for translation quality estimation. In Machine Translation Summit XIV, pages 167–174, Nice, France.

Shahriari, B., Swersky, K., Wang, Z., Adams, R., and de Freitas, N. (2015). Taking the human out of the loop: a review of Bayesian optimization. Technical report, Universities of Harvard, Oxford, Toronto, and Google DeepMind.

Shibutani, T. (1969). Improvised news: A sociological study of rumor. Social Research, 36(1):169–171.

Shumway, R. H. and Stoffer, D. S. (2006). Time Series Analysis and Its Applications: with R Examples (Springer Texts in Statistics). Springer, 2nd edition.

Simma, A. and Jordan, M. I. (2010). Modeling events with cascades of Poisson processes. In UAI, pages 546–555.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Stroudsburg, PA. Association for Computational Linguistics.

Song, G., Ye, Y., Du, X., Huang, X., and Bie, S. (2014). Short text classification: a survey. Journal of Multimedia, 9(5).

Sridhar, D., Getoor, L., and Walker, M. (2014). Collective stance classification of posts in online debate forums. In ACL Joint Workshop on Social Dynamics and Personal Attributes in Social Media.

Srijith, P. K., Lukasik, M., Bontcheva, K., and Cohn, T. (2017). Longitudinal modeling of social media with Hawkes process based on users and networks. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

Sutton, C. and McCallum, A. (2012). An introduction to conditional random fields. Found. Trends Mach. Learn., 4(4):267–373.

Tausczik, Y. R. and Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1):24–54.

The GPy authors (2015). GPy: a Gaussian process framework in Python. http://github.com/SheffieldML/GPy.

Thorne, J. and Vlachos, A. (2017). An extensible framework for verification of numerical claims. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 37–40. Association for Computational Linguistics.

Titsias, M. K. (2009). Variational learning of inducing variables in sparse Gaussian processes. In Artificial Intelligence and Statistics 12, pages 567–574.

Tolmie, P., Procter, R., Rouncefield, M., Liakata, M., and Zubiaga, A. (2015). Microblog analysis as a programme of work. arXiv preprint arXiv:1511.03193.

Vanhatalo, J., Riihimäki, J., Hartikainen, J., Jylänki, P., Tolvanen, V., and Vehtari, A. (2013). GPstuff: Bayesian modeling with Gaussian processes. J. Mach. Learn. Res., 14(1):1175–1179.

Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., and Borgwardt, K. M. (2010). Graph kernels. J. Mach. Learn. Res., 11:1201–1242.

Vlachos, A. and Riedel, S. (2014). Fact checking: task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pages 18–22, Baltimore, MD, USA. Association for Computational Linguistics.

Vlachos, A. and Riedel, S. (2015). Identification and verification of simple claims about statistical properties. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2596–2601, Lisbon, Portugal. Association for Computational Linguistics.

Vosoughi, S. (2015). Automatic Detection and Verification of Rumors on Twitter. PhD thesis, Massachusetts Institute of Technology.

Wang, Y., Xie, B., Du, N., and Song, L. (2016). Isotonic Hawkes processes. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, pages 2226–2234. JMLR.org.

Wu, Y., Wu, G., and Chang, E. Y. (2005). Multi-view sequence-data representation and non-metric distance-function learning.

Xing, Z., Pei, J., and Keogh, E. (2010). A brief survey on sequence classification. SIGKDD Explor. Newsl., 12(1):40–48.

Xing, Z., Pei, J., Yu, P. S., and Wang, K. (2011). Extracting interpretable features for early classification on time series. In SDM, pages 247–258.

Yang, S.-H. and Zha, H. (2013). Mixture of mutually exciting processes for viral diffusion. In ICML (2), volume 28 of JMLR Proceedings, pages 1–9.

Ye, L. and Keogh, E. (2011). Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Mining and Knowledge Discovery, 22(1-2):149–182.

Zammit-Mangion, A., Dewar, M., Kadirkamanathan, V., and Sanguinetti, G. (2012). Point process modelling of the Afghan War Diary. Proceedings of the National Academy of Sciences of the United States of America, 109(31):12414–12419.

Zarezade, A., Upadhyay, U., Rabiee, H. R., and Gomez-Rodriguez, M. (2016). RedQueen: an online algorithm for smart broadcasting in social networks. CoRR, abs/1610.05773.

Zeng, L., Starbird, K., and Spiro, E. S. (2016a). Rumors at the speed of light? modeling the rate of rumor transmission during crisis. In 2016 49th Hawaii International Conference on System Sciences (HICSS), pages 1969–1978.

Zeng, L., Starbird, K., and Spiro, E. S. (2016b). #Unconfirmed: classifying rumor stance in crisis-related social media messages. In Tenth International AAAI Conference on Web and Social Media.

Zhao, Q., Erdogdu, M. A., He, H. Y., Rajaraman, A., and Leskovec, J. (2015a). SEISMIC: a self-exciting point process model for predicting tweet popularity. CoRR, abs/1506.02594.

Zhao, Z., Resnick, P., and Mei, Q. (2015b). Early detection of rumors in social media from enquiry posts. In Proceedings of the 24th International Conference on World Wide Web (WWW).

Zhou, D. X., Resnick, P., and Mei, Q. (2011). Classifying the political leaning of news articles and users from user votes. In ICWSM, pages 417–424.

Zhou, K., Zha, H., and Song, L. (2013a). Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In AISTATS, volume 31 of JMLR Workshop and Conference Proceedings, pages 641–649. JMLR.org.

Zhou, K., Zha, H., and Song, L. (2013b). Learning triggering kernels for multi-dimensional Hawkes processes. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML’13, pages III-1301–III-1309. JMLR.org.

Zubiaga, A., Aker, A., Bontcheva, K., Liakata, M., and Procter, R. (2017). Detection and resolution of rumours in social media: a survey. arXiv preprint arXiv:1704.00656.

Zubiaga, A., Kochkina, E., Liakata, M., Procter, R., and Lukasik, M. (2016a). Stance classification in rumours as a sequential task exploiting the tree structure of social media conversations. In Proceedings of the 26th International Conference on Computational Linguistics (COLING), pages 2438–2448.

Zubiaga, A., Liakata, M., and Procter, R. (2016b). Learning reporting dynamics during breaking news for rumour detection in social media. CoRR, abs/1610.07363.

Zubiaga, A., Liakata, M., Procter, R., Bontcheva, K., and Tolmie, P. (2015a). Towards detecting rumours in social media. In AAAI Workshop on AI for Cities.

Zubiaga, A., Liakata, M., Procter, R., Wong Sak Hoi, G., and Tolmie, P. (2016c). Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS ONE, 11(3):1–29.

Zubiaga, A., Tolmie, P., Liakata, M., and Procter, R. (2015b). D2.4 qualitative analysis of rumours, sources, and diffusers across media and languages. Technical report, University of Warwick.