NORTHEASTERN UNIVERSITY

Modeling Text Embedded Information Cascades

by

Shaobin Xu

A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of

Doctor of Philosophy

in

Computer Science

December, 2019

Abstract

Networks mediate several aspects of society. For example, social networking services (SNS) like Twitter and Facebook have greatly helped people connect with families, friends and the outside world. Public policy diffuses over institutional and social networks that connect political actors in different areas. Inferring network structure is thus essential for understanding the transmission of ideas and information, which in turn could answer questions about communities, collective actions, and influential social participants. Since many networks are not directly observed, we often rely on indirect evidence, such as the timing of messages between participants, to infer latent connections. The textual content of messages, especially the reuse of text originating elsewhere, is one source of such evidence. This thesis contributes techniques for detecting the evidence of text reuse and modeling underlying network structure. We propose methods to model text reuse with accidental and intentional lexical and semantic mutations. For lexical similarity detection, an n-gram shingling algorithm is proposed to detect “locally” reused passages, instead of near-duplicate documents, embedded within the larger text output of network nodes. For semantic similarity, we use an attention-based neural network to also detect embedded reused texts. When modeling network structure, we are interested in inferring different levels of detail: individual links between participants, the structure of a specific information cascade, or global network properties. We propose a contrastive training objective for conditional models of edges in information cascades that has the flexibility to answer those questions and is also capable of incorporating rich node and edge features. Last but not least, network embedding methods prove to be a good way to learn representations of nodes while preserving structure, node and edge properties, and side information. We propose a self-attention Transformer-based neural network, trained to predict the next activated node in a given cascade, to learn node embeddings.

First Reader: David Smith
Second Reader: Tina Eliassi-Rad
Tertiary Reader: Byron Wallace
External Reader: Bruce Desmarais

Acknowledgment

The journey to become a PhD can be daunting, frustrating, and yet wonderful. I owe so many thanks to the many great people who helped me sail through the unforgettable six years as a PhD student. First and foremost, I would like to thank David, my advisor, for bringing me to the US and giving me the opportunity to work with him on many interesting NLP topics. I am deeply grateful for his patience and guidance throughout this time. His passion and knowledge about research have greatly educated and shaped me. This thesis could not have been done without his constant advice and support. His unique perspective on many matters has also made a great impact on my life. I want to thank Tina, Byron, and Bruce for taking the time to serve on my thesis committee. Your comments and advice have greatly helped me make this thesis complete. I am so very appreciative of Byron’s detailed suggestions to revise the final draft of this thesis. I thank Professor Ryan Cordell, with whom I collaborated on part of the results related to this thesis. I am inspired by your passion for uncovering the 19th-century newspaper reprinting network and I am honored to be part of the team. I want to thank everyone from our lab – Liwen, Rui, Ansel, and Ryan. It is a privilege to have your company. Without any of you making the lab full of joy and energy, my life would have been miserable. I thank Rui for many deep discussions on work and life to keep my mind clear. I am truly grateful for all the brainstorms and discussions with Ansel in my final two and a half years to advance my research, as well as with many aspects of life in the US that I would never have known otherwise. I thank my friends outside of my lab, Bingyu, Yupeng, Chin and Bochao. Your constant help has made my life so much easier when I already had so much weight on my shoulders for the PhD. You helped to fill a lot of void during this lonely journey. Finally, I thank my mom, Xiulin, who inspires me, encourages me and loves me unconditionally and endlessly. I would not have come to the other side of the world, been able to face all the unknowns and stood on my feet, had I not had her support.

To my family and my loving friends.

Contents

Abstract

Acknowledgment

Contents

List of Figures

List of Tables

1 Introduction
1.1 Detecting Text Reuse
1.2 Network Inference from Information Cascades
1.3 Node Representation Learning
1.4 Overview of The Thesis

2 Text reuse in social networks
2.1 Local Text Reuse Detection
2.2 Efficient N-gram Indexing
2.3 Extracting and Ranking Candidate Pairs
2.4 Computing Local Alignments
2.5 Intrinsic Evaluation
2.6 Extrinsic Evaluation
2.7 Network Connections of 19c Reprints
2.7.1 Dataset description
2.7.2 Experiment
2.8 Congressional Statements
2.8.1 Dataset description
2.8.2 Experiment
2.9 Conclusion

3 Semantic Text Reuse in Social Networks
3.1 Classifying text reuse as paraphrase or entailment
3.2 Method Overview
3.3 Word Representations
3.4 Contextualized Sentence Representation
3.5 Attention
3.6 Final output
3.7 Objective function
3.8 Experiments
3.8.1 Datasets
3.8.2 Models
3.8.3 Experiment settings
3.8.4 Document level evaluation
3.8.5 Sentence level evaluation
3.8.6 Ablation Test
3.9 Conclusion

4 Modeling information cascades with rich feature sets
4.1 Network Structure Inference
4.2 Log-linear Directed Spanning Tree Model
4.3 Likelihood of a cascade
4.4 Maximizing Likelihood
4.5 Matrix-Tree Theorem and Laplacian Matrix
4.6 Gradient
4.7 ICWSM 2011 Webpost Dataset
4.7.1 Dataset description
4.7.2 Feature sets
4.7.3 Result of unsupervised learning at cascade level
4.7.4 Result of unsupervised learning at network level
4.7.5 Enforcing tree structure on the data
4.7.6 Result of supervised learning at cascade level
4.8 State Policy Adoption Dataset
4.8.1 Dataset description
4.8.2 Effect of proximity of states
4.9 Conclusion

5 Modeling information cascades using self-attention neural networks
5.1 Node representation learning
5.2 Information cascades as DAGs
5.3 Graph self-attention network
5.3.1 Analogy to language modeling
5.3.2 Graph self-attention layer
5.3.3 Graph self-attention network
5.3.4 Senders and receivers
5.3.5 Hard attention
5.3.6 Edge prediction
5.4 Experiments
5.4.1 Datasets
5.4.2 Baselines
5.4.3 Experimental settings
5.4.4 Node prediction
5.4.5 Edge prediction
5.4.6 Effect of texts as side information

6 Conclusion
6.1 Future Work
6.1.1 Text Reuse
6.1.2 Network Structure Inference

Bibliography

List of Figures

2.1 Average precision for aligned passages of different minimum length in characters. Vertical red lines indicate the performance of different parameter settings (see Table 2.1).
2.2 (Pseudo-)Recall for aligned passages of different minimum lengths in characters.
2.3 Newspaper issues mentioning “associated press” by year, from the Chronicling America corpus. The black regression line fits the raw number of issues; the red line fits counts corrected for the number of times the Associated Press is mentioned in each issue.
2.4 Reprints of John Brown’s 1859 speech at his sentencing. Counties are shaded with historical population data, where available. Even taking population differences into account, few newspapers in the South printed the abolitionist’s statement.

3.1 The overview of the structure of the Attention Based Convolutional Network (ABCN)
3.2 Unrolled Vanilla Recurrent Neural Network
3.3 An illustration of the ConvNet structure used in ABCN with a toy example with word embeddings in R^5, kernel size 2, and feature map size 3. It yields a representation for the sentence in R^3

4.1 Recall, precision, and average precision of InfoPath and DST on predicting the time-varying networks generated per day. The DST model is trained unsupervisedly on separate cascades using basic and enhanced features. The upper row uses graph-structured cascades from the ICWSM 2011 dataset. The lower row uses the subset of cascades with tree structures.

5.1 The illustration of a cascade structure as a DAG in a toy network.
5.2 Graph Self-Attention Network architecture, with L identical multi-headed self-attention layers.
5.3 Graph Sender-Receiver attention network architecture, with L identical multi-headed attention layers for senders and receivers respectively.
5.4 Graph hard self-attention network architecture, with L − 1 identical multi-headed attention layers and the last layer replaced with a reinforcement learning agent selecting mask actions.
5.5 Modified Graph Self-Attention Networks with the last layer replaced with a single-head attention sublayer to output edge predictions.

List of Tables

2.1 Parameters for text reuse detection
2.2 Correlations between shared reprints between 19c newspapers and political and other affinities. While many Whig papers became Republican, they do not completely overlap in our dataset; the identical number of pairs is coincidental.
2.3 Correlations between log length of aligned text and other author networks in public statements by Members of Congress. *p < .05, **p < .01, ***p < .001

3.1 Document level results (macro-averaged recall/precision/F1) of ABCN in comparison with baseline methods under both the whole source document vs. target sentences and source sentences vs. target sentences settings.
3.2 Sentence level results (macro-averaged recall/precision/F1) of ABCN in comparison with baseline methods under both whole source document vs. target sentences and source sentences vs. target sentences, evaluated on all target documents.
3.3 Sentence level results (macro-averaged recall/precision/F1) of ABCN in comparison with baseline methods under both whole source document vs. target sentences and source sentences vs. target sentences, evaluated on only matched documents.
3.4 Ablation test where the Bi-LSTM layer is removed to test the efficacy of contextualized sentence representation, on both document and sentence level datasets, in comparison to the proposed model structure.

4.1 Cascade-level inference of DST with different feature sets, in an unsupervised learning setting, in comparison with a naive attach-everything-to-earliest baseline for original cascades extracted from the ICWSM 2011 dataset.
4.2 Cascade-level inference of DST with different feature sets, in an unsupervised learning setting, in comparison with a naive attach-everything-to-earliest baseline for tree-structure-enforced cascades extracted from the ICWSM 2011 dataset.
4.3 Cascade-level inference of DST with different feature sets, in a supervised learning setting for merged cascades for tree-structure-enforced cascades extracted from the ICWSM 2011 dataset.
4.4 Comparison of MultiTree, InfoPath and DST on inferring a static network on the original ICWSM 2011 dataset. The DST model is trained and tested unsupervisedly on both separate cascades and merged cascades using different feature sets and the naive attach-everything-to-earliest-node baseline.
4.5 Comparison of MultiTree, InfoPath and DST on inferring a static network on the modified ICWSM 2011 dataset with enforced tree structure. The DST model is trained and tested unsupervisedly on both separate cascades and merged cascades using different feature sets and the naive attach-everything-to-earliest-node baseline.
4.6 Logistic regression of networks inferred by DST and InfoPath on independent networks: geographical distance between states and contiguity of states. The statistical significance is at the < 0.05 level according to the QAP p-value against the indicated null hypothesis.

5.1 Statistics of datasets. Memes (Ver. 1) is from Wang et al. (2017a).
5.2 Comparison of variants of GSAN and baseline models on the Digg and Memes (Ver. 1) datasets. The results of baseline models are from Wang et al. (2017a).
5.3 Accuracy of GSAN variants on the Digg and Memes (Ver. 1) datasets. Accuracy is listed as a percentage.
5.4 Comparing GSAN variants versus naive baselines and DST on edge prediction on cascade-level structures using macro-averaged recall, precision and F1 score. The lengths of cascades in Memes (Ver. 2) are restricted to between 2 and 30. The right part of the table restricts the lengths to be between 5 and 30.
5.5 Comparison between GSAN variants and their corresponding models with additional nodal text features.

Chapter 1

Introduction

Networks mediate several aspects of society, and studies of networks abound because of what a better understanding of them makes possible. For example, research on social network services such as Twitter and Facebook helps understand communities or influential groups, promote viral marketing, reveal connections among social and political movements, and so on. Apart from observing the links between different social network participants, we can also obtain much side information, e.g., the time when the interaction took place, the content or label that each participant contributes, or the attributes of the links. This side information provides indirect evidence for social ties. Text reuse by different participants in this case becomes one of many revealing forms of shared behavior. For example, political speech—be it on television, on the floor of the legislature, in printed quotations, or on politicians’ websites and social media feeds—uses common tropes and turns of phrase that groups of politicians use to describe an issue (Grimmer and Stewart, 2013). One might even discover a list of “talking points” underlying this common behavior.1 One might be reading the literature in a scientific field and find that a paper starts getting cited or paraphrased repeatedly. Which previous paper, or papers, introduced the new technique? Or perhaps one reads several news stories about a new product from some company and then finds that they all share text with a press release put out by the company. Methods to uncover invisible links among sources of text have broad applicability because of the very general nature of the problem—sources of text include websites, newspapers, individuals, corporations, political parties, and so on. Further, discerning those hidden links between sources can provide more effective ways of identifying the provenance and diverse sources of information, and of building predictive models of the diffusion of information. There are substantial challenges, however, in building methodologies to uncover text reuse and model sharing behavior. In this introductory chapter, we will discuss some of the challenges that we face and how we aim to address those issues. In particular, we will consider how to detect text reuse, both lexically and semantically, in §1.1. Then, we will see how to model information sharing, using text as an example, in §1.2 by observing information cascades. Each cascade comprises a number of social actors receiving (being activated by) a related piece of information in a sequential fashion. Finally, projecting the representations of nodes in a network into a continuous dense space while preserving first- or second-order proximity can be useful for many tasks, such as node classification, node clustering, link prediction, etc. In §1.3 we will see that node representations can also be learned from information cascades.

1 The U.S. State Department, for example, produced a much-discussed set of talking points memos in response to the 2012 attack in Benghazi.

1.1 Detecting Text Reuse

As mentioned above, several situations can give rise to text reuse. For example, Linder et al. (2018) show that bills introduced by ideologically similar sponsors exhibit a high degree of text reuse, that bills classified by the National Conference of State Legislatures as covering the same policies exhibit a high degree of text reuse, and that rates of text reuse between states correlate with policy diffusion network ties between states. Such reuse is usually observable at the lexical level and can be detected by an alignment algorithm, such as the Smith-Waterman algorithm (Smith et al., 1981). The following pair shows one of the sample alignments from Linder et al. (2018) for state legislative bills:

nj_214_A1167: “the entire credit may not be taken for the taxable year in which the renewable energy property is placed in service but must be taken in five equal installments beginning with the taxable year in which the” nc_2009_SB305: “the entire credit may not be taken for the taxable year in which the costs are paid---- but must be taken in five equal installments beginning with the taxable year in which the”

The words in red indicate mismatches, while those in green are gaps introduced by the alignment algorithm to indicate insertions and deletions. Non-highlighted passages are matches. There are, however, several issues making the study of text reuse challenging, including: scalable detection of reused passages; identification of appropriate statistical models of text mutation; inference methods for characterizing missing nodes that originate or mediate text transmission; link inference conditioned on textual topics; and the development of testbed datasets through which predictions of the resulting models might be validated against some broader understanding of the processes of transmission. We propose an n-gram shingling algorithm to detect “locally” reused passages, instead of near-duplicate documents, embedded within the larger text output of social network nodes. We then explore the correlation of links revealed by text sharing behavior with various types of social ties in different datasets. An n-gram shingling algorithm detects text reuse from surface lexical overlap, whereas the semantic content of passages, along with their surrounding context, is often more revealing of the path over which the information propagates. For example, information may accumulate editorial changes, typos, and other mutations as it is passed around. The reuse of the same mutation, rather than the original text, could be an indicator of a link between those network participants. An algorithm based on lexical analysis is not the best choice under this circumstance: if we require exact matches, we miss many repeated passages, whereas if we lower the proportion of overlapping text that counts as a match, we generate more noise. In such cases, an approach that can detect semantic reuse is more suitable. We propose to extract local texts that are semantically similar given a pair of texts. Lexical similarity is just one of two forms of text reuse in social networks. The other form is semantic similarity, which includes paraphrase and textual entailment. This usually happens in academic research, where authors cite other authors’ contributions by rephrasing them; in news media, where one source cites a press release or another outlet with semantically similar sentences; and in online social networks, where people often describe reported incidents in their own words. The following snippets are from a press release about a study on obesity and a news article that cites the press release:

Press release: “The students then received instruction on the causes and treatments of obesity, with follow up testing on their knowledge and attitudes toward obesity for every year of medical school. Those who completed the program significantly reduced their bias by an average of 7 percent.”

News article: “The study, recently published online in The Journal of the American Osteopathic Association, found that when medical students are taught a specific curriculum aimed at better understanding the causes behind, and treatments for, obesity, the students’ innate obesity prejudice dropped by an average of 7 percent.”

In this example we can observe the news article paraphrasing the press release. This problem can be more complicated than lexical text reuse, in that we cannot necessarily use a naive alignment algorithm for detection. In the example given above, the sentence is reordered and words are changed (e.g., “received instruction” → “are taught”, and “bias” → “prejudice”). If only pairs of potentially similar texts are given, many efforts have been made to better predict the similarity score (Dai et al., 2018; Pang et al., 2017; Devlin et al., 2018). However, what are observed in social networks are usually documents, or passages with unknown boundaries, instead of mere sentences. With irrelevant surrounding contexts in these observed documents, those models fail to make a correct prediction. We propose an attention-based convolutional network that uses BERT (Devlin et al., 2018), the state-of-the-art model on text similarity classification tasks, to provide contextualized representations of words. We then use a convolutional neural network (CNN) to infer a fixed-length representation for sentences of varying length, given the good performance of CNN-based models on text similarity classification tasks (Hu et al., 2014; Pang et al., 2016; Dai et al., 2018), followed by a bidirectional Long Short-Term Memory (LSTM: Hochreiter and Schmidhuber, 1997) unit to capture contextualized sentence representations. The proposed model also uses an attention mechanism (Bahdanau et al., 2015), which guides the model to “look at” similar sentences among irrelevant contexts. We compare this model with a pre-trained and a fine-tuned BERT model on the task of recovering citations between scientific papers in the ACL Anthology Corpus.
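To make the kind of pipeline just described more concrete, the following is a simplified sketch in PyTorch: contextualized word embeddings (assumed precomputed, e.g., by BERT) are convolved and max-pooled into fixed-length sentence vectors, a BiLSTM contextualizes the sentences within their document, and an attention layer scores each target sentence against the source sentences. The class name, layer sizes, and scoring head are illustrative; this is not the exact ABCN configuration of Chapter 3.

```python
import torch
import torch.nn as nn

class SentenceReuseScorer(nn.Module):
    """Illustrative sketch (not the exact ABCN of Chapter 3): contextual word
    embeddings -> CNN sentence vectors -> BiLSTM over sentences -> attention."""

    def __init__(self, d_word=768, d_sent=256, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(d_word, d_sent, kernel_size=kernel, padding=kernel // 2)
        self.lstm = nn.LSTM(d_sent, d_sent // 2, bidirectional=True, batch_first=True)

    def encode(self, word_embs):
        # word_embs: (num_sents, max_words, d_word), e.g., BERT outputs per sentence.
        h = torch.relu(self.conv(word_embs.transpose(1, 2)))  # convolve over word positions
        sents = h.max(dim=2).values                           # one fixed-length vector per sentence
        ctx, _ = self.lstm(sents.unsqueeze(0))                # contextualize across the document
        return ctx.squeeze(0)                                 # (num_sents, d_sent)

    def forward(self, src_word_embs, tgt_word_embs):
        src, tgt = self.encode(src_word_embs), self.encode(tgt_word_embs)
        # Each target sentence attends over the source sentences; a sharply peaked
        # attention row suggests that target sentence reuses one source sentence.
        scores = tgt @ src.T / src.size(-1) ** 0.5
        return torch.softmax(scores, dim=-1), scores
```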

1.2 Network Inference from Information Cascades

Observing the behavior of text reuse takes us only part way towards modeling the underlying networks. We might be interested in network structures at differing levels of detail. We might be interested in individual links, e.g., knowing which previous paper it was that later papers were mining for further citations; in cascades, e.g., knowing which news stories are copying from which press releases or from each other; and in networks, e.g., knowing which politicians are most likely to share talking points or which newspapers are most likely to publish press releases from particular businesses or universities. Depending on our data source, some of these structures could be directly observed. With the right API calls, we might observe retweets (links), chains of retweets (cascades), and follower relations (networks) on Twitter. We might also be interested in inferring an underlying social network for which the Twitter follower relation is partial evidence. In contrast, politicians interviewed on television do not explicitly cite the sources of their talking points, which must be inferred. Documenting an information diffusion process often reduces to keeping track of when nodes (newspapers, bills, people, etc.) mention a piece of information, reuse a text, get infected, or exhibit a contagion in a general sense. When the structure of the propagation of a contagion is hidden and we cannot tell which node infected which, all we have is the result of the diffusion process—that is, the timestamp and possibly other information related to when the nodes get infected. We want to infer the diffusion process itself by using such information to predict the links in an underlying network. There have been increasing efforts to uncover and model different types of information cascades on networks (Brugere et al., 2016): modeling hidden networks from observed infections (Stack et al., 2012; Rodriguez et al., 2014), modeling topic diffusion in networks (Gui et al., 2014), predicting social influence on individual mobility (Mastrandrea et al., 2015) and so on. This prior work all focused on using parametric models of the time differences between infections. Such models are useful when the only information we can get from the result of the diffusion process is the timestamps of infections. We can hope to make better predictions, however, with access to additional features, such as the location of each node, the similarity between the messages received by two nodes, etc. Popular parametric models cannot incorporate these features into unsupervised training. We propose an edge-factored, conditional log-linear directed spanning tree (DST) model with an unsupervised, contrastive training procedure to infer the link structure of information cascades. The advantage of conditioning the model on the observed sequence of nodes in a cascade is that we can easily include various features, including those reflecting the reuse of texts, in the model.
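To give a concrete sense of what an edge-factored spanning-tree model computes, the sketch below scores every candidate parent–child edge among a cascade's nodes and normalizes over all directed spanning trees (arborescences) with the directed Matrix-Tree (Tutte) theorem, the same device Chapter 4 uses to keep the likelihood tractable. This is a minimal illustration with hypothetical function names; the feature-based edge scores and the contrastive training objective are not shown.

```python
import numpy as np

def spanning_tree_log_partition(edge_scores, root=0):
    """Log of the total weight of all arborescences rooted at `root`, where
    exp(edge_scores[i, j]) is the weight of edge i -> j.  By the directed
    Matrix-Tree theorem, delete the root's row and column from the Laplacian
    and take the (log-)determinant."""
    w = np.exp(edge_scores)
    np.fill_diagonal(w, 0.0)
    laplacian = -w.copy()
    np.fill_diagonal(laplacian, w.sum(axis=0))   # column sums: weight entering each node
    keep = [i for i in range(w.shape[0]) if i != root]
    _, logdet = np.linalg.slogdet(laplacian[np.ix_(keep, keep)])
    return logdet

def cascade_log_likelihood(edge_scores, parents, root=0):
    """Log-probability of one candidate tree over a cascade's nodes, given as
    parents[child] = parent, under an edge-factored log-linear model."""
    score = sum(edge_scores[p, c] for c, p in parents.items())
    return score - spanning_tree_log_partition(edge_scores, root)

# Toy usage: 4 activated nodes, with node 0 acting as the (virtual) root.
scores = np.random.randn(4, 4)
print(cascade_log_likelihood(scores, parents={1: 0, 2: 1, 3: 1}))
```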

1.3 Node Representation Learning

The modeling assumption of a directed spanning tree for cascade structure poses a few problems. In many scenarios, for example, a node is more realistically affected by multiple previously activated nodes, which violates the tree constraint on cascade structure. Therefore, our DST model might fail to capture the many-to-one relationships of “complex contagion”. A directed acyclic graph (DAG) is more suitable for capturing such relationships unfolding in time. A DAG's prohibition on directed cycles ensures that, within a given cascade, a piece of information cannot flow back to itself. However, according to Cui et al. (2018), representing the network using the traditional graph structure has several issues: high computational complexity, low parallelizability, inapplicability of machine learning methods such as deep neural networks, etc. We therefore do not intend to directly model the DAG structure, but find an alternative model to build in the assumption of the cascade structure being a DAG. Recently, network embeddings have become popular: instead of learning the edges of the underlying network, this method embeds nodes into a continuous latent space that tries to preserve structure, node and edge properties, and side information such as texts. It also enables applications to several tasks such as node classification, node clustering, link prediction and so on. However, many of the efforts are on learning the node representations from fully-observed networks as a whole, such as DeepWalk (Perozzi et al., 2014), Node2Vec (Grover and Leskovec, 2016), and GraphSAGE (Hamilton et al., 2017). Among the relatively fewer network embedding methods utilizing information cascades are Embedded-IC (Bourigault et al., 2016), DeepCas (Li et al., 2017) and Topo-LSTM (Wang et al., 2017a). Many of these models of information cascades require knowledge of the underlying network structure or the transmission probabilities between nodes, which can be expensive or impractical to obtain for certain social networks. Learning to predict the next active nodes in a cascade could be of interest to several applications. For example, in public policy diffusion networks, having a model that predicts links between states might help qualitative analysis and visualization; however, predicting the next state that might adopt a certain policy could prove to be a more valuable result, along with the fact that we can actually assess the effectiveness of such a model. To take another example, in viral marketing, the bidders on advertisements are usually more interested in knowing which target groups are more likely to get the information next, instead of how the information is passed. We therefore introduce a network embedding model that uses the observed relative order of nodes in a cascade to guide the training process, which is unsupervised with regard to the underlying network structure. Our model also builds in the assumption that the cascade structure is a DAG, instead of a directed spanning tree. Finally, by the nature of the model, it is easy to include different nodal side information, such as texts, in the learning and inference stages.
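To make the training signal concrete, here is a minimal PyTorch sketch of predicting the next activated node from a cascade prefix with masked self-attention. It only illustrates the language-modeling analogy; the actual model of Chapter 5 adds DAG-structured attention masks, sender/receiver variants, and nodal text features, and the hyperparameters below are placeholders.

```python
import torch
import torch.nn as nn

class NextNodePredictor(nn.Module):
    """Sketch: masked self-attention over a cascade prefix predicts the next node."""

    def __init__(self, num_nodes, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, num_nodes)

    def forward(self, cascade):
        # cascade: (batch, seq_len) node ids in order of activation.
        seq_len = cascade.size(1)
        positions = torch.arange(seq_len, device=cascade.device)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.encoder(self.embed(cascade) + self.pos(positions), mask=causal)
        return self.out(hidden)  # logits over the next node at every position

# Training signal: predict position t+1 from positions <= t (cross-entropy).
model = NextNodePredictor(num_nodes=1000)
cascades = torch.randint(0, 1000, (8, 12))
logits = model(cascades[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), cascades[:, 1:].reshape(-1))
```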

1.4 Overview of The Thesis

In Chapter 2, we elaborate on detecting local lexical text reuse in social networks in an efficient way. We perform an intrinsic and extrinsic evaluation of the methods.

In Chapter 3, we show how to use an attention-based neural network to select similar sentences from irrelevant surrounding contexts and evaluate the model on the ACL Anthology Corpus.

Chapter 4 shows how we model the structure of a cascade as a directed spanning tree and the effectiveness of including side information.

Chapter 5 introduces a self-attention network embedding model based on Transformers to learn node representations. We use several real-life social networks for evaluation.

Finally, we summarize the modeling contributions of this thesis in the concluding Chapter 6.

Chapter 2

Text reuse in social networks

As noted in the introduction, we are interested in detecting passages of text reuse (poems, stories, political talking points) that comprise a small fraction of the containing documents (newspaper issues, political speeches). Using the terminology of biological sequence alignment, we are interested in local alignments between documents. In text reuse detection research, two primary methods are n-gram shingling and locality-sensitive hashing (LSH) (Henzinger, 2006). The need for local alignments makes LSH less practical without performing a large number of sliding-window matches. In contrast to work on near-duplicate document detection and to work on “meme tracking” that takes text between quotation marks as the unit of reuse (Leskovec et al., 2009; Suen et al., 2013), here the boundaries of the reused passages are not known. Also in contrast to work on the contemporary news cycle and blogosphere, we are interested both in texts that are reprinted within a few days and after many years. We thus cannot exclude potentially matching documents for being far removed in time. Text reuse that occurs only among documents from the same “source” (run of newspapers; Member of Congress) should be excluded. Similarly, Henzinger (2006) notes that many of the errors in near-duplicate webpage detection arose from false matches among documents from the same website that shared boilerplate navigational elements.

2.1 Local Text Reuse Detection

Two approaches to text reuse detection have been broadly explored. One approach is hashing subsequences of words in documents to construct fingerprints of the document. This approach is known to work well for copy detection. Shivakumar and Garcia-Molina (1995, 1998) and Broder (1997) present generic frameworks for working with a fingerprinting approach, and various selection algorithms have been proposed since (Manber et al., 1994; Heintze et al., 1996; Brin et al., 1995; Schleimer et al., 2003), due to the computational complexity of handling many fingerprints. Broder et al. (1997) propose to use super-shingling, which hashes sequences of fingerprints again. Charikar (2002) introduced a hashing algorithm based on random projections of words in documents. Henzinger (2006) compares the methods proposed by Charikar and Broder et al. on a large collection of web pages and proposes a combined algorithm. Schleimer et al. (2003) propose a fingerprinting algorithm named Winnowing, which uses a second window size w: for each consecutive sequence of w shingles, it outputs the shingle with the smallest fingerprint value, or the right-most such shingle if no unique smallest shingle exists. Seo and Croft (2008) proposed two new algorithms: Revised Hash-breaking, which breaks the document into non-overlapping (text) segments at the hashed tokens whose value is divisible by some integer p and fingerprints again all the tokens in the segments; and DCT (discrete cosine transformation), which is based on hash-breaking but applies a DCT to sequences of text segments and quantizes each coefficient, from which the final fingerprint is formed. Abdel-Hamid et al. (2009) include the methods from Seo and Croft (2008) and propose a new algorithm, Hailstorm, which selects a shingle s iff the minimum fingerprint value of all k fingerprinted tokens in s occurs at the first or last position of s. Mittelbach et al. (2010) propose a selection strategy, Anchor, which locates and hashes certain predefined strings (anchors) in a given document.

Another approach is computing similarities between documents. Shivakumar and Garcia-Molina (1995) suggest similarity measures based on the relative frequency of words between documents. Hoad and Zobel (2003) extend the notion and explore different variants of identity measures based on the cosine measure and different parameters for fingerprinting algorithms, such as generation, granularity, and resolution, for full-document duplication detection.
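As an illustration of the selection step in the Winnowing scheme described above, the sketch below picks, from each window of w consecutive shingle fingerprints, the smallest value (right-most on ties). Computing the fingerprints themselves (hashing k-grams) is assumed to happen upstream, and the function name is ours.

```python
def winnow(fingerprints, w):
    """Winnowing selection (Schleimer et al., 2003): from each window of w
    consecutive shingle fingerprints, keep the smallest value, taking the
    right-most occurrence on ties.  Returns (position, fingerprint) pairs."""
    selected = []
    for start in range(len(fingerprints) - w + 1):
        window = fingerprints[start:start + w]
        smallest = min(window)
        # right-most occurrence of the minimum within this window
        pos = start + max(i for i, v in enumerate(window) if v == smallest)
        if not selected or selected[-1] != (pos, smallest):
            selected.append((pos, smallest))
    return selected

# e.g., winnow([77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39], w=4)
```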

2.2 Efficient N-gram Indexing

The first step in our approach is to build for each n-gram feature an inverted index of the documents where it appears. As in other duplicate detection and text reuse applications, we are only interested in n-grams shared by two or more documents. The index, therefore, does not need to contain entries for the n-grams that occur only once. We use the two-pass space-efficient algorithm described by Huston et al. (2011), which, empirically, is very efficient on large collections. In a first pass, n-grams are hashed into a fixed number of bins. On the second pass, n-grams that hash to bins with one occupant can be discarded; other postings are passed through. Due to hash collisions, there may still be a small number of singleton n-grams that reach this stage. These singletons are filtered out as the index is written. In building an index of n-grams, an index of (n-1)-grams can also provide a useful filter. No 5-gram, for example, can occur twice unless its constituent 4-grams occur at least twice.
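A minimal sketch of this two-pass strategy is given below; it keeps only per-bin counts in the first pass, writes postings only for n-grams whose bin saw more than one occurrence, and drops residual collision-induced singletons at the end. The (n-1)-gram filter and the on-disk index format are omitted, and the function name is illustrative.

```python
from collections import defaultdict

def build_repeated_ngram_index(docs, n, num_bins=2**24):
    """Sketch of two-pass n-gram indexing.  `docs` maps a document id to a list
    of tokens.  Pass 1 counts occurrences per hash bin (bounded memory); pass 2
    keeps postings only for n-grams whose bin saw at least two occurrences;
    singletons kept only because of hash collisions are dropped at the end."""
    def ngrams(tokens):
        return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    bin_counts = defaultdict(int)                      # pass 1
    for tokens in docs.values():
        for gram in ngrams(tokens):
            bin_counts[hash(gram) % num_bins] += 1

    index = defaultdict(list)                          # pass 2
    for doc_id, tokens in docs.items():
        for pos, gram in enumerate(ngrams(tokens)):
            if bin_counts[hash(gram) % num_bins] > 1:
                index[gram].append((doc_id, pos))

    return {g: posts for g, posts in index.items() if len(posts) > 1}
```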

2.3 Extracting and Ranking Candidate Pairs

Once we have an inverted index of the documents that contain each (skip) n-gram, we use it to generate and rank document pairs that are candidates for containing reprinted texts.

Each entry, or posting list, in the index may be viewed as a set of pairs (d_i, p_i) that record the document identifier and position in that document of that n-gram. Once we have a posting list of documents containing each distinct n-gram, we output all pairs of documents in each list. We suppress repeated n-grams that appear in different issues of the same newspaper. These repetitions often occur in editorial boilerplate or advertisements, which, while interesting, are outside the scope of this project. We also suppress n-grams that generate more than u(u − 1)/2 pairs, where u is a parameter. These frequent n-grams are likely to be common fixed phrases. Filtering terms with high document frequency has led to significant speed increases with small loss in accuracy in other document similarity work (Elsayed et al., 2008). We then sort the list of repeated n-grams by document pair, which allows us to assign a score to each pair based on the number of overlapping n-grams and the distinctiveness of those n-grams. Table 2.1 shows the parameters for trading off recall and precision at this stage.

n    n-gram order
w    maximum width of skip n-grams
g    minimum gap of skip n-grams
u    maximum distinct series in the posting list

Table 2.1: Parameters for text reuse detection
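A sketch of this pair-extraction step, using the parameters in Table 2.1, is shown below: it walks the posting lists, skips n-grams that span more than u distinct series, suppresses same-series pairs, and ranks the remaining document pairs by the number of n-grams they share. The data structures and names are illustrative, and the distinctiveness weighting mentioned above is omitted.

```python
from collections import defaultdict
from itertools import combinations

def rank_candidate_pairs(index, series_of, u):
    """Extract and rank candidate document pairs.  `index` maps an n-gram to its
    (doc_id, position) postings; `series_of` maps a document id to its newspaper
    series (or author).  N-grams spanning more than u distinct series are
    skipped, pairs within the same series are suppressed, and remaining pairs
    are ranked by how many n-grams they share."""
    pair_counts = defaultdict(int)
    for gram, postings in index.items():
        docs = sorted({doc_id for doc_id, _ in postings})
        if len({series_of[d] for d in docs}) > u:
            continue                                   # likely a common fixed phrase
        for d1, d2 in combinations(docs, 2):
            if series_of[d1] != series_of[d2]:
                pair_counts[(d1, d2)] += 1
    return sorted(pair_counts.items(), key=lambda kv: -kv[1])
```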

2.4 Computing Local Alignments

The initial pass returns a large ranked list of candidate document pairs, but it ignores the order of the n-grams as they occur in each document. We therefore employ local alignment techniques to find compact passages with the highest probability of matching. The goal of this alignment is to increase the precision of the detected document pairs while maintaining high recall. Depending on the type of documents, many n-grams in matching articles will contain slight differences if the documents are obtained from an image recognition process such as OCR. Unlike some partial duplicate detection techniques based on global alignment (Yalniz et al., 2011), we cannot expect all or even most of the articles in two newspaper issues, or the text in two books with a shared quotation, to align. Rather, as in some work on biological subsequence alignment (Gusfield, 1997), we are looking for regions of high overlap embedded within sequences that are otherwise unrelated. We therefore employ the Smith-Waterman dynamic programming algorithm with an affine gap penalty. This use of model-based alignment distinguishes this approach from other work, on detecting shorter quotations, that greedily expands areas of n-gram overlap (Kolak and Schilit, 2008; Olsen et al., 2011). We do, however, prune the dynamic programming search by forcing the alignment to go through position pairs that contain a matching n-gram from the previous step, as long as the two n-grams are unique in their respective texts. Even the exact Smith-Waterman algorithm, however, is an approximation to the problem we aim to solve. If, for instance, two separate articles from one newspaper issue were reprinted in another newspaper issue in the opposite order—or separated by a long span of unrelated matter—the local alignment algorithm would simply output the better-aligned article pair and ignore the other. Anecdotally, we only observed this phenomenon once in the newspaper collection, where two different parodies of the same poem were reprinted in the same issue. In any case, our approach can easily align different passages in the same document to passages in two other documents.

The dynamic program proceeds as follows. Two documents are treated as sequences of text X and Y whose individual characters are indexed as X_i and Y_j. Let W(X_i, Y_j) be the score of aligning character X_i to character Y_j. Higher scores are better. We use a scoring function where only exact character matches get a positive score and any other pair gets a negative score. We also account for additional text appearing in either X or Y. Let W_g be the score, which is negative, of starting a “gap”, where one sequence includes text not in the other. Let W_c be the cost for continuing a gap for one more character. This “affine gap” model assigns a lower cost to continuing a gap than to starting one, which has the effect of making the gaps more contiguous. We use an assignment of weights fairly standard in genetic sequences, where matching characters score 2, mismatched characters score −1, beginning a gap costs −5, and continuing a gap costs −0.5. We leave for future work the optimization of these weights for the task of capturing shared policy ideas. As with other dynamic programming algorithms, the Smith-Waterman algorithm operates by filling in a “chart” of partial results. The chart in this case is a set of cells indexed by the characters in X and Y. We add an affine gap penalty to the standard Smith-Waterman algorithm and initialize it as follows:

H(0, 0) = 0
H(i, 0) = E(i, 0) = W_g + i · W_c                    (2.1)
H(0, j) = F(0, j) = W_g + j · W_c

The algorithm is then defined by the following recurrence relations:

  0    E(i, j) H(i, j) = max  F (i, j)     H(i − 1, j − 1) + W (Xi,Yj)   E(i, j − 1) + Wc E(i, j) = max  H(i, j − 1) + Wg + Wc   F (i − 1, j) + Wc F (i, j) = max  H(i − 1, j) + Wg + Wc

The main entry in each cell H(i, j) represents the score of the best alignment that terminates at position i and j in each sequence. The intermediate quantities E and F are used for evaluating gaps. Due to taking a max with 0, H(i, j) cannot be negative. This is what allows Smith-Waterman to ignore text before and after the locally aligned substrings of each input. After completing the chart, we then find the optimum alignment by tracing back from the cell with the highest cumulative value H(i, j) until a cell with a value of 0 is reached. These two cells represent the bounds of the sequence, and the overall Smith-Waterman algorithm score reflects the extent to which the characters in the sequences align and the overall length of the sequence. In our implementation, we include one further speedup: since in a previous step we identified n-grams that are shared between the two documents, we assume that any alignment of those documents must include those n-grams as matches. In some cases, this anchoring of the alignment might lead to suboptimal Smith-Waterman alignment scores.
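The recurrences above translate directly into code. The sketch below is a straightforward quadratic-time, quadratic-space implementation with the default weights from the text; the traceback that recovers the aligned passage and the n-gram anchoring speedup are omitted for brevity, and the function name is ours.

```python
def smith_waterman_affine(x, y, match=2.0, mismatch=-1.0, w_g=-5.0, w_c=-0.5):
    """Direct transcription of the recurrences above, with the weights from the
    text.  Returns the best local alignment score and its (i, j) end position;
    traceback and n-gram anchoring are omitted."""
    n, m = len(x), len(y)
    neg = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                  # boundary conditions (Eq. 2.1)
        H[i][0] = E[i][0] = w_g + i * w_c
    for j in range(1, m + 1):
        H[0][j] = F[0][j] = w_g + j * w_c
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] + w_c, H[i][j - 1] + w_g + w_c)  # gap in x
            F[i][j] = max(F[i - 1][j] + w_c, H[i - 1][j] + w_g + w_c)  # gap in y
            w = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0.0, E[i][j], F[i][j], H[i - 1][j - 1] + w)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos

# e.g., smith_waterman_affine("the taxable year in which", "the tax year in which")
```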

2.5 Intrinsic Evaluation

To evaluate the precision and recall of text reuse detection, we create a pseudo-relevant set of document pairs by pooling the results of several runs with different parameter settings. For each document pair found in the union of these runs, we observe the length, in matching characters, of the longest local alignment. (Using matching character length allows us to abstract somewhat from the precise cost matrix.) We can then observe how many aligned passages each method retrieves that are at least 50,000 character matches in length, at least 20,000 character matches in length, and so on. The candidate pairs are sorted by the number of overlapping n-grams; we measure the pseudo-recall at several length cutoffs. For each position in a ranked list of document pairs, we then measure the precision: what proportion of documents retrieved are in fact 50k, 20k, etc., in length? Since we wish to rank documents by the length of the aligned passages they contain, this is a reasonable metric. One summary of these various values is the average precision: the mean of the precision at every rank position that contains an actually relevant document pair. One of the few earlier evaluations of local text reuse, by Seo and Croft (2008), compared fingerprinting methods to a baseline. Since their corpus contained short individual news articles, the extent of the reused passages was evaluated qualitatively rather than by alignment. Figure 2.1 shows the average precision of different parameter settings on the newspaper collection, ranked by the number of pairs each returns. If the pairwise document step returns a large number of pairs, we will have to perform a large number of more costly Smith-Waterman alignments. On this collection, a good tradeoff between space and speed is achieved by skip features. In the best case, we look at bigrams where there is a gap of at least 95, and not more than 105, words between the first and second terms (n=2, u=100, w=105, g=95). While average precision is a good summary of the quality of the ranked list at any one point, many applications will simply be concerned with the total recall after some fixed amount of processing. Figure 2.2 also summarizes these recall results by the absolute number of document pairs examined. From these results, it is clear that several good settings perform well at retrieving all reprinted passages of at least 5000 characters. Even using the pseudo-recall metric, however, the best operating points fail in the end to retrieve about 10% of the reprints detected by some other setting for all documents of at least 1000 characters.
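The evaluation just described reduces to a standard ranked-retrieval metric. The sketch below computes average precision for one length cutoff, treating a candidate pair as relevant when its pooled best alignment is at least min_length matching characters long; the function and argument names are illustrative.

```python
def average_precision(ranked_pairs, aligned_length, min_length):
    """Average precision at one length cutoff: `ranked_pairs` is the candidate
    list sorted by shared n-grams, and `aligned_length[pair]` is the character
    length of that pair's best local alignment in the pooled pseudo-relevant
    set.  A pair counts as relevant when its alignment is >= min_length."""
    hits, precisions = 0, []
    for rank, pair in enumerate(ranked_pairs, start=1):
        if aligned_length.get(pair, 0) >= min_length:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```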

2.6 Extrinsic Evaluation

While political scientists, historians, and literary scholars will, we hope, find these techniques useful and perform close reading and manual analysis on texts of interest, we would like to validate our results without a costly annotation campaign. In this work, we explore the correlation of patterns of text reuse with what is already known from other sources about the connections among Members of Congress, newspaper editors, and so on. This idea was inspired by Margolin et al. (2013), who used these techniques to test rhetorical theories of “semantic organizing processes” on the congressional statements corpus. The approach is quite simple: measure the correlation between some metric of text reuse between actors in a social network and other features of the network links between those actors. The metric of text reuse might be simply the number of exact n-grams shared by the language of two authors (Margolin et al., 2013); alternatively, it might be the absolute or relative length of all the aligned passages shared by two authors or the tree distance between them in a phylogenetic reconstruction. To measure the correlation of a text reuse metric with a single network, we can simply use Pearson’s correlation; for more networks, we can use multivariate regression.

Figure 2.1: Average precision for aligned passages of different minimum length in characters. Vertical red lines indicate the performance of different parameter settings (see Table 2.1).

Figure 2.2: (Pseudo-)Recall for aligned passages of different minimum lengths in characters.

Due to, for instance, autocorrelation among edges arising from a particular node, we cannot proceed as if the weight of each edge in the text reuse network can be compared independently to the weight of the corresponding edges in other networks. We therefore use nonparametric permutation tests using the quadratic assignment procedure (QAP) to resample several networks with the same structure but different labels and weights. The QAP achieves this by reordering the rows and columns of one network’s adjacency matrix according to the same permutation. The permuted network then has the same structure—e.g., degree distribution—but should no longer exhibit the same correlations with the other network(s). We can run QAP to generate confidence intervals for both single (Krackhardt, 1987) and multiple (Dekker et al., 2007) correlations.
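A minimal sketch of such a QAP permutation test is given below: it correlates the off-diagonal entries of two weighted adjacency matrices and builds a null distribution by permuting the rows and columns of one matrix with the same random permutation. This illustrates only the single-network test; the multiple-regression variant (Dekker et al., 2007) and confidence intervals are not shown, and the function name is ours.

```python
import numpy as np

def qap_correlation_test(A, B, n_perm=1000, seed=0):
    """Single-network QAP test: correlate off-diagonal entries of two weighted
    adjacency matrices, then compare against correlations obtained after
    permuting rows and columns of one matrix with the same permutation."""
    rng = np.random.default_rng(seed)
    off_diag = ~np.eye(A.shape[0], dtype=bool)

    def corr(X, Y):
        return np.corrcoef(X[off_diag], Y[off_diag])[0, 1]

    observed = corr(A, B)
    null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(A.shape[0])
        null[k] = corr(A, B[np.ix_(p, p)])   # same permutation on rows and columns
    p_value = float(np.mean(np.abs(null) >= abs(observed)))
    return observed, p_value
```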

2.7 Network Connections of 19c Reprints

2.7.1 Dataset description

In American Literature and the Culture of Reprinting, McGill (2007) argues that American literary culture in the nineteenth century was shaped by the widespread practice of reprinting stories and poems, usually without authorial permission or even knowledge, in newspapers, magazines, and books. Without substantial copyright enforcement, texts circulated promiscuously through the print market and were often revised by editors during the process. These “viral” texts—be they news stories, short fiction, or poetry—are much more than historical curiosities. The texts that editors chose to pass on are useful barometers of what was exciting or important to readers during the period, and thus offer significant insight into the priorities and concerns of the culture. Nineteenth-century U.S. newspapers were usually associated with a particular political party, religious denomination, or social cause (e.g., temperance or abolition). Mapping the specific locations and venues in which varied texts circulated would therefore allow us to answer questions about how reprinting and the public sphere in general were affected by geography, communication and transportation networks, and social, political, and religious affinities. These effects should be particularly observable in the period before the Civil War and the rise of wire services that broadcast content at industrial scales (Figure 2.3). To study the reprint culture of this period, we crawled the online newspaper archives of the Library of Congress’s Chronicling America project (chroniclingamerica.loc.gov). Since the Chronicling America project aggregates state-level digitization efforts, there are some significant gaps: e.g., there are no newspapers from Massachusetts, which played a not insubstantial role in the literary culture of the period. While we continue to collect data from other sources in order to improve our network analysis, the current dataset remains a useful, and open, testbed for text reuse detection and analysis of overall trends. For the pre-Civil War period, this corpus contains 1.6 billion words from 41,829 issues of 132 newspapers. Another difficulty with this collection is that it consists of the OCR’d text of newspaper issues without any marking of article breaks, headlines, or other structure. The local alignment methods described in §2.1 are designed not only to mitigate this problem, but also to deal with partial reprinting. One newspaper issue, for instance, might reprint chapters 4 and 5 of a Thackeray novel while another issue prints only chapter 5. Since our goal is to detect texts that spread from one venue to another, we are not interested in texts that were reprinted frequently in the same newspaper, or series, to use the cataloguing term. This includes material such as mastheads and manifestos and also the large number of advertisements that recur week after week in the same newspaper.

Figure 2.3: Newspaper issues mentioning “associated press” by year, from the Chronicling America corpus. The black regression line fits the raw number of issues; the red line fits counts corrected for the number of times the Associated Press is mentioned in each issue.

2.7.2 Experiment

For the antebellum newspaper corpus, we are also interested in how political affinity correlates with reprinting similar texts. We have also added variables for social causes such as temperance, women’s rights, and abolition that—while certainly not orthogonal to political commitments—might sometimes operate independently. In addition, we also added a “shared state” variable to account for shared political and social environments of more limited scope. Figure 2.4 shows a particularly strong example of a geographic effect: the statement of the radical abolitionist John Brown after being condemned to death for attacking a federal arsenal and attempting to raise a slave rebellion was very unlikely to be published in the South. Using information from the Chronicling America cataloguing and from other newspaper histories, we coded each of the 132 newspapers in the corpus with these political and social affinities. We then counted the number of reprinted passages shared by each pair of newspapers. There is not a deterministic relationship between the number of pairs of newspapers sharing an affinity and the number of reprints shared by those papers. While our admittedly partial corpus only contains a single pair of avowedly abolitionist papers—a radical position at the time—those two papers shared articles 306 times, compared for instance to the 71 stories shared among the 6 pairs of “nativist” papers. Table 2.2 shows that geographic proximity had by far the strongest correlation with (log) reprinting counts. Interestingly, the only political affinity to show as strong a correlation was the Republican party, which in this period had just been organized and, one might suppose, was trying to control its “message”. The Republicans were more geographically concentrated in any case, compared to the sectionally more diffuse Democrats. Another counterexample is the Whigs, the party from which the new Republican party drew many of its members, which also has a slight negative effect on reprinting.

Figure 2.4: Reprints of John Brown’s 1859 speech at his sentencing. Counties are shaded with historical population data, where available. Even taking population differences into account, few newspapers in the South printed the abolitionist’s statement.

The only other large coefficients are in the complete model for smaller movements such as nativism and abolition. It is interesting to speculate about whether the speed or faithfulness of reprinting—as opposed to the volume—might be correlated with more of these variables.

For networks where there is no natural way to reconstruct the ground-truth structures, and where annotation is expensive, we would like to instead explore the correlation of patterns of text reuse with what is already known from other sources about the connections among Members of Congress, newspaper editors, and so on. This idea was inspired by Margolin et al. (2013), who used these techniques to test rhetorical theories of “semantic organizing processes” on the congressional statements corpus.

affinity               pairs of papers   newspaper reprints   regression w/ pairs >= 1   >= 10     >= 100
Republican             1176              134,302              0.74***                    0.73*     0.72***
Whig                   1176              91,139               -0.35                      -0.34     -0.35
Democrat               1081              62,609               -0.08                      -0.09     -0.07
same state             672               103,057              1.12***                    1.11***   1.13***
anti-secession         435               22,009               -0.58*                     -0.58     -0.60
anti-slavery           231               12,742               -0.65                      -0.64     -0.60
pro-slavery            120               11,040               -0.35                      -0.35     -0.27
Free-State             15                1,194                0.80                       0.80
Constitutional Union   15                1,070                -0.21                      -0.21
pro-secession          15                529                  0.11                       0.11
Free Soil              10                1,936                -0.42                      -0.42
Copperhead             10                797                  1.53                       1.54
temperance             6                 560                  0.65
independent            6                 186                  -0.22
nativist               6                 71                   -1.93*
women's rights         3                 721                  1.91
abolitionist           1                 306                  3.49**
Know-Nothing           1                 25                   1.33
Mormon                 1                 3                    -1.13
R-squared              –                 –                    .065                       .063      .062

Table 2.2: Correlations between shared reprints between 19c newspapers and political and other affinities. While many Whig papers became Republican, they do not completely overlap in our dataset; the identical number of pairs is coincidental. Text reuse in social networks 27

2.8 Congressional Statements

2.8.1 Dataset description

Members of the U.S. Congress are of course even more responsive to political debates and incentives than nineteenth-century newspapers. Representatives and senators are also a very well-studied social network. Following Margolin et al. (2013), we analyzed a dataset of more than 400,000 public statements made by members of the 112th Senate and House between January 2011 and August 2012. The statements were downloaded from the Vote Smart Project website (votesmart.com). According to Vote Smart, the Members’ public statements include any press releases, statements, newspaper articles, interviews, blog entries, newsletters, legislative committee websites, campaign websites and cable news show websites (Meet the Press, This Week, etc.) that contain direct quotes from the Member. Since we are primarily interested in the connections between Members, we will, as we see below, want to filter out reuse among different statements by the same member. That information could be interesting for other reasons—for instance, tracking slight changes in the phrasing of talking points or substantive positions. We supplemented these texts with categorical data on chambers and parties and with continuous representations of ideology using the first dimension of the DW-NOMINATE scores (Carroll et al., 2009) that have been widely used to describe the political ideology of political actors, political parties and political institutions. A first-dimension score closer to 1 is described as conservative, whereas a score closer to −1 is described as liberal; a score at or near zero is described as moderate.

                                   aligned passages of >= n words        n-grams of length
                                   10         16         32              8           16         32
First-order Pearson correlations
DW-nominate                        0.26***    0.25***    0.23***         0.26***     0.22***    0.16***
same chamber                       0.05*      0.08**     0.13***         -0.05***    0.21***    0.10***
Regression coefficients
DW-nominate                        0.72***    0.75***    0.74***         1.31***     2.67***    0.36
same chamber                       0.15**     0.27***    0.42***         0.20        3.14***    0.81***
R-squared                          .069       .070       .073            .068        .073       .010

Table 2.3: Correlations between log length of aligned text and other author networks in public statements by Members of Congress. *p < .05, **p < .01, ***p < .001

2.8.2 Experiment

We model the connection between the log magnitude of reused text and the strength of ties among Members according to whether they are in the same chamber and how similar they are on the first dimension of the DW-NOMINATE ideological scale (Carroll et al., 2009). The left side of Table 2.3 shows the results of correlating reused passages of certain minimum lengths (10, 16, 32 words) with these underlying features. The right side shows the comparable results of Margolin et al. (2013), which simply used the exact size of the n-gram overlap between Members' statements for increasing values of n. The alignment analysis proposed here achieves similar results when passages and n-grams are short. Our analysis, however, achieves higher single and multiple correlations among networks as the passages grow longer. This is unsurprising, since the probability of an exact 32-gram match is much smaller than that of a 32-word-long alignment that might contain a few differences. In particular, the much higher coefficients for DW-NOMINATE at longer aligned lengths suggest that ideological influence still dominates over similarities induced by the procedural environment of each congressional chamber.

2.9 Conclusion

We have presented techniques for detecting reused passages embedded within the larger discourses produced by actors in social networks. Some of this shared content is as brief as partisan talking points or lines of poetry; other reprints can encompass extensive legislative boilerplate or chapters of novels. The longer passages are easier to detect, with perfect pseudo-recall, without exhaustive scanning of the corpus. Precision-recall tradeoffs will vary with the density of text reuse and the noise introduced by optical character recognition and other features of data collection. We then showed the feasibility of using network regression to measure the correlations between connections inferred from text reuse and networks derived from outside information. However, the techniques presented in this chapter are mainly effective for lexical text reuse. Because the Smith-Waterman algorithm uses fixed weights, where each mismatched token is treated equally, these methods are less effective at distinguishing pairs of related, but not closely copied, passages from pairs of unrelated passages. For less literally copied texts, especially semantically similar ones that can be hard to align, we need a more sensitive model of the semantics of passages and their contexts. In the next chapter, we will present an attention-based model to address this problem.

Chapter 3

Semantic Text Reuse in Social Networks

The text reuse we address in chapter 2 focuses on lexical similarities between texts. In the real world, however, semantic text reuse comprises more widely observed phenomena such as paraphrasing and textual entailment. According to Androutsopoulos and Malakasiotis (2010): "Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true." For example, mentions of cited papers in academic papers usually summarize the contributions of the cited papers in one or two sentences, which can be seen as paraphrasing corresponding sentences in the abstract or cited papers, or as statements influenced by or entailed from the abstract. Similarly, in news media a press release is usually paraphrased in, or entails, the resulting news reports. Such behavior can also be seen in congressional bills, or in the evolution of patents. The n-gram overlap and alignment methods presented in chapter 2 may fall short when presented with such types of text reuse. In this chapter, we will see an attention based convolutional network (ABCN) that can deal with such phenomena. Because it is very hard to find datasets with annotations on

reused source sentences and target sentences, and it is also expensive to produce such annotations when they are missing, we use citations in academic papers to evaluate such behavior. As mentioned, for an academic paper, the abstract is usually a summary of the background, description, and sometimes experimental results of the contributions, and the citing paper has several mentions of the cited paper that can be considered either paraphrase or entailment. Therefore, we consider the abstract and the section text that contains the paper mentions as a document pair and aim to predict the mentions. §3.8 describes the dataset and experiments we use to evaluate our model.

3.1 Classifying text reuse as paraphrase or textual entailment

The problems of paraphrase and textual entailment are well studied in the literature. With the current boom in applications of neural networks to natural language processing, many deep neural models have been proposed to improve these classification tasks. For paraphrase identification tasks, there are two lines of work that aim to capture sentence representations and similarities between sentences. The first consists of methods based on convolutional neural networks. Most of them are based on a "Siamese" network structure (Bromley et al., 1994), which uses the same weights for a convolutional neural network working in tandem on two different input vectors to compute comparable output vectors. For example, DCNN (Kalchbrenner et al., 2014) uses Dynamic k-Max Pooling, a global pooling operation over linear sequences, to handle input sentences of varying length and induces a feature graph over the sentence that is capable of explicitly capturing relations at different ranges. ARC-I and ARC-II (Hu et al., 2014) are two variant models built on DCNN. ARC-I defers the interaction of the two texts to the end

of the process, while ARC-II lets them meet early by directly interleaving them into a single representation. DSSM (Huang et al., 2013) uses deep linear projection layers that project queries and documents into a common low-dimensional space where it is easy to compute the distance between a document and a query. C-DSSM (Shen et al., 2014) has a similar structure to DSSM but adds a convolutional layer that projects each word within a context window to a local contextual feature vector. MatchPyramid (Pang et al., 2016) constructs a matching matrix whose entries represent the similarities between words of the given input pair and views it as an image; a convolutional neural network is then used to capture rich matching patterns in a layer-by-layer way. DRMM (Guo et al., 2016) also uses an interaction-based model, in which histogram pooling counts multiple levels of soft-TF, with learning-to-rank applied afterwards. ABCNN (Yin et al., 2016) proposes three attention schemes that integrate the mutual influence between sentences into CNNs; thus, the representation of each sentence takes its counterpart into consideration. These interdependent sentence pair representations are more powerful than isolated sentence representations. K-NRM (Xiong et al., 2017) uses kernel pooling instead of DRMM's histogram pooling and learns the word embeddings and ranking layers end-to-end. CONV-KNRM (Dai et al., 2018) is the state-of-the-art CNN-based text similarity matching model. Instead of exactly matching n-grams, it uses convolutional neural networks to represent n-grams of various lengths and matches them in a unified embedding space; the n-gram soft matches are then used by kernel pooling and learning-to-rank layers to generate the final ranking score.

The other line of work, using RNN-based neural networks, has picked up steam recently. MV-LSTM (Wan et al., 2016a) has a network architecture similar to ARC-I but uses a bidirectional LSTM to learn positional sentence representations instead; the matching score is then passed through k-max pooling and a multi-layer perceptron. Match-SRNN (Wan et al., 2016b) uses a recursive process to model the generation of the global interaction between two texts. The interaction of two texts at each position is a composition of the interactions between their prefixes and the word-level interaction at the current position; a spatial (2D) RNN integrates the local interactions recursively. BiMPM (Wang et al., 2017b) uses bilateral multi-perspective matching: after obtaining sentence representations from a Bi-LSTM encoder, the two representations are matched from both directions with multiple perspectives. Another Bi-LSTM layer then aggregates the results into a fixed-length vector, followed by a fully connected layer to produce the classification. DeepRank (Pang et al., 2017) combines RNN and CNN networks, where a measure network determines local relevances using a convolutional neural network (CNN) or two-dimensional gated recurrent units (2D-GRU); sequential integration through an RNN then produces a global relevance score. However, BERT (Devlin et al., 2018) achieves a great improvement on tasks such as paraphrase identification and textual entailment and has become the state-of-the-art model on these tasks.
Last but not least, since our goal is to pick the sentences inside a pair of passages that rationalize the classification decision, work on rationales also informs our model choice. For example, Lei et al. (2016) combine an independently parameterized, RNN-based generator and encoder and train them together to produce a hard selection of rationales for classification tasks. The generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction. As previously mentioned, it is difficult to collect data or generate labels, and this also applies to rationale prediction; this model does not need marked rationale spans at training time. Zhang et al. (2016) present a CNN model for text classification that jointly exploits labels on documents and on their constituent sentences, some of which support the document classification, that is, they provide a rationale for it; such sentences should also exhibit some worthiness to serve as support for the classification. As for academic papers, Bonab et al. (2018) and Cohan et al. (2019) use a CNN and an RNN, respectively, to decide the citation-worthiness of sentences.

3.2 Method Overview

Figure 3.1: The overview of the structure of the Attention Based Convolutional Network (ABCN). [Diagram: the source and target documents pass through embedding and convolutions, a Bi-LSTM, an attention layer with a general score function and weighted sum, concatenation, and a final sigmoid output.]

In this section, we introduce our attention based model that picks out the relevant sentences given a document query. Figure 3.1 shows the model architecture. In a nutshell, we use CNNs to represent the semantics of sentences, an attention layer to learn how to attend to different sentences, and a final MLP layer with a logistic activation to label the target sentences.

3.3 Word Representations

Lexical features are insufficient to capture the semantics of words, so we need a different representation for the document vocabulary. The straightforward way is to have a one-hot encoding for each word in the vocabulary $V$. For example, with a minimal vocabulary $V = \{$he, she, boy, girl$\}$, the word "boy" would be represented by the vector $(0, 0, 1, 0)$. There are several downsides to this method. First, the one-hot encoded vector has the same size as the vocabulary, with only one element equal to 1 while the remaining $|V| - 1$ elements are 0; with a million-word vocabulary, we would have a very sparse representation. Second, we want similar units, such as words or sentences, to be close spatially in the representation space, that is, to have representations with a cosine similarity close to 1. However, the cosine similarity of two one-hot encodings is always 0 unless the words are identical. In short, such an encoding still relies only on the lexical features of the texts.

There is significant work on generating continuous representations of the document vocabulary that, at the same time, "embed" the semantics of words into a low-dimensional space. Word2Vec and GloVe are two widely used models for constructing such word embeddings. Word2Vec (Mikolov et al., 2013a) obtains word embeddings using one of two methods, Skip-Gram or Continuous Bag of Words (CBOW), via neural networks. Skip-gram takes a word as input and tries to predict $C$ context words around it, while CBOW uses $C$ context words to predict the word in the middle. Skip-gram works well with small amounts of training data and represents even rare words or phrases well, while CBOW is several times faster to train and has slightly better accuracy for frequent words. However, both methods make predictions by taking only local contexts into consideration (for skip-gram the local context is the word itself); they do not take advantage of global dependencies. In contrast, GloVe (Pennington et al., 2014) leverages both a global word-word co-occurrence matrix and local statistics, decomposing the matrix into dense vectors.

One drawback of Word2Vec and GloVe is that they are static: no matter whether they are estimated on an existing corpus or on specific training data, after training they are fixed. The outputs of both models are dictionaries whose keys are the words in the vocabulary. They can model complex characteristics of word use, such as syntax and semantics, but not how these uses vary across linguistic contexts, i.e., they cannot model polysemy. Therefore, recent research focuses on contextualized word embeddings, where each word is assigned a representation that is a function of the entire input sequence. Such models can address both of the aforementioned challenges, albeit at a higher runtime cost than the dictionary lookup of "static" methods. Some representative models are ELMo and BERT. ELMo (Peters et al., 2018) is a character-based contextualized word embedding method, which uses the concatenation of independently trained left-to-right and right-to-left LSTMs (Hochreiter and Schmidhuber, 1997) to generate word embeddings. Since it is character based, it allows the network to use morphological clues to form robust representations for out-of-vocabulary tokens unseen in training.
BERT (Devlin et al., 2018) uses masked language models to enable pre-trained deep bidirectional representations. Instead of an LSTM, BERT uses a bidirectional Transformer (Vaswani et al., 2017), and tokenization is performed with WordPiece (Wu et al., 2016) embeddings over a 30,000-token vocabulary. BERT is the state of the art for several semantic similarity tasks on datasets such as QQP (Chen et al., 2018), MRPC (Dolan and Brockett, 2005), QNLI (Wang et al., 2018), and STS-B (Cer et al., 2017). We use BERT to provide word features for our model. Because we do not have sentence-level labels for the source (cited) documents in the cited-sentence selection task, we cannot fine-tune the parameters of BERT directly on relevant sentences in our corpus. However, we provide alternative ways of comparing fine-tuned BERT models on the evaluation tasks; the details are given in the experiment section (§3.8).

Concretely, we have inputs $S = (s_1, s_2, \ldots, s_n)$ and $T = (t_1, t_2, \ldots, t_n)$, where $S$ and $T$ represent the source and target documents, respectively, and $s_i | t_i = (w_1, w_2, \ldots, w_n)$ is the $i$-th sentence in the document, where $w_n \in \mathbb{R}^d$ is the embedding of a WordPiece token from the pre-trained BERT model.
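As a concrete illustration of this input setup, the sketch below extracts contextualized WordPiece embeddings from a pre-trained uncased BERT-Base model. The use of the HuggingFace transformers library is our choice for illustration only (the thesis does not prescribe a particular implementation); taking the second-to-last hidden layer follows the setting reported later in §3.8.2, and the function name is ours.

```python
# Minimal sketch: contextualized WordPiece features from pre-trained BERT.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert.eval()

def sentence_features(sentence: str) -> torch.Tensor:
    """Return a (num_wordpieces, 768) matrix of contextualized embeddings."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    # hidden_states is a tuple of 13 tensors (embedding layer + 12 layers);
    # index -2 is the second-to-last transformer layer used as features.
    return outputs.hidden_states[-2].squeeze(0)

# Each sentence s_i or t_j becomes a sequence of d-dimensional WordPiece vectors.
features = sentence_features("This paper proposes a framework for vector composition.")
```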

3.4 Contextualized Sentence Representation

We aim to detect text reuse at the level of sentences, so we want to represent sentences in a manner similar to words, using continuous vectors derived from word embeddings that capture the semantic features of sentences. The simplest way is to compute the average of the embeddings of all the words within a sentence. However, this fails to capture the structure of the sentence, such as grammatical units consisting of several words, which are crucial for expressing the sentence semantics. One of the many improvements is the algorithm of Arora et al. (2017), which takes word embeddings pretrained on a large corpus, encodes a sentence as a weighted combination of the word vectors, and then removes the projection of the vectors on the first principal component. Those methods aim at providing a universal sentence embedding. We, however, want models that can incorporate the interactions between words from different parts of the sentence to provide sentence representations for a specific corpus. There are usually two regimes for this task. The first regime is to train an encoder with a Recurrent Neural Network (RNN) (Jordan, 1986; Pearlmutter, 1989; Cleeremans et al., 1989), which allows the understanding of the current word to be based on the previous words one has seen. To fulfill this, a vanilla

RNN is a network with loops, which accepts variable-length sequence input and allows information to persist and flow through different time steps. Figure 3.2 shows the unrolled network structure. It uses recurrent hidden states whose activation at each time step depends on that of the previous time step. However, as Bengio et al. (1994) observed, it is difficult to train RNNs to capture long-term dependencies because the gradients tend to either vanish or explode. Later, several gated RNN models were proposed that use recurrent units such as the Long Short-Term Memory (LSTM) unit (Hochreiter and Schmidhuber, 1997) or the Gated Recurrent Unit (GRU) (Cho et al., 2014). Those models have gate functions in each recurrent unit to modulate the information flow.

Figure 3.2: Unrolled Vanilla Recurrent Neural Network. [Diagram: input $x_t$, recurrent cell $A$, hidden state $h_t$.]

The other regime is to use Convolutional Neural Networks (CNNs or ConvNets). Originally designed by LeCun et al. (1998) to recognize hand-written digits, CNNs have achieved remarkable results in computer vision and in recent years have also obtained good results on several NLP tasks, such as search query retrieval. A CNN uses layers with convolving filters that are applied to local features. It is easier to parallelize the training of CNN-based models than of RNN variants, but CNN-based models have more parameters and require fixed-length sentences as inputs. Considering the recent popularity and strong results of CNN-based models on paraphrase identification problems, our proposed model uses a standard ConvNet structure (Figure 3.3) to learn sentence representations.

Figure 3.3: An illustration of the ConvNet structure used in ABCN, with a toy example with word embeddings in $\mathbb{R}^5$ ($n = 5$, $d = 5$), kernel size 2, and 3 filters yielding 3 feature maps. Convolution and an activation function are followed by global max pooling, and the maxima are concatenated into a single feature vector, yielding a representation for the sentence in $\mathbb{R}^3$.

Each convolutional filter is parameterized with kernel size $k$ as $W \in \mathbb{R}^{d_1 \times kd}$, $b_w \in \mathbb{R}^{d_1}$, takes as input $X \in \mathbb{R}^{k \times d}$, a concatenation of $k$ WordPieces $w_i$, and generates the representation $s_i | t_i \in \mathbb{R}^{d_1}$:

$$s_i | t_i = \mathrm{ReLU}(W[w_{i-k/2}, \ldots, w_{i+k/2}] + b_w) \qquad (3.1)$$

After the convolutional layer, global max pooling is used to obtain a fixed-length representation of the sentence. We share the weights of the convolutional layers between the source and target documents. Now that we have representations of each sentence, we also want to embed context into the sentence representations, similar to word representations. Knowledge of the semantics of the sentences before or after the current sentence reveals additional information, such as cite-worthiness (Cohan et al., 2019). Here we use a bidirectional long short-term memory (BiLSTM) (Schuster and Paliwal, 1997) with hidden size $d_2$:

$$h_i = [\overrightarrow{\mathrm{LSTM}}_h(s, i), \overleftarrow{\mathrm{LSTM}}_h(s, i)] \qquad (3.2)$$

$$g_j = [\overrightarrow{\mathrm{LSTM}}_g(t, j), \overleftarrow{\mathrm{LSTM}}_g(t, j)] \qquad (3.3)$$

where $h, g \in \mathbb{R}^{n \times 2d_2}$, $\overrightarrow{\mathrm{LSTM}}_h(s, i)$ processes $s$ from left to right up to position $i$, and vice versa for the backward direction $\overleftarrow{\mathrm{LSTM}}_h(s, i)$; the same applies to the BiLSTM for $t$. In order to capture the dependencies between sentences in the two documents independently, we use two sets of parameters for the source and target documents, respectively.
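A minimal PyTorch sketch of this sentence encoder (Eqs. 3.1–3.3) is given below, assuming the hyperparameters reported in §3.8.2 (embedding size 768, 100 filters, kernel size 7, BiLSTM hidden size 128). Module and variable names are ours, and details such as padding and batching are simplified.

```python
# Sketch of the contextualized sentence encoder: 1D convolution with ReLU over
# WordPiece embeddings, global max pooling, then a document-level BiLSTM.
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, d=768, d1=100, k=7, d2=128):
        super().__init__()
        self.conv = nn.Conv1d(d, d1, kernel_size=k, padding=k // 2)       # Eq. 3.1
        self.bilstm = nn.LSTM(d1, d2, bidirectional=True, batch_first=True)

    def forward(self, sentences):
        # sentences: (num_sentences, num_wordpieces, d) for one document
        x = sentences.transpose(1, 2)          # (num_sent, d, num_wp)
        x = torch.relu(self.conv(x))           # (num_sent, d1, num_wp)
        x = x.max(dim=2).values                # global max pooling -> (num_sent, d1)
        # Treat the sentence sequence as a single batch element for the BiLSTM
        h, _ = self.bilstm(x.unsqueeze(0))     # (1, num_sent, 2 * d2), Eqs. 3.2-3.3
        return h.squeeze(0)

# Separate parameters for the source and target documents:
encode_source = SentenceEncoder()
encode_target = SentenceEncoder()
```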

3.5 Attention

Attention mechanisms have long been used in computer vision and natural language processing. The intuition behind them is that humans do not tend to process a scene in its entirety at once, but instead focus attention selectively on parts of the visual space to acquire information when and where it is needed, and combine information from different fixations over time to build up an internal representation (Rensink, 2000; Mnih et al., 2014). Therefore, by including an attention mechanism in the model, we can regulate what the model is "looking at" at training and inference time when making the current prediction. Now that we have the contextualized representation of each sentence from the source and target documents, to compute the attention score we combine the representations of the source document $h_i$ and the target document $g_j$:

$$\alpha_{i,j} = f(h_i, g_j) \qquad (3.4)$$

where $f$ is a score function measuring the similarity between the states of the source and target sentences. We choose the general score function from Luong et al. (2015):

$$f(h_i, g_j) = h_i W g_j \qquad (3.5)$$

Then, for each sentence in the target document, we compute a softmax over all sentences in the source document:

$$\alpha_{i,j} = \frac{\exp(\alpha_{i,j})}{\sum_{i'} \exp(\alpha_{i',j})} \qquad (3.6)$$

Similar to the global attention mechanism in Luong et al. (2015), the final context vector is a weighted sum of the source sentence embeddings $h_i$, weighted by the attention scores:

$$c_j = \sum_i \alpha_{i,j} h_i \qquad (3.7)$$

3.6 Final output

Given the target sentence embedding $g_j$ and the source-side context vector $c_j$, we compute the final output as:

$$o_j = \sigma(W_c[c_j; g_j]) \qquad (3.8)$$

where $[\cdot\,;\cdot]$ denotes concatenation and $\sigma(\cdot)$ is the sigmoid function. Since the sigmoid function gives a result between 0 and 1, we treat $o_j$ as the probability of labeling sentence $t_j$ as semantically similar to some sentence(s) in the source document.
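The following PyTorch sketch puts Eqs. 3.4–3.8 together: the general score $h_i W g_j$, a softmax over source sentences for each target sentence, the weighted context vector, and the sigmoid output over the concatenation $[c_j; g_j]$. Class and variable names are ours, and the sketch omits dropout and other training details.

```python
# Sketch of the attention and output layers of ABCN.
import torch
import torch.nn as nn

class AttentionOutput(nn.Module):
    def __init__(self, d2=128):
        super().__init__()
        self.W = nn.Linear(2 * d2, 2 * d2, bias=False)   # bilinear "general" score (Eq. 3.5)
        self.Wc = nn.Linear(4 * d2, 1)                    # output layer W_c (Eq. 3.8)

    def forward(self, h, g):
        # h: (n_src, 2*d2) source sentence states; g: (n_tgt, 2*d2) target states
        scores = self.W(h) @ g.t()                        # (n_src, n_tgt), Eqs. 3.4-3.5
        alpha = torch.softmax(scores, dim=0)              # normalize over source sentences, Eq. 3.6
        c = alpha.t() @ h                                 # (n_tgt, 2*d2), Eq. 3.7
        o = torch.sigmoid(self.Wc(torch.cat([c, g], dim=1)))  # (n_tgt, 1), Eq. 3.8
        return o.squeeze(1)                               # probability per target sentence
```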

3.7 Objective function

For each target document $T$, we now have a sequence of outputs $(o_1, \ldots, o_n)$. Since the number of similar sentences will be relatively small compared to the non-relevant ones, we use a weighted cross entropy between the output sequence and the ground truth:

$$L_T = -\sum_{j=1}^{n} \big[\, y_j \log(o_j) \cdot w + (1 - y_j)\log(1 - o_j) \,\big] \qquad (3.9)$$

where $y_j$ is the label of the $j$-th sentence in the target document, and $w$ is a weight, treated as a hyper-parameter, assigned to the positive labels to accentuate the cost of a positive error relative to a negative error. We use the Adam optimizer (Kingma and Ba, 2015) to optimize the objective function.
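A short sketch of this objective follows, with the positive-class weight $w$ exposed as a hyper-parameter (set to 15 in §3.8.2); the function name and the small epsilon for numerical stability are ours.

```python
# Weighted cross entropy of Eq. 3.9: o are sigmoid outputs, y the 0/1 labels.
import torch

def weighted_bce(o: torch.Tensor, y: torch.Tensor, w: float = 15.0) -> torch.Tensor:
    eps = 1e-8  # numerical stability for log
    loss = -(w * y * torch.log(o + eps) + (1 - y) * torch.log(1 - o + eps))
    return loss.sum()

# Optimized with Adam, e.g. torch.optim.Adam(model.parameters(), lr=1e-4).
```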

3.8 Experiments

3.8.1 Datasets

We evaluate the effectiveness of our model on a sentence selection task on scientific papers from the ACL Anthology Conference Corpus (Bird et al., 2008). The corpus contains the abstracts and full article texts, later broken into sections, of 10,921 scientific articles. Hereafter, we use "document" to refer to both abstracts and sections of full articles. The citations between documents in the corpus provide labels for training and evaluating our models. Specifically, we create (source, target) pairs using the abstract of the cited paper and the section of the citing paper that includes the citation. In the target document, the "matched" sentence is defined as the sentence in the target document where the actual citation mark for the source document occurs. A section that contains at least one matched sentence is further defined as a matched document. One example is given as follows:

S: This paper proposes a framework for representing the meaning of phrases and sentences in vector space. Central to our approach is vector composition which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which we evaluate empirically on a sentence similarity task. Experimental results demonstrate that the multiplicative models are superior to the additive alternatives when compared against human judgments.

T: ... As our final set of baselines, we extend two simple techniques proposed by Mitchell and Lapata (2008) that use element-wise addition and multiplication operators to perform composition. The two baselines thus obtained are AVC (element-wise addition) and MVC (element-wise multiplication).

The above example highlights the citing sentence, as well as the most relevant sentence in the cited paper's abstract. To generate non-relevant target examples for a given source abstract, we randomly sample a section of the citing paper that does not have any citation mark pointing to the source paper from which the abstract comes. A given source abstract will thus have positive and negative target sections drawn from the same citing paper. To locate the citation marks, we use Parscit (Councill et al., 2008), a CRF-based system, to parse the reference strings, find the citations in the document body, and split the whole paper into sections. Since citation marks usually have a distinctive lexical pattern, and we do not want our model to be able to learn it, in the pre-processing stage we remove all citation marks while keeping a record of the positions of the actual ones referring to the

given abstract. We then use the sentence tokenizer from StanfordCoreNLP (Manning et al., 2014) to split the abstracts and sections into sentences. We obtain 105,381 document pairs of (source abstract, target section). As previously mentioned, we have definitions of matched sentences and matched documents, so we evaluate our model on retrieval tasks at both the sentence and document level. On the sentence level, we also evaluate our model on ranking tasks to see how good it is at picking relevant sentences. The sentence level task is straightforward: for a given abstract-section document pair, we want to predict the matched sentence. For the document level task, we are given an abstract and a list of sections from the citing papers, and we want to find the matched document (section). In order to avoid having the model learn the particular contents of cited papers, we split the corpus based on the cited abstract, giving 86,424 pairs for training, 9,121 pairs for validation, and 9,836 pairs for testing. For the source documents, we limit the number of sentences to the 90th percentile among all samples by dropping abstracts that have more than 20 sentences; for the target documents, the 90th percentile gives section texts with fewer than 40 sentences.

3.8.2 Models

We compare ABCN with the following models:

TF-IDF TF-IDF (Salton et al., 1982) is a widely used and sometimes strong baseline method. Each sentence is represented as a $|V|$-dimensional vector, where $|V|$ is the vocabulary size. We fit the vocabulary and IDF scores on the training dataset only, and compute the similarity score as the inner product of two vectors.

Jaccard Similarity We use the Jaccard similarity coefficient for the document pairs. Although it does not require training data, we use the validation set to choose the threshold for positive labeling and then run it on the test dataset:

$$J(s_i, t_j) = \frac{|s_i \cap t_j|}{|s_i \cup t_j|} \qquad (3.10)$$
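A tiny sketch of Eq. 3.10 over word sets is shown below; the whitespace tokenization is for illustration only and is not necessarily the tokenization used in the experiments.

```python
# Jaccard similarity coefficient between two sentences, Eq. 3.10.
def jaccard(s_i: str, t_j: str) -> float:
    a, b = set(s_i.lower().split()), set(t_j.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0
```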

Levenshtein Distance We use the Levenshtein (edit) distance for the document pairs. It focuses on syntactic similarity and serves as an indication of how much of the observed similarity is captured at the surface level.

BERT-Base Uncased We use the pretrained uncased BERT-Base model, which has 12 layers, 768 hidden units, 12 attention heads, and 110M parameters, to perform binary classification. Because BERT gives normalized probabilities for the binary labels, we can also evaluate its performance on the ranking tasks.

Fine-tuned BERT Uncased Since BERT-Base is pretrained on corpora whose style of wording differs somewhat from scientific papers, we also fine-tune BERT according to the strategies by which we treat the source documents. The details of the BERT fine-tuning process are described in the following sections.

ABCN We use the above-mentioned pretrained uncased BERT-Base model for feature extraction. We take the output of the second-to-last layer, which has 768 dimensions, as our contextualized word embeddings. We use the validation set to pick the best hyper-parameters and the threshold on the sigmoid output for producing labels. The results we report use a 7-token-wide 1D convolutional layer with 100 filters, a BiLSTM with 128 hidden units, a weight of 15 on the positive labels, and a dropout rate of 0.2 for the dropout layers between the layers of the model. In addition, for the Adam optimizer (Kingma and Ba, 2015) we choose a learning rate of $1 \times 10^{-4}$.

3.8.3 Experiment settings

There is substantial previous research on classifying whether a pair of texts are paraphrases of one another, or whether the first text entails the second. However, no previous work, to the best of our knowledge, has attempted to perform the classification across all sentences in a pair of documents at once. Hence, we compare our method with several models from paraphrase identification or textual entailment that compute pairwise similarity scores, with some modifications. For those models, we need to adapt the training tasks in order for the results to be comparable to ours. Since we do not have sentence annotations on the source documents, these modifications are based on how we pair them with the target documents:

• Whole Source Document vs. Target Sentences

One possible way to train a classifier model on our task is to ignore the lack of sentence annotations in the source document and use the source document as a whole. The input to the classifier model then becomes pairs of the source document and every target sentence. We can keep using the sentence annotations for training and testing in this case, as we consider the positively labeled sentence to be a paraphrase of, or an entailment from, the source document.

• Source Sentences vs. Target Sentences

The other way is to use every sentence in the source document and take the cross product with the sentences in the target document. Since we have no corresponding annotations for the source document, we cannot fine-tune the BERT-Base model on our corpus; instead we train it on the training data of MRPC and test it on our corpus. For inference, for each target sentence we generate a similarity score for each sentence in the source document and use the highest score as a surrogate score between the source document and the target sentence.

SemEval 2014 (Jurgens et al., 2014) included a task on modeling cross-level semantic similarity, one setting of which is the paragraph-to-sentence pair, the same as our first modification. The second modification is natural, since the sentence similarity problem has been widely studied. These two options allow us to compute a similarity score and, if needed, to use the validation dataset to determine the best similarity threshold for labeling a sentence as matched. We can then report evaluations at both the document and sentence level: (1) on the document level, we use macro-averaged recall/precision/F1 to measure, given a source document, how well the model retrieves the matched target document, while (2) on the sentence level, we also use macro-averaged recall/precision/F1 to measure the performance of retrieving the relevant target sentence, as well as some classical ranking metrics, where for a target document we rank the relevance of sentences by the similarity score:

• MAP@k: The classical mean average precision measure where the average precision is defined as the mean of the precision scores after each relevant document is retrieved up to top k documents:

$$\text{Average Precision} = \frac{\sum_r P@r}{R}$$

• MRR@k: The classical mean reciprocal rank measure, which calculates the reciprocal of the rank at which the first relevant document was retrieved, up to the top k documents. A sketch of both metrics is given after this list.
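The sketch below computes both metrics for a single target document from per-sentence similarity scores and binary relevance labels; the function names are ours, and MAP@k and MRR@k are obtained by averaging these values over all target documents.

```python
# Ranking metrics for one target document, given per-sentence scores and labels.
def average_precision_at_k(scores, labels, k=5):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)     # P@r after each relevant item
    total_relevant = sum(labels)
    return sum(precisions) / total_relevant if total_relevant else 0.0

def reciprocal_rank_at_k(scores, labels, k=5):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            return 1.0 / rank
    return 0.0
```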

3.8.4 Document level evaluation

Recall from §3.8.1 that at the document level we are interested in finding the matched target documents from a list of candidates that could be retrieved by off-the-shelf search engines; we consider a target document "matched" when at least one sentence within the document is predicted as such.

Table 3.1: Document level results (macro-averaged recall/precision/F1) of ABCN in comparison with baseline methods, under both the whole source document vs. target sentences and the source sentences vs. target sentences settings.

                          recall   precision   F1
Source Document vs. Target Sentences
  TF-IDF                  0.522    0.686       0.593
  Jaccard Distance        0.067    0.179       0.097
  Levenshtein Distance    0.027    0.027       0.027
  BERT                    0.973    0.547       0.700
  BERT (ARC)              0.455    0.735       0.562
Source Sentences vs. Target Sentences
  TF-IDF                  0.298    0.570       0.391
  Jaccard Distance        0.078    0.229       0.116
  Levenshtein Distance    0.089    0.286       0.136
  BERT                    0.843    0.543       0.661
  BERT (MRPC)             0.886    0.610       0.722
  ABCN                    0.854    0.726       0.785

Table 3.1 shows the results for the document level evaluation. We cannot perform ranking evaluations here, as we are not able to obtain a similarity score for a document pair. From the table we see that the uncased BERT-Base model has the highest recall. However, on closer inspection of the test dataset, 57.59% of the documents are matched documents, which tells us that the uncased BERT-Base model does only slightly better than simply predicting a random sentence to be relevant in every target document. The fine-tuned BERT model achieves the highest precision while having the worst recall among all BERT models. ABCN achieves the best balance between recall and precision, with the highest F1 measure, while both its recall and precision are very close to the best performers. Fine-tuned BERT using the cross product of sentences from the source and target documents achieves the second-best balance on the retrieval task, but has the disadvantage of being trained on a different corpus, MRPC. One might expect it to perform better if given the appropriate training data. However, this is exactly the advantage and novelty of our proposed model, which can consider the document pair as a whole without requiring expensive-to-obtain sentence-level labels on both documents.

3.8.5 Sentence level evaluation

We are also interested in how well a model can select the matched sentence, which in this test setting is defined as the sentence in the target document where the actual citation mark for the source document occurs. All the baseline methods and ABCN assign a similarity score to each sentence in the target document given the source document. We use the validation dataset to determine the threshold for mapping a similarity score to a positive or negative label, and compute macro-averaged recall/precision/F1. We also rank the sentences within the same target document and compute the MAP@k and MRR@k metrics. Table 3.2 shows the sentence level results over all target documents, which includes non-matched documents. This experiment shows how many false negatives (through recall) and false positives (through precision) the models predict for documents without matched sentences. Since the denominator of the recall computation involves true positives and false negatives, we consider a non-matched document to have recall 1 if the numbers of both true positives and false negatives are 0, and to have precision 1 if the numbers of both true positives and false positives are 0. We can see from the table that fine-tuned BERT trained on the source document as a whole achieves the best results here, followed by TF-IDF. BERT-Base has abysmal precision, which also supports our observation from the document level evaluation, namely that it predicts at least one matched sentence for every document. Under both ways of treating the source documents, fine-tuned BERT achieves better precision than the pretrained base model. TF-IDF has a stronger performance on this task than it does on the document level evaluation. It remains to be seen whether the boost for the baseline methods and the slight slippage of ABCN are due to false negatives and false positives on non-matched documents. Therefore, we single out the matched documents and perform the same evaluations, as well as the ranking tasks, since ranking is only meaningful for documents that contain a matched sentence. From Table 3.3 we see that the BERT-Base model indeed underperforms due to its tendency to mark every sentence as matched, as indicated by its almost perfect recall and near-zero precision. Although fine-tuning BERT using the source documents as a whole in the training data boosts the precision, it also adversely affects recall. Since in this corpus we limit source documents to 20 sentences and target documents to 40 sentences, we report the ranking evaluation for the top 5 sentences. The BERT model fine-tuned on our corpus achieves the highest ranking scores, just as Devlin et al. (2018) show that it has the advantage when given appropriate data. The very low performance of the Jaccard and Levenshtein distances shows that the actual matched sentences share very few lexical features with the source document. As for cross joining source sentences and target sentences without labels, we see that none of the baseline methods achieve good precision. Our proposed model achieves the best F1 measure and the highest precision over both experimental settings, while being the runner-up on the ranking tasks. This once again shows that ABCN has the advantage when annotations are given for only one side.

3.8.6 Ablation Test

To test the efficacy of contextualizing the sentence representation with the Bi-LSTM layer in §3.4, we perform an ablation test in which we remove the Bi-LSTM layer from the model. To compare with the proposed model structure, we train and evaluate the ablated model on both the document and sentence level tasks, using the validation set to pick the best hyper-parameters.

Table 3.2: Sentence level results (macro-averaged recall/precision/F1) of ABCN in com- parison with baseline methods under both whole source document vs. target sentences and source sentences vs. target sentences, evaluated on all target documents.

                          recall   precision   F1
Source Document vs. Target Sentences
  TF-IDF                  0.487    0.451       0.468
  Jaccard Distance        0.429    0.430       0.430
  Levenshtein Distance    0.424    0.424       0.424
  BERT                    0.576    0.051       0.093
  BERT (ARC)              0.556    0.551       0.554
Source Sentences vs. Target Sentences
  TF-IDF                  0.449    0.443       0.446
  Jaccard Distance        0.428    0.429       0.428
  Levenshtein Distance    0.427    0.427       0.427
  BERT                    0.144    0.073       0.097
  BERT (MRPC)             0.333    0.188       0.240
  ABCN                    0.521    0.422       0.466

Table 3.3: Sentence level results (macro-averaged recall/precision/F1) of ABCN in com- parison with baseline methods under both whole source document vs. target sentences and source sentences vs. target sentences, evaluated on only matched documents.

                          recall   precision   F1      MAP@5   MRR@5
Source Document vs. Target Sentences
  TF-IDF                  0.214    0.151       0.177   0.423   0.426
  Jaccard Distance        0.012    0.013       0.012   0.320   0.322
  Levenshtein Distance    0.000    0.000       0.000   0.257   0.259
  BERT                    0.997    0.085       0.157   0.376   0.378
  BERT (ARC)              0.270    0.261       0.265   0.641   0.646
Source Sentences vs. Target Sentences
  TF-IDF                  0.087    0.077       0.082   0.406   0.408
  Jaccard Distance        0.014    0.015       0.014   0.292   0.294
  Levenshtein Distance    0.013    0.014       0.013   0.210   0.211
  BERT                    0.153    0.031       0.051   0.098   0.098
  BERT (MRPC)             0.347    0.094       0.147   0.247   0.248
  ABCN                    0.474    0.303       0.370   0.539   0.543

Table 3.4: Ablation test where Bi-LSTM layer is removed to test the efficacy of contextu- alized sentence representation, on both document and sentence level dataset, in comparison to proposed model structure.

                          recall   precision   F1      MAP@5   MRR@5
Document level evaluation
  ABCN                    0.854    0.726       0.785   N/A     N/A
  ABCN-ablated            0.744    0.731       0.737   N/A     N/A
Sentence level evaluation – all target documents
  ABCN                    0.521    0.422       0.466   N/A     N/A
  ABCN-ablated            0.521    0.431       0.472   N/A     N/A
Sentence level evaluation – matched documents
  ABCN                    0.474    0.303       0.370   0.539   0.543
  ABCN-ablated            0.395    0.238       0.297   0.515   0.519

Table 3.4 shows the comparison. From the results, we see that using the sentence representation directly from the CNN layer reduces false positives on unmatched documents, as shown by the improved precision on the document level evaluation and on the sentence level evaluation over all target documents. However, the proposed model outperforms the ablated one in recall and F1 on the document level evaluation. For the purpose of selecting similar sentences given only matched documents, we see the effectiveness of the contextualized sentence representation, as the proposed model beats the ablated model by a good margin on both the retrieval and the ranking metrics.

3.9 Conclusion

Semantic text reuse is just as widespread as lexical reuse. In this chapter, we present an attention-based neural network model with a Siamese structure (ABCN) to capture contextualized sentence representations, model the interactions between sentences, and select a relevant sentence given another document. We evaluate this model on the ACL Anthology corpus, where we try to recover the links between sentences containing a citation and the abstracts of the cited documents. We compare the ABCN model with a state-of-the-art BERT model and show better performance on retrieval tasks. The ABCN model unfortunately shows a higher rate of false positives on unmatched documents, where the lack of an explicit citation is treated as a negative example in our evaluation. While the quality of the data is partly to blame, we think a better attention mechanism might also help to reduce the number of false positives. In addition, as BERT shows better performance than ABCN on the ranking tasks in the sentence-level evaluation that includes only matched documents, we believe there is still room to improve our model to make more accurate predictions. Now that we have developed and evaluated models to detect lexical and semantic local text reuse, we can use these relationships as indirect evidence of social ties and model the structure of social networks and how information propagates through them.

Chapter 4

Modeling information cascades with rich feature sets

In the previous chapters, we showed models that detect shared behavior between actors in social networks in the form of text reuse. Thanks to that shared behavior, we are able to observe sequences of actors, such as newspapers, individuals, websites, political parties, and so on, "adopting" similar or identical information. Modeling the structure of these sequences can help to reveal the dynamics of the underlying network. To achieve that, we present a simple log-linear, edge-factored directed spanning tree (DST) model of cascades over network nodes. This allows us to talk concretely about the likelihood objective for supervised and unsupervised training, for which we present a contrastive objective function. We note that other models besides the DST model could be trained with this contrastive objective. Finally, we derive the gradient of this objective and its efficient computation using Tutte's directed matrix-tree theorem.


4.1 Network Structure Inference

There has been a great deal of work on inferring underlying network structure using information cascades, most of it based on the independent cascade (IC) model (Saito et al., 2008). We evaluate the DST model against a transmission-based model from Rodriguez and Schölkopf (2012), which, similar to our work, also uses directed spanning trees to represent cascades, but employs a submodular parameter-optimization method and fixed activation rates. In addition, we compare our work with a more advanced model from Rodriguez et al. (2014), which uses a generative probabilistic model for inferring both static and dynamic diffusion networks. This is a line of work that starts from generative models with fixed activation rates (Gomez Rodriguez et al., 2010; Rodriguez and Schölkopf, 2012; Myers and Leskovec, 2010) and later develops toward inferring the activation rates between nodes to reveal the network structure (Rodriguez et al., 2011; Gomez Rodriguez et al., 2013; Snowsill et al., 2011; Rodriguez et al., 2013, 2014). Zhai et al. (2015) use a Markov chain Monte Carlo approach for the inference problem. Linderman and Adams (2014) propose a probabilistic model based on mutually-interacting point processes and also use MCMC for inference. Gui et al. (2014) model topic diffusion in multi-relational networks. An interesting approach by Amin et al. (2014) infers the unknown network structure, assuming that detailed timestamps for the spread of the contagion are not observed but that "seeds" for cascades can be identified or even induced experimentally. Wang et al. (2012) propose feature-enhanced probabilistic models for diffusion network inference while still maintaining the requirement that exact propagation times be observed and modeled. Daneshmand et al. (2014) and Abrahao et al. (2013) perform theoretical analyses of transmission-based cascade inference models. While the foregoing approaches are all based on parametric models of propagation time between infections, Rong et al. (2016) experiment with a nonparametric approach to discriminating the distribution of

diffusion times between connected and unconnected nodes. Recently, Brugere et al. (2016) have compiled a survey of methods and applications for different network structure inference problems. Tutte's directed matrix-tree theorem, which plays a key role in our approach, has been used in natural language processing to infer posterior probabilities for edges in nonprojective syntactic dependency trees (Smith and Smith, 2007; Koo et al., 2007; McDonald and Satta, 2007) and to infer semantic hierarchies (i.e., ontologies) over words (Bansal et al., 2014).

4.2 Log-linear Directed Spanning Tree Model

For each cascade, define a set of activated nodes $x = \{x_1, \ldots, x_n\}$, each of which might be associated with a timestamp and other information that serve as input to the model. Nodes thus correspond to (potentially) dateable entities such as webpages or posts, and not to aggregates such as websites or users. Let $y$ be a directed spanning tree over $x$, i.e., a map $y: \{1, \ldots, n\} \to \{0, 1, \ldots, n\}$ from child indices to parent indices of the cascade. In the

range of the mapping $y$ we add a new index 0, which represents a dummy "root" node $x_0$. This allows us both to model single cascades and to disentangle multiple cascades on a set of nodes $x$, since more than one "seed" node might attach to the dummy root. In the experiments below, we model datasets with both single-rooted ("separated") and multi-rooted ("merged") cascades. A valid directed spanning tree is by definition acyclic. Every node has exactly one parent, via the edge $x_{y(i)} \to x_i$, except that the root node has in-degree 0. We might wish to impose additional constraints $C$ on the set of spanning trees: for instance, we might require that edges not connect nodes with timestamps known to be in reverse order. Let

$Y_C$ be the set of all valid directed spanning trees that satisfy the rules in the constraint set $C$ over $x$, and $Y$ be the set of all directed spanning trees over the same sequence $x$ without any constraint imposed.

Define a log-linear model for trees over $x$. The unnormalized probability of a tree $y \in Y$ is thus:

$$\tilde{p}_\theta(y \mid x) = e^{\theta \cdot f(x, y)} \qquad (4.1)$$

where $f$ is a feature vector function on the cascade and $\theta \in \mathbb{R}^m$ parameterizes the model. Following McDonald et al. (2005), we assume that features are edge-factored:

$$\theta \cdot f(x, y) = \sum_{i=1}^{n} \theta \cdot f_x(y(i), i) = \sum_{i=1}^{n} s(y(i), i) \qquad (4.2)$$

where $s(i, j)$ is the score of a directed edge $i \to j$. In other words, given the sequence $x$ and the assumption that the cascade is a directed spanning tree, this directed spanning tree (DST) model assumes that the edges in the tree are all conditionally independent of each other. Despite the constraints they impose on features, we can perform inference with edge-factored models using tractable $O(n^3)$ algorithms, which is one of the advantages this

model brings. Since $\tilde{p}_\theta(y \mid x)$ is not a normalized probability, we divide it by the sum over all possible directed spanning trees, which gives us:

$$p_\theta(y \mid x) = \frac{e^{\theta \cdot f(x, y)}}{\sum_{y' \in Y} e^{\theta \cdot f(x, y')}} = \frac{e^{\theta \cdot f(x, y)}}{Z_\theta(x)} \qquad (4.3)$$

where $Z_\theta(x)$ denotes the sum of the log-linear scores of all directed spanning trees, i.e., the partition function.

If, for a given set of parameters $\theta$, we merely wish to find $\hat{y} = \arg\max_{y \in Y} p_\theta(y \mid x)$, we can pass the scores for each candidate edge $i \to j$ to the Chu-Liu-Edmonds maximum directed spanning tree algorithm (Chu and Liu, 1965; Edmonds, 1967).
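A sketch of this decoding step is shown below, using the networkx implementation of the maximum spanning arborescence (Chu-Liu-Edmonds); the graph construction and the toy score function are ours, standing in for the learned edge scores $s(i, j)$.

```python
# Sketch: decode the highest-scoring cascade tree with Chu-Liu-Edmonds.
import networkx as nx

def best_cascade_tree(n, edge_score):
    G = nx.DiGraph()
    for j in range(1, n + 1):            # candidate child nodes x_1 .. x_n
        for i in range(0, n + 1):        # candidate parents, incl. dummy root 0
            if i != j:
                G.add_edge(i, j, weight=edge_score(i, j))
    # Node 0 has no incoming edges, so it is forced to be the root.
    tree = nx.maximum_spanning_arborescence(G, attr="weight")
    return {j: i for i, j in tree.edges()}   # child -> predicted parent y(j)

# Toy example (illustrative score function only):
parents = best_cascade_tree(3, lambda i, j: 1.0 / (1 + abs(i - j)))
```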

4.3 Likelihood of a cascade

When we observe all the directed links in a training set of cascades, we now have the machinery to perform supervised training with maximum conditional likelihood. We can

simply maximize the likelihood of the true directed spanning tree $p_\theta(y^* \mid x)$ for each cascade in our training set, using the gradient computations discussed below. When we do not observe the true links in a cascade, we need a different objective function. While we cannot restrict the numerator in the likelihood function (4.3) to a single, true tree, we can restrict it to the set of trees $Y_C$ that obey some constraints $C$ on valid cascades. As mentioned above, these constraints might, for instance, require that links point forward in time or avoid long gaps. We can now write the likelihood function for each cascade $x$ as a sum of the probabilities of all directed spanning trees that meet the constraints $C$:

$$L_x = \sum_{y \in Y_C} p_\theta(y \mid x) = \frac{\sum_{y \in Y_C} e^{\theta \cdot f(x, y)}}{Z_\theta(x)} = \frac{Z_{\theta,C}(x)}{Z_\theta(x)} \qquad (4.4)$$

where $Z_{\theta,C}(x)$ denotes the sum of the log-linear scores of all valid directed spanning trees under the constraint set $C$. This is a contrastive objective function that, instead of maximizing the likelihood of a single outcome, maximizes the likelihood of a neighborhood of possible outcomes contrasted with implicit negative evidence (Smith and Eisner, 2005). A similar objective could be used to train other cascade models besides the log-linear DST model presented above, e.g., models such as the Hawkes process in Linderman and Adams (2014). As noted above, cascades on a given set of nodes are assumed to be independent. We

thus have a log-likelihood over all N cascades:

$$\log L_N = \sum_x \log L_x = \sum_x \log \frac{Z_{\theta,C}(x)}{Z_\theta(x)} \qquad (4.5)$$

4.4 Maximizing Likelihood

Our goal is to find the parameters $\theta$ that solve the following maximization problem:

$$\max_\theta \log L_N = \max_\theta \sum_x \big( \log Z_{\theta,C}(x) - \log Z_\theta(x) \big) \qquad (4.6)$$

To solve this problem with quasi-Newton numerical optimization methods such as L-BFGS (Liu and Nocedal, 1989), we need to compute the gradient of the objective function, which

for a given parameter $\theta_k$ is given by the following equation:

$$\frac{\partial \log L_N}{\partial \theta_k} = \sum_x \left( \frac{\partial \log Z_{\theta,C}(x)}{\partial \theta_k} - \frac{\partial \log Z_\theta(x)}{\partial \theta_k} \right) \qquad (4.7)$$

For a cascade that contains $n$ nodes, even if we have a tractable number of valid directed

spanning trees in $Y_C$, there will be $n^{n-2}$ (Cayley's formula) possible directed spanning trees for the normalization factor $Z_\theta(x)$, which makes the computation intractable. Fortunately,

there exists an efficient algorithm that can compute $Z_\theta(x)$, or $Z_{\theta,C}(x)$, in $O(n^3)$ time.

4.5 Matrix-Tree Theorem and Laplacian Matrix

Tutte (1984) proves that for a set of nodes $x_0, \ldots, x_n$, the sum of the scores of all directed spanning trees $Z_\theta(x)$ in $Y$ rooted at $x_j$ is

$$Z_\theta(x) = \big| \hat{L}^{\,j}_{\theta,x} \big| \qquad (4.8)$$

where $\hat{L}^{\,j}_{\theta,x}$ is the matrix produced by deleting the $j$-th row and column from the Laplacian matrix $L_{\theta,x}$. Before we define the Laplacian matrix, we first denote:

$$u_{\theta,x}(j, i) = e^{\theta \cdot f_x(j, i)} = e^{s_{\theta,x}(j, i)} \qquad (4.9)$$

where $j = y(i)$ is the parent of $x_i$ in $y$. Recall that we define the unnormalized score of a spanning tree over $x$ as a log-linear model using edge-factored scores (Eqs. 4.1, 4.2). Therefore, we have:

$$e^{\theta \cdot f(x, y)} = e^{\sum_i \theta \cdot f_x(y(i), i)} = \prod_{i=1}^{n} u_{\theta,x}(y(i), i) \qquad (4.10)$$

where $u_{\theta,x}(j, i)$ represents the multiplicative contribution of the edge from parent $j$ to child $i$ to the total score of the tree.

Now we can define the Laplacian matrix $L_{\theta,x} \in \mathbb{R}^{(n+1) \times (n+1)}$ for directed spanning trees by:

$$[L_{\theta,x}]_{j,i} = \begin{cases} -u_{\theta,x}(j, i) & \text{if edge } (j, i) \in C \\ \sum_{k \in \{0, \ldots, n\},\, k \neq j} u_{\theta,x}(k, i) & \text{if } j = i \\ 0 & \text{if edge } (j, i) \notin C \end{cases} \qquad (4.11)$$

where $j$ indexes a parent node and $i$ a child node. When summing over only the valid directed spanning trees, we place a 0 in every entry whose edge from parent $j$ to child $i$ does not satisfy the specified constraint set. When summing over all possible directed spanning trees, however, the constraint set $C$ is $V \times V$, that is, all possible edges. We can use LU factorization to compute the matrix inverse and the determinant of the Laplacian matrix in $O(n^3)$ time. Meanwhile, the Laplacian matrix is

diagonally dominant, in that we use positive edge scores to create the matrix. The matrix therefore is guaranteed to be invertible.
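A numpy sketch of this computation follows: build the Laplacian of Eq. 4.11 from a matrix of edge scores, delete the root row and column, and take the log-determinant as in Eq. 4.8. The function name is ours, and edges outside the constraint set are assumed to have been zeroed out beforehand, so the same routine yields both $Z_\theta(x)$ and $Z_{\theta,C}(x)$.

```python
# Sketch: log partition function via the Matrix-Tree Theorem (Eqs. 4.8, 4.11).
import numpy as np

def log_partition(u, root=0):
    # u: (n+1, n+1) nonnegative edge scores, rows = parents j, cols = children i;
    # u[j, i] is already 0 for self-loops and for edges outside the constraint set C.
    L = -u.copy()
    np.fill_diagonal(L, u.sum(axis=0) - np.diag(u))   # diagonal: sum_{k != i} u(k, i)
    minor = np.delete(np.delete(L, root, axis=0), root, axis=1)
    sign, logdet = np.linalg.slogdet(minor)           # log |L-hat| in O(n^3)
    return logdet

# log L_x = log Z_{theta,C}(x) - log Z_theta(x), using a constrained and an
# unconstrained score matrix respectively (Eq. 4.4).
```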

4.6 Gradient

Smith and Smith (2007) use a similar inference approach for probabilistic models of nonprojective dependency trees. They derive that for any parameter $\theta_k$,

$$\frac{\partial \log Z_\theta(x)}{\partial \theta_k} = \frac{1}{|L_{\theta,x}|} \sum_{i=1}^{n} \sum_{j=0}^{n} u_{\theta,x}(j, i)\, f_x^{k}(j, i) \times \left( \frac{\partial |L_{\theta,x}|}{\partial [L_{\theta,x}]_{i,i}} - \frac{\partial |L_{\theta,x}|}{\partial [L_{\theta,x}]_{j,i}} \right) \qquad (4.12)$$

Also, for an arbitrary matrix $A$, they derive the gradient of the determinant $|A|$ with respect to any cell $[A]_{j,i}$ using the determinant and the entries of the inverse matrix:

$$\frac{\partial |A|}{\partial [A]_{j,i}} = |A|\,[A^{-1}]_{i,j} \qquad (4.13)$$

Plugging (4.13) into (4.12) gives the final gradient of $\log Z_\theta(x)$ with respect to $\theta_k$:

$$\frac{\partial \log Z_\theta(x)}{\partial \theta_k} = \sum_{i=1}^{n} \sum_{j=0}^{n} u_{\theta,x}(j, i)\, f_x^{k}(j, i) \times \left( [L_{\theta,x}^{-1}]_{i,i} - [L_{\theta,x}^{-1}]_{i,j} \right) \qquad (4.14)$$
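A numpy sketch of Eq. 4.14 is given below, accumulating the gradient from the inverse of the Laplacian minor. Here `feat(j, i)` is a stand-in for the edge feature vector $f_x(j, i)$, and entries involving the deleted root row and column are treated as zero, which is one reasonable reading of the equation.

```python
# Sketch: gradient of log Z_theta(x) via the inverse Laplacian (Eq. 4.14).
import numpy as np

def grad_log_partition(u, feat, root=0):
    n_plus_1 = u.shape[0]
    L = -u.copy()
    np.fill_diagonal(L, u.sum(axis=0) - np.diag(u))
    keep = [k for k in range(n_plus_1) if k != root]
    inv_minor = np.linalg.inv(L[np.ix_(keep, keep)])   # O(n^3), e.g. via LU factorization
    # Re-embed the inverse minor so indices match the full (n+1) x (n+1) Laplacian;
    # entries touching the deleted root row/column are treated as 0.
    Linv = np.zeros_like(L)
    Linv[np.ix_(keep, keep)] = inv_minor
    grad = np.zeros_like(feat(0, 1), dtype=float)
    for i in keep:                       # children 1..n
        for j in range(n_plus_1):        # candidate parents 0..n
            if j != i:
                grad += u[j, i] * feat(j, i) * (Linv[i, i] - Linv[i, j])   # Eq. 4.14
    return grad
```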

4.7 ICWSM 2011 Webpost Dataset

One of the hardest tasks in network inference problems is gathering information about the true network structure. Most existing work has conducted experiments both on synthetic data with different parameter settings and on real-world networks that match the assumptions of the proposed method. Generating synthetic data, however, is less feasible if we want to exploit complex textual features, which would negate one of the advantages of the DST model; generating child text from parent documents is beyond the scope of this paper, although we believe it to be a promising direction for future work. In this paper, therefore, we train and test on documents from the ICWSM 2011 Spinn3r dataset (Burton et al., 2011). This allows us to compare our method with MultiTree (Rodriguez and Schölkopf, 2012) and InfoPath (Rodriguez et al., 2014), both of which output a network given a set of cascades. We also analyze the performance of DST at the cascade level, an ability that MultiTree, InfoPath, and similar methods lack.

4.7.1 Dataset description

The ICWSM 2011 Spinn3r dataset consists of 386 million different web posts, such as blog posts, news articles, and social media content, made between January 13 and February 14, 2011. We first avoid including hyperlinks that connect two posts from the same website, as they could simply be intra-website navigation links. In addition, we enforce a strict chronological order from source post to destination post to filter out erroneous date fields. Then, by backtracing hyperlinks to the earliest ancestors and computing connected components, we obtain about 75 million clusters, each of which serves as a separate cascade. We only keep the cascades containing between 5 and 100 posts, inclusive. This yields 22,904 cascades containing 205,234 posts from 61,364 distinct websites. We create ground truth for each cascade by using the hyperlinks as a proxy for real information flow. For the time-varying network, we include only the edges that appear on a particular day in the corresponding network, while for the static network we simply include any existing edge regardless of the timestamp. Of these cascades, approximately 61% don't have a tree structure, and among the remaining cascades, 84% have flat structures, meaning that for each cascade, all nodes other than the earliest one are in fact attached to the earliest node in the ground truth. In this paper we report the performance of our models on the original dataset and on a dataset where the cascades are guaranteed to be trees. To construct the tree cascades, before finding the connected components using hyperlinks, we remove posts which have hyperlinks coming from more than one source website. Selecting, as above, cascades whose sizes are between 5 and 100 nodes, this yields 20,424 separate cascades containing 201,875 posts from 63,576 distinct websites. We also merge all cascades that start within an hour of each other to make the inference problem more challenging and realistic, since if we don't know the links, we are unlikely to know the membership of nodes in cascades exactly. In the original ICWSM 2011 dataset, we obtain 789 merged cascades, and for the tree-constrained data, we obtain 938 merged cascades. When merging cascades, we only change the links to the dummy root node, and the underlying network structure remains the same. The DST model can learn different parameters depending on whether we train it on separate cascades or merged cascades. We report on comparisons of both with MultiTree and InfoPath.
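The cascade-extraction step just described can be sketched as follows; the tuple layout of the hyperlink records and the use of networkx are assumptions made purely for illustration.

```python
import networkx as nx
from urllib.parse import urlparse

def extract_cascades(hyperlinks, min_size=5, max_size=100):
    """Group posts into cascades via connected components over hyperlinks.

    hyperlinks: iterable of (src_post_url, dst_post_url, src_time, dst_time)
                tuples, where src is the earlier, linked-to post (hypothetical layout).
    """
    g = nx.Graph()
    for src, dst, src_time, dst_time in hyperlinks:
        # Skip intra-website links and links violating chronological order.
        if urlparse(src).netloc == urlparse(dst).netloc:
            continue
        if src_time >= dst_time:
            continue
        g.add_edge(src, dst)
    return [c for c in nx.connected_components(g)
            if min_size <= len(c) <= max_size]
```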

4.7.2 Feature sets

Most existing work on network structure inference described in Related Work uses only the time difference between two nodes as the feature for learning and inference. Our model can incorporate different features, as pointed out in Eq. 4.1 and 4.2. Hence, in this paper we experiment with different features and report on the following sets:

• basic feature sets, which include only the node information and the timestamp difference, which resembles what the other models do; and

• enhanced feature sets, which include the basic feature sets, as well as the languages

that both nodes of an edge use, the content types assigned by Spinn3r (blog, news, etc.), whether a node is the earliest node in the cluster, and the Jaccard distance between the normalized texts of the two nodes.

We use one-hot encoding to represent the feature vectors. In addition, we discretize real-valued features by binning them.
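To make the feature construction concrete, here is an illustrative sketch of assembling a one-hot edge feature vector with binned real-valued features; the field names, bin counts, and vocabulary handling are hypothetical and not the exact features used in the experiments.

```python
import numpy as np

def edge_features(parent, child, vocab, n_bins=10, max_lag_hours=720.0):
    """One-hot feature vector for a candidate edge (parent -> child).

    parent/child: dicts with hypothetical keys 'time', 'lang', 'type', 'text'.
    vocab:        mapping from discrete feature name to a fixed index.
    """
    vec = np.zeros(len(vocab))

    def fire(name):
        if name in vocab:
            vec[vocab[name]] = 1.0

    # Basic feature: binned timestamp difference in hours.
    lag = (child["time"] - parent["time"]) / 3600.0
    fire(f"lag_bin={min(int(n_bins * lag / max_lag_hours), n_bins - 1)}")

    # Enhanced features: languages, content types, binned Jaccard distance.
    fire(f"langs={parent['lang']}|{child['lang']}")
    fire(f"types={parent['type']}|{child['type']}")
    a, b = set(parent["text"].split()), set(child["text"].split())
    jaccard_dist = 1.0 - len(a & b) / max(len(a | b), 1)
    fire(f"jaccard_bin={int(jaccard_dist * n_bins)}")
    return vec
```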

4.7.3 Result of unsupervised learning at cascade level

Table 4.1: Cascade-level inference of DST with different feature sets, in an unsupervised learning setting, compared with the naive attach-everything-to-earliest baseline, for the original cascades extracted from the ICWSM 2011 dataset.

Cascade types        Method          Recall   Precision   F1
Separated Cascades   DST Basic       0.348    0.454       0.394
                     DST Enhanced    0.504    0.658       0.571
                     Naive Baseline  0.450    0.587       0.509
Merged Cascades      DST Basic       0.027    0.035       0.031
                     DST Enhanced    0.036    0.047       0.040
                     Naive Baseline  0.015    0.019       0.017

Table 4.2: Cascade-level inference of DST with different feature sets, in an unsupervised learning setting, compared with the naive attach-everything-to-earliest baseline, for the tree-structure-enforced cascades extracted from the ICWSM 2011 dataset.

Cascade types        Method          Recall   Precision   F1
Separated Cascades   DST Basic       0.622    0.622       0.622
                     DST Enhanced    0.946    0.946       0.946
                     Naive Baseline  0.941    0.933       0.937
Merged Cascades      DST Basic       0.042    0.042       0.042
                     DST Enhanced    0.246    0.246       0.246
                     Naive Baseline  0.043    0.043       0.043

In practice, we use Apache Spark to parallelize the computation and speed up the optimization process. We use batch gradient descent with a fixed learning rate of $5 \times 10^{-3}$ and report the result after 1,500 iterations. Inspecting the results of the last two iterations confirms that all training runs converge. The constraint set C contains edges that satisfy (1) the time constraints, and (2) the restriction that only nodes within the first hour of a specific cascade can be attached to the root.

Table 4.3: Cascade-level inference of DST with different feature sets, in a supervised learning setting, for merged cascades from the tree-structure-enforced ICWSM 2011 dataset.

                       Training                    Test
Merged Cascades        Recall       Precision      Recall       Precision
Basic Feature Set      0.171±0.001  0.171±0.001    0.164±0.007  0.164±0.007
Enhanced Feature Set   0.475±0.002  0.475±0.002    0.455±0.011  0.455±0.011
Naive Baseline         0.042±0.001  0.042±0.001    0.046±0.009  0.046±0.009

The DST model outputs finer-grained structure than existing approaches and predicts a tree for each cascade, with as many edges as nodes. We report micro-averaged recall, precision, and F1 for the whole dataset. Table 4.1 shows the results of training the DST model in an unsupervised setting with different feature sets on both the separate-cascades and the merged-cascades datasets. We also include a naive baseline that simply attaches all other nodes to the earliest node in a cascade. From Table 4.1 we can see the flatness problem: the naive baseline already achieves 45% recall and 58.7% precision, while knowing only the websites and time lags yields 34.8% recall and 45.4% precision. This is partly attributable to the time constraints we apply when creating the Laplacian matrix, which ensure that the model at least gets the earliest node and one of the edges leaving that node right. The enhanced feature set, on the other hand, uses features from the textual content of posts, such as the Jaccard distance. Having this information helps the DST model outperform the naive baseline. In the merged-clusters setting, instead of only one seed per cascade being attached to the implicit root, we have multiple seeds occurring within the same hour attached to the root. Hence, the naive baseline strategy can at most get right the original cascade to which the earliest node belongs. DST with both feature sets achieves better results. We believe that adding more content-based features will further boost performance in the future. We expect, however, that disentangling multiple information flow paths will remain a challenging problem in many domains.

4.7.4 Result of unsupervised learning at network level

In this section, we evaluate effectiveness at inferring the network structure, comparing with MultiTree and InfoPath. The DST model outputs a tree for each cascade with posterior probabilities for each edge. To convert to a network, we sum all posteriors for a given edge to get a combined score, from which we obtain a ranked list of edges between websites. We report two different sets of quantitative measurements: recall/precision/F1 and average precision. When using InfoPath, we assume an exponential edge transmission model and sample 20,000 cascades for the stochastic gradient descent. The output gives the activation rate of each edge in the network per day. We keep those edges that have a non-zero activation rate and are actually present on that day, to counteract the decaying assumption in InfoPath. We then compute the recall/precision/average precision for each day. To compare the DST model with InfoPath on the time-varying network, we pick edges from the DST model's ranked list on each day, matching the number of edges chosen by InfoPath. We exclude MultiTree because it cannot model a dynamic network. Figure 4.1 shows the comparison between InfoPath and the DST model with different feature sets. We can see that the DST model outperforms InfoPath by a large margin on every metric, with the enhanced feature set being the best.
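The conversion from cascade-level posteriors to a ranked list of network edges can be sketched as follows, assuming a hypothetical per-cascade dictionary mapping website pairs to edge posteriors.

```python
from collections import defaultdict

def rank_network_edges(cascade_posteriors, top_k=None):
    """Sum per-cascade edge posteriors into a ranked list of network edges."""
    scores = defaultdict(float)
    for posteriors in cascade_posteriors:
        for edge, p in posteriors.items():     # edge = (src_site, dst_site)
            scores[edge] += p                  # combined score = sum of posteriors
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked
```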

Now we can compare the DST model with MultiTree and InfoPath on the static network. We include every edge in the output of InfoPath. Table 4.4 shows a comparison among the models in a similar way to the comparisons mentioned before, where the number of edges from the DST model equals the total number of edges selected by InfoPath. As for MultiTree, we keep all parameters at their defaults while setting the number of edges to match InfoPath's as well. Since MultiTree assumes a fixed activation rate, while InfoPath gives an activation rate based on the timestep, there is no way to rank the edges in the static networks these methods infer; therefore, we don't report average precision for them. The DST model also outperforms MultiTree and InfoPath in inferring static network structure. Notably, the recall/precision of InfoPath is much higher than its recall/precision per day (Figure 4.1). This is because edges InfoPath correctly selects in the static network might not be correct on any specific day.

Table 4.4: Comparison of MultiTree, InfoPath, and DST on inferring a static network on the original ICWSM 2011 dataset. The DST model is trained and tested unsupervisedly on both separate and merged cascades using different feature sets, alongside the naive attach-everything-to-earliest-node baseline.

Cascade types        Method          Recall   Precision   F1      AP
Separated Cascades   MultiTree       0.367    0.242       0.292   N/A
                     InfoPath        0.414    0.273       0.329   N/A
                     DST Basic       0.557    0.368       0.443   0.279
                     DST Enhanced    0.842    0.556       0.670   0.599
                     Naive Baseline  0.622    0.595       0.608   0.385
Merged Cascades      DST Basic       0.052    0.034       0.041   0.003
                     DST Enhanced    0.057    0.038       0.045   0.003
                     Naive Baseline  0.015    0.019       0.017   0.001

Table 4.5: Comparison of MultiTree, InfoPath, and DST on inferring a static network on the modified ICWSM 2011 dataset with enforced tree structure. The DST model is trained and tested unsupervisedly on both separate and merged cascades using different feature sets, alongside the naive attach-everything-to-earliest-node baseline.

Cascade types        Method          Recall   Precision   F1      AP
Separated Cascades   MultiTree       0.249    0.196       0.220   N/A
                     InfoPath        0.375    0.294       0.330   N/A
                     DST Basic       0.618    0.486       0.544   0.452
                     DST Enhanced    0.950    0.747       0.836   0.915
                     Naive Baseline  0.941    0.933       0.937   0.892
Merged Cascades      DST Basic       0.083    0.065       0.073   0.012
                     DST Enhanced    0.207    0.163       0.182   0.047
                     Naive Baseline  0.043    0.043       0.043   0.005

4.7.5 Enforcing tree structure on the data

In the ICWSM 2011 dataset, 61% of the cascades are non-tree DAGs. Since DST, MultiTree, and InfoPath all assume that cascades are trees, we evaluate performance on data where this constraint is satisfied, i.e., the tree-constrained dataset described above. Table 4.2 shows that the naive baseline for separate cascades achieves 94.1% recall (and 93.3% precision) because of flatness; DST with enhanced features beats it by a mere 0.5%. This leaves very little room for DST to improve on the cascade structure inference problem for separate cascades. For merged cascades, the naive baseline can at most get right the original cascade to which the earliest node belongs. DST with the basic feature set does adequately at finding the earliest nodes but finds very few correct edges inside the cascades, while the enhanced feature set is better at reconstructing the cascade structures thanks to its knowledge of textual features, which gives it a lead of roughly 600%. With only 24.6% recall/precision, there is still room for improvement on this very hard inference problem. On network inference, DST with the enhanced feature set also performs best on recall and average precision but lags on precision.

[Figure 4.1 panels — legend: Enhanced, Basic, InfoPath; x-axis: Day (15–45): (a) Recall on graph data, (b) Precision on graph data, (c) AP on graph data, (d) Recall on tree data, (e) Precision on tree data, (f) AP on tree data.]

Figure 4.1: Recall, precision, and average precision of InfoPath and DST on predicting the time-varying networks generated per day. The DST model is trained unsupervisedly on separate cascades using basic and enhanced features. The upper row uses graph-structured cascades from the ICWSM 2011 dataset. The lower row uses the subset of cascades with tree structures.

Tables 4.4 and 4.5 and Figures 4.1d, 4.1e, and 4.1f show similar performance when comparing with MultiTree and InfoPath on inferring different types of network structure.

4.7.6 Result of supervised learning at cascade level

Our proposed model can perform both supervised and unsupervised learning, with different objective functions. One of the main contributions of the DST model is its ability to learn cascade-level structure in a feature-rich, unsupervised way. However, supervised learning can establish an upper bound for unsupervised performance when trained with the same features. Table 4.3 shows the results of supervised learning using DST on the merged cascades with tree structure enforced. Since there are only 938 merged cascades, we perform 10-fold cross-validation and report the results over 5 folds. We split the training and test sets by interleaved round-robin sampling from the merged-cascades dataset. Although not precisely comparable to DST in the unsupervised setting due to this jackknifing, Table 4.3 still shows results about twice as large as for unsupervised training.

4.8 State Policy Adoption Dataset

In political science, there have been many efforts to conceptualize the diffusion of policy adoptions as a dyadic process, which implies a diffusion network between states. Desmarais et al. (2015) use a dataset of 187 policies introduced by Boehmke and Skinner (2012), apply NetInf (Gomez Rodriguez et al., 2010) to the dataset to infer that diffusion network, and describe the contributions of several political attributes to the network connections. To be consistent with previous experiments, we revisit this dataset and use InfoPath in lieu of NetInf to infer the network.

Table 4.6: Logistic regression of the networks inferred by DST and InfoPath on independent networks: geographic distance between states and contiguity of states. Statistical significance is assessed at the < 0.05 level according to the QAP p-value against the indicated null hypothesis.

                 DST                     InfoPath
                 Coef.    Pr(≥ |b|)      Coef.    Pr(≥ |b|)
Intercept        0.373    0.000          0.455    0.000
Distance         0.026    0.599         -0.039    0.079
Contiguity       0.041    0.341          0.024    0.564

4.8.1 Dataset description

This dataset contains 187 policies, spanning a broad array of policy areas, that were adopted by 50 states in the years 1913–2009. Unfortunately, the dataset does not include the original texts of the policies, so we are not able to use textual similarity to better infer the network. However, there are still other features we can use in the model, such as geolocation information, the population of the corresponding states over time, etc. In addition, we do not have ground truth for the structure of cascades or the diffusion network. Instead, we fit a logistic regression of the inferred networks and test it against the indicated null hypothesis.

4.8.2 Effect of proximity of states

For this dataset, we use the nodes and the logarithms of the time differences as features in our model. Moreover, within each cascade, we do not exclude links between adoptions whose time differences span a very long window. Since this experiment uses a logistic regression on network variables, we generate the network with only 1,000 out of 2,450 possible edges. For InfoPath, we use the same parameter settings as in the previous experiment, except that we have at most 187 cascades to sample for stochastic gradient descent. Because both models infer a dichotomous network, we use logistic regression instead of linear regression of the diffusion network on two network variables: the geographic distance between the centers of states and an indicator variable for the contiguity of states. We then use the QAP (Quadratic Assignment Procedure) to test the coefficients against the null hypothesis. Table 4.6 shows the results; statistical significance is assessed at the < 0.05 level according to QAP p-values derived from 1,000 network permutations. From the two-tailed tests in Table 4.6, we can see that InfoPath has a higher QAP p-value for the contiguity of states than DST, but a very low p-value for geographic distance.
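For readers unfamiliar with QAP, the following sketch illustrates the permutation test for a single network covariate; the `fit_coef` callback (e.g., a logistic regression that returns one coefficient) and the two-tailed p-value computation are illustrative assumptions, not the exact procedure used here.

```python
import numpy as np

def qap_pvalue(y_net, x_net, fit_coef, n_perm=1000, seed=0):
    """QAP permutation test for one covariate network.

    y_net:    n x n binary adjacency matrix of the inferred diffusion network.
    x_net:    n x n covariate matrix (e.g., a contiguity indicator).
    fit_coef: function(y, x) -> fitted regression coefficient for x.
    """
    rng = np.random.default_rng(seed)
    n = y_net.shape[0]
    observed = fit_coef(y_net, x_net)
    null_coefs = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(n)
        # Relabel nodes of the covariate network (rows and columns together).
        null_coefs[b] = fit_coef(y_net, x_net[np.ix_(perm, perm)])
    # Two-tailed p-value: fraction of permuted coefficients at least as extreme.
    return np.mean(np.abs(null_coefs) >= abs(observed))
```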

4.9 Conclusion

We have presented a method to uncover the network structure of information cascades using an edge-factored, conditional log-linear model, which can incorporate more features than most comparable models. This directed spanning tree (DST) model can also infer finer-grained structure at the cascade level, compared to other methods that focus on inferring global network structure. We utilize the directed matrix-tree theorem to show that the likelihood of the conditional model can be computed in cubic time and to derive a contrastive, unsupervised training procedure. We show that on the ICWSM 2011 Spinn3r dataset, our proposed method outperforms the baseline MultiTree and InfoPath methods in terms of recall, precision, and average precision.

One major drawback of the presented model is the assumption that the underlying cascade structure is a directed spanning tree. Networks often exhibit a complex topology and, regardless of the accuracy of the DST model, it can never model the dynamics of multiple parents for a given node. In addition, even though we reduce the time complexity of computing Eq. 4.7 to $O(n^3)$ thanks to Tutte's matrix-tree theorem, the computation time is still significant in practice; therefore, it might not scale to very large real-life networks. Last but not least, evaluating the DST model requires knowledge of the underlying network structure, since it predicts edges between nodes in the network. However, it is often impractical to obtain such knowledge for many types of social networks, for instance networks of public policy diffusion, where no obvious links can be drawn without expensive expert judgments. In the next chapter, we will propose an efficient and scalable model that assumes a directed acyclic graph (DAG) structure at the cascade level and does not require observed network structure, either to train or to evaluate.

Chapter 5

Modeling information cascades using self-attention neural networks

In Chapter 4 we described an unsupervised model for information cascades based on the assumption that the structure of a cascade is a directed spanning tree. This assumption, however, is somewhat limiting. Admittedly, some information is obtained from a single, unique source, but most of the time information comes from various places, and the tree structure fails to capture such dynamics. For instance, an academic paper may pick up an idea that has been circulating by citing a couple of papers, or a news outlet may only decide to publish a piece because multiple prominent outlets have already reported it. We need a better model for such an information diffusion process. What is more, even with the proliferation of cascades that can be observed thanks to the ubiquitous use of online social networks, the transmission probabilities remain unknown even when the underlying network structure is known. This also applies to social networks in which observing the links between nodes is very expensive, if not impossible, in the first place. In these cases, evaluating the efficacy of a model that learns to infer those probabilities is impractical. Instead, we can have a model that learns to predict the

next active nodes, which are observed. This task could also be of interest to several applications. For example, in public policy diffusion networks, a model that predicts links between states would mainly help qualitative analysis such as visualization, whereas predicting the next state that might adopt a certain policy could prove to be a more valuable result, and one whose effectiveness we can actually assess. In viral marketing, bidders are certainly more interested in knowing whether their target group is likely to get the information next than in how the information is passed. In this chapter we describe a neural model (§5.3) that uses the observed relative order of nodes in a cascade to guide the training process, which is unsupervised with regard to the underlying network structure. Our model also assumes that the cascade structure is a directed acyclic graph (§5.2), or DAG, instead of a directed spanning tree. Finally, by the nature of the model, it is easy to include different kinds of nodal side information, such as texts, at both the learning and inference stages. We compare the proposed model in §5.4 to several neural models on both node and edge prediction tasks.

5.1 Node representation learning

Rather than learning a parametric or non-parametric model of distributions defined on the discrete edges of a diffusion graph, many recent efforts embed the representations of nodes, along with structural information, into a latent continuous space. DeepWalk (Perozzi et al., 2014) can be considered one of the earliest examples of such work. They draw an analogy to representation learning models in natural language processing and consider a randomly sampled sequence of nodes to be similar to a sentence of tokens. Hence the learned representation can embed topological information, just as words are defined by their context. Their objective function is similar to Skip-Gram in word2vec (Mikolov et al., 2013b).

Node2vec (Grover and Leskovec, 2016) is a modification of DeepWalk with a small difference in the random walks: it has two parameters that control the probability of exploring a smaller or larger neighborhood. LINE, proposed by Tang et al. (2015) for large-scale network embedding, can preserve first- and second-order proximities. SDNE (Wang et al., 2016) uses a deep autoencoder with multiple non-linear layers to preserve the neighbor structures of nodes. GraphSAGE (Hamilton et al., 2017) is an inductive framework for dynamic network embedding that leverages vertex feature information (e.g., text attributes) to efficiently generate vertex embeddings for previously unseen data. These network embedding methods focus on representation at the network level. GAT (Veličković et al., 2018), or the Graph Attention Network, uses multi-headed attention layers to accentuate the importance of prominent nodes. It also uses the underlying network structure to mask the attention mechanism so that it focuses only on the nodes connected to the current one.

There is another line of work that focuses on learning node representations from the structures of observed cascades. For example, recurrent neural networks (Jordan, 1986; Pearlmutter, 1989; Cleeremans et al., 1989) have been effective for modeling sequential data in natural language processing. DeepCas (Li et al., 2017) uses random-walk-style sampling over a cascade graph and treats the sampled paths as sequences of nodes. It then uses a bidirectional GRU (Cho et al., 2014) to learn a representation of the cascade graph in an end-to-end manner. GraphRNN (You et al., 2018) is a deep autoregressive model that addresses the challenges arising from the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies between edges in a given graph, and approximates any distribution of graphs with minimal assumptions about their structure. However, these models all assume nodes in a linear sequence order, which is hardly the case in practice. Various modified RNN-based units have been proposed to address this drawback. For example, TreeRNN (Tai et al., 2015) allows an LSTM to be applied to a tree-like topology. It is very useful in natural language processing, as sentences usually have corresponding syntactic or semantic dependency structures that form trees. As mentioned earlier, the actual structure of a cascade bears more resemblance to a DAG. DAG-RNNs have received some attention in the literature. In protein structure prediction (Baldi and Pollastri, 2003), a contact map over the amino acids of a protein is decomposed into a DAG by traversing the map in certain directions, and a DAG-RNN is then applied. In computer vision, in order to perform a face recognition task, a region adjacency graph with labeled edges is first extracted from each image; the graph is then decomposed into an edge-labeled DAG by breadth-first search and modeled by an RNN-LE model (Bianchini et al., 2005). For network embedding, recent work on Topo-LSTM (Wang et al., 2017a) modifies the LSTM (Hochreiter and Schmidhuber, 1997) unit so it can be used on a DAG structure. It addresses a limitation of previous DAG-RNN-style models, which can only handle static DAG structures, whereas in information diffusion the DAGs are dynamic and evolve over time. Topo-LSTM will be an important baseline to compare with our proposed model in this chapter.

Most of the models mentioned above require knowledge of the underlying network, either to assign parameters for a sampling strategy or to build the actual structures of cascades. IC-SB (Goyal et al., 2010) and Embedded-IC (Bourigault et al., 2016) are based on the Independent Cascade (IC) framework and can work unsupervisedly, meaning they do not require the underlying network to learn the node representations. DYFERENCE (Ghalebi et al., 2018) is a non-parametric dynamic network model, based on a mixture of coupled hierarchical Dirichlet processes, which allows inference on the evolving community structure in networks. In addition, Zhang et al. (2018) and Chen et al. (2019) show the efficacy of including textual features in helping encode the network topology.

5.2 Information cascades as DAGs

Whether the original underlying network is directed or not, the structure of a certain cascade can be considered as a directed acyclic graph, thanks to the introduction of direction by the relative orders of activated nodes. Formally, we have the underlying network G = (V, E) where V is the node set and E is the edge set. Each input cascade sequence is an ordered

sequence of nodes $c = \{v_1, v_2, \dots, v_t\}$ where $v_t \in V$. If we incorporate side information about nodes, such as raw texts, we may substitute each element of the cascade with a tuple

$(v_t, ts_t, text_t, \dots)$ where $v_t \in V$ and the remaining entries are side-information features; for example, $ts_t$ is the time information and $text_t$ is a representation of the text. The structure of each cascade can be represented as a directed graph. It is easy to see that the structure of a cascade is also acyclic, because directed edges always point from an earlier-activated node to a node that is influenced later. Figure 5.1 illustrates a toy network, with solid arrows and nodes representing the observed cascade. We can see that in the observed cascade, nodes A and D have no incoming arrows, and node D also has no outgoing arrows to any observed node in the cascade. This is because, in practice, it is not realistic to observe the whole network: the initial node in a cascade, and some subsequent nodes, have outside influence that needs to be incorporated into the modeling process. The dashed circles and dashed arrows represent the part of the network that does not participate in the current information cascade.

5.3 Graph self-attention network

The observations we obtain from the dataset are merely the cascade sequences introduced in Section 5.2. This means we need to somehow capture the influence of already-activated nodes on future ones in a cascade. In the meantime, we need gold labels for training the model.

[Figure 5.1: toy network with nodes A–F; solid nodes and arrows mark the observed cascade, dashed ones the unobserved part of the network.]

Figure 5.1: The illustration of a cascade structure as a DAG in a toy network.

Therefore, we choose the observed node activated at time t as our ground truth, and try to predict it using the previously activated nodes in the same cascade. Some, but not all, of those nodes are involved in activating the current node, and we aim to learn such interactions, as they reflect the connections between nodes in the network; they can be considered edges in the network. Here, incorporating an attention mechanism into our model proves useful, as it exposes the "alignment", or a distribution over the roles of existing nodes, which is exactly the goal we want to achieve. In this section, we discuss the overall architecture of our graph self-attention network (GSAN) and the variants we have tested to capture various aspects of the problem.

5.3.1 Analogy to language modeling

In some sense, the problem of node prediction is similar to word prediction in language modeling. A classic N-gram model predicts the next word $w_n$ from the previous $N-1$ words. To formulate the problem, we have

$$p(w_n \mid w_1, \dots, w_{n-1}) = \frac{p(w_1, \dots, w_{n-1}, w_n)}{p(w_1, \dots, w_{n-1})}$$

By using the chain rule of probability we get the joint probability of the word sequence as:

$$p(w_1, \dots, w_n) = p(w_1)\, p(w_2 \mid w_1)\, p(w_3 \mid w_1, w_2) \cdots p(w_n \mid w_1, \dots, w_{n-1})$$

However, the intuition behind N-gram models is that we approximate the history by just the last few words, based on the Markov assumption, instead of using the entire history. Our problem differs from the classical N-gram model in this sense, as we cannot safely assume that only the most recently activated nodes matter. In fact, for real-world datasets such as the ICWSM 2011 webposts (section 4.7.1), the naive attach-everything-to-earliest baseline results in Tables 4.1 and 4.2 show that, in contrast to language modeling, the earliest nodes are usually the more influential ones in cascade structure inference. Therefore, we do not adopt the Markov assumption in our model. Recent work on language modeling utilizing RNNs (Jordan, 1986; Pearlmutter, 1989; Cleeremans et al., 1989), GRUs (Cho et al., 2014), and LSTMs (Hochreiter and Schmidhuber, 1997) aims at using the entire history to improve performance. In GRUs and LSTMs, gate functions control the flow of information and can potentially control which parts of the history are useful. Similar mechanisms are also useful for the node prediction problem. In our assumption, we want to look at the entire history, but we also believe that only some of the nodes in the history are influential in passing information to the current node. Recall that in chapter 3.5 we used an attention mechanism to regulate the information we look at when inferring the current representation. We also want to apply the attention mechanism here. However, RNN-based sequential computation inhibits parallelization and cannot explicitly model long- and short-range dependencies or hierarchy. Notably, the Transformer (Vaswani et al., 2017) solves these disadvantages of RNNs. Language modeling is different from sequence-to-sequence tasks in that there is only one single sequence to work on. We therefore specifically consider self-attention for single-sequence problems.

Thanks to Transformer-based self-attention models (Vaswani et al., 2017), there have been new efforts on language modeling such as OpenAI GPT (Radford et al., 2018), BERT (Devlin et al., 2018), and XLNet (Yang et al., 2019), which achieve state-of-the-art performance on various generation and prediction tasks. Because of the promising results of multi-headed, multi-layer Transformer-based models on single-sequence tasks, we propose a self-attention based neural network with a Transformer-like structure to model the sequence of nodes in a given cascade.

5.3.2 Graph self-attention layer

Our model consists of multiple instances of the same multi-headed self-attention layer. The particular layer structure closely follows the work of Vaswani et al. (2017) on Transformers. The input to the $i$th self-attention layer is the sequence of node features in the cascade,

$$h^{(i)} = \{h^{(i)}_1, h^{(i)}_2, \dots, h^{(i)}_t\}, \quad h^{(i)}_j \in \mathbb{R}^{f},$$

where $t$ is the number of observed nodes in the given cascade and $f$ is the dimension of the feature vector of each node. The layer then produces the output, which is also the input to the next layer if the current layer is not the last,

$$h^{(i+1)} = \{h^{(i+1)}_1, h^{(i+1)}_2, \dots, h^{(i+1)}_t\}, \quad h^{(i+1)}_j \in \mathbb{R}^{f},$$

with the same number of features as the input. Figure 5.2 shows that the structure of our model consists of $L$ identical multi-headed self-attention layers.

For the masked multi-headed self-attention component in each layer, we want to maintain the autoregressive property of the given cascade, but we do not have knowledge of the underlying network structure as Veličković et al. (2018) do. Therefore, given a history of length $t$, we mask out the future nodes from $t+1$ to the end of the cascade. In this way, even though we do not have any structural information, we still compute the attention over influence from the history only. Following Vaswani et al. (2017), we compute the attention function on a set of queries simultaneously, packed together into a matrix $Q$; the keys and values are also packed together into matrices $K$ and $V$.

[Figure 5.2 components, bottom to top: Node features → Position Embedding → L × (Masked Multi-headed Self-attention → Layer Norm → Feed Forward → Layer Norm) → Node prediction.]

Figure 5.2: Graph Self-Attention Network architecture, with L identical multi-headed self-attention layers.

We use the scaled dot-product attention mechanism, and the Transformer block with self-attention can be formulated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{QK^{T}}{\sqrt{f_k}}\right)V \tag{5.1}$$

Adding multi-head attention, we have

$$\mathrm{MultiHead}(h) = \Big(\big\Vert_{k=1}^{K} \mathrm{head}_k\Big) W^{O}, \qquad \mathrm{head}_k = \mathrm{Attention}\big(hW_k^{Q},\, hW_k^{K},\, hW_k^{V}\big) \tag{5.2}$$

where $W_k^{Q}, W_k^{K}, W_k^{V} \in \mathbb{R}^{d \times d_f}$ are different linear transformations, $d$ is a hyperparameter set for the model, $W^{O} \in \mathbb{R}^{K d_f \times d}$ is the corresponding linear projection layer, and $\Vert$ is the concatenation operator. In addition to the self-attention component, each of the layers also contains a fully connected feed-forward network, which is applied separately and identically:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\, W_2 + b_2 \tag{5.3}$$

where each layer has its own set of parameters $W_1, W_2$. Finally, a residual connection (He et al., 2016) is added to each of the computation components, followed by layer normalization (Ba et al., 2016). All these components are combined to form a single graph self-attention layer.
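A minimal PyTorch-style sketch of one such layer is given below; the use of `nn.MultiheadAttention`, the layer sizes, and the post-norm arrangement are illustrative assumptions, not the thesis's implementation.

```python
import torch
import torch.nn as nn

class GraphSelfAttentionLayer(nn.Module):
    """One GSAN layer: masked multi-head self-attention plus a position-wise
    feed-forward network, each with a residual connection and layer norm."""

    def __init__(self, f_dim, n_heads, ffn_dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(f_dim, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(f_dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, f_dim))
        self.norm1 = nn.LayerNorm(f_dim)
        self.norm2 = nn.LayerNorm(f_dim)

    def forward(self, h):
        # h: (batch, t, f_dim) features of the observed cascade prefix.
        t = h.size(1)
        # Causal mask: position i may attend only to positions <= i, since we
        # have no network structure to mask with, only the activation order.
        future = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                       device=h.device), diagonal=1)
        attn_out, _ = self.attn(h, h, h, attn_mask=future)
        h = self.norm1(h + attn_out)          # residual + layer normalization
        return self.norm2(h + self.ffn(h))    # residual + layer normalization
```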

5.3.3 Graph self-attention network

Notice in Figure 5.2 that we also include a position embedding to be added to the input node features. Similar to Gehring et al. (2017) but in contrast to Vaswani et al. (2017), we choose to learn the representations of the positions of each node in the cascade. This Modeling information cascades using self-attention neural networks 84

gives us the flexibility to avoid using often noisy and unreliable timestamps observed from webposts data, but focus on the relative order of the nodes getting activated. Then we run the input through each of the self-attention layer to get the final output. In the self- attention layer, all of the keys, values and queries come from the same place, in this case, the output of the previous layer. For L layers, each output of the L − 1 layers are hidden

(l) representations for input nodes, except for the outputs of the last layer. We use ht as the predicted representation of node to be activated at time step t + 1. The whole network can be formulated as:

(0) h = We + Wp (5.4) h(l) = self-attention-layer(h(l−1)), ∀h ∈ [1,L]

(0) where h is the initial inputs to the first layer, We is the node embedding matrix, and Wp is the position embedding matrix, both of which are trainable variables. Our goal is to project the output of last layer back to node embedding, and to minimize the cross entropy loss

between the predicted representation and the representation of actual activated node ht+1:

(L) T yˆt+1 = Softmax(h We ) N (5.5) P L(yt+1, yˆt+1) = − yt+1,i logy ˆt+1,i i=1 where i iterates all possible nodes. In the experiments below, we will refer to this architecture as GSAN-Basic. Apart from the raw node embeddings, we can also add in other side information such as texts of the nodes. In contrast to Chapter 4, where edge features are used to infer a directed spanning tree structure for cascades, we use nodes as basic units for the model so we will use node- specific features in this chapter. In order to add node features other than embeddings, we can obtain the corresponding feature vectors first, then add or concatenate them into the Modeling information cascades using self-attention neural networks 85 node embedding vectors. Zhang et al. (2018) and Chen et al. (2019) show the efficacy of including textual features in helping encode network topology.

5.3.4 Senders and receivers

In the GSAN-Basic model, we use a single set of universal embeddings to represent nodes. Another seemingly natural choice is to assign different types of representations depending on the role a node plays in the diffusion process. For example, when information diffuses from node a to node b, node a's role as a sender is different from its role as a receiver when it gets information from another node c. Universal embeddings fail to capture such dynamics. Some recent work, such as Bourigault et al. (2016), Wang et al. (2017a), and Kipf et al. (2018), adopts this notion and uses different sets of representations for a given node. In order to incorporate this idea into our GSAN model, we need to change the structure slightly. Figure 5.3 shows the modified architecture, which is similar to the encoder-decoder structure in Vaswani et al. (2017). Basically, we apply a self-attention network to

the input sender embeddings $h^{(0)}_{\text{sender}}$ and obtain the outputs after L layers. We then use these outputs, along with self-attention layers applied to the receiver embeddings, to get the final outputs. For the added attention layers that model the interactions between sender outputs and receiver outputs, the queries come from the previous receiver self-attention layer, and the memory keys and values come from the output of the sender stack. This allows every position in the receiver to attend over all history positions in the input sequence. In the sender and receiver self-attention layers, on the other hand, all of the keys, values, and queries come from the outputs of the corresponding previous layers. The intuition behind such a structure is that we want to learn interactions between nodes that are already activated, by using the self-attention network on senders, and then learn the interactions between nodes as they receive information up to the current time step, as well as how information from the senders affects the receivers. In order to maintain the autoregressive property and prevent future information from flowing into the history, we apply masking to all attention-related sublayers. We refer to this architecture as GSAN-SR.

5.3.5 Hard attention

Even though we mean to use the entire history to provide evidence of potential influencers, there is still a drawback to the attention mechanism: it uses a softmax function, which means it might dilute the contribution of the actual influencers by smoothing out their attention scores. As the history grows longer, it could also become impractical to find the correct sources. Luong et al. (2015) propose a local attention mechanism that selectively focuses on a small window of context and is differentiable. Recently, Yang et al. (2018) argue that there exists localness in self-attention networks and learn a Gaussian bias that indicates the center and scope of the local region to which more attention should be paid. These all suggest a hard attention mechanism that selectively focuses on a few potential nodes in the history, not necessarily in a contiguous window, for node activation prediction. However, finding such a definitive set of nodes has combinatorial time complexity, which is impractical. Wu et al. (2018) propose a dynamic programming method for non-monotonic hard attention on character transduction, but it is still relatively expensive compared to soft attention, with a time complexity of $O(|x| \cdot |y| \cdot |\Sigma_y|)$, where $x$ and $y$ are the inputs/outputs and

$\Sigma_y$ is the output vocabulary. Shen et al. (2018) and Indurthi et al. (2019) propose hard attention mechanisms that use sampling and reinforcement learning for training. Yang et al. (2018) also show that a Transformer-based network tends to capture different scopes at different layers, with higher layers more likely to capture long-range dependencies than the neighboring-word focus of lower layers. We build our hard attention network on GSAN-Basic and replace the masked multi-headed self-attention sublayer in the last ($L$th) layer with a reinforcement learning (RL) agent's action, as shown in Figure 5.4.

[Figure 5.3 components: a sender stack (Sender features → Position Embedding → L × masked multi-headed self-attention layers) and a receiver stack (Receiver features → Position Embedding → L × layers whose masked multi-headed attention additionally attends over the sender outputs), feeding the node prediction.]

Figure 5.3: Graph Sender-Receiver attention network architecture, with L identical multi-headed attention layers for senders and receivers respectively.

[Figure 5.4 components: Node features → Position Embedding → (L − 1) × (Masked Multi-headed Self-attention → Layer Norm → Feed Forward → Layer Norm) → final layer with an RL agent action mask → Node prediction.]

Figure 5.4: Graph hard self-attention network architecture, with L − 1 identical multi-headed attention layers and the last layer replaced with a reinforcement learning agent selecting mask actions.

To obtain the hard attention, for each given history we generate an equal-length vector of binary random variables as the action vector $\alpha = \{\alpha_1, \dots, \alpha_T\}$, where $\alpha_t = 1$ indicates that the node at time $t$ is a potential influencer and $\alpha_t = 0$ tells us to ignore that node. The influential nodes are sampled from a Bernoulli distribution over each $\alpha_t$ for all nodes in the given history. Since this hard selection introduces discrete variables through which gradients cannot be backpropagated, estimating them requires reinforcement learning algorithms. Therefore, we design the reinforcement policy based on the node hidden representations of the $(L-1)$-th layer:

$$\begin{aligned} \pi_{\theta_h}(\alpha_t) &= \sigma\Big(\tanh\big(h^{(L-1)}_{1,\dots,t}\, W_{\mathrm{hard}}\, {h^{(L-1)}_{1,\dots,t}}^{T}\big)\Big) \\ \alpha_t &\sim \mathrm{Bernoulli}\big(\pi_{\theta_h}(\alpha_t)\big) \end{aligned} \tag{5.6}$$

where $\sigma$ is the sigmoid function used to turn the score into a probability in $(0, 1)$ and $\theta_h$ are the parameters of the hard-attention component. The action vector $\alpha$ is used as a mask in place of the original mask that maintains the autoregressive property in Eq. 5.1. We use the accuracy between the predicted cascade sequence and the target cascade sequence as the reward for a given cascade:

$$r(y, \hat{y}) = \sum_{i=1}^{T} \mathbb{1}(y_i = \hat{y}_i) \tag{5.7}$$

Apart from our original loss function, we also need a loss function for estimating the parameters of the policy. Because the goal in reinforcement learning is usually to maximize the expected reward, we define our loss function to be the negative expected reward for the entire sequence:

$$\mathcal{L}_{\theta_h} = -\,\mathbb{E}_{\hat{y}_1, \dots, \hat{y}_T \sim \pi_{\theta_h}(\alpha_1, \dots, \alpha_T)}\big[r(y, \hat{y})\big] \tag{5.8}$$

In practice, one usually approximates this expectation with a single sample from the distribution of actions given by the Bernoulli process. Hence, the derivative of the loss function in Eq. 5.8 is given as follows:

$$\nabla_{\theta_h} \mathcal{L}_{\theta_h} = -\,\mathbb{E}_{\hat{y}_{1\dots T} \sim \pi_{\theta_h}}\big[\nabla_{\theta_h} \log \pi_{\theta_h}(\hat{y}_{1\dots T})\, r(y, \hat{y})\big] \tag{5.9}$$

We approximate the expectation in the gradient $\nabla_{\theta_h} \mathcal{L}_{\theta_h}$ of Eq. 5.9 using the REINFORCE (Williams, 1992) algorithm. This gives us the ability to jointly optimize the original loss function $\mathcal{L}$ in Eq. 5.5 and $\mathcal{L}_{\theta_h}$ in Eq. 5.8. We will refer to this model as GSAN-RL.
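The following sketch illustrates the GSAN-RL step of Eqs. 5.6–5.9 for a single cascade: sampling a Bernoulli action mask from a bilinear policy score and forming a single-sample REINFORCE loss. How the bilinear score matrix is reduced to one probability per position (here, its diagonal) and the separation of the node-prediction logits are simplifying assumptions of this sketch, not the thesis's exact formulation.

```python
import torch

def gsan_rl_step(h_prev, logits, targets, w_hard):
    """One hard-attention step (Eqs. 5.6-5.9), written for a single cascade.

    h_prev:  (t, f) hidden states from layer L-1.
    logits:  (t, n_nodes) node predictions of the final layer (computed elsewhere).
    targets: (t,) gold next-node ids aligned with the predictions.
    w_hard:  (f, f) parameter matrix of the hard-attention policy.
    """
    # Bilinear policy score squashed to (0, 1); one keep-probability per position.
    scores = torch.sigmoid(torch.tanh(h_prev @ w_hard @ h_prev.T))
    probs = scores.diagonal()
    alpha = torch.bernoulli(probs)                 # sampled binary action mask

    # Reward: accuracy of the predicted sequence (Eq. 5.7).
    reward = (logits.argmax(dim=-1) == targets).float().sum()

    # Single-sample REINFORCE estimate of -E[log pi * reward] (Eqs. 5.8-5.9).
    log_pi = (alpha * torch.log(probs + 1e-8)
              + (1 - alpha) * torch.log(1 - probs + 1e-8)).sum()
    rl_loss = -log_pi * reward.detach()
    return alpha, rl_loss
```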

5.3.6 Edge prediction

Apart from predicting the next activated node, we are also interested in predicting edges between nodes, or inferring the network structure. Recall that one justification for using an attention mechanism was the hope that the attention scores could shed light on which nodes in the history might be influential in activating the next one. However, with a multi-headed attention sublayer in the model, it is difficult to extract this information. Therefore, in order to predict edges utilizing attention scores, we replace the multi-headed attention sublayer in the Lth, or the last, layer of our network with a single-head attention sublayer. Figure 5.5 illustrates the changes made to the model.
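As a sketch of how the single-head attention scores might be turned into edge predictions (the thresholding itself is described in §5.4.5), consider the following; the threshold value and the orientation of the attention matrix are assumptions.

```python
import torch

def edges_from_attention(attn, node_ids, threshold=0.1):
    """Read predicted edges off a single-head attention matrix.

    attn:      (t, t) attention weights; attn[i, j] is how strongly position i
               attends to the earlier position j (future positions are masked).
    node_ids:  length-t list of node ids in activation order.
    threshold: validation-tuned cutoff -- an assumed hyperparameter.
    """
    edges = []
    for i in range(1, attn.size(0)):
        for j in range(i):                         # only earlier positions
            if attn[i, j].item() >= threshold:
                # Edge from the earlier (parent) node to the node at position i.
                edges.append((node_ids[j], node_ids[i]))
    return edges
```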

5.4 Experiments

5.4.1 Datasets

In this section we conduct experiments on two datasets to evaluate the performance of our proposed model. We choose two datasets extracted from the Web that have been widely used in prior work on evaluating network inference problems.

[Figure 5.5 panels: (a) GSAN-Basic and (b) GSAN-SR, each with the last layer's multi-headed attention replaced by a single-head attention sublayer.]

Figure 5.5: Modified Graph Self-Attention Networks with the last layer replaced by a single-head attention sublayer to output edge predictions.

The first dataset is the Memetracker (Leskovec et al., 2009) dataset, which tracks new topics, ideas, and "memes" across the Web. The dataset is generated by detecting short and distinctive phrases inside quotation marks and then clustering the mutational variants of phrases using edit distance over word tokens. No direct links between websites or posts were used in the original cluster generation process, only textual features. The published dataset consists of 7,665,108 such memes from between August 2008 and April 2009, drawn from 4,455,215 distinct articles on 259,036 websites. Prior work has employed two variant ways of cleaning the original published dataset. One is provided by Wang et al. (2017a) (Memes (Ver.1)). In their paper they do not describe how they process the data, which yields 5,000 nodes, 313,669 edges, and 54,847 cascades. Since theirs is a supervised model, they need ground-truth network links at training time. They do not use the hyperlinks as indications of edges; instead, they create a link between two websites if one of them appears earlier than the other in any cascade. The major drawback of this dataset is that it omits all side information, meaning we cannot get the raw meme phrases. To overcome this problem, we also create a Memetracker dataset from the raw published data that includes meme phrases (Memes (Ver.2)). Similarly to Simmons et al. (2011), we use phrases extracted from posts which link to other posts within the same cluster, and use the hyperlinks as the edges of the underlying network. We then further filter out short phrases, as many of them (such as "i love you", "world of warcraft", etc.) are very mundane and do not transfer any meaningful information. In their original work, Leskovec et al. (2009) use a meme length of 4 as the cutoff, and we follow their practice here. The second dataset is Digg posts collected by Hogg and Lerman (2012). It contains diffusions of stories as voted on by users, along with the friendship network of the users. The statistics of the datasets we use are listed in Table 5.1. To be consistent with the other datasets, we also split Memes (Ver.2) into 75% for training, wherein 10% are earmarked for validation, and the remaining 25% for testing.

                       Digg        Memes (Ver.1)   Memes (Ver.2)
# Nodes                279,632     5,000           18,402
# Edges                2,617,993   313,669         149,455
# Cascades             3,553       54,847          25,617
Avg. Cascade Length    30.0        17.0            7.5

Table 5.1: Statistics of datasets. Memes (Ver.1) is from Wang et al. (2017a).

Also, since Topo-LSTM limits the length of a cascade to 30, we use only the first 30 nodes of each cascade across all our datasets in order to make a fair comparison among the methods, even though our model can learn from longer cascades without a penalty in training time.

5.4.2 Baselines

We compare GSAN with the following models on the task of predicting node activation:

IC-SB (Goyal et al., 2010) infers the diffusion probability pu,v of each edge (u, v) ∈ E given training cascades, and predicts diffusion under the classical independent cascade (IC) framework. The probability of an inactive node v being activated is the complement of it not being activated by all its neighbors. The static Bernoulli (SB) method in their paper, which shows the best performance for our problem setting, is used in the comparison for the Digg and Memes (Ver.1) datasets.

Embedded-IC (Bourigault et al., 2016) is also based on the IC framework. Instead of directly learning discrete diffusion probabilities, they model the diffusion space as a continuous latent space and learn embedded representations of users, using the distance between user representations to define the transmission likelihood. They distinguish the roles of users in a transmission process as sender and receiver by assigning them to different continuous latent spaces.

DeepCas (Li et al., 2017) represents a diffusion by paths sampled with random walks on the induced diffusion subgraph. The random walk requires knowledge of the transmission probabilities for sampling. They then use a GRU network with an attention mechanism to learn a single representation of the diffusion and a dense layer to predict the cascade size. The last dense layer is replaced with a logistic classifier to predict node activations.

DeepWalk (Perozzi et al., 2014) represents the simple baseline which computes the em- bedding of nodes without using the cascade information, and aggregates the embeddings of the active nodes by average pooling to represent the diffusion. A logistic classifier is used to predict node activations.

Topo-LSTM (Wang et al., 2017a) utilizes the underlying network structure in predicting node activations for each given cascade. It uses a customized LSTM unit, abbreviated Topo-LSTM, to handle the dynamic DAG structures of cascades and performs a node prediction task. We report baseline performance on the Digg and Memes (Ver.1) datasets using the results listed in Wang et al. (2017a). Both IC-SB and Embedded-IC are unsupervised methods, which means they require no knowledge of the underlying networks, while DeepCas, DeepWalk, and Topo-LSTM all require network structures to train the model. We also compare GSAN with the DST method described in Chapter 4 on an edge prediction task. Although edges are hidden variables when training GSAN to predict the next activated nodes, we would like to see its performance compared to a model explicitly proposed for modeling transmission likelihood. This evaluation is performed on the Memes (Ver.2) dataset.

5.4.3 Experimental settings

Given the current state of a cascade, predicting the next activated node can be evaluated as a retrieval problem (Bourigault et al., 2016), due to the large number of target nodes. Both GSAN and the baseline models can output the probabilities of all possible target nodes, which gives us the opportunity to obtain a ranked list based on those probabilities. Having a short list of top-ranked candidates is useful for cascade prediction tasks. Wang et al. (2017a) argue that it could be hard to exactly predict the next active node among the huge list of candidates, so they do not report precision@1 or accuracy in their evaluation. However, we think this metric, along with the other metrics below, shows the potential of our proposed model. For evaluation, we use the following metrics:

• P@1 or accuracy: precision/recall at the top one predicted node, measuring the efficacy of the model at predicting the correct next activated node.

• MAP@k: The classical Mean Average Precision for top k predictions.

• Hits@k: The rate of the top-k ranked nodes containing the next active node.

Similar to Wang et al. (2017a), we use a varying k among {10, 50, 100}. In order to be comparable with the baseline methods, all these metrics are computed micro-averaged. To choose the hyperparameters of our model, we search for the best performance based on the loss function on a validation set, choosing the number of layers l in {1, 2, 3}, the number of features f in {128, 256, 512}, and the number of heads h in {2, 4, 8}, with a linear projection layer of four times the number of features and a learning rate of $1 \times 10^{-4}$. As for the baseline models, for Topo-LSTM and DeepCas the hidden dimensionality d is set to 512. For DeepCas, 200 walks of length 10 are generated for each cascade, the same setting as in Li et al. (2017). For Embedded-IC and DeepWalk, d is set to 64 and 128, respectively.
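These retrieval metrics can be computed as in the sketch below; with a single gold node per prediction, AP@k reduces to the reciprocal rank of the gold node, an assumption of this simplified implementation.

```python
def hits_and_map_at_k(ranked_lists, targets, k=10):
    """Micro-averaged Hits@k and MAP@k for next-node prediction.

    ranked_lists: list of sequences of candidate node ids, sorted by
                  predicted probability (highest first).
    targets:      list of gold next-node ids, aligned with ranked_lists.
    """
    hits, ap_sum = 0, 0.0
    for ranked, gold in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if gold in top_k:
            hits += 1
            ap_sum += 1.0 / (top_k.index(gold) + 1)   # AP@k with one relevant item
    n = len(targets)
    return hits / n, ap_sum / n

# Precision@1 (accuracy) is simply Hits@1.
```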

5.4.4 Node prediction

Table 5.2 shows the aggregated results for the GSAN variants and the baseline models. The table shows a clear uptrend on both metrics as k increases, because the correct node is more likely to be found when more candidates are included. For the Digg dataset, Topo-LSTM is the best performer by quite a large margin compared to the other methods. The GSAN-based variants beat Embedded-IC but fall behind IC-SB. When comparing to the other supervised models, GSAN-SR slightly beats DeepWalk but not DeepCas. Among the variants, the sender/receiver structure achieves better results, perhaps because users in the Digg network show more pronounced differences in behavior across roles. The low numbers across the board for the Digg dataset, compared to Memes (Ver.1), suggest that Digg might present complicated or messy structures for the models to recover. In addition, the relatively low number of cascades (3.5k in total) is not ideal for a parameter-intensive model like GSAN to fit. As for the Memes (Ver.1) dataset, GSAN-Basic and GSAN-RL beat the state-of-the-art Topo-LSTM with regard to MAP@k by a slim margin, while GSAN-SR is on par in performance. Over the other unsupervised and supervised models, GSAN-Basic leads by a large margin (no less than approximately 33%). Given that Topo-LSTM requires the underlying network structure for training, GSAN could be used more widely for networks, such as the distribution of political ideas or state legislative policy diffusion, where obtaining explicit links is quite difficult, if not impossible. For Hits@k, GSAN-Basic is slightly behind, which means it is better at ranking the correct nodes higher when they are retrieved. Table 5.3 shows the accuracy of node prediction among the GSAN variants. The numbers on Digg are not great, while the accuracy of predicting the next active node in Memes (Ver.1) from 5,000 candidates is pleasantly surprising.

Table 5.2: Comparison of variants of GSAN and baseline models on Digg and Memes (Ver.1) datasets. The results of baseline models are from Wang et al. (2017a).

                    Digg                          Memes (Ver.1)
MAP@k (%)           @10       @50       @100      @10       @50       @100
IC-SB               3.624     4.584     4.800     18.220    19.428    19.558
Embedded-IC         2.812     3.564     3.755     18.270    19.247    19.374
DeepWalk            3.288     4.088     4.289     13.523    14.636    14.798
DeepCas             3.743     4.632     4.842     19.564    20.618    20.753
Topo-LSTM           5.862     6.842     7.031     29.000    29.933    30.037
GSAN-Basic          3.086     3.893     4.083     29.324    30.237    30.332
GSAN-RL             3.923     4.714     4.902     29.314    30.207    30.300
GSAN-SR             3.546     4.317     4.503     29.036    29.897    29.988

Hits@k (%)          @10       @50       @100      @10       @50       @100
IC-SB               10.826    33.113    48.412    41.356    65.884    74.868
Embedded-IC         8.887     26.117    39.220    35.124    55.966    65.053
DeepWalk            9.689     29.985    44.342    28.315    51.193    62.617
DeepCas             10.269    30.826    45.741    38.858    60.478    69.921
Topo-LSTM           15.410    37.363    50.384    50.781    69.548    76.850
GSAN-Basic          9.660     28.252    41.792    51.009    69.282    75.869
GSAN-RL             10.879    29.126    42.418    51.094    69.200    75.704
GSAN-SR             9.883     27.737    40.353    50.767    67.951    74.253

Table 5.3: Accuracy of GSAN variants on the Digg and Memes (Ver.1) datasets. Accuracies are listed as percentages.

              Digg     Memes (Ver.1)
GSAN-Basic    0.923    20.102
GSAN-RL       1.282    20.343
GSAN-SR       1.612    19.741

5.4.5 Edge prediction

One reason for our model choice is the hope that it can embed the edges of the network as a hidden variable and reflect them in the attention mechanism. After fitting the model on observed node activations, we extract edges whose attention scores exceed a threshold determined on the validation set. Since our proposed model in Chapter 4 achieves good results on inferring both cascade and network structures, we compare GSAN with DST on the edge prediction task using macro-averaged recall, precision, and F1 score.

Table 5.4 shows the comparison with DST on the Memes (Ver.2) dataset. We choose two naive baselines for inferring cascade-level structure: (1) naive-to-previous, attaching everything to its immediate predecessor, and (2) naive-to-earliest, attaching everything to the earliest node in the cascade. These baselines measure the "broadcastability", or virality, of the cascades. From the table we can see that, for the cascades in Memes (Ver.2), attaching everything to the earliest node performs best, and attaching everything to the immediate predecessor is not far behind. This led us to investigate the distribution of cascade lengths: many cascades have length 2, and for those cascades both naive baselines produce the same prediction, which boosts their performance on macro-averaged metrics. On the right of table 5.4 we show the same evaluation after excluding cascades shorter than 5 nodes. As expected, we see a drop across the board, more so for the naive baselines. DST does not perform as well relative to the naive baselines as it did on the ICWSM dataset (§4.7, tables 4.1 and 4.2). The intuition that edges can be embedded as hidden variables inside the attention mechanism is not borne out by the table, as the GSAN variants trail DST. Both GSAN-Basic and GSAN-SR achieve a better balance between recall and precision. GSAN-RL has a competitive recall compared to DST, which we believe shows a promising effect of using hard attention to focus on selective nodes in the history.

Table 5.4: Comparing GSAN variants against naive baselines and DST on edge prediction for cascade-level structures, using macro-averaged recall, precision, and F1 score. The lengths of cascades in Memes (Ver.2) are restricted to between 2 and 30. The right part of the table further restricts lengths to between 5 and 30.

                     Memes (Ver.2)                 Memes (Ver.2) (>= 5 nodes)
                     Recall    Precision   F1      Recall    Precision   F1
Naive-to-previous    0.650     0.605       0.627   0.393     0.341       0.365
Naive-to-earliest    0.679     0.665       0.672   0.433     0.432       0.432
DST                  0.420     0.482       0.449   0.376     0.440       0.405
GSAN-Basic           0.343     0.338       0.340   0.230     0.222       0.226
GSAN-RL              0.417     0.261       0.321   0.389     0.188       0.254
GSAN-SR              0.335     0.329       0.332   0.232     0.220       0.226

However, the precision of GSAN-RL is also the lowest, indicating that it picks more nodes from the history. Although the GSAN variants do not beat DST on the edge prediction task, we argue this is because the core training task for the GSAN variants focuses on nodes and on learning node representations, while DST fits edges conditioned on the nodes; it is understandable that performance differs given the difference in training tasks. Another possible explanation follows Jain and Wallace (2019): the standard attention mechanism by itself does not provide meaningful explanations and should not be treated as though it does.
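The following sketch illustrates the two procedures compared in table 5.4: the naive attachment baselines and the extraction of cascade-level edges by thresholding attention scores. The attention-matrix layout and the threshold variable tau are illustrative assumptions, not our exact implementation; in practice tau would be tuned on the validation set.

```python
import numpy as np

def naive_edges(cascade, to_earliest=True):
    """Naive baselines: attach each node either to the earliest node in the
    cascade or to its immediate predecessor."""
    src = (lambda i: cascade[0]) if to_earliest else (lambda i: cascade[i - 1])
    return {(src(i), cascade[i]) for i in range(1, len(cascade))}

def attention_edges(cascade, attn, tau):
    """Predict an edge (u -> v) whenever the attention weight from a later
    position back to an earlier one exceeds the threshold tau.

    attn[i, j]: attention weight that position i places on earlier position j (j < i).
    """
    edges = set()
    for i in range(1, len(cascade)):
        for j in range(i):
            if attn[i, j] > tau:
                edges.add((cascade[j], cascade[i]))
    return edges

def prf(pred, gold):
    """Recall, precision, and F1 for one cascade; macro-averaging then takes
    the mean of each quantity over cascades."""
    tp = len(pred & gold)
    r = tp / len(gold) if gold else 0.0
    p = tp / len(pred) if pred else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f
```

For a length-2 cascade the two naive baselines coincide and are trivially correct, which is why restricting the evaluation to cascades of at least 5 nodes lowers their macro-averaged scores the most.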

5.4.6 Effect of texts as side information

Some work—such as the expanded DST model in Chapter 4, Zhang et al. (2018), and Chen et al. (2019)—shows that texts as side information can improve network structure inference. DST uses edge features, and hence we experimented there with text similarities, while both Zhang et al. (2018) and Chen et al. (2019) use text as nodal side information to learn node representations. Since GSAN uses observed nodes to learn node representations, we likewise treat texts as nodal side information.

Table 5.5: Comparison between GSAN variants and their corresponding models with additional nodal text features.

(a) Node prediction on Memes (Ver.2)

                       MAP@k (%)                    Hits@k (%)
                       @10      @50      @100       @10      @50      @100
GSAN-Basic             31.507   31.896   31.958     39.337   47.685   52.044
GSAN-Basic w/ Texts    31.360   31.758   31.823     39.566   48.124   52.733
GSAN-RL                31.000   31.427   31.492     40.546   49.545   54.099
GSAN-RL w/ Texts       30.908   31.333   31.399     40.600   49.743   54.354
GSAN-SR                31.028   31.456   31.520     40.121   49.458   53.898
GSAN-SR w/ Texts       31.183   31.601   31.664     40.438   49.377   53.823

(b) Edge prediction on Memes (Ver.2)

                       Recall   Precision   F1
DST                    0.420    0.482       0.449
DST w/ Texts           0.420    0.482       0.449
GSAN-Basic             0.315    0.319       0.320
GSAN-Basic w/ Texts    0.314    0.318       0.318
GSAN-RL                0.383    0.186       0.250
GSAN-RL w/ Texts       0.416    0.293       0.344
GSAN-SR                0.310    0.315       0.315
GSAN-SR w/ Texts       0.312    0.316       0.317

We choose GloVe (Pennington et al., 2014) word embeddings pretrained on Wikipedia 2014 and Gigaword 5 (6B tokens, 400K vocabulary). We use the 300-dimensional version and apply a fully-connected layer to project the embeddings into the same dimension as the node representations; we then add the two representations to form the new node feature vectors. Table 5.5a compares GSAN variants with and without texts on the Memes (Ver.2) dataset. From the table we can see that including nodal text features has no conclusive advantage over using the node representations alone.

We also conduct a similar comparison of the GSAN variants on the edge prediction task, along with DST. DST is an unsupervised model, but our dataset has a train/validation/test split. To accommodate this, we first extract feature representations for DST over the whole dataset, fit the feature weights on the training data, and then apply those weights to the features of the test set. To include textual features, we use Jaccard distance as an additional feature in DST. From table 5.5b, we can see that DST with texts achieves exactly the same results as the basic feature set alone. For the GSAN variants, the inclusion of textual features generally helps more on edge prediction than on node prediction, and the margin is largest for GSAN-RL, where the text features are most helpful at increasing precision and bring it close to the other two variants. Given the good performance of DST in tables 4.1 and 4.2, our best guess is that in the Memetracker dataset the texts are too short to provide representations powerful enough to improve the results. With a dataset containing longer texts, we might better demonstrate the expected benefit of adding textual features.
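A minimal sketch of how the textual side information is combined with the node representations follows, written in PyTorch. The module and tensor names are illustrative assumptions rather than our exact code; the sketch simply projects an averaged 300-dimensional GloVe vector to the model dimension and adds it to the learned node embedding.

```python
import torch
import torch.nn as nn

class NodeWithTextFeatures(nn.Module):
    def __init__(self, num_nodes, d_model, glove_dim=300):
        super().__init__()
        self.node_emb = nn.Embedding(num_nodes, d_model)   # learned node representations
        self.text_proj = nn.Linear(glove_dim, d_model)     # project GloVe into model space

    def forward(self, node_ids, glove_vectors):
        """node_ids: (batch, seq_len) activated nodes in a cascade.
        glove_vectors: (batch, seq_len, 300) averaged GloVe embeddings of the
        words attached to each activation (e.g., the shared meme text)."""
        node_feats = self.node_emb(node_ids)
        text_feats = self.text_proj(glove_vectors)
        # Summed feature vectors are then fed to the self-attention layers.
        return node_feats + text_feats
```

Adding the projected text vector keeps the input dimensionality of the self-attention layers unchanged, so the variants with and without texts differ only in their input features.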

5.5 Conclusion

In this chapter we propose a self-attention-based model, GSAN, that utilizes cascade data without knowledge of the underlying network structure. It trains on observed node activations instead, while retaining the possibility of predicting edges even though no information about them is given at training time. We also propose two variants of GSAN, one using hard attention and reinforcement learning to train the model, and the other separating the roles of nodes depending on whether they are sending or receiving information. We compare GSAN and its variants with both supervised and unsupervised node representation learning models, some of which also utilize observations of cascades. On the node prediction task, GSAN beats the unsupervised models and is on par with the state-of-the-art supervised model, which requires knowledge of the network structure. We also include textual features to see whether they improve the prediction tasks; however, perhaps because our datasets contain only short texts, we have so far not observed a significant difference in effectiveness when including them. Finally, we evaluate whether the GSAN variants can predict edges learned as a hidden variable during training, comparing them to DST and to two naive attachment baselines. Unfortunately, the GSAN variants underperform the other methods, most likely because their training focuses on a different task than DST's. Notably, due to the nature of the Transformer architecture, GSAN also suffers from a large parameter space that requires substantial training data to fit. With larger candidate lists and edge sets and limited training data in the Digg dataset, our model lags behind Topo-LSTM.

In conclusion, GSAN and its variants perform very well on the node prediction task even without knowing the network structure. This opens the possibility of applying GSAN in real-life applications such as legislative policy diffusion, where it is hard to observe the actual links between actors and predicting the next active actor in the network is of interest. Chapter 6

Conclusion

In this thesis we address several problems related to inferring information cascades from text data: detecting text-sharing behavior, inferring network structure, and learning node representations from observed cascades, together with side information such as texts. Concretely, this thesis makes the following contributions:

1. We propose an n-gram shingling algorithm to detect locally reused passages, instead of near-duplicate documents, embedded within the larger text output of social network nodes. Precision-recall tradeoffs vary with the density of text reuse and the noise introduced by optical character recognition and other features of data collection. We then show the feasibility of using network regression to measure the correlations between connections inferred from text reuse and networks derived from outside information.

2. We propose an attention-based convolutional network to detect semantically similar sentences surrounded by irrelevant contextual texts. The model uses BERT (Devlin et al., 2018) to provide contextualized representations of words. We then use a CNN layer to generate a fixed-length representation of sentences with varied length,


followed by a bidirectional LSTM layer to capture contextualized sentence representations. An attention mechanism is used between the sentence representations from both documents to guide the classifier. We compare our model with pre-trained and fine-tuned BERT models on the ACL Anthology Corpus to show the effectiveness of our model at selecting similar sentences and labeling documents.

3. We propose a method to uncover the network structure of information cascades using an edge-factored, conditional log-linear model, which can incorporate more features than most comparable models based on cascade infection times. This directed spanning tree (DST) model can also infer finer-grained structure at the cascade level, besides inferring global network structure. We utilize the matrix-tree theorem to prove that the likelihood function of the conditional model can be computed in cubic time and to derive a contrastive, unsupervised training procedure (a minimal sketch of this computation appears after this list). We show that on the ICWSM 2011 Spinn3r dataset our proposed method outperforms the baseline MultiTree (Rodriguez and Schölkopf, 2012) and InfoPath (Rodriguez et al., 2014) methods in terms of recall, precision, and average precision. In the future, we expect that applications of this technique could benefit from richer textual features—including full generative models of child document text—and different model structures trained with the proposed contrastive approach.

4. We introduce a network embedding model that uses the observed relative order of nodes in a cascade to guide the training process and is unsupervised with respect to the underlying network structure. Our model assumes that the cascade structure is a DAG rather than a directed spanning tree. Finally, by the nature of the model, it is easy to include nodal side information, such as texts, in the learning and inference stages. The model is a self-attention Transformer-based neural network. We explore three variants of the model, including

hard self-attention and a sender/receiver structure. We use the real-life Digg and Memetracker datasets to evaluate the performance of these models, in comparison with several popular baseline methods, on the task of predicting future nodes in an information cascade. We also evaluate on edge prediction, even though these models are not trained for that task.
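To make the cubic-time likelihood computation referenced in contribution 3 concrete, the following minimal sketch computes the spanning-tree partition function for a single cascade via the directed matrix-tree theorem. It assumes non-negative edge scores have already been produced by some feature function; the names are placeholders and the sketch is not the full DST feature model.

```python
import numpy as np

def log_partition(scores):
    """Log of the total weight of directed spanning trees (arborescences) rooted at node 0.

    scores[u, v] >= 0 is the weight of a candidate edge u -> v; the diagonal and
    edges into the root are ignored.  By the directed matrix-tree theorem, the sum
    over all arborescences of the product of their edge weights equals the
    determinant of the Laplacian with the root row and column removed.
    """
    w = scores.copy()
    np.fill_diagonal(w, 0.0)
    w[:, 0] = 0.0                                # no edges into the root
    laplacian = np.diag(w.sum(axis=0)) - w       # L[v, v] = total weight entering v
    minor = laplacian[1:, 1:]                    # delete root row and column
    sign, logdet = np.linalg.slogdet(minor)      # O(n^3) determinant
    return logdet
```

The contrastive training objective compares this partition function over observed cascades against the one over perturbed "neighborhood" cascades; edge marginals can be obtained from the inverse of the same Laplacian minor at the same cubic cost.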

6.1 Future Work

Progressing towards the goals of this thesis, we have developed models that obtain successful results in experiments on real-life datasets. However, our journey towards better modeling of text-embedded information cascades is far from the finish line, and there are several possibilities for future improvement. In the following sections we present, for each aspect of the research problems addressed in this thesis, some limitations, open problems, and improvements that deserve future research.

6.1.1 Text Reuse

Apart from the lack of ability to model semantic text reuse, we can also try to improve the ability to capture shorter reused text sequences, as noise in shorter texts can significantly obscure the detection of lyric poems, short prose pieces, advertisements, and other repeated texts outside the scope of this thesis.

As for semantic reuse detection, we see many directions for future research. For example, to select relevant sentences, we could use a different attention mechanism, or a different interaction matrix between sentences from different documents, to improve accuracy on the ranking tasks. We could also try to predict the span of semantically similar sentences, analogous to image localization models in computer vision, instead of labeling each sentence. In addition, because it must generate contextualized representations for both words and sentences, our current model suffers from limited runtime efficiency in practice. This renders it impractical for retrieval tasks unless there is a pre-selection of target documents for a query document, or of close document pairs from a more efficient first-round retrieval model. It would be an intriguing direction to propose a model that can be used solely for retrieval purposes.

6.1.2 Network Structure Inference

Many real-life datasets exhibit tree-like structures at the cascade level; indeed, the naive baselines reveal very flat cascade structures with high broadcastability (§4.7). This mitigates the hard limitation imposed by the DST model's assumption that cascade structure is a directed spanning tree. Even though DST achieves better results than the widely used MultiTree and InfoPath models, we can still improve it with richer textual features—including full generative models of child document text—and with different model structures trained with the contrastive approach, such as a joint generative model for texts and edges. Such models could also alleviate the cubic running time of the DST model, which is somewhat impractical if concurrent computation is not supported, or if a given cascade or network is large.

While the GSAN model and its variants fare better in terms of dataset flexibility—no knowledge of the underlying network structure is required—they suffer in performance on a smaller dataset with a large number of candidate nodes and edges, owing to the large parameter space of the Transformer architecture. This might prevent GSAN from being applied to smaller real-life social networks, such as state public policy diffusion, where the numbers of states and legislators are small and the number of messages (e.g., bills) spread across the network is also small, compared to the hundreds of thousands of cascades we might obtain from online social network services. Another possible improvement concerns the observation that nodes activated earlier in a given cascade usually have a larger influence on passing and infecting subsequent nodes; the position embeddings currently learned by GSAN and similar models might not capture such dynamics. We could therefore experiment with position embeddings less closely derived from language modeling. In addition, we could add another neural layer to combine the side information features with the node representations, allowing more non-linear transformation.

Apart from these specific improvements to each of the assumptions about cascade structure, we are missing another important characteristic of a social network: temporal evolution. The representations of nodes and the presence of edges in a social network are constantly changing—new actors join and old actors leave; relationships break and form. Up to now, both the DST and GSAN models focus only on a current snapshot of a network. We could instead learn dynamic node representations to embed different topological or community information across time. Last but not least, snapshots of social networks are often only partial observations; incorporating this into the modeling assumptions can help to better deal with missing nodes and links when modeling cascades and networks. Bibliography

O. Abdel-Hamid, B. Behzadi, S. Christoph, and M. Henzinger. Detecting the origin of text segments efficiently. In Proceedings of the 18th international conference on World wide web, pages 61–70. ACM, 2009.

B. Abrahao, F. Chierichetti, R. Kleinberg, and A. Panconesi. Trace complexity of network inference. In KDD, pages 491–499, 2013.

K. Amin, H. Heidari, and M. Kearns. Learning from contagion (without timestamps). In ICML, pages 1845–1853, 2014.

I. Androutsopoulos and P. Malakasiotis. A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research, 38:135–187, 2010.

S. Arora, Y. Liang, and T. Ma. A simple but tough-to-beat baseline for sentence embed- dings. ICLR, 2017.

J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.

D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR, 2015.

P. Baldi and G. Pollastri. The principled design of large-scale recursive neural network


architectures–dag-rnns and the protein structure prediction problem. Journal of Machine Learning Research, 4(Sep):575–602, 2003.

M. Bansal, D. Burkett, G. de Melo, and D. Klein. Structured learning for taxonomy induc- tion with belief propagation. In ACL, pages 1041–1051, 2014.

Y. Bengio, P. Simard, P. Frasconi, et al. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.

M. Bianchini, M. Maggini, L. Sarti, and F. Scarselli. Recursive neural networks for processing graphs with labelled edges: Theory and applications. Neural Networks, 18(8): 1040–1050, 2005.

S. Bird, R. Dale, B. J. Dorr, B. Gibson, M. T. Joseph, M.-Y. Kan, D. Lee, B. Powley, D. R. Radev, and Y. F. Tan. The acl anthology reference corpus: A reference dataset for bibliographic research in computational linguistics. In Proceedings of Language Resources and Evaluation Conference (LREC 08), 2008.

F. J. Boehmke and P. Skinner. State policy innovativeness revisited. State Politics & Policy Quarterly, 12(3):303–329, 2012.

H. Bonab, H. Zamani, E. Learned-Miller, and J. Allan. Citation worthiness of sentences in scientific reports. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 1061–1064. ACM, 2018.

S. Bourigault, S. Lamprier, and P. Gallinari. Representation learning for information diffusion through social networks: an embedded cascade model. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 573–582. ACM, 2016.

S. Brin, J. Davis, and H. Garcia-Molina. Copy detection mechanisms for digital documents. In ACM SIGMOD Record, pages 398–409. ACM, 1995.

A. Z. Broder. On the resemblance and containment of documents. In Compression and complexity of sequences 1997. proceedings, pages 21–29. IEEE, 1997.

A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8):1157–1166, 1997.

J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah. Signature verification using a "siamese" time delay neural network. In Advances in neural information processing systems, pages 737–744, 1994.

I. Brugere, B. Gallagher, and T. Y. Berger-Wolf. Network Structure Inference, A Survey: Motivations, Methods, and Applications. ArXiv e-prints, Oct. 2016.

K. Burton, N. Kasch, and I. Soboroff. The ICWSM 2011 Spinn3r dataset. In ICWSM, 2011.

R. Carroll, J. B. Lewis, J. Lo, K. T. Poole, and H. Rosenthal. Measuring bias and uncertainty in dw-nominate ideal point estimates via the parametric bootstrap. Political Analysis, 17(3):261–275, 2009.

D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia. Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055, 2017.

M. S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 380–388. ACM, 2002.

L. Chen, G. Wang, C. Tao, D. Shen, P. Cheng, X. Zhang, W. Wang, Y. Zhang, and L. Carin. Improving textual network embedding with global attention via optimal transport. arXiv preprint arXiv:1906.01840, 2019.

Z. Chen, H. Zhang, X. Zhang, and L. Zhao. Quora question pairs, 2018.

K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.

Y. Chu and T. Liu. On the shortest arborescence of a directed graph. Science Sinica, 14: 1396–1400, 1965.

A. Cleeremans, D. Servan-Schreiber, and J. L. McClelland. Finite state automata and simple recurrent networks. Neural computation, 1(3):372–381, 1989.

A. Cohan, W. Ammar, M. van Zuylen, and F. Cady. Structural scaffolds for citation intent classification in scientific publications. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3586–3596, 2019.

I. G. Councill, C. L. Giles, and M.-Y. Kan. Parscit: an open-source crf reference string parsing package. In LREC, volume 8, pages 661–667, 2008.

P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering, 31(5):833–852, 2018.

Z. Dai, C. Xiong, J. Callan, and Z. Liu. Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proceedings of the eleventh ACM international conference on web search and data mining, pages 126–134. ACM, 2018.

H. Daneshmand, M. Gomez-Rodriguez, L. Song, and B. Schoelkopf. Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In ICML, pages 793–801, 2014.

D. Dekker, D. Krackhardt, and T. A. Snijders. Sensitivity of mrqap tests to collinearity and autocorrelation conditions. Psychometrika, 72(4):563–581, 2007.

B. A. Desmarais, J. J. Harden, and F. J. Boehmke. Persistent policy pathways: Inferring diffusion networks in the american states. American Political Science Review, 109(2): 392–406, 2015.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

W. B. Dolan and C. Brockett. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), 2005.

J. Edmonds. Optimum branchings. Journal of Research of the National Bureau of Standards, 71B:233–240, 1967.

T. Elsayed, J. Lin, and D. W. Oard. Pairwise document similarity in large collections with mapreduce. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 265– 268. Association for Computational Linguistics, 2008.

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1243–1252. JMLR.org, 2017.

E. Ghalebi, B. Mirzasoleiman, R. Grosu, and J. Leskovec. Dynamic network model from partial observations. Advances in Neural Information Processing Systems, pages 9862– 9872, 2018.

M. Gomez Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In KDD, pages 1019–1028, 2010.

M. Gomez Rodriguez, J. Leskovec, and B. Schölkopf. Structure and dynamics of informa- tion pathways in online media. In WSDM, pages 23–32, 2013.

A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining, pages 241–250. ACM, 2010.

J. Grimmer and B. M. Stewart. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267–297, 2013.

A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM, 2016.

H. Gui, Y. Sun, J. Han, and G. Brova. Modeling topic diffusion in multi-relational bibliographic information networks. In CIKM, pages 649–658, 2014.

J. Guo, Y. Fan, Q. Ai, and W. B. Croft. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 55–64. ACM, 2016.

D. Gusfield. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge university press, 1997.

W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

N. Heintze et al. Scalable document fingerprinting. In 1996 USENIX workshop on electronic commerce, 1996.

M. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 284–291. ACM, 2006.

T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. Journal of the American society for information science and technology, 54(3):203–215, 2003.

S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8): 1735–1780, 1997.

T. Hogg and K. Lerman. Social dynamics of digg. EPJ Data Science, 1(1):5, 2012.

B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. In Advances in neural information processing systems, pages 2042–2050, 2014.

P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 2333–2338. ACM, 2013.

S. Huston, A. Moffat, and W. B. Croft. Efficient indexing of repeated n-grams. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 127–136. ACM, 2011.

S. R. Indurthi, I. Chung, and S. Kim. Look harder: A neural machine translation model with hard attention. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3037–3043, 2019.

S. Jain and B. C. Wallace. Attention is not explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, 2019.

M. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine. In Proc. of the Eighth Annual Conference of the Cognitive Science Society (Erlbaum, Hillsdale, NJ), 1986.

D. Jurgens, M. T. Pilehvar, and R. Navigli. Semeval-2014 task 3: Cross-level semantic similarity. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pages 17–26, 2014.

N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.

D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of 3rd International Conference on Learning Representations, 2015.

T. Kipf, E. Fetaya, K.-C. Wang, M. Welling, and R. Zemel. Neural relational inference for interacting systems. In International Conference on Machine Learning, pages 2693–2702, 2018.

O. Kolak and B. N. Schilit. Generating links by mining quotations. In Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 117–126. ACM, 2008.

T. Koo, A. Globerson, X. Carreras Pérez, and M. Collins. Structured prediction models via the matrix-tree theorem. In EMNLP-CoNLL, pages 141–150, 2007.

D. Krackardt. Qap partialling as a test of spuriousness. Social networks, 9(2):171–186, 1987.

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

T. Lei, R. Barzilay, and T. Jaakkola. Rationalizing neural predictions. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107– 117, 2016.

J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 497–506. ACM, 2009.

C. Li, J. Ma, X. Guo, and Q. Mei. Deepcas: An end-to-end predictor of information cascades. In Proceedings of the 26th International Conference on World Wide Web, pages 577–586. International World Wide Web Conferences Steering Committee, 2017.

F. Linder, B. A. Desmarais, M. Burgess, and E. Giraudy. Text as policy: Measuring policy similarity through bill text reuse. Policy Studies Journal, 2018.

S. W. Linderman and R. P. Adams. Discovering latent network structure in point process data. In ICML, pages 1413–1421, 2014.

D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. In Mathematical Programming, volume 45, pages 503–528. Springer, 1989.

T. Luong, H. Pham, and C. D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421, 2015.

U. Manber et al. Finding similar files in a large file system. In Usenix Winter, volume 94, pages 1–10, 1994.

C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014. URL http://www.aclweb.org/anthology/P/P14/P14-5010.

D. Margolin, Y.-R. Lin, and D. Lazer. Why so similar?: Identifying semantic organizing processes in large textual corpora. SSRN, 2013.

R. Mastrandrea, J. Fournet, and A. Barrat. Contact patterns in a high school: A comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE, 10(9):e0136497, 2015.

R. McDonald and G. Satta. On the complexity of non-projective data-driven dependency parsing. In IWPT, pages 121–132, 2007.

R. McDonald, K. Crammer, and F. Pereira. Online large-margin training of dependency parsers. In ACL, pages 91–98, 2005.

M. L. McGill. American literature and the culture of reprinting, 1834-1853. University of Pennsylvania Press, 2007.

T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013b.

A. Mittelbach, L. Lehmann, C. Rensing, and R. Steinmetz. Automatic detection of local reuse. In European Conference on Technology Enhanced Learning, pages 229–244. Springer, 2010.

V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In Advances in neural information processing systems, pages 2204–2212, 2014.

S. Myers and J. Leskovec. On the convexity of latent social network inference. In NIPS, pages 1741–1749, 2010.

M. Olsen, R. Horton, and G. Roe. Something borrowed: Sequence alignment and the identification of similar passages in large text collections. Digital Studies/Le champ numérique, 2(1), 2011.

L. Pang, Y. Lan, J. Guo, J. Xu, S. Wan, and X. Cheng. Text matching as image recognition. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.

L. Pang, Y. Lan, J. Guo, J. Xu, J. Xu, and X. Cheng. Deeprank: A new deep architecture for relevance ranking in information retrieval. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 257–266. ACM, 2017.

B. A. Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263–269, 1989.

J. Pennington, R. Socher, and C. Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.

B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. In Proc. of NAACL, 2018.

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving language understanding with unsupervised learning. Technical report, OpenAI, 2018.

R. A. Rensink. The dynamic representation of scenes. Visual cognition, 7(1-3):17–42, 2000.

M. G. Rodriguez and B. Schölkopf. Submodular inference of diffusion networks from multiple trees. In ICML, pages 1–8, 2012.

M. G. Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML, pages 561–568, 2011.

M. G. Rodriguez, J. Leskovec, and B. Schölkopf. Modeling information propagation with survival theory. In ICML, pages 666–674, 2013.

M. G. Rodriguez, J. Leskovec, D. Balduzzi, and B. Schölkopf. Uncovering the structure and temporal dynamics of information propagation. Network Science, 2(01):26–65, 2014.

Y. Rong, Q. Zhu, and H. Cheng. A model-free approach to infer the diffusion network from event cascade. In CIKM, pages 1653–1662, 2016.

K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusion probabilities for independent cascade model. In Knowledge-based intelligent information and engineering systems, pages 67–75. Springer, 2008.

G. Salton, E. A. Fox, and H. Wu. Extended boolean information retrieval. Technical report, Cornell University, 1982.

S. Schleimer, D. S. Wilkerson, and A. Aiken. Winnowing: local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 76–85. ACM, 2003.

M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.

J. Seo and W. B. Croft. Local text reuse detection. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 571–578. ACM, 2008.

T. Shen, T. Zhou, G. Long, J. Jiang, S. Wang, and C. Zhang. Reinforced self-attention network: a hybrid of hard and soft attention for sequence modeling. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4345–4352. AAAI Press, 2018.

Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web, pages 373–374. ACM, 2014.

N. Shivakumar and H. Garcia-Molina. Scam: A copy detection mechanism for digital documents. In Theory and Practice of Digital Libraries, 1995.

N. Shivakumar and H. Garcia-Molina. Finding near-replicas of documents on the web. In International Workshop on the World Wide Web and Databases, pages 204–212. Springer, 1998.

M. P. Simmons, L. A. Adamic, and E. Adar. Memes online: Extracted, subtracted, injected, and recollected. In Fifth international AAAI conference on weblogs and social media, 2011.

D. A. Smith and N. A. Smith. Probabilistic models of nonprojective dependency trees. In EMNLP-CoNLL, 2007.

N. A. Smith and J. Eisner. Contrastive estimation: Training log-linear models on unlabeled data. In ACL, 2005.

T. F. Smith, M. S. Waterman, et al. Identification of common molecular subsequences. Journal of molecular biology, 147(1):195–197, 1981.

T. M. Snowsill, N. Fyson, T. De Bie, and N. Cristianini. Refining causality: Who copied from whom? In KDD, pages 466–474, 2011.

J. C. Stack, S. Bansal, V. A. Kumar, and B. Grenfell. Inferring population-level contact heterogeneity from common epidemic data. Journal of the Royal Society Interface, page rsif20120578, 2012.

C. Suen, S. Huang, C. Eksombatchai, R. Sosic, and J. Leskovec. Nifty: a system for large scale information flow tracking and clustering. In Proceedings of the 22nd international conference on World Wide Web, pages 1237–1248. ACM, 2013.

K. S. Tai, R. Socher, and C. D. Manning. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1556–1566, 2015.

J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. International World Wide Web Conferences Steering Committee, 2015.

W. T. Tutte. Graph Theory, volume 21 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, 1984.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.

P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. ICLR, 2018.

S. Wan, Y. Lan, J. Guo, J. Xu, L. Pang, and X. Cheng. A deep architecture for semantic matching with multiple positional sentence representations. In Thirtieth AAAI Conference on Artificial Intelligence, 2016a.

S. Wan, Y. Lan, J. Xu, J. Guo, L. Pang, and X. Cheng. Match-srnn: Modeling the recursive matching structure with spatial rnn. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2922–2928, 2016b.

A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding. EMNLP 2018, page 353, 2018.

D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1225–1234. ACM, 2016.

J. Wang, V. W. Zheng, Z. Liu, and K. C.-C. Chang. Topological recurrent neural network for diffusion prediction. In Data Mining (ICDM), 2017 IEEE International Conference on, pages 475–484. IEEE, 2017a.

L. Wang, S. Ermon, and J. E. Hopcroft. Feature-enhanced probabilistic models for diffusion network inference. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 499–514, 2012.

Z. Wang, W. Hamza, and R. Florian. Bilateral multi-perspective matching for natural language sentences. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 4144–4150, 2017b.

R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.

S. Wu, P. Shapiro, and R. Cotterell. Hard non-monotonic attention for character-level transduction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4425–4438, 2018.

Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.

C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval, pages 55–64. ACM, 2017.

I. Z. Yalniz, E. F. Can, and R. Manmatha. Partial duplicate detection for large book collections. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 469–474. ACM, 2011.

B. Yang, Z. Tu, D. F. Wong, F. Meng, L. S. Chao, and T. Zhang. Modeling localness for self-attention networks. arXiv preprint arXiv:1810.10182, 2018.

Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237, 2019.

W. Yin, H. Schütze, B. Xiang, and B. Zhou. Abcnn: Attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for Computational Linguistics, 4:259–272, 2016.

J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, pages 5694–5703, 2018.

X. Zhai, W. Wu, and W. Xu. Cascade source inference in networks: A Markov chain Monte Carlo approach. Computational Social Networks, 2(1), 2015.

X. Zhang, Y. Li, D. Shen, and L. Carin. Diffusion maps for textual network embedding. arXiv preprint arXiv:1805.09906, 2018.

Y. Zhang, I. Marshall, and B. C. Wallace. Rationale-augmented convolutional neural networks for text classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 2016, page 795. NIH Public Access, 2016.