Rumor Detection on Social Media: Datasets, Methods and Opportunities
Quanzhi Li, Qiong Zhang, Luo Si, Yingchi Liu
Alibaba Group, US
Bellevue, WA, USA
{quanzhi.li, qz.zhang, luo.si, yingchi.liu}@alibaba-inc.com

Abstract

Social media platforms have been used for information and news gathering, and they are very valuable in many applications. However, they also lead to the spreading of rumors and fake news. Many efforts have been made to detect and debunk rumors on social media by analyzing their content and social context using machine learning techniques. This paper gives an overview of recent studies in the rumor detection field. It provides a comprehensive list of datasets used for rumor detection, and reviews the important studies based on what types of information they exploit and the approaches they take. More importantly, we also present several new directions for future research.

1 Introduction

Rumors can spread very quickly over social media platforms, and rumor detection has recently gained great interest in both academia and industry. Government authorities and social media platforms are also making efforts to counter the negative impact of rumors. In the following subsections, we first introduce the rumor detection definition, the problem statement, and user stance, an important concept for the rest of this paper.

1.1 Rumor Detection

Different publications may define "rumor" differently, and this lack of consistency makes head-to-head comparison between existing methods difficult. In this survey, a rumor is defined as a statement whose truth value is true, unverified or false (Qazvinian et al., 2011). When a rumor's veracity value is false, some studies call it a "false rumor" or "fake news". However, many previous studies give "fake news" a stricter definition: fake news is a news article published by a news outlet that is intentionally and verifiably false (Vosoughi et al., 2018; Shu et al., 2017a; Cao et al., 2018). The focus of this study is rumors on social media, not fake news. There are also different definitions of rumor detection. In some studies, rumor detection is defined as determining whether a story or online post is a rumor or non-rumor (i.e., a real story or a news article), and the task of determining the veracity of a rumor (true, false or unverified) is defined as rumor verification (Zubiaga et al., 2016; Kochkina et al., 2018). In this survey paper, as in (Ma et al., 2016; Cao et al., 2018; Shu et al., 2017; Zhou et al., 2018), rumor detection is defined as determining the veracity value of a rumor, which is the same as rumor verification as defined in those other studies.

1.2 Problem Statement

The rumor detection problem is defined as follows. A story x is defined as a set of n related messages M = {m_1, m_2, ..., m_n}, where m_1 is the source message (post) that initiated the message chain, which may be a tree structure with multiple branches. Each message m_i has attributes representing its content, such as text and image. Each message is also associated with the user who posted it, and the user has a set of attributes, including name, description, avatar image, past posts, etc. The rumor detection task is then defined as: given a story x with its message set M and user set U, determine whether the story is true, false or unverified (or just true or false for datasets with only two labels). This definition formulates rumor detection as a veracity classification task, and it is the same as the definition used in many studies (Cao et al., 2018; Shu et al., 2017b; Ma et al., 2016; Zhou et al., 2018).
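To make the problem statement concrete, the sketch below shows one possible way to represent a story, its messages and users, and the veracity classification interface. All class and field names here are illustrative assumptions made for this sketch; they are not part of any specific dataset or system described in this survey.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative data structures for the rumor detection task defined above.
# Field names are assumptions for this sketch, not a standard schema.

@dataclass
class User:
    name: str
    description: str = ""
    avatar_url: Optional[str] = None
    past_posts: List[str] = field(default_factory=list)

@dataclass
class Message:
    text: str                        # textual content of the post or reply
    user: User                       # the user who posted it
    image_urls: List[str] = field(default_factory=list)
    parent_index: Optional[int] = None  # None for the source message m_1; otherwise index of the parent in the reply tree

@dataclass
class Story:
    messages: List[Message]          # M = {m_1, ..., m_n}; messages[0] is the source post

VERACITY_LABELS = ("true", "false", "unverified")

def detect_rumor(story: Story) -> str:
    """Veracity classification: map a story (source post plus responses) to a label.
    A real system would replace this stub with a trained classifier."""
    raise NotImplementedError
```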
1.3 User Stance

User responses to a source post (the first message) have been exploited in some rumor detection models. Most studies use four stance categories: supporting, denying, querying and commenting. Some studies have used stance information explicitly in their rumor detection models and have shown large performance improvements (Liu et al., 2015; Enayet and El-Beltagy, 2017; Ma et al., 2018a; Kochkina et al., 2018), including the two systems, (Enayet and El-Beltagy, 2017) and (Li et al., 2019a), that were ranked No. 1 in the SemEval 2017 and SemEval 2019 rumor detection tasks, respectively. Stance detection is not the focus of this paper, but stance information has been used explicitly or implicitly in many rumor detection models, and in the next section we will also discuss multi-task learning approaches that jointly learn stance detection and rumor detection.

In the following sections, we will 1) introduce a comprehensive list of datasets for rumor detection, 2) discuss the research efforts categorized by the information and approaches they use, and 3) present several directions for future research.

2 Datasets and Evaluation Metrics

2.1 Datasets

Datasets vary depending on which platforms the data are collected from, what types of content are included, whether propagation information is recorded, and so on. Table 1 lists the datasets for rumor detection.

Dataset           | Total rumors (claims) | Text | User info | Time stamp | Propagation info | Platform          | Description
PHEME-R           | 330                   | y    | y         | y          | y                | Twitter           | Tweets from [Zubiaga et al., 2016]
PHEME             | 6,425                 | y    | y         | y          | y                | Twitter           | Tweets from [Kochkina et al., 2018]
Ma-Twitter        | 992                   | y    | y         | y          |                  | Twitter           | Tweets from [Ma et al., 2016]
Ma-Weibo          | 4,664                 | y    | y         | y          |                  | Weibo             | Weibo data from [Ma et al., 2016]
Twitter15         | 1,490                 | y    | y         | y          | y                | Twitter           | Tweets from [Liu et al., 2015; Ma et al., 2016]
Twitter16         | 818                   | y    | y         | y          | y                | Twitter           | Tweets from [Ma et al., 2017b]
BuzzFeedNews      | 2,282                 | y    |           |            |                  | Facebook          | Facebook data from [Silverman et al., 2016]
SemEval19         | 325                   | y    | y         | y          | y                | Twitter, Reddit   | SemEval 2019 Task 7 data set
Kaggle Emergent   | 2,145                 | y    |           |            |                  | Twitter, Facebook | Kaggle rumors based on Emergent.info
Kaggle Snopes     | 16.9K                 | y    |           |            |                  | Twitter, Facebook | Kaggle rumors based on Snopes.com
Facebook Hoax     | 15.5K                 | y    | y         | y          |                  | Facebook          | Facebook data from [Tacchini et al., 2017]
Kaggle PolitiFact | 2,923                 | y    | y         | y          | y                | Twitter           | Kaggle rumors based on PolitiFact
FakeNewsNet       | 23,196                | y    | y         | y          | y                | Twitter           | Dataset from [Shu et al., 2019], enhanced from PolitiFact and GossipCop

Table 1: Datasets for rumor detection and their properties

There are also other datasets for fake news detection. Because this paper focuses on rumor detection on social media, and those datasets target only fake news detection and lack social context information (e.g., user responses, user data, and propagation information), we do not list them here. The datasets in Table 1 were collected from four social media platforms: Twitter, Facebook, Reddit and Weibo. Weibo is a Chinese social media platform with over 400 million users, and it is very similar to Twitter. More than half of these datasets have three veracity labels: true, false and unverified; the others have only two labels, true and false. Among these datasets, PHEME-R was used in the SemEval 2017 rumor detection task and SemEval19 was used in the SemEval 2019 rumor detection task (Gorrell et al., 2019). The dataset links are listed below, followed by a sketch of how a thread record with these properties might be represented.

• PHEME-R: https://figshare.com/articles/PHEME_rumour_scheme_dataset_journalism_use_case/2068650
• PHEME: https://figshare.com/articles/PHEME_dataset_for_Rumour_Detection_and_Veracity_Classification/6392078
• Ma-Twitter: http://alt.qcri.org/~wgao/data/rumdect.zip
• Ma-Weibo: http://alt.qcri.org/~wgao/data/rumdect.zip
• Twitter15: https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0
• Twitter16: https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0
• BuzzFeedNews: https://github.com/BuzzFeedNews/2016-10-facebook-fact-check
• SemEval19: https://competitions.codalab.org/competitions/19938#learn_the_details-overview
• Kaggle Emergent: https://www.kaggle.com/arminehn/rumor-citation
• Kaggle Snopes: https://www.kaggle.com/arminehn/rumor-citation
• Facebook Hoax: https://github.com/gabll/some-like-it-hoax/tree/master/dataset
• Kaggle PolitiFact: https://www.kaggle.com/arminehn/rumor-citation
• FakeNewsNet: https://github.com/KaiDMML/FakeNewsNet
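As a rough illustration of the properties listed in Table 1 (text, user info, time stamps, propagation info), the sketch below loads rumor threads from a hypothetical JSON Lines file. The file layout and field names (claim_id, label, posts, user, timestamp, parent) are assumptions for this sketch only and do not match the actual formats of the datasets above, each of which has its own schema and terms of use.

```python
import json
from datetime import datetime, timezone

# Minimal loader for a hypothetical rumor-thread file in JSON Lines format.
# Field names are assumptions for this sketch; real datasets differ.

def load_threads(path):
    """Yield (claim_id, label, posts) tuples from a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            posts = []
            for p in record["posts"]:
                posts.append({
                    "text": p["text"],                      # textual content
                    "user": p.get("user", {}),              # user profile info
                    "time": datetime.fromtimestamp(p["timestamp"], tz=timezone.utc),
                    "parent": p.get("parent"),              # None for the source post
                })
            yield record["claim_id"], record["label"], posts

# Example usage (assuming a file "threads.jsonl" in this hypothetical format):
# for claim_id, label, posts in load_threads("threads.jsonl"):
#     print(claim_id, label, len(posts))
```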
2.2 Evaluation Metrics

Most existing approaches treat rumor detection as a classification problem, usually either binary (true or false) or multi-class (true, false or unverified). The evaluation metrics used most often are precision, recall, F1 and accuracy. Because some datasets are skewed, the macro F1 measure gives a better view of an algorithm's performance over all classes. Here we briefly describe them. For each class C, we calculate its precision (p), recall (r) and F1 score as follows:

p = (No. of rumors predicted as C correctly) / (No. of rumors predicted as C)    (1)

… path. A few of them also explicitly incorporate user stance in their models. It also shows that almost all of the most recent studies utilize neural networks in their models. Due to space limitations, we describe only the representative studies in this paper.

3.1 Approaches Using Content Information

Textual Content. Text content is utilized by almost all previous studies on rumor detection. It includes the source post and all user replies. According to deception style theory, the content style of deceptive information that aims to deceive readers should be somewhat different from that of the truth, e.g., using exaggerated expressions or strong emotions. From user response text, we can also explore the stance and opinions of users towards rumors.

Generally, text features can be grouped into attribute-based and structure-based features (Zhou and Zafarani, 2018). Attribute-based features include quantity (words, nouns, verbs, phrases, etc.), uncertainty (number of question marks, quantifiers, tentative terms, modal terms), subjectivity (percentage of subjective verbs, imperative …
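To make attribute-based text features of the kind just listed concrete, the sketch below computes a few simple counts and ratios (quantity, question marks, tentative and modal terms) from a single post's text. The word lists and feature choices are illustrative assumptions, not the feature sets used in the cited studies.

```python
import re

# Small, illustrative lexicons; real systems use much larger curated lists.
TENTATIVE_TERMS = {"maybe", "perhaps", "possibly", "allegedly", "reportedly"}
MODAL_TERMS = {"might", "could", "may", "should", "would"}

def attribute_features(text):
    """Compute a few simple attribute-based text features for one post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero for empty text
    return {
        "num_words": len(tokens),                                    # quantity
        "num_question_marks": text.count("?"),                       # uncertainty
        "tentative_ratio": sum(t in TENTATIVE_TERMS for t in tokens) / n,
        "modal_ratio": sum(t in MODAL_TERMS for t in tokens) / n,
        "num_exclamation_marks": text.count("!"),                    # emotional intensity
    }

# Example:
# attribute_features("BREAKING: the bridge has allegedly collapsed?!")
```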