Fake News Detection on Social Media: A Data Mining Perspective

Kai Shu†, Amy Sliva‡, Suhang Wang†, Jiliang Tang♮, and Huan Liu†
† Computer Science & Engineering, Arizona State University, Tempe, AZ, USA
‡ Charles River Analytics, Cambridge, MA, USA
♮ Computer Science & Engineering, Michigan State University, East Lansing, MI, USA
†kai.shu, suhang.wang, [email protected]; ‡[email protected]; ♮[email protected]

ABSTRACT

Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.

1. INTRODUCTION

As an increasing amount of our lives is spent interacting online through social media platforms, more and more people tend to seek out and consume news from social media rather than traditional news organizations. The reasons for this change in consumption behaviors are inherent in the nature of these social media platforms: (i) it is often more timely and less expensive to consume news on social media compared with traditional news media, such as newspapers or television; and (ii) it is easier to further share, comment on, and discuss the news with friends or other readers on social media. For example, 62 percent of U.S. adults got news on social media in 2016, while in 2012, only 49 percent reported seeing news on social media¹. It was also found that social media now outperforms television as the major news source². Despite the advantages provided by social media, the quality of news on social media is lower than that of traditional news organizations. However, because it is cheap to provide news online and much faster and easier to disseminate through social media, large volumes of fake news, i.e., those news articles with intentionally false information, are produced online for a variety of purposes, such as financial and political gain. It was estimated that over 1 million tweets were related to the fake news story "Pizzagate"³ by the end of the presidential election. Given the prevalence of this new phenomenon, "fake news" was even named the word of the year by the Macquarie dictionary in 2016.

The extensive spread of fake news can have a serious negative impact on individuals and society. First, fake news can break the authenticity balance of the news ecosystem. For example, it is evident that the most popular fake news was even more widely spread on Facebook than the most popular authentic mainstream news during the 2016 U.S. presidential election⁴. Second, fake news intentionally persuades consumers to accept biased or false beliefs. Fake news is usually manipulated by propagandists to convey political messages or influence. For example, some reports show that Russia has created fake accounts and social bots to spread false stories⁵. Third, fake news changes the way people interpret and respond to real news. For example, some fake news was created just to trigger people's distrust and confuse them, impeding their ability to differentiate what is true from what is not⁶. To help mitigate the negative effects caused by fake news, both to benefit the public and the news ecosystem, it is critical that we develop methods to automatically detect fake news on social media.

¹ http://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/
² http://www.bbc.com/news/uk-36528256
³ https://en.wikipedia.org/wiki/Pizzagate_conspiracy_theory
⁴ https://www.buzzfeed.com/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook?utm_term=.nrg0WA1VP0#.gjJyKapW5y
⁵ http://time.com/4783932/inside-russia-social-media-war-america/
⁶ https://www.nytimes.com/2016/11/28/opinion/fake-news-and-the-internet-shell-game.html?_r=0

Detecting fake news on social media poses several new and challenging research problems. Though fake news itself is not a new problem (nations or groups have been using the news media to execute propaganda or influence operations for centuries), the rise of web-generated news on social media makes fake news a more powerful force that challenges traditional journalistic norms. There are several characteristics of this problem that make it uniquely challenging for automated detection. First, fake news is intentionally written to mislead readers, which makes it nontrivial to detect simply based on news content. The content of fake news is rather diverse in terms of topics, styles and media platforms, and fake news attempts to distort truth with diverse linguistic styles while simultaneously mocking true news. For example, fake news may cite true evidence within the incorrect context to support a non-factual claim [22]. Thus, existing hand-crafted and data-specific textual features are generally not sufficient for fake news detection. Other auxiliary information must also be applied to improve detection, such as knowledge bases and user social engagements. Second, exploiting this auxiliary information actually leads to another critical challenge: the quality of the data itself. Fake news is usually related to newly emerging, time-critical events, which may not have been properly verified by existing knowledge bases due to the lack of corroborating evidence or claims. In addition, users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy [79]. Effective methods to differentiate credible users, extract useful post features and exploit network interactions are an open area of research and need further investigation.

In this article, we present an overview of fake news detection and discuss promising research directions. The key motivations of this survey are summarized as follows:

• Fake news on social media has been occurring for several years; however, there is no agreed-upon definition of the term "fake news". To better guide the future directions of fake news detection research, appropriate clarifications are necessary.

• Social media has proved to be a powerful source for fake news dissemination. There are some emerging patterns that can be utilized for fake news detection in social media. A review on existing fake news detection methods under various social media scenarios can [...]

• We discuss the narrow and broad definitions of fake news that cover most existing definitions in the literature and further present the unique characteristics of fake news on social media and its implications compared with the traditional media;

• We give an overview of existing fake news detection methods with a principled way to group representative methods into different categories; and

• We discuss several open issues and provide future directions of fake news detection in social media.

The remainder of this survey is organized as follows. In Section 2, we present the definition of fake news and characterize it by comparing different theories and properties in both traditional and social media. In Section 3, we continue to formally define the fake news detection problem and summarize the methods to detect fake news. In Section 4, we discuss the datasets and evaluation metrics used by existing methods. We briefly introduce areas related to fake news detection on social media in Section 5. Finally, we discuss the open issues and future directions in Section 6 and conclude this survey in Section 7.

2. FAKE NEWS CHARACTERIZATION

In this section, we introduce the basic social and psychological theories related to fake news and discuss more advanced patterns introduced by social media. Specifically, we first discuss various definitions of fake news and differentiate related concepts that are usually misunderstood as fake news. We then describe different aspects of fake news on traditional media and the new patterns found on social media.

2.1 Definitions of Fake News

Fake news has existed for a very long time, nearly as long as news has circulated widely, which began after the printing press was invented in 1439⁷. However, there is no agreed definition of the term "fake news". Therefore, we first discuss and compare some widely used definitions of fake news in the existing literature, and provide our definition of fake news that will be used for the remainder of this survey.

A narrow definition of fake news is news articles that are intentionally and verifiably false and could mislead readers [2]. There are two key features of this definition: authenticity and intent.
