AUTOMATIC DETECTION

Master Degree Project in Informatics with a Specialization in Privacy, Information and Cyber Security One Year Master 15 ECTS Spring term 2020

Pontus Nordberg

Supervisor: Joakim Kävrestad Examiner: Marcus Nohlberg

Abstract
Due to the large increase in the proliferation of “fake news” in recent years, it has become a widely discussed menace in the online world. In conjunction with this attention, research into ways to limit its spread has also increased. This paper looks at the current research in this area in order to see which automatic fake news detection methods exist and are being developed that can help online users protect themselves against fake news. A systematic literature review is conducted to answer this question, with the different detection methods discussed in the literature divided into categories. The consensus which emerges from the collective research is also used to identify common elements between categories which are important to fake news detection: notably the relation between headlines and article content, the importance of high-quality datasets, the use of emotional words, and the circulation of fake news in social media groups.

Keywords: Fake News, Automatic Detection



Journal paper

Introduction
In the year leading up to the 2016 American presidential election, the term “fake news” was popularized and has since become a common term in the public vocabulary, being named “word of the year” of 2017 by the Collins English Dictionary and the American Dialect Society (American Dialect Society, 2018; Collins, 2017). Views have differed regarding exactly what content should or should not be included under the “fake news” umbrella, but the most commonly used definition is that fake news consists of deliberate disinformation, used to purposefully spread untrue information or to skew the reporting of real events in order to push a certain narrative (Gelfert, 2018; Klein & Wueller, 2017; Shu et al., 2017). While the concept of “fake news” is arguably as old as news itself, the advent of social media and mass information on the internet has led to fake news taking on a new form compared to its previous iterations (Alam & Ravshanbekov, 2019; Janze & Risius, 2017). Concerns have been raised especially about the effect which fake news can have on elections (Faustini & Covões, 2019). This type of fake news, which can spread among millions of people online, has attracted much attention from researchers, and many studies have been conducted to examine fake news and its impact on society and democracy. Much of this research has been aimed towards finding ways to combat fake news. Education on how to identify fake news, bias, “clickbait” (eye-catching, typically misleading article titles designed to bait people into clicking on them, which generates advertising revenue), or other disinformation in online news is now taught to students in many schools and libraries (Klein & Wueller, 2017).
Additionally, government organizations, news agencies, and online websites provide manual fact-checking to identify fake news, as well as information on how to think critically and be aware of bias when reading news stories encountered online (Ireton & Posetti, 2018; Shabani & Sokhn, 2018). Since online users who turn to the internet for news and information are exposed to these large volumes of disinformation, trusting what one reads online has become more difficult. “Manual” ways to detect fake news exist for such users, such as using their own judgement to determine whether what they read is accurate, or utilizing the assistance provided by the aforementioned fact-checking websites. However, such processes can be unreliable, as the very purpose of fake news is to appear as legitimate news, or tedious, due to the need to look up fact-checks for various claims on websites other than the one being read; this also assumes that the content in question has been fact-checked in the first place. An efficient way to reliably detect fake news would therefore be very beneficial to online users and to the trustworthiness of online news. Since much research has been done on combating fake news by identifying it with technology, this study will examine how online users can protect themselves against fake news by looking at which detection methods exist and are being developed today, as well as how this process can be automated. To answer this question, scientific literature will be systematically collected, mapped out, examined, and analysed. The results will then show the research community’s most recent developments in fake news detection technology, and what needs to be focused on to create an effective implementation that protects online users.

Methodology
The methodology chosen for this paper, and which is suitable for the type of question the paper aims to answer, is a systematic literature review. Wee and Banister (2016) describe a literature review as something which should add value, unlike a mere literature overview. This thesis will, as previously mentioned, fulfil this criterion by providing an analysis of the current research in fake news detection, thus contributing to the development of protection against fake news for regular users online. To perform a systematic literature review, relevant literature is needed from trusted, high-quality sources, such as academic databases. A variety of sources is necessary to yield a wide enough selection of papers, since a single database is unlikely to contain all relevant material. Brereton et al. (2007) recommend the following online databases for literature reviews in the area of software engineering, making them suitable for this thesis:

- IEEExplore
- SpringerLink
- ACM Digital Library

Google Scholar is also suggested to be useful when one wants to examine more or less all main literature on a topic (Wee & Banister, 2016). The three databases listed above will therefore be the main sources of literature in this review, with Google Scholar used as a secondary source for literature not already found in the main databases. The search term used in all the above-mentioned databases, with the publication year limited to 2015 and later (to get the most recent results, and because of the explosion in popularity and discussion of the term since that time as a result of the US presidential election), was:

- Automatic Fake News Detection

These search parameters proved adequate and provided a vast number of results in the aforementioned databases. Exclusion measures were then applied to the search results: articles not related to fake news detection, or not directly focused on it, were excluded; for example, articles that mentioned the concept but did not present automatic detection methods. Papers presenting developed detection methods for news pieces or articles were the ones collected from the databases. Duplicates were then removed, which resulted in 47 relevant papers to be used. Thematic coding was chosen as the analysis method for the review of the literature. This method creates a descriptive presentation of qualitative data in which themes of the data are identified, and implications from across several pieces of analysed content can be identified and noted (Anderson, 2007; Smith et al., 1992). The collected material will be placed into different fake news detection categories based on the method-types which researchers have focused on. This will result in a good overview of how similar models relate to other models of the same method-type and, as mentioned above, how implications can be seen and conclusions drawn from the identified themes.

Results
This chapter will present the data which has been gathered using the method specified in the previous chapter. The final 47 papers were analysed, and the fake news detection research that each paper presented was put in relation to the other papers of the same category. The categories are the themes of the thematic coding analysis method. They were created by grouping the literature into categories based on their primary method-type (see Table 1 below).

Text Classification (25) – Papers mainly focusing on analysis and classification based on the content of the article:
Singhania et al. (2017), Abedalla et al. (2019), Saikh et al. (2019), Bhattacharjee et al. (2017), Janze & Risius (2017), Pérez-Rosas et al. (2017), Qawasmeh et al. (2019), Kaur et al. (2019), Traylor et al. (2019), Monteiro et al. (2018), Girgis et al. (2018), Ahmed et al. (2017), Manzoor & Singla (2019), Waikhom & Goswami (2019), Granik & Meysura (2017), Faustini & Covões (2019), Rubin et al. (2016), Giachanou et al. (2019), Ksieniewicz et al. (2019), Li et al. (2019), Alam & Ravshanbekov (2019), Helmstetter & Paulheim (2018), Ajao et al. (2019), Ozbay & Alatas (2019), Vogel & Jiang (2019)

Network Analysis (10) – Papers mainly focusing on how the fake news is spread:
Conroy et al. (2015), Pan et al. (2018), Liu & Wu (2018), Hu et al. (2019), Zhou & Zafarani (2019), Ruchansky et al. (2017), Della Vedova et al. (2018), Zhang et al. (2019), Fernández-Reyes & Shinde (2018), Hamdi et al. (2020)

Human-Machine Hybrid (4) – Papers supplementing automatic detection with manual human input:
Rehm (2017), Okoro et al. (2019), Kim & Jeong (2019), Shabani & Sokhn (2018)

Meta Studies (8) – Reviews of current automatic detection methods:
Karidi et al. (2019), Bali et al. (2019), Giełczyk et al. (2019), Zhou et al. (2019), Mahid et al. (2018), Reis et al. (2019), Shu et al. (2017), Schuster et al. (2019)

Table 1 – Table of the collected literature and their respective category


A chart was made to visually present the ratio of the different methods from the categories of the collected material (see Figure 1 below). Note that 8 of the collected papers (Meta Studies) did not present a specific method, but instead reviewed current approaches or evaluated existing tools. These will be used to supplement the presented methods of the other papers, but since they do not present their own methods they will not be included in the ratio of methods in Figure 1.

Figure 1 – Graph showing the ratio of the method-types identified in the collected literature

Text Classification
As can be seen in the figures above, text classification is the most popular approach to automated fake news detection, and the majority of the collected papers propose solutions using such methods. A large number of researchers hold that automatic detection of fake news can achieve promising results when the problem is addressed by machine learning and deep learning classification, but that it still needs work in certain areas in order to become fully reliable (Alam & Ravshanbekov, 2019; Manzoor & Singla, 2019). Researchers have concluded that many classifier tools can be effective for detecting fake news, achieving high accuracy (Girgis et al., 2018; Ksieniewicz et al., 2019; Traylor et al., 2019). Several authors show the effectiveness of classifiers and of models developed with them. Ahmed et al. (2017) examine several machine learning techniques in the form of different classifiers, which all display a high level of accuracy when used on fake news datasets, the most effective being the Linear SVM (Support Vector Machine) classifier. Faustini & Covões (2019) also utilized SVM-based algorithms in their One-Class Classification approach, which was concluded to have the potential to distinguish fake news not only from real news, but also from opinion pieces and propaganda. Ozbay & Alatas (2019) showed good results with the GWO (Grey Wolf Optimization) algorithm. Kaur et al. (2019) showed how various machine learning classifiers were integrated into a single multi-level model in which they could increase performance and results by working together, helping to offset each other’s

weaknesses. Classification relying on lexical rules, syntax, semantics, and similar factors achieved promising results, with the models able to match the human ability to detect fake news (Pérez-Rosas et al., 2017; Waikhom & Goswami, 2019). The authors of the papers in this category discuss many different aspects of machine learning classification and how it can be used with the methods they present. For example, eye-catching headlines are a common feature of fake news and are therefore often a target for analysis. Text classification tools are presented as being useful for detecting fake news by analysing the relation between the article headline and the body (Abedalla et al., 2019). It is also argued, however, that such a method fails to exploit the article structure, and an enhanced text classification is presented: it gives differential importance to sections of an article, using the context and coherence of words, sentences, and their relation to the headline to better detect fake news based on headlines (Singhania et al., 2017). Other authors also follow this approach, using similar types of classifiers together with “textual entailment” features and “stance detection”. They also analyse the relationship between the article body and its headline, and whether the latter is related or unrelated to the corresponding claims, by examining how much they “agree” or “disagree” with each other. This data can also be combined with what other news agencies are saying about the same report (Qawasmeh et al., 2019; Saikh et al., 2019). Granik & Meysura (2017) presented the similarity shared between fake news articles and spam messages, regarding properties such as syntactic similarities, grammatical mistakes, emotional tone, and attempts to affect the reader’s opinion. Utilizing simple classifiers often used for spam filtering against fake news, they achieved promising results relative to the simplicity of the model used. Giachanou et al. (2019) and Ajao et al. (2019) focus on the emotional tone in particular, common in fake news’s attempts to trigger intense emotions to achieve further spread online. They present a detection model which incorporates extracted emotional signals, such as the use of emotional words in a claim, the “emotional intensity” expressed by a claim, and a neural network that predicts the emotional intensity that can be triggered in a user. Li et al. (2019) use a Convolutional Neural Network model which analyses the semantic and syntactic information from articles in addition to text analysis. This was used to calculate the weight of “sensitive” words, and showed strong results. A problem mentioned by Monteiro et al. (2018) is the inability of some models to properly filter fake news in Portugal due to the scarcity of labelled datasets in Portuguese, which led to the necessity of developing their own linguistic reference. Vogel & Jiang (2019) develop a dataset in German for the same reason. Helmstetter & Paulheim (2018) aim at easily developing a large enough dataset to be used in the creation of an automatic classification model, which they regard as the true challenge, as opposed to the, technically speaking, rather straightforward classification problem of fake news detection itself. Their solution is to automatically collect hundreds of thousands of tweets into a dataset, label their sources as trustworthy or untrustworthy, train a classifier with that dataset, and then use that classifier to classify other tweets as true or false. Even though not all tweets from an untrustworthy source are fake and vice versa, the resulting datasets could still achieve rather high performance, making dataset acquisition a simple process if a certain amount of label noise is accepted. Bhattacharjee et al. (2017) propose a method which, instead of being based on traditional deep learning, is based on “active learning”, which can efficiently learn from smaller samples in order to improve performance in weakly-supervised learning environments, making it more suited for practical applications such as dealing with 140-character posts.
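The spam-filter analogy can be made concrete with a minimal bag-of-words Naive Bayes classifier, sketched below in Python. This is an illustrative toy in the spirit of the spam-style classifiers discussed above, not Granik & Meysura's actual model; the training headlines and labels are invented for demonstration.

```python
import math
from collections import Counter

class NaiveBayesClassifier:
    """Minimal bag-of-words Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_class, best_logprob = None, float("-inf")
        for c, n_docs in self.class_counts.items():
            logprob = math.log(n_docs / total_docs)  # class prior
            denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
            for w in doc.lower().split():
                # Per-word likelihood with smoothing for unseen words
                logprob += math.log((self.word_counts[c][w] + self.alpha) / denom)
            if logprob > best_logprob:
                best_class, best_logprob = c, logprob
        return best_class

# Invented toy headlines for illustration only
clf = NaiveBayesClassifier()
clf.fit(
    ["shocking miracle cure doctors hate",
     "you will not believe this shocking secret",
     "city council approves annual budget",
     "annual report on budget published"],
    ["fake", "fake", "real", "real"],
)
print(clf.predict("shocking cure they hate"))  # prints "fake" on this toy data
```

Real models of this family are trained on thousands of labelled articles rather than four sentences, but the mechanism, word frequencies per class combined with a prior, is the same.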


Janze & Risius (2017) utilized machine learning classifiers to identify fake news by examining visual cues based on factors such as image content, brightness, and “mood”, combined with the content of the text (how it is written, usage of positive or negative words) as well as behavioural cues regarding how news is shared online, to automatically classify articles as fake news or not. Rubin et al. (2016) looked at satirical fake news, which they claim can become harmful in the existing atmosphere of fake news, despite not being true “fake news” due to its deliberately humorous angle. The reasons for this are that many users only read the headline of articles they come across, and that articles shared or “liked” on social media likewise show only the headline in the newsfeed. They look at how humorous satirical news is constructed and use a machine learning approach that aims to automatically detect it.

Network Analysis
As some of the last-mentioned papers in the above category start to touch on, the surrounding data outside of the fake news article itself can also be an essential part of automatic fake news detection. This is the so-called “network” approach, which can be an effective tool and has advantages in areas where the text classification approach can run into issues. Several researchers also include features specifically related to social media, the largest area of diffusion of fake news, in the detection models presented within this area. How fake news is propagated in such environments, and what role users play in this, is particularly in focus. The challenge of detecting fake news with limited information is a recurring theme in the literature under this category.
Machine learning approaches which detect fake news based on text analysis have a limitation in that they cannot detect fake news early, when the information required for verification or debunking is unavailable due to the early stage of news propagation. This limits detection accuracy, and a model is proposed which is based on a Convolutional Network that detects the propagation paths of the news based on the users who spread it. Detection based on global and local user variations along this path can then help identify fake news with high accuracy much earlier than other approaches (Liu & Wu, 2018). Text analysis is also not effective enough when news pieces are too short, but this case too can be handled using convolutional networks. They can create knowledge graphs which can assess the veracity of the relation between two entities. Such models generate background-knowledge graphs which can then be used to automatically build entity relations to detect truth or falsehood. The use of structured knowledge networks is suggested as a way to gather the background information. Despite the small amount of information used to create the knowledge graphs, they still proved able to provide decent results (Hu et al., 2019; Pan et al., 2018). Conroy et al. (2015) also used knowledge networks to detect deception. False statements, presented as real, can be extracted and examined alongside statements found in structured knowledge networks online. A statement would then have its subject and predicate divided into nodes which, based on their “proximity” in terms of narrow or wide relation to each other, would increase the likelihood that the subject-predicate pairing in the statement is true or false. In this way, fact-checking can become a simple computation to detect the shortest “path” between nodes in a statement. Such an approach would, however, require a relevant pre-existing knowledge base.
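The shortest-path idea behind this style of knowledge-network fact-checking can be illustrated with a small sketch. The graph below is a hypothetical toy knowledge base invented for demonstration; real systems operate over large structured resources rather than a hand-written dictionary.

```python
from collections import deque

def shortest_path_length(graph, src, dst):
    """Breadth-first search over an undirected knowledge graph given as an
    adjacency dict; returns the number of hops, or None if no path exists."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # unconnected: the statement is unsupported by the knowledge base

# Hypothetical toy knowledge base; an edge stands for a known relation.
kg = {
    "Barack Obama": ["Honolulu", "United States"],
    "Honolulu": ["Barack Obama", "Hawaii"],
    "Hawaii": ["Honolulu", "United States"],
    "United States": ["Barack Obama", "Hawaii"],
}

# A short path between a statement's subject and object suggests the pairing
# is plausible; a long or missing path flags the statement for scrutiny.
print(shortest_path_length(kg, "Barack Obama", "Hawaii"))  # prints 2
```

The computation itself is cheap; as the text notes, the hard requirement is the pre-existing knowledge base the search runs over.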
In social media, metadata and included hyperlinks can be used to establish veracity by linking the most important words to other words within the same network.


Zhou & Zafarani (2019) propose a network-based approach which studies the patterns of how the news is spread, who spreads it, and what the relations between these spreaders are. Such patterns are then used at various network levels to detect fake news. The authors show how fake news spreads farther than real news and has more and stronger spreader engagement, which is denser for networks of similar interests and behaviour. A downside is that the news needs to be propagated on social media before it can be detected, although here as well only a small amount of network information is needed. Another approach, which focuses further on user engagement, combines text analysis with a social-based method which takes into account how many times content is shared or “liked”, as well as by whom. It has been shown that content on Facebook can be classified as fake news with very high accuracy based on the users who “like” it, a result of the tendency of users to aggregate into “bubbles” which selectively share content that fits their narrative of things. A downside to this is of course that the content needs a certain amount of “social interaction” in order to produce worthwhile results, which is why a combined approach was used (Della Vedova et al., 2018; Hamdi et al., 2020). Ruchansky et al. (2017) also propose a model that looks at features like the group behaviour of users who spread fake news, and from whom and where they spread it. This is done through a combined text analysis and network analysis model which brings together three fake news characteristics: the article text, the user response it receives, and the user source promoting it. A combined approach like this is argued to increase accuracy and generality, which it achieves along with a meaningful representation of users and articles.
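The propagation patterns these network approaches rely on can be illustrated with a minimal feature extractor over a share cascade. The cascade data and the three features below are invented for demonstration and are far simpler than the representations learned in the cited papers.

```python
def cascade_features(shares):
    """Extract simple spread features from a share cascade, given as a list of
    (user, parent_user, timestamp) tuples with parents listed before children."""
    depth = {}  # user -> number of hops from the original poster
    for user, parent, _ in shares:
        depth[user] = 0 if parent is None else depth[parent] + 1
    times = [t for _, _, t in shares]
    return {
        "size": len(shares),                  # how many users spread the story
        "max_depth": max(depth.values()),     # how far it travelled from the source
        "duration": max(times) - min(times),  # over what time span it circulated
    }

# Toy cascade: "a" posts, "b" and "c" reshare from "a", "d" reshares from "b".
cascade = [("a", None, 0), ("b", "a", 5), ("c", "a", 7), ("d", "b", 9)]
features = cascade_features(cascade)  # {'size': 4, 'max_depth': 2, 'duration': 9}
```

Features like these could feed the classifiers described earlier; the intuition from the literature is that fake news tends to produce larger, deeper, and faster cascades than real news.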
Combination approaches were also proposed by Fernández-Reyes & Shinde (2018), who utilized machine learning classifiers and demonstrated the importance of combining statements with the overall “credibility patterns” of politicians. Similarly, Zhang et al. (2019) created a diffusive network model that relies on explicit and latent features extracted from a text, which achieved very good performance not only in identifying fake news articles, but also in striking at the origin of such news by likewise identifying the news creators and the types of news subjects, whose records of credibility can be seen and assessed. The latter two are considered more important than the news itself in the contribution to eradicating the spread of fake news on social media.

Human-Machine Hybrid
Some of the collected literature proposed a human-machine hybrid method. Such methods utilize the machine-based automatic detection models mentioned above, but also rely on manual input from humans in various ways to help improve accuracy and performance. A hybrid method is hoped to bring the best of both worlds. For example, Okoro et al. (2019) examine how human, machine, and human-machine-based approaches compare and find the hybrid approach to be the most effective. Shabani & Sokhn (2018) say that automatic fake news detection is made difficult by the language used in such articles being purposefully hard to distinguish from real stories, and differentiating satirical news from fake news is even more complicated. Classical fake news detection, such as linguistic approaches, does not work well with satirical news about politics due to the advanced language often used. Their solution is a machine learning-human crowd hybrid model, where human input is used depending on the confidence of an initial machine classification. They achieve better results than a pure machine approach, and a combined method also helps alleviate the poor scalability in regards to cost and latency that a

purely crowd-sourced approach would suffer from. They suggest such a system could be applied to social media. Kim & Jeong (2019) propose a machine learning model that cross-checks content input with articles from a “Fact DB”, described as a collection of true articles, built and updated through humans’ direct judgment, for detecting fake news by collecting facts. The input is then compared with content from the Fact DB using sentence matching. An article abstraction process and an entity matching set (alongside the aforementioned sentence matching) are used to improve performance and accuracy when dealing with longer sentences or unlearned relations between words, respectively. A final answer regarding the factuality of the input is produced based on the results of the two matching steps. Rehm (2017) proposes a very ambitious solution in the form of a fact-checking infrastructure which would be designed as an additional layer on top of the World Wide Web, available to every user, ideally integrated into all web browsers to the same degree as the URL field or a bookmark function, and able to handle millions of users, arbitrary types of content, many different languages, and massive amounts of data. The infrastructure would provide web-based tools that users could use to process any content on a page in order to get more information. A user would target any content, ranging from a single comment to an entire article spread over several pages, and receive additional or alternate viewpoints, independent assessments, or indications of whether the content is dangerous, abusive, or factual. Automatic machine learning methods combined with human intelligence and feedback are considered necessary to fulfil these tasks, with a fully automatic or hybrid approach being used for different areas of analysis. The infrastructure should be decentralised to avoid being vulnerable to misuse.
Any organisation, company, or research centre should be able to operate and offer services on the infrastructure, and users should be able to configure a personal set of these (fully transparent) tools and services, fully combinable with each other using standardised input/output formats, in order to aggregate a value regarding, for example, political bias. Decentralised web annotation repositories, aggregating annotations from both humans and machines, would send their annotations to the tools, which would then present them to users.
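A minimal sketch of the sentence-matching step in a Fact-DB style pipeline is shown below, assuming a simple bag-of-words cosine similarity. The fact sentences and the 0.5 threshold are illustrative assumptions, not details from Kim & Jeong, whose system adds article abstraction and entity matching on top of this kind of comparison.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_fact_match(claim, fact_db, threshold=0.5):
    """Return the most similar verified sentence, or None if nothing in the
    fact database is close enough to support or contradict the claim."""
    best = max(fact_db, key=lambda fact: cosine_similarity(claim, fact))
    return best if cosine_similarity(claim, best) >= threshold else None

# Illustrative fact database (invented sentences standing in for verified articles)
fact_db = ["the earth orbits the sun", "water boils at 100 degrees celsius"]
match = best_fact_match("the earth orbits the sun every year", fact_db)
```

In a production system, the matched sentence would then be compared with the claim to decide whether it supports or contradicts it, producing the final factuality verdict.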

Meta Studies
This category includes literature which does not propose a specific solution or method, but rather looks at several methods and compares and evaluates them. Bali et al. (2019) performed a comparative study on several types of text classification and language processing machine learning algorithms. They concluded that the XGB (Gradient Boosting) classifier was the most effective of the ones they tested. Karidi et al. (2019) stress the limitations of existing datasets, on whose quality various detection models depend, due to the reliance on human annotations, and instead propose a method for automating the dataset creation process. They create an automatic system which is based on fake news stories classified by fact-checking websites. These types of datasets are important for all kinds of detection models. Schuster et al. (2019) examine the proposed approach of tracing a text to its source based on the writing style of the article and determining the trustworthiness of that source. The reason for this approach is the fact that much fake news is more or less auto-generated, which can be identified with text analysis and indicated by the type of source itself. However, since true news can be auto-generated in a similar process, and fake news can consist of

corrupted real news, the authors reveal that this approach has a big weakness. This highlights the importance of assessing the veracity of the text, and not just its style and source. Giełczyk et al. (2019) evaluate different tools using the same dataset for all of them. The tested tools are SurfSafe, a browser extension which compares images from news with a database; Fake News Detector AI, which uses a neural network; TrustedNews, which can detect bias, clickbait, satire, etc.; Fake News Detector, an open-source tool that uses feedback from its users; Fake News Guard, a browser extension which verifies any page visited by the user, or any link displayed in Facebook, using linguistic and network analysis plus AI; and Decodex, a tool which can label information as “info”, “satire”, or “no info”. Many of the classification attempts failed, producing only responses like “error” or “no result”. There were, however, almost no cases of different tools providing contradictory classifications of the same piece of information (i.e. true and false at the same time). Results were better when using English text and sources. Several papers looked at fake news detection itself: its challenges, research directions, characterisation, and comparison with other concepts such as satire or rumours. They discuss the different detection techniques and mechanisms which exist. The information brought up in these areas has, however, already been covered in the chapters above (Mahid et al., 2018; Reis et al., 2019; Shu et al., 2017; Zhou et al., 2019).

Conclusion
What conclusions can be drawn, and what wider implications can be seen, now that the different types of methods in the field of automatic fake news detection have been examined? Many different detection models exist, and improvements seem to be happening constantly.
Despite the existence of many manual fact-checking websites, automation of this process has been a natural next step due to the poor scalability of purely human-based approaches (Rehm, 2017; Shabani & Sokhn, 2018; Zhou & Zafarani, 2019). The consensus which appears between authors reveals certain commonalities between the different approaches, and even between categories. These address challenges, where and how fake news spreads, and the common elements which need to be focused on to achieve effective fake news detection. Trends that appear throughout much of the literature, signifying such common elements, should be useful to focus on in a detection model intended for widespread implementation. For example, emotionally charged or “sensitive” words were often used, intended to invoke emotional responses to influence people’s opinions or to cause them to spread the “news” in outrage or shock (Ajao et al., 2019; Giachanou et al., 2019; Granik & Meysura, 2017; Janze & Risius, 2017; Li et al., 2019; Rehm, 2017). Such manipulation is often used in conjunction with eye-catching headlines, another hallmark of fake news. This also exploits the fact that many people only read the headline of articles online, compounded by the fact that social media feeds usually only show the headline unless a user clicks through to read the full article. Many models therefore analyse the article structure and its relation to the corresponding headline (Abedalla et al., 2019; Qawasmeh et al., 2019; Rubin et al., 2016; Saikh et al., 2019; Singhania et al., 2017). Social media has, of course, been shown to be the main arena where fake news is spread, and user engagement with articles, and users’ role in their spread, was a particular focus of the network category.
This included the tendency of users to aggregate into social "bubbles", where content suiting a group's narrative achieves quick circulation (Della Vedova et al., 2018; Hamdi et al., 2020; Rehm, 2017; Ruchansky et al., 2017; Zhang et al., 2019; Zhou & Zafarani, 2019).
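The emotional-word signal discussed above can be sketched as a simple lexicon-based feature. This is a minimal illustration rather than any of the cited models: the function name, tokenisation, and the tiny lexicon are hypothetical stand-ins, and a real system would use a full sentiment or emotion lexicon.

```python
# Illustrative sketch of an emotional-word feature; the lexicon below is
# a tiny hypothetical stand-in for a full emotion/sentiment lexicon.
EMOTIONAL_WORDS = {"shocking", "outrage", "unbelievable", "disaster",
                   "terrifying", "scandal", "fury", "horrific"}

def emotional_word_ratio(text: str) -> float:
    """Fraction of tokens that appear in the emotional-word lexicon."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in EMOTIONAL_WORDS)
    return hits / len(tokens)

print(emotional_word_ratio("Shocking scandal causes outrage in city"))  # 0.5
```

A feature like this would typically be one input among many to a classifier, alongside the headline and structural signals discussed above.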


A challenge that many authors brought up was separating the detection of fake news from satire and similar content. As both can be written in similar ways, combined with the fact that many people only read article headlines, satire can have a harmful effect, and any eventual model should account for this using the methods presented in the literature focusing on this issue (Faustini & Covões, 2019; Giełczyk et al., 2019; Rubin et al., 2016; Shabani & Sokhn, 2018). The limited number of datasets, as well as the limitations of existing ones, presents another challenge, due to the dependence on such datasets in the development of accurate detection models (Ahmed et al., 2017; Karidi et al., 2019). An interesting aspect of this is fake news detection in non-English languages, where the lack of sufficient datasets in the right language to train and test with can render otherwise successful models non-viable (Giełczyk et al., 2019; Monteiro et al., 2018; Vogel & Jiang, 2019). Suggested solutions which could be useful here include the automatic creation of large datasets, which can achieve high results despite a lot of noise; datasets created from fact-checking websites; or publicly available linked-data datasets, which can help with checking up-to-date facts (Conroy et al., 2015; Helmstetter & Paulheim, 2018; Karidi et al., 2019). What type of detection model, incorporating these considerations, should then be used? The content classification approach, which focuses on the text of the article, was the most popular, with various authors showing classification algorithms achieving high results (Ahmed et al., 2017; Girgis et al., 2018; Kaur et al., 2019; Ksieniewicz et al., 2019; Pérez-Rosas et al., 2017; Waikhom & Goswami, 2019).
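As an illustration of the content classification approach, the sketch below trains a toy multinomial Naive Bayes classifier over word n-grams. It is a minimal stand-in for the pipelines in the cited works (which use larger feature sets, such as TF-IDF, and stronger learners); the class name and example data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=2):
    """Word n-grams of the text, e.g. bigrams for n=2."""
    toks = text.lower().split()
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

class NaiveBayesNgram:
    """Toy multinomial Naive Bayes over word n-grams (illustrative only)."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_lp = None, float("-inf")
        for label in self.docs:
            lp = math.log(self.docs[label] / total_docs)  # class prior
            total = sum(self.counts[label].values())
            for g in ngrams(text, self.n):
                # Laplace smoothing so unseen n-grams do not zero the score
                lp += math.log((self.counts[label][g] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label

# Hypothetical toy training data; real datasets are discussed above.
nb = NaiveBayesNgram(n=2)
nb.fit(["shocking secret they hide", "shocking secret revealed today",
        "council approves budget plan", "council approves new road"],
       ["fake", "fake", "real", "real"])
print(nb.predict("shocking secret exposed"))  # fake
```

The choice of Naive Bayes here follows Granik & Mesyura (2017), who used it as a baseline for fake news classification; the other cited works explore deeper models on the same kind of text features.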
But deficiencies and downsides were also brought up (Alam & Ravshanbekov, 2019; Manzoor & Singla, 2019), and by using the benefits of a network approach, a combination of the two was recommended to achieve the best results (Conroy et al., 2015; Della Vedova et al., 2018; Fernández-Reyes & Shinde, 2018; Hamdi et al., 2020; Janze & Risius, 2017; Ruchansky et al., 2017; Zhang et al., 2019). Such an approach would also consider the aspect of the spread itself. This could offset some of text classification's weaknesses regarding early detection, since combined, or even pure network, models have been shown to achieve good results with early or little information, despite needing a degree of initial spread to have something to analyse (Della Vedova et al., 2018; Hamdi et al., 2020; Hu et al., 2019; Liu & Wu, 2018; Pan et al., 2018; Zhou & Zafarani, 2019). A combined method of both text and network analysis is therefore the most strongly recommended approach. Some insights were given into how such a model, taking all the aforementioned points into account, could be practically implemented for widespread use among online users. Rehm (2017) proposes a concrete solution, but it would require a large collective undertaking by several international corporations and organizations. Bhattacharjee et al. (2017) proposed a "lighter" method suited to short messages, such as for use on Twitter. An idea like that could be kept in mind with regard to the detection models requiring little information mentioned above. Using an overlay on web browsers, similar to what has been suggested, would promote widespread usage, as it would be available at any time on any site.
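In its simplest form, a combined text-and-network approach could fuse a content-based score with source and spreader credibility signals. The linear fusion and the weights below are purely illustrative assumptions for the sketch, not values taken from any of the cited models:

```python
def combined_score(content_score: float,
                   source_credibility: float,
                   spreader_credibility: float,
                   weights=(0.5, 0.3, 0.2)) -> float:
    """Fuse a content-based 'fakeness' score with network signals.

    All inputs are assumed to lie in [0, 1]; a higher content_score means
    more likely fake, and higher credibility means more trustworthy. The
    weights are illustrative defaults, not values from the literature.
    """
    w_content, w_source, w_spreader = weights
    return (w_content * content_score
            + w_source * (1.0 - source_credibility)
            + w_spreader * (1.0 - spreader_credibility))

# A suspicious article from a low-credibility source and spreader
# receives a high combined score:
print(round(combined_score(0.9, 0.2, 0.3), 2))  # 0.83
```

The cited combined models (e.g. Ruchansky et al., 2017; Della Vedova et al., 2018) learn this kind of fusion jointly inside a model rather than using fixed weights, but the principle of offsetting one signal's weakness with another is the same.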
Based on the conclusions regarding the common elements across all categories, such a tool should use a combined approach of existing advanced text analysis, including linguistics, syntax, the use of emotional words, and satire detection, together with network analysis, assessing trustworthiness and credibility based on the source and/or spreader. The development of quality datasets, including in non-English languages, is a critical part of the development and accuracy of such a system. The network aspect and quality datasets would help ensure that detection functions well for fake news which is newly released or short. Possible additional

human input could be used to augment this, but in such a case the separation between machine detection and human input should be clearly highlighted, so that human bias is not presented as machine analysis.

Discussion

The research question in the introduction chapter asked what measures were being developed to protect online users against fake news, and how these could be automated. As has been seen, many different approaches to handling the problem of online fake news in automated ways exist, each with its own advantages and disadvantages. While certain methods were more common than others, the best results seem to come from combined approaches that identify fake news based on multiple factors. Such approaches manage to "cancel out" each other's disadvantages, compared to if they had been used separately. Practical implementations of these detection models exist, but a well-developed "standard" seems to be necessary to achieve widespread use. In conclusion, there are many well-developed automatic detection methods which should be used in a widespread tool for online users. Even if a highly accurate tool achieves standardized implementation in the future, this should not mean the end of efforts to educate people on how to identify fake news and detect bias in online news. Not adopting a critical view of what can be read online and in the media, and thereby fully putting the judgment of "truth" in the hands of an (always imperfect) computer algorithm, is not a good way for humans to view the world and process information. Any widespread fake news detection tool should always be used as assistance to a critical mind that thinks for itself. The main scientific contribution of this paper was to provide an overview of the existing developments and models in the field of automatic fake news detection as of today.
Due to the recent explosion in popularity of this topic since the US presidential election period in 2015-16, the material analysed in this paper should present a comprehensive overview of the current situation in this area, and a picture of where development stands as of today. Fake news has been shown to be one of the largest social problems (at least online) in the developed world, receiving much focus and research on how to combat its spread. Methods to detect fake news are therefore a valuable contribution to the online public. They can also help in less-developed areas; the importance of the datasets used to train detection models to become accurate also became apparent, especially when looking at the situation from a global perspective, where some detection models struggled due to a lack of non-English datasets. Development of datasets in different languages could be especially important in countries where the internet is only recently becoming widespread, and where people as a result might be more susceptible to online disinformation. Assistance with identifying what is real and what is fake could be very valuable for the "social health" of such emerging online communities. While this paper has looked at current fake news detection methods, future work could turn the suggestions and conclusions from this paper into a reality. It could focus on implementing the best of these models in a working tool which can be practically used by people online. It should be noted that even a widely used browser overlay would not cover social media accessed on smartphones and similar devices, which is how a very large portion of people interact with social media. A way to bring any tool which is developed into that environment is very important if widespread usage is to be achieved.


References

Abedalla, A., Al-Sadi, A., & Abdullah, M. (2019, October). A Closer Look at Fake News Detection: A Deep Learning Perspective. In Proceedings of the 2019 3rd International Conference on Advances in Artificial Intelligence (pp. 24-28).
Ahmed, H., Traore, I., & Saad, S. (2017, October). Detection of online fake news using N-gram analysis and machine learning techniques. In International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments (pp. 127-138). Springer, Cham.
Ajao, O., Bhowmik, D., & Zargari, S. (2019, May). Sentiment aware fake news detection on online social networks. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2507-2511). IEEE.
Alam, S., & Ravshanbekov, A. (2019). Sieving Fake News From Genuine: A Synopsis. arXiv preprint arXiv:1911.08516.
American Dialect Society. (2018). "Fake news" is 2017 American Dialect Society word of the year. Retrieved from https://www.americandialect.org/fake-news-is-2017-american-dialect-society-word-of-the-year on 01/03/2020.
Anderson, R. (2007). Thematic content analysis (TCA). Descriptive presentation of qualitative data.
Bali, A. P. S., Fernandes, M., Choubey, S., & Goel, M. (2019, April). Comparative performance of machine learning algorithms for fake news detection. In International Conference on Advances in Computing and Data Sciences (pp. 420-430). Springer, Singapore.
Bhattacharjee, S. D., Talukder, A., & Balantrapu, B. V. (2017, December). Active learning based news veracity detection with feature weighting and deep-shallow fusion. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 556-565). IEEE.
Brereton, P., Kitchenham, B. A., Budgen, D., Turner, M., & Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80(4), 571-583.
Conroy, N. J., Rubin, V. L., & Chen, Y. (2015).
Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology, 52(1), 1-4.
Collins. (2017). Collins 2017 Word of the Year Shortlist. Retrieved from https://www.collinsdictionary.com/word-lovers-blog/new/collins-2017-word-of-the-year-shortlist,396,HCB.html on 01/03/2020.
Della Vedova, M. L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., & de Alfaro, L. (2018, May). Automatic online fake news detection combining content and social signals. In 2018 22nd Conference of Open Innovations Association (FRUCT) (pp. 272-279). IEEE.
Faustini, P., & Covões, T. F. (2019, October). Fake News Detection Using One-Class Classification. In 2019 8th Brazilian Conference on Intelligent Systems (BRACIS) (pp. 592-597). IEEE.
Fernández-Reyes, F. C., & Shinde, S. (2018, November). Evaluating Deep Neural Networks for Automatic Fake News Detection in Political Domain. In Ibero-American Conference on Artificial Intelligence (pp. 206-216). Springer, Cham.
Gelfert, A. (2018). Fake news: A definition. Informal Logic, 38(1), 84-117.
Giachanou, A., Rosso, P., & Crestani, F. (2019, July). Leveraging emotional signals for credibility detection. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 877-880).


Giełczyk, A., Wawrzyniak, R., & Choraś, M. (2019, September). Evaluation of the existing tools for fake news detection. In IFIP International Conference on Computer Information Systems and Industrial Management (pp. 144-151). Springer, Cham.
Girgis, S., Amer, E., & Gadallah, M. (2018, December). Deep learning algorithms for detecting fake news in online text. In 2018 13th International Conference on Computer Engineering and Systems (ICCES) (pp. 93-97). IEEE.
Granik, M., & Mesyura, V. (2017, May). Fake news detection using naive Bayes classifier. In 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON) (pp. 900-903). IEEE.
Hamdi, T., Slimi, H., Bounhas, I., & Slimani, Y. (2020). A Hybrid Approach for Fake News Detection in Twitter Based on User Features and Graph Embedding. In International Conference on Distributed Computing and Internet Technology (pp. 266-280). Springer, Cham.
Helmstetter, S., & Paulheim, H. (2018, August). Weakly supervised learning for fake news detection on Twitter. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 274-277). IEEE.
Hu, G., Ding, Y., Qi, S., Wang, X., & Liao, Q. (2019, October). Multi-depth Graph Convolutional Networks for Fake News Detection. In CCF International Conference on Natural Language Processing and Chinese Computing (pp. 698-710). Springer, Cham.
Ireton, C., & Posetti, J. (2018). Journalism, fake news & disinformation: handbook for journalism education and training. UNESCO Publishing.
Janze, C., & Risius, M. (2017, January). Automatic Detection of Fake News on Social Media Platforms. In PACIS (p. 261).
Karidi, D. P., Nakos, H., & Stavrakas, Y. (2019, November). Automatic Ground Truth Dataset Creation for Fake News Detection in Social Media. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 424-436). Springer, Cham.
Kaur, S., Kumar, P., & Kumaraguru, P. (2019).
Automating fake news detection system using multi-level voting model. Soft Computing, 1-21.
Kim, K. H., & Jeong, C. S. (2019, July). Fake News Detection System using Article Abstraction. In 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE) (pp. 209-212). IEEE.
Klein, D., & Wueller, J. (2017). Fake news: a legal perspective. Journal of Internet Law (Apr. 2017).
Ksieniewicz, P., Choraś, M., Kozik, R., & Woźniak, M. (2019, November). Machine Learning Methods for Fake News Classification. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 332-339). Springer, Cham.
Li, Q., Hu, Q., Lu, Y., Yang, Y., & Cheng, J. (2019). Multi-level word features based on CNN for fake news detection in cultural communication. Personal and Ubiquitous Computing, 1-14.
Liu, Y., & Wu, Y. F. B. (2018, April). Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
Mahid, Z. I., Manickam, S., & Karuppayah, S. (2018, October). Fake News on Social Media: Brief Review on Detection Techniques. In 2018 Fourth International


Conference on Advances in Computing, Communication & Automation (ICACCA) (pp. 1-5). IEEE.
Manzoor, S. I., & Singla, J. (2019, April). Fake News Detection Using Machine Learning approaches: A systematic Review. In 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) (pp. 230-234). IEEE.
Monteiro, R. A., Santos, R. L., Pardo, T. A., de Almeida, T. A., Ruiz, E. E., & Vale, O. A. (2018, September). Contributions to the study of fake news in Portuguese: New corpus and automatic detection results. In International Conference on Computational Processing of the Portuguese Language (pp. 324-334). Springer, Cham.
Okoro, E. M., Abara, B. A., Alan-Ajonye, A., Isa, Z., & Umagba, A. (2019). Effects of human and human-machine fake news detection approaches on user detection performance. International Journal of Advanced Research in Computer Science, 10(1), 26.
Ozbay, F. A., & Alatas, B. (2019). A Novel Approach for Detection of Fake News on Social Media Using Metaheuristic Optimization Algorithms. Elektronika ir Elektrotechnika, 25(4), 62-67.
Pan, J. Z., Pavlova, S., Li, C., Li, N., Li, Y., & Liu, J. (2018, October). Content based fake news detection using knowledge graphs. In International Semantic Web Conference (pp. 669-683). Springer, Cham.
Pérez-Rosas, V., Kleinberg, B., Lefevre, A., & Mihalcea, R. (2017). Automatic detection of fake news. arXiv preprint arXiv:1708.07104.
Qawasmeh, E., Tawalbeh, M., & Abdullah, M. (2019, October). Automatic Identification of Fake News Using Deep Learning. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 383-388). IEEE.
Rehm, G. (2017, September). An Infrastructure for Empowering Internet Users to Handle Fake News and Other Online Media Phenomena. In International Conference of the German Society for Computational Linguistics and Language Technology (pp. 216-231). Springer, Cham.
Reis, J. C., Correia, A., Murai, F., Veloso, A., Benevenuto, F., & Cambria, E.
(2019). Supervised learning for fake news detection. IEEE Intelligent Systems, 34(2), 76-81.
Rubin, V. L., Conroy, N., Chen, Y., & Cornwell, S. (2016, June). Fake news or truth? Using satirical cues to detect potentially misleading news. In Proceedings of the Second Workshop on Computational Approaches to Deception Detection (pp. 7-17).
Ruchansky, N., Seo, S., & Liu, Y. (2017, November). CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 797-806).
Schuster, T., Schuster, R., Shah, D. J., & Barzilay, R. (2019). Are We Safe Yet? The Limitations of Distributional Features for Fake News Detection. arXiv preprint arXiv:1908.09805.
Shabani, S., & Sokhn, M. (2018, October). Hybrid machine-crowd approach for fake news detection. In 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC) (pp. 299-306). IEEE.
Saikh, T., Anand, A., Ekbal, A., & Bhattacharyya, P. (2019, June). A novel approach towards fake news detection: deep learning augmented with textual entailment features. In International Conference on Applications of Natural Language to Information Systems (pp. 345-358). Springer, Cham.


Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), 22-36.
Singhania, S., Fernandez, N., & Rao, S. (2017, November). 3HAN: A deep neural network for fake news detection. In International Conference on Neural Information Processing (pp. 572-581). Springer, Cham.
Smith, C. P., Atkinson, J. W., McClelland, D. C., & Veroff, J. (Eds.). (1992). Motivation and personality: Handbook of thematic content analysis. Cambridge University Press.
Traylor, T., Straub, J., & Snell, N. (2019, January). Classifying fake news articles using natural language processing to identify in-article attribution as a supervised learning estimator. In 2019 IEEE 13th International Conference on Semantic Computing (ICSC) (pp. 445-449). IEEE.
Vogel, I., & Jiang, P. (2019, September). Fake News Detection with the New German Dataset "GermanFakeNC". In International Conference on Theory and Practice of Digital Libraries (pp. 288-295). Springer, Cham.
Waikhom, L., & Goswami, R. S. (2019). Fake News Detection Using Machine Learning. Available at SSRN 3462938.
Wee, B. V., & Banister, D. (2016). How to write a literature review paper? Transport Reviews, 36(2), 278-288.
Zhang, J., Dong, B., & Philip, S. Y. (2019, December). Deep Diffusive Neural Network based Fake News Detection from Heterogeneous Social Networks. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 1259-1266). IEEE.
Zhou, X., & Zafarani, R. (2019). Network-based Fake News Detection: A Pattern-driven Approach. ACM SIGKDD Explorations Newsletter, 21(2), 48-60.
Zhou, X., Zafarani, R., Shu, K., & Liu, H. (2019, January). Fake news: Fundamental theories, detection strategies and challenges. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (pp. 836-837).
