Detecting Fake News on Twitter Using Machine Learning Models

Emma Cueva, Grace Ee, Akshat Iyer, Alexandra Pereira, Alexander Roseman, Dayrene Martinez*
New Jersey’s Governor’s School of Engineering and Technology
July 24, 2020
*Corresponding Author

Abstract: With the rising popularity of social media, people have become more aware of current events and important news, often through sources such as Twitter. One issue with these sources of news is the prevalence of false information, or fake news. Even as some social media platforms take initiative with labels or warnings, fake news continues to have dangerous consequences beyond misinformation. The goal of this research is to implement a highly effective method of identifying fake news spread on Twitter through the use of Artificial Intelligence (AI). More specifically, the investigation studied Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU), and Natural Language Processing (NLP) networks to compare their accuracy when predicting fake news. The data was preprocessed and used to train the models that were developed; figures were then generated for analysis. All three models achieved high accuracy in detecting fake news; however, the NLP model was the only iteration that possessed the ability to identify satire as fake news. For this reason, the NLP model was the preferred choice for detecting fake news on Twitter.

I. INTRODUCTION

Along with the rise of technology, social media has become increasingly popular in the average person’s daily life. Innovations allow people to absorb vast amounts of information on a daily basis. Social media provides its users with a platform to voice their thoughts and connects people around the world. However, a significant downside to these advances in technology is the increasing prevalence of false information. Fake news is defined as articles that misrepresent information to deceive and manipulate their audience. Such articles are “70% more likely to be retweeted on Twitter than true ones” [1]. The ripple effects of fake news can include increased bigotry, global misunderstandings of current events, and biased election outcomes [2].

Twitter is a popular social media platform where users can easily share links to articles regardless of validity. As a result, fake news is rampant. Current solutions for combating fake news often rely heavily on the initiative of readers. Social media users are encouraged to be vigilant regarding the news they see to avoid being manipulated. On average, humans identify lies with 54% accuracy, so the use of AI to spot fake news more accurately is a much more reliable solution [3]. Some AI programs have already been created to detect fake news; one such program, developed by researchers at the University of Western Ontario, performs with 63% accuracy [3].

Over the course of this project, several AI models were trained on both real and fake news and optimized with the goal of increasing the accuracy of fake news detection. This paper discusses the development and comparative analysis of three AI models (LSTM neural networks, GRU neural networks, and NLP neural networks) that accurately detect fake news.

II. BACKGROUND

A. Machine Learning

Machine learning is an application of AI designed to look for patterns in large quantities of data and improve the predictive accuracy and classification of each data point without being explicitly programmed. It is regularly used online in targeted advertisements and recommendations on streaming services. By identifying patterns in fake news and comparing them to patterns in real news, machine learning can predict the falsity of information [4].

B. Deep Learning

Deep learning is an enhanced version of machine learning that can identify and magnify patterns that other forms of machine learning often miss [4]. Systems called neural networks use deep learning to mimic the structure of an organic brain. While machine learning uses algorithms to analyze data and apply the patterns it finds, deep learning develops an ensemble algorithm to learn how to make decisions on its own [5].

C. Natural Language Processing

1) NLP Pipeline: NLP is a field of artificial intelligence that uses text mining and analytics to process natural human language in applications such as chatbots, speech recognition, and targeted advertisements. NLP pipelines can be implemented in neural networks. They are designed to extract, analyze, and apply the critical information from a sample of raw language. These pipelines can be deconstructed into eight components [6].

• Sentence segmentation splits blocks of text into smaller samples that are easier for the network to analyze.
• Word tokenization occurs after the sentences are broken up and splits each sentence into its words. Doing so makes it easier for the network to analyze each word for its components, such as meaning, context, and part of speech.
• In tagging parts of speech, the network labels the words it has deemed most important with their parts of speech. This provides context for the network to identify how a word is being used and to connect it with adjacent text.
• Text lemmatization and stemming reduce words to their basic forms so that a network can more easily infer their meaning and connect them to other words. With stemming, the words extracted from the analysis, especially verbs, can be broken into roots that are more indicative of their definition. For example, “cleaned” would turn to “clean.” However, this process can produce stemmed results that are not coherent because it only strips suffixes. Lemmatization also changes words, but it ensures that the products are whole words in the language [7].
• Identifying stop words filters out filler words (like “a,” “the,” “and”) that are unnecessary for the neural network’s analysis. These words are usually listed for the NLP pipeline to recognize and flag so that the network can ignore them.
• Dependency parsing reads through each sentence and determines how the tokenized words connect to each other to understand the full meaning. It also creates a hierarchy by identifying parent words and associating all other words with them.
• Identifying noun phrases is important for separating commonly used nouns from those that hold a different meaning when combined with other words, such as in idioms like “it’s raining cats and dogs” versus the normal usage of “dogs” alone.
• Named entity recognition (NER) is when the network identifies proper nouns, such as “America” or “Donald Trump”, which may have their own associated backgrounds that would need to be identified. Additionally, due to the usage of pronouns in English, the network would need to identify proper nouns to connect them with the pronouns used to refer to them.
• Co-reference resolution is the process of connecting all pronouns to the nouns that they refer to. This allows the network to consolidate and link the relevant information.

2) Word Embedding: Modern word embedding converts words to dense vectors by projecting them into a high-dimensional vector space. This allows neural networks to develop connections between words, and plays an integral role in the conversion of words to numbers (vectorization), which can then be processed. Embedding can be incorporated into neural networks via an embedding layer, which often utilizes pre-trained word embeddings, allowing for more accurate translation than retraining with every new model [8].

Fig. 1. These graphs depict Word2vec and GloVe word embeddings [9].

D. Neural Networks

While there are several kinds of neural networks, their general structures and the manners in which they process data are all similar, because they are modeled after the network of neurons in the brain. Neural networks are constructed with layers of densely interconnected nodes that pass input data from one layer to the next [10]. Fundamental layers include the input and output layers, as well as the hidden layers, where the unique features of each neural network appear. Within these layers, each node is assigned a weight to multiply with incoming data. If the product satisfies a threshold value, the data is passed on. These weights and threshold values are randomized initially, and then trained over time [10]. The final value is produced by the output layer and can be translated into a result. Neural networks can be trained and adjusted until these experimental results match the expected outputs [9].

E. Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a type of neural network with a convolution operation in its hidden layers. This operation transforms inputs to outputs by filtering the data. It reduces higher-level functions into a smaller, more condensed matrix [11]. CNNs perform these transformations through their hidden convolutional layers, each of which is equipped with several filters. These qualities allow for specialization in classification problems such as image recognition and pattern detection [12]. This convolution operation differentiates the CNN from the multilayer perceptron (MLP).

Each filter can be mapped to a randomized matrix of a specified size. These filters convolve, or slide, across each matrix along the input while storing the dot product of each matrix. In the case of image recognition problems, early layers in a CNN may have filters that are designed to detect geometric

[...]

It controls the information written onto the Internal Cell State by the input gate. The final output gate determines what output will be generated from the current Internal Cell State, or the next hidden state, which gives the network a notion of past events. This hidden layer will reset after every test to separate the tested batches of input [16].

2) Hidden Layers: The hidden layers of an LSTM serve as the “memory” of the network, and it is crucial to identify how many layers and neurons to use in these layers. Using too many hidden neurons or layers could result in overfitting, which means that the model only learns to identify training data and doesn’t do well with external data.
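As a rough illustration of why the number of hidden neurons matters, the sketch below counts the trainable parameters of a single LSTM layer and flags a possible overfit from the gap between training and validation accuracy. It assumes the standard LSTM formulation (four weight sets, each with an input matrix, a recurrent matrix, and a bias); the function names, the 100-dimensional input, the 0.10 threshold, and the accuracy values are hypothetical choices made for this example and are not taken from the models or results of this paper.

```python
def lstm_param_count(input_dim: int, hidden_units: int) -> int:
    """Trainable parameters in one standard LSTM layer.

    The input, forget, and output gates and the candidate cell state each
    have an input weight matrix, a recurrent weight matrix, and a bias.
    """
    per_weight_set = hidden_units * (input_dim + hidden_units) + hidden_units
    return 4 * per_weight_set


def generalization_gap(train_accuracy: float, validation_accuracy: float) -> float:
    """Difference between accuracy on training data and on held-out data."""
    return train_accuracy - validation_accuracy


def looks_overfit(train_accuracy: float,
                  validation_accuracy: float,
                  max_gap: float = 0.10) -> bool:
    """Flag a model whose training accuracy far exceeds its validation accuracy.

    max_gap is an arbitrary illustrative threshold, not a standard value.
    """
    return generalization_gap(train_accuracy, validation_accuracy) > max_gap


# Hypothetical input size (for example, a 100-dimensional word embedding).
EMBEDDING_DIM = 100

# Capacity grows quickly as hidden units are added.
for units in (16, 64, 256):
    print(units, "hidden units:", lstm_param_count(EMBEDDING_DIM, units), "parameters")

# Made-up accuracies, used only to exercise the check.
print(looks_overfit(train_accuracy=0.99, validation_accuracy=0.81))
print(looks_overfit(train_accuracy=0.93, validation_accuracy=0.91))
```

Because the recurrent matrix is square in the number of hidden units, the parameter count grows roughly quadratically with layer width, which is one reason an oversized hidden layer can memorize training data. Comparing accuracy on held-out data against training accuracy is a simple way to notice when that has happened.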
