Fake News Detection and Production using Transformer-based NLP Models

Authors: Branislav Sándor - Student Number: 117728

Frode Paaske - Student Number: 102164

Martin Pajtás - Student Number: 117468

Program: MSc Business Administration and Information Systems – Data Science

Course: Master’s Thesis

Date: May 15, 2020

Pages: 81

Characters: 184,169

Supervisor: Daniel Hardt

Contract #: 17246

Acknowledgements

The authors would like to express their sincerest gratitude towards the following essential supporters of life throughout their academic journey:

- Stack Overflow
- GitHub
- YouTube
- Forno a Legna
- Tatra tea
- Nexus
- Eclipse Sandwich
- The Cabin Trip
- BitLab

“Don’t believe everything you hear: real eyes, realize, real lies” Tupac Shakur

“A lie told often enough becomes the truth” Vladimir Lenin

Glossary

The following is an overview of abbreviations that will be used in this paper.

Abbreviation – Full Name
BERT – Bidirectional Encoder Representations from Transformers
Bi-LSTM – Bidirectional Long Short-Term Memory
BoW – Bag of Words
CBOW – Continuous Bag of Words
CNN – Convolutional Neural Network
CV – Count Vectorizer
EANN – Event Adversarial Neural Network
ELMo – Embeddings from Language Models
FFNN – Feed-Forward Neural Network
GPT-2 – Generative Pretrained Transformer 2
GPU – Graphics Processing Unit
KNN – K-Nearest Neighbors
LSTM – Long Short-Term Memory
MLM – Masked Language Modeling
NLP – Natural Language Processing
NSP – Next Sentence Prediction
OOV – Out of Vocabulary Token
PCFG – Probabilistic Context-Free Grammars
RST – Rhetorical Structure Theory
SEO – Search Engine Optimization
SGD – Stochastic Gradient Descent
SVM – Support-Vector Machine
TF – Term Frequency
TF-IDF – Term Frequency – Inverse Document Frequency
UNK – Unknown Token
VRAM – Video RAM

Abstract

This paper studies fake news detection using the largest publicly available dataset of naturally occurring, expert fact-checked claims (Augenstein et al., 2019). Based on existing theory that defines the task of fake news detection as a binary classification problem, this paper conducted an extensive process of reducing the label space of the dataset. Traditional models and three different BERT-based models were applied to the binary classification task on the data to investigate the performance of fake news detection. The RoBERTa model performed best with an accuracy score of 0.7094. This implies that the model is capable of capturing syntactic features from a claim without the use of external features. In addition, this paper investigated the feasibility and effects of expanding the existing training data with artificially produced claims using the GPT-2 language model. The results showed that the addition of artificially produced training data, whether fact-checked or not, generally led to worse performance of the BERT-based models while increasing the accuracy scores of the traditional machine learning models. The paper finds that the Naïve Bayes model achieved the highest overall scores when the human-produced training data was augmented with fact-checked and with non-fact-checked artificially produced claims, with accuracy scores of 0.7058 and 0.7047, respectively. These effects were hypothesized to be caused by differences in the underlying architectures of the models; in particular, the self-attention element of the Transformer architecture might have suffered from the stylistic and grammatical inconsistencies in the artificially produced text. The results of this paper suggest that the field of automatic fake news detection requires further research. Specifically, future work should address the lack of sufficient data quality, size, and diversity, the increasing demand for computational resources, and the inadequate inference speed, which severely limit the application of BERT-based models in real-life scenarios.

Keywords: Fake News, Transformers, Natural Language Processing, BERT, GPT-2, Fake News Detection

Table of Contents

Acknowledgements ...... 2
Glossary ...... 3
Abstract ...... 4
Introduction ...... 7
Natural Language Processing ...... 8
Motivation ...... 9
Economics of Fake News ...... 9
Research Aim ...... 13
Research Question ...... 13
Theoretical Framework ...... 15
News ...... 15
Fake News ...... 17
Fake News Detection ...... 20
Digital Media and Fake News ...... 21
Feature Extraction ...... 23
Model Creation ...... 25
Datasets ...... 27
Related Work ...... 29
Text Features From Claims ...... 29
Fake News Detection Using Transformer Models ...... 31
Artificial Fake News ...... 32
Limitations of Existing Research ...... 33
Methodology ...... 35
Research Philosophy ...... 35
Ontology ...... 35
Epistemology ...... 35
Methodological Choice ...... 36
Data ...... 38
Primary Data ...... 38
Data Preparation and Pre-processing ...... 39
Label Reduction ...... 40
Text Pre-processing ...... 43
Text Feature Extraction Techniques ...... 45
N-Grams ...... 46
Bag-of-Words ...... 46
Term Frequency – Inverse Document Frequency ...... 47
Introduction to Applied Models ...... 48
Evaluation Metric ...... 67
Accuracy Score ...... 67
Model Evaluation ...... 67
Model Optimization ...... 68
Creation of Artificial Text Using GPT-2 ...... 68
Machine Produced Text ...... 68
Fact-Checking ...... 70
Fact-Checking Artificial Claims ...... 71
Results ...... 72
Results on Natural Claims Dataset ...... 74
Results on Natural Claims Dataset and Non-Fact-Checked Artificial Claims ...... 74
Results on Natural Claims Dataset and Fact-Checked Artificial Claims ...... 75
Discussion ...... 77
Limitations ...... 82
Future Work ...... 83
Reflections ...... 85
Conclusion ...... 89
Bibliography ...... 90
Appendix ...... 102
Appendix 1 – Label reduction of MultiFC Dataset ...... 102
Appendix 2 – Optimal GridSearch Hyperparameters for Baseline Models ...... 102
Appendix 3 – Manually Verified and Labeled Outputs From GPT-2 ...... 102

Introduction

Due to technological advancements in media and communication, people nowadays have access to large amounts of information provided by an enormous number of sources. While this allows consumers to access relevant information in a fast and cost-efficient manner, it has created an environment where fake news proliferates. The potential negative effects of fake news have come to the attention of the globalized world due to recent socio-political events such as the 2016 US presidential election. Therefore, fake news detection has gained increasing popularity and relevance among researchers and the public. With the emergence of revolutionary model architectures in natural language processing, this sub-field of artificial intelligence has been employed to find solutions to the problem of effectively and efficiently detecting fake news.

This paper will study the application of BERT-based models built on the Transformer architecture to the fake news detection task, using the largest publicly available dataset of naturally occurring, fact-checked claims from numerous fact-checking websites. Moreover, this paper will investigate the impact of utilizing artificially produced text as additional training data on the performance of traditional machine learning models and modern models. Additionally, this paper provides a discussion of the challenges and opportunities associated with implementing the selected models and methods for the task of fake news detection.

Natural Language Processing

Humans are the most intelligent species on Earth. Our ability to talk to and understand each other by utilizing natural language allows us to effectively communicate and share information. Natural language develops gradually over long periods of time, is used by humans, and can take the form of speech or text. Mandarin, Spanish, English, and Hindi-Urdu are currently considered the most widespread natural languages (Hammond, 2019). Unlike these languages, human-constructed languages, also referred to as artificial languages (e.g. computer languages, Esperanto), are not regarded as natural languages (Lyons, 1991). Natural language consists of sets of rules that limit and shape a specific grammar for a given language (Nordquist, 2020). These rules ensure smooth interpretation of a certain opinion, event, or emotion. Despite natural language being governed by rules, speech and text can hardly be referred to as structured data (Lopez Yse, 2019). Unstructured data are difficult to store, manipulate, and derive meaning from; thus, machines are not capable of processing and understanding natural language in its raw, unprocessed form. However, a multidisciplinary sub-field of artificial intelligence and linguistics, called Natural Language Processing (NLP), emerged to address this issue.

The main purpose of NLP is to enable computers to understand natural language. Due to traits such as high complexity, long-distance dependencies, and ambiguous words, NLP performed by a machine constitutes a complex task. Multiple methods and techniques are required to successfully process natural language in verbal or written form. However, the benefits that NLP offers make this field appealing to entities from a broad variety of spheres, such as Google, Apple, or the US Department of Defense (Eggers, Malik, & Gracie, 2019).

Technical advancement has propelled computers to a level at which they can perform some tasks faster or even better than humans; this is also the case for tasks related to natural language. Modern computers are capable of processing large amounts of data in a shorter time than any human. Therefore, by successfully understanding natural language, machines can speed up, improve, or automate tasks such as text classification, sentiment analysis, question answering, information extraction, and text generation. The ability to automate these tasks enables many use cases for NLP, for both beneficent and malicious actors.

Early attempts at NLP-related tasks date back to the development of the Turing test in 1950 (Canuma, 2019). The Turing test is a test of a machine's ability to exhibit human-like intelligent behavior (Canuma, 2019). Early NLP systems and solutions were based on large and complex sets of hand-written rules that enabled machines to understand natural language (Canuma, 2019). Examples of NLP systems based on these rules are machine translators or simple chatbots (Arya, 2019; Hutchins, 2005). Despite the often-high complexity of these rules, this approach was not ideal, since clearly defined rules are incapable of capturing all intricacies of natural language, as natural language develops continuously. The emergence of machine learning in the mid-1980s, coupled with improvements to computer hardware, influenced the field of NLP and resulted in a gradual move away from hand-written rules (Canuma, 2019). The adoption of machine learning methods changed NLP by enabling it to use statistical reasoning to detect patterns in languages from the analysis of large text corpora (Canuma, 2019).

Motivation

Despite the potentially dangerous consequences of intentional misinformation being raised by the World Economic Forum as early as 2013, it was not until 2016 that the phenomenon of fake news started receiving significant attention (Charlton, 2019; World Economic Forum, 2013). 2016 was a landmark year for surprising socio-political changes, specifically the outcome of the US presidential election and the Brexit referendum in the UK, both of which became controversial due to the involvement of Cambridge Analytica, a data analytics company that came under the spotlight for its unethical processing of personal data (Scott, 2018). To emphasize the socio-political as well as economic significance of fake news, Oxford Dictionary’s 2016 Word of the Year was ‘post-truth’, which was defined as: “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief” (Oxford University Press, 2016). This served as proof of the transition to the next stage of the information age that the globalized world had undertaken (Charlton, 2019).

Economics of Fake News

To understand why and how fake news has become a serious issue, it is important to first understand the digital transformation that news publishing and distribution have undergone. Historically, the business of news publishing leveraged vertical integration to control the supply chain in terms of both

the production and the distribution of news, mainly through printing and selling newspapers, thus adopting a linear business model (Martens, Aguiar, Gomez-Herrera, & Mueller-Langer, 2018).

However, with advancements in digital technologies and the spreading presence of the internet, the cost of news distribution was reduced to near zero, as the internet became a cheap alternative to printing and delivering physical newspapers (Martens et al., 2018). Furthermore, new technologies enabled online forms of news production, further lowering costs. This also lowered the cost of entry and attracted new entrants, such as online-only newspapers, bloggers, and social media influencers, which increased competition (Martens et al., 2018). Traditionally, news production was conducted on a 24-hour cycle, or one version per day (Martens et al., 2018). However, moving distribution online made it possible to continuously produce and update news throughout the day (Martens et al., 2018). Shorter production cycles, partly amplified by increasing competition, have had a significant impact, as they substantially reduced the time allocated for fact-checking and quality control (Martens et al., 2018). Another consequence of strong competition is duplication, or an increased number of substitutes among the news reported by multiple news-producing outlets, which may result in consumers being reluctant to pay for news that can be accessed elsewhere (Martens et al., 2018). This issue might be addressed by differentiation, although most news sites opted for ‘freemium’ models, offering free access to a number of articles and hiding the rest behind a paywall (Martens et al., 2018). Additionally, compared to selling a physical newspaper as a bundle of articles, news sites now offer readers the option to choose just the articles they find interesting, increasing the relevance of the news for consumers (Martens et al., 2018).

How readers access news articles changed with the advent of search engines, which emerged partly as a solution to help consumers navigate the increasing amounts of information available on the internet (Martens et al., 2018). Search engines, being multi-sided platforms matching news consumers with news producers, became the mediators of news consumption, which meant that the function of news curation historically performed by news editors was transferred to algorithm-driven search engine platforms (Martens et al., 2018). Subsequently, the ranking of news articles, previously decided by the editors of a newspaper (e.g. which news article appears on the front page), has been delegated to a ranking algorithm within a search engine (Martens et al., 2018). However, search rankings can be influenced by news producers, a practice commonly referred to as Search Engine Optimization (SEO) (Martens et al., 2018). This enables malevolent producers of fake news to promote their

content by manipulating search engine results (Martens et al., 2018). Notably, advertising revenue plays a role in search rankings, as both search engines and news publishers, whose webpages contain ads, stand to profit (Martens et al., 2018). Nonetheless, while news publishers need to protect their branding and their market positioning, search engines can adopt a more advertising-driven profit approach, as long as they provide relevant news to consumers (Martens et al., 2018).

The emergence of social media represented the next step in the transformation of news distribution (Martens et al., 2018). Having attracted a vast number of users and successfully kept them engaged, social media platforms drew in news publishers, who created a presence on these platforms where their readers spend increasing amounts of time (Martens et al., 2018). Shearer and Gottfried (2018) found that 68% of US adults consume news on social media at least occasionally. Social media apps like Facebook and Instagram have become important intermediaries for news consumption as they offer more convenience for users on mobile devices (Martens et al., 2018). Specifically, social media differ from search engines in enabling interaction among networked users on the platform, which not only affects what news an individual receives, but also enables the propagation of news throughout the network via posting and sharing (Martens et al., 2018). As with search engines, social media platforms utilize an algorithm-driven approach to selecting news content for users, with the goal of generating and maximizing interactions. This in turn drives more traffic, which generates more advertising revenue for the platforms (Martens et al., 2018). Additionally, news publishers risk losing even more control over news content on social media platforms than on search engines, as they not only give up control of distribution but also face the risks of misinterpretation that arise from users’ ability to share and comment on posts (Martens et al., 2018).

As aforementioned, the business of news publishing and distribution used to be based on a vertically integrated, linear business model, but following advancements in digital technologies, it has been transformed into an algorithm-driven, multi-sided platform business model (Martens et al., 2018). Research from the Reuters Digital News Report found that two-thirds of web users access news via search engines, social media, or other intermediaries, as opposed to accessing it directly on a newspaper’s website (Newman, Fletcher, Kalogeropoulos, Levy, & Nielsen, 2018). These changes in news production, distribution, and consumption have several consequences for fake news production and dissemination (Newman et al., 2018).


Firstly, online news distribution decreases the costs of market access for new entrants, which affects the market for news consumption in two significant ways (Martens et al., 2018). On the one hand, more entrants lead to the participation of news publishers whose credibility and the veracity of whose news are unknown (Martens et al., 2018). This can result in an overall reduction in consumers’ trust towards all news publishers, similar to what Akerlof (1970) described in his seminal paper as a market for “lemons”. A recent global study from Ipsos found that the prevalence of fake news was one of the two main factors that caused trust in traditional media to decline over the past five years (Grimm, Boyon, & Newall, 2019). On the other hand, lowered costs of market access make it easier and cheaper to produce and disseminate false news (Martens et al., 2018). Kshetri and Voas (2017) found empirical evidence of fake news creators taking advantage of this. Specifically, more than 140 US politics websites were launched from the North Macedonian town of Veles during the one-year period before the 2016 US presidential election, with one teenager earning as much as $16,000 from two pro-Trump websites, exemplifying a viable business opportunity (Kshetri & Voas, 2017). Notably, creators of fake news face little to no risk of legal prosecution (Kshetri & Voas, 2017).

Secondly, as the role of the news curator has been taken over by algorithm-driven platforms, their objective of maximizing advertising revenue and online traffic enables trending news, even if fake, to rank highly and be distributed to large audiences, and in the case of social media platforms to propagate deep and wide into their networks (Martens et al., 2018).

Lastly, it is important to emphasize that algorithm-driven platforms should not bear all the blame, as cognitive biases, such as novelty or confirmation bias, play an essential role in the spread and impact of fake news (Martens et al., 2018). Nevertheless, algorithm-driven news markets have made exploiting these biases easier (Martens et al., 2018).

There have been various approaches to countering fake news, including the establishment of fact-checking mechanisms, both government-organized and crowdsourced, self-regulation by technology companies, reducing financial incentives in advertisement, boosting media literacy and critical thinking skills, and government legislation (Vasu, Ang, Jayakumar, Faizal, & Ahuja, 2018). Government legislation in particular has been controversial, as regulating news challenges one of the fundamental values of democracy: the freedom of speech (Tambini, 2017).


Financial incentives to produce and spread fake news, coupled with a low risk of punishment due to insufficient or ineffective legal policies, human cognitive biases, and the advertising-driven financial incentives for algorithm-driven news curators to propagate popular news, have co-created an environment where fake news thrives. However, since the task of curating news has shifted from human editors to algorithms, it is through the self-regulation of platforms that fake news can be detected and prevented from further exploiting the favorable economic incentives created by the digital transformation of news.

Therefore, this paper builds on previous work in automatic fake news detection utilizing machine learning and deep learning models and aims at expanding the current knowledge within the field.

Research Aim

Fake news has enjoyed popularity as a topic of interest among a wide variety of academic fields, in part due to its multidisciplinary nature, which makes research from a societal, economic, legal, or technical perspective valuable. This paper aims at developing knowledge in a sub-field of general fake news research, one that studies approaches to fake news detection. This paper will study the performance of traditional machine learning models as well as three different BERT-based models on fake news detection, framed as a binary text classification task. Furthermore, this paper aims at contributing to the existing research field that studies the performance of the aforementioned models solely on the text of claims, which are labelled as true or fake, disregarding external features. In addition, the creation of artificial fake news using the GPT-2 model, its subsequent use as additional training data for the models, and the effects on model performance will be investigated.

Research Question

In order to achieve the stated research aims, the following research question is proposed:

How do BERT-based language models compare to traditional machine learning models in NLP on the fake news detection task on human and machine-produced text?

The subsequent sections are organized in the following manner. The Theoretical Framework section introduces the topics of news and fake news and provides the definitions adopted in this paper. Additionally, it presents the theoretical framework for studying fake news detection from the perspective of the data available for analysis, the variety of models that can be employed, and an overview of some of the currently available datasets. The Related Work section describes existing literature on utilizing textual features for the fake news detection task, employing Transformer-based models for the fake news detection task, and producing artificial fake news. Furthermore, it identifies the limits of the existing literature and presents the two contributions this paper aims to make. The Methodology section establishes the research foundations from a philosophical perspective and acquaints the reader with the dataset used in this paper as well as the text pre-processing and feature extraction techniques. Furthermore, it introduces the machine learning and deep learning models employed in this paper along with the model optimization and evaluation methods. Moreover, the production of artificial data and the methods for fact-checking it are introduced. The Results section presents the performance of the employed models on three classification tasks utilizing the dataset as well as the artificially generated text. The Discussion section explores the results, provides the answer to the research question, and offers perspectives on limitations, suggestions for future work, and critical reflections. Last but not least, the paper rounds off with concluding remarks in the Conclusion section.

Theoretical Framework

This section will introduce a theoretical framework used for studying fake news detection, in addition to a theoretical background on news and fake news. The framework defines the terms, methods, and theories necessary for appropriately establishing a foundation for fake news detection.

News

News has become an inherent part of our everyday lives; nowadays people can consume and share news anywhere at any time. News is, however, a broad term that many people think of differently, and thus researchers, scholars, and journalists have come up with different definitions of the term.

Cambridge Dictionary defines news as: “information or reports about recent events” (Cambridge Dictionary, 2016). Charles Anderson Dana, who worked as an editor of the Sun, considered news anything novel and interesting to a large part of a community, and described it as: “News is anything which interests a large part of the community and which has never been brought to their attention.” (McKane, 2014, p.1). Another, more abstract definition of news was presented by Tuchman, who perceives news as a “window on the world”; by looking into this window one should see what they want to know, need to know, and should know (1978, p. 1). Some researchers believe that news is tied to, or even dependent on, an event; according to Wilbur Schramm, news is: “something perceived after the event … an attempt to reconstruct the essential framework of the event” (1949, p.259). Scholars, journalists, and ordinary people possess different views and definitions of news. However, most agree that news is a description of an interesting and/or significant event. Thus, this is the definition of news that will be employed in this paper.

Humans are social beings and have an interest in events that might influence their lives. For centuries, people have established news organizations to shape and circulate knowledge (Tuchman, 1978). One of the first forms of news organizations were town criers (Park & Burgess, 1967 in Tuchman, 1978). Town criers were tasked with making public announcements and sharing important news with the inhabitants of a particular town. Since information sharing was rather inefficient and information from distant parts of the world was simply not available to the wider public, news usually had a local character. Therefore, news was easily verifiable, due to people’s physical proximity to an event as well as the clear identity of the news’ author.

As opposed to commoners, rulers and patricians of the medieval age possessed a genuine interest in important events taking place in different parts of the then-known world. Receiving such information on a regular basis at that time could only be enabled by travelers sharing news from the parts of the world they visited (Pettegree, 2014). However, conflicting news happened to be present even at this age, which posed a challenge to rulers and powerful people because they had to decide what information to trust (Pettegree, 2014). The main indicator of the news’ credibility was the reputation of the messenger who delivered the news. This meant that news delivered by a trusted source was considered more trustworthy than news delivered by an unfamiliar source (Pettegree, 2014). By the

16th century, newsmen had become more sophisticated and tended to label some messages as unconfirmed; moreover, in some cases a ruler would patiently wait for more messages confirming the occurrence of an event before making any important decision (Pettegree, 2014).

The invention of the printing press in the mid-15th century offered new and more efficient ways of sharing news with the wider public. The first form of printed news accessible to ordinary people was the news pamphlet (Pettegree, 2014). News included in these pamphlets usually consisted of exciting events, battles, or sensations; furthermore, the authors of the pamphlets often did not hesitate to exaggerate in their texts in order to attract more readers (Pettegree, 2014).

As pamphlets with interesting and entertaining content enjoyed high popularity, newspapers containing plain and straightforward facts struggled to attract larger audiences. Despite the relatively fast spread of the newspaper, the majority of society did not have a need to receive news about world affairs on a regular basis (Pettegree, 2014). Given that events mentioned in newspapers were neither directly related to them nor entertaining, people had little incentive to buy and read them. Therefore, teaching and convincing people to regularly consume news about events outside of their region, in order to better understand world affairs, was a necessary and tedious process. However, all these efforts ended up being successful, as by the end of the eighteenth century the newspaper had become a part of everyday life and a primary source of news for many people (Pettegree, 2014).

Before the widespread adoption of the internet, physical newspapers were the main source of news distributed to ordinary people. Social media and the internet offer a unique space for anyone to create, comment and share any kind of news, and reach a mass audience (Tandoc, Lim, & Ling, 2018). Thus, even people who are not journalists can exploit all the possibilities of blogging and utilize social

media to spread a message (Tandoc et al., 2018). Sharing and consuming news via social media became a phenomenon of the 21st century, and despite a high risk of disinformation/misinformation, an analysis from 2018 showed that around 68% of US adults get their news from social media (Shearer & Gottfried, 2018). Traditional newspapers have also moved to this virtual space in order to offer their readers information in a faster and more efficient manner.

The perceived value of news depends on its relevance or appeal to a particular individual. After all, the popularity of a newspaper is a main incentive for news creators to continue operating. Thus, it is important to realize that most newspapers are private businesses with the goal of attracting as many readers as possible and, hence, making a profit (McNair, 2000). Historically, this implied selling physical newspapers, whereas nowadays the focus has shifted to attracting visitors to a website. Moreover, the impact of news varies greatly. Some tabloid newspapers mostly publish news closely related to the personal lives of famous people; their impact on society is rather trivial, as this kind of news serves primarily as a source of entertainment. On the other hand, a serious newspaper investigating and depicting important international affairs and politics can influence public opinion in support of a significant political or societal decision.

Given the impact that news can have, the truthfulness of news is of paramount significance. However, some people with malicious intentions try to take advantage of the power of news by creating and spreading fake news to influence and shape opinions of others.

Fake News

Modern technology enables people to consume news related to any event anywhere in the world. People now have access to more information than ever before; however, this ease of creating, sharing, and consuming news comes at a price. The overload of information, including conflicting information, challenges people’s ability to distinguish true news from fake. In this section, the phenomenon of fake news will be described, in addition to five primary types of textual fake news.

Several researchers refer to fake news as news that is intentionally false in order to mislead (Allcott & Gentzkow, 2017; Klein & Wueller, 2017; Mustafaraj & Metaxas, 2017). This paper will use the definition of fake news proposed by Wardle (2017) who defines fake news as an output of misinformation (unwittingly creating and sharing false information) as well as disinformation

(intentionally creating and sharing false information). The main reason for employing this definition is the fact that the digitalization of news completely changed news distribution and the general perception of news (Tandoc et al., 2018). While in the past people expected news to be written and provided by journalists working for well-established and reputable newspapers, nowadays news on social media is perceived by many as credible (Shearer & Gottfried, 2018). Despite news sources often being unknown or untrustworthy, many social media users rarely verify the information they share or consume; hence, some news considered fake is shared by people who do not realize the falsity of the content (Tandoc et al., 2018).

Typology of Fake News

In their article, Tandoc et al. (2018) describe six primary types of fake news: news satire, news parody, news fabrication, photo manipulation, advertising and public relations, and propaganda. Since this paper focuses on textual news, visual fake news will not be covered in this section.

News Satire

News satire, as a form of fake news, intends to mock television news programs using humor or exaggeration to present the latest news. News satire programs mimic the style of a television news program and often focus on current events (Tandoc et al., 2018). News satire is easily recognizable as fake news by the majority of the audience, who perceive it as a source of entertainment. Despite an obvious intention to entertain, this kind of fake news can be very influential. By utilizing humor or exaggeration, these programs can mock or criticize certain claims of politicians or influential people and hence shape public opinion as well as political trust (Brewer et al., 2013 in Tandoc et al., 2018). Thus, the content of news satire comprises intentionally fake news that is based on real events (Tandoc et al., 2018).

News Parody

Similarly to news satire, news parody provides humorous news-like reports by mimicking traditional mainstream news media (Tandoc et al., 2018). The main difference between news parody and news satire is the use of facts. While satire uses facts to describe events in an entertaining or absurd manner, parodies are inspired by real events; however, the final product of a parody is a completely fictitious news story (Tandoc et al., 2018). The main assumption regarding news parody is that both parties (creators and readers) are fully aware of the humor as well as the falsity of the information (Tandoc

et al., 2018). In spite of creating and publishing ridiculous and absurd news, news parody as well as news satire often highlight mistakes and faux pas of the news media and hence serve as watchdogs that increase professionalism among journalists.

News Fabrication

As opposed to the two previous types of fake news, the purpose of news fabrication is to intentionally produce fake news and spread it to disinform. News fabrication encompasses articles that are not based on facts but rather mimic the style of real news articles to gain credibility (Tandoc et al., 2018). Moreover, authors of these articles often use unverifiable facts or create an illusion of objectivity within their articles (Tandoc et al., 2018). Authors of fabricated news mimic the presentation and style of traditional media to enhance the credibility of their news; furthermore, the trustworthiness of this kind of news increases when it is shared by a trustworthy or respected person on social media (Tandoc et al., 2018). Fabricated news with a strong political overtone enjoys popularity mostly in societies with social tension and a lack of trust in societal establishments (Tandoc et al., 2018). Furthermore, authors of fabricated news can utilize news bots that share their news on social media, which gives users the illusion that the news is read and liked by others (Tandoc et al., 2018).

Advertising and Public Relations

Fake news can also be exploited by public relations practitioners to create an illusion of providing real, unbiased news in order to advertise products (Tandoc et al., 2018). Thus, this kind of news is biased and often mentions only the positive aspects of particular products, while still including some factual data to give readers the impression that they are reading real news. Clickbait is also part of this type of fake news, as clickbait headlines use sensational or interesting captions (often without any factual support) to attract the attention of a large number of users and subsequently make them click on a link that usually redirects them to some commercial website (Tandoc et al., 2018).

Propaganda

Propaganda constitutes news stories created by political entities with the goal of influencing public opinion (Tandoc et al., 2018). Despite being regarded as fake news, propaganda is often based on facts; however, only the facts that fit the ideology of the author are highlighted and promoted, while other facts are usually downplayed or disparaged (Tandoc et al., 2018). Hence, propaganda shares some characteristics

with advertising, as it uses real facts to describe a particular perspective in the best light in order to convince readers or viewers (Tandoc et al., 2018).

Fake news can be utilized by many individuals and institutions with different intentions. Some use fake news merely to entertain their followers or viewers while admitting the falsity of the news. On the other hand, some actors exploit fake news to benefit some entity by manipulating and shaping the opinions of others. The increased occurrence of sophisticated fake news has turned confirming the veracity of news into a real challenge that can only be overcome by fact-checking and successful fake news detection.

Fake News Detection

Having outlined different kinds of fake news and their respective purposes, we now turn our attention to describing a theoretical framework intended for fake news detection. However, it is critical to first understand why algorithmic fake news detection has received attention from the academic community as well as the general public in recent years.

As Shu et al. put it: “Fake news itself is not a new problem.” (2017, p.3). Though fake news is by no means a recent phenomenon, the digitalization of news publishing significantly amplified its reach and effect. Social media in particular, but also all the technologies they are built upon, revolutionized the dissemination of information to an unprecedented speed and scale. The increased opportunity to spread fake news to a massive number of users on social media, now within fingers’ reach, is dangerous, especially because humans’ ability to discern fake news is limited by psychological vulnerabilities such as confirmation bias (Kumar, West, & Leskovec, 2016; Shu et al., 2017). Indeed, a meta-analysis of 206 documents found that humans on average performed just 4% better than chance at discriminating truths from lies in a study of deception judgements (Bond & DePaulo, 2006). Recent research from Stanford University revealed discomforting findings from assessing the civic online reasoning skills, the ability to determine the validity of information consumed digitally, of over 7,800 students, many of whom are considered “digital natives”. The study found that: “More than 80% of students believed that the native advertisement, identified by the words ‘sponsored content,’ was a real news story” (Wineburg, McGrew, Breakstone, & Ortega, 2016, p. 10). It further noted: “Some students even mentioned that it was sponsored content but still believed that it was a news article. This suggests that many students have no idea what ‘sponsored content’ means and that this is something that must

be explicitly taught as early as elementary school.” (Wineburg et al., 2016, p. 10). Additionally, when studying undergraduates’ reasoning skills using tweets with a political agenda, the Stanford group found that students failed to examine the origin of the tweets and the intent behind the information in them, which was attributed to a lack of navigational skills on social media (Wineburg et al., 2016).

Fortunately, there are alternative approaches to detecting fake news beyond simply relying on an individual’s capabilities. In fact, several studies have shown that automated classifiers outperform human judgement in discerning fake news, emphasizing the need for algorithm-based approaches to fake news detection (Kumar et al., 2016; Ott, Choi, Cardie, & Hancock, 2007; Pérez-Rosas, Kleinberg, Lefevre, & Mihalcea, 2017).

Digital Media and Fake News

With studies showing that humans do not generally possess sufficient reasoning skills to assess the veracity of digital news, and with the potential for detecting fake news automatically using algorithms instead, this paper presents the challenges and opportunities that the digitalization of news publishing has spawned with regard to the spread and detection of fake news.

The digitalization of news publishing fundamentally changed how information is created, shared, and consumed. In particular, there has been a shift from centralized broadcasting of information by journalists and news agencies, with the majority of people acting as passive consumers, to a state where social media enables passive consumers to become co-creators and disseminators of information by actively participating in the network and engaging with other users. Though this empowerment of users provides beneficial opportunities for useful feedback mechanisms, it also creates the possibility for malicious users to exploit it. Furthermore, the pursuit of personalization for each and every user on internet platforms, for the purpose of providing relevant information, has been a double-edged sword, as it enables what are referred to as “echo chambers” (Kumar et al., 2016). Thus, two primary concepts that have gained relevance due to the increasing proliferation of fake news on social media are malicious actors and echo chambers, which will be described in detail in the following sections.

Malicious Actors

Malicious actors can be both humans and computer algorithms, also referred to as ‘bots’, whose intention is to mislead a news consumer into believing a false piece of information by echoing it or supporting it directly (Kumar et al., 2016). Thus, their main purpose is to spread fake news fast and deep on digital media platforms and/or make it seem more credible (Kumar et al., 2016). The presence of such malicious actors is predicated on the fundamental design of social media, which offers low sign-up costs, originally for the purpose of attracting a large number of ordinary users and leveraging network effects on the platforms (Shu et al., 2017). These malicious actors often aim to trigger an emotional response from ordinary users on the social media platform in order to get them to interact with their fake news and spread it further on the platform (Shu et al., 2017). Bots, in particular, can reach formidable scale. Research from the Oxford Internet Institute found that during the week before election day in the 2016 presidential election in the United States, 19 million bots tweeted supporting information for either of the candidates (The Computational Propaganda Project, 2016). Moreover, though the impact of the bots was most visible during the aforementioned presidential election, such political bots have been employed in Western democracies such as Italy, Germany, and the UK, and in less democratic countries such as Russia, China, and Turkey alike (The Computational Propaganda Project, 2016).

Echo Chambers

While personalized information increases the perceived value an individual can gain from participating on a digital media platform, it can result in a dire side-effect referred to as the ‘echo chamber’ or the ‘echo chamber effect’. It is the result of self-inflicted polarization, occurring when individuals enter social groups and follow ideological pages or other individuals with whose ideas they already agree (Kumar et al., 2016). This effect is further propagated by the personalization algorithms of search engines and social media platforms, which suggest content that is similar to what an individual currently consumes or content that is consumed by similar individuals, thus implicitly creating an ideologically separated chamber. Consequently, echo chambers aid in spreading fake news by lowering the perceived need for critical fact-checking as a result of ‘social credibility’, a psychological factor affecting an individual’s judgement of the credibility of an information source based on the number of other individuals who consider it credible (Kumar et al., 2016; Shu et al., 2017). Furthermore, another psychological factor, coined the ‘frequency heuristic’, describes that the

frequency of exposure to both true and false information is correlated with the perceived accuracy of the information (Kumar et al., 2016; Shu et al., 2017).

Although both malicious actors and echo chambers amplify the effects of fake news, they can provide useful insights into how to counter the spread of fake news. The following section contains a description of a theoretical framework for studying fake news detection, utilizing the news content as well as the additional data on malicious actors and echo chambers.

The theoretical framework for fake news detection was adapted from Shu et al. (2017), as it is data-mining oriented and this paper studies fake news detection from a data perspective. Shu et al. (2017) define the fake news detection task as a binary classification problem. The authors propose a general framework for fake news detection comprising two phases, namely feature extraction and model creation, which are described in detail below (Shu et al., 2017).

Feature Extraction

The fake news detection function takes two types of inputs, also referred to as features. Shu et al. (2017) define these as News Content Features and Social Context Features: the former describes features derived from the news content, while the latter relates to auxiliary social context information.

News Content Features

News Content Features describe information regarding a news article (Shu et al., 2017). Its main attributes are the source (the creator of the news), the headline (the title of the news, attempting to attract attention and sum up the main idea), the body text (containing the details of the news), and image/video (usually part of the body, providing additional visual information to the news story) (Shu et al., 2017). Based on the raw data from the news, various types of feature representations can be engineered to assist in identifying the specifics of fake news (Shu et al., 2017). These representations can be organized into two groups, linguistic-based and visual-based (Shu et al., 2017). The former explores the tendency of fake news to contain writing styles that invoke emotional responses in the readers, which can be identified through features extracted at the character, word, sentence, or document level, often including the frequency of words, the average number of characters per word, the usage of quotes, etc. (Shu et al., 2017). The latter describes features extracted from visual material accompanying the

textual part of a piece of news, and include visual as well as statistical features such as an image clarity score or the number of images (Shu et al., 2017).
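To make the linguistic-based representations more concrete, the following is a minimal sketch, not taken from this thesis, of how such surface-level features could be computed for a single claim in Python; the chosen features and the tokenization pattern are illustrative assumptions.

```python
import re

def linguistic_features(text: str) -> dict:
    """Illustrative character- and word-level features for a single claim."""
    words = re.findall(r"[A-Za-z']+", text)
    num_words = len(words)
    return {
        "num_words": num_words,
        "avg_word_length": sum(len(w) for w in words) / max(num_words, 1),
        "num_quote_marks": text.count('"') + text.count("\u201c") + text.count("\u201d"),
        "num_exclamations": text.count("!"),
    }

# Example claim with a sensationalist tone
print(linguistic_features('Scientists "confirm" the shocking truth about coffee!'))
```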

Social Context Features

Social Context Features refer to auxiliary social context information, such as how news proliferates and how users engage with it on social media platforms. Similarly to News Content Features, Social Context Features are meaningfully organized into three categories: user-based, post-based, and network-based (Shu et al., 2017).

User-based features, which are derived from the interactions between users and the news, can be utilized to counter the spread of fake news, particularly news created and spread by malicious users (Shu et al., 2017). Though post-based features might at first glance seem more related to the news content features category, they belong to the social context features category because they capture the general public’s reactions to social media posts (Shu et al., 2017).

Shu et al. (2017) differentiate networks into stance networks, co-occurrence networks, friendship networks, and diffusion networks. In order to extract features from these networks, they first need to be constructed. Once successfully built, network metrics, such as a clustering coefficient, can be used as feature representations, i.e. network-based features (Shu et al., 2017).
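As a toy illustration of such network-based features, assuming a hypothetical user-interaction network rather than any data analyzed in this paper, clustering coefficients could be computed with networkx as follows:

```python
import networkx as nx

# Invented example: an edge means two users engaged with the same piece of news.
interactions = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"), ("u3", "u4")]
graph = nx.Graph(interactions)

# Per-node clustering coefficients can serve as network-based feature representations,
# and the graph-level average as a single summary feature.
print(nx.clustering(graph))
print(nx.average_clustering(graph))
```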

Despite Shu et al. (2017) recommending the use of all of the available features for fake news detection, this paper investigates the use of News Content Features, specifically linguistic-based features extracted from the body text of news articles. Although limiting the features to just one source admittedly reduces the signal from the available data that can be useful for detecting fake news, it enables the outcomes of the analysis to be generalized to all news sources. Additionally, Undeutsch (1967) hypothesized that fabricated or fictitious stories differ noticeably from accounts of real-life events; thus the news text itself contains a trace of fictitiousness that a machine learning model could identify and use to detect fake news. This is further supported by several studies which have investigated whether fake news exhibits different textual traits from true news. One study concluded that lexical features can be used to distinguish the reliability of news sources (Rashkin, Choi, Jang, Volkova, & Choi, 2017). Other studies have also concluded that lexical and syntactic features of a body text can be utilized in the process of discerning real news from fake (Pérez-Rosas et al., 2017).

Model Creation

Understanding which features can be extracted from news is essential; however, without utilizing the features to build models that can successfully identify fake news, the features themselves would yield only limited benefits. To meaningfully organize the different types of models used to detect fake news, Shu et al. (2017) group them into two categories based on the features that the models take as inputs (although not exclusively those features), namely News Content Models and Social Context Models.

News Content Models

Models in this category rely on data from the news content and on existing factual sources to classify whether a piece of news is fake or not (Shu et al., 2017). These models are either knowledge-based or style-based, depending on the approach to the fake news detection task (Shu et al., 2017).

Knowledge-based approaches describe the most straightforward approach to assessing the veracity of a news article: utilizing external sources to provide the context for the information contained in the news article (Shu et al., 2017). The knowledge-based approach is also referred to as ‘fact-checking’, and there has been substantial effort to automate this process (Shu et al., 2017). It further breaks down into three types of fact-checking: expert-oriented, which relies on human experts to undertake the laborious task of investigating a claim and providing a verdict regarding its credibility (websites like PolitiFact or Snopes); crowdsourcing-oriented, which leverages the ‘wisdom of the crowds’ by empowering users to label suspicious news and then aggregating their ratings to produce the final veracity assessment (for example Fiskkit); and lastly computational-oriented, which aims to automate and scale the fact-checking process but relies on external sources such as the open web or a structured knowledge graph for claim validation (Shu et al., 2017).

While knowledge-based approaches depend on external knowledge, style-based approaches exploit the fact that fake news uses an atypical writing style due to the intention to deceive in a believable manner (Shu et al., 2017). Style-based approaches are categorized into deception-oriented and objectivity-oriented. The former focuses on identifying statements containing deceptive information, either by utilizing probabilistic context-free grammars (PCFG) or by learning the difference between deceptive and normal statements, employing rhetorical structure theory (RST) or deep learning models such as convolutional neural networks (CNN) (Shu et al., 2017). The latter is aimed

at identifying style signals that could reveal a decreasing objectivity of the news, such as the hyperpartisan style and yellow journalism. The hyperpartisan style is characterized by extreme behavior towards a political party, while yellow journalism contains insufficiently researched information and striking headlines that appeal to the emotions of the reader (Shu et al., 2017).

Social Context Models

These models make use of features that stem from the design of social media platforms, created by the users interacting with and sharing news on the platforms (Shu et al., 2017). Shu et al. point out that not many approaches have taken advantage of these available features; they describe two main types: stance-based and propagation-based.

Stance-based approaches utilize user-generated reactions to a piece of news to infer its credibility (Shu et al., 2017). Shu et al. define the task of stance detection as: “automatically determining from a post whether the user is in favor of, neutral toward, or against some target entity, event, or idea.” (2017, p. 7). These users’ stances can then be used to assess the veracity of the piece of news (Shu et al., 2017).
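As a purely illustrative sketch, and not a method proposed by Shu et al. (2017) or employed in this paper, aggregated user stances could be turned into a simple credibility signal as follows; the stance labels and the scoring rule are invented for illustration:

```python
from collections import Counter

# Toy example: stances of users reacting to one news item.
stances = ["favor", "against", "against", "neutral", "against"]

counts = Counter(stances)
# Negative values suggest that users predominantly dispute the claim.
credibility_signal = (counts["favor"] - counts["against"]) / len(stances)
print(counts, credibility_signal)
```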

Propagation-based approaches study the interrelations between relevant social media posts to infer news veracity (Shu et al., 2017). They are based on the assumption that the veracity of a news event is related to the veracity of relevant social media posts (Shu et al., 2017). These posts are studied in homogeneous credibility networks, containing exactly one type of entity such as a news event, and heterogeneous credibility networks, containing multiple types of entities such as posts and news events (Shu et al., 2017).

As aforementioned, this paper investigates models that detect fake news by learning from linguistic-based features extracted from the news content. Therefore, this paper studies the performance of News Content, style-based models, utilizing pre-trained deep learning language models as well as traditional machine learning models to learn to differentiate between deceptive and normal statements.

Datasets

Although a meaningful organization of features (data inputs) and models into a theoretical framework is crucial for the standardization, effectiveness, and efficiency of academic research, and for avoiding issues such as duplication, what bridges theory with the real world is its application. However, several research papers recognize that collecting fake news for the purpose of dataset creation is a challenging and labor-intensive process (Kumar et al., 2016; Rubin, Chen, & Conroy, 2015). Two major challenges are the need for experts’ judgement of the veracity of news and class imbalance in the data (Kumar et al., 2016; Shu et al., 2017). The former refers to the meritorious work of expert journalists, fact-checkers, and crowd-sourced workers who gather auxiliary data, analyze the news context, and verify the credibility of the news article (Shu et al., 2017). The latter describes the inherent underrepresentation of fake news, which accounts for less than 10% of all news (Kumar et al., 2016).

Fortunately, researchers have made multiple public datasets available. The following is an overview of commonly used datasets for fake news detection tasks.

• BuzzFeedNews1 covers news on Facebook from a week close to the 2016 US presidential election, with each claim having been fact-checked by five BuzzFeed journalists (Shu et al., 2017).

• LIAR2 is a dataset consisting of 12,836 short statements, sampled from a variety of contexts such as TV ads, campaign speeches, etc., recovered from PolitiFact.com’s API, and annotated by its editors with one of six veracity labels: pants-fire, false, barely-true, half-true, mostly-true, and true (W. Y. Wang, 2017).

• BS Detector3 is a dataset collected by, and named after, a browser extension that scraped 244 websites labelled as “bullshit” because they contained links referring to untrustworthy external sources identified on a manually curated list of domains.

• CREDBANK4 is a large-scale dataset containing over 60 million tweets collected between 10-Oct-2014 and 26-Feb-2015, related to 1049 real-world events, and labelled by 30 Amazon Mechanical Turkers on a scale from -2 to +2 (‘-2 Certainly Inaccurate’ to ‘+2 Certainly Accurate’) (Mitra & Gilbert, 2015).

1 https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/tree/master/data 2 https://github.com/thiagorainmaker77/liar_dataset 3 https://gitlab.com/bs-detector/bs-detector 4 https://github.com/compsocial/CREDBANK-data

• FakeNewsNet5 is a data repository consisting of two datasets containing news content, social context and dynamic (spatiotemporal) features, with news collected from the reliable fact-checking websites PolitiFact and GossipCop; it was motivated by the lack of datasets for studying fake news detection that contained all three types of features (Shu, Mahudeswaran, Wang, Lee, & Liu, 2018).

• MultiFC6 is a dataset composed of 34,918 labeled claims in English with rich meta-data, such as the speaker of the claim, URL to the claim and the date of the claim, collected from 26 different fact-checking websites (Augenstein et al., 2019). MultiFC stands out from the rest of the currently available datasets due to its size as well as its quality of being a dataset of naturally occurring claims that were assigned a veracity label by expert journalists (Augenstein et al., 2019).

Having multiple datasets available for testing models developed for the task of automatically detecting fake news is a positive sign of a thriving academic environment, particularly because a lack of sufficient examples of fake news has historically been a major obstacle for previous research (Rubin et al., 2015).

However, the fight against fake news is being fought on several academic fronts. Among the aforementioned are the analysis of humans’ ability to detect fake news, studying the economic impact and motivation behind fake news, developing theoretical frameworks for studying fake news detection, and creating and making datasets available for other researchers to develop and test models. Other fronts, not yet mentioned, include empowering users on social media platforms to fact-check news (Vo & Lee, 2018), setting up fake news detection competitions (WSDM on Kaggle (Kaggle, 2018) and the Fake News Challenge (Fake News Challenge, 2016)), and, last but not least, testing models for the detection and generation of fake news, which is the focus of this paper and is therefore described in detail in the following section.

5 https://github.com/KaiDMML/FakeNewsNet 6 https://copenlu.github.io/publication/2019_emnlp_augenstein/

Related Work This section will establish an overview of existing literature focusing on fake news detection and fake news generation using language models. In addition to providing an overview of existing literature on the topics, this section will also outline how this paper will contribute to the existing research field.

Despite the field of fake news detection having gained increased attention from researchers in recent years, many researchers agree that few datasets of appropriate size and quality are publicly available, which has limited the extent of existing research (Ahmed, Traore, & Saad, 2017; Augenstein et al., 2019). Despite this limitation, research has been conducted using a variety of machine learning and deep learning models on different datasets.

Text Features From Claims One area of fake news detection research is focused on detecting the veracity of a statement by solely analyzing features extracted from the text of a claim itself.

One such study applied 23 different supervised machine learning models to three different datasets, all consisting of a text claim and a veracity label. The datasets varied in size, with the smallest one containing 75 instances and the biggest one containing 44,898 instances. Following a series of pre-processing steps including tokenization, stemming and stop-word removal, a set of features was extracted for each dataset using term frequencies. For each dataset, a distinct minimum count threshold for each feature was specified. The models were subsequently applied to a binary classification task on all three datasets. While results varied amongst the datasets, the study found that decision trees generally performed best, with accuracy scores ranging between 74% and 96% depending on the dataset (Ozbay & Alatas, 2020).

A similar study created a new dataset by combining real news data from Reuters.com with fake news data collected from a Kaggle dataset that focused on claims related to the 2016 US presidential election. Simple pre-processing steps such as removing stop words, lowercasing text, removing punctuation and stemming were performed. Using both Term Frequency and TF-IDF with N-gram values ranging from [1,4] and different thresholds for the number of features ranging from 1,000 to 50,000, the study found that linear classifiers such as Logistic Regression, Stochastic Gradient Descent (SGD) and Linear SVM performed better than non-linear classifiers. The highest score was

achieved using a linear SVM model on unigram TF-IDF features and feature size of 50,000. The study also experimented with other models such as K-Nearest Neighbors and Decision Trees, although these performed worse than the linear models (Ahmed et al., 2017).

Ensemble methods have also been used to investigate which combinations of models and features achieve the best performance in discriminating real from fake news.

One such study concatenated two different datasets in order to create one unified dataset that could be used for fake news detection. The study used the FakeNewsNet dataset, which consists of fact-checked statements labelled by PolitiFact and compiled by researchers at Arizona State University. In addition, the McIntire Dataset, which contains claims from the 2016 US Presidential election, was used. The two datasets were concatenated into a single dataset with a claim text column and a label column. The final dataset consisted of 5,405 training instances and 1,352 test instances, with the primary topic of the data being US politics. The training dataset was evenly balanced between the real and fake labels. The data was tokenized, and words were transformed into TF-IDF and word embedding representations. In addition, stemming and lemmatization were performed, which resulted in some words being omitted from the final dataset to prevent performance issues. The top 25,000 features were subsequently selected. The paper outlines three distinct feature sets that contain stylometric features of the claim text that are useful for discriminating between real and fake news. By conducting several experiments, the paper found that only some of these features provided additional value to the classification task. The paper applied a wide array of different classification methods: Random Forest, Naïve Bayes, SVM, KNN, Logistic Regression, AdaBoost and Stochastic Gradient Boosting. Using ensemble methods such as bagging and boosting, the paper investigated which combinations of word embedding and stylometric features provide the highest accuracy scores. The paper concludes that combining word embedding vector representations with stylometric features, such as the number of quotes and the number of uppercase letters, provides the best model performance. Using a gradient boosting method, the paper achieved an accuracy score of 95.49%. The paper concludes that using features solely related to the body text of a claim can be useful in discriminating real from fake news (Reddy, Raj, Gala, & Basava, 2020).

Fake News Detection Using Transformer Models The Transformer architecture (Vaswani et al., 2017) is utilized by several state-of-the-art language models within the field of NLP. Due to the novelty of the Transformer architecture, a limited number of research papers have applied models based on the architecture to different varieties of the fake news detection task.

One paper utilized BERT to conduct a classification task on different news articles. By using articles labeled as “bullshit” by users of a browser extension as fake news, and articles from sources like The New York Times as real news, the paper used BERT, CNNs and LSTMs to solve a binary classification task of labeling articles as “true” or “fake”. The paper finds that all three models outperform previous approaches to the same task and concludes that neural networks trained solely on text features can be utilized for the task of fake news detection (Rodríguez & Lloret Iglesias, 2019).

A different study investigated the identification of propaganda based on textual features using models such as BERT and RoBERTa. Using an imbalanced dataset of claims with binary labels, the task was to determine whether a claim was propaganda or not. The paper achieved an F1 score of 0.43, indicating that there is significant room for improvement (Aggarwal & Sadana, 2019).

Another study focused on the detection of fake news on Twitter and Weibo using both textual and visual features. The paper created a new model that used BERT to process text features combined with a CNN to extract visual features from images associated with each post. The paper used a dataset from Twitter that was verified by cross-checking other sources (Boididou et al., 2016) in addition to a dataset collected from Chinese state-controlled media Weibo. By combining text features with visual features extracted from images, the paper achieved better results than some EANN networks proposed by other papers (Singhal, Shah, Chakraborty, Kumaraguru, & Satoh, 2019).

Other studies have focused on applying BERT to classification tasks related to irony detection (C. Zhang & Abdul-Mageed, 2019) or using BERT and similar Transformer-based models to establish contextual word embeddings to be used by non-Transformer models (Autef, Matton, & Romain, 2019; Pham, 2019).

Stance detection using Transformer models is also a task that has gained attention. For stance detection tasks, a news headline is provided. The goal is to choose between n statements, each associated with a label that denotes its agreeability with the original headline. Models such as BERT, RoBERTa and XLNet have been applied to these tasks and achieved promising results (Dulhanty, Deglint, Daya, & Wong, 2019; S. Liu, Liu, & Ren, 2019; Slovikovskaya, 2019), and new models have been developed by modifying existing Transformer-based models to accommodate these tasks (Jwa, Oh, Park, Kang, & Lim, 2019).

Artificial Fake News Although language models have seen much progress in recent years, research on artificially produced fake news is sparse. Given that generative language models capable of producing human-like text are relatively new, research on how these models perform a task such as fake news creation, and on the influence of such text, is limited. Because of this, most existing research on fake news is centered around human-produced fake news (Waweru Muigai, 2019; Westerlund, 2019).

Despite the overwhelming majority of fake news research focusing on human-made fake news, one study developed a model capable of producing artificial fake news. The study is centered around a security approach with the aim of investigating potential threats and solutions related to artificially produced fake news. The paper introduces the Grover model (Zellers et al., 2019). Grover is a generative language model based on the GPT-2 architecture. The model is trained on 120GB of text data from 5,000 different news domains. The model is capable of producing artificial news articles with features such as date of publication, authors, and publication source in addition to a full body of text. The paper introduces the term “neural fake news”, which denotes fake news text produced by a model based on neural networks. By conducting a comparative study between articles produced by Grover and articles produced by humans, the paper finds that propaganda articles produced by Grover were deemed more plausible to be authentic than propaganda articles written by humans. The articles were judged on their content, trustworthiness and overall writing style (Zellers et al., 2019). Despite this, the paper also found that non-propaganda articles produced by Grover are of lower quality than their human counterparts. The paper concludes that artificial fake news poses a potential threat to society and argues that making text generation models publicly available is a potential method for mitigating future risks.

Limitations of Existing Research This section will outline the limitations of existing literature and how this paper contributes to the existing research field.

While existing research has applied Transformer-based models to tasks within the domain of fake news detection, this research has often utilized small datasets (Hiramath & Deshpande, 2019), focused on specific types of fake news or performed tasks that are inherently different from a binary classification task. The datasets used for fake news detection are often limited in variety and size due to the claims often being sourced from a small number of domains, which leads to datasets that usually revolve around a limited selection of topics and entities. On top of this, the fact-checking methods applied by the data sources differ greatly. Most existing research does not utilize datasets with claims fact-checked by experts but often relies on claims with crowd-sourced veracity labels. This results in data that is limited in variety, size and quality.

While several studies have investigated the use of external features to predict the veracity label of a claim and have achieved better scores than solely using text-based features (Augenstein et al., 2019; Shu et al., 2018), these studies have not investigated the capabilities and performance of Transformer-based models solely on the text of the claims. This paper will therefore disregard external features and focus solely on the text of the claims to determine whether models based on the Transformer architecture are capable of detecting patterns better than previous models.

Much of the existing Transformer-based fake news detection research is focused on stance detection tasks and not on binary classification tasks on expert fact-checked data. In addition to this, much of the existing research revolves around a specific type of fake news. Lastly, most existing research on Transformer-based models is usually restricted to using a single model, thereby disregarding the opportunity for performance comparison amongst different models based on similar architectures.

Given that generative language models that are capable of producing human-like text are relatively new, little research has gone into investigating how these models can be used for fake news generation or how the produced text can affect the performance of models tasked with detecting fake news. Furthermore, existing research on fake news generation has focused mainly on producing propaganda articles which constitutes just one type of fake news (Zellers et al., 2019).


Based on these shortcomings in existing research, this paper will contribute to the research field in the following ways:

1. This paper will apply three BERT-based language models based on the Transformer architecture to a binary classification task on the, at the time of writing, largest publicly available dataset of expert fact-checked, naturally occurring claims sourced from several domains and constituting several types of fake news.

2. This paper will employ the GPT-2 model to produce artificial claims based on human-made claims to determine the change in classification performance when the training data is expanded with artificially produced text.

Methodology This section will establish the methodological foundation for this paper. This includes presenting the chosen research philosophy, the primary data, text pre-processing techniques, the applied models, fact-checking, evaluation metrics, model optimization and evaluation, in addition to the creation of artificial text data.

Research Philosophy Research philosophy is a term used to describe the nature and development of knowledge (Saunders, Lewis, & Thornhill, 2009). Be it intentionally or inadvertently, researchers make assumptions (Saunders et al., 2009). These assumptions about the nature of reality and human knowledge affect how researchers approach their research, the methods they choose and how they present their findings (Crotty, 1998). Research philosophy describes the assumptions researchers make about the world in which they conduct research (Saunders et al., 2009). By becoming aware of these assumptions and understanding their philosophical aspects, researchers can examine their philosophical choices and reflect on them in respect to alternative approaches (Saunders et al., 2009). There are two major ways of thinking about research philosophy: ontology and epistemology (Saunders et al., 2009).

Ontology Ontology describes the nature of reality in which knowledge is created (Saunders et al., 2009). Ontology comprises assumptions about how the world operates (Saunders et al., 2009). There are two aspects of ontology: objectivism and subjectivism (Saunders et al., 2009). Whereas the former represents the belief that social entities are independent from social actors, the latter stands for a position that views social phenomena as created by the actions of social actors (Saunders et al., 2009). Furthermore, since the subjective aspect relies on social interaction between actors, social phenomena are being continuously revised, meaning that in order to understand a situation, it is crucial to understand its details, as different interpretations can be reached (Saunders et al., 2009).

Epistemology Whereas ontology is concerned with the nature of reality, epistemology is concerned with the creation of acceptable knowledge (Saunders et al., 2009). As realism was selected to be the most appropriate epistemological stance, a brief description of its main points ensues.

Realism is a philosophical stance that employs scientific enquiry to develop or create knowledge (Saunders et al., 2009). From an ontological perspective, realism describes reality as being experienced via the senses yet having an existence independent of the mind of a researcher (Saunders et al., 2009). There are two branches of realism: direct realism and critical realism. The former describes a philosophical assumption that reality can be accurately experienced via the human senses (Saunders et al., 2009). The latter contends that experiences of reality through the senses are mere sensations of reality, not reality itself, as the senses can deceive us (Saunders et al., 2009). Furthermore, critical realists acknowledge that despite reality existing independently of human thought, our knowledge of it depends on social conditioning (Saunders et al., 2009).

Methodological Choice All research is based on a methodological choice, either a mono method (quantitative or qualitative) or multiple methods, and an identification of the nature of the research (Saunders et al., 2009). Quantitative research employs the analysis of numerical data, is commonly associated with a positivist philosophical stance and a deductive approach, and is characterized by statistical analysis of relationships between variables (Saunders et al., 2009).

Opposite to this, qualitative research focuses on the analysis of non-numeric data (such as text, images, video, etc.) and is often associated with an interpretivist philosophical stance and an inductive approach, although some qualitative studies start off with a deductive approach (Saunders et al., 2009).

Multiple methods research design is organized into two categories, multimethod that utilizes multiple methodologies of either quantitative or qualitative design, and mixed method research that combines the two (Saunders et al., 2009). In mixed method research, qualitative data can be numerically encoded or quantitative data can be turned into text (Saunders et al., 2009). Multiple methods research is generally associated with the philosophical stance of realism, particularly critical realism, utilizing inductive, deductive, or both research designs, and is characterized by richer data collection, analysis, and interpretation (Saunders et al., 2009).

In relation to the research design, it is crucial to recognize the nature of the research. An exploratory study is the first step towards knowledge creation in a research topic without previous work (Saunders et al., 2009). Descriptive studies aim to capture the specifics of a phenomenon, though a descriptive study in itself might be insufficient and is therefore often combined with the last type, the explanatory study (Saunders et al., 2009). The goal of the explanatory study is to create knowledge of the relationship between variables, typically striving to establish a causal relationship (Saunders et al., 2009).

This paper adopts the philosophical stance of critical realism, which contends that objects are independent of the human mind, as the claims that are evaluated to be fake or not exist in reality external to the minds of the fact-checkers. However, critical realism also states that knowledge of reality is dependent on social conditioning, which is relevant given that the analyzed claims in this paper had been fact-checked by humans who have been subject to social influences. Critical realists employ scientific enquiry, which is associated with objectivist ontology, reflecting the assumption that objects are independent of the human mind. For the research topic of fake news detection, it is appropriate to adopt the objectivist ontology as claims need to be verifiably fake and thus there needs to exist evidence, or lack thereof, external to the mind of the fact-checker. As this paper leverages and builds on a wealth of previous theoretical knowledge in the topic of fake news detection, it adopts the deductive research approach, which involves testing existing theory on collected data and verifying or falsifying that theory. To accommodate the theoretical specifications, which define fake news detection as a binary classification task, the labels in the primary data needed to be reduced to binary values. Both the label reduction and the consequent analysis were conducted in a structured manner to ensure reproducibility. Moreover, choosing the largest dataset of naturally occurring claims from a variety of domains as the primary data further improves the generalizability of this paper’s results. As natural claims in textual form are inherently qualitative data, yet machine learning models require numerical data as input, the textual data was converted utilizing a multitude of text feature extraction methods described below. Lastly, as outlined in the related works section, to the authors’ knowledge, there is a gap in academic knowledge on the utilization of models based on the Transformer architecture for fake news detection, which this paper aims to partially fill; thus, the nature of this paper’s research is exploratory.

Data

Primary Data The primary dataset used for the fake news classification task is the MultiFC claim verification dataset released by Augenstein et al. (2019) in connection with their paper “MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims”. The dataset consists of 34,918 claims scraped from 26 different English fact-checking websites (Augenstein et al., 2019). Common to all claims is that they are non-artificial, i.e. naturally occurring. In addition, all the claims have been fact-checked by professional fact-checking journalists using standards required by their source website.

The dataset was established from a wide variety of domains focusing on topics such as African news, US politics, gossip, advertisement hoaxes, local news, climate change and more. Due to the claims being gathered from different websites utilizing different terminologies and methods for conducting the fact-checking process, the dataset contains 117 unique labels which denote the veracity of the claims. The dataset is publicly available and can be downloaded as part of an ongoing competition to improve the results achieved by the paper using the same data (Lucasschaves, 2019). The dataset is provided as three tab-separated (.tsv) files. The files were loaded into a Pandas DataFrame. All subsequent data pre-processing was conducted in Python using the NLTK and Pandas libraries (Bird, Loper, & Klein, 2009; McKinney, 2010). Upon inspecting the dataset, it was discovered that it contained 34,842 claims from a total of 27 unique fact-checking domains. General information relating to the files is presented in Table 1.

Filename   Instances   Features
Dev        3484        13
Train      27871       13
Test       3487        12
Table 1 - Dataset Dimensions

In addition to the claim itself the files contain different kinds of metadata related to each claim. An overview of all features can be found in Table 2.


Feature          Description
Claim ID         Unique ID value
Claim            The claim in plain text
Label            The veracity label given by the source website
Claim URL        Link to the source for the claim
Reason           Justification for label
Category         Category of the claim
Speaker          Person accredited with making the claim
Checker          Person who fact-checked the claim
Tags             Topic tags relating to the claim
Claim Entities   Entities mentioned in the claim
Article Title    Title of website article providing claim
Publish Date     Article's date of publication
Claim Date       Date of statement / claim
Table 2 - Overview of Features for MultiFC dataset

Data Preparation and Pre-processing Given that this paper aims to investigate the performance of traditional machine learning models and deep learning models on a binary classification task on fake news data, many pre-processing steps had to be taken to ensure that the data would be fit for the task at hand. The first step was to discard the Test file, as it did not contain a “Label” column which is necessary for conducting a supervised binary classification task with the models chosen for this paper. The Dev and Train files were concatenated along the rows to create a single file with 31,355 instances and 13 features.
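A minimal sketch of this loading and concatenation step, using Pandas, is shown below. The file names, the column names, and the assumption that the released .tsv files carry no header row are illustrative guesses, not a description of the actual MultiFC release.

```python
import pandas as pd

# Hypothetical column names for the 13 features listed in Table 2; the real
# files may name or order them differently.
columns = ["claim_id", "claim", "label", "claim_url", "reason", "category",
           "speaker", "checker", "tags", "claim_entities", "article_title",
           "publish_date", "claim_date"]

# Load the tab-separated Dev and Train files into DataFrames
dev = pd.read_csv("dev.tsv", sep="\t", names=columns)
train = pd.read_csv("train.tsv", sep="\t", names=columns)

# Concatenate along the rows into a single DataFrame
data = pd.concat([dev, train], ignore_index=True)
print(data.shape)  # expected: (31355, 13)
```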

Label Reduction The original dataset contained 117 unique labels. An overview of the domains with the highest number of labels can be seen in Figure 1.

Figure 1 - Top 10 websites with the most labels

Figure 2 shows a distribution of the 20 most frequently occurring unique labels.

Figure 2 – The 20 most frequently occurring labels in the dataset

From Figure 2, it can be argued that the distribution is highly skewed. Five labels (false, mostly true, mostly false, true, half-true) represent over 50% of all instances in the dataset. Opposite to this, 60 of the 117 total labels are associated with less than 30 individual claims each. This is a result of the websites not utilizing a standardized rating scheme. Due to the high degree of unevenness in the labels, it was deemed inappropriate to conduct a multinomial classification task as a majority of the labels would be underrepresented. Instead, it was decided that the label space should be reduced to binary values (true or fake) as outlined in the theoretical framework (Shu et al., 2017). In order to perform this reduction, a qualitative process of determining an appropriate label for each claim was initiated. To determine if a label should be reclassified as true or fake, the source website(s) for each claim were visited. Each unique website was visited with the purpose of finding a rating scheme that provided information as to how each claim was rated and to give more context to the rating itself. Of the 27 unique websites, 6 did not contain a fact-checking rating scheme. If a rating scheme could be found, a qualitative assessment was made to re-classify each label as true or fake. For websites where a standardized rating scheme was not available or did not exist, the assessment was based on other claims with the same label or the wording of the label itself. The 117 unique labels were reduced to 3 distinct labels: true, fake & ambiguous. The ambiguous category was added to accompany claims that were labelled in a manner that provided unclear or no information regarding the veracity of the claim. A preview of the final label-classification transformation can be seen in Table 3. The full table can be found in Appendix 1 – Label reduction of MultiFC Dataset.
Original Label   Reclassified as   Source

determination: misleading Fake www.voiceofsandiego.org

misleading recommendations Fake www.hoax-slayer.com

scam! Fake www.truthorfiction.com

fiction! & satire! Fake www.truthorfiction.com

misleading! Ambiguous www.truthorfiction.com

inaccurate attribution ! Ambiguous www.truthorfiction.com

a lot of baloney Fake www.huffingtonpost.ca

outdated True www.snopes.com

10 True www.gossipcop.com

truth! & unproven! True www.truthorfiction.com

truth! & disputed! True www.truthorfiction.com
Table 3 – Example of label transformation process

In order to accommodate data type requirements for the models, an additional column named “Label Numeric” was added. This column contained a numeric representation of each of the three new labels. Once all the original labels had been mapped to their new values, the dataset had the following dimensions:

Label       Label Numeric   Count
Fake        0               16628
True        1               10629
Ambiguous   2               4098
Table 4 – Count of claims associated with each Label following label reduction

All claims with the Ambiguous labels were subsequently dropped from the dataset. This resulted in a new dataset with the following dimensions:

Label   Label Numeric   Count
Fake    0               16628
True    1               10629
Total   N/A             27257
Table 5 – Dataset Label Counts following removal of claims labeled as ambiguous

As a result of the label-reduction process, the “Fake” label was overrepresented, with 5,999 more instances than the “True” label. In order to prevent biased results as a consequence of class imbalance (Bishop, Christopher, 2006; Jacobusse & Veenman, 2016), sampling was performed to select a random subset of fake claims equaling the total number of true claims. Thus, the dataset was balanced with 10,629 instances of each label class. To ensure that any model applied to the dataset would receive the data in a non-biased manner, a sampling function was applied to the balanced dataset so that all the data would be shuffled before being passed to any model. From here on, the label-reduced and balanced version of the MultiFC dataset will be referred to as the “natural claims dataset”.
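The undersampling and shuffling described above could, for instance, be carried out with Pandas as in the sketch below. It assumes a DataFrame `data` holding the label-reduced claims with the “Label Numeric” column, and the random seed is an arbitrary choice for illustration.

```python
import pandas as pd

# Assumes `data` is the label-reduced DataFrame with a "Label Numeric" column
# (0 = Fake, 1 = True); claims labelled Ambiguous (2) have already been dropped.
fake = data[data["Label Numeric"] == 0]
true = data[data["Label Numeric"] == 1]

# Undersample the overrepresented fake class down to the size of the true class
fake_sampled = fake.sample(n=len(true), random_state=42)

# Combine the two classes and shuffle all rows before any model sees them
balanced = pd.concat([fake_sampled, true]).sample(frac=1, random_state=42)
balanced = balanced.reset_index(drop=True)
print(balanced["Label Numeric"].value_counts())  # 10629 instances per class
```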

Text Pre-processing Text pre-processing, also referred to as text normalization, is traditionally the first step in NLP tasks (Jurafsky & Martin, 2018). Selecting appropriate pre-processing techniques, with consideration to the domain and language of the textual data, can yield substantial performance improvements in text classification tasks (Song, Liu, & Yang, 2005; Uysal & Gunal, 2014). Therefore, text pre-processing is an essential step in successfully classifying fake news, in order to avoid the colloquial ‘garbage in garbage out’ scenario, which captures the intuition that regardless of the complexity of the applied algorithm, poor quality data will inevitably yield poor quality results (Song et al., 2005; Uysal & Gunal, 2014). Text pre-processing generally consists of three components: tokenization, normalization and noise removal.

Tokenization Tokenization is the process of extracting words from a running text (Jurafsky & Martin, 2018). Although extracting words might seem like a straightforward task, issues occur particularly with words containing non-alphanumeric characters such as hyphens (e.g. long-term), dots (e.g. Ph.D.), or apostrophes (e.g. didn’t) and other special characters (Jurafsky & Martin, 2018). Processed text often contains quantities or prices (e.g. 555,500.50 and $45.55) that should be extracted as one token (Jurafsky & Martin, 2018). Additionally, thousands and decimal separators differ between English and e.g. European languages; whereas in English thousands are separated with a comma and the decimal with a dot (e.g. 555,500.50), in many European languages thousands are separated with a space and the decimal with a comma (e.g. 555 500,50), which further emphasizes the importance of adapting processing according to the domain and the language of the processed text (Jurafsky & Martin, 2018).

Normalization Text normalization consists of techniques to transform words/tokens into their standard form and to represent words/tokens that appear in multiple forms with a single form (e.g. representing ‘goooood’ and ‘gud’ with ‘good’) (Jurafsky & Martin, 2018). Furthermore, text normalization can help with sparsity issues, particularly with smaller datasets where there is not a sufficient number of occurrences for some words, as text normalization maps these rarely occurring words onto a common representative form. Moreover, reducing the number of features generally speeds up additional pre-processing steps as well as the training of a classifier.

Case Folding Case folding, also referred to as lowercasing or lowercase conversion, is a commonly applied normalization method that transforms all letters in a corpus to lower case, which aids in generalization and maintaining consistency between words/tokens. Skipping case folding might be desirable for specific text classification tasks, such as sentiment analysis, as the case can carry important information about the sentiment that would otherwise be lost at the expense of generalization (Jurafsky & Martin, 2018). However, other experiments have found that lowercasing improves the performance of classifiers, in terms of both accuracy and dimension reduction, irrespective of domain and language (Uysal & Gunal, 2014).

Stemming Stemming can be thought of as the little brother of Lemmatization. Whereas Lemmatization converts a word to its basic meaning-bearing form, Stemming is a heuristic-based approach that revolves around removing the affixes of words (Jurafsky & Martin, 2018). Therefore, Stemming is less complex, and thus faster than Lemmatization, albeit less accurate, as it sometimes returns stems that are not actual words, as demonstrated in Figure 3. Nevertheless, it may be sufficient in text processing tasks such as information retrieval, or in cases where performance is of greater importance than accuracy. The most widely used and best-known stemmer is the Porter Stemmer (Porter, 1980).

Lemmatization Lemmatization is a text normalization technique that reduces words to their root form, also referred to as their ‘dictionary’ form, called lemma. For example, words ‘am’, ‘are’, and ‘is’ have the same lemma ‘be’. Lemmatization stems from morphology, the study of how words are composed of smaller, meaning-bearing parts, called morphemes (Jurafsky & Martin, 2018). There are two major categories of morphemes: stems, the main part that provides the meaning of the word, and affixes, which contribute additional meaning (Jurafsky & Martin, 2018). The most complex lemmatization algorithms leverage complete morphological parsing of the words (Jurafsky & Martin, 2018). An example of stemming and lemmatization can be seen in Figure 3.

Figure 3 – Example of Porter’s stemmer and Lemmatization
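Since Figure 3 is not reproduced here, the small NLTK sketch below illustrates the same contrast between Porter stemming and WordNet lemmatization; the example words are arbitrary and the outputs shown in the comments are indicative.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# nltk.download("wordnet") may be required before first use

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["studies", "running", "better", "are"]
print([stemmer.stem(w) for w in words])                   # e.g. ['studi', 'run', 'better', 'are']
print([lemmatizer.lemmatize(w, pos="v") for w in words])  # e.g. ['study', 'run', 'better', 'be']
```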


Other Normalization Techniques The text normalization techniques described in this paragraph are generally recognized by practitioners of NLP, although not extensively studied in the academic sphere.

Firstly, expanding contractions, words that are created by shortening and combining two words (e.g. can + not = can’t, I + would = I’d, etc.), might be desirable to enforce normalization of text, particularly in text classification tasks such as sentiment analysis, where the negative part (‘not’) of a contraction carries important information and would otherwise be disregarded. Secondly, converting accented characters is another step towards text normalization, focusing on removing accents from translingual words such as frappé, naïve, or résumé.

Noise Removal Noise removal is the most domain-specific part of text pre-processing. As its name indicates, it consists of steps that remove parts of the text that would otherwise act as ‘noise’ for a text classifier, and thus reduce its accuracy (Ganesan, 2019). The most common technique of noise removal omits stop words, which are words that frequently appear in text yet bear little to no semantic significance (e.g. “the”, “and”, “a/an”, etc.). Furthermore, missing values in a dataset as well as duplicated textual data entries might provide insight into an erroneous data gathering process; however, from the perspective of optimizing the performance of text classifiers, these should be removed. Additionally, all non-alphabetic or non-alphanumeric characters can be removed.

These pre-processing techniques were applied to the raw textual data of the claims in the following order. Firstly, the text of the claims was lowercased, and duplicates were removed. Additionally, contractions were expanded, and all non-alphabetical characters were omitted. Subsequently, accents were removed, as well as stop words. Lastly, the text of the claims was lemmatized. After the techniques were applied, the claims were ready for extracting text features, which are introduced in the following section.
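A minimal sketch of such a pipeline, using NLTK, is given below. The contraction map is a deliberately tiny stand-in, the order of operations only loosely follows the description above, and deduplication is assumed to happen at the DataFrame level rather than inside this function.

```python
import re
import unicodedata
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# nltk.download("punkt"), nltk.download("stopwords") and nltk.download("wordnet")
# may be required before first use

# Tiny illustrative contraction map; a real pipeline would use a fuller list
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "n't": " not", "'d": " would"}
STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(claim: str) -> str:
    text = claim.lower()                                   # case folding
    for short, full in CONTRACTIONS.items():               # expand contractions
        text = text.replace(short, full)
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()  # strip accents
    text = re.sub(r"[^a-z\s]", " ", text)                  # keep alphabetic characters only
    tokens = [t for t in word_tokenize(text) if t not in STOP_WORDS]  # drop stop words
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)          # lemmatize

print(preprocess("The senator didn't vote on the naïve proposal in 2019!"))
```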

Text Feature Extraction Techniques Although pre-processing raw text data is an essential step in NLP tasks, preprocessed text needs to be transformed into features that machine learning models can learn from. The techniques that

transform text into features are referred to as Text Feature Extraction techniques. These features can, in simple terms, be thought of as numerical representations of the statistical properties of text segments (words/tokens). In the following sections, three commonly used Text Feature Extraction techniques are presented: N-grams, Bag-of-Words, and Term Frequency - Inverse Document Frequency.

N-Grams An N-gram is a sequence of n contiguous items extracted from running text. The extracted items are typically words but can also be individual characters. n refers to the number of extracted items in the sequence, with the corresponding names derived from Latin prefixes: Unigram, Bigram, Trigram. An example of several word n-grams is shown in Table 6.

Sentence   “This is a running text.”
Unigrams   [‘This’, ‘is’, ‘a’, ‘running’, ‘text’]
Bigrams    [‘This is’, ‘is a’, ‘a running’, ‘running text’]
Trigrams   [‘This is a’, ‘is a running’, ‘a running text’]
Table 6 – Examples of word N-Grams

N-grams are sometimes also used to refer to language models that compute the probability of the next word given a sequence of words, calculated as a conditional probability computation (Jurafsky & Martin, 2018). To simplify the calculation, an independence assumption, called the Markov assumption, is made so that the probability of the next word depends only on the last n - 1 words, which approximates the probability (Jurafsky & Martin, 2018). Thus, a Bigram would approximate the probability of the next word based on the previous word, and a Trigram on the two previous words. However, natural language contains long-distance dependencies that N-grams cannot capture, which researchers were already aware of in 1998, when N-grams were the de facto state-of-the-art NLP technique for text feature extraction (Brill, Florian, Henderson, & Mangu, 1998). Nonetheless, N-gram models have been widely adopted in NLP as well as other scientific fields due to their simplicity and scalability.
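The extraction of word n-grams can be expressed in a few lines of Python, as in the sketch below; the helper name is hypothetical, and nltk.util.ngrams offers equivalent functionality.

```python
def word_ngrams(tokens, n):
    # Slide a window of length n over the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "This is a running text".split()
print(word_ngrams(tokens, 1))  # unigrams
print(word_ngrams(tokens, 2))  # bigrams, cf. Table 6
print(word_ngrams(tokens, 3))  # trigrams
```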

Bag-of-Words Bag-of-Words (BoW) is one of the simplest text feature extraction techniques, as the only features it extracts are the unordered frequencies of words occurring in a document (e.g. a sentence, or a

paragraph), while disregarding their position in the text (Jurafsky & Martin, 2018). Since it extracts features per document, it is often used in document classification tasks. BoW can also be thought of as a special case of N-grams with n = 1, where each word from the text has a probability of being the next word equal to the frequency of the word divided by the total count of words. Although this technique is crude in its approach, it has historically found several successful applications, for example in email filtering. In this paper, the BoW representations were calculated using the CountVectorizer (CV) functionality from Sci-Kit Learn (Pedregosa et al., 2011).
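A minimal sketch of the CountVectorizer usage mentioned above is shown below; the toy claims are invented for illustration and do not come from the dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

claims = ["the senator voted against the bill",
          "the image shows a real event",
          "the senator never voted on the bill"]

# Bag-of-Words features: one column per vocabulary word, values are raw counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(claims)
print(vectorizer.get_feature_names_out())  # get_feature_names() in older Sci-Kit Learn releases
print(X.toarray())
```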

Term Frequency – Inverse Document Frequency Although the BoW technique has its use cases, extracting just the frequencies of words is generally insufficient for capturing the complexities of relationships between words (Jurafsky & Martin, 2018). Specifically, common words such as ‘the’, ‘and’, or ‘to’ have high frequencies, yet appearing in all documents strips them of discriminatory power, and thus renders them almost useless for document classification (Jurafsky & Martin, 2018). Therefore, it is desirable to normalize frequencies to ensure that words that are common across all documents are given low weights, whereas words that appear only in a few documents are given higher weights and thus provide improved discriminating information to the models (Jurafsky & Martin, 2018). This is what the Term Frequency – Inverse Document Frequency (TF-IDF) technique accomplishes.

TF-IDF consists of two parts: TF stands for term frequency and represents the raw counts of words in a document, and IDF is an abbreviation for inverse document frequency. The intuition behind using inverse document frequency comes from understanding that words that appear in a few documents are useful for discrimination between documents (e.g. in discriminating between spam e-mails and non-spam e-mails the word Viagra is useful for discrimination, as it rarely appears in non-spam e-mails) (Jurafsky & Martin, 2018). Therefore, using the inverse of the document frequency measure, the number of documents a specific word appears in indicates its usefulness for discrimination between the documents. Thus, multiplying term frequency with inverse document frequency creates numerical feature representations for words in a document based on all the words in the collection of documents, which are then passed as input to the text classifiers. The equation for TF-IDF can be seen in Equation 1 (Jurafsky & Martin, 2018).

$w_{i,j} = \mathrm{TF}_{i,j} \cdot \log\left(\frac{N}{\mathrm{DF}_{i}}\right)$

where $N$ is the total number of documents in the collection and $\mathrm{DF}_{i}$ is the number of documents containing term $i$.

Equation 1 – TF-IDF Formula

Introduction to Applied Models The following section will describe the models used to evaluate the dataset used in the paper. The section is split into two parts: Baseline Models and BERT-Based Models. The Baseline section introduces the models that will be used as a benchmark to compare results with the more recently developed models covered in the BERT-Based Models section. In this paper, “baseline models” refers to the traditional machine learning models, and “BERT-based models” refers to the deep learning models.

Baseline Models The following section will describe the three baseline models used for the classification tasks: Support-Vector Machines, Logistic Regression and the Naïve Bayes Classifier. All three models are easy to implement, fast and have proven to provide comparable results to more sophisticated models despite their relatively simple structures and naïve assumptions. The models are therefore considered baselines for text classification tasks (Hastie, Tibshirani, & Friedman, 2009; Rennie, Shih, Teevan, & Karger, 2003; Vanderplas, 2017). All three models were implemented in Python using the Sci-Kit Learn library (Pedregosa et al., 2011).

Support-Vector Machines Support-Vector Machines (SVM) is a supervised model commonly used for binary and multiclass classification tasks. The approach was developed in the 1990s and has since become popular within the field of machine learning due to its versatility and generally high performance (Cortes & Vapnik, 1995; Géron, 2017; James, Witten, Hastie, & Tibshirani, 2013). SVMs solve a classification task by establishing a linearly separating hyperplane, referred to as a “decision boundary”, between the data classes. SVMs find the optimal hyperplane by searching for the maximum marginal hyperplane i.e. the hyperplane that establishes the largest possible margin between the hyperplane and the data instances belonging to their respective classes (Géron, 2017; Han, Kamber, & Pei, 2012; James et al., 2013). The margins are defined by two hyperplanes, H1 and H2 which are both parallel to the decision boundary. Any data point that falls on either H1 or H2 is defined as a support vector (Han et al., 2012). The purpose of establishing the largest possible margin is to prevent classification errors on

previously unseen data. This is due to small margins being weak at generalizing on unseen data and tending to overfit (Tan, Steinbach, & Kumar, 2006). A visual representation of a binary SVM classifier can be seen in Figure 4.

Figure 4 – Example of SVM classification on linearly separable data

Some datasets cannot be properly separated using a linear hyperplane. In order to accommodate such data, SVMs apply the kernel trick. The kernel trick consists of projecting the data into a higher-dimensional space based on a basis function. Popular basis functions include polynomial and Gaussian functions. Once the data has been projected into the new dimension, a linear decision boundary is established within this new dimension (Vanderplas, 2017). A visual presentation of this can be seen in Figure 5.

Figure 5 – Visual Representation of Basis Function Applied to non-linearly separable data
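The kernel trick described above can be reproduced on a small synthetic dataset with Sci-Kit Learn, as in the sketch below; the toy data from make_circles merely stands in for non-linearly separable data in the spirit of Figure 5.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Two concentric circles: not separable by a linear hyperplane in 2D
X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF (Gaussian) kernel implicitly projects the data into a higher-dimensional
# space, where a linear decision boundary can be established
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```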

Naïve Bayes Classifier The Naïve Bayes Classifier is a supervised machine learning model used for binary as well as multinomial classification tasks. Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem.

Classification based on Bayes’ theorem has the goal of deriving the probability of a class label conditional on observations. Bayes’ Theorem is expressed using the following equation (Vanderplas, 2017):

$P(\mathrm{Label} \mid \mathrm{features}) = \frac{P(\mathrm{features} \mid \mathrm{Label}) \cdot P(\mathrm{Label})}{P(\mathrm{features})}$
Equation 2 – Bayes Theorem

Where $P(\mathrm{Label} \mid \mathrm{features})$ is the conditional probability of a label given its associated features. Class probabilities are learned from a training set. These probabilities are then subsequently used to classify new instances (Taheri & Mammadov, 2013). The Naïve Bayes Classifier is based on the assumption that input features, $x_i$, are mutually independent of each other given the class, $c$ (H. Zhang, 2004). This is expressed as:

$P(x \mid c) = p(x_{1}, x_{2}, x_{3}, \dots, x_{n} \mid c) = \prod_{i=1}^{n} p(x_{i} \mid c)$
Equation 3 – Naïve Bayes Independence Assumption

For text classification, this implies that the words in a corpus are considered to be independent of one another. Despite the independence assumption being unlikely in real-world scenarios, the model has proven to be useful and efficient in many text classification tasks (H. Zhang, 2004), particularly in information retrieval (Lewis, 1998), text classification (Larkey & Croft, 1996; Lewis & Gale, 1994; Peng & Schuurmans, 2003) and spam filtering (Hovold, 2005; Schneider, 2003).

Several variations of the Naïve Bayes classifier are available in Sci-Kit Learn. For this paper, the Multinomial version was implemented as it is well suited for text classification tasks (C. Müller & Guido, 2016).
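A minimal Multinomial Naïve Bayes sketch with Sci-Kit Learn is shown below; the toy claims and labels are invented for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy claims with binary labels (0 = fake, 1 = true)
claims = ["miracle cure discovered overnight",
          "senate passed the budget bill",
          "celebrity secretly replaced by clone",
          "report confirms unemployment fell last quarter"]
labels = [0, 1, 0, 1]

# Count-based features feed the class-conditional word probabilities of the model
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(claims, labels)
print(model.predict(["secret cure confirmed by report"]))
```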

Logistic Regression Logistic regression is a supervised probabilistic machine learning model used for binary and multinomial classification tasks.

Logistic regression is a foundational machine learning model that has enjoyed popularity in a wide variety of applications, including applications in NLP, specifically text classification (Genkin, Lewis,

& Madigan, 2005; Ifrim, Bakir, & Weikum, 2008; J. Zhang & Yang, 2003; T. Zhang & Oles, 2001), but also data mining (Komarek & Moore, 2005) and spam filtering (Chang, Yih, & Meek, 2008).

Logistic regression is a discriminative classifier. This implies that the model focuses on learning distinguishing features amongst the classes present in the data (Jurafsky & Martin, 2018). Given a training set of features, denoted $x_i$, the logistic regression classifier learns a weight, $w_i$, and a bias term, $b$, for each input. The weight for each input determines how important that input is to the final classification. Once the weight and bias terms have been determined, the input is multiplied by the weight. The bias term is subsequently added. The result is a number $z$ that expresses the evidence for that input belonging to a class (Jurafsky & Martin, 2018).

$z = \left(\sum_{i=1}^{n} w_{i} x_{i}\right) + b$
Equation 4 – Logistic Regression Class Evidence equation

Keeping in line with the probabilistic nature of the model, $z$ is passed through a logistic function to ensure that it is mapped to the range [0,1]. For binary classification tasks, the logistic function in question is usually the sigmoid function (Herlau, Schmidt, & Mørup, 2018):

$S(x) = \frac{1}{1 + e^{-x}}$
Equation 5 – Sigmoid Function

For multinomial classification tasks, the softmax function can be used (Murphy, 2012). The predicted class for each instance can be expressed as (Jurafsky & Martin, 2018):

$y = S(z) = \frac{1}{1 + e^{-z}}$
Equation 6 – Logistic Regression Transformation of Result into a [0,1] range value

For binary tasks, the final class can only be one of two classes. In order to accommodate this limitation, a threshold is defined to specify the classification of each input. Assuming that the classes in question are binary and numeric with values 0 or 1, the decision boundary can be defined as:


$\hat{y} = \begin{cases} 1, & y > 0.5 \\ 0, & \text{otherwise} \end{cases}$
Equation 7 – Logistic Regression Decision Boundaries
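Equations 4 to 7 can be traced numerically with a few lines of NumPy, as in the sketch below; the weights, bias and feature values are arbitrary illustrative numbers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative weights, features and bias
w = np.array([0.8, -1.2, 0.3])
x = np.array([1.0, 0.5, 2.0])
b = -0.1

z = np.dot(w, x) + b          # evidence for the positive class (Equation 4)
y = sigmoid(z)                # probability via the sigmoid (Equations 5 and 6)
y_hat = 1 if y > 0.5 else 0   # decision rule (Equation 7)
print(z, y, y_hat)            # roughly 0.7, 0.67, 1
```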

BERT-Based Models In recent years, the field of NLP has experienced significant progress due to the adaptation of deep learning models. Model architectures based on neural networks have achieved state-of-the-art results on a variety of language tasks and pushed the field into a new era (Otter, Medina, & Kalita, 2019). Some of the most impactful recent developments within the field of deep learning NLP models will be described in this section.

Pre-training and Fine-tuning One of the major advantages of deep learning NLP models is their ability to easily and relatively inexpensively be fine-tuned to perform a variety of different tasks. Many NLP models based on deep learning architectures are pre-trained on large datasets. This pre-training process means the model has learned a set of weights that only need slight adjustments to be adapted to tasks and data that the model has not been specifically trained on. This enables the end-user to easily and quickly adapt a model to a desired task without needing access to large amounts of data or computing power (Käding, Rodner, Freytag, & Denzler, 2016; Z. Wang, Bi, Wang, & Liu, 2019). This process is also known as Transfer Learning and allows the user to use a neural network without having to train all the weights from scratch (Weiss, Khoshgoftaar, & Wang, 2016).
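As a concrete illustration of this fine-tuning idea, the sketch below loads a pre-trained BERT model with a fresh binary classification head. It assumes a recent version of the Hugging Face transformers library and PyTorch, and is not a description of the training setup used later in this paper.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained weights are downloaded and reused; only the small classification
# head on top is initialized from scratch for the binary task (num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer(["the senator voted against the bill"],
                   return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([1])  # invented label for illustration

outputs = model(**inputs, labels=labels)
outputs.loss.backward()     # gradients for a single fine-tuning step
print(outputs.logits)
```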

Word Embeddings Before the deep learning models are introduced, it is necessary to explain how computers handle text data. Computers and algorithms are not capable of understanding text. Due to this, text must be transformed to a representation that computers are capable of using for computational tasks.

Traditional approaches for establishing these representations include the BoW model and one-hot encoding schemes which represent words using term frequencies and sparse vector representations of word occurrence relative to a vocabulary, respectively (Goyal, Pandey, & Jain, 2018). While these approaches enable algorithms to perform mathematical operations on text, both approaches suffer from matrix sparsity and high dimensionality issues. In addition, these representations are incapable

of capturing semantic and contextual information relating to the input text. This means that words with similar meanings or associations are considered to be independent of each other (Zhao & Mao, 2017).

These issues were addressed with the concept of word embeddings. A word embedding is a dense feature vector representation of a word (Jurafsky & Martin, 2018). Word embeddings enable computers to capture syntactic and contextual information relating to words in a corpus (T. Yang & Li, 2018). By representing a word as a vector of real numbers, it becomes possible to represent the word in a vector space. This allows for mathematical operations such as calculating degrees of similarity or dissimilarity between words using methods such as Cosine similarity in addition to preventing the curse of dimensionality caused by sparse matrices (Taweh, 2018). An example of word embeddings visualized in a vector space can be seen in Figure 6.

Figure 6 – Example of word embedding vectors visualized in a vector space

Research has shown that performance of NLP models has greatly improved by using word embeddings that store word-related contextual information in a dense vector as opposed to traditional one-hot encoding and bag-of-words approaches (Turian, Ratinov, & Bengio, 2010).

Many different models can be used to create word embeddings. One such popular method is the Word2Vec model developed by researchers at Google in 2013 (Mikolov, Chen, Corrado, & Dean, 2013). Using a neural network architecture on an unlabeled corpus, Word2Vec can create word embedding vectors using two separate predictive tasks: Continuous Bag of Words (CBOW) or the

Skip-Gram approach. With the CBOW task, the model attempts to predict the current word in a sentence using the surrounding words. In the skip-gram approach, the model attempts to learn the surrounding words from the current word. While these approaches are highly effective in establishing word embeddings, they are not perfect. Word2Vec establishes a fixed word representation for each unique word independent of context. This implies that the model is incapable of properly capturing meaning and context of sentences, especially when polysemous words, i.e. words that have different meanings depending on sentence structure and context, are present (Y. Sun, Rao, & Ding, 2017).
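A toy Word2Vec sketch using the Gensim library is given below to make the CBOW/Skip-Gram switch concrete; the corpus is invented, and the parameter names follow Gensim 4.x (older releases use size instead of vector_size).

```python
from gensim.models import Word2Vec

# Tiny invented corpus; a real model would be trained on a large unlabeled corpus
sentences = [["fake", "news", "spreads", "fast"],
             ["real", "news", "is", "verified"],
             ["journalists", "verify", "claims"]]

# sg=0 trains the CBOW task, sg=1 trains the Skip-Gram task
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["news"].shape)               # a dense 50-dimensional embedding
print(model.wv.similarity("fake", "real"))  # cosine similarity between two vectors
```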

A major advantage of models based on the Transformer architecture compared to other word embedding models such as Word2Vec and GloVe is their ability to produce contextual word embeddings. BERT and GPT-2 are capable of taking word order into consideration when creating word embeddings. This allows for adaptive and dynamic embeddings that fit to the contents and context of the sentence in which the word occurs (Ethayarajh, 2019). Contextualized word embeddings have proven to significantly improve results on several NLP tasks (Ethayarajh, 2019).

Transformer An important innovation that has driven many of the recent performance improvement of NLP models is the Transformer architecture first described in late 2017 (Vaswani et al., 2017). The Transformer is a model architecture built around neural networks and attention mechanisms that enable algorithms to understand text more precisely than traditional approaches such as recurrent neural networks (RNN) and long short-term memory neural networks (LSTMs) (Karita et al., 2020). While the original Transformer implementation was designed for translation tasks, the architecture has proven to be adaptable to many other tasks such as text classification (X. Yang, Yang, Bi, & Lin, 2019). GPT-2 and the BERT-based models described in this section all utilize and incorporate concepts and components popularized by the Transformer architecture.

Components The Transformer model is made up of two main components: an encoder component and a decoder component. The encoder component contains numerous encoder layers. Each encoder is comprised of a self-attention layer in addition to a feed-forward neural network (FFNN) layer. The decoder component shares a similar structure, the main difference being that each decoder inside the component also hosts an Encoder-Decoder Attention layer between the self-attention and FFNN

layer. The number of encoders and decoders inside each component is not fixed. The example shown in Figure 7 contains 4 encoder and decoder layers.

Figure 7 – Example of the Transformer model used on a translation task

Encoder The input sentence is fed to the first encoder layer in the stack. For each token in the input, a word embedding is created. A positional encoding vector is also created. This tells the model in which position each token belongs. The word embedding vector and positional vector are combined using element-wise vector addition (Barnett, 2019). The combined vector is passed through the self-attention layer. This is done to determine which features in the input are most important for the word of interest. Each output of the self-attention layer is then independently passed to an FFNN for further processing. The output of this FFNN is passed on to the next encoder block in the stack, where the same process of self-attention and FFNN processing is applied. Once the input has passed through all individual blocks within the encoder, the output is passed to the decoders for further processing (Vaswani et al., 2017).
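The combination of token embeddings and positional information can be sketched in NumPy as below, using the sinusoidal positional encoding from Vaswani et al. (2017); the toy embedding values are random placeholders.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    # Sinusoidal positional encodings as defined in Vaswani et al. (2017)
    positions = np.arange(seq_len)[:, None]
    dims = np.arange(d_model)[None, :]
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Random placeholder token embeddings for a sequence of 5 tokens, dimension 16
embeddings = np.random.default_rng(1).normal(size=(5, 16))

# Element-wise addition combines token identity with token position
encoder_input = embeddings + positional_encoding(5, 16)
print(encoder_input.shape)  # (5, 16)
```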

Self-Attention
Self-attention is a mechanism that allows the Transformer model to better understand how words in a sentence are lexically and semantically related. It is best explained with an example:

The cat did not go for a swim in the river because it was hungry Example 1 – Sentence showcasing presence of anaphora

A human can easily tell that the word “it” refers to the cat, not the river. Self-attention enables the model to map and remember these relationships between the words, which results in improved performance and understanding of text (Vaswani et al., 2017). This enables Transformer models to achieve a high level of sentence understanding by using information from non-adjacent words. The self-attention for each word in the input is calculated using a series of basic linear algebra operations. For each word embedding inputted to the model, a query matrix (representation of the currently processed word), a key matrix (so-called labels of the other words in the sequence) and a value matrix (actual representation of the other words in the sequence) are created by multiplying the word embedding by three weight matrices initialized during the training process (Alammar, 2018).

Following this, a score is calculated for each word in the input sequence. The score for each individual word is compared to the score of all other words in the input. This score is used as a metric to determine how much self-attention should be placed on that specific word in the input. In order to calculate the scores, the key matrix and query matrix are multiplied. Once done, the result is divided by the square root of the dimension of the key vectors. The resulting values are then passed through a Softmax function to ensure that all results are in the range [0,1] and add up to 1. Finally, the Softmax scores are multiplied by the value matrix, which yields the output of the self-attention layer for the encoded sentence (Alammar, 2018). To ensure that too much attention is not applied to a single word, Transformers make use of multi-headed attention. This means that the scoring process is conducted several times with different matrices. The results are then concatenated, multiplied by a weight matrix and passed to the FFNN layer (Vaswani et al., 2017).
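The calculation described above can be summarized in a few lines of code. The following is a minimal, self-contained sketch of a single attention head (the dimensions and random weight matrices are illustrative assumptions), not an excerpt from the models applied later in this paper:

```python
# Sketch of scaled dot-product self-attention:
# output = softmax(Q @ K^T / sqrt(d_k)) @ V
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(embeddings, W_q, W_k, W_v):
    Q = embeddings @ W_q              # queries: the word currently being processed
    K = embeddings @ W_k              # keys: "labels" of all words in the sequence
    V = embeddings @ W_v              # values: actual representations of the words
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # compare every word with every other word
    weights = softmax(scores)         # scores in [0, 1] that sum to 1 per word
    return weights @ V                # weighted sum of the value vectors

seq_len, d_model, d_k = 6, 512, 64
x = np.random.randn(seq_len, d_model)
W_q, W_k, W_v = (np.random.randn(d_model, d_k) for _ in range(3))
head_output = self_attention(x, W_q, W_k, W_v)   # one head; multi-headed attention repeats this
```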

Decoder
The purpose of the decoder is to produce the final output of the model. For translation tasks, this is the input sentence translated into another language.

The decoder is built similarly to the encoder, with the main difference being that it contains an Encoder-Decoder attention layer in addition to the self-attention and FFNN layers. The Encoder-Decoder attention layer is identical to the self-attention layer, with the exception that the Key and Value matrices it uses are inherited from the last encoder in the encoder stack. The decoder uses these Key and Value matrices to get a better idea of where to put focus on the input sentence. In addition, the self-attention layer is masked. This prevents the decoder from being able to “see” the next words that it needs to predict (Vaswani et al., 2017).

When training a Transformer model, an input sequence and its corresponding translation are provided to the model. The target sentence is transformed into word embeddings with added positional information, just like the inputs in the encoder. Because the model is fed the correct output, the target sentence is shifted one position to the right, and a masking procedure is conducted in the self-attention layer to ensure that the decoder cannot “see” the future words that it needs to predict. The decoder produces the correct output by predicting the next word in the sequence. Once a word has been predicted, it is added to the input of the decoder. The decoder can then use this word and all previously predicted words to predict the next word in the output sentence. This process is repeated one word at a time until a special “end of sentence” token is reached, indicating that the translation is done (Gandhi, 2019). Just like in the encoder, the inputs to the decoder are passed through a self-attention layer and FFNN in order to process features related to the input. Once the input has passed through the FFNN layer, the final vector is passed to a linear layer and then a Softmax function in order to calculate the probability of the next token occurring at the specified position (Vaswani et al., 2017).
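The masking step mentioned above amounts to blocking attention to positions to the right of the current token. A minimal sketch of such a look-ahead mask is shown below (the matrix sizes are illustrative assumptions); adding it to the raw scores before the Softmax drives the weights for future positions towards zero:

```python
# Sketch: a causal (look-ahead) mask sets future positions to -inf so that,
# after the Softmax, the decoder cannot attend to words it has yet to predict.
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    upper = np.triu(np.ones((seq_len, seq_len)), k=1)   # 1s above the diagonal = future tokens
    return np.where(upper == 1, -np.inf, 0.0)

raw_scores = np.random.randn(5, 5)                      # attention scores for a 5-token target
masked_scores = raw_scores + causal_mask(5)             # future positions now receive ~0 weight
```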

Transformers have proven to perform better than traditional NLP models due to their ability to calculate self-attention. In addition, Transformers can utilize parallelization as each input token is processed separately and not sequentially. This enables Transformers to utilize GPUs and thereby speed up computation times substantially compared to RNNs and LSTMs (Hahn, 2019).

BERT
Introduction to BERT
Bidirectional Encoder Representations from Transformers (BERT) is a language representation model developed and open-sourced by Google in October 2018. BERT is pre-trained in an unsupervised manner on an unlabeled dataset of 16 GB of text containing 3.3 billion words collected from the English Wikipedia and the BookCorpus dataset (Devlin, Chang, Lee, & Toutanova, 2019) and can be relatively easily and inexpensively fine-tuned for a variety of NLP tasks such as text classification, question answering, language inference and more (Devlin et al., 2019). BERT received widespread acclaim and attention in the NLP community upon its release due to its, at the time, state-of-the-art performance on benchmark NLP tasks such as GLUE, SQuAD, and MultiNLI, among others (Devlin et al., 2019; C. Sun, Qiu, Xu, & Huang, 2019). Following its release, BERT and other models based on its architecture have been considered state-of-the-art models within the field of NLP (Jawahar, Sagot, & Seddah, 2019; Kovaleva, Romanov, Rogers, & Rumshisky, 2019). The success of BERT has further been cemented since Google adapted its search algorithm to incorporate language understanding elements from BERT to improve search results (Nayak, 2019).

This section will introduce the BERT model as well as other models and architectures that are necessary to establish appropriate understanding of BERT.

Model Requirements
BERT is pre-trained on a fixed vocabulary learned from the corpora that the model is trained on. While this allows the model to recognize a majority of the most common words used in English, BERT includes functionality that enables it to handle words or characters that are not stored in the vocabulary. Before BERT can process any input text, it transforms it into sub-units by performing tokenization. The tokenization process splits the input sentence into separate units referred to as “tokens” (Manning et al., 2008). BERT utilizes a tokenizer based on the WordPiece model (Devlin et al., 2019). From the corpora that BERT is trained on, this tokenizer establishes a vocabulary of 30,000 tokens. In addition to words, the tokenizer also establishes tokens for all unique single characters as well as sub-words. This approach ensures that unknown words are more likely to be represented by a representative token instead of the [OOV] or [UNK] tokens used by other models to indicate that a word is not present in the vocabulary or unknown, respectively (Munikar, Sushil, & Aakash, 2019). For each entry in the vocabulary, a vocabulary ID is also assigned. This ID is used to

link the tokens to their respective word embedding vector. In addition to splitting input words into sub-units, all input sentences are prepended with a special [CLS] token and appended with a [SEP] token. For multi-sentence inputs, BERT only prepends a single [CLS] token. Sub-words are prepended with two “#” characters. An example of the pre-processing conducted by BERT can be seen in Table 7.

Original: Was Sen. John McCain a ‘Hanoi Hilton Songbird’?
Tokenized: [‘[CLS]’, ‘was’, ‘sen’, ‘.’, ‘john’, ‘mccain’, ‘a’, ‘‘’, ‘hanoi’, ‘hilton’, ‘song’, ‘##bird’, ‘’’, ‘?’, ‘[SEP]’]
Original: The CIA paid two psychologists $81 million “to develop and run their torture program.”
Tokenized: [‘[CLS]’, ‘the’, ‘cia’, ‘paid’, ‘two’, ‘psychologists’, ‘$’, ‘81’, ‘million’, ‘”’, ‘to’, ‘develop’, ‘and’, ‘run’, ‘their’, ‘torture’, ‘program’, ‘.’, ‘”’, ‘[SEP]’]
Table 7 - Example of Pre-Processing by BERT on two sentences from the natural claims dataset

Every single token in the BERT vocabulary is associated with a word embedding vector. Each word embedding is a dense feature vector with the number of features equaling the number of hidden units in the chosen model. When applying the BERT model to a dataset, the user must specify the maximum sentence length that the model should accept. Current versions of BERT allow for a maximum length of 512 tokens per input instance. Sentences that are shorter than the maximum length are appended with [PAD] tokens until the maximum length has been reached.
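The pre-processing shown in Table 7 can be reproduced with the tokenizer that ships with the Hugging Face transformers library. The sketch below is an illustration of that workflow (the specific class and model name are assumptions; this paper relies on the tokenization built into the models it applies):

```python
# Sketch: WordPiece tokenization, special tokens and padding as described above.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

claim = "Was Sen. John McCain a 'Hanoi Hilton Songbird'?"
print(tokenizer.tokenize(claim))      # word pieces such as 'song', '##bird'

# Full encoding adds [CLS]/[SEP], maps tokens to vocabulary IDs and pads the
# sequence with [PAD] tokens up to the chosen maximum length (at most 512).
encoded = tokenizer(claim, padding="max_length", max_length=32, truncation=True)
print(encoded["input_ids"])
```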

BERT Architecture
One of the reasons why BERT is considered a state-of-the-art model is due to its ability to handle text in a bidirectional manner more efficiently than previous models. In order to properly understand how BERT achieves a bidirectional understanding of text, it is necessary to dive into some of the previous deep learning approaches that have been used to create language models.

One classical approach to solving language understanding tasks is to use recurrent neural networks (RNN). This type of neural network is designed to handle sequential data. This is useful for language tasks such as word predictions as the current word in a sentence often is dependent on one or more previous words in a sentence. RNNs can store a “memory” of the previous inputs in a sequence (Otter et al., 2019). For each input in a sequence, the RNN computes a hidden state which contains information on the importance of the input. This hidden state is used to process the next input in the sequence. In this way, the RNN is capable of remembering the importance of previous inputs in the

sequence to determine the appropriate classification of the current input (Staudemeyer & Morris, 2019).

A major disadvantage of RNNs is their inability to effectively remember long-term dependencies in text. This is due to the hidden states effectively becoming diluted as the sequence is being processed, with early results being overwritten by later results. This makes them less useful for NLP tasks such as word predictions, as these often depend on information from various parts of the sequence in order to properly classify a later input (Otter et al., 2019). A solution to the long-term dependency problem was addressed using Long Short-Term Memory networks (LSTM). These networks are essentially expanded RNNs that add additional parameters designed to remember or discard certain pieces of information that might be of relevance in the sequence (Otter et al., 2019). While this enables them to perform better on long-term dependencies, LSTMs are by no means perfect.

A major issue with RNNs and LSTMs is their inability to process input data in a parallel manner. Modern deep learning models utilize the power of GPUs to process many streams of data in parallel. Sequential neural networks such as RNNs and LSTMs are unable to do this, leading to slow computation times (Bouaziz, Morchid, Dufour, Linarès, & De Mori, 2017). In addition to being slow, both RNNs and LSTMs are inherently one-directional models designed to understand language in a single direction. Due to this, models based on these architectures are unable to properly capture meaning and context and thereby understand human language.

One solution to the problem of unidirectionality is found using Bidirectional LSTMs (Bi-LSTM). Bi-LSTM networks are in essence two separate LSTM networks combined. Each network is tasked with processing the inputs from a different direction. This creates a model that is capable of analyzing text from left-to-right in addition to right-to-left (Halderman & Chiarello, 2005). This allows them to understand context in sentences better than traditional LSTMs and RNNs. In early 2018, Bi-LSTMs received increased attention with the release of the ELMo model, which substantially improved NLP model performance by combining Bi-LSTMs with deep contextual word embeddings that are able to capture more contextual and semantic information related to words compared to static approaches such as Word2Vec (Peters et al., 2018). Despite these improvements and Bi-LSTMs showing promising results, the Transformer architecture released in late 2017 proved to perform even better than previous models.

The BERT model is largely based on the Transformer architecture with one main distinction: BERT does not include a Decoder component. Instead, the number of encoding layers is increased. Currently, two different versions of the BERT model exist. The details of these are outlined in Table 8.

Model Name | Encoder Layers | Hidden Units | Attention Heads
BERT Base | 12 | 768 | 12
BERT Large | 24 | 1024 | 16
Table 8 – Attributes of Base and Large versions of BERT

Encoder layers refer to the number of individual encoders that are included in the model. Hidden Units denote the number of weights in a single layer. Attention heads represent how many times the multi-head self-attention is performed. Both versions of the BERT model are provided as cased and uncased versions. The uncased version converts all input text to lower case. The cased versions take into account different cases in the input text (Petrova, 2019). This paper will use the BERT Base Uncased version of BERT.

A major advancement of BERT is its ability to leverage the Transformer architecture while still being truly bidirectional, i.e. being able to understand text from a left-to-right as well as a right-to-left context. Traditional language models have not been bidirectional. One reason for this is that if a model had to predict the words in a sentence from both directions, it would be able to “see” the word before predicting it, thereby rendering the training useless. BERT overcomes this issue because it is trained using the Masked Language Modeling (MLM) task. For the MLM task, 15% of the total input tokens are selected. 80% of these are replaced with a [MASK] token, 10% are replaced with a randomly selected token from the vocabulary, and the last 10% remain unchanged. The goal is for BERT to predict the true vocabulary IDs of the selected tokens based on their context (Devlin et al., 2019). BERT is also trained on the Next Sentence Prediction (NSP) task, which it uses to establish relationships between sentences and which has been shown to improve performance on down-stream tasks. For this task, BERT is presented with two sentences. In 50% of the instances, the second sentence is a correct continuation of the first, while in the other 50% the second sentence is randomly chosen from the corpus. BERT is tasked with predicting whether the second sentence is the true continuation of the first. This task

improves the model’s ability to understand text across sentences which has been proven to be beneficial to certain NLP tasks (Devlin et al., 2019).
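The MLM selection scheme described above can be sketched as follows. This is an illustrative toy implementation under assumed token IDs and vocabulary size, not the actual pre-training code of BERT:

```python
# Sketch of the MLM masking scheme: 15% of tokens are selected; of these,
# 80% become [MASK], 10% become a random token, and 10% are left unchanged.
import random

def mask_tokens(token_ids, mask_id, vocab_size, select_prob=0.15):
    labels = [None] * len(token_ids)       # None = position not predicted
    masked = list(token_ids)
    for i, tok in enumerate(token_ids):
        if random.random() < select_prob:
            labels[i] = tok                # the model must recover this original ID
            r = random.random()
            if r < 0.8:
                masked[i] = mask_id                        # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.randrange(vocab_size)   # 10%: random vocabulary token
            # remaining 10%: keep the original token
    return masked, labels

masked_ids, labels = mask_tokens([2023, 2003, 1037, 4937], mask_id=103, vocab_size=30000)
```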

BERT-inspired Models
In addition to the original BERT model, this paper will compare results to other models utilizing the BERT architecture. This section will introduce these models.

RoBERTa
RoBERTa is a modified version of BERT introduced by AI researchers at Facebook in mid-2019. RoBERTa is described as a “robustly optimized BERT” because of its expanded training process compared to the original BERT model. RoBERTa distinguishes itself from BERT in several ways. First of all, RoBERTa removes the NSP task from the training process. Instead, it improves the MLM task by dynamically changing the token masking pattern applied to the training data. In addition, the model is trained on a text corpus that is approximately 10 times larger than BERT’s, thus increasing its size and diversity compared to the original model. The length of the sequences in the training data has also been increased (Y. Liu et al., 2019). RoBERTa has proven to outperform BERT in some NLP tasks (Y. Liu et al., 2019). Just like BERT, RoBERTa is also provided as base and large versions. This paper will use the base version of the RoBERTa model.

DistilBERT
Released in late 2019 by AI company Hugging Face, DistilBERT is a smaller version of BERT that is designed to provide comparable performance while substantially reducing the size of the model. In DistilBERT, the Transformer architecture has been shrunk, resulting in the total number of layers in the model being decreased by a factor of two. The resulting model is 40% smaller than the original BERT implementation while being 60% faster on inference speed and retaining 97% of its ability to understand language (Sanh, Debut, Chaumond, & Wolf, 2019). DistilBERT contains 66 million parameters compared to 110 million for the base version of BERT and is trained on the same data as the original BERT model. The model does however leverage best practices in the pre-training process based on findings by the RoBERTa team (Sanh et al., 2019). The DistilBERT model is established on the principle of knowledge distillation.

Distillation, originally referred to as model compression, is based on the intuition that a reduced and simpler model can be created from a large and computationally expensive model that has already been pre-trained on a large dataset, by training the smaller model to approximate the function learned by the large model from the data (Bucilǎ, Caruana, & Niculescu-Mizil, 2006). Building on this work on model compression and coining the term ‘distillation’, Hinton, Vinyals, and Dean (2015) recognized the opportunity to transfer knowledge from a large model to a performance-optimized smaller model. In their approach, the large model, also referred to as the teacher, provides a probability distribution across the learned classes as the output of its last layer (Hinton et al., 2015). These class probabilities are learned from the training data and indicate the model’s ability to generalize beyond the training data (e.g. a language model predicting the next word for the sentence “I think this is the beginning of a beautiful [predicted word]” assigns a high probability to tokens such as [day] and [life] and a long list of low but non-zero probabilities to words such as [future, story, world]) (Sanh et al., 2019). Thus, using slightly adapted class probabilities from the larger model as “soft targets” in the training of the smaller model is the fundamental idea behind model compression, or knowledge distillation (Hinton et al., 2015).
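As a rough illustration of this idea, the distillation objective can be sketched as a weighted combination of a soft-target loss against the teacher’s softened probabilities and an ordinary cross-entropy loss against the true labels. The temperature, the mixing weight alpha and the tensor shapes below are assumptions made for illustration, not the exact training setup of DistilBERT:

```python
# Sketch of a knowledge distillation loss (after Hinton et al., 2015):
# the student is trained on the teacher's softened class probabilities
# ("soft targets") in addition to the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, true_labels,
                      temperature=2.0, alpha=0.5):
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Soft-target term: match the teacher's softened distribution
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Hard-target term: ordinary cross-entropy against the true labels
    hard_loss = F.cross_entropy(student_logits, true_labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

student_logits = torch.randn(4, 2)        # e.g. 4 claims, 2 classes (fake / true)
teacher_logits = torch.randn(4, 2)
true_labels = torch.tensor([0, 1, 1, 0])
loss = distillation_loss(student_logits, teacher_logits, true_labels)
```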

DistilBERT has the same pre-processing requirements as the BERT Base Uncased model. This paper will use the DistilBERT Base Uncased version of the model.

GPT-2
This section will introduce the GPT-2 language model that will be used to produce artificial claims.

Generative Pretrained Transformer 2 (GPT-2) is a pre-trained predictive language model trained on 40 GB of text. The name reflects its design: it is Generative because it was trained to generate tokens based on the previous tokens in a sequence, Pretrained because it was pre-trained on this large corpus, Transformer because it uses the Transformer architecture, and 2 because it is the second version of the GPT model (Arasanipalai, 2019). The model was developed by the San Francisco based research organization OpenAI (Openai.com, 2020). An overview of the different versions of GPT-2 can be seen in Table 9.

Name | Release Date | Parameters | Layers
Small | February 2019 | 124M | 12
Medium | May 2019 | 355M | 24
Large | August 2019 | 774M | 36
XL | November 2019 | 1.5B | 48
Table 9 - Different Versions of GPT-2

Parameters denote the total number of weights in the model. Layers represent the total number of decoder blocks utilized by the model.

The strategy of staged release was employed by OpenAI due to concerns regarding potential misuse of the larger models (Solaiman et al., 2019). The organization claimed that releasing the model in stages would provide them with enough time and data to perform threat analyses and thereby identify potential misuses. However, some individuals within the AI field state that OpenAI’s primary goal was to gain extra media coverage, attract sensation seekers, and create extra hype around the research group by highlighting potential threats and impacts that GPT-2 might have (Lowe, 2019).

The uniqueness of GPT-2 lies within its capability to produce artificial text of high quality due to its size. OpenAI claims that its GPT-2 model managed to achieve state-of-the-art results on 7 out of 8 language modeling datasets in a zero-shot setting (Radford et al., 2019). OpenAI teamed up with Cornell University to conduct a survey where people assigned a credibility score to text produced by the model. Participants rated text from the biggest model with a score of 6.91 out of 10, providing evidence that the model can produce artificial text that people perceive as rather credible (Openai.com, 2019).

GPT-2 Model Architecture
The GPT-2 architecture is built solely on Transformer decoder blocks and outputs merely one token at a time, meaning that each token that is produced is subsequently added to the input and used as an input in the next prediction (Alammar, 2019). Thus, the model processes all previous tokens to predict the next token of the sequence, hence it utilizes an autoregressive approach for text generation.

Input
GPT-2 can produce text that contextually relates to input text from a user. Inputs can vary from a single word to multiple sentences (Alammar, 2019). The input provides the model with a foundation for the artificial text that it will produce. Once the input is provided, the input embeddings need to be looked up in the model’s embedding matrix. Positional encoding of tokens is also required to clearly indicate the order of tokens in a sequence (Alammar, 2019). A vector that is a combination of the word embedding and positional vectors is then forwarded to the first decoder block (Alammar, 2019).

Decoder Blocks
As opposed to traditional Transformer models, GPT-2 only uses decoder blocks. The number of blocks varies depending on the specific model implementation. Each block consists of two layers: a self-attention layer and a FFNN. Once a token has been passed through a block, the resulting vector is forwarded to the next block (Alammar, 2019). An example of a decoder block can be seen in Figure 8.


Figure 8 - GPT-2 Decoder Block Architecture

The masked self-attention layer is a crucial part of each block as it facilitates the understanding of context within a sequence. Each decoder block of GPT-2 includes a masked self-attention layer, which means that the model merely looks at the previous tokens to understand the context of the sequence. Hence, GPT-2 is a unidirectional model. It assigns a relevancy score to every previous token which outlines how important a particular word is in explaining the context of a certain word, before passing it to the FFNN layer (Alammar, 2019).

The product of each block within the model is subsequently passed to the next block until the last block has been reached. The output vector of the final block is multiplied by the word embedding matrix which creates token probabilities (Alammar, 2019). GPT-2 can consider multiple tokens based on their probability scores. Once a token has been predicted, the model will keep iterating until the last token is reached (Alammar, 2019).
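Schematically, this autoregressive loop can be sketched as follows. The probability function below is a stand-in for a full forward pass through the decoder blocks and the final multiplication with the embedding matrix; the token IDs and the greedy decoding choice are illustrative assumptions:

```python
# Sketch of autoregressive generation: each predicted token is appended to
# the input and the extended sequence is used to predict the next token.
import numpy as np

def next_token_probabilities(token_ids, vocab_size=50257):
    rng = np.random.default_rng(seed=len(token_ids))     # placeholder for the real model
    logits = rng.normal(size=vocab_size)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prompt_ids, end_token_id, max_new_tokens=20):
    sequence = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = next_token_probabilities(sequence)
        next_id = int(np.argmax(probs))      # greedy choice; sampling is also possible
        sequence.append(next_id)             # the predicted token becomes part of the input
        if next_id == end_token_id:
            break
    return sequence

output_ids = generate(prompt_ids=[50, 256, 1024], end_token_id=50256)
```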

Utilization
General language models like GPT-2 can be utilized in many different ways. As a model that produces high quality text and reaches state-of-the-art performance on several tasks in a zero-shot setting, GPT-2 can perform tasks like question answering, summarization or machine translation. Thus, it can be used by many entities to support many language tasks by creating writing assistants, dialogue agents,

unsupervised translation between languages or systems (Openai.com, 2019). However, as previously mentioned, OpenAI expressed concerns about misuse of their model for malicious purposes such as generating fake news or the automation of production of abusive, phishing or spam content.

Evaluation Metric
One of the central elements of deep learning and machine learning is the task of determining a model’s ability to correctly classify instances from a dataset. This section will describe the accuracy score evaluation metric used to measure the performance of the baseline and deep learning classification models used in this paper.

Accuracy Score
The accuracy score is a simple measurement of the number of correctly classified instances relative to the total test size. For binary classification tasks, it can be expressed by the following (Metz, 1978):

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of instances}}$$

Equation 8 - Accuracy score for binary classification tasks

The Accuracy score metric will be used for all models as all datasets used in this paper are balanced.

Model evaluation
An integral part of all machine learning and deep learning projects is to evaluate the performance of one’s model. This is done to ensure that a model does not overfit to the data, thereby making it incapable of generalizing to unseen data (C. Müller & Guido, 2016). A commonly used approach for evaluating the performance of a model is to train it on a subset of the dataset called the training data. Once this training process is done, the model is then fed an unseen data set called the testing data. Based on the results from the testing data it can be evaluated whether or not the parameters of the model should be tuned in order to help the model generalize. The ratio between training and testing data differs depending on the project. Most applications utilize between 70 – 80% of the data for training and 20 – 30% for testing (Géron, 2019). For the tasks conducted in this paper, a split of 70% training and 30% testing was chosen. To ensure that the same exact subset of data would be chosen by the model on any consecutive runs, the random state

hyperparameter was defined. The random state hyperparameter was set to 42 for all models. This ensures that results can be validated and checked across consecutive runs.
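A minimal sketch of this evaluation setup is shown below (the DataFrame and its column names are assumptions made for illustration; this paper applies the same split to the natural claims dataset):

```python
# Sketch: 70/30 train-test split with a fixed random state for reproducibility.
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for the claims data (column names are illustrative assumptions)
claims = pd.DataFrame({
    "text": ["claim one", "claim two", "claim three", "claim four"],
    "label": [1, 0, 1, 0],
})

train_texts, test_texts, train_labels, test_labels = train_test_split(
    claims["text"], claims["label"],
    test_size=0.30,    # 70% training / 30% testing
    random_state=42,   # fixed seed so consecutive runs yield the same split
)
```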

Model Optimization
To ensure that the implemented models achieve the best possible results, it is important to find the optimal combination of hyperparameters for the input data. One such method is to utilize the GridSearchCV module found in SciKit-Learn (Pedregosa et al., 2011). This method conducts a cross-validation on the training data to find the values for the individual hyperparameters that provide the highest score. These values are then subsequently used for the classification task.
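As a sketch of this optimization step (the pipeline, classifier and parameter grid below are illustrative assumptions rather than the exact grid used in this paper), a cross-validated grid search over a TF-IDF representation and a baseline classifier could look as follows:

```python
# Sketch: cross-validated grid search over vectorizer and classifier hyperparameters.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 3)],   # word n-gram ranges to try
    "clf__alpha": [0.1, 0.5, 1.0],            # Naive Bayes smoothing values to try
}

search = GridSearchCV(pipeline, param_grid, cv=10, scoring="accuracy")
# search.fit(train_texts, train_labels)       # fitted on the training data only
# print(search.best_params_, search.best_score_)
```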

Creation of Artificial Text Using GPT-2

Machine Produced Text

The GPT-2 model and its code have been made publicly available for researchers by OpenAI7. OpenAI claims that the robustness and the worst-case behavior of the model are not fully understood; thus, the best practices for selecting optimal hyperparameters are yet to be identified.

Text Production
GPT-2 requires input in order to determine the style of text that it is supposed to produce. In addition, GPT-2 can utilize several hyperparameters to tune the production process. The exact influence of the hyperparameters on the final output has yet to be thoroughly researched. Due to this, it was not possible to determine the optimal hyperparameters for the task at hand. OpenAI suggests evaluating GPT-2 for each task individually and selecting optimal hyperparameters accordingly (Radford et al., 2019). However, this approach requires extensive testing and laborious evaluation, which was outside the scope of this paper.

Therefore, the hyperparameters used in this paper were defined with an intention to produce claims similar to the claims found in the natural claims dataset. The claims from the natural claims dataset were used as input prompts for GPT-2. An overview of the chosen hyperparameters can be seen in Table 10.

7 https://github.com/openai/gpt-2

Hyperparameter | Value
Model Name | 1558M
seed | None
nsamples | 1
batch_size | 1
length | None
temperature | 1
top_p | 0.95
Table 10 - GPT-2 Hyperparameters

This paper employed the, at the time of writing, largest publicly available version of GPT-2. Utilization of the biggest model should ensure outputs of the highest quality. Furthermore, the seed was set to “None” in order for the model not to reproduce the results from previous runs. Additionally, the number of samples to produce at a time as well as the batch size were set to “1”. Due to the possibility of producing incomplete sentences, the length of produced samples was not limited. The temperature hyperparameter controls the degree of randomness, with 0 being the lowest and 1 the highest. In order to avoid production of duplicates, the value of the temperature hyperparameter was set to “1”. The top_p hyperparameter represents a value of cumulative probability. Based on this probability, the most suitable tokens (those whose cumulative probability exceeds the value of top_p) are considered for the prediction of the next word. Setting top_p to a value higher than 0 activates nucleus sampling, which overrides the top_k hyperparameter specifying the number of possible tokens the model should consider. Three examples of outputs produced by GPT-2 and their respective input prompts can be seen in Table 11.

Claim | Prompt
I will tell you this, Russia, if you're listening, I hope you're able to find the 30,000 emails that are missing, Trump said. | We have four combat-ready brigades out of 40 in the U.S. Army.
While hepatitis A is rarely deadly, some have died from complications caused by it, including complications related to organ failure, hepatitis C and liver cancer. | A major Hepatitis A outbreak in San Diego has been pinned on undocumented immigrants there.
• Matthew Broderick, 36, the comedian and actress who will wed British actress Gwyneth Paltrow in a lavish ceremony later this spring, also told the newspaper that he is pregnant. | Brad Pitt, Jennifer Aniston Do Have “Baby Announcement,”
Table 11 - Examples of claims produced by GPT-2
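For illustration, the sampling configuration in Table 10 can be approximated with the Hugging Face transformers library. The sketch below is not the code used in this paper (which relied on OpenAI's original gpt-2 repository), and the small "gpt2" checkpoint and maximum length are assumptions made to keep the example lightweight:

```python
# Sketch: nucleus sampling with GPT-2, mirroring the temperature and top_p
# values from Table 10 and using one of the prompts shown in Table 11.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "We have four combat-ready brigades out of 40 in the U.S. Army."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    do_sample=True,     # sample instead of decoding greedily
    temperature=1.0,    # maximum randomness, as in Table 10
    top_p=0.95,         # nucleus sampling threshold, as in Table 10
    max_length=100,     # the paper left the sample length unrestricted
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```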

Using the aforementioned hyperparameters, two distinct datasets were created. The first dataset was produced from claims that were all labeled as true in the natural claims dataset. The second was produced from claims labeled as fake. GPT-2 processed one claim at a time, produced a sample of unspecified length and moved on to another claim from the input. 16,241 claims were created from 1,243 randomly selected true claims and 15,236 claims from 1,089 randomly selected fake claims from the natural claims dataset. The intention was to produce enough claims to establish a substantial sample of additional training data; due to a lack of sufficient computing power, generating samples using every claim from the natural claims dataset as a prompt and subsequently selecting the most appropriate ones was infeasible.

Fact-Checking
With the increasing focus and emergence of fake news in recent years, fact-checking has become relevant and desirable by organizations and individuals aware of the potential threats of fake news. As a result of this demand, numerous fact-checking websites have emerged since the early 2000s (Aspray & Cortada, 2019).

Fact-checking constitutes a process of assigning a veracity label to a claim in a particular context (Vlachos & Riedel, 2015). Thus, fact-checkers review a claim, its context as well as its date and time in order to be able to successfully assess the veracity of the claim. Context and date often prove to be crucial features of a claim, as some claims might only be regarded as true when complemented with additional information, or, in other cases, the information that a claim was supported by in the past is now outdated (Vlachos & Riedel, 2015). The aforementioned cases are instances when the veracity assessment might not result in an unambiguous verdict; hence, fact-checking is not always a binary classification (Vlachos & Riedel, 2015).

The output of a fact-checking process is a verdict. The verdict represents a label (true, false, mostly true etc.) that clearly specifies the claim’s veracity. Furthermore, each verdict should be backed up by supporting evidence which outlines the reasons for assigning a particular value as well as sources for the information used (Vlachos & Riedel, 2015). In this paper, fact-checking plays a key role in determining the veracity of the GPT-2-generated claims, as it enables assigning an accurate veracity label to the produced claims.

Fact-Checking Artificial Claims
Of the over 15,000 produced fake sentences, 3,000 were randomly chosen and their veracity checked. It was first checked whether or not the claim could be verified. If an entity, person name or quote was present in the sentence, it was deemed suitable for verification. All verifiable claims were googled to find supporting evidence. If the machine-produced claim could not be supported by evidence available online, the claim was labeled as fake. Furthermore, some of the claims were disproved by information from trustworthy sources, and only claims supported by evidence were labeled as true. The sources of information used for fact-checking varied from newspaper websites to Wikipedia articles. The complete output of the verification and fact-checking process can be found in Appendix 3 – Manually Verified and Labeled Outputs From GPT-2. Of the 3,000 total claims, 690 were deemed suitable for verification. Of these, 649 were labelled as verifiably fake and 41 as verifiably true. Human-like grammar and writing style made many of the claims seem very similar to human-produced text.

Following the verification process, the artificially produced true and fake claims were added to the natural claims dataset. The dimensions of the concatenated dataset can be found in Table 12.

Label | Count | %
Fake | 11278 | 51.4
True | 10670 | 48.6
Total | 21948 | 100
Table 12 - Dataset dimension with manually fact-checked artificial claims added

This resulted in a dataset with 51.4 % fake and 48.6% true claims. Despite the slight imbalance of class labels, the overall dataset was still considered balanced.

Results
This section will present the results from the analysis and the performance of the implemented models.

The baseline and BERT-based models were applied to three separate binary classification tasks:
1. The natural claims from the natural claims dataset.
2. The natural claims from the natural claims dataset in addition to 10,000 randomly chosen artificial claims produced by GPT-2.
3. The natural claims from the natural claims dataset in addition to 690 artificial claims produced by GPT-2 that had been manually verified and fact-checked.

In order to determine the hyperparameters that provided the highest scores for each individual model, an extensive grid search process was conducted on the natural claims task only. The grid search covered the most relevant hyperparameters for each model using 10-fold cross validation on TF-IDF in addition to BoW (CV) representations of the input data. In addition, several combinations of N-grams on ranges of words and individual characters (char) were tested. The highest accuracy scores achieved from the grid search process can be seen in Table 13.

Model / Word Representation | CV Score | TFIDF Score | CV [2,5] Char Gram | TFIDF [2,5] Char Gram | CV [1,3] Word Gram | TFIDF [1,3] Word Gram
Logistic Regression | 0.6878 | 0.6887 | 0.6768 | 0.6901 | 0.6961 | 0.6976
Naive Bayes | 0.6989 | 0.7013 | 0.6948 | 0.6887 | 0.7053 | 0.7042
Linear SVM | 0.6893 | 0.7024 | 0.643 | 0.6932 | 0.6898 | 0.6984
Non-linear SVM | 0.498 | 0.498 | 0.5083 | 0.498 | 0.498 | 0.498
Average Score | 0.6435 | 0.6476 | 0.6307 | 0.6425 | 0.6473 | 0.6495
Table 13 - Results from baseline models

From the table, several observations can be made. First of all, it can be observed that the Linear SVM implementation achieves higher scores than the non-linear implementations that were tested. The non-linear SVM implementations were therefore not used for subsequent tasks. In addition to this, it can also be observed that the TFIDF representation generally outperforms the CV representation. One

exception to this is the Naïve Bayes model which tends to achieve the best average performance on the CV representation. Based on the average score across all six variations of word encodings, it was decided that for all subsequent tasks the TFIDF Word Gram [1,3] word representation would be used as it achieved the highest average score across all models. An overview of the specific hyperparameters used for each model can be found in Appendix 2 – Optimal GridSearch Hyperparameters for Baseline Models.

The BERT, RoBERTa and DistilBERT models were all implemented using the Simpletransformers Python library (Rajapakse, 2020). This library allows for quick and easy customization of a wide array of Transformer-based models pre-trained by Hugging Face. The aforementioned models were implemented with the fine-tuning hyperparameters shown in Table 14.

Hyperparameter | Value
Epochs | 10
Max Length | 512
Batch Size | 15
Lowercase | True
Learning Rate | 5e-5
Table 14 – Fine-tuning arguments for BERT-based models

Epochs denote the total number of times a full pass through the training data is conducted. The batch size hyperparameter determines how many samples will be passed through the network at a time. When one batch has been processed, the model’s loss is evaluated. The model’s internal parameters are then updated accordingly before the next batch is processed.

Due to a lack of sufficient computing power for extensive testing of optimal combinations of training arguments for each individual model, the learning rate chosen for the BERT-based models is the learning rate that the creators of BERT outline in their paper as being optimal for most tasks (Devlin et al., 2019). Increasing the value of the batch size hyperparameter proved to significantly improve the results achieved by some of the models. The batch size was therefore set to the maximum possible value that could run on the hardware that was available for testing. All models were run in a Google Colab Pro notebook with a Tesla P100 GPU with 16GBs of VRAM.
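A minimal sketch of this setup with the Simpletransformers library is shown below; the RoBERTa checkpoint name, the argument keys and the train_df/eval_df DataFrames (expected to hold the claim texts and their binary labels) are assumptions for illustration rather than the exact training script used in this paper:

```python
# Sketch: fine-tuning a BERT-based classifier with Simpletransformers using
# the hyperparameters from Table 14.
from simpletransformers.classification import ClassificationModel

model_args = {
    "num_train_epochs": 10,
    "max_seq_length": 512,
    "train_batch_size": 15,
    "learning_rate": 5e-5,
    "do_lower_case": True,
}

model = ClassificationModel("roberta", "roberta-base", args=model_args)
# model.train_model(train_df)                                   # fine-tune on the training claims
# result, model_outputs, wrong_predictions = model.eval_model(eval_df)
```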

Results on Natural Claims Dataset
The results from the first task are shown in Table 15.

Model | Score
Linear SVM | 0.6984
Naive Bayes | 0.7042
Logistic Regression | 0.6976
BERT | 0.7035
RoBERTa | 0.7094
DistilBERT | 0.6984
Table 15 - Scores for Task 1: Claims from natural claims dataset only

The RoBERTa model achieves the highest overall accuracy score. Interestingly, the Naïve Bayes implementation achieves results that are comparable to RoBERTa and better than BERT and DistilBERT. The larger RoBERTa model achieves the highest score while the smaller DistilBERT model achieves a slightly lower score compared to the original BERT model. Given that RoBERTa is trained on more data than the original BERT model and that DistilBERT was designed to achieve results similar to BERT while offering faster inference, these results tend to follow a pattern related to the size of the model and its training data.

Results on Natural Claims Dataset and Non-Fact-Checked Artificial Claims
In order to determine the effects that adding artificially produced text data based on naturally occurring claims has on the classification accuracy of the models, a new subset of training data was added to the existing training set. For this process, a random subset of claims from the natural claims dataset was selected and passed to the GPT-2 model as input prompts. From the GPT-2 outputs, 5,000 fake and 5,000 real claims were randomly selected. These were added to the existing training data from the natural claims dataset.

The artificial claims were given the same label as the prompts they were produced from. To ensure that the models were trained on data that was not present in the testing set, the artificial claims that were added to the training set were all produced from prompts that only occurred in the training data. This was done to ensure that the model did not train on artificial claims that were produced from prompts in the testing data, as this could have given the model knowledge of the correct label. The

models were tested solely on claims from the balanced natural claims dataset. This was done to test how the performance of the models would be affected when additional training data was added. The results are shown in Table 16.

Model | Accuracy Score
Linear SVM | 0.7015
Naive Bayes | 0.7047
Logistic Regression | 0.7020
BERT | 0.6991
RoBERTa | 0.6829
DistilBERT | 0.6953
Table 16 - Model scores on natural claims dataset with 10,000 artificial claims added for training

From these results it can be observed that the addition of 5,000 fake and 5,000 true artificial claims did not improve the performance of the BERT-based models. In fact, each BERT model performed worse compared to just being trained on the natural claims. On the other hand, the baseline models all performed better compared to being trained solely on the natural claims, all reaching accuracies of over 0.70 with Naïve Bayes achieving the highest score of 0.7047.

Results on Natural Claims Dataset and Fact-Checked Artificial Claims
As an alternative to naively producing 10,000 claims without checking their veracity, it was tested how including only verified and fact-checked artificial claims would affect the performance of the same models. A random subset of 3,000 artificial claims was verified and fact-checked. Of these, 690 were given a veracity label. These were subsequently added to the training data in addition to the natural claims dataset. The testing data consisted solely of the claims from the natural claims dataset. The results from this are shown in Table 17.

Model | Accuracy Score
Linear SVM | 0.7005
Naive Bayes | 0.7058
Logistic Regression | 0.6980
BERT | 0.6947
RoBERTa | 0.6967
DistilBERT | 0.7053
Table 17 – Scores for models applied to natural claims concatenated with fact-checked artificial claims


The table shows that Linear SVM, Naïve Bayes and DistilBERT performed the best with scores greater than 0.70. Compared to incorporating 10,000 automatically labeled artificial claims, the incorporation of the manually fact-checked claims slightly increased the average performance for some of the models. As with the previous task, the Naïve Bayes classifier performed the best. However, it still achieved a lower accuracy score than the best performing model, RoBERTa (0.7094), in the first task when only natural claims from the natural claims dataset were used. In spite of the small size of only 690 instances, adding the manually fact-checked claims proved to increase the accuracy scores of all baseline models. Opposite to this, the accuracy scores of the two larger BERT-based models (BERT and RoBERTa) decreased; nonetheless, the smallest deep learning model used (DistilBERT) achieved a higher accuracy score than in the two previous tasks.

Discussion
The following section will discuss the results presented in the previous section in addition to highlighting specific limitations of the study, proposals for future work as well as reflections on the research conducted in this paper.

As recognized by Augenstein et al. (2019), pre-trained contextual encoding models are expected to provide classification models with more accurate embedding values for textual data compared to randomly initialized word embeddings, which in turn could be reflected in improved performance of the classifiers. The results of the natural claims only task are consistent with this assumption, as the best performing model was RoBERTa. However, the traditional machine learning models stayed on par with the BERT-based models, with Naïve Bayes outperforming the BERT-based models on several tasks. In the light of these results, two major considerations are presented in the following section.

The first major consideration is concerned with data size in relation to the models’ architectures. Neural networks, the underlying architecture of BERT-based models, notoriously require vast amounts of training data to outperform traditional machine learning models. Most machine learning models reach a point in the training phase where additional data does not lead to improved outcomes. Opposite to this, deep learning models benefit greatly from increased amounts of data (Alom et al., 2019). As Figure 9 from Alom et al. (2019) shows, traditional machine learning models can outperform deep neural networks when the size of training data is limited.

Figure 9 - Data requirements relative to performance for deep learning and machine learning models

This issue is in part addressed by pre-training the BERT-based models on enormous amounts of training data; nevertheless, it does not eliminate the need for domain-specific data to fine-tune the pre-trained models to a specific task. Therefore, it can be discussed if the 21,258 training examples contained in the natural claims dataset constituted a proper size to utilize the potential of the BERT-based models. As a comparison, the SQuAD v1.1 NLP benchmark, which BERT was originally evaluated on, uses a dataset of 100,000 question-answer pair examples for fine-tuning, i.e. almost five times as much data as used in this paper (Devlin et al., 2019). Furthermore, the creators of RoBERTa stated that BERT was significantly undertrained, which can provide an indication as to why RoBERTa outperformed BERT and scored the highest in the analysis (Y. Liu et al., 2019). Notably, Devlin et al. (2019) found that BERT Large outperformed BERT Base on all tested tasks, particularly when little training data was available for fine-tuning. However, due to limitations on the available computing power, this paper could not test the performance of the BERT Large model on the tasks presented in this paper.

Alternatively, using the original, label-imbalanced version of the MultiFC dataset would have led to an increase in training data relative to using the label-reduced natural claims dataset. However, this would have led to a major class imbalance, which would have required the use of different evaluation metrics.

Another crucial consideration related to the architecture of the BERT-based models and their performance compared to the baseline models relates to the text pre-processing requirements of the different models. The BERT-based models have vastly different pre-processing requirements compared to the baseline models presented. The BERT-based models utilize a built-in tokenizer that automatically transforms text to the required representations. Due to this, the end-user is not required to perform any pre-processing of the input text before passing it to the model for fine-tuning. Opposite to this, the baseline models require the user to manually decide on and evaluate different pre-processing steps to test which combinations provide the most satisfactory results. Therefore, the differences between the text pre-processing performed for the baseline models and by the BERT-based models’ built-in functionality might have affected the final performance scores, although due to time constraints an analysis of potential differences was not conducted as a part of this paper.

The second major consideration relates to the classification task itself. Although all the applied models performed significantly better than random guessing (50%) on the natural claims dataset

without incorporation of any supporting evidence, there is still room for improvement. The results indicate that the models are capable of capturing some linguistic features or patterns that they can utilize to successfully distinguish between fake and real news. However, an accuracy score of slightly above 70% for a robust model like RoBERTa signals that the inherent nature of the fake news detection task is complex and requires further research. This research should, in particular, focus on how to generalize performance on out-of-domain datasets, where large pre-trained language models, such as BERT-based models, should have an advantage over traditional machine learning models due to pre-training.

An important consideration relating to the application of the presented models for automatic fake news detection is the issue of inference time, i.e. the time it takes for a model to assign a label to a provided claim. As mentioned in the Economics of Fake News section, search engines and social media platforms act as digital curators of news. These platforms have an incentive to implement automatic fake news detection algorithms in their services to protect their brand and reputation. However, the inference speed of large deep learning models is slow, with even DistilBERT reporting an inference time of about 410 seconds for a sentiment analysis task (Sanh et al., 2019). Digital curators who rely on providing fast access to news for their users might therefore be disincentivized to utilize such models, as increasing the time a user has to wait for results to be returned in a search engine could compel the user to switch to an alternative provider, thus losing revenue for the search engine to its competitor. Therefore, it is not currently viable to implement automatic fake news detection systems based on the models presented in this paper.

While the results of the three tasks indicate that the models are capable of capturing some kinds of syntactic cues in text which enable them to correctly classify most of the instances, they also prove that the models are incapable of properly classifying a large share of the instances in the dataset. It can therefore be argued that a system based on the models, data and computing power presented in this paper should not be deployed as a method for automatic detection of fake news in a real-life scenario as it would result in too many wrongful predictions and thereby be too unreliable. Even though requirements for accuracy vary between models, datasets and specific tasks, it is hard to argue that a computationally expensive system which can correctly classify only around 70% of instances and performs on par with baseline models should replace real humans with expert knowledge for the process of fact-checking. Due to this, there is still extensive research and testing to conduct within

this field to determine if it is feasible to implement automatic systems that are capable of determining if a claim should be labeled as fake or true.

Regarding the second classification task, the incorporation of the 10,000 artificial claims led to decreased scores for every BERT-based model but improved the accuracy scores of the baseline models. Some important aspects that need to be considered when discussing these results are the claims themselves and the methodology for their generation. A main assumption underlying the process of producing text was the notion that using claims with a particular label (fake or true) as prompts would produce claims of the same veracity. However, this paper showed that providing GPT-2 with fake claims as an input can result in the model producing claims that are verifiably true.

While the text produced by GPT-2 shows an overall impressive level of coherence, syntax, structure and grammar, many instances that were produced can clearly be identified as being produced artificially. These findings are in agreement with the conclusions of the creators of GPT-2 who found, when testing GPT-2 on a summarization task in a zero-shot setting, that despite the produced text being similar to summaries, GPT-2 confused details such as numbers or items, which led them to conclude that GPT-2’s performance is insufficient for use in real-world settings (Radford et al., 2019). An example of this can be seen in Example 2.

Vanna White and Shark Tank Return to the Finishing Line. A "very dangerous" man has been arrested after four people were stabbed on a train in Melbourne's east. Example 2 - Example of output produced by GPT-2

Although the claim is seemingly coherent in its wording, the text combines two unrelated topics into a single claim. Based on this, it is obvious that the model has no inherent understanding of the text it produces. Even though this method can provide classification models with more training data, the approach to labeling the artificial claims in this manner should be regarded as inadequate. For this reason, an extensive qualitative analysis of the labels’ veracity is required.

In relation to the third classification task, BERT and RoBERTa did not benefit from the addition of the 690 manually fact-checked GPT-2 produced claims to the training dataset. All baseline models in addition to DistilBERT showed improved results relative to the natural claims only task. Of the 3

classification tasks, Naïve Bayes proved to consistently achieve the highest scores of all the baseline models. A similar pattern was not evident from the results of the BERT-based models.

Given that deep learning models generally perform better when the number of training data instances increases, it is difficult to pinpoint exactly why the BERT-based models performed worse when fine-tuned on the artificial claims in addition to the natural claims. One potential reason for this could be related to the attention mechanisms that are fundamental to the Transformer architecture. Given that some text instances produced by GPT-2 differ in style of writing and grammar from that of a human, the attention mechanisms could potentially experience difficulty allocating the attention properly. Future research should therefore be conducted to compare the performance of attention mechanisms on human and machine produced text to investigate if artificial text could be the cause of decreased performance.

While some of the models presented in this paper achieved improved accuracy scores after being trained on additional text data that was artificially produced, the methods for creating this data and their usefulness should be questioned.

Having discussed the main findings, the answer to the proposed research question is provided.

How do BERT-based language models compare to traditional machine learning models in NLP on the fake news detection task on human and machine-produced text?

From the findings presented in this paper, it is clear that determining exactly how Transformer-based language models can be used for fake news detection and production is complex. While the results presented in this paper indicate that the models are capable of identifying some patterns in claims that enable them to properly classify a claim as fake or not, the same results also prove that the methods presented in this paper faced limitations. Based on the results of the analysis, RoBERTa outperformed the traditional machine learning models on the natural claims dataset task. Despite this, the results of the analysis showed that the addition of artificially produced training data led to improved performance of traditional machine learning models while resulting in lower scores for the BERT- based models. While the causes of this change in performance are difficult to identify, it is hypothesized that the lack of sufficient computing power, artificial claims that differed in style and

grammar from that of a human, and insufficiently large training data for fine-tuning BERT-based models are key contributors to the results presented in this paper.

Limitations
One major factor limiting the research conducted in this paper was the lack of sufficient computing power to utilize the full potential of the BERT-based models. While the Google Colab service does allow for easy and relatively cheap access to powerful GPUs, the resources available were still not sufficient for testing the full capabilities of the BERT-based models presented in the paper. This resulted in models that were not thoroughly optimized for the task at hand, as they were tested to find hyperparameters that allowed them to achieve satisfying results without encountering memory issues. Initial testing showed that increasing the value of the batch size hyperparameter resulted in significant improvements to classification accuracy. If more powerful GPUs with more VRAM had been available, this paper would have been able to find the hyperparameters for each model that would result in the highest possible scores for each task.

While the natural claims dataset used in this paper is, at the time of writing, the biggest available dataset of naturally occurring fact-checked claims labelled with a veracity label (Augenstein et al., 2019), it would have been preferable to use a dataset that contained more instances, covered more topics and entities, and derived from more sources. In addition, the dataset did contain some instances of low quality, which is presumably a result of the initial scraping process.

One of the major drawbacks of using the GPT-2 model is its novelty and the resulting lack of supporting information specifying best practices for optimizing the model and its expected behavior.

Furthermore, given the size of the model, running GPT-2 requires powerful computer hardware. Therefore, running the model multiple times with different hyperparameters and subsequently evaluating the output to better understand the behavior and influence of the hyperparameters can be a lengthy process. Due to the limited availability of computing power and time constraints, the chosen hyperparameters were those believed to produce the most human-like output for the task at hand. Ideally, multiple tests and evaluations of different hyperparameter combinations would have been conducted to properly assess the optimal combination leading to the best results. In relation to this, it was not possible to produce artificial claims from all the claims included in the natural claims dataset.
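
For reference, the following sketch shows one possible way of generating conditional text with GPT-2 through the Hugging Face transformers interface; the prompt and the sampling hyperparameters (temperature, top_k, top_p) are illustrative and are not the values used to produce the artificial claims in this paper.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The governor claimed that unemployment"  # placeholder prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

outputs = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,                        # sample instead of greedy decoding
    temperature=0.7,                       # lower values give more conservative text
    top_k=40,                              # keep only the 40 most likely next tokens
    top_p=0.9,                             # nucleus sampling threshold
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)

for sequence in outputs:
    print(tokenizer.decode(sequence, skip_special_tokens=True))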

In an ideal scenario, all the claims from the natural claims dataset would be fed to GPT-2 as prompts. This would result in a much larger output that would then be analyzed individually by several people to determine the veracity of the produced claims and their usefulness in the classification task.

As mentioned previously, GPT-2 and other language models have no inherent understanding of what is true or false. At their core, the models are based on probability distributions; they simply output what they consider likely based on their training data. In line with this, the claims that were produced using GPT-2 should preferably have undergone extensive fact-checking by experts with knowledge of the subject matter of the specific claim. While a simple Google search provides a decent indication of whether or not the mentioned entities actually proclaimed what is stated in a claim, a claim can contain many intricacies and details that are necessary for a valid veracity rating and that a simple Google search is not capable of uncovering.

Future Work

Given the limitations that have been presented in this paper, there are several opportunities for future work within the field of fake news detection and production using language models. One seemingly obvious approach would be to employ the large versions of the models presented in this paper as well as testing whether the cased versions would lead to better scores. In addition, future work should experiment with more powerful computer hardware that would allow for an extensive hyperparameter optimization process to investigate whether better results could be achieved with the models presented in this paper. Using identical hyperparameter values for all models is rarely optimal and can lead to sub-optimal results that are misleading with respect to a model's true capabilities on a given task.

One major addition that could potentially change the perspectives and results of this paper would be to utilize models that have been developed following the release of BERT. One such model is XLNet, which was released in June 2019 (Z. Yang et al., 2020). This model combines autoencoding, as used by BERT, with autoregression, which has enabled it to outperform BERT on several benchmark NLP tasks. It is left to future work to determine whether XLNet is useful for the task of detecting fake news.

The release of BERT has led to numerous spin-off models that attempt to optimize the BERT architecture in their own way. While two of these (DistilBERT and RoBERTa) were presented in this paper, many more exist and are currently being developed. It is left to future work to investigate the possibilities of utilizing more of these models in an extensive comparative study.

While the results presented in this paper show signs that modern language models can potentially be used for fake news detection, it would be valuable to investigate how these models would perform on a fake news detection task that utilized a bigger label space. While the research presented in this paper is based on a theoretical foundation that limits fake news detection to a binary classification problem, many news outlets and fact-checking websites utilize a more nuanced approach to labeling the veracity of claims. It would therefore be worthwhile to investigate how the models presented in this paper would perform on a multinomial classification task. Such a task would depend on a dataset of sufficient quality and size that utilized a standardized rating scheme for all claims, something that was not available at the time of writing this paper.

Future research could also investigate possibilities for incorporating meta-data and external textual features into Transformer-based models used for fake news detection.

Future research could also investigate the possibility of determining the degree of veracity of a claim. Instead of classifying a claim as either fake or not, the task could be to provide a numerical score that indicates to the reader the probability of the information being real or false; for example, a claim with a fakeness score of 90% would carry a much stronger indication of being unreliable than a claim with a fakeness score of 60%.
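
A minimal sketch of such a scoring approach, assuming a simple TF-IDF and logistic regression pipeline as a stand-in for any of the classifiers discussed in this paper, could look as follows; the training examples are placeholders and the resulting score is only meant to illustrate the idea of reporting a probability instead of a hard label.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# placeholder training data; 1 denotes the "fake" class in this sketch
train_texts = ["claim known to be true", "claim known to be fake"]
train_labels = [0, 1]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

new_claim = "an unseen claim to be scored"
fakeness = clf.predict_proba([new_claim])[0][1]  # probability of the "fake" class
print(f"Fakeness score: {fakeness:.0%}")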

While the BERT-based models have been shown to outperform existing models on a variety of language understanding tasks, the models inherently suffer from issues related to their size. As the size of the models grows, inference speed decreases and the demand for more powerful hardware increases. While the models are capable of achieving promising results, they are usually slow and require more computing power than most researchers have access to. This limits the possibilities for future research and experiments with these models. Due to this, a production-ready fake news detection system based on these models could face several issues unless the models are optimized for faster inference and for running efficiently on smaller hardware setups. Several researchers have investigated possibilities for reducing model sizes and improving inference speed while maintaining high results (Sajjad, Dalvi, Durrani, & Nakov, 2020; Xin, Tang, Lee, Yu, & Lin, 2020). Future work is needed to determine whether these approaches could make an automatic fake news detection system feasible while maintaining performance.
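
The practical effect of model size on inference speed can be illustrated with a small timing sketch using publicly available checkpoints; the models below are generic, untuned checkpoints (their predictions are meaningless here), so only the relative latency is of interest.

import time
from transformers import pipeline

claims = ["an example claim to classify"] * 100  # placeholder inputs

for model_name in ("bert-base-uncased", "distilbert-base-uncased"):
    # the classification heads are randomly initialized, so predictions are not meaningful;
    # the sketch only measures how long inference takes
    classifier = pipeline("text-classification", model=model_name)
    start = time.perf_counter()
    classifier(claims)
    elapsed = time.perf_counter() - start
    print(f"{model_name}: {elapsed:.2f}s for {len(claims)} claims")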

Although GPT-2 produces some text that makes little sense to a human reader, the model is still capable of producing high-quality, human-like text at a large scale. While this paper has focused on the application of artificial text for augmenting training data in classification tasks, future research should continue investigating the potential threats and risks associated with generating fake text and using it for real purposes (Zellers et al., 2019). This implies establishing strategies and methods for detecting artificially produced text in order to deter potential misuse of the technology by malicious actors.

Reflections

Training, updating, and operationalizing large pre-trained language models like BERT requires substantial computational resources, which is reflected in the financial cost of procuring access to hardware satisfying such demands as well as in the high energy consumption. To exemplify, Strubell et al. (2019) provide an insightful analysis of the costs of deep learning models in NLP. Notably, they find that an improvement of 0.1 on the BLEU benchmark in English-to-German translation, achieved by a NAS model, cost more than USD 150,000 in cloud compute resources, without accounting for the carbon footprint of the energy consumption (2019). In terms of the carbon footprint of state-of-the-art NLP models, training BERT can be compared to a trans-American flight (Strubell et al., 2019).

There is a less obvious consequence of state-of-the-art NLP models increasing in size, with the latest model revealed boasting 17 billion parameters, more than ten times the size of the largest version of GPT-2 (Rosset, 2020). Increasing model sizes lead to an increasing demand for computational resources, which severely limits which researchers get to drive innovation as well as who gets to reap the benefits of it (Strubell et al., 2019). Given that research on state-of-the-art language models is almost exclusively limited to well-funded teams of researchers working for large technology companies, this raises serious questions regarding the future of research in the NLP field.

Throughout this paper, multiple decisions influencing the approach to conducting the experiment were made. Given the novelty of the topic as well as of the models and methods employed, best practices for utilizing the models have yet to be agreed upon, and therefore some of the decisions made had to be based on trial and error. Thus, despite striving for impartiality and employing critical realism, which considers objects to exist independently of the human mind, a certain level of bias within some of the procedures could not be avoided.

The label reduction process represents an example of bias introduced to the research, as this process was performed manually by the authors. More specifically, the process of label reduction consisted of acquiring information about a given non-binary label and subsequently assigning an appropriate binary value to a claim based on the available information and the personal judgment of the responsible person. Modifying the dataset in this manner therefore introduced bias that could not have been avoided, as the process was to a certain extent influenced by the authors' perception of what is fake or true, in alignment with the critical realist view that objects are understood by researchers based on their social conditioning.
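
A minimal sketch of the mechanics of such a mapping, with purely illustrative source labels and binary assignments (the full mapping actually used is documented in Appendix 1), could look as follows.

import pandas as pd

# illustrative mapping; 1 = "true", 0 = "fake" is an arbitrary choice for this sketch
label_map = {
    "true": 1,
    "mostly true": 1,
    "half true": 0,
    "pants on fire!": 0,
}

df = pd.DataFrame({"claim": ["claim a", "claim b"], "label": ["mostly true", "pants on fire!"]})
df["binary_label"] = df["label"].map(label_map)  # labels outside the mapping become NaN
df = df.dropna(subset=["binary_label"])          # and can be dropped or reviewed manually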

Another instance of bias being added to the research occurred when producing text using the GPT-2 model. As mentioned previously, 15,236 artificial claims were produced from 1,089 fake claims and 16,241 artificial claims from 1,243 true claims using GPT-2. Due to a lack of sufficient computing power, producing text from all the claims in the natural claims dataset was not feasible.

However, the approach employed created additional bias as a result of not including the entire natural claims dataset in the text production.

The process of searching for supporting evidence used to establish the veracity of a claim was performed by mimicking human fact-checkers: initial identification of the verifiability of a claim was followed by a search for supporting evidence, a judgement, and a subsequent verdict. This process offered opportunities for additional bias. Firstly, establishing the verifiability of the claims depended on the fact-checker's notion of what could be considered verifiable. Secondly, finding appropriate supporting information from trusted sources might have been influenced by the fact-checker's bias, since the appropriateness of information is a subjective concept. Thirdly, the verdict itself was based on the supporting information, or lack thereof, and the fact-checker's judgement. The fact-checking was conducted according to fact-checking best practices aiming for an objective assessment of the veracity of the claims; nevertheless, due to the aforementioned reasons, a certain level of bias was inevitable.

Apart from GPT-2, few models capable of producing human-like text exist. Grover, a model developed by the Allen Institute for AI, utilizes the same architecture as GPT-2; however, the models differ in output structure as well as input requirements (Zellers et al., 2019). For conditional text generation, GPT-2 requires only an input text that is used as a basis for further text generation, whereas Grover requires metadata such as domain source, date, authors, or headline to produce an article (Zellers et al., 2019). The research group claims that text produced by Grover achieves a high level of trustworthiness and is perceived as more trustworthy than disinformation written by humans (Zellers et al., 2019). Nevertheless, this paper focused on a binary classification task over the text of a claim and a veracity label, and Grover produces full articles with amounts of text that are beyond the scope of this paper. In addition, a pitfall of Grover is its focus on propaganda articles, which constitute just one type of fake news, and Grover relates the text it produces to a context. Due to the requirement of external inputs, it was decided to leave an implementation of Grover to future work.

In addition to the claims and their associated meta data, the MultiFC dataset also contained a selection of snippet files associated with each claim. Each snippet file contained at most the 10 highest-ranking Google search results for the claim at the time of dataset creation. The snippets therefore constitute external features that could potentially have improved the performance of the classification models. While research has shown that including external features can enhance the ability of non-Transformer models to detect fake news (Augenstein et al., 2019; W. Y. Wang, 2017), it was decided not to incorporate the snippets in this project. This decision was made for several reasons. One of these is the inherent limitation of the BERT-based models of a token limit of 512. By incorporating the snippets into the claims, about 50% of the total instances in the dataset would exceed this limit. Apart from a decrease in the total number of data instances, this would have led to results that would be hard to compare directly, as it would be difficult to determine exactly how much information was used from the snippets and the claim of each individual instance. Another major limiting factor associated with using the snippets is the possibility of providing the models with the correct result. Given that the claims in the MultiFC dataset have all been fact-checked by experts, and that these fact-checked verdicts, in addition to the claims in plain text, are hosted on publicly available websites, it seems unlikely that the search snippets would not, for some instances, have included the webpage that fact-checked the claim itself. This could have resulted in instances where a search snippet would reveal the true label of the claim, thereby allowing the model to "see" the correct answer to the classification task, which could lead to biased results and a false impression of the models' accuracy and ability to effectively detect fake news. Due to these limitations, it was decided not to incorporate the search snippets in the data used in this paper. It is left to future work to determine an optimal solution for incorporating external features such as snippets and to conduct experiments on their ability to improve classification scores.
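
As an illustration of the token-limit issue, the following sketch counts the tokens of a claim combined with its snippets using a BERT tokenizer; the example strings are placeholders, and in practice the snippets would come from the MultiFC snippet files.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
MAX_TOKENS = 512  # sequence length limit of the BERT-based models

claim = "an example claim"                                 # placeholder
snippets = ["search snippet one", "search snippet two"]    # placeholders

combined = claim + " " + " ".join(snippets)
n_tokens = len(tokenizer.encode(combined))  # includes the [CLS] and [SEP] tokens
if n_tokens > MAX_TOKENS:
    print(f"Instance exceeds the limit by {n_tokens - MAX_TOKENS} tokens")
else:
    print(f"Instance fits within the limit: {n_tokens} tokens")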

Conclusion

This paper investigated the task of fake news detection using language models. The paper utilized the biggest dataset of naturally occurring expert fact-checked claims from a variety of sources available at the time of writing. Following a label reduction process, the paper applied several different machine learning and BERT-based language models to a binary classification task on a balanced version of the dataset. The paper found that the models were capable of achieving accuracy scores above 0.70, with the RoBERTa model achieving the highest score of 0.7094. In addition, this paper utilized the GPT-2 model to produce artificial text based on the claims from the balanced dataset. The artificial text was added to the existing training data to examine its influence on the performance of the implemented classification models. The paper found that the addition of artificially produced text generally improved the performance of the traditional machine learning models while leading to lower average scores for the BERT-based models.

This paper concludes that a real-life implementation of automatic fake news detection systems based on the models presented here requires further research due to existing limitations such as the limited availability of sufficient-quality training data, insufficient computing power, slow inference speed, and sub-optimal configurations of the implemented Transformer-based models.

Bibliography

Aggarwal, K., & Sadana, A. (2019). NSIT@NLP4IF-2019: Propaganda Detection from News Articles using Transfer Learning. 143–147. https://doi.org/10.18653/v1/d19-5021 Ahmed, H., Traore, I., & Saad, S. (2017). Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques. In Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments (pp. 127–138). https://doi.org/10.1007/978-3-319-69155-8 Akerlof, G. A. (1970). The Market for Lemons: Quality Uncertainty and the Market Mechanism. The Quarterly Journal of Economics. Alammar, J. (2018). The Illustrated Transformer. Retrieved from http://jalammar.github.io/illustrated-transformer/ Alammar, J. (2019). The Illustrated GPT-2 (Visualizing Transformer Language Models). Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives. https://doi.org/10.1257/jep.31.2.211 Alom, M. Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., … Asari, V. K. (2019). A state-of-the-art survey on deep learning theory and architectures. Electronics (Switzerland). https://doi.org/10.3390/electronics8030292 Arasanipalai, A. U. (2019). How to Build OpenAI’s GPT-2: “The AI That Was Too Dangerous to Release.” Arya, M. (2019). A brief history of Chatbots. Aspray, W., & Cortada, J. W. (2019). From Urban Legends to Political Fact-Checking: Online Scrutiny in America, 1990-2015. https://doi.org/10.1007/978-3-030-22952-8 Augenstein, I., Lioma, C., Wang, D., Chaves Lima, L., Hansen, C., Hansen, C., & Simonsen, J. G. (2019). MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. https://doi.org/10.1007/978-3-030-32381-3 Autef, A., Matton, A., & Romain, M. (2019). Fake News detection using Machine Learning on Graphs - Final Report. 1–10. Barnett, M. (2019). Walkthrough: The Transformer Architecture [Part 2/2]. Retrieved March 25, 2020, from https://www.lesswrong.com/posts/McmHduRWJynsjZjx5/walkthrough-the-transformer-architecture-part-2-2 Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media,

90 Inc. Bishop, Christopher, M. (2006). and Machine Learning (M. Jordan, J. Kleinberg, & B. Schölkopf, Eds.). Cambridge: Springer. Boididou, C., Andreadou, K., Papadopoulos, S., Dang-Nguyen, D. T., Boato, G., Riegler, M., … Kompatsiaris, Y. (2016). Verifying Multimedia Use at MediaEval 2016. CEUR Workshop Proceedings, 1739. Bond, C. F., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review. https://doi.org/10.1207/s15327957pspr1003_2 Bouaziz, M., Morchid, M., Dufour, R., Linarès, G., & De Mori, R. (2017). Parallel Long Short- Term Memory for multi-stream classification. 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings, 218–223. https://doi.org/10.1109/SLT.2016.7846268 Brewer, P. R., Young, D. G., & Morreale, M. (2013). The impact of real news about “fake news”: Intertextual processes and political satire. International Journal of Public Opinion Research. https://doi.org/10.1093/ijpor/edt015 Brill, E., Florian, R., Henderson, J., & Mangu, L. (1998). Beyond n-grams: Can linguistic sophistication improve language modeling? Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics. Bucilǎ, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/1150402.1150464 C. Müller, A., & Guido, S. (2016). Introduction to Machine Learning with Python - A Guide for Data Scientists (First Edit; D. Schanafelt, Ed.). https://doi.org/10.1017/CBO9781107415324.004 Cambridge Dictionary, C. (2016). Cambridge Dictionary. Canuma, P. (2019). The brief history of NLP. Retrieved from https://medium.com/datadriveninvestor/the-brief-history-of-nlp-c90f331b6ad7 Chang, M. W., Yih, W. T., & Meek, C. (2008). Partitioned logistic regression for spam filtering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/1401890.1401907 Charlton, E. (2019). Fake news: What it is, and how to spot it | World Economic Forum. Retrieved April 19, 2020, from World Economic Forum website: https://www.weforum.org/agenda/2019/03/fake-news-what-it-is-and-how-to-spot-it/

91 Cortes, C., & Vapnik, V. (1995). Support-Vector Networks. Machine Learning, 20, 273–297. Crotty, M. (1998). The Foundations of Social Reearch Meaning and perspective in the research process. In SAGE Publications. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Retrieved from http://arxiv.org/abs/1810.04805 Dulhanty, C., Deglint, J. L., Daya, I. Ben, & Wong, A. (2019). Taking a Stance on Fake News: Towards Automatic Disinformation Assessment via Deep Bidirectional Transformer Language Models for Stance Detection. 1–8. Retrieved from http://arxiv.org/abs/1911.11951 Eggers, W. D., Malik, N., & Gracie, M. (2019). Using AI to unleash the power of unstructured government data. Retrieved from https://www2.deloitte.com/us/en/insights/focus/cognitive- technologies/natural-language-processing-examples-in-government-data.html Ethayarajh, K. (2019). How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. 55–65. https://doi.org/10.18653/v1/d19- 1006 Fake News Challenge. (2016). Fake News Challenge. Retrieved from http://www.fakenewschallenge.org/#fn:1 Gandhi, P. (2019). Deep Learning Next Step: Transformers and Attention Mechanism. Retrieved from KD Nuggets website: https://www.kdnuggets.com/2019/08/deep-learning-transformers- attention-mechanism.html Ganesan, K. (2019). All you need to know about text preprocessing for NLP and Machine Learning. Retrieved from KDnuggets website: https://www.kdnuggets.com/2019/04/text-preprocessing- nlp-machine-learning.html Genkin, A., Lewis, D. D., & Madigan, D. (2005). Sparse Logistic Regression for Text Categorization. Analysis. https://doi.org/10.1145/1401890.1401936 Géron, A. (2017). Hands On Machine Learning with Scikit-Learn and TensorFlow - Concepts, Tools and Techniques to Build Intelligent Systems (First Edit; N. Tache, Ed.). https://doi.org/10.3389/fninf.2014.00014 Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow - Concepts, Tools and Techniques to Build Intelligent Systems (Second Edi; R. Roumeliotis & N. Tache, Eds.). Sebastopol, CA: O’ReillyMedia, Inc. Goyal, P., Pandey, S., & Jain, K. (2018). Deep Learning for Natural Language Processing:

92 Creating Neural Networks with Python. https://doi.org/10.1016/bs.host.2018.05.001 Grimm, R., Boyon, N., & Newall, M. (2019). Trust in the Media. Ipsos, 1–33. https://doi.org/SAIREC, South Africa International Renewable Energy Conference Hahn, M. (2019). Theoretical Limitations of Self-Attention in Neural Sequence Models. Retrieved from http://arxiv.org/abs/1906.06755 Halderman, L. K., & Chiarello, C. (2005). Framewise phoneme classification with bidirectional {LSTM} and other neural network architectures. Neural Networks, 18, 602–610. Hammond, A. (2019). The most spoken languages. Retrieved from https://blog.esl- languages.com/blog/learn-languages/most-spoken-languages-world/ Han, J., Kamber, M., & Pei, J. (2012). Data mining concepts and techniques (Third Edit). https://doi.org/10.1109/ICMIRA.2013.45 Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning - Data Mining, Inference, and Prediction (Second Edi). https://doi.org/10.1007/b94608_4 Herlau, T., Schmidt, M. N., & Mørup, M. (2018). Introduction to Machine Learning and Data Mining. https://doi.org/10.1007/978-1-4200-1178-4 Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. 1–9. Hiramath, C. K., & Deshpande, G. C. (2019). Fake News Detection Using Deep Learning Techniques. 1st IEEE International Conference on Advances in Information Technology, ICAIT 2019 - Proceedings, 411–415. https://doi.org/10.1109/ICAIT47043.2019.8987258 Hovold, J. (2005). Naive Bayes spam filtering using word-position-based attributes. 2nd Conference on Email and Anti-Spam. Hutchins, J. (2005). The history of machine translation in a nutshell. Retrieved December. Ifrim, G., Bakir, G., & Weikum, G. (2008). Fast logistic regression for text categorization with variable-length N-grams. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/1401890.1401936 Jacobusse, G., & Veenman, C. J. (2016). On Selection Bias with Imbalanced Classes. Lecture Notes in Computer Science, 325–340. https://doi.org/10.1007/978-3-319-46307-0 James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. https://doi.org/10.1007/978-1-4614-7138-7_8 Jawahar, G., Sagot, B., & Seddah, D. (2019). What Does BERT Learn about the Structure of Language? 3651–3657. https://doi.org/10.18653/v1/p19-1356 Jurafsky, D., & Martin, J. H. (2018). Speech and Language Processing - An Introduction to Natural

93 Language Processing, Computational Linguistics, and Speech Recognition (Third Edit). https://doi.org/10.1007/s00134-010-1760-5 Jwa, H., Oh, D., Park, K., Kang, J. M., & Lim, H. (2019). exBAKE: Automatic fake news detection model based on Bidirectional Encoder Representations from Transformers (BERT). Applied Sciences (Switzerland), 9(19), 1–9. https://doi.org/10.3390/app9194062 Käding, C., Rodner, E., Freytag, A., & Denzler, J. (2016). Fine-tuning deep neural networks in continuous learning scenarios. ACCV Workshops. https://doi.org/10.1007/978-3-319-54526- 4_43 Kaggle. (2018). WSDM - Fake News Classification. Retrieved from https://www.kaggle.com/c/fake-news-pair-classification-challenge/ Karita, S., Wang, X., Watanabe, S., Yoshimura, T., Zhang, W., Chen, N., … Yamamoto, R. (2020). A Comparative Study on Transformer vs RNN in Speech Applications. 9(4), 449–456. https://doi.org/10.1109/asru46091.2019.9003750 Klein, D. O., & Wueller, J. R. (2017). Fake news: A legal perspective. Journal Of Internet Law. Komarek, P., & Moore, A. (2005). Making Logistic Regression A Core Data Mining Tool A Practical Investigation of Accuracy , Speed , and Simplicity. Compute. https://doi.org/10.1109/ICDM.2005.90 Kovaleva, O., Romanov, A., Rogers, A., & Rumshisky, A. (2019). Revealing the Dark Secrets of BERT. 4364–4373. https://doi.org/10.18653/v1/d19-1445 Kshetri, N., & Voas, J. (2017). The Economics of “Fake News.” IT Professional. https://doi.org/10.1109/MITP.2017.4241459 Kumar, S., West, R., & Leskovec, J. (2016). Disinformation on the web: Impact, characteristics, and detection of wikipedia hoaxes. 25th International World Wide Web Conference, WWW 2016. https://doi.org/10.1145/2872427.2883085 Larkey, L. S., & Croft, W. B. (1996). Combining classifiers in text categorization. SIGIR Forum (ACM Special Interest Group on Information Retrieval). https://doi.org/10.1145/243199.243276 Lewis, D. D. (1998). Naive(Bayes)at forty: The independence assumption in information retrieval. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/bfb0026666 Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and

94 Development in Information Retrieval, SIGIR 1994. https://doi.org/10.1145/219587.219592 Liu, S., Liu, S., & Ren, L. (2019). Trust or Suspect? An Empirical Ensemble Framework for Fake News Classification. Proceedings of the 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 1–4. Retrieved from http://www.wsdm- conference.org/2019/wsdm-cup-2019.php Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. (1). Retrieved from http://arxiv.org/abs/1907.11692 Lopez Yse, D. (2019). Your Guide to Natural Language Processing (NLP). Lucasschaves. (2019). Multi-FC Competition. Retrieved February 29, 2020, from https://competitions.codalab.org/competitions/21163#learn_the_details Lyons, J. (1991). Natural Language and Universal Grammar. Retrieved from https://archive.org/details/naturallanguageu0000lyon/page/68 Martens, B., Aguiar, L., Gomez-Herrera, E., & Mueller-Langer, F. (2018). The Digital Transformation of News Media and the Rise of Disinformation and Fake News. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3164170 McKane, A. (2014). What Makes News? In News Writing. https://doi.org/10.4135/9781446279977.n1 McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 1697900(Scipy), 51–56. Retrieved from http://conference.scipy.org/proceedings/scipy2010/mckinney.html McNair, B. (2000). Journalism and Democracy. London: Routledge. Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8(4), 283– 298. https://doi.org/10.1016/S0001-2998(78)80014-2 Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. 1–12. Retrieved from http://arxiv.org/abs/1301.3781 Mitra, T., & Gilbert, E. (2015). CREDBANK: A large-scale social media corpus with associated credibility annotations. Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015. Munikar, M., Sushil, S., & Aakash, S. (2019). Fine-grained Sentiment Classification using BERT. 2–5. Murphy, K. P. (2012). Machine Learning A Probabilistic Perspective. https://doi.org/10.1007/978-

94-011-3532-0_2 Mustafaraj, E., & Metaxas, P. T. (2017). The fake news spreading plague: Was it preventable? WebSci 2017 - Proceedings of the 2017 ACM Web Science Conference. https://doi.org/10.1145/3091478.3091523 Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2007). Finding deceptive opinion spam by any stretch of the imagination. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Nayak, P. (2019). Understanding searches better than ever before. Retrieved March 22, 2020, from blog.google.com. Newman, N., Fletcher, R., Kalogeropoulos, A., Levy, D. A. L., & Nielsen, R. K. (2018). The Reuters Institute’s Digital News Report 2017. 1(2), 286–287. Retrieved from https://reutersinstitute.politics.ox.ac.uk/sites/default/files/Digital News Report 2017 web_0.pdf Nordquist, R. (2020). What Is a Natural Language? Openai.com. (2019). GPT-2: 1.5B Release. Retrieved March 20, 2020, from https://openai.com/blog/gpt-2-1-5b-release/ Openai.com. (2020). About. Retrieved March 20, 2020, from https://openai.com/about/ Otter, D. W., Medina, J. R., & Kalita, J. K. (2019). A Survey of the Usages of Deep Learning in Natural Language Processing. XX(X), 1–22. Retrieved from http://arxiv.org/abs/1807.10854 Oxford University Press. (2016). Word of the Year 2016. Retrieved May 10, 2020, from Oxford University Press Academic website: https://global.oup.com/academic/content/word-of-the-year/?cc=dk&lang=en& Ozbay, F. A., & Alatas, B. (2020). Fake news detection within online social media using supervised artificial intelligence algorithms. Physica A: Statistical Mechanics and Its Applications, 540. https://doi.org/10.1016/j.physa.2019.123174 Park, R. E., & Burgess, E. W. (1967). The City. Chicago: The University of Chicago Press. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, È. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. https://doi.org/10.1145/2786984.2786995

96 Peng, F., & Schuurmans, D. (2003). Combining naive Bayes and n-gram language models for text classification. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/3-540- 36618-0_24 Pérez-Rosas, V., Kleinberg, B., Lefevre, A., & Mihalcea, R. (2017). Automatic Detection of Fake News. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep Contextualized Word Representations. 2227–2237. https://doi.org/10.18653/v1/n18-1202 Petrova, O. (2019). Understanding text with BERT. Retrieved March 28, 2020, from Scaleway website: https://blog.scaleway.com/2019/understanding-text-with-bert/ Pettegree, A. (2014). The invention of news: How the world came to know about itself. In The Invention of News: How the World Came to Know about Itself. https://doi.org/10.1080/13688804.2014.951535 Pham, L. (2019). Transferring, Transforming, Ensembling: The Novel Formula of Identifying Fake News. Proceedings of the 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia. https://doi.org/10.1145/nnnnnnn.nnnnnnn Porter, M. F. (1980). An algorithm for suffix stripping. Program. https://doi.org/10.1108/eb046814 Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. Rajapakse, T. (2020). Simpletransformers. Retrieved April 16, 2020, from https://github.com/ThilinaRajapakse/simpletransformers Rashkin, H., Choi, E., Jang, J. Y., Volkova, S., & Choi, Y. (2017). Truth of varying shades: Analyzing language in fake news and political fact-checking. EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings, 2931–2937. https://doi.org/10.18653/v1/d17-1317 Reddy, H., Raj, N., Gala, M., & Basava, A. (2020). Text-mining-based Fake News Detection Using Ensemble Methods. International Journal of Automation and Computing, 17(April), 210–221. https://doi.org/10.1007/s11633-019-1216-5 Rennie, J. D. M., Shih, L., Teevan, J., & Karger, D. (2003). Tackling the Poor Assumptions of Naive Bayes Text Classifiers. Proceedings, Twentieth International Conference on Machine Learning, 2(1973), 616–623. Rodríguez, Á. I., & Lloret Iglesias, L. (2019). Fake News Detection Using Deep Learning.

97 Rosset, C. (2020). Turing-NLG: A 17-billion-parameter language model by Microsoft. Retrieved May 5, 2020, from Microsoft Research Blog website: https://www.microsoft.com/en- us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/ Rubin, V. L., Chen, Y., & Conroy, N. J. (2015). Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology. https://doi.org/10.1002/pra2.2015.145052010083 Sajjad, H., Dalvi, F., Durrani, N., & Nakov, P. (2020). Poor Man’s BERT: Smaller and Faster Transformer Models. Retrieved from http://arxiv.org/abs/2004.03844 Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 2–6. Retrieved from http://arxiv.org/abs/1910.01108 Saunders, M., Lewis, P., & Thornhill, A. (2009). Research methods for business students. Fitfth Edition. In Pearson Education, UK. Schneider, K.-M. (2003). A comparison of event models for Naive Bayes anti-spam e-mail filtering. https://doi.org/10.3115/1067807.1067848 Schramm, W. (1949). The Nature of News. Journalism Quarterly. https://doi.org/10.1177/107769904902600301 Scott, M. (2018). Cambridge Analytica helped ‘cheat’ Brexit vote and US election, claims whistleblower. Retrieved April 19, 2020, from https://www.politico.eu/article/cambridge- analytica-chris-wylie-brexit-trump-britain-data-protection-privacy-facebook/ Shearer, E., & Gottfried, J. (2018). News Use Across Social Media Platforms 2018. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2018). FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media. Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake News Detection on Social Media. ACM SIGKDD Explorations Newsletter. https://doi.org/10.1145/3137597.3137600 Singhal, S., Shah, R. R., Chakraborty, T., Kumaraguru, P., & Satoh, S. (2019). SpotFake: A multi- modal framework for fake news detection. Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019, 39–47. https://doi.org/10.1109/BigMM.2019.00-44 Slovikovskaya, V. (2019). Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task. 1, 1–12. Retrieved from http://arxiv.org/abs/1910.14353 Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., … Wang, J. (2019).

98 Release strategies and the social impacts of language models. Song, F., Liu, S., & Yang, J. (2005). A comparative study on text representation schemes in text categorization. Pattern Analysis and Applications. https://doi.org/10.1007/s10044-005-0256-3 Staudemeyer, R. C., & Morris, E. R. (2019). Understanding LSTM -- a tutorial into Long Short- Term Memory Recurrent Neural Networks. (September). Retrieved from http://arxiv.org/abs/1909.09586 Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. https://doi.org/10.18653/v1/p19-1355 Sun, C., Qiu, X., Xu, Y., & Huang, X. (2019). How to Fine-Tune BERT for Text Classification? Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11856 LNAI(2), 194–206. https://doi.org/10.1007/978-3-030-32381-3_16 Sun, Y., Rao, N., & Ding, W. (2017). A Simple Approach to Learn Polysemous Word Embeddings. Retrieved from http://arxiv.org/abs/1707.01793 Taheri, S., & Mammadov, M. (2013). Learning the naive bayes classifier with optimization models. International Journal of Applied Mathematics and Computer Science, 23(4), 787–795. https://doi.org/10.2478/amcs-2013-0059 Tambini, D. (2017). Fake News: Public Policy Responses. LSE Media Policy Project. Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston. MA: Pearson Education, Inc. Tandoc, E. C., Lim, Z. W., & Ling, R. (2018). Defining “Fake News”: A typology of scholarly definitions. Digital Journalism, Vol. 6, pp. 137–153. https://doi.org/10.1080/21670811.2017.1360143 Taweh, B. I. (2018). Applied Natural Language Processing with Python - Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. San Francisco, California, USA: Apress. The Computational Propaganda Project. (2016). Resource for Understanding Political Bots. Retrieved March 31, 2020, from https://comprop.oii.ox.ac.uk/research/public- scholarship/resource-for-understanding-political-bots/ Tuchman, G. (1978). Making news: a study in the construction of reality. New York: The Free Press. Turian, J., Ratinov, L., & Bengio, Y. (2010). Word representations: A simple and general method

99 for semi-. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 384–394. https://doi.org/10.1111/j.2042-7158.1978.tb10760.x Undeutsch, U. (1967). Beurteilung der Glaubhaftigkeit von Aussagen. In Handbuch der Psychologie Bd. 11: Forensische Psychologie. Uysal, A. K., & Gunal, S. (2014). The impact of preprocessing on text classification. Information Processing and Management. https://doi.org/10.1016/j.ipm.2013.08.006 Vanderplas, J. (2017). Python Data Science Handbook (First Edit; D. Schanafelt, Ed.). Sebastopol, CA: O’ReillyMedia, Inc. Vasu, N., Ang, B., Jayakumar, S., Faizal, M., & Ahuja, J. (2018). Fake News : National Security in the Post-Truth Era. S. Rajaratnam School of International Studies, (January), 3–32. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017- Decem(Nips), 5999–6009. Vlachos, A., & Riedel, S. (2015). Fact Checking: Task definition and dataset construction. https://doi.org/10.3115/v1/w14-2508 Vo, N., & Lee, K. (2018). The rise of guardians: Fact-checking URL recommendation to combat fake news. 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018. https://doi.org/10.1145/3209978.3210037 Wang, W. Y. (2017). “Liar, liar pants on fire”: A new benchmark dataset for fake news detection. ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). https://doi.org/10.18653/v1/P17-2067 Wang, Z., Bi, W., Wang, Y., & Liu, X. (2019). Better Fine-Tuning via Instance Weighting for Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 33(July), 7241– 7248. https://doi.org/10.1609/aaai.v33i01.33017241 Wardle, C. (2017). Fake news. It’s complicated. First Draft. Waweru Muigai, J. W. (2019). Understanding Fake News. International Journal of Scientific and Research Publications (IJSRP), 9(1), p8505. https://doi.org/10.29322/ijsrp.9.01.2019.p8505 Weiss, K., Khoshgoftaar, T. M., & Wang, D. D. (2016). A survey of transfer learning. In Journal of Big Data (Vol. 3). https://doi.org/10.1186/s40537-016-0043-6 Westerlund, M. (2019). The Emergence of Technology: A Review. Technology Innovation Management Review, 9(11), 39–52. https://doi.org/10.22215/timreview/1282 Wineburg, S., McGrew, S., Breakstone, J., & Ortega, T. (2016). Evaluating information: The

100 cornerstone of civic online reasoning. Stanford Digital Repository. World Economic Forum. (2013). Digital Wildfires in a Hyperconnected World. Global Risks. https://doi.org/http://reports.weforum.org/global-risks-2013/risk-case-1/digital-wildfires-in-a- hyperconnected-world/ Xin, J., Tang, R., Lee, J., Yu, Y., & Lin, J. (2020). DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. Retrieved from http://arxiv.org/abs/2004.12993 Yang, T., & Li, Y. (2018). Word Embedding for Understanding Natural Language: A Survey. In Guide to Big Data Applications (Vol. 26, pp. 83–101). https://doi.org/10.1007/978-3-319- 53817-4 Yang, X., Yang, L., Bi, R., & Lin, H. (2019). A Comprehensive Verification of Transformer in Text Classification. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11856 LNAI, 207–218. https://doi.org/10.1007/978-3-030-32381-3_17 Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2020). XLNet: Generalized Autoregressive Pretraining for Language Understanding. (NeurIPS), 1–18. Retrieved from http://arxiv.org/abs/1906.08237 Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., & Choi, Y. (2019). Defending Against Neural Fake News. 1–21. Zhang, C., & Abdul-Mageed, M. (2019). Multi-task bidirectional transformer representations for irony detection. CEUR Workshop Proceedings, 2517, 391–400. Zhang, H. (2004). The optimality of Naive Bayes. Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2004, 2, 562–567. Zhang, J., & Yang, Y. (2003). Robustness of Regularized Linear Classification Methods in Text Categorization. SIGIR Forum (ACM Special Interest Group on Information Retrieval). https://doi.org/10.1145/860468.860471 Zhang, T., & Oles, F. J. (2001). Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval. https://doi.org/10.1023/A:1011441423217 Zhao, R., & Mao, K. (2017). Fuzzy Bag-of-Words Model for Document Representation. IEEE Transactions on Fuzzy Systems. https://doi.org/10.1109/TFUZZ.2017.2690222

Appendix

Appendix 1 – Label Reduction of the MultiFC Dataset

This appendix provides an overview of the full label reduction process of the MultiFC dataset. It is provided as an Excel sheet. The sheet contains the following:

- The original labels as found in the MultiFC dataset.
- The reclassified label.
- The source website for each label.
- The count of websites using the label.
- A link to the website’s rating scheme (if available).
- The justification for re-labeling.

Appendix 1 can be found in the attachments folder with filename: “1 - MultiFC Label Reduction Process.xlsx”.

Appendix 2 – Optimal GridSearch Hyperparameters for Baseline Models

This appendix provides an overview of the highest-scoring hyperparameters from the GridSearch process using the baseline models. The appendix is provided as an Excel sheet.

Appendix 2 can be found in the attachments folder with filename: “2 - Optimal GridSearch Hyperparameters.xlsx”.

Appendix 3 – Manually Verified and Labeled Outputs From GPT-2

This appendix provides an overview of the 690 claims produced using GPT-2 that were manually verified and fact-checked for a veracity label.

The appendix is provided as an Excel sheet. The sheet contains the following:

- The claim produced by GPT-2.
- A label denoting whether or not the claim could be verified.
- The veracity label given to the claim.
- Sources providing additional information backing up the label decision.
- An optional comment stating the reasoning for the label.
- The prompt from the MultiFC dataset that was used to produce the claim.

Appendix 3 can be found in the attachments folder with filename: “3 - Manually Verified and Fact-Checked GPT-2 Claims.xlsx”.
