Identification of Machine-Generated Reviews: 1D CNN Applied on the GPT-2 Neural Language Model

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

STAFFAN AL-KADHIMI
PAUL LÖWENSTRÖM

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Degree Project in Computer Science, DD142X
Date: June 8, 2020
Supervisor: Christopher Peters
Examiner: Pawel Herman
Swedish title: Identifiering av maskingenererade recensioner: 1D CNN applicerat på den neurala språkmodellen GPT-2

Abstract

With recent advances in machine learning, computers are able to create more convincing text, raising concern about an increase in fake information on the internet. At the same time, researchers are creating tools for detecting computer-generated text. Researchers have been able to exploit flaws in neural language models and use them against themselves; for example, GLTR provides human users with a visual representation of texts that assists in classification as human-written or machine-generated. By training a convolutional neural network (CNN) on GLTR output data from analysis of machine-generated and human-written movie reviews, we are able to take GLTR a step further and use it to automatically perform this classification. However, using a CNN with GLTR as the main source of data for classification does not appear to be enough to be on par with the best existing approaches.

Sammanfattning (Swedish summary)

With the recent advances in machine learning, computers can create more and more convincing text, which raises concern about an increase in false information on the internet. At the same time, this is counterbalanced by researchers creating tools for identifying computer-generated text. Researchers have been able to exploit weaknesses in neural language models and use these against them. For example, GLTR provides users with a visual representation of texts, as an aid for classifying them as human-written or machine-generated. By training a convolutional neural network (CNN) on output from GLTR analysis of machine-generated and human-written movie reviews, we take GLTR one step further and use it to perform the classification automatically. However, using a CNN with GLTR as the main data source does not appear to be sufficient to classify at a level comparable to the best existing methods.

Contents

1 Introduction
  1.1 Research Question
  1.2 Scope
  1.3 Approach
  1.4 Thesis Outline
2 Background
  2.1 Neural language models
    2.1.1 Transfer learning
    2.1.2 GPT-2
  2.2 Detecting machine-generated text
    2.2.1 GAN-based approaches
    2.2.2 Neural LMs as weapons against themselves
  2.3 Text classification using CNNs
3 Method
  3.1 Text preparation stage
  3.2 GLTR stage
  3.3 CNN training stage
  3.4 CNN testing stage
4 Results
  4.1 Overall performance
  4.2 Impact of text length
5 Discussion
  5.1 Limitations
  5.2 Future Work
6 Conclusions
Bibliography
A Structure Diagram
B Review Samples

Chapter 1 Introduction

“OpenAI has published the text-generating AI it said was too dangerous to share” (The Verge, November 2019) [1]

Never before has it been easier to publish information, and now with the recent improvements in machine learning, computers can create text that is hard to differentiate from human writing. Simple methods for machine generation of text have existed for a relatively long time, but in recent years it has been possible to utilize deep learning to achieve better results (Gatt and Krahmer 2017).
In 2019, the research organization OpenAI released GPT-2, one of the latest additions to the family of text-generating neural language models. It has been recognized for its capability to generate false information, including fake news [1] (Radford et al. 2019). This has raised concerns about the future and the dangers of text-generating artificial intelligence (AI) [1] (Radford et al. 2019).

As technology continues to improve, we may need ways to distinguish real from fake automatically, considering the potentially massive amount of false information that could be published on different media. Detection of automatically generated text is being actively researched. For example, GLTR is a tool that has been recognized for how it can help people detect machine-generated text by visualizing important data given by GPT-2 [2].

Today people choose movies, restaurants, airlines, hair salons, and more based on online reviews that they read. The integrity of reviews could be challenged if computers start to fill the internet with computer-generated reviews. This could in turn deceive people into buying a product or service, or more seriously, cripple companies and potential adversaries. The number of fake reviews on the web today is likely large; Akoglu, Chandy, and Faloutsos (2013) estimate that around 20% of the reviews on Yelp are faked by paid human writers. Automatically generating convincing reviews is also already possible, as evidenced by Adelani et al. (2019). In a capitalistic society where everyone wants to outperform their opponents, it may only be a matter of time before computer-generated reviews become a common weapon in the company arsenal.

[1] Vincent, James (2019). OpenAI has published the text-generating AI it said was too dangerous to share. url: https://www.theverge.com/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters (visited 2020-02-15)
[2] Quach, Katyanna (2019). Remember the OpenAI text spewer that was too dangerous to release? Fear not, boffins have built a BS detector for it. url: https://www.theregister.co.uk/2019/03/11/openai_gltr_ai/ (visited 2020-05-13)

1.1 Research Question

The purpose of our work is to investigate a possible extension of GLTR that could be used as part of spam filters or similar utilities. In this thesis, we will therefore study:

“Is it possible to automatically classify reviews as human-written or machine-generated, using the text body mapped to GLTR values as input?”

Under the assumption that the answer to the aforementioned research question is “yes”, the following will be studied as subquestions:

• “How well would this adaptation work for detecting reviews written by models other than the one GLTR was trained on?”
• “How does the length of a given review influence the classification performance?”
• “Would this adaptation be on par with other automatic detection methods?”

1.2 Scope

There are many ways to generate text, so we chose to limit our study to the GPT-2 language model, and further to its smaller versions (124M and 355M) due to limited available resources. There are also many types of reviews; by limiting ourselves to a single topic we can reduce the amount of variation in our text, and thus both reduce the complexity of our project and make our tool more useful in practice for entities such as websites. In our case, we focus on movie reviews from the media website and database IMDB, which makes our work potentially useful for identifying computer-generated reviews posted there.

1.3 Approach

In short, we utilize GPT-2 to find out how likely it is to write text similar to the text being analyzed. A similar project, GLTR, has previously analyzed text with this approach with promising results (Gehrmann, Strobelt, and Rush 2019), and we integrate it as the core of our framework.
However, unlike GLTR, which is a tool for making it easier for humans to identify machine-generated text (Gehrmann, Strobelt, and Rush 2019), we use a convolutional neural network to automatically classify the text based on the data extracted from the analysis.

1.4 Thesis Outline

The following chapter explores the current state of fake text/review detection, in particular related work that has emerged following the development of GPT-2 and similar research. It also introduces the building blocks of the project: neural language models, GPT-2, GLTR, and convolutional neural networks. This is followed by Chapter 3, where we describe our approach in detail and go through the structure of the neural network we create. In Chapter 4, we present the results and compare the models we create. Chapter 5 is then dedicated to discussion of the results as well as limitations and future work, and finally, Chapter 6 holds the conclusion, where we summarize our work and discuss it in a broader context.

Chapter 2 Background

2.1 Neural language models

A language model (LM) essentially describes the probability of each possible token (e.g., a word) appearing as the next one given an input text (Mikolov et al. 2010). Traditional models are based on n-grams and look at the n previous words to decide what word should follow (Arisoy et al. 2012). More recently, neural networks have come into the spotlight for language modeling because of how they handle data sparseness: by embedding words in a continuous space, they avoid the problem of small changes in probabilities having a large impact (Arisoy et al. 2012). In particular, recurrent neural networks (RNNs) with long short-term memory (LSTM) have been able to build on previous approaches by enhancing the network’s memory capability (Józefowicz et al. 2016). Vaswani et al. (2017) introduced transformer architectures, which have the potential to take the place of RNNs.
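
To make the notion of a language model concrete, the following is a minimal sketch of a count-based bigram model in Python. It is not taken from the thesis: the toy corpus and the function names `train_bigram_model`, `next_token_probabilities`, and `token_ranks` are hypothetical and chosen only for illustration. The last function computes, for each token in a text, its rank among the model's predictions for that position. This is meant as a toy analogue of the kind of per-token signal a tool like GLTR derives from GPT-2, under the assumption that rank is one such signal; it says nothing about the exact features used in this work.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for previous, current in zip(tokens, tokens[1:]):
        counts[previous][current] += 1
    return counts


def next_token_probabilities(counts, previous):
    """Estimate P(next token | previous token) by relative frequency."""
    followers = counts.get(previous, Counter())
    total = sum(followers.values())
    if total == 0:
        # Context never seen in training: the data sparseness problem.
        return {}
    return {token: n / total for token, n in followers.items()}


def token_ranks(counts, tokens):
    """For each token after the first, return its 1-based rank among the
    model's predictions for that position, or None if it was never
    observed after the preceding token (illustrative assumption only)."""
    ranks = []
    for previous, current in zip(tokens, tokens[1:]):
        followers = counts.get(previous, Counter())
        ordered = [token for token, _ in followers.most_common()]
        ranks.append(ordered.index(current) + 1 if current in ordered else None)
    return ranks


# Hypothetical toy corpus, already split into tokens.
corpus = "the movie was great . the movie was long . the plot was great .".split()
model = train_bigram_model(corpus)

probs = next_token_probabilities(model, "was")
ranks = token_ranks(model, "the movie was long".split())
print(probs)
print(ranks)
```

In this sketch, a context that never occurred in the corpus yields an empty distribution, which is the sparseness issue that continuous word embeddings in neural models are meant to mitigate.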