Effects of Pre-Trained Word Embeddings on Text-Based Deception Detection
2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress
DOI 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00083

David Nam, Jerin Yasmin, Farhana Zulkernine
School of Computing, Queen's University, Kingston, Canada
Email: {david.nam, jerin.yasmin, farhana.zulkernine}@queensu.ca

Abstract—With e-commerce transforming the way in which individuals and businesses conduct trades, online reviews have become a great source of information among consumers. With 93% of shoppers relying on online reviews to make their purchasing decisions, the credibility of reviews should be strongly considered. While detecting deceptive text has proven to be a challenge for humans, it has been shown that machines can be better at distinguishing between truthful and deceptive online information by applying pattern analysis to a large amount of data. In this work, we look at the use of several popular pre-trained word embeddings (Word2Vec, GloVe, fastText) with deep neural network models (CNN, BiLSTM, CNN-BiLSTM) to determine the influence of word embeddings on the accuracy of detecting deception. Some pre-trained word embeddings have been shown to adversely affect classification accuracy when compared to training the embedding on the domain-specific data. Through the combination of CNN and BiLSTM along with the fastText pre-trained word embeddings, we achieved an accuracy of 88.8 percent on the hotel review dataset published by Ott et al. in 2011.

Keywords—Artificial Neural Network, Natural Language Processing, Deception Detection, Online Reviews, Deep Learning, Convolutional Neural Network, Long Short Term Memory, Word Embeddings

I. INTRODUCTION

With increasing access to the internet and the popularity of e-commerce, online shopping has rapidly become a common practice for consumers who wish to avoid travel time and cost and to have online services and goods delivered at home. For the year 2020 alone, it has been estimated that there will be 2.05 billion digital buyers, which equates to about a quarter of the global population [1]. Furthermore, it has been found that 85% of those consumers conduct their own research online before making a purchase [1]. While the internet provides various materials for research, online reviews have become an important component in forming a basis for purchasing decisions.

The advantage of online reviews is that they make product information and experiences readily available to consumers [1]. They have also been a great way for buyers to share their own experiences and opinions about a product, as the reviews give others a glimpse of what they can expect. While new customers look through this information to gain insights into their potential purchases, they must be wary of any deception that the reviews might present. Deception can be defined as a dishonest or illegal method in which misleading information is intentionally conveyed to others for a specific gain [2].

Companies or certain individuals may attempt to deceive consumers for various reasons in order to influence their decisions about purchasing a product or a service. While the motives for deception may be hard to determine, it is clear why businesses may have a vested interest in producing deceptive reviews about their products. In 2019, the profit generated by e-commerce was determined to be 3.5 trillion US dollars and was projected to climb to 4.2 trillion US dollars in the following year [3]. Moreover, a study found that 93% of consumers rely on online reviews to make their purchasing decisions [4]. With the online market being a great opportunity for businesses to expand, it is apparent that they look to promote or maintain a positive image of their products and brand. For consumers, this poses a challenge, as they must constantly ask themselves whether the reviews are reliable. Unfortunately, research by Bond et al. found that humans' deception detection skills without any aid were no better than a flip of a coin [5].

Thus, this paper looks into the use of Artificial Neural Network (ANN) classifiers and Natural Language Processing (NLP) techniques to detect deception that may be present within product reviews. Although the use of ANNs to detect deception is not new, our work focuses on studying some of the popular pre-trained word embedding methods with some of the state-of-the-art deep learning models to derive a solution to the problem of deception detection for hotel review data. Pre-trained word embeddings are simply vector representations of words generated by a computational model using a word-to-vector conversion algorithm based on a specific data corpus. Of the word embedding methods available, we specifically looked into Word2vec, GloVe, and fastText, in combination with classifiers, to conduct a comparative analysis of which pre-trained word embedding method positively affects the accuracy of the classifiers. To conduct our experiment, we used the dataset published by Ott et al. in 2011, which consists of various hotel reviews [6][7]. Using this dataset, we trained and tested different combinations of ANN classifiers and pre-trained word embeddings to determine which model gave the highest accuracy in detecting deception.
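The word-to-vector idea described above can be illustrated with a minimal sketch. The toy 4-dimensional vectors below are hypothetical stand-ins for real pre-trained embeddings such as Word2Vec, GloVe, or fastText (which are typically 100- to 300-dimensional and learned from large corpora); the point is only that semantically related words end up with similar vectors, as measured here by cosine similarity:

```python
import numpy as np

# Hypothetical toy vectors standing in for pre-trained word embeddings.
embeddings = {
    "hotel":  np.array([0.9, 0.1, 0.3, 0.0]),
    "motel":  np.array([0.8, 0.2, 0.4, 0.1]),
    "banana": np.array([0.0, 0.9, 0.0, 0.8]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words should score higher than unrelated ones.
sim_related = cosine_similarity(embeddings["hotel"], embeddings["motel"])
sim_unrelated = cosine_similarity(embeddings["hotel"], embeddings["banana"])
assert sim_related > sim_unrelated
```

In practice a classifier does not consume these vectors one word at a time; each review is mapped to a sequence of such vectors, which becomes the input to the neural network.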
The rest of the paper is organized as follows. Section 2 describes background concepts and related work. Experimentation with a selected set of neural network classifiers and word embedding methods is described in Section 3. Section 4 presents and discusses the results. Section 5 presents concluding remarks on the research, including possible future work directions.

II. BACKGROUND AND RELATED WORK

In this study, we apply techniques that are common for analyzing natural language. We use word embedding methods to model the texts, as well as classification models to predict the class labels from the input values. Therefore, we describe some of these concepts in this section, followed by the related work.

A. Word Embeddings

Word embeddings are a learned representation for text such that words with the same meaning have similar representations [8]. The use of word embeddings allows for increased performance in models that tackle NLP tasks. Recent deep learning word embedding models can learn the contexts of words from a large corpus of unstructured natural language text and transform words into numeric vector representations to be used in a variety of machine learning applications [9]. There are two options for using word embeddings: the first is to implement an embedding layer and train it with the problem-specific text data; the other is to use pre-trained word embeddings that others have already trained.

B. Convolutional Neural Network (CNN)

The Convolutional Neural Network is a class of deep neural networks that has recently gained popularity for its ability to learn the inherent characteristics of a given dataset [10]. CNNs consist of an input layer, a number of hidden layers, and an output layer. The hidden layers include convolutional layers that apply a linear mathematical transformation to extract key features from the data [11]. Despite its common use in image processing [12], the CNN has also proven effective for NLP [13].

C. Bidirectional Long Short Term Memory (BiLSTM)

The Recurrent Neural Network (RNN) is another class of deep neural networks known for its use in NLP; it can handle input sequences of any length and capture long-term dependencies. Among the variants of the RNN, Long Short Term Memory (LSTM) is explicitly designed to address long-term dependency problems by storing information for long periods of time. An LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. Each cell remembers values over arbitrary time intervals, while the three gates regulate the flow of information into and out of the cell by differentiating between important and unimportant information. An added benefit of the LSTM is that it eliminates the vanishing gradient problem that is prevalent in the standard RNN [14]. However, a limitation is that at any particular node, the LSTM has access only to past information, which means that the output can only be generated based on what the network has already seen. The Bidirectional LSTM addresses this issue by propagating input data both forward and backward in time during training. This provides additional context to the network and results in a better understanding of the input.

Despite the RNN being established as an excellent network for sequential modeling, we looked to take advantage of the benefits that the Bidirectional LSTM has to offer. By using BiLSTM in text classification to detect deception, the model can "look forward" in the sentence to see whether "future" tokens influence the current decision while also retaining information from the past.

D. Related Work

Our work proposes CNN, BiLSTM, and CNN-BiLSTM models with different types of embeddings for deception detection. As such, we briefly discuss some related work on CNN-LSTM models in text classification and on different neural network approaches proposed for deception detection.

CNN-LSTM: Researchers have combined the CNN with the LSTM as an essential direction of exploration. We discuss some of the combined approaches used in text classification. Zhou et al. [15] proposed a novel model called C-LSTM combining the advantages of CNN and LSTM. In their proposed architecture, a CNN was used to extract a sequence of higher-level phrase representations, which was fed into a long short-term memory (LSTM) network to obtain
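The CNN-into-recurrent pipeline that C-LSTM exemplifies can be sketched in a few lines of NumPy. This is an illustrative toy with random weights and hypothetical dimensions, not the authors' or Zhou et al.'s implementation: a 1D convolution slides over the embedded word sequence to produce higher-level phrase features, and the resulting feature sequence is what an LSTM (or BiLSTM) layer would then consume:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, embed_dim = 10, 8      # hypothetical sizes: 10 words, 8-dim embeddings
kernel_size, n_filters = 3, 4   # each filter spans a 3-word window

sentence = rng.normal(size=(seq_len, embed_dim))           # embedded review
filters = rng.normal(size=(n_filters, kernel_size, embed_dim))

def conv1d(x, w):
    """Valid 1D convolution over the word axis followed by a ReLU.

    Each output step is a phrase-level feature vector summarizing a
    kernel_size-word window -- the sequence a recurrent layer would consume.
    """
    out_len = x.shape[0] - w.shape[1] + 1
    out = np.empty((out_len, w.shape[0]))
    for t in range(out_len):
        window = x[t : t + w.shape[1]]                     # (kernel_size, embed_dim)
        out[t] = np.tensordot(w, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)                            # ReLU nonlinearity

phrase_features = conv1d(sentence, filters)
print(phrase_features.shape)  # (8, 4): 8 phrase positions, 4 filters each
```

A CNN-BiLSTM classifier of the kind compared in this work would run such a feature sequence through a bidirectional recurrent layer and a final dense layer to produce the truthful/deceptive decision.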