2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress

Effects of Pre-trained Word Embeddings on Text-based Deception Detection

David Nam, Jerin Yasmin, Farhana Zulkernine
School of Computing, Queen's University, Kingston, Canada
Email: {david.nam, jerin.yasmin, farhana.zulkernine}@queensu.ca

Abstract—With e-commerce transforming the way in which individuals and businesses conduct trade, online reviews have become a great source of information for consumers. With 93% of shoppers relying on online reviews to make their purchasing decisions, the credibility of reviews should be strongly considered. While deceptive text has proven to be a challenge for humans to detect, it has been shown that machines can be better at distinguishing between truthful and deceptive online information by applying pattern analysis to a large amount of data. In this work, we look at the use of several popular pre-trained word embeddings (Word2Vec, GloVe, fastText) with deep neural network models (CNN, BiLSTM, CNN-BiLSTM) to determine the influence of word embeddings on the accuracy of detecting deception. Some pre-trained word embeddings have been shown to adversely affect the classification accuracy when compared to training the text embedding on the domain-specific data. Through the combination of CNN and BiLSTM along with the fastText pre-trained word embeddings, we were able to achieve an accuracy of 88.8 percent on the hotel review dataset published by Ott et al. in 2011.

Keywords-Artificial Neural Network, Natural Language Processing, Deception Detection, Online Reviews, Deep Learning, Convolutional Neural Network, Long Short Term Memory, Word Embeddings

I. INTRODUCTION

With the increasing access to the internet and the popularity of e-commerce, online shopping has rapidly become a common practice for consumers to avoid travel time and cost, and to acquire online services and goods delivered at home. For the year 2020 alone, it has been estimated that there will be 2.05 billion digital buyers, which equates to about a quarter of the global population [1]. Furthermore, it has been found that 85% of those consumers conduct their own research online before making a purchase [1]. While the internet provides various materials for research, online reviews have become an important component in forming a basis for purchasing decisions.

The advantage of online reviews is that they make product information and experiences readily available to consumers [1]. They have also been a great way for buyers to share their own experiences and opinions about a product, as the reviews give others a glimpse of what they can expect. While new customers look through this information to gain insights into their potential purchases, they must be wary of any deception that the reviews might present. Deception can be defined as a dishonest or illegal method in which misleading information is intentionally conveyed to others for a specific gain [2].

Companies or certain individuals may attempt to deceive consumers for various reasons to influence their decisions about purchasing a product or a service. While the motives for deception may be hard to determine, it is clear why businesses may have a vested interest in producing deceptive reviews about their products. In 2019, worldwide e-commerce sales were determined to be 3.5 trillion US dollars and were projected to climb to 4.2 trillion US dollars in the following year [3]. Moreover, a study found that 93% of consumers rely on online reviews to make their purchasing decisions [4]. With the online market being a great opportunity for businesses to expand, it is apparent that they look to promote or maintain a positive image of their products and brand. For consumers, this may pose a challenge, as they must constantly ask themselves whether the reviews are reliable or not. Unfortunately, research by Bond et al. found that humans' deception detection skills without any aid were no better than a flip of a coin [5].

Thus this paper looks into the use of Artificial Neural Network (ANN) classifiers and Natural Language Processing (NLP) techniques to detect deception that may be present within product reviews. Although the use of ANNs to detect deception is not new, our work focuses on studying some of the popular pre-trained word embedding methods with some of the state-of-the-art deep learning models to derive a solution to the problem of deception detection for hotel review data. Pre-trained word embeddings are simply vector representations of words generated by a computational model using a word-to-vector conversion algorithm based on a specific data corpus. Of the word embedding methods that are available, we specifically looked into Word2Vec, GloVe, and fastText, in combination with classifiers, to conduct a comparative analysis on which pre-trained word embedding method positively affects the accuracy of the classifiers. To conduct our experiment, we used the dataset published by Ott et al. in 2011, which consists of various hotel reviews [6][7]. With the use of this dataset, we trained and tested different combinations of ANN classifiers and pre-trained word embeddings to determine which model gave the highest accuracy in detecting deception.

The rest of the paper is organized as follows. Section 2 describes background concepts and related work. Experimentation with a selected set of neural network classifiers and word embedding methods is described in Section 3. Section 4 presents and discusses the results. Section 5 presents concluding remarks on the research, including possible future work directions.

II. BACKGROUND AND RELATED WORK

In this study, we apply techniques that are common for analyzing natural language. We use word embedding methods to model the texts, as well as classification models to predict the class labels from the input values. Therefore, we describe some of these concepts in this section, followed by the related work.

A. Word Embeddings

Word embeddings are a learned representation for text such that words that have the same meaning are similarly represented [8]. The use of word embeddings allows for increased performance in models that tackle NLP tasks. Recent deep learning word embedding models can learn contexts of words from a large corpus of unstructured natural language text data and formulate an algorithm to transform words into numeric vector representations to be used in a variety of machine learning applications [9]. There are two options for using word embeddings: the first is to implement an embedding layer and train it with the problem-specific text data; the other is to use pre-trained word embeddings that others have already trained.

B. Convolutional Neural Network (CNN)

Convolutional Neural Network is a class of deep neural networks that has recently gained popularity for its ability to learn the inherent characteristics of a given data set [10]. CNNs consist of an input layer, a number of hidden layers, and an output layer. The hidden layers consist of convolutional layers that employ a linear mathematical transformation to extract key features from the data [11]. Despite its common use in image processing [12], it has also proven to be effective for NLP [13].

The use of CNN to detect deception is not a new concept, as it has been explored by Zhao et al. [21]. In that article, the authors chose to use a CNN due to the variability of the short text that is present within online opinions. Furthermore, its relatively quick training speed in comparison to Recurrent Neural Networks (RNN) was another rationale stated for its use. For similar reasons, we chose to implement a CNN in our study.

C. Bidirectional Long Short Term Memory (BiLSTM)

Recurrent Neural Network (RNN) is another class of deep neural networks known for its use in NLP; it is able to handle input sequences of any length and capture long-term dependencies. Among the variants of RNN, Long Short Term Memory (LSTM) is explicitly designed to address long-term dependency problems by storing information for long periods of time. An LSTM unit consists of a cell, an input gate, an output gate and a forget gate. Each cell remembers values over arbitrary time intervals while the three gates regulate the flow of information into and out of the cell by differentiating between important and unimportant information. An added benefit of LSTM is that it eliminates the problem of vanishing gradients that is prevalent within the standard RNN [14]. However, a limitation is that at any particular node it has access only to past information, which means that the output can only be generated based on what the network has already seen. Bidirectional LSTM addresses that issue as it propagates input data forward as well as backward in time during training. This provides additional context to the network and results in a better understanding of the input.

Despite RNN being established as an excellent network for sequential modeling, we looked to take advantage of the benefits that Bidirectional LSTM has to offer. By using BiLSTM in text classification to detect deception, the network can "look forward" in the sentence to see whether "future" tokens influence the current decision, while still retaining information from the past.
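As a concrete illustration of the bidirectional idea described above, the following minimal sketch shows how such a layer can be declared in Keras, the framework used for our implementation in Section 3. The embedding and layer sizes here are illustrative assumptions, not the configuration used in our experiments.

```python
# Minimal sketch of a bidirectional LSTM text classifier in Keras.
# All dimensions below are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(input_dim=5000, output_dim=300),      # token ids -> dense vectors
    # The Bidirectional wrapper runs the LSTM forward (past-to-future) and
    # backward (future-to-past) and concatenates both outputs per time step.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),                  # truthful vs. deceptive
])
```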

D. Related Work

Our work explores CNN, BiLSTM and CNN-BiLSTM with different types of embeddings for deception detection. As such, we briefly discuss some related work on CNN-LSTM models in text classification and on different neural network approaches proposed for deception detection.

CNN-LSTM: Researchers have combined CNN with LSTM as an essential direction of exploration. We discuss some of the combined approaches used in text classification. Zhou et al. [15] proposed a novel model called C-LSTM combining the advantages of CNN and LSTM. In their proposed architecture, CNN was used to extract a sequence of higher-level phrase representations, which was fed into a long short-term memory (LSTM) network to obtain the sentence representation. C-LSTM was able to capture both local features of phrases as well as global and temporal sentence semantics. Evaluation of this combined approach on sentiment classification and 6-way question classification tasks proved the superiority of the combined model over a single CNN or LSTM model.

Wang et al. [16] proposed a regional CNN-LSTM model to predict the valence-arousal (VA) ratings of texts. The regional CNN used an individual sentence as a region and divided the input sentence into several regions to obtain and weigh significant sentiment information in each region according to its contribution to the VA prediction. This model was evaluated on the Stanford Sentiment Treebank and the Chinese Valence-Arousal Texts datasets and outperformed conventional neural network models.

Meng et al. [17] proposed a neural network for sentiment analysis called the Feature Enhanced Attention CNN-BiLSTM (FEA-NN). Using CNN, a higher-level phrase representation sequence was extracted from the embedding layer. They used BiLSTM to capture both local features of the phrases as well as global and temporal sentence semantics. This aspect-level network used an attention mechanism to model interactions between aspect words and sentences to learn a context representation. They evaluated the proposed model on three datasets (Restaurant, Laptop, and Twitter) and achieved the best performance compared to twelve other models. They also explored the impact of different word embedding models on their proposed architecture.

Deception Detection: To automate the deception detection process, different classification models have been proposed. We discuss some of the recent related work on deception detection.

One of the biggest challenges in the study of deception detection is the unavailability of labeled data. Ott et al. [6] proposed an approach using Amazon Mechanical Turk (AMT) to generate positive deceptive hotel reviews. Utilizing this approach, the dataset was later expanded by the inclusion of negative deceptive opinion data in Ott et al.'s subsequent work [7]. In that work, the authors evaluated the performance of negative deception detection by humans and by an automated machine classification approach. The results suggested that n-gram based SVM classifiers outperformed untrained human judges in detecting negative deceptive opinions in a balanced dataset.

Fuller et al. [18] looked towards the use of polygraph devices in the military. To conduct the experiment, the authors acquired statements from those who were involved in crimes on various military bases. In addition to pre-processing the data, they identified features or cues within the data that were present in deceptive texts. The pre-processed data was fed through different models, namely Artificial Neural Network, Decision Trees, Logistic Regression and an information fusion-based ensemble method which was a collection of the above-mentioned models. Of the four methods that they employed, the ensemble model proved to be the best as it provided an accuracy of 74.07%. Comparing this to a prior study done with a handheld polygraph tool that had an accuracy of 63%, they concluded that text-based deception detection was far more accurate with the models that they used.

Ren et al. [19] proposed a neural network model that was capable of detecting deceptive opinions by learning a document-level representation. Their empirically explored technique contained three stages: the first stage was a convolutional neural network that constructed sentence representations from word representations; the second stage was a bi-directional gated recurrent neural network (GRNN) with an attention mechanism that produced document representations; and the output from the second stage was used as a set of features for deception detection in the final stage. They evaluated their approach using data from three domains (Hotel, Restaurant, and Doctor) and achieved an accuracy of 84.1%. In their experiment, the GRNN with attention mechanism outperformed the model without the attention mechanism.

Li et al. [20] proposed a deep learning model called Sentence Weighted Neural Network (SWNN) which learned the representation of document-level reviews and detected deceptive reviews in the hotel, restaurant, and medical domains. A comparison between representation learning algorithms and conventional features was also presented in this paper. The results showed that SWNN could detect deceptive reviews more effectively than other neural network-based models.

Zhao et al. [21] looked into deceptions that were present within online product reviews. In this work, the authors developed a Convolutional Neural Network (CNN) model for detecting deception. The reason they chose CNN was that online opinions are generally short texts that vary in type and content. Additionally, they reported that the training time was much shorter compared to a Recurrent Neural Network (RNN). To improve the accuracy of their model, the authors also implemented a method on top of CNN which preserved the order of the words within texts. They called this "order-preserving CNN" (OPCNN) as they aimed to retain foundational textual characteristics. The data the authors used to train and test their approach consists of hotel reviews that they gathered online. The data was manually annotated with labels to identify deceptive and non-deceptive records based on data annotation methods and deception theories. Their results showed that the OPCNN model (84.5%) performed slightly better than the traditional CNN model (82.04%). When comparing the two models to other SVM-based classification methods such as tfidf+svm, they demonstrated that they were able to achieve better results.

Vanta et al. [22] explored many important features by extracting review texts and ratings. These features were used by SVM, MLP and CNN+LSTM classifiers to detect deceptive restaurant reviews. The dataset used in this paper was collected by Mukherjee et al. [23] from Yelp. Feature extraction from review texts improved the performance of the classifiers by 2.2% to 2.6% compared to using BoW features only. Classification results showed that CNN+LSTM outperformed the other models with an accuracy of 78.4%.

After reviewing some of the recent work related to deception detection, we find that there are no experiments that explore the effect of different pre-trained word embeddings on the performance of combined neural network models in deception detection. We therefore used different pre-trained word embeddings (Word2Vec, GloVe, and fastText) with our three models: CNN, BiLSTM and the combined CNN-BiLSTM.

Figure 1: Sample Truthful Record from the Dataset

III. IMPLEMENTATIONS

A. Experimental Dataset

There is a scarcity of labeled datasets that are readily accessible in regard to deception in online reviews. Fortunately, the selected dataset by Ott et al. contains reviews labeled as truthful or deceptive. Going into the specifics of the chosen corpus, it comprises 1600 reviews. Of these, 800 records are actual reviews that have been posted on 20 Chicago hotels. The other half of the reviews were generated with the use of Amazon Mechanical Turk (AMT) [6][7]. AMT is a crowd-sourcing service provided by Amazon that allows individuals or businesses to outsource their jobs to a distributed workforce. In this case, the creation of 800 reviews was outsourced to various individuals who had never visited the hotels. The reviews collected from the hotel sites were labeled as truthful, whereas the generated ones were labeled as deceptive. The authors of the data further classified it into four categories: truthful positive (T and P), truthful negative (T and N), deceptive positive (D and P), and deceptive negative (D and N) reviews. Table I shows the number of records in the four categories. For the scope of this research, we disregarded the polarity (positive/negative) of the reviews as our main focus was only to detect deception. As such, among the five attributes that exist in the dataset (deceptive, hotel, polarity, source, text), only the deceptive and text values were considered relevant. A sample of truthful and deceptive records from the dataset is shown in Fig. 1 and Fig. 2, respectively.

Table I: Hotel Review Dataset
Review Type   T and P   T and N   D and P   D and N
Records       400       400       400       400

Figure 2: Sample Deceptive Record from the Dataset

B. Data Preprocessing

The data was preprocessed in three steps. The first was the encoding of the deceptive labels. The provided corpus contained text labels indicating whether a review was truthful or deceptive. As such, it was necessary to convert them into a binary format so that they would be interpretable by the neural network. With the use of the Scikit-learn library, the text labels were encoded to 1 if they were truthful and to 0 if they were deceptive.

The second step was to tokenize the review text. Using Keras' preprocessing library, we used the Tokenizer class to vectorize the text corpus. Each text was turned into a sequence of integers where each integer represents the index of a token in a dictionary [24]. Tokenizing text is a common practice in NLP, and the tokens were fed to the models as input. Additionally, the parameter passed to the tokenizer for the maximum number of words to keep was set to 5000. Because online reviews are relatively short, it was determined that the vocabulary size did not need to be large, and the chosen value was considered sufficient to capture the content of the text.

The third step was padding the text so that every review is of the same length. The longest review in the dataset was found to be 784 words, so all shorter reviews were padded at the end with 0s.

C. Data Split

The dataset was split into three parts: training, validation and testing. Using an 80:20 split, testing was allocated 20% of the data. The remaining 80% was then split again, with 20% going to validation and 80% to training. The final data distribution is shown in Table II.

Table II: Dataset Split
Dataset          Records   Percentage (%)
Training Set     1024      64
Validation Set   256       16
Testing Set      320       20
Total            1600      100

D. Implementation Environment

All models were trained and tested using Google Colab with a Python 3 notebook. The following were the notebook settings:
• Runtime Type: Python 3
• Hardware Accelerator: GPU
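Putting together the label encoding, tokenization, padding, and data split described in Sections III-B and III-C, a minimal sketch using Scikit-learn and Keras could look as follows. The CSV file name and column names are assumptions about how the Ott et al. corpus is stored locally, and the random seed is arbitrary; this is an illustrative sketch rather than the exact code used in our experiments.

```python
# Sketch of the preprocessing and data-split steps described above.
# File name, column names and the random seed are assumptions.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

df = pd.read_csv("deceptive-opinion.csv")   # assumed columns: deceptive, hotel, polarity, source, text
texts, labels = df["text"].tolist(), df["deceptive"].tolist()

# 1) Encode labels: alphabetically, "deceptive" -> 0 and "truthful" -> 1
y = LabelEncoder().fit_transform(labels)

# 2) Tokenize the review text, keeping at most 5000 words
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# 3) Pad every review at the end with 0s up to the longest review (784 tokens)
X = pad_sequences(sequences, maxlen=784, padding="post")

# 80:20 test split, then 80:20 train/validation split (64/16/20 overall)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
```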

E. Embedding Layer

In this paper, we explored both options: using only Keras' trainable embedding layer, and initializing it with pre-trained embeddings. Among the pre-trained word embeddings, three well-known models were selected:

• Word2Vec (Google): The pre-trained word embedding used for this paper was generated from the Google News dataset [27]. Its vocabulary size is 3M and the dimension is 300.

• GloVe (Stanford NLP Group): The pre-trained word embedding used for this research was trained on the Wikipedia 2014 and English Gigaword Fifth Edition datasets [29]. Its vocabulary size is 400K and the dimension is 300.

• fastText (Facebook): For this research we selected a model that was trained on Wikipedia 2017, the UMBC WebBase corpus and the statmt.org news dataset [30]. Its vocabulary size is 1M and the dimension is 300.

We decided to work with pre-trained word embeddings for this study to reduce the time and resources needed to train a word embedding model, and because existing pre-trained models were expected to provide good performance for our hotel review dataset. Fig. 3 shows the implementation steps with an overview of the architecture of all deep learning models.

Figure 3: Implementation steps with an overview of the architecture of all deep learning models
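As an illustration of how a pre-trained embedding is plugged into the models, the sketch below loads GloVe-style text vectors into a Keras Embedding layer. The file name, the decision to freeze the embedding weights, and the reuse of the tokenizer from the preprocessing sketch are assumptions; Word2Vec ships in a binary format and would typically be loaded through gensim instead.

```python
# Sketch: initializing a Keras Embedding layer from pre-trained GloVe/fastText
# text-format vectors ("word v1 v2 ... v300" per line). The file name and the
# trainable=False choice are assumptions.
import numpy as np
from tensorflow.keras.layers import Embedding
from tensorflow.keras.initializers import Constant

MAX_WORDS, EMBEDDING_DIM = 5000, 300

embeddings_index = {}
with open("glove.6B.300d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.rstrip().split(" ")
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Map each word kept by the tokenizer (Section III-B) to its pre-trained vector;
# out-of-vocabulary words keep a zero vector.
embedding_matrix = np.zeros((MAX_WORDS, EMBEDDING_DIM))
for word, i in tokenizer.word_index.items():
    if i < MAX_WORDS and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

embedding_layer = Embedding(MAX_WORDS, EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            trainable=False)
```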

F. Deceptive Review Detection Models

Based upon the prior research explained in the related work section, we decided to implement the following ANN classifiers:
• Convolutional Neural Network (CNN)
• Bidirectional Long Short Term Memory (BiLSTM)
• CNN-BiLSTM

G. CNN Model

In the CNN model, we added an initial embedding layer that was experimented with and without the pre-trained word embeddings. In the convolutional layer, we used a Conv1D function to extract the features, followed by a dropout layer with a parameter value of 0.5 and a 1D max pooling layer. Lastly, dense layers were added which implemented a sigmoid function to classify the text.

H. BiLSTM Model

For the BiLSTM model, we added an initial embedding layer that was experimented with and without the pre-trained word embeddings. In the bidirectional layer, we used an LSTM function to capture context information, followed by a dropout layer with a parameter value of 0.5 and a 1D max pooling layer. Similar to the CNN, dense layers were added which implemented a sigmoid function to classify the text.

I. CNN-BiLSTM Model

In addition to the above two mainstream neural network architectures, we explored a CNN-BiLSTM model to evaluate their combined strength. Rather than using the C-LSTM that was previously mentioned, we used CNN and BiLSTM to detect deception. This was achieved by passing the output from the convolutional layer to the bidirectional layer as input. Similar to the other models, a dropout of 0.5 and 1D max pooling were used. Additionally, dense layers were used with a sigmoid function.
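The CNN-BiLSTM description above corresponds roughly to the following Keras sketch. The filter count, kernel size, LSTM units, dense width, optimizer and training settings are assumptions, since the paper does not report them; the dropout of 0.5, the 1D max pooling and the sigmoid output follow the text, and global max pooling is assumed so that the sequence is reduced before the dense layers.

```python
# Sketch of the CNN-BiLSTM classifier described above. Filter/unit counts,
# kernel size, optimizer and training settings are assumptions; dropout (0.5),
# 1D max pooling and the sigmoid output follow the description in the text.
from tensorflow.keras import layers, models

def build_cnn_bilstm(embedding_layer, max_len=784):
    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        embedding_layer,                                    # pre-trained or trainable
        layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Dropout(0.5),
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),              # deceptive (0) vs. truthful (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_cnn_bilstm(embedding_layer)
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=32)
```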

IV. RESULTS

A. Evaluation Metrics

To evaluate the performance of the models, the following four metrics were used: accuracy, precision, recall, and F1-measure.

Accuracy (A): The ratio of the samples that are correctly identified by the model to the total number of samples in the dataset.

A = (TP + TN) / (TP + FP + FN + TN)

Precision (P): The ratio of samples that are correctly identified as positive to all samples that are identified as positive.

P = TP / (TP + FP)

Recall (R): The ratio of samples that are correctly identified as positive to all samples that should be identified as positive.

R = TP / (TP + FN)

F1-Measure (F): The harmonic mean of precision and recall.

F = 2PR / (P + R)
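As a sketch, these four metrics (and the confusion matrix shown later in Fig. 4) can be computed with Scikit-learn from the test-set predictions; the 0.5 decision threshold applied to the sigmoid output is an assumption.

```python
# Sketch: computing accuracy, precision, recall, F1 and the confusion matrix
# for the test split. The 0.5 threshold on the sigmoid output is an assumption.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_pred = (model.predict(X_test).ravel() >= 0.5).astype(int)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))   # rows: true class, columns: predicted class
```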

B. Experimental Results

The results show that for the individual models, a performance improvement is not guaranteed with the use of pre-trained word embeddings (Table III). As we can see for the CNN, the use of pre-trained word embeddings did not significantly impact the accuracy as opposed to letting the network learn the embedding on its own. However, when we observe the results from BiLSTM, we can see that the use of pre-trained word embeddings negatively affected the accuracy, with GloVe being the worst for BiLSTM at an accuracy of 80.6%, a significant drop from using the trainable embedding by itself. When we review the results from the combined approach, CNN-BiLSTM shows that word embeddings do not affect the accuracy except when used with fastText. When CNN-BiLSTM is used with fastText we achieve an accuracy of 88.8%, which is the highest among all the models. The confusion matrix for CNN-BiLSTM is shown in Fig. 4.

Table III: Performance of all models
Model                      Accuracy   P      R      F
SVM (Ott et al.)           86%        -      -      -
CNN                        87.5%      87.5   80.3   86.1
CNN w/ Word2Vec            87.2%      92.8   80.7   85.8
CNN w/ GloVe               88.1%      90.1   85.8   87.8
CNN w/ fastText            88.1%      92.1   83.7   87.2
BiLSTM                     87.2%      89.1   84.5   86.7
BiLSTM w/ Word2Vec         82.2%      82.7   82.2   82.2
BiLSTM w/ GloVe            80.6%      81.2   80.9   80.4
BiLSTM w/ fastText         82.8%      81.2   85.7   83.2
CNN-BiLSTM                 87.5%      89.6   80.0   83.9
CNN-BiLSTM w/ Word2Vec     86.9%      84.7   91.0   87.3
CNN-BiLSTM w/ GloVe        86.6%      84.8   86.9   85.4
CNN-BiLSTM w/ fastText     88.8%      92.5   92.5   92.5

Figure 4: Confusion Matrix for CNN-BiLSTM

C. Discussion

Taking into consideration the recent works by Ren et al. [19] and Li et al. [20] on deception detection, which incorporated reviews from hotels along with other domains (i.e., restaurants and doctors), we felt that it would be misleading to use the results they achieved as a baseline. As such, we determined that a direct comparison to research that specifically used the hotel reviews would be appropriate. Using the results from Ott et al. [7] as a baseline to compare our models, we found that CNN-BiLSTM with fastText performed slightly better than their implementation of a support vector machine (SVM), which achieved an accuracy of 86%. Comparing it with the deep learning models, it can be said that in general the deep learning models perform better than the SVM in detecting deception. Reflecting upon the data that we used, we believe it is important to consider the possibility of inherent bias within the dataset. Ott et al. [7] assume that the data they retrieved from the actual hotel sites are all truthful, but it is possible that some of them are deceptive. This may contribute to a reduction in overall accuracy as it may appear as noise to the models.

V. CONCLUSION AND FUTURE WORK

In our paper, three types of deep learning models (CNN, BiLSTM, CNN-BiLSTM) and pre-trained word embeddings (Word2Vec, GloVe, fastText) were used to detect deception in a hotel review dataset. We used various word embeddings to see if they increase the accuracy of the models. Experiments show that word embeddings have different effects depending on the network. While CNN's accuracy did not significantly change with or without the pre-trained word embeddings, BiLSTM's and CNN-BiLSTM's accuracies were notably affected by the type of word embeddings used.

The implications associated with this research are to aid online shoppers in making decisions based on reliable information and possibly to allow companies to identify reviews that wrongfully harm their products or brands. As for broader implications, deception detection can be used on all forms of text online, such as news and social media, which can help to combat the spread of misleading or incorrect information, a prevalent societal problem.

For future work in regard to this research, we plan to continue looking at other ANN models, as well as other word embeddings. In particular, embeddings such as BERT and ELMo [31][32] have recently gained popularity for their use in NLP and are of interest for their ability to generate word embeddings based upon the context. It would be interesting to see the difference in accuracy when context is taken into consideration for the words, as opposed to how GloVe and Word2Vec process words without context.

REFERENCES

[1] Law, T. J. (2019). 19 Powerful Ecommerce Statistics That Will Guide Your Strategy in 2019. Retrieved November 27, 2019, from https://www.oberlo.ca/blog/ecommerce-statistics-guide-your-strategy

[2] Law, J. (Ed.). (2018). Deception. A Dictionary of Law. Oxford University Press.
[3] Clement, J. (2019). Global retail e-commerce market size 2014-2023. Retrieved February 7, 2020, from https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/
[4] Kaemingk, D. (2019). 20 Online Review Stats to Know in 2019. Retrieved November 27, 2019, from https://www.qualtrics.com/blog/online-review-stats/
[5] Bond Jr, C. F., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3), 214-234.
[6] Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011). Finding Deceptive Opinion Spam by Any Stretch of the Imagination. ACL.
[7] Ott, M., Cardie, C., & Hancock, J. T. (2013). Negative Deceptive Opinion Spam. HLT-NAACL.
[8] Brownlee, J. (2019). What Are Word Embeddings for Text? Retrieved from https://machinelearningmastery.com/what-are-word-embeddings/
[9] Latysheva, N. (2019). Why do we use word embeddings in NLP? Retrieved March 7, 2020, from https://towardsdatascience.com/why-do-we-use-embeddings-in-nlp-2f20e1b632d2
[10] Dertat, A. (2017). Applied Deep Learning - Part 4: Convolutional Neural Networks. Retrieved March 7, 2020, from https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
[11] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
[12] Xin, M., & Wang, Y. (2019). Research on image classification model based on deep convolution neural network. EURASIP Journal on Image and Video Processing, 2019(1). doi: 10.1186/s13640-019-0417-8
[13] Lopez, M. M., & Kalita, J. K. (2017). Deep Learning applied to NLP. ArXiv, abs/1703.03091.
[14] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[15] Zhou, C., Sun, C., Liu, Z., & Lau, F. C. M. (2015). A C-LSTM Neural Network for Text Classification. CoRR, abs/1511.08630.
[16] Wang, J., Yu, L. C., Lai, K. R., & Zhang, X. (2016). Dimensional sentiment analysis using a regional CNN-LSTM model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 225-230).
[17] Meng, W., Wei, Y., Liu, P., Zhu, Z., & Yin, H. (2019). Aspect Based Sentiment Analysis With Feature Enhanced Attention CNN-BiLSTM. IEEE Access, 7, 167240-167249.
[18] Fuller, C. M., Biros, D. P., & Delen, D. (2011). An investigation of data and methods for real world deception detection. Expert Systems with Applications, 38(7), 8392–8398. doi: 10.1016/j.eswa.2011.01.032
[19] Ren, Y., & Zhang, Y. (2016). Deceptive opinion spam detection using neural network. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 140-150).
[20] Li, L., Qin, B., Ren, W., & Liu, T. (2017). Document representation and feature combination for deceptive spam review detection. Neurocomputing, 254, 33-41.
[21] Zhao, S., Xu, Z., Liu, L., Guo, M., & Yun, J. (2018). Towards Accurate Deceptive Opinions Detection based on Word Order-preserving CNN. Mathematical Problems in Engineering, 1–9. Retrieved from http://downloads.hindawi.com/journals/mpe/2018/2410206.pdf
[22] Vanta, T., & Aono, M. (2019). Fake review detection focusing on emotional expressions and extreme rating. In Proceedings of the 25th Annual Conference of the Language Processing Society (NLP2019) (pp. 1–30). Nagoya, Japan: The Association for Natural Language Processing.
[23] Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. (2013, June). What yelp fake review filter might be doing?. In Seventh International AAAI Conference on Weblogs and Social Media.
[24] Text Preprocessing - Keras Documentation. Retrieved November 27, 2019, from https://keras.io/preprocessing/text/
[25] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. CoRR, abs/1301.3781.
[26] Nicholson, C. (n.d.). A Beginner's Guide to Word2Vec and Neural Word Embeddings. Retrieved February 5, 2020, from https://pathmind.com/wiki/word2vec
[27] McCormick, C. (2016). Google's trained Word2Vec model in Python. Retrieved February 5, 2020, from https://mccormickml.com/2016/04/12/googles-pretrained-word2vec-model-in-python/
[28] Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi: 10.3115/v1/d14-1162
[29] Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe project page. Retrieved February 11, 2020, from https://nlp.stanford.edu/projects/glove/
[30] English word vectors · fastText. Retrieved February 11, 2020, from https://fasttext.cc/docs/en/english-vectors.html
[31] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
[32] Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N. F., ... Zettlemoyer, L. (2018). AllenNLP: A Deep Semantic Natural Language Processing Platform. Proceedings of Workshop for NLP Open Source Software (NLP-OSS). doi: 10.18653/v1/w18-2501
