Attending Sentences to Detect Satirical Fake News
Sohan De Sarkar
Dept. of Computer Science
Indian Institute of Technology Kharagpur, West Bengal, India
[email protected]

Fan Yang
Dept. of Computer Science
University of Houston
3551 Cullen Blvd., Houston
[email protected]

Arjun Mukherjee
Dept. of Computer Science
University of Houston
3551 Cullen Blvd., Houston
[email protected]

Abstract

Satirical news detection is important in order to prevent the spread of misinformation over the Internet. Existing approaches to capture news satire use machine learning models such as SVM and hierarchical neural networks along with hand-engineered features, but do not explore the difference between sentences and documents. This paper proposes a robust, hierarchical deep neural network approach for satire detection, which is capable of capturing satire both at the sentence level and at the document level. The architecture incorporates pluggable generic neural networks like CNN, GRU, and LSTM. Experimental results on a real-world news satire dataset show substantial performance gains, demonstrating the effectiveness of our proposed approach. An inspection of the learned models reveals the existence of key sentences that control the presence of satire in news.

1 Introduction

In the era of the Internet, online journalism is now a common practice. Online news articles contribute substantially to keeping people informed about what is happening in the world. Using the Internet to spread news comes with the disadvantage of deception. Deceptive and misleading news articles have been around for a while. Although some news articles carry a disclaimer stating that they are fake, many others do not, and readers could thus be led to believe them to be true. This leads to the spread of misinformation, which may also start a rumour. The importance of detecting deceptive news is increasing rapidly, as more and more people start relying on online news as their major source of news.
News satire is a genre of deceptive news that is found on the web, with the intent of dispensing satire in the form of legitimate news articles. These articles differ from “fake” news in the sense that fake news intends to mislead people by providing untrue facts, while satirical news intends to ridicule and criticize something by providing satirical comments or through fictionalized stories. In satire, the author intends the article to be discovered as “fake”, unlike fake news, in which the intention is to make the readers believe the news to be true. Detection of news satire is thus important to control the spread of false stories.

We propose a hierarchical deep neural network model for satirical news detection that is able to capture satire both at the sentence level and at the document level. The architecture is very extensible and caters to a variety of plug-and-play neural network models such as CNN, LSTM and GRU. This pipelined architecture allows for optimal learning of the parameters required to capture satire. We show that our model is able to capture satire more efficiently than existing models, by using only pretrained word embeddings as input, without the aid of any syntactic information or any hand-crafted features. We show that word level semantic information is sufficient for effective detection of satire, with word level syntax information only marginally improving the performance. An analysis of the learned models reveals that news satire is decided by a few key sentences of the news article, the last sentence being one of them.

We use the dataset introduced by Yang et al. (2017) as the dataset for satire news detection. We trained our proposed plug-and-play hierarchical model end-to-end on the ground truth data. Our model works at the sentence level, as opposed to the paragraph level attention in (Yang et al., 2017). We perform extensive experiments on the dataset, fine-tuning the model by plugging different neural network models into the architecture. Experimental results on the dataset show superior performance of our model compared to existing state-of-the-art approaches.

This work is licenced under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/
Proceedings of the 27th International Conference on Computational Linguistics, pages 3371–3380, Santa Fe, New Mexico, USA, August 20-26, 2018.

2 Related Work

Previous approaches for generic deception detection include the use of traditional machine learning models such as SVM (Zhang et al., 2012) and Naive Bayesian models (Oraby et al., 2015). These approaches focus on using linguistic cues and social network behavior (Conroy et al., 2015) to detect deception. Much work has been done on deception detection on social media platforms (Davidov et al., 2010; Reyes et al., 2012) and on opinion spam (Ott et al., 2011; Mukherjee et al., 2012; Mukherjee et al., 2013).

In the context of deception in news, the field of “fake” news detection has been explored before (Jin et al., 2016; Rubin et al., 2016). These works also use machine learning, some of them leveraging neural networks (Wang, 2017; Ruchansky et al., 2017) for the task.

Existing works on satirical news detection focus on engineering features to denote satire. Burfoot and Baldwin (2009) filter satirical news from true news with headline features, profanity, and slang. Rubin et al. (2016) propose additional features to classify satirical news, including absurdity, humour, grammar, negative affect, and punctuation. Yang et al. (2017) further show that linguistic features can be incorporated at the paragraph level and reveal the different behaviour of each feature at the paragraph level and the document level. These models rely heavily on linguistic/word features, as opposed to our representation learning approach.
From these works, we observe that word level features contribute most to the detection, while linguistic features improve the result only slightly, so we focus on designing our model to detect satire without further hand-crafted features.

While features generated with careful hand analysis might contribute to a robust classifier, neural network based models, from convolutional neural networks (Kim, 2014; Kalchbrenner et al., 2014) to recurrent neural networks (Tang et al., 2015), or a hybrid of the two (Lai et al., 2015), have pushed classification tasks to a new level. Also, the recent advances in learning distributed representations for word semantics in the form of word embeddings (Mikolov et al., 2013; Pennington et al., 2014; Bojanowski et al., 2016) allow for better modeling of semantics both at the sentence and at the document level. In this work, we utilize the power of neural networks and aim to advance the results of satirical news detection. We combine two separate composition models to further enhance the performance of the learned representation.

3 Model

We propose an approach for building a robust hierarchical neural network architecture for detecting satire news, as shown in Figure 1. We abstract the whole network into two major components, the S and D modules. The compositional module S creates a sentence embedding, taking a sequence of word embeddings as input. The compositional module D creates a document embedding, which acts as a summarization of the document, taking sentence embeddings as input. We use the learned document embeddings to classify the news as satire or true. This kind of abstraction helped us to fine-tune the architecture by applying different choices of compositional models for the S and D modules.

3.1 Word embeddings and Syntax

We use different pretrained word embeddings such as GloVe¹ (Pennington et al., 2014) and fastText² (Bojanowski et al., 2016) as the initial word embeddings.
These pretrained embeddings are (optionally) concatenated with one-hot embeddings that contain the syntax information³ of the word (Baccianella et al., 2010; Miller, 1995). The various syntactic features used and their corresponding one-hot vector lengths are shown in Table 1. The named entities used are: FACILITY, GPE, GSP, LOCATION, ORGANIZATION, PERSON, NULL (representing no named entity). The SentiWordnet scores are 16 discrete values ranging between 0 and 1, thus requiring a one-hot vector of size 16 to represent each score. These word embeddings (concatenated with syntax information) are multiplied with a learned weight matrix $W_{emb}$ to produce a final word embedding that summarizes the semantics of the word required for capturing satire.

¹ https://nlp.stanford.edu/projects/glove
² https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
³ http://www.nltk.org

Figure 1: Model Architecture

3.2 Sentence (S) Module

The S module takes a sequence of word embeddings as input and produces a sentence embedding. This module tries to capture, at the sentence level, the information essential for detecting satire in a news article. The model choices for the S module include temporal convolutional neural networks (CNN) (Kim, 2014) and sequential models like Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Unit (GRU) (Cho et al., 2014).

Let $v_i \in \mathbb{R}^d$ be the $d$-dimensional word embedding of the $i$-th word of a sentence of length $n$. We show 3 different models to produce a sentence embedding from the word embeddings. The S module can then be represented mathematically as a composition function $f$ that takes a sequence of $n$ word embeddings as input to produce a sentence embedding $s$. Thus,

$$ s = f([v_1; v_2; \ldots; v_n]) \qquad (1) $$

where the choices for the composition function $f$ are standard generic neural networks such as LSTM, GRU, and CNN.
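As an illustration of how Section 3.1 and Equation (1) fit together, the following is a minimal sketch in plain Python, not the authors' implementation. It concatenates a pretrained vector with a one-hot syntax vector, multiplies the result by a weight matrix standing in for the learned $W_{emb}$, and then applies a composition function $f$. The names `one_hot`, `project`, `mean_pool`, and `sentence_embedding` are hypothetical, and mean pooling is only an assumed placeholder for the CNN, LSTM, or GRU that the text names.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def one_hot(index: int, size: int) -> Vector:
    """Return a one-hot vector of the given size (e.g. size 7 for the named-entity tags)."""
    if not 0 <= index < size:
        raise ValueError("index out of range for one-hot vector")
    return [1.0 if i == index else 0.0 for i in range(size)]


def project(x: Sequence[float], w_emb: Sequence[Sequence[float]]) -> Vector:
    """Multiply a row vector x (length k) by a k x d matrix standing in for W_emb."""
    if len(x) != len(w_emb):
        raise ValueError("input length must match the number of rows of w_emb")
    d = len(w_emb[0])
    return [sum(x[r] * w_emb[r][c] for r in range(len(x))) for c in range(d)]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Assumed placeholder for f: average the vectors component-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sentence_embedding(
    words: Sequence[Sequence[float]],
    w_emb: Sequence[Sequence[float]],
    f: Callable[[Sequence[Vector]], Vector] = mean_pool,
) -> Vector:
    """s = f([v_1; ...; v_n]) where each v_i is a projected word vector."""
    return f([project(word, w_emb) for word in words])


# Toy example: 2-dimensional "pretrained" vectors plus a 3-way one-hot feature.
word_1 = [1.0, 2.0] + one_hot(0, 3)
word_2 = [3.0, 4.0] + one_hot(2, 3)
# Illustrative fixed weights; in the paper this matrix is learned.
w_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
s = sentence_embedding([word_1, word_2], w_emb)
```

Because `f` is passed in as an argument, swapping the placeholder for a different composition function does not change the rest of the sketch, which mirrors the pluggable design the text describes.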
When LSTM or GRU is chosen for the S module, we use its deep bidirectional version, in which multiple bidirectional LSTM/GRU layers are stacked on top of each other.

3.3 Document (D) Module

Similar to the S module, the D module takes a sequence of sentence embeddings as input and produces a document embedding, capturing information at the document level.
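The two-level composition (words to sentences through S, then sentences to a document through D) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the model itself: `mean_pool` and `max_pool` are hypothetical stand-ins for the pluggable CNN, LSTM, or GRU networks, and `document_embedding` is a name introduced here only for the example.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Compose = Callable[[Sequence[Vector]], Vector]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Assumed stand-in for a composition network: component-wise average."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """A second assumed stand-in: component-wise maximum."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    return [max(column) for column in zip(*vectors)]


def document_embedding(
    document: Sequence[Sequence[Vector]],
    s_module: Compose,
    d_module: Compose,
) -> Vector:
    """Apply the S module to each sentence, then the D module to the results."""
    sentence_embeddings = [s_module(sentence) for sentence in document]
    return d_module(sentence_embeddings)


# A toy document: two sentences with 2-dimensional word embeddings.
doc = [
    [[1.0, 2.0], [3.0, 4.0]],  # sentence 1, two words
    [[5.0, 6.0]],              # sentence 2, one word
]
mean_mean = document_embedding(doc, mean_pool, mean_pool)
mean_max = document_embedding(doc, mean_pool, max_pool)
max_mean = document_embedding(doc, max_pool, mean_pool)
```

Each combination of `s_module` and `d_module` corresponds to one configuration of the architecture; the resulting document embedding is what a classifier would consume to label the article as satire or true.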