Using Semantic Role Labels to Combat Adversarial SNLI

Brett Szalapski, Mengfan Zhang, Miao Zhang
[email protected], [email protected], [email protected]

Abstract

Natural language inference is a fundamental task in natural language understanding. Because of the understanding required to assess the relationship between two sentences, it can provide rich, generalized semantic representations. In this study, we implement a sentence-encoding model using recurrent neural networks. Our hypothesis was that semantic role labels, along with GloVe word embeddings, would give the sentences rich representations, making the model not only more successful on the original SNLI challenge but also more robust to adversarial examples. However, our findings show that adding the SRL information does not improve the performance of our baseline model on either the SNLI task or the adversarial data sets.

1 Problem and Related Work

Natural language inference (NLI) is the problem of determining whether a hypothesis sentence H follows from a premise sentence P. NLI is a fundamental task in natural language understanding because of the understanding required to assess the relationship between two sentences. It has applications in many tasks, including question answering, semantic search, and automatic text summarization. It is an ideal testing ground for theories of semantic representation, and training for NLI tasks can provide rich, generalized semantic representations. NLI has been addressed using a variety of techniques, including symbolic logic, knowledge bases, and, in recent years, neural networks (Bowman et al., 2015). The landscape of NLI models is shown in Figure 1.

Figure 1: NLI model landscape.

(Bowman et al., 2015) proposes a straightforward deep neural network architecture for NLI. In their architecture, the premise and the hypothesis are each represented by a sentence embedding vector. The two vectors are fed into a multi-layer neural network to train a classifier. It achieved an accuracy of 77.6% using LSTM networks on the SNLI corpus.

(Rocktäschel et al., 2016) improves the aforementioned LSTM model by applying a neural attention model. The basic architecture is the same as in (Bowman et al., 2015), based on sentence embeddings for the premise and the hypothesis. The key difference, which (Rocktäschel et al., 2016) uses to improve performance, is that the embedding of the premise takes into consideration the alignment between the premise and the hypothesis. This attention-weighted representation of the premise improves the model's performance to an accuracy of 83.5%.
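The following is a minimal sketch of such an attention-weighted premise representation. The function and tensor names are our own illustrative assumptions, and a plain dot-product scorer is used for brevity; (Rocktäschel et al., 2016) learn the scoring function jointly with the encoders.

```python
import torch
import torch.nn.functional as F

def attention_weighted_premise(premise_states, hypothesis_state):
    """Summarize the premise as an attention-weighted sum of its states.

    premise_states:   (seq_len, hidden)  encoder outputs over the premise
    hypothesis_state: (hidden,)          final encoder state of the hypothesis
    """
    # Score each premise position against the hypothesis representation.
    scores = premise_states @ hypothesis_state   # (seq_len,)
    weights = F.softmax(scores, dim=0)           # attention distribution
    # Weighted sum of premise states: the attention-weighted representation.
    return weights @ premise_states              # (hidden,)

# Toy usage with random states.
p = torch.randn(7, 50)   # premise of 7 tokens, hidden size 50
h = torch.randn(50)
print(attention_weighted_premise(p, h).shape)  # torch.Size([50])
```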
One limitation of the model proposed by (Rocktäschel et al., 2016) is that it reduces both the premise and the hypothesis to a single embedding vector before matching them; that is, it performs sentence-level matching with two embedding vectors at the end. However, not all word- or phrase-level matching results are equally important, and this model does not explicitly differentiate between good and bad matching results between the premise and the hypothesis. For example, matching of stop words is presumably less important than matching of content words. Additionally, some matching results may be particularly critical for making the final prediction. For example, a mismatch between the subjects of two sentences may be sufficient to indicate that they are not in an entailment relation, but this intuition is hard to capture if the two sentence embeddings are matched in their entirety.

To address the limitations of the models proposed by (Bowman et al., 2015) and (Rocktäschel et al., 2016), (Wang and Jiang, 2016) proposes a special LSTM-based architecture called match-LSTM. Instead of using whole-sentence embeddings for the premise and the hypothesis, this model uses an LSTM to perform word-by-word matching of the hypothesis against the premise. The LSTM sequentially processes the hypothesis, matching each word in the hypothesis with an attention-weighted representation of the premise. This LSTM is able to place more emphasis on important word-level matching results. In particular, it remembers important mismatches that are critical for predicting the contradiction or neutral label. On the SNLI corpus, the match-LSTM architecture achieves an accuracy of 86.1%.

Unlike (Wang and Jiang, 2016), which uses attention in conjunction with LSTMs, (Parikh et al., 2016) uses attention purely based on word embeddings. This model consists of feed-forward networks which operate largely independently of word order. Its advantages include a simple neural architecture and the way attention is used to decompose the problem into independently solvable sub-problems, facilitating parallelization. On the SNLI corpus, it established a new state of the art at 86.8% accuracy, with almost an order of magnitude fewer parameters than the previous state of the art, LSTMN (Cheng et al., 2016), and without relying on word order.

The power of LSTMs and attention is well known across a variety of tasks. However, one piece of the puzzle that most of the top results on the SNLI leaderboard share, and that these previous models lack, is the incorporation of pre-trained contextual word embeddings such as ELMo or BERT. Combining these embeddings with a very deep network (Kim et al., 2018), with multitask learning (Liu et al., 2019), or with semantic knowledge (Zhang et al., 2018) leads to the best results.

Due to limited time and resources, the baseline for our NLI project is a pair of bidirectional LSTMs, one each for the premise and the hypothesis. Recurrent neural networks (RNNs) are a well-understood model for sentence encoding: they process input text sequentially and model the conditional transitions between word tokens. Recursive networks, by contrast, explicitly model the compositionality and recursive structure of natural language, but current recursive architectures are limited by their dependence on syntactic tree parsing (Munkhdalai and Yu, 2017). In (Munkhdalai and Yu, 2017), a syntactic-parsing-independent tree-structured model called Neural Tree Indexers (NTI) provides a middle ground between sequential RNNs and syntactic tree-based recursive models. This model achieved state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification. In (Chen et al., 2017), an RNN-based sentence encoder equipped with intra-sentence gated-attention composition achieved top performance on both the RepEval-2017 and SNLI datasets.

Intuitively, including information about sentence structure, such as part-of-speech or semantic role labels (SRL), should improve performance on NLI challenges. Several research teams have found this to be true (Zhou and Xu, 2015; Shi et al., 2016).
The SRL task is generally formulated as a series of classification subtasks in pipeline systems, consisting of predicate identification, predicate disambiguation, argument identification, and argument classification (Zhang et al., 2018). An end-to-end system for SRL using a deep bi-directional recurrent network is proposed by (Zhou and Xu, 2015). Using only the original text as input, this system outperforms the previous state-of-the-art model. Additionally, it is computationally efficient and better at handling longer sentences than traditional models (Zhou and Xu, 2015).

2 Data

2.1 SNLI Dataset

The Stanford Natural Language Inference (SNLI) dataset is a freely available collection of 570,000 human-generated English sentence pairs, manually labeled with one of three categories: entailment, contradiction, or neutral. It constitutes one of the largest high-quality labeled resources explicitly constructed for understanding sentence semantics, and it is the basis for much of the recent machine learning research in the NLI field.

NLI corpora had long been too small for training modern data-intensive, wide-coverage models. SNLI remedies this as a new, large-scale, naturalistic corpus of sentence pairs labeled for entailment, contradiction, and independence. It differs from many other resources as follows: at 570,152 sentence pairs, it is two orders of magnitude larger than the next largest NLI dataset; its sentences and labels were written by humans in a grounded, naturalistic context rather than generated algorithmically; and a subset of the resulting sentences was validated in a separate task to provide a reliable set of annotations over the same data and to identify areas of inferential uncertainty (Bowman et al., 2015).

Table 1: Examples from the SNLI dataset, shown with both the selected gold labels and the full set of labels (abbreviated) from the individual annotators.

Amazon Mechanical Turk was used for data collection: workers were presented with premise scene descriptions from a preexisting corpus and were asked to supply a hypothesis for each of three labels: entailment, neutral, and contradiction (Bowman et al., 2015). Each pair of sentences constitutes possible captions for the same image. If the two are labeled for entailment, the second caption is consistent with the information in the first; a label of contradiction indicates that the two captions cannot possibly describe the same picture; and a third class, neutral, allows for independent captions that might coexist (Bowman et al., 2015). Table 1 shows a set of randomly chosen examples from the SNLI dataset, with both the selected gold labels and the full (abbreviated) set of labels from the individual annotators. The gold label is determined by majority vote: if any one of the three labels was chosen by at least three of the five annotators, that label is the gold label.
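As a concrete illustration of this gold-label rule, here is a minimal sketch (the function name is ours; the SNLI distribution ships gold labels precomputed, with "-" marking examples without a three-annotator consensus):

```python
from collections import Counter

def gold_label(annotator_labels):
    """Return the majority label if at least 3 of the 5 annotators agree;
    otherwise the example has no gold label ('-' in SNLI)."""
    label, count = Counter(annotator_labels).most_common(1)[0]
    return label if count >= 3 else "-"

print(gold_label(["neutral", "neutral", "neutral", "entailment", "contradiction"]))  # neutral
print(gold_label(["neutral", "entailment", "neutral", "entailment", "contradiction"]))  # -
```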

2.2 Adversarial Datasets

2.2.1 Compositionality-Sensitivity Test

An NLI model should understand both lexical and compositional semantics. Adversarial datasets can be used to test whether a model sufficiently captures the compositional nature of sentences (Nie et al., 2018). Two types of adversarial datasets, SOSWAP adversaries and ADDAMOD adversaries, were used to test compositionality-sensitivity; an example of each is illustrated in Figure 2. In SOSWAP, the subject and object of a sentence are switched. In ADDAMOD, an adjective is moved from one noun to another. The semantics of the sentences are thus modified by perturbing their composition without changing any lexical features. The intuition behind these adversarial datasets is that, while the semantic difference resulting from a compositional change is obvious to humans, the two input sentences will be almost identical to models that take no compositional information into consideration. Therefore, by running a model on the adversarial test sets, we can evaluate whether it is able to use compositional information. This data set contains 971 SOSWAP examples, most of which are contradictions, and 1,783 ADDAMOD examples, most of which are neutral.

Figure 2: Examples of SOSWAP and ADDAMOD adversarial data (Nie et al., 2018). On the left, the swapped subject and object are marked in yellow in p′. On the right, the added adjective modifier is marked in yellow in h.

2.2.2 Generalization Ability Test

The data created by (Glockner et al., 2018) contains one additional type of adversarial example. These examples are formed by taking a premise from the SNLI data set and replacing one word with either a synonym, hyponym, antonym, or co-hyponym. The first two replacements create an entailment example, and the latter two create a contradiction. Table 2 shows examples from this adversarial dataset, which capture various kinds of lexical knowledge. The dataset can be used to assess the lexical inference abilities of NLI systems, and it is available at https://github.com/BIU-NLP/Breaking_NLI. All of the replacement words are present in the SNLI data set and in the pre-trained GloVe embeddings we use. The set consists of 7,164 contradiction examples, 982 entailment examples, and 47 neutral examples.

Table 2: Examples from the Breaking NLI dataset.

3 Methodology

3.1 Sentence-encoding RNNs

SNLI is suitably large and diverse to make it possible to train neural network models that produce distributed representations of sentence meaning (Bowman et al., 2015). Sentence embeddings are used as an intermediate step in NLI classification: first, vector representations of the two sentences are produced; then these two vectors are passed to a linear classifier, which predicts the label for the pair.

Our recurrent neural network classifier, depicted in Figure 3, processes the premise and hypothesis with separate RNNs and uses the concatenation of their final states as the basis for the classification decision at the top. Words are embedded using 100-dimensional GloVe embeddings and processed sequentially by a bidirectional LSTM with a hidden dimension of 50. The premise and hypothesis final states are concatenated and passed to a softmax layer for classification. Much of the code for this baseline model comes from (Potts, 2019).

Figure 3: Architecture of the sentence-encoding baseline model (Potts, 2019).

The model is trained on the SNLI training set and evaluated on the SNLI test set, as well as on the three adversarial sets described previously: ADDAMOD, SOSWAP, and Breaking NLI. Due to the overhead required for extracting the SRL tags, the SNLI training set was reduced to 267,379 examples. For the other data sets, including SNLI dev and test, all examples were pre-processed and used for evaluation.
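A minimal PyTorch sketch of this baseline follows. It is our own reconstruction from the description above, not the actual implementation (which builds on (Potts, 2019)); the class and variable names are illustrative, and GloVe initialization of the embedding matrix is omitted.

```python
import torch
import torch.nn as nn

class SentenceEncodingNLI(nn.Module):
    """Two bidirectional LSTM encoders whose final states are
    concatenated and passed to a 3-way softmax classifier."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=50, n_classes=3):
        super().__init__()
        # In practice, this matrix is initialized with 100-d GloVe vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.premise_rnn = nn.LSTM(embed_dim, hidden_dim,
                                   bidirectional=True, batch_first=True)
        self.hypothesis_rnn = nn.LSTM(embed_dim, hidden_dim,
                                      bidirectional=True, batch_first=True)
        # 2 sentences x 2 directions x hidden_dim features.
        self.classifier = nn.Linear(4 * hidden_dim, n_classes)

    def encode(self, rnn, token_ids):
        _, (h_n, _) = rnn(self.embedding(token_ids))  # h_n: (2, batch, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=1)     # (batch, 2 * hidden)

    def forward(self, premise_ids, hypothesis_ids):
        p = self.encode(self.premise_rnn, premise_ids)
        h = self.encode(self.hypothesis_rnn, hypothesis_ids)
        # Logits; the softmax is folded into the cross-entropy loss.
        return self.classifier(torch.cat([p, h], dim=1))

model = SentenceEncodingNLI(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 12)), torch.randint(0, 20000, (8, 10)))
print(logits.shape)  # torch.Size([8, 3])
```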

3.2 Semantic Role Labeler

Semantic role labeling, a technique that has been used in state-of-the-art SNLI models (Zhang et al., 2018), encodes important grammatical aspects of a sentence that go beyond simple part-of-speech tagging or word embeddings. SRL can lead not only to top SNLI performance but may also make a model more robust to adversarial modifications of the SNLI data set. Given a sentence, the task of semantic role labeling is to recognize the semantic relations between predicates and their arguments. For example, given the sentence Charlie sold a book to Sherry last week, where the target verb (predicate) is sold, SRL yields the following output:

[ARG0 Charlie] [V sold] [ARG1 a book] [ARG2 to Sherry] [AM-TMP last week]

where ARG0 represents the seller (agent), ARG1 represents the thing sold (theme), ARG2 represents the buyer (recipient), AM-TMP is an adjunct indicating the timing of the action, and V represents the predicate (Zhang et al., 2018).

The state-of-the-art SRL module implemented by (Zhang et al., 2018) consists of an embedding layer, which includes ELMo and PIE embeddings; an 8-layer, interleaved, bidirectional LSTM with highway connections and dropout; and a softmax output layer which predicts the SRL tags. This architecture can be seen in Figure 4.

Figure 4: Architecture of the semantic role labeler.

In our work, we make use of the readily available SRL model from AllenNLP. Each sentence is parsed into tokens by the AllenNLP WordTokenizer and annotated with an SRL tag for each token. The words are embedded with 50-dimensional GloVe embeddings, and the SRL tags are embedded in a 50-dimensional embedding structure. Each word is then represented as the concatenation of its GloVe and SRL embeddings, so the resulting representation of each input word is a 100-dimensional embedding, just as in the baseline model. See Figure 5 for the enhanced architecture.

Figure 5: Architecture of the SRL-enhanced model.

In order to save time during training and iteration, we used a pre-processing script that ran each SNLI example's sentence1 and sentence2 through the AllenNLP SRL module and saved the tags for later use.
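The sketch below shows how such pre-processing can be done with the AllenNLP predictor API. The pretrained-model URL, the choice of the first predicate frame, and the cache format are our assumptions; the paper does not specify them.

```python
import json
from allennlp.predictors.predictor import Predictor

# Load a pretrained AllenNLP SRL predictor (assumed model URL; check the
# AllenNLP model repository for a current one).
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/"
    "structured-prediction-srl-bert.2020.12.15.tar.gz"
)

def srl_tags(sentence):
    """Return (tokens, BIO SRL tags) for a sentence. For simplicity,
    only the first predicate frame's tags are kept here."""
    out = predictor.predict(sentence=sentence)
    tags = out["verbs"][0]["tags"] if out["verbs"] else ["O"] * len(out["words"])
    return out["words"], tags

# Cache SRL tags for each SNLI example (hypothetical cache format).
examples = [{"sentence1": "Charlie sold a book to Sherry last week.",
             "sentence2": "Sherry bought a book."}]
with open("snli_srl_cache.jsonl", "w") as f:
    for ex in examples:
        record = {side: srl_tags(ex[side]) for side in ("sentence1", "sentence2")}
        f.write(json.dumps(record) + "\n")
```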

Table 3: Baseline and SRL Results on SNLI. All values are proportions, not percentages.

Category         Baseline Precision   SRL Precision   Baseline F1-Score   SRL F1-Score
Contradiction    0.373                0.314           0.326               0.010
Entailment       0.426                0.353           0.490               0.389
Neutral          0.467                0.334           0.429               0.426
Macro-Average    0.422                0.334           0.415               0.275

Table 4: Baseline and SRL Results on Adversarial Datasets.

Dataset         Prominent Label   Baseline Recall   SRL Recall
ADDAMOD         Neutral           0.920             0.864
SOSWAP          Contradiction     0.076             0.083
Breaking NLI    Macro-Average     0.365             0.319
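Since each adversarial set is dominated by a single label, Section 4 below evaluates the systems by per-label recall. Here is a minimal sketch of how the figures in Tables 3 and 4 can be computed; the use of scikit-learn and the toy labels are our assumptions, as the paper does not name its evaluation tooling.

```python
from sklearn.metrics import precision_recall_fscore_support, recall_score

y_true = ["neutral", "neutral", "contradiction", "entailment"]   # toy gold labels
y_pred = ["neutral", "entailment", "contradiction", "entailment"]

# Per-label recall, as reported for the ADDAMOD (Neutral) and SOSWAP
# (Contradiction) adversarial sets.
print(recall_score(y_true, y_pred, labels=["neutral"], average=None))  # [0.5]

# Macro-averaged precision / recall / F1, as reported for SNLI and Breaking NLI.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(p, r, f1)
```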

4 Results and Discussion

We first evaluated both the baseline LSTM and the SRL-augmented model on the SNLI task to determine whether standard SNLI performance improves as a result of SRL. These results can be seen in Table 3. Next, both systems were evaluated on the SOSWAP dev set, the ADDAMOD dev set, and the Breaking NLI adversarial set. Since the ADDAMOD and SOSWAP sets consist primarily of one label, we evaluate the performance of the two systems on those sets based on the recall for only the Neutral and Contradiction classifications, respectively. These results are shown in Table 4.

The results show that the model with SRL embeddings actually performed worse than the model using only GloVe embeddings. However, we believe this is less due to an inability of SRL to enhance a model and more to do with this particular architecture. One of the key drawbacks of the baseline model is that it does not include an attention mechanism. Adding attention to the model would likely allow the semantic role labels from the premise and the hypothesis to interact with each other in much the way that cross-sentence word attention does, and the SRL-enhanced embeddings would help to further cement connections or contradictions between the two sentences. For example, if a word and its synonym appeared in the premise and the hypothesis, and they shared a semantic role label, a model with cross-sentence attention would likely classify these sentences correctly. Further, if a word appeared in both the premise and the hypothesis but had unrelated semantic role labels, the attention mechanism might be able to differentiate between these meanings, improving upon models which overemphasize word co-occurrence.

Another avenue to explore would be the embedding scheme. In this study, we ensured that the embedding dimension for each word was the same in both the baseline and the SRL-enhanced model. In the former, this meant using 100-dimensional GloVe embeddings, while in the latter, we used 50-dimensional GloVe embeddings and 50 dimensions for the SRL tag embedding. In hindsight, this gave our baseline model much more representational power for the words themselves and may have put the SRL-enhanced model at a relative disadvantage.

5 Conclusion and Future Work

Although semantic role labels have proven to be a useful feature in some NLI models, for our parallel bidirectional LSTMs with GloVe word embeddings, SRL decreased performance both in the nominal SNLI evaluations and on the adversarial data sets. However, we believe that SRL could give the right model significant performance gains, not only on the SNLI test set, but also in robustness to adversarial NLI examples. We recommend exploring the performance of the model in (Zhang et al., 2018), in particular against the adversarial data sets described here.

References

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. CoRR, abs/1508.05326.

Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Recurrent neural network-based sentence encoder with gated attention for natural language inference. arXiv preprint arXiv:1708.01353.

Jianpeng Cheng, Li Dong, and Mirella Lapata. 2016. Long short-term memory-networks for machine reading. CoRR, abs/1601.06733.

Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI systems with sentences that require simple lexical inferences. CoRR, abs/1805.02266.

Seonhoon Kim, Jin-Hyuk Hong, Inho Kang, and Nojun Kwak. 2018. Semantic sentence matching with densely-connected recurrent and co-attentive information. CoRR, abs/1805.11360.

Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. 2019. Multi-task deep neural networks for natural language understanding. CoRR, abs/1901.11504.

Tsendsuren Munkhdalai and Hong Yu. 2017. Neural tree indexers for text understanding. In Proceedings of EACL, volume 1, pages 11-21.
Yixin Nie, Yicheng Wang, and Mohit Bansal. 2018. Analyzing compositionality-sensitivity of NLI models. CoRR, abs/1811.07033.

Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. CoRR, abs/1606.01933.

Christopher Potts. 2019. CS224u. https://github.com/cgpotts/cs224u.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In ICLR.

Chen Shi, Shujie Liu, Shuo Ren, Shi Feng, Mu Li, Ming Zhou, Xu Sun, and Houfeng Wang. 2016. Knowledge-based semantic embedding for machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2245-2254, Berlin, Germany. Association for Computational Linguistics.

Shuohang Wang and Jing Jiang. 2016. Learning natural language inference with LSTM. CoRR, abs/1512.08849.

Zhuosheng Zhang, Yuwei Wu, Zuchao Li, Shexia He, Hai Zhao, Xi Zhou, and Xiang Zhou. 2018. I know what you want: Semantic learning for text comprehension. CoRR, abs/1809.02794.

Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In ACL.