Analysis Methods in Neural Language Processing: A Survey

Yonatan Belinkov1,2 and James Glass1

1MIT Computer Science and Artificial Intelligence Laboratory, 2Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA
{belinkov, glass}@mit.edu

Transactions of the Association for Computational Linguistics, vol. 7, pp. 49–72, 2019. Action Editor: Marco Baroni. Submission batch: 10/2018; Revision batch: 12/2018; Published 3/2019. © 2019 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Abstract

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

1 Introduction

The rise of deep learning has transformed the field of natural language processing (NLP) in recent years. Models based on neural networks have obtained impressive improvements in various tasks, including language modeling (Mikolov et al., 2010; Jozefowicz et al., 2016), syntactic parsing (Kiperwasser and Goldberg, 2016), machine translation (MT) (Bahdanau et al., 2014; Sutskever et al., 2014), and many other tasks; see Goldberg (2017) for example success stories.

This progress has been accompanied by a myriad of new neural network architectures. In many cases, traditional feature-rich systems are being replaced by end-to-end neural networks that aim to map input text to some output prediction. As end-to-end systems are gaining prevalence, one may point to two trends. First, some push back against the abandonment of linguistic knowledge and call for incorporating it inside the networks in different ways.1 Others strive to better understand how NLP models work. This theme of analyzing neural networks has connections to the broader work on interpretability in machine learning, along with specific characteristics of the NLP field.

1 See, for instance, Noah Smith's invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)

Why should we analyze our neural NLP models? To some extent, this question falls into the larger question of interpretability in machine learning, which has been the subject of much debate in recent years.2 Arguments in favor of interpretability in machine learning usually mention goals like accountability, trust, fairness, safety, and reliability (Doshi-Velez and Kim, 2017; Lipton, 2016). Arguments against interpretability typically stress performance as the most important desideratum. All these arguments naturally apply to machine learning applications in NLP.

2 See, for example, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)

In the context of NLP, this question needs to be understood in light of earlier NLP work, often referred to as feature-rich or feature-engineered systems. In some of these systems, features are more easily understood by humans—they can be morphological properties, lexical classes, syntactic categories, semantic relations, etc. In theory, one could observe the importance assigned by statistical NLP models to such features in order to gain a better understanding of the model.3

3 Nevertheless, one could question how feasible such an analysis is; consider, for example, interpreting support vectors in high-dimensional support vector machines (SVMs).

In contrast, it is more difficult to understand what happens in an end-to-end neural network model that takes input (say, word embeddings) and generates an output (say, a sentence classification). Much of the analysis work thus aims to understand how linguistic concepts that were common as features in NLP systems are captured in neural networks.

As the analysis of neural networks for language is becoming more and more prevalent, neural networks in various NLP tasks are being analyzed; different network architectures and components are being compared, and a variety of new analysis methods are being developed. This survey aims to review and summarize this body of work, highlight current trends, and point to existing lacunae. It organizes the literature into several themes. Section 2 reviews work that targets a fundamental question: What kind of linguistic information is captured in neural networks? We also point to limitations in current methods for answering this question. Section 3 discusses visualization methods, and emphasizes the difficulty in evaluating visualization work. In Section 4, we discuss the compilation of challenge sets, or test suites, for fine-grained evaluation, a methodology that has old roots in NLP. Section 5 deals with the generation and use of adversarial examples to probe weaknesses of neural networks. We point to unique characteristics of dealing with text as a discrete input and how different studies handle them. Section 6 summarizes work on explaining model predictions, an important goal of interpretability research. This is a relatively underexplored area, and we call for more work in this direction. Section 7 mentions a few other methods that do not fall neatly into one of the above themes. In the conclusion, we summarize the main gaps and potential research directions for the field.

The paper is accompanied by online supplementary materials that contain detailed references for studies corresponding to Sections 2, 4, and 5 (Tables SM1, SM2, and SM3, respectively), available at https://boknilev.github.io/nlp-analysis-methods.

Before proceeding, we briefly mention some earlier work of a similar spirit.

A Historical Note Reviewing the vast literature on neural networks for language is beyond our scope.4 However, we mention here a few representative studies that focused on analyzing such networks in order to illustrate how recent trends have roots that go back to before the recent deep learning revival.

4 For instance, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.

Rumelhart and McClelland (1986) built a feedforward neural network for learning the English past tense and analyzed its performance on a variety of examples and conditions. They were especially concerned with the performance over the course of training, as their goal was to model the past form acquisition in children. They also analyzed a scaled-down version having eight input units and eight output units, which allowed them to describe it exhaustively and examine how certain rules manifest in network weights.

In his seminal work on recurrent neural networks (RNNs), Elman trained networks on synthetic sentences in a language prediction task (Elman, 1989, 1990, 1991). Through extensive analyses, he showed how networks discover the notion of a word when predicting characters; capture syntactic structures like number agreement; and acquire word representations that reflect lexical and syntactic categories. Similar analyses were later applied to other networks and tasks (Harris, 1990; Niklasson and Linåker, 2000; Pollack, 1990; Frank et al., 2013).

While Elman's work was limited in some ways, such as evaluating generalization or various linguistic phenomena—as Elman himself recognized (Elman, 1989)—it introduced methods that are still relevant today: from visualizing network activations in time, through clustering words by hidden state activations, to projecting representations to dimensions that emerge as capturing properties like sentence number or verb valency. The sections on visualization (Section 3) and identifying linguistic information (Section 2) contain many examples for these kinds of analysis.

2 What Linguistic Information Is Captured in Neural Networks?

Neural network models in NLP are typically trained in an end-to-end manner on input–output pairs, without explicitly encoding linguistic features.

Thus, a primary question is the following: What linguistic information is captured in neural networks? When examining answers to this question, it is convenient to consider three dimensions: which methods are used for conducting the analysis, what kind of linguistic information is sought, and which objects in the neural network are being investigated. Table SM1 (in the supplementary materials) categorizes relevant analysis work according to these criteria. In the next subsections, we discuss trends in analysis work along these lines, followed by a discussion of limitations of current approaches.

2.1 Methods

The most common approach for associating neural network components with linguistic properties is to predict such properties from activations of the neural network. Typically, in this approach a neural network model is trained on some task (say, MT) and its weights are frozen. Then, the trained model is used for generating feature representations for another task by running it on a corpus with linguistic annotations and recording the representations (say, hidden state activations). Another classifier is then used for predicting the property of interest (say, part-of-speech [POS] tags). The performance of this classifier is used for evaluating the quality of the generated representations, and by proxy that of the original model. This kind of approach has been used in numerous papers in recent years; see Table SM1 for references.5 It is referred to by various names, including "auxiliary prediction tasks" (Adi et al., 2017b), "diagnostic classifiers" (Veldhoen et al., 2016), and "probing tasks" (Conneau et al., 2018).

5 A similar method has been used to analyze hierarchical structure in neural networks trained on arithmetic expressions (Veldhoen et al., 2016; Hupkes et al., 2018).

As an example of this approach, let us walk through an application to analyzing syntax in neural machine translation (NMT) by Shi et al. (2016b). In this work, two NMT models were trained on standard parallel data—English→French and English→German. The trained models (specifically, the encoders) were run on an annotated corpus and their hidden states were used for training a logistic regression classifier that predicts different syntactic properties. The authors concluded that the NMT encoders learn significant syntactic information at both word level and sentence level. They also compared representations at different encoding layers and found that "local features are somehow preserved in the lower layer whereas more global, abstract information tends to be stored in the upper layer." These results demonstrate the kind of insights that the classification analysis may lead to, especially when comparing different models or model components.
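To make the procedure concrete, the following is a minimal sketch of such a probing setup, assuming scikit-learn is available; the encode function is a hypothetical placeholder for a frozen, pretrained encoder (e.g., an NMT encoder), and the corpus is a toy stand-in for a POS-annotated dataset. It illustrates the general recipe rather than reproducing any particular paper's implementation.

    # Probing sketch: predict a linguistic property (POS tags) from
    # frozen model activations with a simple classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    def encode(tokens):
        # Hypothetical stand-in: run the frozen model on the tokens and
        # record one hidden-state vector per token (dimension 512 here).
        return rng.normal(size=(len(tokens), 512))

    corpus = [(["the", "cat", "sat"], ["DET", "NOUN", "VERB"]),
              (["dogs", "bark"], ["NOUN", "VERB"])]

    X = np.vstack([encode(tokens) for tokens, _ in corpus])  # activations
    y = np.concatenate([tags for _, tags in corpus])         # linguistic labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Held-out accuracy of the probe serves as a measure of how much POS
    # information the frozen representations encode.
    print("probing accuracy:", probe.score(X_test, y_test))

In practice the probe would be trained on large annotated corpora; with the random activations above, accuracy should hover around chance, which is itself the relevant baseline in such analyses.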
Other methods for finding correspondences between parts of the neural network and certain properties include counting how often attention weights agree with a linguistic property like anaphora resolution (Voita et al., 2018) or directly computing correlations between neural network activations and some property; for example, correlating RNN state activations with depth in a syntactic tree (Qian et al., 2016a) or with Mel-frequency cepstral coefficient (MFCC) acoustic features (Wu and King, 2016). Such correspondence may also be computed indirectly. For instance, Alishahi et al. (2017) defined an ABX discrimination task to evaluate how a neural model of speech (grounded in vision) encoded phonology. Given phoneme representations from different layers in their model, and three phonemes, A, B, and X, they compared whether the model representation for X is closer to A or B. This discrimination task enabled them to draw conclusions about which layers encode phonology better, observing that lower layers generally encode more phonological information.

2.2 Linguistic Phenomena

Different kinds of linguistic information have been analyzed, ranging from basic properties like sentence length, word position, word presence, or simple word order, to morphological, syntactic, and semantic information. Phonetic/phonemic information, speaker information, and style and accent information have been studied in neural network models for speech, or in joint audio-visual models. See Table SM1 for references.

While it is difficult to synthesize a holistic picture from this diverse body of work, it appears that neural networks are able to learn a substantial amount of information on various linguistic phenomena. These models are especially successful at capturing frequent properties, while some rare properties are more difficult to learn.

Linzen et al. (2016), for instance, found that long short-term memory (LSTM) language models are able to capture subject–verb agreement in many common cases, while direct supervision is required for solving harder cases.

Another theme that emerges in several studies is the hierarchical nature of the learned representations. We have already mentioned such findings regarding NMT (Shi et al., 2016b) and a visually grounded speech model (Alishahi et al., 2017). Hierarchical representations of syntax were also reported to emerge in other RNN models (Blevins et al., 2018).

Finally, a couple of papers discovered that models trained with latent trees perform better on natural language inference (NLI) (Williams et al., 2018; Maillard and Clark, 2018) than ones trained with linguistically annotated trees. Moreover, the trees in these models do not resemble syntactic trees corresponding to known linguistic theories, which casts doubts on the importance of syntax-learning in the underlying neural network.6

6 Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).

2.3 Neural Network Components

In terms of the object of study, various neural network components were investigated, including word embeddings, RNN hidden states or gate activations, sentence embeddings, and attention weights in sequence-to-sequence (seq2seq) models. Generally less work has analyzed convolutional neural networks in NLP, but see Jacovi et al. (2018) for a recent exception. In speech processing, researchers have analyzed layers in deep neural networks for speech recognition and different speaker embeddings. Some analysis has also been devoted to joint language–vision or audio–vision models, or to similarities between word embeddings and convolutional image representations. Table SM1 provides detailed references.
2.4 Limitations

The classification approach may find that a certain amount of linguistic information is captured in the neural network. However, this does not necessarily mean that the information is used by the network. For example, Vanmassenhove et al. (2017) investigated aspect in NMT (and in phrase-based statistical MT). They trained a classifier on NMT sentence encoding vectors and found that they can accurately predict tense about 90% of the time. However, when evaluating the output translations, they found them to have the correct tense only 79% of the time. They interpreted this result to mean that "part of the aspectual information is lost during decoding." Relatedly, Cífka and Bojar (2018) compared the performance of various NMT models in terms of translation quality (BLEU) and representation quality (classification tasks). They found a negative correlation between the two, suggesting that high-quality systems may not be learning certain sentence meanings. In contrast, Artetxe et al. (2018) showed that word embeddings contain divergent linguistic information, which can be uncovered by applying a linear transformation on the learned embeddings. Their results suggest an alternative explanation, showing that "embedding models are able to encode divergent linguistic information but have limits on how this information is surfaced."

From a methodological point of view, most of the relevant analysis work is concerned with correlation: How correlated are neural network components with linguistic properties? What may be lacking is a measure of causation: How does the encoding of linguistic properties affect the system output? Giulianelli et al. (2018) make some headway on this question. They predicted number agreement from RNN hidden states and gates at different time steps. They then intervened in how the model processes the sentence by changing a hidden activation based on the difference between the prediction and the correct label. This improved agreement prediction accuracy, and the effect persisted over the course of the sentence, indicating that this information has an effect on the model. However, they did not report the effect on overall model quality, for example by measuring perplexity. Methods from causal inference may shed new light on some of these questions.
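The following schematic PyTorch fragment illustrates the kind of intervention just described, under simplifying assumptions: probe is a diagnostic classifier that is assumed to be already trained on hidden states, h stands in for one RNN hidden state, and the step size is arbitrary. It is a sketch of the general idea, not the exact procedure of Giulianelli et al. (2018).

    import torch
    import torch.nn.functional as F

    hidden_dim, num_labels, step = 650, 2, 0.5
    probe = torch.nn.Linear(hidden_dim, num_labels)  # assumed pre-trained
    h = torch.randn(hidden_dim, requires_grad=True)  # stand-in hidden state
    gold = torch.tensor([1])                         # correct agreement label

    # Gradient of the probe's loss with respect to the hidden state.
    loss = F.cross_entropy(probe(h).unsqueeze(0), gold)
    loss.backward()

    # Nudge the hidden state toward what the probe expects for the gold
    # label; the modified state is then fed back into the unrolled network.
    with torch.no_grad():
        h_modified = h - step * h.grad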

Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.

Finally, the predictor for the auxiliary task is usually a simple classifier, such as logistic regression. A few studies compared different classifiers and found that deeper classifiers lead to overall better results, but do not alter the respective trends when comparing different models or components (Qian et al., 2016b; Belinkov, 2018). Interestingly, Conneau et al. (2018) found that tasks requiring more nuanced linguistic knowledge (e.g., tree depth, coordination inversion) gain the most from using a deeper classifier. However, the approach is usually taken for granted; given its prevalence, better theoretical or empirical foundations are needed.

3 Visualization

Visualization is a valuable tool for analyzing neural networks in the language domain and beyond. Early work visualized hidden unit activations in RNNs trained on an artificial language modeling task, and observed how they correspond to certain grammatical relations such as agreement (Elman, 1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al., 2015; Kádár et al., 2017; Qian et al., 2016a; Liu et al., 2018) and speech (Wu and King, 2016; Nagamine et al., 2015; Wang et al., 2017b). Figure 1 shows an example visualization of a neuron that captures position of words in a sentence. The heatmap uses blue and red colors for negative and positive activation values, respectively, enabling the user to quickly grasp the function of this neuron.

Figure 2: A visualization of attention weights, showing soft alignment between source and target sentences in an NMT model. Reproduced from Bahdanau et al. (2014), with permission.

The attention mechanism that originated in work on NMT (Bahdanau et al., 2014) also lends itself to a natural visualization. The alignments obtained via different attention mechanisms have produced visualizations ranging from tasks like NLI (Rocktäschel et al., 2016; Yin et al., 2016), summarization (Rush et al., 2015), MT post-editing (Jauregi Unanue et al., 2018), and morphological inflection (Aharoni and Goldberg, 2017) to matching users on social media (Tay et al., 2018). Figure 2 reproduces a visualization of attention alignments from the original work by Bahdanau et al. Here grayscale values correspond to the weight of the attention between words in an English source sentence (columns) and its French translation (rows). As Bahdanau et al. explain, this visualization demonstrates that the NMT model learned a soft alignment between source and target words. Some aspects of word order may also be noticed, as in the reordering of noun and adjective when translating the phrase "European Economic Area."

Another line of work computes various saliency measures to attribute predictions to input features. The important or salient features can then be visualized in selected examples (Li et al., 2016a; Aubakirova and Bansal, 2016; Sundararajan et al., 2017; Arras et al., 2017a,b; Ding et al., 2017; Murdoch et al., 2018; Mudrakarta et al., 2018; Montavon et al., 2018; Godin et al., 2018). Saliency can also be computed with respect to intermediate values, rather than input features (Ghaeini et al., 2018).7

7 Generally, many of the visualization methods are adapted from the vision domain, where they have been extremely popular; see Zhang and Zhu (2018) for a survey.

An instructive visualization technique is to cluster neural network activations and compare them to some linguistic property. Early work clustered RNN activations, showing that they organize in lexical categories (Elman, 1989, 1990). Similar techniques have been followed by others. Recent examples include clustering of sentence embeddings in an RNN encoder trained in a multitask learning scenario (Brunner et al., 2017), and phoneme clusters in a joint audio-visual RNN model (Alishahi et al., 2017).
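As a concrete illustration of the heatmap style shown in Figure 1, the following minimal matplotlib sketch plots one neuron's activation per token with a diverging blue-to-red colormap; the activation values are fabricated (a linear ramp mimicking a "position" neuron).

    import numpy as np
    import matplotlib.pyplot as plt

    tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "dog"]
    activations = np.linspace(-1, 1, len(tokens))  # fabricated "position" neuron

    fig, ax = plt.subplots(figsize=(8, 1.5))
    # Blue for negative and red for positive values, as in Figure 1.
    ax.imshow(activations[np.newaxis, :], cmap="coolwarm", vmin=-1, vmax=1,
              aspect="auto")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens)
    ax.set_yticks([])
    plt.show()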

A few online tools for visualizing neural networks have recently become available. LSTMVis (Strobelt et al., 2018b) visualizes RNN activations, focusing on tracing hidden state dynamics.8 Seq2Seq-Vis (Strobelt et al., 2018a) visualizes different modules in attention-based seq2seq models, with the goal of examining model decisions and testing alternative decisions. Another tool focused on comparing attention alignments was proposed by Rikters (2018). It also provides translation confidence scores based on the distribution of attention weights. NeuroX (Dalvi et al., 2019b) is a tool for finding and analyzing individual neurons, focusing on machine translation.

8 RNNVis (Ming et al., 2017) is a similar tool, but its online demo does not seem to be available at the time of writing.

Evaluation As in much work on interpretability, evaluating visualization quality is difficult and often limited to qualitative examples. A few notable exceptions report human evaluations of visualization quality. Singh et al. (2018) showed human raters hierarchical clusterings of input words generated by two interpretation methods, and asked them to evaluate which method is more accurate, or in which method they trust more. Others reported human evaluations for attention visualization in conversation modeling (Freeman et al., 2018) and medical code prediction tasks (Mullenbach et al., 2018).

The availability of open-source tools of the sort described above will hopefully encourage users to utilize visualization in their regular research and development cycle. However, it remains to be seen how useful visualizations turn out to be.

4 Challenge Sets

The majority of benchmark datasets in NLP are drawn from text corpora, reflecting a natural frequency distribution of language phenomena. While useful in practice for evaluating system performance in the average case, such datasets may fail to capture a wide range of phenomena. An alternative evaluation framework consists of challenge sets, also known as test suites, which have been used in NLP for a long time (Lehmann et al., 1996), especially for evaluating MT systems (King and Falkedal, 1990; Isahara, 1995; Koh et al., 2001). Lehmann et al. (1996) noted several key properties of test suites: systematicity, control over data, inclusion of negative data, and exhaustivity. They contrasted such datasets with test corpora, "whose main advantage is that they reflect naturally occurring data." This idea underlines much of the work on challenge sets and is echoed in more recent work (Wang et al., 2018a). For instance, Cooper et al. (1996) constructed a semantic test suite that targets phenomena as diverse as quantifiers, plurals, anaphora, ellipsis, adjectival properties, and so on.

After a hiatus of a couple of decades,9 challenge sets have recently gained renewed popularity in the NLP community. In this section, we include datasets used for evaluating neural network models that diverge from the common average-case evaluation. Many of them share some of the properties noted by Lehmann et al. (1996), although negative examples (ill-formed data) are typically less utilized. The challenge datasets can be categorized along the following criteria: the task they seek to evaluate, the linguistic phenomena they aim to study, the language(s) they target, their size, their method of construction, and how performance is evaluated.10 Table SM2 (in the supplementary materials) categorizes many recent challenge sets along these criteria. Below we discuss common trends along these lines.

9 One could speculate that their decrease in popularity can be attributed to the rise of large-scale quantitative evaluation of statistical NLP systems.

10 Another typology of evaluation protocols was put forth by Burlot and Yvon (2017). Their criteria are partially overlapping with ours, although they did not provide a comprehensive categorization like the one compiled here.

4.1 Task

By far, the most targeted tasks in challenge sets are NLI and MT. This can partly be explained by the popularity of these tasks and the prevalence of neural models proposed for solving them. Perhaps more importantly, tasks like NLI and MT arguably require inferences at various linguistic levels, making the challenge set evaluation especially attractive. Still, other high-level tasks like reading comprehension or question answering have not received as much attention, and may also benefit from the careful construction of challenge sets.

A significant body of work aims to evaluate the quality of embedding models by correlating the similarity they induce on word or sentence pairs with human similarity judgments. Datasets containing such similarity scores are often used to evaluate word embeddings (Finkelstein et al., 2002; Bruni et al., 2012; Hill et al., 2015, inter alia) or sentence embeddings; see the many shared tasks on semantic textual similarity in SemEval (Cer et al., 2017, and previous editions). Many of these datasets evaluate similarity at a coarse-grained level, but some provide a more fine-grained evaluation of similarity or relatedness. For example, some datasets are dedicated for specific word classes such as verbs (Gerz et al., 2016) or rare words (Luong et al., 2013), or for evaluating compositional knowledge in sentence embeddings (Marelli et al., 2014). Multilingual and cross-lingual versions have also been collected (Leviant and Reichart, 2015; Cer et al., 2017). Although these datasets are widely used, this kind of evaluation has been criticized for its subjectivity and questionable correlation with downstream performance (Faruqui et al., 2016).
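The evaluation protocol itself is simple; the sketch below shows the standard recipe with toy values standing in for a real embedding model and a human-rated dataset such as WordSim-353: compute model similarities for the rated pairs and report the Spearman rank correlation with the human scores.

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Toy embeddings and human similarity judgments (illustrative values).
    emb = {"cup": np.array([0.9, 0.1]),
           "mug": np.array([0.8, 0.3]),
           "car": np.array([0.1, 0.9]),
           "tree": np.array([0.2, 0.7])}
    pairs = [("cup", "mug", 9.2), ("cup", "car", 2.1), ("car", "tree", 3.0)]

    model_scores = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
    human_scores = [score for _, _, score in pairs]

    # Spearman's rank correlation is the usual figure of merit.
    rho, _ = spearmanr(model_scores, human_scores)
    print("Spearman's rho:", rho)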
4.2 Linguistic Phenomena

One of the primary goals of challenge sets is to evaluate models on their ability to handle specific linguistic phenomena. While earlier studies emphasized exhaustivity (Cooper et al., 1996; Lehmann et al., 1996), recent ones tend to focus on a few properties of interest. For example, Sennrich (2017) introduced a challenge set for MT evaluation focusing on five properties: subject–verb agreement, noun phrase agreement, verb–particle constructions, polarity, and transliteration. Slightly more elaborated is an MT challenge set for morphology, including 14 morphological properties (Burlot and Yvon, 2017). See Table SM2 for references to datasets targeting other phenomena.

Other challenge sets cover a more diverse range of linguistic properties, in the spirit of some of the earlier work. For instance, extending the categories in Cooper et al. (1996), the GLUE analysis set for NLI covers more than 30 phenomena in four coarse categories (lexical semantics, predicate–argument structure, logic, and knowledge). In MT evaluation, Burchardt et al. (2017) reported results using a large test suite covering 120 phenomena, partly based on Lehmann et al. (1996).11 Isabelle et al. (2017) and Isabelle and Kuhn (2018) prepared challenge sets for MT evaluation covering fine-grained phenomena at morpho-syntactic, syntactic, and lexical levels.

11 Their dataset does not seem to be available yet, but more details are promised to appear in a future publication.

Generally, datasets that are constructed programmatically tend to cover less fine-grained linguistic properties, while manually constructed datasets represent more diverse phenomena.

4.3 Languages

As unfortunately usual in much NLP work, especially neural NLP, the vast majority of challenge sets are in English. This situation is slightly better in MT evaluation, where naturally all datasets feature other languages (see Table SM2). A notable exception is the work by Gulordava et al. (2018), who constructed examples for evaluating number agreement in language modeling in English, Russian, Hebrew, and Italian. Clearly, there is room for more challenge sets in non-English languages. However, perhaps more pressing is the need for large-scale non-English datasets (besides MT) to develop neural models for popular NLP tasks.

4.4 Scale

The size of proposed challenge sets varies greatly (Table SM2). As expected, datasets constructed by hand are smaller, with typical sizes in the hundreds. Automatically built datasets are much larger, ranging from several thousands to close to a hundred thousand (Sennrich, 2017), or even more than one million examples (Linzen et al., 2016). In the latter case, the authors argue that such a large test set is needed for obtaining a sufficient representation of rare cases. A few manually constructed datasets contain a fairly large number of examples, up to 10 thousand (Burchardt et al., 2017).

4.5 Construction Method

Challenge sets are usually created either programmatically or manually, by handcrafting specific examples. Often, semi-automatic methods are used to compile an initial list of examples that is manually verified by annotators. The specific method also affects the kind of language use and how natural or artificial/synthetic the examples are. We describe here some trends in dataset construction methods in the hope that they may be useful for researchers contemplating new datasets.

Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sanchez et al. (2018) and Glockner et al. (2018) extracted examples from SNLI (Bowman et al., 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al. (2016), on the other hand, extracted examples of subject–verb agreement from raw texts using heuristics, resulting in a large-scale dataset. Gulordava et al. (2018) extended this to other agreement phenomena, but they relied on syntactic information available in treebanks, resulting in a smaller dataset.

Several challenge sets utilize existing test suites, either as a direct source of examples (Burchardt et al., 2017) or for searching similar naturally occurring examples (Wang et al., 2018a).12

12 Wang et al. (2018a) also verified that their examples do not contain annotation artifacts, a potential problem noted in recent studies (Gururangan et al., 2018; Poliak et al., 2018b).

Sennrich (2017) introduced a method for evaluating NMT systems via contrastive translation pairs, where the system is asked to estimate the probability of two candidate translations that are designed to reflect specific linguistic properties. Sennrich generated such pairs programmatically by applying simple heuristics, such as changing gender and number to induce agreement errors, resulting in a large-scale challenge set of close to 100 thousand examples. This framework was extended to evaluate other properties, but often requiring more sophisticated generation methods like using morphological analyzers/generators (Burlot and Yvon, 2017) or more manual involvement in generation (Bawden et al., 2018) or verification (Rios Gonzales et al., 2017).

Finally, a few studies define templates that capture certain linguistic properties and instantiate them with word lists (Dasgupta et al., 2018; Rudinger et al., 2018; Zhao et al., 2018a). Template-based generation has the advantage of providing more control, for example for obtaining a specific vocabulary distribution, but this comes at the expense of how natural the examples are.
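As an illustration of the template-based approach, the sketch below instantiates a single subject–verb agreement template with small word lists, producing grammatical/ungrammatical contrastive pairs; the template and word lists are invented for illustration, not taken from the cited datasets.

    from itertools import product

    subjects = [("The author", "sg"), ("The authors", "pl")]
    verbs = {"sg": "laughs", "pl": "laugh"}
    modifiers = ["", " near the senators"]  # attractors between subject and verb

    examples = []
    for (subject, number), modifier in product(subjects, modifiers):
        wrong = "pl" if number == "sg" else "sg"
        examples.append({
            "grammatical": f"{subject}{modifier} {verbs[number]}.",
            "ungrammatical": f"{subject}{modifier} {verbs[wrong]}.",
        })

    for example in examples:
        print(example["grammatical"], "|", example["ungrammatical"])

The controlled vocabulary makes it easy to balance word frequencies and isolate the phenomenon of interest, at the cost of naturalness, as noted above.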
4.6 Evaluation

Systems are typically evaluated by their performance on the challenge set examples, either with the same metric used for evaluating the system in the first place, or via a proxy, as in the contrastive pairs evaluation of Sennrich (2017). Automatic evaluation metrics are cheap to obtain and can be calculated on a large scale. However, they may miss certain aspects. Thus a few studies report human evaluation on their challenge sets, such as in MT (Isabelle et al., 2017; Burchardt et al., 2017).

We note here also that judging the quality of a model by its performance on a challenge set can be tricky. Some authors emphasize their wish to test systems on extreme or difficult cases, "beyond normal operational capacity" (Naik et al., 2018). However, whether one should expect systems to perform well on specially chosen cases (as opposed to the average case) may depend on one's goals. To put results in perspective, one may compare model performance to human performance on the same task (Gulordava et al., 2018).

5 Adversarial Examples

Understanding a model also requires an understanding of its failures. Despite their success in many tasks, machine learning systems can also be very sensitive to malicious attacks or adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015). In the vision domain, small changes to the input image can lead to misclassification, even if such changes are indistinguishable by humans.

The basic setup in work on adversarial examples can be described as follows.13 Given a neural network model f and an input example x, we seek to generate an adversarial example x′ that will have a minimal distance from x, while being assigned a different label by f:

    min_{x′} ||x − x′||
    s.t. f(x) = l, f(x′) = l′, l ≠ l′

13 The notation here follows Yuan et al. (2017).

In the vision domain, x can be the input image pixels, resulting in a fairly intuitive interpretation of this optimization problem: measuring the distance ||x − x′|| is straightforward, and finding x′ can be done by computing gradients with respect to the input, since all quantities are continuous. In the text domain, the input is discrete (for example, a sequence of words), which poses two problems. First, it is not clear how to measure

the distance between the original and adversarial examples, x and x′, which are two discrete objects (say, two words or sentences). Second, minimizing this distance cannot be easily formulated as an optimization problem, as this requires computing gradients with respect to a discrete input.

In the following, we review methods for handling these difficulties according to several criteria: the adversary's knowledge, the specificity of the attack, the linguistic unit being modified, and the task on which the attacked model was trained.14 Table SM3 (in the supplementary materials) categorizes work on adversarial examples in NLP according to these criteria.

14 These criteria are partly taken from Yuan et al. (2017), where a more elaborate taxonomy is laid out. At present, though, the work on adversarial examples in NLP is more limited than in computer vision, so our criteria will suffice.

5.1 Adversary's Knowledge

Adversarial examples can be generated using access to model parameters, also known as white-box attacks, or without such access, with black-box attacks (Papernot et al., 2016a, 2017; Narodytska and Kasiviswanathan, 2017; Liu et al., 2017).

White-box attacks are difficult to adapt to the text world as they typically require computing gradients with respect to the input, which would be discrete in the text case. One option is to compute gradients with respect to the input word embeddings, and perturb the embeddings. Since this may result in a vector that does not correspond to any word, one could search for the closest word embedding in a given dictionary (Papernot et al., 2016b); Cheng et al. (2018) extended this idea to seq2seq models. Others computed gradients with respect to input word embeddings to identify and rank words to be modified (Samanta and Mehta, 2017; Liang et al., 2018). Ebrahimi et al. (2018b) developed an alternative method by representing text edit operations in vector space (e.g., a binary vector specifying which characters in a word would be changed) and approximating the change in loss with the derivative along this vector.

Given the difficulty in generating white-box adversarial examples for text, much research has been devoted to black-box examples. Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so on (Sakaguchi et al., 2017; Heigold et al., 2018; Belinkov and Bisk, 2018). Gao et al. (2018) defined scoring functions to identify tokens to modify. Their functions do not require access to model internals, but they do require the model prediction score. After identifying the important tokens, they modify characters with common edit operations.

Zhao et al. (2018c) used generative adversarial networks (GANs) (Goodfellow et al., 2014) to minimize the distance between latent representations of input and adversarial examples, and performed perturbations in latent space. Since the latent representations do not need to come from the attacked model, this is a black-box attack.

Finally, Alzantot et al. (2018) developed an interesting population-based genetic algorithm for crafting adversarial examples for text classification by maintaining a population of modifications of the original sentence and evaluating fitness of modifications at each generation. They do not require access to model parameters, but do use prediction scores. A similar idea was proposed by Kuleshov et al. (2018).

5.2 Attack Specificity

Adversarial attacks can be classified to targeted vs. non-targeted attacks (Yuan et al., 2017). A targeted attack specifies a specific false class, l′, while a non-targeted attack cares only that the predicted class is wrong, l′ ≠ l. Targeted attacks are more difficult to generate, as they typically require knowledge of model parameters; that is, they are white-box attacks. This might explain why the majority of adversarial examples in NLP are non-targeted (see Table SM3). A few targeted attacks include Liang et al. (2018), which specified a desired class to fool a text classifier, and Chen et al. (2018a), which specified words or captions to generate in an image captioning model. Others targeted specific words to omit, replace, or include when attacking seq2seq models (Cheng et al., 2018; Ebrahimi et al., 2018a).

Methods for generating targeted attacks in NLP could possibly take more inspiration from adversarial attacks in other fields. For instance, in attacking malware detection systems, several studies developed targeted attacks in a black-box scenario (Yuan et al., 2017). A black-box targeted attack for MT was proposed by Zhao et al. (2018c), who used GANs to search for attacks on Google's MT system after mapping sentences into continuous space with adversarially regularized autoencoders (Zhao et al., 2018b).
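A schematic black-box attack of the simplest kind discussed above can be written in a few lines: greedily try single-character edits and keep one that flips the model's prediction. The predict function is a hypothetical stand-in for querying the attacked classifier; no gradients or model internals are used. This is a toy illustration, far cruder than the scoring functions or genetic search of the cited work.

    import string

    def predict(text):
        # Hypothetical black-box model query returning a label.
        return "positive" if "good" in text else "negative"

    def attack(text, max_queries=500):
        original = predict(text)
        queries = 0
        chars = list(text)
        for i in range(len(chars)):
            for c in string.ascii_lowercase:
                candidate = "".join(chars[:i] + [c] + chars[i + 1:])
                queries += 1
                if queries > max_queries:
                    return None
                if predict(candidate) != original:
                    return candidate  # one-character adversarial example
        return None

    print(attack("a good movie"))  # a single typo flips the toy model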

5.3 Linguistic Unit

Most of the work on adversarial text examples involves modifications at the character- and/or word-level; see Table SM3 for specific references. Other transformations include adding sentences or text chunks (Jia and Liang, 2017) or generating paraphrases with desired syntactic structures (Iyyer et al., 2018). In image captioning, Chen et al. (2018a) modified pixels in the input image to generate targeted attacks on the caption text.

5.4 Task

Generally, most work on adversarial examples in NLP concentrates on relatively high-level language understanding tasks, such as text classification (including sentiment analysis) and reading comprehension, while work on text generation focuses mainly on MT. See Table SM3 for references. There is relatively little work on adversarial examples for more low-level language processing tasks, although one can mention morphological tagging (Heigold et al., 2018) and spelling correction (Sakaguchi et al., 2017).

5.5 Coherence and Perturbation Measurement

In adversarial image examples, it is fairly straightforward to measure the perturbation, either by measuring distance in pixel space, say ||x − x′|| under some norm, or with alternative measures that are better correlated with human perception (Rozsa et al., 2016). It is also visually compelling to present an adversarial image with imperceptible difference from its source image.

In the text domain, measuring distance is not as straightforward, and even small changes to the text may be perceptible by humans. Thus, evaluation of attacks is fairly tricky. Some studies imposed constraints on adversarial examples to have a small number of edit operations (Gao et al., 2018). Others ensured syntactic or semantic coherence in different ways, such as filtering replacements by word similarity or sentence similarity (Alzantot et al., 2018; Kuleshov et al., 2018), or by using synonyms and other word lists (Samanta and Mehta, 2017; Yang et al., 2018).

Some reported whether a human can classify the adversarial example correctly (Yang et al., 2018), but this does not indicate how perceptible the changes are. More informative human studies evaluate grammaticality or similarity of the adversarial examples to the original ones (Zhao et al., 2018c; Alzantot et al., 2018). Given the inherent difficulty in generating imperceptible changes in text, more such evaluations are needed.

6 Explaining Predictions

Explaining specific predictions is recognized as a desideratum in interpretability work (Lipton, 2016), argued to increase the accountability of machine learning systems (Doshi-Velez et al., 2017). However, explaining why a deep, highly non-linear neural network makes a certain prediction is not trivial. One solution is to ask the model to generate explanations along with its primary prediction (Zaidan et al., 2007; Zhang et al., 2016),15 but this approach requires manual annotations of explanations, which may be hard to collect.

15 Other work considered learning textual-visual explanations from multimodal annotations (Park et al., 2018).

An alternative approach is to use parts of the input as explanations. For example, Lei et al. (2016) defined a generator that learns a distribution over text fragments as candidate rationales for justifying predictions, evaluated on sentiment analysis. Alvarez-Melis and Jaakkola (2017) discovered input–output associations in a sequence-to-sequence learning scenario, by perturbing the input and finding the most relevant associations. Gupta and Schütze (2018) inspected how information is accumulated in RNNs towards a prediction, and associated peaks in prediction scores with important input segments. As these methods use input segments to explain predictions, they do not shed much light on the internal computations that take place in the network.

At present, despite the recognized importance for interpretability, our ability to explain predictions of neural networks in NLP is still limited.

7 Other Methods

We briefly mention here several analysis methods that do not fall neatly into the previous sections.

A number of studies evaluated the effect of erasing or masking certain neural network components, such as word embedding dimensions, hidden units, or even full words (Li et al., 2016b;

Feng et al., 2018; Khandelwal et al., 2018; Bau et al., 2018). For example, Li et al. (2016b) erased specific dimensions in word embeddings or hidden states and computed the change in probability assigned to different labels. Their experiments revealed interesting differences between word embedding models, where in some models information is more focused in individual dimensions. They also found that information is more distributed in hidden layers than in the input layer, and erased entire words to find important words in a sentiment analysis task.
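The core of such an erasure analysis can be sketched in a few lines, in the spirit of Li et al. (2016b) but not reproducing their setup: zero out one dimension of a representation at a time and record the drop in the probability assigned to the predicted label. The linear classifier and the representation below are random stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 16)), rng.normal(size=3)  # toy classifier
    x = rng.normal(size=16)                              # a representation

    def probs(v):
        z = W @ v + b
        e = np.exp(z - z.max())
        return e / e.sum()

    predicted = int(np.argmax(probs(x)))
    importance = []
    for d in range(len(x)):
        x_erased = x.copy()
        x_erased[d] = 0.0  # erase dimension d
        importance.append(probs(x)[predicted] - probs(x_erased)[predicted])

    # Dimensions whose erasure most reduces the prediction probability.
    print(np.argsort(importance)[::-1][:3])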
Several studies conducted behavioral experiments to interpret word embeddings by defining intrusion tasks, where humans need to identify an intruder word, chosen based on difference in word embedding dimensions (Murphy et al., 2012; Fyshe et al., 2015; Faruqui et al., 2015).16 In this kind of work, a word embedding model may be deemed more interpretable if humans are better able to identify the intruding words. Since the evaluation is costly for high-dimensional representations, alternative automatic metrics were considered (Park et al., 2017; Senel et al., 2018).

16 The methodology follows earlier work on evaluating the interpretability of probabilistic topic models with intrusion tasks (Chang et al., 2009).

A long tradition in work on neural networks is to evaluate and analyze their ability to learn different formal languages (Das et al., 1992; Casey, 1996; Gers and Schmidhuber, 2001; Bodén and Wiles, 2002; Chalup and Blair, 2003). This trend continues today, with research into modern architectures and what formal languages they can learn (Weiss et al., 2018; Bernardy, 2018; Suzgun et al., 2019), or the formal properties they possess (Chen et al., 2018b).

8 Conclusion

Analyzing neural networks has become a hot topic in NLP research. This survey attempted to review and summarize as much of the current research as possible, while organizing it along several prominent themes. We have emphasized aspects in analysis that are specific to language—namely, what linguistic information is captured in neural networks, which phenomena they are successful at capturing, and where they fail. Many of the analysis methods are general techniques from the larger machine learning community, such as visualization via saliency measures or evaluation by adversarial examples. But even those sometimes require non-trivial adaptations to work with text input. Some methods are more specific to the field, but may prove useful in other domains. Challenge sets or test suites are such a case.

Throughout this survey, we have identified several limitations or gaps in current analysis work:

• The use of auxiliary classification tasks for identifying which linguistic properties neural networks capture has become standard practice (Section 2), while lacking both a theoretical foundation and a better empirical consideration of the link between the auxiliary tasks and the original task.

• Evaluation of analysis work is often limited or qualitative, especially in visualization techniques (Section 3). Newer forms of evaluation are needed for determining the success of different methods.

• Relatively little work has been done on explaining predictions of neural network models, apart from providing visualizations (Section 6). With the increasing public demand for explaining algorithmic choices in machine learning systems (Doshi-Velez and Kim, 2017; Doshi-Velez et al., 2017), there is pressing need for progress in this direction.

• Much of the analysis work is focused on the English language, especially in constructing challenge sets for various tasks (Section 4), with the exception of MT due to its inherent multilingual character. Developing resources and evaluating methods on other languages is important as the field grows and matures.

• More challenge sets for evaluating other tasks besides NLI and MT are needed.

Finally, as with any survey in a rapidly evolving field, this paper is likely to omit relevant recent work by the time of publication. While we intend to continue updating the online appendix with newer publications, we hope that our summarization of prominent analysis work and its categorization into several themes will be a useful guide for scholars interested in analyzing and understanding neural networks for NLP.

59 Acknowledgments Processing, pages 412–421. Association for Computational Linguistics. We would like to thank the anonymous review- ers and the action editor for their very helpful Moustafa Alzantot, Yash Sharma, Ahmed comments. This work was supported by the Elgohary, Bo-Jhang Ho, Mani Srivastava, and Qatar Computing Research Institute. Y.B. is also Kai-Wei Chang. 2018. Generating Natural supported by the Harvard Mind, Brain, Behavior Language Adversarial Examples. In Proceed- Initiative. ings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896. Association for Computa- References tional Linguistics. Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017a. Anal- Leila Arras, Franziska Horn, Gregoire´ Montavon, ysis of sentence embedding models using Klaus-Robert Muller,¨ and Wojciech Samek. prediction tasks in natural language processing. 2017a. ‘‘What is relevant in a text document?’’: IBM Journal of Research and Development, An interpretable machine learning approach. 61(4):3–9. PLOS ONE, 12(8):1–23.

Yossi Adi, Einat Kermany, Yonatan Belinkov, Leila Arras, Gregoire´ Montavon, Klaus-Robert Ofer Lavi, and Yoav Goldberg. 2017. Fine- Muller,¨ and Wojciech Samek. 2017b. Explain- Grained Analysis of Sentence Embeddings ing Predictions in Using Auxiliary Prediction Tasks. In Interna- Sentiment Analysis. In Proceedings of the tional Conference on Learning Representations 8th Workshop on Computational Approaches (ICLR). to Subjectivity, Sentiment and Social Media Analysis, pages 159–168. Association for Roee Aharoni and Yoav Goldberg. 2017. Computational Linguistics. Morphological Inflection Generation with Hard Monotonic Attention. In Proceedings of the Mikel Artetxe, Gorka Labaka, Inigo Lopez- 55th Annual Meeting of the Association for Gazpio, and Eneko Agirre. 2018. Uncovering Computational Linguistics (Volume 1: Long Divergent Linguistic Information in Word Papers), pages 2004–2015. Association for Embeddings with Lessons for Intrinsic and Computational Linguistics. Extrinsic Evaluation. In Proceedings of the 22nd Conference on Computational Natural Wasi Uddin Ahmad, Xueying Bai, Zhechao Language Learning, pages 282–291. Associa- Huang, Chao Jiang, Nanyun Peng, and Kai-Wei tion for Computational Linguistics. Chang. 2018. Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation Malika Aubakirova and Mohit Bansal. 2016. Inter- using Transfer and Auxiliary Tasks. arXiv preting Neural Networks to Improve Politeness preprint arXiv:1804.07911v2. Comprehension. In Proceedings of the 2016 Conference on Empirical Methods in Natu- Afra Alishahi, Marie Barking, and Grzegorz ral Language Processing, pages 2035–2041. Chrupała. 2017. Encoding of phonology in a Association for Computational Linguistics. recurrent neural model of grounded speech. In Proceedings of the 21st Conference on Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Computational Natural Language Learning Bengio. 2014. Neural Machine Translation by (CoNLL 2017), pages 368–378. Association for Jointly Learning to Align and Translate. arXiv Computational Linguistics. preprint arXiv:1409.0473v7.

David Alvarez-Melis and Tommi Jaakkola. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, 2017. A causal framework for explaining the Nadir Durrani, Fahim Dalvi, and James Glass. predictions of black-box sequence-to-sequence 2018. Identifying and Controlling Important models. In Proceedings of the 2017 Conference Neurons in Neural Machine Translation. arXiv on Empirical Methods in Natural Language preprint arXiv:1811.01157v1.

60 Rachel Bawden, Rico Sennrich, Alexandra Birch, Arianna Bisazza and Clara Tump. 2018. The Lazy and Barry Haddow. 2018. Evaluating Discourse Encoder: A Fine-Grained Analysis of the Role Phenomena in Neural Machine Translation. In of Morphology in Neural Machine Translation. Proceedings of the 2018 Conference of the In Proceedings of the 2018 Conference on North American Chapter of the Association Empirical Methods in Natural Language for Computational Linguistics: Human Lan- Processing, pages 2871–2876. Association for guage Technologies, Volume 1 (Long Papers), Computational Linguistics. pages 1304–1313. Association for Computa- Terra Blevins, Omer Levy, and Luke Zettlemoyer. tional Linguistics. 2018. Deep RNNs Encode Soft Hierarchi- Yonatan Belinkov. 2018. On Internal Language cal Syntax. In Proceedings of the 56th Annual Representations in Deep Learning: An Analy- Meeting of the Association for Computa- sis of Machine Translation and Speech Recog- tional Linguistics (Volume 2: Short Papers), nition. Ph.D. thesis, Massachusetts Institute of pages 14–19. Association for Computational Technology. Linguistics. Mikael Boden´ and Janet Wiles. 2002. On learning Yonatan Belinkov and Yonatan Bisk. 2018. Syn- context-free and context-sensitive languages. thetic and Natural Noise Both Break Neural IEEE Transactions on Neural Networks, 13(2): Machine Translation. In International Confer- 491–493. ence on Learning Representations (ICLR). Samuel R. Bowman, Gabor Angeli, Christopher Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Potts, and Christopher D. Manning. 2015. A Hassan Sajjad, and James Glass. 2017a. large annotated corpus for learning natural What do Neural Machine Translation Models language inference. In Proceedings of the Learn about Morphology? In Proceedings of 2015 Conference on Empirical Methods in the 55th Annual Meeting of the Association Natural Language Processing, pages 632–642. for Computational Linguistics (Volume 1: Association for Computational Linguistics. Long Papers), pages 861–872. Association for Computational Linguistics. Elia Bruni, Gemma Boleda, Marco Baroni, and Nam Khanh Tran. 2012. Distributional Yonatan Belinkov and James Glass. 2017, Anal- Semantics in Technicolor. In Proceedings of yzing Hidden Representations in End-to-End the 50th Annual Meeting of the Association Automatic Speech Recognition Systems, I. Guyon, for Computational Linguistics (Volume 1: U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, Long Papers), pages 136–145. Association for S. Vishwanathan, and R. Garnett, editors, Ad- Computational Linguistics. vances in Neural Information Processing Sys- Gino Brunner, Yuyi Wang, Roger Wattenhofer, tems 30, pages 2441–2451. Curran Associates, and Michael Weigelt. 2017. Natural Language Inc. Multitasking: Analyzing and Improving Syn- Yonatan Belinkov, Llu´ıs Marquez,` Hassan Sajjad, tactic Saliency of Hidden Representations. The Nadir Durrani, Fahim Dalvi, and James Glass. 31st Annual Conference on Neural Information 2017b. Evaluating Layers of Representation in Processing (NIPS)—Workshop on Learning Neural Machine Translation on Part-of-Speech Disentangled Features: From Perception to and Semantic Tagging Tasks. In Proceedings Control. of the Eighth International Joint Conference Aljoscha Burchardt, Vivien Macketanz, Jon on Natural Language Processing (Volume 1: Dehdari, Georg Heigold, Jan-Thorsten Peter, Long Papers), pages 1–10. Asian Federation of and Philip Williams. 2017. A Linguistic Natural Language Processing. 
Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines. The Prague Bulletin of Jean-Philippe Bernardy. 2018. Can Recurrent Mathematical Linguistics, 108(1):159–170. Neural Networks Learn Nested Recursion? LiLT (Linguistic Issues in Language Tech- Franck Burlot and Franc¸ois Yvon. 2017. nology), 16(1). Evaluating the morphological competence of

61 Machine Translation Systems. In Proceedings Yining Chen, Sorcha Gilroy, Andreas Maletti, of the Second Conference on Machine Trans- Jonathan May, and Kevin Knight. 2018b. lation, pages 43–55. Association for Compu- Recurrent Neural Networks as Weighted tational Linguistics. Language Recognizers. In Proceedings of the 2018 Conference of the North American Mike Casey. 1996. The Dynamics of Discrete- Chapter of the Association for Computational Time Computation, with Application to Re- Linguistics: Human Language Technologies, current Neural Networks and Finite State Volume 1 (Long Papers), pages 2261–2271. Machine Extraction. Neural Computation, Association for Computational Linguistics. 8(6):1135–1178. Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Daniel Cer, Mona Diab, Eneko Agirre, Inigo Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Lopez-Gazpio, and Lucia Specia. 2017. Evaluating the Robustness of Sequence-to- SemEval-2017 Task 1: Semantic Textual Sim- Sequence Models with Adversarial Examples. ilarity Multilingual and Crosslingual Focused arXiv preprint arXiv:1803.01128v1. Evaluation. In Proceedings of the 11th Inter- national Workshop on Semantic Evaluation Grzegorz Chrupała, Lieke Gelderloos, and Afra (SemEval-2017), pages 1–14. Association for Alishahi. 2017. Representations of language in Computational Linguistics. a model of visually grounded speech signal. Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, In Proceedings of the 55th Annual Meeting of and Emmanuel Dupoux. 2017. Learning weakly the Association for Computational Linguistics supervised multimodal phoneme embeddings. (Volume 1: Long Papers), pages 613–622. In Interspeech 2017. Association for Computational Linguistics. Stephan K. Chalup and Alan D. Blair. 2003. Ondrejˇ C´ıfka and Ondrejˇ Bojar. 2018. Are BLEU Incremental Training of First Order Recurrent and Meaning Representation in Opposition? In Neural Networks to Predict a Context-Sensitive Proceedings of the 56th Annual Meeting of Language. Neural Networks, 16(7):955–972. the Association for Computational Linguistics (Volume 1: Long Papers), pages 1362–1371. Jonathan Chang, Sean Gerrish, Chong Wang, Association for Computational Linguistics. Jordan L. Boyd-graber, and David M. Blei. 2009, Reading Tea Leaves: How Humans Inter- Alexis Conneau, German´ Kruszewski, Guillaume pret Topic Models, Y. Bengio, D. Schuurmans, Lample, Lo¨ıc Barrault, and Marco Baroni. J. D. Lafferty, C. K. I. Williams, and A. Culotta, 2018. What you can cram into a single editors, Advances in Neural Information Pro- $&!#* vector: Probing sentence embeddings cessing Systems 22, pages 288–296, Curran for linguistic properties. In Proceedings of the Associates, Inc.. 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Papers), pages 2126–2136. Association for Yi, and Cho-Jui Hsieh. 2018a. Attacking visual Computational Linguistics. language grounding with adversarial examples: A case study on neural image captioning. In Robin Cooper, Dick Crouch, Jan van Eijck, Chris Proceedings of the 56th Annual Meeting of Fox, Josef van Genabith, Jan Jaspars, Hans the Association for Computational Linguistics Kamp, David Milward, Manfred Pinkal, (Volume 1: Long Papers), pages 2587–2597. Massimo Poesio, Steve Pulman, Ted Briscoe, Association for Computational Linguistics. Holger Maier, and Karsten Konrad. 1996, Using the framework. Technical report, The FraCaS Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, Consortium. and Xuanjing Huang. 
Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, D. Anthony Bau, and James Glass. 2019a. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI).

Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, and Stephan Vogel. 2017. Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 142–151. Asian Federation of Natural Language Processing.

Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, and James Glass. 2019b. NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI): Demonstrations Track.

Sreerupa Das, C. Lee Giles, and Guo-Zheng Sun. 1992. Learning Context-Free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory. In Proceedings of The Fourteenth Annual Conference of the Cognitive Science Society. Indiana University, page 14.

Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, and Noah D. Goodman. 2018. Evaluating Compositionality in Sentence Embeddings. arXiv preprint arXiv:1802.04302v2.

Dhanush Dharmaretnam and Alona Fyshe. 2018. The Emergence of Semantics in Neural Network Representations of Visual Information. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 776–780. Association for Computational Linguistics.

Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Visualizing and Understanding Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1150–1159. Association for Computational Linguistics.

Finale Doshi-Velez and Been Kim. 2017. Towards a Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608v2.

Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Stuart Shieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI Under the Law: The Role of Explanation. Privacy Law Scholars Conference.

Jennifer Drexler and James Glass. 2017. Analysis of Audio-Visual Features for Unsupervised Speech Recognition. In International Workshop on Grounding Language Understanding.

Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018a. On Adversarial Examples for Character-Level Neural Machine Translation. In Proceedings of the 27th International Conference on Computational Linguistics, pages 653–663. Association for Computational Linguistics.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018b. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36. Association for Computational Linguistics.

Ali Elkahky, Kellie Webster, Daniel Andor, and Emily Pitler. 2018. A Challenge Set and Methods for Noun-Verb Ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2562–2572. Association for Computational Linguistics.

Zied Elloumi, Laurent Besacier, Olivier Galibert, and Benjamin Lecouteux. 2018. Analyzing Learned Representations of a Deep ASR Performance Prediction Model. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 9–15. Association for Computational Linguistics.

Jeffrey L. Elman. 1989. Representation and Structure in Connectionist Models. University of California, San Diego, Center for Research in Language.

Jeffrey L. Elman. 1990. Finding Structure in Time. Cognitive Science, 14(2):179–211.

Jeffrey L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3):195–225.

Allyson Ettinger, Ahmed Elgohary, and Philip Resnik. 2016. Probing for semantic evidence of composition by means of simple classification tasks. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 134–139. Association for Computational Linguistics.

Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP.

Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. 2015. Sparse Overcomplete Word Vector Representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1491–1500. Association for Computational Linguistics.

Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of Neural Models Make Interpretations Difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728. Association for Computational Linguistics.

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1):116–131.

Robert Frank, Donald Mathis, and William Badecker. 2013. The Acquisition of Anaphora by Simple Recurrent Networks. Language Acquisition, 20(3):181–227.

Cynthia Freeman, Jonathan Merriman, Abhinav Aggarwal, Ian Beaver, and Abdullah Mueen. 2018. Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis. arXiv preprint arXiv:1808.02113v1.

Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian Murphy, and Tom M. Mitchell. 2015. A Compositional and Interpretable Semantic Space. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 32–41. Association for Computational Linguistics.

David Gaddy, Mitchell Stern, and Dan Klein. 2018. What's Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 999–1010. Association for Computational Linguistics.

J. Ganesh, Manish Gupta, and Vasudeva Varma. 2017. Interpretation of Semantic Tweet Representations. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM '17, pages 95–102, New York, NY, USA. ACM.

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. arXiv preprint arXiv:1801.04354v5.

Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: Levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1309–1319, Osaka, Japan. The COLING 2016 Organizing Committee.

Felix A. Gers and Jürgen Schmidhuber. 2001. LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.

Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2173–2182. Association for Computational Linguistics.

Hamidreza Ghader and Christof Monz. 2017. What does Attention in Neural Machine Translation Pay Attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 30–39. Asian Federation of Natural Language Processing.

Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli. 2018. Interpreting Recurrent and Attention-Based Neural Models: A Case Study on Natural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957. Association for Computational Linguistics.

Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve How Language Models Track Agreement Information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248. Association for Computational Linguistics.

Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655. Association for Computational Linguistics.

Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, and Thomas Demeester. 2018. Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3275–3284. Association for Computational Linguistics.

Yoav Goldberg. 2017. Neural Network Methods for Natural Language Processing, volume 10 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR).

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless Green Recurrent Networks Dream Hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205. Association for Computational Linguistics.

Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 12–21. Association for Computational Linguistics.

Pankaj Gupta and Hinrich Schütze. 2018. LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 154–164. Association for Computational Linguistics.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112. Association for Computational Linguistics.

Catherine L. Harris. 1990. Connectionism and Cognitive Linguistics. Connection Science, 2(1–2):7–33.

David Harwath and James Glass. 2017. Learning Word-Like Units from Joint Audio-Visual Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–517. Association for Computational Linguistics.

Georg Heigold, Günter Neumann, and Josef van Genabith. 2018. How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? In Proceedings of the 13th Conference of The Association for Machine Translation in the Americas (Volume 1: Research Track), pages 68–79.

Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Computational Linguistics, 41(4):665–695.

Dieuwke Hupkes, Sara Veldhoen, and Willem Zuidema. 2018. Visualisation and "diagnostic classifiers" reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61:907–926.

Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2486–2496. Association for Computational Linguistics.

Pierre Isabelle and Roland Kuhn. 2018. A Challenge Set for French→English Machine Translation. arXiv preprint arXiv:1806.02725v2.

Hitoshi Isahara. 1995. JEIDA's test-sets for quality evaluation of MT systems—technical evaluation from the developer's point of view. In Proceedings of MT Summit V.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885. Association for Computational Linguistics.

Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.

Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and Massimo Piccardi. 2018. A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 11–17. Association for Computational Linguistics.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031. Association for Computational Linguistics.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the Limits of Language Modeling. arXiv preprint arXiv:1602.02410v2.

Ákos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of Linguistic Form and Function in Recurrent Neural Networks. Computational Linguistics, 43(4):761–780.

Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and Understanding Recurrent Networks. arXiv preprint arXiv:1506.02078v2.

Urvashi Khandelwal, He He, Peng Qi, and Dan Jurafsky. 2018. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294. Association for Computational Linguistics.

Margaret King and Kirsten Falkedal. 1990. Using Test Suites in Evaluation of Machine Translation Systems. In COLING 1990 Volume 2: Papers Presented to the 13th International Conference on Computational Linguistics.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Transactions of the Association for Computational Linguistics, 4:313–327.

Sungryong Koh, Jinee Maeng, Ji-Young Lee, Young-Sook Chae, and Key-Sun Choi. 2001. A test suite for evaluation of English-to-Korean machine translation systems. In MT Summit Conference.

Arne Köhn. 2015. What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2067–2073, Lisbon, Portugal. Association for Computational Linguistics.

Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial Examples for Natural Language Classification Problems.

Brenden Lake and Marco Baroni. 2018. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2873–2882, Stockholmsmässan, Stockholm, Sweden. PMLR.

Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Hervé Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP—Test Suites for Natural Language Processing. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117. Association for Computational Linguistics.

Ira Leviant and Roi Reichart. 2015. Separated by an Un-Common Language: Towards Judgment Language Informed Vector Space Modeling. arXiv preprint arXiv:1508.00106v5.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691. Association for Computational Linguistics.

Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding Neural Networks through Representation Erasure. arXiv preprint arXiv:1612.08220v3.

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2018. Deep Text Classification Can Be Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4208–4215. International Joint Conferences on Artificial Intelligence Organization.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.

Zachary C. Lipton. 2016. The Mythos of Model Interpretability. In ICML Workshop on Human Interpretability of Machine Learning.

Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith. 2018. LSTMs Exploit Linguistic Attributes of Data. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 180–186. Association for Computational Linguistics.

Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-Box Attacks. In International Conference on Learning Representations (ICLR).

Thang Luong, Richard Socher, and Christopher Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113. Association for Computational Linguistics.

Jean Maillard and Stephen Clark. 2018. Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing. In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pages 13–18. Association for Computational Linguistics.

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8. Association for Computational Linguistics.

R. Thomas McCoy, Robert Frank, and Tal Linzen. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.

Risto Miikkulainen and Michael G. Dyer. 1991. Natural Language Processing with Modular PDP Networks and Distributed Lexicon. Cognitive Science, 15(3):343–399.

Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding Hidden Memories of Recurrent Neural Networks. In IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).

Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15.

Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. 2018. Did the Model Understand the Question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.

James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. 2018. Explainable Prediction of Medical Codes from Clinical Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111. Association for Computational Linguistics.

W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs. In International Conference on Learning Representations.

Brian Murphy, Partha Talukdar, and Tom Mitchell. 2012. Learning Effective and Interpretable Semantic Models Using Non-Negative Sparse Embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2015. Exploring How Deep Neural Networks Form Phonemic Categories. In Interspeech 2015.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2016. On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. In Interspeech 2016, pages 803–807.

Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353. Association for Computational Linguistics.

Nina Narodytska and Shiva Kasiviswanathan. 2017. Simple Black-Box Adversarial Attacks on Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318.

Lars Niklasson and Fredrik Linåker. 2000. Distributed representations for extended syntactic transformation. Connection Science, 12(3–4):299–314.

Tong Niu and Mohit Bansal. 2018. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 486–496. Association for Computational Linguistics.

Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in Machine Learning: From Phenomena to Black-Box Attacks Using Adversarial Samples. arXiv preprint arXiv:1605.07277v1.

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 506–519, New York, NY, USA. ACM.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting Adversarial Input Sequences for Recurrent Neural Networks. In Military Communications Conference, MILCOM 2016, pages 49–54. IEEE.

Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. 2018. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Sungjoon Park, JinYeong Bak, and Alice Oh. 2017. Rotated Word Vector Representations and Their Interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411. Association for Computational Linguistics.

Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting Contextual Word Embeddings: Architecture and Representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509. Association for Computational Linguistics.

Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81. Association for Computational Linguistics.

Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191. Association for Computational Linguistics.

Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016a. Analyzing Linguistic Knowledge in Sequential Model of Sentence. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 826–835, Austin, Texas. Association for Computational Linguistics.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016b. Investigating Language Universal and Specific Properties in Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1478–1488, Berlin, Germany. Association for Computational Linguistics.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically Equivalent Adversarial Rules for Debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865. Association for Computational Linguistics.

Matīss Rikters. 2018. Debugging Neural Machine Translations. arXiv preprint arXiv:1808.02733v1.

Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19. Association for Computational Linguistics.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2016. Reasoning about Entailment with Neural Attention. In International Conference on Learning Representations (ICLR).

Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult. 2016. Adversarial Diversity and Hard Positive Generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–32.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14. Association for Computational Linguistics.

D. E. Rumelhart and J. L. McClelland. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2, chapter On Learning the Past Tenses of English Verbs, pages 216–271. MIT Press, Cambridge, MA, USA.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389. Association for Computational Linguistics.

Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 3281–3287. AAAI Press.

Suranjana Samanta and Sameep Mehta. 2017. Towards Crafting Text Adversarial Samples. arXiv preprint arXiv:1707.02812v1.

Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel. 2018. Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985. Association for Computational Linguistics.

Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. 2018. Interpretable Adversarial Perturbation in Input Embedding Space for Text. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4323–4330. International Joint Conferences on Artificial Intelligence Organization.

Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. 2018. Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Rico Sennrich. 2017. How Grammatical Is Character-Level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 376–382. Association for Computational Linguistics.

Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, and Jian Sun. 2018. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3715–3727. Association for Computational Linguistics.

Xing Shi, Kevin Knight, and Deniz Yuret. 2016a. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282. Association for Computational Linguistics.

Xing Shi, Inkit Padhi, and Kevin Knight. 2016b. Does String-Based Neural MT Learn Source Syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1526–1534, Austin, Texas. Association for Computational Linguistics.

Chandan Singh, W. James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337v1.

Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018a. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. arXiv preprint arXiv:1804.09299v1.

Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018b. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676.

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia. PMLR.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

Mirac Suzgun, Yonatan Belinkov, and Stuart M. Shieber. 2019. On Evaluating the Generalization of LSTM Models in Formal Languages. In Proceedings of the Society for Computation in Linguistics (SCiL).

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).

Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018. An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 26–35. Association for Computational Linguistics.

Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2018. CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).

Ke Tran, Arianna Bisazza, and Christof Monz. 2018. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4731–4736. Association for Computational Linguistics.

Eva Vanmassenhove, Jinhua Du, and Andy Way. 2017. Investigating "Aspect" in NMT and SMT: Translating the English Simple Past and Present Perfect. Computational Linguistics in the Netherlands Journal, 7:109–128.

Sara Veldhoen, Dieuwke Hupkes, and Willem Zuidema. 2016. Diagnostic Classifiers: Revealing How Neural Networks Process Hierarchical Structure. In CEUR Workshop Proceedings.

Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. Context-Aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274. Association for Computational Linguistics.

Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. 2016. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation. arXiv preprint arXiv:1606.04217v1.

Alex Wang, Amapreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461v1.

Shuai Wang, Yanmin Qian, and Kai Yu. 2017a. What Does the Speaker Embedding Encode? In Interspeech 2017, pages 1497–1501.

Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018b. A Tree-Based Decoder for Neural Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels, Belgium.

Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi Lee. 2017b. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries. In Interspeech 2017.

Gail Weiss, Yoav Goldberg, and Eran Yahav. 2018. On the Practical Computational Power of Finite Precision RNNs for Language Recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745. Association for Computational Linguistics.

Adina Williams, Andrew Drozdov, and Samuel R. Bowman. 2018. Do latent tree learning models identify meaningful structure in sentences? Transactions of the Association for Computational Linguistics, 6:253–267.

Zhizheng Wu and Simon King. 2016. Investigating gated recurrent networks for speech synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5140–5144. IEEE.

Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael I. Jordan. 2018. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. arXiv preprint arXiv:1805.12316v1.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2016. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. Transactions of the Association for Computational Linguistics, 4:259–272.

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107v3.

Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using "Annotator Rationales" to Improve Machine Learning for Text Categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267. Association for Computational Linguistics.

Quan-shi Zhang and Song-chun Zhu. 2018. Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.

Ye Zhang, Iain Marshall, and Byron C. Wallace. 2016. Rationale-Augmented Convolutional Neural Networks for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 795–804. Association for Computational Linguistics.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20. Association for Computational Linguistics.

Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander Rush, and Yann LeCun. 2018b. Adversarially Regularized Autoencoders. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5902–5911, Stockholmsmässan, Stockholm, Sweden. PMLR.

Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018c. Generating Natural Adversarial Examples. In International Conference on Learning Representations.
