Analysis Methods in Neural Language Processing: A Survey

Yonatan Belinkov1,2 and James Glass1

1MIT Computer Science and Artificial Intelligence Laboratory, 2Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA
{belinkov, glass}@mit.edu

Transactions of the Association for Computational Linguistics, vol. 7, pp. 49–72, 2019. Action Editor: Marco Baroni. Submission batch: 10/2018; Revision batch: 12/2018; Published 3/2019. © 2019 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Abstract

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

1 Introduction

The rise of deep learning has transformed the field of natural language processing (NLP) in recent years. Models based on neural networks have obtained impressive improvements in various tasks, including language modeling (Mikolov et al., 2010; Jozefowicz et al., 2016), syntactic parsing (Kiperwasser and Goldberg, 2016), machine translation (MT) (Bahdanau et al., 2014; Sutskever et al., 2014), and many other tasks; see Goldberg (2017) for example success stories.

This progress has been accompanied by a myriad of new neural network architectures. In many cases, traditional feature-rich systems are being replaced by end-to-end neural networks that aim to map input text to some output prediction. As end-to-end systems are gaining prevalence, one may point to two trends. First, some push back against the abandonment of linguistic knowledge and call for incorporating it inside the networks in different ways.1 Others strive to better understand how NLP models work. This theme of analyzing neural networks has connections to the broader work on interpretability in machine learning, along with specific characteristics of the NLP field.

1 See, for instance, Noah Smith's invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)

Why should we analyze our neural NLP models? To some extent, this question falls into the larger question of interpretability in machine learning, which has been the subject of much debate in recent years.2 Arguments in favor of interpretability in machine learning usually mention goals like accountability, trust, fairness, safety, and reliability (Doshi-Velez and Kim, 2017; Lipton, 2016). Arguments against interpretability typically stress performance as the most important desideratum. All these arguments naturally apply to machine learning applications in NLP.

2 See, for example, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)

In the context of NLP, this question needs to be understood in light of earlier NLP work, often referred to as feature-rich or feature-engineered systems. In some of these systems, features are more easily understood by humans—they can be morphological properties, lexical classes, syntactic categories, semantic relations, etc. In theory, one could observe the importance assigned by statistical NLP models to such features in order to gain a better understanding of the model.3

3 Nevertheless, one could question how feasible such an analysis is; consider, for example, interpreting support vectors in high-dimensional support vector machines (SVMs).

In contrast, it is more difficult to understand what happens in an end-to-end neural network model that takes input (say, word embeddings) and generates an output (say, a sentence classification). Much of the analysis work thus aims to understand how linguistic concepts that were common as features in NLP systems are captured in neural networks.

As the analysis of neural networks for language is becoming more and more prevalent, neural networks in various NLP tasks are being analyzed; different network architectures and components are being compared, and a variety of new analysis methods are being developed. This survey aims to review and summarize this body of work, highlight current trends, and point to existing lacunae. It organizes the literature into several themes. Section 2 reviews work that targets a fundamental question: What kind of linguistic information is captured in neural networks? We also point to limitations in current methods for answering this question. Section 3 discusses visualization methods, and emphasizes the difficulty in evaluating visualization work. In Section 4, we discuss the compilation of challenge sets, or test suites, for fine-grained evaluation, a methodology that has old roots in NLP. Section 5 deals with the generation and use of adversarial examples to probe weaknesses of neural networks. We point to unique characteristics of dealing with text as a discrete input and how different studies handle them. Section 6 summarizes work on explaining model predictions, an important goal of interpretability research. This is a relatively underexplored area, and we call for more work in this direction. Section 7 mentions a few other methods that do not fall neatly into one of the above themes. In the conclusion, we summarize the main gaps and potential research directions for the field.

The paper is accompanied by online supplementary materials that contain detailed references for studies corresponding to Sections 2, 4, and 5 (Tables SM1, SM2, and SM3, respectively), available at https://boknilev.github.io/nlp-analysis-methods.

Before proceeding, we briefly mention some earlier work of a similar spirit.

A Historical Note Reviewing the vast literature on neural networks for language is beyond our scope.4 However, we mention here a few representative studies that focused on analyzing such networks in order to illustrate how recent trends have roots that go back to before the recent deep learning revival.

4 For instance, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.

Rumelhart and McClelland (1986) built a feedforward neural network for learning the English past tense and analyzed its performance on a variety of examples and conditions. They were especially concerned with the performance over the course of training, as their goal was to model the past form acquisition in children. They also analyzed a scaled-down version having eight input units and eight output units, which allowed them to describe it exhaustively and examine how certain rules manifest in network weights.

In his seminal work on recurrent neural networks (RNNs), Elman trained networks on synthetic sentences in a language prediction task (Elman, 1989, 1990, 1991). Through extensive analyses, he showed how networks discover the notion of a word when predicting characters; capture syntactic structures like number agreement; and acquire word representations that reflect lexical and syntactic categories. Similar analyses were later applied to other networks and tasks (Harris, 1990; Niklasson and Linåker, 2000; Pollack, 1990; Frank et al., 2013).

While Elman's work was limited in some ways, such as evaluating generalization or various linguistic phenomena—as Elman himself recognized (Elman, 1989)—it introduced methods that are still relevant today: from visualizing network activations in time, through clustering words by hidden state activations, to projecting representations to dimensions that emerge as capturing properties like sentence number or verb valency. The sections on visualization (Section 3) and identifying linguistic information (Section 2) contain many examples for these kinds of analysis.

2 What Linguistic Information Is Captured in Neural Networks?

Neural network models in NLP are typically trained in an end-to-end manner on input–output pairs, without explicitly encoding linguistic features.

Thus, a primary question is the following: What linguistic information is captured in neural networks? When examining answers to this question, it is convenient to consider three dimensions: which methods are used for conducting the analysis, what kind of linguistic information is sought, and which objects in the neural network are being investigated. Table SM1 (in the supplementary materials) categorizes relevant analysis work according to these criteria. In the next subsections, we discuss trends in analysis work along these lines, followed by a discussion of limitations of current approaches.

2.1 Methods

The most common approach for associating neural network components with linguistic properties is to predict such properties from activations of the neural network. Typically, in this approach a neural network model is trained on some task (say, MT) and its weights are frozen. Then, the trained model is used for generating feature representations for another task by running it on a corpus with linguistic annotations and recording the representations (say, hidden state activations). Another classifier is then used for predicting the property of interest (say, part-of-speech [POS] tags). The performance of this classifier is used for evaluating the quality of the generated representations, and by proxy that of the original model. This kind of approach has been used in numerous papers in recent years; see Table SM1 for references.5 It is referred to by various names, including "auxiliary prediction tasks" (Adi et al., 2017b), "diagnostic classifiers" (Veldhoen et al., 2016), and "probing tasks" (Conneau et al., 2018).

5 A similar method has been used to analyze hierarchical structure in neural networks trained on arithmetic expressions (Veldhoen et al., 2016; Hupkes et al., 2018).

As an example of this approach, let us walk through an application to analyzing syntax in neural machine translation (NMT) by Shi et al. (2016b). In this work, two NMT models were trained on standard parallel data—English→French and English→German. The trained models (specifically, the encoders) were run on an annotated corpus and their hidden states were used for training a logistic regression classifier that predicts different syntactic properties. The authors concluded that the NMT encoders learn significant syntactic information at both word level and sentence level. They also compared representations at different encoding layers and found that "local features are somehow preserved in the lower layer whereas more global, abstract information tends to be stored in the upper layer." These results demonstrate the kind of insights that the classification analysis may lead to, especially when comparing different models or model components.
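To make the procedure concrete, the following is a minimal sketch of such a probing setup, assuming scikit-learn is available; the encode function is a hypothetical placeholder for a frozen, pretrained encoder (e.g., an NMT encoder), and the corpus is a toy stand-in for a POS-annotated dataset. It illustrates the general recipe rather than reproducing any particular paper's implementation.

    # Probing sketch: predict a linguistic property (POS tags) from
    # frozen model activations with a simple classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    def encode(tokens):
        # Hypothetical stand-in: run the frozen model on the tokens and
        # record one hidden-state vector per token (dimension 512 here).
        return rng.normal(size=(len(tokens), 512))

    corpus = [(["the", "cat", "sat"], ["DET", "NOUN", "VERB"]),
              (["dogs", "bark"], ["NOUN", "VERB"])]

    X = np.vstack([encode(tokens) for tokens, _ in corpus])  # activations
    y = np.concatenate([tags for _, tags in corpus])         # linguistic labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Held-out accuracy of the probe serves as a measure of how much POS
    # information the frozen representations encode.
    print("probing accuracy:", probe.score(X_test, y_test))

In practice the probe would be trained on large annotated corpora; with the random activations above, accuracy should hover around chance, which is itself the relevant baseline in such analyses.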
Other methods for finding correspondences between parts of the neural network and certain properties include counting how often attention weights agree with a linguistic property like anaphora resolution (Voita et al., 2018) or directly computing correlations between neural network activations and some property; for example, correlating RNN state activations with depth in a syntactic tree (Qian et al., 2016a) or with Mel-frequency cepstral coefficient (MFCC) acoustic features (Wu and King, 2016). Such correspondence may also be computed indirectly. For instance, Alishahi et al. (2017) defined an ABX discrimination task to evaluate how a neural model of speech (grounded in vision) encoded phonology. Given phoneme representations from different layers in their model, and three phonemes, A, B, and X, they compared whether the model representation for X is closer to A or B. This discrimination task enabled them to draw conclusions about which layers encode phonology better, observing that lower layers generally encode more phonological information.

2.2 Linguistic Phenomena

Different kinds of linguistic information have been analyzed, ranging from basic properties like sentence length, word position, word presence, or simple word order, to morphological, syntactic, and semantic information. Phonetic/phonemic information, speaker information, and style and accent information have been studied in neural network models for speech, or in joint audio-visual models. See Table SM1 for references.

While it is difficult to synthesize a holistic picture from this diverse body of work, it appears that neural networks are able to learn a substantial amount of information on various linguistic phenomena. These models are especially successful at capturing frequent properties, while some rare properties are more difficult to learn.

Linzen et al. (2016), for instance, found that long short-term memory (LSTM) language models are able to capture subject–verb agreement in many common cases, while direct supervision is required for solving harder cases.

Another theme that emerges in several studies is the hierarchical nature of the learned representations. We have already mentioned such findings regarding NMT (Shi et al., 2016b) and a visually grounded speech model (Alishahi et al., 2017). Hierarchical representations of syntax were also reported to emerge in other RNN models (Blevins et al., 2018).

Finally, a couple of papers discovered that models trained with latent trees perform better on natural language inference (NLI) (Williams et al., 2018; Maillard and Clark, 2018) than ones trained with linguistically annotated trees. Moreover, the trees in these models do not resemble syntactic trees corresponding to known linguistic theories, which casts doubts on the importance of syntax-learning in the underlying neural network.6

6 Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).

2.3 Neural Network Components

In terms of the object of study, various neural network components were investigated, including word embeddings, RNN hidden states or gate activations, sentence embeddings, and attention weights in sequence-to-sequence (seq2seq) models. Generally less work has analyzed convolutional neural networks in NLP, but see Jacovi et al. (2018) for a recent exception. In speech processing, researchers have analyzed layers in deep neural networks for speech recognition and different speaker embeddings. Some analysis has also been devoted to joint language–vision or audio–vision models, or to similarities between word embeddings and convolutional image representations. Table SM1 provides detailed references.
2.4 Limitations

The classification approach may find that a certain amount of linguistic information is captured in the neural network. However, this does not necessarily mean that the information is used by the network. For example, Vanmassenhove et al. (2017) investigated aspect in NMT (and in phrase-based statistical MT). They trained a classifier on NMT sentence encoding vectors and found that they can accurately predict tense about 90% of the time. However, when evaluating the output translations, they found them to have the correct tense only 79% of the time. They interpreted this result to mean that "part of the aspectual information is lost during decoding." Relatedly, Cífka and Bojar (2018) compared the performance of various NMT models in terms of translation quality (BLEU) and representation quality (classification tasks). They found a negative correlation between the two, suggesting that high-quality systems may not be learning certain sentence meanings. In contrast, Artetxe et al. (2018) showed that word embeddings contain divergent linguistic information, which can be uncovered by applying a linear transformation on the learned embeddings. Their results suggest an alternative explanation, showing that "embedding models are able to encode divergent linguistic information but have limits on how this information is surfaced."

From a methodological point of view, most of the relevant analysis work is concerned with correlation: How correlated are neural network components with linguistic properties? What may be lacking is a measure of causation: How does the encoding of linguistic properties affect the system output? Giulianelli et al. (2018) make some headway on this question. They predicted number agreement from RNN hidden states and gates at different time steps. They then intervened in how the model processes the sentence by changing a hidden activation based on the difference between the prediction and the correct label. This improved agreement prediction accuracy, and the effect persisted over the course of the sentence, indicating that this information has an effect on the model. However, they did not report the effect on overall model quality, for example by measuring perplexity. Methods from causal inference may shed new light on some of these questions.
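The following schematic PyTorch fragment illustrates the kind of intervention just described, under simplifying assumptions: probe is a diagnostic classifier that is assumed to be already trained on hidden states, h stands in for one RNN hidden state, and the step size is arbitrary. It is a sketch of the general idea, not the exact procedure of Giulianelli et al. (2018).

    import torch
    import torch.nn.functional as F

    hidden_dim, num_labels, step = 650, 2, 0.5
    probe = torch.nn.Linear(hidden_dim, num_labels)  # assumed pre-trained
    h = torch.randn(hidden_dim, requires_grad=True)  # stand-in hidden state
    gold = torch.tensor([1])                         # correct agreement label

    # Gradient of the probe's loss with respect to the hidden state.
    loss = F.cross_entropy(probe(h).unsqueeze(0), gold)
    loss.backward()

    # Nudge the hidden state toward what the probe expects for the gold
    # label; the modified state is then fed back into the unrolled network.
    with torch.no_grad():
        h_modified = h - step * h.grad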

Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.

Finally, the predictor for the auxiliary task is usually a simple classifier, such as logistic regression. A few studies compared different classifiers and found that deeper classifiers lead to overall better results, but do not alter the respective trends when comparing different models or components (Qian et al., 2016b; Belinkov, 2018). Interestingly, Conneau et al. (2018) found that tasks requiring more nuanced linguistic knowledge (e.g., tree depth, coordination inversion) gain the most from using a deeper classifier. However, the approach is usually taken for granted; given its prevalence, better theoretical or empirical foundations are needed.

3 Visualization

Visualization is a valuable tool for analyzing neural networks in the language domain and beyond. Early work visualized hidden unit activations in RNNs trained on an artificial language modeling task, and observed how they correspond to certain grammatical relations such as agreement (Elman, 1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al., 2015; Kádár et al., 2017; Qian et al., 2016a; Liu et al., 2018) and speech (Wu and King, 2016; Nagamine et al., 2015; Wang et al., 2017b). Figure 1 shows an example visualization of a neuron that captures position of words in a sentence. The heatmap uses blue and red colors for negative and positive activation values, respectively, enabling the user to quickly grasp the function of this neuron.

Figure 2: A visualization of attention weights, showing soft alignment between source and target sentences in an NMT model. Reproduced from Bahdanau et al. (2014), with permission.

The attention mechanism that originated in work on NMT (Bahdanau et al., 2014) also lends itself to a natural visualization. The alignments obtained via different attention mechanisms have produced visualizations ranging from tasks like NLI (Rocktäschel et al., 2016; Yin et al., 2016), summarization (Rush et al., 2015), MT post-editing (Jauregi Unanue et al., 2018), and morphological inflection (Aharoni and Goldberg, 2017) to matching users on social media (Tay et al., 2018). Figure 2 reproduces a visualization of attention alignments from the original work by Bahdanau et al. Here grayscale values correspond to the weight of the attention between words in an English source sentence (columns) and its French translation (rows). As Bahdanau et al. explain, this visualization demonstrates that the NMT model learned a soft alignment between source and target words. Some aspects of word order may also be noticed, as in the reordering of noun and adjective when translating the phrase "European Economic Area."

Another line of work computes various saliency measures to attribute predictions to input features. The important or salient features can then be visualized in selected examples (Li et al., 2016a; Aubakirova and Bansal, 2016; Sundararajan et al., 2017; Arras et al., 2017a,b; Ding et al., 2017; Murdoch et al., 2018; Mudrakarta et al., 2018; Montavon et al., 2018; Godin et al., 2018). Saliency can also be computed with respect to intermediate values, rather than input features (Ghaeini et al., 2018).7

7 Generally, many of the visualization methods are adapted from the vision domain, where they have been extremely popular; see Zhang and Zhu (2018) for a survey.

An instructive visualization technique is to cluster neural network activations and compare them to some linguistic property. Early work clustered RNN activations, showing that they organize in lexical categories (Elman, 1989, 1990). Similar techniques have been followed by others. Recent examples include clustering of sentence embeddings in an RNN encoder trained in a multitask learning scenario (Brunner et al., 2017), and phoneme clusters in a joint audio-visual RNN model (Alishahi et al., 2017).
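As a concrete illustration of the heatmap style shown in Figure 1, the following minimal matplotlib sketch plots one neuron's activation per token with a diverging blue-to-red colormap; the activation values are fabricated (a linear ramp mimicking a "position" neuron).

    import numpy as np
    import matplotlib.pyplot as plt

    tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "dog"]
    activations = np.linspace(-1, 1, len(tokens))  # fabricated "position" neuron

    fig, ax = plt.subplots(figsize=(8, 1.5))
    # Blue for negative and red for positive values, as in Figure 1.
    ax.imshow(activations[np.newaxis, :], cmap="coolwarm", vmin=-1, vmax=1,
              aspect="auto")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens)
    ax.set_yticks([])
    plt.show()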

A few online tools for visualizing neural networks have recently become available. LSTMVis (Strobelt et al., 2018b) visualizes RNN activations, focusing on tracing hidden state dynamics.8 Seq2Seq-Vis (Strobelt et al., 2018a) visualizes different modules in attention-based seq2seq models, with the goal of examining model decisions and testing alternative decisions. Another tool focused on comparing attention alignments was proposed by Rikters (2018). It also provides translation confidence scores based on the distribution of attention weights. NeuroX (Dalvi et al., 2019b) is a tool for finding and analyzing individual neurons, focusing on machine translation.

8 RNNVis (Ming et al., 2017) is a similar tool, but its online demo does not seem to be available at the time of writing.

Evaluation As in much work on interpretability, evaluating visualization quality is difficult and often limited to qualitative examples. A few notable exceptions report human evaluations of visualization quality. Singh et al. (2018) showed human raters hierarchical clusterings of input words generated by two interpretation methods, and asked them to evaluate which method is more accurate, or in which method they trust more. Others reported human evaluations for attention visualization in conversation modeling (Freeman et al., 2018) and medical code prediction tasks (Mullenbach et al., 2018).

The availability of open-source tools of the sort described above will hopefully encourage users to utilize visualization in their regular research and development cycle. However, it remains to be seen how useful visualizations turn out to be.

4 Challenge Sets

The majority of benchmark datasets in NLP are drawn from text corpora, reflecting a natural frequency distribution of language phenomena. While useful in practice for evaluating system performance in the average case, such datasets may fail to capture a wide range of phenomena. An alternative evaluation framework consists of challenge sets, also known as test suites, which have been used in NLP for a long time (Lehmann et al., 1996), especially for evaluating MT systems (King and Falkedal, 1990; Isahara, 1995; Koh et al., 2001). Lehmann et al. (1996) noted several key properties of test suites: systematicity, control over data, inclusion of negative data, and exhaustivity. They contrasted such datasets with test corpora, "whose main advantage is that they reflect naturally occurring data." This idea underlines much of the work on challenge sets and is echoed in more recent work (Wang et al., 2018a). For instance, Cooper et al. (1996) constructed a semantic test suite that targets phenomena as diverse as quantifiers, plurals, anaphora, ellipsis, adjectival properties, and so on.

After a hiatus of a couple of decades,9 challenge sets have recently gained renewed popularity in the NLP community. In this section, we include datasets used for evaluating neural network models that diverge from the common average-case evaluation. Many of them share some of the properties noted by Lehmann et al. (1996), although negative examples (ill-formed data) are typically less utilized. The challenge datasets can be categorized along the following criteria: the task they seek to evaluate, the linguistic phenomena they aim to study, the language(s) they target, their size, their method of construction, and how performance is evaluated.10 Table SM2 (in the supplementary materials) categorizes many recent challenge sets along these criteria. Below we discuss common trends along these lines.

9 One could speculate that their decrease in popularity can be attributed to the rise of large-scale quantitative evaluation of statistical NLP systems.

10 Another typology of evaluation protocols was put forth by Burlot and Yvon (2017). Their criteria are partially overlapping with ours, although they did not provide a comprehensive categorization like the one compiled here.

4.1 Task

By far, the most targeted tasks in challenge sets are NLI and MT. This can partly be explained by the popularity of these tasks and the prevalence of neural models proposed for solving them. Perhaps more importantly, tasks like NLI and MT arguably require inferences at various linguistic levels, making the challenge set evaluation especially attractive. Still, other high-level tasks like reading comprehension or question answering have not received as much attention, and may also benefit from the careful construction of challenge sets.

A significant body of work aims to evaluate the quality of embedding models by correlating the similarity they induce on word or sentence pairs with human similarity judgments. Datasets containing such similarity scores are often used to evaluate word embeddings (Finkelstein et al., 2002; Bruni et al., 2012; Hill et al., 2015, inter alia) or sentence embeddings; see the many shared tasks on semantic textual similarity in SemEval (Cer et al., 2017, and previous editions). Many of these datasets evaluate similarity at a coarse-grained level, but some provide a more fine-grained evaluation of similarity or relatedness. For example, some datasets are dedicated for specific word classes such as verbs (Gerz et al., 2016) or rare words (Luong et al., 2013), or for evaluating compositional knowledge in sentence embeddings (Marelli et al., 2014). Multilingual and cross-lingual versions have also been collected (Leviant and Reichart, 2015; Cer et al., 2017). Although these datasets are widely used, this kind of evaluation has been criticized for its subjectivity and questionable correlation with downstream performance (Faruqui et al., 2016).
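The evaluation protocol itself is simple; the sketch below shows the standard recipe with toy values standing in for a real embedding model and a human-rated dataset such as WordSim-353: compute model similarities for the rated pairs and report the Spearman rank correlation with the human scores.

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Toy embeddings and human similarity judgments (illustrative values).
    emb = {"cup": np.array([0.9, 0.1]),
           "mug": np.array([0.8, 0.3]),
           "car": np.array([0.1, 0.9]),
           "tree": np.array([0.2, 0.7])}
    pairs = [("cup", "mug", 9.2), ("cup", "car", 2.1), ("car", "tree", 3.0)]

    model_scores = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
    human_scores = [score for _, _, score in pairs]

    # Spearman's rank correlation is the usual figure of merit.
    rho, _ = spearmanr(model_scores, human_scores)
    print("Spearman's rho:", rho)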
4.2 Linguistic Phenomena

One of the primary goals of challenge sets is to evaluate models on their ability to handle specific linguistic phenomena. While earlier studies emphasized exhaustivity (Cooper et al., 1996; Lehmann et al., 1996), recent ones tend to focus on a few properties of interest. For example, Sennrich (2017) introduced a challenge set for MT evaluation focusing on five properties: subject–verb agreement, noun phrase agreement, verb–particle constructions, polarity, and transliteration. Slightly more elaborated is an MT challenge set for morphology, including 14 morphological properties (Burlot and Yvon, 2017). See Table SM2 for references to datasets targeting other phenomena.

Other challenge sets cover a more diverse range of linguistic properties, in the spirit of some of the earlier work. For instance, extending the categories in Cooper et al. (1996), the GLUE analysis set for NLI covers more than 30 phenomena in four coarse categories (lexical semantics, predicate–argument structure, logic, and knowledge). In MT evaluation, Burchardt et al. (2017) reported results using a large test suite covering 120 phenomena, partly based on Lehmann et al. (1996).11 Isabelle et al. (2017) and Isabelle and Kuhn (2018) prepared challenge sets for MT evaluation covering fine-grained phenomena at morpho-syntactic, syntactic, and lexical levels.

11 Their dataset does not seem to be available yet, but more details are promised to appear in a future publication.

Generally, datasets that are constructed programmatically tend to cover less fine-grained linguistic properties, while manually constructed datasets represent more diverse phenomena.

4.3 Languages

As unfortunately usual in much NLP work, especially neural NLP, the vast majority of challenge sets are in English. This situation is slightly better in MT evaluation, where naturally all datasets feature other languages (see Table SM2). A notable exception is the work by Gulordava et al. (2018), who constructed examples for evaluating number agreement in language modeling in English, Russian, Hebrew, and Italian. Clearly, there is room for more challenge sets in non-English languages. However, perhaps more pressing is the need for large-scale non-English datasets (besides MT) to develop neural models for popular NLP tasks.

4.4 Scale

The size of proposed challenge sets varies greatly (Table SM2). As expected, datasets constructed by hand are smaller, with typical sizes in the hundreds. Automatically built datasets are much larger, ranging from several thousands to close to a hundred thousand (Sennrich, 2017), or even more than one million examples (Linzen et al., 2016). In the latter case, the authors argue that such a large test set is needed for obtaining a sufficient representation of rare cases. A few manually constructed datasets contain a fairly large number of examples, up to 10 thousand (Burchardt et al., 2017).

4.5 Construction Method

Challenge sets are usually created either programmatically or manually, by handcrafting specific examples. Often, semi-automatic methods are used to compile an initial list of examples that is manually verified by annotators. The specific method also affects the kind of language use and how natural or artificial/synthetic the examples are. We describe here some trends in dataset construction methods in the hope that they may be useful for researchers contemplating new datasets.

Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sanchez et al. (2018) and Glockner et al. (2018) extracted examples from SNLI (Bowman et al., 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al. (2016), on the other hand, extracted examples of subject–verb agreement from raw texts using heuristics, resulting in a large-scale dataset. Gulordava et al. (2018) extended this to other agreement phenomena, but they relied on syntactic information available in treebanks, resulting in a smaller dataset.

Several challenge sets utilize existing test suites, either as a direct source of examples (Burchardt et al., 2017) or for searching similar naturally occurring examples (Wang et al., 2018a).12

12 Wang et al. (2018a) also verified that their examples do not contain annotation artifacts, a potential problem noted in recent studies (Gururangan et al., 2018; Poliak et al., 2018b).

Sennrich (2017) introduced a method for evaluating NMT systems via contrastive translation pairs, where the system is asked to estimate the probability of two candidate translations that are designed to reflect specific linguistic properties. Sennrich generated such pairs programmatically by applying simple heuristics, such as changing gender and number to induce agreement errors, resulting in a large-scale challenge set of close to 100 thousand examples. This framework was extended to evaluate other properties, but often requiring more sophisticated generation methods like using morphological analyzers/generators (Burlot and Yvon, 2017) or more manual involvement in generation (Bawden et al., 2018) or verification (Rios Gonzales et al., 2017).

Finally, a few studies define templates that capture certain linguistic properties and instantiate them with word lists (Dasgupta et al., 2018; Rudinger et al., 2018; Zhao et al., 2018a). Template-based generation has the advantage of providing more control, for example for obtaining a specific vocabulary distribution, but this comes at the expense of how natural the examples are.
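As an illustration of the template-based approach, the sketch below instantiates a single subject–verb agreement template with small word lists, producing grammatical/ungrammatical contrastive pairs; the template and word lists are invented for illustration, not taken from the cited datasets.

    from itertools import product

    subjects = [("The author", "sg"), ("The authors", "pl")]
    verbs = {"sg": "laughs", "pl": "laugh"}
    modifiers = ["", " near the senators"]  # attractors between subject and verb

    examples = []
    for (subject, number), modifier in product(subjects, modifiers):
        wrong = "pl" if number == "sg" else "sg"
        examples.append({
            "grammatical": f"{subject}{modifier} {verbs[number]}.",
            "ungrammatical": f"{subject}{modifier} {verbs[wrong]}.",
        })

    for example in examples:
        print(example["grammatical"], "|", example["ungrammatical"])

The controlled vocabulary makes it easy to balance word frequencies and isolate the phenomenon of interest, at the cost of naturalness, as noted above.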
4.6 Evaluation

Systems are typically evaluated by their performance on the challenge set examples, either with the same metric used for evaluating the system in the first place, or via a proxy, as in the contrastive pairs evaluation of Sennrich (2017). Automatic evaluation metrics are cheap to obtain and can be calculated on a large scale. However, they may miss certain aspects. Thus a few studies report human evaluation on their challenge sets, such as in MT (Isabelle et al., 2017; Burchardt et al., 2017).

We note here also that judging the quality of a model by its performance on a challenge set can be tricky. Some authors emphasize their wish to test systems on extreme or difficult cases, "beyond normal operational capacity" (Naik et al., 2018). However, whether one should expect systems to perform well on specially chosen cases (as opposed to the average case) may depend on one's goals. To put results in perspective, one may compare model performance to human performance on the same task (Gulordava et al., 2018).

5 Adversarial Examples

Understanding a model also requires an understanding of its failures. Despite their success in many tasks, machine learning systems can also be very sensitive to malicious attacks or adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015). In the vision domain, small changes to the input image can lead to misclassification, even if such changes are indistinguishable by humans.

The basic setup in work on adversarial examples can be described as follows.13 Given a neural network model f and an input example x, we seek to generate an adversarial example x′ that will have a minimal distance from x, while being assigned a different label by f:

    min_{x′} ||x − x′||
    s.t. f(x) = l, f(x′) = l′, l ≠ l′

13 The notation here follows Yuan et al. (2017).

In the vision domain, x can be the input image pixels, resulting in a fairly intuitive interpretation of this optimization problem: measuring the distance ||x − x′|| is straightforward, and finding x′ can be done by computing gradients with respect to the input, since all quantities are continuous. In the text domain, the input is discrete (for example, a sequence of words), which poses two problems. First, it is not clear how to measure

the distance between the original and adversarial examples, x and x′, which are two discrete objects (say, two words or sentences). Second, minimizing this distance cannot be easily formulated as an optimization problem, as this requires computing gradients with respect to a discrete input.

In the following, we review methods for handling these difficulties according to several criteria: the adversary's knowledge, the specificity of the attack, the linguistic unit being modified, and the task on which the attacked model was trained.14 Table SM3 (in the supplementary materials) categorizes work on adversarial examples in NLP according to these criteria.

14 These criteria are partly taken from Yuan et al. (2017), where a more elaborate taxonomy is laid out. At present, though, the work on adversarial examples in NLP is more limited than in computer vision, so our criteria will suffice.

5.1 Adversary's Knowledge

Adversarial examples can be generated using access to model parameters, also known as white-box attacks, or without such access, with black-box attacks (Papernot et al., 2016a, 2017; Narodytska and Kasiviswanathan, 2017; Liu et al., 2017).

White-box attacks are difficult to adapt to the text world as they typically require computing gradients with respect to the input, which would be discrete in the text case. One option is to compute gradients with respect to the input word embeddings, and perturb the embeddings. Since this may result in a vector that does not correspond to any word, one could search for the closest word embedding in a given dictionary (Papernot et al., 2016b); Cheng et al. (2018) extended this idea to seq2seq models. Others computed gradients with respect to input word embeddings to identify and rank words to be modified (Samanta and Mehta, 2017; Liang et al., 2018). Ebrahimi et al. (2018b) developed an alternative method by representing text edit operations in vector space (e.g., a binary vector specifying which characters in a word would be changed) and approximating the change in loss with the derivative along this vector.

Given the difficulty in generating white-box adversarial examples for text, much research has been devoted to black-box examples. Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so on (Sakaguchi et al., 2017; Heigold et al., 2018; Belinkov and Bisk, 2018). Gao et al. (2018) defined scoring functions to identify tokens to modify. Their functions do not require access to model internals, but they do require the model prediction score. After identifying the important tokens, they modify characters with common edit operations.

Zhao et al. (2018c) used generative adversarial networks (GANs) (Goodfellow et al., 2014) to minimize the distance between latent representations of input and adversarial examples, and performed perturbations in latent space. Since the latent representations do not need to come from the attacked model, this is a black-box attack.

Finally, Alzantot et al. (2018) developed an interesting population-based genetic algorithm for crafting adversarial examples for text classification by maintaining a population of modifications of the original sentence and evaluating fitness of modifications at each generation. They do not require access to model parameters, but do use prediction scores. A similar idea was proposed by Kuleshov et al. (2018).

5.2 Attack Specificity

Adversarial attacks can be classified to targeted vs. non-targeted attacks (Yuan et al., 2017). A targeted attack specifies a specific false class, l′, while a non-targeted attack cares only that the predicted class is wrong, l′ ≠ l. Targeted attacks are more difficult to generate, as they typically require knowledge of model parameters; that is, they are white-box attacks. This might explain why the majority of adversarial examples in NLP are non-targeted (see Table SM3). A few targeted attacks include Liang et al. (2018), which specified a desired class to fool a text classifier, and Chen et al. (2018a), which specified words or captions to generate in an image captioning model. Others targeted specific words to omit, replace, or include when attacking seq2seq models (Cheng et al., 2018; Ebrahimi et al., 2018a).

Methods for generating targeted attacks in NLP could possibly take more inspiration from adversarial attacks in other fields. For instance, in attacking malware detection systems, several studies developed targeted attacks in a black-box scenario (Yuan et al., 2017). A black-box targeted attack for MT was proposed by Zhao et al. (2018c), who used GANs to search for attacks on Google's MT system after mapping sentences into continuous space with adversarially regularized autoencoders (Zhao et al., 2018b).
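A schematic black-box attack of the simplest kind discussed above can be written in a few lines: greedily try single-character edits and keep one that flips the model's prediction. The predict function is a hypothetical stand-in for querying the attacked classifier; no gradients or model internals are used. This is a toy illustration, far cruder than the scoring functions or genetic search of the cited work.

    import string

    def predict(text):
        # Hypothetical black-box model query returning a label.
        return "positive" if "good" in text else "negative"

    def attack(text, max_queries=500):
        original = predict(text)
        queries = 0
        chars = list(text)
        for i in range(len(chars)):
            for c in string.ascii_lowercase:
                candidate = "".join(chars[:i] + [c] + chars[i + 1:])
                queries += 1
                if queries > max_queries:
                    return None
                if predict(candidate) != original:
                    return candidate  # one-character adversarial example
        return None

    print(attack("a good movie"))  # a single typo flips the toy model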

5.3 Linguistic Unit

Most of the work on adversarial text examples involves modifications at the character- and/or word-level; see Table SM3 for specific references. Other transformations include adding sentences or text chunks (Jia and Liang, 2017) or generating paraphrases with desired syntactic structures (Iyyer et al., 2018). In image captioning, Chen et al. (2018a) modified pixels in the input image to generate targeted attacks on the caption text.

5.4 Task

Generally, most work on adversarial examples in NLP concentrates on relatively high-level language understanding tasks, such as text classification (including sentiment analysis) and reading comprehension, while work on text generation focuses mainly on MT. See Table SM3 for references. There is relatively little work on adversarial examples for more low-level language processing tasks, although one can mention morphological tagging (Heigold et al., 2018) and spelling correction (Sakaguchi et al., 2017).

5.5 Coherence and Perturbation Measurement

In adversarial image examples, it is fairly straightforward to measure the perturbation, either by measuring distance in pixel space, say ||x − x′|| under some norm, or with alternative measures that are better correlated with human perception (Rozsa et al., 2016). It is also visually compelling to present an adversarial image with imperceptible difference from its source image.

In the text domain, measuring distance is not as straightforward, and even small changes to the text may be perceptible by humans. Thus, evaluation of attacks is fairly tricky. Some studies imposed constraints on adversarial examples to have a small number of edit operations (Gao et al., 2018). Others ensured syntactic or semantic coherence in different ways, such as filtering replacements by word similarity or sentence similarity (Alzantot et al., 2018; Kuleshov et al., 2018), or by using synonyms and other word lists (Samanta and Mehta, 2017; Yang et al., 2018).

Some reported whether a human can classify the adversarial example correctly (Yang et al., 2018), but this does not indicate how perceptible the changes are. More informative human studies evaluate grammaticality or similarity of the adversarial examples to the original ones (Zhao et al., 2018c; Alzantot et al., 2018). Given the inherent difficulty in generating imperceptible changes in text, more such evaluations are needed.

6 Explaining Predictions

Explaining specific predictions is recognized as a desideratum in interpretability work (Lipton, 2016), argued to increase the accountability of machine learning systems (Doshi-Velez et al., 2017). However, explaining why a deep, highly non-linear neural network makes a certain prediction is not trivial. One solution is to ask the model to generate explanations along with its primary prediction (Zaidan et al., 2007; Zhang et al., 2016),15 but this approach requires manual annotations of explanations, which may be hard to collect.

15 Other work considered learning textual-visual explanations from multimodal annotations (Park et al., 2018).

An alternative approach is to use parts of the input as explanations. For example, Lei et al. (2016) defined a generator that learns a distribution over text fragments as candidate rationales for justifying predictions, evaluated on sentiment analysis. Alvarez-Melis and Jaakkola (2017) discovered input–output associations in a sequence-to-sequence learning scenario, by perturbing the input and finding the most relevant associations. Gupta and Schütze (2018) inspected how information is accumulated in RNNs towards a prediction, and associated peaks in prediction scores with important input segments. As these methods use input segments to explain predictions, they do not shed much light on the internal computations that take place in the network.

At present, despite the recognized importance for interpretability, our ability to explain predictions of neural networks in NLP is still limited.

7 Other Methods

We briefly mention here several analysis methods that do not fall neatly into the previous sections.

A number of studies evaluated the effect of erasing or masking certain neural network components, such as word embedding dimensions, hidden units, or even full words (Li et al., 2016b;

Feng et al., 2018; Khandelwal et al., 2018; Bau et al., 2018). For example, Li et al. (2016b) erased specific dimensions in word embeddings or hidden states and computed the change in probability assigned to different labels. Their experiments revealed interesting differences between word embedding models, where in some models information is more focused in individual dimensions. They also found that information is more distributed in hidden layers than in the input layer, and erased entire words to find important words in a sentiment analysis task.
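The core of such an erasure analysis can be sketched in a few lines, in the spirit of Li et al. (2016b) but not reproducing their setup: zero out one dimension of a representation at a time and record the drop in the probability assigned to the predicted label. The linear classifier and the representation below are random stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 16)), rng.normal(size=3)  # toy classifier
    x = rng.normal(size=16)                              # a representation

    def probs(v):
        z = W @ v + b
        e = np.exp(z - z.max())
        return e / e.sum()

    predicted = int(np.argmax(probs(x)))
    importance = []
    for d in range(len(x)):
        x_erased = x.copy()
        x_erased[d] = 0.0  # erase dimension d
        importance.append(probs(x)[predicted] - probs(x_erased)[predicted])

    # Dimensions whose erasure most reduces the prediction probability.
    print(np.argsort(importance)[::-1][:3])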
Several studies conducted behavioral experiments to interpret word embeddings by defining intrusion tasks, where humans need to identify an intruder word, chosen based on difference in word embedding dimensions (Murphy et al., 2012; Fyshe et al., 2015; Faruqui et al., 2015).16 In this kind of work, a word embedding model may be deemed more interpretable if humans are better able to identify the intruding words. Since the evaluation is costly for high-dimensional representations, alternative automatic metrics were considered (Park et al., 2017; Senel et al., 2018).

16 The methodology follows earlier work on evaluating the interpretability of probabilistic topic models with intrusion tasks (Chang et al., 2009).

A long tradition in work on neural networks is to evaluate and analyze their ability to learn different formal languages (Das et al., 1992; Casey, 1996; Gers and Schmidhuber, 2001; Bodén and Wiles, 2002; Chalup and Blair, 2003). This trend continues today, with research into modern architectures and what formal languages they can learn (Weiss et al., 2018; Bernardy, 2018; Suzgun et al., 2019), or the formal properties they possess (Chen et al., 2018b).

8 Conclusion

Analyzing neural networks has become a hot topic in NLP research. This survey attempted to review and summarize as much of the current research as possible, while organizing it along several prominent themes. We have emphasized aspects in analysis that are specific to language—namely, what linguistic information is captured in neural networks, which phenomena they are successful at capturing, and where they fail. Many of the analysis methods are general techniques from the larger machine learning community, such as visualization via saliency measures or evaluation by adversarial examples. But even those sometimes require non-trivial adaptations to work with text input. Some methods are more specific to the field, but may prove useful in other domains. Challenge sets or test suites are such a case.

Throughout this survey, we have identified several limitations or gaps in current analysis work:

• The use of auxiliary classification tasks for identifying which linguistic properties neural networks capture has become standard practice (Section 2), while lacking both a theoretical foundation and a better empirical consideration of the link between the auxiliary tasks and the original task.

• Evaluation of analysis work is often limited or qualitative, especially in visualization techniques (Section 3). Newer forms of evaluation are needed for determining the success of different methods.

• Relatively little work has been done on explaining predictions of neural network models, apart from providing visualizations (Section 6). With the increasing public demand for explaining algorithmic choices in machine learning systems (Doshi-Velez and Kim, 2017; Doshi-Velez et al., 2017), there is pressing need for progress in this direction.

• Much of the analysis work is focused on the English language, especially in constructing challenge sets for various tasks (Section 4), with the exception of MT due to its inherent multilingual character. Developing resources and evaluating methods on other languages is important as the field grows and matures.

• More challenge sets for evaluating other tasks besides NLI and MT are needed.

Finally, as with any survey in a rapidly evolving field, this paper is likely to omit relevant recent work by the time of publication. While we intend to continue updating the online appendix with newer publications, we hope that our summarization of prominent analysis work and its categorization into several themes will be a useful guide for scholars interested in analyzing and understanding neural networks for NLP.

59 Acknowledgments Processing, pages 412–421. Association for Computational Linguistics. We would like to thank the anonymous review- ers and the action editor for their very helpful Moustafa Alzantot, Yash Sharma, Ahmed comments. This work was supported by the Elgohary, Bo-Jhang Ho, Mani Srivastava, and Qatar Computing Research Institute. Y.B. is also Kai-Wei Chang. 2018. Generating Natural supported by the Harvard Mind, Brain, Behavior Language Adversarial Examples. In Proceed- Initiative. ings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896. Association for Computa- References tional Linguistics. Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017a. Anal- Leila Arras, Franziska Horn, Gregoire´ Montavon, ysis of sentence embedding models using Klaus-Robert Muller,¨ and Wojciech Samek. prediction tasks in natural language processing. 2017a. ‘‘What is relevant in a text document?’’: IBM Journal of Research and Development, An interpretable machine learning approach. 61(4):3–9. PLOS ONE, 12(8):1–23.

Yossi Adi, Einat Kermany, Yonatan Belinkov, Leila Arras, Gregoire´ Montavon, Klaus-Robert Ofer Lavi, and Yoav Goldberg. 2017. Fine- Muller,¨ and Wojciech Samek. 2017b. Explain- Grained Analysis of Sentence Embeddings ing Predictions in Using Auxiliary Prediction Tasks. In Interna- Sentiment Analysis. In Proceedings of the tional Conference on Learning Representations 8th Workshop on Computational Approaches (ICLR). to Subjectivity, Sentiment and Social Media Analysis, pages 159–168. Association for Roee Aharoni and Yoav Goldberg. 2017. Computational Linguistics. Morphological Inflection Generation with Hard Monotonic Attention. In Proceedings of the Mikel Artetxe, Gorka Labaka, Inigo Lopez- 55th Annual Meeting of the Association for Gazpio, and Eneko Agirre. 2018. Uncovering Computational Linguistics (Volume 1: Long Divergent Linguistic Information in Word Papers), pages 2004–2015. Association for Embeddings with Lessons for Intrinsic and Computational Linguistics. Extrinsic Evaluation. In Proceedings of the 22nd Conference on Computational Natural Wasi Uddin Ahmad, Xueying Bai, Zhechao Language Learning, pages 282–291. Associa- Huang, Chao Jiang, Nanyun Peng, and Kai-Wei tion for Computational Linguistics. Chang. 2018. Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation Malika Aubakirova and Mohit Bansal. 2016. Inter- using Transfer and Auxiliary Tasks. arXiv preting Neural Networks to Improve Politeness preprint arXiv:1804.07911v2. Comprehension. In Proceedings of the 2016 Conference on Empirical Methods in Natu- Afra Alishahi, Marie Barking, and Grzegorz ral Language Processing, pages 2035–2041. Chrupała. 2017. Encoding of phonology in a Association for Computational Linguistics. recurrent neural model of grounded speech. In Proceedings of the 21st Conference on Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Computational Natural Language Learning Bengio. 2014. Neural Machine Translation by (CoNLL 2017), pages 368–378. Association for Jointly Learning to Align and Translate. arXiv Computational Linguistics. preprint arXiv:1409.0473v7.

David Alvarez-Melis and Tommi Jaakkola. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, 2017. A causal framework for explaining the Nadir Durrani, Fahim Dalvi, and James Glass. predictions of black-box sequence-to-sequence 2018. Identifying and Controlling Important models. In Proceedings of the 2017 Conference Neurons in Neural Machine Translation. arXiv on Empirical Methods in Natural Language preprint arXiv:1811.01157v1.

60 Rachel Bawden, Rico Sennrich, Alexandra Birch, Arianna Bisazza and Clara Tump. 2018. The Lazy and Barry Haddow. 2018. Evaluating Discourse Encoder: A Fine-Grained Analysis of the Role Phenomena in Neural Machine Translation. In of Morphology in Neural Machine Translation. Proceedings of the 2018 Conference of the In Proceedings of the 2018 Conference on North American Chapter of the Association Empirical Methods in Natural Language for Computational Linguistics: Human Lan- Processing, pages 2871–2876. Association for guage Technologies, Volume 1 (Long Papers), Computational Linguistics. pages 1304–1313. Association for Computa- Terra Blevins, Omer Levy, and Luke Zettlemoyer. tional Linguistics. 2018. Deep RNNs Encode Soft Hierarchi- Yonatan Belinkov. 2018. On Internal Language cal Syntax. In Proceedings of the 56th Annual Representations in Deep Learning: An Analy- Meeting of the Association for Computa- sis of Machine Translation and Speech Recog- tional Linguistics (Volume 2: Short Papers), nition. Ph.D. thesis, Massachusetts Institute of pages 14–19. Association for Computational Technology. Linguistics. Mikael Boden´ and Janet Wiles. 2002. On learning Yonatan Belinkov and Yonatan Bisk. 2018. Syn- context-free and context-sensitive languages. thetic and Natural Noise Both Break Neural IEEE Transactions on Neural Networks, 13(2): Machine Translation. In International Confer- 491–493. ence on Learning Representations (ICLR). Samuel R. Bowman, Gabor Angeli, Christopher Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Potts, and Christopher D. Manning. 2015. A Hassan Sajjad, and James Glass. 2017a. large annotated corpus for learning natural What do Neural Machine Translation Models language inference. In Proceedings of the Learn about Morphology? In Proceedings of 2015 Conference on Empirical Methods in the 55th Annual Meeting of the Association Natural Language Processing, pages 632–642. for Computational Linguistics (Volume 1: Association for Computational Linguistics. Long Papers), pages 861–872. Association for Computational Linguistics. Elia Bruni, Gemma Boleda, Marco Baroni, and Nam Khanh Tran. 2012. Distributional Yonatan Belinkov and James Glass. 2017, Anal- Semantics in Technicolor. In Proceedings of yzing Hidden Representations in End-to-End the 50th Annual Meeting of the Association Automatic Speech Recognition Systems, I. Guyon, for Computational Linguistics (Volume 1: U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, Long Papers), pages 136–145. Association for S. Vishwanathan, and R. Garnett, editors, Ad- Computational Linguistics. vances in Neural Information Processing Sys- Gino Brunner, Yuyi Wang, Roger Wattenhofer, tems 30, pages 2441–2451. Curran Associates, and Michael Weigelt. 2017. Natural Language Inc. Multitasking: Analyzing and Improving Syn- Yonatan Belinkov, Llu´ıs Marquez,` Hassan Sajjad, tactic Saliency of Hidden Representations. The Nadir Durrani, Fahim Dalvi, and James Glass. 31st Annual Conference on Neural Information 2017b. Evaluating Layers of Representation in Processing (NIPS)—Workshop on Learning Neural Machine Translation on Part-of-Speech Disentangled Features: From Perception to and Semantic Tagging Tasks. In Proceedings Control. of the Eighth International Joint Conference Aljoscha Burchardt, Vivien Macketanz, Jon on Natural Language Processing (Volume 1: Dehdari, Georg Heigold, Jan-Thorsten Peter, Long Papers), pages 1–10. Asian Federation of and Philip Williams. 2017. A Linguistic Natural Language Processing. 
Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines. The Prague Bulletin of Jean-Philippe Bernardy. 2018. Can Recurrent Mathematical Linguistics, 108(1):159–170. Neural Networks Learn Nested Recursion? LiLT (Linguistic Issues in Language Tech- Franck Burlot and Franc¸ois Yvon. 2017. nology), 16(1). Evaluating the morphological competence of

61 Machine Translation Systems. In Proceedings Yining Chen, Sorcha Gilroy, Andreas Maletti, of the Second Conference on Machine Trans- Jonathan May, and Kevin Knight. 2018b. lation, pages 43–55. Association for Compu- Recurrent Neural Networks as Weighted tational Linguistics. Language Recognizers. In Proceedings of the 2018 Conference of the North American Mike Casey. 1996. The Dynamics of Discrete- Chapter of the Association for Computational Time Computation, with Application to Re- Linguistics: Human Language Technologies, current Neural Networks and Finite State Volume 1 (Long Papers), pages 2261–2271. Machine Extraction. Neural Computation, Association for Computational Linguistics. 8(6):1135–1178. Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Daniel Cer, Mona Diab, Eneko Agirre, Inigo Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Lopez-Gazpio, and Lucia Specia. 2017. Evaluating the Robustness of Sequence-to- SemEval-2017 Task 1: Semantic Textual Sim- Sequence Models with Adversarial Examples. ilarity Multilingual and Crosslingual Focused arXiv preprint arXiv:1803.01128v1. Evaluation. In Proceedings of the 11th Inter- national Workshop on Semantic Evaluation Grzegorz Chrupała, Lieke Gelderloos, and Afra (SemEval-2017), pages 1–14. Association for Alishahi. 2017. Representations of language in Computational Linguistics. a model of visually grounded speech signal. Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, In Proceedings of the 55th Annual Meeting of and Emmanuel Dupoux. 2017. Learning weakly the Association for Computational Linguistics supervised multimodal phoneme embeddings. (Volume 1: Long Papers), pages 613–622. In Interspeech 2017. Association for Computational Linguistics. Stephan K. Chalup and Alan D. Blair. 2003. Ondrejˇ C´ıfka and Ondrejˇ Bojar. 2018. Are BLEU Incremental Training of First Order Recurrent and Meaning Representation in Opposition? In Neural Networks to Predict a Context-Sensitive Proceedings of the 56th Annual Meeting of Language. Neural Networks, 16(7):955–972. the Association for Computational Linguistics (Volume 1: Long Papers), pages 1362–1371. Jonathan Chang, Sean Gerrish, Chong Wang, Association for Computational Linguistics. Jordan L. Boyd-graber, and David M. Blei. 2009, Reading Tea Leaves: How Humans Inter- Alexis Conneau, German´ Kruszewski, Guillaume pret Topic Models, Y. Bengio, D. Schuurmans, Lample, Lo¨ıc Barrault, and Marco Baroni. J. D. Lafferty, C. K. I. Williams, and A. Culotta, 2018. What you can cram into a single editors, Advances in Neural Information Pro- $&!#* vector: Probing sentence embeddings cessing Systems 22, pages 288–296, Curran for linguistic properties. In Proceedings of the Associates, Inc.. 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Papers), pages 2126–2136. Association for Yi, and Cho-Jui Hsieh. 2018a. Attacking visual Computational Linguistics. language grounding with adversarial examples: A case study on neural image captioning. In Robin Cooper, Dick Crouch, Jan van Eijck, Chris Proceedings of the 56th Annual Meeting of Fox, Josef van Genabith, Jan Jaspars, Hans the Association for Computational Linguistics Kamp, David Milward, Manfred Pinkal, (Volume 1: Long Papers), pages 2587–2597. Massimo Poesio, Steve Pulman, Ted Briscoe, Association for Computational Linguistics. Holger Maier, and Karsten Konrad. 1996, Using the framework. Technical report, The FraCaS Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, Consortium. and Xuanjing Huang. 
Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, D. Anthony Bau, and James Glass. 2019a. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI).

Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, and Stephan Vogel. 2017. Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 142–151. Asian Federation of Natural Language Processing.

Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, and James Glass. 2019b. NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI): Demonstrations Track.

Sreerupa Das, C. Lee Giles, and Guo-Zheng Sun. 1992. Learning Context-Free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory. In Proceedings of The Fourteenth Annual Conference of the Cognitive Science Society. Indiana University, page 14.

Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, and Noah D. Goodman. 2018. Evaluating Compositionality in Sentence Embeddings. arXiv preprint arXiv:1802.04302v2.

Dhanush Dharmaretnam and Alona Fyshe. 2018. The Emergence of Semantics in Neural Network Representations of Visual Information. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 776–780. Association for Computational Linguistics.

Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Visualizing and Understanding Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1150–1159. Association for Computational Linguistics.

Finale Doshi-Velez and Been Kim. 2017. Towards a Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608v2.

Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Stuart Shieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI Under the Law: The Role of Explanation. Privacy Law Scholars Conference.

Jennifer Drexler and James Glass. 2017. Analysis of Audio-Visual Features for Unsupervised Speech Recognition. In International Workshop on Grounding Language Understanding.

Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018a. On Adversarial Examples for Character-Level Neural Machine Translation. In Proceedings of the 27th International Conference on Computational Linguistics, pages 653–663. Association for Computational Linguistics.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018b. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36. Association for Computational Linguistics.

Ali Elkahky, Kellie Webster, Daniel Andor, and Emily Pitler. 2018. A Challenge Set and Methods for Noun-Verb Ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2562–2572. Association for Computational Linguistics.

Zied Elloumi, Laurent Besacier, Olivier Galibert, and Benjamin Lecouteux. 2018. Analyzing Learned Representations of a Deep ASR Performance Prediction Model. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 9–15. Association for Computational Linguistics.

Jeffrey L. Elman. 1989. Representation and Structure in Connectionist Models. University of California, San Diego, Center for Research in Language.

Jeffrey L. Elman. 1990. Finding Structure in Time. Cognitive Science, 14(2):179–211.

Jeffrey L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3):195–225.

Allyson Ettinger, Ahmed Elgohary, and Philip Resnik. 2016. Probing for semantic evidence of composition by means of simple classification tasks. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 134–139. Association for Computational Linguistics.

Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP.

Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. 2015. Sparse Overcomplete Word Vector Representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1491–1500. Association for Computational Linguistics.

Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of Neural Models Make Interpretations Difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728. Association for Computational Linguistics.

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1):116–131.

Robert Frank, Donald Mathis, and William Badecker. 2013. The Acquisition of Anaphora by Simple Recurrent Networks. Language Acquisition, 20(3):181–227.

Cynthia Freeman, Jonathan Merriman, Abhinav Aggarwal, Ian Beaver, and Abdullah Mueen. 2018. Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis. arXiv preprint arXiv:1808.02113v1.

Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian Murphy, and Tom M. Mitchell. 2015. A Compositional and Interpretable Semantic Space. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 32–41. Association for Computational Linguistics.

David Gaddy, Mitchell Stern, and Dan Klein. 2018. What's Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 999–1010. Association for Computational Linguistics.

J. Ganesh, Manish Gupta, and Vasudeva Varma. 2017. Interpretation of Semantic Tweet Representations. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM '17, pages 95–102, New York, NY, USA. ACM.

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. arXiv preprint arXiv:1801.04354v5.

Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: Levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1309–1319, Osaka, Japan. The COLING 2016 Organizing Committee.

Felix A. Gers and Jürgen Schmidhuber. 2001. LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.

Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2173–2182. Association for Computational Linguistics.

Hamidreza Ghader and Christof Monz. 2017. What does Attention in Neural Machine Translation Pay Attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 30–39. Asian Federation of Natural Language Processing.

Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli. 2018. Interpreting Recurrent and Attention-Based Neural Models: A Case Study on Natural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957. Association for Computational Linguistics.

Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve How Language Models Track Agreement Information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248. Association for Computational Linguistics.

Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655. Association for Computational Linguistics.

Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, and Thomas Demeester. 2018. Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3275–3284. Association for Computational Linguistics.

Yoav Goldberg. 2017. Neural Network Methods for Natural Language Processing, volume 10 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR).

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless Green Recurrent Networks Dream Hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205. Association for Computational Linguistics.

Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 12–21. Association for Computational Linguistics.

Pankaj Gupta and Hinrich Schütze. 2018. LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 154–164. Association for Computational Linguistics.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112. Association for Computational Linguistics.

Catherine L. Harris. 1990. Connectionism and Cognitive Linguistics. Connection Science, 2(1–2):7–33.

David Harwath and James Glass. 2017. Learning Word-Like Units from Joint Audio-Visual Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–517. Association for Computational Linguistics.

Georg Heigold, Günter Neumann, and Josef van Genabith. 2018. How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? In Proceedings of the 13th Conference of The Association for Machine Translation in the Americas (Volume 1: Research Track), pages 68–79.

Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Computational Linguistics, 41(4):665–695.

Dieuwke Hupkes, Sara Veldhoen, and Willem Zuidema. 2018. Visualisation and "diagnostic classifiers" reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61:907–926.

Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2486–2496. Association for Computational Linguistics.

Pierre Isabelle and Roland Kuhn. 2018. A Challenge Set for French→English Machine Translation. arXiv preprint arXiv:1806.02725v2.

Hitoshi Isahara. 1995. JEIDA's test-sets for quality evaluation of MT systems—technical evaluation from the developer's point of view. In Proceedings of MT Summit V.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885. Association for Computational Linguistics.

Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.

Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and Massimo Piccardi. 2018. A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 11–17. Association for Computational Linguistics.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031. Association for Computational Linguistics.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the Limits of Language Modeling. arXiv preprint arXiv:1602.02410v2.

Ákos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of Linguistic Form and Function in Recurrent Neural Networks. Computational Linguistics, 43(4):761–780.

Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and Understanding Recurrent Networks. arXiv preprint arXiv:1506.02078v2.

Urvashi Khandelwal, He He, Peng Qi, and Dan Jurafsky. 2018. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294. Association for Computational Linguistics.

Margaret King and Kirsten Falkedal. 1990. Using Test Suites in Evaluation of Machine Translation Systems. In COLING 1990 Volume 2: Papers Presented to the 13th International Conference on Computational Linguistics.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Transactions of the Association for Computational Linguistics, 4:313–327.

Sungryong Koh, Jinee Maeng, Ji-Young Lee, Young-Sook Chae, and Key-Sun Choi. 2001. A test suite for evaluation of English-to-Korean machine translation systems. In MT Summit Conference.

Arne Köhn. 2015. What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2067–2073, Lisbon, Portugal. Association for Computational Linguistics.

Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial Examples for Natural Language Classification Problems.

Brenden Lake and Marco Baroni. 2018. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2873–2882, Stockholmsmässan, Stockholm, Sweden. PMLR.

Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Hervé Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP—Test Suites for Natural Language Processing. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117. Association for Computational Linguistics.

Ira Leviant and Roi Reichart. 2015. Separated by an Un-Common Language: Towards Judgment Language Informed Vector Space Modeling. arXiv preprint arXiv:1508.00106v5.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691. Association for Computational Linguistics.

Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding Neural Networks through Representation Erasure. arXiv preprint arXiv:1612.08220v3.

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2018. Deep Text Classification Can Be Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4208–4215. International Joint Conferences on Artificial Intelligence Organization.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.

Zachary C. Lipton. 2016. The Mythos of Model Interpretability. In ICML Workshop on Human Interpretability of Machine Learning.

Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith. 2018. LSTMs Exploit Linguistic Attributes of Data. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 180–186. Association for Computational Linguistics.

Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-Box Attacks. In International Conference on Learning Representations (ICLR).

Thang Luong, Richard Socher, and Christopher Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113. Association for Computational Linguistics.

Jean Maillard and Stephen Clark. 2018. Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing. In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pages 13–18. Association for Computational Linguistics.

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8. Association for Computational Linguistics.

R. Thomas McCoy, Robert Frank, and Tal Linzen. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.

Risto Miikkulainen and Michael G. Dyer. 1991. Natural Language Processing with Modular PDP Networks and Distributed Lexicon. Cognitive Science, 15(3):343–399.

Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding Hidden Memories of Recurrent Neural Networks. In IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).

Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15.

Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. 2018. Did the Model Understand the Question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.

James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. 2018. Explainable Prediction of Medical Codes from Clinical Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111. Association for Computational Linguistics.

W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs. In International Conference on Learning Representations.

Brian Murphy, Partha Talukdar, and Tom Mitchell. 2012. Learning Effective and Interpretable Semantic Models Using Non-Negative Sparse Embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2015. Exploring How Deep Neural Networks Form Phonemic Categories. In Interspeech 2015.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2016. On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. In Interspeech 2016, pages 803–807.

Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353. Association for Computational Linguistics.

Nina Narodytska and Shiva Kasiviswanathan. 2017. Simple Black-Box Adversarial Attacks on Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318.

Lars Niklasson and Fredrik Linåker. 2000. Distributed representations for extended syntactic transformation. Connection Science, 12(3–4):299–314.

Tong Niu and Mohit Bansal. 2018. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 486–496. Association for Computational Linguistics.

Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in Machine Learning: From Phenomena to Black-Box Attacks Using Adversarial Samples. arXiv preprint arXiv:1605.07277v1.

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 506–519, New York, NY, USA. ACM.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting Adversarial Input Sequences for Recurrent Neural Networks. In Military Communications Conference, MILCOM 2016, pages 49–54. IEEE.

Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. 2018. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Sungjoon Park, JinYeong Bak, and Alice Oh. 2017. Rotated Word Vector Representations and Their Interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411. Association for Computational Linguistics.

Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting Contextual Word Embeddings: Architecture and Representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509. Association for Computational Linguistics.

Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81. Association for Computational Linguistics.

Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191. Association for Computational Linguistics.

Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016a. Analyzing Linguistic Knowledge in Sequential Model of Sentence. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 826–835, Austin, Texas. Association for Computational Linguistics.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016b. Investigating Language Universal and Specific Properties in Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1478–1488, Berlin, Germany. Association for Computational Linguistics.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically Equivalent Adversarial Rules for Debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865. Association for Computational Linguistics.

Matīss Rikters. 2018. Debugging Neural Machine Translations. arXiv preprint arXiv:1808.02733v1.

Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19. Association for Computational Linguistics.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2016. Reasoning about Entailment with Neural Attention. In International Conference on Learning Representations (ICLR).

Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult. 2016. Adversarial Diversity and Hard Positive Generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–32.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14. Association for Computational Linguistics.

D. E. Rumelhart and J. L. McClelland. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2, chapter On Learning the Past Tenses of English Verbs, pages 216–271. MIT Press, Cambridge, MA, USA.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389. Association for Computational Linguistics.

Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 3281–3287. AAAI Press.

Suranjana Samanta and Sameep Mehta. 2017. Towards Crafting Text Adversarial Samples. arXiv preprint arXiv:1707.02812v1.

Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel. 2018. Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985. Association for Computational Linguistics.

Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. 2018. Interpretable Adversarial Perturbation in Input Embedding Space for Text. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4323–4330. International Joint Conferences on Artificial Intelligence Organization.

Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. 2018. Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Rico Sennrich. 2017. How Grammatical Is Character-Level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 376–382. Association for Computational Linguistics.

Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, and Jian Sun. 2018. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3715–3727. Association for Computational Linguistics.

Xing Shi, Kevin Knight, and Deniz Yuret. 2016a. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282. Association for Computational Linguistics.

Xing Shi, Inkit Padhi, and Kevin Knight. 2016b. Does String-Based Neural MT Learn Source Syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1526–1534, Austin, Texas. Association for Computational Linguistics.

Chandan Singh, W. James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337v1.

Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018a. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. arXiv preprint arXiv:1804.09299v1.

Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018b. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676.

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia. PMLR.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

Mirac Suzgun, Yonatan Belinkov, and Stuart M. Shieber. 2019. On Evaluating the Generalization of LSTM Models in Formal Languages. In Proceedings of the Society for Computation in Linguistics (SCiL).

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).

Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018. An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 26–35. Association for Computational Linguistics.

Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2018. CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).

Ke Tran, Arianna Bisazza, and Christof Monz. 2018. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4731–4736. Association for Computational Linguistics.

Eva Vanmassenhove, Jinhua Du, and Andy Way. 2017. Investigating "Aspect" in NMT and SMT: Translating the English Simple Past and Present Perfect. Computational Linguistics in the Netherlands Journal, 7:109–128.

Sara Veldhoen, Dieuwke Hupkes, and Willem Zuidema. 2016. Diagnostic Classifiers: Revealing How Neural Networks Process Hierarchical Structure. In CEUR Workshop Proceedings.

Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. Context-Aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274. Association for Computational Linguistics.

Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. 2016. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation. arXiv preprint arXiv:1606.04217v1.

Alex Wang, Amapreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461v1.

Shuai Wang, Yanmin Qian, and Kai Yu. 2017a. What Does the Speaker Embedding Encode? In Interspeech 2017, pages 1497–1501.

Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018b. A Tree-Based Decoder for Neural Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels, Belgium.

Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi Lee. 2017b. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries. In Interspeech 2017.

Gail Weiss, Yoav Goldberg, and Eran Yahav. 2018. On the Practical Computational Power of Finite Precision RNNs for Language Recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745. Association for Computational Linguistics.

Adina Williams, Andrew Drozdov, and Samuel R. Bowman. 2018. Do latent tree learning models identify meaningful structure in sentences? Transactions of the Association for Computational Linguistics, 6:253–267.

Zhizheng Wu and Simon King. 2016. Investigating gated recurrent networks for speech synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5140–5144. IEEE.

Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael I. Jordan. 2018. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. arXiv preprint arXiv:1805.12316v1.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2016. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. Transactions of the Association for Computational Linguistics, 4:259–272.

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107v3.

Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using "Annotator Rationales" to Improve Machine Learning for Text Categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267. Association for Computational Linguistics.

Quan-shi Zhang and Song-chun Zhu. 2018. Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.

Ye Zhang, Iain Marshall, and Byron C. Wallace. 2016. Rationale-Augmented Convolutional Neural Networks for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 795–804. Association for Computational Linguistics.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20. Association for Computational Linguistics.

Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander Rush, and Yann LeCun. 2018b. Adversarially Regularized Autoencoders. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5902–5911, Stockholmsmässan, Stockholm, Sweden. PMLR.

Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018c. Generating Natural Adversarial Examples. In International Conference on Learning Representations.
