Probing Neural Language Models for Human Tacit Assumptions
Nathaniel Weir, Adam Poliak, and Benjamin Van Durme
Department of Computer Science, Johns Hopkins University

Abstract

Humans carry stereotypic tacit assumptions (STAs) (Prince, 1978), or propositional beliefs about generic concepts. Such associations are crucial for understanding natural language. We construct a diagnostic set of word prediction prompts to evaluate whether recent neural contextualized language models trained on large text corpora capture STAs. Our prompts are based on human responses in a psychological study of conceptual associations. We find models to be profoundly effective at retrieving concepts given associated properties. Our results demonstrate empirical evidence that stereotypic conceptual representations are captured in neural models derived from semi-supervised linguistic exposure.

Keywords: language models; deep neural networks; concept representations; norms; semantics

Prompt                                                          Model Predictions
A ___ has fur.                                                  dog, cat, fox, ...
A ___ has fur, is big, and has claws.                           cat, bear, lion, ...
A ___ has fur, is big, has claws, has teeth, is an animal,      bear, wolf, cat, ...
    eats, is brown, and lives in woods.

Figure 1: The concept bear as a target emerging as the highest-ranked predictions of the neural LM ROBERTA-L (Liu et al., 2019) when prompted with conjunctions of the concept's human-produced properties.

Introduction

Recognizing generally accepted properties about concepts is key to understanding natural language (Prince, 1978). For example, if one mentions a bear, one does not have to explicitly describe the animal as having teeth or claws, or as being a predator or a threat. This phenomenon reflects one's held stereotypic tacit assumptions (STAs), i.e. propositions commonly attributed to "classes of entities" (Prince, 1978). STAs, a form of common knowledge (Walker, 1991), are salient to cognitive scientists concerned with how human representations of knowledge and meaning manifest.

As "studies in norming responses are prone to repeated responses across subjects" (Poliak et al., 2018), cognitive scientists demonstrate empirically that humans share assumptions about properties associated with concepts (McRae et al., 2005). We take these conceptual assumptions as one instance of STAs and ask whether recent contextualized language models trained on large text corpora capture them. In other words, do models correctly distinguish concepts associated with a given set of properties? To answer this question, we design fill-in-the-blank diagnostic tests (Figure 1) based on existing data of concepts with corresponding sets of human-elicited properties.

By tracking conceptual recall from prompts of iteratively concatenated conceptual properties, we find that the popular neural language models, BERT (Devlin et al., 2019) and ROBERTA (Liu et al., 2019), capture STAs. We observe that ROBERTA consistently outperforms BERT in correctly associating concepts with their defining properties across multiple metrics; this performance discrepancy is consistent with many other language understanding tasks (Wang et al., 2018). We also find that models associate concepts with perceptual categories of properties (e.g. visual) worse than with non-perceptual ones (e.g. encyclopaedic or functional).

We further examine whether STAs can be extracted from the models by designing prompts akin to those shown to humans in psychological studies (McRae et al., 2005; Devereux et al., 2014). We find significant overlap between model and human responses, but with notable differences. We provide qualitative examples in which the models' predictive associations differ from humans', yet are still sensible given the prompt. Such results highlight the difficulty of constructing word prediction prompts that elicit particular forms of reasoning from models optimized purely to predict co-occurrence.

Unlike other work analyzing linguistic meaning captured in sentence representations derived from language models (Conneau et al., 2018; Tenney et al., 2019), we do not fine-tune the models to perform any task; we instead find that the targeted tacit assumptions "fall out" purely from semi-supervised masked language modeling. Our results demonstrate that exposure to large corpora alone, without multi-modal perceptual signals or task-specific training cues, may enable a model to sufficiently capture STAs.
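As a concrete illustration of the fill-in-the-blank diagnostics sketched in Figure 1, the following is a minimal sketch that builds prompts by iteratively concatenating a concept's human-produced properties and asks a masked language model to fill the concept slot. It assumes the HuggingFace transformers fill-mask pipeline and the roberta-large checkpoint; the property wording and top-k retrieval shown here are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of a Figure 1-style cloze probe (illustrative only).
# Assumes the HuggingFace `transformers` fill-mask pipeline and roberta-large;
# the paper's exact prompting and scoring setup may differ.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-large")

# Human-elicited properties for the target concept "bear" (wording assumed).
properties = ["has fur", "is big", "has claws", "has teeth",
              "is an animal", "eats", "is brown", "lives in woods"]

def build_prompt(props):
    """Join property phrases into a single conjunctive cloze prompt."""
    if len(props) == 1:
        body = props[0]
    else:
        body = ", ".join(props[:-1]) + ", and " + props[-1]
    return f"A <mask> {body}."

for k in (1, 3, len(properties)):
    prompt = build_prompt(properties[:k])
    # Top-ranked vocabulary words for the masked concept slot.
    predictions = fill_mask(prompt, top_k=3)
    words = [p["token_str"].strip() for p in predictions]
    print(f"{prompt}\n  -> {words}")
```

With one property the concept is underdetermined; as properties accumulate, the target concept should rise toward the top of the ranking, which is the behavior Figure 1 depicts.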
Background

Contextualized Language Models

Language models (LMs) assign probabilities to sequences of text. They are trained on large text corpora to predict the probability of a new word based on its surrounding context. Unidirectional models approximate, for any text sequence w = [w_1, w_2, ..., w_N], the factorized left-context probability p(w) = ∏_{i=1}^{N} p(w_i | w_1, ..., w_{i-1}). Recent neural bidirectional language models estimate the probability of an intermediate 'masked out' token given both left and right context; this task is colloquially called "masked language modelling" (MLM). Training in this way produces a probability model that, given an input sequence w and an arbitrary vocabulary word v, predicts the distribution p(w_i = v | w_1, ..., w_{i-1}, w_{i+1}, ..., w_N). When neural bidirectional LMs trained for MLM are subsequently used as contextual encoders, performance across a wide range of language understanding tasks greatly improves.

We investigate two recent neural LMs: Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) and the Robustly optimized BERT approach (ROBERTA) (Liu et al., 2019). In addition to the MLM objective, BERT is trained with an auxiliary objective of next-sentence prediction. BERT is trained on a book corpus and English Wikipedia. Using an identical neural architecture, ROBERTA is trained purely for MLM (no next-sentence prediction) on a much larger dataset, with words masked out of larger input sequences. Performance increases ubiquitously on standard NLU tasks when BERT is replaced with ROBERTA as an off-the-shelf contextual encoder.
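To make the masked-LM distribution p(w_i = v | w_1, ..., w_{i-1}, w_{i+1}, ..., w_N) concrete, the sketch below scores individual candidate words at a masked position by taking a softmax over the model's vocabulary logits. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint; the prompt and candidate words are illustrative, not drawn from the paper.

```python
# Sketch of scoring candidates under the MLM distribution p(w_i = v | context).
# Assumes HuggingFace `transformers` with a BERT checkpoint; illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

prompt = f"A {tokenizer.mask_token} has fur, is big, and has claws."
inputs = tokenizer(prompt, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, seq_len, vocab_size]

# Softmax over the vocabulary at the masked position gives p(w_i = v | context).
probs = logits[0, mask_index].softmax(dim=-1)
for candidate in ["bear", "cat", "table"]:
    cand_id = tokenizer.convert_tokens_to_ids(candidate)
    print(candidate, probs[cand_id].item())
```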
Probing Language Models via Word Prediction

Recent research employs word prediction tests to explore whether contextualized language models capture a range of linguistic phenomena, e.g. syntax (Goldberg, 2019), pragmatics, semantic roles, and negation (Ettinger, 2020). These diagnostics have psycholinguistic origins; they draw an analogy between the "fill-in-the-blank" word predictions of a pre-trained language model and the distribution of aggregated human responses in cloze tests designed to target specific sentence processing phenomena. Similar tests have been used to evaluate how well these models capture symbolic reasoning (Talmor et al., 2019) and relational facts (Petroni et al., 2019).

Stereotypic Tacit Assumptions

Recognizing associations between concepts and their defining properties is key to natural language understanding and plays "a critical role in language both for the conventional meaning of utterances, and in conversational inference" (Walker, 1991). Tacit [...] she does not need to say "I am a person, and people have dogs, and dogs need to be walked, so I have to walk my dog!" Comprehending STAs allows for generalized recognition of new categorical instances, and facilitates learning new categories (Lupyan et al., 2007), as shown in early word learning by children (Hills et al., 2009). STAs are not explicitly facts. Rather, they are sufficiently probable assumptions to be associated with concepts by a majority of people. A partial inspiration for this work was the observation by Van Durme (2010) that the concept attributes most supported by people's search engine query logs (Pasca & Van Durme, 2007) were strikingly similar to examples of STAs listed by Prince. That is, there is strong evidence that the beliefs people hold about particular conceptual attributes (e.g. "countries have kings") are reflected in the aggregation of their most frequent search terms ("what is the name of the king of France?").

Our goal is to determine whether contextualized language models exposed to large corpora encode associations between concepts and their tacitly assumed properties. We develop probes that specifically test a model's ability to recognize STAs. Previous works (Rubinstein et al., 2015; Sommerauer & Fokkens, 2018; Da & Kasai, 2019) have tested for similar types of stereotypic beliefs; they use supervised training of probing classifiers (Conneau et al., 2018) to identify concept/attribute pairs. In contrast, our word prediction diagnostics find that these associations fall out of semi-supervised LM pretraining. In other words, the neural LM inducts STAs as a byproduct of learning co-occurrence without receiving explicit cues to do so.

Probing for Stereotypic Tacit Assumptions

Despite introducing the notion of STAs, Prince (1978) provides only a few examples. We therefore draw from other literature to create diagnostics that evaluate how well a contextualized language model captures the phenomenon. Semantic feature production norms, i.e. properties elicited from human subjects regarding generic concepts, fall under the category of STAs. Interested in determining "what people know about different things in the world," McRae et al. (2005) had human subjects list properties that they associated with individual concepts. When many people individually attribute the same properties to a specific concept, collectively