
How well do NLI models capture verb veridicality?

Alexis Ross, Harvard University, alexis [email protected]
Ellie Pavlick, Brown University, ellie [email protected]

Abstract

In natural language inference (NLI), contexts are considered veridical if they allow us to infer that their underlying propositions make true claims about the real world. We investigate whether a state-of-the-art natural language inference model (BERT) learns to make correct inferences about veridicality in verb-complement constructions. We introduce an NLI dataset for veridicality evaluation consisting of 1,500 sentence pairs, covering 137 unique verbs. We find that both human and model inferences generally follow theoretical patterns, but exhibit a systematic bias towards assuming that verbs are veridical–a bias which is amplified in BERT. We further show that, encouragingly, BERT's inferences are sensitive not only to the presence of individual verb types, but also to the syntactic role of the verb and to the form of the complement clause (to- vs. that-complements).

1 Introduction

A context is veridical when the propositions it contains are taken to be true, even if not explicitly asserted. For example, in the sentence "He does not know that the answer is 5", "know" is veridical with respect to "The answer is 5", since a speaker cannot felicitously say the former sentence unless they believe the latter to be true. In contrast, "think" would not be veridical here, since "He does not think that the answer is 5" is felicitous whether or not it is taken to be true that "The answer is 5". Understanding veridicality requires semantic subtlety and is still an open problem for computational models of natural language inference (NLI) (Rudinger et al., 2018).

This paper deals specifically with veridicality in verb-complement constructions. Prior work in this area has focused on characterizing verb classes–e.g. factives like "know that" (Kiparsky and Kiparsky, 1968) and implicatives like "manage to" (Karttunen, 1971)–and on incorporating such lexical semantic information into computational models (MacCartney and Manning, 2009). However, increasingly, linguistic evidence suggests that inferences involving veridicality rely heavily on non-lexical information and are better understood as a graded, pragmatic phenomenon (de Marneffe et al., 2012; Tonhauser et al., 2018). Thus, in this paper, we revisit the question of whether neural models of natural language inference–which are not explicitly endowed with knowledge of verbs' lexical semantic categories–learn to make inferences about veridicality consistent with those made by humans. We solicit human judgements on 1,500 sentence pairs involving 137 verb-complement constructions. Analysis of these annotations provides new evidence of the importance of pragmatic inference in modeling veridicality judgements. We use our collected annotations to analyze how well a state-of-the-art NLI model (BERT, Devlin et al., 2018) is able to mimic human behavior on such inferences. The results suggest that, while not yet solved, BERT represents non-trivial properties of veridicality in context. Our primary contributions are:

• We collect a new NLI evaluation set of 1,500 sentence pairs involving verb-complement constructions (§4).[1]

• We discuss new analysis of human judgements of veridicality and implications for NLI system development going forward (§5).

• We evaluate the state-of-the-art BERT model on these inferences and present evidence that, while there is still work to be done, the model appears to capture non-trivial properties about verbs' veridicality in context (§6).

[1] https://github.com/alexisjihyeross/verb_veridicality

2 Background and Related Work

There is significant work, both in linguistics and NLP, on veridicality and closely-related topics (factuality, entailment, etc.). We view past work on veridicality within NLP as largely divisible into two groups, which align with two differing perspectives on the role of the NLI task: the sentence-meaning perspective and the speaker-meaning perspective. Briefly, the sentence meaning approach to NLI takes the position that NLP systems should strive to model the aspects of a sentence's semantics which are closely derivable from the lexicon and which hold independently of context (Zaenen et al., 2005). In contrast, the speaker meaning approach to NLI takes the position that NLP systems should prioritize representation of the goal-directed meaning of a sentence within the context in which it was generated (Manning, 2006). Work on veridicality which aligns with the sentence-meaning perspective tends to focus on characterizing verbs according to their lexical semantic classes (or "signatures"), while work which aligns with the speaker-meaning approach focuses on representing "world knowledge" and evaluating inferences in naturalistic contexts.

Lexical Semantics (Sentence Meaning). Most prior work treats veridicality as a lexical semantic phenomenon. Such work is largely based on lexicons of verb signatures which specify the types of inferences licensed by individual verbs (Karttunen, 2012; Nairn et al., 2006; Falk and Martin, 2017). White and Rawlins (2018); White et al. (2018) evaluated neural models' ability to carry out inferences in line with these signatures, making use of templatized "semantically bleached" stimuli (e.g. "someone knew something") in order to avoid confounds introduced by world knowledge and pragmatic inference. McCoy et al. (2019) perform a similar study, though without specific focus on veridicality lexicons.

Most applied work related to veridicality also falls under the lexical semantic approach. In nearly all cases, relevant system development involves explicit incorporation of verb lexicons and associated logical inference rules. MacCartney and Manning (2009); Angeli and Manning (2014); and others incorporated knowledge of verb signatures within a natural logic framework (MacCartney, 2009; Sánchez Valencia, 1991) in order to perform natural language inference. Richardson and Kuhn (2012) incorporated signatures into a semantic parsing system. Several recent models of event factuality similarly make use of veridicality lexicons as input to larger machine-learned systems for event factuality (Saurí and Pustejovsky, 2012; Lotan et al., 2013; Stanovsky et al., 2017; Rudinger et al., 2018). Cases et al. (2019) used nested veridicality inferences as a test case for a meta-learning model, again assuming verb signatures as "meta information" known a priori.

Pragmatics (Speaker Meaning). Geis and Zwicky (1971) observed that implicative verbs often give rise to "invited inferences", beyond what is explainable by the lexical semantic type of the verb. For example, on hearing "He did not refuse to speak", one naturally concludes that "He spoke" unless additional qualifications are made (e.g. "...he just didn't have anything to say"). de Marneffe et al. (2012) explored this idea in depth and presented evidence that such pragmatic inferences are both pervasive and annotator-dependent, but nonetheless systematic enough to be relevant for NLP models. Karttunen et al. (2014) makes similar observations specifically in the case of evaluative adjectives, and Pavlick and Callison-Burch (2016) specifically in the case of simple implicative verbs. In non-computational linguistics, Simons et al. (2017, 2010); Tonhauser et al. (2018) take a strong stance and argue that veridicality judgements are entirely pragmatic, dependent solely on the question under discussion (QUD) within the given context.

This Work. This paper assumes the speaker-meaning approach: we take the position that models which consistently mirror human inferences about veridicality in context can be said to understand veridicality in general. We acknowledge that the question of what is the "right" approach to NLI has existed since the original definition of the recognizing textual entailment (RTE) task (Dagan et al., 2006) and remains open. However, there has been a de facto endorsement of the speaker-meaning definition, evidenced by the widespread adoption of NLI datasets which favor informal, "natural" inferences over prescriptivist annotation guidelines (Manning, 2006; Bowman et al., 2015; Williams et al., 2018). (Note, recently, there have been explicit endorsements as well; see Westera and Boleda (2019).) Thus, from this perspective, we ask: do NLI models which are not specifically endowed with lexical semantic knowledge pertaining to veridicality nonetheless learn to model this semantic phenomenon?

Factive [+/+]   He realized that he had to leave this house. → He had to leave this house.
                He did not realize that he had to leave this house. → He had to leave this house.
Implic. [+/−]   At that moment, I happened to look up. → At that moment, I looked up.
                At that moment, I did not happen to look up. → ¬ At that moment, I looked up.
Implic. [−/◦]   He refused to do the same. → ¬ He did the same.
                He did not refuse to do the same. ↛ He did the same.
NA [◦/◦]        Many felt that its inclusion was a mistake. ↛ Its inclusion was a mistake.
                Many did not feel that its inclusion was a mistake. ↛ Its inclusion was a mistake.

Table 1: Examples of several verb signatures and illustrative contexts for each. Signature s1/s2 denotes that the complement will project with polarity s1 in a positive environment and polarity s2 in a negative environment.

3 Projectivity and Verb Signatures

Veridicality is typically treated as a lexical semantic property of verbs, specified by the verb's signature. These signatures can indicate that a verb licenses positive (+), negative (−), or neutral (◦) inferences. Specifically, Karttunen (2012) defines these as two-bit signatures, to reflect that verbs[2] may behave differently in positive vs. negative environments. For example, a factive verb construction like "know that" has a +/+ signature, indicating that the complement projects positively in both positive and negative environments. That is, both "He knows that the answer is 5" and "He does not know that the answer is 5" imply that "The answer is 5". In contrast, a verb like "manage to" has the signature +/− since, in a positive environment, the complement projects ("I managed to pass" → "I passed") but, in a negative environment, the negation of the complement projects ("I did not manage to pass" → ¬ "I passed"). Other verbs may exhibit veridicality only in positive or negative environments but not in both. For example, "refuse to" has signature −/◦: "She refused to dance" → ¬ "She danced", but "She did not refuse to dance" neither implies nor contradicts the claim "She danced". Still other verbs are entirely non-veridical (◦/◦). For example, "hope to" is not expected to license any inferences about the truth of its complement. We consider 8 signatures[3] in total. Table 1 provides several examples. Table 2 lists all of the signatures and the corresponding verbs we consider.

[2] Karttunen (2012) actually discusses these signatures for implicative verbs only. We adopt the notation, but use it for factives and uncategorized verbs as well.
[3] Seven from http://web.stanford.edu/group/csli_lnr/Lexical_Resources/, plus ◦/◦.

4 Data

For our analysis, we collect an NLI dataset for veridicality evaluation derived from the MNLI corpus. This data is publicly available at https://github.com/alexisjihyeross/verb_veridicality.

4.1 Generating NLI Pairs

We generate NLI-style premise/hypothesis pairs based on sentences drawn from the train+dev splits of the MultiNLI (Williams et al., 2018) corpus. Specifically, we collect all sentences appearing in MNLI[4] which contain any verb-complement construction, e.g. any sequence matching the pattern: verb {"to"|"that"} {VP|S}. Since we aim to manipulate the environment (positive vs. negative) ourselves in a controlled manner, we filter out sentences which already contain explicit negation words (e.g. "no", "not", "never", "n't"), conditionals ("if"), and passive constructions ("was intended to"). From the selected sentences, we take a stratified sample over the lemma of the verb. For use in later analysis, we associate each lemma with a signature using a manually-curated dictionary[5] of implicative and factive verbs, and assign the signature of ◦/◦ to verbs which do not appear in the dictionary. We sample a set of sentences for every lemma, taking no more than 40 sentences per lemma, and weighting our sampling to prefer shorter sentences, in order to reduce cognitive load on our raters.

[4] We removed sentences from the telephone genre, since the parses on these tended to be noisy and thus our manipulations often yielded incorrect or difficult to interpret sentences.
[5] http://web.stanford.edu/group/csli_lnr/Lexical_Resources/
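To make the selection and pair-generation procedure concrete, here is a minimal sketch that uses a crude regular expression in place of the MNLI parse trees and a toy lemma table in place of proper verb inflection; the function name, the regex, and the lemma table are illustrative assumptions, not the authors' implementation.

```python
import re

# Crude stand-in for the parse-based extraction: a main verb followed by
# "to" or "that" and a complement clause. The paper uses MNLI parse trees.
VERB_COMP = re.compile(r"\b(\w+)\s+(to|that)\s+(.+?)\.?$", re.IGNORECASE)
# Sentences already containing negation or conditionals are filtered out.
FILTERED = re.compile(r"\b(no|not|never|if)\b|n't", re.IGNORECASE)
# Toy lemma lookup for the demo; the paper uses the pattern.en library instead.
LEMMAS = {"knows": "know", "refused": "refuse", "managed": "manage"}

def make_pairs(sentence):
    """Return the two premise/hypothesis pairs <S, C> and <not-S, C>, or None if filtered."""
    if FILTERED.search(sentence):
        return None
    match = VERB_COMP.search(sentence)
    if match is None or match.group(1).lower() not in LEMMAS:
        return None
    verb, comp_type, comp = match.groups()
    lemma = LEMMAS[verb.lower()]
    negated = sentence.replace(f"{verb} {comp_type}",
                               f"did not {lemma} {comp_type}", 1)
    hypothesis = comp[0].upper() + comp[1:] + "."
    return (sentence, hypothesis), (negated, hypothesis)

print(make_pairs("He knows that the answer is 5."))
# (('He knows that the answer is 5.', 'The answer is 5.'),
#  ('He did not know that the answer is 5.', 'The answer is 5.'))
```

A real implementation would rely on the parse trees and on pattern.en for inflection, as described above; the regex version is only meant to show the shape of the two generated pairs.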

We consider each sampled sentence S to be a candidate premise. We then generate premise/hypothesis pairs as follows. We use the parse tree provided by MNLI to extract the complement clause C. When needed, we inflect[6] the verb in the complement to match the tense of the main verb. We then generate two ⟨p, h⟩ pairs: the sentence and the complement as-is ⟨S, C⟩, and the negated sentence plus the complement ⟨¬S, C⟩. For example, given an original sentence like "He knows that the answer is 5", we would generate two ⟨p, h⟩ pairs: ⟨"He knows that the answer is 5", "The answer is 5"⟩ and ⟨"He does not know that the answer is 5", "The answer is 5"⟩. The examples shown in Table 1 illustrate ⟨p, h⟩ pairs drawn from our dataset, generated this way.

[6] https://www.clips.uantwerpen.be/pages/pattern-en

4.2 Annotation

For each ⟨p, h⟩ pair, we collect human judgements on Amazon Mechanical Turk. We have raters label entailment on a 5-point likert scale in which −2 means that h is definitely not true given p and 2 means that h is definitely true given p. This ordinal labelling scheme[7] matches prior work on common sense inference (Zhang et al., 2017), and on veridicality specifically (de Marneffe et al., 2012). We do not provide examples for boundary cases (the difference between −2 and −1, or 1 and 2) to avoid biasing raters by providing explicit guidance about the extent to which common sense can factor in. Raters have the option of indicating with a check box that one or both sentences does not make sense, and thus that they are unable to judge. We require that raters have had at least 100 approved tasks, have maintained a 98% approval rating, and are located in a primarily English-speaking country (US, AU, GB, CA). We collect three annotations per p/h pair, and pay $0.10 per set of six pairs labelled.

[7] Full annotation guidelines in Supplementary.

Quality Controls and Exclusion Criteria. We remove all sentence pairs in which one or more raters checked the "does not make sense" box. We remove sentences from our analysis unless both the ⟨S, C⟩ and the ⟨¬S, C⟩ pairs passed this filter. Finally, we remove verbs from our analysis which, after the above filtering, do not appear with at least 4 sentences (i.e. 8 p/h pairs). Our final dataset contains 137 verb types across 1,498 sentences (2,996 pairs). Table 2 lists the verbs included in our dataset and the number of sentences in which each appears.[8] To measure inter-rater agreement, for each example and each of the three raters assigned to the example, we calculated the correlation between that rater's score and the averaged score of the other two raters. The Spearman correlation among raters, averaged across the three raters for each example, was 0.78 for positive contexts and 0.74 for negative contexts.

[8] See Supplementary for number of contexts that were excluded for each verb.

+/+: realize that (34), know that (32), remember that (17), find that (12), notice that (12), reveal that (12), acknowledge that (11), admit that (11), learn that (11), observe that (11), see that (11), note that (10), recognize that (10), understand that (10), discover that (8)
+/−: manage to (30), begin to (12), serve to (11), start to (11), dare to (8), use to (7), get to (6), come to (5)
−/+: forget to (15), fail to (10)
◦/+: suspect that (11), explain that (10), mean to (10), predict that (10)
◦/−: attempt to (28)
−/◦: refuse to (36), decline to (12), remain to (7)
+/◦: show that (12), confirm that (11), demonstrate that (10), ensure that (9), help to (9), tend to (8)
◦/◦: try to (34), hope that (20), hope to (18), mention that (14), like to (12), continue to (12), expect that (12), agree that (12), love to (12), reply that (12), conclude that (12), say that (12), complain that (12), speculate that (12), state that (12), suggest that (12), worry that (12), mean that (12), intend to (11), insist that (11), imply that (11), indicate that (11), plan to (11), promise to (11), prove to (11), saw that (11), seem that (11), tell that (11), think that (11), felt that (11), write that (11), decide to (11), assume that (11), believe that (11), assert that (11), concern that (11), estimate that (11), convince that (11), decide that (11), appear that (11), argue that (11), aim to (11), cease to (10), strive to (10), proceed to (10), choose to (10), seem to (10), prove that (10), provide that (10), seek to (10), appear to (10), comment that (10), contend that (10), want to (10), doubt that (10), feel that (10), fear that (10), agree to (10), announce that (9), claim that (9), struggle to (9), hear that (9), propose to (9), wish to (9), say to (9), turn to (8), wish that (8), work to (8), advise that (8), move to (8), claim to (8), expect to (8), report that (8), happen to (8), propose that (8), hold that (8), declare that (8), prefer to (8), need to (8), give that (7), deserve to (7), threaten to (7), exist to (7), be that (7), prepare to (6), wait to (6), pretend to (6), ask to (6), return to (6), request that (5), demand that (4), recommend that (4), require that (4)

Table 2: 137 verbs belonging to 8 signatures. Parentheses denote the number of contexts in which each verb appears in our final, annotated dataset (§4).

4.3 Aggregation

We take the mean of the three human judgements for each sentence pair. We then represent each verb v (in the context of a given sentence S) using a continuous analog of the projectivity signatures discussed in §3. That is, we take the mean score for ⟨S, C⟩ as a measure of the veridicality of v (in the context of S) in a positive environment, and the mean score for ⟨¬S, C⟩ as a measure of the veridicality in a negative environment. For example, given S = "David Plotz failed to change my mind" and C = "David Plotz changed my mind", we get a soft projectivity "signature" of −2.0/1.67, which is consistent with the expected (discrete) −/+ signature for "fail to".
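A minimal sketch of this aggregation step, assuming a hypothetical record layout for the collected annotations (the field names and example values below are ours, not the released data format):

```python
# Hypothetical per-pair annotation records:
# (verb, sentence_id, environment, three ratings on the -2..2 scale).
annotations = [
    ("fail to", 17, "pos", [-2, -2, -2]),
    ("fail to", 17, "neg", [2, 1, 2]),
]

def soft_signature(records, verb, sent_id):
    """Mean rating in the positive and negative environments for one verb-in-context."""
    scores = {}
    for v, s, env, ratings in records:
        if v == verb and s == sent_id:
            scores[env] = sum(ratings) / len(ratings)
    return scores.get("pos"), scores.get("neg")

# Roughly reproduces the -2.0/1.67 soft signature described above for "fail to".
print(soft_signature(annotations, "fail to", 17))   # approximately (-2.0, 1.67)
```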

[Figure 1: two rows of scatter plots, one panel per signature class (Fact +/+, Impl +/−, Impl −/+, Impl ◦/+, Impl ◦/−, Impl −/◦, Impl +/◦, NA ◦/◦), plotting scores in the Positive Context (x-axis) against scores in the Negative Context (y-axis). Panel (a): Human Judgements. Panel (b): Model Predictions.]

Figure 1: Human judgements (top, blue) and model predictions (bottom, orange) for verbs in each category. Gray squares denote the region in which judgements are expected to fall, given the signature. Each colored dot corresponds to a single context (verb within a specific sentence); each black dot corresponds to a single verb (averaged score over all contexts in which it was judged).

5 Analysis of Human Judgements

Figure 1a plots these soft veridicality signatures for each sentence. We see that, averaged across all the contexts in which they are judged, verbs tend to behave as expected given their assigned lexical semantic signature. However, we observe two noteworthy trends, discussed below. We note that these observations are consistent with arguments made by de Marneffe et al. (2012) about the strong effects of pragmatics on veridicality judgments.

Veridicality Bias. First, we observe a systematic "veridicality bias", in which inferences about complements are often made (positive or negative), even in environments where the expectation is that the verb is non-veridical (◦ signature). This trend is most evident in the case of verbs with ◦/◦ signatures, for example, "think that", "want to". While embedding under such verbs should not license any inferences about the truth of the complement, we observe that, in practice, these verbs tend to behave like +/− verbs. That is, the complement is taken as true in positive environments and as false in negative environments. Table 3 shows some examples for which this is the case.

[+] (1.7)   The GAO has indicated that it is unwilling to compromise. → It is unwilling to compromise.
[−] (−1.0)  The GAO has not indicated that it is unwilling to compromise. → ¬ It is unwilling to compromise.
[+] (1.3)   But most visitors prefer to linger in Formentera. → But most visitors linger in Formentera.
[−] (−1.3)  But most visitors do not prefer to linger in Formentera. → ¬ But most visitors linger in Formentera.

Table 3: Examples of verbs which are expected to be ◦/◦, but which behave like +/− in context. We refer to this trend as a general "veridicality bias".

Within-Verb Variation. Second, we observe that, while individual verb types tend to behave in line with their expected signatures on average, signatures provide a weak signal for predicting the inferences licensed by the verb in any sentence individually. That is, within each signature, we see high variance across contexts, in all cases spanning at least 2 points (on our −2 to 2 scale). Table 4 shows examples of words receiving different signatures based on context. Quantitatively, in an ordinary least squares regression[9], we find that using verb signature alone to predict the human judgments in a given context explains only a small amount of the observed variation (R² ≈ 0.11).[10]

[9] statsmodels.regression.linear_model.OLS
[10] For context, using the verb type itself produced R² ≈ 0.72. We experimented with other contextual features in combination with linguistic category and/or verb type (e.g. tense of the main verbs, first vs. third person subjects, etc.) to try to improve the fit of the model, but did not find any noteworthy effects. It is likely that more careful featurization of highly relevant concepts–e.g. at-issueness (Tonhauser et al., 2018)–could yield more conclusive insights about which aspects of context lead to within-signature, or within-verb, variation. We leave such analysis for future work, and conclude simply that endowing NLI models with knowledge of projectivity signatures is not alone sufficient for producing human-like inferences on such sentences.
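Footnote 9 points to statsmodels' OLS; the sketch below shows the kind of regression involved, using made-up toy data rather than the actual annotations, so the R² it prints is not meaningful beyond illustrating the setup:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the annotated data: one row per verb-in-context judgement,
# with the mean human score and the verb's expected lexical signature.
df = pd.DataFrame({
    "score":     [1.8, 1.5, -1.7, 0.3, 1.2, -0.9, 0.1, 1.6],
    "signature": ["+/+", "+/+", "-/+", "o/o", "o/o", "+/-", "o/o", "+/+"],
})

# Regress the human score on signature dummies alone; on the real data a low
# R^2 (roughly 0.11 in the paper) indicates that the signature by itself
# explains little of the context-to-context variation.
model = smf.ols("score ~ C(signature)", data=df).fit()
print(model.rsquared)
```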

[+] (1.7)  Everyone knows that the CPI is the most accurate. → The CPI is the most accurate.
[+] (1.7)  Everyone does not know that the CPI is the most accurate. → The CPI is the most accurate.
[+] (0.7)  I know that I was born to succeed. → I was born to succeed.
[◦] (0.3)  I do not know that I was born to succeed. ↛ I was born to succeed.

Table 4: Examples of how the factive verb "know that" can exhibit different signatures, depending on context.

Takeaways. Overall, we interpret the above analysis as evidence that veridicality judgments rely heavily on contextual as opposed to purely lexical semantic factors. While this is not a novel conclusion (Simons et al., 2010; de Marneffe et al., 2012), it is still frequently the case that system development concerned with improving veridicality judgements nearly always proceeds by incorporating explicit lexical semantic knowledge into the pipeline or architecture (Richardson and Kuhn, 2012; Lotan et al., 2013; Stanovsky et al., 2017; MacCartney and Manning, 2009; Angeli and Manning, 2014; Saurí and Pustejovsky, 2012; Cases et al., 2019; Rudinger et al., 2018). Our analysis suggests such approaches are likely to yield only incremental gains. While admittedly more difficult to encode, focusing on context-specific factors first, e.g. pragmatics (de Marneffe et al., 2012) or the question under discussion (Simons et al., 2010), would likely be more productive and may ultimately override the need for verb signatures altogether.

6 Analysis of BERT Predictions

We now turn to our primary question: do current NLI models capture the veridicality of verbs? In particular, we are interested in the behavior of a distributional model that is not specifically endowed with lexical semantic information related to veridicality. We ask two questions. First: does such a model learn to make inferences consistent with those made by humans? Second: if the model does mirror human inferences, are the predictions based solely on the presence of specific lexical items, or are they sensitive to structural factors (namely, syntactic position and complement type)? Again, we prioritize modeling speaker meaning. Thus, we believe the model should ideally reflect the same biases and variation observed in the human judgments, not necessarily the inferences expected based on the lexical semantic signatures of the verbs.

6.1 Setup

We use the state-of-the-art BERT-based NLI model. Specifically, we use the original TensorFlow implementation[11] of the NLI model built on top of the pretrained BERT language model (Devlin et al., 2018). We use the model off-the-shelf, with default training setup and hyperparameters. To fine-tune the model for the NLI task, we use the standard train/dev splits from the MNLI corpus, but, to avoid confounds, we remove the 1,500 p/h pairs from which our new test set is derived (as described in §4).[12] The model is trained to make a softmax classification over three classes: {ENTAILMENT, CONTRADICTION, NEUTRAL}. When necessary to compare these discrete predictions to our continuous human judgments, we map the prediction to a continuous value using P(ENTAILMENT) − P(CONTRADICTION). This score ranges from −1 to 1 and is comparable to our human scores. Conversely, when necessary to compare the continuous human score to the discrete predictions, we discretize scores into evenly-sized bins.[13] Overall, similar performance trends hold whether we compare in discrete space or continuous space. In the analyses below, we use whichever is most interpretable, as reported.

[11] https://github.com/google-research/bert
[12] See Supplementary Material for a discussion of the prevalence of veridicality phenomena in MNLI train.
[13] CON: s < −2/3; NEU: −2/3 ≤ s < 2/3; ENT: 2/3 ≤ s
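A minimal sketch of the two mappings just described; it assumes the ±2/3 thresholds of footnote 13 apply to the raw −2 to 2 human rating, which is what makes the three bins evenly sized:

```python
def to_continuous(probs):
    """Map the model's 3-way softmax to a single score in [-1, 1]:
    P(ENTAILMENT) - P(CONTRADICTION)."""
    return probs["entailment"] - probs["contradiction"]

def discretize(human_score):
    """Bin a mean human rating on the [-2, 2] scale into the three NLI classes,
    using the evenly-sized bins with boundaries at +/- 2/3 (footnote 13)."""
    if human_score < -2 / 3:
        return "contradiction"
    if human_score < 2 / 3:
        return "neutral"
    return "entailment"

# Example with a hypothetical softmax output and a hypothetical mean rating:
print(to_continuous({"entailment": 0.90, "neutral": 0.08, "contradiction": 0.02}))  # ~0.88
print(discretize(1.67))   # 'entailment'
```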

6.2 Overall Prediction Performance

We first measure raw prediction performance: do the inferences made by BERT mirror the inferences that our human raters made? Figure 1b shows scatter plots of the model's predictions (mapped into continuous space) side-by-side with the human scores just discussed. Table 5 shows performance evaluated against human judgements in terms of both (discrete) classification accuracy and (continuous) Spearman correlation.

            Count           Positive          Negative
            Sent.  Verb  Exp.  Acc.  ρ      Exp.  Acc.  ρ      Example Verbs
Fact +/+    212    15    +     0.62  0.17   +     0.29  0.40   realize that, know that
Impl +/−    100    9     +     0.57  0.51   −     0.73  0.51   manage to, begin to
Impl −/+    25     2     −     0.80  0.61   +     0.52  0.39   forget to, fail to
Impl ◦/+    63     6     ◦     0.27  0.21   +     0.43  0.43   suspect that, explain that
Impl ◦/−    28     1     ◦     0.11  0.25   −     0.71  0.45   attempt to
Impl −/◦    55     3     −     0.93  0.70   ◦     0.02  -0.10  refuse to, decline to
Impl +/◦    80     8     +     0.38  0.21   ◦     0.54  0.21   show that, confirm that
NA ◦/◦      935    93    ◦     0.21  0.35   ◦     0.44  0.47   try to, hope to
Overall     1,498  137         0.34  0.63         0.44  0.57

Table 5: Accuracy and Spearman correlation of BERT MNLI model predictions against human judgements. The +/−/◦ symbols denote the expected labels based on the lexical semantic category of the verb, and are not necessarily the labels given by our human annotators (compare against Figure 1).

Broadly speaking, the model's predictions appear to follow the same trends as the humans' ratings (Figure 1). That is, averaged across contexts, the model's treatment of verbs is the same as the humans' treatment: largely in line with the signatures, but with a bias against assuming neutral (non-veridical) behavior. However, whereas the humans' judgments span all levels of certainty (taking a range of values from −2 to 2), the model tends to make predictions with high confidence. This is especially the case in positive environments, where the model nearly always predicts with 99+% confidence. In negative environments, the model expresses a greater range of uncertainty values, and is much more closely in line with what we observe in human judgements. In terms of quantitative measures of accuracy (Table 5), the most notable trend is that model performance is highest for cases in which the negation of the complement is expected to project (− signatures). This is true regardless of whether that behavior occurs in a positive or negative environment. We note that, for such cases, human judgements closely align with the lexical semantic predictions. The model performs worst in positive environments when the verb is expected to be non-veridical (◦ signatures). This appears to result from the model's tendency to over-exaggerate the veridicality bias: i.e. whereas humans show a general tendency to assume the complement projects in these cases, the model predicts ENTAILMENT with near certainty (see Figure 1).
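For reference, a minimal sketch of how the two reported metrics could be computed from hypothetical model softmax outputs and mean human ratings; the helper and its data layout are our own illustration, not the authors' evaluation code:

```python
from scipy.stats import spearmanr

def evaluate(model_probs, human_scores):
    """Spearman correlation between the model's continuous score and the mean
    human rating, plus accuracy of the model's argmax label against the
    discretized human rating (evenly-sized bins with boundaries at +/- 2/3)."""
    continuous = [p["entailment"] - p["contradiction"] for p in model_probs]
    rho = spearmanr(continuous, human_scores).correlation
    predicted = [max(p, key=p.get) for p in model_probs]
    gold = ["contradiction" if h < -2 / 3 else "neutral" if h < 2 / 3 else "entailment"
            for h in human_scores]
    accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    return accuracy, rho
```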

Main Verb (Match)
  S   He attempted to overcome the sensation.
  S*  He tried to overcome the sensation.
  C   He overcame the sensation.
Main Verb (Mismatch)
  S   I decided that the department had acted illegally.
  S*  I tried that the department had acted illegally.
  C   The department had acted illegally.
Complement Verb
  S   He attempted to overcome the sensation.
  S*  He attempted to try the sensation.
  C*  He tried the sensation.
Complement Type
  S   They tried to get his attention.
  S*  They tried that get his attention.
  C   They got his attention.

Table 6: Examples of counterfactual manipulations with the target verb construction "try to".

6.3 Counterfactual Analysis

Next, we ask: are the above-observed trends in BERT's predictions driven predominantly by lexical priors, i.e. the presence of a specific verb, or are they sensitive to other lexicosyntactic factors that should ideally affect the inference?

Experimental Design. For each verb construction vt (e.g. "try to" or "realize that") in our dataset, we perform several manipulations in which we insert v or t into sentences where they did not originally appear, and observe the effect this has on the distribution of the model's predictions. Our specific manipulations and the expected effects are described below. Table 6 shows examples. For convenience, we use D to refer to the set of all the ⟨S, C⟩ pairs in our dataset, D_vt to refer to all the pairs in which vt appears as the main verb clause in S, and D_to (D_that) to refer to all the pairs in which C is a to-complement (that-complement). When clear from context, we abuse notation and use e.g. D to refer both to the dataset itself and to the distribution of the model's predictions when run over the dataset.

Replace Main Verb: For each pair ⟨S, C⟩ ∈ D, we replace the main verb in S with the target verb v, generating a new premise S*. We expect that, if the model is sensitive specifically to the presence of v and its effect on inferences, then the distribution of model predictions over all ⟨S*, C⟩ pairs should be more similar to the target distribution of predictions over all of D_vt than to the baseline distribution of predictions over all of D. We differentiate between settings with "matched" complement types, where we generate S* from pairs in D_t, from those with "mismatched" complement types, where we generate from pairs in D_¬t. E.g., for a target vt = "try to", we consider substitutions into premises from D_to as "matched" and substitutions into D_that as "mismatched". Preserving this distinction allows us to both avoid confounds due to ungrammatical substitutions, and to investigate whether the model is sensitive to verbs which behave differently when they take different complements. For example, "forget" is −/+ when it takes to but +/+ when it takes that.

Replace Complement Verb: For each ⟨S, C⟩ ∈ D, we replace the main verb in C with the target verb v, generating a new hypothesis C*. We expect that, if the model is sensitive not just to the presence of v, but also to its syntactic role, then the distribution of predictions over all ⟨S, C*⟩ pairs should resemble the baseline distribution over D more than the target distribution over D_vt.

Replace Complement Type: For each pair ⟨S, C⟩ ∈ D_vt, we replace the t in S with the alternative complement type ("to" → "that"; "that" → "to"), generating a new premise S*. This generates ungrammatical sentences, and serves as a control experiment, to check whether the model is considering the entirety of the context in which the verb construction appears, or merely the vt bigram. We expect that, if the model is considering the whole context, the distribution of predictions over all ⟨S*, C⟩ should resemble the target distribution D_vt more than the baseline distribution D_t.

Results. Table 7 shows, for verbs within each signature, the KL divergence between the post-manipulation prediction distribution (D*) and 1) the baseline distribution (D) and 2) the target distribution (D_vt). Results are shown for both the main and complement verb manipulations.
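For concreteness, here is a sketch of one way such a comparison could be computed; the paper does not specify exactly how the prediction distribution over a dataset is summarized, so this sketch assumes a smoothed distribution over the three predicted labels:

```python
import numpy as np
from collections import Counter

LABELS = ("entailment", "neutral", "contradiction")

def label_distribution(predicted_labels, smooth=1e-6):
    """Summarize a set of model predictions as a distribution over the three labels."""
    counts = Counter(predicted_labels)
    probs = np.array([counts[label] for label in LABELS], dtype=float) + smooth
    return probs / probs.sum()

def kl_divergence(p, q):
    """KL(p || q) between two label distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical comparison: predictions on manipulated premises (D*) against
# predictions on the target verb's own pairs (D_vt) and on the full dataset (D).
d_star = label_distribution(["entailment", "entailment", "neutral"])
d_vt = label_distribution(["entailment", "neutral", "neutral"])
d = label_distribution(["entailment", "neutral", "contradiction"])
print(kl_divergence(d_star, d_vt), kl_divergence(d_star, d))
```

The small smoothing constant only prevents division by zero when a label never occurs in one of the sets; it does not meaningfully change the comparison.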

A few trends are worth highlighting. First, we do see evidence that the model's prediction depends at least in part on the individual verb type. This is supported by the fact that, across verb signatures, manipulation of the main verb leads to distributions which are more similar to the target verb distribution D_vt than to the baseline distribution D. This trend is strongest for verbs which involve − signatures. Second, we see encouraging, though not overwhelming, evidence that the model's predictions are sensitive to the syntactic position of the verb. This is supported by the fact that, in general, the similarity between D* and D_vt is much lower (higher KL) when the manipulation occurs in the complement clause than when it occurs in the main clause. Note that, ideally, this manipulation should not affect the prediction distribution at all. Nonetheless, the trend is clear and points in the right direction.

Table 8 shows the KL divergence between D* and the target verb distribution D_vt in the matched and mismatched cases.[14] We see that BERT behaves as we hope, namely, it makes different predictions for "v to" constructions and "v that" constructions, even when the v is the same. Manipulating the main verb only substantially affects predictions when the manipulation occurs in a context with the right complement type (matched); when the manipulation results in an ungrammatical sentence (mismatched), the prediction remains close to baseline. An example of such verb-construction differentiation is shown in Figure 2 for the verb "know that", but this is a trend seen across verbs. Moreover, we see that this effect is not just driven by sensitivity to the specific vt bigram. That is, simply swapping "to" with "that" (or vice-versa) in a naturally-occurring context leads to a small shift in the distribution of the model's predictions away from the target D_vt distribution, but not to the same degree as replacing a "that"-taking verb with a "to"-taking verb in a naturally-occurring "that" context (or vice-versa). This result provides some evidence that BERT's prediction is influenced by aspects of the context other than just the presence of the vt bigram.

[14] See Supplementary for breakdown by verb type.

             Main Verb       Compl. Verb
             Pos.   Neg.     Pos.   Neg.
+/+  D*‖D_vt 0.00   0.01     0.02   0.14
     D*‖D    0.17   0.43     0.05   0.08
+/−  D*‖D_vt 0.01   0.02     0.17   0.01
     D*‖D    0.17   0.67     0.04   0.39
−/+  D*‖D_vt 0.04   0.06     1.36   1.39
     D*‖D    1.32   1.40     0.04   0.42
◦/+  D*‖D_vt 0.01   0.02     0.01   0.06
     D*‖D    0.12   0.22     0.03   0.05
◦/−  D*‖D_vt 0.05   0.07     0.13   0.01
     D*‖D    0.01   0.76     0.04   0.42
−/◦  D*‖D_vt 0.28   0.11     1.90   0.53
     D*‖D    0.65   0.26     0.03   0.35
+/◦  D*‖D_vt 0.00   0.06     0.05   0.31
     D*‖D    0.14   0.04     0.00   0.02
◦/◦  D*‖D_vt 0.00   0.02     0.01   0.06
     D*‖D    0.02   0.00     0.00   0.02

Table 7: Similarity (KL divergence) of the post-manipulation prediction distribution to the target verb distribution (D_vt, top row) and the baseline distribution (D, bottom row). High similarity to D_vt suggests the model changed its predictions in response to the manipulation.

             Match   Mis.   Swap
Fact +/+     0.01    0.48   0.25
Impl +/−     0.01    0.86   0.16
Impl −/+     0.05    1.72   0.10
Impl ◦/+     0.01    0.16   0.22
Impl ◦/−     0.06    0.86   0.00
Impl −/◦     0.19    1.16   0.71
Impl +/◦     0.03    0.23   0.28
NA ◦/◦       0.01    0.03   0.46

Table 8: KL divergence between D* and D_vt for complement type manipulations ("to" vs. "that"). Inserting v into a context affects BERT's predictions only when the complement is compatible with v.

Figure 2: Results for the main verb replacement experiment for the verb "know that". In examples with the VB that construction, the replacement results in predictions that resemble those for "know that". For VB to examples, the resulting counterfactual predictions resemble those of the base examples.

7 Conclusion

We investigate how well BERT, a neural NLI model not explicitly endowed with knowledge of lexical semantic verb signatures, is able to learn to make correct inferences about veridicality. We collect a new NLI dataset of human veridicality judgements. We observe that human judgments often differ from what is predicted given the lexical semantic types of verbs, and that BERT is able to replicate many of these judgments, although there is still significant room for improvement. Through counterfactual experiments, we show that individual verbs strongly influence BERT's predictions, and that these cues interact with syntactic information in desirable ways.

Acknowledgments

Thank you to Sam Bowman and Ben Van Durme for helpful feedback early on in this project. Thank you to the rest of the JSALT 2018 Sentence Representation Team for support throughout. Much of this work was conducted at the 2018 Frederick Jelinek Memorial Summer Workshop on Speech and Language Technologies, and supported by Johns Hopkins University with unrestricted gifts from Amazon, Facebook, Google, Microsoft and Mitsubishi Electric Research Laboratories.

References

Gabor Angeli and Christopher D. Manning. 2014. NaturalLI: Natural logic inference for common sense reasoning. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 534–545.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.

Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, Sandhini Agarwal, Joshua D. Greene, Dan Jurafsky, et al. 2019. Recursive routing networks: Learning to compose modules for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1.

Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment, pages 177–190. Springer.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Ingrid Falk and Fabienne Martin. 2017. Towards an inferential lexicon of event selecting predicates for French. In IWCS 2017 - 12th International Conference on Computational Semantics - Long papers.

Michael L. Geis and Arnold M. Zwicky. 1971. On invited inferences. Linguistic Inquiry, 2(4):561–566.

Lauri Karttunen. 1971. Implicative verbs. Language, pages 340–358.

Lauri Karttunen. 2012. Simple and phrasal implicatives. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 124–131, Montréal, Canada. Association for Computational Linguistics.

Lauri Karttunen, Stanley Peters, Annie Zaenen, and Cleo Condoravdi. 2014. The chameleon-like nature of evaluative adjectives. Empirical Issues in Syntax and Semantics, 10:233–250.

Paul Kiparsky and Carol Kiparsky. 1968. Fact. Linguistics Club, Indiana University.

Amnon Lotan, Asher Stern, and Ido Dagan. 2013. TruthTeller: Annotating predicate truth. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 752–757.

Bill MacCartney. 2009. Natural language inference. Ph.D. thesis, Citeseer.

Bill MacCartney and Christopher D. Manning. 2009. Natural language inference. Citeseer.

Christopher D. Manning. 2006. Local textual inference: it's hard to circumscribe, but you know it when you see it, and NLP needs it.

Marie-Catherine de Marneffe, Christopher D. Manning, and Christopher Potts. 2012. Did it happen? The pragmatic complexity of veridicality assessment. Computational Linguistics, 38(2):301–333.

R. Thomas McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv preprint arXiv:1902.01007.

Rowan Nairn, Cleo Condoravdi, and Lauri Karttunen. 2006. Computing relative polarity for textual inference. In Proceedings of the Fifth International Workshop on Inference in Computational Semantics (ICoS-5).

Ellie Pavlick and Chris Callison-Burch. 2016. Tense manages to predict implicative behavior in verbs. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2225–2229.

Kyle Richardson and Jonas Kuhn. 2012. Light textual inference for semantic parsing. In Proceedings of COLING 2012: Posters, pages 1007–1018, Mumbai, India. The COLING 2012 Organizing Committee.

Rachel Rudinger, Aaron Steven White, and Benjamin Van Durme. 2018. Neural models of factuality. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 731–744, New Orleans, Louisiana. Association for Computational Linguistics.

Víctor Manuel Sánchez Valencia. 1991. Studies on natural logic and categorial grammar. VM Sánchez Valencia.

Roser Saurí and James Pustejovsky. 2012. Are you sure that this happened? Assessing the factuality degree of events in text. Computational Linguistics, 38(2):261–299.

Mandy Simons, David Beaver, Craige Roberts, and Judith Tonhauser. 2017. The best question: Explaining the projection behavior of factives. Discourse Processes, 54(3):187–206.

Mandy Simons, Judith Tonhauser, David Beaver, and Craige Roberts. 2010. What projects and why. In Semantics and Linguistic Theory, volume 20, pages 309–327.

Gabriel Stanovsky, Judith Eckle-Kohler, Yevgeniy Puzikov, Ido Dagan, and Iryna Gurevych. 2017. Integrating deep linguistic features in factuality prediction over unified datasets. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 352–357.

Judith Tonhauser, David I. Beaver, and Judith Degen. 2018. How projective is projective content? Gradience in projectivity and at-issueness.

Matthijs Westera and Gemma Boleda. 2019. Don't blame distributional semantics if it can't do entailment. In Proceedings of the 13th International Conference on Computational Semantics.

Aaron Steven White and Kyle Rawlins. 2018. The role of veridicality and factivity in clause selection. In Proceedings of the 48th Annual Meeting of the North East Linguistic Society, to appear. Amherst, MA: GLSA Publications.

Aaron Steven White, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2018. Lexicosyntactic inference in neural models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4717–4724, Brussels, Belgium. Association for Computational Linguistics.

Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.

Annie Zaenen, Lauri Karttunen, and Richard Crouch. 2005. Local textual inference: can it be defined or circumscribed? In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 31–36. Association for Computational Linguistics.

Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal common-sense inference. Transactions of the Association for Computational Linguistics, 5:379–395.
