Compositional Lexical Semantics in Natural Language Inference
COMPOSITIONAL LEXICAL SEMANTICS IN NATURAL LANGUAGE INFERENCE

Ellie Pavlick

A DISSERTATION in Computer and Information Science

Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

2017

Supervisor of Dissertation
Chris Callison-Burch, Associate Professor of Computer and Information Science

Graduate Group Chairperson
Lyle Ungar, Professor of Computer and Information Science

Dissertation Committee
Chris Callison-Burch, Associate Professor of Computer and Information Science
Ido Dagan, Professor of Computer Science
Mitch Marcus, Professor of Computer and Information Science
Florian Schwarz, Associate Professor of Linguistics
Lyle Ungar, Professor of Computer and Information Science

COMPOSITIONAL LEXICAL SEMANTICS IN NATURAL LANGUAGE INFERENCE
© COPYRIGHT 2017 Ellie Pavlick Tobochnik

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

Dedicated to my sisters, Cassie and Ginger. Your creativity and sense of humor is why I love language.

ACKNOWLEDGEMENT

I am incredibly lucky to have had such a supportive advisor, Chris Callison-Burch. Chris invested in me very early in my computer science career and had confidence in me well before I had any idea what I was doing. He has been my go-to mentor and my strongest advocate since. With every year, I have only become more excited about the field, and I credit that to Chris's consistent energy and encouragement. It has been a genuine privilege to work together.

A huge thank you to my dissertation committee. Mitch Marcus has been a constant source of wisdom and guidance, and his personality, in my mind, will always be synonymous with Penn NLP and the wonderful intellectual environment I have experienced during my time here.
Lyle Ungar's directness and pragmatism has been invaluable, and he has provided me with clarity and perspective when trying to articulate my own thoughts. Florian Schwarz gave me my first formal education in linguistics, and I am not exaggerating to say that that has had the single largest influence on the way I approach research. Annual meetings with Ido Dagan have been the most enlightening part of ACL conferences over the past several years; his drive is inspiring and every conversation I have had with him has reaffirmed my enthusiasm for the ambitiousness of NLP.

I have been fortunate to work with many other incredible research mentors. I have learned so much from Ben Van Durme, Joel Tetreault, Peter Clark, and Niranjan Balasubramanian and owe each of them a huge thank you. Thank you especially to Marius Pasca, for engaging in every discussion with such depth and for challenging me on every opinion; working together made me a holistically better researcher, and I am extremely grateful. Of course, thank you also to my many coauthors and other collaborators, who have been a constant source of new ideas and great conversations.

Finally, I am forever thankful to my family and friends who have been an unwavering support network. Thank you first to my incredible husband Steve, for being my best friend and for somehow managing to both keep me inspired and keep me grounded over the years. Thank you to my parents, Karin and Michael, for endless love and encouragement and for raising me to be confident and ambitious and unapologetic. Thank you to my sisters, Cassie and Ginger; growing up with you two is the best thing that ever happened to me. Thank you to Andrea, Jan, Howard, Nate, and Sherrie for all the love and support. Last, but never least, thank you to all the friends I have made at Penn and within the ACL community, and to the friends from my other lives at JHU, Peabody, and elsewhere, for all the happiness you've brought me.
ABSTRACT

COMPOSITIONAL LEXICAL SEMANTICS IN NATURAL LANGUAGE INFERENCE

Ellie Pavlick

Chris Callison-Burch

The focus of this thesis is to incorporate linguistic theories of semantics into data-driven models for automatic natural language understanding. Most current models rely on an impoverished version of semantics which can be learned automatically from large volumes of unannotated text. However, many aspects of language understanding require deeper models of semantic meaning than those which can be easily derived from word co-occurrence alone. In this thesis, we inform our models using insights from linguistics, so that we can continue to take advantage of large-scale statistical models of language without compromising on depth and interpretability. We begin with a discussion of lexical entailment. We classify pairs of words according to a small set of distinct entailment relations: e.g. equivalence, entailment, exclusion, and independence. We show that imposing these relations onto a large, automatically constructed lexical entailment resource leads to measurable improvements in an end-to-end inference task. We then turn our attention to compositional entailment, in particular, to modifier-noun composition. We show that inferences involving modifier-noun phrases (e.g. "red dress", "imaginary friend") are much more complex than the conventional wisdom states. In a systematic evaluation of a range of existing state-of-the-art natural language inference systems, we illustrate the inability of current technology to handle the types of common sense inferences necessary for human-like processing of modifier-noun phrases. We propose a data-driven method for operationalizing a formal semantics framework which assigns interpretable semantic representations to individual modifiers. We use our method in order to find instances of fine-grained classes involving multiple modifiers (e.g. "1950s American jazz composers").
We demonstrate that our proposed compositional model outperforms existing non-compositional approaches.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES
LIST OF ILLUSTRATIONS

CHAPTER 1 : Introduction
    1.1 Overview
    1.2 Outline of this Document

CHAPTER 2 : Background and Related Work
    2.1 Definition of "Entailment" in Natural Language
        2.1.1 Entailment in Formal Linguistics
        2.1.2 Entailment in Natural Language Processing
        2.1.3 Types of Knowledge Tested by RTE
        2.1.4 RTE Systems and Approaches
    2.2 Lexical Entailment
        2.2.1 Word Denotations and Semantic Types
        2.2.2 Definition of Semantic Containment
        2.2.3 Basic Entailment Relations in Natural Logic
        2.2.4 Lexical Entailment Resources in NLP
        2.2.5 The Paraphrase Database
    2.3 Compositional Entailment in Modifier-Noun Phrases
        2.3.1 Classes of Modifiers in Formal Semantics
        2.3.2 Adjective Noun Composition in Natural Logic
        2.3.3 Pragmatic Factors Affecting Modifier-Noun Composition
    2.4 Definition of Basic Entailment Relations used in this Thesis
        2.4.1 Relaxing Requirements of Exhaustivity
        2.4.2 Definitions

CHAPTER 3 : Lexical and Non-Compositional Entailment
    3.1 Annotating Basic Entailment Relations
        3.1.1 Assumptions about Context
        3.1.2 Design of Annotation Task
        3.1.3 Labeled Datasets for Training and Evaluation
    3.2 Supervised Model for Lexical Entailment Classification
        3.2.1 Classifier Configuration
        3.2.2 Feature Groups
        3.2.3 Feature Analysis
    3.3 Intrinsic Evaluation of Predicted Relations
        3.3.1 Performance on Paraphrase Pairs Occurring in RTE Data
        3.3.2 Labeling All Paraphrase Pairs in PPDB
    3.4 Using Lexical Entailment Classifier to Improve End-to-End RTE
        3.4.1 The Nutcracker RTE System
        3.4.2 Experimental Setup
        3.4.3 Results
    3.5 Discussion

CHAPTER 4 : Semantic Containment in Compositional Noun Phrases
    4.1 Annotating Compositional Noun Phrases in Context
        4.1.1 Focusing on Denotations vs. Focusing on Inferences
        4.1.2 Studying Composition through Atomic Edits
        4.1.3 Limitations of our Methodology
        4.1.4 Treating Entailment as a Continuum
    4.2 Labeled Datasets for Analysis
        4.2.1 Data Selection
        4.2.2 Annotation
        4.2.3 Filtering and Post-Processing
        4.2.4 Reproducibility
    4.3 Analysis of Human Inferences
        4.3.1 Basic Entailment Relations Generated by MH Composition
        4.3.2 Generalizations for when H entails MH
        4.3.3 Undefined Entailment Relations
    4.4 Privative and Non-Subsective Adjectives
        4.4.1 Experimental Design
        4.4.2 Results
        4.4.3 Analysis
    4.5 Performance of Current RTE Systems
        4.5.1 The Add-One Entailment Task
        4.5.2 Description of Evaluated RTE Systems
        4.5.3 Results and Analysis
    4.6 Discussion

CHAPTER 5 : Noun Phrase Composition for Class-Instance Identification
    5.1 Modeling the Semantics of Noun Phrases
        5.1.1 Modifiers in Formal Linguistics
        5.1.2 Desiderata
        5.1.3 Weaknesses of Existing Computational Approaches
    5.2 Modifier Interpretation
        5.2.1 Assumptions of our Approach
        5.2.2 Data Processing
        5.2.3 Associating Properties with Modifiers
        5.2.4 Analysis of Learned Properties
    5.3 Class-Instance Identification
        5.3.1 Class Membership as a Real-Valued rather than Binary Attribute
        5.3.2 Weakly Supervised Scoring Model
        5.3.3 Summary of Proposed Methods and Variations
    5.4 Evaluation
        5.4.1 Evaluation Data Sets from Wikipedia
        5.4.2 Experimental Setup
        5.4.3 Results and Analysis
    5.5 Discussion

CHAPTER 6 : Conclusion
    6.1 Summary of Contributions