Learning Compound Noun Semantics
UCAM-CL-TR-735
ISSN 1476-2986

Technical Report Number 735

Computer Laboratory

Learning compound noun semantics

Diarmuid Ó Séaghdha

December 2008

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/

© 2008 Diarmuid Ó Séaghdha

This technical report is based on a dissertation submitted July 2008 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Corpus Christi College.

Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/techreports/

Learning compound noun semantics

Diarmuid Ó Séaghdha

Summary

This thesis investigates computational approaches for analysing the semantic relations in compound nouns and other noun-noun constructions. Compound nouns in particular have received a great deal of attention in recent years due to the challenges they pose for natural language processing systems. One reason for this is that the semantic relation between the constituents of a compound is not explicitly expressed and must be retrieved from other sources of linguistic and world knowledge.

I present a new scheme for the semantic annotation of compounds, describing in detail the motivation for the scheme and the development process. This scheme is applied to create an annotated dataset for use in compound interpretation experiments. The results of a dual-annotator experiment indicate that good agreement can be obtained with this scheme relative to previously reported results and also provide insights into the challenging nature of the annotation task.

I describe two corpus-driven paradigms for comparing pairs of nouns: lexical similarity and relational similarity. Lexical similarity is based on comparing each constituent of a noun pair to the corresponding constituent of another pair.
Relational similarity is based on comparing the contexts in which both constituents of a noun pair occur together with the corresponding contexts of another pair. Using the flexible framework of kernel methods, I develop techniques for implementing both similarity paradigms.

A standard approach to lexical similarity represents words by their co-occurrence distributions. I describe a family of kernel functions that are designed for the classification of probability distributions. The appropriateness of these distributional kernels for semantic tasks is suggested by their close connection to proven measures of distributional lexical similarity. I demonstrate the effectiveness of the lexical similarity model by applying it to two classification tasks: compound noun interpretation and the 2007 SemEval task on classifying semantic relations between nominals.

To implement relational similarity I use kernels on strings and sets of strings. I show that distributional set kernels based on a multinomial probability model can be computed many times more efficiently than previously proposed kernels, while still achieving equal or better performance. Relational similarity does not perform as well as lexical similarity in my experiments. However, combining the two models brings an improvement over either model alone and achieves state-of-the-art results on both the compound noun and SemEval Task 4 datasets.

Acknowledgments

The past four years have been a never-boring mixture of learning and relearning, experiments that worked in the end, experiments that didn't, stress, relief, more stress, some croquet and an awful lot of running. And at the end a thesis was produced. A good number of people have contributed to getting me there, by offering advice and feedback, by answering questions and by providing me with hard-to-find papers; these include Tim Baldwin, John Beavers, Barry Devereux, Roxana Girju, Anders Søgaard, Simone Teufel, Peter Turney and Andreas Vlachos.
Andreas also read and commented on parts of this thesis, and deserves to be thanked a second time. Julie Bazin read the whole thing, and many times saved me from my carelessness. Diane Nicholls enthusiastically and diligently participated in the annotation experiments. My supervisor Ann Copestake has been an unfailing source of support, from the beginning – by accepting an erstwhile Sanskritist who barely knew what a probability distribution was for a vague project involving compound nouns and machine learning – to the end: she has influenced many aspects of my writing and presentation for the better. I am grateful to my PhD examiners, Anna Korhonen and Diana McCarthy, for their constructive comments and good-humoured criticism.

This thesis is dedicated to my parents for giving me a good head start, to Julie for sticking by me through good results and bad, and to Cormac, for whom research was just one thing that ended too soon.

Contents

1 Introduction

2 Compound semantics in linguistics and NLP
  2.1 Introduction
  2.2 Linguistic perspectives on compounds
    2.2.1 Compounding as a linguistic phenomenon
    2.2.2 Inventories, integrated structures and pro-verbs: A survey of representational theories
  2.3 Compounds and semantic relations in NLP
    2.3.1 Inventories, integrated structures and pro-verbs (again)
    2.3.2 Inventory approaches
    2.3.3 Integrational approaches
  2.4 Conclusion

3 Developing a relational annotation scheme
  3.1 Introduction
  3.2 Desiderata for a semantic annotation scheme
  3.3 Development procedure
  3.4 The annotation scheme
    3.4.1 Overview
    3.4.2 General principles
    3.4.3 BE
    3.4.4 HAVE
    3.4.5 IN
    3.4.6 ACTOR, INST
    3.4.7 ABOUT
    3.4.8 REL, LEX, UNKNOWN
    3.4.9 MISTAG, NONCOMPOUND
  3.5 Conclusion

4 Evaluating the annotation scheme
  4.1 Introduction
  4.2 Data
  4.3 Procedure
  4.4 Analysis
    4.4.1 Agreement
    4.4.2 Causes of disagreement
  4.5 Prior work and discussion
  4.6 Conclusion

5 Semantic similarity and kernel methods
  5.1 Introduction
  5.2 A similarity-based approach to relational classification
  5.3 Methods for computing noun pair similarity
    5.3.1 Constituent lexical similarity
      5.3.1.1 Lexical similarity paradigms
      5.3.1.2 The distributional model
      5.3.1.3 Measures of similarity and distance
      5.3.1.4 From words to word pairs
    5.3.2 Relational similarity
      5.3.2.1 The relational distributional hypothesis
      5.3.2.2 Methods for token-level relational similarity
      5.3.2.3 Methods for type-level relational similarity
  5.4 Kernel methods and support vector machines
    5.4.1 Kernels
    5.4.2 Classification with support vector machines
    5.4.3 Combining heterogeneous information for classification
  5.5 Conclusion

6 Learning with co-occurrence vectors
  6.1 Introduction
  6.2 Distributional kernels for semantic similarity
  6.3 Datasets
    6.3.1 1,443 Compounds
    6.3.2 SemEval Task 4
  6.4 Co-occurrence corpora
    6.4.1 British National Corpus
    6.4.2 Web 1T 5-Gram Corpus
  6.5 Methodology
  6.6 Compound noun experiments
  6.7 SemEval Task 4 experiments
  6.8 Further analysis
    6.8.1 Investigating the behaviour of distributional kernels
    6.8.2 Experiments with co-occurrence weighting
  6.9 Conclusion

7 Learning with strings and sets
  7.1 Introduction
  7.2 String kernels for token-level relational similarity
    7.2.1 Kernels on strings
    7.2.2 Distributional string kernels
    7.2.3 Application to SemEval Task 4
      7.2.3.1 Method
      7.2.3.2 Results
  7.3 Set kernels for type-level relational similarity
    7.3.1 Kernels on sets
    7.3.2 Multinomial set kernels
    7.3.3 Related work
    7.3.4 Application to compound noun interpretation
      7.3.4.1 Method
      7.3.4.2 Comparison of time requirements
      7.3.4.3 Results
  7.4 Conclusion

8 Conclusions and future work
  8.1 Contributions of the thesis
  8.2 Future work

Appendices
  A Notational conventions
  B Annotation guidelines for compound nouns

Chapter 1

Introduction

Noun-noun compounds are familiar facts of our daily linguistic lives. To take a simple example from my typical morning routine: each weekday morning I eat breakfast in the living room, while catching up on email correspondence