Compositional Morphology Through Deep Learning
A thesis presented by
Ekaterina Vylomova (ORCID: 0000-0002-4058-5459)
to the School of Computing and Information Systems
in total fulfillment of the requirements for the degree of
Doctor of Philosophy

The University of Melbourne
Melbourne, Australia
November, 2018

Supervised by: Tim Baldwin and Trevor Cohn

Declaration

This is to certify that (i) the thesis comprises only my original work towards the PhD except where indicated in the Preface, (ii) due acknowledgement has been made in the text to all other material used, and (iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices.

Ekaterina Vylomova
November, 2018

Abstract

Most human languages have sophisticated morphological systems. In order to build successful models of language processing, we need to focus on morphology, the internal structure of words. In this thesis, we study two morphological processes: inflection (word change rules, e.g. run → runs) and derivation (word formation rules, e.g. run → runner).

We first evaluate the ability of contemporary models that are trained using the distributional hypothesis, which states that a word's meaning can be expressed by the context in which it appears, to capture these types of morphology. Our study reveals that inflections are predicted with high accuracy, whereas derivations are more challenging due to the irregularity of meaning change. We then demonstrate that supplying the model with character-level information improves predictions and makes more efficient use of language resources, especially in morphologically rich languages.

We then address the question of to what extent, and which, information about word properties (such as gender, case, and number) can be predicted entirely from a word's sentential context. To this end, we introduce a novel task of contextual inflection prediction. Our experiments on prediction of morphological features and the corresponding word form from sentential context show that the task is challenging, and that as morphological complexity increases, performance drops significantly. We found that some morphological categories (e.g., verbal tense) are inherent and typically cannot be predicted from context, while others (e.g., adjective number and gender) are contextual and inferred from agreement. Compared to morphological inflection tasks, where morphological features are explicitly provided and the system has to predict only the form, accuracy on this task is much lower.

Finally, we turn to word formation, derivation. Experiments with derivations show that they are less regular and systematic. We study how much a sentential context is indicative of the type of meaning change. Our results suggest that even though inflections are more productive and regular than derivations, the latter also present cases of high regularity of meaning and form change, but often require extra information such as etymology, word frequency, and more fine-grained annotation in order to be predicted with high accuracy.

To my loving parents, Galina and Aleksei.

Acknowledgements

First and foremost, my highest gratitude goes to my great supervisors, Tim Baldwin and Trevor Cohn, my "Cyril and Methodius", who introduced me to a whole new world of academic life and always supported me in all my endeavours. During my PhD course, I was fortunate to meet and collaborate with many great researchers.
A large part of my research was done together with my good friends, Ryan Cotterell and Christo Kirov, who always inspired me to learn and explore more. I am also grateful to the people with whom I collaborated during the JSALT'15 workshop, Reza Haffari, Kaisheng Yao, Chris Dyer and Kevin Duh, from whom I learned a lot about neural machine translation. I am also extremely thankful to Jason Eisner for all his support and brainstorming. Many thanks go to Laura Rimell, with whom I worked on my first paper, which was eventually presented at ACL 2016. Certainly, this work would not be complete without David Yarowsky and his passion for derivational paradigms, and I hope we will have a chance to continue research in this direction. I was also very pleased to visit Hinrich Schütze and Alex Fraser's labs in Munich and Chris Biemann's group in Darmstadt in 2016, both of which were great experiences, and I am very thankful to them for making this happen.

Although my MSc and BSc are both in Computer Science, I have always been fascinated by linguistics. Here, I would like to thank all the linguists whom I had a chance to meet and have fruitful discussions with during the CoEDL Summer School 2017: Nick Evans, Mark Ellison, Martin Haspelmath, Sabine Stoll, Balthasar Bickel, and John Mansfield.

During the last three years, I was also involved in teaching various university subjects, and would like to thank the people with whom we tried our best to do "knowledge transfer" to new generations of students: Jeremy Nicholson, David Eccles, Greg Wadley, Ben Rubinstein, Sue Wright, Rao Kotagiri, and Karin Verspoor.

I would like to thank our great NLP lab and the people with whom I shared a significant part of my PhD path: Bahar Salehi, Long Duong, Afshin Rahimi (and his wife Ava), Oliver Adams, Daniel Beck, Matthias Petri, Fei Liu, Doris Hoogeveen, Nitika Mathur, Mohammad Oloomi, Xuanli He, Brian Hur (and his wife Lena), Yitong Li, Shiva Subramanian, Ned Letcher, Moe Fang, Vu Hoang, Yuan Li, Malcolm Karutz, Anirudh Joshi, Julian Brooke, Philip Schulz, Alex Vedernikov, Michał Lukasik, Simon De Deyne, Luis Soto, Sasha Panchenko, and many, many others.

And, certainly, this thesis would not have been possible without the support of my family, especially Andrei and Mark, who always believed in my abilities, and even sometimes overestimated them (:. Finally, I would like to thank my awesome committee chair, Alistair Moffat, and (of course) Google Australia for supporting my thesis research.

Publications

Part of the material presented in the thesis has been published before. In particular, Chapter 3 is based on publications [1, 2], and most of the material for Chapter 5 was derived from [3].

[1] Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Timothy Baldwin. Take and took, gaggle and goose, book and read: Evaluating the utility of vector differences for lexical relation learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1671–1682, 2016.

[2] Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. Word representation models for morphologically rich languages in neural machine translation. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 103–108, 2017a.

[3] Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, and Trevor Cohn. Context-aware prediction of derivational word-forms.
In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 118–124, 2017b.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Context of the Study
  1.2 Aim and Scope
  1.3 Thesis Structure

2 Background
  2.1 Language
  2.2 Morphology
    2.2.1 Linguistic approaches
    2.2.2 Morphological Typology
    2.2.3 Morphology-Syntax Interface: an example of grammatical case
    2.2.4 Inflections
    2.2.5 Derivations
    2.2.6 Resources
    2.2.7 Tasks
  2.3 Modelling
    2.3.1 Finite-State Machines
    2.3.2 Inflection as String-to-String Transduction and Automatic Learning of Edit Distances
    2.3.3 Inflectional Paradigm Modelling
    2.3.4 Derivational Models
    2.3.5 Distributed Representations and Distributional Semantics
    2.3.6 Learning of Compositionality
  2.4 Conclusion

3 Evaluation of Word Embeddings and Distributional Semantics Approach
  3.1 Introduction
  3.2 Language Modelling for English
    3.2.1 Relation Learning
    3.2.2 General Approach and Resources
    3.2.3 Clustering
    3.2.4 Classification
  3.3 Language Modelling for Russian
    3.3.1 Closed-World Classification
    3.3.2 Open-World Classification
    3.3.3 Split Vocabulary for Morphology
  3.4 Machine Translation
    3.4.1 Models
    3.4.2 Experiments
  3.5 Conclusion

4 Inflectional Morphology Models
  4.1 Introduction
  4.2 Predicting Inflectional Morphology
    4.2.1 Task Notation
    4.2.2 Morphological Attributes
  4.3 An Encoder–Decoder Model
  4.4 A Structured Neural Model
    4.4.1 A Neural Conditional Random Field
    4.4.2 The Morphological Inflector
    4.4.3 Parameter Estimation and Decoding
  4.5 Experiments
    4.5.1 Dataset
    4.5.2 Evaluation
    4.5.3 Hyperparameters and Other Minutiae
  4.6 Results, Error Analysis, and Discussion
  4.7 SIGMORPHON 2018 – SubTask 2
  4.8 Conclusion

5 Derivational Morphology Models
  5.1 Introduction
  5.2 Context-Aware Prediction
    5.2.1 Dataset
    5.2.2 Experiments
    5.2.3 Discussion
  5.3 Conclusion

6 Conclusions and Future Work
  6.1 Future work
    6.1.1 Joint Modelling of Etymology and Derivation for English
    6.1.2 Low-Resource Language Modelling
    6.1.3 Morphological models for Machine Translation

Bibliography

List of Figures

2.1 An example of the NOMLEX data entry for promotion.
2.2 Finite-State Machines.
2.3 An FSA for English adjective morphology.
2.4 An improved FSA for English adjective morphology.
2.5 An FSA for English derivational morphology.
2.6 An FST describing regular English verbal inflection cases.
2.7 A cascade of rules mapped into a single FST.