*Λcross-Lingual Semantic Parsing with Categorial Grammars
Total Page:16
File Type:pdf, Size:1020Kb
∨ ¬ = Cross-lingual Semantic Parsing with Categorial Grammars Kilian Evang ◇ λ+ ∧* > ; Cross-lingual Semantic Parsing with Categorial Grammars Kilian Evang The work in this thesis has been carried out under the auspices of the Center for Lan- guage and Cognition Groningen of the Faculty of Arts of the University of Groningen. Groningen Dissertations in Linguistics 155 ISSN: 0928-0030 ISBN: 978-90-367-9474-9 (printed version) ISBN: 978-90-367-9473-2 (electronic version) c 2016, Kilian Evang Document prepared with LATEX 2" and typeset by pdfTEX (TEX Gyre Pagella font) Cover design by Kilian Evang Printed by CPI Thesis (www.cpithesis.nl) on Matt MC 115g paper Cross-lingual Semantic Parsing with Categorial Grammars Proefschrift ter verkrijging van de graad van doctor aan de Rijksuniversiteit Groningen op gezag van de rector magnificus prof. dr. E. Sterken en volgens besluit van het College voor Promoties. De openbare verdediging zal plaatsvinden op donderdag 26 januari 2017 om 12.45 uur door Kilian Evang geboren op 1 mei 1986 te Siegburg, Duitsland Promotores Prof. dr. J. Bos Beoordelingscommissie Prof. dr. M. Butt Prof. dr. J. Hoeksema Prof. dr. M. Steedman Acknowledgments Whew, that was quite a ride! People had warned me that getting a PhD is not for the faint of heart, and they were right. But it’s finally done! Everybody whose guidance, support and friendship I have been able to count on during these five years certainly deserves a big “thank you”. Dear Johan, you created the unique research program on deep se- mantic annotation that lured me to Groningen, you hired me and agreed to be my PhD supervisor. Moreover, you gave me lots of freedom to de- velop and try out my ideas, always listened to me, always believed in me and always, always had my back in easy times and in difficult times. Thank you so much! Besides my supervisor, a number of other people have also helped with shaping and focusing my research. I would especially like to thank two pre-eminent experts on CCG parsing, Stephen Clark and James Curran. Discussing things with them helped me a great deal with get- ting my thoughts in order, inspired new ideas and encouraged me to pursue them. I would also like to thank Laura Kallmeyer and Frank Richter for their invaluable mentorship during the time before my PhD studies. Dear professors Miriam Butt, Jack Hoeksema and Mark Steedman— thank you for agreeing to be on the reading committee for my thesis! I am honored that you deem it scientifically sound. I apologize for not having had the time to address all of your—justified—suggestions for further improvement. I will heed them in my future research. Few things are as important to a researcher as a good working envi- ronment, and I have been especially fortunate here. I would like to start by thanking my fellow PhD students Johannes, Dieke, Rob, Hessel and v vi Acknowledgments Rik, as well as Vivian and Yiping. We make a fine pubquiz team even though we suck at recognizing bad music from the 70’s. Thank you also for the coffee at Simon’s, the beer at Hugo’s and more generally for keeping me sane and happy during the final year of my PhD. I will try to return the favor when it comes your turn. Dear Dieke and Rob, spe- cial thanks to you for agreeing to stand by my side as paranimfen on the day of my defense! Next, I would like to thank the whole Alfa-informatica department, that entity that formally does not exist, but is united by computational linguistics, the Information Science degree program, the reading group, the corridor and the greatest collegial spirit one could wish for. Dear Antonio, Barbara, Dieke, Duy, George, Gertjan, Gosse, Gregory, Hes- sel, Johan, John, Lasha, Leonie, Malvina, Martijn, Pierre, Rik and Rob— thank you for being great colleagues and coworkers in all matters of re- search, teaching and beyond. Congratulations for upholding that spirit for 30 years—and here’s to the next 30! Within the entire Faculty of Arts and its various organizational units, I would especially like to thank Alice, Karin, Marijke and Wyke for be- ing incredibly helpful and friendly with all things related to training, teaching and administration throughout my 5+ years here. Although I cannot name all former coworkers here, I would like to thank some of them, and their families, specially. Dear Valerio and Sara, thanks for being like a wise big brother and awesome sister-in-law to me. Dear Noortje and Harm, thank you for never ceasing to infect me with your energy and your quick and big thinking. Dear Elly, thank you for being a great help and wonderful friend. Dear Gideon, thanks for being such a cool buddy. Thanks are due also to my friends outside work, the wonderful peo- ple I met at Gopher, the USVA acting class, the “Kookclub” and else- where, who have been making Groningen feel like home. I would es- pecially like to thank Anique, Aswin, Evie, Hylke, Maaike, Marijn, Ni, Qing, Rachel, Rogier and Yining. Thank you for all the great times, here’s to many more! Furthermore, I am extremely grateful for all friendships that have stood the test of time and geographical distance—not in all cases evi- denced by frequent contact, but by the quality of contact when it hap- Acknowledgments vii pens. Dear Malik, Regina, Anne, Johannes, Anna, Katya, Norbert, Aleks, Anna, Laura, Marc, Chris, Laura, Nadja, Nomi, Chrissi, Armin and Martini; dear members of the Gesellschaft zur Stärkung der Verben; my dear Twitter timeline—thank you for your friendship, for our inspir- ing exchanges and for our common projects. The author of this thesis would not be the same person without you. To an even greater extent, this holds, of course, for my family. Dear Mom and Dad, thank you for giving me all the freedom, support, sta- bility and love one could wish for, for 30 years and counting. Dear Viola and Valentin, thank you for being fantastic siblings! Dear Oma, Roselies, Markus, Peter, Heike, Karin, Dany, Bille, Brigitte, Matze, Tina and the whole lot—you are a great family. Thank you for always making me feel good about where I come from, wherever I may go next. Groningen, December 2016 Contents Acknowledgments v Contents ix 1 Introduction 1 1.1 About this Thesis . 3 1.2 Publications . 5 I Background 7 2 Combinatory Categorial Grammar 9 2.1 Introduction . 9 2.2 Interpretations . 10 2.3 Syntactic Categories . 11 2.3.1 Morphosyntactic Features . 14 2.4 Combinatory Rules . 16 2.4.1 Application . 16 2.4.2 Composition . 17 2.4.3 Crossed Composition . 19 2.4.4 Generalized Composition . 20 2.4.5 Type Raising . 20 2.4.6 Rule Restrictions . 22 2.4.7 Type Changing . 23 2.5 Grammars and Derivations . 25 2.6 CCG as a Formalism for Statistical Parsing . 27 ix x Contents 3 Semantic Parsing 31 3.1 Introduction . 31 3.2 Early Work . 32 3.3 Early CCG-based Methods . 35 3.4 Context-dependent and Situated Semantic Parsing . 38 3.5 Alternative Forms of Supervision . 39 3.5.1 Ambiguous Supervision . 40 3.5.2 Highly Ambiguous Supervision . 41 3.5.3 Learning from Exact Binary Feedback . 41 3.5.4 Learning from Approximate and Graded Feedback 42 3.5.5 Unsupervised Learning . 43 3.6 Two-stage Methods . 43 3.6.1 Lexicon Generation and Lexicon Extension . 44 3.6.2 Semantic Parsing via Ungrounded MRs . 45 3.6.3 Semantic Parsing or Information Extraction? . 49 3.7 Broad-coverage Semantic Parsing . 49 II Narrow-coverage Semantic Parsing 55 4 Situated Semantic Parsing of Robotic Spatial Commands 57 4.1 Introduction . 57 4.2 Task Description . 58 4.3 Extracting a CCG from RCL . 60 4.3.1 Transforming RCL Trees to CCG Derivations . 60 4.3.2 The Lexicon . 63 4.3.3 Combinatory Rules . 64 4.3.4 Anaphora . 66 4.4 Training and Decoding . 66 4.4.1 Semantically Empty and Unknown Words . 67 4.4.2 Features . 67 4.4.3 The Spatial Planner . 68 4.5 Experiments and Results . 69 4.6 Conclusions . 70 Contents xi III Broad-coverage Semantic Parsing 71 5 Meaning Banking 73 5.1 Introduction . 73 5.2 Discourse Representation Structures . 75 5.2.1 Basic Conditions . 76 5.2.2 Complex Conditions . 82 5.2.3 Projection Pointers . 86 5.2.4 The Syntax-semantics Interface . 89 5.3 Token-level Annotation . 94 5.3.1 Segments . 96 5.3.2 Tags . 98 5.3.3 Quantifier Scope . 100 5.4 Building the Groningen Meaning Bank . 114 5.4.1 Data . 114 5.4.2 Human-aided Machine Annotation . 115 5.5 Results and Comparison . 120 5.6 Conclusions . 125 6 Derivation Projection Theory 127 6.1 Introduction . 127 6.1.1 Cross-lingual Semantic Annotation . 127 6.1.2 Cross-lingual Grammar Induction . 128 6.2 Basic Derivation Projection Algorithms . 130 6.2.1 Trap: Transfer and Parse . 132 6.2.2 Artrap: Align, Reorder, Transfer and Parse . 134 6.2.3 Arftrap: Align, Reorder, Flip, Transfer and Parse 136 6.3 Advanced Derivation Projection Algorithms . 143 6.3.1 Arfcotrap: Align, Reorder, Flip, Compose, Trans- fer and Parse . 143 6.3.2 Arfcostrap: Align, Reorder, Flip, Compose, Split, Transfer and Parse . 149 6.3.3 Arfcoistrap: Align, Reorder, Flip, Compose, In- sert, Split, Transfer, Parse . 154 6.4 Unaligned Source-language words . 157 6.5 Handling of Translation Divergences . 161 xii Contents 6.5.1 Thematic Divergences .