Automatic Lexico-Semantic Acquisition for Question Answering Plas, Marie Louise Elizabeth Van Der
University of Groningen

Automatic lexico-semantic acquisition for question answering
Plas, Marie Louise Elizabeth van der

Document Version: Publisher's PDF, also known as Version of record
Publication date: 2008

Citation for published version (APA):
Plas, M. L. E. V. D. (2008). Automatic lexico-semantic acquisition for question answering. s.n.

Copyright: Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal.

Automatic Lexico-Semantic Acquisition for Question Answering

Lonneke van der Plas

This research was carried out in the project Question Answering using Dependency Relations, which is part of the research programme for Interactive Multimedia Information eXtraction, IMIX, financed by NWO, the Netherlands organisation for scientific research.

The work in this thesis has been carried out under the auspices of the LOT school and the Center for Language and Cognition Groningen (CLCG) from the University of Groningen.

Groningen Dissertations in Linguistics 70
ISSN 0928-0030

Cover image: Aanknopingspunten by Miep van der Plas
Cover design: Sander Gorter
Printer: Grafimedia
Document prepared with LaTeX2ε.
Rijksuniversiteit Groningen

Automatic Lexico-Semantic Acquisition for Question Answering

Dissertation to obtain the doctorate in Arts at the Rijksuniversiteit Groningen, on the authority of the Rector Magnificus, Dr. F. Zwarts, to be defended in public on Thursday 23 October 2008 at 16:15 by Marie Louise Elizabeth van der Plas, born on 2 January 1976 in Terneuzen

Promotor: Prof.dr.ir. J. Nerbonne
Copromotor: Dr. G. Bouma
Assessment committee: Dr. I. Dagan, Prof.dr. D. Geeraerts, Prof.dr. P.T.J.M. Vossen

ISBN: 978-90-367-3564-3

Acknowledgements

Being more than 1000 kilometers and six months away from my time in Groningen, I remember all those that contributed in one way or another to the work described in this book.

First of all, I am grateful to my promotor John Nerbonne and co-promotor Gosse Bouma for their valuable comments on my written work and oral presentations. I would also like to thank the members of my reading committee, Ido Dagan, Dirk Geeraerts and Piek Vossen, for taking the time to assess my manuscript.

For Gosse Bouma the term daily supervision can almost be taken literally, as he always made time for me when I needed to discuss things or simply could not find the data or programs I needed. His reaction when I asked for an appointment was usually: Now? I found this way of working very motivating.

In general I really enjoyed working in a team of researchers that are collaborative, efficient, and open to new ideas. It was thanks to funding from NWO, the Netherlands organisation for scientific research, that we were able to build the very successful IMIX group in Groningen, and I would like to thank the members of that group for the collaborations that made research such an exciting and fruitful enterprise: Gosse Bouma, Ismail Fahmi, Jori Mur, Gertjan van Noord and Jörg Tiedemann.
The IMIX project involved researchers from several universities in the Netherlands with whom we had interesting discussions and collaborations.

I would like to thank all people from the Alfa-informatica division in Groningen, people from the CLCG group in general, and from outside: Maria Georgescul, Jennifer Spenader, and Theo Vosse for interesting discussions and/or help.

The advertisement campaign for the city of Groningen says Er gaat niets boven Groningen ‘nothing is better than (literally: above) Groningen’, but I have really enjoyed collaborations with people outside Groningen as well. I would like to thank the members of the CLT group from Macquarie University and especially Robert Dale, Diego Molla, and Menno van Zaanen for having me as a visiting academic in their summer of 2007. Apart from skipping the winter in the Netherlands I enjoyed the reading groups and meetings a lot. During my visit in Sydney the several discussions with James Curran about distributional similarity techniques were also a great pleasure.

After meeting Jean-Luc Manguin at CLIN, we started a very fruitful collaboration that made it possible for me to apply some of my methods to French using the French synonym dictionary for evaluation. I also very much enjoyed the enthusiastic discussions during the distributional similarity meeting at the Lattice labs in Paris I was invited to.

For one reason or another it helps to see people suffer in the same way as you do. I would like to thank the members of Schildpad, the thesis acceleration group, for their support, and for sharing frustrations and stress when the end of the thesis or rather the end of funding is approaching: Geoffrey Andogah, Jacky Benavides, Jantien Donkers, Ismail Fahmi, Jori Mur, and Erik-Jan Smits. Of those I would especially like to thank my officemates Ismail Fahmi and Jori Mur, for the great atmosphere and for sharing great moments.
Conversations with Jori when I was in Groningen and still today are invaluable.

A friendly face when entering the Harmonie building in the morning helps to start the working day. I would like to thank the porters for being such great professionals. I would also like to thank the administrative office for their help and support.

Negative results, shattered expectations: a researcher needs to get rid of frustration and aggression. I would like to thank the colleagues that took part in our weekly sports activities ranging from football, a bit of volleyball, a bit of basketball, to football again, for risking their lives as members of our very friendly, non-competitive sports team. I would especially like to thank Roel Jonkers for keeping the group together.

Although I did not particularly appreciate the food provided in the canteen of the Harmonie building, I really enjoyed the Spanish lunches organised by Jacky Benavides, where we learned the basics of the Spanish language and had lots of fun.

I would like to thank my friends and family for their encouragement, support, help, and for distracting me from time to time: my parents, my grandmother, Eleonora, Erla and Guido Keijser, Joost Doornik, Jori, Julia, Marieke, Marjolein, Sander, and Tanja. In particular, I want to thank Floris for his encouragement to start a PhD project in chilly Groningen up north, for coming with me, and especially for his support in the last stages of the project. Without our evening outings after each day of non-stop writing, I would have definitely gone mad.

Lastly, I would like to thank my paranimfs, Marjolein Deunk and Julia Klitsch, for helping me to try and turn the day of my defence into a great party.

Contents

1 Introduction
  1.1 Words
  1.2 The meaning of words
  1.3 Automatic acquisition
  1.4 Types of lexico-semantic information
  1.5 Application
  1.6 Research questions
  1.7 Overview of chapters

2 Lexico-semantic knowledge
  2.1 Introduction
  2.2 Lexical elements
    2.2.1 Open-class words
    2.2.2 Polysemy and homonymy
  2.3 Lexico-semantic relations
    2.3.1 Associative relations
    2.3.2 Taxonomically related words
    2.3.3 Synonymy
  2.4 Available lexico-semantic resources
    2.4.1 EuroWordNet
    2.4.2 Word association norms
  2.5 Evaluating lexico-semantic knowledge
    2.5.1 Gold standard evaluation
    2.5.2 Task-based evaluation
    2.5.3 Evaluation against ad hoc human judgements

3 Syntax-based distributional similarity
  3.1 Introduction
  3.2 Syntax-based methods
    3.2.1 Syntactic context
    3.2.2 Measures and feature weights
    3.2.3 Related work
  3.3 Methodology
    3.3.1 Data collection
    3.3.2 Definitions
    3.3.3 Similarity measures
    3.3.4 Weights
  3.4 Evaluation
    3.4.1 EWN similarity measure
    3.4.2 Synonyms, hypernyms and (co)-hyponyms
    3.4.3 Test set
  3.5 Results
    3.5.1 Cell and row frequency cutoffs
    3.5.2 Comparing measures and weights
    3.5.3 Comparing corpora
    3.5.4 Comparison to proximity-based method
    3.5.5 Distribution of semantic relations
    3.5.6 Comparing syntactic relations
    3.5.7 Comparison to our previous work
  3.6 Conclusions

4 Alignment-based distributional similarity
  4.1 Introduction
  4.2 Alignment-based methods
    4.2.1 Translational context