University of Groningen: Off-line Answer Extraction for Question Answering
University of Groningen

Off-line answer extraction for Question Answering
Mur, Jori

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version: Publisher's PDF, also known as Version of record
Publication date: 2008

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):
Mur, J. (2008). Off-line answer extraction for Question Answering. s.n.

Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Download date: 24-09-2021

Jori Mur
Off-line answer extraction for Question Answering

This research was carried out in the project Question Answering using Dependency Relations, which is part of the research programme for Interactive Multimedia Information eXtraction, IMIX, financed by NWO.

The work in this thesis has been carried out under the auspices of the LOT school and the Center for Language and Cognition Groningen (CLCG) of the Faculty of Arts of the University of Groningen.

Groningen Dissertations in Linguistics 69
ISSN 0928-0030
© 2008, Jori Mur
ISBN: 978-90-367-3567-4
Cover design: © 2008, Roelie van der Molen
Printed by Grafimedia, Groningen
Document prepared with LaTeX 2ε and typeset in pdfTeX.
RIJKSUNIVERSITEIT GRONINGEN

Off-line answer extraction for Question Answering

Doctoral thesis to obtain the doctorate in Arts at the University of Groningen, on the authority of the Rector Magnificus, dr. F. Zwarts, to be defended in public on Thursday 23 October 2008 at 14.45 by Jori Mur, born on 28 June 1980 in Hardenberg.

Supervisor: Prof.dr.ir. J. Nerbonne
Co-supervisor: Dr. G. Bouma
Assessment committee: Prof.dr. P. Hendriks, Prof.dr. M. de Rijke, Prof.dr. B.L. Webber

Preface

Many people helped me write this thesis, and in this preface I take the opportunity to thank them.

I am first indebted to Gosse Bouma, who has been a fine supervisor throughout these four years. Judging by the stories of many other PhD students, it is by no means commonplace that you were always available to answer questions, comment on papers and give advice. I have always appreciated that a lot. I am also grateful to my professor John Nerbonne for his valuable comments on my work and for reading my chapters so quickly every time I handed one in. Furthermore, I want to thank my reading committee, Petra Hendriks, Bonnie Webber and Maarten de Rijke, for their comments on my work.

I thank all my colleagues of CLCG for creating such a nice working environment. For fear of forgetting someone I will not mention you all by name, but there are a few people that I want to thank in particular. First, Lonneke and Ismail, for being the best roommates I could wish for. It was really quiet and empty when you left, and I am happy that John came to keep me from feeling lonely the last couple of months. I thank Gertjan, Gosse, Ismail, Jörg, and Lonneke as fellow members of the Groningen QA CLEF team. It was great working with you in this project and participating in CLEF together. I thank all the people from Wednesday sports, especially Roel for organising it every year. How I am going to miss these weekly hours.
Jacky, besides being a great colleague you deserve a lot of gratitude for all the work you have put into the Spanish lunches. Muchas gracias! I hope you do not stay too long on the other side of the world. Further, I wish to thank all the quiz people, and especially Jacky and later Erik for organising it. I will continue to join, although I am a bit disappointed in how useful my CLEF knowledge turns out to be, that is, not at all. I am grateful to all my co-schildpadden, Lonneke, Jacky, Erik-Jan, Ismail, Geoffrey, and Jantien, for the weekly meetings where we could vent our frustrations and support each other while writing our theses.

I am greatly indebted to Roelie for designing my website four years ago. I am very happy that you also agreed to design the cover of this thesis. Tanke wol! Special thanks in advance to Therese and Jelena: I am very happy that you will stand by my side as my paranymphs at my public defence. Tack så mycket, hvala lepa! Lonneke, I have mentioned you already a couple of times, but you cannot be mentioned enough. Without you I would not have finished this work. Thank you for your support, your friendship and for all the fun we had together. We started almost at the same time and I am very happy that we will have our defence on the same day.

Warm thanks go to all my parents, Mam, Paul, Pap, and Hette, for always supporting me and believing in me. I am very happy to have all of you! Finally, I want to thank Fokke for all his love and support. Thank you, sweetheart, for being willing to read all my chapters time and again, for supporting me whenever I was feeling down, and for simply always being there for me.

Contents

1 Introduction
  1.1 Question Answering: motivation and background
  1.2 This thesis
    1.2.1 Off-line answer extraction: motivation and background
    1.2.2 Research questions and claims
    1.2.3 Chapter overview
2 Off-line Answer Extraction: Initial experiment
  2.1 Experimental Setting
    2.1.1 Joost
    2.1.2 Alpino
    2.1.3 Corpus
    2.1.4 Question set
    2.1.5 Answer set
  2.2 Initial experiment
    2.2.1 Patterns
    2.2.2 Questions and answers
    2.2.3 Evaluation methods and results
    2.2.4 Discussion of results and error analysis
  2.3 Conclusion
3 Extraction based on dependency relations
  3.1 Introduction
  3.2 Answer Extraction
    3.2.1 Extraction with Surface Patterns
    3.2.2 Extraction with Syntactic Patterns
      3.2.2.1 Equivalence rules
      3.2.2.2 D-score
  3.3 Experiments
    3.3.1 Extraction task
    3.3.2 Question Answering task
  3.4 Discussion of results
  3.5 Conclusion
4 Coreference resolution for off-line answer extraction
  4.1 Introduction
  4.2 Coreference resolution
    4.2.1 Choosing an approach
    4.2.2 Coreference resolution process
      4.2.2.1 Preprocessing
      4.2.2.2 Resolving Pronouns
      4.2.2.3 Resolving Common Nouns
      4.2.2.4 Resolving Named Entities
    4.2.3 Evaluation and results
      4.2.3.1 Trade-off recall and precision
      4.2.3.2 MUC-score
      4.2.3.3 Results
      4.2.3.4 Error analysis
  4.3 Using coreference information for answer extraction
    4.3.1 Extraction task
    4.3.2 Question Answering task
    4.3.3 Discussion of results
  4.4 Related work
  4.5 Conclusion
5 Extraction based on learned patterns
  5.1 Introduction
    5.1.1 Bootstrapping techniques
    5.1.2 Aims and overview
  5.2 Bootstrapping algorithm
    5.2.1 Pattern induction
    5.2.2 Pattern filtering
    5.2.3 Fact extraction
  5.3 Experiment
    5.3.1 Evaluation
    5.3.2 Results
  5.4 Discussion of results
  5.5 General discussion on learning patterns
  5.6 Conclusion
6 Conclusions
  6.1 Summary of main findings
  6.2 Future work
Bibliography
A Patterns
  A.1 Capital
    A.1.1 Surface patterns
    A.1.2 Dependency patterns
  A.2 Currency
    A.2.1 Surface patterns
    A.2.2 Dependency patterns
  A.3 Date of Birth
    A.3.1 Surface patterns
    A.3.2 Dependency patterns
  A.4 Founder
    A.4.1 Surface patterns
    A.4.2 Dependency patterns
  A.5 Function
    A.5.1 Surface patterns
    A.5.2 Dependency patterns
  A.6 Location of Birth
    A.6.1 Surface patterns
    A.6.2 Dependency patterns
Samenvatting (Dutch summary)
GRODIL

Chapter 1

Introduction

1.1 Question Answering: motivation and background

In this age of growing availability of digital information, the development of tools to search through an abundant supply of information is crucial. Well known are information retrieval systems, of which online search engines, such as Google, are the most widely used examples. Typing in a few keywords results in a list of links to relevant documents.

There are, however, situations that require a more intuitive and natural approach to providing information. For example, a client of a bank may want to know how to open a bank account, or a traveller may want to know whether he can still cancel his flight.