From Document Retrieval to Question Answering
Christof Monz

ILLC Dissertation Series DS-2003-4

INSTITUTE FOR LOGIC, LANGUAGE AND COMPUTATION

For further information about ILLC publications, please contact:
Institute for Logic, Language and Computation
Universiteit van Amsterdam
Plantage Muidergracht 24
1018 TV Amsterdam
phone: +31 20 525 6051
fax: +31 20 525 5206
e-mail: [email protected]
home page: http://www.illc.uva.nl

From Document Retrieval to Question Answering

ACADEMIC DISSERTATION

to obtain the degree of Doctor at the Universiteit van Amsterdam, on the authority of the Rector Magnificus, prof. mr. P.F. van der Heijden, to be defended in public before a committee appointed by the Doctorate Board, in the Aula of the University on Thursday, 11 December 2003, at 12:00, by Christof Monz, born in Haan, Germany.

Doctoral committee:

Promotores: Prof. dr. F.M.G. de Jong and Prof. dr. R. Scha
Co-promotor: Dr. M. de Rijke
Other members: Prof. dr. C. Clarke, Dr. K. Sima’an, Prof. dr. M. Stokhof, Prof. dr. B. Webber

Faculteit der Natuurwetenschappen, Wiskunde en Informatica

This research was supported by the Netherlands Organization for Scientific Research (NWO) under project numbers 612-13-001 and 220-80-001.

Copyright © 2003 by Christof Monz
http://monzilla.net

Cover design by Christof Monz. Typeset in Palatino using pdfLaTeX. Printed and bound by Print Partners Ipskamp, Enschede.
ISBN: 90-5776-116-5

For my parents, Christina and Karl-Heinz Monz

Contents

Preface

1 Introduction
1.1 Textual Question Answering System Architecture
1.1.1 The General Architecture
1.1.2 Question Analysis
1.1.3 Document Retrieval
1.1.4 Document Analysis
1.1.5 Answer Selection
1.2 Question Answering at TREC
1.3 Research Questions
1.4 Outline of the Thesis

2 Theoretical and Practical Approaches to Question Answering
2.1 Formal Semantics of Question Answering
2.1.1 Hamblin’s Postulates
2.1.2 Completeness and Distinctness
2.1.3 Informativeness
2.2 Psychological Modeling of Question Answering
2.2.1 Question Interpretation
2.2.2 Question Categorization
2.2.3 Knowledge Structure Procedures
2.2.4 Answer Articulation
2.2.5 Discussion
2.3 Practical Approaches to Question Answering
2.3.1 Database-Oriented Systems
2.3.2 Text-Based Systems
2.3.3 Inference-Based Systems
2.4 Discussion

3 Document Retrieval as Pre-Fetching
3.1 Related Work
3.2 Experimental Setup
3.2.1 Test Data
3.2.2 Document Retrieval Approaches
3.2.3 Evaluation Measures
3.2.4 Statistical Significance
3.3 Experimental Results
3.3.1 Document Similarity
3.3.2 Query Formulation
3.3.3 Stemming
3.3.4 Blind Relevance Feedback
3.3.5 Passage-Based Retrieval
3.4 Conclusions

4 Minimal Span Weighting
4.1 Related Work
4.2 Minimal Span Weighting
4.2.1 Definition of Minimal Span Weighting
4.2.2 Computing Minimal Matching Spans
4.3 Experimental Results
4.3.1 Individual Query Performance
4.3.2 The Effect of Coordination Level Matching
4.4 Spans and Answerhood
4.5 Conclusions

5 Learning Query Term Selection
5.1 Related Work
5.2 Optimal Query Term Selection
5.3 Computing Query Term Weights
5.4 Representing Terms by Sets of Features
5.5 Machine Learning Approaches
5.6 Experimental Results
5.6.1 Model Tree Generation
5.6.2 Retrieval Effectiveness of Learned Term Weights
5.7 Conclusions

6 Query Expansion for Specific Question Classes
6.1 Related Work
6.2 Query Expansion
6.3 Structured Querying
6.3.1 Global Document Similarity
6.3.2 Minimal Span Weighting for Structured Queries
6.4 Experimental Results
6.5 Conclusions

7 Evaluating Retrieval within Tequesta
7.1 Related Work
7.2 Architecture of the Tequesta System
7.2.1 Question Analysis
7.2.2 Document Retrieval
7.2.3 Document Analysis
7.2.4 Answer Selection
7.3 Experimental Results
7.3.1 Evaluation Criteria
7.3.2 Minimal Span Weighting within Tequesta
7.3.3 Expanding Measurement Questions within Tequesta
7.4 Conclusions

8 Conclusions
8.1 Recapitulation
8.2 Future Directions

Bibliography
Index
Summary in Dutch

List of Tables

1.1 Examples of question types
1.2 Sample patterns for question classification
2.1 Trigger words of Wendlandt & Driscoll
2.2 Question categories in Murax
2.3 Qualm question categories
3.1 Retrieval systems used by TREC QA participants
3.2 Kendall’s τ correlation between the different evaluation measures
3.3 Lemmas vs. porter a@n scores
3.4 Lemmas vs. porter p@n scores
3.5 Lemmas vs. porter r@n scores
3.6 One-pass retrieval vs. blind feedback a@n scores
3.7 One-pass retrieval vs. blind feedback a@n scores (top 5)
3.8 Feedback for ad-hoc retrieval
3.8 Passage-based retrieval vs. baseline a@n scores
3.9 Precision for passage-based retrieval
4.1 Comparison of the a@n scores of msw retrieval runs to baseline runs
4.2 Comparison of the p@n scores of msw retrieval runs to baseline runs
4.3 Comparison of the r@n scores of msw retrieval runs to baseline runs
4.4 Comparison of mean average precision (MAP) scores of msw retrieval runs to baseline runs
4.5 Comparison of the a@n scores of clm retrieval runs to baseline runs
4.6 Comparison of the a@n scores of msw retrieval runs to clm runs
4.7 Minimal matching sentential span lengths
4.8 Answer patterns for TREC-11
4.9 Minimal matching sentential spans containing a correct answer
4.10 Limited minimal matching sentential spans containing a correct answer
5.1 Performances of term selection variants
5.2 Comparison of the a@n scores of optimal retrieval queries to baseline runs
5.3 Example term weights
5.4 List of features for question words
5.5 Types for question classification
5.6 Example questions and their feature instantiations
5.7 Accuracy of the model tree learning algorithm
5.8 RReliefF estimates of features
5.9 Comparison of the a@n scores of learned-weights retrieval runs to baseline runs
5.10 Comparison of mean average precision (MAP) scores of learned-weights retrieval runs to baseline runs
6.1 Question types and their corresponding expansion terms
6.2 Measurement questions and their frequency in the TREC data sets
6.3 Comparison of the a@n scores of expanded retrieval runs to baseline msw runs
6.4 Comparison of the p@n scores of expanded retrieval runs to baseline msw runs
6.5 Comparison of the r@n scores of expanded retrieval runs to baseline msw runs
6.6 Comparison of the MAP scores of expanded retrieval to msw retrieval
6.7 Comparing expanded retrieval to msw retrieval for all TREC data sets put together
7.1 Sample patterns for question classification used in Tequesta
7.2 Lenient evaluation of Tequesta using Lnu.ltc vs. msw retrieval
7.3 Strict evaluation of Tequesta using Lnu.ltc vs. msw retrieval
7.4 Lnu.ltc vs. msw MRR scores for TREC-9 per question class
7.5 Lnu.ltc vs. msw MRR scores for TREC-10 per question class
7.6 Lnu.ltc vs.