Knowledge-Based Query Formulation in Information Retrieval
Total Page:16
File Type:pdf, Size:1020Kb
Knowledge-based Query Formulation in Information Retrieval Knowledge-based Query Formulation in Information Retrieval PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit Maastricht, op gezag van de Rector Magnificus, Prof. dr. A.C. Nieuwenhuijzen Kruseman, volgens het besluit van het College van Decanen, in het openbaar te verdedigen op donderdag 14 september 2000 om 14:00 uur door Rudolf Wolfgang van der Pol Promotores: Prof. dr. H.J. van den Herik Prof. dr. ir. J.L.G. Dietz (Technische Universiteit Delft) Prof. dr. ir. A. Hasman Leden van de beoordelingscommissie: Prof. dr. H. Visser (voorzitter) Prof. dr. ir. K.L. Boon Prof. dr. K. Järvelin (Universiteit Tampere, Finland) Prof. dr. H. Koppelaar (Technische Universiteit Delft) Prof. dr. A.J. van Zanten Dissertation Series No. 2000-5 ISBN 90-801577-4-0 Uitgeverij Phidippides, Cadier en Keer © 2000 Ruud van der Pol Cover design and lay-out: Rob Molthoff Contents Chapter 1. Searching documents . 1 1.1 The central theme: document-search processes . 1 1.1.1 Concepts of query-based document-search processes. 2 1.1.2 Performance measures. 5 1.2 Motivation of the research . 5 1.2.1 Query-based document search processes have become crucial. 6 1.2.2 Query-based document-search processes are far from perfect . 7 1.2.3 A research opportunity: the project. 8 1.3 Outline of the thesis . 9 Chapter 2. Basic notions for document-search processes . 11 2.1 Basic assumptions. 11 2.1.1 The world . 11 2.1.2 Human beings . 12 2.2 Communication . 13 2.2.1 The mechanism of communication . 13 2.2.2 Language . 14 2.2.3 Expressing and interpreting. 15 2.3 Information. 16 2.4 Knowledge . 18 2.5 Documents . 20 2.5.1 Characterisation of document . 20 2.5.2 Distinguishing documents from each other . 21 Chapter 3 Matching models and research questions . 23 3.1 Matching . 23 3.1.1 General description framework . 23 3.1.2 Matching models. 24 3.2 Methods of indexing. 32 3.2.1 Simple indexes . 33 3.2.2 Structured indexes . 34 3.2.3 Weighted indexes . 34 3.3 Query formulation . 35 3.4 Query reformulation with domain information. 38 3.4.1 Domain information . 38 3.4.2 Reformulation techniques . 39 3.5 Query reformulation by relevance feedback. 43 3.6 Searching in webs of hyper documents. 45 3.6.1 The World Wide Web. 45 3.6.2 Using hyper structural information of the document collection . 47 3.7 Conclusions, a problem statement, and three research questions . 49 3.7.1 Conclusions . 49 3.7.2 Problem statement . 49 3.7.3 Three research questions . 50 Chapter 4. Query formulation in patent-search tasks . 53 4.1 Description of query formulation . 53 4.2 Patents introduced . 56 4.2.1 Patents . 56 4.2.2 Patent application documents. 57 4.2.3 The role of patent searching . 58 4.3 A patent-search task. 59 4.3.1 Description of the supertask . 59 4.3.2 Description of the patent-search task . 60 4.4 Simulated patent-search tasks . 60 4.5 Results and evaluation . 61 4.5.1 Results . 61 4.5.2 Discussion. 62 4.5.3 Conclusions and suggestions for future research . 64 Chapter 5. Dipe-R: a representation language for query formulation . 67 5.1 Two requirements for Dipe-R . 67 5.2 Related research. 69 5.3 Knowledge revisited . ..