INEX 2011 Workshop Pre-Proceedings
Shlomo Geva, Jaap Kamps, Ralf Schenkel (editors)

December 12–14, 2011
Hofgut Imsbach, Saarbrücken, Germany
http://inex.mmci.uni-saarland.de/

Attribution http://creativecommons.org/licenses/by/3.0/

Copyright © 2011 remains with the author/owner(s).

The unreviewed pre-proceedings are collections of work submitted before the December workshops. They are not peer reviewed, are not quality controlled, and contain known errors in content and editing. The proceedings, published after the Workshop, is the authoritative reference for the work done at INEX.

Published by: IR Publications, Amsterdam. ISBN 978-90-814485-8-1. INEX Working Notes Series, Volume 2011.

Preface

Welcome to the tenth workshop of the Initiative for the Evaluation of XML Retrieval (INEX)!

Traditional IR focuses on pure text retrieval over “bags of words” but the use of structure (such as document structure, semantic metadata, entities, or genre/topical structure) is of increasing importance on the Web and in professional search. INEX has been pioneering the use of structure for focused retrieval since 2002, by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. Now, in its tenth year, INEX is an established evaluation forum, with over 100 organizations worldwide registered and over 30 groups participating actively in at least one of the tracks.

INEX 2011 was an exciting year for INEX in which a number of new tasks and tracks started, including Social Search, Faceted Search, Snippet Retrieval, and Tweet Contextualization. In total five research tracks were included, which studied different aspects of focused information access:

Books and Social Search Track investigating techniques to support users in searching and navigating books, metadata and complementary social media.
The Social Search for Best Books Task studies the relative value of authoritative metadata and user-generated content using a collection based on data from Amazon and LibraryThing. The Prove It Task asks for pages confirming or refuting a factual statement, using a corpus of the full texts of 50k digitized books.

Data Centric Track investigating retrieval over a strongly structured collection of documents based on IMDb. The Ad Hoc Search Task has informational requests to be answered by the entities in IMDb (movies, actors, directors, etc.). The Faceted Search Task asks for a restricted list of facets and facet-values that will optimally guide the searcher toward relevant information.

Question Answering Track investigating tweet contextualization, answering questions of the form “what is this tweet about?” with a synthetic summary of contextual information grasped from Wikipedia and evaluated by both the relevant text retrieved and the “last point of interest.”

Relevance Feedback Track investigating the utility of incremental passage-level relevance feedback by simulating a searcher’s interaction. An unconventional evaluation track where submissions are executable computer programs rather than search results.

Snippet Retrieval Track investigating how to generate informative snippets for search results. Such snippets should provide sufficient information to allow the user to determine the relevance of each document, without needing to view the document itself.

Two more tracks were announced, continuations of the Interactive Track and the Web Service Discovery Track, but these failed to complete in time for INEX 2011.

The aim of the INEX 2011 workshop is to bring together researchers who participated in the INEX 2011 campaign. During the past year participating organizations contributed to the building of a large-scale test collection by creating topics, performing retrieval runs and providing relevance assessments.
The workshop concludes the results of this large-scale effort, summarizes and addresses encountered issues and devises a work plan for the future evaluation of XML retrieval systems.

All INEX tracks start from having available suitable text collections. We gratefully acknowledge the data made available by: Amazon and LibraryThing (Books and Social Search Track), Microsoft Research (Books and Social Search Track), the Internet Movie Database (Data Centric Track), and the Wikimedia Foundation (Question Answering Track and Relevance Feedback Track).

Finally, INEX is run for, but especially by, the participants. It is a result of tracks and tasks suggested by participants, topics created by participants, systems built by participants, and relevance judgments provided by participants. So the main thank you goes to each of these individuals!

December 2011
Shlomo Geva
Jaap Kamps
Ralf Schenkel

Organization

Steering Committee

Charles L. A. Clarke (University of Waterloo)
Norbert Fuhr (University of Duisburg-Essen)
Shlomo Geva (Queensland University of Technology)
Jaap Kamps (University of Amsterdam)
Mounia Lalmas (Yahoo! Research)
Stephen E. Robertson (Microsoft Research Cambridge)
Ralf Schenkel (Max-Planck-Institut für Informatik)
Andrew Trotman (University of Otago)
Ellen M. Voorhees (NIST)
Arjen P. de Vries (CWI)

Chairs

Shlomo Geva (Queensland University of Technology)
Jaap Kamps (University of Amsterdam)
Ralf Schenkel (Max-Planck-Institut für Informatik)

Track Organizers

Books and Social Search
Antoine Doucet (University of Caen)
Jaap Kamps (University of Amsterdam)
Gabriella Kazai (Microsoft Research Cambridge)
Marijn Koolen (University of Amsterdam)
Monica Landoni (University of Strathclyde)

Data Centric
Jaap Kamps (University of Amsterdam)
Maarten Marx (University of Amsterdam)
Georgina Ramírez Camps (Universitat Pompeu Fabra)
Martin Theobald (Max-Planck-Institut für Informatik)
Qiuyue Wang (Renmin University of China)

Question Answering
Patrice Bellot (University of Avignon)
Véronique Moriceau (LIMSI-CNRS, University Paris-Sud 11)
Josiane Mothe (IRIT, Toulouse)
Eric SanJuan (University of Avignon)
Xavier Tannier (LIMSI-CNRS, University Paris-Sud 11)

Relevance Feedback
Timothy Chappell (Queensland University of Technology)
Shlomo Geva (Queensland University of Technology)

Snippet Retrieval
Shlomo Geva (Queensland University of Technology)
Mark Sanderson (RMIT)
Falk Scholer (RMIT)
Andrew Trotman (University of Otago)
Matthew Trappett (Queensland University of Technology)

Table of Contents

Front matter.

Preface
Organization
Table of Contents

Books and Social Search Track.

Overview of the INEX 2011 Book Track
Gabriella Kazai, Marijn Koolen, Jaap Kamps, Antoine Doucet and Monica Landoni

University of Amsterdam at INEX 2011: Book and Data Centric Tracks
Frans Adriaans, Jaap Kamps and Marijn Koolen

RSLIS at INEX 2011: Social Book Search Track
Toine Bogers, Kirstine Wilfred Christensen and Birger Larsen

Social Recommendation and External Resources for Book Search
Romain Deveaud, Eric Sanjuan and Patrice Bellot

The University of Massachusetts Amherst’s Participation in the INEX 2011 Prove It Track
Henry Feild, Marc Cartright and James Allan

TOC Structure Extraction from OCR-ed Books
Caihua Liu, Jiajun Chen, Xiaofeng Zhang, Jie Liu and Yalou Huang

OUC’s participation in the 2011 INEX Book Track
Michael Preminger and Ragnar Nordlie

Data Centric Track.

Overview of the INEX 2011 Data-Centric Track
Qiuyue Wang, Georgina Ramírez, Maarten Marx, Martin Theobald and Jaap Kamps

Edit Distance for XML Information Retrieval: Some experiments on the Datacentric track of INEX 2011
Cyril Laitang, Karen Pinel Sauvagnat and Mohand Boughanem

UPF at INEX 2011: Data Centric and Books and Social Search tracks
Georgina Ramírez

University of Amsterdam Data Centric Ad Hoc and Faceted Search Runs
Anne Schuth and Maarten Marx

BUAP: A Recursive Approach to the Data-Centric track of INEX 2011
Darnes Vilariño Ayala, David Pinto, Saúl León Silverio, Esteban Castillo and Mireya Tovar Vidal

RUC at INEX 2011 Data-Centric Track
Qiuyue Wang, Yantao Gan and Yu Sun

MEXIR at INEX-2011
Tanakorn Wichaiwong and Chuleerat Jaruskulchai

Question Answering Track.

Overview of the INEX 2011 Question Answering Track (QA@INEX)
Eric Sanjuan, Véronique Moriceau, Xavier Tannier, Patrice Bellot and Josiane Mothe

A Dynamic Indexing Summarizer at the QA@INEX 2011 track
Luis Adrián Cabrera Diego, Alejandro Molina and Gerardo Sierra

IRIT at INEX: Question answering task
Liana Ermakova and Josiane Mothe

Overview of the 2011 QA Track: Querying and Summarizing with XML
Killian Janod and Olivier Mistral

A graph-based summarization system at QA@INEX2011 track 2011
Ana Lilia Laureano Cruces and Ramirez Javier

SUMMA Content Extraction for INEX 2011
Horacio Saggion

Combining relevance and readability for INEX 2011 Question-Answering track
Jade Tavernier and Patrice Bellot

The Cortex and Enertex summarization systems at the QA@INEX track 2011
Juan-Manuel Torres Moreno, Patricia