Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact (The Information Retrieval Series)
The Information Retrieval Series, Volume 43

Tetsuya Sakai · Douglas W. Oard · Noriko Kando, Editors

Evaluating Information Retrieval and Access Tasks
NTCIR’s Legacy of Research Impact

Series Editors
ChengXiang Zhai, University of Illinois, Urbana, IL, USA
Maarten de Rijke, University of Amsterdam, Netherlands and Ahold Delhaize, Zaandam, Netherlands

Editorial Board
Nicholas J. Belkin, Rutgers University, New Brunswick, NJ, USA
Charles Clarke, University of Waterloo, Waterloo, ON, Canada
Diane Kelly, University of Tennessee at Knoxville, Knoxville, TN, USA
Fabrizio Sebastiani, Consiglio Nazionale delle Ricerche, Pisa, Italy

Information Retrieval (IR) deals with access to and search in mostly unstructured information, in text, audio, and/or video, either from one large file or spread over separate and diverse sources, in static storage devices as well as on streaming data. It is part of both computer and information science, and uses techniques from e.g. mathematics, statistics, machine learning, database management, or computational linguistics. Information Retrieval is often at the core of networked applications, web-based data management, or large-scale data analysis.

The Information Retrieval Series presents monographs, edited collections, and advanced textbooks on topics of interest for researchers in academia and industry alike. Its focus is on the timely publication of state-of-the-art results at the forefront of research and on theoretical foundations necessary to develop a deeper understanding of methods and approaches.

More information about this series at http://www.springer.com/series/6128

Editors
Tetsuya Sakai, Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
Douglas W. Oard, College of Information Studies, University of Maryland, College Park, MD, USA
Noriko Kando, Information-Society Research Division, National Institute of Informatics, Tokyo, Japan

This open access book is funded by the National Institute of Informatics, Japan.

ISSN 1871-7500
ISSN 2730-6836 (electronic)
The Information Retrieval Series
ISBN 978-981-15-5553-4
ISBN 978-981-15-5554-1 (eBook)
https://doi.org/10.1007/978-981-15-5554-1

© The Editor(s) (if applicable) and The Author(s) 2021. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication.
Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Foreword

It has been a privilege of my career to be involved in many information retrieval evaluation campaigns. For a range of reasons, my involvement with NTCIR has been the longest and the most enjoyable.

Principal among the reasons is Noriko Kando. Evaluation campaigns are sustained by the unyielding enthusiasm of a core organizer. Kando-san has devotedly innovated this campaign from its very start. Thanks to her, what you will consistently see in the chapters of this book is a sequence of tasks or tracks that were ahead of their time. NTCIR was the first to explore patent search, first to incorporate life logging data, and first to examine retrieval of mathematical formulas.

NTCIR has innovated in the methodologies used to measure a shared task: the visualization and summarization tasks, for example, require a quite different approach to evaluation than you would see in many other campaigns. Thanks to Sakai-san’s diligent creation, many of the chapters will describe assessment with novel measures. NTCIR was the first to use graded evaluation.

Diversity of excellent evaluation research is what I know I will see when I attend NTCIR at the NII building in Chiyoda-ku. Such a range of innovations can only come from a team of outstanding collaborators: you can see from the diversity of chapter authors just how many have contributed their ideas and hard work to NTCIR.

Dedication to quality is another reason for my regular visits to Tokyo.
Such is the commitment of Kando-san and her team that on the evening marking the end of each NTCIR conference (a time when normal organizers just want to sleep) the team meet up in the NII Tower to discuss what worked well, what didn’t, and how to improve. Oard-san’s thoughtful advice is often to be found there. At that meeting, less than a few hours after NTCIR has completed, the next campaign is being planned.

The work described in this book charts the progression of the academic field of information retrieval research from a rather limited library-focused research topic to a rich multi-faceted study of information access of all forms of content. It has been my honor to be a part of this campaign and I look forward to what rich new topics it will tackle in the future, at its sesquiennial pace.

Melbourne, Australia
Mark Sanderson

Preface

The NTCIR-1 Conference took place in 1999. Back then, NTCIR stood for NACSIS Test Collection for Information Retrieval systems. Ever since, NTCIR has grown in size, broadened its scope, and evolved; now we know it as NII Testbeds and Community for Information access Research.

We editors of this book would like to thank everyone who has been involved in NTCIR in the past two decades or so, and in particular the following people, for making this book happen.

• The chapter authors: Akiko Aizawa, Rami Albatal, Kuang-Hua Chen, Duc-Tien Dang-Nguyen, Zhicheng Dou, Atsushi Fujii, Takahiro Fukushima, Isao Goto, Cathal Gurrin, Graham Healy, Tsutomu Hirao, Frank Hopfgartner, Makoto Iwayama, Hideo Joho, Noriko Kando, Makoto P. Kato, Tsuneaki Kato, Kazuaki Kishida, Michael Kohlhase, Yiqun Liu, Cheng Luo, Teruko Mitamura, Hidetsugu Nanba, Eric Nyberg, Douglas W. Oard, Manabu Okumura, Tetsuya Sakai, Mark Sanderson, Yohei Seki, Ruihua Song, Masaharu Yoshioka, Min Zhang, and Liting Zhou;
• The chapter reviewers: Martin Braschler, Wolfgang Hurst, Nattiya Kanhabua, Stefano Mizzaro, Tatsunori Mori, Ian Soboroff, Damiano Spina, Takehiro Yamamoto, and Richard Zanibbi;
• Those who offered constructive comments on the early drafts of the chapters that were publicly available online;
• Past and present NTCIR general chairs, PC chairs, EVIA chairs, organizing committee members, and staff;
• Past and present NTCIR task organizers and participants; and, last but not least,
• Springer’s Mio Sugino for her support and perseverance.

This is the first book on NTCIR. A copy of it will be given to all NTCIR-15 participants in December 2020. It has been a long journey, but the journey continues. Stay safe and healthy.

Tokyo, Japan: Tetsuya Sakai
College Park, MD, USA: Douglas W. Oard
Tokyo, Japan: Noriko Kando
April 2020

Contents

1 Graded Relevance
   Tetsuya Sakai
2 Experiments on Cross-Language Information Retrieval Using Comparable Corpora of Chinese, Japanese, and Korean Languages
   Kazuaki Kishida and Kuang-hua Chen
3 Text Summarization Challenge: An Evaluation Program for Text Summarization
   Hidetsugu Nanba, Tsutomu Hirao, Takahiro Fukushima, and Manabu Okumura
4 Challenges in Patent Information Retrieval
   Makoto Iwayama, Atsushi Fujii, and Hidetsugu Nanba
5 Multi-modal Summarization
   Tsuneaki Kato
6 Opinion Analysis Corpora Across Languages
   Yohei Seki
7 Patent Translation
   Isao Goto
8 Component-Based Evaluation for Question Answering
   Teruko Mitamura and Eric Nyberg
9 Temporal Information Access
   Masaharu Yoshioka and Hideo Joho
10 SogouQ: The First Large-Scale Test Collection with Click Streams Used in a Shared-Task Evaluation
   Ruihua Song, Min Zhang, Cheng Luo, Tetsuya Sakai, Yiqun Liu, and Zhicheng Dou
11 Evaluation