The Thirteenth Text Retrieval Conference, TREC 2004
Total Page:16
File Type:pdf, Size:1020Kb
NATL INST. OF STAND 8, TECH NIST AlllDt flfl7bfifl PUBLICATIONS NIST Special Publication 500-261 Information Technolosv: The Thirteenth Text REtrieval Conference, TREC 2004 Ellen M. Voorhees and Lori P. Buckland Editors Nisr ^ National Institute of Standards and Technology I ^ Technology Administration, U.S. Department of Commerce \jlJ^^ he National Institute of Standards and Technology was established in 1988 by Congress to "assist industry J ... in the development of technology needed to improve product quality, to modernize manufacturing processes, to ensure product reliability ... and to facilitate rapid commercialization ... of products based on new scientific discoveries." NIST, originally founded as the National Bureau of Standards in 1901, works to strengthen U.S. industry's competitiveness; advance science and engineering; and improve public health, safety, and the environment. One of the agency's basic functions is to develop, maintain, and retain custody of the national standards of measurement, and provide the means and methods for comparing standards used in science, engineering, manufacturing, commerce, industry, and education with the standards adopted or recognized by the Federal Government. As an agency of the U.S. Commerce Department's Technology Administration, NIST conducts basic and applied research in the physical sciences and engineering, and develops measurement techniques, test methods, standards, and related services. The Institute does generic and precompetitive work on new and advanced technologies. NIST's research facilities are located at Gaithersburg, MD 20899, and at Boulder, CO 80303. Major technical operating units and their principal activities are listed below. For more information visit the NIST Website at http://www.nist.gov, or contact the Publications and Program Inquiries Desk, 301-975-3058. Office of the Director Chemical Science and Technology • National Quality Program Laboratory • Intemational and Academic Affairs • Biotechnology • Process Measurements Technology Services • Surface and Microanalysis Science • Standards Services • Physical and Chemical Properties^ • Technology Partnerships • Analytical Chemistry • Measurement Services • Information Services Physics Laboratory • Weights and Measures • Electron and Optical Physics • Atomic Physics Advanced Technology Program • Optical Technology • Economic Assessment • Ionizing Radiation • Information Technology and Applications • Time and Frequency' • Chemistry and Life Sciences • Quantum Physics' • Electronics and Photonics Technology Manufacturing Engineering Manufacturing Extension Partnership Laboratory Program • Precision Engineering • Regional Programs • Manufacturing Metrology • National Programs • Intelligent Systems • Program Development • Fabrication Technology • Manufacturing Systems Integration Electronics and Electrical Engineering Laboratory Building and Fire Research • Microelectronics Laboratory • Law Enforcement Standards • Applied Economics • Electricity • Materials and Construction Research • Semiconductor Electronics • Building Environment • Radio-Frequency Teclmology' • Fire Research • Electromagnetic Teclmology' • Optoelectronics' Information Technology Laboratory • Magnetic Teclmology' • Mathematical and Computational Sciences^ • Advanced Network Technologies Materials Science and Engineering • Computer Security Laboratory • hifomiation Access • Intelligent Processing of Materials • Convergent Information Systems • Ceramics • Infomiation Services and Computing • Materials Reliability! • Software Diagnostics and Conformance Testing • Polymers • Statistical Engineering • Metallurgy • NIST Center for Neutron Research 'At Boulder, CO 80303 ^Soine elements at Boulder, CO NIST Special Publication 500-261 Information Technolosv: The Thirteenth Text Retrieval Conference, TREC 2004 Ellen M. Voorhees and Lori P. Buckland Editors Information Technology Laboratory Information Access Division National Institute ofStandards and Technology Gaithersburg, MD 20899-8940 August 2005 U.S. Department of Commerce Carlos M. Gutierrez, Secretary Technology Administration Michelle O 'Neill, Acting Under Secretary ofCommerce for Technology National Institute of Standards and Technology William A. Jeffrey, Director Reports on Information Technology The Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST) stimulates U.S. economic growth and industrial competitiveness through technical leadership and collaborative research in critical infrastructure technology, including tests, test methods, reference data, and forward-looking standards, to advance the development and productive use of information technology. To overcome barriers to usability, scalability, interoperability, and security in information systems and networks, ITL programs focus on a broad range of networking, security, and advanced information technologies, as well as the mathematical, statistical, and computational sciences. This Special Publication 500-series reports on ITL's research in tests and test methods for information technology, and its collaborative activities with industry, government, and academic organizations. Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equioment are necessarily the best available for the nurnose. National Institute of Standards and Technology Special Publication 500-261 Natl. Inst. Stand. Technol. Spec. Publ. 500-261 132 pages (August 2005) CODEN: NSPUE2 U.S. GOVERNMENT PRINTING OFFICE WASHINGTON: 2005 For sale by the Superintendent of Documents, U.S. Government Printing Office Internet: bookstore.gpo.gov—Phone: (202) 512-1800—Fax: (202) 512-2250 Mail: Stop SSOP, Washington, DC 20402-0001 Foreword This report constitutes the proceedings of the 2004 edition of the Text REtrieval Conference, TREC 2004, held in Gaithersburg, Maryland, November 16-19, 2004. The conference was co- sponsored by the National Institute of Standards and Technology (NIST), the Advanced Research and Development Activity (ARDA), and the Defense Advanced Research Projects Agency (DARPA) Approximately 200 people attended the conference, including representatives from 21 different countries. The conference was the thirteenth in an on-going series of workshops to evaluate new technologies for text retrieval and related information-seeking tasks. The workshop included plenary sessions, discussion groups, a poster session, and demonstrations. Because the participants in the workshop drew on their personal experiences, they sometimes cite specific vendors and commercial products. The inclusion or omission of a particular company or product implies neither endorsement nor criticism by NIST. Any opinions, findings, and con- clusions or recommendations expressed in the individual papers are the authors' own and do not necessarily reflect those of the sponsors. The sponsorship of the U.S. Department of Defense is gratefully acknowledged, as is the tremen- dous work of the program committee and the track coordinators. Ellen Voorhees August 2, 2005 TREC 2004 Program Committee Ellen Voorhees, NIST, chair James Allan, University of Massachusetts at Amherst Chris Buckley, Sabir Research, Inc. Gordon Cormack, University of Waterloo Susan Dumais, Microsoft Donna Harman, NIST David Hawking, CSIRO Bill Hersh, Oregon Health & Science University David Lewis, Omarose Inc. John Prager, IBM John Prange, U.S. Department of Defense Steve Robertson, Microsoft Mark Sanderson, University of Sheffield Karen Sparck Jones, University of Cambridge, UK Ross Wilkinson, CSIRO 111 I I I TREC 2004 Proceedings Foreword iii Listing of contents of Appendix xiv Listing of papers, alphabetical by organization xv Listing of papers, organized by track xxiv Abstract xxxvi Overview Papers Overview of TREC 2004 1 E.M. Voorhees, National Institute of Standards and Technology (NIST) TREC 2004 Genomics Track Overview 13 W.R. Hersh, R.T. Bhuptiraju, L. Ross, A.M. Cohen, D.F. Kraemer, Oregon Health & Science University P. Johnson, Biogen Idee Corporation HARD Track Overview in TREC 2004 25 High Accuracy Retrieval from Documents J. Allan, University of Massachusetts Amherst Overview of the TREC 2004 Novelty Track 36 I.Soboroff, NIST Overview of the TREC 2004 Question Answering Track 52 E.M. Voorhees, NIST Overview of the TREC 2004 Robust Track 70 E.M. Voorhees, NIST Overview of the TREC 2004 Terabyte Track 80 C. Clarke, University of Waterloo N. Craswell, Microsoft Research I. Soboroff, National Institute of Standards and Technology Overview of the TREC 2004 Web Track 89 N. Craswell, MSR Cambridge D. Hawking, CSIRO V Other Papers (contents ofthese papers are found on the TREC 2004 Proceedings CD) Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval B. Carpenter, Alias-i, Inc. Experiments with Web QA System and TREC 2004 Questions D. Roussinov, Y. Ding, J.A. Robles-Flores, Arizona State University Categorization of Genomics Text Based on Decision Rules R. Guillen, California State University San Marcos Liitial Results with Structured Queries and Language Models on Half a Terabyte of Text K. Collins-Thompson, P. Ogilvie, J. Callan, Carnegie Mellon University Experiments in TREC 2004 Novelty Track at CAS-ICT H.-P. Zhang, H.-B. Xu, S. Bai, B. Wang, X.-Q.