State of the Art
Total Page:16
File Type:pdf, Size:1020Kb
Project no. 033104 MultiMatch Technology-enhanced Learning and Access to Cultural Heritage Instrument: Specific Targeted Research Project FP6-2005-IST-5 D1.1.3 – State of the Art Start Date of Project: 01 May 2006 Duration: 30 Months Organization Name of Lead Contractor for this Deliverable: ISTI-CNR Final Version Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006) ______________________________________________________________ Document Information Deliverable number: D1.1.3 Deliverable title: State of the Art Report Due date of deliverable: July 2008 Actual date of deliverable: December 2008 Main Author(s): Editor & Introduction: Carol Peters, ISTI-CNR Section 2: Johan Oomen, BandG Section 3: Carl Ibbotson OCLC PICA; Contributions from WIND, UniGE, Alinari, ISTI-CNR Section 4: Neil Ireson, USFD Section 5: Martha Larson and Jaap Kamps, UvA Section 6 Stephane Marchand-Maillet and Eric Bruno, UniGE Section 7: Gareth Jones, DCU; Contributions from UniGE, UvA Section 8: Paul Clough, USFD Participant(s): All Partners Workpackage: 1 Workpackage title: User Requirements & Functional Specification Workpackage leader: UNED Dissemination Level: PU (Public) Version: Final Keywords: cultural heritage, metadata, digital asset management, search engines, multilingual indexing and retrieval, multimedia indexing and retrieval, information classification, information extraction, user interaction, interface design History of Versions Version Date Author (Partner) Description/Approval Level V1 15.11.08 C.Peters, ISTI-CNR Version1 with contributions for most sections Final 39.11.08 All Completed and checked version Abstract MultiMatch aims at complex, heterogeneous digital object retrieval and presentation. The development of the system implies addressing a number of significant research challenges in a multidisciplinary context. This report describes the state of the art in the relevant areas of research, thus specifying the scientific and technology baseline from which the consortium partners start. It has been released in three instalments (D1.1.1; D1.1.2, and the current document D1.1.3). We originally identified six main areas: existing technology for cultural heritage; search engines; information extraction and classification; multilingual/multimedia indexing; multilingual/multimedia retrieval; user interaction and interface design. Each area was first reviewed in a separate chapter in D1.1.1, released in December 2006. A substantial update describing Image Collections and Browsing was added as a separate report in December 2007 (D1.2). In this final version, we provide significant updates to the original documents and in addition each chapter terminates with a section which relates the general sate-of-the-art in that area to what has been done in MultiMatch. Our aim has been to provide a complete panorama of the actual state-of-the-art in the areas of interest to MultiMatch, covering as far as possible all relevant aspects. Del. 1.1.3 State of the Art – Revised version Page 1 of 157 Table of Contents Document Information .................................................................................................................................... 1 Abstract ............................................................................................................................................................ 1 Executive Summary ......................................................................................................................................... 5 Introduction ..................................................................................................................................................... 8 1.1 Structure and Contents ..................................................................................................................... 8 1.2 Technology for Cultural Heritage .................................................................................................... 8 1.3 Focussed Search Engines ................................................................................................................. 9 1.4 Information Extraction and Classification ....................................................................................... 9 1.5 Multilingual/Multimedia Indexing ................................................................................................. 10 1.6 Image Collections Overview and Browsing ................................................................................... 10 1.7 Multilingual/Multimedia Information Retrieval ............................................................................ 10 1.8 User-centred Interaction and Interface Design ............................................................................... 11 1.9 Summing Up .................................................................................................................................. 11 Technology for Cultural Heritage ................................................................................................................ 12 2.1 Trends in Digital Library Software ................................................................................................ 12 2.1.1 Commercial vendors update ...................................................................................................... 12 2.1.2 Open Source Software Suites .................................................................................................... 15 2.1.3 Europeana: The European Digital Library ................................................................................. 16 2.1.4 European Research initiatives .................................................................................................... 20 2.2 Developments in Metadata Interoperability ................................................................................... 22 2.2.1 OAI-ORE ................................................................................................................................... 22 2.2.2 Convert thesauri for interoperability with the Semantic Web ................................................... 23 2.2.3 Reference models CIDOC CRM and Getty Crosswalks ........................................................... 23 2.2.4 Atom and tx metadata ................................................................................................................ 24 2.2.5 PBCore and EBU Core .............................................................................................................. 25 2.3 Recent Trends regarding Digitization Standards ........................................................................... 25 2.3.1 Moving Images .......................................................................................................................... 25 2.3.2 Photographs ............................................................................................................................... 26 2.4 Cultural Heritage and Web 2.0 ....................................................................................................... 26 2.4.1 Social network services and software ........................................................................................ 27 2.4.2 Content distribution and mashups .............................................................................................. 27 2.4.3 Crowd sourcing and semantic tagging ....................................................................................... 28 2.5 MultiMatch and Moving beyond the State of the Art ............................................................................ 29 3. Vertical /Focussed Search Engines ...................................................................................................... 33 3.1 Generic Search Engines ................................................................................................................. 33 3.1.1 Web Crawling ............................................................................................................................ 33 3.1.2 Indexing ..................................................................................................................................... 33 3.1.3 Searching ................................................................................................................................... 35 3.2 Vertical/ Focussed Search Engines ................................................................................................ 35 3.3 Domain Targeted Search Engines .................................................................................................. 36 3.4 Media Targeted Search Engines ..................................................................................................... 37 3.4.1 Multimedia Search Engines ....................................................................................................... 37 Del. 1.1.3 State of the Art – Revised version Page 2 of 157 3.4.2 Future of Multimedia Searching ................................................................................................ 39 3.5 Multilingual Search Engines .......................................................................................................... 39 3.6 Recent Developments in Multilingual / Multimedia Search Engines ............................................ 40 3.7 Conclusions ...................................................................................................................................