Unified Retrieval in Distributed and Heterogeneous Multimedia Information Systems Dipl. Inf. Florian Stegmaier December 18, 2013

University of Passau
Department of Informatics and Mathematics
Chair of Distributed Information Systems

Doctoral Thesis

Advisor: Prof. Dr. Harald Kosch
Second Advisor: Prof. Dr. Richard Chbeir

In theory, there is no difference between theory and practice. But, in practice, there is.
Jan L. A. van de Snepscheut (1953–1994)

For Tamara.

Abstract

Multimedia retrieval is an essential part of today's world. This is observable in industrial domains, e.g., medical imaging, as well as in the private sector, where it is visible in the activity on numerous social media platforms. This trend has led to the creation of a huge environment of multimedia information retrieval services offering multimedia resources for almost any user request. The encompassed data is in general retrievable through (proprietary) APIs and query languages, but unified access is not possible because of interoperability issues between those services. In this regard, this thesis focuses on two application scenarios, namely a medical retrieval system supporting a radiologist's workflow and an interoperable image retrieval service interconnecting diverse data silos.

The scientific contribution of this dissertation is split into three parts. The first part addresses the metadata interoperability issue. Here, major contributions to a community-driven, international standardization effort have been made, leading to the specification of an API and an ontology that enable unified annotation and retrieval of media resources. The second part presents a metasearch engine especially designed for unified retrieval in distributed and heterogeneous multimedia retrieval environments. This metasearch engine is capable of being operated in a federated as well as an autonomous manner inside the aforementioned application scenarios.
The third part ensures efficient retrieval by integrating optimization techniques for multimedia retrieval into the overall query execution process of the metasearch engine.

Keywords: Distributed multimedia retrieval, query optimization, multimedia annotation, interoperability

Summary

Whether in industry or in social media, multimedia data plays an increasingly central role. This ongoing development has given rise to extensive information systems that offer data for numerous needs. In practice, however, unified access to this distributed and heterogeneous landscape of information systems is not guaranteed, even though the data is usually retrievable via interfaces. In detail, this thesis addresses two application scenarios: first, a medical system for diagnostic support and, second, an interoperable, distributed image search.

The scientific part of this dissertation is divided into three parts. Part one deals with the problem of interoperability between different metadata formats. In this area, major contributions to an international standardization process were developed. The goal was to enable unified access to multimedia information by means of an ontology and a programming interface. Part two presents an external metasearch engine that enables unified query processing in heterogeneous and distributed multimedia databases. Within the application scenarios, both federated and autonomous query processing are addressed. Finally, part three presents techniques for optimizing distributed multimedia queries.
Keywords: Distributed multimedia query processing, optimization of multimedia queries, multimedia annotation, interoperability

Acknowledgements

I have been working on the topics of this thesis for more than four years. Without the constant support of certain people, I surely would not have had the patience to complete it. I am not a man of many words or big explanations; however, I want to take the opportunity to express my sincere thanks.

First, I want to thank my advisor Harald Kosch as well as Mario Döller for their supervision, sophisticated feedback and friendly advice.

Honestly, I feel very lucky to have met so many outstanding colleagues during my time at the Chair of Distributed Information Systems (in alphabetical order): Werner Bailer, Sebastian Bayerl, Emanuel Berndl, Tobias Bürger, David Coquil, Andreas Eisenkolb, Michael Granitzer, Udo Gröbner, Thomas Kurz, Tobias Rene Mayer, Britta Meixner, Tilmann Rabl, Hatem Moussely Sergieh, Kai Schlegel, Stella Stars, Ingrid Winter, and Andreas Wölfl. All of you are a part of this thesis: through the feedback you gave me, the ears you lent me during hard times, or simply because you always brightened my day. You are all very special to me.

Further, I want to thank my family: my parents Christa and Johann, who always believed in me and gave me the opportunity to study in the area of my interest. My brother Bernhard, who had a really hard time guiding my first steps in programming Java, along with his wife Angela. My aunts and uncles Uschi and Hannes as well as Anita and Raimund. Finally, my grandma Paula for being as she is.

Besides, I am blessed with really great friends on whom I can always count. Unfortunately I cannot enumerate everybody; the ones I have in mind are aware of it.

I dedicate this thesis to my beloved wife Tamara, the most important person in my life. Thank you for everything.

Contents

List of Tables
List of Figures
List of Listings

I Preface
  1 Introduction
    1.1 Motivation
    1.2 Contributions
    1.3 Overview

II Foundations of Multimedia Information Retrieval
  2 Multimedia Information Retrieval in a Nutshell
    2.1 Terminology
    2.2 Excursus: Information Retrieval
    2.3 Classification of Multimedia Information Retrieval Techniques
    2.4 Multimedia Information Retrieval: An On-going Challenge
  3 Modeling Multimedia Metadata
    3.1 Only Data about Data?
    3.2 Classifying Metadata Schemas
    3.3 Metamodels for Designing Metadata Schemas
      3.3.1 Representational Models
      3.3.2 Multimedia Ontologies
    3.4 Presence of Multimedia Metadata
    3.5 Metadata, Standardization and the Web
  4 Indexing Multimedia Resources
    4.1 Usage of Multimedia Features During Retrieval
    4.2 Expressiveness of Features in Image Retrieval
    4.3 Similarity Measures
    4.4 Characteristics of Similarity Query Types
    4.5 Accessing Multimedia Features
    4.6 The Curse of Dimensionality and Beyond
  5 Multimedia Retrieval Systems
    5.1 Terminology & Requirements Definition
    5.2 Architectural Facets
    5.3 Multimedia Query Languages

III Improving Metadata Interoperability
  6 Unified Access to Multimedia Metadata
    6.1 Related Work
      6.1.1 Many Standards for Different Needs
      6.1.2 Interoperability Approaches between Metadata Schemas
    6.2 Use Case & Requirements
    6.3 A Pivot Metadata Scheme for Media Resources
      6.3.1 Ontology for Media Resources 1.0
      6.3.2 Alignment of Metadata Formats
      6.3.3 API for Media Resources 1.0
  7 Discussion

IV Distributed Multimedia Retrieval
  8 AIR: Architecture for Interoperable Retrieval
    8.1 Related Work
    8.2 Application Scenarios
      8.2.1 THESEUS: MEDICO
      8.2.2 Interoperable Image Search
    8.3 Design Principles
    8.4 Excursus: MPEG Query Format
    8.5 Query Execution Strategies
    8.6 Architectural Facets
    8.7 Distributed Query Processing
  9 Discussion

V Optimizing Distributed Multimedia Retrieval
  10 Optimization Techniques for Query Execution
    10.1 Related Work
    10.2 Intra-Query Optimization
      10.2.1 Query Execution Planning
      10.2.2 Query Processing Strategies
    10.3 Inter-Query Optimization
      10.3.1 Query Scheduler
      10.3.2 Multimedia Caching System
  11 Retrieval in Unfederated Multimedia Environments
    11.1 Characterizing the Issue
    11.2 Related Work
    11.3 Definitions and Notations
    11.4 Algorithm Inspection & System Integration
  12 Evaluation
    12.1 Quality Measures
    12.2 Evaluation of Optimization Techniques
      12.2.1 Evaluation Environment
      12.2.2 Comparison of Intra-Query Optimization Strategies
      12.2.3 Results of Inter-Query Optimization
    12.3 Evaluation
