DISSERTATION

Titel der Dissertation: A Web-based Mapping Technique for Establishing Interoperability

Verfasser: DI. Mag. Bernhard Haslhofer

angestrebter akademischer Grad: Doktor der technischen Wissenschaften (Dr. techn.)

Wien, im Oktober 2008

Studienkennzahl lt. Studienblatt: A 084 881
Dissertationsgebiet lt. Studienblatt: Informatik
Betreuer: Prof. Dr. Wolfgang Klas
Zweitbetreuer: Prof. Dr. A Min Tjoa

Abstract

The integration of metadata from distinct, heterogeneous data sources requires metadata interoperability, which is a qualitative property of metadata information objects that is not given by default. The technique of metadata mapping allows domain experts to establish metadata interoperability in a certain integration scenario. Mapping solutions, as a technical manifestation of this technique, are already available for the intensively studied domain of system interoperability, but they rarely exist for the Web. If we consider the steadily increasing amount of structured metadata and corresponding metadata schemes on the Web, we can observe a clear need for a mapping solution that can operate in a Web-based environment. To achieve that, we first need to build its technical core, which is a mapping model that provides the language primitives to define mapping relationships. Existing Semantic Web languages such as RDFS and OWL define some basic mapping elements (e.g., owl:equivalentProperty, owl:sameAs), but do not address the full spectrum of semantic and structural heterogeneities that can occur among distinct, incompatible metadata information objects. Furthermore, it is still unclear how to process defined mapping relationships at run-time in order to deliver metadata to the client in a uniform way.

As the main contribution of this thesis, we present an abstract mapping model, which reflects the mapping problem on a generic level and provides the means for reconciling incompatible metadata. Instance transformation functions and URIs take a central role in that model. The former cover a broad spectrum of possible structural and semantic heterogeneities, while the latter bind the complete mapping model to the architecture of the World Wide Web. On the concrete, language-specific level we present a binding of the abstract mapping model for the RDF Vocabulary Description Language (RDFS), which allows us to create mapping specifications among incompatible metadata schemes expressed in RDFS.

The mapping model is embedded in a cyclic process that categorises the requirements a mapping solution should fulfil into four subsequent phases: mapping discovery, mapping representation, mapping execution, and mapping maintenance. In this thesis, we mainly focus on mapping representation and on the transformation of mapping specifications into executable SPARQL queries. For mapping discovery support, the model provides an interface for plugging in schema and ontology matching algorithms. For mapping maintenance we introduce the concept of a simple but effective mapping registry.

Based on the mapping model, we propose a Web-based mediator-wrapper architecture that allows domain experts to set up mediation endpoints that provide a uniform SPARQL query interface to a set of distributed metadata sources. The involved data sources are encapsulated by wrapper components that expose the contained metadata and the schema definitions on the Web and provide a SPARQL query interface to these metadata. In this thesis, we present the OAI2LOD Server, a wrapper component for integrating metadata that are accessible via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

In a case study, we demonstrate how mappings can be created in a Web environment and how our mediator-wrapper architecture can easily be configured to integrate metadata from various heterogeneous data sources without the need to install any mapping or metadata integration solution in a local system environment.


Zusammenfassung

Die Integration von Metadaten aus unterschiedlichen, heterogenen Datenquellen erfordert Metadaten-Interoperabilität, eine Eigenschaft, die nicht standardmäßig gegeben ist. Metadaten Mapping Verfahren ermöglichen es Domänenexperten, Metadaten-Interoperabilität in einem bestimmten Integrationskontext herzustellen. Mapping Lösungen sollen dabei die notwendige Unterstützung bieten. Während diese für den etablierten Bereich interoperabler Datenbanken bereits existieren, ist dies für Web-Umgebungen nicht der Fall. Betrachtet man das Ausmaß ständig wachsender strukturierter Metadaten und Metadatenschemata im Web, so zeichnet sich ein Bedarf nach Web-basierten Mapping Lösungen ab. Den Kern einer solchen Lösung bildet ein Mappingmodell, das die zur Spezifikation von Mappings notwendigen Sprachkonstrukte definiert. Existierende Semantic Web Sprachen wie beispielsweise RDFS oder OWL bieten zwar grundlegende Mappingelemente (z.B. owl:equivalentProperty, owl:sameAs), adressieren jedoch nicht das gesamte Spektrum möglicher semantischer und struktureller Heterogenitäten, die zwischen unterschiedlichen, inkompatiblen Metadatenobjekten auftreten können. Außerdem fehlen technische Lösungsansätze zur Überführung zuvor definierter Mappings in ausführbare Abfragen.

Als zentraler wissenschaftlicher Beitrag dieser Dissertation wird ein abstraktes Mappingmodell präsentiert, welches das Mappingproblem auf generischer Ebene reflektiert und Lösungsansätze zum Abgleich inkompatibler Schemata bietet. Instanztransformationsfunktionen und URIs nehmen in diesem Modell eine zentrale Rolle ein. Erstere überbrücken ein breites Spektrum möglicher semantischer und struktureller Heterogenitäten, während letztere das Mappingmodell in die Architektur des World Wide Webs einbinden. Auf einer konkreten, sprachspezifischen Ebene wird die Anbindung des abstrakten Modells an die RDF Vocabulary Description Language (RDFS) präsentiert, wodurch ein Mapping zwischen unterschiedlichen, in RDFS ausgedrückten Metadatenschemata ermöglicht wird.

Das Mappingmodell ist in einen zyklischen Mappingprozess eingebunden, der die Anforderungen an Mappinglösungen in vier aufeinanderfolgende Phasen kategorisiert: mapping discovery, mapping representation, mapping execution und mapping maintenance. Im Rahmen dieser Dissertation beschäftigen wir uns hauptsächlich mit der Representation-Phase sowie mit der Transformation von Mappingspezifikationen in ausführbare SPARQL-Abfragen. Zur Unterstützung der Discovery-Phase bietet das Mappingmodell eine Schnittstelle zur Einbindung von Schema- oder Ontologymatching-Algorithmen. Für die Maintenance-Phase präsentieren wir ein einfaches, aber seinen Zweck erfüllendes Mapping-Registry Konzept.

Auf Basis des Mappingmodells stellen wir eine Web-basierte Mediator-Wrapper Architektur vor, die Domänenexperten die Möglichkeit bietet, SPARQL-Mediationsschnittstellen zu definieren. Die zu integrierenden Datenquellen müssen dafür durch Wrapper-Komponenten gekapselt werden, welche die enthaltenen Metadaten im Web exponieren und SPARQL-Zugriff ermöglichen. Als beispielhafte Wrapper-Komponente präsentieren wir den OAI2LOD Server, mit dessen Hilfe Datenquellen eingebunden werden können, die ihre Metadaten über das Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) exponieren.

Im Rahmen einer Fallstudie zeigen wir, wie Mappings in Web-Umgebungen erstellt werden können und wie unsere Mediator-Wrapper Architektur nach wenigen, einfachen Konfigurationsschritten Metadaten aus unterschiedlichen, heterogenen Datenquellen integrieren kann, ohne dass dadurch die Notwendigkeit entsteht, eine Mapping Lösung in einer lokalen Systemumgebung zu installieren.


Acknowledgements

First of all, I would like to express my gratitude to my supervisor, Prof. Dr. Wolfgang Klas, whose expertise and stimulating suggestions helped me throughout the entire time of writing this thesis. I also very much appreciate his guidance through one or the other organisational turmoil, which, in the end, made it possible for me to finish my work. I would also like to thank Prof. Dr. A Min Tjoa for being my secondary advisor and for the time and effort he has invested in judging the contents of this thesis. Furthermore, I would like to thank Prof. DDr. Gerald Quirchmayr for being a member of my thesis committee.

A very special thanks goes out to my colleagues, Niko Popitsch, Wolfgang Jochum, Ross King, Stefan Leitich, Bernhard Schandl, and Stefan Zander. They have supported me throughout the past years, were always available for any kind of discussion, gave valuable comments on my work, and helped me in shaping the approach and bringing the text of this thesis into its present form. Most importantly, they have provided a working environment where doing research is not just work, but also great fun.

Things would not run so smoothly without our administrative and technical colleagues. I would like to thank Manu for keeping a large portion of the administrative stuff away from me, and Jan and Peter for taking care that my prototypes keep running.

Keeping up the motivation for a long-term project, such as doing a PhD, is hardly possible without strong support from friends and family. Special thanks to the Verein for the long-term friendship we have, and to my parents for the support they have provided me throughout my entire life.

Last but not least, my profound gratitude and love go to Silvia. She closely experienced the effects of being together with a PhD student during the past years and always gave me the perspective of a life after the PhD. Without her love, understanding, and support, finishing this thesis would have been much more difficult.


Table of Contents

1 Introduction
  1.1 Contributions
  1.2 Organisation

I Background and Related Work

2 Background
  2.1 Illustrative Example
  2.2 Unveiling the Notion of Metadata
  2.3 Metadata Interoperability
  2.4 Techniques for Achieving Metadata Interoperability
  2.5 On the Quality of Interoperability Techniques
  2.6 Summary

3 Metadata Mapping
  3.1 What is Metadata Mapping?
  3.2 Requirements Framework
  3.3 Mapping Solutions
  3.4 Analysis of Mapping Solutions
  3.5 Summary

II Methodology and Concepts

4 Towards A Web-based Metadata Integration Architecture
  4.1 Goals
  4.2 Architecture Overview
  4.3 Metadata Integration Workflow
  4.4 Functional Requirements
  4.5 Summary

5 Abstract Mapping Model
  5.1 A Generic Mapping Approach
  5.2 Representing Source and Target Metadata
  5.3 Generic Graph Model Specification
  5.4 Abstract Mapping Model Specification
  5.5 The Dynamic Aspects of the Generic Mapping Model
  5.6 Summary

III Implementation and Proof of Concept

6 An RDFS Binding of the Mapping Model
  6.1 RDFS Overview
  6.2 Lifting and Normalising Metadata Schemes to RDFS
  6.3 RDFS Mapping Model Specification
  6.4 Executing RDFS Mappings
  6.5 Maintaining RDFS Mappings
  6.6 Implementation Considerations
  6.7 Summary

7 The OAI2LOD Server — Wrapping OAI-PMH Data Sources
  7.1 What is OAI-PMH?
  7.2 Design Considerations
  7.3 Implementation
  7.4 The Future of OAI-PMH
  7.5 Summary

8 Qualitative Evaluation and Case Study
  8.1 Comparison with Existing Mapping Solutions
  8.2 Example Metadata Integration Scenario
  8.3 Summary and Lessons Learned

9 Conclusions
  9.1 Summary
  9.2 Future Work

Bibliography

Appendices

A Implementation Resources

List of Figures

2.1 TV-Anytime metadata describing a video
2.2 Proprietary metadata describing a JPEG image
2.3 Dublin Core metadata describing a JPEG image
2.4 Overview of the three metadata building blocks
2.5 Metadata building blocks from a model perspective
2.6 Structural and semantic metadata heterogeneities on the model and the instance level
2.7 Example for achieving interoperability via a global conceptual model

3.1 The main elements of a metadata mapping specification
3.2 Achieving metadata interoperability through instance transformation
3.3 The four major phases in the metadata mapping cycle

4.1 A taxonomy of known architectures for querying heterogeneous data [DD99]
4.2 Components of a mediator-wrapper architecture
4.3 Metadata integration workflow

5.1 The generic mapping model with language-specific extensions
5.2 XML metadata represented as three layered, directed labelled-graph
5.3 The generic data model from a static perspective
5.4 The abstract mapping model from a static perspective
5.5 Reflecting the four mapping phases in the abstract mapping model

6.1 An RDFS binding of the abstract mapping model
6.2 Extending and redefining the RDFS binding of the abstract mapping model with mapping execution behaviour
6.3 Run-time execution of SPARQL templates — Overview
6.4 SPARQL query template selection
6.5 Mapping registry architecture
6.6 Mapping registry model
6.7 Serving schemes and mapping using HTTP Content Negotiation (adapted from [BP08])

7.1 Sample OAI-PMH communication
7.2 Size of OAI-PMH repositories
7.3 Top 10 metadata formats
7.4 Sample OAI2LOD Server response
7.5 The OAI2LOD Server architecture
7.6 Comparison of OAI2LOD and corresponding OAI-PMH requests

8.1 Example metadata integration scenario

List of Tables

2.1 A selection of schema definition languages and their characteristics
2.2 A categorisation of metadata interoperability techniques
2.3 A representative selection of metadata standards
2.4 The quality of various interoperability techniques

3.1 Requirements framework for the evaluation of metadata mapping solutions
3.2 Metadata mapping solutions — categorisation and overview
3.3 Metadata mapping solutions evaluation summary

8.1 Qualitative evaluation against existing mapping solutions


Chapter 1

Introduction

Metadata are machine-processable data that describe digital or non-digital resources. A resource can be anything: an image, an audio file, a video, artefacts in a museum's collection, or any other physical or conceptual object. From a technical perspective, metadata are information objects that are processed by various systems and applications. The heterogeneities of these systems, the divergent semantic and structural properties of the metadata, and also the distinct interpretation contexts of their designers result in incompatible metadata.

In order to exchange metadata among systems or to provide uniform access to metadata objects in a multitude of autonomous and distributed systems, metadata interoperability must be established. One possible, well-known, and in practice accepted technique is the definition of mappings between incompatible metadata objects. This technique has been widely studied in the database domain, which has produced a multitude of mapping solutions that operate on the data models of closed-world data management systems, such as the relational model in relational databases or the semi-structured XML data model. For data models that were designed for open, uncontrolled environments, such as RDF/S, effective mapping solutions that can operate in such an environment are still an open issue.

We believe that the trend of exposing structured metadata on the Web will continue and even increase in the near future. Especially institutions that host metadata of public interest, such as libraries, museums, or archives, have a strong incentive to share and integrate their metadata sets with those of other institutions (e.g., see [HH05]), be it due to public obligations or because they want to achieve higher visibility. The World Wide Web with its few, simple-to-use standards (URI, HTTP) gives them a stable and proven infrastructure to share and exchange their metadata. With the RDF data model we have a suitable mechanism to represent metadata in a way that allows other machines and applications to process and interpret them. In such a scenario, where institutions use the Web infrastructure for making their metadata accessible, the corresponding metadata schemes must be published on the Web alongside the metadata themselves, in order to guarantee that the metadata can be interpreted correctly by other machines. Since these metadata schemes can follow different metadata standards (e.g., Dublin Core, MODS, MARC, TV-Anytime, etc.) or even reflect proprietary needs, one must define mappings or crosswalks between these schemes in order to reconcile the semantic and structural heterogeneities among metadata from different sources. If we consider the digital library domain, we can observe (see [KHS08]) that even within a restricted, to a certain extent controlled domain, institutions use a variety of standards or in-house solutions.

The technologies provided by the Semantic Web to represent schema information, such as RDFS and OWL, only provide limited abilities to express mappings between elements of distinct schemes. One can define subsumption hierarchies for classes and properties (rdfs:subClassOf, rdfs:subPropertyOf), define property equality (owl:equivalentProperty), or define individuals as being the same (owl:sameAs), but one cannot represent more complex mappings that occur in real-world systems: schema A could, for instance, define a single property name and assign metadata instances as a comma-separated sequence of lastname and


firstname (e.g., Doe, John); schema B could represent the same metadata instances using two properties, lastname = Doe and firstname = John. With the mapping primitives currently available in RDFS and OWL, one cannot define adequate mapping relationships between these properties. The issue of instance transformation, i.e., how to deal with the comma-separated name sequence from the perspective of schema B, is also still an open issue in the Semantic Web community.

Furthermore, it is still unclear how to process the mappings between two RDFS or OWL metadata schemes at application run-time. In the end, applications must be able to transform metadata information objects from one representation into another in order to enable the exchange of, or uniform access to, metadata. This can be achieved by a transformation engine or directly by a query language that gives structured access to metadata. SPARQL is the query language for the family of Semantic Web technologies. How this language can be used to provide uniform access to data sources in a view-like manner, as is common in the database domain, is also an open issue, especially how to obtain executable SPARQL queries from mapping declarations that can transparently handle metadata heterogeneities.

Finally, a mapping mechanism also requires some kind of registry where metadata schema and mapping information can be deposited. Such a registry must be accessible to both humans and machines and provide the required information.

In fact, we need to lift the mapping methodologies and concepts that are already established in the database domain and implemented in various commercial mapping solutions to the level of the open, uncontrolled environment of the Web, which uses novel (Semantic Web) technologies to handle metadata and schema definitions. The goal of this thesis is to provide concepts, methodologies, and a prototypical solution that reduce the insufficiencies existing technologies suffer from in the context of establishing interoperability among metadata information objects on the Web.
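
Returning to the schema A / schema B name example above, the following sketch shows how such a mapping relationship, including the required instance transformation, could in principle be expressed as an executable query. The namespaces exA: and exB: are hypothetical stand-ins for the two schemes, and the string functions STRBEFORE and STRAFTER are only available from SPARQL 1.1 onwards; the sketch merely illustrates the general idea of pushing an instance transformation into a query and is not the mapping model developed in this thesis.

    PREFIX exA: <http://example.org/schemaA/>
    PREFIX exB: <http://example.org/schemaB/>

    # Rewrite schema-A metadata (name = "Doe, John") into schema-B metadata
    # (lastname = "Doe", firstname = "John") on the fly.
    CONSTRUCT {
      ?r exB:lastname  ?last .
      ?r exB:firstname ?first .
    }
    WHERE {
      ?r exA:name ?name .
      # Instance transformation: split the comma-separated value
      BIND (STRBEFORE(?name, ",") AS ?last)
      BIND (STRAFTER(?name, ", ") AS ?first)
    }

A query of this kind rewrites every schema-A description into the structure expected by schema-B clients without modifying the underlying data source.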

1.1 Contributions

In this thesis, we propose a Web-based mapping technique for establishing interoperability among metadata on the Web. In contrast to existing commercial solutions and academic prototypes, our approach contributes the following features, which are directly perceivable for domain experts who utilise our solution:

• Seamless integration with the Web architecture: metadata and metadata schemes are exposed on the Web using existing Web standards. Mapping services, which process these mappings and provide uniform access to metadata in distinct sources, are part of the Web too; they are Web applications accessible via certain URLs.

• Light-weight solution: domain experts who need to establish interoperability among metadata originating from a set of distinct sources do not need to install any heavy-weight mapping suite or enterprise information integration suite. They can integrate our proposed mapping solution as a Web service into their system architecture.

• An easy-to-use interface: with a few clicks a domain expert can set up a productive SPARQL access point, which integrates metadata from a set of data sources (a sample query against such an access point is sketched after this list).
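
To give an impression of the last point in the list above, the following query sketches the kind of request such an integrated SPARQL access point is meant to answer uniformly, regardless of which wrapped data source actually holds the metadata; the Dublin Core element set serves only as an example target vocabulary here, and the filter keyword is arbitrary.

    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    # A single query against the integrated access point; the results may
    # originate from several heterogeneous, wrapped data sources.
    SELECT ?resource ?title ?creator
    WHERE {
      ?resource dc:title   ?title ;
                dc:creator ?creator .
      FILTER regex(?title, "Olympic", "i")
    }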

A central approach of this thesis is to consider metadata mapping not as a single task but as a cyclic process consisting of four subsequent phases that lead to the goal of providing uniform access to a set of distributed, heterogeneous, and autonomous data sources: first, one must determine mapping relationships among incompatible metadata schemes. Second, one must find a way to represent them in a machine-interpretable way, which requires a mapping model that goes beyond the expressiveness of existing Semantic Web standards. Third, one must translate mappings into executable queries and provide query processors that can deliver the requested metadata. Fourth, one requires the means for organising metadata schema definitions and the mappings created among them, which enables the reuse of previously created mapping specifications.

To summarise, the main scientific contributions, which go beyond the user-perceivable features listed above, are:

• An extensive elaboration on the subject of metadata interoperability and a comparative study of known interoperability techniques, published in [HK08]. We have further analysed and evaluated existing commercial and prototypical mapping solutions, described in [Has08].

• A general methodology for providing uniform access to a set of heterogeneous, distributed, and autonomous metadata sources, published in [Has06]. The practical relevance of that methodology in the digital libraries domain has been presented in [Has07].

• An abstract mapping model, which reflects the mapping problem on a generic level and provides the means for reconciling structural and semantic heterogeneities among metadata information objects expressed in graph-based data models. It is independent of any concrete schema definition language and binds the metadata mapping process to the architecture of the World Wide Web by introducing the concept of globally unique URI identifiers.

• A binding of the abstract mapping model to the RDFS schema definition language that allows for the expression of mappings between RDFS schema declarations. Those mappings can then be translated into executable SPARQL queries, which is a necessary requirement in order to achieve uniform query access to metadata in a set of distributed and heterogeneous data sources.

• The OAI2LOD Server, which is a wrapper component that encapsulates data sources that expose metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It publishes the contained metadata on the Web and provides SPARQL query access to these metadata. The design and implementation details have been published in [HS08].

Therefore, the focus of this thesis is clearly on building the core components of a Web-based mapping solution. These are the mapping model, which provides the primitives to define mapping specifications for incompatible RDFS metadata schemes, and algorithms for the conversion of mapping specifications into executable SPARQL code. Both are part of a broader metadata integration architecture and a methodology that provide the means and guidelines to set up uniform query interfaces to distinct data sources. The following areas are out of the scope of this thesis:

• Schema matching algorithms: the determination of mapping relationships can be supported by matching mechanisms, which is a separate research area and has been studied intensively throughout the past years. Although many schema matching approaches and solutions have been developed in academia so far, they have, as we will see in this thesis, not yet found widespread adoption in productive environments. Our strategy has been to factor out the schema matching research field and concentrate on other phases in the mapping process. However, we do provide interfaces to integrate such algorithms into our solution.

• Mapping GUI: we provide a Web-based user interface, which allows domain experts to set up and configure integrated query endpoints to multiple metadata sources. Developing a GUI for specifying mapping relationships in an interactive way has been out of the scope of this thesis, but is a subject of our future work.

• Fully-Fledged Registry Solution: we discuss the concept of a Web-based registry for metadata schemes and mappings and also provide a basic but functional implementation. However, since registries are not the main focus of this thesis, we do not elaborate further on that topic.

1.2 Organisation

This thesis is structured as follows: Chapter 1 provides the basic motivation for the problems addressed in this thesis, identifies the essential contributions, and introduces the outline of this thesis.

Part I, consisting of Chapters 2 and 3, introduces the technical background of this thesis and analyses existing, related work: In Chapter 2, we present an illustrative example to which we will refer throughout this thesis. Then we analyse the various notions of metadata one can currently find in the literature and identify and characterise the main technical building blocks of metadata information objects. Based on that, we define our conception of the term metadata interoperability and analyse the various types of heterogeneities that impede metadata information objects from being interoperable. Finally, we analyse and compare existing techniques for establishing metadata interoperability. Metadata mapping is such a technique and is discussed in Chapter 3. After describing our perception of that technique, we analyse in detail how mapping solutions can support domain experts in reconciling heterogeneities. Based on an extensive literature study, we set up a requirements framework, against which we compare a representative selection of existing mapping solutions.

Part II, consisting of Chapters 4 and 5, describes our overall methodology for establishing uniform access to heterogeneous metadata sources and presents the conceptual design of our mapping approach: In Chapter 4, we describe our Web-based metadata integration architecture and present a methodology that makes heterogeneous data sources accessible via a single, uniform query interface. We present the workflow for the domain expert who wants to create an integrated metadata access point, outline the major components in our architecture, and briefly describe the technologies they are based on. Finally, we derive the requirements for the mapping part of such a Web-based metadata integration architecture. The general mapping problem and also the previously derived requirements are reflected in the abstract mapping model, presented in Chapter 5. Independent of any metadata schema definition language, we formally describe a model that can represent mappings between metadata schemes. The behavioural aspects of the model also reflect the previously mentioned mapping process.

Part III, consisting of Chapters 6, 7, and 8, describes the implementation of our mapping approach and provides the proof of concept in terms of a qualitative evaluation and an exemplary case study: In Chapter 6, we bind the abstract mapping model to the RDF Vocabulary Description Language, which is a specific schema definition language. We describe how existing schema definitions can be lifted to the level of RDFS and how metadata mappings can be defined among them. Further, we describe how such RDFS mapping specifications can be processed in the subsequent mapping phases. This includes the transformation of mapping specifications into SPARQL query templates and their execution. We also present a Web-based metadata registry approach for maintaining mappings and thus for supporting the fourth phase. Finally, we provide the implementation details of our RDFS-enabled, Web-based mapping solution. In Chapter 7, we present a wrapper component for the OAI-PMH protocol that follows the Linked Data design principles [BL06] and can expose OAI-PMH metadata and schema information as resources on the Web.
In Chapter 8, we compare our Web-based metadata mapping solution with other existing mapping solu- tions. Additionally, as a proof of concept and to demonstrate the feasibility of our approach, we describe a sample case study, which provides uniform SPARQL access to three autonomous metadata repositories, whereas each of them employs a different metadata schema. Finally, we conclude this thesis in Chapter 9, discuss existing limitations of our approach and possible areas of future work. Appendix A gives details about the source code and build instructions for the various components that form the mapping architecture presented in this thesis. Part I

Part I

Background and Related Work


Chapter 2

Background

Metadata are machine-processable data that describe resources, digital or non-digital. While the availability of metadata, as a key for efficient management of such resources in institutional media repositories (e.g., [SK98]), has been widely recognised and supported in highly standardised ways, metadata interoperability, as a prerequisite for uniform access to media objects in multiple autonomous and heterogeneous information systems, calls for further investigation. Since metadata interoperability is not given by default, it must first be established by domain experts before uniform access can be achieved.

Looking at the literature from various domains, we can observe that the term metadata interoperability has a very broad meaning and entails a variety of problems to be resolved: on a lower technical level, machines must be able to communicate with each other in order to access and exchange metadata. On a higher technical level, one machine must be able to process the metadata information objects received from another. And on a yet higher, semantic level, one must ensure that machines and humans correctly interpret the intended meanings of metadata. Since there exists a wide assortment of possible techniques that aim at establishing metadata interoperability, we believe that it is necessary to analyse the interoperability problem in the context of multimedia metadata and to critically assess available techniques.

In the following sections, we first introduce three illustrative examples we will use throughout this work (Section 2.1). Then, in Section 2.2, we analyse in depth the various notions of metadata one can find in the literature and identify and characterise the main technical building blocks of metadata information objects. Thereafter, in Section 2.3, we focus on metadata interoperability and systematically, for each building block, elaborate on the factors that impede interoperability among metadata. In Section 2.4, we give an outline of currently available techniques for achieving metadata interoperability, and in Section 2.5 we compare their quality according to their capabilities to resolve certain types of heterogeneities.

2.1 Illustrative Example

This example involves three autonomous institutions1: the BBC, the Austrian National Library, and the National Library of Australia. Each institution offers different types of content (audio, video, images, documents) about the Olympic Games and uses its own technical infrastructure to maintain the contents and the adjacent metadata. A journalist, for instance, who is preparing a documentary about the history of the Olympic Games might want to electronically access these repositories to obtain historical as well as up-to-date multimedia material about this topic. Even if all the repositories were reachable through a single technical infrastructure, uniform access would still be impossible because, as we will see in the following examples, their metadata are not interoperable.

1 Although these institutions exist in the real world, the samples have been adapted to the needs of this work.


2.1.1 Institution 1: BBC TV-Anytime Service

The BBC offers an online program information service2 for its TV and radio channels. Via a SOAP Web Service, clients can request various details about the content, including basic program information such as the title or synopsis of a broadcast. The TV-Anytime [ETS06] standard has been chosen for representing program information. Its principal idea is to describe the multimedia content of broadcasts such that a user, or an agent on behalf of the user, can understand what content is available and thus be able to acquire it [McP02]. For representing audio-visual metadata, TV-Anytime encompasses a large number of elements defined by the MPEG-7 [ISO07b] standard.

Figure 2.1 depicts a sample TV-Anytime metadata description of a video showing Ingmar Stenmark's victory in the 1980 Olympic Winter Games in Lake Placid. The video has been created by John Doe, belongs to the genre Sports, and has been annotated with several terms that further describe the video's contents.

Figure 2.1: TV-Anytime metadata describing a video (the XML description contains, among other elements, the title "Lake Placid 1980, Alpine Skiing, I. Stenmark", the synopsis "Ingmar Stenmark's (SWE-Alpine skiing) victory in the Giant Slalom in Lake Placid", the genre Sports, and the creator John Doe)

2 BBC TV-Anytime service: http://backstage.bbc.co.uk/feeds/tvradio/doc.html

2.1.2 Institution 2: Austrian National Library

The Austrian National Library's image archive3 is the most important source of digitised historical images in Austria and also includes well-catalogued images from past Olympic Games. All the images in the archive are described using a proprietary metadata schema, and, like many other institutions, the Austrian National Library stores its metadata in a relational database that is accessible via a non-public SQL interface.

Figure 2.2 shows an example description of an image of Willy Bogner, who led in the first run of the slalom during the Olympic Games in Squaw Valley back in 1960. The Austrian National Library digitised the image, originally taken by Lothar Rübelt, in July 2003. Details about the person Willy Bogner are maintained in a so-called authority record that unambiguously identifies him as an entity in order to avoid naming conflicts. Technical features, such as the MIME type or the dimensions of the image, are also part of the metadata description.

PERSON
  ID          120
  FIRSTNAME   Willy
  LASTNAME    Bogner
  BIRTHDAY    23
  BIRTHMONTH  01
  BIRTHYEAR   1942

IMAGEDATA
  ID             330976
  TITLE          Olympic Wintergames 1960 in Squaw Valley
  INFO           Willy Bogner in the slalom; minimum time in the first run
  AUTHOR         Rübelt, Lothar
  CREATION DATE  03-JUL-03
  DATE           1960
  FK_PERSON      120

IMAGEOBJECT
  ID           517849
  INFO         http://www.bildarchivaustria.at/Bildarchiv//302/B1117424T4299954.jpg
  MIMETYPE     image/jpeg
  IMAGEWIDTH   2333
  IMAGEHEIGHT  3147
  FK_IMG_DATA  330976

Figure 2.2: Proprietary metadata describing a JPEG image

3 The Austrian National Library's image archive portal: http://www.bildarchiv.at

2.1.3 Institution 3: National Library of Australia

As a result of the Olympics in Sydney in the year 2000, the National Library of Australia4 now maintains a huge image collection from this event. The Dublin Core Element Set [DC06] has been chosen for representing metadata for these images, and all images have been digitised and are now available online. The metadata are exposed via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)5, an HTTP-based protocol that allows the retrieval of XML metadata descriptions.

Figure 2.3 shows a picture of marathon runners crossing the Sydney Harbour Bridge during the Olympics 2000. The adjacent metadata description gives further details about the photographer, the format of the image, and the date when it was taken. Keywords such as Runners (Sports) or Sportsmen and sportswomen further describe the contents.

Figure 2.3: Dublin Core metadata describing a JPEG image (the record lists, among other values, the title "Sydney Olympics 2000, marathon runners cross Sydney Harbour Bridge [picture]", the creator "Mahony, David (David James)", the format "1 photograph : gelatin silver ; image 26.9 x 38.4 cm. on sheet 30.5 x 40.3 cm.", the date 2000, subject headings such as "Runners (Sports) -- Australia -- Portraits." and "Sydney Harbour Bridge (Sydney, N.S.W.)", the type Image, and the identifier nla.pic-an22842546)

4 The Australian National Library's Web site: http://www.nla.gov.au/
5 The Australian National Library's OAI-PMH service: http://www.openarchives.org/OAI/openarchivesprotocol.html

2.2 Unveiling the Notion of Metadata

In this section we briefly introduce the notion of metadata, which is essential in order to fully understand the metadata interoperability problem. Then, in Section 2.2.1, we identify the main building blocks of metadata descriptions. Since metadata interoperability must be established not only on the conceptual but also on the technical level, in Section 2.2.2 we examine how metadata reside or appear in actual information systems.

2.2.1 Metadata Building Blocks

Following Gilliland's definition [Gil05], we conceive metadata as the sum total of what one can say about any information object at any level of aggregation, in a machine-understandable representation. An information object is anything that can be addressed and manipulated by a human or a system as a discrete entity. Defining metadata more abstractly as any formal schema of resource description, applying to any type of object, digital or non-digital [NIS04], is also appropriate, especially for application scenarios that apply metadata for describing non-digital resources (e.g., persons).

Metadata can be classified according to the functions they are intended to support (descriptive, structural, administrative, rights management, preservation metadata) [NIS04, Joh04], or according to their level of semantic abstraction (low-level vs. high-level metadata) (e.g., [WK03]). Low-level technical metadata have less value for end users than high-level, semantically rich metadata that describe semantic entities in a narrative world such as objects, agent objects, events, concepts, semantic states, semantic places, and semantic times, together with their attributes and relationships [BZCS01]. Hence, the semantic quality and expressiveness of metadata is essential for effectively retrieving digital objects.

To put it simply, metadata are data that describe some resource. In the examples presented in the previous section, the resources are digital images and videos, hence information objects. The adjacent metadata descriptions mainly consist of high-level, semantically rich descriptive information, such as the names of persons (e.g., Ingmar Stenmark, Willy Bogner) or events (e.g., Sydney Olympics 2000). Only the Austrian National Library provides some low-level technical metadata (mime-type, imagewidth, imageheight).

We can identify the following common characteristics: each description is made up of a set of elements (e.g., title, author, subject) and content values (e.g., "Lake Placid ...", "Rübelt, Lothar", "Runners"); the elements are defined as part of a metadata schema, which can be standardised, as is the case with Dublin Core or TV-Anytime, or proprietary; and from the fact that two metadata descriptions are expressed in XML and one in terms of relations in a relational database, we can derive that the metadata elements have previously been specified using a certain language. For the case of XML, the language is usually XML Schema [W3C06]; for the case of a relational database, a schema is expressed in terms of tables using SQL-DDL [ISO03b].

Based on this observation, we can identify three main metadata building blocks: we denote the set of content values in a metadata description as metadata instance, the element definitions as metadata schema, and the language for defining metadata schemes as schema definition language. Figure 2.4 illustrates these three building blocks and their dependencies. In the following, we will further focus on each of these building blocks in reverse order.

Schema Definition Language

An application- and domain-specific metadata schema is expressed in a certain schema definition language, whereas each language provides a set of language primitives (e.g., class, attribute, relationship). Because machines must understand a language, the primitives are not only syntactic constructs but also have a semantic definition. The semantic definition of the term language already implies6 that, in order to communicate with each other, there must exist an agreement on the meaning of a language's primitives. This is also the case for schema definition languages: usually there exist language standards or at least some kind of consensus7. Sample schema definition languages are XML Schema, SQL-DDL, the RDF Vocabulary Description Language (RDFS) [W3C04a], the Web Ontology Language (OWL) [W3C04c], and the Unified Modeling Language (UML) [OMG07].

6 Language: the system of communication used by a particular community or country (The New Oxford American Dictionary)

Figure 2.4: Overview of the three metadata building blocks (a schema definition language provides primitives such as class, attribute, and relationship; a metadata schema defines elements such as Title, Creator, and Subject; a metadata instance holds the content values, e.g., Title = "Sydney Olympics ...", Creator = "Mahony, David (David James)", Subject = "Sportsmen and sportswomen", that describe a digital or non-digital information object)

Metadata Schema

Another metadata building block is the set of element definitions, which we denote as metadata schema. A metadata schema is simply a set of elements with a precise semantic definition, optionally connected by some structure [RB01]. The semantics of a schema is defined by the meanings of its elements. A schema usually defines the names of elements together with their semantics and, optionally, content rules that define how content values must be formulated (e.g., capitalisation, allowable content values). For the encoding of the elements and their values, a schema can define syntax rules. If no such rules are defined, a schema is called syntax independent [NIS04].

At this point we must mention that the notion metadata schema, which originates from the database domain, is often named differently in other contexts or communities. For instance, knowledge workers, librarians, or others working in the field of knowledge and information organisation tend to use the term metadata vocabulary for what we call metadata schema. These communities tend to regard metadata more from a linguistic rather than a technical point of view. Another very common notion is the term ontology, which has its origin8 in the Artificial Intelligence domain and is defined as a specification of a conceptualisation [Gru93].

7 The W3C consortium for instance does not publish standards but recommendations.
8 The term ontology actually originates from Philosophy; in the AI domain it has its technical origin.

At its core, an ontology is the same as a schema: a set of elements connected by some structure. Apart from that, Noy and Klein [NK04] have identified several features that distinguish ontologies from schemes in a database sense: the first, and probably the most important one, is that ontologies are logical systems that define a set of axioms that enable automated reasoning over a set of given facts. Second, ontologies usually have richer data models involving representation primitives such as inverse properties. And third, ontology development is a much more decentralised and collaborative approach than schema development, which opens the potential for reusing existing ontology definitions. As a consequence of these different perceptions, we can also find different names for the languages used for defining metadata elements: vocabulary definition language and ontology definition language are examples for that.

Despite these different perceptions, we believe that the term metadata schema is appropriate for the purpose of this work because it regards metadata interoperability mainly from a technical perspective. Stronger than the term vocabulary, it emphasises that metadata elements have a technical grounding in information systems, which is a significant aspect when metadata interoperability should also be achieved on a technical level. Weaker than the term ontology, it sets the limits for the underlying technical system: it encompasses logical systems in the AI sense (e.g., First Order Logic, Description Logics) but also other, in practice more widespread, systems such as relational database schemes or XML schemes. However, in this work the term metadata schema does not refer to traditional database schemes only. We rather regard it as a common denominator for all the previously mentioned terms.

Metadata Instance

Continuing with the metadata building blocks, we can identify a third one — the metadata instance. A metadata instance holds a set of metadata elements drawn from a metadata schema, together with the adjacent content values. These element-value combinations form a metadata description of a certain information object. As already mentioned, the rules for creating such metadata instances are imposed by the metadata schema they are related to. If there exists such a relationship, we say that a metadata instance corresponds to a schema.
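
To make the three building blocks tangible, the following sketch expresses them in RDF and is written as a SPARQL 1.1 Update request (INSERT DATA) so that it could be loaded directly into an RDF store. The terms rdfs:Class, rdf:Property, and rdfs:domain are primitives of the schema definition language (here RDFS), the ex: terms form a small hypothetical metadata schema, and the final group of triples is a metadata instance that corresponds to this schema; the content values are borrowed from the Dublin Core example in Section 2.1.3, while the ex: namespace is invented for illustration.

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/schema/>

    INSERT DATA {
      # Metadata schema: element definitions, expressed with primitives
      # (rdfs:Class, rdf:Property, rdfs:domain) of the schema definition language
      ex:Image   a rdfs:Class .
      ex:title   a rdf:Property ; rdfs:domain ex:Image .
      ex:creator a rdf:Property ; rdfs:domain ex:Image .

      # Metadata instance: element-value combinations corresponding to that schema
      ex:image1  a ex:Image ;
          ex:title   "Sydney Olympics 2000, marathon runners cross Sydney Harbour Bridge" ;
          ex:creator "Mahony, David (David James)" .
    }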

2.2.2 The Appearance of Metadata in Information Systems

Metadata are information objects that are designed for, persisted in, retrieved from, and exchanged between information systems. The form of appearance of metadata in information systems is defined through information models on various levels. In a typical information system we can identify four such levels: the physical, the logical, the programming/representation, and the conceptual level.

On the lowest level — the physical level — metadata are bits and bytes that are represented in memory, written to disks, and transmitted over wires. The system components that are working on this level are mainly concerned with optimising heap allocation, assigning records to file partitions, and building indices for efficient retrieval. Thus, the information model on this level comprises concepts like formats, files, records, or indices.

The physical level is usually hidden from application developers through the logical level. This is the level where database management systems are located. Information models such as the Relational Data Model [Cod70] provide the basis for creating technically precise and complete information models for a certain target domain. A metadata schema is defined using a data definition language and represented through a set of relation schemes. The metadata instances are described as sets of tuples contained in tables. Besides providing the necessary transparency from the underlying physical model, the logical level organises metadata efficiently (e.g., through normalisation) and introduces important features for application developers, such as data consistency, concurrency control, and transactions. However, metadata are scattered in tables that do not directly reflect the application domain9 and must be aggregated into higher-level, conceptual entities when being processed by applications.

9 Also called domain of discourse or universe of discourse.

On the programming/representation level, the constituents of conceptual schema models are manifested or presented in various forms: metadata schemes can be transformed into code of a certain (object-oriented) programming language and reflect the application domain in terms of classes or types, while metadata descriptions become run-time objects. Metadata elements and their contents can also be encoded using a certain mark-up language such as XML or be represented on the Web using HTML. This requires metadata descriptions to be adapted to certain programming and document models (e.g., W3C DOM), and metadata schemes to be modelled using the modelling language (e.g., Java syntax, XML Schema, DTD) imposed by the programming/representation technology.

A metadata schema on the conceptual level reflects real-world entities with their properties and the relationships among entities. The TV-Anytime standard, for instance, which is used in the illustrative example in Section 2.1, defines real-world entities such as Video or Creator and properties such as GivenName or FirstName. Common languages for creating conceptual models are the Entity-Relationship Model [Che76] or the Unified Modelling Language [OMG07].

Metadata — A Model Perspective

All these levels have in common that the information elements of a metadata schema, i.e., its elements and their relationships, are implemented in terms of a data model. Such a data model for metadata — further called metadata model — encapsulates the defined elements of a metadata schema, represents their semantics (their meaning) in a formal way, and provides a syntax for serialising metadata descriptions. The semantic interpretation of a metadata model is given through the mapping of the model elements to the corresponding entities in a certain application domain. The syntax of a metadata schema, i.e., the legal elements and the rules for how and what values can be assigned to elements, is defined using a certain notation, which can be symbolic (e.g., words, formulas), textual (e.g., natural language sentences), or graphical (e.g., diagrams).

Schema definition languages occur on each information level and are used for expressing information models. The core of such a language is its meta-model — we call it metadata meta-model — which is the machine-internal representation of the language. It reflects the language primitives (abstract syntax) together with a concrete notation (concrete syntax) and semantics. Furthermore, it defines and constrains the allowable structure of models. The semantics or interpretation of a language is the mapping of the meta-model's (abstract syntax) elements to the language's primitives. Semantic definitions of models may feature varying degrees of formality, ranging from natural language descriptions to formal languages or mathematics. Especially from an interoperability point of view, rigid and semantically precise definitions enable consistent interpretation across system boundaries [Sei03].

Metadata models and meta-models are arranged on different levels that are orthogonal to the previously mentioned levels of information. On the lowest level we can find metadata (descriptions) that are (valid) instances of a metadata model (e.g., Java classes, a UML model, database relations) that reflects the elements of a certain metadata schema. The metadata model itself is a valid instance of a metadata meta-model, which is part of a certain schema definition language. Due to this abstraction, it is possible to create meta-model representations of metadata (e.g., metadata instances of a UML model can also be represented as instances of the UML meta-model).

The MOF specification [OMG06a] offers a definition for these different levels: M0 is the lowest level, the level of metadata instances (e.g., Title = "Lake Placid 1980, Alpine Skiing, I. Stenmark"). M1 holds the models for a particular application; i.e., metadata schemes (e.g., the definition of the field Title) are M1 models. Modelling languages reside on level M2 — their abstract syntax or meta-model can be considered as the model of a particular modelling system (e.g., the definition of the language primitive attribute). On the topmost level, at M3, we can find universal modelling languages in which modelling systems are specified (e.g., core constructs, primitive types). Figure 2.5 illustrates the four levels, their constituents and dependencies.

Figure 2.5: Metadata building blocks from a model perspective (the figure stacks the four abstraction levels, M3: universal modelling language (meta-meta-model), M2: schema definition language (meta-model), M1: metadata schema (model), and M0: metadata, each level being an instance of the level above)

Regarding these abstraction levels, the remaining question is how the abstract syntax (meta-meta-model) of an M3 modelling language is expressed. In general, there are two options for defining a meta-model: either by using a different modelling mechanism (e.g., context-free grammars for programming languages, formal semantics or an explicit set of axioms and deduction rules for logic-based languages) or by using its own modelling language. In the latter case, the elements of a metamodel are expressed in the same language the metamodel is describing. This is called a reflexive metamodel [Sei03] because the elements of a modelling language can be expressed using a (minimal) set of modelling elements. An example for such an approach is the UML metamodel, which is based on the UML Infrastructure Library [OMG06c], which in turn is the minimal reflexive metamodel for UML.
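
Once schema and instance are both represented in RDF, as in the sketch in Section 2.2.1, the instance-of relationships between the levels of Figure 2.5 become ordinary rdf:type triples that a query can traverse. The following query, again only a sketch over the hypothetical ex: example data, walks from a metadata description (M0) to the schema element (M1) it instantiates and on to the language primitive (M2) that the schema element itself instantiates.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # Follow the instance-of chain upwards through the abstraction levels:
    # metadata description (M0) -> schema element (M1) -> language primitive (M2)
    SELECT ?description ?schemaElement ?languagePrimitive
    WHERE {
      ?description   rdf:type ?schemaElement .
      ?schemaElement rdf:type ?languagePrimitive .
    }

Against the example data, this binds ?description to ex:image1, ?schemaElement to ex:Image, and ?languagePrimitive to rdfs:Class, mirroring the M0, M1, and M2 levels of Figure 2.5.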

A Selection of Schema Definition Languages

Table 2.1 gives an overview of a selected set of schema definition languages, which can be assigned to the M2 level. For each language we describe its concrete syntax, i.e., how language elements are represented. Further, we outline how the semantics is defined for each language, assuming that the available options are: natural language, formal semantics (i.e., precise mathematical definitions), reference implementations, and test suites. Finally, we describe the machine-internal representation of a modelling language: its meta-model or abstract syntax. We outline the meta-model's type (e.g., object-oriented, semantic network-based) and which model construct categories (structural, behavioural) it supports. Further, we give an overview of its main model elements and sketch out whether the model is defined reflexively.

The Unified Modeling Language (UML) [OMG07] and the Entity Relationship Model (ER) [Che76] are used for the conceptual modelling of software systems and therefore employ a graphical syntactic notation. Topic Maps [ISO06c], RDFS [W3C04a], and OWL [W3C04c] are conceptual languages that allow the modelling of metadata schemes in terms of semantic networks, which requires that their machine-internal representation is graph-based.

Conceptual level:

UML
  Meta-model type: object-oriented
  Model construct categories: structural & behavioural
  Main model elements (abstract syntax): Class, Composite Structure, Component, Deployment, Activity, Interaction, State Machine, Use Case
  Reflexive definition: yes (via the M3 level)
  Concrete syntax: graphical notation
  Semantics: natural language

ER
  Meta-model type: ER-dedicated graph
  Model construct categories: structural
  Main model elements (abstract syntax): Entity Sets, Attributes, Relationships
  Reflexive definition: no
  Concrete syntax: graphical notation
  Semantics: formal

Topic Maps
  Meta-model type: semantic-network-based (labelled graph)
  Model construct categories: structural
  Main model elements (abstract syntax): Topic, Association, Occurrence, TopicMap
  Reflexive definition: no
  Concrete syntax: XML, HyTime
  Semantics: formal

RDF/S
  Meta-model type: semantic-network-based (labelled graph)
  Model construct categories: structural
  Main model elements (abstract syntax): Statement, BlankNode, Property, Literal, Resource, Class, Datatype, Bag, Seq, Alt, List
  Reflexive definition: yes
  Concrete syntax: RDF/XML, N3, N-Triple
  Semantics: formal

OWL
  Meta-model type: semantic-network-based (labelled graph)
  Model construct categories: structural
  Main model elements (abstract syntax): Ontology, Class, Datatype- & Object-Property, Individual, Restriction
  Reflexive definition: yes
  Concrete syntax: RDF/XML, N3, N-Triple
  Semantics: formal

Programming/representation level:

Java SE 6
  Meta-model type: object-oriented
  Model construct categories: structural & behavioural
  Main model elements (abstract syntax): Class, Interface, Field, Method, Annotation, Enum, Package
  Reflexive definition: no
  Concrete syntax: Java syntax
  Semantics: natural language, reference implementation

XML Schema
  Meta-model type: hierarchical
  Model construct categories: structural
  Main model elements (abstract syntax): Simple Type, Complex Type, Element, Attribute
  Reflexive definition: yes
  Concrete syntax: XML
  Semantics: natural language

Logical level:

SQL-DDL
  Meta-model type: relational model
  Model construct categories: structural (behavioural)
  Main model elements (abstract syntax): Schema, Relation, Tuple, Attribute, Domain, Data types
  Reflexive definition: no
  Concrete syntax: SQL syntax
  Semantics: formal

DL
  Meta-model type: logics
  Model construct categories: structural
  Main model elements (abstract syntax): Axioms, Facts, Atoms
  Reflexive definition: no
  Concrete syntax: DL syntax
  Semantics: formal

CL
  Meta-model type: logics
  Model construct categories: structural
  Main model elements (abstract syntax): Graph, Phrases, Terms, Sentences
  Reflexive definition: no
  Concrete syntax: CLIF, CGIF, XCL
  Semantics: formal

Other common features are that their semantics is formally defined and that their syntax is XML-based, while RDF also provides other syntaxes such as N3 [BL98] or Triple [SD02]. Java is a representative of the programming part of the programming/representation level. Its syntax and semantics are defined in [GJSB05]; since Java version 6, the Java API also comprises a metamodel [Jav06, BU04] that reflects the structural and behavioural primitives of the Java language. Another modelling language for defining metadata schemes on the representation level is XML Schema. Its semantics in natural language, as well as its hierarchical meta-model (abstract syntax), are defined in [W3C06]. A prominent representative of languages on the logical level is SQL [ISO03b], or more precisely, the SQL Data Definition Language (SQL DDL). Its meta-model is the relational model10, which is semantically defined through the relational algebra. For creating knowledge bases, logical languages such as Description Logics (DL) [BCM+03] or Common Logics (CL) [ISO05] can be applied for defining schemes in terms of knowledge graphs. Naturally, their semantics is defined formally; while the syntax of DL is simply symbolic, CL defines the Common Logic Interchange Format (CLIF), the Conceptual Graph Interchange Format (CGIF), and the Extended Common Logic Markup Language (XCL) for the syntactic representation and exchange of CL schema definitions.

10 In fact, vendors of database management systems employ their own, system-specific meta-models, but usually these are still based on the relational model.

2.2.3 Basic Observations

Previously, we have identified three main metadata building blocks: metadata instances, metadata schemes, and schema definition languages. When applying a technical view on these blocks, we note that a metadata description is in fact an instance of a metadata model, which in turn is an instance of a metadata meta-model. We can conceive metadata as information objects with three abstraction levels: metadata instances reside on the M0 level, metadata models on the M1 level, and metadata meta-models on the M2 level.

We can summarise this section with two main observations: first, the choice of the schema definition language directly affects the appearance of metadata information objects in an information system. This is because there is a direct technical (instance-of) dependency between metadata instances, metadata schemes, and schema definition languages. Second, although some languages overlap in certain aspects (e.g., graph-based meta-models, support for behavioural modelling constructs), there exist discrepancies between their abstract and concrete syntax, and in the way their semantics is defined. This implies that an automatic translation between metadata schemes expressed in different modelling languages is a problematic task. For instance, it will require human intervention to find a work-around for translating graph-based to hierarchical, tree-like models; a tree is a special kind of graph, but not vice versa.

2.3 Metadata Interoperability

We have claimed that metadata interoperability is the prerequisite for uniform access to digital media residing in multiple heterogeneous information systems; any solution that aims at establishing uniform access must therefore achieve it first. Before discussing various techniques for achieving interoperability, we first investigate the notion of metadata interoperability in the literature and come up with an appropriate definition in Section 2.3.1. Thereafter, in Section 2.3.2, we analyse in detail the various forms of heterogeneities that impede metadata interoperability.

2.3.1 Uniform Access To Digital Media

In the context of information systems, interoperability literally denotes the ability of a system to work with or use parts of other systems11. Similar definitions can be found in the literature: for the digital libraries domain, Baker et al. [BBB+02] summarise various interoperability viewpoints as the potential for metadata to cross boundaries between different information contexts. Other authors from the same domain define interoperability as being able to exchange metadata between two or more systems without, or with minimal, loss of information and without any special effort on either system [NIS04, ALC00], or as the ability to apply a single query syntax over descriptions expressed in multiple descriptive formats [HL01].

The notion of interoperability can further be subdivided: for Baker et al. [BBB+02], achieving interoperability is a problem to be resolved on three main levels: the transport and exchange level (e.g., protocols), the metadata representation level (e.g., syntactic binding, encoding language), and the level of metadata with their attribute space (e.g., schema elements) and value space (e.g., controlled vocabularies). Based on the perspective of heterogeneities in information systems, Sheth and Larson [SL90] and Ouksel and Sheth [OS99] present four main classes of interoperability concerns: system interoperability dealing with system heterogeneities such as incompatible platforms, syntactic interoperability dealing with machine-readable aspects of data representation, structural interoperability dealing with data structures and data models, and semantic interoperability. Tolk [Tol06] proposes another view consisting of six levels: no interoperability on the lowest level, technical interoperability (communication infrastructure established) on level one, syntactic interoperability (common structure to exchange information) on level two, semantic interoperability (common information model) on level three, pragmatic interoperability (context awareness) on level four, dynamic interoperability (ability to comprehend state changes) on level five, and conceptual interoperability (fully specified, but implementation-independent model) on level six. Miller [Mil00] detaches interoperability from the technical level and introduces, alongside technical and semantic interoperability, several further flavours of interoperability: political/human, inter-community, legal, and international interoperability.

11 According to the Merriam-Webster Online Dictionary (http://www.m-w.com/) and the Oxford Online Reference (http://www.oxfordreference.com/)

Based on its literal meaning, the definitions in the literature, and considering the technical characteristics of metadata information objects described in Section 2.2.2, we define metadata interoperability as follows:

Definition 2.1 Metadata interoperability is a qualitative property of metadata information objects that enables systems and applications to work with or use these objects across system boundaries.

With this definition we clearly distinguish our conception of metadata interoperability, which is settled on the information level, from system level interoperability issues such as communication infrastructures, hardware or software platform incompatibilities.

2.3.2 Heterogeneities Impeding Interoperability

The heterogeneities that must be eliminated in order to provide interoperability were already identified in the early days of database research. A first in-depth analysis was provided by Sheth and Larson [SL90]. Over the years they have been investigated more deeply (e.g., [OS99]) and have also regained attention in related domains such as Artificial Intelligence (e.g., [Wac03, VJBCS97]). In Figure 2.6, we provide a classification of the predominant heterogeneities mentioned in the literature from a model-centric perspective.

We recall that there is an instance-of relationship between metadata instances, metadata schemes, and schema definition languages, i.e., metadata are instances of metadata models, and metadata models are instances of metadata meta-models. Generalising these relationships, we can distinguish between model level and instance level heterogeneities as the first dimension of our classification. For the second dimension we differentiate between two classes of heterogeneities: structural heterogeneity, caused by the distinct structural properties of models, and semantic heterogeneity, occurring because of conflicts in the intended meaning of model elements or content values in distinct interpretation contexts.

Structural Heterogeneities

Structural heterogeneities on the model level occur because of model incompatibilities. A model mainly consists of its atomic elements (e.g., entities, attributes, relationships) and the combination or arrangement of these elements, which forms a certain structure for representing a particular domain of interest. That being the case, we can group structural heterogeneities occurring between distinct models into element definition conflicts, which are rooted in the definitions of a model (naming, identification, constraints), and domain representation conflicts, which occur because domain experts arrange the model elements that reflect the constituents of a certain domain in various ways and at different levels of detail. In the following, we further analyse these two groups of structural heterogeneities.

Naming Conflicts

We denote conflicts that occur because model elements representing the same real-world entity are given different names as naming conflicts. On the level of schema definition languages (M2), distinct meta-models assign different names to language primitives that are used to model the same real-world facts. UML, for instance, defines the primitive Class, while ER uses EntitySets to capture the same kind of real-world concepts. Also on the level of metadata schemes (M1), distinct metadata models might assign different names to elements representing the same real-world concepts. In the examples presented in Section 2.1, the model elements that represent the image's descriptions are labelled Synopsis in the TV-Anytime, Description in the Dublin Core, and Info in the proprietary schema.


Figure 2.6: Structural and semantic metadata heterogeneities on the model and the instance level

Identification Conflicts

A special case of naming conflicts are those dealing with the unique identification of model elements. On the M2 level, depending on the language used for defining a model, elements are identifiable either by their name only (e.g., ER, SQL-DDL) or by some (fully) qualified identifier (e.g., XMLS, OWL). Identification conflicts can also arise on the M1 level of metadata schemes. In our example, the TV-Anytime schema comprises model elements that are fully qualified via their namespace, whereas the elements of the proprietary model, i.e., the tables and columns, are identified by their names only.

Constraints Conflicts

Element definition conflicts that occur because distinct models provide different possibilities for defining constraints are denoted as constraints conflicts. An example of an M2 constraint is the ability of a schema to import other schema definitions, which is possible in languages such as XML Schema or OWL but not in ER. Incompatible primary keys, conflicting references, or domain constraints can lead to the same type of conflict on the M1 level of metadata schemes.

Abstraction Level Incompatibilities

Abstraction level incompatibilities belong to the group of domain representation conflicts and turn up when the same real-world entities are arranged in different generalisation hierarchies or aggregated differently into model elements. An example of this type of conflict on the M2 level is the ability to define attributes and relationships in various languages: while ER (attribute and relationship) and OWL (datatypeProperty and objectProperty) define primitives for both language features, XML Schema (attribute) and Java (field) subsume these features under a single primitive. Abstraction level incompatibilities at the M1 level, the level of metadata models, occur, for instance, when one metadata model aggregates the creator of a digital resource into a single entity creator, as is the case with Dublin Core, while other models such as TV-Anytime and also MPEG-7 distinguish between Persons and Organisations, both being a specialisation of the concept Agent.

Multilateral Correspondences

Multilateral correspondences are another domain representation conflict and a direct result of the previously mentioned abstraction level incompatibilities. On each level, an element in one model can correspond to multiple elements in another model and vice versa. In our example there is such a correspondence between the Dublin Core creator element and the TV-Anytime elements GivenName and FamilyName. This is because in the TV-Anytime metadata description these elements are used in the context of a CreditsItem element that takes the role of an author.

Meta-Level Discrepancy

Domain representation conflicts that occur because certain model elements do not have any direct correspondences in another model are subsumed under meta-level discrepancy. This, however, does not necessarily mean that the other model cannot capture the same information about a certain domain. Real-world concepts represented as elements in one model (e.g., author as attribute) could be modelled differently in another model (e.g., author as entity) or even be captured as the content of a model element on the instance level. In our example, the Dublin Core and the proprietary schema store the location of the Olympic Games as a content value within the field Title, while TV-Anytime defines a model element CreationLocation that captures this kind of information. We can distinguish the following kinds of meta-level discrepancies: content-value / attribute, entity / attribute, and content-value / entity discrepancy.

Domain Coverage

When no correspondences exist between model elements at all, we speak of domain coverage conflicts. This happens when real-world concepts reflected in one model are left out of the other model, although both models were designed for the same semantic domain. In our example, the TV-Anytime description does not give any evidence about the image's size, while the proprietary one does.

Semantic Heterogeneities

Semantic heterogeneities on the model level are conflicts occurring because of differences in the semantics of models. We recall that a model's semantics is defined by its semantic domain and the semantic mappings (interpretations) from the domain entities to the model elements. The semantic domain provides the meaning for each model element and can contain language expressions, in the case of schema definition languages, or real-world entities, in the case of metadata models.

Domain Conflicts

When domains overlap, subsume, or aggregate others, or when domains are incompatible, we speak of domain conflicts. An example of such a conflict on the M2 level is the expressiveness of languages: with languages that have a rich domain, i.e., an expressive set of language primitives, we are able to model things that are not expressible in other languages with less powerful primitives. With OWL, for instance, it is possible to express that two classes are equivalent or that one class is the union of two other classes. Other languages such as XML Schema or Java do not have built-in language constructs to indicate such relationships. Obviously, domain conflicts can also occur among metadata models on the M1 level. If one model reflects the domain of electronic billing and another one the domain of multimedia content, it is unlikely that there are any meaningful correspondences among these models.

Terminological Mismatches

Terminological mismatches are another kind of semantic heterogeneity occurring on both model levels: synonym conflicts occur if the same domain concept is mapped to model elements with different names; homonym conflicts exist if different domain concepts are mapped to model elements with the same name.

An example of a homonym conflict on the language level is the polymorphic concept of overloading that appears in object-oriented languages like Java. These languages support polymorphic functions whose operands (parameters) can have more than one type. Types can be arranged in a sub-type hierarchy, and symbols (e.g., field names, method signatures) may be overloaded, meaning that the same symbol is used to denote semantically different behaviour [CW85]. In Java, for instance, the equals method is usually overridden and implemented differently for each class, which can lead to unintended behaviour at runtime. An example of a synonym conflict on the schema level is the usage of distinct terms to denote the same semantic concept. In our example, the proprietary schema uses the term author and the Dublin Core schema the term creator to represent the person who created the particular image.
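The following minimal Java sketch (class names are illustrative) shows how the single symbol equals can stand for semantically different behaviour: the class accidentally overloads equals(Creator) instead of overriding equals(Object), so which method is actually invoked depends on the static types involved.

```java
// Sketch of the homonym problem with polymorphic symbols: the name "equals" denotes
// class-specific behaviour, and accidental overloading silently changes which method runs.
class Creator {
    String name;
    Creator(String name) { this.name = name; }

    // Overloaded, NOT overriding Object.equals(Object)
    public boolean equals(Creator other) {
        return other != null && name.equals(other.name);
    }
}

public class HomonymDemo {
    public static void main(String[] args) {
        Object a = new Creator("John Doe");
        Object b = new Creator("John Doe");
        // Static types are Object, so Object.equals(Object) is chosen and reference
        // equality is tested: prints false, although the overload would return true.
        System.out.println(a.equals(b));
    }
}
```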

Scaling/Unit Conflicts

Semantic heterogeneities occurring on the M0 metadata instance level when different scaling systems are used to measure content values are called scaling or unit conflicts. In our examples, the dimensions of the described images are represented in pixels in the proprietary schema and in centimetres in the Dublin Core schema. Even if the images had semantically the same dimensions, the content values would differ.

Representation Conflicts

Representation conflicts result from using different encoding schemes for content values. For instance, two date values that are semantically the same could be represented differently in each system (e.g., date=01.01.2007 or date=2007/01/01).
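As a minimal sketch of how such instance-level conflicts might be reconciled programmatically, the Java code below converts pixel dimensions to centimetres (the resolution of 300 dpi is an illustrative assumption, not part of the original schemas) and normalises the two date encodings from the example above to a single canonical value.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of reconciling a scaling/unit conflict (pixels vs. centimetres) and a
// representation conflict (two encodings of the same date).
public class InstanceConflicts {
    static double pixelsToCentimetres(int pixels, int dpi) {
        return pixels / (double) dpi * 2.54;   // 1 inch = 2.54 cm
    }

    public static void main(String[] args) {
        // Scaling/unit conflict: 3000 px at an assumed 300 dpi correspond to 25.4 cm
        System.out.println(pixelsToCentimetres(3000, 300));

        // Representation conflict: parse both encodings into one canonical form
        LocalDate d1 = LocalDate.parse("01.01.2007", DateTimeFormatter.ofPattern("dd.MM.yyyy"));
        LocalDate d2 = LocalDate.parse("2007/01/01", DateTimeFormatter.ofPattern("yyyy/MM/dd"));
        System.out.println(d1.equals(d2));     // true: semantically the same date
    }
}
```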

2.4 Techniques for Achieving Metadata Interoperability

Over the decades, experts working in the field of metadata interoperability have developed methods and solutions to overcome the previously described heterogeneities. The goal of this section is to set up a framework for categorising existing techniques according to their common characteristics. Metadata interoperability can be achieved by eliminating the structural and semantic heterogeneities at the metadata meta-model (M2), the metadata model (M1), and the metadata instance (M0) level. We can identify three principal ways to attain interoperability among models: (i) agreement on a certain metadata model, (ii) introduction of, and agreement on, a common meta-model, and (iii) reconciliation of the structural and semantic heterogeneities. Table 2.2 provides an overview of a variety of techniques for achieving metadata interoperability and classifies them according to the three previously mentioned categories. In the following sections, we focus on each of these techniques and discuss their characteristics.

2.4.1 Model Agreement

Standardisation is a strong form of establishing an agreement by means of consensus building and an intuitive, technically effective, and economically well-recognised way to achieve interoperability. It requires accredited institutions (e.g., the World Wide Web Consortium (W3C), the Object Management Group (OMG), the International Organization for Standardization (ISO), the German Institute for Standardization (DIN)) for building consensus, setting a standard, and eventually assuring its uniform implementation. Regarding the building blocks of metadata, standardisation can cover the language level (standardised language), the schema level (standardised metadata schema), the instance level, or several levels (hybrid metadata system).

Standardised Language

In Section 2.2.2, we have already shown a representative selection of various types of schema definition languages, ranging from programming languages (e.g., Java), over conceptual modelling languages (e.g., UML), to logical languages (e.g., Description Logics).

M2 – Schema Definition Languages
  Model Agreement: Standardised Language (e.g., OWL, UML, XML Schema)
  Meta-Model Agreement: Metadata Meta-Meta Model (e.g., MOF); Abstract Metadata Model (e.g., DCMI Abstract Model)
  Model Reconciliation: Language Mapping

M1 – Metadata Schemes
  Model Agreement: Standardised Metadata Schema (e.g., DC, TEI, MODS); Hybrid Metadata System (e.g., MPEG-7, TV-Anytime); Application Profile (e.g., DC Collection Profile)
  Meta-Model Agreement: Global Conceptual Model (e.g., CIDOC-CRM, FRBR); Metadata Framework (e.g., MPEG-21, METS, OAIS)
  Model Reconciliation: Metadata Crosswalk, Schema Mapping

M0 – Metadata
  Model Agreement: Value Encoding Schema (e.g., ISO norms, RFC specifications); Controlled Vocabulary (e.g., LCSH, DDC, MeSH); Authority Record (e.g., LOC Authorities, Deutsche Personennormdatei (PND))
  Model Reconciliation: Instance Transformation

Table 2.2: A categorisation of metadata interoperability techniques

Each schema definition language defines a set of language primitives and, as in natural languages, postulates that multiple parties agree on their semantics. This guarantees interoperability on the level of schema definition languages. Consequently, metadata expressed in the same language can be processed by any application that is aware of that language. Agreement on the M2 level of schema definition languages is typically enforced through various forms of standardisation. Programming languages are often specified by companies (e.g., Java by Sun Microsystems) or standardised by standards institutes (e.g., ANSI C by the American National Standards Institute (ANSI)). Languages designed for data representation and exchange are standardised by international consortia (e.g., W3C, OMG).

Standardised Metadata Schema

If there is an agreement or consensus on a set of metadata elements on the M1 level, and this agreement is manifested in a standard, we speak of a standardised metadata schema. In Table 2.3, we present a selection of metadata standards used in various domains. For each standard we indicate its application domain, the requirements or purpose it fulfils, and the schema definition languages it is bound to in order to express a metadata schema also on a technical level. Further, we show which standardisation body maintains the standard, the year of its initial publication, and the current version together with the year in which this version was released. Most standardised metadata schemes are designed for a specific domain and a certain purpose. The VRA Core Standard [VRA07], for instance, is tailored to the cultural heritage community and comprises metadata elements for the description of works of visual culture as well as the images that document them.

Dublin Core Element Set (DC) — domain independent; description of a wide range of resources; language bindings: XMLS, RDF/S; standardisation body: ISO/NISO; initial publication: 1998; current version: 1.1 (2008)

Visual Resources Association Core (VRA Core) — cultural heritage; description of works of visual culture and the images that document them; language bindings: XMLS; standardisation body: VRA Data Standards Committee; initial publication: 1996; current version: 4.0 (2007)

Guidelines for Electronic Text Encoding and Interchange (TEI) — humanities, social sciences, linguistics; representation of texts in digital form; language bindings: XMLS; standardisation body: Text Encoding Initiative Consortium; initial publication: 1990; current version: P5 (2007)

Learning Objects Metadata (LOM) — eLearning; description of digital or non-digital learning objects; language bindings: XMLS, RDF/S; standardisation body: IEEE; initial publication: 1997; current version: 1.0 (2002)

Sharable Content Object Reference Model (SCORM) — eLearning; aggregation, description, and sequencing of learning objects; language bindings: XMLS; standardisation body: Advanced Distributed Learning Initiative (ADL); initial publication: 2000; current version: 3rd Edition (2004)

Online Information Exchange (ONIX) — publishing/retail (books and serials); provision of product release information to online retailers; language bindings: DTD, XMLS; standardisation body: EDItEUR group; initial publication: 2000; current version: 2.1 (2005)

MARC 21 Format for Bibliographic Data — (digital) libraries; exchange of bibliographic data; language bindings: XMLS (MARCXML); standardisation body: Network Development & MARC Standards Office; initial publication: 1999; current version: update no. 7 (2006)

Metadata Object Description Schema (MODS) — digital libraries; subset of MARC fields using language-based tags; language bindings: XMLS; standardisation body: Network Development & MARC Standards Office; initial publication: 2002; current version: 3.2 (2006)

Maschinelles Austauschformat für Bibliotheken (MAB) — (digital) libraries in German-speaking countries; exchange of bibliographic data; language bindings: XMLS (MABxml); standardisation body: Expertgroup for data formats; initial publication: 1973; current version: 2 (2001)

Format for Bibliographic Records (RFC1807) — universities, R&D organisations; description of technical reports; language bindings: XMLS; standardisation body: Network Working Group; initial publication: 1995; current version: 1.0 (revised 2002)

Geographic Information Metadata (ISO 19115) — geographic information systems (GIS); documentation of geographic digital resources; language bindings: XMLS, GML; standardisation body: ISO; initial publication: 2003; current version: 1.0 (2003)

Table 2.3: A representative selection of metadata standards

The Dublin Core Metadata Element Set [DC06] is an example of a schema that is broad and generic enough to describe a variety of resources across domains. Besides Dublin Core and VRA, our selection also comprises the Guidelines for Electronic Text Encoding and Interchange (TEI) [TEI07], a standard mainly used in the humanities and social sciences for representing texts and data about texts in digital form. In the case of TEI, the standardisation body is the consortium that has developed the standard.

As representatives of the eLearning domain, we have selected the Sharable Content Object Reference Model (SCORM) [ADL07] and the Learning Objects Metadata (LOM) [IEE02] standards. While the former standardises the aggregation and sequencing aspects of learning objects, the latter is mainly concerned with their description. The development of SCORM is driven by the Advanced Distributed Learning Initiative12, which embraces several standardisation bodies, including IEEE13.

The MARC 21 Format for Bibliographic Data [LOC07c], a metadata standard in the libraries domain for the purpose of exchanging bibliographic data, is maintained by the Network Development and MARC Standards Office. MAB and its successor MAB 2 [DNB07a] represent the German counterparts to the family of MARC standards and have been developed by a group of library domain experts. The Metadata Object Description Schema (MODS) [LOC07d] is a metadata standard that defines a subset of the MARC fields using language-based instead of numeric tags. RFC1807 [NWG95] is a very old bibliographic metadata standard mainly used in universities and research organisations for describing technical reports.

Online Information Exchange (ONIX) [EDI07] is another metadata standard, situated at the borderline between bibliographic description and electronic commerce. Its main purpose is to provide product information about books and serials to online retailers. The standard is maintained by EDItEUR, an international group of book retailers and vendors. Finally, from the domain of geographic information systems, our selection contains ISO 19115 [ISO03a], a metadata standard designed for the documentation of geographic digital resources.

The technical implementation of a metadata standard is bound to one or more schema definition languages, which provide the facilities to represent a standard's metadata elements in a machine-readable way. Regarding our selection of metadata standards, we can observe that the majority is bound to XML Schema. Some (e.g., DC and LOM) also provide bindings for RDF/S, and the ISO 19115 standard even defines the Geography Markup Language (GML) [OGC04], which is an extension of the XML grammar.

Hybrid Metadata System

We denote metadata standards that cannot be assigned to a single level but span multiple levels as hybrid metadata systems. An important representative of a hybrid metadata system is the MPEG-7 [ISO07b] standard. It spans the M2 and M1 levels and defines a set of metadata schemes (MPEG-7 Description Schemes) for creating multimedia metadata descriptions as well as a schema definition language, the MPEG-7 Description Definition Language (DDL), which is an extension of XML Schema. This language provides the solid descriptive foundation for users to create their own metadata schemes, compatible with the MPEG-7 standard [Kos03]. The TV-Anytime standard is a representative of a hybrid metadata system that spans the M1 and M0 levels. On the M1 level, it heavily reuses elements defined by the MPEG-7 standard and tailors them to the requirements of the broadcasting domain. Further, it defines a set of classification schemes, which are in fact controlled vocabularies allowing the classification of telecasts along various dimensions. Sample dimensions are a telecast's content (e.g., news, arts, religion/philosophies), its formal structure (e.g., magazine, cartoon, show), or even its atmosphere (e.g., breathtaking, happy, humorous, innovative). The terms listed in the classification schemes are possible content values within metadata descriptions and can therefore be used as content values in M0-level metadata instances.

Instance Level Agreement

If several parties agree on a set of possible content values for M0-level metadata descriptions, we denote this as instance level agreement. In the real world, we can find various forms of agreements or standards on the instance level.

12 Advanced Distributed Learning Initiative: http://www.adlnet.gov/
13 Institute of Electrical and Electronics Engineers (IEEE): http://www.ieee.org/

One frequently occurring form are controlled vocabularies such as the Library of Congress Subject Headings (LCSH) [LOC07b], the Dewey Decimal Classification System (DDC) [OCL07], or the Medical Subject Headings (MeSH) [NLM07]. The main goal of a controlled vocabulary is to support the search and retrieval of resources by indexing them with terms taken from a vocabulary that has been designed by domain experts who possess expertise in the subject area. The complexity of a controlled vocabulary can range from a simple list of terms, over a hierarchical arrangement of terms (taxonomy), to systems that define terms and the semantic relationships between them (thesaurus).

Authority control is another form of instance level agreement and very similar to controlled vocabularies. The goal is to disambiguate identical entities by linking the content values of metadata descriptions to uniquely identifiable authority records maintained and shared by a central authority. In the library domain, authority control is commonly used to relate potentially distinct names of one and the same person to a single, uniquely identifiable entity. The Library of Congress Authorities [LOC07a] or the German Personennormdatei (PND) [DNB07b] are examples of centrally maintained directories of person names.

A value encoding schema is a form of instance level agreement that defines exactly how to encode a certain type of content value. The ISO 8601 [ISO04] standard, for example, provides encoding rules for dates and times; the ISO 3166 [ISO06b] standard defines how to represent country names and their subdivisions. This kind of standardisation guarantees that machines can correctly interpret non-textual content values, such as dates and times, or abbreviated textual values representing some entity, such as country codes.

In theory, if all metadata in all information systems within a certain integration context were instances of a single standardised metadata schema expressed in a single schema definition language, and if all content values used within the metadata instances were taken from a single controlled vocabulary, all the structural and semantic heterogeneities mentioned in Figure 2.6 would be resolved, at least technically.
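As a small illustration, the following Java sketch encodes a date according to ISO 8601 and checks a country code against the ISO 3166 alpha-2 code list shipped with the Java platform; the concrete values are illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Locale;

// Sketch of instance level agreement through value encoding schemes: dates are
// serialised according to ISO 8601 and countries are referenced by ISO 3166 codes,
// so that any consuming system can interpret the content values unambiguously.
public class ValueEncoding {
    public static void main(String[] args) {
        // ISO 8601 date encoding, e.g., "1980-02-14"
        String encoded = LocalDate.of(1980, 2, 14).format(DateTimeFormatter.ISO_LOCAL_DATE);
        System.out.println(encoded);

        // ISO 3166 alpha-2 country codes as provided by the Java platform
        boolean isValid = Arrays.asList(Locale.getISOCountries()).contains("AT");
        System.out.println(isValid);   // true: "AT" denotes Austria
    }
}
```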

2.4.2 Meta-Model Agreement

In real-world environments, we can observe that institutions often do not adhere to standards. Attempts to find an agreement on a standard often result in semantically weak minimum-consensus schemes (e.g., the Dublin Core Element Set) or in models with extensive and complex semantic domains (e.g., the CIDOC Conceptual Reference Model (CRM) [ISO06a]). Often it is not practicable for institutions to agree on a certain model or to apply an existing standard because they already have proprietary solutions in place. In such a case, one possibility for achieving interoperability is not to agree on a model but on a common meta-model. For all existing proprietary models, instance-of relationships from each model to the common meta-model are established. Through these relationships, the elements of the proprietary models can then be manipulated as if they were elements of the meta-model. Meta-model agreement thus implicitly enables interoperability by creating correspondences between proprietary models via a common meta-model.
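The following Java sketch illustrates this idea on a generic level; the meta-model primitives (ElementKind) and the listed schema elements are illustrative assumptions and do not correspond to any concrete meta-model standard.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of meta-model agreement: the proprietary schemas are not aligned with each
// other, but each of their elements is declared an instance of a common meta-model
// element, so that generic tooling can manipulate all of them uniformly.
public class MetaModelAgreement {
    enum ElementKind { ENTITY, ATTRIBUTE, RELATIONSHIP }   // common meta-model primitives

    record MetaElement(String schema, String name, ElementKind kind) { }

    public static void main(String[] args) {
        // instance-of relationships from proprietary model elements to the meta-model
        List<MetaElement> elements = List.of(
            new MetaElement("TV-Anytime",  "GivenName", ElementKind.ATTRIBUTE),
            new MetaElement("Dublin Core", "creator",   ElementKind.ATTRIBUTE),
            new MetaElement("proprietary", "IMAGES",    ElementKind.ENTITY));

        // generic manipulation: group all elements by their meta-model kind
        elements.stream()
                .collect(Collectors.groupingBy(MetaElement::kind))
                .forEach((kind, els) -> System.out.println(kind + ": " + els));
    }
}
```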

Metadata Meta-Meta Model

An example of such an approach on the M2 level of schema definition languages is the OMG Meta-Object Facility (MOF), which is a universal modelling language in which modelling systems can be specified [OMG06a]. It solves language mismatches by introducing the M3 level and by superimposing a metadata meta-meta model containing a set of elements for modelling language primitives. If the model elements of M2 schema definition languages are aligned with the elements of the M3 MOF model, it is possible to express metadata in terms of the more general MOF model. Kensche et al. [KQCJ07] propose another model that serves as an M3 abstraction for particular meta-models on the M2 level. They relate certain meta-models (e.g., EER, OWL) with a generic meta-meta-model through generalisation and introduce roles to decorate M3-level elements with M2-specific properties that must be preserved.

Abstract Metadata Model

The specification of an abstract metadata model is another way of achieving interoperability. Such a model resides on the M2 level of schema definition languages and serves as a technical reference for the implementation of metadata schemes in information systems. If there is an agreement on such a meta-model across all information systems and all metadata schemes are expressed in terms of the elements provided by this model, the metadata information objects are interoperable at least from a structural point of view because they are technically represented in the same way. The DCMI Abstract Model [PNNJ05] is an example of an abstract metadata model. In a similar manner as RDF/S, it defines an information model for representing Dublin Core metadata in a machine-processable way.

Global Conceptual Model

Introducing a global conceptual model is a way of achieving interoperability on the M1 level of metadata schemes. All information systems to be integrated must align their metadata model elements with the more general elements defined in the global model, which formalises the notions of a certain domain and defines the concepts that appear in a certain integration context. The CIDOC CRM is an example of such a model, designed for the cultural heritage domain. It defines 81 entities and 132 properties, most of them on a very abstract level (e.g., physical thing, section definition). Another example of a global conceptual model is the Functional Requirements for Bibliographic Records (FRBR) [IFL97] model, which has been defined by the International Federation of Library Associations and Institutions. With its four key entities (work, expression, manifestation, item) it represents a generalised view of the bibliographic universe, independent of any cataloguing standard or implementation [Til04]. The Suggested Upper Merged Ontology (SUMO)14 is also a global model that aims to promote data interoperability, information search and retrieval, automated inferencing, and natural language processing [NP01]. It defines high-level concepts such as object, continuousObject, process, or quantity. Another example of a global model approach is the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [Won03].

14 The Suggested Upper Merged Ontology: http://ontology.teknowledge.com/

Metadata Framework

A metadata framework can be considered as a skeleton upon which various objects are integrated for a given solution [CZ06] and is another way of achieving interoperability on the M1 level of metadata schemes. It typically provides a data model consisting of a set of abstract terms, together with a description of the syntax and semantics of each model element. Again, the idea is to integrate existing metadata by aligning their model elements with a set of elements defined by the metadata framework. Examples of metadata frameworks are the MPEG-21 Multimedia Framework [ISO07a], the Metadata Encoding and Transmission Standard (METS) [LOC07e], and the Open Archival Information System (OAIS) [CCS02].

Application Profile

An application profile [HP00, BDH+01] is a special kind of model agreement. On the one hand, it is a schema consisting of data elements drawn from one or more standardised schemes, optimised for a particular application domain, with a focus on the reuse of existing, standardised model elements. On the other hand, from a technical point of view, an application profile is a metadata model that extends a set of agreed-upon meta-models. Application profiles are created by application developers who declare how to apply standardised schema elements in a certain application context. Within an application profile one cannot create new model elements that do not exist elsewhere. If this is required, a new metadata schema containing these elements must be created and maintained.

Refinement of standard element definitions is allowed. Developers can set the permitted range of values (e.g., special date formats, particular formats for personal names) and narrow or specify the semantic definition of the metadata elements. Application profiles are created with the intent of reuse and are tailored to specific purposes or certain user communities. Example application profiles are the Dublin Core Collections Application Profile [DC07] for describing collections of physical or digital resources, the Eprints Application Profile [AJP07] for describing scholarly publications held in institutional repositories, and the application profiles15 that were created when METS was introduced in several digital libraries. An application profile created for the domain of annotations is described in [SHJK08].

2.4.3 Model Reconciliation

Often, especially in settings where the incentives for an agreement on standards are weak, neither model nor meta-model agreement is a suitable interoperability technique. The digital libraries domain is one example: there is no central authority that can impose a metadata standard on all digital libraries. Such settings require other means for reconciling heterogeneities among models.

Language Mapping

If metadata schemes are expressed in different schema definition languages, mappings on the language level are required to transform the instances, i.e., the M1-level metadata models, from one linguistic representation into another. Because of the substantial structural and semantic discrepancies among schema definition languages (see Section 2.2.2), translation from one language into another can cause a loss of valuable semantic information. Nevertheless, in the literature we can find many approaches that focus on mappings between OWL and XML Schema [LF04], OWL and the Relational Model [MHS07], XML Schema and the Relational Model [ACL+07], object-oriented languages (e.g., UML, Java) and OWL [GDDD04], etc.

Schema Mapping

If an agreement on a certain model is not possible, schema mapping is an alternative way to deal with heterogeneities among metadata schemes. In the digital libraries domain, metadata crosswalks have evolved as a special kind of schema mapping. A crosswalk is a mapping of the elements, semantics, and syntax from one metadata schema to another [NIS04]. The goal of crosswalks is to make elements defined in one metadata standard available to communities using related metadata standards. A complete or fully specified crosswalk consists of the semantic mapping between model elements and a metadata instance transformation specification [PL98]. In practice, however, crosswalks often define only the semantic mapping on the M1 level and leave the problem of instance transformation to the application developers.
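A minimal sketch of such a crosswalk, assuming hypothetical element names for the proprietary schema: the mapping relates proprietary elements to Dublin Core elements on the M1 level only and, as noted above, leaves the transformation of content values untouched.

```java
import java.util.Map;

// Sketch of a metadata crosswalk: element-level correspondences from a hypothetical
// proprietary schema to Dublin Core; content values are passed through unchanged.
public class Crosswalk {
    static final Map<String, String> PROPRIETARY_TO_DC = Map.of(
        "author", "creator",
        "info",   "description",
        "title",  "title");

    public static void main(String[] args) {
        Map<String, String> proprietaryRecord =
            Map.of("author", "John Doe", "info", "Downhill race, Lake Placid 1980");

        proprietaryRecord.forEach((element, value) ->
            System.out.println(PROPRIETARY_TO_DC.getOrDefault(element, element) + " = " + value));
    }
}
```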

Instance Transformation

In the context of mappings, instance transformation is the approach for achieving interoperability on the metadata instance level, when there is no agreement on value encoding schemes or other standardisation mechanisms. Instance transformations are functions that operate on the content values and perform a specified operation, such as the concatenation of the values of two fields (e.g., GivenName=John, FamilyName=Doe) into a single field (e.g., Creator=John Doe).
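A minimal sketch of such an instance transformation function, using the name concatenation example above; the function and class names are illustrative assumptions.

```java
import java.util.function.BinaryOperator;

// Sketch of an instance transformation function: the content values of the
// TV-Anytime elements GivenName and FamilyName are concatenated into a single
// Dublin Core creator value.
public class InstanceTransformation {
    static final BinaryOperator<String> CONCAT_NAMES =
        (given, family) -> given + " " + family;

    public static void main(String[] args) {
        String creator = CONCAT_NAMES.apply("John", "Doe");
        System.out.println("Creator=" + creator);   // Creator=John Doe
    }
}
```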

15 A list of METS application profiles is available at http://www.loc.gov/standards/mets/mets-registered-profiles.html

2.5 On the Quality of Interoperability Techniques

In this section, we focus on the quality of the previously mentioned interoperability techniques and analyse to what extent a certain technique can deal with the various kinds of heterogeneities discussed in Section 2.3.2. For two reasons we restrict ourselves to techniques that enforce interoperability on the metadata model (M1) and instance (M0) levels: first, as we have generalised in Section 2.2.2, we can apply an abstract view on models on various levels and distinguish between model level and instance level heterogeneities. This is because both the schema definition language and the metadata schema are, at their core, models. Therefore, we can analyse their potential for dealing with heterogeneities between models and their instances. The second reason is that in practice one can assume that all metadata can be transformed into a uniform language representation.

For determining the quality of an interoperability technique, we have analysed whether it can resolve a specific heterogeneity type. Table 2.4 summarises the results of this analysis: one dimension shows the various interoperability techniques, the other the heterogeneities grouped by their types. The dotted line separates the model (M1) and the instance level (M0) for both techniques and heterogeneity types. The greyed areas represent the groups of heterogeneities described in Section 2.3.2.

2.5.1 Model Agreement Techniques

Model agreement techniques are an effective means for achieving interoperability. If all proprietary systems adapt their existing information systems in such a way that their models fit into a hybrid metadata system, a standardised metadata schema, or an application profile, most heterogeneity problems can be resolved. A fixed and semantically well-defined set of given metadata elements resolves naming, identification, and constraints conflicts. Abstraction level incompatibilities, multilateral correspondences, and meta-level discrepancies cannot occur either if there exists only one agreed-upon metadata model. If all involved parties use the same model, it cannot happen that some concepts are not available in a model, which implies that model agreement techniques also resolve domain coverage conflicts. Further, a standardised or agreed-upon schema or application profile can also resolve domain conflicts by fixing the semantic domain (e.g., an application profile for the domain of video or audio material).

The remaining semantic heterogeneity conflicts on the instance level (scaling/unit and representation conflicts) can also be resolved: through the combination of constraints on the model level and agreement on the instance level (value encoding, controlled vocabulary, authority record) it is possible to narrow the domain of possible content values within a metadata description down to a fixed set of values. Hybrid metadata systems such as TV-Anytime also span the M0 level by defining fixed classification schemes and therefore provide interoperability on the instance level as well.

2.5.2 Meta-model Agreement Techniques

Meta-model agreement techniques such as global conceptual models or metadata frameworks are less powerful than model agreement techniques. Rather than agreeing on a certain model, their approach is to impose a meta-model and to use generalisation relationships (e.g., sub-class or sub-property) to relate the elements of existing proprietary models to the elements of the common meta-model. These alignment possibilities are very restricted: neither can they deal with instance level heterogeneities, nor can they handle structural heterogeneities.

Figure 2.7 illustrates that problem based on the example presented in Section 2.1. It shows the TV-Anytime and the Dublin Core elements for representing the name of a person who has created a certain resource. The TV-Anytime model defines two separate fields, GivenName and FamilyName, while the Dublin Core model defines only a single field, Creator, to capture the same information. A global conceptual model containing the elements Person and Name has been introduced to bridge this structural heterogeneity conflict.

Table 2.4: The quality of various interoperability techniques. The table relates the M1-level techniques (standardised metadata schema, hybrid metadata system, application profile, global conceptual model, metadata framework, metadata crosswalk / schema mapping) and the M0-level techniques (value encoding schema, controlled vocabulary, authority record, instance transformation) to the heterogeneity types of Figure 2.6 — naming, identification, constraints, abstraction level, multilateral correspondence, meta-level discrepancy, domain coverage, domain, and terminological conflicts on the model level; scaling/unit and representation conflicts on the instance level — and marks, for each combination, whether the technique can resolve the respective heterogeneity; the individual markings are discussed in Sections 2.5.1 to 2.5.3.

We can see that global conceptual models cannot deal with basic heterogeneity conflicts such as multilateral correspondences. It is not possible to relate the elements GivenName and FamilyName to the element Name in a way that machines can process their instance content values appropriately. Other types of heterogeneities that are not resolvable by meta-model agreement techniques are meta-level discrepancies (e.g., Name modelled as an entity instead of an attribute) and domain coverage conflicts: concepts available in the global model may simply not be explicitly available in the proprietary models. The heterogeneities that can be resolved are abstraction level incompatibilities and, in the case of global conceptual models, domain conflicts, provided the models' domains are not completely incompatible. Unlike global conceptual models, metadata frameworks are domain independent and cannot resolve domain conflicts by imposing a certain domain. As we can see in the example, both interoperability approaches could resolve terminological mismatches by aligning terminologically conflicting elements to a common element in the global model (e.g., Creator sub-property of Name).

In general, the problems with global conceptual models are manifold: first, it is hard to find a model that covers all possible ontological requirements of all systems in an integration context. Second, the generic nature and complexity of global models can lead to varying interpretations and inconsistent alignments between the global conceptual models and the metadata schemes in place. Third, conceptual models (e.g., the CIDOC CRM) often lack any technical specification, with the result that they are implemented differently in distinct systems. Meanwhile, belief in the success of global conceptual model approaches is waning: Wache [Wac03] asserts that no global model can be defined in such a way that it fulfils all conceptual requirements of all possible information systems that are integrated in a certain domain. Halevy et al. [HIST05] argue that in large-scale environments global models, which should enable interoperability, actually become the bottleneck in the process of achieving interoperability. For a more detailed discussion of the problems encountered with global conceptual models, especially with the CIDOC CRM, we refer to [NH07].


Figure 2.7: Example for achieving interoperability via a global conceptual model

2.5.3 Model Reconciliation Techniques

Schema mapping (metadata crosswalks) is powerful enough to produce the same interoperability quality as model agreement techniques. Provided that the underlying mapping mechanism is strong enough, schema mapping can deal with all kinds of heterogeneities on the schema level, such as different element names, different ways of identifying these elements, incompatible constraints definitions, and all the remaining conflicts ranging from abstraction level incompatibilities to terminological mismatches. In combination with instance transformation, it can also resolve semantic heterogeneities on the instance level, i.e., scaling/unit and representation conflicts.

2.5.4 Observations on the Quality of Interoperability Techniques

Regarding our analysis, we can observe that various options exist for providing interoperability among heterogeneous metadata information objects. Each option has its special qualities and can be seen as a complementary building block for achieving metadata interoperability. Standardised metadata schemes alone, for instance, cannot deal with instance level heterogeneities. Therefore they must be combined with value encoding schemes, controlled vocabularies, or authority records in order to address the instance level as well. The same holds for schema mappings or crosswalks; they operate only on the schema level and must be combined with instance transformation techniques to achieve maximum interoperability. Meta-model agreement techniques such as global conceptual models or metadata frameworks have the disadvantage that they provide only a restricted set of alignment relationships (sub-class, sub-property). Furthermore, they can hardly be combined with instance level interoperability techniques.

If we also consider the technical implementation effort that arises for each of the previously presented techniques — small effort for model agreement, large effort for model reconciliation — it turns out that model agreement or standardisation, both on the schema and the instance level, should be the prime choice. However, this is often not possible, because information systems are already in place or institutions do not have sufficient incentive to follow a certain standard. In such scenarios, the more labour-intensive but equally effective approach of metadata mapping, i.e., schema mapping in combination with instance transformation, is an appropriate way of achieving interoperability.

2.6 Summary

As we can clearly see from the discussions in this chapter, metadata interoperability affects all technical levels of metadata: the M2 level of schema definition languages, the M1 level of metadata schemes, and the M0 level of metadata instances. To achieve metadata interoperability, several types of structural and semantic heterogeneities must be resolved on each level. We distinguish between three categories of interoperability techniques: agreement on a certain model, agreement on a certain meta-model, and model reconciliation.

From our analysis, we can observe that model agreement techniques, such as hybrid metadata systems and standardised metadata schemes, as well as model reconciliation techniques cover large parts of the possible heterogeneities if they are combined with appropriate techniques on the instance level: metadata standards should be applied in combination with value encoding schemes, controlled vocabularies, or authority records, and metadata mapping should also consider M0-level heterogeneities and support instance transformation.

Global conceptual models and metadata frameworks provide only restricted means for relating source model elements to those of a global model and do not consider the instance level. Therefore, one outcome of our analysis is that these techniques are less powerful than metadata standardisation or mapping. Comparing standardisation and mapping, the clear disadvantage of mapping is its technical complexity. However, in open environments with no central standardisation authority, metadata mapping is the remaining, more complex but equally powerful technique.

The Web is such an open environment. It already exposes a multitude of autonomous, incompatible media repositories, and it is unlikely that there will ever exist a single agreed-upon metadata schema. Like [FHM05], we believe that in the future many institutions and organisations relying on a large number of diverse, interrelated media sources will make use of the Web architecture to access the available repositories. Since, also in the Web context, the metadata information objects exposed by these repositories are not compatible by default, novel mapping techniques are required that build upon the Web infrastructure and are powerful enough to deal with the heterogeneities outlined in this chapter.

Chapter 3

Metadata Mapping

In the previous chapter, we have defined metadata as information objects that describe resources. Example resources are digital images, videos, or other multimedia content objects, but also non-digital objects such as artefacts in museums or books in libraries. The nature of metadata information objects, i.e., their structure and the meaning of their elements (e.g., author, description, etc.), largely depends on the metadata creator's design choices as well as on the characteristics of the repository they reside in (e.g., relational database, flat files). For establishing uniform access to multiple autonomous metadata repositories, one needs to deal with the distinct characteristics of the information objects stored therein. Hence, one must establish metadata interoperability and find an appropriate technique for dealing with the various kinds of heterogeneities.

In this chapter, we analyse metadata mapping, a technique that allows domain experts to reconcile the various heterogeneities that impede metadata information objects from being interoperable. First, we precisely describe our perception of metadata mapping and analyse in detail how mapping solutions can support domain experts in mapping metadata (see Section 3.1). Then, in Section 3.2, we set up an evaluation framework by conducting a requirements analysis based on state-of-the-art mapping literature. In Section 3.3, we present and categorise a representative selection of mapping solutions, which is then evaluated against this framework in Section 3.4.

3.1 What is Metadata Mapping?

From all the previously mentioned interoperability techniques, those classified as model reconciliation techniques are the most complex ones. Since heterogeneities can occur on all three levels, it is necessary to specify mappings for each level: language mappings for the M2 level, schema mappings for the M1 level, and instance transformations for the M0 level. Before a mapping on a certain level can be defined, the heterogeneities on the level above must be reconciled, i.e., one must deal with M2 language differences before specifying M1 schema mappings.

Previously, in Section 2.2.2, we have already outlined the characteristics of a representative set of schema definition languages and pointed out the divergence in their abstract and concrete syntax. Because of that, metadata mapping does not deal with heterogeneities on the language level (M2) but assumes that all metadata information objects are expressed in the same schema definition language. This can be achieved by transforming metadata information objects from one language representation into another, which can also entail a loss of semantics.

Here we further elaborate on metadata mapping, a technique that subsumes schema mapping and instance transformation as described in Section 2.4.3. Before discussing its technical details, we define the scope of this technique as follows:


Definition 3.1 Given two metadata schemes, both settled in the same domain of discourse and expressed in the same schema definition language, we define metadata mapping as a specification that relates their model elements in a way that their schematic structures and semantic interpretations are respected on the metadata model and on the metadata instance level.

From a model perspective, a metadata mapping defines structural and semantic relationships between model elements on the schema level and between content values on the instance level. To represent such relationships, any mapping mechanism requires a set of mapping elements with a well-defined syntax and semantics. From this perspective, we can regard not only metadata schemes but also a mapping between metadata schemes as being a model. Bernstein et al. [BHP00] as well as Madhavan et al. [MBDH02] have proposed such a perspective. Furthermore, we can denote the totality of all mapping relationships contained in a mapping model as a mapping specification.

Technical Details

From a technical perspective, a metadata mapping can formally be defined as follows:

Definition 3.2 A metadata mapping is defined between a source schema $S^s \in \mathcal{S}$ and a target schema $S^t \in \mathcal{S}$, each consisting of a set of schema elements, $e^s \in S^s$ and $e^t \in S^t$ respectively, which are optionally connected by some structure. A mapping $M \in \mathcal{M}$ is a directional relationship between a set of elements $e^s_i \in S^s$ and a set of elements $e^t_j \in S^t$, where each mapping relationship is represented as a mapping element $m \in M$. The semantics of each mapping relationship is described by a mapping expression $p \in \mathcal{P}$. The cardinality of a mapping element $m$ is determined by the number of incoming and outgoing relationships from and to the schema elements. To support heterogeneity reconciliation on the instance level, a mapping element carries an appropriate instance transformation function $f \in \mathcal{F}$.
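To make this abstract definition more tangible, the following worked instance shows one possible instantiation; the element names are taken from the illustrative TV-Anytime and Dublin Core samples used in this chapter, while the concrete formalisation of this particular example is ours:

$S^s = \{\mathit{GivenName}, \mathit{FamilyName}\}$, $\quad S^t = \{\mathit{Creator}\}$

$m_1 \in M$ relates $\{e^s_1, e^s_2\} = \{\mathit{GivenName}, \mathit{FamilyName}\}$ to $e^t_1 = \mathit{Creator}$ (cardinality $n{:}1$ with $n = 2$)

$p_1 = \mathit{include}$, i.e., $I(e^s_1) \subseteq I(e^t_1)$ and $I(e^s_2) \subseteq I(e^t_1)$

$f_1 = \mathit{concat} \in \mathcal{F}$, which concatenates the instance values of $\mathit{GivenName}$ and $\mathit{FamilyName}$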

Figure 3.1 illustrates the main elements of a metadata mapping specification. Typically, the cardinality of a single mapping element is either 1:1, 1:n, or n:1, meaning that an element from a source schema is related with one or many elements from the target schema and vice versa. In theory, m:n mappings would also be possible, but in practice they rarely occur because one can model that kind of element correspondence using multiple 1:n or n:1 relationships. A mapping expression $p$ defines the semantics of a mapping element, i.e., it describes how the interpretations of the model elements, denoted as $I(e^s_i)$ and $I(e^t_j)$, are related. In its simplest form, such an expression could be unknown, stating that two elements are related, without giving any evidence how. More complex examples are mapping expressions that indicate the confidence of a mapping relationship according to a specified metric, as described in [MIKS00]. One can distinguish between the following types of mapping expressions (e.g., [SPD92]):

• exclude ($I(e^s_i) \cap I(e^t_j) = \emptyset$): the interpretations of two schema elements have distinct meanings. In the example presented in Section 2.2, the interpretations of the elements rights in the Dublin Core and birthday in the proprietary schema exclude each other.

• equivalent ($I(e^s_i) \equiv I(e^t_j)$): the interpretations of two, possibly lexically different schema elements are equivalent. The element author in the proprietary schema and the element creator in the Dublin Core schema are examples for such a relationship.

• include ($I(e^s_i) \subseteq I(e^t_j) \vee I(e^t_j) \subseteq I(e^s_i)$): the interpretation of one schema element contains the interpretation of another element. In the context of our example, the interpretation of the Dublin Core element creator includes the interpretations of the TV-Anytime elements GivenName and FamilyName because these elements describe a person in the role of an author.

Figure 3.1: The main elements of a metadata mapping specification

• overlap ($I(e^s_i) \cap I(e^t_j) \neq \emptyset \wedge I(e^s_i) \not\subseteq I(e^t_j) \wedge I(e^t_j) \not\subseteq I(e^s_i)$): the interpretations of two schema elements overlap but do not include each other. The elements description, synopsis, and info are examples for elements with overlapping interpretations. A description element usually provides similar information as a synopsis or info element, but in a more comprehensive form.

Instance transformation functions are the mechanism to cope with the structural and semantic heterogeneities on the instance level. If, for instance, two models (e.g., the TV-Anytime and the DC illustrative samples) are incompatible due to a multilateral correspondences conflict (e.g., GivenName and FamilyName in the source model and Creator in the target model), this can be resolved by relating the elements through a mapping relationship and assigning an instance transformation function concat, which concatenates the data values of the respective fields and returns the appropriate result. Figure 3.2 illustrates the role of mapping expressions and instance transformation functions in metadata mappings; a possible query-based rendering of this transformation is sketched after the figure.

Figure 3.2: Achieving metadata interoperability through instance transformation
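To anticipate how such an instance transformation could later be executed, the following SPARQL query sketches the concat function as a CONSTRUCT query. It assumes SPARQL 1.1 string functions; the prefix tva: and the property names are hypothetical placeholders for the source schema, and the query is an illustration rather than the mapping formalism defined in this thesis:

PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX tva: <http://example.org/tv-anytime#>    # hypothetical namespace

# Construct Dublin Core creator statements by concatenating the
# GivenName and FamilyName instance values of the source schema.
CONSTRUCT {
  ?item dc:creator ?creator .
}
WHERE {
  ?item tva:GivenName  ?given ;
        tva:FamilyName ?family .
  BIND (CONCAT(?given, " ", ?family) AS ?creator)
}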

Mapping Phases

Besides being a mechanism for capturing the semantic and structural relationships between the elements of distinct models, metadata mapping is also a process consisting of a cyclic sequence of phases. As illustrated in Figure 3.3, we can identify four such phases: (i) mapping discovery, (ii) mapping representation, (iii) mapping execution, and (iv) mapping maintenance.

Figure 3.3: The four major phases in the metadata mapping cycle

The reason for the cyclic arrangement of the mapping phases is the fact that mapping maintenance is also the key for discovering new mappings from existing ones. If, for instance, there is a mapping between schema A and schema B and another mapping between schema B and schema C, and all this information is available in a registry, the system could derive an additional mapping between schemas A and C, based on their transitive relationship.

Mapping discovery is concerned with finding semantic and structural relationships between the elements of two schemes and reconciling the heterogeneities on both the schema and the instance level. Deep domain knowledge is required to understand the semantics of the elements of the source and target schemes in order to relate their elements on the schema and the instance level. Rahm and Bernstein [RB01] as well as Kalfoglou and Schorlemmer [KS03] describe a variety of mapping discovery techniques that operate on both levels.

Mapping representation is the second phase of the mapping process and denotes the formal declaration of the mapping relationships between two metadata schemes. Noy and Musen [Noy04] identify three types of formalisms for representing mappings: (i) representing them as instances of a defined mapping model, (ii) defining bridging axioms or rules to represent transformations, and (iii) using views to define mappings between a source and a target schema.

Mapping execution is the phase for executing mapping specifications at run-time. Mappings can be used for various interoperability-dependent tasks such as metadata transformation, answering queries over a set of metadata sources, or creating software stubs that encapsulate the mappings and provide transparent access to the underlying metadata source. Halevy [Hal01] gives an overview of view-based mapping approaches.

Mapping maintenance is the last step in an iteration of the metadata mapping phases. Usually, a registry provides the necessary maintenance functionality and keeps track of available metadata schemes and mappings between them. This information allows systems to deal with issues like versioning (e.g., [NM04]), which is required whenever there are changes in the source or target schema of a certain mapping.

3.2 Requirements Framework

In the subsequent presentation, we focus on a discussion of requirements for metadata mapping solutions. This discussion provides the understanding needed to form the basis for the evaluation framework we use for our mapping tool analysis later in this chapter.

We have organised the requirements into general requirements for mapping tools (Section 3.2.1) and requirements for each of the previously mentioned phases in the mapping process: mapping discovery (Section 3.2.2), mapping representation (Section 3.2.3), mapping execution (Section 3.2.4), and mapping maintenance (Section 3.2.5). Finally, in Section 3.2.6, we summarise the requirements in terms of an evaluation framework for comparing existing mapping solutions.

3.2.1 General Requirements

Turning metadata mapping into practice requires the implementation of a mapping component, which is usually part of a larger metadata integration architecture. Like any other piece of software, an integration architecture has some general requirements. First, it should fulfil some architectural properties common to any metadata integration architecture. Second, it should be able to lift and normalise metadata represented in various schema definition languages to a common level; otherwise, if language mismatches are not resolved, metadata mapping is hardly possible. Third, and this is essential for creating mappings, it should provide a graphical user interface that supports domain experts in creating mappings.

Architecture Design

The ultimate goal of any metadata interoperability technique is to achieve uniform access to metadata and digital media objects stored in multiple autonomous media repositories. Like other interoperability techniques, metadata mapping is just a prerequisite for achieving this goal. Therefore, the ultimate goal and the general requirement for any metadata integration architecture is uniform accessibility of metadata via a single interface. This requirement can be fulfilled by providing a query interface for a certain query language or by exposing a well-defined Application Programming Interface (API) for accessing the metadata.

Another basic requirement for an integration architecture is modularity. For each data source involved in an integration scenario, one has to implement an adapter. Also the mappings, and particularly the processing of the mappings, must be embedded into a software component. An additional data source joining an integration scenario should not affect the adapters of other data sources. Also the (re-)specification of mappings should not affect the implementation of software components; this should rather be a configuration task.

Domenig and Dittrich [DD99] give an overview of possible integration architectures. Well-known architectures for a modular integration approach are Mediated Query Systems, which have been proposed by Wiederhold [Wie92]. They consist of two types of software modules: mediators and wrappers. Each data source is encapsulated by a wrapper, which is an application specifically designed for each kind of data source. Its task is to accept queries from a mediator and to answer them on the basis of the underlying data source’s technology. Mediators are modules that accept user queries formulated over a user-selected target (mediation) schema and reformulate them into sub-queries according to previously defined metadata mappings. Then they disperse them to the local sources where they are executed, collect and combine the results, and present them to the user in a certain format.

Other architectural designs, especially Peer-to-Peer data management systems (e.g., Piazza [HIMT03], P-Grid [Abe01], Edutella [NWQ+02], Hyperion [RGKG+05]), abstain from central mediators and mediation schemes and define point-to-point mappings directly between the models of the involved data sources (e.g., GridVine [ACMHvP04]). However, as Halevy et al. observe in [HIMT03], from a formal mapping perspective, there is little difference between these two kinds of mappings.
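As a minimal sketch of the mediator wrapper idea in a SPARQL setting, assuming SPARQL 1.1 federated queries and hypothetical wrapper endpoint URLs, a mediator could disperse a query formulated over the mediation schema (here Dublin Core) to two wrapper endpoints and combine the results:

PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Mediator-side query over the mediation schema; each SERVICE clause
# delegates a sub-query to one wrapper endpoint and the UNION merges
# the results into a single answer set.
SELECT ?item ?creator
WHERE {
  { SERVICE <http://wrapper-a.example.org/sparql> { ?item dc:creator ?creator . } }
  UNION
  { SERVICE <http://wrapper-b.example.org/sparql> { ?item dc:creator ?creator . } }
}

In a realistic setting, the sub-queries sent to the wrappers would first be reformulated according to the mapping specifications, since the wrapped sources do not necessarily expose the mediation schema directly.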

Lifting and Normalisation

As already mentioned in Section 3.1, metadata mapping postulates that all metadata information objects are expressed in the same schema definition language. This means that all metadata models are internally represented as instances of the same metadata meta-model. Therefore, lifting and normalisation of metadata expressed in distinct languages to a common representation is another general requirement. A practical example from typical data integration scenarios is the mapping of relational database schemes to XML Schema. The LIFT tool, which is part of the MAFRA ontology mapping framework [MMSV02], is an example of a component that fulfils this requirement. It provides means to lift DTDs, XML Schemes, and relational databases to a common structural ontology level.

Graphical User Interface

Metadata mapping cannot be fully automated and will always depend on interactions with domain experts. Usually it is not the technicians who define semantic relationships between schema elements, but expert users such as librarians, curators, or knowledge workers. Technicians are concerned with the technical implementation of mappings. Based on the semantic relationships, they reconcile structural heterogeneities among schemes and concentrate on the implementation of instance transformation functionality. For both aspects, the domain expert view and the technician view, a mapping solution should provide a graphical user interface (GUI) to support the domain experts as well as the technicians in their mapping tasks. Especially the domain expert view must follow an intuitive design and guide the users’ attention to the relevant places, especially when large metadata schemes need to be mapped.

Robertson et al. [RCC05] present several advanced visualisation methods for larger schemes. Following and highlighting existing links when a model element is selected is one of the proposed techniques; auto-scrolling during typing and an incremental search mechanism are other examples. Another useful design strategy is to separate between schema and data view, as it is implemented in Clio [MHH+01, HHH+05]. The schema view represents the main perspective for mapping definition. To get further information on a schema element’s semantics, the user can switch to the data view and retrieve sample data for that element.

The majority of mapping solutions are implemented as stand-alone desktop applications. However, recently they have also begun to emerge on the Web, Yahoo Pipes [Yah07] being the prime example for this category. Via a simple, intuitive Web interface, users can aggregate XML data and define mappings between model elements. Provided that a Web-based mapping solution is simple enough, this category of mapping solutions can attract a large number of users, lead to collaborative mapping efforts, and give insights on the different semantic perceptions of model elements.

3.2.2 Mapping Discovery Requirements

The first metadata mapping step is to determine the relationships between a source and a target schema. In the literature, mapping discovery is also denoted as matching, mapping, and, especially in combination with ontologies, alignment. Although finding the matching elements is an intellectual task, which is mainly carried out by humans, there are automated approaches to support users in mapping discovery. When many users are involved in determining the right mappings, building consensus on the defined mappings is essential.

Matching / Alignment Support

The larger metadata schemes are, the more difficult it is to find the set of potential mappings between schema elements. Fully automatic schema matching is considered to be an AI-complete problem, that is, as hard as reproducing human intelligence [BMPQ04]. Although this problem is not yet solvable, there are many

(semi-)automatic techniques that can support the user in matching tasks. Hence, a mapping solution should provide some mechanism for the automated resolution of semantic correspondences between the elements of heterogeneous metadata schemes; we call this requirement alignment support1. The term alignment frequently occurs in the context of ontology alignment and is used interchangeably with the term ontology mapping, for which Kalfoglou and Schorlemmer [KS03] provide a survey of automatic and semi-automatic techniques. Rahm and Bernstein [RB01] provide a survey of approaches in the database domain, Shvaiko and Euzenat [SE05] analyse both schema and ontology mappings, and Doan and Halevy [DH05] analyse mapping discovery solutions from a data integration perspective. Most of the approaches cited in these surveys are based on heuristic algorithms comparing the lexical and structural features of models (e.g., PROMPT [NM03]), or employ machine learning techniques to find mappings (e.g., GLUE [DMDH02]). While some approaches operate only on the schema level (e.g., Cupid [MBR01]) or only on the instance level (e.g., SemInt [LC00]), recent developments (e.g., COMA++ [ADMR05]) include both levels in order to automatically discover mappings between models.

Consensus Building

When metadata schemes are developed independently of each other, their structural and semantic properties are likely to be different, which leads to the heterogeneities we have discussed earlier in Section 2.3. This is also the case when two schemes are mapped. Like their semantic interpretation, the mappings between two metadata schemes are always bound to a certain context. A mapping created by person A is not necessarily true for person B. Therefore, consensus building is a principal requirement for any integration scenario. The prerequisite for consensus building is a precise definition and documentation of the semantics of the schemes to be mapped. If only one person is involved in establishing the mappings, no tool support is required for consensus building. Larger scenarios involving several persons and extensive metadata schemes to be mapped will benefit from tool support for consensus building.

Zhdanova and Shvaiko [ZS06] propose a public, community-driven approach for mapping discovery where end users, knowledge engineers, and developer communities take part in the process of establishing mappings. The resulting mappings are handled as subjective alignments, hence as mappings that are customised to a certain user, a community, or application requirements. From already existing mappings, and the information about users, communities, groups, and social networks, the system can determine valuable information for mapping discovery. For instance, if several users have mapped their user-defined schemes to the same target schema, the system can leverage past experience and propose mappings for new mapping tasks.

Another approach that supports consensus building is the ESP game, proposed by von Ahn and Dabbish [vAD04]. Although being developed for another purpose, which is the labelling of online images, it demonstrates how a game-like approach can motivate people to build consensus on the semantics of online resources. If we consider metadata schemes and mappings as being online resources, such game-like approaches could open the door for novel consensus building techniques.

3.2.3 Mapping Representation Requirements

After metadata mappings have been discovered, they must be represented in a machine-readable and interpretable way. A semantically well-defined formalism ensures that mappings can actually be processed in the subsequent mapping phases. Since the decision whether a metadata mapping is semantically correct depends on the context, the formalism must provide means for context representation. Finally, the properties of a metadata mapping largely depend on the schema definition language used for describing metadata schemes. Therefore, a mapping formalism should be flexible in its language bindings.

1 On the Web, two sites provide up-to-date information on the research topic of ontology alignment: the Ontology Alignment Source (http://www.atl.lmco.com/projects/ontology/) and OntologyMatching.org (http://www.ontologymatching.org/)

Mapping Formalism

For reconciling heterogeneities by means of metadata mapping, one needs to formally declare mapping relationships between the elements of two metadata schemes. A set of such relationships is denoted as mapping specification. A mapping formalism builds the basis for mapping specifications. It provides a machine-readable and interpretable language for creating mappings and, as it is the case with schema definition languages, a concrete and an abstract syntax. The concrete syntax (e.g., a serialisation in XML or the graphical illustration of a mapping element) allows the serialisation, registration, and exchange of mapping specifications and also enables human readability. The abstract syntax, i.e., the meta-model of mapping specifications, represents a semantically well-defined corpus for mapping specifications and ensures the correct interpretation of mappings across machines and system boundaries.

The strength of a mapping formalism denotes its ability to express those relationships that are required to reconcile the various kinds of heterogeneities mentioned in Section 2.3.2. We have distinguished between two levels (model and instance level) and two main heterogeneity categories: structural and semantic heterogeneities. In combination with instance transformation, metadata mapping can reconcile a broad range of heterogeneities: naming, identification, and constraints conflicts as well as abstraction level incompatibilities, multilateral correspondences, domain coverage conflicts, terminological mismatches, and meta-level discrepancies can be resolved through mappings on the schema level. Instance transformation, if it is an integral part of a mapping formalism, can resolve the remaining semantic heterogeneity conflicts on the instance level.

In the literature we can find many approaches that support mappings among metadata schemes. Unfortunately, most of them do not consider the whole heterogeneity spectrum but focus mainly on schema level mappings and disregard the instance level. Observer [MIKS00], for instance, only allows the specification of synonym, homonym, overlap, and disjoint relationships between entities of metadata schemes. Xiao and Cruz [XC06] have defined a mapping language for P2P systems, which provides one-to-one mapping relationships such as equivalent, broader, narrower, union, and intersection between schemes. Even less expressive is GridVine [ACMHvP04], which relies on the very restricted set of built-in OWL mapping primitives (e.g., owl:equivalentProperty). Although these kinds of mappings suffice for human interpretation, the question remains how machines should interpret them in order to provide uniform access to the sources. They need exact information about relationships between concepts and precise processing instructions for dealing with the instances or data originating from heterogeneous sources. The MAFRA ontology mapping framework [MMSV02] is an example for a system that covers the whole heterogeneity spectrum through the definition of semantic bridges. Piazza [HIMT03] is a representative for the family of integration systems that uses queries (views) as representation mechanism for mappings. This approach is well known from the domain of relational databases and is, depending on the expressiveness of the query language, also suitable for other technologies (e.g., XML, XQuery).

Mapping Context

Like metadata schema definitions, metadata mappings are formal declarative specifications that are subject to interpretation. More specifically, metadata mappings define how heterogeneity conflicts, if they are detected, should be resolved. The problem is that even within a single metadata integration scenario, data sources as well as the mappings created between their metadata schemes and instances may embody different assumptions on how information should be interpreted. If, for instance, a schema A is mapped to two schemes B and C, which differ in their semantic domain, an element of A could be interpreted differently in relation with elements of B than it is interpreted in relation with elements of C. If mappings are created across integration scenarios, the importance of context grows. Therefore, a mapping formalism should provide support for specifying the mapping context, i.e., the setting in which the interpretation of a metadata mapping is semantically correct.

COIN [GBMS99] has been one of the early information integration systems that have explicitly incorporated context into mapping definitions. With the context interchange framework, it provides a formal, logical specification for modelling metadata models, axioms for identifying correspondences between model elements, and context axioms that permit the specification of named contexts and therefore the definition of alternative interpretations of information objects. Based on this work, Wache [Wac03] has provided an integration formalism that supports not only the representation of context but also the transformation of information objects between contexts to enable cross-context reconciliation of semantic heterogeneities.
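One conceivable, RDF-based way to capture such a mapping context is to scope mapping statements to named graphs, one graph per context. The following SPARQL sketch assumes a hypothetical map:mapsTo property and hypothetical graph URIs and is only meant to illustrate the idea, not a prescribed representation:

PREFIX map: <http://example.org/mapping#>    # hypothetical mapping vocabulary

# Retrieve only those mappings of schema A elements that are declared
# valid in the context of the integration scenario with schema B.
SELECT ?sourceElement ?targetElement
WHERE {
  GRAPH <http://example.org/context/A-to-B> {
    ?sourceElement map:mapsTo ?targetElement .
  }
}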

Language Binding

As described in Section 2.2.2, metadata information objects are instances of metadata models that can be expressed in various schema definition languages. A metadata scheme for describing books, for instance, can be, depending on an application’s requirements, expressed as a Java or an XML Schema model. Therefore, if mappings between schemes are established, the resulting mapping is always bound to a certain language. Expressing and representing mappings requires a formalism that can either be language independent or bound to a certain schema definition language. The advantage of a language-independent or generic approach is its applicability to other schema definition languages. The main disadvantage is the increased complexity and additional development effort: if a mapping tool follows a language-independent approach, i.e., if it is open to any schema definition language, a generic metadata meta-meta model as well as language-specific extensions must be defined. However, we believe that flexibility in the language binding is necessary to support the variety of existing schema definition languages and for being open to future developments.

With the Ontology Definition Metamodel Specification (ODM) [OMG06b], the Object Management Group (OMG) has released a set of metadata meta-models that reflect the abstract syntax of RDF, OWL, Common Logic (CL), and Topic Maps. In addition, mappings between these models are provided and expressed in the MOF QVT Relations Language [OMG05]. The goal of the ODM approach is to support interoperability between MOF-based modelling tools independent of the schema definition language they support.

MAFRA [MMSV02], a mapping framework that enables the transformation of instances of source ontologies into instances of target ontologies, is an example for a language-dependent approach that expresses mappings as instances of a meta-ontology, which in this case is expressed in DAML+OIL [W3C01]. We can find language-dependent bridging axioms in ontology languages such as OWL, which provide language primitives to define semantic relationships (e.g., equivalent, sameAs) between ontology concepts. RuleML [The06] is an effort to develop a rule language that is independent of any schema definition language. Semex [CDH+05] is an example for a system that formulates mappings between schemes in terms of queries. In other words, the elements of one schema model are defined as a query over the elements of another schema model.

3.2.4 Mapping Execution Requirements

We have defined the mappings. Now what? With this question, Noy [Noy04] points out that the definition and representation of mappings is the necessary precondition but not the goal in itself. The ultimate goal of metadata mapping is to achieve uniform access to metadata in multiple autonomous information systems. In the mapping execution phase, mappings are used to reformulate queries over one schema into queries over another schema, to calculate query plans, to optimise queries, and to generate stubs or wrappers to data sources.

Query Reformulation

Users of an integrated and interoperable system environment should have the possibility to formulate queries over a user-selected target metadata schema and receive results from a set of integrated data sources, each potentially employing a different source metadata schema. Hence, the integration system must convert the queries formulated over the user-selected schema into queries over the data sources’ metadata schemes. Metadata mappings are the technical specifications that serve as input for this process, which is commonly referred to as query reformulation.

If we regard metadata mappings as definitions that describe how to construct the elements of a target schema from the elements in a source schema, they fulfil the same functionality as views. Previously, we have already mentioned that using views is a common formalism for representing mappings in the context of relational databases. Hence, if mapping specifications are not available in terms of views per se, as it is the case in systems such as Piazza [HIMT03], they can be transformed into such a representation. In principle, there are two ways of representing mappings using views: (i) the data sources, i.e., their schema elements, are described as queries (views) over the user-selected schema — this is referred to as Local as View (LaV) — or (ii) the user-selected schema is described as a set of views over the data sources — this is known as Global as View (GaV). In the first case, query reformulation means rewriting the queries similar to rewriting a query using a view [Hal01]. In the second case, reformulation works analogously to view unfolding in traditional relational database systems.

Rajaraman et al. [RSU95] follow an approach similar to view-based query reformulation and use a global query language to define query templates consisting of a head and a body clause. The head contains a predicate denoting the view, arguments for the predicate, and binding patterns that indicate which arguments of the predicate are expected to be bound and which are free. The body contains a program that produces the query result.
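As a hedged illustration of the GaV approach in a SPARQL environment (the prefix prop: and the property prop:author are hypothetical placeholders for a proprietary source schema), the mediation-schema element dc:creator can be described as a view over the source schema:

PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prop: <http://example.org/proprietary#>   # hypothetical source schema

# GaV view: the mediation-schema element dc:creator is constructed
# from the source-schema element prop:author.
CONSTRUCT { ?item dc:creator ?name . }
WHERE     { ?item prop:author ?name . }

A user query such as SELECT ?item WHERE { ?item dc:creator "Jane Doe" } could then be reformulated by view unfolding into the corresponding query over prop:author on the source.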

Query Plan

For efficiently accessing integrated data sources, the system must calculate a query plan and optimise query reformulation. Usually, the main goal of a query plan is to reduce the execution time of queries on the data sources. There are two main tuning possibilities to achieve that: first, by avoiding redundant queries, i.e., queries that return a subset of results of a previously executed query. This is also known as query containment [MLF00] and is a metric that can be calculated prior to query execution. Second, query time can be reduced by analysing the capabilities [PGH96] of the involved data sources. The capabilities of a data source depend on its physical properties such as network connection speed and average response time, but also on logical properties, such as the availability of data in a data source2. Another optimisation goal that can be achieved by a query plan is the minimisation of response time. This reduces the time it takes until the first query response is returned from a data source, which is important especially in distributed environments, such as P2P systems.

Calculating query plans and optimisation techniques are well studied in the domain of traditional database systems. Jarke and Koch [JK84] provide a survey of available techniques. Tatarinov and Halevy [TH04] propose optimisation techniques for Peer-to-Peer Data Management Systems and describe an algorithm for calculating XQuery query containment and optimisation algorithms for pruning of redundant queries and for minimising the required query reformulations.

Integration Component Generation

In order to enable a data source to be integrated with other sources, all the previously mentioned requirements must be packed into data source specific integration components. If a mapping solution follows a federated architecture, such components are typically mediators or wrappers.

2 Even if a certain entity is represented in a data source’s metadata schema, this does not guarantee that the instance data are available.

Semi-automatic mechanisms, which support users in setting up these components, can avoid a purely manual implementation for each data source. Although it will always be necessary to perform certain source-specific adaptations, at least some generic aspects (e.g., query reformulation) can be realised automatically. Such a feature could, for instance, be embedded in a mapping tool and allow the generation of source code skeletons based on mapping specifications.

TSIMMIS [CGMH+94] is one of the early data integration systems that automatically generates mediator as well as wrapper components based on high-level configurations. Another approach, which is also relevant for capability-based optimisation, is to transform the mapping specifications into templates, which represent the possible queries a mediator can ask. Wrapper Generators [UWGM02] can be applied to create a table holding the various query patterns contained in the templates; since it is not realistic to create a template for each possible form of a query, the wrapper must include a mechanism which works with query containment.

3.2.5 Mapping Maintenance Requirements

Mapping maintenance is necessary to keep track of available metadata schemes and the mappings between them. This enables domain experts to reuse already existing schemes and mappings, which in turn contributes to greater metadata interoperability. Furthermore, as discussed in Section 3.1, mapping maintenance provides valuable input for the subsequent mapping discovery phase.

Metadata registries play a central role in the maintenance phase. They provide means for registering mapping specifications together with the metadata schemes involved in an integration scenario and can offer advanced functionality based on the pool of available mappings and schemes: they can verify mapping specifications by detecting potential conflicts with other mappings, propose existing mappings for specific or similar metadata schemes, or infer potential, not yet explicitly expressed mapping relationships.

Mapping Verification

Although the truth of a mapping always depends on its interpretation in a certain context (e.g., the one of the mapping creator), it is possible to detect potential conflicts with other already existing metadata mappings. Although it is difficult to resolve such conflicts automatically, it is at least possible to provide some kind of notification mechanism. In general, mapping verification can improve the quality of mappings and could also contribute to building consensus on mappings in the context of a certain application domain. In Clio [MHH+01, HHH+05], mappings are verified by proposing alternative mappings to the user. Users see example instance data and how they would appear under the current mapping. This illustrates a given mapping and the perhaps subtle differences from other mappings.

Mapping Reusability

Mappings can potentially be reused for future integration tasks. A metadata registry can analyse the relationships between existing schemes and mappings and use heuristics to identify similarities. On that basis, the system can, for instance, propose possible mappings to domain experts. Especially with a growing number of schemes and mappings, registries play an important role for mapping reusability.

Mapping Inference

Based on existing mapping relationships, a mapping registry could also infer not yet explicitly available mappings. Suppose we have three metadata schemes A, B, and C. If there is a mapping between the elements of A and B, and B and C, it is possible to exploit these transitive relationships and explicitly derive a proposal for a mapping relationship from A to C. In another setting, if for instance some elements of B are subclasses or subproperties of elements of A, and there exist mapping relationships between the elements of A and C, the system could automatically or semi-automatically infer mapping relationships between the elements of B and C.

GridVine [ACMHvP04] and Piazza [HIMT03], both belonging to the domain of Peer-to-Peer data management infrastructures, are systems that perform that kind of transitive mapping inference: each node in the system has mappings to a small set of other nodes, and when a query is posed over a node, it transitively follows the nodes that are connected by semantic mappings (chaining mappings).
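A minimal sketch of such transitive inference, assuming the registry stores mapping relationships as RDF statements with a hypothetical map:mapsTo property, could be expressed as a SPARQL CONSTRUCT query that derives candidate mappings from A to C out of the registered mappings from A to B and from B to C:

PREFIX map: <http://example.org/mapping#>    # hypothetical registry vocabulary

# Derive candidate mappings A -> C from the registered mappings
# A -> B and B -> C; the result is a proposal that still needs to be
# verified by a domain expert.
CONSTRUCT {
  ?elementA map:mapsTo ?elementC .
}
WHERE {
  ?elementA map:mapsTo ?elementB .
  ?elementB map:mapsTo ?elementC .
  FILTER (?elementA != ?elementC)
}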

3.2.6 Requirements Summary

As a summary of the discussions in the previous sections, Table 3.1 outlines the requirements that are, in our conception, the essential ingredients for building mapping solutions. We do not consider them as obligatory features, which must be fulfilled by any mapping solution, but rather as complementary building blocks, some of which are more essential than others. Through the arrangement of the requirements along the four phases of the metadata mapping cycle, we want to emphasise the importance of considering metadata mapping as a process rather than a single step. Mapping discovery, for instance, is not enough to achieve the goal of uniform accessibility. It requires a formalism that can capture the various heterogeneity aspects and a mechanism for executing mapping specifications. Also, some kind of mapping registry gains great importance when the number of data sources and metadata schemes to be integrated grows.

Different from many other approaches in the field of metadata mapping, we discuss mapping discovery only superficially: we only distinguish between solutions that support users in discovering mapping relationships and those that do not. For further details on this topic we refer to the surveys of Rahm and Bernstein [RB01] as well as Kalfoglou and Schorlemmer [KS03]. The main focus of our analysis lies on the mapping process as a whole and especially on the strength and expressiveness of the underlying mapping formalism.

3.3 Mapping Solutions

Mapping solutions are the means to establish metadata interoperability in integration scenarios where mapping has been chosen as the appropriate technique. We can regard them as technical manifestations of the previously mentioned mapping phases (discovery, representation, execution, maintenance) — though it depends on the mapping solution how and to what extent it supports a certain phase.

There exists a multitude of mapping solutions with varying mapping capabilities, different stages of stability, and distinct underlying business models. In this section, we introduce a categorisation for a representative, though not complete, selection of mapping solutions. It includes tools from major software vendors that are active in the domain of data integration, such as BEA, IBM, Sybase, Microsoft, Cape Clear, Altova, and Data Direct, as well as frequently cited research projects, and novel Web-based solutions such as Yahoo Pipes. The focus of our evaluation is on solutions and tools; theoretical mapping approaches lacking at least a prototypical implementation are therefore not part of this evaluation. After an initial categorisation of mapping solutions in Section 3.3.1, we will describe each category of mapping solutions in detail in Sections 3.3.2 to 3.3.5 and conclude with preliminary observations in Section 3.3.6.

3.3.1 A Categorisation of Mapping Solutions

For a coarse-grained categorisation of mapping solutions, we investigate their architectural properties, which are a direct result of the application domain they were designed for. In industrial, large-scale environments, metadata mapping solutions are an integral part of Enterprise Information Integration (EII) and Enterprise Application Integration (EAI) suites. EII and EAI systems are usually extensive and heavyweight software suites, in which mapping is only a small subset of the supported features. Besides that, we have also

General

Uniform Accessibility: Provide access to a set of (distributed) metadata sources via a single access point.

Modularity: Adding additional sources and mappings without affecting/changing existing system components.

Lifting & Normalisation: Flexible means to convert metadata expressed in distinct schema definition languages to a common metadata meta-model.

Mapping GUI: A graphical user interface supporting domain experts and technicians in creating mappings.

Mapping Discovery

Schema Matching / Alignment Support: (Semi-)automatic support for determining mappings, either on the schema level, the instance level, or on both levels.

Consensus Building Features: Providing features that support users in building consensus on conflicting mappings.

Mapping Representation

Model-Level Structural Heterogeneity Reconciliation: The ability to represent and reconcile structural heterogeneities on the model level.

Model-Level Semantic Heterogeneity Reconciliation: The ability to represent and reconcile semantic heterogeneities on the model level.

Instance-Level Semantic Heterogeneity Reconciliation: The ability to represent and reconcile semantic heterogeneities on the instance level.

Context Representation: The ability to capture the context of interpretation of a metadata mapping.

Flexible Language Binding: A generic, language-independent mapping formalism that can easily be bound to a certain schema definition language.

Mapping Execution

Query Reformulation: Reformulate queries over a user-selected schema into queries over the target schema representation according to mapping definitions.

Query Plan / Optimisation: Algorithmic support for minimising the query execution / response time.

Integration Component Generation: (Semi-)automatic generation of data-source specific integration components (mediators, wrappers, adapters, etc.).

Mapping Maintenance

Mapping Verification: Detection of potential conflicts with other mappings.

Mapping Reusability: System support for reusing existing mapping definitions.

Mapping Inference: Infer not yet explicitly expressed mapping relationships from existing mapping specifications.

Table 3.1: Requirements framework for the evaluation of metadata mapping solutions

identified the category mapping tools, which embraces lightweight standalone tools designed for the sole purpose of mapping. The category other solutions contains all mapping tools that cannot be assigned to any of the previously mentioned categories, such as XML editors or modelling tools that may also provide mapping support for certain kinds of schema definition languages, or Web applications.

For each solution we describe what type of software it is (commercial, research prototype) and to which solution category it belongs (EII suite, EAI suite, mapping tool, other). Furthermore, we analyse for what kind of schema definition languages a solution provides mapping capabilities and which software platforms are supported. Additionally, we briefly sketch which mapping phases are covered by a certain mapping solution.

3.3.2 Enterprise Information Integration (EII) Suites

EII subsumes industrial solutions dealing with the problem of data integration. Generally, they aim at (i) identifying data sources, (ii) building virtual schemes, and (iii) reformulating queries over a virtual schema into queries over multiple data source specific schemes. EII systems provide real-time information by integrating heterogeneous data sources on demand without moving or replicating them [HAB+05]. In such solutions, mappings are generally represented as views and executed through a query rewriting mechanism. To what extent mapping discovery and maintenance is supported depends on the solution.

BEA Liquid Data 8.1

BEA Liquid Data for WebLogic [BEA07] is an Enterprise Information Integration suite that allows users to establish a real-time unified view over heterogeneous data sources such as relational databases, XML data, flat files (e.g., CSV files), and third party application data. It leverages XML standards throughout the mapping phases and also delivers query results structured in XML. Liquid Data is available for all major software platforms.

While not supporting the discovery phase of the mapping process, BEA Liquid Data offers a view-based approach for the mapping representation phase: all supported schema definition languages are lifted to the level of XML Schema. The mappings between source and target schemes are expressed in XQuery. For executing mappings, BEA provides the Liquid Data Server for deploying mapping specifications and a distributed query processor for translating queries according to the mappings. To a certain extent, BEA Liquid Data also supports the mapping maintenance phase: schemes as well as mappings between schemes are stored in the Liquid Data Repository, which enables the reuse of mappings.

Sybase Data Integration Suite — Avaki Studio / Server 7.0

The Sybase Data Integration Suite [Syb07], with its components Avaki Studio and Avaki Server 7.0, is a data federation solution which provides standardised access to distributed heterogeneous data through a single layer. It supports the integration of the same types of data sources as BEA Liquid Data, but follows a mapping approach based on the relational model. Both Avaki Studio and Server are available for all major software platforms.

Mappings are created using the Avaki Studio application — support during the mapping discovery phase is not provided. For mapping representation, Avaki defines a mapping model based on the relational model. One or more input sources, a single result element, and a set of operators which can be arranged sequentially to combine or transform data from one or more input sources are the main constituents of that mapping model. For the mapping execution phase, the mapping models are deployed as data services exposing SQL-DDL (view) schema definitions. All data services and mapping models are registered and maintained in a data catalogue.

3.3.3 Enterprise Application Integration (EAI) Suites

EAI systems deal with the problem of integrating applications through business processes. They are well known in the context of Service Oriented Architectures (SOA) where they connect and integrate loosely coupled, distributed software components, usually by using Web Services, in order to fulfil a certain business task (e.g., booking a flight). Since the applications involved in a business process expose metadata corresponding to various, incompatible schemes, also EAI systems require mapping techniques for resolving these discrepancies. Different from EII systems, which can query a set of heterogeneous sources via a virtual unified target schema, the focus of EAI systems is on the exchange of data residing at multiple sites. Hence, during the execution phase, mappings do not serve as input for query reformulation but for transforming data from a source into a target schema.

Microsoft BizTalk 2006

Microsoft BizTalk Mapper [Mic07] is part of the BizTalk Server Enterprise Application Integration Suite and allows users to create and edit mappings in order to translate or transform messages within business processes from one format into another. Since BizTalk is a pure EAI suite and XML has evolved as the de-facto standard for structuring messages within business processes, mapping is supported among XML schema definitions. Like the whole BizTalk suite, the mapper is available only for Microsoft Windows 2003 and XP platforms.

The mapping discovery phase is supported by the Microsoft BizTalk Mapper, meaning that expert users get technical assistance in determining mapping relationships. Schemes as well as mappings are represented in a BizTalk-specific mapping model and are transformed to XSL style-sheets during the mapping execution phase. For maintaining mappings, BizTalk relies on a simple WebDav repository where mappings can be published and reused in other integration tasks.

Cape Clear Studio and Server 7

The Cape Clear EAI suite [Cap07] comprises two main products: the Cape Clear Server and the Cape Clear Studio. The former provides the environment for deploying business processes and Web Service components. The latter is a design tool for creating Web Services and BPEL processes. Data Transformation Web Services are a special kind of service: they permit the integration of non-XML data sources that represent data as structured text (e.g., CSV, EDI, SWIFT). However, they must first be lifted to the level of XML Schema in order to be mappable with other message formats within a business process. Cape Clear Studio and Server are both Java applications — the Studio is an Eclipse Rich Client Application — and therefore run on all major software platforms.

Cape Clear Studio does not support mapping discovery. Mappings must be defined manually between XML Schema definitions using the Cape Clear Studio XSLT mapper. This implies that mappings are represented as XSL style-sheets. For the execution of mappings at run-time, a Data Transformation Service can be deployed on the Cape Clear Server and be incorporated into any business process. So far the mapping maintenance phase is not supported because there is no possibility to publish or share the created mappings.

IBM WebSphere Integration Developer 6.0.2

IBM’s contribution to the market of EAI suites is the WebSphere platform, an environment for deploying reusable business processes on a Service Oriented Architecture (SOA) foundation. The IBM WebSphere Integration Developer [IBM07], a tool for modelling business processes, also has mapping capabilities. Since business objects are the WebSphere internal representation of application data, the Integration Developer supports mappings among instances of that proprietary model. Additionally, it also supports the development of mediation services, which are services that intercept and modify messages passed between existing services within a business process. For this kind of service, the Integration Developer provides XML Schema mapping capabilities. Like the whole WebSphere EAI suite, the IBM WebSphere Integration Developer is available for all major software platforms.

Although the extent is minimal, we can categorise the IBM WebSphere Integration Developer as a solution that supports the mapping discovery phase: it can automatically create mapping relationships among attributes having the same lexical name. Mappings are represented either in a proprietary model (business object maps) when they are created between business objects, or as XSL style-sheets when XML Schema definitions are mapped. They can be deployed as software modules on a WebSphere Process Server or a WebSphere Enterprise Service Bus. Mapping maintenance is not supported.

3.3.4 Mapping Tools

Different from EAI and EII suites, where mapping solutions are only part of a broader application infrastructure, mapping tools are lightweight standalone systems created for the sole purpose of mapping. The market for this category of mapping solutions is still sparsely populated; Altova with its products MapForce [Alt07a] and SchemaAgent [Alt07b] turned out to be the only well-known commercial representative.

Altova MapForce and SchemaAgent 2008

At the time of this writing, Altova MapForce in combination with Altova SchemaAgent is the most powerful mapping solution on the market. MapForce supports mapping between any combination of SQL-DDL definitions in relational databases, XML Schema and DTD declarations, and flat file formats such as CSV or EDI. These tools are currently available only for Microsoft Windows platforms.

MapForce assists the user during the mapping discovery phase by automatically matching child elements of already mapped elements. In contrast to any other mapping solution mentioned so far, MapForce allows the definition of mappings among several kinds of schema definition languages (XML Schema, SQL-DDL, etc.); this implies that internally, MapForce relies on a generic representation for these kinds of schemes and also for the mappings between them. For the mapping execution phase, it provides the possibility to generate code from these proprietary mapping representations; a mapping specification can be compiled into XSLT code, XQuery, Java, C++, and C#. The mapping maintenance phase has been completely outsourced to Altova SchemaAgent, which is another standalone product that works in combination with Altova MapForce. Besides capabilities for registering schemes and mappings, it provides graphical means to view and analyse dependencies between them.

COMA++

The COMA++ [ADMR05] research prototype, an extension of COMA [DR02], is a generic schema mapping tool. It supports the user in the mapping discovery phase by matching schemes expressed in SQL-DDL, XML Schema, XML Data Record (XDR), or OWL. The tool is written in Java and is therefore platform-independent.

The focus of COMA++ is on the mapping discovery phase; it allows the combination of a variety of matching algorithms in order to find appropriate mappings between schemes. Internally, the schemes are uniformly represented as directed graphs, i.e., also the mappings are represented in a proprietary format. Currently, COMA++ does not support the mapping execution phase, neither for query rewriting nor for transforming models from one schema to another. For the mapping maintenance phase, COMA++ provides a repository component which centrally stores schemes, matching results, and mappings between schemes. An outstanding feature in this phase is the ability to derive new mappings from previously determined matching results.

Clio

Clio [MHH+01, HHH+05] is an IBM research prototype for creating and executing mappings among schemes expressed in SQL-DDL or XML Schema. It is implemented in Java and therefore platform independent.

For the mapping discovery phase, Clio relies on a semi-automatic tableaux-based algorithm which calculates all the possibilities in which schema elements relate to each other and prompts the user to select the semantically correct relationship. Internally, mappings are represented as an abstract query graph which can be serialised into specific languages such as XQuery, XSLT, and SQL/XML. The execution of queries in the mapping execution phase is left to the user. Clio does not provide any mapping maintenance support.

3.3.5 Other Solutions

Mapping support may also be a feature of solutions that do not belong to one of the above mentioned categories. In the following, we briefly discuss tools and applications offering mapping capabilities.

Stylusstudio 2007 / DataDirect XML Converters

Stylusstudio [Dat07] is an XML Integrated Development Environment (IDE) that supports mappings among schemes expressed in SQL-DDL, XML instance documents, XML schemes, DTDs, and EDI documents. In combination with DataDirect XML Converters, it allows the conversion of any legacy data format into XML. At the moment, Stylusstudio is available for Windows platforms only.

The mapping discovery phase is not supported by Stylusstudio. For representing mappings, Stylusstudio relies on standardised XML technologies and compiles mapping relationships drawn on the user interface directly into XSLT or XQuery; hence it does not define a proprietary mapping model. Executing the XQuery code on a certain data source or transforming XML documents using the resulting XSL style-sheet is left to the user. With Stylusstudio it is currently not possible to deploy mapping services covering the mapping execution phase. The mapping maintenance phase is also unsupported.

TopBraid Composer

TopBraid Composer [Top07], which is the commercial extension of the Protégé OWL editor3, is a Semantic Web ontology development platform which also supports the creation of mappings between ontologies. The tool has been implemented in Java, using the Eclipse Rich Client Platform, and therefore runs on any Java-enabled software platform.

In version 2.3.0, TopBraid does not support the mapping discovery and maintenance phases. However, it is possible to create mappings between ontologies, which can be compiled either into SPARQL CONSTRUCT statements or into SWRL4 rules. Support for the mapping execution phase is currently not provided.

Yahoo Pipes
Yahoo Pipes [Yah07] has introduced a novel facet into the metadata mapping domain. Through a Web application, users can aggregate data from various sources such as XML feeds, online CSV files, or other online applications (e.g., Flickr, Google and Yahoo search results) and deploy new data services (mashups) that provide uniform access to these sources. This also includes mapping between the various source formats — a task supported by Yahoo Pipes through an intuitive, easy-to-use online interface. Since all Yahoo Pipes features are part of a Web application, this mapping solution can run in an ordinary Web browser on any platform. Mapping discovery is the only phase not supported by Yahoo Pipes; users have to identify the semantic and structural correspondences on their own. For representing metadata (e.g., XML tags of feeds), Yahoo Pipes relies on a very simple hierarchical model consisting of elements and sub-elements only. Thus, it does not regard any schema definitions, which might not even exist for certain data sources, but concentrates on the instance level. Users can create mappings between elements without being confronted with complex schema definitions. For representing the mappings between the data elements, Yahoo Pipes employs its own proprietary model which defines a collection of modules for assembling data transformation pipes. The fact that Yahoo Pipes also supports the mapping execution phase becomes obvious when creating a pipe: during the modelling process, each mapping module automatically executes itself and presents the resulting instance data to the user. Each pipe created is stored and optionally published on Yahoo Pipes — the maintenance phase is therefore supported. Other users can copy existing pipes and adapt them to their needs or include the resulting instance data of an existing pipe as a data source in their own pipe. These features enable even non-expert users to collaboratively assemble pipes and create mappings in a trial-and-error manner.

3 Protégé OWL editor: http://protege.stanford.edu/overview/protege-owl.html
4 Semantic Web Rule Language (SWRL): http://www.w3.org/Submission/SWRL/

3.3.6 Preliminary Observations
In Table 3.2 we have summarised the features of the previously discussed mapping solutions. Before we perform an in-depth analysis of each mapping solution with respect to the supported mapping phases, we conclude this section with some preliminary observations.
First, only Altova MapForce in combination with Altova SchemaAgent as well as Microsoft BizTalk Server support all four mapping phases. Other tools lack at least one phase. Especially mapping discovery, an issue which has gained much attention in the scientific literature, has not yet been implemented in commercial mapping solutions — and if at all, then only to a minor, rather trivial extent such as the comparison of schema element names.
The second observation is that many solutions use their own proprietary models for representing schemes and mappings among them. Additionally, we notice that there is strong support for XML technologies (XML Schema and DTD) and the relational model (SQL-DDL); support for other technologies is rather minor.
Third, we can observe that most mapping solutions support the mapping execution phase by compiling mapping specifications into executable code (e.g., XSLT, XQuery) and providing the possibility to deploy this code as an executable service.
Our last observation is that the majority of mapping solutions support the mapping maintenance phase, but to a varying degree: in Microsoft BizTalk, for instance, users can publish their mappings in a web-accessible WebDAV repository that does not provide any further sophisticated functions. Altova's SchemaAgent, COMA++, and Yahoo Pipes mark the opposite end of the spectrum and provide an extensive set of maintenance functionality.

3.4 Analysis of Mapping Solutions

After we have set up the evaluation framework in Section 3.2 and presented a representative selection of mapping solutions in Section 3.3, we now analyse these solutions according to the imposed requirements. In the previous section, where we briefly introduced the mapping solutions, we already outlined which mapping phases a solution supports. Here we analyse in detail how and to what extent a solution supports the requirements of a certain mapping phase.

3.4.1 General Requirements
In the evaluation framework, this category contains all requirements that must be supported by any mapping solution but cannot be assigned to any of the four mapping phases.

Uniform Accessibility
Uniform accessibility denotes the possibility of deploying a single access point to a set of heterogeneous sources based on previously defined metadata mappings. A single access point could be a query interface or a service that provides transparent access to the metadata provided by other services.

[Table 3.2 could not be recovered from the extracted text. It categorises the analysed mapping solutions into EII Suites (BEA Liquid Data 8.1; Sybase Data Integration Suite - Avaki (Studio/Server) 7.0), EAI Suites (Microsoft BizTalk Server 2006; Cape Clear 7 (Studio/Server); IBM WebSphere Integration Developer), Mapping Tools (Altova MapForce / SchemaAgent; COMA++; Clio), and Other Solutions (StylusStudio / DataDirect XML Converters; TopBraid Composer; Yahoo Pipes), and lists for each solution its license type, schema definition language support, supported platforms, and whether it supports the discovery, representation, execution, and maintenance mapping phases.]

Table 3.2: Metadata mapping solutions — categorisation and overview

By their nature, Enterprise Information Integration (EII) Suites fulfil exactly this requirement. BEA Liquid Data, for example, provides the means to create data views over a set of data sources that can then be queried in a uniform manner using XQuery. The Sybase Data Integration Suite also allows the deployment of a unified data layer.
Different from EII Suites, the mapping solutions categorised as Enterprise Application Integration Suites (Microsoft BizTalk Server, Cape Clear, and IBM WebSphere Integration Developer) do not provide uniform access via a single query interface but rather allow transparent access to data from incompatible data sources through deployed business processes. In Service Oriented Architectures, a business process typically exposes the aggregated and converted data via a Web Service interface.
Among the pure metadata mapping tools we cannot find any solution that supports uniform accessibility. Although it is possible to deploy mappings with Altova MapForce as services that provide integrated access to a single source, this is not possible for multiple sources. COMA++ does not support mapping execution at all and therefore cannot provide uniform access. Clio has the facilities to convert mappings into queries or transformation style-sheets but does not consider query execution.
From the group of non-categorised solutions, StylusStudio enables the deployment of so-called XML Pipelines5. An XML Pipeline allows metadata to be aggregated from a set of XML sources and defines various processing steps to transform them into a single format exposed by a single endpoint. StylusStudio therefore supports uniform accessibility. Yahoo Pipes acts in a similar manner, with the difference that all pipelines may be created and shared on the Web. TopBraid Composer does not provide means to deploy a uniform access interface for various sources.

Modularity
Assuming that a mapping solution already provides access to a set of data sources, the need often arises to add additional data sources or remove already integrated ones without changing any existing implementations. Depending on the modularity of a mapping solution's architecture, this requirement can optionally be supported. Obviously, this is only relevant for solutions that fulfil the previous requirement and provide uniform access to multiple sources.
When using EII Suites such as BEA Liquid Data or the Sybase Data Integration Suite, removing or adding metadata sources requires the redefinition and redeployment of previously created views. We regard this task as a minor modification of an integration definition rather than a change in the implementation and therefore categorise EII Suites as mapping solutions that support modularity.
With EAI Suites the required effort is similar: BizTalk Server, Cape Clear Server, and the IBM WebSphere Server require the redefinition and redeployment of existing business processes and service orchestrations whenever new sources are added. Therefore they partially fulfil the modularity requirement.
Mapping solutions that follow a pipeline approach for providing uniform access to metadata sources (StylusStudio, Yahoo Pipes) are highly modular: data sources can be included in an integration scenario by defining an additional input source for a pipe and the corresponding mappings; this task can be performed without redeploying previously defined pipe components.

Flexibility in Lifting and Normalisation
Lifting and normalisation denotes the ability to lift metadata expressed in distinct schema definition languages to a common metadata meta-model. As already mentioned in Section 3.1, it is commonly agreed that this is a pre-condition for metadata mapping. Therefore mapping solutions should provide the flexibility to lift metadata of any kind to such a common meta-model.
All analysed Enterprise Information Integration as well as all Enterprise Application Integration Suites fulfil this requirement. BEA Liquid Data can lift any data available in relational databases (RDB), XML files, delimited files, Web Services, and third-party applications (e.g., Siebel, SAP) to the level of XML Schema. The Sybase Data Integration Suite offers so-called data services to transform data available in specific source formats into its internal, proprietary representation. Microsoft BizTalk and IBM WebSphere both provide extensible and customisable adapter frameworks6 for lifting external data to their internal representation. Cape Clear also provides a fixed set of data transformers for common formats such as SOAP, CSV, structured text (e.g., EDI, SWIFT), or Excel spread-sheets.

5 XML Pipeline Definition Language: http://www.w3.org/TR/xml-pipeline/
6 Microsoft BizTalk Adapter Framework and IBM WebSphere Integration Framework

Since Altova MapForce supports mapping between any combination of SQL-DDL, XML Schema, flat files, and EDI messages, it must internally have the facilities to lift these meta-models to a common representation. However, it is not possible to extend MapForce with adapters for custom data formats. Although not providing the flexibility to lift any proprietary format, COMA++ and Clio can both lift SQL-DDL and XML Schema definitions to their internal representation. COMA++ also supports the lifting of OWL ontologies and XML Data Records (XDR). The DataDirect XML Converters give StylusStudio the flexibility to lift practically any format to the level of XML Schema, which is its internal meta-model. DataDirect already contains a large number of converters for widely-used formats and gives users the means to easily build their own converters. TopBraid supports the lifting of a fixed set of other meta-models (e.g., UML) to the level of OWL. The same is the case for Yahoo Pipes, which provides data source adapters for XML, RDF, JSON, iCal, and CSV files.

Mapping GUI
Metadata mapping is a complex task, both for domain experts and technicians. Therefore any mapping solution must support these user groups by providing a Graphical User Interface (GUI). The importance of this requirement becomes obvious from the fact that all analysed solutions fulfil it. The EII Suites provide graphical means for building data views (BEA Data View Builder, Sybase Avaki Studio), and the EAI Suites provide orchestration design tools (BizTalk Orchestration Designer, Cape Clear Studio, WebSphere Integration Developer). All other solutions also offer graphical means for creating mapping specifications. Compared to all other solutions, Yahoo Pipes provides an outstanding and easy-to-use Web-based mapping GUI.

3.4.2 Mapping Discovery Requirements
This category of the evaluation framework lists the common requirements that occur during the first mapping phase, which is mapping discovery. From the previous section we already know that this phase is supported only by a few tools.

Schema Matching / Alignment Support
The schema matching or alignment requirement denotes the ability to (semi-)automatically support users in determining metadata mappings. Among the commercial solutions, IBM WebSphere Integration Developer supports the user in creating mappings by automatically aligning model elements with the same lexical representation. This could be the case if, for instance, two attributes in two distinct schemes have the same label (e.g., Person). Altova MapForce extends this feature and automatically aligns lexically equivalent child elements (e.g., firstName) of already mapped elements. Since this kind of mapping support is trivial, these solutions support the schema matching requirement only partly.
Microsoft BizTalk, or more specifically the BizTalk Mapper, offers advanced schema matching capabilities and therefore supports this requirement. Besides being capable of matching schema elements based on their lexical names, it can also autolink elements based on their structure (e.g., their sub-elements).
COMA++ and Clio are research prototypes whose main feature is in fact schema matching. While the Clio matching algorithm operates only on the schema level, COMA++ uses a composite approach to combine different schema- and instance-level matching algorithms.

Consensus Building Features
A mapping solution offering consensus building features supports users or user communities in building consensus on conflicting mappings.

Of all solutions under investigation, only Yahoo Pipes partly supports this requirement. It has built-in user and community management features, which are an important prerequisite for building consensus. Furthermore, it is built upon Web technology, which lowers the entry barriers for building communities. For existing mappings, Yahoo Pipes offers search and browsing as well as ranking features. Mappings can be cloned and reused for other integration tasks; we can assume that cloning demands at least a minimal degree of agreement and consensus on a certain mapping.

3.4.3 Mapping Representation Requirements

By their nature, all mapping solutions require a formalism for representing mappings. As we have seen in Table 3.2 in the previous section, this is indeed the case for all solutions under consideration. At this point we analyse the strength and expressiveness of each formalism, i.e., its ability to capture the various kinds of heterogeneities and interoperability conflicts.

Model-Level Structural Heterogeneity Reconciliation

Structural conflicts on the model level fall into two categories: element definition conflicts occur because the elements of distinct models might have been assigned different names, identifiers, or conflicting constraints. Domain representation conflicts arise because domain experts reflect the constituents of a domain in different generalisation hierarchies, using a different number and different types of elements. While element definition conflicts are easily resolvable by renaming elements, dealing with domain representation conflicts is a more complex task. They can be reconciled by providing the ability to relate, for instance, a general entity in one model with more concrete entities in another model.
All EII Suites under investigation can resolve element definition conflicts and relate elements with different names, identifiers, or data type constraints. BEA Liquid Data can also resolve a majority of domain representation conflicts. Although not directly reflected in the Data View Builder GUI, one can create XQuery expressions to relate a source element to multiple target elements. Furthermore, it is possible to deal with different generalisation hierarchies and relate concrete with more general model elements by defining conditions that filter out relevant data values. By providing a rich set of operators, Sybase Avaki Studio can also resolve these heterogeneity conflicts.
Among the EAI Suites, Microsoft BizTalk and Cape Clear Studio both rely on the power of XSLT to transform objects from a source schema to a target schema. The creation of XSL style-sheets is supported by mapping GUIs, which in both cases do not reflect the full power of XSLT. In BizTalk Mapper, for instance, it is not possible to manually create 1:n mapping relationships between model elements. However, when manipulating the XSL style-sheets directly, all element definition and domain representation conflicts can be resolved. The IBM WebSphere Integration Developer does not rely on standardised technologies such as XQuery or XSLT, but allows the definition of so-called business maps among business objects. Using a predefined set of transform type objects, domain experts can relate different model structures and resolve structural heterogeneities.
Altova provides a powerful mapping model for reconciling any structural heterogeneity and can transform such representations into XSLT or XQuery. Clio does not provide these capabilities on the GUI, but also relies on XQuery and can therefore represent the required mapping information. COMA++ is a schema matching tool with the primary intent to discover mappings but has limitations in representing mapping relationships between an entity and one or more attributes. Stylusstudio also relies on XQuery and can therefore deal with any structural heterogeneity. TopBraid Composer uses SPARQL, or more specifically the SPARQL CONSTRUCT statement, which like XQuery gives the freedom to design and construct any target model from a set of source models. Finally, Yahoo Pipes can also indirectly represent structural mappings through its operators.
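To make this concrete, the following sketch shows how a domain representation conflict could be reconciled with a SPARQL CONSTRUCT query of the kind TopBraid Composer compiles mappings into: a nested source structure (articles contained in journals) is flattened into a single target element. The src: and tgt: vocabularies and all property names are hypothetical and serve illustration purposes only.

    PREFIX src: <http://example.org/source#>
    PREFIX tgt: <http://example.org/target#>

    # Flatten the source's Article/Journal structure into a single
    # target element that carries the journal title directly.
    CONSTRUCT {
      ?a a tgt:Publication ;
         tgt:publishedIn ?journalTitle .
    }
    WHERE {
      ?a a src:Article ;
         src:partOf ?j .
      ?j a src:Journal ;
         src:title ?journalTitle .
    }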

Model-Level Semantic Heterogeneity Reconciliation
The two main classes of semantic conflicts on the model level are domain conflicts (e.g., semantically overlapping, subsuming, or incompatible model elements) and terminological mismatches (e.g., synonyms and homonyms). A mapping formalism should provide means to define the type of heterogeneity between two model elements (e.g., element X and element Y semantically overlap, element X and element Y are synonyms) and the ability to reconcile that conflict, if possible.
From the category of EII Suites, all solutions provide means to resolve domain conflicts and terminological mismatches, provided that the domains of the mapped schemes are not incompatible (e.g., billing and gardening). It is possible, for instance, to map semantically subsuming elements (e.g., author and person) by filtering and transforming instances according to specific conditions (e.g., authors = persons that have written books). Sybase Avaki provides a library of operators and expressions, BEA Liquid Data the expressiveness of XQuery, to perform that task. However, none of them provides the means to explicitly define the type of heterogeneity (e.g., subsumption, overlap, synonymy), which is remarkable because many scientific approaches in the literature concentrate on this kind of semantic representation (see Section 3.2.3).
EAI Suites support semantic heterogeneity reconciliation on the model level in a similar way as EII Suites and do not offer means to represent the semantics of mapping relationships. Microsoft BizTalk provides an extensible set of functions that fulfil the same task as the operators in the above-mentioned EII Suites. WebSphere relies on the definition of maps that describe a series of transformation steps, which define how to transform source into target business objects. Since each map is in fact Java code, one can utilise the full expressiveness of a programming language for the semantic reconciliation of incompatible model elements.
For Altova, our analysis yields the same results as for the EII and EAI Suites: semantic reconciliation is possible through a set of operators, but there is no mechanism to represent the nature of a mapping relationship. Clio does not provide such operators or functions (sort, join, rename) in its mapping model and therefore does not have any advanced capabilities for resolving semantic heterogeneities, if we disregard the fact that one could manually edit the XQuery interpretation of the mapping generated by Clio. The same is the case for COMA++ — it has no reconciliation operators and does not offer means to represent the semantics of a mapping relationship.
Stylusstudio utilises the full power of XQuery and its functions for heterogeneity reconciliation; TopBraid represents mappings in SPARQL, which also uses the set of XQuery function primitives for reconciling semantic heterogeneities on the model level. Yahoo Pipes is not extensible with respect to its operators but provides a fixed set, which is sufficient to deal with semantically conflicting model elements.
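As an illustration of the subsumption case mentioned above (authors = persons that have written books), such a conditional mapping could be expressed as a SPARQL CONSTRUCT query; the src: and tgt: namespaces and property names are again hypothetical.

    PREFIX src: <http://example.org/source#>
    PREFIX tgt: <http://example.org/target#>

    # Treat only those persons as authors that have written at least one book.
    CONSTRUCT {
      ?p a tgt:Author ;
         tgt:name ?name .
    }
    WHERE {
      ?p a src:Person ;
         src:name ?name ;
         src:hasWritten ?work .
      ?work a src:Book .
    }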

Instance-Level Semantic Heterogeneity Reconciliation
Instance transformation is the means for reconciling semantic heterogeneities on the instance level. It specifies how an instance value can be transformed from one representation into another. A transformation is usually implemented in terms of functions, which can define simple operations such as data type conversion (e.g., string to integer) but also more complex tasks such as converting scales (e.g., weight:pound to weight:kg). For complex transformation tasks, a mapping formalism should support the implementation of custom, domain-specific functions.
BEA Liquid Data provides a set of standard functions for creating data views and is also extensible by creating custom functions to perform specialised tasks. The same is the case for the Sybase Data Integration Suite. The EAI Suites also provide such means: Microsoft BizTalk and Cape Clear rely on XSLT, which in fact is a fully functional language for implementing a broad range of transformation scenarios. IBM WebSphere does not rely on standard technologies but allows the implementation of custom transform types for converting data values within business objects from one representation into another.
Altova offers a variety of standard functions for transforming data and also allows the implementation and registration of custom functions. Clio implicitly supports this feature by relying on XQuery but does not provide the necessary means on the GUI level. COMA++ does not have any data transformation capabilities.
Stylusstudio and TopBraid Composer also support instance-level reconciliation of semantic heterogeneities. The former allows the definition of user-defined XQuery functions; the latter relies on SPARQL, which also uses the XQuery function set. Yahoo Pipes currently provides only a fixed set of functions that cover a wide, but not extensible, spectrum of possible transformations.
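For the simplest kind of instance transformation, a data type conversion, the casting functions available in SPARQL filter expressions already suffice; the following sketch (with hypothetical src: names) casts a string-typed weight value to a decimal before comparing it. More complex transformations, such as the pound-to-kilogram scale conversion mentioned above, typically have to be implemented as custom functions registered with the mapping tool or query processor.

    PREFIX src: <http://example.org/source#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # xsd:decimal(...) casts the plain string value to a decimal
    # so that a numeric comparison becomes possible.
    SELECT ?item ?weight
    WHERE {
      ?item src:weight ?weight .
      FILTER ( xsd:decimal(?weight) > 100 )
    }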

Context Representation
Since the semantic correctness of mappings depends on the mapping context, i.e., the setting in which a mapping has been created, a mapping formalism should be able to capture such information. A very simple and limited form of context representation is to capture user information, i.e., the username of the domain expert who has created a mapping. Of all mapping solutions under consideration, only Yahoo Pipes binds established mappings to a certain user. More comprehensive support for representing the context of mappings is not provided by any of the mapping solutions under investigation.

Flexible Language Binding
If a mapping solution relies on a generic formalism, it is open for mappings between schemes expressed in a variety of schema definition languages. The drawback of such an approach is the increased complexity and the additional implementation effort for each single language to be supported. As already illustrated in Table 3.2, some solutions have mapping capabilities for specific languages (e.g., XML Schema, OWL) while others rely on proprietary meta-models. Solutions bound to a specific language do not have the flexibility to map metadata expressed in other languages without lifting them to a common representation.
BEA Liquid Data, all EAI Suites, Stylusstudio, and Yahoo Pipes are bound to XML Schema or XML respectively and can therefore not be considered as having a flexible language binding. The same holds for TopBraid Composer, which is bound to OWL. Sybase Avaki relies on a proprietary model and has specific bindings for other types of models. Altova MapForce, Clio, and COMA++ also rely on a generic approach and have bindings for languages such as XML, OWL, or SQL-DDL.

3.4.4 Mapping Execution Requirements
In this section, we analyse to what extent the mapping solutions fulfil common requirements that occur when mappings are executed at run-time.

Query Reformulation
Mapping solutions that are part of virtually integrated systems, i.e., systems that leave the data in their data sources without replicating them to a central store, require a mechanism that reformulates queries according to a mapping definition. A common way to implement such a mechanism is to work with views, i.e., the user-selected target schema is described as a set of views over the data sources.
Regarding our representative selection of mapping solutions, we can further divide them into two categories: those that use mappings to generate views, and those that use them to transform metadata from one representation into another. Only the Enterprise Information Integration (EII) Suites fall into the first category: in BEA Liquid Data, domain experts create views over a set of data sources using XQuery. Sybase Avaki follows a hybrid approach: it supports the definition of so-called view models, which are sequences of operations that combine or transform data from one or more sources. Other solutions, such as Altova MapForce or Stylusstudio, allow the generation of XQuery code from mapping definitions, but do not provide means to execute these queries. Therefore we do not consider them to support query reformulation in the mapping execution phase.

Query Plan / Optimiser

Obviously, only mapping solutions that provide capabilities for query reformulation can offer means for optimising query access to data sources. BEA Liquid Data offers a variety of features for query optimisation. First, it allows users to view query plans and to analyse the execution times for each part of a query. Users can either rely on a built-in optimiser or perform manual optimisation by, for instance, changing the order of the data sources to be queried or by giving optimisation hints that override the default behaviour of the optimiser. Sybase Avaki does not provide any query optimisation features but relies on a built-in caching service, which stores (temporary) query results for a definable time span.

Integration Component Generation

A mapping solution should provide at least some semi-automatic means to compile mapping specifications into executable integration components, such as data source adapters, stubs, wrappers, or mediators. With BEA Liquid Data one can deploy mapping specifications on the Liquid Data Server, while Sybase Avaki allows the deployment of mapping models as data services. On Microsoft BizTalk Server and all other EAI Suites under investigation, mapping specifications are deployed as part of executable business processes. Altova MapForce provides the possibility to generate executable Java or C# source code from a mapping specification, and in Yahoo Pipes users can also deploy and execute their pipes.

3.4.5 Mapping Maintenance Requirements

The mapping maintenance phase is usually supported by a kind of mapping registry, which stores information about available schemes and mappings between schemes.

Mapping Verification

If schemes and the mappings between schemes used in a certain integration context are available, an automated mechanism could detect conflicting mapping specifications. None of the mapping solutions under investigation provides such a fully automatic mechanism. Only Clio and Altova SchemaAgent partly support this requirement: Clio verifies mappings by presenting alternative mappings to the user during the mapping discovery phase. SchemaAgent provides a GUI that allows users to browse available schemes and already established mappings between those schemes.

Mapping Reusability

Reusing existing schema definitions is a very simple way of achieving interoperability; the same is the case with schema mappings. If there is already a mapping specification that reconciles the heterogeneities between two schemes, and the mapping could also fit other integration scenarios, one should reuse and possibly modify that mapping.
As soon as mappings are available in a mapping repository, they can be discovered and reused in other scenarios. Additional repository features, for instance searching and browsing mappings, could further improve the reuse potential. However, only Altova SchemaAgent and Yahoo Pipes provide such advanced functionality. In SchemaAgent users can browse existing schemes and mappings, and Yahoo Pipes provides a faceted search interface that guides users through the bulk of already created pipes. All other solutions offer rather unsophisticated repository features such as storing mapping specifications in a central WebDAV repository.

Mapping Inference

Deriving new mapping relationships from existing ones currently seems to be a purely scientific topic: it has not yet been implemented in any of the commercial products or research prototypes we have analysed in this section.

3.4.6 Analysis Results
Regarding the results of our analysis, summarised in Table 3.3, we can make the following observations from a high-level perspective:
Most mapping solutions fulfil the general requirements we have set up for our analysis; heavyweight EII and EAI Suites fulfil them better than standalone mapping tools. This is because the latter are designed solely for the task of mapping and produce mappings that can be deployed in other systems. Heavyweight suites cover the whole spectrum from creating mappings to providing uniform access to data sources.
The mapping discovery phase is weakly supported, which leads us to the conclusion that the field of automatic schema matching, which has been studied extensively in the literature, is still not mature enough to be deployed in practice. Only the category of research prototypes includes sophisticated schema matching support; if a commercial solution supports this phase, then only in a very unsophisticated way, such as comparing the lexical representation of model elements.
Mapping representation is well supported: almost all solutions provide the means to reconcile heterogeneities among schemes and their instances. However, none of them puts an emphasis on the semantic nature of a mapping relationship. Furthermore, none of them has strong means to represent a mapping context. Most mapping solutions rely on a single, specific schema language (e.g., XML Schema), which implies that these tools must support lifting and normalisation in order to provide support for metadata expressed in other schema definition languages.
The mapping execution phase is very weakly supported. Of the analysed EII Suites, only BEA Liquid Data executes queries, reformulates them according to previously defined mappings, and optimises the queries using a tailorable query optimisation algorithm. EAI Suites do not provide such means but rely on the deployment of business processes, which also transform data from one format into another using pre-defined mappings. In a similar way, in Yahoo Pipes, mappings are an integral part of pipe definitions, which can be deployed and executed on the Web.
The potential of mapping maintenance has not yet been considered by most mapping solutions. Only Altova SchemaAgent can be regarded as a suitable and useful schema and mapping repository, which enables (manual) mapping verification and reuse. Due to its Web-based nature, Yahoo Pipes implicitly enables mapping maintenance. Users can search existing pipes, i.e., also existing mappings, for specific sources and tailor them to their specific needs. This leads to high reusability of existing mappings and thereby to higher interoperability.

3.5 Summary

In this chapter we have analysed the characteristics of a representative set of metadata mapping solutions against an evaluation framework derived from the state-of-the-art literature. In contrast to other surveys, we have conceived metadata mapping as a cyclic sequence of phases rather than a single task (e.g., mapping discovery). We believe that this viewpoint is essential for domain experts who want to employ these solutions in real-world scenarios.
One outcome of this study is that many solutions concentrate only on a specific mapping task: research prototypes concentrate mainly on mapping discovery and disregard how these mappings could be executed in a real-world system. Commercial solutions, in contrast, have a completely different focus: they support the mapping representation and execution phases, and disregard the discovery phase. Both research and commercial solutions have in common that the support for mapping maintenance is rather weak.

[Table 3.3 could not be recovered from the extracted text. It summarises, for each of the eleven analysed solutions listed in Table 3.2, whether each requirement of the evaluation framework (Uniform Accessibility, Modularity, Lifting & Normalisation, Mapping GUI, Schema Matching / Alignment Support, Consensus Building Features, Model-Level Structural Heterogeneity Reconciliation, Model-Level Semantic Heterogeneity Reconciliation, Instance-Level Semantic Heterogeneity Reconciliation, Context Representation, Flexible Language Binding, Query Reformulation, Query Plan / Optimisation, Integration Component Generation, Mapping Verification, Mapping Reusability, Mapping Inference) is supported, partly supported, or not supported.]

Table 3.3: Metadata mapping solutions evaluation summary

So far, the majority of mapping solutions operates in closed environments on the basis of metadata available in structured data sources. Yahoo Pipes is the only Web-based solution that allows users — also non-experts — to create and share mapping specifications (pipes), which integrate metadata from several sources, on the Web.

Part II

Methodology and Concepts


Chapter 4

Towards A Web-based Metadata Integration Architecture

After having presented the technical background and the related work in the previous chapters, we now introduce the main contribution of this thesis: a Web-based metadata integration architecture that conceives the data sources to be integrated as Web services that are accessible by their URLs via the HTTP protocol. Without forcing domain experts to adopt a pre-defined schema, as is the case in global-ontology approaches, we want to give them the possibility to query multiple autonomous and distributed metadata services by formulating SPARQL queries over a selected target metadata schema.
In Section 4.1, we first analyse the goals to be achieved by such an architecture. Thereafter, in Section 4.2, we give a high-level overview of the principal metadata architecture, discuss a set of technical properties that influence our design, describe the technical components involved, and briefly motivate the technology choices for our implementation. Then, in Section 4.3, we describe how domain experts can utilise these components to realise a certain metadata integration scenario. Finally, in Section 4.4, we define the functional requirements for the mapping model we will focus on in the subsequent chapters.

4.1 Goals

From an institution's perspective, there is often a strong need to keep in place already existing systems and the metadata stored therein. Due to technical reasons, but also because of legal issues (e.g., digital rights management), many institutions are not able to export their metadata into a central data store or adjust their systems to requirements coming from external systems. So rather than materialising metadata from various sources into a central data store, we need to provide a virtual integrated system that builds on top of existing architectures and integrates metadata on demand. In the following we describe the major high-level goals to be achieved by such a system. Although most of them originate from the well-known problems of data integration [SL90], we believe that it is worth investigating them further from the perspective of metadata integration.

4.1.1 Any Kind of Data, any Type of Data Source
A given system may store highly structured metadata in a relational database, maintain semi-structured metadata descriptions like XML or HTML documents, or even store completely unstructured metadata, such as files on the file system. Even if two data sources maintain structured metadata, the applied data models might still differ in their usage of constraints. Depending on the data model's structural properties, data sources are accessible through different interfaces including Web browsers, structural query languages (e.g., SQL), unstructured queries (e.g., Google-like full-text search), or domain-specific interfaces such as metadata exchange protocols (e.g., OAI-PMH1, Z.39.502). Since our goal is to build a metadata integration framework that can be used by higher-level applications to access metadata from various sources, providing a full-text search interface is insufficient. We rather need to offer a more expressive, structured query language.

4.1.2 Reconciliation of Heterogeneous Metadata

Metadata schemes define the semantics of metadata. Many institutions use standardised schemes (e.g., TV-Anytime [ETS06], Dublin Core [DC06], MARC-21 [LOC07c]), others apply their own proprietary schemes, which are tailored to their specific needs. The complexity of metadata schemes ranges from flat element lists (e.g., Dublin Core) to complex ontologies (e.g., CIDOC-CRM [ISO06a] or FRBR [IFL97]). As soon as there is more than one metadata schema involved, it is likely that semantic conflicts occur. The common approach to deal with this issue is to specify metadata mappings between schema elements. In the digital library domain such mappings are called crosswalks and are already available at least for some standardised schemes3.
Client applications usually formulate their queries over a certain target metadata schema4 and expect the results to be returned according to that schema. If, for instance, an application provides a query interface for the Dublin Core Element Set, the queries are formulated using the DC elements (e.g., return all books with dc:creator XY) and also the expected results should contain DC elements. It is the task of the integration system and its built-in mapping mechanism to translate queries formulated over the chosen target schema into queries over the other, source-specific schemes.
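For illustration, the example query mentioned above (return all books with dc:creator XY) could be posed to the integration system as a SPARQL query over the Dublin Core Element Set; the creator name is, of course, just a placeholder.

    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    # Query formulated over the Dublin Core target schema; the integration
    # system has to rewrite it into queries over the source-specific schemes.
    SELECT ?book ?title
    WHERE {
      ?book dc:creator "XY" ;
            dc:title   ?title .
    }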

4.1.3 Location Transparency

The metadata themselves should not be replicated or moved into a centralised storage system but must remain within the institutions that host the metadata. For a client that queries multiple systems in order to retrieve metadata that match a certain query criterion, the location of the involved data sources should be transparent. Therefore, we must build components that establish this transparency and hide the technical details of the involved data sources from the requesting client application. As a technical prerequisite, we assume that each institution is connected to the World Wide Web and has the possibility to serve its metadata using common Web technologies.
Transparent access to decentralised data sources requires discovering and retrieving metadata from relevant sources without client intervention. Since factors like connection speed or the technical capabilities of a data source can affect the query response time of the integration system, these aspects should be taken into account when formulating a query plan. They are, however, out of the scope of this work.

4.2 Architecture Overview

In this section, we discuss the technical properties of our metadata integration architecture. Thereafter, we present its main technical components and the standards we will use for its implementation.

1 The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), http://www.openarchives.org/oai
2 Z.39.50 Gateway: http://www.loc.gov/z3950/gateway.html
3 A list of mappings (crosswalks) between various metadata formats: http://www.ukoln.ac.uk/metadata/interoperability/
4 In other contexts, the target metadata schema is also called mediation schema or global schema.

4.2.1 Technical Considerations
The field of metadata integration has been widely studied in the past, and today there exists a variety of established standard architectures we have to consider in our design. Furthermore, we are facing an open, distributed environment in which well-known principles from closed-world database systems do not hold anymore. In the following we will elaborate on these issues and furthermore discuss why metadata updatability is out of the scope of this thesis.

The Spectrum of Metadata Integration Architectures
The purpose of any integration system is to give users or applications the ability to query information of different kinds from different sources and to return the results in a uniform way. Figure 4.1 illustrates a taxonomy of known architectures for querying heterogeneous data and describes what kind of data can be integrated by a certain architecture type; Universal Database Management Systems, for instance, can integrate native structured data, while (meta)search engines are designed for unstructured native data.

[Figure 4.1 shows a taxonomy of systems for querying heterogeneous data (sources): materialised systems, which move the data (Universal DBMS for native structured data; Data Warehouses for native and derived structured data), and virtual integrated systems, which leave the data where it is ((Meta)search Engines for unstructured native data; Federated Databases for mostly structured native data; Mediated Query Systems for unstructured, semi-structured, or structured native data).]

Figure 4.1: A taxonomy of known architectures for querying heterogeneous data [DD99]

We can distinguish between two main classes of integration systems: materialised and virtual integrated systems. In materialised systems, metadata are integrated a priori into a central data store. Data warehouses are an example of the materialised approach: they provide a central data store that gets populated once and retrieves updates via periodic data imports. The central store defines a so-called star schema, which has been designed bottom-up for a certain purpose (e.g., business reporting, data analysis) and integrates the elements of the data-source-specific schemes. The advantage of materialised systems is the query response time, which can be as low as in traditional database systems. The drawbacks are the lack of up-to-dateness of the integrated metadata, which depends on the update frequency, and the redundancy of metadata.
Virtual integrated systems integrate metadata on demand from decentralised data stores. Queries are executed in a decentralised manner and the integration system must translate and forward the queries to the right data sources. The query translation process relies on previously defined schema mappings, which makes this approach more difficult to implement. The advantage of a virtual integrated system is that the data sources remain autonomous, that only those metadata required for answering queries are transmitted, and that those metadata are always up to date. On the other hand, the involved data sources must always be available, and the query response times heavily depend on the quality of the network connection.
In the previous section we have already outlined our motivation for following the principles of virtual integrated systems: the involved institutions want to keep their metadata in their own data stores. Furthermore, the intent of our architecture is to provide access to up-to-date metadata from a large number of data sources for search and discovery purposes. Metadata analysis and other tasks that are critical with respect to query latency are out of the scope of this thesis. Regarding the various types of virtual integrated systems presented in Figure 4.1, our architectural choice clearly falls on mediated query systems, since they support any kind of metadata: unstructured, semi-structured, and structured metadata. A mediated query system supports a virtual view over the integrated data sources and does not store any data itself. It obtains the relevant data from the sources and uses these answers to respond to user queries. Levy [Lev99] summarises the problems mediated query systems aim to solve as follows:

• Data sources can contain closely related and overlapping metadata.
• The systems where metadata are stored have different query capabilities and expose distinct interfaces.
• Metadata is stored in multiple data models and corresponds to various incompatible schemes.

The Open-World Assumption and its Implications
Database systems are based on the so-called closed-world assumption, which states that what is not explicitly known to be true is false. It is assumed that the available knowledge base is complete and that a system has full control over the available information. A variety of techniques, which contribute to the high performance and efficiency of today's database systems, are based on the closed-world assumption: integrity constraints, data validation mechanisms, query optimisation algorithms, etc.
In metadata integration scenarios that follow a virtual integrated approach, the closed-world assumption does not hold anymore. It is replaced by the so-called open-world assumption, which states that what is not explicitly known to be true is unknown. It assumes that no single observer has a complete knowledge base or control over the available information. We can identify several implications caused by the open-world assumption:

• The maintainers of the data sources decide which metadata to expose. Therefore, it cannot be guaranteed that the results returned to the user or requesting application are complete.
• Validation, as it is known from database systems, is possible only to a limited extent within the boundaries of the open-world assumption. It might even be out of place because it is not possible to determine if a given metadata set violates the constraints of a superimposed schema.
• Query optimisation requires different techniques than those known from the database domain. We cannot estimate the cost of a query using a complete, source-spanning data dictionary but must rely on other methods such as query containment and capability-based optimisation.

Metadata Updatability
An issue that is closely related to the closed- and open-world assumptions is the updatability of integrated metadata. Even in controlled database environments, where the contained data are known to be complete, supporting updates through mappings is still a critical, only partially resolved issue. Since mappings in databases are typically realised as views, this issue is also known as the view update problem and was first investigated by Dayal and Bernstein [DB78]. They observed that finding a unique update translation even for very simple views is rarely possible due to the intrinsic under-specification of the update behaviour by a view, i.e., a view that maps a source to a target schema does not necessarily define the corresponding mapping relationship for the opposite direction. Consequently, today's commercial database systems provide only limited support for updatable views.
What is difficult to achieve in closed-world systems is hardly possible in uncontrolled, open-world environments: virtual integrated systems, especially those based on mediated query systems, generally do not support metadata updatability and provide read-only access to the metadata they integrate from various sources. Since our metadata integration approach will follow such an architecture, we will give users and applications the possibility to retrieve metadata via a structured query language but abstain from providing write access or update functionality.

4.2.2 Components
Our integration architecture comprises two main components, which are typical for a mediated query system: wrappers and mediators. Additionally, we employ an integration registry that is aware of the available components and their technical as well as semantic query capabilities. All components in our integration architecture expose their functionality on the Web via dereferenceable URIs. Thus, we can already achieve a certain degree of technical interoperability by building our metadata integration architecture on top of the Web Architecture.
In Figure 4.2, we illustrate the main components of our architecture using an example that involves two metadata sources: one (source A) that maintains its metadata in a relational database and another one (source B) that stores metadata descriptions in some XML-based format. Both data sources are encapsulated by wrapper components, which expose the semantics of the provided metadata in terms of a source schema (SA and SB) and provide a uniform query interface for requesting applications. The mediator also provides a query interface, exposes a target schema ST, and maintains mappings (MAT and MBT) between the two source schemes and the target schema. The integration registry maintains references to all involved architectural components as well as to all schemes and mapping specifications.

Wrappers

Wrappers encapsulate local data sources and provide query access to the metadata stored therein. They accept queries formulated in a certain query language and return the matching metadata in a unified format. The semantics of the metadata are expressed and exposed in terms of a source schema, and it is the wrapper's task to translate between the source schema and the native data descriptions (e.g., RDBMS tables, XML files, flat-file structures). The purpose of wrapper components is to lift the involved data sources to a common technical level, which is necessary to overcome the technical heterogeneities between the involved data sources. This requires that:

• Each wrapper exposes its metadata using a common data model.

• Each wrapper describes its source schema using a common schema definition language.

• Each wrapper provides an interface that accepts queries expressed in a common query language.

Since we want to provide a Web-based metadata integration architecture that conceives the data sources to be integrated as Web services, the applied data model, schema definition language, and query language should also be Web-enabled, i.e., they should incorporate common Web standards and protocols such as HTTP and URI.

[Figure 4.2 depicts the mediator-wrapper architecture for the example: a client sends a query Q to the mediator and receives a result R; the mediator exposes the target schema ST, maintains the mappings MAT and MBT, and forwards the rewritten queries QA and QB to the wrappers of data source A (a relational database) and data source B (an XML source), which expose the source schemes SA and SB and return the results RA and RB; the integration registry holds references to the schemes and mapping specifications.]

Figure 4.2: Components of a mediator-wrapper architecture

Mediators
Mediators accept queries from the application layer, unfold them into sub-queries, disperse these to the local data sources where they are executed, and finally combine and present the results to the client application. A mediator defines a target schema that is exposed to the client and can be used for formulating structured queries. The process of unfolding queries relies heavily on previously defined mapping specifications between the target and the source schemes, which are exposed by the wrappers. Mapping specifications are part of more extensive integration specifications, which contain all the information required for the query unfolding process. Integration specifications contain all the information for setting up a mediator component instance; the mediator itself is a generic component.
In principle, mediators have to fulfil the same technical requirements as wrappers: they must operate on the same data model, schema definition language, and query language as wrappers. Additionally, they must be able to process mapping specifications, which are based on the data models and schema definition languages that are used by wrappers and mediators.
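To make the unfolding step more concrete: the Dublin Core query sketched in Section 4.1.2 could be rewritten by the mediator into the following sub-query over a hypothetical source schema, assuming a mapping specification that relates dc:creator to src:author and dc:title to src:mainTitle (all src: names are invented for illustration).

    PREFIX src: <http://example.org/source#>

    # Sub-query forwarded to the wrapper of a source that uses the src: vocabulary.
    SELECT ?book ?title
    WHERE {
      ?book src:author    "XY" ;
            src:mainTitle ?title .
    }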

Integration Registry
Each artefact that is part of the integration architecture (e.g., wrappers, source schemes, mediators, target schemes, mappings) is registered with the integration registry. By introducing such a registry we ensure that data sources can be discovered and promote a certain community aspect in metadata integration: we allow institutions or organisations that elaborate an integration scenario to publish their integration and mapping specifications. Other institutions can reuse these specifications, and the integration registry could derive additional operational mappings from existing ones.

Integration specifications describe the technical details of a wrapped data source: its access parameters (e.g., URL), the source schema it exposes (e.g., Dublin Core), and also other technical information (e.g., average response time, data statistics) required by the mediator for unfolding queries. A central aspect each integration specification must cover is the information about structural and semantic mismatches between source and target schemes. We denote this kind of information as a mapping specification; it contains the mapping relationships among the elements of the source and the mediator's target schemes.
Furthermore, it is likely that two data sources, even if they expose the same schema, do not contain the same contents, which results in queries that cannot be answered by all data sources. Hence, during mapping execution time, the mediator needs a mechanism to determine if a certain wrapper is relevant to a query. For our metadata integration architecture, we follow the approach of parameterised views [PV99] and compile mapping specifications into a capability description, which contains a set of query templates, each defining how to rewrite the conditions of a query over the target schema into conditions over the source schema. A query transformer, which is part of the mediator, takes the capability description and unfolds the original query by substituting its conditions. In this way, the templates reflect the operational mappings as well as the set of possible queries, and the mediator can handle semantic mismatches and determine if a data source is relevant to a query. Further details on the generation of query templates as part of capability descriptions will be given in the subsequent chapters.
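The following sketch hints at what such a query template might look like; the notation (a condition pattern over the target schema paired with its rewriting over a source schema, with $value as a parameter placeholder) is purely illustrative and does not anticipate the concrete representation developed in the subsequent chapters.

    # Template: condition over the mediator's target schema (Dublin Core),
    # with $value acting as a parameter to be filled from the client query.
    ?item dc:creator $value .

    # Rewriting: the corresponding condition over the source schema of one wrapper.
    ?item src:author $value .

    # A query whose conditions match no template in a wrapper's capability
    # description is not forwarded to that wrapper.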

4.2.3 Standards-based Metadata Integration

As we have seen in the previous section, building a mediated metadata integration system requires a common schema definition language, a common data model, and a query language that operates on this data model. We believe that the usage of standards in integration scenarios is a major step towards interoperability, at least on a technical level. Therefore we have decided to rely on the Resource Description Framework (RDF) [W3C04b], the RDF Vocabulary Description Language (RDFS) [W3C04a], and the SPARQL Query Language for RDF [W3C08]. The main design goal of RDF is to provide a framework for representing metadata about arbitrary resources in a way that machines can exchange and understand their meaning. Another, often unfamiliar, aspect of RDF is that it has been designed for metadata integration and aggregation5 purposes: the RDF data model is a directed labelled graph, which is simple and yet powerful enough to allow the description of metadata of any kind originating from various heterogeneous sources [PGMW95]. RDFS is a language for describing ontologies and schematic descriptions in a machine-processable way. It provides the constructs and the expressiveness to describe the semantics of data. Although it was originally not intended to be used for data integration, we can benefit from its popularity, which is reflected in the availability of various RDFS-specific tools. One can, for instance, resort to existing RDF APIs (e.g., Jena6), RDF databases (e.g., OpenLink Virtuoso7, Oracle RDF8), and also already existing wrapper components (e.g., D2RQ [BS04]). SPARQL is a query language that operates on the RDF data model and can be used to express queries across diverse data sources. Given that wrappers can translate between RDF and native data models, we can use this query language for accessing the data sources in an integrated fashion. Therefore, in our architecture, each wrapper and mediator exposes a SPARQL query interface in terms of a Web Service definition.

5 The W3C RDF Data Access Working Group is working on this issue, has defined data access use cases and has released a W3C recommendation for the SPARQL query language in January 2008.
6 The Jena Semantic Web Framework: http://jena.sourceforge.net/
7 Virtuoso RDF DB: http://virtuoso.openlinksw.com/wiki/main/Main/VOSRDF
8 Oracle Spatial RDF: http://www.oracle.com/technology/tech/semantic_technologies/index.html

4.3 Metadata Integration Workflow

In the following, we describe the interaction between a client, which can be a domain expert or an application, and our proposed metadata integration architecture. We assume that the client requires search functionality over the metadata of multiple institutions based on the Dublin Core metadata schema and that one wrapper encapsulates a relational database containing proprietary ONB metadata, while the other exposes BBC TV-Anytime metadata that is natively stored in an XML data source. Figure 4.3 illustrates9 the interactions between the client and the proposed metadata integration architecture as well as the workflow within the system.

Figure 4.3: Metadata integration workflow

1. Each wrapper component that exposes metadata from a certain data source is registered at the integration registry. It publishes a basic profile consisting of a human-readable description, an interface description, and the wrapper's source schema. In our example, the published profiles point to the proprietary ONB schema (S_A) and the TV-Anytime schema (S_B).

2. A client who wants to query multiple autonomous sources using a specific target schema S_T (in our case Dublin Core) contacts the integration registry and checks if there are existing integration and mapping specifications available for the wrappers to be integrated. If this is not the case for a certain wrapper, the domain expert must create them (M_AT and M_BT).

3. The client application uses the integration specification to set up the generic mediator component, expresses a query over a previously selected target metadata schema, and executes it at the mediator.

4. The mediator component rewrites the query over the target schema into queries over the source schemes, which requires that the operational mappings between the target schema (Dublin Core) and the source schemes (TV-Anytime, ONB) are available in terms of mapping specifications. Based on these, the mediator unfolds the original query into rewritten sub-queries and forwards them to the wrappers (an illustrative sketch follows this list).

5. The wrappers receive the queries from the mediator and answer them from the data available in the encapsulated data source. For the relational data source, the corresponding wrapper must translate the mediator's query into a native SQL query; for the XML data source, XQuery could provide the necessary native access.

6. The mediator collects the query results that are returned from the wrappers, combines them, and returns them to the client in a uniform way.

7. For reusability purposes, the client may publish the integration specification in the integration registry.

9 In this illustration, we have omitted the reference pointers between the registry and the mediator and wrapper components. The numbers in the figure correspond to the numbering in the description above.
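The following SPARQL sketch illustrates steps 3 and 4 of the workflow. It is an assumed example query rather than a prescribed interface; the dc: namespace is the standard Dublin Core element set, and onb: stands for the proprietary ONB source schema.

# Step 3: the client's query over the Dublin Core target schema
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?resource ?title
WHERE { ?resource dc:title ?title }

# Step 4: the sub-query unfolded for the ONB wrapper, assuming a mapping
# between dc:title and the proprietary onb:title property
PREFIX onb: <http://www.example1.com/schema/onb#>
SELECT ?resource ?title
WHERE { ?resource onb:title ?title }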

4.4 Functional Requirements

Earlier, in Section 3.1, we have defined metadata mapping as a cyclic process consisting of four subsequent phases: discovery, representation, execution, and maintenance. Given the metadata integration architecture presented in the previous section, we now discuss and identify the functional requirements, denoted as FR, that we have to consider for each of these phases in the conceptual design and the implementation of our mapping model, on which we will focus in subsequent chapters.

4.4.1 Mapping Discovery

The first mapping phase is mapping discovery. It is concerned with determining semantic mapping relationships among incompatible metadata schemes and is usually supported by automated matching tools. Although the development of such tools is out of the scope of this work, we at least want to consider the requirements that enable mapping discovery in an open, Web-based environment.

Enabling Collaborative, Intellectual Mapping

Finding another automatic or semi-automatic mapping discovery algorithm is out of the scope of this work. Much research has already been conducted in that area and has found its way into various scientific prototypes but, as we have discussed in Section 3.4, rarely into commercial mapping solutions. In the context of this thesis, we take one step back and rely on human intellect as the primary source for determining mapping relationships. Mappings designed by domain experts are collected and form the basis for proposing suitable mappings to other domain experts who need to create mappings in a similar context. A critical mass of domain experts in a certain domain can produce a pool of mapping specifications, which can then serve as input or blueprint for other domain experts. A mapping mechanism can support this by considering the following functional requirements:

• FR 1: Mapping specifications must be sharable. The technical barriers for accessing metadata schemes and mappings must be as low as possible.

• FR 2: Mapping specifications must be readable for humans and machines. Domain experts must be able to read and comprehend mappings created by others, and applications must be able to process and interpret them correctly.

• FR 3: Mapping specifications must be reusable. To a large extent, the Web consists of Web pages that have been created by simply copying and pasting and then adapting them to changing needs. In a similar manner, this should also be possible for mapping specifications; a mapping mechanism should give domain experts the possibility to take (copy) mappings created by others and adapt (paste) them to their specific needs.

At an advanced stage, we can introduce automatic mapping algorithms that operate on the pool of available, manually crafted mapping specifications. They could, for instance, exploit transitive mapping relationships among distinct schemes and derive additional mapping specifications from existing ones. In the context of this work, however, this is a long-term goal and therefore out of scope.

Bottom-Up Mapping Approach

Schemes that need to be mapped are often extensive and contain a large number of schema elements, which makes metadata mapping an extremely complex task. This is where automatic discovery algorithms can support domain experts in finding mapping relationships. Most algorithms (see Section 3.2.2) take complete source and target schemes as input and try to determine mapping relationships among all their elements. In practice, however, domain experts often need to integrate only a subset of the metadata available in a data source. Therefore they also need to consider only a subset of all available schema elements. A digital library repository, for instance, which maintains metadata about digital images, might also store other institution-specific data that are irrelevant for a certain integration context. We assume that domain experts concentrate on the metadata relevant for their context and only need to create mappings for a subset of all schema elements. Over time, when additional metadata are selected from a source, the number of mapped schema elements can increase. Our mapping mechanism must support such a bottom-up mapping approach as follows:

• FR 4: If we allow domain experts to map only a selected set of elements of a certain schema, then a mapping solution must support element-based identification, i.e., the elements must have unique identifiers that can be referenced from outside a schema.

• FR 5: Mapping specifications must provide at least basic means for representing the mapping context. Distinct domain experts are likely to map the same schemes differently, where each mapping is valid in one domain expert's interpretation context but may be invalid in another expert's context.

4.4.2 Mapping Representation

Closed-world, stand-alone mapping solutions can employ their own format for storing and serialising mapping specifications because there is no need to exchange them with other applications. In an open, Web-based environment, the mapping specifications become part of the Web and are accessible by any client that supports the Web protocols. A mapping representation mechanism that exposes mappings on the Web must take this into account and publish mappings in an appropriate format.

Webify Mappings

The Web has become the primary medium for sharing and accessing information available in distributed locations. From the previous requirements, we already know that our mapping mechanism must support some of the features Web technology already provides (e.g., sharing and reusing contents). Therefore, it makes sense to seamlessly integrate the whole mapping mechanism with the Web Architecture [JW04]. By following the principles of Linked Data, we also guarantee that mappings are published in a human- and machine-readable form and that they are easily accessible by simply dereferencing URIs. These principles, adapted to the context of this thesis, are [BL06]:

• FR 6: All schemes, schema elements, and mapping specifications must have URIs as names.

• FR 7: They must have HTTP URIs so that people can look up those names.

• FR 8: When someone (e.g., a domain expert) or something (e.g., an application) looks up a URI, we must provide useful information, i.e., interpretable data for humans and machines.

• FR 9: We must include links to other URIs, so that they can discover more things, i.e., other schemes, mappings, SPARQL endpoints, etc.

For the design of our mapping model, which provides the primitives to represent mappings among metadata models, this implies that all metadata models and their elements, as well as all elementary mapping building blocks, must have URIs assigned. The responses returned when dereferencing URIs must deliver a result appropriate for the requesting client. Content Negotiation10, a built-in HTTP feature, is a suitable mechanism for delivering various representations of the same resource when dereferencing an HTTP URI.
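As an illustration of FR 6 to FR 9, the following Turtle fragment sketches the kind of description that dereferencing the HTTP URI of a mapping specification might return; the ex: namespace, the resource names, and the chosen properties are assumptions for illustration only, not a normative format.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/mapping/onb-dc#> .

# hypothetical response when ex:titleMapping is dereferenced via HTTP
ex:titleMapping
    a ex:PropertyMapping ;                          # machine-interpretable typing (FR 8)
    rdfs:comment "Maps onb:title to dc:title." ;    # human-readable documentation (FR 8)
    rdfs:seeAlso <http://example.org/schema/onb> ,  # links to further resources (FR 9)
                 <http://purl.org/dc/elements/1.1/> .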

Generic Mapping Model — Flexible Language Binding

A mapping model must provide the primitives a domain expert requires to reconcile the semantic and structural heterogeneities on the metadata model and instance level. This calls for a model that allows the definition of mapping relationships between schema elements and also considers the metadata instance level. One can either define a separate model for each schema definition language or follow a generic approach that provides mapping primitives on an abstract level. The advantage of the generic approach is the possibility to extend the model in order to provide mapping support for a variety of languages. The clear disadvantage is the increased complexity and additional design effort compared to language-specific mapping models. Although we will mainly focus on mappings between RDFS schemes, we follow the generic approach in order to provide future extensibility. This leads to the following further functional requirements:

• FR 10: A generic mapping model that captures all the necessary constituents for expressing mappings: the source and target schemes to be mapped with all their elements, the instance metadata descriptions, and the mapping relationships among schema elements together with their instance transformation functions. Since such a model must in fact be able to represent metadata models as well as mappings among them, we denote it as abstract mapping model.

• FR 11: The abstract mapping model must be flexible enough to support bindings to concrete M2 schema definition languages.

4.4.3 Mapping Execution

After mappings have been specified in terms of mapping specifications, we need to translate them into a machine-interpretable and executable form, where that form largely depends on the applied language binding. When mappings are defined between XML metadata schemes, one must be able to generate either XSL stylesheets for transforming metadata from one representation to another, or XQuery code, which allows querying XML data sources in a view-like manner. For mappings among RDF metadata, one must be able to generate SPARQL CONSTRUCT queries that retrieve metadata from an RDF source and return them, also in a view-like manner, in the representation required by the requesting agent. In a second step, a mapping mechanism must provide the means to execute mappings at run-time. If mappings are translated into executable queries, one must provide mediation endpoints that accept these queries and return the results from the integrated data sources. In a Web-based mapping environment, such an endpoint is usually realised as a Web application that is accessible via a URI and provides the necessary query processing capabilities. We can summarise the mapping execution requirements as follows (an illustrative query sketch follows the list):

10 A detailed specification for content negotiation is available at: http://www.ietf.org/rfc/rfc2295.txt

• FR 12: It must be possible to translate mapping specifications into language-specific queries or transformation stylesheets.

• FR 13: The results of such a translation must be executable by Web-based mediation endpoints.
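As an illustrative sketch of FR 12, and assuming a simple mapping between onb:title and dc:title, the translation could yield a SPARQL CONSTRUCT query of the following form; mappings involving instance transformation functions would additionally require function evaluation during execution.

PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX onb: <http://www.example1.com/schema/onb#>

# deliver ONB metadata in the Dublin Core representation, in a view-like manner
CONSTRUCT { ?image dc:title ?title }
WHERE     { ?image onb:title ?title }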

4.4.4 Mapping Maintenance

A mapping registry supports interoperability efforts by enabling the reuse of existing metadata models and mappings. Its basic task is to provide access to all models and mappings submitted by domain experts. The Web can fulfil the functional requirements of a mapping registry: it can organise all mapping-relevant information within a certain URI space and deliver models as well as mapping elements whenever their URIs are dereferenced. Since metadata mappings represent valuable information for both humans and machines, a mapping registry must be able to deliver registry information in various formats. Humans typically use Web browsers to view information represented in (X)HTML; machines require machine-processable formats such as RDF. Thus, the mapping registry must be able to provide both. We can define the following functional requirements for the mapping maintenance phase:

• FR 14: We must provide a Web-accessible mapping registry that maintains references to known mapping specifications and corresponding metadata schemes.

• FR 15: The information exposed by the mapping registry must be provided in a human- and machine-readable format.
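A registry entry could itself be exposed as RDF. The following Turtle fragment is only a sketch under the assumption of a hypothetical reg: vocabulary; it is not the registry format specified in this thesis. Delivering such an entry as (X)HTML for browsers and as RDF for applications would satisfy FR 15.

@prefix reg: <http://example.org/registry#> .
@prefix ex:  <http://example.org/mapping/onb-dc#> .

# hypothetical registry entry referencing a mapping specification (FR 14)
<http://example.org/registry/onb-dc>
    a reg:MappingEntry ;
    reg:mapping        ex:titleMapping ;
    reg:sourceSchema   <http://example.org/schema/onb> ;
    reg:targetSchema   <http://purl.org/dc/elements/1.1/> ;
    reg:sparqlEndpoint <http://example.org/sparql> .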

4.5 Summary

In this chapter, we have first defined the high-level goals our Web-based metadata integration architecture must achieve. After a discussion of important technical properties of our architecture, such as the adherence to the open-world assumption or the non-updatability of metadata, we have described its main technical components and the technologies we will apply for realising them. In order to illustrate the interactions between the domain expert and the proposed metadata integration architecture as well as the internal processing steps, we have described the metadata integration workflow our architecture follows. As a result of this chapter, and as input for the subsequent chapters, we have defined a set of functional requirements, which we must consider in the design and the implementation of our Web-based metadata integration architecture. The main difference between the proposed metadata integration architecture and existing mapping solutions lies in its Web focus. We do not provide another heavy-weight standalone mapping suite that can reconcile heterogeneities among closed-world systems (e.g., RDBMS), but build an open, light-weight mapping solution that integrates with the architecture of the World Wide Web.

Chapter 5

Abstract Mapping Model

In the previous chapter, we have presented the principal properties of our Web-based metadata integration approach and derived a set of mapping-related requirements that influence the design of our mapping solution. Here we elaborate on the mapping model that builds the core of the mapping mechanism we provide in order to deal with structural and semantic metadata heterogeneities. It defines the machine-internal representation — the abstract syntax — of the primitives we provide for expressing mapping relationships between distinct, incompatible schemes. The extent to which heterogeneities can be resolved largely depends on the expressiveness of the supported language primitives. If, for instance, a mapping model does not provide instance transformation functions, instance-level heterogeneities such as terminological mismatches or scaling/unit conflicts cannot be resolved. In this chapter, we define and specify an abstract mapping model that is independent of any schema definition language and provides the basis for language-specific extensions. After outlining our overall approach in Section 5.1, we formally describe the generic metadata model for representing the source and target metadata schemes to be mapped in Section 5.2 and provide a formal specification in Section 5.3. Thereafter, in Section 5.4, we extend the generic metadata model with mapping capabilities and provide a formal specification for the abstract mapping model. Finally, in Section 5.5, we describe the behavioural aspects of the abstract model in order to reflect the four mapping phases described in Section 3.1.

5.1 A Generic Mapping Approach

We can identify two main requirements an abstract mapping model must cover:

1. It must provide the primitives for expressing mapping relationships between the source and target schemes to be mapped.

2. It must be possible to hook these mapping relationships to schema elements expressed in a certain schema definition language.

There are two possibilities to meet these requirements: either one builds an abstract mapping model on top of the abstract syntax of a specific schema definition language (e.g., XMLS, RDF/S, SQL-DDL) and thus extends a certain language with mapping capabilities, or one follows a generic approach, meaning that on an abstract level, the mapping model is capable of representing mappings between the elements of a generic data model. With language-specific extensions it is possible to provide language-specific mapping primitives. We follow the generic approach and define an abstract mapping model that is generic in the sense that it embraces the common elements required for mapping any kind of metadata schema. For mapping schemes expressed in a certain schema definition language, it requires extensions that go along with language-specific characteristics.

In RDF/S or OWL, for instance, properties are handled as first-class objects and can be defined outside the scope of a class. This is not possible when expressing metadata using the relational model, where the counterparts of properties, called attributes, must always be expressed within the scope of a relation. Language-specific extensions of the generic mapping model can take these issues into account. Figure 5.1 illustrates the relationship between the generic mapping model and the language-specific extensions.

Figure 5.1: The generic mapping model with language-specific extensions (RDF/S, XML Schema, and RDB mapping models as extensions of the generic mapping model)

5.2 Representing Source and Target Metadata

Graph database models [AG08] are the predominant formalism for representing metadata in integration contexts. A graph is a data structure consisting of a collection of vertices (nodes) and edges that connect pairs of vertices. Especially directed graphs with labelled edges, i.e., directed labelled graphs, have found their application in the area of data integration, as for instance in the semi-structured data model [Abi97]. A directed labelled graph G is a triple G = <V, E, L>, where V is a set of distinguishable vertices, E the set of edges, and L the set of distinguishable labels that are assigned to the edges. Data models such as RDF/S are based on graph data models by their nature. Since the hierarchical tree model can be considered a specialisation of a graph, XML, a semi-structured data model organised in a hierarchical fashion, can also be mapped to a graph. The prevalent relational model can, as [Cyg05] shows, also be mapped to a graph structure. Our approach of representing metadata information objects follows the already well-established principle of using directed labelled graphs as data model. For covering the metadata building blocks on each abstraction level (see Section 2.2.2), we introduce a multi-layered, directed labelled graph model, where the model on each abstraction level is represented in terms of a graph G. The elements of a graph on a certain level are instances of the graph elements residing one layer above. This can be expressed by associating the corresponding graph elements using a special edge having the label type. Regarding metadata information objects, we can identify three types of graphs:

• The language graph represents the abstract syntax of a schema definition language. In other words, the language graph represents the elements and the structure of the metadata meta-model.

• The schema graph constitutes the elements and the structure of a metadata schema, thus the metadata model. These elements are instances of the elements defined in the language graph.

• The instance graph contains the elements and content values forming a metadata instance description. They are instances of the elements defined in the schema graph.

In Figure 5.2, we sketch an excerpt of the illustrative metadata description presented in Section 2.1 and arrange its building blocks using a multi-layer directed labelled graph approach:

Figure 5.2: XML metadata represented as a three-layered, directed labelled graph (M2 language graph, M1 model graph, M0 instance graph)

The instance graph contains a distinguishable vertex (&1) representing an instance of a PersonName, which is connected via two directed labelled edges (FamilyName, GivenName) with two vertices (Doe and John, respectively) containing instance values. To distinguish between the two kinds of vertices, we denote each distinguishable vertex as a Resource Node and a vertex containing a value as a Literal Node. The resource nodes and labelled edges in the instance graph are connected to the model graph via type relationships. The literal nodes have datatype relationships to datatypes defined in the model or language graph. The model graph defines the metadata schema elements: PersonName, FamilyName, GivenName, String. Assuming that the structure and semantics of the schema graph are expressed in XML Schema, the labels of the edges connecting these elements and building up the schema structure are defined by the XML Schema abstract syntax or component model [W3C06]. The xs:type relationship within the model graph is XML Schema-specific and assigns simple type, complex type, or data type definitions to elements. The dotted type relationships also have instance-of semantics and connect the elements in the model graph with those in the language graph.
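Written down as plain triples, the instance-graph excerpt and its type link into the model graph could be sketched as follows; the ex: namespace and the node identifier ex:n1 are illustrative assumptions, and the xs:type edges of the model layer are omitted.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/persons#> .

# M0 instance graph: resource node &1 with two literal-valued edges
ex:n1 ex:FamilyName "Doe" ;
      ex:GivenName  "John" ;
      rdf:type      ex:PersonName .   # type link into the M1 model graph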

5.3 Generic Graph Model Specification

When following a generic mapping approach, we require two types of data models: one for representing the metadata schemes to be mapped, and another one for reflecting the abstract mapping model. Our goal is to build both types of data models on a common core data model that resembles the structure of a directed labelled graph. Figure 5.3 illustrates the main components of that model from a static perspective in UML notation. For the model elements we adopted the naming conventions used in the RDF abstract syntax, which is also based on a graph data model. An RDF graph is a collection of triples, each consisting of a subject, a predicate, and an object [W3C04e].

Figure 5.3: The generic data model from a static perspective (UML class diagram relating Graph, Triple, Node, Resource, and Literal)

A Graph is uniquely identified by a Uniform Resource Identifier (URI) and contains an unordered set of triples. Each Triple represents two nodes within a graph, connected by a directed, labelled edge. The direction of an edge is given by the order of the nodes within a triple: the subject association references an edge's origin node, the object association its target node. The label of an edge is a resource, referenced by the predicate association. We distinguish between various kinds of Nodes: Resource nodes are identified by URIs, which makes them distinguishable and machine-interpretable. Literal nodes represent human-readable labels. Within a triple, only resources can serve as subject, which makes them describable by further triples. This is not the case for literals, which can only be the object of a triple. Triple predicates, i.e., labels of edges, are also identified by resources and therefore distinguishable by their URIs. Each resource within a graph has one or many type associations, which refer to the defining resource residing one abstraction level above the respective graph. For each literal, zero or one data type can be defined, where a data type is itself a resource uniquely identified by a URI. If no datatype is assigned, the default data type xsd:string, which is a resource as well, is applied. There are two main differences between the RDF abstract syntax and our generic graph data model: first, we assign URI identifiers to graphs. We require this feature because we want graphs on each meta-level (metadata descriptions, metadata schemes) to be accessible by dereferencing their globally unique identifiers. For RDF there exists an extension called Named Graphs [CBHS05], which extends the RDF abstract syntax in a similar manner and permits a set of triples (i.e., a graph) to be named by a URI. Second, we introduce an explicit type relationship for resources and an explicit datatype relationship for literals. This enables the navigation to the corresponding typing nodes one meta-level above the respective model. In RDF, the rdf:type property fulfils the same purpose for RDF resources. For literals, RDF concatenates the respective datatype directly to the literal's string representation (e.g., "5"^^xsd:integer).

5.3.1 Formal Definition

After the intuitive description given above, we now provide a formal definition of the generic mapping data model. First we define the basic symbols that will be used throughout the rest of this work.

Definition 5.1 [Symbols] Let

• N, with N = R ∪ L and R ∩ L = ∅, be the set of all nodes in a graph,
• R ⊆ N be the set of all resource nodes,
• Θ ⊆ R be the set of all types that can be assigned to resource nodes,
• L ⊆ N be the set of all literal nodes,
• D ⊆ R be the set of all data types that can be assigned to literal nodes,
• T be the set of all triples, and
• G be the set of all graphs.

Further, let

• STR be the set of strings, which are finite sequences of characters from a literal alphabet α, and
• URI ⊂ STR be the set of all Uniform Resource Identifiers (URIs) represented as strings according to [NWG05].

Finally, if we let A be an arbitrary set, we denote by P(A) the powerset of A, i.e., the set of all subsets of A.

Now we can precisely define the model elements that build up the generic mapping model and provide the basis for further language-specific extensions.

Definition 5.2 [Node, Resource, Literal, Triple, Graph]

• A node n ∈ N is a pair n = (k_n, t_n) with (k_n ∈ URI ∧ t_n ∈ Θ) ∨ (k_n ∈ STR ∧ t_n ∈ D).

• A resource r ∈ R is a pair r = (u_r, Θ_r), where u_r ∈ URI denotes the associated identifier and Θ_r ⊆ Θ ∧ Θ_r ≠ ∅ the set of associated types.

• A literal l ∈ L is a pair l = (c_l, d_l), where c_l ∈ STR denotes the literal's string value and d_l ∈ D the associated data type.

• A triple t ∈ T is a triple t = (s_t, p_t, o_t), where s_t ∈ R denotes the subject, p_t ∈ R the predicate, and o_t ∈ N the object of a triple.

• A graph g ∈ G is a pair g = (u_g, T_g), where u_g ∈ URI denotes a graph's unique URI identifier and T_g ⊆ T the unordered set of triples in a graph.

The following constraints enforce that the URIs assigned to resources and graphs are unique and that literals are distinct from resources.

∀a, b ∈ R : u_a = u_b → a = b

∀a, b ∈ G : u_a = u_b → a = b

To access the information represented in terms of the generic graph model we define a set of operators denoted as accessors:

Definition 5.3 [Accessors]

• uri : R ∪ G → URI, defined as uri(x) := u_x, returns the identifier of a resource or a graph.

• type : R → P(Θ), defined as type(r) := Θ_r, returns the set of all types of a resource node.

• value : L → STR, defined as value(l) := c_l, returns the string value of a literal.

• datatype : L → D, defined as datatype(l) := d_l, returns the data type of a literal node.

• subject : T → R, defined as subject(t) := s_t, returns the subject of a triple.

• predicate : T → R, defined as predicate(t) := p_t, returns the predicate of a triple.

• object : T → N, defined as object(t) := o_t, returns the object of a triple.

• triples : G × R × R × N → P(T), defined as triples(g, s, p, o) := {t | t ∈ T_g ∧ uri(subject(t)) = uri(s) ∧ uri(predicate(t)) = uri(p) ∧ (∃o ∈ R | uri(object(t)) = uri(o) ∨ ∃o ∈ L | value(object(t)) = value(o))}, returns all triples within a graph that match the given criteria.
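As a small worked example of Definitions 5.2 and 5.3 (with made-up URIs), consider a graph containing a single triple that assigns a title literal to an image resource:

r_1 = (ex:img1, {ex:Image}), p_1 = (ex:title, {rdf:Property}), l_1 = ("Ski Jump", xsd:string)
t_1 = (r_1, p_1, l_1), g = (ex:graph1, {t_1})

Applying the accessors yields uri(g) = ex:graph1, subject(t_1) = r_1, type(r_1) = {ex:Image}, value(object(t_1)) = "Ski Jump", and datatype(object(t_1)) = xsd:string.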

5.4 Abstract Mapping Model Specification

A mapping model defines the language primitives required to reconcile the structural and semantic heterogeneities among metadata information objects. In our design, we let mappings also be directed labelled graphs and therefore construct the abstract mapping model on the basis of the previously specified generic graph model. Since mapping requires semantic constructs for the definition of mapping relationships, we introduce the mapping model's elements as extensions of the elements defined in the generic graph model. Therefore, we can define a mapping model M as a specialisation of a directed labelled graph G, where M is a triple M = <V_M, E, L_M>, with V_M being the set of mapping elements, E the set of edges between distinct nodes, and L_M the set of labels taken from a fixed set of labels expressing mapping semantics. In Figure 5.4, we present the generic mapping model from a static perspective using UML notation. A MappingModel is a specialisation of a Graph and reconciles the heterogeneities among exactly one source and one target graph. It comprises a set of mapping elements, where each MappingElement is a specialisation of a Resource and defines a mapping relationship between one or more distinct source resources and one target resource. This implies that a mapping element can represent either a 1:1 or an n:1 mapping relationship1. For reconciling the heterogeneities on the instance level, a mapping element can contain an instance transformation function, represented by the class Function. A function takes an ordered set, i.e., a list, of nodes as argument and returns a single node as result. Since a function is a specialisation of a Resource, which in turn is a specialisation of a Node, a function can also take other functions as input arguments, which allows the nesting of instance transformation functions. A mapping element carries an expression, which is a resource that defines the semantics of a mapping element. As part of the abstract model, we provide the set of mapping expressions we have already defined in Section 3.1. They have the following meaning:

• equivalent: the interpretation of the source and target resources is equivalent.

1 The cardinality of mapping relationships has been discussed in Section 3.1.

Figure 5.4: The abstract mapping model from a static perspective (extending the generic data model with MappingModel, MappingElement, and Function)

• sourceInclude: the interpretation of the source resource(s) includes the interpretation of the target resource.

• targetInclude: the interpretation of the target resource includes the interpretation of the source resource(s).

• overlap: the interpretations of the source and target resources overlap but do not include each other.

We do not require an exclude expression, because we can assume that, if the interpretations of two elements exclude each other, no mapping relationship will be defined among these elements.

5.4.1 Formal Definition

In the following, we extend the definition of the generic graph data model with mapping semantics. First we introduce the required additional symbols and then we provide a definition for the abstract mapping model's main conceptual entities.

Definition 5.4 [Symbols] Let

• M ⊆ G be the set of all mapping models,
• FUN ⊆ R be the set of all instance transformation functions,
• P ⊆ R be the set of all mapping expressions, and
• E ⊆ R be the set of all mapping elements.

Definition 5.5 [Mapping Model, Function, Mapping Element]

• A mapping model m ∈ M is a 5-tuple m = (u_m, T_m, s_m, t_m, E_m), where u_m ∈ URI denotes the mapping model's unique URI identifier, T_m ⊆ T the unordered set of triples in a mapping model, s_m ∈ G the source graph, t_m ∈ G the target graph, and E_m ⊆ E the set of associated mapping elements.

• A function f ∈ FUN is a quadruple f = (u_f, θ_f, A_f, n_f), where u_f ∈ URI denotes the instance transformation function's unique identifier, θ_f ∈ Θ ∧ θ_f = {Function} its fixed type, A_f ⊆ N the ordered set of argument nodes, and n_f ∈ N the result node of an instance transformation function.

• A mapping element e ∈ E is a 6-tuple e = (u_e, θ_e, S_e, t_e, f_e, p_e), where u_e ∈ URI denotes a mapping element's unique identifier, θ_e ∈ Θ ∧ θ_e = {MappingElement} its fixed, mapping-specific type identified by a URI, S_e ⊆ R ∧ S_e ≠ ∅ the set of source elements, t_e ∈ R the target element, f_e ∈ FUN the associated instance transformation function, and p_e ∈ P the associated mapping expression.

In order to be able to access the information provided by a mapping model, we define the following accessors:

Definition 5.6 [Accessors]

• sourceGraph : M → G, defined as sourceGraph(m) := s_m, provides access to the source graph of a mapping model.

• targetGraph : M → G, defined as targetGraph(m) := t_m, provides access to the target graph of a mapping model.

• elements : M → P(E), defined as elements(m) := E_m, returns the set of associated mapping elements.

• arguments : FUN → P(N), defined as arguments(f) := A_f, returns the ordered set of all arguments of a function.

• result : FUN → N, defined as result(f) := n_f, returns the result node of an instance transformation function.

• sourceElements : E → P(R), defined as sourceElements(e) := S_e, returns a mapping element's source elements.

• targetElement : E → R, defined as targetElement(e) := t_e, returns a mapping element's target element.

• expression : E → P, defined as expression(e) := p_e, returns a mapping element's mapping expression.

• function : E → FUN, defined as function(e) := f_e, returns the instance transformation function assigned to a mapping element.
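To illustrate Definition 5.5 with made-up URIs, a mapping element that maps two source properties onto one target property and concatenates their instance values could be written as:

f_concat = (ex:concatNames, {Function}, (ex:lastname, ", ", ex:firstname), ex:creator)
e = (ex:name2creator, {MappingElement}, {ex:firstname, ex:lastname}, ex:creator, f_concat, targetInclude)

Here sourceElements(e) = {ex:firstname, ex:lastname}, targetElement(e) = ex:creator, function(e) = f_concat, and expression(e) = targetInclude; result(f_concat) = ex:creator is the node into which the concatenated instance values are written.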

5.5 The Dynamic Aspects of the Generic Mapping Model

We have defined mapping as being a cyclic process consisting of four subsequent phases: mapping discovery, mapping representation, mapping execution, and mapping maintenance. As shown in Figure 5.5, we can depict that perception in the dynamic, behavioural aspects of the abstract mapping model. The operation findMappings(): MappingElement[] is a core function during the mapping discovery phase and determines a set of mapping elements for a given source and target graph. The matching algorithm used in that process depends on the implementation of that operation on a concrete, language-specific level and is out of the scope of this work.

MappingModel
+ findMappings(): MappingElement[]
+ executeMapping(Graph graph): Graph
+ registerMapping(MappingRegistry registry)

Figure 5.5: Reflecting the four mapping phases in the abstract mapping model

The mapping representation phase does not need to be reflected in the dynamic behaviour of the mapping model. The model itself, with its previously specified elements, provides the required primitives for specifying mappings. Consequently, if no dynamic behaviour is implemented, our mapping approach supports at least the mapping representation phase.

As discussed earlier, the characteristics of the mapping execution phase largely depend on the mapping model's language binding. If it is defined for XML, one can generate XQuery code or XSL stylesheets from mapping specifications. For RDFS or OWL mappings one can generate SPARQL queries. Therefore, mapping execution operations can hardly be generalised and should be provided at the level of a concrete language binding (see Figure 5.1). Nevertheless, a minimum requirement that all concrete, language-specific mapping model implementations should fulfil is the possibility to transform a source instance graph into a target instance graph according to the mapping specification. This kind of transformation is covered by the operation with the signature executeMapping(Graph): Graph.

The necessary features for mapping maintenance are usually provided by a mapping registry, which keeps track of available schemes and mappings between them. Therefore, the mapping model defines an operation registerMapping(MappingRegistry registry), which publishes the respective mapping model in a mapping registry. If we assume that the mapping registry is Web-based, a MappingRegistry object passed as parameter to such an operation should contain all relevant parameters (e.g., URL endpoint, security credentials) required for publishing mappings on the Web.

5.6 Summary

In this chapter, we have illustrated and specified our abstract mapping model. It is based on a generic directed labelled graph data model that is flexible enough to represent metadata models on any abstraction level. The mapping model itself is a specialisation of that graph and allows for the expression of mapping-specific semantics, i.e., to model mapping relationships between source and target models. The dynamic aspects of the abstract mapping model reflect the four mapping phases. The abstract mapping model already provides the basic building blocks required for integrating mappings with the Web architecture: each graph, all nodes within a graph except literals, and therefore also all mapping specifications including their mapping elements and instance transformation functions are identified by URIs. In order to apply our Web-based integration architecture in a real-world integration scenario, we must bind the abstract mapping model to a concrete schema definition language that allows us to treat URIs as dereferenceable HTTP URIs.

Part III

Implementation and Proof of Concept


Chapter 6

An RDFS Binding of the Mapping Model

The abstract mapping model presented in the previous chapter defines the main concepts that are required in order to deal with semantic and structural heterogeneities among incompatible metadata objects. Due to its generic nature, it cannot be directly applied in a productive mapping solution because it is not bound to a concrete M2 schema definition language. In this chapter, we present a language-specific RDFS binding of the abstract mapping model described in the previous chapter. Step by step, we extend the model with language-specific elements to allow the expression of mapping relationships among metadata schemes defined in RDFS. We start this chapter by giving a brief overview of RDFS in Section 6.1 and then, in Section 6.2, describe how existing metadata schemes, such as those presented in the illustrative examples in Section 2.1, can be lifted to the level of RDFS. In Section 6.3, we provide an informal specification of the first step of the RDFS-specific extension of the abstract mapping model. In Section 6.4, as a second step of the RDFS-specific extension of the abstract mapping model, we describe how RDFS mappings can be processed in the mapping execution phase. In Section 6.5, we discuss how an RDFS-tailored mapping registry can complete the cyclic mapping process by supporting the mapping maintenance phase. Finally, in Section 6.6, we focus on the implementation details of the RDFS binding of the abstract mapping model.

6.1 RDFS Overview

The Resource Description Framework (RDF) is a model for representing metadata descriptions on the Web. RDF Schema (RDFS) is the accompanying language that allows the definition of M1-level metadata schemes1. RDFS is a semantic extension of RDF and provides language primitives to describe groups of related resources and their relationships [W3C04a]. The primitives themselves are also RDF resources and therefore uniquely identified by URIs. RDFS metadata schema descriptions comprise classes and properties. Classes define groups of semantically related resources (e.g., Person, Event), and the resources associated with a class, called the extension of a class, represent its instances. This relationship is expressed using the rdf:type property. A class can be a subclass of (rdfs:subClassOf) multiple other classes, which means that all instances of a certain class are also instances of all its super-classes. Datatypes (e.g., integer) are also classes, and the instances of a datatype (e.g., the integer value 5) are the members of the value space of the respective datatype.

1In the context of RDF, metadata schemes defined in terms of RDFS are called vocabularies.


Properties in RDFS can be compared with attributes and relationships in other schema definition languages (e.g., the object-oriented model) but take a different role because they are first-class objects and are not defined within the scope of a class. In order to define an RDFS class in terms of the properties its instances may have, properties can be assigned to a class using the rdfs:domain property. This permits the dynamic extension of classes with properties and the assignment of the same property to multiple classes. When a property is assigned to a class using the rdfs:domain property, then any instance resource having that property assigned is also an instance of that class. The rdfs:range property assigned to a property states that the values of that property are instances of one or more classes (e.g., integer values). Furthermore, a property can be defined as a sub-property (rdfs:subPropertyOf) of another property, meaning that all instance resources related by one property are also related by the other. Since RDFS is an extension of RDF, metadata schemes defined in RDFS can be serialised using any available RDF syntax. The RDF standard proposes a serialisation into XML, called RDF/XML [W3C04d]. In practice, however, other syntaxes such as N3 [BL98] or Triple [SD02] are frequently used due to better readability and known shortcomings2 of RDF/XML.

2 A detailed discussion of the major problems with RDF/XML is available at: http://www.dajobe.org/2003/11/new-syntaxes-rdf/paper.html
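A minimal, illustrative N3 fragment with a made-up ex: namespace shows these primitives in combination:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/schema#> .

ex:Agent  a rdfs:Class .
ex:Person a rdfs:Class ;
    rdfs:subClassOf ex:Agent .      # every Person is also an Agent

ex:name a rdf:Property ;
    rdfs:domain ex:Agent ;          # resources carrying ex:name are Agents
    rdfs:range  rdfs:Literal .

ex:familyName a rdf:Property ;
    rdfs:subPropertyOf ex:name .    # familyName statements imply name statements

ex:johnDoe a ex:Person ;            # rdf:type, written with the "a" keyword
    ex:familyName "Doe" .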

6.2 Lifting and Normalising Metadata Schemes to RDFS

As discussed in Section 3.1, metadata mapping postulates that all metadata schemes to be mapped are expressed in the same schema definition language. In the following, we lift and normalise the schema definitions presented in the illustrative examples (see Section 2.1) to the level of RDF Schema and use these schemes in the subsequent sections for a stepwise introduction to the RDFS mapping model. The precise algorithm for lifting models expressed in other schema definition languages is out of the scope of this work. Here, we refer to the work conducted by others on that topic (see Section 2.4.3). For the following examples, we have chosen the N3 syntax for representing schema excerpts in a serialised format. Here we only concentrate on the definition of classes and properties and omit any additional schema information such as human-readable documentation, which is usually assigned to classes and properties using the rdfs:label or rdfs:comment properties. The first example is the RDFS representation of the metadata schema used in the XML program information retrieved from the BBC TV-Anytime Service. When expressing the adjacent metadata schema, we can omit purely structural elements such as ProgramInformationTable and concentrate on the elements that express real-world semantics. In fact, the example describes a certain Program having, amongst others, a title, belonging to a certain Genre with an associated genreName, and giving credits to a Person having a givenName and a familyName. Example 6.1 illustrates how this metadata schema can be expressed in RDFS:

Example 6.1 Parts of the TV-Anytime schema (TVA) lifted to RDFS

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tva: <http://www.example.com/schema/tva#> .

tva:Program a rdfs:Class .

tva:title a rdf:Property ;
    rdfs:domain tva:Program ;
    rdfs:range xsd:string .

tva:genre a rdf:Property ;
    rdfs:domain tva:Program ;
    rdfs:range tva:Genre .

tva:credits a rdf:Property ;
    rdfs:domain tva:Program ;
    rdfs:range tva:Person .

tva:Genre a rdfs:Class .

tva:genreName a rdf:Property ;
    rdfs:domain tva:Genre ;
    rdfs:range xsd:string .

tva:Person a rdfs:Class .

tva:givenName a rdf:Property ;
    rdfs:domain tva:Person ;
    rdfs:range xsd:string .

tva:familyName a rdf:Property ;
    rdfs:domain tva:Person ;
    rdfs:range xsd:string .

The next example illustrates the RDFS representation of the ONB metadata schema used for describing digital images. From the tables of the relational database where the description is stored, we can derive the schema information, which can then be represented in RDFS. In Example 6.2 we show the RDFS representation of the ImageData table with some of its attributes. In fact, the metadata describe an Image having, amongst others, a title and an author. The image depicts a certain Person having a firstname and a lastname.

Example 6.2 Parts of the Austrian National library (ONB) schema lifted to RDFS

@prefix onb: <http://www.example1.com/schema/onb#> .

onb:Image a rdfs:Class .

onb:title a rdf:Property ;
    rdfs:domain onb:Image ;
    rdfs:range xsd:string .

onb:author a rdf:Property ;
    rdfs:domain onb:Image ;
    rdfs:range xsd:string .

onb:depictedPerson a rdf:Property ;
    rdfs:domain onb:Image ;
    rdfs:range onb:Person .

onb:Person a rdfs:Class .

onb:firstname a rdf:Property ;
    rdfs:domain onb:Person ;
    rdfs:range xsd:string .

onb:lastname a rdf:Property ;
    rdfs:domain onb:Person ;
    rdfs:range xsd:string .

The third example is an RDFS representation of the elements defined in the Dublin Core metadata standard. Since a definition of its elements in RDFS is already available as part of the Dublin Core Element Set specification3, Example 6.3 does not mirror the complete RDFS element definitions but just focuses on the elements required in our further discussion:

3 http://dublincore.org/2008/01/14/dcelements.rdf

Example 6.3 Parts of the Dublin Core schema (DC) represented in RDFS

@prefix dc: <http://purl.org/dc/elements/1.1/> .

dc:title a rdf:Property .
dc:creator a rdf:Property .
dc:format a rdf:Property .
dc:coverage a rdf:Property .
dc:date a rdf:Property .
dc:description a rdf:Property .
dc:type a rdf:Property .
dc:subject a rdf:Property .

6.3 RDFS Mapping Model Specification

Mapping incompatible RDFS metadata schemes requires the abstract mapping model to be extended by elements that take RDFS-specific language primitives into account. In the following, we will introduce these extensions with respect to existing heterogeneities in the above examples.

6.3.1 ClassMapping

We can see that Example 6.1 and Example 6.2 both define a class Person but identify them with distinct URIs. Hence, we can say that there exists an identification conflict between two semantically related classes. In order to reconcile that kind of conflict, we extend the abstract mapping model by an element ClassMapping, which is a subclass of MappingElement and restricts the domain and range of a mapping relationship to instances of rdfs:Class, i.e., to class declarations in RDFS metadata schemes. Classes in RDFS — as in most other schema definition languages — can be arranged in a subsumption hierarchy using the rdfs:subClassOf property, which means that all subclasses of a certain class inherit all its properties. Hence, if a ClassMapping is established between a class A and a class B, and C is a subclass of A, then there is also a mapping between class B and C.

Example 6.4 shows a mapping between the two Person classes in N3 notation. The namespaces used in the example have the following meanings: http://www.example1.com/schema/onb# and http://www.example.com/schema/tva# identify the context of the source and target schemes, the abstract mapping model is defined in the namespace http://www.mediaspaces.info/mapping/abstract_mapping#, the RDFS mapping elements are defined in the namespace http://www.mediaspaces.info/mapping/rdfs_mapping#, and the concrete mapping instance, i.e., the mapping between the two schemes, is bound to a specific context identified by the namespace http://www.institution.com/mapping/onb_tva#.

Example 6.4 Sample ClassMapping

@prefix onb: <http://www.example1.com/schema/onb#> .
@prefix tva: <http://www.example.com/schema/tva#> .
@prefix am: <http://www.mediaspaces.info/mapping/abstract_mapping#> .
@prefix mm: <http://www.mediaspaces.info/mapping/rdfs_mapping#> .
@prefix map: <http://www.institution.com/mapping/onb_tva#> .

map:Person2Person a mm:ClassMapping ;
    am:expression am:equivalent ;
    am:sourceElement onb:Person ;
    am:targetElement tva:Person .

6.3.2 PropertyMapping

Related properties with different URIs and names must also be mapped to each other. In our examples, the ONB schema defines a property author while the DC schema represents the semantically equivalent information using the property creator. To resolve that kind of conflict, we introduce another mapping element PropertyMapping, which is also a subclass of MappingElement and restricts the domain and range of a mapping relationship to instances of rdf:Property. If a mapping is defined between the properties a and b, and the property c is an rdfs:subPropertyOf of a, then there also exists a mapping relationship between properties c and b. Properties in RDFS are defined as first-class objects and are semantically bound to classes using the rdfs:domain and rdfs:range properties. This allows properties to be related with multiple classes, which means that resources having a certain property are also instances of multiple classes. In our example, the property author could also have other classes, such as Book or Composition, as domain. In that case, it is necessary to define property mappings in the context of a certain class, because the author of a book might be related differently to a target model than the author of an image. Therefore, we introduce the properties sourceClassContext and targetClassContext, both with rdfs:domain mm:PropertyMapping and rdfs:range rdfs:Class.

So far, a PropertyMapping can only reconcile naming, identification and terminological conflicts between distinct properties. For more complex incompatibilities such as multilateral correspondences or scaling/unit conflicts, we must provide the possibility to assign instance transformation functions to property mappings. An instance transformation function is an instance of a Function, as defined in the abstract mapping model. It can provide any computational functionality for a set of input nodes, and is identified via a unique URI. In practice, mapping experts can resort to existing functions and operators such as those defined in the XQuery and XPath Functions and Operators specification [W3C07]. If none of the existing functions provides the desired functionality, they can define and implement additional functions with arbitrary functional behaviour. One could, for instance, even define mapping functions that incorporate information from external services (e.g., data providers, online thesauri) into the mapping process.

Example 6.5 illustrates how the properties onb:firstname and onb:lastname, both defined as part of the ONB schema, can be mapped to the DC property dc:creator in the context of the ONB source schema class onb:Person. According to the semantic definition of dc:creator4, the instance values of the ONB properties (e.g., firstname=Willy, lastname=Bogner) must be concatenated into a single string separated by a comma and a space (e.g., Bogner, Willy). We can apply the XQuery function fn:concat to achieve such an instance transformation.

4 The Dublin Core Usage Guide (http://dublincore.org/documents/usageguide/elements.shtml) says that personal names should be listed surname first, followed by forename or given name.

Example 6.5 Sample PropertyMapping between ONB and DC properties with class context and instance transformation function

@prefix onb: <http://www.example1.com/schema/onb#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix am: <http://www.mediaspaces.info/mapping/abstract_mapping#> .
@prefix mm: <http://www.mediaspaces.info/mapping/rdfs_mapping#> .
@prefix map: <http://www.institution.com/mapping/onb_tva#> .
@prefix fn: <http://www.w3.org/2005/xpath-functions#> .

map:fnln2creator a mm:PropertyMapping ;
    am:expression am:targetInclude ;
    mm:sourceClassContext onb:Person ;
    am:sourceElement onb:firstname ;
    am:sourceElement onb:lastname ;
    am:targetElement dc:creator ;
    am:transFunction map:fnlnConcat .

map:fnlnConcat a am:Function ;
    am:URI fn:concat ;
    am:argument (onb:lastname ", " onb:firstname) ;
    am:result dc:creator .

The power of instance transformation functions lies in the possibility of nesting functions, i.e., in the use of functions as input of other functions. In that way, mapping experts have the possibility to include complex computational behaviour in mappings. Example 6.6 illustrates how the mapping in Example 6.5 could be defined the other way round, thus from the DC to the ONB schema5. This requires the instances of onb:firstname and onb:lastname to be extracted from a comma-separated instance value of dc:creator. To achieve that we apply the following functions:

• fn:substring(string, start, length), fn:substring(string, start): is a pre-defined XQuery/XPath function that returns the substring of a given string, starting at the start position, with the specified length. If the length parameter is omitted, the function returns the substring from the start position to the end. The first character in a string has index one. For example:

– fn:substring("Bogner, Willy", 1, 6) returns Bogner.

– fn:substring("Bogner, Willy", 9) returns Willy.

• op:numeric-add(arg1, arg2) and op:numeric-subtract(arg1, arg2): are both pre-defined XQuery/XPath operators, where numeric-add returns the arithmetic sum of its operands (arg1 + arg2) and numeric-subtract returns the arithmetic difference of its operands (arg1 - arg2).

5 In this and also in the following examples we omit namespace declarations that have already been introduced in previous examples.

• fx:index-of-string(string, char): is a user-defined function, which we have introduced ourselves and which is not part of the XQuery/XPath functions and operators. It determines the index of the first occurrence of a certain character (char) in a given string. Like the fn:substring function, fx:index-of-string starts counting the characters in a given string from index one (a possible XQuery definition is sketched after this list). For example:

– fx:index-of-string("Bogner, Willy", ",") returns 7.
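The thesis leaves the implementation of fx:index-of-string open; a minimal XQuery sketch, assuming a hypothetical namespace URI for the fx prefix and returning 0 when the character does not occur in the string, could look as follows:

declare namespace fx = "http://example.org/functions/ext";  (: hypothetical namespace, for illustration only :)

(: Returns the 1-based index of the first occurrence of $char in $str, or 0 if $char is absent. :)
declare function fx:index-of-string($str as xs:string, $char as xs:string) as xs:integer {
  if (contains($str, $char))
  then string-length(substring-before($str, $char)) + 1
  else 0
};

Applied to "Bogner, Willy" and ",", substring-before yields "Bogner" (length 6), so the function returns 7, matching the value used above.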

Example 6.6 Sample PropertyMapping between DC and ONB properties with class context and instance transformation function

@prefix fx: .
@prefix op: .
@prefix map: .

map:creator2ln a mm:PropertyMapping;
    am:expression am:sourceInclude;
    am:sourceElement dc:creator;
    am:targetElement onb:lastName;
    mm:targetClassContext onb:Person;
    am:transFunction map:lnExtract;
    .

map:creator2fn a mm:PropertyMapping;
    am:expression am:sourceInclude;
    am:sourceElement dc:creator;
    am:targetElement onb:firstName;
    mm:targetClassContext onb:Person;
    am:transFunction map:fnExtract;
    .

map:lnExtract a am:Function;
    am:URI fn:substring;
    am:argument (dc:creator, "1", map:lnEnd);
    am:result onb:lastname;
    .

map:lnEnd a am:Function;
    am:URI op:numeric-subtract;
    am:argument (map:commaIndex, "1");
    am:result xsd:int;
    .

map:fnExtract a am:Function;
    am:URI fn:substring;
    am:argument (dc:creator, map:fnStart);
    am:result onb:firstname;
    .

map:fnStart a am:Function;
    am:URI op:numeric-add;
    am:argument (map:commaIndex, "2");
    am:result xsd:int;
    .

map:commaIndex a am:Function;
    am:URI fx:index-of-string;
    am:argument (dc:creator, ",");
    am:result xsd:int;
    .

Example 6.6 also demonstrates the usage of the result relationship that has been defined in the abstract mapping model specification in Section 5.4. The outermost function (e.g., map:lnExtract, map:fnExtract) that is embedded into a PropertyMapping element defines a reference to the target element of a PropertyMapping. Functions that serve as arguments for instance transformation functions and define a certain processing behaviour on node values (e.g., map:lnEnd, map:fnStart, map:commaIndex) define the datatype (e.g., xsd:int) of the function's return value, which will be required for internal processing during the mapping execution phase.

In the previous examples, for each property mapping we have defined either a mm:sourceClassContext or a mm:targetClassContext property, which bind a property mapping specification to a certain class (and its subclasses) either in the source or in the target model. However, if we assume that there already exists a mm:ClassMapping between two models and property mappings need to be defined within the context of these classes, we can directly bind the property mappings to the class mappings and omit any class context declaration. Therefore, we introduce an additional property mm:classMappingContext with rdfs:domain mm:PropertyMapping and rdfs:range mm:ClassMapping. Example 6.7 shows such a declaration in a mapping between the TVA and ONB schema.

Example 6.7 Sample PropertyMapping between ONB and TVA properties bound to the context of a ClassMapping

map:Person2Person a mm:ClassMapping;
    am:expression am:equivalent;
    gm:sourceElement onb:Person;
    gm:targetElement tva:Person;
    .

map:fn2gn a mm:PropertyMapping;
    am:expression am:equivalent;
    gm:sourceElement onb:firstname;
    gm:targetElement tva:givenName;
    mm:classMappingContext map:Person2Person;
    .

map:ln2fn a mm:PropertyMapping;
    am:expression am:equivalent;
    gm:sourceElement onb:lastName;
    gm:targetElement tva:familyName;
    mm:classMappingContext map:Person2Person;
    .

After having introduced the RDFS-specific mapping model by example, we now provide a definition in UML notation in Figure 6.1, leaving out some of the classes and associations that have already been illustrated in Figure 5.4 to maintain readability. The classes Property and Class represent an RDFS-specific extension of our generic graph data model. Both are specialisations of the class Resource and can therefore be identified by URIs and be described with further triples. In order to represent RDFS-specific mapping relationships, we introduce two MappingElement extensions: first, a class ClassMapping and second, a class PropertyMapping, which can either be bound to the context of a set of source classes (sourceClassContext), a set of target classes (targetClassContext), or to the context of already specified ClassMappings via the classMappingContext property. Additionally, the PropertyMapping and ClassMapping classes refine the source and target associations defined by the general MappingElement class: ClassMapping restricts the range of the source and target resources to Class instances, while PropertyMapping restricts the range of the source and target resources to Property instances.

Figure 6.1: An RDFS binding of the abstract mapping model

After having described our proposed RDFS binding of the abstract mapping model, we specify it formally as follows:

Definition 6.1 [Symbols] Let

• CLASS ⊆ R be the set of all RDFS classes as defined in [W3C04a],

• PROPERTY ⊆ R be the set of all RDFS properties as defined in [W3C04a],

• CM ⊆ E be the set of all class mappings, and

• PM ⊆ E be the set of all property mappings.

Definition 6.2 [Class Mapping, Property Mapping]

• A class mapping cm ∈ CM is a 6-tuple cm = (u_cm, θ_cm, S_cm, t_cm, f_cm, p_cm), where u_cm denotes its unique identifier, θ_cm ∈ Θ ∧ θ_cm = {ClassMapping} its fixed type identified by a URI, S_cm ⊆ CLASS the set of source classes, t_cm ∈ CLASS the target class, f_cm = null indicates that instance transformations cannot be applied for class mappings, and p_cm ∈ P the associated mapping expression.

• A property mapping pm ∈ PM is a 9-tuple pm = (u_pm, θ_pm, S_pm, t_pm, f_pm, p_pm, SC_pm, TC_pm, CM_pm), where u_pm denotes its unique identifier, θ_pm ∈ Θ ∧ θ_pm = {PropertyMapping} its fixed type identified by a URI, S_pm ⊆ PROPERTY the set of source properties, t_pm ∈ PROPERTY the target property, f_pm ∈ FUN the associated instance transformation function, p_pm ∈ P the associated mapping expression, SC_pm ⊆ CLASS the set of classes defining a property mapping's source context, TC_pm ⊆ CLASS the set of classes defining a property mapping's target context, and CM_pm ⊆ CM the set of class mappings defining a property mapping's context.

In order to be able to access the information provided by the RDFS-binding of the abstract mapping model, we extend Definition 5.6 by the following accessors:

Definition 6.3 [Accessors]

• sourceClassContext : PM −→ P(CLASS), defined as sourceClassContext(pm) := SC_pm, provides access to the set of classes defining a property mapping's source context.

• targetClassContext : PM −→ P(CLASS), defined as targetClassContext(pm) := TC_pm, provides access to the set of classes defining a property mapping's target context.

• classMappingContext : PM −→ P(CM), defined as classMappingContext(pm) := CM_pm, provides access to the set of class mappings defining a property mapping's class mapping context.

6.3.3 Integrating Mapping Specifications with the Web Architecture

The central idea behind our mapping architecture is that the major artefacts required for integrating metadata from heterogeneous sources become part of the Web. Therefore, schema definitions as well as mapping specifications should become Web-accessible resources that are interpretable by humans and machines. In that way, they become part of an open and inter-linked information network instead of technical specifications that define the structures of closed information systems, such as relational databases.

For schema definitions lifted to the level of RDFS, this implies that they must be published in some Web-accessible location. A straightforward solution is to assign them a dereferencable URL namespace. The elements of the ONB namespace could be defined in the namespace http://www.onb.ac.at/schema/onb#. If a client, which can either be a human using a Web browser or an application, resolves that URI, it should obtain the schema definition in an appropriate data format, which could be (X)HTML for Web browsers and RDF/XML for applications.

Mappings between RDFS schemes are published in a similar manner. They map two Web-accessible metadata schemes such as http://www.onb.ac.at/schema/onb# and http://www.bbc.org/schema/tva# by defining a set of mapping elements within the context of a certain namespace, e.g., http://www.mediaspaces.info/mapping/onb_tva#. Consequently, all mapping elements are uniquely identified within a certain dereferencable context and, like the mapping specification itself, are readable and interpretable by human agents and applications. In that way, RDFS schema definitions as well as mappings between them become sharable, readable, and subsequently also reusable. Furthermore, through the assignment of namespaces, they are unambiguously bound to a certain context.

6.4 Executing RDFS Mappings

For the mapping execution phase, previously defined mapping specifications must be transformed into exe- cutable code. The type of code to be generated depends on the applied language binding. In the case of an RDFS-binding, i.e., when schemes and mappings are defined in RDFS and the metadata instances are avail- able in RDF, one would generate SPARQL query templates to access source metadata and return the results expressed according to the target metadata schema. In the following, after a brief introduction of SPARQL, we will present an algorithm which allows us to transform RDFS mapping representations to SPARQL query templates and describe how these templates are executed during run-time in order to deliver metadata results expressed in terms of a target metadata schema.

6.4.1 An Introduction to the SPARQL Query Language

The SPARQL Query Language for RDF [W3C08] allows query formulation across diverse RDF data sources and returns either result sets or RDF graphs. Here we briefly introduce the basic structure of the SPARQL query language and concentrate on those parts we require in the context of metadata mappings. A SPARQL query contains a basic graph pattern made up of a set of triple patterns, whereby the subject, predicate, and object of each triple pattern may be variables. Its main building blocks are the SELECT clause, which defines the variables to appear in the query results, and the WHERE clause, which defines the basic graph pattern to be matched against a data graph. Example 6.8 shows a sample SPARQL query that selects all images (onb:Image) taken by the author (onb:author) Ruebelt Lothar. The WHERE clause contains two triple patterns, where the subject of each triple pattern corresponds to the variable defined in the SELECT clause. As a result, it returns the URIs of the matching image resources.

Example 6.8 Sample SPARQL Query

@prefix onb: .

SELECT ?img
WHERE {
  ?img rdf:type onb:Image .
  ?img onb:author "Ruebelt Lothar" .
}

Executing a SPARQL query results in a sequence of solutions. To modify results, the SPARQL query language provides so-called solution sequence modifiers, which can be applied to create another sequence used to generate the final query results (a combined usage example follows the list):

• ORDER BY: establishes an order in the solution sequence. Can either be ascending (ASC) or descending (DESC).

• DISTINCT: eliminates duplicate solutions from the result set.

• LIMIT: puts an upper bound on the number of solutions returned.

• OFFSET: causes the solutions generated to start after a specified number of solutions.
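To illustrate how these modifiers combine, the query from Example 6.8 could, for instance, be extended as follows to return at most 20 distinct image URIs in ascending order, skipping the first 40 solutions (the concrete modifier values are illustrative only):

SELECT DISTINCT ?img
WHERE {
  ?img rdf:type onb:Image .
  ?img onb:author "Ruebelt Lothar" .
}
ORDER BY ASC(?img)
LIMIT 20
OFFSET 40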

Besides the SELECT query form, SPARQL also defines the CONSTRUCT form, which does not return the variable bindings in a matching pattern but returns an RDF graph constructed by substituting variables in a set of triple patterns. Hence, it returns a single RDF graph specified by a graph template.

With FILTER expressions one can restrict the matching graph patterns to a given expression. This eliminates any solution that returns false when being substituted into the expression. SPARQL provides a mapping of FILTER expressions to a subset of the functions and operators defined by XQuery; i.e., the evaluation of a function is defined by XQuery functions. SPARQL extensions may provide additional FILTER expressions and mappings to implementing functions. Example 6.9 shows a SPARQL CONSTRUCT query template that returns all metadata about persons available in the ONB data set expressed in terms of the TVA metadata schema. Because of the LIMIT solution modifier, only the first ten matching results are returned. The FILTER expression restricts the matching graph patterns to those that define a triple with any subject, the predicate onb:firstName, and an object literal value starting with the letter 'A'. In other words, the query returns the first ten arbitrary persons whose first names start with the letter 'A'. The results are represented in compliance with the TVA metadata schema.

Example 6.9 Sample SPARQL CONSTRUCT Query

@prefix onb: .
@prefix tva: .

CONSTRUCT {
  ?x rdf:type tva:Person .
  ?x tva:givenName ?y
}
WHERE {
  ?x rdf:type onb:Person .
  ?x onb:firstName ?y .
  FILTER regex(?y, "^A")
}
LIMIT 10

For expressing n-ary mapping relationships between properties in SPARQL (e.g., firstname and lastname to name), we require a SPARQL extension called Property Functions (see footnote 6). This extension is not part of the current specification but is supported by SPARQL query engines such as ARQ (see footnote 7). A property function causes a triple match to happen by executing some function that performs a calculation on input values and binds the result to a given output variable. Example 6.10 shows a query that uses a property function upper-case to convert the object node of a matching triple to uppercase. The matched triples are determined by the pattern ?x onb:lastName ?ln.; the final binding of the variable ?y, however, is calculated by the property function ext:upper-case, with input argument ?ln and result variable ?y.

6 Property functions in ARQ: http://jena.sourceforge.net/ARQ/extension.html#propertyFunctions
7 ARQ — A SPARQL Processor for Jena: http://jena.sourceforge.net/ARQ/

Example 6.10 Sample SPARQL CONSTRUCT Query with Property Function

CONSTRUCT {
  ?x tva:familyName ?y.
}
WHERE {
  ?x onb:lastName ?ln.
  ?y ext:upper-case ?ln.
}
LIMIT 10

Before we focus on the algorithms for transforming mapping representations into executable queries, we introduce a formal notation for the major constituents of the SPARQL query language:

Definition 6.4 [Symbols] Let

• V be the set of all variables,

• PF be the set of all property functions,

• TP be the set of all triple patterns,

• Q be the set of all SPARQL queries, and

• Q^C ⊆ Q be the set of all SPARQL CONSTRUCT query templates.

Now we can precisely define the constituents of a SPARQL query, which we will refer to later in this section.

Definition 6.5 [Variable, Property Function, Triple Pattern, Query Template]

• A variable v ∈ V is a pair v = (n_v, θ_v), where n_v ∈ STR denotes the variable's name and θ_v ∈ Θ ∧ θ_v = {Variable} its fixed, variable-specific type identified by a URI.

• A property function pf ∈ PF is a triple pf = (r_pf, f_pf, A_pf), where r_pf ∈ V denotes the result variable of a function, f_pf ∈ R a resource identifying the function, and A_pf ⊆ N ∪ V the ordered set of arguments.

• A triple pattern tp ∈ TP is a pair tp = (T_tp, PF_tp), where T_tp denotes the set of triples and PF_tp the set of property functions in the respective triple pattern. Since SPARQL allows the subject and object of a triple to be variables, we extend the original definition of a triple t ∈ T_tp, as specified in Definition 5.2, to t = (s_t, p_t, o_t), where s_t ∈ R ∪ V, p_t ∈ R, and o_t ∈ N ∪ V.

• A query template q ∈ Q^C (see footnote 8) is a pair q = (c_q, w_q), where c_q ∈ TP denotes the CONSTRUCT clause and w_q ∈ TP denotes the WHERE clause of a query template.

Further, we define the following operations required for the specification of subsequent transformation algorithms:

8 Although SPARQL CONSTRUCT queries can contain solution sequence modifiers (ORDER BY, DISTINCT, LIMIT, OFFSET), we can omit them in this specification. This is because the RDFS mapping specifications that serve as input for the transformation algorithms, which are presented later in this section, are specified independently of any solution modification operations.

Definition 6.6 [Operations and Accessors]

• nodeVar : Q^C −→ V, defined as nodeVar(q) := v_q, generates a new query variable for an existing template.

• varBinding : Q^C × R −→ V, defined as varBinding(q, p) := v_p, accesses the object-variable binding of a certain property p. If, for instance, a query template q contains a triple t = (?x, onb:firstname, ?y) and p = onb:firstname, the operation varBinding(q, p) results in v_p = ?y (see footnote 9).

• resultVar : PF −→ V, defined as resultVar(pf) := r_pf, returns a property function's result variable.

• functionURI : PF −→ R, defined as functionURI(pf) := f_pf, returns the resource identifying the function.

• argList : PF −→ P(N ∪ V), defined as argList(pf) := A_pf, returns the ordered set of arguments of a property function.

6.4.2 Transforming Mappings to SPARQL Query Templates

In the previous section, we have specified a mapping model that allows the declaration of semantic and structural correspondences between source and target metadata schemes. For the query execution phase, we need to translate mapping specifications into a machine-processable representation in order to deliver metadata information objects available in a certain source schema in terms of a given target schema. The mapping model is designed for representing mapping relationships among RDFS schemes. With such a mapping specification as input, a machine should be able to retrieve metadata information objects expressed in RDF. Since SPARQL is the means for accessing RDF data, we need to transform mapping representations into SPARQL queries, which allow a SPARQL processor to deliver metadata in terms of the defined target schema.

We introduce this transformation capability by extending and redefining the previously specified RDFS binding of the abstract mapping model as given in Figure 6.1 with behaviour that allows the generation of SPARQL CONSTRUCT query templates from existing mapping representations. Figure 6.2 shows how that behaviour is implemented: we assign an abstract operation toSPARQLTemplate(): SPARQLTemplate to the abstract class RDFSMappingElement, which is a specialisation of a generic MappingElement, and provide element-specific operation implementations in the classes ClassMapping and PropertyMapping. We also specialise the abstract class MappingModel and add the class RDFSMappingModel defining the operation generateSPARQL(): SPARQLTemplate[] for the mapping execution phase.

Algorithm 1 presents the pseudo-code for the toSPARQLTemplate operation, which is defined within the scope of a ClassMapping. It takes a ClassMapping declaration as input and delivers the corresponding SPARQL CONSTRUCT template by adding one triple pattern to the CONSTRUCT clause c_q and one to the WHERE clause w_q. The resulting template transforms all resources that have the source class as rdf:type to resources having the target class as rdf:type. If a ClassMapping has multiple source elements, i.e., source classes, the algorithm generates multiple templates, one for each source class. Example 6.11 shows how the ClassMapping specification defined in Example 6.4 results in a SPARQL query template after applying the transformation algorithm.

9 The result of the varBinding operation is always unique because the mapping elements that serve as input for the transformation have distinct source and target resources assigned (see Section 5.4).

Figure 6.2: Extending and redefining the RDFS binding of the abstract mapping model with mapping execution behaviour

Algorithm 1: toSPARQLTemplate. Transforms ClassMappings to SPARQL construct templates.
Data: a ClassMapping element cm ∈ CM
Result: a set of SPARQL CONSTRUCT query templates Q ⊆ Q^C
 1 begin
 2     result ←− ∅
 3     foreach s ∈ sourceElements(cm) do
 4         q ←− new();
 5         x ←− nodeVar(q);
 6         add (x, rdf:type, targetElement(cm)) to c_q;
 7         add (x, rdf:type, s) to w_q;
 8         add q to result;
 9     end
10     return result;
11 end

Example 6.11 SPARQL template generated from the sample ClassMapping in Example 6.4

@prefix onb: .
@prefix tva: .

CONSTRUCT {
  ?x rdf:type tva:Person.
}
WHERE {
  ?x rdf:type onb:Person.
}

Algorithm 2 shows how PropertyMapping elements are transformed to SPARQL query templates. It first generates an empty query template and two variables, one subject-variable x and one object-variable y. It uses these variables to formulate the triple pattern in the CONSTRUCT clause c_q, which corresponds to the target property in the mapping specification. Then the algorithm distinguishes between two cases: in the simple case, there is no instance transformation function, which implies that there is a 1:1 mapping between properties, i.e., that only a single source property has been specified. In that case (line 14), the algorithm takes that single element from the set of source properties and simply adds a triple pattern with the same subject- and object-variables to the WHERE clause w_q. If there is an instance transformation function defined as part of a mapping element ((f ← function(pm)) ≠ null), it iterates through the defined source elements, generates a new object-variable for each source element, and adds a triple pattern with subject-variable x and the respective object-variable to the WHERE clause w_q. Thereafter, it determines the object-variable binding of the result property defined as part of an instance transformation declaration and invokes the operation toPropertyFunction, which transforms a given instance transformation function f to a SPARQL property function and adds this function to the given query template q with the previously determined result variable binding resultVar. Finally, the algorithm considers sourceClassContext, targetClassContext, and classMappingContext declarations and adds the appropriate rdf:type triples to the CONSTRUCT and WHERE clauses of the query template.

Algorithm 2: toSPARQLTemplate. Transforms PropertyMappings to SPARQL construct templates.
Data: a PropertyMapping element pm ∈ PM
Result: a SPARQL CONSTRUCT query template q ∈ Q^C
 1 begin
 2     q ←− new();
 3     x ←− nodeVar(q);
 4     y ←− nodeVar(q);
 5     add (x, targetElement(pm), y) to c_q;
 6     if (f ←− function(pm)) ≠ null then
 7         foreach s ∈ sourceElements(pm) do
 8             z ←− nodeVar(q);
 9             add (x, s, z) to w_q;
10         end
11         resultVar ←− varBinding(q, result(f));
12         toPropertyFunction(f, q, resultVar);
13     else
14         add (x, elementOf(sourceElements(pm)), y) to w_q;
15     end
16     foreach tc ∈ targetClassContext(pm) do
17         add (x, rdf:type, tc) to c_q;
18     end
19     foreach sc ∈ sourceClassContext(pm) do
20         add (x, rdf:type, sc) to w_q;
21     end
22     foreach cm ∈ classMappingContext(pm) do
23         add (x, rdf:type, targetElement(cm)) to c_q;
24         add (x, rdf:type, sourceElements(cm)) to w_q;
25     end
26     return q;
27 end

Algorithm 3 transforms a specified instance transformation function to a SPARQL property function. It is defined recursively because an instance transformation function can have other transformation functions as arguments. Starting at the root function, it traverses the operator tree as far as possible and starts the transformation process with those operators that do not have any child operators, i.e., other functions as arguments. Then the algorithm backtracks and generates property functions for the instance transformation functions that have not been transformed yet.

As input, the algorithm takes an instance transformation function f, a query template q, and a property function's result variable v. First it creates an empty ordered set (line 2) for the arguments to be assigned to the generated property function. Then (line 3) it examines each argument arg defined in the instance transformation function declaration. If an argument is another instance transformation function (line 4), the algorithm generates a new result variable x and recursively calls the toPropertyFunction operation (line 6) with the function argument arg, the given query template q, and the newly generated result variable x as input. Then it adds the result variable x to the argument list of the property function that has been created in the context of this recursion level (line 2). If an argument is not another function but another property (line 9), the algorithm determines the object-variable binding for this property and adds that variable to the argument list argList. Otherwise (line 12) the argument is treated as a literal node (e.g., "2") and directly added to the argument list. Finally (line 16) the algorithm generates the property function and assigns the given result variable v, the given instance transformation function's URI identifier uri(f), and the generated argument list argList. The property function is then (line 17) added to the WHERE clause w_q of the given query template q.

Algorithm 3: toPropertyFunction. Transforms instance transformation declarations to property functions in SPARQL templates.
Data: 1. An instance transformation function f ∈ FUN; 2. A SPARQL query template q ∈ Q^C; 3. A result variable v ∈ V
Result: the given SPARQL query template q including a set of property functions PF ⊆ PF
 1 begin
 2     argList ←− ∅;
 3     foreach arg ∈ arguments(f) do
 4         if arg ∈ FUN then
 5             x ←− nodeVar(q);
 6             toPropertyFunction(arg, q, x);
 7             add x to argList;
 8         else
 9             if arg ∈ R then
10                 add varBinding(q, arg) to argList;
11             else
12                 add arg to argList;
13             end
14         end
15     end
16     pf ←− (v, uri(f), argList);
17     add pf to w_q;
18 end

Example 6.12 shows how the mapping specification from Example 6.5, between the Dublin Core property dc:creator and the ONB properties onb:firstName and onb:lastName with the assigned instance transformation function map:fnlnConcat, is transformed to a SPARQL query template. The definition of sourceClassContext results in an additional rdf:type definition in the template's WHERE clause.

Example 6.12 SPARQL template generated from the sample PropertyMapping in Example 6.5

@prefix onb: .
@prefix dc: .
@prefix fn: .

CONSTRUCT {
  ?x dc:creator ?y.
}
WHERE {
  ?x onb:firstName ?a.
  ?x onb:lastName ?b.
  ?y fn:concat (?b ", " ?a).
  ?x rdf:type onb:Person.
}

In Example 6.13, we show parts of the results of Algorithm 2 when being applied to the property mapping defined in Example 6.6. Since the specification defines two PropertyMapping instances, this would also result in two SPARQL CONSTRUCT templates. Here we show only the one for the mapping between the properties onb:firstName and dc:creator. The example clearly demonstrates how nested instance transformation functions are transformed to property functions and how a targetClassContext definition results in an additional rdf:type definition in the template’s CONSTRUCT clause.

Example 6.13 SPARQL template generated from the sample PropertyMapping in Example 6.6

@prefix onb: .
@prefix dc: .
@prefix fn: .
@prefix fx: .

CONSTRUCT {
  ?x onb:firstName ?y.
  ?x rdf:type onb:Person.
}
WHERE {
  ?x dc:creator ?a.
  ?b fx:index-of-string (?a ", ").
  ?c op:numeric-add (?b "2").
  ?y fn:substring (?a ?c).
}

6.4.3 Run-time Execution of SPARQL Templates

The overall goal of metadata mapping is to provide uniform access to heterogeneous metadata located in distributed, autonomous sources. An application on behalf of a user should have the possibility to access these sources via a single, uniform interface. The predominant architectural approach applied to fulfil this requirement is the mediator-wrapper architecture [Wie92], where the mediator provides the single interface — typically a query interface — and transparently handles the translation between the involved, incompatible metadata information objects. Mapping specifications, which are provided by domain experts, give the mediator the necessary information to reconcile structural and semantic heterogeneities.

We follow a Web-based approach, where schema information as well as mapping specifications are defined in RDFS and published on the Web. The underlying metadata are represented in RDF, and the query language to access these data in a structured way is SPARQL. Therefore, we must give the user the possibility to set up a mediator based on previously defined RDFS mapping specifications. The mediator can then integrate metadata from a given set of data sources, provide a SPARQL interface as uniform metadata access point, and internally handle the retrieval and reconciliation of incompatible RDF metadata information objects.

Figure 6.3 illustrates the main components involved in the mapping execution phase at run-time. The user expresses a SPARQL query Q over a certain target schema S_T and executes it at the SPARQL mediation endpoint, which then retrieves the metadata from a set of given SPARQL endpoints, each exposing a certain source schema (S_A and S_B). Based on the mapping specification between the source schema of SPARQL Endpoint A and the target schema (M_AT) as well as the mapping specification between the schema of Endpoint B and the target schema (M_BT), the mediator formulates the queries Q_A and Q_B in order to retrieve the requested metadata. The results R_A and R_B are then merged into R and returned to the user.

Figure 6.3: Run-time execution of SPARQL templates — Overview

To achieve the goal of uniform accessibility, the mediation endpoint has to perform the following computational steps:

1. Analyse the incoming, user-formulated SPARQL query and select the relevant query templates for each data source (query template selection).

2. Execute the user query against the query templates, collect the results, and deliver them to the user (query unfolding and result collection).

Query template selection

For each data source, the mediator maintains an RDFS mapping specification that defines the structural and semantic correspondences between the mediator's schema, i.e., the target schema, and a data source's schema. After applying the transformation process described in Section 6.4.2, the mediator obtains a set of SPARQL query templates for each data source. From the set of generated query templates, the mediator must now decide which ones to execute for an incoming user query.

This decision is taken by comparing the triple patterns in the WHERE clause of the SPARQL user query with those defined in the CONSTRUCT clauses of the generated SPARQL query templates. Figure 6.4 illustrates the basic strategy of the query template selection algorithm. The left-hand side symbolises the triple pattern defined in the WHERE clause of an incoming SPARQL query, the right-hand side the available SPARQL query templates that have been generated from an RDFS mapping specification. The goal of the query template selection algorithm is to find the minimal number of matching query templates for a given query triple pattern. Regarding the example, we can see that template T1 matches the triple pattern which connects the nodes A, B, and D in the incoming query. Template T2 matches the triple connecting nodes B and C, and template T3 the triple connecting A and B. We could now proceed and execute the query templates T1, T2, and T3 in order to retrieve the data required to answer the query from the data source. However, there is an opportunity for a first optimisation step: regarding the query templates, we notice that T3 matches a subset of the result graph matched by template T1. Hence, the query result triples template T3 delivers are contained in the resulting triples of template T1. Based on that, we can infer that there is no need to execute T3 to answer the query and simply omit the execution of that template.

Example 6.14 shows a sample user query expressed over the ONB metadata schema, asking for all resources that identify a person with a certain first name. For metadata sources that maintain metadata using a different scheme (e.g., Dublin Core), the mediator must apply mapping specifications and select the appropriate SPARQL query templates. In the case of this query, the template presented in Example 6.13 is selected, because its CONSTRUCT clause triple pattern matches the one in the WHERE clause of the user query.

Example 6.14 Sample user query executing the query template shown in Example 6.13

@prefix onb: .

SELECT ?x
WHERE {
  ?x onb:firstName "David".
  ?x rdf:type onb:Person.
}

If the user query included an additional triple pattern referring to the property onb:lastName, this would result in the selection of a second query template, which maps the property dc:creator to onb:lastName. The template selection process can further be improved by considering basic query optimisation strategies such as query containment. Since query optimisation is out of the scope of this work, we refer to the related work described elsewhere (see e.g., [MLF00]).

Figure 6.4: SPARQL query template selection

Query unfolding and result collection

After the relevant templates have been selected, the mediation component must execute them in order to retrieve the requested metadata information objects. Similar to views (see [Hal01]) in relational database management systems, the SPARQL query templates deliver dynamic virtual graphs collected from the data sources' data graphs, expressed in terms of the user-selected target schema. The incoming user queries are then executed against these virtual graphs, i.e., against the template execution results. From a data integration perspective, we follow a Global-As-View (GAV) approach for specifying mappings, where each concept of the global view is mapped to a query over the data source (see [Len02]).

In our current approach, the mediation component executes the selected templates exactly as they result from the translation of the previously defined mapping specifications, which leads to high latency. It does not, for instance, consider that only a subset of the metadata in a data source should be selected during template execution if a specific restriction on an object node is formulated in the user query (e.g., ?x onb:firstname "David"). Nor does it consider the possibility to combine relevant query templates in order to reduce the overall number of query template executions. We can identify the following optimisation possibilities:

• Template adaption

• Template combination

• Persisting virtual graphs

• Capability-based optimisation

We can reduce the overall amount of metadata that is transferred between the wrapper and mediator components by adapting the templates to be executed according to restrictions provided in the user query. If, for instance, the user query asks for "all persons having 'David' as firstname", and we know that there is a mapping relationship between onb:firstname and dc:creator, and we also know that the object restriction "David" is a literal of datatype xsd:string, we can add an appropriate SPARQL FILTER expression (see Example 6.9) to the WHERE clause of the query template to be executed and thereby reduce the overall amount of metadata being returned from the involved wrapper components. Example 6.15 shows how the SPARQL query template presented in Example 6.13 can be adapted according to the object restriction given in the user query presented in Example 6.14.

Example 6.15 The query template presented in Example 6.13 adapted to the user query shown in Example 6.14.

CONSTRUCT {
  ?x onb:firstName ?y.
  ?x rdf:type onb:Person.
}
WHERE {
  ?x dc:creator ?a.
  ?b fx:index-of-string (?a ", ").
  ?c op:numeric-add (?b "2").
  ?y fn:substring (?a ?c).
  FILTER regex(?y, "David")
}

Template combination denotes the possibility to create additional templates from existing ones. Without optimisation, the query engine in the mediator executes the set of selected query templates, where each template leads to one query to be executed. We can reduce the number of required query executions by creating additional templates from the set of generated templates. The resulting templates should cover a larger part of the triple pattern in the WHERE clause of the user query than the templates that are generated directly from the mapping specification. If we consider Example 6.13 and assume that there exists another template generated for the mapping between dc:creator and onb:lastName, we could, as illustrated in Example 6.16, combine these templates into a single one. A user query asking for a certain resource that identifies a person with a certain first and last name would then trigger that template instead of two separate templates. For a more profound discussion of template-based mapping approaches we refer to [RSU95].

Example 6.16 Sample combined query template.

CONSTRUCT {
  ?x onb:firstName ?y.
  ?x onb:lastName ?z.
  ?x rdf:type onb:Person.
}
WHERE {
  ?x dc:creator ?a.
  ?b fx:index-of-string (?a ", ").
  ?c op:numeric-add (?b "2").
  ?y fn:substring (?a ?c).
  ?d op:numeric-subtract (?b "1").
  ?z fn:substring (?a "1" ?d)
}

Another optimisation technique for reducing the query answering latency in the data sources is to persist query results as virtual graphs. Analogous to materialised views (see [LW95]) known from the relational database domain, we can cache the result of a query template execution as a concrete graph and update that graph at a certain time interval. The drawback of this optimisation technique is that the persistence of virtual graphs requires additional storage space. Therefore, the domain expert must decide for which templates this trade-off between reduced latency and additional storage space is beneficial.

Since metadata are integrated from distributed, autonomous data sources, the mediator can also optimise the order and timing of query execution. While closed database management systems can estimate the cost of each query and deduce an optimised query plan, this is hardly possible in open, distributed environments. The mediator, which executes the unfolded queries at the data source wrappers, has little or no knowledge about the data sources' internal query processing behaviour. Thus, the mediator must select a query plan based on the capabilities of the involved data sources. For a more detailed discussion on capability-based optimisation in mediators, which also explains how such capabilities could be described, we refer to [PGH96]. DARQ [QL08], a query engine for executing federated SPARQL queries, implements such a query optimisation approach, which is based on data source capability descriptions. A general strategy for query federation in SPARQL-based mediator-wrapper architectures is described in [LWB08].

All these optimisation techniques are currently out of the scope of our work. We neither provide any further technical specification nor consider these techniques in our implementation. In our future work, however, we must consider these issues in order to provide efficiency and low latency also for the mapping execution phase.

6.5 Maintaining RDFS Mappings

For the mapping maintenance phase, we require a registry that enables the discovery of and access to existing metadata schemes and mappings between them. The availability of a registry is an essential step towards metadata interoperability because domain experts can reuse existing schemes and mappings rather than designing new ones. This is the main reason why we defined metadata mapping as being a cyclic process where the fourth phase, mapping maintenance, is followed by the first phase, mapping discovery.

A noteworthy mapping solution, which supports the mapping maintenance phase, is Altova SchemaAgent [Alt07b]. It provides a registry for analysing and managing relationships among XML Schemes, XML instance documents, and other XML-based files. Domain experts working on data integration projects can use this tool to view existing schemes and mappings and reuse existing components in a modular development approach. The drawback of this solution is that it operates in a closed-world environment, i.e., within an institution hosting an Altova SchemaAgent installation.

In contrast to existing stand-alone mapping solutions, we continue the approach we have followed also for the preceding mapping phases and integrate our mapping maintenance solution with the architecture of the World Wide Web. So far, we have defined metadata schemes and mappings between them as simply being Web resources identified via URIs. For the concrete, language-specific RDFS binding, we can define these components as being dereferencable Web resources. In that way, we can regard the data maintained by registries as dereferencable Web resources. Therefore, in our approach, we are not building a mapping maintenance solution but simply use the Web infrastructure for these tasks.

6.5.1 Mapping Registry Architecture

The central idea of our mapping registry approach is that all components involved in the mapping process become dereferencable Web resources. Figure 6.5 illustrates the basic architecture of such a mapping registry: the clouds represent the domains within the Web that a certain registry covers. Each registry is aware of a certain set of registered schemes (e.g., S_A, S_B, S_T) and metadata mappings (e.g., M_AT, M_BT) and also maintains links to other registries.

Figure 6.5: Mapping registry architecture

Metadata schema definitions, such as those presented in Examples 6.1-6.3, and also their elements are accessible by dereferencing their URLs. For the Dublin Core schema elements, this is already the case. If a client (e.g., a Web browser) requests the element http://purl.org/dc/elements/1.1/title, it obtains the definition of the schema element. Since Dublin Core is defined in RDFS, the client retrieves the schema information in RDF/XML. Analogously to the DC schema, all other involved metadata schemes must also be published in some Web-accessible location. The Austrian National Library could assign a URL to their schema, which is within their domain, e.g., http://www.bildarchiv.com/schema/onb#. The BBC could use the URL http://www.bbc.co.uk/tvradio/programmes/schema/tva# for publishing their RDFS metadata schema definition.

Also the RDFS mapping specifications are published on the Web and made accessible via a URL that also defines the context of a mapping. If, for instance, domain experts create a mapping between the Dublin Core and the TV-Anytime metadata schema for a specific institution, they can publish that mapping within the domain of that institution. For Example 6.4, that URL would be http://www.institution.com/mapping/onb_tva#. It could, however, be any other Web-accessible, dereferencable URL.

If we assume that all metadata schema definitions and all mapping specifications are available on the Web, we require a mapping registry that can inform domain experts about their availability and location. For that purpose, we set up a network of linked mapping registries, each of which maintains the relevant information for a set of schemes and mappings. A registry node is also simply a Web-accessible URL (e.g., http://www.institution.com/registry), which delivers that information to a requesting client application. On top of a registry node, one can build schema and mapping search applications that analyse the available registry data and follow the links to other registry nodes for retrieving remote registry information.

A drawback of using dereferencable URLs as identifiers for schemes and mappings is their sensitivity regarding changes in their physical location. If, for instance, the Austrian National Library decides to change the domain name of its Image Archive to http://www.imagearchive.at, without further maintaining their current domain, all URL references pointing to the schema http://www.bildarchiv.com/schema/onb# become invalid. One well-known approach for solving that problem is the introduction of a logical level for identifiers that transparently handles changes in the physical location of schemes and mappings. Persistent Uniform Resource Locators (PURLs, see footnote 10) and Digital Object Identifiers (DOIs, see footnote 11) are example technologies that address this issue and introduce a link resolver that returns the actual physical URL location for a given logical URL. The proposed registry component could provide such a link-resolver mechanism, but this is out of the scope of this work.

6.5.2 Mapping Registry Data Model

For maintaining data about registered schemes and mappings, the registry nodes require a common data model. Previously, in the description of the abstract mapping model (see Section 5.4), we presented the three-layered, directed labelled graph architecture and defined metadata schemes as simply being schema graphs. Also a mapping specification is in fact a specialisation of a graph (see Section 5.4). From this perspective, it appears natural to define the mapping registry data model on the basis of a general graph data model too. Figure 6.6 illustrates the conceptual design of the mapping registry data model. Analogously to the MappingModel class, we define a SchemaModel as being a specialisation of a generic Graph. In the case of a language-specific binding, a mapping model then has one target and one source schema model, each of which is identifiable via a URI. Also the RegistryModel is simply a specialisation of a graph and provides the links to a set of known metadata schemes (registeredSchema), mappings (registeredMapping) among them, and other known mapping registries (linkedRegistry).

Example 6.17 shows a sample mapping registry entry. It references all known metadata schemes (e.g., http://purl.org/dc/elements/1.1/) and a mapping specification between the Dublin Core Element Set and the TV-Anytime schema, which is available at http://www.institution.com/mapping/onb_tva. Furthermore, it contains a link to another mapping repository, which is deployed at http://www.institutionY.com/registry/r2.

Figure 6.6: Mapping registry model

Example 6.17 Sample Mapping Registry Entry

@prefix rm: .
@prefix am: .
@prefix reg: .

reg:r1 a rm:RegistryModel;
    rdfs:label "Institution X Metadata Registry";
    rm:linkedRegistry ;
    rm:registeredSchema ;
    rm:registeredSchema ;
    rm:registeredSchema ;
    rm:registeredMapping ;
    .

<http://purl.org/dc/elements/1.1/> a am:SchemaModel;
    rdfs:label "Dublin Core Metadata Element Set, Version 1.1";
    .

<http://www.bildarchiv.com/schema/onb> a am:SchemaModel;
    rdfs:label "Austrian National Library Metadata Schema";
    .

<http://www.bbc.co.uk/tvradio/programmes/schema/tva> a am:SchemaModel;
    rdfs:label "BBC TV-Anytime Metadata Schema";
    .

<http://www.institution.com/mapping/onb_tva> a am:MappingModel;
    rdfs:label "Mapping between the Dublin Core Element Set and TV-Anytime";
    am:source ;
    am:target ;
    .

10 Persistent URL website: http://purl.org/
11 Digital Object Identifier (DOI) System website: http://www.doi.org/

6.6 Implementation Considerations

The technical realisation of the previously described metadata mapping approach can be divided into two main areas: first, we must specify guidelines for deploying schemes and mappings on the Web so that they are accessible via dereferencable URLs and readable for both humans and machines. Second, we must provide a mediation service that allows domain experts to set up configurable mediation endpoints, which provide SPARQL query access to a set of distributed and autonomous SPARQL data sources via a single query interface.

6.6.1 Deploying Schemes and Mappings on the Web

The various serialisation formats of RDF (e.g., RDF/XML, N3) are not primarily meant to be read or interpreted by humans. The RDF representation of metadata, schema definitions, and mappings among schemes is a purely machine-oriented format, i.e., it describes data to be parsed and interpreted by machines. Humans, who use a certain application (e.g., a Web browser) for accessing information on the Web, require an HTML or XHTML representation of that data. Since we want our mapping architecture to be integrated with the Web architecture, and all schema information and mappings between schemes to be comprehensible for humans and machines, we must consider this in the deployment of schemes and mappings on the Web.

The Semantic Web Deployment Working Group has already defined best-practice guidelines for publishing schemes on the Web [BP08]. In order to be interoperable also with other Semantic Web applications, we follow these guidelines and adapt them to our needs. For serving information in various formats when dereferencing a certain URL, the guidelines propose to apply content negotiation, which is a built-in HTTP protocol feature. A client attempting to dereference a URL can specify which type of content it would prefer to receive in the response. It can do this by including an Accept field in the header of an HTTP request message and specifying the preferred content types in terms of a MIME type. When a client prefers to receive the response in XHTML or HTML, it issues an HTTP request that includes the field Accept: application/xhtml+xml,text/xml. For the same response in RDF/XML, it issues an Accept: application/rdf+xml request.

Since metadata schemes as well as metadata mappings are deployed on the Web in a machine- and human-readable format, the domain experts require the following artefacts:

• A machine-readable RDF/XML representation of each schema and mapping definition (e.g., onb.rdf and onb2dc.rdf)

• A renderable (X)HTML representation (see footnote 12) of each schema and mapping definition (e.g., onb.html and onb2dc.html).

Deploying and serving resources on the Web typically requires a Web server — and there exists a variety of implementations (e.g., Microsoft Internet Information Services (IIS, see footnote 13) and the Apache HTTP Server (see footnote 14)). The following technical details for schema and mapping deployment are specific to the Apache HTTP Server. For other Web servers, one has to consult the respective content negotiation documentation.

The Apache HTTP Server relies on the mod_rewrite module (see footnote 15), which is a rule-based URL rewriting engine, to rewrite requested URLs on the fly. If, for instance, the client requests the URL http://www.example1.com/schema/onb with the HTTP header field Accept: application/xhtml+xml,text/xml, the engine can redirect the request to the physical location of the (X)HTML file (e.g., http://www.example1.com/schema/content/onb.html), which provides a human-readable definition of that schema. In the same way, it is possible to redirect an Accept: application/rdf+xml request to the physical location of the RDF/XML schema definition (e.g., http://www.example1.com/schema/content/onb.rdf). Internally, the redirection process works as depicted in Figure 6.7. The client sends an HTTP request to the server. The server determines the Accept field in the request's HTTP header and returns an HTTP 303 See Other response which redirects to the location of the appropriate physical representation of the schema definition. The client follows that URL and receives the schema definition in the desired format.
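A minimal sketch of such rewrite rules, assuming the physical files reside under schema/content/ (the concrete URL patterns and file paths are illustrative and not part of the thesis configuration), could look as follows:

# Redirect the virtual schema URL to the physical representation that matches
# the client's Accept header (303 See Other, as in Figure 6.7).
RewriteEngine On

# Clients asking for RDF/XML receive the machine-readable representation
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^schema/onb$ /schema/content/onb.rdf [R=303,L]

# Browsers asking for (X)HTML receive the human-readable representation
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^schema/onb$ /schema/content/onb.html [R=303,L]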

12 In fact, (X)HTML is not really human-readable; it is rather read- and processable by applications, which can render the provided information in a human-friendly presentation format.
13 Microsoft Windows Server 2003: http://www.microsoft.com/WindowsServer2003/IIS/Default.mspx
14 Apache Web Server: http://httpd.apache.org/
15 Apache Mod Rewrite Module: http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html

Figure 6.7: Serving schemes and mappings using HTTP Content Negotiation (adapted from [BP08]). The figure shows two client-server exchanges: a GET request for http://example1.com/schema/onb with Accept: text/html is answered with 303 See Other and Location: http://example1.com/schema/onb.html, and the subsequent GET of that URL returns 200 OK; the same URI requested with Accept: application/rdf+xml is redirected to http://example1.com/schema/onb.rdf in the same way.

For the mapping definitions, we apply exactly the same mechanism. Physically, they are deployed in some Web-accessible location (e.g., http://www.institution.com/mapping/content/onb_tva.rdf), and a set of mod_rewrite rules handles the rewriting of the virtual URLs to the physical location of the files to be served.

6.6.2 Mediation Service

The mapping execution phase is covered by a mediation service that transforms mapping specifications as described in Section 6.4.2 and supports the run-time execution of SPARQL templates as described in Section 6.4.3. Domain experts can use the service to easily set up new SPARQL mediation endpoints for a selected set of data sources. Simplicity and light-weightness were the main design goals for that service. The domain experts should have the possibility to create and publish sharable mediation endpoints without purchasing and installing any heavy-weight standalone solution. Therefore, we provide a Web-based solution, which allows domain experts to set up SPARQL mediation endpoints using an ordinary Web browser. In order to set up a new SPARQL endpoint, a domain expert must perform the following steps:

1. Enter the URL of the desired SPARQL endpoint (e.g., http://www.mediaspaces.info/mediation/endpoint1/sparql) within the domain (e.g., http://www.mediaspaces.info/mediation) of a mediation service instance. If the endpoint does not exist yet, the user is prompted to create a new SPARQL endpoint.

2. In a configuration screen, the user then has to define the SPARQL endpoints of the data sources to be integrated (e.g., http://www.institution1.com/sparql and http://www.institution2.com/sparql).

3. For each data source, the URL of the previously defined and published RDFS source schema must be provided. For the mediation endpoint itself, the domain expert must provide the URL of the target, or mediation, schema.

4. Finally, for each source and target schema pair, the domain experts must create an appropriate, Web-accessible mapping entry.

After these steps have been performed properly, an additional SPARQL endpoint is created that provides a single, uniform query interface to a set of distributed and autonomous data sources. The mediation service is implemented as a Web application, which handles the incoming HTTP requests and performs the appropriate processing in the back-end. The Web application uses the Jena RDF API16 and ARQ17 for advanced SPARQL query processing functionality. The information to be persisted is handled by Jena and stored in an underlying Apache Derby18 database; the database back-end can be switched to any Jena-supported RDBMS. Since we have implemented the mediation service in Java, which is platform-independent, it can be installed on any platform that provides a Java Runtime Environment and a Java Servlet container. By default, we use Jetty19, a simple, scalable, and efficient Web server and Servlet container.
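To give an impression of the run-time behaviour, the following minimal sketch shows how a mediation component could forward a single query template to the SPARQL endpoint of a wrapped data source using the Jena/ARQ API. The endpoint URL and the source-schema property onb:titel are placeholders introduced for this sketch and are not part of any real configuration.

    import com.hp.hpl.jena.query.Query;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QueryFactory;
    import com.hp.hpl.jena.rdf.model.Model;

    public class MediationSketch {
        public static void main(String[] args) {
            // Hypothetical wrapped data source; in a real setup this URL comes from
            // the endpoint configuration entered by the domain expert (step 2 above).
            String sourceEndpoint = "http://www.institution1.com/sparql";

            // Simplified query template derived from a mapping specification: it
            // CONSTRUCTs target-schema (Dublin Core) statements from source-schema
            // statements (the property onb:titel is a made-up example).
            String template =
                "PREFIX dc:  <http://purl.org/dc/elements/1.1/> " +
                "PREFIX onb: <http://www.example1.com/schema/onb#> " +
                "CONSTRUCT { ?r dc:title ?title } " +
                "WHERE { ?r onb:titel ?title }";

            Query query = QueryFactory.create(template);
            QueryExecution qe = QueryExecutionFactory.sparqlService(sourceEndpoint, query);
            try {
                // The returned model already contains metadata expressed in the target
                // schema; the mediator merges such models obtained from all sources.
                Model result = qe.execConstruct();
                result.write(System.out, "N3");
            } finally {
                qe.close();
            }
        }
    }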

6.6.3 Mapping Registry Service

As elaborated on earlier (see Section 6.5), the mapping registry is a network of registry nodes, where each node provides metadata about a set of known metadata schemes and mappings between them. Each registry node is identified by a dereferencable URL and exposes its metadata in a human- and machine-interpretable way. For users and applications that access a registry node by dereferencing its URL, we must deliver an appropriate representation. For metadata schemes and mappings, we have already described how this can be realised. In fact, for registry metadata we can apply the same mechanism: we deploy registry metadata in some Web-accessible location using a Web server and rely on content negotiation to deliver the appropriate representation. This implies that for each registry node, the maintainer of such a node must provide two data files (e.g., registry.rdf and registry.html) and define URL rewriting rules to route incoming HTTP requests from a virtual URL (e.g., http://www.institution.com/registry), based on the value of the HTTP Accept header field, to the appropriate representation format. The obvious problem domain experts face with such an approach is that they do not know which registry nodes to contact when searching for schemes or mappings between schemes. To overcome this limitation, we have implemented a simple mapping registry service which provides basic search and discovery functionality for domain experts. It is again realised as a Java Web application, can be installed on any platform, and can process the metadata provided by the registry nodes. To set up a registry service instance, the domain expert needs to provide a single registry node URL; we assume that within a certain domain there exists at least one known node. The registry service can then fetch the metadata from that node, follow the links to other, linked registry nodes, and process the metadata. If domain experts search for a specific schema or schema mapping by entering a URL identifier or search strings into a Web form, the registry service delivers a list of results pointing to relevant registry nodes.
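The crawling step of the registry service can be sketched in a few lines of Jena code. Note that the property used below to link registry nodes (rdfs:seeAlso) is an assumption made purely for this sketch; the actual registry schema from Section 6.5 may define a dedicated property for that purpose, and the seed URL is a placeholder.

    import java.util.ArrayList;
    import java.util.List;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.NodeIterator;
    import com.hp.hpl.jena.vocabulary.RDFS;

    public class RegistryCrawlSketch {
        public static void main(String[] args) {
            // Known registry node provided by the domain expert (hypothetical URL).
            String seedNode = "http://www.institution.com/registry";

            // Dereferencing the node URL returns RDF via content negotiation.
            Model registry = ModelFactory.createDefaultModel();
            registry.read(seedNode);

            // Collect the URLs of linked registry nodes first (assumed to be
            // referenced via rdfs:seeAlso), then merge their metadata.
            List<String> linkedNodes = new ArrayList<String>();
            NodeIterator it = registry.listObjectsOfProperty(RDFS.seeAlso);
            while (it.hasNext()) {
                linkedNodes.add(it.nextNode().toString());
            }
            for (String nodeUrl : linkedNodes) {
                registry.read(nodeUrl);
            }

            // The merged model can now be searched for schema and mapping URIs.
            System.out.println(registry.size() + " registry statements loaded");
        }
    }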

16 Jena — A Semantic Web Framework for Java: http://jena.sourceforge.net/
17 ARQ — A SPARQL Processor for Jena: http://jena.sourceforge.net/ARQ/
18 Apache Derby: http://db.apache.org/derby/
19 Jetty: http://www.mortbay.org/jetty-6/

6.7 Summary

In this chapter, we have described the conceptual design and the implementation of the RDFS binding of the abstract mapping model presented earlier. We have illustrated how to lift schemes expressed in other schema definition languages to the level of RDFS, provided a precise specification of the RDFS mapping model, described how to translate mapping specifications to SPARQL query templates, and described how these templates are processed in the mapping execution phase. For the mapping maintenance phase, we have designed a lightweight, but functional mapping registry architecture. The clear advantage of our approach, in contrast to other existing mapping solutions, is its simplicity and light-weightness. All mapping artefacts (schemes, mappings, registry data) and also the supporting services (mediation service, registry service) are Web-based and do not require the installation or purchase of any standalone, heavy-weight mapping solution or data integration suite. For defining and publishing schemes, mappings, and registry data, domain experts require only an RDF-supporting editor. For setting up a mediation endpoint, they can simply enter URLs into a Web application and provide the necessary metadata integration information. The focus of our approach is mainly on the mapping representation and maintenance phases, as well as on the transition from the representation to the execution phase. The mapping discovery phase is out of the scope of this work because it is too broad a research area to be discussed extensively here. Our mapping model does, however, provide the necessary means to integrate schema matching algorithms that deliver a set of potentially related schema elements. Since the discovery phase is out of scope and the execution phase is covered only as far as the transition from mapping specifications to executable queries is concerned, there is still much future work to be conducted in these areas. One limitation of our proposed approach is the fact that the mapping representation phase requires the definition of two separate representations, one in RDF and the other in HTML. At the moment, there exists no RDF counterpart to XSL stylesheets which would allow the rendering of a single document in various representation formats.

Chapter 7

The OAI2LOD Server — Wrapping OAI-PMH Data Sources

For including data sources in an integration context, we require wrapper components that expose schema information on the Web and make the contained metadata accessible via SPARQL. In this chapter, we provide — as a proof of concept — an example of such a wrapper according to our architecture. For this purpose, we describe the OAI2LOD Server, which is a wrapper component for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [LdS02]. The OAI-PMH protocol is utilised for the exchange and sharing of metadata for digital and non-digital items and enjoys growing popularity in the domain of digital libraries and archives. Currently, we know of more than 1700 OAI-PMH compliant repositories exposing metadata descriptions for several millions of items. The design of OAI-PMH is based on the Web Architecture [JW04], but it does not treat its conceptual entities as dereferencable resources. Selective access to metadata is out of its scope too: one can, for instance, retrieve metadata for a certain digital item, but one cannot retrieve all digital items that have been created by a certain author. In Section 7.1, we give a brief introduction to the technical characteristics of OAI-PMH. Then, in Section 7.2, we describe how the OAI2LOD Server builds a bridge between OAI-PMH and the Linked Data principles. In Section 7.3, we provide the implementation details, and finally, in Section 7.4, we discuss how the Linked Data concepts have already influenced protocol development in the digital libraries domain.

7.1 What is OAI-PMH?

Client applications can use the OAI-PMH protocol to harvest metadata from Data Providers using open standards such as URI, HTTP, and XML. Institutions taking the role of data providers can easily expose their metadata via OAI-PMH by implementing light-weight wrapper components on top of their existing metadata repositories.

7.1.1 Technical Details

The main conceptual entities in the OAI-PMH specification are Item, Record, and MetadataFormat. An item represents a digital or non-digital resource and is uniquely identified by a URI. It can be described by an arbitrary number of metadata records, each of which is bound to a certain metadata format, which can freely be chosen by the data provider. To guarantee a basic level of interoperability, all data providers must support

the unqualified Dublin Core [DC06] format. Further, OAI-PMH provides the concept of a Set for grouping related items and their associated metadata. OAI-PMH is implemented on top of HTTP and defines a set of verbs to request different information types: an Identify request retrieves administrative metadata (e.g., name, owner) about a repository as a whole. GetRecord is used to fetch an individual record for a certain item in a given format, whereas the request ListRecords harvests the metadata for all available items in a certain metadata format. ListIdentifiers returns the identifiers (URIs) of all available items, ListMetadataFormats the formats in which the data provider exposes metadata, and ListSets returns the available sets in an OAI-PMH repository. Figure 7.1 shows a sample GetRecord request for a Dublin Core metadata record exposed by the Library of Congress and the corresponding response. The request URI contains the address of the repository, the verb, and required parameters like the item URI. The response consists of a header section, which contains the item's URI, and a metadata section encapsulating the metadata record.

REQUEST:

    http://memory.loc.gov/cgi-bin/oai2_0?verb=GetRecord&identifier=oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163&metadataPrefix=oai_dc

RESPONSE (XML markup omitted):

    header: identifier oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163, setSpec ascfrbib
    metadata (oai_dc): "Don Christopher Columbus to his friend, Don Louis de Santangel, on his arrival from his first voyage. At the Azores, Feb. 15, 1493."; "Columbus, Christopher."; "America--Discovery and exploration--Spanish--Early works to 1800."; http://hdl.loc.gov/loc.gdc/gcfr.0018_0163; "America"

Figure 7.1: Sample OAI-PMH communication

7.1.2 Spreading and Future of OAI-PMH

There exist a number of OAI Data Provider Registries1, from which we know that currently 1765 institutions worldwide maintain OAI-PMH repositories. Regarding their application domain, we can observe that the protocol has been implemented in a variety of institutions, ranging from small research facilities to national libraries that have integrated this protocol with their catalogue systems. Examples are the Institute of Biology of the Southern Seas, exposing 403 records, and the U.S. National Library of Medicine's digital archive, exposing 1,272,585 records. In order to estimate the amount and the characteristics of metadata one can retrieve via OAI-PMH, we have carried out an analysis of the 915 registered repositories that delivered valid responses. Figure 7.2 illustrates the size of these repositories using a logarithmic scale on the Y-axis. The results show that 843 or 92% of all repositories expose metadata for less than 20,000 items. With 14,303 being the average number of items, the total number of 13,087,842 items is made up of a large number of smaller OAI-PMH repositories.

Number of items in repository / number of repositories: 1-20,000: 843; 20,000-40,000: 21; 40,000-60,000: 16; 60,000-80,000: 7; 80,000-100,000: 4; > 100,000: 24 (total: 915)

Figure 7.2: Size of OAI-PMH repositories

In total, the analysed repositories expose 161 different metadata formats. Figure 7.3 illustrates the top ten metadata formats and clearly shows that, besides unqualified Dublin Core, which is required to be implemented by definition, RFC1807 (12%), MARC (11.8%), MARC-21 (10.3%), MODS (7.5%), and METS (5.7%) are most frequently used2. The large gap between Dublin Core and the other metadata formats reveals that most data providers do not follow the OAI-PMH standard's suggestion of exposing metadata in a semantically richer format than unqualified Dublin Core.

We expect the number of institutions that expose metadata via OAI-PMH to grow even further. Major attempts at building union catalogues, e.g., The European Library (TEL) project3, rely on this protocol for indexing metadata originating from remote sources. Currently, that initiative integrates 47 national libraries and gives access to approximately 150 million metadata records. Since the OAI-PMH endpoints of these libraries are not yet listed in the before-mentioned OAI Data Provider Registries, we could not consider them in our analysis. Another reason why we expect the number of OAI-PMH endpoints to grow is that popular open source digital library systems, such as Fedora4, DSpace5, and EPrints6, provide OAI-PMH support by default. In recent years, these systems have found widespread adoption in various small and medium institutions (e.g., universities or museums) and will foster the global distribution of open and Web-accessible metadata even more.

1 Available OAI Data Provider Registries: http://www.openarchives.org/Register/BrowseSites, http://gita.grainger.uiuc.edu/registry/
2 Further information about these standards: http://www.loc.gov/standards and http://rfc.net/rfc1807.html
3 The European Library (TEL): http://www.theeuropeanlibrary.org

Top 10 Metadata Standards (number of repositories): Unqualified Dublin Core: 900; RFC1807: 110; OAI MARC: 108; MARC21 Slim: 94; MODS: 69; METS: 52; ETDMS: 45; UKETDDC (Qual. DC): 41; MPEG-21 DIDL: 39

Figure 7.3: Top 10 metadata formats

7.1.3 Shortcomings of OAI-PMH

The OAI-PMH protocol has been designed for transferring large amounts of metadata from a server to a client over the Web. From that perspective, it provides a reasonable solution for clients that need to aggregate or index metadata. However, it has two significant drawbacks:

• Non-dereferencable identities: although OAI-PMH is built on the Web infrastructure, we believe that it does not yet make use of its full potential. To retrieve information from a repository, a client must execute an HTTP GET request on an OAI-PMH specific URI (see Figure 7.1). This prevents Web clients that are unaware of the protocol specifics from accessing the repository.

• Restricted selective access to metadata: the record selection criteria in the OAI-PMH harvesting process are restricted to item identifiers, metadata formats, sets, and record creation date intervals. However, some clients might only be interested in records matching certain criteria (e.g., all records describing items created by X) or even just a subset of the available metadata values (e.g., all authors of all books in a library).

4 http://www.fedora.info
5 http://www.dspace.org
6 http://www.eprints.org

One could argue that these features are out of the scope of OAI-PMH and already implemented by other digital library protocols such as Z39.507 or SRU8. However, because of the popularity and widespread adoption of OAI-PMH in contrast to other protocols, we believe that it should be enhanced in order to solve the above-mentioned drawbacks. Institutions that employ OAI-PMH could then provide powerful metadata access functionality by implementing just a single protocol.

7.2 Design Considerations

At first glance, the OAI2LOD Server is a wrapper that exposes metadata of OAI-PMH compliant data sources as Linked Data on the Web and provides a SPARQL query interface to these metadata. During design time we have noticed that it also covers large parts of the OAI-PMH features by simply following the Linked Data rules [BL06] and provides solutions for the shortcomings mentioned in the previous section. The first Linked Data rule says that things should have URIs. In the context of OAI-PMH, items and sets are such things. By definition, items already fulfil that rule because, according to the OAI-PMH specification, each item must be identified by a URI (e.g., oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163). This is not the case for sets, as they are identified by arbitrary strings consisting of any valid URI unreserved characters (e.g., ascfrbib); such strings are not valid URIs. According to the second rule, URIs that identify resources should be resolvable HTTP URLs. In OAI-PMH it is common to use non-resolvable URNs to identify items. The OAI2LOD Server bridges this gap by wrapping item URNs and set identifiers with resolvable HTTP URLs. Continuing the above example, the item's URI becomes http://example.com/resources/item/oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163, and the set's identifier becomes http://example.com/resources/set/ascfrbib. The third Linked Data rule proposes to deliver useful information whenever a URL is dereferenced. The OAI-PMH protocol delivers useful information for harvesting clients that can parse and process OAI-PMH responses. We believe that the provided metadata might also be valuable for applications that are unaware of the OAI-PMH protocol specifics. For humans, we should provide the possibility to browse, display, and search metadata using an ordinary Web browser. Applications such as Web crawlers should be able to access OAI-PMH metadata without knowing the protocol details. We fulfil this requirement (i) by assuring that the responses delivered to a client contain only resolvable HTTP URLs, and (ii) by exposing data in various representations. When delivering metadata records to the client, we must assure that each field (e.g., creator) within a record has a resolvable URL assigned. For some formats (e.g., Dublin Core) this is the case by definition (e.g., http://purl.org/dc/elements/1.1/creator); for others we must publish a machine-readable representation (e.g., in RDF/S or OWL) on the Web. Further, we have defined a machine-processable vocabulary9 defining OAI-PMH specific concepts such as Item and Set. XHTML and RDF serialisation formats, i.e., RDF/XML and N3, are the data representations the OAI2LOD Server currently supports. While Web browsers can process the former and display the returned information to humans, the latter can be processed by machines. The server uses content negotiation, as explained in [BCH07], to decide which representation to deliver. In the context of OAI-PMH, the fourth Linked Data rule recommends that metadata records should contain links to other related resources. One kind of link that should be included in a record delivered to a client is a reference to its origin, i.e., the OAI-PMH endpoint and all relevant protocol parameters required to retrieve the corresponding XML representation of an item and its records. We express this information using the OAI2LOD specific oai2lod:origin property, which is defined as a sub-property of rdfs:seeAlso.

7 http://www.loc.gov/z3950/agency/Z39-50-2003.pdf
8 http://www.loc.gov/standards/sru/specs/
9 http://www.mediaspaces.info/vocab/oai-pmh.rdf

Searching other OAI2LOD Server instances for equivalent or similar metadata records is another strategy for adding links. If we refer to the example presented in Figure 7.1, it is quite likely that other institutions also have a copy of this book. This fact can be captured by adding an owl:sameAs10 property to the metadata record. Currently we do this by taking metadata records originating from distinct server instances and comparing the values of a set of manually selected metadata fields according to their lexical similarity using the Levenshtein string distance [Lev66]. If the similarity of two entries is above a certain threshold, the two records are linked. We ask the server administrator to specify (i) remote OAI2LOD Servers that expose relevant data sets for the linking process, (ii) pairs of source and target fields to be analysed, and (iii) a similarity threshold for each pair. Figure 7.4 shows the RDF/XML representation of our example metadata record as it is returned by the OAI2LOD Server. It contains the same metadata as the record in Figure 7.1 but represents them according to the Linked Data principles. The owl:sameAs property points to a fictive URL which maintains metadata about the same book. We can see that by following the Linked Data rules, we have bridged the problem of non-dereferencable identities and support access to metadata repositories for a variety of Web agents.

(RDF/XML representation omitted; the record carries the same descriptive values as the record shown in Figure 7.1.)

Figure 7.4: Sample OAI2LOD Server response
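The field comparison underlying this linking step can be illustrated with a small, self-contained sketch. The field values, item URIs, and the threshold below are invented for illustration only; in the OAI2LOD Server they are taken from the administrator's linking configuration described above.

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.vocabulary.OWL;

    public class SameAsLinkingSketch {

        // Plain Levenshtein distance (dynamic programming).
        static int levenshtein(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            return d[a.length()][b.length()];
        }

        // Normalised similarity in [0,1].
        static double similarity(String a, String b) {
            int max = Math.max(a.length(), b.length());
            return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
        }

        public static void main(String[] args) {
            // Hypothetical title values of two records from distinct server instances.
            String localTitle  = "Don Christopher Columbus to his friend, Don Louis de Santangel";
            String remoteTitle = "Don Christopher Columbus to his friend Don Louis de Santangel";
            double threshold   = 0.9;  // configured per field pair by the administrator

            if (similarity(localTitle, remoteTitle) >= threshold) {
                Model m = ModelFactory.createDefaultModel();
                // Hypothetical item URLs of the two matched records.
                m.add(m.createResource("http://example.com/resources/item/oai:local:1"),
                      OWL.sameAs,
                      m.createResource("http://other.example.org/resources/item/oai:remote:7"));
                m.write(System.out, "N3");
            }
        }
    }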

10 The owl:sameAs property, which is defined as part of the Web Ontology Language OWL, indicates that two URI references actually refer to the same real-world thing.

7.3 Implementation

The OAI2LOD Server, as illustrated in Figure 7.5, is a stand-alone server implemented in Java and based on the architecture of the D2RQ Server [BS04]. It can be configured to expose all metadata records from a specific OAI-PMH endpoint in a certain metadata format according to the principles described above. A scheduled process regularly harvests metadata from the given endpoint, transforms them into RDF/XML using a format-specific XSL style-sheet, stores the transformed metadata in a built-in triple store, and exposes the metadata to various kinds of clients. The built-in Request Handler/Dispatcher analyses the Accept property in the HTTP headers and delivers metadata either in RDF/XML (Accept: application/rdf+xml) or in XHTML (Accept: application/xhtml+xml). Then it forwards the client requests to the OAI2LOD Server's entry point that provides metadata in the appropriate representation using the HTTP 303 See Other response.

(Architecture diagram: Linked Data clients, HTML browser clients, and SPARQL clients access the OAI2LOD Server via HTTP; the server comprises a Request Handler/Dispatcher, a built-in Triple Store, and an OAI-PMH Harvester with its configuration and XSL stylesheets, and harvests via HTTP from an OAI-PMH Data Provider.)

Figure 7.5: The OAI2LOD Server architecture
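The harvest-and-transform step of this architecture can be sketched, in a heavily simplified form, with standard Java XSLT processing and the Jena API. The ListRecords URL and the stylesheet name below are placeholders, and OAI-PMH resumption tokens (needed for harvesting large repositories in batches) are deliberately ignored in this sketch.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class HarvestSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical OAI-PMH ListRecords request and format-specific stylesheet.
            String listRecords =
                "http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&metadataPrefix=oai_dc";
            String stylesheet = "oai_dc2rdf.xsl";  // placeholder name

            // 1. Harvest the OAI-PMH XML response and 2. transform it into RDF/XML.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(stylesheet));
            ByteArrayOutputStream rdfXml = new ByteArrayOutputStream();
            t.transform(new StreamSource(listRecords), new StreamResult(rdfXml));

            // 3. Load the transformed metadata into the built-in triple store
            //    (an in-memory Jena model in this sketch).
            Model store = ModelFactory.createDefaultModel();
            store.read(new ByteArrayInputStream(rdfXml.toByteArray()), "");
            System.out.println(store.size() + " triples harvested");
        }
    }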

URI paths are used to expose different types of information in different representations. The /resource path holds the URIs of all items and sets exposed by the server. When a client requests such a URI, the OAI2LOD Server examines the Accept property and points to the URI path that delivers information in a representation suitable for the client: the /data path provides access to all machine-readable RDF descriptions for a certain resource; the /page path returns the same information in XHTML. Further, the /directory path lists what types of resources (e.g., items, sets) are available in an XHTML representation. Analogously, the /all path delivers that information in a machine-readable RDF representation. Figure 7.6 shows examples of OAI2LOD Server requests and the corresponding OAI-PMH requests that return the same information.

Requested information | OAI2LOD Request | OAI-PMH Request

All available resource types | / (in HTML), /all (in RDF) | N/A

All item identifiers | /directory/Item (in HTML), /all/Item (in RDF) | /oai?verb=ListIdentifiers&metadataPrefix=oai_dc

The metadata record describing a certain item | /resource/item/oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163 — /page/item/oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163 (XHTML), /data/item/oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163 (RDF) | /oai?verb=GetRecord&identifier=oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163&metadataPrefix=oai_dc

Figure 7.6: Comparison of OAI2LOD and corresponding OAI-PMH requests

7.3.1 Preliminary Experiences

The OAI2LOD Server version 0.2 serves records from an in-memory Jena RDF model, which is fed with metadata records exposed by a certain OAI-PMH endpoint. The number of records a server instance can host depends on the amount of memory assigned to the Java Virtual Machine. In our test environment11, we have exposed 25,000 records in a JVM having 128 megabytes of RAM assigned. This indicates that a large fraction of existing OAI-PMH repositories (see Figure 7.2) could use the OAI2LOD Server for exposing their metadata according to the Linked Data rules with very low resource effort.

7.3.2 Open Issues

Currently, the OAI2LOD Server exposes metadata records only in a single pre-defined format. When setting up a server instance for a specific OAI-PMH repository, the administrator decides in which format the metadata records are harvested. Since this approach contradicts a central idea of OAI-PMH, we will further investigate how the OAI2LOD Server could serve metadata in multiple formats. One potential solution is to define mappings between formats. Another important OAI-PMH feature is batch retrieval of metadata records. Using the ListRecords request, a client can iteratively retrieve a chunk of records. The OAI2LOD Server currently supports this feature through SPARQL and its LIMIT and OFFSET clauses. However, we believe that alternatively we could offer that feature via a dereferencable URI. The OAI2LOD Server's capabilities of linking items with other resources on the Web are limited and still rely on human intervention. We need to experiment with further duplicate detection algorithms and similarity metrics in order to achieve better and scalable results.
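For example, a client could page through all exposed items with a query along the following lines; the page size of 100 and the namespace form assumed for the OAI-PMH vocabulary (footnote 9) are illustrative only.

    # Second page of 100 item URIs; ORDER BY makes the paging deterministic.
    PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX oai2lod: <http://www.mediaspaces.info/vocab/oai-pmh.rdf#>

    SELECT ?item
    WHERE { ?item rdf:type oai2lod:Item }
    ORDER BY ?item
    LIMIT 100
    OFFSET 100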

11 http://www.mediaspaces.info:3030/

7.4 The Future of OAI-PMH

The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) [LVJ+07] specification is the latest standardisation effort driven by the designers of the OAI-PMH protocol. Although the standards are still in an alpha release status, we can already notice strong similarities with the ideas of Linked Data and the OAI2LOD Server respectively. OAI-ORE is a set of standards for the description and exchange of aggregations of Web resources. A resource can be anything that is identified with a URI, such as Web sites, online multimedia content, or items stored in institutional digital library systems. In the ORE data model, an aggregation is an instance of the conceptual entity Resource Map and is identified by a URI. A resource map describes the encapsulated resources as a set of machine-readable RDF statements, which makes them readable for a variety of Web applications. Clients can retrieve aggregations by executing an HTTP GET request on a resource map's URI. The ATOM Syndication Format12 is specified as the primary serialisation format for delivering resource maps to clients. However, since the ORE data model is defined in RDF, resources can not only be mapped to the ATOM format but also serialised in other RDF exchange formats such as RDF/XML or N3. Regarding the OAI-ORE specification from the perspective of Linked Data, we can observe that the first two Linked Data rules are fundamental building blocks of the standard: all things, i.e., resource maps and the aggregated resources, are identified by dereferencable URIs. Further, all terms used for describing aggregations have a well-defined semantics, published in terms of a Web-accessible vocabulary definition. It also considers the third rule because resolving the URIs returns useful — i.e., processable and interpretable — information for both humans and machines. Finally, OAI-ORE also follows the fourth rule by providing several possibilities to link resources: first, an aggregation of resources is by definition a collection of linked (ore:aggregates) resources; second, the ORE model uses the owl:sameAs property to denote that two identifiers refer to the same information object; third, it supports the concept of nested aggregations. OAI-PMH and OAI-ORE overlap in the fact that Resource Maps can be included as metadata records in OAI-PMH responses, which allows batch retrieval and harvesting of aggregation information. We believe that there lies a great potential in a tighter integration of these two standards: if OAI-PMH metadata repositories expose their items as Web resources by assigning them HTTP-dereferencable URIs, these items could take part in OAI-ORE aggregations. One possible strategy could be to define a common core data model that links these two standards so that the ORE specification builds on top of the OAI-PMH protocol. Meanwhile, the OAI2LOD Server can serve as a bridge between these two standards.

7.5 Summary

In this chapter, we have presented the OAI2LOD Server as a proof of concept for a wrapper component. It allows us to expose metadata and schema information provided by OAI-PMH endpoints as dereferencable Web resources and provides selective, structured access to these data via a SPARQL query interface. We have designed the OAI2LOD Server according to the Linked Data Principles, which makes OAI-PMH metadata accessible also for (Web) clients that are not aware of the OAI-PMH protocol specifics. Although the linking of related metadata sets is not a requirement imposed by our Web-based metadata integration architecture, the OAI2LOD Server provides rudimentary support for this feature. Improving the linking capabilities and interlinking OAI2LOD Server instances with other data sets exposed according to the Linked Data principles is on our future research agenda.

12RFC 4287 — The Atom Syndication Format, available at http://www.ietf.org/rfc/rfc4287.txt

Chapter 8

Qualitative Evaluation and Case Study

In the preceding chapters, we have presented the methodology and concepts of a Web-based metadata mapping approach and described its implementation for the RDFS schema definition language. In this chapter, we compare it with other existing mapping solutions and demonstrate its feasibility and practical benefit in an example metadata integration scenario. Section 8.1 provides an in-depth analysis of the requirements our approach implements and compares them with those of other mapping solutions. Thereafter, in Section 8.2, we present a case study we have performed on the basis of the illustrative examples presented earlier in this work.

8.1 Comparison with Existing Mapping Solutions

Previously, in Section 3.4, we have performed an analysis of existing mapping solutions against the requirements framework we have set up based on an extensive literature study. Now, in this section, we will do the same for our proposed Web-based metadata mapping solution and point out the areas where our approach goes beyond existing ones, but also show the remaining limitations of our work. For the analysis, we follow the same structure as in the analysis of existing tools (see Section 3.4) and discuss, for each mapping phase, to what extent our approach fulfils a certain requirement in that phase.

8.1.1 General Requirements

This category contains all requirements that cannot be assigned to any of the four mapping phases. They are, however, essential for any mapping solution.

Uniform Accessibility

If a mapping solution provides uniform accessibility, a domain expert can access a set of autonomous, distributed, and heterogeneous data sources via a single query interface, using a specific query language and by formulating queries over a certain target (mediation) schema. Different from pure mapping solutions, our approach goes beyond the formulation and representation of mappings and provides the means to translate them into executable query templates. User queries, received at the mediator interface, trigger these templates, fetch the requested metadata expressed in terms of the target metadata schema from the integrated data sources, and return the results to the requesting client. Therefore, our approach clearly supports the uniform accessibility requirement.


Modularity

For adding an additional, incompatible data source to an integration context, i.e., to a certain mediation endpoint, it is not necessary to change the implementation of the already existing, productive endpoint. While other solutions require the redefinition and redeployment of previously created integration (mapping) specifications, in our approach the domain expert can simply add additional data sources by (i) installing a wrapper component for that data source, (ii) defining and publishing a mapping between the schema of the newly created, wrapped data source and the target (mediation) schema, and (iii) registering the data source at a mediation endpoint by providing its configuration (data source, schema, and mapping URI) via a Web interface. Since adding another data source does not require any modification of existing system components, our solution fulfils that requirement.

Flexibility in Lifting and Normalisation

Lifting metadata, which are expressed in different schema definition languages, to a common metadata meta-model, which is in our case RDFS, is the wrapper components' task. In fact, one can build wrappers for any data source, ranging from industry-standard databases (e.g., RDBMS) through Web-service adapters to wrappers for spreadsheet data. Therefore, in our architecture, lifting and normalisation is the task of the wrapper designers and implementers. They must guarantee that the schema is defined in RDFS and published on the Web, that the metadata themselves are delivered in RDF, and that both are accessible via a SPARQL query interface. However, in contrast to other works, our approach currently does not provide any automatic means for transforming schema definitions to RDFS; for that part we referred to work conducted by others. Therefore, we consider this feature to be partially supported by our approach.

Mapping GUI

Providing a fully-fledged, user-friendly mapping GUI that supports domain experts in creating mapping specifications, especially in drawing mapping relationships, is an important feature that must be supported by any industrial mapping solution, be it a standalone or a Web-based approach. In Web environments, such a GUI should run in an ordinary Web browser and must therefore be based on Web front-end technologies (JavaScript, CSS, Ajax). The main challenge lies in dealing with the limited interaction possibilities (e.g., drag-and-drop) current Web technologies offer. So far, our approach provides a simple Web GUI for setting up mediation endpoints and for searching metadata registry nodes. The design and implementation of a Web-based front-end has been outside the scope of this thesis and is a subject of our future work. Hence, in the context of this work, we can rate this requirement as being partially supported.

8.1.2 Mapping Discovery

The mapping discovery phase is concerned with matching metadata schemes, i.e., with determining mapping relationships between the elements of two distinct metadata schemes — if possible automatically. Here we briefly discuss to what extent our approach supports this phase.

Schema Matching / Alignment Support

At the beginning of our thesis, we have excluded the development of new matching algorithms from the scope of this work, because this would shift its focus in a completely different direction. Schema matching — and, to use recent terminology, ontology alignment — has been an active research area for decades, but the results still cannot produce reliable mappings without the supervision of domain experts. This is the main reason why schema matching algorithms have not found widespread adoption in commercial mapping solutions. Although we do not provide schema matching algorithms, we have considered that there could be a need to integrate them into our mapping approach and provided an adequate interface (see Section 5.5). The requirement itself, however, is not supported by our solution.

Consensus Building Features

Our analysis of existing solutions revealed that currently only Yahoo Pipes partially supports building consensus on conflicting requirements, simply by providing user and community features. We could say that these features do not provide but enable consensus building. Regarding our metadata mapping approach, the same is the case: we do not provide but enable consensus building. Simply by publishing mapping specifications on the Web and providing the possibility of copying and adopting mappings for similar integration scenarios, we enable domain experts to agree on a set of common mapping relationships. This would not be possible if mapping specifications were held in closed system environments. Therefore, we consider this requirement as being partially fulfilled.

8.1.3 Mapping Representation

Previously we stated that the aim of a mapping model is to provide the language primitives required to reconcile the structural and semantic heterogeneities among metadata information objects. It should be expressive enough to bridge the various groups of heterogeneities described in Section 2.3.2. In the following, we will analyse to what extent the mapping model presented in Section 6.3 fulfils that requirement.

Model-Level Structural Heterogeneity Reconciliation

This group of heterogeneities occurs because domain experts arrange the model elements that reflect the constituents of a certain domain in various ways and at various levels of detail. The presented mapping model can resolve these heterogeneities as follows:

• Naming conflicts: if classes or properties in distinct models are related but have different names assigned (e.g., lastname vs. surname), one can easily represent that fact by introducing a ClassMapping or a PropertyMapping respectively.

• Identification conflicts: different (URI) identifiers in related classes or properties can be resolved the same way as naming conflicts.

• Constraints conflicts: if two distinct models apply incompatible constraints on their properties (e.g., datatypes, enumeration values), one can introduce a dedicated Function that, for instance, performs datatype conversion or maps a broader range of literal values to a given set of enumeration values.

• Abstraction Level Incompatibilities: such heterogeneities occur, for instance, when model A defines the concepts Person and Organisation, while model B subsumes these classes in a single concept Agent. In such cases, one can apply a ClassMapping to relate the concepts and split up or merge the properties of the involved concepts using PropertyMappings and appropriate Functions.

• Meta-Level Discrepancy: if one model defines a property to capture certain information (e.g., creator) and another model defines a class (e.g., Person) with several properties (e.g., firstname, lastname) for a more fine-grained representation of the same information, one can resolve that as illustrated in Example 6.5 by introducing a PropertyMapping bound to the respective class using the classContext property.

• Domain Coverage: different models usually describe one and the same domain at different levels of detail. As a result, it frequently occurs that no mapping relationships between model elements can be determined, because either the source or the target model simply does not define any explicit elements covering a specific domain aspect. If the information is implicitly available in the instance values of other elements, one can introduce Functions that extract that information or even contact external services in order to obtain the necessary information. For instance, if a source model defines a property location with instance location denominators in natural language (e.g., Vienna) and the target model represents location information using GIS coordinates, one could introduce a function location2gis that contacts an external service to retrieve the corresponding GIS coordinates. If the required information is not even available implicitly, domain coverage conflicts cannot be resolved by any mapping solution.

Model-Level Semantic Heterogeneity Reconciliation

Semantic heterogeneities occur because of differences in the semantics of models on the M1 level, i.e., due to differences in the elements defined by a metadata schema:

• Domain Conflicts: if two models describe two completely incompatible domains (e.g., electronic billing vs. multimedia contents), mappings can hardly be determined. We can, however, assume that mappings will only be created in a specific integration context within a certain domain. If domains subsume each other or overlap, one can apply the above-mentioned mapping model primitives.

• Terminological Mismatches: synonym or homonym conflicts between model elements can easily be resolved by creating mapping relationships between semantically related model elements. Also on the instance level, one can handle such mismatches using Functions. One could even introduce a function that looks up synonyms or homonyms in external thesauri to determine if there is a semantic correspondence between two instance values.

Instance-Level Semantic Heterogeneity Reconciliation

Semantic heterogeneities can also occur because of semantic discrepancies on the M0 level, i.e., due to differences in the content values of metadata information objects:

• Scaling/Unit Conflicts: can easily be resolved by introducing Functions that convert from one scale/unit to the other.

• Representation Conflicts: as with scaling/unit conflicts, this kind of heterogeneity can be resolved by introducing a Function that converts content values from one encoding scheme to another.
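Such instance transformation Functions — as mentioned for the scaling/unit and representation conflicts above — can be arbitrarily simple or complex. The following minimal Java sketch is purely illustrative: the unit conversion and the value format are invented for this example and are not part of the normative mapping model; real Functions are registered with the mapping model as described earlier.

    import java.util.Locale;

    public class UnitConversionFunction {

        // Illustrative instance transformation function: converts a source value
        // given in centimetres (e.g., "180 cm") into an inch-based representation
        // expected by a hypothetical target schema.
        public static String cmToInch(String sourceValue) {
            double cm = Double.parseDouble(sourceValue.replace("cm", "").trim());
            double inches = cm / 2.54;
            return String.format(Locale.US, "%.2f in", inches);
        }

        public static void main(String[] args) {
            System.out.println(cmToInch("180 cm"));  // prints "70.87 in"
        }
    }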

Context Representation

The semantics of a mapping relationship depends on the interpretation context of the domain expert who has created the mapping specification. For other experts, the interpretation could be completely different. For the purpose of representing a domain expert's interpretation context after having created a mapping specification, we rely on URI namespaces. Since each mapping model is a specialisation of a generic graph, which in turn is identified by a URI, we have a mechanism we can use for context representation. Thus, each mapping URI represents a certain context and we can say that this requirement is fulfilled.

Flexible Language Binding

The core of our mapping solution relies on a generic, graph-based abstract mapping model that reflects the mapping problem independent of any M2 schema definition language. In this thesis, we have implemented an RDFS language binding, but in principle, bindings can be created for any schema definition language that can be reduced to a basic graph structure. XML, for instance, is based on a hierarchical tree model, which is in fact a specialisation of a graph (see Section 2.2.2) and could therefore also be bound to the abstract mapping model. The same is the case for the object-oriented model, which allows the definition of object graphs. The clear disadvantage of our approach is the increased complexity and the additional implementation effort for each schema definition language binding. The flexible language binding requirement, however, is fulfilled.

8.1.4 Mapping Execution

The mapping execution phase is concerned with processing previously defined mapping specifications at run-time and with delivering the desired results to the requesting application. In the following, we analyse how and to what extent our solution fulfils the requirements we have identified for this phase.

Query Reformulation

In the context of virtually integrated systems, this requirement demands that a mapping solution can reformulate queries received by a mediation endpoint according to a mapping specification. We follow an approach that can be compared to views in relational databases: a mapping specification compiles to a set of query templates, which provide source-schema transparent access to a data source's metadata. An incoming user query is executed against these templates, which in turn return the results in the appropriate representation. Therefore, our query template approach fulfils the query reformulation requirement.
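A minimal sketch of such a view-like query template could look as follows; the source property onb:photographer is a made-up example, and the actual templates are generated from mapping specifications as described in Section 6.4.2 and may additionally invoke instance transformation functions.

    # Generated query template (sketch): Dublin Core statements are CONSTRUCTed
    # from source-schema statements.
    PREFIX dc:  <http://purl.org/dc/elements/1.1/>
    PREFIX onb: <http://www.example1.com/schema/onb#>

    CONSTRUCT { ?resource dc:creator ?name }
    WHERE     { ?resource onb:photographer ?name }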

Query Plan / Optimiser

Query optimisation has not been the main focus of this thesis and therefore remains a major area for future work in our mapping solution. In Section 6.4.3, we have described a first, basic optimisation strategy that selects the query templates to be executed by analysing whether the graph structure defined in the CONSTRUCT clause of a template is contained in another template. As a result, the total number of templates and consequently also the total number of queries to be executed against the wrapped data sources is reduced. A fully fledged query planning and optimisation strategy is not yet available. Therefore, we rate this requirement as partially fulfilled.

Integration Component Generation

Via a Web-based front-end, domain experts can easily set up new mediation endpoints by providing the URLs of the data sources to be integrated and the respective mapping specifications. The mapping execution logic is handled transparently by the mediation component and does not require any further development effort by the domain expert. Therefore, this requirement is fulfilled.

8.1.5 Mapping Maintenance

For supporting the mapping maintenance phase, we have introduced the concept of a Web-based mapping registry, which allows the discovery of available metadata schemes and mapping specifications. As we will see in the following, it does not cover all mapping maintenance requirements, but it does cover at least one essential one.

Mapping Verification

The validity of a mapping specification can be determined only within a specific context because it depends on the interpretation of one or more domain experts. However, even within a certain context there could be inconsistencies. One could, for instance, reference non-existing schema elements or map the same schema elements several times but differently within one and the same mapping specification. So far, commercial solutions and academic prototypes alike provide only limited means for mapping verification. In our approach, this aspect is not yet covered, but it is definitely a subject of future work.

Mapping Reusability

The reusability requirement has been a major motivation for lifting the mapping process to the level of the Web. We believe that domain experts can benefit from the work of others and reuse mapping specifications in their own integration context by tailoring them to their application-specific needs. Analogous to human-readable Web sites, where anyone can view and reuse — as long as there are no legal restrictions — the source code of (X)HTML pages, we want to achieve the same for mapping specifications. Via the registry service, domain experts can search for already existing mappings among specific metadata schemes, decide if they are relevant, and modify them according to their own needs. Therefore, we support the requirement of mapping reusability.

Mapping Inference

The requirement mapping inference denotes the ability to derive additional mapping relationships from existing ones. One could, for instance, exploit transitive mapping relationships among metadata schemes and semi-automatically obtain input for additional mapping specifications. Like all mapping solutions we have analysed so far, we do not yet support this requirement.

8.1.6 Summary of Qualitative Evaluation

In Table 8.1, in the right-most column, we have summarised the results of our qualitative evaluation and compared them against the requirements other existing mapping solutions fulfil. At first glance, we can see that the strength of our approach lies in the mapping representation and in the mapping execution phase. From the general requirements category, we fulfil uniform accessibility and modularity, which are two requirements only provided by large Enterprise Information Integration (EII) and Enterprise Application Integration (EAI) suites as well as by the StylusStudio mapping tool and Yahoo Pipes. For lifting models expressed in other schema definition languages than RDFS, we referred to related work and provided examples for the three illustrative scenarios presented at the beginning of this thesis. A first mapping GUI prototype version is available, but still in a very rudimentary state. Developing schema matching algorithms for the mapping discovery phase has been out of the scope of this thesis and is therefore, as in most commercial mapping applications, not supported. However, by following an open Web-based approach, we at least support consensus building in mapping discovery. The clear strength of our approach lies in the mapping representation phase. Especially the introduction of instance transformation functions provides mapping capabilities that go beyond those of existing solutions. Additionally, we provide the possibility to represent the context of a mapping specification by means of URIs. By following a generic design approach, we also achieve flexibility in language binding. Also the mapping execution phase, or more specifically, the transition from the representation to the execution phase, is well supported. We provide a semi-automatic approach for integration component generation, whereas the generated mediation components take over the task of query reformulation and, to a minor extent, also the calculation of a query plan and optimisation strategies.

(Table 8.1 compares the proposed mapping solution with the EII and EAI suites, mapping tools, and other solutions analysed in Section 3.4 — IBM WebSphere Information Integration, BEA Liquid Data 8.1, Avaki 7.0 (Studio/Server), Microsoft BizTalk Server 2006, Sybase Data Integration Suite, Cape Clear 7 (Studio/Server), Altova MapForce/SchemaAgent, StylusStudio/DataDirect XML Converters, TopBraid Composer, Clio, COMA++, and Yahoo Pipes — along the requirements discussed above: uniform accessibility, modularity, lifting and normalisation, mapping GUI, schema matching/alignment support, consensus building features, model-level structural heterogeneity reconciliation, model-level semantic heterogeneity reconciliation, instance-level semantic heterogeneity reconciliation, context representation, flexible language binding, query reformulation, query plan/optimisation, integration component generation, mapping verification, mapping reusability, and mapping inference. Each cell marks a requirement as supported, partly supported, or not supported; symbols omitted.)

Table 8.1: Qualitative evaluation against existing mapping solutions

Finally, by providing a mapping registry, we have also covered an important mapping maintenance requirement: mapping reusability. The other features are subject to future work.

8.2 Example Metadata Integration Scenario

In order to demonstrate the feasibility and practical benefits of our mapping solution, we have implemented a sample metadata integration scenario, which provides uniform access to three autonomous, distributed, and heterogeneous library catalogues. In particular, we realised an advanced search interface that allows users to search for resources (images, documents, etc.) in the integrated catalogues by entering search criteria for a set of given metadata fields. One could, for instance, formulate a query which searches for all library resources that are from a certain author and have a certain keyword in their title. The search form translates the search criteria into SPARQL query requests and executes them at a predefined mediation endpoint, which a domain expert has set up beforehand. Since we want to give users a simple user interface with a comprehensible set of metadata fields, we have chosen the Dublin Core metadata element set to be the target schema, i.e., the mediation endpoint can answer queries formulated over the Dublin Core schema and return metadata from other data sources that do not necessarily describe their metadata using that schema.
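The kind of query such a search form might generate over the Dublin Core target schema is sketched below; the author name and the title keyword are placeholder values entered by a user.

    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT ?resource ?title
    WHERE {
      ?resource dc:creator ?creator ;
                dc:title   ?title .
      FILTER regex(?creator, "Columbus", "i")
      FILTER regex(?title, "voyage", "i")
    }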

8.2.1 Initial Situation

Our case study integrates three heterogeneous data sources using the components we have presented in this thesis. These are:

• The Austrian National Library’s (ONB) image archive (see Section 2.1.2), which maintains metadata for historical images in a relational database1 using a proprietary schema.

• The Library of Congress (LOC) Open Archives repository2, which exposes collections of digitised historical material, including photographs, movies, maps, pamphlets, sheet music, and books via the OAI-PMH protocol using the MODS metadata schema3.

• The National Library of Australia’s (NLA) Digital Object repository4, which exposes metadata about manuscripts, maps, music, pictures, books, and serials using the Simple Dublin Core schema.

As illustrated in Figure 8.1, we have three institutions in three different locations (Austria, Australia, United States), each of them exposing metadata in a different metadata schema (proprietary ONB, MODS, Dublin Core). In total, we have two different technical interfaces: one SQL-accessible relational database and two OAI-PMH endpoints. The goal is to make the different kinds of data sources, the distributed locations, and the heterogeneities of the metadata provided by these sources transparent to a requesting client.

8.2.2 Wrapping the Data Sources

Wrapping the involved data sources using dedicated wrapper components is the first step to be performed. For the ONB’s relational database, we use the D2RQ Server [BS04], a component which publishes metadata residing in relational databases on the Web and provides SPARQL access to these metadata. For wrapping the LOC and NLA OAI-PMH metadata sources, we use the OAI2LOD Server presented earlier in this thesis.

1 In fact, the ONB exposes the metadata also via OAI-PMH using the Dublin Core schema. For the purpose of this case study, however, we demonstrate how metadata can be accessed directly from the underlying database.
2 The LOC OAI-PMH endpoint: http://memory.loc.gov/cgi-bin/oai2_0
3 The LOC exposes its metadata in three formats: Dublin Core, MODS, and MARC21. For demonstration purposes, we request MODS metadata.
4 The NLA OAI-PMH endpoint: http://www.nla.gov.au/apps/oaicat/servlet/OAIHandler

[Figure 8.1 depicts the set-up of the example scenario: a client poses Dublin Core SPARQL queries (QDC) against the DC mediation endpoint, which unfolds them into source-specific queries (QONB, QMODS, QDC) and forwards them to the SPARQL endpoints of the three wrapped sources: the ONB SPARQL endpoint (proprietary ONB schema), the LOC SPARQL endpoint (MODS schema), and the NLA SPARQL endpoint (Dublin Core schema).]

Figure 8.1: Example metadata integration scenario

Exposing ONB Metadata using D2RQ

A D2RQ Server instance, which has been set up for a certain relational database, requires a mapping file that defines the correspondences between tables and columns on the one hand and RDFS classes and properties on the other. Based on that information, it can rewrite incoming SPARQL queries to SQL statements and can transform the metadata to be returned into an RDF representation. Such a mapping file can be generated automatically by running the mapping generation tool, which is part of the D2RQ distribution. A detailed specification of the D2RQ mapping language is available elsewhere5. Example 8.1 shows an excerpt of the D2RQ mapping file: it provides mapping information6 for two of the three database tables (IMAGEDATA and IMAGEOBJECT) that we have presented in our illustrative example in Section 2.1.2. For reasons of space, we show only a small subset of the mapped table columns.

Example 8.1 Excerpt of the D2RQ configuration file for exposing ONB metadata

@prefix map:  <#> .   # local names for the mapping resources
@prefix onb:  <http://www.mediaspaces.info/schemes/onb#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .

# DB connection details
map:database a d2rq:Database;
    d2rq:jdbcDriver "oracle.jdbc.OracleDriver";
    d2rq:jdbcDSN    "jdbc:oracle:thin:@databaseURL:1521:database";
    d2rq:username   "userName";
    d2rq:password   "password";
    .

# Table IMAGEOBJECT
map:ImageObject a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern  "ImageObject/@@IMAGEOBJECT.ID@@";
    d2rq:class       onb:Image;
    .
map:ImageObject_TITLE a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:ImageObject;
    d2rq:property onb:title;
    d2rq:column   "IMAGEOBJECT.TITLE";
    .

# Table IMAGEDATA
map:ImageData a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern  "Image/@@IMAGEDATA.ID@@";
    d2rq:class       onb:ImageObject;
    .
map:ImageData_INFO a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:ImageData;
    d2rq:property onb:info;
    d2rq:column   "IMAGEDATA.INFO";
    .
map:ImageData_IMAGE a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:ImageData;
    d2rq:property onb:imageObject;
    d2rq:refersToClassMap map:ImageObject;
    d2rq:join "IMAGEDATA.ID = IMAGEOBJECT.BILD_ID";
    .

5 D2RQ User Manual and Language Specification: http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/
6 The original database tables and columns are named in German. Here we have translated them to English.

In the D2RQ mapping specification we can see that the tables and columns are mapped to certain RDFS classes and properties that are defined within the http://www.mediaspaces.info/schemes/onb# namespace7. To enable other applications to correctly interpret them, we must provide the respective schema definition at the location where the namespace points to. Therefore we create an RDFS schema definition for the ONB metadata schema and deploy it, as described in Section 6.6.1, in the corresponding location.
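The following fragment is a minimal sketch, in N3, of what such a schema definition could look like for the classes and properties referenced in the D2RQ mapping above; it is illustrative only, and omits the remaining terms of the ONB schema.

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix onb:  <http://www.mediaspaces.info/schemes/onb#> .

onb:Image        a rdfs:Class ; rdfs:label "Image" .
onb:ImageObject  a rdfs:Class ; rdfs:label "Image Object" .

onb:title        a rdf:Property ; rdfs:label "Title" .
onb:info         a rdf:Property ; rdfs:label "Info" .
onb:imageObject  a rdf:Property ; rdfs:label "Image Object" ; rdfs:range onb:Image .

Deploying this document at the location the onb namespace URI points to, as described in Section 6.6.1, makes the schema definition resolvable for any application that encounters the ONB terms.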

Exposing LOC Metadata via the OAI2LOD Server

The metadata provided by the Library of Congress OAI-PMH repository can easily be exposed by setting up an OAI2LOD Server instance. By default, OAI2LOD requests metadata in the Dublin Core format from an OAI-PMH endpoint; since in this case study we want to use the MODS metadata schema, we have to provide an adequate OAI2LOD configuration together with an XSL stylesheet that transforms retrieved MODS metadata records from XML to RDF/XML. Example 8.2 shows the configuration file we use to set up our OAI2LOD Server instance.

7 We use the http://www.mediaspaces.info/ namespace only for the purpose of this case study. In a productive environment, the institutions should host their metadata schemes and mappings themselves.

Example 8.2 OAI2LOD mapping file for exposing LOC metadata

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix oai2lod: .

<> a oai2lod:Server;
   rdfs:label "OAI2LOD Server exposing LOC metadata";
   oai2lod:port 2020;
   oai2lod:baseURI ;
   oai2lod:publishes ;
   .

a oai2lod:OAIServer;
   oai2lod:serverURL <http://memory.loc.gov/cgi-bin/oai2_0>;
   oai2lod:metadataPrefix "mods";
   oai2lod:styleSheet "xsl/mods2rdf_xml.xsl";
   .

In the XSL stylesheet, where the transformation from MODS XML to RDF/XML is defined, we must refer to classes and properties that are part of an RDFS definition of MODS. Before creating such a definition from scratch, one should always search the Web for already existing definitions. For the MODS schema, such a definition has indeed already been developed as part of the SIMILE8 project and is now available at http://simile.mit.edu/2006/01/ontologies/mods3.
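To illustrate the structure the stylesheet produces, the following N3 fragment sketches the kind of triples a transformed MODS record could contain. The blank nodes and literal values are purely illustrative, the assumption that the namespace URI of the SIMILE MODS ontology is the above URL followed by "#" is ours, and the properties linking the individual nodes to the described item are omitted.

@prefix mods: <http://simile.mit.edu/2006/01/ontologies/mods3#> .

[] a mods:Item ;
   mods:abstract "An illustrative abstract of the digitised item." .

[] a mods:Title ;
   mods:value "An illustrative title" .

[] a mods:CorporateName ;
   mods:name "Library of Congress" .

[] a mods:Topic ;
   mods:name "History" .

These class contexts (mods:Item, mods:Title, mods:CorporateName, mods:Topic) are exactly those referenced by the MODS-DC mapping specification in Example 8.4 below.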

Exposing NLA Metadata via the OAI2LOD Server

For exposing metadata from the National Library of Australia on the Web, we simply have to set up an OAI2LOD Server instance with default parameters, because support for the Dublin Core metadata schema is a built-in OAI2LOD feature. We only need to change the oai2lod:baseURI property to the host (and port) where the instance is running, and the oai2lod:serverURL to http://www.nla.gov.au/apps/oaicat/servlet/OAIHandler, which is the NLA's OAI-PMH endpoint URL.
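Assuming the same configuration vocabulary as in Example 8.2, the resulting configuration could look roughly as follows; the base URI and the local name <#nla> are placeholders for the actual deployment location.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<> a oai2lod:Server;
   rdfs:label "OAI2LOD Server exposing NLA metadata";
   oai2lod:port 2020;
   oai2lod:baseURI <http://localhost:2020/>;   # placeholder; replace with the actual host and port
   oai2lod:publishes <#nla>;
   .

<#nla> a oai2lod:OAIServer;
   oai2lod:serverURL <http://www.nla.gov.au/apps/oaicat/servlet/OAIHandler>;
   .

Since Dublin Core support is built in, the default settings should suffice for the remaining parameters, and no stylesheet needs to be provided.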

8.2.3 Creating the Mapping Specifications

In the next step, we must create mapping specifications for the involved data sources. The target schema is the Dublin Core schema and the two source schemes are the proprietary ONB and the MODS schema. For the NLA data source we do not have to specify mappings because the wrapper already exposes Dublin Core metadata.

Mapping from the ONB schema to Dublin Core

Example 6.5 in Section 6.3 illustrates part of the required mapping specification, which maps the onb:firstName and onb:lastName properties, in the context of the class onb:Person, to the dc:creator property. The property mapping uses the fn:concat instance transformation function to transform the ONB instances into the representation required by the dc:creator property. In Example 8.3, we present an excerpt of the extension of that mapping specification; essentially, we simply introduce additional property mappings. Since the Dublin Core usage guidelines propose ISO 8601 date representations (e.g., 2003-07-03) for the dc:date property, we introduce a custom instance transformation function that converts the ONB-specific date representation (e.g., 03-JUL-03) into the required format.

8 The SIMILE project: http://simile.mit.edu/

Example 8.3 ONB-DC mapping specification

map:title2title a mm:PropertyMapping;
    am:expression am:equivalent;
    mm:sourceClassContext onb:ImageData;
    mm:sourceElement onb:title;
    mm:targetElement dc:title;
    .
map:info2description a mm:PropertyMapping;
    am:expression am:overlap;
    mm:sourceClassContext onb:ImageData;
    mm:sourceElement onb:info;
    mm:targetElement dc:description;
    .
map:creationDate2date a mm:PropertyMapping;
    am:expression am:equivalent;
    mm:sourceClassContext onb:ImageData;
    mm:sourceElement onb:creationDate;
    mm:targetElement dc:date;
    am:transFunction map:dateConvert;
    .
map:dateConvert a am:Function;
    am:URI fx:date2rfc8601;
    am:argument (onb:creationDate);
    am:result dc:date;
    .
map:mimeType2format a mm:PropertyMapping;
    mm:sourceClassContext onb:ImageObject;
    mm:sourceElement onb:mimeType;
    mm:targetElement dc:format;
    .
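To sketch how such a mapping specification relates to the execution phase, the following fragment shows, in simplified form, how a client query over the Dublin Core target schema could be unfolded into a query over the ONB source schema on the basis of map:title2title. The actual SPARQL templates generated by our solution follow the mechanism described earlier in this thesis and additionally handle the instance transformation functions, which this sketch omits.

# Client query, formulated over the Dublin Core target schema:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?resource ?title
WHERE { ?resource dc:title ?title . }

# Unfolded query against the ONB SPARQL endpoint, derived from map:title2title
# (source class context onb:ImageData, source element onb:title):
PREFIX onb: <http://www.mediaspaces.info/schemes/onb#>
SELECT ?resource ?title
WHERE { ?resource a onb:ImageData ;
                  onb:title ?title . }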

Mapping from the MODS schema to Dublin Core

For the metadata provided by the LOC OAI2LOD Server, we must define mapping relationships between the source MODS schema and the target Dublin Core schema. Example 8.4 illustrates an excerpt of that mapping. We can see that MODS defines most properties (e.g., mods:name) in the context of a certain class (e.g., mods:CorporateName), while the Dublin Core element set is simply a flat list.

Example 8.4 MODS-DC mapping specification

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mods: .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

map:title2title a mm:PropertyMapping;
    am:expression am:targetInclude;
    mm:sourceClassContext mods:Title;
    mm:sourceElement mods:value;
    mm:targetElement dc:title;
    .
map:altTitle2title a mm:PropertyMapping;
    am:expression am:targetInclude;
    mm:sourceClassContext mods:AlternativeTitle;
    mm:sourceElement mods:value;
    mm:targetElement dc:title;
    .
map:corporateName2publisher a mm:PropertyMapping;
    am:expression am:targetInclude;
    mm:sourceClassContext mods:CorporateName;
    mm:sourceElement mods:name;
    mm:targetElement dc:publisher;
    .
map:abstract2description a mm:PropertyMapping;
    am:expression am:overlap;
    mm:sourceClassContext mods:Item;
    mm:sourceElement mods:abstract;
    mm:targetElement dc:description;
    .
map:topic2subject a mm:PropertyMapping;
    am:expression am:targetInclude;
    mm:sourceClassContext mods:Topic;
    mm:sourceElement mods:name;
    mm:targetElement dc:subject;
    .

8.2.4 Deploying a Mediation Endpoint

The deployment of a mediation endpoint is now a straightforward process and requires the following configuration steps:

• Using a Web browser, we contact the mediation service deployed at http://www.mediaspaces.info/mediation. The service gives us the option to either choose an existing mediation endpoint or to create a new one. We do the latter, create a new mediation endpoint, and obtain its URL (e.g., http://www.mediaspaces.info/mediation/ep1).

• At the Web site that appears when contacting the mediation endpoint's URL, we are prompted to enter details about the target schema and the involved data sources. We enter the URLs of the wrapper components we have deployed beforehand, together with the URLs of the source schemes that describe the metadata within these sources.

• Then we deploy the two mapping specifications on the Web (e.g., http://www.mediaspaces.info/mapping/onb_dc and http://www.mediaspaces.info/mapping/mods_dc) and equip our endpoint with these mappings by adding their URI references to the mediation endpoint configuration.

With these few configuration steps, a new mediation endpoint is set up and provides a uniform SPARQL query interface (e.g., http://www.mediaspaces.info/mediation/ep1/sparql) to execute SPARQL queries, formulated over the Dublin Core schema, over three autonomous, distributed metadata sources.

8.3 Summary and Lessons Learned

The aim of this chapter was to evaluate the features of our mapping solution against those of other existing mapping solutions and to demonstrate the feasibility and practical benefits of our approach.

From the results of the qualitative evaluation, we can clearly see that the strength of our approach lies in the mapping representation phase and in the transition to the mapping execution phase. Mapping discovery has not been the primary focus of this work, and for mapping maintenance we have provided a registry solution that covers the basic needs (discovery and reuse of mappings) but still lacks other important features. The aspect that distinguishes our approach from existing ones is the focus on the Web architecture: all components (mediators, wrappers) and all involved artefacts (schema definition language, schemes, mappings) are designed in a way that allows humans and machines to access them on the Web. Thereby URIs, and more specifically URLs, play a major role.

The case study we have conducted by implementing an example metadata integration scenario has demonstrated that, using our mapping solution, only a few basic steps are required to set up a new mediation endpoint, without the need to install a metadata integration suite in the local system environment. However, it has also unveiled several limitations and areas for improvement in the future:

• Performance: we could identify two major performance bottlenecks when executing SPARQL queries against a deployed mediation endpoint. First, the template selection and combination algorithms we employ in the mediator require further optimisation strategies and techniques. For an incoming SPARQL query, the mediator currently executes a set of templates for each data source, where each template leads to a separate query to be executed, even if the original query targets only a single data source. Combining and merging templates by using a more sophisticated selection algorithm would dramatically improve query performance. Second, the wrappers we use to expose the involved data sources are still academic prototypes and must be further improved in terms of query performance.

• Mapping GUI: as already mentioned, domain experts require a mapping GUI for specifying mapping relationships; otherwise it is unlikely that they will produce mapping specifications. Since our whole architecture is based on Web technology, we could also provide such an application on the Web.

• Human readability: in our examples, we have defined metadata schemes and mapping specifications using the N3 notation and deployed them on the Web. Since N3 is a format intended primarily to be interpreted by machines, we also need a human-readable representation. Currently, we deploy an HTML representation in parallel and use HTTP content negotiation to deliver the appropriate representation to the requesting client, which means that the specifications must be written twice. A transformation language similar to XSL that could translate RDF serialisation formats to HTML would solve this problem, but such a language does not yet exist.

Recapitulating, we can say that the mapping solution we have developed so far demonstrates how metadata mappings can be realised in a Web environment. It does, however, require that further work be invested in order to eliminate the currently existing limitations.

Chapter 9

Conclusions

9.1 Summary

In this thesis, we have proposed a Web-based metadata mapping technique as a novel approach for establishing metadata interoperability in an open, Web-based environment.

After an introduction to the area of metadata and its basic technical building blocks, we have extensively discussed the notion of metadata interoperability and provided a categorisation of heterogeneities that can prevent metadata information objects from being interoperable. For resolving these heterogeneities, we can resort to a variety of interoperability techniques, such as metadata standardisation, application profiles, global conceptual models, or schema mapping. An analysis of the quality of these techniques has revealed that especially in metadata integration scenarios, where agreement on a certain metadata standard is not possible, mapping techniques provide the necessary means to resolve a broad range of structural and semantic heterogeneities.

Subsequently, we have defined our conception of metadata mapping on a conceptual and technical basis, and identified it as being a cyclic process consisting of four major phases: mapping discovery, mapping representation, mapping execution, and mapping maintenance. For each phase, we have developed an extensive set of requirements that should be met by mapping solutions in order to be suitable for the integration of incompatible metadata from a variety of autonomous and distributed data sources. Based on these requirements we have derived an evaluation framework, against which we have analysed a representative set of metadata mapping solutions. Our analysis has revealed that the majority of existing solutions focuses on closed-world database systems and does not provide means for mapping metadata information objects that are expressed using novel Web-based data models, such as RDF/S. Furthermore, many mapping solutions do not consider metadata mapping as a process, but concentrate only on certain phases: research prototypes primarily focus on mapping discovery and disregard how the discovered mappings could be further processed; commercial solutions support the representation and execution phases and partially disregard the discovery and maintenance phases. Another limitation of many mapping solutions is the restricted expressiveness of the mapping languages they provide for domain experts to express mapping relationships between the elements of distinct metadata schemes.

Facing these deficiencies, this thesis has made three substantial contributions towards a metadata mapping solution that addresses the needs of providing uniform access to incompatible metadata in an open, Web-based environment. Firstly, we have proposed an integration architecture that seamlessly integrates with the architecture of the World Wide Web and a methodology that reflects the previously mentioned mapping phases. This enables the deployment of uniform query interfaces to a set of distributed, autonomous, and heterogeneous data sources in a Web-based environment. We can summarise the central characteristics of our approach as follows:


• All involved artefacts are Web resources. Not only the metadata to be integrated and the corresponding metadata schemes, but also the mapping specifications are Web resources and can be dereferenced via their URLs.

• All involved architectural components are Web services. Mediators and wrappers expose their SPARQL query interfaces as URLs and can be looked up in a registry service, which is a Web service that maintains metadata about existing mediator and wrapper components as well as available schemes and mapping specifications.

• Domain experts can interact with the proposed mapping technique via an easy-to-use Web GUI. They do not need to install any additional mapping software in their local system environment.

Secondly, we have proposed an abstract mapping model, which provides the means for defining mapping relationships among incompatible metadata schemes. It has the following characteristics:

• The abstract mapping model is generic and therefore independent of any concrete schema definition language. It does, however, introduce the concept of URIs as identifiers for the model itself and all its elements, and thereby binds the mapping process to the architecture of the World Wide Web.

• The abstract mapping model is built upon a minimal directed labelled graph data model and is therefore applicable to all schema definition languages whose structure is derived from a graph-based grounding structure.

• The abstract mapping model reflects the semantics of a mapping relationship in terms of a mapping expression that is uniquely identified by a URI.

• The abstract mapping model introduces the concept of instance transformation functions, which are an essential feature for reconciling the broad range of heterogeneities occurring in real-world integration scenarios.

Thirdly, we have described an RDFS binding of the mapping model, which allows the definition of mapping relationships among metadata schemes expressed in RDFS. Its characteristics can be summarised as follows:

• The RDFS binding extends the abstract mapping model with elements that take into account RDFS-specific characteristics, such as the fact that RDFS properties are first-class objects.

• Although the focus of this thesis has clearly been on the representation phase and on the transition to the execution phase, the RDFS mapping model also reflects the other phases: for the discovery phase, it provides an interface for integrating schema matching and ontology alignment techniques; the maintenance phase is supported by an interface for publishing mappings in a mapping registry.

• RDFS mapping specifications can be transformed into executable SPARQL query templates, which can be processed by the mediator component in order to unfold incoming user queries.

• The execution of the SPARQL templates is slightly optimised by a basic template selection algorithm.

• Discovery and reuse of existing RDFS mapping specifications is supported by a mapping registry that exposes registry metadata on the Web.

The result of these contributions is a metadata integration solution that resembles a Web-based mediator-wrapper architecture. Domain experts can expose the data sources they need to integrate by installing wrappers; the OAI2LOD Server we have provided is such a component. Our solution further enables domain experts to establish uniform SPARQL query access to these sources by deploying a mediation endpoint at the provided mediation service.

9.2 Future Work

The results of this thesis, especially those of the qualitative evaluation and case study we have conducted in Chapter 8, have opened up several areas for future research:

• For lifting and normalising metadata schemes to the level of RDFS, we have referred to the work provided by others. Our mapping architecture can integrate such algorithms and thereby support users in creating RDFS (or OWL) metadata schemes from existing schema descriptions that are expressed in different schema definition languages.

• The integration of schema matching and ontology alignment algorithms is also possible. We can provide adapters to a variety of algorithms and allow domain experts to select them as support for the mapping discovery phase.

• The realisation of a Web-based mapping GUI, which allows domain experts to interactively create mappings between two metadata schemes, has high priority in our future work. The user interface of Yahoo Pipes shows that such a GUI can be realised with current Web technologies (HTML, CSS, Ajax). We expect that such a mapping GUI will increase the sharing of mapping specifications, because it supports domain experts in interpreting mappings created by others.

• By introducing community features such as tagging, wikis, or annotations, we could improve the process of building consensus on a certain mapping specification, which can lead to higher interoperability.

• Performance improvement is another important area of future work: the mediation component requires more sophisticated distributed query plan algorithms, which additionally take into account the capabilities of the involved data sources. We need to consider statistics about the kind of metadata (classes, properties) contained in a data source and about the expected response times. The wrapper components require further optimisation too. The D2RQ Server, for instance, does not yet utilise the full power of the underlying relational database system. Certain SPARQL constructs (e.g., FILTER, LIMIT) are not yet translated to SQL, which leads to high latency and requires extensive data processing in the wrapper component itself. An obvious limitation of the OAI2LOD Server in its current development stage is that the metadata retrieved from OAI-PMH endpoints are replicated in an intermediate triple store; the performance of triple stores is still a major research area in the Semantic Web community, although commercial database vendors (e.g., Oracle, OpenLink Virtuoso) have already started to work on that issue.

• The OAI2LOD Server currently provides only rudimentary capabilities for interlinking OAI-PMH data sets with other data sets (e.g., DBPedia1) that have been exposed according to the Linked Data Principles; extending these interlinking capabilities is another item of future work.

• The verification of mappings has also been out of the scope of this thesis. Especially in an open-world environment, it is hard to determine the truth of a statement. However, some basic mechanism that could evaluate the correctness of mappings would bring an enormous benefit to domain experts.

• Mapping inference, for instance computing the transitive closure of mapping relationships, is also on our list of future research topics.

1DBPedia is a structured representation of the data provided by Wikipedia. It is accessible at: http://dbpedia.org/About

Bibliography

[Abe01] Karl Aberer. P-grid: A self-organizing access structure for p2p information systems. In CooplS ’01: Proceedings of the 9th International Conference on Cooperative Information Systems, pages 179–194, London, UK, 2001. Springer-Verlag.

[Abi97] Serge Abiteboul. Querying semi-structured data. In ICDT ’97: Proceedings of the 6th Inter- national Conference on Database Theory, pages 1–18, London, UK, 1997. Springer-Verlag.

[ACL+07] Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, and Farshad Fotouhi. Efficient schema-based XML-to-Relational data mapping. Inf. Syst., 32(3):458–476, 2007.

[ACMHvP04] K. Aberer, P. Cudre-Mauroux, M. Hauswirth, and T. van Pelt. GridVine: Building internet- scale semantic overlay networks. In International Semantic Web Conference (ISWC), volume 3298 of LNCS, pages 107–121, 2004.

[ADL07] ADL. Sharable Content Reference Model (SCORM). Advanced Distributed Learning Initative (ADL), 2007. Available at: http://www.adlnet.gov/scorm/index.aspx.

[ADMR05] David Aumueller, Hong Hai Do, Sabine Massmann, and Erhard Rahm. Schema and ontology matching with COMA++. In Fatma Özcan, editor, SIGMOD Conference, pages 906–908. ACM, 2005.

[AG08] Renzo Angles and Claudio Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1–39, 2008.

[AJP07] Julie Allinson, Pete Johnston, and Andy Powell. A Dublin Core application profile for scholarly works, January 2007. Available at: http://www.ariadne.ac.uk/issue50/ allinson-et-al/.

[ALC00] ALCTS CC:DA. Task Force on Metadata: Final Report. Association for Library Collections & Technical Services (ALCTS), 2000. Available at: http://www.libraries.psu.edu/ tas/jca/ccda/tf-meta6.html.

[Alt07a] Altova. Altova MapForce. Altova, Inc., 2007. Available at: http://www.altova.com/ products/mapforce/data_mapping.html.

[Alt07b] Altova. Altova SchemaAgent. Altova Inc., 2007. Available at: http://www.altova.com/ schemaagent_mapforce.html.

[BBB+02] Thomas Baker, Christophe Blanchi, Dan Brickley, Erik Duval, Rachel Heery, Pete Johnston, Leonid Kalinichenko, Heike Neuroth, and Shigeo Sugimoto. Principles of metadata registries. White paper, DELOS Network of Excellence on Digital Libraries, 2002.


[BCH07] Chris Bizer, Richard Cyganiak, and Tom Heath. How to publish data on the web, July 2007. Available at: http://www4.wiwiss.fu-berlin.de/bizer/pub/ LinkedDataTutorial/.

[BCM+03] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel- Schneider. The description logic handbook: theory, implementation, and applications. Cam- bridge University Press, New York, NY, USA, 2003.

[BDH+01] Thomas Baker, Makx Dekkers, Rachel Heery, Manjula Patel, and Gauri Salokhe. What terms does your metadata use? Application profiles as machine-understandable narratives. In DC ’01: Proceedings of the International Conference on Dublin Core and Metadata Applications 2001, pages 151–159, Tokyo, Japan, 2001. National Institute of Informatics.

[BEA07] BEA. BEA Liquid Data for WebLogic 8.1. BEA Systems, Inc., 2007. Available at: http: //edocs.bea.com/liquiddata/docs81/index.html.

[BHHK08] Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, editors. The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC 2008, Tenerife, Canary Islands, Spain, June 1-5, 2008, Proceedings, volume 5021 of Lecture Notes in Computer Science. Springer, 2008.

[BHP00] Phillip A. Bernstein, Alon Y. Halevy, and Rachel A. Pottinger. A vision for management of complex models. SIGMOD Rec., 29(4):55–63, 2000.

[BL98] Tim Berners-Lee. Notation 3. Design note, World Wide Web Consortium (W3C), 1998. Available at: http://www.w3.org/DesignIssues/Notation3.

[BL06] Tim Berners-Lee. Linked data, July 2006. Available at: http://www.w3.org/ DesignIssues/LinkedData.html.

[BMPQ04] Philip A. Bernstein, Sergey Melnik, Michalis Petropoulos, and Christoph Quix. Industrial- strength schema matching. SIGMOD Record, 33(4):38–43, 2004.

[BP08] Diego Berruta and John Phipps. Best Practice Recipes for Publishing RDF Vocabularies. W3C Semantic Web Deployment Working Group, January 2008. Available at: http://www. w3.org/TR/swbp-vocab-pub/.

[BS04] Christian Bizer and Andy Seaborne. D2RQ - Treating non-RDF databases as virtual RDF graphs. In 3rd International Semantic Web Conference (ISWC2004), Hiroshima, Japan, 2004. Available at: http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/.

[BU04] Gilad Bracha and David Ungar. Mirrors: design principles for meta-level facilities of object- oriented programming languages. In OOPSLA ’04: Proceedings of the 19th annual ACM SIG- PLAN conference on Object-oriented programming, systems, languages, and applications, pages 331–344, New York, NY, USA, 2004. ACM Press.

[BZCS01] Ana B. Benitez, Di Zhong, Shih-Fu Chang, and John R. Smith. MPEG-7 MDS content description tools and applications. In CAIP ’01: Proceedings of the 9th International Con- ference on Computer Analysis of Images and Patterns, pages 41–52, London, UK, 2001. Springer-Verlag.

[Cap07] Cape Clear. CapeClear Studio / Server. Cape Clear Software Inc., 2007. Available at: http://www.capeclear.com/products/index.shtml.

[CBHS05] Jeremy J. Carroll, Christian Bizer, Pat Hayes, and Patrick Stickler. Named graphs, provenance and trust. In WWW ’05: Proceedings of the 14th international conference on World Wide Web, pages 613–622, New York, NY, USA, 2005. ACM.

[CCS02] CCSDS. Open Archival Information Systems — OAIS. Council of the Consultative Commi- tee for Space Data Systems (CCSDS), 2002. Available at: http://public.ccsds.org/ publications/archive/650x0b1.pdf.

[CDH+05] Yuhan Cai, Xin Luna Dong, Alon Halevy, Jing Michelle Liu, and Jayant Madhavan. Personal information management with SEMEX. In SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 921–923, New York, NY, USA, 2005. ACM Press.

[CGMH+94] Sudarshan Chawathe, Hector Garcia-Molina, Joachim Hammer, Kelly Ireland, Yannis Pa- pakonstantinou, Jeffrey D. Ullman, and Jennifer Widom. The TSIMMIS project: Integration of heterogeneous information sources. In 16th Meeting of the Information Processing Society of Japan, pages 7–18, Tokyo, Japan, 1994.

[Che76] Peter Pin-Shan Chen. The entity-relationship model — toward a unified view of data. ACM Trans. Database Syst., 1(1):9–36, 1976.

[Cod70] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377– 387, 1970.

[CW85] Luca Cardelli and Peter Wegner. On understanding types, data abstraction, and polymor- phism. ACM Comput. Surv., 17(4):471–523, 1985.

[Cyg05] Richard Cyganiak. A relational algebra for SPARQL. Technical report, HP Labs Bris- tol, September 2005. Available at: http://www.hpl.hp.com/techreports/2005/ HPL-2005-170.html.

[CZ06] Lois Mai Chan and Marcia Lei Zeng. Metadata interoperability and standardization — a study of methodology part I + II. D-LIB Magazine, 12(6), June 2006. Available at: http: //www.dlib.org/dlib/june06/chan/06chan.html.

[Dat07] DataDirect Technologies. Stylus Studio 2007 and DataDirect XML Converters, 2007. Avail- able at: http://www.stylusstudio.com/.

[DB78] Umeshwar Dayal and Philip A. Bernstein. On the updatability of relational views. In VLDB’1978: Proceedings of the fourth international conference on Very Large Data Bases, pages 368–377. VLDB Endowment, 1978.

[DC06] DC. Dublin Core Metadata Element Set, Version 1.1. Dublin Core Metadata Initiative, De- cember 2006. Available at: http://dublincore.org/documents/dces/.

[DC07] DC. Dublin Core Collections Application Profile. Dublin Core Metadata Initiative (DC), March 2007. Available at: http://dublincore.org/groups/collections/ collection-application-profile/.

[DD99] Ruxandra Domenig and Klaus R. Dittrich. An overview and classification of mediated query systems. SIGMOD Rec., 28(3):63–72, 1999.

[DH05] AnHai Doan and Alon Y. Halevy. Semantic-integration research in the database community. AI Magazine, 26(1):83–94, 2005.

[DMDH02] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy. Learning to map between ontologies on the semantic web. In WWW '02: Proceedings of the 11th international conference on World Wide Web, pages 662–673, New York, NY, USA, 2002. ACM Press.

[DNB07a] DNB. Maschinelles Austauschformat für Bibliotheken. German National Library — Expert group for data formats, 2007. Available at: http://www.d-nb.de/standardisierung/formate/mab.htm.

[DNB07b] DNB. Personennormdatei (PND). German National Library, 2007. Available at: http://www.d-nb.de/standardisierung/normdateien/pnd.htm.

[DR02] Hong Hai Do and Erhard Rahm. COMA - a system for flexible combination of schema matching approaches. In VLDB, pages 610–621. Morgan Kaufmann, 2002.

[EDI07] EDItEUR. Online Information Exchange (ONIX). The EDItEUR Group, 2007. Available at: http://www.editeur.org/onix.html.

[ETS06] ETSI. TV Anytime — TS 102 822:1-7. European Telecommunications Standards Institute (ETSI), 2006. Available at: http://www.etsi.org/etsisite/website/technologies/tvanytime.aspx.

[FHM05] Michael Franklin, Alon Halevy, and David Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Rec., 34(4):27–33, 2005.

[GBMS99] Cheng Hian Goh, Stéphane Bressan, Stuart Madnick, and Michael Siegel. Context interchange: new features and formalisms for the intelligent integration of information. ACM Trans. Inf. Syst., 17(3):270–293, 1999.

[GDDD04] Dragan Gasevic, Dragan Djuric, Vladan Devedzic, and Violeta Damjanovic. Converting UML to OWL ontologies. In WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pages 488–489, New York, NY, USA, 2004. ACM Press.

[Gil05] Anne J. Gilliland. Introduction to metadata — pathways to digital information, 2005. Available at: http://www.getty.edu/research/conducting_research/standards/intrometadata/index.html.

[GJSB05] James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. The Java Language Specification, Third Edition. Addison-Wesley Longman, Amsterdam, The Netherlands, 3 edition, June 2005.

[Gru93] Tom Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5:199–220, 1993.

[HAB+05] Alon Y. Halevy, Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, and Vishal Sikka. Enterprise information integration: successes, challenges and controversies. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 778–787, New York, NY, USA, 2005. ACM Press.

[Hal01] Alon Y. Halevy. Answering queries using views: A survey. The VLDB Journal, 10(4):270–294, 2001.

[Has06] Bernhard Haslhofer. A service oriented architecture for integrating metadata from heterogeneous digital libraries. In The 1st International Workshop "Semantic Information Integration on Knowledge Discovery" (SIIK 2006), Yogyakarta, Indonesia, December 2006.

[Has07] Bernhard Haslhofer. Uniform SPARQL access to interlinked (digital library) sources. In The 6th European Networked Knowledge Organization Systems (NKOS) Workshop, Budapest, Hungary, September 2007.

[Has08] Bernhard Haslhofer. A comparative study of mapping solutions. Technical report, University of Vienna, January 2008. Available at: http://www.cs.univie.ac.at/publication. php?pid=3886.

[HH05] Bernhard Haslhofer and Robert Hecht. Metadata management in a heterogeneous digital library. In eChallenges 2005, Ljubljana, Slovenia. IOS Press, October 2005.

[HHH+05] Laura M. Haas, Mauricio A. Hernández, Howard Ho, Lucian Popa, and Mary Roth. Clio grows up: from research prototype to industrial tool. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 805–810, New York, NY, USA, 2005. ACM Press.

[HIMT03] Alon Y. Halevy, Zachary G. Ives, Peter Mork, and Igor Tatarinov. Piazza: data management infrastructure for semantic web applications. In WWW ’03: Proceedings of the 12th inter- national conference on World Wide Web, pages 556–567, New York, NY, USA, 2003. ACM Press.

[HIST05] Alon Y. Halevy, Zachary G. Ives, Dan Suciu, and Igor Tatarinov. Schema mediation for large-scale semantic data sharing. The VLDB Journal, 14(1):68–83, 2005.

[HK08] Bernhard Haslhofer and Wolfgang Klas. A survey of techniques for achieving metadata in- teroperability. ACM Comput. Surv., 2008. Accepted for publication.

[HL01] Jane Hunter and Carl Lagoze. Combining RDF and XML schemas to enhance interoperability between metadata application profiles. In WWW ’01: Proceedings of the 10th international conference on World Wide Web, pages 457–466, New York, NY, USA, 2001. ACM.

[HP00] Rachel Heery and Manjula Patel. Application profiles: mixing and matching meta- data schemas, Sep 2000. Available at: http://www.ariadne.ac.uk/issue25/ app-profiles/.

[HS08] Bernhard Haslhofer and Bernhard Schandl. The OAI2LOD Server: Exposing OAI-PMH metadata as linked data. In International Workshop on Linked Data on the Web (LDOW2008), co-located with WWW 2008, Beijing, China, April 2008.

[IBM07] IBM. IBM WebSphere Integration Developer. IBM Inc., 2007.

[IEE02] IEEE WG-12. IEEE Standard for Learning Object Metadata: 1484.12.1-2002. IEEE Inc., jul 2002. Available at: http://ltsc.ieee.org/wg12.

[IFL97] IFLA. Functional Requirements for Bibliographic Records. Study Group on the Functional Requirements for Bibliographic Records, International Federation of Library Assocations (IFLA), 1997. Available at: http://www.ifla.org/VII/s13/frbr/frbr.htm.

[ISO03a] ISO TC 211. Geographic Information Metadata — ISO 19115:2003. International Stan- dardizaton Organization (ISO), 2003. Available at: http://www.iso.org/iso/iso_ catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020.

[ISO03b] ISO/IEC JTC 1/SC 32. SQL - ISO/IEC 9075-1:2003. International Standardizaton Organization (ISO), 2003.

[ISO04] ISO TC 154. Data elements and interchange formats — Information Exchange — Represen- tation of dates and times — ISO 8601:2004. International Standardizaton Organization (ISO), 2004. Available at: http://www.iso.org/iso/catalogue_detail?csnumber=40874.

[ISO05] ISO/IEC JTC 1/SC 32. Common Logic (CL) — a framework for a family of logic-based lan- guages - ISO/IEC 24707:2007. International Standardizaton Organization (ISO), December 2005.

[ISO06a] ISO TC 46. CIDOC Conceptual Reference Model (CRM) — ISO 21127:2006. International Standardizaton Organization (ISO), December 2006.

[ISO06b] ISO TC 46. Codes for the representation of names of countries and their subdivisions — Part 1: Country codes — ISO 3166-1:2006. International Standardizaton Organization (ISO), 2006. Available at: http://www.iso.org/iso/iso_catalogue/catalogue_tc/ catalogue_detail.htm?csnumber=39719.

[ISO06c] ISO/IEC JTC 1/SC 34. Topic Maps — Part 2: Data model — ISO/IEC 13250-2:2006. Inter- national Standardizaton Organization (ISO), June 2006.

[ISO07a] ISO/IEC JTC 1/SC 29. MPEG-21 Multimedia Framework — ISO 21000-1-7:2003-2007. International Standardizaton Organization (ISO), 2007.

[ISO07b] ISO/IEC JTC 1/SC 29. MPEG-7 Multimedia Content Description Interface — ISO 15938-1- 11:2002-2007. International Standardizaton Organization (ISO), 2007.

[Jav06] Java Community Process. JSR 269: Pluggable Annotation Processing API, December 2006. Available at: http://jcp.org/en/jsr/detail?id=269.

[JK84] Matthias Jarke and Jürgen Koch. Query optimization in database systems. ACM Comput. Surv., 16(2):111–152, 1984.

[Joh04] Pete Johnston. Minerva - technical guidelines for digital cultural content creation programmes. Technical report, UKOLN, University of Bath. MLA The Council for Museums, Libraries and Archives, 2004. Available at: http://www.minervaeurope.org/structure/workinggroups/servprov/documents/techguid1_0.pdf.

[JW04] Ian Jacobs and Norman Walsh. Architecture of the world wide web, volume one, December 2004. Available at: http://www.w3.org/TR/webarch/.

[KHS08] Mirjam Keßler, Bernhard Haslhofer, and Maximilian Schwarzmaier. Umfragereport zur Nutzung von Metadaten. Technical report, KIM - Kompetenzzentrum Interoperable Meta- daten, February 2008. Available at: http://www.kim-forum.org/material/pdf/ KIM-Umfragereport.pdf.

[Kos03] Harald Kosch. Distributed multimedia database technologies supported by MPEG-7 and MPEG-21. CRC Press LLC, Boca Raton, Florida, United States, 2003.

[KQCJ07] David Kensche, Christoph Quix, Mohamed Amine Chatti, and Matthias Jarke. Gerome: A generic role based metamodel for model management. Journal of Data Semantics, 8:82–117, 2007.

[KS03] Yannis Kalfoglou and Marco Schorlemmer. Ontology mapping: the state of the art. Knowl. Eng. Rev., 18(1):1–31, 2003.

[LC00] Wen-Syan Li and Chris Clifton. Semint: a tool for identifying attribute correspondences in heterogeneous databases using neural networks. Data Knowl. Eng., 33(1):49–84, 2000.

[LdS02] Carl Lagoze and Herbert Van de Sompel. The open archives initiative protocol for meta- data harvesting — version 2.0, 2002. Available at: http://www.openarchives.org/OAI/ openarchivesprotocol.html.

[Len02] Maurizio Lenzerini. Data integration: a theoretical perspective. In PODS ’02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 233–246, New York, NY, USA, 2002. ACM.

[Lev66] V. I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10, February 1966.

[Lev99] Alon Y. Levy. Logic-based techniques in data integration. In Jack Minker, editor, Workshop on Logic-Based Artificial Intelligence, Washington, DC, June 14–16, 1999, College Park, Maryland, 1999. Computer Science Department, University of Maryland.

[LF04] Patrick Lethi and Peter Frankhauser. XML data integration with OWL: Experiences and Challenges. In 2004 Symposium on Applications and the Internet (SAINT 2004), pages 160– 170, Tokyo, Japan, 2004. IEEE Computer Society.

[LOC07a] LOC. Library of Congress Authorities. Library of Congress (LOC), 2007. Available at: http://authorities.loc.gov/.

[LOC07b] LOC. Library of Congress Subject Headings (LCSH). Library of Congress, 2007. Available at: http://www.loc.gov/aba/cataloging/subject/.

[LOC07c] LOC. MARC 21 Concise Format for Bibliographic Metadata. Library of Congress’ (LOC) Network Development and MARC Standards Office, 2007. Available at: http://www.loc. gov/marc/bibliographic/ecbdhome.html.

[LOC07d] LOC. Metadata Object Description Schema. Library of Congress’ (LOC) Network Develop- ment and MARC Standards Office, 2007. Available at: http://www.loc.gov/standards/ mods/.

[LOC07e] LOC. METS (Metadata Encoding and Transmisson Standard). Library of Congress’ (LOC) Network Development and MARC Standards Office, 2007. Available at: http://www.loc. gov/standards/mets.

[LVJ+07] Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael L. Nelson, Robert Sander- son, and Simeon Warner. Open Archives Initative Object Reuse and Exchange (OAI- ORE). Technical report, Open Archives Initative, December 2007. Available at: http: //www.openarchives.org/ore/0.1/toc.

[LW95] David B. Lomet and Jennifer Widom, editors. Special Issue on Materialized Views and Data Warehousing, volume 18 of Bulletin of the Technical Commitee on Data Engineering. IEEE Computer Society, June 1995.

[LWB08] Andreas Langegger, Wolfram Wöß, and Martin Blöchl. A semantic web middleware for virtual data integration on the web. In Bechhofer et al. [BHHK08], pages 493–507.

[MBDH02] Jayant Madhavan, Philip A. Bernstein, Pedro Domingos, and Alon Y. Halevy. Representing and reasoning about mappings between domain models. In Eighteenth national conference on Artificial intelligence, pages 80–86, Menlo Park, CA, USA, 2002. American Association for Artificial Intelligence.

[MBR01] Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm. Generic schema matching with Cupid. In Peter M. G. Apers, Paolo Atzeni, Stefano Ceri, Stefano Paraboschi, Kotagiri Ramamohanarao, and Richard T. Snodgrass, editors, VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11-14, 2001, Roma, Italy, pages 49–58. Morgan Kaufmann, 2001.

[McP02] A.K. McParland. TV-Anytime — using all that extra data. Technical report, BBC, September 2002. Available at: http://www.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP050.pdf.

[MHH+01] Renée J. Miller, Mauricio A. Hernández, Laura M. Haas, Lingling Yan, C. T. Howard Ho, Ronald Fagin, and Lucian Popa. The Clio project: managing heterogeneity. SIGMOD Rec., 30(1):78–83, 2001.

[MHS07] Boris Motik, Ian Horrocks, and Ulrike Sattler. Bridging the gap between OWL and relational databases. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 807–816, New York, NY, USA, 2007. ACM Press.

[Mic07] Microsoft. Microsoft BizTalk Mapper. Microsoft Inc., 2007. Available at: http://www.microsoft.com/biztalk/techinfo/tips/mapper/default.mspx.

[MIKS00] Eduardo Mena, Arantza Illarramendi, Vipul Kashyap, and Amit P. Sheth. Observer: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. Distrib. Parallel Databases, 8(2):223–271, 2000.

[Mil00] Paul Miller. Interoperability. What is it and why should I want it?, June 2000. Available at: http://www.ariadne.ac.uk/issue24/interoperability/intro.html.

[MLF00] Todd Millstein, Alon Levy, and Marc Friedman. Query containment for data integration systems. In PODS '00: Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 67–75, New York, NY, USA, 2000. ACM Press.

[MMSV02] Alexander Maedche, Boris Motik, Nuno Silva, and Raphael Volz. MAFRA — an ontology mapping framework in the semantic web. In Proceedings of the ECAI Workshop on Knowledge Transformation, Lyon, France, 2002.

[NH07] Philipp Nussbaumer and Bernhard Haslhofer. CIDOC CRM in action - experiences and challenges. In László Kovács, Norbert Fuhr, and Carlo Meghini, editors, ECDL, volume 4675 of Lecture Notes in Computer Science, pages 532–533. Springer, 2007.

[NIS04] NISO. Understanding Metadata. National Information Standards Organization (NISO), 2004. Available at: http://www.niso.org/standards/resources/UnderstandingMetadata.pdf.

[NK04] Natalya F. Noy and Michel Klein. Ontology evolution: Not the same as schema evolution. Knowl. Inf. Syst., 6(4):428–440, 2004.

[NLM07] NLM. Medical Subject Headings. U.S. National Library of Medicine (NLM), 2007. Available at: http://www.nlm.nih.gov/mesh/.

[NM03] Natalya F. Noy and Mark A. Musen. The prompt suite: interactive tools for ontology merging and mapping. Int. J. Hum.-Comput. Stud., 59(6):983–1024, 2003.

[NM04] Natalya F. Noy and Mark A. Musen. Ontology versioning in an ontology management framework. IEEE Intelligent Systems, 19(4):6–13, 2004.

[Noy04] Natalya F. Noy. Semantic integration: a survey of ontology-based approaches. SIGMOD Rec., 33(4):65–70, 2004.

[NP01] Ian Niles and Adam Pease. Towards a standard upper ontology. In FOIS '01: Proceedings of the international conference on Formal Ontology in Information Systems, pages 2–9, New York, NY, USA, 2001. ACM Press.

[NWG95] NWG. A Format for Bibliographic Records (RFC 1807). Network Working Group (NWG), June 1995. Available at: http://rfc.net/rfc1807.html.

[NWG05] NWG. RFC3986 – Uniform Resource Identifier (URI): Generic Syntax. Network Working Group (NWG), 2005. Available at: http://www.gbiv.com/protocols/uri/rfc/rfc3986.html.

[NWQ+02] Wolfgang Nejdl, Boris Wolf, Changtao Qu, Stefan Decker, Michael Sintek, Ambjörn Naeve, Mikael Nilsson, Matthias Palmér, and Tore Risch. Edutella: a P2P networking infrastructure based on RDF. In WWW '02: Proceedings of the 11th international conference on World Wide Web, pages 604–615, New York, NY, USA, 2002. ACM Press.

[OCL07] OCLC. Dewey Decimal Classification (DDC). Online Computer Library Center (OCLC), 2007. Available at: http://www.oclc.org/dewey/.

[OGC04] OGC. Geography Markup Language. Technical report, Open Geospatial Consortium (OGC), 2004. Available at: http://portal.opengeospatial.org/files/?artifact_id=4700.

[OMG05] OMG. Meta Object Facility (MOF) 2.0 Query/View/Transformation Specification. Object Management Group (OMG), November 2005. Available at: http://www.omg.org/cgi-bin/apps/doc?ptc/05-11-01.pdf.

[OMG06a] OMG. Meta Object Facility (MOF) Core Specification - Version 2.0. Object Management Group (OMG), January 2006. Available at: http://www.omg.org/cgi-bin/apps/doc?formal/06-01-01.pdf.

[OMG06b] OMG. Ontology Definition Metamodel Specification (ODM). Object Management Group (OMG), 2006. Available at: http://www.omg.org/docs/ptc/06-10-11.pdf.

[OMG06c] OMG. UML 2.0 – Infrastructure Specification. Object Management Group (OMG), 2006. Available at: http://www.omg.org/docs/ptc/03-09-15.pdf.

[OMG07] OMG. Unified Modelling Language (UML). Object Management Group (OMG), 2007. Available at: http://www.uml.org/.

[OS99] A. M. Ouksel and A. Sheth. Semantic interoperability in global information systems. SIGMOD Rec., 28(1):5–12, 1999.

[PGH96] Yannis Papakonstantinou, Ashish Gupta, and Laura Haas. Capabilities-based query rewriting in mediator systems. In Proceedings of 4th International Conference on Parallel and Distributed Information Systems, Miami Beach, Flor., 1996.

[PGMW95] Yannis Papakonstantinou, Hector Garcia-Molina, and Jennifer Widom. Object exchange across heterogeneous information sources. pages 251–260, 1995.

[PL98] Margaret St. Pierre and William P. LaPlant. Issues in crosswalking content metadata stan- dards. Technical report, National Information Standards Organization (NISO), October 1998. Available at: http://www.niso.org/press/whitepapers/crsswalk.html.

[PNNJ05] Andy Powell, Mikael Nilsson, Ambjorn¨ Naeve, and Pete Johnston. DCMI Abstract Model. Dublin Core Metadata Initiative (DC), March 2005. Available at: http://dublincore. org/documents/abstract-model/.

[PV99] Yannis Papakonstantinou and Vasilis Vassalos. Query rewriting for semistructured data. In SIGMOD ’99: Proceedings of the 1999 ACM SIGMOD international conference on Manage- ment of data, pages 455–466, New York, NY, USA, 1999. ACM Press.

[QL08] Bastian Quilitz and Ulf Leser. Querying distributed rdf data sources with sparql. In Bechhofer et al. [BHHK08], pages 524–538.

[RB01] Erhard Rahm and Philip A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350, 2001.

[RCC05] George G. Robertson, Mary P. Czerwinski, and John E. Churchill. Visualization of mappings between schemas. In CHI ’05: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 431–439, New York, NY, USA, 2005. ACM Press.

[RGKG+05] Patricia Rodríguez-Gianolli, Anastasios Kementsietsidis, Maddalena Garzetti, Iluju Kiringa, Lei Jiang, Mehedi Masud, Renée J. Miller, and John Mylopoulos. Data sharing in the hyperion peer database system. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 1291–1294. VLDB Endowment, 2005.

[RSU95] Anand Rajaraman, Yehoshua Sagiv, and Jeffrey D. Ullman. Answering queries using tem- plates with binding patterns (extended abstract). In PODS ’95: Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 105– 112, New York, NY, USA, 1995. ACM Press.

[SD02] Michael Sintek and Stefan Decker. Triple - a query, inference, and transformation language for the semantic web. In Ian Horrocks and James A. Hendler, editors, International Semantic Web Conference, volume 2342 of Lecture Notes in Computer Science, pages 364–378, Sardinia, Italy, 2002. Springer.

[SE05] Pavel Shvaiko and Jérôme Euzenat. A survey of schema-based matching approaches. Journal of Data Semantics, 3730:146–171, 2005.

[Sei03] Ed Seidewitz. What models mean. IEEE Software, 20(5):26–32, 2003.

[SHJK08] Karin Schellner, Bernhard Haslhofer, Wolfgang Jochum, and Ross King. Opening annotation systems for multiple content and annotation types. International Journal on Digital Libraries, 2008. Submitted for publication.

[SK98] Amit Sheth and Wolfgang Klas. Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media. Mcgraw-Hill Education, New York, NY, USA, 1998.

[SL90] Amit P. Sheth and James A. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Comput. Surv., 22(3):183–236, 1990.

[SPD92] Stefano Spaccapietra, Christine Parent, and Yann Dupont. Model independent assertions for integration of heterogeneous schemas. The VLDB Journal, 1(1):81–126, 1992.

[Syb07] Sybase. Sybase Data Integration Suite. Sybase Inc., 2007. Available at: http://www.sybase.com/products/dataintegration/dataintegrationsuite.

[TEI07] TEI. TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium, 2007. Available at: http://www.tei-c.org/Guidelines/P5/.

[TH04] Igor Tatarinov and Alon Halevy. Efficient query reformulation in peer data management systems. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 539–550, New York, NY, USA, 2004. ACM Press.

[The06] The Rule Markup Initiative. RuleML – version 0.91, October 2006. Available at: http://www.ruleml.org/.

[Til04] Barbara Tillett. What is FRBR — a conceptual model for the bibliographic universe, 2004. Available at: http://www.loc.gov/cds/FRBR.html.

[Tol06] Andreas Tolk. What comes after the semantic web - PADS implications for the dynamic web. In PADS '06: Proceedings of the 20th Workshop on Principles of Advanced and Distributed Simulation, page 55, Washington, DC, USA, 2006. IEEE Computer Society.

[Top07] TopQuadrant Inc. TopBraid Composer, 2007. Available at: http://www.topbraidcomposer.com/.

[UWGM02] Jeffrey D. Ullman, Jennifer Widom, and Hector Garcia-Molina. Database Systems - The Complete Book. Prentice Hall, Inc., 2002.

[vAD04] Luis von Ahn and Laura Dabbish. Labeling images with a computer game. In CHI '04: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 319–326, New York, NY, USA, 2004. ACM Press.

[VJBCS97] Pepjijn R. S. Visser, Dean M. Jones, T. J. M. Bench-Capon, and M. J. R. Shave. An analysis of ontological mismatches: Heterogeneity versus interoperability. In AAAI 1997 Spring Symposium on Ontological Engineering, Stanford, USA, 1997. Stanford University.

[VRA07] VRA. VRA Core 4.0. Visual Resources Association's (VRA) Data Standards Committee, March 2007. Available at: http://www.vraweb.org/projects/vracore4/index.html.

[W3C01] W3C. DAML+OIL, December 2001. Available at: http://www.w3.org/TR/daml+oil-reference.

[W3C04a] W3C. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Semantic Web Activity - RDF Core Working Group, February 2004. Available at: http://www.w3.org/TR/rdf-schema/.

[W3C04b] W3C. Resource Description Framework (RDF). W3C Semantic Web Activity - RDF Core Working Group, 2004. Available at: http://www.w3.org/RDF/.

[W3C04c] W3C. Web Ontology Language (OWL). W3C Semantic Web Activity – Web Ontology Working Group, 2004. Available at: http://www.w3.org/2004/OWL/.

[W3C04d] W3C Semantic Web Activity - RDF Core Working Group. RDF/XML Syntax Specification (Revised), 2004. Available at: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/.

[W3C04e] W3C Semantic Web Activity - RDF Core Working Group. Resource Description Framework (RDF): Concepts and Abstract Syntax, 2004. [W3C06] W3C. XML Schema 1.1 Part 1: Structure. W3C XML Core Working Group, August 2006. Available at: http://www.w3.org/TR/xmlschema11-1/.

[W3C07] W3C XML Query Working Group. XQuery 1.0 and XPath 2.0 Functions and Operators, 2007. [W3C08] W3C. SPARQL Query Language for RDF. W3C Semantic Web Activity – RDF Data Access Working Group, 2008. Available at: http://www.w3.org/TR/rdf-sparql-query/.

[Wac03] Holger Wache. Semantische Mediation f¨urheterogene Informationsquellen. PhD thesis, Uni- versity of Bremen, 2003. [Wie92] Gio Wiederhold. Mediators in the architecture of future information systems. Computer, 25(3):38–49, 1992. [WK03] Utz Westermann and Wolfgang Klas. An analysis of XML database solutions for the man- agement of MPEG-7 media descriptions. ACM Comput. Surv., 35(4):331–373, 2003. [Won03] WonderWeb Consortium. DOLCE: a descriptive ontology for linguistic and cognitive engi- neering, 2003. Available at: http://www.loa-cnr.it/DOLCE.html. [XC06] Huiyong Xiao and Isabel F. Cruz. Ontology-based query rewriting in peer-to-peer networks. In In Proceedings of the 2nd International Conference on Knowledge Engineering and Deci- sion Support, 2006, 2006. [Yah07] Yahoo! Inc. Yahoo Pipes, 2007. Available at: http://pipes.yahoo.com. [ZS06] Anna V. Zhdanova and Pavel Shvaiko. Community-driven ontology matching. In York Sure and John Domingue, editors, ESWC, volume 4011 of Lecture Notes in Computer Science, pages 34–49, Berlin, Heidelberg, 2006. Springer. Appendices


Appendix A

Implementation Resources

We have enclosed a CD-ROM that contains the implementations of the components described in this thesis. These are:

• A Mapping API, implemented as an extension of the Jena Semantic Web Framework. It covers the mapping representation and execution phases of the RDFS binding of our abstract mapping model presented in Sections 6.3 and 6.4; a minimal usage sketch follows this list.
• The current release (0.2) of the OAI2LOD Server wrapper component described in Chapter 7.
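
The following sketch illustrates the general idea behind the mapping execution phase: a mapping relationship that has already been rewritten into a SPARQL CONSTRUCT query is executed with Jena against the source metadata, yielding metadata expressed in the target schema. It is not the API shipped on the CD-ROM; the class name, file name, namespaces, and the property-renaming query are assumptions chosen for illustration.

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

// Illustrative sketch only: executes a mapping relationship that has been
// rewritten into a SPARQL CONSTRUCT query against source metadata and
// produces metadata expressed in the target schema. The class name, file
// name, namespaces, and the property-renaming query are example assumptions.
public class MappingExecutionSketch {

    public static void main(String[] args) {
        // Load the source metadata into an in-memory Jena model.
        Model source = ModelFactory.createDefaultModel();
        source.read("file:source-metadata.rdf");

        // A simple mapping relationship expressed as a CONSTRUCT query:
        // the source property src:name is mapped to the target property
        // tgt:title. More complex mappings would additionally apply instance
        // transformation functions to the bound values.
        String mappingQuery =
            "PREFIX src: <http://example.org/source#> " +
            "PREFIX tgt: <http://example.org/target#> " +
            "CONSTRUCT { ?resource tgt:title ?value } " +
            "WHERE { ?resource src:name ?value }";

        Query query = QueryFactory.create(mappingQuery);
        QueryExecution execution = QueryExecutionFactory.create(query, source);
        Model target = execution.execConstruct(); // reconciled metadata
        execution.close();

        // Serialise the result, which now conforms to the target schema.
        target.write(System.out, "RDF/XML");
    }
}

In the enclosed Mapping API, such CONSTRUCT queries are generated from the mapping specifications rather than written by hand.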

All components have been implemented in Java and are available as source and binary files. Building them requires the Java Development Kit 1.5+ and Apache Ant 1.7. Detailed build instructions, installation guidelines, and other technical details are summarised in a README.TXT file located in the root directory of the enclosed CD-ROM.

Curriculum Vitae

Bernhard Haslhofer

Research Group Multimedia Information Systems
Department of Distributed and Multimedia Systems
Faculty of Computer Science
University of Vienna
Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
Phone: +43 1 42 77 39635
Fax: +43 1 4277 39649
E-Mail: [email protected]
WWW: http://www.cs.univie.ac.at/bernhard.haslhofer

Personal

Born: March 14, 1979 in Steyr, Austria
Citizenship: Austria

Education

2004-2008: Ph.D. in Computer Science, University of Vienna
2003-2006: Master’s in Information and Knowledge Management, Vienna University of Technology
1998-2003: Diploma in Economics and Computer Science, Vienna University of Technology
2001-2002: Erasmus Student, Stockholm University

Research and Professional Experience

03.2007-Present: Research Assistant at University of Vienna, Department of Distributed and Multimedia Systems
02.2004-02.2007: Junior Researcher at ARC Seibersdorf research GmbH, Studio Digital Memory Engineering
10.2002-01.2004: Student Assistant at University of Vienna, Department of Distributed and Multimedia Systems
07.2002-01.2003: Placement Student at Raiffeisen Capital Management
03.2001-06.2001: Tutor at Vienna University of Technology, Institute of Software Technology