Tagging Methods for Linked Media Data

Master Thesis August 22, 2012 Tagging Methods for Linked Media Data David Oggier of Vevey VD, Switzerland Student-ID: 05-219-209 [email protected] Advisor: Thomas Scharrenbach, PhD Prof. Abraham Bernstein, PhD Department of Informatics University of Zurich http://www.iﬁ.uzh.ch/ddis Acknowledgements I would like to thank Professor Abraham Bernstein for the opportunity to write my Master’s Thesis with the DDIS group of the University of Zurich. I am also very thankful to my advisor Thomas Scharrenbach for his great support and advice and to Jörg-Uwe Kietz for filling in during his absence. Many thanks to the whole IT-OSE team at Hessischer Rundfunk for their help and in particular to Patric Kabus and Haiko Emmel for making this thesis possible. Last but not least, I would like to thank my family and friends for supporting me. Special thanks to my proofreaders Amir and Christoph. ii Abstract In this thesis, a method is presented to tag the media metadata of a broadcasting company with Linked Data concepts. Specifically, a controlled vocabulary in the form of a thesaurus is used as an intermediary between broadcast metadata and Linked Data vocabularies. A method to link this metadata with appropriate thesaurus entries, as well as an algorithm to align the latter with Linked Data concepts are presented and evaluated. Furthermore, it is investigated whether a benefit is gained for user queries by applying faceted search on the resulting semantically enhanced data. iii Zusammenfassung In dieser Arbeit wird ein Vorgehen um Medien-Metadaten einer Rundfunkanstalt mit Linked Data- Konzepten zu verlinken vorgestellt. Dabei wird ein kontrolliertes Vokabular in der Form eines Thesaurus verwendet, um die Verbindung zwischen Inhaltsbeschreibungen von Beiträgen und Linked Data-Datenbeständen herzustellen. Das Vorgehen beinhaltet eine Methode um diese Inhaltbeschreibungen mit Thesaurus-Konzepten zu versehen, sowie einen Algorithmus welcher diese Thesaurus-Konzepte mit Linked Data-Ressourcen verknüpft. Des Weiteren wird untersucht, wie diese Verknüpfung von Inhaltsbeschreibungen mit semantischen Daten durch eine facettierte Suche genutzt werden kann, und ob diese gegenüber herkömmlichen Suchmethoden vorteilhaft ist. iv List of Figures Figure 1: Semantic Web standards stack (Obitko 2007) ……………………………………………………………......... 7 Figure 2: Example RDF graph ……………………………………………………………………………………………………………. 8 Figure 3: The Linked Data cloud ……………………………………………………………………………………………………… 16 Figure 4: Screenshot of Faceted Wikipedia Search …………………………………………………………………………. 18 Figure 5: The FESADdigital search form ………………………………………………………………………………………….. 27 Figure 6: The FESADred search page ………………………………………………………………………………………………. 28 Figure 7: Overview of the implementation steps ……………………………………………………………………………. 31 Figure 8: Workflow of the matching for a particular thesaurus-DBpedia concept pair ……………………. 35 Figure 9: Screenshot of the faceted search client …………………………………………………………………………… 41 Figure 10: Aggregate precision values ……………………………………………………………………………………………. 49 Figure 11: Aggregate recall values …………………………………………………………………………………………………. 49 v List of Tables Table 1: General alignment statistics ……………………………………………………………………..…………….………. 45 Table 2: Frequencies of match types ………………………………………………………………………….…………………. 45 Table 3: Alignment quality results ………………………………………………………………………………………………… 46 Table 4: Keyword extraction quality results ………………………………………………………………………………….. 50 vi List of Listings Listing 1: Sample RDF/XML statement …………………………………………………………………………………………….. 8 Listing 2: Sample RDF/N3 statement ……………………………………………………………………………………………….. 9 Listing 3: Sample SPARQL query …………………………………………………………………………………………………….. 11 Listing 4: Sample SKOS concept definition ……………………………………………………………………………………… 13 vii Contents Acknowledgements ..................................................................................................................................ii Abstract ................................................................................................................................................... iii Zusammenfassung ................................................................................................................................... iv List of Figures ............................................................................................................................................ v List of Tables ............................................................................................................................................ vi List of Listings ......................................................................................................................................... vii Contents ................................................................................................................................................ viii 1. Introduction ......................................................................................................................................... 1 1.1 Motivation ..................................................................................................................................... 1 1.2 Thesis Goals ................................................................................................................................... 2 1.3 Thesis Outline ................................................................................................................................ 3 2. Background .......................................................................................................................................... 5 2.1 The Semantic Web......................................................................................................................... 5 2.1.1 Semantic Web Goals ............................................................................................................... 5 2.1.2 Resource Description Framework .......................................................................................... 6 2.1.3 RDF Schema .......................................................................................................................... 10 2.1.4 SPARQL ................................................................................................................................. 10 2.1.5 Web Ontology Language ...................................................................................................... 11 2.1.6 Simple Knowledge Organization System .............................................................................. 13 2.2 Linked Data .................................................................................................................................. 14 2.2.1 Linked Data Principles .......................................................................................................... 14 2.2.2 The Linked Data Cloud .......................................................................................................... 15 2.2.3 Faceted Search ..................................................................................................................... 17 2.3 Ontology Learning and Matching ................................................................................................ 19 2.3.1 Ontology Learning ................................................................................................................ 20 2.3.2 Ontology Alignment ............................................................................................................. 21 3. Linking Media Metadata .................................................................................................................... 23 3.1 Media Metadata .......................................................................................................................... 23 viii 3.1.1 Metadata Generation and Consumption Lifecycle .............................................................. 23 3.1.2 Content Description Metadata ............................................................................................. 25 3.1.3 Current FESAD Access Points ................................................................................................ 26 3.2 Implementation Details ............................................................................................................... 29 3.2.1 Implementation Overview.................................................................................................... 29 3.2.2 Thesaurus Conversion .......................................................................................................... 32 3.2.3 Selection of Linked Data Vocabulary .................................................................................... 33 3.2.4 Thesaurus Alignment ............................................................................................................ 34 3.2.5 Metadata Conversion ........................................................................................................... 38 3.2.6 Tagging of Media Data.......................................................................................................... 38 3.2.7 Faceted Search Client ........................................................................................................... 39 4. Evaluation .......................................................................................................................................... 43 4.1 Alignment Evaluation .................................................................................................................

Tagging Methods for Linked Media Data

THE HISTORY of GERMANY T H E U N Co Mm E N Do N E Pope , Thro Gh His Talented Uncio , , Made Several Extremely Touch I Ng Representations to the Assembly

Namensliste Limbach Name / Vorname Geburtsjahr Orte

The Ginger Fox's Two Crowns Central Administration and Government in Sigismund of Luxembourg's Realms

Kita!Plus Ferienguide 2020

Lokale Erzählungen St. Wendeler Land 5X100

Mosel Musikfestival 2019

Over K. Flags and Standards of the Napoleonic Wars (1976), OCR.Pdf

Trierer Zeitschrift 2006/07

Eine Wahl, 24 Kandidaten, 535 000 Wähler Wer Zieht Am 24

Tempo Tore Titeljagd 2019/2020

Sonne . Juffer . Lebenslust 02 Kultur Aus Vergangenheit Und Gegenwart Vergangenheit Aus Kultur

Ramstein Gets Hardware at RODEO