Nadine Steinmetz
CONTEXT-AWARE SEMANTIC ANALYSIS OF VIDEO METADATA

CONTEXT-AWARE SEMANTIC ANALYSIS OF VIDEO METADATA

DISSERTATION
submitted for the academic degree of Doktor rerum naturalium (Dr. rer. nat.)
at the chair of Internet Technologies and Systems

submitted to the Faculty of Mathematics and Natural Sciences of the University of Potsdam

by Nadine Steinmetz

Advisor: Prof. Christoph Meinel
Reviewers: Prof. Sören Auer, Prof. Steffen Staab
Date of the oral defense: 6 May 2014

This work is licensed under a Creative Commons License: Attribution - Noncommercial - Share Alike 4.0 International. To view a copy of this license visit http://creativecommons.org/licenses/by-nc-sa/4.0/

Published online at the Institutional Repository of the University of Potsdam:
URL http://opus.kobv.de/ubp/volltexte/2014/7055/
URN urn:nbn:de:kobv:517-opus-70551
http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-70551

Goethe said: Unfortunately, true gratitude cannot be expressed in words. And I am truly grateful: to my husband, my family, my friends.

ABSTRACT

The Semantic Web provides the information contained in the World Wide Web as machine-readable facts. In comparison to a keyword-based inquiry, semantic search enables a more sophisticated exploration of web documents. By clarifying the meaning behind entities, search results are more precise, and the semantic annotations simultaneously enable an exploration of semantic relationships. However, unlike keyword searches, a semantic entity-focused search requires that web documents are annotated with semantic representations of common and named entities. Manual semantic annotation of (web) documents is time-consuming; in response, automatic annotation services have emerged in recent years. These annotation services take continuous text as input, detect important key terms and named entities and annotate them with semantic entities contained in widely used semantic knowledge bases, such as Freebase or DBpedia.

Textual metadata of video documents require special attention. Semantic analysis approaches for continuous text cannot be applied, because the textual information that forms a context in video documents originates from multiple sources possessing different reliabilities and characteristics. This thesis presents a semantic analysis approach consisting of a context model and a disambiguation algorithm for video metadata. The context model takes into account the characteristics of video metadata and derives a confidence value for each metadata item. The lower the ambiguity and the higher the prospective correctness, the higher the confidence value. The metadata items derived from the video metadata are analyzed in a specific order from high to low confidence level. Previously analyzed metadata are used as reference points in the context for subsequent disambiguations. The contextually most relevant entity is identified by means of descriptive texts and semantic relationships to the context. The context is created dynamically for each metadata item, taking into account the confidence value and other characteristics. The proposed semantic analysis follows two hypotheses: metadata items of a context should be processed in descending order of their confidence value, and the metadata that pertains to a context should be limited by content-based segmentation boundaries. The evaluation results support the proposed hypotheses and show increased recall and precision for annotated entities, especially for metadata that originates from sources with low reliability. The algorithms have been evaluated against several state-of-the-art annotation approaches. The presented semantic analysis process is integrated into a video analysis framework and has been successfully applied in several projects.

ZUSAMMENFASSUNG

Compared to a keyword-based search, semantic search enables a more precise and sophisticated exploration of (web) documents, because the explicit semantics avoids ambiguities of natural language and semantic relationships can be included in the search result. A semantic, entity-based search starts from a query with a fixed meaning and returns only documents annotated with this entity as the search result. The most important prerequisite for an entity-centered search is the annotation of the documents in the archive with entities and categories. Manual annotation requires domain knowledge and is very time-consuming. The semantic annotation of video documents requires special attention, since content-based metadata of videos originates from different sources, possesses different characteristics and reliabilities and therefore cannot be treated like continuous text. This thesis presents a semantic analysis process for video metadata. The characteristics of the different metadata types are analyzed and a confidence value is determined. This value reflects the correctness and the probable ambiguity of a metadata item. Starting with the metadata item with the highest confidence value, the analysis process is performed within a context in descending order of the confidence values. The metadata already analyzed serve as reference points for the subsequent analyses. In this way, an analysis of the heterogeneously structured data of a context that is as correct as possible can be ensured. At the end of the analysis of a metadata item, the entity most relevant for the context is identified from a list of candidates – the metadata item is disambiguated. The context for the disambiguation is compiled for each metadata item on the basis of its characteristics and confidence values. The presented analysis process rests on two hypotheses: to improve the analysis results, the metadata of a context should be processed in descending order of their confidence values, and the context boundaries of video metadata should be defined by segment boundaries in order to obtain contexts with content that is as coherent as possible. Extensive evaluations confirmed these hypotheses. The analysis process was compared against several state-of-the-art methods and achieves improved results in terms of recall and precision, especially for metadata originating from less reliable sources. The analysis process is part of a video analysis framework and has already been successfully applied in several projects.

ACKNOWLEDGMENTS

I would like to express my appreciation and thanks to my advisor Prof. Dr. Christoph Meinel, who always has an open door for problems, whether they concern research or personal issues. I am especially grateful to my supervisor Dr. Harald Sack. He always encouraged me to enhance my research and achieve high-level results. His passion for movies and books helped demonstrate the interesting aspects and fun facts of our research. I also want to thank my colleagues who always contributed to intensive discussions about problems and research issues despite their own immense workload. A special thanks goes to Magnus for being the best office mate, in good times and bad. It is often said that a main part of a dissertation is endurance. I would not have been able to endure without the great lunch times together with Michaela, Lutz, Matthias, Christian, Philipp, and Patrick (amongst others). They always listened and gave advice when I needed encouragement to endure. The last weeks before the submission of a thesis are always stressful. Therefore, I am very grateful to Meike, Anna, and Philipp for reading my thesis, reviewing it and giving advice for the final touches.


CONTENTS

1 introduction 1
  1.1 Problem Definition 1
  1.2 Research Objectives and Challenges 2
  1.3 Contributions 4
  1.4 Thesis Structure 6

i foundations and related work 9

2 video metadata 11
  2.1 Video and Its Characteristics Regarding Web Search 11
  2.2 Definition and Classification of Metadata 11
  2.3 User-Generated Tags 13
  2.4 Information Extraction from Videos 15
    2.4.1 Structural Segmentation 16
    2.4.2 Optical Character Recognition 16
    2.4.3 Automatic Speech Recognition 17
    2.4.4 Detection of Visual Concepts 18
3 natural language processing 21
  3.1 Text Segmentation 21
  3.2 Syntactic and Semantic Analysis 22
    3.2.1 Part-of-Speech Tagging 22
    3.2.2 Named Entity Recognition 24
    3.2.3 Word Sense Disambiguation 24
    3.2.4 Coreference Analysis 28
  3.3 Further Semantic Analysis Methods 29
4 linked data and the semantic web 31
  4.1 Semantic Web Technologies 31
    4.1.1 Abstract Models with RDF(S) 32
    4.1.2 Ontologies and OWL 35
    4.1.3 Query of Facts via SPARQL 36
    4.1.4 Exchange of Information – the Linked Data Cloud 37
  4.2 Semantic Search 39
  4.3 Automatic Semantic Annotation 41
5 definitions of context 45
  5.1 Context in Different Research Fields 45
  5.2 Negative Context 48
  5.3 Syntax, Pragmatics, and Semantics 49

ii semantic analysis for heterogenous contexts 51

6 context 53
  6.1 Context in Semantic Analysis 53
  6.2 Context Boundaries 54
    6.2.1 Structural Boundaries of Natural Language Text 54
    6.2.2 Spatial Boundaries of Multidimensional Information 55


    6.2.3 Temporal Boundaries of Time-Based Documents 55
  6.3 Context Refinement 56
    6.3.1 Whitelists vs. Blacklists 57
    6.3.2 Aggregation Approaches for Whitelists and Blacklists 57
  6.4 Positive vs. Negative Context 59
    6.4.1 Positive Context 60
    6.4.2 Negative Context 60
  6.5 Heterogenous Context 61
7 context model for heterogenous contexts 63
  7.1 Contextual Description and Confidence Calculation 63
    7.1.1 Correctness 65
    7.1.2 Ambiguity 67
  7.2 Confidence Calculation for Context Items 70
  7.3 Context Item Views 72
    7.3.1 Confidence View 72
    7.3.2 Relevance View 72
8 semantic analysis approach in general 75
  8.1 Construction of a Knowledge Base 75
    8.1.1 From Term to Entity - Finding Labels 76
    8.1.2 Label Ranking 78
    8.1.3 Disambiguation Data 82
  8.2 Disambiguation of Named Entities and Common Words 83
    8.2.1 Identification of Named Entities and Key Terms 85
    8.2.2 Detection of Entity Candidates 87
    8.2.3 Co-Occurrence Analysis 88
    8.2.4 Link Graph Analysis 89
    8.2.5 Coreference Analysis 92
    8.2.6 Additional Analysis Methods 93
    8.2.7 Final Score and Weights for Analysis Methods 94
9 semantic analysis using the context model 97
  9.1 Semantic Analysis of Ranked Context Items 97
    9.1.1 Context Item Characteristics 98
    9.1.2 Dynamic Context Creation 100
  9.2 Negative Context Creation 102
    9.2.1 Context Refinement Through Blacklists 102
    9.2.2 Dynamic Creation of Negative Context 103
    9.2.3 Calculation of the Negative Score 106
  9.3 Algorithm Overview 108
  9.4 Complexity Examination 110
    9.4.1 Complexity of Space 110
    9.4.2 Complexity of Time 110

iii evaluation and applications 113

10 evaluation 115
  10.1 Evaluation Objectives and Requirements 115
  10.2 Evaluation Measures 117
    10.2.1 Recall 117

    10.2.2 Precision 117
    10.2.3 F1-Measure 118
    10.2.4 Accuracy 118
    10.2.5 Margin 118
  10.3 Benchmarks 119
    10.3.1 Video Metadata Benchmark 119
    10.3.2 Tag Benchmark 120
    10.3.3 Spotlight Benchmark 120
    10.3.4 KORE 50 Benchmark 120
    10.3.5 Wikilinks Benchmark 120
  10.4 Benchmark Statistics 121
  10.5 Dictionaries 124
    10.5.1 Spotlight Dictionary 124
    10.5.2 Google Cross-Wiki Dictionary 124
    10.5.3 AIDA Means Dictionary 124
    10.5.4 DBpedia-Based Dictionary 125
  10.6 Dictionary Statistics 125
    10.6.1 Experiments 125
    10.6.2 Discussion of Experiment Results 126
    10.6.3 Ambiguity of Dictionaries 135
    10.6.4 Number of Tokens of Containing Labels 135
    10.6.5 Discussion 137
  10.7 Evaluation Results 138
    10.7.1 Evaluation of Tag Processing 138
    10.7.2 Evaluation of the Detection of Named Entities in Continuous Text 138
    10.7.3 Evaluation of the Disambiguation Approach 140
    10.7.4 Evaluation of Score Weights 141
    10.7.5 Evaluation of the Context Model 143
    10.7.6 Evaluation of Influence of Constituents of Contextual Description 152
    10.7.7 Evaluation of the Influence of Negative Context 153
    10.7.8 Evaluation Using Independent Benchmark 156
  10.8 Summary of Evaluation Results 157
  10.9 Discussion 158
11 applications 161
  11.1 mediaglobe 162
  11.2 SeMEx 163
  11.3 AV Portal 163
12 conclusion and outlook 165
  12.1 Summary 165
  12.2 Future Work 166

iv appendix 169
a appendix 171
bibliography 177

LIST OF FIGURES

Figure 1  Semantic web stack representing the technologies developed for the Semantic Web. 32
Figure 2  Simple example of a directed graph. 33
Figure 3  Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ 38
Figure 4  Search result of WolframAlpha for the question How long is the river Rhine? 40
Figure 5  Search result of Yovisto for the input American president. The input is disambiguated to President of the United States and videos are provided annotated with this entity. On the left, related entities are provided for an exploratory search. 41
Figure 6  Context model describing user context introduced in [112]. 46
Figure 7  Semiotic triangle according to Ogden and Richards [87]. 49
Figure 8  Spatial boundaries of different context information 55
Figure 9  Yovisto video player displaying structural segments of a video 56
Figure 10 Example of video possessing metadata from different sources creating a heterogenous context 62
Figure 11 Example of various types of metadata for a video document 70
Figure 12 Contextual factors of context items 73
Figure 13 Retrieval of alternative textual representations for DBpedia entities using redirects and disambiguation links. 77
Figure 14 Overview of the disambiguation process 84
Figure 15 Three different types of links: a) direct links, b) symmetric links through the same node (symlinksl2), c) unidirectional links through a node (simplelinksl2). 89
Figure 16 Relationship ranking of entity candidates via links. 91
Figure 17 Creation of a negative context in the disambiguation process. 105

Figure 18 Intersection of negative and entity candidate categories and influence of tree depth. 107
Figure 19 Complete workflow of the proposed method of semantic analysis of video metadata. 109
Figure 20 Margin of disambiguation scores of entity candidates after disambiguation. 118
Figure 21 Distribution of the amount of tokens for labels contained in the four dictionaries. 135
Figure 22 Results of conTagger compared to the simple approach, DBpedia Spotlight, Wiki Machine, AIDA and TagMe. 148
Figure 23 Achieved F1-measures for different confidence values as threshold for dynamic context creation. 151
Figure 24 Achieved F1-measures for different disambiguation scores as threshold for dynamic context creation. 152
Figure 25 Screenshot of faceted list of suggested entities for the search string berlin within the mediaglobe user interface. 161
Figure 26 Overall architecture of mediaglobe [48] 162
Figure 27 Screenshot of mediaglobe user interface. 162
Figure 28 Screenshot of technical demonstrator of the semantic analysis included in the SeMEx. 163
Figure 29 Screenshot of the user interface of the AV Portal. 164

LIST OF TABLES

Table 1  Classification of metadata for video documents [96]. 13
Table 2  Different context sizes for the word jaguar 26
Table 3  Overview of assigned reliability values for different source types of a context item 67
Table 4  Overview of assigned confidence values for different text types of a context item 68
Table 5  Confidence values for different numbers of tokens according to their frequencies in the dictionaries SPL and GCW. Details of the dictionaries are presented in Section 10.5. 70
Table 6  Example values for contextual factors and the according confidence for the six context items of the example 71


Table 7  Textual representations of dbp:Brooklyn_Nets using labels of redirects and disambiguation pages and the calculated scores. 78
Table 8  Mentions of the entity dbp:Brooklyn_Nets in the articles of the English Wikipedia and the respective probability p(article|anchor) 79
Table 9  Combinations of word categories which are relevant for the recognition of key terms and named entities 86
Table 10 Characteristics of sample context items. 99
Table 11 Context for the example sentence after disambiguation of the term "Apple". 106
Table 12 Distribution of DBpedia types in Benchmark Datasets KORE 50, Wikilinks and Spotlight (see [109] for a more detailed view). 122
Table 13 Distribution of DBpedia types in Video Metadata Benchmark 123
Table 14 Coverage of mapped mentions – total count and percentage 128
Table 15 Amount of entity candidates for all mapped mentions – overall and averaged per mapped mention 130
Table 16 Maximum achievable recall – coverage of annotated entities (in the benchmark) for mentions contained in the list of candidates 131
Table 17 Top 10 of most ambiguous terms in the four different dictionaries. 136
Table 18 Comparison of the presented tag processing and disambiguation approach against DBpedia Spotlight for tag benchmarks 139
Table 19 Comparison of annotations in benchmarks and of prospective named entities detected by the proposed algorithm. 140
Table 20 Compared results of conTagger and the simple segment-based approach including the significance measure with respect to the difference of the approaches (Θ based on F1-measure of both approaches). 143
Table 21 Evaluation results of conTagger compared to the simple approach, DBpedia Spotlight, Wiki Machine, TagMe and AIDA (R = Recall, P = Precision, F1 = F1-Measure). CI_lower and CI_upper represent the lower and upper bounds of the 95% confidence intervals based on bootstrap percentiles (using n = 1000 bootstraps of the original samples). 147
Table 22 Evaluation of Hypothesis 9.1.2. R = Recall, P = Precision 149
Table 23 Best parameter configurations for the dynamic context creation divided in source types. 150
Table 24 Evaluation of the influence of the individual confidence constituents (cc = class cardinality, tt = text type, sd = source diversity, sr = source reliability, nt = number of tokens). 154
Table 25 Evaluation results of the conTagger compared to conTagger using negative context (R = Recall, P = Precision, F1 = F1-Measure). 155
Table 26 Increase of margin of analysis results on video metadata test set for different sources and Spotlight dataset. 156
Table 27 Evaluation results on Spotlight dataset (R = Recall, P = Precision, F1 = F1-measure). 156
Table 28 Case-sensitive: Recall and precision, if the most popular entity – based on incoming Wikipedia page links – is mapped to the mention 172
Table 29 Case-insensitive: Recall and precision, if the most popular entity – based on incoming Wikipedia page links – is mapped to the mention 173
Table 30 Recall and precision, if the most popular entity – based on Google popularity for the mention as anchor for the entity – is mapped to the mention 174
Table 31 Recall and precision, if the most popular entity – based on Spotlight popularity for the mention as anchor for the entity – is mapped to the mention 175

ACRONYMS

AI Artificial Intelligence

ASR Automatic Speech Recognition

CA Co-Occurrence Analysis

CL Computational Linguistics

CRF Conditional Random Field

DL Description Logics

HMM Hidden Markov Model

IE Information Extraction

LGA Link Graph Analysis

NED Named Entity Disambiguation

NER Named Entity Recognition

NLP Natural Language Processing

OCR Optical Character Recognition

OWL Web Ontology Language

POS Part-of-Speech

RDF Resource Description Framework

RDFS Resource Description Framework Schema

SPARQL SPARQL Protocol and RDF Query Language

SW Semantic Web

URI Uniform Resource Identifier

WSD Word Sense Disambiguation

WWW World Wide Web

PREFIXES

dbp  http://dbpedia.org/resource/
dbo  http://dbpedia.org/ontology/
rdf  http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
owl  http://www.w3.org/2002/07/owl#

INTRODUCTION 1

This chapter introduces the subject of the thesis, defines the motivation and research objectives, lists the contributions achieved by this work and explains the structure of the thesis.

1.1 problem definition

Semantic analysis of natural language text is an essential part of providing semantically annotated documents for a semantic search and classification of documents. The semantic annotation of documents requires more than a basic understanding of their content. A document's content must be analyzed and semantic relationships between linguistic components must be detected in order to arrive at the meaning of the text. Through semantic annotation, a document can be labeled with distinct entities and categories. This thesis is focused on the analysis and annotation of video documents. For videos, comprehension is supported by visual and auditory classifications, but the more essential and effective insight is provided by textual metadata. A significant disadvantage of the rich expressiveness of natural language information is its inherent ambiguity. Therefore, the process of semantic analysis comprises the detection, recognition and disambiguation of named entities and important key terms within textual information. Disambiguation denotes the process of determining the correct interpretation of an ambiguous term for a given context. Comprehension and correct interpretation of a document's content is only enabled by a correct semantic annotation. In recent years, several annotation tools have emerged for the analysis of continuous text. These approaches can be applied to analyze documents, such as web pages or text documents. Meanwhile, videos are firmly on track to become "first class citizens" of the web [106], and the annotation of video documents entails additional and unique challenges as compared to the annotation of text documents. A content-based search of videos requires content-based metadata. The most reliable approach to annotating videos based on their content is to ask domain experts for transcripts, tags or classifications. Considering the large number of videos created and published every year, this might be regarded as a never-ending task if performed manually. Automatic extraction algorithms, such as Optical Character Recognition (OCR), Automatic Speech Recognition (ASR), or Visual Concept Detection (VCD), provide essential information about video content. As an added benefit, these algorithms mostly complete their tasks in real-time.


Yet, automatic sources of textual metadata cannot provide the same level of reliability as (authoritative) human sources. The accuracy rates of OCR and ASR algorithms range between 0.35 and 0.9, depending on the applied approach and the quality and type of the video material. Therefore, one of the challenges of video annotation is the handling of different source reliabilities of metadata. Context is mandatory for the general understanding of natural language. The context necessary for the disambiguation of ambiguous terms within a document is provided by all the surrounding information, such as further metadata or textual content related to the same document or fragment under consideration. Depending on the context, information might take on different meanings and thus lead to different interpretations. Context influences the interpretation of information and should therefore be as accurate as possible. For videos, the context for the interpretation of content-based metadata is heterogenous. Under the assumption that all metadata occurring within the same time slot of a video form a context, the information pertaining to the context stems from different sources and has different types and reliabilities. Thereby, the context for the interpretation of content-based video metadata is considered to be heterogenous, and information pertaining to such a context should be handled with particular caution. Due to the low reliability of the automatic extraction algorithms, a context might contain incorrect information, which can lead to a faulty interpretation when this context is applied. Thus, the semantic analysis of video metadata deserves special attention. The next section summarizes the research objectives that have been derived from the described prerequisites and introduces the challenges faced during the development process.

1.2 research objectives and challenges

The research objectives and challenges presented in this thesis are mainly focused on the analysis of video documents. Most of the developed algorithms can be applied to other document types and integrated into various scenarios. The generalization of the presented approach is further discussed in Section 12.1. Given the premises introduced in the previous section, the semantic analysis of video metadata must address the following research objectives:

• Disambiguation of natural language text in general,

• Definition of an appropriate context to enable correct interpretation, and

• Processing of metadata with various characteristics and reliabilities.

During the development process of this thesis, these objectives have been researched and implemented successively.

The following research questions are addressed:

What metadata currently exists for videos and what is relevant for a content-related search? The research started with an exploration of video metadata and a characterization of its different types. Most textual metadata is provided as continuous text or keywords, but user-generated tags require special attention and pre-processing. Following this assumption, a first requirement for the semantic analysis process is derived:

Is there a semantic analysis process that is able to handle different text types? State-of-the-art annotation tools and web services only process continuous text and do not distinguish different text types. Therefore, a new analysis method which includes pre-processing and special handling of different text types has been developed and is presented in this thesis. The proposed method is able to handle video metadata and to create a context independent of the type of textual information. Since the main challenge of a general semantic analysis is to interpret ambiguous information correctly, algorithms are required that make use of the context and exploit its potential. This leads to the next research question.

How can context be utilized? For the semantic analysis of natural language texts, a context consists of textual information that gives hints about the appropriate interpretation. The proposed disambiguation algorithm takes into account the semantic entities and entity candidates that can be derived from the context. Previously disambiguated information and the resulting assigned entity are used as reference points for subsequent disambiguations. It is nonetheless possible that this information might originate from an unreliable source and might therefore be wrong for the given context. This entails subsequent incorrect disambiguations if this wrong information is taken into account as context. For the special application of semantic analysis of video metadata, the characteristics of different metadata must be taken into account:

Can a trust level be measured for the information provided as metadata? How is this confidence determined? For this purpose, a context model taking into account several characteristics of the metadata has been developed and is presented in this thesis. A confidence value is derived from these characteristics and used to bring the information into a specific order, starting with the item with the highest probable accuracy. The disambiguation of information is performed under consideration of the ordered items contained in the context.

Which information is taken into account for a disambiguation process to enable a correct interpretation? The disambiguation process calculates a score for every entity candidate which reflects the relevance of the entity candidate for the given context. This score, the confidence value derived from the context model, and other determined characteristics of the metadata enable the dynamic creation of a context. For every unit of information to be processed, this decision can be taken separately, considering the relevant aspects of the metadata. These research questions and challenges have been solved by the proposed context model and semantic analysis process. The contributions are summarized in the next section.

1.3 contributions

This thesis contributes to the research field of automatic semantic annotation of video metadata, but its findings can also be applied to the analysis of text documents or other textual information originating from different sources and possessing different characteristics. The proposed context model and algorithms were developed, implemented and evaluated to show the scientific contributions of my approach. The implementation of my approach is part of a framework that has been applied in several projects. To summarize, the contributions of this thesis are the following:

• The development and implementation of a competitive algorithm for the semantic analysis of video metadata that can be applied to multiple knowledge bases.

• The algorithm has been developed for video metadata, but can also be applied to plain continuous text.

• The development of a context model enables the evaluation of metadata from different sources and the derivation of a confidence value representing the prospective ambiguity and accuracy of the metadata information. The confidence value enables a ranking of metadata items. The disambiguation process is performed respecting this order, starting with the most confident metadata. This approach enables the definition and application of negative context. Negative context is a novel concept that has not been previously applied in other disambiguation approaches.

• The context for the disambiguation process is created dynamically, taking into account the characteristics of the metadata and their confidence values.

• The overall approach has been evaluated against several state-of-the-art approaches and is able to improve analysis results, especially on video metadata with prospectively low reliability with respect to correctness.

In the course of the research for my thesis, the following articles have been composed and published (all articles written before 2012 were published under my birth name, Nadine Ludwig):

Reviewed Publications in Journals

• Joerg Waitelonis, Nadine Ludwig, Magnus Knuth and Harald Sack. WhoKnows? - Evaluating Linked Data Heuristics with a Quiz that Cleans Up DBpedia. In International Journal of Interactive Technology and Smart Education (ITSE), Volume 8 (Issue 3), 2011.

Reviewed Publications in Conferences/Workshops

• Nadine Steinmetz, Magnus Knuth and Harald Sack. Statistical Analyses of Named Entity Disambiguation Benchmarks. In Proceedings of the 1st International Workshop on NLP & DBpedia at ISWC 2013, Sydney, Australia, 2013. Best Rated Paper at the Workshop.

• Nadine Steinmetz and Harald Sack. About the Influence of Negative Context. In Proceedings of the 7th International Conference on Semantic Computing (ICSC 2013), Irvine, USA, 2013.

• Nadine Steinmetz and Harald Sack. Semantic Multimedia Information Retrieval Based on Contextual Descriptions. In Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013), Montpellier, France, 2013.

• Christian Hentschel, Harald Sack and Nadine Steinmetz. Cross-Dataset Learning of Visual Concepts. In Proceedings of the 10th Workshop on Adaptive Multimedia Retrieval: User, Context, and Feedback (AMR 2012), Copenhagen, Denmark, 2012.

• Christian Hentschel, Johannes Hercher, Magnus Knuth, Johannes Osterhoff, Bernhard Quehl, Harald Sack, Nadine Steinmetz, Joerg Waitelonis and Haojin Yang. Open Up Cultural Heritage in Video Archives with Mediaglobe. In Proceedings of the 12th International Conference on Innovative Internet Community Services (I2CS 2012), Trondheim, Norway, 2012.

• Nadine Ludwig and Harald Sack. Named Entity Recognition for User-Generated Tags. In Proceedings of the 8th International Workshop on Text-based Information Retrieval (TIR 2011), Toulouse, France, 2011.

• Nadine Ludwig, Joerg Waitelonis, Magnus Knuth and Harald Sack. WhoKnows? - Evaluating Linked Data Heuristics with a Quiz that Cleans Up DBpedia. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster Session, Heraklion, Greece, 2011.

• Magnus Knuth, Nadine Ludwig, Lina Wolf and Harald Sack. The Generation of User Interest Profiles from Semantic Quiz Games. In Proceedings of The Second International Workshop on Mining Ubiquitous and Social Environments (MUSE 2011), Athens, Greece, 2011.

• Joerg Waitelonis, Nadine Ludwig and Harald Sack. Use What You Have – Yovisto Video Search Engine Takes a Semantic Turn. In Proceedings of the 5th International Conference on Semantic and Digital Media Technologies (SAMT), Saarbrücken, Germany, 2010.

1.4 thesis structure

This thesis consists of three parts:

1. Foundations and Related Work

2. Semantic Analysis of Video Metadata

3. Evaluation and Conclusion

foundations and related work   The first part of the thesis introduces technological and theoretical foundations as well as related work for the proposed approach. The thesis deals with natural language processing of video metadata and has been developed within the scope of the Semantic Web. Context is an important topic as it pertains to the thesis, and defining context constitutes a major part of the work. Therefore, the first part of the thesis consists of these four chapters: Video Metadata, Natural Language Processing, Semantic Web Technologies and Definitions of Context.

• Chapter 2 deals with the general characteristics of video documents and the metadata available for the description of their content.

• Chapter 3 introduces the research field of Natural Language Processing and describes several approaches that have been developed to enable computers to understand natural language. Some of the approaches described in this chapter have been applied in the present thesis or further developed for the purpose of semantic analysis on video metadata within the scope of the Semantic Web.

• The main goals of the Semantic Web initiative as well as its technologies are introduced in Chapter 4. Section 4.3 introduces related work on the semantic annotation of textual information. The approach presented in this thesis has been evaluated against state-of-the-art tools that are introduced in this section.

• Chapter 5 deals with several definitions of context as developed in other research fields. This chapter gives an introduction to the manifold ways of interpreting and defining context.

semantic analysis of video metadata   The second part of the thesis constitutes the main emphasis and introduces the developed context model as well as the disambiguation algorithm for the purpose of semantic analysis of video metadata.

• Chapter 6 introduces the characteristics and definition of context as applied for this thesis. For semantic analysis, the application of context characteristics as boundaries and manual or automatic refinement influences the quality of the result. Also, the chapter discusses positive and negative context as novel definitions. The previously mentioned heterogenous context is introduced and defined in Section 6.5.

• The developed context model is introduced in Chapter 7. The context model has been designed to handle the different characteristics of video metadata. It takes into account several aspects of metadata and deduces a confidence value that is applied in the subsequent analysis process.

• Chapter 8 introduces the semantic analysis process and the disambiguation algorithm. The process handles textual information and assigns specific semantic entities to named entities and important key terms. The algorithm requires a semantic knowledge base to distinguish semantic entities and decide which entities are relevant for a given context. The knowledge base and its creation are described in Section 8.1. The disambiguation algorithm consists of several separate analyses taking into account a given context. The overall process and its constituents are outlined in Section 8.2.

• The application of the context model for the semantic analysis is introduced in Chapter 9. The confidence value derived from the context model is used to create a context for the disambiguation process dynamically and to order the items pertaining to the context according to their confidence values. Thereby, a disambiguation process beginning with the most confident item is achieved. This approach enables the creation of a negative context, which is described in Section 9.2.2. The chapter concludes with an overview of the entire semantic analysis process and complexity examinations.

evaluation & applications   In the third part of the thesis, the semantic analysis process is evaluated with respect to the quality of the achieved results compared to simple approaches and state-of-the-art tools for continuous texts. The proposed approach has been integrated into the processing chain of videos in several projects. The applications are introduced in Chapter 11.

• Initially, Chapter 10 outlines the evaluation objectives and applied measures. Several benchmarks and dictionaries for entity look-up are introduced and analyzed for multiple characteristics. Section 10.7 details evaluations for the separate approaches presented in this thesis. The semantic analysis process under consideration of the context model has been evaluated against several state-of-the-art approaches; a hypothesis test and significance analysis have been performed to prove the high quality of the results. The chapter is concluded by a discussion of the evaluation results.

• Chapter 11 introduces the projects which have applied the presented analysis process.

Chapter 12 concludes the thesis with a summary of the approach and a discussion on future work.

Part I

FOUNDATIONS AND RELATED WORK

VIDEO METADATA 2

This work is focused on the semantic analysis of video metadata. The following sections explain the demand for metadata in the scope of content-based semantic annotation of videos. Different types and characteristics of metadata are introduced and defined. Various characteristics of user-generated tags provided as video metadata are described. Additionally, several automatic extraction algorithms which provide useful information about a video's content are introduced.

2.1 video and its characteristics regarding web search

Since the end of the 20th century, video portals have enriched the World Wide Web (WWW) by providing interesting information and entertainment via videos. The first video hosting site was shareyourworld.com1. It was founded in 1997, but was closed by 2001. Presently, the most popular internet video hosting site is YouTube2. In 2007, eight hours of video material were uploaded per minute, whereas in 2013 the upload rate amounted to 100 hours per minute3. These numbers show the increasing interest in videos on the web, but also the increasing amount of valuable information available in the form of videos. The video portals are challenged to make the videos accessible and searchable, which is where the specific characteristics of video come into play. Often, only minimal textual information is provided by the author or uploader of a video. Useful information describing the video content is rare, and content-related information about what happens or is mentioned at which position in the video is most often completely missing. Therefore, a content-related search of videos is difficult. Video metadata describing the video and its content are nonetheless essential for a video search on the web. The next section introduces the term metadata and classifies its different types.

1 http://www.beet.tv/2007/07/first-video-sha.html
2 http://www.youtube.com
3 http://www.youtube.com/yt/press/en/statistics.html

2.2 definition and classification of metadata

Metadata can be defined as data about data. However, ISO/IEC 11179 states that this phrase is inaccurate and incomplete, because metadata can be descriptive data about things other than data, and all data is about some other data [38]. Thus, metadata is data that describes information resources, such as data or objects. Furthermore, it is stated that:

[...] metadata is defined to be data that defines and describes other data. This means that metadata are data, and data become metadata when they are used in this way. This happens under particular circumstances, for particular purposes, and with certain perspectives, as no data are always metadata. The set of circumstances, purposes, or perspectives for which some data are used as metadata is called the context. So, metadata are data about data in some context. [39]

In general, metadata are classified as either administrative, structural or descriptive metadata [86].

administrative metadata refer to information required for the management of the resource, such as creation date and place, file type, rights and preservation information.

structural metadata provide information about how the components of the resource are organized and ordered.

descriptive metadata refer to information about the content of the resource for purposes such as search and identification.

For multimedia documents, these three metadata types are further differentiated [96]. Table 1 shows the types and subtypes of metadata for video documents. Descriptive metadata fall into three categories: context-related, context-descriptive and object-descriptive (textual and non-textual) metadata. Structural metadata are classified as either feature or segment specification metadata. Administrative metadata are further differentiated into presentation-related, recording-related and storage-related metadata. For content-related searches of video documents (see Section 2.1), content-descriptive metadata are required. Thus, object-descriptive textual metadata is applied to enable a content-related search. Annotations can either be added manually or extracted automatically. Manually appended annotations are referred to as user-generated tags. Definitions and characteristics of tags are presented in Section 2.3. Automatic extraction of content-based textual information can be performed either by detecting displayed text (via Optical Character Recognition (OCR)) or by analyzing the audio stream and extracting spoken language (via Automatic Speech Recognition (ASR)). The approaches of OCR and ASR are briefly introduced in Sections 2.4.2 and 2.4.3.

Table 1: Classification of metadata for video documents [96].

Class                                          | Type                               | Example
content-descriptive (interpreting)             | object-descriptive (non-textual)   | persons, objects, activities, title, impressions
content-descriptive (interpreting)             | object-descriptive (textual)       | annotation, script, subtitles
content-descriptive (interpreting)             | context-related                    | identification, spatial and time information
content-descriptive (interpreting)             | context-descriptive                | vocabulary, thesauri, ontologies
structural (content-related, not interpreting) | feature                            | color distribution, texture, sound dynamics, form
structural (content-related, not interpreting) | segment specification              | start and end of a scene, bounding box of a frame fragment
administrative                                 | presentation-related               | QoS, resolution, layout
administrative                                 | recording-related                  | author, recording system
administrative                                 | storage-related                    | media type, storage format, storage location

Structural information about the video enables the content-based anchoring of textual information: an annotation is assigned to a timestamp within the video, and the timestamp belongs to a structural subdivision. The subdivision constitutes a fragment of the video with coherent content and represents the context of the annotation. The reference to an annotation should include the context in which the annotation has been made. Therefore, structural metadata enable a context-related analysis of the textual metadata and also a context-related search within the video. The extraction of structural information with respect to video documents is briefly introduced in Section 2.4.1. In the next section, content-based metadata in terms of tags provided by users of a video portal are introduced.
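Before turning to user-generated tags, the following minimal sketch illustrates how an object-descriptive (textual) metadata item and its structural anchoring could be represented. The field names and the Python representation are illustrative assumptions for this setting, not the actual data model used in the implementation described later in this thesis.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    """Structural metadata: a content-coherent fragment of the video."""
    start: float  # start time in seconds
    end: float    # end time in seconds

@dataclass
class MetadataItem:
    """Object-descriptive textual metadata anchored to a video segment."""
    text: str                          # e.g. an OCR line, an ASR phrase, or a user tag
    source: str                        # e.g. "ocr", "asr", "tag", "title"
    timestamp: Optional[float] = None  # position in the video (seconds)
    segment: Optional[Segment] = None  # context boundary of the annotation

# Example: an ASR phrase anchored to the segment in which it was spoken
item = MetadataItem(text="president of the united states",
                    source="asr", timestamp=42.3,
                    segment=Segment(start=35.0, end=58.5))
```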

2.3 user-generated tags

Social bookmarking services such as Delicious4 derive their success from various community functionalities, first and foremost among them the ability of users to tag their own and other users' resources. In this way, a huge amount of valuable user-generated metadata is created. This metadata is essential to enable an efficient search within a portal. User-generated tags can be characterized as keywords, category names, or metadata. Any user of a portal can tag any resource within the limits of user-specified permissions [94].

4 http://www.delicious.com

Users do not follow any formal guidelines, which results in variation in how resources are tagged. Resources can be tagged with any term that, from the user's point of view, represents a relationship between the resource and a concept [113]. Within the three identified categories of intended audiences of tags (Self, Family & Friends, Public), the category Public is ranked as the most important motivation for tagging [85]. In other words, a user's intention behind tagging is most likely motivated by providing descriptive information to make the resource trackable. Furthermore, Golder et al. identified seven different tag functions [40]:

• Identifying what (or who) it is about

• Identifying what it is

• Identifying who owns it

• Refining categories

• Identifying qualities and characteristics

• Self reference

• Task organizing

Most of these functions imply that tags can describe a resource on different levels of abstraction. Tags can explicitly name an entity, or can be descriptive regarding the category to which the tagged resource belongs. In this way, a semantic search (see Section 4.2) both on an entity and a category level can be enabled by semantically enriched user-generated tags. According to a study about the structure and characteristics of folksonomy tags, an average of 83% of user-generated tags are single terms [102]. Furthermore, an average of 82% of the reviewed tags are nouns. Based on these results, tag processing for composite terms, such as camel-case terms ("barackObama"), might be ignored. Single tags might be considered as subjects or categories describing a resource. As a tag may also be part of a group of nouns representing a single entity ("flying machine", "albert einstein"), the tags stored as single words without any given order must be combined into tag groups of two or more terms to enable a mapping to all appropriate entities. Hence, each simple tag or group of tags within a given context may represent a distinct entity. A tag combination process and subsequent mapping of terms and term groups to entities are described in Section 8.2.1. The number of terms to be combined influences the complexity of the process. For this purpose, statistics on the token counts of entity labels have been calculated: four dictionaries have been evaluated for the number of tokens of the contained labels. Results are presented in Section 10.6.4.
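The following sketch illustrates the tag grouping idea outlined above: single tags and groups of two or more tags are generated and kept only if they match a label in an entity dictionary. The function names, the bound on the group size and the toy dictionary are assumptions made for illustration; the actual combination process is the one described in Section 8.2.1.

```python
from itertools import permutations

def candidate_groups(tags, max_len=3):
    """Generate single tags and ordered groups of up to max_len tags.

    Tags carry no given order, so all permutations are considered;
    max_len bounds the combinatorial explosion.
    """
    groups = [(t,) for t in tags]
    for n in range(2, max_len + 1):
        groups.extend(permutations(tags, n))
    return [" ".join(g) for g in groups]

def map_tags_to_labels(tags, label_dictionary, max_len=3):
    """Keep only those tag groups that match a known entity label."""
    return [g for g in candidate_groups(tags, max_len) if g in label_dictionary]

# Toy example (the dictionary here is made up for illustration):
labels = {"albert einstein", "flying machine", "berlin"}
print(map_tags_to_labels(["einstein", "albert", "berlin"], labels))
# -> ['berlin', 'albert einstein']
```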

2.4 information extraction from videos

The purpose of information extraction is to analyze multimedia, in this case videos, to excerpt content for a particular scope [69]. The information extraction approaches for video documents relevant for the presented work are:

• structural segmentation

• Optical Character Recognition

• Automatic Speech Recognition

• detection of visual concepts

Structural segmentation divides a video document into coherent parts with respect to the content. The underlying assumption is that scene cuts mark a transition between two different contexts: one part of the video ends and a new part containing information about a different subject and representing a different context begins. The videos are segmented according to visual characteristics. A segmentation can also be achieved by analyzing the audio stream of the video, where pauses might mark the transition between two segments. This type of video segmentation is dependent upon the availability of audio information, but not all videos, such as educational videos or screencasts, are provided with an audio stream. Section 2.4.1 gives a brief introduction to structural segmentation approaches based on visual features.

OCR extracts textual information from a video. The textual information can either be overlay text, such as subtitles, or in-scene text, such as the text on advertisement boards displayed in the video. This research field is briefly introduced in Section 2.4.2.

ASR converts spoken language to text and aligns it to the video document. Similar to OCR text, ASR texts provide essential information about the video content. The spoken text might describe in more detail what can be seen visually, but it might also differ from the visual content. Nevertheless, visual and spoken information that occur together in a video are considered to be correlated. Typically, ASR text is more precise and gives more context information than OCR text. An introduction to ASR is given in Section 2.4.3.

The detection of visual concepts provides more information about what can actually be seen in the video. Therefore, visual concepts can provide important context information supporting other analysis processes. A visual concept classifies an image or video segment and might serve several functions: describe the scene (daytime, night, cityscape, etc.), depict objects (airplanes, cars, ships, persons, etc.), reflect the content (graffiti, lecture slides, auditorium, etc.), or identify quality issues (overexposed, blurry, etc.)5.

5 http://www.imageclef.org/2011/photo

For instance, the visual concept person portrait is extracted; at the same time, OCR extracts overlay text at the bottom of the video frame and the voice-over mentions a person. Therefore, it might be presumed that the visual concept, the OCR text and the voice-over refer to the same person and altogether provide important contextual information. Section 2.4.4 introduces recent approaches to the detection of visual concepts and describes useful concepts for the purpose of this work.

2.4.1 Structural Segmentation

For the purpose of video content analysis, the detection of scene cuts constitutes the initial step. Typically, a shot consists of a sequence of successive frames originating from a single recording. The detection of shot boundaries provides useful information about the temporal structure of a video and constitutes the basis for the extraction of representative keyframes or frame candidates for video OCR and visual concept detection [48]. Two different types of shot boundary transitions can be distinguished: abrupt scene changes (hard cuts) and gradual transitions extending over more than one frame (soft cuts) [5]. Soft cuts can be further subdivided into fade-ins, fade-outs, wipes, and dissolves. Often, only hard cuts and fades have been considered for detection due to the rare appearance of the other types of cuts in the considered videos [48]. For hard cuts, Adjeroh et al. introduced an approach based on pixel differences of consecutive frames [2]. The statistical method computes the L2-norm of the pixel differences between every five consecutive frames. A hard-cut candidate is considered if the gradient computed on the first derivative of this metric exceeds a specific threshold; the threshold has been determined empirically [48]. In contrast, faded scene cuts are harder to detect, because they feature a gradual change in illumination, typically ending or starting with a black frame. For the purpose of detecting fades in videos, the applied algorithm is based on the entropy of the image. A fade-in is characterized by a monotonically increasing entropy. A local minimum and a local maximum of the entropy determine the beginning and end frame of the considered fade, respectively.
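As a rough illustration of the hard-cut detection idea (L2-norm of pixel differences and a threshold on the derivative of that signal), the following sketch operates on an array of grayscale frames. The step size and threshold are illustrative values only; the empirically determined threshold of the original approach is not reproduced here.

```python
import numpy as np

def hard_cut_candidates(frames, step=5, threshold=1.5):
    """Flag hard-cut candidates from pixel differences of grayscale frames.

    Simplified sketch: compute the L2-norm of the pixel difference over a
    small frame distance, take the gradient of that signal and flag the
    positions where it exceeds a threshold.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.array([np.linalg.norm(frames[i + step] - frames[i])
                      for i in range(len(frames) - step)])
    diffs /= diffs.mean() + 1e-9           # normalize for comparability
    gradient = np.abs(np.gradient(diffs))  # first derivative of the metric
    return np.flatnonzero(gradient > threshold)

# Usage: frames is an array of shape (num_frames, height, width)
# cuts = hard_cut_candidates(frames)
```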

2.4.2 Optical Character Recognition

Standard OCR approaches typically strive to extract textual data from scans of printed documents. These approaches need to be extended in order to extract textual data from video documents. A recent approach for video OCR has been developed in the course of the projects introduced in Chapter 11 and consists of two steps [48]:

• text detection: identification of keyframes containing text, and

• text recognition: determination and recognition of text within the identified frames.

The first step distinguishes between video frames that prospectively contain text and those that do not. This distinction is rather essential, because a video consists of a high number of frames and an OCR process on all video frames might entail a time-consuming algorithm and noisy results. The subsequent step aims to separate pixels that belong to text from background pixels. The results of the latter step are provided by a standard OCR engine. The classification of text candidates is performed by applying a fast edge-based multi-scale detector. Subsequently, a projection-profiling algorithm is applied on each keyframe, and regions possessing low edge density are identified as background regions and rejected. For further refinement of text regions in a frame, an adapted Stroke Width Transform (SWT) algorithm is applied [123]. To overcome the problem of distinguishing between heterogeneous background with low contrast and actual text pixels, appropriate binarization techniques have been applied. Yang et al. introduced a novel skeleton-based approach for video text binarization [124]. As a result, the text region is converted to black text pixels on a white background and provided to the standard OCR engine (Tesseract6) for text recognition. Traditional spell checking algorithms are based on the assumption that spelling errors originate from typing errors. Thus, a spell checker for OCR must be based on the visual similarities of characters rather than the closeness of two keys on a keyboard or typewriter. Therefore, an adapted version of the open-source Hunspell7 spell checking algorithm is applied to improve the quality of the achieved OCR results. The described method achieves an F1-measure of up to 93% on an independent test set [123], bearing in mind that these results are achieved on a dataset of extracted frames. Results for video OCR taking into account all frames of a video are typically much lower, with a precision as low as 35% [119]. This leads to the conclusion that video metadata originating from OCR sources possess a low reliability with respect to accuracy. This issue is considered in the presented context model and further discussed in Section 7.1.

6 http://code.google.com/p/tesseract-ocr/
7 http://hunspell.sourceforge.net/
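A minimal sketch of the final recognition step is shown below, assuming OpenCV and pytesseract are available. Plain Otsu binarization stands in for the edge-based text detector, projection profiling and SWT refinement described above; it is not the method used in the thesis.

```python
import cv2
import pytesseract

def recognize_overlay_text(frame_bgr):
    """Binarize a frame region and pass it to a standard OCR engine.

    Sketch only: Otsu thresholding replaces the dedicated text detection
    and binarization steps described above; Tesseract plays the role of
    the standard OCR engine.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu threshold: dark text becomes black pixels on a white background
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)

# Usage:
# frame = cv2.imread("keyframe.png")   # hypothetical keyframe file
# print(recognize_overlay_text(frame))
```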

2.4.3 Automatic Speech Recognition

Automatic Speech Recognition (ASR) enables machines to identify components of human speech and convert them into text. The process is mainly twofold: feature extraction and classification. The extracted features of an audio stream have to meet several requirements in order for speech to be identified and segmented. Martens et al. defined three main requirements [68]:


• allow the discrimination of similar sounding speech sounds

• allow models to be created without the need for an exorbitant amount of training data

• suppress specific characteristics of speaker and environment

The most common features extracted from the audio stream are Mel-Frequency Cepstral Coefficients (MFCC). MFCCs are based on a linear cosine transformation of a log-spaced power spectrum on a nonlinear mel scale of the frequency [29]. The sound signal is transformed into feature vectors; subsequently, the vectors are classified and the speech is recognized. For the classification, several approaches have been developed:

• Pattern matching

• Neural networks

• Knowledge-based approaches

• Hidden Markov Model (HMM)

Most recent ASR algorithms are based on HMMs, because speech can be considered as a Markov model for several stochastic purposes [29]. Word error rates of recent ASR approaches range between 10% and 50% [24]. These results influence the reliability of text originating from ASR sources and are further discussed in Section 7.1, wherein the context model is introduced and the source reliability of video metadata is considered.
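The feature extraction step described above can be sketched as follows, assuming the librosa library (an assumption; any MFCC implementation would serve equally well). The resulting feature vectors would then be passed to the classifier, for example an HMM-based one.

```python
import librosa

def extract_mfcc(audio_path, n_mfcc=13):
    """Extract MFCC feature vectors from an audio stream.

    Returns one n_mfcc-dimensional feature vector per analysis frame;
    the sample rate of 16 kHz is an illustrative choice.
    """
    signal, sample_rate = librosa.load(audio_path, sr=16000)
    return librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc).T

# Usage (hypothetical file name):
# mfcc_frames = extract_mfcc("lecture_audio.wav")
# mfcc_frames.shape -> (num_frames, 13)
```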

2.4.4 Detection of Visual Concepts

Besides ASR and OCR, the detection of depicted visual concepts provides additional information about the video content and the resulting context. Similar to the video segmentation and OCR, several approaches for visual concept detection have been developed in the scope of the projects introduced in Chapter 11. The most recent approach employs methods for content-based image classification to classify keyframes of video segments into visual concept classes. The approach follows the bag of keypoints model [21], where the low-level visual features of a keyframe are described by local image patterns. For this, Scale Invariant Feature Transform (SIFT) features are extracted on each channel of a keyframe and used to generate a set of representative visual codewords by running a k-means clustering algorithm [48]. For the classification of a keyframe, a kernel-based Support Vector Machine (SVM) is applied. The SVM classifier has been trained using labeled data. For each visual concept to be classified, a separate SVM is trained. Hence, the classifier is trained to solve a binary classification problem, such as whether or not a keyframe depicts the considered visual concept. Taking into account the bag-of-keypoints feature vector and an SVM model trained for a specific visual concept, keyframes can be classified into every possible visual concept class. For instance, the following visual concept classes might be useful for the search in video archives:

• day/night scene

• indoor/outdoor

• person/group of persons

• lecture/auditorium

The approach on visual concept detection described above can be extended to additional concepts simply by training additional SVM classifiers. As already mentioned, in some cases visual concepts can provide important context information which helps to interpret the textual metadata occurring at the same time in the video. These cases are very specific and need particular attention; however, visual concepts as part of heterogenous contexts (see Section 6.5) in video metadata are not part of the presented context model, but their applications are briefly discussed in Section 12.2 (Future Work).
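The following sketch outlines the bag of keypoints pipeline in a simplified form (grayscale SIFT descriptors, a k-means codebook and one binary SVM per visual concept). It assumes the Python libraries OpenCV, scikit-learn and NumPy; all function and file names are illustrative and not part of the actual framework.

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def sift_descriptors(image_path):
    # Extract local SIFT descriptors from a (grayscale) key frame.
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, descriptors = sift.detectAndCompute(image, None)
    return descriptors if descriptors is not None else np.empty((0, 128))

def build_codebook(image_paths, k=500):
    # Cluster all local descriptors into k representative visual codewords.
    all_descriptors = np.vstack([sift_descriptors(p) for p in image_paths])
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bag_of_keypoints(image_path, codebook):
    # Normalized histogram of codeword occurrences for one key frame.
    words = codebook.predict(sift_descriptors(image_path))
    histogram = np.bincount(words, minlength=codebook.n_clusters)
    return histogram / max(histogram.sum(), 1)

def train_concept_classifier(positive_frames, negative_frames, codebook):
    # Binary SVM answering: does a key frame depict the considered concept?
    features = [bag_of_keypoints(p, codebook)
                for p in positive_frames + negative_frames]
    labels = [1] * len(positive_frames) + [0] * len(negative_frames)
    return SVC(kernel="rbf").fit(features, labels)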

NATURAL LANGUAGE PROCESSING 3

Natural Language Processing (NLP)1 refers to a research field focused on the understanding of the structure and meaning of natural language by computers. The challenge is to decode natural language to make it accessible for machine processing. For this purpose, several analysis steps have been developed. They are aggregated as text segmentation, syntactic analysis, and semantic analysis. Most methods aim at the detection, identification, and/or classification of named entities. This chapter introduces and describes these analyses and their respective subtasks.

3.1 text segmentation

Text segmentation involves the decoding of the structure of natural language and therefore the division of text into linguistically meaningful parts [88]. Continuous text might be divided into smaller units – and thereby smaller contexts – by the detection of separate sentences. For the detection of named entities, the sentences need to be tokenized and word boundaries determined. Sentence detection is also referred to as sentence boundary detection/disambiguation/recognition, in which continuous text is divided into individual sentences [88]. For western writing systems, a period typically marks the end of a sentence, but periods are also used for abbreviations and in numbers. Therefore, sentence boundary detection and word boundary detection cannot be considered independent tasks. Palmer lists four dependencies when developing text segmentation algorithms: language dependence, character-set dependence, application dependence and corpus dependence [88]. Obviously, different languages use different punctuation marks or symbols to denote the end of a sentence. The determination of the character-set is essential to identify single characters and assign them to the known characters of the underlying writing system. Text segmentation is mainly performed using trained models; any model is dependent on the corpora used during training. For instance, the English possessive ’s is treated differently in different annotated corpora. Sometimes it is annotated separately from the preceding noun and sometimes the noun and the possessive are annotated together as

1 NLP is also sometimes referred to as Computational Linguistics (CL). Other sources claim a distinction between the two research fields is the respective intention: NLP is considered a type of engineering while CL is a branch of linguistics (science).


a possessive noun [41]. Furthermore, one must distinguish between written and spoken language.
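A minimal sketch of sentence and word boundary detection, assuming the Python library NLTK with its pre-trained Punkt model (available via nltk.download("punkt")); the example sentence is only illustrative.

import nltk

text = "Dr. Smith arrived in Potsdam. He gave a talk on NLP."
for sentence in nltk.sent_tokenize(text):
    print(nltk.word_tokenize(sentence))
# The period after the abbreviation "Dr." is not treated as a sentence
# boundary, so two sentences are detected.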

3.2 syntactic and semantic analysis

Text segmentation is only the first step in processing natural language, but essential for subsequent analysis steps. Comprehension of natural language is considered the ultimate goal for humans as well as NLP systems [90]. For the successful automatic interpretation, several steps are required: identification, classification and disambiguation of named entities. The necessary methods are described in the following sections.

3.2.1 Part-of-Speech Tagging

The identification of named entities within a continuous text is a first step in order to comprehend its content. The term named entity was first coined in the research field of Information Extraction (IE) in the 1990s [98]. In the course of the extraction of structured information from unstructured text, the recognition of names of persons, organizations, etc. is important. Therefore, the determination of the part of speech of the words is an important precursor [17]. Dependent on the particular purpose, named entities might be composed from words of different parts of speech, such as only nouns or nouns combined with adjectives. Part-of-speech tagging identifies the types of words in continuous text, sufficient for the subsequent process to extract the words of interest and prospective named entities. The two main challenges in Part-of-Speech (POS) tagging are

• ambiguities of words with respect to their POS

• new and made-up words

Regarding ambiguity, the word can might be a noun, a verb or an auxiliary verb. The correct assignment has to be determined from the context. A more difficult challenge arises with new and made-up words. The Oxford Dictionary publishes quarterly overviews of newly added words2. For example, in 2013 the words buzzworthy, cake pop, fauxhawk and many others were added to the online dictionary. Again, the determination of the POS of new words can be done based on contextual information, as well as from information about the word itself, such as affixes [17]. For instance, the POS tagging library provided by the Stanford NLP Group3 gives the following tagged result for the sentence Steve McQueen drove down Sunset Boulevard in his Jaguar car.:

2 http://blog.oxforddictionaries.com/ 3 http://nlp.stanford.edu/software/tagger.shtml

Steve/NNP McQueen/NNP drove/VBD down/RP Sunset/NNP Boulevard/NNP in/IN his/PRP Jaguar/NNP car./NN For this example, the annotated tags stand for:

• NNP – proper noun, singular

• VBD – verb, past tense

• RP – particle

• IN – preposition or subordinating conjunction

• PRP – personal pronoun

• NN – noun, singular or mass

The provided tags depend on the annotated corpus utilized for the training. The Stanford POS tag vocabulary is based on the Penn Treebank tag set. The full list of tags contained in the tag set is provided by Marcus et al. [67]. One of the first attempts at automatic tagging of parts of speech was performed in 1963 by Klein and Simmons [55]. Their approach was based on a lexicon to look up prospective POS candidates for a word and a set of manually created rules. All classes found in the lexicon for the considered word were assigned as candidates. Subsequently, the rules were applied to remove POS candidates. In the best case, only one candidate remains as the most probable POS of the word. With the emergence of large annotated text corpora, the development of automatically trained POS taggers has become possible. The two most commonly used text corpora for American English are the Brown Corpus [33] and the Penn Corpus [67]. Based on these corpora, several statistical approaches have been developed and published. For example, Church introduced an approach based on Markov models in 1988 [19]. Similar approaches achieved up to 97% accuracy in English POS tagging with a training set containing about 10^6 words [17]. For many languages, such corpora as mentioned above are not available. The Baum-Welch algorithm enables the training of a Markov model without requiring a manually annotated corpus [7]. This approach enables an unsupervised tagging of texts. The algorithm first assigns initial probabilities to all parameters. Subsequently, the probabilities of the parameters are adjusted to increase the probability the model assigns to the training set until the training converges. Other and more recent approaches are based on classical machine learning algorithms, such as transformation-based tagging [16], and maximum entropy conditional sequence models [115]. The latter approach combines the idea of maximum entropy with bidirectional dependency networks to take into account both preceding and subsequent contexts of a word to determine the POS. This approach has

been implemented as a Java library and provided by the Stanford NLP Group as Stanford POS tagger. This library is applied in the presented work to identify words’ POS and subsequently detect prospective named entities and important terms.
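For illustration, the following sketch tags the example sentence with Penn Treebank POS tags. It assumes NLTK and its default averaged-perceptron tagger (nltk.download("averaged_perceptron_tagger")) rather than the Stanford library used in the presented work.

import nltk

sentence = "Steve McQueen drove down Sunset Boulevard in his Jaguar car."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('Steve', 'NNP'), ('McQueen', 'NNP'), ('drove', 'VBD'), ...]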

3.2.2 Named Entity Recognition

Named Entity Recognition (NER) is also referred to as Named Entity Classification and aims to assign predefined classes, such as person, location, organization or date, to named entities detected in continuous text. For instance, the sentence Steve McQueen drove through Los Angeles in his Jaguar. is annotated by the Stanford NER tagger4 as follows:

Steve McQueen drove through Los Angeles in his Jaguar.

NER approaches are mainly based on statistical models using the local structure of a text to make a decision on the classification. Published statistical sequence models include Hidden Markov Models [59], Conditional Markov Models [15], and Conditional Random Fields (CRF) [56]. These approaches only take into account local characteristics for the classification decision. In 2005, Finkel et al. presented an approach to also incorporate non-local information into information extraction tasks [31]. The approach is based on an existing CRF system augmented with long-distance dependency models. Thereby, the consistency of already labelled entities occurring again later in the text is pursued. The presented work utilizes the library based on this approach (see Section 8.2.1 for more details on the use of the NER tagger).
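The following minimal sketch illustrates named entity classification on the example sentence. It assumes NLTK's built-in chunker (nltk.download("maxent_ne_chunker") and nltk.download("words")) and is only a stand-in for the Stanford NER tagger utilized in the presented work.

import nltk

sentence = "Steve McQueen drove through Los Angeles in his Jaguar."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
for subtree in nltk.ne_chunk(tagged):
    if hasattr(subtree, "label"):
        entity = " ".join(token for token, pos in subtree.leaves())
        print(subtree.label(), "->", entity)
# e.g. PERSON -> Steve McQueen, GPE -> Los Angeles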

3.2.3 Word Sense Disambiguation

Word Sense Disambiguation (WSD) aims to identify the correct meaning of a word in the given context. The main challenge for this task is the high level of ambiguity of natural language. WSD utilizes the results of POS tagging and named entity classification. The POS tagger identifies the word types within a sentence and subsequently the detected named entities – composed of different word types – are classified. Those classified named entities might still be ambiguous within the assigned class. For example, there are several persons with the name Michael Jackson: the famous pop singer, a journalist, a basketball player, etc. WSD attempts to remove the ambiguity and identify the most relevant entity for the given context. Furthermore, besides the disambiguation of named entities within a text, WSD also aims

4 http://nlp.stanford.edu/software/CRF-NER.shtml 3.2 syntactic and semantic analysis 25

to disambiguate common words. For instance, consider the following example sentences:

• I would like to have the grilled bass.

• Bill likes to play the bass.

The contexts of the two sentences allow different interpretations of the word bass: a fish or an instrument. Most of the time, humans do not have problems interpreting ambiguous information if a little context is given. Automatic disambiguation of unstructured textual information by machines, however, has been identified as an AI-complete problem [65]. Thus, the challenge is equivalent to essential problems of Artificial Intelligence (AI) and cannot be solved by a simple specific algorithm. The difficulty of the task originates from several factors rather than from a single cause [82]. Besides other factors, Navigli lists the dependency of the WSD process on knowledge as the main challenge. The process requires knowledge to assign relevant senses to the words of a text. For the actual disambiguation process, knowledge about the assigned senses is essential to distinguish them with respect to the given context; the latter type of knowledge can also be considered experience. Therefore, knowledge and experience are fundamental for a successful WSD process5. In reality, the creation and maintenance of a knowledge base is expensive and time-consuming and has been described as the knowledge acquisition bottleneck [35]. The task of WSD consists of four steps [82]:

• selection of word senses,

• application of external knowledge sources,

• representation of context, and

• automatic classification.

The basis of word sense disambiguation is a dictionary containing senses for all possible words. The main challenge is to distinguish between different senses and define the representation of the senses. In general, there are two different types of dictionaries: structured and unstructured resources. Structured resources include thesauri, machine-readable dictionaries and ontologies. Unstructured resources include raw and sense-annotated corpora, and collocation resources [82]. A widely used representative of machine-readable dictionaries is WordNet6. It can also be considered as a computational lexicon of the English language. WordNet represents different senses of words as different sets of synonyms – so-called synsets – describing the different senses separated by different word types, such as noun, verb, adjective, etc. For instance, the concept jaguar is represented by the following synset:

5 This assumption applies for both humans and computers. 6 http://wordnet.princeton.edu

Table 2: Different context sizes for the word jaguar

Context Size | Example
left neighbor (bigram) | ... latest Jaguar ...
right neighbor (bigram) | ... Jaguar cars ...
n-gram | ... while jaguars live ... / ... Jaguar classic vehicles ... / ... the black jaguar ...
sentence | ... The jaguar is the third-largest feline after the tiger and the lion. ...
paragraph | ... The jaguar is a near threatened species and its numbers are declining. Threats include loss and fragmentation of habitat. While international trade in jaguars or their parts is prohibited, the cat is still frequently killed by humans, particularly in conflicts with ranchers and farmers in South America. Although reduced, its range remains large; given its historical distribution, the jaguar has featured prominently in the mythology of numerous indigenous American cultures, including those of the Maya and Aztec.7 ...

{jaguar, panther, Panthera onca, Felis onca}

For every synset, a small descriptive text is provided, sometimes supplemented by an example for the use of the concept in a sentence. Thereby, the dictionary provides external knowledge about the word senses. The context for a WSD process on a word can be derived from preprocessing steps, such as POS tagging and NER, as well as from the information surrounding the word. Therefore, different context sizes might be considered. For continuous text, the context size ranges from left/right neighbor, over n-grams, to sentence or paragraph. Table 2 shows examples for the different context sizes. The fourth and crucial element of a WSD process is the selection of the actual classification method itself. In general, three different approaches can be considered: supervised, unsupervised and knowledge-based disambiguation. Supervised WSD algorithms derive statistical models and classification rules mostly from sense-labeled corpora. Unsupervised algorithms deduce divisions of senses from untagged training samples. Sometimes, such approaches make use of external knowledge bases, such as WordNet’s concept taxonomy [125]. Knowledge-based algorithms strictly rely on external knowledge sources to distinguish between different senses for different contexts. Related approaches are described in the following paragraphs.

supervised disambiguation Supervised WSD approaches use machine-learning algorithms to train classifiers based on manually annotated corpora. The training set consists of a set of examples in

7 http://en.wikipedia.org/wiki/Jaguar

which the words are manually tagged with one of the senses from the sense dictionary. Several machine learning techniques have been applied for the purpose of WSD, such as Support Vector Machines (SVM) [58], decision trees [91], neural networks [20], or naive Bayes classifiers [83].

unsupervised disambiguation Unsupervised WSD does not require a sense-annotated corpus. Thereby, the knowledge acquisition bottleneck described in [35] can be overcome. These approaches are based on the assumption that words occur with similar sets of context words when the same sense can be applied. Word occurrences are clustered and new occurrences are assigned to already induced clusters. Obviously, unsupervised WSD does not strive for sense annotation, but rather for sense discrimination by clustering texts containing the same senses [82]. For this purpose, unsupervised WSD approaches can be differentiated into word clustering [14], context clustering [97] and co-occurrence graph methods [26].

knowledge-based disambiguation Supervised disambiguation approaches are dependent on the availability of sense-labeled corpora. The detected senses are restricted to the senses annotated in the utilized corpora. In contrast, knowledge-based approaches are dependent on specific information derived from external knowledge bases, but the knowledge base is interchangeable. Thus, once the algorithm is implemented, it can be applied to any knowledge base compliant with the requirements. Also, knowledge-based approaches achieve a larger coverage due to the availability of large-scale knowledge bases. The Lesk algorithm, also called gloss overlap, is based on the overlap of sense definitions from the knowledge base and the target context [61]. A more elaborate approach based on the simple Lesk algorithm is described in Section 8.2.3 for the proposed semantic analysis process. Structural approaches take into account the structure and semantic network of the senses provided by the knowledge base. Two main approaches can be differentiated: similarity-based and graph-based approaches [82]. For instance, the semantic similarity of two senses can be computed from the network of semantic connections between word senses. Rada et al. presented an approach to calculate the shortest distance between the different senses of two words [92]. The senses of the words with minimal distance are considered the most relevant senses for the two words. A different approach has been presented by Mihalcea taking into account the graph structure of WordNet and applying Google’s PageRank algorithm for the detection of the most relevant sense within all senses of the words of a given text [76].

applications for wsd As a part of NLP, several applications can benefit from WSD approaches. Machine translation algorithms

need to detect correct senses of the words in the source language to translate them into the target language. Content analysis in general, and Information Extraction and Information Retrieval in particular, can benefit from the correct disambiguation of the target texts. Navigli also lists the Semantic Web as a real-world application that benefits from WSD [82]. The proposed semantic analysis uses WSD as an essential algorithm to annotate video documents with semantic entities. Section 4.3 introduces more related work on WSD in the research field of the Semantic Web utilizing semantic knowledge bases for the determination of specific senses.
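As an illustration of dictionary-based disambiguation, the following sketch applies the simplified Lesk (gloss overlap) algorithm with WordNet as sense dictionary to the bass examples above. It assumes NLTK and its WordNet data (nltk.download("wordnet")), whereas the proposed work relies on semantic knowledge bases such as DBpedia instead.

from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

# All candidate senses (synsets) of the ambiguous noun "bass".
print(wn.synsets("bass", pos="n"))

sentence_1 = "I would like to have the grilled bass".split()
sentence_2 = "Bill likes to play the bass".split()
print(lesk(sentence_1, "bass", pos="n"))  # expected: a fish-related synset
print(lesk(sentence_2, "bass", pos="n"))  # expected: an instrument-related synset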

3.2.4 Coreference Analysis

Coreference analysis examines text for two or more expressions that refer to the same object. Different specifications are subsumed under coreference [53]:

• Anaphora: a proform follows the expression it refers to. Exam- ple: John went to Atlanta where he is attending a conference.

• Cataphora: a proform precedes the expression it refers to. Ex- ample: If he is on time, John won’t miss the train.

• Split antecedents: a proform refers to multiple expressions. Ex- ample: John and Mary met in Atlanta. They attended the same conference.

• Coreferring nouns: different noun phrases refer to the same ob- ject. Example: Barack Obama arrived in Berlin. The US presi- dent is about to meet Angela Merkel.

The general approach to coreference analysis is to keep track of all mentioned entities within a discourse and thereby to create a discourse model [54]. An entity-based approach to coreference analysis is introduced in Section 8.2.5 and applied for the proposed disambiguation process. Recently, Lee et al. introduced an approach to perform rule-based coreference analysis [57]. Rules are applied from highest to lowest precision and each rule builds on the previous rule’s output. Similar to the approach presented in Section 8.2, this procedure uses the output of rules with a higher precision as additional input for subsequent processes with presumably lower precision. The output of high-precision rules is used as a reference point for the following application of rules. The application of the context model (see Section 7.1) aims to order context information with respect to their confidence values and utilizes information with high confidence first within the disambiguation process.

3.3 further semantic analysis methods

The presented work applies NLP techniques as introduced so far to detect key terms and named entities in continuous text and to disambiguate them. Moreover, semantic analysis of natural language text is able to inspect the structure of the text and detect relationships among several component parts. The results are used to draw conclusions or to classify the text. The following paragraphs briefly describe further semantic analysis methods.

sentiment analysis In the course of online marketing, sentiment analysis (also referred to as opinion mining) is an essential part of text analysis. The task aims to detect positive and negative sentiments in (web) documents. Sentiment analysis is useful for companies wishing to track opinions about their products online, as well as for political parties who want to track opinions about specific political strategies. The main challenge in sentiment analysis is the variety of linguistic usage in natural language, including the use of sarcasm. The poor performance of automatic algorithms on the detection of sarcasm is one reason why sentiment analysis is still a challenging research field [70]. A recent approach takes into account entity extraction and the subsequent detection of positive, negative or neutral sentiments with respect to the extracted entities [71]. Sentiment analysis is not part of the proposed work, as the pure presence of specific entities is more important than a detected opinion about them for the purpose of semantic searches of videos (see Section 4.2 for more details on semantic searching).

shallow semantic parsing Shallow semantic parsing aims to identify semantic roles held by specific parts of a sentence. Semantic roles can range from rather abstract roles, such as agent or patient, to more domain-specific roles, such as speaker, message or topic [37]. For instance, consider the following two sentences [116]:

• [The GM-Jaguar pact]AGENT gives [the car market]RECIPIENT [a much-needed boost]THEME.

• [A much-needed boost]THEME was given to [the car market]RECIPIENT by [the GM-Jaguar pact]AGENT .

Although the phrases are positioned differently, they are labeled with the same semantic roles with respect to the verb give. Mainly, semantic parsing is based on training with domain-specific corpora. Domain-independent classifiers include rather abstract roles, such as manner or temporal, as included in the Penn Treebank corpus [37]. Like sentiment analysis, shallow semantic parsing is not part of the semantic analysis presented in this work.

LINKED DATA AND THE SEMANTIC WEB 4

This chapter introduces the basic concepts of the Semantic Web (SW), its application for the purpose of semantic search and semantic analysis.

4.1 semantic web technologies

The fact-centric SW constitutes an extension of the document-centric WWW. Furthermore, in the Semantic Web the information is given a well-defined meaning, better enabling computers and people to work in cooperation [12]. It provides the explicit meaning of the information contained in the documents of the WWW as standardized and structured representations of knowledge. In 2001, Tim Berners-Lee first introduced his new vision of the SW as an extension of the WWW [11]. In his vision, the SW enables computers to understand the content of the web, similar to human understanding. In the course of overcoming the challenges of the Semantic Web, three processes have been identified [49]:

1. building abstract models to describe the world and express hu- man knowledge,

2. computing with knowledge given reasoning machines to derive conclusions from this knowledge, and

3. exchanging complex information to connect knowledge as a global knowledge base.

Abstract models can be built with the help of the Resource Description Framework (RDF). RDF aims to represent knowledge about resources as triples – as assertional knowledge. Conclusions can be drawn from the abstract models in terms of implicit knowledge which is represented by constraints and rules – as terminological knowledge. In the first instance, this is achieved by the development of Resource Description Framework Schema (RDFS). The syntax and semantics of RDF(S)1 are introduced in Section 4.1.1. More sophisticated constraints are modeled by using the Web Ontology Language (OWL) which is based on Description Logics (DL). OWL is introduced in Section 4.1.2. The third process corresponds to the concept of Linked Data, whereby semantic data about resources from multiple domains are published and interlinked through relationships. Linked Data and some knowledge bases are introduced in Section 4.1.4.

1 The notation RDF(S) refers to both RDF and RDFS.


Figure 1: Semantic web stack representing the technologies developed for the Semantic Web.

The technologies developed for the SW are summarized in the Semantic Web stack depicted in Figure 1. A central part of the technology stack is the SPARQL Protocol and RDF Query Language (SPARQL). The language enables the sophisticated query of RDF facts. The technology and some examples are described in Section 4.1.3. The semantic representation of facts – and even more importantly the semantic annotation of documents – enables a semantic search. Multiple specifications of semantic search have been developed and published. Semantic video search is one of the developed technologies and the main application of the proposed work. Section 4.2 gives an introduction to the main concepts of semantic search. Section 3.2.3 introduced the concept of WSD for the purpose of assigning specific meanings to common words and named entities. This approach is enhanced and further developed by using semantic knowledge bases as external resources and sense dictionaries. Section 4.3 gives an introduction to annotation services using WSD for automatic annotation of (web) documents.

4.1.1 Abstract Models with RDF(S)

RDF is a data model that presents facts in a universal, machine-readable exchange format for the purpose of describing structured information. The information exchange via RDF strives to preserve the original meaning and ensure a common understanding of the data on

Figure 2: Simple example of a directed graph.

both sides of the exchange – sender and receiver. An RDF document represents a directed graph of knowledge. The graph consists of a set of nodes that are connected by directed edges. Nodes and edges between the nodes are named by unique identifiers – a Uniform Resource Identifier (URI) [49]. Figure 2 shows a simple example of a directed graph. Such directed graphs can also be represented by triples of subject, property and object. The edges represent the properties and the nodes are transformed to subject and object in the direction of the edge. The graph depicted in Figure 2 can thereby be transformed to the following triple (replacing the prefixes of the URIs with the specified abbreviations):

@prefix exr: <http://example.org/resource/> .
@prefix exp: <http://example.org/property/> .
exr:Bullitt exp:starring exr:Steve_McQueen .

RDF graphs and thereby RDF triples describe relationships between resources. Properties must be represented by a URI. RDF objects (the object of an RDF triple) can also be represented by literals. For instance:

@prefix exr: <http://example.org/resource/> .
@prefix exp: <http://example.org/property/> .
exr:Steve_McQueen exp:name "Terence Steven McQueen" .

Additionally, subjects and objects of triples can also be anonymous blank nodes. Consider the following example:

@prefix exr: <http://example.org/resource/> .
@prefix exp: <http://example.org/property/> .
exr:Bullitt exp:starring _:b1 .
_:b1 exp:name "Terence Steven McQueen" .
_:b1 exp:birthDate "1930-03-24" .

Blank nodes identify resources within the same RDF document, but cannot be referenced from outside the document. RDFS has been developed to create simple ontologies2 in RDF. This enables several enhancements of knowledge represented as RDF:

2 A detailed definition of ontologies in the scope of the SW is given in Section 4.1.2.

• distinguish between classes and instances,

• determine domain and range of properties, and

• define hierarchies of classes and properties.

Classes, such as person or place, can be defined and individuals can be assigned as instances of these classes. For example:

@prefix exr: <http://example.org/resource/> .
@prefix exp: <http://example.org/property/> .
@prefix exo: <http://example.org/ontology/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
exr:Bullitt rdf:type exo:Film .
exr:Steve_McQueen rdf:type exo:Person .
exo:Film rdfs:subClassOf exo:Work .

Domains and ranges of properties restrict subjects and objects used with a property to specific classes. For example:

@prefix exr: <http://example.org/resource/> .
@prefix exp: <http://example.org/property/> .
@prefix exo: <http://example.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
exp:name rdfs:domain exo:Person .
exr:Steve_McQueen exp:name "Terence Steven McQueen" .

However, this constraint does not lead to conflicts when properties are used with wrong subjects or objects, because negation cannot be expressed in RDF(S). Rather, a resource used as a subject in a triple becomes an instance of the class defined as the domain of the triple’s property. The same applies for the range of properties. More expressive ontologies can be created using OWL, which is described in the next section. RDF graphs can be represented in various ways. The first proposed notation of RDF triples is the N3 notation introduced in 1998 by Tim Berners-Lee [10]. In 2004, the RDF recommendation proposed a simplified version of N3 as N-Triples [49]. This serialization has been further enhanced resulting in the not yet standardized RDF syntax Turtle. Additionally, RDF facts can be serialized in XML syntax. All RDF examples included in this work are written in Turtle syntax. In November 2013, a W3C Candidate Recommendation for RDF version 1.1 was released. The main changes between this recommendation and the recommendation of 2004 pertain to constraints for literals and language tags. Furthermore, the RDF URI Reference has been replaced by IRIs (Internationalized Resource Identifiers) where the special characters <, >, {, }, |, \, ˆ, ‘, double quote and space must be percent-encoded3.

3 http://www.w3.org/TR/rdf11-concepts
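A minimal sketch showing how Turtle data like the examples above can be parsed and inspected programmatically; it assumes the Python library rdflib, and the namespace URIs are placeholders.

from rdflib import Graph

turtle_data = """
@prefix exr: <http://example.org/resource/> .
@prefix exp: <http://example.org/property/> .
exr:Bullitt exp:starring exr:Steve_McQueen .
exr:Steve_McQueen exp:name "Terence Steven McQueen" .
"""

graph = Graph()
graph.parse(data=turtle_data, format="turtle")
# Iterate over all triples of the directed graph.
for subject, predicate, obj in graph:
    print(subject, predicate, obj)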

4.1.2 Ontologies and OWL

An ontology has been defined in 1993 by Thomas R. Gruber [44]:

An ontology is an explicit specification of a conceptualization.

A conceptualization is defined as an abstract view (model) of the world that is required to be presented for a specific purpose. Thereby, the set of objects represented by an ontology might belong to a specific domain and determine the knowledge – the so-called universe of discourse. The knowledge must be defined in a declarative formalism and the meaning of all concepts must be specified. For the Semantic Web, the mentioned declarative formalism is represented by OWL. The latest version of OWL was released in 2009 as OWL 2. To enable ontology engineers to choose between different levels of expressivity, different species of OWL 2 have been developed: OWL Full, OWL DL and OWL Lite. OWL Full provides the highest expressivity, but is undecidable mainly due to the missing enforcement of type separations: individuals, classes and properties can be mixed freely [49]. OWL DL restricts OWL Full in the use of some elements so that the language becomes decidable. OWL Lite represents the least expressive sublanguage of OWL. The following explanations mainly refer to OWL 2 DL. By using OWL, classes (which represent the concepts), instances of the classes and relationships between instances as well as between classes can be modeled. Accordingly, the following definitions are true for an OWL ontology:

• Classes represent sets of individuals sharing certain characteristics, e.g. person, country, company.

• Individuals are objects of the domain possessing a specific mean- ing and a unique identifier. Individuals are instances of the on- tology’s classes.

• Properties denote the relationships between the individuals or classes. OWL allows two general types of properties: object and datatype properties. The former represent relations between objects (individuals or classes) of the ontology that are referenced by a URI. The latter relates an object to a data value, e.g. an integer.

OWL 2 DL is a decidable language and is considered a semantic fragment of First Order Logic (FOL). It corresponds to the description logic SROIQ(D):

• S → ALC + transitivity
• H → role hierarchies
• R → reflexivity, irreflexivity and disjointness of properties
• O → nominals, enumerated classes
• I → inverse roles
• Q → qualified cardinality restrictions
• (D) → use of datatypes

The following class constructors are allowed:

• conjunction ⊓
• disjunction ⊔
• negation ¬
• existential quantifier ∃
• universal quantifier ∀

They can be used to construct and describe classes, such as the class of “flying, grass-eating pigs” FGEP:

FGEP ⊑ FlyingObject ⊓ ∀eats.Grass ⊓ Pig

In addition to existential and universal quantifiers, cardinalities can be used to restrict the use of properties to a specific number for instances of a class. For instance, a class of vehicles with four or two wheels can be constructed. Further reading on ontologies can be found in [104]. The different species of OWL 1, OWL 2 and OWL syntax are further described in [49].

4.1.3 Query of Facts via SPARQL

SPARQL enables a query of RDF-based information using graph patterns. Additionally, it provides further functions for filtering conditions, advanced query patterns, and formatting the output [49]. SPARQL queries are represented in Turtle syntax (see Section 4.1.1). Unknown nodes in the query graph are replaced by variables. These variables can be used for further filtering and to construct the output. Thereby, complex queries on RDF graphs are enabled. For instance, a fictional SPARQL query for all movies set in New York City, released before 1970, and starring Audrey Hepburn might look like the following:

PREFIX exr: <http://example.org/resource/>
PREFIX exp: <http://example.org/property/>
PREFIX exo: <http://example.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?film WHERE {
  ?film rdf:type exo:Film .
  ?film exp:starring exr:Audrey_Hepburn .
  ?film exp:filmLocation exr:New_York_City .
  ?film exp:publicationYear ?year .
  FILTER(?year < 1970)
}

An enhanced version of SPARQL became a W3C recommendation in March 2013. Amongst others, the new version SPARQL 1.14 enables property paths, property disjunction, grouping functions and aggregate functions.
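For illustration, the following sketch executes a different, simpler SPARQL query against the public DBpedia endpoint. It assumes the Python library SPARQLWrapper; the endpoint URL and the property names from the DBpedia ontology are assumptions outside the fictional example above.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?film WHERE {
        ?film a dbo:Film ;
              dbo:starring <http://dbpedia.org/resource/Audrey_Hepburn> .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["film"]["value"])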

4.1.4 Exchange Information – Linked Data Cloud

In 2006, Tim Berners-Lee published the Linked Data Principles5. They have been established to allow exploration of the web of data provided by the SW:

• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information, using standards (RDF, SPARQL).
• Include links to other URIs, so that users can discover more things.

In the course of recent years, many datasets have been published as Linked Data. The Linking Open Data project states its goals as publishing various open data sets as RDF on the Web and at setting RDF links between data items from different data sources6. The resulting linked datasets are called the Linked Open Data (LOD) cloud. Figure 3 shows a snapshot of the cloud as of September 2011. As of November 2013, the cloud consists of 62 billion triples from 870 datasets. The statistics of the LOD cloud are maintained by the LODstats project7.

dbpedia The DBpedia8 represents a link hub within the cloud, as many LOD datasets link their entities to DBpedia entities. The DBpedia is the semantic version of Wikipedia9. Each Wikipedia article represents a DBpedia entity. The entity types are derived from the infobox templates used for the Wikipedia article and facts about the entities are derived from the information linked in the infoboxes. Additionally, information about redirects, disambiguation pages, and page

4 http://www.w3.org/TR/sparql11-query/ 5 http://www.w3.org/DesignIssues/LinkedData.html 6 http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData 7 http://stats.lod2.eu 8 http://dbpedia.org 9 http://www.wikipedia.org

Figure 3: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

links are extracted as RDF triples. The latest version of the English DBpedia10 consists of 2.46 billion facts about (approximately) 4 million things/objects. The DBpedia dataset uses an original ontology that has been created manually based on the most commonly applied infoboxes. As of its latest version (in conformity with the dataset itself), the ontology comprises 256 classes and 2,333 properties. The properties and classes are derived from the infoboxes by a crowd-sourced approach. A public wiki is used to map new infoboxes and properties used in infoboxes to existing classes and properties respectively, create new ones and integrate them in the ontology. The ontology mainly consists of basic class definitions comprising class name labels in several languages and property definitions including domain and range (but only for a subset of all properties). It does not contain strict class constraints, such as composed classes or cardinality restrictions. Several research approaches deal with the detection of inconsistencies in the DBpedia ontology, including the dataset. Mostly these approaches aim to compare ontology definitions with (contradictive) facts in the dataset [114, 64].

gnd In 2011, the authority files of the German-speaking countries were published by the German National Library (DNB) under the name Gemeinsame Normdatei (GND) as Linked Data. The dataset originates from the German library community and contains content of the former Corporate Body Authority File (GKD for Gemeinsame Körperschaftsdatei), the Name Authority File (PND for Personennamendatei),

10 Version 3.9 as of March/April 2013

the Subject Headings Authority File (SWD for Schlagwortnormdatei) and the Uniform Title File of the Deutsches Musikarchiv (EST for Einheitssachtitel-Datei). As of November 2013, the GND dataset contains over 10 million RDF triples and maintains links to the datasets Library of Congress Subject Headings (LCSH) and the Virtual International Authority File (VIAF), amongst others. The GND dataset also provides its own completely manually created ontology. It comprises 50 classes, 162 object properties and 56 datatype properties. Domains and ranges are defined for all properties. Additionally, as the only axiom, a list of disjoint classes is provided (ConferenceOrEvent, CorporateBody, Family, Person, PlaceOrGeographicName, SubjectHeading, Work). Both datasets, DBpedia and GND, have been utilized to create knowledge bases for the semantic analysis of video metadata presented in this work. More details on the creation process of the knowledge bases are described in Section 8.1. Semantic Web technologies enable new search approaches taking into account the meaning of a search query. Several approaches of semantic search are introduced in the next section.

4.2 semantic search

Typically, keyword-based search algorithms return an ordered list of (web) documents containing the string entered by the user. More sophisticated approaches also take into account synonyms of the search input. Still, the algorithms are limited to a simple text-based search without taking into account the semantics of the search input. The task of semantic search can be interpreted in two different ways:

• Understand and interpret the search input and provide an an- swer.

• Handle entity-centric search input and provide documents an- notated with the demanded entities.

An example for the first approach is the search engine Wolfram- Alpha11. It can handle questions like How long is the river Rhine? and returns 1230 km (see Figure 4). WolframAlpha is based on the compu- tational software Mathematica12. The input question is analyzed and interpreted and the answer is calculated based on the information provided by hundreds of datasets [122]. The second type of semantic search requires a corpus of annotated documents. The documents can either be annotated with relevant cat- egories or subjects, or the content can be analyzed for containing entities. This type of semantic search seeks to increase the accuracy

11 http://www.wolframalpha.com 12 http://www.wolfram.com/mathematica/

Figure 4: Search result of WolframAlpha for the question How long is the river Rhine?

of search results. Keyword-based search algorithms often provide results that are not relevant for the search input. Although the resulting documents may contain the search input (or a synonym), they might possess a different meaning than the input. For instance, a Google13 search for the input string jaguar returns documents both about the car manufacturer and the feline animal. In contrast, an entity-centric semantic search asks the user for the specific entity she is interested in and then returns only documents annotated with this entity (possessing the intended meaning). Furthermore, the relationships between the annotated entities of different documents can be utilized to provide an explorative search within the underlying document archive [120]. Figure 5 shows the exploratory search GUI of Yovisto14 for the search input American president. The search result provides videos about American presidents. Additionally, related entities are provided (on the left) to start an exploratory search through the video archive. This type of semantic search constitutes one of the applications utilizing the results of the proposed work of semantic analysis of video metadata. More details on current applications and search portals can be found in Chapter 11.

13 http://www.google.com 14 http://www.yovisto.com

Figure 5: Search result of Yovisto for the input American president. The input is disambiguated to President of the United States and videos are provided annotated with this entity. On the left, related entities are provided for an exploratory search.

4.3 automatic semantic annotation

The preceding section introduced semantic search on annotated (web) documents. Manual annotation of documents is time-consuming and often requires domain experts. Automatic annotation tools provide the analyzed document with the most relevant annotations according to the context given by the document itself. Similar to WSD introduced in Section 3.2.3, natural language text is analyzed and ambiguous terms are disambiguated. Within the scope of the SW, the texts are annotated with semantic entities of an underlying Linked Data dataset and disambiguated using the semantic information provided for the entities within (and across) the dataset. In recent years, several annotation algorithms have emerged and respective tools have been published. Most of the approaches link spotted terms to Wikipedia articles (DBpedia entities). Wikipedia constitutes a trade-off between well-structured, manually maintained catalogs, such as WordNet or Cyc15, which feature a low coverage of entities and common terms, and a large but noisy collection of texts (such as the web) [46]. The following section introduces some of the first approaches to semantic annotation, as well as current state-of-the-art tools linking to Wikipedia. The section is concluded with a discussion of the approaches and an argumentation for the proposed work.

15 http://www.cyc.com

In terms of the spotting algorithm, two different annotation approaches must be distinguished: Named Entity Disambiguation (NED) and WSD. The first approach only takes into account named entities, which means that NER is used to identify the names of persons, places or organizations in the text and only these parts of the text are annotated with the respective entity. The latter approach annotates and disambiguates both named entities and common nouns in the text. Mihalcea et al. have published one of the first WSD approaches using Wikipedia articles to identify specific entities [77]. The authors propose a combined approach of an analytical method comparing Wikipedia articles with contextual paragraphs and a machine-learning approach for the disambiguation process. Another WSD approach based on a trained classifier is presented in [18]. The approach uses specific kernels in linear combination to disambiguate terms in a given text. The kernels are trained with surrounding words of an entity link within the paragraphs of the Wikipedia article. The online annotation service Wiki Machine corresponds to the approach presented in [18]16. DBpedia Spotlight is an application that detects and disambiguates both named entities and common words in continuous text. The context information of the text to be annotated is represented by a vector. Every entity candidate of a term17 found in the text is represented as a vector composed of all terms that co-occur within the same paragraphs of the Wikipedia articles where this entity is linked [75]. Recently, Damljanovic et al. presented an approach combining NER tagging and WSD [22]. The terms spotted and identified by the NER tagging tool (person, place or organization) are assigned to DBpedia classes. Entity candidates for the spotted term are retrieved only within the instances of the assigned ontology class. The goal of TagMe is to pick out common words and named entities particularly in very short texts [30]. Its approach builds upon the graph-based approach introduced by Milne and Witten [78]. It takes into account the Wikipedia page link graph and the relationships of the entity candidates to all terms spotted in the input text. TagMe has been enhanced to utilize this approach in very short texts, such as Twitter18 posts. AIDA is an online tool for the disambiguation of named entities in natural language text and tables [126]19. It utilizes relationships between named entities for the disambiguation. Plain text, HTML as well as semi-structured data, such as tables, are accepted as input formats.

16 The service is no longer available online. 17 The authors use the expression surface form for a word or a word group representing an entity. Throughout the present work term is used synonymously to this definition. 18 http://twitter.com 19 Therefore it is considered an NED approach.

Zemanta20 and OpenCalais21 are annotation tools that provide semantic annotations for an input text. In comparison to the previously introduced approaches, both tools only sparsely identify named entities but focus on recommending web documents or web sites for the given text.
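As a brief illustration of such an annotation service, the following sketch sends a short text to the public DBpedia Spotlight REST interface. It assumes the Python library requests; the endpoint URL, parameters and response fields are assumptions that may differ from the deployed service.

import requests

response = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": "Steve McQueen drove through Los Angeles in his Jaguar.",
            "confidence": 0.5},
    headers={"Accept": "application/json"},
)
# Each annotated surface form is linked to a DBpedia entity URI.
for resource in response.json().get("Resources", []):
    print(resource["@surfaceForm"], "->", resource["@URI"])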

discussion All of the referenced annotation approaches aim to analyze documents containing continuous text. Context definitions are limited to merely structural classes such as word, sentence, paragraph or full document [99]. All context information is treated equally with respect to provenance and other characteristics. This procedure results from the assumption that continuous text can be considered as a homogenous context. However, this assumption cannot be applied to video metadata or other (web) documents possessing metadata from different sources and with different characteristics. Chapter 6 introduces definitions of context applicable for such documents and an annotation algorithm developed for the handling of heterogenous contexts with respect to the characteristics of the information pertaining to it. Chapter 10 presents an evaluation of the proposed approach against four of the previously introduced annotation tools: DBpedia Spotlight, AIDA, TagMe and Wiki Machine.

20 http://www.zemanta.com 21 http://www.opencalais.com

DEFINITIONS OF CONTEXT 5

The discussions about the influence and importance of context date far back throughout various fields of computer science. In 1931, John Dewey wrote “We grasp the meaning of what is said in our language not because appreciation of context is unnecessary but because context is inescapably present.” [23] Although this sentence addresses context in the field of psychology, it is also valid for the context considered for computer science. This chapter gives an overview of context definitions as applied in different research fields such as linguistics, mobile computing and cognitive sciences. The concept of negative context is introduced in Section 5.2 as it is part of the proposed semantic analysis process. Context, pragmatics and semantics are closely interrelated, and Section 5.3 examines these interdependencies.

5.1 context in different research fields

In 1884, Gottlob Frege phrased three fundamental principles on general logic [34]:

• always sharply separate the psychological from the logical, the subjective from the objective;
• never ask for the meaning of a word in isolation, but only in the context of a proposition; and
• never lose sight of the distinction between concept and object.

Thereby, Frege introduced the context principle which underscores the important issue of context for the interpretation of textual information.

computer science - mobile computing Context and context-aware computing have received increasing attention in recent years [74]. In e-commerce and ubiquitous computing, context is applied as the context which encloses a person (as a user). Therefore, characteristics of context are defined to solve personalization problems, to identify life stages of a person, or to improve online marketing and management [3]. This context can be considered as user context. In 1999, Schmidt et al. developed a context model to represent the characteristics of a user context [95]. The model combines human factors (user, social environment, task), and physical environment factors (conditions, infrastructure, location). In 2000, Abowd and Mynatt proposed five questions (the Five Ws) answering necessary questions about user context [1]:


Figure 6: Context model describing user context introduced in [112].

• Who – the user and other people in the environment

• What – human activity perception and interpretation

• Where – location and the perceived path of the user

• When – time as an index and elapsed time

• Why – reason a person is doing something

These approaches were further enhanced in 2003 by Tarasewich [112]. The model defines three broad categories of context: environment, participants and activities (see Figure 6). This tripartite breakdown emphasizes the fact that beside the user herself, others can be part of a given context. Additionally, interactions and relationships between participants, activities, and the environment are included in the model. A recent approach takes into account some aspects of the context models previously introduced and links them to the application’s context [42]. Thus, factors relevant to the different participants as well as perspectives of a research and development process are taken into account.

linguistics In the field of linguistics, a generally accepted truism is that an utterance is dependent upon its (social, situative and sequential) context. This dependency of natural language and its semantic meaning on the given context has been described in several theories. Extra-linguistic reference points which contain information

about the semantic interpretation have been introduced to prove this assumption [6]. In 1970, Lewis defined six contextual coordinates corresponding to familiar sorts of dependence on features of context [62]. Thereby, a time coordinate, place coordinate, speaker coordinate, audience coordinate, indicated-objects coordinate, and a previous discourse coordinate have been introduced to describe a linguistic context. Auer characterizes a context applied in linguistic research as follows [6]:

• Context is considered as an aggregate of the given independent entities. The entities exist independently from interactions occurring within the context.

• The awareness of context is assumed. Divergences are not under consideration any more than interactive problems resulting from a lack of knowledge.

• The effect of context on the interaction is uni-directional. Context influences the linguistic behavior but not vice-versa.

For conversations, Akmajian et al. state that context is affected as well as reflected by contributions to a discourse. Thus, the context might change by reflecting a previous context. Or the context might be affected and specified by previous conversations [4]. Similar to this theory, Dijk emphasizes the dynamic character of context. Hence, a context is a course of events possessing an initial state, (several) intermediary states and a final state [118]. However, Dijk also states that a context must have limits. The conditions of a possible world need to be known in order to qualify as an initial or final state of context.

socio-linguistics and cognitive sciences The term contextualization was introduced for the first time by John Gumperz and Jenny Cook-Gumperz [45] in 1976. Contextualization specifies an active interaction participant who not only reacts but also creates context. Interaction participants are suggested to perform linguistic activities and allow them to be interpreted by creating contextual information. This hypothesis provides the foundations for considerations on context in socio-linguistics and cognitive sciences. Although there is agreement on the difficulties of defining context in general and finding a universal definition, the different disciplines identify certain characteristics within their fields of interest. Lenat [60] states that for artificial intelligence, context has been ignored or treated as a black box for a long time. For the popular knowledge base Cyc1 he defined twelve dimensions of context to “specify the proper context in which an assertion (or question) should be stated”. Bazire et al. collected 150 different definitions of context from various disciplines of cognitive sciences to identify the main components of context [8]. The study

1 http://cyc.com/cyc/opencyc

is concluded by the determination of all definitions for the parameters constraint, influence, behavior, nature, structure and system. In ubiquitous computing, context is broadly used for two purposes: as a retrieval cue and to tailor the behavior and the response type of the system [27]. Dourish further identified two different views on context, a representational and an interactional view, and suggests the latter is the more challenging for the field of interactive systems.

5.2 negative context

Context might influence the interpretation of an utterance in different ways. The common approach takes context information into account to achieve a positive influence on a specific interpretation (positive context). The notion of negative context is to enable negative influence on specific interpretations and be able to exclude them for a given context. The concept of negative context is novel for the research fields of semantic analysis and WSD. However, context and negative context are concepts used in other research fields, such as linguistics, psychology and sentiment analysis. NLP defines so-called negative and positive polarity items that are bound to a specific context. Negative polarity items (NPI) are lexical elements that occur in negative contexts only, such as the term any, while positive polarity items are in general excluded from negative contexts, such as the term some. Thus, the definition of a negative context depends upon the language-specific characteristics and designates contexts that license the occurrence of undisputed NPIs [117]. Negative emotional contexts denote situations where people feel unhappy with the result of an event [25]. Sentiment analysis deals with the problem of negation or agreement within natural language to find out positive or negative influences in a context [89]. Deep or conceptual semantic analysis of natural language also addresses negative context effects [73]. Various contextual situations influence the comprehension of ambiguity in natural language. McCrae describes cross-modal contexts to classify different levels of cognitive input. The presented work raises the issue of different contexts to the Linked Data level and addresses negative context in terms of semantic analysis approaches. Van Dijk described negative actions for the theory of linguistic actions. Hereby, the context is changed by so-called non-doings [118]. Machine learning uses positive but also negative training samples to create a classifier [79]. By providing negative training examples, the classifier learns which concept is not meant to be classified and should therefore be excluded from positive classification. For purely analytical approaches, this cannot be achieved in the first place.

Section 9.2.2 introduces an approach for deriving negative context information dynamically.

5.3 syntax, pragmatics, and semantics

WordNet2 lists two different meanings for the word context. One meaning lists linguistics and context of use as synonyms, while the other meaning lists circumstance and setting. Both interpretations depend on three criteria for distinguishing the multiple functions of context: syntax, pragmatics and semantics [101]. Charles Morris proposed the relationship of the three concepts syntax, pragmatics and semantics in 1938 [81]. Syntax deals with the formal relationships of signs to each other without taking into account their meaning. Semantics examines the relationships between signs and the objects they reference independent of the way the signs are used. Pragmatics, in turn, deals with the relation of signs to their interpreters and is defined as the study of language use [121].

Figure 7: Semiotic triangle according to Ogden and Richards [87].

The union of these three notions is often referred to as semiotics. One of the key concepts of semiotics is the semiotic triangle (see Figure 7). Philosophical discussions about symbols of objects and references date as far back as Aristotle, but in recent literature the triangle is often referenced as the Ogden-Richards triangle, referring to Charles K. Ogden and I.A. Richards as its most popular ambassadors [87].

2 http://wordnet.princeton.edu

It demonstrates the relationship of a linguistic symbol that creates a specific interpretation (reference) in a person’s mind and stands for a real object or concept (referent). The semiotic triangle is also a key concept of the Semantic Web and of facts represented by URIs in triples. Thus, a context-aware semantic analysis requires consideration of syntax, pragmatics and semantics. The following chapters introduce definitions and characteristics of context for this purpose. The context description model considers syntax, semantics, and pragmatics to describe heterogenous contexts of video metadata. Hence, all criteria of contexts are combined in the proposed semantic analysis process.

Part II

SEMANTIC ANALYSIS FOR HETEROGENOUS CONTEXTS

CONTEXT 6

This chapter introduces the definition and characteristics of context as required for the semantic analysis of video metadata. Section 6.1 introduces the definition of context for the proposed approach. The size of a context defines its specificity and is limited by its boundaries. Context boundaries are defined differently for distinct document types. Section 6.2 introduces different identified context boundary characteristics. A context is given naturally, but might be influenced by manual or automatic refinement. Context refinement strategies are presented in Section 6.3. Information provided in a context might be used as supporting information for a specific interpretation; this type of information is subsequently referred to as positive context. Likewise, context information might be used to contradict a specific interpretation. Consequently, this type of information is subsequently referred to as negative context. These specifications of context are presented in detail in Sections 6.4.1 and 6.4.2. Metadata information of multimedia documents originates from multiple sources and possesses diverse characteristics. Thus, contexts consisting of metadata information of multimedia documents, such as videos, are considered heterogenous. Characteristics of heterogenous contexts and respective examples are introduced in Section 6.5.

6.1 context in semantic analysis

Context is a term that must be considered theory-dependent [43]. For the proposed approach the definition of context must consider aspects of linguistics as well as the Semantic Web. Thus, context is defined as follows:

Definition 6.1.1. A context is a discourse of information that supports and influences its semantic interpretation.

Following this definition, a context can be characterized by its

• granularity,

• type of influence (positive or negative), and

• structure.

Granularity refers to the amount of information within a context and is dependent upon the context boundaries (see Section 6.2). The common definition of a context refers to a positive context, whereby the information pertaining to a context influences the interpretation in a positive manner (see Section 6.4.1). A negative context refers to information that contradicts a specific interpretation of the given context (see Section 6.4.2). The information pertaining to a context can have different characteristics. Therefore, the context structure is considered either homogenous or heterogenous (see Section 6.5). These aspects of context are discussed in the following sections.

6.2 context boundaries

A context boundary demarcates two or more contexts of separate content and influence on semantic interpretation. Context boundaries restrict the amount of information pertaining to a context. Moreover, information between context boundaries – and therefore pertaining to the same context – might be considered coherent with respect to the topic. Here, a topic is considered to be a self-contained subject or chunk of knowledge. The amount of information provided within a context is essential for the correct interpretation. A context which is too broad and hence contains too much information might result in a scope which is too wide for the correct interpretation. On the other hand, too little information might preclude interpretation of the context. Either case might result in a wrong interpretation. The extent of a context can be defined by structural, temporal or spatial levels – depending on the considered document type. All types of context boundaries aim at the same result: division of a given amount of information into coherent parts with respect to the topic of the content. The division obtained by the context boundaries enables a relevance view on information items regarding the appropriate context. This relevance view is discussed in detail in Section 7.3.2. The following paragraphs give a detailed overview of the different context boundary types – structural, spatial, and temporal – for multimedia documents.

6.2.1 Structural Boundaries of Natural Language Text

Natural language text within text documents is typically structured in sentences, paragraphs, sections, and chapters. These structural boundaries limit the textual information pertaining to the same context. While some natural language texts might provide a context as a whole, other documents are heterogenous, wherein various subjects and contexts change within sections. For instance, the context might remain the same for an entire document about a very specific medical disease pattern. Meanwhile, a news article about medical innovations might report about various topics and its contexts might change after every paragraph (see Section 3.2.3 for more about context sizes in the research field of WSD in continuous text). Sample contexts for DBpedia entities can be represented by structural parts of Wikipedia articles containing links to the entities. However, DBpedia Spotlight and AIDA utilize sections of Wikipedia articles as sample contexts for the disambiguation process (see Section 4.3 for a description of the approaches). Besides the fixed determination of context boundaries, they might also be detected individually for every text. Fort et al. presented an approach to the identification of textual contexts by using a machine learning model on syntactic features [32].

Figure 8: Spatial boundaries of different context information

6.2.2 Spatial Boundaries of Multidimensional Information

Spatial boundaries define the size of a context by means of the spatial distance to a specific center point. For example, visual information in pictures can be very heterogenous. By dividing a picture into spatial subdivisions, distinct contexts can be defined. Similarly, the zone of interest of a city can be drawn by a line around it on a geographical map – possibly by using district or country borders. Textual information in broadcast news stories displayed to support the understanding of the content often has a spatial reference region. Names of displayed persons are overlaid on the picture. But mostly, this textual information is only relevant for the person whose image is displayed directly above the text (see Figure 8). Therefore, the context information is restricted by spatial boundaries.

6.2.3 Temporal Boundaries of Time-Based Documents

Context changes within multimedia documents, such as audio and video files, emerge along the chronological sequence of the document.

Figure 9: Yovisto video player displaying structural segments of a video

The detection of content-based segments (see Section 2.4.1 for structural segmentation of video documents) results in coherent parts with respect to the content. For instance, a news broadcast covers different stories with various topics. By detecting scene cuts between the news pieces, the temporal context boundaries can be determined. Figure 9 shows a screenshot of the Yovisto1 video player displaying a video, its structural segments, and preview thumbnails for every segment. Every segment represents a separate context. This work focuses on video metadata and therefore on temporal context boundaries of video documents. An evaluation of different granularity levels of a temporal context is described in Section 10.7.5.

6.3 context refinement

Independent of Definition 6.1.1, a context can be naturally present but can also be enabled manually or automatically. Therefore, a context can be refined according to the context’s purpose and application. The main purpose of context refinement is the enhancement of an automatic interpretation of textual information. Preexisting knowledge is used to adapt the context. Thereby, the context can be expanded or restricted for the considered purpose. Information is thus determined as permitted or prohibited by the context. For the proposed work, context is represented by textual as well as semantic information, particularly semantic entities, classes, or categories. Therefore, context refinement mainly considers restricting the utilized knowledge base in terms of individual entities or entire classes or categories. On the one hand, the refinement can be handled by allowing a specific list of entities, classes, or categories – whitelists are created. Whitelists define the part of a knowledge base that is permitted for the utilization of context information. On the other hand, entities, classes, or categories that are not allowed can be aggregated in blacklists. Blacklists constitute parts of the knowledge base that are prohibited as context information for a specific scenario. Both approaches are described in detail in the following sections.

1 http://www.yovisto.com

6.3.1 Whitelists vs. Blacklists

Both whitelists and blacklists are created by either choosing individual entities or by utilizing properties and types of entities. They make use of aggregated lists of entities. Whitelists contain accepted entities and entity aggregations with respect to the present context. The information pertaining to a whitelist is permitted as context information utilized for the considered process. Entities and lists of entities pertaining to blacklists should not be considered for the present context and constitute banned or negative entities. This information is prohibited as context information for the considered process. Blacklists might also give hints about contradicting facts. This issue is further discussed in Section 6.4.2 with the introduction of negative context. Manual selection of individual entities for the creation of whitelists or blacklists requires great effort in either case, especially for knowledge bases containing millions of entities, such as the English DBpedia. Therefore, type or topic information of entities enables the aggregation of entities for whitelists and blacklists. The following section introduces specific aggregation approaches for that purpose.

6.3.2 Aggregation Approaches for Whitelists and Blacklists

Aggregation of semantic entities according to specific characteristics helps to create whitelists and blacklists. For this purpose, four different characteristics have been identified: type, category, topic, and time reference. These characteristics are described in the following paragraphs in terms of the aggregation of semantic entities.

aggregation by type Type information of entities is derived from rdf:type triples. rdf:type is a property “that is used to state that a resource is an instance of a class”2. Furthermore, a class provides “an abstraction mechanism for grouping resources with similar characteristics.”3. The DBpedia ontology provides rather general classes such as persons, places, organizations (and subclasses of the mentioned classes). Thereby, entities with similar characteristics can be aggregated and added to whitelists and blacklists by making use of their type information.

2 http://www.w3.org/TR/rdf-schema
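To make the aggregation step concrete, the following Python sketch illustrates how entities could be collected into whitelists and blacklists by their rdf:type statements. It is an illustration only, not the implementation used in this work; the entity/class pairs and the dbo: class names are hypothetical placeholders for statements retrieved from a knowledge base such as DBpedia.

```python
# Illustrative sketch: aggregating entities into whitelists/blacklists by rdf:type.
# The entity/class pairs and dbo: class names are hypothetical placeholders for
# rdf:type statements retrieved from a knowledge base.

RDF_TYPE_PAIRS = [
    ("dbp:Angela_Merkel", "dbo:Person"),
    ("dbp:Berlin", "dbo:Place"),
    ("dbp:Brandenburg_Gate", "dbo:Place"),
    ("dbp:Apple_Inc.", "dbo:Organisation"),
]

def aggregate_by_type(pairs, allowed_classes):
    """Collect all entities whose rdf:type is one of the given ontology classes."""
    return {entity for entity, rdf_type in pairs if rdf_type in allowed_classes}

# Whitelist: only places are permitted as context information in this scenario.
whitelist = aggregate_by_type(RDF_TYPE_PAIRS, {"dbo:Place"})
# Blacklist: organizations are prohibited for the same scenario.
blacklist = aggregate_by_type(RDF_TYPE_PAIRS, {"dbo:Organisation"})

print(whitelist)  # {'dbp:Berlin', 'dbp:Brandenburg_Gate'}
print(blacklist)  # {'dbp:Apple_Inc.'}
```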

aggregation by category Wikipedia provides several categories that can be assigned to articles (respectively entities). A category mostly aggregates entities that are similar in more than one characteristic. For instance, the category 1916 births suggests an entity of type Person and a birth year of 1916. Likewise, the category Films set in New York City suggests entities of type Film that are set in the city of New York. Therefore, the application of categories enables an aggregation of entities with multiple similar characteristics.

aggregation by topic Although categories aggregate entities that are similar in multiple characteristics, they do not represent entire topics, such as music business or sports. Topics aggregate entities of different types that belong to the same domain. For instance, the topic Soccer aggregates entities of type Soccer Player, Stadium, Soccer Manager, Football League, etc. The authority file GND (see Section 4.1.4) provides topic information via the property gnd:SubjectCategory. Thereby, topics such as Literature, Biology, History, or Agriculture are assigned to the entities. The DBpedia does not provide topic aggregations for the contained entities. Topics might be deduced by analyzing the RDF graph. Such an approach has been introduced by Boehm et al. [13]. The approach analyzes used properties and assigned categories of entities and clusters them into topics according to similar characteristics.

time reference aggregation Many named entities are time-referenced. Persons have a birth date and a death date. Places have been founded, renamed or destroyed. Therefore, a context might imply a specific period of time relevant for the mentioned entities. For example, the following two paragraphs both mention The Prince of Wales:

• The Prince of Wales served on the Somme as a staff officer. He was appointed to the staff of Field Marshal Sir John French at the British Expeditionary Force’s General Headquarters in France in November 1914.4

3 http://www.w3.org/TR/owl-ref
4 Excerpt taken from www.iwm.org.uk/server/show/nav.2208.

• The Prince of Wales is an accomplished horseman and in the 1980s rode in a number of competitive races. He made his debut as a jockey in 1980 at a charity race at Plumpton, East Sussex.5

The first paragraph mentions dbp:Edward_VIII who was Prince of Wales from 1910 until 1936 when he became King of England. The second paragraph mentions the current Prince of Wales6 – and therefore heir to the throne – dbp:Charles,_Prince_of_Wales who was created Prince of Wales in 1958. A whitelist for the first paragraph contains entities that have been born, founded, or created before 1914. A blacklist contains entities that possess their first time reference after 1914, because the latest time reference in the text is the year 1914 and entities with a time reference starting after this year cannot be relevant for the interpretation of the text. Thereby, whitelists and blacklists are created by aggregating named entities according to their and the context’s time reference. However, for many named entities a time reference is not given explicitly – at least not in a manner understood by a machine. Whereas persons, places, or companies may possess properties providing time reference information (birth date, founding year, death date), other entities, such as technical inventions (e. g. dbp:DDR_SDRAM), do not possess an explicit time reference by nature, but certainly become relevant for an interpretation after their development and first appearance in broadcast media. Amongst others, the research field of Named Entity Evolution Recognition deals with this problem and the change of meaning of entities over time (see [111]).
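The following Python sketch illustrates the time reference aggregation for the example above. It is not the implementation of this work; the listed first time references are simply the entities’ birth years and serve only as assumed illustrative values.

```python
# Minimal sketch of time reference aggregation (illustrative values only).
# Each entity is paired with the year of its first time reference, e.g. a birth
# or founding year; the concrete numbers are assumptions for this example.

FIRST_TIME_REFERENCE = {
    "dbp:Edward_VIII": 1894,                      # born 1894
    "dbp:Charles,_Prince_of_Wales": 1948,         # born 1948
    "dbp:John_French,_1st_Earl_of_Ypres": 1852,   # born 1852
}

def split_by_time_reference(entities, latest_context_year):
    """Whitelist entities whose first time reference lies before or in the
    latest year mentioned in the context; blacklist all later ones."""
    whitelist, blacklist = set(), set()
    for entity, first_year in entities.items():
        (whitelist if first_year <= latest_context_year else blacklist).add(entity)
    return whitelist, blacklist

# The first example paragraph mentions the year 1914 as its latest time reference.
whitelist, blacklist = split_by_time_reference(FIRST_TIME_REFERENCE, 1914)
print(whitelist)  # Edward VIII and John French remain candidates
print(blacklist)  # Charles, Prince of Wales is ruled out
```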

6.4 positive vs. negative context

Context information is essential to make a decision about the interpretation of ambiguous text. Most disambiguation approaches (see Section 4.3) take into account context information to find the most supported interpretation within the given context. WSD and NED in the research field of the Semantic Web must deal with the open world assumption. The given context is considered as an excerpt of information that can be taken into account for the correct interpretation. The context cannot be claimed to be complete. Additionally, information not given in the context cannot automatically be considered as incorrect or irrelevant. Therefore, a disambiguation process utilizing a given context might be considered an additive process when the information is used to support the detection of the correct interpretation, or rather the most relevant interpretation, within the given context. On the other hand, disambiguation can also be organized as a subtractive process. In this way, incorrect interpretations might be identified by means of information not relevant for the given context.

5 Excerpt taken from www.princeofwales.gov.uk/the-prince-of-wales/interests/sports.
6 As of November 2013.

The additive disambiguation process requires a positive context. The subtractive disambiguation requires a negative context. But, due to the open world assumption, missing context information cannot be considered as negative context. Negative context must be either created manually or deduced automatically. According to this assumption, positive and negative contexts are defined in the following sections.

6.4.1 Positive Context

A positive context is determined by its capacity to influence the interpretation of textual information in a supportive manner. Thus, the positive context is utilized to substantiate the decision for a specific interpretation. The more contextual information supports an interpretation, the higher is its relevance within the context. The positive context influences the relevance in an additive manner. For example, the term Berlin is considered to be interpreted; that is, several named entities must be considered for the interpretation of the term. Only taking into account places, there are 29 different entities found within the DBpedia associated with the label Berlin. To interpret this term appropriately, further context is needed. In this case, the term occurs together with the following terms: Barack Obama, Angela Merkel, and Brandenburg Gate. Therefore, the discourse, or rather the positive context, for the interpretation of Berlin is defined by these three terms. The term Barack Obama supports the interpretation of Berlin as one of the towns in the United States with the name Berlin, such as Berlin in Connecticut. But Barack Obama also visited the German capital Berlin several times. The other two terms, Angela Merkel and Brandenburg Gate, support the interpretation of Berlin as the German capital. Therefore, the decision for this particular interpretation is supported by all three context terms and is considered more relevant than the other interpretations. The disambiguation process is organized in an additive manner utilizing the terms co-occurring within the same context. Therefore, textual information co-occurring in a context is considered as positive context. A positive context can also contain entities, topics, categories or type information. However, the information pertaining to a positive context is exclusively used to perform an additive disambiguation process.
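A minimal sketch of such an additive disambiguation step is given below. The mapping from context terms to the entity candidates they support is an assumption made for this illustration; in practice it would be derived from descriptive texts and semantic relationships in the knowledge base.

```python
# Illustrative sketch of an additive (positive context) disambiguation step.
# The support mapping is assumed for this example only.

SUPPORT = {
    "Barack Obama":     {"dbp:Berlin", "dbp:Berlin,_Connecticut"},
    "Angela Merkel":    {"dbp:Berlin"},
    "Brandenburg Gate": {"dbp:Berlin"},
}

def additive_relevance(candidates, context_terms):
    """Count how many positive context terms support each entity candidate."""
    return {c: sum(1 for t in context_terms if c in SUPPORT.get(t, set()))
            for c in candidates}

candidates = ["dbp:Berlin", "dbp:Berlin,_Connecticut", "dbp:Berlin,_Georgia"]
scores = additive_relevance(candidates,
                            ["Barack Obama", "Angela Merkel", "Brandenburg Gate"])
print(scores)  # dbp:Berlin: 3, dbp:Berlin,_Connecticut: 1, dbp:Berlin,_Georgia: 0
```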

6.4.2 Negative Context

In contrast to positive context, negative context is utilized to invalidate specific interpretations. This is a difficult task for WSD processes, as the Semantic Web applies the open world assumption. Information not present might be correct and relevant. At the same time, the absence of information does not imply the incorrectness or meaninglessness of this information for the correct interpretation. For the purpose of WSD in the field of the Semantic Web, a negative context denotes irrelevant information for an interpretation. Thus, information pertaining to a negative context provides topics or relationships that rule out possible interpretations. In the case of the term Berlin from the example above, the negative context would contain the following topics or categories: populated places in Colquitt County, towns in Georgia, towns in Hartford County, American songwriters. For the correct interpretation of the term Berlin, semantic entities that pertain to the topics listed in the negative context are not considered. Therefore, the entities Berlin in Georgia, Berlin in Connecticut and Irving Berlin, the songwriter, can be ruled out, because they pertain to relationships in the negative context. This example already shows that the construction of a negative context requires more semantic information compared to a positive context. While positive context items can comprise simple text terms, negative context items make use of semantic information, such as semantic entities, categories, or topics. The simplest way to provide a negative context is via manual composition of a blacklist (see Section 6.3.1) of prohibited semantic entities or categories for a specific application. This list of prohibited categories is used to penalize interpretations that are assigned to categories in the list or have other relationships to the negative context. The disambiguation is therefore performed in a subtractive manner. The more relationships an interpretation has to the negative context, the lower its relevance. The creation of a blacklist utilized as negative context requires considerable manual effort – especially for knowledge bases containing many entities and categories. Therefore, an automatic enrichment of the negative context enables a subtractive disambiguation process and is customizable for any context and knowledge base. The proposed approach of dynamic and automatic creation of a negative context and its utilization is described in Section 9.2. The evaluation of the negative context approach is presented in Section 10.7.7.
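Analogously, the subtractive use of a negative context can be sketched as follows. The candidate-to-category assignments are simplified assumptions for the Berlin example and do not claim to reflect the actual Wikipedia category system.

```python
# Illustrative sketch of a subtractive (negative context) disambiguation step.
# Candidate-to-category assignments are simplified assumptions for the example.

CATEGORIES = {
    "dbp:Berlin":              {"German state capitals"},
    "dbp:Berlin,_Georgia":     {"Towns in Georgia", "Populated places in Colquitt County"},
    "dbp:Berlin,_Connecticut": {"Towns in Hartford County"},
    "dbp:Irving_Berlin":       {"American songwriters"},
}

NEGATIVE_CONTEXT = {
    "Populated places in Colquitt County",
    "Towns in Georgia",
    "Towns in Hartford County",
    "American songwriters",
}

def subtractive_relevance(candidates, negative_context, penalty=1.0):
    """Penalize a candidate for every relationship to the negative context."""
    return {c: -penalty * len(CATEGORIES.get(c, set()) & negative_context)
            for c in candidates}

scores = subtractive_relevance(CATEGORIES.keys(), NEGATIVE_CONTEXT)
print(scores)  # dbp:Berlin keeps 0.0, all other candidates receive penalties
```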

6.5 heterogenous context

The extent of a context is constrained by its boundaries. In terms of multimedia documents, such as video, the context boundaries are defined by temporal divisions, i.e. fractions or segments of coherent content. Accordingly, all information pertaining to the same segment also pertains to a coherent context. This information can be manifold and of various data provenance. The context terms included in such a context possess various characteristics and must be handled correspondingly. Thus, the contexts created from video metadata are considered heterogenous contexts. In contrast, a text document provides a homogenous context. All context terms occurring in the text document possess the same characteristics.

Figure 10: Example of a video possessing metadata from different sources creating a heterogenous context

Figure 10 depicts a video possessing metadata from displayed and spoken text, user-generated tags and authoritative descriptive information. All textual information occurring at the same time or within the same structural segment pertains to the same context, but possesses different characteristics. To describe these characteristics and utilize them for the subsequent semantic analysis processes, a context model for heterogenous contexts has been developed. The context model defines several characteristics to evaluate the context terms of a heterogenous context and bring them into an order with respect to their reliability and confidence. The context model is described in the next chapter.

CONTEXT MODEL FOR HETEROGENOUS CONTEXTS 7

The context model has been developed to handle different characteristics of context items in heterogenous contexts [107]. Video metadata is often of diverse data provenance and entails variant reliabilities. This chapter introduces the context model including descriptions of context items and deduced confidence values in Section 7.1. Contextual descriptions and confidence values are defined along examples and calculated using a sample of video metadata in Section 7.2. The deduced characteristics of the contextual descriptions enable multiple views on the context items. These views are introduced and described in Section 7.3.

7.1 contextual description and confidence calculation

Documents are created within a specific user context determining the purpose for which the document was created. This context can also be considered as pragmatics. The metadata provided for the document, as well as automatically generated metadata, form a separate context. This context is important for the determination of the interpretation of the information provided in the document. Therefore, context is defined as follows:

Definition 7.1.1. A context is represented by a finite set CI of context items. Each context item cii ∈ CI is a tuple cii = (term, uri, cd, c), where:

• term denotes the value (string text) of the context item ci,

• uri = (uri1, uri2, ..., urin), urii ∈ KB, i = [1...n], denotes the list of n entity candidates assigned to the term of the context item ci, where a semantic entity urii is part of a given knowledge base and is referenced by its URI (Uniform Resource Identifier),

• cd denotes the contextual description cd ∈ CD of the context item ci, and

• c ∈ [0...1] denotes the confidence value that is calculated according to the contextual description cd.

Thereby, a context is defined as a list of context items. The context items derive from a document’s provided metadata or from textual content extracted from the document. Document metadata ranges from structured data to continuous text. Type information provided with structured data is an important factor for the interpretation of the textual information. Most of the document metadata (both manually provided authoritative metadata as well as automatically extracted metadata) is presented in the form of natural language text. Natural language is expressive but entails the problem of ambiguity. To enable semantic annotation of documents and the documents’ metadata, the ambiguity of the textual information must be resolved. This is where the context comes into play. The characteristics and ability for interpretation of a context are determined by the sum of the context items pertaining to it. Importantly, these context items originate from diverse sources, can have different reliabilities and should therefore be weighted according to their significance within a context. This significance is represented by the calculated confidence value c associated with the context item cii. In order to achieve this, a contextual description depicting the characteristics of context items is defined:

Definition 7.1.2. A contextual description cd ∈ CD of a context item ci is a tuple cd = (tt, st, sd, cl, nt), where:

• CD denotes the set of all contextual descriptions,

• tt ∈ Tt, where Tt is a finite set of text types, and tt is associated with the context item ci,

• st ∈ St, where St is a finite set of source types, and st is associated with the context item ci,

• sd ⊆ Sd, where Sd is a finite set of available sources for the document,

• cl ∈ Cl, where Cl is a finite set of ontology classes, and cl is the class the entities in uri of context item ci are type of, and

• nt denotes the number of tokens of the term of the associated context item ci.

For the proposed use case, the semantic analysis of video metadata, text types, sources, and ontology classes of the contextual descriptions are restricted to the following sets:

• The set of source types St is determined to contain authoritative and non-authoritative (human) sources, as well as Automatic Speech Recognition (ASR), and Optical Character Recognition (OCR).

• The set of text types Tt is determined to contain continuous text, keywords, and tags.

• The set of ontology classes Cl is determined to contain place, organization, and person.
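For illustration, Definitions 7.1.1 and 7.1.2 together with the restricted value sets can be expressed as a compact data structure. The following Python sketch is only an illustrative model, not the implementation used in this thesis.

```python
# A compact data-structure sketch of Definitions 7.1.1 and 7.1.2 (illustrative).
# The restricted value sets mirror the video metadata use case described above.

from dataclasses import dataclass, field
from typing import List, Optional

SOURCE_TYPES = {"authoritative", "non-authoritative", "ASR", "OCR"}  # St
TEXT_TYPES   = {"continuous text", "keyword", "tag"}                 # Tt
CLASSES      = {"Person", "Place", "Organization", None}             # Cl (None = top class)

@dataclass
class ContextualDescription:          # cd = (tt, st, sd, cl, nt)
    tt: str                           # text type
    st: str                           # source type
    sd: List[str]                     # sources agreeing on the item (subset of Sd)
    cl: Optional[str]                 # assigned ontology class, None for the top class
    nt: int                           # number of tokens of the term

@dataclass
class ContextItem:                    # ci = (term, uri, cd, c)
    term: str                         # textual value of the item
    uri: List[str] = field(default_factory=list)  # entity candidates from the KB
    cd: Optional[ContextualDescription] = None
    c: float = 0.0                    # confidence in [0.0 ... 1.0]

# A context is simply a finite list of context items:
Context = List[ContextItem]
```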

Sources of automatically extracted textual information from video data include displayed and spoken text and therefore the respective OCR and ASR algorithms (see Section 2.4.2 and 2.4.3). Often, minimal authoritative metadata is available, such as a title, speaker, primary persons, publisher, etc. Additionally, some video resources are provided with textual, time-related tags from non-authoritative sources1. Therefore the set of available source types for video metadata St is restricted to these sources: authoritative, non-authoritative, OCR and ASR. Metadata from ASR and OCR algorithms, as well as the title and description from the authoritative metadata, can be considered as continuous text, i.e. textual data without type information that requires further linguistic analysis for interpretation. Information about the speaker or the publisher of a video is typically given as keywords, i.e. a term or a list of terms, where each term corresponds to a semantic entity. Tags form a third text type as they are mostly given as a group of single terms and only subsets of the group belong together (see Section 8.2.1 for tag processing). Interpretation of tags also requires further linguistic or statistical analysis. Tt is therefore restricted to these three text types: continuous text, keywords, and tags. To determine the appropriate entities for a given textual information, it is advantageous to know each entity’s prospective ontology class. Some authoritative metadata can be directly assigned to ontology classes; for instance, the metadata item for speaker can be directly assigned to the ontology class Person. For NLP, Conditional Random Field (CRF) classifiers (see Section 3.2.2) are applied to find entities of predefined ontology classes in continuous text. By using a 3-class model, the ontology classes Person, Place and Organization can be recognized in a text. Therefore, in this use case the set Cl is restricted to these three ontology classes. With the help of the contextual description, the confidence of the context item can be computed. For each of the five contextual factors (tt, st, sd, cl, and nt) a double precision value is calculated within a range [0.0...1.0]. The five factors can be aggregated to superordinate characteristics of two video metadata aspects: correctness and ambiguity.

7.1.1 Correctness

The correctness of a context item denotes the probability that the item is correct with respect to syntax and relevance. The correctness of a context item is influenced by the source diversity as well as by the source reliability. The more sources agree on the same textual information2 and therefore on an item, and the higher the reliability of the item’s original source, the more likely it is that this item is correct. The two contextual factors source reliability and source diversity are described in detail in the following paragraphs.

1 video portals like Yovisto allow the videos to be tagged by any user to provide time-related references to the video

source reliability The term reliability refers to a prospective error rate concerning the source type st. Document metadata can be created by either human or computer agents. Human agents can be the author who created the document, or any user who annotated the document with additional information. Computer agents are analysis algorithms which extract (mostly) textual information from a multimedia document, such as OCR and ASR. All these agents provide information with differing degrees of reliability. Whereas human agents in general can be considered more reliable than computer agents because of linguistic knowledge and experience, authoritative human agents are considered more reliable than non-authoritative human agents. An agent’s reliability is ranked according to this simple presumption. The value vst denotes the probability of accuracy of a context item with respect to its source type st. We assign the highest possible value vst = 1.0 to authoritative human sources as being the most reliable source type. Non-authoritative human sources are considered less reliable than authoritative sources [105]. Therefore, a 10% penalty is assumed and vst = 0.9 is assigned. Please note that these values only serve as an assumption and can be endorsed by empirically generated trust values for specific users or user groups. For computer agents, the reliability values are adopted from the determined accuracies, i.e. the quality of the results, of the respective algorithms. Unfortunately, most video OCR evaluations are based on single frame processing, which embellishes the achieved results. Precision for video OCR on videos with an equal number of text and non-text frames is still very low. According to [119], the error rate for news videos can be as high as 65%. Therefore, a worst case accuracy of 35% (vst = 0.35) for context items with an OCR analysis as source agent is assumed. Word error rates for ASR analysis engines range between 10% and 50% (accuracy rates between 50% and 90%) [24]. The worst case of 50% accuracy is assumed and the reliability value for context items from ASR results is set to vst = 0.5. The assigned reliability values for the different source types are depicted in Table 3.

source diversity Source diversity specifies how many of the available annotation sources agree on the same metadata item. Thus, the same textual information is provided by different sources. The diversity ranges from a single source to all available sources.

2 That means, multiple sources provide the same textual term.

Table 3: Overview of assigned reliability values for different source types of a context item

source type                       vst
authoritative human source        1.0
non-authoritative human source    0.9
ASR                               0.5
OCR                               0.35

The more sources agree on the textual term of a context item, the more reliable the item is considered to be. Depending on the number of available sources (Sd) and the set of sources that agree on the same item i (si), the value vsd denoting the probability of accuracy of a context item ci with respect to its source diversity can be calculated as follows:

vsd = |si| / |Sd|    (1)

Example: The text operating systems is automatically extracted by OCR analysis from a video frame. The title of the video is Operating Systems: Lion vs. Jaguar. For this video, the only sources of textual information are the authoritative metadata and the extracted OCR text. For the authoritative metadata, a context item with term = operating systems can be derived, while this context item is also supported by OCR. In this case, as the term operating systems is confirmed by both of the 2 available sources, vsd calculates to vsd = 2/2 = 1.0.
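The source diversity factor of Equation 1 can be computed directly; the following minimal Python sketch repeats the example above and is for illustration only.

```python
# Sketch of the source diversity factor from Equation 1 (illustrative only).

def source_diversity(agreeing_sources, available_sources):
    """v_sd = |si| / |Sd| -- share of available sources that agree on the item."""
    return len(agreeing_sources) / len(available_sources)

available = {"authoritative", "OCR"}                           # Sd for the example video
print(source_diversity({"authoritative", "OCR"}, available))   # 1.0 for "operating systems"
print(source_diversity({"OCR"}, available))                    # 0.5 for a term found only by OCR
```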

7.1.2 Ambiguity

The ambiguity level of a context item is influenced by the text type, the ontology type assigned to the item, and the number of tokens the context item’s natural language term comprises. The level of ambiguity is also relevant for a successful disambiguation in terms of a correct interpretation. Therefore, the lower the ambiguity, the higher the possibility of a successful interpretation of a context item. Natural language text needs NLP technologies to identify important key terms. Due to the number of potential errors, the ambiguity of natural language text is often higher than for single restricted key terms. For key terms, no further processing is needed. Also, the lower the number of instances of the assigned ontology class, the lower the item’s potential ambiguity. The number of tokens the term consists of gives a hint about the specificity of the term. The more tokens, the more specific the term might be.

Table 4: Overview of assigned confidence values for different text types of a context item

text type           vtt
key terms           0.9
tags                0.7
continuous text     0.56

text type Video metadata is provided in different text types. In general, descriptive texts as well as automatically extracted texts are continuous texts. Authoritative metadata can also be provided as typed key-value pairs, or untyped keywords. These key terms usually depict an entity in total. Further authoritative textual information, such as the title or a descriptive text, is given as continuous text in natural language. It is assumed that the ambiguity of metadata items with the text type “typed literal” is lowest, because no detection algorithm is needed and the type of the literal is already given. Therefore, the according confidence value vtt, denoting the probability of ambiguity of a context item ci with respect to its text type, is highest with vtt = 1.0. In reality, this text type is usually not representative for video metadata and is therefore not part of the set of available text types. The ambiguity of continuous text depends on the precision of the NLP algorithm used to extract key terms. For this work, the Stanford POS tagger3 is applied to identify word types in text. This tagger has an accuracy rate of 56% per sentence [66], which leads to vtt = 0.56. By using this rate as a reliability value for continuous text, a measure independent of text length is applied. POS tagging is not needed for context items that are provided as key-value pairs. Still, to allow for uncertainty, the reliability of key terms is determined to be slightly lower than for typed literals as vtt = 0.9. Key terms therefore achieve the highest vtt within the scope of video metadata. User-generated tags require further linguistic and statistical processing to find named entities in tag groups (see [63] for tag processing). The precision of detecting key terms in tag groups has been determined empirically as 0.7. Therefore, the confidence value for tags is set to vtt = 0.7. The assigned confidence values for different text types are depicted in Table 4.

3 http://nlp.stanford.edu/software/tagger.shtml

class cardinality The contextual factor of class cardinality corresponds to the number of instances the assigned ontology class contains for the context item ci. In general, a descriptive text does not refer to a specific ontology class if a CRF classifier does not find any classes in the text. The entities found in such a text can be of any type. In such a case, the context items found in this natural language text are assigned to the most general class ⊤ of the ontology4 and the class cardinality is highest. According to the context item’s assigned ontology class cl and its known cardinality, the value vcl, denoting the probability of ambiguity of a context item ci with respect to class cardinality, is calculated proportional to the overall number of all known entities (|⊤|). ⊤ denotes the most general class containing all individuals of the knowledge base, and |⊤| denotes the number of all instances pertaining to this class. A high class cardinality entails a high ambiguity. Therefore, the value vcl is inverted to reflect a reverse proportionality regarding the amount of the value and the ambiguity:

vcl = 1 − |cl| / |⊤|    (2)

Within the scope of video metadata only three different ontology classes are applied: person, place, and organization. The fourth vcl relevant for video metadata is when no ontology class is assigned. Typically, the class cardinalities for the three pertinent ontology classes differ only slightly. For the purposes of simplicity, vcl is calculated in linear proportion to the cardinality of the assigned class. To demonstrate, a context item of a video might be identified as Person (by its author or by an automatic NER tagger). Using the DBpedia Version 3.8.0 as a knowledge base, the class Person contains 763,644 instances. The top class of the DBpedia ontology, owl:Thing, holds 2,350,907 instances. Accordingly, the confidence value vcl = 1 − 763,644/2,350,907 = 0.67 for a context item assigned to the DBpedia ontology class Person. The number of entity candidates of a term might also be a measure for the prospective ambiguity of the term. However, evaluations showed better results for the approach based on class cardinality. Details on the evaluation results are described in Section 10.7.5.

number of tokens Labels of semantic entities often consist of more than a single token. The more tokens an extracted natural language term comprises, the more specific this term might be considered. Therefore, the prospective ambiguity of a term declines with the increasing number of its tokens. Confidence values are retrieved through the frequency of terms with the respective number of tokens in the underlying dictionary for entity look-up5. The confidence value vnt is calculated as follows:

vnt = 1 − cnt / call    (3)

4 which means, all entities of the knowledge base have to be considered and the amount cannot be restricted to a specific class
5 Some approaches to entity look-up dictionaries are presented in Section 8.1.1 and various dictionaries are introduced in Section 10.5.

Table 5: Confidence values for different numbers of tokens according to their frequencies in the dictionaries SPL and GCW. Details of the dictionaries are presented in Section 10.5.

Number of Tokens    SPL vnt    GCW vnt
1                   0.57       0.67
2                   0.63       0.83
3                   0.87       0.89
4                   0.95       0.92
5                   0.98       0.95

call denotes the distinct number of all terms in the dictionary and cnt denotes the number of terms possessing the respective number of tokens. Table 5 shows sample values for different numbers of tokens according to two sample dictionaries. The confidence values of the five constituents of the contextual description can be computed as a linear combination and normalized to a range of [0.0...1.0]:

c = (vst + vsd + vtt + vcl + vnt) / 5    (4)

For reasons of simplicity, the average of equally weighted constituents has been chosen. Section 10.7.5 discusses several investigations on emphasizing individual constituents compared to the other constituents.
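The complete confidence calculation can be sketched as follows. The snippet is an illustration, not the thesis implementation; it reuses the reliability and text type values assumed in Tables 3 and 4 and the DBpedia 3.8.0 instance counts from the class cardinality example, while the token-frequency input for Equation 3 would normally come from the entity dictionary.

```python
# Sketch of the confidence calculation (Equations 1-4) with equally weighted
# constituents. The constants repeat the values assumed in Tables 3 and 4 and
# the DBpedia 3.8.0 class counts used in the running example.

V_ST = {"authoritative": 1.0, "non-authoritative": 0.9, "ASR": 0.5, "OCR": 0.35}
V_TT = {"keyword": 0.9, "tag": 0.7, "continuous text": 0.56}
CLASS_SIZE = {"Person": 763_644, None: 2_350_907}   # None denotes the top class
TOP_SIZE = 2_350_907                                 # |owl:Thing| in DBpedia 3.8.0

def v_sd(agreeing, available):
    """Equation 1: share of the available sources that agree on the item."""
    return len(agreeing) / len(available)

def v_cl(cl):
    """Equation 2: inverted, normalized cardinality of the assigned class."""
    return 1 - CLASS_SIZE[cl] / TOP_SIZE

def v_nt(n_tokens, token_frequencies):
    """Equation 3: 1 minus the share of dictionary terms with that many tokens."""
    return 1 - token_frequencies[n_tokens] / sum(token_frequencies.values())

def confidence(st, tt, cl, agreeing, available, nt_value):
    """Equation 4: equally weighted average of the five constituents."""
    return (V_ST[st] + v_sd(agreeing, available) + V_TT[tt] + v_cl(cl) + nt_value) / 5

# Worked example: an authoritative keyword typed as Person, confirmed by one of
# two available sources, with v_nt assumed to be 0.0 for a single-token term.
print(v_cl("Person"))   # ~0.675 (reported as 0.67 in the example above)
print(confidence("authoritative", "keyword", "Person",
                 {"authoritative"}, {"authoritative", "OCR"}, 0.0))   # ~0.615
```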

7.2 confidence calculation for context items

Figure 11: Example of various types of metadata for a video document

Let an example video have the following authoritative metadata information:

• Title: Operating systems: Lion vs. Jaguar.

• Publisher: Apple

Table 6: Example values for contextual factors and the according confidence for the six context items of the example

term               tt               vtt   cl            vcl   st             vst   vsd  vnt   c
Apple              keyword          0.9   Organization  0.96  authoritative  1.0   0.5  0.0   0.652
Operating systems  continuous text  0.56  ⊤             0.0   authoritative  1.0   1.0  0.33  0.558
operating systems  continuous text  0.56  ⊤             0.0   OCR            0.35  1.0  0.33  0.448
Lion               continuous text  0.56  ⊤             0.0   authoritative  1.0   0.5  0.0   0.392
Jaguar             continuous text  0.56  ⊤             0.0   authoritative  1.0   0.5  0.0   0.392
computer           continuous text  0.56  ⊤             0.0   OCR            0.35  0.5  0.0   0.282

Additionally, computer and operating systems were extracted from the video via OCR analysis. This example is depicted in Figure 11. Publisher information is considered as a keyword. Publisher is assigned to the DBpedia ontology class Organization. The title and the OCR texts are considered as continuous text. The NER tagger did not find any class types in the title or the OCR information. After NLP pre-processing, six context items are generated from the given metadata. The contextual factors and the calculated confidence values of the six context items are shown in Table 6.

7.3 context item views

As shown in Figure 12, the identified contextual factors and dimensions influence different superordinate characteristics and can be aggregated in two different views: the confidence view and the relevance view. The confidence view aggregates the characteristics of a context item described above. It should be noted that context items in heterogenous contexts also have characteristics regarding their context relevance.

Figure 12: Contextual factors of context items

7.3.1 Confidence View

The confidence of a context item is influenced by its ambiguity and correctness, as described in Section 7.1. The term confidence denotes the trust level assigned to the item for further analysis steps. A high correctness and a low ambiguity entail a high confidence for the context item. The confidence view is applied to order context items according to their accuracy and ambiguity. The higher the confidence, the higher is the probability that the context item is analyzed correctly and the correct entity is assigned to the item.

7.3.2 Relevance View

The various context boundaries described in Section 6.2 span different dimensions for context items and determine how relevant they are for the analysis of other context items. The spatial, temporal and structural dimensions specify the relevance of a context item in relation to other context items. These dimensions help to identify the divergence of the context items with respect to the document’s content. When creating a context for a semantic analysis of context items, the relevance view is important to aggregate the amount of all context items related to a document into smaller groups of stronger content-related coherence. This allows for a more accurate and therefore more meaningful semantic analysis of context items. Metadata items of time-referenced documents such as video or audio files can be assigned either to document fragments or to the full document. The temporal dimension reflects the reference period of the metadata item. The values of this dimension have a range between the smallest unit of the document (e. g., a frame for a video) and the full document. The spatial dimension assigns the metadata item either to a specific region or to the entire document. That is, for a video document the values begin with a single pixel within a frame over a “geometrically determined region within a frame” to the full frame. The social dimension plays a special role within the characteristics of context items. It takes into account information about the social relationships of the user who created the metadata item, as well as the user who accesses the document (or even a person referenced in a document). Therefore, the social dimension relates to the user and covers a personal perspective. For the proposed semantic analysis process of video metadata, the relevance view refers to the temporal dimension of the video. Context items referring to the same content-based segment are considered to belong to the same context. Those context items possessing the same reference to a video segment are aggregated into a heterogenous context. Table 10 (see Section 9.1.1) shows the context items of the video metadata example and their temporal reference with respect to the relevance view.

SEMANTIC ANALYSIS APPROACH IN GENERAL 8

This chapter introduces the proposed semantic analysis process in general. The description is aligned with the four elements of a WSD process (see Section 3.2.3). In this chapter three of these four elements are introduced: selection of word senses, application of external knowledge sources, and the actual disambiguation method. The first two elements are discussed in Section 8.1; the third element is described in Section 8.2. The fourth element – representation of context – is addressed in Chapter 9 when the semantic analysis process is combined with the proposed context model.

8.1 construction of a knowledge base

The basis of the WSD algorithm is the dictionary containing word senses (entities and their textual representations) and the underlying knowledge base containing descriptive information about the entities. Ideally, the dictionary consists of complete sets of textual represen- tations for every known entity. During the WSD process the dictionary serves as a reverse look-up index to find entity candidates for a term detected in the text. Several approaches for the set-up of a dictio- nary are presented in Section 8.1.1. Section 8.1.2 describes a scoring algorithm to rank all textual representations of a semantic entity re- garding their objective, context-independent relevance for the entity. The knowledge base includes descriptive information about the en- tities used to distinguish different entity candidates during the dis- ambiguation process. The extraction of this disambiguation data is described in Section 8.1.3.


8.1.1 From Term to Entity - Finding Labels

Within the scope of the Semantic Web, an entity is referenced by its URI and named by a main label. However, in natural language texts, entities are often mentioned by synonyms, acronyms, abbreviations of the main label or other alternative idioms. Some Linked Data datasets, such as the GND, provide alternative labels in addition to main labels. For other datasets, such as DBpedia, multiple sources are utilized to compose a preferably complete set of textual representations for each entity. The textual representations include main and alternative labels of the entities. The set of textual representations constitutes the dictionary that can be looked up for possible entity candidates. In the following paragraphs, two approaches to the construction of an entity dictionary from DBpedia information are presented.

disambiguation links and redirects DBpedia provides information about redirects and disambiguation pages of all entities. Disambiguation pages in Wikipedia are summaries of articles with the same main label and list links to the respective articles. Articles with ambiguous main labels often possess a title containing disambiguating information, such as Emma (1996 TV drama), or Emma (2009 TV serial). Very often, such disambiguation pages list people with identical surnames or places with the same name in different countries or regions. Therefore, the label of the disambiguation page often constitutes the main label without the brackets used for distinction of entities with the same name; or the page represents the abbreviation that stands for several different entities. The following disambiguation pages all point to the entity dbp:Brooklyn_Nets:

• dbp:NET

• dbp:NJN

The labels NET and NJN are added as alternative labels accordingly. Additionally, the labels of redirects represent an essential source for alternative names of entities. Redirects evolve in different ways:

• A user creates a page for an entity that already exists, but with a different title.

• The article title is changed.

• The redirect is created manually to lead the user to the correct article when searching for an alternative name of the entity.

Redirects possess a unique label but no other information about the target entity. DBpedia 3.8.0 lists 9 redirects for the entity dbp:Brooklyn_Nets, e. g.:

Figure 13: Retrieval of alternative textual representations for DBpedia enti- ties using redirects and disambiguation links.

• dbp:New_York_Nets,

• dbp:New_Jersey_Nets, and

• dbp:New_Jersey_Americans.

Accordingly, the labels New York Nets, New Jersey Nets, and New Jersey Americans are extracted as alternative textual representations. Figure 13 shows the retrieval of alternative labels for DBpedia entities. Table 7 shows the list of extracted alternative textual representations for the entity dbp:Brooklyn_Nets using redirects and disambiguation pages (the presented scores for the labels are introduced in Section 8.1.2). In addition to the labels from redirects and disambiguation pages, supplementary labels can be extracted for persons. Persons are often only referred to by their family names. Therefore, the family name can be extracted from the main label as an additional label. For example, for Steve McQueen his family name McQueen is extracted and added as an alternative person name. References to aristocratic persons form a special case. They are often only referred to by their given names. For example, for Catherine, Duchess of Cambridge only Catherine is extracted as an alternative person name.
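The extraction of family names as additional person labels can be sketched with a simple heuristic; the snippet below is illustrative only and, as noted above, fails for aristocratic names and pseudonyms, which require the special-case handling described in the text.

```python
# Illustrative heuristic for extracting a family name as an additional label.
# Aristocratic names and pseudonyms need the special cases discussed above,
# so this is only the simple baseline.

def family_name_label(main_label):
    """Return the last token of a person's main label as an alternative label."""
    tokens = main_label.replace(",", "").split()
    return tokens[-1] if len(tokens) > 1 else None

print(family_name_label("Steve McQueen"))                    # "McQueen"
print(family_name_label("Catherine, Duchess of Cambridge"))  # "Cambridge" -- wrong: such
                                                              # names need the given-name rule
```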

Table 7: Textual representations of dbp:Brooklyn_Nets using labels of redirects and disambiguation pages and the calculated scores.

Label                    Label Score
Brooklyn Nets            0.33
New York Nets            0.66
New Jersey Nets          1.0
New York Freighters      0.42
New Jersey Americans     0.65
Nj nets                  0.466
NJ Nets                  0.466
Sly Fox                  0.125
NJNets                   0.399
NJN                      0.8

extraction of anchor texts The second approach is based on the assumption that alternative labels of entities can be extracted from the anchor texts1 when the entities are mentioned and linked in natural language texts. The success of Wikipedia is mainly based on the fact that text articles contain links to other articles and the encyclopedia can be explored by simply following these links. Therefore, authors of Wikipedia articles are urged to provide links to other articles when they are mentioned in the article texts. The anchor texts of these links can be extracted and utilized as dictionary entries for textual representations of the entities. Table 8 shows a list of mentions of the entity dbp:Brooklyn_Nets in the articles of the English Wikipedia2 (the presented p(article|anchor) is introduced in Section 8.1.2).

1 The text used as an anchor to provide a link to another article.

discussion Anchor text extraction provides many more alternative labels for an entity than provided by redirects and disambiguation links. However, persons are mostly referenced by their full name. The Wikipedia policies recommend that users embed a link to another article at its first appearance within the current article. Therefore, when persons are linked, the anchor text often contains their full name. A manual extraction of alternative person names (as described in Disambiguation Links And Redirects) provides additional labels. The main labels of entities of the type Person might be analyzed. The family name of the person might be used as an alternative label in addition to the main label containing the full name. However, the analysis for family names is difficult and often fails for aristocratic persons (such as Prince of Wales) or persons with a pseudonym (such as Lady Gaga). Also, the anchor text approach involves much effort, because the underlying article corpus has to be parsed regularly to extract anchor texts in and for newly created articles. Section 10.5 introduces four different dictionaries created by the two presented approaches. Evaluations regarding the characteristics, coverage and size of the dictionaries have been performed and are presented there.

8.1.2 Label Ranking

As shown in the previous section, entities can be identified by multiple labels and labels can likewise identify multiple entities. Thereby,

2 As of October 2011.

Table 8: Mentions of the entity dbp:Brooklyn_Nets in the articles of the English Wikipedia and the respective probability p(article|anchor).

Label                                  p(article|anchor)
New York                               0.00004
New Jersey Nets Summer League Team     1.0
NY Nets                                1.0
New York/New Jersey Nets               1.0
New York / New Jersey Nets             1.0
NETS Basketball                        1.0
Brooklyn Nets                          1.0
NJ                                     0.00073
New Jersey Americans/New York Nets     1.0
New Jersey Nets                        0.98933
Nets                                   0.38994
New York Nets                          0.99367
Brooklyn Sports & Entertainment        1.0
N.Y. Nets                              1.0
New York Americans                     0.00076
NJN                                    0.45455
NYN                                    0.25
New Jersey                             0.01285
New Jersey Americans                   0.61702
New York Nets/New Jersey Nets          1.0

some labels might be more relevant than others for an entity and some entities might be more or less relevant for a label. This fact can be represented by a label ranking. In the following discussion, two approaches to label ranking are described – based on the two approaches to dictionary construction presented in the previous section.

anchor-text-ranking The Wikipedia article corpus including the interwiki3 links provides a source for the probability of a linked article (entity) for a given anchor text. As shown in Table 8, an article can be referenced by multiple anchor texts. The probability of a linked article for a given anchor text, p(article|anchor), is calculated as follows:

p(article|anchor) = \frac{p(article \cap anchor)}{p(anchor)}     (5)

where p(article \cap anchor) refers to the probability that article is linked using anchor, and p(anchor) is the probability of anchor being an anchor text for any link. The probability of an anchor text for a given linked article is calculated correspondingly:

p(anchor|article) = \frac{p(anchor \cap article)}{p(article)}     (6)

For NLP – and WSD in particular – the challenge is to find the best fitting entity for a given natural language term. Therefore, the probability presented in Equation 5 provides the more relevant score for this purpose compared to the probability presented in Equation 6, because it represents the probability that a specific article is meant for a given text. As an example, the probabilities p(article|anchor) for the anchor texts linking to the article for the entity dbp:Brooklyn_Nets are presented in Table 8.
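Assuming such anchor-to-article link counts have already been collected, the probability from Equation 5 can be estimated as a simple relative frequency, as in the following sketch. The counts are illustrative and merely chosen to roughly reproduce the value listed for New Jersey Nets in Table 8.

```python
from collections import Counter

# Hypothetical link statistics: for each anchor text, how often it links to which article.
anchor_to_article_counts = {
    "New Jersey Nets": Counter({"Brooklyn Nets": 1113, "New Jersey Nets (disambiguation)": 12}),
}

def p_article_given_anchor(anchor, article):
    """Frequency estimate of p(article | anchor) from Equation 5: the share of
    links carrying this anchor text that point to the given article."""
    counts = anchor_to_article_counts.get(anchor, Counter())
    total = sum(counts.values())
    return counts[article] / total if total else 0.0

print(round(p_article_given_anchor("New Jersey Nets", "Brooklyn Nets"), 5))  # 0.98933
print(p_article_given_anchor("NJN", "Brooklyn Nets"))                        # 0.0 (unseen anchor)
```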

ranking of alternative labels The anchor text ranking described above cannot be applied for the alternative labels derived from redirects and disambiguation pages or given as alternative labels, because some of these alternative labels might not be used as anchor texts to point to articles, or a respective text corpus may not be available. Therefore, the calculation of the probability p of an article for a given alternative label is not applicable. As an alternative, the label ranking approach is based on the assumption that the main label is the best fitting label for an entity. Derived alternative labels are ranked according to several rules comparing main and alternative labels. Every label achieves a score within the range [0.0 ... 1.0].

3 Internal links between the articles within the Wikipedia. Links to external resources are not considered.

In a preprocessing step, all WordNet synsets4 and acronym lists are retrieved. For the label ranking calculation, the following queries for every pair of main label and alternative label are executed in the presented order. If a query is true, the respective score is assigned to the alternative label and the subsequent queries are skipped:

• Does the main label equal the alternative label? score = 1.0

• Are both the main label and the alternative label in the same WordNet synset list? score = 0.9

• Does the main label refer to a person and is the alternative label an alternative person name5? score = 0.9

• Is the alternative label an acronym of the main label? score = 0.8

• Does the main label contain the alternative label?

score = \frac{length(alternativelabel)}{length(mainlabel)}

The main label is considered the best fitting label and achieves the highest score = 1.0. A synonym of the main label achieves a slightly lower score than the main label. Alternative person names (such as the family name) are treated as synonyms of the full name and achieve the same score. Acronyms are considered more ambiguous and less relevant compared to synonyms. Under this assumption, the score for acronyms is defined as score = 0.8. If the alternative label is not a synonym or an acronym, but a substring of the main label, the proportional length of the alternative label to the length of the main label is used as the score. If none of the previous queries returned true for the pair of main and alternative label, the score is calculated in the following way:

score = 1 - \frac{levenshtein(mainlabel, alternativelabel)}{max(mainlabel, alternativelabel)}     (7)

where max(mainlabel, alternativelabel) returns the length of the longer of both labels and levenshtein(mainlabel, alternativelabel) returns the Levenshtein distance6 for the main and the alternative label. As an example, the scores for the alternative labels of the entity dbp:Brooklyn_Nets are presented in Table 7.

4 A WordNet synset is a list of textual terms representing the same entity.
5 See Section 8.1.1 for the extraction of alternative person names.
6 The Levenshtein distance of two labels refers to the minimum required number of edits to convert one label into the other. An edit is an insertion, deletion, or substitution of a single character.
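The rule cascade described above can be sketched as follows. The synset and person-name checks are reduced to simple set look-ups, and all helper names are illustrative rather than taken from the actual implementation.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def is_acronym(candidate, label):
    initials = "".join(tok[0] for tok in label.split() if tok)
    return candidate.replace(".", "").lower() == initials.lower()

def label_score(main, alt, synsets=(), person_names=()):
    """Rule cascade ranking an alternative label against the main label."""
    if alt == main:
        return 1.0
    if any(main in s and alt in s for s in synsets):   # shared WordNet synset
        return 0.9
    if alt in person_names:                            # e.g. an extracted family name
        return 0.9
    if is_acronym(alt, main):
        return 0.8
    if alt in main:                                    # substring of the main label
        return len(alt) / len(main)
    return 1.0 - levenshtein(main, alt) / max(len(main), len(alt))

print(round(label_score("Brooklyn Nets", "Nets"), 2))   # substring rule: 4/13 = 0.31
print(label_score("Brooklyn Nets", "BN"))               # acronym rule: 0.8
```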

discussion Label ranking provides a context-independent measure for ranking entity candidates for a given natural language term. The anchor text approach is only applicable if alternative labels are derived by using anchor texts of links in a large text corpus. This label ranking reflects the manner of linking in the text corpus. Within the scope of Wikipedia, several constraints and characteristics have to be considered. On the one hand, Wikipedia itself imposes several rules for the creation of articles. Therefore, the users cannot choose freely where and what to link. On the other hand, the users creating Wikipedia articles cannot be considered Semantic Web experts and they do not create the articles considering the effect of their work on extraction methods. Therefore, an automatic extraction of labels and the applied anchor ranking might comprise noise and incorrect data. In contrast, the latter label ranking approach does not consider the common use of labels for entities (as the anchor text ranking does). This might result in high scores for labels that are rarely used in (web) documents to refer to the specific entity. Likewise, more popular labels (with respect to their usage as anchor texts) might receive lower scores, because of missing synonym information and a low lexical similarity. Section 10.6 provides several statistics for entity dictionaries and their respective label rankings applied on different benchmarks.

8.1.3 Disambiguation Data

The definition of an entity and its distinction from other entities with similar names are represented by descriptive metadata. This metadata can be used for the disambiguation process. Descriptive metadata consists, on the one hand, of natural language text describing the purpose, facets and history of the entity and, on the other hand, of relationships to other entities. The composition of descriptive texts as well as the different types of entity relationships are presented in the next paragraphs.

co-occurrence of entities and text Descriptive natural language texts can be derived in various ways. Articles from Wikipedia or other sources provide natural language texts describing a given entity. In comparison, an entity can also be described by the texts with which it co-occurs. Therefore, the text parts surrounding a link to an entity can also be considered as descriptive texts. Hereby, the context size has to be defined. The surrounding texts can be limited by sentence, paragraph, section or document boundaries. The larger the context, the more irrelevant information might be considered in terms of describing a linked entity. A context which is too small might provide too little information for a sufficient disambiguation and therefore a correct interpretation.

entity relationships Within Wikipedia, related articles are represented by the page link graph. The graph consists of nodes representing articles, and links between articles represent the edges. If an article refers to another article, the respective nodes in the graph are connected by a directed edge. This graph contains all links between articles extracted from the article texts. However, the links are not further defined regarding the type of relationship of the two involved nodes. In contrast, the property graph (in this case extracted for DBpedia from the infoboxes) only represents a small subset of this very large link graph. The property graph constitutes the entities that are linked using an ontology property extracted from the articles' infoboxes. Thereby, the type of relationship is determined by the used property. Although the page link graph provides untyped relationships between entities, it reveals more information about an entity and the entities it is related to. For instance, taking into account only the property graph, dbp:Catherine,_Duchess_of_Cambridge is related to dbp:Reading,_Berkshire, dbp:Prince_William,_Duke_of_Cambridge, and dbp:House_of_Windsor. At the same time, the Duchess is linked to 452 entities via page links. For example, there is a relation to dbp:University_of_St._Andrews, because she graduated from that institution, and there is a connection to dbp:Alexander_McQueen, because she once wore a dress created by the designer. In summary, entity relationships represented by page links provide essential knowledge about an entity, beyond the knowledge provided by triples extracted via infobox properties. However, such extensive relationship graphs can only be derived from crowd-sourced text corpora, such as Wikipedia. In contrast, the GND (see Section 4.1.4) only contains typed relationships, such as related places, persons or superordinate concepts. The GND lacks a large text corpus containing links to other entities and extensive descriptive texts. For that reason, only the typed relationships can be used as distinctive disambiguation data.

8.2 disambiguation of named entities and common words

The process introduced subsequently corresponds to WSD as described in Section 3.2.3, but within the scope of the Semantic Web and under consideration of Linked Data and semantic entities. For the purpose of the analysis of video metadata, the available textual information is analyzed for named entities as well as for other common words and combined terms that are important to describe the video's content. Thus, the following process takes NED a step further by also extracting and disambiguating crucial key terms.

Figure 14: Overview of the disambiguation process

For the purpose of simplicity, both named entities and other important key terms extracted from textual information are subsequently referred to as entities. In general, the proposed context-based disambiguation algorithm comprises three different steps that build upon each other. The steps correspond to the process introduced for WSD in Section 3.2.3:

1. Identification of named entities and key terms

2. Detection of potential entity candidates in the underlying dictionary

3. Disambiguation and selection of the best matching candidate under consideration of the context

The first step identifies prospective entities and key terms in the given text. This process is described in Section 8.2.1. In the next step, entity candidates are retrieved in the given knowledge base for

each previously detected entity. The detection of entity candidates is briefly introduced in Section 8.2.2. Subsequently, entities that have multiple candidates assigned must be disambiguated by making use of the given context. For the disambiguation process, several main analysis methods have been developed:

• Co-Occurrence Analysis

• Link Graph Analysis

• Coreference Analysis

These analysis methods are described in Sections 8.2.3 through 8.2.5. Additionally, several context-independent methods can be used to detect the most fitting entity candidate. These additional methods are described in Section 8.2.6. All disambiguation methods return a disambiguation score within the range [0.0...1.0]. Some of the introduced methods might be more indicative than others. Therefore, applying a weight to the achieved scores is discussed in Section 8.2.7. The workflow of the general process is depicted in Figure 14.

8.2.1 Identification of Named Entities and Key Terms

The three identified text types – continuous text, tags and keywords – require different methods to recognize named entities and key terms. Continuous text and user-generated tags require special attention for the identification. These two approaches are described in the next paragraphs. The Oxford dictionaries define a keyword as a word used in an information retrieval system to indicate the content of a document and a significant word mentioned in an index7. Therefore, a keyword is considered representative of important key terms or named entities. Metadata marked as keywords therefore do not require any further pre-processing with respect to identification. A keyword is looked up directly in the underlying dictionary.

natural language text For the recognition of entities within continuous natural language text, a part-of-speech tagger is required (see Section 3.1). Existing libraries, such as OpenNLP or the Stanford NLP tools, process the text and assign every token to a specific word category. Named entities and important key terms are mainly nouns and combinations of nouns and other word categories. Therefore, only a few combinations of word categories are relevant for the recognition of entities. The identified combinations and respective examples are listed in Table 9.

7 http://www.oxforddictionaries.com/definition/english/keyword

Table 9: Combinations of word categories which are relevant for the recognition of key terms and named entities.

Word Category Combination                    Example
simple noun                                  motor
combined noun                                motor oil
proper noun                                  Berlin
combined proper noun                         New Jersey Nets
adjective followed by a noun                 functional programming
(proper) nouns connected by a preposition    University of California

The identification step provides a list of natural language terms found in the input text. These terms match the identified word categories or combinations of word categories as shown in Table 9. In the next step, entity candidates for the terms are looked up in the underlying dictionary. This process is described in the next section.
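The following sketch illustrates how the patterns from Table 9 could be matched over already POS-tagged tokens (Penn Treebank style tags, as produced for instance by OpenNLP or the Stanford tools). The greedy matcher is a simplification of the actual recognition step and only serves to demonstrate the word category combinations.

```python
NOUN = {"NN", "NNS", "NNP", "NNPS"}

def extract_terms(tagged_tokens):
    """Extract candidate terms: (proper) noun sequences, optionally preceded by an
    adjective or joined by a preposition (e.g. 'University of California')."""
    terms, i, n = [], 0, len(tagged_tokens)
    while i < n:
        tag = tagged_tokens[i][1]
        start = i
        if tag == "JJ" and i + 1 < n and tagged_tokens[i + 1][1] in NOUN:
            i += 1                                          # adjective followed by a noun
        if tagged_tokens[i][1] in NOUN:
            while i + 1 < n and tagged_tokens[i + 1][1] in NOUN:
                i += 1                                      # combined (proper) nouns
            if (i + 2 < n and tagged_tokens[i + 1][1] == "IN"
                    and tagged_tokens[i + 2][1] in NOUN):
                i += 2                                      # nouns connected by a preposition
            terms.append(" ".join(tok for tok, _ in tagged_tokens[start:i + 1]))
        i += 1
    return terms

tagged = [("The", "DT"), ("New", "NNP"), ("Jersey", "NNP"), ("Nets", "NNPS"),
          ("joined", "VBD"), ("the", "DT"), ("University", "NNP"),
          ("of", "IN"), ("California", "NNP")]
print(extract_terms(tagged))   # ['New Jersey Nets', 'University of California']
```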

user-generated tags User-generated tags are provided in multiple forms. Mostly, they are given as a group of single terms, and only subsets of the group belong together (see Section 2.3). The proposed combination algorithm considers all tags of a specified temporal context and generates every possible combination of at most three terms within the context in any order. The reasons for this specific number are explained below. Thus, it is ensured that the algorithm combines groups of single terms that belong together. The number c of possible combinations is calculated as follows:

c = \sum_{k=1}^{j} \frac{n!}{(n-k)!}     (8)

About 90% of the DBpedia labels consist of at most three tokens, but less than 5% consist of four tokens. Due to these numbers and performance issues, the number of terms to be combined has been limited to three. More results of the dictionary analysis regarding the numbers of tokens for the labels are presented in Section 10.6.4. As an example, the following tags have been annotated at the same timestamp for a video: hubble, spitzer, carbon, dioxide, methane, co2 and water. For this sample context containing seven tags and at most three terms in a combination (j = 3), 259 combinations have to be generated. All combinations are looked up in the knowledge base and the combinations featuring entity candidates are maintained as terms.
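The combination step can be sketched with ordered permutations of up to three tags, which reproduces the 259 combinations of the example above. Joining the tags with a blank is an assumption made here for illustration; the actual algorithm may assemble combined terms differently.

```python
from itertools import permutations

def tag_combinations(tags, max_len=3):
    """All ordered combinations of 1 to max_len distinct tags (Equation 8)."""
    combos = []
    for k in range(1, max_len + 1):
        combos.extend(" ".join(p) for p in permutations(tags, k))
    return combos

tags = ["hubble", "spitzer", "carbon", "dioxide", "methane", "co2", "water"]
combos = tag_combinations(tags)
print(len(combos))                  # 259 = 7 + 7*6 + 7*6*5
print("carbon dioxide" in combos)   # True - this combination can be looked up as one term
```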

annotation of partial entities versus complete entities Combined (proper) nouns represent an entity in the aggregate, but their individual component nouns might also be assigned to entities separately. For instance, in a text containing John F. Kennedy Airport New York City, the complete term might be assigned to the entity for the airport in New York City, but John F. Kennedy might be assigned to the former president, and New York City to the city in the American state of New York. Since the document mentions the airport in New York City, neither the former U.S. president nor the city is likely to be the subject of the content (unless they are mentioned again separately). The semantically most appropriate approach is to annotate only the complete term. The recognition and the subsequent candidate detection process follow this assumption.

8.2.2 Detection of Entity Candidates

To find relevant entity candidates for the natural language terms recognized in the input texts, they must be looked up in the dictionary. The challenge is to find the terms in the knowledge base's dictionary, since the nouns of the term might occur in singular or plural forms and the adjectives might occur in all possible declined forms. A dictionary constructed from anchor texts (see Section 8.1.1, Extraction of Anchor Texts) might per se include nouns and combined nouns in plural and singular forms as well as adjectives in all possible declined forms. On the other hand, a dictionary constructed from alternative labels or redirects and disambiguation links (see Section 8.1.1, Disambiguation Links and Redirects) probably only includes nouns in singular form and adjectives in basic declination. Therefore, a cleaning of the labels in the dictionary and of the natural language term is essential. Stemming transforms words originating from the same word stem to the same principal part. Thereby, the words run, runs, and running are all transformed to run. Also, nouns in plural form are transformed to their singular form. Therefore, stemming enables the detection of entity candidates in a dictionary independent of the declination form of the natural language term in the input text. In addition to stemming, all tokens should be converted to lower case. Labels of entities are sometimes spelled with an uppercase first letter and sometimes completely in lowercase (e.g. Semantic Web versus semantic web). Therefore, the conversion to lowercase overcomes the problem of inconsistent notations and case-sensitivity. The important issue is to pre-process the tokens of the natural language term so that they mimic the textual representations in the dictionary of the underlying knowledge base. After the look-up of the natural language terms in the dictionary, every term found in the dictionary is assigned a list containing at least one entity candidate. If the dictionary does not contain the term, it is added to the context, but rejected for annotation. The absence of the term in the dictionary does not necessarily reflect the unimportance of the term for the context, but might reveal the lack of expressiveness of the dictionary.

Therefore the term might be important for the interpretation, although no entity is found in the dictionary. A term possessing more than one entity candidate is considered ambiguous. The subsequent disambiguation identifies the most relevant entity among the list of candidates, taking into account the present context.
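A minimal sketch of the normalization and look-up step is given below. The crude suffix stripping merely stands in for a proper stemmer (such as the Porter stemmer), and the dictionary content is hypothetical.

```python
def normalize(term):
    """Lowercase and crudely stem each token so that the surface form matches the
    normalized dictionary keys; a real system would stem both the dictionary labels
    and the input terms with the same algorithm."""
    def stem(tok):
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                return tok[: -len(suffix)]
        return tok
    return " ".join(stem(tok) for tok in term.lower().split())

# Hypothetical dictionary: normalized label -> entity candidates.
dictionary = {
    normalize("operating system"): ["dbp:Operating_system"],
    normalize("Semantic Web"):     ["dbp:Semantic_Web"],
    normalize("Lion"):             ["dbp:Lion", "dbp:Mac_OS_X_Lion"],
}

def candidates(term):
    """Look up a recognized term; an empty list means the term is kept in the
    context but rejected for annotation."""
    return dictionary.get(normalize(term), [])

print(candidates("Operating Systems"))   # ['dbp:Operating_system']
print(candidates("semantic web"))        # ['dbp:Semantic_Web']
```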

8.2.3 Co-Occurrence Analysis

To find the contextually appropriate entity for a term, the disambiguation is processed for every entity candidate assigned to the term. The Co-Occurrence Analysis (CA) identifies the co-occurrence of the entity candidate and the other natural language terms of the context. For this purpose, the disambiguation data containing co-occurrences of entities and text of the underlying knowledge base is used (see Section 8.1.3). The analysis returns a score within the range [0.0...1.0] for every entity candidate. The higher the score, the more strongly the candidate is supported as the correct interpretation for the given context. The score is calculated as follows:

C(t) = \{t_j\}, \; j = 1 \dots k \qquad W(uri(t)_i) = \{w_r\}, \; r = 1 \dots |W(uri(t)_i)|     (9)

where t is the term currently being disambiguated, C(t) is the set of terms in the context in which t has to be disambiguated, and W(uri(t)_i) is the set of all terms of the context occurring with the entity candidate uri(t)_i. To calculate the CA score, the number counter_{cooc_i}, representing how often all terms of the context co-occur with the entity candidate, is determined as:

counter_{cooc_i} = \sum_{j=1}^{k} \sum_{r=1}^{|W(uri(t)_i)|} \delta(t_j, w_r)     (10)

with

\delta(x, y) = \begin{cases} 1 & x = y \\ 0 & \text{else} \end{cases}     (11)

Finally, the CA score is calculated as follows:

score_{CA_i} = counter_{cooc_i} \cdot \frac{|W(uri(t)_i) \cap C(t)|}{|C(t)|}     (12)

The score is normalized with respect to the scores of all entity candidates of a term. The normalization is performed by means of the highest achieved score (score^{max}_{CA_i}) within the set of all entity candidates:

score^{normalized}_{CA_i} = \frac{score_{CA_i}}{score^{max}_{CA_i}}     (13)

Figure 15: Three different types of links: a) direct links, b) symmetric links through same node (symlinksl2), c) unidirectional links through a node (simplelinksl2).

Thereby, a score within the range [0.0...1.0] is achieved.
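The following sketch reproduces the CA score and its normalization (Equations 10 to 13). The co-occurrence data is represented as plain term lists, and the entity URIs and term sets are illustrative only.

```python
def ca_score(context_terms, candidate_cooc_terms):
    """Co-Occurrence Analysis score for one entity candidate (Equations 10-12):
    candidate_cooc_terms lists the terms occurring together with the candidate
    in its descriptive texts (possibly with repetitions)."""
    counter = sum(1 for t in context_terms for w in candidate_cooc_terms if t == w)
    overlap = len(set(candidate_cooc_terms) & set(context_terms))
    return counter * overlap / len(context_terms)

def normalize_scores(scores):
    """Normalize against the highest score among all candidates (Equation 13)."""
    best = max(scores.values()) if scores else 0.0
    return {uri: (s / best if best > 0 else 0.0) for uri, s in scores.items()}

context = ["operating systems", "lion", "jaguar"]
raw = {
    "dbp:Apple_Inc.":   ca_score(context, ["operating systems", "lion", "computer", "lion"]),
    "dbp:Apple_(band)": ca_score(context, ["album", "guitar"]),
}
print(normalize_scores(raw))   # dbp:Apple_Inc. -> 1.0, dbp:Apple_(band) -> 0.0
```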

8.2.4 Link Graph Analysis

The Link Graph Analysis (LGA) approach takes into account the relationships of entities to other entities and follows the assumption that entities that are related to each other are also linked (by means of their Wikipedia articles, see Section 8.1.3). Thus, for this analysis, the link graphs for a context's entity candidates are analyzed. For this approach, three different link types with a maximum path length of w = 2 have been identified. Path length refers to the number of edges between two nodes in a graph. Therefore, a maximum path length of w = 2 takes into account paths where a maximum of two edges and one node is between the two nodes under consideration. The path length has been limited due to declining relation relevance for increasing path length. The three link types represent different levels of strength with respect to their relation to the two involved entities:

• direct links

• symmetric links via a node

• unidirectional link through a node

The strongest relationship is represented by direct links where both entities point to each other. For this link type, the path length is w = 1. In this case, the descriptive metadata mentions the respective other entity directly and a direct relationship is assumed. A less strong relationship between two entities is represented by symmetric links via a common node. This link type holds the path length w = 2. Both entities point to the same third entity and this entity points to both entities under consideration. Links between two entities via a node (w = 2) in only one direction are considered less strong than symmetric links via the same node. The link types are depicted in Figure 15 in descending order of their strength of relationship between the relevant entities.

Link types b) and c) are links with a path length of w = 2. That means that these entities are linked through nodes which are also entities. For example, Albert Einstein and Gottfried Leibniz both have incoming and outgoing page links to Berlin Academy of Sciences, but they are not directly linked to each other within their Wikipedia articles. So, these two entities are linked via link type b). The LGA detects connections between the entity candidate currently under consideration and the entity candidates of the other terms in the context. A score for every link type is calculated similarly to the calculation of the score for the CA. The connections of the entity candidate under consideration and the other entity candidates are counted. For link types b) and c), the number of different paths between two candidates is also counted. The score for direct links is calculated as follows:

counter_{dlinks_i} = \sum_{j=1}^{k} \sum_{l=1}^{m} |uri(t)_i \rightarrow uri(t_j)_l|     (14)

score_{dlinks_i} = \frac{|t \rightarrow t_k|}{|C(t)| \cdot counter_{dlinks_i}}     (15)

counter_{dlinks_i} is the number of candidates the processed candidate uri(t)_i is linked to directly, |C(t)| is the number of terms in the given context, and |t \rightarrow t_k| is the number of terms in the given context to which the entity candidate is linked. With this calculation, entity candidates that are linked to only one of the entity candidates of the other terms achieve high scores. Such candidates have fewer links, but these links are less ambiguous. This is because there is a link to only one of the candidates of the context term; this link is presumed to identify the relationship between the candidate under consideration and the context term uniquely. An entity candidate that is linked to more than one of the candidates of a specific term in the context is much less relevant, because these links might reveal further ambiguity. The achieved ranking of different link options is shown in Figure 16. In the example, uri 1 is linked to one entity candidate of every term in the context. That implies that this entity candidate is strongly related within this context. Also, relationships of this candidate to the other terms in the context are not ambiguous, since the candidate is only linked to one of the candidates of each term. uri 5 is also strongly related within this context, but the links are ambiguous, since the candidate is linked to two candidates of each term. uri 2 has the same number of links to the other entity candidates as uri 1 in this context, but all links refer to the same term. This candidate is the least related candidate and the links are ambiguous, since the candidate is linked to three different entity candidates of the same term.

Figure 16: Relationship ranking of entity candidates via links.

The score derived by the LGA represents two characteristics of an entity candidate with respect to the given context:

• ambiguity

• integration

The two characteristics are described in detail in the following paragraphs.

ambiguity The level of ambiguity is defined under the assumption that all entity candidates of a term represent different concepts. Even if the candidates do not represent completely different concepts, only one of the candidates can be the most relevant (the "best fit") for the context. Thus, the relationship of an entity candidate under consideration to more than one entity candidate of a context item indicates ambiguity. The relationship of this candidate is not unique. Therefore, the score takes into account whether or not the candidate is linked uniquely to the context. Figure 16 shows the entity candidates uri1 – uri5 for the term term1. This term is under consideration and about to be interpreted. uri1 is linked to all terms of the context (term2, term3, term4) and all links are unique – uri1 is linked to only one entity candidate of each term in the context. uri3 is also linked to all terms in the context, but the link to term2 is ambiguous as there are links to two entity candidates of term2. Therefore, uri1 receives a higher score than uri3.

integration The level of integration of an entity candidate under consideration is represented by the ratio of links to the terms in

the context. The score takes into account the number of terms in the context and the number of terms to which the entity candidate is linked. The score for link type b) (symmetric links with path length w = 2) is calculated as follows:

score_{sw2_i} = \frac{|t \rightarrow t_k| \cdot p}{|C(t)| \cdot counter_{sw2}}     (16)

The score for link type c) (unidirectional links with path length w = 2) is calculated as follows:

score_{lw2_i} = \frac{|t \rightarrow t_k| \cdot p}{|C(t)| \cdot counter_{lw2}}     (17)

counter_{sw2} (as part of score_{sw2_i}) and counter_{lw2} (as part of score_{lw2_i}) are calculated identically to the counter for direct links. They count the number of entities of the context linked to the candidate under consideration. p is the number of different paths between two linked entities. The numerator multiplies the number of context terms to which the candidate is linked by the number of paths. The denominator multiplies the number of all terms in the context by the number of context entities the candidate is linked to. Similar to the score for the CA, all scores of the LGA are normalized with respect to the other entity candidates of a term:

score^{normalized}_{LGA_i} = \frac{score_{LGA_i}}{score^{max}_{LGA_i}}     (18)

Thereby, the LGA returns three different scores with a range [0...1] for every entity candidate of an ambiguous term.
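As an illustration of the direct-link case (Equations 14 and 15), the sketch below counts mutual page links between a candidate and the candidates of the other context terms. The link set and the URIs are made up for the example; symmetric and unidirectional paths of length two would be scored analogously, and all scores would finally be normalized according to Equation 18.

```python
def direct_link_score(candidate, context_candidates, links):
    """Direct-link score (Equations 14-15). context_candidates maps each context
    term to its entity candidates; links is a set of directed (source, target)
    pairs between entities, e.g. Wikipedia page links."""
    def linked(a, b):
        return (a, b) in links and (b, a) in links      # direct links point in both directions

    counter = sum(1 for cands in context_candidates.values()
                  for c in cands if linked(candidate, c))                  # Eq. 14
    linked_terms = sum(1 for cands in context_candidates.values()
                       if any(linked(candidate, c) for c in cands))
    if counter == 0:
        return 0.0
    return linked_terms / (len(context_candidates) * counter)              # Eq. 15

links = {("dbp:Mac_OS_X_Lion", "dbp:Apple_Inc."), ("dbp:Apple_Inc.", "dbp:Mac_OS_X_Lion"),
         ("dbp:Mac_OS_X_Lion", "dbp:Mac_OS_X_Jaguar"), ("dbp:Mac_OS_X_Jaguar", "dbp:Mac_OS_X_Lion")}
context = {"Apple":  ["dbp:Apple_Inc.", "dbp:Apple_(band)"],
           "Jaguar": ["dbp:Mac_OS_X_Jaguar", "dbp:Jaguar_Cars"]}
print(direct_link_score("dbp:Mac_OS_X_Lion", context, links))   # 2 / (2 * 2) = 0.5
```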

8.2.5 Coreference Analysis

In NLP, coreference analysis refers to the multiple detection of the same named entity in a context. The challenge is to discover the referenced named entity when it is only mentioned by a personal pronoun. For example, in the paragraph Mick Jagger visited Berlin. He is the front singer of the Rolling Stones. the entity dbp:Mick_Jagger is mentioned twice: by his full name Mick Jagger and by the personal pronoun He at the beginning of the second sentence. For this disambiguation approach, a simplified version of coreference analysis is applied. If the same entity is a candidate for multiple different terms within the same context, it could be an indication that this might be the correct entity. Let us consider the following example paragraph: Mick Jagger visited Berlin. Mick is the front singer of the Rolling Stones. In this case, Mick Jagger is probably assigned to only one entity candidate. The term Mick might be assigned to several entity candidates, such as all persons with the first or last name Mick. The coreference analysis discovers that Mick Jagger and Mick feature the same entity candidate within their candidate lists. The score of entity candidate i with respect to the entities entities_{CI} contained in context CI is calculated as follows:

score_i = \begin{cases} 1.0 & i \in entities_{CI} \\ 0.0 & \text{else} \end{cases}     (19)

8.2.6 Additional Analysis Methods

The previously described analysis methods are context-dependent methods. They take into account several pieces of context information to detect the most appropriate entity candidate for an ambiguous term. In some cases these context-dependent analyses are not able to make a decision among the entity candidates, because of insufficient context or missing disambiguation data. Context-independent methods focus on the common popularity of the entity candidates. Four different popularity measures have been identified:

• main label matching

• label ranking

• incoming links

• anchor-link ranking

The first two are purely entity-based popularity measures. The latter two additionally consider the term to which the entity candidates are assigned. The four different popularity measures are described in the following paragraphs.

main label matching The main label of an entity is considered to be the most relevant and most representative label of an entity. Additionally, in community-driven datasets, such as Wikipedia, articles of popular entities have usually been created first, and in that case the main label of the article exactly matches the supposed main label of the entity. If an article for an entity with the same name is created later, the article's name receives the main label and an addition: the title of an article with the same label as an already existing article is extended by a disambiguating term in brackets. For example, there are several entries for Michael Jackson in Wikipedia. The article title of the most popular of them, referring to the late singer, is Michael Jackson. Another Michael Jackson who works as a writer has the article title Michael Jackson (writer). Under this assumption, the term currently under consideration and the main labels of its entity candidates are analyzed. If

the main label of a candidate matches the term exactly, the candidate receives a score of s = 1.0. This score reflects a context-independent popularity score for the most popular entity among the candidates. Label ranking, which is discussed below, applies for terms that are only similar to the main label of the candidate.

label ranking The label ranking for alternative labels described in Section 8.1.2 can be utilized for the disambiguation process in two ways:

• Only take into account entity candidates whose label score exceeds a specific threshold.

• Use the label score as an additional heuristic included in the final score of the disambiguation process.

The first approach simply disregards entity candidates whose label is too dissimilar to the term under consideration, i.e. whose label score falls below the threshold. The second approach ranks the entity candidates according to the similarity of their label and the term.

incoming page links The number of incoming page links represents another entity-based popularity measure. The more entities point to an entity, the more popular this entity is considered to be. For example, in DBpedia 3.8.0, 4982 entities point to dbp:Michael_Jackson, but dbp:Michael_Jackson_(writer) features only 56 incoming links.

anchor-link ranking The preparation of the anchor-link ranking has been described in Section 8.1.2. This ranking can be used to find the most popular entity candidate for the respective term. For example, dbp:Michael_Jackson receives a score of s = 0.93 for the term Michael Jackson as anchor text, while dbp:Michael_Jackson_(writer) only receives a score of s = 0.0093.

8.2.7 Final Score and Weights for Analysis Methods

The total disambiguation score for every entity candidate is calculated from the scores the candidate achieves for the separate analysis methods. The simplest way to achieve a disambiguation score within the range [0...1] is to calculate an averaged score from all single scores. But, as already mentioned, the analysis methods might be of different relevance for the total disambiguation score with respect to their indication of the correct entity for the term within the present context. Therefore, the assignment of a weight to the single scores might accentuate the more informative analysis methods. The identification of proper weights for the single scores can be performed in different ways. One possibility is the application of a machine learning algorithm. A classifier is trained according to which score weighting achieves the best recall and/or precision on a provided benchmark dataset (see Section 10.3 for a description of entity disambiguation benchmarks). Another possibility is an empirical study. Different score weights are applied and the weight assignment achieving the best recall and/or precision is chosen. In this case, however, the score weights might be different for different benchmarks. A dataset with texts containing the most popular entities probably needs higher weights for the popularity-based analysis methods. Section 10.7.4 describes an empirical study on the determination of the weights for the scores derived from the presented analysis methods.
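A weighted combination of the individual scores could look like the following sketch. The method names and weight values are placeholders; the actual weights are determined empirically in Section 10.7.4.

```python
def total_score(candidate_scores, weights):
    """Weighted average of the scores from the individual analysis methods,
    kept within [0, 1] by dividing by the sum of the weights."""
    total_weight = sum(weights.values())
    return sum(weights[m] * candidate_scores.get(m, 0.0) for m in weights) / total_weight

weights = {"cooccurrence": 0.3, "link_graph": 0.3, "coreference": 0.1, "popularity": 0.3}
scores  = {"cooccurrence": 1.0, "link_graph": 0.5, "coreference": 0.0, "popularity": 0.8}
print(round(total_score(scores, weights), 2))   # 0.69
```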

9 SEMANTIC ANALYSIS USING THE CONTEXT MODEL

This chapter links the previously defined heterogeneous context and the novel context model for the purpose of semantic analysis of video metadata. The previously introduced context model (see Chapter 7) enables a ranking of context items. Thereby, a more confident disambiguation process within heterogeneous contexts is pursued. The semantic analysis process under consideration of the context model is described in Section 9.1. The ranking of context items allows for a sequential disambiguation of items from the most confident to the least confident context item, returning evaluated entity candidates which are more or less relevant for the present context. Less relevant entities are added to a negative context. The dynamic creation of a negative context and the subsequent application of the negative context in the disambiguation process is introduced in Section 9.2. Section 9.3 presents the complete semantic analysis algorithm in an overview. The chapter concludes with a discussion of complexity evaluations of the individual analysis steps in Section 9.4.

9.1 semantic analysis of ranked context items

The context model introduced in Chapter 7 is used to conduct semantic analysis of video metadata for the annotation of textual information with semantic entities. In general, the semantic analysis of video metadata consists of three main steps:

• derivation of context items from video metadata and definition of contextual descriptions

• calculation of the confidence values for context items and sorting of the list of context items according to their confidence values

• disambiguation of all context items using dynamically created contexts in order of decreasing confidence values

Context items are created for all natural language terms found in the textual information of the video's metadata (see Section 7.1 for different text types and Section 8.2.1 for the recognition of entities). The determination of the contextual description and the final calculation of the confidence value of a context item are described in Section 7.1. All context items are then ordered with respect to their confidence values. The disambiguation of the context items is processed in sequence from the highest to the lowest confidence value, because this


allows for a more accurate semantic analysis of context items (see Hypothesis 9.1.1). For the disambiguation process, a context is required. Taking into account the contextual description and the context items' confidence values, the context for each disambiguation is created dynamically. The method is described in detail in Section 9.1.2. First, some examples of context items and their characteristics are described in Section 9.1.1.

9.1.1 Context Item Characteristics

Before the disambiguation process can begin, the collected context items must possess the following characteristics and features:

• a natural language term extracted from the input text under consideration for annotation

• a list of entity candidates assigned to the term according to the underlying knowledge base

• a contextual description (see Section 7.1)

• a confidence value calculated from the contextual description

• a reference to the structural segment or a timestamp for the input text within the video

To demonstrate the context item characteristics, the example from Section 7.2, depicted in Figure 11, is resumed. The characteristics of the context items are listed in Table 10. The field Contextual Description lists the components of an item's contextual description in the following order: text type, source, list of sources for source diversity, assigned ontology class, number of tokens. The field Reference refers to the video segment the context item is assigned to and corresponds to the relevance view on a context item presented in Section 7.3.2. A value of zero in this field indicates that the context item belongs to the authoritative metadata and is assigned as contextual information to the entire video. After a context item has been disambiguated using the approach presented in Section 8.2 and the dynamically created context, the list of entity candidates is replaced by the entity determined to be the most fitting entity for the given context. The resulting entity features a disambiguation score1. This disambiguation score has a range of [0.0 ... 1.0] and represents a reliability value of the disambiguation process (see Section 8.2.7). The higher this score, the higher the reliability of a correct disambiguation. If a previously disambiguated context item is added to influence the disambiguation of another item, the assigned entity is used as a reference point for the context creation.

1 The score the entity candidate achieved during the disambiguation process (see Section 8.2.7).
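The characteristics listed above can be bundled in a simple data structure, as sketched below with the first context item from Table 10. The field names are illustrative and do not correspond to the data model of the actual framework.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ContextItem:
    term: str                          # natural language term from the input text
    candidates: List[str]              # entity candidates found in the dictionary
    contextual_description: dict       # text type, source, source diversity, class, tokens
    confidence: float                  # derived from the contextual description
    reference: int                     # segment reference / timestamp (0 = authoritative)
    entity: Optional[str] = None       # set after disambiguation
    disambiguation_score: float = 0.0  # reliability of the chosen entity

item = ContextItem(term="Apple",
                   candidates=["dbp:Apple_Inc.", "dbp:Apple"],
                   contextual_description={"text_type": "Keyword", "source": "Authoritative",
                                           "sources": ["auth."], "class": "Organization",
                                           "tokens": 1},
                   confidence=0.652,
                   reference=0)
```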

Table 10: Characteristics of sample context items.

ID  Term               Number Of Entity Candidates  Contextual Description                                     Confidence  Reference
1   Apple              15                           Keyword, Authoritative, (auth.), Organization, one token   0.652       0
2   Operating systems  3                            Text, Authoritative, (auth., OCR), /, two tokens           0.558       0
3   operating systems  3                            Text, OCR, (auth., OCR), /, two tokens                     0.448       655739
4   Lion               110                          Text, Authoritative, (auth.), /, one token                 0.392       0
5   Jaguar             40                           Text, Authoritative, (auth.), /, one token                 0.392       0
6   computer           148                          Text, OCR, (OCR), /, one token                             0.282       655739

Otherwise, the entire list of all entity candidates of the context item is used for the disambiguation process. Non-ambiguous context items initially contain only one entity candidate featuring a disambiguation score of 1.0. Subject to these conditions, the following hypothesis is advanced:

Hypothesis 9.1.1. The disambiguation results of context items are improved if context items with higher confidence values are disambiguated first.

The dynamic creation of the context for the disambiguation of a context item is described in the next section.

9.1.2 Dynamic Context Creation

The context of a context item determines the appropriate meaning of the ambiguous textual information of the item, resulting in a single entity. The more specific the context, the higher the probability of an accurate interpretation. Typically, documents of any type are structured according to content-related segments. The more segments are contextually aggregated, the more general the contextual information is considered to be. Ambiguous textual information is difficult to disambiguate using a general context, because a general context probably contains more heterogeneous information. Thus, the document should be split up into fragments of coherent content to enable the creation of more accurate contexts. Considering this presumption, the following hypothesis is put forward:

Hypothesis 9.1.2. The context of context items within a document should be restricted to segments of coherent content.

Hypotheses 9.1.1 and 9.1.2 have been confirmed by several evaluations. An annotated video metadata benchmark has been applied for the presented semantic analysis process. The results and the discussion of the results are presented in Section 10.7.5. Following Hypotheses 9.1.1 and 9.1.2, the context for the disambiguation of each context item is created dynamically. Thus, the list of context items pertaining to a context and applied for disambiguation is assembled for every context item separately. If a context item is added to a context, three criteria are applied:

• reference to the video segment,

• confidence value, and

• disambiguation score.

Obviously, the disambiguation score is only available for context items that have been previously disambiguated.

Only context items of the same segment and with a defined minimum confidence value are added to the context and thereby influence the disambiguation process. Authoritative metadata, such as titles or descriptions, are considered to be descriptive of the content of the entire document. The information given as authoritative metadata supports the interpretation of all time-referenced metadata. Therefore, the context items which originate from authoritative metadata are added as context for the disambiguation of each time-related context item. Accordingly, the exemplary context items listed in Table 10 build the following two contexts:

• Context items with ID 1,2,4,5 – the authoritative context items – are disambiguated only using the authoritative context items with ID 1,2,4,5 as context (excluding themselves from their own disambiguation).

• Context items with ID 3 and 6 – originating from OCR analysis – are disambiguated using the context items from the same video segment and the authoritative context items as context. Thus, the context item with ID 3 is disambiguated using context items 1,2,4,5,6 as context and the context item with ID 6 is disambiguated using context items 1,2,3,4,5 as context.

The procedure is aligned to the relevance view of a context item (see Section 7.3.2). However, a context item is only added to a context if its confidence value exceeds a certain threshold. This threshold can be set dynamically for each context item type. The threshold defines the specificity and size of the context. The lower the threshold, the more low-confidence context items are added to the context. The higher the threshold, the fewer context items are added to the context. For the determination of the threshold, an exhaustive set of test runs has been performed. Section 10.7.5 presents and discusses the results of the test runs. The same procedure applies to the disambiguation score of a previously disambiguated context item. For the dynamic context creation, the score must exceed a defined threshold. This threshold is also discussed in Section 10.7.5. By using thresholds for confidence values and disambiguation scores, the disambiguation process achieves high precision without simultaneously decreasing the recall. Using the dynamically created context, each context item is disambiguated. As a result, the context item's list of entity candidates becomes a list ordered according to the disambiguation score achieved by each individual candidate. Finally, the disambiguation process concludes with the decision for an interpretation, and thus the decision for an entity candidate. This decision can be performed in two different ways:

• Determination of all entity candidates whose disambiguation scores exceed a specific threshold, or

• Determination of only the entity candidate with the highest achieved disambiguation score.

Typically, the latter case is applied, because a unique annotation is thereby achieved. The application of a threshold usually increases the recall, but the precision decreases. If more than one entity candidate is used for the annotation, the correct entity might be included, but incorrect entities may also be chosen. The application of a specific threshold is briefly discussed in Section 10.7.7.
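The dynamic context creation described in this section can be sketched as a simple filter over the collected context items, reusing the ContextItem sketch shown earlier in Section 9.1.1. The threshold values are placeholders; Section 10.7.5 determines suitable values empirically.

```python
def build_context(item, all_items, confidence_threshold=0.4, score_threshold=0.5):
    """Assemble the context for one context item: authoritative items (reference 0)
    always qualify, other items must belong to the same segment; every item must
    exceed the confidence threshold, and already disambiguated items must also
    exceed the disambiguation-score threshold."""
    context = []
    for other in all_items:
        if other is item:
            continue                                    # an item never disambiguates itself
        if other.reference not in (0, item.reference):
            continue                                    # different segment and not authoritative
        if other.confidence < confidence_threshold:
            continue
        if other.entity is not None and other.disambiguation_score < score_threshold:
            continue
        context.append(other)
    return context
```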

9.2 negative context creation

The disambiguation process introduced so far has considered only positive contexts. Thus, the scoring of the entity candidates is an additive process. Positive hints entail an increase in the disambiguation score. As discussed in Section 6.4.2, a negative context can be applied for the disambiguation process, entailing a reduction of the disambiguation score for entity candidates if they are linked to the negative context. The negative context can be achieved by the application of either manually created black lists or dynamically extended black lists. The manual creation of black lists is briefly discussed in Section 9.2.1. The dynamic creation of negative context is enabled by the application of the previously introduced context model. This approach is presented in Section 9.2.2.

9.2.1 Context Refinement Through Blacklists

In the most simple way, negative context facts can be derived from black lists. These black lists can be composed of instances of ontology classes, categories or topics that should not be taken into account for entity mapping. Let us assume for example that entities in a text dealing with Space Shuttles are to be disambiguated. In such a case, entities which belong to the class Punk Rock Bands can be excluded from the disambiguation process. Therefore, all entities for Punk Rock Bands or the entire class can be put on a black list. Such black lists are created manually and can only be applied if the potential content of the documents to be disambiguated (or at least their general topic) is known beforehand. Unfortunately, this is a requirement which usually cannot be fulfilled when processing random web documents. In a scenario such as this, the entire knowledge base must be taken into account to map entities to semantic entities. Therefore, an approach which constructs a negative context successively during disambiguation, based on previously successfully disambiguated entities, has been developed. This approach is described in detail in the next section.

9.2.2 Dynamic Creation of Negative Context

Dynamic creation of a negative context is based on the assumption that a negative context is built up gradually, using excluded and eliminated entity candidates from the current disambiguation process [108]. Entities that have been an entity candidate for a natural language term in the context without achieving a sufficient score in the disambiguation process can be considered irrelevant within the current context. These rejected entities are successively added to the current negative context and serve as a basis to generate negative topics, i.e. topics that can be disregarded for the current context. Categories of hierarchical taxonomy systems such as Wikipedia categories2, YAGO [110] or Umbel [9] aggregate entities that belong to similar topics. Under this assumption, previously rejected entity candidates are utilized to successively build up the current negative context. However, this approach requires a high confidence regarding the quality of the disambiguation process. If a term is disambiguated incorrectly, potentially "wrong" entities might also be added to the negative context. This will bias both the negative and the positive contexts. Therefore, all terms within a context, i.e. context items, must be ordered according to the probability that they can be disambiguated correctly. Only then can disambiguated entities with high confidence values be considered for context creation. The introduced context model enables the ranking of context items within a context. This in turn enables the disambiguation process to operate in sequence from the most confident to the least confident context item. This approach is based on the presumption that entity candidates which achieve a disambiguation score of s_total = 0.0, i.e. for which no relationship within the context could be detected, are considered to be irrelevant within the current context. Going forward, these discarded entity candidates will be considered as part of the negative context. These entity candidates are added to the negative context3 successively and applied for further disambiguation processes. The negative context can be considered a set of topics that are (most likely) not relevant for the present context. Therefore, the negative context can comprise not only individuals, but also more abstract or generic categories. These categories can simply be derived from the rdf:type or dc:subject information assigned to the negative (individual) entities. For every category assigned to a negative entity, it must be evaluated whether it is also assigned to a positive entity, i.e. an entity previously confirmed for the current context. If the category is not related to the positive context, it is added as a negative category to the negative context.

2 http://en.wikipedia.org/wiki/Wikipedia:FAQ/Categorization
3 Subsequently, these entities are referred to as negative entities, while semantic entities that are part of the positive context will be referred to as positive entities.

An overview of the creation of the negative context is depicted in Figure 17. In the figure, the terms term1, term2, term3 and term4 belong to the same context and are ordered according to the derived confidence value. The disambiguation starts with term1, which features three entity candidates. The algorithm assigns the highest score to uri1_1; uri1_3 receives a score = 0.0. Therefore, uri1_3 is added to the negative context. uri1_3 is assigned to two categories: cat2 and cat3. cat2 is also assigned to uri1_2, which achieved a score higher than 0.0 and is therefore considered to be relevant within the context. cat2 is therefore also considered to be relevant for the context and only cat3 is added to the negative context. The same procedure is applied for the disambiguation of term2. The positive and negative context after the disambiguation of term1 and term2 is depicted at the bottom of Figure 17. For this approach, DBpedia entities are used as semantic entities. Several classification hierarchies that are assigned to DBpedia entities are available, such as Wikipedia categories, YAGO or Umbel. Due to its good maintenance and cycle-free hierarchy, the YAGO categories are applied for the proposed approach. Let us consider a disambiguation process on the authoritative metadata (title and publisher information) of the example introduced in Section 7.2. As presented in Table 6, Apple obtains the highest confidence value and will be disambiguated first, followed by Operating systems, Lion, and Jaguar. In the subsequent disambiguation process, the entity Apple Inc.4 obtains the highest score compared to the other entity candidates for the term Apple. The categories for this entity are added to the positive categories of the current context. The entities Apple (band)5 and The Apples (Israeli)6 obtained a total score of s_total = 0.0 during the disambiguation. Therefore, they are added to the negative context. None of the assigned categories of these two negative entities is linked to the positive context entity Apple Inc. Therefore, the categories of these discarded entities are also added to the negative categories for the current context. The context following the disambiguation of the term Apple is depicted in Table 11. With every disambiguated term, the negative context and the sets of positive and negative categories grow and can be applied for subsequent disambiguation processes. In addition to the heuristics introduced in Section 8.2, an analysis method has been developed that makes use of this negative context information; its integration into the disambiguation process is described in the next section.

4 http://dbpedia.org/resource/Apple_Inc.
5 http://dbpedia.org/resource/Apple_(band)
6 http://dbpedia.org/resource/The_Apples_(Israeli)

Figure 17: Creation of a negative context in the disambiguation process.

Table 11: Context for the example sentence after disambiguation of the term "Apple".

Positive Entities      Apple Inc. for term Apple
Positive Terms         Operating systems (1 candidate), Lion (88 candidates), Jaguar (58 candidates)
Positive Categories    yago:Company, yago:CompaniesEstablishedIn1976, yago:SteveJobs, yago:NetworkingHardwareCompanies
Negative Categories    yago:EnglishRockMusicGroups, yago:MusicalGroupsEstablishedIn1968, yago:1960sMusicGroups

9.2.3 Calculation of the Negative Score

Entity candidates are first checked for whether the current negative context already contains the entity. If so, the entity candidate might not be considered for the further disambiguation of the context item under consideration. If the candidate is not part of the current negative context, the categories assigned to the entity are retrieved. The resulting set of categories and the set of negative categories in the negative context are examined for their intersection. The negative score assigned to an entity depends on the size of this intersection and on how specific or general the categories of the intersection are. The specificity of a category can be derived from the category's tree depth within the classification hierarchy. The higher the tree depth, the more specific the category [52]. More general categories usually contain a higher number of entities, and the relevance of such a category for the entities to be disambiguated is lower than for more specific categories containing only a few entities. Thereby, a category can be weighted taking into account its significance for the entity candidate. The weight is calculated from the logarithm of the tree depth, proportional to the logarithm of the maximum tree depth within the considered classification system7. The logarithm is applied because it is assumed that the degree of abstraction is not distributed linearly along the path from the root node to a leaf node. The degree of abstraction decreases very close to the root node. The application of the logarithm for the calculation of the weight of a category reflects this assumption. Thereby, a weight for the category regarding the entity candidate is achieved within the range [0.0...1.0].

7 For the YAGO classification system, a maximum tree depth of 18 has been calculated.

The weight of a category cat_i with tree depth t_i is thus calculated as

$$\mathrm{weight}_{cat_i} = \frac{\log(t_i)}{\log(t_{max})}$$

Figure 18: Intersection of negative and entity candidate categories and influence of tree depth.

For the total negative score, the weights of all categories in the intersection are first summed and then divided by the size of the intersection, where I denotes the intersection of the categories in the current negative context and the categories assigned to the entity candidate:

$$s_{negative} = \frac{\sum_{i=1}^{n}\frac{\log(t_i)}{\log(t_{max})}}{|I|} \qquad (20)$$

Here, I contains the categories c_i, c_i ∈ I, for 1 ≤ i ≤ n; t_i is the tree depth assigned to category c_i, and t_max denotes the maximum tree depth of the applied category hierarchy.
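A direct transcription of Equation 20 might look as follows. The helper names and the example tree depths are illustrative only; t_max = 18 follows the value reported for YAGO in footnote 7, and tree depths are assumed to be at least 1.

```python
import math


def negative_score(candidate_categories, negative_categories, tree_depth, t_max=18):
    """Equation 20: average log-depth weight over the intersection of the
    candidate's categories and the negative categories of the context."""
    intersection = set(candidate_categories) & set(negative_categories)
    if not intersection:
        return 0.0
    weights = [math.log(tree_depth[c]) / math.log(t_max) for c in intersection]
    return sum(weights) / len(intersection)


if __name__ == "__main__":
    # Hypothetical depths for two negative categories of the Apple example.
    depths = {"yago:1960sMusicGroups": 9, "yago:MusicalGroupsEstablishedIn1968": 11}
    print(negative_score({"yago:1960sMusicGroups", "yago:Company"},
                         set(depths), depths))  # ~0.76 for one shared category
```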

A sketch of the approach is depicted in Figure 18.

total score The total score for an entity candidate, taking into account the negative context, can be calculated in two different ways. Either the total positive score previously achieved by an entity candidate is set to zero as soon as the negative score is unequal to zero, or the total positive score is reduced by the total negative score, resulting in a final score within the interval [-1.0...1.0]. For the latter case, the negative score can also be weighted to decrease or increase the influence of the negative context on the total score of an entity candidate. Both calculation approaches have been evaluated. The achieved results and the discussion of the results are explained in Section 10.7.7.

final decision After the scoring process, the entity candidate that has achieved the highest total score is typically chosen as the prospectively correct disambiguation for the term of the context item under consideration (see Section 9.1.2). Sometimes, none of the entity candidates has achieved a positive score, i.e. all entity candidates hold

a total score s_total = 0.0, or multiple candidates have achieved the same disambiguation score. There may be several reasons for this:

• Multiple candidates that hold the same disambiguation score (≠ 0.0) are relevant within the given context.

• All candidates hold a disambiguation score of 0.0 and are irrelevant within the given context.

• The context or the entity candidates provide too little information, so that the context-based analysis cannot decide appropriately.

In these cases, an additional context-independent heuristic might be applied. Entity-based popularity measures, such as incoming links or anchor-link ranking, can identify the most popular entity of all candidates. In this way, the recall is often boosted. However, it is important to choose the most popular entity only from the set of entity candidates that have not achieved a negative score with respect to the negative context.
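The tie-breaking rule can be sketched as follows. The popularity function is a placeholder for any of the context-independent measures mentioned above (e.g. incoming page links); the data layout is an assumption for illustration. The essential point is only that the fallback is restricted to candidates that were not penalised by the negative context.

```python
def final_decision(candidates, popularity):
    """candidates: list of (uri, total_score, negative_score) tuples.
    popularity: dict mapping uri -> context-independent popularity value."""
    best_score = max(score for _, score, _ in candidates)
    top = [c for c in candidates if c[1] == best_score]

    # Unique positive winner: take it directly.
    if best_score > 0.0 and len(top) == 1:
        return top[0][0]

    # Otherwise fall back to popularity, but only among candidates that
    # did not receive a negative score from the negative context.
    eligible = [uri for uri, _, neg in candidates if neg == 0.0]
    if not eligible:
        return None
    return max(eligible, key=lambda uri: popularity.get(uri, 0))


if __name__ == "__main__":
    cands = [("dbr:Apple_Inc.", 0.0, 0.0), ("dbr:Apple_(band)", 0.0, 0.4)]
    print(final_decision(cands, {"dbr:Apple_Inc.": 12000, "dbr:Apple_(band)": 300}))
```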

9.3 algorithm overview

The proposed method of semantic analysis of video metadata is assembled from a number of individual processes. The complete workflow is depicted in Figure 19. The main processes of the entire workflow are marked in the figure:

1. All metadata for a video are collected.

2. For all metadata items, named entities and key terms are identified.

3. For all identified entities within the metadata item: entity candidates are looked up in the knowledge base, confidence values are derived for the determined contextual descriptions, and a context item is created that contains all the previously gathered information.

4. All context items are sorted according to their confidence values. The subsequent analysis of the context items is performed in descending order of the assigned confidence values.

5. The context for the disambiguation process is created dynamically for each context item separately.

6. Disambiguation analyses are performed and the disambiguation scores are identified for all entity candidates of a context item.

Figure 19: Complete workflow of the proposed method of semantic analysis of video metadata.

7. The contextually most relevant entity is identified by means of the highest disambiguation score. This entity candidate is assigned to the context item as an annotated entity for the context item’s textual term.

8. All context items and their annotated entities are stored in a triple store or database. (A condensed sketch of these eight steps is given below.)
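The following sketch condenses the eight steps into one function. All helper callables and the naive whitespace term detection are placeholders standing in for the components of Chapters 7–9 (the context model, the confidence derivation and the disambiguation analyses); they are not part of the actual implementation.

```python
def analyze_metadata(metadata_items, dictionary, confidence_of, disambiguate):
    """Condensed sketch of steps 1-8 (Figure 19). `dictionary` maps terms to
    candidate entity URIs; `confidence_of` and `disambiguate` stand in for the
    context model and the scoring analyses (illustrative assumptions)."""
    # Steps 2-3: build one context item per detected term.
    context_items = []
    for source, text in metadata_items:                      # step 1: collected metadata
        for term in text.split():                            # naive term detection
            candidates = dictionary.get(term.lower(), [])
            if candidates:
                context_items.append({
                    "term": term, "source": source,
                    "candidates": candidates,
                    "confidence": confidence_of(source, term, candidates),
                })

    # Step 4: process items in descending order of confidence.
    context_items.sort(key=lambda ci: ci["confidence"], reverse=True)

    annotations = []
    for i, item in enumerate(context_items):
        # Step 5: the context is built dynamically; here simply from the
        # already-processed (higher-confidence) items.
        context = context_items[:i]
        # Steps 6-7: score the candidates and keep the best one.
        scores = disambiguate(item, context)
        item["entity"] = max(scores, key=scores.get)
        annotations.append((item["term"], item["entity"]))
    return annotations                                        # step 8: ready to store


if __name__ == "__main__":
    dictionary = {"apple": ["dbr:Apple_Inc.", "dbr:Apple"],
                  "lion": ["dbr:Mac_OS_X_Lion", "dbr:Lion"]}
    result = analyze_metadata(
        [("title", "Apple Lion")], dictionary,
        confidence_of=lambda s, t, c: 1.0 / len(c),
        disambiguate=lambda item, ctx: {uri: 1.0 for uri in item["candidates"]},
    )
    print(result)
```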

The following section gives a brief overview of the computational complexity of the complete algorithm.

9.4 complexity examination

The complexity of the proposed semantic analysis algorithm is briefly discussed in this section. Complexity estimations are given for each of the method’s individual steps. In general, the complexity of time and the complexity of space must be considered. The main factor for the complexity estimation is the size of the underlying knowledge base |K|. Its size can be estimated by the number of contained entities n including their incoming and outgoing links, which results in n² ≙ |K|.

9.4.1 Complexity of Space

Generally speaking, the analysis process holds all information of a context in a cache to process the metadata. At most, the context might contain all entities contained in the underlying knowledge base. The context information is then used throughout the entire process. The Link Graph Analysis requires information about incoming and outgoing links of all entities in the knowledge base. The knowledge base contains a maximum of n · n links. The complexity of space of the presented algorithm is therefore determined as O(n²). The implemented sorting algorithm (Step 4 in the algorithm overview in the previous section) uses merge sort and needs additional space to sort the context items which pertain to a context. In the worst case, the context contains n context items. The complexity of space of the sorting algorithm is therefore determined as O(n · log(n)) in the worst case and as O(n) in the best case.

9.4.2 Complexity of Time

The complexity of the algorithm with respect to time is estimated for all individual steps depicted in Figure 19. The assumption for all separate steps is that a context consists of a context item for each entity contained in the knowledge base. Therefore, the worst case size of a context is determined as n.

step 1: collection of metadata At maximum, metadata for each entity contained in the knowledge base are collected, which results in n metadata items. Therefore, this step takes O(n) time.

step 2: identification of entities Similar to Step 1, only n different entities can be identified in the metadata, resulting in n context items. The identification of entities takes O(n) time.

step 3: creation of context items For each context item identified in the previous step, entity candidates must be looked up in the dictionary. The dictionary contains at most n entities and k labels for each entity. The labels of the dictionary are stored in a b-tree index structure. The look-up in such a structure for k · n entries, with k ∈ N, costs O(log(n)) time for one context item. Therefore, the candidate look-up for n context items takes O(n · log(n)) time. For the calculation of the confidence values, all context items are processed once, which takes O(n) time.

step 4: sorting of the context items Sorting n context items using a merge sort algorithm costs O(n · log(n)).

step 5: creation of context For each context item, the context is created separately. All context items are checked for various characteristics before being added to or rejected from a context for the disambiguation of the target context item. Therefore, the creation of the contexts for n context items costs O(n²) time.

step 6: application of the disambiguation Each of the n context items can feature at most n entity candidates. All candidates must be checked in the following analyses:

• For the co-occurrence analysis, comparisons with at most n − 1 terms of the context items in the context are required, because the context consists of at most n − 1 items. The co-occurrence analysis costs O(n² · (n − 1)) = O(n³ − n²) time.

• For the link graph analysis for direct links, all n entity candidates of all n context items must be compared with the n entity candidates of n − 1 context items. The link graph analysis for direct links takes O(n² · (n² − n)) = O(n⁴ − n³) time.

• For the link graph analysis for links via a node (path length w = 2), all n entity candidates of all n context items must be compared with the n entity candidates of n − 1 context items, and the n entity candidates can be linked with at most n other entities. The complexity of time is therefore calculated as O(n · n · (n − 1) · n · n) = O(n⁵ − n⁴).

• For the coreference analysis, all n entity candidates of all n context items must be compared with the n entity candidates of n − 1 context items. This step takes O(n⁴ − n³) time.

step 7: identification of contextually most relevant entity candidate All n entity candidates of n context items must be checked for the achieved score. The complexity of time is calculated as O(n²).

step 8: storage of context items Finally, the context items are stored and annotated with the contextually most relevant entity candidate which takes O(n) time.

The presented approach can be considered a “greedy” analysis process. The complexity mainly depends on the size of the underlying knowledge base. The more entities under consideration in the knowledge base, the more complex the algorithm becomes with respect to time and space.

Part III

EVALUATION AND APPLICATIONS

10 EVALUATION

This chapter outlines extensive evaluations which were performed to survey the quality of the annotations provided by the presented semantic analysis method. Evaluation objectives and requirements are introduced in Section 10.1. The applied evaluation measures are described in Section 10.2. For the actual evaluation of the presented algorithms and context model, several benchmarks and entity dictionaries have been surveyed. Results are presented in Sections 10.3 through 10.6. The chapter first introduces five different benchmark datasets for the purpose of the evaluation of semantic analysis engines as well as four dictionaries for the entity look-up (see Section 8.1.1 for descriptions of different approaches). The benchmarks are described in Section 10.3 and furthermore analyzed for structure and type information of the annotated entities (see Section 10.4). Section 10.5 introduces four different dictionaries. Three of them are freely available, while the fourth has been created according to the approach described in Section 8.1.1’s Redirects and Disambiguation Links. Extensive statistics have been calculated using the four dictionaries on the five benchmarks. The statistics and the discussion of the results are presented in Section 10.6. For simplification, the proposed disambiguation process on video metadata applying the context model is subsequently referred to as conTagger. The evaluation of the context model and the developed analyses for disambiguation of textual information are presented in Section 10.7. Section 10.8 briefly summarizes all evaluation results presented in this chapter. The discussion in Section 10.9 concludes the chapter.

10.1 evaluation objectives and requirements

Evaluation of algorithms or models can focus on either the performance with respect to the quality of the achieved results, or the consumption of resources and runtime. The evaluations described in the following sections solely concentrate on the quality of the results. For the comparison of several systems of the same task, two different approaches might be considered. On the one hand, results of automatic systems can be compared to results provided by human annotators. For this approach, a gold standard is required, but the creation of an independent and objectively annotated benchmark is time-consuming. On the other hand, the results of different approaches can be compared on the basis of the maximum agreement of all


approaches on the same input data. This approach does not require a benchmark, but the quality of the results can only be compared according to a common baseline of the applied extraction methods. Gangemi utilized the latter approach to compare several tools [36]. For the evaluation of the proposed algorithms, the first approach is applied. Therefore, the quality of the results of semantic annotation services depends on several elements:

• the structure and type of the applied benchmark

• the dictionary for look-up of entity candidates for textual terms

• the algorithm for detection of named entities in continuous text

• the disambiguation algorithm

The following sections present evaluations according to these four elements.

evaluation measures The evaluation results are presented using the common evaluation measures of recall, precision, F1-measure and accuracy. The measures are briefly described in Section 10.2. Additionally, the measure margin is introduced. This measure has been applied to evaluate the quality of the results utilizing the negative context.

benchmarks and dictionaries Several benchmarks and dictionaries have been evaluated. Evaluations on the benchmarks and dictionaries are described in Sections 10.3 and 10.5. The actual evaluations of the algorithms and the context model have been applied to the benchmarks most relevant for the respective purpose and utilizing the best fitting dictionary.

evaluation of algorithms The tag processing method presented in Section 8.2.1 has been evaluated using a specific benchmark consisting of one hundred contexts containing user-generated tags and their corresponding semantic annotations. The benchmark is described in Section 10.3.2 and the evaluation results are presented in Section 10.7.1. The algorithm for the detection of named entities in continuous text (see Section 8.2.1) has been evaluated using two of the five presented benchmarks. Results are described and discussed in Section 10.7.2. The disambiguation process presented in Section 8.2 has been evaluated against DBpedia Spotlight using the benchmark the developers of Spotlight introduced in [75] to evaluate their approach. The benchmark is described in Section 10.3.3 and the evaluation results are presented in Section 10.7.3.

The semantic analysis process introduced in Sections 8.2 and 9.1 using the context description model (see Section 7) has been evaluated using two of the presented benchmarks and against five other approaches. The evaluation results are described in Section 10.7.5. The results of the evaluation of the influence of a negative context on a disambiguation process are presented in Section 10.7.7. The approach has been evaluated on two different datasets. Section 10.8 summarizes the statistics and evaluations performed to assess the presented semantic analysis process. Section 10.9 discusses the achieved evaluation results.

10.2 evaluation measures

For the evaluation of the introduced approaches, several measures are applied. In Information Retrieval, recall and precision constitute established measures to present the results of an algorithm and compare them to other approaches. Both measures have been applied for the presented evaluations and are briefly described in Sections 10.2.1 and 10.2.2. The measure of accuracy is described in Section 10.2.4. Additionally, the margin of evaluation results has been investigated for the application of negative context. Margin as an evaluation measure is described in Section 10.2.5.

10.2.1 Recall

Recall denotes the proportion of relevant resources that are retrieved out of all relevant resources. It is calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

TP (true positives) denotes the number of relevant resources within the list of all retrieved resources. FN (false negatives) denotes the number of relevant resources that have not been retrieved by the algorithm.

10.2.2 Precision

Precision denotes the proportion of relevant resources among all retrieved resources. It is calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Again, TP denotes the number of relevant resources within the list of all retrieved resources. FP (false positives) denotes the number of retrieved resources that are not relevant. 118 evaluation

Figure 20: Margin of disambiguation scores of entity candidates after disambiguation.

10.2.3 F1-Measure

The F1-measure (also F1-score) constitutes the harmonic mean of precision and recall. It is considered to be the evenly weighted average of precision and recall. The F1-score is calculated in the following way:

$$F_1 = 2 \cdot \frac{\mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$$

10.2.4 Accuracy

The accuracy of an algorithm takes into account the correct absence of a result – the true negatives (TN). It is calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

In addition to recall and precision, the accuracy speaks to the quality of the actual annotated results. For the presented evaluation, this measure is applied to evaluate the quality of the annotations with respect to the position and length of the annotated segment in the source text.

10.2.5 Margin

Semantic analysis algorithms usually return a scored list of entity candidates for an ambiguous term. Thus, every entity candidate possesses a score received during the disambiguation process. The distance between the highest and the second highest achieved score within the scores of all entity candidates denotes the margin (see Figure 20). Thus, the margin denotes the level of reliability of the disambiguation algorithm. If the scores are very close, the uncertainty of the disambiguation is revealed. The entity with the highest score might be the best fit within the context, but the entity candidate possessing the second highest score might also be strongly related to the context.

Therefore, the higher the margin between the scores, the higher the reliability of the disambiguation process.
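For reference, the measures of Sections 10.2.1 to 10.2.5 can be computed as follows. The function names are illustrative and not part of any evaluation framework used in the thesis.

```python
def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def f1(tp, fp, fn):
    r, p = recall(tp, fn), precision(tp, fp)
    return 2 * r * p / (r + p) if r + p else 0.0


def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def margin(candidate_scores):
    """Distance between the highest and second-highest disambiguation score."""
    ranked = sorted(candidate_scores, reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]


if __name__ == "__main__":
    print(round(f1(tp=80, fp=20, fn=40), 3))          # 0.727
    print(round(margin([0.82, 0.79, 0.10]), 2))       # 0.03 -> low reliability
```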

10.3 benchmarks

In the following section, several benchmarks for the purpose of semantic analysis and NED evaluation are presented. Due to the specific research field of heterogenous contexts and the absence of a semantically annotated dataset for this purpose, a dataset containing textual information of different characteristics and different provenance has been created. This dataset is described in Section 10.3.1. To also apply the presented approach on an independent dataset, the dataset originally published for the evaluation of DBpedia Spotlight has been applied. This dataset is described in Section 10.3.3. The tag processing approach described in Section 8.2.1’s Tag processing has been evaluated using a specific dataset containing only tag groups. This dataset is described in Section 10.3.2. Additionally, two freely available benchmark datasets are introduced in Sections 10.3.4 and 10.3.5.

10.3.1 Video Metadata Benchmark

The evaluation of conTagger requires a dataset consisting of different types of video metadata including the correct entities assigned to all available textual information. As far as the author knows, no dataset of that structure and for that purpose exists. Therefore, a new dataset containing annotated video metadata has been created in order to evaluate the presented approach. The dataset has been annotated by four different domain experts. The evaluation dataset consists of metadata from five videos. The videos are live recordings of TED1 conference talks covering the topics of physics, biology, psychology, sociology and history science. The videos have been chosen for their diverse subject matter and because they contain a sufficient amount of displayed text for information extraction by OCR. All videos contain spoken text for information extraction by ASR. The metadata for each video consist of authoritative metadata (including title, speaker, providing organization, subject, keywords, descriptive text and a Wikipedia text corresponding to the speaker), user-generated tags and automatically extracted text from OCR and ASR. The videos have been partitioned into content-related video segments via automatic scene cut detection (see Section 2.4.1). The time-related metadata (tags, ASR and OCR) are assigned to the respective video segments. Overall, the dataset consists of 822 metadata items,

1 http://www.ted.com

where an item can be a key term or continuous text consisting of up to almost 1000 words2.

10.3.2 Tag Benchmark

The tag benchmark has been created for the evaluation of the tag processing approach (see Section 8.2.1). The data set consists of one hundred sample contexts. Fifty contexts are formed from tags that have been tagged at the same timestamp in a video. The other fifty sample contexts have been tagged in the same segment of a video. Each sample context provides a list of three to 12 single tags. Within a context, the combination of tags in every order might represent an entity, because the underlying tagging system does not allow for tags to consist of multiple words. Overall, the dataset contains 505 annotated tags or groups of tags resulting in 428 distinct entities.

10.3.3 Spotlight Benchmark

Mendes et al. published a dataset for the evaluation of the semantic annotation tool DBpedia Spotlight [75]. This test set consists of ten New York Times text paragraphs and a ground truth containing 249 extracted DBpedia entities.

10.3.4 KORE 50 Benchmark

KORE 50 [51] is a subset of the larger AIDA corpus [50], which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture difficult-to-disambiguate mentions of entities and contains 50 sentences from different domains, such as music, celebrities and business. It is provided in a clear TSV format.

10.3.5 Wikilinks Benchmark

The Wikilinks Corpus [100] was introduced by Google in March 2013. The corpus collects hyperlinks to Wikipedia articles gathered from over three million web sites. It has been transformed to RDF format using the NLP Interchange Format (NIF) by Hellmann et al. [47]. The corpus is divided into 68 RDF dump files, of which the first one has been used for the dictionary statistics (see Section 10.5). This benchmark cannot be considered as a gold standard. In some cases, mentions are linked to broken URLs, redirects or semantically wrong entities. This issue is discussed further in Section 10.5.

2 For downloading the dataset and the ground truth, please see the readme file at http://tinyurl.com/cztyayu

10.4 benchmark statistics

The five benchmark datasets under consideration cover different domains; though three datasets originate from authentic corpora, varying portions have been selected and different types of entities have been annotated. The statistics for the three benchmarks of Sections 10.3.3, 10.3.4 and 10.3.5 have been calculated by Steinmetz et al. (see [109]). They are shown in Table 12. Additionally, the benchmark statistics for the video dataset and the tag dataset are shown in Table 13.

The DBpedia Spotlight evaluation dataset contains 60 natural language sentences from ten different documents with 249 annotated DBpedia entities, wherein about 10% of the annotated entities are locations. About 80% of the annotated entities are not associated with type information based on the DBpedia ontology (except for owl:Thing).

The KORE 50 dataset contains 144 annotations which mostly refer to agents (74 times dbo:Person and 28 times dbo:Organisation). Only a relatively small amount (18.5%) of annotated entities do not provide any type information in DBpedia. The context for the annotated entities in the KORE 50 dataset is limited to (relatively) short sentences.

The Wikilinks dataset covers almost every possible domain. Its sheer size allows the extraction of sub-benchmarks for specific designated domains; for example, there are about 281,000 mentions of 8,594 different diseases [109]. For each annotation of the Wikilinks dataset, the original website is named, which allows recovery of the full document contexts for the annotations, though they are not contained in the NIF resource so far.

The video metadata benchmark and the tag dataset represent benchmarks for the special purpose of semantic video analysis. The video metadata benchmark contains metadata of four different source types: authoritative, tags, OCR and ASR. As shown in Table 13, places and persons are mainly represented in OCR data and tags. However, only a small number of the annotated entities are typed differently from owl:Thing across the annotations of all source types (between 14% and 18%). Overall, only the KORE 50 benchmark provides more typed than untyped annotations. For the video metadata dataset, tag dataset, Spotlight benchmark and Wikilinks benchmark, between 66% and 87% of the annotations are untyped. For the KORE 50 benchmark, the inverse ratio applies.

For the evaluation of the semantic analysis approach under consideration of the context model, the video metadata benchmark has been applied, because it is the only benchmark providing metadata which originates from multiple sources. The Spotlight benchmark has been used to show results of the presented approach on an independent benchmark.

Table 12: Distribution of DBpedia types in Benchmark Datasets KORE 50, Wikilinks and Spotlight (see [109] for a more detailed view).

Class                        Spotlight   KORE 50   Wikilinks
Total                        249         130       2,228,049
Untyped                      79.9%       18.5%     66.5%
Activity                     <1%         –         <1%
- Sport                      <1%         –         <1%
Agent                        2.4%        66.9%     18.9%
- Organisation               <1%         18.5%     5.3%
- - Company                  <1%         9.2%      1.8%
- - SportsTeam               –           7.7%      <1%
- - - SoccerClub             –           7.7%      <1%
- Person                     2.0%        48.5%     13.6%
- - Artist                   –           17.7%     3.4%
- - - MusicalArtist          –           17.7%     1.8%
- - Athlete                  –           6.9%      1.2%
- - - SoccerPlayer           –           5.4%      <1%
- - Officeholder             <1%         4.6%      1.1%
Colour                       1.6%        –         <1%
Disease                      1.6%        –         <1%
EthnicGroup                  1.2%        –         <1%
Event                        1.2%        –         1.0%
Place                        10.4%       10.8%     9.6%
- ArchitecturalStructure     2.0%        3.1%      1.8%
- - Infrastructure           1.6%        <1%       <1%
- PopulatedPlace             7.2%        5.4%      5.1%
- - Country                  3.6%        –         <1%
- - Region                   <1%         –         <1%
- - Settlement               2.4%        3.8%      3.8%
- - - City                   1.6%        2.3%      <1%
Work                         <1%         6.2%      6.9%
- Film                       –           –         1.9%
- MusicalWork                <1%         3.1%      1.2%
- - Album                    <1%         3.1%      <1%
Year                         <1%         –         <1%

The tag benchmark has been used to evaluate the tag processing method. The KORE 50 benchmark is a very specific benchmark, because it contains mainly first names of persons and has been created to evaluate AIDA. The Wikilinks benchmark cannot be considered as ground truth, because many of the annotations are incorrect. Therefore, these two datasets have not been used for the evaluation of the presented approaches.

Table 13: Distribution of DBpedia types in Video Metadata Benchmark

                               Video Dataset                               Tag Dataset
Class                          Authoritative  Tags     ASR      OCR
Total                          221            263      784      107        428
Untyped                        81.9%          76.81%   87.24%   75.7%      83.6%
Activity                       –              –        <1%      –          –
Agent                          6.79%          9.88%    4.47%    10.28%     8.70%
- Organisation                 3.17%          2.66%    1.28%    3.74%      <1%
- - Legislature                –              –        <1%      –          –
- - MilitaryUnit               –              <1%      <1%      <1%        –
- - Company                    1.81%          <1%      <1%      –          <1%
- - Band                       –              <1%      <1%      –          <1%
- - EducationalInstitution     1.36%          <1%      <1%      2.80%      <1%
- Person                       3.62%          7.22%    3.19%    6.54%      8.64%
- - Journalist                 –              –        <1%      –          –
- - FictionalCharacter         –              <1%      <1%      –          <1%
- - Scientist                  3.62%          3.04%    1.40%    4.67%      2.8%
- - OfficeHolder               –              <1%      <1%      –          <1%
- - Philosopher                –              1.14%    <1%      <1%        1.4%
- - Artist                     –              1.52%    <1%      1.87%      <1%
AnatomicalStructure            <1%            1.52%    1.15%    –          <1%
Colour                         –              1.52%    <1%      –          –
Currency                       –              –        <1%      –          –
Disease                        <1%            <1%      1.28%    <1%        <1%
Place                          2.71%          3.42%    2.17%    8.41%      1.40%
- HistoricPlace                –              <1%      –        <1%        –
- NaturalPlace                 <1%            <1%      <1%      –          –
- - BodyOfWater                <1%            <1%      <1%      –          –
- - - Lake                     <1%            <1%      <1%      –          –
- PopulatedPlace               2.26%          2.66%    1.91%    7.48%      1.4%
- - Region                     –              –        <1%      1.87%      <1%
- - - AdministrativeRegion     –              –        <1%      1.87%      <1%
- - Island                     –              –        <1%      –          <1%
- - Country                    1.36%          1.52%    1.02%    1.87%      <1%
- - Continent                  <1%            <1%      <1%      <1%        –
- - Settlement                 <1%            <1%      <1%      2.80%      –
Species                        4.07%          4.18%    2.30%    3.74%      <1%
- Eukaryote                    3.17%          3.80%    2.04%    2.80%      <1%
- - Plant                      –              <1%      –        –          –
- - Animal                     3.17%          3.42%    2.04%    2.80%      <1%
Work                           2.26%          <1%      <1%      –          <1%

10.5 dictionaries

Dictionaries contain associations that map strings (surface forms) to entities represented by Wikipedia articles or DBpedia concepts. Typically, dictionaries are applied by semantic analysis systems in an early step to find candidates for terms in natural language texts. In a further (disambiguation) step, the actual correct entity must be selected from all candidates.

10.5.1 Spotlight Dictionary

The DBpedia Lexicalizations dataset [75] has been extracted from Wikipedia interwiki links. It contains anchor texts, the so-called surface forms, along with their respective destination articles. Overall, there are two million entries in the DBpedia Lexicalizations dataset. For each combination, the conditional probabilities P(uri|surfaceform)3 and P(surfaceform|uri), and the pointwise mutual information value (PMI) are given. P(uri|surfaceform) represents the probability that for a given surfaceform as anchor text, an uri (which represents an entity) is linked. P(surfaceform|uri) represents the probability that for a given uri a surfaceform is used as anchor text. PMI relates the observed joint co-occurrence of uri and surfaceform to the co-occurrence expected under the assumption that both are independent from each other. Subsequently, this dictionary is referred to as SPL.
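The probabilities in the Lexicalizations dataset can be reproduced from raw (surface form, uri) anchor occurrences roughly as follows. The counting scheme is a simplified reading of [75] for illustration, not the actual extraction code, and all names are assumptions.

```python
import math
from collections import Counter


def lexicalization_statistics(anchor_pairs):
    """anchor_pairs: iterable of (surface_form, uri) link occurrences."""
    anchor_pairs = list(anchor_pairs)
    pair_counts = Counter(anchor_pairs)
    sf_counts = Counter(sf for sf, _ in anchor_pairs)
    uri_counts = Counter(uri for _, uri in anchor_pairs)
    total = sum(pair_counts.values())

    stats = {}
    for (sf, uri), n in pair_counts.items():
        p_uri_given_sf = n / sf_counts[sf]       # P(uri | surface form)
        p_sf_given_uri = n / uri_counts[uri]     # P(surface form | uri)
        # Pointwise mutual information of the joint occurrence.
        pmi = math.log((n / total) /
                       ((sf_counts[sf] / total) * (uri_counts[uri] / total)))
        stats[(sf, uri)] = (p_uri_given_sf, p_sf_given_uri, pmi)
    return stats


if __name__ == "__main__":
    links = [("apple", "dbr:Apple_Inc.")] * 7 + [("apple", "dbr:Apple")] * 3
    for key, value in lexicalization_statistics(links).items():
        print(key, tuple(round(v, 3) for v in value))
```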

10.5.2 Google Cross-Wiki Dictionary

Google has released a similar but much larger dataset: Crosswiki [103]. The Crosswiki dictionary has been built at web scale and includes 378 million entries. This dictionary is subsequently referred to as GCW. Similar to the SPL dataset, the probability P(uri|surfaceform) has been calculated and is available in the dictionary. This probability is used for the experiments described in Section 10.6.

10.5.3 AIDA Means Dictionary

The AIDA Means dictionary is an extended version of the YAGO24 means relation. The YAGO2 means relation is harvested from disambiguation pages, redirects and links in Wikipedia [126]. Unfortunately, there is no information given as to what exactly the extension includes. The AIDA Means dictionary contains approximately 18 million entries. Subsequently, this dictionary is referred to as AIDA.

3 The measure is used later for the experiments as Anchor-Link-Probability (see Section 10.6). 4 http://www.yago-knowledge.org/

10.5.4 DBpedia-Based Dictionary

In addition to the three already existing dictionaries described above, a dictionary according to the approach described in Section 8.1.1’s Redirects and Disambiguation Links has been created. Initially, except for the elimination of bracket terms (e. g. the label Berlin (2009 film) is converted to Berlin by removing the brackets and the term within them), no further preprocessing has been performed on this dictionary. Thus, all labels are presented with original case sensitivity. Further evaluation of this issue is provided in Section 10.9. This dictionary is subsequently referred to as RDM.
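The only preprocessing mentioned above, the removal of trailing bracket terms from labels, can be expressed in a single substitution. The regular expression below is an assumption about how such a rule might look, not the exact one used for RDM.

```python
import re


def strip_bracket_term(label: str) -> str:
    """Remove a trailing parenthesised qualifier, e.g. 'Berlin (2009 film)' -> 'Berlin'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", label)


if __name__ == "__main__":
    print(strip_bracket_term("Berlin (2009 film)"))  # Berlin
    print(strip_bracket_term("Apple Inc."))          # unchanged
```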

10.6 dictionary statistics

The benchmarks described in Section 10.3 are constructed to evaluate semantic analysis algorithms. The evaluation results of an approach are not only dependent on the actual algorithm used to disambiguate ambiguous mentions but also on the structure of the benchmark and the underlying dictionary utilized to determine entity candidates for a mention. A mention mapping or mapped mention refers to a mention of a benchmark that is assigned to one or more entity candidates of the used dictionary. The experiments described in the following section have been conducted to identify several characteristics of the introduced dictionaries as well as to consolidate assumptions about the structure of the benchmarks. For performance reasons, only a subset of the Wikilinks benchmark has been used in the experiments. A dump file containing 494,512 annotations and 192,008 distinct mentions and assigned entities has been used for the subset.

10.6.1 Experiments

mapping coverage First, the coverage of mention mappings is calculated. All annotated entity mentions from the benchmarks are looked up in the four different dictionaries. If at least one entity candidate for the mention is found in the dictionary, a counter is incremented. This measure is an indicator of the expressiveness and versatility of the dictionary.

entity candidate count For all mapped mentions, the number of entity candidates found in the respective dictionary is added up. The number of entity candidates corresponds to the level of ambiguity of the mention and can be considered an indicator for the level of difficulty of the subsequent disambiguation process.

maximum recall For all mapped mentions, the list of entity candidates is checked for whether the annotated entity (from the benchmark) is included. A correct disambiguation is only possible if it is contained in the list. Thus, this measure predicts the maximum possible recall using the respective dictionary on the benchmark.
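The three dictionary statistics described so far (mapping coverage, entity candidate count and maximum recall) can be computed with a few lines. The data structures are assumptions for illustration: the benchmark is taken as a list of (mention, gold entity) pairs and the dictionary as a mapping from surface forms to candidate lists.

```python
def dictionary_statistics(benchmark, dictionary, case_sensitive=True):
    """benchmark: list of (mention, gold_uri); dictionary: surface form -> [candidate uris]."""
    if not case_sensitive:
        dictionary = {sf.lower(): cands for sf, cands in dictionary.items()}

    mapped = candidate_total = recall_hits = 0
    for mention, gold_uri in benchmark:
        key = mention if case_sensitive else mention.lower()
        candidates = dictionary.get(key, [])
        if candidates:
            mapped += 1
            candidate_total += len(candidates)
            if gold_uri in candidates:
                recall_hits += 1   # the correct entity is reachable at all
    n = len(benchmark)
    return {
        "mapping coverage": mapped / n,
        "avg candidates per mapped mention": candidate_total / mapped if mapped else 0.0,
        "maximum recall": recall_hits / n,
    }


if __name__ == "__main__":
    bench = [("Apple", "dbr:Apple_Inc."), ("internet", "dbr:Internet")]
    rdm = {"Apple": ["dbr:Apple_Inc.", "dbr:Apple"], "Internet": ["dbr:Internet"]}
    print(dictionary_statistics(bench, rdm))                        # misses 'internet'
    print(dictionary_statistics(bench, rdm, case_sensitive=False))  # finds both
```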

recall and precision achieved by popularity For WSD, once entity candidates are determined for the mentions, a subsequent disambiguation process tries to detect the most relevant entity of all candidates according to the given context. For this experiment, the disambiguation process is simplified: the most popular entity among the available candidates is chosen as correct disambiguation. To determine the popularity of the entity candidates, three different measures are considered:

• Incoming Page Links of entity candidates

• Anchor-Link probability within web document corpus

• Anchor-Link probability within Wikipedia corpus

The first measure is a simple entity-based popularity measure. The popularity is defined according to the number of incoming Wikipedia page links: the more links point to an entity, the more popular the entity. The Anchor-Link probability defines the probability of a linked entity for a given anchor text. Thus, the more often a mention is used to link to the same entity, the higher the Anchor-Link probability. This probability has been calculated on two different corpora: for the SPL dictionary it is based on the Wikipedia article corpus, and for the GCW dataset it is based on all web documents (see Section 10.5). The results of this experiment can be considered as an indicator for the degree of difficulty of the applied benchmark in terms of WSD. By simply using a popularity measure, a high recall and precision indicates a less difficult benchmark dataset. If a benchmark contains less popular entities, the disambiguation process can be considered more difficult.
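The simplified popularity-based disambiguation used in this experiment amounts to selecting the candidate with the highest popularity value. The snippet below sketches this for the two kinds of measures; the data structures (page link counts, anchor-link counts) are illustrative assumptions.

```python
def most_popular_by_page_links(candidates, page_links):
    """Entity-based popularity: number of incoming Wikipedia page links."""
    return max(candidates, key=lambda uri: page_links.get(uri, 0))


def most_popular_by_anchor_link(mention, candidates, anchor_counts):
    """Anchor-Link probability: how often `mention` links to each candidate,
    normalised over all links with that anchor text."""
    total = sum(anchor_counts.get((mention, uri), 0) for uri in candidates)
    if total == 0:
        return None
    return max(candidates,
               key=lambda uri: anchor_counts.get((mention, uri), 0) / total)


if __name__ == "__main__":
    cands = ["dbr:Jaguar", "dbr:Jaguar_Cars", "dbr:Mac_OS_X_10.2"]
    print(most_popular_by_page_links(cands, {"dbr:Jaguar": 4200, "dbr:Jaguar_Cars": 3900}))
    print(most_popular_by_anchor_link("Jaguar", cands,
                                      {("Jaguar", "dbr:Jaguar_Cars"): 510,
                                       ("Jaguar", "dbr:Jaguar"): 480}))
```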

10.6.2 Discussion of Experiment Results

For every experiment, a results table is given. The tables show the results for the four different dictionaries, represented by the columns, on the five different benchmarks, represented by the rows. For easy comparison, the number of dictionary entries and the number of distinct mentions for all benchmarks are provided along with their annotated entities. For all results, the total numbers as well as proportionally averaged values are given. This facilitates the comparison of benchmarks and dictionaries that are significantly different in size and number of annotations.

The experiments mapping coverage, entity candidate count, maximum recall and recall and precision based on page link popularity have also been performed using case-insensitive mentions and labels in the four different dictionaries. For comparison, these results are presented in the same tables of the respective experiments as the results of the case-sensitive experiments. Recall and precision based on Anchor-Link-Probability have not been calculated, as the probabilities for case-insensitive anchors are not available within the SPL and GCW datasets.

mapping coverage

• GCW achieves the highest coverage (between 94.67% and 100%) due to its employment of the largest dictionary containing 378 million entries and its construction method of anchor texts and linked Wikipedia articles in web documents.

• RDM performs the worst with only 25.19% on the Spotlight benchmark due to the lack of preprocessing; all labels are given with initial capital letters which is not common in the English language except for persons, places and organizations.

• Coverage for RDM increased by 69% (to 94%) when mentions in the Spotlight benchmark are looked up in the dictionary as case-insensitive entries. Also, for the Wikilinks benchmark, the coverage using the RDM dictionary increased by 16% to 76%. The increase of coverage is significant for the video dataset (throughout metadata of all sources) and the tag dataset – it amounts to up to 95% for metadata of user-generated tags within the video dataset. For the tag dataset, the mapping coverage is increased by 80% to 96%. This also reflects the specific characteristic of tags being mostly case-insensitive.

• The RDM dictionary consists of mainly case-sensitive labels (as no preprocessing has been performed). Persons, organizations and places are written with an initial capital letter in English language texts. Mentions of entities of those types are found in a case-sensitive dictionary such as RDM. In contrast, mentions of entities that are not of the types person, organization or place, such as internet, are not found in this dictionary. If a benchmark contains mainly mentions of entities of the types person, organization or place, the RDM dictionary achieves a high mapping coverage, as is the case for the KORE 50 benchmark. Case-insensitive selection must increase the coverage, especially if the benchmark contains entity mentions that are not of the types person, organization or place. This assumption is consolidated by the increased mapping coverage for the Spotlight and Wikilinks benchmarks and the type information of the mentioned entities in the benchmarks presented in Table 12.

Table 14: Coverage of mapped mentions – total count and percentage.

• Overall, the dictionaries perform very well, or even best, on the benchmarks that have been constructed for the evaluation of their respective applications: SPL – Spotlight, AIDA – KORE 50, and GCW – Wikilinks.

• Although the best coverage is achieved by GCW, RDM is close behind. Due to the size of GCW of 378 million entries, the performance of an algorithm suffers from queries on this dictionary with respect to time complexity (see Section 9.4). RDM only contains 2.6% of the entries of GCW. Therefore, look-ups on this dictionary are less time-consuming, resulting in only a small loss of recall. The overall results are depicted in Table 14.

entity candidate count

• The KORE 50 benchmark is intended to contain mentions that are difficult to disambiguate. Overall, all dictionaries achieve the highest entity count for this benchmark, which entails high ambiguity.

• For the Wikilinks benchmark, all dictionaries achieve low entity candidate counts which shows that real world annotations do not seem to be difficult to disambiguate.

• The AIDA dictionary assigns the most entity candidates on the KORE 50 benchmark, as this dictionary is constructed for evaluation on that benchmark and is supposedly enriched by labels especially for that purpose.

• KORE 50 contains many persons that are mentioned by their first name only. This results in a large number of entity candidates.

• The Wikilinks benchmark is annotated very sparsely and only those entities assumed to be “important” are linked.

• GCW achieves an extremely high number of entity candidates for the tags of the video dataset. This results from the fact that the term wikipedia occurs as an annotated term in the dataset as a user-generated tag and GCW lists 2,126,672 entity candidates for this term.

Section 10.6.3 presents more results on the level of ambiguity of the different dictionaries under consideration. Overall results are shown in Table 15.

Table 15: Amount of entity candidates for all mapped mentions – overall and averaged per mapped mention.

Table 16: Maximum achievable recall – coverage of annotated entities (in the benchmark) for mentions contained in the list of candidates.

maximum recall

• SPL and RDM do not contain all first names of persons as required for the benchmark KORE 50. Thus, the maximum recall decreases compared to mapping coverage.

• AIDA performs poorly on the Spotlight benchmark due to the structure of this dictionary. It contains a large number of persons’ first names. Apparently, the dictionary does not reflect labels for entities in manually annotated texts.

• For RDM, the maximum recall increases by 10% and 63%, respectively, for the two benchmarks Wikilinks and Spotlight if mentions are looked up case-insensitively. This is a reflection of the structure of the benchmarks and the increased coverage of mapped mentions.

• For the Wikilinks benchmark, the maximum possible recall is low compared to the other two benchmarks. This results from the fact that this benchmark cannot be considered as a gold standard (see Section 10.3). If a mention is annotated with a wrong entity, there is a high probability that this entity is not contained in the list of entity candidates.

• GCW and RDM achieve the highest possible recall on the video dataset and tag dataset in comparison to SPL and AIDA. Correspondingly, the decrease in the maximum recall compared to mapping coverage is lowest for RDM and GCW (between 6% and 14%) in comparison to SPL and AIDA. AIDA achieves a mapping coverage of approximately 93% on the video dataset across metadata of all sources. However, the maximum recall drops to under 35% for metadata of authoritative sources, user-generated tags and the OCR source, and as low as 21% for metadata of ASR sources.

Overall results are shown in Table 16.

recall and precision achieved by popularity – incoming wikipedia page links of entity candidates

• Notably, GCW performs poorly on all benchmarks compared to the maximum possible recall due to a high entity candidate count. Apparently, entity candidate lists often contain more popular but incorrect entities.

• In the KORE 50 benchmark, due to many annotated first names, entity candidate lists contain numerous prospective entities. Apparently, the correct candidate is often not the most popular one compared to the other candidates. This explains the poor performance of all dictionaries on the KORE 50 using page link popularity.

• Compared to the maximum possible recall (of all dictionaries) on the KORE 50, the achieved recall is very low using a popularity measure in a simplified disambiguation process.

This confirms the intention of the benchmark to contain mentions that are hard to disambiguate.

• For the case-sensitive experiment, RDM achieves a relatively low recall and high precision for all benchmarks, especially for the video dataset and the tag dataset. Precision and recall for the other three dictionaries are mostly balanced on a low level. For the case-insensitive experiment RDM also shows a balanced ratio of recall and precision on a mid-ranged level.

Case-sensitive results of all datasets are shown in Table 28; case-insensitive results are shown in Table 29 in the appendix.

recall and precision achieved by popularity – anchor-link probability in web document corpus (google popularity)

• In general, this popularity measure based on mentions and mapped entities performs better than popularity based only on the entities’ incoming Wikipedia page links.

• The recall of the GCW dictionary is increased by between 13% and 55%. The increases of the recall for the RDM and AIDA dictionaries are not significant compared to page link popularity.

• As for the case-sensitive experiment using Wikipedia page links as indicators of popularity, RDM achieves low recall and high precision over all benchmarks. As previously stated, this results from the low mapping coverage for this dictionary.

• GCW achieves a relatively balanced ratio of recall and precision on a mid-ranged to high level over all benchmarks.

recall and precision achieved by popularity – anchor-link probability in wikipedia corpus

• For the Spotlight and Wikilinks benchmarks, this popularity measure achieves higher recall and precision than the popularity measure provided by the GCW dictionary. It is likely that this results from the fact that the Wikipedia corpus is composed by experienced authors and linked texts are considered to be reliable.

• Again, RDM achieves low recall and high precision over all benchmarks. This results from the fact that the mapping has been performed case-sensitively.

• Although the probability for this experiment originates from the SPL dictionary, the differences between GCW and SPL are not remarkable. Both dictionaries provide their own probabilities,

and the probabilities perform best when applying the respective dictionary. However, as seen in Table 30, the differences in terms of recall and precision are not exceptional.

Overall results are shown in Table 31 in Appendix A.

general findings

• For a simplified disambiguation process, the Anchor-Link popularity performs better than popularity based on Wikipedia page links. Anchor-Link popularity calculated on the Wikipedia corpus performs better than the measure calculated by Google on the web document corpus.

• Dictionaries perform best on the benchmark constructed for the evaluation of the dictionaries’ applications.

• Compared to the maximum possible recall (of all dictionaries) on the KORE 50 benchmark, the achieved recall is very low using a popularity measure as a simplified disambiguation process. This confirms the intention of the benchmark to contain mentions that are difficult to disambiguate.

• SPL performs very well over all benchmarks, especially using its popularity measure. Taking into account its size (2.2 million entries) compared to the GCW dictionary (378 million entries), this is a surprising discovery. However, the case-insensitive experiments show even better results for RDM in terms of mapping coverage and maximum possible recall.

• The SPL popularity measure has been calculated based on the linked Wikipedia articles within the Wikipedia article corpus. Most of the Wikipedia articles have been composed by experienced authors who know how to write and distribute links within the corpus. This could be an explanation for why the Wikipedia-based Anchor-Link probability performs better than the popularity based on web documents.

• The best mapping coverage is achieved by GCW, with RDM close behind. Look-ups in GCW are very time-consuming due to the size of the dictionary. Look-ups in RDM are less time-consuming, which results in only a small loss of recall. Also, on average, the amount of entity candidates is 3.6% for RDM in comparison to GCW. A high number of entity candidates results in a very time-consuming disambiguation process in the case of the application of the link graph analysis as described in Section 8.2.4. Therefore, RDM constitutes a good trade-off between time consumption for the overall algorithm, mapping coverage and achievable recall.

Figure 21: Distribution of the amount of tokens for labels contained in the four dictionaries.

10.6.3 Ambiguity of Dictionaries

Interesting insights about the dictionaries come from the list of most ambiguous terms within the dictionaries. Table 17 shows the top ten ambiguous terms for every dictionary. The lists support the previously mentioned assumptions about the dictionaries:

• AIDA contains a large number of personal first names. As shown, the labels are case-sensitive as well as provided in upper case. It is unclear why they are not also given in lower case.

• The anchors gathered on websites for the GCW dictionary mostly refer to Wikipedia articles, but do not use the article label or a synonym. Nine of the top ten most ambiguous anchors contain the word {W|w}ikipedia.

• Whereas RDM and SPL contain very ambiguous personal family names, the level of ambiguity differs extremely between the two dictionaries.

Table 17: Top 10 most ambiguous terms in the four different dictionaries.

10.6.4 Number of Tokens of Containing Labels

The dictionaries have been analyzed for the number of tokens of the contained labels. This analysis has been performed to identify the most common number of tokens. Section 8.2.1 introduced a tag preprocessing method for the combination of tags to form tag groups. The maximum group size of tags influences the complexity of the algorithm: the larger the group size, the more combinations must be calculated.

As shown in Figure 21, SPL and GCW mostly contain labels consisting of one token (42% and 32%, respectively), followed by labels consisting of two tokens (37% and 16%, respectively). AIDA and RDM mostly consist of labels possessing two tokens (33% and 36%), closely followed by labels consisting of one token (32% and 30%). Overall, the percentage of labels possessing more than three tokens decreases significantly for all four dictionaries. Labels consisting of three or fewer tokens amount to 83% for AIDA and RDM and as much as 91% for SPL. For GCW, these labels amount to 58%. These results substantiate the decision for a maximum group size of three for the tag combination algorithm. The label containing the most tokens consists of 164 tokens in AIDA and 264 tokens in GCW, which must be a bug resulting from the automatic extraction of the labels. The label with the highest number of tokens consists of 64 tokens in RDM and 28 tokens in SPL.

Additionally, the curves shown in Figure 21 reveal the similar characters of the dictionaries. The curves of RDM and AIDA resemble one another, as both dictionaries have been (at least partly) created from redirect and disambiguation labels of DBpedia. The curves of GCW and SPL also resemble each other, as both dictionaries have been created from anchor texts in web documents.

Basically, SPL reflects the structure of mentions of semantically annotated documents as required for a semantic search. Therefore, the distribution of the number of tokens over all labels is applied for the ranking of context items with respect to the number of tokens of their textual terms (see Section 7.1.2’s Number of Tokens).
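The token-count distribution underlying Figure 21 can be obtained by a simple count over the label strings. Whitespace tokenisation is an assumption here, as the exact tokeniser is not stated.

```python
from collections import Counter


def token_length_distribution(labels):
    """Return the share of labels per token count (whitespace tokenisation assumed)."""
    counts = Counter(len(label.split()) for label in labels)
    total = sum(counts.values())
    return {n: round(c / total, 3) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    sample = ["Apple", "Apple Inc.", "Mac OS X Lion", "Jaguar", "Operating system"]
    print(token_length_distribution(sample))  # {1: 0.4, 2: 0.4, 4: 0.2}
```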

10.6.5 Discussion

The most important measures for the decision about a dictionary are the mapping coverage and the maximum achievable recall, because the dictionary should contain all important terms mentioned in the benchmark, and the entity candidates should include the correct one for the respective benchmark. GCW achieves the highest results over all benchmarks for both measures, but it also yields a high entity candidate count, which means that the entity look-up takes more time due to the dictionary’s size and that many candidates have to be checked during the disambiguation process. RDM provides similarly high coverage and maximum possible recall, but at the same time provides a reasonable amount of entity candidates. Therefore, RDM in its case-insensitive form has been applied for the following evaluations.

10.7 evaluation results

This section presents the evaluation results of the proposed algorithms. The tag processing approach introduced in Section 8.2.1 has been examined separately, as it is a very specific task, using the dataset introduced in Section 10.3.2. The disambiguation approach presented in Section 8.2 has been evaluated using the Spotlight benchmark (see Section 10.3.3), as this benchmark has been utilized for the analysis of DBpedia Spotlight, which makes the comparison between that approach and the presented approach straightforward. The results are presented in Section 10.7.3.

The context model and the semantic analysis process using the context model, as well as the negative context approach, have been evaluated on the video metadata benchmark (see Section 10.3.1). The results are presented in Sections 10.7.5 and 10.7.7. Although the context model has been developed for the specific purpose of the semantic analysis of video metadata, the model can be applied to simple continuous texts. Therefore, the approach has been evaluated on an independent dataset. The results are presented in Section 10.7.8. For all evaluations, the RDM dictionary has been applied (see Section 10.5.4). As described in Section 10.6.2, this dictionary constitutes a good trade-off between time complexity for the overall algorithm, mapping coverage and maximum possible recall.

10.7.1 Evaluation of Tag Processing

The tag processing and disambiguation approach presented in Section 8.2.1's Tag processing has been evaluated on the tag benchmark described in Section 10.3.2 and against the DBpedia Spotlight annotation service. This analysis step has been evaluated separately, because user-generated tags require special attention for the recognition of named entities and important terms within a group of tags. As shown in Table 18, the tag processing and disambiguation approach clearly outperforms annotation approaches customized for simple natural language texts. For both datasets, the presented approach to tag combination achieves higher recall and precision than Spotlight. Consequently, the presented approach is used in the semantic analysis process for the recognition of named entities and important terms in user-generated tags.

10.7.2 Evaluation of the Detection of Named Entities in Continuous Text

As previously stated in Section 10.1, the results of a semantic analysis process depend on the algorithm that detects the named entities within a continuous text. If the algorithm extracts inappropriate terms, the disambiguation algorithm has to work with incomplete or erroneous information, resulting in poor results.

Table 18: Comparison of the presented tag processing and disambiguation approach against DBpedia Spotlight for tag benchmarks

                      Segment Contexts                   Timestamp Contexts
                      (256 tags, 300 entities assigned)  (315 tags, 485 entities assigned)
                      Recall  Precision  F1              Recall  Precision  F1
Presented Approach    78.0    64.0       69.0            81.0    41.0       54.0
Spotlight             39.0    34.0       36.0            42.0    39.0       40.0

The detection algorithm presented in Section 8.2.1 takes into account five different types of textual terms detected in a text (see Table 9):

• simple nouns

• proper nouns

• combined (proper) nouns

• adjectives followed by nouns

• (proper) nouns connected by a preposition

The POS tagger determines the type of each token in a continuous text. Errors of the POS tagger are propagated into the detection process, and thereby textual items might be incorrectly identified as prospective named entities or important terms. For this evaluation, the annotations of the benchmarks have been compared to the annotations of the algorithm's results, taking into account not the actual annotated entity, but rather the position and length of the annotation. Recall, precision, F1-measure and accuracy have been calculated for comparison. The true positives (TP) constitute the characters in the text that have been identified as part of a named entity by both the annotation of the benchmark and the detection algorithm. The false positives (FP) constitute the characters that have been identified by the detection algorithm, but are not annotated in the benchmark. The false negatives (FN) constitute the characters annotated in the text, but not detected by the algorithm as part of a prospective named entity. And the true negatives (TN) constitute the characters that have not been annotated in the benchmark and also have not been annotated by the algorithm. Results are shown in Table 19. The results show that annotations made by human annotators differ from the actually detected prospective named entities. The comparatively low precision is a result of the algorithm's over-annotation: a human annotator annotates fewer entities in a continuous text than those detected by the algorithm.

Table 19: Comparison of annotations in benchmarks and of prospective named entities detected by proposed algorithm.

                      Recall  Precision  F1-measure  Accuracy
Spotlight Benchmark   72.0    55.0       62.5        79.0
Video Benchmark       80.5    48.5       60.5        80.0

The text in the following sentence is underlined where a human annotator would mark an entity5:

Last year, I told you the story, in seven minutes, of Project Orion.

The next sentence shows the detection result of an algorithm, taking into account the word types listed in Table 9:

Last year, I told you the story, in seven minutes, of Project Orion.

These examples demonstrate the low precision of the algorithm's annotation result. The recall losses might be explained by errors of the POS tagger as well as by combinations of word types that differ from those annotated in the benchmark. The accuracies for both benchmarks are high, which reflects the overall quality of the algorithm's annotated text. The presented semantic analysis strives to provide annotated documents for a semantic search. Therefore, important key terms are annotated in addition to named entities. These important key terms support the exploration of (video) documents beyond a search based solely on named entities. Therefore, the low precision in comparison to a human annotator is acceptable for the application of the presented algorithm.
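The character-based comparison described above can be expressed compactly as follows; this is an illustrative sketch assuming that annotations are available as (start, end) character offsets, not the evaluation code used for Table 19.

```python
def char_level_measures(text_length, benchmark_spans, detected_spans):
    """Character-level recall, precision, F1 and accuracy for annotation spans.

    Spans are (start, end) character offsets with an exclusive end index.
    """
    benchmark_chars = set()
    for start, end in benchmark_spans:
        benchmark_chars.update(range(start, end))
    detected_chars = set()
    for start, end in detected_spans:
        detected_chars.update(range(start, end))

    tp = len(benchmark_chars & detected_chars)   # annotated by both
    fp = len(detected_chars - benchmark_chars)   # only detected by the algorithm
    fn = len(benchmark_chars - detected_chars)   # only annotated in the benchmark
    tn = text_length - len(benchmark_chars | detected_chars)

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / text_length
    return recall, precision, f1, accuracy
```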

10.7.3 Evaluation of the Disambiguation Approach

Section 10.7.1 provided results for the evaluation of the disambiguation approach introduced in Section 8.2, in combination with the tag processing method. However, the approach has also been evaluated as an individual process on continuous text, using the Spotlight benchmark (see Section 10.3.3) against DBpedia Spotlight. As presented in [75], DBpedia Spotlight achieves an F1-measure of 45.2% when no further configuration is applied to the annotation service. The presented disambiguation approach uses continuous text as input. The context model and the ordering of context items have not been applied for this evaluation. The approach achieves a recall of 53% and a precision of 40%, resulting in an F1-measure of 45.5%. Thus, it is shown that the disambiguation algorithm – as a separate algorithm – is competitive with the state-of-the-art annotation service DBpedia Spotlight and even performs slightly better.

5 The example is taken from the video benchmark.

As shown in Tables 30 and 31, a maximum recall of 75% and a precision of 85% are achieved by the simple disambiguation process using probability scores. These results are generated by mapping the annotated terms of the respective benchmarks and analyzing the assigned entity candidates. Therefore, these results can only be achieved when the exact terms (as annotated in the ground truth) are detected by the applied algorithm. The difference between the achieved results of the proposed algorithm (but also of DBpedia Spotlight as a state-of-the-art annotation service) and the maximum possible results (shown in Tables 30 and 31) reveals the difficulty of the actual spotting and recognition step. The evaluation using the proposed context model on the Spotlight benchmark is described in Section 10.7.8.

10.7.4 Evaluation of Score Weights

The analysis methods of the disambiguation process presented in Sections 8.2.3 through 8.2.6 generate a score within the range [0.0...1.0]. The final score of the disambiguation process is calculated from these individual scores, where some methods might be more important for the correct context-dependent interpretation than others. Therefore, the scores might be weighted for the calculation of the final score. An empirical study has been performed on the DBpedia Spotlight dataset (see Section 10.3.3 for a description of the dataset). For this study, the co-occurrence analysis, the three different link graph analyses, the coreference analysis and the heuristic based on the matching of the main label of the candidate and the term have been utilized. The scores have been weighted with values in a range of [0.0...1.0] in increments of 0.2. This increment is considered a trade-off between the accuracy of the determined weights and the costs of their determination. A total of 46,656 runs have been performed. The study aimed to detect the weight assignments for the single scores that achieve the highest recall and precision. As a result, instead of a single combination, 801 combinations of weight assignments achieve the highest F1-measure. Therefore, a specific recommendation for assigned weights cannot be given. However, several tendencies have been identified:

• All scores achieved by link analysis methods should be weighted higher than the other scores.

• A score achieved by coreference analysis should be weighted higher than entity-based popularity measures.

• Co-occurrence is more crucial than any entity-based popularity analysis, because it is a context-dependent measure. However, it is less crucial than the coreference analysis.

Thus, a ranking according to the importance of the analysis meth- ods can be derived based on these assumptions:

1. link graph analyses

2. coreference analysis

3. co-occurrence analysis

4. main label matching heuristic

The results show that the context-dependent analyses are more important than the context-independent analyses (main label matching). Obviously, links between entities represent more important relationships than descriptive texts. This means that descriptive texts might contain much information about an entity which overlaps with information about other entities. Meanwhile, links between entities reveal semantic relationships and are a strong indicator for a disambiguation task. The coreference analysis also reveals semantic relationships within the context. If the same entity is mentioned in a context by different terms, this might be a strong indicator that this entity is the correct interpretation for both terms. As a consequence, the weights for the separate analyses have been set to concrete values following the previously introduced assumptions:

• w = 0.8 for all link graph analyses

• w = 0.6 for the coreference analysis

• w = 0.4 for the co-occurrence analysis

• w = 0.2 for the main label matching heuristic

The weights are distributed linearly according to the ranking of the analyses. The order of the weights is considered more important than the actual determination of the concrete values for the highest and the lowest weights. More precise weights might be examined in future work. The relevance of the separate analysis methods strongly depends on the underlying knowledge base. These evaluations have been performed on the basis of the DBpedia. As stated in Section 8.1.3, the DBpedia contains crucial information about the entities: descriptive texts (required for the co-occurrence analysis) as well as the semantic relationships (required for the link graph analyses). Therefore, the determined weights can only be applied to knowledge bases structured similarly to DBpedia. Future work might focus on a general determination of the score weights or a definition of rules according to the characteristics of the underlying knowledge base.
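A minimal sketch of how the weights determined above could be combined is shown below; the normalized weighted sum is an assumption for illustration and simplifies the actual score calculation of the disambiguation process described in Section 8.2.

```python
# Weights as determined above; combining them as a normalized weighted sum
# is an illustrative assumption, not the exact formula of the thesis.
SCORE_WEIGHTS = {
    "link_graph": 0.8,
    "coreference": 0.6,
    "co_occurrence": 0.4,
    "main_label": 0.2,
}

def combined_score(scores):
    """Combine the individual analysis scores (each in [0.0, 1.0]) into one value."""
    total_weight = sum(SCORE_WEIGHTS[name] for name in scores)
    weighted_sum = sum(SCORE_WEIGHTS[name] * value for name, value in scores.items())
    return weighted_sum / total_weight

# Example: a candidate with strong link-graph support but a weak label match.
print(combined_score({"link_graph": 0.9, "coreference": 0.5,
                      "co_occurrence": 0.7, "main_label": 0.0}))
```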

Table 20: Compared results of conTagger and the simple segment-based approach, including the significance measure with respect to the difference of the approaches (Θ based on the F1-measure of both approaches).

                 conTagger             Simple Approach        Θ
                 R     P     F1        R     P     F1
Authoritative    60.0  54.5  57.0      52.0  46.0  49.0       0.1553
Tags             71.0  69.5  70.0      61.0  60.0  60.5       0.0023
ASR              55.0  61.0  58.0      56.5  38.0  45.5       0.0036
OCR              56.0  24.0  34.0      44.0  17.5  25.0       0.0017
Segments         54.0  58.0  56.0      57.0  39.0  46.5       0.0050
Videos           56.0  48.0  52.0      57.0  30.0  39.5       0.0340

10.7.5 Evaluation of the Context Model

The conTagger has been evaluated on the video benchmark introduced in Section 10.3.1.

comparison to simple segment-based approach As a first step, the conTagger has been compared to a simple segment-based approach using a random order of the context items, but the same disambiguation approach as the conTagger. The results of both approaches are shown in Table 20. The results are aggregated according to different sources and different relevance views. For the different sources, recall and precision are calculated per segment and averaged over all segments of all five videos. For segments, recall and precision are calculated for every segment over all sources and averaged over all segments for all videos. The evaluation results for videos are calculated accordingly. As shown in Table 20, recall, precision and F1-measures are increased throughout all sources and also for the aggregated results for segments and videos. Most notably, the conTagger achieves significantly better results on the metadata items with lower confidence values, such as OCR and ASR results. The overall evaluation of annotated entities per segment and video confirms the significance of the results.

statistical hypothesis test Additionally, a statistical hypothesis test has been performed. The hypothesis test demonstrates that the conTagger differs from the simple segment-based approach and that the results have not been achieved simply by chance. A significance level Θ is calculated for the test. A value of Θ ≤ 0.05 is considered low enough to prove the difference of the systems [80]. Θ has been calculated using approximate randomization as follows [84]:

1. Two approaches A and B are provided with the same input data.

2. Both approaches return a set of results $O_A = \{o_A^1, ..., o_A^n\}$ and $O_B = \{o_B^1, ..., o_B^m\}$. Calculate $t(O_A, O_B)$.

3. Taking the results of both approaches, a new set is created: $Z = \{o_A^1, ..., o_A^n, o_B^1, ..., o_B^m\}$.

4. Two new sets $O_A'$ and $O_B'$ are created by sampling with replacement from set $Z$, with $|O_A'| = |O_A|$ and $|O_B'| = |O_B|$. Calculate $t(O_A', O_B')$.

5. Repeat Step 4 $R$ times. For all times where $t(O_A', O_B') > t(O_A, O_B)$, increase a counter $r$.

6. Calculate $\Theta = \frac{r+1}{R+1}$.

For this evaluation, $R = 1000$ and the function $t$ used for comparing the result sets is based on the difference of the F1-measures of both sets:

$t(O_A, O_B) = |F_1(O_A) - F_1(O_B)|$
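The procedure can be summarized in a few lines of Python; the sketch below follows the steps listed above, with `f1_measure` as a hypothetical function that maps a result set to its F1 value.

```python
import random

def approximate_randomization(results_a, results_b, f1_measure, runs=1000):
    """Significance level for the difference of two result sets (cf. [84])."""
    t_observed = abs(f1_measure(results_a) - f1_measure(results_b))
    pooled = list(results_a) + list(results_b)
    r = 0
    for _ in range(runs):
        # Step 4: sample two new result sets with replacement from the pooled set.
        sampled_a = [random.choice(pooled) for _ in range(len(results_a))]
        sampled_b = [random.choice(pooled) for _ in range(len(results_b))]
        if abs(f1_measure(sampled_a) - f1_measure(sampled_b)) > t_observed:
            r += 1
    # Step 6: significance level Theta.
    return (r + 1) / (runs + 1)
```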

Θ has been calculated for all source types separately and aggregated for all segments. The overall Θ for the video benchmark has been calculated as the harmonic mean of all θi of all k video segments of the benchmark:

$\Theta = \frac{1}{\frac{1}{k} \cdot \sum_{i=1}^{k} \frac{1}{\theta_i}}$

Likewise, Θ for segments and videos has been calculated for aggregated segments and videos, respectively, regardless of the source type. As shown in Table 20, Θ is very low (≤ 0.05) for the sources Tags, ASR and OCR. This shows the significant difference of the approaches for metadata items of these source types. This reflects the intention of the context model: first and foremost, the context model has been developed to enable improved analysis results for metadata items with a low confidence value. Metadata items from authoritative sources possess a high confidence value by nature, because of the high reliability assigned to the source type. Therefore, the application of the context model shows the most significant results on the metadata items of the source types Tags, ASR and OCR, and the significance level for authoritative sources is comparatively high (Θ = 0.1553). The significance level for the results aggregated over the entire video is slightly increased compared to the separate significances. Yet, this aggregated significance level is an imprecise measure, because the actual time and segment assignments of the occurrences and the origins of the results are not considered.

The overall results support Hypothesis 9.1.1, that the results of the disambiguation process are improved if context items possessing a high confidence value are disambiguated first in a heterogeneous context and serve as reference points for subsequent disambiguations.

comparison to state-of-the-art tools The results of the conTagger have also been compared to the results of the following state-of-the-art annotation tools6 using the video benchmark:

• DBpedia Spotlight

• Wiki Machine

• TagMe

• AIDA

For the application of these services, all metadata items assigned to a video segment – constituting a context – have been processed as one continuous text. The sources of the textual information have been identified by tracking the position within the input text. This enables an evaluation of all services divided by sources.

parameter settings

• DBpedia Spotlight: Spotlight has been used via its web service. Two parameters can be set: confidence and support. For the presented evaluation, the confidence has been set to 0.2 and the support has been set to 20. These are the default settings as suggested by the Spotlight developers.

• Wiki Machine: The annotations by the Wiki Machine have been performed by the developers of the tool. The output of the tool includes a disambiguation score (similar to the disambiguation score presented in this work). All annotations that achieved a score higher than 0.0 have been added to the final result for the evaluation.

• TagMe: This service has also been used via its web service. The annotated result comprises a score called rho. Following the suggestion of the TagMe developers, an annotated entity has been added to the result set only if rho exceeds the value 0.1.

• AIDA: For the evaluation of AIDA on the video benchmark, the engine has been built using the sources provided on GitHub7 and the entity repository has been set up. The output of the

6 See Section 4.3 for further details on the services. 7 https://github.com/yago-naga/aida

algorithm includes a disambiguation score. This score must exceed the value of 0.1 (as suggested by the AIDA developers) for the annotation to be added to the result.

The evaluation results according to the different sources, as well as the video- and segment-based evaluations, are depicted in Table 21. The comparison of the results shows that the conTagger outperforms the other state-of-the-art tools on almost all measures and for all sources, especially on metadata of sources with low confidence values. For better comparison, the 95% confidence intervals based on bootstrap percentiles have been calculated. Figure 22 shows a visual overview of the compared results and the 95% confidence intervals of F1-measure, recall and precision, separately for all source types and aggregated over segments and videos. The confidence intervals have been determined following the percentile bootstrap method introduced by Efron and Tibshirani [28], separately for recall, precision and F1-measure, using n = 1000 bootstrapped sample sets from the original samples. The intervals have been calculated as follows:

1. Calculate the mean $\bar{x}$ of the original samples.

2. Create a bootstrap sample by resampling with replacement from the original sample. Calculate the mean $x_i^b$ of the bootstrapped sample.

3. Repeat Step 2 $n$ times.

4. Calculate the sample variance of all bootstrapped means: $s^2 = \sum_{i=1}^{n} (x_i^b - \bar{x})^2$

5. Calculate the standard deviation $\sigma = \frac{s}{\sqrt{k}}$, where $k = 6$ denotes the size of the original sample.

6. Calculate the confidence interval with the α-percentile $t_{0.95,6} = 1.9432$ for $\alpha = 0.95$ and sample size $k = 6$:
   $CI_{lower} = \bar{x} - (1.9432 \cdot \sigma)$
   $CI_{upper} = \bar{x} + (1.9432 \cdot \sigma)$

As shown in Table 21 and Figure 22, the results for the conTagger mostly range above the upper bound of the confidence intervals. These results support the assumption of Hypothesis 9.1.1 and confirm the contribution achieved by the application of the proposed context model.
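As a general illustration of the percentile idea behind such intervals, the following sketch derives the bounds directly from the 2.5th and 97.5th percentiles of the bootstrapped means; it deliberately replaces the t-percentile construction of the steps above with the plain percentile variant, so it is an approximation of the general method, not a re-implementation of the procedure used here.

```python
import random

def percentile_bootstrap_ci(samples, n_bootstraps=1000, alpha=0.95):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    k = len(samples)
    boot_means = sorted(
        sum(random.choice(samples) for _ in range(k)) / k
        for _ in range(n_bootstraps)
    )
    lower_index = int(((1 - alpha) / 2) * n_bootstraps)
    upper_index = int((1 - (1 - alpha) / 2) * n_bootstraps) - 1
    return boot_means[lower_index], boot_means[upper_index]

# Illustrative values only: one F1 value per compared system (k = 6).
print(percentile_bootstrap_ci([0.52, 0.60, 0.47, 0.43, 0.54, 0.13]))
```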

evaluation of hypothesis 9.1.2 For evaluating Hypothesis 9.1.28, the context items have been disambiguated using the entire video as context as well as using only the context items of the same video segment, for comparison. Evaluation results are shown in Table 22.

8 The context of context items within a video document should be restricted to segments of coherent content.

Table 21: Evaluation results of conTagger compared to the simple approach, DBpedia Spotlight, Wiki Machine, TagMe and AIDA (R = Recall, P = Precision, F1 = F1-Measure). CI_lower and CI_upper represent the lower and upper bounds of the 95% confidence intervals based on bootstrap percentiles (using n = 1000 bootstraps of the original samples). The table reports R, P and F1 for each compared system together with the confidence interval bounds, per source type (Authoritative, Tags, ASR, OCR) and aggregated over segments and videos.

Figure 22: Results of conTagger compared to the simple approach, DBpedia Spotlight, Wiki Machine, AIDA and TagMe.

Table 22: Evaluation of Hypothesis 9.1.2 (R = Recall, P = Precision).

                              ASR            OCR            Tags
                              R     P        R     P        R     P
conTagger, Segment-Based      55.0  61.0     56.0  24.0     71.0  69.5
conTagger, Video-Based        53.0  46.0     51.0  21.0     69.0  68.0

As anticipated, the disambiguation results are improved using content-based segments as context. The results for ASR metadata items are especially different in recall and precision between the two versions. This probably follows from the fact that the video dataset contains many more ASR metadata items, and that speech usually covers more wide-spread content in terms of context information. However, recall and precision are not significantly different, which results from the homogeneous character of the single videos of the dataset and their video segments.

level of ambiguity: number of candidates vs. assigned class As described in Section 7.1, the ambiguity of a context item can also be defined by the number of entity candidates. Therefore, the disambiguation process has been evaluated using the inverted normalized number of entity candidates instead of the class cardinality measure. Better evaluation results were achieved by using the class cardinality (as proposed in Section 7.1). F1-measures for all source types were lower by an average of 5% when using the ambiguity measure based on the number of entity candidates. Obviously, a low number of entity candidates does not necessarily mean that the correct entity is amongst the few candidates. Therefore, the ambiguity measure is set according to the class cardinality of the assigned class.

parameter optimization for dynamic context creation During the disambiguation process, three parameters are available for the dynamic context creation; they determine which context items are added to a context to disambiguate another context item:

• relevance with respect to context boundaries,

• the confidence value, and

• the disambiguation score (only applicable for previously disambiguated context items).

relevance Authoritative context items are disambiguated using other authoritative context items. In this case, the context boundaries are determined by the source type. Time-referenced metadata, such as context items derived from OCR, ASR or tags, are not used as context items for the disambiguation of authoritative context items.

Table 23: Best parameter configurations for the dynamic context creation, divided by source type.

Source          Confidence Value    Disambiguation Score
ASR             0.7                 0.2
OCR             0.7                 0.2
Authoritative   0.25                0.0
Tags            0.4                 0.0

Context boundaries for context items derived from ASR, OCR and user-generated tags are defined by segments of coherent content. Context items of different segments do not belong to the same context. In addition, authoritative context items are used for the disambiguation of context items from OCR, ASR and user-generated tags.

confidence value and disambiguation score These two parameters can be used to decide whether or not a context item is added to a context for disambiguation. Exhaustive test runs have been performed to determine the best-suited thresholds for the confidence value and the disambiguation score with respect to the dynamic context creation. The values for both parameters range between 0 and 1. Therefore, the context creation and subsequent disambiguation process has been performed with all combinations of confidence values and disambiguation scores, increasing the parameters in increments of 0.05 and resulting in 441 runs. Subsequently, the parameter settings which achieve the best recall and precision aggregated over the different source types have been identified. The best recall and precision results for context items which originate from OCR and ASR algorithms (featuring the lowest confidence values) are achieved by creating the context from context items with a minimum confidence value of c = 0.7. Authoritative context items are disambiguated only using authoritative context items as context (see Section 9.1.2). Therefore, they are disambiguated based on context items with the highest confidence values in any case, because authoritative context items achieve the highest confidence values compared to context items derived from non-authoritative and non-human sources. The lowest possible confidence value for a context item from authoritative sources is calculated as c = 0.4525. The identified minimum threshold for the confidence value when adding a context item to a context is comparatively low, with c = 0.25.

Figure 23: Achieved F1-measures for different confidence values as threshold for dynamic context creation.

The minimum threshold for the disambiguation of user-generated tags is determined to be mid-range with c = 0.4. This means that for the disambiguation, some other time-referenced context items (from OCR or ASR analysis) are used as context items, but not all of them, as the lowest possible confidence value for time-referenced metadata is calculated as c = 0.285. The threshold for the disambiguation score when adding context items to a context is s = 0.2 for context items from OCR and ASR and s = 0.0 for authoritative context items and tags. Apparently, the achieved disambiguation score is not as important as the confidence value for the context items used as influencing items in the disambiguation process. The parameter settings which achieve the best analysis results are shown in Table 23. The development of the F1-measure for different parameter settings is shown in Figure 23 and Figure 24, where the x-axis denotes the value of the evaluated parameter (confidence value or disambiguation score) and the y-axis represents the achieved F1-measure. The achieved evaluation results support the premise that taking into account the characteristics and contextual factors of the different metadata items benefits the semantic analysis process.
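The selection rule behind the dynamic context creation can be sketched as a simple filter; the `ContextItem` class, the threshold table (taken from Table 23) and the omission of segment boundaries are simplifying assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass

# Minimum confidence value and disambiguation score of the context items used
# to disambiguate an item of the given source type (cf. Table 23).
THRESHOLDS = {
    "ASR": (0.7, 0.2),
    "OCR": (0.7, 0.2),
    "Authoritative": (0.25, 0.0),
    "Tags": (0.4, 0.0),
}

@dataclass
class ContextItem:
    term: str
    source: str
    confidence: float
    disambiguation_score: float = 0.0  # 0.0 if not yet disambiguated

def build_context(target, candidate_items):
    """Select the context items that influence the disambiguation of `target`."""
    min_confidence, min_score = THRESHOLDS[target.source]
    context = []
    for item in candidate_items:
        # Authoritative items are disambiguated using authoritative items only.
        if target.source == "Authoritative" and item.source != "Authoritative":
            continue
        if item.confidence >= min_confidence and item.disambiguation_score >= min_score:
            context.append(item)
    return context
```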

Figure 24: Achieved F1-measures for different disambiguation scores as threshold for dynamic context creation.

10.7.6 Evaluation of Influence of Constituents of Contextual Description

The evaluation results presented above have been achieved by calculating the confidence value from all five constituents: source reliability, source diversity, class cardinality, text type and number of tokens. All constituents have been weighted equally for the calculation of the final confidence value. The influence of the individual constituents is examined in this section. The confidence value has been calculated by omitting one constituent and utilizing only the remaining four constituents. Therefore, five additional runs of the semantic analysis process on the video metadata test set have been performed. These results, compared to the results utilizing all constituents, are shown in Table 24. The first three columns show recall, precision and F1-measures when all constituents are applied to calculate the confidence value. The next columns show the results when either class cardinality, source reliability, text type, source diversity or number of tokens has been omitted from the calculation. Again, the results are divided according to the different sources, entire segments and entire videos. The evaluation shows that omitting a specific constituent is reflected in the results with respect to recall and/or precision. In some cases, no effect is revealed at all; for example, ignoring the source diversity has no effect on the disambiguation of tags. However, the interpretation of these results and subsequent conclusions are not simple. The evaluation shows different results for the specific sources, but the consequence is not related to the items of the respective source directly. Omitting a specific constituent results in a different order of all context items within a context. Therefore, this constituent must be omitted for all context items in order to achieve the presented results. However, the evaluation then shows conflicting results. Whereas omitting the source diversity shows no effect for tags, the quality of the results decreases for items from authoritative and ASR sources. Therefore, the influence of the separate constituents requires a detailed observation to draw respective conclusions. This issue will be addressed further in future work.

Table 24: Evaluation of the influence of the individual confidence constituents (cc = class cardinality, sr = source reliability, tt = text type, sd = source diversity, nt = number of tokens).

                 all              w/o cc           w/o sr           w/o tt           w/o sd           w/o nt
                 R    P    F1     R    P    F1     R    P    F1     R    P    F1     R    P    F1     R    P    F1
Authoritative    60.0 54.5 57.0   60.0 54.5 57.0   59.0 53.5 56.0   60.0 54.5 57.0   59.0 54.0 56.5   59.5 54.0 56.5
Tags             71.0 69.5 70.0   70.0 68.5 69.0   71.0 69.0 70.0   70.0 69.0 69.5   71.0 69.5 70.0   70.5 68.5 69.5
ASR              55.0 61.0 58.0   55.5 61.0 58.0   55.0 61.0 58.0   55.5 61.5 58.5   54.0 60.0 57.0   54.0 60.0 57.0
OCR              56.0 24.0 33.5   57.5 24.0 34.0   58.0 24.0 34.0   54.0 21.5 31.0   56.5 24.5 34.0   57.0 25.0 34.5
Segments         54.0 58.0 56.0   54.0 58.0 56.0   53.5 58.0 55.5   54.0 58.0 56.0   53.5 57.5 55.5   53.0 57.5 55.0
Video            56.0 48.0 52.0   56.0 48.0 52.0   55.5 48.0 52.0   56.0 48.0 52.0   55.5 47.5 51.5   55.0 47.5 51.0

10.7.7 Evaluation of the Influence of Negative Context

The evaluation results of the conTagger compared to the conTagger using negative context are shown in Table 25. The left three columns show recall, precision and F1-measures for the conTagger without taking into account negative context. The right three columns show recall, precision and F1-measures for the extended version of the conTagger using negative context for the disambiguation. Recall and precision have been calculated separately for the metadata items of the different sources (ASR, OCR, user tags and authoritative information) and are represented by the respective row. As shown, recall and precision are slightly improved by using the negative context, especially for metadata items with the lowest prospective confidence values such as OCR metadata. Results for tags and items retrieved by ASR remain nearly constant for both approaches, with or without negative context. Overall, these results reveal the actual influence of negative context (although only marginal) in the disambiguation process. The evaluation results are described in detail in the discussion section. In addition to the higher recall and precision for context item disambiguation, a further evaluation finding is the increased margin of the disambiguation results. For this evaluation, the entity candidate with the highest score is chosen as the correctly disambiguated entity for the respective natural language term. The distance between the first and the second highest achieved score for a term can be considered as an indicator of the reliability of the achieved result. The higher the distance between the first and the second highest score, the more reliable the result. By using negative context this margin has been increased.

The average margin has been calculated as follows:

$margin_{average} = \frac{\sum_{k=1}^{n} \big( s_{total}(ci_{k,1}) - s_{total}(ci_{k,2}) \big)}{n}$   (21)

$s_{total}(ci_{k,1})$ denotes the highest achieved total disambiguation score of context item $ci_k$ and $s_{total}(ci_{k,2})$ denotes the second highest achieved disambiguation score for context item $ci_k$, for all context items $ci$ in the reference dataset, where $k = 1...n$ and $n = |CI|$. The best results with respect to the margin have been achieved by reducing the total positive disambiguation score by the computed total negative score (using the weight 1.0 equally for both scores). In case the achieved total disambiguation score is below zero, the total score is set to $s_{total} = 0.0$. Thereby, the total disambiguation score remains within the range [0.0...1.0].
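Equation 21 translates directly into the following sketch, assuming that the total disambiguation scores of all entity candidates are available per context item.

```python
def average_margin(candidate_scores_per_item):
    """Average distance between the best and second-best total score (Equation 21).

    `candidate_scores_per_item` holds, per context item, the total
    disambiguation scores of its entity candidates.
    """
    margins = []
    for scores in candidate_scores_per_item:
        ranked = sorted(scores, reverse=True)
        if len(ranked) >= 2:
            margins.append(ranked[0] - ranked[1])
    return sum(margins) / len(margins) if margins else 0.0

# Two context items: a clear decision (margin 0.5) and a close one (margin 0.1).
print(average_margin([[0.9, 0.4, 0.1], [0.6, 0.5]]))  # approximately 0.3
```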

Table 25: Evaluation results of the conTagger compared to the conTagger using negative context (R = Recall, P = Precision, F1 = F1-Measure).

                 conTagger             conTagger Using Negative Context
                 R     P     F1        R     P     F1
Authoritative    60.0  54.5  57.0      61.0  55.0  58.0
Tags             71.0  69.5  70.0      71.0  69.5  70.0
ASR              55.0  61.0  58.0      55.0  61.0  58.0
OCR              56.0  24.0  34.0      60.0  26.0  36.5

The averaged margin over all results for the video metadata test set without negative context is 0.27. Taking into account the negative context, the margin amounts to 0.40. Therefore, the margin of the analysis results rises by an average of at least9 0.13. The increase of the margin is depicted in Table 26 for the different sources. Although the approach of applying a negative context only shows slight increases in the quality of the results, the influence of negative context has been demonstrated. The margin of analysis results is important when the decision for the chosen entity candidates is determined by applying a threshold on the obtained scores. In this case, a higher margin also results in a higher precision.

9 As stated above, negative total disambiguation scores have been avoided; allowing negative total scores might increase the average margin.

Table 26: Increase of the margin of analysis results on the video metadata test set for different sources and on the Spotlight dataset.

                      Margin Without      Margin Using        Number of
                      Negative Context    Negative Context    Context Items
Video Dataset
  Authoritative       0.43                0.60                 317
  ASR                 0.18                0.25                2559
  OCR                 0.12                0.13                 509
  Tags                0.35                0.60                 536
  All Sources         Ø 0.27              Ø 0.40              3975
Spotlight Dataset     0.35                0.45                 409

10.7.8 Evaluation Using Independent Benchmark

The context model has originally been developed to process video metadata. Most of the contextual characteristics can only be applied to textual information originating from various sources and of different text types. However, the context model can also be applied to simple natural language texts to achieve a more reliable disambiguation by ordering the text terms. This likewise enables a disambiguation process beginning with the prospectively most confident term, which in turn enables the successive construction of the negative context. The evaluation results are shown in Table 27. Without using a negative context, the disambiguation approach achieves a recall of 58% and a precision of 41%, resulting in an F1-measure of 48%. The approach with negative context achieves a recall of 60% and a precision of 42%, resulting in an F1-measure of 49.5%. In comparison, without any configuration DBpedia Spotlight achieves an F1-measure of 45%. Additionally, the margin between the highest and second highest disambiguation scores is increased by 0.1 (from 0.35 to 0.45) by using negative context. The positive impact of a negative context is thus shown on this dataset, although the approach has not been developed specifically for this type of data.

Table 27: Evaluation Results on Spotlight Dataset (R = Recall, P = Precision, F1 = F1-measure).

                                          R     P     F1
Simple approach without context model     53.0  40.0  45.5
conTagger                                 58.0  41.0  48.0
conTagger utilizing negative context      60.0  42.0  49.5

10.8 summary of evaluation results

This section briefly summarizes the evaluation results presented in the previous sections.

benchmarks Two benchmarks have been created and applied to evaluate the tag processing and the semantic analysis method: one which contains video metadata from four different sources and a second which contains user-generated tags. Additionally, three other benchmarks for the evaluation of semantic annotation services have been analyzed.

dictionaries Four different dictionaries for entity look-up have been evaluated. The RDM dictionary has been identified as the best trade-off between the time complexity of the overall algorithm, mapping coverage and possible recall. This dictionary in its case-insensitive form has been used for subsequent evaluations of the semantic analysis process.

tag processing The presented tag processing method achieves up to 39% increased recall and up to 30% increased precision compared to a state-of-the-art annotation service.

detection of named entities and important terms The algorithm which detects named entities and important terms in a text has been evaluated on the Spotlight benchmark and the video benchmark. The achieved recall amounts to 72.0% and 80.5%, respectively. Due to the algorithm's over-annotation compared to a human annotator, the precision is comparably low with respective values of 55.0% and 48.5%. However, the accuracy of the algorithm is as high as 80%.

disambiguation algorithm The disambiguation approach has been evaluated separately from the context model on the Spotlight benchmark against DBpedia Spotlight. The algorithm has been shown to be competitive with the state-of-the-art annotation service.

context model The semantic analysis process which applies the context model has been evaluated against the simple segment-based analysis approach on the video benchmark. A significance analysis supports the positive influence of the context model on the analysis results and shows that the results have not been achieved by chance. Additionally, the presented approach has been evaluated against four state-of-the-art annotation approaches. The conTagger outperforms the other state-of-the-art tools on almost all measures and for all sources, especially on metadata of sources with low confidence values.

influence of negative context The novel application of negative context shows only a marginal increase in recall and precision for the analysis results. However, the margin of the disambiguation scores of the entity candidates is increased. This means that the reliability of the interpretation is increased by the application of negative context.

10.9 discussion

The proposed algorithms have been developed for the specific purpose of semantic analysis and annotation of video metadata. The previous sections presented extensive evaluations of single methods of the overall process as well as of the complete algorithm. Thus, the applied dictionaries and benchmarks have been analyzed. The tag processing method, the single disambiguation process and finally the overall algorithm applying the developed context model and negative context have been evaluated. The achieved results are discussed in the following paragraphs. The disambiguation approach using the context model achieves improved results regarding recall and precision for context items of all sources. For the evaluation of semantic analysis approaches, the intended focus on either recall or precision is important. In terms of semantic or explorative search based on semantically annotated documents [120], a high precision might be desirable, as incorrectly annotated documents might confuse users of semantic search engines. A high recall, on the other hand, enables a more elaborate explorative search on the annotated documents, because with more annotated entities more relationships can be drawn between documents. The presented evaluation results demonstrate a trade-off between high recall and high precision. By making use of the described confidence value and the disambiguation score, a higher precision – accepting a lower recall – is achievable. The goal of applying the negative context and negative categories is the elimination of irrelevant topics for the context. Unfortunately, categories do not provide information leading to contextually relevant topics. The categories derived from Wikipedia (such as YAGO or the Wikipedia classification) do not supply type-comprehensive topic information (such as persons, places or organizations belonging to a specific subject). The categories are mostly restricted to one specific type; for example, the category English Rock Music Groups is restricted to bands, but does not imply the topic music. Thus, the negative context – consisting of negative entities and negative categories – cannot represent negative topics. However, in some cases the disambiguation process has been improved by applying the negative context and devaluing prospectively wrong entity candidates. This shows that a negative context can be constructed which partly represents negative topics. Supposedly incorrect entity candidates that also achieved a score within the positive context have been devalued by using the negative context. This is proven by the increase of the recall and the precision. However, a detailed investigation has shown the impact of the negative context on terms that are not integrated in a context in general. Common or general terms such as history, worry or audience are difficult to disambiguate. In DBpedia, these terms are often used for band names or music albums, and thus entities which belong to a very specific category also end up as candidates for common or general terms. The evaluation has shown that these types of entities are often linked to the negative context if the previously disambiguated terms contain such entities as candidates. Such entity candidates are then devalued and can be disregarded in the decision for the correct disambiguation.

APPLICATIONS 11

The context model and the semantic analysis process have been applied as part of a video analysis framework developed and adapted for several projects. The common aim of all projects is the visual and semantic analysis of videos in order to provide a web application which is searchable via entity-centric semantic search. An entity-centric search requires a specific search interface to provide the user with relevant entities for the entered search string. Thus, the user is asked to determine the specific meaning of the entered search string by choosing an entity from a suggested list. This process is called autosuggestion. Similar to Google's instant search, the user interface provides a list of entities relevant for the current search string. The list is faceted into persons, organizations, events and other things according to the types of the suggested entities.

Figure 25: Screenshot of faceted list of suggested entities for the search string berlin within the mediaglobe user interface.

Figure 25 shows a faceted suggestion for the search string berlin. On the basis of this list, the user can decide which entity should be the target of the search. The subsequent list of results contains only documents that have been annotated with the selected entity by the semantic analysis of the metadata. The autosuggestion is based on a dictionary containing labels and popularity scores of the respective entities. The underlying suggestion index is thus organized as a list of available entities. For every entity, a list of labels and their respective relevance for the entity as well as the ontology type, a thumbnail (if one exists), and the main label (optionally multilingual) are provided. For the projects introduced in the following sections, the suggestion index has been created based on the RDM dictionary introduced in Section 10.5.4.
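A minimal sketch of such a suggestion index entry and a prefix-based look-up is given below; the field names and the popularity-based ranking are assumptions about how an autosuggestion could be served, not the implementation of the portals described in the following sections.

```python
from dataclasses import dataclass, field

@dataclass
class SuggestionEntry:
    entity_uri: str
    main_label: str
    ontology_type: str                          # e.g. person, organization, event, thing
    popularity: float
    labels: dict = field(default_factory=dict)  # label -> relevance for the entity
    thumbnail: str = ""                         # empty if no thumbnail exists

def suggest(index, query, limit=10):
    """Return main labels of matching entities, faceted by their ontology type."""
    query = query.lower()
    matches = [entry for entry in index
               if any(label.lower().startswith(query) for label in entry.labels)]
    matches.sort(key=lambda entry: entry.popularity, reverse=True)
    faceted = {}
    for entry in matches[:limit]:
        faceted.setdefault(entry.ontology_type, []).append(entry.main_label)
    return faceted
```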


Figure 26: Overall architecture of mediaglobe [48]

Figure 27: Screenshot of mediaglobe user interface.

11.1 mediaglobe

The mediaglobe research project1 was initiated in late 2009 by the project partners transfer Media2, Defa Spektrum3, Flow Works4 and the Hasso Plattner Institute and was concluded in May 2012. It was part of the THESEUS research program5 funded by the German Federal Ministry for Economics and Technology. The primary goal of mediaglobe was to develop a generally applicable infrastructure for the retrieval of audiovisual archives with an emphasis on historical documentaries, for the purpose of making cultural heritage documents widely accessible [48]. The underlying archive comprises documentary films of the former German Democratic Republic. As shown in Figure 26, the analysis process includes several visual and semantic analyses. Textual information is extracted and provided for a semantic analysis.

1 http://www.projekt-mediaglobe.de. 2 http://www.transfermedia.de/ 3 http://www.defa-spektrum.de/ 4 http://www.flowworks.de 5 http://theseus.pt-dlr.de/en/

The results are stored in a triple store and indexed for a semantic search, and finally provided in a user interface. A screenshot of the user interface is shown in Figure 27. For the semantic analysis process, an early version of the context model and the disambiguation process has been applied. DBpedia has been utilized for the semantic annotation.

11.2 semex

Figure 28: Screenshot of the technical demonstrator of the semantic analysis included in SeMEx.

SeMEx stands for Semantic Media Explorer and is based on the analysis prototype developed for the mediaglobe project. For this internal project at the Hasso Plattner Institute, the visual as well as semantic analysis methods are further developed and enhanced [93]. The context model of the conTagger has been developed and evaluated in the scope of this project. The scientific results and innovative achievements of SeMEx have been presented at conferences, fairs and to professionals from industry and research. As a result, a web application has been developed introducing the basics of the analysis processes separately. A screenshot of the demonstrator for the semantic analysis is shown in Figure 28. The main focus of the semantic analysis process within the SeMEx project lies in the analysis of video metadata using the DBpedia as its underlying knowledge base.

11.3 av portal

The AV Portal is a joint project of the German National Library of Science and Technology6, Flow Works and the Hasso Plattner Institute. The project strives to enhance the access to non-textual media in the scope of library services.

6 http://www.tib.uni-hannover.de

Figure 29: Screenshot of the user interface of the AV Portal.

In particular, the AV Portal provides an entity-based semantic search of scientific videos from the fields of technology and the natural sciences. Figure 29 shows a screenshot of the user interface. The AV Portal utilizes the GND as its underlying knowledge base. The provided videos are primarily assigned to a specific subject area. Following project requirements, the videos are only annotated with concepts belonging to the subject area to which the video is assigned. Therefore, several domain experts defined excerpts of the entire knowledge base related to the subject areas. For the semantic analysis, all named entities detected in the textual information and the entire knowledge base are taken into account for the disambiguation process. Only concepts which belong to the respective subject area are maintained and provided for the semantic search.

As shown in the previous sections, the presented semantic analysis method has been applied successfully in several projects. All projects strive to provide semantically annotated videos for a semantic search. Nevertheless, the presented approach can be applied to any type of video archive, using any underlying knowledge base. Also, the context model can be applied to simple continuous text as well as to any documentary metadata that originates from different sources.

CONCLUSION AND OUTLOOK 12

This chapter concludes the thesis with a summary of the achieved results and scientific contributions as well as an outlook toward possible future work.

12.1 summary

Semantic annotation provides essential insight into the content of documents under consideration during semantic analysis. In many cases, the sheer number of documents which must be processed does not allow for manual annotation. State-of-the-art semantic annotation tools focus on the annotation of simple web documents containing continuous text. The textual information in such documents is considered to be homogeneous: all information originates from the same source and possesses the same characteristics. Therefore, all information contained in such a document might be processed equally and the context is also considered homogeneous. However, this is not the case for video documents or any other document annotated with metadata of different types and origins. Semantic annotation of videos requires special attention. Initially, textual content-based information must be extracted by automatic algorithms such as OCR and ASR. Additionally, authoritative metadata, such as titles and descriptive texts, might be provided. Therefore, content-based metadata of videos consists of different metadata types possessing different characteristics and reliabilities. The context for video documents which contain this metadata is thereby considered to be heterogeneous. This thesis has addressed the problem of semantic analysis of video metadata taking into account heterogeneous contexts. For that purpose, a context model has been developed which considers the characteristics of different metadata information and subsequently derives a confidence value. The confidence value reflects the level of correctness and prospective ambiguity of the information: the greater the assumed correctness and the lower the ambiguity, the higher the confidence. The metadata items are ordered according to their confidence value and the analysis process is performed in descending order, starting with the item of the highest confidence value. Under the assumption that items with a high confidence value will be interpreted correctly, previously processed items serve as reference points for subsequent disambiguation processes. The thesis also presents a disambiguation algorithm which considers descriptive texts and semantic relationships to distinguish different entities.


Additionally, a coreference analysis and several heuristics are presented that can be used for the disambiguation of ambiguous textual information. A disambiguation can only be performed successfully when a context is provided that supports the interpretation. For the presented approach, the context for the disambiguation is created dynamically for each item, considering its confidence value and other characteristics. A novel concept presented in this thesis is the negative context. This type of context represents information that is not relevant for an interpretation. Within the Semantic Web, the open world assumption is applied. Therefore, missing information cannot be treated as wrong or irrelevant. The negative context is created during the disambiguation process and includes all information that has been identified as irrelevant or contradictory to previous disambiguations. The developed algorithms have been evaluated on several benchmarks and against several state-of-the-art tools. The presented algorithms show improved results with respect to the quality of the annotations; the results for metadata originating from sources of lower reliability have been especially improved using the context model. Prior to the actual evaluations, several benchmarks and dictionaries for entity look-up were analyzed and evaluated. Additionally, a new benchmark for the special purpose of semantic analysis of video metadata has been created and introduced. The presented approach is integrated into a video analysis framework that has been successfully applied in several projects for the purpose of a content-based semantic search. For the practical application of the semantic analysis process, two different knowledge bases have been used. This shows the practicability and flexibility of the presented approach.

12.2 future work

The semantic analysis process presented in this thesis has been fully implemented and integrated into a video analysis framework. Future work will focus on the enhancement of single methods and thereby on the improvement of the analysis results. Ongoing work includes research on building the negative context by using latent topics. Boehm et al. describe an approach for aggregating the entities of an RDF graph into subgraphs and thereby constructing latent topics [13]. The entities of such subgraphs can be appended to the negative context if a sufficient number of previously eliminated negative entities is included in such a graph. This approach removes specific topics from the list of relevant topics for the present positive context. A context can also be further refined by applying low-level adjustments, such as whitelists. Whitelists can be created either statically, by applying a specific knowledge base that only contains relevant entities and reduces the ambiguity of terms, or by logical constraints in terms of rules. For example, a document published in 1960 most likely references only persons born before this date. Thereby, only a restricted number of available semantic entities qualify for the analysis of such a document. While persons naturally have a time reference, other real-world entities may be hard to classify (see Section 6.3.2 for the aggregation of entities according to a time reference). Ongoing work might include the definition of a time-related scope for various entity types and further types of restrictions. Along the lines of the theory that semantic entities are time-referenced, the research field of Named Entity Evolution (NEE) has emerged [72, 111]. The characteristics of semantic entities might change over time. New countries are founded, others disappear or are re-named, and borders are shifted. A person with a political career becomes a senator, then a president, and finally a civilian again. All of these changes must be taken into account for a semantic analysis of time-referenced textual information when considering the appropriate context. Thus, NEE will also be part of future research on a context-based semantic analysis. The Link Graph Analysis (see Section 8.2.4) can be a time-consuming approach if the underlying graph is very large. For this thesis, the approach has been implemented in various ways to identify the fastest algorithm for the detection of paths in a large graph: adjacency matrices, pre-processed closure and index storage, dynamic creation of sub-graphs according to the growing context, etc. Future work might evaluate further approaches to identify the fastest method for context-aware semantic analysis. The evaluation of the importance of the separate analysis methods included in the disambiguation process has been performed for a specific knowledge base and benchmark (see Section 10.7.4). Future work might include a determination of the score weights by the application of machine learning algorithms. The weights might be identified for knowledge bases that provide less or different information compared to the DBpedia and for other benchmarks. These enhancements planned as future work aim to improve the achieved analysis results and to further develop the context characteristics. However, this thesis presents an approach that pays particular attention to the characteristics of metadata that originate from different sources.
The developed context model and disambiguation approach have been evaluated extensively on different benchmarks. The proposed hypotheses have been confirmed and are supported by the high quality of the results. Although this thesis focuses on video metadata, the approach can be adapted to other document types that feature metadata from multiple sources and with various reliabilities.

Part IV

APPENDIX

APPENDIX A

The following tables present additional statistics about the dictionaries presented in Section 10.5 based on the five different benchmarks presented in Section 10.3.

The tables report recall and precision for the four dictionaries (SPL, RDM, AIDA, GCW) on the benchmarks KORE50, Spotlight, Wikilinks, the Tag Dataset, and the Video Dataset (authoritative metadata, tags, OCR, ASR).

Table 28: Case-sensitive recall and precision, if the most popular entity – based on incoming Wikipedia page links – is mapped to the mention.

Table 29: Case-insensitive recall and precision, if the most popular entity – based on incoming Wikipedia page links – is mapped to the mention.

Table 30: Recall and precision, if the most popular entity – based on Google popularity for the mention as anchor for the entity – is mapped to the mention.

Table 31: Recall and precision, if the most popular entity – based on Spotlight popularity for the mention as anchor for the entity – is mapped to the mention.

BIBLIOGRAPHY

[1] Gregory D. Abowd and Elizabeth D. Mynatt. Charting past, present, and future research in ubiquitous computing. ACM Transactions on Computer-Human Interaction, 7(1):29–58, March 2000. ISSN 1073-0516. doi: 10.1145/344949.344988. URL http://doi.acm.org/10.1145/344949.344988.

[2] D. Adjeroh, M. C. Lee, N. Banda, and U. Kandaswamy. Adaptive edge-oriented shot boundary detection. Journal on Image and Video Processing, pages 5:1–5:13, January 2009. ISSN 1687-5176. doi: 10.1155/2009/859371. URL http://dx.doi.org/10.1155/2009/859371.

[3] Gediminas Adomavicius and Alexander Tuzhilin. Context-aware recommender systems. In Proceedings of the 2008 ACM conference on Recommender systems, RecSys '08, pages 335–336, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-093-7. doi: 10.1145/1454008.1454068. URL http://doi.acm.org/10.1145/1454008.1454068.

[4] Adrian Akmajian, Richard A. Demers, Ann K. Farmer, and Robert M. Harnish. Linguistics: An introduction to Language and Communication. MIT Press, Cambridge, Massachusetts, 2001.

[5] A. M. Amel, A. B. Abdelali, and A. Mtibaa. Video shot boundary detection using motion activity descriptor. Journal of Documentation of Telecommunications, 2(1):54–59, 2010.

[6] Peter Auer. Kontextualisierung. In Studium Linguistik. Albert-Ludwigs-Universität Freiburg, 1986.

[7] Leonard E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Oved Shisha, editor, Inequalities III: Proceedings of the Third Symposium on Inequalities, pages 1–8, University of California, Los Angeles, 1972. Academic Press.

[8] Mary Bazire and Patrick Brézillon. Understanding context before using it. In Proceedings of the 5th international conference on Modeling and Using Context, CONTEXT'05, pages 29–40, Berlin, Heidelberg, 2005. Springer. ISBN 3-540-26924-X, 978-3-540-26924-3. doi: 10.1007/11508373_3. URL http://dx.doi.org/10.1007/11508373_3.


[9] Mike Bergman. Announcing umbel: A lightweight subject structure for the web, 2007. URL http://tinyurl.com/o5k5yq3. Last visited on November, 17th 2013.

[10] Tim Berners-Lee. Notation 3 logic – an rdf language for the semantic web, 1998. URL http://www.w3.org/DesignIssues/ .html. Last visited on November, 13th 2013.

[11] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34–43, 2001. URL http://www.sciam.com/article.cfm?id=the-semantic-web.

[12] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, pages 29–37, 2001.

[13] Christoph Böhm, Gjergji Kasneci, and Felix Naumann. Latent topics in graph-structured data. In Proceedings of the 21st ACM international conference on Information and knowledge management, CIKM '12, pages 2663–2666, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1156-4. doi: 10.1145/2396761.2398718. URL http://doi.acm.org/10.1145/2396761.2398718.

[14] Stefan Bordag. Word sense induction: Triplet-based clustering and automatic evaluation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, 2006. Association for Computational Linguistics.

[15] Andrew Eliot Borthwick. A Maximum Entropy Approach to Named Entity Recognition. PhD thesis, New York University, 1999.

[16] Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543–566, 1995.

[17] Eric Brill. Part-of-speech tagging. In Handbook of Natural Language Processing. Marcel Dekker, Inc., 2000.

[18] Volha Bryl, Claudio Giuliano, Luciano Serafini, and Kateryna Tymoshenko. Supporting natural language processing with background knowledge: coreference resolution case. In Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I, ISWC'10, pages 80–95, Berlin, Heidelberg, 2010. Springer. ISBN 3-642-17745-X, 978-3-642-17745-3. URL http://dl.acm.org/citation.cfm?id=1940281.1940288.

[19] Kenneth Ward Church. A stochastic parts program and noun phrase parser for unrestricted text. In ANLP, pages 136–143. Association for Computational Linguistics, 1988. URL http://dblp.uni-trier.de/db/conf/anlp/anlp1988.html#Church88.

[20] Garrison W. Cottrell. A connectionist approach to word sense disambiguation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1989. ISBN 0-934613-61-3.

[21] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, C. Bray, and D. Maupertuis. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, pages 1–22. Springer, 2004.

[22] Danica Damljanovic and Kalina Bontcheva. Named entity disambiguation using linked data. In 9th Extended Semantic Web Conference (ESWC2012). Springer, May 2012. URL http://data.semanticweb.org/conference/eswc/2012/paper/poster/334.

[23] John Dewey. Context and thought. University of California publications in philosophy, 12, 1931.

[24] Mostefa Djamel, Hamon Olivier, and Choukri Khalid. Evaluation of automatic speech recognition and speech language translation within tc-star: Results from the first evaluation campaign. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC06), Genoa, Italy, May 2006. European Language Resources Association (ELRA).

[25] Judith Domínguez-Borràs, Manuel Garcia-Garcia, and Carles Escera. Negative emotional context enhances auditory novelty processing: behavioural and electrophysiological evidence. European Journal of Neuroscience, 28(6):1199–1206, 2008. ISSN 1460-9568. doi: 10.1111/j.1460-9568.2008.06411.x. URL http://dx.doi.org/10.1111/j.1460-9568.2008.06411.x.

[26] Beate Dorow and Dominic Widdows. Discovering corpus-specific word senses. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2, EACL '03, pages 79–82, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics. ISBN 1-111-56789-0. doi: 10.3115/1067737.1067753. URL http://dx.doi.org/10.3115/1067737.1067753.

[27] Paul Dourish. What we talk about when we talk about context. Personal Ubiquitous Comput., 8(1):19–30, February 2004. ISSN 1617-4909. doi: 10.1007/s00779-003-0253-8. URL http://dx.doi.org/10.1007/s00779-003-0253-8.

[28] Bradley Efron and Robert J. Tibshirani. An introduction to the bootstrap. Chapman & Hall, 1993.

[29] Christine Englund. Speech recognition in the jas 39 gripen aircraft - adaptation to speech at different g-loads. Master's thesis, Royal Institute of Technology, Stockholm, 2004.

[30] Paolo Ferragina and Ugo Scaiella. Tagme: on-the-fly annotation of short text fragments (by wikipedia entities). In Jimmy Huang, Nick Koudas, Gareth J. F. Jones, Xindong Wu, Kevyn Collins-Thompson, and Aijun An, editors, Proceedings of CIKM 2010, pages 1625–1628. ACM, 2010. ISBN 978-1-4503-0099-5. URL http://dblp.uni-trier.de/db/conf/cikm/cikm2010.html#FerraginaS10.

[31] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating non-local information into information extraction systems by gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, pages 363–370, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics. doi: 10.3115/1219840.1219885. URL http://dx.doi.org/10.3115/1219840.1219885.

[32] Ovidiu Fortu and Dan Moldovan. Identification of textual contexts. In Proceedings of the 5th international conference on Modeling and Using Context, CONTEXT'05, pages 169–182, Berlin, Heidelberg, 2005. Springer. ISBN 3-540-26924-X, 978-3-540-26924-3. doi: 10.1007/11508373_13. URL http://dx.doi.org/10.1007/11508373_13.

[33] Winthrop Nelson Francis and Henry Kucera. Frequency Analysis of English Usage: Lexicon and Grammar. Houghton Mifflin, 1983.

[34] Gottlob Frege. The Foundations of Arithmetic. Northwestern University Press, Evanston, Illinois, 1884/1980.

[35] William A. Gale, Kenneth W. Church, and David Yarowsky. A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26(5-6):415–439, 1992. ISSN 0010-4817. doi: 10.1007/BF00136984. URL http://dx.doi.org/10.1007/BF00136984.

[36] Aldo Gangemi. A comparison of knowledge extraction tools for the semantic web. In Philipp Cimiano, Óscar Corcho, Valentina Presutti, Laura Hollink, and Sebastian Rudolph, editors, ESWC, volume 7882 of Lecture Notes in Computer Science, pages 351–366. Springer, 2013. ISBN 978-3-642-38287-1, 978-3-642-38288-8.

[37] Daniel Gildea and Daniel Jurafsky. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288, September 2002. ISSN 0891-2017. doi: 10.1162/089120102760275983. URL http://dx.doi.org/10.1162/089120102760275983.

[38] Dan Gillman. Iso/iec 11179-1 information technology – metadata registries (mdr) - part 1: Framework ed 3, 2012. URL http://metadata-standards.org/11179/#A1. Committee Draft 1.

[39] Dan Gillman. Iso/iec 11179-1 information technology – metadata registries (mdr) - part 1: Framework ed 3, 2013. URL http://metadata-standards.org/11179/#A1. Committee Draft 2.

[40] Scott A. Golder and Bernardo A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32:198–208, April 2006. ISSN 0165-5515. doi: 10.1177/0165551506062337. URL http://portal.acm.org/citation.cfm?id=1119738.1119747.

[41] Gregory Grefenstette and Pasi Tapanainen. What is a word, what is a sentence? problems of tokenization. In Proceedings of 3rd International Conference on Computational Lexicography (COMPLEX), pages 79–87, 1994.

[42] Thomas Grill and Manfred Tscheligi. Towards a multi-perspectival approach of describing context. In Michael Beigl, Henning Christiansen, Thomas R. Roth-Berghofer, Anders Kofod-Petersen, Kenny R. Coventry, and Hedda R. Schmidtke, editors, Modeling and Using Context, volume 6967 of Lecture Notes in Computer Science, pages 115–118. Springer, 2011. ISBN 978-3-642-24278-6. doi: 10.1007/978-3-642-24279-3_13. URL http://dx.doi.org/10.1007/978-3-642-24279-3_13.

[43] Steven Gross. Essays on Linguistic Context-Sensitivity and its Philosophical Significance. Routledge & Kegan Paul, 2001.

[44] T.R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199–220, 1993.

[45] John Gumperz and Jenny Cook-Gumperz. Context in children's speech. In Papers on Language and Context. University of California press, 1976.

[46] Ben Hachey, Will Radford, Joel Nothman, Matthew Honnibal, and James R. Curran. Evaluating entity linking with wikipedia. Artificial Intelligence, 194:130–150, 2012. URL http://dblp.uni-trier.de/db/journals/ai/ai194.html#HacheyRNHC13.

[47] Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. Integrating nlp using linked data. In Proceedings of 12th International Semantic Web Conference, Sydney, Australia, October 2013. Springer.

[48] C. Hentschel, J. Hercher, M. Knuth, J. Osterhoff, B. Quehl, H. Sack, N. Steinmetz, J. Waitelonis, and H. Yang. Open up cultural heritage in video archives with mediaglobe. In Proc. of 12th International Conference on Innovative Internet Community Services (I2CS 2012), number 204 in Lecture Notes in Informatics (LNI). Springer, 2012.

[49] Pascal Hitzler, Markus Krötzsch, and Sebastian Rudolph. Foundations of Semantic Web Technologies. Chapman & Hall, 2009.

[50] Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. Robust disambiguation of named entities in text. In Proc. of the Conf. on Empirical Methods in Natural Language Processing, EMNLP '11, pages 782–792, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics. ISBN 978-1-937284-11-4. URL http://dl.acm.org/citation.cfm?id=2145432.2145521.

[51] Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Martin Theobald, and Gerhard Weikum. KORE: Keyphrase overlap relatedness for entity disambiguation. In Proceedings of the 21st ACM international conference on Information and knowledge management, pages 545–554. ACM, 2012.

[52] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 1972. URL http://www.soi.city.ac.uk/~ser/idfpapers/ksj_orig.pdf.

[53] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Prentice Hall, 2000.

[54] Lauri Karttunen. Discourse referents. In Syntax and Semantics 7: Notes from the Linguistic Underground. Academic Press, 1976.

[55] S. Klein and R. Simmons. A computational approach to grammatical coding of english words. ACM Computer Survey, 10:334–347, 1963.

[56] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01, pages 282–289, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc. ISBN 1-55860-778-1. URL http://dl.acm.org/citation.cfm?id=645530.655813.

[57] Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics, 2013.

[58] Yoong Keok Lee and Hwee Tou Ng. An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10, EMNLP '02, pages 41–48, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics. doi: 10.3115/1118693.1118699. URL http://dx.doi.org/10.3115/1118693.1118699.

[59] Timothy Robert Leek. Information extraction using hidden markov models. Master’s thesis, University of California, San Diego, 1997.

[60] Douglas Lenat. The dimensions of context-space. Cycorp technical report, Cycorp, 1998.

[61] Michael Lesk. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the Fifth International Conference on Systems Documentation. ACM, 1986.

[62] David Lewis. General semantics. Synthese, 22(1-2):18–67, December 1970. ISSN 0039-7857. doi: 10.1007/BF00413598. URL http://dx.doi.org/10.1007/BF00413598.

[63] Nadine Ludwig and Harald Sack. Named entity recognition for user-generated tags. In Proceedings of the 2011 22nd International Workshop on Database and Expert Systems Applications, DEXA '11, pages 177–181, Washington, DC, USA, 2011. IEEE Computer Society. ISBN 978-0-7695-4486-1. doi: 10.1109/DEXA.2011.56. URL http://dx.doi.org/10.1109/DEXA.2011.56.

[64] Nadine Ludwig, Jörg Waitelonis, Magnus Knuth, and Harald Sack. WhoKnows? - evaluating linked data heuristics with a quiz that cleans up dbpedia. In Proceedings of the 8th Extended Semantic Web Conference (ESWC), Poster-Session. Springer, 2011.

[65] John C. Mallery. Thinking about foreign policy: Finding an appropriate role for artificially intelligent computers. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 1988.

[66] Christopher D. Manning. Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I, CICLing'11, pages 171–189, Berlin, Heidelberg, 2011. Springer. ISBN 978-3-642-19399-6. URL http://dl.acm.org/citation.cfm?id=1964799.1964816.

[67] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of english: The penn treebank. Computational Linguistics, 19(2):313–330, 1993.

[68] Jean-Pierre Martens. Continuous speech recognition over the telephone. Technical report, University of Gent, 2000.

[69] Mark T. Maybury. Introduction. In Multimedia Information Extraction - Advances in Video, Audio, and Imagery Analysis for Search, Data Mining, Surveillance, and Authoring. Wiley / IEEE Computer Society, 2013.

[70] Diana Maynard. In defence of sentiment analysis: The wrong kind of snow. http://tinyurl.com/njbotbv, October 2013. Last visited on November, 17th 2013.

[71] Diana Maynard, Kalina Bontcheva, and Dominic Rout. Challenges in developing opinion mining tools for social media. In Proceedings of @NLP Workshop at LREC 2012, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).

[72] Arturas Mazeika, Tomasz Tylenda, and Gerhard Weikum. Entity timelines: visual analytics and named entity evolution. In Proceedings of the 20th ACM international conference on Information and knowledge management, CIKM '11, pages 2585–2588, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0717-8. doi: 10.1145/2063576.2064026. URL http://doi.acm.org/10.1145/2063576.2064026.

[73] Patrick McCrae. A Computational Model for the Influence of Cross-Modal Context upon Syntactic Parsing. PhD thesis, Universität Hamburg, Von-Melle-Park 3, 20146 Hamburg, 2010. URL http://ediss.sub.uni-hamburg.de/volltexte/2010/4800.

[74] Pankaj Mehra. Context-aware computing: Beyond search and location-based services. IEEE Internet Computing, 16:12–16, 2012. ISSN 1089-7801. doi: http://doi.ieeecomputersociety.org/10.1109/MIC.2012.31.

[75] Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. Dbpedia spotlight: Shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, I-Semantics '11, pages 1–8, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0621-8. doi: 10.1145/2063518.2063519. URL http://doi.acm.org/10.1145/2063518.2063519.

[76] Rada Mihalcea. Co-training and self-training for word sense disambiguation. In Hwee, editor, HLT-NAACL 2004 Workshop: Eighth Conference on Computational Natural Language Learning, pages 33–40. Association for Computational Linguistics, May 2004.

[77] Rada Mihalcea and Andras Csomai. Wikify!: Linking documents to encyclopedic knowledge. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, CIKM '07, pages 233–242, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-803-9. doi: 10.1145/1321440.1321475. URL http://doi.acm.org/10.1145/1321440.1321475.

[78] David Milne and Ian H. Witten. Learning to link with wikipedia. In Proceedings of the 17th ACM conference on Information and knowledge management, CIKM '08, pages 509–518, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-991-3. doi: 10.1145/1458082.1458150. URL http://doi.acm.org/10.1145/1458082.1458150.

[79] Thomas M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1 edition, 1997. ISBN 0070428077, 9780070428072.

[80] William Morgan. Statistical hypothesis tests for nlp, 2013. URL http://masanjin.net/sigtest.pdf. Last visited on November, 13th 2013.

[81] Charles W. Morris. Foundations of the theory of signs. University of Chicago press, 1938.

[82] Roberto Navigli. Word sense disambiguation: A survey. ACM Computer Survey, 41(2):10:1–10:69, February 2009. ISSN 0360-0300. doi: 10.1145/1459352.1459355. URL http://doi.acm.org/10.1145/1459352.1459355.

[83] Hwee T. Ng. Getting serious about word sense disambiguation. In Proceedings of the SIGLEX Workshop. Association for Computational Linguistics, 1997.

[84] Eric W. Noreen. Computer Intensive Methods for Testing Hypothesis. John Wiley & Sons, 1989.

[85] Oded Nov and Chen Ye. Why do people tag?: motivations for photo tagging. Communications of the ACM, 53:128–131, July 2010. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/1785414.1785450. URL http://doi.acm.org/10.1145/1785414.1785450.

[86] Library of Congress Digital Repository Development. Core metadata elements. Technical report, Library of Congress, 1998. URL http://www.loc.gov/standards/metadata.html. Last visited on November, 13th 2013.

[87] Charles Kay Ogden and Ivor Armstrong Richards. The Meaning of Meaning: A Study of the Influence of Language Upon Thought and of the Science of Symbolism. Routledge & Kegan Paul, 1923.

[88] David D. Palmer. Tokenisation and sentence segmentation. In Handbook of Natural Language Processing. Marcel Dekker, Inc., 2000.

[89] Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, January 2008. ISSN 1554-0669. doi: 10.1561/1500000011. URL http://dx.doi.org/10.1561/1500000011.

[90] Massimo Poesio. Semantic analysis. In Handbook of Natural Language Processing. Marcel Dekker, Inc., 2000.

[91] J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993. ISBN 1-55860-238-0.

[92] R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30, 1989.

[93] Harald Sack. Semex: Enabling exploratory video search by semantic video analysis. In Working Notes of the LWA 2011 - Learning, Knowledge, Adaptation, 2011.

[94] Neela Sawant, Jia Li, and James Ze Wang. Automatic image semantic interpretation using social action and tagging data. Multimedia Tools Appl., 51(1):213–246, 2011. URL http://dblp.uni-trier.de/db/journals/mta/mta51.html#SawantLW11.

[95] Albrecht Schmidt, Michael Beigl, and Hans-W Gellersen. There is more to context than location. Computers & Graphics, 23(6):893–901, 1999. ISSN 0097-8493. doi: http://dx.doi.org/10.1016/S0097-8493(99)00120-X. URL http://www.sciencedirect.com/science/article/pii/S009784939900120X.

[96] Ingo Schmitt. Ähnlichkeitssuche in Multimedia-Datenbanken: Retrieval, Suchalgorithmen und Anfragebehandlung. Oldenbourg, 2006. ISBN 3-486-57907-X. 445 pages.

[97] Hinrich Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97–124, 1998.

[98] Satoshi Sekine. Named entity: History and future. Technical report, New York University, 2004.

[99] Prithviraj Sen. Collective context-aware topic models for entity disambiguation. In Proceedings of the 21st international conference on World Wide Web, WWW '12, pages 729–738, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1229-5. doi: 10.1145/2187836.2187935. URL http://doi.acm.org/10.1145/2187836.2187935.

[100] Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical Report UM-CS-2012-015, University of Massachusetts Amherst, 2012.

[101] John F. Sowa. Syntax, semantics, and pragmatics of contexts. Technical report, State University of New York at Binghamton, 1995.

[102] Louise F. Spiteri. The use of collaborative tagging in public library catalogues. Proceedings of the American Society for Information Science and Technology, 43(1):1–5, 2006. ISSN 1550-8390. doi: 10.1002/meet.14504301214. URL http://dx.doi.org/10.1002/meet.14504301214.

[103] Valentin I. Spitkovsky and Angel X. Chang. A cross-lingual dictionary for english wikipedia concepts. In Nicoletta Calzolari (Conf. Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proc. of the Eighth Int. Conf. on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May 2012. European Language Resources Association (ELRA). ISBN 978-2-9517408-7-7.

[104] Steffen Staab and Rudi Studer, editors. Handbook on Ontologies. International Handbooks on Information Systems. Springer, 2004. ISBN 3-540-40834-7.

[105] Paul Stapleton and Rena Helms-Park. Evaluating web sources in an eap course: Introducing a multi-trait instrument for feedback and assessment. English for Specific Purposes, 25(4):438–455, 2006. ISSN 0889-4906. doi: http://dx.doi.org/10.1016/j.esp.2005.11.001. URL http://www.sciencedirect.com/science/article/pii/S0889490605000700.

[106] Thomas Steiner. Semwebvid - making video a first class semantic web citizen and a first class web bourgeois. In Axel Polleres and Huajun Chen, editors, ISWC Posters&Demos, volume 658 of CEUR Workshop Proceedings. CEUR-WS.org, 2010. URL http://dblp.uni-trier.de/db/conf/semweb/pd2010.html#Steiner10.

[107] Nadine Steinmetz and Harald Sack. Semantic multimedia information retrieval based on contextual descriptions. In Philipp Cimiano, Óscar Corcho, Valentina Presutti, Laura Hollink, and Sebastian Rudolph, editors, Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013), volume 7882 of Lecture Notes in Computer Science, pages 382–396. Springer, May 2013.

[108] Nadine Steinmetz and Harald Sack. About the influence of negative context. In Proceedings of 7th International Conference on Semantic Computing (ICSC 2013). IEEE Computer Society, 2013.

[109] Nadine Steinmetz, Magnus Knuth, and Harald Sack. Statistical analyses of named entity disambiguation benchmarks. In Proceedings of 1st International Workshop on NLP & DBpedia at ISWC2013, volume 1064 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.

[110] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, WWW '07, pages 697–706, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-654-7. doi: 10.1145/1242572.1242667. URL http://doi.acm.org/10.1145/1242572.1242667.

[111] Nina Tahmasebi, Gerhard Gossen, Nattiya Kanhabua, Helge Holzmann, and Thomas Risse. Neer: An unsupervised method for named entity evolution recognition. In Martin Kay and Christian Boitet, editors, Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), Mumbai, India, December 2012. Coling 2012 Organizing Committee. URL http://L3S.de/neer-coling/.

[112] Peter Tarasewich. Towards a comprehensive model of context for mobile and wireless computing. In AMCIS, page 15. Association for Information Systems, 2003. URL http://dblp.uni-trier.de/db/conf/amcis/amcis2003.html#Tarasewich03.

[113] Emma Tonkin and Marieke Guy. Folksonomies: Tidying up tags? D-Lib, 12(1), January 2006. URL http://www.cs.bris.ac.uk/Publications/pub_info.jsp?id=2000478.

[114] Gerald Töpper, Magnus Knuth, and Harald Sack. Dbpedia ontology enrichment for inconsistency detection. In Proceedings of the 8th International Conference on Semantic Systems, I-SEMANTICS '12, pages 33–40, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1112-0. doi: 10.1145/2362499.2362505. URL http://doi.acm.org/10.1145/2362499.2362505.

[115] Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 173–180, Morristown, NJ, USA, 2003. Association for Computational Linguistics. doi: 10.3115/1073445.1073478. URL http://portal.acm.org/citation.cfm?id=1073445.1073478.

[116] Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. Joint learning improves semantic role labeling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, pages 589–596, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics. doi: 10.3115/1219840.1219913. URL http://dx.doi.org/10.3115/1219840.1219913.

[117] Ton van der Wouden. Negative Contexts: Collocation, Polarity and Multiple Negation. Taylor & Francis, 1997.

[118] Teun A. van Dijk. Text and context. Explorations in the semantics and pragmatics of discourse. Longman Linguistics Library, London, 1977.

[119] Howard D. Wactlar, Alexander G. Hauptmann, Michael G. Christel, Ricky A. Houghton, and Andreas M. Olligschlaeger. Complementary video and audio analysis for broadcast news archives. Communications of the ACM, 43(2):42–47, February 2000. ISSN 0001-0782. doi: 10.1145/328236.328144. URL http://doi.acm.org/10.1145/328236.328144.

[120] Jörg Waitelonis and Harald Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, pages 1–28, 2011. ISSN 1380-7501. URL http://dx.doi.org/10.1007/s11042-011-0733-1.

[121] Maciej Witek. Introduction. Philosophica, 75, 2005.

[122] Stephen Wolfram. A quick introduction to wolfram|alpha. http://tinyurl.com/q735x7, May 2009. Last visited on November, 13th 2013.

[123] H-J. Yang, B. Quehl, and H. Sack. Text detection in video images using adaptive edge detection and stroke width verification. In Proc. of 19th Int. Conf. on Systems, Signals and Image Processing (IWSSIP), pages 9–12, Vienna, Austria, April 11–13 2012. IEEE Computer Society.

[124] H-J. Yang, B. Quehl, and H. Sack. A skeleton based binarization approach for video text recognition. In Proc. IEEE Int. WS. on Image Analysis for Multimedia Interactive Service, pages 1–4, Dublin, Ireland, May 2012. IEEE Computer Society.

[125] David Yarowsky. Word-sense disambiguation. In Handbook of Natural Language Processing. Marcel Dekker, Inc., 2000.

[126] Mohamed Amir Yosef, Johannes Hoffart, Ilaria Bordino, Marc Spaniol, and Gerhard Weikum. Aida: An online tool for accurate disambiguation of named entities in text and tables. PVLDB, 4(12):1450–1453, 2011. URL http://dblp.uni-trier.de/db/journals/pvldb/pvldb4.html#YosefHBSW11.

DECLARATION

I hereby declare that I have written this dissertation independently and that I have not used any aids other than those indicated. Passages of the dissertation that were taken from other sources, either verbatim or in substance, are marked with a reference to their origin. This also applies to drawings, sketches, pictorial representations, and sources from the Internet.

Potsdam, December 2013

Nadine Steinmetz