University of Southampton Research Repository Eprints Soton
Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g. AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination

http://eprints.soton.ac.uk

UNIVERSITY OF SOUTHAMPTON

Media Fragment Semantics: The Linked Data Approach

by Yunjia Li

A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy

in the Faculty of Physical Sciences and Engineering
School of Electronics and Computer Science

June 2015

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF PHYSICAL SCIENCES AND ENGINEERING
SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE

Doctor of Philosophy

by Yunjia Li

In the last few years, the explosion of multimedia content on the Web has made multimedia resources first-class citizens of the Web. While these resources are easily stored and shared, it is becoming more difficult to find specific video/audio content, especially to identify, link, navigate, search and share the content inside multimedia resources. The concept of media fragment refers to deep linking into multimedia resources, but making annotations to media fragments and linking them to other resources on the Web have yet to be adopted.
The Linked Data principles offer guidelines for publishing Linked Data on the Web, so that data can be better connected to each other and explored by machines. Publishing media fragments and annotations as Linked Data will enable the media fragments to be transparently integrated into current Web content.

This thesis takes the Linked Data approach to realise the interlinking of media fragments to other resources on the Web and demonstrates how Linked Data can help improve the indexing of media fragments. The thesis first identifies the gap between media fragments and Linked Data, and the major requirements that need to be fulfilled to bridge that gap, based on the current situation of presenting and sharing multimedia data on the Web. Then, by extending the Linked Data principles, it proposes the Interlinking Media Fragment Principles as the basic rationale and best practice for applying Linked Data principles to media fragments. To further automate the media fragment publishing process, a core RDF model and a media fragment enriching framework are designed to link media fragments into the Linked Open Data Cloud via annotations and to visualise media fragments on Web pages. A couple of examples are implemented to demonstrate the use of interlinked media fragments, including enriching YouTube videos with named entities and using media fragments for video classification. The Media Fragment Indexing Framework is proposed to solve the fundamental problem of indexing media fragments for search engines and, as an example, Twitter is adopted as the source for media fragment annotations.

The thesis concludes that applying Linked Data principles to media fragments will bring semantics to media fragments, which will improve multimedia indexing at a fine-grained level, and that new research areas can be explored based on the interlinked media fragments.

Contents

Nomenclature
Acknowledgements
1 Introduction
  1.1 Research Questions
  1.2 Contributions of the Thesis
  1.3 Structure of the Thesis
  1.4 Acknowledgements
2 Literature Review
  2.1 Multimedia Annotations and Synote
  2.2 Semantic Web and Linked Data
    2.2.1 Linked Data Technologies and Principles
    2.2.2 Linked Datasets
    2.2.3 Named Entity Recognition and Disambiguation
    2.2.4 Linked Data Publishing Patterns
    2.2.5 The Infrastructure of Linked Data-Driven Applications
  2.3 Media Fragments and Multimedia Annotation Ontologies
    2.3.1 W3C Media Fragment URI
    2.3.2 Ontology for Media Resource
    2.3.3 Other Related Standards and Vocabularies
  2.4 Applications of Media Fragments and Semantic Multimedia Annotation
  2.5 Video Classification
  2.6 Summary
3 Use Case Study and Requirement Analysis
  3.1 Interlinked Media Fragments Usage on the Web
    3.1.1 Use Case 1 (UC1): Sharing Media Fragments in Social Networks
    3.1.2 Use Case 2 (UC2): Search Media Fragments Based on User-generated Content
    3.1.3 Use Case 3 (UC3): Reasoning Based on Timeline
  3.2 Multi-disciplinary Cases
    3.2.1 Use Case 4 (UC4): UK Parliamentary Debate
    3.2.2 Use Case 5 (UC5): US Supreme Court Argument Analysis
    3.2.3 Use Case 6 (UC6): Improving Political Transparency by Analysing Resources in Public Media
  3.3 Applying Media Fragments and Linked Data to Existing Applications
    3.3.1 Use Case 7 (UC7): Media Fragments in YouTube
    3.3.2 Use Case 8 (UC8): Media Fragments in Facebook
    3.3.3 Use Case 9 (UC9): Publishing Media Fragments in the Synote System
    3.3.4 Use Case 10 (UC10): Publishing Media Fragments for Edina Mediahub
  3.4 Problem Discussion
    3.4.1 Two Types of "Server"
    3.4.2 Two Different Concepts of "Video"
    3.4.3 Three cases of "Annotations"
  3.5 Summary of Requirements
4 Interlinking Media Fragments Principles
  4.1 Choosing URIs for Media Fragments
  4.2 Dereferencing URIs for Media Fragment
  4.3 RDF Description for Media Fragments
    4.3.1 Choosing Vocabularies to Describe Media Fragments
    4.3.2 Describe Media Fragments Annotations
  4.4 Media Fragments Interlinking
  4.5 Summary of the Principles
5 A Framework for Linking Media Fragments into the Linked Data Cloud
  5.1 RDF Descriptions for Media Fragments
    5.1.1 Core Model for Media Fragment Enrichment
    5.1.2 Example Implementation of Core Model for Media Fragment Enrichment
  5.2 Media Fragment Enriching Framework
    5.2.1 Metadata and Timed-Text Retrieval
    5.2.2 Named Entity Extraction, Timeline Alignment and RDF Generation
    5.2.3 Media Fragment Enricher UI
    5.2.4 Implementation of Media Fragment Enricher Services
  5.3 Creating Enriched YouTube Media Fragments with NERD using Timed Text
  5.4 Video Classification using Media Fragments and Semantic Annotations
    5.4.1 Dataset
    5.4.2 Methodology
    5.4.3 Experiments and Discussion
      5.4.3.1 Overall Accuracy
      5.4.3.2 Breakdown scores per Channel
    5.4.4 Summary of the Video Classification
  5.5 Summary
6 Visualisation of Media Fragments and Semantic Annotations
  6.1 Synote Media Fragment Player
    6.1.1 Motivation
    6.1.2 Implementation and Evaluation of the smfplayer
  6.2 Visualise Media Fragments with Annotations
  6.3 Summary
7 Automatic Media Fragments Indexing For Search Engines
  7.1 Media Fragments Indexing Framework
    7.1.1 Problem Analysis
    7.1.2 Implementation
    7.1.3 Evaluation and Discussion
  7.2 A Survey of Media Fragment Implementation on Video Sharing Platforms
    7.2.1 Methodology
    7.2.2 Survey Results
    7.2.3 Discussion
  7.3 Indexing Media Fragments Using Social Media
    7.3.1 Workflow of Twitter Media Fragment Indexer
    7.3.2 Implementation and Evaluation of Twitter Media Fragment Indexer
  7.4 Summary
8 Conclusions and Future Work
  8.1 Conclusions
  8.2 Future Work
    8.2.1 Media Fragment Semantics in Debate Analysis
    8.2.2 Extension of Media Fragment Indexing Framework
    8.2.3 Visualisation of Media Fragments and Annotations On Second Screen
    8.2.4 Video Classification with Media Fragments
    8.2.5 Formal Descriptions of Interlinking Media Fragments Principles
A Namespaces Used in the Thesis
B Results of Twitter Media Fragment Monitor Using Twitter Firehose API
C The Formal Definition of Media Fragment Publishing
  C.1 RDF Documents as Labelled Directed Multigraph
  C.2 A Mathematics Description for Linked Data Principles
  C.3 Formalisation of Publishing Media Fragments as Linked Data
Bibliography

List of Figures

1.1 The gap between raw multimedia data and interlinked multimedia annotations
1.2 Relationship Between Different Chapters in the Thesis
2.1 Synote Object Model
2.2 Linked Data Cloud at September 2011
2.3 Linked Data Publishing Patterns (Heath and Bizer, 2011)
2.4 Concept of a Linked Data-driven Web application (Hausenblas, 2009)
3.1 An example of interlinking media fragments to other resources
3.2 Addressing relationships between media fragments using Time Ontology
3.3 Screenshot of Synote System
3.4 Edina Mediahub's architecture
3.5 The relationship between Player Server and Multimedia Host Server
3.6 The dependencies among requirements
4.1 Use Content Negotiation and 303 Redirect to dereference Media Fragments
4.2 Media Fragments Interlinking and Publishing Patterns
5.1 The graph depicts the MediaFragment serialization and how an Entity and its corresponding text are attached to a MediaFragment through an Annotation.