Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks
Total Page:16
File Type:pdf, Size:1020Kb
Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks Thomas Steiner Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya A thesis submitted for the degree of Philosophiæ Doctor (PhD) February 2014 1st Advisor: Joaquim Gabarró Vallés Universitat Politècnica de Catalunya, Barcelona, Spain 2nd Advisor: Michael Hausenblas Digital Enterprise Research Institute, Galway, Ireland MapR Technologies, San Jose, CA, USA ii Abstract (i) Mobile devices and social networks are omnipresent Mobile devices such as smartphones, tablets, or digital cameras together with social networks enable people to create, share, and consume enormous amounts of media items like videos or photos both on the road or at home. Such mobile devices—by pure definition—accompany their owners almost wherever they may go. In consequence, mobile devices are omnipresent at all sorts of events to capture noteworthy moments. Exemplary events can be keynote speeches at conferences, music concerts in stadiums, or even natural catastrophes like earthquakes that affect whole areas or countries. At such events—given a stable network connection—part of the event-related media items are published on social networks both as the event happens or afterwards, once a stable network connection has been established again. (ii) Finding representative media items for an event is hard Common media item search operations, for example, searching for the of- ficial video clip for a certain hit record on an online video platform can in the simplest case be achieved based on potentially shallow human-generated metadata or based on more profound content analysis techniques like optical character recognition, automatic speech recognition, or acoustic fingerprint- ing. More advanced scenarios, however, like retrieving all (or just the most representative) media items that were created at a given event with the ob- jective of creating event summaries or media item compilations covering the event in question are hard, if not impossible, to fulfill at large scale. The central research question of this thesis can be formulated as follows. (iii) Central research question “Can user-customizable media galleries that summarize given events be created solely based on textual and multimedia data from social networks?” (iv) Core contributions In the context of this thesis, we have developed and evaluated a novel interac- tive application and related methods for media item enrichment, leveraging social networks, utilizing the Web of Data, techniques known from Content- based Image Retrieval (CBIR) and Content-based Video Retrieval (CBVR), and fine-grained media item addressing schemes like Media Fragments URIs to provide a scalable and near realtime solution to realize the abovemen- tioned scenario of event summarization and media item compilation. (v) Methodology For any event with given event title(s), (potentially vague) event location(s), and (arbitrarily fine-grained) event date(s), our approach can be divided in the following six steps. 1. Via the textual search APIs (Application Programming Interfaces) of different social networks, we retrieve a list of potentially event-relevant microposts that either contain media items directly, or that provide links to media items on external media item hosting platforms. 2. Using third-party Natural Language Processing (NLP) tools, we rec- ognize and disambiguate named entities in microposts to predetermine their relevance. 3. We extract the binary media item data from social networks or media item hosting platforms and relate it to the originating microposts. 4. Using CBIR and CBVR techniques, we first deduplicate exact-duplicate and near-duplicate media items and then cluster similar media items. 5. We rank the deduplicated and clustered list of media items and their related microposts according to well-defined ranking criteria. 6. In order to generate interactive and user-customizable media galleries that visually and audially summarize the event in question, we compile the top-n ranked media items and microposts in aesthetically pleasing and functional ways. To Laura, Lena, Emma, and Nil. Acknowledgements Personal Acknowledgements: First and foremost, I would like to thank my wife Laura for her support, understanding, patience, and energy during my time as a PhD student, and simply for being at my side. Without you, I would not be where I am today. I wholeheartedly thank my two advisors Joaquim Gabarró Vallés and Michael Hausenblas for their guidance, helpful comments, informative pointers, and especially for their constructive criticisms. The areas of research that I have tackled in this thesis are still young and sometimes uncharted territory. I am very thankful that the two of you have ventured on the undertaking of leading me through this thesis. I deeply appreciate all the review comments, thoughts, challenging ques- tions, and, last not least, the LATEX help of my dear friend and research col- league Ruben Verborgh. It was, is, and will be an honor to work with you. My warm thanks also go to Raphaël Troncy and his team at EURECOM Sophia Antipolis in France who have helped shape some of the ideas pre- sented in this thesis. I would like to thank my former and current managers at Google, namely N. Kryvossidis, R. Ashley, C. Bouchère, and I. Sassarini for their support for my thesis. A lot of valuable input for my thesis came in via social networks. My thanks go out to everyone I have interacted with around the hashtag #TomsPhD on Twitter, Google+, and Facebook. Finally, I sincerely thank my parents and my brother who have made me the person I am today. My parents have taught me not to go for the easy choices, even if at times they may seem tempting, but instead to try harder and never give up. This thesis also is for you. Formal Acknowledgements: The research presented in this doctoral thesis was partially supported by the European Commission under Grant No. 248296 with the European Union (FP7 ICT STReP) project I-SEARCH . To cite this document: @phdthesis{steiner2013thesis, author = {Thomas Steiner}, title = {Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks}, year = {2013}, school = {Universitat Polit\`{e}cnica de Catalunya} } Copyright and License: ©2013 Thomas Steiner Licensed under the Creative Commons Attribution-ShareAlike 3.0 License. http://creativecommons.org/licenses/by-sa/3.0/ Contents List of Figures xiii List of Listings xv List of Tables xix Glossary xxi 1 Event Summarization Challenge 1 1.1 Motivation and Problem Statement . 1 1.2 Research Question and Hypothesis . 2 1.3 Approach . 2 1.4 Related Work . 3 1.5 Contributions . 6 1.5.1 Social Network Multimedia and Data Analysis . 6 1.5.2 Application of Semantic Web and NLP Techniques . 6 1.5.3 Video Content and Metadata Analysis . 6 1.5.4 Event Detection . 7 1.5.5 Standardization and Specifications . 7 1.5.6 Crowdsourcing . 7 1.5.7 Studies . 7 1.5.8 Multimodal Search Engines . 7 1.5.9 Web Service Description . 7 1.5.10 Others . 8 1.6 Thesis Structure . 8 v CONTENTS 2 Semantic Web Technologies 21 2.1 The World Wide Web and Semantics . 21 2.2 The Semantic Web . 22 2.2.1 The Non-Semantic Web . 22 2.2.2 Structured Data on the Web . 23 2.2.3 The Structured Knowledge Base DBpedia . 24 2.2.4 Semantics in HTML Versions 4.01 and 5 . 25 2.2.5 Structured Data Beyond Pure HTML . 27 Microformats . 27 Microdata . 29 2.3 Resource Description Framework (RDF) . 30 2.3.1 Triples as a Data Structure . 30 2.3.2 Important RDF Serialization Syntaxes . 31 RDF Sample Graph . 31 The RDF/XML Syntax . 31 The N-Triples Syntax . 32 The Turtle Syntax . 33 The Notation3 Syntax . 33 The RDFa Syntax . 34 2.4 SPARQL: Semantic Web Query Language . 35 2.4.1 The Vision of the Web as a Giant Single Database . 35 2.4.2 Different SPARQL Query Variations . 36 SELECT . 37 DESCRIBE . 37 ASK . 37 CONSTRUCT . 37 2.5 Linked Data . 38 2.5.1 The Linked Data Principles . 39 2.5.2 The Linking Open Data Cloud Diagram . 39 2.6 Conclusions . 40 vi CONTENTS 3 Social Networks 47 3.1 Definition of Terms Used in this Thesis . 48 3.2 Description of Popular Social Networks . 49 3.2.1 Facebook . 49 3.2.2 Flickr . 49 3.2.3 Google+ ................................... 50 3.2.4 Img.ly . 50 3.2.5 Imgur . 50 3.2.6 Instagram . 51 3.2.7 Lockerz . 51 3.2.8 MobyPicture . 51 3.2.9 Myspace . 52 3.2.10 Photobucket . 52 3.2.11 Twitpic . 52 3.2.12 Twitter . 53 3.2.13 Yfrog . 53 3.2.14 YouTube . 54 3.3 Decentralized Social Networks . 54 3.4 Classification of Social Networks . 55 3.5 Conclusions . 56 4 Micropost Annotation 61 4.1 Introduction . 61 4.1.1 Direct Access to Micropost Raw Data . 61 4.1.2 Browser Extensions to Access Microposts Indirectly . 62 4.1.3 Data Analysis Flow . 62 4.1.4 A Wrapper API for Named Entity Disambiguation . 63 4.2 Related Work . 63 4.2.1 Named Entity Disambiguation Using Lexical Databases . 64 4.2.2 Named Entity Disambiguation Using Semantic Coherence . 64 4.2.3 Named Entity Disambiguation Using Dictionaries . 64 4.2.4 Disambiguation Using Corpuses and Probability . 65 4.2.5 Disambiguation Using Search Query Logs . 65 vii CONTENTS 4.2.6 Combining Different Web Services and Provenance . 65 4.3 Structuring Unstructured Textual Data . 66 4.3.1 Natural Language Processing Services . 66 OpenCalais . 67 AlchemyAPI . 68 Zemanta . 68 DBpedia Spotlight . 69 4.3.2 Machine Translation . 69 4.3.3 Part-of-Speech Tagging . 69 4.4 Consolidating Named Entity Disambiguation Results .