Data Ingestion, Analysis and Annotation Version 2.0
Total Page:16
File Type:pdf, Size:1020Kb
Enhancing and Re-Purposing TV Content for Trans-Vector Engagement (ReTV) H2020 Research and Innovation Action - Grant Agreement No. 780656 Enhancing and Re-Purposing TV Content for Trans-Vector Engagement Deliverable 1.2 (M20) Data Ingestion, Analysis and Annotation Version 2.0 This document was produced in the context of the ReTV project supported by the European Commission under the H2020-ICT-2016-2017 Information & Communication Technologies Call Grant Agreement No 780656 D1.2 Data Ingestion, Analysis and Annotation DOCUMENT INFORMATION Delivery Type Report Deliverable Number 1.2 Deliverable Title Data Ingestion, Analysis and Annotation Due Date M20 Submission Date August 31, 2019 Work Package WP1 Partners CERTH, MODUL Technology, webLyzard Konstantinos Apostolidis, Nikolaos Gkalelis, Evlampios Apostolidis, Alexandros Pournaras, Vasileios Mezaris Author(s) (CERTH), Lyndon Nixon, Adrian M.P. Braşoveanu (MODUL Technology), Katinka Boehm (webLyzard) Reviewer(s) Basil Philipp (Genistat) Data Ingestion, TV Program Annotation, TV Program Keywords Analysis, Social Media Retrieval, Web Retrieval, Video Analysis, Concept Detection, Brand Detection Dissemination Level PU MODUL Technology GmbH Project Coordinator Am Kahlenberg 1, 1190 Vienna, Austria Coordinator: Dr Lyndon Nixon ([email protected]) R&D Manager: Prof Dr Arno Scharl Contact Details ([email protected]) Innovation Manager: Bea Knecht ([email protected]) Page 2 of 65 D1.2 Data Ingestion, Analysis and Annotation Revisions Version Date Author Changes V. Mezaris, 0.1 30/6/19 Created template and ToC K. Apostolidis 0.2 18/7/19 L. Nixon Initial input (Section 2) 0.3 26/7/19 L. Nixon Completed inputs (Sections 2 and 3) K. Apostolidis, 0.4 27/7/19 Added parts for Sections 4.1, 5 and 6 E. Apostolidis 0.5 27/7/19 A. Pournaras Added parts for Section 6 0.6 29/7/19 A. Brasoveanu Started NER/NEL text for Section 3 0.7 30/7/19 N. Gkalelis Input for Section 4.2 0.8 31/7/19 K. Apostolidis Revision of Sections 4 and 5 0.85 1/8/19 K. Boehm Checked and completed SKB descriptions Revised sections and checked all MODUL 0.95 14/8/19 L. Nixon input 1.0 19/8/19 A. Brasoveanu Finished NER/NEL text with evaluations 1.5 22/8/19 B. Philipp QA review K. Apostolidis, 2.0 27/8/19 Post-QA updates L. Nixon Page 3 of 65 D1.2 Data Ingestion, Analysis and Annotation Statement of Originality This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both. This deliverable reflects only the authors’ views and the European Union is not liable for any use that might be made of information contained therein. Page 4 of 65 D1.2 Data Ingestion, Analysis and Annotation Contents 1 Introduction 8 2 Content Collection Across Vectors 9 2.1 Status . .9 2.2 Updates from Previous Report . 11 2.3 Evaluation of Data Quality . 12 2.4 Outlook . 13 3 Annotation and Knowledge Graph Alignment 14 3.1 Status . 14 3.2 Updates from Previous Report . 16 3.2.1 NLP . 16 3.2.2 SKB . 16 3.2.3 Recognyze NER/NEL . 17 3.3 Evaluation of Annotation Quality . 19 3.3.1 Recognyze General Evaluation . 19 3.3.2 Recognyze Media Annotations . 20 3.3.3 Discussion . 22 3.4 Outlook . 23 4 Concept-Based Video Abstractions 25 4.1 Video Fragmentation . 25 4.1.1 Updated Problem Statement and State of the Art . 25 4.1.2 Fast DCNN-based Video Fragmentation . 27 4.1.3 Implementation Details and Use . 28 4.1.4 Results . 28 4.2 Concept-based Annotation . 30 4.2.1 Updated Problem Statement and State of the Art . 30 4.2.2 Extended Concept Pools for Video Annotation . 31 4.2.3 Improved Deep Learning Architecture . 34 4.2.4 Implementation Details and Use . 37 4.2.5 Results . 37 5 Brand Detection 40 5.1 Updated Problem Statement and State of the Art . 40 5.2 Extended Brand Pools for Video Annotation . 43 5.3 ReTV method for Ad Detection . 44 5.4 Implementation Details and Use . 47 5.5 Results . 48 6 Updated Video Analysis Component, Workflow and Extended API 51 6.1 Video Analysis Component Functionalities and Outputs . 51 6.2 Component API and Usage Instructions . 51 6.3 Component Workflow and Integration . 56 6.4 Component Testing and Software Quality Assessment . 56 7 Conclusion and Outlook 58 Page 5 of 65 D1.2 Data Ingestion, Analysis and Annotation EXECUTIVE SUMMARY This deliverable is an update of D1.1. It covers the topics of content collection across vectors, annotation and knowledge graph alignment, concept-based video abstractions, and brand de- tection. Specifically, on content collection across vectors it reports on the current status of the collection process including quality management. Concerning annotation and knowledge graph alignment, it updates on the implementation of an accurate NLP & NEL pipeline for annotat- ing the collected data with respect to keywords and Named Entities and aligning annotaions to our Semantic Knowledge Base (SKB) as well as external knowledge sources. With respect to concept-based video abstractions, it presents a fast learning-based method for video fragmen- tation to shots and scenes, a new deep learning architecture for improved concept detection that is suitable for learning from large-scale annotated datasets such as YouTube8M, and an extended set of concept pools that are supported by the ReTV Video Analysis service. Con- cerning brand detection, this deliverable discusses the newly-adopted method for brand (logo) detection in ReTV, the extended pools of brands that were specified, and a method that was developed for advertisement detection. Finally, this document also presents in detail the ReTV WP1 Video Analysis REST service, which implements the methods for concept-based video abstraction and brand detection in a complete, integration-ready software component. Page 6 of 65 D1.2 Data Ingestion, Analysis and Annotation ABBREVIATIONS LIST Abbreviation Description Application Programming Interface: a set of functions and procedures that API allow the creation of applications which access the features or data of an application or other service. DCNN Deep Convolutional Neural Network: a type of artificial neural network. Electronic Program Guides: menu-based systems that provide users of EPG television with continuously updated menus displaying broadcast programming or scheduling information for current and upcoming programming. Types of method in the Hypertext Transfer Protocol (HTTP). The HTTP HTTP POST method is used to send data to a server to create/update a resource. POST/GET The HTTP GET method is used to request data from a specified resource. Internet Protocol Television: is the delivery of television content over Internet IPTV Protocol (IP) networks. JSON JavaScript Object Notation: a data-interchange format. LSTM Long Short Term Memory networks: a type of recurrent neural network. Multi-task learning: a field of machine learning in which multiple learning tasks MTL are solved at the same time, exploiting commonalities and differences across tasks. NEL Named Entity Linking NER Named Entity Recognition Natural Language Processing: subfield of linguistics, computer science, and NLP artificial intelligence concerned with the interactions between computers and human (natural) languages. Over The Top: content providers that distribute streaming media as a OTT standalone product directly to viewers over the Internet, bypassing telecommunications that traditionally act as a distributor of such content. Resource Description Framework: a method for conceptual description or RDF modeling of information that is implemented in web resources. Representational State Transfer: an architectural style that defines a set of REST constraints to be used for creating web services. RNN Recurrent Neural Network: a type of an artificial neural network. Semantic Knowledge Base: a RDF-based triple store for a knowledge SKB representation of keywords and entities during in annotation of documents Transactional Video on Demand: a distribution method by which customers TVoD pay for each individual piece of video on demand content. Uniform Resource Locator: a reference to a web resource that specifies its URL location on a computer network and a mechanism for retrieving it. Page 7 of 65 D1.2 Data Ingestion, Analysis and Annotation 1. Introduction This deliverable reports on the work done from month M11 to M20. In this reporting pe- riod all tasks, (i.e., T1.1: Content Collection Across Vectors, T1.2: Concept-Based Video Abstractions, T1.3: Brand Detection in Video, and T1.4: Annotation and Knowledge Graph Alignment) were active. Firstly, the content collection was started by MODUL regarding an initial set of TV broadcast- ers and TV series, using the annotation model described in D1.1 of ReTV [60]. Specifically, we both crawl the Web for news and TV related content as well as collect social media content via TV related channels and a list of terms for public posts by users (Section 2). All collected data is annotated according to keywords and Named Entities, supported by our own Knowledge Graph (Section 3). We report on the current status of the data collection and annotation pipeline including quality management for online data sources and an evaluation of our Named Entity annotation accuracy. In the direction of video fragmentation and annotation, CERTH developed new technologies for learning-based video fragmentation to shots and scenes, and a new deep learning architecture for improved concept detection that is suitable for learning from large-scale annotated datasets such as YouTube8M (Section 4). Alongside this, the set of concept pools that are supported for concept detection was expanded. State-of-the-art object detection frameworks were tested for brand recognition. We report (i) the selection of the framework that best suits the project’s requirements, (ii) the extended pools of brands whose detection is supported, and (iii) a method that was developed for advertisement de- tection (Section 5).