Creating a Novel Semantic Video Through Enrichment of Textual and Temporal Features of Subtitled YouTube Media Fragments

Babak Farhadi
Department of Computer Engineering, School of Engineering
University of Tehran, International Campus
Kish Island, Iran
[email protected]

M. B. Ghaznavi-Ghoushchi
Dept. of EE
Shahed University
Tehran, Iran
ghaznavi AT shahed.ac.ir

Abstract— Semantic video annotation is an active research area within the field of multimedia content understanding. With the steady increase of videos published on popular video-sharing platforms such as YouTube, growing effort is spent on annotating these videos automatically. In this paper, we propose a novel framework for annotating subtitled YouTube videos using both textual features, such as all of the portions extracted by web natural language processors, and temporal features, such as the duration of the media fragments where particular entities are spotted. We implement SY-VSE (Subtitled YouTube Video Search Engine) as an efficient framework for cruising over subtitled YouTube videos resident in the Linked Open Data (LOD) cloud. To realize this, we propose the Unifier Module of Natural Language Processing (NLP) Tools Results (UM-NLPTR) for extracting the main portions of 10 NLP web tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated with resources from the LOD cloud. We then propose the Unifier Module of Popular APIs' Results (UM-PAR), containing seven popular web APIs, to boost the Named Entity (NE) results obtained from UM-NLPTR. We use dotNetRDF as a powerful and flexible API for working with the Resource Description Framework (RDF) and SPARQL in .Net environments.

Index Terms — Subtitled YouTube video, NLP web tools, textual metadata, semantic web, video annotation, video search.

I. INTRODUCTION

To make the discussions in this introduction easier to follow, we have divided them into six subsections, summarized as follows.

A. Influence of natural language processors on on-line video platforms

Nowadays, on-line video-sharing platforms, especially YouTube, show that video has become the medium of choice for many people communicating via the net. On the other hand, the enormous growth of online video content confronts the consuming user with an indefinite amount of data, which can only be accessed with advanced retrieval and management technologies in order to retrieve the few needles from the giant haystack. Most on-line video search engines provide a keyword-based search, where the lexical ambiguity of natural language often leads to imprecise and defective results. For instance, YouTube supports a keyword-based search within the textual metadata provided by the users, accepting all the shortcomings caused by, e.g., homonyms (see Figure 1). Optimal results are possible through the analysis of the available textual metadata with NLP web tools and popular web APIs, especially given the availability of subtitles on YouTube videos.

Figure 1. Structure of subtitled YouTube keyword-based search.

B. Semantic multimedia annotations and Linked Data

The role of semantic web technologies is to make the implicit meaning of content explicit by providing suitable metadata annotations based on formal knowledge representations. In this area, Linked Data (LD) is a means to expose, share and connect pieces of data, information and knowledge on the semantic web, using Uniform Resource Identifiers (URIs) for the identification of resources and RDF as a structured data format. It creates relationships from the data to other sources on the Web. These data sets are not only accessible to human beings, but also readable by machines. LOD aims to publish and connect open but heterogeneous databases by applying the LD principles. The aggregation of all LOD data sets is denoted the LOD cloud. In this context, the term "media fragment" refers to the inside content of multimedia objects, such as a certain region within an image, or a five-minute segment within a one-hour video. Most research on media fragments focuses on exposing the closed data, such as tracks and video segments, within the multimedia file or the host server using semantic web and LD technologies.

C. Achieving indexable semantic data in on-line video search engines

To make use of semantic web information, search engines need to index semantic data. Generally, this can be achieved by storing the data in triple stores. Triple stores are Database Management Systems (DBMS) for data modeled using RDF. They store RDF triples and are queried through a SPARQL endpoint. To turn a non-semantic video search engine into a semantic video search engine, the already existing keyword-based textual metadata has to be mapped to semantic web entities. The most challenging problem in mapping data to semantic web entities is the existence of ambiguous names, which results in a set of entities that have to be disambiguated. One of our important goals in this paper is the mapping of textual and temporal features of subtitled YouTube videos to LOD entities.

D. Role of video transcripts in semantic video indexing

In our opinion, one of the most promising approaches to semantic video indexing is based on the extraction of semantics from subtitles. Subtitles carry such information through natural language sentences. YouTube video content is described both by metadata referring to the entire video and by information assigned to distinct time positions within the video. Our intention is entity mapping of both kinds of metadata: the YouTube subtitled video metadata (e.g. title, description, rating, publishing information, thumbnail view, category, etc.) and its attached transcripts. In addition, since textual features carrying higher-level semantic concepts are easier to understand, they are very useful in video classification.

E. Relationships between video transcripts, natural language processors and named entities

We must evaluate and analyze subtitles in order to calculate the relatedness between textual inputs. For this purpose, we propose using NLP web tools to discover relatedness between words that possibly represent the same concepts. Tools for analyzing words and sentences within NLP include part-of-speech (POS) tagging and word sense disambiguation (WSD). These tools locate and identify the sense and concept that a word represents. NLP is not only about understanding, but also about the ability to manipulate data and, in some cases, produce answers. NLP is also strongly connected with information retrieval (IR) for searching and retrieving information using some of these concepts. A word can be classified into several linguistic word types, for instance hypernyms, homonyms and hyponyms. A hypernym is a less specific term than the original word; hyponyms are the opposite, a more specific instance of the given word; homonyms are words with the same form but different meanings. In order to maximize the usability of NLP, we need a defined ontology. NE extraction is an exhaustive task in the NLP field that has yielded numerous services gaining popularity in the Semantic Web community for extracting knowledge from web documents. These services are generally organized as pipelines, using dedicated APIs and different taxonomies for extracting, classifying and disambiguating the main NLP portions.

F. Overview of the main targets of our research

In this paper, by utilizing a rich data model based on the best NLP web tools, popular web APIs, ontologies and RDF, we have developed a web application called SY-VSE (Proposed). Previous works close to ours have not used all of the portions and ideas realized in this paper. They have used NLP web tools only for detecting NEs (in a limited set of main types). However, for example, the main portions of AlchemyAPI include concept tagging, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization and more; previous works treat OpenCalais and the other best NLP web tools in the same restricted way. By using popular web APIs such as Amazon web service, Google Maps, Wolfram Alpha, Google Books, Library of Congress, Internet Movie Database (IMDb) and ThingISBN, we can substantially boost the semantic meaning of NEs, yet none of the preceding works have used these popular web APIs in their approaches. In addition, in the foregoing web applications the user cannot use a SPARQL endpoint to query RDF datasets: there is no SPARQL GUI for user queries against triple stores, and there are no RDF syntax outputs (e.g. NTriples, Turtle, TriG, TriX, RDFa, etc.) to save to the storage disk, analyze, and query through a SPARQL GUI. SY-VSE (Proposed) eliminates all of these problems of the previous web applications (see Figure 2).

In this paper, we propose UM-NLPTR for extracting NEs and the other main portions of 10 NLP web tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated with resources from the LOD cloud. We then propose UM-PAR, containing seven popular web APIs, to boost the NE results obtained from UM-NLPTR. We use dotNetRDF as a powerful and flexible API for working with RDF and SPARQL in .Net environments. Our contribution is a new integrated method for extracting NEs and the other principal portions of the best NLP web tools through UM-NLPTR in both text and URI modes; linking media fragments to the LOD cloud using the NEs and main portions extracted from subtitled YouTube videos; boosting the resulting NEs with UM-PAR; designing a user-friendly and robust user interface for browsing the enriched subtitled YouTube videos with the ability to interact with the RDF Generator Module (RDF-GM), the triple store and the SPARQL GUI; and mapping the keyword-based textual metadata of the YouTube data API and its attached transcripts to semantic web entities.
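As a brief, hedged illustration of the mapping described in Sections I.C and I.F, the following C# sketch uses dotNetRDF (the library adopted later in this paper) to assert a single triple linking a hypothetical media-fragment URI to a DBpedia resource and to serialize it as Turtle. The fragment URI, the "mentions" property and the file name are illustrative assumptions, not part of SY-VSE itself.

    using System;
    using VDS.RDF;
    using VDS.RDF.Writing;

    class LodMappingSketch
    {
        static void Main()
        {
            // Build an in-memory RDF graph with dotNetRDF.
            IGraph g = new Graph();

            // Hypothetical URIs: a media fragment of a subtitled YouTube video
            // (video id and temporal boundaries are illustrative) and a DBpedia entity.
            IUriNode fragment = g.CreateUriNode(UriFactory.Create(
                "http://example.org/video/VIDEO_ID#t=12,17"));
            IUriNode mentions = g.CreateUriNode(UriFactory.Create(
                "http://example.org/vocab/mentions"));            // assumed property
            IUriNode entity = g.CreateUriNode(UriFactory.Create(
                "http://dbpedia.org/resource/Mark_Zuckerberg"));

            // One triple: the media fragment mentions the LOD entity.
            g.Assert(new Triple(fragment, mentions, entity));

            // Serialize the graph as Turtle so it can later be loaded into a triple store.
            CompressingTurtleWriter writer = new CompressingTurtleWriter();
            writer.Save(g, "fragment-example.ttl");

            Console.WriteLine("Wrote {0} triple(s).", g.Triples.Count);
        }
    }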

Figure 2. The SY-VSE (Proposed) framework.

II. RELATED WORK

A. Overview of previous works related to our proposed approach

Many applications have already published multimedia and annotations as LD, which offers experience for us in multimedia resource publishing. The LEMO multimedia annotation framework provides a unified model to annotate media fragments, while the annotations are enriched with contextually relevant information from the LOD cloud. In LEMO, media fragments are published using the MPEG-21 vocabulary. LEMO has to convert existing video files to an MPEG-compatible version and stream them from the LEMO server. LEMO also derived a core annotation schema from the "Annotea Annotation Schema" in order to link annotations to media fragment identifications [1]. Yovisto.com hosts a large amount of recordings of academic lectures and conferences for users to search in a content-based manner. Some researchers augment the Yovisto open academic video search platform by publishing the database containing videos and annotations as LD [2]. Yovisto uses the Virtuoso server [3] to publish the videos and annotations in the database, and MPEG-7 and the Core Ontology of Multimedia (COMM) [4] to describe the multimedia data. It provides both automatic video annotations based on video analysis and collaborative user-generated annotations, which are further linked to entities in the LOD cloud with the objective of improving video searchability.

SemWebVid automatically generates RDF video descriptions from closed captions. The captions are analyzed by three NLP web tools (AlchemyAPI, OpenCalais and Zemanta) but are chunked into blocks, which loses the context for the NLP web tools [5]. The work in [6] has shown how events can be detected on-the-fly through crowdsourcing textual, visual, and behavioral analysis of YouTube videos, at scale. They defined three types of events: visual events in the sense of shot changes, occurrence events in the sense of the appearance of an NE, and interest-based events in the sense of purposeful in-video navigation by users. In the occurrence event detection process they analyzed the available video metadata using NLP techniques, as outlined in [5]. The detected NEs are presented to the user in a list and, upon a click on a timeline-like user interface, allow jumping into one of the shots where the NE occurs.

B. Analysis of the closest work related to our proposed approach

In [7], an approach was proposed using the 10 NLP tools of the NERD module in [8]. They only use the entity extraction portion of the NLP web tools. In their demo, the user cannot take advantage of RDF syntax outputs, a SPARQL GUI or a triple store module. Most of the output of their demo is not linked to the subtitled YouTube video, and they have not utilized the main keyword-based textual metadata of the YouTube data API (see Figure 3). The NERD module uses the results of the 10 NLP web tools separately and does not integrate the NE recognition results of the 10 NLP tools. On the other hand, NERD uses these 10 NLP tools only for NE recognition over about 10 main types. For this purpose, NERD produces truncated answers restricted to categories dependent on NEs. It ignores the most important portions of the best NLP tools, including concept tagging, content scraping, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization, fact detection and more (see Figure 4). As we mentioned, the most challenging problem in mapping textual and temporal features of subtitled YouTube videos to LOD entities is the existence of ambiguous names, which results in a set of entities that have to be disambiguated. Using all the main portions of the popular NLP web tools, boosting the resulting NEs through popular web APIs, and also utilizing RDF-GM and its attached modules can result in an enriched entity set. The work in [7] used NLP tools, including AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais, Saplo, Wikimeta, Yahoo! Content Extraction and Zemanta, only for entity recognition. In this paper, we use all of the main portions of the listed NLP tools, with the difference that we utilize the TextRazor NLP web tool instead of the Evri NLP web tool. TextRazor can identify and disambiguate millions of NEs, including people, places, companies and thousands of other types.

Figure 3. Example of the whole resulting page and architecture in [7].

TextRazor also has a special ability in extracting subject-action-object relations, properties and typed dependencies between entities. TextRazor returns contextual synonyms or entailments, i.e. words that are semantically implied by the given context even though they are not explicitly mentioned. Generally, in comparison with the Evri NLP web tool, which is currently out of access, TextRazor is a complete, integrated textual analysis solution. It uses state-of-the-art machine learning and NLP techniques together with a comprehensive knowledge base of real-life facts to help parse and disambiguate transcripts with industry-leading accuracy and speed.

Figure 4. The NERD module used in [8].

Compared with [7], and according to our experiments, we are able to use all the portions of the best NLP tools, in both text and URI modes, through UM-NLPTR. In addition, UM-PAR is an expert module for boosting the resulting NEs. Furthermore, RDF-GM, the triple store and the SPARQL GUI in our work play an important role in representing enriched results for cruising over the LOD cloud by users.

III. FRAMEWORK OVERVIEW

Figure 5 shows the modular implementation of our proposed framework. In a nutshell, the framework enables users to pick interesting subtitled YouTube video metadata and to retrieve the related transcript information from the YouTube data API, in order to extract NEs and other main portions through the integrator modules. The results of the integrator modules then flow towards RDF-GM and its affiliated portions for cruising over the LOD cloud by users. Below we describe in full the data flow process, our proposed modules and the SY-VSE (Proposed) user interface.

A. Data flow process

Our data flow process is shown in Figure 5. It can be summarized as follows. 1) The user enters a desired keyword or video id to receive the keyword-based textual metadata available from the YouTube data API. 2) The content of the requested keyword or video id is sent over a standard REST protocol using HTTP GET requests to the YouTube data API. 3) The main keyword-based textual metadata is returned by the YouTube data API to the user. All of the videos have subtitles. The returned textual metadata contains the video title, video ID, description, publishing date, updating date, author, recording location, rating average, five snapshots, user comments, etc. 4) Here the user has the ability to pick the interesting metadata; the keyword-based textual metadata published on the LOD cloud is therefore based on the user's choice. The selected metadata, with the connected video id, is then passed on for semantic annotation. 5) To get the related subtitled YouTube video transcript (caption) information, the SY-VSE (Proposed) back-end sends an automatic REST request to the YouTube data API. 6) The YouTube data API then returns the related video caption information, which contains the subtitle text, the start time of each media fragment and its duration (a sketch of this retrieval step appears after this subsection). We implemented optimized JavaScript code so that by clicking on a media fragment number the user jumps to the related start time; the interval from the start time to the end time of the related media fragment is then highlighted. 7) The user can run an advanced search over the whole subtitle and jump to the related media fragment. 8) To apply the best NLP techniques, the selected timed text is sent to UM-NLPTR. Here we use 10 NLP web tools to accomplish an appropriate result. Apart from NE extraction and disambiguation, UM-NLPTR uses the other main portions of the 10 NLP tools. 9) The unified and enriched results of UM-NLPTR are sent to the user interface. 10) The unified and enriched results of UM-NLPTR are also sent to RDF-GM. We propose a novel RDF Generator Module for the integration of the main keyword-based textual metadata, the media fragments and the unified results of UM-NLPTR and UM-PAR, which can be reused for various online media. 11) According to all the portions resulting from UM-NLPTR, and based on the selected video id and media fragment number, RDF-GM generates suitable RDF syntax outputs (e.g. NTriples, Turtle, TriG, TriX, RDFa, etc.) and saves them to the user's storage disk. By default, for every resulting portion present in UM-NLPTR, RDF-GM generates correct RDF syntax outputs in NTriples, Turtle and RDF/XML, but this is based on the user's selection. We used the dotNetRDF library for this purpose, and it supports many RDF syntax outputs. The desired output formats are saved on the user's storage disk (with the portion name as the folder name and the media fragment number and video id as the file name). 12) Here, for every generated portion, RDF-GM generates the related TriG format to save or load into the in-memory triple store. In this way, the user can issue proper SPARQL queries against the in-memory triple store. 13) After displaying the unified results of UM-NLPTR to the user and passing them into RDF-GM, the user has the ability to boost NEs through UM-PAR. It contains 7 popular web APIs, such as Amazon web service, Google Maps, Wolfram Alpha, Google Books, Library of Congress, Internet Movie Database (IMDb), etc.; for example, the "Mark Zuckerberg" NE can be sent to UM-PAR based on the user's choice. UM-PAR returns unified results for the NEs; hence an enriched list of the relationships of the "Mark Zuckerberg" NE with the popular web APIs is returned. For instance, from the Wolfram Alpha API, the input interpretation, basic information, image, timeline, notable facts, physical characteristics, estimated net worth and Wikipedia page-hit history of the "Mark Zuckerberg" NE are shown. 14) The unified and enriched results of UM-PAR are sent to the user interface. 15) SPARQL is the standard query language for the Semantic Web and can be used to query large volumes of RDF data. dotNetRDF provides support for querying local in-memory data using its own SPARQL implementation. We used SPARQL GUI for testing SPARQL queries on arbitrary data sets which the user created by loading from RDF-GM (or the triple store) and/or remote URIs. A user can easily query files that have been stored in the triple store (in TriG format) as well as the other RDF syntaxes produced by RDF-GM. 16) Based on the retrieved subtitled YouTube videos, the user can cruise over the LOD cloud. At the same time, the user can generate descriptions of the resulting objects in a media fragment. He/she can send his/her own annotations and descriptions, built from the results of the integrator modules (text/URI), to RDF-GM to complete the process of generating an enriched RDF/XML file. For example, the user annotates the relation extraction portion of AlchemyAPI and writes a suitable subject, object and action into an RDF/XML file. Finally, we have an enriched RDF/XML file connected with the timed-text information of a subtitled YouTube video media fragment.
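The caption retrieval in steps 5 and 6 can be sketched as follows. This is a minimal, hedged illustration rather than the actual SY-VSE back-end: the caption feed URL and the XML element and attribute names (text, start, dur) are assumptions about the timed-text format returned by the YouTube API of that period.

    using System;
    using System.Net;
    using System.Xml.Linq;

    class CaptionFetchSketch
    {
        static void Main()
        {
            // Illustrative caption feed URL for a given video id; the real endpoint
            // and parameters depend on the YouTube data API version in use.
            string videoId = "VIDEO_ID";
            string captionFeedUrl = "https://example.org/timedtext?lang=en&v=" + videoId;

            // Step 5: automatic REST (HTTP GET) request for the timed text.
            string xml;
            using (WebClient client = new WebClient())
            {
                xml = client.DownloadString(captionFeedUrl);
            }

            // Step 6: parse subtitle text, start time and duration per media fragment.
            XDocument doc = XDocument.Parse(xml);
            int fragmentNumber = 0;
            foreach (XElement text in doc.Descendants("text"))        // assumed element name
            {
                fragmentNumber++;
                double start = (double?)text.Attribute("start") ?? 0.0; // assumed attribute
                double dur = (double?)text.Attribute("dur") ?? 0.0;     // assumed attribute
                Console.WriteLine("Fragment {0}: [{1:F2}s - {2:F2}s] {3}",
                    fragmentNumber, start, start + dur, text.Value);
            }
        }
    }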

Figure 5. The basic data flow process in the proposed SY-VSE.

B. SY-VSE (Proposed) Modules Description

1) UM-NLPTR: We propose UM-NLPTR for extracting NEs and the other main portions of 10 NLP web tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated and enriched with resources from the LOD cloud. These NLP web tools include AlchemyAPI, DBpedia Spotlight, Yahoo! Content Extraction, Extractiv, OpenCalais, Saplo, Wikimeta, TextRazor and Zemanta. Each of the 10 NLP web tools used has its own unique portions and features. For example, the Extractiv NLP web tool supports the Persian language as input, while the TextRazor NLP web tool has a special ability to draw dependency parse graphs. Recently, the research and commercial communities have spent effort on publishing NLP services on the web. Besides the common tasks of WSD and POS identification, they provide further disambiguation facilities with URIs that describe web resources, leveraging the web of real-world objects. The most popular NLP tools use various portions to provide desirable performance, such as entity extraction, text categorization, sentiment analysis, concept tagging and other similar items. However, the existing research on online video annotation is limited to entity extraction and, sometimes, topic extraction. In this paper, we created UM-NLPTR as an integrator module of the portions of the main NLP tools. NLP tools represent a clear opportunity for the web community to increase the volume of interconnected data and can therefore make a huge contribution to revealing the truly invisible web. This paper presents UM-NLPTR, a framework that unifies the output of 10 different NLP tools. UM-NLPTR's architecture follows REST principles and provides a suitable API for machines to exchange content in ideal output modes. It accepts any text/URI for an NLP tool portion or web document, which is analyzed in order to extract its main and enriched textual content. GET, POST and PUT methods manage the requests coming from users to retrieve the list of NEs and the main NLP tool portions, classification types and text/URIs for a specific tool or for a combination of them. The output sent back to the user can be serialized in JSON, XML or RDF, depending on the content type requested. Although the NLP tools share the same goal, they use different algorithms and their own classification taxonomies, which makes their comparison hard. In UM-NLPTR, we use disambiguation information for the detected entities. The UM-NLPTR ontology exposes additional ontological mappings for the detected entities. For example, we use the DBpedia Ontology as one of the important UM-NLPTR ontology modules, which currently contains about 2,350,000 instances, and we utilize the TextRazor ontology as a module that can automatically place entities into an ontology of thousands of categories derived from LD sources. Similarly, the UM-NLPTR ontology sends the resulting NEs in URI form to OpenCalais, Extractiv, AlchemyAPI, Wikimeta, Saplo and all of the NLP tools that contain an ontology section.

2) UM-PAR: After the NEs and the other main portions from UM-NLPTR are displayed on the user interface, the client can send interesting NEs to UM-PAR. Through this module, a user can efficiently boost the resulting NEs. An NLP tool such as OpenCalais can show the geographical latitude and longitude of an NE; however, it cannot show the geographical information on the Google Maps API. NLP tools can show an NE together with its categories well, but they cannot show enriched information related to them. For example, none of these NLP tools can display the physical characteristics of "Mark Zuckerberg", whereas Wolfram Alpha shows such information and the relationships of NEs as well. We can get an overview of books related to "Mark Zuckerberg" through Google Books, and see suitable prices and bibliographic information on Amazon and the Library of Congress. "The Social Network" is a movie about the life of "Mark Zuckerberg", and IMDb presents competent information about this movie to users. As seen above, by migrating an NE to a higher level we can achieve ideal boosting of such NEs. We used JavaScript code to implement some APIs, such as Google Maps and Google Books. UM-PAR's framework follows REST principles, similar to the UM-NLPTR architecture. The unified results of UM-PAR can be shown in an appropriate graphical/textual form, similar to the UM-NLPTR form. We therefore need a higher-level representation to obtain clearly boosted NE enrichment, and UM-PAR implements such a level. In addition, UM-PAR is further optimized by collaborating with UM-NLPTR to exploit the number of named entities extracted from YouTube video subtitles, their type and their appearance on the timeline as features for classifying videos into different categories.
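To make the UM-PAR boosting step concrete, the following C# sketch shows how a selected NE could be forwarded to one of the popular web APIs over REST. It is an illustrative assumption rather than the actual UM-PAR code: the Wolfram Alpha query shown requires a valid AppID, and the exact parameters the module uses may differ.

    using System;
    using System.Net;
    using System.Xml.Linq;

    class UmParBoostSketch
    {
        // Sends a named entity to the Wolfram Alpha query API and prints the
        // plain-text pods (input interpretation, basic information, etc.).
        // The AppID is a placeholder; error handling is omitted for brevity.
        static void BoostEntity(string namedEntity)
        {
            string appId = "YOUR_APPID";   // placeholder credential
            string url = "http://api.wolframalpha.com/v2/query?appid=" + appId +
                         "&input=" + Uri.EscapeDataString(namedEntity) +
                         "&format=plaintext";

            string xml;
            using (WebClient client = new WebClient())
            {
                xml = client.DownloadString(url);
            }

            XDocument doc = XDocument.Parse(xml);
            foreach (XElement pod in doc.Descendants("pod"))
            {
                string title = (string)pod.Attribute("title");
                foreach (XElement plaintext in pod.Descendants("plaintext"))
                {
                    if (!string.IsNullOrEmpty(plaintext.Value))
                        Console.WriteLine("{0}: {1}", title, plaintext.Value);
                }
            }
        }

        static void Main()
        {
            BoostEntity("Mark Zuckerberg");
        }
    }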
3) RDF-GM: We implemented the RDF Generator Module (RDF-GM) based on the dotNetRDF library. dotNetRDF has two main products: a programmer API for working with RDF and SPARQL in .Net code, and a toolkit which provides an assortment of GUI and command-line tools for working with RDF and SPARQL. RDF-GM is a very important module in our work. It has particular abilities in reading RDF, writing RDF and working with graphs. RDF-GM receives the unified results from UM-NLPTR and produces suitable RDF syntax outputs on the client's storage disk; simultaneously, it creates the related TriG format for the triple store. dotNetRDF currently supports reading RDF files in all the RDF syntaxes, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG (Turtle with Named Graphs), TriX (Named Graphs in XML) and NQuads (NTriples plus Context). Through the SPARQL GUI, the end user can send queries of interest to the in-memory triple store and to the RDF syntaxes stored on his/her disk storage. Here, through the web application, the user can choose a suitable location for storing the structured and semantic files (for example, drive D). In addition, the end user can write an RDF file to RDF-GM containing his/her annotations of the unified results of UM-NLPTR and UM-PAR. In YouTube, the visualization of media resources is embedded within a web page. This offers opportunities to embed RDF about media fragments and annotations into the original display of the multimedia resources and their annotations. In this method the client side does not need to change anything unless the visualization of media fragments is required, while the server side has to add RDFa into the web pages. Regarding RDFa extraction, dotNetRDF has a parser which extracts RDF embedded as RDFa in HTML and XHTML documents.
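The following C# sketch, assuming the dotNetRDF library named above, illustrates how RDF-GM-style output could be produced: a graph standing in for the unified results is written in several RDF syntaxes to disk and also added to an in-memory triple store that is saved as TriG. The graph contents, file names and base URI are illustrative assumptions.

    using System;
    using VDS.RDF;
    using VDS.RDF.Writing;

    class RdfGmSketch
    {
        static void Main()
        {
            // A graph standing in for the unified UM-NLPTR results of one media fragment.
            IGraph g = new Graph();
            g.BaseUri = UriFactory.Create("http://example.org/fragment/12-VIDEO_ID"); // assumed
            INode frag = g.CreateUriNode(g.BaseUri);
            INode label = g.CreateUriNode(UriFactory.Create(
                "http://www.w3.org/2000/01/rdf-schema#label"));
            g.Assert(new Triple(frag, label, g.CreateLiteralNode("entity extraction portion")));

            // Write the same graph in three RDF syntaxes (NTriples, Turtle, RDF/XML),
            // mirroring the default outputs described for RDF-GM.
            new NTriplesWriter().Save(g, "12-VIDEO_ID.nt");
            new CompressingTurtleWriter().Save(g, "12-VIDEO_ID.ttl");
            new RdfXmlWriter().Save(g, "12-VIDEO_ID.rdf");

            // Add the graph to an in-memory triple store and save it as TriG,
            // so it can later be queried through the SPARQL GUI.
            TripleStore store = new TripleStore();
            store.Add(g);
            new TriGWriter().Save(store, "store.trig");

            Console.WriteLine("Saved {0} triple(s) in four serializations.", g.Triples.Count);
        }
    }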

Figure 6. Some overviews of the SY-VSE user interface and sample results.

C. SY-VSE User Interface

Some overviews of our web application are shown in Figure 6. SY-VSE (Proposed) is a web application written in C# 2012. A user must first log in to SY-VSE (Proposed) and is then redirected to the first level of search. In the first level, after entering the desired keywords, the user can pick the main keyword-based textual metadata from the resulting subtitled YouTube videos. For example, by picking the category item, we can compare a YouTube video category with the category derived for every media fragment subtitle. After the main textual metadata is picked, the user is redirected to the second level of search. In this level, the user can interact properly with the video semantic annotation components, which include the video player, the subtitle information presenter and the exhibitor of the unifier modules' outputs. We also implemented navigator subcomponents to make it easy to send desired URIs/text to the unifier modules, send written RDF syntaxes to RDF-GM, send resulting NEs to UM-PAR, and send user queries to the SPARQL GUI. In the second level, the user can also conveniently seek through the media fragments of the selected subtitled YouTube video: by selecting a desired media fragment, he/she jumps to the specific start time in the video player. For this purpose, we implemented JavaScript code supported by the YouTube feed API. The interval from the start time to the end time of the selected media fragment is highlighted, and the timed-text information is sent to the subtitle information presenter component. Here, subtitle information such as the closed caption text, media fragment number, start time, duration and end time is sent to UM-NLPTR. The unified results of UM-NLPTR are then sent to the exhibitor of the unifier modules' outputs, where the user can see them. The end client can now make further use of the navigator subcomponents and the other main unifier/generator modules. For instance, the user has ideal facilities for sending a desired query to the SPARQL endpoint, writing and building RDF syntaxes and sending them to RDF-GM, sending interesting resulting NEs to UM-PAR, entering a desired URI/text for the unifier modules, and sending his/her metadata annotations (e.g. rating, comments, etc.) to the YouTube platform. Finally, the end client has user-friendly and complete cruising over the LOD cloud.

IV. EVALUATION

A. Evaluation of our approach against previous work

We divided our comparisons into two parts. In the first part, we compare UM-NLPTR against the NE extraction module in [8]. In the second part, we compare our RDF generator module against the RDF API in [7].

In comparison with the NERD module [8], UM-NLPTR has several advantages: 1) it presents a framework supporting all the features of the 10 NLP web tools, such as concept tagging, entity extraction, keyword extraction, text categorization, sentiment analysis, relation extraction, event identification, fact identification, entailment extraction and many other features, and represents them in a unified format; 2) it presents the UM-NLPTR ontology to support all the ontological features of the 10 NLP web tools and represents unified ontological results; 3) it uses TextRazor as an acceptable NLP web tool instead of the Evri NLP web tool of [8]; 4) it uses all the NE types of the 10 NLP web tools in the output; and 5) it allows user tests by URI/text (according to all of the NLP web tools' features). Regarding the cost or overhead of the 10 NLP web tools, since we use advanced cloud-based and on-premise text analysis tools that eliminate the expense and difficulty of integrating natural language processing systems into a web application, and since they are based on REST or JSON requests, we do not pay a great cost for interacting with them. As mentioned, each of the 10 NLP web tools has its own advantages, and all the tools located in UM-NLPTR and UM-PAR interact with each other. For instance, in Figure 6, OpenCalais does not return any information about Mark Zuckerberg's nationality; the Wolfram Alpha and Google Maps APIs, however, give the user specific information on the nationality and birthplace of the Mark Zuckerberg NE.

Finally, in comparison with the RDF API in [7], our RDF-GM has several advantages: 1) it provides facilities for reading RDF, writing RDF, working with graphs, working with triple stores and querying with SPARQL by using the dotNetRDF library; 2) it offers all of the RDF syntaxes supported by dotNetRDF for reading RDF, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG, TriX and NQuads; 3) it offers all of the RDF syntaxes supported by dotNetRDF for writing graphs, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON and XHTML + RDFa; 4) it loads and saves triple stores into an in-memory triple store module; 5) it embeds user annotations, caption information and picked metadata into an enriched RDF/XML format; 6) it exports every UM-NLPTR result into an enriched RDF/XML format, according to video ID and media fragment number; 7) it saves all of the resulting RDF syntaxes to the user's disk storage according to the user's preferences; 8) it saves an in-memory triple store as a file in an RDF dataset format such as TriG, TriX or NQuads, according to the user's preferences (on the user's disk storage); and 9) it uses SPARQL GUI as a tool to send user queries to the enriched files stored on the client side.

B. Experimental results of our approach

We collected 80 subtitled YouTube videos in total and evaluated them within four different YouTube channels: Tech, Science and Education, Sports and News. The evaluation consisted of two steps: 1) getting all the subtitles of the YouTube videos and 2) running the main portions of the 10 NLP web tools through UM-NLPTR. We combined the portion categories of the main web natural language processors supported by UM-NLPTR and divided them into 11 portions: entity extraction, keyword extraction, topic extraction, word detection, sentiment analysis, phrases detection, text categorization, meaning detection, concept tagging, relation extraction and dependency parse graph. Regarding the first step, since we send the GET requests for subtitled YouTube videos over the REST protocol to the YouTube data API, our experimental results showed that YouTube returns each of the 80 YouTube videos with its subtitles in an XML file. In the second step, disregarding subtitle environment sounds (e.g. a train passing), we present our experimental results. Table I shows the number of media fragments and the number of subtitled YouTube videos evaluated per the 4 YouTube channels.

TABLE I. NUMBER OF MEDIA FRAGMENTS AND NUMBER OF SUBTITLED YOUTUBE VIDEOS EVALUATED PER 4 YOUTUBE CHANNELS.

YouTube channel          Media fragments number    Subtitled YouTube videos number
Tech                     761                       20
Science and Education    653                       20
Sports                   602                       20
News                     734                       20

To analyze the UM-NLPTR output, we evaluated the number of media fragments on the four YouTube channels with a correct response per portion of the 10 web natural language processors (see Table II).

TABLE II. NUMBER OF MEDIA FRAGMENTS LOCATED ON 4 YOUTUBE CHANNELS PER PORTION'S CORRECT RESPONSE.

Web natural language processors portions    Tech      Science and Education    Sports    News
Entity extraction                           759       652                      602       734
Keyword extraction                          745       638                      597       728
Topic extraction                            753       645                      591       718
Word detection                              761       653                      595       732
Sentiment analysis                          742       639                      600       730
Phrases detection                           758       650                      601       733
Text categorization                         746       641                      586       719
Meaning detection                           757       648                      598       730
Concept tagging                             748       624                      577       705
Relation extraction                         751       641                      597       722
Dependency parse graph                      760       653                      600       734
Average                                     752.72    644                      594.90    725.90

In [7], they evaluated only the average number of NEs and entities extracted per three YouTube channels (on 60 YouTube videos). They ran their evaluations on the entity extraction portion and on nine limited entity types of the NERD module, including thing, person, function, organization, location, product, time, amount and event. In contrast, we evaluated our experimental results on 11 main portions of the NLP web tools. Furthermore, we could draw on the thousands of entity types available across the 10 NLP web tools in our entity extraction portion. For example, in our entity extraction portion, across the 20 evaluated videos and 761 media fragments of the Tech YouTube channel, 759 media fragments were successfully enriched with related NEs. These NEs come from inside UM-NLPTR. In addition, apart from the entity extraction portion, we evaluated the outputs of the 10 other main portions located in UM-NLPTR. Our experimental results on 80 subtitled YouTube videos showed that these 10 portions can have very important effects on enriching the textual and temporal features of subtitled YouTube media fragments.
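As a hedged illustration of how the enriched data produced above could be queried (the role assigned to the SPARQL GUI and the in-memory triple store in Sections III and IV.A), the following C# sketch loads a saved TriG dataset with dotNetRDF and runs a SPARQL query over it. The file name, the "mentions" property URI (the same illustrative property used in the earlier sketch) and the query itself are assumptions.

    using System;
    using VDS.RDF;
    using VDS.RDF.Parsing;
    using VDS.RDF.Query;
    using VDS.RDF.Query.Datasets;

    class SparqlQuerySketch
    {
        static void Main()
        {
            // Load the TriG dataset previously saved by the RDF generator step.
            TripleStore store = new TripleStore();
            TriGParser trigParser = new TriGParser();
            trigParser.Load(store, "store.trig");

            // Query the in-memory dataset with dotNetRDF's Leviathan SPARQL engine.
            InMemoryDataset dataset = new InMemoryDataset(store);
            LeviathanQueryProcessor processor = new LeviathanQueryProcessor(dataset);
            SparqlQueryParser queryParser = new SparqlQueryParser();

            // Illustrative query: which media fragments mention a given LOD entity?
            string queryString =
                "SELECT ?fragment WHERE { " +
                "  ?fragment <http://example.org/vocab/mentions> " +   // assumed property
                "            <http://dbpedia.org/resource/Mark_Zuckerberg> . " +
                "}";

            SparqlResultSet results =
                processor.ProcessQuery(queryParser.ParseFromString(queryString)) as SparqlResultSet;

            foreach (SparqlResult result in results)
            {
                Console.WriteLine(result["fragment"]);
            }
        }
    }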

V. CONCLUSION AND FUTURE WORK

The enormous growth of subtitled YouTube video data repositories has increased the need for semantic video indexing techniques. In this paper, we discussed a new way of semantically indexing subtitled YouTube video content by extracting the main portions from the captions with web natural language processors. We introduced integrator modules whose results are associated with subtitled YouTube media fragments. LD provides a suitable way to expose, index and search media fragments and annotations on the semantic web, using URIs for the identification of resources and RDF as a structured data format. Here, LOD aims to publish and connect open but heterogeneous databases by applying the LD principles, and the aggregation of all LOD data sets is denoted the LOD cloud. Finally, by implementing important semantic web derivatives such as the SPARQL GUI and RDF-GM, the end client can cruise efficiently over the LOD cloud. On the other hand, with tools such as CaptionTube a user can effortlessly create suitable captions for his/her own YouTube videos. Therefore, given that owners can quickly publish hundreds of thousands of YouTube videos and easily convert them to subtitled YouTube videos, we can achieve efficient semantic indexing and annotation of subtitled YouTube content with SY-VSE (Proposed). For future work, we plan to apply our proposed approaches to the fields of object-featured video summarization and video categorization. In addition, we are developing a query-builder interface which creates the SPARQL queries, so that end users without knowledge of SPARQL can easily interact with the enriched semantic files located on their storage disk.

REFERENCES

[1] B. Haslhofer, W. Jochum, R. King, C. Sadilek, and K. Schellner, "The LEMO annotation framework: weaving multimedia annotations with the web," International Journal on Digital Libraries, vol. 10, pp. 15-32, 2009.
[2] J. Waitelonis, N. Ludwig, and H. Sack, "Use what you have: Yovisto video search engine takes a semantic turn," in Semantic Multimedia, Springer, 2011, pp. 173-185.
[3] O. Erling and I. Mikhailov, "RDF Support in the Virtuoso DBMS," in Networked Knowledge-Networked Media, Springer, 2009, pp. 7-24.
[4] R. Arndt, R. Troncy, S. Staab, L. Hardman, and M. Vacura, "COMM: designing a well-founded multimedia ontology for the web," in The Semantic Web, Springer, 2007, pp. 30-43.
[5] T. Steiner and M. Hausenblas, "SemWebVid-Making Video a First Class Semantic Web Citizen and a First Class Web Bourgeois," in ISWC Posters & Demos, 2010, pp. 1-8.
[6] T. Steiner, R. Verborgh, R. Van de Walle, M. Hausenblas, and J. Gabarró Vallès, "Crowdsourcing event detection in YouTube videos," 2012, pp. 58-67.
[7] Y. Li, G. Rizzo, R. Troncy, M. Wald, and G. Wills, "Creating enriched YouTube media fragments with NERD using timed-text," 2012, pp. 1-4.
[8] G. Rizzo and R. Troncy, "NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Extraction Tools," in Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 73-76.