Creating a Novel Semantic Video Through Enrichment of Textual and Temporal Features of Subtitled YouTube Media Fragments

Babak Farhadi
Department of Computer Engineering, School of Engineering
University of Tehran, International Campus
Kish Island, Iran
[email protected]

M. B. Ghaznavi-Ghoushchi
Dept. of EE
Shahed University
Tehran, Iran
ghaznavi AT shahed.ac.ir

Abstract— Semantic video annotation is an active research area within the field of multimedia content understanding. With the steady increase of videos published on popular video-sharing platforms such as YouTube, growing effort is spent on annotating these videos automatically. In this paper, we propose a novel framework for annotating subtitled YouTube videos using both textual features, such as all of the portions extracted by web natural language processors, and temporal features, such as the duration of the media fragments where particular entities are spotted. We implement SY-VSE (Subtitled YouTube Video Search Engine) as an efficient framework for cruising over subtitled YouTube videos resident in the Linked Open Data (LOD) cloud. To realize this, we propose the Unifier Module of Natural Language Processing (NLP) Tools Results (UM-NLPTR) for extracting the main portions of 10 NLP web tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated with resources from the LOD cloud. We then propose the Unifier Module of Popular APIs' Results (UM-PAR), containing seven popular web APIs, to boost the Named Entity (NE) results obtained from UM-NLPTR. We use dotNetRDF as a powerful and flexible API for working with the Resource Description Framework (RDF) and SPARQL in .Net environments.

Index Terms — Subtitled YouTube video, NLP web tools, textual metadata, semantic web, video annotation, video search.

I. INTRODUCTION

To make the discussions in this introduction easier to follow, we have divided them into six subsections, summarized as follows.

A. Influence of natural language processors on on-line video platforms

Nowadays, on-line video-sharing platforms, especially YouTube, show that video has become the medium of choice for many people communicating via the net. On the other hand, the enormous growth of online video content confronts the consuming user with an indefinite amount of data, which can only be accessed with advanced retrieval and management technologies in order to retrieve the few needles from the giant haystack. Most on-line video search engines provide a keyword-based search, where the lexical ambiguity of natural language often leads to imprecise and defective results. For instance, YouTube supports a keyword-based search within the textual metadata provided by the users, accepting all the shortcomings caused by, e.g., homonyms (see Figure 1). Optimal results are possible through the analysis of the available textual metadata with NLP web tools and popular web APIs, especially given the availability of subtitles on YouTube videos.

Figure 1. Structure of subtitled YouTube keyword-based search.

B. Semantic multimedia annotations and Linked Data

The role of semantic web technologies is to make the implicit meaning of content explicit by providing suitable metadata annotations based on formal knowledge representations. In this area, Linked Data (LD) is a means to expose, share and connect pieces of data, information and knowledge on the semantic web, using Uniform Resource Identifiers (URIs) for the identification of resources and RDF as a structured data format. It creates relationships from the data to other sources on the Web. These data sets are not only accessible to human beings, but also readable by machines. LOD aims to publish and connect open but heterogeneous databases by applying the LD principles. The aggregation of all LOD data sets is denoted the LOD cloud. In this context, the term "media fragment" refers to the inside content of multimedia objects, such as a certain region within an image, or a five-minute segment within a one-hour video. Most research on media fragments focuses on exposing the closed data, such as tracks and video segments, within the multimedia file or the host server using semantic web and LD technologies.

C. Achieving indexable semantic data in on-line video search engines

To make use of semantic web information, search engines need to index semantic data. Generally, this can be achieved by storing the data in triple stores. Triple stores are Database Management Systems (DBMS) for data modeled using RDF. They store RDF triples and are queried through a SPARQL endpoint. To turn a non-semantic video search engine into a semantic video search engine, the already existing keyword-based textual metadata has to be mapped to semantic web entities. The most challenging problem in mapping data to semantic web entities is the existence of ambiguous names, which results in a set of entities that have to be disambiguated. One of our important goals in this paper is the mapping of textual and temporal features of subtitled YouTube videos to LOD entities.

D. Role of video transcripts in semantic video indexing

In our opinion, one of the most promising approaches to semantic video indexing is based on the extraction of semantics from subtitles. Subtitles carry such information through natural language sentences. YouTube video content is described both by metadata referring to the entire video and by information assigned to distinct time positions within the video. Our intention is entity mapping of both kinds of metadata: the YouTube subtitled video metadata (e.g. title, description, rating, publishing information, thumbnail view, category, etc.) and its attached transcripts. In addition, since textual features carrying higher-level semantic concepts are easier to understand, they are very useful in video classification.

E. Relationships between video transcripts, natural language processors and named entities

We must evaluate and analyze subtitles in order to calculate the relatedness between textual inputs. For this purpose, we propose using NLP web tools to discover relatedness between words that possibly represent the same concepts. Tools for analyzing words and sentences within NLP include part-of-speech (POS) tagging and word sense disambiguation (WSD). These tools locate and identify the sense and concept that a word represents. NLP is not only about understanding, but also about the ability to manipulate data and, in some cases, produce answers. NLP is also strongly connected with information retrieval (IR) for searching and retrieving information using some of these concepts. A word can be classified into several linguistic word types, for instance hypernyms, homonyms and hyponyms. A hypernym is a less specific term than the original word; hyponyms are the opposite, a more specific instance of the given word; homonyms are words with the same form but different meanings. In order to maximize the usability of NLP, we need a defined ontology. NE extraction is an exhaustive task in the NLP field that has yielded numerous services gaining popularity in the Semantic Web community for extracting knowledge from web documents. These services are generally organized as pipelines, using dedicated APIs and different taxonomies for extracting, classifying and disambiguating the main NLP portions.

F. Overview of the main targets of our research

In this paper, by utilizing a rich data model based on the best NLP web tools, popular web APIs, ontologies and RDF, we have developed a web application called SY-VSE (Proposed). Previous works close to ours have not used all of the portions and ideas realized in this paper. They have used NLP web tools only for detecting NEs (in a limited set of main types). However, for example, the main portions of AlchemyAPI include concept tagging, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization and more; previous works treat OpenCalais and the other best NLP web tools in the same restricted way. By using popular web APIs such as Amazon web service, Google Maps, Wolfram Alpha, Google Books, Library of Congress, Internet Movie Database (IMDb) and ThingISBN, we can substantially boost the semantic meaning of NEs, yet none of the preceding works have used these popular web APIs in their approaches. In addition, in the foregoing web applications the user cannot use a SPARQL endpoint to query RDF datasets: there is no SPARQL GUI for user queries against triple stores, and there are no RDF syntax outputs (e.g. NTriples, Turtle, TriG, TriX, RDFa, etc.) to save to the storage disk, analyze, and query through a SPARQL GUI. SY-VSE (Proposed) eliminates all of these problems of the previous web applications (see Figure 2).

In this paper, we propose UM-NLPTR for extracting NEs and the other main portions of 10 NLP web tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated with resources from the LOD cloud. We then propose UM-PAR, containing seven popular web APIs, to boost the NE results obtained from UM-NLPTR. We use dotNetRDF as a powerful and flexible API for working with RDF and SPARQL in .Net environments. Our contribution is a new integrated method for extracting NEs and the other principal portions of the best NLP web tools through UM-NLPTR in both text and URI modes; linking media fragments to the LOD cloud using the NEs and main portions extracted from subtitled YouTube videos; boosting the resulting NEs with UM-PAR; designing a user-friendly and robust user interface for browsing the enriched subtitled YouTube videos with the ability to interact with the RDF Generator Module (RDF-GM), the triple store and the SPARQL GUI; and mapping the keyword-based textual metadata of the YouTube data API and its attached transcripts to semantic web entities.
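As a brief, hedged illustration of the mapping described in Sections I.C and I.F, the following C# sketch uses dotNetRDF (the library adopted later in this paper) to assert a single triple linking a hypothetical media-fragment URI to a DBpedia resource and to serialize it as Turtle. The fragment URI, the "mentions" property and the file name are illustrative assumptions, not part of SY-VSE itself.

    using System;
    using VDS.RDF;
    using VDS.RDF.Writing;

    class LodMappingSketch
    {
        static void Main()
        {
            // Build an in-memory RDF graph with dotNetRDF.
            IGraph g = new Graph();

            // Hypothetical URIs: a media fragment of a subtitled YouTube video
            // (video id and temporal boundaries are illustrative) and a DBpedia entity.
            IUriNode fragment = g.CreateUriNode(UriFactory.Create(
                "http://example.org/video/VIDEO_ID#t=12,17"));
            IUriNode mentions = g.CreateUriNode(UriFactory.Create(
                "http://example.org/vocab/mentions"));            // assumed property
            IUriNode entity = g.CreateUriNode(UriFactory.Create(
                "http://dbpedia.org/resource/Mark_Zuckerberg"));

            // One triple: the media fragment mentions the LOD entity.
            g.Assert(new Triple(fragment, mentions, entity));

            // Serialize the graph as Turtle so it can later be loaded into a triple store.
            CompressingTurtleWriter writer = new CompressingTurtleWriter();
            writer.Save(g, "fragment-example.ttl");

            Console.WriteLine("Wrote {0} triple(s).", g.Triples.Count);
        }
    }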

Figure 2. The SY-VSE (Proposed) framework.

II. RELATED WORK

A. Overview of previous works related to our proposed approach

Many applications have already published multimedia and annotations as LD, which offers experience for us in multimedia resource publishing. The LEMO multimedia annotation framework provides a unified model to annotate media fragments, while the annotations are enriched with contextually relevant information from the LOD cloud. In LEMO, media fragments are published using the MPEG-21 vocabulary. LEMO has to convert existing video files to an MPEG-compatible version and stream them from the LEMO server. LEMO also derived a core annotation schema from the "Annotea Annotation Schema" in order to link annotations to media fragment identifications [1]. Yovisto.com hosts a large amount of recordings of academic lectures and conferences for users to search in a content-based manner. Some researchers augment the Yovisto open academic video search platform by publishing the database containing videos and annotations as LD [2]. Yovisto uses the Virtuoso server [3] to publish the videos and annotations in the database, and MPEG-7 and the Core Ontology of Multimedia (COMM) [4] to describe the multimedia data. It provides both automatic video annotations based on video analysis and collaborative user-generated annotations, which are further linked to entities in the LOD cloud with the objective of improving video searchability.

SemWebVid automatically generates RDF video descriptions from closed captions. The captions are analyzed by three NLP web tools (AlchemyAPI, OpenCalais and Zemanta) but are chunked into blocks, which loses the context for the NLP web tools [5]. The work in [6] has shown how events can be detected on-the-fly through crowdsourcing textual, visual, and behavioral analysis of YouTube videos, at scale. They defined three types of events: visual events in the sense of shot changes, occurrence events in the sense of the appearance of an NE, and interest-based events in the sense of purposeful in-video navigation by users. In the occurrence event detection process they analyzed the available video metadata using NLP techniques, as outlined in [5]. The detected NEs are presented to the user in a list and, upon a click on a timeline-like user interface, allow jumping into one of the shots where the NE occurs.

B. Analysis of the closest work related to our proposed approach

In [7], an approach was proposed using the 10 NLP tools of the NERD module in [8]. They only use the entity extraction portion of the NLP web tools. In their demo, the user cannot take advantage of RDF syntax outputs, a SPARQL GUI or a triple store module. Most of the output of their demo is not linked to the subtitled YouTube video, and they have not utilized the main keyword-based textual metadata of the YouTube data API (see Figure 3). The NERD module uses the results of the 10 NLP web tools separately and does not integrate the NE recognition results of the 10 NLP tools. On the other hand, NERD uses these 10 NLP tools only for NE recognition over about 10 main types. For this purpose, NERD produces truncated answers restricted to categories dependent on NEs. It ignores the most important portions of the best NLP tools, including concept tagging, content scraping, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization, fact detection and more (see Figure 4). As we mentioned, the most challenging problem in mapping textual and temporal features of subtitled YouTube videos to LOD entities is the existence of ambiguous names, which results in a set of entities that have to be disambiguated. Using all the main portions of the popular NLP web tools, boosting the resulting NEs through popular web APIs, and also utilizing RDF-GM and its attached modules can result in an enriched entity set. The work in [7] used NLP tools, including AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais, Saplo, Wikimeta, Yahoo! Content Extraction and Zemanta, only for entity recognition. In this paper, we use all of the main portions of the listed NLP tools, with the difference that we utilize the TextRazor NLP web tool instead of the Evri NLP web tool. TextRazor can identify and disambiguate millions of NEs, including people, places, companies and thousands of other types.

Figure 3. Example of the whole resulting page and architecture in [7].

TextRazor also has a special ability in extracting subject-action-object relations, properties and typed dependencies between entities. TextRazor returns contextual synonyms or entailments, i.e. words that are semantically implied by the given context even though they are not explicitly mentioned. Generally, in comparison with the Evri NLP web tool, which is currently out of access, TextRazor is a complete, integrated textual analysis solution. It uses state-of-the-art machine learning and NLP techniques together with a comprehensive knowledge base of real-life facts to help parse and disambiguate transcripts with industry-leading accuracy and speed.

Figure 4. The NERD module used in [8].

Compared with [7], and according to our experiments, we are able to use all the portions of the best NLP tools, in both text and URI modes, through UM-NLPTR. In addition, UM-PAR is an expert module for boosting the resulting NEs. Furthermore, RDF-GM, the triple store and the SPARQL GUI in our work play an important role in representing enriched results for cruising over the LOD cloud by users.

III. FRAMEWORK OVERVIEW

Figure 5 shows the modular implementation of our proposed framework. In a nutshell, the framework enables users to pick interesting subtitled YouTube video metadata and to retrieve the related transcript information from the YouTube data API, in order to extract NEs and other main portions through the integrator modules. The results of the integrator modules then flow towards RDF-GM and its affiliated portions for cruising over the LOD cloud by users. Below we describe in full the data flow process, our proposed modules and the SY-VSE (Proposed) user interface.

A. Data flow process

Our data flow process is shown in Figure 5. It can be summarized as follows. 1) The user enters a desired keyword or video id to receive the keyword-based textual metadata available from the YouTube data API. 2) The content of the requested keyword or video id is sent over a standard REST protocol using HTTP GET requests to the YouTube data API. 3) The main keyword-based textual metadata is returned by the YouTube data API to the user. All of the videos have subtitles. The returned textual metadata contains the video title, video ID, description, publishing date, updating date, author, recording location, rating average, five snapshots, user comments, etc. 4) Here the user has the ability to pick the interesting metadata; the keyword-based textual metadata published on the LOD cloud is therefore based on the user's choice. The selected metadata, with the connected video id, is then passed on for semantic annotation. 5) To get the related subtitled YouTube video transcript (caption) information, the SY-VSE (Proposed) back-end sends an automatic REST request to the YouTube data API. 6) The YouTube data API then returns the related video caption information, which contains the subtitle text, the start time of each media fragment and its duration (a sketch of this retrieval step appears after this subsection). We implemented optimized JavaScript code so that by clicking on a media fragment number the user jumps to the related start time; the interval from the start time to the end time of the related media fragment is then highlighted. 7) The user can run an advanced search over the whole subtitle and jump to the related media fragment. 8) To apply the best NLP techniques, the selected timed text is sent to UM-NLPTR. Here we use 10 NLP web tools to accomplish an appropriate result. Apart from NE extraction and disambiguation, UM-NLPTR uses the other main portions of the 10 NLP tools. 9) The unified and enriched results of UM-NLPTR are sent to the user interface. 10) The unified and enriched results of UM-NLPTR are also sent to RDF-GM. We propose a novel RDF Generator Module for the integration of the main keyword-based textual metadata, the media fragments and the unified results of UM-NLPTR and UM-PAR, which can be reused for various online media. 11) According to all the portions resulting from UM-NLPTR, and based on the selected video id and media fragment number, RDF-GM generates suitable RDF syntax outputs (e.g. NTriples, Turtle, TriG, TriX, RDFa, etc.) and saves them to the user's storage disk. By default, for every resulting portion present in UM-NLPTR, RDF-GM generates correct RDF syntax outputs in NTriples, Turtle and RDF/XML, but this is based on the user's selection. We used the dotNetRDF library for this purpose, and it supports many RDF syntax outputs. The desired output formats are saved on the user's storage disk (with the portion name as the folder name and the media fragment number and video id as the file name). 12) Here, for every generated portion, RDF-GM generates the related TriG format to save or load into the in-memory triple store. In this way, the user can issue proper SPARQL queries against the in-memory triple store. 13) After displaying the unified results of UM-NLPTR to the user and passing them into RDF-GM, the user has the ability to boost NEs through UM-PAR. It contains 7 popular web APIs, such as Amazon web service, Google Maps, Wolfram Alpha, Google Books, Library of Congress, Internet Movie Database (IMDb), etc.; for example, the "Mark Zuckerberg" NE can be sent to UM-PAR based on the user's choice. UM-PAR returns unified results for the NEs; hence an enriched list of the relationships of the "Mark Zuckerberg" NE with the popular web APIs is returned. For instance, from the Wolfram Alpha API, the input interpretation, basic information, image, timeline, notable facts, physical characteristics, estimated net worth and Wikipedia page-hit history of the "Mark Zuckerberg" NE are shown. 14) The unified and enriched results of UM-PAR are sent to the user interface. 15) SPARQL is the standard query language for the Semantic Web and can be used to query large volumes of RDF data. dotNetRDF provides support for querying local in-memory data using its own SPARQL implementation. We used SPARQL GUI for testing SPARQL queries on arbitrary data sets which the user created by loading from RDF-GM (or the triple store) and/or remote URIs. A user can easily query files that have been stored in the triple store (in TriG format) as well as the other RDF syntaxes produced by RDF-GM. 16) Based on the retrieved subtitled YouTube videos, the user can cruise over the LOD cloud. At the same time, the user can generate descriptions of the resulting objects in a media fragment. He/she can send his/her own annotations and descriptions, built from the results of the integrator modules (text/URI), to RDF-GM to complete the process of generating an enriched RDF/XML file. For example, the user annotates the relation extraction portion of AlchemyAPI and writes a suitable subject, object and action into an RDF/XML file. Finally, we have an enriched RDF/XML file connected with the timed-text information of a subtitled YouTube video media fragment.
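The caption retrieval in steps 5 and 6 can be sketched as follows. This is a minimal, hedged illustration rather than the actual SY-VSE back-end: the caption feed URL and the XML element and attribute names (text, start, dur) are assumptions about the timed-text format returned by the YouTube API of that period.

    using System;
    using System.Net;
    using System.Xml.Linq;

    class CaptionFetchSketch
    {
        static void Main()
        {
            // Illustrative caption feed URL for a given video id; the real endpoint
            // and parameters depend on the YouTube data API version in use.
            string videoId = "VIDEO_ID";
            string captionFeedUrl = "https://example.org/timedtext?lang=en&v=" + videoId;

            // Step 5: automatic REST (HTTP GET) request for the timed text.
            string xml;
            using (WebClient client = new WebClient())
            {
                xml = client.DownloadString(captionFeedUrl);
            }

            // Step 6: parse subtitle text, start time and duration per media fragment.
            XDocument doc = XDocument.Parse(xml);
            int fragmentNumber = 0;
            foreach (XElement text in doc.Descendants("text"))        // assumed element name
            {
                fragmentNumber++;
                double start = (double?)text.Attribute("start") ?? 0.0; // assumed attribute
                double dur = (double?)text.Attribute("dur") ?? 0.0;     // assumed attribute
                Console.WriteLine("Fragment {0}: [{1:F2}s - {2:F2}s] {3}",
                    fragmentNumber, start, start + dur, text.Value);
            }
        }
    }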

Figure 5. The basic data flow process in the proposed SY-VSE.

B. SY-VSE (Proposed) Modules Description

1) UM-NLPTR: We propose UM-NLPTR for extracting NEs and the other main portions of 10 NLP web tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated and enriched with resources from the LOD cloud. These NLP web tools include AlchemyAPI, DBpedia Spotlight, Yahoo! Content Extraction, Extractiv, OpenCalais, Saplo, Wikimeta, TextRazor and Zemanta. Each of the 10 NLP web tools used has its own unique portions and features. For example, the Extractiv NLP web tool supports the Persian language as input, while the TextRazor NLP web tool has a special ability to draw dependency parse graphs. Recently, the research and commercial communities have spent effort on publishing NLP services on the web. Besides the common tasks of WSD and POS identification, they provide further disambiguation facilities with URIs that describe web resources, leveraging the web of real-world objects. The most popular NLP tools use various portions to provide desirable performance, such as entity extraction, text categorization, sentiment analysis, concept tagging and other similar items. However, the existing research on online video annotation is limited to entity extraction and, sometimes, topic extraction. In this paper, we created UM-NLPTR as an integrator module of the portions of the main NLP tools. NLP tools represent a clear opportunity for the web community to increase the volume of interconnected data and can therefore make a huge contribution to revealing the truly invisible web. This paper presents UM-NLPTR, a framework that unifies the output of 10 different NLP tools. UM-NLPTR's architecture follows REST principles and provides a suitable API for machines to exchange content in ideal output modes. It accepts any text/URI for an NLP tool portion or web document, which is analyzed in order to extract its main and enriched textual content. GET, POST and PUT methods manage the requests coming from users to retrieve the list of NEs and the main NLP tool portions, classification types and text/URIs for a specific tool or for a combination of them. The output sent back to the user can be serialized in JSON, XML or RDF, depending on the content type requested. Although the NLP tools share the same goal, they use different algorithms and their own classification taxonomies, which makes their comparison hard. In UM-NLPTR, we use disambiguation information for the detected entities. The UM-NLPTR ontology exposes additional ontological mappings for the detected entities. For example, we use the DBpedia Ontology as one of the important UM-NLPTR ontology modules, which currently contains about 2,350,000 instances, and we utilize the TextRazor ontology as a module that can automatically place entities into an ontology of thousands of categories derived from LD sources. Similarly, the UM-NLPTR ontology sends the resulting NEs in URI form to OpenCalais, Extractiv, AlchemyAPI, Wikimeta, Saplo and all of the NLP tools that contain an ontology section.

2) UM-PAR: After the NEs and the other main portions from UM-NLPTR are displayed on the user interface, the client can send interesting NEs to UM-PAR. Through this module, a user can efficiently boost the resulting NEs. An NLP tool such as OpenCalais can show the geographical latitude and longitude of an NE; however, it cannot show the geographical information on the Google Maps API. NLP tools can show an NE together with its categories well, but they cannot show enriched information related to them. For example, none of these NLP tools can display the physical characteristics of "Mark Zuckerberg", whereas Wolfram Alpha shows such information and the relationships of NEs as well. We can get an overview of books related to "Mark Zuckerberg" through Google Books, and see suitable prices and bibliographic information on Amazon and the Library of Congress. "The Social Network" is a movie about the life of "Mark Zuckerberg", and IMDb presents competent information about this movie to users. As seen above, by migrating an NE to a higher level we can achieve ideal boosting of such NEs. We used JavaScript code to implement some APIs, such as Google Maps and Google Books. UM-PAR's framework follows REST principles, similar to the UM-NLPTR architecture. The unified results of UM-PAR can be shown in an appropriate graphical/textual form, similar to the UM-NLPTR form. We therefore need a higher-level representation to obtain clearly boosted NE enrichment, and UM-PAR implements such a level. In addition, UM-PAR is further optimized by collaborating with UM-NLPTR to exploit the number of named entities extracted from YouTube video subtitles, their type and their appearance on the timeline as features for classifying videos into different categories.
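To make the UM-PAR boosting step concrete, the following C# sketch shows how a selected NE could be forwarded to one of the popular web APIs over REST. It is an illustrative assumption rather than the actual UM-PAR code: the Wolfram Alpha query shown requires a valid AppID, and the exact parameters the module uses may differ.

    using System;
    using System.Net;
    using System.Xml.Linq;

    class UmParBoostSketch
    {
        // Sends a named entity to the Wolfram Alpha query API and prints the
        // plain-text pods (input interpretation, basic information, etc.).
        // The AppID is a placeholder; error handling is omitted for brevity.
        static void BoostEntity(string namedEntity)
        {
            string appId = "YOUR_APPID";   // placeholder credential
            string url = "http://api.wolframalpha.com/v2/query?appid=" + appId +
                         "&input=" + Uri.EscapeDataString(namedEntity) +
                         "&format=plaintext";

            string xml;
            using (WebClient client = new WebClient())
            {
                xml = client.DownloadString(url);
            }

            XDocument doc = XDocument.Parse(xml);
            foreach (XElement pod in doc.Descendants("pod"))
            {
                string title = (string)pod.Attribute("title");
                foreach (XElement plaintext in pod.Descendants("plaintext"))
                {
                    if (!string.IsNullOrEmpty(plaintext.Value))
                        Console.WriteLine("{0}: {1}", title, plaintext.Value);
                }
            }
        }

        static void Main()
        {
            BoostEntity("Mark Zuckerberg");
        }
    }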
3) RDF-GM: We implemented the RDF Generator Module (RDF-GM) based on the dotNetRDF library. dotNetRDF has two main products: a programmer API for working with RDF and SPARQL in .Net code, and a toolkit which provides an assortment of GUI and command-line tools for working with RDF and SPARQL. RDF-GM is a very important module in our work. It has particular abilities in reading RDF, writing RDF and working with graphs. RDF-GM receives the unified results from UM-NLPTR and produces suitable RDF syntax outputs on the client's storage disk; simultaneously, it creates the related TriG format for the triple store. dotNetRDF currently supports reading RDF files in all the RDF syntaxes, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG (Turtle with Named Graphs), TriX (Named Graphs in XML) and NQuads (NTriples plus Context). Through the SPARQL GUI, the end user can send queries of interest to the in-memory triple store and to the RDF syntaxes stored on his/her disk storage. Here, through the web application, the user can choose a suitable location for storing the structured and semantic files (for example, drive D). In addition, the end user can write an RDF file to RDF-GM containing his/her annotations of the unified results of UM-NLPTR and UM-PAR. In YouTube, the visualization of media resources is embedded within a web page. This offers opportunities to embed RDF about media fragments and annotations into the original display of the multimedia resources and their annotations. In this method the client side does not need to change anything unless the visualization of media fragments is required, while the server side has to add RDFa into the web pages. Regarding RDFa extraction, dotNetRDF has a parser which extracts RDF embedded as RDFa in HTML and XHTML documents.
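The following C# sketch, assuming the dotNetRDF library named above, illustrates how RDF-GM-style output could be produced: a graph standing in for the unified results is written in several RDF syntaxes to disk and also added to an in-memory triple store that is saved as TriG. The graph contents, file names and base URI are illustrative assumptions.

    using System;
    using VDS.RDF;
    using VDS.RDF.Writing;

    class RdfGmSketch
    {
        static void Main()
        {
            // A graph standing in for the unified UM-NLPTR results of one media fragment.
            IGraph g = new Graph();
            g.BaseUri = UriFactory.Create("http://example.org/fragment/12-VIDEO_ID"); // assumed
            INode frag = g.CreateUriNode(g.BaseUri);
            INode label = g.CreateUriNode(UriFactory.Create(
                "http://www.w3.org/2000/01/rdf-schema#label"));
            g.Assert(new Triple(frag, label, g.CreateLiteralNode("entity extraction portion")));

            // Write the same graph in three RDF syntaxes (NTriples, Turtle, RDF/XML),
            // mirroring the default outputs described for RDF-GM.
            new NTriplesWriter().Save(g, "12-VIDEO_ID.nt");
            new CompressingTurtleWriter().Save(g, "12-VIDEO_ID.ttl");
            new RdfXmlWriter().Save(g, "12-VIDEO_ID.rdf");

            // Add the graph to an in-memory triple store and save it as TriG,
            // so it can later be queried through the SPARQL GUI.
            TripleStore store = new TripleStore();
            store.Add(g);
            new TriGWriter().Save(store, "store.trig");

            Console.WriteLine("Saved {0} triple(s) in four serializations.", g.Triples.Count);
        }
    }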

Figure 6. Some overviews of the SY-VSE user interface and sample results.

C. SY-VSE User Interface

Some overviews of our web application are shown in Figure 6. SY-VSE (Proposed) is a web application written in C# 2012. A user must first log in to SY-VSE (Proposed) and is then redirected to the first level of search. In the first level, after entering the desired keywords, the user can pick the main keyword-based textual metadata from the resulting subtitled YouTube videos. For example, by picking the category item, we can compare a YouTube video category with the category derived for every media fragment subtitle. After the main textual metadata is picked, the user is redirected to the second level of search. In this level, the user can interact properly with the video semantic annotation components, which include the video player, the subtitle information presenter and the exhibitor of the unifier modules' outputs. We also implemented navigator subcomponents to make it easy to send desired URIs/text to the unifier modules, send written RDF syntaxes to RDF-GM, send resulting NEs to UM-PAR, and send user queries to the SPARQL GUI. In the second level, the user can also conveniently seek through the media fragments of the selected subtitled YouTube video: by selecting a desired media fragment, he/she jumps to the specific start time in the video player. For this purpose, we implemented JavaScript code supported by the YouTube feed API. The interval from the start time to the end time of the selected media fragment is highlighted, and the timed-text information is sent to the subtitle information presenter component. Here, subtitle information such as the closed caption text, media fragment number, start time, duration and end time is sent to UM-NLPTR. The unified results of UM-NLPTR are then sent to the exhibitor of the unifier modules' outputs, where the user can see them. The end client can now make further use of the navigator subcomponents and the other main unifier/generator modules. For instance, the user has ideal facilities for sending a desired query to the SPARQL endpoint, writing and building RDF syntaxes and sending them to RDF-GM, sending interesting resulting NEs to UM-PAR, entering a desired URI/text for the unifier modules, and sending his/her metadata annotations (e.g. rating, comments, etc.) to the YouTube platform. Finally, the end client has user-friendly and complete cruising over the LOD cloud.

IV. EVALUATION

A. Evaluation of our approach against previous work

We divided our comparisons into two parts. In the first part, we compare UM-NLPTR against the NE extraction module in [8]. In the second part, we compare our RDF generator module against the RDF API in [7].

In comparison with the NERD module [8], UM-NLPTR has several advantages: 1) it presents a framework supporting all the features of the 10 NLP web tools, such as concept tagging, entity extraction, keyword extraction, text categorization, sentiment analysis, relation extraction, event identification, fact identification, entailment extraction and many other features, and represents them in a unified format; 2) it presents the UM-NLPTR ontology to support all the ontological features of the 10 NLP web tools and represents unified ontological results; 3) it uses TextRazor as an acceptable NLP web tool instead of the Evri NLP web tool of [8]; 4) it uses all the NE types of the 10 NLP web tools in the output; and 5) it allows user tests by URI/text (according to all of the NLP web tools' features). Regarding the cost or overhead of the 10 NLP web tools, since we use advanced cloud-based and on-premise text analysis tools that eliminate the expense and difficulty of integrating natural language processing systems into a web application, and since they are based on REST or JSON requests, we do not pay a great cost for interacting with them. As mentioned, each of the 10 NLP web tools has its own advantages, and all the tools located in UM-NLPTR and UM-PAR interact with each other. For instance, in Figure 6, OpenCalais does not return any information about Mark Zuckerberg's nationality; the Wolfram Alpha and Google Maps APIs, however, give the user specific information on the nationality and birthplace of the Mark Zuckerberg NE.

Finally, in comparison with the RDF API in [7], our RDF-GM has several advantages: 1) it provides facilities for reading RDF, writing RDF, working with graphs, working with triple stores and querying with SPARQL by using the dotNetRDF library; 2) it offers all of the RDF syntaxes supported by dotNetRDF for reading RDF, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG, TriX and NQuads; 3) it offers all of the RDF syntaxes supported by dotNetRDF for writing graphs, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON and XHTML + RDFa; 4) it loads and saves triple stores into an in-memory triple store module; 5) it embeds user annotations, caption information and picked metadata into an enriched RDF/XML format; 6) it exports every UM-NLPTR result into an enriched RDF/XML format, according to video ID and media fragment number; 7) it saves all of the resulting RDF syntaxes to the user's disk storage according to the user's preferences; 8) it saves an in-memory triple store as a file in an RDF dataset format such as TriG, TriX or NQuads, according to the user's preferences (on the user's disk storage); and 9) it uses SPARQL GUI as a tool to send user queries to the enriched files stored on the client side.

B. Experimental results of our approach

We collected 80 subtitled YouTube videos in total and evaluated them within four different YouTube channels: Tech, Science and Education, Sports and News. The evaluation consisted of two steps: 1) getting all the subtitles of the YouTube videos and 2) running the main portions of the 10 NLP web tools through UM-NLPTR. We combined the portion categories of the main web natural language processors supported by UM-NLPTR and divided them into 11 portions: entity extraction, keyword extraction, topic extraction, word detection, sentiment analysis, phrases detection, text categorization, meaning detection, concept tagging, relation extraction and dependency parse graph. Regarding the first step, since we send the GET requests for subtitled YouTube videos over the REST protocol to the YouTube data API, our experimental results showed that YouTube returns each of the 80 YouTube videos with its subtitles in an XML file. In the second step, disregarding subtitle environment sounds (e.g. a train passing), we present our experimental results. Table I shows the number of media fragments and the number of subtitled YouTube videos evaluated per the 4 YouTube channels.

TABLE I. NUMBER OF MEDIA FRAGMENTS AND NUMBER OF SUBTITLED YOUTUBE VIDEOS EVALUATED PER 4 YOUTUBE CHANNELS.

YouTube channel          Media fragments number    Subtitled YouTube videos number
Tech                     761                       20
Science and Education    653                       20
Sports                   602                       20
News                     734                       20

To analyze the UM-NLPTR output, we evaluated the number of media fragments on the four YouTube channels with a correct response per portion of the 10 web natural language processors (see Table II).

TABLE II. NUMBER OF MEDIA FRAGMENTS LOCATED ON 4 YOUTUBE CHANNELS PER PORTION'S CORRECT RESPONSE.

Web natural language processors portions    Tech      Science and Education    Sports    News
Entity extraction                           759       652                      602       734
Keyword extraction                          745       638                      597       728
Topic extraction                            753       645                      591       718
Word detection                              761       653                      595       732
Sentiment analysis                          742       639                      600       730
Phrases detection                           758       650                      601       733
Text categorization                         746       641                      586       719
Meaning detection                           757       648                      598       730
Concept tagging                             748       624                      577       705
Relation extraction                         751       641                      597       722
Dependency parse graph                      760       653                      600       734
Average                                     752.72    644                      594.90    725.90

In [7], they evaluated only the average number of NEs and entities extracted per three YouTube channels (on 60 YouTube videos). They ran their evaluations on the entity extraction portion and on nine limited entity types of the NERD module, including thing, person, function, organization, location, product, time, amount and event. In contrast, we evaluated our experimental results on 11 main portions of the NLP web tools. Furthermore, we could draw on the thousands of entity types available across the 10 NLP web tools in our entity extraction portion. For example, in our entity extraction portion, across the 20 evaluated videos and 761 media fragments of the Tech YouTube channel, 759 media fragments were successfully enriched with related NEs. These NEs come from inside UM-NLPTR. In addition, apart from the entity extraction portion, we evaluated the outputs of the 10 other main portions located in UM-NLPTR. Our experimental results on 80 subtitled YouTube videos showed that these 10 portions can have very important effects on enriching the textual and temporal features of subtitled YouTube media fragments.
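As a hedged illustration of how the enriched data produced above could be queried (the role assigned to the SPARQL GUI and the in-memory triple store in Sections III and IV.A), the following C# sketch loads a saved TriG dataset with dotNetRDF and runs a SPARQL query over it. The file name, the "mentions" property URI (the same illustrative property used in the earlier sketch) and the query itself are assumptions.

    using System;
    using VDS.RDF;
    using VDS.RDF.Parsing;
    using VDS.RDF.Query;
    using VDS.RDF.Query.Datasets;

    class SparqlQuerySketch
    {
        static void Main()
        {
            // Load the TriG dataset previously saved by the RDF generator step.
            TripleStore store = new TripleStore();
            TriGParser trigParser = new TriGParser();
            trigParser.Load(store, "store.trig");

            // Query the in-memory dataset with dotNetRDF's Leviathan SPARQL engine.
            InMemoryDataset dataset = new InMemoryDataset(store);
            LeviathanQueryProcessor processor = new LeviathanQueryProcessor(dataset);
            SparqlQueryParser queryParser = new SparqlQueryParser();

            // Illustrative query: which media fragments mention a given LOD entity?
            string queryString =
                "SELECT ?fragment WHERE { " +
                "  ?fragment <http://example.org/vocab/mentions> " +   // assumed property
                "            <http://dbpedia.org/resource/Mark_Zuckerberg> . " +
                "}";

            SparqlResultSet results =
                processor.ProcessQuery(queryParser.ParseFromString(queryString)) as SparqlResultSet;

            foreach (SparqlResult result in results)
            {
                Console.WriteLine(result["fragment"]);
            }
        }
    }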

V. CONCLUSION AND FUTURE WORK

The enormous growth of subtitled YouTube video data repositories has increased the need for semantic video indexing techniques. In this paper, we discussed a new way of semantically indexing subtitled YouTube video content by extracting the main portions from the captions with web natural language processors. We introduced integrator modules whose results are associated with subtitled YouTube media fragments. LD provides a suitable way to expose, index and search media fragments and annotations on the semantic web, using URIs for the identification of resources and RDF as a structured data format. Here, LOD aims to publish and connect open but heterogeneous databases by applying the LD principles, and the aggregation of all LOD data sets is denoted the LOD cloud. Finally, by implementing important semantic web derivatives such as the SPARQL GUI and RDF-GM, the end client can cruise efficiently over the LOD cloud. On the other hand, with tools such as CaptionTube a user can effortlessly create suitable captions for his/her own YouTube videos. Therefore, given that owners can quickly publish hundreds of thousands of YouTube videos and easily convert them to subtitled YouTube videos, we can achieve efficient semantic indexing and annotation of subtitled YouTube content with SY-VSE (Proposed). For future work, we plan to apply our proposed approaches to the fields of object-featured video summarization and video categorization. In addition, we are developing a query-builder interface which creates the SPARQL queries, so that end users without knowledge of SPARQL can easily interact with the enriched semantic files located on their storage disk.

REFERENCES

[1] B. Haslhofer, W. Jochum, R. King, C. Sadilek, and K. Schellner, "The LEMO annotation framework: weaving multimedia annotations with the web," International Journal on Digital Libraries, vol. 10, pp. 15-32, 2009.
[2] J. Waitelonis, N. Ludwig, and H. Sack, "Use what you have: Yovisto video search engine takes a semantic turn," in Semantic Multimedia, Springer, 2011, pp. 173-185.
[3] O. Erling and I. Mikhailov, "RDF Support in the Virtuoso DBMS," in Networked Knowledge-Networked Media, Springer, 2009, pp. 7-24.
[4] R. Arndt, R. Troncy, S. Staab, L. Hardman, and M. Vacura, "COMM: designing a well-founded multimedia ontology for the web," in The Semantic Web, Springer, 2007, pp. 30-43.
[5] T. Steiner and M. Hausenblas, "SemWebVid-Making Video a First Class Semantic Web Citizen and a First Class Web Bourgeois," in ISWC Posters & Demos, 2010, pp. 1-8.
[6] T. Steiner, R. Verborgh, R. Van de Walle, M. Hausenblas, and J. Gabarró Vallès, "Crowdsourcing event detection in YouTube videos," 2012, pp. 58-67.
[7] Y. Li, G. Rizzo, R. Troncy, M. Wald, and G. Wills, "Creating enriched YouTube media fragments with NERD using timed-text," 2012, pp. 1-4.
[8] G. Rizzo and R. Troncy, "NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Extraction Tools," in Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 73-76.