Creating a Novel Semantic Video Search Engine Through Enrichment of Textual and Temporal Features of Subtitled YouTube Media Fragments
Babak Farhadi
Department of Computer Engineering, School of Engineering
University of Tehran, International Campus
Kish Island, Iran
[email protected]

M. B. Ghaznavi-Ghoushchi
Dept. of EE
Shahed University
Tehran, Iran
ghaznavi AT shahed.ac.ir
Abstract— Semantic video annotation is an active research area within the field of multimedia content understanding. With the steady increase of videos published on popular video-sharing platforms such as YouTube, growing effort is spent on annotating these videos automatically. In this paper, we propose a novel framework for annotating subtitled YouTube videos using both textual features, such as the portions extracted from web natural language processors in relation to subtitles, and temporal features, such as the duration of the media fragments in which particular entities are spotted. We implement SY-VSE (Subtitled YouTube Video Search Engine) as an efficient framework for cruising over the subtitled YouTube videos resident in the Linked Open Data (LOD) cloud. To this end, we propose the Unifier Module of Natural Language Processing (NLP) Tools Results (UM-NLPTR) for extracting the main portions of 10 NLP web tools from the subtitles associated with YouTube videos in order to generate media fragments annotated with resources from the LOD cloud. Then, we propose the Unifier Module of Popular APIs' Results (UM-PAR), containing seven favorite web APIs, to boost the Named Entity (NE) results obtained from UM-NLPTR. We use dotNetRDF as a powerful and flexible API for working with the Resource Description Framework (RDF) and SPARQL in .Net environments.

Index Terms— Subtitled YouTube video, NLP web tools, textual metadata, semantic web, video annotation, video search.

I. INTRODUCTION

For a better understanding of the discussion in this introduction, we have divided it into six subsections, summarized as follows:

A. Influence of natural language processors on on-line video platforms

Nowadays, on-line video-sharing platforms, especially YouTube, show that video has become the medium of choice for many people communicating via the net. On the other hand, the amazing increase of on-line video content confronts the consuming user with an indefinite amount of data, which can only be accessed with advanced multimedia search and management technologies in order to retrieve the few needles from the giant haystack. Most on-line video search engines provide a keyword-based search, where the lexical ambiguity of natural language often leads to imprecise and defective results. For instance, YouTube supports a keyword-based search within the textual metadata provided by the users, accepting all the shortcomings caused by, e.g., homonyms (see Figure 1). Better results are possible through the analysis of the available textual metadata with NLP web tools and popular web APIs, especially given the availability of subtitles on YouTube videos.

Figure 1. Structure of subtitled YouTube keyword-based search.

B. Semantic multimedia annotations and Linked Data

The role of semantic web technologies is to make the implicit meaning of content explicit by providing suitable metadata annotations based on formal knowledge representations. In this area, Linked Data (LD) is a means to expose, share and connect pieces of data, information and knowledge on the semantic web, using Uniform Resource Identifiers (URIs) for the identification of resources and RDF as a structured data format. It creates relationships from the data to other sources on the Web. These data sets are not only accessible to human beings but also readable by machines. LOD aims to publish and connect open but heterogeneous databases by applying the LD principles. The aggregation of all LOD data sets is denoted as the LOD-cloud. In this context, the term "media fragment" refers to the inner content of multimedia objects, such as a certain zone within an image or a five-minute segment within a one-hour video. Most research about media fragments focuses on exposing the closed data, such as tracks and video segments, within the multimedia file or the host server using semantic web and LD technologies.

C. Achieving indexable semantic data in on-line video search engines

To make use of semantic web information, search engines need to index semantic data. Generally, this can be achieved by storing the data in triple stores. Triple stores are Database Management Systems (DBMS) for data modeled using RDF. They store RDF triples and are queried through a SPARQL endpoint. To turn a non-semantic video search engine into a semantic video search engine, the already existing keyword-based textual metadata has to be mapped to semantic web entities. The most challenging problem in mapping data to semantic web entities is the existence of ambiguous names, which results in a set of candidate entities that have to be disambiguated. One of our important goals in this paper is the mapping of textual and temporal features of subtitled YouTube videos to LOD entities.

D. Role of video transcripts in semantic video indexing

In our opinion, one of the most challenging approaches to semantic video indexing is based on the extraction of semantics from subtitles. Subtitles carry such information through natural language sentences. The YouTube video content is described both by metadata referring to the entire video and by information assigned to a distinct time position within the video. Our intention is entity mapping of both kinds of metadata: the YouTube subtitled video metadata (e.g. title, description, rating, publish information, thumbnail view, category, etc.) and its attached transcripts. In addition, since textual features carrying a higher-level semantic concept are easier to understand, they are very useful in video classification.

E. Relationships between video transcripts, natural language processors and named entities

We must evaluate and analyze the subtitle in order to calculate the relatedness between textual inputs. For this purpose, we propose using NLP web tools to discover relatedness between words that possibly represent the same concepts. Tools for analyzing words and sentences within NLP include part-of-speech (POS) tagging and word sense disambiguation (WSD). These tools locate and discover the sense and concept that a word represents. NLP is not only about understanding, but also about the ability to manipulate data and, in some cases, produce answers. NLP is also strongly connected with Information Retrieval (IR), which searches and retrieves information by using some of these concepts. A word can be classified into several linguistic word types, for instance hypernyms, hyponyms and homonyms. A hypernym is a less specific instance than the original word; hyponyms are the opposite, a more specific instance of the given word; homonyms are words that share the same spelling or pronunciation but differ in meaning. In order to maximize the usability of NLP, we need a defined ontology. NE extraction is an exhaustive task in the NLP field that has spawned numerous services gaining popularity in the Semantic Web community for extracting knowledge from web documents. These services are generally organized as pipelines, using dedicated APIs and different taxonomies for extracting, classifying and disambiguating NLP main portions.

F. Overview of the main targets of our research

In this paper, by utilizing a rich data model based on the best NLP web tools, the popular web APIs, ontologies and RDF, we have developed a web application called SY-VSE (Proposed). Previous works close to ours have not used all the portions and ideas realized in this paper; they have used NLP web tools only to detect NEs (in a limited set of main types). However, for example, the AlchemyAPI main portions include concept tagging, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization, etc. Previous works treat OpenCalais and the other best NLP web tools in the same way. By using popular web APIs such as Amazon Web Services, Google Maps, Wolfram Alpha, Google Books, the Library of Congress, the Internet Movie Database (IMDb) and ThingISBN, we can truly boost the semantic meaning of NEs; however, none of the preceding works have used these popular web APIs in their approaches. In addition, in the foregoing web applications the user cannot use a SPARQL endpoint to query the RDF datasets: there is no SPARQL GUI for user queries towards triple stores, and there are no RDF syntax outputs (e.g. NTriples, Turtle, TriG, TriX, RDFa, etc.) to save to storage disk, analyze, and query through a SPARQL GUI. SY-VSE (Proposed) eliminates all these problems of the previous web applications (see Figure 2).

In this paper, we propose UM-NLPTR for extracting NEs and other main portions of 10 NLP web tools from the subtitles associated with YouTube videos in order to generate media fragments annotated with resources from the LOD cloud. Then, we propose UM-PAR, containing seven favorite web APIs, to boost the NE results obtained from UM-NLPTR. We use dotNetRDF as a powerful and flexible API for working with RDF and SPARQL in .Net environments. Our contribution is a new integrated method for extracting NEs and the other principal portions of the best NLP web tools through UM-NLPTR in both text and URI areas; linking media fragments to the LOD-cloud using the NEs and main portions extracted from subtitled YouTube videos; boosting the resulting NEs by UM-PAR; designing a user-friendly and robust user interface for browsing the enriched subtitled YouTube videos, with the ability to interact with the RDF Generator Module (RDF-GM), the triple store and the SPARQL GUI; and mapping the YouTube Data API keyword-based textual metadata and its attached transcripts to semantic web entities.

Figure 2. SY-VSE (Proposed) framework.

II. RELATED WORK

A. Overview of previous works related to our proposed approach

Many applications have already published multimedia and annotations as LD, which offers us experience in multimedia resource publishing. The LEMO multimedia annotation framework provides a unified model to annotate media fragments, while the annotations are enriched with contextually relevant information from the LOD-cloud. In LEMO, media fragments are published using the MPEG-21 vocabulary. LEMO has to convert existing video files to an MPEG-compatible version and stream them from the LEMO server. LEMO also derived a core annotation schema from the "Annotea Annotation Schema" in order to link annotations to media fragment identifications [1]. Yovisto.com hosts a large amount of recordings of academic lectures and conferences for users to search in a content-based manner. Some researchers augment the Yovisto open academic video search platform by publishing the database containing videos and annotations as LD [2]. Yovisto uses the Virtuoso server in [3] to publish the videos and annotations in the database, and MPEG-7 and the Core Ontology of Multimedia (COMM) in [4] to describe multimedia data. It provides both automatic video annotations based on video analysis and collaborative user-generated annotations, which are further linked to entities in the LOD-cloud with the objective of improving the searchability of videos.

SemWebVid automatically generates RDF video descriptions using closed captions. The captions are analyzed by three NLP web tools (AlchemyAPI, OpenCalais and Zemanta) but are chunked into blocks, which makes the NLP web tools lose the context [5]. In [6] it is shown how events can be detected on-the-fly through crowdsourcing textual, visual and behavioral analysis of YouTube videos, at scale. They defined three types of events: visual events in the sense of shot changes, occurrence events in the sense of the appearance of an NE, and interest-based events in the sense of purposeful in-video navigation by users. In the occurrence event detection process, they analyzed the available video metadata using NLP techniques, as outlined in [5]. The detected NEs are presented to the user in a list and, upon a click via a timeline-like user interface, allow jumping into one of the shots where the NE occurs.

B. Analysis of the closest work related to our proposed approach

In [7], an approach is proposed using 10 NLP tools based on the NERD module of [8]. They only use the entity extraction portion of the NLP web tools. In their demo, the user cannot take advantage of RDF syntax outputs, a SPARQL GUI or a triple store module. Most output results of their demo are not from subtitled YouTube videos, and they have not utilized the main keyword-based textual metadata of the YouTube Data API (see Figure 3). The NERD module uses the results of the 10 NLP web tools separately and does not integrate the NE recognition results of the 10 tools. On the other hand, NERD uses these 10 NLP tools only for NE recognition over the 10 main types. For this purpose, NERD returns chopped answers restricted to categories dependent on the NEs. It ignores the most important portions of the best NLP tools, including concept tagging, content scraping, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization, fact detection, etc. (see Figure 4). As we mentioned, the most challenging problem in mapping textual and temporal features of subtitled YouTube videos to LOD entities is the existence of ambiguous names, which results in a set of candidate entities that have to be disambiguated. Using all the main portions of the popular NLP web tools, boosting the resulting NEs through popular web APIs, and also utilizing RDF-GM and its related modules can result in an enriched entity set. In [7], the NLP tools, including AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais, Saplo, Wikimeta, Yahoo! Content Extraction and Zemanta, are used only for entity recognition. In this paper, we use all the main portions of the listed NLP tools, with the difference that we utilize the TextRazor NLP web tool instead of the Evri NLP web tool.
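The integration idea behind UM-NLPTR, as opposed to NERD's per-tool results, can be illustrated with a minimal sketch. This is not the paper's implementation: the tool names and the shape of each tool's output below are hypothetical simplifications of what NE services return.

```python
# Minimal sketch: unifying named-entity results from several NLP web tools
# into one entity set, deduplicated by normalized surface form and URI.
# Tool names and result shapes are illustrative assumptions, not real APIs.

def unify_entities(tool_results):
    """Merge per-tool NE lists, deduplicating by (lowercased text, URI)."""
    merged = {}
    for tool, entities in tool_results.items():
        for ent in entities:
            key = (ent["text"].lower(), ent.get("uri"))
            slot = merged.setdefault(key, {"text": ent["text"],
                                           "uri": ent.get("uri"),
                                           "types": set(), "tools": set()})
            slot["types"].add(ent["type"])   # keep every type label seen
            slot["tools"].add(tool)          # remember which tools agreed
    return list(merged.values())

# Hypothetical outputs from two tools for the same subtitle fragment.
results = {
    "toolA": [{"text": "Mark Zuckerberg", "type": "Person",
               "uri": "http://dbpedia.org/resource/Mark_Zuckerberg"}],
    "toolB": [{"text": "mark zuckerberg", "type": "Person",
               "uri": "http://dbpedia.org/resource/Mark_Zuckerberg"},
              {"text": "Facebook", "type": "Company", "uri": None}],
}

unified = unify_entities(results)  # two distinct entities, one confirmed twice
```

An entity confirmed by several tools can then be ranked higher than one spotted by a single tool, which is one simple way an integrator module improves over using the tools separately.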
The TextRazor NLP web tool can identify and disambiguate millions of NEs, including people, places, companies and thousands of other types. It has a special ability in extracting subject-action-object relations, properties and typed dependencies between entities. TextRazor returns contextual synonyms or entailments, that is, words that are semantically implied by the given context even though they are not explicitly mentioned. Generally, in comparison with the Evri NLP web tool, which is out of access right now, TextRazor is a complete, integrated textual analysis solution. It uses state-of-the-art machine learning and NLP techniques together with a comprehensive knowledge base of real-life facts to help parse and disambiguate transcripts with industry-leading accuracy and speed.

Figure 3. Example of the whole resulting page and architecture in [7].

Compared with [7], and according to our experiments, we have the proper ability to use all the portions of the best NLP tools in both text and URI areas through UM-NLPTR. In addition, UM-PAR is an expert module for boosting the resulting NEs. Furthermore, RDF-GM, the triple store and the SPARQL GUI play an important role in our paper in representing enriched results for cruising the LOD-cloud by users.

Figure 4. The NERD module used in [8].

III. FRAMEWORK OVERVIEW

Figure 5 shows the modular implementation of our proposed framework. In a nutshell, the framework enables users to pick interesting subtitled YouTube video metadata and retrieves the related transcript information from the YouTube Data API to extract NEs and other main portions through the integrator modules. The results of the integrator modules then shift towards RDF-GM and its affiliated portions for cruising the LOD-cloud by users. Below we describe in full the data flow process, our proposed modules and the SY-VSE (Proposed) user interface.

Figure 5. The basic data flow process in the proposed SY-VSE.

A. Data flow process

Our data flow process is shown in Figure 5. It can be summarized as follows: 1) The user enters a desired keyword or video id to receive the keyword-based textual metadata available from the YouTube Data API. 2) The requested keyword or video id is sent by a standard REST protocol using HTTP GET requests to the YouTube Data API. 3) The main keyword-based textual metadata is redirected by the YouTube Data API to the user. All the videos have subtitles. The returned textual metadata contains the video title, video ID, description, publishing date, updating date, author, recording location, rating average, five snapshots, user comments, etc. 4) Here, the user has the ability to pick interesting metadata; therefore, the keyword-based textual metadata on the LOD-cloud is based on the user's choice. The selected metadata, with the connected video id, is then redirected to semantic annotation. 5) To get the related subtitled YouTube video transcript (caption) information, the SY-VSE (Proposed) back-end sends an automatic REST request to the YouTube Data API. 6) At this point, the YouTube Data API returns the related video caption information, containing the subtitle text, the start time of each media fragment and its duration. We implemented optimized JavaScript code so that, by clicking on a media fragment number, the user jumps to the related start time; the video is then highlighted from the start time to the end time of the related media fragment. 7) The user can perform an advanced search on the whole subtitle and jump to the related media fragment. 8) To apply the best NLP techniques, the selected timed text is sent to UM-NLPTR. Here we used 10 NLP web tools to accomplish an appropriate result. Apart from NE extraction and disambiguation, UM-NLPTR uses the other main portions of the 10 NLP tools. 9) The unified and enriched results of UM-NLPTR are sent to the user interface. 10) The unified and enriched results of UM-NLPTR are sent towards RDF-GM, too. We propose a novel RDF Generator Module for the integration of the main keyword-based textual metadata, media fragments and unified results of UM-NLPTR and UM-PAR, which could be reused for various on-line media. 11) According to all the resulting portions of UM-NLPTR, based on the selected video id and media fragment number, RDF-GM generates suitable RDF syntax outputs (e.g. NTriples, Turtle, TriG, TriX, RDFa, etc.) and saves them to the user's storage disk. By default, for every resulting portion in UM-NLPTR, RDF-GM generates correct RDF syntax outputs in NTriples, Turtle and RDF/XML, but this is based on user selection. We used the dotNetRDF library for this purpose, and it supports many RDF syntax outputs. The desired output formats are saved on the user's storage disk (with the portion name as the folder name and the media fragment number plus video id as the file name). 12) Here, for every generated portion, RDF-GM generates the related TriG format to save or load into the in-memory triple store. In this way, the user can make proper SPARQL queries against the in-memory triple store. 13) After the unified results of UM-NLPTR are displayed to the user and entered into RDF-GM, the user has the ability to boost NEs through UM-PAR. It contains seven popular web APIs, such as Amazon Web Services, Google Maps, Wolfram Alpha, Google Books, the Library of Congress and the Internet Movie Database (IMDb); for example, the "Mark Zuckerberg" NE can be sent to UM-PAR based on the user's choice. UM-PAR returns unified results for the NEs; hence an enriched list of the relationships of the "Mark Zuckerberg" NE with the popular web APIs is returned. For instance, in the Wolfram Alpha API, the input interpretation, basic information, image, timeline, notable facts, physical characteristics, estimated net worth and Wikipedia page hits history for the "Mark Zuckerberg" NE are shown. 14) The unified and enriched results of UM-PAR are sent to the user interface. 15) SPARQL is the standard query language for the Semantic Web and can be used to query large volumes of RDF data. dotNetRDF provides support for querying local in-memory data using its own SPARQL implementation. We used the SPARQL GUI for testing SPARQL queries on arbitrary data sets which the user created by loading them in RDF-GM (or the triple store) and/or from remote URIs. A user can easily query files that have been stored in the triple store (in TriG format) and the other RDF syntaxes produced by RDF-GM. 16) Based on the retrieved subtitled YouTube videos, the user can cruise the LOD-cloud.

Simultaneously, the user can generate descriptions of the resulting objects in a media fragment. He/she can send his/her own annotations and descriptions of the results of the integrator modules (text/URI) towards RDF-GM to complete the process of generating an enriched RDF/XML file. For example, the user annotates the relation extraction portion of AlchemyAPI and writes a suitable subject, object and action into an RDF/XML file. Finally, we have an enriched RDF/XML file connected with the timed text information of a subtitled YouTube video media fragment.

B. SY-VSE (Proposed) modules description

1) UM-NLPTR: We proposed UM-NLPTR for extracting NEs and other main portions of 10 NLP web tools from the subtitles associated with YouTube videos in order to generate media fragments annotated and enriched with resources from the LOD cloud. These NLP web tools include AlchemyAPI, DBpedia Spotlight, Yahoo! Content Extraction, Extractiv, OpenCalais, Saplo, Wikimeta, TextRazor and Zemanta. Each of the 10 NLP web tools used has its own unique portions and features. For example, the Extractiv NLP web tool supports the Persian language in its input; on the other hand, the TextRazor NLP web tool has a special ability in drawing dependency parse graphs. Recently, research and commercial communities have spent effort on publishing NLP services on the web. Besides the common tasks of WSD and POS identification, they provide further disambiguation facilities with URIs that describe web resources, leveraging the web of real-world objects. The most favored NLP tools use various portions to provide desirable performance, such as entity extraction, text categorization, sentiment analysis, concept tagging and other similar items. However, research on on-line video annotation has been limited to entity extraction and sometimes topic extraction. In this paper, we created UM-NLPTR as an integrator module of most NLP tool portions. NLP tools represent a clear opportunity for the web community to increase the volume of interconnected data, and therefore they can make a huge contribution to revealing the truly invisible web. This paper presents UM-NLPTR, a framework that unifies the output of 10 different NLP tools. UM-NLPTR's architecture follows the REST principles and provides a suitable API for machines to exchange content in ideal output modes. It accepts any text/URI of an NLP tool portion or web document, which is analyzed in order to extract its main and enriched textual content. GET, POST and PUT methods manage the requests coming from users to retrieve the list of NEs and the main NLP tool portions, classification types and text/URIs for a specific tool or for a combination of them. The output sent back to the user can be serialized in JSON, XML and RDF depending on the requested content type. Although the NLP tools share the same goal, they use different algorithms and their own classification taxonomies, which makes their comparison hard. In UM-NLPTR, we use disambiguation information for the detected entities. The UM-NLPTR ontology exposes additional ontological mappings for detected entities. For example, we used the DBpedia Ontology, which currently contains about 2,350,000 instances, as one of the important UM-NLPTR ontology modules, and we utilized the TextRazor ontology as a module that can automatically place entities into an ontology of thousands of categories derived from LD sources. Similarly, the UM-NLPTR ontology sends the resulting NEs in URI form towards OpenCalais, Extractiv, AlchemyAPI, Wikimeta, Saplo and all the NLP tools that contain an ontology section.

2) UM-PAR: After the NEs and the other main portions of UM-NLPTR are displayed on the user interface, the client can send interesting NEs towards UM-PAR. With this module, a user can achieve an efficient boost of the resulting NEs. An NLP tool such as OpenCalais can show the geographical latitude and longitude of an NE; however, it cannot show that geographical information on the Google Maps API. NLP tools can show an NE with its categories well, but they cannot show enriched information related to them. For example, none of these NLP tools can display the physical characteristics of "Mark Zuckerberg"; however, Wolfram Alpha shows such information, as well as the relationships of NEs with each other. We can get an overview of books related to "Mark Zuckerberg" through Google Books, and we can see suitable prices and bibliographic information on Amazon and the Library of Congress. "The Social Network" is a movie about the life of "Mark Zuckerberg"; IMDb presents competent information about this movie to users. As we saw above, through NE migration to a higher level, we can achieve an ideal boosting of such NEs. We used JavaScript code to implement some APIs, such as Google Maps and Google Books. UM-PAR's framework follows REST principles, alike to the UM-NLPTR architecture. The unified results of UM-PAR can be shown in an appropriate graphical/textual form, similar to the UM-NLPTR form. Therefore, we need to represent a higher level to achieve boosted NE enrichment, and UM-PAR implements such a level as well. In addition, UM-PAR is better optimized through collaboration with UM-NLPTR, exploiting the number of named entities extracted from YouTube video subtitles, their type and their appearance on the timeline as features for classifying videos into different categories.

3) RDF-GM: We implemented the RDF Generator Module (RDF-GM) based on the dotNetRDF library. dotNetRDF provides two main products: a programmer API for working with RDF and SPARQL in .Net code, and a toolkit which provides an assortment of GUI and command-line tools for working with RDF and SPARQL. RDF-GM is a very important module in our work. It has particular abilities in reading RDF, writing RDF and working with graphs. RDF-GM receives the unified results from UM-NLPTR and produces suitable RDF syntax outputs towards the client's storage disk; simultaneously, it creates the related TriG format towards the triple store. dotNetRDF currently supports reading RDF files in all the RDF syntaxes, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG (Turtle with Named Graphs), TriX (Named Graphs in XML) and NQuads (NTriples plus Context). Through the SPARQL GUI, the end user can send interesting queries towards the in-memory triple store and the RDF syntaxes stored on his/her storage disk. Here, the user can choose through the web application a suitable location for storing the structured and semantic files (for example, drive D). In addition, the end user can write his/her own annotations of the unified results of UM-NLPTR and UM-PAR into an RDF file towards RDF-GM.
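The kind of output RDF-GM produces for a media fragment can be sketched as follows. The paper's module is built on dotNetRDF in .Net; this stdlib-only Python analogue merely illustrates the shape of the N-Triples it would emit. The `example.org` predicate names are placeholders, not a vocabulary the paper defines; the `#t=start,end` temporal identifier follows the W3C Media Fragments URI syntax.

```python
# Minimal sketch of media-fragment annotation triples in N-Triples form.
# Predicates under http://example.org/vocab/ are placeholders (assumptions).

def ntriple(s, p, o, literal=False):
    """Serialize one triple as an N-Triples line."""
    obj = '"%s"' % o if literal else "<%s>" % o
    return "<%s> <%s> %s ." % (s, p, obj)

def fragment_triples(video_id, start, end, text, entity_uri):
    """Triples linking a temporal video fragment to its subtitle and an NE."""
    frag = "http://www.youtube.com/watch?v=%s#t=%s,%s" % (video_id, start, end)
    ex = "http://example.org/vocab/"  # placeholder namespace
    return [
        ntriple(frag, ex + "subtitleText", text, literal=True),
        ntriple(frag, ex + "mentionsEntity", entity_uri),
        ntriple(frag, ex + "startTime", start, literal=True),
        ntriple(frag, ex + "endTime", end, literal=True),
    ]

triples = fragment_triples("abc123", "12", "21",
                           "Mark Zuckerberg founded Facebook",
                           "http://dbpedia.org/resource/Mark_Zuckerberg")
```

Writing such lines to disk, one file per portion and fragment, mirrors the folder-and-file naming scheme described in step 11 above.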
Figure 6. Overviews of the SY-VSE user interface and sample results.
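Step 15 describes querying the in-memory triple store through the SPARQL GUI. A real engine parses full SPARQL; the minimal sketch below only shows the core idea of matching a single triple pattern against triples held in memory, with `None` standing in for a query variable. The prefixes and data are illustrative assumptions.

```python
# Minimal sketch: one triple-pattern match over an in-memory triple list,
# the core operation behind a SPARQL query such as
#   SELECT ?s WHERE { ?s <mentions> ?o }
# Prefixed names like "frag:1" are illustrative shorthand, not real IRIs.

def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

store = [
    ("frag:1", "mentions", "dbpedia:Mark_Zuckerberg"),
    ("frag:1", "startTime", "12"),
    ("frag:2", "mentions", "dbpedia:Facebook"),
]

mentions = match(store, p="mentions")  # which fragments mention an entity?
```

Conjunctive SPARQL queries chain several such pattern matches and join their bindings, which is what the dotNetRDF in-memory engine does for the user behind the SPARQL GUI.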
In YouTube, the visualization of media resources is embedded within a web page. This offers the opportunity to embed RDF about media fragments and annotations into the original display of the multimedia resources and their annotations. In this method, the client side does not need to change anything unless the visualization of media fragments is required, while the server side has to add RDFa to the web pages. Regarding RDFa extraction, dotNetRDF has a parser which extracts RDF embedded as RDFa in HTML and XHTML documents.

C. SY-VSE user interface

Some overviews of our web application are shown in Figure 6. SY-VSE (Proposed) is a web application in C# 2012. A user must first log in to SY-VSE (Proposed); the user is then redirected to the first level of search. In the first level, after entering the desired keywords, the user can pick the main keyword-based textual metadata from the resulting subtitled YouTube videos. For example, through picking the category item, we can compare a YouTube video category with the existing category for every media fragment subtitle. After the main textual metadata is picked, the user is redirected to the second level of search. In this level, the user can properly interact with the video semantic annotation components, which include the video player, the subtitle information presenter and the exhibitioner of the unifier modules' outputs. We also implemented navigator subcomponents to make it easy to send desired URIs/text towards the unifier modules, send written RDF syntaxes to RDF-GM, send resulting NEs towards UM-PAR, and send user queries towards the SPARQL GUI. In the second level, the user has a good ability to seek within the media fragments of the selected subtitled YouTube video, so that by selecting a desired media fragment, he/she jumps to the specific start time in the video player. For this purpose, we implemented JavaScript code supported through the YouTube feed API. The selected media fragment is highlighted from start time to end time, and the timed text information is sent to the subtitle information presenter component. Here, subtitle information such as the closed caption text, media fragment number, start time, duration and end time is sent to UM-NLPTR. Then, the unified results of UM-NLPTR are sent to the exhibitioner of the unifier modules' outputs, where the user can see them. Now, the end client can achieve more through the navigator subcomponents and the other main unifier/generator modules. For instance, the user has ideal facilities to send a desired query towards the SPARQL endpoint, write RDF syntaxes and send them towards RDF-GM, send interesting resulting NEs towards UM-PAR, enter a desired URI/text towards the unifier modules, and send his/her metadata annotations (e.g. ranking, comments, etc.) towards the YouTube platform. Finally, we have user-friendly and complete cruising of the LOD-cloud by the end client.

IV. EVALUATION

A. Evaluation of our approach against previous work

We divided our comparisons into two parts. In the first part, we compare UM-NLPTR against the NE extraction module of [8]. In the second part, we compare our RDF Generator Module with the RDF API in [7].

In comparison with the NERD module [8], UM-NLPTR has some advantages: 1) it presents a framework to support all the features of the 10 NLP web tools, such as concept tagging, entity extraction, keyword extraction, text categorization, sentiment analysis, relation extraction, event identification, fact identification, entailment extraction and many other features, representing them in a unified format; 2) it presents the UM-NLPTR ontology to support all the ontological features of the 10 natural language processors supported by UM-NLPTR. We divided these categories into 11 portions, including entity extraction, keyword extraction, topic extraction, word detection, sentiment analysis, phrase detection, text categorization, meaning detection, concept tagging, relation extraction and dependency parse graph.

Regarding the first step, since we send the subtitled YouTube video GET request by REST protocol towards the YouTube Data API, our experimental results showed that YouTube returns every 80 YouTube videos with their subtitles located in an XML file. In the second step, regardless of subtitle environment sounds (e.g. train passings
V. CONCLUSION AND FUTURE WORK

The enormous growth of subtitled YouTube video repositories has increased the need for semantic video indexing techniques. In this paper, we discussed a new way of semantically indexing subtitled YouTube video content by extracting the main portions from the captions with web natural language processors. We introduced integrator modules whose results are associated with subtitled YouTube media fragments. LD has provided a suitable way to expose, index and search media fragments and annotations on the semantic web, using URIs for the identification of resources and RDF as a structured data format. Here, LOD aims to publish and connect open but heterogeneous databases by applying the LD principles; the aggregation of all LOD data sets is denoted as the LOD-cloud. Finally, by implementing the important semantic web derivatives such as the SPARQL GUI and RDF-GM, the end client can cruise the LOD-cloud efficiently. On the other hand, with tools such as CaptionTube, users can effortlessly create suitable captions for their own YouTube videos. Therefore, with hundreds of thousands of YouTube videos being distributed at any moment and easily converted to subtitled YouTube videos by their owners, we can achieve efficient semantic indexing and annotation of subtitled YouTube content with SY-VSE (Proposed). For future work, we plan to apply our proposed approaches to the fields of object-featured video summarization and video categorization. In addition, we are developing a query-builder interface which creates the SPARQL queries, so that an end user without knowledge of SPARQL can easily interact with the enriched semantic files located on his/her storage disk.
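The planned query builder can be sketched as simple string assembly: user selections in the interface map to triple patterns, and the builder composes the SPARQL text. This is a minimal illustration under assumed prefixes (`ex:`, `dbr:`), not the paper's actual interface.

```python
# Minimal sketch of a SPARQL query builder: compose a SELECT query from
# (subject, predicate, object) string patterns chosen in a UI.
# The ex: and dbr: prefixed names are illustrative assumptions.

def build_select(var, patterns, limit=None):
    """Compose a simple SPARQL SELECT query from string triple patterns."""
    where = " ".join("%s %s %s ." % t for t in patterns)
    query = "SELECT %s WHERE { %s }" % (var, where)
    if limit is not None:
        query += " LIMIT %d" % limit
    return query

# e.g. "find fragments mentioning Mark Zuckerberg", built from UI choices
q = build_select("?frag",
                 [("?frag", "ex:mentionsEntity", "dbr:Mark_Zuckerberg")],
                 limit=10)
```

The generated string would then be sent to the SPARQL endpoint exactly as a hand-written query would, so the builder adds no new execution machinery.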
REFERENCES
[1] B. Haslhofer, W. Jochum, R. King, C. Sadilek, and K. Schellner, "The LEMO annotation framework: weaving multimedia annotations with the web," International Journal on Digital Libraries, vol. 10, pp. 15-32, 2009.
[2] J. Waitelonis, N. Ludwig, and H. Sack, "Use what you have: Yovisto video search engine takes a semantic turn," in Semantic Multimedia, Springer, 2011, pp. 173-185.