On Annotation of Video Content for Multimedia Retrieval and Sharing
International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 3, March 2016

Mumtaz Khan*, Shah Khusro, Irfan Ullah
Department of Computer Science, University of Peshawar, Peshawar 25120, Pakistan

Abstract- Standards such as MPEG-7, MPEG-21 and the ID3 tags embedded in MP3 have been recognized for quite some time and are of great importance in adding descriptions to multimedia content for better organization and retrieval. However, these standards are only suitable for closed-world multimedia content, where considerable effort is invested at the production stage. Video content on the Web, by contrast, is of an arbitrary nature, captured and uploaded in a variety of formats with the main aim of quick and easy sharing. The advent of Web 2.0 has resulted in the wide availability of video-sharing applications such as YouTube, which have made video a major type of content on the Web. These Web applications not only allow users to browse and search multimedia content but also to add comments and annotations, which provide an opportunity to store miscellaneous information and thought-provoking statements from users all over the world. However, these annotations have not been exploited to their fullest for the purpose of searching and retrieval. Making these annotations machine-processable will make video indexing, retrieval, ranking and recommendation more efficient. Moreover, associating annotations with a specific region or temporal duration of a video will enable fast retrieval of the required video scene. This paper investigates state-of-the-art desktop and Web-based multimedia annotation systems, focusing on their distinct characteristics, strengths and limitations. Different annotation frameworks, annotation models and multimedia ontologies are also evaluated.

Keywords: Ontology, Annotation, Video sharing web application

I. INTRODUCTION

Multimedia content is the collection of different media objects including images, audio, video, animation and text. The importance of videos and other multimedia content is obvious from their usage on YouTube and other platforms. According to statistics regarding video search and retrieval on YouTube [1], about 100 hours of video are uploaded to YouTube every minute, 700 YouTube videos are shared on Twitter per day, and 500 years' worth of YouTube video is shared on Facebook [1]. Figure 1 further illustrates the importance of and need for videos among users all over the world. In March 2014, about 187.9 million US Internet users watched approximately 46.6 million videos [2]. These videos are not only a good source of entertainment but also facilitate students, teachers and research scholars in accessing educational and research videos from different webinars, seminars, conferences and encyclopedias. However, when searching for relevant videos, the opinions of fellow users/viewers and their annotations in the form of tags, ratings and comments are of great importance if dealt with carefully. To deal with this issue, a number of video annotation systems are available, including YouTube, Vimeo1, Youku2, Myspace3, VideoANT4, SemTube5, and Nicovideo6, which not only allow users to annotate videos but also enable them to search videos using the attached annotations and share videos with other users of similar interests. These applications allow users to annotate not only the whole video but also a specific event in it (temporal annotation) or an object in a scene (pointing-region annotation). However, they are unable to annotate specific themes in a video, and browsing for a specific scene, theme, event or object, as well as searching using whole-video annotations, remain daunting tasks that need further research. One solution to these problems is to incorporate context-awareness using Semantic Web technologies, including the Resource Description Framework (RDF), the Web Ontology Language (OWL), and several top-level as well as domain-level ontologies, in properly organizing, listing, browsing, searching, retrieving, ranking and recommending multimedia content on the Web. Therefore, this paper also focuses on the use of Semantic Web technologies in video annotation and video-sharing applications.
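As a concrete illustration of what the preceding paragraph proposes, the following minimal sketch shows how a viewer's comment on a fifteen-second temporal fragment of a video could be made machine-processable in RDF, using Python's rdflib library together with the W3C Web Annotation (oa:) vocabulary and a Media Fragments selector. It is not taken from any of the surveyed systems; the annotation URI, video URI, time range and comment text are hypothetical placeholders.

```python
# A minimal sketch (not from the paper) of a machine-processable, time-anchored
# video annotation expressed in RDF with Python's rdflib, using the W3C Web
# Annotation (oa:) vocabulary and a Media Fragments selector. The annotation
# URI, video URI, time range and comment text are hypothetical placeholders.
from rdflib import Graph, Namespace, URIRef, Literal, BNode
from rdflib.namespace import RDF

OA = Namespace("http://www.w3.org/ns/oa#")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("oa", OA)
g.bind("dct", DCT)

annotation = URIRef("http://example.org/annotations/42")     # hypothetical
video = URIRef("http://example.org/videos/lecture01.mp4")    # hypothetical

# Body of the annotation: a free-text comment left by a viewer.
body = BNode()
g.add((body, RDF.value, Literal("Definition of ontology starts here")))

# Target of the annotation: the video, restricted to seconds 95-110 of its
# timeline via a W3C Media Fragments selector ("t=95,110").
selector = BNode()
g.add((selector, RDF.type, OA.FragmentSelector))
g.add((selector, RDF.value, Literal("t=95,110")))
g.add((selector, DCT.conformsTo, URIRef("http://www.w3.org/TR/media-frags/")))

target = BNode()
g.add((target, OA.hasSource, video))
g.add((target, OA.hasSelector, selector))

g.add((annotation, RDF.type, OA.Annotation))
g.add((annotation, OA.hasBody, body))
g.add((annotation, OA.hasTarget, target))
g.add((annotation, OA.motivatedBy, OA.commenting))

# Serializing to Turtle yields RDF that a retrieval component could query.
print(g.serialize(format="turtle"))
```

Once annotations are expressed in this form, they can be queried (for example with SPARQL) by target video, time range or motivation, which is what makes scene-level retrieval, ranking and recommendation feasible.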
This paper investigates the state-of-the-art in video annotation research and development with the following objectives in mind:

- To critically and analytically review relevant literature regarding video annotation and video annotation models, and to analyze the available video annotation systems in order to pinpoint their strengths and limitations.
- To investigate the effective use of Semantic Web technologies in video annotation and to study different multimedia ontologies used in video annotation systems.
- To discover current trends in video annotation systems and to place some recommendations that will open new research avenues in this line of research.

To the best of our knowledge, this is the first attempt at a detailed critical and analytical investigation of current trends in video annotation systems, along with a detailed retrospective on annotation frameworks, annotation models and multimedia ontologies. The rest of the paper is organized as follows: Section 2 presents state-of-the-art research and development in video annotation, video annotation frameworks and models, and different desktop- and Web-based video annotation systems. Section 3 contributes an evaluation framework for comparing the available video annotation systems. Section 4 investigates different multimedia ontologies for annotating video content. Finally, Section 5 concludes our discussion and puts some recommendations before the researchers in this area.

1 http://www.vimeo.com/
2 http://www.youku.com/
3 http://www.myspace.com/
4 http://ant.umn.edu/
5 http://metasound.dibet.univpm.it:8080/semtube/index.html
6 http://www.nicovideo.jp/

Figure 1. YouTube traffic and video usage in the world.

II. STATE-OF-THE-ART IN VIDEO ANNOTATION RESEARCH

The idea behind annotation is not new; rather, it has a long history of research and development, and its origin can be traced back to 1945, when Vannevar Bush proposed the Memex, with which users could establish associative trails of interest by annotating microfilm frames [3]. In this section, we investigate annotations, video annotations and video annotation approaches, and present state-of-the-art research and development practices in video annotation systems.

A. Using Annotations in Video Searching and Sharing

Annotations are interpreted in different ways. They provide additional information in the form of comments, notes, remarks, expressions and explanatory data attached to a piece of multimedia content, or to one of its selected parts, that may not necessarily be in the minds of other users who happen to be looking at the same content [4, 5]. Annotations can be in either formal (structured) or informal (unstructured) form. They can be either implicit, i.e., they can only be interpreted and used by the original annotators, or explicit, where they can be interpreted and used by other non-annotators as well. The functions and dimensions of annotations can be classified into writing vs. reading annotations, intensive vs. extensive annotations, and temporary vs. permanent annotations. Similarly, annotations can be private, institutional, published, and individual or workgroup-based [6]. Annotations are also regarded as a universal and fundamental research practice that enables researchers to organize, share and communicate knowledge and to collaborate with source material [7]. Annotations can be manual, semi-automatic or automatic [8, 9]. Manual annotations are the result of attaching the knowledge structures one bears in mind in order to make the underlying concepts easier to interpret. Semi-automatic annotations require human intervention at some point while the content is being annotated by machines. Automatic annotations require no human involvement or intervention.
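To make these distinctions concrete, the following illustrative sketch, which is not drawn from any of the surveyed systems, models an annotation record in Python with its scope (whole video, temporal segment, or spatial region within a frame) and the way it was produced (manual, semi-automatic or automatic); all class and field names are hypothetical.

```python
# An illustrative data model (not from any surveyed system) capturing the
# classification discussed above: how an annotation was produced and whether
# it targets the whole video, a temporal segment, or a region within a frame.
# All class and field names are hypothetical.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Origin(Enum):
    MANUAL = "manual"            # typed in entirely by a human annotator
    SEMI_AUTOMATIC = "semi"      # suggested by a machine, confirmed by a human
    AUTOMATIC = "auto"           # produced by recognition technologies alone


@dataclass
class TemporalScope:
    start_sec: float             # segment start on the video timeline
    end_sec: float               # segment end on the video timeline


@dataclass
class RegionScope:
    frame_sec: float             # time of the annotated frame
    x: int                       # bounding box of the annotated object
    y: int
    width: int
    height: int


@dataclass
class VideoAnnotation:
    video_uri: str
    text: str
    origin: Origin
    temporal: Optional[TemporalScope] = None   # None means whole-video scope
    region: Optional[RegionScope] = None
    tags: List[str] = field(default_factory=list)


# Example: a manually added comment on a fifteen-second scene.
scene_note = VideoAnnotation(
    video_uri="http://example.org/videos/lecture01.mp4",
    text="Definition of ontology starts here",
    origin=Origin.MANUAL,
    temporal=TemporalScope(start_sec=95.0, end_sec=110.0),
    tags=["ontology", "definition"],
)
print(scene_note)
```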
Table 1 summarizes these annotation techniques along with the required level of participation from humans and machines, and gives some real-world examples of tools that use each approach.

TABLE 1. ANNOTATION APPROACHES

Manual
  Human task: entering some descriptive keywords.
  Machine task: providing storage space or databases for storing and recording annotations.
  Examples: GAT7, Manatee8, VIA9, VAT10.

Semi-Automatic
  Human task: entering an initial query at start-up.
  Machine task: parsing queries to extract information semantically in order to add annotations.
  Examples: Semantator11, NCBO Annotator12, cTAKES13.

Automatic
  Human task: no interaction.
  Machine task: using recognition technologies for detecting labels and semantic keywords.
  Examples: KIM14, KAAS15, GATE16.

B. Annotation Frameworks and Models

Before discussing state-of-the-art video annotation systems, it is necessary to investigate how these systems manage the annotation process by examining different annotation frameworks and models. These frameworks and models provide manageable procedures and standards for storing, organizing, processing