
International Journal of Computational Intelligence Techniques, ISSN: 0976–0466 & E-ISSN: 0976–0474 Volume 1, Issue 1, 2010, PP-02-05

Searching video blogs with integration of context and content analysis

Catherine Chitra and Anitha Julian Department of Computer Science and Engineering, Velammal Engineering College, Chennai, 600 066, [email protected], [email protected]

Abstract – Nowadays video blogs (vlogs) play an important role. Managing them effectively and making them conveniently accessible to users is a challenging problem. In this paper we propose a novel vlog management model which consists of two stages: annotation and search. In vlog annotation, we extract informative keywords not only from the textual content of the target vlog itself but also from external resources which are semantically and visually relevant to it. In addition, a sentiment evaluation is obtained from the comments. In vlog search, we adopt saliency-based matching to produce the search results, and different ranking strategies are adopted according to the user's interest.

Keywords - Annotation expansion, ranking, saliency-based matching, vlog annotation, vlog search

I. Introduction
Video blogging, sometimes shortened to vlogging, is a form of blogging for which the medium is video. Entries are made regularly and often combine embedded video or a video link with supporting text, images, and other elements [1, 2]. Because of their richness of expression, vlogs are much more powerful and compelling than text-only blogs, and thus attract much attention nowadays. A blog is a type of website maintained by an individual with regular entries of commentary, descriptions of events, or other material such as graphics or video. Most blogs are primarily textual, although some focus on other media: artlogs, photologs, sketchlogs, vlogs, and MP3 blogs. From a vlog we can also obtain useful information such as submitting time, viewed times, number of comment entries, and rating. With the number of vlogs growing exponentially, how to manage vlogs effectively and make them conveniently accessible has become a challenging problem. In this paper, we discuss this problem in detail. Our management model consists of two stages: annotation and search.

A. Vlog Annotation
A video-centered vlog has an advantage over general video because all the textual content in a vlog is used to describe the video. Annotating a vlog is therefore more feasible and less expensive than annotating general video. Because manual annotation is labor-intensive and time-consuming, automatic annotation has become the major direction of research efforts. An effective method for automatic vlog annotation is thus the key to solving the problem mentioned above. Since vlogs are created by vloggers from all over the world, the words used in vlog texts are expected to be subjective and nonstandard [5, 6]. Therefore, the annotation extracted directly from the vlog text is, in most cases, of low quality, which consequently puts the performance of vlog search at risk. An existing approach proposed a vlog annotation model which comprises intrinsic annotation creation from the target vlog and annotation expansion from relevant external resources. Traditional annotation focuses on the semantic aspect, while the sentimental aspect is totally neglected. We propose an effective way for automatic vlog annotation which analyzes vlogs both semantically and sentimentally. In order to acquire high-quality annotation for a vlog, we first extract intrinsic annotation from the original text of the vlog; then, using relevant external resources, we improve the intrinsic annotation by context-based annotation expansion. We make better use of the context information, both the global context of the whole vlog and the local context of each keyword, and thereby greatly boost the quality of annotation. Besides semantic annotation, we perform sentiment analysis based on the comments to obtain the overall evaluation of the vlog [3, 4].

Fig. 1- Vlog Management Model

Copyright ©2010, Bioinfo Publications, International Journal of Computational Intelligence Techniques ISSN: 0976–0466 & E-ISSN: 0976–0474, Volume 1, Issue 1, 2010


B. Vlog Search
The aim of vlog search is to provide the user with the vlogs that are most relevant to the submitted query. To improve the overall search performance, we introduce content-based video search. To combine the similarity measured by each model, linear mixing or a discriminative classifier is used. We propose a novel saliency-based similarity matching approach for vlog search. We place major emphasis on the saliently similar aspects of the objects being compared, and thus make the search results more agreeable to users. After the relevant vlogs are obtained using saliency-based matching, different ranking strategies are adopted according to the user's specific interest. Finally, the ranked vlogs are clustered according to their category.

When a user uploads a vlog to the database, semantic annotation runs automatically using the vlog text and relevant external resources, and a sentiment evaluation is obtained from the vlog comments, as illustrated in Fig. 1. After that, the vlog is stored in the database with the corresponding annotation and evaluation. When a user submits a query to the search engine, the vlog search module accesses the vlog database to obtain relevant vlogs by saliency-based matching; then, using the user-specified ranking strategy and clustering, the results are returned to the user in a well-organized manner.

II. Techniques Used In Vlog Annotation
In this section we discuss word correlation measurement, intrinsic annotation creation, vlog categorization, context-based annotation expansion, and sentimental evaluation.

A. Word Correlation Measurement
Existing approaches to measuring word correlation fall mainly into two categories: the lexical and the statistical approaches.

Lexical: Relational theories of lexical semantics hold that any word can be defined in terms of the other words to which it is related. For example, a definition of the compound noun "sugar maple" might start with its more general term, "A sugar maple is a maple that . . . ," followed by a relative clause based on part-whole or other semantic relations that specify how sugar maples differ from other kinds of maples [7].

Statistical: The statistical approach is data-driven and flexible; it measures the relevance of two words using the Normalized Google Distance (NGD) method [8]. This method was proposed in [8] to extract semantic knowledge from the World Wide Web for both supervised and unsupervised learning, using the Google search engine in an unconventional manner. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. It gives evidence of elementary learning of the semantics of concepts, in contrast to most prior approaches outside of knowledge representation research, which have neither the appearance nor the aim of dealing with ideas and instead use abstract symbols that remain permanently ungrounded throughout the machine learning application. The World Wide Web is the largest database on earth, and it induces a probability mass function, the Google distribution, via page counts for combinations of search queries. We then take the reciprocal of the revised NGD as the correlation, or similarity, of two words.
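As a rough illustration of the statistical approach, the sketch below computes the standard Normalized Google Distance of [8] from page counts and takes its reciprocal as a word similarity. It is a minimal sketch under stated assumptions: the revised NGD used in this paper is not reproduced here, so the standard definition is substituted, and the page counts, the index size `n_pages`, and the `eps` guard against division by zero are hypothetical values chosen only for illustration.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n_pages: float) -> float:
    """Standard Normalized Google Distance computed from page counts.

    fx, fy  : number of pages containing word x, word y
    fxy     : number of pages containing both words
    n_pages : assumed total number of indexed pages
    """
    if fxy <= 0:
        return math.inf  # the two words never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n_pages) - min(log_fx, log_fy)
    return numerator / denominator


def word_similarity(fx, fy, fxy, n_pages, eps=1e-6):
    """Reciprocal of the distance; eps (an assumption) avoids dividing by zero."""
    return 1.0 / (ngd(fx, fy, fxy, n_pages) + eps)


# Hypothetical page counts, for illustration only.
N = 1e10
close = word_similarity(fx=2e6, fy=3e6, fxy=1e6, n_pages=N)
far = word_similarity(fx=2e6, fy=3e6, fxy=1e3, n_pages=N)
print(close > far)
```

Words that co-occur on many pages get a small distance and therefore a large similarity, which matches the reading that a larger similarity value means the two words are more closely related.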


In this formulation, PMI(w1, w2) is the pointwise mutual information between w1 and w2, and from (1) and (2) a relationship between NGD, Sim_word, and PMI can be derived. The larger the value of Sim_word, the more semantically relevant the two words are. To make the similarity measurement more suitable for vlog words, which are video-centered, we use the Google Video search engine instead of Google web search to acquire the number of returned results.

B. Intrinsic Annotation Creation
The textual content of a vlog consists of the title, the description, and the comments, among which the title and description are closely related to the semantics of the vlog video, as illustrated in Fig. 2. Therefore, we first extract annotation words from the title: after stop word removal, the important words are kept in a title word set. We also extract annotation words from the body of the textual content, i.e., the vlog's description. Similarly, we remove the stop words beforehand; then, using a standard text processing technique such as tf-idf, we acquire the important words and create another word set. After keyword extraction from the title and the description, we merge the two word sets to form the intrinsic annotation.

C. Vlog Categorization
Vlog categorization plays a major role in vlogging. For example, a vlog about an NBA game of the Houston Rockets will have the annotation word "rockets". From the word "rockets" alone, a user may be confused as to whether the vlog concerns the military. However, once we have confined its category to sports, there is no doubt that "rockets" here is the name of a basketball team. In this paper, we focus on the following seven categories, which cover most of the hot topics of vlogs:
• Autos & Vehicles
• Film & Animation
• Music
• News & Politics
• Pets & Animals
• Sports
• Travel & Events
These categories are a subset of the categories defined by YouTube; we denote the set of all categories by C. The category most closely related to a word is taken as the most probable category the word falls into. In the categorization formula, the function mod returns the value that occurs most frequently (the statistical mode).

D. Context-Based Annotation Expansion
To achieve a high-quality annotation, we perform annotation expansion based on the specific context of the vlog. Classic search-based annotation methods need a labeled dataset, and the quality of the labels and the size of the dataset are important factors. However, building a large dataset with high-quality labels is time-consuming and labor-intensive. Therefore, we adopt YouTube as our labeled database. Given a textual query, the text-based video search engine of YouTube returns rather good results; hence we use YouTube search to find semantically related videos.

E. Sentimental Evaluation
Users can decide whether a vlog is worth viewing based on the existing comments. User comments may be either text-based or speech. The main process of sentiment evaluation is to extract annotation words from the users' comments using part-of-speech tagging: nouns, adjectives, and verbs are extracted from each comment entry. Each extracted word is assigned a sign, +1 or -1, indicating whether or not it is modified by a negative adverb. Every time a vlog is viewed, new comment entries can be attached. The set of comments is therefore ever-growing, and the sentimental evaluation of a vlog is ever-changing. The sentiment evaluation is stored along with the semantic annotation to facilitate vlog search. Context is a key factor in improving the performance.

III. Vlog Search
The vlog search method consists of two parts: saliency-based matching and personalized ranking.
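As a rough illustration of personalized ranking followed by clustering by category, the sketch below sorts a list of matched vlogs by a user-chosen criterion and then groups them by category. It is only a sketch under stated assumptions: the `Vlog` record, its field names, and the mapping from criterion names to fields are hypothetical and are not taken from the paper. The fields simply mirror the submitting time, viewed times, comment entry number, and rating mentioned in the Introduction.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Vlog:
    # Hypothetical record; field names are assumptions for illustration.
    title: str
    category: str
    submitted: int   # submitting time as a timestamp
    views: int
    comments: int
    rating: float


# Assumed mapping from a user-selected criterion to a sort key.
CRITERIA = {
    "latest": lambda v: v.submitted,
    "most_viewed": lambda v: v.views,
    "most_discussed": lambda v: v.comments,
    "popularity": lambda v: v.rating,
}


def rank_and_cluster(vlogs, criterion):
    """Sort vlogs by the chosen criterion (descending), then group by category.

    Within each category the ranked order is preserved.
    """
    ranked = sorted(vlogs, key=CRITERIA[criterion], reverse=True)
    clusters = defaultdict(list)
    for vlog in ranked:
        clusters[vlog.category].append(vlog.title)
    return dict(clusters)


matched = [
    Vlog("Rockets highlights", "Sports", 300, 900, 40, 4.5),
    Vlog("Road trip diary", "Travel & Events", 200, 1500, 10, 4.0),
    Vlog("Playoff recap", "Sports", 100, 2000, 75, 4.8),
]
print(rank_and_cluster(matched, "most_viewed"))
```

Ranking first and grouping afterwards keeps the user's chosen order inside every category cluster, so each cluster can be browsed from its most relevant entry downward.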


A. Saliency-Based Matching
We propose a novel saliency-based similarity matching approach for vlog search, based on the characteristics of human perception. In vlog search, two aspects should be taken into consideration: the high-level semantics and the low-level video features. If a vlog is similar to the query vlog in both high-level semantics and low-level features, we obtain the best matching result. If a vlog is similar to the query vlog only in high-level semantics, its annotation is almost the same as that of the query vlog; this case is called extensive reading. If a vlog is similar to the query vlog only in the low-level aspect, its central video is almost the same as that of the query vlog; this case is called intensive reading. The submitted query need not be a whole vlog; it may consist of just some text words or an image.

B. Personalized Ranking
After submitting the query, users can personally specify the ranking criterion by choosing among:
• Latest submitted vlogs
• Most attractive vlogs
• Most heatedly discussed vlogs
• Ranking by popularity
After ranking, we organize the vlogs by clustering them according to their category information: vlogs with the same category information fall into one cluster. With the help of this clustered organization, the semantics become more coherent within each cluster, so users can browse the results efficiently and locate the specific vlogs of interest rapidly.

IV. Conclusion
With these features, we will make better use of the advanced visual and audio features of vlogs in vlog management. Some advanced techniques in natural language processing and data mining can also be applied to obtain higher-quality textual features. Furthermore, besides vlog annotation and search, we are interested in other areas of vlog management, such as vlog submission and vlog storage.

Acknowledgement
First of all, I thank the Almighty for giving me the knowledge and courage to complete this project work successfully. I express my gratitude to our respected principal, Dr. J. Shanmugam, for his constant support and encouragement. My sincere thanks go to the HOD, Prof. B. Rajalakshmi, for inspiring support of my project. I express my sincere thanks to Mrs. Anita Julian, Asst. Professor, my internal project guide, for her inspiring support of my project. Regards are also due to our family, classmates, and friends, who offered unflinching moral support for the completion of this project.

References
[1] J. C. Dvorak, The Blog Phenomena, 2002 [Online]. Available: http://www.pcmag.com/article2/0,1895,81500,00.asp, Online Essay in PC Mag.
[2] A. Rosenbloom, “The : Introduction,” Commun. ACM, pp. 31–33, 2004.
[3] B. A. Nardi, D. J. Schiano, M. Gumbrecht, and L. Swartz, “The blogosphere: Why we blog,” Commun. ACM, pp. 41–46, 2004.
[4] B. A. Nardi, D. J. Schiano, and M. Gumbrecht, “Communities: Blogging as social activity, or, would you let 900 million people read your diary?,” in Proc. 2004 ACM Conf. Computer Supported Cooperative Work, Chicago, IL, 2004, pp. 222–231.
[5] J. Hoem, “Videoblogs as ‘collective documentary’,” in BlogTalks 2.0: The European Conf. on Weblogs, 2004, pp. 237–270.
[6] D. Stolarz and L. Felix, “Creating your own media empire with & video-blogs,” West, 2005.
[7] G. A. Miller, “WordNet: A lexical database for English,” Commun. ACM, pp. 39–41, 1995.
[8] R. Cilibrasi and P. Vitanyi, “Automatic extraction of meaning from the web,” in Proc. IEEE Int. Symp. Information Theory, Seoul, Korea, 2006, pp. 2309–2313.
