IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No 3, March 2012 ISSN (Online): 1694-0814 www.IJCSI.org 223

Web Image Retrieval based on Semantically Shared Annotation

Alaa Riad1, Hamdy Elminir2 and Sameh Abd-Elghany3

1 Vice Dean of Student Affairs, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt

2 Head of the Electronics and Communication Dept., Misr Higher Institute of Engineering and Technology, Mansoura, Egypt

3 Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt

Abstract
This paper presents a new majority voting technique that combines the two basic modalities of Web images, textual context and visual features, in a re-annotation and search-based framework. The proposed framework considers each web page as a voter on the relatedness of a keyword to a web image. The approach is not a pure combination of low-level image features and textual features: it also takes the semantic meaning of each keyword into consideration, which is expected to enhance retrieval accuracy. The proposed approach not only enhances the retrieval accuracy of web images but is also able to annotate unlabeled images.
Keywords: Web Image Retrieval, Ontology, Data Mining.

1. Introduction

With the advent of the Internet, billions of images are now freely available online [1]. The rapid growth in the volume of such images can make the task of finding and accessing images of interest overwhelming for users. Therefore, some additional processing is needed in order to make such collections searchable in a useful manner. Current Web image retrieval search engines, including Google image search, Lycos and the AltaVista photo finder, use text (i.e., surrounding words) to look for images, without considering image content [2,15]. However, when the surrounding words are ambiguous or even irrelevant to the image, search based on text alone will return many unwanted images.

In contrast, Content-Based Image Retrieval (CBIR) systems utilize only low-level features, such as color, texture and shape, to retrieve similar images. In general, the bottleneck for the efficiency of CBIR is the semantic gap between the high-level image interpretations of users and the low-level image features stored for indexing and querying. In other words, there is a difference between what image features can distinguish and what people perceive from the image [3]. This paper tries to narrow the semantic gap and enhance retrieval precision by fusing the two basic modalities of Web images, i.e., textual context (usually represented by keywords) and visual features. It presents a new majority voting technique that considers each web page as a voter on the relatedness of a keyword to the web image; rather than a pure combination of low-level image features and textual features, it takes the semantic meaning of each keyword into consideration, which is expected to enhance retrieval accuracy.

This paper is organized as follows: related work is presented in Section 2, Section 3 presents the architecture of the proposed approach in detail, and Section 4 concludes this research.

2. Related work

Current research in web image retrieval suggests that the joint use of existing textual context and visual features can provide better retrieval results [4,5]. The simplest approach is based on counting the frequency of occurrence of words for automatic indexing. This simple approach can be extended by giving more weight to words which occur in the alt or src attribute of the image, or inside the head tag or other important tags of the HTML document. However, a pure combination of traditional text-based retrieval and content-based retrieval is not adequate to deal with the problem of image retrieval on the WWW. The first reason is that there is already too much clutter and irrelevant information on web pages; these textual features are far less accurate than deliberate annotation text. The second reason is the mismatch between the page author's expression and the user's understanding and expectation; this problem is similar to the subjectivity of image annotation.

Copyright (c) 2012 International Journal of Computer Science Issues. All Rights Reserved.

The third reason is the difficulty of finding the relationship between low-level features and high-level features.

The second approach takes a different stand and treats images and texts as equivalent data. It attempts to discover the correlation between visual features and textual words on an unsupervised basis, by estimating the joint distribution of features and words and posing annotation as statistical inference in a graphical model. For example, an image retrieval system based on decision trees and rule induction was presented in [7] to annotate web images using a combination of image features and textual information, while in [8] a system was presented that automatically integrates keywords and visual features for web image retrieval using association rule mining. These approaches usually learn the keyword correlations from the appearance of keywords in the web page, and the learned correlations may not reflect the real correlations needed for annotating Web images, or the semantic meaning of keywords such as synonyms [9].

Ontology-based image retrieval is an effective approach to bridging the semantic gap because it is more focused on capturing semantic content, which has the potential to satisfy user requirements better [10,11]. While a semantically rich ontology addresses the need for complete descriptions in image retrieval and improves retrieval precision, the lack of textual information, which hurts the performance of the keyword approach, is still a problem for text-only ontology approaches; ontology works better in combination with image features [12]. This paper presents a new framework for a web image retrieval search engine which relies not only on an ontology to discover the semantic relationships between the different keywords inside a web page, but also proposes a new voting annotation technique that extracts the shared, semantically related keywords from different web pages. This eliminates the problem of subjectivity of image annotation found in traditional approaches and enhances the retrieval results by taking the semantics of the correlated data into consideration.

3. Architecture of the Proposed Web Image Retrieval Search Engine Prototype

3.1. General Overview

Because many unrelated keywords are associated with a web image, these keywords should be reduced or removed in order to enhance the retrieval process. The proposed approach tries to solve this shortcoming of most current systems with a voting technique based on frequent itemset mining that relates the low-level features of the image content to high-level features. The frequency with which a keyword occurs is the key to measuring keyword correlation: if two or more keywords appear together frequently with an image, they can be considered highly relevant to each other [13]. By treating a group of near-duplicate (visually similar) images as a transaction and their associated keywords as the items in the transaction, it is natural to discover correlations between low-level image features and keywords by applying an association rule mining model [14].

3.2. System Components

The proposed search engine is composed mainly of two phases: a preprocessing phase and a semantic search phase. The preprocessing phase is responsible for data and image collection and semantic annotation, while the semantic search phase is responsible for image retrieval. The next sections explain each phase and its components in detail.

3.2.1 Preprocessing Phase

The preprocessing phase has five main modules: (1) the Web Crawler module, (2) the Parser module, (3) the Image Processing module, (4) the NLP module, and (5) the Voting module. Figure 1 shows these modules. Each module is composed of a set of functions in terms of system functionality; the following sections describe each module and its functions in detail.

(1) Web Crawler Module

There are many images available on Web pages. In order to collect these images, a crawler (or spider, a program that can automatically analyze web pages and download related pages) is used. Instead of creating a new ontology from scratch, we extend WordNet, the well-known word ontology, into a word-image ontology. WordNet is one of the most widely used, commonly supported and best developed ontologies; in WordNet, different senses and relations are defined for each word. We use WordNet to provide a comprehensive list of all classes likely to have any kind of visual consistency. We do this by extracting all non-abstract nouns from the database, 75,062 in total. By collecting images for all of these nouns, we obtain dense coverage of all visual forms.

(2) HTML Parsing Module

In our system, we use an HTML parser to transform HTML documents into a DOM tree. A DOM-tree-based web page segmentation algorithm automatically segments web pages into sections, with each section consisting of a web image and its contextual information (i.e., an image segment); the text and images are then extracted by traversing the DOM tree. First, we generate the DOM tree for each web page containing web images. From the bottom up, visual objects such as images and text paragraphs are identified as basic elements. Table-related tags such as TABLE, TR and TD are used to separate the different content passages.
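A minimal sketch of this kind of DOM-based extraction, collecting an image's src and alt attributes together with the surrounding text, can be written with Python's standard html.parser; this is an illustration, not the authors' actual parser:

```python
# Sketch: extract images (src, alt) and surrounding text from an HTML fragment.
from html.parser import HTMLParser

class ImageContextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []   # one dict per <img> tag found
        self.text = []     # surrounding text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            self.images.append({"src": a.get("src", ""), "alt": a.get("alt", "")})

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

page = '<td><img src="mosque.jpg" alt="Mosque of Muhammad Ali"><p>View of Cairo</p></td>'
p = ImageContextParser()
p.feed(page)
print(p.images[0]["alt"])  # prints "Mosque of Muhammad Ali"
```

In a full system, the parser would additionally record each image's position in the DOM tree so that the nearest title and text passages can be associated with it.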


Figure 2 shows an example. The left part of Figure 2 shows two paragraphs and one image from a web page about "Mosque"; the right part illustrates part of the DOM tree parsed from this web page. Second, we extract the semantic text information for the images. The extracted information includes:
– the ALT (alternate) text;
– the closest title to the image in the DOM tree.
The ALT text in a web page is displayed in place of the associated image in a text-based browser, so it usually represents the semantics of the image concisely. We can obtain the ALT text directly from the ALT attribute. A feasible way to obtain further semantic text information is to analyze the context of the web image; Figure 2 shows that the image and its "nearest" title, "Mosque", have a clear semantic correlation.

(3) Natural Language Processing (NLP) Module

The raw text of the document is treated separately as well. In order to extract terms from the text, classic Natural Language Processing (NLP) techniques are applied. This module is responsible for the following functions:

Stop Words Removal: Removal of non-informative words is a commonly used technique in text retrieval and categorization. A predefined "stoplist", which consists of hundreds of less meaningful high-frequency words (e.g., prepositions and conjunctions), is employed to eliminate irrelevant information in text documents. Such a method is quite advantageous for improving the accuracy of the search results and reducing the redundancy of the computation.

Fig. 1 The preprocessing phase (the Crawler, Parser, Image Processing, NLP and Voting modules around the images-and-keywords index database and the WordNet database)
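The stop-word filtering described above can be sketched as follows; the stoplist here is a tiny illustrative sample, not the system's actual list of hundreds of entries:

```python
# Sketch: remove stop words from a token stream using a predefined stoplist.
STOPLIST = {"the", "a", "an", "of", "and", "to", "is", "in", "this", "we"}

def remove_stopwords(tokens):
    # Case-insensitive membership test against the stoplist.
    return [t for t in tokens if t.lower() not in STOPLIST]

tokens = "the mosque of muhammad ali overlooks the whole of cairo".split()
print(remove_stopwords(tokens))
# ['mosque', 'muhammad', 'ali', 'overlooks', 'whole', 'cairo']
```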

Fig. 2 Example of an HTML DOM tree: a web page section about the Mosque of Muhammad 'Ali Pasha, with two text paragraphs and an image under TABLE/TBODY/TR/TD nodes


Fig. 3 The semantic retrieval phase: a user query passes through the NLP module and the ontology reasoning module (backed by the WordNet database) and is matched against the index database of images and text to produce the result
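The retrieval flow of Fig. 3 can be illustrated with a minimal query-expansion sketch; the synonym table stands in for real WordNet lookups, and the index entries are hypothetical:

```python
# Sketch: expand a text query with related concepts, then match the index.
# The SYNONYMS table is a stand-in for WordNet; entries are illustrative.
SYNONYMS = {"mosque": {"masjid"}, "car": {"automobile", "auto"}}

def expand_query(query_terms):
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def search(index, query_terms):
    # index: image name -> set of annotation keywords.
    terms = expand_query(query_terms)
    return sorted(img for img, kws in index.items() if terms & kws)

index = {"img1.jpg": {"masjid", "cairo"}, "img2.jpg": {"car"}}
print(search(index, ["mosque"]))  # ['img1.jpg']: found via the synonym "masjid"
```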

Word Stemming: By stemming, each word is reduced to its basic form. The documents are first parsed into words; the words are then represented by their stems. For example, 'walk', 'walking' and 'walks' would all be represented by the stem 'walk'.

Keyword Extraction: After stemming, each word is weighted using normalized tf-idf. At the end, the text part of the document is represented simply by a set of keywords and weights.

(4) Image Processing Module

This module is responsible for performing the functions related to the image; the next sections explain these functions in detail.

Feature Extraction Process

The color feature is extracted using the color histogram method. Color histograms are popular because they are trivial to compute and tend to be robust against small changes in object rotation and camera viewpoint. The color histogram represents an image by breaking down its color components and extracting three histograms of the RGB colors, Red (HR), Green (HG) and Blue (HB), one for each color channel, by counting the occurrences of each color. After computation, the histogram of each color is normalized, because the images are downloaded from different sites which maintain images of different sizes.

Image Clustering

The k-means algorithm is used to perform the clustering process. This choice was mainly motivated by the comparably fast processing of k-means compared to other unsupervised clustering algorithms. K-means is a partitioning method: an unsupervised clustering method that produces k clusters, where k is fixed a priori. K-means treats each observation in the data as an object having a location in space. It finds a partition in which objects within each cluster are as close to each other as possible, and as far from objects in other clusters as possible. First, k points are chosen as centroids (one for each cluster). The next step is to assign every point in the data set to the nearest centroid. After that, new centroids of the clusters resulting from the previous step are calculated. Several iterations are performed; in each, the data set is reassigned to the nearest new centroids.

(5) Voting Module

In the previous modules, all images have been initially annotated and visually clustered. Due to initial annotation errors, a target image may be annotated with erroneous keywords. The underlying problem that we attempt to correct is that annotations generated by probabilistic models perform poorly as a result of too many "noisy" keywords. By "noisy" keywords, we mean those which are not consistent with the rest of the image annotations and, in addition, are incorrect. Our assumption in this module is that if certain images in the database are visually similar (located in one cluster) and semantically related to the candidate annotations, the textual descriptions of these images should also be related to each other. If an image label in the cluster has no semantic similarity to the annotations of the other images, this means that the image was initially annotated with an erroneous keyword.
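The feature extraction and image clustering steps described above can be sketched in pure Python; the bin count, iteration limit and toy pixel data are illustrative assumptions, not values from the paper:

```python
# Sketch: normalized per-channel RGB histograms and a minimal k-means.
import random

def rgb_histogram(pixels, bins=4):
    """pixels: list of (r, g, b) values in 0-255 -> normalized 3*bins vector."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        for ch, v in enumerate((r, g, b)):
            hist[ch * bins + min(v * bins // 256, bins - 1)] += 1.0
    n = float(len(pixels))
    # Normalization makes images of different sizes comparable.
    return [h / n for h in hist]

def kmeans(vectors, k, iters=20, seed=0):
    rnd = random.Random(seed)
    centroids = rnd.sample(vectors, k)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda i: dist(v, centroids[i]))].append(v)
        # Recompute centroids; keep the old one if a cluster went empty.
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return [min(range(k), key=lambda i: dist(v, centroids[i])) for v in vectors]

reds = [[(250, 5, 5)] * 10, [(240, 10, 8)] * 10]    # two mostly-red "images"
blues = [[(5, 5, 250)] * 10, [(8, 12, 240)] * 10]   # two mostly-blue "images"
vectors = [rgb_histogram(p) for p in reds + blues]
labels = kmeans(vectors, k=2)
print(labels)  # the red pair shares one label, the blue pair the other
```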


In this module, we attempt to find the proper label for each cluster based on a semantic analysis of the different candidate image labels inside the cluster, and then use a data mining technique to find association rules between each keyword and the cluster in order to select the most appropriate keyword for that cluster. This module consists of the following functions.

Concept Extraction

In this function, the concept (lemma) of each keyword is extracted from WordNet, and then the similarity between the different keywords is measured. This is done because an image may be described by different keywords on different web pages while the meanings are related. Two main relationships between keywords are analyzed: taxonomy and partonomy. Taxonomy divides a concept into species or kinds (e.g., car and bus are types of vehicle), while partonomy divides the concept as a whole into its parts (e.g., car and wheel). For example, such analysis might show that a car is more like a bus than it is like a tree, because car and bus share vehicle as an ancestor in the WordNet noun hierarchy.

Image Label Voting

Our assumption is that if certain images in the database are visually similar to the target image and semantically related to the candidate annotations, the textual descriptions of these images should also be correlated with the target image. If the target image label has no semantic similarity to the candidate image labels, this means that the target image was annotated with an erroneous keyword.

One of the typical data mining functions is to find association rules among data items in a database. To discover the associations between the high-level concepts and the low-level visual features of images, we need to quantize the visual features by clustering, because the concept space is discrete while the visual feature space is, in general, continuous. Therefore, we aim to associate the concepts with the visual feature clusters.

Definitions:
1) imageset: a set of one or more images.
2) k-imageset: X = {x1, …, xk}.
3) count of X: the frequency of occurrence of instances of imageset X.
4) support s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X).
5) An imageset X is frequent if X's support is no less than a minsup threshold.
6) support of X => Y: the probability that a transaction contains X ∪ Y.
7) confidence c: the conditional probability that a transaction containing X also contains Y.
8) An association rule is a pattern stating that when X occurs, Y occurs with a certain probability.

The steps of the association-rule-based annotation process are:

(1) Scan the transaction DB to get the support S of each concept Ki and visual cluster Ci, and select those concepts and clusters with support greater than a user-specified minimum support.

(2) Construct the transaction database D and the basic candidate 2-itemsets based on the existing inverted file. We do not start from 1-itemsets because the visual features are very high-dimensional, and the associations between concepts, which are single-modality association rules, are much stronger than the associations between concepts and low-level visual clusters. If we started from 1-itemsets, concepts and visual feature clusters would be treated equally, and most of the resulting 2-itemsets would pair a concept with a concept, while few would pair a concept with a visual feature cluster. Our goal is not the association between concept and concept; we are interested in the associations between concepts and visual feature clusters. Therefore, only imagesets containing one concept and one visual feature cluster are considered. The existing inverted file relates the concepts to their associated images.

(3) For each concept Ki in the cluster Cj, calculate the support between concept Ki and cluster Cj:

Supp(Ki, Cj) = Count(Ki, Cj) / Size(Cj)

where Count(Ki, Cj) is the frequency of occurrence of concept Ki in the cluster Cj, and Size(Cj) is the total number of images in the cluster.

(4) All imagesets that have support above the user-specified minimum support are selected and added to the frequent imagesets.

(5) For each imageset in the frequent imagesets, calculate the confidence between concept Ki and cluster Cj:

conf(Ki, Cj) = Count(Ki, Cj) / count(Ki)

where count(Ki) is the frequency of occurrence of concept Ki in the database.

(6) The rules with confidence greater than or equal to the minimum confidence are selected as strong rules.

(7) Order all frequent imagesets in the strong rules according to their confidence, and then select the concept with the highest confidence as the label of the associated cluster.

3.2.2 The Semantic Retrieval Phase

The proposed framework not only supports text-based image retrieval, but also tries to enhance the retrieval results by taking into account the semantic meaning of the user's query.


When the user provides a text query, the system understands the syntax and meaning of the query and uses a linguistic ontology to translate it into a query against the visual ontology index and any metadata or keywords associated with the images. Figure 3 shows the retrieval phase and its functions in detail. This phase contains two modules: the NLP module and the ontology reasoning module. The NLP module was discussed in detail in Section 3.2.1; the next section explains the ontology reasoning module.

Ontology Reasoning

Ontology reasoning is the cornerstone of the Semantic Web, a vision of a future where machines are able to reason about various aspects of available information to produce more comprehensive and semantically relevant results for search queries. Rather than simply matching keywords, the web of the future will make use of ontologies to understand the relationships between disparate pieces of information in order to analyze and retrieve images more accurately. Most image retrieval methods assume that users have an exact search goal in mind. However, in real-world applications, users often do not clearly know what they want; most of the time, they only hold a general interest in exploring some related images. Ontology reasoning is based on the semantic associations between keywords. This is achieved by finding which concepts in the ontology relate to a keyword and retrieving information about each of these concepts. Through this module, the ontology is used to quickly locate the relevant semantic concept, and a set of images that are semantically related to the user query is returned.

4. Conclusion and Future Work

After a review of existing techniques related to web image retrieval, we point out that these methods are not powerful enough to efficiently retrieve relevant images involving semantic concepts. We propose an architecture that combines semantic annotation, data mining and visual features collected from the different web pages that share visually similar images, rather than from a single web page as in traditional approaches. The system uses a visual ontology, a concept hierarchy built according to the set of annotations, in the retrieval process to suggest more results related to the user's query. Currently, we continue to develop the proposed framework and to look for the most appropriate algorithms and methods to compute relevant descriptive metadata and a suitable visual ontology.

References

[1] A. Torralba, R. Fergus, W.T. Freeman, "80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 11, 2008, pp. 1958-1970.
[2] M.L. Kherfi, D. Ziou, A. Bernardi, "Image Retrieval from the World Wide Web: Issues, Techniques, and Systems", ACM Computing Surveys, Vol. 36, No. 1, 2004, pp. 35-67.
[3] Y. Alemu, J. Koh, and M. Ikram, "Image Retrieval in Multimedia Databases: A Survey", Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2009.
[4] R. He, N. Xiong, L.T. Yang, and J.H. Park, "Using Multi-Modal Semantic Association Rules to Fuse Keywords and Visual Features Automatically for Web Image Retrieval", Information Fusion, 2010.
[5] J. Hou, D. Zhang, Z. Chen, L. Jiang, H. Zhang, and X. Qin, "Web Image Search by Automatic Image Annotation and Translation", IWSSIP 17th International Conference on Systems, Signals and Image Processing, 2010.
[6] Z. Chen, L. Wenyin, F. Zhang, H. Zhang, "Web Mining for Web Image Retrieval", Journal of the American Society for Information Science and Technology, Vol. 52, No. 10, 2001, pp. 832-839.
[7] R.C.F. Wong and C.H.C. Leung, "Automatic Semantic Annotation of Real-World Web Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 11, 2008, pp. 1933-1945.
[8] R. He, N. Xiong, T.H. Kim and Y. Zhu, "Mining Cross-Modal Association Rules for Web Image Retrieval", International Symposium on Computer Science and its Applications, 2008, pp. 393-396.
[9] H. Xu, X. Zhou, L. Lin, Y. Xiang, and B. Shi, "Automatic Web Image Annotation via Web-Scale Image Semantic Space Learning", APWeb/WAIM 2009, LNCS 5446, 2009, pp. 211-222.
[10] V. Mezaris, I. Kompatsiaris, and M.G. Strintzis, "An Ontology Approach to Object-Based Image Retrieval", in Proc. IEEE ICIP, 2003.
[11] O. Murdoch, L. Coyle and S. Dobson, "Ontology-Based Query Recommendation as a Support to Image Retrieval", in Proceedings of the 19th Irish Conference on Artificial Intelligence and Cognitive Science, Cork, IE, 2008.
[12] H. Wang, S. Liu, L.T. Chia, "Does Ontology Help in Image Retrieval? A Comparison Between Keyword, Text Ontology and Multi-Modality Ontology Approaches", Proceedings of the 14th Annual ACM International Conference on Multimedia, 2006, pp. 23-27.
[13] Y. Yang, Z. Huang, H.T. Shen, X. Zhou, "Mining Multi-Tag Association for Image Tagging", World Wide Web, Vol. 14, No. 2, 2011.
[14] R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules Between Sets of Items in Large Databases", SIGMOD Rec., Vol. 22, No. 2, 1993, pp. 207-216.
[15] Z. Gong, Q. Liu, "Improving Keyword Based Web Image Search with Visual Feature Distribution and Term Expansion", Knowledge and Information Systems, Vol. 21, No. 1, 2009.
