2014 Fifth International Conference on Intelligent Systems, Modelling and Simulation

Domain Oriented Semantic Web based Personalized

Shruti Kohli, Department of Computer Science, BIT, Noida
Sonam Arora, Department of Computer Science, BIT, Noida

Abstract: Today's market is dominated by search engines built on keyword-based querying. Such a system wastes the user's time when the user does not know the keywords under which the desired pages are indexed. For example, if the user enters the keyword 'Book', Google shows results for both 'reading a book' and 'book a hotel', so the user has to look into the contents of the web pages to shortlist the relevant ones. The same problem exists in image search engines: for the query 'hotel in Delhi', the image result set contains irrelevant as well as relevant images. One solution would be for the machine itself to divide the results into relevant and irrelevant images and show only the relevant ones to the user. This is not feasible, because it requires inspecting the content of each image with image processing techniques and then checking similarity across all images, which cannot be done for millions of records worldwide. Another solution is Semantic Web technologies. The Semantic Web is an extension of the current web that allows the meaning of information to be described precisely, so that it can be understood by the computer as well as by the user. Ontology is a central ingredient of the Semantic Web, and in this work an ontology for the Hotel domain is used. The user is given an easy-to-use interface to query the hotel ontology. The technologies used are the SPARQL query language and the JENA API, which search for the user's query inside the ontology. This work focuses on preserving the user's preferences while displaying results on the web page. A further challenge was loading the hotel dataset dynamically into the ontology in RDF format. This was done using a Semantic Tool which internally uses the Google AJAX API to populate the latest results from the Internet. The advantage of using the Semantic Web is that it returns only relevant images, which in turn increases the Precision and Recall rates of the search engine.

Keywords-SPARQL; RDF; Semantic Web; Web3.0

I. INTRODUCTION
Nowadays, whatever information a user needs can be obtained online, anywhere and anytime. A user who wants to visit some place will want to search beforehand for information about that place, the hotels available, the weather and so on. The user may also like to look at images of hotels before checking their websites. Many search engines are available online for this purpose. The user enters a keyword for the query and is then flooded with the images that are available online. The problem is that the information displayed contains relevant as well as irrelevant results, so the user has the extra work of separating the desired images from the pool of results displayed. This wastes time and energy. The efficiency of a search engine, in terms of the number of relevant documents returned, is measured by two factors, Precision and Recall. This weakness of today's search engines leads to low precision and recall.

This weakness can be resolved to a large extent using Semantic Web technologies. These technologies extend the existing web with semantics that give meaning to a web image or document. The meaning is defined in a way that machines can understand, which reduces the effort users spend searching for relevant images. In semantic technologies, information is represented in a W3C standard called the Resource Description Framework (RDF), and ontology is the main ingredient. Well-defined formats have been recommended for representing an ontology; currently existing formats include RDF/XML, OWL and Turtle. Research on semantic technologies is still at an early stage, so traditional search engines such as Google, Bing and Yahoo still dominate the market.

Consider a query to search for "Taj Mahal". The results displayed by traditional search engines contain thousands of images for "Taj Mahal in Agra" as well as "Taj Mahal casino in Atlantic City, USA". The search engine does not refine the result for the user, who has to sift through the result set to find the relevant results. Information retrieval in the current scenario relies only on keyword searches using Google, Yahoo, Bing and so on, or on simple metadata such as that of an RSS feed. Moreover, there is no provision to generate personalized searches easily, so users need to think of and write search keywords that match their own requirements. Such a process of searching is time consuming and requires a lot of human effort. A user who does not know which keywords to use cannot retrieve the relevant information.

Semantic searching improves this efficiency by providing only relevant results to the user. It represents the data available over the Internet in the form of an ontology, which describes the information using metadata. The user does not need to think of keywords that will produce the desired result; instead, the user simply provides the search engine with whatever information is available by selecting a domain.

The core idea of a domain-based search engine is to describe the query in the form of a domain description. For this, the user needs to build a query of a type that is well understood by the Semantic Web. This paper proposes an architecture where queries are not built using natural language, but through an easy-to-use user interface that helps users build the complex queries they want.

2166-0662/14 $31.00 © 2014 IEEE. DOI 10.1109/ISMS.2014.12

II. CHALLENGES
Keyword-based search is useful especially to a user who knows what keywords are used to index the images, since such a user can easily define queries. The approach becomes problematic when the user does not know how to write a query that returns only the desirable results, because that requires knowing the semantic concepts used in the domain of interest. As a result, after entering the query the user is returned some irrelevant images along with the relevant ones. Two parameters are available to check this efficiency of a search engine: Precision and Recall. Consider Figure 1.

Figure 1. Precision vs. Recall

Let A be the number of relevant records retrieved, B the number of relevant records not retrieved, and C the number of irrelevant records retrieved.
• Precision: the percentage of returned pages that are relevant, or in other words the capability of minimizing the number of irrelevant links returned to the users.
  Precision = A × 100 / (A + C)
• Recall: the percentage of relevant pages that are returned, or in other words the capability of maximizing the number of relevant links returned to the users.
  Recall = A × 100 / (A + B)

Search mechanisms to date operate with low precision and recall percentages. For example, when a user enters the query "images of hotels in Delhi" on Google, the search results may contain many unwanted images that are of no interest to the user. Consider Figure 2.

Figure 2. Google images results

Semantic Web based search aims to provide better precision and recall rates than keyword-based search. The challenge is to create a domain-based semantic web search engine that is highly user friendly and provides advanced search options through the various parameters a user can think of. It is user friendly in the sense that the user need not think of an appropriate keyword that might give the desired result; instead, the user simply provides the search engine with whatever information is available by selecting from the provided options.

Ontology is the main ingredient of the Semantic Web, and it is built for a particular domain. This work uses the domain of hotels available in the continents of Asia and Europe. The challenge is to load the RDF data into the Hotel ontology dynamically. For this, a semantic tool was developed. The tool uses the Google AJAX API internally to fetch results from the Google search engine. Over these results it employs URL checking to separate relevant results from irrelevant ones. The relevant search results are then transformed into RDF format and populated in the Hotel ontology. From the given interface the user chooses the desired options and sends the query to the search engine, which in turn returns only reliable results from the ontology.

Another challenge was to display results to the users such that their preferences are taken into account. For this, the user's click history is tracked by the system.

An advantage of using the Semantic Web is that users do not need to be aware of the concepts supporting the search in order to use it. Their experience should be as close as possible to the one they currently have with the web and the search engines they use daily.

Some of the latest works in semantic areas are the following.

WANG Yong-gui and JIA Zhen [1] gave an introduction to the Semantic Web and to Web Mining and then argued that their integration can make Web Mining considerably more effective. For that they used a five-step process which integrated Web mining with Semantic technologies.

Jiang Huiping [2] proposed a semantic web search model to enhance the efficiency and accuracy of IR for unstructured and semi-structured documents. He used a Ranking Evaluator to measure the similarity between documents with semantic information for rapid and correct information retrieval. He introduced a Search Arbiter to judge whether the query is answered by the Keyword based Search Engine or the Ontology Search Engine. He gave only a conceptual architecture of a Semantic Web Information Retrieval System.

Saman Kamran and Fabio Crestani [3] proposed a method for developing a reliable ontology using the potential of the users available on Social Networking Sites. They created a seven-step model which takes the users' input and then, using the Wikipedia Link Based Measure (WLM) and Cosine Similarity, calculates a semantic similarity score for the inputs. This model is under construction and no evaluation is available yet.

Eero Hyvönen, Avril Styrman and Samppa Saarela [4] developed a method of annotating images. They used Promotion event images and annotated them so that an ontology structure can be established for that event. This ontology then helps in answering the queries of the user.

Waqas Ahmad and Ch Muhammad Shahzad Faisal [5] proposed a context-based search, done over 300 pictures of 5 personalities gathered from Google, over various contexts such as playing or attending a meeting. These images were manually annotated to make the search more efficient.

Noman Hasany and Mohd. Hasan Selamat [6] presented a system where an ontology for hotels is used for searching and the user is given a natural language query platform for entering queries. Their paper provides the detailed construction of the Hotel Ontology using a knowledge base of Malaysian hotels.

Tuan-Dung CAO, Thanh-Hien PHAN and Anh-Duc NGUYEN [7] presented a system called STAAR (Semantic Tourist Information Access and Recommending). They used a Tourist ontology and helped the user query it through a mobile platform. In addition they gave an algorithm for suggesting a travel route relevant to both criteria: itinerary length and user interest.

K. Palaniammal, Dr. M. Indra Devi and Dr. S. Vijayalakshmi [8] contributed towards semantic-based searching which also gives importance to the user's priorities while searching in the required domain. Their system keeps track of the user's traits, such as age and current status, to understand the user's habits and accordingly proposes places which the user may like to visit.

P. Sheba Alice, A. M. Abirami and A. Askarunisa [9] proposed a tool for refined search that retrieves only the most relevant links and eliminates the other links using semantic web technologies. The user's searched text is stemmed and compared with attributes defined in the ontology.

Hannah Bast, Florian Bäurle, Björn Buchhold and Elmar Haussmann [10] discussed the pros and cons of full-text search on the one hand and ontology-based search on the other. A full-text query works well when relevant documents contain the keywords, or simple variations of them, in a prominent way. Entity-oriented queries work well only with an ontology. The paper highlighted that the most challenging task is the construction of an ontology for a particular domain.

III. GOAL OF RESEARCH
The goal of this research work is to develop a system architecture for a semantic web search engine for images over a specific domain, namely Hotels.

In this system, the ontology is the knowledge base, which can be trained in any specification such as RDF, OWL or Turtle.

The process of training the ontology with reliable and up-to-date data is done using the Semantic Tool. This is a Java-based tool which internally fetches information from Google using the Google AJAX API.

Various attributes of the class Hotel are prepared on the basis of Location, Ratings, Rate and so on. The user's preferences are tracked by the system, which internally counts the user's clicks per image.

The user provides the query using an easy-to-use interface. The system searches directly in the ontology, after which images of the relevant hotels are displayed sorted by the user's preferences, giving the user a self-learning framework similar to that of Google image search.

IV. CURRENT SYSTEM ISSUES
The semantic web based search systems proposed above do not take the user's behaviour into consideration while returning the search results. This is an important requirement in order to make the user aware of the most popular as well as reliable hotel images available to date over the Internet. This system therefore uses the image's click history as a tool to track the popularity of an image.

Secondly, this system always looks for the latest images available online to be populated in the ontology, which is done here using the Google AJAX API.

V. PROPOSED SYSTEM
In the proposed system, a semantic web architecture has been designed and developed that relieves users of the burden of doing a lot of keyword-based searching before getting the desired result.

Data in the ontology is in Turtle format. In this system, the latest data about hotels is populated inside the ontology using the Google AJAX API. This is a Google JavaScript API which helps in loading online search results, including metadata as well as images, directly into the web application.

The system takes the user's query in the form of parameters related to the domain in a user-friendly environment, develops a SPARQL query and, using the JENA API, gives the reliable results to the user.

On clicking an image the user is redirected to the host website to which the image belongs. This internally updates the user's preference in the knowledge base. The next time results are shown, images are displayed in the order of the user's preferences.

VI. ONTOLOGY ARCHITECTURE
The Hotel ontology architecture used is shown below in Figure 3 in graphical format.

Figure 3. Ontology Chart

The following Turtle code gives a small skeleton of the Hotel Ontology used in this system.

    @prefix : .
    @prefix xsd: .
    @prefix dc: .
    @prefix rdfs: .

    :1 dc:title "Hotel Green Park Chennai" ;
       rdfs:urls "hotelgreenpark.com/chennai/";
       rdfs:imgurls "hotelgreenpark.com/chennai/images/animation.gif";
       rdfs:continent "Asia";
       rdfs:country "India";
       rdfs:typeofroom "Double";
       rdfs:typeofhotel "Five";
       rdfs:rate "Expensive";
       rdfs:facility "Restaurant Swimming Pool Fitness Center";
       rdfs:facility2 "Internet AC TV Refrigerator King Bed";
       .

    :2 dc:title "Raya Inn Rajasthan" ;
       rdfs:urls "rayainn.com/";
       rdfs:imgurls "rayainn.com/main-images/_DSC1445.JPG";
       rdfs:continent "Asia";
       rdfs:country "India";
       rdfs:typeofroom "Twin";
       rdfs:typeofhotel "Three";
       rdfs:rate "Reasonably";
       rdfs:facility "Restaurant";
       rdfs:facility2 "Internet AC TV Refrigerator";
       .

This ontology contains triples that are represented as Subject-Predicate-Object. For example, there is a hotel 1 which has the title Hotel Green Park Chennai. Here the subject Hotel 1 has the predicate title, whose object value is "Hotel Green Park Chennai". The whole ontology is built in this way. The ontology contains the latest information, which is constructed in this system using the Google AJAX API.

VII. SYSTEM ARCHITECTURE

Figure 4. Architecture

VIII. ALGORITHM
A. Initial Setup:-
1) Training of the Hotel Ontology with image URLs is done using the Semantic Tool.
   • The tool takes input URLs through the Google AJAX API based on the user's search query.
2) Returned URLs are formatted as per the requirements of the system and are then used as input for the ontology.
3) Using the above steps, the system dynamically loads the ontology with reliable information about hotels in Turtle format.
   • In this work, the ontology contains the latest information about hotels in the continents of Asia and Europe.
4) For experimentation, the system is trained with 200 hotels.

5) For all images in the ontology an important parameter, rank, is used (initially set to 0).
   • This changes gradually while the system is used by the user.

B. Algorithm:-
Prerequisite: Hotel Ontology
Search evaluation {
1) The system displays the query interface to the user.
2) The input parameters given by the user are converted to a SPARQL command.
3) The system then runs this SPARQL query against the existing ontology using the JENA API.
4) The result set obtained after matching the query is sorted according to the user's preferences recorded cumulatively by the search engine.
5) The user's preference is recorded in the knowledge base by the system itself in the rank parameter.
6) This rank is updated dynamically according to the following userClick algorithm.
7) userClick algorithm:-
   • Initially all the images have rank 0, so they are displayed in the order in which they are listed in the ontology.
   • When the user clicks on any image, its rank is incremented, which indicates the current user's preference for that image.
   • In the next search, images are sorted on the basis of their rank.
8) The result set returned now contains the images sorted according to their rank as explained above.
9) These images are displayed as thumbnails on the web page, in order of their rank.
}

IX. TRAINING THE SYSTEM
The data of various users is collected and organized around the ontology of hotels. The system has a large database of images belonging to various categories. The process of collecting the latest data for hotels in the ontology is done automatically using the Google AJAX API.

All the details, along with the URL of the image file and its category, are stored in the ontology. The category of an image is identified manually and can be anything such as Location, Star Ratings or Price.

X. IMPLEMENTATION
Here the Semantic Tool is used to train the Hotel Ontology with reliable data about various hotels available in Asia and Russia. The system is tested on 200 hotels. This implementation is only a simulation and can be extended to a larger dataset.

The user is provided with the easy-to-use interface shown below in Figure 5, which offers the options that are used in the query.

Figure 5. Query Interface

For example, the user selects Continent as "Asia", Country as "India", Star Rating as "Five", Hotel Price as "Expensive" and Type of Room as "Two Twin", and can select facilities from the given list. When this interface is submitted, the system creates a SPARQL query.

    SELECT ?doc ?title ?urls ?imgurls
    WHERE
    {
      ?continent pf:textMatch "Asia" .
      ?country pf:textMatch "India" .
      ?typeofhotel pf:textMatch "Five" .
      ?typeofroom pf:textMatch "Twin" .
      ?rate pf:textMatch "Expensive" .
      ?facility pf:textMatch " Swimming Pool" .
      ?facility2 pf:textMatch " Internet Heater" .
      ?doc dc:title ?title .
      ?doc rdfs:rate ?rate .
      ?doc rdfs:typeofhotel ?typeofhotel .
      ?doc rdfs:typeofroom ?typeofroom .
      ?doc rdfs:facility ?facility .
      ?doc rdfs:facility2 ?facility2 .
      ?doc rdfs:continent ?continent .
      ?doc rdfs:country ?country .
      ?doc rdfs:urls ?urls .
      ?doc rdfs:imgurls ?imgurls
    }

This query searches the Hotel Ontology and produces a list of hotels which satisfy the given factors. For that list the system checks each image's rank according to the proposed algorithm and then displays the resulting images in order of their rank. That is, the most popular image in the result set is displayed first, then the next most popular, and so on.
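
The step that turns the selected interface options into a SPARQL query can be sketched as follows. This is a minimal illustration in Python, not the system's own code, which according to the paper is Java-based and uses the JENA API. The function names `build_hotel_query` and `escape_literal`, the `h:` prefix and its `http://example.org/hotel#` namespace are hypothetical placeholders. The sketch also swaps the `pf:textMatch` property function of the listing above for a standard SPARQL 1.1 `CONTAINS` filter, and it requires every selected option to match.

```python
# Placeholder namespace: the prefix IRIs of the original ontology are not
# given in the text, so this value is an assumption for illustration only.
HOTEL_NS = "http://example.org/hotel#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Property names follow the Turtle skeleton of the Hotel Ontology.
FILTERABLE = ("continent", "country", "typeofhotel", "typeofroom",
              "rate", "facility", "facility2")


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_hotel_query(options):
    """Build a SELECT query from {property name: selected value} options."""
    unknown = set(options) - set(FILTERABLE)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    lines = [
        f"PREFIX dc: <{DC_NS}>",
        f"PREFIX h: <{HOTEL_NS}>",
        "SELECT ?doc ?title ?urls ?imgurls",
        "WHERE {",
        "  ?doc dc:title ?title .",
        "  ?doc h:urls ?urls .",
        "  ?doc h:imgurls ?imgurls .",
    ]
    for prop in FILTERABLE:
        value = options.get(prop)
        if not value:
            continue  # option left unselected in the interface
        needle = escape_literal(value.strip().lower())
        # Bind the property, then keep rows whose value contains the choice.
        lines.append(f"  ?doc h:{prop} ?{prop} .")
        lines.append(f'  FILTER(CONTAINS(LCASE(STR(?{prop})), "{needle}"))')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_hotel_query({"continent": "Asia", "country": "India",
                             "typeofhotel": "Five", "rate": "Expensive",
                             "facility": "Swimming Pool"}))
```

Options the user leaves empty simply add no pattern, so the same function covers both broad and narrow searches. Matching on a substring lets a selection such as "Swimming Pool" match a facility value that lists several facilities in one string.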

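The userClick ranking described in the algorithm can be illustrated in the same way. The sketch below is an assumption-laden illustration rather than the system's implementation: the class name `ClickRanker` and its methods are hypothetical, and ranks are held in memory, whereas the paper stores the rank parameter in the knowledge base.

```python
from collections import Counter


class ClickRanker:
    """Tracks a rank (click count) per image URL and orders results by it."""

    def __init__(self):
        # Image URL -> number of clicks; an unseen image has rank 0.
        self._rank = Counter()

    def record_click(self, image_url):
        """Increment the rank of the image the user clicked on."""
        self._rank[image_url] += 1

    def rank_of(self, image_url):
        return self._rank[image_url]

    def order(self, image_urls):
        """Return the images sorted by descending rank.

        sorted() is stable, so images with equal rank keep the order in
        which they were listed (the ontology order for a fresh system).
        """
        return sorted(image_urls, key=lambda url: -self._rank[url])


if __name__ == "__main__":
    ranker = ClickRanker()
    results = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical query results
    ranker.record_click("c.jpg")
    ranker.record_click("c.jpg")
    ranker.record_click("b.jpg")
    print(ranker.order(results))
```

Relying on a stable sort is what makes the first bullet of the userClick algorithm hold: with every rank at 0, the display order is the listing order.
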
The example shown above was tested on a small ontology of 200 hotels; Figure 6 shows a subset of the result images.

Figure 6. Query Result

In this system, the names of hotels are displayed along with their images. The user can click on any image and is directed to that hotel's link.

XI. RESULTS AND DISCUSSION
A user tested the system with some complex image queries, for example "Images of Five Star Hotel in India with Swimming Pool TV AC Internet". When the user manually checked the knowledge base of 200 hotels, the number of relevant hotels which matched the query was 44. After running the query through this Semantic Search System, however, the user found 47 images.

The reason for this deviation is that the system works on a union approach. Suppose the user selects 2 facilities, facility1 and facility2. Strictly, the system should return images of only those hotels which satisfy both of them, but it returns images of hotels which satisfy either of them. This is intentional: a person generally decides on a hotel after exploring its images and all available facilities by browsing its website, so the user may also like to see hotels which provide facility1 but not facility2, along with other miscellaneous facilities.

According to manual checking, the relevant results number 44. The system returned a total of 47 results, of which 3 are irrelevant from the user's perspective.

So Precision comes out to be 44 / (44 + 3) ≈ 0.93, which is good.

When the same query was tested on a keyword-based search engine, it gave a Precision of 0.075, which is very poor compared with the Precision of 0.93 given by the Semantic based search engine.

On a similar basis, average precision was calculated for 5 simple queries. Keyword-based search gave an average Precision of 0.165, whereas Semantic based search gave an average Precision of 0.95, which is a significant improvement over existing keyword-based search engines.

Recall in this system depends on the strength of the knowledge base, that is, the ontology. Recall compares the number of relevant results returned by the system with the total number of relevant hotels that actually exist. If the ontology is prepared in such a way that it contains all the relevant hotel images, then the system gives Recall = 1. In this system, an ontology with 200 records is used. It can be expanded to a larger dataset, and its Recall is expected to stay close to 1 as long as the ontology covers the relevant hotels.

XII. CONCLUSION
In this research, a model is proposed which addresses the problem of irrelevant results displayed by current image search engines, using semantic web technologies and focusing on a single domain, that of Hotels. The model has been tested on a limited number of hotels, and it has been observed that the two measures, Precision and Recall, improve significantly over currently used keyword-based search engines because the model retrieves relevant results. This improvement is expected to persist for larger numbers of hotels.

The model can be extended to multiple-domain searches to give a fully fledged experience similar to that of currently used search engines, but with improved precision and recall rates.

REFERENCES
[1] WANG Yong-gui and JIA Zhen, "Research on Semantic Web Mining", 2010 International Conference On Computer Design And Applications (ICCDA 2010), 978-1-4244-7164-5 2010 IEEE.
[2] Jiang Huiping, "Information Retrieval and Semantic Web", 2010 Technology (ICEIT 2010), 78-1-4244-8035-7/10 2010 IEEE V3-461.
[3] Saman Kamran and Fabio Crestani, "Defining Ontology by Using Users Collaboration on Social Media", CSCW 2011, March 19–23, 2011, Hangzhou, China. ACM 978-1-4503-0556-3/11/03.
[4] Eero Hyvönen, Avril Styrman and Samppa Saarela, "Ontology Based ", University of Helsinki, Department of Computer Science.
[5] Waqas Ahmad and Ch Muhammad Shahzad Faisal, "Context Based Image Search", 978-1-4577-0657-8/11/$26.00 © 2011 IEEE.
[6] Noman Hasany and Mohd. Hasan Selamat, "Answering User Queries from Hotel Ontology for Decision Making", 978-1-4577-1481-8/11/$26.00 © 2011 IEEE 123.
[7] Tuan-Dung CAO, Thanh-Hien PHAN and Anh-Duc NGUYEN, "An Ontology based approach to data representation and information search in Smart Tourist Guide System", 978-0-7695-4567-7/11 $26.00 © 2011 IEEE, DOI 10.1109/KSE.2011.33.
[8] K. Palaniammal, Dr. M. Indra Devi and Dr. S. Vijayalakshmi, "An Unfangled Approach to Semantic Search for E-Tourism Domain", 978-1-4673-1601-9/12/$31.00 © 2012 IEEE 130, ICRTIT-2012.
[9] P. Sheba Alice, A. M. Abirami and A. Askarunisa, "A Semantic Based Approach to Organize eLearning through efficient Information Retrieval for Interview Preparation", 978-1-4673-1601-9/12/$31.00 © 2012 IEEE, ICRTIT-2012.
[10] Hannah Bast, Florian Bäurle, Björn Buchhold and Elmar Haussmann, "A Case for Semantic Full-Text Search", JIWES '12, August 12 2012, Portland, OR, USA. Copyright 2012 ACM 978-1-4503-1601-9/12/08 ...$15.00.
[11] Shruti Kohli and Sonam Arora, "Topic Specific Concept Matching Based Web Semantic Search Engine", International Journal of Computer Science & Engineering, 2013.
