Domain Oriented Semantic Web Based Personalized Search Engine
2014 Fifth International Conference on Intelligent Systems, Modelling and Simulation
2166-0662/14 $31.00 © 2014 IEEE, DOI 10.1109/ISMS.2014.12

Shruti Kohli, Department of Computer Science, BIT, Noida, [email protected]
Sonam Arora, Department of Computer Science, BIT, Noida, [email protected]

Abstract—The present market is dominated by search engines that work on keyword-based querying systems. Such a system becomes ineffective and wastes the user's time if the user does not know the keywords used to index the desired pages. For example, if a user enters the keyword 'Book', Google will show results both for 'Reading any book' and for 'Book a Hotel', so the user has to look into the contents of the web pages to shortlist the relevant ones. The same problem exists with image search engines: if the query is for images of 'hotel in Delhi', the result set will contain irrelevant as well as relevant images. One possible solution is for the machine itself to divide the results into relevant and irrelevant images and show only the relevant ones to the user. However, this is not feasible, because the machine would have to analyse the content of every image using image processing techniques and then compare all the images for similarity, which cannot be done for millions of records worldwide. Another solution is Semantic Web technology, an extension of the current web that allows the meaning of information to be described precisely, so that it can be understood by computers as well as users. Ontology is a very important ingredient of the Semantic Web, and this work uses an ontology for the hotel domain. The user is provided with an easy-to-use interface for querying the hotel ontology. The technologies used are the SPARQL query language and the JENA API, which search for the user's query inside the ontology. This work focuses on preserving the user's preferences while displaying results on the web page. The main challenge was dynamically loading the hotel dataset into the ontology in RDF format. This was done using a Semantic Tool that internally uses the Google AJAX API to populate the latest results from the Internet. The advantage of using the Semantic Web is that it returns only relevant images, which in turn increases the Precision and Recall rates of the search engine.

Keywords: SPARQL; RDF; Semantic Web; Web 3.0

I. INTRODUCTION

Nowadays, users can get whatever information they need online, anywhere and at any time. A user who wants to visit a place would like to look up information about that place beforehand, such as the hotels available and the weather. The user may also like to look at images of hotels before visiting their websites. Many online search engines are available for this. The user enters a keyword for the query and is then flooded with the images available online. The problem is that the displayed results contain relevant as well as irrelevant items, so the user must do extra work to separate the desired set of images from the pool of results. This wastes time and energy. The efficiency of a search engine, in terms of the number of relevant documents it returns, is measured by two factors, Precision and Recall.

The weakness of today's search engines described above leads to low precision and recall. It can be resolved to a large extent using Semantic Web technologies. These technologies extend the existing web with semantics that give meaning to each web image or document. This meaning is defined in a way that machines can understand, which reduces the effort users spend searching for relevant images. In Semantic Web technologies, information is represented in a W3C standard called the Resource Description Framework (RDF), and the ontology is the main ingredient. Well-defined formats have been recommended for representing RDF data and ontologies; existing ones include RDF/XML and Turtle (serializations of RDF) and OWL (the Web Ontology Language). Research on semantic technologies is still at an early stage, so traditional search engines such as Google, Bing and Yahoo still dominate the market.

Consider a query for "Taj Mahal". Traditional search engines will display thousands of images both for "Taj Mahal in Agra" and for "Taj Mahal casino in Atlantic City, USA". The search engine does not refine the results for the user, who has to sift through the result set to find the relevant ones. Information retrieval today relies only on keyword searches using Google, Yahoo, Bing, etc., or on simple metadata such as that of an RSS feed. Moreover, there is no easy way to generate personalized searches, so users have to think of and write search keywords that match their own requirements. Such a search process is time-consuming and requires a lot of human effort: a user who does not know which keywords to use cannot retrieve relevant information.

Semantic search improves this efficiency by providing only relevant results to the user. It represents the data available on the Internet as an ontology, which describes information using metadata. Users do not need to think of keywords that will give them the results they want; instead, they can simply give the search engine whatever information they have by selecting a domain.

The core idea of a domain-based search engine is to describe the query in the form of a domain description. For this, the user needs to build a type of query that the Semantic Web understands. This paper proposes an architecture in which queries are not built using natural language but through an easy-to-use interface that helps users build the complex queries they want.

II. CHALLENGES

Keyword-based search is useful mainly to users who know which keywords are used to index the images, because they can easily define queries. The approach becomes problematic when users do not know how to write a query that returns only the desired results, since for that they must know the semantic concepts used in the particular domain they are interested in. As a result, after entering a query, the user gets some irrelevant images along with the relevant ones. Two parameters, Precision and Recall, are used to check this efficiency of a search engine. Consider Figure 1.

Figure 1. Precision vs. Recall

Let A be the number of relevant records retrieved, B the number of relevant records not retrieved, and C the number of irrelevant records retrieved.

Precision: the percentage of returned pages that are relevant; in other words, the ability to minimize the number of irrelevant links returned to the user.

$$\text{Precision} = \frac{A \times 100}{A + C}$$

Recall: the percentage of relevant pages that are returned; in other words, the ability to maximize the number of relevant links returned to the user.

$$\text{Recall} = \frac{A \times 100}{A + B}$$

Existing search mechanisms have low precision and recall percentages. For example, when a user enters the query "images of hotels in Delhi" on Google, the results may contain many unwanted images that are of no interest to the user. Consider Figure 2.

Figure 2. Google images results

Semantic Web based search aims to provide better precision and recall rates than keyword-based search. The challenge is to create a domain-based Semantic Web search engine that is highly user-friendly and provides advanced search options using the various parameters a user can think of. It is user-friendly in the sense that users need not think of the keywords that might give them the desired results; instead, they can simply give the search engine whatever information they have by selecting from the options provided. Because the ontology is the main ingredient of the Semantic Web, it is built for a particular domain. This work uses the domain of hotels in Asia and Europe.

The next challenge is to load RDF data into the Hotel ontology dynamically. For this, a semantic tool was developed. It uses the Google AJAX API internally to fetch results from the Google search engine, and then applies URL checking to separate the relevant results from the irrelevant ones. The relevant results are transformed into RDF format and populated into the Hotel ontology. The user chooses the desired options in the interface and sends the query to the search engine, which returns only reliable results from the ontology.

Another challenge was to display results in a way that takes the user's preferences into account. For this, the system tracks the user's click history.

An advantage of using the Semantic Web is that users need not be aware of the concepts behind the search in order to use it. Their experience should be as close as possible to the one they currently have with the web and the search engines they use daily.

Some recent works related to the Semantic Web are summarized below.

WANG Yong-gui and JIA Zhen [1] introduced the Semantic Web and Semantic Web mining and showed that integrating the two can make Web mining considerably more effective. To do so, they used a five-step process that integrates Web mining with semantic technologies.

Jiang Huiping [2] proposed a semantic web search model to enhance the efficiency and accuracy of information retrieval (IR) for unstructured and semi-structured documents.

simple variations of them in a prominent way. If the user has entity-oriented queries, then they will work well only