Mathematical Model of Semantic Look - An Efficient Context Driven Search Engine

Leena Giri G (a), Srikanth P L (a), Manjula S H (a), K R Venugopal (a), L M Patnaik (b)

(a) Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore 560 001 India, Contact: [email protected].
(b) Honorary Professor, IISc., Bangalore.

arXiv:1402.7200v1 [cs.IR] 28 Feb 2014

The World Wide Web (WWW) is a huge conservatory of web pages. Search engines are key applications that fetch web pages for the user query. In the current generation web architecture, search engines treat the keywords provided by the user as isolated keywords, without considering the context of the user query. This results in a lot of unrelated pages or links being displayed to the user. Semantic Web is based on the current web, with a revised framework to display a more precise result set in response to a user query. The current web pages need to be annotated by finding relevant meta data to be added to each of them, so that they become useful to Semantic Web search engines. Semantic Look explores the context of the user query by processing the semantic information recorded in the web pages. It is compared with an existing algorithm called OntoLook, and it is shown that Semantic Look is a better optimized search engine, being more than twice as fast as OntoLook.

Keywords: Ontology, RDF, Semantic Web.

1. INTRODUCTION

Semantic Web (Web 3.0) is the proliferation of the unstructured documents of the web into a "web of data" [1]. In traditional web architecture there is less emphasis on the meta data of the web document during the data collection phase of the search engine, and the concentration is more on classic approaches like Information Retrieval and Natural Language Processing. It is difficult to know the context or the role played by a web document designed for such approaches [2][3][4]. This is overcome by Semantic Web, where enhanced versions of meta data are embedded in the web pages as RDF [5] and Ontology [6]. Ontology defines the concepts and the relations between these concepts. RDF (Resource Description Framework) describes the web document in the form of triplets. Every RDF triplet is a composition of subject, predicate and object. The subject is an entity to be described, the object is an entity which describes the subject, and the predicate is a relationship between subject and object; essentially, every predicate describes a different context of the web page playing multiple roles. Both Ontologies and RDF are embedded in web pages, forming the semantic annotation of a web page.

1.1. Motivation

The existing search engines interpret the keywords of a user query in isolation, without considering the context of the search query as a whole. Because of this, most of the results retrieved are irrelevant to the user query. This degrades the performance and accuracy of search engines. The main purpose of providing multiple keywords is to make the search based on a particular context. That is to say, nothing exists without context or relation. As an example, consider a scenario where a user has submitted the keywords "Ashoka+Bangalore+Hotel" with the intention of searching for Hotel Ashoka in Bangalore. Traditional web search engines return all the web pages containing the keywords "Ashoka, Bangalore and Hotel" without considering the context of the user query. Most of these web pages are irrelevant to the user query: some pages may provide information on "Ashoka Pillar in Bangalore", some may provide information on different hotels because of the keyword 'hotel', and some may provide information about Bangalore city because of the keyword 'Bangalore'.

1.2. Contribution

Semantic Look processes the semantic annotation embedded in web pages to perform relevance analysis on the user query and retrieve the related URLs. It constructs Ontology graphs which are equivalent to the Concept-Relation graph of OntoLook [7]. Semantic Look is optimized compared to OntoLook: heavy weighted arcs are retained in every sub graph and only half of the least weighted arcs are pruned, which reduces the number of sub graphs to be processed.

1.3. Organization

This Section describes the rest of the paper in brief. Section 2 describes a few earlier contributions to the implementation of Semantic Web based search engines. Section 3 defines the problem considered. Section 4 explains the architecture of Semantic Look. In Section 5 the mathematical model for Semantic Look is given by describing the notations used and the equations that result in a more accurate result set being displayed. Section 6 describes the algorithms required in the components of the system architecture. Section 7 gives the experimental results, mentioning the data sets considered and a graphical representation of the performance evaluation between OntoLook and Semantic Look. The paper concludes by mentioning the enhancements that can be incorporated in Semantic Look, followed by the list of references considered by the authors.

2. RELATED WORK

Traditional search engines return keyword-isolated pages, because of which many unrelated pages or web links are returned in response to the user's query. Semantic search engines can be categorized into different types, of which the context based search engine is one. This is the largest group, and the aim here is to add semantic operations for better search results. Yufei Li et al. [7] have proposed a prototype called OntoLook for performing relation based search to derive the context of the user query. Concept-Relation Graphs (CRGs) are created with vertices representing concepts and edges representing the number of relations between these concepts. An algorithm generates sub graphs by pruning edges of the CRG irrespective of whether they are heavy weighted or less heavy weighted edges, and this results in a large number of sub graphs to be processed.

Junghoo Cho, Hector Garcia-Molina and Lawrence Page [8] have proposed the theory based on "Efficient Crawling Through URL Ordering", in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire web in a reasonable amount of time. They define several important metrics, ordering schemes, and performance evaluation measures for this problem. They experimentally evaluate the ordering schemes on the Stanford University Web to prove that a crawler with a good ordering scheme can obtain important pages significantly faster than one without a good ordering scheme.

Li Ding et al. [9] have proposed the theory based on "Search on the Semantic Web". They propose this theory based on the fact that, in order to help human users and software agents find relevant knowledge on the Semantic Web, Swoogle, a search engine, discovers, indexes and analyzes the Ontologies and facts that are encoded in Semantic Web documents. Natalya F Noy et al. [10] have proposed the theory on creating "Semantic Web Contents". As researchers continue to create new languages in the hope of developing a Semantic Web, they still lack consensus on a standard. They describe how Protege-2000, a tool for Ontology development and knowledge acquisition, can be adapted for editing models in different Semantic Web languages. Semantic Web is necessary to express information in a precise, machine interpretable form, so that software agents processing the same set of data share an understanding of what the terms describing the data mean.

Lastra J L M et al. [11] study the use of Semantic Web Services in order to overcome this challenge. The use of Ontologies and explicit semantics enables logical reasoning to infer sufficient knowledge on the classification of processes that machines offer, and on how to execute and compose those processes to carry out manufacturing orchestration autonomously. Yi Jin [12] presents an architecture of the Semantic Search Engine and shows how the fundamental elements of the Semantic Search Engine can be used in the fundamental task of information retrieval. An improved version of the tf-idf (term frequency-inverse document frequency) algorithm is proposed to guarantee the retrieval of information resources in a more efficient manner by looking for items in which the keywords that are searched for are more common than usual.

Alexander Maedche et al. [13] have presented an integrated enterprise-knowledge management architecture for implementing an Ontology based Knowledge Management System (OKMS) and have made a study of two critical issues related to working with Ontologies in real-world enterprise applications.

tains most relevant web pages with unnecessary pages filtered from it. It is assumed that every web page consists of embedded RDF (e-RDF) and embedded Ontology (e-Ontology), forming the semantic annotation for a web page. The web pages with e-RDF and e-Ontology form the Semantic World Wide Web, which is used by the crawler for crawling the e-RDF/e-Ontology in the web pages.

Semantic Look, a variant of OntoLook, optimizes this search logic by retaining the high ranked edges in every sub graph, as they are relevant to the user query, and pruning only the least weighted arcs. As an example, if the number of edges in the CRG is 7, of which 4 are less weighted arcs, OntoLook produces $2^7$, i.e., 128 sub graphs. Semantic Look produces only 6 sub graphs, since it prunes half the number of least weighted arcs, i.e., $^4C_2$.

4. SYSTEM ARCHITECTURE

The Context Driven Search Engine includes Semantic Crawler, Semantic Parser, Semantic Look and Ontobase as components for drawing the con-
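
The sub graph counts quoted in the example with 7 edges and 4 less weighted arcs can be checked with a few lines of arithmetic. The sketch below is only an illustration and not the authors' algorithm. It assumes, as the example suggests, that OntoLook may prune any subset of the n CRG edges (2^n sub graphs), while Semantic Look keeps every heavy weighted arc and chooses half of the k least weighted arcs to prune (kC(k/2) sub graphs). The function names and edge labels are hypothetical.

```python
from itertools import combinations
from math import comb


def ontolook_subgraph_count(num_edges):
    """Count sub graphs when any subset of edges may be pruned (assumed 2^n)."""
    return 2 ** num_edges


def semantic_look_subgraph_count(num_light_edges):
    """Count sub graphs when half of the least weighted arcs are pruned."""
    return comb(num_light_edges, num_light_edges // 2)


def semantic_look_subgraphs(heavy_edges, light_edges):
    """Enumerate edge sets: keep all heavy edges, prune half of the light ones."""
    prune_size = len(light_edges) // 2
    subgraphs = []
    for pruned in combinations(light_edges, prune_size):
        kept_light = [e for e in light_edges if e not in pruned]
        subgraphs.append(list(heavy_edges) + kept_light)
    return subgraphs


# Hypothetical edge labels: 7 edges in total, 4 of them less weighted.
heavy = ["e1", "e2", "e3"]
light = ["e4", "e5", "e6", "e7"]

print(ontolook_subgraph_count(len(heavy) + len(light)))   # 128
print(semantic_look_subgraph_count(len(light)))           # 6
print(len(semantic_look_subgraphs(heavy, light)))         # 6
```

Under these assumptions every enumerated sub graph still contains all three heavy weighted arcs, which is the property the text attributes to Semantic Look.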
