Comparing SPARQL Tools for Logical Inferencing in Semantic Web

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6S4, April 2019

Aman Jolly, Shailender Kumar 

Abstract:--- In today’s modern era, where search engines are iconic birth of World Wide Web in 1990‟s. The networking still using traditional information retrieval system which usually in the second generation of the web is based on document involves string matching and comparison of text and character. centred processing unlike web 1.0 and the mode of Semantic web technologies are considered an only silver bullet information retrieval is making a query in the search engines that has the potential to lead the transition from keyword-based search to context-based search. Semantic web technology can like Google via web browsers [2]. The search engines provide a system where the web of the document can be made display the list of relevant hyperlinks that are meant to machine-understandable. First, this paper projects how a satisfy the required user query. Thus by clicking on the semantic web’s linked data can lead us to more robust and hyperlink, it provides access to the document that is being machine understandable system. Second, this research shows hosted on other servers across the internet. This state of the how a logical inference can be made in linked data by using art is still being used actively. About 44% of the world SPARQL queries on different SPARQL tools. Third, this paper shows the analysis of different SPARQL tools such as Twinkle population that is 3.236 billion people in 2016 had an access 2.0, Jena ARQ 3.5, and Protégé 5.0. A comparative tabular to internet according to World Bank‟s world development analysis has been evaluated in order to compare the features of report. Web 2.0 is much more convenient and don‟t require these SPARQL tools and describe shortcomings in their present any expert knowledge that can be easily used by normal version. users. Information retrieval became much easier due to web Index Terms: Semantic web, Linked Data, SPARQL tools, crawlers and web search engines as they maintain web index Protégé 5.0, Jena ARQ 3.5, Twinkle 2.0, Logical inference . that made the search for required information easily. The statistics that are shown by World Bank clearly reveal that I. INTRODUCTION the web is very big and it is growing continuously at an The semantic web is one of the most prominent and astonishing level. budding fields in IT-oriented research. However, research on adding semantics in the web was initialized as soon as A. Need for Semantic Web the web was started but a robust semantic web that is the After the arrival of “Internet of Things”, a network of large repository distributed metadata that was integrated into different devices such as household and network devices is human readable document emerged only over fourteen needed to establish connections among them [10]. All these years. The Semantic Web (Web 3.0), a web of data, methods devices are having different types of sensor that gather and Technology permit machine to comprehend the meaning different type heterogeneous data. Because all these devises of the information stored on World Wide Web (WWW) are gathering data from internet and environment, they can [15]. It provides a standardized, powerful, worldwide, put data on the web or on any social media platform so that omnipresent communication mechanism whose benefits are devices can communicate with each other and share their not viable to ignore [9].In the history of web there was the information. The issue lies in the enormity of data and time when ARPANET [2], the father of the first generation incapability of machines to make sense of this data. Humans of web was connected with only fewer academic institutions try to solve the problem with contextual knowledge, world in late 1960‟s and networking were highly based computer- knowledge and experience if the content is written in the centric processing and way of retrieving the information language that is understood by them. Exploiting machines to from the source was connecting to a remote system via their full potential have become a necessity, so as to provide terminal, one had to browse through their file system data better filtering and searching capabilities in this vast and retrieve the desired file by downloading it to base cyberspace. The web documents that are currently based on workstation. This type of approach requires expert hypertext markup language are meant for human knowledge as one has to remotely login via advanced consumption [1]. The web pages are designed for providing commands and information retrieval was more costly in only the descriptions about the presentation of information terms of hardware cost. In this first generation of web in the web browsers and the linkage of information from one networks, only the experts of academic institutions were document to another. But it does not show the semantics of connected that made the flow of information limited to data. Thus the widely used web standards are human based, some. This restriction of information gave rise to the second rather than machine-based. Also, there is an ambiguity in the generation of the web also called web 2.0 which marks the natural language. For e.g.: The word „father‟ in WorldNet lexical database contains different concepts. Father can be a male parent or can be used to address male priests in some Revised Manuscript Received on April 05, 2019. churches (Padre). Aman Jolly, Assistant Professor, KIET Group of Institutions, Ghaziabad, Uttar Pradesh, India. Dr. Shailender Kumar, Assosciate Professor, Delhi Technological University, Delhi, India.

Published By: Blue Eyes Intelligence Engineering Retrieval Number: F11250476S4/19©BEIESP 592 & Sciences Publication

COMPARING SPARQL TOOLS FOR LOGICAL INFERENCING IN SEMANTIC WEB

It can be someone who is having an important or referred data sets from ADRIADNE and MERLOT, in distinguished position in some organization. It Learning Objects Metadata format. They also proposed a can also be used interchangeably for representing God, new pattern classification model to match instances from founder, beginner, founding father and don (the head of an different ontologies which are related to e-learning material organized crime family). So in the natural language, the which is used in the same context of the knowledge society. interpretation is complex because same words are actively Their model showed high accuracy result when compared used in different context and different words are used to with some best existing method of ontology matching [6]. specify the same concept that is depicted in Figure 1. Due to M. English [7] has proposed his work based on trust layer this, traditional information retrieval system that usually of semantic web layer cake that can be implemented in involves string matching and comparison of text and blockchain technologies used in bitcoin where it was character show multiple irrelevant results. described that how a blockchain technology can have the potential that can be applied in making semantic web architecture more robust and resilient by fulfilling three key requirements in the Uniform resource identifiers that are security, human readability and decentralization. They also presented how data can be stored in a blockchain and the concept of distributed trust can contribute to the linked data system in the semantic web. In addition to that, they formalized a path where semantic web linked data approach can contribute to blockchain technique. They proposed a model of representing the transaction in RDF so that the semantic representation can ensure trust at the human understandable level, unlike present system where trust is Fig. 1. Ambiguity of natural language established only at technical level [7]. For successful communication in natural language, the C. Fluit [8] has presented his work on Ontology Data syntax and meaning (semantics) of the information must be analysis, querying and data exploring for inferencing and interpreted correctly. The understanding (correct described a cluster map technique for visualizing ontologies, interpretation) depends upon the context of sender and representing classes and their hierarchies, and also described receiver and pragmatics of the sender. The context depends three key applications of Clustered mapping technique that on the experience of both sender and receiver. In the are as follows:- traditional web, there is no provision for extracting implicit • Dope browser semantics which means that the information is there but it • Xarop/SWAP: Peer-to-Peer Knowledge Management has hidden meaning, this makes it difficult to understand the • Aduna AutoFocus correct meaning. Semantic web (web 3.0) [1] is the extension of web 2.0 (the web of documents) is known as III. TECHNIQUES USED the web of data. Web 3.0 is a web as a huge decentralized This section, various techniques that have been used in knowledgebase of machine accessible data. According to T making the linked data and extracting implicit knowledge B Lee, there is a huge potential in the web when human- from them have been addressed. readable text is mixed with machine-readable data [2]. To make the documents on the web read and interpret correctly A. Resource Description Framework by the machine, web content is explicitly annotated with Resource description framework [3] can be used to semantic metadata [9]. This semantic metadata encodes the represent simple facts of knowledge. In the semantic web, meaning of contents of the web that can be read and the information exchange is done in the standardized format. interpreted correctly by machines [2]. The Resource description framework is the basic building block of information exchange in the web of data that is II. LITERATURE SURVEY used in the representation of knowledge in a standard In this section, the prominent work achieved in the field schema. It is introduced because XML schema has multiple of semantic web technologies and its applications in various ways to represent a fact which are standardized using RDF domains have been addressed. In this survey, tools and [4]. related key technologies that have been applied to achieve • Resource: indicates an abstract concept, identified via their proposed objectives. URI. • Description: depicts the properties and relationship Rafael S. Gonçalve [5] used Semantic web linked data for among the Resources describing as a graph. data acquisition. They collected the data from the input • Framework represents the amalgamation of Web gathered from these form as semantically aware ontologies. protocols such as HTTP, XML, URI, etc. They also performed data acquisition from form ontology to perform inference for investigating the eligibility for disability benefits using SPARQL query.

Sergio Cerón-Figueroa [6] implemented pattern classification for matching ontology instances. They have

Published By: Blue Eyes Intelligence Engineering Retrieval Number: F11250476S4/19©BEIESP 593 & Sciences Publication International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6S4, April 2019

components include a Prefix Set and a Variable set. Prefix set is the endpoint locators and common Prefix is to be used to have concise queries whereas Variable Set includes variables with name lead by a question mark like ?V1, ?V2 etc. The SELECT-WHERE statements – comprises of two components namely a set of question words and a question pattern. The SELECT statement chooses which data is to be displayed whereas The WHERE keyword indicates the selection pattern, i.e. the data specified to pull out. Another component is the result set which is used to show the results in tabular structure or in XML or Text formats. Due to ease of typing XML based linked data formats, any text editor can be used to program linked data. Sublime Text is an open Fig. 2. Defining Semantic Explicit Knowledge. source text editor that is used in our research. The installation package was directly downloaded from their Now suppose there is an individual “Person X” who official website. The linked data is programmed in the turtle explicitly belongs to a class of “Research scholar”, “Ph.D. syntax notation. There are different tools which are used to student” and several other classes such as “JRF stipend execute SPARQL queries. Twinkle 2.0, Jena ARQ 3.5 and holder” as depicted in Figure 2. The “JRF stipend holder” Protégé 5.0 have been used in this research. Twinkle [11] is class is a subclass of the “JRF qualified” then it can deduce the tool to implement SPARQL Query. For installing the hidden knowledge that Person x is “JRF qualified” and Twinkle tool, JDK 1.5 or higher should be installed and “NET qualified” from the given explicit information as configured. Jena framework is developed in java which is shown in Figure 3. used to build semantic web application whereas Jena ARQ [13] is the SPARQL query processor for Jena. Protégé [12] is a free, open-source, GUI based tool for semantic web development.

IV. EXPERIMENT AND RESULTS In this section, we are making linked data and infer new knowledge that was explained in Figure 2 and Figure 3 by using SPARQL queries in different tools. The experiment is demonstrated by a flowchart depicted in Figure 4.

Fig. 3. Logical Inference of Implicit Knowledge.

The meaning of information is made explicit by formal (structured) and standardized knowledge representations (ontologies). Thus it is possible to relate and integrate heterogeneous data, process the information automatically and deduce implicit (non-evident) information in an automated way [14]. The semantic web is a global database that contains a semantic universal network of semantic prepositions [10]. B. SPARQL (Querying RDF) SPARQL query is used for making a logical inference in semantic web linked data. The linked open data which is programmed in resource description framework, where data is denoted in a triple-based data model which is a linked data representation, a query language is needed to acquire Fig. 4. Steps Involved in the Experiment for Extracting the linked data [10]. SPARQL is a W3C standard inspired Semantic Inference. by SQL, which has three components, i.e. a query language for RDF graph traversal second is a protocol layer for using

SPARQL via HTTP and third is the XML output format specification for results. SPARQL queries fetch linked data and represent it as tables for user view. The basic query

Published By: Blue Eyes Intelligence Engineering Retrieval Number: F11250476S4/19©BEIESP 594 & Sciences Publication

COMPARING SPARQL TOOLS FOR LOGICAL INFERENCING IN SEMANTIC WEB

A. Results as tabular format. In figure 7 the SPARQL results of These results show that how the data can be extracted and Twinkle 2.0 are shown in tabular format. filtered to such extent that logical inference can be After checking the SPARQL query result in Twinkle 2.0, calculated by using SPARQL query. The set of queries have Jena ARQ 3.5, Protégé 5.0, the serialization format of linked been performed in order to deduce that named individual open data that was imported in the these tools in turtle “Person_X” belongs to “JRF_qualified” and syntax notation was changed to RDF/XML, OWL/XML, “NET_qualified” class. The data was exported and the LATEX, N3 (.n3), N-Triples (.nt), Manchester format, SPARQL query was used in Twinkle 2.0, Jena ARQ 3.5 and KRSS2 & OBO formats in order to check the support of Protégé 5.0 to infer the following result in different tools. these formats in the SPARQL tools. Based on their support of different formats and features a comparative analysis is evaluated in Table 1. Table 1 Comparison of Twinkle 2.0, Jena ARQ 3.5 & Protege 5.0. Serialization & Twinkle Jena Protégé Features supported 2.0 ARQ 3.5 5.0 1) RDF/XML support    2) OWL/XML    support 3) TURTLE(.ttl)    support Fig. 5. SPARQL results in PROTÉGÉ 5.0 4) LATEX support   

5) N3 (.n3) support    6) N-Triples(.nt)    support 7) Manchester    support 8) KRSS2 support    9) OBO support    10) Query saving    feature 11) Query loading    from file 12) Format    Independent Fig.6. SPARQL results in JENA ARQ 3.5 Parsing 13) Graphical user    interface 14) Long running    SPARQL queries cancellation A. Observations From above analysis, it is clear that parser(s) of Twinkle 2.0 and Jena ARQ 3.5 are file extension dependent which means that it parses the linked data file based on its extension. For example, if the extension of the linked data file which is programmed in turtle notation and saved in „.ttl‟ file extension format. It scans the data with the turtle Fig.7. SPARQL results in TWINKLE 2.0. parser but if same data is saved in „.n3‟ file extension format, then the parser fails to parse the data because the n3 . Figure 5 shows the result that was extracted in Protégé parser is trying to parse turtle notation. Whereas protégé 5.0 5.0 whereas Figure 6 and Figure 7 shows the result that was doesn‟t scan linked data file extension before parsing but it extracted in Jena ARQ 3.5 and Twinkle 2.0 respectively by checks the linked data file content with all its available importing linked data explained in Figure 2 and Figure 3. parsers. The one who correctly interprets the linked data is The result that is shown in Protégé 5.0 is represented in the correctly selected as a valid parser. tabular form as depicted in Figure 5. Whereas results that are extracted by Jena ARQ 3.5 are shown in a textual format as shown in Figure 6. However, Twinkle 2.0 gives the flexibility to the user to visualize data in both textual as well

Published By: Blue Eyes Intelligence Engineering Retrieval Number: F11250476S4/19©BEIESP 595 & Sciences Publication International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6S4, April 2019

V. CONCLUSION AND FUTURE WORK Information Systems and Industrial Management Applications The paper offered an empirical study of how logical 16 Kumar, S., & Kumar, S. “Semantic Web attacks and inference can be extracted from semantic web linked data. A countermeasures.” 2014 International Conference on Practical approach of extracting implicit data has also been Advances in Engineering and Technology Research, provided by filtering the data set in Twinkle 2.0, Jena ARQ ICAETR 2014. 3.5, Protégé 5.0 using SPARQL query in aggregation with a generated linked dataset that was programmed in Turtle AUTHORS PROFILE syntax in a triple format using a Text editor. The visualized Aman Jolly is currently working as an data graph has been provided for the same. The comparative Assistant Professor in KIET group of analysis of these tools based on their features and support on institutions. He has done his B.Tech in various serialization formats have been evaluated in the computer science from GGSIPU and tabular format. File extension dependent feature in SPARQL M.Tech in Information Security from tools limits its functionality of parser support but it has its GGSIPU. own advantage of execution speed than those who employ file extension independent feature. There are several Dr. Shailender Kumar is currently extensions which can be clearly done to this work. For working as an Associate Professor at future work, this research may be extended to develop a Delhi Technological University. He is hybrid framework that employs both file extension having specialization in Database dependent and independent feature for SPARQL tools that Systems, Data mining, Big Data can support various new serialization formats. Semantic web Analytics, Machine Learning and technology and their applications in various domains have Information Security proven their worth in giving an intellectual way of resolving various dynamic and real-time problems by exploring uncharted areas of the web.

REFERENCES

1 Berners-Lee, T., Hendler, J. and Lassila, O. "The semantic web." Scientific American (2001). 2 Berners-Lee, Tim., Semantic web roadmap, 1998, http://ww w.w3.org/DesignIssues/Semantic.html. 3 https://www.w3.org/2001/sw/wiki/RDF. 4 https://www.w3.org/TR/2004/REC-rdf-concepts- 20040210/. 5 Gonçalves, R S.,” An ontology-driven tool for structured data acquisition using Web forms.”(2017). In Journal of Biomedical Semantics (Vol. 8, p. 26). Journal of Biomedical Semantics. 6 Figueroa, S C., “Instance-based ontology matching for e- learning material using an associative pattern classifier.” (2017). In Computers in Human Behavior (Vol. 69, p. e53). Elsevier Ltd. 7 English, M., “Block Chain Technologies & the Semantic Web : A Framework for Symbiotic Development.” (2016). In Computer Science Conference for University of Bonn Students (p. 47–61). 8 Fluit, C., “Ontology-Based Information Visualization: Toward Semantic Web Applications.” (2006). In Visualizing the Semantic Web (pp. 45–58). 9 Hitzler, P., Krotzsch, M., & Rudolph, S. “Foundations of semantic web technologies.” (2009). CRC press. 10 DuCharme, B., “Learning SPARQL: querying and updating with SPARQL 1.1." (2013). O'Reilly Media, Inc. p. 1-16. 11 Twinkle 2.0 Download Link - 12 http://www.ldodds.com/projects/twinkle/twinkle-2.0- src.zip. Protégé 5.0 Download Link - 13 https://github.com/protegeproject/protegedistribution/rele ases/tag/protege-5.0.0#downloads. 14 Jena ARQ 3.5 Download Link -http://www- eu.apache.org/dist/jena/binaries/apache-Jena-3.5.0.zip. 15 Malik, S. K., & Rizvi, S. (2012).” A framework for SPARQL query processing, optimization and execution with illustrations.” International Journal of Computer