Semantic Web Tools and Techniques for Knowledge Organization: an Overview
Total Page:16
File Type:pdf, Size:1020Kb
Knowl. Org. 44(2017)No.4 273 T. Padmavathi and M. Krishnamurthy. Semantic Web Tools and Techniques for Knowledge Organization: An Overview Semantic Web Tools and Techniques for Knowledge Organization: An Overview† Thimmaiah Padmavathi* and Madaiah Krishnamurthy** *CSIR-Central Food Technological Research Institute (CSIR-CFTRI), Food Science & Technology Information Services (FOSTIS/Library), Mysore, Karnataka, India, <[email protected]> **Documentation Research & Training Centre (DRTC), Indian Statistical Institute, Bangalore, Karnataka, <[email protected]> ThimmaiahPadmavathi is Senior Technical Officer at CSIR-Central Food Technological Research Institute, FOSTIS/Library. She holds BSc and MLISc from Bangalore University, completed a post master’s training course in information technology applications to library and information services at the National Center for Science Information, Indian Institute of Science, Bangalore, and a doctoral degree from Bharathiar University. Her core areas of expertise include semantic web technologies, computer based information services, digital li- brary technologies, documentation, IT applications (web page design and content development) and library automation. Madaiah Krishnamurthy is an associate professor of information science at DRTC, Indian Statistical Institute, Bangalore. He holds a master’s degree in applied economics and library and information science and a PhD from Bangalore. He was a Fulbright Scholar in 2006 and visited the Graduate School of Library and Informa- tion Science, University of Illinois at Urbana-Champaign (USA), and was recipient of ETD travel grant to at- tend the 16th International Symposium on Electronic Theses and Dissertations in France in 2016. He is author of over seventy articles. His research interests are digital libraries, institutional repositories, social networking, library management and automation. Padmavathi, Thimmaiah and Madaiah Krishnamurthy. 2017. “Semantic Web Tools and Techniques for Knowl- edge Organization: An Overview.” Knowledge Organization 44(4): 273-290. 46 references. Abstract: The enormous amount of information generated every day and spread across the web is diversified in nature far beyond human consumption. To overcome this difficulty, the transformation of current unstruc- tured information into a structured form called a “Semantic Web” was proposed by Tim Berners-Lee in 1989 to enable computers to understand and interpret the information they store. The aim of the semantic web is the integration of heterogeneous and distributed data spread across the web for knowledge discovery. The core of sematic web technologies includes knowledge representation languages RDF and OWL, ontology editors and reasoning tools, and on- tology query languages such as SPARQL have also been discussed. † The authors wish to thank the director, CSIR-CFTRI for his support and encouragement. Received: 28 November 2016; Revised: 12 March 2017; Accepted 17 May 2017 Keywords: ontologies, OWL, query, domain knowledge, semantic web, knowledge discovery, knowledge representation 1.0 Introduction impressive for the past few years. Initially, the web was for exchange of documents and data and for collabora- The explosion of personal computers and major ad- tion. It was meant to be a network of workstations where vances in the field of telecommunications were the be- the programs and databases could share their knowledge ginning of the web as its known today. The emergence of and work together in collaboration. Information is stored web interconnected computers to work together and share in large databases kept in the servers where the programs the necessary data paved the way for Berners-Lee (1989). that run the servers generate web pages dynamically. In- The growth of the World Wide Web (WWW) was formation seekers use search engines for locating, com- 274 Knowl. Org. 44(2017)No.4 T. Padmavathi and M. Krishnamurthy. Semantic Web Tools and Techniques for Knowledge Organization: An Overview bining, and aggregating data from the internet in their potential of the semantic web.WordNet, EuroWordNet, quest for information. Most of this information is en- GUM, Mikrokosmos, and SENSUS are linguistic ontolo- coded using the Hypertext Markup Language (HTML) gies which use words as grammatical units. The purpose and thus is difficult to manipulate on a massive scale as it of this type of ontology is to describe semantic con- is made for data representation on the web primarily for structs rather than to model a specific domain. They of- human consumption and not for machine understanding fer quite a heterogeneous amount of resources, used and interpretation. mostly in natural language processing. This is the major disadvantage of the present Web, as finding the right information from the huge collection of – WordNet (Miller et al. 1990, 1995) is a very large lexical web documents is becoming increasingly impossible. database for English created at Princeton University Current information retrieval systems are imprecise, of- and based on psycholinguistic theories. WordNet at- ten yielding matches to thousands of pages. The major tempts to organize lexical information in terms of limitations of the present web are seemingly high recall word meanings rather than word forms, though inflec- and low precision; search output and results are highly tional morphology is also considered. For example, if vocabulary sensitive; and, vocabulary of the query and of you search for “trees” in WordNet, you will have the web resources may require multiple queries or searches if same access as if you searched for “tree.” WordNet 1.7 the information required is spread over several web contains 121,962 words and 99,642 concepts. It is or- documents. These limitations coupled with the growing ganized into 70,000 sets of synonyms (“synsets”), each volume of data and information available on the Web have representing one underlying lexical concept. Synsets are initiated discussions among the members of the web interlinked via relationships such as synonymy and an- community related to enhancing the present web to what tonymy, hypernymy and hyponymy (Subclass-Of and Su- Berners-Lee (2001) called a semantic web. perclass-Of), meronymy and holonymy (Part-Of and Hasa). The primary requirement of the semantic web is that WordNet divides the lexicon into five categories: the underlying data need to be structured properly with nouns, verbs, adjectives, adverbs, and function words. semantic and syntactic meaning so that computers can un- – AGROVOC (1980) is a controlled vocabulary covering derstand the data they store. The aim of the semantic web all areas of interest to the Food and Agriculture Or- iss to transform the current web consisting of hyperlinked ganization of the United Nations (FAO). AGROVOC pages into a web of knowledge that is machine proc- is available as an RDF-SKOS linked dataset. It consists essable. Realization of the semantic web depends on a set of over 32,000 concepts, available in up to 21 lan- of web-related technologies designed to facilitate machine guages, and linked to 16 other vocabularies and re- processing of data and interoperability. The semantic web sources. promises to overcome the challenge of integrating and – Bio2RDF (Belleau 2008) is an open-access semantic querying highly diverse and distributed resources. Systems web knowledge base that provides a mashup over 19 based on the semantic web provide sophisticated frame- different data sets that include the Gene Ontology, works to manage and retrieve knowledge. OMIM, Reactome, ChEBI, BioCyc and KEGG. Knowledge organization systems (KOSs) play a major – BioGateway (Antezana 2009) is a semantic web re- role in structuring development of data for the semantic source that integrates the entire set of OBO Foundry web. KOS tools such as library classification systems, the- ontologies (including both accepted and candidate sauri, taxonomies, controlled vocabularies, terminologies, OBO ontologies), the complete collection of annota- etc., can be exploited to support the development of the tions from the Gene Ontology Annotation (GOA) semantic web. Among these, library classification systems, files, fragments of the NCBI taxonomy and SWISS- especially faceted schemes, have been recognized as an PROT and IntAct. This project marked the fusion of important source of structured and formalised vocabu- the semantic web to systems biology. laries that can be utilized to support the development of – Gene ontology (GO) (Gene Ontology 2008) is a major the semantic web. All KOSs try to bring domain knowl- bioinformatics initiative to unify the representation of edge formulated in a conceptual framework. Ontologies gene and gene product attributes across all species. GO are well known knowledge organization (KO) tools, is part of a larger classification effort, the Open Bio- which can be utilized for capturing domain knowledge, medical Ontologies (OBO). The ontology covers three assigning semantic meaning to information and repre- domains: cellular component, molecular function and senting data for machine consumption. biological process. Since its inception the semantic web has gained steady acceptance in the science and technology community. Sev- Medical ontologies are developed to solve problems such eral projects have been undertaken to demonstrate the as the demand for the reusing and sharing of patient Knowl. Org. 44(2017)No.4 275 T. Padmavathi and M. Krishnamurthy. Semantic Web Tools and Techniques for Knowledge