A Method of Knowledgebase Curation Using RDF Knowledge Graph and SPARQL for a Knowledge-Based Clinical Decision Support System
Total Page:16
File Type:pdf, Size:1020Kb
347 A method of knowledgebase curation using RDF Knowledge Graph and SPARQL for a knowledge-based clinical decision support system Xavierlal J Mattam and Ravi Lourdusamy Sacred Heart College(Autonomous), Tirupattur, Tamil Nadu, India Abstract Clinical decisions are considered crucial and lifesaving. At times, healthcare workers are overworked and there could be lapses in judgements or decisions that could lead to tragic consequences. The clinical decision support systems are very important to assist heath workers. But in spite of a lot of effort in building a perfect system for clinical decision support, such a system is yet to see the light of day. Any clinical decision support system is as good as its knowledgebase. So, the knowledgebase should be consistently maintained with updated knowledge available in medical literature. The challenge in doing it lies in the fact that there is huge amount of data in the web in varied format. A method of knowledgebase curation is proposed in the article using RDF Knowledge Graph and SPARQL queries. Keywords Clinical Decision Support System, RDF Knowledge Graph, Knowledgebase Curation. 1. Introduction and awareness in the diagnosis and treatment of the disease. Although there were many breakthroughs published in medical literature Decision Support has been a crucial part of globally, down-to-earth use of any of them were a healthcare unit. In every area of a health care slow and far-between. It would have not been facility, critical and urgent decisions have to be the case had there been CDSS that was capable made. In such extreme situations, leaving lives of automatically acquiring reliable knowledge at stake totally to mere human knowledge and from authenticated medical literature. Such memory is a very big risk. It can often lead to CDSS could alter heath workers with an all- untold misery to the stakeholders and disaster round advanced knowledge at the moment of to such facilities. In 2009 when Health crucial decisions. Information Technology for Economic and In the article, some aspects of the recent Clinical Health (HITECH) was promulgated in advances in the technology used in CDSS are the United States of America, monetary aid was described together with related works carried disbursed for success in the implementation of out in the development of knowledge-based Clinical Decision Support System(CDSS). It CDSS before the proposed method of was because CDSS, although being far from a knowledgebase curation in CDSS is explained. perfect system, was found to be better than In the section 2 that follows, a brief background mere human decisions. Since then, a lot of study is given into re-cent developments in and research is being done to perfect the CDSS. knowledge-based CDSS. In section 3, some One of the enlightening issues that came to recent works on possible methods of the forefront during the recent pan-demic knowledge base curation are mentioned. Then outbreak was the lack of widespread knowledge the proposed method is explained in section 4 ISIC 2021, February 25-27,2021, New Delhi, India and that is followed by a brief discussion on the EMAIL:[email protected] (X. Mattam); proposed method in section 5. Finally, in [email protected](R. Lourdusamy); section 6, a summarized conclusion is made. ORCID: 0000-0002-5182-3627(X. Mattam); ©️ 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 2. Background 348 CDSS has evolved gradually with the ample availability of such knowledgebases, it could technological developments that has happened also be cost effective in terms of price [12]. But in the past few decades. The system is such knowledge acquisition might lead to the essentially centered on high adaption and knowledge-acquisition bottleneck as certain effective use of constantly updated knowledge. standards and formalization will have to be With the evolution of CDSS over the years, maintained for the knowledge portability. Such there has also been a consistent evolution in its a knowledge-acquisition bottleneck could harm definition from a mere use of information the effectiveness of the CDSS and freeing the technology for data entry to a hi-tech complex CDSS of the bottleneck makes the knowledge system that provides individual specific, acquisition process complex and difficult [12], intelligently filtered and efficiently presented [13]. Another bottleneck in the curation of knowledge for clinicians, staff, patients, or knowledgebase lies in the maintenance of the other individuals [1, 2] knowledgebase [14]. Together with creation of CDSS can be broadly classified as knowledgebase, its verification and constant knowledge-based CDSS and non-knowledge- updating is equally important. The verification based CDSS. The knowledge-based CDSS are and validation of the knowledgebase involves designed to mimic the knowledge processed by transparency, updatability, adaptability, and a human expert. Such systems were earlier learnability [15]. termed as the expert systems. Non-knowledge- A knowledgebase is judged by its accuracy, based systems, on the other hand, rely on completeness and the quality of its data. So, the statistical data that is available to help in construction of the knowledgebase is done in decision making. These systems make full use such a manner that these three factors are of the machine learning and neural network enhanced to the maximum. The methods of algorithms to predict possible outcomes [3, 4, constructing knowledgebases can be classified 5, 6]. into four main groups. There are closed methods in which the knowledgebase is are 2.1. Knowledge-based CDSS manually fixed by experts, open methods in which knowledgebases are curated by volunteers, automated semi-structured methods Knowledge-based systems evolved from in which the knowledgebases are procured from expert systems. While the expert systems were semi-structured texts automatically by using built on the knowledge of human experts, the rules that are programmed into the system and knowledge-based systems have the capability the automated unstructured methods that use to acquire knowledge from different sources artificial intelligence algorithms to extract and build upon it. So, while the expert system knowledge from unstructured texts [16]. could be ranked according to the knowledge of the expert, the knowledge-based systems had the capacity of greater knowledge [7, 8]. 2.2. Knowledge Graphs for The knowledgebase of the knowledge-based knowledge base CDSS ultimately determines the effectiveness of the CDSS [9]. The acquisition, Knowledge graphs are knowledgebases in representation and the integration of knowledge which knowledge is expressed in a graph base in the workflow is vital for the success of structure having nodes to represent the concepts CDSS [10]. The process of selecting, or entities and edges between the nodes to organizing, and looking after the knowledge in represent the relationship between those entities the knowledge base makes the knowledgebase or concepts [16, 17, 18, 19, 20, 21, 22, 23]. efficient and the CDSS successful. The two There are diverse definitions for knowledge important facets of the curation process is the graphs varying according to the purpose for method of knowledge acquisition and which the knowledge graph is created or by the knowledge representation [11]. knowledge graph model [24]. Although Google One way to build a cost effective is credited for the popularity of knowledge knowledge-based CDSS is to use commercial graphs from 2012 [24, 25, 26], the term knowledgebases that are available. It could knowledge graph was used in a report in 1973 reduce cost of the development time of the with a very similar meaning [27] and later in CDSS and also because of the common 1982 the term was used to represent textual 349 concepts using graphs. There has been decades 2.3. RDF Knowledge Graphs of study in representing knowledge using graphs [28]. Resource Description Framework(RDF) is a The maintenance of knowledge graphs has World Wide Web Consortium (W3C) the processes of creation, hosting, curation and specification to represent knowledge in the deployment. The process of creation can be form of triples (subject, predicate, object) manual as in the case of expert systems or semi- containing references, literals or blank [30], automatic or automatic. Apart from these, there [31]. RDF can be modelled as directed label is also a method of annotation by mapping the graphs in which the subject and object are knowledgebase entities to the source without represented by the vertices or nodes and the actually keeping the entities in the corresponding predicate are represented by the knowledgebase. Hosting or storage processes labelled edges [32, 33, 34, 35]. RDF graphs are use various methods of keeping knowledge in widely used to represent knowledge graphs like the knowledgebase. The curation processes in the cases of Freebase, Yago, and Linked Data involve three steps, namely, the assessment of since the billions of triples scattered across the new knowledge, its cleaning and its enrichment web can be captured and integrated with the by detecting the source of the knowledge, existing knowledge using powerful abstraction integrating it with existing knowledgebase, for representing heterogeneous, partial, scant, detecting duplication and correcting entity and potentially noisy knowledge graphs [36], relations. Once the knowledgebase is ready, it [37]. Unlike property graphs that is also quite is deployed in appropriate application [29]. popular representation of knowledge graphs There are various sources of knowledge that due to its property and value representation for can be utilized for the creation of knowledge its edges, the use of metadata in RDF base. Textual knowledge that can be in the form knowledge graphs allows the convenient of newspapers, books, scientific articles, social distributed storage of knowledge. That also media, emails, web crawls, and so on is a very makes RDF graphs more flexible than property rich source of knowledge for building a graphs [37, 38].