347

A method of knowledgebase curation using RDF and SPARQL for a knowledge-based clinical

Xavierlal J Mattam and Ravi Lourdusamy

Sacred Heart College(Autonomous), Tirupattur, Tamil Nadu, India

Abstract Clinical decisions are considered crucial and lifesaving. At times, healthcare workers are overworked and there could be lapses in judgements or decisions that could lead to tragic consequences. The clinical decision support systems are very important to assist heath workers. But in spite of a lot of effort in building a perfect system for clinical decision support, such a system is yet to see the light of day. Any clinical decision support system is as good as its knowledgebase. So, the knowledgebase should be consistently maintained with updated knowledge available in medical literature. The challenge in doing it lies in the fact that there is huge amount of data in the web in varied format. A method of knowledgebase curation is proposed in the article using RDF Knowledge Graph and SPARQL queries.

Keywords Clinical Decision Support System, RDF Knowledge Graph, Knowledgebase Curation.

1. Introduction and awareness in the diagnosis and treatment of the disease. Although there were many breakthroughs published in medical literature Decision Support has been a crucial part of globally, down-to-earth use of any of them were a healthcare unit. In every area of a health care slow and far-between. It would have not been facility, critical and urgent decisions have to be the case had there been CDSS that was capable made. In such extreme situations, leaving lives of automatically acquiring reliable knowledge at stake totally to mere human knowledge and from authenticated medical literature. Such memory is a very big risk. It can often lead to CDSS could alter heath workers with an all- untold misery to the stakeholders and disaster round advanced knowledge at the moment of to such facilities. In 2009 when Health crucial decisions. for Economic and In the article, some aspects of the recent Clinical Health (HITECH) was promulgated in advances in the technology used in CDSS are the United States of America, monetary aid was described together with related works carried disbursed for success in the implementation of out in the development of knowledge-based Clinical Decision Support System(CDSS). It CDSS before the proposed method of was because CDSS, although being far from a knowledgebase curation in CDSS is explained. perfect system, was found to be better than In the section 2 that follows, a brief background mere human decisions. Since then, a lot of study is given into re-cent developments in and research is being done to perfect the CDSS. knowledge-based CDSS. In section 3, some One of the enlightening issues that came to recent works on possible methods of the forefront during the recent pan-demic curation are mentioned. Then outbreak was the lack of widespread knowledge the proposed method is explained in section 4 ISIC 2021, February 25-27,2021, New Delhi, India and that is followed by a brief discussion on the EMAIL:[email protected] (X. Mattam); proposed method in section 5. Finally, in [email protected](R. Lourdusamy); section 6, a summarized conclusion is made. ORCID: 0000-0002-5182-3627(X. Mattam); ©️ 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 2. Background 348

CDSS has evolved gradually with the ample availability of such knowledgebases, it could technological developments that has happened also be cost effective in terms of price [12]. But in the past few decades. The system is such knowledge acquisition might lead to the essentially centered on high adaption and knowledge-acquisition bottleneck as certain effective use of constantly updated knowledge. standards and formalization will have to be With the evolution of CDSS over the years, maintained for the knowledge portability. Such there has also been a consistent evolution in its a knowledge-acquisition bottleneck could harm definition from a mere use of information the effectiveness of the CDSS and freeing the technology for data entry to a hi-tech complex CDSS of the bottleneck makes the knowledge system that provides individual specific, acquisition process complex and difficult [12], intelligently filtered and efficiently presented [13]. Another bottleneck in the curation of knowledge for clinicians, staff, patients, or knowledgebase lies in the maintenance of the other individuals [1, 2] knowledgebase [14]. Together with creation of CDSS can be broadly classified as knowledgebase, its verification and constant knowledge-based CDSS and non-knowledge- updating is equally important. The verification based CDSS. The knowledge-based CDSS are and validation of the knowledgebase involves designed to mimic the knowledge processed by transparency, updatability, adaptability, and a human expert. Such systems were earlier learnability [15]. termed as the expert systems. Non-knowledge- A knowledgebase is judged by its accuracy, based systems, on the other hand, rely on completeness and the quality of its data. So, the statistical data that is available to help in construction of the knowledgebase is done in decision making. These systems make full use such a manner that these three factors are of the and neural network enhanced to the maximum. The methods of algorithms to predict possible outcomes [3, 4, constructing knowledgebases can be classified 5, 6]. into four main groups. There are closed methods in which the knowledgebase is are 2.1. Knowledge-based CDSS manually fixed by experts, open methods in which knowledgebases are curated by volunteers, automated semi-structured methods Knowledge-based systems evolved from in which the knowledgebases are procured from expert systems. While the expert systems were semi-structured texts automatically by using built on the knowledge of human experts, the rules that are programmed into the system and knowledge-based systems have the capability the automated unstructured methods that use to acquire knowledge from different sources algorithms to extract and build upon it. So, while the knowledge from unstructured texts [16]. could be ranked according to the knowledge of the expert, the knowledge-based systems had the capacity of greater knowledge [7, 8]. 2.2. Knowledge Graphs for The knowledgebase of the knowledge-based knowledge base CDSS ultimately determines the effectiveness of the CDSS [9]. The acquisition, Knowledge graphs are knowledgebases in representation and the integration of knowledge which knowledge is expressed in a graph base in the workflow is vital for the success of structure having nodes to represent the concepts CDSS [10]. The process of selecting, or entities and edges between the nodes to organizing, and looking after the knowledge in represent the relationship between those entities the knowledge base makes the knowledgebase or concepts [16, 17, 18, 19, 20, 21, 22, 23]. efficient and the CDSS successful. The two There are diverse definitions for knowledge important facets of the curation process is the graphs varying according to the purpose for method of knowledge acquisition and which the knowledge graph is created or by the knowledge representation [11]. knowledge graph model [24]. Although Google One way to build a cost effective is credited for the popularity of knowledge knowledge-based CDSS is to use commercial graphs from 2012 [24, 25, 26], the term knowledgebases that are available. It could knowledge graph was used in a report in 1973 reduce cost of the development time of the with a very similar meaning [27] and later in CDSS and also because of the common 1982 the term was used to represent textual 349

concepts using graphs. There has been decades 2.3. RDF Knowledge Graphs of study in representing knowledge using graphs [28]. Resource Description Framework(RDF) is a The maintenance of knowledge graphs has World Wide Web Consortium (W3C) the processes of creation, hosting, curation and specification to represent knowledge in the deployment. The process of creation can be form of triples (subject, predicate, object) manual as in the case of expert systems or semi- containing references, literals or blank [30], automatic or automatic. Apart from these, there [31]. RDF can be modelled as directed label is also a method of annotation by mapping the graphs in which the subject and object are knowledgebase entities to the source without represented by the vertices or nodes and the actually keeping the entities in the corresponding predicate are represented by the knowledgebase. Hosting or storage processes labelled edges [32, 33, 34, 35]. RDF graphs are use various methods of keeping knowledge in widely used to represent knowledge graphs like the knowledgebase. The curation processes in the cases of Freebase, Yago, and Linked Data involve three steps, namely, the assessment of since the billions of triples scattered across the new knowledge, its cleaning and its enrichment web can be captured and integrated with the by detecting the source of the knowledge, existing knowledge using powerful abstraction integrating it with existing knowledgebase, for representing heterogeneous, partial, scant, detecting duplication and correcting entity and potentially noisy knowledge graphs [36], relations. Once the knowledgebase is ready, it [37]. Unlike property graphs that is also quite is deployed in appropriate application [29]. popular representation of knowledge graphs There are various sources of knowledge that due to its property and value representation for can be utilized for the creation of knowledge its edges, the use of metadata in RDF base. Textual knowledge that can be in the form knowledge graphs allows the convenient of newspapers, books, scientific articles, social distributed storage of knowledge. That also media, emails, web crawls, and so on is a very makes RDF graphs more flexible than property rich source of knowledge for building a graphs [37, 38]. knowledge graph for CDSS. However, the RDF knowledge graphs are stored as triples process of extracting knowledge from text is in a Triple store or RDF stores. The flexibility complex and involves the application of of RDF stores is its greatest advantage. Since Natural Language Processing(NLP) and the RDF knowledge graph has the ability to link Information Extraction(IE) techniques. any number of entities with their relations, the Curation of the knowledgebase using these RDF stores are also flexible enough to store techniques may follow a combination of five them without restriction on size. Moreover any stages. In the pre-processing stage, the text is kind of knowledge can be expressed and stored analyzed for atomic terms and symbols. Some using RDF knowledge graphs that allows its of the techniques used in the pre-processing extraction and reuse by different applications stage are Tokenization, Part-of- [39]. Speech(POS)tagging, Dependency Parsing and In the case of textual knowledge, RDF Word Sense Disambiguation(WSD). After the knowledge graphs are helpful in finding the pre-processing stage is the Named Entity Thematic Scope or the Topic Model of a text. Recognition (NER) stage in which the various The topic category or the semantic entities in a concepts or entities that forms the nodes of the set of documents is abstracted using the graph are identified. The NER is followed by Thematic Scope or the Topic Model [38, 40]. the Entity Linking (EL) stage in which an Since the RDF knowledge graphs of a association is made between the entities that are document is represented as a set of triples in identified in the text and the entities in the which each triple is considered a word or an existing knowledge graphs so that the similar entity, there is the possibility of detecting word entities could be placed side-by-side. During and phrase patterns automatically by clustering the Relation Extraction (RE) stage, the relation word groups that best characterize a document. between the various entities taken from the text Some of the methods of the Topic Modelling of are considered using a various RE techniques. RDF knowledge graphs are Latent Semantic Finally the extracted relation is joint to the Analysis (LSA), Probabilistic Latent Semantic entities in the last stage of the text processing Indexing (pLSI) and Latent Dirichlet [22]. 350

Allocation (LDA). The challenges faced in using a part or the whole expression of a Topic Modelling include sparseness of the question in natural language. The question in entities, a lack of context especially when the natural language can be interrogative in words used have multiple meanings especially which case it will be a factoid type of question when the entities are sparce and the use of and its answer will be a fact from the unnatural language like the use of special knowledge source or the question could be characters or unusual casing of letters which statement in which case the answer will be in normally are removed while pre-processing the the form of either a list or a definition or text. Normally, these challenges are overcome hypothetical statement or a causal remark or a by supplementing the text or modifying the relationship description or procedural method of Topic Modelling [40, 41, 42]. Entity explanation or just a confirmation. The sources summarization which is the best way of of knowledge are normally unstructured and is summarize an entity by identifying a limited a set of documents, video clips, audio clips, or number of ordered RDF triples is one of the text files that are given as input to the systems problems that is solved using Topic Models of [44]. QA systems make use of Information RDF knowledge graphs [41, 42]. Entity Retrieval, Information Extraction and Natural summarization has many applications like Language Processing(NLP) techniques [45]. search engines and is useful for research QA systems have a long history of activities. The existing entity summarization development starting with the earliest popular techniques can be classified into the generic system BASEBALL in 1961 and LUNAR techniques that apply to a wide range of made in 1972. The Text REtrieval Conference domains, applications and users and the specific (TREC) that began in 1992 for large scale techniques that make use of external resources accelerated interest and or factors that are effective only in specific growth in QA systems. Other forums and domains or applications. While the generic campaigns such as the Cross Language techniques make use of universal features like Evaluation Forum (CLEF) and NIITest frequency and centrality, informativeness, and Collections for IR systems (NTCIR) campaigns diversity and coverage, the specific techniques also enhanced the QA systems. Noteworthy QA make use of domain knowledge, context systems include IBM , and the other awareness and personalization [43]. commercial personal assistant software like Apple’s , Amazon’s Alexa, Google’s Assist 3. Related works and Microsoft’s [44, 45, 46, 47, 48, 49]. SPARQL is a query language in which a Extracting knowledge from unstructured pattern in a query is matched with that in a textual sources has been a challenge. Several graph from different sources. The matching is studies have been done in order to solve the done in three stages. It begins with the pattern problem of retrieving meaningful and relevant matching involving features like optional parts, knowledge from literature since it is crucial for union of patterns, nesting, filtering or decision support in systems like the CDSS. constraining values of matches, and selecting Some relevant techniques have been delt with the data source to be matched by a pattern. Once in the earlier sections on knowledge graphs and these features are applied, the output is RDF knowledge graphs. As part of the computed using standard operators like proposed method, certain other related projection, distinct, order, limit, and offset to techniques have to be explained in order to have modify the values that is got from pattern a complete picture of the complexity of the matching. Finally, the result of the SPARQL problem of curating the knowledgebase for query is given in one of the many forms like CDSS. Boolean answers, the pattern matching values, new triples from the values, and resources 3.1. using description. Working with RDF knowledge SPARQL graphs require the use of SPARQL [50, 51]. In order to apply NLP techniques used in the QA systems over SPARQL queries, query builders Question-answering (QA) is a process of such as QUaTRO2, OptiqueVQS, retrieving knowledge from different sources NITELIGHT, QueryVOWL, Smeagol, 351

SPARQL Assist language-neutral query or more prevalently using graph embeddings composer, XSPARQL-Viz, Ontology Based [58, 59]. Graphical Query Language, NL-Graphs and so Knowledge graphs can be centralized or on are used [52]. Some of the challenges in the distributed. Both the centralized and distributed use of SPARQL for QA systems include lexical knowledge graphs have their advantages and gap, ambiguity, multilingualism, use of disadvantages [60]. When it comes to complex operators, distributed knowledge and distributed knowledge graphs, federated query in the use of procedural, temporal, spatial processing is used in which the result of the templates [53]. query is computed from different data source. The federated query processing accesses 3.2. Ranked RDF Triples and different autonomous, distributed, and heterogeneous data sources to without having Federated Search any control over the sources. Federated query processing is more complex than the Ranking SPARQL query results is an centralized system because of the many important process for applications involving parameters involved in the query processing. searches, QA and entity summarization Federated query processing makes use of techniques. Ranking of RDF triples can be over federated query engine to search for the results resources, properties, or triples as a whole but a over distributed sources [61]. The federated combined ranking of both the triples and its SPARQL query processing can be done either entities are important for RDF knowledge over different SPARQL endpoints or over graphs for faster and efficient searches in the linked data or over Distributed Hash Tables. knowledgebase [54]. Ranking can be done on The federated SPARQL query processing can structured data using structured queries that also be classified either as catalog or index results in a structured graph. Such rankings are assisted processing or as catalog or index free structure-based ranking and mostly use an processing or a combination of both [62]. extension of ranking algebra that was earlier used in relational . Content-based 3.3. Open Information Extraction ranking on the other hand tries to rank the content of structure or unstructured data. In content-based ranking, the ranking is done Information Extraction(IE) is an automated according to the match between the pattern of process of collecting a set of corresponding the query and its holistic match in the information of interest from a given sequences knowledgebase. Further classification of query of unstructured data. IE has many applications ranking can be as keyword queries on such as part-of-speech tagging, named entity unstructured data like documents, structured recognition, shallow parsing, table extraction, queries on structured data, keyword queries on contact information extraction and so on. structured data, and keyword-augmented Methods used for IE can be classified as rule structured queries on structured data [55]. learning based extraction methods, Ranking can also be based on the relevance or classification based extraction methods, and importance of the SPARQL query results with sequential Labeling based extraction methods the topic on which the query is made. The [63]. Open Information Extraction(OIE) is a relevance ranking requires the subject of the text IE paradigm that enables relations query to be clearly defined so that the results of discovery independent of the domain that is the query can be ranked according its relevance readily scalable to the variations in size and to the subject. The importance ranking on the content of the web. OIE is technically capable other hand specifies the importance that is of meeting the challenges of the IE, namely, given to the query result. In the importance automation of the process, heterogeneity of the ranking factors such as authoritative, web corpus and efficiency in extraction of trustworthy, and so on are placed for the information [64]. The OIE like Text Runner, ranking purpose for which human cognitive Clause IE, OLLIE, and the like were data based results are taken for consideration [56]. The using training data that were represented either ranking is placed along with the triple using by dependency parsing or parts-of-speech tokens in an Extended Knowledge Graphs [57] tagged text. The Rule-based OIE were manually programmed using the dependency 352

trees or parts-of-speech tagged text. Two 4. Proposed Method of examples of Rule-based OIE are clauseIE and ExtrHech [65]. Another method of OIE is by Knowledgebase Curation linguistic analysis that shows the canonical ways in which verbs in a text is used to express A CDSS is as efficient as its knowledgebase relationship between entities. RE- is. If the knowledgebase of the CDSS is highly VERB, ARGLEARNER and adaptive to automatically and constantly update R2A2(combination of REVERB and itself reflecting recent advances and local ARGLEARNER) are examples of the linguistic practice, then that will be a robust CDSS. The analysis method. The earlier methods were flexibility of the knowledgebase to accept based on the label, learn and extract stages of knowledge from diverse sources and portability IE. The main drawback of the three-stage of the knowledgebase for various practice process was that the information extracted were settings will make the knowledgebase more either incoherent or uninformative and effective [9, 70] therefore they were of little use to applications. The linguistic-statistical analysis for 4.1. Motivation extractions on the other hand identifying a more meaningful and informative relation phrase CDSS requires quick and reliable [66]. knowledge. Therefore, a centralized The biggest advantage of OIE is its ability to knowledgebase will be better than a federated extract relationship between entities allowing search. Since knowledge on most of the queries like “(?, kill, bacteria)” or “(Bill Gates, advances found in medical literature, the ?, Microsoft)” to extract resultant missing knowledge extracted from the literature has to relationships from a text corpus. Moreover, be found in the knowledgebase. The OIE will result in a compressed data for a information extraction from medical literature knowledgebase [67]. Other than populating can be done using OIE. If the user interface of knowledgebases, OIE is also used for question the CDSS allows natural language questions to answering, semantic indexing, semantic search be asked, the questions can be converted and such target applications. Converting OIE through a QA system as a SPARQL query triples into RDF knowledge graph is possible linked to through the knowledge graph. If since the longer sentences are broken into answers are not found in the existing triples with entities and relationship between knowledgebase of the CDSS, it can than be entities leaving out the determiners and passed through the OIE to relevant medical propositions. Knowledgebase populating using literature and the resultant knowledge can be OIE has been a very useful application domain integrated to the existing knowledgebase. The [68, 69]. Integrating the OIE triples with the knowledgebase is so enhanced that most exiting knowledgebase has been a research answer to query will be found in it and the challenge and there have been many solutions updating will be done automatically once new proposed for it like the predicate or attribute knowledge is found in any form on the web. level schema where similarity on names, types, There are many advantages of such descriptions, instances, and so on are mapped centralized knowledge graphs. Centralized and universal schema to apply got knowledge graphs can be controlled by a single from OIE and the existing knowledge mapped entity when it comes to strategic issues such as at the instance-level. One of the problems of symptoms for diagnostic systems of the CDSS. using universal schema is that the process Such control increases the survivability and ignores unseen entities and entity pairs and tries robustness of the CDSS. The uniform use of to overfit the space entities to large number of terms in centralized knowledgebase make it parameters. Rowless Universal Schema more stable. The fixed curation method of the attempts to find inferences between predicated knowledgebase of the centralized systems make and relations so that the problem of unseen the knowledge consistent and improves its entities and entity pairs are solved. But it tends quality. Moreover, the fixed schema of the to completely ignore the existence of entities centralized knowledge graphs allows uniform and thus it functions like the predicate or usage. The knowledge graphs allows the use of attribute level schema [69]. 353

application programming interface (API) for For the evaluation of the proposed algorithm, the knowledge retrieval and query processing [60]. precision and recall method are used as it is the typical form of evaluation metrics used in information extraction. During the process of 4.2. Proposed Approach information extraction or retrieval, there could be two types of knowledge that is obtained from the The proposed approach of knowledgebase knowledge source. There is knowledge that can be curation for CDSS has three stages. In the first considered important to the application and there is stage, new knowledge is extracted from a knowledge relevant to the query. In the proposed medical literature using an OIE application. approach, since the query is based on keywords from The OIE application is for knowledge the QA system, only knowledge relevant to the extraction since it will result in triples that can query is selected rather than all knowledge that is be integrated to the existing knowledge graph deemed important from the knowledge source. of the CDSS. The triples got from the OIE Therefore, the results of the OIE is restricted to just application on the medical literature is queried the knowledge relevant to the query. The application using keywords from the CDSS interface for of the evaluation metrics is also bound by the relevant knowledge using a QA system. If the consideration that only the sum total of the relevant knowledge found by the system proportionate to all query results in new knowledge being found, the relevant knowledge that can be manually then those triples in RDF from are added to the counted on the same medical literature is calculated existing knowledge graph of the CDSS. A table rather that taking in consideration all the important is maintained with the list of medical literature knowledge present in the literature that is used in the already checked for knowledge so that they test. The precision evaluation metric is given by the need not be looked for new knowledge again. formula in equation (1) The algorithm for the proposed system is as shown in Algorithm 1. relevant RDF triplesretrived RDF triples Precision = (1) Algorithm 1. retrived RDF triples Curating the CDSS knowledge base using RDF A contingency matrix can be formed using Knowledge Graph the relevance of the RDF triple as shown in table 1. If the triple retrieved by the system is URI (Uniform Resource Identifier) relevant to the query than it is true positive table U = {u1, u2, …, un} RDF Knowledge Graph G ={V,E} where V Є otherwise it is false positive. So also, if a relevant triple in the knowledge source is not {v1,v2,….,vn} and E Є {e1,e2,…,en} RDF triple S = {s, p, o} (subject(s), retrieved by the system then it is false negative predicate(p), object(o)) and if a triple that is not relevant and is ignored Keyword K = {k1, k2,…kn} taken from the by the system it is true negative. QA system of CDSS 1 Read ui //URI of a new document Table 1 2 If ui Є U GOTO Step 12 Contingency matrix according to the relevance 3 Else use OpenIE to create G of RDF triples 4 For every K Relevant Not 5 use QA system with SPARQL query Relevant to find ki in G 6 For every S found RDF triple True positive False 7 If S not in CDSS knowledge base retrieved positive 8 Append S to CDSS knowledge base RDF triple False True 9 END For loop not retrieved negative negative 10 END For loop 11 END Else The formula to calculate the precision of the 12 If another document exists GOTO Step 1 system using the contingency matrix can be 13 Else STOP given as in equation (2)

totalnumberof truepositives 4.3. Evaluation of the Proposed Precision = (2) Method totalnumberof truepositivesandfalsepositives 354

The percentage of the precision can also be 6. Conclusion calculated as in equation (3)

totalnumber of true positives CDSS has been considered a very important Precision %= ×100 (3) system in the healthcare sector. That is the totalnumber of true positivesand false positives reason for the numerous studies that has been For recall we take into consideration the done on developing a perfect system that is proportion of the retrieved triples to the total highly efficient while at the same time reliable. relevant triples as in equation (4) Since the CDSS requires a very quick response to queries, a centralized system is to be relevant RDF triples retrived RDFtriples considered. At the same time, the Recall= (4) relevant RDF triples knowledgebase of such a system requires being maintained with constant and consistent Using the contingency matrix, the formula updating from various sources of medical to calculate recall is as in equation (5) literature. Knowledge graphs have proved to be a very formidable approach to represent huge totalnumber of true positives amount of knowledge that is now available in Recall= (5) totalnumber of true positivesand false negatives the web. RDF triples are reliable storage method for knowledge graphs in the form of RDF knowledge graphs. Curation of the RDF 5. Discussion for Further Study and knowledge graph can be done through a QA Development system that converts natural language questions into SPARQL queries which when matched The proposed method of knowledgebase curation with RDF triples from an OIE process can using RDF Knowledge Graph and SPARQL for a enhance the knowledgebase. Therefore, a knowledge-based CDSS is pretty straightforward method is proposed to curate knowledge base and simple. Its efficiency depends on the underlying of a CDSS using RDF Knowledge graph. It is OIE method chosen for extracting knowledge. The possible to evaluate the system using precision system provides the automatic updating of the and recall methods and give an appraisal of its knowledgebase and in turn offers reliability to the efficiency in acquiring knowledge from various CDSS. Being a centralized system, the fixed sources. curation method of the knowledgebase will consistently improve its quality and make room for its usefulness in decision making processes. References However, there is a lot of improvement possibilities that can make the system much [1] B. Middleton, D. F. Sittig, A. Wright, more efficient and robust as a perfect system. Clinical Decision Support: a 25 Year One of the improvements that can be worked Retrospective and a 25 Year Vision, into the system is to use ranked RDF triples Yearbook of Medical Infor- matics 25 which can serve in two ways. First of all, it can (2016) S103–S116. URL: give weightage to the decision suggestion and http://www.thieme-connect.de/ secondly, it can help in removing redundant DOI/DOI?10.15265/IYS-2016-s034. triple from the knowledgebase allowing the doi:10.15265/IYS-2016-s034. CDSS to work faster. The ranked triples can be [2] J. A. Osheroff, J. M. Teich, B. Mid- dleton, evaluated using the precision and recall curves E. B. Steen, A. Wright, D. E. Detmer, A that can give a better appraisal of the system. Roadmap for National Action on Clinical Another approach to removing redundant Decision Support, Journal of the American knowledge is to formalize forgetting. Medical Informatics Association 14 Formalizing forgetting in knowledge graphs (2007) 141–145. URL: implies a method of removing either the https://academic.oup.com/ jamia/article- redundant entities or the redundant relations. lookup/doi/10.1197/jamia. M2334. Entities may not exist without relations. So, by doi:10.1197/jamia.M2334. removing relations would mean new updated [3] A. M. Shahsavarani, Abadi, E. A. Marz, relations replacing old relations. M. H. Kalkhoran2, S. Jafari, S. Qaranli, Clinical Decision Support Systems (CDSSs): State of the art Review 355

of Literature, International Journal of Research Issues Underlying Implemen- Medical Reviews 2 (2015) 299–308. URL: tation Successes and Failures, Journal of http://www.ijmedrev. Biomedical Informatics 78 (2018) 134– com/article{_}68717.html. 143. URL: https://linkinghub.elsevier. [4] R. T. Sutton, D. Pincock, D. C. Baumgart, com/retrieve/pii/S1532046417302757. D. C. Sadowski, R. N. Fedorak, K. I. doi:10.1016/j.jbi.2017.12.005. Kroeker, An Overview of Clinical De- [11] G. Kong, D.-L. Xu, J.-B. Yang, Clinical cision Support Systems: Benefits, Risks, Decision Support Systems: A Review on and Strategies for Success, npj Digital Knowledge Representation and Infer- ence Medicine 3 (2020) 17. URL: http://www. Under Uncertainties, International Journal nature.com/articles/s41746-020-0221-y. of Computational Intelli- gence Systems doi:10.1038/s41746-020-0221-y. 1(2008) 159–167. URL: [5] R. A. Greenes, Definition, Scope, and https://doi.org/10.1080/18756891.2008. Challenges, in: R. A. Greenes (Ed.), 9727613. Clinical Decision Support : The Road to doi:10.1080/18756891.2008.9727613. Broad Adoption, sec- ond ed., Elsevier, [12] A. Al-Badareen, M. Selamat, M. Samat, Y. London, 2014, pp. 3–47. URL: Nazira, O. Akkanat, A Review on Clin- https://linkinghub. ical Decision Support Systems in Health- elsevier.com/retrieve/pii/C20120003043. care, in: Journal of Convergence In- doi:10.1016/C2012-0-00304-3. formation Technology(JCIT), volume 9, [6] E. S. Berner, T. J. La Lande, Overview of 2014, pp. 125–135. Clinical Decision Support Sys- tems, in: E. [13] M. A. Musen, J. van der Lei, Knowledge S. Berner (Ed.), Clinical decision support engineering for clinical consultation systems, third ed., Springer, Cham, Cham, programs: modeling the applica- 2016, pp. 1–17. URL: tion area., Methods of information in http://link.springer.com/10.1007/ 978-3- medicine 28 (1989) 28–35. URL: 319-31913-1{_}1. doi:10.1007/978-3- http://www.ncbi.nlm.nih.gov/pubmed/ 319-31913-1_1. 2649771. [7] D. Graham, Introduction to Knowledge- [14] S. A. Spooner, Mathematical Founda- Based Systems, in: Knowledge-Based tions of Decision Support Systems, Image Processing Systems, Springer Springer International Publishing, London, London, 1997, pp. 3–13. URL: Cham, 2016, pp. 19–43. URL: https:// http://link.springer.com/10.1007/ 978-1- doi.org/10.1007/978-3-319-31913-1{_}2. 4471-0635-7{_}1. doi:10.1007/978-1- doi:10.1007/978-3-319-31913-1_ 2. 4471-0635-7_1. [15] K. von Michalik, M. Kwiatkowska, [8] D. Dinevski, U. Bele, T. Sarenac, U. Ra- K. Kielan, Application of Knowledge- jkovic, O. Sustersic, Clinical Decision Engineering Methods in Medical Support Systems, in: G. Graschew, S. , Rakowsky (Eds.), Telemedicine Springer Berlin Heidelberg, Berlin, Techniques and Applications, Heidelberg, 2013, pp. 205–214. URL: In- techOpen, 2011, pp. 185–210. https://doi. org/10.1007/978-3-642- URL: http://www.intechopen.com/books/ 36527-0{_}14. doi:10.1007/978-3-642- telemedicine-techniques-and-applications/ 36527-0_ 14. clinical-decision-support-systems. [16] M. Nickel, K. Murphy, V. doi:10.5772/25399. Tresp, E. Gabrilovich, A Review of Rela- [9] G. P. Purcell, What makes a good tional Machine Learning for Knowl- edge clinical decision support sys- tem, BMJ Graphs, Proceedings of the IEEE 104 (Clinical research ed.) 330 (2005) 740– (2016) 11–33. URL: https:// 741. URL: https: ieeexplore.ieee.org/document/7358050/. //europepmc.org/articles/PMC555864. doi:10.1109/JPROC.2015.2483592. doi:10.1136/bmj.330.7494.740. [17] K. Balog, Meet the Data, in: Entity- [10] R. A. Greenes, D. W. Bates, K. Kawamoto, Oriented Search, first ed., Springer, Cham, B. Middleton, J. Osheroff, Y. Shahar, Cham, 2018, pp. 25–53. URL: Clinical Decision Support Models and http://link.springer.com/10.1007/978-3- Frameworks: Seeking to Address 319-93935-3{_}2. doi:10.1007/978-3- 356

319-93935-3_2. Evolving Semantics (SuCCESS’16) co- [18] J. Yan, C. Wang, W. Cheng, M. Gao, located with the 12th International Con- A. Zhou, A retrospective of knowledge feren, Sun SITE Central Europe (CEUR), graphs, Frontiers of 12 Leipzig, Germany, 2016. (2018) 55–74. URL: [25] M. Färber, F. Bartscherer, C. Menne, A. http://link.springer.com/10.1007/s11704- Rettinger, Linked data quality of DBpedia, 016-5228-9. doi:10.1007/s11704-016- Freebase, OpenCyc, , and 5228-9. YAGO, 9 (2018) 77– 129. [19] P. A. Bonatti, S. Decker, A. Polleres, doi:10.3233/SW-170275. V. Presutti, Knowledge Graphs: New Di- [26] D. Fensel, U. Şimşek, K. Angele, E. Hua- rections for Knowledge Representation on man, E. Kärle, O. Panasiuk, I. Toma, J. the Semantic Web, Dagstuhl Reports 8 Umbrich, A. Wahler, How to Build a (2019) 29–111. URL: https://drops. Knowledge Graph, in: Knowl- edge dagstuhl.de/opus/volltexte/2019/10328/. Graphs, Springer International Publishing, doi:10.4230/DagRep.8.9.29. Cham, 2020, pp. 11–68. URL: [20] R. Popping, Text Analysis for http://link.springer.com/10.1007/ 978-3- Knowledge Graphs, Qual- ity & 030-37439-6{_}2. doi:10.1007/ 978-3- Quantity 41 (2007) 691–709. URL: 030-37439-6_2. http://link.springer. com/10.1007/s11135- [27] E. W. Schneider, Course Modularization 006-9020-z. doi:10.1007/s11135-006- Applied: The Interface System and its 9020-z. Implications for Sequence Control and [21] R. Popping, Knowledge Graphs and Data Analysis, Technical Report, Human Network Text Analysis, Social Science Resources Research Organization (Hum- Information 42 (2003) 91–106. URL: RRO), Virginia, 1973. URL: https://files. http://journals.sagepub.com/doi/10.1177/ eric.ed.gov/fulltext/ED088424.pdf. 0539018403042001798. [28] S. S. Nurdiati, C. Hoede, 25 years de- doi:10.1177/0539018403042001798. velopment of knowledge graph theory: the [22] A. Hogan, E. Blomqvist, M. Cochez, results and the challenge, Technical C. D’Amato, G. de Melo, C. Gutier- rez, Report, Department of Applied J. E. L. Gayo, S. Kirrane, S. Neu- maier, Mathematics, University of Twente, A. Polleres, R. Navigli, A. C. N. Enschede, 2008. URL: Ngomo, S. M. Rashid, A. Rula, L. https://research.utwente.nl/en/publications Schmelzeisen, J. Sequeda, S. Staab, A. [29] D. Fensel, U. Şimşek, K. Angele, E. Hua- Zimmermann, Knowledge Graphs (2020). man, E. Kärle, O. Panasiuk, I. Toma, J. URL: http://arxiv.org/abs/2003. 02320. Umbrich, A. Wahler, Introduc- tion: arXiv:2003.02320. What Is a Knowledge Graph?, in: [23] S. Seufert, P. Ernst, S. J. Bedathur, S. K. Knowledge Graphs, Springer In- Kondreddi, K. Berberich, G. Weikum, ternational Publishing, Cham, 2020, pp. 1– Instant Espresso: Interactive Analysis of 10. URL: http://link.springer.com/ Relationships in Knowledge Graphs, in: 10.1007/978-3-030-37439-6{_}1. Proceedings of the 25th International doi:10.1007/978-3-030-37439-6_1. Conference Companion on World Wide [30] A. Harth, S. Decker, Optimized In- dex Web - WWW ’16 Companion, ACM Structures for Querying RDF from the Press, New York, New York, USA, 2016, Web, in: Proceedings of the Third Latin pp. 251–254. URL: http://dl.acm.org/ American Web Congress, LA-WEB ’05, citation.cfm?doid=2872518.2890528. IEEE Computer Society, USA, 2005, p. doi:10.1145/2872518.2890528. 71. [24] L. Ehrlinger, W. Wöß, Towards a URL:https://doi.org/10.1109/LAWEB.20 Definition of Knowledge Graphs, in: M. 05.25. doi:10.1109/LAWEB.2005.25. Martin, M. Cuquet, E. Folmer (Eds.), Joint [31] B. Villazon-Terrazas, N. Garcia-Santa, Y. Proceedings of the Postersand De- mos Ren, A. Faraotti, H. Wu, Y. Zhao, G. Track of the 12th International Conference Vetere, J. Z. Pan, Knowledge Graph on Semantic Systems - SE- Foundations, Springer Inter- national MANTiCS2016 and the 1st Internation- Publishing, Cham, 2017, pp. 17–55. alWorkshop on Semantic Change & URL: https://doi.org/10.1007/ 978-3-319- 357

45654-6{_}2. doi:10.1007/ 978-3-319- RDF storage approaches, Revue Africaine 45654-6_2. de la Recherche en Informa- tique et [32] W. Zheng, L. Zou, W. Peng, X. Yan, Math{é}matiques Appliqu{é}es 15 (2012) S. Song, D. Zhao, Semantic SPARQL 11–35. URL: https://hal.inria.fr/ hal- Similarity Search over RDF Knowl- edge 01299496. Graphs, Proc. VLDB Endow. 9 (2016) [40] J. Sleeman, T. W. Finin, A. Joshi, Topic 840–851. URL: https://doi.org/10. Modeling for RDF Graphs, in: A. L. 14778/2983200.2983201. doi:10.14778/ Gentile, Z. Zhang, C. D’Amato, H. Paul- 2983200.2983201. heim (Eds.), Proceedings of the Third In- [33] T. Yang, J. Chen, X. Wang, Y. Chen, ternational Workshop on Linked Data for X. Du, Efficient SPARQL Query Evalua- Information Extraction (LD4IE2015), tion via Automatic Data Partitioning, in: CEUR Workshop Proceedings (CEUR- W. Meng, L. Feng, S. Bressan, W. Wini- WS.org), Bethlehem, Pennsylvania, USA, warter, W. Song (Eds.), Database Sys- 2015, pp. 48–62. tems for Advanced Applications DAS- [41] S. Pouriyeh, M. Allahyari, K. Kochut, G. FAA 2013, Springer Berlin Heidelberg, Cheng, H. R. Arabnia, Combin- ing Berlin, Heidelberg, 2013, pp. 244–258. Word Embedding and Knowledge- Based [34] O. Curé, G. Blin, RDF and the Semantic Topic Modeling for Entity Sum- Web Stack, in: O. Curé, G. Blin (Eds.), marization, in: 2018 IEEE 12th Inter- RDF Database Sys- tems, first ed., national Conference on Semantic Com- Elsevier, Waltham, MA, 2015, pp. 41– puting (ICSC), Institute of Electrical and 80.URL:https://linkinghub.elsevier.com/r Electronics Engineers ( IEEE ), Laguna etrieve/pii/ B9780127999579000031. Hills, CA, USA, 2018, pp. 252–255. doi:10.1016/ B978-0-12-799957-9.00003- doi:10.1109/ICSC.2018.00044. 1. [42] S. Pouriyeh, M. Allahyari, K. Kochut, G. [35] K. Hose, R. Schenkel, RDF Stores, in: L. Cheng, H. R. Arabnia, ES-LDA: En- tity Liu, M. T. Özsu (Eds.), Encyclopedia of Summarization using Knowledge- based Database Systems, Springer New York, Topic Modeling, in: G. Kondrak, T. New York, NY, 2017, pp. 3100– 3106. Watanab (Eds.), Proceedings of the Eighth URL:http://link.springer.com/10.1007/97 International Joint Conference on Natural 8-1-899-7993-3{_}80676-1. Language Processing (Volume 1: Long doi:10.1007/978-1-4899-7993-3_ 80676- Papers), Asian Federation of Natu- ral 1. Language Processing, Taipei, Taiwan, [36] F. Du, Y. Chen, X. Du, Partitioned In- 2017, pp. 316–325. URL: https://www. dexes for Entity Search over RDF Knowl- aclweb.org/anthology/I17-1032.pdf. edge Bases, in: S.-g. Lee, Z. Peng, [43] Q. Liu, G. Cheng, K. Gunaratna, Y. Qu, X. Zhou, Y.-S. Moon, R. Unland, J. Yoo Entity Summarization: State of the Art and (Eds.), Database Systems for Advanced Future Challenges, ArXiv abs/1910.0 Applications DASFAA 2012, Springer (2019). URL: http://arxiv.org/abs/1910. Berlin Heidelberg, Berlin, Heidelberg, 08252. arXiv:1910.08252. 2012, pp. 141–155. [44] O. Kolomiyets, M.F.Moens, A Survey on [37] A. Mohamed, G. Abuoda, A. Ghanem, Z. Question Answering Technology from an Kaoudi, A. Aboulnaga, RDF- Frames: Information Retrieval Perspective, Knowledge Graph Access for Machine Information Sciences 181 (2011) 5412– Learning Tools (2020). URL: 5434.URL: http://www.sciencedirect.com/ http://arxiv.org/abs/2002.03614. science/article/pii/S0020025511003860. arXiv:2002.03614. doi:https://doi.org/10.1016/j.ins.2011.07.0 [38] R. Denaux, Y. Ren, B. Villazon-Terrazas, 47. P. Alexopoulos, A. Faraotti, H. Wu, [45] E. M. Nabil Alkholy, M. Hassan Haggag, Knowledge Architecture for Organisa- A. Aboutabl, Question Answering tions, Springer International Publishing, Systems: Analysis and Survey, Cham, 2017, pp. 57–84. URL: https:// International Journal of Computer Science doi.org/10.1007/978-3-319-45654-6{_}3. & Engineering Survey 09 (2018) 1–13. doi:10.1007/978-3-319-45654-6_ 3. URL:http://aircconline.com/ijcses/V9N6/ [39] D. C. Faye, O. Curé, G. Blin, A sur- vey of 9618ijcses01.pdf. 358

doi:10.5121/ijcses.2018.9601. {CEUR} Workshop Proceedings, CEUR- [46] S. K. Dwivedi, V. Singh, Research and WS.org, 2016, p. 12. URL: http://ceur-ws. Reviews in Question Answering System, org/Vol-1684/paper21.pdf. Procedia Technology 10 (2013) 417–424. [53] K. Höffner, S. Walter, E. Marx, R. Us- URL:https://linkinghub.elsevier.com/retri beck, J. Lehmann, A.-C. N. Ngomo, eve/pii/S2212017313005409. Survey on Challenges of Ques- tion doi:10.1016/j.protcy.2013.12.378. Answering in the Semantic Web, [47] E. Dimitrakis, K. Sgontzos, Y. Tzitzikas, Semantic Web 8 (2017) 895–920. A Survey on Question Answering Sys- URL:https://content.iospress.com/articles/ tems Over Linked Data and Documents, semantic-web/sw247. doi:10.3233/SW- Journal of Intelligent Information Sys- 160247. tems 55 (2020) 233–259. URL: [54] Axel-Cyrille, N. Ngomo, M. Hoffmann, R. https://doi.org/10.1007/s10844-019-0584- Usbeck, K. Jha, Holistic and Scal- able 7. doi:10.1007/s10844-019-00584-7. Ranking of RDF Data, in: 2017 IEEE [48] A. M. N. Allam, M. H. Haggag, The International Conference on Big Data (Big Question Answering Systems: A Survey, Data), IEEE, Boston, MA, 2017, pp. International Journal of Re- search and 746–755. Reviews in Information Sciences doi:10.1109/BigData.2017.8257990. (IJRRIS) 2 (2012) 211–221. URL: [55] S. Elbassuoni, M. Ramanath, R. Schenkel, https://pdfs.semanticscholar.org/b294/ M. Sydow, G. Weikum, Language- 4a85b9cb428a28e30bdd236471e712667e Model-Based Ranking for Queries on 91.pdf. RDF-Graphs, in: Proceedings of the [49] D. Diefenbach, V. Lopez, K. Singh, 18th ACM Conference on Infor- mation P. Maret, Core Techniques of and Knowledge Management, CIKM ’09, Question Answering Systems over Association for Comput- ing Machinery, Knowledge Bases: A Survey, Knowledge New York, NY, USA, 2009, pp. 977–986. and Information Systems 55 (2018) 529– URL:https://doi.org/10.1145/1645953.16 569.URL: https://doi.org/10.1007/s10115- 46078. doi:10.1145/ 1645953.1646078. 017-1100-y. doi:10.1007/s10115-017- [56] A. Buikstra, H. Neth, L. Schooler, A. Ten 1100-y. Teije, F. Van Harmelen, Ranking Query [50] J. Pérez, M. Arenas, C. Gutierrez, Se- Results from Linked Open Data Using a mantics and Complexity of SPARQL, in: Simple Cognitive Heuristic, in: Proceed- I. Cruz, S. Decker, D. Allemang, C. Preist, ings of the 2011 International Confer- ence D. Schwabe, P. Mika, M. Uschold, L. M. on Discovering Meaning On the Go in Aroyo (Eds.), The Semantic Web - ISWC Large Heterogeneous Data, LHD’11, 2006, Springer Berlin Heidelberg, Berlin, Morgan Kaufmann Publishers Inc., San Heidelberg, 2006, pp. 30–43. Francisco, CA, USA, 2011, pp. 55–60. [51] R. Angles, C. Gutierrez, The Expressive [57] M. Yahya, D. Barbosa, K. Berberich, Q. Power of SPARQL, in: A. Sheth, S. Staab, Wang, G. Weikum, Relationship Queries M. Dean, M. Paolucci, D. Maynard, T. on Extended Knowledge Graphs, in: Finin, K. Thirunarayan (Eds.), The Se- Proceedings of the Ninth ACM mantic Web - ISWC 2008, Springer Berlin International Conference on Web Search Heidelberg, Berlin, Heidelberg, 2008, pp. and Data Mining, WSDM ’16, Association 114–129. for Com- puting Machinery, New York, [52] P. Grafkin, M. Mironov, M. Fellmann, B. NY, USA, 2016, pp. 605–614. URL: https: Lantow, K. Sandkuhl, A. V. Smirnov, //doi.org/10.1145/2835776.2835795. SPARQL Query Builders: Overview and doi:10.1145/2835776.2835795. Comparison, in: B. Johansson, F. [58] Q. Wang, Z. Mao, B. Wang, L. Guo, Vencovský (Eds.), Joint Proceedings of Knowledge Graph Embedding: A Survey the {BIR} 2016 Workshops and Doc- toral of Approaches and Ap- plications, IEEE Consortium co-located with 15th Transactions on Knowledge and Data International Conference on Perspec- tives Engineering 29 (2017) 2724–2743. URL: in Business Informatics Research {(BIR} http://ieeexplore.ieee.org/document/8047 2016), Prague, Czech Republic, 276/. doi:10.1109/TKDE.2017.2754499. September 14 - 16, 2016, volume 1684 of [59] G. A. Gesese, R. Biswas, M. Alam, 359

H. Sack, A Survey on Knowledge Graph One, IJCAI’11, AAAI Press, 2011, pp. 3– Embeddings with Literals: Which model 10. links better Literal-ly?, ArXiv abs/1910.1 [67] Mausam, Open Information Extraction (2019). Systems and Downstream Applications, [60] P. Bonatti, S. Decker, A. Polleres, V. in: Proceedings of the Twenty-Fifth In- Presutti, Knowledge Graphs: New ternational Joint Conference on Artifi- cial Directions for Knowledge Representa- Intelligence, IJCAI’16, AAAI Press, 2016, tion on the Semantic Web (Dagstuhl pp. 4074–4077. Seminar 18371), Dagstuhl Reports 8 [68] A. Zhila, E. Yagunova, O. Makarova, Bringing The Output of Open Informa- (2018) 106–110. URL: https://drops. tion Extraction to The RDF / XML For- dagstuhl.de/opus/volltexte/2019/10328/. mat : A Case Study, 2015. doi:10.4230/DagRep.8.9.29. [69] G. Angeli, M. J. J. Premkumar, C. D. [61] K. M. Endris, M.-E. Vidal, D. Graux, Manning, Leveraging Linguistic Struc- Federated Query Processing, Springer ture For Open Domain Information Ex- International Publishing, Cham, 2020, pp. traction, in: Proceedings of the 53rd 73–86. URL: https://doi.org/10.1007/ 978- Annual Meeting of the Association for 3-030-53199-7{_}5. doi:10.1007/978-3- Computational and the 7th 030-53199-7_5. International Joint Conference on Nat- ural [62] M. Saleem, Y. Khan, A. Hasnain, I. Er- Language Processing of the Asian milov, A.-C. N. Ngomo, A Fine-grained Federation of Natural Language Process- Evaluation of SPARQL Endpoint Feder- ing, {ACL} 2015, July 26-31, 2015, ation Systems, Semantic Web 7 (2016) Beijing, The Association for Computer 493–518. Linguis- tics, 2015, pp. 344–354. URL: [63] J. Tang, M. Hong, D. L. Zhang, J. Li, https://doi. org/10.3115/v1/p15-1034. Information Extraction: Methodologies doi:10.3115/ v1/p15-1034. and Applications, in: H. A. do Prad, E. [70] I. Sim, P. Gorman, R. A. Greenes, R. Ferneda (Eds.), Emerging Technolo- gies B. Haynes, B. Kaplan, H. Lehmann, P. C. of Text Mining, IGI Global, 2008, pp. 1– Tang, Clinical decision sup- port systems 33. URL: http://services.igi-global. for the practice of evidence-based com/resolvedoi/resolve.aspx?doi=10.4018 medicine, Journal of the American /978-1-59904-373-9.ch001. doi:10. Medical Informatics Association: JAMIA 4018/978-1-59904-373-9.ch001. 8(2001)527–534. [64] O. Etzioni, M. Banko, S. Soderland, D. URL:https://pubmed.ncbi.nlm. S. Weld, Open Information Extrac- tion nih.gov/11687560https://www.ncbi. from the Web, Commun. ACM 51 (2008) nlm.nih.gov/pmc/articles/PMC130063/. 68–74. URL: https://doi.org/ doi:10.1136/jamia.2001.0080527. 10.1145/1409360.1409378. doi:10.1145/ 1409360.1409378. [65] S. Ali, H. Mousa, M. Hussien, A Review of Open Information Extrac- tion Techniques, IJCI. International Journal of Computers and Infor- mation 6 (2019) 20–28. URL:https://ijci.journals.ekb.eg/article{_} 35099. htmlhttps://ijci.journals.ekb.eg/ article{_}35099{_}2751a97dec8ca23f3e6 ca98f27cee4b6.pdf. doi:10.21608/ijci.2019.35099. [66] O. Etzioni, A. Fader, J. Christensen, S. Soderland, M. Mausam, Open Infor- mation Extraction: The Second Gener- ation, in: Proceedings of the Twenty- Second International Joint Conference on Artificial Intelligence - Volume Vol- ume