Combining Machine Learning Algorithms with an Inference Engine for Effective Clinical Diagnosis and Treatment
Shuo Yang

Old Dominion University
ODU Digital Commons
Information Technology & Decision Sciences
Information Technology & Decision Sciences Faculty Publications
2017

Semantic Inference on Clinical Documents: Combining Machine Learning Algorithms With an Inference Engine for Effective Clinical Diagnosis and Treatment

Shuo Yang
Ran Wei
Jingzhi Guo
Lida Xu, Old Dominion University, [email protected]

Part of the Databases and Information Systems Commons, Electrical and Electronics Commons, Management Information Systems Commons, and the Systems and Communications Commons

Repository Citation
Yang, Shuo; Wei, Ran; Guo, Jingzhi; and Xu, Lida, "Semantic Inference on Clinical Documents: Combining Machine Learning Algorithms With an Inference Engine for Effective Clinical Diagnosis and Treatment" (2017). Information Technology & Decision Sciences Faculty Publications. 2. https://digitalcommons.odu.edu/itds_facpubs/2

Original Publication Citation
Yang, S., Wei, R., Guo, J. Z., & Xu, L. D. (2017). Semantic inference on clinical documents: Combining machine learning algorithms with an inference engine for effective clinical diagnosis and treatment. IEEE Access, 5, 3529-3546. doi:10.1109/access.2017.2672975

SPECIAL SECTION ON HEALTHCARE BIG DATA
Received November 13, 2016, accepted February 14, 2017, date of publication March 1, 2017, date of current version March 28, 2017.
Digital Object Identifier 10.1109/ACCESS.2017.2672975

Semantic Inference on Clinical Documents: Combining Machine Learning Algorithms With an Inference Engine for Effective Clinical Diagnosis and Treatment

SHUO YANG¹ (Member, IEEE), RAN WEI², JINGZHI GUO¹ (Member, IEEE), AND LIDA XU³ (Senior Member, IEEE)

¹Faculty of Science and Technology, University of Macau, Taipa 999078, China
²Department of Microbiology, Rutgers University, Newark, NJ 07103 USA
³Department of Information Technology and Decision Sciences, Old Dominion University, Norfolk, VA 23529 USA

Corresponding author: S. Yang ([email protected])

This work was supported by the University of Macau Research under Grant MYRG2015-00043-FST.

ABSTRACT Clinical practice calls for reliable diagnosis and optimized treatment. However, human errors in health care remain a severe issue even in industrialized countries. The application of clinical decision support systems (CDSS) casts light on this problem. Despite the great improvement in CDSS over the past several years, challenges to their wide-scale application remain, including: 1) decision making in CDSS is complicated by the complexity of the data regarding human physiology and pathology, and loading big data related to patients could make the whole process more time-consuming; and 2) information incompatibility among different health information systems (HIS) makes CDSS an information island, i.e., additional input work on patient information might be required, which would further increase the burden on clinicians. One popular strategy is the integration of CDSS in HIS to directly read electronic health records (EHRs) for analysis. However, gathering data from EHRs could constitute another problem, because EHR document standards are not unified. In addition, HIS could use different default clinical terminologies to define input data, which could cause additional misinterpretation.
Several proposals have been published thus far to allow CDSS access to EHRs via the redefinition of data terminologies according to the standards used by the recipients of the data flow, but they mostly aim at specific versions of CDSS guidelines. This paper views these problems in a different way. Compared with conventional approaches, we suggest more fundamental changes; specifically, uniform and updatable clinical terminology and document syntax should be used by EHRs, HIS, and their integrated CDSS. Facilitated data exchange will increase the overall data loading efficacy, enabling CDSS to read more information for analysis at a given time. Furthermore, a proposed CDSS should be based on self-learning, which dynamically updates a knowledge model according to the data-stream-based upcoming data set. The experiment results show that our system increases the accuracy of the diagnosis and treatment strategy designs.

INDEX TERMS Big data, case-based reasoning, clinical diagnosis, decision tree, data stream mining, disease detection, electronic health record, medical record, semantic integration.

2169-3536 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Given the recent dramatic progress that global endeavors have made in health care, it is surprising that human errors remain the leading cause of death even in developed countries such as the US [1]. Among all of the contributing factors, errors related to medication are the most common category in medical practices [2]. A large number of adverse drug effects are reported annually [3], [4], with statistics revealing that approximately 50% are preventable [3]. Subgroups of this category of error include incorrect prescription, drug dose and administration. In addition, incorrect diagnosis is another typical human error, which causes fundamentally wrong medical decisions that lead to serious consequences [5].

To reduce the risk of human error as well as the workload of the medical staff, the application of medical software has long been suggested as a possible tool [6]–[8], [58]. According to [2], different strategies of software design are used to solve the two aforementioned problems. To prevent medication errors, the application should be designed as an automated database with historical and current medical records of a patient, as well as other key information, including all prescription and personal allergic reaction documents to prevent any inappropriate prescriptions and provide warnings on them. In addition, computer-assisted diagnosis software is used to increase the accuracy of the diagnosis and decrease the time that is needed for decision making. A few previous research studies have shown the feasibility of computer-assisted clinical information access and practice in terms of significantly reduced incidence of medical errors or improved accuracy of diagnosing multiple diseases [9]–[11]. Currently, one popular strategy in the implementation of automated clinical practices is the integration of clinical decision support systems (CDSS) and health information systems (HIS) to directly read electronic health records (EHR) for analysis [12], [13]. This method greatly reduces the cost of training and the time required for entering massive amounts of patient-related data during each visit [14], [15], and it dynamically reconstructs the diagnosis model according to the real-time fluctuations of the patient's condition [16].

However, several remaining problems still challenge the prospect of CDSS applications:

(1) Rapid growth of data. With the enormous number of patients who are exposed to all types of tests, the size of the medical information database is increasing rapidly in millions of multi-dimensional records (e.g., physical indicators of the human body collected by smart clothing [59], [60], [61])

[...] demands cannot be completely satisfied; however, with the enormous amount of patient care data from different hospitals in EHRs, CDSS could be designed to analyze all of the historically hospitalized patient care data and make recommendations. This arrangement will help the patients to make optimal decisions according to their personalized demands.

To address these challenges, this article proposes a personalized and professional clinical diagnosis and treatment system (CDTS) that combines machine learning algorithms with an inference engine. Specifically, this article makes the following contributions:

• Devise a voted ensemble classification algorithm that is suitable for data stream mining to meet the big data computational demand;
• Implement clinical tabular document syntax (DocLang) to integrate heterogeneous clinical information for accurate disease diagnosis and treatment. This approach has two advantages: (1) maintaining semantic consistency among heterogeneous contexts; and (2) facilitating automatic document understanding and processing across different contexts, because DocLang is not only a document representation language but also a rule language;
• Support personalized and professional demands based on the patients' and doctors' features. Patients are able to receive suggestions (e.g., disease diagnosis or treatment suggestion) without going to hospitals according to their preference (e.g., choosing a clinician in another country).

The remainder of this article is organized as follows. Section II presents related studies on decision tree and case-based reasoning. In Section III, we introduce the frame-
