International Journal of Advanced Science and Technology Vol. 29, No. 7s, (2020), pp. 2101-2109

Intelligent Integrated Digital Platform InSilicoKDD in Support of Precision Medicine for Scientific Research

[1] Plamenka Borovska, [2] Iliyan Kordev, [3] Boris Borowsky
[1,2,3] Technical University of Sofia, Bulgaria

Abstract
This paper presents InSilicoKDD, an intelligent integrated digital platform based on scientific workflows in support of precision medicine, which has been designed, implemented and deployed at the Faculty of Applied Mathematics and Informatics, Technical University of Sofia. The platform is built upon our method for adaptive in silico knowledge data discovery and intelligent decision making, as well as an integrated approach that combines and analyzes digital data at all stages of breast cancer progression, deploying modern in silico KDD methods and technologies. The conceptual architectural design of the InSilicoKDD platform is presented and the software prototype is described. The functionality and efficiency of the InSilicoKDD platform have been verified experimentally for the breast cancer case by configuring and executing scientific workflows that encompass machine learning models for diagnostics and prediction, data sets with biopsy results for cells from patients' tumors, and breast cancer risk assessment models.

Index Terms: data intensive scientific discovery, in silico knowledge data discovery, precision medicine, scientific workflows platform.

I. THE PROBLEM AREA
Nowadays personalized and precision medicine [1], [2] are innovative frontier paradigms worldwide. They imply adapting and optimizing medical treatment to the individual characteristics of each patient, including genetic specifics, environmental factors and lifestyle risk factors.
Precision medicine has become possible because more people's genomes can be sequenced and more metadata can be collected for multiple existing therapies, but it is assumed that the most likely path to success comes from analyzing well-defined subgroups of patients. Personalized and precision medicine naturally arise from the intersection of "omics" [3] and Big Data technologies [4]. In silico technologies [5], [6] have been playing an important role in biomedical research and experimentation, stimulating the advance of IT in support of personalized and precision medicine. During the last decades, as a result of the digitalization of medical equipment and of innovative "omics" and in silico technologies, the amount of accumulated biomedical data has become huge and keeps growing exponentially, holding substantial hidden value. The fourth paradigm of scientific research and innovation, data-intensive scientific discovery [7], together with the progress of computer and data sciences, has led to the emergence of in silico knowledge data discovery (KDD) [8]. The diagnostics and treatment of cancer is one of the areas in medicine where in silico KDD is most widely used. Breast cancer is the most common cancer in women and causes approximately 15% of all cancer deaths among women. Its high incidence and mortality rate make it a leading socially significant disease. According to the World Health Organization, 2.1 million women are diagnosed with the disease every year. Therefore, one of the major challenges facing modern information technology is to provide innovative intelligent integrated solutions to support precision medicine, and in particular to support cancer research in line with the fourth paradigm of scientific research, in silico KDD and intelligent decision making based on big biomedical data analytics.
The major goal of this paper is to introduce our intelligent integrated digital solution in support of precision medicine for breast cancer, which has been designed, implemented and deployed at the Department of Informatics, Faculty of Applied Mathematics and Informatics, Technical University of Sofia. The platform is based on scientific workflows. It is used as an experimental framework for scientific research by our PhD students and postdocs, and for educational purposes by our MSc students in the course "Big data analytics in support of precision medicine" of our MSc program "Big data analytics".

ISSN: 2005-4238 IJAST Copyright ⓒ 2020 SERSC

II. MOTIVES AND BACKGROUND
In our recent study we designed an intelligent method for adaptive in silico knowledge data discovery and decision making based on big data analytics [9]. The method is based on differentiated (descriptive, diagnostic, prognostic and prescriptive) and integrated scientific analytic workflows and is built upon the phase parallel paradigm, with a machine learning phase and an operational analytics phase. It has been verified for a range of bio-molecular applications [10]. The goal here is to design and implement, on the basis of the proposed method, an intelligent digital system based on scientific workflows for in silico KDD and decision making, following an integrated approach for the case study of breast cancer. The integrated approach [11] is applied to combine and analyze digital data at all stages of breast cancer progression, deploying modern in silico KDD methods and technologies. Collecting data from all stages facilitates early and precise diagnostics and complex risk assessment, and allows sensitivity analysis of various diagnostic and prognostic factors.
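The idea of differentiated workflows that are later combined into an integrated workflow can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the names `Workflow`, `integrate`, `describe`, `diagnose`, `predict` and `prescribe`, the toy measurements and the 0.5 cut-off are all hypothetical and are not taken from the InSilicoKDD code base, and the split into a machine learning phase and an operational analytics phase is not modelled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical types for this sketch only.
Context = Dict[str, Any]
Step = Callable[[Context], Context]


@dataclass
class Workflow:
    """An ordered list of steps that pass a shared context along."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, context: Context) -> Context:
        for step in self.steps:
            context = step(context)
        return context


def integrate(workflows: List[Workflow]) -> Workflow:
    """Chain differentiated workflows into one integrated workflow."""
    steps = [step for wf in workflows for step in wf.steps]
    return Workflow("+".join(wf.name for wf in workflows), steps)


def describe(ctx: Context) -> Context:
    # Descriptive step: summarise the toy measurements by their mean.
    values = ctx["values"]
    return {**ctx, "mean": sum(values) / len(values)}


def diagnose(ctx: Context) -> Context:
    # Diagnostic step: flag values above the mean (stand-in for a trained model).
    return {**ctx, "flags": [v > ctx["mean"] for v in ctx["values"]]}


def predict(ctx: Context) -> Context:
    # Prognostic step: share of flagged values, used as a toy risk score.
    return {**ctx, "risk": sum(ctx["flags"]) / len(ctx["flags"])}


def prescribe(ctx: Context) -> Context:
    # Prescriptive step: placeholder labels, with an assumed 0.5 cut-off.
    label = "follow-up" if ctx["risk"] >= 0.5 else "routine"
    return {**ctx, "recommendation": label}


integrated = integrate([
    Workflow("descriptive", [describe]),
    Workflow("diagnostic", [diagnose]),
    Workflow("prognostic", [predict]),
    Workflow("prescriptive", [prescribe]),
])

# Toy input values, made up for the example.
result = integrated.run({"values": [1.0, 2.0, 3.0, 6.0]})
print(integrated.name)
print(result["recommendation"])
```

Each differentiated workflow can still be run on its own, while `integrate` simply concatenates their steps; a real platform would also need data and model management around these steps.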
To be implemented, the integrated approach requires an integrated digital platform offering services for five selected fundamental areas of research: (1) bioinformatics, (2) machine learning, (3) in silico KDD, (4) breast cancer risk evaluation, and (5) optimization. The research area of KDD and intelligent decision making demands functionality for the construction and execution of scientific analytic workflows, comprising additional components for risk evaluation, personal therapy optimization, and genetic analysis.
To select a suitable platform for implementing our integrated digital solution InSilicoKDD, we explored modern bioinformatics platforms based on scientific workflows, as well as frameworks and environments for data analytics and KDD. Presently, there is a wide spectrum of feature-rich bioinformatics platforms based on scientific workflows, such as Galaxy, Unipro UGENE and Taverna. Galaxy [12] is an open source, web-based platform for data-intensive biomedical research. It is a scientific workflow management platform and also provides tools for integrating biological data. It is Linux-based and is written in Python and JavaScript. Unipro UGENE [13] is open-source, multi-platform software written in C++ whose primary purpose is to assist molecular biologists in managing in silico experimentation. Apache Taverna [14] is an open source, domain-independent workflow management system encompassing a set of software tools used to design and execute bioinformatics scientific workflows for in silico experimentation. Presently, Taverna is an incubating project of the Apache Software Foundation. Apache Spark™ [15] is an open source framework for cluster computing that provides unified software tools for Big Data analytics. Weka (Waikato Environment for Knowledge Analysis) [16] is machine learning software, developed at the University of Waikato in New Zealand, that implements a collection of machine learning algorithms in Java.
Following the integrated approach to combine and analyze digital data at significant stages of breast cancer progression, and in order to design and implement the integrated intelligent digital solution for the case study of breast cancer, it is necessary to construct and implement a custom digital platform for biomedical research. The target platform has to manage scientific workflows and integrate the relevant functionality of bioinformatics platforms and of frameworks for data analytics and KDD, as well as additional modules for risk assessment, optimization and selected bioinformatics tools relevant to breast cancer.

III. CONCEPTUAL ARCHITECTURAL DESIGN AND FUNCTIONALITY OF INSILICOKDD PLATFORM
The purpose of InSilicoKDD is to help physicians process and analyze the huge amount of patient-related data in order to discover in silico knowledge for precision diagnostics, breast cancer risk assessment and estimated life expectancy, as well as to automatically generate recommendations for precision therapy. The conceptual architectural design of InSilicoKDD is shown in Fig. 1.

[Figure 1: Conceptual architectural design of InSilicoKDD. Labelled blocks include the front end system (graphic user interface, desktop app, mobile app); the scientific in silico KDD workflow manager; the machine learning, risk evaluation, bioinformatics, operational analysis and optimization sections, the latter with the ParMetaOpt Experimental Framework; and the backend system (knowledge management, data set management, model ranking, generic and custom model repositories, training and validating data sets).]

The digital solution offers functionality covering all stages of in silico knowledge discovery: Descriptive