Grappling with the Future Use of Big Data for Translational Medicine and Clinical Care
S. Murphy (1, 2, 4), V. Castro (2), K. Mandl (3, 4)
1 Massachusetts General Hospital Laboratory of Computer Science, Boston, MA, USA
2 Partners Healthcare Research Information Science and Computing, Somerville, MA, USA
3 Boston Children's Hospital Computational Health Informatics Program, Boston, MA, USA
4 Harvard Medical School Department of Biomedical Informatics, Boston, MA, USA

Summary
Objectives: Although patients may have a wealth of imaging, genomic, monitoring, and personal device data, it has yet to be fully integrated into clinical care.
Methods: We identify three reasons for the lack of integration. The first is that "Big Data" is poorly managed by most Electronic Medical Record Systems (EMRS). The data is mostly available on "cloud-native" platforms that are outside the scope of most EMRS, and even checking if such data is available on a patient often must be done outside the EMRS. The second reason is that extracting features from the Big Data that are relevant to healthcare often requires complex machine learning algorithms, such as determining if a genomic variant is protein-altering. The third reason is that applications that present Big Data need to be modified constantly to reflect the current state of knowledge, such as instructing when to order a new set of genomic tests. In some cases, applications need to be updated nightly.
Results: A new architecture for EMRS is evolving which could unite Big Data, machine learning, and clinical care through a microservice-based architecture which can host applications focused on quite specific aspects of clinical care, such as managing cancer immunotherapy.
Conclusion: Informatics innovation, medical research, and clinical care go hand in hand as we look to infuse science-based practice into healthcare. Innovative methods will lead to a new ecosystem of applications (Apps) interacting with healthcare providers to fulfill a promise that is still to be determined.

Keywords
Big Data, healthcare; decision support systems, clinical; genomics

Yearb Med Inform 2017: 96-102
http://dx.doi.org/10.15265/IY-2017-020
Published online August 18, 2017
© 2017 IMIA and Schattauer GmbH

Introduction
If we are to infuse science-based clinical practice into healthcare, we will need better technology platforms than the current Electronic Medical Record Systems (EMRS), which are not built to support scientific innovation. In different countries, the functionality and overall goals of EMRS can vary, but in the United States (US) the primary purpose of EMRS is to bill for the patient visit [1]. The Epic EMRS, dominant in many US Academic Healthcare Centers and known for its strength of efficient patient billing [2], is mostly closed to outside software additions as part of the quest to develop the most efficient system possible. Other EMRS in the US are perhaps best summarized as vehicles for inter-clinician communication [3]. This communication is mostly facilitated through the exchange of narrative text, and therefore the information which is not coded for billing purposes is left in the unstructured notes exchanged between clinicians [4]. This situation has greatly reduced the appeal of using Big Data for Healthcare in the US. We will largely focus on the loss of this appeal in this report.

At odds with what is presented above as an efficient Healthcare Enterprise Information Technology (IT) management system is data entering the system through channels outside the Healthcare Enterprise, data such as personal health data, patient surveys, patient self-reported outcomes, and patient-initiated genomic testing. Even inside the Healthcare Enterprise, the billing- and communication-oriented EMRS is surprisingly disjointed in its ability to handle imaging, waveform, genomic, and continuous cardiac and cephalic monitoring. Much of these data fall under the label of "Big Data" [5].

One defining aspect of "Big Data" is that the data tends to be organized in a fundamentally different way that cannot fit easily and directly into hierarchical and relational databases. Rather, much of the data arrives as (sometimes very large) data objects. In S3 object storing systems (such as https://aws.amazon.com/s3), there is often only minimal metadata regarding what information exists in the data. These data "objects" have their own internal structure and often require a great deal of computation to extract healthcare-relevant features. Support exists for many tools within the cloud-native platform that enable the analysis of the healthcare "Big Data", such as Hadoop, CouchDB, MongoDB, Matlab, R, SAS, and Docker, but the paradigm of computing on data objects rather than examining the data through Structured Query Languages requires new skills and infrastructures which are different from the database-oriented approaches currently employed in most EMRS.
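Where this object-plus-metadata paradigm may be unfamiliar, a minimal sketch can make it concrete. The snippet below is illustrative only and not taken from the article; it assumes the boto3 client for Amazon S3, and the bucket name, object key, metadata fields, and file name are all hypothetical. The point is that the enterprise-visible description of a stored waveform is a handful of key-value pairs, while the signal inside the object cannot be reached with a query language and must instead be downloaded (or met with local compute) and parsed.

```python
"""Illustrative sketch (not from the article): an object store holds the raw
waveform file, and only minimal user-defined metadata describes it.
Assumes the boto3 package and AWS credentials; all names are hypothetical."""
import boto3

s3 = boto3.client("s3")

BUCKET = "example-waveform-archive"            # hypothetical bucket
KEY = "eeg/patient-0042/session-001.edf"       # hypothetical object key

# Upload the raw recording; these key-value pairs are all that a catalog
# or enterprise system can see without opening the object itself.
with open("session-001.edf", "rb") as f:
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=f,
        Metadata={
            "patient-code": "0042",            # coded identifier, not an MRN
            "modality": "eeg",
            "recording-start": "2017-01-15T08:30:00Z",
        },
    )

# Reading the metadata back is cheap, but there is no way to "SELECT" the
# epileptiform discharges inside the file -- that requires computation.
head = s3.head_object(Bucket=BUCKET, Key=KEY)
print(head["Metadata"])
```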
In addition to barriers regarding training people how to use the new tools, the tools to process Big Data and extract healthcare-relevant features must often reside on "cloud-native" platforms [6]. This is because they need massive parallel processing to scale their file processing environments. Software services and tools on cloud-native platforms enable scalable provisioning of compute resources, interoperability between digital objects, indexing to promote discoverability of digital objects, sharing of digital objects between providers and processes, access to and deployment of scientific analysis tools and pipeline workflows, and connectivity with other repositories, registries, and resources. The idea of a Big Data Commons has been advanced at the National Institutes of Health (NIH) as "The NIH Commons" (https://datascience.nih.gov/commons), illustrating many of these differences with traditional computing environments.

Understanding the use of Big Data within the policy of healthcare remains a work-in-progress. The cloud-native platform does not need to reside in the public cloud, and similar computational infrastructure can be established in private and public clouds such that parts of the computing infrastructure can move from private to public clouds as needed, especially when considering concerns of data privacy.

Organizing an Approach to Big Data in Healthcare

Big Data faces numerous integration hurdles, both with the EMRS data and with the other Big Data. This is due to several factors, including the relatively unstructured approach to data organization in the cloud-native environment. The approach of S3 file storage is to allow each file to be described with metadata that helps organize the file system, but the internal data structures of the files cannot be directly queried. Electroencephalogram waveform data from seizure monitoring is commonly put into files in specialized systems. The metadata of the files are not mapped to standard annotations, or annotations of any kind, that are recognized at the enterprise level. The system itself may use S3 storage, but the internal structures of the data are completely unrecognized by a traditional healthcare enterprise query system. Without a query language to make a semantic link between systems, much of this data is left on the healthcare analytics' cutting room floor. This makes Big Data a challenge for integration into the healthcare enterprise. Most EMRS are not prepared for the challenges of untangling the patient attributions in the data, nor placing boundaries around the granular data elements inside the Big Data.

Although the cost of keeping Big Data in the public cloud is comparable to the cost of local S3 storage, the cost of transferring data can be very high (http://www.networkworld.com/article/3164444/cloud-computing/how-to-calculate-the-true-cost-of-migrating-to-the-cloud.html). This reflects the general principle that costs are minimized by operating on the data locally, and extracting the results of an operation, rather than transferring the entire Big Data set to a remote site for an operation to be performed. For example, although millions of seconds of waveform data may be available, only particular episodes of cardiac arrhythmia or brain epileptiform discharges may be desired. Supplying metadata can direct analysis programs operating at the location of the data feeds from the entire population over the past year, where people are identified in a coded form. This presents a challenge to placing boundaries on what data is shared, especially regarding consented and unconsented patients. Generally, one must obtain patient consent to associate the internal data with specific patients, although in general, Big Data sets that are shared as "de-identified" pose a significant risk for re-identification due to the incredibly detailed data which may be available per patient [8].
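A minimal sketch of this locality principle, not drawn from the article, is shown below: a detector runs where the waveform lives, and only compact episode summaries, keyed by a coded patient identifier, leave the platform. The simple amplitude threshold stands in for a real arrhythmia or epileptiform-discharge algorithm, and every name and value is hypothetical.

```python
"""Illustrative sketch (not from the article) of computing at the data:
scan a long waveform locally and return only small episode summaries,
so the raw samples never cross the network. All values are synthetic."""
from typing import Dict, List


def extract_episodes(samples: List[float], fs_hz: float,
                     threshold: float = 3.0) -> List[Dict]:
    """Summarize contiguous runs where |amplitude| exceeds the threshold."""
    episodes, start = [], None
    for i, value in enumerate(samples):
        if abs(value) >= threshold and start is None:
            start = i                      # an episode begins
        elif abs(value) < threshold and start is not None:
            episodes.append({
                "start_s": start / fs_hz,
                "duration_s": (i - start) / fs_hz,
                "peak": max(abs(v) for v in samples[start:i]),
            })
            start = None                   # the episode has ended
    if start is not None:                  # episode runs to the end of record
        episodes.append({
            "start_s": start / fs_hz,
            "duration_s": (len(samples) - start) / fs_hz,
            "peak": max(abs(v) for v in samples[start:]),
        })
    return episodes


if __name__ == "__main__":
    # Synthetic signal: mostly quiet, with two brief high-amplitude bursts.
    fs = 250.0
    signal = [0.1] * 1000 + [5.0] * 50 + [0.1] * 2000 + [4.2] * 25 + [0.1] * 500
    result = {"patient_code": "0042",      # coded identifier, not a name/MRN
              "episodes": extract_episodes(signal, fs)}
    # Only this small dictionary -- not millions of samples -- is transferred.
    print(result)
```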
To focus on what data to share, it is important to understand the use cases behind selecting the data to be shared. For example, the initial focus in a US healthcare data network named PCORNet was to include standard coded healthcare data only, but when a needs assessment was later undertaken among the network sites, most needs were found to be for more specific data such as cancer registry data, rare laboratory tests, and personal health data [7]. This emphasized that, although "classic" data in the EMRS may be easier to handle than Big Data, use cases are guiding the need to include Big Data, and because this integration
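A last sketch, again not taken from the article, shows one way the sharing boundaries discussed above could be expressed in code: before any record leaves the enterprise, the set is filtered to consented patients and trimmed to the data categories a use case actually requested. The record layout, category names, and consent registry are hypothetical, and a real release would still pass through the consent and de-identification review described earlier.

```python
"""Illustrative sketch (not from the article): apply sharing boundaries
before data leaves the enterprise -- consented patients only, and only the
requested data categories. Field and category names are hypothetical."""
from typing import Dict, Iterable, List, Set


def bounded_extract(records: Iterable[Dict],
                    consented_codes: Set[str],
                    requested_categories: Set[str]) -> List[Dict]:
    """Keep consented patients and only the requested categories per record."""
    shared = []
    for rec in records:
        if rec["patient_code"] not in consented_codes:
            continue                       # unconsented patients are dropped
        shared.append({
            "patient_code": rec["patient_code"],
            **{k: v for k, v in rec.items() if k in requested_categories},
        })
    return shared


if __name__ == "__main__":
    consent_registry = {"p01", "p03"}      # coded IDs with documented consent
    rows = [
        {"patient_code": "p01", "cancer_registry": "stage II",
         "rare_lab": 7.2, "free_text_note": "not for release"},
        {"patient_code": "p02", "cancer_registry": "stage I",
         "rare_lab": 3.1, "free_text_note": "not for release"},
        {"patient_code": "p03", "cancer_registry": None,
         "rare_lab": 9.8, "free_text_note": "not for release"},
    ]
    # The use case asked only for cancer registry entries and rare lab tests.
    print(bounded_extract(rows, consent_registry, {"cancer_registry", "rare_lab"}))
```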